Sunday, May 5, 2024
Asked Thu, June 17, 2021, 10:56:19

When I use a browser to save this page:
http://maine.craigslist.org/fuo/
the links are saved as absolute URLs that point to the live content,
like this:
href="http://maine.craigslist.org/fuo/4323535885.html"



When I use wget instead:

$ wget --no-parent maine.craigslist.org/fuo

the links are saved as:
href="/fuo/4305913395.html"



I have tried these options:



--spider
--page-requisites
--user-agent="Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:27.0) Gecko/20100101 Firefox/27.0"


but the links all still come out as relative paths, without the host part of the URL.



I have the rest of the script working: it parses out my location and makes a new list of links for furniture in my area. But I cannot figure out how to get the same output from wget as I get when I save the page via Firefox.



I thought using wget would be simplest. Perhaps that isn't right. If I can achieve the same effect using some other software, so long as I can write a script to make it work, I will be happy.



 Answers
1

The --convert-links option should do what you're looking for:



wget --convert-links --no-parent maine.craigslist.org/fuo
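To check whether the links in the saved file now carry the host name, you can list the href values it contains. This is only a minimal sketch: list_links is a made-up helper name, it assumes a grep that supports -o (GNU and BSD grep do), and the file name you pass is whatever wget reports it saved the page as.

```shell
# list_links FILE: print every href value found in FILE, one per line.
# Hypothetical helper; assumes double-quoted href attributes and grep -o.
list_links() {
  grep -o 'href="[^"]*"' "$1" | sed -e 's/^href="//' -e 's/"$//'
}

# Usage (SAVED_FILE is a placeholder for the name wget printed):
#   list_links SAVED_FILE
```

Links that were converted should then start with http:// rather than with a bare /fuo/ path.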


More information about this option and what it does is below (copied from man wget):



   --convert-links
After the download is complete, convert the links in the document
to make them suitable for local viewing. This affects not only the
visible hyperlinks, but any part of the document that links to
external content, such as embedded images, links to style sheets,
hyperlinks to non-HTML content, etc.

Each link will be changed in one of the two ways:

· The links to files that have been downloaded by Wget will be
changed to refer to the file they point to as a relative link.

Example: if the downloaded file /foo/doc.html links to
/bar/img.gif, also downloaded, then the link in doc.html will
be modified to point to ../bar/img.gif. This kind of
transformation works reliably for arbitrary combinations of
directories.

· The links to files that have not been downloaded by Wget will
be changed to include host name and absolute path of the
location they point to.

Example: if the downloaded file /foo/doc.html links to
/bar/img.gif (or to ../bar/img.gif), then the link in doc.html
will be modified to point to http://hostname/bar/img.gif.

Because of this, local browsing works reliably: if a linked file
was downloaded, the link will refer to its local name; if it was
not downloaded, the link will refer to its full Internet address
rather than presenting a broken link. The fact that the former
links are converted to relative links ensures that you can move the
downloaded hierarchy to another directory.
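If you would rather post-process a page that was already saved without --convert-links, the second rule quoted above (prefix the host name to links that were not downloaded) can be roughly approximated with sed. This is a sketch under stated assumptions, not what wget does internally: absolutize_links is a made-up helper name, it only handles double-quoted hrefs that start with a single "/", and it assumes the base URL contains no "#" or "&" characters, since those are special in the sed expression.

```shell
# absolutize_links BASE: read HTML on stdin and write it to stdout with
# every root-relative href (one starting with a single "/") prefixed by BASE.
# Hypothetical helper; hrefs starting with "//" or a scheme are left alone.
absolutize_links() {
  base=${1%/}   # drop one trailing slash so the result has no "//" in the path
  sed -E "s#href=\"/([^/\"])#href=\"${base}/\\1#g"
}

# Usage (SAVED_FILE is a placeholder for the page wget saved):
#   absolutize_links "http://maine.craigslist.org" < SAVED_FILE > fixed.html
```

Relative links that do not start with "/" (for example ../bar/img.gif) are not touched by this sketch, so --convert-links remains the more complete option.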

Answered Friday, June 18, 2021
mentpengu