Using Links Finder source with sitemap.xml files
I’m trying to load existing content from a website using the Links Finder, but I don’t want to compile a long list of container URLs. Instead, I would like to use the site's sitemap.xml file. It already lists all the links, but for some reason I can’t make it work.
Here is a sample sitemap.xml generated by WordPress.com:
https://citizenwells.com/sitemap.xml
As you can see, it would be very useful to be able to process these files, because they contain:
a) All the links.
b) The URL of the featured image.
c) The last modification date.
From this you could even store the XML page, so that when you process it again you can skip anything old (because you have the date) and process only the latest links (new and/or modified). Modified posts could either update the existing post or simply be skipped.
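A minimal Python sketch of that idea follows. It assumes a plain `<urlset>` sitemap that uses the sitemaps.org namespace plus Google's image-sitemap extension, and that fetching the file (for example with `urllib.request.urlopen`) happens elsewhere. `parse_sitemap`, `diff_against_previous` and the `previous` dictionary (a stand-in for the stored copy of the last sitemap) are hypothetical names for illustration, not the plugin's own code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple, Union

# Namespaces from the sitemaps.org protocol and Google's image-sitemap extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


@dataclass
class SitemapEntry:
    loc: str                                          # a) the page link
    lastmod: Optional[datetime]                       # c) last modification date, if present
    images: List[str] = field(default_factory=list)   # b) image URLs, if any


def parse_lastmod(text: Optional[str]) -> Optional[datetime]:
    """Parse a <lastmod> value into a timezone-aware datetime; None if missing or invalid."""
    if not text:
        return None
    text = text.strip()
    if text.endswith("Z"):  # fromisoformat() rejects a trailing "Z" before Python 3.11
        text = text[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(text)
    except ValueError:
        return None
    # Date-only values such as "2019-06-02" carry no offset; assume UTC.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def parse_sitemap(xml_text: Union[str, bytes]) -> List[SitemapEntry]:
    """Read the link, last-modified date and image URLs of every <url> in a <urlset>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue
        lastmod = parse_lastmod(url.findtext("sm:lastmod", namespaces=NS))
        images = [img.text.strip()
                  for img in url.findall("image:image/image:loc", NS) if img.text]
        entries.append(SitemapEntry(loc, lastmod, images))
    return entries


def diff_against_previous(
    entries: List[SitemapEntry],
    previous: Dict[str, Optional[datetime]],
) -> Tuple[List[SitemapEntry], List[SitemapEntry]]:
    """Split entries into (new, modified) relative to the last stored run.

    `previous` (hypothetical storage) maps each URL seen last time to its
    <lastmod> at that time. Unchanged URLs, and known URLs that have no
    <lastmod> to compare, are skipped.
    """
    new, modified = [], []
    for e in entries:
        if e.loc not in previous:
            new.append(e)
        elif e.lastmod is not None:
            old = previous[e.loc]
            if old is None or e.lastmod > old:
                modified.append(e)
    return new, modified
```

Saving `{e.loc: e.lastmod for e in entries}` after each run would give the `previous` mapping for the next run (its datetimes must be timezone-aware, like the parsed ones). The `modified` list is where you would decide whether to update the existing post or skip it.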
The HTML parser could then get the title, etc., from each page.
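As a rough illustration of that step, this sketch pulls the first `<title>` out of an already-fetched page using Python's standard `html.parser` module. `TitleParser` and `extract_title` are hypothetical names standing in for the plugin's own HTML parser, whose behavior may differ; fetching the page is left out.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of an HTML page."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" becomes "&"
        self._in_title = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html: str) -> Optional[str]:
    """Return the page title, or None if the page has no non-empty <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```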
This would be an easy way to load existing content from most sites, and also to load new content from sites without RSS feeds. In many cases I think it would be easier than using the normal Links Finder.
Most websites have a sitemap.xml, because search engines such as Google use it to discover and index a site's pages. Every sitemap entry has the link, most also have a last-modified date, and some have image URLs. More information is here: