• Resolved kaj69

    (@kaj69)


    Hi,
    I have TSF create the sitemap (“Output Sitemap” and “Add sitemap locations in robots” are checked). I can view the sitemap when clicking “The sitemap can be found here”.

    But, when doing a Site Audit using SEMrush I receive an error, “sitemap.xml file has format errors”. I have asked SEMrush to provide more detailed info of what is actually wrong and received the answer:

    Here is the checking tool on which our sitemap checker is based:
    https://www.freeformatter.com/xml-validator-xsd.html
    
    Unfortunately the page https://www.wpsupport.se/sitemap.xml is not available to view at the moment.
    
    curl -i -sS -L --max-time 5 -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)' https://www.wpsupport.se/sitemap.xml
    HTTP/1.1 451 Unavailable For Legal Reasons
    Date: Wed, 10 May 2017 12:18:34 GMT
    Server: Apache
    X-Powered-By: PHP/7.0.18
    Cache-Control: max-age=0
    Expires: Wed, 10 May 2017 12:18:34 GMT
    Vary: Accept-Encoding
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8

    If I manually paste the TSF-sitemap for validation at freeformatter.com then errors are shown.

    If I remove some lines in the beginning, leaving the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- Sitemap is generated on 2017-05-09 10:10:43 -->
    	<url>

    then the validation is passed.

    While searching the TSF-forum I’ve found older posts saying that TSF works well with “Google XML Sitemap” and in the description of TSF it says that it “creates a sitemap with all your pages, posts…”

    As I’m using SEMrush’s Site Audit to show my clients the technical SEO-status, is there anything I can do to eliminate the error message of “Sitemap has format errors”?

    Are other sitemap plugins required (e.g. Google XML Sitemap), or is TSF sufficient on its own today?

Viewing 5 replies - 1 through 5 (of 5 total)
  • Plugin Author Sybre Waaijer

    (@cybr)

    Hi @kaj69,

    What you’re describing is unlikely to be caused by TSF (it tries to defend itself), though it’s very common in general when other plugins overreach beyond their capabilities.

    When that happens, these can be any of the causes:

    1. A minification plugin minifies the sitemap incorrectly. The sitemap is already minified…
    2. Some plugins tend to add incorrect headers. The dynamic sitemap should act as an XML file. Not a webpage.
    3. Some plugins write PHP outside of guideline standards, causing whitespace to form where it shouldn’t (e.g. stray PHP opening/closing tags).
    4. Firewall plugins are too easy to set up incorrectly, so they block everything in their path.
    5. Firewalls should be set up at the server level. Good WP hosting companies generally do this with something like modsecurity.
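    As a rough illustration of causes 2 to 4, here is a minimal Python sketch that inspects an already-fetched sitemap response. The function name `diagnose_sitemap_response` and its checks are assumptions made for this example; they are not part of TSF or of any checker mentioned in this thread.

```python
from typing import Dict, List


def diagnose_sitemap_response(status: int, headers: Dict[str, str], body: bytes) -> List[str]:
    """Return a list of likely problems with a fetched sitemap response.

    Hypothetical helper for illustration only.
    """
    problems = []

    # A firewall or security plugin typically answers with a non-200 status.
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")

    # The sitemap should be served as XML, not as a web page.
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            content_type = value.lower()
    if "xml" not in content_type:
        problems.append(f"Content-Type is not XML: {content_type or 'missing'}")

    # Stray PHP opening/closing tags often emit whitespace before the declaration.
    if body[:1].isspace():
        problems.append("whitespace before the XML declaration")
    elif not body.lstrip(b"\xef\xbb\xbf").startswith(b"<?xml"):
        # A UTF-8 byte order mark is tolerated; anything else is suspicious.
        problems.append("body does not start with an XML declaration")

    return problems


# The blocked response quoted earlier in the thread: status 451, served as HTML.
for problem in diagnose_sitemap_response(
    451, {"Content-Type": "text/html; charset=UTF-8"}, b"<html></html>"
):
    print(problem)
```

    An empty list means none of these three symptoms were found; it does not prove the sitemap is valid.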

    I’m highlighting the firewall because I got this trying to check your sitemap:
    WordFence too strict

    About the sitemap:
    The sitemap of TSF keeps Search Engines up-to-date with the latest post/page/product/cpt changes of your website. It’s not required to get your website indexed or ranked; in fact, it has zero effect on ranking at all.

    If your website’s properly built, Search Engines should find updated or new content eventually without any help from a sitemap. Just like a visitor would — or rather, should.

    • This reply was modified 7 years, 6 months ago by Sybre Waaijer. Reason: added sitemap explanation
    Thread Starter kaj69

    (@kaj69)

    @cybr

    You’re absolutely correct – the firewall was configured a little bit too tight, I have now changed it.

    I’m also using the cache plug-in “WP Rocket” where I had ticked the options to minify HTML, CSS and JS but I have now removed all those items.

    I even deactivated all plugins except TSF and ran the Site Audit.

    But, the result was the same in SEMrush: Sitemap has format errors.

    And when I paste the sitemap into the tool SEMrush uses, https://www.freeformatter.com/xml-validator-xsd.html, I receive:

    S4s-elt-character: Non-whitespace Characters Are Not Allowed In Schema Elements Other Than 'xs:appinfo' And 'xs:documentation'. Saw 'Document Moved'., Line '1', Column '28'.
    The Markup In The Document Following The Root Element Must Be Well-formed., Line '2', Column '2'.
    The Markup In The Document Following The Root Element Must Be Well-formed.

    If I manually remove some of the lines at the beginning of the file (see above for what is kept), then it passes the validation. How come?

    Plugin Author Sybre Waaijer

    (@cybr)

    Hi @kaj69,

    Awesome, it works now!

    Note that SEMrush might still hold a cached copy of the blocked version; sitemaps can be massive, and I assume they want to spare their servers’ resources.

    You wrote: “I’m also using the cache plug-in ‘WP Rocket’ where I had ticked the options to minify HTML, CSS and JS but I have now removed all those items.”

    That’s odd, last time I tested WP Rocket (it was at v2.9.0) it worked perfectly. I’ll check it out!

    About “Document Moved”:
    Well, we now know that they use IIS. Other than that, the tool simply doesn’t work.
    The sitemap protocol TSF uses is 0.9, which has been around for a long time.
    Version 0.84 (prior to 0.9) has been deprecated since 2012, and they might just still check for that.

    The only way to truly validate the sitemap is at the websites that consume it.
    Google: https://www.google.com/webmasters/tools/sitemap-list
    Bing: https://www.bing.com/webmaster/configure/sitemaps/
    Yandex: https://webmaster.yandex.com/tools/sitemap/

    If any of those fail, then it’s an issue; otherwise, we must assume that the alternative checker used is invalid.
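    For a quick local sanity check before submitting to the search engines, something like the following Python sketch may help. It only tests well-formedness and the root element using the standard library's `xml.etree.ElementTree`; it does not perform XSD validation and does not replace the tools listed above. The function name `check_sitemap_xml` and the example.com URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_sitemap_xml(data: bytes) -> Tuple[bool, str]:
    """Check well-formedness and the root element of a sitemap document.

    Hypothetical helper; it is not how any search engine validates sitemaps.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # Raised for leading whitespace before the declaration, or for an
        # HTML error page with more than one top-level element.
        return False, f"not well-formed XML: {exc}"

    # ElementTree writes namespaced tags as "{namespace}localname".
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "urlset":
        return False, f"root element is <{local_name}>, expected <urlset>"

    url_count = sum(1 for child in root if child.tag.rsplit("}", 1)[-1] == "url")
    return True, f"well-formed, {url_count} <url> entries"


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    b"<url><loc>https://example.com/</loc></url>\n"
    b"</urlset>"
)
print(check_sitemap_xml(sample))
```

    If a checker reports text such as “Document Moved”, it most likely parsed an HTML redirect or error page rather than the sitemap itself, which this sketch would flag as a wrong root element or as markup that is not well-formed.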

    Thread Starter kaj69

    (@kaj69)

    Hi @cybr

    Once again, thanks for your rapid reply. You write “the only way to truly validate…” and Google actually says “No errors found” so I second your assumption that the alternative checker used is invalid.

    I will re-activate “WP Rocket” (v2.9.11) and submit the sitemap to search engines once again and do a follow-up at each engine within a couple of weeks.

    BTW, I’m very impressed by your support and the speed of your replies – very professional!

    Thanks once again.

    Plugin Author Sybre Waaijer

    (@cybr)

    @kaj69 No problem! If you have any more questions, feel free to ask.

  • The topic ‘Sitemap has format errors?’ is closed to new replies.