Hi guys!

    The company I develop for asked me to do something I’m not entirely sure is possible. In short, I was asked to remove all HTML markup, CSS, JS, and anything else that is not XML or plain text from our site’s RSS feed.

    How can I go about doing this? Is it even possible? Is there a plugin that generates markup-less RSS feeds?

    Thanks in advance!

Viewing 6 replies - 1 through 6 (of 6 total)
    Site URL?

    Thread Starter Larsmir

    (@larsmir)

    Oh, right, sorry. It’s https://blogs.disneylatino.com/

    The feed at https://blogs.disneylatino.com/feed/ is perfectly valid. Any RSS aggregator should be able to parse it and remove anything that is not needed. What is this for?

    Thread Starter Larsmir

    (@larsmir)

    To quote the person at my company who requested this:

    “In the description and body content of each item, would you be able to remove any HTML markup, javascript, CSS, etc? In the past, we’ve crawled feeds including these, but those feeds tend to break when some random sequence of characters get introduced, which negatively affects search indexing.”

    So, while the RSS feed is valid, the company’s internal search indexing occasionally breaks on it when it runs into markup, JavaScript, or CSS inside an item. I know it’s a strange and very specific request.
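To make the request concrete, here is a minimal sketch, in Python using only the standard library's `html.parser`, of what "remove any HTML markup, javascript, CSS" could mean for one item's text: tags and attributes (including inline `style="..."`) are dropped, the bodies of `<script>` and `<style>` are discarded, HTML comments disappear, and entities such as `&#8217;` are decoded. `strip_markup` is a hypothetical helper, not a WordPress function, and the set of block tags that get a separating space is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the bodies of <script> and <style>."""

    SKIP = {"script", "style"}
    # Block-level tags get a space so "<p>a</p><p>b</p>" becomes "a b", not "ab".
    BREAKS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &#8217; and friends into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # attrs (including inline style="...") are ignored on purpose.
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BREAKS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BREAKS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Text inside <script>/<style> is discarded; everything else is kept.
        if not self._skip_depth:
            self.parts.append(data)


def strip_markup(html: str) -> str:
    """Return plain text with tags, scripts, styles and comments removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (newlines, tabs, the spaces added above).
    return " ".join("".join(parser.parts).split())
```

For example, `strip_markup('<p>Hola <b>mundo</b></p><script>track()</script>')` gives `"Hola mundo"`. A parser is used rather than a regular expression because script and style bodies, comments, and entities are easy to get wrong with regex; the same function could just as well run inside the crawler, which is where the thread suggests the fix belongs.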

    You would need to create this special feed yourself – although, frankly, I think it’s the internal search or indexing that needs sorting.

    Thread Starter Larsmir

    (@larsmir)

    Yeah, I fully agree that it’s the search/crawler that needs fixing, rather than jury-rigging every RSS feed.

    I was just reading through that entry in the Codex – are there any functions or code snippets I could use to keep the non-XML portions out of the feed?
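Inside WordPress this would normally be a PHP filter (for instance on the `the_content_feed` and `the_excerpt_rss` hooks, calling `wp_strip_all_tags()`). As an alternative that leaves the site untouched, here is a minimal Python sketch that post-processes the existing feed before the crawler sees it. It assumes an RSS 2.0 feed whose item bodies live in `<description>` and `<content:encoded>` (as in a standard WordPress `/feed/`); `plain_text_feed` and `to_plain_text` are hypothetical names, and `xml_declaration=` needs Python 3.8 or later. It carries its own small tag stripper so it runs on its own.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of <content:encoded>, where WordPress puts the full post body.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
SKIP = {"script", "style"}
BREAKS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _PlainText(HTMLParser):
    """Keeps text nodes; drops <script>/<style> bodies, comments and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp;, &#8217; etc.
        self.parts, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip += 1
        elif tag in BREAKS:
            self.parts.append(" ")  # keep paragraphs from running together

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip = max(0, self.skip - 1)
        elif tag in BREAKS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def to_plain_text(fragment: str) -> str:
    parser = _PlainText()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


def _register_feed_namespaces(xml_bytes: bytes) -> None:
    # Reuse the feed's own prefixes (content:, dc:, atom:, ...) on output instead
    # of ElementTree's generated ns0:, ns1:. Note: register_namespace is global.
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)


def plain_text_feed(xml_bytes: bytes) -> bytes:
    """Return a copy of an RSS 2.0 feed whose item bodies are plain text."""
    _register_feed_namespaces(xml_bytes)
    root = ET.fromstring(xml_bytes)
    for item in root.iter("item"):
        for tag in ("description", f"{{{CONTENT_NS}}}encoded"):
            for el in item.findall(tag):
                # CDATA is parsed as ordinary text, so el.text holds the raw HTML.
                el.text = to_plain_text(el.text or "")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

The output is still a well-formed RSS document (titles, links, and dates are untouched); only the two body elements are flattened. It could be run on a saved copy of https://blogs.disneylatino.com/feed/ and the result served to the crawler, though, as noted in the thread, teaching the indexer to cope with HTML in feeds is the sturdier fix.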

    The topic ‘Removing markup on RSS feed’ is closed to new replies.