Scraping html is just showing text
Hi,
I’m trying to display the HTML version of what I’m scraping, but no matter what I try it displays either the text version or the entire page as HTML.
I tried using shortcodes, but the closing tags in my XPath were causing problems. I then installed a PHP plugin to allow live PHP in my pages, and I’m still seeing the same problem. I can’t see much support for HTML output; have I got the usage wrong? I’m expecting it to place the selected HTML into my document. For example, I am scraping a table and all I see coming back is the text. I was expecting the table’s HTML to come back and be displayed as a table.
Upon checking the source I can see this coming back:
<!-- Start of web scrap (created by wp-web-scraper) Source URL: https://full-time.thefa.com/DisplayTeam.do?teamID=1769059&divisionseason=616234725 Selector: Xpath: //*[@id="common.ui.team.displayteam.DisplayTeamForm"]/table[2] Delivered thru: Cache WPWS options: Array ( [postargs] => [cache] => 60 [user_agent] => WPWS bot (https://hartshead.tk) [timeout] => 2 [on_error] => error_hide [output] => html [clear_regex] => [clear_selector] => [replace_regex] => [replace_selector] => [replace_with] => [replace_selector_with] => [basehref] => [striptags] => [removetags] => [callback] => [debug] => 1 [htmldecode] => ) -->
This makes me think that I’ve got the right option set to output HTML (the debug comment shows [output] => html), but that I may have the usage wrong or may not understand what the plugin is doing.
I have tried the following:
<?php echo wpws_get_content('https://full-time.thefa.com/DisplayTeam.do?teamID=1769059&divisionseason=616234725', '', '//*[@id="common.ui.team.displayteam.DisplayTeamForm"]/table[2]', '', '', '', '', 'html', '', 'ppfearn', '', '')?>
I also tried 'output=html' and 'output="html"' with the same results.
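To make clear what I mean by text versus HTML output, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than the plugin. It is an illustration only: the sample markup, the id "f", and the helper name extract_node() are made up, and I am not claiming this is how wp-web-scraper works internally.

```php
<?php
// Illustration only: uses PHP's DOM extension, not wp-web-scraper.
// extract_node() and the sample markup below are hypothetical.

function extract_node(string $html, string $xpath, string $output = 'html'): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return ''; // nothing matched the XPath
    }
    $node = $nodes->item(0);

    // 'text' drops all tags; 'html' serializes the node with its markup.
    return $output === 'text'
        ? trim($node->textContent)
        : $dom->saveHTML($node);
}

$sample = '<html><body><form id="f"><table><tr><td>x</td></tr></table>'
        . '<table><tr><td>Played</td><td>10</td></tr></table></form></body></html>';

// What I am seeing: the cell text only.
echo extract_node($sample, '//*[@id="f"]/table[2]', 'text'), "\n";
// What I am expecting: the <table> element itself.
echo extract_node($sample, '//*[@id="f"]/table[2]', 'html'), "\n";
```

The second call is the behaviour I was expecting from output=html: a table element in the page source right after the debug comment, which the browser would then render as a table.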
I’m a bit stuck at this point so any advice would be great.
Thanks