• Resolved patrickkidd

    (@patrickkidd)


    My site was infected by malware that added a div like <div style="position:absolute; left: -3000px" … > … </div> to post_content for 381 posts. Removing them manually would take days. My client can’t afford to pay me for that, and they are not technical enough to parse and remove the text themselves.

    Does anyone know of a way to scrape and remove HTML based on a text string or DOM query using SQL?

    Thanks!

Viewing 5 replies - 1 through 5 (of 5 total)
  • One possible solution:

    Export all database tables from phpMyAdmin.

    Open the exported database file in a text editor such as Notepad.

    Do a Find and Replace on the injected markup.

    Save the file and import it back into the database on the web server.

    Thread Starter patrickkidd

    (@patrickkidd)

    Good call. I’m no grep / sed expert, but I wonder if there is a text scraper that can find html in a text file…
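For anyone looking for that kind of text scraper, here is a minimal Python sketch using a regular expression. It assumes the injected block always has the shape described in the question (an absolutely positioned div with a negative `left` offset) and that it contains no nested divs. The names `HIDDEN_DIV` and `strip_hidden_divs` are made up for this example.

```python
import re

# Assumed shape of the injected block: an absolutely positioned div pushed
# off-screen with a negative "left" offset, containing no nested <div>.
# \\? tolerates the backslash-escaped quotes found in SQL dump files.
HIDDEN_DIV = re.compile(
    r'<div\s+style=\\?["\']\s*position:\s*absolute;\s*left:\s*-\d+px[^>]*>.*?</div>',
    re.IGNORECASE | re.DOTALL,
)


def strip_hidden_divs(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of blocks removed."""
    return HIDDEN_DIV.subn("", text)


sample = (
    '<p>Hello</p>'
    '<div style="position:absolute; left: -3000px"><a href="#">spam</a></div>'
    '<p>World</p>'
)
cleaned, removed = strip_hidden_divs(sample)
print(cleaned, removed)  # <p>Hello</p><p>World</p> 1
```

To use it on an exported dump, read the file into a string, pass it through `strip_hidden_divs`, and write the result to a new file so the original export stays untouched. Check the removal count against the number of infected posts before importing anything.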

    The easiest way would be to restore a clean backup of the database. If you have BackupBuddy or some other backup plugin, there’s likely a backup you can use.

    Manually clearing these types of links really sucks and takes a long long time. IMO worst / most time consuming type of infection to remove.

    Depending on how randomized these links are I would suggest making a database backup (MAKE SURE TO DO THIS FIRST!!!) and then trying a few SQL commands:

    UPDATE wp_posts SET post_content = REPLACE(post_content, 'put copy of spam link here', '');
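If you want to rehearse that kind of statement before touching the live site, here is a rough Python sketch that runs the same replace against a throwaway in-memory SQLite table. SQLite is only a stand-in here (a real WordPress site runs MySQL), the `SPAM` string is a hypothetical link, and the two-column table is a cut-down imitation of wp_posts. It also counts the affected rows first, as a dry run.

```python
import sqlite3

SPAM = '<a href="http://spam.example/">cheap pills</a>'  # hypothetical spam link

conn = sqlite3.connect(":memory:")  # SQLite stand-in; a real site runs MySQL
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
conn.executemany(
    "INSERT INTO wp_posts (ID, post_content) VALUES (?, ?)",
    [(1, "Intro " + SPAM + " outro"), (2, "Clean post")],
)

# Dry run: count the rows the UPDATE would touch before changing anything.
(affected,) = conn.execute(
    "SELECT COUNT(*) FROM wp_posts WHERE instr(post_content, ?) > 0", (SPAM,)
).fetchone()

# Same idea as the statement above: replace the exact spam string with nothing.
conn.execute(
    "UPDATE wp_posts SET post_content = replace(post_content, ?, '') "
    "WHERE instr(post_content, ?) > 0",
    (SPAM, SPAM),
)
rows = conn.execute("SELECT ID, post_content FROM wp_posts ORDER BY ID").fetchall()
print(affected, rows)
```

The replace only matches an exact copy of the string, so it helps only when every infected post carries the same link.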

    If the links are randomized, though, then the only option is to remove them manually.

    The best way to do this is to search the database for this string:

    left: -

    This will show you all of the links. From there, you can copy the contents of the infected tables (one by one) to a local html file, so the links are highlighted and you can see them much better.

    Really speeds up the process. Good luck!
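One way to build that local HTML file is sketched below in Python. It assumes you have already pulled the post IDs and contents out of the database into a dict. `build_report` is a made-up helper, not part of WordPress or any plugin. It escapes each matching post so the markup is shown as text and wraps the search string in `<mark>` so it stands out.

```python
import html

MARKER = "left: -"  # the search string suggested above


def build_report(posts: dict[int, str], marker: str = MARKER) -> str:
    """Return an HTML page showing each matching post's source with the marker highlighted."""
    safe_marker = html.escape(marker)
    sections = []
    for post_id, content in sorted(posts.items()):
        if marker not in content:
            continue
        # Escape first so the post's markup is displayed as text, not rendered.
        escaped = html.escape(content)
        highlighted = escaped.replace(safe_marker, f"<mark>{safe_marker}</mark>")
        sections.append(f"<h2>Post {post_id}</h2>\n<pre>{highlighted}</pre>")
    return "<!doctype html>\n<body>\n" + "\n".join(sections) + "\n</body>\n"


posts = {
    7: '<p>Hi</p><div style="position:absolute; left: -3000px"><a href="#">x</a></div>',
    8: "<p>Nothing wrong here</p>",
}
report = build_report(posts)
print(report)
```

Write `report` to a file such as `infected.html` (a hypothetical name) and open it in a browser to page through the matches.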

    Thread Starter patrickkidd

    (@patrickkidd)

    rngdmstr: That’s a good call and basically what I did in the end.

    I’ll put this here in case others have the same problem. Luckily the malware used the same pattern throughout its content. It always inserted a single block that opened with <div style="position:absolute; left:, was filled with anchor tags, and ended with a single </div>. This SQL snippet did the trick:

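    -- Cut from the injected opening <div> through the first </div> that follows it (6 = length of '</div>').
    UPDATE wp_posts
    SET post_content = CONCAT(
        LEFT(post_content, LOCATE('<div style="position:absolute; left:', post_content) - 1),
        SUBSTRING(post_content, LOCATE('</div>', post_content, LOCATE('<div style="position:absolute; left:', post_content)) + 6))
    WHERE LOCATE('<div style="position:absolute; left:', post_content) > 0;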

    The malware had run twice and even stepped on its own entries, so I had to run the query twice, which cleaned 381 entries. That left 22 that I had to fix manually, but that only took a few minutes.
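The query removes at most one block per post each time it runs, which is why a second run was needed. Here is a Python sketch of the same cut-and-join idea, written as an illustration and not as the exact query. It assumes the opening string is always identical, it looks for the closing tag only after the opening one, and it repeats until nothing is left to remove. The names `strip_once` and `strip_all` are made up for this example.

```python
OPEN = '<div style="position:absolute; left:'
CLOSE = "</div>"


def strip_once(content: str) -> str:
    """Remove one injected block: from OPEN through the first CLOSE after it."""
    start = content.find(OPEN)
    if start == -1:
        return content
    end = content.find(CLOSE, start)
    if end == -1:
        return content  # no closing tag found: leave for manual review
    return content[:start] + content[end + len(CLOSE):]


def strip_all(content: str) -> str:
    """Repeat strip_once until no injected block remains."""
    while True:
        cleaned = strip_once(content)
        if cleaned == content:
            return content
        content = cleaned


block = OPEN + ' -3000px"><a href="#">x</a></div>'
post = "A" + block + "B" + block + "C"
print(strip_all(post))  # ABC
```

Posts where one injection landed inside another will still leave a stray closing tag behind, so a manual pass over the leftovers is still worth doing.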

    Thanks!

    Amazing! Nice work. Glad to hear it’s all sorted.

  • The topic ‘Scraping post_content to remove html elements from malware’ is closed to new replies.