Remove Spam Content Google Has Indexed From Your Website

January 31, 2011

After you have cleaned up a hack that led to your website having spam content on it, Google will likely still have some of that content in its search index.

Spam Content on Existing Pages

If the spam was added to existing pages on your website, either by replacing the content of the page with spam when it was served to Google or by hiding the spam content on the pages, Google will automatically remove the spam content the next time it refreshes the copy of the pages in its index. There is no way to force Google to refresh the pages. It can take minutes, hours, days, weeks, or months for a page to be refreshed, depending on a range of factors Google uses to determine how often to refresh a page. You may be able to speed up the refresh slightly by submitting an updated XML sitemap to Google or by adding a prominent link to the page that needs to be refreshed. You can also use the Remove URL tool in Google's Webmaster Tools to remove the snippet displayed in the search results for a page while you wait for it to be refreshed.
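As a rough illustration of the sitemap suggestion above, the following Python sketch builds a minimal XML sitemap that lists cleaned pages with an updated last-modified date. The function name build_sitemap, the example URL, and the date are assumptions made for illustration; the resulting file would still need to be uploaded to your website and submitted to Google.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is a date string in YYYY-MM-DD form. This helper is a
    hypothetical sketch, not part of any Google tool.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example: one cleaned page (hypothetical URL and date).
sitemap_xml = build_sitemap(
    [("http://www.example.com/cleaned-page.html", "2011-01-31")]
)
```

An accurate last-modified date on the cleaned pages is only a hint to Google; it does not guarantee when the pages will be refreshed.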

Spam Pages

If the spam consists of spam pages created by the hacker, there are two measures you can take to speed up their removal.

Use the URL Removal Tool

If all the pages are located in a directory that does not contain any pages you want indexed, you can use the Remove URL tool in Google's Webmaster Tools to remove the directory from Google's index. The Remove URL tool is located on the "Crawler Access" page in the "Site configuration" section of Webmaster Tools. For the removal to be successful, you need to make sure the directory is blocked in your robots.txt before requesting the removal. You could also use the tool to remove individual pages, but this is impractical if many pages have been indexed. In situations where the spam pages are created from the same file using a different URL parameter to define each page (for example, www.example.com/page.php?a=cheap+viagra), you cannot remove the pages by removing www.example.com/page.php; instead, you would need to remove each individual URL.
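Before requesting a directory removal, it may help to confirm that your robots.txt actually blocks the directory. The following Python sketch does this with the standard library's robots.txt parser. The directory name /spam-directory/, the helper name is_blocked, and the sample robots.txt contents are assumptions for illustration, and the parser is not guaranteed to interpret every rule exactly as Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents blocking the spam directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spam-directory/
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


spam_blocked = is_blocked(
    ROBOTS_TXT, "http://www.example.com/spam-directory/page.html"
)
normal_blocked = is_blocked(ROBOTS_TXT, "http://www.example.com/about.html")
```

Here spam_blocked should be True and normal_blocked should be False. If the legitimate page also came back as blocked, the Disallow rule would be too broad.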

Serve a 410 (Gone) HTTP Status Code

Google primarily relies on the HTTP status code served when a page is requested to determine whether a page has been removed from a website and should be removed from its index. You can use a tool such as Web-Sniffer to see what HTTP status code is being served for a page. You should be serving a 404 (Not Found) or 410 (Gone) when one of the spam pages is requested to tell Google that it has been removed. With most configurations, you should be serving a 404 (Not Found) by default after you have deleted the code causing the spam pages. Serving a 410 (Gone) will cause Google to remove the pages slightly faster than a 404 (Not Found), but it requires configuring your website to serve that status code for the requested pages.
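How you configure a 410 (Gone) depends on your web server and software. As one hedged sketch, assuming a Python website served through WSGI, the wrapper below answers requests under a spam directory with a 410 and passes everything else to the existing application. The names gone_middleware, fetch_status, site_app, and the /spam-directory/ prefix are hypothetical. The fetch_status helper plays the role of a status-checking tool such as Web-Sniffer and needs network access to run.

```python
import urllib.error
import urllib.request


def gone_middleware(app, gone_prefixes):
    """Wrap a WSGI app so paths under gone_prefixes return 410 (Gone)."""
    prefixes = tuple(gone_prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefixes):
            start_response("410 Gone", [("Content-Type", "text/plain")])
            return [b"This page has been permanently removed.\n"]
        # Anything else is handled by the original application.
        return app(environ, start_response)

    return wrapped


def fetch_status(url):
    """Return the HTTP status code a URL serves (requires network access)."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx and 5xx responses; the code is what we want.
        return error.code


def site_app(environ, start_response):
    """Stand-in for the real website application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Normal page.\n"]


application = gone_middleware(site_app, ["/spam-directory/"])
```

After deploying something like this, calling fetch_status on one of the former spam URLs is a quick way to confirm that a 410 rather than a 200 is being served.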

