How Google Addresses Duplicate Content Due To Scrapers

Date June 18, 2008

This is a question many bloggers and webmasters ask, and I am one of them. I often find websites and blogs copying my content wholesale, which is one reason my RSS feed carries only summaries. In an interesting article, Sven Naumann of Google’s Search Quality Team helps clarify the issues surrounding duplicate content. From the Google Webmaster Central Blog:

Before diving in, I’d like to briefly touch on a concern webmasters often voice: in most cases a webmaster has no influence on third parties that scrape and redistribute content without the webmaster’s consent. We realize that this is not the fault of the affected webmaster, which in turn means that identical content showing up on several sites in itself is not inherently regarded as a violation of our webmaster guidelines. This simply leads to further processes with the intent of determining the original source of the content—something Google is quite good at, as in most cases the original content can be correctly identified, resulting in no negative effects for the site that originated the content.

Generally, we can differentiate between two major scenarios for issues related to duplicate content:

  • Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site
  • Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites
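
The first scenario, identical content appearing in more than one place on your own site, is something a webmaster can check for directly. Below is a minimal sketch in Python, assuming you already have the extracted text of each page; the `pages` data, the URLs, and the function names are hypothetical, and this is only an illustration of exact-match detection, not a description of how Google does it.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized text is identical.

    `pages` maps a URL to the text extracted from that page.
    Only groups containing more than one URL are returned.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical example data: the same post reachable under two URLs.
pages = {
    "https://example.com/post/42": "How Google handles  duplicate content",
    "https://example.com/post/42?print=1": "how google handles duplicate content",
    "https://example.com/about": "About this blog",
}
duplicates = find_exact_duplicates(pages)
```

Here `duplicates` holds one group containing the two `/post/42` URLs. Hashing only catches identical text after normalization; near-duplicates with small edits would need a fuzzier comparison.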

Here are some webmasters’ points of view. Read them carefully and share yours with us:

  • I am not sure I’d agree with that. Why do scrapers still appear above the original site? Why do sites that have their content duplicated find they return to old positions when they get the offending sites to remove the duplicated text? They may get it right a lot of the time, perhaps within parameters they find acceptable, but the above suggests they get it right almost all the time, and I’m not so sure that’s accurate. Maybe I should qualify that and say there’s a big problem, IMHO, with large chunks of copied content rather than site-for-site or page-for-page copies.
  • Yes, there’s a good bit of happy face in the message, but given the amount of scraping that actually exists (have you looked at the amount of bot access that goes on in your server logs?), Google does get a big bunch of it right. But the situation is something like killing 99% of the mosquitoes in a room and then hoping to take a rest. That 1% is still going to buzz in your ear!

    As I read the comments, there are three main reasons why the original might be outranked by a scraper:

    1. Unintentionally blocking Googlebot from parts of your content
    2. Changes to the content after it was scraped
    3. The original is weakened in Google through some guideline violations

    That second one is not often talked about - but I assume it can make your page look like the derivative page instead of the original.

  • Our biggest problem was (is) with scrapers that take our content and embed it on their site, wrapped in their own navigation, etc. They make a very good effort to masquerade as a legitimate site, which makes it very difficult for Google, particularly when they are taking stories from our site on the same day we post them and posting them on their site.
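
The first reason in the numbered list quoted above, unintentionally blocking Googlebot, is the easiest of the three to test yourself. The following is a rough sketch using Python’s standard `urllib.robotparser` module; the robots.txt rules and URLs are made-up examples, and the helper name `googlebot_blocked` is hypothetical. Note that Python’s parser follows the basic robots.txt rules and may not match Googlebot’s own handling of wildcards or other extensions, so treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser


def googlebot_blocked(robots_txt: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given robots.txt text disallows for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]


# Hypothetical robots.txt that blocks an archive directory for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /archive/
"""

# Hypothetical URLs of the same article, one of them under the blocked path.
urls = [
    "https://example.com/archive/2008/06/duplicate-content",
    "https://example.com/2008/06/duplicate-content",
]
blocked = googlebot_blocked(robots_txt, urls)
```

In this example only the `/archive/` URL ends up in `blocked`. If the blocked copy is the one you consider the original, a scraper’s copy may be the only version Google can crawl.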

What is your experience with duplicate content and Google? Does Google detect duplicate content in seconds, or does its algorithm still need a lot of improvement?


5 Responses to “How Google Addresses Duplicate Content Due To Scrapers”

  1. Mike said:

    I keep tabs on my feedburner stats a few times a week. Anytime I see a spike in my subscribers, I go and check for duplicate content… sure enough, there it is! I’ve been fortunate enough to outrank the scrapers so far. However, I too am concerned about this. I believe things will get better; they always do.

  2. Solace said:

    @Mike: What do you mean your feedburner stats go up when you get scraped content? I don’t see the correlation.

    I’ve had a few big jumps on my feedburner but not sure what it was from, maybe something similar?

  3. Mike said:

    Somebody new has subscribed to my feed… for the sole purpose of scraping my content. When my feedburner subscription rate increases for no good reason, I know there is likely to be a scraper stealing my content.

  4. Grizzly said:

    This won’t work for everyone, but if you have a network then immediately post a high PR backlink from another site to your article as soon as it is posted. So far I have never had a scraper outrank my posts.

    This may be common knowledge but for those who don’t know - always include a link in your posts pointing to your site - very few scrapers bother to remove or change links and you not only tell Google who the original source is but you can pick up link juice from the scraper. I gain a lot of backlinks (with optimized keyword anchors) from the scrapers - gotta love it.

  5. Tosin said:

    Hi,

    This is a sensitive topic.

    I have a question.

    What if it is a story you read and want to re-write in your own words? What happens then and what is it called?

    Thanks.
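
Grizzly’s tip in comment 4 (include a link to your own site in every post, since few scrapers strip links) can be verified once you find a copy. Here is a small sketch, assuming you have already saved the scraped page’s HTML; the sample markup, the host names, and the names `LinkCollector` and `links_back_to` are hypothetical, and only Python’s standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html: str, original_host: str) -> list[str]:
    """Return the links in `html` whose host is exactly `original_host`."""
    collector = LinkCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if urlparse(h).hostname == original_host]


# Hypothetical scraped page that kept the link to the original article.
scraped_html = """
<div class="post">
  <p>Copied text <a href="https://example.com/2008/06/post">original article</a></p>
  <p><a href="https://scraper.example.net/more">more</a></p>
</div>
"""
backlinks = links_back_to(scraped_html, "example.com")
```

An empty `backlinks` list would mean the scraper removed or rewrote your links, so the copy no longer points back to the source.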
