Duplicate content refers to pages whose content is not unique, regardless of whether the copies live on the same domain or on different ones. For search engines in particular, duplicate content is a critical signal that can negatively affect both ranking and crawling. Here you can find out how.
Not all content with similar text is duplicate content
Duplicate content can be roughly divided into two categories. On the one hand, there is content that has been copied exactly: such texts are unchanged copies of the original. On the other hand, even slightly modified texts that remain very close to the original count as near-duplicate content and risk a lower ranking. However, this does not mean that every piece of content has to be scrutinized: if you write your texts yourself, you usually don't have to worry about duplicate content.
Score with Google and with users through unique content
There is a lot of so-called unique content on the Internet. It rarely deals with a truly unique topic, but its writing style and form exist only once. Unique content is better for both users and search engines, because several factors can then be taken as given:
- The author has genuinely engaged with the subject
- The text contains new or additional information
- The content is explained better, more precisely, or more concisely
If the content is copied one-to-one, these assumptions no longer hold. Instead, search engines rate duplicate content negatively. It violates a fundamental principle of SEO: E-A-T, which gained prominence with the Google Medic update. Expertise, Authoritativeness, and Trustworthiness are the factors by which your page is evaluated. Texts optimized for E-A-T have a very good chance of achieving a good ranking in the search results. Duplicate content, on the other hand, makes the page look as if the author has neither expertise nor authority, and trust is lost along with them. For this reason, Google may not display the duplicate in the search results at all. After all, the exact same content is already in the index, so why should Google show it twice?
Duplicate content does not only hurt how a page is displayed and ranked in the search results. The frequency with which search engine bots crawl the page can also decrease. Crawlers prefer pages that are popular, have fresh content, or are updated frequently. It can therefore take a long time before the Googlebot revisits pages with duplicate content and indexes new content. Fixing a duplicate content problem can thus be a lengthy process.
Find duplicate content
Several tools can detect duplicate content that might exist on other sites. Large providers such as Ryte offer integrated content checkers that can even detect and analyze near-duplicate content. Copyscape, however, is probably the best-known tool for tracking down content thieves and for screening your own pages for duplicated text.
For your own content, you can also use Google Search Console. Under Coverage, in the Excluded section, you can see which pages Google has classified as duplicates. A look there can be worthwhile to uncover duplicate content that is not actually intended, because sometimes it is almost impossible to avoid using the same content more than once. Online stores with many similar products know this problem especially well. There are ways and means to avoid a Google penalty without having to write new texts for thousands of articles.
Avoid duplicate content
The easiest way is to always write new, distinct texts. This improves the user experience, and you may rank for more keywords. As already mentioned, however, this is not always possible. In such cases the canonical tag can help: placed in the head of a page, it indicates that the content on this page is a copy and that the original can be found at another URL.
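As a minimal sketch, assuming the original lives at the hypothetical URL https://www.example.com/original-product/ and the snippet is placed on the duplicate page:

```html
<!-- In the <head> of the duplicate page: points search engines to the original -->
<link rel="canonical" href="https://www.example.com/original-product/" />
```

Search engines then consolidate ranking signals on the canonical URL. Keep in mind that the tag is a hint rather than a directive, so Google may still choose a different canonical in some cases.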
Another possibility is a noindex tag or blocking the URL in robots.txt. In theory, the page then does not end up in the index, and it no longer matters whether it contains duplicate content. Note, however, that the Googlebot will not crawl a page at all if it is disallowed in robots.txt, and a URL blocked only by robots.txt can still be indexed without its content if other pages link to it. Also, according to Google, pages that remain on noindex permanently may no longer be visited by the crawler after a certain time.
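Sketched out, the two variants might look like this, assuming the duplicates live under a hypothetical /filter/ path on example.com:

```html
<!-- In the <head> of the page that should stay out of the index -->
<meta name="robots" content="noindex" />
```

```
# robots.txt at https://www.example.com/robots.txt
User-agent: *
Disallow: /filter/
```

Use one or the other for a given page: if the URL is disallowed in robots.txt, the Googlebot cannot crawl the page and therefore never sees the noindex tag.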
The best approach is still to avoid duplicate content wherever possible. This reduces the risk that Google penalizes the page, and at the same time the pages may rank for more keywords. This applies above all to category and blog pages, whose content should be as unique as possible. For product texts it is less relevant, since these usually appear in the search results because of their specific product name rather than because of the text itself.
Questions about duplicate content:
What is duplicate content?
Duplicate content refers to pages that contain the same or very similar text as another page. It does not matter whether the content is duplicated internally or across different domains.
Where do you find duplicate content?
Duplicate content appears either on your own site (for example in product texts) or on external websites that may simply have copied your content. In both cases, your ranking in the search results can suffer.
Is duplicate content a problem?
In short: yes. Crawlers evaluate duplicated content negatively, and in the worst case the page does not appear in the search results at all.