WordPress SEO in 5 Minutes – What is Duplicate Content and What are the Ways to Fix It

Search engines continuously strive to provide users with the best search results. One of the problems search engines face when indexing pages is duplicate content. So, what is duplicate content, and how does it affect your ranking? Our SEO experts from Perth explain what duplicate content is and how to fix duplicate content issues on your website.

What is duplicate content?

Duplicate content is content that is identical or very similar and appears at more than one URL on the internet. Duplicate content can hurt your ranking. When several URLs carry the same content, Google has to decide which one to show, and ranking signals get split between them, so the versions can end up ranking lower than a single page would. In extreme cases, where content is duplicated deliberately to manipulate rankings, Google may remove the site from its index entirely, so it no longer appears in search results.

What are the issues caused by duplicate content?

  • Search engines cannot tell which version or versions of the content to index or rank in search results, which can cost your website visibility and rankings.
  • Search engines don't know which version to assign link equity to.
  • Inbound links from other websites are spread across several URLs instead of pointing to one, which dilutes the link equity of every version.

What are the causes of duplicate content?

WWW vs non-WWW and HTTP vs HTTPS

When both the www and non-www versions (or the HTTP and HTTPS versions) of the same website are accessible, every page on the site exists at more than one URL, so each page is duplicated.
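As a rough illustration of the redirect logic, here is a minimal Python sketch that works out where a request should be 301-redirected, assuming https://www.example.com is the preferred version. The host names and the function name redirect_target are placeholders; on a real site this is normally handled by the web server or hosting configuration rather than by custom code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: https://www.example.com is the preferred version of the site.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lower-cased
    if host not in (PREFERRED_HOST, BARE_HOST):
        return None  # not one of this site's hosts; leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    # Keep the path and query string; swap in the preferred scheme and host.
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
```

For example, redirect_target("http://example.com/blog/?p=1") gives "https://www.example.com/blog/?p=1", while a URL already on the preferred version returns None, so no redirect loop is created.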

Session IDs

In online stores, each visitor's activity is tracked using sessions. Session data needs to be stored, and this is usually done with cookies. However, search engine crawlers often don't store cookies. As a fallback, some systems add a unique identifier, called a session ID, to the URL to tell sessions apart.

Every session ID is unique, and the system appends the current session's ID to every internal link on the website. As a result, each new session, including each visit by a crawler, produces a new set of URLs for the same pages, duplicating their content.
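To see why removing the session ID removes the duplicates, here is a minimal Python sketch that deletes session-ID parameters from a URL. The parameter names in SESSION_PARAMS are hypothetical examples; check which name your own shop software uses. Disabling session IDs in URLs altogether is still the cleaner fix.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names a CMS might use for session-ID parameters.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Remove session-ID query parameters so every session maps to the same URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS  # compare names case-insensitively
    ]
    # An empty query string is dropped entirely by urlunsplit.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this, https://www.example.com/shop?sessionid=abc123&page=2 and https://www.example.com/shop?sessionid=xyz789&page=2 both reduce to https://www.example.com/shop?page=2.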

Order of URL variables

The order of URL parameters in a CMS can lead to duplicate content. A CMS may create URLs such as /?P1=1&P2=2 and /?P2=2&P1=1, where "P1" is parameter 1 and "P2" is parameter 2. Even though both URLs return the same results in most cases, search engines treat them as separate URLs.
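One way to enforce a single parameter order is to sort the parameters before a URL is output or compared. The Python sketch below does that; the function name normalize_param_order is made up for illustration, and alphabetical order by parameter name is just one possible convention.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_param_order(url: str) -> str:
    """Rewrite the query string with parameters sorted by name."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Sort by name only; sorted() is stable, so repeated names keep their order.
    pairs.sort(key=lambda pair: pair[0])
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Both /?P1=1&P2=2 and /?P2=2&P1=1 come out as /?P1=1&P2=2, so only one version of the URL is ever linked.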

URL variables

URL variables used for tracking and sorting can cause duplicate content. For example, look at the URLs below.
https://www.example.com/product-1
https://www.example.com/product-1?source=rss
Both URLs lead to the same page. However, search engines may not recognize this and can treat them as two different pages with duplicate content.
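Here is a minimal Python sketch of how such URLs can be collapsed to one canonical address. Rather than listing every tracking parameter, it keeps only an allow-list of parameters that actually change the page content; the names in CONTENT_PARAMS and the function name canonical_url are hypothetical and would need to match your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list: only parameters that really change the page content.
CONTENT_PARAMS = {"page", "variant"}


def canonical_url(url: str) -> str:
    """Drop tracking and sorting parameters that don't change the content."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Both https://www.example.com/product-1 and https://www.example.com/product-1?source=rss map to https://www.example.com/product-1. An allow-list drops new tracking parameters automatically, but any content-changing parameter you forget to list is dropped too.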

Scraped or Copied Content

Duplicate content can also be caused by other websites copying your content. These websites don't always ask for your approval or link back to your original content, so search engines may treat the copies as duplicate content and be unsure which version is the original.

Likewise, when multiple online shops sell the same products, they tend to reuse the brand's original product descriptions on their sites. As a result, identical content appears on many different websites.
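Search engines don't publish exactly how they detect copied text, but a common textbook technique for spotting near-duplicates is to split each text into overlapping word sequences ("shingles") and measure their overlap with the Jaccard index. The Python sketch below uses that technique to compare, say, your product description with another site's; the three-word shingle size is an arbitrary choice for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping runs of `size` lower-cased words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i : i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard index of the two shingle sets: 1.0 means identical wording."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty (or very short) texts count as identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two texts use identical wording, and scores close to 1.0 suggest one was copied from the other; how high a score counts as "duplicate" is a judgment call.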

Comment pagination

Many CMSs paginate comments. This creates duplicate content because each page of comments gets its own URL while showing the same article.
E.g., https://www.example.com/article/ and https://www.example.com/article/comment-page-1/
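If you are auditing a WordPress site, it can help to group comment-page URLs with the article they belong to. The Python sketch below assumes the /comment-page-N/ URL pattern used in the example and strips it to recover the article URL; article_url is a hypothetical helper, not a WordPress function.

```python
import re

# Assumed pattern: comment pages end in comment-page-N, with an optional trailing slash.
COMMENT_PAGE = re.compile(r"comment-page-\d+/?$")


def article_url(url: str) -> str:
    """Return the parent article URL for a paginated comment URL."""
    return COMMENT_PAGE.sub("", url)  # non-matching URLs are returned unchanged
```

For instance, https://www.example.com/article/comment-page-2/ maps back to https://www.example.com/article/.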

Printer-friendly versions

Printer-friendly versions created by a CMS can also cause duplicate content issues when both the original and the printer-friendly version get indexed. Unless you specifically block these versions, Google may index them.

Best practices to fix duplicate content issues

Here are the best practices you can implement to reduce duplicate content on your website.

  • Solve the www vs non-www (and HTTP vs HTTPS) issue by choosing one preferred version and redirecting the others to it.
  • Disable session IDs in URLs.
  • Use a script to enforce a fixed order for URL parameters, and make sure that order is followed everywhere on your website.
  • Instead of using URL parameters for tracking, use hash-based (#) tracking, since search engines generally ignore the part of a URL after the #.
  • Disable comment pagination in WordPress.
  • WordPress automatically generates tag and category pages. Add a "noindex" tag to these pages to keep search engines from indexing them.
  • Block printer-friendly pages from being indexed, and use a print style sheet instead.
  • Redirect any duplicate version of the content to the original version using 301 (permanent) redirects.
  • If you have many similar pages, either expand each page so it is unique or consolidate them into one page.
  • When you syndicate content on other websites, make sure each site that republishes it links back to your original content, so Google knows which version to index.
  • Use the canonical tag (rel="canonical") to tell search engines which URL is the preferred version to index and which duplicates to leave out.
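For reference, the canonical tag is a single link element in the page's <head>. WordPress or an SEO plugin usually outputs it for you; the minimal Python sketch below only shows what the markup looks like and how the URL should be HTML-escaped. The function name canonical_tag is hypothetical.

```python
from html import escape


def canonical_tag(preferred_url: str) -> str:
    """Build the rel="canonical" link element pointing to the preferred URL."""
    # escape() turns characters such as & and " into HTML entities.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'
```

Every duplicate version of a page (for example, https://www.example.com/product-1?source=rss) would carry the same tag as the preferred page, pointing to https://www.example.com/product-1.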

These are some of the best practices for fixing duplicate content issues. Duplication often happens by accident, so checking your site regularly can help you prevent most of it. Fixing duplicate content is necessary to ensure content quality and good rankings in SERPs. For more information about duplicate content and other SEO-related queries, contact us or email hello@codesquad.cloud. Our SEO experts are always ready to assist you with your SEO queries.

Codesquad is a member of The Computing Australia Group of Companies.

Jargon Buster

Content Management System – CMS – Software used to create and manage digital content.
Uniform Resource Locator – URL – The web address of a particular webpage or file on the Internet. It consists of the protocol, the domain name, and additional path information.