What types of duplicate content exist in Magento, and what are the most prevalent examples? Partial duplicates refer to instances where only a small portion of the content or its layout is distinct.

The typical partial duplicates found in Magento stores are:

  1. Product Sorting
  2. Pagination
  3. Variations of the Same Product

The primary example of full duplicates in Magento is having an identical product listed in multiple categories.

 

A canonical URL in Magento 2 is the web address designated as the 'preferred' one for search engine indexing. It addresses the issue of duplicate or highly similar content across different pages. The Magento 2 canonical tag is an HTML <link> element carrying the rel="canonical" attribute. It is placed in the head of a page to tell search engines which URL is the primary source of the content and should receive the search value.

For instance, consider multiple pages for the same product:

  • example.com/dresses/blackzaradress.html
  • example.com/occasions/blackzaradress.html
  • example.com/color/black/blackzaradress.html
  • example.com/blackzaradress.html

Without canonical URLs in Magento 2, search engines choose one of these pages as the canonical version on their own, and their choice may not be the page you want to rank. By implementing canonical URLs and setting up 301 redirects for the other variations, you tell search engines which page is preferred, which gives you better control and more accurate indexing.
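
As a minimal sketch of what the canonical tag looks like, the snippet below builds the <link> element that each of the duplicate pages above would carry in its head. The helper name canonical_link_tag and the https scheme are assumptions made for illustration; Magento generates this tag itself once the setting is enabled.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link> element that marks canonical_url as the preferred page."""
    # Escape the URL so characters such as & or " cannot break the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


# The four duplicate pages from the example above (scheme assumed).
duplicates = [
    "https://example.com/dresses/blackzaradress.html",
    "https://example.com/occasions/blackzaradress.html",
    "https://example.com/color/black/blackzaradress.html",
    "https://example.com/blackzaradress.html",
]

# Assumption: the shortest, category-free URL is chosen as the preferred one.
preferred = "https://example.com/blackzaradress.html"

for page in duplicates:
    # Every duplicate page carries the same tag, pointing at the preferred URL.
    print(page, "->", canonical_link_tag(preferred))
```

Every variation points to the same address, so search engines no longer have to guess which one to index.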
 
To determine the canonical URL for a page, you can use Google's URL Inspection tool. Here are some key points to consider when using this tool:

 

  • Ownership of the URL:
    • You need to own the URL you want to test.
    • Ensure that you are using the correct account for the test.
  • Handling Duplicates:
    • The tool provides information about the canonical URL in the report if the tested page has duplicates, but only if the canonical URL also belongs to you.
  • Testing AMP and non-AMP URLs:
    • The tool allows testing both AMP and non-AMP URLs.
  • URL Inspection Tool Information:
    • Visit the official page for more detailed information about the URL Inspection tool.

If you encounter a situation where the canonical URL is in a property you don't own, it could be due to various reasons:

  • Mistakes in Site Content Localization:
    • Check the official localization guidelines to address this issue.
  • Incorrect Canonical Tags:
    • Learn how to set up a canonical URL in Magento 2 to ensure correct implementation.
  • Incorrect Server Settings:
    • Contact your hosting provider to resolve any issues related to server settings.
  • Hacker Attack:
    • Malicious activities, such as a hacker attack, might involve the use of 301 redirects or cross-domain rel="canonical" links. Ensure the security of your website and address any potential breaches.
  • External Websites Copying Content:
    • If external websites are copying your content and incorrectly setting canonical URLs, you can submit a request to Google after confirming the unauthorized use.

By addressing these considerations and using the URL Inspection tool, you can gain insights into the canonical URL for a given page and take appropriate actions to optimize your site's performance in search engines.
 
 

Additional methods on how to specify a canonical page
In addition to the methods described above, there are several other options for how to mark a link as canonical:

  • rel=canonical <link> tag - Add this tag with the canonical link in the code for duplicate pages.
  • rel=canonical HTTP header - Send a rel=canonical header in your page response.
  • Sitemap - Define canonical URLs in a sitemap.
  • 301 redirect - Set up a 301 redirect to point Googlebot to the canonical page if the duplicate page is out of date.
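
To illustrate the HTTP header option from the list above, here is a small sketch that builds the Link header a server could send with a duplicate page. The function name canonical_link_header is hypothetical, and how the header is attached to the response depends on your server or framework, which is not covered here.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return the (name, value) pair of a rel="canonical" HTTP header."""
    # Format: Link: <https://example.com/page.html>; rel="canonical"
    return ("Link", f'<{canonical_url}>; rel="canonical"')


name, value = canonical_link_header("https://example.com/blackzaradress.html")
print(f"{name}: {value}")
```

The header is useful for non-HTML resources such as PDF files, where a <link> tag cannot be added to the document itself.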

 

How to set canonical URLs in Magento 2 (not programmatically)?

  • Log in to the Admin Panel, go to Stores>Settings>Configuration:

  • Expand the Catalog drop-down menu and choose Catalog. Then open the Search Engine Optimization section:

Make the following changes, depending on what you want indexed.
If you need Google (or any other search engine) to index only the pages with the complete category URL path, set:
Use Canonical Link Meta Tag for Categories – ‘Yes’;
Use Canonical Link Meta Tag for Products – ‘No’;

If you want Google (or any other search engine) to index the product pages only, use the following settings:
Use Canonical Link Meta Tag for Categories – ‘No’;
Use Canonical Link Meta Tag for Products – ‘Yes’;

If you want Google (or any other search engine) to index categories and products, enable both options:
Use Canonical Link Meta Tag for Categories – ‘Yes’;
Use Canonical Link Meta Tag for Products – ‘Yes’;

 

Don’t forget to save the changes and clear the cache at the end. Alternatively, you can try out one of the Magento 2 canonical plugins.

 

How can you address duplicate content aside from the methods discussed earlier? Consider the scenario where a single product appears in multiple categories, resulting in distinct URLs like:

Despite there being only one necklace, there are three different URLs. In the context of Magento duplicate product URLs, Google may perceive even exceptional products as thin content. This seems unjust! Ensure your remarkable products receive the recognition they deserve from Google by imparting uniqueness to their representations.

 

Remove category from URL
You can remove the category path from the URL, so that each product has only one address no matter how many categories it appears in:

http://www.site.com/necklace.html
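
The effect of removing the category path can be sketched as a simple URL mapping. This only illustrates the result, it is not how Magento implements it; the function name strip_category_path and the category names "jewelry" and "gifts" are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_category_path(product_url: str) -> str:
    """Drop every path segment except the last one (the product's URL key)."""
    parts = urlsplit(product_url)
    # Keep only the final segment, e.g. "necklace.html".
    url_key = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return urlunsplit((parts.scheme, parts.netloc, "/" + url_key, parts.query, parts.fragment))


# Hypothetical category URLs for the same product.
urls = [
    "http://www.site.com/jewelry/necklace.html",
    "http://www.site.com/gifts/necklace.html",
    "http://www.site.com/necklace.html",
]

# All variations collapse to a single address.
print({strip_category_path(u) for u in urls})
```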

 

Leave only one category path in a product URL
If you have a red T-shirt in two categories at once, T-Shirts and New, you can choose which category to use in the URL: either the longest one (T-Shirts) or the shortest one (New). This is possible with the Unique Product URL extension.
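
The choice between the longest and the shortest category path can be sketched as follows. This is an illustration only: the function pick_category_path is hypothetical, and measuring length in characters is an assumption, since the extension's actual rule is not described here.

```python
from typing import List


def pick_category_path(paths: List[str], prefer: str = "longest") -> str:
    """Pick one category path for the product URL.

    Assumption: "longest" and "shortest" are measured in characters.
    """
    chooser = max if prefer == "longest" else min
    return chooser(paths, key=len)


# Hypothetical URL keys for the T-Shirts and New categories.
categories = ["t-shirts", "new"]

print(pick_category_path(categories, "longest"))
print(pick_category_path(categories, "shortest"))
```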

 

Partial duplicates in Magento
As mentioned above, duplicates can be partial or full. Both types can be resolved with canonicalization, but there are other ways to deal with them depending on where the issue appears.

 

Enabling users to sort products in your store based on various criteria such as bestsellers, newest items, Magento 2 price filters, and the number of reviews is a valuable feature. Letting users choose how many products are displayed per page, whether it's 20, 50, or 100, also enhances the user experience. However, each of these options produces a separate URL that differs only in its query string (using characters such as ?, = and |), as illustrated in the following examples:

  • http://site.co.uk/category/products.htm?sortby=total_reviews|desc
  • http://site.co.uk/category/products.htm?sortby=total_reviews|asc
  • http://site.co.uk/category/products.htm?sortby=relevance|desc

The issue arises when these sorting pages get indexed and cached by Google. The proliferation of such pages, potentially numbering in the thousands, means that Google crawlers spend valuable time indexing them. This, in turn, diverts resources that could be better utilized for indexing more crucial pages on your site, such as categories and products. Finding a solution to prevent unnecessary indexing of these sorting pages becomes essential to optimize the efficiency of Google's crawling resources.
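
One way to picture the fix is to map every sorted variation back to the plain category URL, which can then be used as its canonical address. The sketch below is an assumption-laden illustration: the function name canonical_listing_url is hypothetical, "sortby" comes from the example URLs above, and "limit" and "color" are hypothetical parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that only change ordering or page size, not the content itself.
# "sortby" is taken from the examples above; "limit" is a hypothetical name.
LISTING_PARAMS = {"sortby", "limit"}


def canonical_listing_url(url: str) -> str:
    """Remove sorting and page-size parameters from a category URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in LISTING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


sorted_urls = [
    "http://site.co.uk/category/products.htm?sortby=total_reviews|desc",
    "http://site.co.uk/category/products.htm?sortby=total_reviews|asc",
    "http://site.co.uk/category/products.htm?sortby=relevance|desc",
]

# All three sorted pages resolve to one category URL.
print({canonical_listing_url(u) for u in sorted_urls})
```

Parameters that are not in the set, such as a filter that really changes the product list, are left in place so that those pages keep their own address.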