Typically, the canonical tag is placed in the <head> section of a page, for example:
<link rel="canonical" href="https://geoffkenyon.com/how-to-add-canonical-tag-to-http-headers">
This works fine for most use cases, and plugins for popular CMS platforms let you implement canonical tags without requesting any support from your development team. That said, there are two use cases where the standard <head> canonical tag simply doesn’t work:
- When the URL doesn’t have a head section
- When you aren’t able to make changes to the <head> section due to CMS limitations
URLs that don’t have head sections are files rather than pages: PDFs, images, Word docs, etc. Since they are not HTML documents, they inherently don’t have <head> sections. When you find yourself in this situation, you’ll want to add the canonical tag to the HTTP headers returned for the URL.
Why Canonicalize a PDF?
There are a couple of reasons why you may want to add the canonical tag to the HTTP headers. The first is to rank an HTML version of a PDF file. PDFs often attract a lot of links because they can be more convenient than the HTML equivalent. This typically results in the PDF outranking the HTML page, which presents several problems:
- You can’t track sessions to PDFs in analytics
- You can’t tag visitors to the PDF with remarketing code
- PDFs are often email gated; when the PDF itself ranks in Google, visitors can reach it directly and the gate is rendered ineffective
In addition to the above problems, having a PDF and HTML version of a page means that when people link to both versions of the content, your link equity is split between the two URLs. Consolidating link equity can be done via the canonical tag and will help your content to rank as well as possible.
The HTTP header canonical tag can help solve all of the problems above. If you need to remove a document from the search results as quickly as possible, the X-Robots-Tag noindex header will be a better solution for you. Though you miss the benefit of consolidating links onto a single URL, noindex is a directive, whereas the canonical is only a strong suggestion. Learn more about how to add the X-Robots-Tag noindex header here.
How to Implement The Canonical Tag in HTTP Headers
First off, you will need access to your .htaccess file (the Header directive also requires Apache’s mod_headers module to be enabled). To add the canonical tag for a specific file, such as “seo-guide.pdf”, add the following to the .htaccess file:
<Files "seo-guide.pdf">
  Header add Link "< http://guides.geoffkenyon.com/ecommerce/ >; rel=\"canonical\""
</Files>
In the HTTP headers, this should render as:
Link: < http://guides.geoffkenyon.com/ecommerce/ >; rel="canonical"
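To confirm that a response really carries the canonical, it can help to pull the target out of the Link header value programmatically. The following is a minimal sketch in Python, not part of the original setup: the function name `canonical_from_link_header` is hypothetical, and the parsing is deliberately lenient about spaces inside the angle brackets so it accepts values shaped like the one above.

```python
import re


def canonical_from_link_header(value):
    """Return the rel="canonical" target from a Link header value, or None."""
    # A single Link header may carry several links, separated by commas
    # that are followed by the next opening "<".
    for part in re.split(r",\s*(?=<)", value):
        match = re.match(r"\s*<\s*([^>]*?)\s*>\s*(.*)", part, re.DOTALL)
        if not match:
            continue
        target, params = match.groups()
        # rel may list several space-separated relation types.
        rel = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        if rel and "canonical" in rel.group(1).lower().split():
            return target
    return None
```

Passing in the header value you see in your browser's network panel or in a curl response should return the canonical URL with any surrounding spaces stripped, or None if no canonical link is present.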
If you use a consistent naming structure for all of the files that you want to canonicalize and their corresponding pages, you can create a rule in .htaccess that systematically generates canonical tags pointing to the HTML equivalents (this relies on mod_rewrite, so RewriteEngine On must already be set):
RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.pdf$">
  Header add Link "< https://geoffkenyon.com/uploads/%{FILENAME}e.html >; rel=\"canonical\""
</FilesMatch>
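Before deploying a pattern rule like this, it can be useful to check which canonical URL each PDF would end up with. The sketch below mirrors the rule's mapping in Python, reusing the same regular expression and the base URL from the example; the function name `canonical_url_for` and the constant `BASE` are hypothetical, and the sketch only imitates the naming logic, it does not talk to Apache.

```python
import re

# Base URL copied from the example rule; adjust it to match your own rule.
BASE = "https://geoffkenyon.com/uploads/"


def canonical_url_for(path):
    """Mirror the rewrite rule: capture the PDF's base name and append .html."""
    match = re.search(r"([^/]+)\.pdf$", path)
    if match is None:
        return None  # not a PDF, so the rule would not capture a file name
    return BASE + match.group(1) + ".html"
```

Running your list of PDF paths through a helper like this and comparing the results against the pages that actually exist is a quick way to catch files that do not follow the naming convention.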
Pro Tips
A couple of things that took me a few tries to figure out:
- When specifying the file name, don’t add the directory, simply specify the actual name of the file (think the slug of a URL not the whole path)
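- The examples above have a space between the opening < and the canonical URL, as well as before the closing > of the canonical link target; that is what worked for me, though the Link header syntax in RFC 8288 places the URL directly inside the angle brackets with no spaces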
- Always test on a staging server so you don’t cause your production server to start throwing 500 responses
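One way to act on the last tip is to request the file from staging and look at both the status code and the Link header before touching production. This is a rough sketch using only Python's standard library; the function names `head_status_and_links` and `looks_healthy` are hypothetical, and the staging URL in the comment is a placeholder rather than a real address.

```python
import urllib.error
import urllib.request


def head_status_and_links(url, timeout=10):
    """Send a HEAD request; return (status code, list of Link header values)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get_all("Link") or []
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report them instead.
        return error.code, error.headers.get_all("Link") or []


def looks_healthy(status, links):
    """True when the server answered 200 and a Link value mentions canonical."""
    return status == 200 and any("canonical" in value.lower() for value in links)


# Example (placeholder URL, assumed for illustration):
# status, links = head_status_and_links("https://staging.example.com/seo-guide.pdf")
# print(status, links, looks_healthy(status, links))
```

A 500 status here usually points to a syntax problem in .htaccess, while a 200 with no Link value suggests the Files or FilesMatch pattern did not match the file.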
Getting Google to Find the Canonical Tag
While the “if you build it, they will come” philosophy may have worked in Field of Dreams, it rarely works well with Google. In order to get Google to pick up the canonical and start ranking the HTML equivalent, you should give Googlebot some help.
Go into Google Search Console and open the Fetch and Render tool. Tell Googlebot to fetch and render the URLs where you have added the canonical tag. Once this is done, click the button to request that Googlebot crawl the URLs you submitted. This will help Google find the new canonicals as quickly as possible and jumpstart the canonicalization process.
How to Add the Canonical Tag to HTTP Headers
- Open up your .htaccess file
- Add the Header add Link directive to create the canonical tag
- Submit the URL to Google Search Console Fetch & Render to get the canonical crawled
Thanks for one piece of critical info!
“3. Submit the URL to Google Search Console Fetch & Render to get the canonical crawled”
I have been trying to test my “Header add Link” in my .htaccess file for 2 days. I could not really figure out how to use the “Live HTTP Headers” browser extension (which others recommended), so I could not see whether adding the canonical header to my PDF file fetch was working. But checking the fetch in my Google Search Console revealed that the “Header add Link… rel=canonical” was working! Thanks.