XML sitemaps are well known within the SEO community, but their purpose and value are often misunderstood, so let’s start with the basics. In this post, we’ll look at how to create XML sitemaps strategically. I’ll shortly publish a follow-up post on XML sitemap generators to help you actually create your XML sitemaps once you’ve developed your strategy. I’ll touch on several different aspects of XML sitemaps here; feel free to jump around based on your background and familiarity with them:
- What are XML sitemaps
- XML sitemap protocols
- XML sitemap examples
- Alternative XML sitemap format
- What are XML sitemaps used for
- How to create XML sitemaps (strategy)
- XML sitemap generators (upcoming post)
What are XML Sitemaps
XML sitemaps are files that list the URLs on your site that you want search engines to crawl and index. You can point search engines to your XML sitemaps in your robots.txt file, or submit them in Google Search Console or Bing Webmaster Tools. An XML sitemap may list URLs directly; large enterprise sites may instead create an XML sitemap index file, which links to other XML sitemaps that contain the URLs you want indexed.
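As a rough illustration of the robots.txt route, the sketch below reads the Sitemap lines out of a robots.txt body using Python's standard urllib.robotparser module. The robots.txt content and its URLs are made-up placeholders, and declared_sitemaps is a hypothetical helper name, not part of any tool mentioned in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the paths and URLs are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://www.site.com/sitemap-index.xml
Sitemap: https://www.site.com/blog/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap lines exist.
    return parser.site_maps() or []


for sitemap_url in declared_sitemaps(ROBOTS_TXT):
    print(sitemap_url)
```

This is a quick way to confirm that the sitemaps you think you are declaring are the ones a parser actually sees.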
This post will look at setting up XML sitemaps on your site to improve the indexation of your pages and diagnose indexation problems. That said, there are several other types of XML sitemaps that warrant their own research if they are relevant to you.
XML Sitemap Protocols
- An XML sitemap may contain a maximum of 50,000 URLs and may be no larger than 50MB
- XML sitemaps may be compressed using gzip, but uncompressed they may not be larger than 50MB
- In order to submit more than 50,000 URLs you must submit multiple XML sitemaps; when this is done, each XML sitemap should be listed in an XML sitemap index file
- All XML sitemap files must open with a <urlset> tag, end with a </urlset> tag, declare the protocol standard (the namespace) in the <urlset> tag, include a <url> tag for each URL, and include a <loc> tag as a child of each <url> tag
- All XML sitemap index files must begin with an opening <sitemapindex> tag and end with a </sitemapindex> tag, include a <sitemap> tag for each individual sitemap, and include a <loc> tag as a child of each <sitemap> tag
It is worth noting that XML sitemaps have location restrictions: an XML sitemap can only contain URLs that sit at or below the directory where the sitemap itself is hosted. For example, an XML sitemap located at www.site.com/category/sitemap.xml could only have URLs stemming from the /category/ directory. Similarly, you would need to create different XML sitemaps for different subdomains and host each one on its own subdomain. URLs must also use the same protocol (http / https) as the XML sitemap.
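The location rules above can be sketched as a small check. This is only an illustration under the assumptions stated in this post (same protocol, same host, same directory or deeper); allowed_in_sitemap is a hypothetical helper name and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url: str, page_url: str) -> bool:
    """Check whether page_url may be listed in the sitemap at sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Directory that holds the sitemap, e.g. "/category/" for
    # "/category/sitemap.xml" and "/" for "/sitemap.xml".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        page.scheme == sitemap.scheme                      # same http / https
        and page.netloc.lower() == sitemap.netloc.lower()  # same (sub)domain
        and page.path.startswith(base_dir)                 # same directory or deeper
    )


sitemap = "https://www.site.com/category/sitemap.xml"
print(allowed_in_sitemap(sitemap, "https://www.site.com/category/shoes.html"))
print(allowed_in_sitemap(sitemap, "https://www.site.com/about.html"))
```

Running a check like this over a sitemap before publishing it catches URLs that search engines would otherwise reject for being outside the sitemap's scope.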
Example XML Sitemap File
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.site.com/product-name1.html</loc>
  </url>
  <url>
    <loc>https://www.site.com/product-name2.html</loc>
  </url>
  <url>
    <loc>https://www.site.com/product-name3.html</loc>
  </url>
</urlset>
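If you would rather generate a file like this than write it by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It splits a URL list into chunks of at most 50,000 and builds one <urlset> document per chunk. The function name build_urlsets and the example URLs are assumptions for illustration; the sketch does not check the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap URL limit described above


def build_urlsets(urls: list[str], max_urls: int = MAX_URLS) -> list[bytes]:
    """Return one UTF-8 encoded <urlset> document per chunk of URLs."""
    documents = []
    for start in range(0, len(urls), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            # ElementTree escapes characters such as "&" in the URL text.
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        documents.append(
            ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
        )
    return documents


# Placeholder URLs; a real list would come from your CMS or a crawl.
docs = build_urlsets(
    [f"https://www.site.com/product-name{i}.html" for i in range(1, 4)]
)
print(docs[0].decode("utf-8"))
```

Each returned document can be written to its own file; when there is more than one, they would then be listed in a sitemap index.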
Example XML Sitemap Index File
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.site.com/products.xml</loc>
    <lastmod>2017-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.site.com/categories.xml</loc>
    <lastmod>2017-01-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.site.com/locations.xml</loc>
    <lastmod>2017-01-05</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.site.com/landingpages.xml</loc>
    <lastmod>2017-01-01</lastmod>
  </sitemap>
</sitemapindex>
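An index file like the one above can also be generated. The sketch below is one possible approach, again with xml.etree.ElementTree; build_sitemap_index is a hypothetical helper, and the sitemap URLs and dates simply mirror the example. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps: dict[str, date]) -> str:
    """Build a <sitemapindex> document from {sitemap URL: last modified}."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, last_modified in sitemaps.items():
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        # lastmod is written as an ISO date, e.g. 2017-01-01.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    ET.indent(index)  # pretty-print for readability (Python 3.9+)
    return ET.tostring(index, encoding="unicode")


print(build_sitemap_index({
    "https://www.site.com/products.xml": date(2017, 1, 1),
    "https://www.site.com/categories.xml": date(2017, 1, 3),
}))
```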
Alternative XML Sitemap Format
Alternatively, you can create a sitemap as a plain text file that simply contains one URL per line. The rules for these text sitemaps are:
- URLs cannot contain embedded new lines
- You must declare the entire URL, including http/https
- These sitemaps may also contain a maximum of 50,000 URLs and be no larger than 50MB
- The text file needs to be encoded as UTF-8
- No information other than the URLs can be included (no lastmod, priority, etc.)
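The rules in this list are simple enough to enforce in code. The following is a minimal sketch, not a complete validator: build_text_sitemap is a hypothetical name, and reading 50MB as 52,428,800 bytes is an assumption made for the example.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # assumption: 50MB read as 52,428,800 bytes


def build_text_sitemap(urls: list[str]) -> bytes:
    """Return the UTF-8 body of a one-URL-per-line sitemap, or raise ValueError."""
    if len(urls) > MAX_URLS:
        raise ValueError(f"too many URLs: {len(urls)} > {MAX_URLS}")
    for url in urls:
        if "\n" in url or "\r" in url:
            raise ValueError(f"URL contains an embedded new line: {url!r}")
        if urlsplit(url).scheme not in ("http", "https"):
            raise ValueError(f"URL must start with http:// or https://: {url!r}")
    body = ("\n".join(urls) + "\n").encode("utf-8")
    if len(body) > MAX_BYTES:
        raise ValueError(f"file too large: {len(body)} bytes")
    return body


# Placeholder URLs for illustration.
body = build_text_sitemap([
    "https://www.site.com/product-name1.html",
    "https://www.site.com/product-name2.html",
])
print(body.decode("utf-8"), end="")
```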
You should make sure your XML sitemaps are clean and don’t include URLs that are noindexed, blocked by robots.txt, canonicalized to another URL, return a 404, redirect, etc. Otherwise you risk search engines deeming your XML sitemaps low quality and ignoring them.
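One way to keep sitemaps clean is to filter your URL list before generating them. The sketch below assumes you already have crawl data for each page; the Page record and its fields (status, noindex, blocked_by_robots, canonical) are hypothetical and would need to be mapped to whatever your crawler or CMS exports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; field names are assumptions for this sketch."""
    url: str
    status: int                      # HTTP status code (200, 301, 404, ...)
    noindex: bool = False            # page carries a noindex directive
    blocked_by_robots: bool = False  # disallowed in robots.txt
    canonical: Optional[str] = None  # canonical URL, if one is declared


def sitemap_worthy(page: Page) -> bool:
    """Keep only pages that return 200, are indexable, and are self-canonical."""
    return (
        page.status == 200
        and not page.noindex
        and not page.blocked_by_robots
        and page.canonical in (None, page.url)
    )


pages = [
    Page("https://www.site.com/a.html", 200),
    Page("https://www.site.com/old.html", 301),
    Page("https://www.site.com/b.html", 200, canonical="https://www.site.com/a.html"),
]
clean_urls = [page.url for page in pages if sitemap_worthy(page)]
print(clean_urls)
```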
What are XML Sitemaps Used For
One of the biggest misconceptions about XML sitemaps is that they help you get content indexed quickly. While that might be the end result in many circumstances, that is not what they do. XML sitemaps basically spoon feed Google a list of your URLs to crawl. From there it is up to Google to decide if your content should be indexed.
Beyond this, in Google Search Console, you can submit your XML sitemaps directly to Google. What’s really useful is that Google will report any errors, as well as the number of URLs submitted via each XML sitemap and how many of those URLs are indexed. This reporting is where XML sitemaps deliver the most value.
Strategy: How to Create XML Sitemaps
Because Google reports what share of the URLs in each XML sitemap are indexed, this information can be very useful if you structure your XML sitemaps properly. If you throw all your pages into one XML sitemap, there really isn’t a lot of value here. On a large site with an indexation problem, a single sitemap isn’t actually that helpful: you can see the total number of indexed URLs vs the number submitted, but not much more than that. On a large site, you need a starting point for working on your indexation; otherwise it is much like finding a needle in a haystack.
If you split up your XML sitemaps into different segments, you can derive a lot of value and information from the XML sitemaps report in Search Console. You can use the report as a diagnostic tool to identify areas of your site where indexation is poor. With this information, you can dive into an audit of the pages in that area of your site.
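To make the diagnostic idea concrete, here is a small sketch that ranks sitemaps by indexation rate. It assumes you have copied the submitted and indexed counts for each sitemap out of Search Console by hand; the sitemap names and numbers below are invented for illustration, and indexation_report is a hypothetical helper.

```python
def indexation_report(counts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Map {sitemap: (submitted, indexed)} to (sitemap, rate) pairs, worst first."""
    rates = [
        (name, indexed / submitted)
        for name, (submitted, indexed) in counts.items()
        if submitted  # skip empty sitemaps to avoid dividing by zero
    ]
    return sorted(rates, key=lambda item: item[1])


# Invented numbers, not real Search Console data.
report = indexation_report({
    "products.xml": (1000, 400),
    "categories.xml": (200, 190),
    "locations.xml": (50, 45),
})
for name, rate in report:
    print(f"{name}: {rate:.0%} indexed")
```

The sitemap at the top of the list is the natural place to start an audit.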
The basic idea is to set up your XML sitemaps so that they reflect the important areas of your site. For example, if I had an ecommerce site, I’d segment my XML sitemaps as follows:
- category pages
- product pages
- key landing pages
- blog / content pages
- location pages
- customer support pages
As you can see, this would outline the major areas of the site and would provide structure for solving indexation problems. It is worth noting that if you have a large site with tiered category pages that require several clicks to reach (ex: clothing > mens clothing > mens shirts > mens t shirts), you should consider setting up XML sitemaps to reflect this, such as category pages (level 1), category pages (level 2), etc. This will help you see if indexation problems begin or are exacerbated at specific depths of your site. Another approach would be to segment by topic rather than by page type. For example, you could create XML sitemaps for:
- Mountain biking
- Rock climbing
Both of the above approaches provide value in understanding the indexation of your site. Rather than a single high-level number for your whole site, you can see whether there are disproportionate indexation problems in any particular portion of it. If you follow this segmentation strategy when you are deciding how to build your XML sitemaps, you will be in a much better place to diagnose indexation problems should they arise.
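As a closing illustration, the segmentation described above could be sketched like this. It assumes your URL structure exposes the page type as a path prefix, which will not be true of every site; the SEGMENTS mapping, the helper names, and the URLs are all hypothetical. Category pages are further split by how many levels deep they sit.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical mapping from URL path prefix to sitemap segment.
SEGMENTS = {
    "/category/": "categories",
    "/product/": "products",
    "/blog/": "blog",
    "/locations/": "locations",
    "/support/": "support",
}


def segment_for(url: str) -> str:
    """Pick the sitemap segment for a URL based on its path prefix."""
    path = urlsplit(url).path
    for prefix, name in SEGMENTS.items():
        if path.startswith(prefix):
            if name == "categories":
                # Count the non-empty path parts below /category/.
                depth = len([part for part in path[len(prefix):].split("/") if part])
                return f"categories-level-{depth}"
            return name
    return "other"


def group_by_segment(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs so that each segment can be written to its own sitemap."""
    groups = defaultdict(list)
    for url in urls:
        groups[segment_for(url)].append(url)
    return dict(groups)


groups = group_by_segment([
    "https://www.site.com/category/clothing/",
    "https://www.site.com/category/clothing/mens-clothing/",
    "https://www.site.com/product/blue-t-shirt.html",
    "https://www.site.com/about.html",
])
for segment, urls in groups.items():
    print(segment, len(urls))
```

A topic-based split would work the same way, with the mapping keyed on topic paths instead of page types.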