The Ultimate Guide to XML Sitemaps
Search Engine Optimization

12/25/2023 1:48 PM by SEO Admin


What is a Sitemap?

A sitemap is a way to organize a website by identifying the URLs and the data under each section. In its XML form, the document contains instructions for search engine bots.

A sitemap can also be defined as a diagrammatic representation that shows how subpages are arranged beneath their parent groupings, in other words, a map of the site. For search engines, this map takes the form of an XML file that helps bots crawl and index the site. Either way, a sitemap demonstrates the navigation layout and how content is organized on a website.

The best way to help Google find your website’s pages is to create a Sitemap. With its help, you can show the search bots how your site’s pages are organized and which ones are the most important and relevant. 

Remember:

A sitemap.xml file is located in the root directory of a site. There you can specify the URLs, the priority of their scanning, the date of the last update, the availability of other language versions, etc. You can also add additional information, depending on the type of content. For example, you can specify a video’s duration, rating, or age limit.

Elements making up an XML Sitemap

All the elements are marked with special tags, which visually resemble HTML code.

Mandatory elements:

  • First line. The first line indicates the XML version and the required encoding for Sitemap files — UTF-8;
  • urlset — a tag that indicates the standard of the current protocol. It is a parent tag for the ones following;
  • url — a tag for each URL entry. It is a parent to the tags below and a child of urlset;
  • loc — URL of a page. This URL must begin with the protocol (such as HTTP) and end with a trailing slash if your web server requires it.

Optional elements:

  • lastmod — a tag that indicates the date when the page was last updated. It is a child tag of url. Google considers the value of this tag only if it coincides with the actual time of the last page update;
  • changefreq — a tag that indicates the approximate refresh rate of the page. Valid values are: always, hourly, daily, weekly, monthly, yearly, and never;
  • priority — a tag that indicates the priority of the page in comparison with other pages. The value is between 0.0 and 1.0.
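
As a rough illustration of how these elements fit together, the sketch below builds a small sitemap with Python's standard library (Python 3.8 or later). The function name build_sitemap and the example.com URLs are made up for this example; only the tag names and the namespace come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace that the urlset tag declares for the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (UTF-8 bytes) for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # The XML declaration is the "first line" described above.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages, used only to show the shape of the input.
sitemap_xml = build_sitemap([
    {"loc": "https://example.com/", "lastmod": "2023-12-25",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://example.com/about/"},
])
print(sitemap_xml.decode("utf-8"))
```

Special characters in a URL, such as &, are escaped automatically when the text is serialized, so the output stays valid XML.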

When Should You Use Sitemaps?

Usually, if web pages are properly linked, search engine crawlers are able to discover most parts of the site. However, if you have a sitemap, it makes crawling easier and more efficient. Let’s look at some instances where you need sitemaps:

  • When your site is quite large - there is a possibility that search engine spiders might overlook elements of your site, especially recently updated pages or new pages.
  • When your site is new, with few external links to it - search engine bots crawl the web by following links from one page to another. When your site is new and has no links, search engines might not crawl your pages at all.
  • When there is a large archive of content pages which are isolated or not properly linked - you need to list pages that do not naturally link to each other on a sitemap.
  • When you use sitemaps-compatible annotations (rich media content or content that is shown in Google News) - Google is able to take the additional information from the sitemaps into account and display it in search results.

In all of these cases, a Sitemap is one way to speed up the indexing of your pages. Otherwise, you might have to wait for a long time until the search engines pay attention to them. You will also need a Sitemap if you have a lot of multimedia or news-related content or even a large archive of pages that are not interlinked.

However, don’t think that a Sitemap is unimportant if your website doesn’t fall into any of these categories. Even though Google, in its documentation, offers a list of sites that may not need a Sitemap, we consider it a must-have component of a successful promotion.

A Sitemap offers the following benefits:

  • It helps crawlers know which pages to index. By adding a URL to the Sitemap, you’re emphasizing its importance.
  • It serves as a tool to control the crawl budget. With the help of a Sitemap, you can specify which pages to crawl more often and which to spend fewer resources on.
  • It allows you to indicate regional versions of pages — this is an easy way to organize a multi-regional site. For this, you only need to add hreflang attributes to URLs.
  • It facilitates the crawling of sites with a convoluted structure. If the structure and linking on the site are not organized correctly, search bots may not always get to the right pages when following the links from the main page. In this case, adding them to a Sitemap will help solve that.
  • It speeds up the crawling of media files and news pages. If you want your site’s content to appear in search results for pictures, videos, or news, adding information about it to the Sitemap is worthwhile.
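
To make the regional-versions point more concrete, here is a minimal sketch of a sitemap in which every URL lists all of its language versions, including itself, as xhtml:link entries with an hreflang attribute. The function name and the example.com URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_multilingual_sitemap(pages):
    """Build a sitemap with hreflang annotations.

    pages is a list of dicts, each mapping a language code to the URL of
    that language version of the same page.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:xhtml": XHTML_NS})
    for versions in pages:
        # One url entry per language version...
        for page_url in versions.values():
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = page_url
            # ...and each entry lists every version, itself included.
            for lang, href in versions.items():
                ET.SubElement(url, "xhtml:link",
                              rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical English and German versions of one page.
print(build_multilingual_sitemap([
    {"en": "https://example.com/en/", "de": "https://example.com/de/"},
]))
```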

Google’s requirements

To avoid problems associated with the use of a Sitemap by search engine crawlers, you should adhere to the following rules:

  1. Use UTF-8 encoding.
  2. The file size cannot exceed 50 MB uncompressed.
  3. A single file cannot contain more than 50,000 URLs.
  4. The links within a Sitemap must be located within the same domain as the file itself.
  5. If the file is too big, divide it into several files and specify them in the Sitemap index file.
  6. The server response when accessing the file should be 200 OK.
  7. Specify only URLs (without GET parameters and session IDs).
  8. Mark additional language versions of a page with the hreflang attribute.
  9. Only ASCII characters can be used in URLs. Other characters must be percent-encoded, and special characters such as & must be entity-escaped.
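
Rules 3 and 5 can be sketched as follows: split a long list of URLs into groups of at most 50,000 and list the resulting files in a Sitemap index. The helper names and the sitemap-N.xml file naming are assumptions made for this example, not a required convention.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a Sitemap index (UTF-8 bytes) pointing at the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,001 pages: three sitemap files are needed.
all_urls = [f"https://example.com/page/{n}" for n in range(120_001)]
chunks = chunk_urls(all_urls)
file_urls = [f"https://example.com/sitemap-{n}.xml"
             for n in range(1, len(chunks) + 1)]
index_xml = build_sitemap_index(file_urls)
```

Each group would then be written to its own sitemap file, and only the index file needs to be submitted.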

Creating a Sitemap

We will look at different options for creating sitemaps below:

Using a Content Management System

If you use a Content Management System like WordPress, you can generate sitemaps automatically with the help of plugins like Yoast SEO.

Adding a Sitemap to WordPress

We will focus on how to add a sitemap to WordPress via Yoast SEO. When you use this plugin, the sitemap index is updated automatically as you add or remove content. The post types that you want to be indexed are also included. Note that post types marked “noindex” will not appear in the sitemap.

To create your sitemap using Yoast:

  • Log in to your WordPress account and access your dashboard.
  • Click on “SEO” on the left-hand side menu.

The SEO settings expand to give you more options.

  • Choose “General.”
  • Click on the “Features” tab.
  • Toggle “XML sitemaps” to “On.”
  • Save changes.

This will activate the XML sitemap.

Viewing the Sitemap

  • Log in to your WordPress account and access the dashboard.
  • Click on “SEO” on the left-hand side menu.

The SEO settings expand to give you more options.

  • Click “general” then select the features tab.
  • To view your sitemap, click the question mark that is next to the XML sitemaps toggle.

If you use All in One SEO instead, go to All in One SEO > Feature Manager and activate XML Sitemaps.

The XML Sitemap will then appear in the All in One SEO settings.

Depending on your site’s needs, you can click the help (?) icon to learn more about what each setting does. We recommend leaving the following options checked:

  • Create compressed sitemap
  • Link from Virtual Robots.txt
  • Dynamically generate sitemap

Where is Sitemap for Wix?

Wix sites have sitemaps that are dynamically generated on their server, and they automatically update when pages or content are added to or removed from your site. These sitemaps are not editable.

Do I Need to Generate a Sitemap for Shopify?

All Shopify stores have an automatically generated sitemap.xml file. The file contains links to all products, pages, blog posts, product images, and collections. You’ll need to locate your sitemap file at the root directory of your store’s primary domain name (for example, suescollection.com/sitemap.xml) and then submit it to Google Search Console. We will look at this process a bit later in the article.

Using Tools to Generate a Sitemap

To generate a sitemap, you can use tools like the SanSEOTools Sitemap Generator and XML-sitemaps.com. Here is how to use each of them.

Using XML-sitemaps.com

  • Go to https://www.xml-sitemaps.com/
  • Copy your website URL
  • Paste it on the website
  • Click “start”
  • Wait while your website is being crawled. Note that if your website has more than 500 URLs, you will be required to create a paid account.
  • Once the crawler is done, click “view sitemap details” then download the .XML file

At this point, it helps to review the sitemap details, as they summarize any problems found on your site. You may, for instance, find an issue like broken links. Let’s explore this a bit. Broken links can prevent your website from being fully indexed, as search engines see them as dead ends. A broken link suggests there is no more information, when in fact the information is just inaccessible. Broken links can also be detrimental to a site’s reputation and increase your bounce rates.

  • After reviewing your sitemap for errors, upload it to the domain root folder of your website.
  • Next, open Google Search Console and add your sitemap URL.
  • That’s it.

Using SanSEOTools Sitemap Generator

  • Enter your website URL, whether you are modifying an existing sitemap or creating a new one.
  • If needed, add information such as the last modified date, the frequency at which your pages are likely to change, and the priority of your URLs.
  • Select the number of pages your site has (up to 5,000).
  • Hit the “Generate Sitemap” button.
  • Wait until the crawler has finished crawling your site.
  • Click Save XML file or Copy to Clipboard, then either add the result to your current sitemap or upload the newly created sitemap XML file.

It's a good idea to check for errors in your XML file and validate the sitemap before you submit to Google.
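
A basic pre-submission check might look like the sketch below. It only covers a few of the rules mentioned in this guide (well-formed XML, the urlset root, the 50,000 URL and 50 MB limits, and URLs that start with a protocol), so treat it as a starting point rather than a full validator. The function name check_sitemap is an assumption for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Nothing else can be checked if the XML cannot be parsed.
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not urlset in the sitemap namespace")
    locs = [el.text or "" for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if len(locs) > MAX_URLS:
        problems.append("more than 50,000 URLs")
    for loc in locs:
        if not loc.strip().startswith(("http://", "https://")):
            problems.append(f"URL does not start with a protocol: {loc!r}")
    return problems


# Example: check a downloaded file (the file name is hypothetical).
# with open("sitemap.xml", "rb") as f:
#     print(check_sitemap(f.read()))
```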

Make the Sitemap Available for Search Engines

For that, you need to:

  • Add the file to the site’s root directory;
  • Add the Sitemap link to the robots.txt file.
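
The robots.txt step comes down to adding one line of the form "Sitemap: <URL>". The helper below is a small sketch that appends that line to existing robots.txt content unless it is already there. The function name is made up for this example, and the URL is the sample location used later in this guide.

```python
def add_sitemap_to_robots(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url, added only once."""
    lines = robots_txt.splitlines()
    for line in lines:
        key, _, value = line.partition(":")
        # The directive name is matched case-insensitively, the URL exactly.
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


updated = add_sitemap_to_robots(
    "User-agent: *\nDisallow:\n",
    "https://mywebsite.com/sitemap.xml",
)
print(updated)
```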

Once you have uploaded your sitemap to your website, you need to submit it to Google Search Console. Submitting it to Bing Webmaster Tools is also a good idea.

Here is how to do it:

How to Add a Sitemap to Google Search Console

You have just uploaded your sitemap to your root directory. The location of your sitemap should be https://mywebsite.com/sitemap.xml.

To submit it to Google Search Console, log in to Search Console, go to “Index,” then select “Sitemaps.” Click “Add/test Sitemap,” enter the URL of the sitemap, and click “Submit.”

How to Add a Sitemap to Bing Webmaster Tools

Before registering your sitemap with Bing Webmaster Tools, confirm that the XML sitemap was uploaded to your site successfully and that its URL loads.

  • Log in to Bing Webmaster Tools
  • Click “Configure My Site”
  • Click “sitemaps”
  • Enter your sitemap URL in the text box labeled “submit a sitemap”
  • Click “submit”

Best Practices for Creating a Sitemap

1. Prioritize High-Quality Pages in Your Sitemap

Your sitemap should not direct search engine bots to low-quality pages, as this could be taken as a sign that the site offers minimal value to visitors. In your sitemap, prioritize highly optimized pages, images, videos, unique content, and other pages that prompt user engagement.

2. Test Your Sitemaps before Submission

Always test your sitemap before you submit it to ensure that any errors are resolved.

Here is how to test your sitemap:

  • Click “add/test sitemap” on the reporting landing page on Google Search Console.
  • Enter the URL of the sitemap in the dialog box and click Test.
  • Click “Open Results” once the test is complete to check for any errors.
  • You can then submit.

4. Use Only Canonical Versions of URLs in Your Sitemap

If you have used the rel=canonical tag to differentiate pages that are very similar, include only the canonical versions in your sitemap. If you have not used the tag, use it to distinguish pages that are similar (e.g., product pages) and include only the canonical version in your sitemap moving forward.

5. Keep Your Sitemap File Small

A small sitemap puts less strain on your server. Although Google and Bing increased the maximum sitemap size to 50 MB, staying well below that limit is preferable.

6. Use Robots.txt for Pages that You do not Want Indexed

Using a sitemap does not mean that you need to index each and every page. Pages such as thank you pages should not be indexed even if they are listed on the sitemap. You should use the meta robots “noindex, follow” tag to preserve your link equity even if that particular page will not be indexed.

There are times when you can use robots.txt to block pages. One such instance is when your crawl budget is quickly used up. What is a crawl budget? It is the number of times a search engine bot crawls your site within a particular time frame. If your site gets crawled 32 times a day, for instance, your monthly crawl budget is likely around 960 crawls (32 × 30 days).

You can see your crawl budget under “Crawl stats” in Search Console.

7. Avoid Including “noindex” URLs in Your Sitemap

If you do not want certain pages to be indexed, leave them out of the sitemap instead of listing them with a “noindex” tag. A “noindex” page in a sitemap sends conflicting information to the search engine bot, suggesting that the page should both be and not be indexed. This inconsistency wastes your crawl budget.

8. If You have a Large Site, Create Dynamic Sitemaps

If your site is quite large, you need to set up some rules that will help determine when a page can be included in your sitemap, or changed from “noindex” to “index, follow.” It may help to use a tool to generate a dynamic XML Sitemap.
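
One possible shape for such rules is sketched below: a page goes into the sitemap only if it returns 200 OK, is not marked “noindex,” and is the canonical version of itself. The Page fields and the rule are assumptions chosen to match the best practices in this guide. A real site would fill them from its own database or CMS.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    canonical_url: str  # target of the page's rel=canonical tag
    noindex: bool       # True if the page carries a meta robots noindex tag
    status_code: int    # HTTP status the page currently returns


def belongs_in_sitemap(page):
    """Example rule: live, indexable, canonical pages only."""
    return (
        page.status_code == 200
        and not page.noindex
        and page.url == page.canonical_url
    )


def select_sitemap_urls(pages):
    """Return the URLs that pass the rule, in their original order."""
    return [page.url for page in pages if belongs_in_sitemap(page)]


# Hypothetical pages: only the first one qualifies.
pages = [
    Page("https://example.com/a", "https://example.com/a", False, 200),
    Page("https://example.com/a?ref=1", "https://example.com/a", False, 200),
    Page("https://example.com/thanks", "https://example.com/thanks", True, 200),
    Page("https://example.com/old", "https://example.com/old", False, 404),
]
print(select_sitemap_urls(pages))
```

Running the selection every time the sitemap is requested, or on a schedule, keeps the file in step with the site without manual edits.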

9. Combine XML Sitemaps and RSS/Atom Feeds

RSS/Atom Feeds help with notifying search engines when content is updated on your site and make it easier for both search engines and users to access fresh content.

Conclusion

Sitemaps make it easier for search engines to crawl and index your site. They also make it easier for users to navigate your site. This way, when you update content, both search engines and users will easily find the new material. What does this mean? That you are more likely to have a good ranking and a strong site reputation, resulting in more new visitors as well as conversions.