Unlock the Secrets to Boosting Your Website’s SEO: A Step-by-Step Guide to Creating and Submitting XML Sitemaps and Robots.txt files

Thato Mmusi
9 min read · Jan 21, 2023


As shown in the video clip in my last article, “Discover the secret to a healthy website with Technical SEO”, at the start of this year I ran a quick audit using the SEMrush Site Audit tool. I got an overall website health score of 62%, down some 20 points from the 82% the site scored when I launched it last year.

The results are as follows:

From the results it is clear that one of the major issues seriously downgrading the site health relates to “Crawled Pages”. To be precise, there are four URLs (Uniform Resource Locators) causing these issues, as shown in the expanded Crawled Pages window.

From the image above, the issues are as follows:

· https://techhandyman.tk/hire is a broken link: the “hire” page being requested does not exist, so it returns a status code of 404. I suspect these are bots scraping websites for hiring pages; I simply never created such a page.

· https://www.techhandyman.tk/ For this link no error status code was reported, so it has no issues; it is probably the only link on my site being crawled properly by Google.

· https://techhandyman.tk/robots.txt This link returns an HTTP status code of 404, like the first link, indicating that my website’s robots.txt file is missing. Once again, I did not include one at the start of the project and will now create one to fix the issue.

By the way, what is a robots.txt file?

A robots.txt file is a simple text file that is placed on a website to instruct search engine bots, such as Googlebot, which pages or sections of the website should not be crawled or indexed.

This is important for SEO because it allows website developers to prevent search engines from crawling and indexing pages that are not important or relevant to the site’s content, such as duplicate pages or pages under development. A clear example is the first link above: with a robots.txt file I will be able to tell Google to avoid crawling non-existent pages like [ https://techhandyman.tk/hire ] on my website.

This helps to improve the overall quality and relevance of the search results for a given website, as well as reduce the load on the site’s server. It’s important to note that while the robots.txt file can be used to block search engine crawlers, it does not provide any security and should not be used to protect sensitive information.
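To make that concrete, a file as small as this (a sketch on my part, using the phantom /hire path from above) would ask all crawlers to skip that URL:

User-agent: *
Disallow: /hire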

· https://techhandyman.tk/sitemap.xml The final link also returns an HTTP status code of 404, like the previous link; a clear indicator that the sitemap.xml file does not exist or cannot be found on the server. As with robots.txt, I did not include a sitemap.xml. But since I now want to carry out regular site audits and monitor the site, I have to create one.

What is a sitemap?

Google describes a sitemap as:

“… a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines read this file to more intelligently crawl your site.”

Sitemaps can have multiple formats, such as XML (Extensible Markup Language), RSS (Really Simple Syndication), and Text. The XML format is the most widely used and is the format we will discuss for the rest of the article.
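For a sense of how the formats differ, a Text sitemap is simply a UTF-8 file with one URL per line and nothing else, for example:

https://techhandyman.tk/
https://techhandyman.tk/about.html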

As can be seen above, we have identified the two major issues crippling my final website health score, namely:

1. Sitemap

2. Robots.txt

A quick way to confirm the above is to use a tool that most SEO specialists and web developers must have and be acquainted with:

Google Search Console [ https://search.google.com/ ]

Google Search Console is a free tool offered by Google that helps website owners monitor and maintain their site’s presence in Google Search results. It allows users to check their website’s search performance, fix issues, and submit sitemaps. The tool also provides information on the number of clicks, impressions, and the average click-through rate for a website, as well as any crawl errors and security issues that may have been detected.

From the image above it is clear that the only crawled link on my website is https://www.techhandyman.tk/, further evidence of what we found above. If you expand the [Indexing] section in the left sidebar, it reveals further options, namely:

· Sitemaps

· Removals

We can now also clearly see that the website [ https://techhandyman.tk ] does not have a sitemap. All of this is illustrated below:

Finally, through the steps taken above we have identified the issues hindering the performance of my website and dragging down its overall SEO score: the absence of a sitemap and of a robots.txt file. In the final part of this article I will show, by illustration, how to produce the missing files.

NOTE ON SITEMAPS

As mentioned elsewhere, when I developed my website I saw no need for either file, as I intended it simply as an online CV. But now I want to expand on it and scale the website to attract viewers and perform some basic functions on its own. I mention this to give perspective to anybody asking themselves, “Do I really need a sitemap?” The short answer is: not necessarily; it depends on the size of your site, how it is built, and its purpose.

Creating and submitting a sitemap is one of the surest ways to ensure all of the valuable content on your site can be found by search engines. Though a sitemap is not the only way search engines can find pages on your site, it is your “direct line” for telling Google what is important.

How to create an XML Sitemap

There are two main ways to create an XML sitemap: generating one automatically and building one yourself. I will choose the latter, as I prefer doing things myself; it usually deepens my knowledge.

To start, please follow these steps:

Step 1: Collect all of your site’s URLs

There are a few ways to get a list of URLs for your website. The easiest is to use a crawler like Spotibo to crawl your entire website and find all its URLs. If you don’t have a copy of Spotibo, you can also find your URLs by:

Manually looking through your site and collecting all URLs

Performing a site: search in Google to identify the URLs that Google has in its index

The most important thing to keep in mind when collecting URLs for a sitemap is whether you want Google to index each specific page. For example, your homepage is most likely a page you want in the index, but not your privacy policy or terms and conditions page.
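If you would rather script this step, below is a minimal same-domain crawler sketch in Python. It is my own illustration rather than part of Spotibo or Google; it assumes the third-party requests and beautifulsoup4 packages are installed, and collect_urls is just a placeholder name:

# Minimal breadth-first crawl of one domain; a real crawler would also
# handle redirects, robots.txt rules, and JavaScript-rendered links.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def collect_urls(start_url, limit=100):
    """Crawl pages on the start URL's domain and return the URLs found."""
    domain = urlparse(start_url).netloc
    seen, queue = set(), [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that fail to load
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain:  # stay on the same site
                queue.append(link)
    return sorted(seen)

if __name__ == "__main__":
    for url in collect_urls("https://techhandyman.tk/"):
        print(url)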

Step 2: Code the URLs

Once you have a list of URLs that you want to include in your sitemap, it is time to code the URLs in XML format. To code the URLs in the proper format, you’ll need a text editor such as Notepad++.

You can also watch the linked video on how to download and install it on your machine. I prefer using Notepad++, but you can use the IDE of your choice, such as Visual Studio Code or NetBeans, or another text editor such as Sublime Text.

The following are some of the links to my website:

· https://techhandyman.tk/

· https://bplstats.tk/

· https://barcodegenerator.ml/

· https://techhandyman.tk/myprojects.html

· https://payhip.com/TechHandyman

· https://mmusi-thato.medium.com/

· https://techhandyman.tk/about.html

I. Let us start with the XML declaration and the opening <urlset> tag, whose xmlns attribute declares the sitemap schema:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

II. Next, add your URL using the <url> and <loc> tags:

<url>
<loc>https://techhandyman.tk</loc>
</url>

III. You can stop here if you like, but there are other optional tags you can use to add more detail to your sitemap:

<lastmod>: The date the page was last modified.

<changefreq>: How often the page is likely to change. The valid values are always, hourly, daily, weekly, monthly, yearly, and never.

<priority>: How important this page is compared to your other pages. Values range from 0.0 to 1.0, with 1.0 being the highest.

<url>
<loc>https://barcodegenerator.ml/</loc>
<lastmod>2021-10-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.7</priority>
</url>

IV. Close with the closing </url> and </urlset> tags:

</url>
</urlset>

A complete sitemap will look as follows:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://techhandyman.tk</loc>
    <lastmod>2023-01-19</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://bplstats.tk/</loc>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://barcodegenerator.ml/</loc>
    <lastmod>2021-10-21</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://techhandyman.tk/myprojects.html</loc>
    <lastmod>2021-10-21</lastmod>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://payhip.com/TechHandyman</loc>
    <lastmod>2021-10-21</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://mmusi-thato.medium.com/</loc>
    <lastmod>2021-10-21</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://techhandyman.tk/about.html</loc>
    <lastmod>2021-10-21</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
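If hand-writing the XML feels error-prone, here is a small Python sketch that generates an equivalent file. It uses only the standard library, and the PAGES list is just a subset of the URLs above, chosen for brevity:

# Generate sitemap.xml from a list of pages using only the standard library.
import xml.etree.ElementTree as ET

# A subset of the pages above; extend the list as needed.
PAGES = [
    {"loc": "https://techhandyman.tk", "lastmod": "2023-01-19",
     "changefreq": "monthly", "priority": "0.8"},
    {"loc": "https://bplstats.tk/", "changefreq": "monthly"},
    {"loc": "https://techhandyman.tk/about.html",
     "lastmod": "2021-10-21", "priority": "0.8"},
]

urlset = ET.Element("urlset",
                    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in PAGES:
    url = ET.SubElement(urlset, "url")
    # Only emit the optional tags a page actually defines.
    for tag in ("loc", "lastmod", "changefreq", "priority"):
        if tag in page:
            ET.SubElement(url, tag).text = page[tag]

tree = ET.ElementTree(urlset)
ET.indent(tree)  # pretty-print; requires Python 3.9+
tree.write("sitemap.xml", encoding="UTF-8", xml_declaration=True)

One caveat worth noting from the sitemaps protocol: a sitemap is generally only valid for URLs on the host it is served from, so the entries that live on other domains (payhip.com, medium.com, and so on) will most likely be ignored by Google when this file sits on techhandyman.tk.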

How to submit your sitemap through Google Search Console

Now that we have our XML sitemap created, it is time to submit it to Google through Search Console. Here is how to go about it, in a few easy steps:

1. Sign in to Google Search Console and click “Sitemaps” in the left sidebar

2. Add the URL of your sitemap at the top of the page where it says “Add a new sitemap”

3. Click submit and Google will crawl your newly created XML sitemap.

That is it: we have created a sitemap and submitted it for Google to use.
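As a side note, at the time of writing Google also accepts a plain HTTP “ping” telling it that a sitemap is new or has changed. A minimal sketch, standard library only:

# Notify Google of the sitemap via its documented "ping" endpoint.
import urllib.parse
import urllib.request

sitemap_url = "https://techhandyman.tk/sitemap.xml"
ping = ("https://www.google.com/ping?sitemap="
        + urllib.parse.quote(sitemap_url, safe=""))

with urllib.request.urlopen(ping) as resp:
    print(resp.status)  # 200 means Google received the ping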

Next, we create a robots.txt file.

How to Create a Robots.txt file

It is important to note that a robots.txt file lives in the root folder of your site, in this case https://techhandyman.tk/robots.txt. So we will create the file in that location.

Before starting, here are some basic guidelines for creating a robots.txt file:

· Create a file named robots.txt.

The file must be named robots.txt and placed in the root of your site.

· Add rules to the robots.txt file.

Rules are instructions for crawlers about which parts of your site they may crawl. A robots.txt file consists of one or more groups (sets of rules), and crawlers process the groups from top to bottom.

· Upload the robots.txt file to the root of your site.

Once the robots.txt file is complete, save it and upload it to your server to make it available for crawlers to use.

· Test the robots.txt file.

Once uploaded, Google’s crawlers will find and start using the file. You can also test the rules locally before uploading, as shown in the sketch below.
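Here is one way to sanity-check a rules file, using Python’s standard urllib.robotparser module; the rules mirror the ones we will write in step 2 below:

# Sanity-check robots.txt rules with the standard library's parser.
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from /nogooglebot/ but nothing else;
# every other crawler may fetch everything.
print(rp.can_fetch("Googlebot", "https://techhandyman.tk/nogooglebot/x.html"))  # False
print(rp.can_fetch("Googlebot", "https://techhandyman.tk/about.html"))          # True
print(rp.can_fetch("SomeOtherBot", "https://techhandyman.tk/about.html"))       # True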

To create a robots.txt file, once again open Notepad++ or the text editor of your choice.

1. Create a file named robots.txt.

2. Code the rules:

# This group blocks Googlebot from the /nogooglebot/ section
User-agent: Googlebot
Disallow: /nogooglebot/

# This group allows all other crawlers to access the whole site
User-agent: *
Allow: /

# Point crawlers to the sitemap we created earlier
Sitemap: https://techhandyman.tk/sitemap.xml

3. Upload it to your root folder.

4. That is it: your robots.txt file is now ready to be used.

I am available for consultancy work with regard to web applications and website analytics. You can contact me via Tech Handyman to set up an appointment, or simply send me a WhatsApp message through the same Tech Handyman page.

Thank you for taking the time to read this article. It belongs to you and to others for whom it might make a difference: MAKE IT GROW.

· Share it

· Follow me here on Medium

· Star the project page on GitHub

· Follow me on Twitter


Written by Thato Mmusi

Zealous theorist, Web/Software enthusiast, Avid Reader & Writer, Freelance Explorer all year round... Otherwise just a chilli loving beer over soccer fundi...