How to Fix ‘Indexed, though blocked by robots.txt’ in Google Search Console


12/18/2023 4:02 PM by SEO Admin

If you received the ‘Indexed, though blocked by robots.txt’ notification in Google Search Console, you’ll want to fix it as soon as possible, as it could be affecting the ability of your pages to rank in Search Engine Results Pages (SERPs).

A robots.txt file contains a set of instructions that tell search engine crawlers like Googlebot which pages they should and should not crawl.
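
For example, a very simple robots.txt file might look like this (the /private/ folder is just a placeholder; your own file will list your own paths):

User-agent: *
Disallow: /private/

The asterisk means the rule applies to every crawler, and the Disallow line tells them not to crawl anything under /private/, while everything else on the site remains crawlable.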

‘Indexed, though blocked by robots.txt’ indicates that Google has indexed your page even though your robots.txt file contains an instruction not to crawl it. Because Google cannot crawl the page, it may appear in results without a description and is unlikely to rank well.

If you have received an email from Google Search Console (GSC) that says ‘Indexed, though blocked by robots.txt’, there are a number of possible causes, outlined below, and each of them can be fixed.

Check Indexation Issues: Identify the affected page(s) or URL(s)

If you received a notification from Google Search Console (GSC), you need to check the particular page(s) or URL(s) in question.

You can view pages with the ‘Indexed, though blocked by robots.txt’ issue in Google Search Console under Coverage (the page indexing report). If you do not see the warning label there, then you are free and clear.

When you click on a specific page indexing issue and then on an affected URL, you’ll get the option to ‘Inspect URL’. From there, you’ll be able to access more information about how Google crawled and indexed the page.

Another way to test your robots.txt is with the Screaming Frog robots.txt tester. You may find that you are okay with whatever is being blocked staying blocked; if so, you do not need to take any action.

Our Google index pages checker is a tool that enables you to check which of your web pages are indexed by Google.

You can also use the page indexing report to see when Googlebot last crawled your pages and whether there are any indexing issues you should fix.

Identify the Reason for the Notification

The notification can have several causes. But first, note that it is not necessarily a problem to have pages blocked by robots.txt; the block may be intentional, for example, when a developer wants to keep unnecessary pages, category pages, or duplicates out of search. With that in mind, here are the most common causes:

Wrong URL Format

Start by running the affected URL through a robots.txt tester. You may find that you are happy for whatever is being blocked to stay blocked, in which case you do not need to take any action.

You can also review the blocked resources in GSC. You then need to:

  • Open the list of the blocked resources and choose the domain.

  • Click each resource.

Sometimes, the issue might arise from a URL that is not really a page. For example, Google may have picked up a variation of your page’s URL. If that’s the case, feel free to disregard the notification.

However, if it’s a page that contains information you want searchers to see, change the URL and validate the fix in Google Search Console. 

Pages that should be indexed

Check robots.txt directives: There may be a directive in your robots.txt file blocking pages that should actually be crawled and indexed, for example, tag and category pages. Tags and categories are real URLs on your site.
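
For example, a rule like the following (the paths are only illustrative) tells every crawler to stay away from tag and category URLs, and should be removed if you actually want those pages crawled and indexed:

User-agent: *
Disallow: /tag/
Disallow: /category/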

Check if you’re pointing Googlebot at a redirect chain: Googlebot follows every link it comes across and does its best to read pages for indexing. However, if you set up a long, multi-step redirect chain, or if the final page is unreachable, Googlebot will stop following it.

Check for Duplicate robots.txt files: 

If you use a CMS like WordPress, it may automatically create a robots.txt file for you, and SEO plugins can do the same. If you have also created your own, make sure you are not duplicating or triplicating robots.txt files with different directives, which will confuse Google.

Have you implemented the canonical link correctly? A canonical tag is placed in the HTML head to tell Googlebot which is the preferred, canonical version of a page in the case of duplicate content. Every page should have a canonical tag.
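
For example, a canonical tag placed in the <head> of a duplicate page points Googlebot to the preferred URL like this (the URL shown is just a placeholder):

<link rel="canonical" href="https://example.com/preferred-page/">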

Indexed, though blocked by robots.txt fix for WordPress

WordPress + Yoast SEO

If you’re using the Yoast SEO plugin, follow the steps below to adjust your robots.txt file:

  1. Log into your wp-admin section.

  2. In the sidebar, go to Yoast SEO plugin > Tools.

  3. Go to File editor.

WordPress + Rank Math

If you’re using the Rank Math SEO plugin, follow the steps below to adjust your robots.txt file:

  1. Log into your wp-admin section.

  2. In the sidebar, go to Rank Math > General Settings.

  3. Go to Edit robots.txt.

WordPress + All in One SEO

If you’re using the All in One SEO plugin, follow the steps below to adjust your robots.txt file:

  1. Log into your wp-admin section.

  2. In the sidebar, go to All in One SEO > Robots.txt.
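
Whichever plugin you use, the fix itself is the same: in the robots.txt editor, find the ‘Disallow’ rule that matches the blocked page and remove or adjust it. By default, WordPress serves a virtual robots.txt that looks roughly like this (your plugin or theme may add extra lines):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

If you see additional Disallow rules covering pages you want indexed, delete those lines and save the file.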

Pages that should not be indexed

There are several reasons why pages that should not be indexed end up getting indexed. Here are a few:

Robots.txt rules that ‘say’ a page should not be crawled. Note that blocking a page in robots.txt does not by itself stop it from being indexed; if you want it kept out of the index, use a ‘noindex’ directive and allow the page to be crawled so that the search engine bots ‘know’ it should not be indexed.

In your robots.txt file, make sure that:

  • Each ‘disallow’ rule sits under a ‘user-agent’ line, so it is clear which crawlers it applies to.

  • There is no more than one block of rules for the same ‘user-agent’, since duplicate blocks with different directives can confuse crawlers.

  • There are no invisible Unicode characters. Running your robots.txt file through a plain-text editor that converts encodings will remove any special characters.
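
As a rough guide, a cleanly structured robots.txt keeps each set of rules directly under the ‘user-agent’ line it belongs to, with a blank line between groups (both groups below are purely illustrative):

User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /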

Pages are linked to from other sites. Pages can get indexed if they are linked to from other sites, even if they are disallowed in robots.txt. In this case, only the URL and the anchor text from those links appear in search engine results, because Google cannot crawl the page to read its content.

One way to resolve this is to password-protect the file(s) on your server so that search engines cannot reach the content at all.

Alternatively, remove the rules for those pages from robots.txt so they can be crawled, and use the following meta tag to block them from being indexed:

<meta name="robots" content="noindex">
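
The tag belongs in the <head> section of each page you want kept out of the index, for example:

<head>
  <meta name="robots" content="noindex">
</head>

Remember that Googlebot must be able to crawl the page to see this tag, so the page cannot also be blocked in robots.txt.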

Old URLs

If you have created new content or a new site, used a ‘noindex’ directive or a robots.txt block to make sure the old URLs do not get indexed, or only recently signed up for GSC, there are two options to fix the ‘blocked by robots.txt’ issue:

  • Give Google time to eventually drop the old URLs from its index

  • 301 redirect the old URLs to the current ones

In the first case, Google ultimately drops URLs from its index if all they do is return 404s (meaning that the pages do not exist). It is not advisable to use plugins to redirect your 404s. The plugins could cause issues that may lead to GSC sending you the ‘blocked by robots.txt’ warning.
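
For the second option, the redirect is best set up at the server level. For example, on an Apache server (an assumption; your host may use a different setup) you could add a line like this to your .htaccess file, with the paths as placeholders:

Redirect 301 /old-page/ https://www.example.com/new-page/

This permanently sends both visitors and Googlebot from the old URL to the current one, so the old URL can eventually be dropped from the index.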

Wrap Up

We have looked at the ‘Indexed, though blocked by robots.txt’ warning, what it means, and how to check and identify the affected pages or URLs. We have also looked at how to fix it. Note that the warning does not necessarily mean there is an error on your site. However, failing to fix it might result in your most important pages not being crawled and ranked properly, which is not good for search visibility or user experience.