[Image: Don’t make the mistake of blocking an indexed URL in robots.txt]

Don’t make the mistake of blocking an already indexed URL in robots.txt when what you actually want is to remove it from the index with the noindex directive. When you see the status “Indexed, though blocked by robots.txt” in URLinspector, your website may be in danger.
This status is the first sign that already indexed pages are now blocked in robots.txt, and over time it can cost those pages all of their rankings.
Take a look at the screenshot below:
We see:
(1) the coverage graph showing “Indexed, though blocked by robots.txt”
(2) a WARN in the “Indexed by Google” chart
(3) the status “Blocked Robots.txt” in the “Page Fetch” chart (the result of Googlebot fetching the page)
(4) the status “DISALLOWED” in the “Robots TXT Status” chart.
Note: when you click on those charts in the URLinspector app, you get a full breakdown in a table like the one here.
The problem is that this e-commerce store now redirects some www URLs to a subdomain that blocks bots.
We can clearly see this on a single-URL basis in the popular free browser extension Link Redirect Trace as well.
So the www URLs are getting de-indexed because they redirect into the bot-blocked area.
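If you want to check this at scale rather than one URL at a time in a browser extension, a minimal Python sketch along these lines works (assumptions: the third-party requests library is installed, and example.com stands in for your store). It follows the redirect chain and tests the final URL against that host’s robots.txt:

```python
import urllib.robotparser
from urllib.parse import urljoin

import requests  # assumption: the third-party "requests" library is installed

def redirect_robots_check(url: str, user_agent: str = "Googlebot") -> None:
    """Follow a URL's redirect chain and test the final URL against robots.txt."""
    # Follow redirects the way a redirect-tracing tool would.
    response = requests.get(url, allow_redirects=True, timeout=10)
    final_url = response.url

    # Parse robots.txt on the host the redirect chain ends on.
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(urljoin(final_url, "/robots.txt"))
    parser.read()

    verdict = "ALLOWED" if parser.can_fetch(user_agent, final_url) else "DISALLOWED"
    print(f"{url} -> {final_url}: {verdict} for {user_agent}")

# Hypothetical URL, for illustration only.
redirect_robots_check("https://www.example.com/some-product")
```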
When we look at the robots.txt file of the subdomain, we see that it blocks all URLs in its first two lines already:
```
User-agent: *
Disallow: /
```
What was probably intended here was to block only the root URL “/”.
Unfortunately, that is not how robots.txt works: Disallow: / matches every path on the host, not just the homepage.
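If the goal really was to block just the homepage, Google’s robots.txt parser supports the $ end-of-URL anchor (a Google extension that not every crawler honors), so a sketch of that rule would be:

```
User-agent: *
Disallow: /$
```

Every other URL on the host would remain crawlable.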
The solution is quick and easy:
If you want to get a URL de-indexed in Google, you need to use the noindex directive, and the URL must stay crawlable: if robots.txt blocks it, Googlebot can never see the noindex.
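As a minimal sketch, the directive can be delivered as a robots meta tag in the page’s HTML, or as an X-Robots-Tag HTTP response header for non-HTML resources:

```html
<!-- In the <head> of the page that should be de-indexed -->
<meta name="robots" content="noindex">
```

The header variant is simply X-Robots-Tag: noindex in the HTTP response.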
Have you run into a similar case, or is there an indexing topic you would like us to dig into? Let us know by tweeting @URLinspector and we’d be happy to cover it in an upcoming post.
Don’t let poor indexing hold your website back. Learn why looking after indexing is crucial for search engine visibility, credibility, and revenue.
Give us your opinion in the form at the bottom of this page, and get a chance to win 1 Year of URLinspector.
Choosing between subdomains and subfolders is not an easy decision, but we think subfolders are the best recommendation for most marketers.