Google keeps crawling and indexing URLs with 404 status code

Zhivko

Member
Joined
Mar 15, 2019
Messages
42
I am trying to convince Google to drop about 3000 URLs from their index.
  • Some of them I marked with 404,
  • some of them I marked with "noindex",
  • some of them I redirected.
This happened in February, more than two months ago. Google keeps crawling those URLs (even though there are NO internal links to them!) and keeps showing them in GSC.

I don't want to spend days manually submitting each of those URLs for removal in GSC. But I also don't want to wait years for Google to realise those URLs no longer exist.

Any ideas?
 

djbaxter

Administrator
Joined
Jun 28, 2012
Messages
2,739
Use a 410 status code instead of 404.

404 means "not found".

410 means "gone": intentionally and permanently removed.

List of HTTP status codes - Wikipedia
410 Gone Indicates that the resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed and the resource should be purged. Upon receiving a 410 status code, the client should not request the resource in the future. Clients such as search engines should remove the resource from their indices.
Add this to your .htaccess file:

Code:
# need to 410 these URLs
Redirect 410 {old URL}
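With ~3000 URLs, one rule per URL gets unwieldy. If the removed pages share a common path, a single RedirectMatch pattern can cover them all — a sketch, assuming a hypothetical /old-section/ prefix:

```apache
# Return 410 Gone for everything under a removed section (hypothetical path)
RedirectMatch gone ^/old-section/
```

"gone" is mod_alias shorthand for status 410, so no target URL is needed.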
 

Tiggerito

Member
Joined
May 5, 2016
Messages
19
Google will check pages it knows about for a long time. It does not mean they are in the search index. It just means Google is aware of them and wants to see how they behave every once in a while.

If a page returns 404 or 410, then it is removed from the index. But Google will check every once in a while, because things change.

If it's marked as noindex, it is removed from the index, but Google will check every once in a while.

If it redirects... you get the picture.

You don't need to remove them. Google is reporting that it knows they return 404, 410, 301, 302, noindex, disallow, etc., and that it has already removed them from the index.

Only worry about URLs reported that you do want in the index. Use the reports to determine if you have made a mistake.
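One quick way to use those reports is to double-check what each removed URL actually returns, in case a rule didn't take effect. A minimal sketch using only Python's standard library (the `urls.txt` filename is hypothetical):

```python
# Sketch: audit what status codes your removed URLs actually return.
import urllib.request
import urllib.error


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report redirects (301/302) as their own status instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError with the 3xx code


def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code a HEAD request to url gets, without following redirects."""
    req = urllib.request.Request(url, method="HEAD")
    opener = urllib.request.build_opener(NoRedirect())
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 3xx/4xx/5xx arrive here as exceptions


# Usage (hypothetical file with one URL per line):
#   for line in open("urls.txt"):
#       print(status_of(line.strip()), line.strip())
```

Anything that prints 200 is a URL you forgot to handle; 301/302/404/410 confirm the rule is live.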
 

Zhivko

Member
Joined
Mar 15, 2019
Messages
42
@Tiggerito In my case, the URLs are still indexed. And that's about 3000 URLs, compared to about 1000 "real" URLs. So it sux.

I will try using 410, thank you all for the feedback.
 

Tiggerito

Member
Joined
May 5, 2016
Messages
19
How are you determining that they are indexed?

410 is a slightly stronger signal than 404 and may cause Googlebot to check them less frequently once they have seen the 410 status.

Having a valid xml sitemap can also push Googlebot to crawl your important pages more often.
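For reference, a sitemap is just an XML file listing the canonical URLs you do want crawled — a minimal sketch (the example.com URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2019-05-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/real-page</loc>
  </url>
</urlset>
```

Keeping the removed URLs out of the sitemap (and submitting it in GSC) helps steer Googlebot toward the pages that matter.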
 

JoshuaMackens

Local Search Expert
Joined
Sep 12, 2012
Messages
1,797
Google keeps URLs returning 410s, 404s, and "noindex" directives in its index for much longer than you would expect. In our experience it's been months.
 

Zhivko

Member
Joined
Mar 15, 2019
Messages
42
Google keeps URLs returning 410s, 404s, and "noindex" directives in its index for much longer than you would expect. In our experience it's been months.
A quick update - about a month and a half later, GSC shows 1000 fewer URLs. There were 4000, now there are 3000.

For now, I am willing to believe that what I did works. It just takes time.
 

JoshuaMackens

Local Search Expert
Joined
Sep 12, 2012
Messages
1,797
A quick update - about a month and a half later, GSC shows 1000 fewer URLs. There were 4000, now there are 3000.

For now, I am willing to believe that what I did works. It just takes time.
Yeah, it's just a factor of time.
 
