SEO Agency NYC

How Google’s Spiders actually crawl websites

Crawl budgets are based on authority. It’s not like you get crawled every 24 hours, and it’s not like every page gets crawled.

When were my pages crawled?

Step 1: Go to GSC > Pages > “View data about Indexed Pages” – you can see when pages were last crawled AND indexed.

Google doesn’t spider your whole site every day

I think there’s this misconception that Google constantly crawls sites or refreshes your sitemap daily/weekly/monthly.

How Crawling ACTUALLY works

It doesn’t

Firstly – if your server AND sitemap don’t show an update since the last crawl, it will move on. Secondly – crawlers don’t read, parse or process content – they fetch as much of a document as they can (frequently this could be less than 50% – they have short timeouts). If the CRC checksum hasn’t changed, they’ll dump and ignore the HTML.

If there’s a change – they’ll dump cut-out pieces of HTML into different parsers – the process that builds the meta-snippet runs on its own and if the meta-description wasn’t fully read, it will get ignored.
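The change check described above can be sketched in a few lines. This is only an illustration of the idea, not Google’s code: the function names, the `max_bytes` cut-off (standing in for a short timeout) and the sample pages are all assumptions made up for the example.

```python
import zlib


def fetch_checksum(body: bytes, max_bytes: int) -> int:
    """CRC32 of the part of the document that was actually fetched.

    max_bytes is a hypothetical stand-in for a short fetch timeout:
    anything past it is never seen by the crawler in this sketch.
    """
    return zlib.crc32(body[:max_bytes])


def needs_processing(stored_crc: int, body: bytes, max_bytes: int) -> bool:
    """True if the fetched bytes differ from what was seen on the last crawl."""
    return fetch_checksum(body, max_bytes) != stored_crc


# Two versions of a page that differ only near the end of the document.
old_page = b"<html><head><title>Pricing</title></head><body>Old footer</body></html>"
new_page = b"<html><head><title>Pricing</title></head><body>New footer</body></html>"

# Full fetch: the checksum changes, so the HTML is passed on to the parsers.
stored_full = fetch_checksum(old_page, max_bytes=len(old_page))
full_fetch_sees_change = needs_processing(stored_full, new_page, len(new_page))

# Truncated fetch (first 40 bytes only): the edit sits past the cut-off,
# so the checksum matches and the page is dropped as unchanged.
stored_short = fetch_checksum(old_page, max_bytes=40)
short_fetch_sees_change = needs_processing(stored_short, new_page, 40)
```

In this toy setup `full_fetch_sees_change` is `True` and `short_fetch_sees_change` is `False`, which is the practical takeaway: if a fetch really is cut short, changes (and tags) near the bottom of a long document are the ones most likely to be missed.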

The crawlers DO grab other URLs and put the context (read: the A HREF text, i.e. the anchor text) into another crawl file.
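To make “URL plus anchor text” concrete, here is a minimal sketch using Python’s standard `html.parser`. The `LinkCollector` class, the `extract_links` helper and the sample HTML are hypothetical; how Google actually stores this data isn’t public.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html: str):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = (
    '<p>Read our <a href="/pricing">pricing  guide</a> and '
    '<a href="https://example.com/blog">the blog</a>.</p>'
)
links = extract_links(sample)
# links == [("/pricing", "pricing guide"), ("https://example.com/blog", "the blog")]
```

Each pair is what a later crawl would pick up: the URL to fetch and the words another page used to describe it.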

LastMod And Directives

If you’re honest about LastMod and the CRC checks do show a change, Google WILL obey your LastMod.

“Google ignores <priority> and <changefreq> values.”
Source: developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap
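As a rough sketch of what “obeying LastMod” could look like from the reading side, the snippet below parses a sitemap, uses only `<lastmod>`, and skips `<priority>` and `<changefreq>` entirely, in line with the quote above. The sample sitemap, the example.com URLs and the `urls_modified_since` helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for the example.
SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <lastmod>2024-03-10</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/contact</loc>
  </url>
</urlset>
""".strip()


def urls_modified_since(sitemap_xml: str, last_crawl: date):
    """Return the <loc> values whose <lastmod> is later than last_crawl.

    <priority> and <changefreq> are never read. URLs without a
    <lastmod> are skipped because there is nothing to compare.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if not loc or not lastmod:
            continue
        # lastmod may be a date or a full W3C datetime; compare the date part.
        if date.fromisoformat(lastmod.strip()[:10]) > last_crawl:
            changed.append(loc.strip())
    return changed


to_recrawl = urls_modified_since(SITEMAP, date(2024, 4, 1))
# to_recrawl == ["https://www.example.com/"]
```

The “honest” part matters here: if every URL claims today’s date on every build, a filter like this stops telling the reader anything.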

Google prefers to crawl pages from other pages, not sitemaps

If your site’s pages are properly linked, Google can usually discover most of your site. Proper linking means that all pages that you deem important can be reached through some form of navigation, be that your site’s menu or links that you placed on pages.
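One way to test “can be reached through some form of navigation” on your own site is a simple breadth-first walk over your internal links, starting from the homepage. The link map below is a made-up example; in practice it would come from a crawl export of your own site.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first walk: every page you can click through to from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical internal link map: page -> pages it links to.
site = {
    "/": ["/services", "/blog"],
    "/services": ["/contact"],
    "/blog": ["/blog/post-1"],
    "/orphan": ["/"],  # links out, but nothing links to it
}

# Pages that exist but can't be reached from the homepage.
orphans = set(site) - reachable_pages(site, "/")
# orphans == {"/orphan"}
```

Anything that lands in `orphans` is a page that only a sitemap (or an external link) could surface, which is exactly the situation the paragraph above is warning about.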

Pages WITH no Traffic don’t get re-indexed

If the pages have no traffic – they are removed from the crawl lists – this saves Google a lot of time

Only High-Authority Pages get an XML Listener (read: XML Sitemaps)

Only about 1-5% of the world’s sites get an XML listener, which runs on bands of 1-5 seconds (e.g. CNN, DA 35+), then hourly, and up to 24 hours. News feeds get read differently – IF you’ve been accepted as a news site.

No Authority? No XML Sitemap Reads

For most people – just your highest-traffic pages will get crawled and re-indexed – maybe even if they haven’t changed.

Where can you read more?

The Google Sitemaps for Devs is a really good read:

What are Google Crawlers?

 
