It was a great tool in the last version, where it was linked with the Sitemap explorer/status tool. We missed it with the advent of the new Webmaster Tools, but we're delighted it's been reintroduced. This tool is epic in its power and raw data – it's great that Google are providing data they've normally been very protective about. Without it, the only way to know how many URLs had been indexed was to guess. Your XML sitemaps only tell Google which URLs you think you have. But Google gets URL data from a variety of sources – broken external links, for one – that it may try to crawl or index. Pages that have been 404'd can (or certainly used to) remain in the index for as long as 12 months!
Where do I find it?
Google have, appropriately, removed the link between the Sitemaps section and the Index Status tool by placing it in the Health section instead. That's good, because it teaches webmasters that URLs in your Sitemaps don't necessarily make it into the index. In fact, depending on your site's size, authority and structure, in some cases on sites with 1,500+ pages, fewer than 40% get indexed.
So what does it tell me?
Indexed pages are most definitely not the same as crawled pages. This is a relatively average-sized, single-product ecommerce website, aged about 18 months:
The red line represents the total number of URLs that Google has found. These come from a variety of sources, including broken links and Sitemaps. Every distinct URL, including a broken link, represents a page. So if you had a page called "about-us.php" and occasionally linked to it as "about_us.php" or "aboutus.php", Google sees them as new, separate URLs (or canonicals).
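As a rough sketch of that effect (all the file names and the canonical mapping below are hypothetical, not taken from any real site), you can see how a crawler counting distinct URL strings inflates the "found" total, and how declaring a canonical collapses the variants back to one page:

```python
# Hypothetical illustration: a crawler first sees each distinct URL string
# as a separate page, so mistyped internal links inflate the "URLs found" count.

def count_distinct_urls(links):
    """Count unique URL strings, as a crawler would on first discovery."""
    return len(set(links))

def count_after_canonicalisation(links, canonical_map):
    """Count pages after mapping each variant to its declared canonical URL
    (canonical_map stands in for rel="canonical" declarations)."""
    return len({canonical_map.get(url, url) for url in links})

links = [
    "/about-us.php",   # the real page
    "/about_us.php",   # mistyped internal link
    "/aboutus.php",    # another variant
]

canonical_map = {
    "/about_us.php": "/about-us.php",
    "/aboutus.php": "/about-us.php",
}

print(count_distinct_urls(links))                           # 3 URLs "found"
print(count_after_canonicalisation(links, canonical_map))   # 1 actual page
```

The point of the sketch: the red line in the chart behaves like `count_distinct_urls`, swallowing every variant it discovers, which is why it can climb well above the number of pages you actually have.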
The green line shows the number of URLs that Google has chosen to ignore – in this case, nearly as many as the pages it decided to index.
The blue line maps the number of pages that Google has actually opted to index after crawling them.