Why does crawl budget matter to Google?
Two reasons:
What is ‘crawl rate’?
The number of requests that Google’s crawlers make to your site each second
Bonus tip: In ‘SEO Mythbusting’, Martin Splitt described crawl rate as the amount of stress Google can put on your server without overwhelming it
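To see what crawl rate looks like on your own server, you can estimate it from access-log timestamps. Below is a minimal Python sketch. It assumes you have already extracted the timestamps of verified Googlebot requests. The `crawl_rate` function and the one-second bucketing are illustrative choices, not a Google tool.

```python
from collections import Counter
from datetime import datetime, timedelta


def crawl_rate(timestamps: list[datetime], window_seconds: float) -> tuple[float, int]:
    """Return (average requests per second, peak requests in any one second)."""
    if not timestamps:
        return 0.0, 0
    # Average rate over the whole observation window
    average = len(timestamps) / window_seconds
    # Group requests by the whole second they arrived in to find the busiest second
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    return average, max(per_second.values())


# Usage: four hypothetical Googlebot hits observed over a 4-second window
base = datetime(2024, 1, 1, 12, 0, 0)
hits = [base, base + timedelta(milliseconds=200),
        base + timedelta(seconds=1), base + timedelta(seconds=3)]
print(crawl_rate(hits, window_seconds=4))  # (1.0, 2)
```

The peak matters as much as the average. A modest daily average can still hide short bursts that stress a small server.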
What is crawl demand?
How often Google wants to crawl your site
What affects crawl demand?
Hypothetically, if a site ranks well but is not crawled very often (low crawl demand), why could that be?
The site has high-quality content, but that content rarely changes. This may simply reflect the nature of the content, as dictated by the site's niche
What are some ways you can indicate to Google how frequently content needs to be crawled?
Which types of sites should worry about crawl budget?
List two ways to control crawl budget for large sites with UGC
Does crawl budget affect Google’s rendering of web pages?
Yes, because the resources Google fetches to render a page (such as JavaScript and CSS files) also count toward your crawl budget
How do you reduce the Google crawl rate?
How can you verify that it is Googlebot crawling your website?
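Run a reverse DNS lookup on the IP address from your server logs, for example with the host command in a terminal:
host 66.249.66.1
This returns:
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
Check that the returned domain name ends in googlebot.com or google.com (Google's documentation also lists googleusercontent.com for some of its crawlers). Then run a forward DNS lookup on that domain name to confirm it resolves back to the IP address you started with. This second step guards against spoofing: whoever controls an IP address can make its reverse DNS record claim any hostname, including a googlebot.com one, but only Google controls the forward records for googlebot.com.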
host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
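The same two-step check can be automated. Below is a minimal Python sketch that uses only the standard library's `socket` and `ipaddress` modules. Some parts are assumptions. The domain list reflects Google's documentation at the time of writing, so re-check it before relying on it. The `reverse`/`forward` parameters are hypothetical hooks that let you test the logic without real DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Domains Google documents for its crawlers' reverse DNS names at the time of
# writing (assumption: re-check Google's current docs, as this can change).
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip: str) -> str:
    """Reverse DNS (IP -> hostname), like running `host <ip>`."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def forward_lookup(hostname: str) -> set[str]:
    """Forward DNS (hostname -> every IPv4/IPv6 address it resolves to)."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """Reverse lookup, domain check, then forward lookup back to the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    # The leading dot stops look-alikes such as "evilgooglebot.com" matching
    if not hostname.endswith(GOOGLE_CRAWLER_DOMAINS):
        return False
    try:
        addresses = forward(hostname)
    except OSError:  # hostname does not resolve forward
        return False
    # Compare parsed addresses so different IPv6 spellings still match
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(addr) == target for addr in addresses)


# Real usage performs live DNS queries:
# is_googlebot("66.249.66.1")
```

The domain check runs before the forward lookup, so obviously foreign hostnames are rejected without a second DNS query.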
What is the crawl capacity limit?
Googlebot aims to crawl your site without overwhelming your servers. To do so, it calculates a crawl capacity limit: the maximum number of simultaneous parallel connections it can use to crawl a site, together with the time delay between fetches. The limit is set so that Googlebot can cover all your important content without overloading your servers.
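The general idea is a limit that rises while the server responds quickly and falls when it slows down or returns errors. As a purely hypothetical illustration of that idea, here is a Python sketch of a polite crawler adjusting its own connection count and delay. This is not Google's algorithm. The `CrawlCapacity` class, the 2-second "slow" threshold, and the additive-increase/multiplicative-decrease rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds for this illustration only
SLOW_RESPONSE_SECONDS = 2.0
OVERLOAD_STATUSES = {429, 500, 503}


@dataclass
class CrawlCapacity:
    """Toy crawl capacity limit: parallel connections plus a delay between fetches."""
    connections: int = 1
    delay_seconds: float = 1.0
    max_connections: int = 10
    min_delay_seconds: float = 0.1

    def record(self, status: int, response_seconds: float) -> None:
        """Adjust the limit after each fetch based on how the server coped."""
        if status in OVERLOAD_STATUSES or response_seconds > SLOW_RESPONSE_SECONDS:
            # Server is struggling: back off sharply (multiplicative decrease)
            self.connections = max(1, self.connections // 2)
            self.delay_seconds *= 2
        else:
            # Server is healthy: grow cautiously (additive increase)
            self.connections = min(self.max_connections, self.connections + 1)
            self.delay_seconds = max(self.min_delay_seconds, self.delay_seconds * 0.9)


limit = CrawlCapacity()
for status, seconds in [(200, 0.3), (200, 0.4), (503, 0.2)]:
    limit.record(status, seconds)
    print(limit.connections, round(limit.delay_seconds, 3))
```

Backing off faster than it grows means a single bad spell costs the crawler several healthy responses to recover from. That asymmetry is what keeps a server from being overwhelmed.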
What are the ways that maximise the efficiency with which Google crawls a site?
How do you figure out whether Google is encountering crawl availability issues?
Use the Crawl Stats report in Google Search Console
Broadly, what data is available in the Crawl Stats report?
What affects Google’s crawl rate?
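Largely the responsiveness of your server (fast, error-free responses let Google crawl more; slow responses or server errors make it crawl less), but also whether you limited it in Search Console (Google retired the Search Console crawl rate limiter tool in January 2024)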
Why would you want to limit Google’s crawl rate?
Because Google is overloading your servers with requests
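Google's crawling documentation describes an emergency option for short periods: temporarily return 500, 503 or 429 status codes, which Googlebot treats as a signal to slow down. It also warns that serving these for extended periods can cause URLs to be dropped from the index. Below is a minimal Python sketch using the standard library's `http.server`. The load threshold, the Retry-After value, and the use of `os.getloadavg()` (Unix-only) as the overload signal are all assumptions. Retry-After is a standard HTTP header, and this sketch makes no claim about how Googlebot uses it.

```python
import os
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical values: tune for your own hardware
LOAD_THRESHOLD = 8.0        # 1-minute load average treated as "overloaded"
RETRY_AFTER_SECONDS = 120   # back-off hint sent with 503 responses


def choose_status(load: float, threshold: float = LOAD_THRESHOLD) -> HTTPStatus:
    """Return 503 while the server is overloaded, otherwise 200."""
    return HTTPStatus.SERVICE_UNAVAILABLE if load > threshold else HTTPStatus.OK


class OverloadAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        load = os.getloadavg()[0]  # Unix-only: 1-minute load average
        status = choose_status(load)
        self.send_response(status)
        if status == HTTPStatus.SERVICE_UNAVAILABLE:
            self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"{status.value} {status.phrase}\n".encode("utf-8"))


# To try it locally (blocks until interrupted):
# from http.server import HTTPServer
# HTTPServer(("127.0.0.1", 8000), OverloadAwareHandler).serve_forever()
```

Keeping the decision in `choose_status` separates the policy from the server code. The same check could sit in front of any web framework.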
What is crawl budget?
The set of URLs that Google can crawl (crawl rate, also called the crawl capacity limit) and wants to crawl (crawl demand)
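The two halves of this definition can be illustrated with a toy calculation in Python. It makes two assumptions. First, crawl budget behaves like the smaller of what capacity allows and what demand calls for. Second, consistent with the rendering answer above, each page costs one fetch for the HTML plus one per uncached rendering resource. This is not a formula Google publishes, and `toy_crawl_budget` and its inputs are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def toy_crawl_budget(requests_per_second: float, resources_per_page: int,
                     urls_in_demand: int) -> int:
    """Pages crawlable per day in this toy model: capped by capacity AND by demand."""
    # Capacity side: total fetches per day the crawl rate allows
    daily_requests = int(requests_per_second * SECONDS_PER_DAY)
    # Each page costs one fetch for the HTML plus one per uncached rendering resource
    pages_by_capacity = daily_requests // (1 + resources_per_page)
    # Demand side: no more URLs than Google wants to crawl
    return min(pages_by_capacity, urls_in_demand)


print(toy_crawl_budget(0.5, resources_per_page=3, urls_in_demand=25_000))  # 10800
print(toy_crawl_budget(0.5, resources_per_page=3, urls_in_demand=5_000))   # 5000
```

In the first call, capacity is the bottleneck. Cutting per-page resource fetches would raise the number of pages crawled. In the second call, demand is the bottleneck, and extra capacity would change nothing.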
What are some ways to increase crawl budget?