April 9, 2024

What is Crawl Management?

Once you own a website, or start working behind the scenes building or optimising websites, you soon realise that they work a little differently than you might once have thought.

Rather than just existing on the web and showing up on Google when searched for, websites must first be ‘crawled’ and indexed by search engines before they appear in results at all. Crawling happens frequently over the lifespan of a website to help search engines such as Google understand the content on the site and add to and refresh their indexed results. These crawlers, also known as ‘bots’ or ‘spiders’, also revisit your website once it has been updated, has recently become popular, or is frequently linked to elsewhere through backlinks. A crawl can be requested through Google Search Console, or it will happen regularly as the site is discovered by Google.

Crawling a site is, from the bots’ perspective, time and energy-consuming. However, they will continue to crawl the site regularly if they believe there is a good enough chance that they will find something new on the site or page.

Effective SEO might mean that you need to manage these bots’ time and energy better, ensuring that your site gets added to Google’s index more efficiently than just allowing the bot to run its course unguided. This is called ‘crawl management’. So what is ‘crawl management’ exactly, and how can you use it to get the best SEO from your campaign?

‘Crawl Management’ is a method used by SEOs and web developers to limit the number of pages that are effectively crawled and indexed by Search Engines to only relevant URLs. This may mean excluding certain pages or parts of your site from being crawled or improving your site structure and technical background to allow for more efficient crawling by bots such as Googlebot. 

Find out everything you need to know about website crawling and how to efficiently request crawling and indexing from the team at Wildcat Digital. 

How Does Crawling Work?

First off, the internet is huge.

Google, in its seemingly infinite capacity, attempts to understand the internet better and provide better results constantly. However, as it is essentially a service in itself, it only tries to understand the part of the internet that is going to be usable and beneficial to its users. To do this, it crawls the internet to find, read, understand and index sites and pages that its users may want to find.

As an SEO agency, it’s our job to help Google and other search engines find sites and pages to add to this index. Then, using SEO tools, techniques and clever methods, we get search engines to promote our sites further up their Search Engine Results Pages (SERPs). These are rankings, and they are how search engines promote sites within their index that they believe offer the best service to their users.

These indexed pages are around 30 – 50 billion in number, with trillions more existing elsewhere on the net. It has been estimated that only around 4% of the internet is actually indexed by Google, and therefore eligible to be found in Google Search results. Indexing only this useful fraction is how Google can find so many results in such a short period of time.

Crawling is therefore the process of discovering page content, including copy, images, documents and links to other web pages, which are followed to discover new URLs (unless robots are otherwise instructed). Crawling the web is a task that Google takes very seriously, and it does so within an allocated budget. 

What is a Crawl Budget?

A ‘crawl budget’ is the number of pages that Google’s crawling bot (Googlebot) is willing to visit on a specific website within a specific time period. According to Google’s own documentation, this is defined by two main elements: crawl capacity limit and crawl demand.

As Googlebot can’t see all of the internet all of the time, it must allocate its budget of resources to websites that are worth serving to its users. Think of a crawl budget as an allowance for sites that meet Google’s requirements, or that have asked for indexing through Google Search Console.

The crawl budget can be influenced by a variety of factors, including the website’s quality as seen by the search engine, server performance, page quality, relevance, and how frequently the website is updated compared to other sites. In essence, more crawl budget is allocated to websites that are likely to need it over ones that don’t. If you are allocated more crawl budget, more of your web pages will be crawled more frequently.

Looking at the factors above, you can make improvements to the website to increase its crawl budget, which is one reason you might want to employ an SEO agency to work for you. Certainly, the work that SEOs do helps bots crawl your site more often and crawl more pages when they do so.

What Factors Influence Crawl Budget? 

Googlebot and other crawlers are programmed to crawl your website without putting any strain on your servers. To achieve this, Googlebot determines a crawl capacity limit: the maximum number of parallel connections it will use to crawl a website at the same time, in line with Google’s own crawling limits.

If you have a large site that contains hundreds or even thousands of pages, having it crawled regularly may slow your site down. If it is badly linked internally, it may put a strain on crawlers when they attempt to crawl it. If your site is old, rarely updated or rarely viewed, you may struggle to convince crawlers to prioritise it over fresher content elsewhere.

A variety of factors influence crawl budget, some of which we have mentioned above, but others might include: 

Server Performance

Slow servers or slow server response times can greatly hinder crawling efficiency. Additionally, if your pages repeatedly return 5xx (server-side) errors, search crawlers are likely to drop these results from rankings and even slow down their crawling of the website. If you own a website, or you are an SEO looking to test your server response time, you can use tools such as GTmetrix. Tools like Google Search Console and website crawlers like Screaming Frog and Sitebulb can be used to find and analyse 5xx errors on your website.
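
For a quick spot check alongside those tools, a single curl request against a (hypothetical) URL will show both the status code and the total response time:

```
# -s silences progress output, -o /dev/null discards the body,
# and -w prints the chosen variables once the request completes
curl -s -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" https://www.example.com/
```

Anything in the 5xx range, or consistently slow times, is worth investigating further.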

Robots.txt

Directives in the robots.txt file guide crawlers to your sitemaps and to allowed and disallowed pages. Setting up a robots.txt file gives bots crawling your site instructions about where they should and shouldn’t go next, which can help you manage your crawl budget efficiently.
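
As a minimal sketch, assuming a hypothetical WordPress site at www.example.com, a robots.txt file might look something like this:

```
# robots.txt, served at https://www.example.com/robots.txt (hypothetical domain)
User-agent: *
# Keep crawlers out of the admin area, but allow the AJAX endpoint many themes rely on
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap_index.xml
```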

XML Sitemaps

XML sitemaps should exist in their own right, but should also be referenced in your robots.txt file to tell bots which pages exist on your site. An XML sitemap acts as a directory that lets users and bots quickly find all of the relevant URLs associated with a site and crawl accordingly.

These help search engines discover pages efficiently. You can submit a sitemap to Google Search Console, where it will be read every few days and allow Googlebot to discover new pages on your website. Find out how often you should update your sitemap (and how…)
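
For reference, a bare-bones XML sitemap needs only a handful of elements per URL; the addresses below are purely illustrative:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-04-09</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/services/seo/</loc>
    <lastmod>2024-03-28</lastmod>
  </url>
</urlset>
```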

Duplicate Content

Pages with duplicate content are a waste of your crawl budget. And, when indexed, they may also create cannibalisation issues by competing with each other. Duplicate content may occur due to issues with pagination, filtering, spammy page duplication, product variations, or instances where content is available both on the site and through a download (such as a printout or PDF).

What’s more, duplicate pages without proper canonical markup may lead to Googlebot drawing its own conclusions about canonicals and excluding certain pages from search results. More on how to address duplication issues below.

4xx & Soft 404 Errors

Broken links waste crawl budget. When a crawler follows a link, it wants to crawl the content, find new links and continue doing its job. If it reaches a broken link that returns a 4xx error, its journey ends there.

Once Google encounters a page that returns a 404 (Not Found) or 410 (Gone) status code, it will eventually stop coming back to crawl that page. However, when crawlers encounter soft 404 pages, they continue crawling them, potentially wasting the crawl budget.

Having broken links on your website therefore not only wastes a good opportunity for your users, but also puts effective crawling of your website at risk.

Low-Value Pages

Low-value pages present an issue to effective crawling as they essentially waste the crawl budget. Over time, if the content on the site is generally poor, irrelevant, or outdated, Google may reduce the crawl budget on the site.  

Redirect Chains

When a page is redirected, it returns a 301 code to the crawler, pointing it to a new URL that the crawler must then request instead. Redirects may build up over time, creating redirect chains, meaning that the crawler has to hop from URL to URL multiple times before reaching the destination URL.

This is not only a waste of crawl resources, but it also increases the page’s load time, and as we have mentioned above, slow site performance may lead to a loss of crawl budget over time. What’s more, Googlebot only follows a limited number of redirect hops (Google’s own documentation cites up to 10) before giving up on crawling the page, meaning that the final URL may never be crawled and indexed.
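
To picture the problem, here is what a small chain might look like for a set of hypothetical URLs, with each 301 costing the crawler an extra request before it reaches the content it wanted:

```
/old-page/            → 301 → /old-page-v2/
/old-page-v2/         → 301 → /services/new-page/
/services/new-page/   → 200 (the content the crawler actually wanted)
```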

How Do You Manage a Crawl Budget?

Using the above as a guide, we can improve our allocated crawl budgets by improving various aspects of the site. 

Improve Server Performance and Page Load Times

Improving server response and page load times can be carried out by an SEO, web developer or webmaster. Tools like WP Rocket for WordPress can defer non-essential JavaScript, compress images, minify CSS and JavaScript, or introduce lazy loading, all of which help bots and users use your site more efficiently. If your site responds quickly to Google’s requests, the allocated crawl capacity limit goes up and more of your site can be crawled.
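
Native lazy loading is one of the simpler wins here; it is a single HTML attribute (shown on a hypothetical image below) that plugins such as WP Rocket can also apply for you:

```
<!-- Below-the-fold images load only when the visitor scrolls near them -->
<img src="/images/example-photo.jpg" alt="Illustrative below-the-fold image" loading="lazy" width="800" height="533">
```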

Disallow Irrelevant Pages in robots.txt

Disallowing the crawl of irrelevant pages, page paths, directories or parameters found on your site can vastly reduce the number of pages that are findable by crawlers and free up crawl budget for more important URLs. Do so carefully, as some of these pages may internally link to other important pages that you want Google to index. Find out more about how robots.txt works here.
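
As a hedged example, with hypothetical paths and parameters, a few Disallow rules can take whole classes of low-value URLs out of the crawl:

```
User-agent: *
# Internal search results and filtered/faceted URLs rarely need crawling
Disallow: /search/
Disallow: /*?filter=
Disallow: /*?sort=
# Basket and checkout pages offer crawlers nothing useful
Disallow: /basket/
Disallow: /checkout/
```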

Update your XML Sitemap

Your sitemap ought to be generated automatically through Yoast or a similar tool; most CMSs will do this for you.

If your site already has an XML sitemap, then ensure that it is updated regularly. Make sure to review your settings to ensure that only relevant URLs are included in your sitemap. 

Remove Duplicated Content

To avoid wasting your allocated crawl budget, it is important to manage duplicate pages with appropriate canonicalisation. Paginated URLs, filtered dynamic URLs, product variations and duplicated pages, among others, should be canonicalised to the main URL in order to consolidate your content and ensure that only relevant pages are crawled and indexed.
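
As a simple illustration, using hypothetical URLs, a filtered product listing can declare the unfiltered category page as its canonical in the <head>:

```
<!-- On the filtered, duplicate version of the page, e.g. https://www.example.com/shoes/?colour=blue -->
<link rel="canonical" href="https://www.example.com/shoes/">
```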

Address 404 Internal Links and Remove soft 404s

Marking permanently removed pages with a 404 may be the best option in some cases, but because many removed pages are still internally linked, and may even be ranking, setting up relevant redirects for pages that have been removed will allow a crawl to continue and potentially find new areas of your site yet to be crawled.

As we have mentioned above, soft 404s waste your crawl budget. These are usually either pages with errors that still return a 200 OK response or pages with no content. You should monitor soft 404s in Google Search Console and make sure to implement 404 or 410 responses for permanently removed pages, redirect to the correct resource where relevant, or populate the page with the correct content and ensure that it returns a 200 OK response.
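
How you implement this depends on your stack. As one sketch, on an Apache server, mod_alias rules like these (with illustrative paths) cover the two common cases: a clean 410 for a page that is gone for good, and a 301 where a sensible replacement exists:

```
# .htaccess (Apache mod_alias), illustrative paths only
# A page that is gone for good: answer with 410 Gone
Redirect gone /discontinued-service/
# A removed page with a close replacement: 301 straight to it
Redirect 301 /old-services/seo-audit/ https://www.example.com/services/technical-seo/
```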

Remove or Optimise Low-Value Pages

Ensuring that your website is populated with relevant, valuable and regularly updated content is a good way of sending search engines the positive signals they need to recognise the value of your pages and allocate more crawl budget to your site over time.

Google Search Console may provide some insight into the pages that are deemed low-value in the ‘Crawled – currently not indexed’ section of the ‘Pages’ page. 

If these pages don’t currently serve a purpose and are not part of the inner workings of your site, you could remove them to free up crawlability elsewhere.

Remove Redirect Chains

Redirects have their place in any SEO strategy, but redirect chains should be avoided for all of the reasons mentioned above. To remove redirect chains, update your redirect to point to the final URL. You should also ensure that the URLs within the chain are not linked to internally on your website. 
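
Again assuming an Apache server and the same hypothetical URLs as earlier, the fix is to point every legacy URL straight at the final destination rather than at the next link in the chain:

```
# Before: each legacy URL pointed at the next legacy URL, creating a chain
# Redirect 301 /old-page/ https://www.example.com/old-page-v2/
# Redirect 301 /old-page-v2/ https://www.example.com/services/new-page/

# After: every legacy URL points straight at the final destination
Redirect 301 /old-page/ https://www.example.com/services/new-page/
Redirect 301 /old-page-v2/ https://www.example.com/services/new-page/
```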

Improving Your Website’s Crawlability With Wildcat Digital

If your website is struggling with crawling and indexing new URLs, you may have an issue with its crawlability. Getting your site crawled and indexed can be tricky, especially if you are already running a business yourself. 

In most cases employing an SEO through an agency is a cost-effective way to get your website seen by Google and billions of Google users, helping you compete in the digital world. 

Wildcat Digital is an SEO, PPC and Paid Social Digital Marketing Agency based in Sheffield, Yorkshire. We specialise in helping businesses punch above their weight online and compete with some of the largest names in their fields. 

If you need help with crawling and indexing your website, then get in contact with our team today.

Post by

Jon Herdman
