Banc at BrightonSEO: Combatting Crawl Bloat & Pruning Your Content Effectively

Our SEO Director Charlie Whitworth made his and Banc’s BrightonSEO debut on Friday. The second instalment of the conference since its move to the Brighton Centre, the event enjoyed not only some of the finest speakers in the sector but also some spectacularly summery south coast sunshine. Read on to find out more about Charlie’s talk and some of his other highlights of BrightonSEO’s spring 2017 edition…

When thinking about what I wanted to achieve from my presentation at BrightonSEO, my main aim was to provide actionable advice that other SEOs can use to their advantage; these are the talks I have always enjoyed most at the various events I have attended. With this in mind, and as the SlideShare below may not quite give you the full gist of my talk, I have summarised it and hopefully filled in some of the blanks in the following article.

My first appearance at BrightonSEO focussed on something we have had great success with here at Banc for a wide range of our clients, particularly those with large e-commerce websites. Refining crawl bloat and then effectively pruning the remaining content has been something of a passion of ours for a few years now, and the technique has been made all the more effective by the latest Google updates.

With lots of definitions out there for what crawl bloat actually is, and with Google giving some much-needed clarification on crawl rate, crawl demand and crawl budget not so long ago, I thought it important to define just what I see crawl bloat as being.

It is essentially “Making search engines work too hard and not taking full advantage of your crawl budget”.

So, what do these terms mean, and how can they help you optimise your website and reduce its crawl bloat?

Google looks at your website's crawl rate limit alongside its crawl demand, and from the two decides how much crawl budget to dedicate to crawling your site.

Crawl Rate

The crawl rate, or crawl rate limit as it is sometimes referred to, is influenced by a range of factors, such as the health of your site and its server and how hospitable an environment you have created for Google's bots.

You can, of course, limit Google's maximum crawl rate from within Google Search Console, although raising this setting will by no means guarantee that Google will crawl your site more often.

Crawl Demand

When describing crawl demand to colleagues and clients, I usually say that this is basically how much Google really wants to see your website’s pages. Think about the quality and popularity of your site and the aforementioned health, and how often you are creating compelling content.

If you have a popular site with regularly updated and high-quality copy, then your crawl demand will most likely never be an issue.

Crawl Budget

As mentioned, the end result is that Google allocates your site a crawl budget, which can and will change over time as you improve your site. This is what Google can and, of course, wants to crawl and keep crawling over time, provided your site ticks all the boxes I have described.

Things to avoid when looking to make the most of your site's crawl budget:

  • Empty Pages
  • Soft 404 Pages
  • Low Value Pages
  • Spammy Tactics
  • Duplicate Content
  • Malware & Hacked Pages

Does Your Site Have a Crawl Bloat Issue?

Perhaps the most important point I tried to drive home during my talk was that crawl bloat will not be harming the SEO performance of every site. If you are working on a fairly small, non-e-commerce website which doesn't auto-generate URLs, then the chances are that addressing crawl bloat will not be the best use of your time.

A top tip is to use your software of choice to compare how many URLs are being crawled with how many are in the index, and you should have your answer, although it may not be as clear-cut as that in every scenario. You need to use your intuition as an SEO practitioner to determine whether refining crawl bloat should be part of your overall technical strategy.
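As a rough illustration of that crawled-versus-indexed comparison, here is a minimal Python sketch. It assumes you have exported two plain lists of URLs, one from your crawler and one of the URLs you know to be indexed; the function names and the normalisation rules are hypothetical choices of ours, and what ratio counts as "bloated" remains the judgement call described above.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Normalise a URL so trivial variants compare equal.

    Assumed rules: lower-case scheme and host, drop any #fragment and
    strip trailing slashes from the path (keeping "/" for the homepage).
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def crawl_bloat_report(crawled: list[str], indexed: list[str]) -> dict:
    """Compare a crawler export with a list of indexed URLs (hypothetical helper)."""
    crawled_set = {normalise(u) for u in crawled}
    indexed_set = {normalise(u) for u in indexed}
    return {
        "crawled": len(crawled_set),
        "indexed": len(indexed_set),
        # Crawled URLs per indexed URL: the higher it is, the closer a look it deserves.
        "ratio": len(crawled_set) / len(indexed_set) if indexed_set else float("inf"),
        # URLs the crawler reached that are not in the index: the bloat candidates.
        "crawled_not_indexed": sorted(crawled_set - indexed_set),
    }
```

On a small e-commerce sample where four crawled URLs map to two indexed pages, the report gives a ratio of 2.0 and lists the sorted and paginated variants as the URLs worth investigating.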

Engagement Metrics

As many of the search quality updates of the last few years have shown, not least the recent Fred update, engagement metrics have soared in importance. This has put more onus on SEOs and webmasters to ensure that everything being crawled and indexed is of high calibre.

The move towards mobile-first indexing has only exacerbated this and made a clean, tidy and engaging SEO house all the more crucial.

Problematic URLs

With the vast number of CMSs, frameworks and platforms out there, it would be virtually impossible to list every type of URL and parameter that SEOs should be on the lookout for when addressing crawl bloat, but these can usually be identified with common sense.

Some typical culprits include:

  • Search Parameters
  • Paginated Series
  • Faceted Navigations
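To make those culprits easier to spot in a large crawl export, a short Python sketch can label each URL by pattern and count them. The parameter names below (`q`, `search`, `page`, `colour`, `size`, `brand`, `sort`) and the `/page/N/` path pattern are hypothetical examples only; every platform generates its own, so you would swap in the ones your site actually produces.

```python
from __future__ import annotations

import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names: every CMS and platform uses its own,
# so replace these with the ones your site actually generates.
SEARCH_PARAMS = {"q", "search"}
PAGINATION_PARAMS = {"page"}
FACET_PARAMS = {"colour", "size", "brand", "sort"}  # filters and sort orders
PAGINATION_PATH = re.compile(r"/page/\d+/?$")  # e.g. /blog/page/4/


def classify_url(url: str) -> set[str]:
    """Label a URL with the crawl bloat patterns it matches (empty set if none)."""
    parts = urlsplit(url)
    params = set(parse_qs(parts.query, keep_blank_values=True))
    labels = set()
    if params & SEARCH_PARAMS:
        labels.add("search")
    if params & PAGINATION_PARAMS or PAGINATION_PATH.search(parts.path):
        labels.add("pagination")
    if params & FACET_PARAMS:
        labels.add("facet")
    return labels


def summarise(urls: list[str]) -> Counter:
    """Count how many crawled URLs fall into each pattern ("clean" if none)."""
    counts: Counter = Counter()
    for url in urls:
        counts.update(classify_url(url) or {"clean"})
    return counts
```

A URL can carry more than one label (a paginated, filtered listing, for instance), which is why each one returns a set rather than a single category.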

Tools For Combatting Crawl Bloat

If you have established that you definitely have a crawl bloat issue and you need to resolve this for the good of your SEO campaign, there are lots of tools available. The trick is determining which you should use for which parts of your site.

A key point I made during my presentation was not to throw as much of the proverbial at the wall as possible in the hope that it sticks and resolves the issue. This can actually be counterproductive in a lot of cases.

A good example I gave in my presentation was that of using robots.txt rules alongside meta robots noindex tags. Remember that for Google to honour a noindex tag, it first has to be allowed to crawl the page and see it.

So, by blocking crawl paths in your robots.txt file, you may not only be stemming the flow of internal PageRank but also stopping the search engines from seeing that you no longer want the page(s) in question to be indexed. Choose your crawl bloat tools with great care, as they are usually very powerful and have the potential to cripple your site if misused.
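One way to catch that robots.txt and noindex clash before it bites is to check every URL carrying a noindex tag against your robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`; the function name is hypothetical, and you would supply the list of noindexed URLs from your own crawl. Note that this parser does simple prefix matching and does not implement Google's `*` and `$` wildcard extensions, so treat it as a first pass rather than an exact replica of Googlebot.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def noindex_robots_conflicts(robots_txt: str, noindexed_urls: list[str],
                             user_agent: str = "Googlebot") -> list[str]:
    """Return noindexed URLs that robots.txt blocks for the given user agent.

    A crawler that is not allowed to fetch these pages never sees their
    noindex tag, so they may stay in the index.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindexed_urls if not parser.can_fetch(user_agent, url)]
```

Any URL this returns needs a decision: either lift the robots.txt block so the noindex can be seen, or accept that the page may linger in the index.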

The most common techniques at your disposal are:

  • NoIndex Meta Robots Tag
  • URL Parameters Function (within Google Search Console)
  • XML Sitemaps
  • Robots.txt
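Of those techniques, XML sitemaps are the easiest to keep consistent with the rest: a sitemap that lists only the URLs you actually want indexed avoids sending Google mixed signals. The Python sketch below assumes each page is a hypothetical dict with `url`, an optional `noindex` flag and an optional `canonical` URL, and skips anything noindexed or canonicalised elsewhere. It uses the standard sitemaps.org namespace and the standard library's `xml.etree.ElementTree`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[dict]) -> str:
    """Build sitemap XML listing only URLs we want indexed.

    Each page is a hypothetical dict such as
    {"url": ..., "noindex": bool, "canonical": str}; a page is skipped if it
    is noindexed or canonicalised to a different URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = page["url"]
        if page.get("noindex") or page.get("canonical", url) != url:
            continue
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Generating the sitemap from the same page data that drives your noindex and canonical tags keeps the three signals from drifting apart over time.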

You can, of course, gauge your efforts using some very handy tools that Google itself provides via Search Console. From crawl stats to the robots.txt tester, sitemaps and URL parameters, these tools offer strong hints about exactly what Google wants you to do, so be sure not to ignore them.

Pruning for SEO

Although I went for the term ‘pruning’, the second half of my BrightonSEO talk was ultimately about good SEO housekeeping – something we should all be doing regularly anyway.

This practice should be carried out once crawl bloat is under control, ensuring we only index content with the strong engagement metrics mentioned earlier. By auditing regularly and noindexing poor content, we can improve the way the search engines perceive our site.

What kind of things are we looking to “prune” out of the index?

  • Spun Content
  • Over Optimised Content
  • Sparse Articles
  • Saturation
  • Excessive Blogging
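A first, automated pass over the index can shortlist sparse and unengaging pages before anyone reads them. The Python sketch below assumes you have joined word counts with analytics data per URL; the `PageStats` record and the three thresholds are hypothetical and should be tuned per site rather than treated as rules. It cannot detect spun or over-optimised copy, which still needs a human read.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    word_count: int
    organic_sessions: int    # over an assumed window, e.g. the last 12 months
    avg_time_on_page: float  # seconds


# Hypothetical thresholds: tune them per site.
MIN_WORDS = 300
MIN_SESSIONS = 10
MIN_TIME_ON_PAGE = 20.0


def pruning_candidates(pages: list[PageStats]) -> list[tuple[str, list[str]]]:
    """Return (url, reasons) for every page that trips at least one threshold."""
    results = []
    for page in pages:
        reasons = []
        if page.word_count < MIN_WORDS:
            reasons.append("sparse")
        if page.organic_sessions < MIN_SESSIONS:
            reasons.append("little traffic")
        if page.avg_time_on_page < MIN_TIME_ON_PAGE:
            reasons.append("low engagement")
        if reasons:
            results.append((page.url, reasons))
    return results
```

Keeping the reasons alongside each URL makes the shortlist easier to hand over: a page flagged on all three counts is a stronger pruning candidate than one that is merely short.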

Although much of the above was reasonable SEO practice not too long ago, the search engines have been wise to it for some time now. Don't compromise all your efforts at refining crawl bloat by leaving lots of poor-quality and likely unengaging content in the index; you will almost certainly be punished for this algorithmically.

Once you are sure you have done all you can technically to improve Google's experience of crawling your site, why not pass your indexable pages to your content team for a deep, creative audit?

Which techniques are available for pruning?

  • 410
  • 404
  • 301
  • NoIndex Meta Robots Tag
  • Canonical Tag
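Choosing between those options depends on whether users still need the page, whether a better equivalent exists and whether the content is gone for good. The Python sketch below encodes one possible set of decision rules; the `PruneCandidate` fields and the ordering of the checks are our own hypothetical simplification, not a fixed recipe, and real decisions will also weigh things like inbound links.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PruneCandidate:
    url: str
    keep_for_users: bool                # still useful to visitors, e.g. a filtered listing
    replacement: Optional[str] = None   # closest equivalent page, if one exists
    duplicate_of: Optional[str] = None  # preferred version of near-identical content
    gone_for_good: bool = True          # content will not come back


def choose_action(page: PruneCandidate) -> str:
    """Pick a pruning technique under these hypothetical rules."""
    if page.keep_for_users and page.duplicate_of:
        # Users need it, but it duplicates a preferred page.
        return f"canonical -> {page.duplicate_of}"
    if page.keep_for_users:
        # Users need it, but it should not be in the index.
        return "noindex"
    if page.replacement:
        # Page goes, and its visitors are sent to the closest equivalent.
        return f"301 -> {page.replacement}"
    # Page goes with no equivalent: 410 if permanent, 404 if it may return.
    return "410" if page.gone_for_good else "404"
```

Checking "do users still need this?" first reflects the point made earlier about choosing tools with care: removing or redirecting a page that visitors rely on would fix an index problem by creating a user problem.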

Search Quality Evaluator Guidelines

The last section of my talk was certainly not one I had planned when I started preparing my presentation back in autumn 2016, but the Fred update came at the perfect time.

With many of my points relating to content quality and what search engines and users should be presented with, the new search quality evaluator guidelines and the latest quality updates all but endorsed just that.

I personally do not believe it is a coincidence that these events happened within a month of each other, and there is now clear advice about what "quality" means. Coincidence or not, the message is clear, and we should all be taking advantage of a rare occurrence: Google telling us exactly how to create and optimise content that will rank well.

I finished my talk on combatting crawl bloat and pruning content effectively by urging SEOs not to ruin all their hard work on these issues with a shoddy link profile. Spammy links, and links pointing at some of the poorer content we have discussed, can only harm progress and could significantly dilute the impact of fixing these on-site problems.

My Highlight of BrightonSEO

Other than my fellow speakers from the Crawl & Indexation session, Janet Plumpton and Sean Butcher, who gave awesome presentations on XPath and canonical tags respectively, the highlight of the event for me was without doubt the keynote speaker, Rory Sutherland. His talk on whether online marketers have created a boring culture was fascinating, particularly his point that by "optimising" for any one thing, we naturally neglect other areas (usually the customer), and that this should be considered when looking at the wider picture at any agency.

If the theme of crawl bloat and pruning has left you wondering whether this could be an issue for your e-commerce website or SEO performance, then don't hesitate to get in touch with Banc. Our team would be only too happy to answer any questions you have about this, PPC, CRO or Content Marketing for your brand. Call us on 0345 459 0558 or drop us a line on [email protected]