The challenge for webmasters everywhere is knowing exactly what Googlebot is doing on their website. Googlebot’s primary job whenever it enters a website is to crawl a specific number of pages set by the website’s crawl budget. After crawling, it saves those pages to Google’s database.
Understanding Googlebot’s movements on your site is one of the most effective ways to improve your technical and onsite SEO. Log file analysis helps you improve your SEO, which can lead to higher rankings, more traffic, and improved conversions and sales. But what exactly is log file analysis?
What is Log File Analysis
Log file analysis is the process of downloading your log files from your server and opening them in a log file analysis tool like Screaming Frog or Splunk. The tool lets you see all the information about the “hits” on your website, both bot and human, so you can make informed and effective SEO decisions that help move your website toward the first page of Google SERPs.
Although log file analysis is an arduous undertaking, it helps SEO specialists discover important technical SEO problems that are hard to find any other way. The data in log files is accurate and useful for webmasters and SEOs who want to understand how search engine crawlers move through their websites and what information those crawlers store in their database. But before we get into the process of analyzing log files, we must first understand the types of logs that are used.
Log File Types
Log files come in a few common types. The most common one is Apache. Others include Elastic Load Balancing and W3C, which is common among users of Kibana. The last type is custom log files, which are usually seen on larger sites. So, now that we know the types, what do these log files look like?
They’re commonly made up of 5 parts:
First is the URL of each page the crawler visited.
Second is the timestamp: the date and time the crawler made the request.
Third is the remote host, or IP address.
Fourth is the response/status code of the page the crawler visited.
Last is the user agent. For us SEOs, the most important user agent is Googlebot.
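To make the five parts concrete, here is a minimal Python sketch that pulls them out of a single log line. It assumes the Apache "combined" log format; your server may be configured differently, so treat the pattern as a starting point. The sample line and the names COMBINED, parse_line, and SAMPLE are made up for illustration.

```python
import re
from datetime import datetime

# Pattern for the Apache "combined" log format (an assumption; adjust it
# to match the LogFormat your own server actually uses).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the five parts of one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    return {
        "url": match.group("url"),
        "timestamp": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "ip": match.group("ip"),
        "status": int(match.group("status")),
        "user_agent": match.group("agent"),
    }


# Hypothetical log line, written by hand for this example.
SAMPLE = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /blog/post HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Lines that do not fit the pattern come back as None, so you can count them and see how much of your log uses a different format.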
How to Analyze Log Files
Once you’ve collected all this data, the next step is to analyze it to understand how Googlebot and other crawlers move around your website. There are many tools you can use, such as Splunk or Loggly, and you can even analyze your log files in Microsoft Excel. The first step is to pick a tool that you’re comfortable with. Other SEOs I know rely primarily on Splunk, while I mostly use Screaming Frog Log Analyzer, which is what I used in the screenshots above to open the log files of the SEO Hacker blog. Here’s what it looks like:
After opening your log files, what do you analyze? Start with the top pages: the pages that receive the most requests from Googlebot. At the same time, check which types of Googlebot are entering your site. It could be Googlebot Smartphone, Googlebot Mobile, Googlebot Images, or the standard Googlebot. Each should be visiting the right pages, and those pages should respond properly and without errors.
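If you prefer a script to a tool, the two checks above (top pages and Googlebot types) can be sketched in a few lines of Python. This assumes you already have each hit as a (URL, user agent) pair. The function names and sample hits are hypothetical, and the user-agent matching is a rough heuristic rather than an official mapping; it does not separate the older Googlebot Mobile from Googlebot Smartphone.

```python
from collections import Counter


def googlebot_type(user_agent):
    """Label a hit by Googlebot type using user-agent substrings (heuristic)."""
    ua = user_agent.lower()
    if "googlebot-image" in ua:
        return "Googlebot Images"
    if "googlebot" not in ua:
        return None  # not Googlebot at all
    if "android" in ua or "mobile" in ua:
        return "Googlebot Smartphone"
    return "Googlebot (standard)"


def summarize(hits, top_n=10):
    """Return the most requested pages and a count of hits per Googlebot type."""
    pages = Counter()
    bots = Counter()
    for url, user_agent in hits:
        kind = googlebot_type(user_agent)
        if kind is None:
            continue  # skip human visitors and other crawlers
        pages[url] += 1
        bots[kind] += 1
    return pages.most_common(top_n), bots


DESKTOP = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SMARTPHONE = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.0.0 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Made-up hits for illustration only.
SAMPLE_HITS = [
    ("/", DESKTOP),
    ("/blog/", SMARTPHONE),
    ("/blog/", DESKTOP),
    ("/logo.png", "Googlebot-Image/1.0"),
    ("/blog/", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]

if __name__ == "__main__":
    top_pages, bots = summarize(SAMPLE_HITS)
    print(top_pages)
    print(dict(bots))
```

Keep in mind that a user agent can be faked, so a hit that claims to be Googlebot is not guaranteed to come from Google.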
Crawl Budget and Page Optimization
One of the main objectives of log file analysis is to learn more about Googlebot and optimize your crawl budget. Crawl budget refers to the number of pages Googlebot will crawl on your website in a given period. Here are the best ways to make it work for you and establish a more efficient SEO process:
Evaluate the timeframe, speed, resources, and traffic frequency
Page traffic is one of the statistics we constantly track when assessing our SEO strategies. This means checking traffic frequency, which tends to spike when new or viral content is published and leads Googlebot to crawl the site more often. It also means taking into account the specific timeframes in which Googlebot is active. Looking at crawl activity by month, week, and day will show you when site crawls happen, so you can plan your strategies around them.
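As a rough illustration of looking at crawl activity by day and week, the Python sketch below groups Googlebot request timestamps into daily and weekly counts. It assumes you have already extracted the timestamps from your log files. The function names and sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime


def crawls_per_day(timestamps):
    """Count requests per calendar day, sorted by date."""
    counts = Counter(ts.date() for ts in timestamps)
    return sorted(counts.items())


def crawls_per_week(timestamps):
    """Count requests per (ISO year, ISO week number), sorted chronologically."""
    counts = Counter(tuple(ts.isocalendar())[:2] for ts in timestamps)
    return sorted(counts.items())


# Made-up Googlebot request times for illustration only.
SAMPLE_TIMES = [
    datetime(2023, 10, 9, 8, 0),
    datetime(2023, 10, 9, 14, 30),
    datetime(2023, 10, 10, 9, 15),
    datetime(2023, 10, 16, 7, 45),
]

if __name__ == "__main__":
    for day, count in crawls_per_day(SAMPLE_TIMES):
        print(day, count)
    for week, count in crawls_per_week(SAMPLE_TIMES):
        print(week, count)
```

Comparing these counts against the dates you published new content is one simple way to see whether Googlebot crawls more often after a publication.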
Focus on Mobile
Mobile search has become one of the most important elements in SEO. With mobile internet becoming more accessible to a wide audience, it is important to take advantage of this traffic. This means optimizing your website for mobile users, and that includes using responsive design and AMP, which allow better viewing and faster loading speeds. The Google Speed Update also made mobile loading speed a ranking factor, which means that Googlebot might be taking your mobile performance into account.
Navigation not only lets visitors move through all of your web pages but also lets Googlebot conduct its site crawl. Internal links allow these web pages to get crawled, so they can appear in search and gain more traffic. We make sure to link internally to many of our previous posts, and the practice has brought more traffic to our website and more people to our content.
Assessing Page Errors
Monitoring your site crawl also allows you to find pages that are not responding or that return 301 redirects or 400- and 500-level errors. Each of these pages is worth a look, as you may need to redirect or fix them so that Googlebot crawls the right locations. Finding them also raises the question of how these issues can be resolved. Cleaning them up will only benefit your website’s traffic and help your SEO strategy take effect more efficiently.
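A small Python sketch of this check might look like the following. It assumes each hit has already been reduced to a (URL, status code) pair, and it groups the URLs by the kind of problem the status code suggests. The function name, labels, and sample data are hypothetical.

```python
from collections import defaultdict


def pages_needing_attention(hits):
    """Group URLs by problem type based on the status code returned to the crawler."""
    problems = defaultdict(set)
    for url, status in hits:
        if status == 301:
            problems["301 redirect"].add(url)
        elif 400 <= status < 500:
            problems["4xx client error"].add(url)
        elif 500 <= status < 600:
            problems["5xx server error"].add(url)
    # Sort each group so the report is stable from run to run.
    return {label: sorted(urls) for label, urls in problems.items()}


# Made-up hits for illustration only.
SAMPLE_HITS = [
    ("/", 200),
    ("/old-post", 301),
    ("/missing", 404),
    ("/missing", 404),
    ("/checkout", 500),
    ("/about", 200),
]

if __name__ == "__main__":
    for label, urls in pages_needing_attention(SAMPLE_HITS).items():
        print(label, urls)
```

Each URL appears once per group no matter how many times it was hit, which keeps the list short enough to work through by hand.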
Removing Pages from the Index
Removing unneeded web pages from the index and taking out duplicate content helps your crawl budget, because Googlebot spends fewer requests on pages that don’t matter. It also tidies up navigation, so users are led to the right places. This may help you find missing content as well, allowing those missed pages to get crawled by Googlebot and receive more traffic.
Every SEO expert and webmaster wants to know what’s going on on their website. Log file analysis allows us to understand how Google views our site and which pages the crawler is focusing on. Know what’s going on, make an effort to check all the resources and pages, clean up the errors you see, and repeat. An optimized and authoritative site is what Google is looking for. Is your site one of them?