What Is a Robots.txt File and Why Does It Matter to Your School?
July 12, 2020
If you're getting serious about your SEO (search engine optimization), then you've likely ticked the basics off the list: optimizing your online content, making sure your information architecture is organized, and checking that Google is crawling and indexing your website correctly.
Once this is done, you'll be looking to level up your SEO efforts, and it's time to start exploring the world of robots.txt files. A robots.txt file is a small text file, placed on your website, that tells search engines where they can and can't go within your site. It's one of the easiest parts of a website to misconfigure, and it's often one of the biggest reasons a website doesn't appear in search results.
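To make this concrete, here is a minimal sketch of what a robots.txt file looks like. The paths and domain are hypothetical examples, not rules you should copy as-is:

```
# Rules below apply to all crawlers
User-agent: *
# Keep crawlers out of a staging area (hypothetical path)
Disallow: /staging/
# Everything else remains crawlable
Allow: /

# Optionally point crawlers at your sitemap
Sitemap: https://www.yourdomain.com/sitemap.xml
```

Each `User-agent` line names which crawler the following rules apply to (`*` means all of them), and each `Disallow` or `Allow` line describes a path prefix on your site.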
Does my school website need a robots.txt file?
In short: no, you don't, but it can be a really good idea. Most smaller websites can be crawled and indexed without any special configuration, and you can monitor this through Google Search Console. If you are satisfied with Googlebot automatically crawling and indexing every page of your website, then this is probably one optimization you can leave alone.
However, robots.txt files, along with giving you control over where bots can crawl, can address a number of other potential issues on your website. Here's a list from Ahrefs of things a robots.txt file can help with:
- Preventing the crawling of duplicate content
- Keeping sections of a website private (e.g. your staging site)
- Preventing the crawling of internal search results pages
- Preventing server overload
- Preventing Google from wasting “crawl budget”
- Preventing images, videos, and resource files from appearing in Google search results.
For schools, there are likely pages or assets on your website that are intended to be viewed by your school community only, rather than the general public. Robots.txt files are one way of protecting the sections of your website that you would rather not be available for others to find on a search engine (for example, a login page to an internal portal, or images within a private gallery).
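As a sketch of how that might look for a school site, the following robots.txt covers the kinds of sections mentioned above. All of the paths are hypothetical examples; your site's URL structure will differ:

```
User-agent: *
# Hypothetical portal login page, intended for your school community only
Disallow: /portal/login
# Hypothetical private photo gallery
Disallow: /galleries/private/
# Internal search results pages (hypothetical path)
Disallow: /search
```

Keep in mind that robots.txt only asks well-behaved crawlers to stay out; it doesn't password-protect anything, so genuinely sensitive content should also sit behind a login.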
How can I implement a robots.txt file on my school website?
Google provides some basic guidelines for creating a robots.txt file using a text editor. You can read those guidelines here. Once you've created your file, it's incredibly important that you thoroughly test its syntax and behavior to ensure it is properly blocking web crawlers from the intended assets. As we mentioned earlier, robots.txt is unfortunately one of the easiest files to get wrong, and you can end up unintentionally blocking important content and hurting your SEO. One way to test your file is with Google Search Console's robots.txt testing tool.
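If you'd rather check your rules locally before deploying, one option is a short script using Python's built-in `urllib.robotparser` module. This is a minimal sketch; the rules and URLs below are hypothetical examples:

```python
# Sketch: check robots.txt rules locally with Python's standard library.
from urllib.robotparser import RobotFileParser

# Hypothetical rules, supplied as lines of text rather than fetched from a site
rules = [
    "User-agent: *",
    "Disallow: /portal/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a generic crawler ("*") may fetch specific URLs
print(parser.can_fetch("*", "https://www.yourdomain.com/portal/login"))  # False: blocked
print(parser.can_fetch("*", "https://www.yourdomain.com/admissions"))    # True: allowed
```

Running a few `can_fetch` checks like this against the pages you care about is a quick way to confirm a draft file blocks only what you intend.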
Once you have created your file, you can either install it yourself or send instructions to your web developer to place it at the root of the website to which it should apply. Once it's in place, you should be able to access it at www.yourdomain.com/robots.txt. (Note: you can also apply a robots.txt file to a subdomain of your website, such as blog.yourdomain.com/robots.txt.)
Robots.txt, when correctly implemented, can be a great tool for directing search engines around your school website. However, because it's easy to get wrong, it's important that you thoroughly understand what you are implementing when you add a robots.txt file to your website. There are some great resources available to school IT professionals that can help you with creating, implementing, and troubleshooting robots.txt on your website: