
What is a search engine?

Search engines are answer machines. They scour billions of pieces of content and evaluate thousands of factors to determine which content is most likely to answer your query.

Search engines do all of this by discovering and cataloguing all available content on the Internet (web pages, PDFs, images, videos, etc.) via a process known as “crawling and indexing,” and then ordering it by how well it matches the query in a process we refer to as “ranking.” We’ll cover crawling, indexing, and ranking in more detail later during the program.

How do search engines work?

Search engines have three primary functions:

  1. Crawl: Scour the Internet for content, looking over the code/content for each URL they find.
  2. Index: Store and organize the content found during the crawling process. Once a page is in the index, it’s in the running to be displayed as a result to relevant queries.
  3. Rank: Provide the pieces of content that will best answer a searcher’s query, which means that results are ordered by most relevant to least relevant.

What is search engine crawling?

Crawling is the discovery process in which search engines send out a team of robots (known as crawlers or spiders) to find new and updated content. Content can vary — it could be a webpage, an image, a video, a PDF, etc. — but regardless of the format, content is discovered by links.


Googlebot starts out by fetching a few web pages, and then follows the links on those webpages to find new URLs. By hopping along this path of links, the crawler is able to find new content and add it to Google’s index, called Caffeine — a massive database of discovered URLs — to be retrieved later when a searcher is seeking information that the content on that URL is a good match for.

What is a search engine index?

Search engines process and store information they find in an index, a huge database of all the content they’ve discovered and deem good enough to serve up to searchers.

Search engine ranking

When someone performs a search, search engines scour their index for highly relevant content and then order that content in the hopes of solving the searcher’s query. This ordering of search results by relevance is known as ranking. In general, you can assume that the higher a website is ranked, the more relevant the search engine believes that site is to the query.

It’s possible to block search engine crawlers from part or all of your site, or instruct search engines to avoid storing certain pages in their index. While there can be reasons for doing this, if you want your content found by searchers, you have to first make sure it’s accessible to crawlers and is indexable. Otherwise, it’s as good as invisible.

Crawling: Can search engines find your pages?

One way to check your indexed pages is “site:yourdomain.com”, an advanced search operator. Head to Google and type “site:yourdomain.com” into the search bar. This will return the results Google has in its index for the specified site.

The number of results Google displays (“About XX results”) isn’t exact, but it does give you a solid idea of which pages are indexed on your site and how they are currently showing up in search results.

What is SEO?

SEO stands for “search engine optimization.” It’s the practice of increasing both the quality and quantity of website traffic, as well as exposure to your brand, through non-paid (also known as “organic”) search engine results.

Despite the acronym, SEO is as much about people as it is about search engines themselves. It’s about understanding what people are searching for online, the answers they are seeking, the words they’re using, and the type of content they wish to consume. Knowing the answers to these questions will allow you to connect with the people who are searching online for the solutions you offer.

If knowing your audience’s intent is one side of the SEO coin, delivering it in a way search engine crawlers can find and understand is the other. In this guide, expect to learn how to do both.

Which search results are “organic”?

Organic search results are the ones that are earned through effective SEO, not paid for (i.e. not advertising). These used to be easy to spot – the ads were clearly labeled as such and the remaining results typically took the form of “10 blue links” listed below them. But with the way search has changed, how can we spot organic results today?

Today, search engine results pages — often referred to as “SERPs” — are filled with both more advertising and more dynamic organic results formats (called “SERP features”) than we’ve ever seen before. Some examples of SERP features are featured snippets (or answer boxes), People Also Ask boxes, image carousels, etc. New SERP features continue to emerge, driven largely by what people are seeking.

For example, if you search for “Denver weather,” you’ll see a weather forecast for the city of Denver directly in the SERP instead of a link to a site that might have that forecast. And, if you search for “pizza Denver,” you’ll see a “local pack” result made up of Denver pizza places. Convenient, right?

It’s important to remember that search engines make money from advertising. Their goal is to better solve searchers’ queries (within SERPs), to keep searchers coming back, and to keep them on the SERPs longer.

Some SERP features on Google are organic and can be influenced by SEO. These include featured snippets (a promoted organic result that displays an answer inside a box) and related questions (a.k.a. “People Also Ask” boxes).

It’s worth noting that there are many other search features that, even though they aren’t paid advertising, can’t typically be influenced by SEO. These features often have data acquired from proprietary data sources, such as Wikipedia, WebMD, and IMDb.

Why SEO is important

While paid advertising, social media, and other online platforms can generate traffic to websites, the majority of online traffic is driven by search engines.

Organic search results cover more digital real estate, appear more credible to savvy searchers, and receive way more clicks than paid advertisements. For example, of all US searches, only ~2.8% of people click on paid advertisements.

In a nutshell: SEO has ~20X more traffic opportunity than PPC on both mobile and desktop.

SEO is also one of the only online marketing channels that, when set up correctly, can continue to pay dividends over time. If you provide a solid piece of content that deserves to rank for the right keywords, your traffic can snowball over time, whereas advertising needs continuous funding to send traffic to your site.

Search engines are getting smarter, but they still need our help.

Optimizing your site will help deliver better information to search engines so that your content can be properly indexed and displayed within search results.

What is on-site SEO?

On-site SEO (also known as on-page SEO) is the practice of optimizing elements on a website (as opposed to links elsewhere on the Internet and other external signals collectively known as “off-site SEO”) in order to rank higher and earn more relevant traffic from search engines. On-site SEO refers to optimizing both the content and HTML source code of a page.

Beyond helping search engines interpret page content, proper on-site SEO also helps users quickly and clearly understand what a page is about and whether it addresses their search query. In essence, good on-site SEO helps search engines understand what a human would see (and what value they would get) if they visited a page, so that search engines can reliably serve up what human visitors would consider high-quality content about a particular search query (keyword).

The ultimate goal of on-site SEO can be thought of as attempting to make it as easy as possible for both search engines and users to:

  • Understand what a webpage is about;
  • Identify that page as relevant to a search query or queries (i.e. a particular keyword or set of keywords);
  • Find that page useful and worthy of ranking well on a search engine results page (SERP).

What are On-Page Ranking Factors for SEO?

On-page ranking factors can have a big impact on your page’s ability to rank if optimized properly. The biggest on-page factors that affect search engine rankings are:

Content of Page

The content of a page is what makes it worthy of a search result position. It is what the user came to see and is thus extremely important to the search engines. As such, it is important to create good content. So what is good content? From an SEO perspective, all good content has two attributes. Good content must supply a demand and must be linkable.

Good content supplies a demand:

Just like the world’s markets, information is affected by supply and demand. The best content is that which does the best job of supplying the largest demand. It might take the form of an XKCD comic that is supplying nerd jokes to a large group of technologists or it might be a Wikipedia article that explains to the world the definition of Web 2.0. It can be a video, an image, a sound, or text, but it must supply a demand in order to be considered good content.

Good content is linkable:

From an SEO perspective, there is no difference between the best and worst content on the Internet if it is not linkable. If people can’t link to it, search engines will be very unlikely to rank it, and as a result the content won’t drive traffic to the given website. Unfortunately, this happens a lot more often than one might think. A few examples of this include: AJAX-powered image slide shows, content only accessible after logging in, and content that can’t be reproduced or shared. Content that doesn’t supply a demand or is not linkable is bad in the eyes of the search engines—and most likely some people, too.

Title Tag

Title tags are the second most important on-page factor for SEO, after content.

URL

Along with smart internal linking, SEOs should make sure that the category hierarchy of the given website is reflected in URLs.

The following is a good example of URL structure:

  • http://www.example.org/games/video-game-history

This URL clearly shows the hierarchy of the information on the page (history as it pertains to video games in the context of games in general). Search engines use this information to determine the relevancy of a given web page. Thanks to the hierarchy, the engines can deduce that the page likely doesn’t pertain to history in general but rather to the history of video games. This makes it an ideal candidate for search results related to video game history. All of this can be inferred without even needing to process the content on the page.

The following is a bad example of URL structure:

  • http://www.imdb.com/title/tt0468569

Unlike the first example, this URL does not reflect the information hierarchy of the website. Search engines can see that the given page relates to titles (/title/) and is on the IMDb domain but cannot determine what the page is about. The reference to “tt0468569” does not directly suggest anything that a web surfer is likely to search for. This means that the information provided by the URL is of very little value to search engines.

URL structure is important because it helps the search engines to understand relative importance and adds a helpful relevancy metric to the given page. It is also helpful from an anchor text perspective because people are more likely to link with the relevant word or phrase if the keywords are included in the URL.

What is a meta title tag?

A title tag is an HTML element that specifies the title of a web page. Title tags are displayed on search engine results pages (SERPs) as the clickable headline for a given result, and are important for usability, SEO, and social sharing. The title tag of a web page is meant to be an accurate and concise description of a page’s content.

Optimal format

Primary Keyword – Secondary Keyword | Brand Name
8-foot Green Widgets – Widgets & Tools | Widget World
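
For reference, here’s what that format looks like in the page’s HTML — a minimal sketch, with the widget and brand names carried over as placeholders from the example above:

<head>
  <title>8-foot Green Widgets – Widgets &amp; Tools | Widget World</title>
</head>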

Optimal title length

Google typically displays the first 50–60 characters of a title tag. If you keep your titles under 60 characters, our research suggests that you can expect about 90% of your titles to display properly. There’s no exact character limit, because characters can vary in width and Google’s display titles max out (currently) at 600 pixels.


Why are title tags important?

Meta title tags are a major factor in helping search engines understand what your page is about, and they are the first impression many people have of your page. Title tags are used in three key places: (1) search engine results pages (SERPs), (2) web browsers, and (3) social networks.

1. Search engine result pages

Your title tag determines (with a few exceptions) your display title in SERPs, and is a search visitor’s first experience of your site. Even if your site ranks well, a good title can be the make-or-break factor in determining whether or not someone clicks on your link.


2. Web browsers

Your title tag is also displayed at the top of your web browser and acts as a placeholder, especially for people who have many browser tabs open. Unique and easily recognizable titles with important keywords near the front help ensure that people don’t lose track of your content.


3. Social networks

Some external websites — especially social networks — will use your title tag to determine what to display when you share that page.

Keep in mind that some social networks (including Facebook and Twitter) have their own meta tags, allowing you to specify titles that differ from your main title tag. This can allow you to optimize for each network, and provide longer titles when/where they might be beneficial.
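
For example, a minimal sketch of such tags: Facebook reads Open Graph markup, and Twitter reads its own card tags (the title text here is a hypothetical placeholder).

<title>8-foot Green Widgets – Widgets &amp; Tools | Widget World</title>
<meta property="og:title" content="Green Widgets That Last a Lifetime">
<meta name="twitter:title" content="Green Widgets That Last a Lifetime (On Sale Now)">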


How do I write a good title tag?

Because title tags are such an important part of both search engine optimization and the search user experience, writing them effectively is a terrific low-effort, high-impact SEO task. Here are critical recommendations for optimizing title tags for search engine and usability goals:

1. Watch your title length

If your title is too long, search engines may cut it off by adding an ellipsis (“…”) and could end up omitting important words. While we generally recommend keeping your titles under 60 characters long, the exact limit is a bit more complicated and is based on a 600-pixel container.

Some characters naturally take up more space. An uppercase “W” is wider than a lowercase “i” or “t”. For example, a title of 77 characters may display in full if it is dominated by narrow letters (like the “ittl” in “Littlest”) and thin pipe characters (“|”), while a title heavy in wide capitals (like “W”) may get cut off after only 42 characters, especially when the next word in the title tag is the full website name.

Try to avoid ALL CAPS titles. They may be hard for search visitors to read, and may severely limit the number of characters Google will display.

Keep in mind that, even within a reasonable length limit, search engines may choose to display a different title than what you provide in your title tag. For example, Google might append your brand to the display title. When that happens, Google may cut off the original text to make room (everything before the “…” is the original text), leaving as few as 35 characters of the original title on display. See more below about how to prevent search engines from rewriting your title tags.

Keep in mind that longer titles may work better for social sharing in some cases, and some titles are just naturally long. It’s good to be mindful of how your titles appear in search results, but there are no penalties for using a long title. Use your judgment, and think like a search visitor.

2. Don’t overdo SEO keywords

While there is no penalty built into Google’s algorithm for long titles, you can run into trouble if you start stuffing your title full of keywords in a way that creates a bad user experience, such as:

Buy Widgets, Best Widgets, Cheap Widgets, Widgets for Sale

Avoid titles that are just a list of keywords or repeat variations of the same keyword over and over. These titles are bad for search users and could get you into trouble with search engines. Search engines understand variations of keywords, and it’s unnecessary and counterproductive to stuff every version of your keyword into a title.

3. Give every page a unique title

Unique titles help search engines understand that your content is unique and valuable, and also drive higher click-through rates. On the scale of hundreds or thousands of pages, it may seem impossible to craft a unique title for every page, but modern CMS and code-based templates should allow you to at least create data-driven, unique titles for almost every important page of your site. For example, if you have thousands of product pages with a database of product names and categories, you could use that data to easily generate titles like:

[Product Name] – [Product Category] | [Brand Name]

Absolutely avoid default titles, like “Home” or “New Page” — these titles may cause Google to think that you have duplicate content across your site (or even across other sites on the web). In addition, these titles almost always reduce click-through rates. Ask yourself: how likely are you to click on a page called “Untitled” or “Product Page”?

4. Put important keywords first

According to Moz’s testing and experience, keywords closer to the beginning of your title tag may have more impact on search rankings. In addition, user experience research shows that people may scan as few as the first two words of a headline. This is why we recommend titles where the most unique aspect of the page (e.g. the product name) appears first. Avoid titles like:

Brand Name | Major Product Category – Minor Product Category – Name of Product

Titles like this example front-load repetitive information and provide very little unique value at first glance. In addition, if search engines cut off a title like this, the most unique portion is the most likely to disappear.

5. Take advantage of your brand

If you have a strong, well-known brand, then adding it to your titles may help boost click-through rates. We generally still recommend putting your brand at the end of the title, but there are cases (such as your home page or about page) where you may want to be more brand-focused. As mentioned earlier, Google may also append your brand automatically to your display titles, so be mindful of how your search results are currently displayed.

6. Write for your customers

While title tags are very important to SEO, remember that your first job is to attract clicks from well-targeted visitors who are likely to find your content valuable. It’s vital to think about the entire user experience when you’re creating your title tags, in addition to optimization and keyword usage. The title tag is a new visitor’s first interaction with your brand when they find it in a search result — it should convey the most positive and accurate message possible.


Why won’t Google use my title tag?

Sometimes, Google may display a title that doesn’t match your title tag. This can be frustrating, but there’s no easy way to force Google to use the title you’ve defined. When this happens, there are three likely explanations…

1. Your title is keyword-stuffed

As discussed above, if you try to stuff your title with keywords (sometimes called “over-optimization”), Google may choose to simply rewrite it. For many reasons, consider rewriting your title to be more useful to search users.

2. Your title doesn’t match the query

If your page is matching for a search query that isn’t well represented in the title, Google may choose to rewrite your display title. This isn’t necessarily a bad thing — no title is going to match every imaginable search — but if your title is being overruled for desirable, high-volume searches, then consider rewriting it to better match those search keywords and their intent.

3. You have an alternate title

In some cases, if you include alternate title data, such as meta tags for Facebook or Twitter, Google may choose to use those titles instead. Again, this isn’t necessarily a bad thing, but if this creates an undesirable display title, you might want to rewrite the alternate title data.

What is a meta description?

The meta description is an HTML attribute that provides a brief summary of a web page. Search engines such as Google often display the meta description in search results, which can influence click-through rates.


Code sample

<meta name="description" content="Design and print outstandingly-premium Business Cards from MOO. Create your own Business Card or use a template. Quality guaranteed and award-winning customer service. | MOO (United States)"/>

Optimal length

Meta descriptions can be any length, but Google generally truncates snippets to ~155–160 characters. It’s best to keep meta descriptions long enough that they’re sufficiently descriptive, so we recommend descriptions between 50–160 characters. Keep in mind that the “optimal” length will vary depending on the situation, and your primary goal should be to provide value and drive clicks.

Optimal format

Meta description tags, while not tied to search engine rankings, are extremely important in gaining user click-through from SERPs. These short paragraphs are a webmaster’s opportunity to “advertise” content to searchers, and searchers’ chance to decide whether the content is relevant and contains the information they’re seeking from their search query.

A page’s meta description should intelligently (read: in a natural, active, non-spammy way) employ the keywords that page is targeting, but also create a compelling description that a searcher will want to click. It should be directly relevant to the page it describes, and unique from the descriptions for other pages.

Google ranking factor?

Google announced in September of 2009 that neither meta descriptions nor meta keywords factor into Google’s ranking algorithms for web search.

Meta descriptions can, however, impact a page’s CTR (click-through rate) on Google, which can in turn positively impact the page’s ability to rank.

For that reason, among others, it’s important to put some effort into meta descriptions.

SEO best practices

Write compelling ad copy

The meta description tag serves the function of advertising copy. It draws readers to a website from the SERP, and thus is a very visible and important part of search marketing. Crafting a readable, compelling description using important keywords can improve the click-through rate for a given webpage. To maximize click-through rates on search engine result pages, it’s important to note that Google and other search engines bold keywords in the description when they match search queries. This bold text can draw the eyes of searchers, so you should match your descriptions to search terms as closely as possible.


Avoid duplicate meta description tags

As with title tags, it’s important that meta descriptions on each page be unique. Otherwise, multiple pages from your site end up showing the same repeated snippet in the SERPs.

One way to combat duplicate meta descriptions is to implement a dynamic and programmatic way to create unique meta descriptions for automated pages. If possible, though, there’s no substitute for an original description that you write for each page.

Don’t include double quotation marks

Any time quotation marks are used in the HTML of a meta description, Google cuts off that description at the quotation mark when it appears on a SERP. To prevent this from happening, your best bet is to remove all non-alphanumeric characters from meta descriptions. If quotation marks are important in your meta description, you can use the HTML entity &quot; rather than literal double quotes to prevent truncation.
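
For instance, a sketch of a description that needs inner quotation marks, using the &quot; entity in place of the literal marks (the description text is hypothetical):

<meta name="description" content="Our &quot;best-selling&quot; widget line, now available in green."/>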

Sometimes it’s okay to not write meta descriptions

Although conventional logic would hold that it’s universally wiser to write a good meta description rather than let the engines scrape a given web page, this isn’t always the case. Use this general rule of thumb to identify whether you should write your own meta description:

If a page is targeting between one and three heavily searched terms or phrases, write your own meta description that targets those users performing search queries including those terms.

If the page is targeting long-tail traffic (three or more keywords), it can sometimes be wiser to let the engines populate a meta description themselves. The reason is simple: When search engines pull together a meta description, they always display the keywords and surrounding phrases that the user has searched for. If a webmaster writes a meta description into the page’s code, what they choose to write can actually detract from the relevance the engines make naturally, depending on the query.

One caveat to intentionally omitting meta description tags:  Keep in mind that social sharing sites like Facebook commonly use a page’s meta description tag as the description that appears when the page is shared on their sites. Without the meta description tag, social sharing sites may just use the first text they can find. Depending on the first text on your page, this might not create a good user experience for people encountering your content via social sharing.


Heads up: Search engines won’t always use your meta description

In some cases, search engines may overrule the meta description a webmaster has specified in the HTML of a page. Precisely when this will happen is unpredictable, but it often occurs when Google doesn’t think the existing meta description adequately answers a user’s query and identifies a snippet from the target page that better matches a searcher’s query.

What is a robots.txt file?

Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as “follow” or “nofollow”).

In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.

Basic format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

Together, these two lines are considered a complete robots.txt file — though one robots file can contain multiple lines of user agents and directives (i.e., disallows, allows, crawl-delays, etc.).

Within a robots.txt file, each group of user-agent directives appears as a discrete set, separated by a line break.


In a robots.txt file with multiple user-agent directives, each disallow or allow rule only applies to the user-agent(s) specified in that particular line break-separated set. If the file contains a rule that applies to more than one user-agent, a crawler will only pay attention to (and follow the directives in) the most specific group of instructions.

Here’s an example:

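A minimal sketch of that kind of file, with hypothetical disallow rules chosen only to illustrate the grouping:

User-agent: Msnbot
Disallow: /private/

User-agent: discobot
Disallow: /

User-agent: Slurp
Disallow: /archives/

User-agent: *
Disallow: /search/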

Msnbot, discobot, and Slurp are all called out specifically, so those user-agents will only pay attention to the directives in their sections of the robots.txt file. All other user-agents will follow the directives in the user-agent: * group.

Example robots.txt:

Here are a few examples of robots.txt in action for a www.example.com site:

Robots.txt file URL: www.example.com/robots.txt
Blocking all web crawlers from all content
User-agent: *
Disallow: /

Using this syntax in a robots.txt file would tell all web crawlers not to crawl any pages on www.example.com, including the homepage.

Allowing all web crawlers access to all content
User-agent: *
Disallow:

Using this syntax in a robots.txt file tells web crawlers to crawl all pages on www.example.com, including the homepage.

Blocking a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /example-subfolder/

This syntax tells only Google’s crawler (user-agent name Googlebot) not to crawl any pages that contain the URL string www.example.com/example-subfolder/.

Blocking a specific web crawler from a specific web page

User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html

This syntax tells only Bing’s crawler (user-agent name Bingbot) to avoid crawling the specific page at www.example.com/example-subfolder/blocked-page.html.

How does robots.txt work?

Search engines have two main jobs:

  1. Crawling the web to discover content;
  2. Indexing that content so that it can be served up to searchers who are looking for information.

To crawl sites, search engines follow links to get from one site to another — ultimately, crawling across many billions of links and websites. This crawling behavior is sometimes known as “spidering.”

After arriving at a website but before spidering it, the search crawler will look for a robots.txt file. If it finds one, the crawler will read that file first before continuing through the page. Because the robots.txt file contains information about how the search engine should crawl, the information found there will instruct further crawler action on this particular site. If the robots.txt file does not contain any directives that disallow a user-agent’s activity (or if the site doesn’t have a robots.txt file), it will proceed to crawl other information on the site.

Other quick robots.txt must-knows:

(discussed in more detail below)

  • In order to be found, a robots.txt file must be placed in a website’s top-level directory.
  • Robots.txt is case sensitive: the file must be named “robots.txt” (not Robots.txt, robots.TXT, or otherwise).
  • Some user agents (robots) may choose to ignore your robots.txt file. This is especially common with more nefarious crawlers like malware robots or email address scrapers.
  • The /robots.txt file is publicly available: just add /robots.txt to the end of any root domain to see that website’s directives (if that site has a robots.txt file!). This means that anyone can see what pages you do or don’t want to be crawled, so don’t use robots.txt to hide private user information.
  • Each subdomain on a root domain uses separate robots.txt files. This means that both blog.example.com and example.com should have their own robots.txt files (at blog.example.com/robots.txt and example.com/robots.txt).
  • It’s generally a best practice to indicate the location of any sitemaps associated with this domain at the bottom of the robots.txt file. Here’s an example:

Sitemap: https://www.example.com/sitemap.xml

Technical robots.txt syntax

Robots.txt syntax can be thought of as the “language” of robots.txt files. There are five common terms you’re likely to come across in a robots file (a combined example follows the list). They include:

  • User-agent: The specific web crawler to which you’re giving crawl instructions (usually a search engine). A list of most user agents can be found here.
  • Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one “Disallow:” line is allowed for each URL.
  • Allow (Only applicable for Googlebot): The command to tell Googlebot it can access a page or subfolder even though its parent page or subfolder may be disallowed.
  • Crawl-delay: How many seconds a crawler should wait before loading and crawling page content. Note that Googlebot does not acknowledge this command, but crawl rate can be set in Google Search Console.
  • Sitemap: Used to call out the location of any XML sitemap(s) associated with this URL. Note this command is only supported by Google, Ask, Bing, and Yahoo.
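
Putting those five terms together, here’s a minimal sketch (the folder names and the 10-second delay are hypothetical):

User-agent: Googlebot
# Googlebot may not crawl anything under /private/...
Disallow: /private/
# ...except this one page, which is explicitly allowed
Allow: /private/public-page.html

User-agent: *
# Ask other crawlers to wait 10 seconds between fetches (ignored by Googlebot)
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml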

Pattern-matching

When it comes to the actual URLs to block or allow, robots.txt files can get fairly complex as they allow the use of pattern-matching to cover a range of possible URL options. Google and Bing both honor two regular expression characters that can be used to identify pages or subfolders that an SEO wants excluded. These two characters are the asterisk (*) and the dollar sign ($).

  • * is a wildcard that represents any sequence of characters
  • $ matches the end of the URL
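
For instance, a couple of common pattern-matching rules (an illustrative sketch; see Google’s documentation for the exact matching behavior):

User-agent: *
# Block any URL containing a question mark (e.g. internal search results)
Disallow: /*?
# Block any URL that ends in .pdf
Disallow: /*.pdf$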

Google offers a great list of possible pattern-matching syntax and examples here.

Where does robots.txt go on a site?

Whenever they come to a site, search engines and other web-crawling robots (like Facebook’s crawler, Facebot) know to look for a robots.txt file. But, they’ll only look for that file in one specific place: the main directory (typically your root domain or homepage). If a user agent visits www.example.com/robots.txt and does not find a robots file there, it will assume the site does not have one and proceed with crawling everything on the page (and maybe even on the entire site). Even if the robots.txt page did exist at, say, example.com/index/robots.txt or www.example.com/homepage/robots.txt, it would not be discovered by user agents and thus the site would be treated as if it had no robots file at all.

In order to ensure your robots.txt file is found, always include it in your main directory or root domain.

Why do you need robots.txt?

Robots.txt files control crawler access to certain areas of your site. While this can be very dangerous if you accidentally disallow Googlebot from crawling your entire site (!!), there are some situations in which a robots.txt file can be very handy.

Some common use cases include:

  • Preventing duplicate content from appearing in SERPs (note that meta robots is often a better choice for this)
  • Keeping entire sections of a website private (for instance, your engineering team’s staging site)
  • Keeping internal search results pages from showing up on a public SERP
  • Specifying the location of sitemap(s)
  • Preventing search engines from indexing certain files on your website (images, PDFs, etc.)
  • Specifying a crawl delay in order to prevent your servers from being overloaded when crawlers load multiple pieces of content at once

If there are no areas on your site to which you want to control user-agent access, you may not need a robots.txt file at all.

Checking if you have a robots.txt file

Not sure if you have a robots.txt file? Simply type in your root domain, then add /robots.txt to the end of the URL. For instance, Moz’s robots file is located at moz.com/robots.txt.

If no .txt page appears, you do not currently have a (live) robots.txt page.

How to create a robots.txt file

If you found you didn’t have a robots.txt file or want to alter yours, creating one is a simple process. This article from Google walks through the robots.txt file creation process, and this tool allows you to test whether your file is set up correctly.

Looking for some practice creating robots files? This blog post walks through some interactive examples.

SEO best practices

  • Make sure you’re not blocking any content or sections of your website you want crawled.
  • Links on pages blocked by robots.txt will not be followed. This means: 1) unless they’re also linked from other search engine-accessible pages (i.e. pages not blocked via robots.txt, meta robots, or otherwise), the linked resources will not be crawled and may not be indexed; 2) no link equity can be passed from the blocked page to the link destination. If you have pages to which you want equity to be passed, use a blocking mechanism other than robots.txt.
  • Do not use robots.txt to prevent sensitive data (like private user information) from appearing in SERPs. Because other pages may link directly to the page containing private information (thus bypassing the robots.txt directives on your root domain or homepage), it may still get indexed. If you want to block your page from search results, use a different method, like password protection or the noindex meta directive (see the sketch after this list).
  • Some search engines have multiple user-agents. For instance, Google uses Googlebot for organic search and Googlebot-Image for image search. Most user agents from the same search engine follow the same rules so there’s no need to specify directives for each of a search engine’s multiple crawlers, but having the ability to do so does allow you to fine-tune how your site content is crawled.
  • A search engine will cache the robots.txt contents, but usually updates the cached contents at least once a day. If you change the file and want to update it more quickly, you can submit your robots.txt URL to Google.
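
As mentioned in the list above, the noindex meta directive is the page-level alternative to robots.txt for keeping a page out of search results. A minimal sketch, placed in the page’s <head>:

<meta name="robots" content="noindex">

Note that a crawler has to be able to fetch the page in order to see this tag, so don’t combine it with a robots.txt block on the same URL.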