`User-Agent: *` allows all search engine spiders to crawl. To address one particular search engine spider, list its name after the key. To address several spiders, repeat the line once for each.

The file name must be in lowercase letters.

wildcard *

Pay attention to the following:

, robots

The corresponding

`Allow: /index.php` allows crawling of the site's index.php.

`Disallow: /*.jpg` prohibits crawling of all JPG files.

The basic syntax of robots.txt

The robots.txt file should be placed in the root directory of the website.

The basic concept of robots.txt

User-Agent: *

For example:

3) Allow key

The site must have a robots.txt file.

The robots.txt file is a file on the website that is meant to be read by search engine spiders. When a spider crawls a site, it first fetches this file and uses its contents to determine which files on the site it may access. The file keeps documents we do not want exposed out of the search engines and so gives effective control over the spider's crawl path, which creates the necessary conditions for a webmaster's SEO work. This is especially useful when a website has just been created and some content is still unfinished and should not yet be indexed.
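To make the crawl decision described above concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT` and the paths being checked are hypothetical and chosen only for illustration; a real spider would fetch the file from the site root instead of reading it from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-Agent: Baiduspider
Disallow: /index.php

User-Agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already held in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A spider asks before fetching each URL path.
for agent, path in [
    ("Baiduspider", "/index.php"),
    ("Googlebot", "/index.php"),
    ("Googlebot", "/private/draft.html"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent} {path}: {verdict}")
```

In this sketch a spider that has its own `User-Agent` group follows that group only, while every other spider falls back to the `*` group.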

1) User-Agent key

In robots.txt, each key is followed by a colon (:), and a space must separate the colon from the value.

The Disallow key specifies the URL paths that search engine spiders are not allowed to crawl.

For example, to block a file on the website:

2) Disallow key

`$` marks the end of the URL.

When you need to block a file completely, combine robots.txt with the robots meta tag.

The value after the key is the name of a specific search engine crawler. For example, Baidu's crawler is Baiduspider and Google's is Googlebot.

said before.

Content items: each one is a `key: value` pair.
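As a rough sketch of the `key: value` format, the following Python function splits one line into its key and value. The helper name `parse_line` is hypothetical, and treating `#` as the start of a comment is an assumption based on common robots.txt practice, not something stated in this article.

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one robots.txt line into (key, value).

    Returns None for blank lines, comment-only lines, and lines
    that contain no colon.
    """
    # Assumption: '#' starts a comment that runs to the end of the line.
    line = line.split("#", 1)[0].strip()
    if not line or ":" not in line:
        return None
    # Split on the first colon only, so values may contain colons.
    key, value = line.split(":", 1)
    return key.strip(), value.strip()


print(parse_line("User-Agent: *"))
print(parse_line("Disallow: /index.php"))
```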

Note: `User-Agent:` must be followed by a space.

It stands for any number of characters.

`Disallow: /index.php` blocks the site's index.php.
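The `*` wildcard and the `$` end marker can be sketched as a translation into a regular expression. The helper names `rule_to_regex` and `rule_matches` are hypothetical, and the matching behaviour shown follows the convention used by the major search engines as I understand it. As far as I am aware, Python's built-in `urllib.robotparser` does plain prefix matching and does not interpret these characters, which is why the sketch does its own matching.

```python
import re


def rule_to_regex(path_rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a compiled regex.

    '*' matches any number of characters; a trailing '$' anchors
    the rule at the end of the URL path.
    """
    anchored = path_rule.endswith("$")
    if anchored:
        path_rule = path_rule[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in path_rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(path_rule: str, url_path: str) -> bool:
    """Return True if the rule applies to the given URL path."""
    # re.match anchors at the start, so a rule is a prefix match
    # unless it ends with '$'.
    return rule_to_regex(path_rule).match(url_path) is not None


print(rule_matches("/*.jpg", "/images/cat.jpg"))
print(rule_matches("/*.jpg$", "/images/cat.jpg?size=1"))
```

Without the trailing `$`, a rule such as `/*.jpg` would also apply to `/images/cat.jpg?size=1`, because it only has to match the beginning of the path.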

Two, the basic format of robots.txt

For example:

The Allow key specifies the URL paths that search engine spiders are allowed to crawl.

We write this:
