Domain age | n/a |
Expiration date | n/a |
SQI (Yandex Site Quality Index) | |
Pages in Google | 51 |
Pages in Yandex | n/a |
Dmoz | No |
Yandex Catalog | No |
Alexa Traffic Rank | No data |
Alexa Country | No data |
Ruskington Garden Centre | The future of great gardening
n/a
n/a
UTF-8
20.97 KB
274
1,945 chars
1,633 chars
Counter | Visitors in 24 hours | Views | Views per visitor |
---|---|---|---|
Google Analytics | No access | No access | n/a |
Error for "ruskingtongardencentre.co.uk".
and will be replenished in 18088 seconds
WHOIS lookup made at 00:59:57 18-Nov-2016
--
Robots.txt Generator - McAnerin International Inc.
Robot Control Code Generation Tool
If you know of a robot that should be added to this list, please contact us and
we will verify and add it.
Newest Additions: XML Sitemap Auto Discovery directive
Recent Additions: Baidu, image and blog crawlers, Alexa/wayback, and
crawl-delay directives.
Default - All Robots are: Allowed / Refused
Crawl-Delay: Default - No Delay / 5 / 10 / 20 / 60 / 120 Seconds
Sitemap: (leave blank for none)
Specific Search Robots: Google (googlebot), MSN Search (msnbot), Yahoo
(yahoo-slurp), Ask/Teoma (teoma), GigaBlast (gigabot), Scrub The Web (scrubby),
DMOZ Checker (robozilla), Nutch (nutch), Alexa/Wayback (ia_archiver), Baidu
(baiduspider) - each can be set to Same as Default, Allowed, or Refused
Specific Special Bots: Google Image (googlebot-image), Yahoo MM
(yahoo-mmcrawler), MSN PicSearch (psbot), SingingFish (asterias), Yahoo Blogs
(yahoo-blogs/v3.9) - each can be set to Same as Default, Allowed, or Refused
Restricted Directories: the path is relative to root and must contain
a trailing "/"
# robots.txt generated at http://www.mcanerin.com
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: http://www.ruskingtongardencentre.co.uk/sitemap.xml
Now, copy and paste this text into a blank text file called "robots.txt"
(don't forget the "s" on the end of "robots") and put it in your root
directory. Like all other files on your server, make sure its
permissions are set so that visitors (such as search engines) can read
it.
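Before uploading, you can sanity-check the rules with Python's standard-library robots.txt parser. This is a minimal sketch of a simplified version of the generated file above (the empty Disallow: line is dropped, since urllib's parser applies rules in file order rather than by specificity):

```python
# Check a robots.txt's rules locally with the standard library.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.ruskingtongardencentre.co.uk/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Anything outside /cgi-bin/ should be crawlable by any robot.
print(rp.can_fetch("googlebot", "/index.html"))      # True
print(rp.can_fetch("googlebot", "/cgi-bin/search"))  # False
```

If the second call unexpectedly returns True, the file you pasted is not saying what you think it is.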
This Generator is Copyright © 2004-2007 Ian McAnerin & McAnerin Networks
Inc.
Introduction to Robots.txt
The robots.txt is a very simple text file that is placed in your root
directory. An example would be www.yourdomain.com/robots.txt. This file
tells search engines and other robots which areas of your site they are
allowed to visit and index.
You can ONLY have one robots.txt on your site and ONLY in the root
directory (where your home page is):
OK: www.yourdomain.com/robots.txt
BAD - Won't work: www.yourdomain.com/subdirectory/robots.txt
All major search engine spiders respect this, and naturally most spambots
(email collectors for spammers) do not. If you truly want security on your
site, you will have to actually put the files in a protected directory,
rather than trusting the robots.txt file to do the job. It's guidance for
robots, not security from prying eyes.
What does a Robots.txt look like?
At its most simple, a robots.txt file looks like this:
User-agent: *
Disallow:
This one tells all robots (user agents) to go anywhere they want
(disallow nothing).
This one, on the other hand, keeps out all compliant robots:
User-agent: *
Disallow: /
As you can see, the only difference between them is a single slash ("/").
But if you accidentally use that slash when you didn't mean to, you could
find your search engine rankings disappear. Be very careful.
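The one-character difference can be verified directly with Python's standard-library parser (a minimal sketch):

```python
from urllib.robotparser import RobotFileParser

open_rp = RobotFileParser()
open_rp.parse(["User-agent: *", "Disallow:"])      # disallow nothing

closed_rp = RobotFileParser()
closed_rp.parse(["User-agent: *", "Disallow: /"])  # disallow everything

print(open_rp.can_fetch("*", "/any/page.html"))    # True
print(closed_rp.can_fetch("*", "/any/page.html"))  # False
```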
One important thing to know if you are creating your own robots.txt file
is that although the wildcard (*) is used in the user-agent line, it is
not allowed in the disallow line. For example, you can't have something
like:
# Broken robots.txt - can't use the * symbol in the disallow line, even if
you really want to and it makes sense to have one (Google and MSN are an
exception to this - more information below)
User-agent: *
Disallow: /presentations/*.ppt
Here is the official information on the subject: RobotsTxt.org
You may also be interested in:
robots.txt validator
and Robot Cop (Server module that enforces bot behaviour)
UPDATE: If you use Google Sitemaps (and you should), they have now
included a robots.txt validator in it - which will make certain that your
robots.txt file is understood properly by Google.
Pre-Made Robots.txt Files
If you want a simple file already pre-made and ready to drop into your
website root, you can get them here (right click and choose "save as"):
Allow All Robots
Refuse All Robots
Allow All Robots everywhere EXCEPT the cgi-bin and the images directory
Only Allow Known Major Search Engines
(note: this will disallow some good robots used by some directories to
check your listings - be careful)
After you upload these to your server, make sure you set the permissions
on the file so that visitors (like search engines) can read it.
If you need more control than this, there is a free robots.txt generator
at the top of this page that should help you out.
This is some commercial software helpful to people with very complicated
robots.txt needs: RoboGen
Major Known Spiders
Googlebot (Google), Googlebot-Image (Google Image Search), MSNBot (MSN),
Slurp (Yahoo), Yahoo-Blogs, Mozilla/2.0 (compatible; Ask Jeeves/Teoma),
Gigabot (Gigablast), Scrubby (Scrub The Web), Robozilla (DMOZ)
Search Engine Specific Commands
Google
Google allows the use of asterisks. Disallow patterns may include "*" to
match any sequence of characters, and patterns may end in "$" to indicate
the end of a name. To remove all files of a specific file type (for
example, to include .jpg but not .gif images), you'd use the following
robots.txt entry:
User-agent: Googlebot-Image
Disallow: /*.gif$
This applies to both the googlebot and googlebot-image spiders.
Source: http://www.google.com/webmasters/remove.html
Apparently does NOT support the crawl-delay command.
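Google's documented pattern semantics ("*" matches any sequence of characters, a trailing "$" anchors the end of the name) can be approximated with a short translation to a regular expression. This is a sketch of the behaviour described above, not Google's actual matcher:

```python
import re

def google_pattern_matches(pattern: str, path: str) -> bool:
    """Approximate Google's robots.txt pattern matching:
    '*' matches any sequence of characters, and a trailing
    '$' anchors the pattern to the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Translate the robots.txt pattern into a regex, escaping
    # everything except the wildcard.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(google_pattern_matches("/*.gif$", "/images/photo.gif"))  # True
print(google_pattern_matches("/*.gif$", "/photo.jpg"))         # False
```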
Yahoo
Yahoo also has a few specific commands, including the Crawl-delay: xx
instruction, where "xx" is the minimum delay in seconds between successive
crawler accesses. Yahoo's default crawl-delay value is 1 second. If the
crawler rate is a problem for your server, you can set the delay up to
5 seconds, 20 seconds, or whatever value is comfortable for your server.
Setting a crawl-delay of 20 seconds for Yahoo-Blogs/v3.9 would look
something like:
User-agent: Yahoo-Blogs/v3.9
Crawl-delay: 20
Source: http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html
Ask / Teoma
Supports the crawl-delay command.
MSN Search
Supports the crawl-delay command.
Also allows wildcard behavior:
User-agent: msnbot
Disallow: /*.[file extension]$
(the "$" is required, in order to declare the end of the file name)
Examples:
User-agent: msnbot
Disallow: /*.PDF$
Disallow: /*.jpeg$
Disallow: /*.exe$
Source: http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm
Why do I want a Robots.txt?
There are several reasons you would want to control a robot's visit to your
site:
It saves your bandwidth - the spider won't visit areas where there is no
useful information (your cgi-bin, images, etc)
It gives you a very basic level of protection - although it's not very
good security, it will keep people from easily finding stuff you don't
want readily accessible via search engines. They actually have to visit
your site and go to the directory instead of finding it on Google, MSN,
Yahoo or Teoma.
It cleans up your logs - every time a search engine visits your site it
requests the robots.txt, which can happen several times a day. If you
don't have one it generates a "404 Not Found" error each time. It's hard
to wade through all of these to find genuine errors at the end of the
month.
It can prevent spam and penalties associated with duplicate content.
Let's say you have a high speed and low speed version of your site, or a
landing page intended for use with advertising campaigns. If this
content duplicates other content on your site you can find yourself in
ill-favor with some search engines. You can use the robots.txt file to
prevent the content from being indexed, and therefore avoid issues. Some
webmasters also use it to exclude "test" or "development" areas of a
website that are not ready for public viewing yet.
It's good programming policy. Pros have a robots.txt. Amateurs don't.
Which group do you want your site to be in? This is more of an ego/image
thing than a "real" reason, but in competitive areas or when applying for
a job it can make a difference. Some employers may consider not hiring a
webmaster who didn't know how to use one, on the assumption that they
may not know other, more critical things, as well. Many feel it's
sloppy and unprofessional not to use one.
Robots.txt FAQ - Issues, Facts and Fiction
By itself, a robots.txt file is harmless and actually beneficial. However,
its job is to tell a search engine to keep away from parts of your
website. If you misconfigure it, you can accidentally prevent your site
from being spidered and indexed.
This has happened to people both due to an error in the robots.txt file
and also after a site redesign where the directory structure of the site
has changed and the robots.txt has not been updated. Always check the
robots.txt after a major site redesign.
A robots.txt file and, for that matter, the robots metatag (related: free
robots meta tag generator) have NO EFFECT on speeding up the spidering and
indexing of a website, and no effect on the depth or breadth of the
spidering of a site.
You cannot issue a search engine spider a command to do something - you
can only tell it not to do something.
Security Issue: A robots.txt is not intended to provide security for your
website - humans ignore them. Additionally, there is actually an
additional possible security issue with them. Let's say you have a secret
directory on your site called "secretsauce". You don't want it spidered, so
you add this directory to your robots.txt.
The problem now is that anyone can look up your robots.txt file and see
that you don't want people looking at that directory. Obviously, if you
were a hacker, this would be your first stop. Additionally, if the path
you were excluding was "/secretfiles/secretsauce/" the same hacker now
knows that you have another directory called "secretfiles", as well. It's
never a good idea to tell a hacker details about your site structure and
design.
If you are trying to keep people away from information, you need to use
real file and folder level security on your site, which will prevent
robots from visiting just like people, even if the robots.txt file says
it's ok.
I recommend you set your robots.txt to only deal with non-critical and
normal directories, such as images, cgi-bin, etc., and then use file
security for the rest. That way, even though the robots are not
specifically excluded from the folders and files, they are effectively
excluded by the file permissions. Only use robots.txt (and robots
metatags) to exclude files, pages and directories that are intended to be
available to people but not to robots, such as duplicate pages, test pages
and demos.
Rule of thumb: If you want to restrict robots from entire websites and
directories, use the robots.txt file. If you want to restrict robots from
a single page, use the robots metatag. If you are looking to restrict the
spidering of a single link, you would use the link "nofollow" attribute.
Granularity | Best Method |
---|---|
Websites or Directories | robots.txt |
Single Pages | robots metatag |
Single Links | nofollow attribute |
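For the two finer-grained levels in the table, the markup looks like this (standard HTML; the link target is a hypothetical example):

```html
<!-- Single page: place in that page's <head> to block indexing and link-following -->
<meta name="robots" content="noindex, nofollow">

<!-- Single link: tell robots not to follow this one link -->
<a href="/test-area/" rel="nofollow">Test area</a>
```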
Unless otherwise noted, all articles written by Ian McAnerin, BASc, LLB.
Copyright © 2002-2004 All Rights Reserved. Permission must be specifically
granted in writing for use or reprinting anywhere but on this site, but we
do allow it and don't charge for it, other than a backlink. Contact Us for
more information.
Content Copyright © 1994 - 2007 McAnerin International Inc. & McAnerin Networks
Inc. All Rights Reserved. < Legal Notice >
United Kingdom - Leeds - 94.136.40.100
Webfusion Internet Solutions
HTTP/1.1 200 OK
Date: Wed, 01 Jan 2020 07:47:15 GMT
Server: Apache
X-Powered-By: PHP/5.2.17
X-Pingback: http://www.ruskingtongardencentre.co.uk/xmlrpc.php
Link: ; rel=shortlink
Vary: User-Agent,Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8