Domain age | n/a |
Expiration date | n/a |
PR | 2 |
ИКС | n/a |
Pages in Google | n/a |
Pages in Yandex | n/a |
Dmoz | No |
Yandex Catalog | No |
Alexa Traffic Rank | 11677295 |
Alexa Country | No data |
TifaWARE
n/a
n/a
UTF-8
1.17 KB
21
147 chars
122 chars
Internal links on the main page (5) | |
tifaware.com/ | <img> |
perl/ | perl scripts |
dos/ | DOS programs |
/~theall/ | https://www.tifaware.com/~theall/ |
/~theall/gpg.html | GPG Public Keys |
# See for
# detailed info on excluding robots from a site.
#
# See for
# a way to validate the contents of this file.
#
# updated: 2020-05-22, George A. Theall
# Selected search engine 'bots get pretty much free rein.
# nb:
# appie => Walhello, http://www.walhello.com/
# AspiegelBot => Huawei search engine
# Barkrowler => Babbar.Tech / Exensa, https://babbar.tech/crawler
# bingbot => Bing, http://www.bing.com/
# boitho.com-dc => Boitho, http://www.boitho.com/, Norwegian search engine
# Clarabot => http://www.clarabot.info/bots (domain not found) and https://hu.wikipedia.org/wiki/Clarabot, Hungarian search engine
# DotBot => http://www.opensiteexplorer.org/dotbot, used for SEO analytics
# fast => Fastsearch (used by alltheweb.com)
# gaisbot => Gais, http://gais.cs.ccu.edu.tw/, Taiwanese search engine
# GalaxyBot => Galaxy, http://www.galaxy.com/
# Googlebot => Google
# Linguee Bot => Linguee, https://www.linguee.com/bot, a multilingual text search engine.
# Mercator + Scooter => AltaVista
# Mj12bot => Majestic-12, http://www.majestic12.co.uk/projects/dsearch/mj12bot.php, a distributed search engine.
# mogimogi => http://www.goo.ne.jp/, Japanese search engine.
# mozDex => http://www.mozdex.com/, an open source search engine
# msnbot => MSN Search.
# NG => Exalead, http://www.exalead.com/, French search engine
# Nutch => http://www.nutch.org/, open-source search engine
# PetalBot => https://aspiegel.com/petalbot, Huawei search engine
# Pompos => dir.com, http://dir.com, French search engine
# QuepasaCreep => quepasa.com, Latin American portal / search engine
# SafeDNSBot => Safe DNS, https://www.safedns.com/en/searchbot/
# SeznamBot => https://napoveda.seznam.cz/en/seznamcz-web-search/, Czech search engine
# Slurp => Inktomi (includes MSN Search and HotBot)
# VIAS => http://vias.ncsa.uiuc.edu/viasarchivinginformation.html
# VoilaBot => http://www.voila.com (French search engine)
# yacybot => https://yacy.net/bot.html (Decentralized web search)
# YandexBot => https://yandex.com/support/webmaster/robot-workings/robot.html, Yandex search
# Yeti => http://naver.me/spd (NAVER search engine)
# Zao => Kototai, http://www.kototai.org/, Japanese search engine research project
# ZyBorg => WiseNut, http://www.wisenut.com/, and Looksmart
User-agent: appie
User-agent: AspiegelBot
User-agent: Barkrowler
User-agent: bingbot
User-agent: boitho.com-dc
User-agent: Clarabot
User-agent: DotBot
User-agent: fast
User-agent: gaisbot
User-agent: GalaxyBot
User-agent: Googlebot
User-agent: Linguee
User-agent: Mercator
User-agent: Mj12bot
User-agent: mogimogi
User-agent: mozDex
User-agent: msnbot
User-agent: NG
User-agent: Nutch
User-agent: PetalBot
User-agent: Pompos
User-agent: QuepasaCreep
User-agent: SafeDNSBot
User-agent: SeznamBot
User-agent: Scooter
# NB: for the month of July 2004, Inktomi's Slurp 'bot has done nothing
# but try to grab invalid URLs (other than robots.txt), URLs that
# *never* existed here. Can you say "database corruption"? :-(
#User-agent: Slurp
User-agent: VIAS
User-agent: VoilaBot
User-agent: yacybot
User-agent: YandexBot
User-agent: Yeti
User-agent: Zao
# NB: starting in January 2005, LookSmart seems to have switched from
# WiseNut to grub for its crawler. The latter doesn't bother
# requesting robots.txt and doesn't seem to understand response
# codes of 403. So should WiseNut ever come back, screw 'em.
# User-agent: Zyborg
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# Other 'bots that I'm ok with.
# o Applebot (https://support.apple.com/en-us/HT204683)
#
# nb: while it retrieves robots.txt, it has not respected the rules in it,
# at least when it was not explicitly listed in the file.
User-agent: Applebot
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# o CCBot, https://commoncrawl.org/big-picture/frequently-asked-questions/
User-agent: CCBot
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# o IBM Almaden Research Center.
User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# o The Internet Archive, http://www.archive.org/.
User-agent: ia_archiver
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# o LinkWalker, http://www.seventwentyfour.com/, for checking links.
User-agent: LinkWalker
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# o research project from Kitsuregawa Laboratory, The University of Tokyo.
User-agent: Steeler
Disallow: /cgi-bin
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding
# All robots are excluded by default. Please direct requests to
# allow access to webmaster@tifaware.com.
#
# 'bots I know about but don't want to bother with
# o arquivo-web-crawler, http://arquivo.pt
# Similar to the Internet Archive, though focused on
# the Portuguese web. While it more or less respects
# robots.txt, I don't think the sites I host fit the
# bot's coverage area.
# o BLEXBot, http://webmeup-crawler.com/
# "BLEXBot assists internet marketers to get information
# on the link structure of sites and their interlinking
# on the web, to avoid any technical and possible legal
# issues and improve overall online experience." Count me
# out.
# o CheckMarkNetwork, http://www.checkmarknetwork.com/spider.html/
# Used by CheckMark, which describes itself as [offering]
# "Complete Brand Protection".
# o DomainStatsBot, https://domainstats.com/pages/our-bot
# Used for marketing SEO services.
# o evc-batch
# Operated by eVenture Capital Partners and reportedly
# scans for ads.txt (https://en.wikipedia.org/wiki/Ads.txt).
# I have no interest in supporting advertising here.
# o Girafabot
# Used by girafa.com to visualize search results. I'd be ok
# with this if only they'd respect robots.txt.
# o grub-client, http://grub.org/html/documents.php?op=robots-faq
# Distributed crawler for the grub search engine. I'd be ok
# with this if only they'd respect robots.txt.
# o ips-agent
# Reportedly operated by Verisign for periodic reports for
# expiring domains and their associated web traffic.
# o The Knowledge AI
# While it seems to respect restrictions in robots.txt,
# I haven't turned up any authoritative info about it,
# and what info there is suggests it doesn't support
# https (eg, https://www.webmasterworld.com/search_engine_spiders/4983886.htm).
# o lachesis, ftp://ftp.imag.fr/pub/labo-LSR/DRAKKAR/internet-performance/lachesis/
# Supposedly an Intel tool for measuring ISP latency, although
# after examining it I think it's mis-identified.
# o larbin, http://larbin.sourceforge.net/index-eng.html
# Multi-purpose web crawler.
# o Mb2345Browser
# Browser used by Chinese web directory 2345.com according to
# .
# It seems to respect robots.txt, at least from what I've observed here.
# o Mozilla/4.0 (efp@gmx.net)
# Spammer tool to scrape email addresses.
# o netEstate NE Crawler, http://www.website-datenbank.de/
# Some sites consider this crawler malicious and badly-behaved
# so for now it's blocked.
# o NPBot, http://www.nameprotect.com/botinfo.html
# Used by NameProtect to scan for brand / IP violations.
# o Pandalytics/1.0, https://domainsbot.com/pandalytics/
# While it seems to respect restrictions in robots.txt,
# it is operated by a company that studies the market
# for domain names, which I have no interest in
# supporting.
# o Psbot, http://www.picsearch.com/bot.html
# Used by Picsearch to index pictures. I don't really have any
# pictures here that I want indexed.
# o Screaming Frog SEO Spider, https://www.screamingfrog.co.uk/seo-spider/
# Free / commercial software for crawling a site, primarily for SEO.
# Seems to respect robots.txt, albeit with requests for the top-level
# root document.
# o SemrushBot*, https://www.semrush.com/bot/
# Used by SEMrush primarily for marketing.
# o Teoma
# Used by AskJeeves search engine. I'd be ok with it if only
# it would respect exclusions in robots.txt.
# o TurnitinBot, http://www.turnitin.com/robot/crawlerinfo.html
# Used by Turnitin.com to prevent plagiarism.
# o Vagabondo, https://www.wise-guys.nl/
# Requests robots.txt but does not respect the exclusions in it.
# o ZoominfoBot, https://www.zoominfo.com/
# Used for B2B marketing.
User-agent: *
Disallow: /
Disallow: /nogo
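The exclusion logic above — named crawlers are allowed everywhere except a handful of private directories, while every other agent is blocked outright by the final `User-agent: *` group — can be checked mechanically. A minimal sketch using Python's standard `urllib.robotparser`, fed a trimmed-down copy of the rules (the full file behaves the same way for these paths):

```python
from urllib import robotparser

# Trimmed-down excerpt of the robots.txt shown above: two named bots
# share one group of path exclusions; everyone else is denied outright.
rules = """\
User-agent: Googlebot
User-agent: bingbot
Disallow: /cgi-bin
Disallow: /hidden

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/perl/"))      # True  - listed bot, open path
print(rp.can_fetch("Googlebot", "/cgi-bin/x"))  # False - excluded directory
print(rp.can_fetch("RandomBot", "/perl/"))      # False - default group blocks all
```

Note that consecutive `User-agent` lines form a single rule group, which is exactly how the long run of `User-agent` lines in the file above shares the one `Disallow` block that follows it.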
France - 92.243.1.63
Gandi Dedicated Hosting Servers
GANDI SAS
HTTP/1.1 200 OK
Date: Mon, 08 Jun 2020 20:51:48 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Accept-Ranges: bytes
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html
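The captured response headers can be turned into a structured mapping for inspection; a sketch using Python's stdlib `email.parser`, with the header block copied verbatim from the report above (status line dropped, since the parser handles only `Name: value` lines):

```python
from email.parser import Parser

# Raw headers as captured in the report above.
raw = (
    "Date: Mon, 08 Jun 2020 20:51:48 GMT\r\n"
    "Server: Apache\r\n"
    "Strict-Transport-Security: max-age=31536000; includeSubDomains; preload\r\n"
    "Accept-Ranges: bytes\r\n"
    "Vary: Accept-Encoding\r\n"
    "Transfer-Encoding: chunked\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)
headers = Parser().parsestr(raw)

print(headers["Server"])  # Apache
# HSTS max-age of 31536000 seconds is one year; includeSubDomains and
# preload make the site eligible for browser HSTS preload lists.
print(headers["Strict-Transport-Security"])
```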