Q: What is Evliya Celebi?
A: Evliya Celebi is a web robot that searches the web for Turkish content. Its name
comes from the famous Turkish traveller.
Q: I don't want it to crawl my site. What can I do?
A: Add the following entry to the robots.txt file at the root of your
site:
User-Agent: EvliyaCelebi
Disallow: /
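To check that the entry above does what you expect before deploying it, a minimal sketch using Python's standard urllib.robotparser module follows. The host example.com and the second agent name SomeOtherBot are placeholders chosen for illustration; this only shows how a standards-following parser reads the rule, not how Evliya Celebi itself parses it.

```python
from urllib.robotparser import RobotFileParser

# The two-line entry from the answer above.
ROBOTS_TXT = """\
User-Agent: EvliyaCelebi
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host used only for this check.
url = "http://example.com/index.html"
blocked = not parser.can_fetch("EvliyaCelebi", url)

# SomeOtherBot is a made-up agent name; the rule should not affect it.
others_allowed = parser.can_fetch("SomeOtherBot", url)

print("EvliyaCelebi blocked:", blocked)
print("Other robots allowed:", others_allowed)
```

The entry only shuts out this one robot; other crawlers are unaffected unless you add rules for them as well.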
Q: Is the source code available?
A: Yes, versions 0.14 and 0.17 of the source code are available.
Q: What is the difference between v0.14 and v0.17?
A: v0.17 uses MySQL as a link database. Unfortunately I am too lazy to
give SQL definitions to create the tables; you can look at the source to see the table structures. There are also some minor changes and bug fixes in v0.17.
Q: Is it a fully qualified robot?
A: In fact, no. It was written as a hobby, and its main purpose is gathering
statistics. It may, and in fact does, still contain lots of bugs.
Q: What is its speed?
A: It can be run in parallel across several computers. When 10-12 processes
are run on a Linux machine (Ultra-5 SPARC), it can gather about 100,000
pages a day.
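To put that figure in perspective, the short calculation below converts it into pages per second and seconds per page for each process. It assumes the load is split evenly across the processes, which the answer above does not state.

```python
PAGES_PER_DAY = 100_000          # figure quoted in the answer above
SECONDS_PER_DAY = 24 * 60 * 60

# Combined rate of all processes together.
total_rate = PAGES_PER_DAY / SECONDS_PER_DAY


def seconds_per_page(processes):
    """Average time one process spends per page, assuming an even split."""
    pages_per_process = PAGES_PER_DAY / processes
    return SECONDS_PER_DAY / pages_per_process


print("Total: %.2f pages/second" % total_rate)
for n in (10, 12):
    print("%d processes: %.2f seconds per page each" % (n, seconds_per_page(n)))
```

Under that assumption each process fetches roughly one page every 9 to 10 seconds.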
Q: How does it select the links to fetch?
A: It builds a database of the links retrieved from crawled documents. Then
it jumps to one of those links at random.
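The selection described above can be sketched roughly as follows. This is an illustration only, not the robot's actual code: an in-memory set stands in for the link database, and skipping links that were already fetched is an assumption. The names record_links and pick_next_link are made up for this example.

```python
import random


def record_links(link_db, extracted_links):
    """Add links found in a crawled document to the link database."""
    link_db.update(extracted_links)


def pick_next_link(link_db, visited, rng=random):
    """Return a random link that has not been fetched yet, or None."""
    # Skipping visited links is an assumption made for this sketch.
    candidates = sorted(link_db - visited)
    if not candidates:
        return None
    return rng.choice(candidates)


link_db = set()
record_links(link_db, ["http://a.example.tr/", "http://b.example.tr/"])
visited = {"http://a.example.tr/"}
print(pick_next_link(link_db, visited))
```

Picking at random spreads requests over many sites instead of working through one site link by link.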
Q: My page has nothing to do with Turkish content, why did your spider come here?
A: Evliya Celebi crawls pages under the '.tr' domain or those with a Turkish character
encoding. If it found your site, then your site is probably linked from a
page in Turkish. When it detects that your page does not use a Turkish
character encoding, it will not follow links from your site and will not
retrieve your page again.
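The rule above amounts to a simple test on the host name and the declared character set. The sketch below is a guess at its shape, not the robot's real code. The charset names in TURKISH_CHARSETS are an assumption (the answer does not say which encodings are checked), and in_scope is a made-up function name.

```python
from urllib.parse import urlsplit

# Assumed list of Turkish encodings; the FAQ does not name the ones it checks.
TURKISH_CHARSETS = {"iso-8859-9", "windows-1254"}


def in_scope(url, charset):
    """Return True if the page is under '.tr' or declares a Turkish charset."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "tr" or host.endswith(".tr"):
        return True
    return (charset or "").strip().lower() in TURKISH_CHARSETS


print(in_scope("http://www.example.com.tr/", None))         # under .tr
print(in_scope("http://example.com/", "ISO-8859-9"))        # Turkish charset
print(in_scope("http://example.com/", "iso-8859-1"))        # neither
```

A page that fails this test would be dropped, and its links would not be added to the link database.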
Q: I changed my robots.txt but your robot did not recognize it?
A: It tends to check robots.txt every 6 hours, so a recent change may not have been picked up yet.
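A periodic re-check like this is commonly done with a small time-based cache. The sketch below shows the idea under that assumption; it is not taken from the robot's source. RobotsCache and the fetch callable are hypothetical names, and the clock is passed in so the behaviour can be tried without waiting.

```python
import time

ROBOTS_TTL_SECONDS = 6 * 60 * 60  # the 6-hour interval mentioned above


class RobotsCache:
    """Keep each host's robots.txt and re-fetch it once it is 6 hours old."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # hypothetical callable: host -> robots.txt text
        self._clock = clock    # injectable so the example can fake time
        self._entries = {}     # host -> (fetched_at, text)

    def get(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is None or now - entry[0] >= ROBOTS_TTL_SECONDS:
            entry = (now, self._fetch(host))
            self._entries[host] = entry
        return entry[1]


# Usage with a fake fetcher and a fake clock.
versions = []


def fake_fetch(host):
    versions.append(host)
    return "robots.txt version %d" % len(versions)


fake_now = [0.0]
cache = RobotsCache(fake_fetch, clock=lambda: fake_now[0])
print(cache.get("example.com"))   # fetched
fake_now[0] = 5 * 3600
print(cache.get("example.com"))   # still the cached copy
fake_now[0] = 6 * 3600
print(cache.get("example.com"))   # fetched again
```

With a scheme like this, an edit to robots.txt goes unnoticed until the cached copy expires.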
Q: How about Turkish pages encoded in UTF-8?
A: When this robot was written, there were no pages that had UTF-8 encoding ;-) (This question was added in 2009)