A standalone program that crawls Internet resources to keep a search engine's database up to date. Search robots (or "spiders") index information about websites: they follow links, discovering more and more new pages.
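The link-following traversal described above can be sketched as a breadth-first search. This is a minimal illustration, not a real spider: the `links` dictionary is a hypothetical in-memory link graph standing in for fetched web pages, and the URLs are made up.

```python
from collections import deque

# Hypothetical link graph (assumption): maps each page URL to the links it contains
links = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": [],
}

def crawl(start):
    # Breadth-first traversal: visit each page once, then follow its links
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real spider would fetch and index the page here
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl("https://example.com/"))
```

Each page is visited exactly once (`seen` prevents revisits), which is why a spider does not loop forever on mutually linked pages.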
Crawling can be limited so that the database is not overloaded with too much text, or with sites that have many levels of nesting. Owners of Internet resources can block robots from specific pages or from an entire domain. Instructions for search bots are stored in the robots.txt file, located in the root directory of the site.
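A well-behaved spider checks robots.txt before fetching a page. A minimal sketch using Python's standard-library `urllib.robotparser`, with a hypothetical robots.txt that blocks all bots from a `/private/` section (the file content and URLs are assumptions for illustration):

```python
from urllib import robotparser

# Hypothetical robots.txt content (assumption): block every bot from /private/
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# can_fetch(user_agent, url) tells the bot whether the rules permit the page
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```

In production, `rp.set_url(".../robots.txt")` followed by `rp.read()` would download the real file instead of parsing an in-memory string.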