The Basics of Search Engine Spiders
Search engine spiders, also known as crawlers, are programs that search engines use to discover and collect information about publicly-accessible web pages. These spiders gather information about each page, including its content, keywords and metadata, and the search engine uses that information to index your site and rank it against other websites for any given search query. There are multiple types of spiders, each designed for a specific function, and each contributes to how a given search engine views your site. By understanding what these spiders are doing when they happen upon your website, you will be better equipped to use them to your advantage.
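As a rough sketch of the kind of page data a spider gathers, the example below uses Python's standard-library HTML parser to pull a page's title, meta description and outgoing links. The sample page and the class name are purely illustrative; this is not any search engine's actual code.

```python
from html.parser import HTMLParser

class PageDataExtractor(HTMLParser):
    """Collects the title, meta description, and outgoing links
    from a page, roughly as a search engine spider would."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.meta_description = attrs.get("content", "")
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# A tiny made-up page, standing in for fetched HTML.
sample_html = """<html><head><title>Example Page</title>
<meta name="description" content="A sample page.">
</head><body><a href="/about">About</a></body></html>"""

extractor = PageDataExtractor()
extractor.feed(sample_html)
```

After feeding the page, `extractor.title`, `extractor.meta_description` and `extractor.links` hold the same kinds of signals the article describes spiders collecting.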
Selection Spiders
Selection spiders are the most basic form of search engine crawler and are typically what comes to mind when we hear the term ‘search engine spider’. As these spiders crawl the web, they look primarily for pages that have not yet been indexed and add those pages’ data to the search engine’s index. They will also look for any major changes to already-indexed pages and may apply an update, depending on the rank of the page and the exact commands given to the spider.
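The core behavior described above, walking links and picking out pages missing from the index, can be illustrated with a toy breadth-first sketch. The `link_graph` dictionary stands in for pages a real spider would fetch and parse, and every URL here is hypothetical.

```python
from collections import deque

def select_new_pages(seed_urls, link_graph, indexed):
    """Walk the link graph breadth-first from the seeds and
    return pages that are not yet in the search engine's index."""
    frontier = deque(seed_urls)
    seen = set()
    newly_found = []
    while frontier:
        url = frontier.popleft()
        if url in seen:
            continue  # don't revisit pages within this crawl
        seen.add(url)
        if url not in indexed:
            newly_found.append(url)  # candidate for indexing
        # Follow the page's outgoing links, if any are known.
        frontier.extend(link_graph.get(url, []))
    return newly_found

# Hypothetical link structure and index state.
link_graph = {"a.com": ["a.com/new", "a.com/old"], "a.com/new": []}
indexed = {"a.com", "a.com/old"}
result = select_new_pages(["a.com"], link_graph, indexed)
```

Here only `a.com/new` comes back, since the other two pages are already indexed.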
Re-Visitation Spiders
This form of spider has one exclusive job, which overlaps with what some selection spiders do: visiting already-indexed pages to determine whether any major updates or changes need to be reflected in the index. Re-visitation spiders are predominantly useful for larger sites and blogs that add content constantly and need continuous re-indexing for that content to be visible to search engine users. A re-visitation spider will check any given site at a frequency calculated from factors pertaining both to the specific search engine and to the site itself.
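Search engines do not publish their revisit formulas, but the idea of a change-driven frequency can be illustrated with a simple adaptive schedule: revisit sooner when a change was found, back off when nothing changed. The halving and 1.5x factors and the interval bounds below are arbitrary assumptions, not any engine's real parameters.

```python
def next_revisit_interval(current_interval, content_changed,
                          min_interval=1.0, max_interval=90.0):
    """Illustrative adaptive revisit schedule, in days: halve the
    interval when the page changed, otherwise grow it by 1.5x,
    clamped to [min_interval, max_interval]."""
    if content_changed:
        new_interval = current_interval / 2
    else:
        new_interval = current_interval * 1.5
    return max(min_interval, min(max_interval, new_interval))
```

A frequently-updated blog would drift toward the short end of the range, while a static page would settle near the maximum.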
Polite Spiders
This form of spider works in much the same way as the others, with one main difference: it alternates data gathering between multiple websites to avoid consuming excessive bandwidth on any one site and to avoid crashing when certain scripts are run on particular sites. “Impolite” bots will often ignore the rules a site publishes (the robots.txt file is one example) and end up wasting bandwidth or crashing the site; by ignoring those rules, a bot can also be lured into a “spider trap” and crash itself. Polite bots earn their name by being considerate of bandwidth and of each website’s stated rules.
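Python's standard library ships a robots.txt parser that a polite bot might consult before fetching, as sketched below. The rules shown are made up for illustration, and Crawl-delay is a de-facto extension honored by some crawlers rather than part of the original robots.txt convention.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt a polite bot would respect.
robots_txt = """User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check each URL before fetching it.
allowed = parser.can_fetch("MyBot", "https://example.com/public/page.html")
blocked = parser.can_fetch("MyBot", "https://example.com/private/data.html")
# Honor the requested pause between requests, in seconds.
delay = parser.crawl_delay("MyBot")
```

A polite crawler would skip the `/private/` URL entirely and sleep for the stated delay between the requests it does make.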
What Are Spiders Seeing?
If you are curious about what data search engine spiders are collecting from your website, try Pingler’s Site Spider Viewer, which lets you enter the URL of any primary, secondary or tertiary page and see the same information that spiders collect about it. This tool is perfect for anyone who wants to review or revise the information being gathered about their website.