A few days after I had launched my website, my wife rattled off some search terms in Google and commented on how my website couldn’t be found. My response was one of mild contempt: “Yeah, you won’t find it in Google, because my site hasn’t been indexed yet.” I quickly realized that she didn’t know what ‘indexing’ was, and figured that most people (outside website design, development, and marketing) probably don’t understand what indexing is, or how Google works.
And why should they? People type what they need to find into Google, and what comes up is proof enough that the system is working. But if you’re a business owner with a website, or you have your own personal site and you’d like to get it listed in Google, the concept of ‘indexing’ is one you should understand. Or perhaps you’re just curious about the magic behind your searches that returns the list of sites you get to choose from.
My wife was under the impression that when she typed a search term or phrase into Google, the search engine performed some sort of ‘live search’ across the entire breadth of the world wide web, scanning everything as it exists at that very moment, and condensing it into a relevant list of site-links. Although (to me) that sounds more glamorous, it just plain isn’t possible. So what does happen? How does Google generate its results, and how does it decide they’re relevant?
Let’s pretend you are Google. Imagine yourself sitting in a room with 10 reference books. Every so often, someone comes into the room and asks you a question, the answer to which can be found in your reference books. You quickly flip through the book you know to be relevant to the question, find the section specific to the subject, and spit out a few passages from the book that shed light on the question asked.
But now let’s say it’s not 10 reference books, but 1000. And the general subject matter of many of the books is almost indistinguishable from that of the others without reading their contents in detail. It’d be physically impossible to find the references you’d need to answer people’s questions in a timely manner. And they don’t want to wait.
However, let’s say you have a photographic memory, and you can speed-read entire texts in a matter of minutes, retaining 100% of their content. What do you do? Of course: you start reading every book. You don’t wait until someone asks something to look for the answer. You memorize all the contents of all the books, and when someone asks a question, you can just recite the passages from memory.
Of course, we can’t leave it there. There’s one more twist: the books get updated. The content inside them changes: some every day, some every week, and some monthly. So you have to keep re-reading them when new versions are issued, memorizing the new content so you can recall it when a question about it is asked. You’d be indexing.
This analogy illustrates how Google works. The amount of computing power it would take to crunch the content of the entire web in the brief time it takes to perform a search isn’t even close to being available. Even with the massive supercomputers of today, it’s just not possible. So instead, Google is out there scanning websites and remembering their content, all the time. When a search is conducted on Google, it looks not at the world wide web, but in its memory of all the sites it has scanned. Its memory of these scanned sites is called its “index”. The act of scanning new sites, or old sites with updated content, is called “indexing”.
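For the curious, here is a minimal sketch in Python of the “memorize first, answer later” idea described above. It is a toy and not how Google actually stores anything: the page addresses and text are made up, and the `build_index` and `search` helpers are names invented for this illustration. The point is only that the search step consults the remembered index, never the pages themselves.

```python
from collections import defaultdict

# Made-up pages standing in for websites that have already been scanned.
PAGES = {
    "bakery.example": "fresh bread and cakes baked daily",
    "cycling.example": "road bikes and daily cycling routes",
    "recipes.example": "bread recipes for home bakers",
}


def build_index(pages):
    """'Indexing': remember, for every word, which pages contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Answer a query from memory: pages that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)           # done ahead of time, before anyone asks
print(sorted(search(index, "bread")))
print(sorted(search(index, "daily bread")))
```

If a page changes, only that page needs to be re-read and its entries refreshed, which is the “books get updated” twist from the analogy.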
With the vast number of sites and the constantly changing content on the web, Google has its work cut out. It’s constantly scanning multiple sites simultaneously, through the use of programs called “crawlers”, “spiders”, or “bots”. Google has reportedly processed over 20,000 terabytes of data a day. How does it find its way around the web? It follows links. Links (hyperlinks) are what make the internet the dynamic field of information that it is. Links connect most of the web together into a chained unit of growing information.
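To make “it follows links” concrete, here is a small Python sketch of a crawler. It is an illustration under stated assumptions, not Google’s actual crawler: the “web” is a made-up dictionary of five tiny pages (no real network requests are made), and `FAKE_WEB`, `LinkExtractor`, and `crawl` are names invented for this example. It starts at one page, reads the links on it, and keeps visiting pages it hasn’t seen before.

```python
from collections import deque
from html.parser import HTMLParser

# A tiny made-up "web": page address -> HTML. No network access is involved.
FAKE_WEB = {
    "a.example": '<p>Home</p><a href="b.example">B</a><a href="c.example">C</a>',
    "b.example": '<a href="a.example">back</a><a href="d.example">D</a>',
    "c.example": "<p>No links here</p>",
    "d.example": '<a href="c.example">C</a>',
    "orphan.example": "<p>Nobody links to me</p>",
}


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, web):
    """Follow links breadth-first from `start`; return pages in visit order."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:
            continue  # a link to a page that doesn't exist
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("a.example", FAKE_WEB))
```

Notice that `orphan.example` never gets visited: no other page links to it, so a crawler that only follows links has no way to discover it.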
But it can take a while for Google to get to your website. Depending on what preparation has been done, it can take days, weeks, even months before Google finds your site and indexes its information. How it analyzes that information, and presents it during relevant searches, is a subject for another post entirely.