brrreeeport and the search engine showdown
PostPosted: Thu Feb 16, 2006 1:11 pm    Post subject: brrreeeport and the search engine showdown
Author: Support

It seems that Robert Scoble has started a storm in a teacup by inventing a word, "brrreeeport", and asking everyone to blog it.

It's been interesting watching it spread around the 'blogosphere' and beyond (someone even registered brrreeeport.com!), although it hadn't reached the 'boardscape' until now, at least ;)

But it has stirred up some debate about how accurate search engines' result counts really are, with Google being the obvious first target: it apparently reported over 22k results for brrreeeport.

Scoble is not amused by search engines showing inaccurate results, and even David Sifry (of Technorati) has weighed in on the topic. When dealing with small numbers of results they have a point: the numbers should be accurate. But when dealing with large numbers of results, does it really matter as long as they are not wildly inaccurate? Do you search for the stats or for the results themselves? I know some people need the stats for marketing or business intelligence purposes, but in that case it's better to use tools designed for the task rather than the default public search mechanism.

For one thing, Google only shows the first 1000 results no matter how many it says there are. Many other search engines do the same; afaik Technorati shows just the first 200 results. So if the numbers are not exactly accurate, how can anyone really know once you get beyond those limits? Does it even matter? After all, you search to get the most relevant results, and if the search engine is doing a good job there is no need to drill down that far.

Now, as to why Google showed 22k results, I would guess that the count includes duplicates of one kind or another: RSS/XML feeds that get indexed alongside the page itself, aggregator sites linking to all those blog posts (not to mention spam pages), and even results pages from sites such as Technorati, which often appear in Google. Then there are the comments, which probably all contain the magic word too and often sit on additional pages. All in all, it's not impossible for Google to find more results than expected. That is not to say it's good when so many may be dupes, but most are probably hidden, so only the most relevant non-dupes actually show up by default.
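To make the dupe point concrete, here is a minimal Python sketch of how a raw hit count can drift away from the number of distinct posts. The URLs and the mapping from each hit to its "canonical" post are entirely made up for illustration; nothing here reflects how Google actually groups or counts pages.

```python
# Hypothetical hits as (indexed_url, canonical_post) pairs.
# Both the URLs and the canonical mapping are invented for this example.
hits = [
    ("http://example.com/blog/brrreeeport", "example.com/blog/brrreeeport"),
    ("http://example.com/blog/brrreeeport/feed.rss", "example.com/blog/brrreeeport"),
    ("http://example.com/blog/brrreeeport/comments?page=2", "example.com/blog/brrreeeport"),
    ("http://aggregator.example.org/links?tag=brrreeeport", "example.com/blog/brrreeeport"),
    ("http://example.net/post/42", "example.net/post/42"),
]


def count_results(hits):
    """Return (raw hit count, number of distinct canonical posts)."""
    raw = len(hits)
    distinct = len({canonical for _url, canonical in hits})
    return raw, distinct


raw_count, distinct_count = count_results(hits)
print(raw_count, distinct_count)
```

Here five indexed pages boil down to two actual posts, so a headline count based on raw hits would be well over double the number of posts a reader would recognize as separate.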

There may also be various technical issues with pulling, counting and manipulating results from clusters and datacenters all over the place. When Google displays the number of results for a search term, it shows an estimate, not an exact count. This is true, by the way, of almost all search engines where the amount of data indexed is vast and spread across many mirrors in a large server cluster.

Estimates are used to give faster responses to searches (which we all agree is an important aspect).
So when Google reports a number of results, it is not an exact count. If I had to guess, Google uses a density/time algorithm, computing the "total" number of results from the "density" of the search results in a smaller slice of data or time. To clarify, let's say that a Google search returned 100 results for the "optimized" query and those results were spread across 100 days, while during those 100 days Google indexed 20,000,000 pages overall (for all search terms). This would indicate that the specific brrreeeport term has a density of 0.0005% in the pages of the last 100 days. Google might then estimate that if it has 80,000,000,000 pages in the database, there are 400,000 possible results for the search term within those pages. Or, if their algorithm is a bit smarter, it might check and see that the first mention of this term was 300 days ago and conclude that the real number is likely smaller (somewhere between 300 and 400,000 results).
Again, this is a very speculative (although probably close enough to reality) explanation of why Google shows inaccurate numbers for its results count.
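The guess can be written out as a rough Python sketch. It is just the density arithmetic from the example, not anything Google has confirmed. The function names are made up, and the age-based lower bound assumes the sample's hits-per-day rate simply carries over the term's whole lifetime, which is only one reading of where a figure like 300 could come from.

```python
def extrapolate_hits(sample_hits: int, sample_pages: int, index_pages: int) -> int:
    """Scale the hit density seen in a sample up to the whole index."""
    if sample_pages <= 0:
        raise ValueError("sample_pages must be positive")
    # Multiply before dividing so the integer arithmetic stays exact.
    return sample_hits * index_pages // sample_pages


def age_capped_range(sample_hits: int, sample_days: int,
                     term_age_days: int, extrapolated: int) -> tuple:
    """Return (low, high) bounds once the term's age is taken into account.

    Assumption: the low bound is the sample's hits-per-day rate carried
    over the term's whole lifetime; the high bound is the plain extrapolation.
    """
    if sample_days <= 0:
        raise ValueError("sample_days must be positive")
    low = sample_hits * term_age_days // sample_days
    return min(low, extrapolated), extrapolated


# Figures from the example: 100 hits in 20,000,000 pages over 100 days,
# an index of 80,000,000,000 pages, and a term first seen 300 days ago.
total = extrapolate_hits(100, 20_000_000, 80_000_000_000)
low, high = age_capped_range(100, 100, 300, total)
print(total)      # 400000
print(low, high)  # 300 400000
```

With those figures, straight extrapolation gives 400,000, while the age-aware version only narrows it to a very wide range. Either way the headline number is an extrapolation, not a count.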
Or.. maybe their techs were just winding you all up ;)

You could say that this shows the value of specialized search engines like Technorati, BlogPulse and of course BoardTracker ;), which are less 'polluted' since they are designed to deal with very specific types of data/pages and so do a much better job of it.
_________________
If you need help, you are in the right place
Feel free to use these forums to ask questions and discuss anything BoardTracker related.
Subscribe to the blog rss feed.. http://blog.boardtracker.com/rss.php