<< Play >>
Mike:
I've made Paul Gardi shudder a few times, Apostolos, when I've mentioned Teoma and Kleinberg's HITS algorithm in the same sentence. But there is an underlying influence of Kleinberg's work in the Teoma algorithm, as we know. You touched on that before. Is it fair to say that Kleinberg's work was also an influence on PageRank?
Apostolos:
There is some debate about this. I think both of them were independently discovered. Has CLEVER influenced Google? The answer is: later, yes. Have they implemented PageRank? The answer is no.
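<< Pause >>
[For readers who haven't met Kleinberg's HITS algorithm, mentioned above, here is a minimal sketch of its hubs-and-authorities iteration. It is an illustration only and not Teoma's or Google's implementation; the page names and the tiny link graph are made up for the example. Unlike PageRank, which scores the whole link graph, HITS is normally run on a small set of pages gathered around a query.]

```python
# Minimal HITS sketch (hubs and authorities) on a tiny, made-up link graph.
from math import sqrt


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise both vectors so the scores stay bounded.
        a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical pages, not taken from the interview.
graph = {
    "directory": ["cafe", "bakery"],
    "blog": ["cafe"],
    "cafe": [],
    "bakery": [],
}
hub, auth = hits(graph)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

[On this toy graph the cafe page comes out as the top authority, because both other pages link to it, and the directory comes out as the top hub.]
<< Play >>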
Mike:
Well that's very interesting that you mention that, because I was going to ask you. In 1998 PageRank and Kleinberg's work were both pretty advanced, because search on the web was in its infancy then.
But, it's 2005 now, so things have changed. And I wanted to ask you about how important you think PageRank is in the work that goes on over at Google?
Apostolos:
The importance has diminished because PageRank is just one piece of the ranking algorithm over there. The ranking algorithm is so much more complex now. And PageRank is just used when they want to break ties.
I'm sure that they've not implemented Kleinberg's algorithm. But I'm also sure that they have created some kind of local implementation based on anchor text.
They've been very, very solid for the last few years. I think the biggest improvement they've made is anti-spam inclusion and consistency. But what surprises me is that they haven't used subject specific popularity, even though they know it's important.
Mike:
Well, I've talked to guys from Google about this. And you mentioned earlier that, when Teoma was launched there were rumours that it wasn't possible, you can't do this, it would take a server farm the size of Texas and all the electricity in the world to create just one calculation.
And obviously you've managed to do it. So, if you could just write down how you did it on a little piece of paper for me... [Bursts out laughing]
Apostolos:
Yes, it can be done.
And this has to do with my background. I come from a different background. So, I looked at this from a different perspective and was able to see a method of how this could be done.
What you guys are missing, is that, we're at the early days of search development. I know people want Teoma to succeed. But what people are forgetting is that we're still at the beginning of search.
If you were to ask me: Where are we standing right now? Well, in 2000 we were at level one. In 2004 we were at level three, and we need to get to level ten.
I can go back to the argument about Jeeves abandoning Teoma. In fact the opposite has happened. We were only seven people when we were bought by Jeeves. And search is not just about the algorithm. Search is about the infrastructure. And for us to go from a company which was worth 35 million dollars, to a company which is worth a few billion is quite an achievement.
I can assure you that we now have the infrastructure which is very hard to build. Jim mentioned it earlier, it's not about the algorithm alone, it's about building the infrastructure to be able to deliver 24/7, never going down.
Jim:
Mike, it's what I was saying, the last successful new search brand ever to launch was Google...
Mike:
So, people ask me: Who's going to be the next Google? And I say, it's like asking who's going to be the next Beatles. It's never going to happen. I think anyone trying to get into this space, general purpose search, is going to have a problem.
Jim:
Sure, people are asking what's going to be the next technology that comes along. That's important. I think Andrew Goodman said that. It's important but not sufficient and that's true. What Ask Jeeves had was a brand. We had a well known brand but not a high quality product.
So we've worked our tails off the past three years, the Jeeves team and the Teoma team, as we've grown it and started bringing quality to the table.
Statistically we know that this engine is far more advanced than it was in 2001...
Mike:
Is Microsoft going to bring in new technology? Is that going to be the next generation? Or are they just building on what already exists out there, Apostolos?
Apostolos:
Mike, I'll tell you something. In 2000, I think I was really fascinated by the solution and making it work. There is some truth in what you say about us being a little low key while we build the infrastructure. But I'm as excited now as I was in 2000. I'm overwhelmed with excitement, because what we missed at that time, and it was something important, is that there are what I call two galaxies.
The galaxy of you guys who build the web pages and there's also the galaxy of the users. And the thing which to me is beautiful and amazing, is that there has not been a connection between these two galaxies.
I can go on and on and tell you how important your industry has been. I have the utmost respect for you guys. And your impact is not recognised as much as it should be.
What you have been trying to do is to create highways between the galaxy of the users and the galaxy of the web, by building pages with the right keywords so that people can find them.
I don't think it has been recognised how important your work has been making all of those pages with the misspellings so that people can find them...
[Mike and Jim both have a chuckle at this comment]
Mike:
There seems to have been this very strange, strained kind of relationship between the search engines and search engine marketers as it's known or search engine optimisers, as was.
When I was trying to write the first edition of my book, just trying to get to speak to someone at a search engine meant hanging around in a corridor for five hours or something!
But by the time the second edition came around and we had paid inclusion and pay per click and money was involved - they were knocking on the door to take me to lunch!
There is something that I do worry about though. And that's to do with network theory. I'm not a scientist but I'm fascinated by how this works and the rich get richer issue.
I wrote a paper called "Filthy Linking Rich" and I wonder if in this industry we are in part responsible for that? You know for kind of shifting the ecology of the web with our highly optimised pages and trying to build linkage which is, perhaps, a little on the false side...
<< Pause >>
[There's a link above in the intro to this feature, to the paper "Filthy linking rich"]
<< Play >>
Apostolos:
I think what you do, as long as you're trying to build high quality pages and...
What your achievement has been, really, is in finding the right words for the users to find. And nobody is discussing this.
Take someone in my neighbourhood who is building a café. And he wants local people to find that. What is he going to use as keywords to be found?
I was so impressed with what you guys were doing to help us. I mean help the search engines to find that local café.
There is a problem with the rich getting richer. But I don't think there is a problem with you helping a local café to be found. I don't think anyone would question your contributions there.
Mike:
Can I just touch on paid inclusion? At the beginning of the year [2004] I was in New York when Yahoo! launched its new search engine and SiteMatch with it, which is their paid inclusion product.
I've never really believed in this very much. Subscriptions of URLs to be crawled is one thing - but XML feeds? Nah.
You guys pulled out of that fairly quickly. I've always had the suspicion that those XML feeds really couldn't blend very well with connectivity, with linkage data...
Jim:
Yeah, that's essentially why we pulled it. I don't think this is the end of that approach. If you could keep it in a contained environment where you have apples to apples...
But the mixing of apples and oranges, the structured data with the unstructured... it just didn't work.
You were either guessing how they should blend. Or if you weren't guessing you'd be fixing. And we certainly didn't want to get into that. So how do you do it?
And when we realised that and then you layer on top of that all of the other things we had to do...
But I don't think that working with feeds, not just with vendors, but with anyone, is done forever.
I'm looking at movie listings in the newspaper for where the movie Sideways is playing and those things are paid! It's interesting, especially on the editorial side, just how much of a newspaper or magazine is actually paid for. I never realised it.
Mike:
Yeah, you suddenly find you're just reading around a lot of ads!
Jim:
But they're not really adverts, I never knew that those were paid until this year. You know where it says: this movie theatre is showing these 8 movies. And I was thinking, well why isn't the other movie theatre mentioned.
Mike:
Can we talk a bit more about paid placement then, Jim? This was mentioned over at the forum, the way that Jeeves presents listings. You know, with the 20 paid listings before you get to the actual web results.
I understand that AdWords pays the bills, but do you foresee that layout changing?
Jim:
Yes, but I should go back and add something to that conversation. The SEMs [search engine marketers] are only really looking at queries that are commercial. Because that's who their clients are. And they're not looking at the vast query stream where we don't have that issue.
On the majority of queries on our site there are no ads at all.
Mike:
That's interesting...
Jim:
It's just that when they are very commercial you have a lot of ads.
That's good news in those instances. The bad news is that it's the reverse of the competition which gets an editorial result above the fold on every query. We're not there yet.
So, we have a lot of obligations as a company, we're aware of it, we're working on it. That's why I'm saying stay tuned. Give it some time and we'll get there.
Mike:
Could I glean from what you've just said that, the majority of searches at Jeeves are non commercial searches?
Jim:
The majority of searches on the web are non commercial! And the SEM community is very unaware of that.
Mike:
Hmmmmm! Exactly, that's because we spend time looking for people who want to spend money online so that we can make money ourselves.
Jim:
Exactly! I mean the majority of usage by our users is on editorial results, on editorial features - not on ads.
<< Pause >>
[Since this interview took place, Jeeves has changed the presentation of its results pages.]
<< Play >>
Mike:
One of the buzz words/terms for 2004 certainly was "personalisation". What are your feelings on that, Apostolos? Is that going to make a huge difference?
Apostolos:
Personalisation or localisation?
Mike:
I sometimes just think of localisation as being a subset of personalisation... Is that a way to look at it, do you think?
Apostolos:
Personalisation is quite hard. There are certain patterns, certain things for which you can give high quality results. But because people have certain behaviours, they sometimes don't remain stationary.
I think personalisation is interesting... but it's quite hard.
Mike:
I'm just wondering how close we're getting to personalisation. Because if you look at the components of the major search engines, Jeeves included, they're similar.
For instance, Google was left behind on the personalisation side because it didn't have a subscriber base. Whereas, Yahoo! did from its previous history. They had 150 million subscribers. So they had a bit of data about their users.
And then Google introduces Orkut so that they bring people in and get a bit closer to them. And then they introduce GMail where people now have to log in. They have those kind of add-on services at Yahoo! and also at MSN where users have to log in.
Then when you look at Jeeves' acquisition of ISH and you see MyWay and... Well, all of a sudden you guys have sticky web mail and social networks and desktop search and that sort of thing...
Is personalisation all about locking people into the brand?
Jim:
Mike, I think you're going down the right track on this. Because when people talk about personalisation and search, they talk about it as a way to deal with ambiguity.
You see the example of "bass". If you type that in as a query and I know that you like fishing, then I know you don't mean the beer!
But those instances are actually pretty rare. It's more important to actually give people the opportunity to clarify their query on topics they've never asked before.
That's like our related topics feature that came out of Teoma. It's easier to do that than it is to deal with it on a personalisation basis and try to understand what they're talking about.
So what you're getting into, Mike, is about the relationship with the user, let them have a better search...
Mike:
Yes, I agree, you're right. There are two sides to it. I've looked at the theory behind personalisation and how it would work. At the moment we're looking at this linkage data and seeing how it relates and looking at all of the clues you can get from that. And the next thing is if you've got a peer group, if you've got 15 million guys all looking for Bass beer, or whatever it is, then you can rationalise results around that. Which should provide a better user experience.
And what I'm thinking is that, if you have all of this stuff then you're actually locking somebody into the brand.
And until you get it wrong for them, then there's no need for them to go and search anywhere else!
Jim:
Actually, I think it's a lot more interesting than even that.
The vision behind MyJeeves, which made us the first major search engine to launch what we called a "personal search service", was understanding just what you said, being able to contain your search to what's important to you from a data perspective. It's very compelling to people.
What I was talking about when I was showing MyJeeves to people originally was... Well, envision this, it's 50,000 results because you've been using Jeeves for ten years. And out of the billions and billions of web pages out there, you'd save these 50,000 that are important to you. And then we give people the ability to add their own meta data to those. And that meta data is an immensely important part of personalisation.
I just got back from the consumer electronics show and wouldn't mind sharing something with you from that.
Meta data, not so much for the web, but when you go up the staircase of mobile, video, music, pictures which all require meta data to become important i.e. to become searchable and findable later: that's the vision behind it. And, of course, that's just you. You can also imagine layering something against that, such as different profiles.
So, if I'm a dentist and I just want to search against documents that dentists have found important and saved and created in their own community, I can do that. If I wanted to search in my family network for, let's say photos and music, it becomes more important in that vein.
Mike:
Just picking up on meta data. When I first came into this business, or dabbled in it as it was at the time, meta tags were very important for a web page.
As you just said, for those non HTML files, multimedia and those kind of things, then the meta data is vitally important.
The problem we had with HTML pages is the fact that, it didn't take very long before, being humans as we are, that people just started to lie in their meta data [Laughs]
Most of the meta tags I saw in the early days, bore little or no resemblance to what the page was actually about!
Jim:
I think people are going to need to be trained. The digital photo and digital music revolution are going to be a big part of that teaching.
Especially photos where, if I'm using Flickr, I need to label these photos so I can find them later. Whereas, if I'm using Jeeves desktop search, which is another important part of that staircase, then I need to label them correctly and get them in the right folders and subfolders.
If I want to find that photo of my trip to Monterey last month, then I better not label that one "cute picture" or something!
[Mike and Jim have a little chuckle at this]
I really need to label that one something about my son at the beach. I maybe want to put the year on it too. If in 20 years I want to use desktop search to find that picture, then just identifying the year may become important.
Mike:
Some years ago, about 2000 I think it was, I was talking to Craig Silverstein, Director of Technology at Google about tags on pages. We were talking about images and he said that, a lot of web developers are very good. But they do tend to give images kind of library references i.e. jpeg23147 or something.
He said the smart thing to do would be to give it a name and say in the alt text "picture of a fish" or whatever it happened to be.
I wrote about it in a newsletter and everybody took that to be the clue to ranking better at Google and immediately started stuffing alt tags with keywords [Laughs]
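<< Pause >>
[A small sketch of the alt text point above: it scans a page for images with no alt attribute (the "alt tag" of the conversation) and for alt text that looks stuffed with keywords. The sample page, the class name and the eight-word threshold are all assumptions made up for the illustration; this is not how any search engine is known to judge alt text.]

```python
# Flag images with missing alt text or suspiciously long alt text.
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collects (src, problem) pairs for every <img> tag fed to it."""

    MAX_WORDS = 8  # arbitrary cut-off, chosen only for this example

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.problems.append((src, "missing alt text"))
        elif len(alt.split()) > self.MAX_WORDS:
            self.problems.append((src, "alt text looks keyword-stuffed"))


# Hypothetical page: a library-style file name with no alt text, a
# sensible description, and a keyword-stuffed description.
page = """
<img src="jpeg23147.jpg">
<img src="fish.jpg" alt="picture of a fish">
<img src="trout.jpg" alt="fish fishing cheap fish buy fish best fish fish shop fish deals">
"""

audit = AltTextAudit()
audit.feed(page)
for src, problem in audit.problems:
    print(f"{src}: {problem}")
```

[Only the first and third images are reported; the plain "picture of a fish" description passes.]
<< Play >>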
Jim:
That's why personalisation is so important. If we're talking about spam, that will decrease the likelihood that I'm going to have one of those spam pages in my personal index.
With personalisation it's about what the user finds important so quality is more important.
Mike:
Just to finish the story about the alt tags. Google was just working on its image search at the time and I think it was more of a ploy to get people to label images and mark them with an alt tag for the benefit of the new service [Laughs]
Apostolos, I want to talk about crawling the web for a moment. This still seems to be the most primitive part of search engines and still seems to have a way to go.
What you tend to find in this industry is that, we try to create pages for crawlers. And I'm just wondering, philosophically, if we are doing the right thing, if you know what I mean.
If we create pages for crawlers, then search engines will have less of an idea that there are problems with crawling and therefore won't do much to improve or fix them...
Apostolos:
I saw your comment about a page which was a cloaked page. I looked at it and it wasn't really a cloaked page, the author of the page had taken out the images to make it more text based and crawler friendly.
Crawling is primitive because it can't rely on an infrastructure built by a single entity; it has to rely on a chaotic infrastructure. Wherever you go on the web, everybody has different rules.
To our amazement we still find pages for Ford or large companies which don't know how to create pages to be crawled.
You know there are large companies which have robots.txt files which exclude search engines and wonder why they're not getting crawled!
[Mike has a knowing laugh at this too]
I think that everybody has made a lot of progress and I think we're at the point where the web is easier for the search engines.
There's not a uniform way of building pages. And optimisation is the only way for us to try and create a sort of uniformity.
The web is chaotic. There are no standards. People have tried to put standards in place. So I think it's very important to have pages that crawlers can find. We have problems finding pages that have no text.
I mean how do you recognise pages that are Flash without anything in them?
Mike:
Are you crawling Flash yet?
Apostolos:
Yes.
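<< Pause >>
[To illustrate Apostolos's robots.txt remark above: a two-line robots.txt is enough to shut every crawler out of a site. The sketch below uses Python's standard urllib.robotparser to check what a well-behaved crawler would conclude. The crawler name "ExampleBot" and the example.com URL are made up for the illustration.]

```python
# Check what a polite crawler is allowed to fetch under two robots.txt files.
from urllib.robotparser import RobotFileParser

URL = "http://www.example.com/products.html"  # hypothetical page

# This file tells every crawler to stay away from the whole site.
blocking_rules = [
    "User-agent: *",
    "Disallow: /",
]
blocking_parser = RobotFileParser()
blocking_parser.parse(blocking_rules)
blocked = not blocking_parser.can_fetch("ExampleBot", URL)

# An empty Disallow line means nothing is off limits.
open_rules = [
    "User-agent: *",
    "Disallow:",
]
open_parser = RobotFileParser()
open_parser.parse(open_rules)
allowed = open_parser.can_fetch("ExampleBot", URL)

print("Blocked by first file:", blocked)
print("Allowed by second file:", allowed)
```

[The only difference between the two files is a single slash, which is how a site can exclude the search engines without meaning to.]
<< Play >>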
Mike:
I'll tell you what I'd like to do right now, while I still have both of you guys here. We've been talking about creating pages and optimising. Let me just take advantage and ask you what we need to do to build the perfect Jeeves/Teoma web site? What do we need to do to rank well?
Apostolos:
Well, build a great page which is recognised by the web. The only problem you have is how quickly this page will be recognised by the web, but Teoma, I don't know if you're aware, has recently improved the crawling speed dramatically. So, it's basically building a great page which will be recognised within its community.
Don't try to build your own community and say: "I'm the one who recognises myself" though!
Now, with Teoma, as I mentioned before, we're going into the next generation. I want to get you excited because 2005 is going to be the most exciting year for search engines.
Now it's not just about communities, it's about the users. There are new technologies coming in which will change the way that people access information.
I don't know if you saw BBC news in the UK? There was an article where they had a competition with Teoma vs. all the major search engines and we came out on top. I don't know if you read that?
Mike:
I haven't seen that one, no. But I must go and find it. I do some stuff with BBC!
I have to tell you, when I started working with the BBC on some stuff last year, I introduced Teoma to them. They had heard of Jeeves but not Teoma.
And since I introduced the guys doing the technology stuff to it, it's become their favourite search engine.
<< Pause >>
[You'll find the BBC article here:
http://news.bbc.co.uk/2/hi/uk_news/magazine/4003193.stm ]
<< Play >>
Jim:
Mike, what do you perceive to be the difference between them... Jeeves and Teoma?
Mike:
I like the presentation at Teoma. I have to say I've learned to live with Google's presentation of natural results down the left hand side and adverts down the right. With Jeeves I find it a little annoying at times that, perhaps because I'm doing mainly commercial searches, I get that long list of paid results before you get down to the natural results.
I'm not a big fan of the frames at Jeeves. I know I can look around and find a link to get rid of the frame. If I do a search for digital cameras and all I want are some reviews, then I still have to get through ten affiliates and other stuff before I can find them. This is just a personal thing, of course.
I guess, at the end of the day, the user is the most important thing to you guys. And if they're not complaining, that's the most important thing.
Jim:
But you're in the UK and they're still a little behind what's happening in the US.
The frames thing is interesting, by the way...
Mike:
Yeah, well that's probably because it's a bit more difficult for people inside the industry. We're just too close to it and we're a bit more objective about it than the average end user, I suppose.
There are things in this industry that only we care about. It's like the PageRank thing. I've said so many times that I don't believe that Google uses PageRank. It's the Emperor's new clothes if you ask me. Only people in this industry worry about it.
If you ask the average end user at Google whether he's bothered about PageRank he probably wouldn't even know what it was.
Jim:
Of course!
Mike:
So, what about the future? I wouldn't like to get too deep into the technology but, you were talking about the end users Apostolos. There's a lot that we can learn because there is history now. So I guess with learning machines, genetic algorithms, there's a lot more data that you can work with than purely linkage data...
Apostolos:
I touched upon it a little earlier. Like I said there are two galaxies. One is the user galaxy which you guys don't know about. And the other is the web galaxy which you guys work with every day.
The lucky ones, really, are the search engines because they own the user galaxy. We were talking about personalisation, let's touch on that again. It's not the individual who is important, it's the group.
We belong to groups and we have group behaviour. And this to me, going back to the subject specific popularity and the clusters we have created on the web... Well just imagine if we can create clusters of user behaviours!
This, to me, is the next breakthrough. I can't talk to you about this in detail, but I can tell you one important thing: go and read that BBC article and you'll realise why we're the best!
Mike:
I didn't dispute it for a second [Laughs] And I'll certainly go and read the BBC article. It's strange that I missed it because I check the BBC a lot.
Apostolos:
I know the webmaster world has been focusing on only one thing and that's commercial. So webmasters have not noticed how much Teoma has been improving because they're only looking at commercial.
Probably excluding you, Mike, and some other people who have noticed it.
We've improved dramatically. And I haven't disappeared, as Jim said, I'm here. I have been for the past three years and I'm more excited than ever. And I want to make it happen for you guys and the world, that's the reason I'm here.
I'm excited, it's about what I call the "userweb" and the web that you are creating, it's the integration of the two that will take us to the next level.
Mike:
Fascinating.
Can we just touch on one more thing which we haven't really covered, although you did mention that cloaked page earlier. And that's the subject of spam. It is a major issue. Are we going to be able to clean up the index and get rid of rubbish... Will we be able to tame the wild west, do you think, Jim?
Jim:
I think relative to the other two major engines, we do very well. I think unfortunately it's a problem that all three of us have. That's because there's value [money] in gaming the engines. Luckily, at our core, we have some technology which makes it harder to spam.
Moving forward, we're all developing technology to deal with this. And I think we're all going to be upping the ante in that war, from our side.
And as the web evolves and moves up the staircase with personalisation and new types of data it'll become harder and harder to spam.
Mike:
That's what I was kind of figuring, that it's going to get harder. I sometimes wonder, it's a very fine line, but the methods and the efforts that people put into spamming may be better spent just creating some great content instead of crappy spam. You may have somebody linking to you and end up just doing well naturally!
Jim:
If people don't find it valuable... Well, you can run for a while, but you can't hide. And if people don't find it valuable, we're gonna know.
Mike:
Listen guys, I know that I've taken up a lot of your time and I really do appreciate it.
Apostolos:
I just want to get a message to the community. I've been following you and reading your stuff and I learn a lot from you. So there is indeed an appreciation of the simple fact that you were able to help us. Overall, I think your contribution should be recognised.
Mike:
Wow! Absolutely fantastic!
Listen, if you're at SES NY, dinner's on me guys.
Jim:
Mike, you're gonna have to come down to Piscataway.
Mike:
Sure, I don't mind. I'll come down and see what you're doing down there.
Jim:
We'll take you to White Castle...
Mike:
Sounds great - what is it? [Laughs]
Jim:
These are very good cheap hamburgers, famous in New Jersey.
Mike:
[Bursts out laughing] Oh really? Sounds just like my kind of food.
Once again, thanks for your time guys.
© Mike Grehan 2005
Editor: Mike Grehan. Search engine marketing consultant, speaker and author.
http://www.search-engine-book.co.uk
Associate Editor: Christine Churchill. KeyRelevance.com
e-marketing-news is published selectively on a when it's ready basis. ©2005 Net Writer Publishing.
At no cost you may use the content of this newsletter on your own site, providing you display it in its entirety (no cutting) with due credits and place a link to:
<http://www.e-marketing-news.co.uk>