

"In conversation with..." Jim Lanzone & Apostolos Gerasoulis of Ask Jeeves/Teoma


by Mike Grehan

It's been quite some time since we last had an "in conversation with" feature (but don't worry, there are more in the pipeline).

I'm particularly pleased (and honoured) with this one. Not only did I get to speak with Jim Lanzone, Senior Vice President, Search Properties at Jeeves, I also got to speak to Apostolos Gerasoulis, founder of the Teoma search engine (which powers Ask Jeeves) and former Professor of Computer Science at Rutgers University.

Anyone who reads my scribbling in this and other online publications will be aware that I've long been a great admirer of the work of Apostolos Gerasoulis.

Of Greek origin, he is regarded as one of the leading figures in the field of information retrieval on the web. His background is in research on the processing of large data sets. And with the web being the largest data set there is - he has been able to look at it from a different perspective, enabling him to solve a problem in web search which had confounded many.

There are two main link-based algorithms which have been developed in the web research community. Many know of PageRank at Google, as it has been so visible there. The other is HITS (Hyperlink-Induced Topic Search), an algorithm developed by foremost computer scientist Jon Kleinberg.
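For readers who find code easier to follow than prose, here is a minimal sketch of the two ideas on a toy link graph. It is only an illustration under stated assumptions, not how Google or Teoma actually implement anything: the graph `LINKS`, the function names `pagerank` and `hits`, the damping factor of 0.85 and the iteration count are all choices made for the example. PageRank scores every page from the whole link graph, which can be done offline ahead of any query. HITS, as Kleinberg described it, computes hub and authority scores over a set of pages related to a query; here the toy graph simply stands in for that query-specific set.

```python
# Toy link graph (hypothetical data): page -> pages it links to.
# Every link target also appears as a key.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Global, query-independent score computed over the whole graph."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # A page splits its damped score among the pages it links to.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def hits(links, iterations=50):
    """Hub and authority scores for one (assumed query-specific) page set."""
    pages = sorted(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    ranks = pagerank(LINKS)
    hub, auth = hits(LINKS)
    print("PageRank:", ranks)
    print("Hubs:", hub)
    print("Authorities:", auth)
```

On this toy graph both methods put page "c" on top, since three of the four pages link to it. The practical difference is when the work happens: the PageRank loop can run once for the whole index, while the HITS loop has to run on whatever set of pages a query brings back.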

If you don't know the difference between the two and the problem which so hindered the Kleinberg algorithm, then some of this interview may not make sense. It's pretty much a non scientific conversation. However, a little background is probably required to get a total understanding, as there are certain points where an assumption of the reader's prior knowledge is made.

So, you may wish to first read a paper I wrote on the subject of the two algorithms and what set the Kleinberg algorithm apart. Again I wrote this as a non scientific paper in order to simply get the essentials across.

It's not a long read and many people have told me that it has been very useful, not just in helping them understand more about the Kleinberg algorithm but, generally speaking, in understanding why links are so important in search engine marketing.

You'll find it here (requires Acrobat reader):

http://www.search-engine-book.co.uk/LinkEquityExplained.pdf

Here are some takeaways from the interview.

Does Google use PageRank? No, says Apostolos. Have they created some kind of local implementation based on anchor text? Yes, says Apostolos.

In fact, when I heard that opinion, it really made me think that what I wrote about Google perhaps moving towards keyword-dependent results following the Florida episode may have a little more substance.

And the number of times that Apostolos mentions that search is not just about the algorithm, "it's about the infrastructure" has me thinking that my opinion about Google's "dark fiber" dabbling may, in part, also be about that.

Here's another takeaway, from Jim this time.

In this business we talk about up to 95% of all traffic on the web coming from search (that's an arbitrary figure as I've seen so many variations).

However, I was very surprised when Jim told me that 80% of all searches are non commercial! I said: "At Jeeves?" He said: "on the web!"

So when I made my little complaint about the 20 AdWords listings we tend to see before the web results at Jeeves, he was able to point out that 80% of search results don't have them!

And what's the future of search? Is it still about links? Actually, the future is more about data on users. Find out about the web-galaxy developed by us (the web page makers) and the user-web-galaxy controlled by the search engines. And the link we need to create between them.

If you haven't read one of these features before, I should explain that they are virtually uncut, in detail, verbatim transcripts of my conversations, which are part of the research for my book.

Subscribers to e-marketing-news are the only people to receive exclusive copies.

Usually they are face-to-face, but this one had to take place as a conference call. And as ever, it has everything... including the usual interruptions!!

If you're attending SES New York, I look forward to seeing you there.

Enjoy!

Mike


<< Play >>

Mike:

Hey Jim, how's it going? Boy, that was a lengthy post you made in that thread over at Search Engine Watch forums.

Jim:

Yeah, well there were a lot of good points made. A lot of things which needed to be addressed. So I wanted to take a fair crack at it.

Mike:

You certainly did that. And did you see that there's been quite a follow-up to it? It looks like they're trying to run your business for you over there [Laughs]

Jim:

[Laughing] Yeah, but I think someone made the point that it's because they're interested.

Mike:

Yeah. I think there's a problem because, when we talk about search and the way it's developing online, it's like people are talking about three and a half search engines. I'm hoping that the thread and this conversation are able to establish that there really is a lot of stuff going on over at Jeeves.

Jim:

Well, you can start with the fact that there really are only three world class search engines. And we're one of them.

We may be smaller, but not by the capabilities of our technology.

<< Pause >>

[MSN had not launched its search service at the time of this conversation]

<< Play >>

Mike:

On the subject of technology, did you say that Apostolos would be joining us a little later?

Apostolos:

I'm here online now.

Mike:

That's great. I'm so pleased that you were able to join us. As you know I'm a very big admirer of your work.

Apostolos:

Thanks, I appreciate that.

Mike:

As we're all together, in my time honoured fashion, let's do a quick background on you guys. Jim, you first.

Jim:

I'm part of what was a very interesting time in Jeeves history which was around 2001. Either by hiring or by small acquisitions, a new management team came in.

At the time, I was the founder and president of a company called eTour in the US, which was a top 50 to 100 website. It was well known for being the number one site for user frequency over a period of two years.

We had a registered members service so they were very dedicated, loyal users. We were symptomatic of that dot-com era. Our IPO was scheduled for April 2000.

And things tailed off pretty quickly after that. It didn't happen. We hung on for a little while and then we were sold to Ask Jeeves in May of 2001.

Steve Berkowitz [CEO Ask Jeeves] was hired at the same time as president of the web properties division of Ask Jeeves. There was a three month transition period where I headed up the eTour side until Steve asked me to come and run the product management side.

So, I moved out of Atlanta, which is where we were located, to the Bay Area in 2001. And then the Teoma guys came on in September of that year.

That was a very important time for us. We brought in Paul Gardi [Senior Vice President of Sales & Business Development Ask Jeeves] and Apostolos and some other new people...

Mike:

So you guys were kind of the next generation of Jeeves, yeah?

Jim:

Yeah!

Mike:

Apostolos, I probably know a lot more about your background in this industry because I've followed it quite a bit, as you know. But just for the record: what about your background?

Apostolos:

As you know, I was teaching at Rutgers for about 15 years. And I think it was about '98, there was lots of discussion about search at that time.

Alta Vista was the number one search engine. And two papers came out. I think it was the CLEVER project from IBM that Kleinberg had started. And two guys from Stanford, Larry Page and Sergey Brin, had written a paper about PageRank, so I decided to have a look at this work.

I had a seminar where we had a group of ten kids from Rutgers looking at this work. And when I was looking at it with them, I remember thinking, wait a minute, there are a lot of interesting things we can do here. And basically, ended up solving the problem of making CLEVER fast.

<< Pause >>

[The CLEVER project and the run time analysis problem which Apostolos solved, is detailed in the white paper mentioned in the intro to this feature]

<< Play >>

If you remember, at that time, the whole discussion was about which is the best method. Is it the PageRank method, or is it the CLEVER method?

And if you look at the work at that time, everybody agreed it was the CLEVER method. And indeed it is, because it's a local method. People are interested in finding out about the subject. And this is a subject specific method.

Around about that time everyone was thinking that IBM was going to come out with the best engine. But people didn't realise, it wasn't just the method itself, it was about building the infrastructure to deliver. And PageRank was easier to deliver because it was implemented offline, rather than online.

And so they got a bit ahead. I remember when I started Teoma, I met with some friends of mine, showed them what I was doing and they said this is great. So, I started Teoma in September 2001. By then I think Google had a couple of years ahead of us.

And even then, everybody was questioning the CLEVER method: Is it possible to do it?

There was a paper by some Berkeley guys saying this is not feasible!

I remember talking to Ask Jeeves when we were negotiating and they said "you guys are a hundred million, but everybody is questioning your scalability, right?" And I said: "No, it is possible, believe me!" [Laughs]

We were only seven people at that time and we just needed some more money. So, anyway, we launched Teoma. And that was a desperation move, because we were running out of money a bit. But then the whole world started saying: hold on a minute - this works!

Then we had a lot of offers which I can't mention right now. But I decided that it was critical to put it out there so that there was an alternative, rather than just one technology. So I went with Ask Jeeves. As I say, there were opportunities to go with others...

Mike:

I want to come back and get a bit more in-depth about the work that you've been doing. About the difference between Kleinberg's work and what they're doing over at Google.

But you've just reminded me of a story that I wrote for SearchDay which was looking back at the history of search on the web. The three guys who were on this particular panel at the Search Engine Strategies conference in San Jose were the founders of Alta Vista, Excite and Infoseek.

And each one of them told the story of how they had been approached by Larry Page and Sergey Brin with PageRank and they all admitted to, kind of, throwing them out the door! And then think how big it got [Laughs]

<< Pause >>

[Mike's phone rings and he's forced to answer it. Someone from Taiwan is calling to see if they can get a copy of his book in Chinese!]

<< Play >>

So, where was I? Oh yeah, Google hawked PageRank around all of the major search engines at that time.

I guess this is a very pertinent question, but with the better approach you were taking, did you ever think about talking to Google about it?

[Pregnant pause in conversation]

Jim:

You know what, we can't go there, Mike [Laughs]

Mike:

Okay, let's move on to a small world thing. Craig Nevill-Manning is the senior research scientist at Google. He's a former Rutgers Professor as well. Did you guys ever get together, Apostolos?

Apostolos:

Actually, Craig was helping us, initially, with Teoma. So yeah, we were very good friends.

Mike:

Really! So, do you guys ever see each other? Because if you do, I want to sit in the booth next to you at dinner and record the entire conversation... [Bursts out laughing]

We'll come back to the technology thing in a moment, but of course, I've been asked by some of the guys in the forums to straighten out some things with Jim.

Jim, we'll get into some of the brass-tacks of the webmaster stuff in a few minutes. But first, can we go through a kind of, relationship chart, because there's been this rumour that Jeeves bought Teoma and has now forgotten about it...

<< Pause >>

[At this point somebody walks into Mike's office to get him to sign something. And while his attention is distracted, he realises that, although he was talking about the Jeeves/Teoma purchase, he actually asks Jim to explain what has happened since Google bought Jeeves. Oops!]

<< Play >>

Jim:

[Laughing at Mike's momentary lack of concentration] Well, there is no relationship. It is all Ask Jeeves now. As Apostolos said earlier, it was only seven people and it's now into triple digits.

I mean, regardless of the brand name out there, we're one company. One group. Ask Jeeves is as much Teoma as Teoma is Ask Jeeves.

Mike:

So, you'll keep both brands going then?

Jim:

No, that's not necessarily the case. It's still up in the air on what we need to do about that. From a traffic perspective, the only brand that really makes sense is Ask Jeeves. That's the site with over 20 million users in the US and over ten in the UK.

And there's been no other search engine, since 1998, that has gotten over a million users a month. And that includes Teoma and AllTheWeb and any other start-up that has come along.

I mean, for all the noise, A9 is still not a top 2000 site as ranked by Alexa, which Amazon owns.

Mike:

A9 has some interesting stuff going on, so I may touch on that technology again later on. But for the guys in the UK, can I just clarify one thing. I was led to understand that Ask Jeeves in the UK is a separate company...

Jim:

No, we purchased that company in 2002. So it is now part of Ask Jeeves group as well.

Mike:

Okay, so what about the other acquisitions?

Jim:

ISH? [Interactive Search Holdings]

Mike:

Yeah, I think people are wondering where they all fit into the scheme of things. Obviously there was a purpose for that purchase. My guess is increased traffic, increased inventory because AdWords plays such a big deal on the financial side of the business...

Jim:

Exactly. They had a lot of search volume. Which goes back to my point about brands that have grown traffic. ISH has iWon and more recently Excite and MyWay and they had a lot of traffic. So, yeah, that's part of us trying to increase market share.

We're going to do that over time with our main brand, but that's not something that you can just double overnight.

Mike:

But for the future there's the possibility that they'll all just come under one big umbrella and all be part of the Jeeves brand.

Jim:

Actually, I don't think so and disagree with Danny on that. Teoma is a little different because it really doesn't have a brand outside of the grass roots community.

Mike:

But the guys in this industry love it!

Jim:

Yeah, the technology. But in terms of brands you can grow with the mass market...? Merging brands is not something which works well either.

So taking iWon and Excite and making them one brand, or making them Ask Jeeves Portal is not something that really goes down smoothly with users. But that doesn't mean that's confusing to users because for the most part, they aren't aware of the corporate body behind Excite or iWon. It's still just the site that they know and use.

If you go back to Steve Berkowitz's past, he was the president of IDG Books. They own the Dummies series. And they purchased things like CliffsNotes and Frommer's and other major publishing brands and operated them independently.

So that is a model that does work. And, in fact, I'd say it would be cannibalistic to just slap the Ask Jeeves brand onto everything overnight. That wouldn't work.

Mike:

So, is it likely that they'll all be powered by Teoma? I mean all show Teoma results...

Jim:

Yes.

<< Pause >>

[Once again, Mike is interrupted by another member of staff. He writes a note with the words "interrupt me and I will kill you" and sticks it on the window of his office door.]

<< Play >>

Mike:

I've made Paul Gardi shudder a few times, Apostolos, when I've mentioned Teoma and Kleinberg's HITS algorithm in the same sentence. But there is an underlying influence of Kleinberg's work in the Teoma algorithm, as we know. You touched on that before. Is it fair to say that Kleinberg's work was also an influence on PageRank?

Apostolos:

There is some debate about this. I think both of them were independently discovered. Has CLEVER influenced Google? The answer is: later, yes. Have they implemented PageRank? The answer is no.

Mike:

Well, that's very interesting that you mention that, because I was going to ask you. In 1998 PageRank and Kleinberg's work were both pretty much advanced because search on the web was in its infancy then.

But, it's 2005 now, so things have changed. And I wanted to ask you about how important you think PageRank is in the work that goes on over at Google?

Apostolos:

The importance has diminished because PageRank is just one piece of the ranking algorithm over there. The ranking algorithm is so much more complex now. And PageRank is just used when they want to break ties.

I'm sure that they've not implemented Kleinberg's algorithm. But I'm also sure that they have created some kind of local implementation based on anchor text.

They've been very, very solid for the last few years. I think the biggest improvement they've made is anti-spam inclusion and consistency. But what surprises me is that they haven't used subject specific popularity, even though they know it's important.

Mike:

Well, I've talked to guys from Google about this. And you mentioned earlier that, when Teoma was launched there were rumours that it wasn't possible, you can't do this, it would take a server farm the size of Texas and all the electricity in the world to create just one calculation.

And obviously you've managed to do it. So, if you could just write down how you did it on a little piece of paper for me... [Bursts out laughing]

Apostolos:

Yes, it can be done.

And this has to do with my background. I come from a different background. So, I looked at this from a different perspective and was able to see a method of how this could be done.

What you guys are missing, is that, we're at the early days of search development. I know people want Teoma to succeed. But what people are forgetting is that we're still at the beginning of search.

If you were to ask me: Where are we standing right now? Well, in 2000 we were at level one. In 2004 we were at level three, and we need to get to level ten.

I can go back to the argument about Jeeves abandoning Teoma. In fact the opposite has happened. We were only seven people when we were bought by Jeeves. And search is not just about the algorithm. Search is about the infrastructure. And for us to go from a company which was worth 35 million dollars, to a company which is worth a few billion is quite an achievement.

I can assure you that we now have the infrastructure which is very hard to build. Jim mentioned it earlier, it's not about the algorithm alone, it's about building the infrastructure to be able to deliver 24/7, never going down.

Jim:

Mike, it's what I was saying, the last successful new search brand ever to launch was Google...

Mike:

So, people ask me: Who's going to be the next Google? And I say, it's like asking who's going to be the next Beatles. It's never going to happen. I think anyone trying to get into this space, general purpose search, is going to have a problem.

Jim:

Sure, people are asking what's going to be the next technology that comes along. That's important. I think Andrew Goodman said that. It's important but not sufficient and that's true. What Ask Jeeves had was a brand. We had a well known brand but not a high quality product.

So we've worked our tails off the past three years, the Jeeves team and the Teoma team, as we've grown it and started bringing quality to the table.

Statistically we know that this engine is far more advanced than it was in 2001...

Mike:

Is Microsoft going to bring in new technology? Is that going to be the next generation? Or are they just building on what already exists out there, Apostolos?

Apostolos:

Mike, I'll tell you something. In 2000, I think I was really fascinated by the solution and making it work. There is some truth in what you say about us being a little low key while we build the infrastructure. But I'm as excited now as I was in 2000. I'm overwhelmed with excitement, because what we missed at that time, and it was something important, was that there are what I call two galaxies.

The galaxy of you guys who build the web pages and there's also the galaxy of the users. And the thing which to me is beautiful and amazing, is that there has not been a connection between these two galaxies.

I can go on and on and tell you how important your industry has been. I have the utmost respect for you guys. And your impact is not recognised as much as it should be.

What you have been trying to do is to create highways between the galaxy of the users and the galaxy of the web, by building pages with the right keywords so that people can find them.

I don't think it has been recognised how important your work has been making all of those pages with the misspellings so that people can find them...

[Mike and Jim both have a chuckle at this comment]

Mike:

There seems to have been this very strange, strained kind of relationship between the search engines and search engine marketers as it's known or search engine optimisers, as was.

When I was trying to write the first edition of my book, just trying to get to speak to someone at a search engine meant hanging around in a corridor for five hours or something!

But by the time the second edition came around and we had paid inclusion and pay per click and money was involved - they were knocking on the door to take me to lunch!

There is something that I do worry about though. And that's to do with network theory. I'm not a scientist but I'm fascinated by how this works and the rich get richer issue.

I wrote a paper called "Filthy Linking Rich" and I wonder if in this industry we are in part responsible for that? You know for kind of shifting the ecology of the web with our highly optimised pages and trying to build linkage which is, perhaps, a little on the false side...

<< Pause >>

[There's a link above, in the intro to this feature, to the paper "Filthy Linking Rich"]

<< Play >>

Apostolos:

I think what you do, as long as you're trying to build high quality pages and...

What your achievement has been, really, is in finding the right words for the users to find. And nobody is discussing this.

Take someone in my neighbourhood who is building a café. And he wants local people to find that. What is he going to use as keywords to be found?

I was so impressed with what you guys were doing to help us. I mean help the search engines to find that local café.

There is a problem with the rich getting richer. But I don't think there is a problem with you helping a local café to be found. I don't think anyone would question your contributions there.

Mike:

Can I just touch on paid inclusion? At the beginning of the year [2004] I was in New York when Yahoo! launched its new search engine and SiteMatch with it, which is their paid inclusion product.

I've never really believed in this very much. Subscriptions of URLs to be crawled is one thing - but XML feeds? Nah.

You guys pulled out of that fairly quickly. I've always had the suspicion that those XML feeds really couldn't blend very well with connectivity, with linkage data...

Jim:

Yeah, that's essentially why we pulled it. I don't think this is the end of that approach. If you could keep it in a contained environment where you have apples to apples...

But the mixing of apples and oranges, the structured data with the unstructured... it just didn't work.

You were either guessing how they should blend. Or if you weren't guessing you'd be fixing. And we certainly didn't want to get into that. So how do you do it?

And when we realised that and then you layer on top of that all of the other things we had to do...

But I don't think that working with feeds, not just with vendors, but with anyone, is done forever.

I'm looking at movie listings in the newspaper for where the movie Sideways is playing and those things are paid! It's interesting, especially on the editorial side, just how much of a newspaper or magazine is actually paid for. I never realised it.

Mike:

Yeah, you suddenly find you're just reading around a lot of ads!

Jim:

But they're not really adverts, I never knew that those were paid until this year. You know where it says: this movie theatre is showing these 8 movies. And I was thinking, well why isn't the other movie theatre mentioned.

Mike:

Can we talk a bit more about paid placement then, Jim? This was mentioned over at the forum the way that Jeeves presents listings. You know, with the 20 paid listings before you get to the actual web results.

I understand that AdWords pays the bills, but do you foresee that layout changing?

Jim:

Yes, but I should go back and add something to that conversation. The SEMs [search engine marketers] are only really looking at queries that are commercial. Because that's who their clients are. And they're not looking at the vast query stream where we don't have that issue.

On the majority of queries on our site there are no ads at all.

Mike:

That's interesting...

Jim:

It's just that when they are very commercial you have a lot of ads.

That's good news in those instances. The bad news is that it's the reverse of the competition which gets an editorial result above the fold on every query. We're not there yet.

So, we have a lot of obligations as a company, we're aware of it, we're working on it. That's why I'm saying stay tuned. Give it some time and we'll get there.

Mike:

Could I glean from what you've just said that, the majority of searches at Jeeves are non commercial searches?

Jim:

The majority of searches on the web are non commercial! And the SEM community is very unaware of that.

Mike:

Hmmmmm! Exactly, that's because we spend time looking for people who want to spend money online so that we can make money ourselves.

Jim:

Exactly! I mean the majority of usage by our users is on editorial results, on editorial features - not on ads.

<< Pause >>

[Since this interview took place, Jeeves has changed the presentation of its results pages.]

<< Play >>

Mike:

One of the buzz words/terms for 2004 certainly was "personalisation". What are your feelings on that, Apostolos? Is that going to make a huge difference?

Apostolos:

Personalisation or localisation?

Mike:

I sometimes just think of localisation as being a subset of personalisation... Is that a way to look at it, do you think?

Apostolos:

Personalisation is quite hard. There are certain patterns, certain things for which you can give high quality results. But because people have certain behaviours, they sometimes don't remain stationary.

I think personalisation is interesting... but it's quite hard.

Mike:

I'm just wondering how close we're getting to personalisation. Because if you look at the components of the major search engines, Jeeves included, they're similar.

For instance, Google was left behind on the personalisation side because it didn't have a subscriber base. Whereas Yahoo! did from its previous history. They had 150 million subscribers. So they had a bit of data about their users.

And then Google introduces Orkut so that they bring people in and get a bit closer to them. And then they introduce Gmail where people now have to log in. They have those kind of add-on services at Yahoo! and also at MSN where users have to log in.

Then when you look at Jeeves' acquisition of ISH and you see MyWay and... Well, all of a sudden you guys have sticky web mail and social networks and desktop search and that sort of thing...

Is personalisation all about locking people into the brand?

Jim:

Mike, I think you're going down the right track on this. Because when people talk about personalisation and search, they talk about it as a way to deal with ambiguity.

You see the example of "bass". If you type that in as a query and I know that you like fishing, then I know you don't mean the beer!

But those instances are actually pretty rare. It's more important to actually give people the opportunity to clarify their query on topics they've never asked before.

That's like our related topics feature that came out of Teoma. It's easier to do that than it is to deal with it on a personalisation basis and try to understand what they're talking about.

So what you're getting into, Mike, is about the relationship with the user, let them have a better search...

Mike:

Yes, I agree, you're right. There are two sides to it. I've looked at the theory behind personalisation and how it would work. At the moment we're looking at this linkage data and seeing how it relates and looking at all of the clues you can get from that. And the next thing is if you've got a peer group, if you've got 15 million guys all looking for Bass beer, or whatever it is, then you can rationalise results around that. Which should provide a better user experience.

And what I'm thinking is that, if you have all of this stuff then you're actually locking somebody into the brand.

And until you get it wrong for them, then there's no need for them to go and search anywhere else!

Jim:

Actually, I think it's a lot more interesting than even that.

The vision behind MyJeeves, which made us the first major search engine to launch what we called a "personal search service", was understanding just what you said, being able to contain your search to what's important to you from a data perspective. It's very compelling to people.

What I was talking about when I was showing MyJeeves to people originally was... Well, envision this, it's 50,000 results because you've been using Jeeves for ten years. And out of the billions and billions of web pages out there, you'd save these 50,000 that are important to you. And then we give people the ability to add their own meta data to those. And that meta data is an immensely important part of personalisation.

I just got back from the consumer electronics show and wouldn't mind sharing something with you from that.

Meta data, not so much for the web, but when you go up the staircase of mobile, video, music, pictures which all require meta data to become important i.e. to become searchable and findable later: that's the vision behind it. And, of course, that's just you. You can also imagine layering something against that, such as different profiles.

So, if I'm a dentist and I just want to search against documents that dentists have found important and saved and created in their own community, I can do that. If I wanted to search in my family network for, let's say photos and music, it becomes more important in that vein.

Mike:

Just picking up on meta data. When I first came into this business, or dabbled in it as it was at the time, meta tags were very important for a web page.

As you just said, for those non HTML files, multimedia and those kind of things, then the meta data is vitally important.

The problem we had with HTML pages is the fact that it didn't take very long before, being humans as we are, people just started to lie in their meta data [Laughs]

Most of the meta tags I saw in the early days, bore little or no resemblance to what the page was actually about!

Jim:

I think people are going to need to be trained. The digital photo and digital music revolution are going to be a big part of that teaching.

Especially photos where, if I'm using Flickr, I need to label these photos so I can find them later. Whereas, if I'm using Jeeves desktop search, which is another important part of that staircase, then I need to label them correctly and get them in the right folders and subfolders.

If I want to find that photo of my trip to Monterey last month, then I better not label that one "cute picture" or something!

[Mike and Jim have a little chuckle at this]

I really need to label that one something about my son at the beach. I maybe want to put the year on it too. If in 20 years I want to use desktop search to find that picture, then just identifying the year may become important.

Mike:

Some years ago, about 2000 I think it was, I was talking to Craig Silverstein, Director of Technology at Google, about tags on pages. We were talking about images and he said that a lot of web developers are very good, but they do tend to give images kind of library references, i.e. jpeg23147 or something.

He said the smart thing to do would be to give it a name and say in the alt text "picture of a fish" or whatever it happened to be.

I wrote about it in a newsletter and everybody took that to be the clue to ranking better at Google and immediately started stuffing alt tags with keywords [Laughs]
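
<< Pause >>

As a rough illustration of the alt text point, here is a minimal Python sketch that lists the images on a page which have no alt text at all. It uses only the standard library's html.parser module. The class name, the function name and the sample markup (including the file names) are hypothetical, made up for this example; it says nothing about how Google or any other engine actually reads images.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Collect the src of every <img> tag that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A missing alt attribute and an empty one are treated the same here.
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.missing_alt.append(attributes.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images in html that lack alt text."""
    audit = ImageAltAudit()
    audit.feed(html)
    audit.close()
    return audit.missing_alt


# Hypothetical page fragment: one "library reference" image, one described image.
SAMPLE = """
<img src="jpeg23147.jpg">
<img src="trout.jpg" alt="picture of a fish">
"""

if __name__ == "__main__":
    for src in images_missing_alt(SAMPLE):
        print("No alt text:", src)
```

A check like this only tells you whether a description exists, not whether it is honest, which is exactly the keyword stuffing problem described above.

<< Play >>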

Jim:

That's why personalisation is so important. If we're talking about spam, that will decrease the likelihood that I'm going to have one of those spam pages in my personal index.

With personalisation it's about what the user finds important so quality is more important.

Mike:

Just to finish the story about the alt tags. Google was just working on its image search at the time and I think it was more of a ploy to get people to label images and mark them with an alt tag for the benefit of the new service [Laughs]

Apostolos, I want to talk about crawling the web for a moment. This still seems to be the most primitive part of search engines and still seems to have a way to go.

What you tend to find in this industry is that we try to create pages for crawlers. And I'm just wondering, philosophically, if we are doing the right thing, if you know what I mean.

If we create pages for crawlers, then search engines will have less of an idea that there are problems with crawling and therefore won't do much to improve or fix them...

Apostolos:

I saw your comment about a page which was a cloaked page. I looked at it and it wasn't really a cloaked page, the author of the page had taken out the images to make it more text based and crawler friendly.

Crawling is primitive because it can't rely on an infrastructure which is built by a single entity; it has to work with a chaotic infrastructure. Wherever you go on the web, everybody has different rules.

To our amazement we still find pages for Ford or large companies which don't know how to create pages to be crawled.

You know there are large companies which have robots.txt files which exclude search engines and wonder why they're not getting crawled!

[Mike has a knowing laugh at this too]
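
<< Pause >>

For anyone wondering what that robots.txt mistake looks like in practice, here is a small Python sketch using the standard library's urllib.robotparser module. Both robots.txt files, the "ExampleBot" user agent and the example.com address are hypothetical, invented for this illustration; it is not a description of how Teoma's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts every crawler out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only keeps crawlers out of one directory.
BLOCK_PRIVATE_ONLY = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    home = "http://www.example.com/index.html"
    print("Block-all file lets the home page be crawled:", can_crawl(BLOCK_ALL, home))
    print("Private-only file lets the home page be crawled:", can_crawl(BLOCK_PRIVATE_ONLY, home))
```

The first file is the kind that leaves a company wondering why it isn't getting crawled: a single "Disallow: /" under "User-agent: *" asks every well-behaved crawler to stay away from the entire site.

<< Play >>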

I think that everybody has made a lot of progress and I think we're at the point where the web is easier for the search engines.

There's not a uniform way of building pages. And optimisation is the only way for us to try and create a sort of uniformity.

The web is chaotic. There are no standards. People have tried to put standards in place. So I think it's very important to have pages that crawlers can find. We have problems finding pages that have no text.

I mean how do you recognise pages that are Flash without anything in them?

Mike:

Are you crawling Flash yet?

Apostolos:

Yes.

Mike:

I'll tell you what I'd like to do right now, while I still have both of you guys here. We've been talking about creating pages and optimising. Let me just take advantage and ask you what we need to do to build the perfect Jeeves/Teoma web site? What do we need to do to rank well?

Apostolos:

Well, build a great page which is recognised by the web. The only problem you have is how quickly this page will be recognised by the web, but Teoma, I don't know if you're aware, has recently improved the crawling speed dramatically. So, it's basically building a great page which will be recognised within its community.

Don't try to build your own community and say: "I'm the one who recognises myself" though!

Now, with Teoma, as I mentioned before, we're going into the next generation. I want to get you excited because 2005 is going to be the most exciting year for search engines.

Now it's not just about communities, it's about the users. There are new technologies coming in which will change the way that people access information.

I don't know if you saw BBC News in the UK? There was an article where they had a competition with Teoma vs. all the major search engines and we came out on top. I don't know if you read that?

Mike:

I haven't seen that one, no. But I must go and find it. I do some stuff with BBC!

I have to tell you, when I started working with the BBC on some stuff last year, I introduced Teoma to them. They had heard of Jeeves but not Teoma.

And since I introduced the guys doing the technology stuff to it, it's become their favourite search engine.

<< Pause >>

[You'll find the BBC article here:

http://news.bbc.co.uk/2/hi/uk_news/magazine/4003193.stm ]

<< Play >>

Jim:

Mike, what do you perceive to be the difference between them... Jeeves and Teoma?

Mike:

I like the presentation at Teoma. I have to say I've learned to live with Google's presentation of natural results down the left hand side and adverts down the right. With Jeeves I find it a little annoying at times that, perhaps because I'm doing mainly commercial searches, I get that long list of paid results before you get down to the natural results.

I'm not a big fan of the frames at Jeeves. I know I can look around and find a link to get rid of the frame. If I do a search for digital cameras and all I want are some reviews, then I still have to get through ten affiliates and other stuff before I can find them. This is just a personal thing, of course.

I guess, at the end of the day, the user is the most important thing to you guys. And if they're not complaining, that's the most important thing.

Jim:

But you're in the UK and they're still a little behind what's happening in the US.

The frames thing is interesting, by the way...

Mike:

Yeah, well that's probably because it's a bit more difficult for people inside the industry. We're just too close to it and we're a bit more objective about it than the average end user, I suppose.

There are things in this industry that only we care about. It's like the PageRank thing. I've said so many times that I don't believe that Google uses PageRank. It's the Emperor's new clothes if you ask me. Only people in this industry worry about it.

If you ask the average end user at Google whether he's bothered about PageRank he probably wouldn't even know what it was.

Jim:

Of course!

Mike:

So, what about the future? I wouldn't like to get too deep into the technology but you were talking about the end users, Apostolos. There's a lot that we can learn because there is history now. So I guess with learning machines, genetic algorithms, there's a lot more data that you can work with than purely linkage data...

Apostolos:

I touched upon it a little earlier. Like I said there are two galaxies. One is the user galaxy which you guys don't know about. And the other is the web galaxy which you guys work with every day.

The lucky ones, really, are the search engines because they own the user galaxy. We were talking about personalisation, let's touch on that again. It's not the individual who is important, it's the group.

We belong to groups and we have group behaviour. And this to me, going back to the subject specific popularity and the clusters we have created on the web... Well just imagine if we can create clusters of user behaviours!

This, to me, is the next breakthrough. I can't talk to you about this in detail, but I can tell you one important thing: go and read that BBC article and you'll realise why we're the best!

Mike:

I didn't dispute it for a second [Laughs] And I'll certainly go and read the BBC article. It's strange that I missed it because I check the BBC a lot.

Apostolos:

I know the webmaster world has been focusing on only one thing and that's commercial. So webmasters have not noticed how much Teoma has been improving because they're only looking at commercial.

Probably excluding you, Mike, and some other people who have noticed it.

We've improved dramatically. And I haven't disappeared, as Jim said, I'm here. I have been for the past three years and I'm more excited than ever. And I want to make it happen for you guys and the world, that's the reason I'm here.

I'm excited, it's about what I call the "userweb" and the web that you are creating, it's the integration of the two that will take us to the next level.

Mike:

Fascinating.

Can we just touch on one more thing which we haven't really covered, although you did mention that cloaked page earlier. And that's the subject of spam. It is a major issue. Are we going to be able to clean up the index and get rid of rubbish... Will we be able to tame the wild west, do you think, Jim?

Jim:

I think relative to the other two major engines, we do very well. I think unfortunately it's a problem that all three of us have. That's because there's value [money] in gaming the engines. Luckily, at our core, we have some technology which makes it harder to spam.

Moving forward, we're all developing technology to deal with this. And I think we're all going to be upping the ante in that war, from our side.

And as the web evolves and moves up the staircase with personalisation and new types of data it'll become harder and harder to spam.

Mike:

That's what I was kind of figuring, that it's going to get harder. I sometimes wonder, it's a very fine line, but the methods and the efforts that people put into spamming may be better spent just creating some great content instead of crappy spam. You may have somebody linking to you and end up just doing well naturally!

Jim:

If people don't find it valuable... Well, you can run for a while, but you can't hide. And if people don't find it valuable, we're gonna know.

Mike:

Listen guys, I know that I've taken up a lot of your time and I really do appreciate it.

Apostolos:

I just want to get a message to the community. I've been following you and reading your stuff and I learn a lot from you. So there is indeed an appreciation of the simple fact that you were able to help us. Overall, I think your contribution should be recognised.

Mike:

Wow! Absolutely fantastic!

Listen, if you're at SES NY, dinner's on me guys.

Jim:

Mike, you're gonna have to come down to Piscataway.

Mike:

Sure, I don't mind. I'll come down and see what you're doing down there.

Jim:

We'll take you to White Castle...

Mike:

Sounds great - what is it? [Laughs]

Jim:

These are very good cheap hamburgers, famous in New Jersey.

Mike:

[Bursts out laughing] Oh really? Sounds just like my kind of food.

Once again, thanks for your time guys.



© Mike Grehan 2005

Editor: Mike Grehan. Search engine marketing consultant, speaker and author. http://www.search-engine-book.co.uk

Associate Editor: Christine Churchill. KeyRelevance.com

e-marketing-news is published selectively on a when it's
ready basis. ©2005 Net Writer Publishing.

At no cost you may use the content of this newsletter on
your own site, providing you display it in its entirety
(no cutting) with due credits and place a link to:

< http://www.e-marketing-news.co.uk >
