Better ways to capture data sought
By Anick Jesdanun
Associated Press
NEW YORK - Startups and leading tech companies, including search exemplar Google, are tinkering with new ways of culling and presenting information, ideas that could prompt the next revolution in search.
"Because information is exploding, (the Internet) is going to become increasingly difficult to use if we don't get it right," said Liesl Capper, chief executive of Australian search startup Mooter.
Users who consider Google exhaustive are only fooling themselves, experts say. Today's search engines may be capturing as little as 1 percent of the Web, largely because of how they find and index online resources.
Investigator Cynthia Hetherington, who runs a Haskell, N.J., company, turned first to Google when she recently suspected an Australian company of possible fraud. But then she had to go to the Australian Securities and Investments Commission, LexisNexis and Dun & Bradstreet to find what she was looking for.
"It's very frustrating," Hetherington said. "It's like going to a library and only pulling one book off the shelf."
Currently, all search engines fail to capture the bulk of the "invisible Web": resources locked up in databases and inaccessible to the engines' indexing crawlers.
These include regulatory filings at the U.S. Securities and Exchange Commission, detailed reports on charities at GuideStar and complete archives of most newspapers.
Sometimes, accessing an "invisible" database requires payment. Search engines can't let you know about a document's availability for purchase if they can't scan it in the first place.
But even when a database is free, a site may require registration, prohibit search crawlers or use incompatible formats.
In particular, crawlers are stymied by dynamic Web pages, which are customized as users choose various options, such as car color at Cars.com.
To counter that, Chicago-based Dipsie Inc. is developing software that promises to fill out Cars.com's simple online forms, which are based on multiple choice, though not the complex ones for the government's patent and trademark databases, which require typing in keywords. A public test version is expected by summer.
Other companies are working to capture sound and video files that have troubled text-based crawlers.
StreamSage Inc. uses speech-recognition technology to transcribe feeds, so a search engine can pull out relevant portions of a long presentation.
Company president Seth Murray said Harvard's medical school and NASA already use the technology, but engineers still must speed it up for broader use.
Yahoo Inc. is going a less technical, more controversial route: Businesses can pay to ensure that their "invisible Web" pages get indexed.
But indexing more of the Web only brings up another challenge: identifying the most relevant among the billions of documents available. So some search developers are focused on personalizing and organizing searches.
Eurekster Inc., a startup launched in January, is marrying search with social networking, in which friends, your friends' friends and their friends form online circles.
Eurekster guesses what you're seeking based on what others in your circle have found relevant.
The major search engines, meanwhile, are trying to localize results, with Yahoo! and America Online having an advantage over Google because they already have billing or registration information on many users.
At Microsoft Corp., researchers are exploring ways to return specific facts rather than entire documents. A search for "Marilyn Monroe's birthday" would return an answer, "June 1, 1926," instead of sites on her famous "Happy Birthday, Mr. President" performance.
"We still have this library metaphor of 'Let me give you back a bunch of books that might help you,' ... rather than 'Let me go through the books for you and figure out what you're looking for,'" said Eric Brill, a senior researcher with Microsoft's AskMSR project.
But don't count Google out. It has hundreds of engineers in California, New York, India and soon Switzerland working to make searching better, most recently with localized searching.
Google's director of technology, Craig Silverstein, said the industry leader must keep innovating because search is bound to morph into something completely different within a decade.
"It will be something that we haven't even thought of yet," Silverstein said.