The service will look and function much like Google’s regular web search engine, according to product manager Tom Stocky, but results will be limited to actual source code files.
Users would typically search for functions that they plan to implement, for example. The results would show those functions being defined and called in real source code.
To build the index, the Google crawler has decompressed any archives, such as Zip files, that it has found on its trawl of the web. If it finds source code in an archive, it adds it to the index. Ordinarily such files could be found by filename, but users would not be able to search within the code itself.
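As a rough illustration of the archive step described above, the following Python sketch opens a Zip file and adds any source files it contains to a simple in-memory index. It is not Google's crawler: the function name `index_archive`, the extension list and the sample archive are all assumptions made for the example.

```python
import io
import os
import zipfile

# Hypothetical list of extensions treated as source code; the article does
# not say how the real crawler recognises source files.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".py", ".js", ".java"}


def index_archive(archive_bytes, index):
    """Add every source file found in a Zip archive to `index` (path -> text)."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            extension = os.path.splitext(name)[1].lower()
            if extension not in SOURCE_EXTENSIONS:
                continue  # skip directories and non-source files
            index[name] = archive.read(name).decode("utf-8", errors="replace")
    return index


def build_sample_archive():
    """Create a small in-memory Zip file so the sketch runs on its own."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("pkg/util.c", "int add(int a, int b) { return a + b; }\n")
        archive.writestr("pkg/README", "not source code\n")
    return buffer.getvalue()


index = index_archive(build_sample_archive(), {})
print(sorted(index))
```

Once the file contents are in the index, they can be searched directly, which is what a filename-only search cannot offer.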
Relevancy is based on the usual criteria that rank the popularity of a site. So kernel.org, where the Linux kernel code is stored, would merit a high rank, for example, Stocky said. In addition, queries would be more likely to return a function definition before a function call.
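Google has not published how these two signals are combined. Purely as an illustration, the sketch below adds a fixed bonus to definitions on top of a per-site score; the field names, the scores and the `DEFINITION_BONUS` weight are all hypothetical.

```python
# Hypothetical weight that nudges definitions ahead of calls.
DEFINITION_BONUS = 0.5


def rank(results):
    """Order results by site score, with a bonus for function definitions."""
    def score(result):
        bonus = DEFINITION_BONUS if result["kind"] == "definition" else 0.0
        return result["site_score"] + bonus

    return sorted(results, key=score, reverse=True)


# Made-up results; the site scores are illustrative, not real rankings.
results = [
    {"path": "example.org/demo.c", "site_score": 0.2, "kind": "call"},
    {"path": "kernel.org/sched.c", "site_score": 0.9, "kind": "call"},
    {"path": "kernel.org/core.c", "site_score": 0.9, "kind": "definition"},
]

ranked_paths = [result["path"] for result in rank(results)]
print(ranked_paths)
```

Because the bonus is added rather than used as a strict sort key, a definition on an obscure site can still rank below a call on a popular one, which matches the "more likely" wording.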
The system is completely algorithmic, like the web search, and returns any publicly accessible source code, which mainly means code from open source projects.
Stocky acknowledged that the search engine has not been able to determine the license under which some of the code it has indexed is offered, but noted that the search engine would be a way for developers of proprietary code to discover if their work has been published without their permission.
Searchers will be able to restrict the results by programming language (C++, JavaScript, etc), by license and by the package the code comes from, Stocky said.
They will also be able to search using regular expressions, a rather complex way of expressing search terms that is used frequently in programming but not usually in web search.
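To give a sense of what a regular-expression search restricted by language and license might look like, here is a minimal Python sketch over a tiny in-memory index. The index layout, the `search` function and its parameters are assumptions for illustration and do not reflect the service's actual query syntax.

```python
import re

# Hypothetical index entries: (path, language, license, text).
INDEX = [
    ("kernel/sched.c", "C", "GPL", "void schedule(void)\n{\n    pick_next();\n}\n"),
    ("app/main.js", "JavaScript", "MIT", "function schedule() {\n  run();\n}\n"),
]


def search(pattern, language=None, license_name=None):
    """Return (path, line_number, line) for every line matching `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for path, lang, lic, text in INDEX:
        if language is not None and lang != language:
            continue  # restricted to another programming language
        if license_name is not None and lic != license_name:
            continue  # restricted to another license
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line))
    return hits


# Match "schedule" followed by optional whitespace and an opening parenthesis.
print(search(r"schedule\s*\(", language="C"))
```

Without the `language` restriction, the same pattern would also match the JavaScript file.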
As well as indexing decompressed archives of code, the service plugs into various code repositories using the CVS and Subversion protocols, which are commonly used for managing open source projects, Stocky said.
Despite this, there will not be a Google Cache-style version search, Stocky said, unless multiple versions of code packages have been stored as archives.
Google will also make APIs available, so developers can build search plug-ins for whatever development environment they use, or for web sites where they host code.