A Search Engine for Code

Kathryn Stolee

“Writing software is kind of like solving a puzzle,” said Kathryn Stolee, the Harpole-Pentair Assistant Professor of Software Engineering.

Any programmer who has suffered long hours in search of missing code can attest to this analogy. But now, thanks to Stolee’s research and development of Satsy, a new code-specific search engine, digging up those final missing pieces has become easier than ever.

“I wanted to find a way to help programmers reuse existing code so they don’t have to re-invent the wheel,” said Stolee, who added that much of the programming code we write today has likely been written in the past. “I also wanted to assist novice programmers who don’t have much experience or formal training. I think that’s who will find the most value: people who know what they want to do, but aren’t quite sure how to do it.”

The first thing Stolee did was conduct a survey of programmers and their code searching habits to understand their needs and how to meet them. She gathered information on how often they search for code, what information they were looking for, and what tools they used.

After analyzing the survey data, Stolee was surprised both by how frequently programmers search for code and by where they search for it. The survey revealed that Google was the most frequently used tool among programmers, despite the numerous code-specific search engines available.

“It’s surprising, because Google wasn’t designed specifically for code search,” said Stolee. “It was designed for general search, and although it currently serves as the best tool, I think that for certain types of searches, we can do better.”

The survey results seemed to show a demand for a code-specific search engine that is accurate, efficient, and able to give Google a run for its money.

This is where Satsy comes in.

Satsy is the program that Stolee developed to help programmers search for code quickly and efficiently. Unlike Google, where you search using a question or phrase, Satsy uses input values and output values to locate source code that best matches the programmer’s needs. By eliminating the textual query, Satsy searches by what a function does rather than by the words used to describe it.

“You provide concrete examples of inputs and concrete examples of outputs for your desired code. Then, Satsy uses a constraint solver to find existing functions that satisfy the examples,” said Stolee. “It’s not easy, which may be why it hasn’t been done before, but it is more intuitive than a textual search and can achieve higher precision. By using a constraint solver, we also can find code that approximately or partially matches the examples when an exact match does not exist.”
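To make the idea of searching by example concrete, here is a minimal sketch in Python. It is not Satsy’s method: Satsy uses a constraint solver, while this sketch simply runs each candidate function on the example inputs and compares the results. The repository contents and the names `REPOSITORY`, `satisfies`, and `search` are hypothetical, made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input value, expected output value)

# Hypothetical stand-in for a code repository: function name -> function.
REPOSITORY: Dict[str, Callable[[str], str]] = {
    "to_upper": lambda s: s.upper(),
    "strip_spaces": lambda s: s.strip(),
    "file_extension": lambda s: s.rsplit(".", 1)[-1],
    "reverse": lambda s: s[::-1],
}


def satisfies(func: Callable[[str], str], examples: List[Example]) -> bool:
    """Return True only if func maps every example input to its expected output."""
    for given, expected in examples:
        try:
            if func(given) != expected:
                return False
        except Exception:
            # A candidate that crashes on an example does not match.
            return False
    return True


def search(examples: List[Example]) -> List[str]:
    """Return the names of all repository functions consistent with the examples."""
    return [name for name, func in REPOSITORY.items() if satisfies(func, examples)]


# The query is a pair of input/output examples, not a textual description.
matches = search([("report.pdf", "pdf"), ("photo.jpeg", "jpeg")])
print(matches)
```

Running candidates directly is the simplest way to show the query style. It does not capture what a constraint solver adds, such as reasoning about code without executing it.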

The science behind Satsy

Satsy scans through a library of source code, called a repository, and uses an SMT (satisfiability modulo theories) solver to determine whether each function matches the provided examples. It pulls out every function that satisfies the user’s input and output values, then applies a ranking system to determine which ones are most likely to satisfy the programmer’s needs.
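The scan-then-rank flow can be sketched in Python as follows. The article does not describe how Satsy ranks results, so the scoring rule here (the fraction of examples a function satisfies) is an assumption chosen for illustration, and direct execution again stands in for the SMT solver. The names `CANDIDATES`, `score`, and `rank` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[int, int]  # (input value, expected output value)

# Hypothetical repository of candidate functions.
CANDIDATES: Dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "increment": lambda x: x + 1,
    "negate": lambda x: -x,
}


def score(func: Callable[[int], int], examples: List[Example]) -> float:
    """Fraction of examples the function satisfies (assumed ranking signal)."""
    if not examples:
        raise ValueError("at least one example is required")
    passed = 0
    for given, expected in examples:
        try:
            if func(given) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed example
    return passed / len(examples)


def rank(examples: List[Example]) -> List[Tuple[str, float]]:
    """Scan all candidates, drop those matching nothing, sort best first."""
    scored = [(name, score(func, examples)) for name, func in CANDIDATES.items()]
    kept = [(name, s) for name, s in scored if s > 0]
    # Higher score first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


for name, s in rank([(2, 4), (3, 9), (0, 0)]):
    print(f"{name}: {s:.2f}")
```

With these examples, `square` matches all three pairs and ranks first, while `double` and `negate` appear lower as partial matches, which loosely mirrors returning approximate results when no exact match exists.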

After developing her approach, the next step in Stolee’s research was to evaluate how Satsy would hold up in practice against the competition: in this case, Google and a pre-existing code-specific search engine called Merobase. Stolee gave programmers simple tasks that required them to run searches using each tool. Afterward, she collected data to determine which search engine provided the most satisfying results from the programmers’ standpoint.

Stolee’s evaluation was promising, showing that Satsy outperformed Merobase in providing relevant search results. While Google still provided the most relevant search results of the three tools, the results were competitive.

When the evaluation was complete, Stolee began looking at areas to improve. One area she hopes to tweak is the ranking system used in Satsy. By increasing the accuracy and effectiveness of the ranking system, Stolee believes Satsy could match Google.

“Google wouldn’t be as effective as it is if it didn’t have such an effective ranking system,” said Stolee. “Without a good ranking system, it’s hard for programmers to find what they want to use.”

Now, as Stolee looks toward the future, she has identified multiple “next steps” that she plans to take toward improving Satsy to make it more helpful and effective. Most importantly, she hopes to gain more knowledge about what the programmer already knows, and what information they are looking for when conducting code searches.

Stolee also is working with a senior design team to create an interface that will make Satsy more user-friendly and efficient. Once the program is fully developed, further tests can be run and measured more accurately.

Looking ahead five years, Stolee hopes to make Satsy publicly available to programmers everywhere. She also hopes that it can be adopted and used in practice by students and professionals alike.

January 8, 2014 by Brock Ascher