Last Updated on July 10, 2018 by Mathew Diekhake
If you’re a developer you probably read through a lot of documentation, forums and discussions to find solutions to your coding questions, learn new programming languages or master new tools. It always takes a great deal of time and energy to read through the many long threads discussing similar problems and potential solutions before finally finding that one answer.
Many of us are developers too, and we thought: what if Bing were intelligent enough to do this for us? What if it could save users’ time by automatically finding the exact piece of code containing the answer to the question? That is how Code Sample Answer was born.
You can see it live today on Bing by trying a query like “convert case using a function in R”, and we are sure you will appreciate how the code snippet is extracted for you from the article and surfaced right in the results in the requested programming language. If you need more context, the link below the code sample will take you directly to the corresponding article. And just to be clear – our support is not limited to the Microsoft family of programming languages.
The Challenge
According to Wikipedia, source code is any collection of computer instructions written using some human-readable programming language, usually as plain text. Each programming language has a defined set of instructions, syntax and form that is unique to that language.
We wanted our solution to support a broad range of programming languages. For Bing to be able to satisfy this requirement and find a corresponding code snippet answering a user’s query, it had to be able to parse and understand those instructions, the syntax and form of the many different programming languages. Would it be possible to do it without having to build a completely separate system for each language?
At the same time, developers prefer to search for solutions to their problems using natural language. Bing needs to be able to map the intent of the query (expressed in natural language) to the intent of a code sample (expressed in a programming language) in order to find the most relevant code sample for the query.
We were able to solve this challenge by leveraging Bing’s Natural Language Processing (NLP) technology and language-agnostic code understanding capabilities, dramatically cutting the time and effort of getting from question to answer for many developers.
How does Code Sample Answer work?
When a developer issues a complex query such as “send html email with attachment in outlook using java”, which contains multiple coding-related terms (“html”, “java”, “outlook”), it takes quite a bit of work to correctly tease apart the actual intent of the user. In this case the intent is to send an HTML-formatted email with an attachment via Outlook using the Java programming language. Getting the query intent right is critical to being able to extract the most relevant code sample from the range of available options later in the process.
The natural language processing pipeline of Bing accomplishes this by converting the query to an equivalent ‘coding query key-phrase’. Bing’s language-agnostic code understanding engine then ensures the results correctly reflect the intent based on holistic query understanding rather than on simple, individual keyword matches. The diagram below shows this process in a bit more detail.
When a query is issued on Bing.com, it is first classified based on its intent as code or non-code type of query. It is then processed by several of Bing’s basic query alteration pipelines including our NLP pipeline and converted to a query key-phrase that can be subsequently matched to relevant web-pages.
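As a rough illustration of the key-phrase step, here is a minimal sketch in Python. It is not Bing’s pipeline: the word lists and the `to_key_phrase` function are hypothetical, and a production system would rely on learned models rather than fixed sets. The sketch simply drops filler words and picks out a target language name.

```python
# Hypothetical word lists, for illustration only. A real query alteration
# pipeline would learn these signals from data instead of hard-coding them.
FILLER_WORDS = {"a", "an", "the", "how", "to", "with", "in", "using"}
LANGUAGE_NAMES = {"java", "javascript", "js", "python", "php", "r", "c#"}


def to_key_phrase(query: str) -> dict:
    """Reduce a natural-language query to content terms plus a target language."""
    tokens = query.lower().split()
    # Keep only the tokens that carry content.
    content = [token for token in tokens if token not in FILLER_WORDS]
    # Assumption: when several language names appear, the last one is the target.
    languages = [token for token in content if token in LANGUAGE_NAMES]
    return {
        "key_phrase": " ".join(content),
        "language": languages[-1] if languages else None,
    }


result = to_key_phrase("send html email with attachment in outlook using java")
print(result["key_phrase"])  # send html email attachment outlook java
print(result["language"])    # java
```

Here “html” and “outlook” stay in the key-phrase as content terms, while “java” is singled out as the language the answer should be written in.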
Each of these pipelines is specialized for coding queries, as their semantics are frequently quite different from those of general queries. Consider, for example, the queries “Chai or Mocha” and “Chai or Mocha assert”. While they may have some overlapping intent, the primary intent clearly differs. In the former case, a user may very well be looking for differences between the chai and mocha beverage types, whereas in the latter the main intent is almost certainly to find out more about the corresponding JavaScript testing frameworks.
In ambiguous cases such as this one, web results will continue to honor all likely intents and the Code Sample Answer may be suppressed. It is only when Bing detects the coding intent with high confidence that the Code Sample Answer will trigger.
To achieve this level of precision for query intent detection, Bing’s natural language processing pipelines for developers leverage patterns found in training data from developer queries collected over the years, containing commonly used terms and text structure typical for coding queries. The system also leverages a multitude of click signals to improve the precision even further.
Once the query is classified as a code or non-code query and the key-phrase terms are identified, Bing’s language-agnostic code understanding engine interprets the developer intent. This understanding of the intent is built from signals such as specific syntax and any API, tool or language names used in the query that are currently popular in the development community.
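To make the idea of confidence-based triggering concrete, here is a toy sketch in Python. It is only an illustration under simple assumptions: the `CODING_TERMS` set, the scoring rule and the 0.25 threshold are all hypothetical, and the real classifier is trained on query logs and click signals rather than built from a fixed vocabulary.

```python
# Hypothetical vocabulary of coding terms, for illustration only.
CODING_TERMS = {
    "assert", "function", "java", "javascript", "c#", "php",
    "html", "array", "string", "exception",
}


def coding_intent_score(query: str) -> float:
    """Return the fraction of query tokens that are known coding terms."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in CODING_TERMS)
    return hits / len(tokens)


def should_trigger_code_answer(query: str, threshold: float = 0.25) -> bool:
    """Trigger the code answer only when the score reaches the threshold."""
    return coding_intent_score(query) >= threshold


print(should_trigger_code_answer("Chai or Mocha"))         # False
print(should_trigger_code_answer("Chai or Mocha assert"))  # True
```

In this toy version the single term “assert” is enough to tip the second query over the threshold, which mirrors how one strong coding signal can disambiguate an otherwise general query.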
Based on query understanding, the system then extracts the best-matched code samples from popular, authoritative and well-moderated sites like Stack Overflow, GitHub, W3Schools, MSDN, Tutorialspoint, etc., taking into account such aspects as fidelity of API and programming language match, counts of up/down-votes, completeness of the solution and more.
One of the key challenges in extracting and surfacing the best-matched code snippet from a web page is that many of these pages can have multiple intents. For example, one post on Stack Overflow has PDF generation from dynamic tables using Node.js as its primary topic, but the same page contains a reply from a developer suggesting PhantomJS as a solution. Such mixed suggestions on the page may lead to ambiguous results. So, to keep our results precise, we extract web-page content using explicit semantic analysis, which can measure the semantic relevance between a query key-phrase and a given web page.
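The signals listed above could be combined into a single score in many ways. The following Python sketch shows one simple possibility. The `Candidate` fields and the weights are assumptions made up for this example, not the actual features or weights Bing uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A code sample found on a web page (hypothetical fields)."""
    source: str
    language_match: bool  # sample is in the requested programming language
    api_match: bool       # sample uses the API named in the query
    upvotes: int
    downvotes: int
    is_complete: bool     # sample looks like a complete solution


def score(candidate: Candidate) -> float:
    """Combine the signals with illustrative, hand-picked weights."""
    votes = candidate.upvotes + candidate.downvotes
    # Treat a sample without votes as neutral rather than bad.
    vote_ratio = candidate.upvotes / votes if votes else 0.5
    return (
        2.0 * candidate.language_match
        + 2.0 * candidate.api_match
        + 1.0 * vote_ratio
        + 1.0 * candidate.is_complete
    )


def best_candidate(candidates: list) -> Candidate:
    """Return the highest-scoring candidate."""
    return max(candidates, key=score)


samples = [
    Candidate("page-a", True, True, 10, 0, True),
    Candidate("page-b", True, False, 100, 0, True),
]
print(best_candidate(samples).source)  # page-a
```

With these weights, an exact API match outweighs raw popularity, so the sample from "page-a" wins even though "page-b" has ten times as many up-votes.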
The semantic score produced by the model captures the quality of match of the document snippet to the query. During the next steps in the process the snippets are evaluated, ranked, and the best one is returned.
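To give a feel for what such a relevance score does, here is a deliberately simplified Python sketch. It is a stand-in, not explicit semantic analysis: it compares plain word counts with cosine similarity, whereas explicit semantic analysis maps text into a space of concepts. The function names and the sample text are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_segment(key_phrase: str, segments: list) -> str:
    """Return the page segment most similar to the query key-phrase."""
    query_vector = bag_of_words(key_phrase)
    return max(
        segments,
        key=lambda segment: cosine_similarity(query_vector, bag_of_words(segment)),
    )


page_segments = [
    "generate a pdf with a dynamic table in nodejs",
    "you could try phantomjs instead",
]
print(best_segment("generate pdf table nodejs", page_segments))
# generate a pdf with a dynamic table in nodejs
```

Scoring each segment of a page separately is what lets a system favor the part of a mixed-intent page that actually answers the query.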
In cases where we are highly confident that our best snippet completely satisfies the user’s query, it will be shown at the top of the result page as in the example above. You may also want to try “array concat in js”, “c# string.substring”, “arraylist toarray java” or “php preg_match” to see more examples of high-confidence answers.
In cases where our confidence is lower, the Code Sample Answer will be rendered deeper in the body of the page (“c# native json”, “indenting bullets in html”, “INVALID PARAMETER EXCEPTION”, “how to move on root web site in sharepoint in c#.net”).
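The placement rule described above can be pictured as a pair of thresholds. The Python sketch below is purely illustrative: the function name and both threshold values are hypothetical, since the actual confidence levels are not published.

```python
def answer_placement(confidence: float,
                     top_threshold: float = 0.9,
                     show_threshold: float = 0.5) -> str:
    """Map a confidence score in [0, 1] to where the code answer is shown."""
    if confidence >= top_threshold:
        return "top"         # shown at the top of the result page
    if confidence >= show_threshold:
        return "body"        # rendered deeper in the body of the page
    return "suppressed"      # not shown; regular web results only


print(answer_placement(0.95))  # top
print(answer_placement(0.70))  # body
print(answer_placement(0.20))  # suppressed
```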
Closing thoughts
The functionality of Code Sample Answer is not limited strictly to programming languages; it also covers many tools developers commonly use. For example, if you have trouble memorizing all the useful git commands and their syntax, you can easily get that information using Code Sample Answer as well.
As we receive more and more signals from web searches and continuous feedback from developers we will continue working on enhancing the overall developer help experience and coverage of Code Sample Answer.
We count on feedback from users like you to help us understand what future enhancements would be most helpful and valuable in your daily tasks. We encourage you to use Bing when looking for any code related help, and if you have any suggestions or feedback, share it directly using the Feedback link on the page.
Happy Coding!
– Bing Tech Team (developer help)
Source: Intelligent search: Coding answers at your fingertips | Search Quality Insights