Search engines today do far more than the simple keyword matching they used to. You can ask a question, say "How tall is the tower in Paris?", and they will tell you that the Eiffel Tower is 324 meters (1
How do they do it? As with so much else these days, they use machine learning. Machine learning algorithms are used to build vectors: in essence, long lists of numbers that in some sense represent their input data, whether that is the text of a web page, an image, sound, or video. Bing stores billions of these vectors for all the different kinds of media it indexes. To search the vectors, Microsoft uses an algorithm called SPTAG ("Space Partition Tree and Graph"). The input query is converted into a vector, and SPTAG is used to quickly find its "approximate nearest neighbors" (ANNs), that is, the vectors most similar to the input.
This (with a good deal of hand-waving) is how the Eiffel Tower question can be answered: a search for "How tall is the tower in Paris?" will land "close" to pages that talk about towers, Paris, and how tall things are. Such pages are almost certainly about the Eiffel Tower.
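To make the idea concrete, here is a minimal sketch in Python. It is not SPTAG and does not use SPTAG's API: the word-count "embedding", the tiny vocabulary, the three sample pages, and the function names are all assumptions made up for illustration, and the search is an exact brute-force scan with cosine similarity rather than the approximate tree-and-graph search that SPTAG performs. It only shows the overall shape: turn the query into a vector, then return the indexed vector closest to it.

```python
import math
from collections import Counter

# Hypothetical vocabulary; a real system learns its vectors with a model.
VOCABULARY = ["tall", "tower", "paris", "meters", "pasta", "recipe", "rome"]

# Hypothetical "indexed pages", invented for this example.
PAGES = {
    "eiffel-tower": "The tower in Paris is 324 meters tall.",
    "carbonara": "A pasta recipe from Rome.",
    "colosseum": "An ancient arena in Rome.",
}


def embed(text, vocabulary):
    """Toy embedding: count how often each vocabulary word occurs in the text."""
    cleaned = text.lower()
    for mark in "?.,":
        cleaned = cleaned.replace(mark, " ")
    counts = Counter(cleaned.split())
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, indexed_vectors, k=1):
    """Exact nearest neighbors by brute force: score every vector, keep the top k."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in indexed_vectors.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Build the index once, then answer a query against it.
INDEX = {name: embed(text, VOCABULARY) for name, text in PAGES.items()}
query = embed("How tall is the tower in Paris?", VOCABULARY)
best = nearest_neighbors(query, INDEX, k=1)
print(best)
```

A brute-force scan like this compares the query against every stored vector, which is fine for three pages but not for billions. That cost is the reason approximate nearest-neighbor algorithms exist: they give up a little accuracy in exchange for looking at only a small part of the index.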
Microsoft today released the SPTAG algorithm as open source under the MIT license on GitHub. The code is proven and production-grade, and it is used to answer questions in Bing. Developers can use the algorithm to search their own sets of vectors and do so quickly: a single machine can handle 250 million vectors and answer 1,000 queries per second. Microsoft's AI Lab offers several samples and explanations, and Azure will offer a service that uses the same algorithms. Microsoft has talked about making AI not just a centralized, specialized tool that requires a great deal of expertise, but something that a wide range of developers solving a wide range of problems can use as part of their toolkit. The release of SPTAG is an example of how Microsoft is putting those words into practice: the combination of an Azure service and open source means that developers can start with a more limited, easy-to-use service and, as their expertise or requirements grow more complex, use SPTAG to build services of their own.