Machine Learning engineers from Talentica Software will be presenting their work on Fingerprinting Latent Structures at the 17th IEEE ICMLA 2018, Orlando, FL.
Summary of the Paper
One component of a Question-Answering (QA) system is an algorithm that recognizes how a question is articulated. If it is based on Machine Learning (ML), such an algorithm needs a large number of example questions for training. On closer inspection, the way we articulate a question depends on the answer we expect, which in turn depends on the underlying knowledge base. Grammar also plays an important role. The training data therefore has to contain questions in many different articulation styles.
Take this question: "How many balls make an over?" What is the knowledge base for this question, and what does the answer look like in the context of that knowledge base? Those of you who are familiar with cricket know that the answer is 6 valid balls, where a ball is a type of action performed by the bowler. Now, let us look at different ways to articulate the same cricket-related question: "In an over, how many balls are there?", "The bowler can bowl, how many balls in an over?", "How many valid balls make an over?", "Do you know how many balls make an over in the game of cricket?", and so on. Note how the articulation style changes while the expected answer stays the same.
To train an ML system for Question-Answering, one would need a data set with all possible questions for a particular answer, repeated for all possible answers. Collecting such a data set is impractical. The alternative is to use a data set with different articulation styles and let the machine learn the latent structure of articulation for each style. Once the articulation style of a question is detected, the corresponding answer generation system can be triggered. This approach lets us build a QA system that is accurate for a few types of articulation. As the answer generation system becomes more capable, it can support questions with more complex articulation. Until then, the system can choose to ignore complex questions. A fingerprinting system can be implemented to learn these articulation styles.
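The routing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `detect_style`, the style names `count_direct` and `count_fronted`, and `answer_count` are hypothetical, and the regular expressions merely stand in for a learned fingerprinting model. Questions whose style is not recognized are ignored by returning `None`.

```python
import re
from typing import Callable, Dict, Optional


def detect_style(question: str) -> Optional[str]:
    """Hypothetical stand-in for a learned articulation-style detector."""
    q = question.lower().strip()
    if re.match(r"^how many\b", q):
        return "count_direct"      # "How many balls make an over?"
    if re.search(r",\s*how many\b", q):
        return "count_fronted"     # "In an over, how many balls are there?"
    return None                    # unsupported (too complex) articulation


def answer_count(question: str) -> str:
    """Hypothetical answer generator; a real one would query the knowledge base."""
    return "6 valid balls"


# One answer generator per supported articulation style (assumed mapping).
HANDLERS: Dict[str, Callable[[str], str]] = {
    "count_direct": answer_count,
    "count_fronted": answer_count,
}


def answer(question: str) -> Optional[str]:
    """Trigger the generator matching the detected style, or ignore the question."""
    style = detect_style(question)
    handler = HANDLERS.get(style) if style else None
    if handler is None:
        return None
    return handler(question)
```

Supporting a new articulation style then amounts to teaching the detector one more style and registering one more handler, without touching the existing ones.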
In this paper, the authors formulate the problem of understanding question articulation as an objective-driven optimization problem where examples of complementary objectives are not available. They show how the optimization problem can be solved and implemented using auto-encoders for fingerprinting. They also present k-fingerprints, an algorithm that refines clusters of questions so that articulation styles are separated more accurately. For the technical details of the approach, take a look at this pre-print. If you are interested in extending the technique to images, this Slideshare offers some clues.
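To make the idea of cluster refinement with auto-encoder fingerprints more concrete, here is a minimal sketch. It is not the paper's k-fingerprints algorithm, whose details are in the pre-print. It assumes that questions are already encoded as numeric feature vectors, that a fingerprint is a linear auto-encoder with tied weights (fitted in closed form with an SVD instead of being trained by gradient descent), and that refinement means reassigning each question to the fingerprint that reconstructs it best. The function names and the toy data are hypothetical.

```python
import numpy as np


def fit_fingerprint(X, dim=1):
    """Fit a linear auto-encoder to the rows of X (closed form via SVD)."""
    mean = X.mean(axis=0)
    # The top `dim` right singular vectors act as tied encoder/decoder weights.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]


def reconstruction_error(X, fingerprint):
    """Squared error after encoding and decoding each row of X."""
    mean, w = fingerprint
    centered = X - mean
    decoded = (centered @ w.T) @ w
    return np.sum((centered - decoded) ** 2, axis=1)


def refine_clusters(X, labels, k, dim=1, max_iter=20):
    """Assumed refinement loop: refit one fingerprint per cluster, then
    move every sample to the cluster with the lowest reconstruction error."""
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        fingerprints = [fit_fingerprint(X[labels == c], dim) for c in range(k)]
        errors = np.stack(
            [reconstruction_error(X, f) for f in fingerprints], axis=1
        )
        new_labels = errors.argmin(axis=1)
        # Stop if a cluster would become too small to fit a fingerprint.
        if np.bincount(new_labels, minlength=k).min() < 2:
            break
        if np.array_equal(new_labels, labels):
            break  # converged: no sample changed cluster
        labels = new_labels
    return labels


# Toy data (hypothetical): two "styles", each lying along its own line in 3-D.
t = np.arange(-2.0, 3.0)
style_a = np.stack([t, np.zeros(5), np.zeros(5)], axis=1)
style_b = np.stack([np.full(5, 5.0), t, np.full(5, 5.0)], axis=1)
X = np.vstack([style_a, style_b])

true_labels = np.array([0] * 5 + [1] * 5)
noisy_labels = true_labels.copy()
noisy_labels[9] = 0  # deliberately mislabel one style-b sample

refined = refine_clusters(X, noisy_labels, k=2)
```

On this toy data the mislabelled sample is reconstructed far better by the style-b fingerprint than by the style-a one, so the loop moves it back. Real question embeddings would call for a nonlinear auto-encoder and a less idealized stopping rule.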