“Talentica has been part of the family at Mist, and they have been a key part of our engineering team. They bring us startup spirit and a wide range of required skills like Data Science, AI, Cloud, DevOps, UI, and Embedded.” – Bob Friday, CTO, Mist Systems
Our DevOps expert, Abhishek Amralkar, will speak at the Pune Data Conference.
“Onyx - Distributed Computation for the Cloud”
About the event:
The Pune Data Conference brings together the Big Data Analytics community in Pune for a day-long event with multiple sessions on topics such as Machine Learning, Artificial Intelligence, IoT, Hadoop Administration and many more, conducted by industry leaders and experts.
Monitoring modern real-time distributed infrastructure is complex and expensive. In this talk we explore Riemann and, specifically, how its low latency helped us get real-time metrics from our distributed systems.
Large-scale real-time distributed systems need to emit hundreds of thousands of metrics per second for effective monitoring. A significant portion of these metrics is either of no use or not understood. As infrastructure grows rapidly, monitoring it in real time and getting accurate metrics becomes challenging, especially with an in-house monitoring setup.
Most monitoring systems are pull/poll based: the monitoring system queries the components being monitored. Pull-based monitoring systems, where the system checks some x values every y minutes, are effectively dead.
Riemann is a monitoring tool that aggregates events from hosts, servers and applications and feeds them into a stream processing language, where they can be manipulated, summarized or acted on. Riemann is fast and highly configurable. Most importantly, it uses an event-centric push model.
We use Riemann to monitor distributed systems. Catching problems in real time requires low-latency monitoring tools, so that errors are detected quickly and we can see immediately whether a fix is working. Riemann provides this, along with a transient shared state for systems with many moving parts.
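To make the push model concrete, here is a minimal sketch in Python. It is not Riemann's API (Riemann is configured in Clojure), and the names `where`, `Index` and `Monitor` are hypothetical helpers made up for this illustration. The event fields (host, service, metric, time, ttl) mirror fields that Riemann events carry. Components push events into the monitor, streams react to each event as it arrives, and a small index holds the latest event per host and service until its TTL expires.

```python
from typing import Callable, Dict, Optional, Tuple

Event = Dict[str, object]
Stream = Callable[[Event], None]


def where(predicate: Callable[[Event], bool], *children: Stream) -> Stream:
    """Forward an event to the child streams only when predicate(event) is true."""
    def stream(event: Event) -> None:
        if predicate(event):
            for child in children:
                child(event)
    return stream


class Index:
    """Transient shared state: the latest event per (host, service), dropped after its TTL."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Event] = {}

    def __call__(self, event: Event) -> None:
        # Being callable lets the index be used as just another stream.
        self._latest[(event["host"], event["service"])] = event

    def lookup(self, host: str, service: str, now: float) -> Optional[Event]:
        event = self._latest.get((host, service))
        if event is None:
            return None
        if now - event["time"] > event["ttl"]:
            # The event is stale: forget it instead of reporting old state.
            del self._latest[(host, service)]
            return None
        return event


class Monitor:
    """Receives pushed events and hands each one to every top-level stream."""

    def __init__(self, *streams: Stream) -> None:
        self.streams = list(streams)

    def push(self, event: Event) -> None:
        for stream in self.streams:
            stream(event)


# Usage: hosts push events; nothing is polled.
alerts = []
index = Index()
monitor = Monitor(index, where(lambda e: e["metric"] > 0.9, alerts.append))

monitor.push({"host": "web-1", "service": "cpu", "metric": 0.95, "time": 100.0, "ttl": 10.0})
monitor.push({"host": "web-2", "service": "cpu", "metric": 0.20, "time": 101.0, "ttl": 10.0})
```

Because every event is handled the moment it is pushed, the alert for `web-1` is raised without waiting for a polling interval, and a host that stops reporting simply ages out of the index.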
One of the components of a Question-Answering (QA) system is an algorithm that can understand the articulation style of questions. Such an algorithm, if based on Machine Learning (ML), would require a large number of example questions for training. However, if one observes closely, the way we articulate questions depends on the answer we expect, which in turn is in the context of an underlying knowledge base. Grammar also plays an important role. This means we need examples of questions with different articulation styles to train the ML system.
Take this question: How many balls make an over? What is the knowledge base for this question, and what does the answer look like in the context of that knowledge base? Those of you who are familiar with cricket know that the answer is 6 valid balls, where a ball is a type of action performed by the bowler. Now, let us look at the different ways to articulate this cricket-related question: In an over, how many balls are there? The bowler can bowl, how many balls in an over? How many valid balls make an over? Do you know how many balls make an over in the game of cricket? And so on. Note how the articulation style changes while the expected answer stays the same.
To train an ML system for Question-Answering one would need a data set with all possible questions for a particular answer, and all possible questions for all possible answers. This would lead to a humongous task of data collection. The alternative is to use a data set with different articulation styles and then let the machine learn the latent structure of articulation for each style. Based on the detection of the articulation style, the corresponding answer generation system can be triggered. This alternative approach would help us build a QA system that is accurate for a few types of articulations. As and when we improve the complexity of the answer generation system, we can support questions with more complex articulation. Until then, the system can choose to ignore complex questions. A fingerprinting system can be implemented to learn these articulation styles.
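The control flow described above (detect the articulation style, trigger the matching answer generator, and ignore questions whose style is not yet supported) can be sketched as follows. This is only an illustration under loud assumptions: hand-written regular expressions stand in for the learned fingerprints, and `STYLE_PATTERNS`, `KNOWLEDGE_BASE`, `detect_style` and `answer` are hypothetical names, not part of the approach described here.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: regexes stand in for learned articulation-style fingerprints.
STYLE_PATTERNS = {
    "how_many_direct": re.compile(
        r"^how many (?:valid )?(?P<part>\w+) make an? (?P<whole>\w+)\??$", re.I
    ),
    "fronted_context": re.compile(
        r"^in an? (?P<whole>\w+), how many (?P<part>\w+) are there\??$", re.I
    ),
}

# Toy knowledge base holding the cricket fact from the example above.
KNOWLEDGE_BASE = {("balls", "over"): "6 valid balls"}


def detect_style(question: str) -> Optional[Tuple[str, "re.Match[str]"]]:
    """Return (style name, match) for the first supported style, else None."""
    text = question.strip()
    for style, pattern in STYLE_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return style, match
    return None


def answer(question: str) -> Optional[str]:
    """Answer questions in a supported style; ignore everything else."""
    detected = detect_style(question)
    if detected is None:
        return None  # unsupported articulation: the system chooses to ignore it
    _, match = detected
    key = (match.group("part").lower(), match.group("whole").lower())
    return KNOWLEDGE_BASE.get(key)
```

With these two styles, "How many balls make an over?" and "In an over, how many balls are there?" both resolve to the same knowledge-base entry, while "Do you know how many balls make an over in the game of cricket?" is ignored until a style that covers it is added.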
In this paper, the authors formulate the problem of understanding question articulation as an objective-driven optimization problem where examples of complementary objectives are not available. They show how the optimization problem can be solved and implemented using auto-encoders for fingerprinting. They also present k-fingerprints, an algorithm that refines clusters of questions so that articulation styles can be separated more accurately. For the technical details of the approach, take a look at this pre-print. If you are interested in extending the technique to images, this Slideshare offers some clues.
Talentica is happy to be one of the key exhibitors at TechCrunch Disrupt yet again.
About the Event:
TechCrunch Disrupt is the world’s biggest and most impactful tech startup conference, and this year, we’re upping the stakes even more. Taking place at Moscone West, Disrupt SF will feature the biggest names in tech, from Reid Hoffman to Kirsten Green to Dara Khosrowshahi.
AWS Batch enables developers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.
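As a rough illustration of how a job states its resource requirements, the sketch below builds a request for the `submit_job` call of the boto3 Batch client. The job name, queue name, job definition name and command are hypothetical placeholders, and the helper names `build_job_request` and `submit` are made up for this example. `submit` is not called here because it needs AWS credentials and an existing queue and job definition.

```python
from typing import Dict, List


def build_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    vcpus: int,
    memory_mib: int,
    command: List[str],
) -> Dict[str, object]:
    """Build keyword arguments for the boto3 Batch client's submit_job call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": list(command),
            # Per-job resource requirements; values are passed as strings.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


def submit(request: Dict[str, object], region_name: str) -> str:
    """Submit the job and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so building a request does not need the SDK

    client = boto3.client("batch", region_name=region_name)
    response = client.submit_job(**request)
    return response["jobId"]


# Usage: all names below are placeholders, not real resources.
request = build_job_request(
    job_name="nightly-report",
    job_queue="example-queue",
    job_definition="example-job-definition",
    vcpus=2,
    memory_mib=4096,
    command=["python", "report.py"],
)
```

Keeping request construction separate from submission makes the resource requirements easy to inspect or test before anything is sent to AWS.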