Pune JS Meet - Talentica Chapter: Git Internals

This meetup aims to help you visualize what Git is, what exactly happens inside the local repository when you run different commands, and how to avoid and recover from some non-trivial situations.


Talentica Software Pvt. Ltd., Office No. 501, Amar Megaplex, Above D-Mart, Baner, Pune 411045

Abhishek Amralkar will speak at the Pune Data Conference

Our DevOps expert, Abhishek Amralkar, will speak at the Pune Data Conference.

Topic Details: 

“Onyx: Distributed Computation for the Cloud”


About the event:

The Pune Data Conference brings together the Big Data Analytics community in Pune for a day-long event with multiple sessions on topics such as Machine Learning, Artificial Intelligence, IoT, Hadoop Administration and many more, conducted by esteemed industry leaders and experts.


Abhishek, our DevOps head, to speak at Clojured Berlin 2019

Monitoring modern real-time distributed infrastructure is complex and expensive. In this talk we explore Riemann and, specifically, how its low latency helped us get real-time metrics from our distributed systems.

Large-scale real-time distributed systems need to emit hundreds of thousands of metrics per second for effective monitoring. A significant portion of these metrics is either of no use or not well understood. As infrastructure grows rapidly, monitoring it in real time and getting accurate metrics becomes challenging, especially with an in-house monitoring setup.

Most monitoring systems are pull- or poll-based: the monitoring system queries the components being monitored. Pull-based systems, which check some x values every y minutes, cannot keep up with real-time requirements.

Riemann is a monitoring tool that aggregates events from hosts, servers and applications and can feed them into a stream processing language to be manipulated, summarized or acted upon. Riemann is fast and highly configurable. Most importantly, it uses an event-centric push model.

We use Riemann to monitor distributed systems. Catching problems in real time requires monitoring tools with low latency, so that errors are detected faster and you can see immediately whether a fix is working. Riemann provides this along with a transient shared state for systems with many moving parts.

Riemann is written in Clojure and leverages its core concepts. Riemann configs are Clojure code.

We will walk through the core concepts of Riemann:

  • Events
  • Streams
  • Indexes
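
To make the three concepts concrete, here is a minimal sketch of an event-centric push model. It is written in Python purely for illustration: real Riemann configs are Clojure code, and none of the names below (Event, Index, where, push, the host "zk1") come from Riemann's API. They are assumptions made for this example only.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Event:
    """A single measurement pushed by a monitored component (hypothetical shape)."""
    host: str
    service: str
    metric: float
    ttl: float = 60.0  # seconds the event stays valid in the index
    timestamp: float = field(default_factory=time.time)


# A stream is just a function that receives each event as it arrives.
Stream = Callable[[Event], None]


class Index:
    """Transient shared state: the latest event per (host, service)."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Event] = {}

    def update(self, event: Event) -> None:
        self._latest[(event.host, event.service)] = event

    def lookup(self, host: str, service: str,
               now: Optional[float] = None) -> Optional[Event]:
        event = self._latest.get((host, service))
        if event is None:
            return None
        now = time.time() if now is None else now
        # Events older than their ttl are treated as expired.
        if now - event.timestamp > event.ttl:
            return None
        return event


def where(predicate: Callable[[Event], bool], *children: Stream) -> Stream:
    """Build a stream that forwards matching events to its child streams."""
    def stream(event: Event) -> None:
        if predicate(event):
            for child in children:
                child(event)
    return stream


def push(event: Event, streams: List[Stream]) -> None:
    """The monitored component pushes; nothing polls it."""
    for stream in streams:
        stream(event)


index = Index()
alerts: List[Event] = []
streams: List[Stream] = [
    index.update,                                      # keep latest state
    where(lambda e: e.metric > 100.0, alerts.append),  # collect slow requests
]

push(Event("zk1", "latency", 12.0, timestamp=1000.0), streams)
push(Event("zk1", "latency", 250.0, timestamp=1001.0), streams)
```

After the two pushes, the index holds only the latest latency event for "zk1" and the alert list holds the single event that crossed the threshold. Because events expire after their ttl, a component that stops pushing simply disappears from the index.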

We will also go over how to run Riemann in a production environment and how to write Riemann Clojure configs.

We will conclude the talk with a demo of monitoring distributed systems such as Apache ZooKeeper.

Get more details about the event here.

Talentica’s engineers to share their ML research at IEEE ICMLA 2018

Machine Learning engineers from Talentica Software will be presenting their work on Fingerprinting Latent Structures at the 17th IEEE ICMLA 2018, Orlando, FL.

Summary of the Paper

One of the components of a Question-Answering (QA) system is an algorithm that can understand the articulation style of questions. Such an algorithm, if based on Machine Learning (ML), would require a large number of example questions for training. However, if one observes closely, the way we articulate questions depends on the answer we expect, which in turn is in the context of an underlying knowledge base. Grammar also plays an important role. This means we need examples of questions with different articulation styles to train the ML system.

Take this question: How many balls make an over? What is the knowledge base for this question, and what does the answer look like in the context of that knowledge base? Those familiar with cricket know that the answer is 6 valid balls, where a ball is a type of action performed by the bowler. Now consider the different ways to articulate the same cricket-related question: In an over, how many balls are there? The bowler can bowl, how many balls in an over? How many valid balls make an over? Do you know how many balls make an over in the game of cricket? And so on. Note how the articulation style changes while the expected answer stays the same.

To train an ML system for Question-Answering one would need a data set with all possible questions for a particular answer, and all possible questions for all possible answers. This would lead to a humongous task of data collection. The alternative is to use a data set with different articulation styles and then let the machine learn the latent structure of articulation for each style. Based on the detection of the articulation style, the corresponding answer generation system can be triggered. This alternative approach would help us build a QA system that is accurate for a few types of articulations. As and when we improve the complexity of the answer generation system, we can support questions with more complex articulation. Until then, the system can choose to ignore complex questions. A fingerprinting system can be implemented to learn these articulation styles.

In this paper, the authors formulate the problem of understanding question articulation as an objective-driven optimization problem where examples of complementary objectives are not available. They show how the optimization problem can be solved and implemented using auto-encoders for fingerprinting. They also present k-fingerprints, an algorithm that refines clusters of questions so that articulation styles can be separated more accurately. For the technical details of the approach, take a look at this pre-print. Readers interested in extending the technique to images can find some pointers in this Slideshare.


Talentica Exhibiting at TechCrunch Disrupt SF

Talentica is happy to be one of the key exhibitors at TechCrunch Disrupt yet again.

About the Event:

TechCrunch Disrupt is the world’s biggest and most impactful tech startup conference, and this year, we’re upping the stakes even more. Taking place at Moscone West, Disrupt SF will feature the biggest names in tech, from Reid Hoffman to Kirsten Green to Dara Khosrowshahi.

Blog: AWS Batch Jobs

AWS Batch enables developers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.

And we do it again!

According to an independent survey for Software 500 companies by Forbes, we are the 4th best company to work for in 2018!


Continuous Integration and Deployment with AWS Code Services

Talentica Software hosted a meetup on 3rd February 2018. Participants took a deep dive into CI/CD pipeline use cases and how to automate them. The agenda for the meetup was:

  • Automate the software build and release process using AWS services
  • Set up AWS CodeCommit for source control
  • Build and test code with AWS CodeBuild
  • Automate CI/CD process with AWS CodePipeline
  • Live Demo
  • Discussion
  • Q&A session

Talentica Software Pvt. Ltd., Office No. 501, Amar Megaplex, Above D-Mart, Baner, Pune 411045

Wi-Fi-based Indoor Positioning System Using Smartphones

The indoor positioning problem using Wi-Fi signal fingerprints can be viewed as a machine-learning task to be solved mathematically. This white paper proposes an efficient and reliable real-time Wi-Fi indoor positioning system using a fingerprinting algorithm.
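
As a rough illustration of the fingerprinting idea, the sketch below matches a live Wi-Fi scan against a pre-recorded radio map using k-nearest neighbours in signal space. This is a common baseline and is not taken from the white paper, whose algorithm may differ. The function names, the access point ids, the sample readings, and the -100 dBm placeholder for unheard access points are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Fingerprint = Dict[str, float]   # access point id -> signal strength (dBm)
Position = Tuple[float, float]   # (x, y) in metres on a floor plan

MISSING_RSSI = -100.0  # assumed reading for an access point that was not heard


def signal_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance between two fingerprints over all access points seen."""
    access_points = set(a) | set(b)
    return math.sqrt(sum(
        (a.get(ap, MISSING_RSSI) - b.get(ap, MISSING_RSSI)) ** 2
        for ap in access_points
    ))


def estimate_position(scan: Fingerprint,
                      radio_map: List[Tuple[Position, Fingerprint]],
                      k: int = 3) -> Position:
    """Average the positions of the k reference points closest in signal space."""
    if not radio_map:
        raise ValueError("radio_map must contain at least one reference point")
    nearest = sorted(
        radio_map, key=lambda entry: signal_distance(scan, entry[1])
    )[:k]
    x = sum(position[0] for position, _ in nearest) / len(nearest)
    y = sum(position[1] for position, _ in nearest) / len(nearest)
    return (x, y)


# Hypothetical radio map: fingerprints recorded at known positions.
radio_map = [
    ((0.0, 0.0), {"ap1": -40.0, "ap2": -70.0}),
    ((10.0, 0.0), {"ap1": -70.0, "ap2": -40.0}),
    ((0.0, 10.0), {"ap1": -60.0, "ap2": -60.0}),
]

scan = {"ap1": -42.0, "ap2": -69.0}
closest = estimate_position(scan, radio_map, k=1)
averaged = estimate_position(scan, radio_map, k=3)
```

With k=1 the estimate snaps to the single best-matching reference point. Larger k smooths the estimate by averaging several nearby reference points, which trades some precision for robustness to noisy readings.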