WebRTC – Basics of web real-time communication

WebRTC is a free, open-source standard for real-time, plugin-free video, audio and data communication between peers. Many solutions such as Skype, Facebook and Google Hangouts offer RTC, but they need downloads, native apps or plugins. The guiding principles of the WebRTC project are that its APIs should be open source, free, standardized, built into web browsers and more efficient than existing technologies.

How does it work

  • Obtain a video, audio or data stream from the current client.
  • Gather network information and exchange it with the peer WebRTC-enabled client.
  • Exchange metadata about the data to be transferred.
  • Stream audio, video or data.

That’s it! Well, almost; this is a simplified version of what actually happens. Now that you have the overall picture, let’s dig into the details.

How it really works

WebRTC exposes three basic APIs that make all of this possible.

  • MediaStream: allows the client to access a stream from a webcam or microphone.
  • RTCPeerConnection: enables audio or video data transfer, with support for encryption and bandwidth management.
  • RTCDataChannel: allows peer-to-peer communication for any generic data.

Along with these capabilities, we still need a server (yes, we still need a server!) to identify the remote peer and do the initial handshake. Once the peer has been identified, data can be transferred directly between the two peers if possible, or relayed through a server.

Let’s look at each of these steps in detail.

MediaStream

A MediaStream is obtained through the getUserMedia() method, which requests access to an audio or video stream and takes success and failure handlers.

 

navigator.getUserMedia(constraints, successCallback, errorCallback);

 

The constraints parameter is a JSON object that specifies whether audio or video access is required. In addition, we can specify details about the constraints, such as the video width and height. For example:

 

navigator.getUserMedia({ audio: true, video: true}, successCallback, errorCallback);

 

RTCPeerConnection

This interface represents the connection between the local WebRTC client and a remote peer, and is used for efficient transfer of data between the peers. Both peers need to set up an RTCPeerConnection at their end. In general, we use the RTCPeerConnection onaddstream event callback to handle the incoming audio/video stream.

  • The initiator of the call (the caller) creates an offer and sends it to the callee with the help of a signalling server.
  • The callee, on receiving the offer, creates an answer and sends it back to the caller via the signalling server.

ICE

ICE is a framework that allows web browsers to connect with peers. There are many reasons why a straight-up connection from peer A to peer B simply won’t work: most clients don’t have a public IP address, as they usually sit behind a firewall and a NAT. Given the involvement of NAT, our client has to figure out the IP address of the peer machine. This is where Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers come into the picture.

STUN

A STUN server allows clients to discover their public IP address and the type of NAT they are behind. This information is used to establish a media connection. In most cases, a STUN server is only used during the connection setup and once that session has been established, media will flow directly between clients.

TURN

If a STUN server cannot establish the connection, ICE can switch to TURN. Traversal Using Relays around NAT (TURN) is an extension to STUN that allows media to traverse a NAT that does not permit the direct peer-to-peer connection STUN would set up. TURN servers are often used in the case of a symmetric NAT.

Unlike STUN, a TURN server remains in the media path after the connection has been established. That is why the term “relay” is used to define TURN. A TURN server literally relays the media between the WebRTC peers.

RTCDataChannel

The RTCDataChannel interface represents a bi-directional data channel between two peers of a connection. Objects of this type can be created using

 

RTCPeerConnection.createDataChannel()

 

Data channel capabilities make use of event-based communication:

var peerConn = new RTCPeerConnection(),
    dc = peerConn.createDataChannel("my channel");

dc.onmessage = function (event) {
  console.log("received: " + event.data);
};


Android life cycle aware components

What is a life cycle aware component?

A lifecycle-aware component is a component that is aware of the lifecycle of another component, such as an activity or fragment, and performs some action in response to changes in that component’s lifecycle state.

Why have life cycle aware components?

Let’s say we are developing a simple video player application: we have an activity named VideoActivity, which contains the UI to play the video, and a class named VideoPlayer, which contains all the logic and mechanics of playing a video. VideoActivity creates an instance of this VideoPlayer class in its onCreate() method, roughly like this:
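A minimal Kotlin sketch of this setup (the layout name is an assumption, and the play/pause/stop methods used below follow the description in this post rather than the original gist):

import android.os.Bundle
import android.support.v7.app.AppCompatActivity

class VideoActivity : AppCompatActivity() {

    private lateinit var videoPlayer: VideoPlayer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_video)  // layout name is illustrative
        videoPlayer = VideoPlayer()              // the activity owns the player instance
    }
}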

 

 

Now, as with any video player, we would like it to play the video when VideoActivity is in the foreground (the resumed state) and pause it when the activity goes into the background (the paused state). So we will have the following code in VideoActivity’s onResume() and onPause() methods:
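Roughly something like this (a sketch; the method names follow the play/pause wording used later in this post):

override fun onResume() {
    super.onResume()
    videoPlayer.play()   // resume playback when the activity comes to the foreground
}

override fun onPause() {
    videoPlayer.pause()  // pause playback when the activity goes to the background
    super.onPause()
}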

 

 

Also, we would like it to stop playing completely and release its resources when the activity is destroyed. Thus we will have the following code in VideoActivity’s onDestroy() method:
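A sketch of that callback:

override fun onDestroy() {
    videoPlayer.stop()   // stop playback and release player resources
    super.onDestroy()
}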

When we analyze this code, we can see that even for this simple application our activity has to take a lot of care over calling the play, pause and stop methods of the VideoPlayer class. Now imagine adding separate components for audio, buffering and so on: VideoActivity would have to manage all of them inside its lifecycle callbacks, which leads to poorly organized, error-prone code.

 

Using arch.lifecycle 

With the introduction of lifecycle-aware components in the android.arch.lifecycle library, we can move all of this code into the individual components. Our activities and fragments no longer need to manage this component logic and can focus on their primary job, maintaining the UI. As a result, the code becomes clean, maintainable and testable.

The android.arch.lifecycle package provides classes and interfaces that prove helpful to solve such problems in an isolated way.

So let’s dive in and see how we can implement the above example using lifecycle-aware components.

Life cycle aware components way

To keep things simple, we can add the lines below to our app module’s Gradle file to pull in the lifecycle components from the android.arch library:
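For example (shown here in the Gradle Kotlin DSL; the artifact versions are assumptions, and kapt requires the kotlin-kapt plugin to be applied):

dependencies {
    // Lifecycle runtime and extensions from the android.arch library
    implementation("android.arch.lifecycle:extensions:1.1.1")
    // Annotation processor for the lifecycle annotations used below
    kapt("android.arch.lifecycle:compiler:1.1.1")
}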

 

 

Once we have integrated the arch components, we can make our VideoPlayer class implement LifecycleObserver, which is a marker interface with no methods. By annotating VideoPlayer’s methods with the appropriate lifecycle-event annotations, it will be notified about the lifecycle state changes of VideoActivity. So our VideoPlayer class will look something like this:
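A sketch of the annotated observer (the playback method bodies are omitted):

import android.arch.lifecycle.Lifecycle
import android.arch.lifecycle.LifecycleObserver
import android.arch.lifecycle.OnLifecycleEvent

class VideoPlayer : LifecycleObserver {

    @OnLifecycleEvent(Lifecycle.Event.ON_RESUME)
    fun play() {
        // start or resume video playback
    }

    @OnLifecycleEvent(Lifecycle.Event.ON_PAUSE)
    fun pause() {
        // pause video playback
    }

    @OnLifecycleEvent(Lifecycle.Event.ON_DESTROY)
    fun stop() {
        // stop playback and release resources
    }
}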

We are not done yet. We need some binding between this VideoPlayer class and the VideoActivity so that our VideoPlayer object gets notified about the life cycle state changes in VideoActivity.

Well, this binding is quite easy: VideoActivity is an instance of android.support.v7.app.AppCompatActivity, which implements the LifecycleOwner interface. LifecycleOwner is a single-method interface whose method, getLifecycle(), returns the Lifecycle object of the implementing class; this object keeps track of the lifecycle state changes of the activity, fragment or any other component that has a lifecycle. The Lifecycle object is observable and notifies its observers about changes in state.

So we have our VideoPlayer, an instance of LifecycleObserver, and we need to add it as an observer of VideoActivity’s Lifecycle object. We modify VideoActivity as follows:
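A sketch of the updated activity (again, the layout name is illustrative):

import android.os.Bundle
import android.support.v7.app.AppCompatActivity

class VideoActivity : AppCompatActivity() {

    private lateinit var videoPlayer: VideoPlayer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_video)
        videoPlayer = VideoPlayer()
        // Register the player as an observer of this activity's Lifecycle;
        // the onResume/onPause/onDestroy overrides are no longer needed.
        lifecycle.addObserver(videoPlayer)
    }
}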

This makes things quite resilient and isolated. Our VideoPlayer logic is separated from VideoActivity, and VideoActivity no longer needs to bother with calling its dependent components’ methods from its lifecycle callbacks, which makes the code clean, manageable and testable.

Conclusion

The beauty of such separation of concerns can also be felt when we are developing a library intended to be used by third parties. It should not be the concern of our library’s end users, i.e. the developers using it, to call the lifecycle-dependent methods of our library. They might miss them, or may not be aware of which methods to call at all (because developers don’t usually read the documentation completely), leading to memory leaks or, worse, app crashes.

Another use case is an activity that depends on a network call handled by a network manager class. We can make the network manager class lifecycle aware so that it supplies data to the activity only while it is alive, or better, drops its reference to the activity once it is destroyed, thus avoiding memory leaks.

We can develop a well managed app using the life cycle aware components provided by android.arch.lifecycle package. The resulting code will be loosely coupled and thus easy for modifications, testing and debugging which makes our life easy as developers.

Kotlin Kronicles for Android developers — part 1

Another blog on “why Kotlin”? Cliché? Not really. This is more of a “why not Kotlin?” kind of blog post, and my attempt to convince Android app developers to migrate to Kotlin. It doesn’t matter if you have little or no knowledge of Kotlin, or you are an iOS developer who worships Swift; read along, I am sure Kotlin will impress you (if not my writing).

I am going to show some of the amazing features of the Kotlin programming language that make development so much easier and more fun, and make the code so readable that it is almost plain English. I read somewhere that “a programming language isn’t for computers, computers understand only 1s and 0s; it is for humans”, and I couldn’t agree more. There is a learning curve, sure; where isn’t there? It pays off nicely. Kotlin lets us do more with fewer lines of code; Kotlin makes us productive.

Let’s quickly walk through some of the obvious reasons for migrating to Kotlin:

  • Kotlin is one of the officially supported languages for Android app development, as announced at Google I/O 2017.
  • Kotlin is 100% interoperable with Java, which basically means Kotlin can use Java classes and methods and vice versa.
  • Kotlin has several modern programming-language features such as lambdas, higher-order functions, null safety and extensions.
  • Kotlin is developed and maintained by JetBrains, the company behind several IDEs that developers use every day (IntelliJ IDEA, PyCharm, PhpStorm, GoLand and so on).

All of this is available all over the internet; it is the standard content of “why Kotlin” blog posts.

Let’s talk about something a little more interesting.

Higher Order Functions:

Kotlin functions are first-class citizens, meaning they can be stored in variables, passed as arguments, or returned from other functions. A higher-order function is a function that takes a function as a parameter or returns a function.

This may sound strange at first: why in the world would I pass a function to another function (or return a function from another function)? It is very common in various programming languages, including JavaScript, Swift and Python (and Kotlin, apparently). An excellent example of a higher-order function is map: it takes a function as a parameter and returns a list of the results of applying that function to each item of the original list or array. For example:
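A small sketch (stringStirrer() is a hypothetical transform, not the original gist):

// Reverse a string and upper-case it.
fun stringStirrer(s: String): String = s.reversed().toUpperCase()

fun main() {
    val x = listOf("alpha", "beta", "gamma")
    val result = x.map { stringStirrer(it) }   // map applies stringStirrer to every item of x
    println(result)                            // [AHPLA, ATEB, AMMAG]
}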

Check out the map call in the sketch above: it applies the stringStirrer() function to each item of x, and the printed list is the result of the map operation.

Data classes:

Java POJOs (Plain Old Java Objects), or simply classes that store some data, usually require a lot of boilerplate code: getters, setters, equals, hashCode, toString and so on. A Kotlin data class derives these functions automatically from the properties defined in its primary constructor. For example:
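A sketch (the class and property names are illustrative):

// One line replaces the whole POJO; equals, hashCode, toString, copy and
// componentN() are all generated from the primary constructor.
data class User(val name: String, val email: String)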

Just one line of code replaces the many lines of a Java POJO. For custom behaviour we can override functions in data classes. Beyond this, Kotlin data classes are also bundled with copy() and componentN() functions, which allow copying and destructuring objects respectively.

Dealing With Strings:

The Kotlin standard library makes dealing with strings much easier. Here is a sample:
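A few illustrative calls from the standard library (not the original gist):

fun main() {
    val title = "  kotlin kronicles  "
    println(title.trim().capitalize())   // "Kotlin kronicles"
    println("42".toIntOrNull())          // 42, or null if it isn't a number
    println("a,b,c".split(","))          // [a, b, c]
    println("bank".padEnd(8, '*'))       // bank****
}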

No helper classes, public static methods or StringUtils are required; we can invoke these functions as if they belonged to the String class itself.

Dealing with Collections:

As with strings, the helper methods in the java.util.Collections class are no longer required. We can call sort, max, min, reverse, swap and so on directly on collections.

Consider a bank use case: a bank has many customers, and a customer makes several transactions every month. Think in terms of objects:

As the picture above makes clear, a bank has many customers; a customer has some properties (a name, a list of transactions, etc.); and a transaction has properties such as amount and type. In Java this would be a set of POJO classes; a compact Kotlin equivalent is sketched below:
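A minimal sketch of the model that the following examples assume (class and property names are assumptions):

enum class TransactionType { DEPOSIT, WITHDRAWAL }

data class Transaction(val amount: Double, val type: TransactionType)

data class Customer(
    val name: String,
    val balance: Double,
    val transactions: List<Transaction>
)

class Bank(val customers: List<Customer>)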

Find the customer with minimum balance:
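In Kotlin this is a one-liner against the model above (minBy returns null for an empty list):

val customerWithMinBalance = bank.customers.minBy { it.balance }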

I don’t know about you, but I think the Kotlin way is much cleaner, simpler and more readable, and we didn’t import any helper class for it (the Java way needs the Collections class). I can read it as plain English, which is more than I can say for the Java counterpart. The motive here is not to compare Java with Kotlin, but to appreciate the Kotlin Kronicles.

There are several such functions: map, filter, reduce, flatMap, fold, partition and more. Here is how we can simplify our tasks by combining these standard functions (for each problem statement below, imagine doing it in Java):
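A sketch against the assumed model above (not the original gist); each numbered comment matches the explanation that follows:

fun examples(bank: Bank) {
    // 1. flatMap: one flat list of every transaction made by every customer.
    val allTransactions = bank.customers.flatMap { it.transactions }

    // 2. filter + sumByDouble: total deposited and total withdrawn across all customers.
    val totalDeposited = allTransactions
        .filter { it.type == TransactionType.DEPOSIT }
        .sumByDouble { it.amount }
    val totalWithdrawn = allTransactions
        .filter { it.type == TransactionType.WITHDRAWAL }
        .sumByDouble { it.amount }

    // 3. fold + when: net amount held by the bank after all deposits and withdrawals.
    val netAmount = allTransactions.fold(0.0) { acc, txn ->
        when (txn.type) {
            TransactionType.DEPOSIT -> acc + txn.amount
            TransactionType.WITHDRAWAL -> acc - txn.amount
        }
    }

    // 4. partition: split the transactions into deposits and withdrawals in one pass.
    val (deposits, withdrawals) = allTransactions.partition { it.type == TransactionType.DEPOSIT }

    println("$totalDeposited $totalWithdrawn $netAmount ${deposits.size} ${withdrawals.size}")
}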

As the code above shows, we can solve mundane problems with far fewer lines of code, and readability-wise I just love it. Here is what each operation does:

  1. flatMap: returns a single list of all elements yielded by the transform function applied to each element of the original collection (in our case the transform returned each customer’s list of transactions).
  2. filter and sumByDouble: here we combined filter and sum operations into a one-liner that finds the total amount deposited and withdrawn from the bank across all customers.
  3. fold: accumulates a value, starting from an initial value (0.0 in our case) and applying an operation (the when expression above) from left to right to the current accumulator value and each element. Here we used fold and when to find the net amount deposited in the bank considering all deposits and withdrawals.
  4. partition: splits the original collection into a pair of lists, where the first list contains the elements for which the predicate (the separation function) yielded true and the second contains those for which it yielded false. Of course, we could filter twice, but this is so much easier.

So many complex operations simplified by the Kotlin standard library.

Extensions:

This is one of my favourite features. Kotlin extensions let us add functionality to a class without modifying the original class, just like in Swift and C#. This can be very handy if used properly. Check this out:
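A sketch of such an extension (the conversion rate is a made-up constant, purely for illustration):

// Add a function to Kotlin's Double type; rate is illustrative only.
fun Double.toINR() = this * 75.0

fun main() {
    println(100.0.toINR())   // 7500.0
}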

In the code above, we just added a new function called toINR() to Kotlin’s Double type. So we basically added a new function to one of Kotlin’s primitive types, how about that 😎. And it is a one-liner function: no curly braces, return type or return statement whatsoever. Noticed that conciseness, did you?

Since Kotlin supports higher-order functions, we can combine them with extension functions to solidify our code. One very common problem in Android development involving SQLite is that developers often forget to end the transaction, and then we waste hours debugging it. Here is how we can avoid it:
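A sketch of such an extension (the original gist may differ; setTransactionSuccessful() is included here so the wrapped work actually commits):

import android.database.sqlite.SQLiteDatabase

// Wrap any block of DB work between begin/end transaction calls,
// so endTransaction() can never be forgotten.
inline fun SQLiteDatabase.performDBTransaction(operation: () -> Unit) {
    beginTransaction()
    try {
        operation()
        setTransactionSuccessful()
    } finally {
        endTransaction()
    }
}

// Usage sketch (table name is illustrative):
// db.performDBTransaction {
//     db.delete("logs", null, null)
// }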

We added an extension function called performDBTransaction to SQLiteDatabase. This function takes a parameter that is itself a function with no input and no output; that parameter is whatever we want executed between the begin and end of the transaction. The extension calls beginTransaction(), then the passed operation, and then endTransaction(). We can use it wherever required without having to double-check whether we called endTransaction() or not.

I always forget to call commit() or apply() when storing data in SharedPreferences. A similar approach helps:
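A sketch of such an extension and its usage (the preference key is illustrative):

import android.content.SharedPreferences

// Extension so that a chain of edits can simply end with persist().
fun SharedPreferences.Editor.persist() {
    apply()   // or commit(), if a synchronous write is required
}

// Usage sketch: write values, then end the chain with persist().
fun saveUserName(prefs: SharedPreferences, name: String) {
    prefs.edit()
        .putString("user_name", name)
        .persist()
}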

The extension function persist() in the sketch above takes care of it; we call persist() as if it were part of SharedPreferences.Editor.

Smart Casts:

Going back to our bank example, let’s say a transaction can be one of three types: the base Transaction plus two specialised kinds, NEFT and IMPS.

An NEFT transaction has fixed charges, while an IMPS transaction has bank-dependent charges. In code we deal with a reference of the superclass type, Transaction, and we need to identify the concrete type of transaction so that it can be processed accordingly. Here is how this can be handled in Kotlin:
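A sketch of such a hierarchy and the type handling (the charge amounts and method names are assumptions, not the original gist):

open class Transaction(val amount: Double)

class NEFT(amount: Double) : Transaction(amount) {
    fun fixedCharges() = 5.0                 // assumed flat charge
}

class IMPS(amount: Double) : Transaction(amount) {
    fun bankCharges() = amount * 0.01        // assumed bank-dependent charge
}

fun chargesFor(txn: Transaction): Double = when (txn) {
    is NEFT -> txn.fixedCharges()   // smart cast: txn is treated as NEFT here
    is IMPS -> txn.bankCharges()    // smart cast: txn is treated as IMPS here
    else -> 0.0
}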

In the is branches above, we never cast the Transaction object to NEFT or IMPS, yet we are able to invoke the functions of those classes. This is Kotlin’s smart cast: after the type check, the compiler automatically treats the object as its concrete type.

Epilogue:

As developers, we need to focus on the stuff that matters, the core part, and boilerplate code isn’t part of it. Kotlin helps reduce boilerplate and makes development fun. Kotlin has many amazing features that ease development and testing. Do not let the fear of the unknown dictate your choice of programming language; migrate your apps to Kotlin now. The initial resistance is the only resistance.

I sincerely hope you enjoyed the first article of this Kotlin Kronicles series. We have just scratched the surface. Stay tuned for part 2, and let me know if you want me to cover anything specific.

Got any suggestions? shoot below in comments.

What is your favourite feature of Kotlin?

Keep Developing…

AWS Batch Jobs

What is batch computing?

Batch computing means running jobs asynchronously and automatically, across one or more computers.

What is AWS Batch Job?

AWS Batch enables developers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

Why use AWS Batch?

  • Fully managed infrastructure – No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
  • Integrated with AWS – Natively integrated with the AWS platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
  • Cost-optimized Resource Provisioning – AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.

AWS Batch Concepts

  • Jobs
  • Job Definitions
  • Job Queue
  • Compute Environments

Jobs

Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. Containerized jobs can reference a container image, command, and parameters or users can simply provide a .zip containing their application and AWS will run it on a default Amazon Linux container.

$ aws batch submit-job --job-name poller --job-definition poller-def --job-queue poller-queue

Job Dependencies

Jobs can express a dependency on the successful completion of other jobs or specific elements of an array job.

Use your preferred workflow engine and language to submit jobs. Flow-based systems simply submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies.

Jobs run in approximately the same order in which they are submitted as long as all dependencies on other jobs have been met.

$ aws batch submit-job --depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f …

Job Definitions

Similar to ECS Task Definitions, AWS Batch Job Definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden.

Some of the attributes specified in a job definition are:

  • IAM role associated with the job
  • vCPU and memory requirements
  • Mount points
  • Container properties
  • Environment variables

$ aws batch register-job-definition --job-definition-name gatk --container-properties …

Job Queues

Jobs are submitted to a Job Queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours.

$ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order …

 

Compute Environments

Job queues are mapped to one or more Compute Environments containing the EC2 instances that are used to run containerized batch jobs.

Managed (recommended) compute environments let you describe your business requirements (instance types, min/max/desired vCPUs, and an EC2 Spot bid as a percentage of On-Demand), and AWS launches and scales resources on your behalf.

We can choose specific instance types (e.g. c4.8xlarge), instance families (e.g. C4, M4, R3), or simply choose “optimal” and AWS Batch will launch appropriately sized instances from AWS’s more modern instance families.

Alternatively, we can launch and manage our own resources within an Unmanaged compute environment. Your instances need to include the ECS agent and run supported versions of Linux and Docker.

$ aws batch create-compute-environment --compute-environment-name unmanagedce --type UNMANAGED …

AWS Batch will then create an Amazon ECS cluster which can accept the instances we launch. Jobs can be scheduled to your Compute Environment as soon as the instances are healthy and register with the ECS Agent.

Job States

Jobs submitted to a queue can have the following states:

  • SUBMITTED: Accepted into the queue, but not yet evaluated for execution
  • PENDING: The job has dependencies on other jobs which have not yet completed
  • RUNNABLE: The job has been evaluated by the scheduler and is ready to run
  • STARTING: The job is in the process of being scheduled to a compute resource
  • RUNNING: The job is currently running
  • SUCCEEDED: The job has finished with exit code 0
  • FAILED: The job finished with a non-zero exit code or was cancelled or terminated.

AWS Batch Actions

  • Jobs: SubmitJob, ListJobs, DescribeJobs, CancelJob, TerminateJob
  • Job Definitions: RegisterJobDefinition, DescribeJobDefinitions, DeregisterJobDefinition
  • Job Queues: CreateJobQueue, DescribeJobQueues, UpdateJobQueue, DeleteJobQueue
  • Compute Environments: CreateComputeEnvironment, DescribeComputeEnvironments, UpdateComputeEnvironment, DeleteComputeEnvironment

AWS Batch Pricing

There is no charge for AWS Batch. We only pay for the underlying resources we have consumed.

Use Case

Poller and Processor Service

Purpose

The poller service needs to run every hour, like a cron job, and submit one or more requests to a processor service, which has to launch the required number of EC2 resources, process files in parallel and terminate them when done.

Solution

We plan to go with a serverless architecture approach instead of a traditional Beanstalk/EC2 instance, as we don’t want to maintain an EC2 server instance running 24/7.

This approach will reduce our AWS billing cost, as the EC2 instance launches when the job is submitted to AWS Batch and terminates when the job execution is complete.

Poller Service Architecture Diagram

Processor Service Architecture Diagram

First time release

For Poller and Processor Service:

  • Create Compute environment
  • Create Job queue
  • Create Job definition

To automate the above resource creation process, we use batchbeagle (for installation and configuration, please refer to the batch-deployment repository).

Command to create/update the Batch resources of a stack (creates all job definitions, job queues and compute environments):

beagle -f stack/stackname/servicename.yml assemble

To start Poller service:

  • Enable a scheduler using an AWS CloudWatch Events rule to trigger the poller service batch job.

Incremental release

We must create a new revision of the existing job definition, pointing to the newly tagged release version of the ECR image to be deployed.

Command to deploy a new release version of the Docker image to the Batch job (creates a new revision of an existing job definition):

 

beagle -f stack/stackname/servicename.yml job update job-definition-name

Monitoring

Cloudwatch Events

We will use the AWS Batch event stream for CloudWatch Events to receive near real-time notifications about the current state of jobs submitted to our job queues.

AWS Batch sends job status change events to CloudWatch Events. AWS Batch tracks the state of your jobs, and if a previously submitted job’s status changes, an event is triggered, for example when a job in the RUNNING state moves to the FAILED state.

We will configure an Amazon SNS topic as the event target. It sends a notification to a Lambda function, which filters the relevant content out of the SNS message (JSON), formats it, and sends it to the respective environment’s Slack channel.

CloudWatch Event Rule → SNS Topic → Lambda Function → Slack Channel

Batch Job Status Notification in Slack

Slack notification provides the following details:

  • Job name
  • Job Status
  • Job ID
  • Job Queue Name
  • Log Stream Name

Explanation of Git design principles through Git internals

In this blog we’ll explore the internals of how Git works. Having some behind-the-scenes working knowledge of Git will help you understand why Git is so much faster than traditional version control systems, and it will also help you recover data in unexpected crash or delete scenarios.

For a developer, it is quite useful to understand the design principles of Git and see how it manages both speed of access (traversing to previous commits) and a small on-disk footprint for the repository.

In this blog, we will cover the following topics:

  • Initializing a new repository
  • Working directory and local repository
  • Git objects
  • Blob
  • Tree
  • Commit
  • Tag
  • Packs

I’m using Ubuntu 16.04 LTS, Zsh terminal and Git v2.7.4 for this blog, but you can use any operating system and terminal to follow along.

Initializing a new repository

  1. Initialize a new Git repository with ‘git init’ command.

 

  2. Create a couple of files by running the following commands.

$ echo "First file" >> first.txt

$ echo "Second file" >> second.txt

blog1

  3. Run ls -la to display all the contents of the folder.
    It should show the .git directory and the two files we created.

blog2

Working directory and local repository

The working directory comprises the files and folders we want to manage with the Git version control system (VCS). In this case the ~/testRepo folder constitutes our working directory, except for the ‘.git’ folder. The .git folder forms the local repository and contains everything Git stores to manage our data.

 

Git objects

Git doesn’t store diffs of file contents. It stores snapshots of each file; that is, each version of the file is stored exactly as it is at the point it is staged and committed. This is done for faster access, one of the core design principles of Git.

The latest version of the file is always stored as-is in Git, as it is the version most likely to be used; storing it whole makes retrieval much faster.

As further versions of the file are added, Git automatically creates pack files. We’ll discuss more on pack files later.

There are four types of Git objects:

  • Blob
  • Tree
  • Commit
  • Tag

Let’s understand these with an example. First let’s check the contents of .git folder.

blog3

There are 5 directories and 3 files. Let’s start with objects.

blog5

The find command returned no results, i.e. there are currently no files in the objects folder. Let’s stage our two files and then check again.

blog6

We can see that there are two files. To explore these objects, we need to use the ‘git cat-file’ command.

From the man page of git cat-file: “Provide content or type and size information for repository objects.”

blog7

The -p argument pretty-prints the object’s contents and -t returns its type.

The object ID (SHA-1 hash) is the subpath of the file under the .git/objects folder, without the slash.

For example, object Id of .git/objects/20/d5b672a347112783818b3fc8cc7cd66ade3008 is 20d5b672a347112783818b3fc8cc7cd66ade3008.

The type that was returned for both the objects is blob.

So blobs are objects used for storing the contents of a file. Blobs store only the content, no other information such as the file name.

Let’s commit our code now with ‘git commit’ command.

blog8

Next, run ‘git log’ to retrieve the commit ID.

blog9

Copy the commit ID and then run cat-file commands on it.

blog10

The object type returned is ‘commit’. It contains a reference to a tree object, the author name, the committer name and the commit message.

Now let’s check the tree object.

blog11

blog12

The tree object contains reference to the blob files we saw earlier, and also the reference to file names. We can summarize our Git repository state at this point with the following object diagram:

blog13

 

Let’s add a folder to our repository.

Run the following commands. We’ll use ‘git add’ to add the folder from the working directory to the local repository.

$ mkdir fol1

$ echo "Third File" >> fol1/third.txt

$ git add fol1

$ git commit -m “Second commit”

Inspect the second commit object.

blog14

Notice that the tree reference has changed. Also, there is a parent object property, so the commit object also stores the parent commit’s ID. Let’s inspect the tree.

blog15

The folder we just added, fol1, is stored as a tree, and it contains a reference to a blob for the third.txt file. So tree objects reference blobs or other subtree objects.

blog16

Now let’s discuss tags. Tags are used to provide an alias to a commit SHA ID for future reference/use.  There are two types of tags: lightweight tags and annotated tags.

  • Lightweight tags contain only the SHA-ID information.

blog17

The command ‘git tag light’ creates a file under .git/refs/tags/light which contains the commit ID the tag was created on. No separate tag object is created. This is mostly used during development, to easily remember and return to a commit.

  • Annotated tags are usually used for releases. They contain extra information such as a message and the tagger’s name along with the commit ID. Annotated tags can be created with the ‘git tag -a <tagname> -m "<message>"’ command.

blog18

A separate tag object is created for an annotated tag. You can list the tags you have created with the ‘git tag’ command.

Packs

  1. Although Git stores the contents of the latest version of an object intact, older versions are stored as deltas in pack files. Let’s understand this with an example. We need a slightly larger file to see the difference in size between the delta and the original, so we’ll download the GNU license web page.
  2. Run the following command to download the license HTML file.

$ curl -L -O -C - https://www.gnu.org/licenses/gpl-3.0.en.html

You should now have the gpl-3.0.en.html file in your working directory.

  3. Add and commit the file.

$ git add gpl-3.0.en.html

$ git commit -m “Added gpl file”

Inspect the commit and get the blob info of the added file.

git cat-file -p 53550a1c9325753eb44b1428a280bfb2cd5b90ef

blog19

The last command returns the size of the blob; that is, the blob containing the contents of gpl-3.0.en.html is 49641 bytes.

4. Edit the file and commit it again.

blog20

A new blob is created with a slightly larger size. Now let’s check the original blob.

blog21

The original blob still exists. Thus for each change, a snapshot of the file is stored with its contents intact.

5. Let’s pack the files with the ‘git gc’ command.

As you can see, all of our existing blob and commit objects are gone and are now replaced with pack files.

blog22

6. We can run the ‘git verify-pack’ command to inspect the pack file.

blog23

Now let’s check the highlighted entries.

The blob with ID ‘6be03f’ is the second version of gpl-3.0.en.html and the one with ID ‘0f6718’ is the first version.

The third column in the output is the object size. As you can see, the first blob is now reduced to 9 bytes and references the second blob, which keeps its full size of 49652 bytes. Thus the older first blob is stored as a delta of the second, because the newer version is the one most likely to be used. Git automatically packs objects when pushing to a remote repository.

Conclusion

In this blog we explored how Git stores files internally and looked at the various types of Git objects (blob, tree, commit and tag) and how they are linked to each other. We also looked at packs, which showed how Git compresses older versions of a file and saves storage space.

Performance Testing for Serverless Architecture (AWS)

Purpose

This document gives a brief idea of how JMeter can be used with a serverless architecture such as AWS as well, for evaluating read/write capacity and benchmarking the load an application can survive.

What are AWS Lambdas? 

The code we write is deployed to the cloud as one or more Lambda functions on AWS Lambda, a compute service that runs the code on our behalf.

Creating time-based index in Elasticsearch using NEST (.NET clients for Elasticsearch)

Introduction

One of the most common use cases in Elasticsearch is to create time-based indexes for logs. In this blog, we will see how to create a time-based index at run time using NEST (the .NET client for Elasticsearch).

When it comes to logging, we usually create a log file every day to isolate the logs and retrieve only the ones relevant for analysis when required. If we store the logs in a relational database, we commonly have a single table. Over time the entries in this table grow, and to keep the number of records in check we usually delete old records from the table at a specific interval.