Kotlin Kronicles for Android developers — part 1

Another blog on “why Kotlin”? Cliché? Not really. This is more like a “why not Kotlin?” kind of blog post. This blog is my attempt to convince Android app developers to migrate to Kotlin. It doesn’t matter if you have little or no knowledge of Kotlin, or you are an iOS developer who worships Swift; read along, I am sure Kotlin will impress you (even if my writing doesn’t).

I am going to show some of the amazing features of the Kotlin programming language that make development so much easier and more fun, and that make the code read almost like plain English. I read somewhere that “a programming language isn’t for computers, computers understand only 1s and 0s, it is for humans”, and I couldn’t agree more. There is a learning curve, sure, but where isn’t there? It pays off nicely. Kotlin lets us do more with fewer lines of code; Kotlin makes us productive.

Let’s quickly walk through some of the obvious reasons for migrating to Kotlin:

  • Kotlin is one of the officially supported languages for Android app development, as announced at Google I/O 2017.
  • Kotlin is 100% interoperable with Java, which basically means Kotlin code can use Java classes and methods and vice versa.
  • Kotlin has several modern programming language features like lambdas, higher-order functions, null safety, extensions, etc.
  • Kotlin is developed and maintained by JetBrains, the company behind several integrated development environments (IDEs) that developers use every day, such as IntelliJ IDEA, PyCharm, PhpStorm, and GoLand.

This is available all over the internet. This is the content of “Why Kotlin” category of blogs.

Let’s talk about something a little more interesting.

Higher Order Functions:

Kotlin functions are first class citizens. Meaning functions can be stored in variables, passed as arguments or returned from other functions. A higher-order function is a function that takes a function as a parameter or returns a function.

This may sound strange at first. Why in the world would I pass a function to another function (or return a function from another function)? It is actually very common in various programming languages, including JavaScript, Swift, Python, and, apparently, Kotlin. An excellent example of a higher-order function is map. The map function takes a function as a parameter and returns a list containing the results of applying that function to each item of the original list or array.
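The gist embedded in the original post isn’t reproduced here, so here is a rough sketch of the idea (the exact body of stringStrirrer() is my assumption; here it simply reverses each string):

fun stringStrirrer(s: String): String = s.reversed()

fun main() {
    val x = listOf("one", "two", "three")
    val result = x.map { stringStrirrer(it) }   // apply stringStrirrer() to each item of x
    println(result)                             // [eno, owt, eerht]
}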

Check out the map call in the sketch above. It applies the stringStrirrer() function to each item of x, and the println shows the result of the map operation.

Data classes:

Java POJOs (Plain Old Java Objects), or simply classes that store some data, require a lot of boilerplate code most of the time: getters, setters, equals, hashCode, toString, and so on. A Kotlin data class derives these properties and functions automatically from the properties defined in its primary constructor.
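The post’s one-liner gist isn’t embedded here; a minimal sketch of what it looks like, with assumed property names:

data class Customer(val name: String, val balance: Double)

fun main() {
    val c1 = Customer("Asha", 1000.0)       // equals(), hashCode(), toString() are generated
    val c2 = c1.copy(balance = 1500.0)      // copy with one property changed
    val (name, balance) = c2                // de-structuring via component1()/component2()
    println("$c1, $name, $balance")
}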

Just one line of code replaces the several lines of a Java POJO. For custom behavior we can override functions in data classes. Kotlin data classes also come bundled with copy() and componentN() functions, which allow copying objects and de-structuring them respectively.

Dealing With Strings:

The Kotlin standard library makes dealing with strings so much easier. Here is a sample:
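The sample gist isn’t reproduced here; a few assumed examples of the kind of thing the standard library gives us out of the box:

fun main() {
    println("  Kotlin Kronicles  ".trim())      // "Kotlin Kronicles"
    println("level".reversed())                 // "level"
    println("42".toIntOrNull())                 // 42
    println("kotlin".contains("lin"))           // true
    println("kotlin".padStart(8, '*'))          // "**kotlin"
}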

No helper classes, public static methods, or StringUtils are required. We can invoke these functions as if they belonged to the String class itself.

Dealing with Collections:

As with strings, the helper methods in the “java.util.Collections” class are no longer required. We can directly call sort, max, min, reverse, etc. on collections.
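A small assumed example of calling these directly on a list:

fun main() {
    val numbers = mutableListOf(42, 7, 19, 3)
    numbers.sort()                     // in-place sort, no Collections.sort() needed
    println(numbers)                   // [3, 7, 19, 42]
    println(numbers.maxOrNull())       // 42 (max() on older Kotlin versions)
    println(numbers.minOrNull())       // 3  (min() on older Kotlin versions)
    println(numbers.reversed())        // [42, 19, 7, 3]
}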

Consider a bank use case: a bank has many customers, and a customer makes several transactions every month. Think in terms of objects:

A bank has many customers; a customer has some properties (a name, a list of transactions, etc.) and makes several transactions; a transaction has properties like amount and type. In Java, the model looks something like this:
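The original post embeds the Java POJOs here. For reference, a rough Kotlin sketch of the same model (property names assumed), which the later snippets in this post build on:

data class Transaction(val amount: Double, val type: String)

data class Customer(
    val name: String,
    val balance: Double,
    val transactions: List<Transaction>
)

class Bank(val customers: List<Customer>)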

Find the customer with minimum balance:
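The original gist (not reproduced here) compares the Java and Kotlin approaches; a hedged sketch of the Kotlin one-liner, using the model sketched above:

// Java way, roughly: Collections.min(customers, Comparator.comparingDouble(Customer::getBalance))
fun customerWithMinimumBalance(bank: Bank): Customer? = bank.customers.minByOrNull { it.balance }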

I don’t know about you, but I think the Kotlin way is much cleaner, simpler, and more readable. And we didn’t import any helper class for that (the Java way needs the Collections class). I can read it as plain English, which is more than I can say for the Java counterpart. The motive here is not to compare Java with Kotlin, but to appreciate the Kotlin Kronicles.

There are several functions like map, filter, reduce, flatMap, fold, partition, etc. Here is how we can simplify our tasks by combining these standard functions (for each problem statement below, imagine doing it in Java):
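The gist isn’t embedded here; a sketch of the kind of one-liners it shows, using the model above and assuming the transaction types are plain strings like "DEPOSIT" and "WITHDRAWAL":

fun examples(bank: Bank) {
    // 1. flatMap: one flat list of every transaction by every customer
    val allTransactions = bank.customers.flatMap { it.transactions }

    // 2. filter + sumOf: total amount deposited across all customers
    val totalDeposits = allTransactions
        .filter { it.type == "DEPOSIT" }
        .sumOf { it.amount }

    // 3. fold + when: net amount in the bank after all deposits and withdrawals
    val netAmount = allTransactions.fold(0.0) { acc, txn ->
        when (txn.type) {
            "DEPOSIT" -> acc + txn.amount
            else -> acc - txn.amount
        }
    }

    // 4. partition: split into deposits and everything else in a single pass
    val (deposits, others) = allTransactions.partition { it.type == "DEPOSIT" }

    println("$totalDeposits, $netAmount, ${deposits.size}, ${others.size}")
}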

As is clear from the sketch above, we can solve mundane problems with far fewer lines of code. Readability-wise, I just love it. Here is an explanation of the code above:

  1. flatMap: Returns a single list of all elements yielded from the results of the transform function being invoked on each element of the original collection (in our case, the transform function returns the list of transactions of each individual customer).
  2. filter and sum: Here we combined filter with a sum operation (sumOf in the sketch above) to write a one-liner that finds the total amount deposited or withdrawn from the bank, considering all customers.
  3. fold: Accumulates a value, starting with the initial value (0.0 in our case) and applying the operation (the when expression above) from left to right to the current accumulator value and each element. Here we used fold and when to find the net amount deposited in the bank, considering all deposits and withdrawals.
  4. partition: Splits the original collection into a pair of lists, where the first list contains the elements for which the predicate (the separating function in this case) yielded true, while the second contains those for which it yielded false. Of course, we could filter twice, but this is so much easier.

So many complex operations simplified by the Kotlin standard library.

Extensions:

One of my favourite features. Kotlin extensions let us add functionality to a class without modifying the original class, just like in Swift and C#. This can be very handy if used properly. Check this out:
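The post’s gist isn’t reproduced here; a minimal sketch (the conversion rate is made up for illustration):

fun Double.toINR(): Double = this * 70.0        // assume 1 USD = 70 INR

fun main() {
    println(100.0.toINR())                      // 7000.0
}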

In the code above, we just added a new function called toINR() to Kotlin’s Double type. So we basically added a new function to a built-in Kotlin type, how about that 😎. And it is a one-liner function: no curly braces, return type, or return statement whatsoever. Noticed the conciseness, did you?

Since Kotlin supports higher-order functions, we can combine them with extension functions to make our code more robust. One very common problem in Android development involving SQLite is that developers often forget to end the transaction, and then we waste hours debugging it. Here is how we can avoid that:
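A sketch reconstructed from the description that follows; note that setTransactionSuccessful() is included here because SQLite needs it for the work to actually commit, even though the post only mentions begin and end:

import android.database.sqlite.SQLiteDatabase

fun SQLiteDatabase.performDBTransaction(operation: () -> Unit) {
    beginTransaction()
    try {
        operation()                  // whatever the caller wants to run inside the transaction
        setTransactionSuccessful()   // mark it successful so endTransaction() commits the changes
    } finally {
        endTransaction()             // always runs, even if operation() throws
    }
}

// Usage: db.performDBTransaction { /* inserts, updates, deletes */ }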

We added an extension function called performDBTransaction to SQLiteDatabase. This function takes a single parameter, itself a function with no input and no output, which is whatever we want executed between the begin and end of the transaction. performDBTransaction calls beginTransaction(), then the passed operation, and then endTransaction(). We can use it wherever required without having to double-check whether we called endTransaction() or not.

I always forget to call commit() or apply() when storing data in Shared Preferences. Similar approach:
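A hedged sketch of the same idea (the receiver-lambda signature is my assumption, not the post’s exact gist):

import android.content.SharedPreferences

fun SharedPreferences.Editor.persist(edits: SharedPreferences.Editor.() -> Unit) {
    edits()      // the putString()/putInt() calls the caller wants
    apply()      // never forgotten again
}

// Usage: prefs.edit().persist { putString("username", "kotlin_fan") }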

The extension function persist() in the sketch above takes care of it. We call persist() as if it were part of SharedPreferences.Editor.

Smart Casts:

Going back to our bank example. Let’s say a transaction can be one of three types, as described below:

An NEFT transaction has fixed charges, while IMPS has some bank-related charges. Now, we deal with a transaction object through the superclass Transaction, and we need to identify the concrete type of the transaction so that it can be processed accordingly. Here is how this can be handled in Kotlin:
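The post’s gist isn’t reproduced here; a standalone sketch of the hierarchy and the when-based dispatch it describes (class and function names assumed):

open class Transaction(val amount: Double)
class NEFT(amount: Double) : Transaction(amount) {
    fun fixedCharges() = 2.5
}
class IMPS(amount: Double) : Transaction(amount) {
    fun bankCharges() = amount * 0.01
}

fun process(transaction: Transaction) {
    when (transaction) {
        is NEFT -> println("NEFT charge: ${transaction.fixedCharges()}")   // smart cast to NEFT
        is IMPS -> println("IMPS charge: ${transaction.bankCharges()}")    // smart cast to IMPS
        else -> println("Plain transaction of ${transaction.amount}")
    }
}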

In the when branches above, we didn’t explicitly cast the Transaction object to NEFT or IMPS, yet we are able to invoke the functions of those classes. This is a smart cast: Kotlin automatically casts the transaction object to its respective type inside each branch.

Epilogue:

As developers, we need to focus on the stuff that matters, the core part, and boilerplate code isn’t one of them. Kotlin helps reduce boilerplate and makes development fun. It has many amazing features that ease development and testing. Do not let the fear of the unknown dictate your choice of programming language; migrate your apps to Kotlin now. The initial resistance is the only resistance.

I sincerely hope you enjoyed the first article of this Kotlin Kronicles series. We have just scratched the surface. Stay tuned for part 2, and let me know if you want me to cover anything specific.

Got any suggestions? Shoot them in the comments below.

What is your favourite feature of Kotlin?

Keep Developing…

AWS Batch Jobs

What is batch computing?

Batch computing means running jobs asynchronously and automatically, across one or more computers.

What is AWS Batch Job?

AWS Batch enables developers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

Why use AWS Batch Job?

  • Fully managed infrastructure – No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
  • Integrated with AWS – Natively integrated with the AWS platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
  • Cost-optimized Resource Provisioning – AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.

AWS Batch Concepts

  • Jobs
  • Job Definitions
  • Job Queue
  • Compute Environments

Jobs

Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. Containerized jobs can reference a container image, command, and parameters or users can simply provide a .zip containing their application and AWS will run it on a default Amazon Linux container.

$ aws batch submit-job --job-name poller --job-definition poller-def --job-queue poller-queue

Job Dependencies

Jobs can express a dependency on the successful completion of other jobs or specific elements of an array job.

Use your preferred workflow engine and language to submit jobs. Flow-based systems simply submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies.

Jobs run in approximately the same order in which they are submitted as long as all dependencies on other jobs have been met.

$ aws batch submit-job --depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f …

Job Definitions

Similar to ECS Task Definitions, AWS Batch Job Definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden.

Some of the attributes specified in a job definition are:

  • IAM role associated with the job
  • vCPU and memory requirements
  • Mount points
  • Container properties
  • Environment variables
$ aws batch register-job-definition --job-definition-name gatk --container-properties …

Job Queues

Jobs are submitted to a Job Queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours.

$ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order …

 

Compute Environments

Job queues are mapped to one or more Compute Environments containing the EC2 instances that are used to run containerized batch jobs.

Managed (recommended) compute environments enable you to describe your business requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as a percentage of the On-Demand price), and AWS launches and scales resources on your behalf.

We can choose specific instance types (e.g. c4.8xlarge), instance families (e.g. C4, M4, R3), or simply choose “optimal”, and AWS Batch will launch appropriately sized instances from the more modern instance families.

Alternatively, we can launch and manage our own resources within an Unmanaged compute environment. Our instances need to include the ECS agent and run supported versions of Linux and Docker.

$ aws batch create-compute-environment --compute-environment-name unmanagedce --type UNMANAGED …

AWS Batch will then create an Amazon ECS cluster which can accept the instances we launch. Jobs can be scheduled to your Compute Environment as soon as the instances are healthy and register with the ECS Agent.

Job States

Jobs submitted to a queue can have the following states:

  • SUBMITTED: Accepted into the queue, but not yet evaluated for execution
  • PENDING: The job has dependencies on other jobs which have not yet completed
  • RUNNABLE: The job has been evaluated by the scheduler and is ready to run
  • STARTING: The job is in the process of being scheduled to a compute resource
  • RUNNING: The job is currently running
  • SUCCEEDED: The job has finished with exit code 0
  • FAILED: The job finished with a non-zero exit code or was cancelled or terminated.

AWS Batch Actions

  • Jobs: SubmitJob, ListJobs, DescribeJobs, CancelJob, TerminateJob
  • Job Definitions: RegisterJobDefinition, DescribeJobDefinitions, DeregisterJobDefinition
  • Job Queues: CreateJobQueue, DescribeJobQueues, UpdateJobQueue, DeleteJobQueue
  • Compute Environments: CreateComputeEnvironment, DescribeComputeEnvironments, UpdateComputeEnvironment, DeleteComputeEnvironment

AWS Batch Pricing

There is no charge for AWS Batch. We only pay for the underlying resources we have consumed.

Use Case

Poller and Processor Service

Purpose

The poller service needs to run every hour, like a cron job, and submit one or more requests to a processor service. The processor service has to launch the required number of EC2 resources, process files in parallel, and terminate the instances when done.

Solution

We plan to go with a serverless architecture approach instead of the traditional Beanstalk/EC2 setup, as we don’t want to maintain and keep an EC2 server instance running 24/7.

This approach will reduce our AWS bill, as EC2 instances are launched when a job is submitted to AWS Batch and terminated when the job execution is completed.

Poller Service Architecture Diagram

Processor Service Architecture Diagram

First time release

For Poller and Processor Service:

  • Create Compute environment
  • Create Job queue
  • Create Job definition

To automate the above resource creation process, we use batchbeagle (for installation and configuration, please refer to the batch-deployment repository).

Command to Create/Update Batch Job Resources of a Stack (Creates all Job Definitions, Job Queues and Compute Environments)

beagle -f stack/stackname/servicename.yml assemble

To start Poller service:

  • Enable a scheduler using an AWS CloudWatch Events rule to trigger the poller service batch job, for example:
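A sketch of such an hourly rule (the rule name and target details are placeholders):

$ aws events put-rule --name poller-hourly --schedule-expression "rate(1 hour)"

$ aws events put-targets --rule poller-hourly --targets …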

Incremental release

We must create a new revision of the existing job definition, pointing to the ECR image tagged with the new release version to be deployed.

Command to deploy new release version of Docker image to Batch Job (Creates a new revision of an existing Job Definition)

 

beagle -f stack/stackname/servicename.yml job update job-definition-name

Monitoring

Cloudwatch Events

We will use AWS Batch event stream for CloudWatch Events to receive near real-time notifications regarding the current state of jobs that have been submitted to your job queues.

AWS Batch sends job status change events to CloudWatch Events. AWS Batch tracks the state of your jobs, and if a previously submitted job’s status changes, an event is triggered, for example when a job moves from the RUNNING status to the FAILED status.

We will configure an Amazon SNS topic to serve as the event target. It sends a notification to a Lambda function, which then filters the relevant content out of the SNS message (JSON), formats it, and sends it to the respective environment’s Slack channel.

CloudWatch Event Rule → SNS Topic → Lambda Function → Slack Channel
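A rough sketch of wiring the first hops with the CLI (the rule and topic names are placeholders; the Lambda subscription and the Slack webhook are configured separately):

$ aws events put-rule --name batch-job-state-change --event-pattern '{"source":["aws.batch"],"detail-type":["Batch Job State Change"]}'

$ aws sns create-topic --name batch-job-status

$ aws events put-targets --rule batch-job-state-change --targets …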

Batch Job Status Notification in Slack

Slack notification provides the following details:

  • Job name
  • Job Status
  • Job ID
  • Job Queue Name
  • Log Stream Name

Explanation of Git design principles through Git internals

In this blog we’ll explore the internals of how Git works. Having some behind-the-scenes working knowledge of Git will help you understand why Git is so much faster than traditional version control systems, and it also helps you recover data in unexpected crash/delete scenarios.

For a developer, it is quite useful to understand the design principles of Git and see how it manages both speed of access (traversing to previous commits) and a small disk footprint for the repository.

In this blog, we will cover the following topics:

  • Initializing a new repository
  • Working directory and local repository
  • Git objects
  • Blob
  • Tree
  • Commit
  • Tag
  • Packs

I’m using Ubuntu 16.04 LTS, Zsh terminal and Git v2.7.4 for this blog, but you can use any operating system and terminal to follow along.

Initializing a new repository

  1. Initialize a new Git repository with ‘git init’ command.
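The screenshots aren’t reproduced here; roughly, the setup looks like this (the folder name testRepo comes from the working-directory section below):

$ mkdir testRepo && cd testRepo

$ git init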

 

  2. Create a couple of files by running the following commands.

$ echo "First file" >> first.txt

$ echo "Second file" >> second.txt


  3. Run ‘ls -la’ to display all the contents of the folder.
    It should show the .git directory and the two files we created.


Working directory and local repository

The working directory comprises the files and folders we want to manage with the Git version control system (VCS). In this case, the ~/testRepo folder constitutes our working directory, except for the ‘.git’ folder. The ‘.git’ folder forms the local repository and contains everything that Git stores to manage our data.

 

Git objects

Git doesn’t store diffs of file contents. It stores snapshots of each file; that is, each version of the file is stored exactly as it is at the point it is staged and committed. This is done for faster access, one of the core principles behind the design of Git.

The latest version of a file is always stored as-is in Git, as it is the most likely one to be used, and storing it intact makes retrieval much faster.

As further versions of the file are added, Git automatically creates pack files. We’ll discuss more on pack files later.

There are 4 types of git objects.

  • Blob
  • Tree
  • Commit
  • Tag

Let’s understand these with an example. First let’s check the contents of .git folder.


There are 5 directories and 3 files. Let’s start with objects.


Running the find command on the objects folder returns no results, i.e. there are currently no files in it. Let’s stage our two files and then check again.
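Reconstructed from the screenshots (the output is not reproduced here):

$ git add first.txt second.txt

$ find .git/objects -type f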


We can see that there are now two files. To explore these objects, we need to use the ‘git cat-file’ command.

From the man page of git cat-file: “Provides content or type and size information for repository objects”.


The -p argument pretty-prints the object’s contents and -t returns its type.

The object ID (SHA-1 hash) is the subpath of the file under the .git/objects folder, without the slashes.

For example, the object ID of .git/objects/20/d5b672a347112783818b3fc8cc7cd66ade3008 is 20d5b672a347112783818b3fc8cc7cd66ade3008.
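For example, running cat-file on the object above (the -p output is simply the stored file content and is not reproduced here):

$ git cat-file -t 20d5b672a347112783818b3fc8cc7cd66ade3008
blob

$ git cat-file -p 20d5b672a347112783818b3fc8cc7cd66ade3008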

The type that was returned for both the objects is blob.

So blobs are objects used for storing the contents of a file. Blobs store only the content, and no other information such as the file name.

Let’s commit our code now with the ‘git commit’ command.


Next, run ‘git log’ to retrieve the commit ID.


Copy the commit ID and then run cat-file commands on it.
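The commands from the screenshots, roughly (the commit message and the commit ID are placeholders):

$ git commit -m "First commit"

$ git log

$ git cat-file -t <commit-id>
commit

$ git cat-file -p <commit-id>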


The object type that is returned is ‘commit’. It contains a reference to a tree object, the author name, the committer name, and the commit message.

Now let’s check the tree object.



The tree object contains references to the blobs we saw earlier, along with their file names. We can summarize our Git repository state at this point with the following object diagram:

[Object diagram: the commit points to a tree, and the tree points to the blobs for first.txt and second.txt]

 

Let’s add a folder to our repository.

Run the following commands. We’ll use ‘git add’ to add the folder from the working directory to the local repository.

$ mkdir fol1

$ echo "Third File" >> fol1/third.txt

$ git add fol1

$ git commit -m "Second commit"

Inspect the second commit object.


Notice that the tree reference has changed. Also, there is a parent property: a commit object also stores the parent commit’s ID. Let’s inspect the tree.


The folder we just added, fol1, is stored as a tree, and it contains a reference to a blob for third.txt. So tree objects reference blobs or other subtree objects.

[Object diagram: the second commit points to a new tree, which references the fol1 subtree, which in turn references the blob for third.txt]

Now let’s discuss tags. Tags are used to provide an alias for a commit SHA-1 ID for future reference. There are two types of tags: lightweight tags and annotated tags.

  • Lightweight tags contain only the SHA-ID information.


The command ‘git tag light’ creates a file under .git/refs/tags/light which contains the commit Id on which the tag was created. No separate tag object is created. This is mostly used for development purposes, to easily remember and traverse back to a commit.

  • Annotated tags are usually used for releases. They contain extra information, like a message and the tagger name, along with the commit ID. Annotated tags can be created with the ‘git tag -a <tagname> -m "<message>"’ command.


A separate tag object is created for an annotated tag. You can list the created tags with the ‘git tag’ command.
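Roughly (the tag names here are assumed):

$ git tag light                          # lightweight: just a ref file containing the commit ID

$ git tag -a v1.0 -m "First release"     # annotated: creates a separate tag object

$ git tag
light
v1.0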

Packs

  1. Although Git stores the contents of the latest version of an object intact, older versions are stored as deltas in pack files. Let’s understand this with an example. We need a slightly larger file to see the difference in size between the delta and the original file, so we’ll download the GNU GPL web page.
  2. Run the following command to download the license HTML file.

$ curl -L -O -C - https://www.gnu.org/licenses/gpl-3.0.en.html

You should now have the gpl-3.0.en.html file in your working directory.

  3. Add and commit the file.

$ git add gpl-3.0.en.html

$ git commit -m “Added gpl file”

Inspect the commit and get the blob info of the added file.

$ git cat-file -p 53550a1c9325753eb44b1428a280bfb2cd5b90ef
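The screenshot also queries the blob’s size (the blob ID is a placeholder here):

$ git cat-file -s <blob-id>
49641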


The last command returns the size of the blob; that is, the blob containing the content of gpl-3.0.en.html is 49641 bytes.

4. Edit the file and commit it again.


A new blob is created with a slightly larger size. Now let’s check the original blob.


The original blob still exists. Thus, for each change, a snapshot of the file is stored with its contents intact.

5. Let’s pack the files with the ‘git gc’ command.

After running it, all of our existing blob and commit objects are gone and have been replaced with pack files.


6. We can run the ‘git verify-pack’ command to inspect the pack file.
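Roughly (the pack file name is a placeholder):

$ git gc

$ find .git/objects -type f

$ git verify-pack -v .git/objects/pack/pack-<sha>.idx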


Now let’s check the entries for the two gpl blobs in the output.

The blob ‘6be03f’ is the second version of gpl-3.0.en.html and the blob ‘0f6718’ is the first version.

The third column in the output represents the object size. As you can see, the first blob is now reduced to 9 bytes and references the second blob, which keeps its full size of 49652 bytes. So the first blob is stored as a delta of the second blob, even though the first one is older, because the newer version is the one most likely to be used. Git automatically packs objects when pushing to a remote repository.

Conclusion

In this blog we explored how Git stores files internally, looked at the various types of Git objects (blob, tree, commit, and tag) and how they are linked to each other, and finally looked at packs, which show how Git compresses older versions of a file to save storage space.

Performance Testing for Serverless Architecture (AWS)

Purpose

This document provides a brief introduction to how JMeter can be used with a serverless architecture such as AWS, to evaluate read/write capacity and benchmark the load an application can withstand.

What are AWS Lambdas? 

The code we write is deployed to the cloud as one or more Lambda functions on AWS Lambda, a compute service that runs the code on our behalf.

Creating time-based index in Elasticsearch using NEST (.NET clients for Elasticsearch)

Introduction

One of the most common use cases in Elasticsearch is to create time-based indexes for logs. In this blog, we will see how to create a time-based index at run time using NEST (the .NET client for Elasticsearch).

When it comes to logging, we usually create a log file every day to isolate the logs and get only the ones relevant for analysis when required. If we store the logs in a relational database, we commonly have one table. With time, the entries in this table grow, and to keep the number of records in check we usually delete old records from the table at a specific interval.

Proxy Routing in Angular 4 Applications

This blog intends to simplify routing in Angular 4 applications in both development and production environments. While working on routing in Angular 4 applications, we often face some of the following challenges:

  • Finding a solution to the problem of cross-origin issues in the Angular development environment
  • Separating routing configurations and API server URLs from the code, making the application more robust and maintainable
  • Finding a generic solution where frequent changes of API URLs (due to deployment on different servers, say for load balancing) do not force you to modify the code
  • Keeping the servers where the APIs are deployed out of your code entirely


Data Fingerprinting to enable Incremental Improvement in Machine Learning Complexity

Introduction

Many startups would like to incorporate a machine learning component into their product(s). Most of these products are unique in terms of the business, the data that is required to train the machine learning models, and the data that can be collected. One of the main challenges these startups face is the availability of data specific to their business problem. Unfortunately, the quality of machine learning algorithms depends on the quality of the domain-specific data used to train the models. Generic data sets are not useful for the unique problems these startups are solving. As a result, they cannot roll out a feature involving machine learning until they can collect enough data. On the other hand, customers ask for the product feature before their usage can generate the required data. In such a situation, one needs to roll out a machine learning solution incrementally. For this to happen, there must be a synergy between the data and the algorithms that have the ability to process this data. To enforce this synergy, we propose a computational model that we refer to as “Data Fingerprinting”.