Scala code analysis and coverage report on Sonarqube using SBT

Introduction

This blog is all about configuring the scoverage plugin with SonarQube for tracking statement coverage as well as static code analysis for a Scala project. SonarQube supports many languages out of the box, but Scala is not one of them, so this blog will guide you through configuring the sonar-scala and scoverage plugins to generate code analysis and code coverage reports.

The scoverage plugin for SonarQube reads the coverage reports generated by `sbt coverage test` and displays them on the Sonar dashboard.

Here are the steps to configure Scala projects with SonarQube for code coverage as well as static code analysis.

  1. Install SonarQube and start the server.
  2. Go to the SonarQube Marketplace and install the `SonarScala` plugin.

This plugin provides a static code analyzer for the Scala language. It supports all the standard metrics implemented by SonarQube, including cognitive complexity.

  3. Add the `Scoverage` plugin to SonarQube from the Marketplace.

This plugin provides the ability to import statement coverage generated by Scoverage for Scala projects. It reads the XML report generated by Scoverage and populates several metrics in Sonar.

Requirements:

i.  SonarQube 5.1

ii. Scoverage 1.1.0

  4. Now add the `sbt-sonar` plugin dependency to your Scala project (in project/plugins.sbt):

addSbtPlugin("com.github.mwz" % "sbt-sonar" % "1.6.0")

This sbt plugin can be used to run sonar-scanner launcher to analyze a Scala project with SonarQube.

Requirements:

i.  sbt 0.13.5+

ii. Scala 2.11/2.12

iii. SonarQube server.

iv. sonar-scanner (see step 5 for installation)

  5. Configure the `sonar-scanner` executable (install the sonar-scanner CLI and make sure the sbt-sonar plugin can locate it).

 

  6. Now, configure the sonar properties in your project. This can be done in two ways:

  • Use sonar-project.properties file:

This file has to be placed in your project's root directory. To use the external config file, set `sonarUseExternalConfig` to true:

import sbtsonar.SonarPlugin.autoImport.sonarUseExternalConfig

sonarUseExternalConfig := true

  • Configure sonar properties in the build file:

By default, the plugin expects the properties to be defined in the `sonarProperties` setting key in sbt:
import sbtsonar.SonarPlugin.autoImport.sonarProperties

sonarProperties ++= Map(
  "sonar.sources" -> "src/main/scala",
  "sonar.tests" -> "src/test/scala",
  "sonar.modules" -> "module1,module2"
)
  7. Now run the commands below to publish code analysis and code coverage reports to your SonarQube server.
  • sbt coverage test
  • sbt coverageReport
  • sbt sonarScan

 

SonarQube integration is really useful for performing an automatic review of code to detect bugs, code smells and security vulnerabilities. SonarQube can also track history and provide a visual representation of it.

Introduction to Akka Streams

Why Streams?

In software development, there are cases where we need to handle potentially large amounts of data. Handling such scenarios naively can lead to issues such as `out of memory` exceptions, so we should divide the data into chunks and handle the chunks independently.

This is where Akka Streams come to the rescue, letting us do this in a more predictable and less chaotic manner.

Introduction

Akka Streams consist of three major components: Source, Flow and Sink. Any non-cyclical stream consists of at least two components, a Source and a Sink, plus any number of Flow elements. Here we can say that Source and Sink are special cases of Flow.

  • Source – this is the Source of data. It has exactly one output. We can think of Source as Publisher.
  • Sink – this is the Receiver of data. It has exactly one input. We can think of Sink as Receiver.
  • Flow – this is the Transformation that acts on the Source. It has exactly one input and one output.

Here Flow sits in between the Source and Sink as they are the Transformations applied on the Source data.

 

 

A very good thing is that we can combine these elements to obtain another one e.g combine Source and Flow to obtain another Source.

Akka Streams are called reactive streams because of their backpressure handling capabilities.

What are Reactive Streams?

Applications developed using streams can run into problems if the Source generates data faster than the Sink can handle. This causes the Sink to buffer the data, but if the data is too large the Sink's buffer will also grow and can lead to memory issues.

To handle this, the Sink needs to communicate with the Source to slow down the generation of data until it has finished handling the current data. This communication between Publisher and Receiver is called backpressure handling, and streams that implement this mechanism are called Reactive Streams.

Example using Akka Stream:

In this example, let’s try to find out prime numbers between 1 to 10000 using Akka stream. Akka stream version used is 2.5.11.

 

package example.akka

import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._

import scala.concurrent.Future

object AkkaStreamExample {

  def isPrime(i: Int): Boolean = {
    if (i <= 1) false
    else if (i == 2) true
    else !(2 until i).exists(x => i % x == 0)
  }

  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("actor-system")
    implicit val materializer = ActorMaterializer()

    val numbers = 1 to 10000

    // Source that will iterate over the number sequence
    val numberSource: Source[Int, NotUsed] = Source.fromIterator(() => numbers.iterator)

    // Flow for prime number detection
    val isPrimeFlow: Flow[Int, Int, NotUsed] = Flow[Int].filter(num => isPrime(num))

    // Source from the original Source with the Flow applied
    val primeNumbersSource: Source[Int, NotUsed] = numberSource.via(isPrimeFlow)

    // Sink to print the numbers
    val consoleSink: Sink[Int, Future[Done]] = Sink.foreach[Int](println)

    // Connect the Source with the Sink and run it using the materializer
    primeNumbersSource.runWith(consoleSink)
  }
}

 

The example above can be broken down into the following components:

 

 

  1. `Source` – based on the number iterator

`Source`, as explained already, represents a stream. Source takes two type parameters: the first one represents the type of data it emits and the second one is the type of the auxiliary value it can produce when run/materialized. If we don't produce any, we use the NotUsed type provided by Akka.

The static methods to create Source are

  • fromIterator – it accepts elements until the iterator is empty
  • fromPublisher – uses an object that provides publisher functionality
  • fromFuture – creates a new Source from a given Future
  • fromGraph – a Graph can also be used as a Source
  2. `Flow` – filters out only prime numbers

Basically, a `Flow` is an ordered set of transformations applied to the provided input. It takes three type parameters: the input datatype, the output datatype and the auxiliary datatype.

We can create a Source by combining an existing one with a Flow, as used in the code:

val primeNumbersSource: Source[Int, NotUsed] = numberSource.via(isPrimeFlow)

  3. `Sink` – prints numbers to the console

It is basically the subscriber of the data and the last element of the stream's steps.

A Sink is basically a Flow which uses a foreach or fold function to run a procedure over its input elements and propagate the auxiliary value.

As with Source and Flow, the companion object provides methods for creating an instance. The main ones are:

  • foreach – runs the given function for each received element
  • foreachParallel – same as foreach, except it runs in parallel
  • fold – runs the given function for each received element, propagating the resulting value to the next iteration

The runWith method produces a Future that will be completed when the Source is empty and the Sink has finished processing the elements. If processing fails, the Future is completed with a Failure.

We can also create a RunnableGraph instance and run it manually using toMat (or viaMat).
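For illustration, a minimal sketch reusing the values from the example above (it would sit inside main, where the implicit materializer is in scope):

// Build a RunnableGraph explicitly, keeping the Sink's materialized Future[Done]
val graph: RunnableGraph[Future[Done]] =
  numberSource.via(isPrimeFlow).toMat(consoleSink)(Keep.right)

// Nothing runs until the graph is materialized
val done: Future[Done] = graph.run()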

  4. `ActorSystem` and `ActorMaterializer` are needed because Akka Streams uses the Akka Actor model.

The `ActorMaterializer` class instance is needed to materialize a Flow into a Processor, which represents a processing stage, a construct from the Reactive Streams standard that Akka Streams implements.

In fact, Akka Streams employs back-pressure as described in the Reactive Streams standard mentioned above. Source, Flow and Sink eventually get transformed into low-level Reactive Streams constructs via the process of materialization.

App Store Connect API To Automate TestFlight Workflow

TestFlight

Most mobile application developers try to automate the build sharing process, as it is one of the most tedious tasks in an app development cycle. However, it has always remained difficult, especially for iOS developers, because of Apple's code signing requirements. So when iOS developers start thinking about automating build sharing, the first option that comes to mind is TestFlight.

Before TestFlight's acquisition by Apple, it was easy to automate the build sharing process. TestFlight had its own public APIs (http://testflightapp.com/api) to upload and share builds from the command line, and developers used these APIs to write automation scripts. After Apple's acquisition, TestFlight was made part of App Store Connect and the old APIs were invalidated. Therefore, to upload or share builds, developers had to rely on third-party tools like Fastlane.

App Store Connect API

In WWDC 2018, Apple announced the new App Store Connect API and made it publicly available in November 2018. By using the App Store Connect API, developers can now automate the TestFlight workflow without relying on any third-party tool.

In this short post, we will see a use case example of App Store Connect API for TestFlight.

Authentication

The App Store Connect API is a REST API to access data from the Apple server. Using this API requires authorization via a JSON Web Token (JWT); an API request without this token results in a "NOT_AUTHORIZED" error. Generating the JWT token is a somewhat tedious task. We need to follow the steps below to use the App Store Connect API:

  1. Create an API key in the App Store Connect portal
  2. Generate a JWT token using the above API key
  3. Send the JWT token with the API call

Let’s now deep dive into each step.

Creating the API Key

The API key is a pair of public and private keys. You can download the private key from App Store Connect, while the public key is stored on the Apple server. To create the private key, follow the steps below:

  1. Log in to the App Store Connect portal
  2. Go to the 'Users and Access' section
  3. Then select the 'Keys' section

Account holder (Legal role) needs to request for access to generate the API key.

Once you get access, you can generate an API key.

There are different access levels for keys like Admin, App Manager, developer etc. Key with the ‘Admin’ access can be used for all App Store Connect API.

Once you generate the API key, you can download it. This key is available to download for a single time only, so make sure to keep it secure once downloaded.

The API key never expires; you can use it as long as it's valid. In case you lose it, or it is compromised, remember to revoke it immediately, because anyone who has this key can access your App Store Connect records.

Generate JWT Token

Now we have the private key required to generate the JWT token. To generate the token, we also need the below-mentioned parameters:

  1. Private key Id: You can find it on the Keys tab (KEY ID).
  2. Issuer Id: Once you generate the private key, you will get an Issuer_ID. It is also available on the top of the Keys tab.
  3. Token Expiry: The generated token can be used within a maximum of 20 minutes. It expires after lapse of the specified time.
  4. Audience: As of now it is “appstoreconnect-v1”
  5. Algorithm: The ES256 JWT algorithm is used to generate a token.

Once all the parameters are in place, we can generate the JWT token. To generate it, here is the Ruby script used in the WWDC demo:

require "base64"
require "jwt"
require "openssl"

ISSUER_ID = "ISSUER_ID"
KEY_ID = "PRIVATE_KEY_ID"
private_key = OpenSSL::PKey.read(File.read("path_to_private_key/AuthKey_#{KEY_ID}.p8"))

token = JWT.encode(
  {
    iss: ISSUER_ID,
    exp: Time.now.to_i + 20 * 60,
    aud: "appstoreconnect-v1"
  },
  private_key,
  "ES256",
  { kid: KEY_ID }
)

puts token

 

Let’s take a look at the steps to generate a token:

  1. Create a new file named jwt.rb and copy the above script into it.
  2. Replace the Issuer_Id, Key_Id and private key file path values in the script with your actual values.
  3. To run this script, you need the jwt Ruby gem installed on your machine. Use the following command to install it: $ sudo gem install jwt
  4. After installing the gem, run the script with the command: $ ruby jwt.rb

You will get a token as the output of the above script. You can use this token along with the API call. Please note that the generated token remains valid for 20 minutes; if you want to keep using the API after that, don't forget to generate a new one.

Send JWT token with API call

Now that we have a token, let’s see a few examples of App Store Connect API for TestFlight. There are many APIs available to automate TestFlight workflow. We will see an example of getting information about builds available on App Store Connect. We will also look at an example of submitting a build to review process. This will give you an idea of how to use the App Store Connect API.

Example 1: Get build information:

Below is the API for getting build information. If you hit this API without the JWT token, it responds with an error:

$ curl https://api.appstoreconnect.apple.com/v1/builds
{
 "errors": [{
 "status": "401",
 "code": "NOT_AUTHORIZED",
 "title": "Authentication credentials are missing or invalid.",
 "detail": "Provide a properly configured and signed bearer token, and make sure that it has not expired. Learn more about Generating Tokens for API Requests https://developer.apple.com/go/?id=api-generating-tokens"
 }]
}

So you need to pass the above-generated JWT token in the request:

$ curl https://api.appstoreconnect.apple.com/v1/builds --header "Authorization: Bearer your_jwt_token"
{
  "data": [], // Array of builds available in your App Store Connect account
  "links": {
    "self": "https://api.appstoreconnect.apple.com/v1/builds"
  },
  "meta": {
    "paging": {
      "total": 2,
      "limit": 50
    }
  }
}

 

Example 2: Submit build for review process:

By using the above build API, you can get an ID for the build. Use this ID to submit a build for the review process. You can send the build information in a request body like:

{
  "data": {
    "type": "betaAppReviewSubmissions",
    "relationships": {
      "build": {
        "data": {
          "type": "builds",
          "id": "your_build_Id"
        }
      }
    }
  }
}

In the above request body, you just need to replace your build ID. So the final request will look like:

$ curl -X POST -H "Content-Type: application/json" --data '{"data":{"type":"betaAppReviewSubmissions","relationships":{"build":{"data":{"type":"builds","id":"your_build_Id"}}}}}' https://api.appstoreconnect.apple.com/v1/betaAppReviewSubmissions --header "Authorization: Bearer your_jwt_token"

That’s it. The above API call will submit the build for the review process. This way you can use any other App Store Connect API like getting a list of beta testers or to manage beta groups.

Conclusion

We have seen the end-to-end flow for App store Connect API. By using these API you can automate TestFlight workflow. You can also develop tools to automate the release process without relying on any third-party tool. You can find the documentation for App Store Connect API here. I hope you’ll find this post useful. Good luck and have fun.

 

 

 

 

 

WebRTC – Basics of web real-time communication

WebRTC is a free, open-source standard for real-time, plugin-free video, audio and data communication between peers. Many solutions like Skype, Facebook and Google Hangouts offer RTC, but they need downloads, native apps or plugins. The guiding principles of the WebRTC project are that its APIs should be open source, free, standardized, built into web browsers and more efficient than existing technologies.

How does it work

  • Obtain a Video, Audio or Data stream from the current client.
  • Gather network information and exchange it with peer WebRTC enabled client.
  • Exchange metadata about the data to be transferred.
  • Stream audio, video or data.

That's it! Well, almost: it's a simplified version of what actually happens. But now that you have an overall picture, let's dig into the details.

How it really works

WebRTC provides the implementation of 3 basic APIs to achieve everything.

  • MediaStream: Allowing the client to access a stream from a WebCam or microphone.
  • RTCPeerConnection: Enabling audio or video data transfer, with support for encryption and bandwidth management.
  • RTCDataChannel: Allowing peer-to-peer communication for any generic data.

Along with these capabilities, we still need a server (yes, we still need a server!) to identify the remote peer and to do the initial handshake. Once the peer has been identified, we can transfer data directly between the two peers if possible, or relay the information using a server.

Let’s look at each of these steps in detail.

MediaStream

MediaStream has a getUserMedia() method to request access to an audio, video or data stream, taking success and failure handlers.

 

navigator.getUserMedia(constraints, successCallback, errorCallback);

 

The constraints parameter is a JSON object which specifies whether audio or video access is required. In addition, we can specify some metadata about the constraints, such as the video width and height, for example:

 

navigator.getUserMedia({ audio: true, video: true}, successCallback, errorCallback);

 

RTCPeerConnection

This interface represents the connection between the local WebRTC client and a remote peer. It is used to transfer data efficiently between the peers. Both peers need to set up an RTCPeerConnection at their end. In general, we use the RTCPeerConnection onaddstream event callback to take care of the audio/video stream.

  • The initiator of the call (the caller) needs to create an offer and send it to the callee, with the help of a signalling server.
  • The callee which receives the offer needs to create an answer and send it back to the caller using the signalling server.

ICE

ICE is a framework that allows web browsers to connect with peers. There are many reasons why a straight-up connection from Peer A to Peer B simply won't work: most clients don't have a public IP address, as they are usually sitting behind a firewall and a NAT. Given the involvement of NAT, our client has to figure out the IP address of the peer machine. This is where Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers come into the picture.

STUN

A STUN server allows clients to discover their public IP address and the type of NAT they are behind. This information is used to establish a media connection. In most cases, a STUN server is only used during the connection setup and once that session has been established, media will flow directly between clients.

TURN

If a STUN server cannot establish the connection, ICE can switch to TURN. Traversal Using Relays around NAT (TURN) is an extension to STUN that allows media traversal over a NAT that does not allow the peer-to-peer connection required by STUN traffic. TURN servers are often used in the case of a symmetric NAT.

Unlike STUN, a TURN server remains in the media path after the connection has been established. That is why the term “relay” is used to define TURN. A TURN server literally relays the media between the WebRTC peers.

RTCDataChannel

The RTCDataChannel interface represents a bi-directional data channel between two peers of a connection. Objects of this type can be created using

 

RTCPeerConnection.createDataChannel()

 

Data channel capabilities make use of events based communication:

var peerConn = new RTCPeerConnection(),
    dc = peerConn.createDataChannel("my channel");

dc.onmessage = function (event) {
  console.log("received: " + event.data);
};


Android life cycle aware components

What is a life cycle aware component?

A life cycle aware component is a component which is aware of the life cycle of other components like activity or fragment and performs some action in response to change in life cycle status of this component.

Why have life cycle aware components?

Let's say we are developing a simple video player application, where we have an activity named VideoActivity which contains the UI to play the video, and a class named VideoPlayer which contains all the logic and mechanism to play a video. Our VideoActivity creates an instance of this VideoPlayer class in its onCreate() method.

 

 

Now, as for any video player, we would like it to play the video when VideoActivity is in the foreground (i.e. in the resumed state) and pause it when VideoActivity goes into the background (i.e. the paused state). So we call the player's play() and pause() methods from VideoActivity's onResume() and onPause() callbacks.

 

 

Also, we would like it to stop playing completely and release its resources when the activity gets destroyed, so we call into the player from VideoActivity's onDestroy() method as well. Putting these pieces together, the activity looks something like the sketch below.
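For illustration, a minimal Kotlin sketch of this manual wiring (the VideoPlayer API and the layout name are assumptions based on the description above):

import android.os.Bundle
import android.support.v7.app.AppCompatActivity

class VideoActivity : AppCompatActivity() {

    private lateinit var videoPlayer: VideoPlayer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_video)   // hypothetical layout
        videoPlayer = VideoPlayer()                // the activity creates and owns the player
    }

    override fun onResume() {
        super.onResume()
        videoPlayer.play()    // play when the activity comes to the foreground
    }

    override fun onPause() {
        videoPlayer.pause()   // pause when the activity goes to the background
        super.onPause()
    }

    override fun onDestroy() {
        videoPlayer.stop()    // stop playback and release resources
        super.onDestroy()
    }
}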

When we analyze this code, we can see that even for this simple application our activity has to take a lot of care about calling the play, pause and stop methods of the VideoPlayer class. Now imagine if we add separate components for audio, buffering etc.; then our VideoActivity has to take care of all these components inside its life cycle callback methods, which leads to poorly organized, error-prone code.

 

Using arch.lifecycle 

With the introduction of life cycle aware components in the android.arch.lifecycle library, we can move all this code into the individual components. Our activities or fragments no longer need to manage this component logic and can focus on their own primary job, i.e. maintaining the UI. Thus, the code becomes clean, maintainable and testable.

The android.arch.lifecycle package provides classes and interfaces that prove helpful to solve such problems in an isolated way.

So let’s dive and see how we can implement the above example using life cycle aware components.

Life cycle aware components way

To keep things simple, we can add the life cycle components from the android.arch library as dependencies in our app-level Gradle file.

 

 

Once we have integrated the arch components, we can make our VideoPlayer class implement LifecycleObserver, which is an empty interface with annotations.
Using the specific annotations on the VideoPlayer class methods, it will be notified about the life cycle state changes of VideoActivity. So our VideoPlayer class will look like the sketch below.
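A minimal sketch of such a VideoPlayer (method bodies omitted; the annotated methods mirror the play/pause/stop calls described above):

import android.arch.lifecycle.Lifecycle
import android.arch.lifecycle.LifecycleObserver
import android.arch.lifecycle.OnLifecycleEvent

class VideoPlayer : LifecycleObserver {

    @OnLifecycleEvent(Lifecycle.Event.ON_RESUME)
    fun play() {
        // start or resume playback
    }

    @OnLifecycleEvent(Lifecycle.Event.ON_PAUSE)
    fun pause() {
        // pause playback
    }

    @OnLifecycleEvent(Lifecycle.Event.ON_DESTROY)
    fun stop() {
        // stop playback and release resources
    }
}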

We are not done yet. We need some binding between this VideoPlayer class and the VideoActivity so that our VideoPlayer object gets notified about the life cycle state changes in VideoActivity.

Well, this binding is quite easy: VideoActivity is an instance of android.support.v7.app.AppCompatActivity, which implements the LifecycleOwner interface. LifecycleOwner is a single-method interface containing getLifecycle(), which returns the Lifecycle object corresponding to the implementing class; this object keeps track of the life cycle state changes of the activity/fragment or any other component having a life cycle. The Lifecycle object is observable and notifies its observers about changes in state.

So we have our VideoPlayer, an instance of LifecycleObserver, and we need to add it as an observer to the Lifecycle object of VideoActivity. We will modify VideoActivity as sketched below.
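A sketch of that binding; the only lifecycle-related call the activity now needs is addObserver():

import android.os.Bundle
import android.support.v7.app.AppCompatActivity

class VideoActivity : AppCompatActivity() {

    private lateinit var videoPlayer: VideoPlayer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_video)   // hypothetical layout
        videoPlayer = VideoPlayer()
        // getLifecycle() comes from LifecycleOwner, which AppCompatActivity implements
        lifecycle.addObserver(videoPlayer)
    }
}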

Well it makes things quite resilient and isolated. Our VideoPlayer class logic is separated from VideoActivity. Our VideoActivity no longer needs to bother about calling its dependent components methods to pause or play in its life cycle callback methods which makes the code clean, manageable and testable.

Conclusion

The beauty of such separation of concerns can also be felt when we are developing a library intended to be used by third parties. It should not be a concern for the end users of our library, i.e. the developers using it, to call the life-cycle-dependent methods of our library. They might miss it, or may not be aware at all of which methods to call (because developers don't usually read the documentation completely), leading to memory leaks or, worse, app crashes.

Another use case can be when an activity depends on some network call handled by a  network manager class. We can make the network manager class life cycle aware so that it tries to supply the data to activity only when it is alive or better to not keep a reference to activity when it is destroyed. Thus, avoiding memory leaks.

We can develop a well managed app using the life cycle aware components provided by android.arch.lifecycle package. The resulting code will be loosely coupled and thus easy for modifications, testing and debugging which makes our life easy as developers.

Kotlin Kronicles for Android developers — part 1

Another blog on "why Kotlin"? Cliché? Not really. This is more like a "why not Kotlin?" kind of blog post. This blog is my attempt to convince Android app developers to migrate to Kotlin. It doesn't matter if you have little or no knowledge of Kotlin, or you are an iOS developer who worships Swift; read along, I am sure Kotlin will impress you (if not my writing).

I am going to show some of the amazing features of the Kotlin programming language that make development so much easier and more fun, and make the code so readable it is as if you are reading plain English. I read somewhere that "a programming language isn't for computers, computers understand only 1s and 0s, it is for humans", and I couldn't agree more. There is a learning curve, sure, but where isn't there? It pays off nicely. Kotlin lets us do more with fewer lines of code; Kotlin makes us productive.

Let's quickly walk over some of the obvious reasons for migrating to Kotlin:

  • Kotlin is one of the officially supported languages for android app development as announced in Google IO 2017.
  • Kotlin is 100% interoperable with Java. Which basically means Kotlin can use Java classes and methods and vice versa.
  • Kotlin has several modern programming language features like lambdas, higher order functions, null safety, extensions etc.
  • Kotlin is developed and maintained by JetBrains, which is the company behind several integrated development environments that developers use every day (or IDEs like IntelliJ IDEA, PyCharm, PhpStorm, GoLand etc).

This is available all over the internet. This is the content of “Why Kotlin” category of blogs.

Let’s talk about something a little more interesting.

Higher Order Functions:

Kotlin functions are first class citizens. Meaning functions can be stored in variables, passed as arguments or returned from other functions. A higher-order function is a function that takes a function as a parameter or returns a function.

This may sound strange at first: why in the world would I pass a function to another function (or return a function from another function)? It is very common in various programming languages, including JavaScript, Swift, Python (and Kotlin, apparently). An excellent example of a higher-order function is map: it takes a function as a parameter and returns a list of the results of applying that function to each item of the original list or array.
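A minimal sketch of map in action (the list and the stringStirrer() function here are only illustrative):

fun main() {
    val x = listOf(1, 2, 3, 4, 5)

    // stringStirrer is a plain function...
    fun stringStirrer(n: Int) = "No. $n"

    // ...and map is a higher-order function: it takes stringStirrer as a parameter
    // and returns the list of results of applying it to every element of x
    val result = x.map { stringStirrer(it) }

    println(result)   // [No. 1, No. 2, No. 3, No. 4, No. 5]
}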

Check out the map call in the sketch above: it applies the stringStirrer() function to each item of x, and the result of the map operation is the list printed at the end.

Data classes:

Java POJOs, or Plain Old Java Objects, or simply classes that store some data, require a lot of boilerplate code most of the time: getters, setters, equals, hashCode, toString etc. A Kotlin data class derives these properties and functions automatically from the properties defined in its primary constructor.
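A sketch of what that looks like (the Customer class here is hypothetical):

// One line replaces a Java POJO with its getters, setters, equals, hashCode and toString
data class Customer(val name: String, val balance: Double)

fun main() {
    val c1 = Customer("Asha", 1000.0)
    val c2 = c1.copy(balance = 1500.0)     // copy with one property changed
    val (name, balance) = c2               // de-structuring via componentN()
    println("$c1 / $name has $balance")    // toString() comes for free
}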

Just one line of code replaces several lines of a Java POJO. For custom behavior we can override functions in data classes. Besides this, Kotlin data classes also come bundled with copy() and componentN() functions, which allow copying an object and de-structuring it, respectively.

Dealing With Strings:

The Kotlin standard library makes dealing with strings so much easier. Here is a sample:
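A small sketch of the kind of calls the standard library offers on strings (the values are just examples):

fun main() {
    val input = "  Kotlin Kronicles  "

    println(input.trim())              // "Kotlin Kronicles"
    println(input.isBlank())           // false
    println("42".toInt() + 1)          // 43
    println("kronicles".capitalize())  // "Kronicles"
    println("a,b,c".split(","))        // [a, b, c]
}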

No helper classes, public static methods or StringUtils are required. We can invoke these functions as if they belong to the String class itself.

Dealing with Collections:

Same as String, the helper methods in “java.util.Collections” class are no longer required. We can directly call “sort, max, min, reverse, swap etc” on collections.

Consider a bank use case. A bank has many customers, a customer does several transactions every month. Think in terms of objects:

A bank has many customers; a customer has some properties (a name, a list of transactions etc.) and does several transactions; a transaction has properties like amount and type. In Java, modelling this takes a handful of POJO classes plus helper code for the queries.

Find the customer with minimum balance:
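A minimal Kotlin sketch of the data model and the query; the class shapes are assumptions based on the description above:

data class Transaction(val amount: Double, val type: String)
data class Customer(val name: String, val balance: Double, val transactions: List<Transaction>)

fun main() {
    val customers = listOf(
        Customer("Asha", 500.0, emptyList()),
        Customer("Ravi", 120.0, emptyList())
    )

    // The Kotlin way: no Collections helper class needed, reads like plain English
    val customerWithMinBalance = customers.minBy { it.balance }
    println(customerWithMinBalance)
}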

I don't know about you, but I think the Kotlin way is much cleaner, simpler and more readable, and we didn't import any helper class for it (the Java way needs the Collections class). I can read it as plain English, which is more than I can say for the Java counterpart. The motive here is not to compare Java with Kotlin, but to appreciate the Kotlin Kronicles.

There are several functions like map, filter, reduce, flatMap, fold, partition etc. Here is how we can simplify our tasks by combining these standard functions (for each problem statement below, imagine doing it in Java):

We can solve mundane problems like these with far fewer lines of code (see the sketch after this list), and readability-wise I just love it. Here is what each operation does:

  1. flatMap: returns a single list of all elements yielded by the transform function invoked on each element of the original collection (in our case the transform function returns the list of transactions of each individual customer).
  2. filter and sumBy: here we combine filter and sum operations into a one-liner that finds the total amount deposited into and withdrawn from the bank across all customers.
  3. fold: accumulates a value, starting with an initial value (0.0 in our case) and applying an operation (the when expression) from left to right to the current accumulator value and each element. Here we use fold and when to find the net amount deposited in the bank considering all deposits and withdrawals.
  4. partition: splits the original collection into a pair of lists, where the first list contains the elements for which the predicate (the separation function) yielded true and the second contains those for which it yielded false. Of course, we could filter twice, but this is so much easier.
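A sketch of these operations, reusing the same assumed Customer and Transaction shapes (sumByDouble is used here since the amounts are doubles):

data class Transaction(val amount: Double, val type: String)      // type: "DEPOSIT" or "WITHDRAW" (assumed)
data class Customer(val name: String, val balance: Double, val transactions: List<Transaction>)

fun main() {
    val customers = listOf(
        Customer("Asha", 60.0, listOf(Transaction(100.0, "DEPOSIT"), Transaction(40.0, "WITHDRAW"))),
        Customer("Ravi", 250.0, listOf(Transaction(250.0, "DEPOSIT")))
    )

    // 1. flatMap: a single flat list of every transaction made by every customer
    val allTransactions = customers.flatMap { it.transactions }

    // 2. filter + sumByDouble: total amount deposited across all customers
    val totalDeposited = allTransactions.filter { it.type == "DEPOSIT" }.sumByDouble { it.amount }

    // 3. fold + when: net amount held by the bank after all deposits and withdrawals
    val netAmount = allTransactions.fold(0.0) { acc, txn ->
        when (txn.type) {
            "DEPOSIT" -> acc + txn.amount
            else -> acc - txn.amount
        }
    }

    // 4. partition: split the transactions into deposits and withdrawals in one pass
    val (deposits, withdrawals) = allTransactions.partition { it.type == "DEPOSIT" }

    println("deposited=$totalDeposited net=$netAmount deposits=${deposits.size} withdrawals=${withdrawals.size}")
}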

So many complex operations simplified by the Kotlin standard library.

Extensions:

One of my favourite features. Kotlin extensions let us add functionality to a class without modifying the original class, just like in Swift and C#. This can be very handy if used properly. Check this out:
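A sketch of such an extension (the conversion rate is a made-up example):

// Add a function to Kotlin's Double type without touching its source
fun Double.toINR() = this * 70   // hypothetical USD-to-INR rate

fun main() {
    println(100.0.toINR())   // 7000.0
}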

In the above code, we just added a new function called "toINR()" to Kotlin's Double type. So we basically added a new function to a Kotlin primitive type; how about that 😎. And it is a one-liner function: no curly braces, return type or return statement whatsoever. Noticed that conciseness, did you?

Since Kotlin supports higher-order functions, we can combine them with extension functions to solidify our code. One very common problem in Android development involving SQLite is that developers often forget to end the transaction, and then we waste hours debugging it. Here is how we can avoid it:
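A sketch of the idea; the try/finally and setTransactionSuccessful() call go slightly beyond the description above but keep the example realistic:

import android.database.sqlite.SQLiteDatabase

// Extension on SQLiteDatabase: the caller passes the work as a function,
// and begin/end are handled here so they can never be forgotten
fun SQLiteDatabase.performDBTransaction(operation: () -> Unit) {
    beginTransaction()
    try {
        operation()
        setTransactionSuccessful()   // mark the work to be committed
    } finally {
        endTransaction()
    }
}

// Usage: db.performDBTransaction { /* inserts / updates here */ }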

We added an extension function called "performDBTransaction" on SQLiteDatabase. This function takes a parameter that is itself a function with no input and no output; this parameter function is whatever we want executed between the begin and end of the transaction. performDBTransaction calls beginTransaction(), then the passed operation, and then endTransaction(). We can use it wherever required without having to double-check whether we called endTransaction() or not.

I always forget to call commit() or apply() when storing data in SharedPreferences. A similar approach helps:
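A sketch of such a persist() extension on SharedPreferences.Editor (the preference key in the usage comment is hypothetical):

import android.content.SharedPreferences

// Extension on SharedPreferences.Editor: run the given edits, then apply() on the caller's behalf
fun SharedPreferences.Editor.persist(edits: SharedPreferences.Editor.() -> Unit) {
    edits()
    apply()
}

// Usage (prefs is a SharedPreferences instance):
// prefs.edit().persist { putString("user_name", "Asha") }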

The extension function persist() takes care of it. We are calling persist() as if it were part of SharedPreferences.Editor.

Smart Casts:

Going back to our bank example, let's say a transaction can be of three types, NEFT and IMPS among them:

An NEFT transaction has fixed charges, while IMPS has some bank-related charges. Now, we deal with a transaction object through the superclass "Transaction", and we need to identify the type of transaction so that it can be processed accordingly. Here is how this can be handled in Kotlin:
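A sketch of the hierarchy and the smart cast; the charge amounts and method names are assumptions:

open class Transaction(val amount: Double)

class NEFT(amount: Double) : Transaction(amount) {
    fun fixedCharges() = 5.0             // assumed flat charge
}

class IMPS(amount: Double) : Transaction(amount) {
    fun bankCharges() = amount * 0.01    // assumed bank-related charge
}

fun charges(transaction: Transaction): Double = when (transaction) {
    // inside each branch, `transaction` is smart-cast to the checked subtype,
    // so NEFT/IMPS members can be called without an explicit cast
    is NEFT -> transaction.fixedCharges()
    is IMPS -> transaction.bankCharges()
    else -> 0.0
}

fun main() {
    println(charges(NEFT(1000.0)))
    println(charges(IMPS(1000.0)))
}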

In the when branches above, we didn't cast the Transaction object into NEFT or IMPS, yet we are able to invoke the functions of these classes. This is a smart cast: Kotlin automatically casts the transaction object to its respective type.

Epilogue:

As developers, we need to focus on stuff that matters, the core part, and boilerplate code isn’t one of them. Kotlin helps in reducing boilerplate code and makes development fun. Kotlin has many amazing features which ease development and testing. Do not let the fear of the unknown dictate your choice of the programming language; migrate your apps to kotlin now. The initial resistance is the only resistance.

I sincerely hope you enjoyed the first article of this Kotlin Kronicles series. We have just tapped the surface. Stay tuned for part 2. Let me know if you want me to cover anything specific.

Got any suggestions? shoot below in comments.

What is your favourite feature of Kotlin?

Keep Developing…

AWS Batch Jobs

What is batch computing?

Batch computing means running jobs asynchronously and automatically, across one or more computers.

What is AWS Batch Job?

AWS Batch enables developers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

Why use AWS Batch Job ?

  • Fully managed infrastructure – No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
  • Integrated with AWS – Natively integrated with the AWS platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
  • Cost-optimized Resource Provisioning – AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.

AWS Batch Concepts

  • Jobs
  • Job Definitions
  • Job Queue
  • Compute Environments

Jobs

Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. Containerized jobs can reference a container image, command, and parameters or users can simply provide a .zip containing their application and AWS will run it on a default Amazon Linux container.

$ aws batch submit-job --job-name poller --job-definition poller-def --job-queue poller-queue

Job Dependencies

Jobs can express a dependency on the successful completion of other jobs or specific elements of an array job.

Use your preferred workflow engine and language to submit jobs. Flow-based systems simply submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies.

Jobs run in approximately the same order in which they are submitted as long as all dependencies on other jobs have been met.

$ aws batch submit-job --depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f …

Job Definitions

Similar to ECS Task Definitions, AWS Batch Job Definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden.

Some of the attributes specified in a job definition are:

  • IAM role associated with the job
  • vCPU and memory requirements
  • Mount points
  • Container properties
  • Environment variables
$ aws batch register-job-definition --job-definition-name gatk --container-properties …

Job Queues

Jobs are submitted to a Job Queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours.

$ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order …

 

Compute Environments

Job queues are mapped to one or more Compute Environments containing the EC2 instances that are used to run containerized batch jobs.

Managed (recommended) compute environments enable you to describe your business requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as x% of On-Demand) and AWS launches and scales resources on your behalf.

We can choose specific instance types (e.g. c4.8xlarge), instance families (e.g. C4, M4, R3), or simply choose "optimal" and AWS Batch will launch appropriately sized instances from more modern instance families.

Alternatively, we can launch and manage our own resources within an Unmanaged compute environment. Your instances need to include the ECS agent and run supported versions of Linux and Docker.

$ aws batch create-compute-environment --compute-environment-name unmanagedce --type UNMANAGED …

AWS Batch will then create an Amazon ECS cluster which can accept the instances we launch. Jobs can be scheduled to your Compute Environment as soon as the instances are healthy and register with the ECS Agent.

Job States

Jobs submitted to a queue can have the following states:

  • SUBMITTED: Accepted into the queue, but not yet evaluated for execution
  • PENDING: The job has dependencies on other jobs which have not yet completed
  • RUNNABLE: The job has been evaluated by the scheduler and is ready to run
  • STARTING: The job is in the process of being scheduled to a compute resource
  • RUNNING: The job is currently running
  • SUCCEEDED: The job has finished with exit code 0
  • FAILED: The job finished with a non-zero exit code or was cancelled or terminated.

AWS Batch Actions

  • Jobs: SubmitJob, ListJobs, DescribeJobs, CancelJob, TerminateJob
  • Job Definitions: RegisterJobDefinition, DescribeJobDefinitions, DeregisterJobDefinition
  • Job Queues: CreateJobQueue, DescribeJobQueues, UpdateJobQueue, DeleteJobQueue
  • Compute Environments: CreateComputeEnvironment, DescribeComputeEnvironments, UpdateComputeEnvironment, DeleteComputeEnvironment

AWS Batch Pricing

There is no charge for AWS Batch. We only pay for the underlying resources we have consumed.

Use Case

Poller and Processor Service

Purpose

The poller service needs to run every hour, like a cron job, submitting one or more requests to a processor service, which has to launch the required number of EC2 resources, process files in parallel and terminate them when done.

Solution

We plan to go with a serverless architecture approach instead of using a traditional Beanstalk/EC2 instance, as we don't want to maintain and keep an EC2 server instance running 24/7.

This approach will reduce our AWS billing cost as the EC2 instance launches when the job is submitted to Batch Job and terminates when the job execution is completed.

Poller Service Architecture Diagram

Processor Service Architecture Diagram

First time release

For Poller and Processor Service:

  • Create Compute environment
  • Create Job queue
  • Create Job definition

To automate the above resource creation process, we use batchbeagle (for installation and configuration, please refer to the batch-deployment repository).

Command to Create/Update Batch Job Resources of a Stack (Creates all Job Descriptions, Job Queues and Compute Environments)

beagle -f stack/stackname/servicename.yml assemble

To start Poller service:

  • Enable a Scheduler using AWS CloudWatch rule to trigger poller service batch job.

Incremental release

We must create a new revision of existing Job definition environment which will point to the new release version tagged ECR image to be deployed.

Command to deploy new release version of Docker image to Batch Job (Creates a new revision of an existing Job Definition)

 

beagle -f stack/stackname/servicename.yml job update job-definition-name

Monitoring

Cloudwatch Events

We will use AWS Batch event stream for CloudWatch Events to receive near real-time notifications regarding the current state of jobs that have been submitted to your job queues.

AWS Batch sends job status change events to CloudWatch Events. AWS Batch tracks the state of your jobs and triggers an event if a previously submitted job's status changes, for example if a job in the RUNNING status moves to the FAILED status.

We will configure an Amazon SNS topic to serve as the event target, which sends a notification to a Lambda function. The function then filters the relevant content from the SNS message (JSON), formats it, and sends it to the respective environment's Slack channel.

CloudWatch Event Rule → SNS Topic → Lambda Function → Slack Channel

Batch Job Status Notification in Slack

Slack notification provides the following details:

  • Job name
  • Job Status
  • Job ID
  • Job Queue Name
  • Log Stream Name

Explanation of Git design principles through Git internals

In this blog we'll explore the internals of how Git works. Having some behind-the-scenes working knowledge of Git will help you understand why Git is so much faster than traditional version control systems, and it will also help you recover data in unexpected crash/delete scenarios.

For a developer, it is quite useful to understand the design principles of Git and see how it manages both speed of access (traversing to previous commits) and small disk space for repository.

In this blog, we will cover the following topics:

  • Initializing a new repository
  • Working directory and local repository
  • Git objects
  • Blob
  • Tree
  • Commit
  • Tag
  • Packs

I’m using Ubuntu 16.04 LTS, Zsh terminal and Git v2.7.4 for this blog, but you can use any operating system and terminal to follow along.

Initializing a new repository

  1. Initialize a new Git repository with ‘git init’ command.

 

  2. Create a couple of files by running the following commands.

$ echo "First file" >> first.txt

$ echo "Second file" >> second.txt


  3. Run 'ls -la' to display all the contents of the folder. It should show the .git directory and the two files we created.


Working directory and local repository

The working directory comprises the files and folders we want to manage with the Git version control system (VCS). In this case the ~/testRepo folder constitutes our working directory, except for the '.git' folder. The .git folder forms the local repository and contains everything that Git stores to manage our data.

 

Git objects

Git doesn't store diffs of a file's contents. It stores snapshots of each file; that is, each version of the file is stored exactly as it is at the point it is staged and committed. This is done for faster access, one of the core principles behind the design of Git.

The latest version of the file is always stored as is in Git as it is most likely the one to be used. Also,  storing it makes it much faster for retrieve operations.

As further versions of the file are added, Git automatically creates pack files. We’ll discuss more on pack files later.

There are 4 types of git objects.

  • Blob
  • Tree
  • Commit
  • Tag

Let’s understand these with an example. First let’s check the contents of .git folder.


There are 5 directories and 3 files. Let’s start with objects.


The find command returned no results, i.e. there are currently no files in the objects folder. Let's stage our two files and then check again.


We can see that there are two files. To explore these objects, we need to use the 'git cat-file' command.

From the man page of git cat-file: "Provides content or type and size information for repository objects".


The -p argument pretty-prints the object's contents and -t returns its type.

The object ID (SHA-1 hash) is the subpath of the file under the .git/objects folder, without the slash.

For example, object Id of .git/objects/20/d5b672a347112783818b3fc8cc7cd66ade3008 is 20d5b672a347112783818b3fc8cc7cd66ade3008.

The type that was returned for both the objects is blob.

So blobs are objects used for storing content of the file. Blobs just store the content, no other information like file name.

Let’s commit our code now with ‘git commit’ command.


Next, run ‘git log’ to retrieve the commit ID.


Copy the commit ID and then run cat-file commands on it.


The object type that is returned is ‘commit’. It contains reference to a tree object, author name, committer name and the commit message.

Now let’s check the tree object.


The tree object contains references to the blob objects we saw earlier, along with the corresponding file names. We can summarize the state of our Git repository at this point: the commit object points to a tree, which in turn points to the two blobs.


 

Let’s add a folder to our repository.

Run the following commands. We'll use 'git add' to add the folder from the working directory to the local repository.

$ mkdir fol1

$ echo "Third File" >> fol1/third.txt

$ git add fol1

$ git commit -m "Second commit"

Inspect the second commit object.


Notice that the tree reference has changed, and there is now a parent property: the commit object also stores the parent commit's ID. Let's inspect the tree.


The folder we just added, fol1, is stored as a tree and it contains reference to a blob referencing third.txt file. So tree objects reference blobs or other sub tree objects.


Now let’s discuss tags. Tags are used to provide an alias to a commit SHA ID for future reference/use.  There are two types of tags: lightweight tags and annotated tags.

  • Lightweight tags contain only the SHA-ID information.


The command ‘git tag light’ creates a file under .git/refs/tags/light which contains the commit Id on which the tag was created. No separate tag object is created. This is mostly used for development purposes, to easily remember and traverse back to a commit.

  • Annotated tags are usually used for releases. They contain extra information like a message and the tagger's name along with the commit ID. Annotated tags can be created with the 'git tag -a <tagname> -m "<message>"' command.


A separate tag object is created for an annotated tag. You can list the tags created with the 'git tag' command.

Packs

  1. Although Git stores the contents of the latest versions of objects intact, older versions are stored as deltas in pack files. Let's understand this with an example. We need a slightly larger file to see the difference in size between the delta file and the original file, so we'll download the GNU license web page.
  2. Run the following command to download the license HTML file.

$ curl -L -O -C - https://www.gnu.org/licenses/gpl-3.0.en.html

You should now have the gpl-3.0.en.html file in your working directory.

  3. Add and commit the file.

$ git add gpl-3.0.en.html

$ git commit -m “Added gpl file”

Inspect the commit and get the blob info of the added file.

git cat-file -p 53550a1c9325753eb44b1428a280bfb2cd5b90ef


The last command returns the size of the blob. That is the blob containing content of gpl-3.0.en.html is 49641 bytes.

4. Edit the file and commit it again.


A new blob is created with a slightly larger size. Now let’s check the original blob.


The original blob still exists. Thus, for each change, a snapshot of the file is stored with its contents intact.

5. Let’s pack the files, with ‘git gc’ command.

As you can see, all of our existing blob and commit objects are gone and are now replaced with pack files.


6. We can run ‘git verify-pack’ command to inspect the pack file.


Now let's check the entries for the two gpl blobs.

The blob with ID '6be03f…' is the second version of gpl-3.0.en.html and the one with ID '0f6718…' is the first version.

The third column in the output represents the object size. As you can see, the first blob is now reduced to 9 bytes and references the second blob, which keeps its full size of 49652 bytes. Thus the first blob is stored as a delta of the second, even though the first one is older; this is because the newer version is the one most likely to be used. Git automatically packs objects when pushing to a remote repository.

Conclusion

In this blog we explored how Git stores files internally and looked at the various types of Git objects, i.e. blob, tree, commit and tag, and how they are linked to each other. We also looked at packs, which show how Git compresses older versions of a file and saves storage space.

Performance Testing for Serverless Architecture (AWS)

Purpose

This document provides a brief introduction to how JMeter can be used with a serverless architecture such as AWS, to evaluate read/write capacity and benchmark the load an application can withstand.

What are AWS Lambdas? 

The code we write is deployed to the cloud as one or more Lambda functions on AWS Lambda, a compute service that runs the code on our behalf.