AWS Batch Jobs

What is batch computing?

Batch computing means running jobs asynchronously and automatically, across one or more computers.

What is AWS Batch Job?

AWS Batch enables developers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

Why use AWS Batch Job ?

  • Fully managed infrastructure – No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
  • Integrated with AWS – Natively integrated with the AWS Platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Recognition.
  • Cost-optimized Resource Provisioning – AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.

AWS Batch Concepts

  • Jobs
  • Job Definitions
  • Job Queue
  • Compute Environments

Jobs

Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. Containerized jobs can reference a container image, command, and parameters or users can simply provide a .zip containing their application and AWS will run it on a default Amazon Linux container.

$ aws batch submit-job –job-name poller –job-definition poller-def –job-queue poller-queue

Job Dependencies

Jobs can express a dependency on the successful completion of other jobs or specific elements of an array job.

Use your preferred workflow engine and language to submit jobs. Flow-based systems simply submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies.

Jobs run in approximately the same order in which they are submitted as long as all dependencies on other jobs have been met.

$ aws batch submit-job –depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f …

Job Definitions

Similar to ECS Task Definitions, AWS Batch Job Definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden.

Some of the attributes specified in a job definition are:

  • IAM role associated with the job
  • vCPU and memory requirements
  • Mount points
  • Container properties
  • Environment variables
$ aws batch register-job-definition –job-definition-name gatk –container-properties …

Job Queues

Jobs are submitted to a Job Queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours.

$ aws batch create-job-queue –job-queue-name genomics –priority 500 –compute-environment-order …

 

Compute Environments

Job queues are mapped to one or more Compute Environments containing the EC2 instances that are used to run containerized batch jobs.

Managed (Recommended) compute environments enable you to describe your business requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as x % of On-Demand) and AWS launches and scale resources on your behalf.

We can choose specific instance types (e.g. c4.8xlarge), instance families (e.g. C4, M4, R3), or simply choose “optimal” and AWS Batch will launch appropriately sized instances from AWS more-modern instance families.

Alternatively, we can launch and manage our own resources within an Unmanaged compute environment. Your instances need to include the ECS agent and run supported versions of Linux and Docker.

$ aws batch create-compute-environment –compute- environment-name unmanagedce –type UNMANAGED …

AWS Batch will then create an Amazon ECS cluster which can accept the instances we launch. Jobs can be scheduled to your Compute Environment as soon as the instances are healthy and register with the ECS Agent.

Job States

Jobs submitted to a queue can have the following states:

  • SUBMITTED: Accepted into the queue, but not yet evaluated for execution
  • PENDING: The job has dependencies on other jobs which have not yet completed
  • RUNNABLE: The job has been evaluated by the scheduler and is ready to run
  • STARTING: The job is in the process of being scheduled to a compute resource
  • RUNNING: The job is currently running
  • SUCCEEDED: The job has finished with exit code 0
  • FAILED: The job finished with a non-zero exit code or was cancelled or terminated.

AWS Batch Actions

  • Jobs: SubmitJob, ListJobs, DescribeJobs, CancelJob, TerminateJob
  • Job Definitions: RegisterJobDefinition, DescribeJobDefinitions, DeregisterJobDefinition
  • Job Queues: CreateJobQueue, DescribeJobQueues, UpdateJobQueue, DeleteJobQueue
  • Compute Environments: CreateComputeEnvironment, DescribeComputeEnvironments, UpdateComputeEnvironment, DeleteComputeEnvironment

AWS Batch Pricing

There is no charge for AWS Batch. We only pay for the underlying resources we have consumed.

Use Case

Poller and Processor Service

Purpose

Poller service needs to be run every hour like a cron job which submits one or more requests to a processor service which has to launch the required number of EC2 resource, process files in parallel and terminate them when done.

Solution

We plan to go with Serverless Architecture approach instead of using the traditional beanstalk/EC2 instance, as we don’t want to maintain and keep running EC2 server instance 24/7.

This approach will reduce our AWS billing cost as the EC2 instance launches when the job is submitted to Batch Job and terminates when the job execution is completed.

Poller Service Architecture Diagram

Processor Service Architecture Diagram

First time release

For Poller and Processor Service:

  • Create Compute environment
  • Create Job queue
  • Create Job definition

To automate above resource creation process, we use batchbeagle (for Installaion and configuration, please refer batch-deploymnent repository)

Command to Create/Update Batch Job Resources of a Stack (Creates all Job Descriptions, Job Queues and Compute Environments)

beagle -f stack/stackname/servicename.yml assemble

To start Poller service:

  • Enable a Scheduler using AWS CloudWatch rule to trigger poller service batch job.

Incremental release

We must create a new revision of existing Job definition environment which will point to the new release version tagged ECR image to be deployed.

Command to deploy new release version of Docker image to Batch Job (Creates a new revision of an existing Job Definition)

 

beagle -f stack/stackname/servicename.yml job update job-definition-name

Monitoring

Cloudwatch Events

We will use AWS Batch event stream for CloudWatch Events to receive near real-time notifications regarding the current state of jobs that have been submitted to your job queues.

AWS Batch sends job status change events to CloudWatch Events. AWS Batch tracks the state of your jobs. If a previously submitted job’s status changes, an event is triggered. For example, if a job in the RUNNING status moves to the FAILED status.

We will configure an Amazon SNS topic to serve as an event target which sends notification to lambda function which will then filter out relevant content from the SNS message (json) content and beautify it and send to the respective Environment slack channel .

CloudWatch Event Rule → SNS Topic → Lambda Function → Slack Channel

Batch Job Status Notification in Slack

Slack notification provides the following details:

  • Job name
  • Job Status
  • Job ID
  • Job Queue Name
  • Log Stream Name

Comparing productivity of node.js frameworks

Our mission is to compare the node.js frameworks on productivity.

In one of my previous blogs I have benchmark the various node.js frameworks performance against native http call and native mongodb driver and native combination was clear winner in term of performance.

https://blog.talentica.com/2017/11/14/comparing-performance-of-node-js-frameworks/

so, Why not use only native http and native mongodb driver. well one of the the key aspect and usp of node.js frameworks is that they provide lot of abstractions and as a developer you don’t have to write boiler plate , repetitive code . so lets see what our research has come up with against this concept. Continue reading Comparing productivity of node.js frameworks

Top Mistakes while taking charge of an Existing Team

If you’ve constantly worked in bootstrap or greenfield projects, you develop an intuition for partnering with the customer and assembling a sharp team that will build out the product. Bootstrap projects come with their own challenges but everyone is new on the team and goes through the team journey together. However, entering an existing long-standing team as the Development Manager is a different ball game altogether. Like a movie director, every manager has his or her way of leading a team. That is great, but there are a few things that put you on the wrong side of the road. Here are the nominees, for top performances on the wrong side, in no particular order. Continue reading Top Mistakes while taking charge of an Existing Team

Creating daily status updates email from JIRA

Context

My client has always preferred reading an email at the start of his day that summarizes all the work his dev team was up to the night before. We call it the status mail. It usually contains the list of tasks we have worked on, containing its full description along with our comments/status on that day. Such reports are favorites among clients who are non-technical or who cannot afford to spend time in JIRA, as they get all updates in one go. But, of course, this is beneficial only for a team size of 5-20 members. Any more than that, the mail itself becomes too long to read. Continue reading Creating daily status updates email from JIRA