
How To Pick The Right Data Analytics Strategy For Serverless Systems?

By Priya Kar August 25, 2021


A 2018 survey by The New Stack suggests that around 46% of IT decision-makers are either using serverless or evaluating it. Organizations of all sizes, from cloud-native startups to large enterprises, are exploring opportunities in this field. Dig a little and you will find that companies are making this strategic move to avoid tech hassles, reduce cost, and focus more on bringing their ideas to market.

Most of these companies are banking on AWS serverless applications. As a user, you may have opted for Lambda functions (the most popular choice) to host the business logic behind APIs, and AWS Aurora Serverless to store and manage data for the web application. You can use this stored data for both reporting and analytics. You can also apply BI to develop new business strategies based on the insights or patterns observed in the data.

Serverless DB systems

Analyzing A Use Case

Let’s consider a coaching platform that aims to improve participants’ current knowledge or skill level. Such programs include various sessions on skills and sub-skills with learners and coaches. After a program is completed, you can run a survey to find out how much participants have improved. With analytics, you can assess their strengths and weaknesses too. Because feedback is not one-dimensional, you can go further and collect feedback on coaches and learning content. Then, with the collected data, you can plan your strategy.

If you use analytics properly, getting real-time insights into engagement is not difficult. If the engagement or the feedback is negative, you can immediately launch corrective measures. The entire process has the potential to improve the program’s efficacy.

Things to do before introducing Analytics

Now, if deploying analytics on a serverless DB is your priority, you need to consider a few factors for AWS Aurora Serverless:

    1. Analytics introduces a higher volume of reads and might involve a lot of computation.
    2. It relies on aggregation operations. Frequent or heavy use of analytics can keep your DB busy with heavy reads, causing a bottleneck for the regular application workload.
    3. The serverless DB might scale up based on the percentage of utilization or the maximum number of connections reached. But the scale-out operation can take up to 2.5 minutes, during which users may notice that the application is slow.

Things to do before introducing serverless DB

Points to be considered for serverless DBs:

    1. One of the major blockers of AWS Aurora Serverless is that you cannot create read replicas.
    2. AWS Aurora Serverless does not guarantee durability.
    3. The DB instance for an Aurora Serverless DB cluster is created in a single Availability Zone. Automatic Multi-AZ failover takes longer in the case of a serverless DB.
    4. There are constraints on DB connection pooling if the Data API is not used. AWS Aurora Serverless does not support RDS Proxy for DB connection pooling.
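
As a rough illustration of point 4, the Data API lets a Lambda function run SQL over HTTPS instead of holding a DB connection, so no pool is needed. The sketch below assumes the boto3 `rds-data` client and its `execute_statement` call; the ARNs, the database name, the `session_feedback` table, and both helper functions are hypothetical names chosen for this example.

```python
from typing import Any, Dict, List


def unwrap_field(field: Dict[str, Any]) -> Any:
    """Convert one Data API field, e.g. {"longValue": 4}, to a plain Python value."""
    if field.get("isNull"):
        return None
    # Each field carries one typed key (stringValue, longValue, doubleValue, ...).
    return next(iter(field.values()))


def average_rating_per_coach(client: Any, cluster_arn: str, secret_arn: str,
                             database: str, program_id: int) -> List[tuple]:
    """Run an aggregate query through the Data API without opening a DB connection."""
    response = client.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=database,
        # Hypothetical table and columns, for illustration only.
        sql=("SELECT coach_id, AVG(rating) FROM session_feedback "
             "WHERE program_id = :program_id GROUP BY coach_id"),
        parameters=[{"name": "program_id", "value": {"longValue": program_id}}],
    )
    return [tuple(unwrap_field(f) for f in row) for row in response.get("records", [])]
```

In a Lambda function, the client would typically be created once with `boto3.client("rds-data")` and passed in. Taking the client as a parameter also makes the function easy to exercise with a stub.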

Building an efficient data pipeline

Although AWS Aurora Serverless handles scalability, high availability, and DB maintenance at the AWS end, you must be aware of these constraints before building a resilient system.

If analytics is required and you are on AWS Aurora Serverless, you can use Elasticsearch, Redis, DynamoDB, or Redshift as secondary storage that serves the analytics data. Build data pipelines that incrementally update this secondary storage with the raw data or computed information.

Another option for data pipelines is to use messaging queues. Consumers listen to events on these queues, run the required computations, and update the secondary storage accordingly.

You can also improve speed with a design pattern such as a denormalized DB, domain aggregation, or a star schema. It can provide near real-time aggregated data for analytics. In this scenario, data can be aggregated and stored in a normalized table periodically, or processed after listening to events. You can use this information directly for analytics.
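
One possible shape for such an incremental pipeline is a watermark: remember the last row that was copied and read only newer rows on each run. The sketch below is a simplified illustration, not a production pipeline. An in-memory SQLite database stands in for the Aurora source, a plain dictionary stands in for the secondary storage, and the `feedback` table and `sync_incrementally` function are hypothetical.

```python
import sqlite3
from typing import Dict, Tuple


def sync_incrementally(source: sqlite3.Connection,
                       secondary: Dict[int, Tuple[int, int]],
                       last_synced_id: int) -> int:
    """Copy rows newer than the watermark into secondary storage.

    Returns the new watermark so the next run reads only fresh rows.
    """
    rows = source.execute(
        "SELECT id, coach_id, rating FROM feedback WHERE id > ? ORDER BY id",
        (last_synced_id,),
    ).fetchall()
    for row_id, coach_id, rating in rows:
        secondary[row_id] = (coach_id, rating)  # stand-in for an index or put call
        last_synced_id = row_id
    return last_synced_id


# In-memory SQLite stands in for the Aurora Serverless source in this sketch.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE feedback (id INTEGER PRIMARY KEY, coach_id INTEGER, rating INTEGER)"
)
source.executemany("INSERT INTO feedback (coach_id, rating) VALUES (?, ?)",
                   [(1, 5), (2, 3)])

secondary_store: Dict[int, Tuple[int, int]] = {}
watermark = sync_incrementally(source, secondary_store, last_synced_id=0)

source.execute("INSERT INTO feedback (coach_id, rating) VALUES (1, 4)")
watermark = sync_incrementally(source, secondary_store, watermark)  # reads only the new row
```

An id-based watermark only picks up inserts. If rows can be updated in place, a last-modified timestamp column would be a more suitable watermark.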
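
The queue-based variant can be sketched as a consumer that drains pending events and folds each one into running totals. This is only an illustration: the standard-library `queue.Queue` stands in for a managed message queue, the `totals` dictionary stands in for the secondary storage, and the event shape and `drain_events` function are assumptions.

```python
import queue
from collections import defaultdict
from typing import Dict


def drain_events(events: "queue.Queue[dict]", totals: Dict[int, dict]) -> int:
    """Apply every pending feedback event to the running per-coach totals."""
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        stats = totals[event["coach_id"]]
        stats["count"] += 1
        stats["rating_sum"] += event["rating"]
        processed += 1


# queue.Queue stands in for a managed message queue in this sketch.
events: "queue.Queue[dict]" = queue.Queue()
totals: Dict[int, dict] = defaultdict(lambda: {"count": 0, "rating_sum": 0})

events.put({"coach_id": 1, "rating": 5})
events.put({"coach_id": 1, "rating": 3})
events.put({"coach_id": 2, "rating": 4})

processed = drain_events(events, totals)
average_for_coach_1 = totals[1]["rating_sum"] / totals[1]["count"]
```

Storing a count and a sum rather than an average keeps each update cheap, and the average can be derived at read time.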
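
A periodic aggregation job of this kind might look like the sketch below, which rebuilds a pre-aggregated summary table so that analytics queries read one table instead of joining on every request. SQLite is used only to keep the example self-contained, and the `coach`, `feedback`, and `coach_summary` tables are hypothetical.

```python
import sqlite3


def refresh_coach_summary(db: sqlite3.Connection) -> None:
    """Rebuild the pre-aggregated summary table from the source tables."""
    with db:  # run the delete and the insert in one transaction
        db.execute("DELETE FROM coach_summary")
        db.execute(
            """
            INSERT INTO coach_summary (coach_id, coach_name, feedback_count, avg_rating)
            SELECT c.id, c.name, COUNT(f.id), AVG(f.rating)
            FROM coach AS c
            JOIN feedback AS f ON f.coach_id = c.id
            GROUP BY c.id, c.name
            """
        )


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE coach (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feedback (id INTEGER PRIMARY KEY, coach_id INTEGER, rating INTEGER);
    CREATE TABLE coach_summary (coach_id INTEGER PRIMARY KEY, coach_name TEXT,
                                feedback_count INTEGER, avg_rating REAL);
    INSERT INTO coach (id, name) VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO feedback (coach_id, rating) VALUES (1, 5), (1, 4), (2, 3);
    """
)

refresh_coach_summary(db)
# Analytics reads the summary table directly, with no join or GROUP BY at query time.
summary = db.execute(
    "SELECT coach_name, feedback_count, avg_rating FROM coach_summary ORDER BY coach_id"
).fetchall()
```

A scheduled job could call `refresh_coach_summary` at whatever interval the analytics can tolerate. The event-driven alternative would update the same table as each event arrives.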

Strategy Comparison

Analytics can introduce heavy reads or higher levels of computation, so you have to strategize wisely. The comparison between the various strategies is given below.

| Key Consideration | Strategy 1 | Strategy 2 | Strategy 3 | Strategy 4 |
| --- | --- | --- | --- | --- |
| Name | Aurora Serverless with denormalized tables, using a higher-configuration machine | Provisioned DB with read replicas and denormalized tables | Elasticsearch with Aurora Serverless MySQL (needs a data pipeline or queues to keep the data in sync) | Aurora Serverless v2 with denormalized tables |
| Speed | Fast | Faster | Fastest | Fast (need to benchmark) |
| Cost | Pay as you use; $0.06 per ACU-hour; 0.06 × 24 × 31 = $44.64 per month at most | db.t3.medium at $0.065/hour (0.065 × 24 × 31 = $48.36 per month); storage $0.10 per GB-month; I/O $0.20 per 1 million requests | t3.medium.elasticsearch at $0.073/hour (0.073 × 24 × 31 = $54.31 per month) | Pay as you use; $0.12 per ACU-hour; 0.12 × 24 × 31 = $89.28 per month at most |
| Durability | No | Yes, as read replicas can be created | Yes, if replication is done | Yes |
| Scaling | Autoscaling | Needs to be configured | Needs to be configured | Autoscaling |
| High Availability | Provided by AWS | Needs to be managed | Needs to be managed | Provided by AWS |
| Maintenance | Provided by AWS | Needs to be managed | Needs to be managed | Provided by AWS |
| Analytics Usage Pattern | Low | High | High | Medium to High |
| Latency | Low (need to benchmark) | Low (need to benchmark) | Lowest | Low (need to benchmark) |
| API Integration | Yes, Data API can be used | No | Yes | No |
| Pros | Scaling and maintenance are taken care of by AWS | No connection pooling issue; performant | Fast; read replication available | Granular scaling available; scaling up and down is faster than Aurora Serverless v1 |
| Cons | Slowness might be observed during scale-out; connection pooling issue; scaling doubles the instance size | Higher cost; scaling and handling need to be managed | Higher cost | New to market; expensive; does not support AWS RDS Proxy to solve the connection pooling issue |
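
The monthly cost figures in the comparison can be reproduced from the hourly rates. The sketch below assumes a 31-day month with one capacity unit or instance running continuously, and for Strategy 4 it uses the 0.12 rate that appears in that strategy's calculation. The `monthly_cost` helper is a hypothetical name, and the rates are the ones quoted in the comparison rather than current AWS pricing.

```python
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate_usd: float, units: float = 1.0, days: int = 31) -> float:
    """Cost of running `units` capacity units or instances non-stop for `days` days."""
    return round(hourly_rate_usd * units * HOURS_PER_DAY * days, 2)


# Hourly rates as quoted in the comparison; check current AWS pricing before relying on them.
aurora_serverless_v1 = monthly_cost(0.06)      # 44.64 (1 ACU, always on)
provisioned_t3_medium = monthly_cost(0.065)    # 48.36 (storage and I/O are extra)
elasticsearch_t3_medium = monthly_cost(0.073)  # 54.31
aurora_serverless_v2 = monthly_cost(0.12)      # 89.28 (1 ACU, always on)
```

These are upper bounds for a single always-on unit. A pay-as-you-use database that scales down or pauses during idle periods would cost less, while one that scales out to more ACUs would cost more.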

After going through the strategy comparison chart, you will be able to plan out a proper strategy. Try this process and share your experience with us. Stay safe and happy coding!

References:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html

https://aws.amazon.com/rds/aurora/serverless/

https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/
