
Apache Flink Standalone Setup on Linux/macOS

By Rajkishore Hembram October 13, 2021

Apache Flink is considered a great choice for developing and running many different types of applications because of the variety of features it offers. Some of its prominent features are stream and batch processing support, advanced state management, event-time processing semantics, and consistency guarantees for state. You can deploy Flink on multiple resource providers such as YARN, Apache Mesos, and Kubernetes. Apache Flink is a new-generation Big Data platform, also known as "4G for Big Data", delivering high throughput and low latency. Since it supports both batch and stream processing, it enables users to analyze both historical and real-time data.

While developing or testing an application, we need a playground to experiment in. This playground should be independent, affordable, and easy to prepare. Apache Flink supports creating a standalone cluster in a few simple steps and provides a friendly Web UI for monitoring the cluster and its jobs.

In this blog, I will give a brief overview of Apache Flink and the prerequisites for setting up a standalone cluster. After that, I will set up a local standalone cluster. You will also learn how to start the cluster, submit an example WordCount job, and finally terminate the cluster.

What is Apache Flink?

Apache Flink is an open-source distributed computing framework for stateful computations over bounded and unbounded data streams. It is a true stream-processing framework that supports Java, Scala, and Python. It has a high-level DataSet API for batch applications, a DataStream API for processing continuous streams of data, and a Table API for running SQL queries on batch and streaming data. It also comprises Gelly for graph processing, FlinkML for machine learning applications, and FlinkCEP for complex event processing, which detects intricate event patterns in data streams.

Prerequisites for Apache Flink

Apache Flink is developed in Java and Scala. Therefore, it requires a compatible JVM environment to run Flink applications. The prerequisites for Apache Flink are listed below:

| Language | Version |
| --- | --- |
| Java | Java 8/11 |
| Scala (only for writing driver programs in Scala) | Scala 2.11/2.12 |

Local Installation

In this section, we will do a local installation of Apache Flink. Its installation directory contains shell scripts, jar files, config files, binaries, and so on. All these files are necessary to manage the cluster, monitor it, and run Flink applications.

First, we have to ensure that we have a working Java 8 or 11 installation.
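We can verify this from a terminal; the exact output depends on your JDK vendor and version:

```bash
# Confirm that a compatible JDK (8 or 11) is available on the PATH
java -version
```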

Then, we will download the Apache Flink distribution file from the official downloads page: https://flink.apache.org/downloads.html
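For example, the release can be fetched from the command line. The version and Scala suffix below are just an illustration; substitute the release you picked on the downloads page:

```bash
# Download a Flink binary distribution (example: Flink 1.13.2 built for Scala 2.12)
wget https://archive.apache.org/dist/flink/flink-1.13.2/flink-1.13.2-bin-scala_2.12.tgz
```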

We will extract the distribution file using the tar command:
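Assuming the example archive downloaded above:

```bash
# Unpack the distribution into the current directory
tar -xzf flink-1.13.2-bin-scala_2.12.tgz
```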

Now, change into Flink's home directory to carry out the next steps:
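The directory name matches the extracted archive:

```bash
# The extracted directory is Flink's home directory
cd flink-1.13.2
```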

Once the local standalone cluster is installed, let's start it.

Start the Cluster

The standalone cluster runs independently without interacting with the outside world. Once the cluster is up, jobs can be submitted to it. A local Flink cluster starts quickly with a single script.
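The script ships with the distribution and is run from Flink's home directory:

```bash
# Start a local standalone cluster (a JobManager and a TaskManager)
./bin/start-cluster.sh
```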

The Flink Web UI will then be available at http://localhost:8081/.

[Screenshot: Apache Flink dashboard]

After successfully starting the cluster, the next step is to submit a job.

Submit a Job

A job is an application running on the cluster. The application is defined in a single file or a set of files known as a driver program. We write driver programs in Java or Scala, compile them, and then build their jars. These jars are submitted to the running cluster through the command-line interface or the Flink Web UI.

The jar can contain multiple class files. We can explicitly provide the name of the class containing the driver program while submitting the jar, or it can be read from the jar's manifest file.

In this example, we will submit a WordCount job using the command-line interface and one of the example jars shipped with Flink.
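Run from Flink's home directory; the distribution bundles the example under the examples folder:

```bash
# Submit the bundled streaming WordCount example to the running cluster
./bin/flink run examples/streaming/WordCount.jar
```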

We can check the output of the job by running the tail command on the output log.
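With its built-in sample input, the WordCount example writes its results to the TaskManager's .out file. The wildcards cover the host- and user-specific parts of the file name:

```bash
# Show the last lines of the TaskManager output log
tail log/flink-*-taskexecutor-*.out
```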

We can also monitor the running and completed jobs in the Flink Web UI.

[Screenshot: Apache Flink Web UI showing running and completed jobs]

After carrying out these steps, we now move to the last step: terminating the cluster.

Stop the Cluster

When the jobs running on the cluster have finished, stop the cluster and all its components with a single script.
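This is the counterpart of the start script, also run from Flink's home directory:

```bash
# Stop the JobManager and TaskManager processes
./bin/stop-cluster.sh
```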

Conclusion

Apache Flink is faster, more flexible, and more versatile than many other Big Data frameworks. A local setup is an excellent start for learning Apache Flink. It also gives us a convenient environment to test our Flink applications before deploying them to production.

So, do try this method and share your experience with us. Happy coding!
