An Empirical Evaluation of Real-Time Stream Processing Frameworks for Handling High Velocity Big Data

Journal Name: International Journal of Business Intelligence and Big Data Analytics
Publisher: Tensorgate
Copyright: © 2024 International Journal of Business Intelligence and Big Data Analytics. All rights reserved.
Disclaimer: The views and opinions in the articles published in this journal are those of the authors and do not necessarily represent the official policy of the journal or its publisher.

Rajesh Sharma; Chietra Jalota

Authors

Rajesh Sharma, Department of Computer Science, Indian Institute of Science (IISc), Bangalore
Chietra Jalota, Lingayas Vidyapeeth

Keywords:

Stream processing, framework, performance, latency, throughput model, Occupational health and safety

Abstract

The exponential growth of data in motion, also known as streaming data, has necessitated specialized data processing platforms that can handle the volume, velocity, and variety of such data in real time. This study empirically evaluates three leading open-source, real-time stream processing frameworks (Apache Storm, Apache Spark Streaming, and Apache Flink) on three critical performance metrics: throughput, latency, and fault tolerance under high-velocity big data workloads. Six experiments were conducted using both synthetic and real-world streaming data to measure throughput and latency while scaling up cluster resources. Fault tolerance was tested by killing execution nodes and measuring system recovery times. Results indicate that Flink outperformed Storm and Spark Streaming in most tests, achieving up to 5 times higher throughput with half the latency, as well as sub-second recovery from failures. Storm showed the most inconsistent performance across experiments. We discuss the advantages and limitations of each framework and offer recommendations for selecting a stream processing platform based on use-case requirements for scalability, responsiveness, and reliability. These empirical evaluations can serve as a practical guide for organizations planning production deployments of real-time analytics on fast data.
