An Empirical Evaluation of Real-Time Stream Processing Frameworks for Handling High Velocity Big Data

Authors

  • Rajesh Sharma, Department of Computer Science, Indian Institute of Science (IISc), Bangalore
  • Chietra Jalota, Lingayas Vidyapeeth

Keywords:

Stream, processing, framework, performance, latency, throughput model, Occupational health and safety

Abstract

The exponential growth of data in motion, also known as streaming data, has necessitated specialized data processing platforms that can handle the volume, velocity and variety of such data in real time. This study empirically evaluates three leading open-source, real-time stream processing frameworks (Apache Storm, Apache Spark Streaming, and Apache Flink) on critical performance metrics such as throughput, latency and fault tolerance when applied to high-velocity big data workloads. Six experiments were conducted using both synthetic and real-world streaming data to measure throughput and latency while scaling up cluster resources. Fault tolerance tests were performed by killing execution nodes and measuring system recovery times. Results indicate that Flink outperformed Storm and Spark Streaming in most tests, achieving up to 5 times higher throughput with half the latency, as well as sub-second recovery from failures. Storm showed the most inconsistent performance across experiments. We discuss the advantages and limitations of each framework and offer recommendations for selecting a stream processing platform based on use case requirements around scalability, responsiveness and reliability. These empirical evaluations can serve as a practical guide for organizations planning production deployments of real-time analytics on fast data.

Author Biography

Rajesh Sharma, Department of Computer Science, Indian Institute of Science (IISc), Bangalore

Published

2022-01-25

How to Cite

Sharma, R., & Jalota, C. (2022). An Empirical Evaluation of Real-Time Stream Processing Frameworks for Handling High Velocity Big Data. International Journal of Business Intelligence and Big Data Analytics, 5(1), 57–65. Retrieved from https://research.tensorgate.org/index.php/IJBIBDA/article/view/77