Big Data Architecture: Understanding the Lambda Architecture in Detail
There are many different tools for handling massive amounts of data: for storage, analysis, or dissemination, for example. But how do you combine these tools into an architecture that scales, tolerates faults, and is easily extensible, all without blowing up costs?
In this article, I'll introduce you to a popular architecture model that can be applied to almost any situation involving massive amounts of data: the Lambda Architecture.
This model lets you design an architecture that fits your needs while keeping a modular structure. I will present the generic model in detail, along with concrete technology choices that satisfy the requirements of its different components.
Lambda architecture definition
The Lambda Architecture is a data-processing deployment model in which organisations combine a traditional batch pipeline with a fast real-time streaming pipeline for data access. It has become a common model in IT and development toolkits as businesses strive to become more data-driven and event-driven in the face of massive volumes of rapidly generated data.
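The interplay between the two pipelines can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the event names, the layer functions, and the use of simple counters are all hypothetical choices made for the example. The batch layer recomputes a view from the full (immutable) master dataset, the speed layer computes an incremental view from events the last batch run has not yet seen, and the serving layer merges the two at query time.

```python
from collections import Counter

# Hypothetical master dataset: an immutable, append-only log of events
# (here, page-view events identified by page name).
master_dataset = ["page_a", "page_b", "page_a", "page_c"]

def batch_layer(events):
    """Recompute the batch view from scratch over the full master dataset."""
    return Counter(events)

def speed_layer(recent_events):
    """Compute a real-time view from events arriving since the last batch run."""
    return Counter(recent_events)

def serving_layer(batch_view, realtime_view):
    """Answer queries by merging the batch view with the real-time view."""
    return batch_view + realtime_view

batch_view = batch_layer(master_dataset)
realtime_view = speed_layer(["page_a", "page_d"])  # events not yet batched
merged = serving_layer(batch_view, realtime_view)
# merged["page_a"] == 3: two occurrences from the batch view plus one real-time
```

The key design point this sketch captures is that the speed layer only ever has to cover the small window of recent data; once the next batch run absorbs those events into the master dataset, the corresponding real-time view can be discarded.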
The design of a Lambda Architecture is guided by the following constraints:
- Scaling: the proposed architecture must be able to scale horizontally, i.e. by adding servers. This growth must be done while…