The data lake is a central concept of AireInsights. It stores data and feeds it to the data pipelines. Data can be stored in the data lake even when no pipeline is in place to process it. This is useful for collecting data before it is clear how the processing and analysis of that data will pan out.
Data collections
Inside the data lake, data is stored as a plain string without any structure. The main way to organise data is to assign it to a data collection. All data items belonging to the same data collection should have the same data format and the same, or at least a compatible, structure. This matters because when the data is sent to a pipeline, the pipeline has to assume the structure of the data in order to map it into the processing steps.
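The relationship between unstructured storage and a pipeline's structural assumptions can be illustrated with a small in-memory sketch. This is not the AireInsights API: `InMemoryDataLake`, `temperature_pipeline`, the collection name, and the JSON field names are all hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict


class InMemoryDataLake:
    """Hypothetical stand-in for a data lake: items are opaque strings grouped by collection."""

    def __init__(self):
        self._collections = defaultdict(list)

    def add(self, collection, raw):
        # The lake itself does not parse or validate the content.
        if not isinstance(raw, str):
            raise TypeError("items are stored as plain strings")
        self._collections[collection].append(raw)

    def items(self, collection):
        return list(self._collections[collection])


def temperature_pipeline(raw_items):
    """Hypothetical pipeline step that assumes each item is JSON with 'sensor' and 'celsius'."""
    readings = []
    for raw in raw_items:
        record = json.loads(raw)
        readings.append((record["sensor"], float(record["celsius"])))
    return readings


lake = InMemoryDataLake()
lake.add("temperature", '{"sensor": "a1", "celsius": 21.5}')
# An extra field keeps the structure compatible: the pipeline ignores it.
lake.add("temperature", '{"sensor": "b2", "celsius": 19.0, "site": "north"}')
result = temperature_pipeline(lake.items("temperature"))
```

In this sketch, an item that lacks the `sensor` or `celsius` field would make the pipeline fail with a `KeyError`, which is why items in one collection should share a compatible structure.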
Using AWS S3
The data lake allows users to add their own AWS S3 buckets as storage for the data.