• A lake full of data

    Some call Google the omniscient garbage dump. That is rather disrespectful toward what is perhaps the best search engine the world has ever seen. Yet it describes exactly what makes an excellent data source so valuable: data that seems irrelevant at first can later turn out to be a decisive factor. Keeping such data is what distinguishes a data lake from a data warehouse.

     

Capturing, Storing and Interpreting Information in the Data Lake


Recording of Raw Data

Unlike a data warehouse, a data lake does not need to know at collection time exactly which data will be required later. The volume of data is large, so simple, uncomplicated recording of the most diverse data formats is correspondingly important. Schema-agnostic storage is therefore useful at the time of entry.
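To make schema-agnostic recording more concrete, here is a minimal sketch in Python. It assumes a file-based lake with one folder per source and one JSON Lines file per day; the function name `ingest_raw`, the folder layout and the envelope fields are hypothetical choices for illustration, not part of any particular product.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(lake_dir, source, payload):
    """Append one record to the lake without validating its structure."""
    now = datetime.now(timezone.utc)
    # Assumed layout: <lake_dir>/<source>/<YYYY-MM-DD>.jsonl
    target = Path(lake_dir) / source / f"{now:%Y-%m-%d}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    envelope = {
        "source": source,
        "ingested_at": now.isoformat(),
        "payload": payload,  # stored as-is; no schema is enforced here
    }
    with target.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(envelope, ensure_ascii=False) + "\n")
    return target
```

A structured record and a free-form log line can be written through the same call, for example `ingest_raw("lake", "web", {"user": "a", "clicks": 3})` followed by `ingest_raw("lake", "web", "free-form log line")`. The only requirement in this sketch is that the payload is JSON-serializable.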


Storage in Data Lake

Significant amounts of data can accumulate in a short time. This growing pool is the data lake, the proverbial source of later analyses. To meet the ever-increasing space requirements, use scalable storage and database solutions that can grow with your needs.


Interpretation on Demand

The advantage over the classic data warehouse: you can defer the complex part of data analysis until you actually need it. You don't have to decide today which data you will need in three months or two years; instead, you can create evaluations exactly as required.
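This deferred interpretation is often called schema-on-read. The following Python sketch illustrates the idea under simple assumptions: raw records are already loaded into memory as a list, and a "schema" is just a mapping from field name to a cast function. The helper `read_with_schema` and the sample records are hypothetical.

```python
def read_with_schema(raw_records, schema):
    """Apply a schema at read time: keep records that fit and cast their fields."""
    rows = []
    for record in raw_records:
        if not isinstance(record, dict):
            continue  # e.g. free-form text; not usable for this view
        try:
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (KeyError, TypeError, ValueError):
            continue  # record does not match this particular view
    return rows


# Heterogeneous raw data, stored without any agreed structure.
raw = [
    {"user": "a", "amount": "19.90", "referrer": "newsletter"},
    {"user": "b", "amount": 5},
    {"sensor": "t1", "celsius": 21.5},
    "free-form log line",
]

# Two different evaluations, each defined only when it is needed.
revenue_view = read_with_schema(raw, {"user": str, "amount": float})
referrer_view = read_with_schema(raw, {"user": str, "referrer": str})
```

The `referrer` field plays no role in the revenue evaluation, but because it was kept in the raw data, a second view can be defined later without collecting anything again.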

 

 

Open Solution for Maximum Investment Security

One challenge when setting up a data lake is avoiding vendor lock-in, i.e. dependence on a single service provider or manufacturer. The initial costs of setting up the system are manageable. Over time, however, more and more requirements arise and are solved with individual extensions. The longer the system is in operation, the higher its value, both in terms of the data it contains and the hours invested in connecting data sources and analyzing data.

This is fine as long as you are satisfied with your service provider and the manufacturer of the solution. Should that no longer be the case, you need the flexibility to move your existing system to another provider in order to protect your already extensive investment. This independence is only guaranteed if you consistently rely on open source systems such as the components of the SMACK stack (Spark, Mesos, Akka, Cassandra and Kafka) or the free variants of the Elastic software (e.g. Elasticsearch, Logstash, Kibana), and if you work with service providers who implement them, such as ESONO AG.

Contact us for more information on building a data lake. We offer different implementation concepts, from the mid-range to the enterprise segment.