Understand the Fundamentals of Delta Lake

You might be hearing a lot about Delta Lake nowadays. That is because it introduces features that were not available in Apache Spark earlier.

Why Delta Lake?

Spark writes are not atomic, so data consistency is not guaranteed. When the metadata itself becomes big data, it is difficult to manage. If you have worked on a lambda architecture, you know how painful it is to maintain the same aggregations separately in both the hot layer and the cold layer. And sometimes an unwanted write to a table or a location overwrites existing data, leaving you wishing you could go back to a previous state of the data, which used to be a tedious task.

What is Delta Lake?

Delta Lake is a project developed by Databricks and now open sourced under the Linux Foundation. It is an open source storage layer that sits on top of Apache Spark and brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of your existing data lake and is ...
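To build intuition for how a transaction log gives you atomic writes and "time travel", here is a toy sketch in plain Python. It is not the real Delta Lake implementation; it only mimics the idea that Delta Lake records every change as an ordered commit file under a log directory, so the table state at any version can be reconstructed by replaying the log. All class and file names here are made up for illustration.

```python
import json
import os
import tempfile

# Toy illustration of an ordered transaction log (the real Delta Lake
# protocol keeps similar JSON commit files under _delta_log/).
class ToyDeltaLog:
    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _versions(self):
        # Commit files sorted by version number.
        return sorted(f for f in os.listdir(self.path) if f.endswith(".json"))

    def commit(self, actions):
        """Atomically append one commit: write to a temp file, then
        rename it into place, so readers see all of it or none of it."""
        version = len(self._versions())
        tmp = os.path.join(self.path, f".{version}.tmp")
        final = os.path.join(self.path, f"{version:020d}.json")
        with open(tmp, "w") as f:
            json.dump(actions, f)
        os.rename(tmp, final)  # atomic on POSIX filesystems
        return version

    def snapshot(self, as_of=None):
        """Replay the log up to version `as_of` (or all of it) to
        reconstruct the set of data files making up the table."""
        files = set()
        for name in self._versions():
            version = int(name.split(".")[0])
            if as_of is not None and version > as_of:
                break
            with open(os.path.join(self.path, name)) as f:
                for action in json.load(f):
                    if action["op"] == "add":
                        files.add(action["file"])
                    elif action["op"] == "remove":
                        files.discard(action["file"])
        return files

log = ToyDeltaLog(tempfile.mkdtemp())
log.commit([{"op": "add", "file": "part-0.parquet"}])           # version 0
log.commit([{"op": "remove", "file": "part-0.parquet"},
            {"op": "add", "file": "part-1.parquet"}])           # an overwrite
print(sorted(log.snapshot()))          # current table state
print(sorted(log.snapshot(as_of=0)))   # "time travel" back to version 0
```

Because the overwrite in version 1 is just another log entry rather than a destructive change, reading the table "as of" version 0 recovers the earlier state, which is the essence of how Delta Lake undoes an unwanted write.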


Read More on Datafloq

