
Delta Lake

Delta Lake is an open-source storage layer that brings ACID transactions, schema enforcement, and time travel to data lakes stored on cloud object stores, with deep integration with Apache Spark. It enables reliable, scalable data pipelines and lakehouse architectures by ensuring data quality and consistency across batch and streaming workloads.

by Linux Foundation / Delta Lake Project

Key Features

  • ACID transactions on data lakes using optimistic concurrency control
  • Schema enforcement and evolution for structured and semi-structured data
  • Time travel with versioned data for reproducibility and rollback
  • Unified batch and streaming processing on the same tables
  • Scalable metadata handling using a transaction log instead of directory listings
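The transaction-log, optimistic-concurrency, and time-travel features above can be illustrated with a toy model. This is a minimal sketch in plain Python, not the Delta Lake API. It assumes a local temporary directory stands in for a table's `_delta_log/` folder, treats each commit as a numbered JSON file of `add`/`remove` actions, and uses `os.open` with `O_EXCL` as the atomic "create only if absent" step that optimistic concurrency relies on. The helpers `try_commit`, `snapshot`, `latest_version`, and `ConflictError` are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile

# Toy model of a Delta-style transaction log (hypothetical, NOT the Delta Lake API).


class ConflictError(Exception):
    """Raised when another writer already claimed the target version."""


def _commit_path(log_dir: str, version: int) -> str:
    # Zero-padded, numbered commit files, one per table version.
    return os.path.join(log_dir, f"{version:020d}.json")


def latest_version(log_dir: str) -> int:
    """Return the highest committed version, or -1 for an empty log."""
    versions = [int(name.split(".")[0]) for name in os.listdir(log_dir)
                if name.endswith(".json")]
    return max(versions, default=-1)


def try_commit(log_dir: str, read_version: int, actions: list) -> int:
    """Try to write version read_version + 1; fail if it already exists."""
    target = read_version + 1
    try:
        # O_EXCL makes creation fail if the file exists: only one writer wins.
        fd = os.open(_commit_path(log_dir, target),
                     os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        raise ConflictError(f"version {target} already committed") from None
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return target


def snapshot(log_dir: str, version: int) -> set:
    """Replay commits 0..version to get the live data files (time travel)."""
    files = set()
    for v in range(version + 1):
        with open(_commit_path(log_dir, v)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


# Usage: two commits, a losing concurrent writer, then a retry.
log_dir = tempfile.mkdtemp()
v0 = try_commit(log_dir, -1, [{"add": {"path": "part-0.parquet"}}])
v1 = try_commit(log_dir, v0, [{"remove": {"path": "part-0.parquet"}},
                              {"add": {"path": "part-1.parquet"}}])

conflicted = False
try:
    # This writer also read version 0, so version 1 is already taken.
    try_commit(log_dir, v0, [{"add": {"path": "part-2.parquet"}}])
except ConflictError:
    conflicted = True
    # Retry on top of the newest version after re-reading the log.
    v2 = try_commit(log_dir, latest_version(log_dir),
                    [{"add": {"path": "part-2.parquet"}}])

print(snapshot(log_dir, v2))  # current table state
print(snapshot(log_dir, v0))  # time travel back to version 0
```

Writers never take locks here: each one reads a version, prepares its actions, and tries to claim the next version number, and losing the race just means re-reading the log and retrying. Real Delta writers also check whether a concurrent commit logically conflicts before retrying, write periodic checkpoints so readers need not replay the whole log, and depend on the storage layer to make a complete commit file appear atomically; this toy skips all of that.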

Pricing

Open source

Delta Lake is open source under the Apache 2.0 license. Commercial support and managed offerings are available via Databricks and some cloud providers.