Databricks' New Contribution to the Apache Spark Community
In a significant move for data engineering, Databricks has announced that it is donating its core declarative ETL framework to the Apache Spark™ ecosystem as open source. The framework, now known as Apache Spark™ Declarative Pipelines, aims to address some of the most pressing challenges faced by data engineers and analysts. The announcement was made at the Data + AI Summit in San Francisco on June 11, 2025.
Databricks, well known as a leader in the Data and AI sector, made the announcement as Apache Spark™ reached a milestone of 2 billion downloads, coinciding with the recent release of Apache Spark 4.0. The donation reflects Databricks' long-standing dedication to open ecosystems that give users freedom and control over their data projects and minimize vendor lock-in.
Declarative Pipelines is designed to simplify defining and executing data pipelines for both batch and streaming ETL workloads. It works with the data sources that Apache Spark supports, such as cloud storage, message buses, and external systems. With this framework, engineers describe the pipeline they want and leave the orchestration of complex data workflows to the engine.
Key Features of Spark Declarative Pipelines
The framework is built on Databricks' original declarative ETL model, which has been used by thousands of customers across various sectors. It addresses the complexity of data engineering workloads and is also positioned as a robust solution for low-latency streaming tasks. By reducing development time and operational costs, the new offering lets more engineers and data specialists work productively. Key benefits include:
- Simplified Pipeline Authoring: Data engineers and analysts can declare sophisticated pipelines with minimal coding, allowing them to concentrate on the analytics that drive business decisions.
- Enhanced Operability: Spark Declarative Pipelines are designed to catch issues early, during development, by validating pipeline definitions. This proactive approach reduces downstream errors and makes troubleshooting and maintenance easier.
- Unified Batch and Streaming Processing: The framework supports both streaming and batch processing through a single API, so teams can manage and maintain their data pipelines in one place.
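To make the declarative idea concrete, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions and not the Spark Declarative Pipelines API: the `Pipeline` class, the `dataset` decorator, and the dataset names are all hypothetical. It shows two of the points above in miniature: each dataset is declared along with what it depends on, and the whole definition is validated before any step runs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Pipeline:
    """Toy registry of declared datasets (hypothetical, not a Spark API)."""

    def __init__(self):
        # Maps dataset name -> (function that builds it, upstream dataset names)
        self.steps = {}

    def dataset(self, *, depends_on=()):
        """Decorator that declares a dataset and its upstream dependencies."""
        def register(fn):
            self.steps[fn.__name__] = (fn, tuple(depends_on))
            return fn
        return register

    def plan(self):
        """Validate the declarations and return a safe execution order."""
        for name, (_, deps) in self.steps.items():
            for dep in deps:
                if dep not in self.steps:
                    # Caught at planning time, before any data is processed
                    raise ValueError(
                        f"{name!r} depends on undefined dataset {dep!r}"
                    )
        graph = {name: deps for name, (_, deps) in self.steps.items()}
        # static_order() yields dependencies before the datasets that use them
        return list(TopologicalSorter(graph).static_order())

    def run(self):
        """Build every dataset in dependency order and return the results."""
        results = {}
        for name in self.plan():
            fn, deps = self.steps[name]
            results[name] = fn(*(results[d] for d in deps))
        return results


pipeline = Pipeline()


@pipeline.dataset()
def raw_orders():
    # Stand-in for a real source such as cloud storage or a message bus
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": -5}, {"id": 3, "amount": 60}]


@pipeline.dataset(depends_on=["raw_orders"])
def valid_orders(raw_orders):
    return [order for order in raw_orders if order["amount"] > 0]


@pipeline.dataset(depends_on=["valid_orders"])
def revenue(valid_orders):
    return sum(order["amount"] for order in valid_orders)


if __name__ == "__main__":
    print(pipeline.plan())
    print(pipeline.run()["revenue"])
```

The author of the pipeline never writes the execution order; it is derived from the declared dependencies. A misspelled or missing upstream name fails in `plan()` rather than partway through a run, which is the kind of early detection the operability point describes.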
Community Feedback and Support
Industry professionals have expressed enthusiasm about the open-sourcing of the framework. Jian (Miracle) Zhou, Senior Engineering Manager at Navy Federal Credit Union, emphasized that “declarative pipelines simplify the complexity of modern data engineering with an intuitive programming model.” Meanwhile, Brad Turnbaugh, Senior Data Engineer at 84.51°, mentioned that the declarative approach significantly reduces the volume of code that teams must manage, particularly as they adopt more open tools in their data operations.
As a pioneer behind various open-source initiatives such as Apache Spark, Delta Lake, and MLflow, Databricks remains steadfast in its pursuit of fostering innovation in data science and engineering. The introduction of Spark Declarative Pipelines is not just a contribution to the community; it represents a step towards a more inclusive and cooperative environment where enterprises can build high-quality data pipelines seamlessly.
Databricks: A Commitment to Innovation
Founded by the creators of the lakehouse architecture, Databricks serves over 15,000 organizations globally, including Block, Comcast, and more than 60% of the Fortune 500. Headquartered in San Francisco, Databricks continues to deliver solutions that help businesses harness the full potential of their data through AI-driven insights.
To learn more about Databricks or the newly launched Apache Spark™ Declarative Pipelines, follow Databricks on X, LinkedIn, and Facebook.