What Is a Data Lakehouse?

A data lakehouse sounds like a serene getaway, but it can be the key to improving efficiency and customer satisfaction.

Data Lakehouse FAQ

A data lakehouse is a modern data architecture that combines the flexible, low-cost storage of a data lake with the data management features and ACID (atomicity, consistency, isolation, durability) transactions of a data warehouse.

A data lakehouse bridges the gap between the two. It stores raw, diverse data the way a data lake does, while adding the structured schemas, data quality controls, and query performance typically found in a data warehouse. The result is a single platform for all of an organization's data needs.

Its advantages include a simpler, unified data architecture; strong support for both traditional business intelligence and advanced AI and machine learning workloads; improved data quality and governance; and less redundant data duplicated across systems.

Data lakehouses are primarily enabled by open table formats like Delta Lake, Apache Iceberg, and Apache Hudi. These technologies operate on top of scalable cloud object storage, adding transactional capabilities and schema enforcement to data lakes.
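The core idea behind these table formats can be sketched in a few lines. The Python below is a minimal, hypothetical illustration. It is not the API or on-disk format of Delta Lake, Iceberg, or Hudi. `ObjectStore` is an in-memory stand-in for cloud object storage, and its `put_if_absent` method assumes the store offers an atomic "create only if missing" write. Data files are written first and stay invisible until a numbered entry is atomically added to a transaction log. Readers rebuild a snapshot by replaying that log, and rows that don't match the declared schema are rejected before anything is written.

```python
import json
import uuid
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Another writer already committed the log version this writer wanted."""


@dataclass
class ObjectStore:
    """Hypothetical in-memory stand-in for cloud object storage (key -> bytes)."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def put_if_absent(self, key: str, data: bytes) -> bool:
        # Assumed atomic "create only if missing" primitive; commits rely on it.
        if key in self.objects:
            return False
        self.objects[key] = data
        return True

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def keys_with_prefix(self, prefix: str) -> list:
        return sorted(k for k in self.objects if k.startswith(prefix))


class LakehouseTable:
    """Hypothetical table: immutable data files plus an ordered JSON commit log."""

    def __init__(self, store: ObjectStore, name: str, schema: dict):
        self.store = store
        self.name = name
        self.schema = schema  # column name -> required Python type

    def _log_key(self, version: int) -> str:
        return f"{self.name}/_log/{version:020d}.json"

    def latest_version(self) -> int:
        # Versions are contiguous from 0, so the newest is count - 1 (-1 if empty).
        return len(self.store.keys_with_prefix(f"{self.name}/_log/")) - 1

    def _enforce_schema(self, rows: list) -> None:
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} != schema {sorted(self.schema)}")
            for col, typ in self.schema.items():
                if type(row[col]) is not typ:
                    raise TypeError(f"column {col!r} expects {typ.__name__}")

    def append(self, rows: list) -> int:
        self._enforce_schema(rows)  # reject bad rows before anything is written
        version = self.latest_version() + 1
        # 1. Write a uniquely named data file; no reader can see it yet.
        data_key = f"{self.name}/data/{uuid.uuid4().hex}.json"
        self.store.put(data_key, json.dumps(rows).encode())
        # 2. Publish it by atomically creating the next log entry.
        entry = json.dumps({"add": [data_key]}).encode()
        if not self.store.put_if_absent(self._log_key(version), entry):
            raise CommitConflict(f"version {version} was committed by another writer")
        return version

    def read(self, version=None) -> list:
        """Replay the log up to `version` to rebuild one consistent snapshot."""
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            entry = json.loads(self.store.get(self._log_key(v)))
            for key in entry["add"]:
                rows.extend(json.loads(self.store.get(key)))
        return rows


store = ObjectStore()
orders = LakehouseTable(store, "orders", {"id": int, "amount": float})
orders.append([{"id": 1, "amount": 9.5}])
orders.append([{"id": 2, "amount": 20.0}])
print(len(orders.read()))           # 2
print(len(orders.read(version=0)))  # 1: an older snapshot is still readable
try:
    orders.append([{"id": "3", "amount": 1.0}])
except TypeError as err:
    print("rejected:", err)         # rejected: column 'id' expects int
```

Each commit is a single atomic log write, so readers see either all of a batch or none of it, and older versions remain readable. A writer that loses a commit race leaves behind a data file that the log never references. For that reason, systems built this way need periodic cleanup of unreferenced files.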

A data lakehouse provides a single, unified platform that can efficiently serve a wide range of analytical workloads. These range from traditional business intelligence reports and dashboards to more complex data science, machine learning model training, and AI applications.

Data lakehouses are increasingly designed to ingest real-time data streams and support immediate data activation. This lets organizations derive up-to-the-minute insights for operational analytics and live applications.
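One common way to get near-real-time data into a lakehouse table is to commit the stream as a series of small micro-batches. Downstream consumers then read only the commits they have not yet processed. The sketch below assumes that approach. `VersionedTable`, `ingest`, and `IncrementalReader` are hypothetical names, and a size-based flush is just one possible batching policy.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class VersionedTable:
    """Hypothetical append-only table in which every commit is one immutable batch."""
    commits: list = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        return len(self.commits) - 1  # -1 while the table is empty

    def commit(self, batch: list) -> int:
        self.commits.append(tuple(batch))  # tuples: committed batches never change
        return self.latest_version

    def changes_between(self, after: int, upto: int) -> list:
        """Rows from versions after+1 through upto, inclusive."""
        return [row for batch in self.commits[after + 1 : upto + 1] for row in batch]


def ingest(events: Iterable[dict], table: VersionedTable, batch_size: int) -> None:
    """Group a stream of events into small micro-batch commits."""
    buffer = []
    for event in events:
        buffer.append(event)
        if len(buffer) >= batch_size:
            table.commit(buffer)
            buffer = []
    if buffer:  # flush the final partial batch
        table.commit(buffer)


class IncrementalReader:
    """Tracks the last version it processed so each poll returns only new rows."""

    def __init__(self, table: VersionedTable):
        self.table = table
        self.last_seen = -1

    def poll(self) -> list:
        upto = self.table.latest_version  # pin a version first, then read up to it
        rows = self.table.changes_between(self.last_seen, upto)
        self.last_seen = upto
        return rows


table = VersionedTable()
reader = IncrementalReader(table)
ingest(({"click": i} for i in range(5)), table, batch_size=2)
print(table.latest_version)  # 2: batches of 2, 2 and 1 events
print(len(reader.poll()))    # 5
ingest([{"click": 5}], table, batch_size=2)
print(reader.poll())         # [{'click': 5}]
```

Smaller batches lower latency but produce more commits and small files. A production pipeline would typically also flush on a timer, so a quiet stream does not wait indefinitely for a full batch.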