What is a data warehouse
A data warehouse (DW), also called an enterprise data warehouse (EDW), is the most common database for traditional business intelligence (BI) reporting and dashboards.
Unlike a traditional relational database (RDBMS) - like Oracle, SQL Server, DB2 or MySQL - which are designed to support transactions and other write-heavy workloads, a data warehouse is designed to support analytics, which are mostly reads.
Challenges
Most data warehouses are great at batch-loading data and supporting more traditional reporting and dashboards for small groups of analysts. That’s what they were designed for.
Everything else? Not so much.
They do not support interactive or ad hoc analytics because they cannot query large data sets in 1 second or less. Most cannot scale to petabytes or support hundreds of concurrent users unless they’re built on a decoupled storage and compute architecture. Even those do not support semi-structured data well. They also don’t usually support streaming analytics or continuous updates at any large scale.
Traditional data warehouses also make your BI relatively inflexible to change. If your ETL, or data integration, is done straight from the source applications to the data warehouse, then whenever you need to add new data, it can take weeks to months to make changes. It’s partly because 80% of the work is typically spent on data integration and improving data quality and consistency. This ETL is often done by a different team from the data warehouse or analytics teams. That’s a big problem because analysts often need new data for new reports in days.
Benefits
Most data warehouses can replace some of your spreadsheets and other inconsistent reports with a single source of the truth.
When it comes to performance, there are also some data warehouses that are fast enough to support ad hoc analytics, petabyte-scale analytics, even semi-structured analytics.
You can also be more agile, and new reports built faster if you implement a combination of a data lake and data warehouse. The ETL team still does the hard work, but they now move their data into the data lake. If they do that, analysts can get any data they need for new reports from the lake. If they are able to do this on their own, then they get new reports done in days.
Just don’t make replacing all Excel a goal. It is still the most popular BI tool, for some really good reasons. Make it easier to get the right data into it like any other BI tool.
Data warehouse vs database
I data warehouse is a type of database management system (DBMS.)
But there are a few good rules for distinguishing between data warehouses and other databases.
- Many databases are for recording (ACID) transactions and performing specific lookups. They are designed to guarantee that you never lose a write. A data warehouse is for making read-only queries as fast as possible. Those queries are almost always done in SQL.
- Most good data warehouses use columnar storage at the very least. If it doesn’t, it may not be the best data warehouse for you.
Unlike RDBMS row-based storage, which is suited for updates, data warehouse storage is usually columnar, where each column of a table is stored and compressed sequentially on disk. This makes for much faster lookups and scans across disk since you often search on one column. But it usually makes writes really costly and (ACID) transactions nearly impossible. In each case, the compute is designed together with the storage to improve performance and scalability for the specific function.
In other words do not try to use relational databases as data warehouses, or data warehouses for transactions.
Firebolt data warehouse
Firebolt provides the fastest cloud data warehouse with the performance to support ad hoc and high-performance analytics at scale, as well as semi-structured data analytics, not just traditional reporting and dashboards. Its modern decoupled storage and compute not only improves scalability and administration. It also allows companies to easily support high user concurrency.