What is OLTP and OLAP

From MemCP
Jump to navigation Jump to search

There are basically two types database workloads leading to different database designs:

abbr. Online Transaction Processing Online Analytical Processing
Example applications ERP, Ticket System BI suite, Statistical programs
Typical SQL Queries large amounts of small UPDATE, INSERT few SELECT with big computation demand
Best Storage Format Row Store Columnar store
Database Engines MySQL, Postgres MonetDB
Users Many Small Users One Data Operator

OLAP databases use ETL (Extract, Transform, Load) processes to load data from an OLTP database. This allows for the data to be transformed into a format suitable for data analysis and reporting. The ETL process enables data to be extracted from the source database, transformed into the format desired for the analysis, and then loaded into the OLAP database.

However, wouldn’t it be nice to have a DBMS that can handle both, OLAP and OLTP workloads well?

Marrying both worlds

OLAP databases work best with columnar storage, which allows for large data sets to be stored and accessed quickly. This makes it ideal for data analysis and reporting. On the other hand, OLTP databases are optimized for record-wise data storage, which is better suited for read/write operations.

The ideal DBMS should be able to handle both OLAP and OLTP workloads well. It should be able to store and access data quickly and efficiently, while also being able to handle the read/write operations associated with OLTP workloads. It should also provide a robust set of features for data analysis and reporting.

So we have two kinds of data in the database:

  • A large set of old records where we want to do analysis on (called main storage) stored in the OLAP-optimized columnar storage
  • A small set of recent changes that came from the OLTP workloads (called delta storage) stored as a OLTP-optimized row storage

While the main storage can be compressed and optimized for read operations, the delta storage must be optimized for write operations.

So, the main storage is a column store while delta storage is a row store.

Whenever there is enough „old“ data in the delta storage, the main storage can be rebuilt from main storage + delta storage and delta storage can start empty again.

Of course, queries on the database have to consider both: delta and main storage.

See also

Shards, RecordIDs, Main Storage, Delta Storage