Temporary Computed Columns: Difference between revisions
(Created page with "Most (or better: all) RDBMS know the concept of so-called „temporary tables“. These tables act as a buffer when complex computations or joins are made and the result has to be sorted afterwards. Temporary tables are also a big loss of performance in daily workloads: Let’s say a table consists of 1,000,000 items and you display them with ORDER BY … LIMIT 100. Because auf some complex join, the 1,000,000 items spanning sorted temptable has to be rendered first and...") |
m (Carli moved page Temporary Columns to Temporary Computed Columns) |
(No difference)
|
Latest revision as of 21:49, 17 May 2024
Most (or better: all) RDBMS know the concept of so-called „temporary tables“. These tables act as a buffer when complex computations or joins are made and the result has to be sorted afterwards.
Temporary tables are also a big loss of performance in daily workloads: Let’s say a table consists of 1,000,000 items and you display them with ORDER BY … LIMIT 100. Because auf some complex join, the 1,000,000 items spanning sorted temptable has to be rendered first and then cut to size. This may take minutes or hours instead of milliseconds.
MariaDB has a mechanism to cache those intermediate results, but this storage gets full very fast. The reason is that the whole table data has to be copied into the temp table.
Another drawback of temp table caches is their restricted condition scope. Only if the exact same query is executed repeatedly, a temp table can be reused. As soon as a WHERE clause changes, a new temp table will occupy the storage.
MemCPs temporary columns now solve both these issues.
Temporary columns use less memory since they only have to store the values of a single column (e.g. a SUM(x) over a partition). The rest of the columns can be reused from the base table where the temporary column is stored.
When there are more than 60,000 items, the computing is also parallelized.
Temporary columns can also be reused in other queries. Not only with exactly matching WHERE clauses, also with WHERE clauses that overlap each other in a BETWEEN AND range. That is why MemCP names temporary columns after there computation formula, so the column name is canonical.
Implementation of GROUP BY through Temporary Columns
As of April 13, MemCP now supports GROUP BY statements in SQL:
Your MySQL connection id is 3 Server version: MemCP linux Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MySQL [test]> create table t(a text, b int); Query OK, 0 rows affected (0,004 sec) MySQL [test]> insert into t values ('foo',1),('bar',2),('foo',3); Query OK, 0 rows affected (0,010 sec) MySQL [test]> select * from t; +------+------+ | a | b | +------+------+ | foo | 1 | | bar | 2 | | foo | 3 | +------+------+ 3 rows in set (0,001 sec) MySQL [test]> select a, sum(b) from t group by a; +------+-------------------+ | a | (aggregate b + 0) | +------+-------------------+ | foo | 4 | | bar | 2 | +------+-------------------+ 2 rows in set (0,004 sec) MySQL [test]>
The current implementation works as follows:
- Every set of GROUP BY is assigned a table containing a UNIQUE KEY over all grouped columns.
- At first, fill in the unique values of the group keys
- Then, create all aggregate columns as so-called „temporary computed columns“
- At last, just scan over the assigned table and display all desired columns