<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.memcp.org/index.php?action=history&amp;feed=atom&amp;title=Dictionary_Compression</id>
	<title>Dictionary Compression - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.memcp.org/index.php?action=history&amp;feed=atom&amp;title=Dictionary_Compression"/>
	<link rel="alternate" type="text/html" href="https://www.memcp.org/index.php?title=Dictionary_Compression&amp;action=history"/>
	<updated>2026-04-24T16:47:46Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://www.memcp.org/index.php?title=Dictionary_Compression&amp;diff=213&amp;oldid=prev</id>
		<title>Carli: Created page with &quot;Strings are the nightmare of every database. You never know their exact size and you have to either reference them in a separate storage or reserve enough character space to store them in-table.  Columnar databases can do better. But that&#039;s not guaranteed, you have to put a bit of extra effort into it. Here&#039;s how:  At first, memcp converts strings into dictionaries. If you have a large list consisting of [Male, Male, Male, Female, Male, Female, Male, Male], you only have...&quot;</title>
		<link rel="alternate" type="text/html" href="https://www.memcp.org/index.php?title=Dictionary_Compression&amp;diff=213&amp;oldid=prev"/>
		<updated>2025-08-22T19:06:13Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;Strings are the nightmare of every database. You never know their exact size and you have to either reference them in a separate storage or reserve enough character space to store them in-table.  Columnar databases can do better. But that&amp;#039;s not guaranteed, you have to put a bit of extra effort into it. Here&amp;#039;s how:  At first, memcp converts strings into dictionaries. If you have a large list consisting of [Male, Male, Male, Female, Male, Female, Male, Male], you only have...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Strings are the nightmare of every database. You rarely know their exact size in advance, so you have to either keep them in separate storage and reference them, or reserve enough character space to store them in-table.&lt;br /&gt;
&lt;br /&gt;
Columnar databases can do better, but not automatically: it takes a bit of extra effort. Here&amp;#039;s how:&lt;br /&gt;
&lt;br /&gt;
First, memcp converts string columns into dictionaries. Take a large list such as [Male, Male, Male, Female, Male, Female, Male, Male]. It contains only two distinct values. The dictionary stores each of them once (&amp;quot;Male,Female&amp;quot;), and the column itself becomes a list of integer references into it: (0,0,0,1,0,1,0,0).&lt;br /&gt;
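&lt;br /&gt;
The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not how memcp implements it. The helpers dict_encode and dict_decode are hypothetical names chosen for this example.&lt;br /&gt;
&lt;br /&gt;
```python
# Minimal sketch of dictionary encoding a string column.
# dict_encode / dict_decode are hypothetical helpers, not memcp's API.

def dict_encode(values):
    """Return (dictionary, codes): each distinct string once, plus one index per row."""
    dictionary = []   # distinct values in order of first appearance
    index_of = {}     # maps a string to its position in the dictionary
    codes = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


column = ["Male", "Male", "Male", "Female", "Male", "Female", "Male", "Male"]
dictionary, codes = dict_encode(column)
print(dictionary)  # ['Male', 'Female']
print(codes)       # [0, 0, 0, 1, 0, 1, 0, 0]
assert dict_decode(dictionary, codes) == column
```
&lt;br /&gt;
Decoding is a plain array lookup per row, which is why dictionary-encoded columns stay cheap to scan.&lt;br /&gt;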
&lt;br /&gt;
These integers can then be [[Integer Compression|integer-compressed]]. If that is still not enough, [[Sequence Compression]] can be added on top.&lt;br /&gt;
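&lt;br /&gt;
Below is a minimal sketch of the integer step. It assumes fixed-width bit-packing, where each dictionary index gets the smallest number of bits that can address every entry. The helpers bit_width, pack_codes and unpack_codes are hypothetical. The Integer Compression in memcp may choose widths and layouts differently, and Sequence Compression is not shown here.&lt;br /&gt;
&lt;br /&gt;
```python
# Minimal sketch of fixed-width bit-packing of dictionary codes.
# Hypothetical helpers for illustration only; not memcp's actual layout.

def bit_width(dictionary_size):
    """Bits needed to store indices 0 .. dictionary_size - 1 (at least 1)."""
    return max(1, (dictionary_size - 1).bit_length())


def pack_codes(codes, width):
    """Pack each code into `width` bits, first code in the lowest bits."""
    packed = 0
    for i, code in enumerate(codes):
        packed += code * 2 ** (i * width)   # place code i at bit offset i * width
    nbytes = (len(codes) * width + 7) // 8  # round up to whole bytes
    return packed.to_bytes(nbytes, "little")


def unpack_codes(data, width, count):
    """Inverse of pack_codes: read `count` codes of `width` bits each."""
    packed = int.from_bytes(data, "little")
    modulus = 2 ** width
    return [(packed >> (i * width)) % modulus for i in range(count)]


codes = [0, 0, 0, 1, 0, 1, 0, 0]
width = bit_width(2)              # two distinct values need 1 bit each
data = pack_codes(codes, width)
print(width, len(data))           # 1 1  (eight 1-bit codes fit in one byte)
assert unpack_codes(data, width, len(codes)) == codes
```
&lt;br /&gt;
With three dictionary entries, bit_width returns 2, so every row costs 2 bits regardless of how long the strings themselves are.&lt;br /&gt;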
&lt;br /&gt;
For columns with few distinct values, dictionary compression can achieve compression ratios of 10x and higher.&lt;br /&gt;
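&lt;br /&gt;
To see why such ratios are plausible, here is a back-of-the-envelope calculation for a hypothetical column. The column repeats the Male/Female list above 7,680 times, giving 61,440 rows. The sketch makes two assumptions. First, the raw size is only the UTF-8 bytes of each string, with no length prefixes or padding, so it understates real uncompressed storage. Second, the encoded size is the dictionary plus one bit-packed code per row.&lt;br /&gt;
&lt;br /&gt;
```python
# Back-of-the-envelope compression ratio for a hypothetical low-cardinality column.
# Assumptions: raw size = UTF-8 bytes of every string (no prefixes or padding);
# encoded size = dictionary bytes + fixed-width bit-packed codes.

def raw_size_bytes(values):
    return sum(len(v.encode("utf-8")) for v in values)


def dict_size_bytes(values):
    distinct = list(dict.fromkeys(values))          # distinct values, first-seen order
    width = max(1, (len(distinct) - 1).bit_length())
    dictionary_bytes = raw_size_bytes(distinct)
    code_bytes = (len(values) * width + 7) // 8
    return dictionary_bytes + code_bytes


pattern = ["Male", "Male", "Male", "Female", "Male", "Female", "Male", "Male"]
column = pattern * 7680                             # 61440 rows, hypothetical data
raw = raw_size_bytes(column)
encoded = dict_size_bytes(column)
print(raw, encoded, round(raw / encoded, 1))        # 276480 7690 36.0
```
&lt;br /&gt;
Columns with many distinct or long unique strings benefit far less, because the dictionary itself then dominates the size.&lt;br /&gt;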
&lt;br /&gt;
Here&amp;#039;s an example output of &amp;lt;code&amp;gt;(print (stat schema table))&amp;lt;/code&amp;gt; for one shard:&lt;br /&gt;
 Shard 6254&lt;br /&gt;
 ---&lt;br /&gt;
 main count: 61440, delta count: 0, deletions: 0&lt;br /&gt;
  mode: &amp;#039;&amp;#039;&amp;#039;string-dict[1 entries; 5 bytes]&amp;#039;&amp;#039;&amp;#039;, size = 7.833KiB&lt;br /&gt;
  cntin: seq[176x int[20]/int[20]], size = 1.477KiB&lt;br /&gt;
  cntout: seq[187x int[6]/int[7]], size = 968B&lt;br /&gt;
  duration: int[15], size = 112.6KiB&lt;br /&gt;
  filters: seq[161x int[1]/int[2]], size = 680B&lt;br /&gt;
  append: seq[176x int[6]/int[7]], size = 928B&lt;br /&gt;
  p: &amp;#039;&amp;#039;&amp;#039;string-dict[3 entries; 43 bytes]&amp;#039;&amp;#039;&amp;#039;, size = 15.36KiB&lt;br /&gt;
  ---&lt;br /&gt;
  ---&lt;br /&gt;
  + insertions 40B&lt;br /&gt;
  + deletions 48B&lt;br /&gt;
  ---&lt;br /&gt;
 = total 140KiB&lt;br /&gt;
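&lt;br /&gt;
The per-row cost of the columns above can be checked with a little arithmetic. This sketch assumes KiB means 1024 bytes and that the printed sizes are rounded, so the results are approximate. It uses only numbers from the output above. The duration column coming out near 15 bits per row suggests that int[15] denotes 15-bit values, but that reading is an assumption.&lt;br /&gt;
&lt;br /&gt;
```python
import math

ROWS = 61440  # main count of shard 6254 in the output above


def bits_per_row(size_kib, rows=ROWS):
    """Convert a printed column size (KiB, assumed 1024 bytes) to bits per row."""
    return size_kib * 1024 * 8 / rows


p_bits = bits_per_row(15.36)         # column p: string-dict with 3 entries
duration_bits = bits_per_row(112.6)  # column duration: int[15]
print(round(p_bits, 2), math.ceil(math.log2(3)))  # 2.05 2
print(round(duration_bits, 2))                    # 15.01
```
&lt;br /&gt;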
As you can see, column p stores a large amount of data (61,440 items) in just 15.36 KiB using a 3-entry dictionary. That is about 2 bits per item, exactly the fixed width needed to tell three values apart!&lt;/div&gt;</summary>
		<author><name>Carli</name></author>
	</entry>
</feed>