In the last post, I talked about how the simdcomp library is actually just doing bit-packing. Given a set of numbers, it will store each of them using just N bits, in a packed format. That is all. On its own, that isn’t very useful; we have to pair it with something else to make it do something more interesting.
Previously, I talked about delta compression. What if we put both of those together?
Given a sorted list of numbers, we’ll compute the deltas between consecutive numbers, figure out how many bits we need to store them, and pack them tightly.
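A minimal scalar sketch of those two steps (all the names here are mine, for illustration, not from simdcomp): delta-encode a sorted list in place, then find the width needed by the largest delta in each batch:

```c
#include <stdint.h>
#include <stddef.h>

// Number of bits needed to represent v (0 needs 0 bits).
static uint32_t bits_needed(uint32_t v) {
    uint32_t bits = 0;
    while (v) { bits++; v >>= 1; }
    return bits;
}

// In-place delta encoding of a sorted list: each entry becomes the
// difference from its predecessor (the first entry stays as-is).
static void delta_encode(uint32_t *list, size_t n) {
    for (size_t i = n; i > 1; i--)
        list[i - 1] -= list[i - 2];
}

// Max bit width required to store any delta in one batch.
static uint32_t batch_bits(const uint32_t *batch, size_t len) {
    uint32_t max = 0;
    for (size_t i = 0; i < len; i++)
        if (batch[i] > max) max = batch[i];
    return bits_needed(max);
}
```

With the real data, you would run `batch_bits()` over each group of 128 deltas to pick that batch’s width.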
Analyzing the sample posting list I’m using, we get the following batches (of 128 numbers each):
328 batches at 13 bits
11 batches at 22 bits
7 batches at 23 bits
3 batches at 27 bits
1 batch at 31 bits
1 batch at 28 bits
Doing the math ((328×13 + 11×22 + 7×23 + 3×27 + 1×31 + 1×28) × 128 = 615,296 bits), we can fit all of those in roughly 77KB.
The idea is that for each batch, we can use simdpack() to write just the relevant bits. We’ll need some extra metadata to keep track of the number of bits per batch, but that is just one byte per 128 values. For that matter, we can bit-pack that as well.
Note that in this case, we are using the simd packing routines just for bit packing. The actual compression aspect comes not from the packing, but from computing the deltas and then figuring out how densely we can pack those values.
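Staying with scalar code for clarity (the real implementation would call simdpack() per batch), here is a rough sketch of the packing side of that idea; pack_bits() and unpack_one() are my names, not the library’s:

```c
#include <stdint.h>
#include <stddef.h>

// Scalar stand-in for simdpack(): append n values of `bits` bits each
// to a byte buffer via a simple bit cursor. Returns the new bit position.
// The output buffer must be zero-initialized.
static size_t pack_bits(const uint32_t *in, size_t n, uint32_t bits,
                        uint8_t *out, size_t bitpos) {
    for (size_t i = 0; i < n; i++) {
        for (uint32_t b = 0; b < bits; b++, bitpos++) {
            if ((in[i] >> b) & 1)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
        }
    }
    return bitpos;
}

// Read a single `bits`-wide value back, starting at bit position bitpos.
static uint32_t unpack_one(const uint8_t *buf, size_t bitpos, uint32_t bits) {
    uint32_t v = 0;
    for (uint32_t b = 0; b < bits; b++, bitpos++)
        if ((buf[bitpos / 8] >> (bitpos % 8)) & 1)
            v |= 1u << b;
    return v;
}
```

The per-batch width byte mentioned above would be written alongside this buffer, so the reader knows how many bits to pull per value.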
This approach is called Frame Of Reference (FOR), and it has some really nice advantages. By splitting the data into blocks, we can take advantage of the patterns in the data. Here are the first 16 deltas in the posting list:
Index   Value        Bits
[0]     1871143144   31
[1]     4            3
[2]     4            3
[3]     4            3
[4]     4            3
[5]     4            3
[6]     4            3
[7]     4            3
[8]     4            3
[9]     4            3
[10]    4            3
[11]    7984         13
[12]    4            3
[13]    4            3
[14]    4            3
[15]    4            3
Let’s assume that the batch size is 4 numbers. What would the bit widths be here? Basically, we take the maximum number of bits needed by each batch of 4 numbers.
The first block will require us to use 31 bits, the second needs just 3 bits, the third requires 13 bits, and the last is 3 bits again.
The idea is that each block has a separate frame of reference: the number of bits that are required for us to record & retrieve its values.
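To make that concrete, here is a small sketch (the helper name is mine) that computes the per-block widths for the 16 sample deltas with a block size of 4; for the sample data it reports 31, 3, 13, and 3 bits:

```c
#include <stdint.h>
#include <stddef.h>

// For each block of `block` deltas, record the max bit width needed:
// that width is the block's frame of reference.
static void block_widths(const uint32_t *deltas, size_t n,
                         size_t block, uint8_t *widths) {
    for (size_t i = 0; i < n; i += block) {
        uint32_t max = 0;
        for (size_t j = i; j < i + block && j < n; j++)
            if (deltas[j] > max) max = deltas[j];
        uint8_t bits = 0;
        while (max) { bits++; max >>= 1; }
        widths[i / block] = bits;
    }
}
```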
Of course, if you look at the data itself, we are wasting quite a lot of space here needlessly. There are just two values (at indexes 0 and 11) that require more than 3 bits.
There is an algorithm called Patched Frame Of Reference (PFor) that handles this. Basically, when we compute a batch, we’ll record it using only 3 bits in this case. What about the rest of the values? Those we’ll store outside the batch, as exceptions. On reading, we’ll need to patch the data to make it whole.
That adds a bit of complexity, but it also means that you can compress the data a lot more densely.
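Here is a deliberately simplified sketch of the patching idea (my illustration of the concept, not the actual PFor encoding, which packs both the inline slots and the exceptions far more cleverly):

```c
#include <stdint.h>
#include <stddef.h>

#define BASE_BITS 3 // the width chosen for this example batch

typedef struct {
    size_t   pos;   // index of the exceptional value
    uint32_t value; // the full value, stored out of band
} Exception;

// Values that fit in BASE_BITS stay inline; anything larger becomes an
// exception (its inline slot is left as a 0 placeholder).
// Returns the number of exceptions recorded.
static size_t pfor_encode(const uint32_t *in, size_t n,
                          uint32_t *inline_vals, Exception *exceptions) {
    size_t nexc = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < (1u << BASE_BITS)) {
            inline_vals[i] = in[i];
        } else {
            inline_vals[i] = 0;
            exceptions[nexc].pos = i;
            exceptions[nexc].value = in[i];
            nexc++;
        }
    }
    return nexc;
}

// Decode: copy the inline values, then patch the exceptions back in.
static void pfor_decode(const uint32_t *inline_vals, size_t n,
                        const Exception *exceptions, size_t nexc,
                        uint32_t *out) {
    for (size_t i = 0; i < n; i++) out[i] = inline_vals[i];
    for (size_t e = 0; e < nexc; e++)
        out[exceptions[e].pos] = exceptions[e].value;
}
```

For the 16 sample deltas this yields just two exceptions (indexes 0 and 11), while the other 14 values pack at 3 bits each.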
In the next post, I’m going to be talking about FastPFor, which combines simd bit packing, patched frames of reference and some really clever tricks to get a very interesting integer compression algorithm.
In the previous post, I showed how to compress integers using variable-size integers. That is a really straightforward approach for integer compression, but it isn’t ideal. To start with, it means that we take at least one byte for each number, but in many cases, especially if we are using delta encoding, we can pack things a lot more tightly. For reference, here are the first 15 records in the posting list I’m using for testing, delta encoded:
1871143144 4 4 4 4 4 4 4 4 4 4 7984 4 4 4
See the run of 4s repeating? Each of them can be represented in 3 bits, but we’ll waste 32–64 bits to hold it.
When you are searching for material on “database stuff”, the building blocks that are used to build databases, you’ll usually find Lemire there. In this case, I would like to talk about the SimdComp library. From the GitHub repository:
A simple C library for compressing lists of integers using binary packing and SIMD instructions. This library can decode at least 4 billion compressed integers per second on most desktop or laptop processors.
Those are some really nice numbers, but I have a pet peeve with the name of the library. It isn’t actually doing compression; what it does is bit packing, which is something quite different.
Bit packing just takes numbers that are 32 bits in length and stores them using fewer bits. For example, consider the following list: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Stored as 32-bit integers, that would take 40 bytes. But given that the maximum value is 10, I can use 4-bit integers instead, taking only 5 bytes.
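That arithmetic is easy to demonstrate with a sketch that packs two 4-bit values per byte (the helper names are mine):

```c
#include <stdint.h>

// Pack n 4-bit values into n/2 bytes, two nibbles per byte:
// even indexes go in the low nibble, odd indexes in the high nibble.
static void pack_nibbles(const uint8_t *vals, int n, uint8_t *out) {
    for (int i = 0; i < n; i += 2)
        out[i / 2] = (uint8_t)(vals[i] | (vals[i + 1] << 4));
}

// Retrieve the i-th 4-bit value from the packed buffer.
static uint8_t unpack_nibble(const uint8_t *buf, int i) {
    return (i % 2 == 0) ? (buf[i / 2] & 0x0F) : (uint8_t)(buf[i / 2] >> 4);
}
```

Ten values land in exactly 5 bytes, as described above.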
The core of the library is this set of routines:
simdpack()
simdpackwithoutmask()
simdunpack()
All of them have roughly the same structure, something like this:
This switch statement goes on for all 32 bit widths, each time selecting a different function. This code makes a lot of assumptions: you always give it an input of exactly 128 numbers, and it will pack them and write them to the output. The idea is to be as compact as you possibly can.
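I can’t reproduce the library’s code here, but the shape of it can be sketched in scalar form (reduced to 32 inputs instead of 128, with hypothetical pack1/pack2 helpers standing in for the __SIMD_fastpackN_32 family):

```c
#include <stdint.h>

// Hypothetical 1-bit packer: 32 values collapse into a single word.
static void pack1(const uint32_t *in, uint32_t *out) {
    out[0] = 0;
    for (int i = 0; i < 32; i++) out[0] |= (in[i] & 1u) << i;
}

// Hypothetical 2-bit packer: 32 values fill two words, 16 per word.
static void pack2(const uint32_t *in, uint32_t *out) {
    out[0] = out[1] = 0;
    for (int i = 0; i < 32; i++)
        out[i / 16] |= (in[i] & 3u) << ((i % 16) * 2);
}

// The dispatch switch: pick the specialized routine for the bit width.
// The real simdpack() does the same over 128 inputs, with 32 cases.
static void pack_dispatch(const uint32_t *in, uint32_t *out, uint32_t bit) {
    switch (bit) {
    case 1: pack1(in, out); break;
    case 2: pack2(in, out); break;
    // ... cases 3 through 32 follow the same pattern
    }
}
```

A big switch over specialized, fully unrolled functions is what lets the compiler keep every shift amount a constant.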
Reading this code is… a challenge, because there is so much mental weight behind it, or so it seems. For example, take a look at how we pack a set of numbers into 2-bit integers:
```c
static void __SIMD_fastpack2_32(const uint32_t *_in, __m128i *out) {
    const __m128i *in = (const __m128i *)(_in);
    __m128i OutReg;
    const __m128i mask = _mm_set1_epi32((1U << 2) - 1);

    __m128i InReg = _mm_and_si128(_mm_loadu_si128(in), mask);
    OutReg = InReg;

    InReg = _mm_and_si128(_mm_loadu_si128(++in), mask);
    OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 2));

    InReg = _mm_and_si128(_mm_loadu_si128(++in), mask);
    OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 4));

    InReg = _mm_and_si128(_mm_loadu_si128(++in), mask);
    // redacted
```
The actual code goes on for quite a while (104 lines of dense SIMD code), so let’s try to break it up a bit. If we translate this from SIMD to scalar, we’ll have code like this:
```c
// Scalar equivalent of the 2-bit packing above (in points at the input).
uint32_t mask = (1u << 2) - 1;
uint32_t input = *in++ & mask;
uint32_t output = input;
input = *in++ & mask;
output |= input << 2;
input = *in++ & mask;
output |= input << 4;
```
If you are used to manipulating bits, it’s fairly straightforward. We start by setting the output to the first value, then the second value is shifted by 2 bits and ORed into it, then the third value is shifted by 4 bits, etc.
With SIMD, the interesting part is that we aren’t operating on a single value at a time, but on 4. That also means there is a difference in how the data actually ends up being packed.
Given a list of values from 1 .. 16, bit packing them will usually result in the data looking like this:
In contrast, the SIMD bit packing model will store the data like so:
This is a result of operating on 4 items at a time, but in practice, the actual placement of the bits does not matter much, just that we are able to store and retrieve them.
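We can simulate that difference in scalar code. With 4 lanes, value i lands in lane i % 4, so lane 0 ends up holding values 0, 4, 8, and 12 of the batch. The exact bit placement in this sketch is illustrative, not necessarily the library’s actual layout, but the round-trip property is the same:

```c
#include <stdint.h>

// Lane-interleaved 4-bit packing of 16 values, mimicking 4 SIMD lanes:
// lane L holds values L, L+4, L+8, L+12, packed within that lane's word.
static void pack4_lanes(const uint32_t *in, uint32_t *out) {
    for (int lane = 0; lane < 4; lane++) {
        out[lane] = 0;
        for (int k = 0; k < 4; k++)
            out[lane] |= (in[lane + 4 * k] & 0xFu) << (4 * k);
    }
}

// Retrieve the i-th value: pick the lane (i % 4), then the slot (i / 4).
static uint32_t unpack4_lanes(const uint32_t *packed, int i) {
    return (packed[i % 4] >> (4 * (i / 4))) & 0xFu;
}
```

Scalar packing would instead write value i directly after value i-1; either way, every value comes back out intact.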
Note that for packing, we have two options, one that uses a mask and the other that does not. That will be important later. The only difference between those options is whether a mask is applied before we splice the bits into the output, nothing else.
The reason I said that this isn’t compression is that this library is focused solely on packing the bits; it has no concept of the data beyond that stage. In the next post, we’ll start looking into how we can make use of this in more interesting ways.
If you are building a database, the need to work with a list of numbers is commonplace. For example, when building an index, we may want to store the ids of all the documents for orders from Europe.
You can just store the raw values, of course, but that can be incredibly wasteful in terms of disk space, memory bandwidth and performance. Consider a database where millions of orders are from Europe. We’ll need multiple megabytes just to store the document ids. We can utilize patterns in the numbers to reduce the total storage size. For example, if you are storing a list of sorted numbers, you can store the deltas between the numbers, instead of the whole values. The deltas tend to be much smaller, after all. Take a look at the following code:
```csharp
static void DeltaEncode(List<long> sortedList)
{
    if (sortedList.Count == 0)
        return;

    long prev = sortedList[0];
    for (int i = 1; i < sortedList.Count; i++)
    {
        long current = sortedList[i];
        sortedList[i] = current - prev;
        prev = current;
    }
}

static void WriteListToStream(List<long> deltas, Stream outputStream)
{
    BinaryWriter writer = new BinaryWriter(outputStream);
    foreach (var num in deltas)
    {
        writer.Write7BitEncodedInt64(num);
    }
}
```
This will compute (in place) the deltas for the sorted list. We can then write them to a stream, using 7-bit integer (also called variable-size integer) encoding. This way, smaller numbers take less space, which is the point of doing delta encoding first.
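Write7BitEncodedInt64() uses the usual LEB128-style format: seven data bits per byte, with the high bit set as a continuation flag. A C sketch of the same encoding (the function names are mine):

```c
#include <stdint.h>
#include <stddef.h>

// 7-bit varint: low 7 bits of data per byte, high bit set means
// "more bytes follow". Returns the number of bytes written.
static size_t varint_write(uint64_t v, uint8_t *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

// Read one varint starting at *pos, advancing *pos past it.
static uint64_t varint_read(const uint8_t *in, size_t *pos) {
    uint64_t v = 0;
    int shift = 0;
    uint8_t b;
    do {
        b = in[(*pos)++];
        v |= (uint64_t)(b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);
    return v;
}
```

A delta of 4 takes a single byte, 7984 takes two, while the initial full value 1871143144 still needs five; this is where the delta-encoded savings come from.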
For testing purposes, I have a list of 44,956 items, which looks like this (full list is here):
1871143144
1871143148
1871143152
1871143156
1871143160
1871143164
1871143168
1871143172
1871143176
1871143180
Note that I’m using int64, so if we stored the data as is, it would take ~351KB. Using delta encoding, we can pack that into just ~45KB.
I care a lot more about the decoding of the values, so let’s see how we read it back, shall we?
```csharp
static List<long> ReadListFromStream(Stream inputStream)
{
    var nums = new List<long>();
    var reader = new BinaryReader(inputStream);
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        nums.Add(reader.Read7BitEncodedInt64());
    }
    return nums;
}

static void DeltaDecode(List<long> list)
{
    for (int i = 1; i < list.Count; i++)
    {
        list[i] = list[i - 1] + list[i];
    }
}
```
With very little code, we gain quite a jump in information density. That is quite good. For reference purposes, taking the same raw data and compressing it using GZip would cost us ~58KB, significantly more than this simpler mechanism.
On the other hand, using GZip on the delta-encoded list will put us under 8KB. This is because, in this context, there is a great deal of repetition in the data: there are many runs that are exactly 4 values apart from one another, so we can get quite a good compression ratio out of this.
That does require patterns in the data, which is by no means a guarantee. There is a whole field of research into integer compression, since it is such an important aspect of how database internals work.
In this series, I’m going to focus on detailing how we implemented FastPFor inside of RavenDB, what we learned along the way and what the final results were.