Relatively General .NET

Integer compression

by Oren Eini

posted on: June 13, 2023

FastPFor is an integer compression algorithm that was first published in 2012. You can read the paper about it here: Decoding billions of integers per second through vectorization. I’ve run into this algorithm many times in the past. You pretty much can’t be in the database arena and not run into it. It is an interesting paper, and it has a GitHub repository with the code, which is great. Except that I couldn’t figure out what was going on there. I actually tried stepping through the code a bunch of times, and I always ended up getting lost. The code is in C++ and makes heavy use of templates, containers and non-trivial amounts of magic. To give some context, I gave up when I ran into these code snippets:

/*assumes that integers fit in the prescribed number of bits */
void __fastpackwithoutmask6(const uint32_t *__restrict__ in, uint32_t *__restrict__ out) {
  Unroller<6>::PackNoMask(in, out);
}

/*assumes that integers fit in the prescribed number of bits */
void __fastpackwithoutmask7(const uint32_t *__restrict__ in, uint32_t *__restrict__ out) {
  Unroller<7>::PackNoMask(in, out);
}

static void __SIMD_fastpackwithoutmask7_32(const uint32_t *__restrict__ _in, __m128i *__restrict__ out) {
  const __m128i *in = reinterpret_cast<const __m128i *>(_in);
  __m128i OutReg;
  __m128i InReg = _mm_loadu_si128(in);
  OutReg = InReg;
  InReg = _mm_loadu_si128(++in);
  OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 7));
  InReg = _mm_loadu_si128(++in);
  OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 14));
  InReg = _mm_loadu_si128(++in);
  OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 21));
  InReg = _mm_loadu_si128(++in);
  OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 28));
  _mm_storeu_si128(out, OutReg);
  ++out;
  OutReg = _mm_srli_epi32(InReg, 7 - 3);
  InReg = _mm_loadu_si128(++in);
  // this method goes on for another 100 lines...
}

The paper itself describes the algorithm, but not in a way that actually made sense to me. I tried looking at the code, and I basically noped out. Too complex for me to understand, it seems. But I kept running into this, and we recently had a strong need for something similar. So I basically took the time to meditate on this for a while. On a personal level, I realized that I was trying to understand how the whole thing worked by just stepping through the code and reading the paper. The problem was that I was mixing several different layers and wasn’t able to create the proper separation between them. FastPFor builds upon the previously discussed simdcomp; once you understand that, you can go through this in isolation. For this reason, I first talked about SIMD bit-packing and showed an example usage, before actually diving into how FastPFor itself works. As a reminder, simdcomp provides routines for quick packing and unpacking of integers, nothing more. FastPFor is where the actual compression happens. Let’s see what is actually going on here. Consider the following list, which we previously discussed:

1871143144 4 4 4 4 4 4 4 4 4 4 7984 4 4 4 4

If we try to bit-pack this list, we’ll find out that we aren’t gaining much. The first value is 31 bits in length, after all, and that means that we aren’t able to save much.
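As a quick aside, here is a small C# sketch of how to compute the bit length of each value; it mirrors the asmbits()/LeadingZeroCount() trick we’ll meet shortly:

using System.Numerics;

// 1871143144 needs 31 bits, the 4s need just 3, and 7984 needs 13.
static int BitsNeeded(uint value) =>
    value == 0 ? 0 : 32 - BitOperations.LeadingZeroCount(value);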
We talked about Frame of Reference (FOR) as a way to handle that. We can treat every block of 128 numbers as a block that has its own reference point and compute the maximum number of bits for that location. That would actually save us a lot of space, but it isn’t ideal. In the case above, most entries in the list can be packed in just 3 bits, except 2. That is where PFor comes into play, which stands for Patched Frame of Reference. The key aspects of FastPFor are how it figures out what is the best size to use to pack the numbers, the way it detects what should be patched, and the manner in which it stores this. The code boils down to a single function:

void getBestBFromData(const uint32_t *in, uint8_t &bestb, uint8_t &bestcexcept, uint8_t &maxb) {
  uint32_t freqs[33];
  for (uint32_t k = 0; k <= 32; ++k)
    freqs[k] = 0;
  for (uint32_t k = 0; k < BlockSize; ++k) {
    freqs[asmbits(in[k])]++;
  }
  bestb = 32;
  while (freqs[bestb] == 0)
    bestb--;
  maxb = bestb;
  uint32_t bestcost = bestb * BlockSize;
  uint32_t cexcept = 0;
  bestcexcept = static_cast<uint8_t>(cexcept);
  for (uint32_t b = bestb - 1; b < 32; --b) {
    cexcept += freqs[b + 1];
    uint32_t thiscost = cexcept * overheadofeachexcept + cexcept * (maxb - b) + b * BlockSize + 8; // the extra 8 is the cost of storing maxbits
    if (thiscost < bestcost) {
      bestcost = thiscost;
      bestb = static_cast<uint8_t>(b);
      bestcexcept = static_cast<uint8_t>(cexcept);
    }
  }
}

What is going on here? This function accepts an array of integers (whose size is 128) and computes the best number of bits to use to pack it. How does that work? The first thing it does is compute how many numbers we have for each bit range. This is done using the asmbits() call, which is basically a LeadingZeroCount(). The end result is a frequency map that we can use to estimate how many bits will be required to store the items from this block. We start from the maximum number of bits that we need to store all the numbers in the block, and we go down, checking what would be the best cost (in terms of space). Since we need to store exceptions somewhere, we compute the cost of that as we go down in the list. This function gives us:

bestb – the number of bits needed to pack all the items in the block that would result in the smallest output size
bestcexcept – the count of exceptions (numbers that do not fit into bestb bits)
maxb – the maximum number of bits that we need to store for this block

That is the key function, the one that is responsible for getting the best compression ratio. What I found made FastPFor challenging to understand is that it has a lot of state that you need to keep track of. What I described so far is a single block, but FastPFor operates over sets of numbers, and so needs to tie all of them together. At any given point, FastPFor has:

The block that is being output
The metadata about the entire process
32 arrays of exceptions

The process interleaves all of them together in interesting ways, and I had a hard time figuring out how it all ties together. Let’s talk about the way FastPFor processes a single block.
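Since the series is heading toward C#, here is a rough, hedged C# translation of that function. The names are mine, and I’m assuming the reference value of 8 bits of metadata overhead per exception:

using System.Numerics;

const int BlockSize = 128;
const int OverheadOfEachExcept = 8; // assumed: one metadata byte per exception offset

static void GetBestB(ReadOnlySpan<uint> block, out int bestB, out int exceptions, out int maxB)
{
    // Frequency map: how many values need exactly k bits
    Span<int> freqs = stackalloc int[33];
    foreach (var v in block)
        freqs[v == 0 ? 0 : 32 - BitOperations.LeadingZeroCount(v)]++;

    bestB = 32;
    while (freqs[bestB] == 0)
        bestB--;
    maxB = bestB;

    int bestCost = bestB * BlockSize;
    int cExcept = 0;
    exceptions = 0;
    for (int b = bestB - 1; b >= 0; b--)
    {
        cExcept += freqs[b + 1];
        // Cost: an offset byte per exception, (maxB - b) high bits per exception,
        // b bits for each of the 128 values, plus 8 bits to store maxB itself
        int cost = cExcept * OverheadOfEachExcept + cExcept * (maxB - b) + b * BlockSize + 8;
        if (cost < bestCost)
        {
            bestCost = cost;
            bestB = b;
            exceptions = cExcept;
        }
    }
}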
We process the data in the array in blocks of 128 numbers at a time, like so:

for (const uint32_t *const final = in + length; (in + BlockSize <= final); in += BlockSize) {
  uint8_t bestb, bestcexcept, maxb;
  getBestBFromData(in, bestb, bestcexcept, maxb);
  *bc++ = bestb;
  *bc++ = bestcexcept;
  if (bestcexcept > 0) {
    *bc++ = maxb;
    std::vector<uint32_t, cacheallocator> &thisexceptioncontainer = datatobepacked[maxb - bestb];
    const uint32_t maxval = 1U << bestb;
    for (uint32_t k = 0; k < BlockSize; ++k) {
      if (in[k] >= maxval) { // we have an exception
        thisexceptioncontainer.push_back(in[k] >> bestb);
        *bc++ = static_cast<uint8_t>(k);
      }
    }
  }
  out = packblockupsimd(in, out, bestb);
}

The first thing that happens is the computation of the best bit width for the current block. Then, we use the bc value to record the metadata about the block. That is an array of bytes with the following format:

Packed number of bits
Number of exceptions

If there are exceptions for this block, we also add:

Maximum number of bits
Offset of each exception in the block (repeated as the number of exceptions)

The metadata is shared for the entire encoding operation, across blocks. You can see in the code that if bestcexcept (which holds the number of exceptions) is greater than zero, we find the appropriate exceptions buffer to use. That requires a better explanation. The getBestBFromData() call gave us two important data points: the best number of bits to pack the numbers, and the maximum number. We are going to pack all the numbers in the block to the output, but what about the exceptions? Where do they go? It turns out that we need to store the exceptions, but we don’t actually need to store max bits, only the difference between the best number of bits and the maximum. This is what thisexceptioncontainer is holding: the remaining bits of the exceptions. It’s important to understand that this is done across blocks. So the thisexceptioncontainer value holds exceptions from many different blocks. That will turn out to be very important in just a little bit. We then scan the block, write the remaining bits to the container, and write to the metadata the offset of the value in the block. Since we are using blocks of 128 numbers, this is guaranteed to fit inside a byte. The last step that we do for the block is to call packblockupsimd(), which ends up calling simdpack() and packs all the numbers from the block to bestb bits in the output. It’s really important to understand that we still have two data items that haven’t been written. The first is the metadata for the encoding process (bits in blocks, exception offsets, etc.). The second is the set of exceptions themselves. This process repeats itself for each block, until the end of the buffer.
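To make the per-block layout concrete, consider the sample block from the start of the post. This is a hedged walkthrough: only the first 16 of the 128 deltas are shown, and I’m assuming the remaining ones also fit in 3 bits:

// Block: 1871143144 4 4 4 4 4 4 4 4 4 4 7984 4 4 4 4 ...
// getBestBFromData() would yield: bestb = 3, maxb = 31, 2 exceptions.
//
// Metadata bytes appended for this block:
//   [ 3, 2, 31, 0, 11 ]   // bestb, exception count, maxb, offsets 0 and 11
//
// Exception container datatobepacked[31 - 3] (the 28-bit container) receives
// the high bits of each exception:
//   1871143144 >> 3 = 233892893
//         7984 >> 3 = 998
//
// Finally, the 128 low 3-bit parts are SIMD-packed into the output.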
It is at this point that we need to write the remaining data to the output. Here is what the code looks like:

const uint32_t bytescontainersize = static_cast<uint32_t>(bc - &bytescontainer[0]);
*(out++) = bytescontainersize;
memcpy(out, &bytescontainer[0], bytescontainersize);
uint8_t* pad8 = (uint8_t*)out + bytescontainersize;
out += (bytescontainersize + sizeof(uint32_t) - 1) / sizeof(uint32_t);
while (pad8 < (uint8_t*)out)
  *pad8++ = 0; // clear padding bytes
uint32_t bitmap = 0;
for (uint32_t k = 2; k <= 32; ++k) {
  if (datatobepacked[k].size() != 0)
    bitmap |= (1U << (k - 1));
}
*(out++) = bitmap;
for (uint32_t k = 2; k <= 32; ++k) {
  if (datatobepacked[k].size() > 0)
    out = packmeupwithoutmasksimd(datatobepacked[k], out, k);
}

What is going on? This is dense, and it takes a while to unpack (pun intended) what is going on here. First, we write the size of the metadata and then we copy the metadata that we have to the output. That is the description of all the blocks that we just went over. Then, we run over the set of exceptions that we gathered. Remember, datatobepacked is an array that holds lists of exception data for each bit range that we have to store. We iterate over all the bit widths where we wrote exception data and generate a bitmap. That will help us later during the decoding process. Finally, we run over the same exceptions and write them out. Note that we do that using the same SIMD bit-packing approach as the rest of the code. The end result is the following format: the packed blocks, then the metadata size and the metadata itself (padded to a 4-byte boundary), then the exceptions bitmap, followed by the packed exception arrays. For decoding, we start from the metadata offset and jump to that. Then we read the exceptions bitmap. That tells us which exception lists we have and how many bits are in each one of them. For example, we might have 3 exception lists, for 4 bits, 12 bits and 19 bits. We use the SIMD bit-packing to decode those values into memory. We’ll need them shortly. Now, we start iterating over the metadata; each block has an overhead of at minimum two metadata bytes (number of bits and number of exceptions). Let’s assume that we don’t have any exceptions in the block. In that case, the process of decoding is simple. Just do the unpacking from bits to numbers and you are done. If we have exceptions, on the other hand, we have to deal with them. At that point, the next metadata byte would contain the maximum number of bits for the exceptions in this block. Using that and the normal number of bits, we can tell where the extra bits are located.
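Sketching the decode side of that bitmap in C# (hedged: unpackArray is a stand-in for the SIMD unpacking of a single k-bit exception list):

// Bit (k - 1) of the bitmap is set when an array of k-bit exceptions exists.
static uint[][] ReadExceptionArrays(uint bitmap, Func<int, uint[]> unpackArray)
{
    var arrays = new uint[33][];
    for (int k = 2; k <= 32; k++)
    {
        if ((bitmap & (1u << (k - 1))) != 0)
            arrays[k] = unpackArray(k); // unpack the k-bit exception list
    }
    return arrays;
}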
Here is the relevant code:

for (uint32_t run = 0; run < nvalue / BlockSize; ++run, out += BlockSize) {
  const uint8_t b = *bytep++;
  const uint8_t cexcept = *bytep++;
  in = unpackblocksimd(in, out, b);
  if (cexcept > 0) {
    const uint8_t maxbits = *bytep++;
    if (maxbits - b == 1) {
      for (uint32_t k = 0; k < cexcept; ++k) {
        const uint8_t pos = *(bytep++);
        out[pos] |= static_cast<uint32_t>(1) << b;
      }
    } else {
      std::vector<uint32_t, cacheallocator>::const_iterator &exceptionsptr = unpackpointers[maxbits - b];
      for (uint32_t k = 0; k < cexcept; ++k) {
        const uint8_t pos = *(bytep++);
        out[pos] |= (*(exceptionsptr++)) << b;
      }
    }
  }
}

Note that there is a special case here if the difference between the block bits and the maximum number of bits is one. In that case, we don’t need to store the data; we already know what the value is then (it’s one, after all). So we can compute it once and set it in the output. For scenarios where we can’t tell from the bit width alone, we read the relevant exception array based on the difference between the bits in the block and the exception bits. That gives us an array that is shared across blocks. The idea is that the metadata contains the offset in the block, and we read from the relevant array one item at a time. This two-step process will result in us setting the right value and getting ready for the next call, which may be done in a different block. Note that the exception bits buffers only care about the number of bits, not where they come from. So two blocks, which have (bestb: 4, maxb: 7) and (bestb: 7, maxb: 10), will both go to the 3 bits container. Okay, this blog post is getting long enough, so I think that would be it. Hopefully, it will make it easier to understand exactly how the FastPFor format is working. I’m going to be talking more about FastPFor and the format implications in the next post, then move on to actually using that in C#.
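As a teaser of that C# move, here is a rough sketch of how the patching loop above might translate. The names and the cursor bookkeeping are mine, and this is only a sketch of the technique, not the actual port:

// Assumes `block` already holds the 128 values unpacked at b bits.
static int PatchBlock(Span<uint> block, ReadOnlySpan<byte> meta, uint[][] exceptionsByWidth, int[] cursors)
{
    int pos = 0;
    byte b = meta[pos++];
    byte exceptionCount = meta[pos++];
    if (exceptionCount > 0)
    {
        byte maxBits = meta[pos++];
        int diff = maxBits - b;
        for (int k = 0; k < exceptionCount; k++)
        {
            byte offset = meta[pos++];
            // diff == 1 means the high part is always exactly 1, so nothing
            // was stored for it; otherwise consume the shared diff-bit array
            uint high = diff == 1 ? 1u : exceptionsByWidth[diff][cursors[diff]++];
            block[offset] |= high << b;
        }
    }
    return pos; // number of metadata bytes consumed by this block
}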

Supporting legacy browsers and SameSite cookies without UserAgent sniffing in ASP.NET Core.

by Andrew Lock

posted on: June 13, 2023

In this post I explore one way to get ASP.NET Core Identity SameSite cookies working with both legacy and modern browsers…

Integer compression

by Oren Eini

posted on: June 12, 2023

I talked a bit before about the nature of bit packing and how the simdcomp library isn’t actually doing compression. Why do I care about that, then? Because the simdcomp library provides a very useful building block: a simple set of functions that allows you to very quickly pack and unpack bits. That is important, since those operations are fast enough that we can treat them as effectively free. Once that exists, we can start finding interesting usages for those. Let’s assume that we have the following functions:

Pack(Span<uint> input, Stream output, int bit);
Unpack(Stream input, Span<uint> output, int bit);

They get a set of numbers and the number of bits and encode or decode them as needed. Because those methods are fast, it means that we can play with various interesting designs. For example, let’s take dictionary compression. Assume that we have a set of values, such as country names. This looks like this:

[ "United States", "China", "Japan", "India", "United States", "Brazil", "Germany", "India", "France", "China", "Japan", "Brazil", … ]

This array contains thousands or millions of entries. We want to encode that in an efficient manner; how can we do that? Here is a simple way to utilize the speed of bit-packing to do something useful:

void Encode(List<string> countries, Stream output)
{
    var nums = new List<int>();
    var dic = new Dictionary<string, int>();
    foreach (var country in countries)
    {
        if (dic.TryGetValue(country, out var index) == false)
        {
            dic[country] = index = dic.Count;
        }
        nums.Add(index);
    }
    var bits = 32 - BitOperations.LeadingZeroCount((uint)dic.Count);
    var br = new BinaryWriter(output);
    br.Write(dic.Count);
    foreach (var country in dic.Keys)
    {
        br.Write(country);
    }
    br.Write(nums.Count);
    Pack(nums, output, bits);
}

What is going on here? We first iterate over the set of countries, finding the unique names and giving an index to each one of them. Along the way, we produce an integer array of the indexes of country names. Then we compute the maximum number of bits that we have to use to store those indexes. To the output, we write the number of unique country names, the names themselves and then the bit-packed set of indexes. Let’s assume that we have 30 unique values in this list, but the list itself is 5 million items in size. What is the actual cost here? We’ll assume an average country name of 8 bytes, to make the math easy. That means that storing the list of countries would cost us 38MB, if we did that using the most obvious approach. Using the code above, assuming 30 unique values, we’ll have 5 million values with 5 bits each, leading to a total cost of about 3MB, instead of 38MB. The nice thing about this approach is that once you have fast bit-packing routines, you can start using them in all sorts of interesting ways. The reason I talked about the topic before starting to discuss FastPFor is that it makes a lot more sense to understand how bit-packing can be applied separately from FastPFor, since that is just making interesting use of the capability.
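The post only shows the encoder. For symmetry, a decoder sketch might look like this, assuming the same hypothetical Unpack() routine and the BinaryWriter defaults used in Encode():

List<string> Decode(Stream input)
{
    var reader = new BinaryReader(input);
    int dicCount = reader.ReadInt32();
    var names = new string[dicCount];
    for (int i = 0; i < dicCount; i++)
        names[i] = reader.ReadString();

    // Recompute the same bit width the encoder used
    var bits = 32 - BitOperations.LeadingZeroCount((uint)dicCount);
    int numCount = reader.ReadInt32();
    var indexes = new uint[numCount];
    Unpack(input, indexes, bits); // counterpart of Pack() from above

    var result = new List<string>(numCount);
    foreach (var index in indexes)
        result.Add(names[index]);
    return result;
}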

Using rclone to backup OneDrive or Google Drive

by Gérald Barré

posted on: June 12, 2023

If you have important files, you need to back them up before it's too late. Cloud drives, such as OneDrive, Google Drive, or pCloud, are great for this. However, this is not the perfect solution, and you may want to have another copy of your data. rclone is a free tool that can synchronize a local di…

Talk

by Oren Eini

posted on: June 09, 2023

Integer compression

by Oren Eini

posted on: June 08, 2023

In the last post, I talked about how the simdcomp library is actually just doing bit-packing. Given a set of numbers, it will put the first N bits in a packed format. That is all. On its own, that isn’t very useful; we have to pair it with something else to make it do something more interesting. Previously, I talked about delta compression; what if we put both of those together? Given a sorted list of numbers, we’ll compute the delta between the numbers, then we can figure out how many bits we need to store them, and we can pack that tightly. Analyzing the sample posting list I’m using, we get the following batches (of 128 numbers each):

328 batches at 13 bits
11 batches at 22 bits
7 batches at 23 bits
3 batches at 27 bits
1 batch at 31 bits
1 batch at 28 bits

Doing the math, we can fit all of those using 77KB or so. The idea is that for each batch, we can use simdpack() to write just the relevant bits. We’ll need some extra metadata to keep track of the number of bits per batch, but that is just one byte for every 128 values. For that matter, we can bit-pack that as well. Note that in this case, we are using the SIMD packing routines just for bit packing. The actual compression aspect comes not from the packing, but from computing the deltas and then figuring out how densely we can pack those values. This approach is called Frame Of Reference (FOR), and it has some really nice advantages. By splitting the data into blocks, we can take advantage of the patterns in the data. Here are the first 16 deltas in the posting list:

        Value        Bits
 [0]    1871143144   31
 [1]    4            3
 [2]    4            3
 [3]    4            3
 [4]    4            3
 [5]    4            3
 [6]    4            3
 [7]    4            3
 [8]    4            3
 [9]    4            3
[10]    4            3
[11]    7984         13
[12]    4            3
[13]    4            3
[14]    4            3
[15]    4            3

Let’s assume that the batch size is 4 numbers; what would the frame sizes be here? Basically, we take the max number of bits for each batch of 4 numbers. The first block will require us to use 31 bits, the second would be just 3 bits, but the third would require 13 bits and the last is 3 bits again (see the sketch below). The idea is that for each block, we have a separate frame of reference: the number of bits that are required for us to record & retrieve the data. Of course, if you look at the data itself, we are wasting quite a lot of space here needlessly. There are just two numbers (at indexes 0 and 11) that require more than 3 bits. There is an algorithm called Patched Frame Of Reference (PFor) that handles this. Basically, when we compute a batch, we’ll record it using 3 bits only in this case. What about the rest of the values? Those we’ll store outside the batch, as an exception. On reading, we’ll need to patch the data to make it whole. That adds a bit of complexity, but it also means that you can compress the data a lot more densely. In the next post, I’m going to be talking about FastPFor, which combines SIMD bit packing, patched frame of reference and some really clever tricks to get a very interesting integer compression algorithm.
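To make the per-batch math above concrete, here is a small C# sketch that computes the frame (bit width) for each batch of deltas; the deltas array and batch size are assumed inputs:

using System.Numerics;

// The frame of reference for each batch is the number of bits
// needed by the widest delta in that batch.
static int[] FrameBitsPerBatch(ReadOnlySpan<uint> deltas, int batchSize)
{
    var frames = new int[(deltas.Length + batchSize - 1) / batchSize];
    for (int i = 0; i < deltas.Length; i++)
    {
        int bits = deltas[i] == 0 ? 0 : 32 - BitOperations.LeadingZeroCount(deltas[i]);
        int batch = i / batchSize;
        if (bits > frames[batch]) frames[batch] = bits;
    }
    return frames;
}

// With batchSize = 4 over the 16 deltas above, this yields [31, 3, 13, 3].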

Integer compression

by Oren Eini

posted on: June 07, 2023

In the previous post, I showed how you can do integer compression using variable-size integers. That is a really straightforward approach for integer compression, but it isn’t ideal. To start with, it means that we take at least one byte for each number, but in many cases, especially if we are using delta encoding, we can pack things a lot more tightly. For reference, here are the first 15 records we have in the posting list I’m using for testing, delta encoded:

1871143144 4 4 4 4 4 4 4 4 4 4 7984 4 4 4

See the range of 4, repeating. Each one of them can be represented by 3 bits, but we’ll waste 32–64 bits to hold it. When you are searching for data on “database stuff”, the building blocks that are being used to build databases, you’ll usually find Lemire there. In this case, I would like to talk about the SimdComp library. From the GitHub repository:

A simple C library for compressing lists of integers using binary packing and SIMD instructions. This library can decode at least 4 billion compressed integers per second on most desktop or laptop processors.

Those are some really nice numbers, but I actually have a pet peeve with the name of the library. It isn’t actually doing compression; what it does is bit packing, which is something quite different. Bit packing just takes numbers that are 32 bits in length and stores them using fewer bits. For example, let’s consider the following list: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. If I were to store that in 32-bit integers, it would take 40 bytes. But given that the maximum value is 10, I can use 4-bit integers instead, taking only 5 bytes. The core of the library is this set of routines:

simdpack()
simdpackwithoutmask()
simdunpack()

All of them have roughly the same structure: a switch statement that goes on for all 32 bit widths, each time selecting a different function. This code makes a lot of assumptions. You’ll always give it exactly an input of 128 numbers, and it will pack them and write them to the output. The idea is to be as compact as you possibly can. Reading this code is… a challenge, because there is so much mental weight behind it, or so it seems. For example, take a look at how we pack a set of numbers into 2-bit integers:

static void __SIMD_fastpack2_32(const uint32_t *_in, __m128i *out) {
  const __m128i *in = (const __m128i *)(_in);
  __m128i OutReg;
  const __m128i mask = _mm_set1_epi32((1U << 2) - 1);

  __m128i InReg = _mm_and_si128(_mm_loadu_si128(in), mask);
  OutReg = InReg;
  InReg = _mm_and_si128(_mm_loadu_si128(++in), mask);
  OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 2));
  InReg = _mm_and_si128(_mm_loadu_si128(++in), mask);
  OutReg = _mm_or_si128(OutReg, _mm_slli_epi32(InReg, 4));
  InReg = _mm_and_si128(_mm_loadu_si128(++in), mask);
  // redacted
}

The actual code goes on for quite a while (104 lines of dense SIMD code), so let’s try to break it up a bit. If we translate this from SIMD to scalar, we’ll have code like this:
uint mask = (1u << 2) - 1;
var input = *in++ & mask;
var output = input;
input = *in++ & mask;
output |= input << 2;
input = *in++ & mask;
output |= input << 4;

If you are used to manipulating bits, it’s fairly straightforward. We start by setting the output to the first value, then the second value is shifted by 2 bits and added to the output, then the third value is shifted by 4 bits, and so on. With SIMD, the interesting part is that we aren’t operating on a single value, but on 4 at a time. That also means that we have a difference in how the data actually ends up being packed. Given a list of values from 1 to 16, scalar bit packing will usually store the values sequentially, one after another in bit order. In contrast, the SIMD bit-packing model interleaves them: the first 32-bit lane ends up holding values 1, 5, 9 and 13, the second lane 2, 6, 10 and 14, and so on. This is a result of operating on 4 items at a time, but in practice, the actual placement of the bits does not matter much, just that we are able to store and retrieve them. Note that for packing, we have two options, one that uses a mask and the other that does not. That will be important later. The only difference between those options is whether a mask is applied before we splice the bits into the output, nothing else. The reason I said that this isn’t compression is that this library is focused solely on packing the bits; it has no behavior / concept beyond that stage. In the next post, we’ll start looking into how we can make use of this in more interesting ways.
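For completeness, the scalar unpacking is just the mirror image. A minimal C# sketch:

// Unpack sixteen 2-bit values from a single 32-bit word; value i lives
// at bit offset 2 * i, mirroring the scalar packing loop above.
static void Unpack2(uint packed, Span<uint> output)
{
    const uint mask = (1u << 2) - 1;
    for (int i = 0; i < 16; i++)
        output[i] = (packed >> (2 * i)) & mask;
}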

Integer compression

by Oren Eini

posted on: June 06, 2023

If you are building a database, the need to work with a list of numbers is commonplace. For example, when building an index, we may want to store all the ids of documents of orders from Europe. You can just store the raw values, of course, but that can be incredibly wasteful in terms of disk space, memory bandwidth and performance. Consider a database where millions of orders are from Europe. We’ll need multiple megabytes just to store the document ids. We can utilize patterns in the numbers to reduce the total storage size. For example, if you are storing a list of sorted numbers, you can store the delta between the numbers, instead of the whole value. The delta tends to be much smaller, after all. Take a look at the following code:

static void DeltaEncode(List<long> sortedList)
{
    if (sortedList.Count == 0)
        return;
    long prev = sortedList[0];
    for (int i = 1; i < sortedList.Count; i++)
    {
        long current = sortedList[i];
        sortedList[i] = current - prev;
        prev = current;
    }
}

static void WriteListToStream(List<long> deltas, Stream outputStream)
{
    BinaryWriter writer = new BinaryWriter(outputStream);
    foreach (var num in deltas)
    {
        writer.Write7BitEncodedInt64(num);
    }
}

This will generate (in place) the deltas for the sorted list. We can then write them to a stream, using 7-bit integer (also called variable-size integer) encoding. This way, smaller numbers take less space, which is the point of doing delta encoding first. For testing purposes, I have a list of 44,956 items, which looks like this (full list is here):

1871143144 1871143148 1871143152 1871143156 1871143160 1871143164 1871143168 1871143172 1871143176 1871143180

Note that I’m using int64, so if we stored the data as is, it would take ~351KB. Using delta encoding, we can pack that into just ~45KB. I care a lot more about the decoding of the values, so let’s see how we read it back, shall we?

static List<long> ReadListFromStream(Stream inputStream)
{
    var nums = new List<long>();
    var reader = new BinaryReader(inputStream);
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        nums.Add(reader.Read7BitEncodedInt64());
    }
    return nums;
}

static void DeltaDecode(List<long> list)
{
    for (int i = 1; i < list.Count; i++)
    {
        list[i] = list[i - 1] + list[i];
    }
}

With very little code, we can gain quite a jump in information density. That is quite good. For reference purposes, taking the same data and compressing it using GZip would cost us ~58KB, so significantly more than this simpler mechanism. On the other hand, using GZip on the delta-encoded list will put us under 8KB. This is because in this context there is a great deal of repetition in the data: there are many runs that are exactly 4 values apart from one another, so we can get quite a good compression ratio out of this. That does require that we have patterns in the data, which is by no means a guarantee.
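Putting those four functions together, here is a minimal usage sketch; the numbers are the first four items from the test list:

using System.Collections.Generic;
using System.IO;

var list = new List<long> { 1871143144, 1871143148, 1871143152, 1871143156 };
DeltaEncode(list);                    // in place: [1871143144, 4, 4, 4]

using var stream = new MemoryStream();
WriteListToStream(list, stream);

stream.Position = 0;
var decoded = ReadListFromStream(stream);
DeltaDecode(decoded);                 // back to the original sorted values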
There is a whole field of research out there into integer compression, since it is such an important aspect of how database internals work. In this series, I’m going to focus on detailing how we implemented FastPFor inside of RavenDB, what we learned along the way and what the final results were.

Understanding SameSite cookies

by Andrew Lock

posted on: June 06, 2023

In this post I discuss SameSite cookies, what they are, why they're useful, and the limitations when you use them.…

Debug ASP.NET Core application configuration

by Gérald Barré

posted on: June 05, 2023

Configuration in ASP.NET Core is performed using one or more configuration providers. Configuration providers read configuration data from key-value pairs using a variety of configuration sources such as the appsettings.json file, environment variables, vaults and so on. All these sources are merged…