The following code takes a long time to run. In fact, I’m writing this blog post while it is running, and I’m not sure how long it will take.
using System;
using System.Collections.Generic;
using System.Diagnostics;

unsafe class Program
{
    public class NumericValue
    {
        public string Name;
        public object Value;

        public override bool Equals(object obj)
        {
            return obj is NumericValue value &&
                Name == value.Name &&
                Equals(Value, value.Value);
        }

        public unsafe override int GetHashCode()
        {
            Span<byte> span;
            if (Value is long l)
            {
                span = new Span<byte>(&l, sizeof(long));
            }
            else if (Value is int i)
            {
                span = new Span<byte>(&l, sizeof(int));
            }
            else
            {
                throw new NotSupportedException("Unknown value");
            }
            return (int)Standart.Hash.xxHash.xxHash32.ComputeHash(span,
                span.Length, (uint)Name.GetHashCode());
        }
    }

    static void Main(string[] args)
    {
        var dic = new Dictionary<NumericValue, int>();
        var sp = Stopwatch.StartNew();
        for (int i = 0; i < 100_000; i++)
        {
            dic.Add(new NumericValue
            {
                Name = "test",
                Value = i
            }, i);
        }
        Console.WriteLine(sp.ElapsedMilliseconds);
    }
}
Update: This took over a minute to complete on my machine (which is a pretty nice one). The killer is that this code is correct; it does the right thing, but it can be very slow. I stripped a much larger scenario down to ~50 lines of code. Can you see what is going on, and why?
In this talk, Oren Eini, founder of the RavenDB NoSQL database, will speak about how you can design and architect software systems that can easily scale to stupendous sizes, without causing headaches or pinching your wallet. It is obvious that if you choose the right algorithm for a particular task, you can gain tremendous benefits in terms of efficiency. What isn't as obvious is that the same applies to the global system architecture. In this talk, we aren't going to cover specific products or implementations. Rather, we'll cover architectural styles and approaches that have withstood the test of time and allow you to create systems that are simple and cheap to operate while giving you the flexibility to grow. Arguably even more important, large complex systems require ongoing maintenance and care. We'll look into how we can provide insight and visibility into the innards of the system without putting too much of a strain on the development team, and how we can enable operational features such as rolling deployments, recovery and audits with ease.
Our monitoring system pinged us about a problem with a RavenDB cluster running in production. The problem was simple: we saw quite a few server restarts for that particular cluster. Looking deeper, it was obvious that the RavenDB instances in the cluster would occasionally run out of memory and crash. The customer, by the way, was unaware of this issue. From their perspective, the RavenDB cluster would switch the primary node for the database in question on a regular basis. On our end, we could see that each node would use more and more memory and eventually die because of that. The nodes would be restarted, of course, and the primary would switch automatically, but that is no way to run a production system.
The problem was figuring out what exactly was going on, and that took some time. We didn’t see this behavior with any other customer, but this customer had two factors that affected the outcome. The first is that the database in question is encrypted, which means that RavenDB needs some place to put the decrypted values. The second is that the user is issuing streaming queries that return a lot of results. We were able to reproduce the high memory usage by issuing the same kind of queries; however, we were utterly unable to reproduce the problem when running them on our own machines.
That was… strange, and it took a while to figure out that we needed to run on Linux to see the issue. We subjected the system to a very high load on Windows, with no issue. On Linux, it quickly became apparent that we were consuming more and more memory. We were able to narrow things down to this call:
posix_memalign(&ptr, 4096, 8192);
What we are asking for here is an 8KB buffer aligned on a 4KB boundary. And we were leaking those like crazy, but we couldn’t figure out how. We are pretty careful with manual memory management, and we have the tools around to detect leaks. Each and every call to allocate was matched by a free. The problem is that we aren’t the only ones using the system: posix_memalign() uses the same memory pool as malloc(), and the real problem is memory fragmentation. The way posix_memalign() works is to allocate nb + alignment + MINSIZE bytes from that pool, where nb is 8192 bytes, alignment is 4096 bytes and MINSIZE is 32 bytes, and then trim the block so the returned pointer is properly aligned. We then release the end of the buffer, which ends up being ~4KB or so in most cases. Along with other allocations, that created severe fragmentation in our memory.
We need the memory to be page aligned, because we use it for direct memory access. The memory is typically pooled, so we won’t be allocating and freeing it all the time, but streaming queries may run through a lot of data, so we exceeded the size of the pool. At that point, we would allocate (and free), but we would also fragment the memory.
We fixed the issue by using mmap() directly, which gives us page aligned memory and won’t cause us to use more memory than needed. Given that we get page aligned memory in multiples of the page size, we can be sure that the memory will be reused, instead of having to deal with internal fragmentation inside the malloc implementation. With this change, there are no issues, and we are actually slightly faster than before.
The reason we didn’t run into the same problem on Windows, by the way? There we called VirtualAlloc() from the get-go, which will ensure that we have page aligned memory, so no need to deal with fragmentation.
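To make that concrete, here is a minimal sketch of what page aligned allocation through mmap() can look like from .NET. This is not RavenDB’s actual code; the P/Invoke declarations and the Linux x86-64 flag values are assumptions for illustration only.

using System;
using System.Runtime.InteropServices;

static class PageAlignedAllocator
{
    // Assumed Linux x86-64 constants, for illustration.
    private const int PROT_READ = 0x1;
    private const int PROT_WRITE = 0x2;
    private const int MAP_PRIVATE = 0x02;
    private const int MAP_ANONYMOUS = 0x20;
    private const int PageSize = 4096;

    [DllImport("libc", SetLastError = true)]
    private static extern IntPtr mmap(IntPtr addr, UIntPtr length, int prot, int flags, int fd, IntPtr offset);

    [DllImport("libc", SetLastError = true)]
    private static extern int munmap(IntPtr addr, UIntPtr length);

    // mmap() hands out whole pages directly from the OS, so the memory is
    // always page aligned and never shares (or fragments) the malloc() heap.
    public static IntPtr Allocate(int pages)
    {
        ulong size = (ulong)pages * PageSize;
        var ptr = mmap(IntPtr.Zero, (UIntPtr)size,
            PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, IntPtr.Zero);
        if (ptr == new IntPtr(-1)) // MAP_FAILED
            throw new OutOfMemoryException("mmap() failed");
        return ptr;
    }

    public static void Free(IntPtr ptr, int pages)
    {
        munmap(ptr, (UIntPtr)((ulong)pages * PageSize));
    }
}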
I often see code that uses File.Exists(path) before opening or writing the file. This pattern is very suspect and may lead to unexpected bugs. Let's consider the following code:

if (File.Exists(path))
{
    using var stream = File.Open(path, FileMode.Open); // Is it safe?
    // ...
}

There are m[…]
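The reason this pattern is suspect is a classic check-then-act race: the file can be deleted (or created) between the File.Exists() call and the File.Open() call, so the check may already be stale by the time you act on it. A common alternative is to just attempt the operation and handle the failure; the sketch below is illustrative, not taken from the linked post.

using System;
using System.IO;

static class FileReading
{
    // Let the open itself report a missing file, instead of trusting a
    // File.Exists() answer that may no longer be true.
    public static string? TryReadAllText(string path)
    {
        try
        {
            using var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read);
            using var reader = new StreamReader(stream);
            return reader.ReadToEnd();
        }
        catch (FileNotFoundException)
        {
            return null; // the file was never there, or vanished after an earlier check
        }
        catch (DirectoryNotFoundException)
        {
            return null; // part of the path is gone
        }
    }
}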
This post is my contribution to the .NET Advent calendar. Make sure you check out the other amazing posts on the lead up to Christmas! At the time of writing, I am deep into work on some significant changes in the Elasticsearch .NET client. One of the changes is moving to System.Text.Json as the default […]
I dug into the implementation of the .NET dictionary because of this issue. More specifically, I wanted to know what kind of collision handling it uses. The two common approaches are open addressing and chaining. The code for the Dictionary is available, but I don’t think I had ever read much about how it works. I found some interesting tidbits, enough to warrant a post so I remember them for the future.
System.Collections.Generic.Dictionary<TKey, TValue> has the following interesting fields (I’m only showing some of them, mind):
private int[]? _buckets;
private Entry[]? _entries;
private int _count;
private int _freeList;
private int _freeCount;

private struct Entry
{
    public uint hashCode;
    public int next;
    public TKey key;     // Key of entry
    public TValue value; // Value of entry
}
The first thing to note is that we have both _buckets and _entries fields. In most hash tables, you’ll have just a single field for that purpose. Another interesting aspect is that _freeList is an integer and not a pointer. For that matter, if we look at Entry, we can see that the field pointing to the next value is an int as well, instead of a pointer. When running in 64 bits mode, that means a 50% memory savings for those links, because we use a 32 bits int instead of a 64 bits pointer. The _freeList and the next field always hold an offset into the _entries array. This can be advantageous as well: instead of chasing pointers across the heap, we are likely to be operating on the same memory and benefit from locality.
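To make the index-based chaining concrete, here is a rough sketch (my own simplification, not the actual BCL code) of how a lookup can walk those two arrays using the fields shown above:

// Simplified sketch of a lookup over _buckets / _entries. The real
// Dictionary<TKey, TValue> code also deals with resizing, custom comparers,
// collision counting, etc.
private int FindEntry(TKey key)
{
    uint hashCode = (uint)key.GetHashCode();
    // Each bucket stores a 1-based index into _entries (0 means an empty bucket).
    int i = _buckets![hashCode % (uint)_buckets.Length] - 1;
    while (i >= 0)
    {
        ref Entry entry = ref _entries![i];
        if (entry.hashCode == hashCode &&
            EqualityComparer<TKey>.Default.Equals(entry.key, key))
            return i;       // the array index doubles as the "pointer" to the entry
        i = entry.next;     // follow the chain by offset, not by pointer
    }
    return -1;              // not found
}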
This has some other advantages as well. The separation of _buckets and _entries means that they can have separate ordering. More to the point, that has a side effect that I’m fairly certain is intentional. The dictionary adds items to the _entries array in the order they were added, which means that if you only add items to the dictionary, you get a stable iteration order. That order will persist even if the dictionary needs to rehash, and as long as there are no deletions, it will match the insertion order. That is not guaranteed or documented, but I don’t think it is possible to change it at this point.
When you remove an item from the dictionary, it will reset that entry and put that location at the head of the free list. Overall, the code is really nice to read; there is no need to deal with tombstones or recovery, as you’ll typically see with open addressing. The fact that we are using separate _buckets and _entries fields means that there is a lot less work in the removal and rehashing scenarios, and I really love how easily the stable iteration order falls out of that implementation decision.
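You can actually observe both behaviors from the outside. The little program below is my own demo; the exact output is an implementation detail of the current runtime, not a documented guarantee. It shows enumeration following insertion order, and a freed slot being reused by the next insertion:

using System;
using System.Collections.Generic;

class DictionaryOrderDemo
{
    static void Main()
    {
        var d = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2, ["c"] = 3 };

        // With only additions, enumeration follows insertion order: a, b, c
        Console.WriteLine(string.Join(", ", d.Keys));

        d.Remove("b");   // entry slot 1 goes on the free list
        d["d"] = 4;      // the new entry reuses that freed slot

        // Typically prints: a, d, c -- "d" took over "b"'s position in _entries.
        Console.WriteLine(string.Join(", ", d.Keys));
    }
}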
I ran into this post and I thought that I would share my thinking on the matter. I don’t actually know anything about IndexedDB, but I have been working with databases for a sufficiently long time to understand what the root problem here is.
The issue the post is talking about is that IndexedDB is slow, to the point where inserting hundreds of documents takes multiple seconds. When you go a bit deeper into the post, you realize that the trouble is not the number of documents being saved, but the number of transactions.
IndexedDB has the concept of transactions, and I’m going to err on the safe side and assume that it provides durability. I dug a bit and found that IndexedDB is based on LevelDB (and I know that one quite well). The underlying issue is that each transaction commit needs to issue an fsync(), and fsync() is slow!
Pretty much all database engines have to deal with that; the usual answer is to allow transaction merging of some sort. Many relational databases will write to the log and commit multiple transactions in a single fsync(), and RavenDB does explicit transaction merging in order to batch writes to disk.
However, for pretty much every database out there with durability guarantees, the worst thing you can do is write a lot of data one transaction at a time. Server databases still benefit from being able to amortize multiple operations across different requests, but a client side database is effectively forced to do serial work with no chance for optimization.
From the client perspective, the only viable option here is to batch writes to get better performance. Then again, if you are working on the client side, you are either responding to a user action (in which case you have plenty of time to commit) or running in the background, which means that you have the ability to batch.
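To put the batching point in .NET terms, here is roughly what it looks like with the RavenDB client (the URL, database name and document shape are illustrative): all the stores below are committed as a single transaction, with a single durable write, when SaveChanges() is called, instead of paying for a durable commit per document.

using Raven.Client.Documents;

public class Order
{
    public string Id { get; set; }
    public int Total { get; set; }
}

class BatchedWrites
{
    static void Main()
    {
        using var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" }, // illustrative
            Database = "Orders"                       // illustrative
        };
        store.Initialize();

        using var session = store.OpenSession();
        for (int i = 0; i < 1_000; i++)
        {
            session.Store(new Order { Total = i }, "orders/" + i);
        }

        // One transaction, one durable commit, a thousand documents.
        session.SaveChanges();
    }
}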
RavenDB is rarely deployed in isolation; it is typically used in existing systems and integrated into the overall system. One of the key ways we support that is the built-in ETL support. RavenDB currently has ETL for Postgres, SQL Server, Oracle, MySQL, Elastic, OLAP / Data Lake, and other RavenDB instances.
We are looking into adding RavenDB ETL support to queues (RabbitMQ, Kafka, SQS, AQS, etc). That support is the topic of this blog post. I wanted to summarize my thinking about the topic and along the way gather some insight from you about what kind of shape this feature should have.
When talking about ETL to Queues, we have to deal with two distinct scenarios: receiving and sending. For the other ETL targets in RavenDB, we just send data, but for queues, given that there is a well defined interface for pulling the results, it makes sense to support receiving as well. Let’s consider what it means to be able to receive messages from a queue into RavenDB…
It means that RavenDB will listen to a queue and apply a script to it. That script will be able to insert or modify documents as a result of the message contents. For example, let’s assume that we have the queue defined as in the image on the right. We can write the following script to process messages from the queue.
if (msg.type == "NewOrder") {
    var doc = {
        Customer: msg.Customer,
        Lines: [],
        "@metadata": { "@collection": "Orders" }
    };
    put(msg.id, doc);
}
else if (msg.type == "AddLine") {
    var doc = load(msg.id);
    doc.Lines.push(msg.Details);
    put(msg.id, doc);
}
The script above handles two message types: recording a new order, and adding a line item to an existing order. It will be invoked by RavenDB whenever it receives a message from the queue. In this way, you can have RavenDB build your domain model directly from the message traffic. Of course, this is a pretty simplistic scenario; there are a lot more interesting scenarios to explore here.
The second part is when RavenDB is the one sending messages to the queues. Those messages, naturally, would be generated from the documents in the database. How would that work? We can write a script that is applied to documents as they change and that outputs the messages to write to the queue. That is how ETL in general works in RavenDB. For queues, however, the situation is a bit more complex.
When we use ETL to sync data from RavenDB to a relational database, any update of the document will also update the data in the relational database. When we send the data to a queue, what would happen then? Well, we can’t update a message in the queue; that doesn’t make any sort of sense. So we need to consider what the scenario actually is here. One option would be to just send a message each time: every update of a document would generate a new message. Or the author of the ETL script may decide to only send it once, of course.
The scenario that I think is far more likely is to use RavenDB and ETL to Queue as part of a larger scheme. Consider the scenario where you want to use the outbox pattern. In other words, you have a transaction that needs to do a bunch of things, including sending messages on a queue. Instead of trying to create a distributed transaction or carefully coordinate things, you will use this feature. Your transaction will save a Message document alongside any other changes. That relies on RavenDB’s ACID nature to ensure that this happens in an atomic manner.
Then you will be able to use the ETL to Queues option to actually send that over to the queue, in a reliable manner.
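Here is a rough sketch of what the application side of that outbox could look like with the RavenDB client. The document shapes, names and connection details are made up for illustration; the ETL to Queues definition itself isn’t shown.

using Raven.Client.Documents;

public class Order
{
    public string Id { get; set; }
    public decimal Total { get; set; }
}

public class OutboxMessage
{
    public string Id { get; set; }
    public string Type { get; set; }
    public object Payload { get; set; }
}

class OutboxExample
{
    static void Main()
    {
        using var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" }, // illustrative
            Database = "Shop"                         // illustrative
        };
        store.Initialize();

        using var session = store.OpenSession();

        var order = new Order { Total = 59.90m };
        session.Store(order); // the client assigns order.Id here

        // Stored in the same transaction as the order, so either both documents
        // exist or neither does. An ETL to Queues task could then forward the
        // OutboxMessage documents to RabbitMQ / Kafka / etc.
        session.Store(new OutboxMessage
        {
            Type = "OrderCreated",
            Payload = new { order.Id, order.Total }
        });

        session.SaveChanges(); // a single ACID transaction
    }
}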
Those two flows (send & receive) are the most likely scenarios for this feature, but the point of this post is to get more feedback from you. What kind of use cases do you think this will enable? What would you like to be able to do?