skip to content
Relatively General .NET

Announcing .NET Aspire Preview 2

by Damian Edwards

posted on: December 20, 2023

.NET Aspire Preview 2 is now available and includes many improvements and new capabilities.

Resilient HttpClient with or without Polly

by Gérald Barré

posted on: December 18, 2023

Network issues are common. So, you should always handle them in your application. For instance, you can retry the request a few times before giving up. You can also use a cache to avoid making the same request multiple times. You can also use a circuit breaker to avoid making requests to a service

RavenDB Backups are now Faster & Smaller

by Oren Eini

posted on: December 15, 2023

With the release of RavenDB 6.0, we are now starting to focus on smaller features. The first one out of the gate, part of RavenDB 6.0.1 release, is actually a set of enhancements around making backups faster, smaller and cheaper. I just checked, and the core backup behavior of RavenDB hasn't changed much since 2010(!). In other words, decisions that were made almost 14 years ago are still in effect. There have been a… number of changes in both RavenDB, its operating environment and the size of the database that we deal with. In the past year, we ran into a number of cases where people working with datasets in the high hundreds of GB to low TB range had issues with backups. In particular, with the duration of backups. After the 6.0 release, we had the capacity to do a lot more about this, so we took a look. On first impression, you would expect that backing up a database whose size exceeds 750GB will take… a while. And indeed, it does. The question is, why? It’s a lot of data, sure. But where does the time go? The format of RavenDB backups is really simple. It is just a GZipped JSON file. The contents are treated as a JSON stream that contains all the data in the database. This has a number of advantages, the file size is small, the format itself lends itself well to extension, it is streamable, etc. In fact, it is a testament to the early design decision that we haven’t really had to touch that in so long. Given that the format is stable, and that we have a lot of experience with producing JSON, we approach the task of optimizing the backups with a good idea where we should go. The problem is likely with I/O (we need to go through the entire database, after all). There were some (pretty wild) ideas flying around on how to address this, but the first thing to do, of course, was to run it under the profiler. The results, as you can imagine, were not what we expected. It turns out that we spend quite a lot of the time inside of GZip, compressing the data. It turns out that when we set up the backup format all those years ago, we chose GZip and Optimal compression mode. In other words, we wanted the file size to be as small as possible. That… makes sense, of course. But it turns out that the vast majority of the time is actually spent compressing the data? Time to start looking deeper into that. GZip is an old format (it came out in 1992!). And recently there have been a number of new compression algorithms (Zstd, Brotli, etc). We decided to look into those in detail. GZip also has several modes that affect compression ratio vs. compression time. After a bit of experimentation, we have the following details when backing up a 35GB database. Algorithm & Mode     Size Time GZip - Optimal 5.9 GB 6 min, 40 sec GZip - Fastest 6.6 GB 4 min, 7 sec ZStd - Fastest 4.1 GB 3 min, 1 sec The data in this case is mostly textual (JSON), and it turns out that we can reduce the backup time by more than half while saving 30% in the space we take. Those are some nice numbers. You’ll note that ZStd also has a mode that controls compression ratio vs compression time. We tried checking this as well on a different dataset (a snapshot of the actual database) with a size of 25.5GB and we got: Algorithm & Mode    Size Time ZStd - Fastest 2.18 GB 56 sec ZStd - Optimal 1.98 GB 1 min, 41 sec GZip - Optimal 2.99 GB 3 min, 50 sec As you can see, GZip isn’t going to get a participation trophy at this point, coming dead last for both size and time. In short, RavenDB 6.0.1 will use the new ZStd compression algorithm for backups (and export files),  and you can expect to have greatly reduced backup times as well as smaller backups overall. This is now the default mode for RavenDB 6.0.1 or higher, but you can control that in the backup settings if you so wish. Restoring from old backups is no issue, of course, but restoring a ZStd backup on an older version of RavenDB is not supported. You can configure RavenDB to use the GZip algorithm if that is required. Another feature that is going to improve backup performance is the notion of backup mode, which you can see in the image above. RavenDB backups support multiple destinations, so you can back up to Amazon S3 as well as Azure Blob Storage as a single unit.  At the time of designing the backup system, that was a nice feature to have, since we assume that you’ll usually have a backup to a local disk (for quick restore) as well as an offsite backup for longer-term storage. In practice, almost all backup configurations in RavenDB have a single destination. However, because we have support for multiple backup destinations, the backup process will first write the backup file to the local disk and then upload it. The new Direct Upload mode only supports a single destination, and it streams the data to the destination directly, without touching the disk. As a result of this change, we are using far less I/O during backup procedures as well as reducing the total time it takes to run the backup. This is especially useful if your backup destination is nearby and the network is good. This is frequently the case in the cloud, where you are backing up to S3 in the same region. In our tests, it reduced the backup time by 30% in some cases. From a coding perspective, those are not huge changes, but together they mean that backups in RavenDB are now cheaper, faster, and far smaller. That translates to a better operating environment for your system. It also means that the costs of storing backups are going to go down by a significant amount. You can read all the technical details about the few features in the feature announcements: Introduction of zstd compression algorithm in Backups and Exports Direct Upload mode in Backups

Production Postmortem

by Oren Eini

posted on: December 12, 2023

A customer contacted us to complain about a highly unstable cluster in their production system. The metrics didn’t support the situation, however. There was no excess load on the cluster in terms of CPU and memory, but there were a lot of network issues. The cluster got to the point where it would just flat-out be unable to connect from one node to another. It was obviously some sort of a network issue, but our ping and network tests worked just fine. Something else was going on. Somehow, the server would get to a point where it would be simply inaccessible for a period of time, then accessible, then not, etc. What was weird was that the usual metrics didn’t give us anything. The logs were fine, as were memory and CPU. The network was stable throughout. If the first level of metrics isn’t telling a story, we need to dig deeper. So we did, and we found something really interesting. Here is the total number of TCP connections on the server over time. So there are a lot of connections on the system, which is choking it? But the CPU is fine, so what is going on? Are we being attacked? We looked at the connections, but they all came from authorized machines, and the firewall was locked down tight. If you look closely at the graph, you can see that it hits 32K connections at its peak. That is a really interesting number, because 32K is also the number of ephemeral port range values for Linux. In other words, we basically hit the OS limit for how many connections could be sustained between a client and a server. The question is what could be generating all of those connections? Remember, they are coming from a trusted source and are valid operations.  Indeed, digging deeper we could see that there are a lot of connections in the TIME_WAIT state. We asked to look at the client code to figure out what was going on. Here is what we found: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters const readline = require('readline'); const Raven = require('ravendb'); const process = require('process'); const store = Raven.DocumentStore.create({ // ... initialize RavenDB client ... }).initialize(); const session = store.openSession(); const doc = JSON.parse(readline.readline()); session.store(doc); session.saveChanges() .catch(error => { console.error(error); process.exit(1); }); view raw script.js hosted with ❤ by GitHub There is… not much here, as you can see. And certainly nothing that should cause us to generate a stupendous amount of connections to the server. In fact, this is a very short process. It is going to run, read a single line from the input, write a document to RavenDB, and then exit. To understand what is actually going on, we need to zoom out and understand the system at a higher level. Let’s assume that the script above is called using the following manner: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters cat large-input-file.jsonl | xargs -n 1 -I {} node ./script.js <<< "{}" view raw invoke.sh hosted with ❤ by GitHub What will happen now? All of this code is pretty innocent, I’m sure you can tell. But together, we are going to get the following interesting behavior: For each line in the input, we’ll invoke the script, which will spawn a separate process to connect to RavenDB, write a single document to the server, and exit. Immediately afterward, we'll have another such process, etc. Each of those processes is going to have a separate connection, identified by a quartet of (src ip, src port, dst ip, dst port). And there are only so many such ports available on the OS. Once you close a connection, it is moved to a TIME_WAIT mode, and any packets that arrive for the specified connection quartet are going to be assumed to be from the old connection and drop. Generate enough new connections fast enough, and you literally lock yourself out of the network. The solution to this problem is to avoid using a separate process for each interaction. Aside from alleviating the connection issue (which also requires non trivial cost on the server) it allows RavenDB to far better optimize network and traffic patterns.

Hardware Intrinsics in .NET 8

by Tanner Gooding [MSFT]

posted on: December 11, 2023

.NET 8 includes significant improvements to the Hardware Intrinsics feature.