Relatively General .NET

Using the binary log to find the source of a .NET dependency

by Gérald Barré

posted on: December 09, 2024

Understanding where a dependency comes from can be tedious. This is especially true when you have a large project with many dependencies. Recently, .NET introduced a new tool, dotnet nuget why, to help you understand why a package is installed in your project. However, there is a better way to do it, and…
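For reference, the command mentioned above takes a project (or solution) path plus a package ID and prints the dependency chains that pull the package in. An illustrative invocation (the project path and package here are just examples):

dotnet nuget why ./src/MyApp/MyApp.csproj Newtonsoft.Json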

RavenDB Performance: 15% improvement in one line

by Oren Eini

posted on: December 02, 2024

RavenDB is a database, a transactional one. This means that we have to reach the disk and wait for it to finish persisting the data to stable storage before we can confirm a transaction commit. That represents a major challenge for high performance, because disks are slow. And I mean all disks: rate-limited cloud disks, HDDs, SSDs, even NVMe drives. From the perspective of the database, all of them are slow. RavenDB spends a lot of time and effort making the system run fast, even though the disk is slow.

An interesting problem we routinely encounter is that our test suite will literally cause disks to fail, because we stress them beyond their warranty limits. We actually keep a couple of those around, drives that have been stressed to the breaking point, because it lets us test unusual I/O patterns.

We recently ran into strange benchmark results, and during the investigation we realized we were actually running on one of those burnt-out drives. Here is what the performance looks like when writing 100K documents as fast as we can (10 active threads):

As you can see, there is a huge variance in the results. To understand exactly why, we need to dig a bit deeper into how RavenDB handles I/O. You can observe this in the I/O Stats tab in the RavenDB Studio:

There are actually three separate (and concurrent) sets of I/O operations that RavenDB uses:

- Blue - journal writes - unbuffered direct I/O, in the critical path for transaction performance, because this is how RavenDB ensures that the D(urability) in ACID is maintained.
- Green - flushes - where RavenDB writes the modified data to the data file (until the flush, the modifications are kept in scratch buffers).
- Red - sync - forcing the data to reside in a persistent medium using fsync().

The writes to the journal (blue) are the most important ones for performance, since we must wait for them to complete successfully before we can acknowledge that the transaction was committed. The other two ensure that the data actually reached the data file and that we have safely stored it.

It turns out that there is an interesting interaction between those three types. Both flushes (green) and syncs (red) can run concurrently with journal writes. But on bad disks, we may end up saturating the entire I/O bandwidth with the flushes or syncs, leaving little for the journal writes. In other words, the background work can impact the foreground system performance. That only happens when you reach the physical limits of the hardware, but it is actually quite common when running in the cloud.

To handle this scenario, RavenDB does a number of what I can only describe as shenanigans. Conceptually, here is how RavenDB works:

def txn_merger(self):
    while self._running:
        with self.open_tx() as tx:
            while tx.total_size < MAX_TX_SIZE and tx.time < MAX_TX_TIME:
                curOp = self._operations.take()
                if curOp is None:
                    break  # no more operations
                curOp.exec(tx)
            tx.commit()
            # here we notify the operations that we are done
            tx.notify_ops_completed()

The idea is that you submit your operation to the transaction merger, which can significantly improve performance by merging multiple operations into a single disk write. The actual operations wait to be notified, which happens after the transaction successfully commits. If you want to know more about this, I have a full blog post on the topic. There is a lot of code to handle all sorts of edge cases, but that is basically the story.
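To make that hand-off concrete, here is a minimal sketch of the submission side of this pattern, using a blocking queue and a per-operation completion event. The names (WriteOp, submit_and_wait) are illustrative, not RavenDB's actual API:

import queue
import threading

class WriteOp:
    def __init__(self, payload):
        self.payload = payload
        self.completed = threading.Event()  # set by the merger after commit

    def exec(self, tx):
        # runs on the merger thread, inside the current transaction
        tx.apply(self.payload)

# the merger's txn_merger loop consumes from this queue
operations = queue.Queue()

def submit_and_wait(payload):
    op = WriteOp(payload)
    operations.put(op)   # hand the operation to the transaction merger
    op.completed.wait()  # blocks until tx.notify_ops_completed() fires
    return op

Many callers can block cheaply like this while the merger batches their operations into a single journal write, which is where the bulk of the cost is.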
Notice that processing a transaction is actually composed of two steps. First, there is the execution of the transaction operations (which reside in the _operations queue), and then there is the actual commit(), where we write to the disk. It is the commit portion that takes most of the time.

Here is what the timeline looks like in this model: we execute the transaction, then wait for the disk. This means that we are unable to saturate either the disk or the CPU. That is a waste.

To address that, RavenDB supports async commits (sometimes called early lock release). The idea is that while we are committing the previous transaction, we already execute the next one. The code for that is something like this:

def txn_merger(self):
    prev_txn = completed_txn()
    while self._running:
        executedOps = []
        with self.open_tx() as tx:
            while tx.total_size < MAX_TX_SIZE and tx.time < MAX_TX_TIME:
                curOp = self._operations.take()
                if curOp is None:
                    break  # no more operations
                executedOps.append(curOp)
                curOp.exec(tx)
                if prev_txn.completed:
                    break
            # verify success of previous commit
            prev_txn.end_commit()
            # only here we notify the operations that we are done
            prev_txn.notify_ops_completed()
            # start the commit in an async manner
            prev_txn = tx.begin_commit()

The idea is that we start writing to the disk, and while that is happening, we are already processing the operations of the next transaction. In other words, this allows writing to the disk and executing transaction operations to happen concurrently. Here is what this looks like:

This change has a huge impact on overall performance, especially because it can smooth over a slow disk by letting us process the next transaction's operations while we wait for the disk. I wrote about this as well in the past. So far, so good; this is how RavenDB has behaved for about a decade. So what is the performance optimization?

This deserves an explanation. The changed piece of code determines whether a transaction completes in a synchronous or an asynchronous manner. It used to decide based on whether there were more operations to process in the queue: when a transaction finished, we would complete it asynchronously only if additional operations were waiting (currentOperationsCount). The change modifies the logic so that we complete in an async manner if we executed any operation at all. The change is minor but has a really important effect on the system: if we are going to write to the disk anyway (since we have operations to commit), we always complete in an async manner, even if there are no more operations in the queue. As a result, the next operation starts processing immediately, instead of waiting for the commit to complete before it can begin. It is such a small change, but it had a huge impact on the system's performance.
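In pseudocode terms, the change amounts to swapping the condition that selects the commit mode. A minimal sketch, reusing names from the snippets above (currentOperationsCount and executedOps stand in for the real fields):

# before: commit asynchronously only when more work is already queued
complete_async = currentOperationsCount > 0

# after: commit asynchronously whenever this transaction did any work,
# so the next operation can start while the disk write is in flight
complete_async = len(executedOps) > 0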
Here you can see the effect of this change when writing 100K docs with 10 threads. We tested it on both a good disk and a bad one, and the results are really interesting.

The bad disk chokes when we push a lot of data through it (gray line), and you can see it struggling to pick up. On the same disk, the async version (yellow line) still struggles (because eventually you do need to hit the disk), but it sustains much higher numbers and completes far more quickly (the yellow line ends before the gray one).

On the good disk, which is able to sustain the entire load, we still see an improvement (blue is the new version, orange is the old one). We aren't sure yet why the initial stage is slower (maybe just because this was the first test we ran), but even with the slower start, the new version completed more quickly because its throughput is higher.

NuGet Packages: security risks and best practices

by Gérald Barré

posted on: December 02, 2024

NuGet packages offer a convenient way to share code, but many developers download them without reviewing the contents or verifying updates when new versions are released. When you install a NuGet package, you are: downloading code from unknown authors that has likely not been thoroughly reviewed by…

Dramatically faster package restores with .NET 9’s new NuGet resolver

by Jeff Kluge

posted on: November 27, 2024

.NET 9 introduces a new NuGet dependency graph resolver that dramatically improves package restore performance for large repositories. Learn how this reimagined approach reduces restore times from 30 minutes to just 2 minutes by creating a more efficient dependency graph with fewer nodes.

RavenDB Cloud

by Oren Eini

posted on: November 26, 2024

RavenDB Cloud has a whole bunch of new features that were quietly launched over the past few months, and I discuss them in this post. It turns out that the team keeps delivering new stuff faster than I can write about it.

The new auto-scaling feature is a really interesting one, because it is pretty simple to understand and has some interesting implications for production. You need to explicitly enable auto-scaling on your cluster. Here is what that looks like:

Once you have enabled auto-scaling - which usually takes under a minute - you can click the Configure button to set your own policies. Here is what this looks like:

The idea is very simple: we routinely measure the load on the system, and if we detect CPU usage above a high threshold for a long time, we trigger scaling to the next tier (or maybe higher, see the Upscaling / Downscaling step options) to provide additional resources to the system. If there isn't enough load (as measured in CPU usage), we downscale back toward the lowest instance type. Conceptually, this is a simple setup: you use a lot of CPU, and you get a bigger machine with more resources, until it all balances out.

Now, let's talk about the implications of this feature. To start with, it means you only pay based on your actual load, and you don't need to over-provision for peak load. The design of this feature, and of RavenDB in general, means that we can make scale-up and scale-down changes without any interruption in service. This allows you to let auto-scaling manage the size of your instances.

In the image above, you may have noticed that I'm using the PB line of products (PB10 … PB50). That stands for burstable instances, which consume CPU credits when in use. How this interacts with auto-scaling is really interesting. As you use more CPU, you consume all your CPU credits and your CPU usage becomes high. At this point, auto-scaling kicks in and moves you to a higher tier, which gives you both more baseline CPU credits and a higher CPU credit accrual rate. Together with zero-downtime upscaling and downscaling, this means you can benefit from the burstable instances' lower cost without having to worry about running out of resources.

Note that auto-scaling only applies to instances within the same family. If you are running on burstable instances, you'll scale across burstable instances, and if you are running on the P series (non-burstable), your auto-scaling will use P instances.

We offer auto-scaling for development instances as well. However, a development instance contains only a single RavenDB node, so auto-scaling will still trigger, but the instance will be inaccessible for up to two minutes while it scales. That isn't an issue for the production tier.
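Conceptually, the policy described above boils down to a small control loop. Here is a minimal sketch of that decision rule; the tier ladder, thresholds, and names are illustrative assumptions, not RavenDB Cloud's actual configuration:

TIERS = ["PB10", "PB20", "PB30", "PB40", "PB50"]  # assumed ladder within one family
HIGH_CPU = 0.80         # sustained usage above this triggers an upscale
LOW_CPU = 0.20          # sustained usage below this triggers a downscale
SUSTAINED_SAMPLES = 10  # how many consecutive samples count as "a long time"

def next_tier(current_tier, cpu_samples, step=1):
    """Return the tier to run on, given recent CPU usage samples (0.0-1.0)."""
    idx = TIERS.index(current_tier)
    recent = cpu_samples[-SUSTAINED_SAMPLES:]
    if len(recent) < SUSTAINED_SAMPLES:
        return current_tier  # not enough history to decide yet
    if all(s >= HIGH_CPU for s in recent):
        # high CPU for a long time: scale up by the configured step
        return TIERS[min(idx + step, len(TIERS) - 1)]
    if all(s <= LOW_CPU for s in recent):
        # sustained low load: scale back toward the lowest instance type
        return TIERS[max(idx - step, 0)]
    return current_tier

The real service measures load continuously and performs the instance swap with zero downtime; the sketch only captures the scale-up/scale-down decision.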