skip to content
Relatively General .NET

Performance discovery

by Oren Eini

posted on: January 03, 2025

When building RavenDB, we occasionally have to deal with some ridiculous numbers in both size and scale. In one of our tests, we ran into an interesting problem. Here are the performance numbers of running a particular query 3 times.First Run: 19,924 msSecond Run: 3,181 msThird Run: 1,179 msThose are not good numbers, so we dug into this to try to figure out what is going on. Here is the query that we are running:from index 'IntFloatNumbers-Lucene' where Int > 0And the key here is that this index covers 400 million documents, all of which are actually greater than 0. So this is actually a pretty complex task for the database to handle, mostly because of the internals of how Lucene works. Remember that we provide both the first page of the results as well as its total number. So we have to go through the entire result set to find out how many items we have. That is a lot of work. But it turns out that most of the time here isn’t actually processing the query, but dealing with the GC. Here are some entries from the GC log while the queries were running:2024-12-12T12:39:40.4845987Z, Type: GC, thread id: 30096, duration: 2107.9972ms, index: 25, generation: 2, reason: Induced 2024-12-12T12:39:53.1359744Z, Type: GC, thread id: 30096, duration: 1650.9207ms, index: 26, generation: 2, reason: Induced 2024-12-12T12:40:07.5835527Z, Type: GC, thread id: 30096, duration: 1629.1771ms, index: 27, generation: 2, reason: Induced 2024-12-12T12:40:20.2205602Z, Type: GC, thread id: 30096, duration: 776.24ms, index: 28, generation: 2, reason: InducedThat sound you heard was me going: Ouch!Remember that this query actually goes through 400M results. Here are the details about its Memory Usage & Object Count:Number of objects for GC (under LuceneIndexPersistence): 190M (~12.63GB)Managed Memory: 13.01GBUnmanaged Memory: 4.53MBWhat is going on? It turns out that Lucene handles queries such as Int>0 by creating an array with all the unique values, something similar to:string[] sortedTerms = new string[190_000_000]; long[] termPostingListOffset = new long[190_000_000];This isn’t exactly how it works, mind. But the details don’t really matter for this story. The key here is that we have an array with a sorted list of terms, and in this case, we have a lot of terms.Those values are cached, so they aren’t actually allocated and thrown away each time we query. However, remember that the .NET GC uses a Mark & Sweep algorithm. Here is the core part of the Mark portion of the algorithm:long _marker; void Mark() { var currentMarker = ++_marker; foreach (var root in GetRoots()) { Mark(root); } void Mark(object o) { // already visited if (GetMarket(o) == currentMarker) return; foreach (var child in GetReferences(node)) { Mark(child); } } }Basically, start from the roots (static variables, items on the stack, etc.), scan the reachable object graph, and mark all the objects in use. The code above is generic, of course (and basically pseudo-code), but let’s consider what the performance will be like when dealing with an array of 190M strings. It has to scan the entire thing, which means it is proportional to the number of objects. And we do have quite a lot of those. The problem was the number of managed objects, so we pulled all of those out. We moved the term storage to unmanaged memory, outside the purview of the GC. As a result, we now have the following Memory Usage & Object Count:Number of objects for GC (under LuceneIndexPersistence): 168K (~6.64GB)Managed Memory: 6.72GBUnmanaged Memory: 1.32GBLooking at the GC logs, we now have:2024-12-16T18:33:29.8143148Z, Type: GC, thread id: 8508, duration: 93.6835ms, index: 319, generation: 2, reason: Induced 2024-12-16T18:33:30.7013255Z, Type: GC, thread id: 8508, duration: 142.1781ms, index: 320, generation: 2, reason: Induced 2024-12-16T18:33:31.5691610Z, Type: GC, thread id: 8508, duration: 91.0983ms, index: 321, generation: 2, reason: Induced 2024-12-16T18:33:37.8245671Z, Type: GC, thread id: 8508, duration: 112.7643ms, index: 322, generation: 2, reason: InducedSo the GC time is now in the range of 100ms, instead of several seconds. This change helps both reduce overall GC pause times and greatly reduce the amount of CPU spent on managing garbage. Those are still big queries, but now we can focus on executing the query, rather than managing maintenance tasks. Incidentally, those sorts of issues are one of the key reasons why we built Corax, which can process queries directly on top of persistent structures, without needing to materialize anything from the disk.

Sometimes it's the hardware

by Oren Eini

posted on: December 31, 2024

An issue was recently raised with a really scary title: Intermittent Index corruption: VoronUnrecoverableErrorException. Those are the kinds of issues that you know are going to be complex. Fixing such issues in the past was usually a Task Force effort and quite a challenge. We asked for more information and started figuring out who would handle the issue (given the time of the year) when the user came back with:After pressing the disk check issue with our hosting provider, we found out that one of the disks was reporting an error but according to our hosting, it was only because the manufacturer's guarantee expired, and not the actual disk failure. We swapped the disk anyway, and so far we are not seeing the issue.I’m so happy that I can close that issue 🙂

Signing commits in Git using SSH keys on Windows

by Gérald Barré

posted on: December 30, 2024

Signing commits in Git is a way to verify the authenticity and integrity of your commits. This ensures that the commits are indeed made by you and have not been tampered with. On Windows, it's easy to set up SSH-Agent for easy signature. Here's how you can do it.First, you need to install SSH on Wi

Critical: .NET Install links are changing

by Richard Lander

posted on: December 26, 2024

The .NET installers and archives distribution method is currently changing unexpectedly. This change may impact your development, CI, and production infrastructure. It is crucial to validate if you are affected and monitor for any downtime or disruptions.

Isn't it ironic: Money isn't transactional

by Oren Eini

posted on: December 24, 2024

I write a transactional database for a living, and the best example of why we want transactions is transferring money between accounts. It is ironic, therefore, that there is no such thing as transactions for money transfers in the real world. If you care to know why, go back 200 years and consider how a bank would operate in an environment without instant communication. I would actually recommend doing that, it is a great case study in distributed system design. For example, did you know that the Templars used cryptography to send money almost a thousand years ago? Recently I was reviewing my bank transactions and I found the following surprise. This screenshot is from yesterday (Dec 18), and it looks like a payment that I made is still “stuck in the tubes” two and a half weeks later.   I got in touch with the supplier in question to apologize for the delay. They didn’t understand what I was talking about. Here is what they see when they go to their bank, they got the money.   For fun, look at the number of different dates that you can see in their details. Also, as of right now, my bank account still shows the money as pending approval (to be sent out from my bank). I might want to recommend that they use a different database. Or maybe I should just convince the bank to approve the payment by the time of the next invoice and see if I can get a bit of that infinite money glitch.

Top .NET Blogs Posts of 2024

by James Montemagno

posted on: December 23, 2024

Let's take a look back at the top .NET blogs from 2024 from exciting announcements, deep dives on new features, and so much more.

Reading USN Journal with .NET

by Gérald Barré

posted on: December 23, 2024

If you want to detect when a file is created, modified, or deleted, you can use the FileSystemWatcher class in .NET. However, this is only available when the application is running. What if you want to detect changes that occurred while the application was not running? The USN Journal is a feature