Relatively General .NET

WinForms: Analyze This (Me in Visual Basic)

by Klaus Loeffelmann

posted on: January 21, 2025

Your WinForms code might have issues—maybe an Async call picked the wrong overload, or it’s leaking data into resource files. Time to call in a code-shrink! So, WinForms, Analyze This!
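To make the overload trap the post is teasing concrete, here is an illustration of mine, not code from the article, written under the assumption of .NET 9's Control.InvokeAsync overload set; SaveDataAsync is a hypothetical method standing in for real work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

public class ExampleForm : Form
{
    // Hypothetical async operation standing in for real work.
    private static Task SaveDataAsync() => Task.Delay(100);

    private async void OnSaveClicked(object? sender, EventArgs e)
    {
        // Trap: this parameterless async lambda binds to the Func<T> overload
        // (T = Task). InvokeAsync does not await the returned inner task, so
        // the save becomes fire-and-forget and its exceptions go unobserved.
        await InvokeAsync(async () => await SaveDataAsync());

        // Taking the CancellationToken parameter selects the
        // Func<CancellationToken, ValueTask> overload, whose result is
        // awaited, so completion and failures flow back to this handler.
        await InvokeAsync(async ct => await SaveDataAsync());
    }
}
```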

Challenge

by Oren Eini

posted on: January 20, 2025

Here is a pretty simple C program, running on Linux. Can you tell me what you expect its output to be?

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>

#define BUFFER_SIZE (3ULL * 1024 * 1024 * 1024) // 3GB in bytes

int main()
{
    int fd;
    char *buffer;
    struct stat st;

    buffer = (char *)malloc(BUFFER_SIZE);
    if (buffer == NULL)
    {
        return 1;
    }

    fd = open("large_file.bin", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
    {
        return 2;
    }

    if (write(fd, buffer, BUFFER_SIZE) == -1)
    {
        return 3;
    }

    if (fsync(fd) == -1)
    {
        return 4;
    }

    if (close(fd) == -1)
    {
        return 5;
    }

    if (stat("large_file.bin", &st) == -1)
    {
        return 6;
    }

    printf("File size: %.2f GB\n", (double)st.st_size / (1024 * 1024 * 1024));

    free(buffer);
    return 0;
}
```

And what happens when I run:

```
head large_file.bin | hexdump -C
```

This shows surprising behavior and serves as a good opening for discussion on a whole bunch of issues. In an interview setting, that can give us a lot of insight into the sort of knowledge a candidate has.

Using Roslyn to analyze and rewrite code in a solution

by Gérald Barré

posted on: January 20, 2025

I've written a lot about Roslyn in the context of Roslyn Analyzers and Source Generators. You can also use Roslyn as a library to analyze and generate code. For instance, you can create a console application that loads a solution, finds patterns, and rewrites code. While Roslyn Analyzers are tied to…
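To make the library approach concrete, here is a minimal sketch of mine (not from the article) that loads a solution with MSBuildWorkspace and visits each document; the solution path is a placeholder, and the rewriting step is left as a comment. It assumes the Microsoft.Build.Locator and Microsoft.CodeAnalysis.Workspaces.MSBuild packages are referenced:

```csharp
using Microsoft.Build.Locator;
using Microsoft.CodeAnalysis.MSBuild;

// Must run before any MSBuild types are loaded, so Roslyn can find the SDK.
MSBuildLocator.RegisterDefaults();

using var workspace = MSBuildWorkspace.Create();
var solution = await workspace.OpenSolutionAsync("MySolution.sln"); // hypothetical path

foreach (var project in solution.Projects)
{
    foreach (var document in project.Documents)
    {
        var root = await document.GetSyntaxRootAsync();
        // ... find patterns in 'root', rewrite them (e.g., with a
        // CSharpSyntaxRewriter), and apply the changed document back
        // to the solution.
    }
}
```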

Production post-mortem

by Oren Eini

posted on: January 17, 2025

The scenario in question was performance degradation over time. The metric in question was the average request latency, and we could track a small but consistent rise in this number over the course of days and weeks. The load on the server remained pretty much constant, but the latency of the requests grew. The problem was that this took time - many days or multiple weeks - for us to observe, but we had the charts to prove that it was pretty consistent. If the RavenDB service was restarted (we did not have to restart the machine), the situation would instantly fix itself and then slowly degrade over time.

That the customer didn't notice this is an interesting story on its own. RavenDB will automatically prioritize the fastest node in the cluster to be the "customer-facing" one, and that alleviated the issue to such an extent that the metrics the customer usually monitors were fine. The RavenDB Cloud team looks at the entire system, so we started the investigation long before the problem warranted users' attention.

I hate these sorts of issues because they are really hard to figure out and subject to basically every caveat under the sun. In this case, we had exactly nothing to go on. The workload was pretty consistent, and I/O, memory, and CPU usage were all acceptable. There was no starting point to look at. These are also big machines, with hundreds of GB of RAM, great disks, a lot of CPU power to spare, and heavy workloads. What was going on here?

After a long while, we got a good handle on what was actually going on. When RavenDB starts, it creates memory maps of the files it works with. Over time, as needed, RavenDB will map, unmap, and remap. A process that has been running for a long while, with many databases and indexes operating, will have done a lot of memory-mapping work. In Linux, you can inspect those details by running:

```
$ cat /proc/22003/smaps

600a33834000-600a3383b000 r--p 00000000 08:30 214585   /data/ravendb/Raven.Server
Size:                 28 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  28 kB
Pss:                  26 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:        24 kB
Private_Dirty:         0 kB
Referenced:           28 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me dw

600a3383b000-600a33847000 r-xp 00006000 08:30 214585   /data/ravendb/Raven.Server
Size:                 48 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  48 kB
Pss:                  46 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:        44 kB
Private_Dirty:         0 kB
Referenced:           48 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me dw
```

Here you can see the first page of entries from this file. Just starting up RavenDB (with no databases created) will generate close to 2,000 entries. The smaps virtual file can be really invaluable for figuring out certain types of problems; in the snippet above, for example, you can see that we have some executable memory ranges mapped.

The problem is that, over time, memory becomes fragmented, and we may end up with an smaps file that contains tens of thousands (or even hundreds of thousands) of entries. Running perf top on the system showed that the top three items hogging most of the resources were all related to smaps accounting.

This file provides such useful information that we monitor it on a regular basis. It turns out that this can have… interesting effects. While we are scanning through all the memory mappings, we may need to change the memory mapping for the process. That leads, of course, to contention on the kernel locks that protect the mappings.

It's expensive to generate the smaps file. Reading from /proc/[pid]/smaps is not a simple file read: it involves the kernel gathering detailed memory statistics (e.g., memory regions, page size, resident/anonymous/shared memory usage) for each virtual memory area (VMA) of the process. For large processes with many memory mappings, this can be computationally expensive, as the kernel has to gather the required information every time /proc/[pid]/smaps is accessed. Reading the file also requires the kernel to access memory-related structures, which may involve taking locks on parts of the process's memory management system. Done too often, or across many large processes, this can lead to contention or slow down the process itself, especially if other processes are accessing or modifying memory at the same time.

If the number of memory mappings is high, and the monitoring interval is short… I hope you can see where this is going. We effectively spent so much time running over this file that we blocked other operations. This wasn't an issue when the process had just started, because the number of memory mappings was small, but as the system did more work and the number of memory mappings grew… we eventually started hitting contention.

The solution was two-fold. We made sure that only ever a single thread reads the information from smaps (previously the read might have been triggered from multiple locations), and we added throttling to ensure that we aren't hammering the kernel with requests for this file too often (returning cached information if needed). We also switched from smaps to smaps_rollup, which performs much better since it deals with summary data only.

With those changes in place, we deployed to production and waited. The result was flat latency numbers and another item that the Cloud team could strike off the board successfully.
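To make the mitigation concrete, here is a minimal sketch of mine (not RavenDB's actual code) of a single, throttled reader that serves cached results and reads smaps_rollup; the 30-second cadence is an assumed value:

```csharp
using System;
using System.IO;

static class MemoryMapMetrics
{
    private static readonly object _gate = new();
    private static readonly TimeSpan _minInterval = TimeSpan.FromSeconds(30); // assumed cadence
    private static DateTime _lastRead = DateTime.MinValue;
    private static string _cached = "";

    // Serves cached data inside the throttle window, so monitoring can't
    // hammer the kernel and only one thread ever performs the actual read.
    public static string Read(int pid)
    {
        lock (_gate)
        {
            if (DateTime.UtcNow - _lastRead >= _minInterval)
            {
                // smaps_rollup returns pre-summed totals for the whole
                // process, so the kernel walks the VMAs once instead of
                // emitting one block per mapping as smaps does.
                _cached = File.ReadAllText($"/proc/{pid}/smaps_rollup");
                _lastRead = DateTime.UtcNow;
            }
            return _cached;
        }
    }
}
```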

Meet the .NET Team at NDC London 2025

by Mehul Harry

posted on: January 16, 2025

Meet the .NET team at NDC London 2025 to explore the latest in .NET 9, Azure, and AI-powered development through keynotes, sessions, and 1:1 meetups.

Accidental complexity: A tale of two GUIDs

by Oren Eini

posted on: January 15, 2025

For a new feature in RavenDB, I needed to associate each transaction with a source ID. The underlying idea is that I can aggregate transactions from multiple sources in a single location, but I need to be able to distinguish between transactions from A and B. Luckily, I had the foresight to reserve space in the transaction header: a whole 16 bytes were available to me. Separately, each Voron database (the underlying storage engine that we use) has a unique Guid identifier. And a Guid is 16 bytes… so everything is pretty awesome.

There was just one issue. I needed to be able to read transactions as part of the recovery of the database, but we stored the database ID inside the database itself. I figured out that I could also put a copy of the database ID in the global file header, and was able to move forward.

This is part of a much larger change, so I was going full steam ahead when I realized something pretty awful. The database Guid that I was relying on was already being used as the physical identifier of the storage, as part of the way RavenDB distributes data. That matters because, under certain circumstances, we may need to change it. If we change the database ID, we lose the association with the transactions for that database, leading to a whole big mess. I started sketching out a design for detecting that the database ID had changed, rewriting all the transactions in storage, and… a colleague said: why don't we use another ID?

It hit me like a ton of bricks. I was using the existing database Guid because it was already there, so it seemed natural to want to reuse it. But there was no benefit in doing so. Instead, it added a lot more complexity, because I was piling (many) additional responsibilities onto a value that didn't have them before.

Creating a Guid is pretty easy, after all, and I was able to dedicate one, which I called the Journal ID, to this purpose. The existing Database ID is still there and is completely unrelated to it. Changing the Database ID has no impact on the Journal ID, so the problem space is radically simplified.

I was able to throw away heaps of complexity because of a single comment. I used the Database ID because it was there, never considering a dedicated value instead. That single suggestion led to a better, simpler design and faster delivery. It is funny how you can sometimes be so focused on the problem at hand, when a step back would give you a much wider view and a better path to the solution.
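As an illustration of the final shape (the names and layout are mine and hypothetical, not Voron's actual code), the two identifiers live side by side and vary independently:

```csharp
using System;

// Hypothetical sketch: a dedicated Journal ID keeps transaction attribution
// stable even when the Database ID has to change for distribution purposes.
public sealed class StorageIdentity
{
    // May be regenerated when RavenDB redistributes data; nothing about
    // transaction recovery depends on it anymore.
    public Guid DatabaseId { get; set; } = Guid.NewGuid();

    // Dedicated and never changes; this is the 16-byte value stamped into
    // the reserved slot of each transaction header.
    public Guid JournalId { get; } = Guid.NewGuid();
}
```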

.NET and .NET Framework January 2025 servicing releases updates

by Tara, Rahul

posted on: January 14, 2025

Welcome to our combined .NET servicing updates for January 2025. Let's get into the latest releases of .NET and .NET Framework; here is a quick overview of what's new: security improvements, with several CVEs fixed this month, and the January 2025 .NET updates, including .NET 9.0.1 and .NET 8.0.12. Share feedback about this release in the Release feedback issue. …

The memory leak in ConcurrentQueue

by Oren Eini

posted on: January 13, 2025

We ran into a memory issue recently in RavenDB, which had a pretty interesting root cause. Take a look at the following code and see if you can spot what is going on:

```csharp
ConcurrentQueue<Buffer> _buffers = new();

void FlushUntil(long maxTransactionId)
{
    List<Buffer> toFlush = new();
    while (_buffers.TryPeek(out var buffer) &&
           buffer.TransactionId <= maxTransactionId)
    {
        if (_buffers.TryDequeue(out buffer))
        {
            toFlush.Add(buffer);
        }
    }
    FlushToDisk(toFlush);
}
```

The code handles flushing data to disk based on the maximum transaction ID. Can you see the memory leak?

If we have a lot of load on the system, this will run just fine. The problem is when the load is over. If we stop writing new items to the system, it will keep a lot of stuff in memory, even though there is no reason for it to do so. The reason is the call to TryPeek(). You can read the source directly, but the basic idea is that when you peek, you have to guard against a concurrent TryTake(). If you are not careful, you may encounter something called a torn read.

Let's explain it in detail. Suppose we store a large struct in the queue and call TryPeek() and TryTake() concurrently. TryPeek() starts copying the struct to the caller at the same time that TryTake() does the same and zeros the value, so TryPeek() could get an invalid value. To handle that, if you are using TryPeek(), the queue will not zero out the values. This means that until that queue segment is completely full and a new one is generated, we'll retain references to those buffers, leading to an interesting memory leak.
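One way to sidestep the issue is to avoid TryPeek() entirely: dequeue unconditionally and hold back the one item that overshoots the cutoff until the next flush. This is a sketch of mine under the assumption that a single consumer thread does the flushing, not necessarily RavenDB's actual fix:

```csharp
ConcurrentQueue<Buffer> _buffers = new();
Buffer? _pending; // holdback slot; touched only by the single flusher thread

void FlushUntil(long maxTransactionId)
{
    List<Buffer> toFlush = new();
    while (true)
    {
        // Use the held-back item from the previous call before dequeuing.
        Buffer? buffer = _pending;
        _pending = null;

        if (buffer is null && !_buffers.TryDequeue(out buffer))
            break; // queue drained

        if (buffer.TransactionId > maxTransactionId)
        {
            // Not ready to flush yet; keep it aside instead of peeking, so
            // the segment's slots are zeroed normally on every dequeue and
            // no references linger in the queue.
            _pending = buffer;
            break;
        }

        toFlush.Add(buffer);
    }
    FlushToDisk(toFlush);
}
```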