Relatively General .NET

Get the default value of a type at runtime

by Gérald Barré

posted on: October 09, 2023

Getting the default value of a type at compile time is easy: you can use default(T). What if you want to do the same at runtime? What is the default value for any System.Type? You can find multiple wrong answers on the internet. For example, this one: // ⚠️ Not the same as default(T) object? Ge…
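As a hedged sketch of one approach that does match default(T) semantics (the GetDefaultValue name is illustrative, not necessarily the post's own solution): let the runtime zero-initialize the value via a one-element array, which avoids invoking any parameterless struct constructor the way Activator.CreateInstance(type) would.

    using System;

    static object? GetDefaultValue(Type type)
    {
        // Reference types default to null.
        if (!type.IsValueType)
            return null;

        // Array elements are zero-initialized by the runtime, so reading the
        // first element of a new array yields the boxed default value without
        // running any parameterless struct constructor. Note: byref-like types
        // such as Span<int> cannot be boxed and will throw here.
        return Array.CreateInstance(type, 1).GetValue(0);
    }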

RavenDB version 6.0 is now live

by Oren Eini

posted on: October 02, 2023

Today marks the end of a very long journey for RavenDB as we release version 6.0 into the wild. This completes a multi-year effort on our part, which I'm really excited to share with you. In version 6.0 the RavenDB team had a number of goals, with the primary one being, as always, providing a simple and powerful platform for your data and documents. The full list of changes in the release is here, and you can already download the binaries of the new release to try it out. Below you'll find more information about some of the highlights of the new release, but the executive summary for the technically inclined is:

- Deep integration with Kafka & RabbitMQ, allowing you to send and consume messages directly from RavenDB.
- Sharding support, allowing you to run massively large databases easily and cheaply.
- Corax, a new indexing engine for RavenDB that can provide a 10x performance improvement for both indexing and queries.

With this new release we are also announcing a pricing update for our commercial customers. You can find more details at the bottom of this post.

Before we get to the gist of the changes, I want to spend a minute talking about this release. One of the reasons that we took so long to craft this release was an uncompromising eye toward ease of use, performance, and quality of experience. It may sound strange to talk about "quality of experience" when discussing a database engine; we aren't talking about a weekend trip or a romantic getaway, after all. The reason RavenDB exists is that we found other databases to be complicated and delicate beasts, and we wanted to create a database that would be a joy to work with. Something that you could throw your data and queries at, and it would just work. With version 6.0, I'm happy to say that we managed to keep on doing just that while providing you with a host of new features and capabilities. Without further ado, allow me to tell you about the highlights of this release.

Sharding

Sharding is the process of splitting your data among multiple nodes, allowing you to scale beyond the terabyte range. RavenDB actually had this feature as far back as 2010, when the 1.0 version came out. I'm speaking about this today because we have decided to revamp the entire approach to sharding in the new release. Sharding in RavenDB version 6.0 consists of selecting the sharded option at database creation time; everything else is handled by the cluster. Splitting the data across the nodes, balancing load, and giving you the feeling that you are working on a unified database is all our responsibility.

A chief goal of the new sharding feature in RavenDB was to make sure that you don't have to be aware that you are running in a sharded environment. In fact, you can even switch the backend to a sharded database behind your application's back, with no modifications to the application. Your systems can take advantage of the sharding feature, but they aren't forced to. This makes it easier when you need to scale, because you don't need an expensive rewrite or a stop-the-world migration to move to the new architecture. RavenDB, with or without sharding, offers you the same capabilities. Features such as indexing, map/reduce, ETL tasks, and subscriptions all work in the same way regardless of how you choose to store your data. A success metric for us is that after you switch over to sharding, the only thing you'll notice is that you can now scale your systems even more. And, of course, you can do that easily, safely, and quickly. I recorded a webinar showcasing our new sharding approach that you can watch.
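To illustrate the "no application changes" claim, here is a minimal sketch of the standard RavenDB .NET client flow; per the description above, this code is the same whether or not the database was created as sharded. The node URL, database name, and Order class are hypothetical examples, not from the post.

    using System;
    using Raven.Client.Documents;

    public class Order
    {
        public string Company { get; set; } = "";
    }

    public static class Program
    {
        public static void Main()
        {
            using var store = new DocumentStore
            {
                Urls = new[] { "http://localhost:8080" }, // hypothetical node URL
                Database = "Orders"                       // regular or sharded database
            };
            store.Initialize();

            using var session = store.OpenSession();
            session.Store(new Order { Company = "companies/1-A" });
            session.SaveChanges(); // with sharding, the cluster routes this to the right shard
        }
    }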
Corax Indexing Engine

The Corax indexing engine is now available in version 6.0. We have been working on it since 2014, and as you can imagine, it hasn't been a simple-to-solve challenge. Corax has a single task: to be able to do everything that we use Lucene for, but faster. In fact, much faster. RavenDB has used Lucene as its indexing engine from the start, and Lucene has been the industry benchmark for search engines for a long time. In the context of RavenDB, we have optimized Lucene heavily for the past 15 years. When building Corax, it wasn't sufficient to match what Lucene can do; we had to significantly outperform it.

With version 6.0, we now offer both Corax and Lucene as indexing engines for RavenDB. You can choose to use either engine (globally or per index). Upgrading from an older version of RavenDB, you'll keep on using the Lucene engine but can create new indexes with Corax. Corax outperforms Lucene in a wide range of scenarios by an order of magnitude. This includes both indexing time and query latency. Corax also manages to be far faster than Lucene while utilizing less CPU and memory. One of the ways it does that is by preparing, in advance, the relevant data structures that Lucene needs to compute at runtime. Corax indexes tend to consume about 10% - 20% more disk space than Lucene, as a trade for offering lower memory usage, far faster query performance, and reduced indexing time. In the same manner as sharding, switching to the new indexing engine will get you performance improvements, with no difference in capabilities or features.

Kafka & RabbitMQ Integration

Kafka & RabbitMQ integration has been extended in version 6.0. RavenDB already supported writing to Kafka and RabbitMQ, and we have now extended this capability to allow you to read messages from Kafka and RabbitMQ and turn them into documents. This provides complete support for queuing infrastructure integration. RavenDB can work with existing pipelines and easily expose its data to the rest of the organization, as well as consume messages from the queue and create or update documents accordingly. Such documents are then available for querying, aggregation, etc. This makes it trivial to use a low-code solution to quickly and efficiently plug your system into Kafka and RabbitMQ. In the style of all RavenDB features, we take upon ourselves all the complexity inherent in such a solution. You only need to provide the connection string and the details about what you want to pull; the full management of pulling or pushing data, handling distributed state, failover, and monitoring is the responsibility of the RavenDB cluster.

Data Archiving

Data archiving is a new capability in RavenDB, aiming to help users who deal with very large amounts of data. While a business wants to retain all its data forever, it typically works primarily with recent data, usually the data from the last year or two. As time goes by, old data accumulates, and going through irrelevant data consumes computing and memory resources in the database. RavenDB version 6.0 offers a way to mark a document with an attribute indicating when RavenDB should move that document to archive mode. Aside from telling RavenDB at what time to archive a document, there is nothing further that you need to do. When the time comes to archive the document, RavenDB will change the document's mode, which has a number of effects:

- The document itself is retained, not deleted.
- If you aren't already using document compression, the archived document is stored in a compressed manner, to reduce disk usage further.
- Subscriptions and indexes will ignore archived documents unless explicitly requested; archived documents remain on disk and consume no memory resources.

Archiving reduces the resources consumed by archived documents while keeping them available if you need to look them up. If you still want to allow queries over archived data, you can also define indexes that operate over both archived and regular documents. In this way, you can retain your entire dataset in the main system of record but still operate on a significantly smaller chunk of it during normal operations.
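As a hedged sketch of what "marking a document with an attribute" might look like from the .NET client, reusing the hypothetical store and Order class from the sharding sketch above. The "@archive-at" metadata key and the ISO-8601 timestamp format are my assumptions; the post does not name the attribute.

    using System;

    using var session = store.OpenSession();
    var order = session.Load<Order>("orders/1-A");

    // Assumption: the archival attribute is an "@archive-at" metadata key
    // holding a UTC timestamp; once that time passes, RavenDB moves the
    // document into archive mode on its own.
    var metadata = session.Advanced.GetMetadataFor(order);
    metadata["@archive-at"] = DateTime.UtcNow.AddYears(2).ToString("O");

    session.SaveChanges();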
Performance, observability, and usability enhancements have also been on the menu for the version 6.0 release. There are far too many individual features and improvements to count them all in this format. I encourage you to check the full release details on the website.

You are probably well aware of the drastic changes that happened in the business environment in the last few years. As part of adjusting to this new reality, we are updating our pricing to allow us to keep delivering top-notch solutions and support to our valued customers. The new pricing policy is effective starting from January 1st, 2024. What does this mean to you right now? Absolutely nothing until January 1st, 2024. All subscription renewals in 2023 keep their current price point. An early-bird benefit for our existing customers: renew your 2024 subscription while locking in your 2023 price point (available for Cloud and on-premises customers). In the weeks ahead, we'll provide detailed updates about these changes and ensure a smooth transition. Our goal is to empower you to make optimized choices about your RavenDB subscription. Your satisfaction remains our priority. Our team is ready to assist you; please reach out to us at sales@ravendb.net with any questions you have.

Finally, I would like to thank the RavenDB team for their hard work in delivering this version. Our goal for this release was to deliver a lot more while making sure that you won't be exposed to the underlying complexity of the problems we are solving for you. I believe that as you try out the new release, you'll see how successful we have been in providing you with an excellent database platform to take your systems to new heights. And with that, I am left only with encouraging you to try out the new version, and to let us know what you think about it.

How to test the logs from ILogger in .NET

by Gérald Barré

posted on: October 02, 2023

You sometimes need to validate that a log message is written to the log. In this post, I describe how to unit test ILogger in .NET Core. There are multiple strategies: you can use a mock framework like Moq to mock the ILogger interface and validate that the right calls are made, or you can store all logs in-m…
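As a minimal sketch of the in-memory strategy (the InMemoryLogger name is illustrative; the post's own implementation may differ): implement ILogger and record each formatted message so a test can assert on it afterwards.

    using System;
    using System.Collections.Generic;
    using Microsoft.Extensions.Logging;

    // Illustrative test double: records every formatted log message.
    sealed class InMemoryLogger : ILogger
    {
        public List<(LogLevel Level, string Message)> Entries { get; } = new();

        public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;
        public bool IsEnabled(LogLevel logLevel) => true;

        public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
            Exception? exception, Func<TState, Exception?, string> formatter)
            => Entries.Add((logLevel, formatter(state, exception)));
    }

A test passes this logger to the code under test and then asserts on Entries, e.g. that it contains a Warning with the expected message.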

How to write logs from ILogger to xUnit.net ITestOutputHelper

by Gérald Barré

posted on: September 25, 2023

When a test fails, it's important to understand what happened, and all available information can be useful. If you are using ILogger in your application, you can use the logs to understand what happened. However, the logs are not displayed in the output of xUnit. You can change this behavior by using a custom…
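The usual shape of such an adapter (a sketch under my assumptions, not necessarily the post's exact code) is an ILogger that forwards each formatted message to xUnit's ITestOutputHelper:

    using System;
    using Microsoft.Extensions.Logging;
    using Xunit.Abstractions;

    // Forwards ILogger messages to the current test's output.
    sealed class XunitLogger : ILogger
    {
        private readonly ITestOutputHelper _output;
        private readonly string _category;

        public XunitLogger(ITestOutputHelper output, string category)
            => (_output, _category) = (output, category);

        public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;
        public bool IsEnabled(LogLevel logLevel) => true;

        public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
            Exception? exception, Func<TState, Exception?, string> formatter)
            => _output.WriteLine($"[{logLevel}] {_category}: {formatter(state, exception)}");
    }

In practice this is wrapped in an ILoggerProvider registered with the logger factory, with the ITestOutputHelper injected into the test class constructor.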

Challenge

by Oren Eini

posted on: September 19, 2023

The following bug cost me a bunch of time; can you see what I'm doing wrong?

    private static int AddAndSort(Dictionary<long, long> freeListAdditions,
        Span<long> keys, Span<long> vals)
    {
        int index = 0;
        foreach (var (k, v) in freeListAdditions)
        {
            keys[index] = k;
            vals[index] = v;
            index++;
        }
        keys.Sort(vals);
        return index;
    }

For fun, it's so nasty because usually, it will accidentally work.

Accessing private members without reflection in C#

by Gérald Barré

posted on: September 18, 2023

Reflection allows you to access private members of a class. This is very useful when you want to access private members of a class that you don't own. However, it is slow and doesn't work well with Native AOT. In this post, I describe how to access private members without reflection in .NET 8. Let's…
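The .NET 8 feature this relies on is the UnsafeAccessorAttribute. As a small sketch (the Person class and its _name field are hypothetical examples):

    using System;
    using System.Runtime.CompilerServices;

    public class Person
    {
        private string _name = "hidden";
    }

    public static class PersonAccessor
    {
        // The runtime binds this extern method directly to Person._name at
        // JIT/AOT time, with no reflection involved (Native AOT friendly).
        [UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_name")]
        public static extern ref string GetName(Person person);
    }

    // Usage: the returned ref allows both reading and writing the private field.
    // var p = new Person();
    // PersonAccessor.GetName(p) = "visible";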

Filtering negative numbers, fast

by Oren Eini

posted on: September 15, 2023

In the previous post, I was able to utilize AVX to get some nice speedups. In general, I was able to save up to 57%(!) of the runtime in processing arrays of 1M items. That is really amazing, if you think about it. But my best effort only gave me a 4% improvement when using 32M items. I decided to investigate what is going on in more depth, and I came up with the following benchmark. Given that I want to filter negative numbers, what would happen if the only negative number in the array was the first one? In other words, let's see what happens when we could write this algorithm as the following line of code:

    array[1..].CopyTo(array);

The idea here is that we should measure the speed of raw memory copy and see how that compares to our code. Before we dive into the results, I want to make a few things explicit. We are dealing here with arrays of long; when I'm talking about an array with 1M items, I'm actually talking about an 8MB buffer, and for the 32M items, we are talking about 256MB of memory. I'm running these benchmarks on the following machine:

    AMD Ryzen 9 5950X 16-Core Processor
    Base speed: 3.40 GHz
    L1 cache: 1.0 MB
    L2 cache: 8.0 MB
    L3 cache: 64.0 MB
    Utilization: 9%
    Speed: 4.59 GHz

In other words, the 1M items (8MB) can fit into L2 (barely), and are certainly backed by the L3 cache. For the 32M items (256MB), we are far beyond what can fit in the cache, so we are probably dealing with memory bandwidth issues. I wrote the following functions to test it out:

    [Benchmark]
    public int CopyTo()
    {
        items[0] = -items[0];
        Span<long> itemsSpan = items;
        itemsSpan[1..].CopyTo(itemsSpan);
        return items.Length - 1;
    }

    [Benchmark]
    public unsafe int MemoryCopy()
    {
        items[0] = -items[0];
        fixed (long* p = items)
        {
            Buffer.MemoryCopy(p, p + 1, items.Length * sizeof(long),
                (items.Length - 1) * sizeof(long));
        }
        return items.Length - 1;
    }

    [Benchmark]
    public unsafe int MoveMemory()
    {
        items[0] = -items[0];
        fixed (long* p = items)
        {
            MoveMemory(p + 1, p, items.Length * sizeof(long));
        }
        return items.Length - 1;
    }

    [DllImport("Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
    static unsafe extern void MoveMemory(void* dest, void* src, int size);

Let's look at what I'm actually testing here:

- CopyTo() – using the span's native copying is the most ergonomic way to do things, I think.
- MemoryCopy() – uses a built-in unsafe API in the framework. That eventually boils down to a heavily optimized routine, which calls Memmove() if the buffers overlap (as they do in this case).
- MoveMemory() – uses P/Invoke to call the Windows API to do the moving of memory for us.

Here are the results for the 1M case (8MB):

    Method        | N       | Mean     | Error    | StdDev   | Ratio
    FilterCmp     | 1048599 | 441.4 us | 1.78 us  | 1.58 us  | 1.00
    FilterCmp_Avx | 1048599 | 141.1 us | 2.70 us  | 2.65 us  | 0.32
    CopyTo        | 1048599 | 872.8 us | 11.27 us | 10.54 us | 1.98
    MemoryCopy    | 1048599 | 869.7 us | 7.29 us  | 6.46 us  | 1.97
    MoveMemory    | 1048599 | 126.9 us | 0.28 us  | 0.25 us  | 0.29

We can see some real surprises here. I'm using the FilterCmp (the very basic implementation) that I wrote. I cannot explain why CopyTo() and MemoryCopy() are so slow.
What is really impressive is that FilterCmp_Avx() and MoveMemory() are so close in performance, and so much faster. To put it another way, we are already within shouting distance of the MoveMemory() performance. That is… really impressive. That said, what happens with 32M (256MB)?

    Method        | N        | Mean        | Error     | StdDev    | Ratio
    FilterCmp     | 33554455 | 22,763.6 us | 157.23 us | 147.07 us | 1.00
    FilterCmp_Avx | 33554455 | 20,122.3 us | 214.10 us | 200.27 us | 0.88
    CopyTo        | 33554455 | 27,660.1 us | 91.41 us  | 76.33 us  | 1.22
    MemoryCopy    | 33554455 | 27,618.4 us | 136.16 us | 127.36 us | 1.21
    MoveMemory    | 33554455 | 20,152.0 us | 166.66 us | 155.89 us | 0.89

Now FilterCmp_Avx is faster than MoveMemory. That is… a pretty big wow, and a really nice close for this blog post series, right? Except that we won't be stopping here. The way the task I set out works, we are actually filtering just the first item out, and then we are basically copying the memory. Let's do some math: moving 256MB in 20.1ms works out to 12.4 GB/sec (0.25 GB / 0.0201 s)! On this system, I have the following memory setup:

    64.0 GB
    Speed: 2133 MHz
    Slots used: 4 of 4
    Form factor: DIMM
    Hardware reserved: 55.2 MB

I'm using DDR4 memory, so I can expect a maximum speed of 17GB/sec. In theory, I might be able to get more if the memory is located on different DIMMs, but for the size in question, that is not likely. I'm going to skip the training montage of VTune, understanding memory architecture, and figuring out what is actually going on. Let's drop everything and look at what we have with just AVX vs. MoveMemory:

    Method        | N        | Mean        | Error     | StdDev    | Median      | Ratio
    FilterCmp_Avx | 1048599  | 141.6 us    | 2.28 us   | 2.02 us   | 141.8 us    | 1.12
    MoveMemory    | 1048599  | 126.8 us    | 0.25 us   | 0.19 us   | 126.8 us    | 1.00
    FilterCmp_Avx | 33554455 | 21,105.5 us | 408.65 us | 963.25 us | 20,770.4 us | 1.08
    MoveMemory    | 33554455 | 20,142.5 us | 245.08 us | 204.66 us | 20,108.2 us | 1.00

The new baseline is MoveMemory, and in this run, we can see that the AVX code is slightly slower. It's sadly not uncommon to see numbers shift by those ranges when we are testing such micro-optimizations, mostly because we are subject to so many variables that can affect performance. In this case, I dropped all the other benchmarks, which may have changed things. At any rate, using those numbers, we have 12.4GB/sec for MoveMemory() and 11.8GB/sec for the AVX version. The hardware maximum speed is 17GB/sec, so we are quite close to what can be done. For that matter, I would like to point out that the trivial code completed the task at 11GB/sec, which is quite respectable and shows that the issue here is literally getting the memory to the CPU fast enough. Can we do something about that? I made a pretty small change to the AVX version, like so:

    +if (i + Vector256<long>.Count <= len)
    +{
    +    var next = Vector256.LoadUnsafe(ref Unsafe.Add(ref output, i));
    +    len -= Vector256<long>.Count;
         for (; i + Vector256<long>.Count <= len; i += Vector256<long>.Count)
         {
    -        var v = Vector256.LoadUnsafe(ref Unsafe.Add(ref output, i));
    +        var v = next;
    +        next = Vector256.LoadUnsafe(ref Unsafe.Add(ref output, i + Vector256<long>.Count));
             ...

What are we actually doing here?
Instead of loading a value and immediately using it, we load the next value up front; each iteration then kicks off the load of the following value while processing the current one. The idea is to parallelize load and compute at the instruction level. Sadly, that didn't seem to do the trick. I saw a 19% additional cost for that version compared to the vanilla AVX one on the 8MB run, and a 2% additional cost on the 256MB run. I then realized that there was one really important test that I had to also make, and wrote the following:

    [Benchmark]
    public unsafe int MoveMemory()
    {
        items[0] = -items[0];
        fixed (long* p = items)
        {
            MoveMemory(p + 1, p, items.Length * sizeof(long));
        }
        return items.Length - 1;
    }

    [DllImport("Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
    static unsafe extern void MoveMemory(void* dest, void* src, int size);

    [Benchmark(Baseline = true)]
    public unsafe int FillMemory()
    {
        items[0] = -items[0];
        fixed (long* p = items)
        {
            var b = (byte)Random.Shared.Next();
            FillMemory(p, items.Length * sizeof(long), b);
        }
        return items.Length - 1;
    }

    [DllImport("Kernel32.dll", EntryPoint = "RtlFillMemory", SetLastError = false)]
    static unsafe extern void FillMemory(void* dest, IntPtr size, byte fill);

In other words, let's test the speed of moving memory and filling memory as fast as we possibly can. Here are the results:

    Method     | N        | Mean        | Error     | StdDev    | Ratio | RatioSD | Code Size
    MoveMemory | 1048599  | 126.8 us    | 0.36 us   | 0.33 us   | 0.25  | 0.00    | 270 B
    FillMemory | 1048599  | 513.5 us    | 10.05 us  | 10.32 us  | 1.00  | 0.00    | 351 B
    MoveMemory | 33554455 | 20,022.5 us | 395.35 us | 500.00 us | 1.26  | 0.02    | 270 B
    FillMemory | 33554455 | 15,822.4 us | 19.85 us  | 17.60 us  | 1.00  | 0.00    | 351 B

This is really interesting. For a small buffer (8MB), MoveMemory is somehow faster. I don't have a way to explain that, but it has been a pretty consistent result in my tests. For the large buffer (256MB), we are seeing results that make more sense to me:

    MoveMemory – 12.5 GB/sec
    FillMemory – 15.8 GB/sec

In other words, for MoveMemory, we are both reading and writing, so we are paying for memory bandwidth in both directions. For filling the memory, we are only writing, so we can get better performance (no need for reads). In other words, we are talking about hitting the real physical limits of what the hardware can do. There are all sorts of tricks that one can pull, but when we are this close to the limit, they are almost always context-sensitive and dependent on many factors.
To conclude, here are my final results:

    Method             | N        | Mean        | Error     | StdDev    | Ratio | RatioSD | Code Size
    FilterCmp_Avx      | 1048599  | 307.9 us    | 6.15 us   | 12.84 us  | 0.99  | 0.05    | 270 B
    FilterCmp_Avx_Next | 1048599  | 308.4 us    | 6.07 us   | 9.26 us   | 0.99  | 0.03    | 270 B
    CopyTo             | 1048599  | 1,043.7 us  | 15.96 us  | 14.93 us  | 3.37  | 0.11    | 452 B
    ArrayCopy          | 1048599  | 1,046.7 us  | 15.92 us  | 14.89 us  | 3.38  | 0.14    | 266 B
    UnsafeCopy         | 1048599  | 309.5 us    | 6.15 us   | 8.83 us   | 1.00  | 0.04    | 133 B
    MoveMemory         | 1048599  | 310.8 us    | 6.17 us   | 9.43 us   | 1.00  | 0.00    | 270 B
    FilterCmp_Avx      | 33554455 | 24,013.1 us | 451.09 us | 443.03 us | 0.98  | 0.02    | 270 B
    FilterCmp_Avx_Next | 33554455 | 24,437.8 us | 179.88 us | 168.26 us | 1.00  | 0.01    | 270 B
    CopyTo             | 33554455 | 32,931.6 us | 416.57 us | 389.66 us | 1.35  | 0.02    | 452 B
    ArrayCopy          | 33554455 | 32,538.0 us | 463.00 us | 433.09 us | 1.33  | 0.02    | 266 B
    UnsafeCopy         | 33554455 | 24,386.9 us | 209.98 us | 196.42 us | 1.00  | 0.01    | 133 B
    MoveMemory         | 33554455 | 24,427.8 us | 293.75 us | 274.78 us | 1.00  | 0.00    | 270 B

As you can see, the AVX version is comparable to, or slightly beating, the MoveMemory function. I tried things like prefetching memory (the next item, the next cache line, and the next page), using non-temporal loads and stores, and many other things, but this is a pretty tough challenge. What is really interesting is that the original, very simple and obvious implementation clocked in at 11 GB/sec. After pulling out pretty much all the stops and tricks, I was able to hit 12.5 GB/sec. I don't think anyone could read, update, or understand the resulting code without going through deep meditation. That is not a bad result at all. And along the way, I learned quite a bit about how the lowest levels of the machine architecture work.