Relatively General .NET

Optimizing cache resets for higher transaction output

by Oren Eini

posted on: January 11, 2024

One of the most frequent operations we perform in RavenDB is getting a page pointer. That is the basis of pretty much everything you can think of inside the storage engine. On the surface, that is pretty simple; here is the API we'll call:

public Page GetPage(long pageNumber)

Easy, right? But internally, we need to ensure that we get the right page. And that is where some complexity enters the picture. Voron, our storage engine, implements a pattern called MVCC (multi-version concurrency control). In other words, two transactions loading the same page may see different versions of the page at the same time. What this means is that the call to GetPage() needs to check whether the page:

- Has been modified in the current transaction
- Has been modified in a previously committed transaction and has not yet been flushed to disk
- Has its most up-to-date version on disk

Each one of those checks is cheap, but getting a page is a common operation. So we implemented a small cache in front of these operations, which resulted in a substantial performance improvement. Conceptually, here is what that cache looks like:

public unsafe class PageLocator
{
    private struct PageData
    {
        public long PageNumber;
        public byte* Pointer;
        public bool IsWritable;
    }

    private PageData[] _cache = new PageData[512];

    public byte* Get(long page, out bool isWritable)
    {
        var index = page & 511;
        ref var data = ref _cache[index];
        if (data.PageNumber == page)
        {
            isWritable = data.IsWritable;
            return data.Pointer;
        }
        return LookupAndGetPage(page, ref data, out isWritable);
    }

    public void Reset()
    {
        for (int i = 0; i < _cache.Length; i++)
            _cache[i].PageNumber = -1;
    }
}

This is intended to be a fairly simple cache, and it is fine if certain access patterns don't produce optimal results. After all, the source it is pulling from is already quite fast; we simply want to get even better performance when we can. This is important because caching is quite a complex topic on its own. The PageLocator itself is used in the context of a transaction and is pooled. Transactions in RavenDB tend to be pretty short, so that alleviates a lot of the complexity around cache management. This has actually been a solved problem for us for a long while; we have been using some variant of the code above for about 9 years.

The reason for this post, however, is that we are trying to optimize things further, and this class showed up in our performance traces as problematic. Surprisingly enough, what is actually costly isn't the lookup part, but making the PageLocator ready for the next transaction. We are talking about the Reset() method. The question is: how can we significantly optimize the process of making the instance ready for a new transaction? We don't want to allocate, and resetting the page numbers is what is costing us.

One option is to add an int Generation field to the PageData structure, which we then check when reading from the cache. Resetting the cache can then be done by incrementing the locator's generation with a single instruction. That is pretty awesome, right?
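As a rough sketch of that intermediate idea (an illustration of the approach only, not code RavenDB actually shipped), the lookup and reset would change along these lines:

// Sketch: PageData gains a new field, public int Generation;
// a cache hit now also requires a matching generation.
private int _generation = 1;

public byte* Get(long page, out bool isWritable)
{
    var index = page & 511;
    ref var data = ref _cache[index];
    // Entries written under an older generation fail the check and are looked up again.
    if (data.PageNumber == page && data.Generation == _generation)
    {
        isWritable = data.IsWritable;
        return data.Pointer;
    }
    return LookupAndGetPage(page, ref data, out isWritable);
}

public void Reset()
{
    _generation++; // nothing else to touch... until the counter wraps around
}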
It sadly exposes a problem: what happens when we use the locator enough to encounter an overflow? What happens if a sequence of events brings us back to the same generation as a cached entry? We would be risking getting an old entry (from a previous transaction, which happened long ago). The chances of that are low, seriously low, but that is not an acceptable risk for us. So we need to go further. Here is what we ended up with:

using System.Runtime.InteropServices; // for MemoryMarshal

public unsafe class PageLocator
{
    private struct PageData
    {
        public long PageNumber;
        public byte* Pointer;
        public ushort Generation;
        public bool IsWritable;
    }

    private PageData[] _cache = new PageData[512];
    private int _generation = 1;

    public byte* Get(long page, out bool isWritable)
    {
        var index = page & 511;
        ref var data = ref _cache[index];
        if (data.PageNumber == page && data.Generation == _generation)
        {
            isWritable = data.IsWritable;
            return data.Pointer;
        }
        return LookupAndGetPage(page, ref data, out isWritable);
    }

    public void Reset()
    {
        _generation++;
        if (_generation >= 65535)
        {
            _generation = 1;
            // Wipe the whole cache in one heavily optimized fill instead of a loop.
            MemoryMarshal.Cast<PageData, byte>(_cache).Fill(0);
        }
    }
}

Once every 64K operations we'll pay the cost of resetting the entire buffer, but we do that with a single, heavily optimized fill. If needed, we can take it further. Comparing the performance traces from before and after the change, the cost of the Renew() call, which was composed mostly of the Reset() call, basically dropped off the radar, and performance roughly doubled. That is a pretty cool benefit for a straightforward change.
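For context on where that Reset() cost shows up, the locator is pooled and made ready again at the start of every transaction. Roughly (a simplified illustration with a hypothetical pool and page number, not the actual RavenDB transaction code):

// Inside an unsafe method: rent a pooled locator, reset it for the new transaction.
PageLocator locator = _locatorPool.Rent();   // hypothetical pool
locator.Reset();                             // formerly a loop over 512 entries, now usually one increment
try
{
    byte* ptr = locator.Get(pageNumber, out bool isWritable);
    // ... run the transaction against the page pointer ...
}
finally
{
    _locatorPool.Return(locator);            // reused by the next transaction
}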

.NET Framework January 2024 Security and Quality Rollup

by Salini Agarwal

posted on: January 09, 2024

January 2024 Security and Quality Rollup Updates for .NET Framework

.NET January 2024 Updates – .NET 8.0.1, 7.0.15, .NET 6.0.26

by Rahul Bhandari (MSFT)

posted on: January 09, 2024

Check out the latest January 2024 updates for .NET 8.0, .NET 7.0, and .NET 6.0

A brief look at StringValues

by Andrew Lock

posted on: January 09, 2024

In this post I look at the StringValues type, where it's used in ASP.NET Core, why it's useful, how it's implemented, and why…

Making primary constructor parameters read-only

by Gérald Barré

posted on: January 08, 2024

C# 12 introduced a new feature called primary constructors. This feature allows us to define a constructor directly in the class declaration.

// You can define a constructor directly in the class declaration.
public readonly struct Distance(double dx, double dy)
{
    public readonly double Mag…
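The post's subject is making those parameters effectively read-only. One common way to do that (an illustrative sketch with hypothetical names, not necessarily the exact approach from the post) is to copy each parameter into a readonly field and use only the field afterwards:

// Primary constructor parameters are mutable captures by default;
// copying them into readonly fields prevents accidental reassignment later.
public class Rectangle(double width, double height)
{
    private readonly double _width = width;
    private readonly double _height = height;

    public double Area => _width * _height; // use the fields, not the captured parameters
}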

RavenDB HTTP Compression: Bandwidth & Time reductions

by Oren Eini

posted on: January 03, 2024

I recently talked about how RavenDB is now using ZStd as the default compression algorithm for backups. That led to a reduction both in the amount of storage we consume for backups and in the time it takes to actually run them. We have been exploring where else we can get those benefits, and the changes were recently released in RavenDB 6.0.2. RavenDB now supports ZStd for HTTP compression, which you can control using the DocumentConventions.HttpCompressionAlgorithm convention. You can find all the gory details about the performance impact in the release announcement. The really nice thing is that you can expect to see about a 50% reduction in the amount of bandwidth being used, at comparable or better timings. That is especially true if you are using bulk inserts, where the benefit is most noticeable. If you are running on the cloud, that matters a lot, since a reduction in bandwidth to and from the database translates directly into dollars saved.
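For illustration, switching the client to ZStd HTTP compression would look roughly like this. This is a sketch: the DocumentConventions.HttpCompressionAlgorithm property comes from the post, but the exact enum value name and the store settings are assumptions.

using Raven.Client.Documents;

var store = new DocumentStore
{
    Urls = new[] { "http://localhost:8080" },  // illustrative URL
    Database = "Orders"                        // illustrative database name
};

// Assumed enum value name; conventions must be set before Initialize().
store.Conventions.HttpCompressionAlgorithm = HttpCompressionAlgorithm.Zstd;
store.Initialize();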

Announcing the Azure Migrate application and code assessment tool for .NET

by Olia Gavrysh

posted on: January 03, 2024

The new tool to help you move your .NET applications from on-premises to Azure is available in Visual Studio Marketplace and as a .NET CLI tool!

Recording

by Oren Eini

posted on: January 02, 2024

This was actually released a while ago; I was occupied with other matters and missed it. I had a blast talking with Carl & Richard about data sharding and how we implemented it in RavenDB. What is data sharding, and why do you need it? Carl and Richard talk to Oren Eini about his latest work on RavenDB, including the new data sharding feature. Oren talks about the power of sharding a database across multiple servers to improve performance on massive data sets. While a sharded database is typically in a single data center, it is possible to distribute the shards across multiple locations. The conversation explores the advantages and disadvantages of the different approaches, including that you might not need it today, but it's great to know it's there when you do! You can listen to the podcast here.

Backing up files to Azure blob storage with azcopy

by Andrew Lock

posted on: January 02, 2024

In this post I describe how I used the azcopy command-line tool to back up some photos to Azure Blob Storage…

Getting a handle for a directory in .NET

by Gérald Barré

posted on: January 02, 2024

On Windows, you can get a handle on a directory. This can be useful to prevent other applications from accessing or deleting the folder while your application is accessing it. You can use the Win32 method CreateFile to open a folder. This method is not available in .NET, but it is possible to use it via P/Invoke…
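As a rough illustration of the technique (a sketch, not necessarily the code from the post), opening a directory handle from C# via P/Invoke looks something like this; the FILE_FLAG_BACKUP_SEMANTICS flag is what allows CreateFile to open a directory rather than a regular file:

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class DirectoryHandle
{
    private const uint GENERIC_READ = 0x80000000;
    private const uint FILE_SHARE_READ = 0x00000001;         // no share-delete: others cannot delete the folder
    private const uint OPEN_EXISTING = 3;
    private const uint FILE_FLAG_BACKUP_SEMANTICS = 0x02000000;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern SafeFileHandle CreateFileW(
        string lpFileName,
        uint dwDesiredAccess,
        uint dwShareMode,
        IntPtr lpSecurityAttributes,
        uint dwCreationDisposition,
        uint dwFlagsAndAttributes,
        IntPtr hTemplateFile);

    public static SafeFileHandle Open(string directoryPath)
    {
        // FILE_FLAG_BACKUP_SEMANTICS is required to obtain a handle to a directory.
        var handle = CreateFileW(
            directoryPath,
            GENERIC_READ,
            FILE_SHARE_READ,
            IntPtr.Zero,
            OPEN_EXISTING,
            FILE_FLAG_BACKUP_SEMANTICS,
            IntPtr.Zero);

        if (handle.IsInvalid)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        return handle;
    }
}

Holding the returned SafeFileHandle keeps the directory open until it is disposed, which is what blocks other processes from deleting it in the meantime.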