Relatively General .NET

RavenDB 5.3 New Features

by Oren Eini

posted on: November 12, 2021

In RavenDB 5.0 we had a major new feature, native time series support. Using this feature, you can store values over time, query and aggregate them, store them efficiently, produce rollups, etc. The classic example for time series data in RavenDB is when you have data coming from sensors. For example, a Fitbit monitoring heart rate, or a stock exchange feed giving you stock values. You don’t care about a particular value, you care about the value over time. It turns out that there are quite a lot of use cases for those kinds of details. We have seen a major pickup in IoT related fields in particular.

However, the API we provided for users to insert data for time series had a limitation, have a look:

    // add one value
    void Append(DateTime timestamp, double value, string tag = null);
    // add multiple values (up to 32)
    void Append(DateTime timestamp, IEnumerable<double> values, string tag = null);

The API gives you the ability to record a value (or a set of values) at a particular point in time, with an optional tag for additional meaning. What is the problem with this API, then? Well, it works great if you are processing data from a singular source (the stock exchange feed, or a medical device), but it fails to do its job if you need to record multiple values for the same timestamp.

Huh? What does that even mean? If we are storing a value per timestamp, obviously there should be a single value for that timestamp. How can there be multiple values? Note that here I’m not talking about something like location (with latitude and longitude coordinates), those are covered under storing an array of values on the same timestamp.
The issue happens when you have the need to record multiple different values at the same timestamp. Typical time series are things like Heartrate, Location, StockPrice, etc. Having multiple values for the same thing at the same time frame doesn’t really work. In the Location time series, if I’m both here and there, you can expect trouble (if only because the paradox cops will show up). A stock may have different prices at the same time in different exchanges, sure, but that is not the same value, by its very nature.

There is a common scenario where this will happen: when what I’m recording is not the full value, but part of that value. The classic example for that is tracking page views. Let’s say that I want to know how many people are looking at this blog post. I cannot use the Append() API for that purpose. Each individual operation is going to belong to a particular timestamp. What happens if I have two views on this post at the exact same millisecond? For that matter, what happens in the more “interesting” case of having writes to the same millisecond on two different nodes in the cluster?

With time series as we envisioned them for the 5.0 release, that wasn’t an issue, a time series had a single value at a particular timestamp. But a scenario such as tracking views, or any scenario where we want to record partial data and have RavenDB take care of everything else, isn’t served well by this model.

Note that RavenDB already has the notion of distributed counters, which are intended specifically for doing such things. It is trivial in RavenDB to implement a counter that would track the overall views on a post. It will also handle concurrency, distributing data between nodes, everything that needs to be handled. So why can’t I use that? It turns out that I typically want to know more than just the total number of views on the post, I want to know when they happened. Counters are only a partial answer for that. That is why incremental time series were created.
They are here to marry the ability of time series to track a value over time and the ability of distributed counters to aggregate information concurrently and in a safe distributed manner. Here is the new API for incremental time series:

    // Increment at the current time, single value
    void Increment(double value);
    // Increment at the current time, up to 32 values
    void Increment(IEnumerable<double> values);
    // Increment at a given point in time (UTC only), single value
    void Increment(DateTime timestamp, double value);
    // Increment at a given point in time (UTC only), up to 32 values
    void Increment(DateTime timestamp, IEnumerable<double> values);

The changes are apparent at the API level: Increment() is not setting the value, it is incrementing it by a delta. So two increments on the same timestamp will give you the right result. Note that we don’t have a way to tag the entry any longer. That is no longer meaningful, because a single timestamp may have multiple different values. The method is called increment, but note that you can also pass negative values, if you want to reduce the amount.

You can see in the image on the right what this looks like in the studio. An incremental time series is one that has the “INC:” prefix in the name. Such a time series is able to accept only increment operations, it will reject attempts to append values to it. In the same sense, a non incremental time series will not allow you to increment a value, only append new entries. We wanted to have a strong separation between the two time series modes because mixing them up resulted in a huge mess of edge cases that are really hard to solve.
I probably should explain the terminology here, because it reflects an important distinction:

Append – add a new timestamp and the value(s) for that time. This appends a new entry to the time series. Appending an entry at a time that is already in the time series will overwrite the value at that time.
Increment – add a new timestamp and its values. If there is already a value for that time in the time series, we’ll add the new value and the existing value together, writing their sum as the new value.

That isn’t actually how it works internally, but that is the conceptual model. Aside from using increment to set the values, incremental time series behave just like any other time series. You can query over them, aggregate, index, etc. You can create rollups from them (a rolled up incremental time series is a normal time series, not an incremental one, so the special behavior does not extend to the rolled-up versions), apply retention policies, and do everything else that you can do with a time series.

Here is a full example of how you can use this feature:

    using var session = store.OpenAsyncSession();
    // this is safe to run concurrently
    session.IncrementalTimeSeriesFor("posts/2931-A", "INC:Views").Increment(1);
    await Task.Delay(1000); // simulate work
    // here we commit the transaction, the incremented timestamp
    // will be the recording time, not the tx commit time
    await session.SaveChangesAsync();

As usual, this is transactional with any other operation you may want to do, so you can increment a time series alongside uploading an attachment and modifying a document, as a single atomic transaction.
And now we can ask about view counts on an hourly basis for the last week, like so:

    from Posts
    where id() = 'posts/2931-A'
    select timeseries(
        from 'INC:Views'
        between $weekAgo and $now
        group by 1h
        select sum()
    )

This feature is going to be available in all editions of RavenDB 5.3, expected for release in mid November. I’ve got so many ideas about what you can use this for.
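To make the Append versus Increment distinction concrete, here is a minimal in-memory sketch of the conceptual model described above. It is not how RavenDB stores time series (the post itself notes that the internals differ), and `TimeSeriesModel` with its members is a hypothetical name invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual model only: an in-memory stand-in, not RavenDB's storage or client API.
public sealed class TimeSeriesModel
{
    private readonly SortedDictionary<DateTime, double> _entries = new();

    // Append: a value already stored at this timestamp is overwritten.
    public void Append(DateTime timestamp, double value) => _entries[timestamp] = value;

    // Increment: the delta is added to whatever is already stored at this timestamp.
    public void Increment(DateTime timestamp, double delta)
    {
        _entries.TryGetValue(timestamp, out var current); // current stays 0 when missing
        _entries[timestamp] = current + delta;
    }

    public double ValueAt(DateTime timestamp) => _entries[timestamp];

    // Rough in-memory equivalent of "group by 1h select sum()".
    public Dictionary<DateTime, double> SumByHour() =>
        _entries
            .GroupBy(e => new DateTime(e.Key.Year, e.Key.Month, e.Key.Day,
                                       e.Key.Hour, 0, 0, DateTimeKind.Utc))
            .ToDictionary(g => g.Key, g => g.Sum(e => e.Value));
}
```

With this model, appending 1 twice at the same timestamp leaves a value of 1 (the second view overwrites the first), while incrementing by 1 twice leaves a value of 2, which is the behavior page view tracking needs.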

RavenDB 5.3 New Features

by Oren Eini

posted on: November 11, 2021

Almost as soon as we introduced concurrent subscriptions, we ran into a serious problem in their use. The desire was to do things in a serial fashion. That was quite infuriating, because we spent so much time working on making things concurrent, and now we had to deal with making them serial again? What the hell?

Before I dive any further, it will probably be for the best if I explain a bit more about the context of this very strange feature request. Consider a system where the subscription is used to process commands, which may have relationships between one another. For example, consider the following commands (all of them belonging to the same “Commands” collection):

EmployeePayroll – commands/40-A
EmployeeBankAccountChange – commands/34-A
EmployeeContractUpdate – commands/49-C

For each one of those commands (and many more), we want to run some logic. Some of this requires us to touch third party services, which means that we are likely to be slow / stalled in some cases. That is the exact case for using concurrent subscriptions. The developers quickly jumped on the new system, setting the mode of the subscription as concurrent and running multiple workers. Things worked, latency was down and everyone was happy.

Everyone, that is, except for George. The problem was that George had gotten married recently. Well, that wasn’t the actual problem. George is happily married. The problem is that George and his wife have a new joint bank account. George let the HR department know about the new bank account in advance, which resulted in the EmployeeBankAccountChange command being generated. Then payroll day hit, and we have an EmployeePayroll command as well.

This is where things started to get iffy. In terms of timing, the EmployeeBankAccountChange happened before the EmployeePayroll command. When the subscription was running in serial mode, it was guaranteed that it would always process the commands in the order that they were modified.
That meant that handling things like changing the bank account and actually paying had a very natural order. If you made the change before payroll, it got processed beforehand, otherwise, it was processed afterward. With concurrent subscriptions, this is no longer the situation. We are still working roughly in the order of modification, but we are no longer guaranteeing it, and it is possible to process documents out of order.

RavenDB’s concurrent subscriptions will ensure that you’ll not have to worry about concurrent processing of a single document, but in this case, there are different documents, so they can be processed concurrently. An EmployeeBankAccountChange may take a long time (verifying accounts, etc.) while EmployeePayroll is just adding a line to an ACH file, so it is very likely that we’ll process the payroll before the account change. And that makes George very sad. Let’s see how we can avoid depressing the newlywed.

One option is to make use of another RavenDB feature, the compare exchange support. This allows you to use strongly consistent, cluster-wide values, which are suitable for distributed locks. I looked into what it would take to build this and quailed in fear. I don’t want to let things become this complicated.

The key issue here is that we want both concurrency and serial work. An interesting observation is that there is a scope for such things. Commands on the same employee should run in the same order they were issued, while commands on different employees are free to run in whatever order they like. How can we make this work without diving head first into complexity the likes of which will keep you up at night?

For the most part, we can assume that concurrent operations for the same employee are rare. Even when we have multiple commands for the same employee, we can expect that there won’t be many of them. Given that, we can change the way we model the commands themselves.
Instead of creating a document per command, we’ll have a document per employee. Where before we had this model:

    public record class EmployeePayrollCommand(string EmployeeId, decimal Amount);
    public record class EmployeeBankAccountChange(string EmployeeId, BankInfo bankInfo);
    public record class EmployeeContractUpdateCommand(string EmployeeId, HourlyRate newRate);

We’ll now have the following model:

    public record class EmployeeCmd();
    public record class EmployeePayrollCommand(decimal Amount) : EmployeeCmd();
    public record class EmployeeBankAccountChange(BankInfo bankInfo) : EmployeeCmd();
    public record class EmployeeContractUpdateCommand(HourlyRate newRate) : EmployeeCmd();

    // this is the document, containing an array of commands in order
    public record class EmployeeCommands(string EmployeeId, List<EmployeeCmd> Commands);

What does this give us? We now have a commands/employees/1-A document for the first employee, and all operations on the employee are handled as a single unit, guaranteed by the concurrent subscription. Let’s explore further how that works, okay? With the previous model, to register a command, we need to just call:

    using var session = store.OpenAsyncSession();
    await session.StoreAsync(new EmployeePayrollCommand("employees/1-A", 500));
    await session.SaveChangesAsync();

All the commands were using the Commands collection, so the subscription query will look like:

    from Commands

And if we process this concurrently, we may process the commands for the same employee at the same time, leading to sadness in the household of George. Instead, with the new model, we can use the patching API to handle this. Here is what this looks like:

    using var session = store.OpenAsyncSession();
    var cmd = new EmployeePayrollCommand(500);
    session.Advanced.AddOrPatch(
        "commands/employees/1-A",
        // create new instance
        new EmployeeCommands("employees/1-A", new List<EmployeeCmd> { cmd }),
        // run a patch operation on existing instance
        x => x.Commands,
        u => u.Add(cmd)
    );
    await session.SaveChangesAsync();

The idea in this case is that all commands for the same employee use the same document. If there isn’t already such a document, we’ll create a new instance, otherwise, we’ll apply the patch script and add to it. The end result is that we can have multiple concurrent operations and they will all be added to the same document in order of execution. However, so far this has nothing to do with concurrent subscriptions. What do we do from here? Here is what the subscription worker looks like after these changes:

    worker.Run(async batch =>
    {
        using var session = batch.OpenAsyncSession();
        foreach (var item in batch.Items)
        {
            foreach (var cmd in item.Result.Commands)
            {
                await cmd.ExecuteAsync();
            }
            // now clear the commands we executed / delete the document if needed
            session.Advanced.Defer(new PatchCommandData(item.Id, null, new PatchRequest
            {
                Script = @"
                    // splice() removes the executed commands in place
                    this.Commands.splice(0, args.CmdCount);
                    if (this.Commands.length == 0) // no more commands
                        del(id(this));
                ",
                Values = { ["CmdCount"] = item.Result.Commands.Count }
            }));
        }
        await session.SaveChangesAsync();
    });

The idea is that when we enqueue a command, we register it in the document specifically for the employee (the scope for serial work in a concurrent subscription) and when we process the commands in the subscription worker we patch out all the commands that we already executed. This behavior will guarantee that we can process commands serially within a concurrent worker. All commands for the same employee will be processed serially in the order they were submitted, while different employees will be processed concurrently. We even support adding additional commands to the employee document while the worker is processing commands, we’ll simply handle them in the next batch after the employee commands are all done.

One thing that I’m not discussing here is what to do in case we have concurrent modifications on the commands document in multiple nodes. That would generate a conflict and RavenDB defaults to selecting the latest version. You can configure RavenDB to resolve this properly, I talk about this at length here.
Aside from leaning on the new concurrent subscriptions feature, everything else that we have been using in this post to solve the problem is a long standing feature of RavenDB. Both conceptually and in practice, this gives us a great deal of simplicity in handling a non trivial issue. As usual, I would very much welcome your feedback.
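The scoping idea (serial within one employee, concurrent across employees) can be sketched without RavenDB at all. The following is a minimal in-memory illustration under stated assumptions: `Command`, `ScopedSerialProcessing` and `ProcessAsync` are hypothetical names, and `Task.Delay` stands in for the slow third party call. In the post, the grouping is done by the per-employee document and the concurrency by the subscription workers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical command shape, for illustration only.
public record Command(string EmployeeId, string Name);

public static class ScopedSerialProcessing
{
    // Runs commands of different employees concurrently, while keeping
    // submission order for commands that belong to the same employee.
    public static async Task<Dictionary<string, List<string>>> ProcessAsync(
        IEnumerable<Command> commands)
    {
        var perEmployee = commands
            .GroupBy(c => c.EmployeeId) // GroupBy keeps the original order inside each group
            .Select(async group =>
            {
                var executed = new List<string>();
                foreach (var cmd in group)
                {
                    await Task.Delay(1); // stand-in for a slow third party call
                    executed.Add(cmd.Name);
                }
                return (Employee: group.Key, Executed: executed);
            })
            .ToList(); // materialize so that every group starts running now

        var results = await Task.WhenAll(perEmployee);
        return results.ToDictionary(r => r.Employee, r => r.Executed);
    }
}
```

Feeding it a bank account change followed by a payroll command for the same employee always yields them in that order for that employee, no matter how the work for other employees interleaves.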

RavenDB 5.3 New Features

by Oren Eini

posted on: November 10, 2021

RavenDB supports a dedicated batch processing mode, using the notion of subscriptions. A subscription is simply a way to register a query with the database and have the database send the subscriber the documents that match the query. The previous sentence is taken directly from the Inside RavenDB book, and it is a good intro for the topic.

A subscription is a way to process documents that match a query. A good example might be to run various business processes as a result of data changes. Let’s assume that we have a bank, and a new customer was registered. We need to run plenty of such processes (Know Your Customer, Anti Money Laundering, Credit Score, in-house estimation, credit limits & authorization, etc.). A typical subscription query would then be:

    from Customers where Onboarded = false

And then we can register to that subscription. At this point, the database will start sending us all the customers that haven’t been onboarded yet. This is a persistent query, so restarts and failures are handled properly. And the key aspect is that RavenDB will push the matching documents to the subscription worker. RavenDB will handle batching of the results, ensure that we can process humongous amounts of data safely and easily, and in general remove a lot of hassle from backend processing.

Up until RavenDB 5.3, however, a subscription was defined to be a singleton. In other words, at any given point, only a single subscription worker could be running. That is enforced by the server and helps make it much easier to reason about processing documents. One comment that we got is that this is great if the processing that we are doing is internal, but if there is the need to make a remote call to a potentially slow service, that can be an issue. For example, consider the following worker code:

    worker.Run(async batch =>
    {
        using var session = batch.OpenAsyncSession();
        foreach (var item in batch.Items)
        {
            Customer customer = item.Result;
            customer.CreditScore = await CheckCreditScore(customer);
        }
        await session.SaveChangesAsync();
    });

What happens when CheckCreditScore() is slow? We are halting processing for everything. In some cases, it is only particular customers that are slow, and we absolutely want to process the rest in parallel. However, RavenDB did not allow that.

In RavenDB 5.3, we are bringing concurrent subscriptions to the table. When you create the subscription worker, you can define it with a Concurrent mode, like so:

    var subscription = store.Subscriptions.GetSubscriptionWorker<Customer>(
        new SubscriptionWorkerOptions("NewCustomers")
        {
            Strategy = SubscriptionOpeningStrategy.Concurrent
        });

When you have done that, RavenDB will allow multiple concurrent workers to run at the same time, processing batches in parallel. That means that a single slow customer will not halt your entire processing pipeline. In general, I would like you to think about this flag as just removing a limitation. Previously we blocked you from an operation, and now you can run freely. However…

We didn’t decide to limit your capabilities just because we like doing that. One of the key aspects of subscriptions is that they offer reliable processing of documents.
If an exception has been thrown when processing a batch, RavenDB will resend the batch to the worker again, until processing is successful. If we handed a batch of documents to a worker to process, and that worker crashed without letting us know, we need to make sure that the next client to connect will start processing from the last acknowledged batch. It turns out that adding concurrency and the ability for workers to work completely independently of one another makes such promises a lot harder to implement.

There is also another aspect that we have to consider. When we have just a single worker, certain concurrency issues never happen, but when we allow you to run concurrently, we have to deal with them. Consider the subscription above, running on two workers. We handed a new customer document to Worker A, which started processing it. While Worker A is processing the document, that document has changed. That means that it needs to be processed again by the subscription. We have Worker B available and ready, but if we allow such a scenario, we risk getting a race between the workers, working on the same document. We could punt that to the user and ask them to ensure that this is something that they handle, but that isn’t the philosophy of RavenDB.

Instead, we have implemented the following behavior for concurrent subscriptions: When the server sends a batch of documents to a worker, that worker “checks them out”. Until that worker signals the server that the batch has been either processed or failed, we’ll not send those documents out to other workers, even if they have been modified. Once a batch is acknowledged as processed, we’ll scan all the documents in that batch and see if we need to schedule them for the next batch, because they were missed while they were checked out. That means that from the perspective of the user, they can write code knowing that only a single subscription worker will run on a given document at a time.
This is a very powerful promise and can significantly reduce the complexity of building your systems. A single worker that is stalling will not prevent the other workers from making progress. There aren’t any timeouts to deal with. If you have a process that may take a long time, as long as the worker is alive and functioning (maintaining the TCP connection to the server), the server will consider the documents that the worker is processing as checked out.

Concurrent subscriptions require you to opt in, using the Concurrent flag. All workers for a subscription must agree to run in concurrent mode. This is to ensure that there aren’t any workers that expect a pure serial work model. If you aren’t setting this flag, you’ll keep getting the usual serial behavior of subscriptions. We require opting in to this behavior because we violate an important guarantee of the subscription, that you’ll process the documents in the order in which they were modified. This is now no longer the case, obviously.

The first worker to connect to a subscription will determine if it will run in concurrent mode or serial mode. Any new worker trying to run on that subscription needs to be concurrent (if the first one was concurrent), and no concurrent worker can join a subscription that has a serial worker active. It is important to note that this is a transient setting. When the last worker is shut down, the subscription state is reset, and then you can connect a worker for the first time again (which will then be able to set the mode of the subscription).

You can see in the benchmark image on the right the impact of adding concurrent workers when there is a non trivial processing time. It is important to note that the concurrent part of concurrent subscriptions is the fact that the workers are running in parallel. We are still sending batches of documents to each worker independently and then waiting for confirmation.
If you have no significant processing time for a batch, you’ll not see a significant improvement in processing time (the server side cost of processing the documents, sending the batch, etc. is related to the total number of documents, and won’t be impacted).

Concurrent subscriptions are available in RavenDB 5.3 (due to be released by mid November) and will be available in the Professional and Enterprise editions of RavenDB.
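The “check out” behavior described in this post can be sketched as a small piece of bookkeeping. This is a single-threaded illustration of the rules as the post states them, not RavenDB’s server implementation, and `CheckoutTracker` with its members is a hypothetical name. A real server would also need locking and persistence, which are left out here.

```csharp
using System;
using System.Collections.Generic;

// Illustration of the check-out rules only. Not thread-safe, not RavenDB code.
public sealed class CheckoutTracker
{
    private readonly Queue<string> _pending = new();
    private readonly HashSet<string> _queued = new();
    private readonly HashSet<string> _checkedOut = new();
    private readonly HashSet<string> _modifiedWhileCheckedOut = new();

    // A document changed and matches the subscription query.
    public void OnDocumentModified(string id)
    {
        if (_checkedOut.Contains(id))
        {
            // Some worker is busy with it: remember the change, send nothing yet.
            _modifiedWhileCheckedOut.Add(id);
            return;
        }
        if (_queued.Add(id))
            _pending.Enqueue(id);
    }

    // A worker asks for a batch: every document handed out is checked out.
    public List<string> TakeBatch(int maxSize)
    {
        var batch = new List<string>();
        while (batch.Count < maxSize && _pending.Count > 0)
        {
            var id = _pending.Dequeue();
            _queued.Remove(id);
            _checkedOut.Add(id);
            batch.Add(id);
        }
        return batch;
    }

    // The worker reports the batch as processed or failed: release the documents
    // and reschedule those that were modified while they were checked out.
    public void Acknowledge(IEnumerable<string> batch)
    {
        foreach (var id in batch)
        {
            _checkedOut.Remove(id);
            if (_modifiedWhileCheckedOut.Remove(id))
                OnDocumentModified(id);
        }
    }
}
```

In this sketch, if a customer document changes while Worker A holds it, a batch request from Worker B comes back without that document. It only shows up again in a later batch, after Worker A acknowledges.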

Screencast Video Demo Checklist

by Ardalis

posted on: November 09, 2021

Recording a short screencast video can be a very effective way to provide a demo to stakeholders or show how a bug can be reproduced. Follow…

File upload with progress bar in Blazor

by Gérald Barré

posted on: November 08, 2021

Uploading files may take time. Users want visibility into what's happening, so it is good to show them the progress. The simplest way to do that is to show a progress bar. In this post, we'll use the InputFile component to upload files and some custom code to show the progress bar. T

Finding and tracking a race condition in MemoryCache

by Oren Eini

posted on: November 05, 2021

Following my previous posts, about the use after free bug that was found in a pull request, I put a lot of effort into testing the validity of the fix. As it turned out, my code was right and the fix worked properly. However, I uncovered a race condition in the .NET MemoryCache implementation. Here is what I got when I put the system under load:

Unhandled exception. System.ArgumentException: Unable to sort because the IComparer.Compare() method returns inconsistent results. Either a value does not compare equal to itself, or one value repeatedly compared to another value yields different results. IComparer: 'System.Comparison`1[Microsoft.Extensions.Caching.Memory.CacheEntry]'.
   at System.Collections.Generic.ArraySortHelper`1.Sort(Span`1 keys, Comparison`1 comparer) in System.Private.CoreLib.dll:token 0x60066cd+0x1d
   at System.Collections.Generic.List`1.Sort(Comparison`1 comparison) in System.Private.CoreLib.dll:token 0x600688b+0x3
   at Microsoft.Extensions.Caching.Memory.MemoryCache.g__ExpirePriorityBucket|27_0(Int64& removedSize, Int64 removalSizeTarget, Func`2 computeEntrySize, List`1 entriesToRemove, List`1 priorityEntries) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000061+0x21
   at Microsoft.Extensions.Caching.Memory.MemoryCache.Compact(Int64 removalSizeTarget, Func`2 computeEntrySize) in Microsoft.Extensions.Caching.Memory.dll:token 0x600005b+0xff
   at Microsoft.Extensions.Caching.Memory.MemoryCache.OvercapacityCompaction(MemoryCache cache) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000059+0xad
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in System.Private.CoreLib.dll:token 0x6002b7c+0x110
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in System.Private.CoreLib.dll:token 0x6002c66+0x190

There are a few interesting things here. First, we can see that this killed the process. This isn’t just an error, this is an error from a background thread that ended up unhandled and killed everything. That is a nope, for server applications. The second issue is that the error is strange. What exactly is going on here?

Here is the relevant piece of code that throws the error, inside the MemoryCache:

    priorityEntries.Sort((e1, e2) => e1.LastAccessed.CompareTo(e2.LastAccessed));

This is a really interesting line, because of what it does. The priorityEntries is a local list of cache entries, which we need to sort by the last access, to figure out what we can evict. What can go wrong here?

Well, the MemoryCache is a naturally concurrent instance, of course. What happens when we have an access to the entry in question? We’ll update the LastAccessed value. And if we do this just right, we may give the sort function different values for the same cache entry, leading to this problem.

The bug in question was in place for as long as I can track the MemoryCache codebase. In the best case scenario, it will cause evictions of data that shouldn’t be evicted. Not a major issue, and unlikely to be noticed. But if it fails in this manner, it will kill the process, so it is very likely to be noticed.

My test scenario had a lot of concurrency and a very small cache (leading to a lot of evictions / compactions), and I’m guessing that is why I’m able to trigger this so easily.
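One general way to avoid this class of problem is to read each mutable sort key exactly once and sort on the captured copy, so the comparer always sees stable values. The sketch below illustrates that technique under stated assumptions. It is not the fix that went into MemoryCache, and `Entry`, `LastAccessedTicks` and `SnapshotSort` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for a cache entry whose timestamp other threads may update.
public sealed class Entry
{
    public string Key = "";
    public long LastAccessedTicks; // may be changed concurrently by readers of the cache
}

public static class SnapshotSort
{
    // Captures each timestamp once, then sorts by the captured value. Even if
    // LastAccessedTicks changes during the sort, the comparison stays consistent.
    public static List<Entry> OldestFirst(IEnumerable<Entry> entries)
    {
        var snapshot = entries
            .Select(e => (Entry: e, Ticks: Volatile.Read(ref e.LastAccessedTicks)))
            .ToList();
        snapshot.Sort((a, b) => a.Ticks.CompareTo(b.Ticks));
        return snapshot.Select(s => s.Entry).ToList();
    }
}
```

The trade-off is an extra allocation for the snapshot, and the resulting order may already be slightly stale by the time it is used. For choosing eviction candidates that is usually acceptable.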

Caching hostility–usage patterns that breaks your system

by Oren Eini

posted on: November 04, 2021

When you use a cache, you need to take into account several factors about the cache. There are several workload patterns that can cause the cache to turn into a liability, instead of an asset. One of the most common scenarios where you can pay heavily for the cache, but not benefit much, is when you have a sequential access pattern that exceed the size of the cache.Consider the following scenario: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters var cache = new MyCache(100); // Max items - 1000 for (var i =0; i < 1_000_000; i++) { DoSomething(cache, i % 128); } view raw bad_cache.cs hosted with ❤ by GitHub In this case, the size is set to 100, but the keys are sequential in the range of 0 .. 127. We are basically guaranteed to never have a cache hit. What is the impact of such a cache, however?Well, it will keep the reference alive for longer, so they will end up in the Gen2. On eviction, they will take longer to be discarded. In other words, adding a cache here will increase the amount of memory that is being used, have higher CPU utilization (the GC has to do more work) and won’t add any performance benefit at all. Removing the cache, on the other hand, will reduce both memory utilization and CPU costs. This can be completely unintuitive at first glance, but it is a real scenario, and sadly something that we had experienced many times in RavenDB 3.x editions. In fact, a lot of the design of RavenDB 4.x was about fixing those kinds of issues.Whenever you design a cache, you should consider what sort of adversity you have to face. Considering your users and adversaries, intentionally trying to break your software, is a good mindset to have. You get to avoid many pitfalls this way. There are many other caching anti patterns. 
For example, if you are using a distributed cache, the pattern of accesses to the cache may be more expensive than reading from the source. You issue many (fast) cache queries to produce a value, instead of one (somewhat slower) remote call. The network cost is typically huge, but often discounted (see: Fallacies of Distributed Computing).

But for an in-memory cache, it is easy to forget that a cache that is overloaded is just a memory hog, not providing much benefit at all. In the previous posts, I discussed how I should use a buffer pool in conjunction with the cache. That is done because of this particular scenario: if the cache is overloaded, and we discard values, we want to at least avoid doing additional allocations.

In many ways, a cache is a really complex piece of software. There has been a lot of research into it. Here is another non-intuitive result. Instead of evicting the least recently used (or least frequently used) value, select a value at random and evict it. Your performance is going to be better.

Why is that? Look at the code above, and let’s assume that I’m evicting a random value from the 25% least frequently used items. The fact that I’m doing that randomly means that there is a higher likelihood that some values will remain in the cache, even after they “should” have expired. And by the time I come back to them, they will still be in the cache and useful, instead of predictably evicted.

In many databases, the cache management takes up a huge part of the complexity. You usually have multiple levels of caches, and policies that move items between them. I really liked this post, discussing the Postgres algorithm in great detail. It also covers some aspects of nearly hostile behavior that the cache has to guard against, to avoid pathological performance drops.
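To make the sequential scan scenario and the random eviction claim more concrete, here is a minimal simulation sketch. It is not the cache from this post: `EvictionSim`, `LruHits` and `RandomHits` are hypothetical names, the caches only track keys (no values), and the random policy evicts uniformly from the whole cache rather than from the 25% least frequently used items.

```csharp
using System;
using System.Collections.Generic;

public static class EvictionSim
{
    // Counts hits for a cyclic scan over `keyRange` keys against an LRU cache
    // that holds at most `capacity` keys.
    public static int LruHits(int capacity, int keyRange, int accesses)
    {
        var order = new LinkedList<int>(); // front = most recently used
        var nodes = new Dictionary<int, LinkedListNode<int>>();
        int hits = 0;
        for (int i = 0; i < accesses; i++)
        {
            int key = i % keyRange;
            if (nodes.TryGetValue(key, out var node))
            {
                hits++;
                order.Remove(node);   // move the key to the front
                order.AddFirst(node);
                continue;
            }
            if (nodes.Count == capacity)
            {
                int victim = order.Last.Value; // least recently used key
                order.RemoveLast();
                nodes.Remove(victim);
            }
            nodes[key] = order.AddFirst(key);
        }
        return hits;
    }

    // Same workload, but a full cache evicts a uniformly random resident key.
    public static int RandomHits(int capacity, int keyRange, int accesses, int seed)
    {
        var rng = new Random(seed);
        var slots = new List<int>(capacity);
        var present = new HashSet<int>();
        int hits = 0;
        for (int i = 0; i < accesses; i++)
        {
            int key = i % keyRange;
            if (present.Contains(key)) { hits++; continue; }
            if (slots.Count == capacity)
            {
                int victimIndex = rng.Next(slots.Count);
                present.Remove(slots[victimIndex]);
                slots[victimIndex] = key; // reuse the victim's slot
            }
            else
            {
                slots.Add(key);
            }
            present.Add(key);
        }
        return hits;
    }

    public static void Main()
    {
        Console.WriteLine("LRU hits:    " + LruHits(100, 128, 1_000_000));
        Console.WriteLine("Random hits: " + RandomHits(100, 128, 1_000_000, seed: 42));
    }
}
```

With a capacity of 100 and keys cycling through 0 .. 127, the LRU count is zero, since every key is evicted 28 accesses before it is requested again. The random policy lets some keys survive until the scan returns to them, so it gets some hits on the same workload.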

Challenge

by Oren Eini

posted on: November 03, 2021

After presenting the issue of how to return items to the array pool without creating a use-after-free bug, I asked you how you would fix that. There are several ways to try to do that: you can use a reference counting scheme, or try to use locking, etc. All of those options are expensive. What is worse, they are expensive on a routine basis, not just on the code path that frees the buffer.

Instead, I changed the way we are handling returning the buffer. Take a look at the following code:

    // requires: System.Buffers, System.Runtime.CompilerServices,
    // Microsoft.Extensions.Caching.Memory
    public class ReturnBuffer
    {
        public byte[] Buffer;

        // runs only once the GC has found this object unreachable
        ~ReturnBuffer()
        {
            if (Buffer != null)
            {
                ArrayPool<byte>.Shared.Return(Buffer);
                Buffer = null;
            }
        }
    }

    private static ConditionalWeakTable<object, object> _joinLifetimes = new();

    private void EvictionCallback(object key, object value, EvictionReason reason, object state)
    {
        // tie the lifetime of the ReturnBuffer to the lifetime of the evicted buffer
        _joinLifetimes.Add(value, new ReturnBuffer { Buffer = (byte[])value });
    }

This may require some explanation. I’m using a ConditionalWeakTable here, which was added to the runtime to enable dynamic properties on objects. Basically, it creates a table that you can look up by a key object to get an associated value. The most important feature is that the runtime ensures that the lifetime of the associated value matches the lifetime of the key object. In other words, when we add the buffer in the eviction callback, we ensure that the ReturnBuffer we register will live at least as long as the buffer.

That means that we can let the GC do the verification job. We’ll now return the buffer back to the pool only after the GC has ensured that there are no outstanding references to it. Not a lot of code, and an elegant solution.
This also ensures that we are only paying the cost on eviction (likely rare), and not all the time.
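The lifetime coupling described above can be observed in a small standalone sketch. This is an illustration under assumptions, not the code from the post: `LifetimeDemo`, `ReturnOnFinalize`, `ReturnWhenUnreachable`, `IsTracked` and `EvictOne` are hypothetical names, and when (or whether) the finalizer runs depends on the runtime and build configuration, so the program only prints what it observes.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

public static class LifetimeDemo
{
    // Hypothetical companion object: hands the buffer back to the pool when finalized.
    private sealed class ReturnOnFinalize
    {
        public byte[] Buffer;

        ~ReturnOnFinalize()
        {
            if (Buffer == null) return;
            ArrayPool<byte>.Shared.Return(Buffer);
            Buffer = null;
        }
    }

    private static readonly ConditionalWeakTable<object, object> Companions =
        new ConditionalWeakTable<object, object>();

    // Attaches a companion to the buffer. The table does not keep the buffer
    // alive, even though the companion itself references it.
    public static void ReturnWhenUnreachable(byte[] buffer)
    {
        Companions.Add(buffer, new ReturnOnFinalize { Buffer = buffer });
    }

    // True while the buffer has a companion registered in the table.
    public static bool IsTracked(byte[] buffer)
    {
        return Companions.TryGetValue(buffer, out _);
    }

    // Kept in a separate, non-inlined method so that no local variable in Main
    // holds on to the buffer after this call returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference EvictOne()
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
        ReturnWhenUnreachable(buffer);
        return new WeakReference(buffer);
    }

    public static void Main()
    {
        WeakReference tracker = EvictOne();
        Console.WriteLine("alive before GC: " + tracker.IsAlive);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("alive after GC:  " + tracker.IsAlive);
    }
}
```

As long as any caller still holds the buffer, the companion cannot be finalized, so the buffer cannot be handed back to the pool underneath that caller.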