Page 69 • Relatively General .NET

The cost of timing out

by Oren Eini

posted on: February 07, 2023

Let’s assume that you want to make a remote call to another server. Your code looks something like this:

    var response = await httpClient.GetAsync("https://api.myservice.app/v1/create-snap", cancellationTokenSource.Token);

This is simple, and it works, until you realize that you have a problem. By default, this request will time out in 100 seconds. You can set a shorter timeout using the HttpClient.Timeout property, but that will lead to other problems.

The problem is that internally, inside HttpClient, if you are using a Timeout, it will call CancellationTokenSource.CancelAfter(). That is... what we want to do, no? Well, in theory, but there is a problem with this approach. Let's look at how this actually works, shall we?

It ends up setting up a Timer instance, as you can see in the code. The problem is that this will modify a global value (well, one of them; by default there are N timers in the process, where N is the number of CPUs that you have on the machine). What that means is that in order to register a timeout, you need to take a lock. If you have a high concurrency situation, setting up the timeouts may be incredibly expensive.

Given that the timeout is usually a fixed value, within RavenDB we solved that in a different manner. We set up a set of timers that go off periodically and use those instead. We can request a task that will be completed on the next timeout duration. This way, we'll not be contending on the global locks, and we'll have a single value to set when the timeout happens.
The code we use ends up being somewhat more complex:

    var sendTask = httpClient.GetAsync("https://api.myservice.app/v1/create-snap", cancellationTokenSource.Token);
    var waitTask = TimeoutManager.WaitFor(TimeSpan.FromSeconds(15), cancellationTokenSource.Token);
    if (Task.WaitAny(sendTask, waitTask) == 1)
    {
        throw new TimeoutException("The request to the service timed out.");
    }

Because we aren't spending a lot of time doing setup for a (rare) event, we can complete things a lot faster. I don't like this approach, to be honest. I would rather have a better system in place, but it is a good workaround for a serious problem when you are dealing with high-performance systems.

You can see how we implemented the TimeoutManager inside RavenDB. The goal was to get roughly the same time frame, but we are absolutely fine with doing roughly the right thing, rather than paying the full cost of doing this exactly as needed. For our scenario, roughly is more than accurate enough.
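
To make the shared-timer idea concrete, here is a minimal sketch of it. This is not RavenDB's actual TimeoutManager: the class name CoarseTimeouts, the per-duration bucket, and the choice to wait for two ticks are assumptions made for illustration, and cancellation-token support is left out for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not RavenDB's TimeoutManager.
// One periodic timer per distinct duration, shared by every caller.
public static class CoarseTimeouts
{
    private sealed class Bucket
    {
        private readonly Timer _timer; // held in a field so the timer stays alive
        private TaskCompletionSource<bool>? _next;

        public Bucket(TimeSpan period) =>
            _timer = new Timer(_ => Tick(), null, period, period);

        // Runs once per period: complete the task the current waiters share.
        private void Tick() =>
            Interlocked.Exchange(ref _next, null)?.TrySetResult(true);

        // A task that completes on the next tick. No timer is registered here,
        // the only shared write is a single compare-and-swap.
        public Task NextTick()
        {
            while (true)
            {
                var current = Volatile.Read(ref _next);
                if (current != null) return current.Task;

                var fresh = new TaskCompletionSource<bool>(
                    TaskCreationOptions.RunContinuationsAsynchronously);
                if (Interlocked.CompareExchange(ref _next, fresh, null) == null)
                    return fresh.Task;
            }
        }
    }

    private static readonly ConcurrentDictionary<long, Lazy<Bucket>> Buckets =
        new ConcurrentDictionary<long, Lazy<Bucket>>();

    // Completes after roughly one to two times `duration`.
    public static async Task WaitFor(TimeSpan duration)
    {
        var bucket = Buckets.GetOrAdd(
            duration.Ticks,
            _ => new Lazy<Bucket>(() => new Bucket(duration))).Value;

        // The first tick may be about to fire, so wait for a second one
        // to make sure at least `duration` has passed.
        await bucket.NextTick().ConfigureAwait(false);
        await bucket.NextTick().ConfigureAwait(false);
    }
}
```

The trade-off in this sketch is precision: a caller waits somewhere between one and two periods, which fits the "roughly is accurate enough" stance above.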

Generating the response writing expression for RequestDelegate

by Andrew Lock

posted on: February 07, 2023

Behind the scenes of minimal APIs - Part 6

Using source-generated regex in ASP.NET Core route constraints

by Gérald Barré

posted on: February 06, 2023

To use the Regex source generator, you need to use .NET 7 and C# 11. Source Generated Regexes provide multiple advantages over traditional regexes: Faster startup time as all the code is generated at compile time. You don't need to parse the regex pattern and generate an optimized code to execute th

On AI, GPT and the future of developers

by Oren Eini

posted on: February 02, 2023

When I started using GitHub Copilot, I was quite amazed at how good it was. Sessions using ChatGPT can be jaw-dropping in terms of the generated content. The immediate reaction from many people is to consider what the impact of that would be on the humans who currently fill those roles. Surely, if we can get a machine to do the task of a human, we can all benefit (except for the person made redundant, I guess). I had a long discussion on the topic recently and I think that it is a good topic for a blog post, given the current interest in the subject matter.

The history of replacing manual labor with automated machines goes back as far as you’d like to stretch it. I wouldn’t go back to the horse & plow, but certainly the Luddites and their arguments about the impact of machinery on the populace will sound familiar to anyone today. The standard answer is that some professions will go away, but new ones will pop up instead. The classic example is the ice salesman. That used to be a profession: a guy on a horse-drawn carriage who would sell you ice to keep your food cold. You can assume that this profession is no longer relevant, of course.

The difference here is that we now have computer programs and AI taking over what was classically thought impossible to automate. You can ask Dall-E or Stable Diffusion for an image and in a few seconds, you’ll have a beautiful render that may actually match what you requested. You can start writing code with GitHub Copilot and it will predict what you want to do to an extent that is absolutely awe-inspiring.

So what is the role of the human in all of this? If I can ask ChatGPT or Copilot to write me an email validation function, what do I need a developer for? Here is ChatGPT’s output: [image] And here is Copilot’s output: [image] I would rate the MailAddress version better, since I know that you can’t actually manage emails via Regex.
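
For readers who have not seen the MailAddress approach, a minimal sketch of such a check might look like the following. This is not the output that either tool produced in the original post; the class and method names are hypothetical, and it assumes .NET 5 or later, where MailAddress.TryCreate is available.

```csharp
#nullable enable
using System;
using System.Net.Mail;

// Hypothetical helper, written for illustration only.
public static class EmailCheck
{
    // Returns true when the text parses as a single, bare mail address.
    public static bool LooksLikeEmail(string? candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate)) return false;

        // MailAddress also accepts display-name forms such as
        // "Bob <bob@example.com>", so require that the parsed address
        // matches the whole input.
        return MailAddress.TryCreate(candidate, out var parsed)
            && string.Equals(parsed.Address, candidate, StringComparison.OrdinalIgnoreCase);
    }
}
```

Even this only says that the text is shaped like an address, not that a mailbox exists behind it.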
I tried to take this further and ask ChatGPT about the Regex, and got: [image] ChatGPT is confused, and the answer doesn’t make any sort of sense. Most of the time spent on “research” for this post was waiting for ChatGPT to actually produce a result, but this post isn’t about nitpicking, actually.

The whole premise around “machines will make us redundant” is that the sole role of a developer is taking a low-level requirement such as email validation and producing the code to match. Writing such low-hanging fruit is not your job. For that matter, a function is not your job. Nor is writing code a significant portion of that. A developer needs to be able to build the system architecture and design the interaction between components and the overall system. They need to make sure that the system is performant, meets the non-functional requirements, etc. A developer would spend a lot more time reading code than writing it.

Here is a more realistic example of using ChatGPT, asking it to write to a file using a write-ahead log. [image] I am both amazed by the quality of the answer and find myself unable to use even a bit of the code in there. The scary thing is that this code looks correct at a glance. It is wrong, dangerously so, but you’ll need to be a subject matter expert to know that. In this case, the provided solution doesn’t meet the requirements, has security issues and doesn’t actually work. On the other hand, I asked it about password hashing and I would give this answer a good mark. I believe it will get better over time, but the overall context matters.

We have a lot of experience in trying to get the secretary to write code. There have been many tools trying to do that, going all the way back to CASE in the 80s. There used to be a profession called “computer”, where you could hire a person to compute math for you. Pocket calculators didn’t invalidate them, and Excel didn’t make them redundant. They are now called accountants or data scientists, instead.
They use the new tools (admittedly, calling calculators or Excel new feels very strange) to boost their productivity enormously. Developing with something like Copilot is a far easier task, since I can usually just tab complete a lot of the routine details. But having a tool to do some part of the job doesn’t mean that there is no work to be done. It means that a developer can speed up the routine bits and get to grips faster and more easily with the other challenges they have, such as figuring out why the system doesn’t do what it needs to, improving existing behavior, etc.

Here is a great way to use ChatGPT as part of your work: ask it to optimize a function. For this scenario, it did a great job. For more complex scenarios? There is too much context to express.

My final conclusion is that this is a really awesome tool to assist you. It can have a massive impact on productivity, especially for people working in an area that they aren’t familiar with. The downside is that sometimes it will generate junk; then again, sometimes real people do that as well. The next few years are going to be really interesting, since it provides a whole new level of capability for the industry at large, but I don’t think that it will shake the reality on the ground.

Generating argument expressions for minimal APIs (continued)

by Andrew Lock

posted on: January 31, 2023

Behind the scenes of minimal APIs - Part 5

Mocking an HttpClient using ASP.NET Core TestServer

by Gérald Barré

posted on: January 30, 2023

I've already written about mocking an HttpClient using an HttpClientHandler. You can write the HttpClientHandler yourself or use a mocking library. Multiple NuGet packages can help you write the HttpClientHandler such as Moq, RichardSzalay.MockHttp, HttpClientMockBuilder, SoloX.CodeQuality.Test.Hel

Production postmortem

by Oren Eini

posted on: January 27, 2023

A customer reported a scenario where RavenDB was using stupendous amounts of memory, on the order of tens of GB on a system that didn’t have that much load. Our first suspicion was that this was an issue with reading the metrics, since RavenDB will try to keep as much of the data in memory as it can, which sometimes leads users to worry. I spoke about this at length in the past. In this case, that wasn’t the case. We were able to drill down into the exact cause of the memory usage and we found out that RavenDB was using an abnormally high amount of memory. The question was why that was, exactly.

We looked into the common operations on the server, and we found a suspicious query. It looked something like this:

    from index 'Sales/Actions' where endsWith(WorkflowStage, '/Final')

The endsWith query was suspicious, so we looked into that further. In general, endsWith requires us to scan all the unique terms for a particular field, but in most cases, there aren’t that many unique values for a field. In this case, however, that wasn’t the case. Here are some of the values for WorkflowStage:

    Workflows/3a1af12a-b5d2-4c96-9348-177ebaacab6c/Step-2
    Workflows/6aacc86c-2f28-4b8b-8dee-1024314d5add/Final

In total, there were about 250 million sales in the database, each one of them with a unique WorkflowStage value. What does this mean, in terms of RavenDB query execution? Well, the fields are indexed, but we need to effectively do:

    public IEnumerable<string> EndsWith(string field, string endsWith)
    {
        foreach ((string term, string documentId) in GetFieldValues(field))
        {
            if (term.EndsWith(endsWith))
                yield return documentId;
        }
    }

This isn’t the actual code, but it will show you what is going on. In other words, in order to process this query, we have to scan (and materialize) all 250 million unique terms for this field. Obviously that is going to consume a lot of memory.

But what is the solution to that? Instead of doing an expensive endsWith query, we can move the computation from query time to index time. In other words, instead of indexing the WorkflowStage field as is, we’ll extract the information we want from it. The index would have one of those:

    IsFinalWorkFlowStage = doc.WorkflowStage.EndsWith("/Final"),
    WorkflowStagePostfix = doc.WorkflowStage.Split('/').Last()

The first one will check whether the value is final or not, while the second just gets the postfix for the field (one of hopefully just a few). We can then query using equality instead of endsWith, leading to far better performance and greatly reduced memory usage, since we don’t need to materialize any values during the query.
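
The effect of moving the work to index time can be illustrated outside of RavenDB with a plain in-memory sketch. Nothing here is RavenDB's API or its index format: the PostfixIndexSketch class and its Build and Query methods are hypothetical, and a dictionary stands in for the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for an index, for illustration only.
public static class PostfixIndexSketch
{
    // "Index time": pay the string work once per document and group
    // document ids by the last segment of WorkflowStage.
    public static Dictionary<string, List<string>> Build(
        IEnumerable<(string DocumentId, string WorkflowStage)> docs)
    {
        var byPostfix = new Dictionary<string, List<string>>();
        foreach (var (documentId, stage) in docs)
        {
            var postfix = stage.Split('/').Last();
            if (!byPostfix.TryGetValue(postfix, out var ids))
            {
                ids = new List<string>();
                byPostfix[postfix] = ids;
            }
            ids.Add(documentId);
        }
        return byPostfix;
    }

    // "Query time": a single equality lookup, with no scan over
    // the (possibly hundreds of millions of) unique terms.
    public static IReadOnlyList<string> Query(
        Dictionary<string, List<string>> index, string postfix)
    {
        if (index.TryGetValue(postfix, out var ids))
            return ids;
        return Array.Empty<string>();
    }
}
```

The number of distinct keys is now the number of distinct postfixes (Final, Step-2, and so on) rather than one per sale, which is why the equality query stays cheap.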

Generating argument expressions for minimal APIs

by Andrew Lock

posted on: January 24, 2023

Behind the scenes of minimal APIs - Part 4

Which collection interface to use?

by Vladimir Khorikov

posted on: January 23, 2023

Let’s talk about when to use which collection type and why.

Production postmortem

by Oren Eini

posted on: January 23, 2023

A user of ours called us, quite frantic. They are running a lot of systems on RavenDB, and have been for quite some time. However, very recently they started to run into severe issues. RavenDB would complain that there isn’t sufficient memory to run. The system metrics, however, said that there are still gobs of GBs available (I believe that this is the appropriate technical term).

After verifying the situation, the on-call engineer escalated the issue. The problem was weird. There was enough memory, for sure, but for some reason RavenDB would be unable to run properly. An important aspect is that this user is running a multi-tenant system, with each tenant being served by its own database. Each database has a few indexes as well. Once we figured that out, it was actually easy to understand what was going on.

There are actually quite a few limits that you have to take into account. I talked about them here. In that post, the issue was the maximum number of tasks defined by the system, after which you can no longer create new threads. In this case, the suspect was vm.max_map_count. Beyond just total memory, Linux has a limit on the number of memory mappings that a process may have. RavenDB uses Voron, which is based on mmap(), and each database and each index typically have multiple maps going on. Given the number of databases involved…

The solution was to increase the max_map_count and add a task for us, to give a warning to the user ahead of time when they are approaching the system's limits.
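
An early-warning check of this kind could be sketched as follows. This is not how RavenDB implements its warning: the MapCountCheck class and the 80% threshold are assumptions for illustration. It relies on two Linux facts: the limit is exposed in /proc/sys/vm/max_map_count, and /proc/self/maps lists one mapping of the current process per line.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch of an early-warning check, Linux only.
public static class MapCountCheck
{
    // Reads the current number of memory mappings of this process
    // and the configured per-process limit.
    public static (int Current, int Limit) Read()
    {
        int limit = int.Parse(File.ReadAllText("/proc/sys/vm/max_map_count").Trim());
        int current = File.ReadLines("/proc/self/maps").Count();
        return (current, limit);
    }

    // True when usage has reached the given fraction of the limit.
    // The default of 0.8 is an arbitrary choice for this sketch.
    public static bool IsNearLimit(int current, int limit, double threshold = 0.8)
    {
        return current >= limit * threshold;
    }
}
```

Calling Read() periodically and logging when IsNearLimit returns true gives the operator time to raise vm.max_map_count before mmap() calls start failing.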