Relatively General .NET

Challenge

by Oren Eini

posted on: December 14, 2022

Take a look at the following code. What do you think it will print?

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

await new Scenario().Run();

public class Scenario
{
    public AsyncLocal<bool> Active = new();

    private async Task Start()
    {
        Active.Value = true;
    }

    public async Task Run()
    {
        Console.WriteLine(Active.Value);
        await Start();
        Console.WriteLine(Active.Value);
    }
}
```

Since it obviously doesn’t print the expected value, why do you think this is the case?
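For a hint: the ExecutionContext, which carries AsyncLocal values, flows *into* an async method but not back out, so the write inside Start() is invisible to Run(). A minimal sketch of a variant that does propagate the change, using a mutable holder (my own illustration, not part of the original challenge):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

await new Scenario().Run();

public class Scenario
{
    // Store a mutable box instead of the value itself: the inner method
    // then mutates shared state rather than writing into its own flowed
    // copy of the execution context.
    public AsyncLocal<StrongBox<bool>> Active = new();

    private async Task Start()
    {
        Active.Value!.Value = true;
    }

    public async Task Run()
    {
        Active.Value = new StrongBox<bool>(false);
        Console.WriteLine(Active.Value.Value); // False
        await Start();
        Console.WriteLine(Active.Value.Value); // True
    }
}
```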

Recording

by Oren Eini

posted on: December 12, 2022

Self-hosting Chromium extensions

by Gérald Barré

posted on: December 12, 2022

Google Chrome and Microsoft Edge provide a store to publish extensions. Extensions published on the store are validated by testers, so you know the quality is ok and the extension does what the description says. However, there is no staging environment before publishing a new version. Also, you can…

Solving support issues in other people’s products

by Oren Eini

posted on: December 06, 2022

A user contacted us to tell us that RavenDB does not work in their environment. As you can imagine, we didn’t really like to hear that, so we looked deeper into the issue. The report included the actual problem, which looked something like this:

```json
{
  "Url": "/auth/",
  "Type": "Raven.Client.Exceptions.Routing.RouteNotFoundException",
  "Message": "There is no handler for path: GET /auth/",
  "Error": "Raven.Client.Exceptions.Routing.RouteNotFoundException: There is no handler for path: GET /auth/\n"
}
```

My reaction to that was… huh?! That is a really strange error, since RavenDB does not have an “/auth/” endpoint. The problem isn’t with RavenDB, it is with something else. In this case, the user ran RavenDB on port 8080 (which is the normal thing to do) and then tried to access RavenDB in the browser. The problem was that they had previously run some other software on that port, and that software had the following interaction:

```
* Connected to 127.0.0.1 port 8080
> GET / HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.85.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://127.0.0.1:8080/auth/
< Content-Type: text/html; charset=UTF-8
```

In other words, it redirected the browser to the “/auth/” endpoint. It’s critical to understand that a 301 response means: Moved Permanently. That means such responses are cached by the browser. In this case, the scenario was reusing the same endpoint for a different software product, and the browser cache meant that we got strange results.
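The practical lesson is to prefer temporary redirects until a route is truly final, because browsers will happily keep replaying a cached 301 without contacting the server. A minimal illustration (an ASP.NET Core minimal API sketch of my own, not from the original incident):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

// 301 Moved Permanently: browsers may cache this and redirect on their
// own, without ever asking the server again.
app.MapGet("/old", () => Results.Redirect("/new", permanent: true));

// 302 Found: re-checked against the server on every request, so it can
// be changed or removed later without leaving stale redirects behind.
app.MapGet("/tmp", () => Results.Redirect("/new", permanent: false));

app.Run();
```

Once a 301 has been cached, a hard refresh or clearing the browser cache is usually what it takes to dislodge it.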

5 new MVC features in .NET 7

by Andrew Lock

posted on: December 06, 2022

In this post I look at some of the new features introduced for MVC controllers in .NET 7…

Building emergency pathways in your software (never to be used)

by Oren Eini

posted on: December 05, 2022

Preconditions, postconditions, and invariants, oh my! The old adage about Garbage In, Garbage Out is a really important aspect of our profession. If you try to do things that don’t make sense, the output will be nonsensical.

“On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” ~ Charles Babbage, inventor of the first computer

As you can see, the issue isn’t a new one. And there are many ways to deal with it. You should check your inputs, assume they are hostile, double-check on every layer, etc. Those are the principles of sound programming design, after all. This post is about a different topic.

When everything is running smoothly, you want to reject invalid operations and dangerous actions. The problem is when everything is hosed. The concept of emergency operations is something that should be a core part of the design, because emergencies happen, and you don’t want to try to carve new paths in an emergency.

Let’s consider a scenario such as when the root certificate has expired, which means that there is no authentication. You cannot authenticate to the servers, because the auth certificate you use has also expired. You need physical access, but the data center won’t let you in, since you cannot authenticate. Surely that is fiction, right? It happened last year to Facebook (bad IP configuration, not certs, but the same behavior).

An important aspect of good design is to consider what you’ll do in the really bad scenarios. How do you recover? For complex systems, it’s very easy to get to the point where you have cross dependencies. For example, your auth service relies on the database cluster, which uses the auth service for authentication. If both services are down at the same time, you cannot bring them up.

Part of the design of good software is building the emergency paths. When the system breaks, do you have a well-defined operation that you can take to recover? A great example of that is fire doors in buildings. They are usually alarmed and open to the outside world only, preventing their regular use. But in an emergency, they allow the quick and safe evacuation of a building, instead of creating a chokepoint.

We recently got into an internal discussion about a particular feature in RavenDB (modifying the database topology). There are various operations that you shouldn’t be able to make, because they are dangerous. They are also the sort of things that allow you to recover from disaster. We ended up creating two endpoints for this feature. One includes checks and verification. The second is an admin-only endpoint that is explicitly meant for the “I know what I mean” scenario.

RavenDB actually has quite a few of those scenarios. For example, you can authenticate to RavenDB using a certificate, or, if you have root access on the machine, you can use the OS authentication mechanism instead. We have had scenarios where users lost their certificates and were able to use the alternative mechanism to recover.

Making sure to design those emergency pathways ahead of time means that you get to do it with a calm mind and consider more options. It also means that you get to verify that your emergency mechanism doesn’t hinder normal operations. For example, the alarmed fire door. Or, in the case of RavenDB, relying on the operating system permissions as a backup if you are already running as a root user on the machine.

Having those procedures ahead of time, documented and verified, ends up being really important at crisis time. You don’t need to stumble in the dark or come up with new ways to do things on the fly. This is especially important since you cannot assume that the usual invariants are in place. Note that this is something that is very easy to miss; after all, you spend a lot of time designing and building those features, never to use them (hopefully). The answer to that is that you also install sprinklers and fire alarms with the express hope and intent to never use them in practice.

The amusing part is what we call this: making sure it is all up to code. You need to ensure that your product and code are up to code.
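To make the two-endpoints idea concrete, here is a minimal sketch (hypothetical routes and helper names; RavenDB’s actual endpoints differ, and the authentication wiring is elided). The normal path validates the change; the admin-only path bypasses the checks and leaves a loud audit trail:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddAuthorization(options =>
    options.AddPolicy("ClusterAdmin", policy => policy.RequireRole("ClusterAdmin")));
var app = builder.Build();

// Normal path: refuses anything that would endanger the cluster.
app.MapPost("/topology/modify", (TopologyChange change) =>
    IsSafe(change)
        ? Apply(change)
        : Results.BadRequest("Rejected: this change would break the cluster."));

// Emergency path: the "fire door". Admin-only, skips the safety checks,
// and is audited so its use never goes unnoticed.
app.MapPost("/admin/topology/force-modify", (TopologyChange change) =>
    {
        Audit(change); // leave a trail for the post-mortem
        return Apply(change);
    })
    .RequireAuthorization("ClusterAdmin");

app.Run();

static bool IsSafe(TopologyChange change) => true; // validate quorum, replication factor, ...
static IResult Apply(TopologyChange change) => Results.Ok();
static void Audit(TopologyChange change) { }

record TopologyChange(string Database, string[] Nodes);
```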

Prevent http requests to external services in unit tests

by Gérald Barré

posted on: December 05, 2022

Note: This blog post is part of The C# Advent Calendar, a series of 50 posts about C#. Be sure to check out the rest of the blog posts in the calendar! Flakiness in unit tests refers to the tendency of a test to sometimes pass and sometimes fail, even when there has been no change to the code being tested…
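One common way to catch such leaks early (a sketch of mine, not necessarily the technique the full post describes) is a DelegatingHandler that fails fast whenever a test tries to reach a non-loopback host:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Throws whenever a test attempts an HTTP request to an external host.
public sealed class BlockExternalRequestsHandler : DelegatingHandler
{
    public BlockExternalRequestsHandler()
        : base(new HttpClientHandler()) { }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var host = request.RequestUri?.Host;
        if (host is not ("localhost" or "127.0.0.1" or "::1"))
            throw new InvalidOperationException(
                $"Unit test attempted an external HTTP request: {request.RequestUri}");

        return base.SendAsync(request, cancellationToken);
    }
}

// Usage in a test fixture:
// var client = new HttpClient(new BlockExternalRequestsHandler());
```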

Looking into Corax’s posting lists: Part I

by Oren Eini

posted on: November 30, 2022

One of the big changes in RavenDB is the new search engine, Corax. We want to replace Lucene with a purpose-built search engine, capable of doing everything that we can do with Lucene, but far faster. In this post, I want to discuss a particular detail of the implementation: managing the posting list.

In information retrieval, the idea of a posting list is that we have a list of document ids that match a particular term. I’m ignoring the details of how we create or manage the list. The basic idea is that we need to store a potentially large number of document ids, update them on the fly, and query them. Lucene manages that by creating immutable segments, but that design decision doesn’t match the way we want to do things in Corax. The problem with immutable segments is that you’ll eventually need to run compaction, which can be really expensive.

As a database, we already have a really good data structure that matches most of those requirements, it turns out. A B+Tree can do a pretty good approximation of a posting list, but it does have a problem: it’s not efficient in terms of space usage. A document id is a 64 bits integer, and we can make a number of assumptions about it. Therefore, I created a dedicated B+Tree-like structure to hold the posting list. This B+Tree is called a Set inside of Voron, and it can hold any number of uint64 values. Internally, this is managed as two separate types of pages: the Branch pages (which are fairly standard B+Tree branches) and the Leaf pages. Here is how the first version of the Set Leaf Page implementation works (a sketch of the layout follows below).

The page has a notion of a base. That means all the document IDs in the page share the same upper 33 bits. Basically, inside a single page, all the IDs are in the same 2 billion range. That means that even though the document ids are 64 bits, in the context of a single page, we can treat them as 32 bits integers. That turns out to be important, since most integer compression routines work on 32 bits integers.

How does that work? We have a section in the page that is dedicated to raw values, and we insert values into that section until it is full. Whenever that happens, we compress the raw values section using PForDelta compression. The page will then contain multiple compressed segments and a single raw values segment. Each compressed segment is non-overlapping with the other compressed segments (but may overlap with the raw values). PForDelta is really good at compressing integers, especially if it is able to figure out patterns in your data. And document IDs in Corax are explicitly designed to have common patterns, so it will be able to take advantage of this behavior. When we read the entries from the page, we merge the compressed segments with the raw values and return the full details. The code isn’t particularly simple, but it takes a fairly straightforward approach to the problem once you understand the design.

One thing that I haven’t touched on is the notion of removals. That is an important concept, and we handle it in an interesting way. Remember that I said that the baseline for the page is the upper 33 bits? That is because the numbers inside the page are 31 bits in length; we reserve the top bit as a tombstone marker. In other words, when we write to the raw values, we can have either a new value or a removed value, distinguished using the uppermost bit. When we fill the raw values segment, we compress it alongside the relevant compressed segments.
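Here is that layout sketched in code (the field and type names are my own invention; the real Voron implementation differs):

```csharp
using System.Runtime.InteropServices;

// Hypothetical layout of a Set Leaf Page, based on the description above.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct SetLeafPageHeader
{
    public long Baseline;              // upper 33 bits shared by every entry in the page
    public ushort CompressedSegments;  // number of PForDelta-compressed segments
    public ushort RawValueCount;       // entries not yet compressed
    // ... followed in the page body by the compressed segments and the
    //     raw values section.
}

static class SetLeafEncoding
{
    // Each raw entry is a 32-bit slot: 31 bits of id (relative to the
    // page baseline) plus the top bit, which marks the entry as a removal.
    const uint TombstoneBit = 1u << 31;

    public static uint EncodeAddition(ulong documentId, long baseline) =>
        (uint)(documentId - (ulong)baseline);

    public static uint EncodeRemoval(ulong documentId, long baseline) =>
        (uint)(documentId - (ulong)baseline) | TombstoneBit;

    public static bool IsRemoval(uint entry) => (entry & TombstoneBit) != 0;
}
```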
At compression time, we filter out the removed values. This is very similar to the segments that Lucene is using, but we do that on a page boundary (8KB), not across all values.

We are able to push a lot of values into a single page. We typically see thousands to tens of thousands of document IDs fitting into a single 8KB page. That is pretty amazing, since even if you have a posting list that has millions of entries, the actual disk reads are minimal.

This design was with us throughout most of the development of Corax, but we ran into a serious issue with the way it works when we started to build the query optimizer for Corax. That is an interesting statement, I think you’ll agree. What is the relation between a (very) low-level design of the on-disk data format and the result of the query optimizer? Well, it turns out that one of the things that we need to know for the query optimizer is: “How big is this posting list?” This question is actually really important for generating optimal queries. And given the structure we have, we can provide a good answer most of the time, but not always.

Why is that? The problem is what happens when we want to remove a value from the set, or add an existing value. If the value already exists inside a compressed segment, we don’t open the compressed segment (which would require re-writing it from scratch), so we record an addition that is actually spurious. Conversely, if we try to remove a value that isn’t in the set, we’ll wrongly decrement the number of entries in the posting list, leading to a mismatch between the recorded number of entries and the actual number we have in the posting list. That was very confusing to figure out, I’ll admit. It was also really hard to fix, but I’ll address that in the next post in the series.
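To see why the count drifts, here is the bookkeeping in miniature (an illustrative sketch, not the actual Corax code):

```csharp
using System.Collections.Generic;

// A page that only appends to its raw values section cannot know whether
// an id is already buried inside a compressed segment, so its entry
// count is maintained blindly.
class PostingListPage
{
    const uint TombstoneBit = 1u << 31;
    readonly List<uint> _rawValues = new();

    public long NumberOfEntries; // what the query optimizer asks about

    public void Add(uint relativeId)
    {
        _rawValues.Add(relativeId);
        NumberOfEntries++; // incremented even if relativeId already sits
                           // in a compressed segment: a spurious addition
    }

    public void Remove(uint relativeId)
    {
        _rawValues.Add(relativeId | TombstoneBit);
        NumberOfEntries--; // decremented even if relativeId was never in
                           // the set: the count now under-reports
    }
}
```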

Fundamental knowledge

by Oren Eini

posted on: November 29, 2022

I’ve been calling myself a professional software developer for just over 20 years at this point. In the past few years, I have gotten into teaching university courses in the Computer Science curriculum. I have recently had the experience of supporting a non-techie as they went through a(n intense) coding bootcamp (aiming at full stack / front end roles). I’m also building a distributed database engine and all the associated software. I list all of those details because I want to make an observation about the distinction between fundamental and transient knowledge.

My first thought is that there is so much to learn. Comparing the structure of C# today to what it was when I learned it (pre-beta days, IIRC), it is a very different language. I had literally decades to adjust to some of those changes, but someone that is just getting started needs to grasp everything all at once. When I learned JavaScript, you still had browsers in the market that didn’t recognize it, so you had to do the “//<!--” trick to get things to work (don’t ask!).

This goes far beyond mere syntax and familiarity with language constructs. The overall environment is also critically important. One of the basic tasks that I give in class is something similar to: “Write a network service that would serve as a remote dictionary for key/value operations”. Most students have a hard time grasping details such as IP vs. host, TCP ports, how to read from the network, error handling, etc. Adding a relatively simple requirement (make it secure from eavesdroppers) will take it entirely out of their capabilities.

Even taking a “simple” problem, such as building a CRUD website, is fraught with many important details that aren’t really visible: responsive design, mobile friendliness, state management and user experience, to name a few. Add requirements such as accessibility and you are setting the bar too high to reach. I intentionally chose the examples of accessibility and security, because those are “invisible” requirements. It is easy to miss them if you don’t know that they should be there.

My first website was a PHP page that I pushed to the server using FTP and updated live in “production”. I was exposed to all the details about DNS and IPs, understood exactly that the server side was just a machine in a closet, and had very low levels of abstraction. (Naturally, the solution had no security or any other –ities.) However, the knowledge from those early experiments has served me very well for decades. The same goes for details such as how TCP works or the basics of operating system design. Good familiarity with the basic data structures (heap, stack, tree, list, set, map, queue) paid for itself many times over. The amount of time that I spent learning WinForms… still usable and widely applicable even in other platforms and environments. WPF or jQuery? Not so much. Learning patterns paid many dividends and was applicable to a wide range of applications and topics.

I looked into the topics that are being taught (both in bootcamps and universities) and I understand why, in many cases, those are being skipped. You can actually be a front end developer without understanding much (if at all) about networks. And the breadth of details you need to know is immense. My own tendency is to look at the low-level stuff, and given that I work on a database engine, that is obviously quite useful. What I have found, however, is that whenever I dug deep into a topic, I found ways to utilize that knowledge at a later point in time.

Sometimes I was able to solve a problem in a way that would previously have been utterly inconceivable to me. I’m not just talking about being able to immediately apply new knowledge to a problem. If that were the case, I would attribute it to wanting to use the new thing I just learned. However, I’m talking about scenarios where months or years later I ran into a problem, and was then able to find the right solution given what had seemed like totally useless knowledge.

In short, I understand that chasing the 0.23-alpha-stage-2.3.1-dev updates on the left-pad package is important, but I have found that spending time deep in the stack has a great cumulative effect. Joel Spolsky wrote about leaky abstractions twenty years ago. I remember reading that blog post and grokking it. And it is true: being able to dig one or two layers down from where you usually live gives you a huge amount of leverage.

Managing delays in distributed systems with RavenDB

by Oren Eini

posted on: November 28, 2022

A user came to us with an interesting scenario. They have a RavenDB cluster, which is running in a distributed manner. At some point, a user creates a document inside of RavenDB and also posts a message using SQS (Amazon’s queuing system) to be picked up by a separate process.

The problem they ran into is that there is an inherent race condition in the way they work. The backend worker that picks up the messages may use a different node to read the data than the one that it was written to. RavenDB uses an asynchronous replication model, which means that if the queue and the backend workers are fast enough, they may try to load the relevant document from a server before it was replicated to it. Amusingly enough, that typically happens on light load (not a mistake, mind). On high load, the message processing time is usually sufficient to delay things enough for replication to happen. On light load, the message is picked up immediately, exposing the race condition.

The question is, how do you deal with this scenario? If this were just a missing document, that would be one thing, but we also need to handle another scenario: while the message is waiting to be processed in the queue, the document may be updated by the user. So the question now is, how do we handle distributed concurrency in a good manner using RavenDB?

The answer to this question is the usage of change vectors. A change vector is a string that represents the version of a document in a distributed environment. The change vector is used by RavenDB to manage optimistic concurrency. This is typically used to detect changes to a document behind our backs, but we can use it in this scenario as well.

The idea is that when we put the message on the queue, we’ll include the change vector that we got from persisting the document. That way, when the backend worker picks up the message and starts using it, the worker can compare the change vectors. If the document doesn’t exist, we can assume that there is a replication delay and try again later. If the document exists and the change vector is different, we know the document was modified, and we may need to use different logic to handle the message in question.
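A minimal sketch of that flow (the session calls, GetChangeVectorFor in particular, are real RavenDB client APIs; the Order type, message shape, and queue plumbing are my own placeholders):

```csharp
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;
using Raven.Client.Documents;

public class Order
{
    public string Id { get; set; }
}

public record OrderMessage(string Id, string ChangeVector);

public static class OrderFlow
{
    // Producer: persist the document, then ship its change vector
    // alongside the message.
    public static async Task PublishAsync(
        IDocumentStore store, IAmazonSQS sqs, string queueUrl)
    {
        using var session = store.OpenAsyncSession();
        var order = new Order();
        await session.StoreAsync(order);
        await session.SaveChangesAsync();

        var changeVector = session.Advanced.GetChangeVectorFor(order);
        await sqs.SendMessageAsync(new SendMessageRequest
        {
            QueueUrl = queueUrl,
            MessageBody = JsonSerializer.Serialize(
                new OrderMessage(order.Id, changeVector)),
        });
    }

    // Consumer: compare change vectors before trusting the document.
    public static async Task<bool> TryProcessAsync(
        IDocumentStore store, OrderMessage msg)
    {
        using var session = store.OpenAsyncSession();
        var order = await session.LoadAsync<Order>(msg.Id);
        if (order == null)
            return false; // replication delay: leave the message queued, retry later

        var current = session.Advanced.GetChangeVectorFor(order);
        if (current != msg.ChangeVector)
        {
            // The document was modified after the message was enqueued;
            // it may need different handling, or a newer message covers it.
        }

        Process(order);
        return true;
    }

    static void Process(Order order) { /* business logic */ }
}
```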