Page 70 • Relatively General .NET

Generating argument expressions for minimal APIs

by Andrew Lock

posted on: January 24, 2023

Behind the scenes of minimal APIs - Part 4

Which collection interface to use?

by Vladimir Khorikov

posted on: January 23, 2023

Let’s talk about when to use which collection type and why.

Production postmortem

by Oren Eini

posted on: January 23, 2023

A user of ours called us, quite frantic. They are running a lot of systems on RavenDB, and have been for quite some time. However, very recently they started to run into severe issues. RavenDB would complain that there isn’t sufficient memory to run. The system metrics, however, said that there were still gobs of GBs available (I believe that this is the appropriate technical term).
After verifying the situation, the on-call engineer escalated the issue. The problem was weird. There was enough memory, for sure, but for some reason RavenDB was unable to run properly. An important aspect is that this user is running a multi-tenant system, with each tenant being served by its own database. Each database has a few indexes as well. Once we figured that out, it was actually easy to understand what was going on.
There are actually quite a few limits that you have to take into account. I talked about them here. In that post, the issue was the maximum number of tasks defined by the system; once you hit it, you can no longer create new threads. In this case, the suspect was vm.max_map_count. Beyond just total memory, Linux has a limit on the number of memory mappings that a process may have. RavenDB uses Voron, which is based on mmap(), and each database and each index typically have multiple maps going on. Given the number of databases involved…
The solution was to increase vm.max_map_count and to add a task for us: give the user a warning ahead of time when they are approaching the system's limits.
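The kind of early warning described above can be sketched in a few lines of C#. This is only an illustration, not RavenDB's actual check: the class name `MapCountCheck`, its method names, and the 80% threshold are assumptions made for the example. It relies on two standard Linux facts: `/proc/sys/vm/max_map_count` holds the per-process limit, and `/proc/<pid>/maps` lists one mapping per line.

```csharp
using System;
using System.IO;
using System.Linq;

public static class MapCountCheck
{
    // Reads the per-process limit on memory mappings from procfs.
    public static long ReadLimit(string path = "/proc/sys/vm/max_map_count") =>
        long.Parse(File.ReadAllText(path).Trim());

    // Each line of /proc/<pid>/maps describes one mapping of that process.
    public static long CountMappings(string path = "/proc/self/maps") =>
        File.ReadLines(path).LongCount();

    // True once usage reaches the given fraction of the limit.
    // The 0.8 default is an arbitrary choice for this sketch.
    public static bool IsNearLimit(long used, long limit, double threshold = 0.8) =>
        used >= limit * threshold;

    public static void Main()
    {
        if (!OperatingSystem.IsLinux())
        {
            Console.WriteLine("This check only applies to Linux.");
            return;
        }

        long limit = ReadLimit();
        long used = CountMappings();
        Console.WriteLine($"mappings: {used} of {limit}");
        if (IsNearLimit(used, limit))
            Console.WriteLine("Warning: approaching vm.max_map_count");
    }
}
```

A server would run a check like this periodically against its own process. Raising the limit itself is done outside the process, typically through sysctl.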

Chrome Browser Tabs to Links

by Ardalis

posted on: January 23, 2023

It often happens when you're looking for a solution to a problem that you end up with a bunch of browser tabs open. Some were dead ends…

Using .NET code from JavaScript using WebAssembly

by Gérald Barré

posted on: January 23, 2023

Blazor WebAssembly allows running a .NET web application in a browser. Starting with .NET 7, you can easily run any .NET method from JavaScript without needing the whole Blazor framework. Let's see how to run a .NET method from JavaScript! First, you need to install the WASM workload, so you can pub

Exploring the model-binding logic of minimal APIs

by Andrew Lock

posted on: January 17, 2023

Behind the scenes of minimal APIs - Part 3

Production postmortem

by Oren Eini

posted on: January 16, 2023

A user reported that they observed nodes in the cluster “going dark”. Basically, they would stop communicating with the rest of the cluster, but would otherwise appear functional. Both the internal and external metrics were all fine; the server would just stop responding to anything over the network. The solution for the problem was to restart the service (note, the service, not the whole machine), but the problem would happen every few days.
As you can imagine, we take this sort of thing very seriously, so we looked into the problem. And we came up short. The problem made absolutely no sense. It appeared with a (minor) version migration, but there was absolutely nothing related to this that could cause it. What was really weird was that the service itself continued to work. We could see log entries being written and it was able to execute scheduled backups, for example. It would just refuse to talk to us over the network.
That was super strange, since the network itself was fine. All the monitoring systems were green, after all. For that matter, the user was able to SSH into the system to restart the service. This didn’t match any other issue we could think of. Since the user worked around the problem by restarting the server, we didn’t have a lead.
Then we noticed the exact same problem in one of our cloud instances, and there we have much better diagnostic capabilities. Once we had noticed a problematic server, we were able to SSH into it and try to figure out what was going on. Here is what we found out:
- The server will not respond to HTTP(s) communication, either from outside the machine or from inside the machine.
- The server will respond to SNMP queries both from inside the machine and outside of it (which is how we typically monitor the system).
When we designed RavenDB, we implemented a “maintenance hatch” for such scenarios. In addition to using HTTP(s) for communication, RavenDB also exposes a named pipe that allows you to connect to the server without going through the network at all. This ensures that if you have administrator privileges on the server, you are able to connect even if there are network issues, certificate problems, etc.
Here is the kicker. In this particular situation, we could not activate this escape hatch. That is not supposed to be possible. Named pipes on Linux, where we ran into the problem, are basically Unix sockets. A network issue such as a firewall problem or something similar isn’t going to affect them. At the same time, we were able to communicate with the process using SNMP. What is the problem?
Lacking any other options, we dumped the process, restarted the service, and tried to do the analysis offline. We couldn’t find any problem. All the details we looked at said that everything was fine: the server was properly listening for new connections and it should work. That was… weird.
And then it happened again, and we did the same analysis, and it came back the same. We were clueless. One of the things that we updated between versions was the .NET runtime that we were using, so we opened an issue to see if anyone had run into the same problem.
And then it happened again. This time, we knew that just looking at the dump wouldn’t help us, so we tried other avenues. Linux has a pretty rich set of knobs and dials that you can look at to see what is going on. We suspected that this may be an issue with running out of file descriptors, running out of memory, etc.
We tried looking into what was going on inside the process using strace, and everything was fine. The trace clearly showed that the server was processing requests and was able to send and receive data properly.
Wait, go through that statement again please! It is fine?
But the reason we are using strace is that there is a problem. It looks like the problem fixed itself. That was annoying, because we were hoping to use the trace to figure out what was going on. We added more monitoring along the way, which would let us know if the server found itself isolated. And we waited.
The next time we ran into the problem, the first thing we did was run strace. We needed to get to the root cause as soon as possible, and we were afraid that the problem would fix itself before we had a chance to find it. The moment we used strace, the server got back online, continuing as if there was never any issue.
Over the next few instances of this issue, we were able to confirm the following observations:
- The service would stop responding to TCP and Unix sockets entirely.
- There were no firewall or network issues.
- The service was up and functional; tailing the log showed activity.
- We could query the server state using SNMP.
- Running strace on the service process would fix the problem.
One more detail: the actual trigger for the fix wasn’t strace itself. It was the ptrace() call that strace uses. That would cause the service to start responding again. The ptrace() call is basically the beginning and the end of debugging under Linux. Everything uses it. If you want to dump a process's memory, you start with ptrace(). You want to trace the calls? ptrace(). You want to debug the process? GDB will start by calling ptrace(), etc. And doing that would alleviate the problem.
That was… quite annoying. We still had absolutely no indication of what the root cause even was. We suspected it may be something inside Kestrel that was causing a problem. But that wouldn’t explain the failure of the named pipes / Unix sockets that we also saw. Networking worked, because SNMP did. We thought that this may be because SNMP uses UDP instead of TCP, and looked into that, but we couldn’t figure out how that would be any different.
Looking at this further, we found that we have this in the dumps:
    ~~~~ 5072
    1 Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
    1 System.Net.Sockets.SocketAsyncEngine.EventLoop()
    1 System.Net.Sockets.SocketAsyncEngine+<>c.<.ctor>b__14_0(Object)
As you can see, we are waiting for this in the .NET Sockets thread. The SNMP thread, on the other hand, looked like this:
    Thread (0x559):
    [Native Frames]
    System.Net.Sockets!System.Net.Sockets.SocketPal.SysReceive()
    System.Net.Sockets!System.Net.Sockets.SocketPal.TryCompleteReceiveFrom()
    System.Net.Sockets!System.Net.Sockets.SocketAsyncContext.ReceiveFrom()
    System.Net.Sockets!System.Net.Sockets.SocketPal.ReceiveFrom()
    System.Net.Sockets!System.Net.Sockets.Socket.ReceiveFrom()
    SharpSnmpLib.Engine!Lextm.SharpSnmpLib.Pipeline.ListenerBinding.AsyncReceive()
That was really interesting, since it meant that for sockets (both HTTP and Unix), we were always using async calls, but for SNMP, we were using the synchronous API. We initially suspected that this may be something related to the thread pool. Maybe we had something that blocked it, but it turned out to be a lot more interesting. Here is the code that is actually handling the SNMP:
    var count = _socket.ReceiveFrom(buffer, ref remote);
    Task.Factory.StartNew(() => HandleMessage(buffer, count, (IPEndPoint)remote));
In other words, we are actually reading from the socket in a blocking manner, but then processing the actual message using the thread pool. So being able to get results via SNMP meant the thread pool was fine. At this point we resorted to hair pulling, rubber ducking and, in some instances, shaking our fists at heaven.
I reminded myself that I’m an adult with a bit of experience solving problems, and dug deeper. We started looking into how .NET is actually handling sockets in async mode.
This ends up here, doing a system call:
    while ((numEvents = epoll_wait(port, events, *count, -1)) < 0 && errno == EINTR);
Reading through the man page for epoll_wait(), I learned how epoll works, that it is complex, and that we need to be aware of level-triggered and edge-triggered options. Since .NET uses edge-triggered events (EPOLLET, which I keep reading as electronic chicken), we focused on that. There are a lot of edge cases and things to cover, but everything we checked was handled properly.
We finally had a good smoking gun. For some reason, we weren’t getting notifications from epoll, even though we should have. Using strace or friends somehow fixes that. We actually found the exact scenario we saw on Stack Overflow, but without any idea what the issue was. Truly, there is an XKCD for everything.
Our current understanding of the issue: all async sockets in .NET go through the same socket engine and use epoll under the covers. SNMP is using synchronous calls, so it wasn’t using epoll. That covers both of the weird things that we were seeing.
So what is the issue? It is not in .NET. Given the size & scope of .NET, we wouldn’t be the only ones seeing it. Below .NET, there is the kernel, so we looked into that. The machines we were running on were using kernel 5.4.0-azure-1095. And it looked like it is a kernel bug, which was fixed in the next updated kernel. A race condition inside the kernel would cause us to miss wakeups, and then we would basically just stall without anything to wake us up.
We dug deeper to understand a bit more about this situation, and we got this:
Some system calls return with EINTR if a signal was sent to a tracee, but delivery was suppressed by the tracer. (This is very typical operation: it is usually done by debuggers on every attach, in order to not introduce a bogus SIGSTOP).
As of Linux 3.2.9, the following system calls are affected (this list is likely incomplete): epoll_wait(2), and read(2) from an inotify(7) file descriptor. The usual symptom of this bug is that when you attach to a quiescent process with the command
    strace -p <process-ID>
then, instead of the usual and expected one-line output such as
    restart_syscall(<... resuming interrupted call ...>_
or
    select(6, [5], NULL, [5], NULL_
('_' denotes the cursor position), you observe more than one line. For example:
    clock_gettime(CLOCK_MONOTONIC, {15370, 690928118}) = 0
    epoll_wait(4,_
What is not visible here is that the process was blocked in epoll_wait(2) before strace(1) has attached to it. Attaching caused epoll_wait(2) to return to user space with the error EINTR. In this particular case, the program reacted to EINTR by checking the current time, and then executing epoll_wait(2) again. (Programs which do not expect such "stray" EINTR errors may behave in an unintended way upon an strace(1) attach.)
And… that is exactly what is happening. On attaching, epoll_wait() returns with EINTR, which causes .NET to retry the call, and that “fixes” the issue. It makes total sense now, and concludes the discovery process of a pretty nasty bug.
Now, if you’ll excuse me, I need to go and apologize to a rubber duck.
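To make the SNMP side of the story concrete, here is a minimal sketch of the "blocking receive, then hand off to the thread pool" pattern the post describes. It is not the SharpSnmpLib code: the class `BlockingReceiveDemo`, the method `RoundTripAsync`, and the loopback UDP setup are all invented for the illustration. Per the stack traces in the post, a receive like this blocks in the receive call on its own thread rather than waiting for the epoll-driven socket engine, which is why SNMP kept answering while the async HTTP and Unix sockets were stalled.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class BlockingReceiveDemo
{
    // Sends one UDP datagram to a loopback listener and returns what the
    // listener's handler saw.
    public static async Task<string> RoundTripAsync(string message)
    {
        using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: let the OS pick
        var listenEndPoint = (IPEndPoint)listener.LocalEndPoint;

        var handled = new TaskCompletionSource<string>();

        // Dedicated thread that blocks in ReceiveFrom, as in the post's snippet.
        _ = Task.Factory.StartNew(() =>
        {
            var buffer = new byte[1024];
            EndPoint remote = new IPEndPoint(IPAddress.Any, 0);
            int count = listener.ReceiveFrom(buffer, ref remote); // synchronous, blocking

            // Only the message handling goes to the thread pool.
            Task.Run(() => handled.SetResult(Encoding.UTF8.GetString(buffer, 0, count)));
        }, TaskCreationOptions.LongRunning);

        using var sender = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        sender.SendTo(Encoding.UTF8.GetBytes(message), listenEndPoint);

        return await handled.Task;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RoundTripAsync("ping"));
    }
}
```

The point of the sketch is the split: the wait for data happens in a blocking call, and only the handler depends on the thread pool. That matches the post's reasoning that working SNMP ruled out a starved thread pool.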

Reducing Blazor WASM size by providing custom ICU data

by Gérald Barré

posted on: January 16, 2023

The International Components for Unicode (ICU) is a set of libraries that provide Unicode and internationalization support for software applications. Unicode is a standardized encoding system that represents almost all of the written languages of the world. It is used to represent characters in com

RavenDB Sharding Webinar recording

by Oren Eini

posted on: January 13, 2023

The recording of my webinar showing off the new Sharding feature in RavenDB 6.0 is now live. I’m showcasing the new technology preview of RavenDB 6.0 and we have a nightly build already available for it. I think the webinar was really good, and I’m super excited to discuss all the ins and outs of how this works. Please take a look, and take the software for a spin. We have managed to get sharding down to a simple, obvious and clear process. And we are very much looking for your feedback.

Using client side encrypted fields in RavenDB

by Oren Eini

posted on: January 10, 2023

Sometimes, you need to hold on to data that you really don’t want to have access to. A great example may be that your user will provide you with their theme color preference. I’m sure you can appreciate the sensitivity of preferring a light or dark theme when working in the IDE. At any rate, you find yourself in an interesting situation: you have a piece of data that you don’t want to know about. In other words, the threat model we have to work with is that we protect the data from a malicious administrator. This may seem to be a far-fetched scenario, but just today I was informed that my email was inside the 200M users leak from Twitter. Having an additional safeguard ensures that even if someone manages to lay their hands on your database, there is little that they can do with it.
RavenDB supports Transparent Data Encryption. In other words, the data is encrypted on disk and will only be decrypted while there is an active transaction looking at it. That is a server-side operation: there is a single key (not actually true, but close enough) that is used for all the data in the database. For this scenario, that is not good enough. We need to use a different key for each user. And even if we have all the data and the server’s encryption key, we should still not be able to read the sensitive data.
How can we make this happen? The idea is that we want to encrypt the data on the client, with the client’s own key, which is never sent to the server. What the server sees is an encrypted blob, basically. The question is how we can make it work as easily as possible. Let’s look at the API that we use to get it working:
    public class User
    {
        public string Name { get; set; }
        public string Email { get; set; }
        public Encrypted<string> PreferredTheme { get; set; }
    }

    [JsonConverter(typeof(EncryptedJsonConverter))]
    public class Encrypted<T> : EncryptedJsonConverter.IShouldEncrypt
    {
        public Encrypted() { }
        public Encrypted(T v) { Value = v; }
        public T Value { get; set; }
        object EncryptedJsonConverter.IShouldEncrypt.GetValue() => Value;
    }
As you can see, we indicate that the value is encrypted using the Encrypted<T> wrapper. That class is a very simple wrapper, with all the magic actually happening in the assigned JSON converter. Before we look into how that works, allow me to show you what this document looks like to RavenDB:
    {
        "Name": "Oren Eini (Ayende Rahien)",
        "Email": "ayende@ayende.com",
        "PreferredTheme": {
            "Tag": "yqyhaf08pbZz7tdS4xiTpA==",
            "Data": "Q5Txbs3I/SS/Q7vcNQHZIKyR++2tIhJWlS2AIXLXMME=",
            "Nonce": "a6+QoR534F7T68Aq"
        },
        "@metadata": {
            "@collection": "Users"
        }
    }
As you can see, we don’t actually store the data as is. Instead, we have an object that stores the encrypted data as well as the authentication tag. The above document was generated from the following code:
    using (var session = store.OpenSession())
    {
        session.Store(new User
        {
            Email = "ayende@ayende.com",
            Name = "Oren Eini (Ayende Rahien)",
            PreferredTheme = new Encrypted<string> { Value = "Dark" }
        }, "users/1");
        session.SaveChanges();
    }
The JSON document holds the data we have, but without knowing the key, we do not know what the encrypted value is. The actual encrypted value is composed of three separate (and quite important) fields:
- Tag: the authentication tag that ensures that the value we decrypt is indeed the value that was encrypted.
- Data: the actual encrypted value. Note that the size of the value is far larger than the value we actually encrypted. We do that to avoid leaking the size of the value.
- Nonce: a value that ensures that even if we encrypt similar values, we won’t end up with an identical output. I talk about this at length here.
Just storing the data in the database is usually not sufficient, mind. Sure, with what we have right now, we can store and read the data back, safe from data leaks on the server side. However, we have another issue: we want to be able to query the data. In other words, the question is how, without telling the database server what the value is, we can query for matching values.
The answer is that we need to provide a value during the query that would match the value we stored. That is typically fairly obvious & easy. But it runs into a problem when we have cryptography. Since we are using a nonce, each time we encrypt the value, we’ll get a different encrypted value. How can we then query for the value?
The answer to that is something called DAE (deterministic authenticated encryption). Here is how it works: instead of generating the nonce using random values and ensuring that it is never repeated, we’ll go the other way. We’ll generate the nonce in a deterministic manner.
We do that by effectively taking a hash of the data we’ll encrypt. That ensures that we’ll get a unique nonce for each unique value we encrypt. And it means that for the same value, we’ll get the same encrypted output, which means that we can then query for that. Here is an example of how we can use this from RavenDB:
    // Generated query:
    // from 'Users' where PreferredTheme = $p0
    // {"p0":{"Tag":"yqyhaf08pbZz7tdS4xiTpA==","Data":"Q5Txbs3I/SS/Q7vcNQHZIKyR++2tIhJWlS2AIXLXMME=","Nonce":"a6+QoR534F7T68Aq"}}
    using (var session = store.OpenSession())
    {
        var users = session.Query<User>()
            .Where(x => x.PreferredTheme == new Encrypted<string> { Value = "Dark" })
            .ToList();
        foreach (var user in users)
        {
            Console.WriteLine(user.PreferredTheme.Value);
        }
    }
And with that explanation out of the way, let’s see the wiring we need to make this happen. Here is the JsonConverter implementation that makes this possible:
    using System;
    using System.IO;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    public class EncryptedJsonConverter : JsonConverter
    {
        public interface IShouldEncrypt
        {
            object GetValue();
        }

        // Only Encrypted<T> values go through this converter.
        public override bool CanConvert(Type objectType)
        {
            return objectType.IsGenericType &&
                   objectType.GetGenericTypeDefinition() == typeof(Encrypted<>);
        }

        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            var data = JObject.Load(reader);
            var tag = Convert.FromBase64String(data.Value<string>("Tag"));
            var encrypted = Convert.FromBase64String(data.Value<string>("Data"));
            var nonce = Convert.FromBase64String(data.Value<string>("Nonce"));
            var plain = DeterministicEncryption.Decrypt(encrypted, tag, nonce);
            var ms = new MemoryStream(plain);
            var br = new BinaryReader(ms);
            // ReadString() is length-prefixed, so the trailing padding is ignored.
            var str = br.ReadString();
            var val = JsonConvert.DeserializeObject(str, objectType.GetGenericArguments()[0]);
            return Activator.CreateInstance(objectType, val);
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            var se = (IShouldEncrypt)value;
            var plainText = JsonConvert.SerializeObject(se.GetValue());
            var ms = new MemoryStream();
            var bw = new BinaryWriter(ms);
            bw.Write(plainText);
            // pad to 32 bytes boundary
            ms.SetLength(ms.Length + (32 - ms.Length % 32));
            var bytes = ms.ToArray();
            var (encrypted, tag, nonce) = DeterministicEncryption.Encrypt(bytes);
            writer.WriteStartObject();
            writer.WritePropertyName("Tag");
            writer.WriteValue(tag);
            writer.WritePropertyName("Data");
            writer.WriteValue(encrypted);
            writer.WritePropertyName("Nonce");
            writer.WriteValue(nonce);
            writer.WriteEndObject();
        }
    }
There is quite a lot going on here. This is a JsonConverter, which translates the in-memory data to what is actually sent over the wire to RavenDB.
On read, there isn’t much going on: we pull the individual fields from the JSON and pass them to the DeterministicEncryption class, which we’ll look at shortly. We get the plain text back, read the JSON we previously stored, and translate that back into a .NET object.
On write, things are slightly more interesting. We convert the object to a string, and then we write that to an in-memory stream. We ensure that the stream is always aligned on a 32-byte boundary (to avoid leaking the size). Without that step, you could distinguish between “Dark” and “Light” theme users simply based on the length of the encrypted value. We pass the data to the DeterministicEncryption class for actual encryption and build the encrypted value. I chose to use a complex object, but we could also put this into a single field just as easily.
With that in place, the last thing to understand is how we perform the actual encryption:
    using System;
    using System.Security.Cryptography;
    using Blake2Fast;

    public static class DeterministicEncryption
    {
        // Supplied by the application; returns the current (master) key.
        public static Func<byte[]> GetCurrentKey;

        public static (byte[] Encrypted, byte[] Tag, byte[] Nonce) Encrypt(byte[] bytes)
        {
            var (sivKey, encKey) = DeriveKeys();
            using var aes = new AesGcm(encKey);
            // The 12-byte nonce is a keyed hash of the plaintext, so it is deterministic.
            using var hmac = Blake2b.CreateHMAC(12, sivKey);
            var nonce = hmac.ComputeHash(bytes);
            var tag = new byte[16];
            var encrypted = new byte[bytes.Length];
            aes.Encrypt(nonce, bytes, encrypted, tag);
            return (encrypted, tag, nonce);
        }

        public static byte[] Decrypt(byte[] encrypted, byte[] tag, byte[] nonce)
        {
            var (_, encKey) = DeriveKeys();
            using var aes = new AesGcm(encKey);
            var plain = new byte[encrypted.Length];
            aes.Decrypt(nonce, encrypted, tag, plain);
            return plain;
        }

        private static (byte[] SivKey, byte[] EncKey) DeriveKeys()
        {
            var derivedKey = Blake2b.ComputeHash(64, GetCurrentKey()); // from Blake2Fast
            return (derivedKey[0..32], derivedKey[32..64]);
        }
    }
There is actually very little code here, which is pretty great. The first thing to note is that we have GetCurrentKey, which is a delegate you need to provide to find the current key. You can have a global key for the entire application or for the current user, etc.
This key isn’t the actual encryption key, however. In the DeriveKeys function, we use the Blake2b algorithm to turn that 32-byte key into a 64-byte value. We then split this into two 32-byte keys. The idea is that we separate the domains: we have one key that is used for computing the SIV and another for the actual encryption.
We use HMAC-Blake2b with the SIV key to compute the nonce of the value in a deterministic manner and then perform the actual encryption. For decryption, we go in reverse, but we don’t need to derive a SIV, obviously.
With this in place, we have about 100 lines of code that add the ability to store client-side encrypted values and query them.
Pretty neat, even if I say so myself. Note that we can store the encrypted value inside of RavenDB, which the database has no way of piercing, and retrieve those values back as well as query them for equality. Other querying capabilities, such as range or prefix scans, are far more complex and tend to come with security implications that weaken the level of guarantees you can provide.
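The two properties that make equality queries work, a nonce derived from the plaintext and padding to a fixed boundary, can be tried out with nothing but the .NET base class library. The sketch below is not the post's implementation. It swaps HMAC-Blake2b for HMAC-SHA256 truncated to 12 bytes so that no external package is needed, and the names `DeterministicSketch`, `Pad`, and `Encrypt`, as well as the randomly generated keys, are assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DeterministicSketch
{
    // Pads to the next 32-byte boundary, always adding 1 to 32 zero bytes
    // (same arithmetic as the SetLength call in the post's converter).
    public static byte[] Pad(byte[] data)
    {
        var padded = new byte[data.Length + (32 - data.Length % 32)];
        data.CopyTo(padded, 0);
        return padded;
    }

    // AES-GCM with a nonce computed as a keyed hash of the plaintext.
    // HMAC-SHA256 stands in for the post's HMAC-Blake2b here.
    public static (byte[] Encrypted, byte[] Tag, byte[] Nonce) Encrypt(
        byte[] sivKey, byte[] encKey, byte[] plain)
    {
        using var hmac = new HMACSHA256(sivKey);
        var nonce = hmac.ComputeHash(plain)[..12]; // AES-GCM takes a 12-byte nonce
        var tag = new byte[16];
        var encrypted = new byte[plain.Length];
        using var aes = new AesGcm(encKey);
        aes.Encrypt(nonce, plain, encrypted, tag);
        return (encrypted, tag, nonce);
    }

    public static void Main()
    {
        // Throwaway keys for the demo; a real application manages these itself.
        var sivKey = RandomNumberGenerator.GetBytes(32);
        var encKey = RandomNumberGenerator.GetBytes(32);

        var dark1 = Encrypt(sivKey, encKey, Pad(Encoding.UTF8.GetBytes("Dark")));
        var dark2 = Encrypt(sivKey, encKey, Pad(Encoding.UTF8.GetBytes("Dark")));
        var light = Encrypt(sivKey, encKey, Pad(Encoding.UTF8.GetBytes("Light")));

        // Same value, same key: identical output, so equality queries can match.
        Console.WriteLine(dark1.Encrypted.AsSpan().SequenceEqual(dark2.Encrypted));
        // Different values share a ciphertext length, so the size does not leak.
        Console.WriteLine(dark1.Encrypted.Length == light.Encrypted.Length);
    }
}
```

Unlike the post's converter, this sketch pads raw UTF-8 bytes and does not show decryption. The post writes the string with a BinaryWriter length prefix, which is what lets the reader skip the padding on the way back.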