Additional HTTP, Sockets, DNS and TLS Telemetry in .NET 5
by Steve Gordon
posted on: October 30, 2020
In this post, I describe and demonstrate some of the new telemetry and event counters from sources such as HTTP, Sockets, DNS and TLS.
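As a taste of what the post covers, here is a minimal sketch (my own, not from the article) of consuming the new .NET 5 networking counters in-process with an EventListener; the source names are the well-known .NET 5 ones, and the target URL is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using System.Net.Http;
using System.Threading.Tasks;

// Listens for the networking event counters that ship with .NET 5.
class NetworkingCountersListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name is "System.Net.Http" or "System.Net.Sockets"
            or "System.Net.NameResolution" or "System.Net.Security")
        {
            // Ask for counter payloads once per second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload is null)
            return;
        foreach (var payload in eventData.Payload)
        {
            if (payload is IDictionary<string, object> counter)
                Console.WriteLine(
                    $"{eventData.EventSource.Name}: {counter["Name"]} = " +
                    $"{(counter.TryGetValue("Mean", out var mean) ? mean : counter["Increment"])}");
        }
    }
}

class Program
{
    static async Task Main()
    {
        using var listener = new NetworkingCountersListener();
        using var client = new HttpClient();
        await client.GetStringAsync("https://example.com"); // hypothetical target
        await Task.Delay(2000); // give the counters time to flush
    }
}
```

The same sources can also be monitored out-of-process with the dotnet-counters tool.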
by Oren Eini
posted on: October 30, 2020
The title of this post is a reference to a quote by Leslie Lamport: “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable”.

A few days ago, my blog was down. The website was up, but it was throwing errors about being unable to connect to the database. That was surprising; the database in question is running on a triply redundant system and has survived quite a bit of abuse. It took some digging to figure out exactly what was going on, but the root cause was simple. Some server that I never even knew existed was down.

In particular, the crl.identrust.com server was down. I’m pretty familiar with our internal architecture, and that server isn’t something that we rely on. Or at least so I thought. CRL stands for Certificate Revocation List. Let’s see where it came from, shall we? The certificate for this blog is signed by Let’s Encrypt, like over 50% of the entire internet. And the Let’s Encrypt certificate has an interesting tidbit in it: a CRL distribution point at crl.identrust.com.

Now, note that this CRL is only used for the case in which a revocation was issued for Let’s Encrypt itself. Which is probably a catastrophic event for the entire internet (remember > 50%). When that server was down, the RavenDB client could not verify that the certificate chain was valid, so it failed the request. That was not expected, and it is behavior we are considering disabling by default. Certificate Revocation Lists aren’t really used that much today. It is more common to see OCSP (Online Certificate Status Protocol), and even that has issues.

I would appreciate any feedback you have on the matter.
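To make that failure mode concrete, here is a minimal sketch (my own, not RavenDB’s actual code) of how .NET exposes revocation checking through X509Chain; the certificate file path is hypothetical:

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

class RevocationCheckDemo
{
    static void Main()
    {
        // Load a certificate to validate (path is hypothetical).
        using var cert = new X509Certificate2("blog-cert.cer");

        using var chain = new X509Chain();
        // Online mode downloads the CRL from the distribution point
        // (e.g. crl.identrust.com) — if that server is down, chain
        // validation can fail even though the certificate itself is fine.
        chain.ChainPolicy.RevocationMode = X509RevocationMode.Online;

        // Switching to NoCheck (or Offline) removes the dependency on the
        // CRL server, at the cost of not detecting revoked certificates:
        // chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;

        bool valid = chain.Build(cert);
        Console.WriteLine($"Chain valid: {valid}");
        foreach (var status in chain.ChainStatus)
            Console.WriteLine($"{status.Status}: {status.StatusInformation}");
    }
}
```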
by Oren Eini
posted on: October 29, 2020
RavenDB is a document database; as such, it stores data in JSON format. We have had a few cases of users who wanted to use RavenDB as the backend of various blockchains. I’m not going to touch on their reasoning. I think that a blockchain is a beautiful construct, but one that is searching for a good niche to solve. The reason for this post, however, is that we need to consider one of the key problems that you have to deal with in a blockchain: how to compute the signature of a JSON document. That is required so we’ll be able to build a Merkle tree, which is at the root of all blockchains.

There are things such as JWS and JOSE to handle that, of course, and rolling your own signature scheme is not advisable. However, I want to talk about a potentially important aspect of signing JSON, which is that there isn’t really a proper canonical form of JSON. For example, consider the following documents:

```json
// All of them are the same
{ "Name": "Discworld", "Rating": 5 }
{"Name":"Discworld","Rating":5}
{ "Name": "Discworld", "Rating": 2, "Rating": 5 }
{ "Rating": 5, "Name": "Discworld" }
{ "Name": "D\u0069scworl\u0064", "Rating": 5 }
```

All of those documents have identical output. Admittedly, you could argue about the one using multiple Rating properties, but in general, they are the same. But if we look at the byte-level representation, that is very far from the case.

A proper way to sign such messages would require that we:

- Minify the output to remove any extra whitespace.
- Error on multiple properties with the same key. That isn’t strictly required, but it is going to make everything easier.
- Output the properties in sorted order.
- Normalize the string encoding to a single format.
- Normalize the numeric encoding (for example, whether you support only double-precision floats or arbitrarily sized numbers).

Only then can you perform the actual signature on the raw bytes. That also means that you can’t just pipe the data to sha256() and call it a day.

Another alternative is to ignore all of that and decide that the only thing we actually care about in this case is the raw bytes of the JSON document. In other words, we’ll validate the data as raw binary, without caring about the semantic differences. In this case, the output of all the documents above will be different. Here is a simple example of cleaning up a JSON object to return a stable hash:

```js
async function hash_json(obj) {
    async function digestMessage(message) {
        const msgUint8 = new TextEncoder().encode(message);
        const hashBuffer = await crypto.subtle.digest('SHA-256', msgUint8);
        const hashArray = Array.from(new Uint8Array(hashBuffer));
        // Convert each byte to a two-character hex string.
        return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
    }

    // Sort the keys of every (non-array) object so property order is stable.
    const replacer = (key, value) =>
        value instanceof Object && !(value instanceof Array)
            ? Object.keys(value)
                .sort()
                .reduce((sorted, key) => { sorted[key] = value[key]; return sorted; }, {})
            : value;

    const msg = JSON.stringify(obj, replacer);
    return await digestMessage(msg);
}
```

That answers the above criteria and is pretty simple to run and work with, including from other platforms and environments.
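For comparison, here is a rough C# sketch of the same idea (my own, not from the post), using System.Text.Json to sort properties and minify before hashing; it deliberately skips the duplicate-key and number-normalization rules from the list above:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;

static class JsonHasher
{
    // Recursively re-writes a JSON element with object properties sorted by name.
    static void WriteSorted(JsonElement element, Utf8JsonWriter writer)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                writer.WriteStartObject();
                foreach (var prop in element.EnumerateObject()
                                            .OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    writer.WritePropertyName(prop.Name);
                    WriteSorted(prop.Value, writer);
                }
                writer.WriteEndObject();
                break;
            case JsonValueKind.Array:
                writer.WriteStartArray();
                foreach (var item in element.EnumerateArray())
                    WriteSorted(item, writer);
                writer.WriteEndArray();
                break;
            default:
                element.WriteTo(writer); // strings, numbers, bools, null
                break;
        }
    }

    public static string HashJson(string json)
    {
        using var doc = JsonDocument.Parse(json);
        using var buffer = new MemoryStream();
        using (var writer = new Utf8JsonWriter(buffer)) // minified output by default
            WriteSorted(doc.RootElement, writer);
        return Convert.ToHexString(SHA256.HashData(buffer.ToArray()));
    }
}
```

With this sketch, `HashJson(@"{ ""Rating"": 5, ""Name"": ""Discworld"" }")` and `HashJson(@"{""Name"":""Discworld"",""Rating"":5}")` return the same hex string.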
by Oren Eini
posted on: October 28, 2020
RavenDB is built to be your main database. It’s the system of record where you store all of your information. To minimize complexity, work, and cost, RavenDB also contains a fully-fledged full-text search engine. You can perform full-text searches without needing a plugin or add-on. This enables you to find interesting documents based on quite a lot of different criteria. In this webinar, I show how you can run all sorts of interesting queries and show off some of RavenDB’s full-text search capabilities.
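For flavor, here is a minimal sketch of what a full-text query looks like with the RavenDB C# client; the document class, server URL, and search terms are made up for illustration:

```csharp
using System;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Linq;

class Program
{
    class Product
    {
        public string Name { get; set; }
        public string Description { get; set; }
    }

    static void Main()
    {
        using var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" }, // hypothetical server URL
            Database = "Demo"
        };
        store.Initialize();

        using var session = store.OpenSession();

        // Search() performs a full-text search against an automatically
        // created index — no external search plugin required.
        var results = session.Query<Product>()
            .Search(p => p.Description, "hand crafted leather")
            .ToList();

        foreach (var product in results)
            Console.WriteLine(product.Name);
    }
}
```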
by Gérald Barré
posted on: October 28, 2020
When Blazor renders a page, it first generates the expected DOM. Then, it compares this expected DOM with the current DOM and generates a diff. Finally, it applies the diff so the page corresponds to the expected one. The diffing algorithm must find the additions, edits, and deletions, and generate the corresponding changes to apply.
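To illustrate the three phases, here is a toy sketch (not Blazor’s actual algorithm) that diffs a keyed list of nodes and emits add/update/remove operations:

```csharp
using System;
using System.Collections.Generic;

class DiffDemo
{
    record Node(string Key, string Text);

    static List<string> Diff(IReadOnlyList<Node> current, IReadOnlyList<Node> expected)
    {
        var edits = new List<string>();
        var currentByKey = new Dictionary<string, Node>();
        foreach (var node in current)
            currentByKey[node.Key] = node;

        foreach (var node in expected)
        {
            if (!currentByKey.TryGetValue(node.Key, out var old))
                edits.Add($"add {node.Key}");
            else if (old.Text != node.Text)
                edits.Add($"update {node.Key}: '{old.Text}' -> '{node.Text}'");
            currentByKey.Remove(node.Key);
        }
        // Anything left in the current set no longer appears in the expected tree.
        foreach (var leftover in currentByKey.Keys)
            edits.Add($"remove {leftover}");

        return edits;
    }

    static void Main()
    {
        var current = new[] { new Node("a", "Hello"), new Node("b", "World") };
        var expected = new[] { new Node("a", "Hello!"), new Node("c", "New") };
        foreach (var edit in Diff(current, expected))
            Console.WriteLine(edit);
        // update a: 'Hello' -> 'Hello!' / add c / remove b
    }
}
```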
by Oren Eini
posted on: October 27, 2020
At the beginning of the year, we ran into a problematic query. The issue was the use of an in clause vs. a series of OR. You can see the previous investigation results here. We were able to pinpoint the issue pretty well, very deep in the guts of Lucene, our query engine.

| Fast Query | Slow Query |
| --- | --- |
| Time: 1 – 2 ms | Time: 60 – 90 ms |

The key issue for this query was simple. There are over 600,000 orders with the relevant statuses, but there are no orders for CustomerId “customers/100”. In the OR case, we would evaluate the query lazily: first checking the CustomerId and, given that there were no results, short-circuiting the process and doing no real work for the rest of the query. The IN query, on the other hand, would do things eagerly. That meant it would build a data structure holding all 600K+ documents that match the query, and then throw it all away because no one actually needed it.

In order to resolve that, I have to explain a bit about the internals of Lucene. At its core, you can think of Lucene in terms of sorted lists inside dictionaries. I wrote a series of posts on the topic, but the gist of it is:

```csharp
// Conceptual layout, not actual code: field -> term -> sorted list of doc ids
Dictionary<string Field, Dictionary<string Term, SortedList<int DocId>>> Lucene;
```

Note that the ids for documents containing a particular term are sorted. That is important for a lot of optimizations in Lucene, which is also a major problem for the in query. The problem is that each component in the query pipeline needs to maintain this invariant. But when we use an IN query, we need to go over potentially many terms, and then we need to get the results in the proper order to the calling code.

I implemented a tiered approach. If we are using an IN clause with a small number of terms in it (under 128), we will use a heap to manage all the terms and effectively do a merge sort on the results. When we have more than 128 terms, that stops being very useful. Instead, we’ll create a bitmap for the possible results and scan through all the terms, filling the bitmap. That can be expensive, of course, so I made sure that this is done lazily by RavenDB.

The results are in:

|  | OR Query | IN Query |
| --- | --- | --- |
| Invalid CustomerId | 1.39 – 1.5 ms | 1.33 – 1.44 ms |
| Valid CustomerId | 17.5 ms | 12.3 ms |

For the first case, this is now pretty much a wash. The numbers are slightly in favor of the IN query, but it is within the measurement fluctuations. For the second case, however, there is a huge performance improvement for the IN query. For that matter, the cost is going to be more noticeable the more terms you have in the IN query. I’m really happy about this optimization; it ended up being quite elegant.
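To make the heap-based tier concrete, here is a sketch (my own, not RavenDB’s code) of a k-way merge over sorted posting lists that keeps the output sorted and deduplicated, as the small-IN-clause path requires:

```csharp
using System;
using System.Collections.Generic;

class PostingListMerge
{
    static IEnumerable<int> MergeSorted(List<int[]> postingLists)
    {
        // (docId, listIndex, positionInList) — SortedSet serves as a min-heap,
        // ordered lexicographically by the tuple.
        var heap = new SortedSet<(int Doc, int List, int Pos)>();
        for (int i = 0; i < postingLists.Count; i++)
            if (postingLists[i].Length > 0)
                heap.Add((postingLists[i][0], i, 0));

        int last = -1;
        while (heap.Count > 0)
        {
            var min = heap.Min;
            heap.Remove(min);
            if (min.Doc != last) // skip ids that appear under multiple terms
            {
                last = min.Doc;
                yield return min.Doc;
            }
            int next = min.Pos + 1;
            if (next < postingLists[min.List].Length)
                heap.Add((postingLists[min.List][next], min.List, next));
        }
    }

    static void Main()
    {
        var lists = new List<int[]>
        {
            new[] { 1, 4, 9 },
            new[] { 2, 4, 8 },
            new[] { 3, 9 },
        };
        Console.WriteLine(string.Join(", ", MergeSorted(lists)));
        // 1, 2, 3, 4, 8, 9
    }
}
```

Past 128 terms, this per-term bookkeeping stops paying off, which is where the bitmap tier described above takes over.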
by Ardalis
posted on: October 27, 2020
When planning, whether for a large project or a single feature, there will be risks. Identifying risks and planning appropriate mitigations…
by Andrew Lock
posted on: October 27, 2020
Deploying ASP.NET Core applications to Kubernetes - Part 9
by Oren Eini
posted on: October 26, 2020
I had a task for which I needed to track a union of documents and then iterate over them in order. It is actually easier to explain in code than in words. Here is the rough API:

```csharp
void Init(int maxId, IEnumerable<IEnumerable<int>> items);
bool Exists(int id);
int GetNextId(int id);
```

As you can see, we initialize the value with a list of streams of ints. Each of the streams can contain any number of values in the range [0 … maxId). Different streams can contain the same or different ids. After initialization, we have to allow querying the result, to test whether a particular id was stored, which is easy enough. If this were all I needed, we could make do with a simple HashSet<int> and mostly call it a day. However, we also need to support iteration; more interestingly, we have to support sorted iteration.

A quick solution would be to use something like SortedList<int,int>, but that is going to be massively expensive (O(N log N) to insert). It is also going to waste a lot of memory, which is important. A better solution would be to use a bitmap, which allows us to use a single bit per value. Given that we know the size of the data in advance, that is much cheaper, and the cost of insert is O(N) in the number of ids we want to store. Iteration, on the other hand, is a bit harder on a bitmap. Luckily, we have Lemire to provide a great solution. I have taken his C code and translated it to C#. Here is the result:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Numerics;

public class FastBitArray : IDisposable
{
    private ulong[] _bits;

    public FastBitArray(int countOfBits)
    {
        _bits = ArrayPool<ulong>.Shared.Rent(countOfBits / 64 + (countOfBits % 64 == 0 ? 0 : 1));
        new Span<ulong>(_bits).Clear();
    }

    public void Set(int index)
    {
        _bits[index / 64] |= 1UL << index % 64;
    }

    public IEnumerable<int> Iterate(int from)
    {
        // https://lemire.me/blog/2018/02/21/iterating-over-set-bits-quickly/
        int i = from / 64;
        if (i >= _bits.Length)
            yield break;
        ulong bitmap = _bits[i];
        // Mask off the bits below 'from' in the first word.
        bitmap &= ulong.MaxValue << (from % 64);
        while (true)
        {
            while (bitmap != 0)
            {
                // Isolate the lowest set bit and find its position.
                ulong t = bitmap & (ulong)-(long)bitmap;
                int count = BitOperations.TrailingZeroCount(bitmap);
                int setBitPos = i * 64 + count;
                if (setBitPos >= from)
                    yield return setBitPos;
                bitmap ^= t;
            }
            i++;
            if (i >= _bits.Length)
                break;
            bitmap = _bits[i];
        }
    }

    public void Dispose()
    {
        if (_bits == null)
            return;
        ArrayPool<ulong>.Shared.Return(_bits);
        _bits = null;
    }
}
```

I’m using BitOperations.TrailingZeroCount, which will use the compiler intrinsics to compile this to code very similar to what Lemire wrote. This allows us to iterate over the bitmap in large chunks, so even for a large bitmap, if it is sparsely populated, we are going to get good results. Depending on the usage, a better option might be a Roaring Bitmap, but even there, dense sections will likely use something similar for optimal results.
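For illustration, a small hypothetical usage of the class above: union a few streams of ids, then iterate from a given point in sorted order.

```csharp
using var bits = new FastBitArray(1024);
var streams = new[] { new[] { 3, 17, 100 }, new[] { 17, 512 } };
foreach (var stream in streams)
    foreach (var id in stream)
        bits.Set(id); // duplicates across streams collapse into one bit

// Sorted iteration, skipping everything below 10.
Console.WriteLine(string.Join(", ", bits.Iterate(from: 10)));
// Output: 17, 100, 512
```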
by Gérald Barré
posted on: October 26, 2020
Glob patterns (Wikipedia) are a very common way to specify a list of files to include or exclude. For instance, **/*.csproj matches any file with the .csproj extension. You can use glob patterns in many cases, such as in the .gitignore file, in bash, or in PowerShell. .NET Core 2.1 introduced a new API for enumerating files.
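One well-known way to apply glob patterns in .NET (a sketch of my own; not necessarily the approach from the full post) is the Microsoft.Extensions.FileSystemGlobbing package:

```csharp
using System;
using System.IO;
using Microsoft.Extensions.FileSystemGlobbing;
using Microsoft.Extensions.FileSystemGlobbing.Abstractions;

class GlobDemo
{
    static void Main()
    {
        var matcher = new Matcher();
        matcher.AddInclude("**/*.csproj"); // include all .csproj files
        matcher.AddExclude("**/bin/**");   // skip build output

        // Run the matcher against the current directory tree.
        var root = new DirectoryInfoWrapper(new DirectoryInfo("."));
        PatternMatchingResult result = matcher.Execute(root);

        foreach (var file in result.Files)
            Console.WriteLine(file.Path);
    }
}
```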