Relatively General .NET

Understanding the impact of Roslyn Analyzers on build time

by Gérald Barré

posted on: May 10, 2021

Roslyn analyzers inspect your code for style, quality, maintainability, design, and other issues. .NET includes built-in analyzers (Microsoft.CodeAnalysis.NetAnalyzers), and it's very common to include a few other analyzers such as Microsoft.VisualStudio.Threading.Analyzers, Meziantou.Analyzer, or

Simple Systems and Gall's Law

by Ardalis

posted on: May 04, 2021

When it's time to build that big new system to replace the aging old one, consider Gall's Law and the benefit of frequent feedback and…

Debouncing / Throttling JavaScript events in a Blazor application

by Gérald Barré

posted on: May 03, 2021

Debouncing and throttling are used to prevent too many events from being processed. For instance, when a user types text in a search bar, you may want to wait until they stop writing for a few milliseconds before executing the search request. This is what debouncing is made for. When using debounc

Practical considerations for implementing Raft

by Oren Eini

posted on: April 29, 2021

RavenDB has been using the Raft protocol for the past few years. In fact, we have written three or four different implementations of Raft along the way. I implemented Raft using pure message passing, on top of async RPC and on top of TCP. I did that using the actor model and using direct parallel programming, as well as the usual spaghetti mode.

The Raft paper is beautiful in how it explains a non-trivial problem in a way that is easy to grok, but it is also something that can require dealing with a number of subtleties. I want to discuss some of the ways to successfully implement it. Note that I’m assuming that you are familiar with Raft, so I won’t explain anything here.

A key problem with Raft implementations is that you have multiple concurrent things happening all at once, on different machines. And you always have the election timer waiting in the background. In order to deal with that, I divide the system into independent threads that each have their own task. I’m going to talk specifically about the leader mode, which is usually the most complex aspect. In this mode, we have:

Leader thread – responsible for determining the current progress in the cluster.
Follower thread – one per follower – responsible for communicating with a particular follower.

In addition, we may have values being appended to our log concurrently with all of the above. The key here is that the follower threads will communicate with their follower and push data to it. The overall structure for a follower thread looks like this:
    global log_append_event
    global follower_update_event
    global followers_index

    def follower_thread(self, follower_url):
        last_shared_index = connect_to_follower(follower_url)
        while True:
            # read up to 50 entries that the follower has not seen yet
            batch = log.get_after(last_shared_index + 1, take=50)
            # an empty batch doubles as a heartbeat
            send_to_follower(AppendEntries(Term, batch))
            if len(batch) > 0:
                last_shared_index = batch[-1].index
                followers_index[follower_url] = (last_shared_index, datetime.now())
                follower_update_event.set()
            else:
                # wait for new entries or timeout
                log_append_event.wait(election_timeout / 3)

What is the idea? We have a dedicated thread that will communicate with the follower. It will either ping the follower with an empty AppendEntries (every 1/3 of the election timeout) or it will send a batch of up to 50 entries to update the follower. Note that there is nothing in this code about the machinery of Raft; that isn’t the responsibility of the follower thread.

The leader, on the other hand, listens to the notifications from the follower threads, like so:
    global follower_update_event
    global followers_index

    def leader_thread():
        last_commit = 0
        while True:
            follower_update_event.wait(election_timeout)
            # include ourselves
            followers_index[leader_url] = (last_index, datetime.now())
            # each entry is a (last_index, last_ping_time) pair
            times = sorted(t for (_, t) in followers_index.values())
            majority_time = times[len(times) // 2]
            if (datetime.now() - majority_time) > election_timeout:
                return  # no longer leader
            indexes = sorted(i for (i, _) in followers_index.values())
            majority_index = indexes[len(indexes) // 2]
            if last_commit < majority_index:
                commit_log(majority_index)
                last_commit = majority_index

The idea is that each aspect of the system is running independently, and the only communication that they have with each other is the fact that they can signal the other that they did some work. We can then compute whether that work changed the state of the system.

Note that the code here is merely a draft, missing many details. For example, we aren’t sending the last commit index on AppendEntries, and committing the log is an asynchronous operation, since it can take a long time and we need to keep the system in operation.
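The leader's decision step can also be written as a pure function, which makes it easy to test without threads or timers. The following is only a minimal sketch under stated assumptions: the names quorum_value and leader_step are hypothetical, and followers_index is assumed to map a node URL to a (last_index, last_ping_time) pair. Sorting in descending order is a choice made here so that the picked element is reached by a strict majority for even cluster sizes as well.

```python
from datetime import datetime, timedelta


def quorum_value(values):
    """Return the largest value that a strict majority of nodes has reached."""
    ordered = sorted(values, reverse=True)
    # With n nodes, the element at position n // 2 (0-based, descending)
    # is matched or exceeded by n // 2 + 1 nodes, which is a strict majority.
    return ordered[len(ordered) // 2]


def leader_step(followers_index, now, election_timeout, last_commit):
    """Decide the new commit index, or return None if leadership is lost.

    followers_index maps node url -> (last_index, last_ping_time) and is
    assumed to already contain the leader's own entry.
    """
    times = [ping for (_, ping) in followers_index.values()]
    if now - quorum_value(times) > election_timeout:
        return None  # a majority has not been heard from in time
    indexes = [index for (index, _) in followers_index.values()]
    # The commit index never moves backwards.
    return max(last_commit, quorum_value(indexes))


now = datetime(2021, 4, 29, 12, 0, 0)
timeout = timedelta(seconds=3)
cluster = {
    "leader": (10, now),
    "a": (8, now - timedelta(seconds=1)),
    "b": (4, now - timedelta(seconds=10)),
}
print(leader_step(cluster, now, timeout, last_commit=0))  # 8
```

Keeping this logic free of I/O means the leader thread only has to wait on the event, refresh its own entry, and call the function.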

Influential computer science papers

by Oren Eini

posted on: April 28, 2021

I ran into this question on Hacker News, asking for the best computer science papers. There are a few that I keep getting back to, either because they are so fundamental or because they are so useful. In no particular order:

The Raft Paper – a distributed consensus algorithm that made sense to me on first read. There are a lot of subtle issues to consider, but when reading the paper, everything clicked. That is head and shoulders above what the Paxos literature is about.
The Ubiquitous BTree – talk about a paper that I use daily. Admittedly, I didn’t get started on BTrees from this paper, but this is a very well written one and it does a great job presenting the topic. It is also from 1979, and BTrees were already “ubiquitous” at that time, which tells us something.
Extendible Hashing – this is also from 1979, and it is well written. I implemented extendible hashing based on this article directly and I grokked it right away.
How Complex Systems Fail – not strictly a computer science paper. In fact, I’m fairly certain that this fits more into civil engineering, but it does an amazing job of explaining the internals of complex systems and why and how they fail. I took a lot from this paper. It is also very short and highly readable.
OLTP Through the Looking Glass – discusses the internal structure of database engines and the cost and complexities of their various pieces.
You’re doing it wrong – discusses the implementation of the Varnish proxy from the point of view of a kernel hacker. Totally different approach to the design of the system. Had a lot of influence on how I build systems.

I’m fairly certain that my criteria won’t be yours, but those are all papers that I have read multiple times and whose insights I have used in my daily work.

Convert SVG files to PNG or JPEG using .NET

by Gérald Barré

posted on: April 26, 2021

Recently, I had to convert an SVG file to PNG. I tried a few free online converters, but none of them were able to convert the image correctly. Indeed, they don't support custom fonts… Instead of wasting more time searching for an existing tool, I opened Visual Studio and I made a quick application

Slow and predictable vs. fast and bursty

by Oren Eini

posted on: April 21, 2021

This is a tale of two options that we took for an exhaustive test. Amazon recently came out with a new disk type on the cloud. As a database vendor, that is of immediate interest to me, so we took a deep look into that. GP3 disks are about 20% cheaper than their GP2 equivalent. What is more, they come with a guaranteed level of performance even before you purchase additional IOPS. Consider the following disks:

Type  Size   IOPS   MB/s  Price
GP2   512GB  1,536  250   51.2 USD
GP3   512GB  3,000  125   40.9 USD
GP3   512GB  4,075  250   51.2 USD

In other words, for the same disk, we can get a much better baseline performance at a cheaper price. What is there not to like?

The major difference between GP2 and GP3, however, is their latency. In practice, we see an additional 1 to 2 milliseconds in response times from the GP3 disk vs. the GP2 disk. In other words, GP3 disks are somewhat slower: even if they are able to run more IOPS, their latency is higher. A really key observation from us, however, is that GP3 does not offer burst I/O capabilities. And that means that I can breathe a huge sigh of relief.

RavenDB as a database is meant to run on anything from an SD card to HDD to SSD to NVMe drives. We are used to accounting for the I/O being the slowest thing around and have already mostly coded around that. An additional millisecond in disk latency doesn’t matter that much in the grand scheme of things. However… the fact that this doesn’t provide I/O burst is a huge plus for us. RavenDB can easily deal with slow I/O; what it finds very hard to deal with is an environment that very rapidly changes its operational characteristics.

Let’s assume that we have a 100 GB GP2 disk, which means that we have a baseline of around 300 IOPS and 75MB/sec of throughput. RavenDB is under some high load, and it is using the maximum capabilities of the hardware. However, because of burstiness, we are actually able to utilize 3,000 IOPS and 250MB/sec for a while.
Until all the I/O credits are gone and we are forced into a screeching halt. That means, for example, that we read from the network at a rate of 250MB/sec, but we are unable to write to the disk at this level. There is a negative balance of 125MB/sec that needs to be stored somewhere. We can buffer that in memory, of course, but that only works for so long. That means that we have to hit the brakes hard all of a sudden, which the rest of the ecosystem isn’t happy with.

For example, the other side that is sending us data at 250MB/sec is likely not going to be able to respond in time to the shift in our behavior. It is very likely that the network connection would congest and break in this case. All of the internal optimizations inside of RavenDB will also be skewed for a while, until we are used to the new level of speed. If this was gradual, we could adjust a lot more easily, but this is basically like hitting the brakes at speed. You will slow down, sure, but you are also likely to cause an accident.

As a simple example, RavenDB can compress the data that it writes to disk, and it balances the compression ratio vs. the cost to write to the disk. If we know that the disk is slow, we can spend more time trying to reduce the amount of data we write. If this changes rapidly, we are operating under the old assumptions and may create a true traffic jam.

The fact that GP3 disks have a predictable performance profile means that we are much better suited to run on them. A more predictable platform from which to operate gives me a much better opportunity to handle optimizations.
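To put rough numbers on that cliff, here is a minimal back-of-the-envelope sketch. It rests on assumptions that are not in the post itself: the GP2 burst bucket is taken to hold 5.4 million I/O credits (the figure AWS documents for gp2 volumes), and the 4 GB in-memory buffer is a purely hypothetical size. The function names are illustrative only.

```python
def burst_seconds(credits, burst_iops, baseline_iops):
    """How long a full credit bucket lasts while running at burst speed."""
    drain_per_second = burst_iops - baseline_iops
    if drain_per_second <= 0:
        return float("inf")  # never drains if we stay at or below baseline
    return credits / drain_per_second


def seconds_until_buffer_full(ingest_mb_s, disk_mb_s, buffer_mb):
    """How long memory can absorb the gap between ingest and disk writes."""
    deficit = ingest_mb_s - disk_mb_s
    if deficit <= 0:
        return float("inf")  # the disk keeps up, nothing accumulates
    return buffer_mb / deficit


GP2_CREDIT_BUCKET = 5_400_000  # assumption: documented gp2 bucket size

# 100 GB GP2 disk from the example: 300 IOPS baseline, 3,000 IOPS burst.
print(burst_seconds(GP2_CREDIT_BUCKET, 3000, 300))  # 2000.0

# Reading at 250MB/sec with a 125MB/sec deficit, hypothetical 4 GB buffer.
print(seconds_until_buffer_full(250, 125, 4096))  # 32.768
```

Under these assumptions the disk looks fast for a bit over half an hour, and once the credits run out, memory buffering buys only about half a minute before something has to give.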

Add ImgBot to your GitHub Repository

by Ardalis

posted on: April 21, 2021

My blog is hosted on GitHub using GatsbyJS and Netlify. One nice thing about this setup is that I have complete control over my content, and…