Page 85 • Relatively General .NET

Modern programming languages require generics

by Oren Eini

posted on: May 24, 2022

A few weeks ago I wrote about the Hare language and its lack of generic data structures. I don’t want to talk about this topic again; instead, I want to discuss something more generic (pun intended). In my view, any modern programming language that aims for high performance should have some form of generics in it. To not have that in place is a major mistake and a huge cause for additional complexity and loss of performance.

One aspect of that is the fact that generic data structures get a lot more optimizations than one-off implementations. But I already talked about that in the previous post. The other issue is that by not having generics, there is a huge barrier for optimizations in front of you. You lack the ability to build certain facilities at all.

Case in point, let us take a topic that is near and dear to my heart: sorting. Working on sorted data is pretty much the one thing that makes databases work. Everything else is just details on top of that, nothing more. Let’s consider how you sort data (in memory) in a few programming languages, using their definitions.

Using C:

    void qsort (void *array, size_t count, size_t size, comparison_fn_t compare);
    int comparison_fn_t (const void *, const void *);

Using C++:

    template <class RandomAccessIterator>
    void sort (RandomAccessIterator first, RandomAccessIterator last);

Using Java:

    public static void sort(int[] a);
    public static void sort(long[] a);
    public static void sort(Object[] a);

Using C#:

    public static void Sort<T> (T[] array);

Using Hare:

    type cmpfunc = fn(a: const *void, b: const *void) int;
    fn sort([]void, size, *cmpfunc) void;

Using Rust:

    impl<T> [T] {
        pub fn sort(&mut self)
        where
            T: Ord,
    }

Using Zig:

    pub fn sort(
        comptime T: type,
        items: []T,
        context: anytype,
        comptime lessThan: fn (context: @TypeOf(context), lhs: T, rhs: T) bool,
    ) void

I’m looking only at the method declaration, not the implementation. In fact, I don’t care about how this is implemented at this point.
Let’s assume that I want to sort an array of integers. What would be the result in all of those languages? Well, they generally fall into one of a few groups.

C & Hare: will require you to write something like this:

    int cmp_asc_int(const void *a, const void *b)
    {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y); /* negative, zero, or positive, as qsort requires */
    }

    qsort(array, len, sizeof(int), cmp_asc_int);

In other words, we are passing a function pointer to the sorting routine and we’ll invoke that on each comparison.

C++, C#, Rust, Zig: will specialize the routine for the call. On invocation, this will look like this:

    std::sort(array.begin(), array.end());

The idea is that the compiler is able to emit code specifically for the invocation we use. Instead of having to emit a function call on each comparison, the compare call will usually be inlined and the cost of invocation is completely eliminated.

Java is the only one on this list that has a different approach. Instead of using generics at compile time, it dispatches the call to a routine optimized for the array’s element type, chosen from a set of hand-written overloads. That does mean that they had to write the same sort code multiple times, of course.

Note that this isn’t anything new or novel. Here is a discussion on the topic from when Go got generics; the benchmark there shows a 20% performance improvement from moving to the generic version.
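As a rough sketch of the two groups, the following C++ puts both call styles side by side. Only `std::qsort` and `std::sort` are real library calls here; the wrapper names `sort_with_qsort` and `sort_with_std_sort` are made up for illustration, and nothing in it is a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparator in the shape qsort expects: it receives untyped pointers and
// must return a negative, zero, or positive value.
int cmp_asc_int(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Type-erased path (hypothetical wrapper): qsort only knows the element size,
// so every comparison goes through the cmp_asc_int function pointer.
std::vector<int> sort_with_qsort(std::vector<int> v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), cmp_asc_int);
    }
    return v;
}

// Generic path (hypothetical wrapper): std::sort is instantiated for
// std::vector<int>::iterator, so the compiler sees `int < int` directly
// and is free to inline the comparison.
std::vector<int> sort_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

Both functions return the same sorted data; what differs is how much the compiler knows at the call site. Any measured difference between the two will vary by compiler, flags, and data.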
That improvement results from avoiding the call overhead as well as giving the compiler more optimization opportunities.

Going back to the premise of this post, you can see how a relatively straightforward decision (having generics in the language) can have a huge impact on the performance of what is one of the most common scenarios in computer science. The counter to this argument is that we can always specialize the code for our needs, right? Except… that this isn’t something that happens. If you have generics, you get this behavior for free. If you don’t, well, this isn’t being done.

I write databases for a living, and the performance of our sorting code is something that we analyze at the assembly level. Pretty much every database developer will have the same behavior, I believe. The performance of sorting is pretty key to everything a database does.

I ran into this post, talking about performance optimizations in Postgres, and one of the interesting ones there was exactly this topic: changing the implementation of sorting from using function pointers to direct calls. You can see the commit here. Here is what the code looks like: [code screenshot not preserved]

Postgres is 25 years old(!) and this is a very well-known weakness of C vs. C++. Postgres is also making a lot of sorting calls, and this is the sort of thing that is low-hanging fruit for performance optimization. As for the effect, this blog post shows a 4% – 6% improvement in overall performance as a result of this change. That means that for those particular routines, the effect is pretty amazing. I can think of very few scenarios where a relatively simple change can bring about a 6% performance improvement on a well-maintained and actively worked-on 25-year-old codebase.

Why am I calling it out in this manner, however? Because when I ran into this blog post and the optimization, it very strongly resonated with the previous discussion on generics. It is a great case study for the issue.
Because the language (C, in the case of Postgres) doesn’t support generics in any meaningful way, those sorts of changes aren’t happening, and they are very costly. A modern language that is aiming for performance should take this very important aspect of language design into account. To not do so means that your users will have to do something similar to what Postgres is doing. And as we just saw, that sort of stuff isn’t done.

Not having generics means that you are forcing your users to leave performance on the table. Indeed, pretty much all the modern languages that care about high performance have generics. The one exception that I can think of is Java, and that is because it chose backward compatibility when it added generics.

Adding this conclusion to the previous post about generic data structures, I think that the final result is glaringly obvious. If you want a high-performance system, you should choose a language that allows you to express it easily and succinctly. And generics are mandatory tooling in the box for that.
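To make "you get this behavior for free" a bit more concrete, here is a hypothetical sketch (not the Postgres code, and not any library's implementation) of the same small insertion sort written twice in C++: once against a function pointer, and once as a template. All names in it (`insertion_sort_fnptr`, `insertion_sort_generic`, `int_less`) are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Type-erased style: one compiled routine, and each comparison is an
// indirect call through `less`. Specializing it means copying the routine.
using IntLess = bool (*)(int, int);

void insertion_sort_fnptr(std::vector<int>& v, IntLess less) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        for (std::size_t j = i; j > 0 && less(v[j], v[j - 1]); --j) {
            std::swap(v[j], v[j - 1]);
        }
    }
}

// Generic style: the comparator's type is a template parameter, so each
// (T, Less) pair gets its own instantiation and the call to `less` is a
// direct call the compiler can inline.
template <typename T, typename Less = std::less<T>>
void insertion_sort_generic(std::vector<T>& v, Less less = Less{}) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        for (std::size_t j = i; j > 0 && less(v[j], v[j - 1]); --j) {
            std::swap(v[j], v[j - 1]);
        }
    }
}

// Plain function used as the function-pointer comparator.
bool int_less(int a, int b) { return a < b; }
```

The bodies are identical; the only change is that the comparator became a type parameter. With the template, sorting doubles in descending order is `insertion_sort_generic(values, std::greater<double>{})`, with no second copy of the routine to write and maintain.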

Reducing the size of a git repository with git-replace

by Andrew Lock

posted on: May 24, 2022

In this post I show how you can split a git repo into 'current' and 'history' repos, and then join them again using git-replace as necessary…

RavenDB at Rakuten Kobo recording is now available

by Oren Eini

posted on: May 23, 2022

I had a great discussion with Trevor, the CTO of Rakuten Kobo about their use of RavenDB, you can watch it here:

Performance: Lambda Expressions, Method Groups, and delegate caching

by Gérald Barré

posted on: May 23, 2022

Delegates are used to pass methods as arguments to other methods. The most common delegates are Action, Func<T>, and EventHandler. You can use a lambda expression to provide a delegate or you can use a method group. You can also cache the delegate into a field and reuse the instance when need

Domain Modeling - Encapsulation

by Ardalis

posted on: May 18, 2022

Domain models should encapsulate logic operations so that there is only one way to perform a given logical operation. That means avoiding…

Rewriting git history simply with git-filter-repo

by Andrew Lock

posted on: May 17, 2022

In this post I describe how I used git-filter-repo in Docker to rewrite the history of a git repository to move files into a subfolder…

Copying a collection: ToList vs ToArray

by Gérald Barré

posted on: May 16, 2022

It's common to use ToList() or ToArray() to copy a collection to a new collection. I've seen many comments on the web about which one is the most performant without any proof. So, it was time to run a benchmark. [MemoryDiagnoser] public class CloneCollectionBenchmark { private byte[] _arra

Domain Modeling - Anemic Models

by Ardalis

posted on: May 11, 2022

When building a domain model, proper object-oriented design and encapsulation should be applied as much as possible. Some teams choose to…

Who can give a refund?

by Oren Eini

posted on: May 10, 2022

Consider an eCommerce system where customers can buy stuff. Part of handling commerce is handling faults. Those range from “I bought the wrong thing” to “my kid just bought a Ferrari”. Any such system will need some mechanism to handle fixing those faults. The simplest option we have is the notion of refunds: “You bought by mistake, we can undo that”.

In many systems, the question is then “how do we manage the process of refunds”? You can do something like this: a customer requests a refund, it is processed by the Help Desk and sent for approval by Finance, who then consults Fraud and gets sign-off from the vice-CFO.

There are about 12 refunds a quarter, however. Just the task of writing down the rules for processing refunds costs more than that. Instead, a refund policy can state that anyone can request a refund within a certain time frame. At which point, the act of processing a refund becomes: [diagram not preserved]

Is there a potential for abuse? Probably, but it is going to be caught naturally as we see the number of refunds spike over historical levels. We don’t need to do anything. In fact, the whole idea relies on two important assumptions:

There is a human in the loop.
They are qualified to make decisions and relied upon to try to do the right thing.

Trying to create a process to handle this is a bad idea if the number of refunds is negligible. It costs too much, and making refunds easy is actually a goal (since that increases trust in the company as a whole).

Testing ASP.NET Core gRPC services in JetBrains Rider

by Andrew Lock

posted on: May 10, 2022

In this post I show how you can use the tools built into JetBrains' Rider IDE to test your ASP.NET Core gRPC endpoints…