Relatively General .NET

Keyed service dependency injection container support

by Andrew Lock

posted on: August 15, 2023

Exploring the .NET 8 preview - Part 6

The role of GitHub in paying for Open Source Software

by Oren Eini

posted on: August 14, 2023

I have been doing Open Source work for just under twenty years at this point. I have been paying my mortgage from Open Source software for about 15. I’m stating that to explain that I have spent quite a lot of time struggling with the inherent tension between having an Open Source project and getting paid. I wrote about it a few times in the past. It is not a trivial problem, and the core of the issue is not something that you can easily solve with technical means.

I ran into this fascinating thread on Twitter over the weekend:

[Embedded tweet]

And another part of that is here:

[Embedded tweet]

I’m quoting the most relevant pieces, but the idea is pretty simple. Donations don’t work, period. The reason isn’t that companies are evil or that developers don’t want to pay for Open Source. They don’t work because it takes a huge amount of effort to actually get paid.

If you are an independent developer, your purchasing process goes something like this:

1. I would like to use this thing.
2. I need to pay for that.
3. The price matches the value I’m getting.
4. Where is my credit card…
5. Paid!

Did you note step 2? The part about needing to pay? If you don’t have that step, what will happen? Same scenario, an independent developer:

1. I would like to use this thing.
2. I use this thing.
3. It would be great to pay something to show my appreciation.
4. Where did I put the credit card? Oh, it’s down the hall…
5. I’ll get to that later (never).

That is the best-case scenario, where the thought of donating actually crossed your mind. In all likelihood, the process is more like:

1. I would like to use this thing.
2. I use this thing.
3. Ticket closed, what is the next one…?

Now, what happens if you are not an independent developer? Let’s say that you are a contract worker for a company. You need to talk to your contact person, who will need to get purchasing approval. Depending on the amount, that may require escalating upward a few levels, etc.
Let’s say that the amount is under $100, so basically within the budgetary discretion of the first manager you run into. They would still need to know what they are paying for and what they are getting out of it (they need to justify that). If this is a donation, welcome to the beauty of tax codes in multiple jurisdictions and what counts as such. If this is not a donation, what do they get? That means that you now have to hold a meeting, potentially multiple ones, present your case, get set up as a new supplier at the company, etc. The cost of all of those is high, both in time and money.

Or… you can just nuget add-package and move on.

RavenDB is Open Source software (with a license to match, and the code is freely available), but we treat it as a commercial project for all intents and purposes. If you want to install RavenDB, you’ll get a popup saying you need a license, directing you to a page where you can see how much we would like to get and what you get in return, etc. That means that from a commercial perspective, we are on familiar ground for companies. They are used to paying for software, and there isn’t an option to just move on to the next task.

There is another really important consideration here. In the ideal Open Source donation model, money just shows up in your account. In the commercial world, a huge amount of work is required to get things done, and that is when you have a model where “the software does not work without a purchase”. To give some context, 22% is Sales & Marketing, and they spent around 21.8 billion on Sales & Marketing in 2022. That is literally billions being spent to make sales. If you want to make money, you are going to invest in sales, sales strategy, etc. I’m ignoring marketing here because if you are expected to make money from Open Source, you likely already have a project well-known enough to at least get started. That means that you need to figure out what you are charging for, how you get customers, etc.
In the case of RavenDB, we use the per-core model, which is a good indication of how much use the user is getting from RavenDB. LLBLGen Pro, on the other hand, charges per seat. Particular’s NServiceBus uses a per-endpoint / number-of-messages-a-day model. There is no one model that fits all, and you need to be able to tailor your pricing model to how your users think about your software. So a pricing strategy, a proper incentive to purchase (usually a hard limit), and some sales organization to actually drive all of that are absolutely required.

Notice what is missing here? GitHub. It simply has no role at all up to this point. So why the title of this post?

There is one really big problem with getting paid that GitHub can solve for Open Source (and in general, I guess). The whole process of actually getting paid is absolutely atrocious. In the best case, you need to get registered as a supplier at the customer, fill out various forms (no, we don’t use child labor or slaves, indeed), and figure out all sorts of weird rules (the German tax authority requires special dispensation, and let’s not talk about getting paid from India, etc.). Welcome to Anti Money Laundering rules and GDPR compliance with Know Your Customer and SOC 2 regulations. The last sentence is basically nonsense words, but I understand that if you chant it long enough, you get money in the end.

What GitHub can do is be a payment pipe. Since presumably your organization is already set up with them, you can get them to do the invoicing, collect the payment, etc. And in the end, you get the money. That sounds exactly like GitHub Sponsors, right? Except that in this case, this is not a donation. This is a flat-out simple transaction, with GitHub as the medium. The idea is that you have a usage limit, which you enforce, and GitHub is how you get paid.
The ability to do it in this fashion may make things easier, but I would assume that there are about three books’ worth of regulations and EULAs to go through to make it actually successful. Yet, as far as I’m concerned, that is really the only important role that GitHub has here. That is not a small thing, mind. But it isn’t a magic bullet.

Supporting custom protocols in WebView2

by Gérald Barré

posted on: August 14, 2023

WebView2 is a new web browser control for Windows desktop applications. It is based on the Chromium engine and is a replacement for the old WebBrowser control. A browser mostly handles HTTP requests, but it is also possible to handle custom protocols. Custom protocols are used by some applications

On Moq & SponsorLink: Some thoughts

by Oren Eini

posted on: August 10, 2023

Today I ran into this Reddit post, detailing how Moq is now using SponsorLink to encourage users to sponsor the project. The idea is that if you are using the project, you’ll sponsor it for some amount, which funds the project. You’ll also get something like this:

[Screenshot]

This has been rolled out for some projects for quite some time, it seems. But Moq is a far more popular project and it got quite a bit of attention. It is an interesting scenario, and I gave some thought to what this means. I’m not a user of Moq, just to note.

I absolutely understand the desire to be paid for Open Source work. It takes a lot of time and effort, and the amount of usage people get out of your code compared to the compensation is sometimes ridiculous. For myself, I can tell you that I made 800 USD out of Rhino.Mocks directly when it was one of the most popular mocking frameworks in the .NET world. That isn’t a sale, that is the total amount of compensation that I got for it directly. I literally cannot total the number of hours that I spent on it, but OpenHub estimates it as 245 man-years. I… disagree with that estimate, but I certainly put a lot of time there.

From a commercial perspective, I think that this direction is a mistake, primarily because of the economics of software purchases. You can read about the implementation of SponsorLink here. The model basically says that it will check whether the individual user has sponsored the project. That is… not really how it works. Let’s say that a new developer is starting to work on an existing project that uses a SponsorLink-enabled package. What happens then? Is that new developer asked to sponsor the project? If this is a commercial project, I certainly support the notion that there should be some payment. But it should not be on the individual developer; it should be on the company that pays for the project. That leaves aside all the scenarios where this is being used for an open source project, etc. Let’s ignore those for now.
The problem is that this isn’t how you actually get paid for software. If you are targeting commercial usage, you should be targeting companies, not individual users. More to the point, let’s say that a developer wants to pay, and their company will compensate them for that. The process for actually doing that is atrocious beyond belief. There are tax implications (if they sponsor at $5/month and their employer gives them a $5 raise to cover it, that raise would be taxed, for example), so you need to submit a receipt for expenses, etc. A far better model would be to have a way to get the company to pay for that, maybe on a per-project basis. Then you can detect if the project is sponsored, for example, by looking at the repository URL (and accounting for forks). Note that at this point, we are talking about the actual process of getting money, nothing else about this issue.

Now, let’s get to the reason that this caused so much angst for people. The way SponsorLink works is that it fetches your email from the git configuration file and checks whether:

1. You are registered as a SponsorLink sponsor.
2. You are sponsoring this particular project.

It does both checks using what appears to be:

    base62(sha256(email))

If you are already a SponsorLink sponsor, you have explicitly agreed to share your email, so there is no problem there, and the second request is perfectly fine. The real problem is the first check, whether you are a SponsorLink sponsor in the first place. Let’s assume that you aren’t; what happens then? Well, there is a request made that looks something like this:

    HEAD /azure-blob-storage/path/app/3uVutV7zDlwv2rwBwfOmm2RXngIwJLPeTO0qHPZQuxyS

The server will return a 404 if you are not a sponsor at this point. The email hash above is my own, by the way. As I mentioned, I’m not a sponsor, so I assume this will return 404. The question is what sort of information is being provided to the server in this case. Well, there is the hashed email, right? Is that a privacy concern?
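As a rough illustration of the lookup key described above, here is a minimal C# sketch of base62(sha256(email)). The post does not specify the alphabet, byte order, or any normalization of the email that SponsorLink uses, so the alphabet below (digits, then upper-case, then lower-case letters) and the big-endian interpretation of the hash are assumptions; the output will not necessarily match the key shown in the request. The hypothetical `FindEmail` helper shows why such a key is only as private as the email itself: anyone holding it can hash candidate addresses and compare.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

public static class SponsorLookupSketch
{
    // Assumed alphabet and ordering; the real SponsorLink encoding may differ.
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Encodes the bytes, read as one big-endian unsigned integer, in base 62.
    public static string Base62(byte[] data)
    {
        var value = new BigInteger(data, isUnsigned: true, isBigEndian: true);
        if (value.IsZero) return "0";

        var sb = new StringBuilder();
        while (!value.IsZero)
        {
            value = BigInteger.DivRem(value, 62, out BigInteger remainder);
            sb.Insert(0, Alphabet[(int)remainder]); // most significant digit ends up first
        }
        return sb.ToString();
    }

    // base62(sha256(email)): the value that ends up in the request path.
    public static string LookupKey(string email) =>
        Base62(SHA256.HashData(Encoding.UTF8.GetBytes(email)));

    // Hypothetical helper: a dictionary "reversal" of a key by hashing candidates.
    public static string? FindEmail(string key, IEnumerable<string> candidates)
    {
        foreach (string candidate in candidates)
        {
            if (LookupKey(candidate) == key) return candidate;
        }
        return null;
    }
}
```

The same email always produces the same key, which is exactly what makes the existence check work and also what makes it linkable to a person.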
It is indeed. While reversing SHA256 in general is not possible, for something like emails, that is pretty trivial. It took me a few minutes to find an online tool that does just that. The cost is around 0.00045 USD per email, just to give some context. So the end result is that using SponsorLink will provide the email of the user (without their express or implied consent) to the server. It takes a little bit of extra work, but it most certainly does. Note that, practically speaking, this looks like it hits Azure Blob Storage, not a dedicated endpoint. That means that you can probably enable logging for those requests and then gather the information from there. I’m not sure what you would do with this information, but it certainly looks like this falls under the definition of PII under the GDPR.

There are a few ways to resolve this problem. The first would be to not use email at all, but instead the project repository URL. That may require a bit more work to resolve forks, but would alleviate most of the concerns regarding privacy. A better option, to be honest, would be to just check for an included file in the repository, something like a .sponsored.projects file. That would include the ids of the projects that were sponsored by this project, and then you can run a check to see that they are actually sponsored. There is no issue with consent here, since adding that file to the repository explicitly consents to the process.

Assuming that you still want / need to use the emails, the problem is much more complex. You cannot use the same k-Anonymity approach that is used for passwords. The problem is that a SHA256 of an email is as good as the email itself. I think that something like this would work, however. Given the SHA256 of the email, you send to the server the following details:

    prefix = SHA256(email)[0 .. 6]
    key    = read("/dev/urandom", 32 bytes)
    hash   = HMAC-SHA256(key, SHA256(email))

The prefix is the first 6 characters of the SHA256 hash.
The key has the cryptographic strength of 32 random bytes. The hash takes the SHA256 and hashes it again using HMAC with the provided key. The idea is that on the server side, you can load all the hashes that you stored that match the provided prefix. Then you compute the keyed HMAC for each of those values and check whether there is a match. We are trying to protect against a malicious server here, remember. So the idea is that if there is a match, we pinged the server with an email that it knows about. If we ping the server with an email that it does not know about, on the other hand, it cannot learn anything about the value. The first 6 characters of the SHA256 will tell you nothing about the value, after all. And the fact that we use a random key to send the actual hash to the server means that there is no point trying to figure it out. Unlike trying to guess an email, guessing a hash of an email is likely far harder, to the point that it is not feasible. Note, I’m not a cryptography expert, and I wouldn’t actually implement such a thing without consulting with one. I’m just writing a blog post with my ideas.

That would at least alleviate the privacy concern. But there are other issues. SponsorLink is provided as a closed-source, obfuscated library. People have taken the time to de-obfuscate it, and so far it appears to match the documented behavior. But the mere fact that an obfuscated, closed-source component is included in an open-source project raises a lot of red flags.

Finally, there is the actual behavior when it detects that you are not sponsoring this project. Here is what the blog post states will happen: it will delay the build (locally, on your machine, not on CI). That… is really bad. I assume that this happens on every build (not sure, though). If that is the case, the feedback cycle of “write a test, run it, write code, run a test” is going to hit significant slowdowns.
I would consider this to be a breaking point even excluding everything else. As I previously stated, I’m all for paying for Open Source software. But this is not the way to do it: there is a whole bunch of friction and not much that indicates a positive outcome for the project. Monetization strategies for Open Source projects are complex. Open core, for example, likely would not work for this scenario. Nor would you be likely to get support contracts. The critical aspect is that beyond just the technical details, any such strategy requires a whole bunch of infrastructure around it: marketing, sales, contract negotiation, etc. There is no easy solution here, I’m afraid.
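The prefix-plus-HMAC check proposed in this post can be sketched in C# roughly as follows. As the post stresses, this is not a reviewed cryptographic design. The names `SponsorQuery` and `PrivateSponsorCheck` are hypothetical, the prefix is assumed to be the first 6 hex characters of SHA256(email), and the server side simply scans an in-memory list of stored SHA256(email) values, where a real server would fetch only the entries matching the prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical request payload: hash prefix + per-request key + keyed hash.
public sealed record SponsorQuery(string Prefix, byte[] Key, byte[] Mac);

public static class PrivateSponsorCheck
{
    // Assumption: "first 6 characters" means 6 hex characters of SHA256(email).
    private const int PrefixLength = 6;

    public static byte[] EmailHash(string email) =>
        SHA256.HashData(Encoding.UTF8.GetBytes(email));

    private static string PrefixOf(byte[] emailHash) =>
        Convert.ToHexString(emailHash).Substring(0, PrefixLength);

    // Client side: prefix = SHA256(email)[0..6], key = 32 random bytes,
    // mac = HMAC-SHA256(key, SHA256(email)).
    public static SponsorQuery CreateQuery(string email)
    {
        byte[] emailHash = EmailHash(email);
        byte[] key = RandomNumberGenerator.GetBytes(32); // fresh for every request
        byte[] mac = HMACSHA256.HashData(key, emailHash);
        return new SponsorQuery(PrefixOf(emailHash), key, mac);
    }

    // Server side: storedHashes holds SHA256(email) for known sponsors.
    public static bool IsSponsor(SponsorQuery query, IEnumerable<byte[]> storedHashes)
    {
        foreach (byte[] stored in storedHashes)
        {
            // Only hashes sharing the prefix are candidates.
            if (PrefixOf(stored) != query.Prefix)
                continue;

            // Recompute the keyed hash with the client's key and compare.
            byte[] expected = HMACSHA256.HashData(query.Key, stored);
            if (CryptographicOperations.FixedTimeEquals(expected, query.Mac))
                return true;
        }
        return false;
    }
}
```

Because the client draws a new key for every request, the same email yields a different MAC each time, so the server can only confirm hashes it already holds. The fixed-time comparison is simply a conventional choice for comparing MACs.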

Porting Moq to NSubstitute

by Ardalis

posted on: August 10, 2023

If for some reason you find yourself wanting to switch from Moq to another test double framework like NSubstitute, here's how to do it…

Introducing the new IHostedLifecycleService Interface in .NET 8

by Steve Gordon

posted on: August 09, 2023

As regular readers will be aware, an area of .NET which I follow closely is Microsoft.Extensions.Hosting. I’ve already blogged about a change in .NET 8, where new concurrency options have been introduced to support parallel running of the StartAsync and StopAsync across multiple IHostedServices. In this post, we’ll look at some new lifecycle events introduced […]

A performance profile mystery: The cost of Stopwatch

by Oren Eini

posted on: August 09, 2023

Measuring the length of time that a particular piece of code takes is a surprisingly challenging task. There are two aspects to this. The first is how to ensure that the cost of getting the start and end times won’t interfere with the work you are doing. The second is how to actually get the time (potentially many times a second) as efficiently as possible.

To give some context, Andrey Akinshin has a great overview of how the Stopwatch class works in C#. On Linux, that basically calls clock_gettime, which looks like a system call but isn’t one. It is actually a piece of code that the kernel maps into your process (the vDSO), which then integrates with other aspects of the kernel to optimize this. The idea is that this call is so frequent that you cannot afford the cost of a kernel mode transition. There is good coverage of this here. In short, this is a very well-known problem and quite a lot of brainpower has been dedicated to solving it.

And then we reached this situation:

[Profiler screenshot]

What you are seeing here is us testing the indexing process of RavenDB under the profiler. This is indexing roughly 100M documents, and according to the profiler, we are spending 15% of our time gathering metrics? The StatsScope.Start() method simply calls Stopwatch.Start(), so we are basically looking at a profiler output that says that Stopwatch accounts for 15% of our runtime? Sorry, I don’t believe that. I mean, it is possible, but it seems far-fetched.

In order to test this, I wrote a very simple program, which generates 100K integers and tests whether they are prime or not. I’m doing that to test compute-bound work, basically, and testing calling Start() and Stop() either across the whole loop or in each iteration.
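The test program itself isn’t included in the post. A minimal C# sketch of the comparison described, assuming a simple trial-division `IsPrime` and a `Run` helper (both hypothetical), might look like this: an outer Stopwatch always measures the whole loop, and an inner Stopwatch is started and stopped around each item only when `perIteration` is true, so the difference in `Total` between the two modes approximates the per-iteration cost.

```csharp
using System;
using System.Diagnostics;

public static class StopwatchOverhead
{
    // Simple trial division: compute-bound work to time.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long)d * d <= n; d += 2)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Runs the whole workload once and returns the prime count and total time.
    public static (int Primes, TimeSpan Total) Run(int[] numbers, bool perIteration)
    {
        var total = Stopwatch.StartNew(); // measures the whole run in both modes
        var inner = new Stopwatch();      // only touched when perIteration is true
        int primes = 0;

        foreach (int n in numbers)
        {
            if (perIteration) inner.Start();
            if (IsPrime(n)) primes++;
            if (perIteration) inner.Stop();
        }

        total.Stop();
        return (primes, total.Elapsed);
    }
}
```

To compare, generate the integers once (for example with `new Random(42).Next()`), then call `Run(numbers, perIteration: false)` and `Run(numbers, perIteration: true)` several times each and compare the `Total` values. A single run is dominated by JIT warm-up and noise, which is why a harness such as BenchmarkDotNet is often used for this kind of measurement.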
I ran that a few times and got:

- Windows: 311 ms with a Stopwatch per iteration and 312 ms without
- Linux: 450 ms with a Stopwatch per iteration and 455 ms without

On Linux, the two modes differ by about 5 ms; on Windows, the per-iteration Stopwatch costs either the same or slightly less. Here is the profiler output on Windows:

[Profiler output, Windows]

And on Linux:

[Profiler output, Linux]

Now, that is what happens when we are doing a significant amount of work. What happens if the amount of work is negligible? I made the IsPrime() method very cheap, and I got:

[Results]

So that is a good indication that this isn’t free, but still… Comparing the costs, it is utterly ridiculous that the profiler says that so much time is spent in those methods. Another aspect here may be the impact of the profiler itself. There are differences between using Tracing and Sampling methods, for example. I don’t have an answer, just a lot of very curious questions.

QCon San Francisco Workshop: Building a database from the ground up

by Oren Eini

posted on: August 08, 2023

I’m going to QCon San Francisco and will be teaching a full day workshop where we’ll start from a C compiler and an empty file and end up with a functional storage engine, indexing and more. Included in the minimum requirements are implementing transactions, MVCC, persistent data structures, and indexes. The workshop is going to be loosely based on the book, but I’m going to condense things so we can cover this topic in a single day. Looking forward to seeing you there.

ASP.NET Core in Action, Third Edition is now in print

by Andrew Lock

posted on: August 08, 2023

My new book, ASP.NET Core in Action, Third Edition, is now in print. In this post I describe some of the changes, how you can get it, and how to get money off…

Struct memory layout optimizations, practical considerations

by Oren Eini

posted on: August 07, 2023

In my previous post I discussed how we could store the exact same information in several ways, leading to space savings of 66%! That leads to interesting questions with regard to actually making use of this technique in the real world. The reason I posted about this topic is that we just gained a very significant reduction in memory (and we obviously care about reducing resource usage). The question is whether this is something that you want to do in general.

Let’s look at that in detail. For this technique to be useful, you should be using structs in the first place. That is… not quite true, actually. Let’s take a look at the following declarations:

    public class PersonClass
    {
        public int Id;
        public DateTime Birthday;
        public ushort Kids;
    }

    public struct PersonStruct
    {
        public int Id;
        public DateTime Birthday;
        public ushort Kids;
    }

We define the same shape twice: once as a class and once as a structure. How does this look in memory?

    Type layout for 'PersonClass'
    Size: 32 bytes. Paddings: 2 bytes (%12 of empty space)
      Object Header (8 bytes)
      Method Table Ptr (8 bytes)
      16-19: Int32 Id (4 bytes)
      20-21: UInt16 Kids (2 bytes)
      22-23: padding (2 bytes)
      24-31: DateTime Birthday (8 bytes)

    Type layout for 'PersonStruct'
    Size: 24 bytes. Paddings: 10 bytes (%41 of empty space)
      0-3: Int32 Id (4 bytes)
      4-7: padding (4 bytes)
      8-15: DateTime Birthday (8 bytes)
      16-17: UInt16 Kids (2 bytes)
      18-23: padding (6 bytes)

Here you can find some really interesting differences. The struct is smaller than the class, but the amount of wasted space is much higher in the struct. What is the reason for that? The class needs to carry 16 bytes of metadata: the object header and the pointer to the method table. You can read more about the topic here.
So the memory overhead for a class is 16 bytes at a minimum. But look at the rest of it. You can see that the layout in memory of the fields is different in the class versus the struct. The .NET runtime is free to reorder the fields of a class to reduce the padding and get better memory utilization, but I would need [StructLayout(LayoutKind.Auto)] to do the same for structs. The difference between the two options can be quite high, as you can imagine.

Note that automatically laying out the fields in this manner means that you’re effectively declaring that the memory layout is an implementation detail. This means that you cannot persist it, send it to native code, etc. Basically, the internal layout may change at any time. Classes in C# are obviously not meant for you to poke into their internals, and LayoutKind.Auto comes with an explicit warning about its behavior. Interestingly enough, [StructLayout] will work on classes, and you can use it to force LayoutKind.Sequential on a class. That is by design, because you may need to pass a part of your class to unmanaged code, so you have the ability to control memory explicitly. (Did I mention that I love C#?)

Going back to the original question, why would you want to go to this trouble? As we just saw, if you are using classes (which you are likely to default to), you already benefit from the automatic layout of fields in memory. If you are using structs, you can enable LayoutKind.Auto to get the same behavior. This technique is for the 1% of cases where that is not sufficient, when you can see that your memory usage is high and you can benefit greatly from manually doing something about it.

That leads to the follow-up question: if we go about implementing this, what is the overhead over time? If I want to add a new field to an optimized struct, I need to be able to understand how it is laid out in memory, etc. Like any optimization, you need to maintain that.
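To see the LayoutKind.Auto effect discussed above in isolation, here is a minimal sketch with hypothetical struct names. It uses a `long` in place of `DateTime` so the result doesn’t depend on DateTime’s own layout, and `Unsafe.SizeOf<T>()` to report the managed size of each struct. On a 64-bit process, the sequential version should come out at 24 bytes (the same padding pattern as PersonStruct), while the auto-layout version lets the runtime reorder fields and should be no larger.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Fields declared in the same (padding-heavy) order as PersonStruct.
[StructLayout(LayoutKind.Sequential)] // the default for C# structs, spelled out
public struct SequentialPerson
{
    public int Id;
    public long BirthdayTicks; // stand-in for DateTime
    public ushort Kids;
}

// Same fields; the runtime is allowed to reorder them.
[StructLayout(LayoutKind.Auto)]
public struct AutoPerson
{
    public int Id;
    public long BirthdayTicks;
    public ushort Kids;
}

public static class LayoutDemo
{
    // Managed size in bytes, including any padding.
    public static int SequentialSize => Unsafe.SizeOf<SequentialPerson>();
    public static int AutoSize => Unsafe.SizeOf<AutoPerson>();
}
```

Printing `LayoutDemo.AutoSize` is a quick way to check what the current runtime actually does; that value is an implementation detail and exactly the kind of thing that may change between .NET versions.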
Here is a recent example from RavenDB. In this case, we used to have an optimization that had a meaningful impact. The .NET code changed, the optimization no longer made sense, and we reverted it to get even better perf. At those levels, you don’t get to rest on your laurels. You have to keep checking your assumptions.

If you got to the point where you are manually optimizing memory layouts for better performance, there are two options:

1. You are doing that for fun, and there is no meaningful impact on your system if this degrades over time.
2. There is an actual need for this, so you’ll need to invest the effort in regular maintenance.

You can make that easier by adding tests to verify those assumptions, for example, verifying that the amount of padding in structs matches expectations. A simple test that verifies the size of a struct would mean that any changes to that are explicit. You’ll need to modify the test as well, and presumably that is easier to catch / review / figure out than just adding a field and not noticing the impact.

In short, this isn’t a generally applicable pattern. This is a technique that is meant to be applied in case of true need, where you’ll happily accept the additional maintenance overhead for better performance and reduced resource usage.
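The size and padding test suggested above could look roughly like this. `PackedPerson` is a hypothetical hand-ordered struct (largest fields first), and the expected size and padding are written down by hand, so adding or reordering a field makes `Verify()` fail until someone updates the numbers on purpose. In a real project, the check would live in a unit test rather than a static helper.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical hand-optimized struct: largest fields first to minimize padding.
public struct PackedPerson
{
    public long BirthdayTicks; // 8 bytes
    public int Id;             // 4 bytes
    public ushort Kids;        // 2 bytes
}

public static class PackedPersonLayout
{
    // Expected values, maintained by hand alongside the struct.
    public const int ExpectedSize = 16;
    public const int ExpectedPadding = 2;

    // Sum of the declared field sizes (sizeof on built-in types is a constant).
    public const int FieldBytes = sizeof(long) + sizeof(int) + sizeof(ushort);

    public static int ActualSize => Unsafe.SizeOf<PackedPerson>();
    public static int ActualPadding => ActualSize - FieldBytes;

    // Throws if the layout no longer matches the recorded expectations.
    public static void Verify()
    {
        if (ActualSize != ExpectedSize)
            throw new InvalidOperationException(
                $"PackedPerson is {ActualSize} bytes, expected {ExpectedSize}");
        if (ActualPadding != ExpectedPadding)
            throw new InvalidOperationException(
                $"PackedPerson has {ActualPadding} padding bytes, expected {ExpectedPadding}");
    }
}
```

Recording padding rather than only the size makes the intent of the optimization explicit: a reviewer who sees `ExpectedPadding` change knows the layout work is being undone.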