Relatively General .NET

Little's Law

by Ardalis

posted on: November 24, 2020

Little's Law was first described in 1954 and later proved by John Little in 1961. It is typically expressed as: L = λW. L represents the…
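As a quick worked illustration of the formula (the numbers are made up for the example), take L as the long-run average number of items in a system, λ as the average arrival rate, and W as the average time an item spends in the system. A service that receives 10 requests per second and takes 0.5 seconds on average to handle each one then has, on average, 5 requests in flight:

```latex
L = \lambda W = 10\ \frac{\text{requests}}{\text{s}} \times 0.5\ \text{s} = 5\ \text{requests}
```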

The design of concurrent subscriptions in RavenDB

by Oren Eini

posted on: November 23, 2020

One of the interesting features of RavenDB is Subscriptions. These allow you to define a query and then subscribe to its results. When a client opens the subscription, RavenDB will send it all the documents matching the subscription query. This is an ongoing process, so you will also get updates for documents that were modified after your subscription started. For example, you can ask to get “All orders in UK”, and then use that to compute tax / import rules.

Subscriptions are ideal for a number of use cases, but backend business processing is where they shine. This is because of the other property that subscriptions have: the ability to process the subscription results reliably. In other words, a failure in processing a subscription batch will not lose anything; we can simply restart the subscription. In the same manner, a server failure will simply fail over to another node and things will continue processing normally. You can shut down the client for an hour or a week, and when the subscription is started again, it will process all the matching documents that were changed while it wasn’t listening.

Subscriptions currently have one very strong requirement. There can only ever be a single subscription client open at a given point. This is done to ensure that we can process the batches reliably. A subscription client will accept a batch, process it locally and then acknowledge it to the server, which will then send the next one.

Doing things in this manner ensures that there is an obvious path of progression in how the subscription operates. However, there are scenarios where you’ll want to use concurrent clients on a single subscription. For example, if you have a lengthy computation, you may want concurrent workers to parallelize the work. That is not a scenario that is currently supported, and it turns out that there are significant challenges in supporting it.
I want to use this post to walk through them and consider possible options.

The first issue that we have to take into account is that the reliability of subscriptions is a hugely important feature, and we don’t want to lose that. This means that if we allow multiple concurrent clients at the same time, we have to have a way to handle a client going down. Right now, RavenDB keeps track of a single number to do so. You can think about it as the last modified date that was sent to the subscription client. This isn’t how it works, but it is a close enough lie that saves us the deep details.

In other words, we send a batch of documents to the client and only update our record of the “last processed” when the batch is acknowledged. This design is simple and robust, but it cannot handle the situation when we have concurrent clients that are processing batches. We have to account for a client failing to process a batch and the batch needing to be resent. It can be sent to the same client or to another one. That means that in addition to the “last processed” value, we also need to keep a record of in-flight documents that were sent in batches and haven’t been acknowledged yet.

We keep track of our clients by holding on to the TCP connection. That means that as long as the connection is open, the batch of documents that was sent will be considered to be in transit. If the client that got the batch failed, we’ll have to note that (when we close the TCP connection) and then send the old batch to another client. There are issues with that, by the way. Different clients may have different batch sizes, for example: the batch we need to retry may have 100 documents while the only available client takes 10 at a time.

There is another problem with this approach, however. Consider the case of a document that was sent to a client for processing. While it is being processed, it is modified again. That means that we have a problem.
Do we send the document again to another client for processing? Remember that it is very likely that you’ll do something related to this document, and this can be a cause of bugs, because two clients will get the same document (albeit two different versions of it) at the same time.

In order to support concurrent clients on the same subscription, we need to handle all of these problems:

- Keep track of all the documents that were sent and haven’t been acknowledged yet.
- Keep track of all the active connections and, if a connection is broken, re-schedule its unacknowledged documents to be sent to other clients.
- When a document is about to be sent, check that it isn’t already being processed (an earlier version of it, rather) by another client. If that is the case, we have to wait until that document is acknowledged before allowing it to be processed again.

The last point is meant to avoid concurrency issues with the handling of a single document. I think that limiting the work on a document basis is a reasonable behavior. If your model requires coordination across multiple distinct documents, that is something that you’ll need to implement directly. Implementing the “don’t send the same document to multiple clients at the same time” rule, on the other hand, is likely to result in a better experience all around.

This post is meant to explore the design of such a feature, and as such, I would dearly love any and all feedback.
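To make the three requirements above a bit more concrete, here is a minimal in-memory sketch of the bookkeeping they imply. It is not RavenDB code: the InFlightTracker type, its method names and the use of plain string ids for documents and connections are all assumptions made for illustration. Documents are re-queued one at a time rather than as whole batches, which is one possible way to sidestep the mismatch between batch sizes of different clients.

```csharp
using System.Collections.Generic;

// Hypothetical illustration only; none of these types are part of RavenDB.
public sealed class InFlightTracker
{
    // Document id -> id of the connection that is currently processing it.
    private readonly Dictionary<string, string> _inFlight = new();
    // Documents waiting to be sent (or re-sent), each queued at most once.
    private readonly Queue<string> _pending = new();
    private readonly HashSet<string> _queued = new();

    // Called when a document matches the subscription (new or modified).
    public void Enqueue(string documentId)
    {
        if (_queued.Add(documentId))
            _pending.Enqueue(documentId);
    }

    // Hands out up to maxBatchSize documents to a client. A document whose
    // earlier version is still being processed is skipped and stays queued.
    public List<string> TakeBatch(string connectionId, int maxBatchSize)
    {
        var batch = new List<string>();
        var deferred = new List<string>();
        while (batch.Count < maxBatchSize && _pending.Count > 0)
        {
            var id = _pending.Dequeue();
            _queued.Remove(id);
            if (_inFlight.ContainsKey(id))
            {
                deferred.Add(id); // wait for the acknowledgement first
            }
            else
            {
                _inFlight[id] = connectionId;
                batch.Add(id);
            }
        }
        // Deferred documents go to the back of the queue.
        foreach (var id in deferred)
            Enqueue(id);
        return batch;
    }

    // The client confirmed that it finished processing these documents.
    public void Acknowledge(string connectionId, IEnumerable<string> documentIds)
    {
        foreach (var id in documentIds)
        {
            if (_inFlight.TryGetValue(id, out var owner) && owner == connectionId)
                _inFlight.Remove(id);
        }
    }

    // The connection broke: everything it had not acknowledged is re-queued.
    public void ConnectionClosed(string connectionId)
    {
        var orphaned = new List<string>();
        foreach (var pair in _inFlight)
        {
            if (pair.Value == connectionId)
                orphaned.Add(pair.Key);
        }
        foreach (var id in orphaned)
        {
            _inFlight.Remove(id);
            Enqueue(id);
        }
    }
}
```

In this sketch, a document that is modified while client A is still working on it stays in the queue when client B asks for a batch, and only becomes available again once A acknowledges it or A's connection is closed.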

C# 9 - Improving performance using the SkipLocalsInit attribute

by Gérald Barré

posted on: November 23, 2020

C# 9 brings lots of new language features. One of them is the ability to suppress emitting the .locals init flag. This feature lets you improve the performance of a method by not zeroing its local variables before executing it. Even if zeroing locals has been improved in .NET 5, not doing it wi
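As a rough illustration of what using the attribute looks like, here is a minimal sketch. The SkipLocalsInitDemo class and the SumOfSquares method are made-up names for this example; the attribute itself lives in System.Runtime.CompilerServices, and the compiler only accepts it when the project enables unsafe code (AllowUnsafeBlocks). Because the stack buffer is no longer zeroed, the code must write every element before reading it.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical example type; requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
// in the project file, otherwise the compiler rejects [SkipLocalsInit].
public static class SkipLocalsInitDemo
{
    [SkipLocalsInit]
    public static int SumOfSquares(int count)
    {
        // With [SkipLocalsInit], this buffer is not guaranteed to be zeroed,
        // so it may contain leftover stack data until we write to it.
        Span<int> buffer = stackalloc int[64];
        if (count < 0 || count > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Write every element that will be read later.
        for (var i = 0; i < count; i++)
            buffer[i] = i * i;

        var sum = 0;
        for (var i = 0; i < count; i++)
            sum += buffer[i];
        return sum;
    }
}
```

Whether the saved zeroing is measurable depends on the method; it tends to matter most for methods with large stackalloc buffers or many locals, so it is worth benchmarking before applying it broadly.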

Open Source & Money: Part II

by Oren Eini

posted on: November 19, 2020

I ran into this tweet:

I wanted to respond to that, because it ties very closely to the previous post. As I already said, getting paid for open source is a problem. You either try to do that professionally (full time) or you don’t. Trying to get a hobbyist amount of money from open source does not really work. And when you are doing this professionally, there is a very different manner of operating. For this post, I want to talk about the other side, the people who want to pay for certain things, but can’t.

Let’s say that Jane works for a multibillion dollar company. She is using project X for a certain use case and would like to extend its capabilities to handle an additional scenario. We’ll further say that this project has a robust team or community behind it, so there is someone to talk to.

If the feature in question isn’t trivial, it is going to require a substantial amount of work. Jane doesn’t want to just bug the maintainers for this, but how can she pay for that? The first problem that you run into is who to pay. There isn’t usually an organization behind the project. Just figuring out who to pay can be a challenge. The next question is whether that person can even accept payments. In Israel, for example, if you aren’t an independent employee, there is a lot of bureaucracy you have to go through if you want to accept money outside of your employer.

Let’s say that the cost of the feature is set around 2,500$ – 7,500$. That amount usually means that Jane can’t just hand it over and claim it in her expenses. She needs someone from Accounts Payable to handle that, which means that it needs to go through the approval process, there should be a contract (so legal is also involved), there might be a required bidding process, etc. The open source maintainer on the other side is going to get an 8-page contract written in dense legalese and have to get a lawyer to go over that. So you need to cover that expense as well.
There are delivery clauses in the contract, penalties for late delivery, etc. You need to consider whether this is work for hire or not (which matters for copyright law), whether the license on the project is suitable for the end result, etc. For many people, that level of hassle for a rare occurrence of a non-life-changing amount of money is too much. This is especially true if they are already employed and need to do that on top of their usual work.

For Jane, who would like her employer to pay for a feature, going through all the steps and paperwork involved is too much of a hassle. Note that we aren’t talking about a quick email; we are probably talking about weeks of navigating the hierarchy and getting approval from multiple parties (and remember that there is also the maintainer on the other side). In many cases, the total cost that is involved here can very quickly reach ridiculous levels. There is a reason why in many cases it is easier for such companies to simply hire the maintainers directly. It simplifies a lot of work for all sides, but it does mean that the project is no longer independent.

Open Source & Money: Part I

by Oren Eini

posted on: November 18, 2020

I ran into a (private) tweet that said the following:

“Is there a way to pay for a feature in an opensource project in a crowdfunded manner with potential over time payouts? I would love to pay someone to implement a feature I really want, but I won't be able to pay it all.”

I think that this is a very interesting sentiment, because the usual answer to that ranges between no and NO. Technically, yes, there are ways to handle that. For example, Patreon or similar services. I checked a few of those and found LineageOS – 205 Patrons with 582$ monthly.

There is also Liberapay, which seems to be exactly what the tweet is talking about, but… the highest paid individual there is paid just under a thousand dollars a month. The highest paid organization is bringing in about 1,125$ / month. There are other places, but they present roughly the same picture. In short, there doesn’t seem to be any money in this style of operation.

Let me make it clear what I mean by no money. Going to Fiverr and sorting by the cheapest rate, you can find a developer for 5 – 10$ / hour. No idea about the quality / ability to deliver, but that is the bottom line. Even at those rates (which are below minimum wage), you don’t get a lot of time at all. A monthly recurring income of 500$ – 1,250$, assuming minimum wage, will get you about a week or two of work per month. But note that this is assuming that you desire minimum wage. I’m unaware of anywhere that a developer is charging that amount; typical salaries for developers are in the upper tier. So in terms of financial incentives, there isn’t anything here.

Note that the moment you take any amount of money, you lose the ability to just mute people. If you are working on an open source project and someone comes with a request, either it is interesting, so it might be picked up, or it isn’t.
But if there is money involved (and it doesn’t have to be a significant amount), there are different expectations.

There is also a non-trivial amount of hassle in getting paid. I’m not talking about actually collecting the money, I’m talking about things like taxes, making sure that all your reports align, etc. If you are a salaried employee, in many cases, this is so trivial you never need to think about it. That on its own can be a big hurdle, especially because there isn’t much money in it.

The counterpoint to my entire post is that there are projects that have done this. The obvious one is the Linux kernel project, but you’ll note that such projects are extremely rare, and they usually have had a major amount of traction before they managed to sort out funding. In other words, it got to the point where people were already employed full time to handle such projects.

Another option is Kickstarter. This isn’t so much for recurring revenue as for getting started, of course. On Kickstarter, there seems to be mostly either physical objects or games. I managed to find Light Table, which was funded in 2014 to the tune of 316,720$ by 7,317 people. Checking the repository, there seems to be no activity since the beginning of the year.

Using RavenDB Subscriptions with complex object graphs

by Oren Eini

posted on: November 17, 2020

RavenDB Subscriptions allow you to create a query and subscribe to documents that match the query. They are very useful in many scenarios, including backend processing, queues and more. Subscriptions allow you to define a query on a document, and get all the documents that match this query. The key here is that “all documents” doesn’t refer to just the documents that exist now, but also to future documents that match the query. That is what the subscription part is all about.

The subscription query operates on a single document at a time, which leads to open questions when we have complex object graphs. Let’s assume that we want to handle via subscriptions all Orders that are managed by an employee residing in London. There isn’t a straightforward way of doing this. One option would be to add EmployeeCity to the Orders document, but that is a decidedly inelegant solution. Another option is to use the full capabilities of RavenDB. For Subscription queries, we actually allow you to ask questions about other documents, like so:

    declare function filter(o) {
        var emp = load(o.Employee);
        return emp.Address.City == 'London';
    }
    from Orders as o
    where filter(o)

Now we’ll only get the Orders whose employee is in London. Simple and quite elegant. It does have a caveat, though. We will only evaluate this condition whenever the order changes, not when the employee changes. So if the employee moves, old orders will not be matched against the subscription, but new ones will.
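The caveat above is easy to miss, so here is a tiny in-memory simulation of it. This is not the RavenDB client API: the Employee and Order records and the OrderSubscriptionSimulator class are hypothetical stand-ins that only mimic the rule that the filter runs when an order is written, never when an employee is.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the documents; not RavenDB types.
public record Employee(string Id, string City);
public record Order(string Id, string EmployeeId);

public sealed class OrderSubscriptionSimulator
{
    private readonly Dictionary<string, Employee> _employees = new();

    // Ids of the orders that were handed to the subscriber.
    public List<string> Delivered { get; } = new();

    // Writing an employee does NOT re-evaluate any existing order.
    public void PutEmployee(Employee employee) => _employees[employee.Id] = employee;

    // Writing (or modifying) an order is what triggers the filter,
    // which loads the related employee at that moment.
    public void PutOrder(Order order)
    {
        if (_employees.TryGetValue(order.EmployeeId, out var emp) && emp.City == "London")
            Delivered.Add(order.Id);
    }
}
```

If an order is stored while its employee lives in Paris and the employee later moves to London, that order is never delivered in this model; only orders written after the move are. If you need the old orders to show up as well, they have to be modified in some way so that the filter runs again.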

CAP Theorem, PACELC, and Microservices

by Ardalis

posted on: November 17, 2020

CAP Theorem (wikipedia) is a classic "given 3 choices, choose 2" topic. The three choices are Consistency, Availability, and Partition…

RavenDB 5.1 Features: Searching in Office documents

by Oren Eini

posted on: November 16, 2020

For a long time, whenever I tried to explain how RavenDB is a document database, people immediately assumed that I’m talking about Office documents (Word, Excel, etc.) and that I’m building a SharePoint clone. Explaining that documents are a different way to model data has been a repeated chore, and we still get prospects asking about RavenDB’s Office integration. As an aside, I’ll be doing a Webinar on Tuesday talking about Data Modeling with RavenDB.

RavenDB 5.1 has a new feature, NuGet integration, which allows you to integrate NuGet packages into RavenDB’s indexes. Turns out, it takes very little code to allow RavenDB to search inside Office documents. Let’s consider a legal case system, where we track the progression of legal cases, the work done on them, billing, etc. As you can imagine, the amount of Word and Excel documents that are involved is… massive. Making sense of all of that information can be pretty hard. Here is how you can help your users with the use of RavenDB.

Here is the Filing/Search index definition:

    from i in docs.Filings
    let docx = AttachmentsFor(i)
        .Where(x => x.Name.EndsWith(".docx"))
        .Select(x => LoadAttachment(i, x.Name).GetContentAsStream())
    let xlsx = AttachmentsFor(i)
        .Where(x => x.Name.EndsWith(".xlsx"))
        .Select(x => LoadAttachment(i, x.Name).GetContentAsStream())
    select new
    {
        Documents = new[]
        {
            docx.Select(doc => Office.GetWordText(doc)),
            xlsx.Select(x => Office.GetExcelText(x))
        }
    }

As you can see, we are using two new features in RavenDB 5.1:

- The LoadAttachment() / GetContentAsStream() methods, which expose the attachments to the indexing engine.
- The Office.GetWordText() / Office.GetExcelText() methods, which extract the text from the relevant documents to be indexed by RavenDB.

Aside from that, this is a fairly standard index; we mark the Documents field as full text search (in red in the image below). There are also the yellow markers in the image. What are they for?

No, RavenDB didn’t integrate directly with Office. Instead, we make use of the new Additional Assemblies (and the existing Additional Sources) to bring you this functionality. Let’s see how that works, shall we?

We tell RavenDB that for this index, we want to pull the NuGet package DocumentFormat.OpenXml. And it will just happen, which means that we have the full power of this package in our indexes. In fact, this is exactly what we do. Here is the content of the Additional Sources:

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;
    using DocumentFormat.OpenXml.Wordprocessing;

    public static class Office
    {
        // Yields the text of the boolean, shared string and date cells in every sheet.
        public static IEnumerable<string> GetExcelText(Stream stream)
        {
            using var doc = SpreadsheetDocument.Open(stream, false);
            // Strings are stored once in the shared string table; cells refer to them by index.
            var sst = doc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First()
                .SharedStringTable.ChildElements.Select(x => x.InnerText)
                .ToArray();
            foreach (var sheet in doc.WorkbookPart.Workbook.Descendants<Sheet>())
            {
                var part = (WorksheetPart)doc.WorkbookPart.GetPartById(sheet.Id);
                foreach (var cell in part.Worksheet.Descendants<Cell>())
                {
                    // Cells of any other type (plain numbers, for example) are skipped.
                    switch (cell.DataType?.Value)
                    {
                        case CellValues.Boolean:
                            yield return cell.InnerText == "0" ? "false" : "true";
                            break;
                        case CellValues.SharedString:
                            yield return sst[int.Parse(cell.InnerText)];
                            break;
                        case CellValues.Date:
                            yield return cell.InnerText;
                            break;
                    }
                }
            }
        }

        // Yields the text of each top-level paragraph, followed by the document comments.
        public static IEnumerable<string> GetWordText(Stream stream)
        {
            using var doc = WordprocessingDocument.Open(stream, false);
            foreach (var element in doc.MainDocumentPart.Document.Body)
            {
                if (element is Paragraph p)
                {
                    yield return p.InnerText;
                }
            }
            var comments = doc.MainDocumentPart?.WordprocessingCommentsPart?.Comments;
            if (comments == null)
                yield break;
            foreach (var element in comments)
            {
                yield return element.InnerText;
            }
        }
    }

What this code does is use the DocumentFormat.OpenXml package to read the data inside the provided attachments. We extract the text from them and then provide it to the RavenDB indexing engine, which enables us to do full text search on the content of attachments.

In effect, within the space of a single blog post, you can turn your RavenDB instance into a document indexing system.
Here is how we can query the data:

And the result is here:

And here is the relevant term inside the Office documents:

As you can imagine, this is a very exciting capability to add to RavenDB. There is much more that you can do with the ability to integrate such capabilities directly into your database.