Relatively General .NET

The cost of queue architecture, and why upfront payment will pay dividends

by Oren Eini

posted on: August 18, 2021

I wrote a post a couple of weeks ago called: Architecture foresight: Put a queue on that. I got an interesting comment from Mike Tomaras on the post that deserves its own post in reply:

"Even though the benefits of an async queue are indisputable, I will respectfully point out that you brush over or ignore the drawbacks. … redacted, see the real comment for details … I think we agree that your sync code example is much easier to reason about than your async one. 'Well, it is a bit more complex to manage in the user interface', 'And you can play games on the front end' hides a lot of complexity in the FE to accommodate async patterns. Your 'At more advanced levels' section presents no benefits really, doing these things in a sync pattern is exactly the same as in async, the complexity is moved to the infrastructure instead of the code."

This is a great discussion, and I agree with Mike that there are additional costs to using the async option compared to the synchronous one. There is a really good reason why pretty much all modern languages have something similar to async/await, after all. And anyone who did any work with Node.js and promises without it knows exactly what it costs to keep track of the state of the system through multiple levels of callbacks.

It is important to note, however, that my recommendation had nothing to do with async directly, although that is the end result. My recommendation had a lot more to do with breaking apart the behavior of the system, so you aren't expected to give immediate replies to the user. Consider this: ⏱. When you are processing a user's request, you have a timer inherent to the operation. That timer can be a real one (how long until the request times out) or it can be a mental one (how long until the user gets bored). That means that you have a very short SLA to run the actual request.

What is the impact of that on your system?
You have to provision enough capacity in the system to handle the spikes within the small SLA that you have to work with. That is tough. Let's assume that you are running a website that accepts comments, and you need to run spam detection on each comment before actually posting it. This seems like a pretty standard scenario, right? It doesn't require anything specialized.

However, the service you use has a rate limit of 10 comments / sec. That is also something that is pretty common and reasonable. How would you handle something like that if you have a post that suddenly gets a lot of comments? Well, you'll have something that ensures that you don't pass the limit, but then the user is sitting there, waiting and thinking that the request timed out. On the other hand, if you accept the request and place it into a queue, you can show it in the UI as accepted immediately and then process it at leisure. Yes, this is more complex than just making the call inline, but it also ensures that you have proper separation in your system. The front end submits messages to the backend, which will reply when it is done. By having this separation upfront, as part of your overall design, you get options. You can change how you are processing things in the backend quickly. Your front end feels fast (which is usually much more important than being fast, mind you).

As for the rate limits and the SLA? In the case of the spam API or similar services, sure, this is obvious. But there are usually a lot of implicit SLAs like that. Your database disk is only able to serve so many writes a second, for example. That isn't usually surfaced to you as an X writes / sec limit, but it is true nevertheless. And a queue will smooth over any such issues easily.
When making the request directly, you have to ensure that you have enough capacity to handle spikes, and that is usually far more expensive. What is more interesting, in my opinion, is that the queue gives you options that you wouldn't have otherwise: tracing of all operations (great for audits), retries if needed, an easy model for scale out, smoothing out of spikes, etc.

You cannot actually put everything into a queue, of course. The typical example is a login page. You cannot really "let the user log in immediately and process in the background". Another example where you don't want to use asynchronous processing is when you are making a query. There are patterns for async query completions, but they are pretty horrible to work with. In general, the idea is that whenever there is an operation in the system, you throw it into a queue. Reads and certain key operations are things that you'll need to run directly.
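The accept-then-process shape from the comment-spam example can be sketched in a few lines of Python. This is an illustrative sketch only: the in-process queue and the check_spam stub are made up, and a real system would use a durable queue so accepted work survives a restart.

```python
import queue
import threading
import time

RATE_LIMIT = 10  # the external service allows 10 calls/sec, as in the example

comments = queue.Queue()
posted = []

def check_spam(text):
    # stand-in for the hypothetical external spam-detection API
    return "spam" not in text

def worker():
    # drain the queue without ever exceeding the service's rate limit
    while True:
        comment = comments.get()
        if check_spam(comment):
            posted.append(comment)
        comments.task_done()
        time.sleep(1.0 / RATE_LIMIT)  # crude rate limiting

def submit_comment(text):
    # the front end gets an immediate answer; processing happens at leisure
    comments.put(text)
    return "accepted"

threading.Thread(target=worker, daemon=True).start()

# a burst of 25 comments, well above the 10/sec limit
for i in range(25):
    submit_comment(f"comment {i}")  # each call returns instantly

comments.join()  # the backend catches up in its own time
print(len(posted))  # → 25
```

The key property is that submit_comment never blocks on the rate limit; the burst is absorbed by the queue and drained at whatever pace the downstream service allows.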

Implementing a count(distinct) query in RavenDB

by Oren Eini

posted on: August 17, 2021

A user called us to ask about how they can manage to move a particular report from a legacy system to RavenDB. They need to be able to ask questions such as the following one:

```sql
select count(distinct(City)) as CitiesCount, Employee, Company
from Orders
group by Employee, Company
```

This is an interesting issue, when you think about it from the point of view of a database engine. The distinct means that we have to keep state (all the unique values) while we evaluate the query, which can be expensive. One of the design principles of RavenDB is to make it hard to accidentally create expensive queries. Indeed, a query like that isn't trivial to implement in RavenDB. We need a two stage approach to implement this feature.

First, we'll introduce a Map/Reduce index, which will aggregate the data on Employee, Company and City. Along the way, it will run the distinct operation on the City, because it groups by it. That gives us a model in which we get the distinct count for free, and in a highly efficient manner. Here is the index in question:

```csharp
// map
from o in docs.Orders
select new
{
    Key = new { o.Employee, o.Company },
    o.ShipTo.City,
    Count = 1
}

// reduce
from result in results
group result by new { result.Key, result.City } into g
select new
{
    g.Key.Key,
    g.Key.City,
    Count = g.Sum(x => x.Count)
}
```

The interesting thing about this index is that querying it directly will not give us the right results. We don't want the details broken down by Employee, Company and City; we want just Employee and Company. This is where the second stage comes into play. Instead of running a simple query on the index, we'll use a faceted query. Here is what it looks like:

```sql
from index 'Orders/Stats'
select facet(Key, sum(Count))
```

What this does is aggregate the results (which were already partially aggregated by the Map/Reduce) and give us the totals. And here are the results:

```json
{
    "Name": "Key",
    "Values": [
        {
            "Name": "Count",
            "Sum": 3,
            "Count": 2,
            "Range": "{\"Employee\":\"employees/1-A\",\"Company\":\"companies/1-A\"}"
        },
        {
            "Name": "Count",
            "Sum": 2,
            "Count": 1,
            "Range": "{\"Employee\":\"employees/1-A\",\"Company\":\"companies/10-A\"}"
        }
    ]
}
```

The end result is that we are able to do most of the work at indexing time, and query time is left working on already aggregated data. That means that the queries should be much faster and there is a lot less work for the database to do.

That said, this isn't RavenDB's strong suit. Such queries are typically more in line with OLAP systems, to be honest. If you know what your query patterns look like, you can use this technique to easily handle such queries, but if there is a wide range of dynamic queries, you may want to use RavenDB as the system of record and then use either SQL ETL or OLAP ETL to push the data to a reporting system.
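The two-stage logic can be simulated in plain Python to see why grouping on the city gives count(distinct) for free. This is a sketch with made-up order data chosen to reproduce the result values above, not RavenDB's actual execution:

```python
from collections import defaultdict

# sample orders: (employee, company, ship-to city)
orders = [
    ("employees/1-A", "companies/1-A", "London"),
    ("employees/1-A", "companies/1-A", "London"),
    ("employees/1-A", "companies/1-A", "Paris"),
    ("employees/1-A", "companies/10-A", "Berlin"),
    ("employees/1-A", "companies/10-A", "Berlin"),
]

# stage 1: the Map/Reduce index groups by (employee, company, city),
# so every distinct city collapses into a single index entry
index = defaultdict(int)
for employee, company, city in orders:
    index[(employee, company, city)] += 1

# stage 2: the faceted query re-aggregates by (employee, company);
# summing gives the order totals, counting the entries gives
# count(distinct city) with no extra state
result = defaultdict(lambda: {"Sum": 0, "Count": 0})
for (employee, company, city), n in index.items():
    key = (employee, company)
    result[key]["Sum"] += n    # total orders
    result[key]["Count"] += 1  # one entry per distinct city

print(result[("employees/1-A", "companies/1-A")])   # → {'Sum': 3, 'Count': 2}
print(result[("employees/1-A", "companies/10-A")])  # → {'Sum': 2, 'Count': 1}
```

The expensive "keep all unique values" state lives in the index, which is maintained incrementally, so the query itself only walks already-aggregated entries.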

Playing with System.Text.Json Source Generators

by Steve Gordon

posted on: August 16, 2021

In my daily work, I’m becoming quite familiar with the ins and outs of using System.Text.Json. For those unfamiliar with this library, it was released along with .NET Core 3.0 as an in-the-box JSON serialisation library. At its release, System.Text.Json was pretty basic in its feature set, designed primarily for ASP.NET Core scenarios to handle […]

Reference equality for dictionaries in Python

by Oren Eini

posted on: August 16, 2021

Implementing a unit of work in Python can be an interesting challenge. Consider the following code:

```python
class Holder(object):
    def __init__(self):
        self.items = dict()

    def try_set(self, key, name):
        if key in self.items:
            return
        self.items[key] = name

    def try_get(self, key):
        if key in self.items:
            return self.items[key]
        return None
```

This is about as simple a piece of code as possible, to associate a tag with an object, right? However, this code will fail for the following scenario:

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str

holder = Holder()
cup = Item(name="Cup")
holder.try_set(cup, "cups/1")
```

You'll get a lovely: "TypeError: unhashable type: 'Item'" when you try this. This is because data classes in Python have a complicated relationship with __hash__(). An obvious solution to the problem is to use id():

```python
class Holder(object):
    def __init__(self):
        self.items = dict()

    # this is bad
    def try_set(self, key, name):
        if id(key) in self.items:
            return
        self.items[id(key)] = name

    # this is bad
    def try_get(self, key):
        if id(key) in self.items:
            return self.items[id(key)]
        return None
```

However, id() in Python is not guaranteed to be unique over time. Consider the following code:

```python
cup = Item(name="Cup")
print(id(cup))
cup = None
cup = Item(name="Cup")
print(id(cup))  # different instance
```

On my machine, running this code gives me:

124597181219840
124597181219840

In other words, the id has been reused. This makes sense, since it is just the pointer to the value. We can fix that by holding on to the object reference, like so:

```python
class RefEq(object):
    def __init__(self, ref):
        self.ref = ref

    def __eq__(self, other):
        if id(self.ref) == id(other):
            return True
        if not isinstance(other, RefEq):
            return False
        return id(self.ref) == id(other.ref)

    def __hash__(self):
        return id(self.ref)


class Holder(object):
    def __init__(self):
        self.items = dict()

    def try_set(self, key, name):
        if RefEq(key) in self.items:
            return
        self.items[RefEq(key)] = name

    def try_get(self, key):
        if RefEq(key) in self.items:
            return self.items[RefEq(key)]
        return None
```

With this approach, we are able to implement proper reference equality and make sure that we aren't mixing different values.
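To see the fix in action, here is a trimmed-down, self-contained version of the final approach (RefEq simplified slightly from the code above) applied to the dataclass that originally failed:

```python
from dataclasses import dataclass

class RefEq:
    """Wraps an object so dict lookups use identity rather than __eq__/__hash__."""
    def __init__(self, ref):
        self.ref = ref

    def __eq__(self, other):
        return isinstance(other, RefEq) and self.ref is other.ref

    def __hash__(self):
        return id(self.ref)

@dataclass
class Item:
    name: str

items = {}
cup = Item(name="Cup")
other_cup = Item(name="Cup")  # equal by value, but a different object

items[RefEq(cup)] = "cups/1"        # no TypeError: RefEq is hashable
print(items.get(RefEq(cup)))        # → cups/1
print(items.get(RefEq(other_cup)))  # → None: identity, not value, decides
```

Because the wrapper keeps a strong reference to the object, the id() can never be reused while the entry is alive, which is exactly what the id()-as-key version got wrong.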

Prevent refreshing the UI after an event in Blazor

by Gérald Barré

posted on: August 16, 2021

If you want to improve the performance of a Blazor application, you may want to reduce the number of times the UI is recomputed. In Blazor, this means reducing the number of times the StateHasChanged method is called. This method needs to be called only when the state has changed and the UI needs to […]

Questions to answer when sizing a RavenDB node

by Oren Eini

posted on: August 13, 2021

A common question that is raised by customers is how to determine what kind of hardware you need to run RavenDB on. I'm sorry, but the answer is "it depends", because there are a lot of variables to juggle. In this post, I'm going to try to give some insights about what sort of things you should consider when sizing your instances.

In general, you have three axes that you can work with: CPU, memory and I/O. In terms of the best bang for the buck, optimizing I/O is usually the way to go and will return the most dividends. This is because most of the time, RavenDB will be bottlenecked on the I/O. This is especially true when you are running on the cloud, where 500 IOPS is a fairly common default (that is basically zilch to a database).

To give a more concrete answer we'll need more details. Let's say that you have an application with a database per customer (common for multi tenant scenarios). The structure of each database is the same, but the databases contain data that is separated per customer. Each database has 20 indexes in total: 15 map / full text search, as well as 5 for map-reduce / facets operations. There are also a few ETL tasks and a couple of subscriptions for background work.

Let's break down the load on a single server in this mode, shall we?

100 databases (meaning 100 tx merger threads for I/O).
2,000 indexes - 20 indexes x 100 databases (meaning 2,000 indexing threads).

Across the cluster, we also have:

500 ETL tasks - 5 per database x 100
200 subscriptions - you get the drift

The latter items are spread fairly evenly among all the nodes that you have, but the first two are present on every node in the cluster. What does this mean? We have 2,100 threads active at any given point in time? Well, that is where things get a bit complex. We need to know more than just the raw numbers; we need to understand usage.

How many of those databases are active at any given point in time?
In a multi tenant system, it is common to have many customers using the system sporadically, which can allow you to pack a lot more instances into the same hardware resources. Of more interest, however, is usually the rate of writes, so we need to ask ourselves what the write rate is as well. In general, for reads RavenDB will load all the relevant items into memory and serve directly from there. For writes, given its durable nature, RavenDB must hit the disk. And the question now becomes how many databases are active at the same time?

This is important, because 10 writes per second to a single database are far better than 10 writes / second spread across 10 databases. This is because RavenDB is able to batch I/O for a single database, but not across databases. Let's consider the scenario where we have writes that would impact 5 indexes in the database. What is going to happen when we have 10 writes / sec in a single database?

1 - 5 writes to the disk for the actual document writes (this depends on a lot of factors, and assumes that we are talking about concurrent requests here).
5 - 10 index updates: 1 - 2 index updates x 5 relevant indexes (in most cases, we are able to batch indexes even better than document writes).
Total number of writes to disk: 6 - 15 writes.

However, what if we take the same scenario, but now run it across 10 databases, each having a single write? There is no way for us to batch updates, so we'll have:

10 databases x (1 document write + 5 index updates) = 60 writes to disk.

If the number of relevant indexes is high, or if there are more databases involved, it is easy to hit the limits of I/O, especially on the cloud. I'm actually painting a somewhat bleak picture; in most cases you don't have to worry too much about those details, RavenDB will take care of that for you. However, when you need to consider the sizing, you want to be aware of the possible load that you'll have.
Ironically enough, if you have enough load, RavenDB is able to really optimize things; it is when you have sporadic operations, spread across many locations, that we start putting a lot of load on the underlying system.

So far, I was talking about I/O only, but there are other factors as well. Let's assume that you are running 100 databases with 20 indexes each on a system with 4 cores. How is RavenDB going to split the load across the system? The first priority is going to be given to processing requests, and only then will we start running indexes. That is by design, to ensure that we won't overwhelm the underlying system by issuing too much work all at once. That means that we'll round robin the work across all the indexes that want to run, while keeping enough capacity to process user requests. In this case, more cores will allow us a higher degree of parallelism, but if you have an unbalanced system (a lot of CPU but slow I/O), you're going to see stalls because we'll wait a lot for I/O.

In short, you need to have a fair idea about how your system is going to be used. If you don't have at least a good guess on the topic, you are probably better off getting more I/O bandwidth than anything else. RavenDB continuously monitors itself and will alert you if there are resource issues. You are then able to shore up anything that is lacking to get the best system performance.
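The batching arithmetic above can be captured in a tiny model. The numbers are illustrative only, taken from the ranges in the example, and do not describe RavenDB internals:

```python
INDEXES = 5  # indexes touched by each document write, as in the example

def single_db_writes():
    # 10 writes/sec to ONE database: the document writes batch into
    # 1-5 disk writes, and each of the 5 indexes batches into 1-2 writes
    best = 1 + 1 * INDEXES
    worst = 5 + 2 * INDEXES
    return best, worst

def many_db_writes(databases):
    # 1 write to each of `databases` databases: no cross-database
    # batching, so every database pays the full, unbatched cost
    return databases * (1 + INDEXES)

print(single_db_writes())  # → (6, 15)
print(many_db_writes(10))  # → 60
```

The same 10 logical writes cost 6-15 disk writes when they can batch within one database, and four to ten times that when they cannot.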

Looking into Odin and Zig: My notes

by Oren Eini

posted on: August 12, 2021

I was pointed to the Odin language after my post about the Zig language. On the surface, Odin and Zig are very similar, but they have some fundamental differences in behavior and mindset. I'm basing most of what I'm writing here on an admittedly cursory reading of the Odin language docs and this blog post.

Odin has a great point on conditional compilation. If statements that are evaluated at compile time are hard to distinguish. I like Odin's when clauses better, but Zig has comptime if as well, which makes it easier. The actual problem I have with this model in Zig is that it is easy to get into a situation where you write (new) code that doesn't get called, but Zig will detect that it is unused and not bother compiling it. When you actually try to use it, you'll hit a lot of compilation errors that you need to fix. This is in contrast to the way I usually work, which is to almost always have the code in a compilable state and lean hard on the compiler to double check my work.

Beyond that, I have grave disagreements with Ginger, the author of the blog post and the Odin language. I want to pull just a couple of what I think are the most important points from that post:

"I have never had a program cause a system to run out of memory in real software (other than artificial stress tests). If you are working in a low-memory environment, you should be extremely aware of its limitations and plan accordingly. If you are a desktop machine and run out of memory, don't try to recover from the panic, quit the program or even shut-down the computer. As for other machinery, plan accordingly!"

This is in relation to automatic heap allocations (which can fail, and which will usually kill the process because there is no good way to report it). My reaction to that is "640KB is enough for everything", right?

To start with, I write databases for a living. I run my code on containers with 128MB when the user uses a database that is 100s of GB in size.
Even when running on proper server machines, I almost always have to deal with datasets that are bigger than memory. Running out of memory happens to us pretty much every single time we start the program, and handling this scenario robustly is important to building system software. In this case, planning accordingly, in my view, means not using a language that can put me in a hole. This is not theoretical; this is a real scenario that we have to deal with.

The biggest turnoff for me, however, was this statement on errors:

"…my issue with exception-based/exception-like errors is not the syntax but how they encourage error propagation. This encouragement promotes a culture of pass the error up the stack for 'someone else' to handle the error. I hate this culture and I do not want to encourage it at the language level. Handle errors there and then and don't pass them up the stack. You make your mess; you clean it."

I didn't really know how to answer that at first. There are so many cases where that doesn't even make sense that it isn't even funny. Consider a scenario where I need to call a service that would compute some value for me. I'm doing that as gRPC over TCP + SSL. Let me count the number of errors that can happen here, shall we?

Bad response from the service (it ran out of memory, for example).
The argument passed is not a valid one.
Invalid SSL certificate.
Authentication issues.
TCP firewall issue.
DNS issue.
Wrong URL / port.

My code, which is calling the service, needs to be able to handle any / all of those, and probably quite a few more that I didn't account for. Trying to build something like that is onerous, fragile and doesn't really work. For that matter, if I passed the wrong URL for the service, what is the code doing the gRPC call supposed to do but bubble the error up? If the DNS is returning an error, or there is a certificate issue, how do you clean that up? The only reasonable thing to do is to give as much context as possible and raise the error to the caller.
When building robust software, bubbling the error up so the caller can decide what to do isn't about passing the buck, it is a best practice. You only need to look at Erlang and how applications with the highest requirements for reliability are structured. They are meant to fail; error handling and recovery happen in dedicated locations (supervisors), because those places have the right context to make an actual determination.

The killer impact of this, however, is that Zig has an explicit notion of errors, while Odin relies on the multiple return values system. We have seen how good that is with Go. In fact, one of the most common complaints about Go is how much manual work it takes to do proper error handling. But I think that the key issue here is that errors as a first class aspect of the language give us a very powerful ability: errdefer. This single language feature is the reason I think that Zig is an amazing language. The concept of first class errors combined with errdefer makes building complex structures so much easier. Consider the following code:
```zig
pub fn mmap_file(path: []const u8) ![]align(mem.page_size) u8 {
    var file = try std.fs.createFileAbsolute(path, .{ .truncate = false, .read = true });
    defer file.close();

    var size = try file.getEndPos();
    var mapping = try std.os.mmap(null, size, std.os.PROT_READ | std.os.PROT_WRITE,
        std.os.MAP_SHARED, file.handle, 0);
    errdefer std.os.munmap(mapping);

    if (size < @sizeOf(u64)) {
        return error.FileTooSmall;
    }

    var data_end = size - @sizeOf(u64);
    var actual_hash = std.hash.CityHash64.hash(mapping[0..data_end]);
    var expected_hash = std.mem.readIntLittle(u64, mapping[data_end..]);
    if (actual_hash != expected_hash) {
        return error.FileSignatureInvalid;
    }

    return mapping;
}
```

Note that I'm opening a file, mapping it to memory, validating its size and then checking that it has the right hash. I'm using defer to ensure that I clean up the file handle, but what about the returned memory? In this case, I want to clean it up if there is an error, but not otherwise.

Consider how you would write this code without errdefer. I would have to add the "close the map" portion to both places where I return an error. And what happens if I'm using more than a couple of resources? I may need to do something that requires a file, a network socket, memory, etc. Any of those operations can fail, but I want to clean them up only on failure; otherwise, I need to return them to my caller. Using errdefer (which relies on the explicit distinction between regular returns and errors) ensures that I don't have a problem. Everything works, and the amount of state that I have to keep in my head is greatly reduced.

Consider how you'd do that in Odin or Go, on the other hand, and you can see how error handling becomes a big beast. Having explicit language support to assist in that is really nice.
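Python has no errdefer, but contextlib.ExitStack approximates the "clean up only on failure" shape. Here is a sketch mirroring the structure of the Zig code above, simplified to check only the size rather than the hash:

```python
import mmap
import os
from contextlib import ExitStack

def mmap_file(path):
    fd = os.open(path, os.O_RDONLY)
    try:  # like Zig's defer: the file descriptor always gets closed
        with ExitStack() as on_error:
            mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
            on_error.callback(mapping.close)  # like errdefer: runs on failure

            if mapping.size() < 8:
                # the mapping is closed for us on the way out
                raise ValueError("file too small")

            on_error.pop_all()  # success: cancel the cleanup, caller owns it
            return mapping
    finally:
        os.close(fd)
```

The pop_all() call is the moral equivalent of returning without an error in Zig: registered cleanups are cancelled on the success path and run on every error path, so new early returns can't leak the mapping.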

Recent podcasts & videos

by Oren Eini

posted on: August 11, 2021

It turns out that there were quite a lot of podcasts and videos that we took part in recently, enough that I didn't get to talk about all of them. This post is to clear the queue, so to speak.

What Is a noSQL Database? – Dejan, our developer advocate, discusses the high level details of non relational databases and how they fit into the ecosystem.
Getting started with RavenDB – Dejan talks about how to get started using RavenDB, including live demos.
Applying BDD techniques using RavenDB – Dejan shows you how you can build a behavior driven design system while leaning on RavenDB's capabilities.
Live demoing RavenDB – I talk about RavenDB and take you for a walk through all its features. This video is close to two hours and covers many of the interesting bits of RavenDB and its design.
Interview with Oren Eini – Here I talk about my career and how I got to building RavenDB.
The Birth of RavenDB – This is a podcast in which I discuss in detail how I got started working in the databases field and how I ended up here.