skip to content
Relatively General .NET

Feature Design: ETL for Queues in RavenDB

by Oren Eini

posted on: December 08, 2021

RavenDB is rarely deployed in isolation, it is typically used in existing systems and is integrated into the overall system. One of the key ways by which this is promoted is the built-in ETL support that we have. RavenDB currently has ETL for Postgres, SQL Server, Oracle, MySQL, Elastic,  OLAP / Date Lake, and other RavenDB instances. We are looking into adding RavenDB ETL support to queues (RabbitMQ, Kafka, SQS, AQS, etc). That support is the topic of this blog post. I wanted to summarize my thinking about the topic and along the way gather some insight from you about what kind of shape this feature should have. When talking about ETL to Queues, we have to deal with two distinct scenarios: receiving and sending. For the other ETL targets in RavenDB, we just send data, but for queues, given that there is a well defined interface for pulling the results, it makes sense to support receiving as well. Let’s consider what it means to be able to receive messages from a queue into RavenDB… It means that RavenDB will listen to a queue and apply a script to it. That script will be able to insert or modify documents as a result of the message contents. For example, let’s assume that we have the queue defined as in the image on the right. We can write the following script to process messages from the queue. This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters if(msg.type == "NewOrder") { var doc = { Customer: msg.Customer, Lines: [], "@metadata": {"@collection": "Orders"} }; put(msg.id, doc); } else if (msg.type == "AddLine") { var doc = load(msg.id); doc.Lines.push(msg.Details); put(msg.id, doc); } view raw etl.js hosted with ❤ by GitHub The script above handles two message types. A recording of a new order or adding a line item to an existing order. It will be invoked by RavenDB whenever it receives a message from the queue. In this way, you can have RavenDB build your domain model directly from the message traffic. Of course, this is a pretty simplistic scenario, there are a lot of more interesting scenarios to explore here. The second part is when RavenDB will be the one sending messages to the queues. Those messages, naturally, would be generated from the documents in the database. How would that work? We can write a script that would be applied to documents as they change which will output the messages to write to the queue. That is how ETL in general works in RavenDB. For queues, however, the situation is a bit more complex. When we use ETL to sync data from RavenDB to a relational database, any update of the document will also update the data in the relational database. When we send the data to a queue, what would happen then? Well, we can’t update a message in the queue, that doesn’t make any sort of sense. So we need to consider what is the scenario we have here. One option would be to just send the message each time, every update of a document will generate a new message. Or the author of the ETL script may decide to only send it once, of course. The scenario that I think is far more likely is to use RavenDB and ETL to Queue as part of a larger scheme. Consider the scenario where you want to use the outbox pattern. In other words, you have a transaction that needs to do a bunch of things, including sending messages on a queue. Instead of trying to create a distributed transaction or carefully coordinate things, you will use this feature. Your transaction will save a Message document alongside any other changes. That relies on RavenDB’s ACID nature to ensure that this happens in an atomic manner. Then you will be able to utilize the ETL to Queues option to actually send that over to the actual queue, in a reliable manner. Those two scenarios (send & receive) are the two most likely scenarios for this feature, but the point of this post is to get more feedback from you. What kind of use cases do you think that this will enable? What would you like to be able to do?

Should you Abstract the Database?

by Vladimir Khorikov

posted on: December 07, 2021

This article is inspired by a tweet that I thought would be a good topic for discussion: should you abstract your database?

Cumulative computation with RavenDB queries

by Oren Eini

posted on: December 07, 2021

Consider the image on the right, where we have three charges on separate months. This is a time series, showing charges over time. We can very easily issue queries that will give us the results of how much we paid in a time period, but what if we wanted to get the cumulative value. How much have I paid so far? Here is how this should look like: However, that is not something that we provide in RavenDB. Luckily, we do provide a very flexible query engine, so we can make it happen anyway. Here is what the query will look like: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters declare function compute(o) { var results = []; var sum = 0; var ts = timeseries(id(o), 'Charges').get(); for(var i = 0; i< ts.length; i++) { var cur = ts[i]; sum += cur.Values[0]; results.push({ Timestamp: cur.Timestamp, Amount: cur.Values[0], Sum: sum }); } return results; } from 'Orders' as o where id(o) == 'orders/829-A' select compute(o) view raw compute.js hosted with ❤ by GitHub Note that we are using a JavaScript function to process the time series and run the computation that we want, and then we return an array, which is translated to multiple results set per document. Here is the result of this query:

Make the Implicit Explicit

by Ardalis

posted on: December 07, 2021

When practicing software architecture and design, one important consideration is how the rules of the system are modeled. Are the rules ad…Keep Reading →

An optimization story:–27% runtime costs for 8 lines of code

by Oren Eini

posted on: December 06, 2021

I give a lot of talks about performance and in those talks, I tend to emphasize the architectural impact of your choices. There is a huge tendency to focus on micro optimizations to get the performance you need, even though you can usually get orders of magnitude higher performance by making architectural changes. Good architecture can still benefit from micro optimizations, however, and it is sometimes really surprising to see by how much. During a routine performance review, we identified a particular scenario as a performance issue. Here is the code in question: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters private string AddQueryParameter(object value) { var parameterName = $"{ParameterPrefix}{QueryParameters.Count.ToInvariantString()}"; QueryParameters.Add(parameterName, value); return parameterName; } view raw AddQueryParameter.cs hosted with ❤ by GitHub This is being triggered when you are using a parameterized query, like this one: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters var orders = await session.Query<Order>() .Where(x => x.Employee == "employees/1-A" && x.OrderedAt.Year .ToListAsync(); view raw Query.cs hosted with ❤ by GitHub   And here is the profiler trace for that: That is kind of ridiculous, to be honest. About 18% of the client side query process went into generating the name of the query. Opps, that is not really something that I expected. And here is the optimized version: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters private const string DefaultParameterPrefix = "p"; private static readonly List<string> _pregeneratedParametersWithDefaultPrefix = Enumerable.Range(0, 128).Select(i => $"{DefaultParameterPrefix}{i.ToInvariantString()}").ToList(); private string AddQueryParameter(object value) { string parameterName; if (ParameterPrefix == DefaultParameterPrefix && QueryParameters.Count < 128) parameterName = _pregeneratedParametersWithDefaultPrefix[QueryParameters.Count]; else parameterName = $"{ParameterPrefix}{QueryParameters.Count.ToInvariantString()}"; QueryParameters.Add(parameterName, value); return parameterName; } view raw AddQueryParameter.Optimized.cs hosted with ❤ by GitHub Basically, we prepare, in advance, the most likely names, so we can find them as cheaply as possible. The result for that particular operation is impressive: So we almost halved the costs that we have here, but what is more interesting is what happens at higher level of the stack… This is the query processing portion, and you can see that the pretty minimal saving of 187 ms in the AddQueryParameter method is translated to a far greater saving down the line. The overall cost went down by almost 30%. The probable reason is that we are now allocating slightly less, we saved a few allocations for each query parameter, and that in turn translated to a far better overall performance.

Fastest way to enumerate a List<T>

by Gérald Barré

posted on: December 06, 2021

NoteThis blog post is part of The C# Advent Calendar, a series of 50 posts about C#. Be sure to check out the rest of the blog posts in the calendar!System.Collections.Generic.List<T> may be the most used collection in .NET. It is a generic collection that can be used to store any type of dat

Desire features in software architecture

by Oren Eini

posted on: December 03, 2021

In architecture (physical building) there is a term called Desire Lanes. The idea is that users will take the path of least resistance, regardless of the intention of the architect. The image on the right is one that I have seen many times, and I got a chuckle out of that each time. That is certainly something that I have seen over and over again. I had the chance recently to see how the exact same thing happens in two very different software systems. There was a need and the system didn’t allow it. The users found a way. In the first instance, we are talking about a high security environment. The kind where you leave your phones and smart watches at the door, outside devices are absolutely prohibited. So far, this makes sense and there is a real need for that for their scenario. The problem is that they also have a high degree of people who are working on that environment on a very transient basis. You may get people that show up for a day or two mostly (meeting, briefings, training) or for a couple of weeks at most. Those people need to be able to do… stuff with computers (take notes, present, plan, etc). Given the high security environment, creating a user in the system takes a few days at least (involves security briefing, guidance, etc). Note that all involved have the right security clearances, that isn’t the issue. But before you can get a user account, policy dictates that you need to be briefed, login is done via smart cards + password only, etc. You can’t make that work if you have hundreds of people coming and going all the time. The solution? There are a bunch of smart cards in a drawer belonging to former employees whose accounts were purposefully not deactivated. You get handed the card + password and can use the account for basic needs. I assume that those accounts are locked, but I didn’t bother to verify that. It wouldn’t surprise me if they still had all their permissions and privileges. From an IT security standpoint, I am horrified. That is a Bad Idea, but it is a solution to the issue at hand, providing computer access for short terms visitors without having to go through all the hoops the security policy dictates. This is sadly a very widespread tactic in that organization, I have seen this in multiple branches in separate locations. In the second scenario, there is a system to reserve appointments with doctors. The system has an app, where users can register for their appointments themselves. There is also the administration team that may also reserve appointments for patients. The system allows a doctor to define their hours of operations and then (as far as the system is concerned) it is first come & first served basis. The administration team, on the other hand, has to deal with a more complex situation. For example, a common issue that I run into is that you can only set an appointment if you are registered in the system. What would you do with first time visitors? They are routinely setting things up through the administration team, but while they can reserve an appointment, they have to put someone that is registered in the system. The solution for that is to use other people, typically the administration team will use their own accounts and set the appointment for themselves, to reserve the spot for the new patient. That can lead to some issues, for example, if the doctor has to cancel, the system will send notices to the scheduled patients. But the scheduled patient and the real patient are distinct. It also means that from a medical file perspective, certain people are “ditching” a lot of appointments. In both cases, we can see that there is a need to do something that the system doesn’t allow (or actively trying to prevent). The end result is a solution, a sub optimal one, for sure, but something that works. One of the key aspects for building proper systems for the long term is to listen and implement proper solutions for those sort of issues. In many cases, they are of pivotal importance for the end user, note that this is very much distinct from the customer. The customer is the one who pays, the end user is the one who is using the system. There is often a major disconnect between the two. This is where you get Desire Features and workaround that become Official, to the point where some of those solutions are literally in the employee handbook.

Dealing with complex hierarchies in RavenDB

by Oren Eini

posted on: December 01, 2021

For many business domains, it is common to need to deal with hierarchies or graphs. The organization chart is one such common scenario, as is the family tree. It is common to want to use graph queries to deal with such scenarios, but I find that it is usually much easier to explicitly build your own queries.. Consider the following query, which will give me the entire hierarchy for a particular employee: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters declare function hierarchy(e) { var results = []; while (true) { e = load(e.ReportsTo); if(e == null) return results; results.push(id(e)); } } from "Employees" as e select e.FirstName, e.LastName, hierarchy(e) as Hierachy include Hierachy view raw hierarchy.sql hosted with ❤ by GitHub I can define my own logic for traversing from the document to the related documents, and I can do whatever I want there. You can also see that I’m including the related documents, here is how this looks like when I execute the query: A single query gives me all the details I need to show the user with one roundtrip to the server. Let’s go with a more complex example. The above scenario had a single path to follow, which is trivial. What happens if I have a more complex system, such as a family tree? I took the Games of Thrones data (easiest to work with for this demo) and threw that into RavenDB, and then executed the following query: This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters declare function ancestors(e) { var results = []; var wrk = [e]; while (wrk.length > 0) { var cur = wrk.splice(0,1)[0]; results.push(id(cur)); for(var i = 0; i< cur.parents.length; i++){ var next = load(cur.parents[i]); if(next != null) wrk.push(next); } } return results.splice(1).join(", "); } from "Characters" as c where c.characterName == 'Rhaego' select c.characterName, ancestors(c) view raw tree.sql hosted with ❤ by GitHub And that gives me the following output: This is a pretty fun technique to explore, because you can run any arbitrary logic you need, and expressing things in an imperative manner is typically much more straightforward.