In version 4.2 we added an experimental feature to RavenDB: Graph Queries. That was quite a bit of effort and we were really excited about it. The feature was marked as experimental and has been in the product in that state for the past 4 years or so.
Unfortunately, while quite impressive, it didn’t graduate from an experimental feature to a stable one. Mostly because there wasn’t enough usage of graph queries to warrant it. We have seen its usage in some cases, but it seems that our target audience isn’t interested in graph queries for RavenDB.
Given that there isn’t much use of graph queries, we also aren’t spending much time there. We are looking at the 6.0 release (scheduled around July 2022) and we realize that this feature makes our life more complicated and that the support burden of keeping it outweighs its benefits.
For that reason, we have made the decision to remove the experimental Graph Queries from RavenDB in the 6.0 release. Before we actually pull the trigger on that, I wanted to get your feedback on the feature and its usage. In particular, are you using it and, if so, what are you using it for?
The most common scenarios for this feature are already covered via projection queries in RavenDB, which often can be easier to express for developers.
Regardless, the feature will remain in the 5.x branch, and the 5.2 LTS version will support it until at least 2024.
During code review, I ran into the following code (simplified):
public class Wrapper
{
    private IEnumerable<object> _inner;

    public Wrapper(IEnumerable<object> inner)
    {
        _inner = inner;
    }

    public dynamic this[int i]
    {
        // ElementAt() walks the sequence from the start unless the source is a list
        get => _inner.ElementAt(i);
    }
}
We have a wrapper object for IEnumerable that allows access to it by index.
I didn’t like this code, and suggested this version instead:
public class Wrapper
{
    private IEnumerable<object> _inner;

    public Wrapper(IEnumerable<object> inner)
    {
        _inner = inner;
    }

    public dynamic this[int i]
    {
        get
        {
            // Materialize once if the source has no indexer; later calls reuse the list
            if (_inner is not IList<object> c)
            {
                _inner = c = _inner.ToList();
            }
            return c[i];
        }
    }
}
The problem is that if you look at the code of ElementAt(), it is already doing that, so why the duplicate work? It is specialized to make things fast for similar scenarios, so why did I write the same code again?
Because I’m familiar with the usage scenario. A really common usage pattern for the wrapper object is something like this:
var wrapper = new Wrapper( item.SelectMany(x=>x.Tags) );
The difference between what ElementAt() does and what I do in the second version of the code is that my version will materialize the values. If you are calling that in a for loop, you’ll pay the cost of iterating over all the items once.
On the other hand, since the instance we pass to ElementAt() isn’t one that has an indexer, we’ll have to scan through the enumerator multiple times. A for loop with this implementation is a quadratic accident waiting to happen.
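To make the difference concrete, here is a minimal sketch that counts how many items the source is asked to produce in each case. The ElementAtDemo class and its LazySource method are hypothetical names used only for this illustration; LazySource stands in for a lazy sequence such as the SelectMany() call above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ElementAtDemo
{
    // Total number of items the lazy source has been asked to produce.
    public static int Pulled;

    // A lazy sequence with no indexer (hypothetical stand-in for SelectMany()).
    public static IEnumerable<int> LazySource(int count)
    {
        for (var i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Indexes every position with ElementAt(): each call restarts the iterator.
    public static int ViaElementAt(int count)
    {
        Pulled = 0;
        var source = LazySource(count);
        for (var i = 0; i < count; i++)
        {
            source.ElementAt(i);
        }
        return Pulled;
    }

    // Materializes the sequence once, then indexes the resulting list.
    public static int ViaToList(int count)
    {
        Pulled = 0;
        IList<int> list = LazySource(count).ToList();
        for (var i = 0; i < count; i++)
        {
            _ = list[i];
        }
        return Pulled;
    }
}
```

With the ElementAt() loop, the number of items pulled grows roughly with the square of the sequence length, while the materialized version pulls each item exactly once.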
My first project as a professional software developer was to build a scheduling system for a chain of dental clinics. That was a huge project (multiple years) and was really quite interesting. Looking back, I did a lot of really cool technical things there. I also learned quite a lot from that project, the danger of complexity being one of the chief lessons.
Consider a dental clinic, where we have the following schedule for a dentist:
Monday – 10:00 – 15:00
Wednesday – 09:00 – 13:00
Thursday – 11:30 – 16:30
The task is to be able to schedule an appointment for the dentist given those rules.
In addition to the schedule of the dentist, we also have actual appointments, which look like this:
{
    "At": "2022-03-09T12:00:00",
    "Duration": "01:00:00",
    "Patient": "patients/2130-C",
    "Task": "Filling tooth 26"
}
Assume that you have quite a few of those, and you want to schedule a new appointment for a patient. How would you do that? I’m a database guy, so let’s see if I can specify the task as a query.
We need a dentist that has availability of a particular length (different tasks have different schedules) and particular qualifications. However, there is no such thing as availability in our model. We have just:
Dentist
Schedule
Appointment
The complexity here is that we need to search for something that isn’t there.
I actually found some of my posts on this topic, from 2006. That isn’t a simple problem. And the solution is usually to generate the missing data and query on that. My old posts on the topic actually generate an in memory table and operate on that, which is great for small datasets, but will fail in interesting ways for real world datasets.
For what it’s worth, RavenDB allows you to generate the missing data during the indexing process, so at least the queries are fast, but the indexing process is now compute-intensive and a change in the dentist schedule can result in a lot of work.
All of that is because of two issues:
We are trying to query for the data that isn’t there.
The information is never used as queried.
These two points are strongly related to one another. Consider how you would schedule a dentist appointment. You first need to find the rough time frame that you need (“come back in six months”) and then you need to match it to your schedule (“I can’t on Monday, I got the kids”, etc).
There is a better way to handle that, by filling in the missing pieces. Instead of trying to compute the schedule of a dentist from the specification that we have, go the other way around. Generate the schedule based on the template you have. The result should be something like this:
{
    "Date": "2022-03-09",
    "Dentist": "dentists/3281-C",
    "MaximumDuration": "01:30:00",
    "Start": "09:00",
    "End": "13:00",
    "Appointments": [
        {
            "Start": "12:00:00",
            "Duration": "01:00:00",
            "Patient": "patients/2130-C",
            "Task": "Filling tooth 26"
        }
    ]
}
In other words, based on the schedule provided, we’ll generate an entry per day for the dentist. That entry will contain the appointments for the day as well as the maximum duration for an available appointment. That means that on query time, we can do something as simple as:
from Schedules where Dentist = $dentistId and Date between $start and $end and MaximumDuration >= $reqDuration
And that gives us the relevant days on which we can schedule the appointment. This is cheap to do, easy to work with, and it actually matches the common ways that users will use the system.
This has a bunch of other advantages that are not immediately apparent but end up being quite important. Working with time sucks. The weekly schedule above is a nice guideline, but it isn’t a really useful one when you need to run actual computations. Why is that? Well, it doesn’t account for vacation days. If there is a public holiday on Wednesday, the dentist isn’t working, but that is an implied assumption in the schedule.
For that matter, you now need to figure out which calendar to use. A Christian and a Jewish dentist are going to have different holiday calendars. Trying to push that into a query is going to be quite annoying, if not impossibly so. Putting that on the generator simplifies things, because you can “unroll” the schedule, apply the holiday calendar you want and then not think about it.
Other factors, such as vacation days, reserved time for emergencies and similar issues, are a lot easier to manage in a concrete form. Another important aspect is that, for any decent-sized clinic, the schedule changes all the time. You may have the dentist requesting to close a couple of hours early on Monday because of a dance recital and add more hours on Thursday. If the schedule is generated, this is a simple matter to do (a manual adjustment of the affected days). If we have just the schedule template, on the other hand… that becomes a lot more complex.
In short, the best way to handle this is to take the template schedule, expand it into a concrete schedule and operate from that point on.
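As a rough illustration of the "unrolling" step, here is a minimal sketch of such a generator. All of the type and member names (WorkingHours, DaySchedule, ScheduleGenerator.Unroll) are hypothetical and are not taken from the original system. It also assumes that a freshly generated day has no appointments yet, so MaximumDuration starts out as the full working window.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical template entry: the dentist's working hours on a given weekday.
public record WorkingHours(DayOfWeek Day, TimeSpan Start, TimeSpan End);

// Hypothetical concrete entry: one document per dentist per working day.
public record DaySchedule(DateTime Date, string Dentist, TimeSpan Start, TimeSpan End, TimeSpan MaximumDuration);

public static class ScheduleGenerator
{
    // Expands the weekly template into one entry per working day in [from, to],
    // skipping any date that appears in the chosen holiday calendar.
    public static List<DaySchedule> Unroll(
        string dentist,
        IReadOnlyList<WorkingHours> template,
        DateTime from,
        DateTime to,
        ISet<DateTime> holidays)
    {
        var result = new List<DaySchedule>();
        for (var day = from.Date; day <= to.Date; day = day.AddDays(1))
        {
            if (holidays.Contains(day))
                continue; // the dentist is not working on a holiday

            foreach (var hours in template)
            {
                if (hours.Day != day.DayOfWeek)
                    continue;

                // Assumption: no appointments yet, so the whole window is available.
                result.Add(new DaySchedule(day, dentist, hours.Start, hours.End, hours.End - hours.Start));
            }
        }
        return result;
    }
}
```

Once the entries exist, one-off changes such as closing early on a particular Monday become edits to a single generated entry, and the template itself stays untouched.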
If you're not already practicing continuous deployment, odds are your team and company would benefit from more frequent deployments. Let's…
"The Main method is the entry point of a C# application. When the application is started, the Main method is the first method that is invoked." -- Documentation. In fact, the Main method may not be the first method of the assembly to be executed when the application starts. There are different methods that
The new context menu in Windows 11 explorer is nice, but lots of applications are not yet integrated into it. Currently, I have to use the "Show more options" item too often (7-zip, Notepad++, Visual Studio Code, etc.). You can restore the old context menu by editing the registry. Open a command pr
I ran into this recently and I thought that this technique would make a great post. We use it extensively inside RavenDB to reduce the overhead of abstractions without limiting our capabilities. It is probably best to start with an example. We have a need to perform some action, which needs to be specialized by the caller.
For example, let’s imagine that we want to aggregate the result of calling multiple services for a certain task. Consider the following code:
using System;
using System.Buffers;
using System.Threading.Tasks;
using System.Net.Http;

public class Orchestrator
{
    string[] urls;
    HttpClient client = new();

    public Orchestrator(string[] urls)
    {
        this.urls = urls;
    }

    private static readonly Task<string> CompletedTask = Task.FromResult(string.Empty);

    public async Task<T> Execute<T>(Func<string, HttpRequestMessage> factory, Func<Memory<string>, T> combine)
    {
        // The rented array may be larger than requested
        var tasks = ArrayPool<Task<string>>.Shared.Rent(urls.Length);
        for (int i = 0; i < urls.Length; i++)
        {
            tasks[i] = client.SendAsync(factory(urls[i]))
                .ContinueWith(t => t.Result.Content.ReadAsStringAsync())
                .Unwrap();
        }
        // Fill the unused tail so that Task.WhenAll() sees no stale entries
        for (int i = urls.Length; i < tasks.Length; i++)
        {
            tasks[i] = CompletedTask;
        }
        await Task.WhenAll(tasks);
        var results = ArrayPool<string>.Shared.Rent(urls.Length);
        for (int i = 0; i < urls.Length; i++)
        {
            results[i] = tasks[i].Result;
        }
        ArrayPool<Task<string>>.Shared.Return(tasks);
        var result = combine(new Memory<string>(results, 0, urls.Length));
        ArrayPool<string>.Shared.Return(results);
        return result;
    }
}
As you can see, the code above sends a single request to multiple locations and aggregates the results. The point is that we can separate the creation of the request (and all that this entails) from the actual logic for aggregating the results.
Here is a typical usage for this sort of code:
var urls = new[] { "https://google.com", "https://bing.com" };
var orch = new Orchestrator(urls);
var term = "fun";
var t = await orch.Execute<string>(
    url => new HttpRequestMessage(HttpMethod.Get, url + "/?q=" + term),
    results =>
    {
        var sb = new StringBuilder();
        var span = results.Span;
        for (var i = 0; i < span.Length; i++)
        {
            sb.AppendLine(span[i]);
        }
        return sb.ToString();
    });
Notice that the code is fairly simple, and uses lambdas to inject the specialized behavior into the process.
That leads to a bunch of problems:
Delegate / lambda invocation is more expensive.
Lambdas need to be allocated.
They capture state (and may capture more and for a lot longer than you would expect).
In short, when I look at this, I see performance issues down the road. But it turns out that I can write very similar code, without any of those issues, like this:
using System;
using System.Buffers;
using System.Threading.Tasks;
using System.Net.Http;

public interface IMergedOperation<T>
{
    T Combine(Memory<string> results);
    HttpRequestMessage Create(string url);
}

public class Orchestrator
{
    string[] urls;
    HttpClient client = new();

    public Orchestrator(string[] urls)
    {
        this.urls = urls;
    }

    private static readonly Task<string> CompletedTask = Task.FromResult(string.Empty);

    // TMerger is constrained to a struct, so the JIT compiles a dedicated version per merger type
    public async Task<TResult> Execute<TMerger, TResult>(TMerger merger)
        where TMerger : struct, IMergedOperation<TResult>
    {
        // The rented array may be larger than requested
        var tasks = ArrayPool<Task<string>>.Shared.Rent(urls.Length);
        for (int i = 0; i < urls.Length; i++)
        {
            tasks[i] = client.SendAsync(merger.Create(urls[i]))
                .ContinueWith(t => t.Result.Content.ReadAsStringAsync())
                .Unwrap();
        }
        // Fill the unused tail so that Task.WhenAll() sees no stale entries
        for (int i = urls.Length; i < tasks.Length; i++)
        {
            tasks[i] = CompletedTask;
        }
        await Task.WhenAll(tasks);
        var results = ArrayPool<string>.Shared.Rent(urls.Length);
        for (int i = 0; i < urls.Length; i++)
        {
            results[i] = tasks[i].Result;
        }
        ArrayPool<Task<string>>.Shared.Return(tasks);
        var result = merger.Combine(new Memory<string>(results, 0, urls.Length));
        ArrayPool<string>.Shared.Return(results);
        return result;
    }
}
Here, instead of passing lambdas, we pass an interface. On its own, that has the exact same cost as a lambda. However, in this case we also specify that the interface must be implemented by a struct (value type). That leads to really interesting behavior: the runtime generates a specialized version of the method for each struct type, so at JIT time it knows that there is no abstraction here and it can do optimizations such as inlining or calling the method directly (with no abstraction overhead). It also means that any state that we capture is captured explicitly (and we won’t be tainted by other lambdas in the method).
We still have a good separation between the process we run and the way we specialize it, but without any runtime overhead. The code itself is a bit more verbose, but not too onerous.
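For completeness, here is a sketch of what the caller's side might look like with this approach. The SearchMerger struct is a hypothetical name made up for this illustration; it mirrors the lambda-based usage shown earlier, with the search term held in an explicit field instead of a closure. The interface is repeated from the listing above so that the sketch compiles on its own.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Repeated from the listing above so that this sketch is self-contained.
public interface IMergedOperation<T>
{
    T Combine(Memory<string> results);
    HttpRequestMessage Create(string url);
}

// Hypothetical merger: the captured state (the search term) is an explicit field.
public readonly struct SearchMerger : IMergedOperation<string>
{
    private readonly string _term;

    public SearchMerger(string term) => _term = term;

    // Builds the request for a single URL, as the factory lambda did before.
    public HttpRequestMessage Create(string url) =>
        new HttpRequestMessage(HttpMethod.Get, url + "/?q=" + _term);

    // Joins all responses, one per line, as the combine lambda did before.
    public string Combine(Memory<string> results)
    {
        var sb = new StringBuilder();
        var span = results.Span;
        for (var i = 0; i < span.Length; i++)
        {
            sb.AppendLine(span[i]);
        }
        return sb.ToString();
    }
}
```

The call site would then be something like await orch.Execute<SearchMerger, string>(new SearchMerger("fun")). Both type arguments have to be spelled out, because the compiler cannot infer TResult from the constraint alone, which is part of the extra verbosity mentioned above.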