skip to content
Relatively General .NET

Challenge

by Oren Eini

posted on: July 01, 2024

At the heart of RavenDB, there is a data structure that we call the Page Translation Table. It is one of the most important pieces inside RavenDB.The page translation table is basically a Dictionary<long, Page>, mapping between a page number and the actual page. The critical aspect of this data structure is that it is both concurrent and multi-version. That is, at a single point, there may be multiple versions of the table, representing different versions of the table at given points in time.The way it works, a transaction in RavenDB generates a page translation table as part of its execution and publishes the table on commit. However, each subsequent table builds upon the previous one, so things become more complex. Here is a usage example (in Python pseudo-code):table = {} with wtx1 = write_tx(table): wtx1.put(2, 'v1') wtx1.put(3, 'v1') wtx1.publish(table) # table has (2 => v1, 3 => v1) with wtx2 = write_tx(table): wtx2.put(2, 'v2') wtx2.put(4, 'v2') wtx2.publish(table) # table has (2 => v2, 3 => v1, 4 => v2)This is pretty easy to follow, I think. The table is a simple hash table at this point in time.The catch is when we mix read transactions as well, like so:# table has (2 => v2, 3 => v1, 4 => v2) with rtx1 = read_tx(table): with wtx3 = write_tx(table): wtx3.put(2, 'v3') wtx3.put(3, 'v3') wtx3.put(5, 'v3') with rtx2 = read_tx(table): rtx2.read(2) # => gives, v2 rtx2.read(3) # => gives, v1 rtx2.read(5) # => gives, None wtx3.publish(table) # table has (2 => v3, 3 => v3, 4 => v2, 5 => v3) # but rtx2 still observe the value as they were when # rtx2 was created rtx2.read(2) # => gives, v2 rtx2.read(3) # => gives, v1 rtx2.read(5) # => gives, NoneIn other words, until we publish a transaction, its changes don’t take effect. And any read translation that was already started isn’t impacted. We also need this to be concurrent, so we can use the table in multiple threads (a single write transaction at a time, but potentially many read transactions). Each transaction may modify hundreds or thousands of pages, and we’ll only clear the table of old values once in a while (so it isn’t infinite growth, but may certainly reach respectable numbers of items).The implementation we have inside of RavenDB for this is complex! I tried drawing that on the whiteboard to explain what was going on, and I needed both the third and fourth dimensions to illustrate the concept. Given these requirements, how would you implement this sort of data structure?

How to output a SARIF file from a .NET project

by Gérald Barré

posted on: July 01, 2024

SARIF (Static Analysis Results Interchange Format) is an OASIS Standard that defines an output file format. The SARIF standard is used to streamline how static analysis tools share their results. SARIF is a JSON-based format that is easy to parse. Lots of tools support it, including Visual studio C

RavenDB News: June, 2024

by Oren Eini

posted on: June 28, 2024

We recently published an article on Getting started with GraphQL and RavenDB, it will walk you through setting up Hot Chocolate to create a RavenDB-based GraphQL endpoint in your system.Here is what this looks like:Another new feature is the New Database Wizard, which was completely redesigned and made much simpler. We have a great number of features & options, and the idea is that we want to give you better insight into what you can do with your system. Here is a quick peek, but take a look at the link. I’m biased, of course, but I think the team did a really great job in exposing a really complex set of options in a very clear manner.The About Page of RavenDB Studio has undergone major updates. In particular, we have made it easier to see that new versions are available, what changes are included in each version, and check for updates. Additionally, you can now review your license details and compare features available in different editions. Furthermore, we pushed an update to the About Page of RavenDBitself. Here we try to tell the story of RavenDB and how it came about. Beyond the narrative, our goal is to explain the design philosophy of RavenDB.The idea is quite simple, we aim to be the database that you don’t have to think about. The one component in your system that you don’t need to worry about, the cog that just keeps on working. If we do our job right, we are a very boring database. Amusingly enough, the actual story behind it is quite interesting, and I would love to get your feedback on it.We also published our new roadmap, which has a bunch of new goodies for you. I’m quite excited about this, and I hope you’ll too (in a boring manner 🙂). Upcoming features include data governance features, Open Telemetry integration, extremely large clusters, and more.One of our stated goals in the roadmap is better performance, with a focus on ARM hardware, which is taking the server / cloud world by storm. To that end, we are performing many “crimes against code” to squeeze every last erg of performance from the system. Initial results are promising, but we still have some way to go before we can publicly disclose numbers. As usual, I would appreciate your feedback about the roadmap and the new features in general.

Implementing "Suggested Destinations" in a few lines of code

by Oren Eini

posted on: June 26, 2024

Today I got in my car to drive to work and realized that Waze suggested “Work” as the primary destination to select. I had noticed that before, and it is a really nice feature. Today, I got to thinking about how I would implement something like that. That was a nice drive since I kept thinking about algorithms and data flow. When I got to the office, I decided to write about how we can implement something like that. Based on historical information, let’s suggest the likely destinations. Here is the information we have:The Lat & Lng coordinates represent the start location, the time is the start time for the trip, and the destination is obvious. In the data set above, we have trips to and from work, to the gym once a week, and to our parents over the weekends.Based on this data, I would like to build recommendations for destinations. I could try to analyze the data and figure out all sorts of details. The prediction that I want to make is, given a location & time, to find where my likely destination is going to be.I could try to analyze the data on a deep level, drawings on patterns, etc. Or I can take a very different approach and just throw some computing power at the problem. Let’s talk in code since this is easier. I have a list of trips that look like this:public record Trip(double Lat, double Lng, string Destination, DateTime Time); Trip[] trips = RecentTrips(TimeSpan.FromDays(90));Given that, I want to be able to write this function:string[] SuggestDestination((double Lat, double Lng) location, DateTime now)I’m going to start by processing the trips data, to extract the relevant information:var historyByDest = new Dictionary<string, List<double[]>>(); foreach (var trip in trips) { if (historyByDest.TryGetValue(trip.Destination, out var list) is false) { historyByDest[trip.Destination] = list = new(); } list.Add([ trip.Lat, trip.Lng, trip.Time.Hour * 100 + trip.Time.Minute, // minutes after midnight trip.Time.DayOfYear, (int)trip.Time.DayOfWeek ]); }What this code does is extract details (location, day of the week, time of day, etc.) from the trip information and store them in an array. For each trip, we basically break apart the trip across multiple dimensions. The next step is to make the actual prediction we want, which will begin by extracting the same dimensions from the inputs we get, like so:double[] compare = [ location.Lat, location.Lng, now.Hour * 100 + now.Minute, now.DayOfYear, (int)now.DayOfWeek ];Now we basically have an array of values from which we want to predict, and for each destination, an array that represents the same dimensions of historical trips. Here is the actual computation:List<(string Dest, double Score)> scores = new(); foreach (var (dest, items) in historyByDest) { double score = 0; foreach (var cur in items) { for (var i = 0; i < cur.Length; i++) { score += Math.Abs(cur[i] - compare[i]); } } score /= items.Count; scores.Add((dest, score)); } scores.Sort((x, y) => x.Score.CompareTo(y.Score));What we do here is compute the difference between the two arrays: the current start location & time compared to the start location & time of historical trips. We do that not only on the raw data but also extract additional features from the information.For example, one dimension is the day of the week, and the other is the time of day. It is not sufficient to compare just the date itself. The end result is the distance between the current trip start and previous trips for each of the destinations I have. Then I can return the destinations that most closely match my current location & time.Running this over a few tests shows that this is remarkably effective. For example, if I’m at home on a Saturday, I’m very likely to visit either set of grandparents. On Sunday morning, I head to the Gym or Work, but on Monday morning, it is more likely to be Work.All of those were mostly fixed, with the day of the week and the time being different. But If I’m at my parents’ house on a weekday (which is unusual), the location would have a far greater weight on the decision, etc. Note that the code is really trivial (I spent more time generating the actual data), but we can extract some nice information from this. The entire code is here, admittedly it’s pretty dirty code since I wanted to test how this would actually work. At this point, I’m going to update my Curriculum Vitae and call myself a senior AI developer. Joking aside, this approach provides a good (although highly simplified) overview of how modern AI systems work. Given a data item (image, text, etc.), you run that through the engine that outputs the embedding (those arrays we saw earlier, with values for each dimension) and then try to find its nearest neighbors across multiple dimensions. In the example above, I explicitly defined the dimensions to use, whereas LLMs would have their“secret sauce” for this. The concept, at a sufficiently high level, is the same.

Improve the tree view settings in Visual Studio Code

by Gérald Barré

posted on: June 24, 2024

I think the default settings for the tree view in Visual Studio Code are not very good. The indentation is too small, and the indentation guides are not visible enough. Here's how to improve the tree view settings:Open the VS Code settingsAdd the following json content to the settings:JSONcopy{

Recording

by Oren Eini

posted on: June 19, 2024

Watch Oren Eini, CEO of RavenDB, as he delves into the intricate process of constructing a database engine using C# and .NET. Uncover the unique features that make C# a robust system language for high-end system development. Learn how C# provides direct memory access and fine-grained control, enabling developers to seamlessly blend high-level concepts with intimate control over system operations within a single project. Embark on the journey of leveraging the power of C# and .NET to craft a potent and efficient database engine, unlocking new possibilities in system development.I’m going deep into some of the cool stuff that you can do with C# and low level programming.