Converting a docker-compose file to .NET Aspire
by Andrew Lock
posted on: May 27, 2025
In this post I describe how I converted the deployment method of the mailing-list manager listmonk from a docker-compose.yml file to an Aspire app host project…
by Oren Eini
posted on: May 26, 2025
When building RavenDB 7.0, a major feature was Vector Search and AI integration. We weren’t the first database to make Vector Search a core feature, and that was pretty much by design. Not being first out of the gate meant that we had time to observe the industry, study new research, and consider how we could best enable Vector Search for our users. This isn’t just about the algorithm or the implementation, but about the entire mindset of how you provide the feature to your users. The logistics of a feature dictate how effectively you can use it, after all.

This post is prompted by the recent release of SQL Server 2025 Preview, which includes Vector Search indexing. Looking at what others in the same space are doing is fascinating. The SQL Server team is using the DiskANN algorithm for their Vector Search indexes, and that is pretty exciting to see.

The DiskANN algorithm was one of the algorithms we considered when implementing Vector Search for RavenDB. We ended up choosing the HNSW algorithm as the basis for our vector indexing. This is a common choice; most databases with vector indexing use HNSW. PostgreSQL, MongoDB, Redis, and Elasticsearch all use HNSW.

Microsoft’s choice to use DiskANN isn’t surprising (DiskANN was conceived at Microsoft, after all). I also assume that Microsoft has sufficient resources and time to do a good job of the implementation. So I was really excited to see what kind of behavior the new SQL Server has here.

RavenDB’s choice of HNSW for vector search ended up being pretty simple: of all the algorithms we considered, it was the only one that met our requirements. These requirements are straightforward: Vector Search should function like any other index in the system. You define it, it runs, your queries are fast. You modify the data, the index is updated, your queries are still fast.

I don’t think this is too much to ask :-), but it turned out to be pretty complicated when we looked at vector search indexes.
Most vector indexing solutions have limitations, such as requiring all the data upfront (ANNOY, SCANN) or degrading over time as the data is modified (IVF Flat, LSH). HNSW, on the other hand, builds incrementally and operates efficiently on inserted, updated, and deleted data without significant maintenance.

It was therefore interesting to examine DiskANN’s behavior in SQL Server, as it’s a rare instance of a world-class algorithm, available from the source, that I can start looking at. I must say I’m not impressed. I’m not talking about the actual implementation, but rather the choices that were made for this feature in general. As someone who has explored this topic deeply and understands its complexities, I believe that using vector indexes in SQL Server 2025, as it currently stands, will be a significant hassle, suitable only for a small set of scenarios.

I tested the preview using this small Wikipedia dataset, which has just under 500,000 vectors and less than 2GB of data, a tiny dataset for vector search. On a Docker instance with 12 cores and 32GB RAM, SQL Server took about two and a half hours to create the index! In contrast, RavenDB will index the same dataset in under two minutes. I might have misconfigured SQL Server or encountered some licensing constraint affecting performance, but the difference between 2 minutes and 150 minutes is remarkable. I’m willing to let that one go, assuming I did something wrong with the SQL Server setup.

Another crucial aspect is that creating a vector index in SQL Server has other implications. Most notably, the source table becomes read-only and is fully locked during the (very long) indexing period. This makes working with vector indexes on frequently updated data challenging to impossible.
You would need to copy the data every few hours, run the (time-consuming) indexing, and then switch which table you query against. That is a significant inconvenience. Frankly, it seems suitable only for static or rarely updated data, for example, documentation that is updated every few months. It’s not a good solution for applying vector search to dynamic data like a support forum with a continuous stream of questions and answers.

I believe the design of SQL Server’s vector search reflects a paradigm where all the data is available upfront, as discussed in the research papers. DiskANN itself is immutable once created. There is a related algorithm, FreshDiskANN, which can handle updates, but that isn’t what SQL Server has at this point. The problem is that this choice of algorithm is not operationally transparent to users. It will have serious consequences for anyone trying to use this for anything but frozen datasets.

In short, even disregarding the indexing time difference, the ability to work with live data and incrementally add vectors to the index makes me very glad we chose HNSW for RavenDB. The entire problem simply doesn’t exist for us.
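The rebuild-and-swap workflow that an immutable index forces on you can be sketched in a few lines. This is a generic illustration with hypothetical names (the actual SQL Server DDL is not shown): readers always query the currently active index, and a background rebuild atomically swaps a fresh one in.

```python
import threading

class ImmutableIndex:
    """Stand-in for an index that must be built from all data upfront."""
    def __init__(self, rows):
        self.rows = list(rows)          # the expensive full build happens here
    def search(self, query):
        return [r for r in self.rows if query in r]

class SwappableSearch:
    """Readers query the active index; a rebuild swaps a fresh one in atomically."""
    def __init__(self, rows):
        self._lock = threading.Lock()
        self._active = ImmutableIndex(rows)
    def search(self, query):
        with self._lock:
            index = self._active        # grab the current snapshot
        return index.search(query)
    def rebuild(self, rows):
        fresh = ImmutableIndex(rows)    # long build runs off to the side
        with self._lock:
            self._active = fresh        # switch which "table" readers hit
```

Every write between two rebuilds is invisible to queries until the next swap, which is exactly the staleness problem described above.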
by Gérald Barré
posted on: May 26, 2025
The C# compiler is smart enough to understand some constructs and generate optimized code. However, C# gives you multiple ways to express the same intent, and the compiler is not always smart enough to optimize every single one. In this post, I will show you how to use pattern matching syntax to impr
by Andrew Lock
posted on: May 20, 2025
In this post I show how you can push a whole stack of branches with a single command using a Git alias: git push-stack…
by Gérald Barré
posted on: May 19, 2025
When creating a new virtual machine in Hyper-V with Ubuntu Server with all the default settings, the partition will have 60GB of space instead of 127GB (default size of the virtual hard disk). Maybe there is a setting somewhere that I missed during the setup, but I couldn't find it. The solution is
by Shyam Namboodiripad
posted on: May 15, 2025
Announcing content safety evaluations and other improvements in the Microsoft.Extensions.AI.Evaluation libraries.
by Oren Eini
posted on: May 14, 2025
RavenDB 7.0 introduces vector search using the Hierarchical Navigable Small World (HNSW) algorithm, a graph-based approach for Approximate Nearest Neighbor search. HNSW enables efficient querying of large vector datasets, but requires random access to the entire graph, making it memory-intensive.

HNSW’s random read and insert operations demand in-memory processing. For example, inserting 100 new items into a graph of 1 million vectors scatters them randomly, with no efficient way to sort or optimize the access patterns. That is in dramatic contrast to the usual algorithms we can employ (a B+Tree, for example), which can greatly benefit from such optimizations.

Let’s assume that we deal with vectors of 768 dimensions, or 3KB each. If we have 30 million vectors, just the vector storage is going to take 90GB of memory. Inserts into the graph require us to do effectively random reads (with no good way to tell in advance what those would be). Without sufficient memory, performance degrades significantly due to disk swapping. The rule of thumb for HNSW is that you want at least 30% more memory than the vectors themselves take. For 90GB of vectors, the minimum required memory is going to be 128GB.

Cost reduction

You can also use quantization (shrinking the size of the vector at some loss of accuracy). Going to binary quantization for the same dataset requires just 3GB of space to store the vectors, but there is a loss of accuracy (perhaps around 20%).

We tested RavenDB’s HNSW implementation on a 32GB machine with a 120GB vector index (and no quantization), simulating a scenario with four times more data than available memory. This is an utterly invalid scenario, mind you. For that amount of data, you are expected to build the HNSW index on a machine with 192GB. But we wanted to see how far we could stress the system and what kind of results we would get. The initial implementation stalled due to excessive memory use and disk swapping, rendering it impractical.
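The sizing arithmetic above is easy to verify, assuming float32 components (768 dimensions × 4 bytes = 3KB per vector):

```python
DIMS = 768
BYTES_PER_COMPONENT = 4                  # float32
VECTORS = 30_000_000

vector_size = DIMS * BYTES_PER_COMPONENT            # 3,072 bytes = 3KB per vector
raw_storage_gb = VECTORS * vector_size / 1024**3    # ~86 GiB (~92GB decimal), i.e. the ~90GB figure
headroom_gb = raw_storage_gb * 1.30                 # +30% rule of thumb -> ~112 GiB,
                                                    # so 128GB is the practical minimum RAM size

# Binary quantization: one bit per dimension instead of a 4-byte float.
quantized_size = DIMS // 8                          # 96 bytes per vector
quantized_gb = VECTORS * quantized_size / 1024**3   # ~2.7 GiB, i.e. the ~3GB figure
```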
Basically, it ended up doing a page fault on each and every operation, and that stalled forward progress completely.

Optimization Approach

Optimizing for such an extreme scenario seemed futile; this is an invalid scenario almost by definition. But the idea is that improving this out-of-line scenario will also end up improving the performance of more reasonable setups. When we analyzed the costs of HNSW in a severely memory-constrained environment, we found two primary costs:

- Distance computations: comparing vectors (e.g., cosine similarity) is computationally expensive.
- Random vector access: loading vectors from disk in a random pattern is slow when memory is insufficient.

Distance computation does math on two 3KB vectors, and on a large graph (tens of millions of vectors), you’ll typically need to run between 500 and 1,500 distance comparisons per insert. To give some context, adding an item to a B+Tree of the same size takes fewer than twenty comparisons (and highly localized ones, at that). That means reading about 2MB of data per insert on average. Even if everything is in memory, you are going to pay a significant cost here in CPU cycles. If the data does not reside in memory, you have to fetch it (and it isn’t as neat as having a single 2MB range to read; it is scattered all over the place, and you need to traverse the graph in order to find what you need to read).

To address this issue, we completely restructured how we insert nodes into the graph. We avoid serial execution and instead spawn multiple insert processes at the same time. Interestingly enough, we are still single-threaded in this regard. We extract from the process the parts where it does the distance computation and loads the vectors. Each process runs the algorithm until it reaches the stage where it needs to run distance computation on some vectors. At that point, it yields to another process and lets it run. We keep doing this until we have no more runnable processes to execute.
We can then scan the list of nodes that we need to process (run distance computation on all their edges), and:

- Gather all the vectors needed for graph traversal.
- Preload them from disk efficiently.
- Run distance computations across multiple threads simultaneously.

The idea here is that we save time by preloading data efficiently. Once the data is loaded, we throw all the distance computations per node to the thread pool. As soon as any of those computations are done, we resume the stalled process waiting for them. At any given point in time, we have processes moving forward or we are preloading from the disk, while background threads run to compute distances. This allows us to make maximum use of the hardware.

Here is what this looks like midway through the process: as you can see, we still have to go to the disk a lot (no way around that with a working set of 120GB on a 32GB machine), but we are able to use that capacity efficiently. We always have forward progress instead of just waiting.

Don’t try this on the cloud - most cloud instances have a limited amount of IOPS to use, and this approach will burn through any amount you have quickly. We are talking about roughly 15K - 20K IOPS on a sustained basis. This is meant for testing in adverse conditions, on hardware that is utterly unsuitable for the task at hand. A machine with the proper amount of memory will not have to deal with this issue.

While still slow on a 32GB machine due to the reliance on disk, this approach completed indexing 120GB of vectors in about 38 hours (an average rate of 255 vectors/sec). For comparison, on a 64GB machine we completed the indexing process in just over 14 hours (an average rate of 694 vectors/sec).

Accuracy of the results

When inserting many nodes into the graph in parallel, we risk a loss of accuracy. When we insert them one at a time, nodes that are close together will naturally be linked to one another.
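The yield-and-batch idea described above can be sketched with Python generators. This is a hypothetical miniature, not RavenDB’s actual Hnsw.Parallel code: each insert runs until it needs vector distances, yields the IDs it needs, and is resumed once a single batched load and a thread-pool computation complete.

```python
from concurrent.futures import ThreadPoolExecutor

def insert_process(node, neighbour_ids):
    """One in-flight insert: runs until it needs distances, then yields the
    vector IDs so the scheduler can batch-load and compute them."""
    distances = yield list(neighbour_ids)    # suspended until resumed with results
    node["edges"] = sorted(distances)[:2]    # toy pruning: keep the 2 closest

def run_batched(processes, load_vectors, distance):
    """Drive all insert processes: collect the IDs they need, preload once,
    compute distances on a thread pool, then resume each stalled process."""
    pending = []
    for proc in processes:
        pending.append((proc, next(proc)))   # run each until its first yield
    # One sorted, de-duplicated preload instead of scattered random reads.
    wanted = sorted({vid for _, ids in pending for vid in ids})
    vectors = load_vectors(wanted)
    with ThreadPoolExecutor() as pool:
        for proc, ids in pending:
            dists = list(pool.map(lambda vid: distance(vectors[vid]), ids))
            try:
                proc.send(dists)             # resume the stalled process
            except StopIteration:
                pass                         # this insert has finished
```

The real implementation interleaves loading and computation continuously rather than in a single round, but the core trick is the same: suspend the traversal at the expensive step so I/O can be batched.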
But when running them in parallel, we may “miss” those relations because the two nodes aren’t yet discoverable. To mitigate that scenario, we preemptively link all the in-flight nodes to each other, then run the rest of the algorithm. If they are not close to one another, those edges will be pruned. It turns out that this behavior actually increased the overall accuracy (by about 1%) compared to the previous behavior. This is likely because items that are added at the same time are naturally linked to one another.

Results

On smaller datasets (“merely” hundreds of thousands of vectors) that fit in memory, the optimizations reduced indexing time by 44%! That is mostly because we now operate in parallel and have more optimized I/O behavior. We are quite a bit faster, we are (slightly) more accurate, and we also have reasonable behavior when we exceed the system limits.

What about binary quantization?

I mentioned that binary quantization can massively reduce the amount of space required for vector search. We also ran tests on the same dataset using binary quantization. The total size of all the vectors was around 3GB, and the total index size (including all the graph nodes, etc.) was around 18GB. That took under 5 hours to index on the same 32GB machine.

If you want to know exactly what we are doing, take a look at the Hnsw.Parallel file inside the repository. The implementation is quite involved, but I’m really proud of how it turned out in the end.
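Binary quantization itself is simple to illustrate. The sketch below is a generic version of the technique, not RavenDB’s implementation: keep only the sign bit of each component, so a 768-dimension float32 vector (3KB) shrinks to 768 bits (96 bytes), and compare quantized vectors with Hamming distance instead of cosine similarity.

```python
def quantize(vector):
    # Keep only the sign of each component: one bit per dimension.
    bits = 0
    for i, component in enumerate(vector):
        if component > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    # Distance between quantized vectors: the count of differing bits.
    # This is the cheap stand-in for cosine similarity on full vectors.
    return bin(a ^ b).count("1")
```

The accuracy loss mentioned above comes from exactly this compression: two components of 0.01 and 0.99 both quantize to the same bit.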
by Tara Overfield
posted on: May 13, 2025
A recap of the latest servicing updates for .NET and .NET Framework for May 2025.
by .NET Team
posted on: May 13, 2025
Find out about the new features in .NET 10 Preview 4 across the .NET runtime, SDK, libraries, ASP.NET Core, Blazor, C#, .NET MAUI, and more!
by Andrew Lock
posted on: May 13, 2025
In this post I use Microsoft's new .NET AI template to ingest the contents of a website and create a chatbot that can answer questions with citations…