The International Components for Unicode (ICU) is a set of libraries that provide Unicode and internationalization support for software applications. Unicode is a standardized encoding system that represents almost all of the written languages of the world. It is used to represent characters in com
The recording of my webinar showing off the new Sharding feature in RavenDB 6.0 is now live. I’m showcasing the new technology preview of RavenDB 6.0 and we have a nightly build already available for it. I think the webinar was really good, and I’m super excited to discuss all the ins and outs of how this works. Please take a look, and take the software for a spin. We have managed to get sharding down to a simple, obvious and clear process, and we are very much looking forward to your feedback.
Sometimes, you need to hold on to data that you really don’t want to have access to. A great example may be that your user will provide you with their theme color preference. I’m sure you can appreciate the sensitivity of preferring a light or dark theme when working in the IDE.
At any rate, you find yourself in an interesting situation: you have a piece of data that you don’t want to know about. In other words, the threat model we have to work with is that we protect the data from a malicious administrator. This may seem to be a far-fetched scenario, but just today I was informed that my email was inside the 200M users leak from Twitter. Having an additional safeguard ensures that even if someone manages to lay their hands on your database, there is little that they can do with it.
RavenDB supports Transparent Data Encryption. In other words, the data is encrypted on disk and will only be decrypted while there is an active transaction looking at it. That is a server-side operation, and there is a single key (not actually true, but close enough) that is used for all the data in the database. For this scenario, that is not good enough. We need to use a different key for each user. And even if we have all the data and the server’s encryption key, we should still not be able to read the sensitive data.
How can we make this happen? The idea is that we want to encrypt the data on the client, with the client’s own key, which is never sent to the server. What the server sees is basically an encrypted blob. The question is how we can make it work as easily as possible. Let’s look at the API that we use to get it working:
public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
    public Encrypted<string> PreferredTheme { get; set; }
}

[JsonConverter(typeof(EncryptedJsonConverter))]
public class Encrypted<T> : EncryptedJsonConverter.IShouldEncrypt
{
    public Encrypted() { }
    public Encrypted(T v) { Value = v; }
    public T Value { get; set; }
    object EncryptedJsonConverter.IShouldEncrypt.GetValue() => Value;
}
As you can see, we indicate that the value is encrypted using the Encrypted<T> wrapper. That class is a very simple wrapper, with all the magic actually happening in the assigned JSON converter. Before we look into how that works, allow me to show you how this document looks to RavenDB:
{
    "Name": "Oren Eini (Ayende Rahien)",
    "Email": "ayende@ayende.com",
    "PreferredTheme": {
        "Tag": "yqyhaf08pbZz7tdS4xiTpA==",
        "Data": "Q5Txbs3I/SS/Q7vcNQHZIKyR++2tIhJWlS2AIXLXMME=",
        "Nonce": "a6+QoR534F7T68Aq"
    },
    "@metadata": {
        "@collection": "Users"
    }
}
As you can see, we don’t actually store the data as is. Instead, we have an object that stores the encrypted data as well as the authentication tag. The above document was generated from the following code:
using (var session = store.OpenSession())
{
    session.Store(new User
    {
        Email = "ayende@ayende.com",
        Name = "Oren Eini (Ayende Rahien)",
        PreferredTheme = new Encrypted<string> { Value = "Dark" }
    }, "users/1");
    session.SaveChanges();
}
The JSON document holds the data we have, but without knowing the key, we do not know what the encrypted value is. The actual encrypted value is composed of three separate (and quite important) fields:
Tag – the authentication tag that ensures that the value we decrypt is indeed the value that was encrypted
Data – this is the actual encrypted value. Note that the size of the value is far larger than the value we actually encrypted. We do that to avoid leaking the size of the value.
Nonce – a value that ensures that even if we encrypt similar values, we won’t end up with an identical output. I talk about this at length here.
Just storing the data in the database is usually not sufficient, mind. Sure, with what we have right now, we can store and read the data back, safe from data leaks on the server side. However, we have another issue, we want to be able to query the data.
In other words, the question is how, without telling the database server what the value is, can we query for matching values? The answer is that we need to provide a value during the query that would match the value we stored. That is typically fairly obvious & easy. But it runs into a problem when we have cryptography. Since we are using a Nonce, it means that each time we’ll encrypt the value, we’ll get a different encrypted value. How can we then query for the value?
The answer to that is something called DAE (deterministic authenticated encryption). Here is how it works: instead of generating the nonce using random values and ensuring that it is never repeated, we’ll go the other way and generate the nonce in a deterministic manner, by effectively taking a keyed hash of the data we are about to encrypt. That ensures that we’ll get a unique nonce for each unique value we encrypt. It also means that for the same value (and the same key), we’ll get the same encrypted output, which means that we can then query for it.
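To make the idea concrete, here is a minimal sketch that uses only the .NET base class library. It is not the code used later in this post: it swaps in HMAC-SHA256 (truncated to 12 bytes) for the nonce, and the class name DeterministicNonceSketch and its parameters are made up for illustration. The point is just to show that a nonce derived from the plaintext gives identical output for identical input.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only, not part of RavenDB.
public static class DeterministicNonceSketch
{
    // nonceKey and encKey are two independent 32-byte keys supplied by the caller.
    public static (byte[] Data, byte[] Tag, byte[] Nonce) Encrypt(
        byte[] nonceKey, byte[] encKey, string value)
    {
        byte[] plain = Encoding.UTF8.GetBytes(value);

        // Derive the nonce from the plaintext instead of from a random source.
        // AES-GCM expects a 12-byte nonce, so we keep the first 12 bytes of the MAC.
        using var hmac = new HMACSHA256(nonceKey);
        byte[] nonce = hmac.ComputeHash(plain)[..12];

        byte[] data = new byte[plain.Length];
        byte[] tag = new byte[16];
        using var aes = new AesGcm(encKey);
        aes.Encrypt(nonce, plain, data, tag);
        return (data, tag, nonce);
    }
}
```

Calling Encrypt twice with the same keys and the value "Dark" should produce the same Data, Tag and Nonce both times, while "Light" should produce a different nonce and ciphertext. That equality is exactly what makes the stored value queryable, and it is also the trade-off: anyone holding the database can tell which documents share a value, just not what the value is.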
Here is an example of how we can use this from RavenDB:
// Generated query:
// from 'Users' where PreferredTheme = $p0
// {"p0":{"Tag":"yqyhaf08pbZz7tdS4xiTpA==","Data":"Q5Txbs3I/SS/Q7vcNQHZIKyR++2tIhJWlS2AIXLXMME=","Nonce":"a6+QoR534F7T68Aq"}}
using (var session = store.OpenSession())
{
    var users = session.Query<User>()
        .Where(x => x.PreferredTheme == new Encrypted<string> { Value = "Dark" })
        .ToList();
    foreach (var user in users)
    {
        Console.WriteLine(user.PreferredTheme.Value);
    }
}
And with that explanation out of the way, let’s see the wiring we need to make this happen. Here is the JsonConverter implementation that makes this possible:
public class EncryptedJsonConverter : JsonConverter
{
    public interface IShouldEncrypt
    {
        object GetValue();
    }

    public override bool CanConvert(Type objectType)
    {
        return objectType.IsGenericType &&
            objectType.GetGenericTypeDefinition() == typeof(Encrypted<>);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        var data = JObject.Load(reader);
        var tag = Convert.FromBase64String(data.Value<string>("Tag"));
        var encrypted = Convert.FromBase64String(data.Value<string>("Data"));
        var nonce = Convert.FromBase64String(data.Value<string>("Nonce"));
        var plain = DeterministicEncryption.Decrypt(encrypted, tag, nonce);
        var ms = new MemoryStream(plain);
        var br = new BinaryReader(ms);
        var str = br.ReadString();
        var val = JsonConvert.DeserializeObject(str, objectType.GetGenericArguments()[0]);
        return Activator.CreateInstance(objectType, val);
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        var se = (IShouldEncrypt)value;
        var plainText = JsonConvert.SerializeObject(se.GetValue());
        var ms = new MemoryStream();
        var bw = new BinaryWriter(ms);
        bw.Write(plainText);
        // pad to 32 bytes boundary
        ms.SetLength(ms.Length + (32 - ms.Length % 32));
        var bytes = ms.ToArray();
        var (encrypted, tag, nonce) = DeterministicEncryption.Encrypt(bytes);
        writer.WriteStartObject();
        writer.WritePropertyName("Tag");
        writer.WriteValue(tag);
        writer.WritePropertyName("Data");
        writer.WriteValue(encrypted);
        writer.WritePropertyName("Nonce");
        writer.WriteValue(nonce);
        writer.WriteEndObject();
    }
}
There is quite a lot that is going on here. This is a JsonConverter, which translates the in-memory data to what is actually sent over the wire for RavenDB.
On read, there isn’t much going on: we pull the individual fields from the JSON and pass them to the DeterministicEncryption class, which we’ll look at shortly. We get the plain text back, read the JSON we previously stored, and translate that back into a .NET object.
On write, things are slightly more interesting. We convert the object to a string, and then we write that to an in-memory stream. We ensure that the stream is always padded to a 32-byte boundary (to avoid leaking the size). Without that step, you could distinguish between “Dark” and “Light” theme users simply based on the length of the encrypted value. We pass the data to the DeterministicEncryption class for actual encryption and build the encrypted value. I chose to use a complex object, but we could also put this into a single field just as easily.
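The padding step is easy to check in isolation. The following sketch pulls it out into a hypothetical PaddingSketch class (the class and method names are mine, not part of the converter) that writes a length-prefixed string the way BinaryWriter does and then pads it with the same formula as WriteJson.

```csharp
using System;
using System.IO;

// Hypothetical helper that isolates the padding logic for illustration.
public static class PaddingSketch
{
    // Writes a length-prefixed string, then zero-pads the buffer up to the
    // next multiple of blockSize. A buffer that is already aligned grows by
    // a whole extra block, exactly as the formula in WriteJson does.
    public static byte[] Pad(string text, int blockSize = 32)
    {
        using var ms = new MemoryStream();
        using var bw = new BinaryWriter(ms);
        bw.Write(text);
        bw.Flush();
        ms.SetLength(ms.Length + (blockSize - ms.Length % blockSize));
        return ms.ToArray();
    }

    // ReadString consumes only the length prefix and the string bytes,
    // so the trailing zero padding is ignored when reading back.
    public static string Unpad(byte[] padded)
    {
        using var br = new BinaryReader(new MemoryStream(padded));
        return br.ReadString();
    }
}
```

With this, Pad("Dark") and Pad("Light") both come out at 32 bytes, and Unpad returns the original string because the length prefix tells the reader where the value ends. Values longer than one block still leak which 32-byte bucket they fall into, so the block size should be picked with the expected data in mind.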
With that in place, the last thing to understand is how we perform the actual encryption:
public static class DeterministicEncryption
{
    public static Func<byte[]> GetCurrentKey;

    public static (byte[] Encrypted, byte[] Tag, byte[] Nonce) Encrypt(byte[] bytes)
    {
        var (sivKey, encKey) = DeriveKeys();
        using var aes = new AesGcm(encKey);
        using var hmac = Blake2b.CreateHMAC(12, sivKey);
        var nonce = hmac.ComputeHash(bytes);
        var tag = new byte[16];
        var encrypted = new byte[bytes.Length];
        aes.Encrypt(nonce, bytes, encrypted, tag);
        return (encrypted, tag, nonce);
    }

    public static byte[] Decrypt(byte[] encrypted, byte[] tag, byte[] nonce)
    {
        var (_, encKey) = DeriveKeys();
        using var aes = new AesGcm(encKey);
        var plain = new byte[encrypted.Length];
        aes.Decrypt(nonce, encrypted, tag, plain);
        return plain;
    }

    private static (byte[] SivKey, byte[] EncKey) DeriveKeys()
    {
        var derivedKey = Blake2b.ComputeHash(64, GetCurrentKey()); // from Blake2Fast
        return (derivedKey[0..32], derivedKey[32..64]);
    }
}
There is actually very little code here, which is pretty great. The first thing to note is GetCurrentKey, a delegate that you need to provide to find the current key. You can have a global key for the entire application, a key for the current user, etc. This key isn’t the actual encryption key, however. In the DeriveKeys function, we use the Blake2b algorithm to turn that 32-byte key into a 64-byte value, which we then split into two 32-byte keys. The idea is to separate the domains: one key is used for computing the SIV and the other for the actual encryption.
We use HMAC-Blake2b using the SIV key to compute the nonce of the value in a deterministic manner and then perform the actual encryption. For decryption, we go in reverse, but we don’t need to derive a SIV, obviously.
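If you would rather not take a dependency on Blake2Fast for the key split, the same domain separation can be sketched with the HKDF class that ships with .NET 5 and later. This is a swapped-in technique, not what the code above does, and the KeySplitSketch name and the info label are assumptions made up for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical alternative to DeriveKeys, using HKDF-SHA512 from the BCL.
public static class KeySplitSketch
{
    public static (byte[] SivKey, byte[] EncKey) Derive(byte[] masterKey)
    {
        // The info label is an arbitrary, made-up context string. It binds the
        // derived keys to this particular use of the master key.
        byte[] info = Encoding.UTF8.GetBytes("client-side-field-encryption");

        // Expand the master key into 64 bytes of key material...
        byte[] okm = HKDF.DeriveKey(HashAlgorithmName.SHA512, masterKey, 64, salt: null, info: info);

        // ...and split it into one key for the nonce (SIV) and one for encryption.
        return (okm[..32], okm[32..]);
    }
}
```

Either way, the property that matters is the same: the key used to compute the nonce and the key used for AES-GCM are different values derived from one secret, so the two uses cannot interfere with each other.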
With this in place, we have about 100 lines of code that add the ability to store client-side encrypted values and query them. Pretty neat, even if I say so myself.
Note that we can store the encrypted value inside of RavenDB, which the database has no way of piercing, retrieve those values back, and query them for equality. Other querying capabilities, such as range or prefix scans, are far more complex and tend to come with security implications that weaken the guarantees you can provide.
If you have your own web sites or apps that you maintain, it's helpful to know when they're not working. One tool I've been using for a long…
This Wednesday I’m going to be doing a webinar about RavenDB & Sharding. This is going to be the flagship feature for RavenDB 6.0 and I’m really excited to be talking about it in public finally.
Sharding involves splitting your data across multiple nodes, similar to having different volumes of a single encyclopedia.
RavenDB’s sharding implementation is something that we have spent the past three or four years working on. That has been quite a saga to get it out. The primary issue is that we want to achieve two competing goals:
Allow you to scale the amount of data you have to near infinite levels.
Ensure that RavenDB remains simple to use and operate.
The first goal is actually fairly easy and straightforward. It is the second part that made things complicated. After a lot of work, I believe that we have a really good solution at hand.
In the webinar, I’m going to be presenting how RavenDB 6.0 implements sharding, the behavior of the system at scale, and all the details you need to know about how it works under the cover.
I’m really excited to finally be able to show off the great work of the team! Join me, it’s going to be really interesting.
I've already talked about preventing breaking changes in the post I fixed a bug. What should I do now?. But things evolve, and now the .NET SDK provides new features to help NuGet package authors to detect breaking changes between two versions of a NuGet package when building a new version. Starting
I posted this code previously:
using System;

Iterator? it = new Iterator();
while (it.Value.MoveNext())
{
    Console.WriteLine(it.Value.Current);
}

public struct Iterator
{
    public int Current;
    public bool MoveNext()
    {
        Current++;
        return Current < 10;
    }
}
And asked what it prints. This is actually an infinite loop that will print an endless amount of zeros to the console. The question is why.
The answer is that we are running into two separate features of C# that interact with each other in a surprising way.
The issue is that we are using a nullable iterator here, and accessing the struct using the Value property. The problem is that Iterator is a struct, and reading it through a property returns a copy rather than the stored instance.
In effect, the code that actually runs is:
Iterator? it = new Iterator();
while (true)
{
    if (it.HasValue == false)
        break;
    Iterator localIt = it.Value;
    if (localIt.MoveNext() == false)
        break;
    Iterator localIt2 = it.Value;
    Console.WriteLine(localIt2.Current);
}
Written this way, the temporary copies are easy to see. Because Iterator is a value type, each access to it.Value yields a fresh copy, so MoveNext() increments a temporary that is immediately discarded and the Current of the original stays at 0.
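One way to get the behavior the original snippet seems to intend is to copy the struct out of the nullable once and then mutate that single local. The sketch below is my own illustration; the NullableCopySketch class and its method names are hypothetical, and the Iterator struct is repeated so the block is self-contained.

```csharp
using System;
using System.Collections.Generic;

public struct Iterator
{
    public int Current;
    public bool MoveNext()
    {
        Current++;
        return Current < 10;
    }
}

// Hypothetical helper class for illustration.
public static class NullableCopySketch
{
    // Copies the struct out of the nullable exactly once, then advances
    // that one local copy until MoveNext() reports the end.
    public static List<int> Drain(Iterator? maybe)
    {
        var seen = new List<int>();
        if (maybe.HasValue == false)
            return seen;

        Iterator local = maybe.Value;
        while (local.MoveNext())
        {
            seen.Add(local.Current);
        }
        return seen;
    }

    // Shows the original problem: the increment lands on a temporary copy,
    // so the value stored inside the nullable is left unchanged.
    public static int CurrentAfterMoveNextOnValue()
    {
        Iterator? it = new Iterator();
        it.Value.MoveNext();
        return it.Value.Current;
    }
}
```

Drain(new Iterator()) collects the values 1 through 9 and then stops, while CurrentAfterMoveNextOnValue() returns 0, which is why the original loop never terminates.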
Given the following code:
using System;

Iterator? it = new Iterator();
while (it.Value.MoveNext())
{
    Console.WriteLine(it.Value.Current);
}

public struct Iterator
{
    public int Current;
    public bool MoveNext()
    {
        Current++;
        return Current < 10;
    }
}
Can you guess what it will do? Can you explain why? I love that this snippet is under 20 lines of code, but being able to explain it shows a lot more knowledge about C# than you would expect.