Date range queries can be quite expensive for RavenDB. Consider the following query:
from index 'Users/Search' where search(DisplayName, "Oren") and CreationDate between "2008-10-13T07:18:01.623" and "2018-10-13T07:18:01.623" include timings()
The root issue is that we have a compound query here, we use full text search on the left but then need to match it on the right. The way Lucene works, we have to compute the set of all the documents that match the date range. If we have a lot of documents in that range, we have to scan through a lot of values here.
We spent a lot of time and effort optimizing date queries in RavenDB. Such issues also impacted heavily the design of our next-gen indexing capabilities (but more on that when it matures enough to discuss).
One of the primary design principles of RavenDB is that it learns from previous usage, and we realized that date ranges in queries are likely to repeat often. So we take advantage of that. The details are a bit complex and require that you’ll understand how Lucene stores its data in immutable segments. We are able to analyze queries on repeating date ranges and remember them, so next time we use the same type of date range, we’ll already have the set of matching documents ready.
That feature was deployed to address a specific customer scenario, where they do a lot of wide date range queries and it had a big impact on that.
Last week we ran into some funny metrics for a completely different customer, with a very different scenario. You can probably tell at what point they moved to the updated version of RavenDB and were able to take advantage of this feature:
The really nice thing about this, from my perspective, is that none of us even considered the impact that feature would have for this scenario. They upgraded to the latest version to get access to the new features, and this is just sitting in the background, pushing their CPU utilization to near zero.
That’s the kind of news that I love to get.
Sometimes you want to execute a step or a job only when some files are modified. For instance, you may want to lint and deploy the documentation only when a file under docs is modified. This scenario is not directly supported by GitHub Actions. However, you can do it manually by using git, PowerShe
This series has been going on quite a bit longer than I intended it to be. Barring any new developments or questions, I think that this will be the last post I’ll write on the topic.
In a previous post, I implemented authenticated encryption, ensuring that we’ll be able to clearly detect if the ciphertext we got has been modified along the way. That is relevant because we have to think about malicious actors but we also have to consider things like bit flips, random hardware errors, etc.
In most crypto systems, we have to pass some metadata about the encrypted messages that we pass. Right now, this is strictly outside our scope, but it turns out that there is a compelling reason to consider that plain text data as well. For example, let’s say that I’m sending a number of messages. I have to include the message length and its position in the set of messages I sent in the clear. Otherwise, the receiver might not be able to make sense of that. When we need to decrypt the message, we want to include that additional information (which wasn’t encrypted) as well.
The reason for that is simple, we want to ensure that the other data hasn’t been modified using the same cryptographic tools we already have. It turns out that this is quite simple, check out the code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
pub fn encryptSivWithNonce(key: [32]u8, nonceKey: []u8, additionalData: []u8, plainText: []const u8, cipherText: []u8) void {
var nonce: [HmacMd5.mac_length]u8 = undefined;
HmacMd5.create(&nonce, plainText, nonceKey);
var mca = MyCryptoAlgo.initWithNonce(key, nonce);
mca.encryptBlock(plainText, cipherText);
// additional data hashing
mca.macGen.update(additionalData);
mca.finalize(cipherText[plainText.len..]);
}
pub fn decrypt(key: [32]u8, additionalData: []u8, cipherText: []u8, plainText: []u8) !void {
if (cipherText.len < ExtraLen)
return error.CihpherTextBufferTooSmall;
if (cipherText.len - ExtraLen > plainText.len)
return error.PlainTextBufferTooSmall;
var nonce: [HmacMd5.mac_length]u8 = undefined;
std.mem.copy(u8, &nonce, cipherText[cipherText.len - ExtraLen .. cipherText.len - NonceLen]);
var dec = MyCryptoAlgo.initWithNonce(key, nonce);
var msgText = cipherText[0 .. cipherText.len - ExtraLen];
var expectedMac: [16]u8 = undefined;
std.mem.copy(u8, &expectedMac, cipherText[cipherText.len - NonceLen .. cipherText.len]);
var myMac: [16]u8 = undefined;
dec.macGen.update(msgText);
dec.macGen.update(additionalData);
dec.macGen.final(&myMac);
if (std.crypto.utils.timingSafeEql([16]u8, expectedMac, myMac) == false) {
return error.InvalidMac;
}
dec.xorWithKeyStream(msgText, plainText);
}
view raw
aead.zig
hosted with ❤ by GitHub
We added a parameter for the encryptSivWithNonce() and decrypt() functions that has the buffer of all the associated data for this message. And all we need to do is add that to the MAC computation as well. On the decrypt(), we do the exact same. We compute the hash from the encrypted text and the additional data and abort if they aren’t an exact match.
And with this in place, we implemented a (probably very bad) encryption system from a single primitive (MD5) and brought it to roughly modern standards of AEAD encryption (Authenticated Encryption, Additional Data).
I want to emphasize that this entire series is meant primarily to go over the details of how you build and use an encryption system, not to actually build a real one. I didn’t do any analysis on how secure such a system would be, and I wouldn’t want to trust this with anything beyond toying around.
If you have any references on similar systems, I would be very happy to learn about that, I doubt that I’m the first person who tried to build stream cipher from MD5, after all.
I mentioned in a previous post that nonce reuse is a serious problem. It is enough to reuse a nonce even once and you can have catastrophic results at your hands. The problem occurs, interestingly enough, when we are able to capture two separate messages generated with the same key & nonce. If we capture the same message twice, that is not an issue (we can XOR the values, they will be all zeroes).
The question is whether there is something that can be done about this. The answer to that is yes, we can create a construction that would be safe from nonce reuse. This is called SIV mode (Syntactic IV).
The way to do that is to make the nonce itself depend on the value that we are encrypting. Take a look at the following code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
pub fn encryptSiv(key: [32]u8, plainText: []const u8, cipherText: []u8) void {
var nonceKey: [HmacMd5.mac_length]u8 = undefined;
std.crypto.random.bytes(&nonceKey);
encryptSivWithNonce(key, &nonceKey, plainText, cipherText);
}
pub fn encryptSivWithNonce(key: [32]u8, nonceKey: []u8, plainText: []const u8, cipherText: []u8) void {
var nonce: [HmacMd5.mac_length]u8 = undefined;
HmacMd5.create(&nonce, plainText, nonceKey);
var mca = MyCryptoAlgo.initWithNonce(key, nonce);
mca.encryptBlock(plainText, cipherText);
mca.finalize(cipherText[plainText.len..]);
}
view raw
encryptSiv.zig
hosted with ❤ by GitHub
The idea is that we get a nonce, as usual, but instead of relying on the nonce, we’ll compute a hash of the plain text using the nonce as a key. Aside from that, the rest of the system behaves as usual. In particular, there are no changes to the decryption.
Let’s see what the results are, shall we?
With a separate nonce per use:
attack at dawn!
838E1CE1A64D97E114237DE161A544DA5030FC5ECAB1C20D34AF838634D1C591AE208FC0AEE706690669E9F56F45C1
attack at dusk!
EEA7DE8A51A06FE6CA9374CDDEC1053249F8B1F0BF1995A0EEEE7D6EBF68868ECAE7CBEFD6EE23017480ACD494D634
Now, what happens when we use the same nonce?
attack at dusk!
0442EFA977919327C92B47C7F6A0CD617AE4FD3138DF07D45994EBC2C4B709ACDE1130422924B7206354D03569FDAA
attack at dawn!
324A996C22F7FFDE62596C0E9EE37D7EE1F89569A10A1188BA4A03EE7B8C47DF347A20D1B73EB4523D3511F2F46FF2
As you can see, the values are completely different, even though we used the same key and nonce and the plain text is mostly the same.
Because we are generating the actual nonce we use from the hash of the input, reusing the nonce with different data will result in a wildly different nonce being actually used. We are safe from the catastrophe that is nonce reuse.
With SIV mode, we are paying with an additional hashing pass over the plain text data. In return, we have the following guarantees:
If the nonce isn’t reused, we have nothing to worry about.
If the nonce is reused, an attacker will not be able to learn anything about the content of the message.
However, an attacker may be able to detect that the same message is being sent, if the nonce is reused.
Given the cost of nonce reuse without SIV, it is gratifying to see that the worst case scenario for SIV with nonce reuse is that an adversary can detect duplicate messages.
I’m not sure how up to date this is, but this report shows that SIV adds about 1.5 – 2.2 cycles to the overall cost of encryption. Note that this is for actual viable implementations, instead of what I’m doing, which is by design, not a good idea.
In the previous post, I showed some code that compared two MAC values (binary buffers) and I mentioned that the manner in which I did that was bad.
Here is the code in question:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
// bad code
if (std.mem.eql(u8, mac, &myMac) == false) {
return error.InvalidMac;
}
// good code
if (std.crypto.utils.timingSafeEql(u8, mac, &myMac) == false) {
return error.InvalidMac;
}
view raw
options.zig
hosted with ❤ by GitHub
When you are looking at code that is used in a cryptographic context, you should be aware that any call that compares buffers (or strings) cannot short circuit. What do I mean by that? Let’s look at the implementation of those two functions:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
pub fn timingSafeEql(a: []u8, b: []u8) bool {
var acc :u8=0;
for (a) |x, i| {
acc |= x ^ b[i];
}
return acc ==0;
}
pub fn eql(a: []const u8, b: []const u8) bool {
if (a.len != b.len) return false;
if (a.ptr == b.ptr) return true;
for (a) |item, index| {
if (b[index] != item) return false;
}
return true;
}
view raw
impl.zig
hosted with ❤ by GitHub
Those two functions are doing the same thing, but in a very different manner. The issue with eql() is that it will stop at the first mismatch byte, while timingSafeEql() will always scan through the two buffers first and then return the result.
Why do we need that?
Well, the issue is that if I can time (and you can, even over the network) the duration of a call like that, I’ll be able to test various values until I match whatever secret value the code is comparing against. In this case, I don’t believe that the use of eql() is an actual problem. We tested that on the output of HMAC operation vs. the expected value. The caller has no way to control the HMAC computation and already knows what we are comparing against. I can’t think of any reason where that would be a problem. However, I’m not a cryptographer and any call to buffer comparison in crypto related code should use a constant time method.
For that matter, side channels are a huge worry in cryptography. AES, for example, is nearly impossible to implement in software at this point, because it is vulnerable to timings side channels and requires the hardware to help here. Other side channels include watching caches, power signatures and more. I don’t actually have much to say about this, except that when working with cryptography, even something as simple as multiplication is suspect, because it may not complete in constant time. As a good example of the problem, see this page.
In the previous post I showed how we can mess around with the encrypted text, resulting in a valid (but not the original) plain text. We can use that for many nefarious reasons, as you can imagine. Luckily, there is a straightforward solution for this issue. We can implement something called MAC (message authentication code) to ensure that the encrypted data wasn’t tampered with. That is pretty easy to do, actually, since we have HMAC already, which is meant for exactly this purpose.
The issue here is an interesting one, what shall we sign? Here are the options:
Sign the plain text of the message, using a hashed key function (HMAC-MD5, in our case). Because we are using the secret key to compute the hash, just looking at the hashed value will not tell us anything about the plain text (for example, if we were using just MD5, we could use rainbow tables to figure out what the plain text was from the hash). Since there is no security issue with making the signature public, we can just append that to the output of the encryption as plain text. At least, I don’t think it does. I’ll remind you again that I’m not a cryptographer by trade.
Sign the plain text of the message (using a hashed key function or a regular cryptographic hash function) and append the hash (encrypted) to the output message.
Sign the encrypted value of the message, and append that hash to the output message.
A visualization might make it easier to understand. If you want to read more, there is a great presentation here.
The first two options are bad. Using those methods will leak your data in various ways. There is something that is apparently called the Cryptographic Doom principle, which is very relevant here. The idea is simple, we don’t trust the encrypted value, it may have been modified by an adversary. The first two options that we have require us to first take an action (decrypting the data) before we authenticate the message. We can then use various tricks to rip apart the whole scheme. That means that the very first thing we do is verify that the encrypted text we were handled was indeed signed by a trusted party (that has the secret key).
If you’ll look closely at the image above, you can see that I’m using two keys here, instead of one: key1 and key2. What is that about?
In cryptography, there is a strong reluctance to reuse the same key in different contexts. The issue is that if we use a single key in multiple scenarios (such as encryption and authentication), a weakness in one of them can be exploited in the other. Remember, cryptography is just math, and the fear is that given two values that were computed with the same key, but using different algorithms, you can do something with that. That has led to practical attacks in the past, so the general practice is to avoid reusing keys. The good thing is that given a single cryptographic key, it is easy to generate a new key using a key derivation function.
I’m still going to limit myself to HMAC-Md5 only (remember, none of this code is meant to actually be used), so I can derive a new key from an existing one using the following mechanism:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
fn deriveKey(key: *const [32]u8, nonce: []const u8, domain: []const u8, newKey: *[32]u8) void {
var buf: [4]u8 = undefined;
std.mem.writeIntNative(u32, &buf, 1);
var mac: [16]u8 = undefined;
var hmacMd5 = HmacMd5.init(key);
hmacMd5.update(domain);
hmacMd5.update(nonce);
hmacMd5.update(&buf);
hmacMd5.final(&mac);
std.mem.copy(u8, newKey, &mac);
std.mem.writeIntNative(u32, &buf, 2);
hmacMd5 = HmacMd5.init(key);
hmacMd5.update(nonce);
hmacMd5.update(domain);
hmacMd5.update(&buf);
hmacMd5.final(&mac);
std.mem.copy(u8, newKey[16..], &mac);
}
view raw
deriveKey.zig
hosted with ❤ by GitHub
The idea is that we use the HMAC and the static domain string we get to generate the new key. In this case, we actually use it twice, with the nonce being used to inject even more entropy into the mix. Since HMAC-Md5 outputs 16 bytes and I need a 32 bytes key, I’m doing that twice, with a different counter value each time. I also rearrange the order of the (nonce, domain) and (domain, nonce) fields on each hashing attempt to make it more interesting.
A reminder: I didn’t spend any time trying to figure out what kind of security this sort of system brings. It looks very much like what Sodium does for key derivation, but I wouldn’t trust it with anything.
With that in place, here is the new code for encryption:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
const HmacMd5 = std.crypto.auth.hmac.HmacMd5;
const MyCryptoAlgo = struct {
state: [HmacMd5.mac_length]u8 = undefined,
nonce: [HmacMd5.mac_length]u8 = undefined,
keyEnc: [HmacMd5.key_length]u8 = undefined,
keyMac: [HmacMd5.key_length]u8 = undefined,
macGen: HmacMd5 = undefined,
pos: u8 = 0,
count: [1]u64 = undefined,
pub const NonceLen = HmacMd5.mac_length;
pub const MacLen = HmacMd5.mac_length;
pub const ExtraLen = MacLen + NonceLen;
pub fn initWithNonce(key: [HmacMd5.key_length]u8, nonce: [HmacMd5.mac_length]u8) MyCryptoAlgo {
var self = MyCryptoAlgo{};
self.count[0] = 0;
std.mem.copy(u8, &self.nonce, &nonce);
deriveKey(&key, "encryption", &nonce, &self.keyEnc);
deriveKey(&key, "authentication", &nonce, &self.keyMac);
var md5 = HmacMd5.init(&self.keyEnc);
md5.update(&self.nonce);
md5.update(std.mem.sliceAsBytes(&self.count));
md5.final(&self.state);
self.macGen = HmacMd5.init(&self.keyMac);
self.macGen.update(&self.nonce);
return self;
}
pub fn init(key: [HmacMd5.key_length]u8) MyCryptoAlgo {
var nonce: [HmacMd5.mac_length]u8 = undefined;
std.crypto.random.bytes(&nonce);
return initWithNonce(key, nonce);
}
pub fn encrypt(key: [32]u8, plainText: []const u8, cipherText: []u8) void {
var mca = MyCryptoAlgo.init(key);
mca.encryptBlock(plainText, cipherText);
mca.finalize(cipherText[plainText.len..]);
}
pub fn finalize(self: *MyCryptoAlgo, data: []u8) void {
std.mem.copy(u8, data, &self.nonce);
var mac: [16]u8 = undefined;
self.macGen.final(&mac);
std.mem.copy(u8, data[NonceLen..], &mac);
}
pub fn encryptBlock(self: *MyCryptoAlgo, input: []const u8, output: []u8) void {
self.xorWithKeyStream(input, output);
self.macGen.update(output[0..input.len]); // process the encrypted stream
}
fn xorWithKeyStream(self: *MyCryptoAlgo, input: []const u8, output: []u8) void {
for (input) |c, i| {
if (self.pos == self.state.len) {
self.genKeyStreamBlock();
}
output[i] = self.state[self.pos] ^ c;
self.pos += 1;
}
}
fn genKeyStreamBlock(self: *MyCryptoAlgo) void {
var md5 = HmacMd5.init(&self.keyEnc);
md5.update(&self.nonce);
self.count[0] += 1;
md5.update(std.mem.sliceAsBytes(&self.count));
md5.final(&self.state);
self.pos = 0;
}
};
view raw
badEncryption.zig
hosted with ❤ by GitHub
We have a lot going on here. In the initWithNonce() function, we generate the derived keys for two domains. Then we generate the block of key stream, as we previously did. The last stage in the initWithNonce() function is initializing the MAC computation. Note that in addition to using a derived key for the MAC, I’m also adding the nonce as the first thing that we’ll hash. That should have no effect on security, but it ties the output hash even closer to this specific encryption.
In the xorWithKeyStream() function, you’ll note that I’m now passing both an input and an output buffer, aside from that, this is exactly the same as before (with the actual key stream generation moved to genKeyStreamBlock()). Things get interesting in the encrypytBlock() function. There we XOR the value that we encrypt with the key stream and push that to the output. We also add the encrypted value to the MAC that we generate.
The idea with encryptBlock() is to allow you to build an encrypted message in a piecemeal fashion. Once you are done with the data you want to encrypt, you need to call to finalize(). That would copy the nonce and complete the computation of the MAC of the encrypted portion.
The encrypt() function is provided in order to make it easier to work with the data when you want to encrypt a single buffer. (And yes, I’m not doing any explicit bounds check here, relying on Zig’s to panic if we need to. I mentioned that this isn’t production level code already?)
For encryption, we can pass either a single buffer to encrypt or we can pass it in pieces. For decryption, on the other hand, the situation isn’t as simple. To decrypt the data properly, we first need to verify that it wasn’t modified. That means that to decrypt the data, we need all of it. The API reflects this behavior:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
pub fn decrypt(key: [32]u8, cipherText: []u8, plainText: []u8) !void {
if (cipherText.len < ExtraLen)
return error.CihpherTextBufferTooSmall;
if (cipherText.len - ExtraLen > plainText.len)
return error.PlainTextBufferTooSmall;
var nonce: [HmacMd5.mac_length]u8 = undefined;
std.mem.copy(u8, &nonce, cipherText[cipherText.len - ExtraLen .. cipherText.len - NonceLen]);
var mac = cipherText[cipherText.len - NonceLen .. cipherText.len];
var dec = MyCryptoAlgo.initWithNonce(key, nonce);
var msgText = cipherText[0 .. cipherText.len - ExtraLen];
var myMac: [16]u8 = undefined;
dec.macGen.update(msgText);
dec.macGen.final(&myMac);
if (std.mem.eql(u8, mac, &myMac) == false) {
return error.InvalidMac;
}
dec.xorWithKeyStream(msgText, plainText);
}
view raw
decrypt.zig
hosted with ❤ by GitHub
The decrypt() function does do some checks. We are dealing here with input that is expected to be malicious. As such, the first thing that we do is to extract the MAC and the nonce from the cipher text buffer. I decided it would be simpler to require that as a single buffer (although, as you can imagine, it would be very simple to change the API to have that as independent values).
Once we have the nonce, we can initialize the struct with the key and nonce (which will also derive the keys and setup the macGen properly). The next step is to compute the hash over the encrypted text and verify that it matches our expectation.
Yes, I’m using eql() here for the comparison. This is a short circuiting operation, and I’m doing that intentionally so I can talk about this in a future post.
If the MAC that I compute is a match to the MAC that was provided, we know that the message hasn’t been tampered with. At that point we can simply XOR the encrypted text with the key stream to get the original value back.
A single bit out of line with this model will ensure that the decryption fails. What is more, note that we don’t do anything with the decryption until we validate the provided MAC and cipher text. To do anything else would invite cryptographic doom, so it is nice that we were able to avoid it.
In the next post, I’m going to cover timings attacks.
When you have multiple Visual Studio instances, the Visual Studio Installer allows you to update them one by one. This is not very convenient as the installer takes time to download and install newer versions. Daniel Cazzulino made a .NET tool to manage Visual Studio instances. This tool allows you
In the previous post, we managed to get to a fairly complete state, the full code is less than 50 lines of code, but has enough functionality to be able to make use of it.
Don’t actually do that. This code is horribly broken, and the adage of “don’t implement your own encryption” holds very strongly here.
Let’s consider a fairly typical encryption setup. You log in with me, I generate an encrypted cookie and hand it back to you. In all future interactions, you give me back the cookie. I can decrypt that and make decisions based on that.
For example, let’s assume that we want to send the user the following:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
// plain text: {"role":"users","name":"oren"}
// nonce: 8E92B0E4AE4BE6BEFEF2638D02416E61
// encrypted: 6763169603E0BFA8A6BC6B2C768EABAA930E15CB7D11901C9E932ED0DBD8
// structure...
pub const UserCookie = struct {
role: []const u8,
name: []const u8,
pub fn deinit(self: *UserCookie, allocator: std.mem.Allocator) void {
allocator.free(self.role);
allocator.free(self.name);
}
};
fn genCookie(allocator: std.mem.Allocator, key: [32]u8, uc: *UserCookie) ![]u8 {
var json = try std.json.stringifyAlloc(allocator, uc, .{});
defer allocator.free(json);
var encrypted = try allocator.dupe(u8, json);
defer allocator.free(encrypted);
var mca = MyCryptoAlgo.init(key);
mca.run(encrypted);
return try std.fmt.allocPrint(allocator, "{},{}", .{
std.fmt.fmtSliceHexUpper(&mca.nonce),
std.fmt.fmtSliceHexUpper(encrypted),
});
}
view raw
data.zig
hosted with ❤ by GitHub
We compute the user name and role, pass it to the genCookie() function and have an encrypted string to work with. In this case, here is the cookie in question:
8E92B0E4AE4BE6BEFEF2638D02416E61,6763169603E0BFA8A6BC6B2C768EABAA930E15CB7D11901C9E932ED0DBD8
To go the other way, we decrypt the cookie using the nonce and the key, then parse the JSON into the cookie struct, like so:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
fn parseEncryptedCookie(allocator: std.mem.Allocator, cookie: []const u8, key: [32]u8) !UserCookie {
var comma = std.mem.indexOf(u8, cookie, ",") orelse return error.InvalidCookieFormat;
var nonce: [16]u8 = undefined;
try parseHex(cookie[0..comma], &nonce);
var rest = cookie[comma + 1 ..];
if (rest.len > 1042)
return error.CookieSizeIsTooBig;
var buf: [1024]u8 = undefined;
var raw = buf[0 .. rest.len / 2];
try parseHex(rest, raw);
var mca = MyCryptoAlgo.initWithNonce(key, nonce);
mca.run(raw);
return try std.json.parse(UserCookie, &std.json.TokenStream.init(raw), .{
.allocator = allocator,
});
}
fn parseHex(hexStr: []const u8, bytes: []u8) !void {
if (hexStr.len != bytes.len * 2)
return error.MismatchedBufferSizes;
var idx: usize = 0;
while (idx < hexStr.len) : (idx += 2) {
bytes[idx / 2] = try std.fmt.parseInt(u8, hexStr[idx .. idx + 2], 16);
}
}
view raw
parse.zig
hosted with ❤ by GitHub
With this in place, we can start making use of this foundation. Here is some user level code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
var user = try parseEncryptedCookie(allocator, cookieStr, key);
defer user.deinit(allocator);
if(!std.mem.eql(u8, "admin", user.role)){
return error.PermissionDenied;
}
// privileged action
view raw
usage.zig
hosted with ❤ by GitHub
At this point, this is awesome. We know that the cookie we got was encrypted with my key (which the user doesn’t have, obviously). So I handed an encrypted blob to the user, got it back and now I can make decisions based on this.
Or can I? A proper crypto system is defined as one where everything is known, aside from the secret key, and it maintains all its properties. The plain text of the cookie is:
{"role":"users","name":"oren"}
But I don’t have that. However… can I play with this? Remember the last post, we saw that XOR on the encrypted text can give us a lot of insight. What about in this case? Here is what I know:
The role value starts at position 9 and lasts 5 characters.
The value it currently has is “users”, we would like it to be “admin”.
Since we got the encrypted text from the server, we can return something else at a later point. Can we take advantage of that? Well, XOR is cumulative, right? So we can do this:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
fn hackTheCoookie(allocator: std.mem.Allocator, cookie: []const u8) ![]u8 {
var comma = std.mem.indexOf(u8, cookie, ",") orelse return error.InvalidCookieFormat;
var buf: [1024]u8 = undefined;
var rest = cookie[comma + 1 ..];
var raw = buf[0 .. rest.len / 2];
try parseHex(rest, raw);
var oldRole = "users";
var newRole = "admin";
for (oldRole) |c, i| {
raw[9 + i] ^= c ^ newRole[i];
}
return std.fmt.allocPrint(allocator, "{s},{s}", .{
cookie[0..comma],
std.fmt.fmtSliceHexUpper(raw),
});
}
view raw
hackTheCoookie.zig
hosted with ❤ by GitHub
Let’s see what we’ll get when we run this on this cookie:
8E92B0E4AE4BE6BEFEF2638D02416E61,6763169603E0BFA8A6BC6B2C768EABAA930E15CB7D11901C9E932ED0DBD8
And the output would be:
8E92B0E4AE4BE6BEFEF2638D02416E61,6763169603E0BFA8A6A87C246D93ABAA930E15CB7D11901C9E932ED0DBD8
Now, if I send this to the server, it will properly decrypt this into:
{"role":"admin","name":"oren"}
And we are off to the races with full administrator privileges.
This is not a theoretical issue, it has been exploited in the past, to a devastating effect.
The question is, how do we prevent that? The key issue is that encryption (for stream ciphers) is basically just XORing with a secret key stream. Even for block ciphers, the encrypted data is malleable. We can’t assume that it will not decrypt properly, or that the decryption of the modified encrypted text wouldn’t result in valid plain text.
Luckily, cryptography also has an answer, we can create a signature of the data (with the secret key) and then use that to verify that the data hasn’t been tampered with. In fact, we are already using HMAC, which is meant for this exact purpose. This is a pretty big topic, so I’ll discuss that in the next post.