One of the things that I find myself paying a lot of attention to is the error handling portion of writing software. This is one of the cases where I'm sounding puffy even to my own ears, but from over two decades of experience, I can tell you that getting error handling right is one of the most important things that you can do for your systems. I spend a lot of time on getting errors right. That doesn't just mean error handling, but error reporting and giving enough context that the other side can figure out what we need to do.

In a secured protocol, that is a bit harder, because we need to safeguard ourselves from eavesdroppers, but I spent a significant amount of time thinking about how to do this properly. Here are the ground rules I set out for myself:

- The most common scenario is a client failing to connect to the server.
- We need to properly report underlying issues (such as TCP errors) while also exposing any protocol level issues.
- There are errors during the handshake and errors during the processing of application messages. Both scenarios should be handled.

We already saw in the previous post that there is the concept of data messages and alert messages (of which there can only be one). Let's look at how that works for the handshake scenario. I'm focusing on the server side here, because I'm assuming that this one is more likely to be opaque. A client side issue can be much more easily debugged. And the issue isn't error handling inside the code, it is distributed error handling. In other words, if the server has an issue, how does it report that to the client?

The other side, where the client wants to report an issue to the server, is of no interest to us. From our perspective, a client can cut off at any point (TCP connection broke, etc.), so there is no point in trying to do that gracefully or give more data to the server. What would the server do with that? Here is the server portion of establishing a secured connection:
pub fn serverConnection(allocator: *std.mem.Allocator, stream: std.net.Stream, server_keys: crypto.KeyPair) !AuthenticatedConnection {
    errdefer stream.close();
    var handshake = protocol.Server.initialize(server_keys);
    var reader = stream.reader();
    var hello: protocol.HelloMessage = undefined;
    try reader.readNoEof(std.mem.asBytes(&hello));
    try hello.route(&handshake); // no routing supported here
    var challenge = try hello.challenge(&handshake);
    var writer = stream.writer();
    try writer.writeAll(std.mem.asBytes(&challenge));
    var resp: protocol.ChallengeResponse = undefined;
    try reader.readNoEof(std.mem.asBytes(&resp));
    var session = try handshake.generateKey();
    var rc: AuthenticatedConnection = undefined;
    std.mem.copy(u8, &rc.pub_key, &handshake.client.long_term_public_key);
    rc.stream = try crypto.NetworkStream.init(allocator, stream, session);
    try resp.completeAuth(&handshake);
    return rc;
}
serverConnection.zig
I'm using Zig to write this code and you can see any potential error in the process marked with a try keyword. Looking at the code, everything up to the completeAuth() call is mechanically sending and receiving data. Any error up to that point is likely network related (so the connection is broken). You can see that the protocol call challenge() can fail, as can the call to generateKey() – in both cases, there isn't much that I can do about it. If the generateKey() call fails, there is no shared secret (for that matter, it doesn't look like that can fail, but we'll ignore that). As for the challenge() call, the only way that can fail is if the server has failed to encrypt its challenge properly. That is not something that the client can do much about, and there isn't a failing codepath there either.

In other words, aside from network issues, which will break the connection (meaning we cannot send the error to the client anyway), we have to wait until we process the challenge response from the client to have our first viable failure. In the code above, I'm just calling try, which means that we'll fail the connection attempt, close the socket and basically just hang up on the client. That isn't nice to do at all. Here is what I replaced the completeAuth() call with:
resp.completeAuth(&handshake) catch |e| {
    // we use the secure channel to send an error to the other side (will also abort the connection there)
    var msg = "Failed to validate challenge response".*;
    rc.stream.send_alert(crypto.AlertTypes.BadChallengeResponse, &msg) catch {
        // there is nothing we can do here, ignoring the error
    };
    return e; // implicitly close the connection
};
serverConnection.err-handling.zig
What is going on here is that by the time I got the challenge response from the client, I have enough information to derive the shared key. I can use that to send an alert to the other side, letting them know what the failure was. A client will complete the challenge, and if there is a handshake failure, we proceed to fail gracefully with a meaningful error.

But there is another point to this protocol: an alert message doesn't have to show up only in the handshake part. Consider a long running response that runs into an error. Here is how this usually looks in TCP / HTTP scenarios. Assume that we are streaming data to the client and suddenly run into an issue:
{
  "Databases": [
    {
      "Name": "Northwind",
      "Disabled": false,
      "TotalSize": {
        "HumaneSize": "327.81 MBytes",
        "SizeInBytes": 343736320
      }
    },
    {
      "Name": "Darksand",
      "Disabled": false,
      "TotalSize": {
Unhandled Exception: System.UnauthorizedAccessException: Access to the path '/data/darksand' is denied. ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at Microsoft.Diagnostics.Runtime.Linux.LinuxLiveDataReader.OpenMemFile()
   at Microsoft.Diagnostics.Runtime.Linux.LinuxLiveDataReader.ReadMemory(UInt64 address, IntPtr buffer, Int32 bytesRequested, Int32& bytesRead)
   at Microsoft.Diagnostics.Runtime.DacInterface.DacDataTargetWrapper.ReadVirtual(IntPtr self, UInt64 address, IntPtr buffer, Int32 bytesRequested, Int32& bytesRead)
   at Microsoft.Diagnostics.Runtime.DacLibrary..ctor(DataTarget dataTarget, String dacDll)
   at Microsoft.Diagnostics.Runtime.DataTarget.ConstructRuntime(ClrInfo clrInfo, String dac)
response.json
How do you send an error midstream? Well, you don't. If you are lucky, you'll have the error output and some way to get the full message and manually inspect it. That is a distressingly common issue, by the way, and a huge problem for proper error reporting with long running responses. With the alert model, we effectively have multiple channels in the same TCP stream that we can utilize to send a clear and independent error to the client. Much nicer overall, even if I say so myself. And it just occurred to me that this mimics quite nicely the approach that Zig itself uses for error handling.
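The alert model boils down to tagging every record with a type byte, so that one TCP stream carries both application data and error reports side by side. Here is a minimal Python sketch of that idea – the record layout and the type values are my own illustration, not this protocol's exact wire format:

```python
import io
import struct

# Hypothetical record types, mirroring the data/alert distinction.
DATA, ALERT = 1, 2

def write_record(out, rec_type, payload):
    # Each record: u16 little-endian payload length, u8 record type, payload.
    out.write(struct.pack("<HB", len(payload), rec_type))
    out.write(payload)

def read_records(buf):
    # Walk the buffer and split it back into (type, payload) records.
    records, offset = [], 0
    while offset < len(buf):
        length, rec_type = struct.unpack_from("<HB", buf, offset)
        offset += 3
        records.append((rec_type, buf[offset:offset + length]))
        offset += length
    return records

out = io.BytesIO()
write_record(out, DATA, b"partial results...")
write_record(out, ALERT, b"disk failure while streaming")
print(read_records(out.getvalue()))
```

An alert record travels in-band but stays cleanly separated from the data, so the reader can surface it as an error instead of letting it corrupt the payload mid-stream.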
We have now managed to do a proper handshake, and both client and server have a shared key. The client has also verified that the server is who they thought it should be, and the server knows who the client is and can look up whatever authorization such a client ought to get. The next step is actually sending data over the wire. I mentioned earlier that while conceptually we are dealing with a stream of data, in practice we have to send the data as independent records. That is done so we can properly verify that they weren't meddled with along the way (either via cosmic radiation or malicious intent).

We'll start with writing data, which is simple. We initiate the write side of the connection using CryptoWriter:
pub fn init(allocator: *std.mem.Allocator, stream: TStream, secret_keys: sodium.SecretKeys) !CryptoWriter {
    var buf = try allocator.alloc(u8, RecordBufferSize * 2);
    errdefer allocator.free(buf);
    var self: CryptoWriter = .{
        .allocator = allocator,
        .buffer = buf,
        .writer = stream.writer(),
        .padder = null,
        .stream = stream,
        .state = undefined,
        .alert_raised = false,
        .buffered = 0,
    };
    if (c.crypto_secretstream_xchacha20poly1305_init_push(
        &self.state,
        &self.buffer[0],
        &secret_keys.transmit[0],
    ) != 0) {
        return error.UnableToPushStreamHeader;
    }
    try self.writer.writeAll(self.buffer[0..c.crypto_secretstream_xchacha20poly1305_HEADERBYTES]);
    self.buffered = HeaderSize;
    return self;
}
CryptoWriter.init.zig
We allocate a buffer that is 32KB in size (16KB x 2). The record size we selected is 16KB. Unlike TLS, this is an inclusive size, so the entire record must fit in 16KB. We need to allocate 32KB because the API we use does not support in-place encryption. Note that we reserved some space in the header (5 bytes, to be exact) for our own needs, and that we initialize the stream and send the stream header to the other side here – that is the only cryptography involved in the initialization. The actual writing isn't really that interesting: we push all the data into the buffer until we run out of space, then we call flush(). I've written this code in plenty of languages, and it is pretty straightforward, if tedious.
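To make the buffer math concrete, here is a back-of-the-envelope Python sketch of the sizes involved. The 17-byte overhead is libsodium's crypto_secretstream_xchacha20poly1305_ABYTES (a tag byte plus a 16-byte MAC); the exact byte accounting in the Zig code may differ slightly, so treat this as an illustration:

```python
# Sizes as described in the post: 16KB inclusive records, 5 reserved bytes.
RECORD_BUFFER_SIZE = 16 * 1024
HEADER_SIZE = 2 + 2 + 1      # u16 envelope len + u16 plain len + u8 record type
CRYPTO_ABYTES = 17           # secretstream tag byte + 16-byte MAC

# The API cannot encrypt in place, so the allocation holds two records:
# plaintext staged in the first half, ciphertext produced into the second.
allocation = RECORD_BUFFER_SIZE * 2

# Roughly the largest payload that still fits in a single 16KB record.
max_payload = RECORD_BUFFER_SIZE - HEADER_SIZE - CRYPTO_ABYTES
print(allocation, max_payload)  # → 32768 16362
```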
pub fn write(self: *CryptoWriter, buffer: []const u8) !usize {
    if (self.alert_raised) {
        return error.AlertAlreadyRaised;
    }
    var buf = buffer;
    var total_size: usize = 0;
    while (true) {
        var size = std.math.min(buf.len, RecordBufferSize - self.buffered);
        total_size += size;
        std.mem.copy(u8, self.buffer[self.buffered..RecordBufferSize], buf[0..size]);
        self.buffered += size;
        buf = buf[size..];
        if (self.buffered == RecordBufferSize) {
            try self.flush(RecordTypes.Data);
        }
        if (buf.len == 0)
            break;
    }
    return total_size;
}
CryptoWriter.write.zig
There isn't anything interesting happening here until we call flush(RecordTypes.Data) – that is an indication to the other side that this is application data, rather than some protocol level message. The flush() method is where things get really interesting.
pub fn flush(self: *CryptoWriter, rec_type: RecordTypes) !void {
    if (self.alert_raised) {
        return error.AlertAlreadyRaised;
    }
    if (self.buffered > MaxPlainTextSize) {
        return error.PlainTextRecordSizeToLarge; // should never happen
    }
    std.mem.writeInt(u16, self.buffer[2..4], @intCast(u16, self.buffered - @sizeOf(u16)), .Little);
    self.buffer[4] = @enumToInt(rec_type);
    if (self.padder) |padder| {
        var pad_len = padder(self, self.buffered);
        if (pad_len + self.buffered > MaxPlainTextSize)
            return error.InvalidPaddingLengthProvided;
        std.mem.set(u8, self.buffer[self.buffered..(self.buffered + pad_len)], 0);
        self.buffered += pad_len;
    }
    var len: u64 = 0;
    var encrypted = self.buffer[RecordBufferSize..];
    if (c.crypto_secretstream_xchacha20poly1305_push(
        &self.state,
        &encrypted[@sizeOf(u16)],
        &len,
        &self.buffer[@sizeOf(u16)],
        self.buffered - @sizeOf(u16),
        null,
        0,
        0,
    ) != 0) {
        return error.UnableToPushEncryptedRecord;
    }
    if (len > RecordBufferSize - @sizeOf(u16)) {
        return error.EncryptedRecordSizeTooBig; // should never happen
    }
    std.mem.writeInt(u16, encrypted[0..2], @intCast(u16, len + @sizeOf(u16)), .Little);
    try self.writer.writeAll(encrypted[0 .. len + @sizeOf(u16)]);
    self.buffered = HeaderSize;
}
CryptoWriter.flush.zig
There is a lot of code here, I know. Let's see if I can take it all in. There are some preconditions that should be fairly obvious, then we write the size of the plain text value as well as the record type to the header (that part of the header will be encrypted, mind). The next step is interesting: we invoke a callback to get an answer about how much padding we should use.

Padding matters because just looking at the size of the data can tell you a lot about what is going on, even if you can't figure out anything else. If you know that "Attack At Dawn" is 14 chars long, and with the encryption overhead that turns into a 37 byte message, that alone can tell you much. Assume that you can't figure out the contents, but can sniff the sizes. That can be a problem. There are certain attacks that rely on leaking the size of messages to work. The BREACH attack, for example, relies on being able to send text that would collide with secret pieces of the message; analyzing the size of the data that is sent will tell the attacker when they managed to find a match (because the size will be reduced).

To solve that, you can define a padding policy. For example, all messages are always exactly 16KB in size, and you'll send an empty message every second if there is no organic traffic. Alternatively, you may choose to randomize the message size (to further confuse things). At any rate, this is a pretty complex topic, and not something that I wanted to get too deep into. Being able to let the user decide gives me the best of both worlds. This is a match to SSL_CTX_set_record_padding_callback() in OpenSSL.

The rest is just calling libsodium to do the actual encryption, setting the encrypted envelope size and sending it to the other side. Note that we use the other half of the buffer here to store the encrypted portion of the data. In addition to sending application data, we can send alerts to the other side. That is a protocol level error message.
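The padding callback can implement whatever policy fits the threat model. Here is a minimal Python sketch of two common policies – these illustrate the idea, not the post's actual callback signature:

```python
import secrets

def pad_to_block(record_len, block=256):
    # Round every record up to a multiple of `block` bytes, so many
    # different plaintext lengths map to the same wire size.
    return (-record_len) % block

def pad_random(record_len, max_pad=64):
    # Add a random amount of padding to blur exact sizes.
    return secrets.randbelow(max_pad + 1)

# A 37-byte record padded to a 256-byte boundary hides its true length.
print(37 + pad_to_block(37))  # → 256
```

Fixed-size padding gives the strongest guarantee at the highest bandwidth cost; random padding is cheaper but only blurs, rather than hides, the underlying sizes.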
I'll actually have a separate post to talk about error handling, but for now, let's see what sending an alert looks like:
pub fn send_alert(self: *CryptoWriter, alert_type: AlertTypes, msg: []u8) !void {
    defer {
        self.alert_raised = true;
    }
    if (msg.len + @sizeOf(AlertTypes) + @sizeOf(u16) > MaxPlainTextSize) {
        return error.PlainTextRecordSizeToLarge;
    }
    std.mem.copy(u8, self.buffer[(HeaderSize + @sizeOf(AlertTypes))..], msg);
    std.mem.writeInt(u16, self.buffer[HeaderSize .. HeaderSize + @sizeOf(u16)], @enumToInt(alert_type), .Little);
    self.buffered = HeaderSize + @sizeOf(AlertTypes) + msg.len; // we discard everything else
    try self.flush(RecordTypes.Alert);
}
CryptoWriter.alert.zig
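The contract of send_alert() is a tiny state machine: buffered writes are allowed until an alert goes out, after which the stream is dead for any further use. A toy Python model of that contract (the names and shapes here are mine, not this code's API):

```python
class AlertingWriter:
    # Models the alert contract: buffered writes until an alert is sent,
    # after which the stream refuses any further use.
    def __init__(self):
        self.sent = []            # records that went to "the network"
        self.pending = b""
        self.alert_raised = False

    def write(self, data):
        if self.alert_raised:
            raise RuntimeError("alert already raised")
        self.pending += data

    def send_alert(self, code, msg):
        # Discard whatever was buffered and flush the alert immediately.
        self.pending = b""
        self.sent.append(("alert", code, msg))
        self.alert_raised = True

w = AlertingWriter()
w.write(b"some buffered data")
w.send_alert(3, b"bad challenge response")
```

After send_alert(), any further write() fails, which mirrors the alert_raised checks in the Zig code above.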
Basically, we overwrite whatever is in the buffer and flush it immediately to the other side. We also set the alert_raised flag, which will prevent any further usage of the stream. Once an error was sent, we are done. We aren't closing the stream, because that is the job of the calling code, which will get an error and close us during normal cleanup procedures.

The reading process is a bit more involved, on the other hand. We start by mirroring the write, pulling the header from the network and initializing the stream:
pub fn init(allocator: *std.mem.Allocator, source: TStream.Reader, secret_keys: sodium.SecretKeys) !CryptoReader {
    var buf = try allocator.alloc(u8, RecordBufferSize * 2);
    errdefer allocator.free(buf);
    var self: CryptoReader = .{
        .allocator = allocator,
        .reader = source,
        .buffer = buf,
        .incoming = &[0]u8{},
        .incoming_plain_text = &[0]u8{},
        .state = undefined,
        .alert_code = null,
    };
    try self.reader.readNoEof(self.buffer[0..c.crypto_secretstream_xchacha20poly1305_HEADERBYTES]);
    if (c.crypto_secretstream_xchacha20poly1305_init_pull(
        &self.state,
        &self.buffer[0],
        &secret_keys.recieve[0],
    ) != 0) {
        return error.FailedToInitCryptoStream;
    }
    return self;
}
CryptoReader.init.zig
The real fun starts when we need to actually read things. Let's take a look at the code, and then I'll explain it in detail:
fn read(self: *CryptoReader, buffer: []u8) !usize {
    if (self.alert_code) |_| {
        return error.AnAlertWasRaised;
    }
    if (self.incoming_plain_text.len > 0) { // read from buffer
        return self.read_buffer(buffer);
    }
    while (true) {
        while (self.incoming.len < @sizeOf(u16)) {
            try self.read_from_network();
        }
        var env_len = std.mem.readInt(u16, self.incoming[0..2], .Little);
        if (env_len == 0 or env_len > RecordBufferSize) {
            return error.InvalidCryptoEnvelopeSize;
        }
        while (env_len > self.incoming.len) {
            try self.read_from_network(); // read enough bytes from network
        }
        self.incoming_plain_text = self.buffer[RecordBufferSize..];
        var len: u64 = 0;
        if (c.crypto_secretstream_xchacha20poly1305_pull(
            &self.state,
            &self.incoming_plain_text[0],
            &len,
            null,
            &self.incoming[@sizeOf(u16)],
            env_len - @sizeOf(u16),
            null,
            0,
        ) != 0) {
            return error.FailedToDecryptRecord;
        }
        // move any leftover bytes to the front of the buffer and keep
        // only the remainder as the pending incoming data
        std.mem.copy(u8, self.incoming, self.incoming[env_len..]);
        self.incoming = self.incoming[0 .. self.incoming.len - env_len];
        if (len < @sizeOf(u16)) {
            return error.DecryptedRecordIsTooSmall;
        }
        var plain_txt_len = std.mem.readInt(u16, self.incoming_plain_text[0..2], .Little);
        if (plain_txt_len == 0) {
            continue; // allowed to have empty record
        }
        var record_type = @intToEnum(RecordTypes, self.incoming_plain_text[@sizeOf(u16)]);
        self.incoming_plain_text = self.incoming_plain_text[@sizeOf(u16) + @sizeOf(u8) .. plain_txt_len];
        if (record_type == .Alert) {
            self.alert_code = @intToEnum(AlertTypes, std.mem.readInt(u16, self.incoming_plain_text[0..@sizeOf(u16)], .Little));
            self.incoming_plain_text = self.incoming_plain_text[@sizeOf(u16)..];
            return error.AnAlertWasRaised;
        }
        break;
    }
    return self.read_buffer(buffer);
}
CryptoReader.read.zig
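The decrypted record that this loop deals with has a small layout of its own: a u16 total length (counting the length field itself), a u8 record type, then the payload. As a companion to the Zig above, here is a Python sketch of that parsing step – the ALERT tag value and the AlertRaised exception are my own stand-ins:

```python
import struct

ALERT = 0xFF  # hypothetical tag for alert records

class AlertRaised(Exception):
    def __init__(self, code, msg):
        super().__init__(f"alert {code}: {msg!r}")
        self.code, self.msg = code, msg

def parse_plaintext(record):
    # u16 total length (counting itself), u8 record type, then payload.
    total_len, rec_type = struct.unpack_from("<HB", record)
    if total_len == 0:
        return None          # empty records are legal; caller keeps reading
    payload = record[3:total_len]
    if rec_type == ALERT:
        # First u16 of the payload is the alert code, the rest is the message.
        code = struct.unpack_from("<H", payload)[0]
        raise AlertRaised(code, payload[2:])
    return payload

data = struct.pack("<HB", 3 + 5, 1) + b"hello"
print(parse_plaintext(data))  # → b'hello'
```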
We first check if an alert was raised; if it was, we immediately abort, since the stream is now dead. If there are any plain text bytes, we can return them directly from the buffer. We'll look into that, as well as how we read from the network, shortly. For now, let's focus on what we are doing here.

We read enough from the network to know the envelope length that we have to read. That value, if you'll remember, is the first value that we send for a record and is not encrypted (there isn't much point, you could look at the packet information to get that if you wanted to). We then make sure that we read the entire record into the buffer. We decrypt the data from the incoming buffer to the plain text buffer (that is what the read_buffer() function will use to actually return results).

The rest of the code is figuring out what we actually got. We check the actual size of the data we received. We may have received a zero length value, so we have to handle this. We check whether we got a data record or an alert. If the latter, we mark it as such and return an error. If this is just data, we set up the plain text buffer properly and go to the read_buffer() call to return the values. That is a lot of code, but not a lot of functionality. Simple code is best, and this matches that scenario. Let's see how we handle the actual buffer and network reads:
fn read_buffer(self: *CryptoReader, buffer: []u8) usize {
    var size = std.math.min(self.incoming_plain_text.len, buffer.len);
    std.mem.copy(u8, buffer, self.incoming_plain_text[0..size]);
    self.incoming_plain_text = self.incoming_plain_text[size..];
    return size;
}

fn read_from_network(self: *CryptoReader) !void {
    var existing = self.incoming.len; // we may have data already in buffer, but need more...
    var len = try self.reader.read(self.buffer[existing..RecordBufferSize]);
    if (len == 0) {
        return error.UnexpectedEndOfStream;
    }
    self.incoming = self.buffer[0..(existing + len)];
}
CryptoReader.buffer-net.zig
Not much here, we just need to make sure that we handle partial reads as well as reading multiple records in one shot. We saw that when we get an alert, we return an error. But the question is, how do we get the actual alert? The answer is that we store the message in the plain text buffer and record the alert itself. All future calls will fail with an error. You can then call the alert() function to get the actual details:
pub fn alert(self: *CryptoReader) !Alert {
    if (self.alert_code) |code| {
        var rc = Alert{ .alert = code, .msg = self.incoming_plain_text };
        return rc;
    }
    return error.NoAlertRecieved;
}
CryptoReader.alert.zig
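One low-level detail from the read path is worth calling out: read_from_network() has to cope with TCP handing back fewer bytes than requested, which is why the read loops keep going until enough data has arrived. The classic exact-read loop, sketched in Python:

```python
import io

def read_exact(reader, n):
    # TCP reads may return fewer bytes than asked; loop until we have n,
    # and treat a zero-length read as an unexpected end of stream.
    chunks, remaining = [], n
    while remaining > 0:
        chunk = reader.read(remaining)
        if not chunk:
            raise EOFError("unexpected end of stream")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

print(read_exact(io.BytesIO(b"abcdef"), 4))  # → b'abcd'
```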
This gives us a nice API to use when there are issues with the stream. I think that it matches well with the way Zig handles errors, but I can't tell whether this is idiomatic Zig. That is long enough for now. You can go and read the actual code, of course, and I will welcome any comments. In the next (and likely last) post in the series, I'm going to go over error handling at the protocol level.
After figuring out the design, let's see what it would take to actually write a secured communication channel, sans PKI, in code. I'm going to use Zig as the language of choice here. It is as low level as C, but so much nicer to work with. To implement the cryptographic details, I'm going to lean on libsodium to do all the heavy lifting. It took multiple iterations of the code to get to this point, but I'm pretty happy with how it turned out. I'll start with the client code, which connects to a remote server and establishes a secured TCP channel. Here is what this looks like:
fn clientFn(
    host: []const u8,
    port: u16,
    server_pub_key: [crypto.KeyLength]u8,
    client_kp: crypto.KeyPair,
) !void {
    var server_key = protocol.Client.ExpectedPublicKey{
        .end_public_key = server_pub_key,
        .middlebox_public_key = server_pub_key,
    };
    var con = try crypto.clientConnection(
        std.heap.page_allocator,
        host,
        port,
        client_kp,
        server_key,
    );
    defer con.stream.deinit();
    var encrypted_stream = con.stream;
    var buf: [1024]u8 = undefined;
    // read normal message
    var len = try encrypted_stream.reader().read(&buf);
    std.log.debug("{s}", .{buf[0..len]});
    // handle errors from the other side
    _ = encrypted_stream.reader().read(&buf) catch |e| {
        std.log.debug("err {s}", .{@errorName(e)});
        var a = try encrypted_stream.alert();
        std.log.debug("{s} {s} {s}", .{ @errorName(e), @tagName(a.alert), a.msg });
    };
}
client.zig
The function connects to a server, expecting it to use a particular public key, and will authenticate using a provided key pair. The bulk of the work is done in the crypto.clientConnection() call, where we follow the handshake I outlined here. The result of the call is an AuthenticatedConnection structure, containing both the encrypted stream and the public key of the other side. Note that from the client side, if the server doesn't authenticate using the expected key, the call will fail with an error, so for clients it is usually not important to check the public key – that is already something that we checked.

The stream we return exposes reader and writer instances that you can use to talk to the other side. Note that we are using buffered data, so writing to the stream will not do anything until the buffer is full (about 16KB) or flush() is called.

The other side is the server, of course, which looks like this:
var client: std.net.Connection = try server.accept();
var con = try crypto.serverConnection(std.heap.page_allocator, client.stream, server_kp);
std.log.debug("Connected, I'm {s} - other side {s}", .{
    crypto.KeyPair.keyBase64(server_kp.public),
    crypto.KeyPair.keyBase64(con.pub_key),
});
var encrypted_stream = con.stream;
defer encrypted_stream.deinit();
var w = encrypted_stream.writer();
try w.writeAll("hi there");
try encrypted_stream.flush();
var msg = "Oops, msg".*;
try encrypted_stream.send_alert(crypto.AlertTypes.Badness, &msg);
server.zig
On the server side, we have the crypto.serverConnection() call. It accepts a new connection from a listening socket and starts the handshake process. Note that this code, unlike the client, does not verify that the other side is known to us. Instead, we return that to the caller, which can then check the public key of the client. This is intentional, because at this point we have a secure channel, but not yet authentication. The server can then safely tell the other side that it authorizes them (or not) using the channel, without anyone being able to peek at what is going on there.

Let's dig a bit deeper into the implementation. We'll start with the client code, which is simpler:
pub fn clientConnection(
    allocator: *std.mem.Allocator,
    host: []const u8,
    port: u16,
    client_keys: crypto.KeyPair,
    expected_server_key: ?protocol.Client.ExpectedPublicKey,
) !AuthenticatedConnection {
    var con = try std.net.tcpConnectToHost(allocator, host, port);
    errdefer con.close();
    var handshake = protocol.Client.init(client_keys, expected_server_key);
    var hello = try handshake.hello();
    var writer = con.writer();
    try writer.writeAll(std.mem.asBytes(&hello));
    var reader = con.reader();
    var msg: protocol.ChallengeMessage = undefined;
    try reader.readNoEof(std.mem.asBytes(&msg));
    var response = try msg.respond(&handshake);
    try writer.writeAll(std.mem.asBytes(&response));
    var session = try handshake.generateKey();
    var rc: AuthenticatedConnection = undefined;
    std.mem.copy(u8, &rc.pub_key, &handshake.server.long_term_public_key);
    rc.stream = try crypto.NetworkStream.init(allocator, con, session);
    return rc;
}
clientConnection.zig
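The client reads fixed-size messages straight off the wire into packed structures. As a rough Python analogue of that style, here is a sketch using a hypothetical hello layout (a u8 version plus a 32-byte session public key; the real message has more fields than this):

```python
import io
import struct

# Hypothetical fixed wire layout: u8 version, 32-byte session public key.
HELLO = struct.Struct("<B32s")

def read_hello(stream):
    raw = stream.read(HELLO.size)
    if len(raw) != HELLO.size:
        raise EOFError("short read")   # the readNoEof equivalent
    return HELLO.unpack(raw)           # (version, session_public_key)

wire = HELLO.pack(1, b"\x42" * 32)
version, key = read_hello(io.BytesIO(wire))
print(version, len(key))  # → 1 32
```

Because the layout is fixed at compile (or definition) time, reading a message is just "read exactly N bytes, reinterpret" – which is exactly what the packed []u8 structures buy in the Zig version.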
The handshake protocol itself is handled by protocol.Client. The way I have coded it, we are reading known lengths from the network into in-memory structures and using them directly. I can do that because the structures are basically just a bunch of packed []u8 (char arrays), so the in-memory and network representations are one and the same. That makes things simpler. You can see that I'm calling readNoEof on the structures as bytes. That ensures that I get the whole message from the network before the actual operations that I need to make are handled. Here is the sequence of operations: after sending the hello, the server will respond with a challenge, the client replies, and both sides now know that the other side is who they say they are.

Let's dig a bit deeper, shall we, and see how we build the hello message:
pub fn hello(self: *Client) !HelloMessage {
    var req: HelloMessage = undefined;
    req.version = Client.ExpectedVersion;
    std.mem.copy(u8, &req.client_session_public_key, &self.session.public);
    std.mem.copy(
        u8,
        &req.expected_server_public_key.data,
        &self.server.expected_server_key.end_public_key,
    );
    try req.expected_server_public_key.encrypt(
        self.server.expected_server_key.middlebox_public_key,
        self.session.secret,
    );
    return req;
}

pub fn EncryptedBoxBuffer(size: usize) type {
    return packed struct {
        const Self = @This();
        data: [size]u8,
        mac: [mac_len]u8,
        nonce: [nonce_len]u8,

        pub fn encrypt(
            self: *Self,
            public_key: [KeyLen]u8,
            secret_key: [KeyLen]u8,
        ) !void {
            c.randombytes_buf(&self.nonce, self.nonce.len);
            var rc = c.crypto_box_detached(
                &self.data[0],
                &self.mac,
                &self.data[0],
                size,
                &self.nonce,
                &public_key,
                &secret_key,
            );
            if (rc != 0) {
                return error.EncryptionFailure;
            }
        }

        pub fn decrypt(
            self: *Self,
            public_key: [KeyLen]u8,
            secret_key: [KeyLen]u8,
        ) !void {
            var rc = c.crypto_box_open_detached(
                &self.data[0],
                &self.data[0],
                &self.mac,
                size,
                &self.nonce,
                &public_key,
                &secret_key,
            );
            if (rc != 0) {
                return error.DecryptionFailure;
            }
        }
    };
}
hello.zig
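The EncryptedBoxBuffer(size) type above fixes the wire layout (data, then MAC, then nonce) at compile time. A rough Python analogue of that layout-generation idea, using the struct module; the 16-byte MAC and 24-byte nonce below match libsodium's crypto_box constants, but treat this as an illustration of the technique rather than the post's exact struct:

```python
import struct

MAC_LEN, NONCE_LEN = 16, 24   # crypto_box MACBYTES / NONCEBYTES

def encrypted_box_layout(size):
    # Build the fixed wire layout for an EncryptedBoxBuffer(size)-like
    # structure: `size` data bytes, then the MAC, then the nonce.
    return struct.Struct(f"<{size}s{MAC_LEN}s{NONCE_LEN}s")

layout = encrypted_box_layout(32)
print(layout.size)  # → 72
```

The payoff is the same in both languages: the nonce and MAC live inside the structure itself, so there is no separate bookkeeping for them when the message crosses the wire.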
There isn't much here. We set the version field to a known value, we copy our own session public key (which was just generated and tells no one anything about us) and then we copy the expected server public key – but we aren't sending that over the wire in the clear. Instead, we encrypt it, using our session secret key (whose public half we just sent over) together with the expected middlebox public key (remember, those might be different). The idea is that the server on the other end may decide to route the request, but at the same time, we want to ensure that we are never revealing any information to 3rd parties.

The actual encryption is handled via the EncryptedBoxBuffer structure. You can see that I'm using Zig's comptime support to generate a structure with a compile time variant size. That makes it trivial to do certain things without really needing to think about the details. It used to be more complex and able to support arbitrary embedded structures, but I simplified it to a single buffer. For that matter, for most of the code here the size I'm using is fixed (32 bytes / 256 bits). The key here is that all the details of nonce generation, MAC validation, etc. are hidden and handled. I also don't really need to think about the space for that, since it is directly part of the structure.

It gets more interesting when we look at how the client responds to the challenge from the server:
pub const ChallengeMessage = packed struct {
    server_session_public_key: [crypto.KeyLength]u8,
    server_long_term_public_key: crypto.EncryptedBoxKey,
    long_term_key_proof: crypto.EncryptedBoxKey,

    pub fn respond(self: *ChallengeMessage, state: *Client) !ChallengeResponse {
        var resp = std.mem.zeroes(ChallengeResponse);
        std.mem.copy(u8, &state.server.session_public_key, &self.server_session_public_key);
        try self.server_long_term_public_key.decrypt(self.server_session_public_key, state.session.secret);
        std.mem.copy(u8, &state.server.long_term_public_key, &self.server_long_term_public_key.data);
        if (state.server.validate_server_key) {
            if (!std.crypto.utils.timingSafeEql(
                [crypto.KeyLength]u8,
                state.server.expected_server_key.end_public_key,
                state.server.long_term_public_key,
            )) {
                return error.ExpectedServerPublicKeyMismatch;
            }
        }
        try self.long_term_key_proof.decrypt(state.server.long_term_public_key, state.session.secret);
        if (!std.crypto.utils.timingSafeEql([crypto.KeyLength]u8, self.long_term_key_proof.data, state.session.public)) {
            return error.LongTermProofValueAndSessionPublicKeyMismatch;
        }
        std.mem.copy(u8, &resp.client_long_term_key.data, &state.long_term.public);
        try resp.client_long_term_key.encrypt(state.server.session_public_key, state.session.secret);
        std.mem.copy(u8, &resp.challenge_answer.data, &state.server.session_public_key);
        try resp.challenge_answer.encrypt(state.server.session_public_key, state.long_term.secret);
        return resp;
    }
};
We copy the server’s session public key to our own state, then we decrypt the server’s long term public key using the public key that we were just sent, alongside the client’s own secret key. Without both of them, we cannot decrypt the information that was sealed using the server’s secret key and the client’s public key. Remember that we have a very important distinction here:

Session key pair – generated per connection, transient, meaningless. If you know what the session public key is, you don’t get much.
Long term key pair – used for authentication of the other side. If you know what the long term public key is, you may figure out who the client or server are.

Because of that, we never send the long term public keys in the clear. However, just getting the public key isn’t enough. We need to ensure that the other side actually holds the full key pair, not just claims that it does. We handle that part by asking the server to encrypt the client’s public session key using its long term secret key. Because the public session key is something that the client controls, the fact that the server can produce a value that decrypts to it using the stated public key ensures that it holds the secret portion as well. To answer the challenge, we do much the same thing in reverse. In other words, we encrypt the server’s public session key with our own long term key and send that to the server. The final step is actually generating the symmetric keys for the channel, which is done using:
pub fn generate_client(client: KeyPair, server_public_key: [KeyLen]u8) !SecretKeys {
    var rc: SecretKeys = undefined;
    if (c.crypto_kx_client_session_keys(
        &rc.recieve[0],
        &rc.transmit[0],
        &client.public[0],
        &client.secret[0],
        &server_public_key[0],
    ) != 0) {
        return error.FailedToGenerateKey;
    }
    return rc;
}
We are using the client’s session key pair as well as the server’s public key to generate a shared secret. Actually, a pair of secrets, one for sending and one for receiving. On the other side, you do pretty much the same in reverse. You can see the full source code here. This is only partial work, of course. We still need to deal with actually sending data after the handshake; I’ll handle that in my next post.
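To make the shared-secret step concrete, here is a toy sketch in Python (stdlib only). It substitutes classic integer Diffie-Hellman for X25519, and a BLAKE2b hash for libsodium’s actual crypto_kx construction, so every parameter and helper name here is illustrative rather than what libsodium really computes:

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters: a Mersenne prime and a small generator.
# Wildly insecure, for illustration only; real code uses X25519 via libsodium.
p = 2 ** 127 - 1
g = 3

client_secret = secrets.randbelow(p - 2) + 2
server_secret = secrets.randbelow(p - 2) + 2
client_public = pow(g, client_secret, p)
server_public = pow(g, server_secret, p)

# The key property: op(client_secret, server_public) == op(server_secret, client_public)
shared_client = pow(server_public, client_secret, p)
shared_server = pow(client_public, server_secret, p)
assert shared_client == shared_server

def session_keys(shared: int, client_pk: int, server_pk: int):
    # Derive a (receive, transmit) pair for the client by hashing the shared
    # secret with both public keys. crypto_kx hashes with BLAKE2b in a similar
    # spirit, but this exact construction is a sketch, not libsodium's.
    material = b"".join(x.to_bytes(16, "big") for x in (shared, client_pk, server_pk))
    digest = hashlib.blake2b(material, digest_size=64).digest()
    return digest[:32], digest[32:]

client_rx, client_tx = session_keys(shared_client, client_public, server_public)
server_tx, server_rx = session_keys(shared_server, client_public, server_public)
# Each side's transmit key is the other side's receive key.
assert client_rx == server_tx and client_tx == server_rx
```

The directional split is the part worth noticing: both sides compute the same 64 bytes, but agree (by convention) which half is “client to server” and which is “server to client”.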
In the previous post, I talked a lot about the manner in which both client and server will authenticate one another safely and securely. The reason for all the problems is that we want to ensure that we are talking to the entity we believe we are, protect ourselves from man in the middle attacks, etc. The entire purpose of the handshake exchange is to establish that the party on the other side is the right one and not a malicious actor (like the coffee shop router or the corporate firewall). Once we establish who is on the other side, the rest is pretty easy. Each side of the connection generated a key pair specifically for this connection. They then managed to send each other the other side’s public key, as well as prove that they own another key pair (trust in which was established separately, in an offline manner).

In other words, on each side, we have:

My key pair (public, secret)
Other side’s public key

With those, we can use key exchange to derive a shared secret key. The gist of this is that we know that this statement holds:

op(client_secret, server_public) == op(server_secret, client_public)

The details of the actual op() aren’t important for understanding, but I’m using sodium, so this is scalar multiplication over curve 25519. If this tells you anything, great. Otherwise, you can trust that the people who do understand the math say that this is safe to do. Diffie-Hellman is the search term to use to understand how this works.

Now that we have a shared secret key, we can start sending data to one another, right? It would appear that the answer to that is… no. Or at least, not yet. The communication channel that we build here is built on top of TCP, providing two way communication for client and server. TCP uses the stream abstraction to send data over the wire. That does not work with modern cryptographic algorithms.

How can that be? There is literally a thing called a stream cipher, after all. If you cannot use a stream cipher for a stream, what is it for?

A stream cipher is a basic building block for modern cryptography. However, it also has a serious problem: it doesn’t protect you from modification of the ciphertext. In other words, you will “successfully” decrypt the value and use it, even though it was modified. Here is a scary scenario of how you can abuse that badly.

Because of such issues, all modern cryptographic algorithms use Authenticated Encryption. In other words, to successfully complete their operation, they require that the cipher text matches a cryptographic authentication code. Conceptually, the first thing that a modern cipher will do on decryption is something like:
def decrypt(key, nonce, cipher_text):
    mac = cipher_text[-16:]  # last 16 bytes are the authentication tag
    expected_mac = crypto_hash(key, nonce, cipher_text[0:-16])
    if not timeSafeEql(mac, expected_mac):
        raise Exception("Invalid cipher text")
    return actually_decrypt(key, nonce, cipher_text[0:-16])
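The same encrypt-then-MAC shape can be sketched as runnable Python (stdlib only). The hash-based keystream here is a stand-in for a real stream cipher and must never be used for actual security; it exists only to show why the whole cipher text is needed before anything can be decrypted:

```python
import hashlib
import hmac
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: hash(key || nonce || counter). Illustrative only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, nonce: bytes, plain_text: bytes) -> bytes:
    cipher = bytes(a ^ b for a, b in zip(plain_text, _keystream(key, nonce, len(plain_text))))
    mac = hmac.new(key, nonce + cipher, hashlib.sha256).digest()[:16]
    return cipher + mac  # tag appended at the end, as in the pseudocode

def decrypt(key: bytes, nonce: bytes, cipher_text: bytes) -> bytes:
    cipher, mac = cipher_text[:-16], cipher_text[-16:]
    expected_mac = hmac.new(key, nonce + cipher, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(mac, expected_mac):
        raise ValueError("Invalid cipher text")
    return bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))

key, nonce = secrets.token_bytes(32), secrets.token_bytes(24)
boxed = encrypt(key, nonce, b"attack at dawn")
assert decrypt(key, nonce, boxed) == b"attack at dawn"
```

Flip a single byte of `boxed` and `decrypt` raises before any plain text is produced, which is exactly the property that forces the chunked, record-oriented design discussed next.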
That isn’t quite how it actually looks, but it is close enough to understand what is going on. If you want to look at how a real implementation does it, you can look here. The Python code is nicer, but this is basically the same concept.

So, why does that matter for us? How does this relate to dealing with streams?

Consider the following scenario: in this model, in order to successfully decrypt anything, we first need to validate the MAC (message authentication code) for the encrypted value. But in order to do that, we have to have the whole value, not just part of it. In other words, we cannot use a real stream; instead, we need to send the data in chunks. The TLS protocol has the same issue, which is handled via the notion of records, with a maximum size of about 16KB. So a TLS stream is actually composed of records that are processed independently from one another. That also means that before you get to the TLS layer, a buffered stream is a must, otherwise we’ll send just a few plain text bytes wrapped in a lot of cryptographic envelope. In other words, if you call tls.Write(buffer[0..4]) without buffering, this will send a packet whose cryptographic envelope is much bigger than the actual plain text value that you sent. Looking at the TLS record layer, I think that I’ll adopt many of the same behaviors. Let’s consider a record:
RecordEnvelope = {
    Len : u16,           // Max 16KB, *includes* the size of Len
    Record : [Len - 2]u8 // The cipher text
};

Record = {
    Len : u16, // < RecordEnvelope.Len, includes the size of Len, Type & Data.
    Type : u8, // Data | Alert
    Data : [Len - 3]u8, // If Alert - Len >= 3 and Data starts with a 16 bits error code, then a string for the error itself.
};
// Note that padding here is explicitly allowed.
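The framing above can be sketched in Python (stdlib only). The byte order is my assumption, and the encryption step is omitted; in the real flow the record body inside the envelope would be cipher text:

```python
import struct

DATA, ALERT = 1, 2  # hypothetical type codes

def frame_record(record_type: int, payload: bytes, pad_to: int = 0) -> bytes:
    # Inner record: Len (u16, counts itself plus Type plus Data) + Type + Data.
    inner_len = 2 + 1 + len(payload)
    record = struct.pack("<HB", inner_len, record_type) + payload
    # Zero padding is explicitly allowed; only the envelope length covers it.
    if pad_to > len(record):
        record += b"\x00" * (pad_to - len(record))
    # Envelope: Len (u16, counts itself) + record bytes.
    return struct.pack("<H", 2 + len(record)) + record

def parse_record(buffer: bytes):
    (envelope_len,) = struct.unpack_from("<H", buffer, 0)
    record = buffer[2:envelope_len]  # in the real flow: decrypt first
    inner_len, record_type = struct.unpack_from("<HB", record, 0)
    return record_type, record[3:inner_len]  # padding past inner_len is ignored

framed = frame_record(DATA, b"hello", pad_to=32)
rtype, payload = parse_record(framed)
assert (rtype, payload) == (DATA, b"hello")
```

The two length fields are visible here: the envelope length tells the reader how many bytes to pull off the wire, while the inner length tells it where the payload ends and the padding begins.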
So each record is composed of an envelope, which simply contains the length, and then the cipher text itself. I’m intending to use libsodium’s encrypted stream, because it lets me handle things like re-keying on the fly transparently, etc. We read the record from the network, decrypt it, and then need to decide what to do. If this is an alert, we raise it to the user (this is critical for good error reporting). Note that in this way I can send an (encrypted) alert to the other side to give a good error to the caller. For data, we just pass it to the caller.

Note that there is one very interesting aspect here. We have two Len fields. This is because we allow padding, which can help avoid attacks such as BREACH and mitigate traffic analysis. We ensure that the padding is always set to zero, similar in reason to the TLS model, to avoid mistakes and to force implementation correctness.

I think that this is enough theory for now. In my next post, I want to get to actually implementing this. As usual, I would love to hear your feedback and comments.
Following our recent hiccup with certificate expiration, I spent some time thinking about what we could do better. One of the paths that this led me to was to consider how I would design the underlying communication channel for RavenDB if I had a blank slate. Currently, RavenDB uses TLS over TCP and HTTPS (which is the same thing) as the sole communication mechanism between servers and clients and between the servers in the cluster. That relies on TLS to ensure the safety of the information, as well as client certificates for authentication. TLS, of course, requires the use of server certificates, which means that we have mutual authentication between clients and servers. However, the PKI infrastructure that is required to support that is freaking complex. It is mostly invisible, except when it isn’t, when something fails.

The idea in this design exercise is to consider how I would do things differently. This is a thought exercise only, not something that we intend to put into any kind of system at this point in time. The use of TLS has proven itself to be very successful and was greatly beneficial. I consider such design exercises to be vital to the overall health of a project (and my own mind), because they allow me to dive deeply into a topic and consider it from a different view point. Therefore, I’m going to proceed based on RavenDB’s set of requirements, even though this is all theoretical.

That disclaimer aside, what do we actually need from a secure communication channel?

Built on top of TCP – nothing else would do, and while UDP is nice to consider, it isn’t relevant for RavenDB’s scenario, so not worth considering. RavenDB makes a lot of use of the streaming nature of TCP connections. It allows us to make a lot of assumptions about the state of the other side. The key aspect we take advantage of is the fact that for a given connection, if I send you a document, I can assume that you already got (and processed successfully) all previous documents.
That saves a lot of back & forth to maintain distributed state.

Encrypted over the wire – naturally that means that we need to satisfy the same level of security as TLS.

Provide mutual authentication of clients and servers – including in a hostile network environment.

Let’s consider what we want to achieve here. The situation is not deployment of servers and clients by many independent organizations (each distrusting all the others). Instead, we are setting up a cluster of RavenDB nodes that will talk to one another, as well as any number of clients that will talk to those servers. That means that we can safely assume that there is a background channel which we trust. That removes the need to set up PKI and have a trusted third party that we’ll talk to. Instead, we are going to use public key cryptography to do authentication between nodes and clients.

Here is how it is going to look. When setting up a cluster, the admin will generate a key pair, like so:

Server Secret: I_lfn5vna3p1OxyJ_kCJzRaBOWD-vio6hvpL6b2qYs8
Server Public: oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk

The secret portion is going to remain written to the server’s configuration file, and the public portion will be used when connecting to the server, to ensure that we are talking to the right one. In the same sense, we’ll have the client generate a key pair as well:

Client Secret: TVwQXoiYfvuToz5NY8D27bIeJR-LgR4y8gCM4UE3ZSc
Client Public: 5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs

With those in place, we can now set up the following configuration on the server side:
{
    "Listen": "tcp://0.0.0.0:1981",
    "ServerKeyPair": {
        "Secret": "I_lfn5vna3p1OxyJ_kCJzRaBOWD-vio6hvpL6b2qYs8",
        "Public": "oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk"
    },
    "AuthorizedClients": {
        "5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs": {
            "SecurityClearance": "User",
            "Databases": [
                { "Name": "Northwind", "Permissions": "ReadWrite" },
                { "Name": "Southsand", "Permissions": "ReadOnly" }
            ]
        }
    }
}
Note that the settings.json contains the key pair of the server, but only the public key of the authorized clients. Conversely, the connection string for RavenDB would be:

Server=crypto.protocol.ravendb.example;ServerPublicKey=oXQJcrZfMNoDDl1ZVSuJlKbREsd5yoprViQOTqmSSCk;ClientSecretKey=TVwQXoiYfvuToz5NY8D27bIeJR-LgR4y8gCM4UE3ZSc;ClientPublicKey=5nNpLTSQmqzh3yttyD1DyM2a2caLORtecPj5LQ2tIHs;

In this case, the client connection string has the key pair of the client, and just the public key of the server. The idea is that we’ll use these to validate that either end is actually who we think they are.

The details of public key cryptography are beyond the topic of this blog post (or indeed, my own understanding, if you get down to it), but the best metaphor that I found was the color mixing one. I’ll remind you that in public key cryptography, we have:

Client Secret Key (CSK), Client Public Key (CPK)
Server Secret Key (SSK), Server Public Key (SPK)

We can use the following operations:

Encrypt(CPK, SSK) –> Decrypt(SPK, CSK)
Encrypt(SPK, CSK) –> Decrypt(CPK, SSK)

In other words, we can use a public / secret key from both ends to encrypt and decrypt the data. Note that so far, everything I did was pretty bog standard Intro to Cryptography 101. Let’s see how we take those ideas and turn them into an actual protocol. The details are slightly more involved, and instead of using just two key pairs, we actually need to use five(!). Let’s look at them in turn.

The first couple of key pairs are the ones that we are familiar with, the server’s and the client’s. However, we are going to tag them as long term key pairs. The problem with using those keys directly is that we have to assume that they will leak at some point. In fact, one of the threat models that TLS deals with is adversaries that can record all network communication between parties for arbitrary amounts of time.
Given that this is encrypted, and assuming that no one can break the encryption algorithm itself, we need to worry about key leakage after the fact. In other words, if we use a pair of keys to communicate securely, but the communication was recorded, it is enough to capture a single key (from either server or client) to be able to decrypt past conversations. That is not ideal.

In order to handle that, we introduce the notion of session keys. Those are keys that are in no way related to the long term keys. They are generated using a secure cryptographic method and are used for a single connection. Once that connection is closed, they are discarded. The idea is that even if you manage to lay your hands on the long term keys, the session keys, which are actually used to encrypt the communication, are long gone (and were never kept) anyway. For more details, the Wiki article on Perfect Forward Secrecy does a great job explaining the details.

I’m counting four pairs of keys so far, but I mentioned that we’ll use five in this protocol. What is that about? I’m going to introduce the idea of a middlebox key. A middlebox is a server that the client will connect to; the client wants to provide just enough information to the middlebox to route the request to the right location, but without providing any external observer with any idea about what the final destination of the client is. In essence, this is ESNI (Encrypted Server Name Indication). A key aspect of this is that the client does not trust the middlebox, and the only thing a malicious middlebox can do is record what the final destination of the connection is. It cannot eavesdrop on the details or modify them in any way.

With all of that in place, and hopefully clear, let’s talk about the handshake that is required to make both sides verify that the other one is legit.
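Before walking through the messages, the state each side holds can be summarized in code. These are hypothetical names, and the key generation is a placeholder, not real X25519:

```python
from dataclasses import dataclass
import secrets

@dataclass(frozen=True)
class KeyPair:
    public: bytes
    secret: bytes

def new_key_pair() -> KeyPair:
    # Placeholder generator; a real implementation derives the public key
    # from the secret via X25519 (crypto_box_keypair in libsodium).
    return KeyPair(public=secrets.token_bytes(32), secret=secrets.token_bytes(32))

# Long term pairs: provisioned once, used only to authenticate the other side.
client_long_term = new_key_pair()
server_long_term = new_key_pair()

# Session pairs: generated per connection, discarded on close (forward secrecy).
client_session = new_key_pair()
server_session = new_key_pair()

# Middlebox key: the client only ever holds the public half; it seals the
# "expected server public key" field so that only the middlebox can route.
middlebox_public = secrets.token_bytes(32)
```

Keeping the five roles distinct in the state makes the message tables below easier to follow: long term keys never encrypt traffic, and session keys never outlive the connection.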
The connection starts with a hello message, with the following details:

Client –> Server
Overall size: 108 bytes
Algorithm – crypto_box (sodium). Key exchange: X25519, Encryption: XSalsa20 stream cipher, Authentication: Poly1305 MAC

Field # | Size | Content                     | Encrypted using
1       | 4    | Version                     | Plain text
2       | 32   | Client's session public key | Plain text
3       | 32   | Expected server public key  | Middlebox's public key + Client's session secret key
4       | 16   | MAC for field 3             |
5       | 24   | Nonce for field 3           |

This requires some explanation. I know enough to know my limitations with cryptography. I’m going to lean on a well known and tested library, libsodium, for the actual cryptographic details and try to do as little as possible on my own. The hello message contains just three actual fields, but the third field is encrypted. Modern encryption practices are meant to make algorithms as hard as possible to misuse. That means that pretty much any encryption algorithm that you are likely to use will use Authenticated Encryption. This is to ensure that any modification to the cipher text will fail the decryption process, rather than give corrupted results.

To handle that scenario, we need to send a MAC (message authentication code), which you can see as field 4 in the message. The last field is a nonce, a random value that ensures that when we encrypt the same data with the same keys, we’ll not output the same value. Repeating output values can have a catastrophic impact on the safety of your system. You can think of the last two fields as part of the encryption envelope we need to properly encrypt the data.

As the first field, we have the protocol version, which allows us to change the protocol over time. Note that this is the only choice that we have; there is no negotiation or options involved here at all. If we want to change the cryptographic details of the protocol, we’ll need to create a new version for that. This is in contrast with how TLS works, where we have both clients and servers offering their supported options and having to pick which one to use.
That ends up being complex, so it is simpler to tie it down. Wireguard works in a similar manner, for example.

You’ll notice that the client’s session public key is sent in the clear. That is fine, it is the public key, after all, and since we ensure that each separate connection will generate a new key pair, there is nothing that can be gleaned from this data.

Now, let’s go back to the fields that are actually meaningful, the client’s session public key and the expected server public key. What is that about? The client will first generate a key pair and send to the server the public portion of that key pair. Along with another key pair, we’ll be able to establish communication. However, what other key pair? In order to trust the remote server, we need to know its public key in advance. The administrator will be able to tell us that, of course, but requiring this is a PITA. We may want to implement TUFU (Trust Upon First Use), like SSH does, or we may want to tie ourselves to a particular key. In any event, at the protocol level, we cannot require that the public key for the server be known before the first message, not if we want this to be practical.

To solve this issue, we have to consider why we have this expected server public key in the message in the first place. It is there to provide the middlebox with a secure manner to discover which server the client wants to connect to. How the client discovers the public key of the middlebox is intentionally left blank here. You can use the same manner as ESNI and grab the public key from a DNS entry, for example. Regardless, a key aspect of this is that the expected server public key is meant to be advisory only. If we are able to successfully decrypt it, then we know what server public key the client is looking for.
We can look up in some table and route the connection directly, without being able to figure out anything else about the contents of any future traffic. If we cannot successfully decrypt this, we can just ignore it and assume that the client is expecting any key (at any rate, the client itself will do its own validation down the line). In many cases, by the way, I expect that the middlebox and the end server will be one and the same; this middlebox feature is meant for some advanced scenarios, likely never to be relevant here.

The server will reply to the hello message with a challenge. Here is how it looks:

Server –> Client
Overall size: 168 bytes
Algorithm – crypto_box (sodium)

Field # | Size | Content                       | Encrypted using
1       | 32   | Server's session public key   | Plain text
2       | 32   | Server's long term public key | Client's session public key + Server's session secret key
3       | 16   | MAC for field 2               |
4       | 24   | Nonce for field 2             |
5       | 32   | Client's session public key   | Client's session public key + Server's long term secret key
6       | 16   | MAC for field 5               |
7       | 24   | Nonce for field 5             |

Here we are starting to see some more interesting details. The server is sending its session public key, to complete the key exchange between the client and server. As before, this is a transient value, generated on a per connection basis, and has no relation to the actual long term key pair. There is nothing that you can figure out from the plain text public key, so we don’t mind sending it.

The long term key in field 2, on the other hand, we send encrypted. Why are we encrypting this? To prevent an outside observer from figuring out what server we are using (if we are using a middlebox). The idea is that once we exchange the public keys for the session key pairs on both sides, we’ll encrypt the long term public key using those and let the client know. We’ll also encrypt the client’s session’s public key. This time, however, we’ll encrypt it using the server’s long term secret key as well as the client’s session public key.
The idea is that the server is encrypting a value that the client chose (the client’s session public key, which is also transient), using Authenticated Encryption. If the client can successfully decrypt that, we know that the session’s public key was encrypted using the long term secret key. In this manner, we prove that we own the long term key pair.

The client, upon receiving this message, will do the following:

Decrypt field 2 – verifying its authenticity using the MAC in field 3.
Decrypt field 5 – using the public key we got from the server.

Assuming that those two decryption procedures were successful, we can compare the plain text value of field 5 with our own session public key. If they are the same, we know that the server has the long term key pair (both public and secret portions). If it didn’t have the secret portion of the key, the server would be unable to properly encrypt the value so that we’d be able to read it. The fact that it does this encryption with the client’s session key (which differs on each call) means that you can’t do replay / caching or any such tricks.

The last thing that the client needs to do now is figure out if the long term public key they got from the server is a match to the public key that they need. That can be part of a TUFU system, or we can reject the connection if the public key does not match.

Client –> Server
Overall size: 136 bytes
Algorithm – crypto_box (sodium)

Field # | Size | Content                       | Encrypted using
1       | 32   | Client's long term public key | Server's session public key + Client's session secret key
2       | 16   | MAC for field 1               |
3       | 24   | Nonce for field 1             |
4       | 24   | Server's session public key   | Server's session public key + Client's long term secret key
5       | 16   | MAC for field 4               |
6       | 24   | Nonce for field 4             |

At this point, the same pattern applies. The server will decrypt the client’s long term public key from field 1 using the session keys. It will then use its own secret session key in conjunction with the client’s long term public key to decrypt the value in field 4.
The act of successfully decrypting the value in field 4 serves as proof that the client indeed holds the secret key for the long term key pair. At the end of processing this message, the server knows who the client is and has verified that they possess the relevant key pair. From there, we are left with the simple act of doing key exchange using the session keys. Now both client and server know who the other side is and have agreed on the cryptographic keys that they will use to communicate with one another.

I mentioned that I’m not an expert cryptographer, right? The design of this protocol isn’t innovative in any way. It takes heavily from the design of TLS 1.3, the most successful cryptographic protocol on the planet, which was designed by people who actually know their craft. What I’m mostly doing here is making assumptions, because I can:

I don’t need PKI infrastructure; the communicating nodes all have a separate channel to establish trust by distributing the public keys.
There is no need for negotiation between the client & server; we fixed all the parameters at the protocol version.
The messages exchanged are all pretty small, which means that we can put each of them in a single packet.

Most importantly of all, the entire system relies on local state; there is absolutely nothing here that relies on or uses any external party. That is kind of amazing, when you think about it, and obviously one of the major reasons why I’m doing this exercise. The tables and descriptions above give all the details, but they make it hard to see exactly what is going on. I find that code samples make more sense. Here is some sample code, showing how the server works:
class ServerConnection:
    long_term: KeyPair
    authorized_clients: list

    def accept(self, stream: Stream):
        (session_sk, session_pk) = gen_key_pair()
        buffer = stream.read_exactly(108)
        version = int.from_bytes(buffer[0:4], "little")
        client_session_pk, expected_server_pk, mac, nonce = buffer[4:36], buffer[36:68], buffer[68:84], buffer[84:108]
        # okay to mess this up, the field is advisory only
        if try_decrypt(expected_server_pk, mac, nonce, client_session_pk, self.long_term.secret):
            if not timeSafeEqual(expected_server_pk, self.long_term.public):
                log.debug("wrong server pk requested by client, client will reject connection")
        long_term_proof_encrypted = encrypt(client_session_pk, client_session_pk, self.long_term.secret)
        session_pk_encrypted = encrypt(session_pk, client_session_pk, session_sk)
        stream.write(long_term_proof_encrypted + session_pk_encrypted)
        buffer = stream.read_exactly(136)
        client_long_term_key, mac, nonce = buffer[0:32], buffer[32:48], buffer[48:72]
        if not try_decrypt(client_long_term_key, mac, nonce, client_session_pk, session_sk):
            raise Exception("Invalid connection: Cannot get long term client key")
        challenge_answer, mac, nonce = buffer[72:104], buffer[104:120], buffer[120:136]
        if not try_decrypt(challenge_answer, mac, nonce, client_long_term_key, session_sk):
            raise Exception("Invalid connection: Cannot validate challenge")
        if not timeSafeEqual(challenge_answer, session_pk):
            raise Exception("Invalid connection: Wrong challenge message")
        if client_long_term_key not in self.authorized_clients:
            raise Exception("Invalid connection: Unknown client")
        return CryptoStream(stream, client_session_pk, session_sk, session_pk)
The server will read the first message and then send a reply; the client will respond to the challenge, and the server will read the response and validate it. This is meant to be pseudo code, mind you, not real code, just to help you figure out how the pieces interact. Here is the client side of things:
class ClientConnection:
    long_term: KeyPair
    expected_server_key: bytes

    def connect(self, stream: Stream):
        (session_sk, session_pk) = generate_key_pair()
        if len(self.expected_server_key) != 32:
            raise Exception("Invalid server key")
        validate_server_pk = any(self.expected_server_key)
        if not validate_server_pk:
            randombuf(self.expected_server_key)
        buffer = encrypt(self.expected_server_key, self.expected_server_key, session_sk)
        stream.write(session_pk + buffer)
        buffer = stream.read_exactly(168)
        server_session_pk = buffer[0:32]
        server_long_term_pk, mac, nonce = buffer[32:64], buffer[64:80], buffer[80:96]
        if not try_decrypt(server_long_term_pk, mac, nonce, server_session_pk, session_sk):
            raise Exception("Unable to decrypt long term server key")
        if validate_server_pk:
            if not timeSafeEqual(server_long_term_pk, self.expected_server_key):
                raise Exception("Unexpected server key")
        long_term_proof, mac, nonce = buffer[96:128], buffer[128:144], buffer[144:168]
        if not try_decrypt(long_term_proof, mac, nonce, server_long_term_pk, session_sk):
            raise Exception("Unable to decrypt long term proof")
        if not timeSafeEqual(long_term_proof, session_pk):
            raise Exception("Bad long term proof")
        client_long_term_key = encrypt(self.long_term.public, server_session_pk, session_sk)
        challenge_answer = encrypt(server_session_pk, server_session_pk, self.long_term.secret)
        stream.write(client_long_term_key + challenge_answer)
        return CryptoStream(stream, server_session_pk, session_sk, session_pk)
I hope that the code sample makes it clearer what is going on. I haven't mentioned the key generation for the follow-up communication; all I talked about here is the ability to set up a key exchange after validating the keys from both sides. At the same time, the long term keys aren't used for anything except authentication, so we get perfect forward secrecy. The idea with the middlebox key also allows us to natively support more complex routing and topologies, which is nice (but also probably YAGNI for this exercise).

I would love to get your feedback and thoughts about this idea.
This is an interesting and a somewhat confusing topic, so I decided to write down my understanding of how certificate signatures work with cross signing.

A certificate is basically an envelope over a public key and some metadata (name, host, issuer, validity dates, etc). There is a format called ASN.1 that specifies how the data is structured, and there are multiple encoding options (DER / BER). None of that really matters for this purpose. A lot of the complexity of certificates can be put down to issues in the surrounding technology. It would probably be a lot more approachable if the format was JSON or even XML. However, ASN.1 dates back to 1984. Considering that this was the age of MS-DOS 3.0, and the very first Linux was still 7 years in the future, it is understandable why this wasn't done. I'm talking specifically about the understandability issue here, by the way. The fact that certificates are basically opaque blobs that require special tooling to operate on and understand has made the whole thing a lot more complex than it should be. A good example of that can be seen here: this will parse the certificate using ASN.1 and let you see its contents.

Here is the same data (more or less) as a JSON document:
{
  "subject": {
    "common_name": "R3",
    "country": "US",
    "organization": "Let's Encrypt",
    "names": ["US", "Let's Encrypt", "R3"]
  },
  "issuer": {
    "common_name": "DST Root CA X3",
    "organization": "Digital Signature Trust Co.",
    "names": ["Digital Signature Trust Co.", "DST Root CA X3"]
  },
  "serial_number": "85078157426496920958827089468591623647",
  "not_before": "2020-10-07T19:21:40Z",
  "not_after": "2021-09-29T19:21:40Z",
  "sigalg": "SHA256WithRSA",
  "authority_key_id": "C4:A7:B1:A4:7B:2C:71:FA:DB:E1:4B:90:75:FF:C4:15:60:85:89:10",
  "subject_key_id": "14:2E:B3:17:B7:58:56:CB:AE:50:09:40:E6:1F:AF:9D:8B:14:C2:C6",
  "authority_information_access": "http://apps.identrust.com/roots/dstrootcax3.p7c",
  "public_key": [
    "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuwIVKMz2oJTTDxLsjVWS",
    "w/iC8ZmmekKIp10mqrUrucVMsa+Oa/l1yKPXD0eUFFU1V4yeqKI5GfWCPEKpTm71",
    "O8Mu243AsFzzWTjn7c9p8FoLG77AlCQlh/o3cbMT5xys4Zvv2+Q7RVJFlqnBU840",
    "yFLuta7tj95gcOKlVKu2bQ6XpUA0ayvTvGbrZjR8+muLj1cpmfgwF126cm/7gcWt",
    "0oZYPRfH5wm78Sv3htzB2nFd1EbjzK0lwYi8YGd1ZrPxGPeiXOZT/zqItkel/xMY",
    "6pgJdz+dU/nPAeX1pnAXFK9jpP+Zs5Od3FOnBv5IhR2haa4ldbsTzFID9e1RoYvb",
    "FQIDAQAB"
  ],
  "signature": [
    "2UzgyfWEiDcx27sT4rP8i2tiEmxYt0l+PAK3qB8oYevO4C5z70kHejWEHx2taPDY",
    "/laBL21/WKZuNTYQHHPD5b1tXgHXbnL7KqC401dk5VvCadTQsvd8S8MXjohyc9z9",
    "/G2948kLjmE6Flh9dDYrVYA9x2O+hEPGOaEOa1eePynBgPayvUfLqjBstzLhWVQL",
    "GAkXXmNs+5ZnPBxzDJOLxhF2JIbeQAcH5H0tZrUlo5ZYyOqA7s9pO5b85o3AM/OJ",
    "+CktFBQtfvBhcJVd9wvlwPsk+uyOy2HI7mNxKKgsBTt375teA2TwUdHkhVNcsAKX",
    "1H7GNNLOEADksd86wuoXvg=="
  ]
}
The only difference between the two options is that one is easy to parse, and the other… not so much, to be honest. Now that we've looked at the format, let's understand what the important fields in the certificate itself are. A certificate is basically just the public key, the name (or subject alternative names) and the validity period.

In order to establish trust in a certificate, we need to trace it back up to a valid root certificate on our system (I'm not touching on that topic in this post). The question now is: how do we create this chain? Well, the certificate itself will tell us. The certificate contains the name of the issuer, as well as the digital signature that allows us to verify that the claimed issuer is the actual issuer. Let's talk about that for a bit.

How do digital signatures work? We have a key pair (public / secret) that we associate with a particular name. Given a set of bytes, we can use the key pair to generate a cryptographic signature. Another party can then take the digital signature, the original bytes we signed and the public key that was used, and validate that they match. The details of how this works are covered elsewhere, so I'll just assume that you take the math on faith. The most important aspect is that we don't need the secret part of the key to do the verification. That allows us to trust that a provided value matches the value that was signed by the secret key, knowing only the public portion of it.

As an aside, doing digital signatures for JSON is a PITA. You need to establish what the canonical form of the JSON document is. For example: with or without whitespace? With the fields sorted, or in arbitrary order? If sorted, using what collation? What about duplicated keys? Etc… The nice thing about ASN.1 is that at least no one argues about how it should look. In fact, no one cares.

In order to validate a certificate, we need to do the following:
def validate(cert):
    now = datetime.now()
    if now < cert.not_before or now > cert.not_after:
        return False, "Expired certificate"
    for issuer_cert in lookup_cert_by_name(cert.issuer):
        if now < issuer_cert.not_before or now > issuer_cert.not_after:
            continue  # skip issuer certificates that are themselves expired
        if validate_signature(cert.data, cert.signature, issuer_cert.public_key):
            if issuer_cert.issuer == issuer_cert.name:  # self-signed, end of chain
                if is_trusted_root_ca(issuer_cert):
                    return True, "Valid certificate"
                else:
                    return False, "Unknown root CA"
            return validate(issuer_cert)
    return False, "No parent certificate"
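The heavy lifting in validate() is done by validate_signature(), which is where the digital signatures we just discussed come into play. Here is a toy, deliberately insecure sketch of that primitive (textbook RSA over tiny primes, with names of my own choosing, nothing like a real implementation), just to show that signing requires the secret exponent while verification needs only the public part:

```python
# Toy, INSECURE textbook RSA, only to illustrate the public/secret asymmetry.
# Real certificates use properly padded RSA or ECDSA over large keys.
import hashlib

p, q = 1000003, 1000033               # toy primes; real RSA uses ~1024-bit primes
n = p * q                             # public modulus
e = 65537                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))     # secret exponent (modular inverse of e)

def digest(data: bytes) -> int:
    # Hash the signed bytes and reduce into the modulus range
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    return pow(digest(data), d, n)            # requires the secret exponent

def validate_signature(data: bytes, signature: int) -> bool:
    return pow(signature, e, n) == digest(data)  # needs only the public key

cert_bytes = b'{"subject": "R3", "issuer": "DST Root CA X3"}'
sig = sign(cert_bytes)
print(validate_signature(cert_bytes, sig))            # True
print(validate_signature(b"tampered bytes", sig))     # False
```

Anyone holding only (n, e) can check the signature, but producing one requires d; that asymmetry is the entire basis for chaining trust between certificates.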
There are a few things to note in the validate() code. We do the lookup of the issuer certificate by name. The name isn't anything special, mind you, and the process for actually doing the lookup by name is completely in the hands of the client, an implementation detail of the protocol. You can see that we validate the time, then check whether the issuer certificate that we found can verify the digital signature of the certificate we are validating. We continue to do so until we find a trusted root authority or the chain ends. There are a lot of other details, but these are the important ones for our needs.

A really important tidbit of information: we find the issuer certificate using a name, and we establish trust by verifying the signature of the certificate with the issuer's public key. The actual certificate only carries the signature; it knows nothing about the issuer except for its name. That is where cross signing can be applied.

Consider the following set of (highly simplified) certificates:
// certificate
{ "name": "www.example.com", "issuer": ["My Encrypt Authority Z1"], "sig": "ABCD", "public_key": 1234 }
// intermediate
{ "name": "My Encrypt Authority Z1", "issuer": ["My Encrypt Root Sep21"], "sig": "DEFG", "public_key": 3456 }
// root
{ "name": "My Encrypt Root Sep21", "issuer": ["My Encrypt Root Sep21"], "sig": "GLMN", "public_key": 5678 }
The www.example.com certificate is signed by the Z1 certificate, which is signed by the Root Sep21 certificate, which is signed by itself. A root certificate is always signed by itself. Now, let's add cross signing to this mix. I'm not going to touch any of the certificates we have so far; instead, I'm going to add:
// intermediate - 2nd
{ "name": "My Encrypt Authority Z1", "issuer": ["Their Encrypt Y4"], "sig": "HGYE", "public_key": 3456 }
// root - 2nd
{ "name": "Their Encrypt Y4", "issuer": ["Their Encrypt Y4"], "sig": "VBNA", "public_key": 9284 }
This is where things get interesting. You'll notice that the name of the intermediate certificate is the same in both cases, as well as the public key. Everything else is different (including validity periods, thumbprint, parent issuer, etc). And this works. But how does this work?

Look at the validate() code: what we actually need in order to verify the certificate is the same name (so we can look up the issuer certificate) and the same public key. We aren't using any of the other data of the certificate, so if we reuse the same name and public key in another certificate, we can establish another chain entirely from the same source certificate. The question that we have to answer now is simple: how do we select which chain to use? In the validate() code above, we simply select the first chain that has an element that matches. In the real world, we may actually have multiple certificates with the same public key in our system.

Take a look here: 8d02536c887482bc34ff54e41d2ba659bf85b341a0a20afadb5813dcfbcf286d. These are all certificates that have the same name and public key, issued at different times and by different issuers. We don't generally think about the separate components that make up a certificate, but it turns out to be a very useful property.

Note that "cross signed" here is a misnomer. Take a look at this image, which shows the status of the Let's Encrypt certificates as of August 2021. You can see that the R3 certificate is "signed" by two root certificates: DST Root CA X3 and ISRG Root X1. That isn't actually the case; there isn't a set of digital signatures on the R3 certificate, just a single one. What this means is that we can produce a different chain of trust from an existing certificate, without modifying it at all. For example, I can create a new R3 certificate, which is issued by Yours Truly CA, and which will validate just fine, as long as you trust the Yours Truly CA.
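To make the chain selection concrete, here is a toy sketch of walking up multiple candidate chains. The certificate store and names are hypothetical, mirroring the simplified JSON certificates above, and I'm assuming only Their Encrypt Y4 sits in the local root store (signature checks are elided, since trust here hinges on name and public key lookup):

```python
# All certificates known to the client, indexed by subject name. Note that
# "My Encrypt Authority Z1" appears twice: same name and public key, but
# different issuers - the cross signing scenario.
CERT_STORE = {
    "My Encrypt Authority Z1": [
        {"name": "My Encrypt Authority Z1", "issuer": "My Encrypt Root Sep21", "public_key": 3456},
        {"name": "My Encrypt Authority Z1", "issuer": "Their Encrypt Y4", "public_key": 3456},
    ],
    "My Encrypt Root Sep21": [
        {"name": "My Encrypt Root Sep21", "issuer": "My Encrypt Root Sep21", "public_key": 5678},
    ],
    "Their Encrypt Y4": [
        {"name": "Their Encrypt Y4", "issuer": "Their Encrypt Y4", "public_key": 9284},
    ],
}
TRUSTED_ROOTS = {"Their Encrypt Y4"}   # the only root in our local store

def lookup_cert_by_name(name):
    return CERT_STORE.get(name, [])

def build_chain(cert):
    """Return the first chain from cert up to a trusted root, or None."""
    if cert["issuer"] == cert["name"]:  # self-signed: the chain ends here
        return [cert] if cert["name"] in TRUSTED_ROOTS else None
    for candidate in lookup_cert_by_name(cert["issuer"]):
        rest = build_chain(candidate)   # try every certificate with that name
        if rest is not None:
            return [cert] + rest
    return None

leaf = {"name": "www.example.com", "issuer": "My Encrypt Authority Z1", "public_key": 1234}
chain = build_chain(leaf)
print([c["name"] for c in chain])
# ['www.example.com', 'My Encrypt Authority Z1', 'Their Encrypt Y4']
```

The first Z1 candidate leads to My Encrypt Root Sep21, which isn't trusted here, so the walk backtracks and uses the cross signed Z1 instead, reaching Their Encrypt Y4.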
For fun, I don't even need to have the secret key for the R3 certificate; all the information I need to generate the R3 certificate is public, after all. I wouldn't be able to generate valid leaf certificates, but I can add parents at will.

The question now becomes: how will a client figure out which chain to use? That is where things get really interesting. During TLS negotiation, when we get the server certificate, the server isn't going to send us just a single certificate. Instead, it is going to send us (typically) at least the server certificate it will use to authenticate the connection, as well as an issuer's certificate. The client will then (usually, but it doesn't have to) use the second certificate that was sent from the server to walk up the chain. Most certificate chains are fairly limited in size. For example, using the R3 certificate that I keep referring back to, you can see that the certificates it generates cannot properly sign child certificates (that is because its Path Length Constraint is zero; there is no more "room" for child certificates to also sign valid certs).

What will usually happen is that the client will use the second certificate to look up the actual root certificate in the local certificate authority store. That must be local, otherwise it wouldn't be trusted. So by sending both the certificate itself and its issuer from the server, we can ensure that the client will be able to do the whole certificate resolution without having to make an external call*.

* Not actually true: the client may decide to ignore the server's "recommendation", it will likely issue an OCSP call or CRL check to validate the certificate, etc.

I hope that this will help you make sense of how we are using cross signed certificates to add additional trust chains to an existing certificate.
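As a side note on the Path Length Constraint, here is a highly simplified sketch of that check. The dictionary-based certificates and field names are hypothetical (real X.509 path validation has many more rules); the point is only that a CA certificate with a path length of zero may issue leaves, but no further CA certificates may appear below it:

```python
def check_path_len(chain):
    """chain is ordered leaf first, root last; CA certs carry a 'pathlen' field."""
    for position, cert in enumerate(chain):
        pathlen = cert.get("pathlen")
        if pathlen is None:
            continue  # no constraint on this certificate
        ca_certs_below = position - 1  # CA certs between this cert and the leaf
        if ca_certs_below > pathlen:
            return False
    return True

ok_chain = [
    {"name": "www.example.com"},          # leaf
    {"name": "R3", "pathlen": 0},         # may sign leaves, but nothing below it
    {"name": "ISRG Root X1"},             # root (no constraint in this toy model)
]
bad_chain = [
    {"name": "www.example.com"},
    {"name": "Rogue Sub CA", "pathlen": 0},
    {"name": "R3", "pathlen": 0},         # one CA below it: violates pathlen 0
    {"name": "ISRG Root X1"},
]
print(check_path_len(ok_chain), check_path_len(bad_chain))  # True False
```

This is why my hypothetical Yours Truly CA trick only works for adding parents above R3; R3's own constraint stops anyone from inserting new CAs below it.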