Simpler XAML in .NET MAUI 10
by David Ortinau
posted on: June 26, 2025
Make your .NET MAUI XAML more consistent and easier to read with global and implicit XML namespaces.
by Andrew Lock
posted on: June 24, 2025
In this post looking at stacked branches, I describe how to handle scenarios such as merging one of your stacked branches and handling changes to main…
by Gérald Barré
posted on: June 23, 2025
Some applications display sensitive information. With Recall, Windows can take a screenshot of your screen regularly and store that sensitive information. You can exclude your application from this feature, and from screen capture more generally, by using SetWindowDisplayAffinity and the WDA_EXCLUDEFROMCAPTURE flag.
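The excerpt names the Win32 API directly; purely as an illustration (not code from the post), here is a minimal Python ctypes sketch of the same call, assuming you already have the window handle of the window you want to protect:

    import ctypes

    # Win32 constant: exclude the window from screen capture (Windows 10 2004+).
    WDA_EXCLUDEFROMCAPTURE = 0x00000011

    # Windows-only: load user32 so failures can be reported via GetLastError.
    user32 = ctypes.WinDLL("user32", use_last_error=True)

    def exclude_from_capture(hwnd: int) -> None:
        # SetWindowDisplayAffinity returns zero on failure.
        if not user32.SetWindowDisplayAffinity(hwnd, WDA_EXCLUDEFROMCAPTURE):
            raise ctypes.WinError(ctypes.get_last_error())

Once the affinity is set, capture tools running in other processes see the window blacked out or omitted from their output.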
by Oren Eini
posted on: June 20, 2025
You are assigned the following story:

    As a helpdesk manager,
    I want the system to automatically assign incoming tickets to available agents in a round-robin manner,
    so that tickets are distributed evenly and handled efficiently.

That sounds like a pretty simple task, right? Now, let's get to implementing this. A junior developer will read this story and realize that you need to know who the available agents are and who the last assigned agent was. Then you realize that you also need to handle more complex scenarios:

- What if you have a lot of available agents?
- What if we have two concurrent tickets at the same time?
- Where do you keep the last assigned agent?
- What if an agent goes unavailable and then becomes available again?
- How do you handle a lot of load on the system?
- What happens if we need to assign a ticket in a distributed manner?

There are answers to each one of those, mind you. It is just that round-robin distribution turns out to be really hard if you want to do it properly.

A junior developer will try to implement the story as written; maybe they know enough to recognize the challenges listed above, and if they are good, they will also be able to solve those issues. A senior developer, in my eyes, would write the following instead:

    from Agents where State = 'Available' order by random() limit 1

In other words, instead of trying to do "proper" round-robin distribution, with all its attendant challenges, we can achieve pretty much the same thing with far less hassle. The key difference here is that you need to challenge the requirements, because by changing what you need to do, you can greatly simplify your problem. You end up with a great solution that meets all the users' requirements (in contrast to what was written in the user story) and introduces almost no complexity.

A good way to do this, by the way, is to reject the story outright and talk to its owner: "You say round-robin here, can I do that randomly? It ends up being the same in the end." There may be a reason that mandates the round-robin nature, but if there is such a reason, I can absolutely guarantee that there are additional constraints that are not expressed in the round-robin description.

That aspect, challenging the problem itself, is a key part of what makes a senior developer more productive. Not just understanding the problem space, but reframing it to make it easier to solve while delivering the same end result.
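To make the contrast concrete (this is an illustration, not code from the post), here is a minimal Python sketch of the two approaches over a hypothetical in-memory agent pool. Notice how much state the round-robin version has to carry around, state that becomes shared mutable state the moment you have more than one assigner:

    import random
    from itertools import cycle

    # Hypothetical agent pool, just for illustration.
    AGENTS = ["alice", "bob", "carol"]

    # Round-robin: needs a durable cursor, and gets racy as soon as agents
    # come and go or multiple workers assign tickets concurrently.
    _rr_cursor = cycle(AGENTS)

    def assign_round_robin(available: set[str]) -> str:
        # Assumes available is a subset of AGENTS; skips unavailable agents.
        if not available:
            raise ValueError("no available agents")
        while True:
            agent = next(_rr_cursor)
            if agent in available:
                return agent

    # Random: no state to coordinate; the load evens out over many tickets.
    def assign_random(available: set[str]) -> str:
        if not available:
            raise ValueError("no available agents")
        return random.choice(sorted(available))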
by David Ortinau
posted on: June 17, 2025
Enhance your .NET MAUI app with photo-based AI by capturing images and extracting structured information using Microsoft.Extensions.AI.
by Andrew Lock
posted on: June 17, 2025
In this post I describe why I like to use stacked branches and stacked PRs for larger features, and how I handle making changes to commits in the stack…
by Gérald Barré
posted on: June 16, 2025
GitHub Actions doesn't provide a built-in way to rerun a run automatically. If you have some flakiness in your jobs, this can be a pain, as you have to rerun the workflow manually. Fortunately, you can use the workflow_run event to trigger a new workflow when a previous one fails. This allows you to re…
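The excerpt is cut off before the workflow itself, but the gist of the approach is a second workflow, triggered by workflow_run, that re-runs the failed run. As a rough sketch only (not the author's code), a Python step in that workflow could call GitHub's REST API to re-run just the failed jobs; the FAILED_RUN_ID variable name here is an assumption, and in practice the run id comes from the workflow_run event payload:

    import os
    import urllib.request

    # Hypothetical inputs: in a workflow_run-triggered job, the run id would
    # come from the event payload and the token from secrets.GITHUB_TOKEN.
    REPO = os.environ["GITHUB_REPOSITORY"]   # e.g. "owner/repo"
    RUN_ID = os.environ["FAILED_RUN_ID"]     # assumed variable name
    TOKEN = os.environ["GITHUB_TOKEN"]

    # Ask GitHub to re-run only the failed jobs of the previous run.
    url = f"https://api.github.com/repos/{REPO}/actions/runs/{RUN_ID}/rerun-failed-jobs"
    req = urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # 201 means the re-run was queued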
by Leslie Richardson
posted on: June 12, 2025
We recently introduced several new GitHub Copilot-powered .NET experiences designed to help you be more productive. Take a look!
by Oren Eini
posted on: June 11, 2025
Today's incident involved a production system failure when one node in the cluster unexpectedly died. That is a scenario RavenDB is designed to handle, and there are well-established (and well-trodden) procedures for recovery.

In this case, the failing node didn't just crash (which a restart would solve), but actually died. This meant that the admin had to provision a new server and add it to the cluster. This process is, again, both well-established and well-trodden. As you can tell from the fact that you are reading this post, something went wrong.

This cluster is primarily intended to host a single large database (100+ GB in size). When you add a new node to the cluster and add an existing database to it, we need to sync the state between the existing nodes and the new node. For large databases, that can take a while to complete, which is fine because the new node hasn't (yet) been promoted to serve users' requests. It is just slurping all the data until it is in complete sync with the rest of the system. In this case, however… somehow this rookie server got promoted to a full-blown member and started serving user requests.

This is not possible. I repeat, it is not possible. This code has been running in production for over a decade. It has been tested, it has been proven, it has been reviewed, and it has been modeled. And yet… it happened. This sucks.

This postmortem will dissect this distributed systems bug. Debugging such systems is pretty complex and requires specialized expertise, but this particular bug is surprisingly easy to reason about. Let's start from the beginning. Here is how the RavenDB cluster decides if a node can be promoted:

    def scan_nodes():
        states = {}
        for node in self.cluster.nodes:
            # retrieve the state of the node (remote call)
            # - may fail if node is down
            state = self.cluster.get_current_state(node)
            states[node] = state

        for database in self.cluster.databases:
            promotables = database.promotable_nodes()
            if len(promotables) == 0:
                # nothing to do
                continue

            for promotable in promotables:
                mentor = promotable.mentor_node()
                mentor_db_state = states[mentor].databases[database.name]
                if mentor_db_state.faulted:
                    # ignore mentor in faulty state
                    continue

                promotable_db_state = states[promotable].databases[database.name]
                if mentor_db_state.last_etag > promotable_db_state.current_etag:
                    continue

                # the promotable node is up to date as of the last check cycle, promote
                self.cluster.promote_node(promotable, database)

The overall structure is pretty simple: we ask each of the nodes in the cluster what its current state is. That gives us an inconsistent view of the system (because we ask different nodes at different times). To resolve this, we keep both the last and current values. In the code above, you can see that we go over all the promotable nodes and check the current state of each promotable node against the last state (from the previous call) of its mentoring node.

The idea is that we can promote a node when its current state is greater than the last state of its mentor (allowing some flexibility for constant writes, etc.). The code is simple, well-tested, and has been widely deployed for a long time. Staring at this code didn't tell us anything; it looks like it is supposed to work!

The problem with distributed systems is that there is also all the code involved that is not there. For example, you can see that there is handling here for when the mentor node has failed. In that case, another part of the code would reassign the promotable node to a new mentor, and we'll start the cycle again.

That was indeed the cause of the problem. Midway through the sync process for the new node, the mentor node failed. That is expected, as I mentioned, and handled. The problem was that there are various levels of failure. For example, it is very clear that a node that is offline isn't going to respond to a status request, right? What about a node that just restarted? It can respond, and for all intents and purposes, it is up & running - except that it is still loading its databases. Loading a database that exceeds the 100 GB mark can take a while, especially if your disk is taking its time.

In that case, what ended up happening was that the status check for the node passed with flying colors, and the status check for the database state returned a loading state. All the other fields in the database status check were set to their default values… I think you can see where this is going, right?

The problem was that we got a valid status report from a node and didn't check the status of the individual database state. Then we checked the progress of the promotable database against the mentor state (which was all set to default values). The promotable node's current etag was indeed higher than the last etag from the mentor node (since it was the default 0 value), and boom, we have a rookie server being promoted too soon.

The actual fix, by the way, is a single if statement to verify that the state of the database is properly loaded before we check the actual values. To reproduce this, even after we knew what was going on, was an actual chore, by the way. You need to hit just the right race conditions on two separate machines to get to this state, helped by a slow disk, a very large database, and two separate mistimed incidents of server failures.
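The post describes the fix only as "a single if statement". As a hedged sketch against the pseudocode above (the loaded field is an assumed name, not RavenDB's actual state model), the guard would sit just before the etag comparison:

    # Hypothetical guard mirroring the fix described above; "loaded" is an
    # assumed field name, not RavenDB's actual property.
    mentor_db_state = states[mentor].databases[database.name]
    if mentor_db_state.faulted:
        continue
    if not mentor_db_state.loaded:
        # the mentor reported a default/loading state, so its etags are
        # meaningless - skip this cycle instead of promoting
        continue

    promotable_db_state = states[promotable].databases[database.name]
    if mentor_db_state.last_etag > promotable_db_state.current_etag:
        continue

The point of the guard is that a default-initialized state must never be treated as "caught up": when in doubt, do nothing and let the next scan cycle decide.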
by David Ortinau
posted on: June 11, 2025
Learn how to enhance your .NET MAUI apps with multimodal AI capabilities, enabling users to interact through voice using plugins and Microsoft.Extensions.AI.