Introduction In the exhilarating infancy stages of a software development project, teams are marked by agility, prompt decision-making, and…
When you use a record, you can create a new instance by using the new keyword, or you can copy an instance with some modifications by using a with expression (non-destructive mutation). The with expression copies all fields from the original instance and then applies the modifications.
var john = new S
You can listen to me talk to Carl & Richard on RavenDB Sharding here.
What is data sharding, and why do you need it? Carl and Richard talk to Oren Eini about his latest work on RavenDB, including the new data sharding feature. Oren talks about the power of sharding a database across multiple servers to improve performance on massive data sets. While a sharded database is typically in a single data center, it is possible to distribute the shards across multiple locations. The conversation explores the advantages and disadvantages of the different approaches, including that you might not need it today, but it's great to know it's there when you do!
This episode was recorded a while ago, and just went live.
When you use embedded resources in a .NET project, the resource name is computed from the path of the file. By default, it uses a format similar to <Assembly Name>.<File Path>. But the file path doesn't contain a path separator (/ or \). Instead, the path separator is replaced by a dot
In the previous post, I showed a very simple request router that would always select the fastest node. That worked for a long while, until it didn’t, and the challenge is figuring out why.
As it turns out, the issue is a simple one of spooky action at a distance. Here is what happens. Assume that we have three servers and 10 clients. Each server is sized to handle 4 clients. So far, so good: the system has capacity to spare.
The problem is in the manner in which clients detect which is the fastest node in the cluster. The only thing that is considered is the state of each node at the time of selection. At that time, we may end up with all the clients selecting one particular node as the fastest.
In other words, we have three servers, two of them have no clients talking to them and one of the servers has all the clients talking to it. That results in that node going down, obviously. The clients would then react appropriately, and select a new node to talk to. All of them would do that, find the fastest node, and… bring it down as well. Rinse & repeat.
The issue can be described as a time-of-check to time-of-use (TOCTOU) problem, but also as a race condition, in which the individual clients end up performing a synchronized “wave” that kills the system.
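To make the failure mode concrete, here is a minimal simulation sketch of the scenario above. The server names, the latency figures, and the assumption that an overloaded server goes down and does not come back during the run are all hypothetical and chosen only for illustration.

```python
CAPACITY = 4  # clients a single server can handle, as in the example above


def pick_fastest(latency_ms, alive):
    """The rule every client applies: lowest probed latency among live servers."""
    return min(alive, key=lambda server: latency_ms[server])


def simulate_herd(latency_ms, clients=10, capacity=CAPACITY):
    """Return (server, load) for each round until no server is left or the load fits."""
    alive = set(latency_ms)
    history = []
    while alive:
        # Every client probes at the same moment and sees the same snapshot,
        # so every client reaches the same conclusion.
        target = pick_fastest(latency_ms, alive)
        history.append((target, clients))
        if clients <= capacity:
            break  # the chosen server can cope, the system is stable
        alive.discard(target)  # overloaded: the server goes down (assumed not to recover)
    return history


# Hypothetical latencies; total capacity is 3 * 4 = 12, enough for 10 clients.
print(simulate_herd({"a": 10, "b": 12, "c": 15}))
```

Even though the cluster as a whole has room for all 10 clients, each round sends all of them to a single server, so the servers fall over one after another.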
How do you prevent this?
You introduce randomness into the system. You don’t test the status just once, but re-check on a regular basis so you can respond to shifting load. You also randomize the process itself, so the clients won’t all do this at exactly the same time and end up in the same position.
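One way to sketch both ideas, assuming each client already holds a map of probed latencies per node. The function names, the `slack` factor, and the `jitter` fraction are hypothetical choices for illustration, not part of the original router.

```python
import random


def pick_node(latency_ms, rng, slack=1.5):
    """Pick at random among the nodes whose latency is within `slack` times the best.

    Choosing among the "fast enough" nodes instead of the single fastest one
    keeps clients with identical measurements from piling onto one server.
    """
    best = min(latency_ms.values())
    candidates = sorted(url for url, ms in latency_ms.items() if ms <= best * slack)
    return rng.choice(candidates)


def next_check_delay(base_seconds, rng, jitter=0.5):
    """Delay before the next re-check: the base interval, plus or minus up to `jitter`."""
    return base_seconds * (1 + rng.uniform(-jitter, jitter))


# Hypothetical measurements: "a" and "b" are close, "c" is much slower.
latencies = {"a": 10, "b": 12, "c": 40}
rng = random.Random()
print(pick_node(latencies, rng), next_check_delay(30, rng))
```

In an asyncio client, the re-check could run as a background loop that sleeps for `next_check_delay(...)` seconds, probes the nodes again, and then calls `pick_node(...)`. Because every client draws its own delay and its own choice, the clients drift apart instead of moving as one wave.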
Side note: The current state in Israel right now is bad. I’m writing this blog post as a form of escapism so I can talk about something that makes sense and follows logic and reason. I’ll not comment on the current status otherwise in this area.
Consider the following scenario. We have a bunch of servers and clients. The clients want to send requests for processing to the fastest node that they have available. But the algorithm that was implemented has an issue. Can you see what it is?
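import asyncio
from typing import List

import aiohttp  # assumption: `session` in the original is an aiohttp.ClientSession


class Router:
    def __init__(self, session: aiohttp.ClientSession, urls: List[str]):
        # Must be constructed inside a running event loop (create_task needs one).
        self.session = session
        self.urls = urls
        self.fastest = asyncio.create_task(self._SelectFastest())

    async def Request(self, path):
        url = await self.fastest
        try:
            async with self.session.get(url + path) as response:
                response.raise_for_status()  # Raise an exception for non-2xx responses
                return await response.text()
        except Exception:
            # on error - find the next fastest node (the failed request returns None)
            self.fastest = asyncio.create_task(self._SelectFastest())

    async def _SelectFastest(self):
        # Probe every server at once; the first probe to finish decides the winner.
        tasks = [asyncio.create_task(self._ContactServer(url)) for url in self.urls]
        try:
            done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.result() is not None:
                    return task.result()  # returns the url
        except Exception:
            pass  # ignore errors
        finally:
            for task in tasks:
                task.cancel()  # stop the slower probes
        return self.urls[0]  # returns the first otherwise

    async def _ContactServer(self, url, path=""):
        async with self.session.get(url + path) as response:
            response.raise_for_status()  # Raise an exception for non-2xx responses
            return url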
To simplify things, we are going to assume that the work that is being done for each request is the same, so we don’t need to worry about different request workloads.
The idea is that each client node will find the fastest node (usually meaning the nearest one), and if there is enough load on that server for it to start throwing errors, the client will try to find another one. This system successfully spread the load across all servers until, one day, the entire system went down. And then it stayed down.
Can you figure out what the issue is?