Relatively General .NET

International money transfers, sanctions and utter stupidity

by Oren Eini

posted on: February 15, 2021

I have a consultant that did some work for me. While the majority of our people are working in Israel and Poland, we actually have people working for us all over the map. The consultant submitted their invoice at the end of the month and we sent a wire transfer to the provided account. So far, pretty normal and business as usual. We do double verification of account details, to avoid common scams, by the way, so we know that the details we sent were correct. Except… the money never arrived. When we inquired, it turned out that the money transfer was reversed. The reason why? This is the address that the consultant provided (not the actual address, mind, but it has the same issue):

Mr. Great Consultant
1234 Cuba Avenue
Alta Vista, Ottawa, K1G 1L7
Canada

The wire transfer was flagged as a potential international sanctions violation and refused. That was… very strange. It appears that someone saw Cuba in the address, decided that this was a problem and refused the transfer. I’m not sure whether I would rather this be the case of an overactive regex or a human not applying critical thinking.

We are now on week two of trying to resolve this with the bank and it is quite annoying. Next port of call: buying Monero on the dark web…
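The post doesn’t show the bank’s screening logic, of course, but a naive word match is enough to reproduce the failure mode. A minimal, purely hypothetical TypeScript sketch (the list and function are made up for illustration):

    // Hypothetical sketch of naive sanctions screening: flag any address
    // that contains a sanctioned country name as a standalone word.
    const sanctionedCountries = ["cuba", "iran", "north korea", "syria"];

    function isFlagged(address: string): boolean {
      return sanctionedCountries.some((country) =>
        new RegExp(`\\b${country}\\b`, "i").test(address)
      );
    }

    // "Cuba Avenue" in Ottawa trips the same rule as an address in Cuba:
    console.log(isFlagged("1234 Cuba Avenue, Alta Vista, Ottawa, Canada")); // true

Anything street-name-aware (or a human reading the full address) would pass this transfer; a bare country-name match cannot.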

Packaging a Roslyn Analyzer with NuGet package references

by Gérald Barré

posted on: February 15, 2021

This post is part of the series 'Roslyn Analyzers'. Be sure to check out the rest of the blog posts of the series!

- Writing a Roslyn analyzer
- Writing language-agnostic Roslyn Analyzers using IOperation
- Working with types in a Roslyn analyzer
- Referencing an analyzer from a project
- Packaging a Roslyn Analyzer with NuGet package references (this post)

Keep Tests Short and DRY with Extension Methods

by Ardalis

posted on: February 10, 2021

Today as I was writing functional tests for API endpoints again I created some helpers to assist with the boilerplate code involved in such… Keep Reading →
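The post itself is about C# extension methods on HttpClient; as a rough TypeScript analogue of the same idea (all names here are invented for the example), a helper can hide the request/check/deserialize boilerplate that otherwise repeats in every functional test:

    // Hypothetical helper in the spirit of the post: wrap the repeated
    // "request, check status, deserialize" steps of a functional test.
    async function getAndDeserialize<T>(baseUrl: string, path: string): Promise<T> {
      const response = await fetch(`${baseUrl}${path}`);
      if (!response.ok) {
        throw new Error(`GET ${path} returned ${response.status}`);
      }
      return (await response.json()) as T;
    }

    // A test body then shrinks to a single, intention-revealing line:
    // const orders = await getAndDeserialize<Order[]>(baseUrl, "/api/orders");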

The performance degradation in the load testing tool

by Oren Eini

posted on: February 08, 2021

During a benchmark run on a large dataset, I started to notice that longer benchmarks were showing decidedly worse numbers than short ones. In other words, a benchmark that is run for 1 minute shows orders of magnitude higher latencies than a benchmark that is run for 30 seconds. And the longer the benchmark, the worse things got. That raised a lot of red flags, and spawned a pretty serious investigation. We take performance very seriously and the observed behavior was that we were getting slower over time. We suspected a leak, a high number of GC calls, or memory fragmentation. The scenario under test was a web application using the RavenDB API to talk to RavenDB. We ran both the web application and the server under profilers and found a few hot spots, but nothing really major. There was no smoking gun.

Then we noticed that the load testing machine was sitting there with 100% CPU. I initially thought that this was us generating too much load for the machine, but that wasn’t it. We are using wrk2, which is capable of handling millions of requests per second. We were generating the requests dynamically using a Lua script, and in one of the scenarios under test, we have code like this:

    path = "/orders/user/" .. page * pageSize .. "/" .. pageSize .. "/?userId=" .. item.id .. "&deep=y"

That isn’t the most optimal way to do things, I’ll admit. We can do better by using something like table.concat(), but the problem was that regardless of how you build the string, this is supposed to be fairly cheap. The wrk2 project is using LuaJIT, which has a reputation as a really fast scripting system. I never really thought that this would be a problem. Sure, it is a little wasteful, but it isn’t too bad, a few string temporaries and maybe some realloc() calls, but nothing major.

Instead, this resulted in us getting far worse results over time. It took a while to actually figure out why, but the root cause is in the way LuaJIT handles string hashing:

    a = lj_getu32(str);
    h ^= lj_getu32(str+len-4);
    b = lj_getu32(str+(len>>1)-2);
    h ^= b; h -= lj_rol(b, 14);
    b += lj_getu32(str+(len>>2)-1);

Strings in Lua are interned, which means that there is just a single copy of each string value. That means that hashing is important, but the way LuaJIT does hashing is to take the first 4 bytes, the last 4 bytes and the 4 bytes in the middle and use those for the hash. And that is it. If you have a bunch of strings where those 3 locations match… well, welcome to hash collisions. At which point, what is supposed to be an O(1) call becomes an O(N) call, and creating strings will turn the operation into an O(N^2) operation!

Here is the reproduction code:

    local s = 0
    local prefix = nil -- change to '' for perf boost
    for i=1,1000000,1 do
        local f = string.format('aaaa-%s-%6d-zzzz', prefix, i)
        s = s + #f -- avoid elimination of f
    end

Change the prefix to be an empty string for a major performance boost. The actual bug is well known (5 or 6 years), but it was only recently fixed, and not on the version that wrk2 is using. We had to toss out the entire benchmarking set and start over because of this. We were generating requests with random data, so some of them would hit this problem hard, and some would avoid it by magic.
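To make the failure mode concrete, here is a small TypeScript sketch of my own (an illustration of the sampling idea, not LuaJIT’s actual code) showing why strings that agree at the sampled positions all collide:

    // Illustrative sketch: a hash that, like the old LuaJIT scheme, only
    // looks at the first, last, and middle 4 bytes of the string.
    function sampledHash(s: string): number {
      const word = (i: number) => {
        let w = 0;
        for (let j = 0; j < 4; j++) {
          w = (w * 31 + s.charCodeAt(i + j)) | 0;
        }
        return w;
      };
      const len = s.length;
      let h = word(0);                    // first 4 bytes
      h = (h ^ word(len - 4)) | 0;        // last 4 bytes
      h = (h ^ word((len >> 1) - 2)) | 0; // 4 bytes around the middle
      return h >>> 0;
    }

    // Strings that agree at those three spots land in the same bucket,
    // no matter how much the rest of the string differs:
    const collisions = new Map<number, number>();
    for (let i = 0; i < 1000; i++) {
      // same prefix/suffix, space-padded counter away from the middle
      const key = sampledHash(`aaaa-nil-${String(i).padStart(6, " ")}-zzzz`);
      collisions.set(key, (collisions.get(key) ?? 0) + 1);
    }
    console.log(collisions.size); // far fewer distinct hashes than strings

With interning, every one of those colliding strings has to be compared against the whole chain before a new entry can be created, which is exactly how a cheap string concatenation in the benchmark script became the bottleneck.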
I was not expecting to debug hash collisions in Lua code while trying to get some performance numbers from overloading RavenDB. Quite random, literally.

Correctly converting a character to lower/upper case

by Gérald Barré

posted on: February 08, 2021

This post is part of the series 'Strings in .NET'. Be sure to check out the rest of the blog posts of the series!

- String comparisons are harder than it seems
- How to correctly count the number of characters of a string
- Correctly converting a character to lower/upper case (this post)
- How not to read a st…

Dream Big: Three Months in at Elastic

by Steve Gordon

posted on: February 05, 2021

I recently passed my probation period at Elastic which, of course, I’m remarkably pleased about! In this post, I wanted to attempt three things. Firstly, to encourage every developer out there to dream big and realise that you can accomplish anything you put your mind to. I’m now in a dream role and if I […]

Building a social media platform without going bankrupt

by Oren Eini

posted on: February 05, 2021

Unless I get good feedback / questions on the other posts in the series, this is likely to be the last post on the topic. I was trying to show what kind of system and constraints you have to deal with if you wanted to build a social media platform without breaking the bank.

I talked about the expected numbers that we have for the system, and then set out to explain each part of it independently. Along the way, I was pretty careful not to mention any one particular technological solution. We are going to need:

- Caching
- Object storage (S3-compatible API)
- Content Delivery Network
- Key/value store
- Queuing and worker infrastructure

Note that the whole thing is generic and there are very few constraints on the architecture. That is by design, because if your architecture can hit the lowest common denominator, you aren’t tied to a particular provider and have a lot more freedom. For that matter, you can likely set things up so you can have multiple disparate providers without too much of a hassle.

My goal with this system was to be able to accept 2,500 posts per second and to handle reads of 250,000 per second. This sounds like a lot, but most of the load is meant to be handled by the CDN and the infrastructure, not the core servers. Caching in a social network is somewhat problematic, since a lot of the work is obviously personalized. That said, there is still quite a lot that can be cached, especially the more popular posts and threads. If we assume that only about 10% of the reading load hits our servers, that is 25,000 reads per second. If we have just 25 servers handling this (five each in five separate data centers), each server needs to accept 1,000 requests per second. On the one hand, that is a lot, but on the other hand… most of the cost is supposed to be about authorization, minor logic, etc. We can also at this point add more application servers and scale linearly. Just to give some indication of costs, a dedicated server with 8 cores & 32 GB will cost $100 a month, and there is no charge for traffic. Assuming that I’m running 25 of these, that will cost me $2,500 a month. I can safely double or triple that amount without much trouble, I think. Having to deal with 1,000 requests per second per server is something that requires paying attention to what you are doing, but it isn’t really that hard, to be frank. RavenDB can handle more than a million queries a second, for example.

One thing that I didn’t touch on, however, which is quite important, is the notion of whales. In this case, a whale is a user that has a lot of followers. Let’s take Mr. Beat as an example: he has 15 million followers and is a prolific poster. In our current implementation, we’ll need to add to the timelines of all his followers every time that he posts something. Mrs. Bold, on the other hand, has 12 million followers. At one time Mr. Beat and Mrs. Bold got into a post fight. It went like this:

Mr. Beat: I think that Mrs. Bold has a Broccoli’s bandana.
Mrs. Bold: @mrBeat How dare you, you sniveling knave
Mr. Beat: @boldMr2 I dare, you green teeth monster
Mrs. Bold: @mrBeat You are a yellow belly deer
Mr. Beat: @boldMr2 Your momma is a dear

This incredibly witty post exchange happened during a three-minute span.
Let’s consider what this will do, given the architecture that we outlined so far:

- Post #1: written to 15 million timelines.
- Posts #2-5: written to the timelines of everyone that follows both of them (mention), let’s call that 10 million each.

That is 55 million timeline writes to process within the span of a few minutes. If other whales also join in (and they might), the number of writes we’ll have to process will skyrocket.

Instead, we are going to take advantage of the fact that only a small number of accounts are actually followed by many people. We’ll place the limit at 10,000 followers; past that point, we’ll no longer process writes for such accounts. Instead, we’ll place the burden on the client’s side. The code for showing the timeline then becomes something like this:

    async function load_timeline(user_id: string) {
      const user = await users.get(user_id);
      const promises = [timelines.get_first(user.timelines["self"])];
      for (const whale of user.whales) {
        promises.push(timelines.get_first(whale));
      }
      const all = await Promise.all(promises);
      const timeline = [].concat(...all); // flatten the array of arrays
      timeline.sort(); // remember, post ids are semi sortable
      return timeline;
    }

In other words, we record the high profile users in the system and instead of doing the work for them on write, we do it on read. The benefit of doing it in this manner is that the high profile users’ timeline reads will have very high cache utilization. Given that the number of high profile people you’ll follow is naturally limited, that can save quite a lot of work.

The code above can be improved, of course. There is usually a lot of variance in the timeline posts, so we may have a high profile user that is quiet for a day or two; they shouldn’t show up in the current timeline and can be skipped entirely. You need to do a bit more work around the time frames as well, which means that the timeline should also allow us to query it by most recent post id, but that is also not too hard to implement.

And with that, we are at the end. I think that I covered quite a few edge cases and interesting details, and hopefully that was interesting for you to read. As usual, I really appreciate any and all feedback.
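The post only shows the read side. For completeness, here is a hedged sketch of what the matching write side might look like, reusing the hypothetical users/timelines API from the gist above; the 10,000-follower cutoff is the one from the post, everything else (followers_count, followers_of, append) is my assumption:

    // Hypothetical write path: fan out to follower timelines only for
    // normal accounts; whales are resolved at read time instead.
    const WHALE_FOLLOWER_LIMIT = 10_000;

    async function publish_post(author_id: string, post_id: string) {
      const author = await users.get(author_id);

      // Always append to the author's own timeline.
      await timelines.append(author.timelines["self"], post_id);

      if (author.followers_count > WHALE_FOLLOWER_LIMIT) {
        // Whale: skip the fan-out entirely. Readers that follow this
        // account pull from its timeline directly in load_timeline().
        return;
      }

      // Normal account: append to each follower's timeline (in practice
      // this would go through the queuing/worker infrastructure above).
      for await (const follower of users.followers_of(author_id)) {
        await timelines.append(follower.timelines["self"], post_id);
      }
    }

Under this split, the Mr. Beat / Mrs. Bold exchange costs five cheap writes instead of 55 million, at the price of a handful of extra (highly cacheable) timeline reads per follower.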