Relatively General .NET

Meta Blog

by Oren Eini

posted on: January 17, 2024

I've been writing this blog since 2004. That means I have been doing this for twenty years, which is frankly unbelievable to me. The actual date is sometime in April, so I’ll probably do a summary post about that then. What I want to talk about today is a different aspect: the mechanisms and processes I use to write blog posts. A large part of the reason I write blog posts is that it helps me understand and organize my own thoughts. And in order to do that effectively, I have found that I need very little friction in the blogging process.

About a decade ago, Google Reader was shut down, and I’m still very bitter about that. It effectively killed a significant portion of the blogging audience and made the ergonomics of reading blogs a lot harder. That also led people to use walled gardens to communicate with others, instead of the decentralized network and feed aggregators. A side effect of that decision is that blogging tools have stopped being a viable thing people spend time or money on.

At the time, I was using Windows Live Writer, which was a high-quality editor and had a rich plugin system. Microsoft discontinued it at some point; it became an open-source project, and even that died. The website is no longer functional, and even the GitHub project saw its last commit 5 years ago.

I’m still using Open Live Writer to write the majority of my blog posts, but given that there are no longer any plugins, even something as simple as embedding code in my posts has become an… annoyance. That kills the ergonomics of blogging for me.

Not a problem, this is Open Source, and I can do that myself. Except… I really don’t have the time to spend on something ancillary like that. I would happily pay (a reasonable amount) for a blogging client, but I’m going to assume that I’m not part of a large enough group that there is a market for this.

Taking the code snippets example, I can go into the code, figure out what is going on there, and add a “code snippet” feature. I estimate that would take several days. Alternatively, I can place the code as a GitHub gist and embed it in the page. It is annoying, but far quicker than going to the trouble of figuring that out.

Another issue that bugs me (pun intended) is a problem with copy/paste of images, where taking screenshots using the Snipping Tool doesn’t paste into Writer. I need to first paste them into Paint, then into Writer. In this case, I assume that Writer doesn’t recognize the clipboard format or something similar.

Finally, it turns out that I’m not writing blog posts in the same manner as I used to. It got to the point where I asked people to review my posts before making them public. It turns out that no matter how many times it is corrected, my brain seems unable to discern when to write “whether” or “whatever”, for example. At this point I gave up on updating that piece of software 🙂. Even the use of emojis doesn’t work properly (Open Live Writer mostly predates a lot of them and breaks the HTML in a weird fashion 🤷).

In other words, there are several problems in my current workflow, and it has finally reached the point where I need to do something about it. The last requirement, by the way, is the most onerous. Consider the workflow of getting the following fixes into a blog post:

and we run => and we ran
we spend => we spent

Where is my collaborative editing and the ability to suggest changes with good UX? Improving the ergonomics for the blog has just expanded in scope massively.
Now it is a full-fledged publishing platform with modern sensibilities. It’s 2024; features like proper spelling and grammar corrections should absolutely be there, no? And what about AI integration? It turns out that predicting text makes the writing process more efficient. Here is what this may look like:

At this stage, this isn’t just a few minor fixes. I should mention that for the past decade and a half or so, I stopped considering myself someone who can do UI in any meaningful manner. I find that the <table/> tag, which used to be my old reliable method, is not recommended now, for some reason.

This… kind of sucks. I want to upgrade my process by a couple of decades, but I don’t want to pay the price for that. If only there was an easier way to do that.

I started using Google Docs to edit my blog posts, then pasting them into Live Writer or directly into the blog (using a Rich Text Box with an editor from… a decade ago). I had to check the source code for this, by the way. The entire experience is decidedly Developer UX.

Then I had a thought: I already have a pretty good process of writing the blog posts in Google Docs, right? It handles rich text editing and management much better than the editor in the blog. There are also options for things like proper workflows. For example, someone can go over my drafts and make comments or suggestions.

The only thing that I need is to put both of those together. I have to admit that I spent quite some time just trying to figure out how to get the document from Google Docs using code. The authentication hurdles are… significant to someone who isn’t aware of how it all plugs together. Once I got that done, I got my publishing platform with modern features. Here is what the end result looks like:

public class PublishingPlatform
{
    private readonly DocsService GoogleDocs;
    private readonly DriveService GoogleDrive;
    private readonly Client _blogClient;

    public PublishingPlatform(string googleConfigPath, string blogUser, string blogPassword)
    {
        var blogInfo = new MetaWeblogClient.BlogConnectionInfo(
            "https://ayende.com/blog",
            "https://ayende.com/blog/Services/MetaWeblogAPI.ashx",
            "ayende.com", blogUser, blogPassword);
        _blogClient = new MetaWeblogClient.Client(blogInfo);

        var initializer = new BaseClientService.Initializer
        {
            HttpClientInitializer = GoogleWebAuthorizationBroker.AuthorizeAsync(
                GoogleClientSecrets.FromFile(googleConfigPath).Secrets,
                new[] { DocsService.Scope.Documents, DriveService.Scope.DriveReadonly },
                "user",
                CancellationToken.None,
                new FileDataStore("blog.ayende.com")
            ).Result
        };

        GoogleDocs = new DocsService(initializer);
        GoogleDrive = new DriveService(initializer);
    }

    public void Publish(string documentId)
    {
        using var file = GoogleDrive.Files.Export(documentId, "application/zip").ExecuteAsStream();
        var zip = new ZipArchive(file, ZipArchiveMode.Read);

        var doc = GoogleDocs.Documents.Get(documentId).Execute();
        var title = doc.Title;

        var htmlFile = zip.Entries.First(e => Path.GetExtension(e.Name).ToLower() == ".html");
        using var stream = htmlFile.Open();
        var htmlDoc = new HtmlDocument();
        htmlDoc.Load(stream);
        var body = htmlDoc.DocumentNode.SelectSingleNode("//body");

        var (postId, tags) = ReadPostIdAndTags(body);

        UpdateLinks(body);
        StripCodeHeader(body);
        UploadImages(zip, body, GenerateSlug(title));

        string post = GetPostContents(htmlDoc, body);

        if (postId != null)
        {
            _blogClient.EditPost(postId, title, post, tags, true);
            return;
        }

        postId = _blogClient.NewPost(title, post, tags, true, null);

        var update = new BatchUpdateDocumentRequest();
        update.Requests = [new Request
        {
            InsertText = new InsertTextRequest
            {
                Text = $"PostId: {postId}\r\n",
                Location = new Location
                {
                    Index = 1,
                }
            },
        }];

        GoogleDocs.Documents.BatchUpdate(update, documentId).Execute();
    }

    private void StripCodeHeader(HtmlNode body)
    {
        foreach (var remove in body.SelectNodes("//span[text()='&#60419;']").ToArray())
        {
            remove.Remove();
        }
        foreach (var remove in body.SelectNodes("//span[text()='&#60418;']").ToArray())
        {
            remove.Remove();
        }
    }

    private static string GetPostContents(HtmlDocument htmlDoc, HtmlNode body)
    {
        // we use the @scope element to ensure that the document style doesn't "leak" outside
        var style = htmlDoc.DocumentNode.SelectSingleNode("//head/style[@type='text/css']").InnerText;
        var post = "<style>@scope {" + style + "}</style> " + body.InnerHtml;
        return post;
    }

    private static void UpdateLinks(HtmlNode body)
    {
        // Google Docs put a redirect like: https://www.google.com/url?q=ACTUAL_URL
        foreach (var link in body.SelectNodes("//a[@href]").ToArray())
        {
            var href = new Uri(link.Attributes["href"].Value);
            var url = HttpUtility.ParseQueryString(href.Query)["q"];
            if (url != null)
            {
                link.Attributes["href"].Value = url;
            }
        }
    }

    private static (string? postId, List<string> tags) ReadPostIdAndTags(HtmlNode body)
    {
        string? postId = null;
        var tags = new List<string>();
        foreach (var span in body.SelectNodes("//span"))
        {
            var text = span.InnerText.Trim();
            const string TagsPrefix = "Tags:";
            const string PostIdPrefix = "PostId:";
            if (text.StartsWith(TagsPrefix, StringComparison.OrdinalIgnoreCase))
            {
                tags.AddRange(text.Substring(TagsPrefix.Length).Split(","));
                RemoveElement(span);
            }
            else if (text.StartsWith(PostIdPrefix, StringComparison.OrdinalIgnoreCase))
            {
                postId = text.Substring(PostIdPrefix.Length).Trim();
                RemoveElement(span);
            }
        }
        // after we removed post id & tags, trim the empty lines
        while (body.FirstChild.InnerText.Trim() is "&nbsp;" or "")
        {
            body.RemoveChild(body.FirstChild);
        }
        return (postId, tags);
    }

    private static void RemoveElement(HtmlNode element)
    {
        do
        {
            var parent = element.ParentNode;
            parent.RemoveChild(element);
            element = parent;
        } while (element?.ChildNodes?.Count == 0);
    }

    private void UploadImages(ZipArchive zip, HtmlNode body, string slug)
    {
        var mapping = new Dictionary<string, string>();
        foreach (var image in zip.Entries.Where(x => Path.GetDirectoryName(x.FullName) == "images"))
        {
            var type = Path.GetExtension(image.Name).ToLower() switch
            {
                ".png" => "image/png",
                ".jpg" or ".jpeg" => "image/jpg",
                _ => "application/octet-stream"
            };
            using var contents = image.Open();
            var ms = new MemoryStream();
            contents.CopyTo(ms);
            var bytes = ms.ToArray();
            var result = _blogClient.NewMediaObject(slug + "/" + Path.GetFileName(image.Name), type, bytes);
            mapping[image.FullName] = new UriBuilder { Path = result.URL }.Uri.AbsolutePath;
        }
        foreach (var img in body.SelectNodes("//img[@src]").ToArray())
        {
            if (mapping.TryGetValue(img.Attributes["src"].Value, out var path))
            {
                img.Attributes["src"].Value = path;
            }
        }
    }

    private static string GenerateSlug(string title)
    {
        var slug = title.Replace(" ", "");
        foreach (var ch in Path.GetInvalidFileNameChars())
        {
            slug = slug.Replace(ch, '-');
        }
        return slug;
    }
}

You’ll probably not appreciate this, but the fact that I can just push code like that into the document and get it with proper formatting easily is a major lifestyle improvement from my point of view. The code works with the document in two ways. First, through the Document DOM (which is quite complex), it extracts the title of the blog post and afterward updates it with the document ID.
But the core of this code is to extract the document as a zip file, grab everything from there, and push that to the blog. I do some editing of the HTML to get everything set up properly, mostly fixing the links and uploading the images. There is also some stuff happening with CSS scopes that I frankly don’t understand. I think I got it right, which is fine for now.

This cost me a couple of evenings, and it was fun. Nothing earth-shattering, I’ll admit. But it’s the first time in a while that I actually wrote a piece of code that was immediately useful. My blogging queue is rather full, and I hope that with this new process it will be easier to push the ideas out of my head and onto the blog.

And with that, it is now 01:26 AM, and I’m going to call it a night 🙂.

And as a final thought: I just made several changes to the post after publication, and it went smoothly. I think that I like it.
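As a small addendum (not part of the original post): driving a class like the one above takes only a few lines. A minimal sketch, where the secrets file name, environment variable names, and the idea of passing the document ID as args[0] are all placeholders:

// Hypothetical driver for the PublishingPlatform class above.
var platform = new PublishingPlatform(
    googleConfigPath: "google-client-secrets.json",
    blogUser: Environment.GetEnvironmentVariable("BLOG_USER")!,
    blogPassword: Environment.GetEnvironmentVariable("BLOG_PASSWORD")!);

// The document ID is the long token in the Google Docs URL.
platform.Publish(args[0]);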

Authenticating a .NET GitHub App using a JSON Web Token (JWT)

by Steve Gordon

posted on: January 15, 2024

SERIES: A Guide to Developing GitHub Apps on .NET

In this post, I cover the steps required to create and sign a JSON Web Token, herein abbreviated as JWT, to authenticate a GitHub App built using .NET. I want to state clearly up front that I’m learning as I go while experimenting with a hobby […]
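The excerpt above is only a teaser; as context, authenticating as a GitHub App comes down to an RS256-signed JWT whose issuer is the App ID, with a lifetime GitHub caps at ten minutes. A rough sketch of one way to do that (not taken from the series; the App ID and key file name are placeholders), assuming a modern .NET console app with implicit usings and the System.IdentityModel.Tokens.Jwt package:

using System.IdentityModel.Tokens.Jwt;
using System.Security.Cryptography;
using Microsoft.IdentityModel.Tokens;

// Placeholders: your GitHub App ID and the private key (.pem) downloaded from GitHub.
var appId = "123456";
var privateKeyPem = File.ReadAllText("my-github-app.private-key.pem");

using var rsa = RSA.Create();
rsa.ImportFromPem(privateKeyPem);

var now = DateTime.UtcNow;
var handler = new JwtSecurityTokenHandler();
var token = handler.CreateJwtSecurityToken(
    issuer: appId,                     // "iss" is the App ID
    notBefore: now.AddSeconds(-60),    // small allowance for clock drift
    expires: now.AddMinutes(10),       // GitHub caps the JWT lifetime at 10 minutes
    issuedAt: now,
    signingCredentials: new SigningCredentials(
        new RsaSecurityKey(rsa), SecurityAlgorithms.RsaSha256));

var jwt = handler.WriteToken(token);
// Send it as "Authorization: Bearer {jwt}" when calling the GitHub API as the App.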

Recording

by Oren Eini

posted on: January 15, 2024

I spoke with Jaime recently in the Modern .NET Podcast:

In this episode of The Modern .NET Show podcast, Oren Eini, a seasoned developer with over 20 years of experience in the .NET field, discussed the evolution of the .NET framework and the complexities that come with it. Eini highlighted the rapid pace of change in the language, from the introduction of generics at version 2.0 to switch expressions and pattern matching in the latest versions. While these new features allow for more concise code, Eini acknowledged that they also increase the scope and complexity of learning C# from scratch.

Would love to hear your feedback.
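As a tiny illustration of the conciseness mentioned above (my own example, not from the episode), here is classification logic written as a single switch expression with type, property, and null patterns:

static string Describe(object? value) => value switch
{
    null                 => "nothing",
    int n when n < 0     => "a negative number",
    int n                => $"the number {n}",
    string { Length: 0 } => "an empty string",
    string s             => $"a string of length {s.Length}",
    DateTime d           => $"a date: {d:yyyy-MM-dd}",
    _                    => "something else"
};

Console.WriteLine(Describe(42));   // the number 42
Console.WriteLine(Describe(""));   // an empty string
Console.WriteLine(Describe(null)); // nothing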

Difference between CultureInfo.Get and new CultureInfo

by Gérald Barré

posted on: January 15, 2024

When you want to get a CultureInfo object, you can use the static method CultureInfo.GetCultureInfo or the constructor of the CultureInfo class. In this post, I describe the difference between CultureInfo.GetCultureInfo and the constructor of the CultureInfo class.

var cultureInfo1 = CultureIn…
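As a quick illustration of the distinction the post covers (my own snippet, not the truncated one above): CultureInfo.GetCultureInfo returns a cached, read-only instance, while the constructor creates a fresh, writable instance on every call.

using System.Globalization;

// GetCultureInfo returns a cached, read-only instance...
var a = CultureInfo.GetCultureInfo("en-US");
var b = CultureInfo.GetCultureInfo("en-US");
Console.WriteLine(ReferenceEquals(a, b)); // True  — same cached instance
Console.WriteLine(a.IsReadOnly);          // True  — cannot be modified

// ...while the constructor creates a new, writable instance each time.
var c = new CultureInfo("en-US");
var d = new CultureInfo("en-US");
Console.WriteLine(ReferenceEquals(c, d)); // False — new instance each call
Console.WriteLine(c.IsReadOnly);          // False — safe to customize
c.NumberFormat.NumberDecimalSeparator = ",";

The full post also goes into how the two handle user overrides, which the snippet above doesn't show.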

Optimizing cache resets for higher transaction output

by Oren Eini

posted on: January 11, 2024

One of the most frequent operations we make in RavenDB is getting a page pointer. That is the basis of pretty much everything that you can think of inside the storage engine. On the surface, that is pretty simple. Here is the API we’ll call:

public Page GetPage(long pageNumber)

Easy, right? But internally, we need to ensure that we get the right page. And that is where some complexity enters the picture. Voron, our storage engine, implements a pattern called MVCC (multi-version concurrency control). In other words, two transactions loading the same page may see different versions of the page at the same time. What this means is that the call to GetPage() needs to check whether the page:

- Has been modified in the current transaction
- Has been modified in a previously committed transaction that has not yet been flushed to disk
- Has an on-disk version that is the most up-to-date one

Each one of those checks is cheap, but getting a page is a common operation. So we implemented a small cache in front of these operations, which resulted in a substantial performance improvement. Conceptually, here is what that cache looks like:

public unsafe class PageLocator
{
    private struct PageData
    {
        public long PageNumber;
        public byte* Pointer;
        public bool IsWritable;
    }

    private PageData[] _cache = new PageData[512];

    public byte* Get(long page, out bool isWritable)
    {
        var index = page & 511;
        ref var data = ref _cache[index];
        if (data.PageNumber == page)
        {
            isWritable = data.IsWritable;
            return data.Pointer;
        }
        return LookupAndGetPage(page, ref data, out isWritable);
    }

    public void Reset()
    {
        for (int i = 0; i < _cache.Length; i++)
            _cache[i].PageNumber = -1;
    }
}

This is intended to be a fairly simple cache, and it is fine if certain access patterns aren’t going to produce optimal results. After all, the source it is using is already quite fast; we simply want to get even better performance when we can. This is important because caching is quite a complex topic on its own. The PageLocator itself is used in the context of a transaction and is pooled. Transactions in RavenDB tend to be pretty short, so that alleviates a lot of the complexity around cache management.

This is actually a pretty solved problem for us, and has been for a long while. We have been using some variant of the code above for about 9 years. The reason for this post, however, is that we are trying to optimize things further. And this class showed up in our performance traces as problematic. Surprisingly enough, what is actually costly isn’t the lookup part, but making the PageLocator ready for the next transaction. We are talking about the Reset() method.

The question is: how can we significantly optimize the process of making the instance ready for a new transaction? We don’t want to allocate, and resetting the page numbers is what is costing us. One option is to add an int Generation field to the PageData structure, which we’ll then check when getting from the cache. Resetting the cache can then be done by incrementing the locator’s generation with a single instruction. That is pretty awesome, right?

It sadly exposes a problem: what happens when we use the locator enough to encounter an overflow? What happens if we have a sequence of events that brings us back to the same generation as a cached instance?
We’ll be risking getting an old instance (from a previous transaction, which happened long ago). The chances for that are low, seriously low. But that is not an acceptable risk for us. So we need to consider going further. Here is what we ended up with:

public unsafe class PageLocator
{
    private struct PageData
    {
        public long PageNumber;
        public byte* Pointer;
        public ushort Generation;
        public bool IsWritable;
    }

    private PageData[] _cache = new PageData[512];
    private int _generation = 1;

    public byte* Get(long page, out bool isWritable)
    {
        var index = page & 511;
        ref var data = ref _cache[index];
        if (data.PageNumber == page && data.Generation == _generation)
        {
            isWritable = data.IsWritable;
            return data.Pointer;
        }
        return LookupAndGetPage(page, ref data, out isWritable);
    }

    public void Reset()
    {
        _generation++;
        if (_generation >= 65535)
        {
            _generation = 1;
            MemoryMarshal.Cast<PageData, byte>(_cache).Fill(0);
        }
    }
}

Once every 64K operations, we’ll pay the cost of resetting the entire buffer, but we do that with an instruction that is heavily optimized. If needed, we can take it further, but here are our results before the change:

And afterward:

The cost of the Renew() call, which was composed mostly of the Reset() call, basically dropped off the radar, and the performance roughly doubled. That is a pretty cool benefit for a straightforward change.
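The post doesn't show LookupAndGetPage; purely as an illustration of what the slow path has to do, here is a hypothetical sketch. GetPageFromTransaction is an invented stand-in for the real Voron lookup that walks the MVCC rules described above (current transaction, then unflushed committed transactions, then the on-disk version):

// Hypothetical sketch — not the actual RavenDB implementation.
private byte* LookupAndGetPage(long page, ref PageData data, out bool isWritable)
{
    // Invented helper representing the real MVCC-aware page resolution.
    byte* pointer = GetPageFromTransaction(page, out isWritable);

    // Refresh the cache slot so the next Get() for this page number is a hit
    // for the remainder of the current transaction (generation).
    data.PageNumber = page;
    data.Pointer = pointer;
    data.IsWritable = isWritable;
    data.Generation = (ushort)_generation;

    return pointer;
}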

A brief look at StringValues

by Andrew Lock

posted on: January 09, 2024

In this post I look at the StringValues type, where it's used in ASP.NET Core, why it's useful, how it's implemented, and why.…
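The post itself is only teased above; as a quick taste of the type (my own snippet, not from the post), StringValues wraps zero, one, or many strings behind a single struct, which is why ASP.NET Core uses it for headers and query string values. It ships in the Microsoft.Extensions.Primitives package:

using Microsoft.Extensions.Primitives;

// StringValues represents zero, one, or many strings without allocating an
// array in the common single-value case.
StringValues none = StringValues.Empty;
StringValues single = "gzip";                 // implicit conversion from string
StringValues many = new[] { "gzip", "br" };   // implicit conversion from string[]

Console.WriteLine(none.Count);      // 0
Console.WriteLine(single.Count);    // 1
Console.WriteLine(many.ToString()); // "gzip,br" — values joined with commas

foreach (var value in many)
    Console.WriteLine(value);       // "gzip", then "br"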