Making Globally Distributed Systems easier with Leadership elections in .NET Core

Recently I’ve been looking at how to ensure services are always running globally across a number of data centres.

Why would you want to use leadership elections to do this?

I’ll cover this at a very high level; for more detail, look at articles like this and this. I also highly recommend the chapter covering this in Google’s Site Reliability Engineering book.

I want the service to remain available during outages that affect a particular node or DC.

To do this you have two options: active-active services, where all DCs can serve all traffic, or active-passive, where one DC or node is the master and the others are secondaries. There is also a halfway house, but we’ll leave that for now.

The active-active approach is better for some scenarios but comes at a cost: data replication, race conditions and synchronisation all require careful thought and management.

For a lot of services the idea of a master node can greatly simplify matters. If you have a cluster of 3 machines spread across 3 DCs and they consistently elect a single, healthy master node which orchestrates requests, things can get nice and simple. You always have one node running as the master, and it moves around as needed in response to issues.

The code it runs can be written in a much simpler fashion (generally speaking). Only one node, the elected master, executes it at a time, so most concurrency concerns between nodes go away. The master can be used to orchestrate what the secondaries are doing, or it can simply process requests itself and only hand over to a secondary if it fails. Developers can write, and more importantly test, in a more manageable way.
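To make the idea concrete, here is a minimal, language-neutral sketch of lease-based election, written in Python for brevity. It is an in-memory stand-in, not the etcd API: `LeaseStore`, `try_acquire`, `current_master` and `FakeClock` are hypothetical names invented for this illustration. A node becomes master by taking a lease with a time-to-live, and it stays master only while it keeps renewing that lease.

```python
import time


class FakeClock:
    """Manually advanced clock so the example is deterministic (illustrative only)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class LeaseStore:
    """In-memory stand-in for a consensus store; NOT a real etcd client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._holder = None      # node id of the current master, if any
        self._expires_at = 0.0   # time at which the current lease lapses

    def try_acquire(self, node_id, ttl):
        """Take or renew the lease; return True if node_id is now the master."""
        now = self._clock()
        lease_free = self._holder is None or now >= self._expires_at
        if lease_free or self._holder == node_id:
            self._holder = node_id
            self._expires_at = now + ttl
            return True
        return False

    def current_master(self):
        """Return the node holding a live lease, or None if the lease has lapsed."""
        if self._holder is not None and self._clock() < self._expires_at:
            return self._holder
        return None


clock = FakeClock()
store = LeaseStore(clock)
store.try_acquire("dc1", ttl=10)  # dc1 becomes master
store.try_acquire("dc2", ttl=10)  # rejected: dc1's lease is still live
clock.now = 11                    # dc1 stops renewing, e.g. its DC is down
store.try_acquire("dc2", ttl=10)  # dc2 takes over as master
```

In a real deployment the store would be the consensus cluster itself, which is what makes the take-or-renew step safe across machines; the single-process version above only shows the shape of the logic.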

Now how does this affect scaling a service? Well, you can partition to scale this approach. When your single master is getting too hot you can split the load across two clusters, each responsible for a partition, say split by user ID. But we’ll leave this for another day.
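As a rough illustration of splitting by user ID, the sketch below hashes the ID and maps it onto a list of clusters. The function name `cluster_for_user` and the cluster names are made up for this example, and plain modulo hashing is only one possible scheme.

```python
import hashlib


def cluster_for_user(user_id, clusters):
    """Map a user id to one cluster, stably across processes and restarts."""
    # Use a cryptographic digest rather than Python's built-in hash(),
    # which is randomised per process for strings.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


clusters = ["cluster-a", "cluster-b"]  # hypothetical cluster names
owner = cluster_for_user(42, clusters)  # always the same cluster for user 42
```

One caveat with this scheme: changing the number of clusters remaps many users, so adding a cluster means moving their data or state.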

So how do we do this in .NET?

Well, we need a consensus system to handle the election. I chose to use an etcd cluster deployed in multiple DCs; others to consider are Consul and ZooKeeper. Let’s get into the code.


Azure, How to

PubSub in Service Fabric with Redis

I’ve been working on a project that uses Service Fabric and I’m hosting Redis inside the cluster as a simple cache system.

One of the things that isn’t baked into Fabric is a pub/sub model for communicating between services about events that are occurring.

As I’ve got the Redis instance up and running in the cluster, I decided to take a look at using the pub/sub capabilities in Redis to make this happen. N.B. Redis pub/sub doesn’t guarantee delivery, so use it where appropriate; there are lots of discussions around its pub/sub model and when and where to use it.
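The delivery caveat is easy to see with a tiny in-memory model of fire-and-forget pub/sub. This is not Redis or a Redis client: `InMemoryPubSub` is a hypothetical class written for this illustration. It mirrors one real behaviour, namely that a message published while nobody is subscribed is simply dropped, and that publishing reports how many subscribers received the message.

```python
class InMemoryPubSub:
    """Toy fire-and-forget broker; messages are never stored or retried."""

    def __init__(self):
        self._subscribers = {}  # channel name -> list of handler callables

    def subscribe(self, channel, handler):
        """Register a handler for future messages on the channel."""
        self._subscribers.setdefault(channel, []).append(handler)

    def publish(self, channel, message):
        """Deliver to current subscribers only; return how many received it."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = InMemoryPubSub()
received = []
bus.publish("events", "sent before anyone subscribed")  # dropped, nobody listening
bus.subscribe("events", received.append)
bus.publish("events", "sent after subscribing")         # delivered to the handler
```

If a subscriber must never miss an event, it needs something on top of this model, such as a durable queue or a way to catch up after reconnecting.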

Turns out it’s nice and easy to get working. I’m a big fan of using Rx to make nice reactive programs operating on streams of events, and there is already a nice sample combining Redis and Rx in C# here.

Before too long I had just what I wanted and thought it might be useful to others, so I’ve put together a sample. My sample is here, with one “EventPublisher” service pushing out events and an “EventSubscriber” service listening to them.

Both services write out what they’re up to as ETW messages so you can view them in the diagnostic window.



Lawrence Gripper


Azure, How to

Service Fabric: Getting started with a frontend website and a partycluster

This is going to be a quick guide to spinning up an ASP.NET 5 website on Service Fabric.

To host it we’re going to use the “Party Cluster” service from the team. This lets you grab a slot on a free public Service Fabric cluster to try out things and get up to speed.

So first things first, head over to the Party Cluster site and sign up for a cluster.


Once you’ve requested access to a cluster (Tip: Pick the one with the most time left to run on it!) you’ll get an email like this one.


The three key bits of info are highlighted; we’ll use these to host our website! Have a read of the rest of the mail too, as it details the limitations of the party clusters: limited time, shared and so on.

First up, the green circle is the link you can use to see the Service Fabric Explorer. We’ll use this later to watch our app being provisioned and check its health.

Second is the connection address and the port you’ve been allocated, our site will end up being hosted at the connection address plus our application port so in this case

Now let’s create our website and publish it to the cluster! I’ll assume at this point that you’ve followed the install guides for getting your local environment set up. Don’t worry if you haven’t, I’ll wait. Head here and follow the guide.
