Tuesday, January 28, 2020

Conflict-free replicated data as an application concern

A while back, I wrote about converging observations in IoT and described how we de-duplicate sensor data. This post is a follow-on: it describes the same domain, but in terms of how we distribute the data itself. My inspiration for this particular post comes from Colin Breck's excellent post on "Shared-Nothing Architectures for Server Replication and Synchronization". In particular, Colin describes the development of a distributable event log that he was involved with prior to the days of Kafka. We also have a distributed event log, so I thought it'd be great to share our approach, given that distributing data is hard and we have adopted a relatively simple solution, no doubt used by a few others.

In a nutshell, we move the problem of distributing data into the domain of the application. The closer this problem sits to the consumer of the data, the easier it seems to handle (note: I'm not saying it is easy, just easier!). My previous article on converging observations is in much the same vein; de-duplicating data at the point of consumption is much less error-prone than de-duplicating at the source, given that there can be many sources that would otherwise need to know of each other.

A real-world example is needed.

We have a topic that events are appended to on a durable queue (our event log). These events describe the IoT devices themselves, as distinct from recording their observations. We call these events "end device events". Here's a complete list of them (sketched in code below):

  • BatteryLevelUpdated
  • NameUpdated
  • NoncesUpdated
  • NwkAddrUpdated
  • NwkAddrRemoved
  • PositionUpdated
  • SecretsUpdated
(if you're interested, we have a public API to our system describing these events and more)
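To make the shape of these events a little more concrete, here's a minimal sketch of how they might be modelled. The field names and types are illustrative only, not our actual schema (the public API is the authority there):

```rust
/// A sketch of the end device events. Field names and types are illustrative
/// assumptions, not the actual schema described by the public API.
#[derive(Clone, Debug)]
pub enum EndDeviceEvent {
    BatteryLevelUpdated { device_id: u64, level: f32 },
    NameUpdated { device_id: u64, name: String },
    NoncesUpdated { device_id: u64, nonces: Vec<u32> },
    NwkAddrUpdated { device_id: u64, nwk_addr: u32 },
    NwkAddrRemoved { device_id: u64, nwk_addr: u32 },
    PositionUpdated { device_id: u64, lat: f64, lng: f64 },
    SecretsUpdated { device_id: u64, secrets: Vec<u8> },
}
```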

Our system has a cloud-based component and an edge-based one. We provision end devices in the cloud and record their observations at the edge (the edge is a farm in the case of our agriculture business).

NwkAddrUpdated, NwkAddrRemoved, and SecretsUpdated can only occur in the cloud. This is a design decision and reflects the use-case of provisioning end devices. When the edge connects to the cloud, it receives these events, which are sufficient for an end device to start sending its observations at the edge. The edge-based end-user may subsequently associate a position and a name with the device. As we receive data from an end device, battery levels and nonce values (for end device authentication) will also appear as events. These edge-based events are pushed to the cloud.
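Continuing the sketch above, this single-writer rule can be captured directly in code, so the application itself states where each event type may be written (the Origin type and write_origin function are illustrative, not our actual code):

```rust
/// Where an event may be written; each event type has exactly one origin.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    Cloud,
    Edge,
}

/// A sketch of the "single writer per event type" rule described above.
pub fn write_origin(event: &EndDeviceEvent) -> Origin {
    use EndDeviceEvent::*;
    match event {
        // Provisioning events are written in the cloud only.
        NwkAddrUpdated { .. } | NwkAddrRemoved { .. } | SecretsUpdated { .. } => Origin::Cloud,
        // Everything else is produced at the edge and pushed to the cloud.
        BatteryLevelUpdated { .. }
        | NameUpdated { .. }
        | NoncesUpdated { .. }
        | PositionUpdated { .. } => Origin::Edge,
    }
}
```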

Having a single writer per event type, at this fine level of granularity, results in a complete picture of end devices both in the cloud and at the edge, with no conflict resolution required!

Now, the doubters among you may be saying, "but what if I need to provide the position from both the cloud and the edge", or something similar... Well, then you might have a PositionReset event representing a position provided via the cloud, i.e. create a new event type!
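In terms of the sketch above, that simply means growing the event model with a new, cloud-written variant; how the read side then reconciles the two position events is left to the application (the variant below is hypothetical):

```rust
// Hypothetical extension of the sketch above: a new event type whose single
// writer is the cloud, leaving PositionUpdated as the edge-written variant.
pub enum EndDeviceEventV2 {
    // ...all of the variants above, plus:
    PositionReset { device_id: u64, lat: f64, lng: f64 },
}
```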

Thus... event types dictate where they are written, hence my earlier assertion that distribution becomes an application concern.

We can also replicate events in the cloud for the purposes of backup and sharding. A nice feature of durable queues is that consumers can remember the offset of the last event that was consumed. Each consumer can therefore operate at its own pace, which is particularly useful when replicating between the cloud and the edge, as the networks connecting them are typically unreliable. Replicating within the cloud is typically very fast and barely noticeable, as the networks there are much faster and more reliable.
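Here's a minimal, in-memory sketch of that offset-tracking idea; a real durable queue would of course persist both the log and each consumer's offset:

```rust
/// A minimal, in-memory sketch of a durable-queue consumer that remembers the
/// offset of the last event it consumed, so it can resume at its own pace.
pub struct Consumer {
    name: String,
    offset: usize, // index of the next event to consume
}

impl Consumer {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), offset: 0 }
    }

    /// Consume any events appended since we last looked, then advance our offset.
    pub fn poll(&mut self, log: &[EndDeviceEvent]) {
        for event in &log[self.offset..] {
            // Replicate, index, or otherwise process the event here.
            println!("{} consumed {:?}", self.name, event);
        }
        self.offset = log.len();
    }
}
```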

When writing events in the cloud, we can use a partition key to determine which node is responsible for receiving the event. Again, there's only one node responsible for writing a given event.
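Purely as an illustration of the idea, a partition key could be as simple as hashing the device id modulo the number of nodes (the hashing scheme here is an assumption, not necessarily what our system does):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A sketch of routing by partition key: hash the device id and take it modulo
/// the number of nodes, so exactly one node is responsible for writing a given
/// device's events. Assumes node_count > 0.
pub fn responsible_node(device_id: u64, node_count: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    device_id.hash(&mut hasher);
    hasher.finish() % node_count
}
```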

The downside of our approach is that if a node is lost, at the edge or in the cloud, then its associated events will be lost too. This is acceptable for our use-case, though, as such failures are relatively rare.

The approach we've taken works for our use-cases in the world of IoT, where best-effort delivery is quite acceptable given that the network itself tends to be unreliable. There are trade-offs but, I believe, the above approach can also work for many use-cases outside of IoT.

Thanks for reading.