An event-sourcing conundrum

I have a side project that I use as a sounding board to help me learn about “modern stuff”. The front end is built with React + Flux, and the back end persists information via event sourcing instead of a SQL or document database. I’m enjoying the experience of working with these tools, and so far everything has gone smoothly. But now I have a puzzle, which I will share here in the hope that one of you can point me in a good direction.

One of the components in my React app shows a list of recent activities: those performed by a specific user, or a group of users, or on a specific document. I would like a typical element in the list to look like this:

[Image: an example activity list entry, with the user's name and the document's title shown in blue]

The words in blue will be hyperlinks to other views in the client app, and these will be built using the IDs of the document and the user. This means that the client needs the following information about each activity:

  1. The type of activity (create, delete, update, etc.)
  2. The user’s name
  3. The user’s ID
  4. The document’s title
  5. The document’s ID
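
Wrapped up as a TypeScript interface (the names here are my own guesses, not the project's actual types), that payload might look like:

```typescript
// Hypothetical shape of one activity entry as the client would render it.
// Field names are illustrative assumptions, not the project's real code.
interface ActivityEntry {
  activityType: "create" | "delete" | "update"; // 1. the type of activity
  userName: string;      // 2. display text for the user hyperlink
  userId: string;        // 3. used to build the user hyperlink
  documentTitle: string; // 4. display text for the document hyperlink
  documentId: string;    // 5. used to build the document hyperlink
}

const example: ActivityEntry = {
  activityType: "update",
  userName: "Alice",
  userId: "user-42",
  documentTitle: "Meeting notes",
  documentId: "doc-7",
};
```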

Currently the client gets the list of relevant activities by hitting an endpoint on the server’s API. The API gets this data from a read model which in turn is built up in memory as the events occur. But those events don’t carry all of the information that the client will now need. Specifically, the server events only carry the user’s ID, not their name.

So I need to provide information to the client, but no part of the server currently has all of that information available to it. What should I do?

I thought of several alternatives, none of which seemed entirely satisfactory:

  1. Have the client make extra API calls to get the information it needs. This would slow the client down, increase server traffic, and seems like an awful lot of work: extra API endpoints, with extra read models supporting them.
  2. Allow the activities read model to get usernames from the users read model. However, other read models may not be up to date, so this approach seems inherently unreliable.
  3. Augment the activities read model so that it keeps track of usernames and username changes. This would duplicate information kept in other read models, and seems like a lot of effort.
  4. Add the username to the payload of all document change events. Does it really make sense to add information to events just because one particular read model might need it? And besides, the username isn’t available to the code that raises document change events: it would have to be fetched from a read model, which (as above) may not be up to date.
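
For concreteness, here is one way option 3 might be sketched: the activities read model also subscribes to user events and keeps its own denormalised map of usernames. Everything below (event shapes, class and field names) is my own illustration, not the project's real code:

```typescript
// Sketch of option 3: the activities read model consumes user events as
// well as document events, keeping its own denormalised copy of usernames.
// All names here are illustrative assumptions.
type UserEvent =
  | { type: "UserRegistered"; userId: string; userName: string }
  | { type: "UserRenamed"; userId: string; userName: string };

type DocumentEvent = {
  type: "DocumentCreated" | "DocumentUpdated" | "DocumentDeleted";
  documentId: string;
  documentTitle: string;
  userId: string; // the event carries only the ID, never the name
};

interface ActivityView {
  type: string;
  documentId: string;
  documentTitle: string;
  userId: string;
  userName: string; // now available without consulting another read model
}

class DenormalisedActivities {
  private userNames = new Map<string, string>();
  private activities: ActivityView[] = [];

  applyUserEvent(e: UserEvent): void {
    this.userNames.set(e.userId, e.userName);
    // Back-fill already-recorded activities so renames are reflected.
    for (const a of this.activities) {
      if (a.userId === e.userId) a.userName = e.userName;
    }
  }

  applyDocumentEvent(e: DocumentEvent): void {
    this.activities.push({
      type: e.type,
      documentId: e.documentId,
      documentTitle: e.documentTitle,
      userId: e.userId,
      userName: this.userNames.get(e.userId) ?? "(unknown user)",
    });
  }

  recent(limit = 20): ActivityView[] {
    return this.activities.slice(-limit).reverse(); // newest first
  }
}
```

Whether a rename should back-fill old activities, or whether the name at the time of the event should be preserved, is exactly the kind of question option 4's discussion raises.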

What would you do?

Of course, the act of writing this post has helped me clarify my thinking on this problem, and I have now decided which approach to take. So in a few days I’ll document that here, eventually also with a short report describing how it turned out. In the meantime, I’m really interested in how you event-sourcing experts would proceed. Have I missed a fifth option? Have I mis-analysed one of the options above? Should I just stop thinking and get on with coding?

Update, 28th June

Of course, option 3 is the only sensible approach, and that’s what I’ve implemented. Thanks to everyone who patiently pointed this out to me in the last 24 hours; and as so often, the prize for the first correct answer goes to Mark Kirschstein.

I suspect the reason I dithered over this is that the code in this area contains some legacy crud, preventing me seeing things clearly. So it’s time for some clean-up refactoring too…

4 thoughts on “An event-sourcing conundrum”

  1. I would lean towards option 3.

    My understanding is that the data within read models is denormalised and therefore by definition duplicated. This does introduce some complexity in managing sets of denormalisers for each event, but at least that complexity lives in a single, known place/pattern.

    I’d be interested to read your follow-up post.

  2. I’d go with option 3 as well… Post-relational database NoSQL thinking would encourage us to design for easy reads even if that made writes more difficult… With the caveat that there might be latency issues in terms of eventual consistency kicking in. I take this approach when I’m building with Firebase.

  3. Number 1 is definitely bad – it’s logically the same as number 2, but with more network traffic.
    Number 4 is also definitely bad. Events are immutable, so unless you need to record the user’s name _at the time the event was recorded_ that data doesn’t belong in the activity events.

    2 and 3 are similar, depending on how far apart your separate read models are. If they’re in the same process then they’re pretty much the same; if they’re distributed then I’d go for 3 in this case, but maybe lean more towards 2 in the case where there’s a lot of data to share and it’s not shared often.

    However, it’s a bit of an odd question: your read models should support the reads you want to perform, but you seem to have an “activities” read model and separate “users” read model as if each read model can only deal with a single stream of events (or, more traditionally, corresponds to a single table). This isn’t the case. I would try something like a “recent activities” read model that receives events from both streams and builds its in-memory representation accordingly, as per option 3, and moreover discards activities that aren’t “recent”.

    If you also have an “activity history” model then there may well be a good amount of duplication between it and the “recent activities” model that a future refactoring could bring together. I’d expect them to remain as separate models, however: the usage patterns and memory requirements are quite different, so keeping them separate makes it easy to scale out the “recent activities” model without needing all the power for the full “activity history” one.

    You mention possible inconsistency in option 2, but option 3 is also potentially inconsistent if your read models are distributed (e.g. scaled out). It clearly doesn’t matter here, but I make the stronger claim that you can set things up so it almost never matters. Ultimately, the users’ in-brain models are going to be inconsistent some of the time, even if you only ever put consistent information in front of them. Business processes are usually designed around this.
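
The “recent activities” model the commenter describes, which discards entries that are no longer recent, could be sketched as a bounded projection. This is a hypothetical illustration, not code from the post; the capacity-based definition of “recent” is an assumption:

```typescript
// Hypothetical bounded "recent activities" read model: it can consume
// events from multiple streams and evicts entries once they fall out of
// the most recent `capacity` items.
class RecentActivities {
  // Assumption: "recent" means the latest `capacity` entries.
  private items: string[] = [];

  constructor(private capacity: number = 50) {}

  record(description: string): void {
    this.items.push(description);
    // Evict the oldest entries once we exceed capacity.
    if (this.items.length > this.capacity) {
      this.items.splice(0, this.items.length - this.capacity);
    }
  }

  list(): string[] {
    return [...this.items].reverse(); // newest first
  }
}
```

A time-based cutoff (e.g. “the last 24 hours”) would work just as well; the point is that the model only ever holds what its reads need.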


