CQRS Episode II – Attach the cloners

In my previous post I explained why CQRS matters and why you should adopt it if you really care about your product and don’t want data growth to become a bottleneck rather than a success factor for your business.

Now I’m gonna dig a bit deeper. I want to show you how CQRS works under the hood.

Command/Query responsibilities

CQRS stands for Command Query Responsibility Segregation, and its name reveals how it works at its core.
As stated in the previous post, a software system needs two data models in order to face data growth: one model to hold the application state, and one to handle the numbers.
Well, let’s start from naming things. The requests sent to your application can be split in two main categories:

  • requests that do change the application state (e.g.: creating a new user, submitting an order, performing a bank transaction, etc.). These requests are called commands.
  • requests that only read your data and do not change the application state (e.g.: counting the number of registered users, getting the details of one user, getting the account balance). These requests are called queries.

CQRS aims to split your application into these two main areas: the commands and the queries. The two have totally different structures, architectures and purposes.

Command vs Query

Usually when a command is sent to your application (e.g.: via an HTTP request), the business logic gets involved in order to determine whether or not the request can be satisfied. The typical steps are:

  • Parse the request (return an error in case of bad syntax/arguments)
  • Load the resource state from the storage
  • Ensure that the requested action is allowed based on the resource state
  • Finally, apply and persist the change

As an example, imagine a banking application with a business rule stating:

A debit transaction is allowed if the requested amount is not higher than the balance

(i.e.: the account balance cannot go negative).
An API is then designed to handle the command. Its code will look like the following:

  • load the bank account from the storage (it can involve multiple tables)
  • verify that the account balance covers the requested amount
  • update the balance
  • commit the update to the storage (hopefully with proper concurrency management)
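In TypeScript, a sketch of such a command handler could look like the following. Keep in mind that every name here (the repository interface, the error, the version field) is an illustrative assumption, not a prescribed API:

interface BankAccount {
  balance: number;
  version: number;
}

// Illustrative storage interface; names are assumptions for this sketch.
interface AccountRepository {
  load(accountId: string): Promise<BankAccount>;
  // save() is expected to fail when expectedVersion no longer matches
  // the stored one (optimistic concurrency control).
  save(accountId: string, balance: number, expectedVersion: number): Promise<void>;
}

async function debitAccount(
  repo: AccountRepository,
  accountId: string,
  amount: number,
): Promise<void> {
  // 1. load the bank account from the storage
  const account = await repo.load(accountId);

  // 2. verify that the account balance covers the requested amount
  if (amount > account.balance) {
    throw new Error("Insufficient funds");
  }

  // 3-4. update the balance and commit it, relying on the
  //      version number for concurrency management
  await repo.save(accountId, account.balance - amount, account.version);
}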

This is how things work, regardless of the database type.
And that just works fine with the classical one model to rule them all approach: the developer designs one database schema along with the code that handles that model.

What’s new in the CQRS architectural pattern, however, is the query model: when it’s time to query your application to get the numbers, the designated schema should be an ad-hoc set of tables. That is: the model that holds the application state is never touched by queries; it is read and updated only when a command is processed.
But how does that work? How is it possible for a microservice to handle these two different models?

Under the hood

The application is logically split into two models.
The command side handles all the incoming commands: it is invoked when a POST, PUT or DELETE request is sent. The command model and the business logic are involved.
The query side handles all the incoming queries: it is invoked when a GET request is sent. The query storage is used in read-only mode.

The event bus is the bridge between the two. Whenever a command is processed without errors and the resource is updated in the storage, a domain event is emitted to notify whoever is interested. An event bus can be implemented in a lot of different ways, but that’s not the core point. What matters is that, by dispatching domain events, the microservice itself can capture those same events and use them to update the query model.
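As a rough idea, a minimal in-process bus might look like the sketch below; a real system would typically use a message broker instead, so treat all names here as assumptions:

interface DomainEvent {
  name: string;
  [key: string]: unknown;
}

type EventHandler = (event: DomainEvent) => Promise<void>;

// A naive in-memory event bus. A production bus (e.g. a message broker)
// would also decouple the two sides in time, not just in code.
class InMemoryEventBus {
  private handlers = new Map<string, EventHandler[]>();

  subscribe(eventName: string, handler: EventHandler): void {
    const list = this.handlers.get(eventName) ?? [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }

  async publish(event: DomainEvent): Promise<void> {
    for (const handler of this.handlers.get(event.name) ?? []) {
      await handler(event); // the query side reacts here
    }
  }
}

On the command side, a successful debit would then end with a call like bus.publish({ name: "AccountDebited", ... }), and the query side subscribes to that event name.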

This is the core point: by introducing an event bus, the business logic is not cluttered with additional code that writes the same data in different places and formats. This means that the command side just processes the commands, ensures that the business rules are not violated, applies the changes and then returns the result. Nothing more, nothing less. Pure business logic.
In a totally asynchronous way, the domain events dispatched by the command side get captured and processed by the query side to update its model.
The two sides process the same requests at different times and speeds: should the query side need some time to update its model, the command execution time would not be affected at all.
This however introduces a lag between the models.

But who is in charge of handling the events in the query model?

Attach the cloners

The query model is also known as projection: data coming from the command side is projected – that is, represented – in very different ways, and each projection has a specific purpose, depending on the use case it was designed for.

Hence the key component in the query model is the projector. It is the microservice component that subscribes to specific domain events and transforms their payload into some other data format. One microservice can have several projectors handling the same events, writing the same data to totally different tables and formats.

As an example, think of a domain event for a debit transaction in a banking application.
When a debit request is sent to the microservice and the debit is successfully applied, an event is dispatched. Such an event would most probably carry a payload like the following:

{
  "name": "AccountDebited",
  "date": "2017-12-18T17:23:48Z",
  "transactionId": "tx-7w89u12376162",
  "accountId": "IT32L0300203280675273924243",
  "amount": {
    "currency": "EUR",
    "amount": 42
  }
}

That event can be captured by the same microservice that triggered it and routed to different projectors, which in turn update different projections. For example by:

  • appending one row to the “Transactions” table, which just contains the transaction history
  • updating one row in the “Balances” table, which contains one row for each account, with its current balance and the last update time
  • updating one row in the “Monthly Expenses” table, which contains the sum of debit transactions for a banking account relative to one month; the table’s unique key is the [“account_id”, “month”] column pair (the month can be extracted from the “date” field of the event payload, e.g.: “2017-12”)
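As a sketch, a projector for the third table could look like this; the table interface and all the names are assumptions of mine:

// Typed view of the AccountDebited payload shown above.
interface AccountDebited {
  name: "AccountDebited";
  date: string;
  transactionId: string;
  accountId: string;
  amount: { currency: string; amount: number };
}

// Hypothetical write model for the "Monthly Expenses" projection.
interface MonthlyExpensesTable {
  // Adds `amount` to the row identified by (accountId, month),
  // creating the row if it does not exist yet.
  upsertAdd(accountId: string, month: string, amount: number): Promise<void>;
}

class MonthlyExpensesProjector {
  constructor(private table: MonthlyExpensesTable) {}

  async onAccountDebited(event: AccountDebited): Promise<void> {
    // The month key is extracted from the event date, e.g. "2017-12".
    const month = event.date.substring(0, 7);
    await this.table.upsertAdd(event.accountId, month, event.amount.amount);
  }
}

Wiring it up is then just a subscription on the event bus – one per projector – for the "AccountDebited" event name.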

By doing this, the application does not need to transform “the one” data model on the fly each time a query is performed by an API. Rather, it can rely on different data models to pick the requested data from, depending on what the query is asking for.
The query model already has the data materialized.

What’s next?

In the next episode, CQRS Episode III – Rewind of the sync, I’ll show how to rebuild projections in case of bugs or migrations, and how the same applies when you need to build a brand new projection.

Stay tuned!

Antonio Seprano

Apr 2020, still covid-free.

CQRS Episode I – The phantom data

Have you ever heard about CQRS?
Maybe yes, maybe no.

Well, if you got here asking yourself how to implement CQRS, then you already know what I’m talking about.
But if you are looking for what CQRS is, then you are new.
Either way, in this and the following posts, I will progressively explain why CQRS matters, what problems it solves, why it should be the main approach for a medium/large company that lives by its product, and how to implement it.

One data model to rule them all? No, thanks.

One of the most common approaches (if not the only one) that software developers unwittingly adopt in their job is to design their software around one specific data model. They think about the software in terms of that model, they think in terms of how they will store data in the database, then they write the code to handle that specific model.
But, and here’s the important part, one single data model is not suitable for data growth.
And no, it doesn’t matter whether you are planning for a microservice architecture or a monolith. If the data is expected to grow, one model is not enough.
Let me show you some examples.

Example 1: The organizational chart

VentureSaas Inc. wants to give their customers the ability to manage the organizational chart inside the product, so they start planning to introduce the Organizational Chart feature.
Every part of the product will be impacted by this change, ranging from the search bar to the reports, to the dashboard widgets. The new Organizational Chart feature will be a core functionality.

Typical organizational chart

The project team started by (guess what?) modelling the database schema for the organizational chart operations. It would be easy for them – as developers – to create, rename or even drop entire organizational branches. It also seemed very easy to create the relationships between employees and the branches they belong to.
This data model would suffice for each and every operation.
The same model would be queried when needed.

A classical parent-child relationship schema to store a tree. A nested set would be way better, but it is just representative.

What the project team didn’t think about was scalability.

As the first big customer started to use the new feature, everything slowed down.
It became harder and harder for them to access the user management page, then the reports and, last but not least, the dashboard.
The tickets forwarded to the dev team were clear: the new feature was performing badly because of the queries to the users catalogue, to the organizational branches, and to a mix of the two. Counting the number of employees in one department and all of its sub-sections was awfully slow.
The feature had been designed to query the aforementioned data model via a series of tangled, nested, difficult-to-read, hard-to-modify, slow-to-run queries. Guess why?

One data model to rule them all.

Example 2: The monthly expenses

Sharks&Loans Bank is planning to add a new widget to the homepage of the home banking website: a pie chart representation of the account holder’s monthly expenses. Each slice of the pie represents the monthly expense percentage relative to the current year:

Their project team already has the “one table to rule them all” to start from: the Transactions table. Each and every bank transaction is stored in that table, along with the corresponding bank account id, the transaction type (deposit or withdrawal), the amount and the date:

Easy peasy. So they say.
But.
But that table is huge. And they had to write a complex query that filters the required transactions out of a huge list, groups them by month, sums the amounts and calculates the percentage of each month relative to the entire year. All of that just to get the values of the pie chart from that endless list.
A moderately complex SQL query, nothing impossible. Every developer is able to write something like that.
Now, just for fun, let’s assume that the above operation is very slow because there are simply too many transactions, and that this is the only data source you can use to calculate the values you need for the pie-chart widget.
It becomes a pain in the ass.
What would you do?
I can predict your answers:

  • Check for bad query practices
  • Check for table indexes
  • Check for misused indexes
  • Try to optimize the query
  • Look for some solution on stackoverflow
  • Scale the database server

Everything but a data model analysis: “That table schema is fine and the problem is somewhere else”. Right?

So what?

Why is it so hard to query a data model that, paradoxically, was designed to be easy in the first place?
“Because data has grown too much!”, a developer could reply. And indeed that developer would be right. You can’t predict how much your data will grow, so your starting model works fine, at least for a while. But the more your customers use your product, the more data they generate. And the more customers you get, the more the data grows.
What that developer is not considering, however, is that there’s really no need to scan all the historical data just to build a new representation of it.
What developers take for granted is that the required information can be rebuilt from the actual, generic data model. And it’s not a whim; there’s one specific reason: developers don’t want to store the same information in different formats. “Data MUST NOT be redundant” is some kind of mantra for developers, because they know that data redundancy is risky and expensive from a coding point of view. Keeping redundant data synchronized is slow, difficult and error prone. Why should a developer mess up the codebase just to write the same information in many different formats, when all the possible data formats can be deduced from just one?

The “What If” game

What if the two aforementioned companies already had the data they needed, rather than having to rebuild it from scratch on demand? What if it were possible for them to write redundant data, represented in different models, and get faster answers?

VentureSaas in Example 1 wouldn’t have had to COUNT the number of employees in each branch. They could already have known how many employees there are in one branch, and the total number of employees in that branch and all of its sub-branches. Their product wouldn’t have become deadly slow.

A new representation of the orgchart table: members_count is the number of employees in that branch, total_members_count is the number of employees in that node and all of its sub-branches.

What if Sharks&Loans Bank in Example 2 didn’t have to recalculate the monthly expenses of each account holder? What if they already had some snapshot of the monthly expenses in one ad-hoc data model? Their widget wouldn’t have slowed down the dashboard.

But how could it have been possible for those companies to have the same data represented in different data models?

Two sides of the same coin

Both the examples above make clear the big mistake a software architect can make: expecting that the data model that stores the application state can also be used to provide the numbers.

On the one hand, a software system needs to store its state, because the state tells the application where it is and where it can go. The state allows the software to determine whether a user can perform some action or not (e.g.: a debit transaction is allowed only if the account balance has enough funds).

On the other hand, a software system needs to store the numbers, because all the metrics, all the statistics, all the information needed by humans (and/or by the UI) is expressed in terms of numbers.

And here’s the underlying truth: a good software system needs both models.

State.
And numbers.

Two sides of the same coin.

CQRS is the word

And here’s finally what CQRS is all about: designing your software so that it can handle both the state and the numbers, without actually mixing the two.
A software system designed with the CQRS pattern at its core is one that does not fear growth. Your customers will thank you.

What’s next?

Next episode, CQRS Episode II – Attach the cloners, will be a tech overview of the CQRS architecture.

Stay tuned!

Antonio Seprano

Apr 2020, covid-free.

The OOP golden rule #0


Have you ever wondered why an electrical socket is the way it is? And what does it represent in terms of design?
The electrical socket makes our interaction with electric power easy.
Maybe you don’t think about it, but generating, leveling and delivering electricity to houses is not an easy job; it is a very complex production chain. Still, of that whole production chain, the socket, the last ring of the chain, is our only point of interaction.
It is easy to use because it has been meant to be like this. It has been conceived so that it is easy to use in the right way and hard to use in the wrong way. It hides the complexity out of sight, behind a wall, along with annoying stuff like cables, fuses and wiring.
No matter what you need to plug into the socket, whether it is a simple device like a light bulb or a complex one like a computer, and no matter where the electricity comes from, whether it is generated by a hydroelectric plant or an old wizard, all you have to do is plug your device into the socket.

The above introduction on sockets, designed to be easy to use while hiding complexity behind a wall, is suitable for introducing a very similar concept at the OOP level: interfaces.

Interfaces

There are some reasons why you should seriously start using interfaces in your projects, if you haven’t yet. And even if you already are, maybe you don’t actually know the opportunities they provide, so you’d better keep on reading.
We are going to talk about what I usually like to refer to as “Rule #0” for writing clear, readable and maintainable code:

Program to an interface, not an implementation

Before we go any further, let me make it clear: the term “interface” does not refer to interfaces in the strict sense. I mean… I’m not specifically referring to the “interface” keyword at the programming language level. I am talking about abstraction.

Program to an abstraction, not an implementation

The meaning of the above quote is: write your code so that it depends on abstract concepts rather than on concrete classes.

Decoupling the implementation

Introducing interfaces in your project, and using them as argument types, allows you to have multiple implementations of the same concept. It is not unusual to switch implementations at runtime, on the basis of some kind of strategy. Most of the time, however, one single implementation will be enough for the entire application to run. Still, interfaces allow you to switch to another implementation with little or no effort – if you drew the right abstraction from the beginning and filled your code with references to it.

Imagine you are working on a project aimed at allowing users to pay with their credit card: the credit card is the abstraction that your code should rely on. The concrete payment mechanism is – of course – delegated to whatever payment gateway you want to use (e.g.: PayPal, PAYMILL, Stripe, and so on). Usually, those systems come with proprietary frameworks making it easy for developers to interact with them. Still, it is worth creating an abstraction and passing it around in your code, rather than creating a lot of references to the specific implementations of those frameworks:
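A minimal sketch of such an abstraction could look like this (Money and ChargeID are assumed helper types, not taken from any specific framework):

type ChargeID = string;

interface Money {
  currency: string;
  amount: number;
}

// The abstraction the business logic relies on: no payment
// gateway detail leaks through this interface.
interface CreditCard {
  charge(amount: Money): ChargeID;
  refund(chargeId: ChargeID): void;
}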

Note: the abstraction is not called ICreditCard for a very specific reason: since the word “ICreditCard” does not exist in the business language, there must be no room for it in your code.

Personally, I don’t like the “I<something>” naming convention for abstractions. Whether we’re introducing an interface, an abstract class or a concrete class is completely irrelevant. That declaration represents a concept, so its name must be clear and explicit, and should hide the implementation level (interface, abstract or concrete class) in use.

The credit card, as an abstraction used for payments and refunds, has nothing to do with any implementation of the payment systems on the market. Your code should only refer to the CreditCard abstraction in order to process payments and perform refunds:

class PaymentService {

  public charge(amount: Money, targetCard: CreditCard): void {
    const chargeId = targetCard.charge(amount);
    // other stuff, like persisting or triggering events
  }

  public refund(chargeId: ChargeID, targetCard: CreditCard): void {
    /* Code for refunding */
  }

}

By introducing the CreditCard abstraction, your code becomes more abstract as well. You can use the Adapter design pattern, along with a service container, to plug in any vendor-specific credit card implementation, avoiding references to those objects in your code. Should you switch to another payment system, the only thing you have to do is replace the adapters in use with new ones. The code that relies on the concept of CreditCard will remain the same, because the concept has not been affected by the change.

As I said before, making references to abstractions in your code allows you to use any implementation of them. For example, you could write a multi-credit-card object:
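Something along these lines – a sketch built on the CreditCard interface above:

// A composite CreditCard: it tries each card in turn until one
// accepts the charge.
class CreditCardArray implements CreditCard {
  private readonly cards: CreditCard[];

  // The signature enforces the "one or more" constraint.
  constructor(first: CreditCard, ...rest: CreditCard[]) {
    this.cards = [first, ...rest];
  }

  charge(amount: Money): ChargeID {
    for (const card of this.cards) {
      try {
        return card.charge(amount);
      } catch {
        // This card refused the charge: try the next one.
      }
    }
    throw new Error("No card could be charged");
  }

  refund(chargeId: ChargeID): void {
    // Simplified for the sketch: a real composite would track
    // which card was originally charged for the given chargeId.
    this.cards[0].refund(chargeId);
  }
}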

The CreditCardArray is a composite that acts as a single object. It implements the CreditCard interface and can therefore be used accordingly. But it is also composed of a collection of one or more CreditCards (you can enforce the “one or more” constraint in the constructor), and its goal is to encapsulate the logic for looping through the collection of credit cards until one that can be charged for the requested amount is found.

Also think about a fake credit card implementation that intentionally fails at charging time: such an implementation would allow you to test your system in those edge cases. You could end up creating as many fake cards as you want, each one returning a specific error code, so you can test all possible situations.
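For example, a sketch of such a fake (the class name and the error are made up for illustration):

// A fake card for tests: it always refuses the charge.
class AlwaysDecliningCreditCard implements CreditCard {
  charge(_amount: Money): ChargeID {
    throw new Error("card_declined"); // simulated gateway refusal
  }

  refund(_chargeId: ChargeID): void {
    // Nothing to do: no charge ever succeeds on this card.
  }
}

Plugging one of these into a CreditCardArray, for instance, lets you verify that the next card in the collection gets charged.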

Removing what is not needed

You may be wondering if you should use an abstraction even when your application doesn’t expect to have multiple implementations. Maybe you only have one single class that does a simple task in a simple way.
Well, you actually don’t have to.
It’s not a must, it’s more of a rule of thumb. Introducing interfaces just puts you one step ahead on the abstraction road – nothing that a simple refactor couldn’t achieve later anyway.

But there’s another reason why you should stay on that road.
Think of a class that implements the logic of a queue, like the following:

An implementation of a queue
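Something like this minimal TypeScript sketch (the item type is kept loose for brevity):

class Queue {
  private items: unknown[] = [];

  push(item: unknown): void {
    this.items.push(item);
  }

  pop(): unknown {
    return this.items.shift(); // FIFO: first in, first out
  }

  size(): number {
    return this.items.length;
  }

  clear(): void {
    this.items = [];
  }
}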

It’s a queue in the classic sense: a client can push items to it, pop them out, query the number of items inside the queue, and clear its content by removing all the items at once.
It’s ok for a queue.
But clients use what they get.
If a client gets an instance of Queue like the one above, then the developer of that client will feel entitled to carry out their job using everything at their disposal. Do you feel comfortable with that? Do you think that the developer’s sense of responsibility is enough? Do you believe that code review prevents anybody from doing the wrong thing at some point? If so, then you can go ahead and use your class as-is.
But relying on the sense of responsibility has a cost: a higher probability of errors.
Sooner or later, somebody will use a method of your object that wasn’t supposed to be used in that context. A mistake, of course. Can developers be considered unreliable? Would you really blame them? Maybe the developers made a mistake, or maybe the code reviewers, looking at what the developers were given, thought they were entitled to do what they did. They are responsible for the mistake to some extent, but the truth is that the guilty developers are not that guilty, since they just used what they were given. You gave them an object and, implicitly, expected them to use only some of its functionality. If you think about it for a while, this is what happens every time a developer – even you – writes code involving an object provided by somebody else: in order to avoid mistakes, developers must know, based on the context, what they can do with that object, which of its methods can be used and which ones are forbidden.
As time goes by, the code will need more and more maintenance, and it will become more and more likely that some developer uses that object in a way it wasn’t supposed to be used in that context.
So one question arises: why should you share an object that is only meant to be partially used? Why not provide an object through the right interface instead? The kind of interface that allows a client to do only the few things it is allowed to do, and nothing more?

If, in a specific context, a client is only allowed to pop items out of a queue, then write it against a queue interface that enforces this constraint:

An interface that is less likely to be used in the wrong way
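A sketch of such a reduced interface, matching the Queue above:

// A reduced view of the queue: clients holding this interface
// can consume items and check the size, but cannot push or clear.
interface ReadableQueue {
  pop(): unknown;
  size(): number;
}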

Of course, you don’t have to write many different implementations of the queue, each one with a reduced set of functionality. You can still create one full-featured queue as an object that implements the two – or even more – segregated interfaces, and choose which of those interfaces you want to use in your code, depending on what the client should be allowed to do with it:

The interface segregation in action
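For instance, the full-featured queue could implement both views at once (WriteableQueue is sketched here as the writing counterpart of ReadableQueue):

interface WriteableQueue {
  push(item: unknown): void;
}

// One full-featured queue implementing both segregated interfaces;
// each client only sees the view it is given.
class Queue implements ReadableQueue, WriteableQueue {
  private items: unknown[] = [];

  push(item: unknown): void { this.items.push(item); }
  pop(): unknown { return this.items.shift(); }
  size(): number { return this.items.length; }
  clear(): void { this.items = []; }
}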

Since the Queue class implements both the ReadableQueue and the WriteableQueue interfaces, it can be passed to any function that accepts one of those interfaces, keeping it unaware of what specific object it really is.

const queue = new Queue();
// ...

consumeQueue(queue);

function consumeQueue(queue: ReadableQueue) {
  /*
    Here the developer doesn't know what 'queue'
    really is, and can only use it through its interface
  */
}

In the example above, the consumeQueue() function receives a ReadableQueue object. It is unaware of what specific implementation it gets, and should stay agnostic about it. The only thing the function can do is use the queue object as a ReadableQueue.

Better testing

Last but not least, introducing abstractions in your code automatically makes it easier for developers to write tests: they can inject mock, fake or stub implementations instead of the real classes. It is true that a lot of testing frameworks can create mocks starting from concrete classes, but sometimes writing fake implementations may be easier than programming mocks, depending on how handy the framework is.
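For instance, a hand-written fake of the ReadableQueue interface from above can be simpler than configuring a mocking framework (FixedQueue is a made-up name for this sketch):

// A hand-written fake: it serves a fixed set of items, no framework needed.
class FixedQueue implements ReadableQueue {
  constructor(private items: unknown[]) {}

  pop(): unknown {
    return this.items.shift();
  }

  size(): number {
    return this.items.length;
  }
}

// In a test: consumeQueue(new FixedQueue(["a", "b", "c"]));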

Antonio Seprano

Jan 2019