
Life is about workflows.

Whether it’s the way you make your morning coffee, the process you follow to close out a JIRA ticket, or the production of the Model T, our lives are lived through workflows. These workflows form the backbone of our daily lives, and we put enormous effort into perfecting these flows: optimizing exactly what time we wake up to get to the gym just in time for our favorite class, coming up with the best bedtime routine for maximum sleep quality, or just making sure that we don’t brush our teeth before drinking orange juice.

Whether their end purpose is fitness optimization or automobile production, all of these workflows have two things in common:

  1. Workflows consist of a set of concrete, repeatable steps or activities.
  2. These steps or activities have important dependencies between them. One step often depends on the previous one, and a “half-done” or out-of-order step could result in utter chaos.

Luckily, it’s pretty easy for us humans to handle #2 in our day-to-day lives. Some instances are more complex than others, but aside from the occasional mental slip-up, we tend not to try to put our socks on over our shoes.

Unfortunately for developers everywhere, in the abstract world of software, things are a bit more tricky.

For many short-lived, low-level activities, this problem has more or less been solved. ACID-compliant databases have long had transaction mechanisms in place to ensure things work as expected and programmers (and subsequently their applications) don’t end up in a wonky “halfway” state. Much research and work has gone into this, and as a result, few developers spend much time thinking about (much less checking for) whether their database write will work as expected; it’s just a foregone conclusion that it will. The database itself has taken on the responsibility of ensuring that things go as expected and alerting the developer if that isn’t the case. Developers might have to add some simple logic to retry in the event of an error, but in general, no error from the database means no worries for the developer. This mechanism is fantastic for simple tasks that can be boiled down to “write down that an event happened,” and has been a staple of software engineering for decades.
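To make that guarantee concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the transfer amount, and the `fail_midway` flag are all hypothetical, invented purely for illustration. The point is only that when something goes wrong between two writes, the database undoes the first one for you.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn


def record_transfer(conn, fail_midway=False):
    """Move 10 units from account 1 to account 2 inside one transaction."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
            if fail_midway:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the balances ordered by account id."""
    return [row[0] for row in conn.execute("SELECT balance FROM accounts ORDER BY id")]
```

If `record_transfer` is called with `fail_midway=True`, the debit from account 1 is rolled back and both balances are left untouched. No "halfway" state survives, and the developer wrote no cleanup code to get that.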

But for higher-level business logic, the picture is a bit more challenging. In general, this is because higher-level “business logic” activities tend to have two characteristics that make them very tricky to handle with database transactions: A) they take place over longer periods of time (think seconds or minutes, not microseconds) and B) they span multiple services. Database transactions are great for handling simple, short-lived tasks, but for long-lived and complex ones? Not so much.

Unfortunately, the brunt of this falls onto the shoulders of developers everywhere. No longer able to rely on the magic of databases, they are instead forced to handle things on their own. This takes state management from “I don’t have to think about it” (as is the case with databases) to “I better get this right or there are very real ramifications.”

As an example, imagine you are a fintech service, and you are trying to deal with new user sign-ups. At a high level, this is a pretty straightforward process/workflow:

  1. User signs up and enters their credit card
  2. You create a new account for the user
  3. You post a small, random transaction to their bank account
  4. You prompt the user to log in to their bank and enter the amount posted
  5. If successful, you mark the account as verified
  6. You remove the credit from the account
  7. You send the user a verification email welcoming them to your service

In plain English, this process is easy for anyone to understand. But if you think about things a bit deeper, there is an enormous amount of room for error. For example, what happens if there is a failure during step 2 and the account never gets created? Does the user have to re-enter their credit card to try again? Do you end up with a half-created account in your database? Does the user get notified that something happened, or do they just sit waiting and wonder why the transaction never showed up?

What if the user walks away from their computer during step 4? Does the process terminate? How long should it wait? Should the credit be left in their account indefinitely? Where is that amount stored if you need to roll it back?
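To get a feel for what answering these questions by hand looks like, here is a rough sketch of steps 2 through 6 with manual rollback. Everything in it is hypothetical: `FakeBank`, `BankError`, the `accounts` dict standing in for a database, and the `wait_for_amount` callback that returns `None` when the user never comes back. It is not how any particular system does this, just an illustration of the bookkeeping involved.

```python
class BankError(Exception):
    """Raised when a (hypothetical) bank call or the verification fails."""


class FakeBank:
    """In-memory stand-in for a real bank API; purely illustrative."""

    def __init__(self, fail_on_post=False):
        self.credits = {}
        self.fail_on_post = fail_on_post

    def post_credit(self, user_id):
        if self.fail_on_post:
            raise BankError("bank unavailable")
        self.credits[user_id] = 37  # small amount in cents, fixed for the demo
        return 37

    def remove_credit(self, user_id, amount):
        self.credits.pop(user_id, None)


def sign_up(user_id, accounts, bank, wait_for_amount):
    """Run steps 2-6, undoing completed steps if a later one fails."""
    undo = []  # compensating actions for finished steps, most recent last
    try:
        accounts[user_id] = {"verified": False}              # step 2
        undo.append(lambda: accounts.pop(user_id, None))

        amount = bank.post_credit(user_id)                   # step 3
        undo.append(lambda: bank.remove_credit(user_id, amount))

        entered = wait_for_amount(user_id)                   # step 4
        if entered != amount:
            raise BankError("verification failed or timed out")

        accounts[user_id]["verified"] = True                 # step 5
        bank.remove_credit(user_id, amount)                  # step 6
        return True
    except BankError:
        # Roll back in reverse order so no half-created state is left behind.
        for compensate in reversed(undo):
            compensate()
        return False
```

Even this toy version has a glaring hole: the `undo` list lives in memory, so if the process itself crashes while waiting on step 4, the knowledge of what to roll back is gone along with it.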

As complex as this is to think through, it’s even more difficult to implement. Coding aside, just imagine you had to do this manually with pen and paper and think about the amount of information you’d have to write down and keep track of: the credit card number, the account id, the user id, the transaction amount, whether the amount was verified, if the credit had been removed already or not. The management of this state is complex, convoluted, and painful. The question is, does it have to be?
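As a rough illustration, the pen-and-paper record described above might look something like this in code. The `SignupState` class, its field names, and the step names are all assumptions made up for this sketch, and a real system would certainly not store a raw card number this way.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SignupState:
    """Everything we would have to 'write down' for one sign-up (hypothetical)."""

    user_id: str
    card_number: str  # illustration only; never persist raw card numbers
    account_id: Optional[str] = None
    amount_cents: Optional[int] = None
    amount_verified: bool = False
    credit_removed: bool = False

    def next_step(self) -> str:
        """Work out where to resume, based only on what has been recorded."""
        if self.account_id is None:
            return "create_account"
        if self.amount_cents is None:
            return "post_transaction"
        if not self.amount_verified:
            return "await_user_amount"
        if not self.credit_removed:
            return "remove_credit"
        return "send_welcome_email"


def save(state: SignupState) -> str:
    """Serialize the state so it could survive a crash or restart."""
    return json.dumps(asdict(state))


def load(blob: str) -> SignupState:
    """Rebuild the state from its serialized form."""
    return SignupState(**json.loads(blob))
```

Every step of the workflow would need to call something like `save` after it finishes and consult `next_step` after any restart, and that is before handling retries, timeouts, or two workers picking up the same sign-up at once.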

The Masters At Work

This is the question that Temporal co-founders Maxim Fateev and Samar Abbas have spent their entire professional lives trying to answer. In today’s world of infinite possibilities and fun projects to work on, it is rare to find an engineer who has worked on the same problem for 5 years at a time. Maxim and Samar have been working on this one for 20 each, with work spanning some of the biggest names in software: Uber, Microsoft, Google, AWS. They’ve written workflow engines and re-written workflow engines, narrowing in on the most elegant, efficient, and user-friendly solution to one of the hardest problems in software infrastructure. Samar and Maxim initially grabbed the hearts and minds of developers with their work on Cadence, the popular open source framework for workflows, but they didn’t quite feel they had done enough.

With their latest effort, Temporal, they seem to have cracked the code. What started as an internal project at Uber has morphed into one of the fastest-growing and most loved infrastructure projects we have ever seen, powering key production workflows at some of the leading technology companies like Stripe, HashiCorp, Datadog, and Netflix. Finally, the question “does state management need to be painful” can be answered with a loud, resounding “no.”

The history of software is one of increasing layers of abstraction: assembly gave way to C, which gave way to Java, which gave way to Rust. Temporal is a new chapter in that history, bringing the transactional guarantees and peace of mind of databases to application developers everywhere. From the moment we met Maxim and Samar, we knew they were the founders to write this chapter. We are delighted to be a part of their story.


Published — Feb. 16, 2022