backend engineer

I build backend systems
that scale without drama.

Based in India. Open to high-impact backend roles and freelance engagements with technical teams who care about what they ship.

2024 – Present
Backend Engineer
Your Company · Full-time
Brief description of what you do here. What systems you own, what you shipped, what impact it had.
Go PostgreSQL Redis Kafka
2022 – 2024
Software Engineer
Previous Company · Full-time
Brief description of what you did here. Focus on backend systems, scale, or interesting problems you solved.
Node.js TypeScript MongoDB AWS
// coming soon

Project showcases are being put together.


Skribl — XState vs State Pattern vs Flat Matrix

Introduction

Skribl is a multiplayer online game where one player draws a secret word and the others try to guess it from the drawing. A guesser's score depends on how quickly they guess, and the drawer's score is the average of the guessers' scores, which incentivizes the drawer to draw well.

To make this multiplayer loop work, the server cycles through a few core game states. After joining, players wait in the 'LOBBY' state. Once there are 2 or more players, the host can start the game, which moves it to the 'PICK_WORD' state. The drawer either picks a word, or a 15-second timer expires and the server picks one for them. Once the word is decided, the game enters the 'DRAW' state. When the turn timer expires or all players guess correctly, the turn ends: the game enters the 'TURN_END' state and the next player becomes the drawer. After all rounds are over, the game moves to the 'GAME_END' state, from which it can be restarted.

Introduction to state machines

Why use a state machine at all? Why not just use simple if/else blocks scattered across the codebase? There are two primary reasons:

  • Preventing impossible states: If we tracked our 5 states using independent boolean flags (isLobby, isDrawing, and so on), there would be 2^5 = 32 possible combinations. Only the 5 combinations with exactly one flag set are meaningful; the other 27 are invalid states that cause bugs (like being in the lobby and drawing at the same time). A state machine solves this by restricting the game to exactly 5 valid states.
  • Centralizing game logic: Without a state machine, there is no central authority controlling the core game mechanics. With one, the whole context of the game lives in one place. This matters as the game grows: with independent flags, every new state doubles the number of combinations. Moreover, a future contributor might forget to add an if statement somewhere, since nothing guards what can and cannot happen in the game.
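The arithmetic in the first point can be checked with a short sketch. It enumerates every combination of N independent boolean flags as the bits of a number and counts the combinations in which exactly one flag is set, since only those correspond to a real game state. The function names are hypothetical and exist only for this illustration.

```typescript
// Count how many combinations of `flagCount` independent booleans exist,
// and how many of them have exactly one flag set (one-hot).
function countCombinations(flagCount: number): { total: number; valid: number } {
  const total = 2 ** flagCount;
  let valid = 0;
  // Each mask from 0 to 2^flagCount - 1 encodes one combination:
  // bit i set means "flag i is true".
  for (let mask = 0; mask < total; mask++) {
    let trueFlags = 0;
    for (let bit = 0; bit < flagCount; bit++) {
      if (mask & (1 << bit)) trueFlags++;
    }
    // Only combinations with exactly one flag set map to a real game state.
    if (trueFlags === 1) valid++;
  }
  return { total, valid };
}

function summarize(flagCount: number): string {
  const { total, valid } = countCombinations(flagCount);
  return `${flagCount} flags: ${total} combinations, ${valid} valid, ${total - valid} invalid`;
}

console.log(summarize(5)); // 5 flags: 32 combinations, 5 valid, 27 invalid
console.log(summarize(6)); // 6 flags: 64 combinations, 6 valid, 58 invalid
```

The second call shows the growth problem from the second point: a sixth flag doubles the total to 64 while adding only one valid combination.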

This is what my original state machine looked like:

// states
type gameState = "LOBBY" | "PICK_WORD" | "DRAW" | "TURN_END" | "GAME_END";

// events
type events =
  | "PICK_TIMER_EXPIRED"
  | "WORD_PICKED"
  | "ALL_GUESSED"
  | "GUESS_TIMER_EXPIRED"
  | "GAME_START"
  | "NEXT_TURN"
  | "ALL_PLAYERS_LEFT"
  | "RESTART"
  | "ALL_ROUNDS_END";

// transitions
// a = {state1: {event1: state2, event2: state3}}
// a[state1][event1] = state2
// a[state1][event2] = state3

const transitions: Record<gameState, Partial<Record<events, gameState>>> = {
  LOBBY: {
    GAME_START: "PICK_WORD",
    ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY",
  },
  PICK_WORD: {
    PICK_TIMER_EXPIRED: "DRAW",
    WORD_PICKED: "DRAW",
    ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY",
  },
  DRAW: {
    GUESS_TIMER_EXPIRED: "TURN_END",
    ALL_GUESSED: "TURN_END",
    ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY",
    NEXT_TURN: "PICK_WORD",
    ALL_ROUNDS_END: "GAME_END",
  },
  TURN_END: {
    NEXT_TURN: "PICK_WORD",
    ALL_ROUNDS_END: "GAME_END",
    ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY",
  },
  GAME_END: {
    RESTART: "LOBBY",
  },
};

In the next section, we'll see why this setup might cause problems and crash the loop.

Errors

As seen in the previous section, the inner map of the transitions object is a Partial<Record<events, gameState>>, so each state is allowed to leave out any event. Looking up a state+event pair that was left out (transitions[state][event]) returns undefined instead of a next state. Let's see some cases where this might happen:

  • The drawer is prompted to choose one of 3 words to draw. The timer counts down 3, 2, 1, 0, expires, and PICK_TIMER_EXPIRED is fired. But at the last millisecond, the drawer chooses a word, and because of network delay the WORD_PICKED event arrives late. Since PICK_TIMER_EXPIRED has already been processed, the game is now in the DRAW state, so the state machine looks up transitions["DRAW"]["WORD_PICKED"], which does not exist. The lookup returns undefined and causes an error.
  • The drawer's timer expires and GUESS_TIMER_EXPIRED is fired, moving the game to TURN_END. At the same moment, the last remaining guesser types the correct word in the chat, firing ALL_GUESSED. By the time that event is processed, the game is already in TURN_END, and transitions["TURN_END"]["ALL_GUESSED"] is undefined.
  • A host accidentally clicks the 'start game' button twice. After the first click, the state changes to PICK_WORD. When the second GAME_START event arrives, the lookup transitions["PICK_WORD"]["GAME_START"] returns undefined.
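These races can be reproduced with a minimal sketch. It copies the original Partial table and drives it with a hypothetical naiveDispatch function that throws explicitly on a missing entry, so the failure is visible; in a real loop the undefined value would more likely propagate and fail somewhere later. The replay helper and the final count of undefined pairs are also illustrative additions, not part of the game's code.

```typescript
type gameState = "LOBBY" | "PICK_WORD" | "DRAW" | "TURN_END" | "GAME_END";
type events =
  | "PICK_TIMER_EXPIRED" | "WORD_PICKED" | "ALL_GUESSED" | "GUESS_TIMER_EXPIRED"
  | "GAME_START" | "NEXT_TURN" | "ALL_PLAYERS_LEFT" | "RESTART" | "ALL_ROUNDS_END";

// Same table as the original machine: any event may be missing for a state.
const transitions: Record<gameState, Partial<Record<events, gameState>>> = {
  LOBBY: { GAME_START: "PICK_WORD", ALL_PLAYERS_LEFT: "GAME_END", RESTART: "LOBBY" },
  PICK_WORD: {
    PICK_TIMER_EXPIRED: "DRAW", WORD_PICKED: "DRAW",
    ALL_PLAYERS_LEFT: "GAME_END", RESTART: "LOBBY",
  },
  DRAW: {
    GUESS_TIMER_EXPIRED: "TURN_END", ALL_GUESSED: "TURN_END", ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY", NEXT_TURN: "PICK_WORD", ALL_ROUNDS_END: "GAME_END",
  },
  TURN_END: {
    NEXT_TURN: "PICK_WORD", ALL_ROUNDS_END: "GAME_END",
    ALL_PLAYERS_LEFT: "GAME_END", RESTART: "LOBBY",
  },
  GAME_END: { RESTART: "LOBBY" },
};

// Hypothetical naive dispatcher. The lookup's type is gameState | undefined;
// the explicit throw stands in for whatever would break downstream.
function naiveDispatch(state: gameState, event: events): gameState {
  const next = transitions[state][event];
  if (next === undefined) throw new Error(`No transition for ${state} + ${event}`);
  return next;
}

// Feed a sequence of events and report the final state or the error message.
function replay(start: gameState, sequence: events[]): string {
  let state = start;
  try {
    for (const event of sequence) state = naiveDispatch(state, event);
    return `ok: ${state}`;
  } catch (err) {
    return (err as Error).message;
  }
}

// Late word pick: prints "No transition for DRAW + WORD_PICKED"
console.log(replay("PICK_WORD", ["PICK_TIMER_EXPIRED", "WORD_PICKED"]));
// Last guess after timeout: prints "No transition for TURN_END + ALL_GUESSED"
console.log(replay("DRAW", ["GUESS_TIMER_EXPIRED", "ALL_GUESSED"]));
// Double click on start: prints "No transition for PICK_WORD + GAME_START"
console.log(replay("LOBBY", ["GAME_START", "GAME_START"]));

// Count every state+event pair the Partial table leaves undefined.
const allEvents: events[] = [
  "PICK_TIMER_EXPIRED", "WORD_PICKED", "ALL_GUESSED", "GUESS_TIMER_EXPIRED",
  "GAME_START", "NEXT_TURN", "ALL_PLAYERS_LEFT", "RESTART", "ALL_ROUNDS_END",
];
const missing: string[] = [];
for (const state of Object.keys(transitions) as gameState[]) {
  for (const event of allEvents) {
    if (transitions[state][event] === undefined) missing.push(`${state} + ${event}`);
  }
}
console.log(`${missing.length} of ${5 * allEvents.length} pairs are undefined`); // 27 of 45 pairs are undefined
```

The count shows that 27 of the 45 state+event pairs have no entry, and each one is a potential crash if its event arrives at the wrong moment.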

In each case, once the undefined lookup result is treated as the next state, the game loop can crash. Network latency and the race conditions it causes can produce many other state+event combinations like these. In the next section, we will discuss some potential solutions to handle this situation better.

Final trade-offs

Besides the exhaustive matrix, I considered two main options to solve this problem.

  • XState Library: A state machine library that keeps the context (variables needed to run the game, like players, scores, timers, and rounds), the states and transitions, side effects, and guards together in one machine definition. That machine acts as a single source of truth instead of having different components of the game live separately and communicate with one another.
    • Let's take the LOBBY state as an example. In XState, it would look like this:
      LOBBY: {
        on: {
          GAME_START: "PICK_WORD",
          ALL_PLAYERS_LEFT: "GAME_END",
          RESTART: "LOBBY",
        },
      }
      As you can see, there is no need to explicitly swallow events that have no transition. XState ignores any event the current state does not handle, so a late or duplicated event caused by race conditions and network delays does not crash the game with an undefined transition.
    • This looks good, but at the current stage of my game, XState would have brought the following overhead:
      • A new dependency
      • A new mental model with a pre-defined programming pattern (contexts, actors, guards, etc.)
    • XState would have provided me with:
      • Timers embedded into the machine itself (but I only have 1 timer currently)
      • Guards within the machine (but I only have one: at least 2 players are needed to start the game)
      • No boilerplate to handle undefined events (but doing so manually was straightforward)
    • Adding XState will make more sense when I add features like reconnection handling (coming in phase 2) and progressive hints. At that point, we will have 3 timers to manage, more callbacks to wire (and potentially forget), and multiple files to touch for each timer. When things start to go that way, it's a strong signal to move away from the current exhaustive matrix approach.
  • State Pattern: A design pattern where each state is its own object that owns its transitions, guards, and onEnter and onExit functions. Let's again take the LOBBY state as an example. This is how it would be written using the State Pattern:
    const LOBBY = {
      onEnter(room) {
        // things to set up in lobby
      },
      onExit(room) {
        // clean up procedures
      },
      dispatch(event, room) {
        if (event === "GAME_START" && room.players.length >= 2) return PICK_WORD;
        //                               ^ guard lives here inside the state
        if (event === "ALL_PLAYERS_LEFT") return GAME_END;
        if (event === "RESTART") return LOBBY;
        return this; // everything else ignored
      },
    };
    A GameRoom class then orchestrates the whole game by delegating each event to the current state:
    class GameRoom {
      players: string[] = []; // read by the guard in LOBBY.dispatch
      private currentState = LOBBY;
    
      dispatch(event) {
        const nextState = this.currentState.dispatch(event, this);
    
        if (nextState !== this.currentState) {
          // state changed
          this.currentState.onExit(this);  // clean up old state
          this.currentState = nextState;
          this.currentState.onEnter(this); // set up new state
        }
        // if nextState === this.currentState, nothing happens
      }
    }
    When the room receives a WORD_PICKED event while in the LOBBY state, LOBBY.dispatch falls through to return this. Because nextState === this.currentState, the if block is skipped: no hooks run, the state stays LOBBY, and nothing crashes on an undefined state.

    As with XState, this pattern makes the most sense once there are multiple onEnter functions for setup and onExit functions for cleanup. Without the State Pattern, we have to wire these as callbacks in the right places, and the responsibility is on us to remember all the setup and cleanup logic. The State Pattern keeps these functions inside the state itself, taking that manual wiring off our hands.
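To see the State Pattern end to end, here is a minimal runnable sketch trimmed to four states (LOBBY, PICK_WORD, DRAW, GAME_END). The State interface, the log array that records hook calls, and the stateName getter are assumptions added so the behavior can be observed, and DRAW deliberately omits its turn-ending transitions. The comments in PICK_WORD only mark where real timer setup and cleanup might go.

```typescript
// Event names taken from the original state machine.
type events =
  | "PICK_TIMER_EXPIRED" | "WORD_PICKED" | "ALL_GUESSED" | "GUESS_TIMER_EXPIRED"
  | "GAME_START" | "NEXT_TURN" | "ALL_PLAYERS_LEFT" | "RESTART" | "ALL_ROUNDS_END";

// Hypothetical interface every state object implements.
interface State {
  name: string;
  onEnter(room: GameRoom): void;
  onExit(room: GameRoom): void;
  dispatch(event: events, room: GameRoom): State; // returning itself means "ignore"
}

const LOBBY: State = {
  name: "LOBBY",
  onEnter(room) { room.log.push("enter LOBBY"); },
  onExit(room) { room.log.push("exit LOBBY"); },
  dispatch(event, room) {
    // Guard: at least 2 players are needed to start.
    if (event === "GAME_START" && room.players.length >= 2) return PICK_WORD;
    if (event === "ALL_PLAYERS_LEFT") return GAME_END;
    return this; // everything else is ignored
  },
};

const PICK_WORD: State = {
  name: "PICK_WORD",
  onEnter(room) { room.log.push("enter PICK_WORD"); }, // e.g. start the pick timer here
  onExit(room) { room.log.push("exit PICK_WORD"); },   // e.g. clear the pick timer here
  dispatch(event) {
    if (event === "WORD_PICKED" || event === "PICK_TIMER_EXPIRED") return DRAW;
    if (event === "ALL_PLAYERS_LEFT") return GAME_END;
    if (event === "RESTART") return LOBBY;
    return this;
  },
};

// Trimmed: turn-ending events are not modeled in this sketch.
const DRAW: State = {
  name: "DRAW",
  onEnter(room) { room.log.push("enter DRAW"); },
  onExit(room) { room.log.push("exit DRAW"); },
  dispatch(event) {
    if (event === "ALL_PLAYERS_LEFT") return GAME_END;
    if (event === "RESTART") return LOBBY;
    return this;
  },
};

const GAME_END: State = {
  name: "GAME_END",
  onEnter(room) { room.log.push("enter GAME_END"); },
  onExit(room) { room.log.push("exit GAME_END"); },
  dispatch(event) {
    return event === "RESTART" ? LOBBY : this;
  },
};

class GameRoom {
  players: string[] = [];
  log: string[] = []; // records hook calls so the sketch is observable
  private currentState: State = LOBBY;

  get stateName(): string {
    return this.currentState.name;
  }

  dispatch(event: events): void {
    const nextState = this.currentState.dispatch(event, this);
    if (nextState !== this.currentState) {
      this.currentState.onExit(this);  // clean up the old state
      this.currentState = nextState;
      this.currentState.onEnter(this); // set up the new state
    }
  }
}

const room = new GameRoom();
room.dispatch("GAME_START");         // ignored: the guard needs 2 players
room.players.push("ana", "ben");
room.dispatch("GAME_START");         // LOBBY -> PICK_WORD
room.dispatch("PICK_TIMER_EXPIRED"); // PICK_WORD -> DRAW
room.dispatch("WORD_PICKED");        // late pick: DRAW returns itself, ignored
console.log(room.stateName);         // DRAW
console.log(room.log.join(" | "));   // exit LOBBY | enter PICK_WORD | exit PICK_WORD | enter DRAW
```

Both ignored events (the guarded GAME_START and the late WORD_PICKED) leave the log untouched, because a state returning itself never triggers onExit or onEnter.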

What I chose

At this stage, most of my game logic is about states and transitions rather than callbacks, guards, and cleanup. I don't need to manage multiple timers or dozens of cleanup functions. To avoid adding a dependency and switching to a different coding paradigm, I chose the exhaustive matrix. I still have to handle the callbacks, guards, and data flowing through the game myself, but that's manageable because there are only a few of them and only one timer.

I first changed the type of the transitions object from Record<gameState, Partial<Record<events, gameState>>> to Record<gameState, Record<events, gameState | 'IGNORE'>>. Without Partial, every state must explicitly define a transition for every event, and the TypeScript compiler rejects the table if any pair is missing. The state+event combinations that should have no effect are set to 'IGNORE'.

const transitions: Record<gameState, Record<events, gameState | "IGNORE">> = {
  LOBBY: {
    GAME_START: "PICK_WORD",
    ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY",
    PICK_TIMER_EXPIRED: "IGNORE",
    WORD_PICKED: "IGNORE",
    ALL_GUESSED: "IGNORE",
    GUESS_TIMER_EXPIRED: "IGNORE",
    NEXT_TURN: "IGNORE",
    ALL_ROUNDS_END: "IGNORE",
  },
  // ... rest of the states
};

This prevents the machine from crashing the game on undefined transitions: a late or duplicated event now maps to 'IGNORE' and leaves the state unchanged, so invalid state+event combinations are handled safely. Although we have to repeat 'IGNORE' many times, it's worth the trade-off.
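The snippet above shows only LOBBY. A complete, runnable version might look like the following sketch. It assumes the valid transitions are unchanged from the original table and that every pair missing from the Partial version becomes 'IGNORE'. The Transition alias and the dispatch and replay helpers are hypothetical and only illustrate one way to treat 'IGNORE' as a no-op.

```typescript
type gameState = "LOBBY" | "PICK_WORD" | "DRAW" | "TURN_END" | "GAME_END";
type events =
  | "PICK_TIMER_EXPIRED" | "WORD_PICKED" | "ALL_GUESSED" | "GUESS_TIMER_EXPIRED"
  | "GAME_START" | "NEXT_TURN" | "ALL_PLAYERS_LEFT" | "RESTART" | "ALL_ROUNDS_END";
type Transition = gameState | "IGNORE";

// Assumed full matrix: valid transitions copied from the original table,
// every other state+event pair explicitly set to "IGNORE".
const transitions: Record<gameState, Record<events, Transition>> = {
  LOBBY: {
    GAME_START: "PICK_WORD", ALL_PLAYERS_LEFT: "GAME_END", RESTART: "LOBBY",
    PICK_TIMER_EXPIRED: "IGNORE", WORD_PICKED: "IGNORE", ALL_GUESSED: "IGNORE",
    GUESS_TIMER_EXPIRED: "IGNORE", NEXT_TURN: "IGNORE", ALL_ROUNDS_END: "IGNORE",
  },
  PICK_WORD: {
    PICK_TIMER_EXPIRED: "DRAW", WORD_PICKED: "DRAW", ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY", GAME_START: "IGNORE", ALL_GUESSED: "IGNORE",
    GUESS_TIMER_EXPIRED: "IGNORE", NEXT_TURN: "IGNORE", ALL_ROUNDS_END: "IGNORE",
  },
  DRAW: {
    GUESS_TIMER_EXPIRED: "TURN_END", ALL_GUESSED: "TURN_END", ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY", NEXT_TURN: "PICK_WORD", ALL_ROUNDS_END: "GAME_END",
    GAME_START: "IGNORE", PICK_TIMER_EXPIRED: "IGNORE", WORD_PICKED: "IGNORE",
  },
  TURN_END: {
    NEXT_TURN: "PICK_WORD", ALL_ROUNDS_END: "GAME_END", ALL_PLAYERS_LEFT: "GAME_END",
    RESTART: "LOBBY", GAME_START: "IGNORE", PICK_TIMER_EXPIRED: "IGNORE",
    WORD_PICKED: "IGNORE", ALL_GUESSED: "IGNORE", GUESS_TIMER_EXPIRED: "IGNORE",
  },
  GAME_END: {
    RESTART: "LOBBY", GAME_START: "IGNORE", PICK_TIMER_EXPIRED: "IGNORE",
    WORD_PICKED: "IGNORE", ALL_GUESSED: "IGNORE", GUESS_TIMER_EXPIRED: "IGNORE",
    NEXT_TURN: "IGNORE", ALL_PLAYERS_LEFT: "IGNORE", ALL_ROUNDS_END: "IGNORE",
  },
};

// Hypothetical dispatcher: for a typed event the lookup is never undefined,
// and "IGNORE" simply keeps the current state.
function dispatch(state: gameState, event: events): gameState {
  const next = transitions[state][event];
  return next === "IGNORE" ? state : next;
}

// Apply a sequence of events in order, starting from `start`.
function replay(start: gameState, sequence: events[]): gameState {
  return sequence.reduce<gameState>((state, event) => dispatch(state, event), start);
}

// The three race conditions from the Errors section now settle safely.
console.log(replay("PICK_WORD", ["PICK_TIMER_EXPIRED", "WORD_PICKED"])); // DRAW
console.log(replay("DRAW", ["GUESS_TIMER_EXPIRED", "ALL_GUESSED"]));     // TURN_END
console.log(replay("LOBBY", ["GAME_START", "GAME_START"]));              // PICK_WORD
```

Two design notes on this sketch. The Record<events, Transition> type only protects typed code, so event names parsed from network messages still need runtime validation before they reach dispatch. And guards such as the 2-player check are not encoded in the matrix; here they would have to be enforced by whatever code fires GAME_START.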

As I move to the second and final phase of the game, I will implement features like:

  • Reconnection handling
  • Progressive hints
  • Vote to kick
  • Chat muting
  • Custom word list
  • Mid-game joining
  • Host transfer

Vote to kick, chat muting, and custom word lists should not require me to rethink my current setup much, because they change the logic inside the states rather than the states themselves. Progressive hints and especially reconnection handling, however, will push me to reconsider options like XState, the State Pattern, or another paradigm.

But that's a subject for a future blog post!