Bookshelf

Designing Data-Intensive Applications

Martin Kleppmann

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

  • Peer under the hood of the systems you already use, and learn how to use and operate them more effectively.
  • Make informed decisions by identifying the strengths and weaknesses of different tools.
  • Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity.
  • Understand the distributed systems research upon which modern databases are built.
  • Peek behind the scenes of major online services, and learn from their architectures.

systems, distributed

Kleppmann's DDIA is the book I keep returning to whenever I have to reason about a system whose state lives in more than one place. It's not a recipe book — it's a vocabulary book.

What stuck with me

  • Replication, partitioning, and consistency are three orthogonal axes, not a single dial. Most production confusion I've seen comes from conflating them.
  • The CAP theorem is overrated as a design tool — the more useful frame is the latency / consistency trade-off you actually pay for under partial failure.
  • Logs are everywhere. Once you see write-ahead logs, replication logs, and event logs as variants of the same idea, a lot of architectures collapse into the same shape.
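
To make the last point concrete, here is a minimal sketch of one append-only log serving three roles. It is my own illustration, not code from the book: the `Log` and `Store` classes, their method names, and the sample keys are all hypothetical, and a real system would add durability, checksums, and concurrency control.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical append-only log of (key, value) records; offsets are list indexes."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def append(self, key: str, value: str) -> int:
        self.entries.append((key, value))
        return len(self.entries) - 1  # offset of the record just written

    def read_from(self, offset: int) -> list[tuple[str, str]]:
        return self.entries[offset:]


@dataclass
class Store:
    """Key-value state derived purely by applying log records in order."""
    state: dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of log records applied so far

    def catch_up(self, log: Log) -> None:
        for key, value in log.read_from(self.applied):
            self.state[key] = value
            self.applied += 1


log = Log()
log.append("user:1", "alice")
log.append("user:2", "bob")
log.append("user:1", "alicia")

# Write-ahead log role: a restarted node rebuilds its state by replaying from offset 0.
recovered = Store()
recovered.catch_up(log)

# Replication log role: a follower that had applied one record resumes from its offset.
follower = Store(state={"user:1": "alice"}, applied=1)
follower.catch_up(log)

# Event log role: another consumer derives a different view from the same records.
writes_per_key: dict[str, int] = {}
for key, _ in log.read_from(0):
    writes_per_key[key] = writes_per_key.get(key, 0) + 1
```

In all three cases the consumer does the same thing: it reads records in order from an offset it tracks itself. That shared shape is what I mean by architectures collapsing into one.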

Who should read it

Anyone who's about to choose a database for a new service and would otherwise pick the one their last team used. Read chapters 1, 5, 7, and 9 first if you're short on time.