
7 Databases in 7 Weeks (2025)


Author: Matt Blewitt, Original: 7 Databases in 7 Weeks (2025)

Translator: Feng Ruohang, database veteran, cloud computing mudslide

https://matt.blwt.io/post/7-databases-in-7-weeks-for-2025/

For a long time, I’ve been running Databases-as-a-Service, and there’s always something new to keep up with in this field — new technologies, different approaches to solving problems, not to mention the constant stream of research coming out of universities. Looking ahead to 2025, consider spending a week diving deep into each of the following database technologies.

[Image: A line drawing of a bookshelf, with the books labelled for each database covered — PostgreSQL, SQLite, DuckDB, ClickHouse, FoundationDB, TigerBeetle and CockroachDB]


Foreword

This isn’t a “7 Best Databases” roundup, nor the setup for some listicle — these are simply seven databases I think are worth spending about a week each seriously studying. You might ask, “Why not Neo4j, MongoDB, MySQL/Vitess, or other databases?” The answer is mostly: I don’t find them interesting. I also won’t be covering Kafka or similar streaming data services — they’re definitely worth your time, but they’re outside the scope of this article.


Table of Contents

  1. PostgreSQL
  2. SQLite
  3. DuckDB
  4. ClickHouse
  5. FoundationDB
  6. TigerBeetle
  7. CockroachDB
  8. Wrap-up

1. PostgreSQL

The Default Database

“Use Postgres for everything” has almost become a meme, and for good reason. PostgreSQL is the pinnacle of boring technology, and should be your go-to choice when you need a client-server model database. PG follows ACID principles, has rich replication methods — including both physical and logical replication — and enjoys excellent support across all major vendors.

However, my favorite PostgreSQL feature is extensions. In this regard, Postgres demonstrates a vitality that other databases struggle to match. Almost any functionality you want has a corresponding extension — AGE supports graph data structures and Cypher query language, TimescaleDB supports time-series workloads, Hydra Columnar provides an alternative columnar storage engine, and so on. If you’re interested in trying this yourself, I recently wrote an article about building extensions.

Because of this, Postgres shines as an excellent “default” database, and we’re seeing more and more non-Postgres services using the Postgres wire protocol as a common layer-7 protocol to provide client compatibility. With a rich ecosystem, sensible defaults, and even the ability to run in browsers with Wasm, this makes it a database worth understanding deeply.

Spend a week exploring the various possibilities of Postgres, while also understanding some of its limitations — MVCC can be somewhat temperamental. Implement a simple CRUD application in your favorite programming language, or even try building a Postgres extension.
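The “temperamental” MVCC mentioned above is worth internalizing: every UPDATE writes a new physical row version, and each snapshot decides per version whether it can see it — which is also why dead tuples and VACUUM exist. A toy model of snapshot visibility, in illustrative Python (not Postgres’s actual rules, which also involve hint bits, subtransactions, and more):

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class TupleVersion:
    xmin: int            # txid that created this version
    xmax: Optional[int]  # txid that deleted/updated it, if any
    value: str

def visible(t: TupleVersion, snapshot_xid: int, in_progress: Set[int]) -> bool:
    """A version is visible if its creator committed before our snapshot,
    and any deleter is not yet visible to us."""
    if t.xmin >= snapshot_xid or t.xmin in in_progress:
        return False     # creator not yet visible to this snapshot
    if t.xmax is None:
        return True      # never deleted
    if t.xmax >= snapshot_xid or t.xmax in in_progress:
        return True      # deleter not yet visible: old version still readable
    return False         # deleted before our snapshot started

# One logical row, updated once: two physical versions coexist on disk.
versions = [
    TupleVersion(xmin=5, xmax=9, value="v1"),    # superseded by txid 9
    TupleVersion(xmin=9, xmax=None, value="v2"),
]
# A snapshot taken before txid 9 still sees the old version...
old = [v.value for v in versions if visible(v, 8, set())]
# ...while a later snapshot sees only the new one. The old version lingers
# as a dead tuple until VACUUM reclaims it — the root of table bloat.
new = [v.value for v in versions if visible(v, 12, set())]
print(old, new)  # ['v1'] ['v2']
```

The sketch also shows why long-running transactions are painful: as long as some old snapshot might still need “v1”, VACUUM cannot reclaim it.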


2. SQLite

The Local-First Database

Moving away from the client-server model, we detour into “embedded” databases, starting with SQLite. I call it the “local-first” database because a SQLite database lives directly alongside the application. Perhaps the most famous example is WhatsApp, which stores chat history as local SQLite databases on the device. Signal does the same.

Beyond this, we’re starting to see more innovative uses of SQLite than just a local ACID database. Tools like Litestream provide streaming backup, and LiteFS provides distributed access, letting us design more interesting topologies. Extensions like CR-SQLite bring CRDTs to SQLite so that merging changesets requires no conflict resolution, as exemplified by Corrosion.

Thanks to Ruby on Rails 8.0, SQLite is also enjoying a small renaissance — 37signals has gone all-in on SQLite, building a series of Rails modules like Solid Queue and configuring Rails via database.yml to run multiple SQLite databases. Bluesky uses SQLite for Personal Data Servers — each user gets their own SQLite database.

Spend a week using SQLite, exploring local-first architecture, and you might even research whether you can migrate from a Postgres client-server model to a SQLite-only pattern.
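As a starting point for that week, the whole local-first pattern fits in a few lines of stdlib Python — one self-contained ACID database file per user, in the spirit of WhatsApp’s on-device store (the schema and names here are illustrative):

```python
import sqlite3

def open_user_db(path: str) -> sqlite3.Connection:
    """One self-contained ACID database per user, living next to the app."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # better local read/write concurrency
    conn.execute("""CREATE TABLE IF NOT EXISTS messages (
        id      INTEGER PRIMARY KEY,
        sender  TEXT NOT NULL,
        body    TEXT NOT NULL,
        sent_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn

conn = open_user_db(":memory:")  # a real app would use a file path per user
with conn:  # connection as transaction: commit on success, rollback on error
    conn.execute("INSERT INTO messages (sender, body) VALUES (?, ?)",
                 ("alice", "hello"))
    conn.execute("INSERT INTO messages (sender, body) VALUES (?, ?)",
                 ("bob", "hi back"))
count = conn.execute("SELECT count(*) FROM messages").fetchone()[0]
print(count)  # 2
```

There is no server to deploy or connection pool to tune — that operational simplicity is a big part of the renewed appeal.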


3. DuckDB

The Universal Query Database

Next is another embedded database, DuckDB. Like SQLite, DuckDB aims to be an in-process database system, but focuses more on Online Analytical Processing (OLAP) rather than Online Transaction Processing (OLTP).

DuckDB’s highlight is as a “universal query” database, using SQL as the preferred dialect. It can natively import data from CSV, TSV, JSON, and even formats like Parquet — check out DuckDB’s data sources list! This gives it tremendous flexibility — take a look at this example of querying Bluesky’s firehose.

Like Postgres, DuckDB also has extensions, though the ecosystem isn’t as rich — after all, DuckDB is relatively young. Many community-contributed extensions can be found in the community extensions list, and I particularly like gsheets.

Spend a week using DuckDB for data analysis and processing — whether through Python Notebooks, tools like Evidence, or even see how it combines with SQLite’s “local-first” approach, offloading analytical queries from SQLite databases to DuckDB, since DuckDB can also read SQLite data.


4. ClickHouse

The Columnar Database

Leaving the embedded database realm but continuing in the analytical space, we encounter ClickHouse. If I could only choose two databases, I’d be very happy using just Postgres and ClickHouse — the former for OLTP, the latter for OLAP.

ClickHouse focuses on analytical workloads and supports very high ingestion rates through horizontal scaling and sharded storage. It also supports tiered storage, allowing you to separate “hot” and “cold” data — GitLab has quite detailed documentation on this.
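The payoff of the columnar layout is easy to see even in a toy sketch: an aggregate reads one column’s contiguous values instead of walking every field of every row. A pure-Python illustration (real ClickHouse adds compression, vectorized execution, and sparse indexes on top):

```python
# Row-oriented layout: one record per event, all fields interleaved.
rows = [
    {"ts": 1, "user": "a", "bytes": 100},
    {"ts": 2, "user": "b", "bytes": 250},
    {"ts": 3, "user": "a", "bytes": 50},
]

# Column-oriented layout: one contiguous array per field.
columns = {
    "ts":    [r["ts"] for r in rows],
    "user":  [r["user"] for r in rows],
    "bytes": [r["bytes"] for r in rows],
}

# SELECT sum(bytes): the row store must touch every field of every row,
# while the column store scans a single contiguous array — and a real
# engine can additionally skip whole blocks using per-block min/max stats.
row_total = sum(r["bytes"] for r in rows)
col_total = sum(columns["bytes"])
assert row_total == col_total == 400
```

The same shape explains why columnar stores compress so well: values within one column are homogeneous and often repetitive.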

ClickHouse has advantages when you need to run analytical queries on large datasets that DuckDB can’t handle, or when you need “real-time” analytics. There’s been a lot of “Benchmarketing” around these datasets, so I won’t elaborate further.

Another reason I recommend learning ClickHouse is its excellent operational experience — deployment, scaling, backup, etc. all have detailed documentation — even including setting up appropriate CPU Governors.

Spend a week exploring larger analytical datasets, or converting the DuckDB analysis above to ClickHouse deployment. ClickHouse also has an embedded version — chDB — which can provide more direct comparisons.


5. FoundationDB

The Layered Database

Now we enter the “mind-bending” section of this list, with FoundationDB taking the stage. You could say FoundationDB isn’t a database, but rather the foundation component of databases. Used in production by companies like Apple, Snowflake, and Tigris Data, FoundationDB is worth your time because it’s quite unique in the key-value storage world.

Yes, it’s an ordered key-value store, but that’s not what makes it interesting. At first glance, it has some peculiar limitations — for example, transactions cannot affect more than 10MB of data, transactions must complete within five seconds of their first read. But as they say, constraints liberate us. By imposing these limitations, it can achieve complete ACID transactions at very large scales — I know of clusters running over 100 TiB.

FoundationDB is designed for specific workloads and has been extensively tested with deterministic simulation testing. This testing approach has since been adopted by other technologies, including another database on this list, and by Antithesis, founded by some former FoundationDB members. For more on this, see the related notes from Tyler Neely and Phil Eaton.

As mentioned, FoundationDB has some very specific semantics that take time to adapt to — their features documentation and anti-features (functionality they don’t intend to provide) are worth understanding to grasp the problems they’re trying to solve.

But why is it a “layered” database? Because it introduces the concept of layers: rather than coupling the storage engine to a particular data model, they built a storage engine flexible enough that different data models can be mapped onto it as layers. Tigris Data has an excellent article about building such layers, and the FoundationDB organization has some examples like the Record Layer and Document Layer.
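The layer idea can be sketched in a few lines: a richer data model encoded onto ordered keys, with records rebuilt by prefix range scans. Here a toy in-memory sorted map stands in for the real cluster, and the `table/pk/field` string encoding is a simplified stand-in for FoundationDB’s tuple layer:

```python
from bisect import bisect_left

class OrderedKV:
    """Toy stand-in for an ordered key-value store: sorted keys, prefix scans."""
    def __init__(self):
        self.keys, self.vals = [], []

    def set(self, k: str, v: str) -> None:
        i = bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            self.vals[i] = v
        else:
            self.keys.insert(i, k)
            self.vals.insert(i, v)

    def range(self, prefix: str):
        """Yield all pairs whose key starts with prefix — cheap, because
        keys sharing a prefix are contiguous in an ordered store."""
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.vals[i]
            i += 1

# A minimal "document layer": records flattened onto ordered keys.
def doc_set(kv: OrderedKV, table: str, pk: str, record: dict) -> None:
    for field, value in record.items():
        kv.set(f"{table}/{pk}/{field}", value)

def doc_get(kv: OrderedKV, table: str, pk: str) -> dict:
    prefix = f"{table}/{pk}/"
    return {k[len(prefix):]: v for k, v in kv.range(prefix)}

kv = OrderedKV()
doc_set(kv, "users", "42", {"name": "ada", "plan": "pro"})
print(doc_get(kv, "users", "42"))  # {'name': 'ada', 'plan': 'pro'}
```

A record layer, a queue layer, or an index are all just different key-encoding conventions over the same ordered store — which is why one engine can serve so many data models.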

Spend a week going through the tutorials, thinking about how to use FoundationDB as a replacement for databases like RocksDB. Maybe look at some design patterns and read the paper.


6. TigerBeetle

The Extremely Correct Database

Continuing the deterministic-simulation-testing thread, TigerBeetle breaks from the pattern of the previous databases because it explicitly states it is not a general-purpose database — it is entirely focused on financial transactions.

Why is it worth looking at? Single-purpose databases are rare, and databases as obsessed with correctness as TigerBeetle are even rarer, especially considering it’s open source. They incorporate everything from NASA’s Power of 10 and protocol-aware recovery to strict serializability and Direct I/O to avoid kernel page cache issues — it’s all very impressive. Check out their safety documentation and their programming methodology called Tiger Style!

Another interesting point is that TigerBeetle is written in Zig — a relatively new systems programming language, but apparently very aligned with the TigerBeetle team’s goals.

Spend a week modeling your financial accounts in a locally deployed TigerBeetle — follow the quick start and look at the system architecture documentation to understand how to combine it with the more general-purpose databases mentioned above.
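Before touching the real thing, it helps to internalize the double-entry model TigerBeetle is built around: every transfer debits one account and credits another by the same amount, so the ledger always balances. A hypothetical sketch in Python — the field names are illustrative, not TigerBeetle’s actual API:

```python
# Two toy ledger accounts, each tracking accumulated debits and credits.
accounts = {
    "cash":    {"debits": 0, "credits": 0},
    "revenue": {"debits": 0, "credits": 0},
}

def transfer(debit_acct: str, credit_acct: str, amount: int) -> None:
    """A double-entry transfer: the same amount is posted to both sides,
    so the ledger-wide invariant (debits == credits) can never break."""
    if amount <= 0:
        raise ValueError("amounts must be positive")
    accounts[debit_acct]["debits"] += amount
    accounts[credit_acct]["credits"] += amount

transfer("cash", "revenue", 1000)
transfer("cash", "revenue", 250)

# The core invariant: total debits equal total credits across the ledger.
total_debits = sum(a["debits"] for a in accounts.values())
total_credits = sum(a["credits"] for a in accounts.values())
assert total_debits == total_credits == 1250
```

TigerBeetle bakes this accounting model directly into the database, which is exactly what makes a single-purpose design possible.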


7. CockroachDB

The Globally Distributed Database

Finally, we return to where we started. In the last position, I was a bit torn. My initial thought was Valkey, but FoundationDB already covered the key-value storage need. I also considered graph databases, or databases like ScyllaDB or Cassandra. I also considered DynamoDB, but the inability to run it locally/freely discouraged me.

Ultimately, I decided to end with a globally distributed database — CockroachDB. It’s compatible with the Postgres wire protocol and inherits some of the interesting features discussed earlier — large-scale horizontal scaling, strong consistency — while having some interesting features of its own.

CockroachDB achieves database scaling across multiple geographic regions, occupying a niche that overlaps Google’s Spanner. However, Spanner relies on atomic and GPS clocks for extremely precise time synchronization — a luxury ordinary hardware doesn’t have. CockroachDB therefore has some clever workarounds, handling NTP-level clock uncertainty through transaction retries or delayed reads. Nodes also compare clock drift with one another, and a node terminates its cluster membership if its drift exceeds the maximum offset.
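The clock-skew guard described above can be sketched as a simple predicate: a node stays in the cluster only while its measured offset from its peers is within the configured maximum (CockroachDB’s default maximum offset is 500ms; the names and numbers below are illustrative):

```python
MAX_OFFSET_MS = 500  # CockroachDB's default --max-offset is 500ms

def healthy(own_clock_ms, peer_clocks_ms):
    """A node may keep serving only while its clock is within the maximum
    offset of every peer's clock; otherwise it must leave the cluster,
    because consistency guarantees assume bounded clock skew."""
    return all(abs(own_clock_ms - p) <= MAX_OFFSET_MS for p in peer_clocks_ms)

peers = [1_000_000, 1_000_120, 999_940]   # peer clock readings, in ms
assert healthy(1_000_050, peers)          # within 500ms of everyone: stay
assert not healthy(1_000_900, peers)      # drifted too far: self-terminate
```

The real implementation measures offsets continuously over RPCs rather than comparing instantaneous readings, but the bounded-skew contract is the same.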

Another interesting feature of CockroachDB is its multi-region configuration, including table localities, which offers different options depending on the read/write trade-offs you want. Spend a week reimplementing the movr example in your language and framework of choice.


Summary

We’ve explored many different databases, all used in production by some of the world’s largest companies. Hopefully, this exposes you to some technologies you weren’t familiar with before. Armed with this knowledge, go solve interesting problems!


Feng’s Comments

In 2013, there was a book called “Seven Databases in Seven Weeks.” It introduced seven “new (or reborn)” database technologies of the time and left a deep impression on me. Twelve years later, the theme is getting a refresh.

Looking back at that year’s seven databases, apart from the original “hammer” PostgreSQL, which is still going strong, every other entry has been swapped out. And PostgreSQL itself has evolved from a “hammer” into the “king of boring databases” — the “default database” that won’t let you down.

The databases on this list are basically all ones I’ve either worked with hands-on or have interest in and goodwill toward. The exception is ClickHouse — CK is good, but I think DuckDB and its combination with PostgreSQL have the potential to overturn CK, and its MySQL-flavored protocol and ecosystem leave me with little interest in it. If I were designing this list, I’d probably replace CK with Supabase or Neon.

I think the author has very precisely grasped the trends in database technology development, and I highly agree with his choice of databases. In fact, among these seven, I’ve already explored three in depth. Pigsty itself is a high-availability PostgreSQL distribution, and it also integrates DuckDB along with DuckDB-grafted PG extensions. I’ve also built RPM/DEB packages for TigerBeetle — a dedicated financial-transaction database — offered for default download in the professional edition.

The other databases are on my integration TODO list. For SQLite, beyond the FDW, the next step is to integrate ElectricSQL, providing sync between local PG and remote SQLite/PGLite. CockroachDB has long been on my TODO list, ready for deployment support whenever I have spare time. And FoundationDB intrigues me — it will likely be the next database I spend serious time studying.

Overall, I believe these technologies represent cutting-edge trends in the field. If I were to envision the landscape ten years from now, it would probably look like this: FoundationDB, TigerBeetle, and CockroachDB will each hold their own niche ecosystem positions. DuckDB will likely shine in analytics, SQLite will continue conquering territory on the local-first client side, and PostgreSQL will evolve from the “default database” into the ubiquitous “Linux kernel” of the database world. The main theme of the field will become the competition among PostgreSQL distributions — Neon, Supabase, Vercel, RDS, and Pigsty.

After all, “PostgreSQL devouring the database world” isn’t just talk — PostgreSQL-ecosystem companies have taken nearly all the capital-market money in the database field these past two years, with plenty of real money already voting for it. Of course, how the future actually unfolds — let’s wait and see.
