7 SaaS Database Schema Mistakes to Avoid | Coding Capybaras

The SaaS database schema mistakes that surface 12 months in: floats for money, naive timestamps, hard deletes, missing indexes, and how to avoid each one.

· Justin Boggs

Neatly organized yellow and green network cables connected to a server panel

Photo by Albert Stoynov on Unsplash

The SaaS database schema mistakes that hurt most are the ones that stay invisible for a year: storing money as floats, using timestamps without time zones, hard-deleting data you'll need back, skipping indexes on foreign keys, slapping arbitrary length limits on text columns, having no naming conventions, and picking an ID strategy by accident. None of these break on day one — that's exactly the problem. They surface at month twelve, when you have real customers and real data, and the fix requires a migration you're now afraid to run. This post covers all seven patterns and the boring, proven alternative to each.

TL;DR

  • Store money as integer cents (like Stripe does) or numeric — never floats, never Postgres's money type.
  • Always timestamptz, never timestamp. Always text, rarely varchar(n).
  • Add deleted_at soft deletes to customer-facing tables before you need them.
  • Index every foreign key. Postgres doesn't do it for you.
  • Pick naming conventions and an ID strategy on day one — both are near-impossible to retrofit.

How should you store money in a SaaS database?

Store money as an integer count of the smallest currency unit — cents for USD — or as a numeric column. Never as a float, and never as PostgreSQL's built-in money type.

Floats are the classic trap. 0.1 + 0.2 in floating-point math is 0.30000000000000004, and once you're summing thousands of transactions for a revenue report, those errors stop being cute. You'll get a dashboard that disagrees with Stripe by a few cents, spend an afternoon hunting the discrepancy, and discover it's baked into your column type.

The integer-cents approach has a strong endorsement: it's how Stripe's own API works. Stripe represents every amount as a positive integer in the currency's smallest unit — 1000 means $10.00. If your amount column is integer cents, every value that arrives in a webhook drops straight into your database with no conversion and no rounding step where a bug can live. Given how much webhook code you'll write anyway (I catalogued that pain in Stripe webhook hell), removing a conversion layer is worth a lot.

The money type deserves its own warning because it sounds correct. The PostgreSQL wiki's "Don't Do This" page — an official list of footguns maintained by the Postgres community, worth reading in full — explains that money doesn't store a currency with the value. It assumes whatever the database locale says. Change the lc_monetary setting and a stored $10.00 can come back as ¥1,000. It also can't represent fractions of a cent, which matters the moment you do proration.

If you handle multiple currencies, store the currency code in an adjacent column (amount_cents integer, currency text). That's the shape Stripe hands you anyway.

Why does timestamp without time zone burn you?

Postgres has two timestamp types, and the default-sounding one is the wrong one. timestamp (without time zone) stores a wall-clock reading — a picture of a calendar and a clock, with no information about where. timestamptz stores an actual point in time, normalized internally, displayed in whatever zone you ask for.

The failure mode is slow and quiet. Your server writes created_at in UTC. A cron job written later assumes local time. A customer in Sydney files a support ticket about an invoice dated tomorrow. Daylight saving time arrives and a "24 hours later" calculation is silently off by one hour. The PostgreSQL wiki is unambiguous: use timestamptz, even for storing UTC, because with plain timestamp the database has no way to know UTC was your intent — and every time-zone calculation you write afterward pays for it.

This mistake is a favorite of AI-assisted development, which is why I flag it for this audience specifically. If you prompt an assistant for "a users table with a created_at column," you'll frequently get timestamp in the output — the training data is full of it. It's a perfect example of the review-what-you-ship principle from when to trust your AI assistant: the AI isn't wrong exactly, it's defaulting to a pattern that works until it doesn't.

The rule fits in one line of your project's instructions file: every timestamp column is timestamptz, with default now(). Set it once and this entire class of bug never enters your codebase.

What happens when you hard delete customer data?

A hard delete removes the row; a soft delete sets a deleted_at timestamptz column and leaves the row in place. Twelve months in, the difference is a customer email that reads "I deleted my account by mistake, can you restore it?" — and whether your answer is "sure, one minute" or "no."

It's not just account recovery. Delete a customer row with a hard delete and you face a cascade problem: their invoices, their usage records, the audit trail your accountant will want — either those go too, or you have orphaned rows pointing at nothing. Your revenue reports for past months should not change because a customer left in March. With soft deletes, historical data stays coherent; queries just filter to active rows.

The implementation is one column and one habit:

ALTER TABLE app_projects ADD COLUMN deleted_at timestamptz;
CREATE INDEX idx_app_projects_active ON app_projects (id) WHERE deleted_at IS NULL;

That partial index keeps active-row lookups fast even as deleted rows accumulate. The habit is remembering WHERE deleted_at IS NULL in queries — a view over active rows, or a query helper, makes it automatic.

Honest tradeoffs: soft deletes complicate unique constraints (a deleted user's email may need to be re-registerable — partial unique indexes solve this), and privacy law is a real exception. When someone exercises a GDPR erasure right, "we kept everything with a flag on it" is not compliance; you need actual deletion or anonymization for personal data. My rule: soft deletes on things customers create and might want back; a documented hard-delete path for personal data. Decide per table, on purpose, at design time — the mistake isn't choosing hard deletes, it's never making the choice.

Which schema mistakes are invisible until you have real traffic?

Two mistakes in this list produce zero symptoms until the data grows: missing foreign key indexes and unindexed query patterns.

Here's the trap: PostgreSQL automatically indexes primary keys and unique constraints, but it does not index foreign keys. Every user_id column referencing your users table is unindexed unless you indexed it. At 500 rows nothing is measurably slow, so nothing prompts you to check. At a few hundred thousand rows, "show this customer's invoices" is scanning the whole invoices table, your dashboard takes four seconds to load, and you're learning what EXPLAIN ANALYZE is during a support fire.

The fix costs one line at table-creation time — CREATE INDEX ON app_invoices (user_id); — and every column you JOIN on or filter by in a hot query deserves the same look. Don't swing to the other extreme either: each index slows writes and takes disk, so indexing every column "to be safe" is its own mistake. Foreign keys, frequent WHERE columns, frequent ORDER BY columns. That covers an indie SaaS for a long time.

While we're on quiet capacity problems, here's the type-choice table for the columns founders get wrong most, following the PostgreSQL wiki's guidance:

| What you're storing | Tempting choice | Use instead | Why | | --- | --- | --- | --- | | Money | float / money | integer cents or numeric | Rounding errors; locale-dependent money | | Timestamps | timestamp | timestamptz | Time zone math breaks silently | | Text | varchar(255) | text (+ check constraint if needed) | Arbitrary limits fail on real names | | Auto IDs | serial | identity column | serial has messy permission/dependency behavior | | Fixed-length codes | char(3) | text + CHECK(length(...)) | char(n) pads with spaces, rejects nothing |

The varchar(255) row is the one I'd underline. That number is cargo-culted from old MySQL conventions; Postgres gains no performance from a length limit, and the limit fails exactly when a real human with a long name signs up. Use text, and if a field truly needs constraining, use a check constraint that says what you actually mean.

How do you pick an ID strategy (and does it leak your metrics)?

Sequential integer IDs are the default in most tutorials, and they come with a side effect nobody mentions: they're public metrics. If your URLs expose /invoice/847, anyone can read your invoice volume off your URL structure, sign up twice a month apart, and compute your growth rate. Competitors do this.

Your realistic options as an indie founder:

  • Sequential integers (identity columns). Fast, small, great for internal tables nobody sees. Note the table above: modern Postgres prefers GENERATED ALWAYS AS IDENTITY over the older serial — same behavior, fewer sharp edges.
  • UUIDs. Unguessable and safe to expose, at the cost of being bigger and unsortable-by-creation. Random UUIDs also fragment index locality, which is a real (if modest) write cost at scale.
  • Both. Integer primary key internally, a UUID or random public ID in a separate column for anything that appears in a URL or API response.

For a solo founder I think the honest answer is: this matters less than the other six mistakes, but it's the hardest to change later because IDs leak into URLs, customer bookmarks, webhook payloads, and third-party systems. Migrating a primary key strategy at month twelve touches everything. Pick deliberately at the start — even "integers inside, UUIDs at the edges" scribbled in a README beats deciding by tutorial default. The Coding Capybaras stack post covers how the boilerplate's schema handles this if you want a concrete reference.

Why do naming conventions matter before table thirty?

Naming feels like the least important item on this list, which is exactly how it earns its spot. It's in the same "impossible to retrofit" category as IDs: renaming thirty tables under live traffic, with queries, migrations, and dashboards all referencing the old names, is nobody's idea of a fun weekend.

Two rules cover most of it. First, snake_case everywhere. Postgres folds unquoted names to lowercase, so a camelCase column name condemns you to double-quoting it in every query, forever — and to a confusing class of "column does not exist" errors when you (or your AI assistant) forget. The PostgreSQL wiki lists uppercase identifiers among its don'ts for precisely this reason. Stick to lowercase letters, digits, and underscores and the problem never exists.

Second, decide how you'll group tables before the count grows. In the boilerplate I ship, infrastructure tables are prefixed platform_ and product tables are prefixed app_, and product code is forbidden from reaching into platform_ tables directly. That boundary does real work: when an AI assistant generates a query, the prefix makes it immediately obvious whether it's touching the plumbing or the product. You don't need my convention — you need a convention, written down, from day one. A prefix scheme, a schema-per-domain split, whatever fits; the failure mode is having none and discovering at month twelve that users, user_data, accounts, and customers all exist and nobody remembers which one is real.

The written-down part matters more with AI-assisted development, not less. Your conventions belong in your project's instructions file (CLAUDE.md or equivalent), because an assistant that knows "all product tables are prefixed app_, all timestamps are timestamptz" generates correct schema on the first try instead of the most statistically common schema. One paragraph of conventions is the cheapest schema-quality tool you have.

If you're still choosing where this database lives at all, my Supabase vs Firebase comparison is the place to start; everything in this post assumes you ended up on Postgres, which Supabase gives you.

Frequently asked questions

Should I use numeric or integer cents for money?

Integer cents if Stripe is your payment processor — your database then speaks the same units as every webhook and API response, per Stripe's amount format. numeric works too and reads more naturally in ad-hoc queries; it's the better fit if you need sub-cent precision for things like per-unit usage pricing.

Do I really need soft deletes on every table?

No — on tables holding things customers create and might want back (accounts, projects, documents). Log-style and derived tables can hard delete. And personal data covered by GDPR-style erasure requests needs a genuine deletion or anonymization path regardless of your soft-delete design.

Are UUIDs slower than integer IDs?

Somewhat: they're 16 bytes versus 4-8, and random UUIDs insert into indexes in random order, which costs some write performance and cache efficiency. At indie-SaaS scale you will likely never notice. The security benefit of unguessable public IDs usually outweighs the cost for anything URL-visible.

Can I fix these mistakes after launch?

Mostly yes, with increasing pain: adding indexes and deleted_at columns is easy at any time; converting timestamp to timestamptz requires knowing what zone your old data was written in; converting float money to integers means auditing accumulated rounding errors; changing ID strategy touches every URL and integration. That ordering is roughly why this list is worth reading before launch instead of after.

My AI assistant generated my schema. How do I check it?

Ask it to review its own output against this post's table: money columns, timestamp types, text types, foreign key indexes, ID strategy. Then ask it to explain any place it disagrees. AI assistants reproduce the most common patterns in their training data, and several of these mistakes (floats for money, naive timestamps, varchar(255)) are extremely common patterns.

Does Supabase protect me from any of this?

Supabase gives you real Postgres, so all of this applies as-is. Its table editor defaults are reasonable (it suggests timestamptz and identity columns), and row level security helps with tenant isolation — but nothing stops you from creating a float money column. The defaults help; they don't decide for you.

Conclusion

Every one of these SaaS database schema mistakes shares a shape: cheap to prevent, expensive to repair, and invisible in the demo. Integer cents or numeric for money. timestamptz for every timestamp. text over varchar(255). Soft deletes where customers might want things back. An index on every foreign key. snake_case names and a deliberate ID strategy, chosen before the first migration runs. None of this requires deep database expertise — it requires knowing the seven traps and spending one deliberate hour at the start instead of a fearful week at month twelve. The billing math that sits on top of these columns is its own topic, and I've broken that down in subscription billing math — but the math is only trustworthy if the columns underneath it are.

That hour is also exactly the kind of thing a boilerplate should spend for you. The schema that ships with Coding Capybaras has these patterns already in place — typed money columns, timestamptz throughout, prefixed table names — so the mistakes in this post are ones you'd have to opt into.