What Is Data Modeling? Explained Simply With Examples

Data modeling is the process of deciding how to organize and structure data before you build a system. A good data model makes queries fast, code simple, and your database easy to understand. A bad one creates problems that last years. This guide covers the core concepts with real examples.

3

main model types: conceptual, logical, physical

ER

entity-relationship diagrams — standard design tool

3NF

third normal form — standard normalization target

NoSQL

document vs relational — different trade-offs

1

What Is a Data Model?

Most expensive technical debt

Bad data models are one of the most expensive forms of technical debt in software. Changing a table structure in a production database with millions of rows is painful — requiring migrations, downtime risk, and code changes across the entire application. Getting it right upfront costs hours; fixing it later costs weeks.

A data model defines what data exists, how it's organized, and how pieces relate to each other. Think of it as the blueprint for your database — designed before you write a single SQL CREATE TABLE or Mongoose schema.

2

The Three Levels of Data Models

Conceptual Model

High-level: what entities exist and how they relate. Audience: business stakeholders. No technical details — just boxes and relationship arrows. Example: "Customers place Orders; Orders contain Products."

Logical Model

More detail: attributes, data types, relationships (1:1, 1:N, N:M). Audience: architects and developers. Still database-agnostic — works for SQL or NoSQL. Example: User has email (string), created_at (datetime), orders (1:N).

Physical Model

Implementation-specific: actual table names, column types, indexes, constraints, partitioning. Audience: database engineers. Ready to generate DDL (CREATE TABLE statements). Example: users table with BIGINT PK, VARCHAR(255) UNIQUE for email, BTREE index on created_at.

3

Entity-Relationship (ER) Modeling

ER modeling is the standard way to design relational databases. You identify entities (things), attributes (properties), and relationships (how things connect). Every ER model maps directly to SQL tables.

sqlER Model → SQL Schema — E-commerce Example
-- Entity: User
CREATE TABLE users (
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  email      VARCHAR(255) UNIQUE NOT NULL,
  name       VARCHAR(100) NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Entity: Product
CREATE TABLE products (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  name        VARCHAR(255) NOT NULL,
  price       DECIMAL(10,2) NOT NULL,
  category_id BIGINT REFERENCES categories(id),
  stock       INT NOT NULL DEFAULT 0
);

-- Entity: Order (bridges users and products — 1:N from users, M:N with products)
CREATE TABLE orders (
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  user_id    BIGINT NOT NULL REFERENCES users(id),
  status     ENUM('pending','paid','shipped','delivered') DEFAULT 'pending',
  total      DECIMAL(10,2) NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Junction table: Many-to-many (orders ↔ products)
-- One order has many products; one product appears in many orders
CREATE TABLE order_items (
  order_id   BIGINT REFERENCES orders(id),
  product_id BIGINT REFERENCES products(id),
  quantity   INT NOT NULL,
  unit_price DECIMAL(10,2) NOT NULL,  -- snapshot price at time of purchase
  PRIMARY KEY (order_id, product_id)
);
4

Relationships: 1:1, 1:N, M:N

ItemRelationship TypeExample + Implementation
One-to-One (1:1)User has one ProfileForeign key in profile table: profile.user_id UNIQUE (unique FK enforces 1:1)
One-to-Many (1:N)User has many OrdersForeign key in child: orders.user_id references users.id (most common relationship)
Many-to-Many (M:N)Orders contain many Products; Products appear in many OrdersJunction/bridge table: order_items(order_id, product_id) with composite PK
Self-referentialCategory has sub-categoriescategories.parent_id REFERENCES categories.id (nullable for root categories)
5

Normalization — Eliminating Redundancy

Normalization is the process of structuring a relational database to reduce data redundancy and improve integrity. The three normal forms every developer needs to know:

1

First Normal Form (1NF) — atomic values, unique rows

Each column contains atomic (indivisible) values — no arrays or comma-separated lists stored in a single field. Each row must be unique. Bad: tags = "javascript,react,node" in one column. Good: separate tags table with a junction table linking posts to tags.

2

Second Normal Form (2NF) — full functional dependency

1NF + every non-key column depends on the ENTIRE primary key (eliminates partial dependencies in composite-key tables). Bad: order_items table with product_name column — product_name depends only on product_id, not the full (order_id, product_id) composite key. Good: join the products table to get the name.

3

Third Normal Form (3NF) — no transitive dependencies

2NF + no non-key column depends on another non-key column (eliminates transitive dependencies). Bad: users table with city, state, zip — zip determines city and state, so city/state transitively depend on zip through zip, not directly on user_id. Good: separate zip_codes table with (zip, city, state). Join when needed.

Don't over-normalize for read-heavy workloads

3NF is the standard target for OLTP systems. Over-normalizing (4NF, 5NF) creates too many JOINs and slows down read queries significantly. Sometimes deliberate denormalization — storing redundant data for faster reads — is the right trade-off. Analytics (OLAP) systems actively denormalize into star schemas to avoid JOINs during aggregation queries.
6

Relational vs Document Data Models

ItemRelational (SQL)Document (MongoDB, DynamoDB)
Data structureTables with fixed schemas — every row has same columnsFlexible JSON documents — each document can have different fields
RelationshipsJOINs between normalized tablesEmbedded sub-documents or referenced IDs
Schema changesALTER TABLE — migrations needed, risky at scaleNo schema enforcement — add/remove fields freely (flexible but risky without validation)
Query powerComplex JOINs, aggregations, window functions, analyticsSimple lookups very fast; multi-collection JOINs much harder
TransactionsACID transactions across tables — strong consistencySingle-document atomic; multi-document transactions are slower and limited
Best forRelational data, financial systems, reporting, analyticsHierarchical/nested data, flexible schemas, high write throughput, event logs
7

Practical: Designing a Blog Data Model

sqlBlog Database Schema — Complete Example
-- Users
CREATE TABLE users (
  id       BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  username VARCHAR(50) UNIQUE NOT NULL,
  email    VARCHAR(255) UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Posts
CREATE TABLE posts (
  id           BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  author_id    BIGINT NOT NULL REFERENCES users(id),
  title        VARCHAR(500) NOT NULL,
  slug         VARCHAR(500) UNIQUE NOT NULL,  -- URL-friendly identifier
  body         TEXT,
  status       TEXT CHECK(status IN ('draft','published','archived')) DEFAULT 'draft',
  published_at TIMESTAMP,
  created_at   TIMESTAMP DEFAULT NOW()
);

-- Tags (many-to-many with posts)
CREATE TABLE tags (
  id   BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name VARCHAR(100) UNIQUE NOT NULL,
  slug VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE post_tags (
  post_id BIGINT REFERENCES posts(id) ON DELETE CASCADE,
  tag_id  BIGINT REFERENCES tags(id) ON DELETE CASCADE,
  PRIMARY KEY (post_id, tag_id)
);

-- Comments (self-referential for threaded replies)
CREATE TABLE comments (
  id         BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  post_id    BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  author_id  BIGINT REFERENCES users(id),
  parent_id  BIGINT REFERENCES comments(id),  -- NULL = top-level comment
  body       TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Performance indexes for common access patterns
CREATE INDEX idx_posts_author    ON posts(author_id);
CREATE INDEX idx_posts_published ON posts(published_at) WHERE status = 'published';
CREATE INDEX idx_posts_slug      ON posts(slug);
CREATE INDEX idx_comments_post   ON comments(post_id);
CREATE INDEX idx_comments_parent ON comments(parent_id);
8

Common Data Modeling Mistakes

Storing arrays in a single column

Putting comma-separated values (tags="react,node,sql") in one column violates 1NF and makes querying, filtering, and indexing impossible. Always use a junction table for multi-value attributes.

Using strings for foreign keys

Storing username or email as foreign keys instead of numeric IDs causes bloat (VARCHAR vs BIGINT in every child row), makes renaming impossible, and is significantly slower to JOIN. Always use surrogate numeric IDs as PKs.

One giant table for everything

Combining unrelated concepts into one mega-table with nullable columns for different entity types (type="user"|"admin"|"bot") is the EAV anti-pattern. Each distinct entity should be its own table.

Missing indexes on foreign keys

PostgreSQL and MySQL do not automatically index foreign key columns. Every FK that you JOIN or filter on needs an explicit index. Forgetting indexes on high-traffic FK columns causes full table scans that kill performance at scale.

Frequently Asked Questions