Apache Kafka Applications — Real-World Use Cases and Patterns

Apache Kafka powers the real-time data backbone of companies like LinkedIn, Netflix, Uber, and Airbnb. This guide covers Kafka's most important real-world applications — event streaming, microservices communication, real-time analytics, CDC, and log aggregation — with architecture patterns and code examples.

- 1M+ messages per second per broker
- 80% of Fortune 100 companies use Kafka
- Sub-10ms end-to-end latency achievable
- Petabytes of data stored in Kafka at LinkedIn daily

1. What Kafka Is Best At

Kafka is a distributed log, not a queue

Messages are stored durably on disk and can be replayed at any time. Multiple independent consumers read the same data without affecting each other. This makes Kafka ideal for event sourcing, audit logs, real-time analytics, and decoupled microservices.

| Item | Kafka | Traditional Message Queue (RabbitMQ) |
| --- | --- | --- |
| Message retention | Configurable (days, weeks, forever) | Deleted after consumption |
| Multiple consumers | All consumers read the same data independently | Each message consumed by one consumer |
| Message replay | Consumers can reset offsets and replay | Not possible; the message is gone |
| Throughput | Millions of messages/second per broker | Tens of thousands per second |
| Ordering guarantee | Per partition (use the same key for ordering) | Per queue |
| Best for | Event streaming, analytics, audit logs, CDC | Task queues, complex routing, RPC patterns |
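The retention and replay rows above can be made concrete with a toy in-memory model. This is a simplified sketch, not the Kafka client API: a single append-only list stands in for a partition, and each consumer group tracks its own offset, so groups read the same data independently and can rewind to replay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one Kafka partition: an append-only log plus a
// per-consumer-group offset. Illustrative only; not the real client API.
class ToyPartition {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    // Messages are appended and never deleted when read
    void append(String message) { log.add(message); }

    // Each group advances its own offset; reads never affect other groups
    String poll(String group) {
        int offset = offsets.getOrDefault(group, 0);
        if (offset >= log.size()) return null;
        offsets.put(group, offset + 1);
        return log.get(offset);
    }

    // Replay: reset the group's offset and re-read from the beginning
    void seekToBeginning(String group) { offsets.put(group, 0); }
}
```

Two groups ("analytics" and "fraud") each see every message, and resetting one group's offset replays the log for that group only — exactly the behavior the table contrasts with a traditional queue.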
2. Application 1: Real-Time Event Streaming

Step 1: User Action Published

User clicks, purchases, searches, or adds to cart. The microservice handling the action publishes a structured event to the relevant Kafka topic. Events are small JSON payloads with a user ID as key.

Step 2: Multi-Consumer Fan-Out

Analytics, ML recommendations, personalization, and fraud detection each subscribe independently to the purchases topic. Each consumer group gets every message — no coordination needed.

Step 3: Real-Time Processing

Kafka Streams or Apache Flink processes events in real-time: aggregating sales per product, detecting anomalies, building recommendation signals. Zero batch jobs needed.

Step 4: Output to Multiple Systems

Processed results flow to: dashboards (via Elasticsearch), recommendation APIs (via Redis), fraud alerts (via PagerDuty), and data warehouse (via S3).

Publishing events to Kafka (Java producer):

```java
// Producer: publish a user purchase event to Kafka
Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Idempotent producer: retries cannot introduce duplicates within a partition
props.put("enable.idempotence", "true");
props.put("acks", "all");  // wait for all in-sync replicas to confirm

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

String event = """
  {
    "event": "purchase",
    "userId": "user-123",
    "productId": "prod-456",
    "amount": 49.99,
    "timestamp": "2026-03-15T10:30:00Z"
  }
""";

// userId as partition key: all events for the same user go to the same
// partition, which guarantees per-user ordering
producer.send(
    new ProducerRecord<>("purchases", "user-123", event),
    (metadata, exception) -> {
        if (exception != null) log.error("Failed to send event", exception);
        else log.info("Sent to partition {} offset {}", metadata.partition(), metadata.offset());
    }
);
```
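The per-user ordering comment relies on how Kafka's default partitioner routes records: it hashes the serialized key (Kafka uses murmur2) and takes the result modulo the partition count. The sketch below substitutes `String.hashCode()` purely for illustration; the point is that the mapping is deterministic, so the same key always lands in the same partition.

```java
// Simplified sketch of key-based partition routing.
// Kafka's default partitioner uses murmur2 on the serialized key bytes;
// String.hashCode() here is a stand-in to show the idea.
class ToyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // mask the sign bit so the result is non-negative, then take the modulus
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

Because "user-123" always maps to the same partition, a consumer of that partition sees that user's events in exactly the order they were published.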

3. Application 2: Microservices Decoupling

Kafka as the event bus between microservices

Instead of microservices calling each other directly (tight coupling, cascading failures), they publish events to Kafka topics and subscribe to topics they care about. Each service can be deployed, scaled, and upgraded independently. When Order Service is down, other services keep running — they process missed events when it recovers.

Order Service

Publishes "order.created" events to Kafka when orders are placed. Does not know which services consume it — pure publish/subscribe decoupling.

Payment Service

Subscribes to "order.created". Processes payment and publishes "payment.completed" or "payment.failed" events back to Kafka.

Inventory Service

Subscribes to "payment.completed". Decrements inventory and publishes "inventory.updated" events. Triggers reorder if stock falls below threshold.

Notification Service

Subscribes to "order.created", "payment.completed", "order.shipped". Sends emails/SMS at each step. Completely decoupled from other services.

Analytics Consumer

Subscribes to all topics. Builds real-time dashboards, funnel analysis, and business intelligence. Adding analytics never impacts service performance.

Fraud Detection

Subscribes to "purchase" events. Analyzes patterns in real-time. Can publish "fraud.alert" events to pause suspicious accounts — other services react accordingly.
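The publish/subscribe decoupling described above can be sketched with a toy in-memory event bus standing in for Kafka topics. This is a deliberately minimal, synchronous model (real Kafka consumers poll asynchronously in their own processes); the point it illustrates is that publishers never reference their subscribers, so the Order Service chain works without any direct service-to-service calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy synchronous event bus standing in for Kafka topics, showing the
// decoupling pattern: a publisher only knows the topic name, never the
// services consuming it.
class ToyEventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String event) {
        for (Consumer<String> h : subscribers.getOrDefault(topic, List.of())) {
            h.accept(event);
        }
    }
}
```

Wiring the sketch like the services above — a handler on "order.created" that emits "payment.completed", and a handler on "payment.completed" that updates inventory — produces the full chain from a single publish, with each handler added or removed independently.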

4. Application 3: Real-Time Analytics Pipeline

Kafka Streams: real-time sales aggregation (Java):

```java
StreamsBuilder builder = new StreamsBuilder();

KStream<String, Purchase> purchases = builder.stream("purchases");

// Count purchases per product in 5-minute tumbling windows
KTable<Windowed<String>, Long> productCounts = purchases
    .groupBy((key, value) -> value.getProductId())
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();

// Alert when a product receives more than 100 purchases in 5 minutes (trending)
productCounts
    .filter((windowedKey, count) -> count > 100)
    .toStream()
    .mapValues((windowedKey, count) ->
        new TrendingAlert(windowedKey.key(), count, windowedKey.window().startTime()))
    .to("hot-products-alerts");

// Real-time revenue aggregation
KTable<String, Double> revenueByCategory = purchases
    .groupBy((key, value) -> value.getCategory())
    .aggregate(
        () -> 0.0,
        (category, purchase, total) -> total + purchase.getAmount(),
        Materialized.as("revenue-by-category-store")  // queryable state store
    );

// Query directly: stateStore.get("electronics") → $1,245,678.90
// This runs as a continuous streaming computation; no batch jobs.
```
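The five-minute tumbling windows above come down to one line of arithmetic: each window is identified by its start timestamp, computed as `timestamp - (timestamp % windowSize)`. The sketch below is plain Java, not the Streams API, but it is the same assignment rule the windowed count relies on: every event maps to exactly one non-overlapping window.

```java
// Tumbling-window assignment: each event timestamp maps to exactly one
// non-overlapping window, identified by the window's start time.
class ToyWindows {
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }
}
```

Grouping events by `(productId, windowStart)` and counting reproduces the behavior of the `windowedBy(...).count()` topology above.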

5. Application 4: Database Change Data Capture (CDC)

| Item | Traditional ETL | Kafka CDC (Debezium) |
| --- | --- | --- |
| Frequency | Batch every hour or overnight | Real-time; every row change in milliseconds |
| Latency | Minutes to hours of data lag | Sub-second end-to-end |
| Database load | Heavy; bulk reads during the ETL window | Low; tails the database transaction log |
| Completeness | Misses rapid updates between runs | Every INSERT, UPDATE, DELETE captured |
| Complexity | Requires timestamp columns and soft deletes | Works with the existing schema; no DB changes |
| Deletes | Cannot capture actual deletes | Emits a tombstone event for every delete |
Debezium CDC event: a database row change as a Kafka message (annotated JSON):

```json
// Kafka message emitted by Debezium when a row is updated in PostgreSQL.
// Each database change generates one Kafka message.
{
  "op": "u",  // u=update, c=create (insert), d=delete, r=read (snapshot)
  "source": {
    "table": "orders",
    "db": "ecommerce",
    "ts_ms": 1710500000000,
    "lsn": 28690624  // PostgreSQL WAL (Write-Ahead Log) position
  },
  "before": {
    "id": 1001,
    "status": "pending",
    "amount": 49.99
  },
  "after": {
    "id": 1001,
    "status": "shipped",   // ← what changed
    "amount": 49.99
  }
}

// Consumers use this to:
// - Keep Elasticsearch in sync with PostgreSQL in real-time
// - Invalidate Redis cache when data changes
// - Notify downstream services of state changes
// - Build audit logs of every data change
```
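A consumer of these events typically dispatches on the `op` field of the envelope. The sketch below follows the `op` codes shown above; the handler actions (what each branch returns) are hypothetical examples of the downstream uses listed in the comments, not a Debezium API.

```java
// Dispatch on Debezium's "op" field: c=create, u=update, d=delete, r=snapshot read.
// The returned strings are illustrative actions a downstream consumer might take.
class ToyCdcRouter {
    static String route(String op, long rowId) {
        switch (op) {
            case "c": return "index row " + rowId + " in search";
            case "u": return "reindex row " + rowId + " and invalidate cache";
            case "d": return "remove row " + rowId + " from index (tombstone)";
            case "r": return "bootstrap row " + rowId + " from snapshot";
            default:  return "skip unknown op " + op;
        }
    }
}
```

For the update event shown above (`op` = "u", row id 1001), this router would trigger a reindex plus cache invalidation, which is exactly the sync pattern the consumer list describes.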

6. Application 5: Log Aggregation and Observability

Centralized Logging

All application logs from hundreds of microservices flow into a Kafka "logs" topic via Logstash or Filebeat. Elasticsearch, Splunk, or S3 consume from Kafka independently. Zero coupling between apps and log storage.

Audit Trail

Kafka's immutable append-only log is ideal for compliance audit trails. Every user action, data access, and configuration change is written once and retained for the required period (GDPR, SOX, HIPAA).

Metrics Pipeline

Application metrics (latency, error rates, throughput) published to Kafka. Prometheus pulls from a Kafka exporter. InfluxDB consumers build time-series dashboards. Alerting consumers trigger PagerDuty/OpsGenie.

Distributed Tracing

OpenTelemetry trace spans published to Kafka, consumed by Jaeger or Zipkin for visualization. Correlate a single user request across 20 microservices with a single trace ID. Zero impact on service performance.

7. Kafka Architecture Essentials

Kafka deployment: key configuration decisions for production.

```shell
# Topic configuration for a purchases topic:
# 12 partitions give 12-way consumer parallelism; replication factor 3
# survives two broker failures; retention.ms=604800000 is 7 days
kafka-topics.sh --create \
  --topic purchases \
  --partitions 12 \
  --replication-factor 3 \
  --config retention.ms=604800000 \
  --config cleanup.policy=delete
```

```properties
# Producer: ensure durability
acks=all                     # wait for all in-sync replicas to confirm
enable.idempotence=true      # idempotent producer: no duplicates on retry
max.in.flight.requests.per.connection=5

# Consumer: control offset commits
enable.auto.commit=false     # commit manually after processing
auto.offset.reset=earliest   # start from the beginning for new consumer groups
max.poll.records=500         # process in batches
```

Monitoring: always track consumer group lag (messages behind = processing delay), producer record error rate (failed publishes), and under-replicated partitions (replication health).
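The `enable.auto.commit=false` setting matters because of *when* the offset advances relative to processing. The toy loop below is plain Java, not the consumer API, but it captures the at-least-once pattern: the offset is "committed" only after a record is processed successfully, so a failure mid-batch leaves the offset at the last success and those records are re-delivered on restart.

```java
import java.util.List;
import java.util.function.Consumer;

// Toy process-then-commit loop. Committing only after successful processing
// yields at-least-once delivery: a crash mid-batch re-delivers every record
// after the last committed offset.
class ToyCommitLoop {
    static int processFrom(List<String> partition, int lastCommitted, Consumer<String> process) {
        int committed = lastCommitted;
        for (int i = committed; i < partition.size(); i++) {
            try {
                process.accept(partition.get(i));
            } catch (RuntimeException e) {
                return committed;  // crash: uncommitted records will be re-read
            }
            committed = i + 1;     // "commit" only after processing succeeds
        }
        return committed;
    }
}
```

Committing *before* processing would invert this into at-most-once: a crash after the commit but before processing silently drops the record.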
