Apache Kafka Applications — Real-World Use Cases and Patterns
Apache Kafka powers the real-time data backbone of companies like LinkedIn, Netflix, Uber, and Airbnb. This guide covers Kafka's most important real-world applications — event streaming, microservices communication, real-time analytics, CDC, and log aggregation — with architecture patterns and code examples.
1M+ messages per second per broker
80% of Fortune 100 companies use Kafka
Sub-10ms end-to-end latency achievable
Petabytes of data stored in Kafka at LinkedIn daily
What Kafka Is Best At
Kafka is a distributed log, not a queue
Kafka is a distributed log, not a message queue. Messages are stored durably on disk and can be replayed at any time. Multiple independent consumers read the same data without affecting each other. This makes Kafka ideal for event sourcing, audit logs, real-time analytics, and decoupled microservices.
| Capability | Kafka | Traditional Message Queue (RabbitMQ) |
|---|---|---|
| Message retention | Configurable (days, weeks, forever) | Deleted after consumed |
| Multiple consumers | All consumers read same data independently | Each message consumed by one consumer |
| Message replay | Consumers can reset offset and replay | Not possible — message is gone |
| Throughput | Millions of messages/second per broker | Tens of thousands per second |
| Ordering guarantee | Per partition (use same key for ordering) | Per queue |
| Best for | Event streaming, analytics, audit logs, CDC | Task queues, complex routing, RPC patterns |
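The retention and multi-consumer rows above are the crux. A toy in-memory model (plain Java, no Kafka dependency, purely illustrative) shows why a log supports independent consumers and replay where a queue cannot:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a Kafka-style log: records are appended once and never
// removed on read; each "consumer group" just tracks its own offset.
class ToyLog {
    private final List<String> records = new ArrayList<>();

    void append(String record) { records.add(record); }

    // Read everything from the given offset; the log itself is unchanged.
    List<String> readFrom(int offset) {
        return new ArrayList<>(records.subList(offset, records.size()));
    }

    int endOffset() { return records.size(); }
}

class LogVsQueueDemo {
    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("purchase:user-1");
        log.append("purchase:user-2");

        // Two independent "consumer groups", each with its own offset.
        int analyticsOffset = 0;
        int fraudOffset = 0;

        // Both groups see every record: nothing is "consumed away".
        System.out.println(log.readFrom(analyticsOffset)); // [purchase:user-1, purchase:user-2]
        System.out.println(log.readFrom(fraudOffset));     // [purchase:user-1, purchase:user-2]

        // Replay: reset an offset to zero and re-read history.
        analyticsOffset = 0;
        System.out.println(log.readFrom(analyticsOffset).size()); // 2
    }
}
```

In a queue, the first read would have removed the records; in a log, replay is just moving an offset, which is exactly what `kafka-consumer-groups.sh --reset-offsets` does.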
Application 1: Real-Time Event Streaming
User Action Published
User clicks, purchases, searches, or adds to cart. The microservice handling the action publishes a structured event to the relevant Kafka topic. Events are small JSON payloads with a user ID as key.
Multi-Consumer Fan-Out
Analytics, ML recommendations, personalization, and fraud detection each subscribe independently to the purchases topic. Each consumer group gets every message — no coordination needed.
Real-Time Processing
Kafka Streams or Apache Flink processes events in real-time: aggregating sales per product, detecting anomalies, building recommendation signals. Zero batch jobs needed.
Output to Multiple Systems
Processed results flow to: dashboards (via Elasticsearch), recommendation APIs (via Redis), fraud alerts (via PagerDuty), and data warehouse (via S3).
// Producer — publish user purchase event to Kafka
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Enable idempotence: prevents duplicate messages when the producer retries
props.put("enable.idempotence", "true");
props.put("acks", "all"); // wait for all replicas to confirm
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
String event = """
{
"event": "purchase",
"userId": "user-123",
"productId": "prod-456",
"amount": 49.99,
"timestamp": "2026-03-15T10:30:00Z"
}
""";
// userId as partition key — all events for same user go to same partition
// This guarantees ordering of events per user
producer.send(
new ProducerRecord<>("purchases", "user-123", event),
(metadata, exception) -> {
if (exception != null) System.err.println("Failed to send event: " + exception);
else System.out.printf("Sent to partition %d offset %d%n", metadata.partition(), metadata.offset());
}
);
Application 2: Microservices Decoupling
Kafka as the event bus between microservices
Order Service
Publishes "order.created" events to Kafka when orders are placed. Does not know which services consume it — pure publish/subscribe decoupling.
Payment Service
Subscribes to "order.created". Processes payment and publishes "payment.completed" or "payment.failed" events back to Kafka.
Inventory Service
Subscribes to "payment.completed". Decrements inventory and publishes "inventory.updated" events. Triggers reorder if stock falls below threshold.
Notification Service
Subscribes to "order.created", "payment.completed", "order.shipped". Sends emails/SMS at each step. Completely decoupled from other services.
Analytics Consumer
Subscribes to all topics. Builds real-time dashboards, funnel analysis, and business intelligence. Adding analytics never impacts service performance.
Fraud Detection
Subscribes to the "purchases" topic. Analyzes patterns in real-time. Can publish "fraud.alert" events to pause suspicious accounts — other services react accordingly.
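The Payment Service's choreography step boils down to a pure function from an incoming event to an outgoing one. A minimal sketch (event names come from the cards above; the card-check rule is a made-up stand-in for a real payment gateway call):

```java
// Sketch of the Payment Service's choreography step: consume "order.created",
// emit the follow-up event that Inventory and Notification subscribe to.
public class PaymentChoreography {

    record Event(String type, String orderId, String detail) {}

    static Event handle(Event orderCreated) {
        if (!orderCreated.type().equals("order.created")) {
            throw new IllegalArgumentException("unexpected event: " + orderCreated.type());
        }
        boolean paid = chargeCard(orderCreated.orderId()); // stand-in for a gateway call
        return paid
                ? new Event("payment.completed", orderCreated.orderId(), "charged")
                : new Event("payment.failed", orderCreated.orderId(), "card declined");
    }

    // Hypothetical rule for the demo: orders with an even numeric suffix succeed.
    static boolean chargeCard(String orderId) {
        int n = Integer.parseInt(orderId.substring(orderId.lastIndexOf('-') + 1));
        return n % 2 == 0;
    }

    public static void main(String[] args) {
        Event out = handle(new Event("order.created", "order-42", "new"));
        System.out.println(out.type()); // payment.completed
    }
}
```

The service never names its downstream consumers; it only publishes the result event, which is what keeps the Inventory, Notification, and Analytics consumers independently deployable.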
Application 3: Real-Time Analytics Pipeline
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Purchase> purchases = builder.stream("purchases",
    Consumed.with(Serdes.String(), purchaseSerde)); // purchaseSerde: your JSON serde for Purchase
// Count purchases per product in 5-minute tumbling windows
KTable<Windowed<String>, Long> productCounts = purchases
.groupBy((key, value) -> value.getProductId())
.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
.count();
// Alert when a product receives 100+ purchases in 5 minutes (trending)
productCounts
.filter((windowedKey, count) -> count > 100)
.toStream()
.mapValues((windowedKey, count) ->
new TrendingAlert(windowedKey.key(), count, windowedKey.window().startTime()))
.to("hot-products-alerts");
// Real-time revenue aggregation
KTable<String, Double> revenueByCategory = purchases
.groupBy((key, value) -> value.getCategory())
.aggregate(
() -> 0.0,
(category, purchase, total) -> total + purchase.getAmount(),
Materialized.as("revenue-by-category-store") // queryable state store
);
// Query directly: stateStore.get("electronics") → $1,245,678.90
// This runs as a continuous streaming computation — zero batch jobs
Application 4: Database Change Data Capture (CDC)
| Aspect | Traditional ETL | Kafka CDC (Debezium) |
|---|---|---|
| Frequency | Batch every hour or overnight | Real-time, every row change in milliseconds |
| Latency | Minutes to hours of data lag | Sub-second end-to-end |
| Database load | Heavy — bulk reads during ETL window | Low — tails the database transaction log |
| Completeness | Misses rapid updates between runs | Every INSERT, UPDATE, DELETE captured |
| Complexity | Requires timestamp columns and soft deletes | Works with existing schema, no DB changes |
| Deletes | Cannot capture actual deletes | Captures tombstone event for every delete |
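Consumers of these change events usually dispatch on Debezium's `op` code. The op codes below are Debezium's real ones; the action strings are illustrative placeholders for "index in Elasticsearch", "evict from Redis", and so on:

```java
// Sketch of a CDC consumer's dispatch logic.
// Debezium op codes: c=create, u=update, d=delete, r=snapshot read.
public class CdcDispatch {

    static String actionFor(String op) {
        return switch (op) {
            case "c", "r" -> "upsert-search-index";                  // new or snapshot row: index it
            case "u"      -> "upsert-search-index-and-evict-cache";  // changed row: reindex, drop stale cache
            case "d"      -> "delete-from-index-and-evict-cache";    // tombstone: remove everywhere
            default       -> throw new IllegalArgumentException("unknown op: " + op);
        };
    }

    public static void main(String[] args) {
        System.out.println(actionFor("u")); // upsert-search-index-and-evict-cache
    }
}
```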
// Kafka message from Debezium when a row is updated in PostgreSQL
// Each database change generates one Kafka message
{
"op": "u", // u=update, c=create (insert), d=delete, r=read (snapshot)
"source": {
"table": "orders",
"db": "ecommerce",
"ts_ms": 1710500000000,
"lsn": 28690624 // PostgreSQL WAL (Write-Ahead Log) position
},
"before": {
"id": 1001,
"status": "pending",
"amount": 49.99
},
"after": {
"id": 1001,
"status": "shipped", // ← what changed
"amount": 49.99
}
}
// Consumers use this to:
// - Keep Elasticsearch in sync with PostgreSQL in real-time
// - Invalidate Redis cache when data changes
// - Notify downstream services of state changes
// - Build audit logs of every data change
Application 5: Log Aggregation and Observability
Centralized Logging
All application logs from hundreds of microservices flow into a Kafka "logs" topic via Logstash or Filebeat. Elasticsearch, Splunk, or S3 consume from Kafka independently. Zero coupling between apps and log storage.
Audit Trail
Kafka's immutable append-only log is ideal for compliance audit trails. Every user action, data access, and configuration change is written once and retained for the required period (GDPR, SOX, HIPAA).
Metrics Pipeline
Application metrics (latency, error rates, throughput) published to Kafka. Prometheus pulls from a Kafka exporter. InfluxDB consumers build time-series dashboards. Alerting consumers trigger PagerDuty/OpsGenie.
Distributed Tracing
OpenTelemetry trace spans published to Kafka, consumed by Jaeger or Zipkin for visualization. Correlate a single user request across 20 microservices with a single trace ID. Zero impact on service performance.
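The log records flowing into the "logs" topic are just keyed JSON strings. A minimal sketch of what one service would publish (field names are illustrative; in practice Logstash or Filebeat emits the JSON rather than hand-built strings like this):

```java
// Sketch of the structured log record a service publishes to the "logs" topic.
public class LogEventDemo {

    static String toJson(String service, String level, String message, long tsMs) {
        return String.format(
            "{\"service\":\"%s\",\"level\":\"%s\",\"message\":\"%s\",\"ts\":%d}",
            service, level, message, tsMs);
    }

    public static void main(String[] args) {
        // Key by service name so one service's logs stay ordered within a partition.
        String key = "checkout-service";
        String value = toJson(key, "ERROR", "payment gateway timeout", 1710500000000L);
        System.out.println(value);
        // producer.send(new ProducerRecord<>("logs", key, value));  // as in Application 1
    }
}
```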
Kafka Architecture Essentials
# Key Kafka configuration decisions for production:
# Topic configuration — example for a purchases topic
# 12 partitions = 12-way parallelism; replication factor 3 survives 2 broker failures;
# retention.ms=604800000 = 7 days
kafka-topics.sh --create --topic purchases \
  --bootstrap-server kafka:9092 \
  --partitions 12 \
  --replication-factor 3 \
  --config retention.ms=604800000 \
  --config cleanup.policy=delete
# Producer: ensure durability
acks=all # Wait for all replicas to confirm
enable.idempotence=true # Idempotent producer: no duplicates on retry
max.in.flight.requests.per.connection=5
# Consumer: control offset commits
enable.auto.commit=false # Commit manually after processing
auto.offset.reset=earliest # Start from beginning for new consumer groups
max.poll.records=500 # Process in batches
# Monitoring: always track
# consumer_group_lag — messages behind = processing delay
# producer_record_error_total — failed publishes
# broker_under_replicated_partitions — replication health
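Why `enable.auto.commit=false` matters can be shown with a toy model (plain Java, no broker, purely illustrative): committing only after processing means a crash re-delivers the batch instead of losing it, which is at-least-once delivery.

```java
import java.util.List;

// Toy model of manual offset commits (enable.auto.commit=false).
// Commit AFTER processing: a crash between processing and commit means the
// batch is re-delivered on restart. At-least-once: duplicates, never loss.
public class ManualCommitDemo {

    static int committedOffset = 0;

    // Process records from the committed offset; commit only on success.
    static int processBatch(List<String> log, boolean crashBeforeCommit) {
        int processed = 0;
        for (int i = committedOffset; i < log.size(); i++) {
            processed++;                  // side effects would happen here
        }
        if (!crashBeforeCommit) {
            committedOffset = log.size(); // the commitSync() equivalent
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("evt-1", "evt-2", "evt-3");
        System.out.println(processBatch(log, true));  // 3 processed, then crash: no commit
        System.out.println(processBatch(log, false)); // 3 re-processed, then committed
    }
}
```

This is why consumers must be idempotent (or use Kafka transactions) when exactly-once effects are required: manual commits bound loss, not duplication.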