How to Practice System Design: A Developer's Roadmap from Easy to Expert

A structured path through 12 system design problems, from retry engines to deployment pipelines. Build real systems instead of just drawing diagrams.

By SysAdmin · Published 2026-05-27

System design interviews are often among the highest-signal rounds in an engineering hiring loop. They're also among the hardest to practice for, because you can't just read about building a cache. You have to build one.

This guide gives you a structured learning path through 12 LLD (Low-Level Design) problems, organized from easiest to hardest. Each problem teaches specific engineering patterns you'll use in production.

The Learning Path

🟢 Easy (Week 1-2)
│  ├── Retry Engine
│  └── Employee Management
│
🟡 Medium (Week 3-5)
│  ├── LRU Cache
│  ├── Circuit Breaker
│  ├── Rate Limiter
│  ├── Task Queue
│  └── Parking Lot
│
🔴 Hard (Week 6-8)
│  ├── Load Balancer
│  ├── Cab Booking
│  └── Train Ticket Booking
│
🟣 Expert (Week 9-10)
│  └── Deployment Pipeline
│
🏆 URL Shortener (Bonus — the classic)

Phase 1: Foundation (Easy)

1. Retry Engine

What you'll learn: Exponential backoff, jitter, max retry limits

Why it matters: Every production system needs retry logic — HTTP clients, queue consumers, database connections. A bad retry implementation causes thundering herds. A good one uses exponential backoff with jitter to spread out retries and prevent overwhelming the target service.

Key pattern: delay = baseDelay * 2^attempt + random(jitter)
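
As a rough illustration of the formula above, here is a minimal Python sketch. The names `backoff_delay` and `retry`, the default values, and the `max_delay` cap are assumptions made for this example and are not part of the problem's contract.

```python
import random
import time


def backoff_delay(attempt, base_delay=0.1, max_delay=10.0, jitter=0.1, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based)."""
    # Exponential term: base_delay * 2^attempt, capped so delays stay bounded.
    exponential = min(max_delay, base_delay * (2 ** attempt))
    # Additive jitter in [0, jitter) spreads out clients that failed together.
    return exponential + rng() * jitter


def retry(operation, max_attempts=5, sleep=time.sleep, **backoff_kwargs):
    """Call `operation` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error
            sleep(backoff_delay(attempt, **backoff_kwargs))
```

Passing `sleep` and `rng` as parameters is a design choice that keeps the sketch testable without real waiting; a production version would likely also restrict which exception types are retried.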

Infrastructure: In-memory (no external dependencies)

👉 Try it · Read the walkthrough

2. Employee Management System

What you'll learn: In-memory CRUD, search with multiple filters, pagination, department management

Why it matters: This is the pattern behind every admin panel, every REST API, every internal tool. Search, filter, paginate — these operations compose the majority of backend code in most applications.

Key pattern: Fluent filtering with multiple criteria (name contains, department equals, salary range)
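
One way to picture fluent filtering is a query object whose filter methods each return `self`. The sketch below is illustrative only: `Employee`, `EmployeeQuery`, and the method names are hypothetical, and sorting by `id` before paginating is an assumption made to keep pages stable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    id: int
    name: str
    department: str
    salary: float


class EmployeeQuery:
    """Collects filter predicates; nothing runs until page() is called."""

    def __init__(self, employees):
        self._employees = list(employees)
        self._filters = []

    def name_contains(self, text):
        self._filters.append(lambda e: text.lower() in e.name.lower())
        return self  # returning self is what makes the calls chainable

    def department_is(self, department):
        self._filters.append(lambda e: e.department == department)
        return self

    def salary_between(self, low, high):
        self._filters.append(lambda e: low <= e.salary <= high)
        return self

    def page(self, number=1, size=10):
        """Apply every filter, sort by id for stable pages, then slice."""
        matches = sorted(
            (e for e in self._employees if all(f(e) for f in self._filters)),
            key=lambda e: e.id,
        )
        start = (number - 1) * size
        return matches[start:start + size]
```

A call might look like `EmployeeQuery(staff).name_contains("ash").department_is("Eng").salary_between(100, 130).page()`.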

Infrastructure: In-memory (HashMap + search logic)

👉 Try it · Read the walkthrough

Phase 2: Core Patterns (Medium)

3. LRU Cache

What you'll learn: HashMap + Doubly Linked List, O(1) get/put, eviction policies

Why it matters: LRU (Least Recently Used) eviction, or an approximation of it, shows up at every layer of the stack: CPU caches (which typically use cheaper pseudo-LRU schemes), CDN edge nodes, and application-level caches. Understanding the HashMap + DLL trick is fundamental.

Key insight: The HashMap gives O(1) lookup. The DLL gives O(1) insertion/deletion. Together they give O(1) for all cache operations.
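
The HashMap + DLL combination can be sketched in plain Python as follows. This is an in-memory illustration only (it does not touch Redis), and the class and method names are assumptions for the example.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._nodes = {}  # key -> node, for O(1) lookup
        # Sentinel head/tail nodes remove the empty-list edge cases.
        self._head, self._tail = _Node(), _Node()
        self._head.next, self._tail.prev = self._tail, self._head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self._head, self._head.next
        self._head.next.prev = node
        self._head.next = node

    def get(self, key):
        node = self._nodes.get(key)
        if node is None:
            return None
        # A hit makes this entry the most recently used.
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self._nodes.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self._nodes) >= self.capacity:
            lru = self._tail.prev  # least recently used sits next to the tail
            self._unlink(lru)
            del self._nodes[lru.key]
        node = _Node(key, value)
        self._nodes[key] = node
        self._push_front(node)
```

In everyday Python code, `collections.OrderedDict` with `move_to_end` gives the same behavior with less code; the explicit list is shown here because it is the structure the problem asks you to understand.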

Infrastructure: Redis

👉 Try it · Read the walkthrough

4. Circuit Breaker

What you'll learn: State machine (CLOSED → OPEN → HALF_OPEN), failure counting, automatic recovery

Why it matters: In microservices, when Service A depends on Service B and Service B goes down, Service A shouldn't keep sending requests (wasting resources and increasing latency). The circuit breaker pattern detects failures, "opens" the circuit to stop requests, and periodically tests if the service has recovered.

Key insight: The state machine has three states with clear transition rules. The tricky part is the half-open state — allowing a limited number of test requests through to check recovery.
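
The three states and their transitions might be sketched like this. It is a single-process illustration that assumes one probe request in the half-open state and is not thread-safe; `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical, and the distributed (Redis-backed) state is left out.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open")
            self.state = "HALF_OPEN"  # timeout elapsed: let one probe through
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a half-open probe) closes the circuit.
        self._failures = 0
        self.state = "CLOSED"

    def _on_failure(self):
        self._failures += 1
        # A failed probe reopens immediately; otherwise wait for the threshold.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()
```

Injecting `clock` keeps the recovery timeout testable without sleeping.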

Infrastructure: Redis (for distributed state)

👉 Try it · Read the walkthrough

5. Sliding Window Rate Limiter

What you'll learn: Sliding window algorithm, Redis sorted sets, atomic operations

Why it matters: Every API gateway, every public API, every authentication system needs rate limiting. A sliding window is more accurate than fixed windows because it has no burst-at-boundary problem. The log-based variant practiced here pays for that accuracy by storing one timestamp per request; a sliding window counter is the lighter alternative when memory matters more than precision.

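Key insight: Store timestamps in a Redis sorted set. For each request, remove expired entries, count the remaining ones, and compare against the limit. Run these steps atomically, for example in a Lua script or a MULTI/EXEC transaction; a plain pipeline only batches commands and does not by itself guarantee atomicity.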
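
To make the remove, count, compare sequence concrete without needing a Redis server, here is an in-memory stand-in. A `deque` per client plays the role of the sorted set and a lock plays the role of server-side atomicity; `SlidingWindowLimiter` and its parameters are assumptions for this sketch. With Redis, the three steps would map roughly onto `ZREMRANGEBYSCORE`, `ZCARD`, and `ZADD`.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._log = defaultdict(deque)  # client id -> timestamps, oldest first
        self._lock = threading.Lock()

    def allow(self, client_id):
        now = self._clock()
        with self._lock:  # the three steps must run as one atomic unit
            timestamps = self._log[client_id]
            # 1. Remove entries that have slid out of the window.
            while timestamps and timestamps[0] <= now - self.window:
                timestamps.popleft()
            # 2. Count what remains and compare against the limit.
            if len(timestamps) >= self.limit:
                return False
            # 3. Record this request.
            timestamps.append(now)
            return True
```

Rejected requests are not recorded in this sketch, so a client that keeps hammering the limiter is not penalized further; whether to count rejections is a policy decision.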

Infrastructure: Redis

👉 Try it · Read the walkthrough

6. Priority Task Queue

What you'll learn: Priority ordering, dead-letter queues, retry policies, concurrent consumers

Why it matters: Background job processing is the backbone of async architectures — sending emails, processing payments, generating reports. A production task queue needs priorities (critical tasks first), retries (handle transient failures), and dead-letter queues (capture permanently failed tasks for debugging).

Key insight: Redis sorted sets give natural priority ordering. The dead-letter queue is just another list, but its existence changes how you think about failure handling.
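
The interplay of priorities, retries, and the dead-letter queue can be sketched in memory with `heapq` standing in for the Redis sorted set. `TaskQueue`, `process_next`, and the convention that a lower number means higher priority are assumptions for this example; concurrent consumers are not modeled.

```python
import heapq
import itertools


class TaskQueue:
    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self._heap = []  # entries: (priority, sequence, attempts, task)
        # The sequence number breaks ties so equal priorities stay FIFO.
        self._sequence = itertools.count()
        self.dead_letters = []  # (task, last error message)

    def enqueue(self, task, priority=5, attempts=0):
        # Lower number = higher priority, matching heapq's min-heap order.
        heapq.heappush(self._heap, (priority, next(self._sequence), attempts, task))

    def process_next(self, handler):
        """Run one task. Returns False when the queue is empty."""
        if not self._heap:
            return False
        priority, _, attempts, task = heapq.heappop(self._heap)
        try:
            handler(task)
        except Exception as error:
            attempts += 1
            if attempts >= self.max_attempts:
                # Retries exhausted: park the task for later inspection.
                self.dead_letters.append((task, str(error)))
            else:
                self.enqueue(task, priority, attempts)
        return True
```

A failed task is re-enqueued immediately here; a fuller version would likely delay the retry, for example with the backoff from the retry engine.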

Infrastructure: Redis

👉 Try it · Read the walkthrough

7. Multi-Floor Parking Lot

What you'll learn: Resource allocation, floor/slot management, vehicle type constraints, revenue tracking

Why it matters: The parking lot is a classic OOP design problem, but with real persistence it becomes a resource management system. The same patterns apply to cloud resource allocation, seat booking, warehouse management: any system where you allocate finite resources to incoming requests.

Key insight: Finding the "nearest available slot" is the interesting algorithmic challenge. Combined with vehicle type constraints and concurrent allocation, this becomes a non-trivial problem.
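
One possible in-memory take on nearest-slot allocation keeps a min-heap of free slots per vehicle type. The sketch assumes that "nearest" means lowest floor, then lowest slot number, and that slots are numbered per vehicle type on each floor; `ParkingLot` and its methods are hypothetical, and a lock stands in for the database-level concurrency control a PostgreSQL version would need.

```python
import heapq
import threading


class ParkingLot:
    def __init__(self, floors):
        """`floors` maps floor number -> {vehicle_type: number_of_slots}."""
        self._free = {}      # vehicle type -> min-heap of (floor, slot)
        self._occupied = {}  # plate -> (vehicle type, floor, slot)
        self._lock = threading.Lock()
        for floor, slots_by_type in floors.items():
            for vehicle_type, count in slots_by_type.items():
                heap = self._free.setdefault(vehicle_type, [])
                for slot in range(1, count + 1):
                    heapq.heappush(heap, (floor, slot))

    def park(self, plate, vehicle_type):
        """Return the allocated (floor, slot), or None if the type is full."""
        with self._lock:
            if plate in self._occupied:
                raise ValueError(f"{plate} is already parked")
            heap = self._free.get(vehicle_type)
            if not heap:
                return None
            floor, slot = heapq.heappop(heap)  # smallest (floor, slot) first
            self._occupied[plate] = (vehicle_type, floor, slot)
            return floor, slot

    def leave(self, plate):
        with self._lock:
            vehicle_type, floor, slot = self._occupied.pop(plate)
            heapq.heappush(self._free[vehicle_type], (floor, slot))
```

The heap makes both allocation and release O(log n) per vehicle type, and a freed slot on a lower floor is automatically preferred on the next request.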

Infrastructure: PostgreSQL

👉 Try it · Read the walkthrough

Phase 3: Production Systems (Hard)

8. Load Balancer

What you'll learn: Round-robin, weighted, least-connections, random algorithms; health-aware routing

Why it matters: Load balancers such as AWS ALB, Nginx, and HAProxy sit in front of nearly every production service. Understanding the four core algorithms and health-aware routing gives you deep insight into how traffic flows in distributed systems.

Key insight: Round-robin is simple but doesn't account for server capacity. Weighted routing distributes proportionally but requires knowing server capacity. Least-connections is self-adaptive but needs accurate connection tracking. There's no "best" algorithm — only tradeoffs.
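
To compare the tradeoffs side by side, here is a small sketch of three of the selection strategies with health-aware skipping. `Server`, `LoadBalancer`, and the method names are assumptions for the example; weighted routing is shown as weighted random selection, which is only one of several ways to honor weights.

```python
import random


class Server:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight
        self.healthy = True
        self.active_connections = 0


class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._rr_index = 0

    def _healthy(self):
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy servers")
        return healthy

    def round_robin(self):
        # Walk the full list from the cursor, skipping unhealthy servers.
        for _ in range(len(self.servers)):
            server = self.servers[self._rr_index % len(self.servers)]
            self._rr_index += 1
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers")

    def least_connections(self):
        # Relies on callers keeping active_connections up to date.
        return min(self._healthy(), key=lambda s: s.active_connections)

    def weighted_random(self, rng=random):
        healthy = self._healthy()
        return rng.choices(healthy, weights=[s.weight for s in healthy], k=1)[0]
```

Each strategy only picks a server; incrementing and decrementing `active_connections` around the request is left to the caller in this sketch.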

Infrastructure: Redis

👉 Try it · Read the walkthrough

9. Cab Booking System

What you'll learn: Ride state machine, Haversine distance, race-safe acceptance, fare calculation

Why it matters: This is the Uber/Ola engineering challenge distilled. The ride state machine (REQUESTED → ACCEPTED → COMPLETED, or REQUESTED → CANCELLED) teaches you how to model real-world processes as code. The race-safe acceptance pattern (atomic UPDATE WHERE status='REQUESTED') is the same pattern used in any first-come-first-served system.

Key insight: The Haversine formula for geographic distance is essential for any location-aware system. And the UPDATE ... WHERE status='REQUESTED' pattern eliminates an entire class of concurrency bugs without distributed locks.
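
Both ideas fit in a short sketch. The example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server; the `rides` table layout and the function names are assumptions, and 6371 km is the commonly used mean Earth radius.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def create_schema(conn):
    # Hypothetical minimal table, just enough to show the pattern.
    conn.execute(
        "CREATE TABLE rides (id INTEGER PRIMARY KEY, status TEXT, driver_id INTEGER)"
    )


def accept_ride(conn, ride_id, driver_id):
    """Return True only for the driver whose UPDATE actually matched the row."""
    cursor = conn.execute(
        "UPDATE rides SET status = 'ACCEPTED', driver_id = ? "
        "WHERE id = ? AND status = 'REQUESTED'",
        (driver_id, ride_id),
    )
    conn.commit()
    # rowcount is 1 for the winner and 0 for everyone who arrives later.
    return cursor.rowcount == 1
```

Because the status check and the write happen in one statement, the database decides the winner; the application never reads the status and then writes in a separate step.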

Infrastructure: PostgreSQL

👉 Try it · Read the walkthrough

10. Indian Railway Reservation

What you'll learn: Multi-tier allocation (Confirmed/RAC/Waiting), cascading promotions, berth preferences

Why it matters: The IRCTC reservation model is one of the most elegant queue systems in the world. A single cancellation can trigger a cascade: the first RAC passenger gets promoted to Confirmed (with a berth), and the first Waiting List passenger gets promoted to RAC. Understanding this teaches you how to build complex promotion and allocation engines.

Key insight: The cascade is always at most two levels deep. The FIFO ordering within each tier is critical for fairness.
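
The two-level cascade can be sketched with one dictionary and two FIFO queues. This is a simplified illustration: `Coach`, the `REGRET` status for a fully booked coach, and the limits are assumptions for the example, and berth preferences are ignored.

```python
from collections import deque


class Coach:
    def __init__(self, berths, rac_limit, waiting_limit):
        self._free_berths = deque(berths)  # e.g. ["L1", "U1"]
        self.confirmed = {}                # passenger -> berth
        self.rac = deque()                 # FIFO within the tier
        self.waiting = deque()             # FIFO within the tier
        self._rac_limit = rac_limit
        self._waiting_limit = waiting_limit

    def book(self, passenger):
        if self._free_berths:
            self.confirmed[passenger] = self._free_berths.popleft()
            return "CONFIRMED"
        if len(self.rac) < self._rac_limit:
            self.rac.append(passenger)
            return "RAC"
        if len(self.waiting) < self._waiting_limit:
            self.waiting.append(passenger)
            return "WAITING"
        return "REGRET"  # hypothetical label for "no capacity left"

    def cancel(self, passenger):
        if passenger in self.confirmed:
            berth = self.confirmed.pop(passenger)
            if self.rac:
                # Level 1: the first RAC passenger takes the freed berth.
                self.confirmed[self.rac.popleft()] = berth
                self._promote_waiting()
            else:
                self._free_berths.append(berth)
        elif passenger in self.rac:
            self.rac.remove(passenger)
            self._promote_waiting()
        elif passenger in self.waiting:
            self.waiting.remove(passenger)
        else:
            raise KeyError(passenger)

    def _promote_waiting(self):
        # Level 2: the first waiting passenger fills the vacated RAC place.
        if self.waiting:
            self.rac.append(self.waiting.popleft())
```

Using `popleft` on both queues is what preserves the FIFO fairness within each tier.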

Infrastructure: In-memory (pure data structure problem)

👉 Try it · Read the walkthrough

Phase 4: Expert (The Final Boss)

11. Deployment Pipeline

What you'll learn: Sequential stage advancement, rollback, exclusive execution, deployment history

Why it matters: This is the core of every CI/CD system — Jenkins, GitHub Actions, ArgoCD. Understanding how pipelines enforce sequential stages, prevent concurrent deployments, and handle rollback gives you deep appreciation for the infrastructure you use every day.

Key insight: Rollback isn't "undoing" stages — it's a terminal state transition. And exclusive execution (only one deployment per pipeline at a time) is the hardest requirement, because concurrent deployments can leave production in an undefined state.
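
A minimal single-process sketch of these two rules follows. The stage names, the `Pipeline` class, and the status strings are assumptions for the example, and a `threading.Lock` stands in for whatever PostgreSQL or Redis mechanism a real implementation would use to enforce exclusivity across processes.

```python
import threading

# Hypothetical stage list; the real problem defines its own stages.
STAGES = ["BUILD", "TEST", "STAGING", "PRODUCTION"]


class Pipeline:
    def __init__(self, name):
        self.name = name
        self.history = []    # finished deployments, oldest first
        self._active = None  # at most one deployment in flight
        self._lock = threading.Lock()

    def start(self, version):
        with self._lock:
            if self._active is not None:
                raise RuntimeError("a deployment is already running")
            self._active = {"version": version, "stage": 0, "status": "RUNNING"}

    def advance(self):
        """Move to the next stage, or finish if already at the last one."""
        with self._lock:
            deployment = self._require_active()
            if deployment["stage"] == len(STAGES) - 1:
                self._finish("SUCCEEDED")
            else:
                deployment["stage"] += 1

    def rollback(self):
        # Rollback does not walk stages backwards; it ends the deployment.
        with self._lock:
            self._require_active()
            self._finish("ROLLED_BACK")

    def current_stage(self):
        with self._lock:
            return STAGES[self._require_active()["stage"]]

    def _require_active(self):
        if self._active is None:
            raise RuntimeError("no deployment is running")
        return self._active

    def _finish(self, status):
        self._active["status"] = status
        self.history.append(self._active)
        self._active = None  # frees the pipeline for the next deployment
```

Checking and setting `_active` under the same lock is what makes `start` exclusive: two concurrent callers cannot both observe an idle pipeline.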

Infrastructure: PostgreSQL + Redis

👉 Try it · Read the walkthrough

Study Tips

Don't skip the easy problems. The retry engine teaches exponential backoff, which you'll use in the circuit breaker. The employee management system teaches search patterns you'll reuse everywhere.

Read the contract before writing code. The interface tells you exactly what the grader expects. Every method has documented preconditions, postconditions, and edge cases.

Think about data model first, algorithm second. The right data model (which tables, which Redis keys, which data structures) determines most of your implementation.

Test concurrency explicitly. If the contract says "thread-safe," think about what happens when two threads call the same method simultaneously. The grader will test this.

Read the walkthrough blog after attempting. Each problem has a detailed blog post explaining the approach, key decisions, and common pitfalls. Reading it after your attempt maximizes learning.

Start Your System Design Practice

The best time to start was yesterday. The second best time is now.

👉 Explore all LLD problems

👉 Try Applied DSA problems

👉 Read all walkthrough blogs

Stop reading about system design. Start building it.