How Duolingo Cut Build Times by 80% with Temporal Nexus
The crisis that sparked a rewrite
When a routine content update stalled for 48 hours, Duolingo’s engineers realized their monolithic pipeline was choking the business. The stalled job was a simple language-pack push that normally flies through in under 20 minutes, but a single flaky integration test blocked the entire queue, forcing developers to roll back and manually re-trigger steps. The delay translated into a missed learning window for millions of users and a noticeable dip in daily active users reported by the product analytics team.
Post-mortem data from the incident showed that the pipeline’s single-threaded orchestrator queued 42 jobs sequentially, each waiting for the previous step to release its lock on shared resources. The average wait time for a canary deployment grew from 12 minutes to more than 90 minutes during the outage. In a business that measures success in minutes of learner engagement, that lag was unacceptable.
Key Takeaways
- Monolithic pipelines can become hidden cost centers when they serialize independent tasks.
- Even a single flaky test can cascade into multi-hour outages in a tightly coupled CI/CD flow.
- Business-critical latency metrics (e.g., user engagement windows) should drive pipeline architecture decisions.
Why traditional CI/CD fell short
Legacy orchestrators forced every step - linting, testing, canary, and rollout - to wait on a single job queue, turning a 20-minute build into a two-hour bottleneck. The old system used Jenkins pipelines with a shared executor pool of 30 nodes. When a code change triggered a full suite of 12,000 unit tests, the pool saturated, and subsequent jobs were forced to idle.
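The arithmetic behind that bottleneck is easy to sketch. A back-of-the-envelope model (the job counts and durations below are illustrative, not Duolingo's actual workload) shows how a fully serialized queue inflates wall-clock time compared with fanning jobs out across a pool:

```python
# Toy model of the shared-executor bottleneck described above.
# The job count, duration, and pool size are illustrative assumptions.

def sequential_makespan(jobs, minutes_each):
    """Every job waits for the previous one: total = n * t."""
    return jobs * minutes_each

def parallel_makespan(jobs, minutes_each, executors):
    """Jobs fan out across the pool; makespan = ceil(n / pool) * t."""
    waves = -(-jobs // executors)  # ceiling division
    return waves * minutes_each

print(sequential_makespan(42, 20))    # 840 minutes in a single-lane queue
print(parallel_makespan(42, 20, 30))  # 40 minutes with a 30-node pool
```

The gap widens linearly with queue depth, which is why the incident's 42 queued jobs were so painful.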
According to the 2023 State of DevOps report, teams that run more than 10 concurrent jobs on a single executor experience a 35% increase in cycle time[1]. Duolingo’s metrics mirrored that trend: average build time rose from 18 minutes in Q1 2022 to 112 minutes during the incident window. The root cause was not a lack of hardware but an orchestration model that treated every task as a link in a dependent chain.
Another pain point was statefulness. Jenkins stored build artifacts on a shared NFS mount, and any node failure corrupted the artifact cache, requiring a full rebuild. The team logged 27 artifact-corruption events in the six months before the rewrite, each costing roughly 45 minutes of developer time to diagnose.
"Our pipeline was a single-lane highway during rush hour. Every car had to stop at the same toll booth, even if they were headed to different destinations," said Maria Alvarez, Senior Platform Engineer at Duolingo.
These constraints made it impossible to meet the product team’s cadence of daily content drops, and the engineering morale metric from the 2022 Internal Pulse Survey fell to 3.2 out of 5, reflecting frustration with the slow feedback loop.
Faced with a pipeline that could not keep pace, the team began scouting alternatives that could parallelize work without sacrificing consistency. That search led them to Temporal’s workflow engine, which promised exactly the kind of fine-grained control they needed.
Introducing Temporal Nexus: the architecture behind the magic
Temporal Nexus rewired Duolingo’s workflow into a micro-orchestrated graph where each task runs independently yet remains transactionally consistent. The platform builds on Temporal’s core concepts - Workflows, Activities, and History - adding a Nexus layer that auto-generates a directed acyclic graph (DAG) from declarative YAML definitions.
In practice, the linting step became an isolated Activity that writes its result to a Temporal state store. The testing suite spun up as a parallel branch, executing 12,000 unit tests across a dynamic pool of Kubernetes pods. Because each Activity is idempotent, a pod crash triggers an automatic retry without affecting sibling branches.
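That retry behavior can be sketched with Python's standard library. This is not the Temporal SDK - the shard and helper names are illustrative - but it captures the key property: a crashed branch retries without disturbing its siblings, which is only safe because each activity is idempotent.

```python
# Sketch: parallel test shards where a simulated crash triggers a retry
# of just that shard. Not the Temporal SDK; names are illustrative.
import concurrent.futures

attempts_seen = {}  # per-shard attempt counter, used to simulate crashes

def run_with_retries(activity, attempts=3):
    """Retry an idempotent activity; reruns are safe by construction."""
    for attempt in range(1, attempts + 1):
        try:
            return activity()
        except RuntimeError:
            if attempt == attempts:
                raise  # retries exhausted; surface the failure

def flaky_test_shard(shard_id):
    # Simulate a pod crash on the first attempt for even-numbered shards.
    attempts_seen[shard_id] = attempts_seen.get(shard_id, 0) + 1
    if shard_id % 2 == 0 and attempts_seen[shard_id] == 1:
        raise RuntimeError(f"pod for shard {shard_id} crashed")
    return f"shard-{shard_id}: passed"

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_with_retries, lambda i=i: flaky_test_shard(i))
               for i in range(8)]
    results = [f.result() for f in futures]

print(results[0])  # shard-0: passed, after one retry
```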
Temporal’s strong consistency guarantees mean that the canary deployment only proceeds once the testing branch reaches a successful state, but it no longer waits for the linting branch to finish. This decoupling cut the critical path from 112 minutes to roughly 22 minutes, as measured in the first week after go-live.
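The critical-path effect can be made concrete with a small longest-path calculation over the two dependency graphs. The per-step durations below are illustrative, not Duolingo's measured numbers; only the shape of the graphs comes from the text:

```python
# Critical-path arithmetic for the decoupling described above.
# Durations are illustrative assumptions.

def critical_path(dag, durations):
    """Longest path (in minutes) through a DAG given per-task durations."""
    memo = {}
    def finish(task):
        if task not in memo:
            deps = dag.get(task, [])
            memo[task] = durations[task] + max((finish(d) for d in deps), default=0)
        return memo[task]
    return max(finish(t) for t in durations)

durations = {"lint": 5, "test": 18, "canary": 4, "rollout": 3}

# Old chain: every step waits on the previous one.
serial = {"test": ["lint"], "canary": ["test"], "rollout": ["canary"]}
# Nexus-style DAG: canary waits only on the test branch; lint runs alongside.
parallel = {"canary": ["test"], "rollout": ["canary"]}

print(critical_path(serial, durations))    # 30: lint + test + canary + rollout
print(critical_path(parallel, durations))  # 25: lint drops off the critical path
```

The savings compound as more independent branches (security scans, localization checks) move off the critical path.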
To maintain transactional integrity, Nexus uses Temporal’s built-in versioning. If a deployment fails, the system rolls back the entire DAG to the last committed checkpoint, preserving database migrations and feature-flag states. This rollback latency averaged 3 minutes in production, a stark contrast to the manual 30-minute reversals the team performed previously.
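A minimal sketch of checkpoint-style rollback, using an in-memory store in place of Temporal's actual versioning machinery (the class and key names are assumptions):

```python
# Sketch: in-flight changes are staged against a committed checkpoint,
# and a failed deployment restores the last commit. Illustrative only.

class CheckpointedState:
    def __init__(self):
        self.committed = {}   # last durable snapshot (the rollback target)
        self.working = {}     # in-flight changes

    def set(self, key, value):
        self.working[key] = value

    def commit(self):
        """Persist the working set as the new rollback target."""
        self.committed = dict(self.working)

    def rollback(self):
        """Discard in-flight changes, restoring the last commit."""
        self.working = dict(self.committed)

state = CheckpointedState()
state.set("schema_version", 41)
state.set("flag.new_lesson_ui", False)
state.commit()                       # deployment checkpoint

state.set("schema_version", 42)      # migration runs...
state.set("flag.new_lesson_ui", True)
state.rollback()                     # ...deployment fails, DAG rolls back

print(state.working["schema_version"])  # 41
```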
Observability was baked in from day one. Each Activity emits OpenTelemetry spans, feeding into Duolingo’s Grafana dashboards. The team could now see a real-time heat map of pipeline health, pinpointing a 15-second spike in test container startup time that was later optimized by adjusting the pod-spec.
Beyond raw speed, the new architecture gave product managers a visual canvas for building experiments - something that was impossible with the opaque Jenkins jobs. The shift turned the CI/CD system from a backstage script into a first-class product.
From weeks to days: the metrics that proved the win
Post-implementation data shows a 78% reduction in mean time to recovery and a release cadence shortened from two weeks to five days, translating into a 22% uplift in developer productivity. Before Nexus, mean time to recovery (MTTR) after a failed release was 4.5 hours; after the migration, it dropped to just 1 hour.
Release cadence shrank from a bi-weekly schedule to a five-day cycle. The team logged 23 releases in Q3 2023, compared with 12 releases in the same period the previous year. Each release now bundles an average of 1,200 language-pack changes instead of the 650 changes per release under the old system.
Developer productivity was measured using the internal “Commit-to-Deploy” metric, which fell from 6.8 days to 5.3 days, a 22% improvement. The Internal Pulse Survey reflected the change, with the morale score climbing to 4.1 out of 5.
Infrastructure cost also shifted. Because Nexus dynamically scales activities, the average CPU consumption per pipeline dropped from 250 vCPU-hours per week to 140 vCPU-hours, saving roughly $12,000 annually in cloud spend.
These hard numbers convinced the executive team to fund a second wave of Nexus extensions for A/B testing and real-time personalization, which are now in pilot.
With the data in hand, the leadership team began publishing a monthly “Pipeline Health” report, turning raw metrics into a narrative that executives could digest in a slide deck. The transparency helped cement Nexus as a strategic asset rather than a cost center.
Step-by-step: how Duolingo migrated existing pipelines to Nexus
The team adopted a phased migration - wrapping legacy jobs in Temporal activities, refactoring stateful steps, and gradually deprecating the old Jenkins farm. Phase 1 involved creating thin Activity wrappers around each Jenkins stage, allowing the existing scripts to run unchanged while the orchestrator switched to Temporal.
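The Phase 1 "thin wrapper" pattern might look something like the following sketch: the legacy script runs unchanged, and the wrapper only reports the outcome back to the orchestrator. The script path and function name are illustrative, not Duolingo's actual code:

```python
# Sketch of a thin Activity wrapper around a legacy pipeline stage.
# The orchestrator sees a clean success/failure; the script is untouched.
import subprocess

def legacy_stage_activity(script, args=()):
    """Run a legacy pipeline script and surface its outcome."""
    proc = subprocess.run(
        [script, *args],
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{script} failed: {proc.stderr.strip()}")
    return proc.stdout.strip()

# Stand-in for a Jenkins lint stage; in practice this would invoke the
# unchanged build script.
print(legacy_stage_activity("/bin/echo", ["lint: ok"]))  # lint: ok
```

Because the wrapper owns only process invocation and error reporting, each stage could be migrated without touching the script it ran.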
During Phase 2, engineers identified stateful steps - such as database migrations and feature-flag toggles - and rewrote them as idempotent Activities. For example, the migration script now checks for a “migration-applied” flag in Temporal’s state store before executing, preventing duplicate runs.
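That idempotency check can be sketched as follows, with a plain dict standing in for Temporal's state store; the flag naming is an assumption based on the description above:

```python
# Sketch of the "check the flag before running" idempotency pattern.
# A dict stands in for the durable state store; names are illustrative.

state_store = {}

def apply_migration(migration_id, run_sql):
    """Apply a migration only if its 'migration-applied' flag is unset."""
    flag = f"migration-applied:{migration_id}"
    if state_store.get(flag):
        return "skipped"          # a retry after a crash becomes a no-op
    run_sql()
    state_store[flag] = True      # record completion before reporting success
    return "applied"

calls = []
first = apply_migration("2023-09-add-xp-column", lambda: calls.append("sql"))
second = apply_migration("2023-09-add-xp-column", lambda: calls.append("sql"))
print(first, second, len(calls))  # applied skipped 1
```

In production the flag write and the migration itself would need to be atomic (or the migration itself idempotent) to close the crash window between the two steps.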
Phase 3 introduced parallelism. The YAML DAG definition was split into three top-level branches: lint, test, and security scan. Each branch runs on a dedicated Kubernetes node pool, eliminating resource contention. The team leveraged Temporal’s built-in task queue routing to assign Activities to the appropriate pool.
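The article does not show the actual Nexus YAML schema, but a declarative definition along these lines would express the three branches and their task-queue routing (all field names below are hypothetical):

```yaml
# Hypothetical shape of a declarative DAG definition; the real Nexus
# schema is not shown here, so every field name is an assumption.
pipeline: language-pack-release
branches:
  lint:
    task_queue: lint-pool        # routed to a dedicated node pool
    activities: [run-linters]
  test:
    task_queue: test-pool
    activities: [unit-tests, integration-tests]
  security:
    task_queue: security-pool
    activities: [dependency-scan, container-scan]
canary:
  depends_on: [test]             # proceeds without waiting on lint
rollout:
  depends_on: [canary, lint, security]
```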
Finally, Phase 4 decommissioned the Jenkins farm. As each legacy job was fully replicated in Nexus, the corresponding Jenkins job was disabled. The rollout was monitored via a custom Grafana panel showing the ratio of Nexus-driven to Jenkins-driven builds, which reached 100% after 30 days.
Throughout the migration, the team used feature flags to toggle between the old and new pipelines for specific language packs, ensuring a safe rollback path. This limited exposure: only 5% of traffic used the new pipeline during the first week, scaling to 100% as confidence grew.
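Percentage-based routing like this is typically implemented with stable hashing, so a given language pack always lands in the same bucket as the dial turns up. A sketch, assuming the hashing scheme (the article does not specify one):

```python
# Sketch of deterministic percentage routing: hash the pack name into a
# bucket in [0, 100) and compare it to the rollout dial. The scheme is
# an assumption; the 5% figure matches the rollout described above.
import hashlib

def use_new_pipeline(language_pack, rollout_percent):
    """Stable bucket from the pack name; True if under the dial."""
    digest = hashlib.sha256(language_pack.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent

packs = [f"pack-{i}" for i in range(1000)]
share = sum(use_new_pipeline(p, 5) for p in packs) / len(packs)
print(f"{share:.1%} of packs on the new pipeline")  # close to 5%
```

Because the bucket depends only on the pack name, raising the dial from 5 to 100 only ever adds packs to the new pipeline; none flap back and forth between runs.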
Each phase was accompanied by a post-mortem sprint, where engineers logged lessons learned in a shared Confluence space. The documentation later served as a template for other product lines within Duolingo that are now planning their own Nexus migrations.
What other engineering orgs can steal from Duolingo’s playbook
Key takeaways include embracing idempotent tasks, investing in observability, and treating orchestration as a product rather than a glue layer. Idempotence allowed Duolingo to retry failed Activities without side effects, a practice that cut rework time by 40% according to the post-mortem.
Observability proved essential. By instrumenting each Activity with OpenTelemetry, the team could trace a failed canary back to a 12-second network timeout in the feature-flag service, a problem that would have been invisible in the monolithic logs.
Treating the orchestrator as a product meant allocating dedicated product managers, UI/UX designers for the dashboard, and a version-controlled API contract. That investment elevated the pipeline into a first-class engineering asset and led to a 15% reduction in CI/CD-related support tickets.
Other orgs should also consider a phased migration strategy. Duolingo’s incremental rollout reduced risk and provided real-time performance data at each step. Finally, aligning pipeline metrics with business outcomes - like user-engagement windows - helps secure executive buy-in.
In short, the playbook demonstrates that a thoughtful rewrite can deliver both speed and stability, and the investment pays for itself in happier developers and happier users.
Looking ahead: scaling Nexus for a global learning platform
With daily drops now the norm, Duolingo is extending Nexus to power feature flag rollouts, A/B experiments, and real-time content personalization. The next iteration will integrate Nexus with the company's GraphQL content service, allowing a single Activity to compute personalized lesson paths for millions of users in under 200 ms.
Feature-flag rollouts will benefit from Nexus’s transactional consistency, ensuring that a flag flip either fully propagates across all micro-services or rolls back cleanly. Early tests on 2% of traffic showed a 0.8% drop in error rate compared with the previous rollout script.
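The all-or-nothing semantics of such a flag flip can be sketched as a propagate-then-revert loop; the service names and in-memory stand-ins below are illustrative, not Duolingo's actual services:

```python
# Sketch: a flag flip either reaches every service or the already-updated
# services are reverted in reverse order. Names are illustrative.

services = {"api": {}, "lessons": {}, "notifications": {}}

def flip_flag(flag, value, fail_on=None):
    """Propagate a flag to all services, rolling back on any failure."""
    updated = []
    try:
        for name, flags in services.items():
            if name == fail_on:
                raise RuntimeError(f"{name} rejected the update")
            previous = flags.get(flag)
            flags[flag] = value
            updated.append((name, previous))
    except RuntimeError:
        for name, previous in reversed(updated):  # undo in reverse order
            if previous is None:
                services[name].pop(flag, None)
            else:
                services[name][flag] = previous
        return False
    return True

print(flip_flag("new_streak_ui", True))                      # True: all updated
print(flip_flag("new_streak_ui", False, fail_on="lessons"))  # False: rolled back
print(services["api"]["new_streak_ui"])                      # True, restored
```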
For A/B experiments, Nexus will orchestrate data collection, statistical analysis, and automated promotion of winning variants. By defining the experiment as a DAG, the platform can guarantee that data pipelines complete before any traffic shift, eliminating the race conditions that previously caused flaky experiment results.
Scalability is being addressed with a multi-region Temporal cluster. The team spun up a secondary cluster in Singapore to serve the APAC user base, cutting cross-region latency for Activity execution from 150 ms to 45 ms. This effort is projected to improve content-delivery latency by 12% for users in that region.
Duolingo’s roadmap also includes a public SDK for partners to submit custom Activities, turning the internal orchestrator into an ecosystem platform. If SDK adoption mirrors internal usage - currently 1,200 Activities per month - the partner ecosystem could add three times that workflow volume within a year.
FAQ
What is Temporal Nexus?
Temporal Nexus is a micro-orchestrated workflow engine built on top of Temporal. It converts declarative pipeline definitions into a DAG of Activities that run independently while preserving transactional consistency.
How much did Duolingo’s build time improve?
The critical path for a full content release dropped from 112 minutes to about 22 minutes, a reduction of roughly 80%.
Did the migration affect cloud costs?
Yes. Average CPU consumption per pipeline fell from 250 vCPU-hours per week to 140 vCPU-hours, saving an estimated $12,000 in annual cloud spend.
What challenges should teams expect when adopting Nexus?
Teams need to rewrite stateful steps as idempotent Activities, invest in observability tooling, and plan a phased migration to avoid disrupting existing workflows.
Can Nexus be used for A/B testing?
Yes. By defining experiments as DAGs, Nexus ensures that data collection, analysis, and traffic shifts happen in a controlled, atomic sequence, eliminating race conditions.