9 min read · AI Dev Review

Cursor vs Copilot: a real-world benchmark

Same task, two tools, head-to-head timings and quality scores across 30 real engineering tickets.

Benchmarks · Cursor · GitHub Copilot

Cursor and GitHub Copilot get compared constantly, usually with vibes. We ran them through 30 real tickets from an open-source SaaS codebase and timed every step. Here are the numbers.

The setup

Tickets ranged from one-line fixes to multi-file features. Two engineers ran the same tickets: one in Cursor with its agent enabled, the other in VS Code with Copilot Chat and Copilot Workspace.

We measured four things: time to first working draft, number of follow-up prompts, lines of code changed in the draft versus lines left in the final PR after review, and whether the PR passed CI on the first try.
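For anyone who wants to log the same measurements on their own tickets, here is a minimal sketch of how they could be recorded and aggregated. The `TicketRun` record, its field names, and the sample values are all hypothetical, not the harness or data used for this benchmark, and the draft-to-final line ratio is only one possible way to express "lines changed vs final lines".

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TicketRun:
    """One ticket completed with one tool (hypothetical record layout)."""
    tool: str
    seconds_to_first_draft: float
    follow_up_prompts: int
    draft_lines_changed: int
    final_lines_changed: int
    ci_passed_first_try: bool


def summarize(runs: list[TicketRun]) -> dict[str, float]:
    """Aggregate the four measurements over one tool's runs."""
    if not runs:
        raise ValueError("need at least one run to summarize")

    draft_total = sum(r.draft_lines_changed for r in runs)
    final_total = sum(r.final_lines_changed for r in runs)

    return {
        "median_seconds_to_first_draft": median(
            [r.seconds_to_first_draft for r in runs]
        ),
        "mean_follow_up_prompts": mean([r.follow_up_prompts for r in runs]),
        # Above 1.0 means the draft touched more lines than the reviewed PR kept.
        "draft_to_final_line_ratio": (
            draft_total / final_total if final_total else float("nan")
        ),
        "first_try_ci_pass_rate": (
            sum(r.ci_passed_first_try for r in runs) / len(runs)
        ),
    }


# Illustrative values only; these are not results from the benchmark.
example_runs = [
    TicketRun("tool_a", 200, 2, 40, 36, True),
    TicketRun("tool_a", 300, 3, 120, 100, False),
    TicketRun("tool_a", 250, 1, 10, 10, True),
]

for name, value in summarize(example_runs).items():
    print(f"{name}: {value:.2f}")
```

Keeping one record per ticket per tool makes it easy to filter by tool before calling `summarize`, so both tools go through exactly the same aggregation.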

Headline numbers

  • Median time-to-first-draft: Cursor 4m 12s · Copilot 6m 48s.
  • First-try CI pass rate: Cursor 71% · Copilot 58%.
  • Average follow-up prompts per ticket: Cursor 2.3 · Copilot 3.6.
  • Reviewer-requested changes per PR: Cursor 1.4 · Copilot 1.9.
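To put those gaps on a common scale, the short sketch below converts the headline figures into relative differences. The input numbers come straight from the list above; the `relative_reduction` helper is an illustrative name, and expressing each gap as a percentage of the Copilot figure is a presentation choice rather than part of the original measurements.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Percent by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline * 100


# Headline figures from the list above, with times converted to seconds.
cursor_draft_s = 4 * 60 + 12    # 4m 12s = 252 s
copilot_draft_s = 6 * 60 + 48   # 6m 48s = 408 s

time_gap = relative_reduction(copilot_draft_s, cursor_draft_s)
prompt_gap = relative_reduction(3.6, 2.3)
review_gap = relative_reduction(1.9, 1.4)
ci_gap_points = 71 - 58  # difference in percentage points, not a relative change

print(f"Time to first draft: {time_gap:.1f}% lower with Cursor")    # 38.2%
print(f"Follow-up prompts:   {prompt_gap:.1f}% fewer with Cursor")  # 36.1%
print(f"Reviewer changes:    {review_gap:.1f}% fewer with Cursor")  # 26.3%
print(f"First-try CI pass:   {ci_gap_points} points higher with Cursor")
```

The CI gap is reported in percentage points because both inputs are already rates; dividing one by the other would overstate the difference.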

Why Cursor pulled ahead

Cursor's edge wasn't model quality, since both tools can be configured with the same frontier models. It was UX: the composer's diff preview and the agent's ability to read related files without being told meant fewer round-trips per change.

Copilot Workspace closed a lot of the gap in late 2025, but it still asks for confirmation in places Cursor just acts.

Where Copilot won

Copilot won on tickets that touched org-wide standards. The enterprise indexing and policy controls are a real advantage for teams that care about compliance and consistency more than raw speed.

It also won on cost at scale. Copilot's per-seat pricing is friendlier than Cursor's Pro plan once you cross a few dozen seats.

Verdict

If you're a small team optimizing for shipping: Cursor.

If you're an enterprise optimizing for governance: Copilot.

If you're somewhere in the middle: run both for two weeks and look at your own CI numbers. They don't lie.