The Six Dimensions
The base score is the sum of six dimensions that evaluate what a pull request does. Each dimension has a defined range, and they add up to a maximum of 100.
Base Score = Scope + Architecture + Implementation + Risk + Quality + Performance/Security
This page explains each dimension in detail: what it measures, what different score levels look like, and the anchored examples the AI uses for calibration.
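The additive formula above can be sketched in a few lines. This is an illustrative sketch only: the dimension names and the per-dimension clamping step are assumptions for the example, not the tool's actual implementation.

```python
# Per-dimension caps from the rubric: four 20-point dimensions,
# Quality capped at 15, Performance & Security capped at 5.
DIMENSION_CAPS = {
    "scope": 20,
    "architecture": 20,
    "implementation": 20,
    "risk": 20,
    "quality": 15,
    "performance_security": 5,
}

def base_score(scores: dict) -> int:
    """Sum the six dimensions, clamping each to its defined range."""
    total = 0
    for name, cap in DIMENSION_CAPS.items():
        value = scores.get(name, 0)
        total += max(0, min(cap, value))
    return total  # maximum possible: 20 + 20 + 20 + 20 + 15 + 5 = 100
```

A PR scoring the maximum in every dimension would reach exactly 100; missing dimensions default to 0.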
Scope (0-20 points)
Scope measures the breadth of the change. How many files were touched? How many subsystems were affected? Did the change cross architectural boundaries?
| Score | What It Looks Like |
|---|---|
| 0-3 | Single file, minimal change. A configuration update or typo fix. |
| 4-7 | Localized change across 1-3 related files within one subsystem. |
| 8-12 | Multiple related files across a subsystem; UI, API, and database all touched. |

| 13-17 | Cross-cutting change spanning multiple subsystems. |
| 18-20 | System-wide impact. New services, breaking changes, or major refactors. |
Key factors the AI considers: files modified (weighted by criticality), new APIs or endpoints, database migrations, external service integrations, and cross-team coordination requirements.
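One way the listed scope factors could combine into a 0-20 score is sketched below. The weights, the criticality values, and the factor names are all assumptions invented for this sketch; the actual model weighs these factors qualitatively.

```python
def scope_score(file_weights, new_endpoints=0, migrations=0,
                integrations=0, cross_team=False):
    """Hypothetical scope heuristic.

    file_weights: one weight per modified file, scaled by criticality
                  (e.g. 1.0 for a normal file, 2.0 for a critical one).
    """
    points = sum(file_weights) * 0.5      # files modified, weighted
    points += new_endpoints * 2           # new APIs or endpoints
    points += migrations * 3              # database migrations
    points += integrations * 3            # external service integrations
    if cross_team:
        points += 2                       # cross-team coordination
    return min(20, round(points))         # Scope is capped at 20
```

A three-file change with one new endpoint and one migration lands in the 8-12 band, matching the MID anchored example below.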
Anchored Examples
| Level | Score | Example |
|---|---|---|
| LOW | 2 | Typo fix. A single-character correction in a user-facing string. One file, no logic changes, no tests affected. |
| MID | 10 | Archive User feature. Adds a soft-delete mechanism touching the user model, API controller, service layer, and UI component. Multiple files across one domain. |
| HIGH | 19 | Notifications service. A new service with its own database tables, API endpoints, queue consumers, and integration points across the existing user and billing systems. |
Architecture (0-20 points)
Architecture measures structural impact. Did the PR introduce new abstractions? Change the dependency graph? Establish patterns that future code will follow?
| Score | What It Looks Like |
|---|---|
| 0 | No architectural changes — bug fixes or feature additions within existing patterns. |
| 1-5 | Minimal impact. Internal reorganization, extracting a helper. |
| 6-10 | Internal refactoring with improved structure. New interfaces or abstractions within a module. |
| 11-15 | New dependencies, service boundaries, or module abstractions. |
| 16-20 | Major architectural shifts. Event-driven design, dependency overhaul, new system-wide patterns. |
A score of 0 is common and perfectly fine — most PRs work within existing architecture. High scores are reserved for work that genuinely changes how the system is structured. The AI considers service dependencies, critical path changes, decoupling improvements, and design pattern introductions.
Anchored Examples
| Level | Score | Example |
|---|---|---|
| LOW | 2 | Extract helper function. Pulls a repeated block of validation logic into a shared utility. No new abstractions, no dependency changes. |
| MID | 9 | Strategy Pattern refactor. Replaces a switch statement with a strategy pattern, introducing an interface and multiple implementations. Changes the internal structure of one module without affecting external contracts. |
| HIGH | 17 | Event-driven migration. Converts synchronous service-to-service calls into an event-driven architecture with a message bus, event schemas, and consumer registration. Fundamentally changes how components communicate. |
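The MID example above (a switch statement replaced by a strategy pattern) is the kind of refactor that scores mid-range here. A minimal sketch, with an invented pricing domain standing in for the real module:

```python
from abc import ABC, abstractmethod

class PricingStrategy(ABC):
    """The new interface introduced by the refactor."""
    @abstractmethod
    def price(self, amount: float) -> float: ...

class Standard(PricingStrategy):
    def price(self, amount: float) -> float:
        return amount

class Discounted(PricingStrategy):
    def price(self, amount: float) -> float:
        return amount * 0.9

# Registry of implementations replaces the old branching logic.
STRATEGIES = {"standard": Standard(), "discounted": Discounted()}

def checkout(kind: str, amount: float) -> float:
    # Previously: if kind == "standard": ... elif kind == "discounted": ...
    return STRATEGIES[kind].price(amount)
```

Note that the external contract (`checkout`) is unchanged, which is why this scores mid rather than high: the structure of one module improved without touching system boundaries.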
Implementation (0-20 points)
Implementation measures algorithmic and logic complexity. How sophisticated is the business logic? Does the code handle concurrency, complex state machines, or intricate data transformations?
| Score | What It Looks Like |
|---|---|
| 0-5 | Simple CRUD, text changes, configuration. Straightforward data mapping. |
| 6-10 | Business rules with branching logic. Validation with edge cases. |
| 11-15 | Complex algorithms, concurrency, advanced patterns, batch processing. |
| 16-20 | Performance-critical code, complex state management, distributed systems logic. |
Anchored Examples
| Level | Score | Example |
|---|---|---|
| LOW | 3 | Add API fields. Adds two new fields to an existing API response by updating the DTO, entity mapping, and serializer. No branching logic, no new business rules. |
| MID | 8 | Discount calculation engine. Implements stacking discount rules with priority ordering, mutual exclusion logic, and currency-aware rounding. Multiple edge cases around percentage vs. fixed discounts. |
| HIGH | 17 | Concurrent background processor. Builds a job processor with configurable concurrency, batching, retry logic with exponential backoff, dead letter queues, and graceful shutdown handling. |
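The retry-with-exponential-backoff behavior named in the HIGH example is one of the patterns that pushes a PR into this band. A minimal sketch of the delay schedule, with illustrative base and cap values:

```python
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Delay (in seconds) before each retry: base * 2**n, capped.

    The base delay and cap are illustrative defaults, not values
    prescribed by the rubric.
    """
    return [min(cap, base * (2 ** n)) for n in range(attempts)]
```

The cap keeps late retries from waiting arbitrarily long; a real processor would add jitter and route exhausted jobs to a dead letter queue.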
Risk (0-20 points)
Risk measures deployment and operational complexity. How dangerous is this change to deploy? What is the blast radius? How hard is it to roll back?
| Score | What It Looks Like |
|---|---|
| 0-5 | Easily reversible, backward-compatible. Low blast radius. |
| 6-10 | Some deployment complexity. Public API change, new external dependency. |
| 11-15 | Migration required. Breaking change with migration path. Moderate blast radius. |
| 16-20 | Core data model change. Multi-step deployment. High risk, hard to reverse. |
Risk factors that increase the score: database schema changes (+3 to +5), authentication/security changes (+4 to +6), external API changes (+3 to +5). Risk mitigations that decrease it: feature flags (-2), documented rollback plans (-2), a canary deployment strategy (-1).
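The adjustments above can be read as deltas applied to a base risk estimate. The sketch below assumes a single midpoint value for each documented range and a final clamp to the 0-20 dimension range; both choices are assumptions for illustration.

```python
# Midpoints chosen from the documented ranges (assumption for the sketch).
RISK_FACTORS = {
    "schema_change": 4,         # documented range: +3 to +5
    "auth_security_change": 5,  # documented range: +4 to +6
    "external_api_change": 4,   # documented range: +3 to +5
}
RISK_MITIGATIONS = {
    "feature_flag": 2,
    "rollback_plan": 2,
    "canary_deploy": 1,
}

def risk_score(base: int, factors=(), mitigations=()) -> int:
    """Apply risk factors and mitigations, clamped to the 0-20 range."""
    score = base
    score += sum(RISK_FACTORS[f] for f in factors)
    score -= sum(RISK_MITIGATIONS[m] for m in mitigations)
    return max(0, min(20, score))
```

A moderate change with a schema migration behind a feature flag, for example, nets +2 over its base estimate.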
Anchored Examples
| Level | Score | Example |
|---|---|---|
| LOW | 3 | Optional field addition. Adds a nullable column to a non-critical table with a default value. Fully backward-compatible, no migration coordination needed. |
| MID | 9 | Change validation rules. Tightens validation on a public API endpoint, rejecting previously accepted input. Requires coordinating with API consumers and a deprecation notice. |
| HIGH | 17 | Core table migration. Restructures the primary users table — splitting into two tables, migrating data, updating all foreign keys, and deploying in a multi-step sequence to avoid downtime. |
Quality (0-15 points)
Quality measures the craftsmanship of the change. Test coverage, documentation, code clarity, and maintainability.
This dimension is intentionally capped at 15 rather than 20. Quality matters, but the rubric is weighted toward the complexity of the work itself.
| Score | What It Looks Like |
|---|---|
| 0-3 | No tests. No documentation. Quick fix with tech debt. |
| 4-7 | Unit tests for the happy path. Basic inline comments. |
| 8-11 | Comprehensive tests including edge cases. Integration tests. Clear documentation. |
| 12-15 | Exceptional coverage. E2E tests, contract tests, ADRs, load testing. |
The AI evaluates test coverage, edge case handling, integration and E2E tests, API documentation, and migration guides.
Anchored Examples
| Level | Score | Example |
|---|---|---|
| LOW | 2 | Null pointer hotfix. Fixes a crash with a single guard clause. No tests added, no documentation — the fix is self-evident. |
| MID | 9 | Invitation flow. Adds unit tests for the happy path and three error scenarios, integration tests for the email delivery path, and updates the API docs with the new endpoint. |
| HIGH | 14 | Payment processing. Comprehensive test suite covering happy paths, edge cases (currency rounding, partial refunds, idempotency), integration tests against a payment provider sandbox, contract tests for webhook signatures, and an ADR documenting the retry strategy. |
Performance & Security (0-5 points)
This dimension captures explicit optimization and hardening work. Not "the code runs fast" but "the engineer deliberately optimized performance or hardened security."
| Score | What It Looks Like |
|---|---|
| 0 | No explicit optimization. Framework defaults only. |
| 1-2 | Basic optimizations, security awareness, input validation. |
| 3-4 | Benchmarks, performance profiling, or security threat analysis. |
| 5 | Comprehensive: defense in depth, threat modeling, load testing, rate limiting, monitoring. |
Most PRs score 0-1 here, and that is fine. This dimension rewards engineers who go beyond the defaults when the situation demands it.
Anchored Examples
| Level | Score | Example |
|---|---|---|
| LOW | 1 | Basic endpoint. Adds a standard REST endpoint using framework defaults. Includes input validation via the framework's built-in decorators but no custom security or performance work. |
| MID | 3 | Product search. Implements search with database query optimization (indexing strategy, query plan analysis), pagination to limit result set sizes, and rate limiting on the endpoint. |
| HIGH | 5 | Secure file upload. Implements file upload with virus scanning, content-type validation, size limits, signed URLs with expiry, encryption at rest, access logging, rate limiting per user, and load testing to validate throughput under concurrent uploads. |
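One ingredient from the HIGH example, signed URLs with expiry, can be sketched with stdlib HMAC. The URL layout, parameter names, and key handling here are assumptions for the sketch, not a production design:

```python
import hashlib
import hmac
import time

def sign_url(path: str, secret: bytes, ttl: int = 300, now=None) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to a path."""
    expires = int(now if now is not None else time.time()) + ttl
    msg = f"{path}?expires={expires}".encode()
    sig = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_url(url: str, secret: bytes, now=None) -> bool:
    """Check the signature and reject expired URLs."""
    path_q, sig = url.rsplit("&sig=", 1)
    expected = hmac.new(secret, path_q.encode(), hashlib.sha256).hexdigest()
    expires = int(path_q.rsplit("expires=", 1)[1])
    fresh = (now if now is not None else time.time()) < expires
    # compare_digest avoids timing side channels on the signature check.
    return fresh and hmac.compare_digest(sig, expected)
```

Deliberate choices like the constant-time comparison and the expiry check are exactly the kind of explicit hardening this dimension rewards.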
How the Dimensions Work Together
The six dimensions are independent: a PR can score high on Implementation but zero on Architecture, or high on Risk but low on Scope. This independence matters because the score then reflects the specific type of complexity in each change.
A typical distribution for a mid-range feature PR might look like:
| Dimension | Score |
|---|---|
| Scope | 10 |
| Architecture | 4 |
| Implementation | 8 |
| Risk | 6 |
| Quality | 7 |
| Performance & Security | 1 |
| Base Score | 36 |
This base score then gets multiplied by the Effort Scale Factor to produce the final score.
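The final step is a single multiplication. The Effort Scale Factor multiplier comes from the text; the factor values and rounding shown here are purely illustrative.

```python
def final_score(base_score: float, effort_scale_factor: float) -> float:
    """Final score = base score x Effort Scale Factor.

    The one-decimal rounding is an assumption for the sketch.
    """
    return round(base_score * effort_scale_factor, 1)
```

With the sample distribution above, a neutral factor of 1.0 leaves the score at 36.0, while a hypothetical factor of 0.5 would halve it to 18.0.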