# Score Examples

This page walks through five realistic PR scenarios at different score levels. Each example shows the full calculation: per-dimension base score breakdown, ESF tier determination, and final score.

These examples demonstrate how the scoring system handles different types of engineering work, from routine maintenance to complex system design.

## Example 1: Environment Variable Update

**PR:** "Add feature flag for beta dashboard". 3 lines changed across 2 files. Adds a new environment variable to the config file and references it in the feature gate utility.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 1 | Two files in the same config subsystem. Minimal breadth. |
| Architecture | 0 | No structural changes. Works within existing feature flag pattern. |
| Implementation | 1 | Simple boolean check. No branching logic or business rules. |
| Risk | 2 | Feature flag is off by default. Easily reversible. Low blast radius. |
| Quality | 1 | No tests needed for config addition. Self-documenting change. |
| Performance & Security | 0 | No optimization or security work. |
| **Base Score** | **5** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 3 | Nano (1-10) |
| Files changed | 2 | Nano (1-10) |
| Gap check | Nano equals Nano | No bump |
| **ESF** | **0.10x** | |

### Final Score

5 x 0.10 = 0.5 (rounds to 1)

This is a routine configuration change. The low score does not mean the work was unimportant — feature flags are valuable — it means the change was small and simple.
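The ESF lookup and rounding used in these examples can be sketched in Python. This is an illustrative sketch, not the product's actual code: the tier boundaries and multipliers are taken from the five examples on this page (no example uses the Micro multiplier, so it is left unspecified), and the gap-check bump is omitted because the files tier never exceeds the lines tier here. Example 1 also implies half-up rounding (0.5 rounds to 1), which Python's default `round()` does not do, hence the explicit `floor(x + 0.5)`.

```python
import math

# Line-count tiers and ESF multipliers as they appear in these examples.
# The Micro multiplier never appears on this page, so it is marked None.
TIERS = [
    ("Nano", 10, 0.10),
    ("Micro", 50, None),
    ("Small", 150, 0.40),
    ("Medium", 400, 0.60),
    ("Large", 800, 0.80),
    ("XL", math.inf, 1.00),
]


def esf_for_lines(lines_changed: int) -> float:
    """Return the ESF multiplier for a line count (gap-check bump omitted)."""
    for name, upper, multiplier in TIERS:
        if lines_changed <= upper:
            if multiplier is None:
                raise ValueError(f"{name} multiplier is not listed on this page")
            return multiplier
    raise ValueError("line count must be positive")


def final_score(base_score: int, lines_changed: int) -> int:
    # Example 1 rounds 0.5 up to 1, so round half up rather than
    # using Python's round-half-to-even.
    return math.floor(base_score * esf_for_lines(lines_changed) + 0.5)


print(final_score(5, 3))  # Example 1: 5 x 0.10 = 0.5, rounds to 1
```

Running the same function against the other four examples reproduces the final scores in the Summary Table at the bottom of this page.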

## Example 2: Bug Fix with Test Coverage

**PR:** "Fix timezone offset in weekly report aggregation". 85 lines changed across 4 files. Corrects a bug where weekly velocity reports used UTC midnight instead of the organization's configured timezone, causing reports to include or exclude PRs at the wrong day boundary.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 5 | Four files: report service, timezone utility, and two test files. Localized to the reporting subsystem. |
| Architecture | 0 | No structural changes. Fix is within existing patterns. |
| Implementation | 7 | Timezone arithmetic with edge cases around DST transitions and week boundary calculations. Non-trivial logic. |
| Risk | 5 | Changes report output for all organizations. Backward-compatible but affects data accuracy. |
| Quality | 8 | Unit tests for DST transitions, week boundaries, and multiple timezone configurations. Integration test for the full report generation path. |
| Performance & Security | 0 | No optimization work. |
| **Base Score** | **25** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 85 | Small (51-150) |
| Files changed | 4 | Nano (1-10) |
| Gap check | Nano is below Small | No bump |
| **ESF** | **0.40x** | |

### Final Score

25 x 0.40 = 10.0

A focused bug fix with solid test coverage. The score reflects meaningful implementation complexity (timezone logic is genuinely tricky) at a contained scope.
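The bug class in this example is easy to reproduce. A minimal sketch, using Python's standard `zoneinfo` and a hypothetical merge timestamp, of how a UTC day boundary and an organization's local day boundary can disagree for the same PR (2024-03-10 is also a US DST transition day, one of the edge cases the tests above cover):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A PR merged late in the evening; organization timezone America/New_York.
merged_at = datetime(2024, 3, 10, 23, 30, tzinfo=ZoneInfo("America/New_York"))

utc_day = merged_at.astimezone(timezone.utc).date()  # 2024-03-11
local_day = merged_at.date()                         # 2024-03-10

# Bucketing by UTC midnight puts this PR in the wrong report day.
print(utc_day != local_day)  # True
```

Any PR merged between the organization's midnight and UTC midnight lands in the wrong weekly bucket, which is exactly the data-accuracy risk scored above.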

## Example 3: Multi-Component Feature

**PR:** "Add team comparison view with date range filtering". 340 lines changed across 12 files. Adds a new page where engineering managers can compare velocity metrics across two teams over a configurable date range. Touches the frontend (new page component, chart components, API hooks), the backend (new query endpoint with aggregation logic), and shared type definitions.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 12 | Twelve files spanning frontend components, API hooks, backend controller, service, DTO, and shared types. Crosses the frontend/backend boundary. |
| Architecture | 5 | Introduces a new comparison data model and aggregation query pattern, but works within existing module structure. No new services or dependencies. |
| Implementation | 10 | Date range aggregation with GROUP BY logic, percentage calculations, null handling for teams with no data in a period, and chart data transformation. Moderate algorithmic complexity. |
| Risk | 7 | New API endpoint with query parameters. No database migration, but changes to how metrics are aggregated could surface data inconsistencies. |
| Quality | 9 | Unit tests for aggregation logic and edge cases (empty date ranges, single-team comparison). Integration test for the API endpoint. Frontend component tests for the chart rendering. |
| Performance & Security | 2 | Adds database index for the date range query. Pagination on the API response. |
| **Base Score** | **45** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 340 | Medium (151-400) |
| Files changed | 12 | Micro (11-50) |
| Gap check | Micro is below Medium | No bump |
| **ESF** | **0.60x** | |

### Final Score

45 x 0.60 = 27.0

A solid multi-component feature that crosses architectural boundaries. The score sits in the 16-30 range, which is typical for focused features that touch multiple layers of the stack.
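A minimal sketch of the kind of aggregation this endpoint performs, with hypothetical field names: group merged PRs by team over a date range, and return `None` for a team with no data in the period, which is the null-handling edge case scored above.

```python
from collections import defaultdict
from datetime import date


def team_velocity(prs, teams, start, end):
    """Average PR score per team over [start, end]; None when a team has no PRs."""
    totals = defaultdict(lambda: [0, 0])  # team -> [score_sum, pr_count]
    for pr in prs:
        if start <= pr["merged_on"] <= end:
            bucket = totals[pr["team"]]
            bucket[0] += pr["score"]
            bucket[1] += 1
    return {
        team: (totals[team][0] / totals[team][1] if totals[team][1] else None)
        for team in teams
    }


prs = [
    {"team": "platform", "score": 27, "merged_on": date(2024, 5, 6)},
    {"team": "platform", "score": 10, "merged_on": date(2024, 5, 8)},
]
print(team_velocity(prs, ["platform", "mobile"], date(2024, 5, 1), date(2024, 5, 31)))
# {'platform': 18.5, 'mobile': None}
```

In the real endpoint this grouping would happen in SQL (the GROUP BY mentioned above); the Python version just makes the empty-team behavior explicit.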

## Example 4: Complex Integration with Migration

**PR:** "Implement GitHub webhook handler for real-time PR scoring". 620 lines changed across 18 files. Replaces the polling-based PR ingestion with a webhook-driven approach. Includes a new webhook controller with signature verification, an event processing queue, a database migration to add webhook tracking tables, retry logic for failed score computations, and an admin endpoint to view webhook delivery status.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 16 | Eighteen files across the webhook controller, queue processor, database migration, admin API, configuration, and tests. Touches ingestion, scoring, and admin subsystems. |
| Architecture | 12 | Introduces event-driven ingestion pattern alongside existing polling. New queue abstraction for async processing. Changes how PRs enter the scoring pipeline. |
| Implementation | 15 | Webhook signature verification (HMAC-SHA256), idempotent event processing, queue consumer with concurrency control, retry with exponential backoff, and state machine for webhook delivery tracking. |
| Risk | 14 | Database migration adds new tables and a foreign key to the existing PRs table. Changes the critical PR ingestion path. Requires coordinated deployment (webhook registration, then code deploy). Rollback requires reverting the migration. |
| Quality | 10 | Unit tests for signature verification, event deduplication, and retry logic. Integration tests for the webhook endpoint and queue processing. Migration rollback tested. |
| Performance & Security | 4 | HMAC signature verification on all incoming webhooks. Rate limiting per organization. Queue processing benchmarked for throughput. Webhook secret rotation support. |
| **Base Score** | **71** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 620 | Large (401-800) |
| Files changed | 18 | Micro (11-50) |
| Gap check | Micro is below Large | No bump |
| **ESF** | **0.80x** | |

### Final Score

71 x 0.80 = 56.8 (rounds to 57)

A complex integration that introduces new architectural patterns and touches a critical system path. The score in the 51-75 range reflects the combination of high base complexity and substantial implementation effort.
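The signature check scored above follows GitHub's documented webhook scheme: the `X-Hub-Signature-256` header carries an HMAC-SHA256 of the raw request body, keyed with the webhook secret. A minimal sketch of the verification step (the controller wiring around it is assumed, and the secret and body here are made up):

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many prefix bytes matched.
    return hmac.compare_digest(expected, signature_header)


secret = b"webhook-secret"
body = b'{"action": "closed"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook_signature(secret, body, header))  # True
```

Verifying against the raw bytes (before any JSON parsing) matters: re-serializing the payload can change whitespace or key order and break the digest.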

## Example 5: Large-Scale System Implementation

**PR:** "Add organization-level velocity benchmarking engine". 1,150 lines changed across 26 files. Builds a benchmarking engine that computes statistical velocity benchmarks (percentiles, trends, standard deviations) across all teams in an organization. Includes a new database schema for benchmark snapshots, a background computation job, a caching layer, REST API endpoints, and a frontend dashboard with interactive charts.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 18 | Twenty-six files spanning a new database migration, entity definitions, a background job module, caching service integration, API controller and DTOs, frontend page with multiple chart components, and shared types. Crosses every layer of the system. |
| Architecture | 14 | New benchmark computation module with its own service, repository, and caching layer. Introduces a snapshot pattern for time-series benchmark data. New background job scheduling pattern. |
| Implementation | 17 | Statistical calculations (percentile computation across variable-size datasets, trend detection with linear regression, outlier filtering). Background job with incremental computation to avoid reprocessing. Cache invalidation strategy tied to data freshness. |
| Risk | 13 | New database tables with indexes on large datasets. Background job interacts with production data. Cache consistency must be maintained across deployments. No breaking changes to existing APIs. |
| Quality | 12 | Comprehensive unit tests for statistical functions with known-answer test vectors. Integration tests for the background job and caching layer. API endpoint tests with fixture data. Frontend component tests for chart rendering with edge cases (empty data, single data point). |
| Performance & Security | 4 | Database query optimization with materialized aggregations. Cache warming strategy. Background job resource limits to prevent database contention. API response pagination. |
| **Base Score** | **78** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 1,150 | XL (801+) |
| Files changed | 26 | Micro (11-50) |
| Gap check | Micro is below XL | No bump |
| **ESF** | **1.00x** | |

### Final Score

78 x 1.00 = 78.0

A large-scale implementation that introduces a new subsystem. The XL ESF (1.00x) means the base score passes through unmodified, and the high base score reflects genuine complexity across all six dimensions.

## Summary Table

| Example | PR Description | Lines | Files | Base Score | ESF | Final Score |
|---|---|---|---|---|---|---|
| 1 | Environment variable update | 3 | 2 | 5 | 0.10x | 1 |
| 2 | Timezone bug fix | 85 | 4 | 25 | 0.40x | 10 |
| 3 | Team comparison feature | 340 | 12 | 45 | 0.60x | 27 |
| 4 | Webhook integration | 620 | 18 | 71 | 0.80x | 57 |
| 5 | Benchmarking engine | 1,150 | 26 | 78 | 1.00x | 78 |

The progression illustrates how both dimensions of the scoring system work together. Base scores increase with complexity, and ESF tiers increase with effort. The final score captures the full picture: what the change does and how much work it took to do it.
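As a closing check, the summary rows can be recomputed directly from each example's base score and ESF multiplier. Half-up rounding is assumed here, since Example 1 rounds 0.5 up to 1:

```python
import math

# (description, base score, ESF multiplier) copied from the summary above
rows = [
    ("Environment variable update", 5, 0.10),
    ("Timezone bug fix", 25, 0.40),
    ("Team comparison feature", 45, 0.60),
    ("Webhook integration", 71, 0.80),
    ("Benchmarking engine", 78, 1.00),
]

for name, base, esf in rows:
    raw = base * esf
    final = math.floor(raw + 0.5)  # half-up rounding
    print(f"{name}: {base} x {esf:.2f} = {raw:.1f} -> {final}")
```

Every computed final matches the table, including the two rows where rounding matters (0.5 up to 1, and 56.8 up to 57).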