# Score Examples

This page walks through five realistic PR scenarios at different score levels. Each example shows the full calculation: per-dimension base score breakdown, ESF tier determination, and final score.

These examples demonstrate how the scoring system handles different types of engineering work, from routine maintenance to complex system design.

## Example 1: Environment Variable Update

**PR:** "Add feature flag for beta dashboard". 3 lines changed across 2 files. Adds a new environment variable to the config file and references it in the feature gate utility.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 1 | Two files in the same config subsystem. Minimal breadth. |
| Architecture | 0 | No structural changes. Works within existing feature flag pattern. |
| Implementation | 1 | Simple boolean check. No branching logic or business rules. |
| Risk | 2 | Feature flag is off by default. Easily reversible. Low blast radius. |
| Quality | 1 | No tests needed for config addition. Self-documenting change. |
| Performance & Security | 0 | No optimization or security work. |
| **Base Score** | **5** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 3 | Nano (1-10) |
| Files changed | 2 | Nano (1-10) |
| Gap check | Nano equals Nano | No bump |
| **ESF** | **0.10x** | |

### Final Score

5 x 0.10 = 0.5 (rounds to 1)

This is a routine configuration change. The low score does not mean the work was unimportant — feature flags are valuable — it means the change was small and simple.
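The ESF lookup and rounding used in these examples can be sketched in Python. This is an illustrative sketch, not the product's actual code: the tier boundaries and multipliers are taken from the five examples on this page (no example uses the Micro multiplier, so it is left unspecified), and the gap-check bump is omitted because the files tier never exceeds the lines tier here. Example 1 also implies half-up rounding (0.5 rounds to 1), which Python's default `round()` does not do, hence the explicit `floor(x + 0.5)`.

```python
import math

# Line-count tiers and ESF multipliers as they appear in these examples.
# The Micro multiplier never appears on this page, so it is marked None.
TIERS = [
    ("Nano", 10, 0.10),
    ("Micro", 50, None),
    ("Small", 150, 0.40),
    ("Medium", 400, 0.60),
    ("Large", 800, 0.80),
    ("XL", math.inf, 1.00),
]


def esf_for_lines(lines_changed: int) -> float:
    """Return the ESF multiplier for a line count (gap-check bump omitted)."""
    for name, upper, multiplier in TIERS:
        if lines_changed <= upper:
            if multiplier is None:
                raise ValueError(f"{name} multiplier is not listed on this page")
            return multiplier
    raise ValueError("line count must be positive")


def final_score(base_score: int, lines_changed: int) -> int:
    # Example 1 rounds 0.5 up to 1, so round half up rather than
    # using Python's round-half-to-even.
    return math.floor(base_score * esf_for_lines(lines_changed) + 0.5)


print(final_score(5, 3))  # Example 1: 5 x 0.10 = 0.5, rounds to 1
```

Running the same function against the other four examples reproduces the final scores in the Summary Table at the bottom of this page.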

## Example 2: Bug Fix with Test Coverage

**PR:** "Fix timezone offset in weekly report aggregation". 85 lines changed across 4 files. Corrects a bug where weekly velocity reports used UTC midnight instead of the organization's configured timezone, causing reports to include or exclude PRs at the wrong day boundary.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 5 | Four files: report service, timezone utility, and two test files. Localized to the reporting subsystem. |
| Architecture | 0 | No structural changes. Fix is within existing patterns. |
| Implementation | 7 | Timezone arithmetic with edge cases around DST transitions and week boundary calculations. Non-trivial logic. |
| Risk | 5 | Changes report output for all organizations. Backward-compatible but affects data accuracy. |
| Quality | 8 | Unit tests for DST transitions, week boundaries, and multiple timezone configurations. Integration test for the full report generation path. |
| Performance & Security | 0 | No optimization work. |
| **Base Score** | **25** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 85 | Small (51-150) |
| Files changed | 4 | Nano (1-10) |
| Gap check | Nano is below Small | No bump |
| **ESF** | **0.40x** | |

### Final Score

25 x 0.40 = 10.0

A focused bug fix with solid test coverage. The score reflects meaningful implementation complexity (timezone logic is genuinely tricky) at a contained scope.
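The bug class in this example is easy to reproduce. A minimal sketch, using Python's standard `zoneinfo` and a hypothetical merge timestamp, of how a UTC day boundary and an organization's local day boundary can disagree for the same PR (2024-03-10 is also a US DST transition day, one of the edge cases the tests above cover):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A PR merged late in the evening; organization timezone America/New_York.
merged_at = datetime(2024, 3, 10, 23, 30, tzinfo=ZoneInfo("America/New_York"))

utc_day = merged_at.astimezone(timezone.utc).date()  # 2024-03-11
local_day = merged_at.date()                         # 2024-03-10

# Bucketing by UTC midnight puts this PR in the wrong report day.
print(utc_day != local_day)  # True
```

Any PR merged between the organization's midnight and UTC midnight lands in the wrong weekly bucket, which is exactly the data-accuracy risk scored above.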

## Example 3: Multi-Component Feature

**PR:** "Add team comparison view with date range filtering". 340 lines changed across 12 files. Adds a new page where engineering managers can compare velocity metrics across two teams over a configurable date range. Touches the frontend (new page component, chart components, API hooks), the backend (new query endpoint with aggregation logic), and shared type definitions.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 12 | Twelve files spanning frontend components, API hooks, backend controller, service, DTO, and shared types. Crosses the frontend/backend boundary. |
| Architecture | 5 | Introduces a new comparison data model and aggregation query pattern, but works within existing module structure. No new services or dependencies. |
| Implementation | 10 | Date range aggregation with GROUP BY logic, percentage calculations, null handling for teams with no data in a period, and chart data transformation. Moderate algorithmic complexity. |
| Risk | 7 | New API endpoint with query parameters. No database migration, but changes to how metrics are aggregated could surface data inconsistencies. |
| Quality | 9 | Unit tests for aggregation logic and edge cases (empty date ranges, single-team comparison). Integration test for the API endpoint. Frontend component tests for the chart rendering. |
| Performance & Security | 2 | Adds database index for the date range query. Pagination on the API response. |
| **Base Score** | **45** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 340 | Medium (151-400) |
| Files changed | 12 | Micro (11-50) |
| Gap check | Micro is below Medium | No bump |
| **ESF** | **0.60x** | |

### Final Score

45 x 0.60 = 27.0

A solid multi-component feature that crosses architectural boundaries. The score sits in the 16-30 range, which is typical for focused features that touch multiple layers of the stack.
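A minimal sketch of the kind of aggregation this endpoint performs, with hypothetical field names: group merged PRs by team over a date range, and return `None` for a team with no data in the period, which is the null-handling edge case scored above.

```python
from collections import defaultdict
from datetime import date


def team_velocity(prs, teams, start, end):
    """Average PR score per team over [start, end]; None when a team has no PRs."""
    totals = defaultdict(lambda: [0, 0])  # team -> [score_sum, pr_count]
    for pr in prs:
        if start <= pr["merged_on"] <= end:
            bucket = totals[pr["team"]]
            bucket[0] += pr["score"]
            bucket[1] += 1
    return {
        team: (totals[team][0] / totals[team][1] if totals[team][1] else None)
        for team in teams
    }


prs = [
    {"team": "platform", "score": 27, "merged_on": date(2024, 5, 6)},
    {"team": "platform", "score": 10, "merged_on": date(2024, 5, 8)},
]
print(team_velocity(prs, ["platform", "mobile"], date(2024, 5, 1), date(2024, 5, 31)))
# {'platform': 18.5, 'mobile': None}
```

In the real endpoint this grouping would happen in SQL (the GROUP BY mentioned above); the Python version just makes the empty-team behavior explicit.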

## Example 4: Complex Integration with Migration

**PR:** "Implement GitHub webhook handler for real-time PR scoring". 620 lines changed across 18 files. Replaces the polling-based PR ingestion with a webhook-driven approach. Includes a new webhook controller with signature verification, an event processing queue, a database migration to add webhook tracking tables, retry logic for failed score computations, and an admin endpoint to view webhook delivery status.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 16 | Eighteen files across the webhook controller, queue processor, database migration, admin API, configuration, and tests. Touches ingestion, scoring, and admin subsystems. |
| Architecture | 12 | Introduces event-driven ingestion pattern alongside existing polling. New queue abstraction for async processing. Changes how PRs enter the scoring pipeline. |
| Implementation | 15 | Webhook signature verification (HMAC-SHA256), idempotent event processing, queue consumer with concurrency control, retry with exponential backoff, and state machine for webhook delivery tracking. |
| Risk | 14 | Database migration adds new tables and a foreign key to the existing PRs table. Changes the critical PR ingestion path. Requires coordinated deployment (webhook registration, then code deploy). Rollback requires reverting the migration. |
| Quality | 10 | Unit tests for signature verification, event deduplication, and retry logic. Integration tests for the webhook endpoint and queue processing. Migration rollback tested. |
| Performance & Security | 4 | HMAC signature verification on all incoming webhooks. Rate limiting per organization. Queue processing benchmarked for throughput. Webhook secret rotation support. |
| **Base Score** | **71** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 620 | Large (401-800) |
| Files changed | 18 | Micro (11-50) |
| Gap check | Micro is below Large | No bump |
| **ESF** | **0.80x** | |

### Final Score

71 x 0.80 = 56.8 (rounds to 57)

A complex integration that introduces new architectural patterns and touches a critical system path. The score in the 51-75 range reflects the combination of high base complexity and substantial implementation effort.
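The signature check scored above follows GitHub's documented webhook scheme: the `X-Hub-Signature-256` header carries an HMAC-SHA256 of the raw request body, keyed with the webhook secret. A minimal sketch of the verification step (the controller wiring around it is assumed, and the secret and body here are made up):

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many prefix bytes matched.
    return hmac.compare_digest(expected, signature_header)


secret = b"webhook-secret"
body = b'{"action": "closed"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook_signature(secret, body, header))  # True
```

Verifying against the raw bytes (before any JSON parsing) matters: re-serializing the payload can change whitespace or key order and break the digest.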

## Example 5: Large-Scale System Implementation

**PR:** "Add organization-level velocity benchmarking engine". 1,150 lines changed across 26 files. Builds a benchmarking engine that computes statistical velocity benchmarks (percentiles, trends, standard deviations) across all teams in an organization. Includes a new database schema for benchmark snapshots, a background computation job, a caching layer, REST API endpoints, and a frontend dashboard with interactive charts.

### Base Score Breakdown

| Dimension | Score | Reasoning |
|---|---|---|
| Scope | 18 | Twenty-six files spanning a new database migration, entity definitions, a background job module, caching service integration, API controller and DTOs, frontend page with multiple chart components, and shared types. Crosses every layer of the system. |
| Architecture | 14 | New benchmark computation module with its own service, repository, and caching layer. Introduces a snapshot pattern for time-series benchmark data. New background job scheduling pattern. |
| Implementation | 17 | Statistical calculations (percentile computation across variable-size datasets, trend detection with linear regression, outlier filtering). Background job with incremental computation to avoid reprocessing. Cache invalidation strategy tied to data freshness. |
| Risk | 13 | New database tables with indexes on large datasets. Background job interacts with production data. Cache consistency must be maintained across deployments. No breaking changes to existing APIs. |
| Quality | 12 | Comprehensive unit tests for statistical functions with known-answer test vectors. Integration tests for the background job and caching layer. API endpoint tests with fixture data. Frontend component tests for chart rendering with edge cases (empty data, single data point). |
| Performance & Security | 4 | Database query optimization with materialized aggregations. Cache warming strategy. Background job resource limits to prevent database contention. API response pagination. |
| **Base Score** | **78** | |

### ESF Calculation

| Step | Value | Result |
|---|---|---|
| Lines changed | 1,150 | XL (801+) |
| Files changed | 26 | Micro (11-50) |
| Gap check | Micro is below XL | No bump |
| **ESF** | **1.00x** | |

### Final Score

78 x 1.00 = 78.0

A large-scale implementation that introduces a new subsystem. The XL ESF (1.00x) means the base score passes through unmodified, and the high base score reflects genuine complexity across all six dimensions.

## Summary Table

| Example | PR Description | Lines | Files | Base Score | ESF | Final Score |
|---|---|---|---|---|---|---|
| 1 | Environment variable update | 3 | 2 | 5 | 0.10x | 1 |
| 2 | Timezone bug fix | 85 | 4 | 25 | 0.40x | 10 |
| 3 | Team comparison feature | 340 | 12 | 45 | 0.60x | 27 |
| 4 | Webhook integration | 620 | 18 | 71 | 0.80x | 57 |
| 5 | Benchmarking engine | 1,150 | 26 | 78 | 1.00x | 78 |

The progression illustrates how both dimensions of the scoring system work together. Base scores increase with complexity, and ESF tiers increase with effort. The final score captures the full picture: what the change does and how much work it took to do it.
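As a closing check, the summary rows can be recomputed directly from each example's base score and ESF multiplier. Half-up rounding is assumed here, since Example 1 rounds 0.5 up to 1:

```python
import math

# (description, base score, ESF multiplier) copied from the summary above
rows = [
    ("Environment variable update", 5, 0.10),
    ("Timezone bug fix", 25, 0.40),
    ("Team comparison feature", 45, 0.60),
    ("Webhook integration", 71, 0.80),
    ("Benchmarking engine", 78, 1.00),
]

for name, base, esf in rows:
    raw = base * esf
    final = math.floor(raw + 0.5)  # half-up rounding
    print(f"{name}: {base} x {esf:.2f} = {raw:.1f} -> {final}")
```

Every computed final matches the table, including the two rows where rounding matters (0.5 up to 1, and 56.8 up to 57).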