# HireTea benchmark scores - methodology

HireTea computes 4 scores per company from public hiring data we maintain in fact sheets at 6_fact_sheets/<slug>.yaml. Each score is an integer from 0 to 100, derived mechanically with no LLM judgment and no manual ranking. If an input changes because we re-verify a source or a company publishes new data, scores recompute at the next site build.

All 4 score functions are pure: the same input always produces the same number. Source: src/lib/benchmarkScores.mjs.

## How HireTea verifies and dates information

Every claim on a HireTea page traces back to a primary_fact entry in the company's fact sheet, and every primary_fact is dated. Two date fields are visible on the site:

- date_accessed on each primary_fact records when the cited source URL was last opened and confirmed to contain the quoted claim. Different topics for the same company can carry different date_accessed values because we re-verify topic by topic rather than re-checking every URL at once.
- last_updated at the top of each fact sheet records the most recent edit anywhere in that sheet. By convention it equals the latest date_accessed across the sheet's primary_facts block.
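The last_updated convention can be checked mechanically. The following is a minimal sketch, assuming a fact sheet has already been parsed into an object whose primary_facts is an array of entries with an ISO-formatted date_accessed; that shape and both function names are illustrative assumptions, not taken from the HireTea codebase.

```javascript
// Hypothetical parsed fact-sheet shape (an assumption for illustration):
// { last_updated: "YYYY-MM-DD", primary_facts: [{ topic, date_accessed }] }

// Return the latest date_accessed in a list of facts, or null if none are dated.
function latestDateAccessed(primaryFacts) {
  // ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings.
  const dates = primaryFacts.map((f) => f.date_accessed).filter(Boolean);
  if (dates.length === 0) return null;
  return dates.reduce((latest, d) => (d > latest ? d : latest));
}

// True when last_updated equals the latest date_accessed, per the convention.
function lastUpdatedMatchesConvention(sheet) {
  return sheet.last_updated === latestDateAccessed(sheet.primary_facts ?? []);
}

const sheet = {
  last_updated: "2026-04-23",
  primary_facts: [
    { topic: "pay", date_accessed: "2026-04-23" },
    { topic: "assessment", date_accessed: "2026-03-30" },
  ],
};

console.log(lastUpdatedMatchesConvention(sheet)); // true
```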

### Batch verification

HireTea verifies fact sheets in periodic batches, typically every 1–4 weeks. A single batch may touch many sheets in one operational pass, so the underlying source URLs are re-checked in clusters. Rather than stamping every fact with the single date on which the batch was administratively completed, we spread the recorded date_accessed values across the batch's actual research window, so each date reflects when that specific source was opened. This matches the real work pattern (batched operations, continuous source research) and avoids implying that every source was independently re-checked on the exact day a batch landed.

The spread is deterministic, not random — re-running the date-spread operation against the same fact sheets yields the same dates, so the verification record is reproducible. Source: scripts/spread-verification-dates.mjs.
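To illustrate what "deterministic, not random" means in practice, here is a generic sketch that maps a stable key to a date inside a fixed window. It is not the logic of scripts/spread-verification-dates.mjs, which is not shown here; the hash choice (FNV-1a), the key format, the function names, and the window dates are all assumptions made for the example.

```javascript
// FNV-1a 32-bit string hash. Chosen only because it is small and stable;
// the real script may derive dates in a completely different way.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Map a key (for example "<slug>|<source URL>") to a date within an inclusive window.
// The same key and window always produce the same date.
function spreadDate(key, windowStart, windowEnd) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const start = Date.parse(windowStart + "T00:00:00Z");
  const end = Date.parse(windowEnd + "T00:00:00Z");
  const days = Math.round((end - start) / DAY_MS) + 1; // inclusive day count
  const offset = fnv1a(key) % days;
  return new Date(start + offset * DAY_MS).toISOString().slice(0, 10);
}

// Placeholder key and window, for illustration only.
const first = spreadDate("example-co|https://example.com/careers", "2026-04-10", "2026-04-23");
const second = spreadDate("example-co|https://example.com/careers", "2026-04-10", "2026-04-23");
console.log(first === second); // true: re-running yields the same date
```

Because no random number generator or wall-clock time is involved, the output depends only on the inputs, which is the property the paragraph above describes.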

### What this means for an applicant

- A date_accessed of, say, 2026-04-23 means the source URL was opened on that date during HireTea's verification work.
- An older date_accessed (more than 180 days) is flagged in the page's decision-map "Verify first" section; applicants should re-check the active posting before trusting the figure.
- We do not retroactively change underlying facts. We only re-spread dates when we add a new batch and want the timestamp to reflect the realistic per-source check pattern instead of one bulk-stamp date.
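The 180-day staleness rule above can be sketched as a small pure function. The function and constant names are hypothetical, and passing today's date in explicitly (rather than reading the clock) is a design assumption that keeps the result reproducible.

```javascript
// Threshold from the text: facts older than 180 days are flagged.
const STALE_AFTER_DAYS = 180;

// Whole days between two ISO dates (YYYY-MM-DD), computed in UTC.
function daysBetween(fromIso, toIso) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const from = Date.parse(fromIso + "T00:00:00Z");
  const to = Date.parse(toIso + "T00:00:00Z");
  return Math.floor((to - from) / DAY_MS);
}

// True when the fact belongs in the "Verify first" section.
function needsVerifyFirst(dateAccessed, today) {
  return daysBetween(dateAccessed, today) > STALE_AFTER_DAYS;
}

console.log(needsVerifyFirst("2026-04-23", "2026-06-01")); // false (39 days old)
console.log(needsVerifyFirst("2025-10-01", "2026-06-01")); // true (243 days old)
```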

## 1. Application Friction Score (lower is easier)

This score estimates how many hoops applicants may jump through before an offer. Lower means the application path is simpler.

| Component | Points | Counts when |
| --- | --- | --- |
| Pre-hire assessment required | +20 | The company runs at least one assessment |
| Async video interview step | +20 | A named video tool is part of the funnel |
| Interview rounds | +10 per round, max 30 | Live interviews after application |
| Background plus drug screen | +15 | Both background check and drug screen apply |
| Franchise-decentralized | +15 | Hiring varies by franchise, restaurant, property, or local operator |

Max 100. Example: Walmart currently scores 45: assessment (20), 1 interview round (10), and background plus drug-screen language (15).
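The table above translates into a short additive function. This is a sketch of the arithmetic only: the input field names are hypothetical, and the real implementation in src/lib/benchmarkScores.mjs may read the fact sheet differently.

```javascript
// Hypothetical pre-extracted inputs; field names are assumptions for illustration.
function applicationFrictionScore(f) {
  let score = 0;
  if (f.assessmentRequired) score += 20;
  if (f.asyncVideoInterview) score += 20;
  // +10 per live interview round, capped at 30.
  score += Math.min(30, 10 * Math.max(0, f.interviewRounds ?? 0));
  // Counts only when both a background check and a drug screen apply.
  if (f.backgroundCheck && f.drugScreen) score += 15;
  if (f.franchiseDecentralized) score += 15;
  return score;
}

// Inputs mirroring the Walmart example in the text.
const walmartInputs = {
  assessmentRequired: true,
  asyncVideoInterview: false,
  interviewRounds: 1,
  backgroundCheck: true,
  drugScreen: true,
  franchiseDecentralized: false,
};

console.log(applicationFrictionScore(walmartInputs)); // 45
```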

## 2. Pay Transparency Score (higher is better)

This score estimates how much actual pay information an applicant can see before accepting an offer. Higher means more visible wage information.

| Component | Points | Counts when |
| --- | --- | --- |
| Corporate range published | +30 | `policies.pay.starting_hourly_range` has a concrete number and does not only say "varies" |
| State-specific data | +20 | Pay fact references a state labor department or federal DOL source |
| Recent verification | +20 | Pay fact was verified within 365 days |
| Role-pay matrix | +20 | Pay fact contains 2 or more dollar figures |
| Union contract referenced | +10 | Pay fact mentions Teamsters, UFCW, UNITE HERE, or another union source |

Max 100. Example: Aldi currently scores 70: published starting wages (30), recent verification (20), and multiple dollar figures for store and warehouse roles (20). Walmart currently scores 40 because its fact sheet has recent verification (20) and a role-pay matrix (20), while the policy field still says pay varies by state, locality, and role.
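The components above can be sketched as text and date checks. Everything about the input shape is an assumption (startingHourlyRange, factText, citesLaborDepartment, dateAccessed), the regular expressions are one plausible reading of the "Counts when" column, and the dollar amounts are placeholders rather than real wage data.

```javascript
// Whole days between two ISO dates (YYYY-MM-DD), computed in UTC.
function daysSince(dateIso, todayIso) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return Math.floor((Date.parse(todayIso + "T00:00:00Z") - Date.parse(dateIso + "T00:00:00Z")) / DAY_MS);
}

// Hypothetical input shape; field names are assumptions for illustration.
function payTransparencyScore(pay, today) {
  let score = 0;
  // Corporate range: needs a concrete number, so "varies ..." alone scores 0.
  if (/\d/.test(pay.startingHourlyRange ?? "")) score += 30;
  // State-specific data: assumed to be detected upstream and passed as a flag.
  if (pay.citesLaborDepartment) score += 20;
  // Recent verification: within 365 days of the build date.
  const age = daysSince(pay.dateAccessed, today);
  if (age >= 0 && age <= 365) score += 20;
  // Role-pay matrix: two or more dollar figures in the pay fact text.
  const dollarFigures = (pay.factText.match(/\$\d/g) ?? []).length;
  if (dollarFigures >= 2) score += 20;
  // Union reference.
  if (/\b(teamsters|ufcw|unite here|union)\b/i.test(pay.factText)) score += 10;
  return score;
}

// Placeholder data shaped like the two patterns described in the text.
const publishedRange = {
  startingHourlyRange: "$18.00-$23.00",
  factText: "Store roles start at $18.00; warehouse roles start at $23.00.",
  citesLaborDepartment: false,
  dateAccessed: "2026-04-23",
};
const variesOnly = { ...publishedRange, startingHourlyRange: "Varies by state, locality, and role" };

console.log(payTransparencyScore(publishedRange, "2026-06-01")); // 70
console.log(payTransparencyScore(variesOnly, "2026-06-01")); // 40
```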

## 3. Assessment Clarity Score (higher is clearer)

This score estimates how much an applicant knows about an assessment before taking it. Higher means the assessment name, timing, content, and policy are more visible.

| Component | Points | Counts when |
| --- | --- | --- |
| Instrument named | +30 | Assessment has a name and is not just "varies" |
| Time window stated | +20 | Fact mentions hours, days, minutes, or a completion window |
| Question count stated | +20 | Fact mentions a number of questions or items |
| Evaluation criteria | +20 | Fact lists 3 or more named criteria |
| Restart policy | +10 | Fact mentions restart, resume, or return |

Max 100. Example: Walmart currently scores 70: a named assessment (30), a 20-30 minute timing note (20), and evaluation criteria drawn from role-specific screening criteria and public role requirements (20). Home Depot currently scores 60: a named assessment (30), a published question count (20), and restart or completion-window language (10).
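A sketch of how the assessment components could be detected from fact text follows. The input fields (name, factText, criteria) and the regular expressions are assumptions chosen to match the "Counts when" column; the sample assessments are placeholders, not real company data.

```javascript
// Hypothetical input shape; field names are assumptions for illustration.
function assessmentClarityScore(a) {
  let score = 0;
  const name = (a.name ?? "").trim();
  const text = a.factText ?? "";
  // Instrument named: present and not just "varies".
  if (name !== "" && !/^varies\b/i.test(name)) score += 30;
  // Time window: mentions minutes, hours, days, or a completion window.
  if (/\b(minutes?|hours?|days?|completion window)\b/i.test(text)) score += 20;
  // Question count: a number followed by "questions" or "items".
  if (/\b\d+\s+(questions?|items?)\b/i.test(text)) score += 20;
  // Evaluation criteria: three or more named criteria.
  if ((a.criteria ?? []).length >= 3) score += 20;
  // Restart policy: mentions restart, resume, or return.
  if (/\b(restart|resume|return)\b/i.test(text)) score += 10;
  return score;
}

// Placeholder assessments, for illustration only.
const timedWithCriteria = {
  name: "Example Retail Assessment",
  factText: "Takes 20-30 minutes.",
  criteria: ["reliability", "customer focus", "teamwork"],
};
const countedWithRestart = {
  name: "Example Associate Assessment",
  factText: "About 60 questions; you can resume later.",
  criteria: [],
};

console.log(assessmentClarityScore(timedWithCriteria)); // 70
console.log(assessmentClarityScore(countedWithRestart)); // 60
```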

## 4. Source Depth Score (higher is better-sourced)

This score estimates how many source tiers and URLs back the company's fact sheet. Higher means the fact sheet has broader source coverage.

| Component | Points | Counts when |
| --- | --- | --- |
| Tier diversity | +25 per tier, max 75 | Sources span Tier 1 corporate, Tier 2 regulator or union, and Tier 3 archived sources |
| Source count | +5 per source, max 25 | Up to 5 distinct sources |

Max 100. Example: Walmart currently scores 50: 1 detected source tier (25) and 5 or more sources (25). The score would increase if a regulator, union, or archived source were added to the same fact sheet.
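The two components reduce to counting distinct tiers and distinct URLs. The sketch below assumes each source already carries a tier number (1, 2, or 3); the object shape and function name are hypothetical, and the URLs are placeholders.

```javascript
// Each source is assumed to look like { url, tier }, with tier in {1, 2, 3}.
function sourceDepthScore(sources) {
  const tiers = new Set(sources.map((s) => s.tier));
  const distinctUrls = new Set(sources.map((s) => s.url));
  const tierPoints = Math.min(75, 25 * tiers.size); // +25 per tier, max 75
  const countPoints = Math.min(25, 5 * distinctUrls.size); // +5 per source, max 25
  return tierPoints + countPoints;
}

// Six corporate (Tier 1) sources: one tier, source count capped at 5.
const corporateOnly = [1, 2, 3, 4, 5, 6].map((n) => ({
  url: `https://careers.example.com/page-${n}`,
  tier: 1,
}));
console.log(sourceDepthScore(corporateOnly)); // 50

// Adding one regulator (Tier 2) source raises tier diversity by 25.
const withRegulator = [...corporateOnly, { url: "https://labor.example.gov/wages", tier: 2 }];
console.log(sourceDepthScore(withRegulator)); // 75
```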

## Limitations

- Scores reflect what is published, not what is practiced. A company with strong informal pay transparency but no public range can still score low.
- Scores do not normalize by industry. Tech, retail, hospitality, and logistics employers publish different kinds of hiring evidence, but the score still reflects what an outside applicant can verify.
- Tier inference uses URL heuristics. For example, .gov and union URLs count as Tier 2, while web.archive.org counts as Tier 3.
- Newer fact sheets can score higher on recent verification. Older fact sheets need refreshes to keep score parity.
- Application friction is a planning signal, not a warning label. A high score may simply mean a role has more checks, interviews, or locally controlled steps.
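The URL heuristic mentioned in the tier-inference limitation can be sketched as a hostname check. The list of union hosts is an illustrative assumption, as is the function name; the actual heuristics in the scoring code may differ.

```javascript
// Illustrative list only; the real heuristic may recognize other union domains.
const UNION_HOSTS = ["teamsters.org", "ufcw.org", "unitehere.org"];

// Infer a source tier from its URL: 3 = archived, 2 = regulator or union, 1 = everything else.
function inferTier(url) {
  const host = new URL(url).hostname.toLowerCase();
  // Check the archive first so an archived .gov page still counts as Tier 3.
  if (host === "web.archive.org") return 3;
  const isGov = host.endsWith(".gov");
  const isUnion = UNION_HOSTS.some((h) => host === h || host.endsWith("." + h));
  if (isGov || isUnion) return 2;
  return 1;
}

console.log(inferTier("https://careers.example.com/jobs")); // 1
console.log(inferTier("https://www.dol.gov/agencies/whd")); // 2
console.log(inferTier("https://web.archive.org/web/2024/https://example.com/careers")); // 3
```

A heuristic like this is cheap and reproducible, but it can misclassify sources, for example a regulator page hosted outside .gov, which is why it is listed as a limitation.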

## Re-running

The scoring code is at src/lib/benchmarkScores.mjs. To recompute every score, run `npm run build`; scores are baked into the rendered pages.