# HireTea benchmark scores - methodology
HireTea computes 4 scores per company from public hiring data we maintain in fact sheets at 6_fact_sheets/<slug>.yaml. Each score is an integer from 0 to 100, derived mechanically with no LLM judgment and no manual ranking. If an input changes because we re-verify a source or a company publishes new data, scores recompute at the next site build.
All 4 score functions are pure: the same input always produces the same number. Source: src/lib/benchmarkScores.mjs.
## How HireTea verifies and dates information
Every claim on a HireTea page traces back to a primary_fact entry in the company's fact sheet, and every primary_fact is dated. Two date fields are visible on the site:
- `date_accessed` on each `primary_fact` records when the cited source URL was last opened and confirmed to contain the quoted claim. Different topics for the same company can carry different `date_accessed` values because we re-verify topic by topic rather than re-checking every URL at once.
- `last_updated` at the top of each fact sheet records the most recent edit anywhere in that sheet. By convention it equals the latest `date_accessed` across the sheet's `primary_facts` block.
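The `last_updated` convention can be checked mechanically. The following is a minimal sketch, assuming the parsed fact sheet exposes `primary_facts` as an array of objects with ISO `YYYY-MM-DD` date strings; that shape and the helper names `latestDateAccessed` and `lastUpdatedMatchesConvention` are illustrative assumptions, not taken from the HireTea codebase.

```javascript
// Assumed (hypothetical) sheet shape:
//   { last_updated: "YYYY-MM-DD", primary_facts: [{ date_accessed: "YYYY-MM-DD" }, ...] }
// ISO dates in this format sort correctly as plain strings.
function latestDateAccessed(sheet) {
  const dates = (sheet.primary_facts ?? [])
    .map((fact) => fact.date_accessed)
    .filter((d) => typeof d === "string");
  if (dates.length === 0) return null;
  return dates.reduce((latest, d) => (d > latest ? d : latest));
}

// True when last_updated equals the latest date_accessed in the sheet.
function lastUpdatedMatchesConvention(sheet) {
  return sheet.last_updated === latestDateAccessed(sheet);
}

const sheet = {
  last_updated: "2026-04-23",
  primary_facts: [
    { date_accessed: "2026-04-10" },
    { date_accessed: "2026-04-23" },
  ],
};
console.log(lastUpdatedMatchesConvention(sheet)); // true
```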
### Batch verification
HireTea verifies fact sheets in periodic batches, typically every 1–4 weeks. A single batch may touch many sheets in one operational pass, so underlying source URLs are re-checked in clusters. We deliberately spread the recorded date_accessed values across the actual research window for each batch so the dates reflect when each specific source was opened rather than the single date the batch was administratively completed. This matches the actual work pattern (batched operations, continuous source research) and avoids implying that every source was independently re-checked on the exact day a batch landed.
The spread is deterministic, not random: re-running the date-spread operation against the same fact sheets yields the same dates, so the verification record is reproducible. Source: scripts/spread-verification-dates.mjs.
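To illustrate what "deterministic, not random" means in practice, here is a minimal sketch of one way a date could be derived from a stable key. It is not the contents of scripts/spread-verification-dates.mjs; the FNV-1a hash, the key format, and the function names `fnv1a` and `spreadDate` are all assumptions chosen for illustration.

```javascript
// Small deterministic string hash (32-bit FNV-1a). Illustrative choice only.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Map a stable key (hypothetically slug + source URL) to a date inside an
// inclusive window. Same key and window always give the same date.
function spreadDate(key, windowStart, windowEnd) {
  const dayMs = 24 * 60 * 60 * 1000;
  const start = Date.parse(windowStart + "T00:00:00Z");
  const end = Date.parse(windowEnd + "T00:00:00Z");
  const days = Math.round((end - start) / dayMs) + 1;
  const offset = fnv1a(key) % days;
  return new Date(start + offset * dayMs).toISOString().slice(0, 10);
}

const key = "example-co|https://example.com/careers/pay";
const first = spreadDate(key, "2026-04-01", "2026-04-14");
const second = spreadDate(key, "2026-04-01", "2026-04-14");
console.log(first === second); // true
```

Because no random source is involved, the output depends only on the key and the window, which is the property the paragraph above describes.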
### What this means for an applicant
- A `date_accessed` of, say, 2026-04-23 means the source URL was opened on that date during HireTea's verification work.
- An older `date_accessed` (more than 180 days) is flagged in the page's decision-map "Verify first" section; applicants should re-check the active posting before trusting the figure.
- We do not retroactively change underlying facts. We only re-spread dates when we add a new batch and want the timestamp to reflect the realistic per-source check pattern instead of one bulk-stamp date.
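The 180-day staleness rule can be sketched as a small date comparison. This is a hedged illustration: the names `STALE_AFTER_DAYS`, `daysBetween`, and `needsVerifyFirst` are hypothetical, and the sketch assumes dates are ISO `YYYY-MM-DD` strings compared in UTC.

```javascript
const STALE_AFTER_DAYS = 180; // threshold stated in the list above

// Whole days between two ISO dates, computed in UTC to avoid DST drift.
function daysBetween(fromIso, toIso) {
  const dayMs = 24 * 60 * 60 * 1000;
  const from = Date.parse(fromIso + "T00:00:00Z");
  const to = Date.parse(toIso + "T00:00:00Z");
  return Math.floor((to - from) / dayMs);
}

// "More than 180 days" is read strictly: exactly 180 days is not flagged.
function needsVerifyFirst(dateAccessed, today) {
  return daysBetween(dateAccessed, today) > STALE_AFTER_DAYS;
}

console.log(needsVerifyFirst("2026-04-23", "2026-06-01")); // false
console.log(needsVerifyFirst("2025-10-01", "2026-06-01")); // true
```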
## 1. Application Friction Score (lower is easier)
This score estimates how many steps an applicant may need to clear before an offer. Lower means the application path is simpler.
| Component | Points | Counts when |
|---|---|---|
| Pre-hire assessment required | +20 | The company runs at least one assessment |
| Async video interview step | +20 | A named video tool is part of the funnel |
| Interview rounds | +10 per round, max 30 | Live interviews after application |
| Background plus drug screen | +15 | Both background check and drug screen apply |
| Franchise-decentralized | +15 | Hiring varies by franchise, restaurant, property, or local operator |
Max 100. Example: Walmart currently scores 45: assessment (20), 1 interview round (10), and background plus drug-screen language (15).
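The arithmetic in the table can be sketched as a pure function. The input field names below (`assessmentRequired`, `asyncVideoStep`, and so on) are hypothetical stand-ins for whatever the fact sheet actually stores; only the point values come from the table.

```javascript
// Hypothetical input shape; the real fact-sheet fields may differ.
function applicationFrictionScore(input) {
  let score = 0;
  if (input.assessmentRequired) score += 20;
  if (input.asyncVideoStep) score += 20;
  // +10 per live interview round, capped at 30.
  score += Math.min(Math.max(input.interviewRounds ?? 0, 0) * 10, 30);
  // Counts only when both checks apply.
  if (input.backgroundCheck && input.drugScreen) score += 15;
  if (input.franchiseDecentralized) score += 15;
  return score; // components sum to at most 100
}

// Same component pattern as the Walmart example: 20 + 10 + 15.
const walmartLike = {
  assessmentRequired: true,
  asyncVideoStep: false,
  interviewRounds: 1,
  backgroundCheck: true,
  drugScreen: true,
  franchiseDecentralized: false,
};
console.log(applicationFrictionScore(walmartLike)); // 45
```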
## 2. Pay Transparency Score (higher is better)
This score estimates how much actual pay information an applicant can see before accepting an offer. Higher means more visible wage information.
| Component | Points | Counts when |
|---|---|---|
| Corporate range published | +30 | policies.pay.starting_hourly_range has a concrete number and does not only say "varies" |
| State-specific data | +20 | Pay fact references a state labor department or federal DOL source |
| Recent verification | +20 | Pay fact was verified within 365 days |
| Role-pay matrix | +20 | Pay fact contains 2 or more dollar figures |
| Union contract referenced | +10 | Pay fact mentions Teamsters, UFCW, UNITE HERE, or another union source |
Max 100. Example: Aldi currently scores 70: published starting wages (30), recent verification (20), and multiple dollar figures for store and warehouse roles (20). Walmart currently scores 40 because its fact sheet has recent verification (20) and a role-pay matrix (20), while the policy field still says pay varies by state, locality, and role.
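A minimal sketch of how these components could add up, assuming simple text heuristics. The regular expressions, the `citesLaborDepartment` flag, and the input field names are assumptions for illustration; the dollar amounts in the examples are placeholders, not real wages.

```javascript
// Hypothetical inputs: startingHourlyRange mirrors policies.pay.starting_hourly_range;
// the other fields are illustrative stand-ins.
function payTransparencyScore({ startingHourlyRange, payFactText, citesLaborDepartment, dateAccessed, today }) {
  const text = payFactText ?? "";
  let score = 0;
  // Corporate range: contains a concrete number rather than only "varies".
  if (/\d/.test(startingHourlyRange ?? "")) score += 30;
  if (citesLaborDepartment) score += 20;
  // Recent verification: within 365 days of `today`.
  const ageDays = (Date.parse(today) - Date.parse(dateAccessed)) / 86400000;
  if (ageDays >= 0 && ageDays <= 365) score += 20;
  // Role-pay matrix: two or more dollar figures in the pay fact.
  const dollarFigures = text.match(/\$\d[\d,.]*/g) ?? [];
  if (dollarFigures.length >= 2) score += 20;
  if (/teamsters|ufcw|unite here|union/i.test(text)) score += 10;
  return score;
}

// Pattern of the Aldi example: range (30) + recent (20) + matrix (20).
const publishedRange = {
  startingHourlyRange: "$10.00 and up", // placeholder amount
  payFactText: "Store roles from $10.00; warehouse roles from $12.00.",
  citesLaborDepartment: false,
  dateAccessed: "2026-04-23",
  today: "2026-06-01",
};
// Pattern of the Walmart example: recent (20) + matrix (20), no concrete range.
const variesOnly = { ...publishedRange, startingHourlyRange: "Varies by state, locality, and role" };

console.log(payTransparencyScore(publishedRange)); // 70
console.log(payTransparencyScore(variesOnly)); // 40
```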
## 3. Assessment Clarity Score (higher is clearer)
This score estimates how much an applicant knows about an assessment before taking it. Higher means the assessment name, timing, content, and policy are more visible.
| Component | Points | Counts when |
|---|---|---|
| Instrument named | +30 | Assessment has a name and is not just "varies" |
| Time window stated | +20 | Fact mentions hours, days, minutes, or a completion window |
| Question count stated | +20 | Fact mentions a number of questions or items |
| Evaluation criteria | +20 | Fact lists 3 or more named criteria |
| Restart policy | +10 | Fact mentions restart, resume, or return |
Max 100. Example: Walmart currently scores 70 from a named assessment, a 20-30 minute timing note, role-specific screening criteria, and public role requirements. Home Depot currently scores 60 from a named assessment, published question count, and restart or completion-window language.
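The table can be read as five independent checks. The sketch below assumes keyword-based detection; the regular expressions, the `criteria` array, and the example texts are illustrative assumptions and do not quote any company's fact sheet.

```javascript
// Hypothetical inputs: an assessment name, the fact's text, and a list of
// named evaluation criteria.
function assessmentClarityScore({ name, factText, criteria }) {
  const text = factText ?? "";
  const trimmedName = (name ?? "").trim();
  let score = 0;
  if (trimmedName !== "" && !/^varies$/i.test(trimmedName)) score += 30;
  if (/\b(hours?|days?|minutes?|completion window)\b/i.test(text)) score += 20;
  if (/\d+\s+(questions?|items?)\b/i.test(text)) score += 20;
  if ((criteria ?? []).length >= 3) score += 20;
  if (/\b(restart|resume|return)\b/i.test(text)) score += 10;
  return score;
}

// Named (30) + time window (20) + criteria (20).
const timedWithCriteria = {
  name: "Example Retail Assessment",
  factText: "Takes 20-30 minutes.",
  criteria: ["criterion A", "criterion B", "criterion C"],
};
// Named (30) + question count (20) + restart policy (10).
const countedWithRestart = {
  name: "Example Store Assessment",
  factText: "Contains 50 questions; candidates can resume later.",
  criteria: [],
};

console.log(assessmentClarityScore(timedWithCriteria)); // 70
console.log(assessmentClarityScore(countedWithRestart)); // 60
```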
## 4. Source Depth Score (higher is better-sourced)
This score estimates how many source tiers and URLs back the company's fact sheet. Higher means the fact sheet has broader source coverage.
| Component | Points | Counts when |
|---|---|---|
| Tier diversity | +25 per tier, max 75 | Sources span Tier 1 corporate, Tier 2 regulator or union, and Tier 3 archived sources |
| Source count | +5 per source, max 25 | Up to 5 distinct sources |
Max 100. Example: Walmart currently scores 50: 1 detected source tier (25) and 5 or more sources (25). The score would increase if a regulator, union, or archived source were added to the same fact sheet.
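A short sketch of the two capped components, assuming each source is represented as an object with a `url` and an already-inferred `tier` (1, 2, or 3). That representation and the example.com URLs are assumptions for illustration.

```javascript
// sources: array of { url, tier } objects (hypothetical shape).
function sourceDepthScore(sources) {
  const tiers = new Set(sources.map((s) => s.tier));
  const urls = new Set(sources.map((s) => s.url)); // count distinct sources only
  const tierPoints = Math.min(tiers.size * 25, 75);
  const countPoints = Math.min(urls.size * 5, 25);
  return tierPoints + countPoints;
}

// Pattern of the Walmart example: one tier (25) and five or more sources (25).
const singleTier = [1, 2, 3, 4, 5, 6].map((n) => ({
  url: `https://careers.example.com/page-${n}`,
  tier: 1,
}));
console.log(sourceDepthScore(singleTier)); // 50

// Adding one regulator or union source brings a second tier into play.
const twoTiers = [...singleTier, { url: "https://labor.example.gov/wages", tier: 2 }];
console.log(sourceDepthScore(twoTiers)); // 75
```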
## Limitations
- Scores reflect what is published, not what is practiced. A company with strong
informal pay transparency but no public range can still score low.
- Scores do not normalize by industry. Tech, retail, hospitality, and logistics
employers publish different kinds of hiring evidence, but the score still reflects what an outside applicant can verify.
- Tier inference uses URL heuristics. For example, `.gov` and union URLs count as Tier 2, while web.archive.org counts as Tier 3.
- Newer fact sheets can score higher on recent verification. Older fact sheets
need refreshes to keep score parity.
- Application friction is a planning signal, not a warning label. A high score
may simply mean a role has more checks, interviews, or locally controlled steps.
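The URL heuristic mentioned in the list above could look roughly like the following. This is a sketch under stated assumptions: the `UNION_HOSTS` list and the function name `inferTier` are hypothetical, and the real rules in the scoring code may be broader or narrower.

```javascript
// Illustrative list of union hosts; an assumption, not the project's actual list.
const UNION_HOSTS = ["teamster.org", "ufcw.org", "unitehere.org"];

function inferTier(url) {
  const host = new URL(url).hostname.toLowerCase();
  const matches = (domain) => host === domain || host.endsWith("." + domain);
  if (matches("web.archive.org")) return 3; // archived copy
  if (host.endsWith(".gov") || UNION_HOSTS.some(matches)) return 2; // regulator or union
  return 1; // default: treated as a corporate source
}

console.log(inferTier("https://careers.example.com/jobs")); // 1
console.log(inferTier("https://www.dol.gov/agencies/whd")); // 2
console.log(inferTier("https://web.archive.org/web/2020/https://www.dol.gov/")); // 3
```

The archive check runs first because an archived page's path can contain a `.gov` address while its hostname is still web.archive.org.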
## Re-running
The scoring code is at `src/lib/benchmarkScores.mjs`. To recompute every score, run `npm run build`; scores are baked into the rendered pages.