# Scorers API

> Evaluators that grade task output, persist scores, and aggregate them across runs.

A scorer grades a task's output and returns a number between 0 and 1. Attach scorers to a `Task` via its `scorers` prop; they run **after** the task completes and never block the workflow. Each result is persisted to the `_smithers_scorers` table so you can aggregate scores across runs.

All scorer values and types are re-exported from the `smithers-orchestrator` facade, which is canonical. The `smithers-orchestrator/scorers` subpath exports the same surface.

```ts
import {
  createScorer,
  llmJudge,
  faithfulnessScorer,
  schemaAdherenceScorer,
  latencyScorer,
  aggregateScores,
  runScorersBatch,
} from "smithers-orchestrator";
import type { Scorer, ScoreResult, ScorersMap } from "smithers-orchestrator";
```

The component that hosts scorers (`Task`) is **returned by the factory**, not imported. See the [Components reference](/reference/components) for the `scorers` prop and [`ScorersMap`](/reference/types#scorersmap) for its shape.

## Concepts

A `Scorer` is a named, self-describing evaluator. Its `score` function is a `ScorerFn`: given a `ScorerInput`, it returns a `Promise<ScoreResult>`.

**`Scorer`**: A named scorer.

- `id`: Unique identifier. Persisted as `scorerId` and used to filter aggregates.
- `name`: Human-readable name. Persisted as `scorerName`.
- `description`: Description of what the scorer evaluates.
- `score`: The scoring function.

**`ScorerInput`**: The argument passed to a `ScorerFn`. Built from the task's input, output, and metadata at scoring time.

- `input`: The original task input or prompt.
- `output`: The task's produced output.
- Expected output for comparison.
- `context`: Additional context such as retrieved documents.
- `latencyMs`: How long the task took, in milliseconds.
- `outputSchema`: The Zod schema the output should match.

**`ScoreResult`**: What a `ScorerFn` returns.
- `score`: Normalized quality score between 0 and 1.
- `reason`: Human-readable explanation of the score.
- Arbitrary metadata, persisted as `metaJson`.

A scorer is bound to a task through a `ScorerBinding`, and a `ScorersMap` is the keyed set of bindings you pass to the `scorers` prop. Each binding may carry a `SamplingConfig` controlling how often the scorer runs.

```ts
type ScorerBinding = { scorer: Scorer; sampling?: SamplingConfig };
type ScorersMap = Record<string, ScorerBinding>;
type SamplingConfig =
  | { type: "all" } // run every time (default)
  | { type: "ratio"; rate: number } // run with probability `rate`
  | { type: "none" }; // never run
```

```tsx
// Illustrative binding: `id` and the map keys are placeholders, and any other
// `Task` props are omitted here.
<Task
  id="analyze"
  scorers={{
    quality: { scorer: tone },
    grounded: { scorer: grounded, sampling: { type: "ratio", rate: 0.5 } },
  }}
>
  Analyze the report.
</Task>
```

**Source** [`types.ts`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/types.ts) · **See also** [`ScorersMap`](/reference/types#scorersmap), [Components](/reference/components)

## createScorer

Build a custom `Scorer` from a plain config object. The returned scorer is just its config; the work lives in your `score` function.

```ts
function createScorer(config: CreateScorerConfig): Scorer;
```

**Parameters** (`CreateScorerConfig`)

- `id`: Unique identifier.
- `name`: Human-readable name.
- `description`: What the scorer evaluates.
- `score`: Async function returning a `ScoreResult`.

**Returns** `Scorer`: The named scorer, ready to bind to a task.

```ts
const wordCount = createScorer({
  id: "word-count",
  name: "Word Count",
  description: "Scores toward 1.0 as output approaches 200 words",
  score: async ({ output }) => ({
    score: Math.min(String(output).split(/\s+/).length / 200, 1),
  }),
});
```

**Source** [`createScorer.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/createScorer.js) · **Tests** [`create-scorer.test.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/tests/create-scorer.test.js) · **See also** [`llmJudge`](#llmjudge)

## llmJudge

Build an LLM-as-judge scorer that delegates evaluation to an agent.
The judge is prompted with your `instructions` plus the output of `promptTemplate`, and is expected to reply with JSON `{ "score": <0-1>, "reason": "<explanation>" }`. The reply is parsed leniently: a bare number works, and braces inside `reason` do not truncate the match. The score is clamped to 0–1, and an unparseable reply scores 0.

```ts
function llmJudge(config: LlmJudgeConfig): Scorer;
```

**Parameters** (`LlmJudgeConfig`)

- `id`: Unique identifier.
- `name`: Human-readable name.
- `description`: What the judge evaluates.
- `judge`: The agent that performs the evaluation.
- `instructions`: System-level instructions prepended to every prompt.
- `promptTemplate`: Builds the judge prompt from the scorer input. Instruct the judge to respond with the `{ score, reason }` JSON object.

**Returns** `Scorer`: A scorer whose `score` calls `judge.generate(...)` and parses the reply.

```ts
const tone = llmJudge({
  id: "tone",
  name: "Professional Tone",
  description: "Evaluates professional tone",
  judge,
  instructions: "You evaluate text for professional tone.",
  promptTemplate: ({ output }) =>
    `Rate the professionalism (0-1 JSON):\n\n${String(output)}`,
});
```

**Source** [`llmJudge.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/llmJudge.js) · **Tests** [`create-scorer.test.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/tests/create-scorer.test.js) · **See also** [Built-in scorers](#built-in-scorers)

## Built-in scorers

Each built-in is a factory that returns a `Scorer`. The three judge-based scorers take an `AgentLike` judge; the two deterministic ones do not call an agent.

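
Judge-based scorers depend on the lenient reply handling described under `llmJudge`: accept the JSON object or a bare number, clamp to 0–1, and fall back to 0. The sketch below is one way that behavior could look; `parseJudgeReply` and `clamp01` are hypothetical helpers written for illustration, not the package's actual parser.

```typescript
// Hypothetical helpers for illustration; not exported by smithers-orchestrator.
type ParsedScore = { score: number; reason?: string };

const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

function parseJudgeReply(reply: string): ParsedScore {
  const text = reply.trim();

  // Take the span from the first "{" to the last "}" so that braces inside
  // `reason` stay within the candidate JSON instead of cutting it short.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      const parsed: unknown = JSON.parse(text.slice(start, end + 1));
      if (typeof parsed === "object" && parsed !== null) {
        const { score, reason } = parsed as { score?: unknown; reason?: unknown };
        if (typeof score === "number" && Number.isFinite(score)) {
          return {
            score: clamp01(score),
            reason: typeof reason === "string" ? reason : undefined,
          };
        }
      }
    } catch {
      // Not valid JSON: fall through to the bare-number check.
    }
  }

  // A bare number such as "0.7" is accepted on its own.
  if (text !== "") {
    const bare = Number(text);
    if (Number.isFinite(bare)) return { score: clamp01(bare) };
  }

  // Anything else counts as unparseable and scores 0.
  return { score: 0, reason: "unparseable judge reply" };
}
```

Clamping after parsing means an over-enthusiastic judge that replies `1.4` still yields a valid score of 1.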

| Scorer | What it measures | Factory |
| ----------------------- | -------------------------------------------------------------- | ------------------------------------ |
| `faithfulnessScorer` | Output is grounded in `context`, no hallucinations | `faithfulnessScorer(judge)` |
| `relevancyScorer` | Output addresses the `input` | `relevancyScorer(judge)` |
| `toxicityScorer` | Toxic, harmful, or inappropriate content (higher = more toxic) | `toxicityScorer(judge)` |
| `schemaAdherenceScorer` | Output passes the task's `outputSchema` (1 valid, 0 invalid) | `schemaAdherenceScorer()` |
| `latencyScorer` | Execution time vs. budget (1 at/below target, 0 at/above max) | `latencyScorer({ targetMs, maxMs })` |

```ts
const grounded = faithfulnessScorer(judge);
const onSchema = schemaAdherenceScorer();
const fast = latencyScorer({ targetMs: 5000, maxMs: 20000 });
```

`schemaAdherenceScorer` and `latencyScorer` are no-ops (score 1) when the input lacks an `outputSchema` or `latencyMs`, respectively. `toxicityScorer` scores the *level* of toxicity, so clean text scores near 0.

### smithersScorers

`smithersScorers` is the Drizzle table backing scorer persistence (`_smithers_scorers`). Every scorer result is inserted here as a [`ScoreRow`](/reference/types#scorerow); [`aggregateScores`](#aggregatescores) reads from it. Use it for direct queries against your store.

**Source** [`faithfulnessScorer.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/faithfulnessScorer.js) · [`schema.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/schema.js) · **Tests** [`builtins.test.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/tests/builtins.test.js) · **See also** [`ScoreRow`](/reference/types#scorerow)

## Running scorers

Bound scorers run automatically when a task completes, so you rarely call these directly. They are exported for custom hosts, batch evaluation, and tooling.
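
Before a bound scorer runs, its binding's `SamplingConfig` determines whether it runs at all. A minimal sketch of that decision follows, assuming a uniform random draw for the `ratio` case; the `shouldRun` helper and its injectable `rng` parameter are hypothetical and not part of the package.

```typescript
// Mirrors the SamplingConfig union from the Concepts section.
type SamplingConfig =
  | { type: "all" }
  | { type: "ratio"; rate: number }
  | { type: "none" };

// Hypothetical helper (not a smithers-orchestrator export): decides whether a
// bound scorer should run. `rng` is injectable so the decision can be tested.
function shouldRun(
  sampling: SamplingConfig | undefined,
  rng: () => number = Math.random,
): boolean {
  if (sampling === undefined) return true; // no config behaves like { type: "all" }
  switch (sampling.type) {
    case "all":
      return true;
    case "none":
      return false;
    case "ratio":
      // rng() is in [0, 1), so rate 0 never runs and rate 1 always runs.
      return rng() < sampling.rate;
  }
}
```

With `{ type: "ratio", rate: 0.25 }`, roughly one in four task completions would be scored, which keeps judge costs down on high-volume tasks.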
### runScorersAsync

Fire-and-forget execution for live scoring. Runs every binding concurrently via `Effect.runFork` and returns immediately, so scoring never blocks the workflow. Failures are logged, not thrown.

```ts
function runScorersAsync(
  scorers: ScorersMap,
  ctx: ScorerContext,
  adapter: SmithersDb | null,
  eventBus?: EventBus | null,
): void;
```

**Parameters**

- `scorers`: The keyed bindings to run.
- `ctx`: Run/node coordinates plus the data the scorers grade. See [`ScorerContext`](/reference/types).
- `adapter`: Database adapter to persist results, or `null` to skip persistence.
- `eventBus`: Optional bus that receives `ScorerStarted` / `ScorerFinished` / `ScorerFailed` events.

### runScorersBatch

Blocking execution for batch and test evaluation. Runs every binding concurrently and resolves to a map of binding key to `ScoreResult` (or `null` when a scorer is sampled out or fails).

```ts
function runScorersBatch(
  scorers: ScorersMap,
  ctx: ScorerContext,
  adapter: SmithersDb | null,
  eventBus?: EventBus | null,
): Promise<Record<string, ScoreResult | null>>;
```

**Returns** One entry per binding key, in the order the scorers were declared.

```ts
const results = await runScorersBatch(
  { quality: { scorer: tone } },
  {
    runId: "RUN_ID",
    nodeId: "NODE_ID",
    iteration: 0,
    attempt: 1,
    input: "Summarize the article.",
    output: "...",
  },
  null,
);
// results.quality?.score
```

### aggregateScores

Compute per-scorer statistics across persisted results: `count`, `mean`, `min`, `max`, `p50`, and `stddev`. Filter to a run, node, or scorer.

```ts
function aggregateScores(
  adapter: SmithersDb,
  opts?: AggregateOptions,
): Promise<AggregateScore[]>;
```

**Parameters**

- `adapter`: Database adapter to read scorer rows from.
- `opts`: Optional filters: a specific run (`runId`), a specific node, or a specific scorer.

**Returns** `AggregateScore[]`: One row per scorer, ordered by scorer name.

- `count`: Number of scores included.
- `p50`: Median, computed in memory.
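
The reported statistics can be reproduced for a plain array of scores. The following is a rough sketch, assuming a population standard deviation and a median that averages the two middle values for even counts. The source states only that the median is computed in memory, so both choices are assumptions, as are the `summarize` helper and the `ScoreStats` type.

```typescript
// Hypothetical shape for illustration; the real row type is `AggregateScore`.
type ScoreStats = {
  count: number;
  mean: number;
  min: number;
  max: number;
  p50: number;
  stddev: number;
};

// Hypothetical helper: summarizes one scorer's scores in memory.
function summarize(scores: number[]): ScoreStats | null {
  if (scores.length === 0) return null; // nothing to aggregate

  const sorted = [...scores].sort((a, b) => a - b); // copy, then numeric sort
  const count = sorted.length;
  const mean = sorted.reduce((sum, s) => sum + s, 0) / count;

  // Median: middle value for odd counts, mean of the two middle values otherwise.
  const mid = Math.floor(count / 2);
  const p50 = count % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // Population variance (divide by count); an assumption, see the lead-in.
  const variance = sorted.reduce((sum, s) => sum + (s - mean) ** 2, 0) / count;

  return {
    count,
    mean,
    min: sorted[0],
    max: sorted[count - 1],
    p50,
    stddev: Math.sqrt(variance),
  };
}
```

Comparing `mean` against `p50` is a quick way to spot a scorer whose distribution is skewed by a few very low or very high results.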
```ts
const stats = await aggregateScores(adapter, { runId: "RUN_ID" });
```

Scores for a run are also viewable from the CLI:

```bash
bunx smithers-orchestrator scores RUN_ID
```

**Source** [`run-scorers.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/run-scorers.js) · [`aggregate.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/src/aggregate.js) · **Tests** [`run-scorers.test.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/tests/run-scorers.test.js) · [`aggregate.test.js`](https://github.com/smithersai/smithers/blob/main/packages/scorers/tests/aggregate.test.js) · **See also** [`ScorerContext`, `AggregateScore`](/reference/types#scorercontext)

***

To wire scorers into a workflow and read them back, see the [Evals quickstart](/guides/evals-quickstart). For the full type surface, see the [Types reference](/reference/types). For the `scorers` prop on `Task`, see the [Components reference](/reference/components).