Trace

Real-time safety for knowledge agents

Three endpoints. One pipeline. Score groundedness, redact PII, and compress context — then turn the results into autonomous decisions and a continuous improvement loop.


Groundedness: /v1/grounding

Privacy: /v1/redact

Compression: /v1/compress

Install and authenticate

Install the SDK, grab an API key from the portal, and export it.

# Install

pip install latence

# Authenticate

export LATENCE_TRACE_API_KEY="lat_..."

Every example below reads the key from the environment automatically.
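
Because every example depends on that environment variable, a fail-fast check at startup can replace a confusing authentication error later with a clear message. The sketch below is not part of the SDK: `require_api_key` is a hypothetical helper, and the only thing it assumes from this page is the variable name.

```python
import os

ENV_VAR = "LATENCE_TRACE_API_KEY"  # variable name from the export step above


def require_api_key(env=None) -> str:
    """Return the API key, or raise a clear error if it is missing or blank.

    Hypothetical helper, not part of the latence SDK.
    """
    env = os.environ if env is None else env
    key = (env.get(ENV_VAR) or "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating a client")
    return key
```

Calling `require_api_key()` once before constructing the client keeps the failure close to its cause.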

Score, redact, compress

Three stateless endpoints — call one, two, or all three. Each returns structured data you can act on immediately.

from latence import Latence

trace = Latence()   # reads LATENCE_TRACE_API_KEY from env

# --- 1. Score groundedness ---
ground = trace.grounding.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(ground.score, ground.band)          # 0.92, "green"
print(ground.context_unused_ratio)        # 0.0 — no wasted context

# --- 2. Redact PII ---
privacy = trace.privacy.redact(text="Email me at john@acme.com")
print(privacy.redacted_text)              # "Email me at [EMAIL]"
print(privacy.entity_count)               # 1

# --- 3. Compress context ---
compressed = trace.compression.text(
    text="Our refund policy allows returns within 30 days of purchase. "
         "All items must be in original condition with receipt.",
)
print(compressed.compressed_text)         # shorter, same meaning
print(compressed.tokens_saved)            # 9

Groundedness scoring

POST https://api.latence.ai/v1/grounding

Send a response and its retrieval context. Trace scores every claim against the evidence, assigns a risk band, and tells you exactly how much context went unused.

Key response fields

Field	Type	Description
`score`	`float`	Groundedness score 0-1. Higher is better.
`band`	`string`	Risk band: green (> 0.7), amber (0.4-0.7), or red (< 0.4).
`context_coverage_ratio`	`float`	Fraction of the response grounded in context.
`context_unused_ratio`	`float`	Fraction of context left unused — the retrieval-quality signal.
`context_uncertain_ratio`	`float`	Fraction with uncertain grounding.
`nli_aggregate`	`float`	Aggregate NLI entailment score.
`support_units`	`list`	Per-unit verdicts: source_id, usage_state, coverage_score.
`runtime_decision`	`object | null`	Allow / review / block decision when runtime decisioning is enabled.
`latency_ms`	`float`	Pod-side scoring latency in milliseconds.
`request_id`	`string`	Unique request ID for correlation and logging.
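
The response already carries `band`, but re-deriving it from a logged `score` can be handy in offline analysis or unit tests. The following is a minimal sketch of the thresholds in the table above. `band_for` is a hypothetical helper, and placing the boundary values 0.4 and 0.7 in amber is an assumption read off the stated ranges.

```python
def band_for(score: float) -> str:
    """Map a groundedness score in [0, 1] to a risk band.

    Thresholds follow the table above: green > 0.7, red < 0.4, amber otherwise.
    Assumption: the boundary values 0.4 and 0.7 fall in amber.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > 0.7:
        return "green"
    if score < 0.4:
        return "red"
    return "amber"


print(band_for(0.92))  # green
```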

Structured premises with SupportUnit

Pass per-unit provenance (source_id, speaker) and the scorer propagates it back onto every verdict — so you know exactly which document contributed.

from latence import Latence, SupportUnit

trace = Latence()

units = [
    SupportUnit(text="Paris is the capital of France.", source_id="doc-42"),
    SupportUnit(text="It sits on the Seine.",           source_id="doc-42"),
    {"text": "Population: 2.1M.", "source_id": "wiki"},
]

r = trace.grounding.rag(
    response_text="Paris, France's capital, sits on the Seine.",
    support_units=units,
)

for u in (r.support_units or []):
    print(u.source_id, u.usage_state, u.coverage_score)
# doc-42  used       0.95
# doc-42  used       0.88
# wiki    unused     0.12
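
Once provenance comes back on every verdict, rolling the verdicts up by source shows which documents earn their place in the context. The sketch below works on plain tuples rather than SDK objects, so it runs without an API key. `summarize_by_source` is a hypothetical helper, and the input values are copied from the example output above.

```python
from collections import defaultdict


def summarize_by_source(verdicts):
    """Group per-unit verdicts by source_id.

    `verdicts` is an iterable of (source_id, usage_state, coverage_score)
    tuples. Returns a dict mapping each source_id to its unit count, the
    number of units marked "used", and the mean coverage score.
    """
    buckets = defaultdict(list)
    for source_id, usage_state, coverage_score in verdicts:
        buckets[source_id].append((usage_state, coverage_score))

    summary = {}
    for source_id, rows in buckets.items():
        summary[source_id] = {
            "units": len(rows),
            "used": sum(1 for state, _ in rows if state == "used"),
            "mean_coverage": sum(score for _, score in rows) / len(rows),
        }
    return summary


# Values copied from the example output above.
verdicts = [
    ("doc-42", "used", 0.95),
    ("doc-42", "used", 0.88),
    ("wiki", "unused", 0.12),
]
summary = summarize_by_source(verdicts)
unused_sources = [src for src, row in summary.items() if row["used"] == 0]
print(unused_sources)  # ['wiki']
```

In a real pipeline the tuples would be built from `r.support_units`, for example `(u.source_id, u.usage_state, u.coverage_score)` for each unit.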

Privacy and PII redaction

POST https://api.latence.ai/v1/redact

Detect and redact more than 50 entity types covered by GDPR, including emails, phone numbers, addresses, medical records, and financial data, before the response reaches the user or your logs. The gateway never stores PII.

# Default: open mode, mask all PII
privacy = trace.privacy.redact(text="Call John at +1-555-0123 or john@acme.com")
# → "Call [PERSON] at [PHONE] or [EMAIL]"

# Category mode: only redact specific types
privacy = trace.privacy.redact(
    text="John's SSN is 123-45-6789",
    mode="category",
    categories=["social_security_number"],
)
# → "John's SSN is [SOCIAL_SECURITY_NUMBER]"

Key response fields

Field	Type	Description
`redacted_text`	`string`	Input text with PII replaced by type labels like [EMAIL], [PHONE].
`entities`	`list`	Each detected entity: label, text, start, end, score, redacted_value.
`entity_count`	`int`	Total entities detected and redacted.
`unique_labels`	`list[str]`	Distinct entity types found (email, phone, name, ...).
`label_mode`	`string`	Detection mode used: open (all types) or category (selected).
`processing_time_ms`	`float`	Server-side processing latency.
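
The `entities` list makes it possible to re-apply redaction locally, for instance with a stricter confidence cutoff than the server used. The following is a sketch under stated assumptions: `apply_redactions` is a hypothetical helper, entities are represented as plain dicts, `start`/`end` are taken to be character offsets into the input with `end` exclusive, and spans are assumed not to overlap. None of those details are confirmed by the field table above.

```python
def apply_redactions(text, entities, min_score=0.0):
    """Replace each entity span with a [LABEL] placeholder.

    Assumptions (not confirmed by the field table): `start`/`end` are
    character offsets into `text` with `end` exclusive, and spans do not
    overlap. Entities scoring below `min_score` are left untouched.
    """
    kept = [e for e in entities if e["score"] >= min_score]
    # Work right to left so earlier offsets stay valid after each replacement.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + f"[{e['label'].upper()}]" + text[e["end"]:]
    return text


text = "Email me at john@acme.com"
entities = [
    {"label": "email", "text": "john@acme.com", "start": 12, "end": 25, "score": 0.99},
]
print(apply_redactions(text, entities))  # Email me at [EMAIL]
```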

Context compression

POST https://api.latence.ai/v1/compress

Reduce retrieved context by up to 60% without losing answer quality. Lower token cost, lower latency, same results. Critical terms like numbers, dates, and proper nouns are force-preserved.

compressed = trace.compression.text(
    text=long_context,
    compression_rate=0.4,         # target 40% of original size
    force_preserve_digit=True,    # keep all numbers intact
)

# Feed compressed.compressed_text to your LLM instead
# → 31% fewer tokens, same answer quality
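
When numbers must survive compression, a cheap local check can confirm that they did before the compressed text reaches the LLM. The sketch below is independent of the SDK: `missing_numbers` is a hypothetical helper, and the regular expression is a deliberately simple notion of "number" (digit runs with optional `.` or `,` separators), not the service's own definition.

```python
import re

# Digit runs, optionally joined by "." or "," (e.g. 30, 4.5, 1,000).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def missing_numbers(original: str, compressed: str) -> list[str]:
    """Return numeric tokens present in `original` but absent from `compressed`."""
    kept = set(NUMBER.findall(compressed))
    missing = []
    for token in NUMBER.findall(original):
        if token not in kept and token not in missing:
            missing.append(token)
    return missing


original = "Returns are accepted within 30 days. Restocking fee: 4.5%."
print(missing_numbers(original, "Returns within 30 days. Fee 4.5%."))  # []
print(missing_numbers(original, "Returns within days. Fee 4.5%."))     # ['30']
```

An empty list means every number in the original also appears in the compressed text.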

Key response fields

Field	Type	Description
`compressed_text`	`string`	Semantically equivalent text with reduced token count.
`original_tokens`	`int`	Token count before compression.
`compressed_tokens`	`int`	Token count after compression.
`tokens_saved`	`int`	Tokens removed.
`compression_ratio`	`float`	compressed_tokens / original_tokens.
`compression_percentage`	`float`	Percentage of tokens removed.
`preserved_terms`	`list[str]`	Critical terms that were force-preserved.
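
The token fields are related by simple arithmetic, which is worth spelling out because `compression_ratio` and `compression_percentage` point in opposite directions. The sketch below recomputes them from the two token counts. `compression_stats` is a hypothetical helper, and treating `compression_percentage` as a 0-100 value is an assumption based on its name.

```python
def compression_stats(original_tokens: int, compressed_tokens: int) -> dict:
    """Derive the savings fields from the two token counts.

    compression_ratio follows the table above (compressed / original).
    Assumption: compression_percentage is on a 0-100 scale.
    """
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    saved = original_tokens - compressed_tokens
    return {
        "tokens_saved": saved,
        "compression_ratio": compressed_tokens / original_tokens,
        "compression_percentage": 100 * saved / original_tokens,
    }


stats = compression_stats(original_tokens=100, compressed_tokens=69)
print(stats["tokens_saved"])  # 31
```

A lower ratio and a higher percentage both mean more compression.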

Run all three in parallel

Use AsyncLatence to fire groundedness, privacy, and compression concurrently. Total latency is bounded by the slowest call rather than the sum of all three.

import asyncio
from latence import AsyncLatence

async def trace_pipeline(answer: str, context: str):
    async with AsyncLatence() as trace:
        ground, privacy, compressed = await asyncio.gather(
            trace.grounding.rag(
                response_text=answer,
                raw_context=context,
            ),
            trace.privacy.redact(text=answer),
            trace.compression.text(text=context),
        )
    return ground, privacy, compressed

ground, privacy, compressed = asyncio.run(
    trace_pipeline(
        answer="The patient's refund was processed on 2024-01-15.",
        context="Refund policy: 30-day window. Patient John Doe, case #4412.",
    )
)

print(f"Grounded: {ground.band} ({ground.score:.2f})")
print(f"Redacted: {privacy.redacted_text}")
print(f"Compressed: {compressed.tokens_saved} tokens saved")
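
The latency claim can be checked without touching the network. In the sketch below, `fake_call` is a stand-in that sleeps instead of calling an endpoint, and the durations are arbitrary. It illustrates only how `asyncio.gather` overlaps waiting time and says nothing about real endpoint latencies.

```python
import asyncio
import time


async def fake_call(name: str, seconds: float) -> str:
    """Stand-in for one endpoint call; sleeps instead of doing network I/O."""
    await asyncio.sleep(seconds)
    return name


async def run_concurrently():
    start = time.perf_counter()
    # gather returns results in argument order, regardless of finish order.
    results = await asyncio.gather(
        fake_call("grounding", 0.15),
        fake_call("redact", 0.05),
        fake_call("compress", 0.10),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(run_concurrently())
print(results)  # ['grounding', 'redact', 'compress']
print(f"{elapsed:.2f}s elapsed; running the calls one by one would take about 0.30s")
```

Elapsed time should land close to the longest sleep (0.15s) rather than the 0.30s total.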

Turn results into autonomous decisions

The response fields are routing rules, not just diagnostics. Use the band to decide what your agent does next. Green and red are handled with no human in the loop; amber can escalate to a reviewer or trigger a retry with better context.

Band	Score range	Action	What your agent does
green	> 0.70	Allow	Serve the answer. Redact PII first.
amber	0.40 – 0.70	Review	Flag for human review, or retry with better context.
red	< 0.40	Block	Return a safe fallback. Log the failure for analysis.

from latence import Latence

trace = Latence()

def safe_answer(query: str, context: str, llm_answer: str) -> dict:
    """Score, redact, decide — all before the user sees anything."""
    
    result = trace.grounding.rag(
        response_text=llm_answer,
        raw_context=context,
    )

    # Autonomous decision based on band
    if result.band == "red":
        return {
            "action": "block",
            "reason": f"Groundedness {result.score:.2f} — answer not supported by context",
            "fallback": "I don't have enough information to answer that reliably.",
        }

    if result.band == "amber":
        return {
            "action": "review",
            "reason": f"Groundedness {result.score:.2f} — partially supported",
            "answer": llm_answer,
            "flag": "human_review_required",
        }

    # Green band — safe to serve, redact PII first
    clean = trace.privacy.redact(text=llm_answer)

    return {
        "action": "allow",
        "answer": clean.redacted_text,
        "score": result.score,
        "context_used": f"{(1 - result.context_unused_ratio) * 100:.0f}%",
        "entities_redacted": clean.entity_count,
    }

Signals → next steps

Low context_coverage_ratio → the answer isn't grounded in the context. Improve data quality upstream.

context_unused_ratio > 0.30 → your retriever is returning chunks the answer doesn't use. Improve retrieval.
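
These two signals can be turned into a small triage function. The following is a sketch, not a prescribed policy: `next_steps` is a hypothetical helper, the 0.30 cutoff for unused context comes from the text above, and the 0.5 cutoff for "low" coverage is a placeholder assumption to tune against your own traffic.

```python
def next_steps(
    coverage_ratio: float,
    unused_ratio: float,
    low_coverage: float = 0.5,   # assumption: "low" is not defined on this page
    high_unused: float = 0.30,   # threshold stated above
) -> list[str]:
    """Translate the two context ratios into suggested follow-up actions."""
    steps = []
    if coverage_ratio < low_coverage:
        steps.append("improve data quality upstream")
    if unused_ratio > high_unused:
        steps.append("improve retrieval")
    return steps


print(next_steps(coverage_ratio=0.35, unused_ratio=0.45))
# ['improve data quality upstream', 'improve retrieval']
```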

Every call is automatically logged

You don't need to build a logging pipeline. The gateway records every request — score, band, latency, context ratios, cost, and entity counts — to your Trace dashboard automatically. Just add a request_id to correlate with your own systems.

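import uuid

from latence import Latence

trace = Latence()

def traced_rag_call(query, context, answer):
    result = trace.grounding.rag(
        response_text=answer,
        raw_context=context,
        # Your correlation ID. Use an opaque unique value rather than query
        # text, which may contain PII and is not unique across requests.
        request_id=f"prod-{uuid.uuid4().hex}",
    )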

    # Every call is automatically logged to your Trace dashboard.
    # The gateway records: score, band, latency, context ratios,
    # cost — all queryable in the Insights page.

    print(f"[Trace] {result.band} | score={result.score:.3f} "
          f"| unused={result.context_unused_ratio:.0%} "
          f"| latency={result.latency_ms:.0f}ms")

    return result

Analyze and improve

Open the Trace Dashboard to see your production traffic in real time. Use the insights to close the loop and improve your system.

Groundedness p50/p95

Track score distribution over time. Catch regressions before users notice.

Red band rate

The percentage of answers that failed grounding. Your north-star safety metric.

Context unused ratio

How much retrieved context was dead weight. Signals retriever quality.

Entities redacted

Total PII caught by the privacy endpoint. Proves compliance at audit time.

Tokens saved

Cumulative token savings from compression. Directly maps to cost reduction.

P95 latency

End-to-end scoring latency. Stays under 200ms for most payloads.
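
The headline numbers above can also be reproduced locally from logged scores, which is useful for spot-checking the dashboard or alerting in your own stack. The sketch below uses only the standard library. `score_metrics` is a hypothetical helper, the sample scores are made up for illustration, and the red-band cutoff (score < 0.4) is the one documented on this page.

```python
import statistics


def score_metrics(scores: list[float]) -> dict:
    """Compute p50, p95, and red-band rate from groundedness scores.

    Requires at least two scores. Red band uses the documented
    threshold (score < 0.4).
    """
    # n=20 yields 19 cut points in 5% steps; index 18 is the 95th percentile.
    cuts = statistics.quantiles(scores, n=20, method="inclusive")
    return {
        "p50": statistics.median(scores),
        "p95": cuts[18],
        "red_rate": sum(s < 0.4 for s in scores) / len(scores),
    }


# Illustrative scores only, not real traffic.
scores = [0.92, 0.85, 0.31, 0.66, 0.78, 0.12, 0.95, 0.71, 0.58, 0.88]
metrics = score_metrics(scores)
print(metrics["red_rate"])  # 0.2
```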


Keep going

Async pipeline: fire all three endpoints concurrently with AsyncLatence.

Python SDK reference: full API reference with type hints and examples, available on GitHub.

Live demo: see Trace score a real RAG answer in the interactive demo.

Trace detected a data-quality bottleneck?

When context_coverage_ratio is low or bands trend amber/red, the fix is upstream: improve the retrieved context, document quality, and evidence coverage before the answer reaches users.