Public Beta

Trace

Real-time safety for knowledge agents

Three endpoints. One pipeline. Score groundedness, redact PII, and compress context — then turn the results into autonomous decisions and a continuous improvement loop.

Groundedness: /v1/grounding
Privacy: /v1/redact
Compression: /v1/compress

1. Install and authenticate

Install the SDK, grab an API key from the portal, and export it.

# Install
pip install latence
# Authenticate
export LATENCE_TRACE_API_KEY="lat_..."

Every example below reads the key from the environment automatically.
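
Because the client reads the key from the environment, a missing or mistyped variable is easiest to catch before the first request. A minimal sketch of a fail-fast check follows. It assumes only the variable name and the lat_ prefix shown above; require_api_key is a hypothetical helper, not part of the SDK.

```python
import os

ENV_VAR = "LATENCE_TRACE_API_KEY"  # variable name from the install step above


def require_api_key(env=None) -> str:
    """Return the API key or raise a clear error (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating the client")
    if not key.startswith("lat_"):
        # Assumption: keys carry the "lat_" prefix shown in the export example.
        raise RuntimeError(f"{ENV_VAR} does not look like a Trace key (expected a 'lat_' prefix)")
    return key
```

Calling require_api_key() once at startup turns a confusing authentication failure later on into an immediate, readable error.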

2. Score, redact, compress

Three stateless endpoints — call one, two, or all three. Each returns structured data you can act on immediately.

from latence import Latence

trace = Latence()   # reads LATENCE_TRACE_API_KEY from env

# --- 1. Score groundedness ---
ground = trace.grounding.rag(
    response_text="Paris is the capital of France.",
    raw_context="France's capital city is Paris.",
)
print(ground.score, ground.band)          # 0.92, "green"
print(ground.context_unused_ratio)        # 0.0 — no wasted context

# --- 2. Redact PII ---
privacy = trace.privacy.redact(text="Email me at john@acme.com")
print(privacy.redacted_text)              # "Email me at [EMAIL]"
print(privacy.entity_count)               # 1

# --- 3. Compress context ---
compressed = trace.compression.text(
    text="Our refund policy allows returns within 30 days of purchase. "
         "All items must be in original condition with receipt.",
)
print(compressed.compressed_text)         # shorter, same meaning
print(compressed.tokens_saved)            # 9

Groundedness scoring

POST https://api.latence.ai/v1/grounding

Send a response and its retrieval context. Trace scores every claim against the evidence, assigns a risk band, and tells you exactly how much context went unused.

Key response fields

| Field | Type | Description |
| --- | --- | --- |
| score | float | Groundedness score from 0 to 1. Higher is better. |
| band | string | Risk band: green (> 0.7), amber (0.4-0.7), or red (< 0.4). |
| context_coverage_ratio | float | Fraction of the response grounded in context. |
| context_unused_ratio | float | Fraction of context left unused; the retrieval-quality signal. |
| context_uncertain_ratio | float | Fraction with uncertain grounding. |
| nli_aggregate | float | Aggregate NLI entailment score. |
| support_units | list | Per-unit verdicts: source_id, usage_state, coverage_score. |
| runtime_decision | object or null | Allow / review / block decision when runtime decisioning is enabled. |
| latency_ms | float | Pod-side scoring latency in milliseconds. |
| request_id | string | Unique request ID for correlation and logging. |
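
The band thresholds are simple enough to reproduce locally, for example when replaying stored scores. A minimal sketch, assuming the amber range includes both of its endpoints (the documentation does not say which band owns exactly 0.4 or 0.7); band_for is a hypothetical helper, and the band returned by the API remains the authoritative value.

```python
def band_for(score: float) -> str:
    """Map a groundedness score to the documented risk band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > 0.7:
        return "green"
    if score >= 0.4:
        # Assumption: 0.4 and 0.7 themselves fall in the amber band.
        return "amber"
    return "red"


print(band_for(0.92))  # green
print(band_for(0.55))  # amber
print(band_for(0.10))  # red
```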

Structured premises with SupportUnit

Pass per-unit provenance (source_id, speaker) and the scorer propagates it back onto every verdict — so you know exactly which document contributed.

from latence import Latence, SupportUnit

trace = Latence()

units = [
    SupportUnit(text="Paris is the capital of France.", source_id="doc-42"),
    SupportUnit(text="It sits on the Seine.",           source_id="doc-42"),
    {"text": "Population: 2.1M.", "source_id": "wiki"},
]

r = trace.grounding.rag(
    response_text="Paris, France's capital, sits on the Seine.",
    support_units=units,
)

for u in (r.support_units or []):
    print(u.source_id, u.usage_state, u.coverage_score)
# doc-42  used       0.95
# doc-42  used       0.88
# wiki    unused     0.12
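
Once provenance comes back on every verdict, a natural next step is to roll the verdicts up per source. The sketch below uses a stand-in Verdict dataclass with the three fields printed above rather than the SDK's own type, and unused_sources is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Verdict:
    """Stand-in for one support-unit verdict (fields as printed above)."""
    source_id: str
    usage_state: str
    coverage_score: float


def unused_sources(verdicts):
    """Return the source_ids for which no unit was marked 'used'."""
    states = defaultdict(set)
    for v in verdicts:
        states[v.source_id].add(v.usage_state)
    return sorted(s for s, seen in states.items() if "used" not in seen)


verdicts = [
    Verdict("doc-42", "used", 0.95),
    Verdict("doc-42", "used", 0.88),
    Verdict("wiki", "unused", 0.12),
]
print(unused_sources(verdicts))  # ['wiki']
```

Sources that show up in this list across many requests are candidates for removal from the index or for a retrieval fix.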

Privacy and PII redaction

POST https://api.latence.ai/v1/redact

Detect and redact more than 50 entity types relevant to GDPR, including emails, phone numbers, addresses, medical records, and financial data, before the response reaches the user or your logs. The gateway never stores PII.

# Default: open mode, mask all PII
privacy = trace.privacy.redact(text="Call John at +1-555-0123 or john@acme.com")
# → "Call [PERSON] at [PHONE] or [EMAIL]"

# Category mode: only redact specific types
privacy = trace.privacy.redact(
    text="John's SSN is 123-45-6789",
    mode="category",
    categories=["social_security_number"],
)
# → "John's SSN is [SOCIAL_SECURITY_NUMBER]"

Key response fields

| Field | Type | Description |
| --- | --- | --- |
| redacted_text | string | Input text with PII replaced by type labels like [EMAIL], [PHONE]. |
| entities | list | Each detected entity: label, text, start, end, score, redacted_value. |
| entity_count | int | Total entities detected and redacted. |
| unique_labels | list[str] | Distinct entity types found (email, phone, name, ...). |
| label_mode | string | Detection mode used: open (all types) or category (selected). |
| processing_time_ms | float | Server-side processing latency. |
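
The start and end offsets on each entity make it possible to re-apply or audit redactions yourself. The sketch below works on plain dicts rather than SDK objects and rests on two assumptions not stated above: that start and end are character offsets into the input with end exclusive, and that the label is rendered in upper case inside brackets. apply_redactions is a hypothetical helper.

```python
def apply_redactions(text: str, entities: list[dict]) -> str:
    """Replace each entity span with its bracketed label."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[: ent["start"]] + f"[{ent['label'].upper()}]" + out[ent["end"]:]
    return out


text = "Email me at john@acme.com"
entities = [{"label": "email", "start": 12, "end": 25}]
print(apply_redactions(text, entities))  # Email me at [EMAIL]
```

Comparing this output with redacted_text is a cheap way to confirm how the offsets are defined before relying on them.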

Context compression

POST https://api.latence.ai/v1/compress

Reduce retrieved context by up to 60% without losing answer quality. Lower token cost, lower latency, same results. Critical terms like numbers, dates, and proper nouns are force-preserved.

compressed = trace.compression.text(
    text=long_context,
    compression_rate=0.4,        # target 40% of original size
    force_preserve_digit=True,   # keep all numbers intact
)
# Feed compressed.compressed_text to your LLM instead of the raw context.
# compressed.compression_percentage reports the reduction actually achieved.

Key response fields

| Field | Type | Description |
| --- | --- | --- |
| compressed_text | string | Semantically equivalent text with reduced token count. |
| original_tokens | int | Token count before compression. |
| compressed_tokens | int | Token count after compression. |
| tokens_saved | int | Tokens removed. |
| compression_ratio | float | compressed_tokens / original_tokens. |
| compression_percentage | float | Percentage of tokens removed. |
| preserved_terms | list[str] | Critical terms that were force-preserved. |
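
The token fields are tied together by simple arithmetic, which is handy for sanity-checking a response or for estimating savings offline. A small sketch, assuming compression_percentage is expressed on a 0-100 scale; compression_stats is a hypothetical helper that mirrors the field names, not an SDK function.

```python
def compression_stats(original_tokens: int, compressed_tokens: int) -> dict:
    """Derive the documented token fields from the two raw counts."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    if not 0 <= compressed_tokens <= original_tokens:
        raise ValueError("compressed_tokens must be between 0 and original_tokens")
    saved = original_tokens - compressed_tokens
    return {
        "tokens_saved": saved,
        "compression_ratio": compressed_tokens / original_tokens,
        # Assumption: the percentage is on a 0-100 scale.
        "compression_percentage": saved / original_tokens * 100,
    }


stats = compression_stats(original_tokens=1000, compressed_tokens=400)
print(stats["tokens_saved"])  # 600
```

With these definitions, a compression_ratio of 0.4 corresponds to 60% of tokens removed.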

3. Run all three in parallel

Use AsyncLatence to fire groundedness, privacy, and compression concurrently. Total latency is roughly that of the slowest call, not the sum of all three.

import asyncio
from latence import AsyncLatence

async def trace_pipeline(answer: str, context: str):
    async with AsyncLatence() as trace:
        ground, privacy, compressed = await asyncio.gather(
            trace.grounding.rag(
                response_text=answer,
                raw_context=context,
            ),
            trace.privacy.redact(text=answer),
            trace.compression.text(text=context),
        )
    return ground, privacy, compressed

ground, privacy, compressed = asyncio.run(
    trace_pipeline(
        answer="The patient's refund was processed on 2024-01-15.",
        context="Refund policy: 30-day window. Patient John Doe, case #4412.",
    )
)

print(f"Grounded: {ground.band} ({ground.score:.2f})")
print(f"Redacted: {privacy.redacted_text}")
print(f"Compressed: {compressed.tokens_saved} tokens saved")
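
The latency claim is easy to check without touching the network. The sketch below replaces the three endpoint calls with stand-in coroutines that only sleep; fake_call and its delays are illustrative assumptions, not measurements of the real service.

```python
import asyncio
import time


async def fake_call(name: str, delay: float) -> str:
    """Stand-in for one endpoint call; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return name


async def run_all():
    start = time.perf_counter()
    # gather() starts all three coroutines and returns results in call order.
    results = await asyncio.gather(
        fake_call("grounding", 0.10),
        fake_call("redact", 0.05),
        fake_call("compress", 0.03),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_all())
print(results)  # ['grounding', 'redact', 'compress']
print(f"elapsed: {elapsed:.2f}s")
```

The elapsed time should land near the longest delay (0.10 s) rather than the 0.18 s a sequential run would take.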

4. Turn results into autonomous decisions

The response fields are routing rules, not just diagnostics. Use the band to decide what your agent does next; green and red answers need no human in the loop.

| Band | Score range | Action | What your agent does |
| --- | --- | --- | --- |
| green | > 0.70 | Allow | Serve the answer. Redact PII first. |
| amber | 0.40 - 0.70 | Review | Flag for human review, or retry with better context. |
| red | < 0.40 | Block | Return a safe fallback. Log the failure for analysis. |

from latence import Latence

trace = Latence()

def safe_answer(query: str, context: str, llm_answer: str) -> dict:
    """Score, redact, decide — all before the user sees anything."""
    
    result = trace.grounding.rag(
        response_text=llm_answer,
        raw_context=context,
    )

    # Autonomous decision based on band
    if result.band == "red":
        return {
            "action": "block",
            "reason": f"Groundedness {result.score:.2f} — answer not supported by context",
            "fallback": "I don't have enough information to answer that reliably.",
        }

    if result.band == "amber":
        return {
            "action": "review",
            "reason": f"Groundedness {result.score:.2f} — partially supported",
            "answer": llm_answer,
            "flag": "human_review_required",
        }

    # Green band — safe to serve, redact PII first
    clean = trace.privacy.redact(text=llm_answer)

    return {
        "action": "allow",
        "answer": clean.redacted_text,
        "score": result.score,
        "context_used": f"{(1 - result.context_unused_ratio) * 100:.0f}%",
        "entities_redacted": clean.entity_count,
    }
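
The amber row also allows a retry with better context instead of an immediate hand-off. One way to sketch that loop is shown below. score_fn is a hypothetical callable that wraps the grounding call and returns a (score, band) pair, and decide_with_retry is an illustrative helper, not an SDK feature.

```python
from typing import Callable, Iterable


def decide_with_retry(
    answer: str,
    contexts: Iterable[str],
    score_fn: Callable[[str, str], tuple[float, str]],
) -> dict:
    """Score the answer against each context in turn until the band is decisive."""
    attempts = 0
    last_score = None
    for context in contexts:
        attempts += 1
        last_score, band = score_fn(answer, context)
        if band == "green":
            return {"action": "allow", "score": last_score, "attempts": attempts}
        if band == "red":
            return {"action": "block", "score": last_score, "attempts": attempts}
        # amber: fall through and try the next, presumably richer, context
    return {"action": "review", "score": last_score, "attempts": attempts}


# Fake scorer for illustration only: canned results keyed by context.
canned = {"thin context": (0.55, "amber"), "richer context": (0.83, "green")}
result = decide_with_retry(
    "some answer",
    ["thin context", "richer context"],
    lambda answer, context: canned[context],
)
print(result)  # {'action': 'allow', 'score': 0.83, 'attempts': 2}
```

Only answers that stay amber after every candidate context has been tried fall through to human review.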

Signals → next steps

context_coverage_ratio low → the answer isn't grounded. Upgrade data quality upstream.

context_unused_ratio > 0.30 (30%) → your retriever ships the wrong chunks. Upgrade retrieval.
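
These two signals can be turned into a small triage function. In the sketch below the 0.30 cutoff comes from the guidance above, while the 0.50 cutoff for "low" coverage is an assumption, since no number is given for it; diagnose is a hypothetical helper.

```python
UNUSED_THRESHOLD = 0.30    # from the guidance above
COVERAGE_THRESHOLD = 0.50  # assumption: "low" is not defined numerically


def diagnose(context_coverage_ratio: float, context_unused_ratio: float) -> list[str]:
    """Return the recommended next steps for one grounding result."""
    steps = []
    if context_coverage_ratio < COVERAGE_THRESHOLD:
        steps.append("improve data quality upstream")
    if context_unused_ratio > UNUSED_THRESHOLD:
        steps.append("improve retrieval")
    return steps


print(diagnose(0.35, 0.45))  # ['improve data quality upstream', 'improve retrieval']
print(diagnose(0.90, 0.10))  # []
```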

5. Every call is automatically logged

You don't need to build a logging pipeline. The gateway records every request — score, band, latency, context ratios, cost, and entity counts — to your Trace dashboard automatically. Just add a request_id to correlate with your own systems.

from latence import Latence

trace = Latence()

def traced_rag_call(query, context, answer):
    result = trace.grounding.rag(
        response_text=answer,
        raw_context=context,
        request_id=f"prod-{query[:20]}",   # your correlation ID
    )

    # Every call is automatically logged to your Trace dashboard.
    # The gateway records: score, band, latency, context ratios,
    # cost — all queryable in the Insights page.

    print(f"[Trace] {result.band} | score={result.score:.3f} "
          f"| unused={result.context_unused_ratio:.0%} "
          f"| latency={result.latency_ms:.0f}ms")

    return result
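
The example above builds request_id from the first 20 characters of the query, so similar queries share an ID and user text ends up inside the identifier. If that matters for your logs, a random ID is one alternative. The sketch assumes request_id accepts any string; new_request_id is a hypothetical helper.

```python
import uuid


def new_request_id(prefix: str = "prod") -> str:
    """Build a unique correlation ID that contains no user text."""
    return f"{prefix}-{uuid.uuid4().hex}"


print(new_request_id())  # prod- followed by 32 random hex characters
```

Store the generated ID next to the query in your own system so both sides of the correlation stay joinable.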

6. Analyze and improve

Open the Trace Dashboard to see your production traffic in real time. Use the insights to close the loop and improve your system.

Groundedness p50/p95

Track score distribution over time. Catch regressions before users notice.

Red band rate

The percentage of answers that failed grounding. Your north-star safety metric.

Context unused ratio

How much retrieved context was dead weight. Signals retriever quality.

Entities redacted

Total PII caught by the privacy endpoint. Proves compliance at audit time.

Tokens saved

Cumulative token savings from compression. Directly maps to cost reduction.

P95 latency

End-to-end scoring latency. Stays under 200ms for most payloads.
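
The same headline numbers can be recomputed from results you have kept yourself, for instance in a CI check. The sketch below uses the nearest-rank percentile definition, which is an assumption; the dashboard may use a different method. The scores and bands are made-up sample data.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def red_band_rate(bands):
    """Fraction of results whose band is 'red'."""
    if not bands:
        raise ValueError("bands must not be empty")
    return sum(b == "red" for b in bands) / len(bands)


# Made-up sample data for illustration.
scores = [0.92, 0.81, 0.35, 0.66, 0.74, 0.88, 0.29, 0.95, 0.71, 0.58]
bands = ["green", "green", "red", "amber", "green",
         "green", "red", "green", "green", "amber"]

print(percentile(scores, 50))        # 0.71
print(percentile(scores, 95))        # 0.95
print(f"{red_band_rate(bands):.0%}")  # 20%
```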

Trace detected a data-quality bottleneck?

When context_coverage_ratio is low or bands trend amber/red, the fix is upstream: improve the retrieved context, document quality, and evidence coverage before the answer reaches users.