DATA ENGINE

Turn messy enterprise files into retrieval-ready intelligence

The Data Engine parses raw company data, extracts structure and meaning, applies redaction and enrichment, and emits a retrieval-ready package the Search Engine can serve.

Request Early Access · See What's Already Real

Any file type to structured output · Built for retrieval, not just parsing · Redaction, entities, relations, provenance · Designed for local and private deployment

Parse

Any file type in. Structured output out.

PDF

XLSX

EML

PNG

structured.md · ready

# Q4 Report

## Table 1 - Renewals

- tables: 3 extracted

- figures: 7 detected

Raw files in. Retrieval-ready package out.

WHY THE DATA ENGINE EXISTS

Enterprise retrieval does not fail on models first. It fails on raw data.

PDFs, scans, spreadsheets, images, emails, and fragmented internal content are not retrieval-ready. Generic RAG stacks treat them like input blobs. The Data Engine prepares them for search before retrieval ever begins.

01 structure gap

Raw files are not structured enough for reliable retrieval

The Data Engine converts enterprise files into structured, inspectable output with layout, content, and provenance.

layout + provenance

02 semantic gap

Messy corpora need more than chunking

The Data Engine extracts entities, relations, metadata, and optional graph signals before they reach search.

entities + relations

03 governance gap

Security and governance cannot be bolted on later

The Data Engine applies redaction and controlled processing inside the same system.

policy-safe processing

WHAT IT DOES

One engine for preparing enterprise data for retrieval

Not another parser. Not another point tool. One preparation layer for enterprise retrieval.

Raw enterprise files in. Retrieval package out.

One deterministic preparation lane turns messy enterprise inputs into structured, policy-aware, retrieval-ready output without splitting the work across fragile tools.

6 connected stages · deterministic preparation · single search handoff

Files

Bring in raw enterprise inputs without flattening them into text first.

Output state

PDF

XLSX

EML

PNG

Any file type in

Parse

Preserve structure, layout, tables, and document boundaries.

Output state

structured.md

layout

tables

Layout fidelity

Extract

Pull out entities, relations, and metadata before retrieval begins.

Output state

entities

relations

metadata

Retrieval semantics

Redact

Apply policy and masking inside the same deterministic workflow.

Output state

PII

policy

audit trail

Policy-safe processing

Enrich

Attach provenance, graph signals, and cross-document linking.

Output state

graph signals

linking

provenance

Graph-ready context

Package

Emit one portable handoff the Search Engine can actually serve.

Output state

chunks

page source

export ready

Portable handoff
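The six stages above can be read as one fixed, ordered chain. The following is a minimal sketch of that idea in plain Python. The `Doc` type, the `stage` helper, and every output value are hypothetical placeholders for illustration; none of this is the Latence SDK API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical working state handed from one stage to the next."""
    name: str
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def stage(label: str, **outputs) -> Callable[[Doc], Doc]:
    """Build a placeholder stage that records its label and its output state."""
    def run(doc: Doc) -> Doc:
        doc.state.update(outputs)
        doc.history.append(label)
        return doc
    return run


# Stage names follow the page; the output values are illustrative only.
PIPELINE = [
    stage("files", file_type="PDF"),
    stage("parse", layout="retained", tables=3),
    stage("extract", entities=["Customer"], relations=[("Customer", "renewal_notice")]),
    stage("redact", redaction_state="complete"),
    stage("enrich", provenance="attached"),
    stage("package", status="export ready"),
]


def prepare(name: str) -> Doc:
    """Run every stage in a fixed order, so the same input takes the same path."""
    doc = Doc(name)
    for run in PIPELINE:
        doc = run(doc)
    return doc


result = prepare("contract_q4.pdf")
print(result.history)
```

Keeping the stage list in one place is what makes the lane inspectable: the history shows exactly which stages touched a document and in what order.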

HOW IT WORKS

From file to package

Step 01

Parse any enterprise file

PDFs, spreadsheets, slides, images, emails, and more become structured output instead of raw text blobs.

raw files

contract.pdf · PDF

pricing.xlsx · XLSX

scan.png · PNG

structured parse

# Q4 Renewal Summary

## Clause 12 - Notice

sections: 14 preserved

tables: 3 extracted

layout: retained

layout preserved · sections retained · tables detected

Step 02

Extract structure and meaning

The engine identifies entities, relations, metadata, and document structure in a way downstream retrieval can use.

entities + relations

Customer · Clause 12.4 · Renewal

Customer -> renewal_notice

Clause 12.4 -> notice_window

retrieval metadata

doc type: contract

source page: p.12

metadata: ready

relations: attached

resolved before chunking
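One way to picture "resolved before chunking": entity mentions are located on the full parsed text first, and each chunk then inherits the mentions that overlap its span. The sketch below assumes exact string matching and fixed-size chunks purely for illustration; `Mention`, `find_mentions`, and `chunk_with_entities` are hypothetical names, not part of any Latence API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    entity: str
    start: int  # character offsets into the full parsed text
    end: int


def find_mentions(text, surface_forms):
    """Locate the first occurrence of each surface form in the whole document."""
    mentions = []
    for entity, surface in surface_forms.items():
        pos = text.find(surface)
        if pos != -1:
            mentions.append(Mention(entity, pos, pos + len(surface)))
    return mentions


def chunk_with_entities(text, mentions, size):
    """Cut fixed-size chunks and attach every mention that overlaps each chunk."""
    chunks = []
    for offset in range(0, len(text), size):
        end = min(offset + size, len(text))
        refs = [m.entity for m in mentions if m.start < end and m.end > offset]
        chunks.append({"text": text[offset:end], "span": (offset, end), "entities": refs})
    return chunks


text = "Customer must provide renewal notice 90 days prior to expiration."
mentions = find_mentions(text, {"Customer": "Customer", "Renewal Notice": "renewal notice"})
chunks = chunk_with_entities(text, mentions, size=32)
for chunk in chunks:
    print(chunk["span"], chunk["entities"])
```

Because the mention is found before the text is cut, the chunk boundary that splits "renewal notice" does not lose the entity: both neighboring chunks carry the reference.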

Step 03

Apply redaction and enrichment

Sensitive content can be redacted, while additional enrichment prepares the data for graph and higher-quality retrieval.

redaction policy

owner_email: [masked]

customer_phone: [masked]

audit trail: retained

enrichment + provenance

Customer · Contract · Renewal

graph link: attached

source page: p.12

policy state: carried

graph-ready signals attached
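As a rough illustration of masking with a retained audit trail, the sketch below replaces email addresses and phone numbers with `[masked]` and logs what was masked without storing the original value. The regular expressions are deliberately simple assumptions; the page does not describe how the Data Engine detects sensitive content, and real PII detection needs far more care.

```python
import re

# Illustrative patterns only (assumption): not the engine's detection logic.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str):
    """Mask every match and return (masked_text, audit_trail)."""
    audit = []
    for kind, pattern in PATTERNS.items():
        def mask(match, kind=kind):
            # Record the type and length only, never the sensitive value itself.
            audit.append({"type": kind, "length": len(match.group())})
            return "[masked]"
        text = pattern.sub(mask, text)
    return text, audit


masked, audit = redact("Contact jane.doe@example.com or +1 415 555 0100.")
print(masked)
print(audit)
```

The audit entries are what let a reviewer confirm later that redaction ran and what kind of content it touched.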

Step 04

Emit a retrieval-ready package

The output is a portable artifact with chunks, provenance, metadata, and enrichment signals.

latence-package-v1.2

Portable retrieval handoff with inspectable state.

export ready

chunks: 2,184

entities: 536

provenance: attached

relations: 1,102

page source

redaction state

metadata
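To make the idea of a portable artifact concrete, here is a minimal sketch of a chunk record that carries its own provenance, plus a manifest that summarizes the package. The field names and the truncated SHA-256 checksum are assumptions for illustration; the actual package schema and hash function are not stated on this page.

```python
import hashlib
import json


def make_chunk(chunk_id, text, source, page, span):
    """Build one chunk record with provenance and a content checksum."""
    return {
        "chunk_id": chunk_id,
        "text": text,
        "source": source,
        "page": page,
        "span": span,
        # Assumption: a short SHA-256 prefix stands in for the real checksum scheme.
        "checksum": hashlib.sha256(text.encode("utf-8")).hexdigest()[:6],
    }


def make_manifest(package, chunks, entities, relations):
    """Summarize package contents and check that every chunk has provenance."""
    has_provenance = all(c["source"] and c["page"] for c in chunks)
    return {
        "package": package,
        "chunks": len(chunks),
        "entities": len(entities),
        "relations": len(relations),
        "provenance": "attached" if has_provenance else "missing",
    }


chunk = make_chunk(
    "ch_0184",
    "Customer must provide renewal notice 90 days prior to expiration.",
    source="contract_q4.pdf",
    page=12,
    span="clause_12.4",
)
manifest = make_manifest(
    "latence-package-v1.2",
    chunks=[chunk],
    entities=["Customer", "Contract", "Renewal Notice"],
    relations=[("Customer", "renewal_notice")],
)
print(json.dumps(manifest, indent=2))
```

A consumer can recompute the checksum from the chunk text to confirm the chunk was not altered after packaging.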

OUTPUT

The output is not text. It is a retrieval package.

This is the handoff between raw enterprise data and high-performance retrieval.

RETRIEVAL PACKAGE · latence-package-v1.2

Provenance

source document

contract_q4.pdf

renewal notice clause / customer contract

private

chunk lineage

page: 12

span: clause_12.4

chunk id: ch_0184

checksum: 8fc1b2

Metadata

document type: contract

layout: preserved

language: en-US

review state: ready

policy scope: private

renewal · layout-aware · search-safe

latence-package-v1.2

One portable artifact for search, provenance, and downstream intelligence.

export ready

chunks: 2,184

entities: 536

relations: 1,102

page provenance: attached

redaction state: complete

package status: validated

artifact manifest

manifest.json: complete

chunks.parquet: 2,184 rows

entities.jsonl: 536 rows

provenance.index: attached

active retrieval handoff

chunk ch_0184 · page 12

Customer must provide renewal notice 90 days prior to expiration. Linked entities, source page, and policy state stay attached.

source: contract_q4.pdf

entity refs: 3 attached

policy state: private

Entities + relations

Customer · Contract · RenewalNotice

Customer -> renewal_notice

Clause 12.4 -> notice_window

Contract -> payment_reconciliation

entities: 536

relations: 1,102

Redaction + audit

email: masked

phone: masked

audit: retained

policy state: complete

export policy

access scope: search-safe only

review: passed

chunks · source and page provenance · entities and relations · metadata · redaction state · enrichment signals

WHY IT IS BETTER

Better than generic RAG preprocessing

Fully unsupervised structuring yields entities, labels, relations, ontologies, and a disambiguated connected graph that augments both graph search and BM25 hybrid retrieval.

generic pipeline

Parse. Chunk. Recover meaning later. The system stays text-first, so structure, ontology, and graph augmentation arrive too late or not at all.

brittle handoffs

parse

chunk

index

Text first, structure later

Parsing collapses layout, labels, provenance, and ontology context into raw text too early.

Chunking becomes the recovery mechanism

Meaning, entities, and relations are guessed after chunking instead of being extracted before retrieval.

Graph and lexical retrieval stay disconnected

There is no shared graph, no stable IDs, and no metadata layer to enrich hybrid BM25 search.

Manual cleanup creeps into production

Labels, ontology mapping, and disambiguation turn into brittle downstream work instead of core pipeline output.

typical result

raw text · brittle chunks · weak metadata

DATA ENGINE AUGMENTATION · unsupervised graph + hybrid

fully unsupervised output

The Data Engine automatically emits structured output, entities, labels, relations, ontologies, and stable IDs that stay attached to the retrieval package.

search-ready

structured output · entities · labels · relations · ontologies · disambiguated graph

disambiguated connected graph

Entities resolve to stable IDs and ontology classes, so graph retrieval and lexical retrieval can share the same augmentation layer.

cross-doc links · ontology classes

Organization · Document · Legal Clause · Event

Customer: org_482

Contract: doc_17

Renewal Notice: evt_22

Clause 12.4: legal_124

Notice Window: term_19

Payment Recon: fin_88

relation edges · ontology links · stable ids

Customer#org_482 · Clause#legal_124 · RenewalNotice#evt_22
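The value of stable IDs is that different surface forms of the same thing land on one record. The sketch below shows only that end state. A hand-written alias table stands in for whatever unsupervised resolution the engine performs, and the pairing of each ID with an ontology class is an assumption inferred from the ID prefixes shown above.

```python
import re
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    stable_id: str
    label: str
    ontology_class: str


# IDs and labels come from the page; the class pairing is an assumption.
REGISTRY = {
    "org_482": Entity("org_482", "Customer", "Organization"),
    "doc_17": Entity("doc_17", "Contract", "Document"),
    "evt_22": Entity("evt_22", "Renewal Notice", "Event"),
    "legal_124": Entity("legal_124", "Clause 12.4", "Legal Clause"),
}

# Hypothetical aliases: a stand-in for learned disambiguation.
ALIASES = {
    "customer": "org_482",
    "the customer": "org_482",
    "contract": "doc_17",
    "renewal notice": "evt_22",
    "clause 12.4": "legal_124",
    "section 12.4": "legal_124",
}


def resolve(mention: str) -> Optional[Entity]:
    """Normalize case and whitespace, then map the mention to a stable entity."""
    key = re.sub(r"\s+", " ", mention.strip().lower())
    stable_id = ALIASES.get(key)
    return REGISTRY[stable_id] if stable_id else None


print(resolve("  The   Customer "))
print(resolve("Section 12.4"))
print(resolve("unknown party"))
```

Once both graph retrieval and lexical retrieval key on the same `stable_id`, they can share one augmentation layer instead of matching on raw strings.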

graph search augmentation

The graph can expand a query through disambiguated entities, relation edges, and ontology classes at retrieval time.

query: renewal notice obligations

Customer

Contract

Clause 12.4

Notice Window
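The expansion above can be sketched as a bounded breadth-first walk over the entity graph. The stable IDs and labels below are taken from this page, but the edge list, the two-hop limit, and the substring-based seed matching are assumptions made for illustration.

```python
from collections import defaultdict, deque

LABELS = {
    "org_482": "Customer",
    "doc_17": "Contract",
    "evt_22": "Renewal Notice",
    "legal_124": "Clause 12.4",
    "term_19": "Notice Window",
    "fin_88": "Payment Recon",
}

# Hypothetical undirected edges between stable IDs.
EDGES = [
    ("evt_22", "org_482"),
    ("evt_22", "legal_124"),
    ("legal_124", "term_19"),
    ("legal_124", "doc_17"),
    ("doc_17", "fin_88"),
]

ADJACENCY = defaultdict(list)
for a, b in EDGES:
    ADJACENCY[a].append(b)
    ADJACENCY[b].append(a)


def expand(query: str, max_hops: int = 2) -> list:
    """Seed on entities named in the query, then walk the graph up to max_hops."""
    lowered = query.lower()
    seeds = [eid for eid, label in LABELS.items() if label.lower() in lowered]
    hops = dict.fromkeys(seeds, 0)  # entity id -> hop distance from a seed
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in ADJACENCY[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return [LABELS[eid] for eid in hops]


expanded = expand("renewal notice obligations")
print(expanded)
```

With these assumed edges, the query reaches Customer, Contract, Clause 12.4, and Notice Window within two hops, while Payment Recon sits three hops away and is left out. The hop limit is the knob that keeps expansion from pulling in loosely related context.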

BM25 hybrid metadata enrichment

Lexical retrieval gets enriched fields, labels, and ontology-backed metadata instead of depending on raw text alone.

entity: Customer

label: renewal_notice

ontology: legal_notice

doc type: contract
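A small, self-contained sketch of why enriched fields help lexical search: the same BM25 scorer is run over raw chunk text and over chunk text with its metadata fields appended. The scorer is a plain Okapi BM25 with a common non-negative IDF variant, written out here so the example has no dependencies. Appending fields to the text is an assumption for illustration and is not necessarily how the Data Engine feeds fields to an index.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace, and break snake_case labels apart."""
    return text.lower().replace("_", " ").split()


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    {
        "text": "Customer must provide renewal notice 90 days prior to expiration.",
        "fields": {"entity": "Customer", "label": "renewal_notice",
                   "ontology": "legal_notice", "doc_type": "contract"},
    },
    {"text": "Invoices are reconciled monthly against payments.", "fields": {}},
]

raw_docs = [c["text"] for c in chunks]
enriched_docs = [" ".join([c["text"], *c["fields"].values()]) for c in chunks]

raw = bm25_scores("legal contract", raw_docs)
enriched = bm25_scores("legal contract", enriched_docs)
print(raw, enriched)
```

The raw text never contains "legal" or "contract", so the unenriched index cannot match the query at all. The enriched version matches through the ontology and document-type fields.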

why this wins in production

One inspectable package can drive graph retrieval, metadata enrichment, and hybrid BM25 search without manual ontology maintenance.

No manual labeling loop required

Shared graph and lexical augmentation layer

Inspectable package passed directly into search

Proof

Already real.
Now being packaged into the next enterprise release.

Latence is not starting from slides. The core components already exist today across the current cloud product, the SDK, and open-source retrieval infrastructure.

Latence Cloud

Use the existing product to process documents, extract structure, and build intelligence workflows today.

Live product

Python SDK

Develop against the current pipeline and Data Package model now.

pip install latence

voyager-index

Open-source proof of a real local retrieval engine built for multimodal, multi-vector search.

Open source · Rust

vLLM Factory

Open-source proof of real encoder serving for modern retrieval models.

Open source · Python

The waitlist is for the next unified enterprise release, not for a concept that does not exist yet.

03 Deployment

Bring your own LLM.
Deploy where your data lives.

Start quickly in any environment. Test and iterate, then move the deployment behind your own walls.

Fastest validation

Managed pilot

Fastest path to validation with the same product architecture.

Controlled deployment

Customer cloud / private VPC

Controlled deployment for teams not ready for full self-hosting.

Production control

Self-hosted / air-gapped

Built for local GPU environments and high-control enterprise deployments.

EARLY ACCESS

If retrieval quality matters, start with the data layer.

The next Latence release packages the Data Engine into a cleaner, local-first enterprise product. If that is what you need, join the early-access list.

Request Early Access · See What's Already Real

The technology exists today. The waitlist is for the next unified release.
