
Enrichment (Coming Soon)

Experimental · 0.5 credits

Per-chunk and corpus-level feature enrichment across 10 feature groups, for retrieval-optimized data. Coming Soon.

Production Recommendation

This is a direct endpoint for development and testing. For production workloads, use the Data Intelligence Pipeline -- it provides structured Data Packages with quality metrics, is async by default, and is covered by Enterprise SLAs.

Overview

The Enrichment service computes 10 feature groups per chunk and at corpus level for retrieval-optimized data. This service requires corpus-level streaming architecture and will be available in a future release.

**Feature groups:** quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy

**Status:** Coming Soon — corpus-level processing requires a dedicated architecture for streaming data from B2, efficient computation across 100k+ chunks, and concurrent multi-user processing.

When to Use

Use enrichment when you need chunk-level metadata (quality, semantic role, centrality) for advanced retrieval strategies like adaptive context selection, quality filtering, or redundancy detection. This service is not yet available.
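Redundancy detection of the kind described above can be approximated client-side once you have embeddings: drop a chunk whose embedding is nearly identical to one already kept. This is a minimal sketch of the idea, not the service's actual algorithm; the `dedupe` helper and the 0.95 threshold are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe(embeddings, threshold=0.95):
    """Return indices of chunks to keep, skipping near-duplicates."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# The second vector is nearly identical to the first, so it is dropped.
print(dedupe([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]))  # → [0, 2]
```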

API Reference

POST https://api.latence.ai/api/v1/enrichment/chunk
Split text into semantically meaningful chunks with structural metadata. Part of the Enrichment service (Coming Soon).

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | (none) | Text to chunk (max 5,000,000 characters) |
| strategy | string | No | hybrid | Chunking strategy: character, token, semantic, or hybrid |
| chunk_size | integer | No | 512 | Target chunk size (64-8192) |
| chunk_overlap | integer | No | 50 | Overlap between adjacent chunks (0-8191) |
| min_chunk_size | integer | No | 64 | Minimum chunk size; smaller chunks are discarded (1-8192) |
| request_id | string | No | (none) | Optional request tracking ID |
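The interplay of chunk_size, chunk_overlap, and min_chunk_size can be pictured with a toy sliding-window chunker for the character strategy. This is a sketch of the semantics only, not the service's implementation; the `char_chunks` helper is hypothetical.

```python
def char_chunks(text, chunk_size=512, chunk_overlap=50, min_chunk_size=64):
    """Toy character-strategy chunker: fixed-size windows that advance by
    chunk_size - chunk_overlap; trailing pieces shorter than
    min_chunk_size are discarded."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if len(piece) >= min_chunk_size:
            chunks.append({"start": start, "end": start + len(piece), "content": piece})
    return chunks

doc = "x" * 1000
spans = [(c["start"], c["end"]) for c in char_chunks(doc)]
print(spans)  # → [(0, 512), (462, 974), (924, 1000)]
```

Note how each window starts 50 characters before the previous one ends, and the final 76-character piece survives only because it exceeds min_chunk_size.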

Response Fields

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded |
| data.chunks | array | Chunk objects with content, index, start, end, char_count, token_count, semantic_score, section_path |
| data.num_chunks | integer | Total number of chunks produced |
| data.strategy | string | Chunking strategy used |
| data.chunk_size | integer | Target chunk size parameter |
| data.processing_time_ms | number | Processing time in milliseconds |
| usage | object | Credit usage information |

Response Example

200 OK · JSON

```json
{
  "success": true,
  "data": {
    "chunks": [
      {
        "content": "Introduction to machine learning...",
        "index": 0,
        "start": 0,
        "end": 512,
        "char_count": 498,
        "token_count": 127,
        "semantic_score": 0.87,
        "section_path": ["Chapter 1", "Section 1.1"]
      }
    ],
    "num_chunks": 42,
    "strategy": "hybrid",
    "chunk_size": 512,
    "processing_time_ms": 45.2
  },
  "usage": { "credits": 0.5 }
}
```
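If you call the endpoint over raw HTTP rather than through the SDK, the response can be handled as plain JSON with the field names from the table above. A sketch, using the example payload:

```python
import json

# The payload mirrors the 200 OK example above.
payload = json.loads("""{
  "success": true,
  "data": {
    "chunks": [
      {"content": "Introduction to machine learning...", "index": 0,
       "start": 0, "end": 512, "char_count": 498, "token_count": 127,
       "semantic_score": 0.87, "section_path": ["Chapter 1", "Section 1.1"]}
    ],
    "num_chunks": 42,
    "strategy": "hybrid",
    "chunk_size": 512,
    "processing_time_ms": 45.2
  },
  "usage": {"credits": 0.5}
}""")

if not payload["success"]:
    raise RuntimeError("chunking failed")

for chunk in payload["data"]["chunks"]:
    # section_path keeps heading context alongside each chunk
    breadcrumb = " > ".join(chunk["section_path"])
    print(f"[{chunk['index']}] {breadcrumb}: {chunk['token_count']} tokens")
```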
POST https://api.latence.ai/api/v1/enrichment/enrich
Full enrichment pipeline: chunk text, compute dense embeddings, and calculate 10 feature groups in parallel. Returns chunks, embeddings, and rich metadata for each chunk.

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | (none) | Text to enrich (max 5,000,000 characters) |
| strategy | string | No | hybrid | Chunking strategy: character, token, semantic, or hybrid |
| chunk_size | integer | No | 512 | Target chunk size (64-8192) |
| chunk_overlap | integer | No | 50 | Overlap between adjacent chunks (0-8191) |
| min_chunk_size | integer | No | 64 | Minimum chunk size (1-8192) |
| encoding_format | string | No | float | Embedding output format: float or base64 |
| features | array | No | all 10 | Feature groups to compute. Valid: quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy |
| request_id | string | No | (none) | Optional request tracking ID |
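When encoding_format is base64, the compact payload must be decoded back into floats client-side. Base64 embedding payloads are commonly packed little-endian float32, but the byte layout is an assumption here, not something the docs above specify; verify against a float response before relying on it. The `decode_embedding` helper is hypothetical.

```python
import base64
import struct

def decode_embedding(b64: str, dim: int):
    """Decode a base64 embedding, assuming packed little-endian float32
    (a common convention; the exact layout is an assumption here)."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{dim}f", raw))

# Round-trip demo with a tiny 3-dim vector:
vec = [0.012, -0.034, 0.056]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
decoded = decode_embedding(encoded, dim=3)
print([round(x, 3) for x in decoded])  # → [0.012, -0.034, 0.056]
```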

Response Fields

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded |
| data.chunks | array | Array of chunk objects |
| data.num_chunks | integer | Total number of chunks |
| data.embeddings | array | One embedding per chunk (float arrays or base64 strings) |
| data.embedding_dim | integer | Embedding dimension (e.g. 1024) |
| data.encoding_format | string | Embedding format used |
| data.features | object | Feature groups keyed by name: quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy |
| data.strategy | string | Chunking strategy used |
| data.processing_time_ms | number | Processing time in milliseconds |
| usage | object | Credit usage information |

Response Example

200 OK · JSON

```json
{
  "success": true,
  "data": {
    "chunks": [
      {
        "content": "Introduction to machine learning...",
        "index": 0,
        "start": 0,
        "end": 512,
        "char_count": 498,
        "token_count": 127
      }
    ],
    "num_chunks": 42,
    "embeddings": [[0.012, -0.034, 0.056]],
    "embedding_dim": 1024,
    "encoding_format": "float",
    "features": {
      "quality": {
        "per_chunk": [{"coherence_score": 0.72, "is_short": false, "is_long": false, "word_count": 87, "avg_word_length": 5.2}],
        "aggregate": {"mean_coherence": 0.65, "short_chunks": 3, "long_chunks": 0}
      },
      "semantic": {
        "per_chunk": [{"rhetorical_role": "definition", "rhetorical_confidence": 0.82, "centrality": 0.67}],
        "aggregate": {"role_distribution": {"definition": 0.23}, "mean_centrality": 0.55}
      }
    },
    "strategy": "hybrid",
    "processing_time_ms": 1250.4
  },
  "usage": { "credits": 0.5 }
}
```
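A common use of the per-chunk quality features is filtering low-coherence chunks before indexing. This sketch zips data.chunks against features.quality.per_chunk, following the response shape in the example; the 0.5 threshold and `filter_by_coherence` name are illustrative.

```python
# Minimal data dict mirroring the data.features shape in the example above.
data = {
    "chunks": [{"index": 0, "content": "Intro..."},
               {"index": 1, "content": "---- page 3 ----"}],
    "features": {
        "quality": {
            "per_chunk": [{"coherence_score": 0.72}, {"coherence_score": 0.31}],
            "aggregate": {"mean_coherence": 0.52},
        }
    },
}

def filter_by_coherence(data, threshold=0.5):
    """Keep only chunks whose per-chunk coherence_score meets the threshold."""
    scores = data["features"]["quality"]["per_chunk"]
    return [c for c, q in zip(data["chunks"], scores)
            if q["coherence_score"] >= threshold]

kept = filter_by_coherence(data)
print([c["index"] for c in kept])  # → [0]
```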

Error Handling

All errors return a JSON body with error and details fields.

| Status | Code | Example message | Description |
|---|---|---|---|
| 400 | MISSING_FIELD | Missing required field: text | The text field is required |
| 400 | INVALID_STRATEGY | Invalid strategy. Must be character, token, semantic, or hybrid | Unknown chunking strategy |
| 400 | INVALID_FEATURES | Unknown features: ['invalid']. Valid: ['quality', ...] | Invalid feature group name |
| 429 | RATE_LIMITED | Rate limit exceeded | Too many requests; retry after the Retry-After interval |
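Honoring the Retry-After interval on 429 responses can be sketched like this. The transport is injected as a callable so the sketch runs without a network; the `post_with_retry` helper and the exponential-backoff fallback are illustrative, not part of the SDK.

```python
import time

def post_with_retry(send, max_retries=3):
    """Retry on 429, honoring Retry-After. `send` is any callable that
    returns (status_code, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Fall back to exponential backoff if Retry-After is absent.
        wait = float(headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("rate limited after retries")

# Fake transport: rate-limited once, then succeeds.
responses = iter([
    (429, {"Retry-After": "0"}, {"error": "RATE_LIMITED"}),
    (200, {}, {"success": True}),
])
status, body = post_with_retry(lambda: next(responses))
print(status)  # → 200
```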

Billing

Pricing Formula

cost = (characters / 1,000,000) × rate

Add-ons & Multipliers

| Option | Price | Description |
|---|---|---|
| Chunk task | $0.10 / 1M chars | Text chunking only |
| Enrich task | $0.50 / 1M chars | Chunk + embed + 10 feature groups |

Pricing Examples

| Example | Cost |
|---|---|
| Chunk 1M characters | $0.10 |
| Enrich 1M characters (all features) | $0.50 |
| Enrich 500K characters (3 features) | $0.25 |
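The pricing formula translates directly into code. Note that, per the examples above, selecting fewer feature groups does not change the enrich rate (500K characters cost $0.25 whether 3 or 10 groups are computed); the features parameter affects processing time, not price.

```python
def cost_usd(characters: int, rate_per_million: float) -> float:
    """cost = (characters / 1,000,000) × rate"""
    return characters / 1_000_000 * rate_per_million

CHUNK_RATE = 0.10   # $ per 1M chars, chunk task
ENRICH_RATE = 0.50  # $ per 1M chars, enrich task

print(cost_usd(1_000_000, CHUNK_RATE))   # → 0.1
print(cost_usd(1_000_000, ENRICH_RATE))  # → 0.5
print(cost_usd(500_000, ENRICH_RATE))    # → 0.25
```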

Code Examples

```python
from latence import Latence

client = Latence(api_key="YOUR_API_KEY")

# Chunk a document
chunks = client.experimental.enrichment.chunk(
    text="Your document text here...",
    strategy="hybrid",
    chunk_size=512,
)
print(f"{chunks.data.num_chunks} chunks")

# Full enrichment with features
result = client.experimental.enrichment.enrich(
    text="Your document text here...",
    strategy="hybrid",
    features=["quality", "semantic", "drift"],
)
for chunk in result.data.chunks:
    print(f"  [{chunk.index}] {chunk.content[:80]}...")
print(f"Mean coherence: {result.data.features['quality']['aggregate']['mean_coherence']:.2f}")
```

Best Practices

Use hybrid strategy for the best balance of speed and boundary quality

Start with chunk_size=512 and adjust based on your downstream model's context window

Use the features parameter to compute only the feature groups you need — reduces processing time

For large documents (>1MB), prefer character or hybrid strategy over semantic

Use the quality feature to filter low-coherence chunks before indexing

Use drift detection to identify natural document sections without relying on headings

Explore Tutorials & Notebooks

Deep-dive examples and interactive notebooks in our GitHub repository

View on GitHub

Looking for production-grade processing?

The Data Intelligence Pipeline chains services automatically and returns structured Data Packages.