
Enrichment (Coming Soon)

Experimental · 0.5 credits

Per-chunk and corpus-level feature enrichment across 10 feature groups, for retrieval-optimized data. Coming Soon.

Production Recommendation

This is a direct endpoint for development and testing. For production workloads, use the Data Intelligence Pipeline -- it provides structured Data Packages with quality metrics, is async by default, and is covered by Enterprise SLAs.

Overview

The Enrichment service computes 10 feature groups per chunk and at corpus level for retrieval-optimized data. This service requires corpus-level streaming architecture and will be available in a future release.

**Feature groups:** quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy

**Status:** Coming Soon — corpus-level processing requires a dedicated architecture for streaming data from B2, efficient computation across 100k+ chunks, and concurrent multi-user processing.

When to Use

Use enrichment when you need chunk-level metadata (quality, semantic role, centrality) for advanced retrieval strategies like adaptive context selection, quality filtering, or redundancy detection. This service is not yet available.
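Redundancy detection of the kind described above can be approximated client-side once you have embeddings: drop a chunk whose embedding is nearly identical to one already kept. This is a minimal sketch of the idea, not the service's actual algorithm; the `dedupe` helper and the 0.95 threshold are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe(embeddings, threshold=0.95):
    """Return indices of chunks to keep, skipping near-duplicates."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# The second vector is nearly identical to the first, so it is dropped.
print(dedupe([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]))  # → [0, 2]
```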

API Reference

POST https://api.latence.ai/api/v1/enrichment/chunk
Split text into semantically meaningful chunks with structural metadata. Part of the Enrichment service (Coming Soon).

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | (none) | Text to chunk (max 5,000,000 characters) |
| strategy | string | No | hybrid | Chunking strategy: character, token, semantic, or hybrid |
| chunk_size | integer | No | 512 | Target chunk size (64-8192) |
| chunk_overlap | integer | No | 50 | Overlap between adjacent chunks (0-8191) |
| min_chunk_size | integer | No | 64 | Minimum chunk size; smaller chunks are discarded (1-8192) |
| request_id | string | No | (none) | Optional request tracking ID |
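The interplay of chunk_size, chunk_overlap, and min_chunk_size can be pictured with a toy sliding-window chunker for the character strategy. This is a sketch of the semantics only, not the service's implementation; the `char_chunks` helper is hypothetical.

```python
def char_chunks(text, chunk_size=512, chunk_overlap=50, min_chunk_size=64):
    """Toy character-strategy chunker: fixed-size windows that advance by
    chunk_size - chunk_overlap; trailing pieces shorter than
    min_chunk_size are discarded."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if len(piece) >= min_chunk_size:
            chunks.append({"start": start, "end": start + len(piece), "content": piece})
    return chunks

doc = "x" * 1000
spans = [(c["start"], c["end"]) for c in char_chunks(doc)]
print(spans)  # → [(0, 512), (462, 974), (924, 1000)]
```

Note how each window starts 50 characters before the previous one ends, and the final 76-character piece survives only because it exceeds min_chunk_size.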

Response Fields

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded |
| data.chunks | array | Chunk objects with content, index, start, end, char_count, token_count, semantic_score, section_path |
| data.num_chunks | integer | Total number of chunks produced |
| data.strategy | string | Chunking strategy used |
| data.chunk_size | integer | Target chunk size parameter |
| data.processing_time_ms | number | Processing time in milliseconds |
| usage | object | Credit usage information |

Response Example

200 OK · JSON

```json
{
  "success": true,
  "data": {
    "chunks": [
      {
        "content": "Introduction to machine learning...",
        "index": 0,
        "start": 0,
        "end": 512,
        "char_count": 498,
        "token_count": 127,
        "semantic_score": 0.87,
        "section_path": ["Chapter 1", "Section 1.1"]
      }
    ],
    "num_chunks": 42,
    "strategy": "hybrid",
    "chunk_size": 512,
    "processing_time_ms": 45.2
  },
  "usage": { "credits": 0.5 }
}
```
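If you call the endpoint over raw HTTP rather than through the SDK, the response can be handled as plain JSON with the field names from the table above. A sketch, using the example payload:

```python
import json

# The payload mirrors the 200 OK example above.
payload = json.loads("""{
  "success": true,
  "data": {
    "chunks": [
      {"content": "Introduction to machine learning...", "index": 0,
       "start": 0, "end": 512, "char_count": 498, "token_count": 127,
       "semantic_score": 0.87, "section_path": ["Chapter 1", "Section 1.1"]}
    ],
    "num_chunks": 42,
    "strategy": "hybrid",
    "chunk_size": 512,
    "processing_time_ms": 45.2
  },
  "usage": {"credits": 0.5}
}""")

if not payload["success"]:
    raise RuntimeError("chunking failed")

for chunk in payload["data"]["chunks"]:
    # section_path keeps heading context alongside each chunk
    breadcrumb = " > ".join(chunk["section_path"])
    print(f"[{chunk['index']}] {breadcrumb}: {chunk['token_count']} tokens")
```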
POST https://api.latence.ai/api/v1/enrichment/enrich
Full enrichment pipeline: chunk text, compute dense embeddings, and calculate 10 feature groups in parallel. Returns chunks, embeddings, and rich metadata for each chunk.

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | (none) | Text to enrich (max 5,000,000 characters) |
| strategy | string | No | hybrid | Chunking strategy: character, token, semantic, or hybrid |
| chunk_size | integer | No | 512 | Target chunk size (64-8192) |
| chunk_overlap | integer | No | 50 | Overlap between adjacent chunks (0-8191) |
| min_chunk_size | integer | No | 64 | Minimum chunk size (1-8192) |
| encoding_format | string | No | float | Embedding output format: float or base64 |
| features | array | No | all 10 | Feature groups to compute. Valid: quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy |
| request_id | string | No | (none) | Optional request tracking ID |
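When encoding_format is base64, the compact payload must be decoded back into floats client-side. Base64 embedding payloads are commonly packed little-endian float32, but the byte layout is an assumption here, not something the docs above specify; verify against a float response before relying on it. The `decode_embedding` helper is hypothetical.

```python
import base64
import struct

def decode_embedding(b64: str, dim: int):
    """Decode a base64 embedding, assuming packed little-endian float32
    (a common convention; the exact layout is an assumption here)."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{dim}f", raw))

# Round-trip demo with a tiny 3-dim vector:
vec = [0.012, -0.034, 0.056]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
decoded = decode_embedding(encoded, dim=3)
print([round(x, 3) for x in decoded])  # → [0.012, -0.034, 0.056]
```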

Response Fields

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded |
| data.chunks | array | Array of chunk objects |
| data.num_chunks | integer | Total number of chunks |
| data.embeddings | array | One embedding per chunk (float arrays or base64 strings) |
| data.embedding_dim | integer | Embedding dimension (e.g. 1024) |
| data.encoding_format | string | Embedding format used |
| data.features | object | Feature groups keyed by name: quality, density, structural, semantic, compression, zipf, coherence, spectral, drift, redundancy |
| data.strategy | string | Chunking strategy used |
| data.processing_time_ms | number | Processing time in milliseconds |
| usage | object | Credit usage information |

Response Example

200 OK · JSON

```json
{
  "success": true,
  "data": {
    "chunks": [
      {
        "content": "Introduction to machine learning...",
        "index": 0,
        "start": 0,
        "end": 512,
        "char_count": 498,
        "token_count": 127
      }
    ],
    "num_chunks": 42,
    "embeddings": [[0.012, -0.034, 0.056]],
    "embedding_dim": 1024,
    "encoding_format": "float",
    "features": {
      "quality": {
        "per_chunk": [{"coherence_score": 0.72, "is_short": false, "is_long": false, "word_count": 87, "avg_word_length": 5.2}],
        "aggregate": {"mean_coherence": 0.65, "short_chunks": 3, "long_chunks": 0}
      },
      "semantic": {
        "per_chunk": [{"rhetorical_role": "definition", "rhetorical_confidence": 0.82, "centrality": 0.67}],
        "aggregate": {"role_distribution": {"definition": 0.23}, "mean_centrality": 0.55}
      }
    },
    "strategy": "hybrid",
    "processing_time_ms": 1250.4
  },
  "usage": { "credits": 0.5 }
}
```
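A common use of the per-chunk quality features is filtering low-coherence chunks before indexing. This sketch zips data.chunks against features.quality.per_chunk, following the response shape in the example; the 0.5 threshold and `filter_by_coherence` name are illustrative.

```python
# Minimal data dict mirroring the data.features shape in the example above.
data = {
    "chunks": [{"index": 0, "content": "Intro..."},
               {"index": 1, "content": "---- page 3 ----"}],
    "features": {
        "quality": {
            "per_chunk": [{"coherence_score": 0.72}, {"coherence_score": 0.31}],
            "aggregate": {"mean_coherence": 0.52},
        }
    },
}

def filter_by_coherence(data, threshold=0.5):
    """Keep only chunks whose per-chunk coherence_score meets the threshold."""
    scores = data["features"]["quality"]["per_chunk"]
    return [c for c, q in zip(data["chunks"], scores)
            if q["coherence_score"] >= threshold]

kept = filter_by_coherence(data)
print([c["index"] for c in kept])  # → [0]
```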

Error Handling

All errors return a JSON body with error and details fields.

| Status | Code | Example message | Description |
|---|---|---|---|
| 400 | MISSING_FIELD | Missing required field: text | The text field is required |
| 400 | INVALID_STRATEGY | Invalid strategy. Must be character, token, semantic, or hybrid | Unknown chunking strategy |
| 400 | INVALID_FEATURES | Unknown features: ['invalid']. Valid: ['quality', ...] | Invalid feature group name |
| 429 | RATE_LIMITED | Rate limit exceeded | Too many requests; retry after the Retry-After interval |
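Honoring the Retry-After interval on 429 responses can be sketched like this. The transport is injected as a callable so the sketch runs without a network; the `post_with_retry` helper and the exponential-backoff fallback are illustrative, not part of the SDK.

```python
import time

def post_with_retry(send, max_retries=3):
    """Retry on 429, honoring Retry-After. `send` is any callable that
    returns (status_code, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Fall back to exponential backoff if Retry-After is absent.
        wait = float(headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("rate limited after retries")

# Fake transport: rate-limited once, then succeeds.
responses = iter([
    (429, {"Retry-After": "0"}, {"error": "RATE_LIMITED"}),
    (200, {}, {"success": True}),
])
status, body = post_with_retry(lambda: next(responses))
print(status)  # → 200
```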

Billing

Pricing Formula

cost = (characters / 1,000,000) × rate

Add-ons & Multipliers

| Option | Price | Description |
|---|---|---|
| Chunk task | $0.10 / 1M chars | Text chunking only |
| Enrich task | $0.50 / 1M chars | Chunk + embed + 10 feature groups |

Pricing Examples

| Example | Cost |
|---|---|
| Chunk 1M characters | $0.10 |
| Enrich 1M characters (all features) | $0.50 |
| Enrich 500K characters (3 features) | $0.25 |
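The pricing formula translates directly into code. Note that, per the examples above, selecting fewer feature groups does not change the enrich rate (500K characters cost $0.25 whether 3 or 10 groups are computed); the features parameter affects processing time, not price.

```python
def cost_usd(characters: int, rate_per_million: float) -> float:
    """cost = (characters / 1,000,000) × rate"""
    return characters / 1_000_000 * rate_per_million

CHUNK_RATE = 0.10   # $ per 1M chars, chunk task
ENRICH_RATE = 0.50  # $ per 1M chars, enrich task

print(cost_usd(1_000_000, CHUNK_RATE))   # → 0.1
print(cost_usd(1_000_000, ENRICH_RATE))  # → 0.5
print(cost_usd(500_000, ENRICH_RATE))    # → 0.25
```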

Code Examples

```python
from latence import Latence

client = Latence(api_key="YOUR_API_KEY")

# Chunk a document
chunks = client.experimental.enrichment.chunk(
    text="Your document text here...",
    strategy="hybrid",
    chunk_size=512,
)
print(f"{chunks.data.num_chunks} chunks")

# Full enrichment with features
result = client.experimental.enrichment.enrich(
    text="Your document text here...",
    strategy="hybrid",
    features=["quality", "semantic", "drift"],
)
for chunk in result.data.chunks:
    print(f"  [{chunk.index}] {chunk.content[:80]}...")
print(f"Mean coherence: {result.data.features['quality']['aggregate']['mean_coherence']:.2f}")
```

Best Practices

Use hybrid strategy for the best balance of speed and boundary quality

Start with chunk_size=512 and adjust based on your downstream model's context window

Use the features parameter to compute only the feature groups you need — reduces processing time

For large documents (>1MB), prefer character or hybrid strategy over semantic

Use the quality feature to filter low-coherence chunks before indexing

Use drift detection to identify natural document sections without relying on headings

Explore Tutorials & Notebooks

Deep-dive examples and interactive notebooks in our GitHub repository

View on GitHub

Looking for production-grade processing?

The Data Intelligence Pipeline chains services automatically and returns structured Data Packages.