Compression

Experimental · 3 credits

Dramatically reduce token count while preserving meaning, with up to 80% token reduction via TOON encoding and intelligent compression.

Production Recommendation

This is a direct endpoint intended for development and testing. For production workloads, use the Data Intelligence Pipeline instead: it provides structured Data Packages with quality metrics, is asynchronous by default, and is covered by Enterprise SLAs.

Overview

The Compression service dramatically reduces token count while preserving meaning. Using TOON encoding and intelligent compression, achieve up to 80% token reduction.

Key features:

  • Up to 80% token reduction
  • TOON encoding support
  • Chat message annealing (preserves recent messages)
  • Force-preserve important tokens/digits

API Reference

POST https://api.latence.ai/api/v1/compression/compress
Compress text while preserving meaning

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string |  |  | Input text to compress |
| compression_rate | number |  |  | Target compression (0.0-1.0, e.g., 0.5 = remove 50%) |
| force_preserve_digit | boolean |  |  | Preserve numbers |
| apply_toon | boolean |  |  | Apply TOON encoding |
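Since the SDK shown further below simply wraps this REST endpoint, the same request can be issued directly over HTTP. A minimal stdlib sketch, assuming JSON bodies and a bearer-token `Authorization` header (check your API credentials for the exact auth scheme):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder
URL = "https://api.latence.ai/api/v1/compression/compress"

# Request body assembled from the parameters documented above.
payload = {
    "text": "Your long document or prompt text here...",
    "compression_rate": 0.5,       # remove ~50% of tokens
    "force_preserve_digit": True,  # keep numbers intact
    "apply_toon": False,           # skip TOON encoding
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Assumed header scheme; the service may use a different one.
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The request is built but not sent here, so the sketch can be inspected offline before wiring in real credentials.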

Response Fields

| Field | Type | Description |
|---|---|---|
| compressed_text | string | The compressed output text |
| original_tokens | number | Token count of the input |
| compressed_tokens | number | Token count of the compressed output |
| compression_ratio | number | Ratio of compressed to original tokens |
| tokens_saved | number | Difference between original and compressed token counts |
| toon_applied | boolean | Whether TOON encoding was applied |
| usage | object | Credits consumed by the request |

Response Example

200 OK · JSON
{
  "compressed_text": "AI transforms information processing...",
  "original_tokens": 2089,
  "compressed_tokens": 1385,
  "compression_ratio": 0.66,
  "tokens_saved": 704,
  "toon_applied": false,
  "usage": { "credits": 1.0 }
}
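The numeric fields in the response are internally consistent: `tokens_saved` is `original_tokens` minus `compressed_tokens`, and `compression_ratio` is `compressed_tokens` divided by `original_tokens`. Checking this against the example above:

```python
# The response body from the example above, as a Python dict.
response = {
    "original_tokens": 2089,
    "compressed_tokens": 1385,
    "compression_ratio": 0.66,
    "tokens_saved": 704,
}

# tokens_saved = original_tokens - compressed_tokens
saved = response["original_tokens"] - response["compressed_tokens"]

# compression_ratio = compressed_tokens / original_tokens (rounded)
ratio = response["compressed_tokens"] / response["original_tokens"]

print(saved)            # 704
print(round(ratio, 2))  # 0.66
```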
POST https://api.latence.ai/api/v1/compression/compress_messages
Compress chat messages with gradient annealing

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array |  |  | Array of {role, content} messages |
| target_compression | number |  |  | Target compression rate |

Response Fields

| Field | Type | Description |
|---|---|---|
| compressed_messages | array | Compressed {role, content} messages |
| original_total_length | number | Total length of the input messages |
| compressed_total_length | number | Total length after compression |
| compression_percentage | number | Percentage of the original length removed |
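Unlike the text endpoint, these fields appear to report lengths rather than token counts, and `compression_percentage` is the share of the original length removed. Using the figures from the example below (170 down to 67):

```python
original_total_length = 170
compressed_total_length = 67

# Percentage of the original length that was removed.
percentage = (
    (original_total_length - compressed_total_length)
    / original_total_length
    * 100
)

print(round(percentage, 1))  # 60.6
```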

Response Example

200 OK · JSON
{
  "compressed_messages": [
    {"role": "system", "content": "AI assistant."},
    {"role": "user", "content": "What is ML?"}
  ],
  "original_total_length": 170,
  "compressed_total_length": 67,
  "compression_percentage": 60.6
}
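The service's annealing algorithm is not documented here, but the idea behind "gradient annealing" is that older messages are compressed harder while recent ones are preserved. As an illustrative sketch (not the actual implementation), a linear schedule could assign each message its own rate so the mean equals the requested target:

```python
def annealing_schedule(n_messages: int, target: float) -> list[float]:
    """Illustrative gradient: oldest message gets the highest
    compression rate, newest the lowest, averaging to `target`."""
    if n_messages == 1:
        return [target]
    # Linear ramp from 2*target (oldest) down to 0 (newest);
    # the ramp's mean across all messages is exactly `target`.
    step = 2 * target / (n_messages - 1)
    return [2 * target - i * step for i in range(n_messages)]

rates = annealing_schedule(4, target=0.5)
print([round(r, 2) for r in rates])  # [1.0, 0.67, 0.33, 0.0]
```

A real implementation would also clip per-message rates below 1.0 and weight by message length; this sketch only shows the "recent messages are preserved" shape of the schedule.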

Code Examples

from latence import Latence

client = Latence(api_key="YOUR_API_KEY")

# Compress text while preserving meaning
result = client.experimental.compression.compress(
    text="Your long document or prompt text here...",
    compression_rate=0.5,      # Remove 50% of tokens
    force_preserve_digit=True  # Keep numbers intact
)

print(f"Original: {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Saved: {result.tokens_saved} tokens")
print(result.compressed_text)

# Or compress chat messages with gradient annealing
result = client.experimental.compression.compress_messages(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning..."},
        {"role": "assistant", "content": "Machine learning is..."}
    ],
    target_compression=0.5
)

print(result.compressed_messages)

Explore Tutorials & Notebooks

Deep-dive examples and interactive notebooks in our GitHub repository

View on GitHub

Looking for production-grade processing?

The Data Intelligence Pipeline chains services automatically and returns structured Data Packages.