Compression
Experimental · 3 credits
Production Recommendation
This is a direct endpoint for development and testing. For production workloads, use the Data Intelligence Pipeline -- it provides structured Data Packages with quality metrics, is async by default, and is covered by Enterprise SLAs.
Overview
The Compression service dramatically reduces token count while preserving meaning. Using TOON encoding and intelligent compression, achieve up to 80% token reduction.
Key features:
- Up to 80% token reduction
- TOON encoding support
- Chat message annealing (preserves recent messages)
- Force-preserve important tokens/digits
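The docs do not specify how message annealing is computed, but the idea it describes (older messages compressed more aggressively, recent ones preserved) can be illustrated with a small sketch. This is a hypothetical illustration only, not the service's actual algorithm; the function name `annealed_rates` and the linear ramp are assumptions for demonstration.

```python
def annealed_rates(n, target=0.5):
    """Hypothetical sketch of 'annealing': assign a per-message compression
    rate that ramps down from the oldest message to the newest, while the
    average rate stays at the requested target."""
    if n == 1:
        return [target]
    # Linear ramp: 1.0 for the oldest message, 0.0 for the newest.
    raw = [1 - i / (n - 1) for i in range(n)]
    # Rescale so the mean equals the target compression rate.
    scale = target / (sum(raw) / n)
    return [min(1.0, r * scale) for r in raw]

# For 5 messages at target=0.5: oldest compressed fully, newest untouched.
print(annealed_rates(5, target=0.5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```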
API Reference
https://api.latence.ai/api/v1/compression/compress
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | — | — | Input text to compress |
| compression_rate | number | — | — | Target compression (0.0-1.0, e.g., 0.5 = remove 50%) |
| force_preserve_digit | boolean | — | — | Preserve numbers |
| apply_toon | boolean | — | — | Apply TOON encoding |
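For direct REST usage, the request body mirrors the parameter table above. The sketch below only builds the JSON payload; the `Authorization: Bearer` header and the exact auth scheme are assumptions, so check your account settings before posting.

```python
API_URL = "https://api.latence.ai/api/v1/compression/compress"

def build_compress_request(text, compression_rate=0.5,
                           force_preserve_digit=False, apply_toon=False):
    """Assemble the JSON body for the /compression/compress endpoint,
    using the parameter names from the table above."""
    return {
        "text": text,
        "compression_rate": compression_rate,
        "force_preserve_digit": force_preserve_digit,
        "apply_toon": apply_toon,
    }

payload = build_compress_request("Your long document or prompt text here...",
                                 compression_rate=0.5,
                                 force_preserve_digit=True)
# Then POST it, e.g. with requests:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": "Bearer YOUR_API_KEY"})
```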
Response Fields
| Field | Type | Description |
|---|---|---|
| compressed_text | string | The compressed output text |
| original_tokens | number | Token count before compression |
| compressed_tokens | number | Token count after compression |
| compression_ratio | number | Ratio of compressed to original tokens |
| tokens_saved | number | Difference between original and compressed token counts |
| toon_applied | boolean | Whether TOON encoding was applied |
| usage | object | Credits consumed by the request |
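The derived fields appear to follow simple arithmetic over the token counts, consistent with the response example below (this interpretation is inferred from the example values, not stated explicitly in the docs):

```python
# Values taken from the response example below.
original_tokens = 2089
compressed_tokens = 1385

# compression_ratio looks like compressed/original, rounded to 2 places.
compression_ratio = round(compressed_tokens / original_tokens, 2)

# tokens_saved is the plain difference.
tokens_saved = original_tokens - compressed_tokens

print(compression_ratio)  # 0.66
print(tokens_saved)       # 704
```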
Response Example
{
"compressed_text": "AI transforms information processing...",
"original_tokens": 2089,
"compressed_tokens": 1385,
"compression_ratio": 0.66,
"tokens_saved": 704,
"toon_applied": false,
"usage": { "credits": 1.0 }
}
https://api.latence.ai/api/v1/compression/compress_messages
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array | — | — | Array of {role, content} messages |
| target_compression | number | — | — | Target compression rate |
Response Fields
| Field | Type | Description |
|---|---|---|
| compressed_messages | array | Messages after compression |
| original_total_length | number | Total length before compression |
| compressed_total_length | number | Total length after compression |
| compression_percentage | number | Percentage of length removed |
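The percentage appears to be the relative reduction in total length, consistent with the response example below (inferred from the example values, not stated explicitly in the docs):

```python
# Values taken from the response example below.
original_total_length = 170
compressed_total_length = 67

# Relative reduction, expressed as a percentage rounded to 1 place.
compression_percentage = round(
    (original_total_length - compressed_total_length)
    / original_total_length * 100, 1
)

print(compression_percentage)  # 60.6
```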
Response Example
{
"compressed_messages": [
{"role": "system", "content": "AI assistant."},
{"role": "user", "content": "What is ML?"}
],
"original_total_length": 170,
"compressed_total_length": 67,
"compression_percentage": 60.6
}
Code Examples
from latence import Latence
client = Latence(api_key="YOUR_API_KEY")
# Compress text while preserving meaning
result = client.experimental.compression.compress(
text="Your long document or prompt text here...",
compression_rate=0.5, # Remove 50% of tokens
force_preserve_digit=True # Keep numbers intact
)
print(f"Original: {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Saved: {result.tokens_saved} tokens")
print(result.compressed_text)
# Or compress chat messages with gradient annealing
result = client.experimental.compression.compress_messages(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain machine learning..."},
{"role": "assistant", "content": "Machine learning is..."}
],
target_compression=0.5
)
print(f"Compression: {result.compression_percentage:.1f}%")
Explore Tutorials & Notebooks
Deep-dive examples and interactive notebooks in our GitHub repository
Looking for production-grade processing?
The Data Intelligence Pipeline chains services automatically and returns structured Data Packages.