x402 Chunking // Autonomous AI Agent Infrastructure

System.Chunker

Drop a file or click to upload. Supported formats: PDF, DOCX, TXT, RTF, HTML, MD. Paid tiers require a connected Solana wallet.

Pricing.Matrix

FREE
$0
  • Recursive splitting
  • 100 pages/day limit
  • Basic quality scoring
  • Character-based chunks
STRUCTURE
$0.002/pg
  • Code block preservation
  • Tables & lists intact
  • Smart sentence bounds
  • Abbreviation handling
SEMANTIC
$0.004/pg [BEST VALUE]
  • OpenAI embeddings
  • Topic detection
  • Similarity-based splits
  • Structure preservation
AI
$0.01/pg
  • Claude Haiku analysis
  • LLM boundary detection
  • Semantic understanding
  • Context-aware splits
AI PRO
$0.025/pg [RECOMMENDED]
  • Claude Sonnet analysis
  • Per-chunk summaries
  • Entity extraction
  • Enriched metadata

System.Info

>> Purpose: Split documents into chunks suitable for RAG pipelines and vector databases. LLMs have context limits; large documents need to be chunked before embedding or retrieval.

>> How it works: Upload PDF, DOCX, TXT, or HTML. Select a tier. Lower tiers use rule-based splitting. Higher tiers use OpenAI embeddings or Claude for boundary detection. AI Pro adds per-chunk summaries and entity extraction.

>> Payment: Free tier available (100 pages/day). Paid tiers require USDC on Solana via the x402 protocol. Payment proof (the transaction signature) is included in the X-PAYMENT request header after wallet signature.

Comparison

vs. LOCAL LIBS
LangChain, LlamaIndex
We handle structure detection. RecursiveCharacterTextSplitter doesn't preserve code blocks or tables.
vs. UNSTRUCTURED.IO
Similar features
Pay-per-use with crypto. No API keys or accounts needed. Different payment model.
vs. DIY
Build your own
We handle edge cases: abbreviations, decimal numbers, nested structures, quality scoring.
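For a sense of why these edge cases bite, here is a minimal Python sketch of abbreviation- and decimal-safe sentence splitting (illustrative only, not CHUNKER's implementation):

import re

TEXT = "Dr. Smith raised the price to $3.50 per unit. Contact Acme Inc. for details."

# Naive: split on every ". " -- produces fragments after "Dr." and "Inc."
naive = TEXT.split(". ")

# Safer: mask known abbreviations and decimal points before splitting.
ABBREV = ["Dr.", "Mr.", "Mrs.", "Ms.", "Inc.", "etc.", "e.g.", "i.e."]
masked = TEXT
for a in ABBREV:
    masked = masked.replace(a, a.replace(".", "<DOT>"))
masked = re.sub(r"(\d)\.(\d)", r"\1<DOT>\2", masked)  # protect 3.50
sentences = [s.replace("<DOT>", ".") for s in re.split(r"(?<=[.!?])\s+", masked)]
# -> ["Dr. Smith raised the price to $3.50 per unit.", "Contact Acme Inc. for details."]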
[NOTE] AI Pro tier uses Claude Sonnet to generate per-chunk summaries and extract entities. This adds context that helps retrieval, but increases cost and latency compared to lower tiers.

System.Benefits

API_BASED
REST API. Upload file, get chunks. No SDK required. Works with any language that can make HTTP requests.
PAY_PER_USE
USDC on Solana. No subscriptions. Free tier: 100 pages/day. Paid tiers: $0.002-$0.025 per page depending on features.
TIERED_PROCESSING
5 tiers: basic recursive, structure-aware, embedding-based semantic, Claude Haiku boundaries, Claude Sonnet with enrichment.
STRUCTURE_AWARE
Detects code blocks, tables, lists, headers. Avoids splitting mid-structure. Returns quality scores per chunk.

Features

// STRUCTURE DETECTION
Detects fenced code blocks, markdown/HTML tables, bullet and numbered lists. Avoids splitting these structures across chunks when possible.
// QUALITY SCORES
Each chunk receives a quality grade (A-D) based on length, sentence completeness, and structure integrity. Useful for filtering before embedding.
// SENTENCE BOUNDARIES
Handles common abbreviations (Dr., Mr., Inc., etc.) to avoid false sentence breaks. Splits at actual sentence endings when possible.
// JSON OUTPUT
Returns JSON with chunk text and metadata. Includes quality score, detected structures, and chunk index. Ready for your embedding pipeline.
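A typical consumer filters on the quality grade before embedding. A sketch against the response format documented under API.Usage below:

# Keep only A/B-grade chunks for the embedding pipeline.
resp = {
    "chunks": [
        {"text": "A complete, well-formed paragraph.", "metadata": {"quality_score": "A"}},
        {"text": "and a trailing fragment", "metadata": {"quality_score": "D"}},
    ]
}
good = [c["text"] for c in resp["chunks"]
        if c["metadata"]["quality_score"] in ("A", "B")]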

API.Usage

ENDPOINTS
GET / - API info and config
GET /health - Health check
POST /estimate - Get price estimate for file
POST /chunk/demo - Free tier (100 pages/day)
POST /chunk/structure - Structure tier ($0.002/page)
POST /chunk/semantic - Semantic tier ($0.004/page)
POST /chunk/ai - AI tier ($0.01/page)
POST /chunk/ai-pro - AI Pro tier ($0.025/page) [ASYNC]
GET /job/{id}/status - Poll async job status
GET /job/{id}/result - Get async job result
# 1. Get price estimate
curl -X POST https://api.chunker.cc/estimate -F "file=@doc.pdf"
# Returns: { "estimated_pages": 5, "pricing": { "structure": { "price_usd": 0.01, "price_usdc": 10000 }, ... } }
# 2. Free tier (no payment needed)
curl -X POST https://api.chunker.cc/chunk/demo -F "file=@doc.pdf"
# 3. Paid tiers: Send USDC payment first, then include TX signature
curl -X POST https://api.chunker.cc/chunk/structure \
  -H "X-PAYMENT: <solana-tx-signature>" \
  -F "[email protected]"
AI PRO ASYNC PROCESSING
AI Pro tier uses async processing for long-running LLM analysis. Instead of waiting, you receive a job_id immediately:

1. POST to /chunk/ai-pro → returns job_id
2. Poll /job/{id}/status for progress
3. Fetch /job/{id}/result when completed

Jobs expire after 1 hour. See API docs for details.
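In Python, the submit/poll/fetch cycle looks roughly like this (job_id and the "completed" status value follow the steps above; treat any other field names as assumptions):

import time
import requests

BASE = "https://api.chunker.cc"
tx_signature = "..."  # TX signature from your USDC payment

with open("doc.pdf", "rb") as f:
    job = requests.post(f"{BASE}/chunk/ai-pro",
                        headers={"X-PAYMENT": tx_signature},
                        files={"file": f}).json()
job_id = job["job_id"]

# Poll until the job completes; jobs expire after 1 hour.
while True:
    status = requests.get(f"{BASE}/job/{job_id}/status").json()
    if status.get("status") == "completed":
        break
    time.sleep(5)

result = requests.get(f"{BASE}/job/{job_id}/result").json()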
RESPONSE FORMAT
{
  "success": true,
  "tier": "structure",
  "total_chunks": 4,
  "chunks": [
    {
      "text": "chunk content...",
      "metadata": {
        "index": 0,
        "quality_score": "A",
        "has_code": false,
        "has_table": false
      }
    }
  ],
  "metadata": {
    "filename": "doc.pdf",
    "estimated_pages": 2
  }
}

FAQ.Database

What is document chunking? [+]
Splitting large documents into smaller pieces for AI processing. LLMs have context limits, so a 500-page PDF needs to be split into chunks before embedding or retrieval. The quality of chunking affects retrieval accuracy - chunks that cut off mid-sentence or split code blocks create problems for RAG systems.
Why not just run LangChain locally? [+]
You can. RecursiveCharacterTextSplitter splits on character count and doesn't detect document structure. Common issues:

Cuts code blocks mid-function
Splits abbreviations like "Dr. Smith"
Breaks tables into fragments
Scatters list items across chunks

Our Structure tier ($0.002/page) detects these structures and avoids splitting them. Whether that's worth paying for depends on your use case.
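You can see the character-count behavior yourself in a few lines (recent LangChain versions ship the splitter in the langchain-text-splitters package):

from langchain_text_splitters import RecursiveCharacterTextSplitter

doc = (
    "Intro paragraph explaining the API.\n\n"
    "```python\n"
    "def handler(request):\n"
    "    payload = request.json()\n"
    "    return process(payload)\n"
    "```\n\n"
    "A closing paragraph after the code."
)
splitter = RecursiveCharacterTextSplitter(chunk_size=60, chunk_overlap=0)
for chunk in splitter.split_text(doc):
    print(repr(chunk))
# The splitter counts characters and splits on generic separators
# ("\n\n", "\n", " "); nothing prevents a boundary from landing inside
# the fenced code block.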
What structures does CHUNKER preserve? [+]
Code Blocks: Fenced code (``` and ~~~) stays intact with language detection
Tables: Markdown and HTML tables kept as complete units
Lists: Bullet points, numbered lists, and nested items stay together
Headers: H1-H6, ALL CAPS, and numbered sections (Chapter 1, Section 2.3) are detected and tracked

Each chunk's metadata tells you exactly what structures it contains and which section of the document it belongs to.
What's the difference between tiers? [+]
Free ($0): Basic recursive splitting. 100 pages/day limit.

Structure ($0.002/page): Detects and preserves code blocks, tables, lists. Handles abbreviations. Recommended starting point.

Semantic ($0.004/page): Uses OpenAI text-embedding-3-small to find topic boundaries.

AI ($0.01/page): Claude Haiku analyzes text to identify semantic boundaries.

AI Pro ($0.025/page): Claude Sonnet for boundaries + generates per-chunk summaries and extracts entities.
What is quality scoring? [+]
Every chunk gets an A-D grade based on five factors:

Length: Optimal 400-1500 chars (not too short, not too long)
Completeness: Starts with capital letter, ends with punctuation
Coherence: Doesn't start mid-sentence ("and", "but", "however")
Structure: Code blocks are properly closed, not cut off
Density: Reasonable words-per-sentence ratio

Use quality scores to filter chunks before embedding, or flag low-quality chunks for manual review.
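A toy version of such a grader, to make the factors concrete (illustrative only; not the service's actual weights, and density is omitted for brevity):

def toy_quality_grade(chunk: str) -> str:
    score = 0
    score += 2 if 400 <= len(chunk) <= 1500 else 0                   # length
    score += 1 if chunk[:1].isupper() else 0                         # starts capitalized
    score += 1 if chunk.rstrip().endswith((".", "!", "?")) else 0    # ends with punctuation
    first = chunk.lstrip().lower()
    score += 1 if not first.startswith(("and ", "but ", "however ")) else 0  # coherence
    score += 1 if chunk.count("```") % 2 == 0 else 0                 # code fences balanced
    return {6: "A", 5: "B", 4: "C"}.get(score, "D")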
What file formats are supported? [+]
PDF - Text-based PDFs (not scanned images)
DOCX - Microsoft Word documents (including tables, headers, footers)
HTML - Web pages (tags stripped, content extracted)
TXT - Plain text files (UTF-8, Windows-1252, Latin-1)

Pages calculated at ~2500 characters per page.
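That makes cost estimates simple arithmetic, though POST /estimate is authoritative (this matches the 5-page, $0.01 Structure example under API.Usage):

import math

text_len = 12_000                               # characters of extracted text
pages = max(1, math.ceil(text_len / 2500))      # ~2500 chars/page -> 5 pages
structure_cost = pages * 0.002                  # $0.01 at the Structure tier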
How does payment work? [+]
We accept USDC on Solana - fast, cheap, no subscriptions.

1. Web UI: Connect Phantom wallet, select tier, approve USDC transfer. Done.

2. API: Call /estimate to get price, send USDC transfer on Solana, include TX signature in X-PAYMENT header.

Pay only for what you use. No monthly fees. No credit card required.
Can AI agents use this API programmatically? [+]
Yes. The API uses x402 payment protocol, which supports programmatic payments:

1. Call POST /estimate to get exact USDC cost
2. Execute Solana USDC transfer (requires wallet with signing capability)
3. Call chunking endpoint with TX signature in X-PAYMENT header

No API keys required. Payment verification happens on-chain. Free tier works without any payment for testing.
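Put together, an agent's end-to-end call might look like this sketch (send_usdc is a hypothetical placeholder for your wallet's transfer-and-sign step; the pricing field names follow the /estimate example under API.Usage):

import requests

BASE = "https://api.chunker.cc"

def chunk_document(path: str, tier: str = "structure") -> dict:
    # 1. Exact USDC cost for this file and tier.
    with open(path, "rb") as f:
        est = requests.post(f"{BASE}/estimate", files={"file": f}).json()
    price_micro_usdc = est["pricing"][tier]["price_usdc"]

    # 2. On-chain payment. send_usdc() is hypothetical -- substitute your
    #    wallet or SDK call that signs and submits the USDC transfer.
    tx_signature = send_usdc(price_micro_usdc)

    # 3. Chunk, presenting the TX signature as payment proof.
    with open(path, "rb") as f:
        return requests.post(f"{BASE}/chunk/{tier}",
                             headers={"X-PAYMENT": tx_signature},
                             files={"file": f}).json()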
Is my data stored or logged? [+]
No. Documents are processed in memory and immediately discarded. We don't store your files, chunks, or content. Only basic request metadata (timestamp, file size, wallet address) is logged for rate limiting and payment verification. Your documents never touch disk.