Independent validation: arXiv:2603.11021. Mathematically optimal compression applied to LLM weights. Not an approximation — a theorem.
All measurements were taken at equivalent model quality on Qwen 2.5-7B. Lower cosine drift and lower bits per weight are better.
| Method | Compression Ratio | Cosine Drift | Bits / Weight | Lattice Basis | Error Correction |
|---|---|---|---|---|---|
| HELIX L3 (our method) | 5.91× | <0.3% | 5.42 bpw | Proprietary geometric | ECC |
| QuIP# | 4:1 | ~1.2% | 4.0 bpw | Random lattice | None |
| QTIP | 3:1 | ~2.1% | 5.3 bpw | None | None |
| PVQ | 4:1 | ~1.8% | 4.0 bpw | Partial lattice | None |
| GPTQ | 4:1 | ~1.5% | 4.0 bpw | None | None |
| AWQ | 4:1 | ~1.3% | 4.0 bpw | None | None |
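The two numeric columns can be reproduced from first principles. The sketch below is illustrative only: it assumes "cosine drift" means one minus the cosine similarity between the original and reconstructed weight vectors (the page does not define the metric), and that compression ratio is baseline bits divided by bits per weight. The function names are hypothetical.

```python
import math

def cosine_drift(original, reconstructed):
    """Assumed definition: 1 - cosine similarity of the two weight vectors."""
    dot = sum(a * b for a, b in zip(original, reconstructed))
    norm_a = math.sqrt(sum(a * a for a in original))
    norm_b = math.sqrt(sum(b * b for b in reconstructed))
    return 1.0 - dot / (norm_a * norm_b)

def compression_ratio(bits_per_weight, baseline_bits=16):
    """Ratio of baseline storage to compressed storage, per weight."""
    return baseline_bits / bits_per_weight

# Toy weights and a slightly perturbed reconstruction (made-up values).
w = [0.12, -0.40, 0.33, 0.05]
w_hat = [0.125, -0.375, 0.3125, 0.0625]
print(f"drift = {cosine_drift(w, w_hat):.4%}")
print(compression_ratio(4.0))  # 4.0 against a 16-bit baseline
```

The ratio depends on the baseline precision, which the table does not state per row: 4.0 bpw gives 4:1 against a 16-bit baseline, while 5.42 bpw gives roughly 5.9× only against a 32-bit baseline.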
Measured on our hardware. WikiText-2 PPL (lower is better). Coherence = 5-question gold-standard suite (factual, math, code, reasoning, language). TPS = tokens/sec on CPU.
| Model | Format | PPL (WikiText-2) | Size | Coherence (5q) | TPS | Status |
|---|---|---|---|---|---|---|
| Llama 3.2 1B | FP16 baseline | 13.61 | 2.48 GB | 5/5 | 28 | Reference |
| Llama 3.2 1B | HELIX Fidelity+H (5.47 bpw) | 14.36 (+5.6%) | 1.31 GB | 5/5 PASS | 202 | Ships now, $19 |
| Qwen2.5 7B | FP16 baseline | — | ~16 GB | 5/5 | 6 | Reference |
| Qwen2.5 7B | Q4_K_M (llama.cpp SOTA) | — | 4.7 GB | 5/5 | 16 | Free |
| Qwen2.5 7B | HELIX Balanced (3.47 bpw) | n/a† | 5.88 GB | 5/5 PASS | 21 | Ships now, $49 |
| Phi-3.5 Mini 3.8B | Q4_0 baseline | — | 2.2 GB | 5/5 | 48 | Reference |
| Phi-3.5 Mini 3.8B | HELIX Fidelity+H (5.47 bpw) | — | 2.99 GB | 5/5 PASS | 14 | Ships now, $29 |
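For readers who want to check the PPL column, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below uses that standard definition; the helper names are hypothetical, and the page's own evaluation harness is not shown, so its tokenization and context settings are unknown.

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def relative_change(baseline, value):
    """Fractional change of `value` relative to `baseline`."""
    return (value - baseline) / baseline

# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 4))

# Relative PPL increase from the rounded Llama 3.2 1B figures in the table.
print(f"{relative_change(13.61, 14.36):+.1%}")
```

Because the table values are rounded to two decimals, the recomputed percentage can differ from the published one in the last digit.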
The math behind your model weights is theorem-level correct. Each advantage is a mathematical property, not an engineering choice. You cannot optimize your way past a theorem.
HELIX uses a formally verified high-dimensional codebook — the densest known sphere packing configuration in its target dimension, achieving provably optimal nearest-neighbor coverage. This is a theorem, not a benchmark.
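To make nearest-neighbor lattice quantization concrete, here is a minimal sketch using the E8 lattice as a stand-in. This is an assumption for illustration: the page does not name its lattice or its dimension, and HELIX's codebook is proprietary. E8 is chosen only because it is a well-known lattice whose packing is proven optimal in 8 dimensions and whose nearest-point search is simple (the classic Conway and Sloane construction, E8 = D8 ∪ (D8 + ½)).

```python
def nearest_dn(x):
    """Nearest point of D_n: integer vectors whose coordinates sum to an even number."""
    rounded = [round(v) for v in x]
    if sum(rounded) % 2 == 0:
        return rounded
    # Odd sum: re-round the coordinate with the largest rounding error the other way.
    errors = [v - r for v, r in zip(x, rounded)]
    k = max(range(len(x)), key=lambda i: abs(errors[i]))
    rounded[k] += 1 if errors[k] > 0 else -1
    return rounded

def squared_distance(x, p):
    return sum((a - b) ** 2 for a, b in zip(x, p))

def nearest_e8(x):
    """Nearest E8 point: the closer of the best D8 point and the best D8 + 1/2 point."""
    if len(x) != 8:
        raise ValueError("E8 quantization needs 8-dimensional input")
    integer_candidate = nearest_dn(x)
    half_candidate = [p + 0.5 for p in nearest_dn([v - 0.5 for v in x])]
    if squared_distance(x, integer_candidate) <= squared_distance(x, half_candidate):
        return integer_candidate
    return half_candidate

print(nearest_e8([0.1] * 8))   # [0, 0, 0, 0, 0, 0, 0, 0]
print(nearest_e8([0.45] * 8))  # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```

In a real weight quantizer the weights would be scaled and grouped into 8-dimensional blocks before this step, and the lattice point would be stored as a compact index; both are omitted here.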
Our ECC corrects up to 3 errors per codeword. Every compressed weight is self-healing: single-bit flips, memory errors, and transmission noise are corrected automatically at decode time. No other quantization method in the comparison table offers error correction.
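The "up to 3 errors per codeword" figure follows the usual coding-theory rule: a code with minimum Hamming distance d corrects t = ⌊(d − 1) / 2⌋ errors under nearest-codeword decoding, so d of 7 or 8 gives t = 3. The sketch below demonstrates this with a toy length-7 repetition code. It is not HELIX's ECC, which the page does not specify; the codebook and function names are hypothetical.

```python
from itertools import combinations

# Toy code with minimum distance 7 (assumption: stands in for the real ECC).
CODEBOOK = ["0000000", "1111111"]

def hamming_distance(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(codebook):
    """Smallest pairwise Hamming distance in the code."""
    return min(hamming_distance(a, b) for a, b in combinations(codebook, 2))

def correctable_errors(d):
    """Errors guaranteed correctable by nearest-codeword decoding."""
    return (d - 1) // 2

def decode(received, codebook):
    """Return the codeword closest to the received word."""
    return min(codebook, key=lambda c: hamming_distance(c, received))

d = min_distance(CODEBOOK)
print(d, correctable_errors(d))     # 7 3
print(decode("1010110", CODEBOOK))  # 1111111 (three flipped bits repaired)
```

A fourth flipped bit would push the received word closer to the wrong codeword, which is why the guarantee stops at t errors.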
HELIX Fidelity is a one-time download that decompresses to a standard Q8_0 GGUF file. The compression algorithm runs on Cloudflare Workers; the resulting file works locally with Ollama, llama.cpp, and LM Studio, with no cloud dependency at inference time. Independent validation: arXiv:2603.11021, which validates geometric vector quantization achieving provably optimal coverage in its target dimension. A parity check runs on every batch for integrity assurance.
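After decompression, a quick sanity check is to confirm the output really is a GGUF file before handing it to a local runtime. The sketch below reads the fixed-size GGUF header: the 4-byte magic `GGUF`, a little-endian uint32 version, then uint64 tensor and metadata counts. It assumes GGUF version 2 or later (version 1 used 32-bit counts); the function name and the file path are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(path):
    """Read the fixed GGUF header fields (assumes GGUF v2+ with 64-bit counts)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }

# Usage (hypothetical path to the decompressed model):
# print(read_gguf_header("model-q8_0.gguf"))
```

This only checks the container header. It does not validate tensor data or confirm the Q8_0 quantization type, which is recorded in the per-tensor metadata further into the file.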
Every compression and decompression operation follows a deterministic path with mathematical guarantees at each stage.
Compress your LLM weights with HELIX's proprietary geometric compression. Sub-50ms on Cloudflare Workers. Pay per request with USDC.