🔬 Ablation Study Details
Component Analysis:
The ablation study systematically tests each component's contribution to achieving 450× compression:
- Stage 1 (Permanent Eviction): Tests SnapKV++ and magnitude-guided token selection
- Stage 2 (Multi-dimensional): Tests hybrid sparse attention and head compression
- Precision Levels: Compares aggressive INT4 floor vs conservative FP16/INT8
- Magnitude Thresholds: Tests extreme (0.1%) vs conservative (1%) thresholds
- Position Awareness: Tests impact of recent window and sink token protection
- Head Selection: Tests reserved FP16 heads for critical attention patterns
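One way to organize such a study is a leave-one-out grid: start from the full configuration and change one component at a time. The sketch below is illustrative only. `AblationConfig`, its field names, and `leave_one_out` are hypothetical and are not taken from the project's code, and using INT8 as the conservative precision setting is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical switches, one per component in the list above."""
    stage1_eviction: bool = True         # Stage 1: permanent eviction
    stage2_multidim: bool = True         # Stage 2: sparse attention + head compression
    precision_floor: str = "int4"        # aggressive floor; "int8" is used as the conservative setting
    magnitude_threshold: float = 0.001   # 0.1% (extreme); 0.01 is the 1% conservative setting
    position_aware: bool = True          # recent window + sink token protection
    reserved_fp16_heads: bool = True     # keep selected heads in FP16


def leave_one_out(base: AblationConfig) -> dict:
    """Return the full configuration plus one variant per changed component."""
    return {
        "full": base,
        "no_stage1": replace(base, stage1_eviction=False),
        "no_stage2": replace(base, stage2_multidim=False),
        "conservative_precision": replace(base, precision_floor="int8"),
        "conservative_threshold": replace(base, magnitude_threshold=0.01),
        "no_position_awareness": replace(base, position_aware=False),
        "no_reserved_heads": replace(base, reserved_fp16_heads=False),
    }


variants = leave_one_out(AblationConfig())
for name, cfg in variants.items():
    print(name, cfg)
```

Changing a single field per variant keeps each measured difference attributable to one component, at the cost of not capturing interactions between components.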
Metrics Evaluated:
- Compression ratio achievement
- Generation perplexity degradation
- Memory reduction percentage
- Decode speedup factor
- End-to-end throughput gain
- Component importance ranking
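The metrics above reduce to a few ratios. The following is a minimal sketch under assumed definitions (degradation as a relative perplexity increase, importance as the perplexity rise when a component is removed). The function names are hypothetical and the project's own definitions may differ.

```python
import math


def compression_ratio(baseline_bytes: float, compressed_bytes: float) -> float:
    """Uncompressed cache size divided by compressed cache size."""
    return baseline_bytes / compressed_bytes


def memory_reduction_pct(ratio: float) -> float:
    """Percentage of memory saved for a given compression ratio."""
    return (1.0 - 1.0 / ratio) * 100.0


def perplexity(token_nlls: list) -> float:
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def perplexity_degradation_pct(baseline_ppl: float, compressed_ppl: float) -> float:
    """Relative perplexity increase caused by compression, in percent."""
    return (compressed_ppl / baseline_ppl - 1.0) * 100.0


def speedup(baseline_seconds: float, compressed_seconds: float) -> float:
    """Decode speedup factor; the same ratio applies to tokens/sec throughput."""
    return baseline_seconds / compressed_seconds


def rank_components(ppl_by_variant: dict, full_key: str = "full") -> list:
    """Order ablated variants by how much perplexity rises versus the full config."""
    base = ppl_by_variant[full_key]
    deltas = {k: v - base for k, v in ppl_by_variant.items() if k != full_key}
    return sorted(deltas, key=deltas.get, reverse=True)


# A 450x ratio corresponds to roughly 99.78% memory reduction.
print(f"{memory_reduction_pct(450):.2f}%")
```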
📬 GPT-Neo Architecture Details
Model Specifications:
- GPT-Neo 125M: 12 layers, 768 hidden dim, 12 heads
- GPT-Neo 1.3B: 24 layers, 2048 hidden dim, 16 heads
- GPT-Neo 2.7B: 32 layers, 2560 hidden dim, 20 heads
- Maximum Context: 2048 tokens
Memory Requirements:
- 125M: Minimum 1GB VRAM
- 1.3B: Minimum 6GB VRAM
- 2.7B: Minimum 12GB VRAM (16GB+ recommended)
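For a sense of how much of that memory the uncompressed KV cache takes, its size can be estimated from the model specifications above. This sketch assumes FP16 storage, batch size 1, and that every layer caches keys and values for the full 2048-token context. GPT-Neo alternates global and local attention layers, so treat the result as an upper bound. `kv_cache_bytes` is a hypothetical helper.

```python
# Layer count and hidden size per model, as listed in the specifications.
GPT_NEO = {
    "125M": {"layers": 12, "hidden": 768},
    "1.3B": {"layers": 24, "hidden": 2048},
    "2.7B": {"layers": 32, "hidden": 2560},
}
MAX_CONTEXT = 2048


def kv_cache_bytes(layers, hidden, seq_len=MAX_CONTEXT, bytes_per_value=2, batch=1):
    """Keys and values: two tensors of shape [batch, seq_len, hidden] per layer."""
    return 2 * layers * batch * seq_len * hidden * bytes_per_value


for name, spec in GPT_NEO.items():
    mib = kv_cache_bytes(spec["layers"], spec["hidden"]) / 2**20
    print(f"GPT-Neo {name}: {mib:.0f} MiB")
# GPT-Neo 125M: 72 MiB
# GPT-Neo 1.3B: 384 MiB
# GPT-Neo 2.7B: 640 MiB
```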
📦 Proving Protocol Features
Attestable Proof Bundle (.zip) contains:
- Full environment and configuration
- Per-sample raw measurements
- Layer-level compression fingerprints
- Exact package versions for reproducibility
- Ablation study results (if enabled)
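A bundle of this kind could be assembled with the standard library alone. The sketch below is an assumption about layout, not the project's actual format. The file names (`environment.json`, `config.json`, `records.jsonl`, `manifest.json`) and the `build_bundle` function are hypothetical, and the SHA-256 manifest is an illustrative addition.

```python
import hashlib
import importlib.metadata
import io
import json
import platform
import sys
import zipfile


def build_bundle(config: dict, records: list, out) -> dict:
    """Write environment, config, and raw records to a zip; return a hash manifest."""
    # Record installed package versions exactly as the interpreter reports them.
    packages = {
        d.metadata["Name"]: d.version
        for d in importlib.metadata.distributions()
        if d.metadata["Name"]
    }
    env = {"python": sys.version, "platform": platform.platform(), "packages": packages}
    files = {
        "environment.json": json.dumps(env, indent=2),
        "config.json": json.dumps(config, indent=2, sort_keys=True),
        # One JSON object per line so each sample's raw measurement stays separate.
        "records.jsonl": "".join(json.dumps(r, sort_keys=True) + "\n" for r in records),
    }
    manifest = {
        name: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for name, body in files.items()
    }
    files["manifest.json"] = json.dumps(manifest, indent=2, sort_keys=True)
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, body in files.items():
            zf.writestr(name, body)
    return manifest


# `out` may be a file path or any writable binary file object.
buffer = io.BytesIO()
manifest = build_bundle({"model": "gpt-neo-125M"}, [{"sample": 0, "ratio": 450.0}], buffer)
```

Storing per-file hashes alongside the raw records lets a later check detect any edit to the bundle contents.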
Verification:
- Recomputes summary from raw records
- Validates that the target compression ratio was achieved
- Checks numerical tolerances
- Hard-fails in CI if verification fails
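The verification steps above can be sketched as a function that recomputes the summary from the raw records and reports every mismatch. This is a minimal illustration under assumptions: the record key `compression_ratio`, the summary key `mean_compression_ratio`, the use of a mean as the summary statistic, and the default tolerance are all hypothetical.

```python
import math
import statistics
import sys


def verify(records: list, summary: dict, target_ratio: float, rel_tol: float = 1e-6) -> list:
    """Return a list of failure messages; an empty list means verification passed."""
    failures = []
    # Recompute the summary statistic from raw per-sample records.
    recomputed = statistics.fmean(r["compression_ratio"] for r in records)
    reported = summary["mean_compression_ratio"]
    # Numerical tolerance check between reported and recomputed values.
    if not math.isclose(recomputed, reported, rel_tol=rel_tol):
        failures.append(f"summary mismatch: reported {reported}, recomputed {recomputed}")
    # Check that the target compression ratio was actually reached.
    if recomputed < target_ratio:
        failures.append(f"ratio {recomputed:.1f} is below target {target_ratio:.1f}")
    return failures


def main(records: list, summary: dict, target_ratio: float) -> int:
    """Print failures to stderr and return a process exit code (0 = pass)."""
    failures = verify(records, summary, target_ratio)
    for message in failures:
        print("VERIFY FAIL:", message, file=sys.stderr)
    return 1 if failures else 0
```

In a CI job, calling `sys.exit(main(records, summary, target_ratio))` makes any failed check terminate the job with a non-zero status.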
Together, the proof bundle and the verification step are intended to make results on GPT-Neo models reproducible at the full 2048-token context, including the per-component ablation analysis.