🔬 GPT-Neo Architecture Details
Model Specifications:
- GPT-Neo 125M: 12 layers, 768 hidden dim, 12 heads
- GPT-Neo 1.3B: 24 layers, 2048 hidden dim, 16 heads
- GPT-Neo 2.7B: 32 layers, 2560 hidden dim, 20 heads
- Maximum Context: 2048 tokens
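These figures can be cross-checked against the published configurations on the Hugging Face Hub. A minimal sketch, assuming the `transformers` package is installed and the standard EleutherAI repo IDs:

```python
# Cross-check of the specification table above against the published configs.
# Assumes the `transformers` package and Hugging Face Hub access.
from transformers import AutoConfig

MODELS = {
    "EleutherAI/gpt-neo-125m": (12, 768, 12),
    "EleutherAI/gpt-neo-1.3B": (24, 2048, 16),
    "EleutherAI/gpt-neo-2.7B": (32, 2560, 20),
}

for name, (layers, hidden, heads) in MODELS.items():
    cfg = AutoConfig.from_pretrained(name)
    assert (cfg.num_layers, cfg.hidden_size, cfg.num_heads) == (layers, hidden, heads)
    assert cfg.max_position_embeddings == 2048  # full 2048-token context
    print(f"{name}: {cfg.num_layers} layers, {cfg.hidden_size} dim, {cfg.num_heads} heads")
```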
Memory Requirements:
- 125M: Minimum 1GB VRAM
- 1.3B: Minimum 6GB VRAM
- 2.7B: Minimum 12GB VRAM (16GB+ recommended)
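These minimums roughly track a back-of-envelope FP16 estimate: 2 bytes per parameter for the weights plus an allowance for activations and the KV cache, with extra headroom for buffers and fragmentation. A rough sketch (the 30% overhead figure is an assumption, not a measured value):

```python
# Back-of-envelope FP16 memory estimate: 2 bytes per weight plus a coarse
# activation/KV-cache allowance. The 30% overhead is an assumption, and the
# minimums listed above include further headroom beyond this estimate.
def fp16_vram_estimate_gb(n_params: float, overhead: float = 0.30) -> float:
    weight_bytes = n_params * 2  # FP16 stores each parameter in 2 bytes
    return weight_bytes * (1 + overhead) / 1024**3

for label, n_params in [("125M", 125e6), ("1.3B", 1.3e9), ("2.7B", 2.7e9)]:
    print(f"GPT-Neo {label}: ~{fp16_vram_estimate_gb(n_params):.1f} GB weights + overhead")
```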
Optimal Datasets for GPT-Neo:
- WikiText: Clean Wikipedia articles
- OpenWebText: Open-source recreation of GPT-2's WebText training data
- The Pile: 800GB diverse text corpus (GPT-Neo's original training data)
- C4: Colossal Clean Crawled Corpus
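Any of these can be pulled through the Hugging Face `datasets` library. A minimal sketch that streams a few WikiText-103 records (the dataset name and `text` field follow the public Hub entry):

```python
# Illustrative only: stream a small slice of WikiText-103 with the `datasets`
# library as GPT-Neo fine-tuning or evaluation text.
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-103-raw-v1",
                        split="train", streaming=True)
shown = 0
for record in wikitext:
    text = record["text"].strip()
    if text:
        print(text[:80])
        shown += 1
    if shown >= 5:
        break
```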
Compression Adjustments for GPT-Neo:
- Stage compression ratios adjusted for the GPT-Neo architecture
- Recent-token window sized to each model's layer count
- Reserved FP16 heads tuned per model size
- Extra memory cleanup for the 2.7B model
- Full 2048-token context support
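The exact knobs live in the project's configuration; as a purely hypothetical illustration of how the per-model adjustments above could be expressed, all field names and values below are made up:

```python
# Hypothetical configuration sketch: field names and preset values are
# illustrative placeholders, not the project's actual API or measured settings.
from dataclasses import dataclass

@dataclass
class NeoCompressionConfig:
    stage_ratios: tuple        # per-stage compression ratios, tuned to the architecture
    recent_window: int         # uncompressed recent-token window, scaled with layer count
    reserved_fp16_heads: int   # attention heads kept in full FP16 precision
    aggressive_cleanup: bool   # extra memory cleanup, used for the 2.7B model
    max_context: int = 2048    # full GPT-Neo context length

PRESETS = {
    "gpt-neo-125m": NeoCompressionConfig((4, 8), 128, 2, False),
    "gpt-neo-1.3B": NeoCompressionConfig((4, 8, 16), 192, 2, False),
    "gpt-neo-2.7B": NeoCompressionConfig((4, 8, 16), 256, 4, True),
}
```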
📦 Proving Protocol Features
Attestable Proof Bundle (.zip) contains:
- Full environment and configuration
- Per-sample raw measurements
- Layer-level compression fingerprints
- Exact package versions for reproducibility
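A minimal sketch of how such a bundle could be assembled; the file names and layout are assumptions for illustration, not the project's actual bundle format:

```python
# Sketch of writing an attestable proof bundle as a .zip (assumed layout).
import json
import platform
import zipfile
import importlib.metadata as md

def write_proof_bundle(path, config, records, fingerprints):
    # Full environment snapshot: interpreter, platform, exact package versions.
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {d.metadata["Name"]: d.version for d in md.distributions()},
    }
    # Summary that the verifier later recomputes from the raw per-sample records.
    summary = {
        "mean_compression_ratio":
            sum(r["compression_ratio"] for r in records) / len(records),
        "num_samples": len(records),
    }
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("environment.json", json.dumps(env, indent=2))
        zf.writestr("config.json", json.dumps(config, indent=2))
        zf.writestr("records.jsonl", "\n".join(json.dumps(r) for r in records))
        zf.writestr("fingerprints.json", json.dumps(fingerprints, indent=2))
        zf.writestr("summary.json", json.dumps(summary, indent=2))
```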
Verification:
- Recomputes summary from raw records
- Validates compression ratio achievement
- Checks numerical tolerances
- Hard-fails in CI if verification fails
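A minimal sketch of the verification pass under the same assumed bundle layout as above: recompute the summary from the raw records, compare within a tolerance, check the compression-ratio target, and exit non-zero so a CI job goes red on mismatch (field names and the target ratio are assumptions):

```python
# Sketch of bundle verification; the 8.0x target and field names are assumed.
import json
import sys
import zipfile

def verify_bundle(path, ratio_target=8.0, tol=1e-6):
    with zipfile.ZipFile(path) as zf:
        lines = zf.read("records.jsonl").decode("utf-8").splitlines()
        records = [json.loads(line) for line in lines if line.strip()]
        summary = json.loads(zf.read("summary.json"))
    # Recompute the summary statistic from the raw per-sample measurements.
    recomputed = sum(r["compression_ratio"] for r in records) / len(records)
    matches_summary = abs(recomputed - summary["mean_compression_ratio"]) <= tol
    hits_target = recomputed >= ratio_target
    return matches_summary and hits_target

if __name__ == "__main__":
    if not verify_bundle(sys.argv[1]):
        sys.exit("proof bundle verification failed")  # non-zero exit hard-fails CI
```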
This ensures research-grade reproducibility for GPT-Neo models at the full 2048-token context.