autoresearch-unified

Autonomous LLM-driven hyperparameter optimization for GPU pretraining research. Claude designs experiments, trains GPT-2-scale models, and decides what to try next.

Benchmark Leaderboard
Ranking LLMs as autonomous ML researchers by keep rate, crash rate, and best validation bits-per-byte across GPUs and datasets.
Live Data
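The three leaderboard metrics can be sketched in a few lines. This is a minimal illustration, not the project's own scoring code: the `Experiment` record, the status labels ("keep", "discard", "crash"), and the `summarize` helper are hypothetical names, and it assumes keep rate and crash rate are simple fractions of all experiments run. Bits-per-byte uses the standard conversion from summed cross-entropy in nats to bits, divided by the number of bytes evaluated.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical record layout; the real dataset schema may differ.
    status: str                      # assumed labels: "keep", "discard", "crash"
    val_bpb: Optional[float] = None  # None when the run crashed before evaluation


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats) over a text into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def summarize(experiments: list[Experiment]) -> dict:
    """Compute keep rate, crash rate, and best (lowest) validation bpb."""
    n = len(experiments)
    if n == 0:
        raise ValueError("no experiments to summarize")
    kept = sum(e.status == "keep" for e in experiments)
    crashed = sum(e.status == "crash" for e in experiments)
    scores = [e.val_bpb for e in experiments if e.val_bpb is not None]
    return {
        "keep_rate": kept / n,
        "crash_rate": crashed / n,
        "best_val_bpb": min(scores) if scores else None,
    }


if __name__ == "__main__":
    runs = [
        Experiment("keep", 1.10),
        Experiment("discard", 1.20),
        Experiment("crash"),
        Experiment("keep", 1.05),
    ]
    print(summarize(runs))
```

Lower bits-per-byte is better, so the best score is taken as the minimum. Normalizing by bytes rather than tokens keeps results comparable across tokenizers.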
Interactive Course
Learn how the entire system works: the experiment loop, crash resilience, and cross-platform training. No coding experience needed.
7 Modules
Experiment Dataset
Full experimental data published on Hugging Face. Croissant-compliant and indexed on Google Dataset Search.
2,637+ Experiments
Documentation
Platform guides, dataset results, cross-platform analysis, and data access instructions.
16 Pages
Source Code
Training scripts, experiment orchestration, and tooling for NVIDIA, AMD, Apple, and Intel platforms.
Open Source