autoresearch-unified

Autonomous LLM-driven hyperparameter optimization for GPU pretraining research. Claude designs experiments, trains GPT-2-scale models, and decides what to try next.

Benchmark Leaderboard

Ranking LLMs as autonomous ML researchers by keep rate, crash rate, and best validation bits-per-byte across GPUs and datasets.
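As a rough illustration of the three ranking metrics, the sketch below computes them from a list of experiment records. The record layout, the status labels ("keep", "discard", "crash"), and the choice to divide by all experiments are assumptions made for this example, not the project's actual schema or definitions. The helper that converts a per-token loss in nats to bits-per-byte uses the standard conversion.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical record layout; the real dataset schema may differ.
    status: str                      # assumed labels: "keep", "discard", "crash"
    val_bpb: Optional[float] = None  # validation bits-per-byte, None if the run crashed


def nats_per_token_to_bpb(loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) to bits per byte."""
    total_bits = loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes


def summarize(experiments: list[Experiment]) -> dict:
    """Compute keep rate, crash rate, and best (lowest) validation bpb."""
    n = len(experiments)
    if n == 0:
        raise ValueError("no experiments to summarize")
    kept = sum(e.status == "keep" for e in experiments)
    crashed = sum(e.status == "crash" for e in experiments)
    # Only runs that finished and reported a score count toward the best value.
    scores = [e.val_bpb for e in experiments
              if e.status != "crash" and e.val_bpb is not None]
    return {
        "keep_rate": kept / n,        # assumption: fraction of all experiments kept
        "crash_rate": crashed / n,    # assumption: fraction of all experiments that crashed
        "best_val_bpb": min(scores) if scores else None,
    }


if __name__ == "__main__":
    runs = [
        Experiment("keep", 1.10),
        Experiment("discard", 1.25),
        Experiment("crash"),
        Experiment("keep", 1.05),
    ]
    print(summarize(runs))
```

Lower bits-per-byte is better, so the best score is the minimum. Normalizing by bytes rather than tokens keeps scores comparable across tokenizers.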

Interactive Course

Learn how the entire system works: the experiment loop, crash resilience, and cross-platform training. No coding experience needed.

Experiment Dataset

Full experimental data published on HuggingFace. Croissant-compliant, indexed on Google Dataset Search.

2,637+ Experiments

Platform guides, dataset results, cross-platform analysis, and data access instructions.

Training scripts, experiment orchestration, and tooling for NVIDIA, AMD, Apple, and Intel platforms.