|src: openai.com
PaperBench: Evaluating AI’s Ability to Replicate AI Research
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.