why-we-no-longer-evaluate-swe-bench-verified.log
|src: openai.com

Why we no longer evaluate SWE-bench Verified

SWE-bench Verified is increasingly contaminated and no longer accurately measures frontier coding progress. Our analysis finds flawed tests and evidence of benchmark data leaking into training sets. We recommend SWE-bench Pro instead.