SWE-bench

Official Leaderboards: Bash Only, Verified, Lite, Full, Multimodal

SWE-bench Bash Only uses the SWE-bench Verified dataset with the mini-SWE-agent environment for all models [Post].
SWE-bench Lite is a subset curated for less costly evaluation [Post].
SWE-bench Verified is a human-filtered subset [Post].
SWE-bench Multimodal features issues with visual elements [Post].

Each entry reports the % Resolved metric, the percentage of instances solved (out of 2294 Full, 500 Verified & Bash Only, 300 Lite, 517 Multimodal).

News
[11/2025] Introducing CodeClash, our new eval of LMs as goal (not task) oriented developers! [Link]
[07/2025] mini-SWE-agent scores 65% on SWE-bench Verified in 100 lines of Python code. [Link]
[05/2025] SWE-smith is out! Train your own models for software engineering agents. [Link]
[03/2025] SWE-agent 1.0 is the open source SOTA on SWE-bench Lite! [Link]
[10/2024] Introducing SWE-bench Multimodal! [Link]
[08/2024] SWE-bench x OpenAI = SWE-bench Verified [Report]
[06/2024] Dockerized SWE-bench for easier evaluation [Report]
[03/2024] Check out SWE-agent (12.47% on SWE-bench) [Link]
[03/2024] Released SWE-bench Lite [Report]

Acknowledgements
We thank the following institutions for their generous support: Open Philanthropy, AWS, Modal, Andreessen Horowitz, OpenAI, and Anthropic.

© 2025 SWE-bench Team. All rights reserved.
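
As a worked illustration of the % Resolved metric defined in the leaderboard description, the following minimal Python sketch computes the percentage from a count of resolved instances. The split sizes come from the text above; the function name `percent_resolved`, the `SPLIT_SIZES` table, and the count of 325 are hypothetical and used for illustration only. This is not the official evaluation harness.

```python
# Instance counts per split, as stated in the leaderboard description.
SPLIT_SIZES = {
    "Full": 2294,
    "Verified": 500,
    "Bash Only": 500,
    "Lite": 300,
    "Multimodal": 517,
}


def percent_resolved(resolved: int, split: str) -> float:
    """Return the percentage of instances resolved for a split (hypothetical helper)."""
    total = SPLIT_SIZES[split]
    if not 0 <= resolved <= total:
        raise ValueError(f"resolved must be between 0 and {total} for {split}")
    return 100 * resolved / total


# Illustrative count only: 325 of the 500 Verified instances.
print(percent_resolved(325, "Verified"))  # 65.0
```

Because the denominators differ across splits, the same number of resolved instances yields different percentages on, for example, Lite (300) and Full (2294), so scores are only comparable within one leaderboard.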