Measuring AI Ability to Complete Long Tasks

(metr.org)

246 points | by spicypete 7 days ago

195 comments