WebBench benchmark: agents fail 53% of write tasks | Collab365 Spaces