Claude Code and Codex are my daily drivers, and I often run them side by side on the same task to compare. For me, Claude Code gets something working faster, but I burn through the 5h quota on the $200 Max plan really fast. I tend to trust Codex a bit more for careful diffs. The bigger change for me wasn't the tool, though; it was writing a spec doc first (features/UX, technical, language-specific) before either agent touches code, then reviewing every diff. I still set up a lot of the harness by hand. How are people automating worktrees and parallel sessions?
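One minimal way to script the worktree part, assuming plain `git worktree` with one new branch and one checkout per task. The task names, the `agent/<task>` branch prefix, the sibling `<repo>-worktrees` directory, and the `main` base branch are all hypothetical choices for illustration, not anything the tools require:

```python
import subprocess
from pathlib import Path


def worktree_commands(repo: Path, tasks: list[str], base: str = "main") -> list[list[str]]:
    """Build one `git worktree add -b <branch> <path> <base>` command per task.

    Hypothetical layout: each task gets its own `agent/<task>` branch and its own
    checkout under a sibling `<repo>-worktrees/` directory, so parallel agent
    sessions never share a working tree.
    """
    root = repo.parent / f"{repo.name}-worktrees"
    commands = []
    for task in tasks:
        path = root / task
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", f"agent/{task}", str(path), base]
        )
    return commands


def create_worktrees(repo: Path, tasks: list[str], base: str = "main") -> None:
    """Run the commands; raises CalledProcessError if git rejects any of them."""
    for cmd in worktree_commands(repo, tasks, base):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in worktree_commands(Path("/src/myapp"), ["fix-login", "add-export"]):
        print(" ".join(cmd))
```

Each checkout can then get its own agent session in a separate terminal, and `git worktree remove <path>` cleans it up once the branch is merged. Keeping command construction separate from execution makes it easy to dry-run before touching the repo.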
I've been using Claude Code for a few months now ($100/mo plan). I was using Codex before that, both via VS Code. Initially the usage limits for Claude were pretty awful; I would routinely exhaust the 5-hour window with, say, three different chats/tasks. It has gotten much better in the last month or so, though I'm not sure whether they increased the usage limits (maybe alongside the ridiculous amount of downtime seemingly getting better?) or something else changed. So far I haven't noticed a big difference between Opus 4.8 and Fable for coding; both still need a good amount of babysitting, even on a smallish repo.
Also see https://model.reviews, which I launched yesterday: an attempt to provide a forum for subjective, task-by-task ratings.
Letta Code. It’s a much more coworker-like experience because it can learn, but it also performs very well for coding, and the harness can be extended like pi.
(disclaimer: I work on Letta Code)
Doesn't the memory in Letta get very expensive, given that LLMs are stateless and the memory context has to be sent again with every request?
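The concern boils down to simple arithmetic: if a fixed block of memory is prepended to every request, its cost grows linearly with the number of requests. A back-of-the-envelope sketch, assuming the whole memory block is re-sent each time and billed at the full input rate (it ignores prompt caching, which many providers offer and which would lower the figure, and it makes no claim about how Letta actually assembles context). The token counts and the per-million-token price are hypothetical:

```python
def extra_memory_tokens(memory_tokens: int, requests: int) -> int:
    """Input tokens spent re-sending memory, assuming it rides along on every request."""
    return memory_tokens * requests


def extra_cost_usd(memory_tokens: int, requests: int, usd_per_million_input: float) -> float:
    """Dollar cost of that overhead at a flat input price (no caching discount)."""
    return extra_memory_tokens(memory_tokens, requests) * usd_per_million_input / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 8k tokens of memory, 500 requests, $3 per million input tokens.
    print(extra_memory_tokens(8_000, 500))  # 4000000
    print(extra_cost_usd(8_000, 500, 3.0))  # 12.0
```

Whether that overhead matters depends on how large the memory block is relative to the rest of each prompt, which in a coding session is often dominated by file contents anyway.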
Is it true that there is not much difference between free and affordable models? And that unless you are spending $2,000 per month, you are not really getting the most out of the industry-standard coding agents?
Anecdotally, yes, there is definitely a difference. Even, e.g., Haiku (the cheapest Anthropic model) vs. gpt-oss-120b showed a big difference in quality and syntax errors when I was testing them for DSL generation. Granted, that's a little different from generating a popular language with lots of training data, but you could consider it a proxy for "learning" new concepts outside of training.
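If you want to put a number on the syntax side of a comparison like this, one simple approach is to run each model's outputs through the DSL's parser and count rejections. A minimal sketch, assuming you have some parser function that raises on invalid input; JSON and `json.loads` stand in for the DSL here purely for illustration, and the sample outputs are made up:

```python
import json
from typing import Callable, Iterable


def syntax_error_rate(outputs: Iterable[str], parse: Callable[[str], object]) -> float:
    """Fraction of generated outputs that `parse` rejects by raising an exception."""
    items = list(outputs)
    if not items:
        return 0.0
    failures = 0
    for text in items:
        try:
            parse(text)
        except Exception:  # any parser error counts as a syntax failure
            failures += 1
    return failures / len(items)


if __name__ == "__main__":
    # Hypothetical outputs from one model; two of the four are invalid JSON.
    sample = ['{"a": 1}', '{"a": }', '[1, 2]', "{'a': 1}"]
    print(syntax_error_rate(sample, json.loads))  # 0.5
```

This only captures whether the output parses, not whether it is correct, so it complements rather than replaces reading the results.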
Shameless plug: I use Codex to build my coding agent, VT Code (https://github.com/vinhnx/vtcode).
Claude Code and OpenCode with various models, with Codex thrown in here and there for good measure and when I hit limits elsewhere.
Both CC (mostly for pet projects and automation) and Cursor (mostly at work, because I still read the code, work with Python notebooks, etc.).
Claude Code, mostly. I run a clawmetry tab alongside it so I can see what's actually happening across sessions, especially on longer tasks.
I used Antigravity, but now I use Claude Code.
Aiden, Claude Code
pi.dev. I like minimalism.