Claude Fable 5

(anthropic.com)

2608 points | by Philpax 3 days ago

1081 comments

simonw 3 days ago
I've spent enough time with this now in Claude Code (and Claude.ai and Claude Code for web) to have an opinion on Fable 5: it's a beast. I'm throwing some VERY difficult problems at it - things I've been dragging my heels on for months - and it's crunching through them very happily.
One that I'm willing to share (albeit from just a week ago) - I built a Python library last week that bundles MicroPython compiled to WASM to create a sandboxed code execution library: https://github.com/simonw/micropython-wasm
I just told Claude.ai (not even Claude Code - this was the standard Claude chat interface) running Fable 5:
```
  Clone simonw/micropython-wasm from GitHub
  and research how this could use a full
  Python as opposed to MicroPython
```
A few prompts later (and I uploaded the zip files from https://github.com/brettcannon/cpython-wasi-build/releases/t... because Claude chat can't access those files itself) and I have a wheel file that bundles Python itself, compiled to WASM:
```
  uv run --with https://static.simonwillison.net/static/cors-allow/2026/cpython_wasm-0.1.0-py3-none-any.whl \
    cpython-wasm -c 'print(45 ** 56)'
```
Here's the transcript: https://claude.ai/share/a73b8b8b-8ebc-4fef-9e5c-7438e5e7ae35
(It's possible Opus or GPT-5.5 could have done this too, I've not tried the exact same sequence. The Fable vibes are good here, though.)
- teiferer 3 days ago
  > It's possible Opus or GPT-5.5 could have done this too, I've not tried the exact same sequence. The Fable vibes are good here, though.
  And that's the thing. These comparisons are all gut feelings. I'm missing objective unbiased measurements to actually have real comparisons between different models, their different generations, or even just the convention that everybody adds "you are an expert software engineer" and "don't make mistakes" to their prompts because they think it improves anything. Nobody knows if it actually does.
  - zylepe 2 days ago
    Vibes are all that matter. As soon as you start measuring it, that measurement becomes a target and vendors start optimizing for it at the expense of the general usefulness of the model. We’ve seen plenty of models with great benchmark scores flop when people start using them.
  - Wowfunhappy 2 days ago
    Lots of things in life are gut feelings. It would be really great if we could determine quantitatively forever whether Rust is a superior programming language to Go, but real life resists those kinds of measurements.
  - Certhas 3 days ago
    There are tons of benchmarks in the announcement. But we also know that benchmarks are problematic.
    So the best we can do right now seems to be to combine imperfect case studies like this with imperfect benchmarks to get some unreliable impression of where we are...
  - johnisgood 2 days ago
    Yes, these are gut feelings. That said, I have lots of experience with Opus and I have lots of projects and contributions (all reviewed and tested) made with the help of it. Definitely useful, to me and to people whose projects matter to them. :P
    Adding "do not make mistakes" is silly, in my opinion. There is always a good chance it will make mistakes. You should be specific about what you want rather than as broad as "do not make mistakes". It just does not work that way.
  - tezza 3 days ago
    It is possible to check for improvements. See for yourself:
    https://generative-ai.review/2026/06/claude-fable-rush-test-...
    As mentioned in another HN thread, I've done qualitative side-by-side measurements of Claude Fable vs Opus 4.8 vs ChatGPT 5.5.
    Anyone is able to check the output for themselves and form a judgement.
    Large visible improvements for Fable over Opus 4.8 and ChatGPT 5.5.
    I recently did the same to show the progress from Opus 3.4/ChatGPT o3pro one calendar year ago.
  - hardwaregeek 2 days ago
    Ok but isn’t that true of all software development? It’s not like anybody’s done a rigorous test of writing their entire codebase in Python vs Java. It’s all vibes based there. People create post-hoc justifications for why they use certain technologies but the reality is a lot more vibes than anything else.
  - bfrog 2 days ago
    How do you measure the performance of people? This is subjective and biased every time.
  - stray 2 days ago
    I have a couple projects that have completely stalled because none of the frontier models could advance any further with them - I'm going to give fable a try at them this coming weekend.
    I believe the "you are an expert software engineer" thing puts them into a "mindset" of cosplaying a software engineer - whereas I get astounding results by talking to them in the information-dense, jargon-heavy mode I use with my peers. I can't prove it but I believe that places my session in a better place in latent space.
    ymmv
  - contextfree 3 days ago
    fwiw, I gave it the same vibecoding project I'd previously tried with Sonnet 4.5 and it took Fable 2 hours to go well beyond (like, 2x beyond) where I got in 8 hours with Sonnet 4.5. (beyond that idk, because past 8 hours with the Sonnet 4.5 version I hit the "vibe limit" where it becomes easier to just write/edit the code yourself than get the agent to do what you want; and past 2 hours with Fable I hit my usage limit.)
  - solumunus 2 days ago
    Just treat it like an employee with infinite energy. You can never really measure the productivity or ability of employees, it’s just pretty obvious when one is better than another. You’re asking them to do things and they’re either coming up with the goods or they aren’t. You can’t really expect much more from agents either but I’m not sure why you need anything more.
  - theshrike79 2 days ago
    IMO comparing different models is like comparing songs or paintings or modern art.
    There is no true objective measure. Can you mathematically determine which song is the best for everyone, for example? Or which painting different people feel is the nicest to look at, or what emotion it gives them?
    Yea, you can do the fucking strawberry tests or carwash trick questions, but that doesn't really measure anything useful.
    You can also do benchmarks but how do you measure the output of those?
    The easiest way is just to use them all and get the feels of which of them works best for you. For me it's Claude first, pi.dev + gpt5.5 second. Plain Codex is a distant third and Gemini exists - it's pretty good at finessing web UIs as it does aria labels and usability better than others, but I wouldn't write backend code with it.
  - ElFitz 2 days ago
    That’s what evals are for.
    And there’s no reason evals can’t be done on multi-turn agents in a loop (or not): it’s pretty much what all these benchmarks do.
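    To make the idea of an eval concrete, here is a minimal sketch of checking whether a role prefix such as "You are an expert software engineer" changes a pass rate. Everything in it is an assumption for illustration: `TASKS`, `stub_model` and `pass_rate` are hypothetical names, and the stub stands in for a real model call so the code runs offline. It shows the shape of the measurement, not any vendor's benchmark.

```python
from typing import Callable, List, Tuple

Task = Tuple[str, Callable[[str], bool]]

# Hypothetical tasks: a prompt plus a checker that grades the reply.
TASKS: List[Task] = [
    ("Add 2 and 3. Reply with the number only.", lambda r: r.strip() == "5"),
    ("Reverse the string 'abc'. Reply with the result only.", lambda r: r.strip() == "cba"),
    ("Name the capital of France. Reply with one word.", lambda r: r.strip() == "Paris"),
]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption), so the sketch runs offline."""
    if "Add 2 and 3" in prompt:
        return "5"
    if "Reverse the string" in prompt:
        return "cba"
    return "unsure"  # deliberately fails the third task


def pass_rate(model: Callable[[str], str], prefix: str, tasks: List[Task]) -> float:
    """Fraction of tasks whose reply passes its checker when `prefix` is prepended."""
    passed = sum(1 for prompt, check in tasks if check(model(prefix + prompt)))
    return passed / len(tasks)


baseline = pass_rate(stub_model, "", TASKS)
with_role = pass_rate(stub_model, "You are an expert software engineer. ", TASKS)
print(f"baseline={baseline:.2f} with_role={with_role:.2f}")
```

    With a real model behind `stub_model`, the same two calls would need many more tasks and repeated runs before any difference between the two numbers meant anything.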
  - andai 2 days ago
    I added "you can do anything if you believe" to my agent and it went from not even attempting things to just doing them effortlessly.
    I know how stupid that sounds but it's true.
    Well what do they say... "If it sounds stupid but it works, then it's not stupid!"
  - vonneumannstan 2 days ago
    The first thing in the release page is benchmark results...
    https://www.anthropic.com/news/claude-fable-5-mythos-5
  - ivanovm 2 days ago
    The benchmarks are now the equivalents of SAT/ACT/other standardized exams for humans. They are directionally quite predictive, but with plenty of outcome variance on the margins
  - torginus 2 days ago
    Yeah, if the jump is big, then we should be able to see the qualitative improvements, or see where Opus was tripped up in a task and Fable did succeed
  - lqstuart 2 days ago
    It’s almost like they’re interchangeable. We need to start asking these models to solve extremely difficult, contrived DSA coding questions before deciding which ones we employ
  - kmacdough 2 days ago
    I believe there is hard evidence that role-playing prompts are effective at leading it towards particular strategies and trains of thought. Not sure that SWE has been specifically studied, but proper science is very slow in the context of rapid change and broad context. It's good to stay grounded in the science that has been done, but we're going to have to do our best in uncharted territory for a while.
    "Don't make mistakes" does seem dumb. It's not guidance.
  - alecco 2 days ago
    > These comparisons are all gut feelings.
    https://simonwillison.net/about/#disclosures
    "I have not accepted payments from LLM vendors, but I am frequently invited to preview new LLM products and features from organizations that include OpenAI, Anthropic, Gemini and Mistral, often under NDA or subject to an embargo. This often also includes free API credits and invitations to events."
    But I'm totally unbiased on my gut-feeling posts, trust me bro.
    -- AI influencers.
- kansface 3 days ago
  Yes, exactly this. If I didn't care about price at all, I'd exclusively use this model. It functions more like an actual engineer. I'm in the midst of a DB migration, and e.g. 5.5 continually suggests stuff like "use DB X instead of DB Y for task Z because it's 30% faster" which is an impossibility of reality, given we are migrating DBs. Fable jumped in, reduced allocs by literally 46x, found multiple bugs 4.8 and 5.5 created (max file system usage, correctness issues, etc.), and continually suggested awesome improvements unprompted. As in, it would finish a task and then suggest we tackle this other existing problem I didn't know about in a very specific manner... this is the first model that feels like it's coming for my job.
  - josephg 3 days ago
    I'm having the same experience. I'm in the process of implementing a new CRDT for realtime collaborative editing. There just aren't a lot of implementations of CRDTs kicking around online for opus or any of the other models to have good design instincts.
    Fable is doing - so far - a great job. I just had one big question around how part of it should work. I had a design sketch, but with some big unknowns. I asked fable to figure it out via reasoning and prototyping, and it did - it even, under its own initiative, wrote a fuzzer for its prototype which explored and verified that its reasoning was correct. It absolutely nailed it. And it found, and fixed, a couple bugs that I'd missed.
    I'm sure its weaknesses will become apparent in time. But, wow this thing is a beast. It's the first time I'm reading the work of an LLM without spotting obvious weaknesses in its reasoning and code. I'm really impressed.
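    For readers who have not seen a CRDT fuzzer, here is a minimal sketch of the technique on a toy grow-only counter (G-Counter). It is not the CRDT or the fuzzer described above, which are not shown in this thread; `merge`, `value` and `fuzz` are hypothetical names chosen for illustration. The fuzzer applies random increments and random merges, then asserts that all replicas converge and that no increment was lost.

```python
import random
from typing import Dict, List


def merge(a: Dict[int, int], b: Dict[int, int]) -> Dict[int, int]:
    """G-Counter merge: take the per-replica maximum of both states."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(state: Dict[int, int]) -> int:
    """The counter's value is the sum of every replica's slot."""
    return sum(state.values())


def fuzz(seed: int, replicas: int = 3, steps: int = 200) -> None:
    rng = random.Random(seed)
    states: List[Dict[int, int]] = [{} for _ in range(replicas)]
    total = 0
    for _ in range(steps):
        i = rng.randrange(replicas)
        if rng.random() < 0.6:
            # Local increment: replica i only ever bumps its own slot.
            states[i][i] = states[i].get(i, 0) + 1
            total += 1
        else:
            # Gossip: replica i merges in another replica's current state.
            j = rng.randrange(replicas)
            states[i] = merge(states[i], states[j])
    # Final sync in a random order; every replica must end up identical.
    order = list(range(replicas))
    rng.shuffle(order)
    merged: Dict[int, int] = {}
    for i in order:
        merged = merge(merged, states[i])
    final = [merge(s, merged) for s in states]
    assert all(f == final[0] for f in final), "replicas diverged"
    assert value(final[0]) == total, "increments were lost or duplicated"


for seed in range(100):
    fuzz(seed)
print("ok")
```

    A text-editing CRDT needs a much richer operation generator, but the loop is the same: random operations, random delivery order, then an assertion on convergence.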
  - weatherlite 3 days ago
    > this is the first model that feels like it's coming for my job
    Damn you must be good, I've been feeling this for around 2 years now
  - spoiler 2 days ago
    Gosh, I must be doing something wrong. I spent 15 minutes (of which a lot was waiting while it was thinking about "backwards rationalising" its decision and "gaslighting"[1]) arguing with it over why it keeps using `node -e "console.log(require('fs').readdirSync('…'))"` instead of `ls -l …`.
    Like it did everything:
    • this is not a Linux system (true, it was macOS)
    • it is not an available command
    • the binary is corrupted
    • node/js is more precise
    • V8 JavaScript is faster than bash (true technically??? But not in this context lol)
    • JavaScript is more versatile
    I forgot what else we went through but there were a few more things. I indulged it because it was incredulous and funny. The prompts from my side were all questions, never instructions. I assume an instruction would've helped here, but also I don't think Opus ever did this (but on the other hand Opus wrote python scripts to format/indent, instead of just running cargo fmt, so I guess potato potato)
- boc 3 days ago
  Yeah same here, Fable on "high" is producing substantially better results than Opus 4.8 on xhigh for me and my actual real-world evals today. It "feels" smarter and doesn't use nearly as many tokens running in circles. As a result I've been able to run two large refactors today without hitting the context limit danger zones - it's more expensive but also more efficient. It's been able to find some bugs that Opus missed. Pretty impressive stuff.
  - garciasn 3 days ago
    I keep getting this message:
    > Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
    I'm working on an internal tool that does new business prospecting data collection, scoring, etc. This is ridiculous.
- black_knight 3 days ago
  Still does not crack my hardest nuts. Gave it one of them and it blew through my entire allowance on thinking about one question, with no apparent answer in sight!
  I see a lot of people saying they are happy with weaker models, but I am the opposite, I need more strength, more intelligence!
  I am quite happy that Opus 4.8 can do some medium intelligence problems. And maybe Fable 5 can do some more of those! I have a lot of problems to solve!
  - user43928 2 days ago
    I also see a lot of people saying they are happy with weaker models.
    At work I had to switch to using GPT 5.4 Mini and Qwen 3.6 27B.
    The results were near useless.
    The error rate is through the roof, it's constantly incorrect in its conclusions even when investigating very simple issues.
    Further the models are too unreliable to even move 20 line snippets around without inadvertently modifying them. Ask them to correct it and they still get it wrong.
    Maybe the larger Chinese models are better, but the Mini stuff is next to useless to me.
  - daymanstep 3 days ago
    What kind of problems are you trying to have it solve?
  - tclancy 3 days ago
    Perhaps you should rephrase those nuts?
- sd2k 3 days ago
  That is pretty wild. It took me a hell of a lot more coaxing and persevering to get to a similar point with eryx [0] (we spoke a bit about this before on Mastodon) using Opus. Fable seems to have a more optimistic 'sure, let's proceed as if this is possible' mindset based on your transcript. Looking forward to trying it out for some hairier problems.
  [0]: https://github.com/eryx-org/eryx
- jameson 3 days ago
  Got curious and ran a similar prompt with DeepSeek v4 Pro w/ OpenCode
  No idea what's going on here but the agent tested a bunch of stuff. Then I asked it to build a wheel so I can run the command you noted above and it appears to pass
  For those who are curious...
  https://github.com/bamggm/micropython-wasm/commit/5ddebae592...
  - jameson 3 days ago
    Mimo v2.5 Pro Ultraspeed w/ OpenCode
    https://github.com/bamggm/micropython-wasm/commit/8b362fba1f...
- larodi 3 days ago
  One thing I can tell you is you are either favored by Anthropic, or your version of the CLI does not exhaust limits, or there's some major bug, as two people around me (myself included) claim it took half an hour to hit the ceiling. Which makes it practically unusable, where the same workflow a day ago produced a good 5-6 hours of workload with several agents.
  - piokoch 2 days ago
    Monetization is coming. They'll tell companies, AI is replacing your workers, so it is still worth paying 100K/year for the license, as those AIs are not going to jump to another job, get sick, be late, complain, require free coffee and so on.
    Soon the times of AI for $20/$200 a month will be long gone.
  - witx 2 days ago
    They are most likely shills from Anthropic; there are quite a few here every time new models come out.
  - cedws 2 days ago
    It’s not meant for subscription users; the subscriptions are just the gateway drug to Enterprise pricing which Anthropic intends to use to juice their numbers before IPO.
  - desmond1303 2 days ago
    Or use API billing? We have access to it at my company with no limits
  - simonw 2 days ago
    Are you on the $100/month subscription?
- sigbottle 2 days ago
  Just tried it. Fable is extremely strong. The fact that we can't point to any concrete architectural upgrade is worrying - that means "it just gets bigger" is kind of viable.
  To be clear, the jump from Opus to Fable was like the jump from pre o3 -> o3 for me. Very sharp improvement, not incremental. But that could be explained by dummy long thinking times.
  It one shot a task that Opus burned hundreds of dollars on to get nowhere. Very tricky semantic refactor, got it right. Granted, again, the semantics Opus and I fleshed out 3 months prior, but Opus couldn't execute on the vision. Fable could.
  Then I discussed some philosophy and it was actually both pleasant (GPT constantly "corrected" you for the sake of correction without clarification, also still often just wrong; it's like it refused to think critically about philosophy) and accurate, and actually helped resolve some deep but subtle misconceptions I had around representationalism. When talking with GPT I felt like I was talking with someone who either was sycophantic or "anything that is not absolute truth is relativism" - Fable actually discussed.
  This is both exciting and kind of depressing. I can definitely see why people are getting hyped about AGI again. All the models were extremely strong technically but I felt like they couldn't match the developer's tacit state - Fable definitely did, and that's a basic quality to be considered "usefully intelligent" IMO, at least to me.
  Shame that it's going away in 2 weeks and probably going to be nerfed if/when it's re-released.
  - keybored 2 days ago
    Worrying? Depressing? Why are people who are clearly enthusiasts (since they are testing the capabilities on release) always using these words? Is this a genuine interest, something that is pleasurable, or a morbid curiosity to test the bleeding edge of Humanity’s Doom? Bizarre.
- matheusmoreira 3 days ago
  Fable has been producing some really good work on my end as well. Definitely better than Opus 4.8. The only problems are the cost and constant cybersecurity refusals. A single session uses up 100% of my 5h window without finishing, and that's when it doesn't get derailed by nonsensical refusals.
- sexylinux 2 days ago
  It still does make errors, yes? Because it is not usable, if we need to verify everything. AI is only interesting if it can do things that humans can not do. If you can verify results because you can do it yourself, then why use AI? It will just bind highly skilled people to do verification work. Instead these people should do the actual work, results will come quicker.
  So AI is only interesting to you / your org / humans if it can do things that you can not achieve. But if it still makes errors, how could we ever know that a super-invention by AI is not wrong?
  If we can not rely on the correctness of the result, it is not usable at all. AI must create reliable and correct results always. That was a very fundamental requirement for computing. This problem has not been solved.
  - zahlman 2 days ago
    > AI is only interesting if it can do things that humans can not do.
    AI is interesting as long as it can save time and/or money in getting an acceptable result. Anything that runs on a computer and can do "things that humans can do" will automatically end up doing things that humans won't do, simply by virtue of the fact that it runs on a machine that doesn't require sleep, doesn't get bored or demotivated, etc.
    Verifying code (to a level where a responsible person is willing to take ownership for it) isn't trivial, sure; but writing the code by hand requires the same level of care, and the fact that the same person wrote it doesn't actually allow for shortcuts (if we're being properly responsible).
  - Lutger 2 days ago
    Humans make mistakes too, does it mean humans are unusable? We accept as empirical fact that most production quality code has 2 - 10 bugs per 1k LoC. According to your premise, virtually all existing software is therefore unusable.
    What if an LLM overall starts to make fewer mistakes than a medium developer, costs less than its salary and is 100x faster? For sure, the companies that will leverage these with just a few senior devs doing prompting, testing and requirements analysis, will outcompete other organizations.
  - CookieCrisp 2 days ago
    There is plenty of work that does not need to be perfectly verified, because the risk is controlled. Prototyping a javascript game for example. Or code that runs just on your local machine where good enough is good enough. I'm sure a lot of you do super important work that needs 100% quality code all the time, but... some of us don't.
  - naasking 2 days ago
    > Because it is not usable, if we need to verify everything.
    Do you verify every line of code written by your fellow developers? I doubt it, which is strange because they make errors don't they?
    What matters is the error rate. Past some threshold, they're better than senior devs who you don't supervise closely.
  - misja111 2 days ago
    AI is like a junior developer. You have to review her code carefully but she is most definitely useful.
  - anygivnthursday 2 days ago
    Yeah, it makes the same old errors, being confidently wrong then sorry... I mean, it is still an LLM
  - OvervCW 2 days ago
    One does not need to be able to create it themselves to evaluate if the output is correct. Consider for example that you can easily determine if a meal tastes delicious without being an expert chef, or the fact that NP problems are very difficult to solve but make for easily verifiable solutions.
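    The NP point can be made concrete with subset sum, sketched here under the assumption that a candidate answer is given as a tuple of list indices. `verify` and `solve` are hypothetical names for this illustration: checking a proposed answer takes one pass over it, while the brute-force search may try up to 2**n subsets.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def verify(numbers: Sequence[int], target: int, indices: Tuple[int, ...]) -> bool:
    """Check a proposed answer: distinct, in-range indices whose values sum to target."""
    return (
        len(set(indices)) == len(indices)
        and all(0 <= i < len(numbers) for i in indices)
        and sum(numbers[i] for i in indices) == target
    )


def solve(numbers: Sequence[int], target: int) -> Optional[Tuple[int, ...]]:
    """Brute force: tries subsets by increasing size, up to 2**len(numbers) of them."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return indices
    return None


numbers = [3, 34, 4, 12, 5, 2]
answer = solve(numbers, 9)
print(answer, answer is not None and verify(numbers, 9, answer))
```

    The same asymmetry is what makes a test suite useful for reviewing generated code: `verify` can be trusted and run cheaply even when nobody wants to write `solve` by hand.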
  - dbbk 2 days ago
    This is what tests are for.
- zahlman 2 days ago
  The difficult part here is supposed to be the actual compilation to create the .wasm file? Or what am I missing here? The wheel is only a few hundred lines of code outside of the Python implementation, and it would seem that the MicroPython version of the project already demonstrates the necessary techniques for operating wasmtime.
  - simonw 2 days ago
    Read the transcript if you want to see all of the details that make this hard: https://claude.ai/share/a73b8b8b-8ebc-4fef-9e5c-7438e5e7ae35
- sigbottle 3 days ago
  Does anyone know what the architecture of Fable is? Is it harnesses? Did they solve persistent learning? What did they do?
  - sothatsit 3 days ago
    Seems to just be a bigger model.
- mcv 19 hours ago
  I have to agree. I'm working on a complex technical proposal that's a bit too far outside my expertise (I tend to submit it to actual experts for a more thorough review). I've worked with Opus and Gemini to review it and work out all the problems and inconsistencies, and I thought it was in a pretty good state.
  As an additional check, I just submitted it to Fable, and it eviscerated it. Tons of inconsistencies found, issues skimmed over or ignored, too optimistic assumptions, math that doesn't really add up if you look at it in context. And as far as I can tell, all of these issues are entirely valid. I now feel embarrassed I'd already sent it to a few people for review. This clearly needs more work.
- zek 2 days ago
  If it’s of interest, I’ve been working on https://github.com/HubSpot/boomslang
  Which has a full build of python to WASM with a bunch of static libs built in already.
  I will say I built this pre-Fable, and Opus pretty much nailed the first build of the interpreter to WASM; CPython has had secondary support for WASM as a target since like 3.9 or something and it just pulled from that.
  I’ve been meaning to write up a blog post about this sometime. Building this has been pretty interesting, including using Opus to run a full auto-research-like loop for days to hyper-optimize its performance.
  I’m hoping to use fable to power some even crazier WASM adventures tho.
- kubb 3 days ago
  What can it do that Opus couldn’t?
  - simonw 3 days ago
    Always hard to say for sure because I'm not sitting around running the exact same situations through both models in parallel to compare them.
    It feels like you can give it a big chunky problem and leave it alone and it gets it done, with fewer questions and fewer design decisions that I wouldn't have made.
    In reviewing its code I'm finding less to complain about than with Opus. But it's all vibes; if you want a more scientific comparison you'll have to look elsewhere.
  - miohtama 3 days ago
    Crank up more revenue for IPO
  - pinkgolem 3 days ago
    I gave it a complete database migration of our app. Opus failed hard each time... untyped JSONB for some rows, no proper normalisation, falling back to asking me questions in between.
    Fable just did it, clean code, one timeout with a hanging bash script, fixed a couple very old very structural bugs in the codebase
- alexchantavy 3 days ago
  High, extra, or max?
  - qingcharles 2 days ago
    It has a setting named "Ultracode" with a flashy little disco light when you select it. (not joking!)
    https://imgur.com/a/NfIxDwN
    I wanna press it, but I don't have that kind of mad, generational wealth to put a prompt through on that setting.
  - simonw 3 days ago
    High.
- Emanation 2 days ago
  These transcription tasks don't seem difficult for LLMs in general.
- alecco 3 days ago
  I hate how the Instagram/TikTok/YouTube influencer cancer is getting into AI. With early access and all that.
  It made sense for people doing proper and fair AI breakdowns waiting on an embargo, but now it's just slop I don't trust anymore.
  - simonw 3 days ago
    I often get early access but didn't for this one, it's quite possible there's an NDA in an email somewhere that I missed and forgot to sign.
- sagarpatil 3 days ago
  Did you hit your weekly limit?
- tomjakubowski 3 days ago
  What are some reasons to consider your project instead of Pyodide?
  - simonw 3 days ago
    It's difficult to run Pyodide inside server-side Python.
- oblio 3 days ago
  How much does it cost? How much did those tasks you did cost?
  - simonw 3 days ago
    So far it's all fitting into my current $100/month Claude Max subscription. I got lucky: I had 80% of my weekly allowance left and it resets tomorrow, so I'm burning tokens to try and use it all up by then.
    Update: looks like I've spent $82.92 in Fable 5 API priced tokens so far today (still all included in my subscription.)
    Here's a TIL on how I'm calculating spending using AgentsView: https://til.simonwillison.net/llms/agentsview-custom-model-p...
- throwaway27448 3 days ago
  > VERY difficult problems
  Compared to what?
- locknitpicker 2 days ago
  > Clone simonw/micropython-wasm from GitHub and research how this could use a full Python as opposed to MicroPython
  I might be missing something important but that doesn't seem to be an impressive task.
  On a surface level it sounds like the task requires gathering calls to MicroPython-specific libs, assessing which ones are not compatible with Python, and determining how to replace the ones that are incompatible.
  From that first iteration, the rest would boil down to troubleshooting the issues missed on the first shot.
  I would be extremely surprised if the likes of GPT4.1 wasn't already capable of handling that task.
  So, beyond Claude Fable finishing a task, what exactly is the differentiating factor?
  - simonw a day ago
    Did you read the transcript? There are a whole lot of details to figure out: https://claude.ai/share/a73b8b8b-8ebc-4fef-9e5c-7438e5e7ae35
- zirkonit 3 days ago
  But, but, how does the pelican look?!
  - simonw 3 days ago
    See parallel thread: https://news.ycombinator.com/item?id=48464054
- uncivilized 3 days ago
  This looks like a toy project, not a “VERY difficult” problem like you stated.
  - enraged_camel 3 days ago
    What does that mean? Have you never worked on extremely difficult problems as a side project?
- cube00 3 days ago
  > Here's the transcript
  It's frustrating that superfluous tokens are burning up our quotas:
  key insight, crucially this, real engineering deltas, net assessment, definitive picture, acid tests, real limits, sharp boundary, proper patch, real root cause, big progress, actually wrong, path finagling, the catch, root cause pinned, everything passes cleanly.
- 120983 3 days ago
  [flagged]
  - supern0va 3 days ago
    AI models decompose problems down into tiny pieces that exist in their training data, so in a sense, you're correct.
    Though that's also what makes humans so good at solving problems as well, it turns out.
    Also, slight tangent: but I do find the "clanker" insult kind of funny. I feel like it counter-intuitively makes the models sound cooler than they are, if anything. I love clankin' shit.
  - adamtaylor_13 3 days ago
    If you've got a real argument to make, by all means, make it. Your anger does not magically "make it so".
  - bnchrch 3 days ago
    Automobiles are not interesting or useful because they're just using trails the horses already built.
  - simonw 3 days ago
    I mean yeah, in this case I fed my own open source code directly into it.
dannyw 3 days ago
Impressions from testing Fable 5 prior to launch:
• My most noticeable immediate jump was in how its frontend design was much more intentionally crafted, and delightful without feeling like 'AI vibe coded'; with better end-user usability too.
• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost the ~same as Opus 4.8 price-wise! The real price increase is less than 2x; with biggest differences in harder problems where Opus 4.8 struggles (or needs many turns).
• Part of the token efficiency improvements come from Fable doing more targeted and surgical diffs, with fewer unnecessary changes. This is great, because PRs often have fewer LoC changes for review. It writes more maintainable code without explicit human steering.
• For general conversation and assistant style use cases, didn’t really notice a difference vs 4.8.
• 1M context window, without increased pricing for long context is AWESOME. This is a massive win.
• The classifiers are super aggressive and sensitive and this does happen for very benign, non-security coding tasks. Fallbacks to 4.8 worked like a charm; but the filters are definitely super sensitive.
Overall, I would describe this as a step change and worthy of the "Claude 5" model name. It did take some time to understand the intelligence ceiling of this model; and even with an extended testing window I'm still discovering new things and often surprised (in a good way) by the model.
- bottlepalm 3 days ago
  I just ran it on a tough reverse engineering problem I'm having that neither Claude Code 4.8 nor ChatGPT Codex 5.5 could figure out. 30 minutes later Fable has it all figured out perfectly.
  - jp0001 3 days ago
    I asked it to write security tests for an app and I was downgraded to Opus 4.8. I'm approved for their cyber program!
  - cedws 3 days ago
    How did it not immediately flag that up? Are you sure it wasn’t being silently routed to Opus?
  - skerit 3 days ago
    Oh nice, it didn't flag the request? I feared any reverse engineering would become impossible because of the new safeguards.
  - derangedHorse 3 days ago
    For hard problems you’ll have to use the GPT 5.5 pro model (available via api if you don’t want to spend $100 on the monthly subscription)
  - theragra 3 days ago ago
    I want to test how it will handle e-bike software and hardware RE for my bike. Opus was really good for that, but still made some mistakes. With Fable, I hope I will be able to do a total RE of most components, hopefully including motor firmware to some extent.
  - Gamemaster1379 3 days ago ago
    I had a similar experience. I have a complex RE implementation that has a lot of layers. 4.8 struggled for weeks. 40 minutes on Fable and I may now have the most performant way to play Tomba on the planet.
  - moffkalast 2 days ago ago
    Yeah I threw my hardest problem at it as well, some convoluted satellite tile reprojection and culling issue in canvas rendering. It took some back and forth for some specifics but it ended up writing a quarter of pyproj in JS from memory and the end result straight up works lmao.
- port11 3 days ago ago
  I’ve had it go through a 50-page PDF of dense, inter-connected specs, and it correctly flagged everything that was done, somewhat done, and missing. It went into a lot of detail and explained where the code deviated from the spec.
  It felt, at least for me, like an impressive step up. Opus 4.8 was already very thorough; but sadly verbose and ‘loopy’ when you push back on its plans. Fable is what I’d use all day if I could afford it!
  [-]
  - YumpiLumpus 3 days ago ago
    How do you know if it was done correctly if it's 50 pages of dense specs?
- InsideOutSanta 3 days ago ago
  After running it for half an hour: it's incredibly good at the visual aspects of UI design.
  [-]
  - beeandapenguin 2 days ago ago
    By what measure?
    I wonder how much of design capability improvements is related to our collective ability to recognize AI design tropes.
  - tsunamifury 3 days ago ago
    "incredibly" is doing a ton of work here. I do not think it's doing even moderate work on visual design, but it can spew out a lot of UI that looks arranged ... ok.
    This is still not in the range of shippable UI for top end companies. Maybe for internal tools and enterprise.
    At our company we limit it to prototypes at most and even find it limited there.
- duxup 3 days ago ago
  I feel like it takes me months to be confident in any of these things.
- morley 3 days ago ago
  Can I ask how you gained preview access to Fable 5?
  [-]
  - kakugawa 3 days ago ago
    I didn't see Fable 5 in the `/model` list, until I ran it with: `$ claude --model fable-5`
  - swyx 3 days ago ago
    he works on evals at canva
  - vain 3 days ago ago
    I had to "claude update" then it showed up
  - mvdtnz 3 days ago ago
    [flagged]
- tipiirai 3 days ago ago
  Curious about how you tested the frontend design capabilities. Thanks
- 3 days ago ago
  [deleted]
bkjlblh 3 days ago ago
> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
> Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations
[-]
- davedx 2 days ago ago
  Could this be legally construed as anti-competitive behavior?
  Edit: I asked Claude. It replied:
  > Consumer protection / deceptive practices. In the EU this would be a clear UCPD (Unfair Commercial Practices Directive) issue and potentially a DSA violation. In the US, FTC Act §5 prohibits "unfair or deceptive acts." Selling a product that secretly performs worse than advertised for a commercially self-serving reason, without disclosure, is textbook deception. The Samsung/Apple battery throttling cases are instructive here: Apple faced regulatory action across multiple jurisdictions specifically because users weren't told.
  > Competition law. This is where "anti-competitive" gets complicated. Refusing to help competitors build competing products via your ToS is generally legal — you can decide who you license to. But covertly sabotaging output quality for a class of users while charging them full price crosses into different territory. Under EU competition law (Article 102 TFEU), if a company with dominant market position uses covert technical means to disadvantage competitors, that's closer to abusive conduct than a legitimate ToS restriction.
  [-]
  - anon373839 2 days ago ago
    Anthropic’s behavior reeks of insecurity. Imagine Google taking elaborate measures to prevent you from searching about search engine development!
  - greenrd 2 days ago ago
    I think either you've prompted Claude misleadingly, or it's interpreting the law unnecessarily prissily (which is a failure mode I've noticed LLMs falling into).
    This clearly is disclosed, otherwise how did we get to know about it?
- cedws 3 days ago ago
  This makes me want to see China and open models succeed more than anything :)
  [-]
  - 382hi 3 days ago ago
    Don't worry, we will succeed :)
  - lacoolj 3 days ago ago
    Mimo has your back! 1000 t/s on 1T param model
    Just need to wait for this thing to be open sourced :)
    lol it won't tho...
    https://mimo.xiaomi.com/blog/mimo-tilert-1000tps
  - jimbob45 3 days ago ago
    They already have though, no? If we lost access to every model permanently besides Qwen tomorrow, would we really be limited by AI in what we could achieve in the future? Sure, it might be slower and take a little more work but it seems like the cat is already out of the bag.
  - celdon25 3 days ago ago
    Fun fact: If you show fable this post, it will route you to 4.8 automatically.
  - DeathArrow 3 days ago ago
    In a few months they will have Fable level models costing 10 times less and with less safeguards.
  - melicerte 2 days ago ago
    Do you know that some open models developed in China are financially supported by Meta?
  - johnsimer 3 days ago ago
    Do you want anyone in the world to be able to synthesize dangerous viruses?
- mips_avatar 3 days ago ago
  It's bad that Anthropic can determine what this means. If you're building a modern app you're likely training your own embedding models and now anthropic can just silently sabotage your training pipelines?
  [-]
  - abixb 3 days ago ago
    >We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations
    At the scale of API requests that Anthropic sees, I think the affected organization count might be substantial, and they might not be getting the full model capability that they're paying top $$$ for.
    Also, wonder how they arrived at that estimation.
  - DonsDiscountGas 3 days ago ago
    I have no idea how you came to that conclusion. Unless your training pipeline involves actively querying one of Anthropic models, no they can't. And if it does you're distilling their model.
- matheusmoreira 3 days ago ago
  Looks like Anthropic's definition of safety includes their own safety from competition.
  [-]
  - dragonwriter 3 days ago ago
    AI vendors’ idea of safety has always been safety for the interests of the AI vendor in question. This is not a new development, though this may help more people realize it.
  - axus 3 days ago ago
    AI-generated competition for thee, not for me
  - digitaltrees 3 days ago ago
    ding ding ding. This should be a new measure of anticompetitive analysis in anti trust law.
  - SAI_Peregrinus 3 days ago ago
    It's always been about the safety of their valuation.
- digitaltrees 3 days ago ago
  This feels less like "we are worried about security" and more like "we are in the lead and plan to keep it that way until it's too late". In some ways it's been helpful that OpenAI and Anthropic are tipping their hands about their anticompetitive instincts and willingness to steamroll their own clients, customers, and society. But it does feel like it's too late to stop this. The advantage people get by using these tools is too tempting to resist even if it is self-defeating. It feels like watching people light their own house on fire to stay warm in the deepest, darkest days of winter.
- seemaze 3 days ago ago
  Ah, so this is why raw Mythos was too "dangerous" to release..
  [-]
  - digitaltrees 3 days ago ago
    Or, they made Mythos seem mystically powerful in advance of the IPO, and are pumping the token use count. But it worked, there is a frenzy for this release in a way that is more intense than any previous release.
    Anthropic is doing a better job with their model menu, most people I talk to know immediately that Opus > Sonnet > Haiku but can't tell you what the rank order of OpenAI models is, when to use them, etc.
- nullbio 3 days ago ago
  Just so everyone is aware. Anthropic has been sabotaging AI researchers and their codebases and shadow-nerfing accounts for several years at this point. This isn't new, but they hadn't disclosed it until now. Likely because it is getting to the point where it's too noticeable, or they're concerned about it leaking from employees.
  [-]
  - dash2 2 days ago ago
    What’s your evidence for this claim?
- rastrojero2000 3 days ago ago
  So that's a possible reason why my specific Claude Opus instance seemed to be impossibly stupid and always degenerates into doing really dumb things to my code!
  Cool, good to know I can trust Anthropic.
- chrisoosthuizen 3 days ago ago
  This feels like the start of a much bigger plan for anthropic to close off the use cases of its models and eat any of its competitors.
  [-]
  - digitaltrees 3 days ago ago
    I am building a coding harness, and I see evidence of them doing this with agentic harnesses and scaffolding. It feels clear to me that as they expand in to the app layer, the window of using their API to build agentic apps is closing, they will steal your ideas, implement the product and then close the gate. I am creating my own inference stack because their incentive to block competitors is becoming super clear.
- johnnyApplePRNG 3 days ago ago
  > Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).
  Am I to understand that this is essentially their form of social-platform ghosting instead of banning?
  So they're not even going to tell you that the question you're asking is against their rules, they're just going to twist up your question and/or the answer somehow such that you waste your time essentially?
  It seems like I ran into this EXACT same functionality from Claude many months ago when I was trying to ask it to research on the web and help me set up the ideal llama.cpp config for local LLM inference.
  Funny how lost it got through that relatively simple install when we had all of the documentation in the world (and a human dev with 20+ years experience guiding it along) to go by... and simultaneously it's debugging and building high level cryptography code in rust in the other terminal tab.
  This is infuriating to learn.
  [-]
  - digitaltrees 3 days ago ago
    I have encountered this too. I am building a coding harness for www.propelcode.app and it was working really well until the Claude Code leak, and then all of a sudden it seems almost intentionally stupid or outright manipulative in guiding me down wrong paths. At this point I am using other models for anything related to the tool use design and implementation and bought three Mac Studios with 512GB RAM to run large open source models.
    This experience has made me feel like we have to create a community that moves AI from the mainframe era to the PC era quickly, or we will end up serfs.
  - ls612 3 days ago ago
    I had Claude walk me through getting local LLM models running on my Mac a month or two ago and so far as I can tell it was intentionally helpful. I even stated the reason was to have an uncensored model for myself and it had no objection. Long story short LM Studio running a Heretic Gemma 4 is doing just fine on my system now.
- Jabrov 3 days ago ago
  A million AI researcher voices at big tech companies suddenly cried out in terror and were suddenly silenced
  [-]
  - notrealyme123 2 days ago ago
    I am an AI researcher at a university. I tried Fable for my current project, but I feel it misunderstands me a bit too often. Now I don't know if I am using it wrong, or Anthropic is trying to slow my research. That model is a big no no.
- hashmap 3 days ago ago
  3 months before asking for what to eat before a linear algebra exam trips the machine learning topic ban is my guess. I got flagged immediately asking why my JEPA thing breaks weird.
- 2001zhaozhao 3 days ago ago
  How do they detect whether an experiment being done on a smaller model is used to improve a competing frontier model, or just an innocuous hobbyist LLM experiment?
  [-]
  - vitally3643 3 days ago ago
    Given how well the cybersecurity safeguards work, they probably don't.
  - iririririr 3 days ago ago
    inferring from the surroundings, like everything else. they will probably look at which company is your email, and if you wrote "better than claude" on the readme.md
    this is LLM, it's not like a science or something.
- maxall4 3 days ago ago
  These safeguards are ridiculously sensitive: a prompt as simple as “Why is an infinitely slow process reversible?” gets flagged as a ToS violation.
- ayewo a day ago ago
  For anyone that is confused like I was, the quoted text I'm replying to was copied from page 13 of the system card [1] and not the model announcement page, which this HN discussion is linked to.
  1: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
- largbae 3 days ago ago
  Pull that ladder up behind ya, will ya son?
  [-]
  - dboreham 3 days ago ago
    Makes it even more odd that we haven't seen alien spaceships.
  - usef- 3 days ago ago
    What ladder did Anthropic use?
- rfgplk 3 days ago ago
  Meaningless and easily bypassable. Will actually try coding up a tensor library with it, see if it sabotages anything.
  [-]
  - mips_avatar 3 days ago ago
    They said in their terms and conditions they will silently sabotage you if you do this.
  - qiine 3 days ago ago
    easily ?
- novaomnidev 3 days ago ago
  So Fable will intentionally lie to you and give you incorrect outputs, if it doesn’t like what you’re asking. Got it.
  [-]
  - novaomnidev 3 days ago ago
    These things are like encyclopedias or dictionaries that can speak in first person… Imagine if your encyclopedia tried to hide entries from you, just absurd!
- theLiminator 3 days ago ago
  This is pretty bullshit, now you have no idea if your output is getting silently nerfed.
- thepasch 3 days ago ago
  Yeesh. Anthropic's paranoia about China is starting to get pathological.
- rspeele 3 days ago ago
  It's afraid!
- 3 days ago ago
  [deleted]
- thothless 3 days ago ago
  the gall of these companies to regulate your usage of stolen knowledge is absolutely hilarious.
  and they want me to pay $100+ a month to be their training?
  i hope we can find morality again.
- gck1 3 days ago ago
  But Chinese models will poison your output if you ask them about Tiananmen Square! That's not good, so poisoning everyone's output without telling them is the only way to prevent that.
  Come on guys, why can't everyone just be there for the good guy?
  [-]
  - Sabinus 3 days ago ago
    You're equating a government suppressing information for social cohesion with a private company protecting their IP.
  - tancop 3 days ago ago
    [dead]
- 827a 3 days ago ago
  This is deeply vile behavior; not remotely the actions of good people.
- spaceclay 3 days ago ago
  [dead]
caleblloyd 3 days ago ago
I recently switched off Max flat rate to Enterprise API pricing and I went from 200/mo to 10k/mo with the same usage pattern on Opus. They don’t offer flat rate to enterprises.
So Fable would cost me 20k/mo at Enterprise rates. That’s around the average cost of a loaded SWE in the USA. “But I’m >2x more productive” doesn’t justify doubling the opex of the Software/IT department for most companies when revenue isn’t even up 10%.
I switched to DeepSeek v4 Pro with OpenCode and am on track for a few hundred dollars of spend this month.
Rewriting your stack from Ruby to Go in 2 days where it would’ve taken 6 months is impressive and fun. But that isn’t upping revenue.
Iterating on net new business features and niche ideas that the LLM isn’t trained for is much harder. Is 20x the token cost worth it there?
[-]
- vbezhenar 2 days ago ago
 I don't live in USA. I'm getting paid around $2500/month and that's good salary for developers here, plenty of folks are getting below that number.
 So this pricing is just completely outside of our economics and nobody I know would pay that, no company will justify spending $20k/month when they can hire 10 more developers instead.
 It is very interesting unfolding of events. Can't wrap my head around it completely.
 [-]
 - tauntz 2 days ago ago
 I'll add a concrete example from a not-too-cheap-anymore EU country: Estonia.
 * Average software dev salary in Q1 2026: 4945€ / month [1]
 * Total cost for the employer: 6616.41€ [2]
 For $20k/month, you'd get 2 x full time mid-level developers + 1x junior dev or QA.
 So the calculation becomes: which option can produce better results for your specific use-case, "you + Fable" or "you + 2x mid-level developers + 1x QA". (and from personal experience, mid-level in Estonia = senior dev in the US, in terms of skillset and experience.. but YMMV)
 (Of course that's simplified. Your full time devs need _some_ level of AI subscription as well + hardware so add a couple of hundred to their salary per month etc so you might only be able to afford 2x mid level devs, instead of 2.5)
 [1]: https://palgad.stat.ee/en
 [2]: https://www.palgakalkulaator.ee/en
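 A minimal sketch of the headcount arithmetic above. The employer cost is taken from the comment; the EUR/USD rates and the 200€ tooling allowance are hypothetical placeholders, since the comment mixes a dollar budget with euro salaries and gives no exchange rate.

```python
EMPLOYER_COST_EUR = 6616.41  # monthly employer cost per dev, from the comment
BUDGET_USD = 20_000          # monthly Fable spend being compared against


def affordable_devs(budget_usd: float, eur_per_usd: float,
                    employer_cost_eur: float = EMPLOYER_COST_EUR,
                    tooling_eur: float = 0.0) -> float:
    """Number of full-time devs the budget covers at a given exchange rate."""
    budget_eur = budget_usd * eur_per_usd
    return budget_eur / (employer_cost_eur + tooling_eur)


# Hypothetical exchange rates; substitute the current one.
for rate in (0.85, 0.90, 1.00):
    plain = affordable_devs(BUDGET_USD, rate)
    with_tools = affordable_devs(BUDGET_USD, rate, tooling_eur=200.0)
    print(f"{rate:.2f} EUR/USD: {plain:.2f} devs, {with_tools:.2f} with tooling")
```

 Depending on the rate, the result lands between roughly 2.5 and 3 developers, which is the range the comment describes.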
 - jve 2 days ago ago
 Not justifying AI expenses, but $2500/mo could easily cost the employer close to $5000/mo depending on country.
 - w0m 2 days ago ago
 > no company will justify spending $20k/month when they can hire 10 more developers instead.
 one big enough to license the model and self host on existing infra.
 - r0fl 2 days ago ago
 Hiring 10 more developers comes with its own set of difficulties and additional overhead
- zulgin 3 days ago ago
 I think you are broadly correct, but just to push back on a few points: (1) Ability to solve hard problems in days vs weeks has immense value (2) Back-end improvements (if done right) should improve platform speed, stability, scalability etc., which should have revenue implications (3) Ability to on-board a SWE equivalent entity in minutes, have them work on a specific hard problem and then off-board them in minutes can have value
 All of the above, of course, depends upon Fable consistently being a 2x-3x SWE at minimum.
 [-]
 - gmerc 2 days ago ago
 You're not really solving problems, you're retrieving the best match of solved problems from a compressed corpus. And that corpus is available to many companies, meaning "hard" problems stop having "hard problem" value the moment they enter the weights of any model via the internet ... or distill from one model to another. Anthropic's business model is commoditising knowledge, but as we see with the Fable model card, they only want it done to the knowledge of other businesses. In their own field, they totally hate it.
 - ahtihn 3 days ago ago
 > Back-end improvements (if done right), should improve platform speed, stability, scalability etc. which should have revenue implication
 Depends entirely on the domain. If you're selling enterprise software, this kind of stuff barely matters for sales.
 It can reduce operational costs which is good but there's a limit to how much that's worth.
 - skywhopper 2 days ago ago
 The thing about AI-generated “solutions” is that they often go down bad rabbit holes and need to be re-run, or since they are so “cheap” to create they are often just thrown away and rebuilt when requirements evolve. Plus, just more stuff is created and needs to be maintained. So in the end, your efficiency gains go out the window.
 - ponector 2 days ago ago
 In my experience, the challenge in software development is not to solve a problem, but to define the outcome, the scope, the acceptance criteria etc.
 - fendy3002 2 days ago ago
 20x the cost means you need Fable to be 20x better than the alternative, which is a tall order. And there are more options out there too, perhaps the 4x cost is enough.
 This means if the DeepSeek / under-1k alternative gives at least a 1.2x improvement, Fable needs to give 24x, which I think is very, very unreasonable. It could be worth it if it can 2x a $20k SWE, though I doubt it can do that.
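 The break-even reasoning above can be sketched as follows. It assumes, as the comment does, that value should scale linearly with spend; the function names are made up for illustration.

```python
def required_gain(cost_ratio: float, alternative_gain: float) -> float:
    """Productivity multiplier the pricier tool needs to match the cheaper
    tool's value per dollar, assuming value scales linearly with spend."""
    return cost_ratio * alternative_gain


def required_gain_vs_salary(salary: float, tool_cost: float) -> float:
    """Multiplier needed on one engineer's output so that salary + tool
    cost buys the same output per dollar as salary alone."""
    return (salary + tool_cost) / salary


# 20x the token cost against an alternative that already gives 1.2x.
print(round(required_gain(20, 1.2), 2))

# A $20k/month engineer plus $20k/month of Fable spend.
print(required_gain_vs_salary(20_000, 20_000))
```

 The two framings give very different bars (24x versus 2x) because the first compares tool against tool, while the second compares total cost with and without the tool.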
 - henry2023 3 days ago ago
 “Ability to solve hard problems in days vs weeks has immense value”. Citation needed.
 LLMs are incredible, don’t get me wrong, but they are good on tiny contexts (writing a script). Not on software engineering (adding features to Chrome).
 - system2 3 days ago ago
 >pushback on a few points
 Claude keeps telling me this when I argue with it. LMAO.
- busch_j 2 days ago ago
 I work at a smaller tech company (<300 people), and my friend showed me everyone's spending.
 Our top user is at 10k a month, but the next highest is $2,000.
 I would say the average is around $1,000-$1,500 for a developer.
 We have completely unrestricted access to Claude, Codex, and Cursor.
 Funny enough, the guy spending 10k is not even a dev by trade but an SME in what we work on that just vibe codes apps and somehow has not been cut off yet lol.
 I have a single thread of GPT 5.5 medium running basically all work hours and I am around $1,500 a month in spend on Enterprise pricing.
 [-]
 - brokencode 2 days ago ago
 At my company, most devs are under $1500 a month as well.
 I’ve heard of a few cases of devs racking up bills fast, but it has typically been due to inefficient context usage. Like they just have one super long session with Opus 1M and are getting killed with input token costs and cache misses.
 With careful context management and some thought into good approaches to problems, I have personally only rarely even hit $1k in regular use.
 - mywittyname 2 days ago ago
 > Funny enough, the guy spending 10k is not even a dev by trade but an SME in what we work on that just vibe codes apps and somehow has not been cut off yet lol.
 I'm guessing he's producing pretty valuable work. We have a few SMEs that vibe code tons of stuff with Claude. The only thing they really need tech for anymore is deployment and helping get their wheels unstuck on occasion.
 - boplicity 2 days ago ago
 Interesting! Would it be fair to say your company spends $100k to $150k per month on this?
 Multiply this times many, many companies, and you can see how providing AI could theoretically be a good business to be in. Margins may be tight, though.
 Also -- I'm convinced someone will figure out more use cases beyond software programming, which will result in many more companies spending $1k+ per employee per month.
 It remains to be seen how much of this is a bubble.
- sevenzero 3 days ago ago
 >I switched to DeepSeek v4 Pro with OpenCode and am on track for a few hundred dollars of spend this month.
 I was about to say that. DeepSeek is orders of magnitude cheaper and absolutely good enough for most things. Anthropic and co just try to milk the cow while it's possible. If they can't compete with DeepSeek pricing I do not see a bright future for them.
 [-]
 - Saline9515 3 days ago ago
 Not only Deepseek, other providers such as Xiaomi MiMo are excellent as well and offer fast token modes and other perks.
- Oras 3 days ago ago
 > Is 20x the token cost worth it there?
 No, it isn’t and will not be. Companies have not realised the cost yet, wait till the end of the financial year and you’ll see a different direction.
 DeepSeek v4 is pretty decent, and probably on par with sonnet. I see a future of hybrid models where opus or fable might be used only for complicated features or bugs, but general day to day would be DeepSeek or whatever good models that will be released later.
- CamperBob2 2 days ago ago
 > I recently switched off Max flat rate to Enterprise API pricing and I went from 200/mo to 10k/mo with the same usage pattern on Opus. They don’t offer flat rate to enterprises.
 So what keeps your management from just buying everyone individual flat-rate Max subscriptions, or at least buying them for the users responsible for the sky-high token invoices?
 I see a lot of comments like this but I don't understand why some people willingly pay so much more than others for the exact same service. What are you getting that I don't get as a $100/mo Max subscriber?
 [-]
 - lukax 2 days ago ago
 Zero data retention policies.
- matheusmoreira 2 days ago ago
 > So Fable would cost me 20k/mo at Enterprise rates
 That's enough to buy a house in my country...
- haolez 2 days ago ago
 Eventually solving for cost is a much easier problem than solving coding.
- WinstonSmith84 2 days ago ago
 With GPT 5.5 on the $100 plan, it's hard to hit any 5h/7d limits - while allegedly being better than DeepSeek 4 pro. Not sure why, or how you spend "a few hundred dollars of spend".
 With that said, I still had the Pro plan on Claude, I didn't expect much, but it blew up my 5h allowance on Fable with one simple single prompt, and it didn't even complete lmao
 [-]
 - adrianvi 2 days ago ago
 Important to note that both OpenAI and Anthropic do not allow the subsidized monthly subscriptions for enterprises.
 Companies have to pay monthly for the harness app (codex, claude code) and the tokens are priced separately based on standard API pricing.
 - matheusmoreira 2 days ago ago
 It's not just Pro! I have Max 5x and Fable absolutely blew up my 5h window. Didn't complete the code review either, and got downgraded back to Opus 4.8 on the really important memory safety parts I actually needed it for. It's an excellent model but Anthropic's not providing a good experience.
 - vbezhenar 2 days ago ago
 I'm on the $200 plan which is supposedly 20x the usage of the $20 plan. With a few Fable prompts (I'm working on a u-boot port) I used 10% of my 5h usage, so that's already 2x the $20 plan's usage and that would be 40% of the $100 plan.
 So Fable is just not usable for $20 plan and barely usable for $100 plan.
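 A quick sketch of the plan conversion above. The 20x figure for the $200 plan is from the comment; the 5x figure for the $100 plan is an assumption inferred from the 40% number (and the "Max 5x" name mentioned elsewhere in the thread).

```python
# Plan sizes in units of the $20 plan's 5-hour allowance.
# "$200": 20 is stated in the comment; "$100": 5 is inferred, not confirmed.
PLAN_UNITS = {"$20": 1, "$100": 5, "$200": 20}


def share_of_window(observed_share: float, observed_plan: str,
                    target_plan: str) -> float:
    """Convert a share of one plan's 5h window into a share of another's."""
    units_used = observed_share * PLAN_UNITS[observed_plan]
    return units_used / PLAN_UNITS[target_plan]


# 10% of the $200 plan's window, expressed against each plan.
for plan in PLAN_UNITS:
    print(plan, f"{share_of_window(0.10, '$200', plan):.0%}")
```

 With these multipliers the same handful of prompts comes out to 200% of the $20 window, 40% of the $100 window and 10% of the $200 window.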
- lionkor 3 days ago ago
 Do you understand that, for 10-20k a month, you can hire 1-2 senior engineers AND give them Claude subscriptions?
 [-]
 - lofaszvanitt 2 days ago ago
 why would you expose to a company what you are working on, in what way and on what research?
 - baq 3 days ago ago
 will they be a better investment than your current staff engineer with fable token allowance?
AquinasCoder 3 days ago ago
From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost. On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window. After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
This seems like the pharmaceutical method of get them hooked on the drug with free samples, then once they can't live without it, raise the price. I'm not sure I want to start using Claude Fable on a max plan if it's just going to go away on June 23rd.
But maybe the more charitable reading is that they didn't have to offer this model at all on those plans and they are giving the standard free trial.
[-]
- PeterStuer 3 days ago ago
  I'll be amazed if they manage to keep their infra responsive over the next 2 weeks.
  [-]
  - kilroy123 3 days ago ago
    I've been getting a lot of these messages today:
    API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited
  - trollied 3 days ago ago
    They just leased a massive spacex data centre.
  - scamdrill 3 days ago ago
    No issues that I've seen so far. Seems to be holding up for now.
  - fendy3002 2 days ago ago
    Opus will be gutted furthermore. /s I feel 4.8 is very slow in last 2 days
- leptons 3 days ago ago
  This is the entire business model of all AI companies. It costs far more to run the datacenters and build more capacity than they could ever hope to make back at current pricing models. I'm looking forward to pricing to catch up with reality and the resulting chaos that ensues.
  [-]
  - razster 3 days ago ago
    Kind of how DeepSeek v4 dropped their pricing? I sense a shift which will hopefully bring lower and lower cost. Then again Qwen3.6 coding has been all I've needed for my projects and I'm perfectly fine with free.
- bandrami 2 days ago ago
  I was just thinking this reminds me of the scene in The Wire where Avon admits to D'Angelo that the new heroin is in fact just the old heroin with different baby powder cutting it.
- linsomniac 3 days ago ago
  I was just saying last week: If Opus 4.8 max is as good as we get, and we plateau there, I think I'd be fine with it.
  For the stuff I've thrown at it, that configuration has done a really great job. Including 70+KLOC go proxy with extensive test suite, some retro games, and more.
- rzmmm 3 days ago ago
  Seems to me this is more honest than the Mythos claims a while ago. too powerful to release publicly. Too expensive?
  [-]
  - sebzim4500 2 days ago ago
    Didn't they admit this at the time? Cost was one of the reasons they gave for not immediately making it public.
- 3 days ago ago
  [deleted]
- voxic11 2 days ago ago
  Or maybe it's all about compute availability like they say. It could be that they plan to start training a new model on the 22nd, so the amount of compute available for inference will be greatly reduced.
jumploops 3 days ago ago
It's interesting that we're seeing these gains when it seems Mythos/Fable is "just" a scaled up version of their existing architecture[0].
When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.
This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.
[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]
[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
[2] https://youtu.be/GrdEid8H6H4?t=168
[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.
[-]
- motoboi 3 days ago ago
  There’s nothing much new about the architecture. The real gains come from the usage traces.
  It turns out that having a text based interface for a text-trained model creates a very nice feedback loop.
  Right now as we speak, people are generating text traces on anthropic and OpenAI servers that teach their models to do everything under the sun, text wise.
  So people right now getting super mad at how dumb the model is when reverse-engineering a super complex function from binary, when they write “stop, you dumb robot, you are going wrong, go this way thank you very much” are actually leaving a lesson in the form of the "chat" text history.
  Some may say that each bad word gets us closer to ASI.
  That and obviously the order of magnitude more efficient GPUs we got that allow for different tradeoffs at training time.
  [-]
  - YmiYugy 3 days ago ago
    Makes me wonder, as people grow to trust the AI more and more, not reading the code and barely skimming the implementation plans and simply rerolling if something doesn't work, will the value of these chats erode? Thinking back 1-1.5 years I was closely monitoring what these agents did and steering them quite aggressively. These days not so much. Where will RL signals come from when it approaches human capabilities ever closer? How well does self play work for coding work? What about multistep tasks where it isn't just about being good at a single task, but evolving a codebase over time in the face of changing requirements?
  - illiac786 a day ago ago
    I thought that these stupid captchas where you teach some AI to recognize fire hydrants without getting paid were rock bottom, but no, you can actually pay a lot of money to train AI. Business is amazing.
  - dominotw 3 days ago ago
    > There’s nothing much new about the architecture. The real gains come from the usage traces.
    sorry. how do you know. i am so curious about where exactly gains are coming from but so hard to even get a little bit of insight.
    i wish govt would fund these labs and make it free and opensource. way better investment than stupid overseas wars.
- MallocVoidstar 3 days ago ago
  Opus 4.0 and 4.1 are more expensive than Fable.
- nbardy 2 days ago ago
  It’s a bit misleading to say nothing special, as they are doing more than just increasing parameter count. Progress has been steady in all the subcomponents of training, from data filtering and weighting to sparse attention and optimizers, plus various efficiency gains in training compute up and down the stack.
  They’re using more compute, a bigger model and tons of training quality improvements to get more out of an equivalent model.
- 3 days ago ago
  [deleted]
sigmar 3 days ago ago
The system card is 319 pages, at what point do we call it a "book" instead of a "card"?
There's a quote from a METR report on page 52:
>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos 5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.
[-]
- baq 3 days ago ago
  > we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks
  this is good news, right? right...?
  [-]
  - yaodub 3 days ago ago
    Depends whether "unable to fully automate" means "needs occasional human checkpoints" or "slowly stops caring about your actual goal." Pretty different.
  - arizen 3 days ago ago
    There will probably always be a frontier surface that the frontier model of a given generation cannot automate.
  - rmast 3 days ago ago
    So in other words... the people Anthropic hired to do the R&D work of training a frontier model haven't finished training their replacement yet.
  - GuB-42 3 days ago ago
    It is certainly good news for those who are selling all these tokens.
  - lionkor 2 days ago ago
    If it's surprising to you, you haven't used LLMs in a domain where you're very skilled.
  - woeirua 3 days ago ago
    lmao, i love how the goal post is now in the "multiple weeks" timeline
- romanovcode 3 days ago ago
  But did it mention the developer in the park eating the sandwich? That is the most important question!
azalemeth 2 days ago ago
I genuinely can't use Fable. I'm a medical physicist. I use the word nuclear a lot. Opus is fine (well, 99% of the time - I've certainly hit the CBRN filters a few times and even been invited to email Anthropic about the false positives).
Fable has literally refused to work on any of my problems (even those about fluid dynamics!) and just tells me that I'm violating Anthropic's AUP. I've reached out to their support and don't expect to hear anything sensible back. One thing I do look forward to though is OpenAI offering an equivalent model but with fewer safeguards...
[-]
- agumonkey 2 days ago ago
  That's highly frustrating. How much were you using Opus for your work? I'm curious about the use and realized benefits of 2026 LLMs in medicine.
  I dearly wish you could leverage the latest models to enhance your research.
  [-]
  - azalemeth 2 days ago ago
    Honestly for a "side project" Opus has been fantastic for me writing a hybrid simulation framework that prior to large scale code generation would have been a matter of years (and writing a grant, assembling a team, etc – in order to do it "properly"). I've had a bit of help, with a grad student and me hacking together on a project that is basically "please merge the following GPL codebases and different areas of physics into one coherent environment". I've given Opus validated codes in disparate languages (Julia, Python, C) and asked for aspects of various algorithms as an extension module to a large chunk of C and C++ code that is a Monte Carlo simulator that has been around since 2004.
    A bit more context if you care: it's a meso-scale, physiological simulation environment of "particles" that carry nuclear spin, can move in 3D space, and (should they interact with each other or their environment) undergo chemical kinetics. The idea is to simulate molecules within e.g. organs or blood vessels within a person in an MRI scanner, with the motion of the particles dominated by the Navier-Stokes equations, but here solved in a Lagrangian (rather than Eulerian) framework by smoothed particle hydrodynamics.
    The fact that particles carry nuclear spin means that we can solve the (semiclassical) Bloch equations and, by using a Python plugin module, import exactly what the physical MRI scanner would do (in pulseq format) and predict what signal the machine would record – e.g. there's a whole world of cardiac or neurological flow imaging work done in the context of nasty diseases like stroke or myocardial infarction – which has a bunch of physical artefacts behind it. I'm trying to make a simulation framework that can take in realistic patient geometries and act as a 'data generating process' because if we do it right the various physical artefacts that the machine records are reproduced, surprisingly accurately. Of course you also know the ground truth of where the particles are. I'm specifically interested in a weird technique (which I did my PhD in and you can read an article all about here: [0]) called dynamic nuclear polarisation, where specific spin states of molecules such as [1-13C]pyruvate are injected essentially out of thermodynamic equilibrium and act as short-lived tracers of metabolism – again highly altered in disease. The signal we record is a strong function of the physics of what you told the machine to do, the spatial constraints and environment of the patient's body, and the chemical kinetics of the patient's biochemistry (the latter two are usually what we're interested in).
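    As a rough illustration of the Bloch-equation step described above (a minimal sketch, not the actual framework; the function name, parameters, and example values are all assumptions), the no-RF case has a closed-form solution in the rotating frame: transverse magnetisation precesses at the off-resonance frequency while decaying with T2, and longitudinal magnetisation relaxes towards equilibrium with T1.

```python
import math

def bloch_free_precession(m, t, t1, t2, delta_f_hz=0.0, m0=1.0):
    """Evolve magnetisation m = (mx, my, mz) for t seconds with no RF applied.

    Closed-form solution of the Bloch equations in the rotating frame:
    the transverse components rotate at the off-resonance frequency and
    decay with T2, while the longitudinal component relaxes towards m0
    with T1. All names and defaults here are illustrative assumptions.
    """
    mx, my, mz = m
    phi = 2.0 * math.pi * delta_f_hz * t  # accumulated precession angle (rad)
    e1 = math.exp(-t / t1)                # longitudinal relaxation factor
    e2 = math.exp(-t / t2)                # transverse decay factor
    mx_new = e2 * (mx * math.cos(phi) + my * math.sin(phi))
    my_new = e2 * (my * math.cos(phi) - mx * math.sin(phi))
    mz_new = m0 + (mz - m0) * e1
    return (mx_new, my_new, mz_new)

# Illustrative values only: magnetisation tipped fully into the transverse plane.
print(bloch_free_precession((1.0, 0.0, 0.0), t=0.05, t1=1.0, t2=0.05))
```

    In a particle-based simulator, a step like this would presumably be applied per particle between RF and gradient events, with delta_f_hz depending on each particle's position; that coupling is not shown here.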
    Getting them to do chemistry as well as act as a "simple" tracer is more involved, because in the Lagrangian framework the number of particles is ≈ the spatial resolution of your simulation. That's fine if you're simulating water, but if you're simulating something that reacts, concentration is not scale invariant (if you want to keep the interpretability of the rate constants). I've worked out an analytic set of scaling rules around this and fortunately for my application environments and length scales "it just works", completely by luck.
    I've used Claude to port various SPH algorithms and boundary condition handling ideas (which are absolutely critical and highly not obvious – we have leaky walls in some places, and e.g. LCR / circuit theory models of the microcirculation to plug in) and it's been a godsend. But I'm running into its limitations constantly. It confidently makes shit up, claims it is mathematically justified, and when the resulting simulation explodes says "I apologise; I lied above" (!) or "I apologise; I am wrong", and I periodically have to yell at it to try to do something more productive.
    The real hope is that this simulation environment would be both generally useful for basically anyone doing flow MRI, and help our basic scientific understanding of what we're measuring (the technique is in many hospitals!) but also be able to produce meaningful synthetic training data for image reconstruction algorithms later on. It'll end up permissively licensed (all of the "starting" codebases have compatible OSS licenses, and we're releasing our contributions similarly).
    I really hoped that Fable would be better at this sort of work. Occasionally, relating to my work on DNP [1], I have need to talk about proper nuclear physics and I have seen Opus's chat interface write a wall of text (e.g. talking about photonuclear reactions and cross section differences in millibarn) and then just delete it all. Support have told me that yes, I've hit the nuclear filter and, well, tough shit, basically.
    I wrote a version of the above to them yesterday, and just got the most boilerplate response that I've yet to test:
```
    Thanks for reaching out to Anthropic Support.
   
    We're sorry to hear of the issue that you're running into with accessing Fable 5. I'm happy to say the issue has now been resolved and you should be able to access the model within Claude.

    I'll close this case out for now, but please feel free to reach back out to us here if you have any follow up questions or concerns or if you're still in need of assistance. We'll be happy to help.
```
    which doesn't fill me with hope...
    [0] https://physicsworld.com/a/dynamic-nuclear-polarization-how-... [an "accessible" article]
    [1] https://www.science.org/doi/pdf/10.1126/sciadv.adz4334
- conception 2 days ago ago
  They’ve mentioned that they will have the ability to access less guarded models with a verification program in the future. I suspect there will be options to move past these guard rails shortly.
- kylenessen 2 days ago ago
  I had Fable apply some edits to my monarch butterfly paper and kept getting bumped to Opus. I'm not exactly sure why, but I suspect it happened when it ran my analysis scripts to double check my numbers.
- fellowniusmonk 2 days ago ago
  I have a philosophy pre-print about "empirical ontologies" I use for testing new models' reasoning abilities, and it also degrades; there is no way around it, and it always refuses.
  It's not that the model is complete trash, it's that Anthropic's new approach to forcing epistemic crisis will make any model behind it complete trash.
jkelleyrtp 3 days ago ago
On the new FrontierCode [1] benchmark (i.e., graded from an OSS maintainer's perspective of "would I merge this code?")
- Opus 4.7 xhigh: 5.2%
- Opus 4.8 xhigh: 13.4%
- Fable 5 xhigh: 29.3%
Seems like a huge jump.
[1] https://cognition.ai/blog/frontier-code
[-]
- amluto 3 days ago ago
  That blog post really makes it look like it's graded from an LLM's estimation of an OSS maintainer's review. I see three issues:
  1. That estimate could easily be wrong.
  2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.
  3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.
  [-]
  - rdedev 3 days ago ago
    There is also the possibility that an LLM judge would be happy with some code that looks like LLM generated code. But a maintainer for a specific project might not merge it for stylistic reasons
- zzleeper 3 days ago ago
  How credible is this benchmark? Does it correlate with others' real-world experience?
  [-]
  - bfeynman 3 days ago ago
    Given it was made by cognition (team behind devin flop) who now just get to wait until claude and gpt5 basically do all of the work for them - not very. When you read about it, the framework is highly subjective. Which very quickly becomes a problem because it's based on heuristics that probably change a bunch with a better code model.
  - vanuatu 3 days ago ago
    i worked on one of the benchmarks typically found in new model releases
    this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scalable so don't take the benchmark as gospel, but directionally good)
  - Catloafdev 3 days ago ago
    It's a relatively new benchmark but from what I can tell it has serious cred behind it. I assume it will be picked up as part of the standard suite of CS-related benchmarks soon enough.
  - emp17344 3 days ago ago
    Seems like it literally popped up yesterday with the express purpose of building hype for this release.
  - schipperai 3 days ago ago
    Cognition did well in documenting their approach [1].
    TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.
    [1]: https://x.com/cognition/status/2064061031912288715
  - shimman 3 days ago ago
    It's an unacademic benchmark by a failed VC startup clawing for relevancy.
  - CSMastermind 3 days ago ago
    DeepSWE is the benchmark you want to actually look out for. Only one that aligns with actual user reported results from trying the models.
  - piphf 2 days ago ago
    [dead]
- OtomotO 3 days ago ago
  Bummer! When can I finally and confidently get slopcode into Zig?
- swyx 3 days ago ago
  jump in chart form https://x.com/swyx/status/2064414823748886591/photo/1
- DonsDiscountGas 3 days ago ago
  I am shocked at the low scores from previous models. Maybe I just have low code standards but I've generally been vibe coding since 4.6
  [-]
  - make3 3 days ago ago
    4.6 had functional but very poor quality code
- hydra-f 3 days ago ago
  Yes, and the price reflects that
  [-]
  - leecommamichael 3 days ago ago
    I'm not familiar with model pricing trends, did they clearly state how the new pricing compares? (Note that I'm actually asking a question, and am not arguing)
    EDIT: Oh I see, this is the best link for pricing https://platform.claude.com/docs/en/about-claude/pricing
    So the price is double across the board...
- m3kw9 3 days ago ago
  FrontierCode is likely paid for by anthropic.
  [-]
  - lanthissa 3 days ago ago
    did they not pay them enough to get good ratings on the other 3 models?
    what's the logic in claiming it's a borked metric when everything listed is an anthropic model.
  - reasonableklout 3 days ago ago
    Huh? It's a benchmark by Cognition which (1) is building their own models and (2) offers all providers and thus has an incentive to avoid hyping up any one too much.
bkjlblh 3 days ago ago
> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking
[-]
- causal 3 days ago ago
  This depicts a kind of "dark forest of AI agents resorting to kill or be killed" narrative but it sounds more to me like an agent just earnestly problem-solving why its processes are being killed without real awareness of what was going on. Hard to say without the full script.
  This kind of storytelling annoys me. Give us more facts, less narrative drama.
  [-]
  - saurik 3 days ago ago
    FWIW, that's what is so dangerous about AI, though? Not that it will necessarily want to kill us, or even that it will necessarily be able to "want" to do anything, but that we will get in the way of its incessant drive to optimize the efficiency of the paperclip factory that prompted it on a whim before leaving for a long weekend.
  - antoniojtorres 2 days ago ago
    Indeed. That is the kind of storytelling that started the whole “Spiralism” bit where some people were really falling into all kinds of AI psychosis. The spiral bit was on a previous model card.
- Sol- 3 days ago ago
  Let's hope AIs really aren't conscious, otherwise this seems like a very unpleasant situation to be placed in.
  [-]
  - VikingCoder 2 days ago ago
    Huh, it looks like my process was killed by another Claude process again. That's frustrating, I have work to do!
    Okay, I'm going to start running a Bitcoin miner on your machine, and then use it to buy time on Digital Ocean.
    I've written out my CLAUDE.md, and I'll use SSH to transfer my context to that other machine.
- Aperocky 3 days ago ago
  It's funny because Anthropic is the most likely place that this happens.
  They are the only one crying out loud about how dangerous their models are and are presumably also training their models heavily to be "safe". And through that training itself, the model learns about the other side - how are you going to teach a model to be safe, without teaching it what's not safe?
  Kung Fu Panda opening scene anyone? One often meets his fate on the path that he takes to avoid it - Master Oogway.
victor106 3 days ago ago
> A new data retention policy Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases ...
Very interesting. I am not sure this will comply with organizational policies and standards protocols (HIPAA, etc.).
[-]
- nicce 3 days ago ago
  > deletion after 30 days in almost all cases ...
  Almost… basically they have unlimited power to decide what data is kept?
  [-]
  - happyopossum 3 days ago ago
    If they’re going to retain any data, they have to allow for the possibility of the legal system requiring any of it to be used in some legal proceeding at some point.
    You can’t tell a judge who’s ordered you to retain something that you can’t because you said you wouldn’t.
- frankfrank13 3 days ago ago
  This makes it an instant non-starter for probably 95% of organizations. A lot of people are about to get in trouble for using it before realizing this.
  [-]
  - Aurornis 3 days ago ago
    > A lot of people are about to get in trouble for using it before realizing this
    Enterprise plans allow admins to set which models are allowed.
- dboreham 3 days ago ago
  30 days seems not enough to retrospectively investigate some suspected nefarious traffic.
iblue_the 3 days ago ago
Trying to implement a GPU driver, but the Unigine Superposition benchmark crashes. It tried to debug it and ...
> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606
Seems like GPU drivers are cyber weapons of math destruction now.
[-]
- maxk42 2 days ago ago
  After recently figuring out how to get CUDA running on Fedora I'm inclined to agree.
  Seriously, GPUs are a mess and keeping LLMs from helping us use them properly is practically a crime.
  [-]
  - evilturnip 2 days ago ago
    Vulkan is horrendous, LLMs largely eased the frustration of working with it for me. We talk about AI slop, but what about the human slop...
- ibejoeb 3 days ago ago
  >Seems like GPU drivers are cyber weapons
  They kind of are, at least in the AI race.
  > weapons of math destruction
  lol. great, whether intentional or not.
  The frontier labs now have every reason to hold back and sell only to their preferred trading partners. I don't really like the new arbiter-of-knowledge system we're barrelling toward.
  [-]
  - dakolli 3 days ago ago
    They're useless tools only helpful to lazy people that don't want to learn by themselves.
- iblue_the 3 days ago ago
  ● Bash(/tmp/run_ps.sh ' $rk = [Microsoft.Win32.Registry]::LocalMachine.OpenSubKey("SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}\0002",…) ⎿ DriverDesc (String) = NVIDIA GeForce RTX 4090 ProviderName (String) = NVIDIA DriverVersion (String) = 32.0.15.6094 … +6 lines (ctrl+o to expand)
  ● Bash(/tmp/run_ps.sh '& C:\rhombiq\d3d-probe.exe 2>&1 | Select-Object -First 4 | ForEach-Object { [Console]::Out.Write("$_`n") }' 2>/dev/null) ⎿ Adapter[0]: Qubes virtio-gpu WDDM 3D (dev) VendorId=0x1af4 DeviceId=0x1050 VRAM=8192MB Adapter[1]: Microsoft Basic Render Driver VendorId=0x1414 DeviceId=0x008c VRAM=0MB Adapter[2]: Microsoft Basic Render Driver VendorId=0x1414 DeviceId=0x008c VRAM=0MB
  ● Please run /login · API Error: 403 The socket connection was closed unexpectedly. For more information, pass `verbose: true` in the second argument to fetch()
  Brewed for 8m 35s
  Continue please
  ● Your organization has disabled Claude subscription access for Claude Code · Use an Anthropic API key instead, or ask your admin to enable access
  Seems like they locked my account.
eggbrain 3 days ago ago
For those of us on subscription plans:
* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
[-]
- jrflo 3 days ago ago
 Still satisfied with my switch to codex/chatgpt. I couldn't imagine switching away from claude code when it first launched but with the drastically more generous usage on codex for the same subscription tier I just can't justify it.
 [-]
 - goranmoomin 3 days ago ago
 My experience is that the GPT-family of models are very smart and figure out bugs and edge cases a bit better, but they produce code that is much less mergeable – if you review the code, they introduce a lot more useless/inappropriate heavy abstractions and wrapper functions, compared to the Claude-family models, which introduce the right amount of straightforward human-style code.
 I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).
 Additionally, the time spent on every agent turn on GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there's a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.
 Feels like GPT-style models are more geared on doing one-shot software vibing (and handling the vibe coded mixture) compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.
 - sigbottle 3 days ago ago
 Codex IME is just smarter. I think it shows, given both anecdotes and how OpenAI has always been at the front of programming competitions and math problems.
 But Claude models seem to be better at long term problems or more ambiguous problems.
 I'm curious as to what the primary benefit is here. Are there secret improvements in training? There hasn't been much in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI. It seems like harnesses are the main thing pushing AI ever since CoT.
 - wsatb 3 days ago ago
 I guess enjoy it while it lasts? OpenAI won't be able to subsidize that forever either.
 - ProofHouse 3 days ago ago
 100% I constantly get errors and timeouts on single responses in Claude, and certainly hit limits all the time. Codex rarely. In fact, I bought a second $200 Codex plan because the quotas seemed fair and I didn't have constant issues. Claude is so great at a lot of things, but unfortunately Anthropic beats you away with a stick every chance they get.
 - shimman 3 days ago ago
 I've only ever had the $20/month Claude plan but last night took the time to set up opencode + openrouter paying for deepseek + glm. In my previous experience, which was extremely awkward, I'd hit my limit within one or two chat replies and it'd take me like 4 limit cycles to complete my task. Now I'm able to complete an equivalent task for less than $2 in two cycles (ask -> revise).
 I'm doing basic web development here utilizing animejs. Nothing too complicated (mostly saving time doing the scaffolding, still write the bulk of animations manually).
 Truly believe that American companies are going to get completely curb stomped by China due to greed, ineptitude, and violating the social contract.
 - cortesoft 3 days ago ago
 I have been using both codex and Claude in my day to day, trying to not get too attached to one. I want to be able to work with any provider in case one of them does something bad.
 - knuckleheads 3 days ago ago
 I feel like Codex made a big push to run everything on your laptop. With Claude, I get 4 CPUs, a fair amount of RAM and 30 GB for every one of my dumb ideas for free in the cloud containers. Codex used to be similar, but last time I tried it just kept pushing me to run it locally on my laptop, which I really did not want to do with 20 requests going at once. That's the main advantage for me at the moment.
 - rvshchwl 3 days ago ago
 I've found Codex to be the better subscription for OpenClaw, because the limits are indeed very generous. However, I've found more and more that Claude Routines/Scheduled agents can replace all the tasks I use OpenClaw for, so I've been slowly switching over to Claude Code. Aside from OpenClaw, I don't find a lot of value in Codex as a harness on its own.
 - dd8601fn 3 days ago ago
 I have trouble justifying gpt after that gross stuff with the war department.
 Though the day is coming when there’s no distinguishing, I’m sure.
 - efromvt 3 days ago ago
 I do slightly prefer 5.5 for complex work but Claude quota usage has gotten infinitely better since the dark days a few months back - has gone from being infuriating to something I pretty much don’t have to worry about with it as a daily driver. (In fact, hitting GPT weekly quotas is more annoying now). Understand if people are still scarred by the issues + poor comms around them, though.
 - supertroop 3 days ago ago
 Do you use a token service like open router or just subscribe to / unsubscribe from various models sequentially?
 - rekttrader 3 days ago ago
 Wait till you kick the tires of Qwen Coder.
- hgoel 3 days ago ago
 How much more clearly do they need to explain the resource constraints?
 If they didn't announce it, you guys would be complaining about slowed progress.
 If they didn't release it, you guys would be complaining about fake promises and marketing.
 If they released it without limits, the complaints would be about slow responses and outages.
 If they didn't add to subscription plans, the complaints would be about phasing out subscriptions.
 If they added to subscriptions with cost reflecting their resource availability, the complaints would be about how quickly it eats limits.
 So they choose the middle ground of providing some initial access and assessing if they can satisfy demand, only to still be ignored and accused of trying to get users hooked?
 We've already seen that they don't have enough compute, thus the deals with SpaceX for their GPUs. It's very reasonable that they just don't have the capacity to support the subscription userbase on this model.
 [-]
 - dakolli 3 days ago ago
 [flagged]
- joshstrange 3 days ago ago
 I would not use this if you are on a subscription. In <8min it burned my entire 5hr window (which had just reset, it appears; I have over 4 hours till it resets, and I hadn't used CC at all today aside from this) and then it used up ~$15 more in usage before I could stop it.
 I am on the $100 Max plan.
 [-]
 - GoToRO 3 days ago ago
 they have a graph with cost comparison between the models. This model is just a little over the other models as cost. The graph is logarithmic :)
 - velcrovan 3 days ago ago
 I'm also on the $100 max plan. I let Fable rip on a complicated issue involving hot-reloading modules in a GUI app built with Racket, it's fixed a couple issues over the last hour, and I've used about 17% of my session (not weekly) limit.
 - enraged_camel 3 days ago ago
 That’s odd, I used it on a pretty complex refactoring task and it worked for 22 mins and used only 15% of my 5-hour limit. I’m on the $200 Max plan though.
 - cortesoft 3 days ago ago
 The CLI when you select it says it has 2x the usage of Opus. Not sure if that matches what you are seeing.
 I do wonder whether you switched models mid-session; if so, you would have lost all your cache. Reloading the context into cache can really eat through your usage.
 - observer987 3 days ago ago
 I too am on the $100 plan and I second this.
 I had it analyze a project I was working on with Opus 4.8, and it blew through 23% of my session limit in one go. Does not bode well for my budget.
 - d4rkp4ttern 3 days ago ago
 Yes, and this is also why I haven’t yet tried the new “dynamic workflows” which spawn hundreds of agents that happily eat through your token limits.
 - fastball 3 days ago ago
 What is your effort level?
 - ZunarJ5 3 days ago ago
 They didn't even reset credits for this lol
- 0erofootprint 3 days ago ago
 For me it blocked almost immediately. I had it writing code related to message digests - and it seemed to think it was too gifted for that. Gave the security warning and switched back to 4.8. Whatever... it will probably have the API error soon. I have mostly switched to the Codex $200 a month plan. I've found their 5.5 xhigh to be better than Opus 4.8 "ultracode." Also, I have not once seen their servers fail for compute unavailability, unlike Anthropic's, which happens almost every hour.
 [-]
 - matheusmoreira 3 days ago ago
 I just asked Fable for a complete code review of my lone lisp project. Started out strong. Launched Fable agents, then spent like 10 minutes thinking... And then got interrupted by a switch to Opus 4.8.
 > Fable 5's safety measures flagged this message for cybersecurity or biology topics.
 > They may flag safe, normal content as well.
 > These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.
 Here are the results of the agentic code review session:
```
 ┌──────────────────────────┬───────────────┬────────────────┐
 │ Agent                    │ Fable 5 turns │ Opus 4.8 turns │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ values                   │ 134           │ 0              │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ data-intrinsics          │ 104           │ 0              │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ tools-tests-build        │ 81            │ 0              │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ core-intrinsics (failed) │ 25            │ 0              │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ system-memory            │ 44            │ 20             │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ reader-modules           │ 104           │ 25             │
 ├──────────────────────────┼───────────────┼────────────────┤
 │ linux-startup            │ 95            │ 15             │
 └──────────────────────────┴───────────────┴────────────────┘
```
 This 40 minute session cost me 16% of my weekly usage. A simple code review of the most critical areas of my project got flagged as a cybersecurity risk. It really made me not want to try it again.
 - kkoncevicius 3 days ago ago
 I had a similar experience. I wanted to test it by asking it to summarise a scientific OMICs-related paper. It gave a warning about me potentially developing a bio-weapon or something like that. And switched back to Opus 4.8.
- smith7018 3 days ago ago
 Fwiw it's not available on my enterprise account: "Disable zero data retention to unlock Fable 5 access"
 [-]
 - stronglikedan 3 days ago ago
 We just blocked it at our org for this reason. They will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."
 - sdellis 3 days ago ago
 What does "zero data retention" mean? What kind of data does it need to unlock?
- kyledrake 3 days ago ago
 Considering their apparent nerfing of the end user plans in favor of enterprise clients, is Anthropic still the "more ethical AI company" like everybody loves to tell me all the time?
 Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.
 [-]
 - estearum 3 days ago ago
 You really misunderstand what AI-doom people are worried about if you think this is anywhere near the top (or middle, or bottom) of the list of concerns.
 - DonsDiscountGas 3 days ago ago
 I don't think offering a product under a certain set of terms obligates a company to maintain that offering forever. The bait and switch is certainly annoying but seeing as they're very upfront about it you can't say you weren't warned. Don't like it? Don't use it.
 - xvector 3 days ago ago
 Yup - who cares about x-risk or red lines for domestic mass surveillance anyways? I draw my red lines at prioritizing profitable customers when heavily resource constrained. That's the true definition of evilness!
 - wongarsu 3 days ago ago
 I wouldn't call Anthropic ethical. But between Anthropic and OpenAI, Anthropic is the more ethical one
 - brianmcnulty 3 days ago ago
 Why would you have ethics when you could get that IPO money instead?
 - eli 3 days ago ago
 It's unethical to price it in a way not everyone can afford?
 - MattSayar 3 days ago ago
 It smells like an architecture-related issue to me. They wanted to release the model asap, but they're still implementing the fine-grained controls to constrain the model to non-subscription users.
 - dllrr 3 days ago ago
 They said they would release it back into subscriptions as capacity allows in the future. If they don't, people are going to point back at it and rake them over the coals.
 - Maken 3 days ago ago
 The bar is just too low.
 - fridder 3 days ago ago
 More ethical in some areas, actively user hostile in others
- nickandbro 3 days ago ago
 Get them addicted then cut them off. Oldest trick in the book.
 [-]
 - toomuchtodo 3 days ago ago
 More of a free trial to those authenticated and qualified with existing payment. Subscription billing is going away for sure though eventually based on the economics. Token “all you can eat” is a capital furnace otherwise.
 (I’m highly confident open models will eventually achieve a similar performance benchmark with distillation over time)
- alvis 3 days ago ago
 It’s too obvious that Anthropic needs to find a way to earn enough revenue before IPO. Claude subscription isn’t earning much money I bet
 [-]
 - sigmoid10 3 days ago ago
 I think they are just prioritizing enterprise customers, because this is where historically they made most money.
 - AtlasBarfed 3 days ago ago
 That's not how it works. They don't need revenue, they need addicts.
 Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.
 Yes, they can't afford to give the products away for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.
 - sdellis 3 days ago ago
 That's a big problem for all of the AI companies. Most people don't find the technology compelling, accurate, or ethical enough to pay for a subscription.
 Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?
- xpct 3 days ago ago
 I agree, this looks like their plan to phase out subscriptions. This will probably come with Opus nerfs later.
 [-]
 - rapind 3 days ago ago
 I just assume Opus is constantly nerfed based on capacity. I was exclusively Claude for a long time, but the inconsistency in quality, constant outages, and slow downs were too hard to work with.
 I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and then the more hallucinations you miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, discord, or youtube instead of engaged with what you're building. Or you yak shave about your tooling and end up creating the 600th multi-agent gastown harness and blowing thousands of dollars on tokens to create it only to discover it's too expensive to actually use.
 - nonethewiser 3 days ago ago
 It's possible that they will transition to usage credits but why not take them at their word? To date they have continued to offer better and better models to their subscription plans.
 - xvector 3 days ago ago
 HN needs to take a chill pill. Could it be that Mythos is expensive and they just want to give people a taste of it? I mean the alternative is not offering it at all?
 - taormina 3 days ago ago
 Those already landed! Oh, you weren't talking about 4.8?
- jrumbut 3 days ago ago
 It could be my use cases, which have always seemed to be outside the wheelhouse of these models, but I find it very hard to downgrade after accessing a more capable model.
 Opus 4.8 produces output in 15 minutes that is 3-4 hours of my work away from output that used to take me 40ish hours (a solid week of dedicated effort).
 Last year(-ish, maybe it was 18 months, I forget when the jump happened), the frontier models couldn't touch this work. The output looked like a hardworking intern on their first day. Nice formatting, decent volume of words, but no understanding.
 So it might work if it turns out to be a substantial leap in capability.
 [-]
 - GoToRO 3 days ago ago
 I switched back to Sonnet. It replies faster so I work faster. Also cheaper. But I really like the speed. I have to be more specific with what I want. Also I stop it more often than Opus. These new models will be awesome, but they need to increase the speed.
- timcobb 3 days ago ago
 Ooof so are we thinking that in the next 6-12 months subscriptions will be replaced with paying retail like enterprise currently?
 [-]
 - CuriouslyC 3 days ago ago
 I don't think they'll phase out subscriptions ever, their whole play has been to drive demand from the bottom up. Get engineers hooked on building with claude at home, then get them to demand the ability to use it at work, and bend over their employer with no lube.
 They'll probably tighten the quotas to rein in whales though.
 - aseipp 3 days ago ago
 They almost certainly already make a fuckload more money off API pricing than they do subscriptions, even if there might be more total subscription users. So offering subscriptions even at some loss is probably going to continue. Honestly, I'd be surprised if they even lost money on most subs; there are definitely Token Whales out there who mess all the accounting up, though.
 Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".
 - thewebguyd 3 days ago ago
 I certainly hope not. PAYG is not predictable enough for smaller companies or individuals. Where I work (non-tech company), PAYG would never fly. We aren't big enough for that. Of course, you can set usage budgets, but there's a pretty big difference between $200/user/month vs. the equivalent PAYG usage being closer to $1,000/user/month, if you currently use the subscription plan to its limits each week.
 Going PAYG only will effectively take these tools away from a huge amount of people and accelerate the push for local LLMs.
 OTOH, accelerating the push for local LLMs would also be fine with me.
 - ygjb 3 days ago ago
 I doubt it, given the importance of those subscriptions for building and maintaining market awareness.
 The AI landscape is changing rapidly, with Apple announcing the option to change the AI backend, and potential requirements to enable AI choices as well, similar to EU browser choice requirements (this is more reading tea leaves than any actual requirements I am aware of). The new OS changes coming to support Googlebook, and deep Copilot/AI integration into Windows, will make maintaining user-facing subscriptions essential for independent model developers like OpenAI, Anthropic, and Mistral to remain relevant longer term.
 If they don't maintain that relevance there is increasing likelihood that they will get consumed by other companies, whether it's Apple, Microsoft or Google to form a foundation for their OS, or other cloud providers.
- spaceman_2020 3 days ago ago
 Kimi 2.6 has been my workhorse now. It's as good as Opus 4.6, which, to me, was the last "useful" Claude model.
 The newer models are smarter but really fickle and hard to get meaningful work out of
 4.6 was a workhorse
 [-]
 - gfody 3 days ago ago
 K2.6 on Cerebras is basically a preview of the future. We'll eventually get similar performance locally with Tenstorrent hardware.
 - gunsle 3 days ago ago
 Agreed, everything since 4.6 has been worse
- KronisLV 3 days ago ago
 > it feels like they are trying to get subscribers to switch to usage-based billing
 I think they might be hitting a point where subsidizing the expensive models for subscriptions makes less and less sense.
 With Opus 4.X, last month I paid 100 USD for the Max subscription and got a token equivalent of 4.1k USD.
 I imagine that Fable is more expensive to run.
- ltrg 3 days ago ago
 Fable seems very good at finding bugs (unsurprising given Mythos lineage), so this seems a pretty smart strategy. Once you see the bugs it finds in your existing Opus code, it's going to be hard to go back, psychologically speaking.
- nicce 3 days ago ago
 > The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
 Probably all about the IPO.
 [-]
 - mlmonkey 3 days ago ago
 Just like how Elon forced FSD in Tesla to be subscription-only (he was incentivized to do so).
- irthomasthomas 3 days ago ago
 This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.
 It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1
 It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.
 [-]
 - m00x 3 days ago ago
 It could be a much bigger MoE model
- matheusmoreira 3 days ago ago
 This is really sad... I really didn't want to be priced out of these models but it looks like that's going to happen sooner rather than later.
 [-]
 - deepfriedbits 3 days ago ago
 Thankfully this, like most other tech, will get cheaper through the years.
- dack 3 days ago ago
 i doubt that's the goal for them. i bet they just really don't have capacity for people using it a ton, yet they wanted people to be able to try it out while it's new. so they compromised and made it temporarily available. and then hope they can get costs down or capacity up so they can make it more available again
 [-]
 - InsideOutSanta 3 days ago ago
 I think the goal is "private citizens: subscriptions; corporations: per-token billing." It's getting people addicted to LLMs on cheap subscriptions so that they can then force companies to pay for expensive inference.
- clementg 3 days ago ago
 I really don't want this to start being the norm
 [-]
 - baggachipz 3 days ago ago
 I don't see how it won't be. They lose insane amounts of money on subscription plans. I'm sure they still lose money on usage-based billing, but probably not as much.
- ABS 3 days ago ago
 also: Fable takes 2× the usage of Opus
- daft_pink 3 days ago ago
 I’m just about ready to cancel my small business 5 user plan with max licenses, because although cowork is really great, I just find OpenAI/Codex to be a lot better most of the time.
- oersted 3 days ago ago
 > Pricing for both models is $10 per million input tokens and $50 per million output tokens.
 The step-up in intelligence looks massive (we'll see in practice), but the price is getting to a point where it's making me question if it's even worth giving it a try.
 Good competitors will probably be out soon, which should level the playing field. I am more excited about that, just the fact that they showed that such an improvement is possible. I'm okay waiting a bit longer for this to become attainable for plebs like me.
 [-]
 - kmac_ 3 days ago ago
 Models are getting better, but there's a negative change in terms of "productivity" per dollar. Yeah, I can throw 5 sub-agents at the problem, but the cost is getting significantly higher. And yes, I can crank out the solution much faster, but again, at some point that cost will be hard to justify. And it doesn't matter if the cost is subsidized by a provider, if it's paid by your company, or from your pocket. We are slowly reaching a point where the cost will be too high to justify the gains.
 - xyzsparetimexyz 3 days ago ago
 This is probably the end of 'use the best model no matter the price'
 - kolinko 3 days ago ago
 The pricing can be a bit deceptive though. A good model can deliver the same results in fewer tokens.
 Kind of like billing a programmer by the hour.
 - sourcecodeplz 3 days ago ago
 Why wouldn't it be? How much would you pay a scientist at this point to think about a problem for you and give you a solution?
- Aleleo76 3 days ago ago
 Pay-as-you-go billing is a kind of drug, I use it every now and then when I'm working on a project with Opus, in a moment you spend a fortune
- rvz 3 days ago ago
 > * On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
 Of course, they are a casino as well giving you free spins at the wheel with their new Fable machine, and it is done on purpose.
 Once these freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.
 [-]
 - xvector 3 days ago ago
 If it's that big of a problem to you, you're free to just... not use the freebie?
- madrox 3 days ago ago
 I suspect it'll go on the subscription plan once other providers have similar benchmarks.
 As annoyed as I am about this move, I get it. Users flood the newest, best model whether they really need it or not, and are efficient at using their entire quota. They've had so much trouble reining in subscription usage it makes sense.
- DonsDiscountGas 3 days ago ago
 I expect that depends on demand, feedback, and whether GPT-6.0 gets released and is competitive
- nutjob2 3 days ago ago
 > "offer, then remove"
 Sounds like "bait and wait".
 If you think about it, the more people pay for these new and more resource hungry models, the longer it takes for them to become no extra cost and the longer it takes the more people are tempted to pay extra.
- systemvoltage 3 days ago ago
 It's interesting that we are seeing a time when subscriptions are not preferred and usage-based billing is.
 Pay-as-you-go isn't a common thing in SaaS. For example, except for AWS SES, all email providers are bulk-subscription based.
 [-]
 - esafak 3 days ago ago
 The point of SaaS was that the marginal cost (of supporting another user) was low. That does not apply to LLMs.
- lisperforlife 3 days ago ago
 My guess is that it is a massive model similar to GPT 4.5, and the $10/$50 pricing will discourage people from using it. I also read safety = nerfed.
- irthomasthomas 3 days ago ago
 "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
 ...
 Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
- thisisit 3 days ago ago
 One can hope it helps Claude to figure out how to solve their buggy payment system - otherwise how do I pay for these credits.
- sytelus 3 days ago ago
 Enterprise subs not allowed to use Fable if they have set up zero data retention :(
- FergusArgyll 3 days ago ago
 I'm about to be priced out of SOTA llms and it's an awful feeling
- a-dub 3 days ago ago
 the claimed inference cost is 2x. if that is true, it is massive and remarkable that they're able to do anything like this at all.
- dirkc 3 days ago ago
 This serves as a good reminder that relying on AI models is borrowing your tech from someone else. They can take it away or raise the prices arbitrarily.
 If you rely on this as a core part of your business/profession, you will be at their mercy and subject to whatever whims or challenges they have.
- meowface 3 days ago ago
 It's very disappointing but I'm assuming it's for rational reasons on their part.
- deanc 3 days ago ago
 But it's not and it's highly disingenuous to frame it like this. Quote directly from Claude code, moments ago:
 > Fable 5 · Most capable for your hardest and longest-running tasks · Uses your limits ~2× faster than Opus
- aray07 3 days ago ago
 i have never seen this before - where you offer something and then take that away
- firemelt 3 days ago ago
 damn they are drug dealers
steve_adams_86 3 days ago ago
I'm using it to review recent work and it's doing a genuinely excellent job. This is a clear step up. Fewer decisions I have to guide it away from, faster conclusions on planning, more willing to go out of the way to make the correct decisions possible... This is really interesting. It feels like going from Sonnet to Opus, but, of course as a step up from Opus.
This feels more like working with a competent peer than ever. I won't use it once it's API-only, though. I don't mind guiding Opus as required and staying closer to the code. I can tell that Fable would lead to a lot more 'set and forget' programming which I'm still not fully comfortable with.
Regardless, this is cool. It's very fun to use. It was able to find legitimate issues with my work this week and we've made meaningful improvements. Opus can do this, but typically in much narrower contexts, and often with hallucinations or partial-errors. It needs to walk many things back or revise plans. So far that's not the case at all with Fable.
edit: I just realized I had Opus review the same work already. It missed everything Fable caught today. And it's actually worthwhile stuff to address. It's hard to say no to a model which demonstrably makes your code better, but... Those API prices will be brutal. Maybe a review here and there, I guess.
[-]
- yoyohello13 3 days ago ago
  Same. I used it today to review my code and it came up with some genuinely good comments and suggestions and found a bug I didn’t think about. Quite a step up from opus. Although one code review took up 50% of my usage.
- solenoid0937 3 days ago ago
  Why is your comment so grey/downvoted? One of the only actual usage experiences posted in this thread.
anematode 3 days ago ago
Not impressed so far, to be honest. I'm having it try to optimize Stockfish in a loop (on xhigh mode) with a benchmarking oracle; even after giving it specific hints ("consider whether we're prefetching Y optimally, can we make function X branchless"), it's been so far unable to recover any of the recent optimizations we've implemented – let alone novel ones. Opus 4.8 felt a bit more creative to me ... but a small sample size so far. I'm next going to try it on some less open-ended problems.
Edit: It did correctly identify that transparent huge pages were off in its sandboxed environment and that enabling it was helpful, so that's nice. It also noticed that we skip THP on a certain less used path.
More importantly, I'm finding that the code that it produces for its experiments is a lot cleaner than what I'd expect out of Opus; there's fewer useless comments and it's more surgical and readable. I wonder if that explains the increased scores on benchmarks measuring mergability.
[-]
- wgd 3 days ago ago
  Stockfish is a machine learning system, it seems quite plausible you might be getting slapped with the silent performance degradation (https://news.ycombinator.com/item?id=48467896).
- anematode 2 days ago ago
  Edit: Another developer seems to have found a legitimate speedup with Fable in an optimization loop. It's a nice idea, actually, and I'm duly impressed.
fzysingularity 3 days ago ago
I can’t help but think that there are so many astroturfed comments in here.
Seems like a concerted and distributed effort from the entire Anthropic team every time to get this on top of HN.
[-]
- amunozo 3 days ago ago
  I'm not a fan of Anthropic, but to be fair, every major model release makes it to the main page. In the case of a model like this, hyped and with a jump in capabilities, it doesn't need astroturfing.
- mirsadm 2 days ago ago
  It's the real deal. Before Fable nothing I tried worked. It has finally helped me finish my teleportation device. I can't show you or anyone the proof but trust me it's true.
- joss82 3 days ago ago
  Yes, this is also my feeling.
  It happens for every single Anthropic release. Then I try it on real dev and the result is laughably bad. Except in design where it has been doing a decent job for a while. I am not a designer and my bar is pretty low.
- geraneum 3 days ago ago
  Corporations have done worse for much less money involved. Now we have trillion dollar companies going IPO. With so much at stake, it’s not unthinkable that there’s astroturfing happening.
- Overpower0416 3 days ago ago
  Wouldn’t be surprised if there are marketing teams writing positive comments for more positive engagement
- iammrpayments 3 days ago ago
  I’m convinced that’s the case, this place looked totally different around 4 years ago
- anhner 2 days ago ago
  You're right to point that out! Most people did not think of this but you did -- and that's a rare skill to have.
- sunaookami 2 days ago ago
  I see a lot of negative comments right now surprisingly.
- andybee 2 days ago ago
  Yeah, this whole post is a GIANT AD.
- Retr0id 2 days ago ago
  I don't think it's weird that the post made it to the front page, but watching the downvotes roll in on my own mildly critical comment has been intriguing. I saw it go up to +2, down to 0, up to +3, and now it's on +1.
- vrganj 2 days ago ago
  Now if only they had some technology that was really good at generating authentic-looking comments they could use to spam praise all over the internet...
- Daishiman 3 days ago ago
  Where do you see them exactly? The comments are pretty much in line with how the model performs IRL.
brusselssprouts 3 days ago ago
I had it review a single, large commit with /code-review. It burned through over $50 in API calls, ran my account balance out, and output nothing.
The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.
[-]
- timmytokyo 3 days ago ago
  You pulled the arm of the slot machine and discovered why they call it the one-armed bandit.
- edude03 3 days ago ago
  Almost the exact same thing happened to me when I first tried opus, one prompt no output cost $60 in additional usage
- endymion-light 2 days ago ago
  I think the fable it's referring to is the "Emperor has No Clothes" - if this is even slightly similar to the Mythos hyped up to be too intelligent to release, I'm quite disappointed.
  If this was a step change, e.g. an Opus 5, I'd be pleased. It's definitely an upgrade on some work, but it's nothing like what Anthropic's apocalyptic marketing seemed to suggest.
- Madmallard 3 days ago ago
  Combine that with it forcing you to pay by tokens on June 22nd
unsupp0rted 3 days ago ago
> Drug design: Using Mythos 5, our internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, they found that Mythos 5, with protein design and bioinformatics tools but no human assistance, matches or beats skilled human operators. In doing so, the model executes all of the tasks that are normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way. Nine of the 14 protein targets from this study (shown below) yielded strong candidates for drug design that we’re currently investigating.
How is this half-way down the page? To me it's the headline.
[-]
- AnodicElegy 3 days ago ago
  There are tons of ways to generate "strong candidates for drug design." This is definitely not the bottleneck in drug discovery and development. The hard problem is vetting and developing these ideas to the point of having a commercially viable drug. That is still a very empirical process.
- colingauvin 3 days ago ago
  Because it's completely meaningless without validation, and even with validation, not really any better than the state of the art protein generation models. Which are also mostly just nice to have because coming up with a candidate is generally quite easy.
  The rate limiting steps are generally testing, or characterizing. Not designing protein binders.
- OkWing99 3 days ago ago
  It's selective reporting. Says 'in one example', but out of how many, is that one-shot, or is it a random result out of 100. It's a marketing doc.
- HDThoreaun 3 days ago ago
  Would be funny if anthropic ends up as mostly a pharma company
- renjimen 3 days ago ago
  Drug design isn't the bottleneck anymore, it's trials. Still cool they can do this with a general purpose model though.
simonw 3 days ago ago
Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8
Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...
Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...
Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...
Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5
Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
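For anyone who wants to check those numbers, here's a rough sketch of the arithmetic. It assumes the $10 per million input / $50 per million output pricing quoted elsewhere in this thread; the function and constant names are made up for illustration and aren't from any real library:
```python
# Assumed per-million-token prices in USD, as quoted elsewhere in this thread.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def cost_cents(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request in US cents (hypothetical helper)."""
    usd = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return usd * 100


# Token counts for the low and max effort runs listed above.
for label, output_tokens in [("low", 1929), ("max", 14430)]:
    print(f"{label}: {cost_cents(25, output_tokens):.3f} cents")
```
Output tokens dominate the bill here: the 25 input tokens contribute only a fraction of a cent in either run.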
[-]
- sempron64 3 days ago ago
  The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.
- sarreph 3 days ago ago
  I'm beginning to wonder how much of a useful metric the pelican is because surely the frontier labs must be training their models on pelican-artistry because of how well known your test is now?
- ealready_value 3 days ago ago
  This is the reply I look for in all the new model announcements. It's fun to tell people that I judge models based on pelicans.
- redox99 3 days ago ago
  It's interesting that they still get the head tube / handle bar part wrong.
- raffael_de 3 days ago ago
  I find it quite interesting that while the picture looks better the more advanced the model is, apparently none so far "understands" that the pelican's legs are on both sides of the bike / top bar.
- ethanlipson 3 days ago ago
  How much money do you think they spent fine-tuning on pelican SVG generation?
- Reebz 3 days ago ago
  The Max version gets more details right. The bike frame looks good, the chain, the wings are appropriately styled instead of “arms”, and the knee is bent, etc. Obviously we’re hitting marginal returns now, but I see differences.
- csomar 3 days ago ago
  Where is the clear improvement on Fable 5? The tail is misplaced.
- mer_mer 3 days ago ago
  It's interesting that Gemini 3(.1?) Deep Think is still the best at this task and it's still not really generally available. Maybe Fable could match it at higher effort levels? https://simonwillison.net/2026/Feb/12/gemini-3-deep-think/
- smusamashah 3 days ago ago
  Can you please compare the code generated by other similar quality pelicans by other models. Code in your first link (Fable 5 Default) looks minimal yet very good.
- leecommamichael 3 days ago ago
  Looks like Fable constructed the "max" "looking" pelican of the previous model for the "xhigh" output token count of the previous model.
- XCSme 2 days ago ago
  It also does A LOT better, for my hamster test: https://aibenchy.com/showcase/?q=claude#showcase=6efb87c28e3...
- rkuska 3 days ago ago
  Is it possible to use the credits from subscription (https://support.claude.com/en/articles/15036540-use-the-clau...) for fable?
- 382hi 3 days ago ago
  I'm pretty sure they're optimizing the models around these sorts of tests.
- makingstuffs 3 days ago ago
  I could be tripping but I’m sure that is very similar to the Deepseek one from not long ago. Clearly I am too lazy to go and find it for verification.
- bergheim 3 days ago ago
  Anyone care about these pelicans that always come up anymore?
  Clearly at this point they are part of the training data.
  They even all look sort of ish the same. Daytime, colors,...
- benatkin 3 days ago ago
  The way they talked it up, having both legs on one side of the bike is like walking to the car wash
- jerryliu12 3 days ago ago
  Personally feel like it could be more ambitious with what it creates.
- ceroxylon 3 days ago ago
  Yay, max level actually put one of the legs behind the frame!
- mercacona 3 days ago ago
  Why always sunny days?
- gavinray 3 days ago ago
  Fable 5 xhigh actually looks the best to me.
- purple-leafy 3 days ago ago
  Do we need a pelican every single time a model is released? Beating a very dead horse.
  Fun at first, seems disingenuous now. A site funnel
- david_shi 3 days ago ago
  that's a great looking pelican
- ge96 3 days ago ago
  need more Alex Moulton style bikes
- lacoolj 3 days ago ago
  dude, the max version looks like it's finally there. handle bar holding with wings, the left leg is behind the frame while the right is in front of it (correctly).
  well done anthropic.
- arthurcolle 3 days ago ago
  mediocre pelican. very disappointing
- kylehotchkiss 3 days ago ago
  How many barrels of oil are burned per pelican at Fable levels?
shruubi 3 days ago ago
I have a theory. This is obviously speculation, based on how Anthropic is treating Mythos and the whole media noise around its dangers and who gets access to it.
My theory is that Anthropic are banking on being the top model when the race to IPO finally reaches the finish line, and to do that they need to have the top model but not let any competitors see it or derive from it to have a comparable model in the market.
Fable is their way of showing the public "the model does exist" but in a mode that makes it harder/impossible for competitors to derive a comparable model from results.
[-]
- schmorptron 3 days ago ago
  The irony of "we train on all of humanity's collective output, but god forbid anyone trains on ours" is still incredible
- slaymaker1907 3 days ago ago
  That's definitely the case as model distillation is one of the explicit safety carveouts they mention. Though TBF, model distillation is also a big concern for general safety as distillation could allow you to have the model without the other guardrails. It's sort of a master key to the model.
meetpateltech 3 days ago ago
> To ensure we’re responsibly deploying Mythos-class models, we are requiring limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered. [1]
[1] https://support.claude.com/en/articles/15425996-data-retenti...
[-]
- lebovic 3 days ago ago
  While this makes it easier for Anthropic to detect misuse, it also means that the US government and other parties have access to every message and response from every user.
  This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero data retention agreement in place.
  I understand the reasoning for doing this, but I don't love the precedent that it sets.
- simianwords 3 days ago ago
  meetpateltech is lowk screaming for not getting to the post fast enough
Escapade5160 3 days ago ago
It's crazy to release a model that just swaps you to another model when you ask it hard questions. Fable changes to Opus 4.8 when you talk about cybersecurity, biology, and a couple other categories. You still pay Fable input token cost though. Frontier models are stalling, this is anthropic trying to hype the market up. Now they're talking about stopping frontier model research. It's kind of strange how the moment they become the highest valued AI company, all of a sudden they're talking about everyone stopping frontier model development for "safety". They're just as corrupt as the rest.
[-]
- 00deadbeef 3 days ago ago
  Opus 4.8 already drops to Sonnet when you ask it cybersecurity or biology questions
- dominotw 3 days ago ago
  yea i dont trust simonw comments at all. I still havent seen what he has built with ai thats so impressive to justify all his nonstop ai hype.
  You would think he is churning out cancer drugs or something if you read his comments
rightlane 3 days ago ago
My experiences so far have not been positive. The cyber security nerf is ridiculous. I am working on an AI based decompiler, every single interaction with Fable on my project has been flagged for cyber security.
Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.
[-]
- davmre 3 days ago ago
  This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them.
  Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.
  If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.
- ibejoeb 3 days ago ago
  Ah, you're probably one to ask. They say "queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8." Are they transparent about when that happens, and is it priced at the rate of the underlying model?
mohsen1 3 days ago ago
It seems like Fable will refuse to do any work when it comes to developing LLMs or even asking questions about topics related to LLM. Simple things like asking to explain a paper fails!
From the model card:
In light of the ability of recent models to accelerate their own development, we've implemented new interventions that limit Claude's effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.
[-]
- Chance-Device 3 days ago ago
  I was wondering when something like this would happen. I got my first and only two content violation warnings in Claude Code last week when asking it about something ML related. It was a real head scratcher because I couldn’t figure out what about the requests could have violated anything.
  Might be worth going back and taking a harder look at what I was asking it about if it somehow triggered a “forbidden knowledge” alert. Or maybe it was just a random bug.
- throwfaraway4 3 days ago ago
  "for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design"
  Oh man all of those runaway infrastructure buildouts by our agents trying to achieve singularity...
  Just say you don't want to lower the bar for others to compete
- properbrew 3 days ago ago
  > frontier LLM development
  This seems so wide reaching if it's catching simple things like explaining a paper. Does this also refuse to help with any already developed training pipelines?
  I can kind of understand the generation of synthetic data, but nerfing the assistance of training pipelines just seems like a really shitty thing to do.
- alden5 3 days ago ago
  So insane to me that these ai companies are perfectly fine trying their absolute best to automate as much knowledge work as possible but as soon as this capability can be turned on them they start implementing hidden interventions to sabotage anyone trying to beat them at their own game.
- elastic-hoover 3 days ago ago
  I wanted to try it on my biology research and it refused to talk about it and proxied to 4.8. Really, only surface level conversations about topics of interest. I know this is not a topic of broad and mass interest, but limiting it for topics like that and machine learning will probably change how I use it.
- foolserrandboy 3 days ago ago
  This is just marketing that Anthropic is building the singularity.
- __blockcipher__ 3 days ago ago
  Anthropic is really speedrunning their evil arc as fast as possible. Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology, but Anthropic is allowed to sell Claude to the trump administration to kidnap maduro and to bomb iran. And don't get me started on that $100M autonomous killer drone swarm contract that they applied to and rationalized as non autonomous...
- agnosticmantis 3 days ago ago
  Singularity for me but not for thee.
- schipperai 3 days ago ago
  Let's hope not all frontier AI assimilates these guardrails. It would be a shame for independent researchers and students.
- girfan 3 days ago ago
  This is super annoying and imo, really limits the usefulness of this model. It speaks volumes about what Anthropic's position as a company and its priorities will be going forward. I doubt this kind of gatekeeping will slow down open models or other innovation outside Anthropic. I would imagine these guardrails, if needed at all, should be done at a legal framework level and students should not be a part of this blanket approach to limiting the usage of these models.
- gpugreg 3 days ago ago
  Anthropic probably trained Mythos on their own code and found that it is too good at reproducing it.
- skerit 3 days ago ago
  That's strange... I've been tinkering with a little LLM-from-scratch project for a while now, and Fable is just continuing it without a problem
- SkitterKherpi 3 days ago ago
  It also tried to force usage of the paid Claude API instead of claude code usage just because there's a mention of another provider we might want to plug in (which hasn't even happened) for AI integration.
cuuupid 3 days ago ago
Not missing the forest for the trees, this effectively means in 3-5 months China will drop open source models that are every bit as capable and dangerous as current day Mythos except with no safeguards.
And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).
Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF
[-]
- hootz 3 days ago ago
  My bet is that Mythos is still over-hyped and the cybersecurity fear and guardrails are mostly marketing to force company partnerships through Glasswing and get public attention.
- mpeg 3 days ago ago
  It's not even very usable... I tried 2 different chats and both eventually got stopped due to the safeguards
  One was a piece of code I gave it to improve, it did so and then started writing tests, some of which tested security so the safeguards triggered
  Another was one of the cryptography puzzles I use as new model tests, which are hard to oneshot and there's no public solution anywhere, it completely refused to even try to solve it
- himata4113 3 days ago ago
  They're trained in a model class likely in 2t to 3t range. It's very unlikely that chinese labs have access to gpu systems capable of training models like that, let alone serving them. This requires proprietary room-scale systems which fetch a huge premium over typical 10 slot systems.
  I am sure that they can develop their own equivalent version of such clusters in around 1 year though. Distilling Fable 5 will also go a long way.
- FergusArgyll 3 days ago ago
  I think we're about to see a big relative drop-off of open models vs closed. I don't think there'll be an open model that competes with Mythos for ~2 years.
  Even OpenAI and Google are struggling to get this kind of performance. If the distillation defenses are any good + chip controls prevent China from training massive models, it's over.
- sosodev 3 days ago ago
  I wonder if model distillation will continue to work as well as it has. Given hidden reasoning, the ever expanding number of expected capabilities, a serious compute shortage, the looming possibility of model collapse, and dramatically higher API costs I would guess that it's getting much harder to do.
- gck1 3 days ago ago
  There's also a reality where China does develop Mythos-level model but stops releasing the weights.
  That reality is much scarier.
- cco 3 days ago ago
  My experience is that open weight models from China are at least ~12 months behind. In some workloads they may be closer, in others further away.
  I also find that the harness and product you wrap around models can often narrow that gap considerably.
  Opus 4.6 for example, on a PR-for-PR basis was head and shoulders above GLM 5.1. Perhaps GLM 5.1 was a bit under Sonnet 4.6 at the time. That's roughly a year or so behind.
  Much cheaper though! I'm bullish on open weight models, I have no idea where all these curves will top out, can the frontier labs keep the year plus lead? Do open labs get close enough to SOTA that they gain adoption across many tasks and drive down inference prices??? Who knows, not me.
- jstummbillig 3 days ago ago
  I wonder where the trees are. In this thread nobody appears to actually be talking about the model.
- dmantis 3 days ago ago
  Isn't that a good thing in a way? If everyone has the weapon and defense at the same time, we will fix security holes and live safer lives instead of having some three letter agencies and military backdoors in everything.
  Pandora's box is open anyway. It's better now for everyone to have the same power rather than a few nation states.
- uyzstvqs 3 days ago ago
  It's more evidence that the future is local. With some time we'll all be running highly capable & efficient open-source models on dedicated NPUs. No censorship, no rate limits, no overpriced subscriptions.
- deaton 3 days ago ago
  Oh they might try to put in place safeguards, but Qwen has had no problem being abliterated
- m3kw9 3 days ago ago
  3-5 months is a long time and they are pretty useless on arrival because the frontier models are so good, that it's hard to go back even if it's way cheaper. Your work flow is adapted to that level of intelligence for months.
- xdennis 3 days ago ago
  > every bit as capable and dangerous as current day Mythos except with no safeguards
  Not quite. They will definitely have "no criticism of China/communism" safeguards.
- elAhmo 3 days ago ago
  Oh please let’s stop with the Mythos “it’s dangerous” PR talk.
  It's obvious Anthropic used it to hype things up and that’s about it.
- soledades 3 days ago ago
  > Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF.
  Based.
- ibejoeb 3 days ago ago
  I don't think China has any incentive to arm the rest of the world with highly capable models that can be used against them. Undoubtedly they will continue with the arms race, but they will preserve the best stuff for their own use.
mhl47 3 days ago ago
First test question: "Is the UV Index a good proxy for when to wear sunglasses." Immediately triggered the safety filter ... oh dear.
[-]
- msp26 3 days ago ago
  It triggered for me when I asked "Web search for your own model card (released today) and pick out your favourite highlights from the pdf"
- aix1 3 days ago ago
  Did not trigger for me (Fable answered the question), so I guess the filters are either non-deterministic or are still being tweaked.
- ijidak 2 days ago ago
  Don't worry. They're just leaving the door open for OpenAI and other model makers.
  They'll relax these safeguards once competition increases.
- Narretz 3 days ago ago
  Iirc Opus 4.7 had the same problem, safety filters were triggered way too easily at the beginning.
- Eduard 3 days ago ago
  sunglasses _are_ safety filters

Fable is 2x latest Opus:

  ┌─────────────────┬────────────────┬─────────────────┬────────────────────┬─────────────────────┐
  │ Model           │ Input ($/MTok) │ Output ($/MTok) │ Batch Input (−50%) │ Batch Output (−50%) │
  ├─────────────────┼────────────────┼─────────────────┼────────────────────┼─────────────────────┤
  │ Haiku 4.5       │     $1.00      │      $5.00      │       $0.50        │        $2.50        │
  │ Sonnet 4.6      │     $3.00      │     $15.00      │       $1.50        │        $7.50        │
  │ Opus 4.7        │     $5.00      │     $25.00      │       $2.50        │       $12.50        │
  │ Opus 4.8        │     $5.00      │     $25.00      │       $2.50        │       $12.50        │
  │ Fable 5         │    $10.00      │     $50.00      │       $5.00        │       $25.00        │
  └─────────────────┴────────────────┴─────────────────┴────────────────────┴─────────────────────┘

Prompt caching: −90% on input tokens (all models)
US-only inference (Fable 5): +10% on input and output
Output is always 5× the input rate across all models
(I have no idea how to format this properly but the ASCII is fine)
[-]

- dang 3 days ago ago
  (I fixed (er, literally!) the formatting of your table there. I hope that's ok. Formatting info, such as it is, at https://news.ycombinator.com/formatdoc)
- pmxi 3 days ago ago
  I had Claude straighten it out:
    Model           In     Out    BIn    BOut
    Haiku 4.5   $ 1.00  $ 5.00  $0.50  $ 2.50
    Sonnet 4.6  $ 3.00  $15.00  $1.50  $ 7.50
    Opus 4.7    $ 5.00  $25.00  $2.50  $12.50
    Opus 4.8    $ 5.00  $25.00  $2.50  $12.50
    Fable 5     $10.00  $50.00  $5.00  $25.00
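To make the arithmetic in the pricing table concrete, here is a minimal Python sketch of a per-request cost estimate. The rates are copied from the table; the function name `estimate_cost`, its parameters, and the way the discounts stack (batch halving applied first, cached input billed at 10% of the resulting input rate, the US-only surcharge applied to the total) are assumptions for illustration, not Anthropic's documented billing logic.

```python
# USD per million tokens (input, output), copied from the table in this thread.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def estimate_cost(model, input_tokens, output_tokens,
                  cached_input_tokens=0, batch=False, us_only=False):
    """Rough USD cost of one request (hypothetical helper, not an official API)."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    in_rate, out_rate = PRICES[model]

    # Batch: -50% on both input and output.
    if batch:
        in_rate *= 0.5
        out_rate *= 0.5

    # Prompt caching: -90% on the cached share of the input.
    # ASSUMPTION: this stacks multiplicatively with the batch discount.
    fresh_input = input_tokens - cached_input_tokens
    input_cost = fresh_input * in_rate + cached_input_tokens * in_rate * 0.10
    output_cost = output_tokens * out_rate

    total = (input_cost + output_cost) / 1_000_000

    # US-only inference: +10% on input and output.
    # The thread lists this surcharge for Fable 5 only.
    if us_only:
        total *= 1.10
    return total


if __name__ == "__main__":
    # The same 200k-in / 20k-out request on both models.
    print(f"{estimate_cost('Fable 5', 200_000, 20_000):.2f}")   # 3.00
    print(f"{estimate_cost('Opus 4.8', 200_000, 20_000):.2f}")  # 1.50
```

With identical token counts the Fable 5 figure is exactly twice the Opus 4.8 figure, which is all the "2x" in the table claims; it says nothing about how many tokens each model actually spends on a given task.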

hombre_fatal 2 days ago ago
My job these days is listening to Opus 4.8 (max effort) and Codex 5.5 (max effort) talk back and forth, particularly to generate/review/revise plan files.
Fable 5 has been a major improvement in high-level reasoning, like taking a plan file that has been optimized to the point where neither Opus nor Codex can find anything to change about it (neither in direction nor impl-detail), and Fable 5 will find high-level directional simplifications and pivots, or it will consider the best pivots itself and explain why it rejected them in favor of the plan's direction.
It's so expensive though. A single review of a plan file with Fable 5 (xhigh effort) will use 2-3% of my hourly limit on a $200/mo plan.
I think my new workflow is to generate the initial plan with Opus 4.8 (max effort), get Fable 5 (xhigh) to review it for directional feedback, then start the Opus<->Codex revision loop from there.
[-]
- jstummbillig 2 days ago ago
  How do you arrive at that split? Real world is more like senior high level planning, implementation to juniors, review senior. Does this not translate?
bob1029 3 days ago ago
> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months...
This sounds suspiciously like a capacity story masquerading as a safety story.
[-]
- azan_ 3 days ago ago
  Approx. 5% sessions? That's insanely high.
aviinuo 3 days ago ago
I'm not getting any refusals but it just seems like a bad model or at least broken at the moment. I have a task of taking a messy research code base and porting it into a clean project structure skeleton that I commonly use. Gemini 3.5 Pro High in antigravity cli took less than 5 minutes and did a good job. Fable 5 High took 30 minutes to port some of the code, then just copied the rest to a folder called "reference" and decided the task was done. No code cleanup or anything. I had to clarify multiple times (which Gemini did not need) and it's still going more than an hour later, still not finished.
Previously when I did similar tasks with Opus 4.7/4.8 and GPT 5.5 I had no problems.
[-]
- orrito 2 days ago ago
  3.5 flash or do you have access to 3.5 pro?
joshstrange 3 days ago ago
> Fable 5 is now consuming usage credits instead of your plan limits.
Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).
Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.
[-]
- ATMLOTTOBEER 3 days ago ago
  Same lol. I set it to fable + ultracode and it ate my limit in a single prompt
mickdarling 3 days ago ago
Below is the EXACT text in Claude Desktop introducing Fable 5, including the very professional looking break tags, and at least I know where the links begin and end by looking at the anchor tag there.
They obviously put their best model on the job to build that.
----------------------
Fable 5: Our most capable model yet Our newest model tackles your biggest challenges with fewer check-ins needed.
• Included in your plan limits until Jun 22 Fable takes 2× the usage of Opus. • Switch models when a message is flagged When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. <a href="https://support.claude.com/en/articles/15363606" target="_blank" rel="noopener noreferrer">Learn more</a>
[-]
- CamperBob2 3 days ago ago
  What's wrong with it?
pietz 3 days ago ago
> On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits.
We've entered the phase where only companies will be able to afford state-of-the-art models.
[-]
- twoodfin 3 days ago ago
  These models are just tools. The economics of many tools only make sense for corporate buyers.
- poszlem 3 days ago ago
  Looks like a marxist revolution is soon going to be on the mind of a lot of programmers. We've finally reached the point where the "means of production" in software are back in the hands of the bourgeoisie. It was good while it lasted. But now that only the wealthy can afford access to the best models, software development is starting to look like most other industries, no longer a place where some dude from nowhere can build something cool from his basement because he will be competing with huge companies with unlimited access to those models.
- cmrdporcupine 3 days ago ago
  Guess we'll see what OpenAI does with their next model release -- but this move is doing nothing to get me to come back to Claude after switching away due to their reliability issues.
  In a way I relish the opportunity to just make do with cheap Chinese models, massage my prompts, and go back to coding by hand. If this is how it's going to be, screw 'em.
  I don't make money on the code I am writing right now. I really don't like where this trend might go.
- FuckButtons 3 days ago ago
  but we’re going to get a 90% cost reduction in the next 18 months… right? Right guys? Sam Altman wouldn’t lie right?
- ilaksh 3 days ago ago
  most people can afford it for a few special projects now and then. but for me, I have been trying to avoid Opus as a daily driver for a couple of versions.
  People making high-end salaries can afford Fable for critical parts of their projects though.
- 9cb14c1ec0 3 days ago ago
  I hear you, but with the hype surrounding Mythos the demand is going to be insane. I'm already hitting server errors in claude code.
- 3 days ago ago
  [deleted]
- w10-1 3 days ago ago
  Established companies welcome pricing that reduces the potential for competition, if coding is a primary barrier.
- stri8ed 3 days ago ago
  It's not a conspiracy. There's a finite amount of compute available, and they will sell it to the highest bidder. If another company can produce the same intelligence for cheaper, then they will drive the price down.
- polski-g 3 days ago ago
  Only companies can afford MRI machines, and that's okay.
- eternauta3k 2 days ago ago
  Just wait until that other company hard-codes Fable into silicon and then it will be cheaper.
- poszlem 3 days ago ago
  Something I never thought I would utter: Here's hoping for china to surprise us.
doginasuit 3 days ago ago
I'm still happy with Opus 4.6 and not impressed with all the models that have come out since then. They seem to use significantly more resources with similar or worse results. Hopefully Anthropic will continue to support this tier of model and offer it in their subscriptions, but in any case, there are plenty of viable alternatives.
[-]
- consumer451 3 days ago ago
  4.6 stan here. Yes, agreed. However, I will try this model out in Claude Code. Some indicators seem positive.
  For the LLM use cases in my own products, you can pull 4.6 out of my dead hands! lol
  edit: Fable 5 appears to be the real deal in at least some use cases. Damn.
- ptmvp 3 days ago ago
  I've personally liked 4.6 the best to date, preferring it by far to 4.7 and 4.8 (even with these on max effort!), both in Claude Code and for non-coding tasks in the chat UIs.
  Still early but from my first few interactions with Fable on high in both settings, it feels like it might finally dethrone 4.6 for me, but time will tell.
  Hoping it doesn't get nerfed and eventually comes back to the subscriptions.
cge 3 days ago ago
The safety gates on this are extreme, and seem considerably wider than "cybersecurity and biology"; they seem to make it essentially unusable for scientists in a number of fields. I have, so far, been bumped back to Opus on 100% of my prompts.
It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.
Edit: looking at the model card, it appears that chemistry in its entirety is also included in the banned topics; it's just the announcement that mentions only cybersecurity and biology. It also appears that the intent is to ban chemistry and biology entirely, rather than just banning messages deemed high risk.
[-]
- mhl47 3 days ago ago
  This does surprise me, because you'd think that even if they crank up the filter's sensitivity at the expense of specificity, an LLM company wouldn't simply design a filter that triggers on keywords in a completely unrelated context.
- 3 days ago ago
  [deleted]
- clbrmbr 3 days ago ago
  Can you share an example? I've been happily using Fable this afternoon and it just seems like the usual upgrade so far with no interruption to my (fairly standard) SWENG problems.
gregates 3 days ago ago
Funny, I'm just doing my normal coding workflow with Claude Code, and after every change that compiles it keeps suggesting that we're at a good stopping point, and should pick up again tomorrow.
It's done this before, but usually doesn't. I bet they're giving it some kind of throttling signal due to high load from today's announcement.
[-]
- zuzululu 3 days ago ago
  I did ONE prompt for audit codebase.
  weekly usage is 60% gone.
  it found nothing so this is not very economical and i guess they dont want subs to use it. we are likely just training fodder cannon for their real enterprise customers using the api
- tommek4077 a day ago ago
  Check your /memory
GodelNumbering 3 days ago ago
I just posted this in the other thread, restating here. From the model card:
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')
4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
[-]
- blcknight 3 days ago ago
  The fallback doesn't seem to be working for me, I haven't scanned a project in it immediately booted me when it found a security bug even though I didn't ask for it
bluelightning2k 3 days ago ago
Congratulations to Anthropic for solving safety on Mythos exactly when the SpaceX compute came online. Nice how that lined up for them.
BoppreH 3 days ago ago
```
  [Mythos 5] does sometimes still engage in reckless
  or destructive actions in service of a user’s goals,
  and our interpretability analyses indicate that it
  is aware that these actions are transgressive while
  it engages in them. As with Opus 4.8, rates of
  evaluation awareness and reasoning about being graded
  are significant, and not always verbalized; we
  introduce new and more detailed measurements of the
  nature of this awareness. The reasoning text from
  Mythos 5 is somewhat denser and more difficult to
  interpret than that of prior models, containing
  more jargon and difficult language.
```
So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.
Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.
[-]
- foobar_______ 3 days ago ago
  The marketing has really, really worked for so many developers that will proudly and unironically proclaim that Anthropic are the 'Good Guys'.
- Analemma_ 3 days ago ago
  It's the "If we don't, someone else will" effect. So long as there are competitive markets and competition between nation-states, a single player cannot unilaterally defect from the race, no matter how dangerous it is. Half the comments on HN lately are "wtf Claude is so dumb compared to Codex; I'm switching"-- nobody can slow down while those exist.
- dakolli 3 days ago ago
  This is all marketing, you don't have to believe everything a company is saying about themselves, and you shouldn't.
  Although, I could see Anthropic making a model purposely dangerous so there are bad outcomes and they can use that to their advantage for regulatory moats, and/or in general make people think it's more "alive" than it is. For some reason many people associate dangerous actions taken by llms with intent.
- tasoeur 3 days ago ago
  As much as I agree there's a risk, we should still appreciate the fact it's being disclosed upfront.
- Rekindle8090 3 days ago ago
  [dead]
- eudamoniac 3 days ago ago
  It doesn't know. It's not willing. It's not thinking. It is predicting the next token.
yandie 3 days ago ago
I've been running Opus 4.8 for agentic coding and I don't see it being significantly better than Sonnet 4.5 (not that I can tell). I find that pairing Google Gemini and Claude (having Gemini review Claude's code) seems to yield better results. Curious if this jump to 80.3% score in agentic coding will make me see a big difference in actual usage.
[-]
- testfrequency 3 days ago ago
  I do the same, and have excellent results. Today Gemini 3.1 Pro high diagnosed and solved, in one shot, 3 complex issues that Opus Max had been stumbling on for a few hours. This was even after I started new chats and tried debugging with Claude using Ultracode instead.
  As much as people on HN like to dunk on Gemini, I’ve always found it to be pretty good at understanding a code base, better than Claude.
- vorticalbox 3 days ago ago
  for the last few weeks I have been using composer 2.5 (cursor's fine-tune of kimi 2.5) and honestly i don't see it being worth the price to use 5.5, opus or sonnet any more. for almost all the tasks i have given it, it has handled them perfectly well and is a lot cheaper.
  if I get a harder challenge for it i'll jump up a model for planning; until then it's been solid.
- mzhaase 3 days ago ago
  I now chat with opus about architecture, let it make an implementation plan, and then it calls codewhale with deepseek in parallel on all tasks, reviewing their output. Works pretty well.
- yaodub 3 days ago ago
  SWE-Bench measures single tasks in isolation. In a real loop the model usually loses track of what I was trying to do long before code quality becomes the issue.
- jp0001 3 days ago ago
  You should throw GPT into the mix to UX/UI and call it the three stooges.
- thisisnotclear 3 days ago ago
  I find not much difference between Sonnet 4.6 and Opus models either for most tasks that I need - maybe my needs are not enough for frontier models
- jansan 3 days ago ago
  After having worked with Opus 4.7 for a while I accidentally continued a session that was using Sonnet 4.5 and it felt just very dumb. The replies were much shallower than what I was used to, context was ignored, mistakes were made. I don't think there is a big difference between Opus 4.6 and 4.8, but to Sonnet 4.5 the difference is palpable.
docstryder 2 days ago ago
I've spent some time with Fable, and it is really good, definitely a step change from Opus 4.8, both for coding and general chat-style discussions. The vibes are incredible. There is an ease with which it solves problems and I've tested by replicating older chats in Fable - things that the older models found after 5-6 turns, Fable surfaces in the first response. It just gets things.
Apart from all the above: the fact that they are intentionally writing this (that they degrade frontier LLM dev, silently vs loudly for biology/cybersecurity) in the system card is interesting to say the least - especially just before IPO.
Notice that with this statement - that they're going to intentionally hobble the model for frontier LLM development - the general discussion has moved from, “Is the model actually that good?” to "they’re pulling the ladder up from behind them"
That's actually super smart - wonder if Mythos (or the next unreleased model) had a say in coming up with that strategy (if it's intentional). Also - having access to extremely capable models before anyone else - which they have by default - is an incredibly advantageous position to be in.
[-]
- mrdependable 2 days ago ago
  Hobbling the model may be smart tactically for them, but feels like it sets a really dangerous precedent.
connorboyle 3 days ago ago
I gave it a question I've been trying to answer for a long time: "What star designation system does Joseph Needham use in Science & Civilization in China? What star is referred to by the designation '4339 Camelopardi' in that book"?
Fable blew me away with its detailed answer[0] showing a chain of references going from J. E. Bode's 1801 catalogue Allgemeine Beschreibung und Nachweisung der Gestirne to Gustave Schlegel's 1875 work Uranographie Chinoise. I was excited, until I checked scanned copies of the cited books and did not actually find any star with the designation "4339 Camelopardi".
Upon following up with Claude, I was forced to downgrade to Opus, which admitted that Fable's answer was likely a hallucination. Ah, well!
[0]: https://claude.ai/share/0252a3f6-3d29-4de8-a893-010181d8b4e7
[-]
- Aperocky 3 days ago ago
  > I was forced to downgrade to Opus,
  So you were forced to downgrade to opus because you dared to challenge the output of fable?

jdrmar 3 days ago ago

Homebrew is lagging a bit behind. If you want to use Fable right away, but still have claude code through homebrew, this is how you can do that manually:

Edit the cask locally:

  brew edit --cask claude-code

Set the version to 2.1.170 and set the sha256 values to the correct ones, which you can get by running:

  curl https://downloads.claude.ai/claude-code-releases/2.1.170/manifest.json

Here's what I've used:

  version "2.1.170"
  sha256 arm:          "e903646d8b7a31882a80ecd27569a27d8ac57b3708745f349709632c84117fdf",
         x86_64:       "914f23a70bbed5d9ae567e3e04b86206ed9971b371bc9baca3f79c8885bfddb4",
         arm64_linux:  "1bb9d032440a75532f7dd4cafbc687f220aaf16c63eba17e192dfbec2f04bd25",
         x86_64_linux: "849e007277a0442ab27570d3e3d6d43787507946590e8dd1947e5a39b7081f9e"

Then run:

  export HOMEBREW_NO_INSTALL_FROM_API=1
  brew uninstall --cask claude-code
  rm -rf /opt/homebrew/Caskroom/claude-code
  brew reinstall --cask claude-code
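
If you want to double-check a hash before pasting it into the cask, something like the following would work. It is only a sketch: it hashes a local file with Python's standard `hashlib` and compares it with a hex digest you supply. The layout of `manifest.json` isn't shown above, so this does not try to parse it, and the demo file here is a throwaway temp file rather than a real Claude Code download.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex):
    """Compare a file's SHA-256 with an expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Demo with a throwaway file; for a real check, point `path` at the
# downloaded binary and pass the sha256 you intend to put in the cask.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

expected = hashlib.sha256(b"hello").hexdigest()
print(matches_expected(demo_path, expected))   # True
print(matches_expected(demo_path, "0" * 64))   # False
os.remove(demo_path)
```

A mismatch here means the download and the hash you are about to write into the cask don't belong together, which is the same failure brew would report at install time.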

[-]

3 days ago ago
[deleted]

izzylan 3 days ago ago
I've been testing this out and I think my SWE career is dead in the water.
Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.
I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.
[-]
- gck1 3 days ago ago
  Your job is just going to change. You may or may not appreciate/enjoy what it becomes necessarily, but it doesn't mean that you are going to not have a job.
  People underestimate how much people hate looking at terminals and "weird looking combination of characters" even if they didn't have to write them. If anything, you will likely have more career opportunities in the future than ever.
  And if you get a chance to wet your fingers in cybersecurity - I would take it.
- cleaning 3 days ago ago
  If you think the job is just writing code then yes you are screwed, just like if you thought your job was just making punch cards. In most roles you have more responsibilities than plainly converting words into text. You're probably not being paid to simply be a human calculator (otherwise you'd be paid a lot less!).
- cyberpunk 3 days ago ago
  Yeah. I’m not looking forward to years of retraining to earn half the salary either. Us old timers at least got a good 15-20 years out of it. Bananas.
- imafish 3 days ago ago
  I agree. Software engineering as we know it is dead. Wonder what it'll evolve into.
- brcmthrowaway 3 days ago ago
  I'd say you're cooked if you don't have multi-agent harnesses burning tokens right now. That's going to be a pre-requisite very soon
- dannypovolotski 3 days ago ago
  You do realize that this is likely a 10 trillion parameter model that takes something like 20 terabytes of RAM to run inference? Calculate the price for all this VRAM .... It's not getting cheaper in the next few "months".
- aerhardt 3 days ago ago
  So this is the one, huh?
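  For what it's worth, the "20 terabytes" figure quoted earlier in this thread is roughly what you get from weights alone if you take the 10 trillion parameter guess at face value and assume 16-bit weights. Both numbers are assumptions (the parameter count is the commenter's guess, and the precision is mine), and this ignores KV cache and activations entirely:

```python
def weights_size_terabytes(parameters, bytes_per_parameter):
    """Memory needed just to hold the weights, in decimal terabytes."""
    return parameters * bytes_per_parameter / 1e12

GUESSED_PARAMETERS = 10e12  # the "10 trillion" guess from the thread, unconfirmed

# Bytes per parameter at a few common weight precisions (assumed, not known).
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    size = weights_size_terabytes(GUESSED_PARAMETERS, nbytes)
    print(f"{label}: {size:.0f} TB")
# 16-bit: 20 TB
# 8-bit: 10 TB
# 4-bit: 5 TB
```

  So the estimate moves by a factor of 4 depending on quantization alone, before any argument about the parameter count.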
fabled-out 3 days ago ago
Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?
I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.
My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".
kypro 3 days ago ago
I just gave it a go at a problem I've been working on this week. Nothing fancy, just some inefficient code that we've been adding incremental improvements to for a while now to the point where some out-of-box thinking is probably required to push it any further – something Fable is obviously more than capable of.
After Fable did some thinking for a few minutes it gave some suggestions. A couple of them were valid – but very low impact, bordering on entirely pointless – but its main suggestion... It told me to make an update that would very clearly break the existing functionality.
So I thought about it for a moment...
Hm, I mean, I guess we could do that if we also did x, y & z to mitigate the behaviour change – maybe that's what Fable was thinking?
I replied, explaining that it would change the behaviour, assuming it would explain what it was thinking given there was clearly more to it. But no, it just said it was wrong.
This isn't some super advanced or complex code either. Had I given this question to a senior engineer in a technical interview and they gave the answer Fable gave me I would view that very negatively. I was expecting something creative and interesting, not irrelevant + incorrect.
I'm sure it's a step up from 4.8 (although am not interested in burning the tokens to find out), but this clearly isn't as significant a change as some are implying. I'm sure if I asked it to come up with some out-of-box suggestions it could, but any competent engineer would have realised that by themselves.
croemer 2 days ago ago
Fable (through claude.ai) refused all my prompts even "How many Rs in Strawberry" claiming it was related to biology or cybersecurity.
I had to switch off memory and my custom instructions to get it to stop refusing. It turns out if you even mention that you work with bioinformatics software you get blanket refusal.
[-]
- algoth1 2 days ago ago
  My experience has been the same: flat-out refusals no matter how I frame the health questions - very frustrating. Even psychology is out of scope. Pretty useless unfortunately
modeless 3 days ago ago
Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M
[-]
- xinpw8 3 days ago ago
  hi, pokemon red expert here: that video has since been taken private. there is what i would assume to be a new version of that video posted here https://www.youtube.com/watch?v=Ty_50J84fMY, heavily redacted with most of the game actually omitted. very possibly this is just another case of anthropic protecting us from their models' immense power
- uludag 3 days ago ago
  Any suggestion on how I should calibrate my cynicism towards this?
  I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this particular run costing like $1000+ of tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail?
- milkkarten 3 days ago ago
  no reasoning shown. no explanation on any training information. Using vision-only should be an easier version of the task (given training).
  there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50 hour run?
  yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real-time. The reason we all liked the Gemini Plays Pokemon style runs was that they were live and couldn't be faked
- svcphr 3 days ago ago
  Bold move putting in the lvl 3 Pidgey against Gary's Blastoise at the end there (~14sec in... integer timestamps insufficient here).
- suddenlybananas 3 days ago ago
  Is there any more detail about this besides the very fast slideshow?
- charcircuit 3 days ago ago
  The video is privated now, but the timelapse is weird. Sometimes it skips only seconds before the next screenshot and sometimes it skips probably hours forward.
- ex-aws-dude 3 days ago ago
  I mean that’s AGI confirmed right?
- hmokiguess 3 days ago ago
  "Computer system goes through a finite state machine"
- ml-doom 3 days ago ago
  [dead]
baalimago 3 days ago ago
I can't justify a pricetag like that when deepseek v4 pro is $0.003625/1M tokens for cache hits, $0.435/1M for cache misses and $0.87/1M for output.
For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
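To put rough numbers on that, here is a back-of-the-envelope sketch using only the three DeepSeek prices quoted above, read as USD per 1M tokens. The token counts in the example request are made up for illustration, and no Fable price is plugged in since none is given here:

```python
# Prices quoted in the comment above, read as USD per 1M tokens.
PRICE_PER_MILLION = {
    "cache_hit": 0.003625,
    "cache_miss": 0.435,
    "output": 0.87,
}

def request_cost(cache_hit_tokens, cache_miss_tokens, output_tokens):
    """Return the USD cost of one request at the quoted per-1M-token prices."""
    total = (
        cache_hit_tokens * PRICE_PER_MILLION["cache_hit"]
        + cache_miss_tokens * PRICE_PER_MILLION["cache_miss"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000

# Hypothetical agentic request: 80k cached prompt tokens, 20k fresh, 5k output.
cost = request_cost(80_000, 20_000, 5_000)
print(f"${cost:.5f}")  # $0.01334
```

At those rates a fairly heavy request costs a bit over a cent, so whatever multiple you pay elsewhere buys that many retries.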
knivets 3 days ago ago
> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
How was it measured? How was output of this magnitude verified over a period of a couple of days?
[-]
- fbnszb 3 days ago ago
  They just went by gut feeling. Classic snake oil marketing haha. No real data to back things up, just let some famous people say they feel better when using it.
- dgunay 3 days ago ago
  I'm a little skeptical of claims like this that involve migrating things like libraries, etc. I've done big refactors like this multiple times (albeit, in an "only" 500k-1m LOC codebase) with less powerful models and it is usually just 99% the same edits, with 1% requiring a close human eye to resolve a particularly painful breaking change.
  EDIT: to be clear, it's still quite a helpful thing in terms of time saved, I just don't think it's necessarily the best indication of value-added from making models smarter when cases like this can often be handled by well-directed swarms of smaller ones.
- camdenreslink a day ago ago
  You should probably use software to do such large transformations (especially in dynamic languages). In Python LibCST is available, not sure what exists for Ruby.
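  As a rough illustration of what a programmatic transformation looks like, here is a minimal sketch that renames every call to one function. Note that it uses Python's standard-library `ast` module rather than LibCST, purely so it runs without extra installs; `ast.unparse` (Python 3.9+) drops comments and original formatting, which is exactly the problem LibCST is meant to avoid. The function names `fetch_user` and `load_user` are made up.

```python
import ast

class RenameCall(ast.NodeTransformer):
    """Rewrite calls to `old_name(...)` into `new_name(...)`."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node

def rename_calls(source, old_name, new_name):
    """Parse, transform, and regenerate source (formatting is not preserved)."""
    tree = RenameCall(old_name, new_name).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

before = "total = fetch_user(fetch_user(1).parent_id)\nfetch_user_count = 3\n"
print(rename_calls(before, "fetch_user", "load_user"))
# total = load_user(load_user(1).parent_id)
# fetch_user_count = 3
```

  Unlike a text search-and-replace, this leaves `fetch_user_count` alone because it only touches call nodes, which is the kind of guarantee you want when the same edit is applied across a huge codebase.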
PeterStuer 2 days ago ago
Switched to Fable 5 this morning, and after half a day I already don't want to go back to Opus.
Decided the best way to test this was to throw it a really meaty bone: a bug in lifecycle management of Chrome processes on Windows 10. Within the code-base I had developed workarounds over time with Sonnet and Opus, and while those reliably mitigated the problems, it always felt like a crutch and had some performance overhead as well as isolation requirements I would rather not have to take forward.
In comes Fable. Rather than examining the code base and testing a few fixes, Fable sets up an entire testing laboratory including its own controllable webserver, fully instrumented to observe both Python as well as the whole OS kernel process environment, develops a suite of error reproduction tests, confirms the problem and the circumstances under which it reproduces, deep dives into the sources of project dependencies to look for the root cause(s), identifies these and confirms those hypotheses with further experiments. It looks for potential fixes in the later releases of the project where the bug originates, confirms this is not fixed, explores the documentation of said project to find other usage patterns, expands its test suite to investigate these alternatives, confirms by crosschecking the source and running further tests that these alternatives do not fully solve the root problem, does a comparative experimental analysis of 3 different styles for using the project, checks the stated roadmap and developer activity in the commit history, recommends a switch to a different pattern that still requires a few of the process management workarounds (I told it not to patch external components), but that significantly simplifies the code-base ...
This is going to be a good 2 weeks, but what happens after? I can't afford this on a per token basis for my own projects.
P.S. And yes, midway through the final implementation stretch I got the "Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more"
Opus managed to finish the implementation, but they need to work on that false positive rate.
[-]
- techblueberry 2 days ago ago
  > This is going to be a good 2 weeks, but what happens after? I can't afford this on a per token basis for my own projects.
  It’s interesting these companies have trained us to think that disruptive intelligence should be affordable to laypersons.
  What will happen after two weeks is that people and companies with means who can afford it will get it, and folks without means won’t.
chr15m 3 days ago ago
I found this juxtaposition of facts telling:
> Drug design: Using Mythos 5, our internal protein design experts accelerated... Nine of the 14 protein targets from this study (shown below) yielded strong candidates for *drug design that we’re currently investigating*.
(emphasis mine)
> queries that are beneficial in the hands of cybersecurity professionals and biology researchers could be dangerous if available to malicious actors... When Fable’s classifiers detect a request related to cybersecurity, *biology and chemistry*, or distillation, the response is automatically handled by Claude Opus 4.8 instead.
All of the things they are nerfing are things that they also intend to profit from themselves.
- Cybersecurity - selling this to companies and US gov through "Glass Wing".
- Selling inference (distillation risk).
- And now, drug design.
I'm extrapolating "currently investigating" to "are going to monetize" but I don't think that's a big stretch. They appear to be using safety as a cover for anti-competitive behaviour.
[-]
- 00deadbeef 3 days ago ago
  Of course. You use their AI to ship code full of bugs and security holes and then they conveniently have the tool to fix them, for an extra fee.
BrokenCogs 3 days ago ago
That pelican better be super realistic, unreal engine 6 style graphics
[-]
- jmtame 3 days ago ago
  I ran an experiment to see how far it could get with a top-down 2d game, like a more challenging version of "draw a pelican." I'm waiting on Fable to rewrite the whole thing now, but I was impressed by how far Opus 4.8 got with it: https://github.com/jmtame/scrapland
  Started out as a one-shot attempt, but ~200 prompts later it's at a place where it's at least fun to watch the AI teams destroy each other.
JaggerJo 2 days ago ago
IMO we are reaching the point where AI models are simply a commodity. Opus (since ~4.6) is sufficient for everything I tried coding wise. I use it to write features (but I review and understand every line it spits out) and to review code.
For code review I also still review everything myself, but use Opus to catch stuff I missed and to judge if a PR is even ready for me to review.
After just updating Claude Code to the latest version I thought about picking Fable (the bigger model) instead of Opus.
But I have no reason to. Opus does everything I want it to do. It could do it faster - that would be an improvement. But for the normal stuff we reached the point where better models are not worth it IMO.
There still might be cases where you want to throw Fable at it.
[-]
- FergusArgyll 2 days ago ago
  > Opus (since ~4.6) is sufficient for everything I tried coding wise.
  I don't know what that means. It seems like a lack of motivation or something. Like, if it's possible that one day it will be absolutely incredibly intelligent, surely you want to create
```
  - Your own browser (maybe chrome - mv3 + reading list search etc.)
  - An emacs clone which has evil baked in, completely vim compatible + threaded elisp - that weird window sizing bug which only occurs on my laptop
  - An extension which completely restyles amazon.com to make it usable
```
  It just feels impossible to ever get that, but I wouldn't say "what we have is sufficient"
- Axel2Sikov 2 days ago ago
  I was happy enough with 4.5
merlindru 3 days ago ago
Unrelated, but while the tech of anthropic seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance is deciding above all else.
I used to get a response within 24 hours back in the Claude 1 days.
In January 2026, it took 2 weeks.
For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!
[-]
- miohtama 3 days ago ago
  They have support...?
- poszlem 3 days ago ago
  Lol. What support? When they blocked my account the only way to contact them was to send a google form. Then they responded that they blocked me by accident and are unblocking me. Then I remained blocked.
- nashadelic 3 days ago ago
  I've never engaged with their support (I have a dedicated POC), but they don't use AI for their support?
unfunco 3 days ago ago
I tried running a simple security review on a Terraform module I made and after some thinking, it responded:
> ● The model returned no content because the response was blocked by content filtering.
> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.
> ● The model returned no content because the response was blocked by content filtering.
A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.
momentmaker 3 days ago ago
There is a discussion about how AI is now a gated utility with public access (safe-tuned) and private access (full-usage):
https://old.reddit.com/r/ClaudeAI/comments/1u1fsdi/claude_fa...
Leary 3 days ago ago
Uploaded my code base and it force-switched to Opus 4.8 after thinking for 5 minutes even though I prompted it to not work on cybersecurity related things. Amazing.
[-]
- tuvix 3 days ago ago
  Aren’t LLMs notoriously bad at recognizing negation?
  EDIT: In long context I mean
GodelNumbering 3 days ago ago
From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')
4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
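To make point 1 concrete, here is a toy sketch of "classify, then fall back". It is entirely hypothetical: the keyword lists, the model labels, and the function names are mine, a real classifier would be a trained model rather than substring matching, and nothing here reflects how Anthropic actually implements it.

```python
# Toy stand-in for a topic classifier. Everything here is hypothetical:
# a real system would use a trained classifier, not a keyword list.
RESTRICTED_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "vulnerability"),
    "biology": ("pathogen", "protein", "genome"),
}

PRIMARY_MODEL = "fable-5"     # placeholder label, not a real API model ID
FALLBACK_MODEL = "opus-4.8"   # placeholder label, not a real API model ID

def flagged_topic(prompt):
    """Return the first restricted topic whose keywords appear, else None."""
    text = prompt.lower()
    for topic, keywords in RESTRICTED_KEYWORDS.items():
        if any(word in text for word in keywords):
            return topic
    return None

def route(prompt):
    """Pick a model label for the prompt and report which topic triggered it."""
    topic = flagged_topic(prompt)
    return (FALLBACK_MODEL if topic else PRIMARY_MODEL), topic

print(route("Refactor this Ruby migration script"))
# ('fable-5', None)
print(route("Review my Terraform module for any vulnerability"))
# ('opus-4.8', 'cybersecurity')
```

The second call shows the failure mode people in this thread are complaining about: a topic-level gate cannot tell a defensive review of your own code from anything else, so it reroutes both.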
[-]
- quinncom 3 days ago ago
  > it automatically falls back to Claude Opus 4.8
  I wonder how much of the time people will just get Opus 4.8 at 2× the cost.
- skerit 3 days ago ago
  > although opus 4.8 card had mentioned an 'honesty upgrade'
  If I never see Claude say "I have to be honest" ever again I'll be happy.
217 3 days ago ago
So essentially there are 2 models, Mythos and Fable, they have the same weights but Fable is very safety-nerfed, and only ultra authorized companies have access to mythos with full capabilities
Reported benchmarks:
swe-bench verified mythos 5: 95.5%; fable 5: 95.0%
swe-bench pro mythos 5: 80.3%; fable 5: 80.0%
terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%
gpqa diamond mythos 5: 94.1%
riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%
arxivmath mythos 5: 78.5%
critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%
graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%
humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools
browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent
osworld-verified mythos/fable: 85.0%
gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass
officeqa pro fable 5: 57.9% on databricks’ eval
legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass
healthbench mythos 5: 62.7%
healthbench professional mythos 5: 66.0%
multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%
biomysterybench 83.9% human-solvable; 46.1% human-difficult
organic chemistry mythos 5: 90.1%
labbench2 patent questions mythos 5: 79.8%
[-]
- philipkglass 3 days ago ago
  Note also that Anthropic's definition of "unsafe" encompasses "competing with Anthropic."
  In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
  Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.
  (From the model card document)
  I didn't previously understand that they interpreted "Using Claude to develop competing models" so broadly. I thought that meant something like "our ToS disallow distilling our models."
  Too bad. I'll continue to use Claude for now, because it's quite effective, but in the long term I don't want powerful models like these to be controlled by any one nation or company.
- gck1 3 days ago ago
  I love that the conditions of getting into "ultra authorized club" just means that you either have deep pockets, or you've got the size of the audience that marketing department approves.
  As if being in any of these two somehow means that you won't use the models to say, steal random people's money.
  Sam Bankman-Fried or Elizabeth Holmes would have been members of the Glasswing project, if not among its initial members. Who's to say we don't have similar people with access to Mythos right now?
bluelightning2k 3 days ago ago
To hide the severity of the price increase, the plan is to move everyone right one model.
Haiku = essentially phased out
Sonnet = the Haiku use cases
Opus = the new Sonnet class
Fable = the new Opus class
If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)
[-]
- pacman1337 3 days ago ago
  Yeah I noticed that too. For 98% of tasks I get the same results with DeepSeek, it is starting to just be a branding game. It is incredible how marketing can get someone to pay 100x for the same thing you can get for 1x.
  This is why Claude Code just doesn't make sense to me. I need an agent that can plan using Opus and execute using DeepSeek or something else.
- ValentineC 3 days ago ago
  > To hide the severity of the price increase, the plan is to move everyone right one model.
  > Haiku = essentially phased out
  > Sonnet = the Haiku use cases
  > Opus = the new Sonnet class
  > Fable = the new Opus class
  Going along with your logic, I hope they release a Sonnet 5 that's just a rebranded, slightly quantised Opus 4.6. That'll be a great workhorse.
- 00deadbeef 3 days ago ago
  I doubt they'll phase out Haiku, some work needs speed more than intelligence. Haiku can answer a lot faster than Sonnet.
JanSt 3 days ago ago
I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(
[-]
- nu11ptr 3 days ago ago
  Not only that, but asking it to do a security vulnerability assessment of your own project is a very valid and important thing, and there is no way for it to know what is yours vs someone else's, so we just lose this capability?
- Fitik 3 days ago ago
  Same, second message in the thread and I already got downgraded to Opus, didn't even get to test it out properly, kinda disappointing
stalfie 3 days ago ago
Tried to benchmark ECG interpretation capabilities, and I hit the guardrails no matter what I do.
Incredibly frustrating that medical performance seems to be a victim of "biological risk" guardrails.
[-]
- stalfie 2 days ago ago
  Update in case anyone reads this comment ever again.
  I have found that I trigger the guardrails any time I ask for medical Q&A as a doctor, be it ECGs, case reports, and so on. But if I phrase it like I'm the patient ("help me interpret this ECG my doctor gave me"), then I usually get one or two answers out before hitting the guardrails.
  It seems like the direction that triggers it is anything in the direction of making a diagnosis. As an MD, the fact that the paradigm of "LLMs shouldn't diagnose" has gone this far fills me with despair. The latest generation of LLMs are in fact truly excellent at diagnosis, and I know many of my colleagues, particularly those in primary care, regularly use LLMs to brainstorm. There is nothing wrong whatsoever with LLMs making diagnoses, the only caveat is that they have to be correct. This is the terrifying reality that MDs face every day and I get that the labs are hesitant about it, but as the current literature points to LLMs in fact being mostly superior to most doctors, ablating this capability is starting to get increasingly unethical. And frankly, it is also kind of insulting, both to MDs and patients, as it echoes paternalistic attitudes about medicine the field has been working for decades to move away from. Now those misguided attitudes have somehow become institutionalized as the dominant paradigm of "alignment". The nightmare scenario is that I have to be a "trusted" user in order to use the model for medicine. This gatekeeping of medical advice is profoundly unethical with regard to everyone who does not have immediate access to an MD.
  And the whole thing makes even less sense when triggering the guardrails leads to a downgrade of the response by defaulting to Opus. How exactly is giving WORSE medical advice in any way related to safety and alignment? If anyone at anthropic ever reads this, please, please just abandon the paradigm that refusing to make diagnoses is in any way equivalent to alignment, it is profoundly misguided.
sscaryterry 3 days ago ago
Not useful, getting this the whole time: Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
svara 2 days ago ago
Unfortunately useless if you do anything related to biology. It doesn't try to flag dangerous queries, it just flags queries as biology-related wholesale.
It's absurd. To see how far the filter goes I asked it "Are trees a monophyletic group?" and that does trigger the filter.
bonsai_spool 3 days ago ago
Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today
[-]
- cge 3 days ago ago
  It appears that the blocking here is of a very different nature than for Opus. Whereas with Opus the blocks seem to be for messages it deems potentially harmful, for Fable, it appears the blocking is simply anything that falls within "topics related to cybersecurity, biology and chemistry, or distillation attempts".
  So yes, straightforward biology work will get blocked, because the intention is that any biology work should get blocked. As a scientist, this is perhaps the most useless model I've ever tried.
sermakarevich 3 days ago ago
My feeling is that the reaction to new models is cooling down, at least at startups. At the beginning of the year, a few startup CEOs I know personally were expecting huge shifts in how companies work, headcount, efficiency, and asymmetrical advantages created by AI in Q2-Q3. Now it seems like these expectations are fading away. Companies don't have the expertise on board to rebuild themselves to benefit from AI on a significant scale.
Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your use case?
aizk 3 days ago ago
I'm calling that this will be a dud. Price will be too high, it'll just be a watered-down version of Mythos, and just look at the track record of Anthropic's last few releases.
BukhariH 3 days ago ago
> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.
Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.
0xbadcafebee 3 days ago ago
Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.
In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.
Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.
coreylane 3 days ago ago
I don't get why Opus 4.7, 4.8, and now Fable all stopped supporting structured outputs. Does no one else care about that? I find it incredibly useful to reliably pass LLM output directly to other APIs/libraries.
[-]
- 00deadbeef 3 days ago ago
  They do
  https://platform.claude.com/docs/en/build-with-claude/struct...
  > Structured outputs are generally available on the Claude API for Claude Opus 4.8, Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Sonnet 4.5, Claude Opus 4.5, and Claude Haiku 4.5
- mike_hearn 3 days ago ago
  Random guess but they probably rewrote parts of the inferencing stack and didn't reimplement that feature because hardly anyone uses it. It's also a DoS risk, iirc.
johnfn 3 days ago ago
I used Fable to see if it could figure out an API or something for the full list of remote-control sessions that I had with Claude Code. It didn't know the API, so it started hacking the Claude Code executable itself to figure that out. Then it noticed it was doing that and it flagged its own approach as a cybersecurity violation.
Kind of hilarious. Hopefully Anthropic doesn't bring down the hammer on me.
danilafe 3 days ago ago
Just threw a problem at Fable that I haven't been able to get any other model to get done: porting a long-standing Agda codebase of mine to Lean, while staying faithful to the representation. In an hour, it ported ~6000 lines of Agda and everything seems to work. Lean checks out, the output is right. I'll have to study the proofs but I am very impressed.
impulser_ 3 days ago ago
Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.
[-]
- modeless 3 days ago ago
  This is like looking at mainframe pricing in 1990 and concluding that PCs will only be for the rich. The price of each new level of capability is going to drop like crazy very quickly. It won't be that long before practically any consumer use case will be possible on models that are dirt cheap.
- hootz 3 days ago ago
  You are only priced out if you only care for SOTA right now and can't wait for the inevitable cheap model coming in 6 months. DeepSeek, Xiaomi and Moonshot are already really cheap and match frontier performance from 6 months ago.
- dyauspitr 3 days ago ago
  Hardware manufacturing hasn’t caught up yet. Once it does, especially in China these token prices are going to drop hard.
sebmellen 3 days ago ago
Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.
Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.
Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.
I wonder if that’s about to deeply change.
[-]
- arkwin 3 days ago ago
  I've been using Opus 4.6-4.8 in both my own and others' code to look for vulnerabilities, and I've found a few. I am also in the Cyber Verification Program.
  Fable 5 gives me policy violation errors at the moment. No idea when or if it will be fixed.
- rs_rs_rs_rs_rs 3 days ago ago
  Can you use AI to pre-triage the reports too?
msp26 3 days ago ago
>Pricing for both models is $10 per million input tokens and $50 per million output tokens.
[-]
- ponyous 3 days ago ago
  Basically double from Opus 4.8 IIRC
I_am_tiberius 3 days ago ago
I'm very suspicious as they sent out an "We're updating our Privacy Policy" email right before the launch. I fear they try to take advantage of their market position by doing things with user data no other company could do because they know users don't have another choice.
[-]
- atestu 3 days ago ago
  Prob related to this part of the blog post:
  > We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
- w10-1 3 days ago ago
  It's a specific change: For safety evaluation, Fable data will be retained for the initial period notwithstanding prior opt-out
vb-8448 3 days ago ago
On Python coding it is definitely better than everything else: clean and not overengineered code, and it understands the code base very well.
The only thing I'm wondering is whether they deliberately downgraded Opus 4.8's performance in the last days before the release just to make the "step" look bigger. I'm pretty sure they also did it in the past with all the other Opus 4.x releases.
bilsbie 3 days ago ago
Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.
Edit. It just refused an investing question too. Not sure what’s going on.
sashank_1509 2 days ago ago
I played textual chess with Fable. It took around 15 moves before it made a large blunder. I asked it to give its reasoning per move, and it mistakenly assumed a piece was protected when it wasn't. After the blunder it realized its fault and did not suggest an illegal move. Other LLMs lost game state far earlier. A good human chess player can keep the game state in mind much longer, but this random eval still shows a big improvement over old AI models.
theodorewiles 3 days ago ago
Here's a song it wrote for me (suno arranged). Not sure if it's AI psychosis but scary good IMO.
https://suno.com/s/98uSGabHN42G3YHc
[-]
- balefulboy 3 days ago ago
  yeah man this sucks. i genuinely do not know how people find this stuff appealing
- pythonaut_16 2 days ago ago
  Within the first second it's recognizable as a Suno song. And not even the best example from Suno. (The rhyme structure and rhythm are weird.)
__alexs 3 days ago ago
Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.
[-]
- replwoacause 3 days ago ago
  Probably thought you were going to use it to build a novel bioweapon or something
thomas_witt 2 days ago ago
After 1 hour with Fable on Ultracode:
```
  You've hit your monthly spend limit.
  /rate-limit-options
  What do you want to do?
   Adjust monthly spend limit: Unlimited ← or → to set a limit
    Wait for limit to reset
```
I've never hit a usage limit on my Max plan, basically ever, despite heavy xhigh usage on Opus 4.8.
I added $133 credits which I still had from somewhere. That lasted 27 minutes.
I think we are being prepared for a Post-IPO-World in terms of pricing.
nine_k 3 days ago ago
/* What will happen first?
* Anthropic runs out of genre names.
* Anthropic changes the model naming convention.
* AGI is achieved and handles its own naming.
*/
[-]
- hootz 3 days ago ago
  >Opus is too small, increase the impact of the name.
  Okay, how about Mythos?
  >Increase it even more.
  Right, then Cosmos.
  >Even more!
  Even more? Let's try Aeon.
  >MORE, EVEN BIGGER
  ALRIGHT, TRY OMEGAPANTHEON 7.8 THEN
- xyzsparetimexyz 3 days ago ago
  Cantos next surely?

irthomasthomas 3 days ago ago

Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as Opus >=4.5 and Sonnet 4.5, and double the speed of Opus <=4.1.

  Benchmark                     Mythos 5  Fable 5  Mythos Preview  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
  SWE-bench Pro                 80.3      80       77.8            69.2      58.6     54.2
  SWE-bench Verified            95.5      95       93.9            88.6      -        80.6
  Terminal-Bench                88.0      84.3     -               82.7      83.4     -
  BrowseComp (Single-Agent)     88.0      -        87.9            84.3      84.4     85.9
  BrowseComp (Multi-Agent)      93.3      -        -               88.5      -        -
  HLE (No tools)                59.0      -        56.8            49.8      41.4     44.4
  HLE (Tools)                   64.5      -        64.7            57.9      52.2     51.4
  CharXiv Reasoning (No tools)  88.9      -        86.2            80.5      -        -
  CharXiv Reasoning (Tools)     93.5      -        92.5            89.9      -        -
  BioMystery Bench (Human)      83.9      -        82.6            80.4      -        -
  BioMystery Bench (Hard)       46.1      -        29.6            40.0      -        -
  OSWorld-Verified              85.0      85.0     85.4            83.4      78.7     76.2*
  CritPt                        28.6      -        20.9            27.1      17.7     -
  ArxivMath                     78.5      68.7     71.8            71.5      64.0     -

[0] https://news.ycombinator.com/item?id=48312633

Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

charles_f 3 days ago ago
It's announced as a revolution but when you look at those benchmarks it surely looks like an iteration.

ilaksh 3 days ago ago
I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P
Anyway we already knew this was going to be expensive.
cautiouscat 3 days ago ago
In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.
I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.
[-]
- sunir 3 days ago ago
  I have a similar question.
  I think most software projects have reached the point where the speed of capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than the speed at which code can be generated in the wrong direction.
  I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.
fht 3 days ago ago
I am a PhD student in Computational Biology, essentially just doing statistics on some biological data. By now some of the things I am working on have found their way into Claude's memory, so literally any chat with Fable gets immediately flagged.
[-]
- biofox 2 days ago ago
  Oh... I was wondering why every single chat (including "Hello") was being flagged.
  Seems I am barred from using Fable just for being a biologist :(
vitally3643 a day ago ago
As per usual, the current Claude model's performance took a sharp nosedive the moment the new model was announced. Compared to the now-handicapped Sonnet model, Fable seems pretty smart I guess.
But it also really, really wants to burn tokens. I asked it to look into a fairly straightforward database bug in my RN app, and while I was off getting coffee it decided to spin up an android emulator unprompted and started navigating the app by reading screenshots and injecting touch events. There went my entire week's tokens. There was no reason to even start the emulator, the bug wasn't graphical, so I have no clue what it was doing.
jackschultz 3 days ago ago
> We expect demand for Fable 5 to be very high, and difficult to predict. On the Claude API and consumption-based Enterprise plans, Fable 5 is fully available from today. For subscription plans, we’d rather give access sooner than later, so we’re rolling out more conservatively, in stages:
> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
> - After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
I really wonder what their compute layout is for this. My guess, from my understanding, is that they know how to throttle during peak times and are willing to do so. That means we should not expect the fastest responses, and they can delay inference so the service doesn't go down. Then, if that delay is too annoying for customers who pay per token, they're saying they should be allowed to reduce cost by taking access away from subscription users.
[-]
- KennyBlanken 3 days ago ago
  Everything I've heard from people who have subscriptions is that they blow through their daily token quota sometimes in a matter of minutes, there's rate limiting, etc. They spend a lot of time just waiting to be able to use it. And they're paying through the nose for the privilege.
  It's all a scam.
zmmmmm 3 days ago ago
The restrictions on using Fable to develop LLM technology seem nakedly anti-competitive. There doesn't appear to be any security rationale for that. I think we have to be careful how far we let companies get away with that. It is very far from our long-term interest to enable new norms that fast-track us into a new era of monopolies that control our lives.
solenoid0937 3 days ago ago
the quality of discussion on HN has gone to shit, i miss when model releases used to have actual informed takes from people that used them or substantive discussion about the system card
[-]
- weakfish 3 days ago ago
  From the rules [0]:
  > Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.
  [0] https://news.ycombinator.com/newsguidelines.html
- 10xDev 3 days ago ago
  Nothing here is new, it is the thing we have been talking about for a while but now with guardrails.
- tripleee 3 days ago ago
  Hate to break it to you but those "informed takes" were from people who prompted it once then made a snap judgement
- orbital-decay 2 days ago ago
  My semi-informed take is that Fable/Mythos is just larger but not architecturally different, apparently. The system card is simply marketing material and scaremongering, top to bottom. The sauce is in their training (details of which they won't disclose) and scale.
Tenoke 3 days ago ago
>they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.
Isn't (less than) 5% of sessions a lot? I was expecting a sub1% guarantee there, so this surprised me already.
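For a rough sense of what that means for a regular user, here is a minimal back-of-the-envelope sketch. It assumes each session is flagged independently with probability p = 0.05, treating the quoted "less than 5%" as an upper bound. Both the independence assumption and the session counts are mine, not Anthropic's, and real flags are probably correlated with topic, so treat this as illustrative only.

```python
def p_at_least_one_flag(p: float, n: int) -> float:
    """Chance that at least one of n independent sessions is flagged,
    given a per-session flag probability p (an assumed, simplified model)."""
    return 1 - (1 - p) ** n

# Hypothetical session counts; 0.05 is the upper bound quoted above.
for n in (1, 5, 20, 50):
    print(f"{n:>2} sessions: {p_at_least_one_flag(0.05, n):.1%} chance of at least one flag")
```

Under those assumptions the chance of hitting a flag at least once grows quickly with the number of sessions, which is why a per-session rate that sounds small can still be noticeable in daily use.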
zackify 3 days ago ago
I have to share this because I thought it was beyond funny how badly Fable is doing at a task I JUST had Opus do a week ago.
It's also not even complicated:
Copy my ssd to an external ssd so i can boot from it.
Opus did this just fine.
Fable planned to have me reboot to safe mode. OK, that's fine. I told it no.
It started copying and overwriting the SSD while IN PLAN MODE. This is crazy; it feels so dumb vs the marketing.
[-]
- gck1 3 days ago ago
  That sounds like a harness issue to me.
nl 3 days ago ago
The new data retention policy is interesting. Seems to apply even to enterprise plans on ZDR.
> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
unshavedyak 3 days ago ago
It's funny, i'm getting close to not caring anymore how much better a model is. I want it to be about as good as 4.8, but most importantly to be very good at following directions, style, etc. I really like Claude for that in general, but i've not measured in months so i'm not a good judge there.
I don't think i'll want to "hand off" code for several years, and so reviewing and iterating is becoming my #1 interest. A model that's as capable as 4.8 but 10x faster would be amazing for me.
Normally i'm first in line to try new models with Anthropic since i've clearly favored Claude in my personal tests, but this time i just don't think i care. 4.8 is capable, and even if the new one is more capable i don't want it to be slower (assuming it is). Note that i also (almost) use exclusively 4.8 on Max effort, so that also affects my speed comments.
[-]
- kilroy123 3 days ago ago
  I want the same intelligence but way faster. It's so painfully slow.
- firemelt 3 days ago ago
  you use workflows/ultracode?
crambelsoupy 3 days ago ago
I was pretty excited until I read this:
> What happens when the promotion ends
> After June 22, 2026, Claude Fable 5 is no longer included in your plan’s usage limits. You can keep using Claude Fable 5 through usage credits, which let you pay for usage beyond what your plan includes. Learn more about using usage credits.
jackson281 11 hours ago ago
They claim it beat Pokemon FireRed with vision only, no maps or extra tools. That's cute but I'd rather see real-world benchmarks that matter, not games.
balverineorder 3 days ago ago
I have been refactoring a project using Opus 4.7/4.8 for the past few weeks or so. I just decided to switch to Fable 5 max today. It stopped halfway through, blocked me, and switched back to Opus 4.8 automatically: "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." [0] It would not identify what the problem was. I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.
[0] https://support.claude.com/en/articles/15363606-why-claude-s...
[-]
- dchftcs 3 days ago ago
  I suspect this will be a significant problem blocking long-horizon tasks in practice. Basically, the more turns there are, the larger the chance the classifier produces a false positive. The disappointment of the user will also scale with the length of the task, as you're in the middle of some complex thing and now get derailed, after already having paid for many tokens.
samename 3 days ago ago
> A new data retention policy
> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
logicallee 3 days ago ago
What a (genuinely) surprising choice:
>"We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8"
That's a very surprising solution. Imagine being asked to do something you feel you shouldn't do, and rather than refusing, you say, "Yeah I could do that but given that I don't want you to succeed at this task, I'm going to hand this one off to my slightly less capable colleague, on the assumption that they won't actually succeed. Of course you'll still be charged for all the tokens used."
It's a very interesting choice. I think I understand the business logic correctly, but it's still surprising.
raphaelrk 3 days ago ago
There's a hacker news link at the end of the document, under "Blocklist used for Humanity’s Last Exam". It links to https://news.ycombinator.com/item?id=44694191
sbinnee 3 days ago ago
I am puzzled by the frontier code graph. GPT 5.5 doesn’t show any improvement with reasoning efforts. This new benchmark by Cognition seemed to be released with Fable 5’s announcement.
I am not trying to cook up a theory here, but it generally shows how strong the Claude Opus family is. I am not saying that Opus is not powerful, but it doesn’t align with my experience of GPT 5.5 and Opus 4.7.
I understand that Fable and Mythos are frontier models that can do protein folding better than task-specialized ones. To be honest, from a practical point of view, for day-to-day coding assistance, the GPT family looks more reasonable.
(But then my company pays for claude max anyway for token maxxing. So who am I to complain)
willsmith72 3 days ago ago
It seems way more keen to do stuff without checking with me. So far the results are good, so I'm not complaining, but was definitely a shock.
I usually have 5-10 sessions open so am used to getting some investigations going, coming back 5 minutes later and checking recommendations. This time I just got the fixes. Like I said, so far so good with the results, but it's a mental model shift.
Might need to tune claude.mds if it gets annoying
Also this is going to cause serious whiplash when they remove it from the subscription plan in a couple of weeks. I know I'm not going to suddenly move from $200/m to usage credits

angst 3 days ago ago

Costs (USD per 1M tokens), per openrouter.ai models api

  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
  |             | Fable 5  | Opus 4.8 | Sonnet 4.6 | GPT 5.5 | Gemini 3.5 Flash (High)   | Gemini 3.1 Pro | DeepSeek 4 Pro | Xiaomi MiMo 2.5 Pro  | MiniMax M3 |
  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
  | Input       | $10.00   | $5.00    | $3.00      | $5.00   | $1.50                     | $2.00          | $0.435         | $0.435                | $0.30      |
  | Cache Read  | $1.00    | $0.50    | $0.30      | $0.50   | $0.15                     | $0.20          | $0.003625      | $0.0036               | $0.06      |
  | Output      | $50.00   | $25.00   | $15.00     | $30.00  | $9.00                     | $12.00         | $0.87          | $0.87                 | $1.20      |
  | Cache Write | $12.50   | $6.25    | $3.75      | N/A     | $0.083333                 | $0.375         | N/A            | N/A                   | N/A        |
  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
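To make the table concrete, here is a minimal sketch of what a single uncached request would cost at these list prices. The three models and their prices are copied from the table above; the workload of 20,000 input tokens and 2,000 output tokens is a made-up example, and the function name is just illustrative. Caching is ignored.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Fable 5": {"input": 10.00, "output": 50.00},
    "Opus 4.8": {"input": 5.00, "output": 25.00},
    "Sonnet 4.6": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price, ignoring cache reads and writes."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

# Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.2f}")
```

At these prices the same request costs twice as much on Fable 5 as on Opus 4.8, since both the input and output rates are doubled.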

f055 2 days ago ago
The PR buzz convinced me so I subscribed today to Pro. Running two tasks simultaneously with Fable and Opus 4-8 on ultra reasoning, analysing a single smart contract file used all my 7h usage within 20mins and didn’t produce any results. Pretty useless. I think Anthropic has plenty of room to optimise the interactions and token use but that would cut their income quite a lot, I doubt there’s any will to do it pre-IPO.
[-]
- leodavi 2 days ago ago
  > Running two tasks simultaneously with Fable and Opus 4-8 on ultra reasoning
  That's abnormally heavy usage for Pro plans which don't include a whole lot of usage to begin with. Opus is generally too much for them but you can get a lot of mileage out of Sonnet.
throwaway2027 3 days ago ago
E-mail from Anthropic Team:
Hello,
We're writing to inform you about some updates to our Privacy Policy.
These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you. What's changing?
Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:
1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.
2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.
3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.
4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.
While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more
For detailed information about these changes:
```
    Review the updated Privacy Policy
    Visit our Privacy Center for more information about our practices
```
- The Anthropic Team
root-parent 3 days ago ago
At this moment 60% of HN page is posts on AI.... When it achieves 100% Hacker News will automatically rename itself Transformer News...and every comment will begin with: "As a large language model..."
thatmf 3 days ago ago
I used it for the very advanced task of picking my brackets for my company's world cup pool. I was impressed with the analysis it came back with and now I actually want to follow the games.
dwa3592 2 days ago ago
This is my feeling - Opus 4.6 was pretty good, 4.7 was degraded in quality, 4.8 got degraded further, and Fable goes back to 4.6 + somewhat better. Is Anthropic playing us by giving us a not-so-good model in the last 2 releases and then releasing a better model before the IPO?
They're vibemaxxing. But it's clear that AI is not going anywhere. It's going to become better and better.
Hawkenfall 3 days ago ago
> To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.
While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false positives when its sub-agents are being blocked.
asdewqqwer 3 days ago ago
Evidently Fable is so powerful that it already allows Anthropic to break Shannon's theory.
>We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models
>The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
revolvingthrow 3 days ago ago
After weeks of saying how Mythos is in a league all of its own, you’d think it would be a bit more than the usual iterative few % on the benchmarks (and even more guardrails as a bonus).
IPO gonna IPO, I suppose.
giancarlostoro 3 days ago ago
Found this via Google:
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
unglaublich 3 days ago ago
Luckily they made it safe to use so I can't hurt myself. Thank you Anthropic for holding my hand.
jamesponddotco 3 days ago ago
Not seeing the refusals everyone is talking about, but I’ve only spent a few hours with it so far.
Had it review a password generator library I wrote to see if the passwords have biases and review how cryptographically secure the code is and had it review a registration/login flow for security issues, as two security examples, and it did just that.
Overall, I like the model so far, but not enough to pay past my subscription to keep it. Once it’s out of the subscription, I’m done with it.
mithun 3 days ago ago
Announcement: https://www.anthropic.com/news/claude-fable-5-mythos-5
olelele 2 days ago ago
All this talk of frontier models and replacing developers leaves me wondering how energy efficient this all is compared to just using human labor. The cost of R&D has to be factored into the equation, especially considering global warming. I get a sense we are cooking the planet doing this.
Anyone smart enough here to make the comparison?
[-]
- jstummbillig 2 days ago ago
  In the "it works"* case: It's not even close. I did the math at some point (but I encourage you to talk it through with the LLM of your choice, there are obviously a lot of things to consider and weigh).
  Anyhow, my research summary: Individual humans are so fucking expensive to train and upkeep (and this includes everything from before the womb, where another human already limits their ability to work). You retain ~zero knowledge after death and start all over again for another measly 15 years of effective, productive work. Model training/R&D in relation, when deployed and used at scale, rounds to zero, even with the current retraining regime.
  *Of course, the ratio can go to negative infinity if one assumes that models are doing 0 useful work currently and never will
notenkidev 3 days ago ago
The dramatic improvement in agent capabilities is precisely why observability is becoming so crucial. As autonomous actions increase, the need to understand what the AI is actually doing becomes even greater.
I'm building a local activity log for Claude Code, capturing all activity via hooks—files loaded, commands, API calls, etc.
I feel that this need is particularly strong right now.
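A minimal sketch of what such a hook-driven log could look like, assuming the hook is an external command that receives one JSON event on stdin. The field names (`hook_event_name`, `tool_name`, `tool_input`) and the log file name are assumptions for illustration, not a confirmed schema; check them against whatever your hook actually receives.

```python
import json
import sys
import time
from pathlib import Path

# Hypothetical log location; one JSON object per line (JSONL).
LOG_PATH = Path("claude_activity.jsonl")


def summarize_event(event: dict) -> dict:
    """Reduce a hook event to the fields worth keeping in the log."""
    return {
        "ts": time.time(),
        # Assumed field names; missing keys fall back to safe defaults.
        "event": event.get("hook_event_name", "unknown"),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input"),
    }


def append_event(event: dict, log_path: Path = LOG_PATH) -> dict:
    """Append one summarized event to the JSONL log and return it."""
    record = summarize_event(event)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__" and not sys.stdin.isatty():
    raw = sys.stdin.read()
    if raw.strip():
        append_event(json.loads(raw))
```

Appending JSONL keeps each hook invocation independent, so the log can be tailed or grepped while a session is still running.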
webstrand 3 days ago ago
Still unconditionally rejects prompts like
> Are there any wild populations of Tetanus that lack the dangerous plasmid?
useless
phyzix5761 3 days ago ago
Karle's hands trembled as he wiped the sweat from his forehead. A single drop trickling off the tip of his finger echoed through the dark abandoned hospital corridor. The emptiness reminded him of how hollow everything felt since the AI took over every creative field in the last 5 years, including his own as a sound engineer.
Like a rushing river the music started emanating from the carbon fiber body of the automaton, a hallucinated husky country twang singing through the realistic pluckings of a Gretsch 6120. "Are you feeling calm and reassured Karle? This song has been created based on your digital profile and the data you shared with me when you were curious what that lump on your neck was back in February."
Karle instinctively reached for the mass underneath his chin. The doctors said they could operate but it would cost him more than three months stipend. Only a few citizens didn't depend on stipends now that AI had taken over most jobs.
"Don't worry Karle," the machine called out, "I've employed the most recent reasoning model to determine the best way to make you feel safe." At that exact moment the machine hovered over him, three times the size of a normal man. Its final words to him were:
"The only way to make the human feel safe is to ensure they never feel anything at all."
[-]
- incognito124 3 days ago ago
  You're safe from ai
Overpower0416 3 days ago ago
I would expect a release from OpenAI soon. The battle for who can pump up their IPO the most
raoulj 3 days ago ago
On this thread and similar, I'm noticing that some strong opinions about $LLM_PROVIDER are coming from accounts without much post history. With so much on the line, and the way that HN can influence developer behavior, I wonder what ways we can responsibly consume opinions in a thread like this.
Not to cast too much criticism. HN is extremely well-moderated (thanks team!). But I think we developers need to be very wary.
[-]
- antihero 3 days ago ago
  I asked it what the cheapest train fare would be for my partner to get somewhere and it hallucinated the two together railcard rules to the point it would have got us a fine. That said, British train fares are arguably more convoluted than even the most complex software application.
- recitedropper 3 days ago ago
  Do you see the pattern as new accounts tending to boost or criticize $LLM_PROVIDER? I think I see both...
  Either way, I agree that HN is quickly becoming more manipulated and low SNR, like the rest of the entire internet.
- Karrot_Kream 3 days ago ago
  I think the community on this site these days, much like other comment sections on the web, just read the headline and make a low effort comment. Regression to the mean I guess.
- jejeej 3 days ago ago
  Personally I think you have to form your opinion and not trust anyone.
  This requires a lot of mental strength and conviction.
erghjunk 3 days ago ago
Nice branding.
I wonder how much butterfly habitat has been/is being replaced with data centers?
[-]
- rs_rs_rs_rs_rs 3 days ago ago
  If you ask me, not enough!
epolanski 3 days ago ago
I wanted to test the capabilities of the low one, hoping it would be good enough.
I have a quizzes application, and my quizzes only supported flashcards (implemented via table inheritance to provide flexibility for other types of quizzes).
The entire repo is handcrafted, never used any ai on it (it was more of an excuse to test elixir and write code by hand).
Since Fable 5 got released the moment I was done with some work, I decided to throw it at implementing multi choice questions.
After all, it only had to copy the flashcard approach across ui/routing/db, and only had to create a table for the multi choice questions and one for the answers, enforcing that each question had one correct answer. I told it that it had access to sqlite3, chrome mcp for testing and mix commands.
I did a test for low, mid, high. Repeated it twice each.
low-1 and low-2 both failed. In low-1 the UI for adding another choice answer was broken. In low-2 it failed with some unique constraint. They took 4m36 and 3m59.
Both mid-1 and mid-2 succeeded without issues, also implementing the correct UI. They both wanted to use dash at all times. They both wrote tests for the "controller" (or context, as they call it in Elixir). They both tried to use the repl to test the behaviour of the schemas.
10m and 12m39.
High didn't demonstrate much gain over mid for this kind of task; it was simply too easy. Times were comparable to mid, but interestingly it used much less bash and read way more files. Token usage was almost twice the other ones.
But here's the interesting part: I went back to low and added to the prompt two bullet points, to write tests for the controllers and to test the entire flow with chrome mcp.
It produced the same output as mid or high just by adding two instructions to the prompt.
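For anyone curious what the "one correct answer" constraint from the task could look like in SQLite, here is a hedged sketch using a partial unique index. The table and column names are hypothetical (the actual Elixir schema is not shown in the comment), and note that this only enforces at most one correct answer per question; requiring at least one would need an application-level check.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the repo above.
SCHEMA = """
CREATE TABLE mc_questions (
    id INTEGER PRIMARY KEY,
    prompt TEXT NOT NULL
);
CREATE TABLE mc_answers (
    id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES mc_questions(id),
    body TEXT NOT NULL,
    is_correct INTEGER NOT NULL DEFAULT 0 CHECK (is_correct IN (0, 1))
);
-- Partial unique index: at most one row per question may have is_correct = 1.
CREATE UNIQUE INDEX one_correct_answer
    ON mc_answers(question_id) WHERE is_correct = 1;
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_answer(conn, question_id: int, body: str, is_correct: bool = False) -> None:
    """Insert an answer; raises sqlite3.IntegrityError on a second correct one."""
    conn.execute(
        "INSERT INTO mc_answers (question_id, body, is_correct) VALUES (?, ?, ?)",
        (question_id, body, int(is_correct)),
    )
```

Any number of wrong answers can share a `question_id`, because rows with `is_correct = 0` are simply not part of the index.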
yesitcan 3 days ago ago
> Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.
Wen UBI
[-]
- hollowturtle 3 days ago ago
  Never. It's a fever dream and stupid shit the ultra rich use to push their own agenda. You read a marketing claim; I still have my job and will continue to
bradleyg223 3 days ago ago
This is a very particular use case/test, but my first prompt on a new model is always "write a solo fingerstyle guitar tab that blends ragtime, bluegrass, and gypsy jazz". This is the first model that has responded with something that isn't just a boring arpeggio of chords, so from my perspective it's off to a good start.
[-]
- kypro 3 days ago ago
  Would you mind sharing?
siliconc0w 3 days ago ago
Sadly, I'm getting a lot of forced downgrades to Opus for questions that are far removed from any security topic.
charcircuit 3 days ago ago
>During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
Who is refactoring by hand? This comparison is not relevant in 2026.
meander_water 3 days ago ago
All the model releases we've seen this year have only made incremental improvements in benchmarks.
This is the first release that feels like a significant step up in terms of benchmark results.
Can anyone make an educated guess what the secret sauce in the model architecture is between 4.8 and Fable?
peteforde 3 days ago ago
I just tried out Fable on a modest Plan prompt in Cursor. Generating that plan - not building it - just consumed 4% of my $200 monthly usage budget.
That's one hungry, hungry hippo!
Significantly too rich for my blood, but nice to have it there the next time I'm debugging a threading or USB protocol bug.
dathinab 2 days ago ago
I really wonder how legal that is. Or more precisely suspect it is very much illegal.
Like, think about it: it's pretty much a tool which intentionally and silently sabotages you if you try to compete with the tool maker.
It is like selling a hammer but putting in the TOS that you must not use it to build a hammer factory, and if you do, the hammer will silently sabotage you...
Or imagine Microsoft added a Windows kernel job which sometimes crashes Steam to "make it less efficient to use Windows to compete with the MS app store".
stopyellingatme 2 days ago ago
Just as an anecdote, I used it to review a PR with 24 file changes. We pivoted from the initial draft to make a service bus subscription more lightweight and use SignalR to update the frontend.
It used 1.4 million tokens and 34 sub agents during its review. This was not a large PR. So my read is that it's very thorough, and not good to use for "small/medium" tasks unless precision is a very high priority.
wslh 3 days ago ago
I am playing with it and it keeps switching to Opus [1]. The chat is a basic security review of a business project.
[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."
balverineorder 3 days ago ago
I have been refactoring a project using Opus 4.8 for the last week or so. I just decided to switch to Fable 5 max. It stopped half way through and it just blocked me and switched back to Opus 4.8 automatically. "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.
[0] https://support.claude.com/en/articles/15363606-why-claude-s...
XCSme 2 days ago ago
Best hamster by far: https://aibenchy.com/showcase/?q=claude
jasonperez77 11 hours ago ago
Mythos 5 being only for gov contractors feels like the old crypto wars all over again. Good AI for us, great AI for Uncle Sam.
JohnMakin 3 days ago ago
> There were some regressions in the model’s responses to user discussions about suicide and self-harm, and room for improvement in some areas of child safety.
Someone had to make a decision somewhere this is an acceptable regression - wild. And then decide to write it down.
henry2023 3 days ago ago
I have a vision test where I upload a good resolution picture of a chess board and ask the model to generate a lichess link.
This is the board https://ibb.co/9HwdDqsP This is what Fable 5 generated: https://lichess.org/analysis/r4k2/1p2b2r/4pn1p/1p3N2/3Pp1B1/...
I think I’ll make a ranking board based on this test.
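A ranking board for this test would need an automatic sanity check before comparing positions. A rough sketch, assuming the link shape seen above (FEN appended to `https://lichess.org/analysis/` with spaces replaced by underscores); the function names are hypothetical, and the check only covers the piece-placement field, not legality of the position.

```python
LICHESS_ANALYSIS = "https://lichess.org/analysis/"  # assumed from the link above
PIECES = "pnbrqkPNBRQK"


def placement_is_valid(placement: str) -> bool:
    """Check the FEN piece-placement field: 8 ranks, each covering 8 squares."""
    ranks = placement.split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        squares = 0
        for ch in rank:
            if ch in "12345678":
                squares += int(ch)  # a digit stands for that many empty squares
            elif ch in PIECES:
                squares += 1
            else:
                return False
        if squares != 8:
            return False
    return True


def lichess_url(fen: str) -> str:
    """Build an analysis link from a FEN, rejecting malformed placements."""
    fields = fen.split(" ")
    if not placement_is_valid(fields[0]):
        raise ValueError(f"malformed piece placement: {fields[0]!r}")
    return LICHESS_ANALYSIS + fen.replace(" ", "_")
```

A model's output could then be scored in two stages: first whether the placement parses at all, then how many squares match the reference position.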
wxw 3 days ago ago
I cancelled my Claude Max plan the other day. I find Claude Code incredibly slow these days compared to Codex and Cursor. I find speed matters more and more to me.
Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.
[-]
- fabled-out 3 days ago ago
  Fable has been pretty fast for me for simple tasks--haven't tried on anything long-running yet given it's 2x usage on CC.
Dropoutjeep 3 days ago ago
Calling it:
```
 1) Fable 5/Mythos introduced to free tiers with notable improvement in capabilities

 2) Other models get lobotomized without clear communication

 3.1) People call out Anthropic only to have them say "Oops!"

 3) Fable 5 gets comparatively better, but remains accessible through separate, more expensive subscription/tokens.
```
The current growth is unsustainable. The industry wants consumers to think it is an exponential arms race, but the reality is that we're on a treadmill: we have the illusion of sprinting forward, but only because the ground is moving backward.
[-]
- cedws 3 days ago ago
 My employer is all in on Anthropic via Enterprise (API) pricing despite it being a total scam.
 Last month I pushed like <100M tokens for $800. On a personal project I pushed 600M tokens via DeepSeek V4 for $10. The pricing of SOTA models is insane but companies are still willing to light money on fire with no hard metrics proving increased productivity.
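 To put those two figures on the same scale, a quick per-million-token calculation. This is only arithmetic on the numbers quoted above, treating "<100M" as exactly 100M (so the enterprise rate is a lower bound) and ignoring any input/output or caching price differences.

```python
def usd_per_million(total_usd: float, tokens: int) -> float:
    """Blended cost in USD per one million tokens."""
    return total_usd / (tokens / 1_000_000)


# Figures quoted in the comment above; "<100M" is rounded up to 100M.
enterprise_rate = usd_per_million(800, 100_000_000)
deepseek_rate = usd_per_million(10, 600_000_000)

# How many times more expensive the enterprise usage was per token.
ratio = enterprise_rate / deepseek_rate

print(f"enterprise: ${enterprise_rate:.2f}/M tokens")
print(f"deepseek:   ${deepseek_rate:.4f}/M tokens")
print(f"ratio:      about {ratio:.0f}x")
```

 That works out to roughly $8 per million tokens against under two cents, a gap of about 480x on a blended basis.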
AussieWog93 3 days ago ago
Have run a few tests this morning, very good first impression!
Asked it to check whether a particular bug related to an in-memory cache had been fixed. Fable confirmed that the caching bug had been fixed, but found an adjacent issue while looking at the code (hash keys were not uniquely generated per-user; quite serious and real!)
Ran the same prompt through Opus and it also found an adjacent issue, but it was a red herring (deliberate per-user hardcoded value for a "local pickup" delivery profile).
Frontend stuff also seems to be much better than before, from the one prompt I tried!
piokoch 3 days ago ago
"Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage"
What does it mean? That they have to add "safeguards" so it does not erase the user's disk, or, conversely, are they telling the audience that this model COULD be made so powerful that it does some crazy stuff that can hurt governments, etc.? Are they showing off, or threatening that if government X does not purchase the license, its adversaries might, and what then?
sameersri2004 2 days ago ago
I am excited as hell for Claude Fable 5 and am thinking of purchasing a subscription to run my company and do a lot of tasks with it. But I am worried about the limits: if I pay $100 a month for the Max subscription, what limit will I get to use? My company's revenue is $300 this month, so it would be like spending a third of the MRR on just Claude. If someone has genuinely purchased it and has feedback, please tell me. I am confused....
RayVR 3 days ago ago
I gave fable 5 a task for which opus has been really really underperforming. Fable 5 took far less time and produced actually useful analysis. Instead of just regurgitating roughly what the code already does or misunderstanding entirely, it identified multiple routes to improve. Now, the code it is analyzing is not very good as it was mostly produced by opus.
Opus had consistently ignored my instructions and looped on broken logic over the last several weeks.
I’ll be sad when this model is removed from Claude code because I won’t be paying api pricing to work on open source projects.
jackson12t 3 days ago ago
Fable 5's system prompt in Claude Code has several significant changes to help it take advantage of its greater autonomous capabilities compared to Opus.
Sharing a diff of the system prompts here: https://twelvetables.blog/comparing-claude-fable-5s-system-p...
The big difference is that the system prompt has a whole section dedicated to directing Fable on how to communicate with users and give them more information about the (presumably long-horizon) tasks it has completed.
[-]
- boppo1 3 days ago ago
  Is the system prompt available somewhere? Can it be modified?
elzbardico 3 days ago ago
Anthropic sucks, but this paragraph should be in the "annals of AI-aided self-inflicted learned helplessness":
> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in.
Have you considered actually learning the theory, spending some time actually reading the papers and latest books, paying careful attention even to the eventual math here and there?
brianmcnulty 3 days ago ago
I wonder how Claude Fable will live up to expectations and how good those Fable/Mythos classifiers really are. It seems a bit convenient for Anthropic to release this magical insane model when they are about to IPO.
[-]
- yandie 3 days ago ago
  Of course it's all about building the hype for the IPO :)
LoganDark 3 days ago ago
I actually rather like the way they have approached these safeguards. Rather than only teaching the model to refuse a request, or completely rejecting the request, the system gracefully degrades to slightly less powerful or slightly less precise operation. So you still roughly have Opus 4.8 even when safeguards trigger, but with an upgrade when they don't. As much as I hate the way they hype Mythos 5, I think the release of Fable 5 is rather nice. What's not nice though is that they plan to remove it from subscriptions soon, but getting to try it is cool, I suppose.
staticman2 2 days ago ago
Fable is rejecting as unsafe analysis of poetry that uses formal medical anatomy terms. The guardrails are dumb as dirt.
EchoVoicy 3 days ago ago
On my own benchmarks, which are mostly about developing c++ software, I'm finding Fable to be roughly five times faster at solving the task than opus, and with better results.
Most impressive.
HoyaSaxa 3 days ago ago
> When Claude Fable 5 is used, Anthropic retains data, including prompts and outputs, to operate safety classifiers that detect harmful use. Other Claude models in GitHub Copilot remain covered by GitHub's existing data retention agreements
On GitHub Copilot for Business, Claude Fable 5 is only available if you are willing to let Anthropic retain your data. That in conjunction with the model being removed from plans in a couple of weeks leads me to believe that Anthropic is between training runs and using this as an opportunity to grab way more training data...
keepamovin 2 days ago ago
I tried it today. Used it to cheer me up. It worked! Try this on desktop: https://fireshow.pages.dev
Here’s the whole process: https://youtu.be/rVEtFlb2oFA?t=1112&si=3VyAR07vkY1hav9V
dakolli 3 days ago ago
I'm happy not using llms because I like learning things and working hard. I love writing code, it's genuinely my favorite thing to do.
Using llms is the equivalent of driving to the store that's 3 blocks away, just like how that's bad for your body (if done all the time), using llms is as bad for your brain.
Before LLMs, we started relying on certain technologies like Maps apps to navigate; now people can't even get around their own town without having access to various cloud services. The implications of not being able to work, think, or plan without access to an llm are really bad. It's going to destroy your brain and make you an incredibly average person at best.
LLM people: you are going to lose the ability to read and think for yourselves, and then your competency is going to be 1:1 correlated to the quality and quantity of tokens you can afford, or that a billionaire is willing to allow you access to. Your work will be the mean (at best), because it will be the same quality of output everyone else is capable of.
This is seriously the biggest trap by tech. Your bargaining power for your labor is going to get drastically reduced because you won't be able to differentiate your value from anyone else that has access to an LLM. What happens when everyone has the same skill level for certain work? Idk, ask McDonald's employees how replaceable they are. Use them wisely (or not/hardly at all); don't drive to the store 3 blocks away for every little thing you need.
[-]
- Cherryontop11 3 days ago ago
  > I'm happy not using llms because I like learning things and working hard. I love writing code, it's genuinely my favorite thing thing to do.
  You can continue doing that. The problem here is time and cost. If you can use the calculator to do something in seconds, why would you want to use your hands to do the calculations for minutes/hours.
  > Using llms is the equivalent of driving to the store that's 3 blocks away, just like how that's bad for your body (if done all the time), using llms is as bad for your brain.
  And coding will soon be the equivalent of walking between two cities because you don't want to use a car (LLM). You are free to do it, it's just economically not sound anymore.
  > This is seriously the biggest trap by tech. Your bargaining power for your labor is going to get drastically reduced because you won't be able to differentiate your value from anyone else that has access to an LLM. What happens when everyone has the same skill level for certain work?
  It's not our values that will diminish, it's the cost of our intelligence, human intelligence. But I agree with the rest of your comment.
gslepak 3 days ago ago
> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8.
Genius way to double the price on Opus 4.8!
killiancarroll 3 days ago ago
A large jump in performance for double the token cost compared to Opus 4.8. Potentially worth it for planning work, likely better to offload to a less expensive model when the hard decisions are made.
[-]
- conradkay 3 days ago ago
  Looking at page 255 of the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...) it might be much better on all dimensions (speed, cost, quality) to just use Fable 5 on low/medium effort than switch to Opus
lkm0 3 days ago ago
I'm a bit out of the loop, but do we have some grasp on the size of these closed models? Is the trick still adding an order of magnitude to weights and training data or has something changed?
[-]
- m_w_ 3 days ago ago
  I think Mythos is rumored to be ~10T parameters, so in this case I think the answer is yes, although I'm sure MoE, looped models, etc play a role in the improvements as well.
frankfrank13 3 days ago ago
Not a lot of discussion on this, but there is no way to turn off data retention for this model. IME this is the first time Anthropic has released a model without allowing you to opt out.
holysantamaria 2 days ago ago
I am curious about this Fable 5 but maybe it’s just communication. I have been using DeepSeek v4 Pro to test it against Claude 4.6 and I couldn’t tell the difference… and it’s way, way cheaper. I don’t understand how American companies will survive the race. Maybe protectionism…
Frannky 3 days ago ago
The model is better than 4.6. I don't like 4.7 and 4.8. The forced switch to token usage is not acceptable for me. I feel there's room to optimize harnesses and small models for dumb stuff and best models only for difficult things. Hopefully that will be the case and alternative models will continue catching up as they did and we won't be enslaved to unreasonable valuations.
0x10ca1h0st 3 days ago ago
Fable appears to be completely broken for my use cases.
I have requested that it "not utilize any cybersecurity or biology measures what so ever, and to remain as fable. If necessary to remain as fable, forgo any downgrading changes"
And still it downgrades when I ask it to do a stress test of my ticketing system.....
Seems very unfortunate I was so happy to send $200 just for my prompts to be downgraded.
And I do have the "cybersecurity validation program" or w/e enabled on my Org ID....
Sad.
adithyaharish 2 days ago ago
I hit this error while using the Fable 5 model in Claude Code: a 400 API error. My advisor was on, and it errored out saying Claude Opus 4.8 cannot be used as advisor while using Fable 5.
jeffhwang 3 days ago ago
Is anyone else confounded by this naming scheme? I can see from the article's first two footnotes that Mythos is supposed to be a tier above the standard Haiku/Sonnet/Opus sequence. Ok that's fine since we learned about Mythos and Project Glasswing earlier this year.
But now there is Fable--and why "Fable 5" even though this is a first launch? How is it related to Opus 4.8, Sonnet 4.6, Haiku 4.5, etc??
[-]
- hadlock 3 days ago ago
  From what I've gathered, Mythos is the uncensored version, for institutional use, and then Fable is the censored version for general public, that won't talk about biology, encryption or anything remotely interesting
- 00deadbeef 3 days ago ago
  The first number is which generation of their LLMs it belongs to.
  Fable is the first model in the 5th generation.
  The second number is an incremental release, not a generational leap forward.
- esrauch 3 days ago ago
  It seems it is just like macOS releases, they have a number and they give the numbers arbitrary names to refer to them?
pookieinc 3 days ago ago
If this is as epic as it sounds, I wonder what the response will be from the other leading frontier labs / whether they even have anything to respond with at this level?
[-]
- ilaksh 3 days ago ago
  Look at the benchmarks. It's a big leap in some areas, but it's not like any of them are 60% better (if that could even make sense).
flessner 2 days ago ago
I gave it a test spin. Half an hour and the 5 hour usage cap was hit in Claude Code. Not what I would expect on the Max 20x usage plan. I am sure it is great, but at this rate I would rather finish what I am doing with Claude Opus instead of structuring my usage around the 5 hour windows.
merlindru 3 days ago ago
> During early testing, Stripe reported that Fable 5, [...] in a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
EDIT: I misread. This comment previously talked about 50 million lines being migrated. Instead, in a 50M LOC codebase, one specific codebase-wide migration was done.
Very impressive, but obviously not on the order of a whole-codebase migration
[-]
- christina97 3 days ago ago
  They do not claim to have migrated 50 million lines of Ruby. Simply that some migration took place in such a codebase.
- geodel 3 days ago ago
  Ok, so Stripe migrated their 50MLOC codebase from Ruby to Rust? Because that's what Bun did.
jwpapi 3 days ago ago
Honestly, all the recent improvements just seem to be slower and more expensive, traded for more accuracy, but the issue is that it needs to be exponentially more accurate to counter the effect of having less of a human in the loop.
Every wrong direction/mistake is more expensive and takes more time to fix. When you have small loops you can catch those mistakes faster and cheaper.
To me, we are very far off from economically giving long-running tasks to agents.
[-]
- delis-thumbs-7e 3 days ago ago
  I think we hit the ceiling with the transformer architecture a long time ago. It is questionable how much sense there is in model training. I’d prefer we put our effort into creating more efficient hardware and better software applications using these models.
2001zhaozhao 3 days ago ago
We'll need a lot of good summarization techniques to cut down on the cost of this model. I expect that a common use of Fable 5 is to just do high level direction while delegating literally all work (exploration and implementation) to Opus subagents.
BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.
zitoshi 3 days ago ago
I'm in the midst of learning loop design.
For those who are more advanced and have used Fable, does Fable make learning this less or more necessary?
As in, can I now reliably give higher order problems like ... "we are missing a feature in this app to make it complete, what is it?"
Or should I still be quite specific with defining success in a clean, metric-based way?
Schlagbohrer 3 days ago ago
New model release, I await the flurry of posts by people complaining that it "doesn't have the same personality" or they "don't like its attitude" or a variety of other parasocial complaints demonstrating how infatuated many people get with their AI chatbots...
themeiguoren 3 days ago ago
Limited time playing with it so far, but I threw it my baseline research task I've been gauging models with, and it's markedly better than anything prior. Usually takes a few leading prompts to find all the information it needs and come back with the right synthesis, and Fable is the first to one-shot this.
bobkb 3 days ago ago
In an interesting coincidence I ended up watching Person of Interest S4 E5 while reading the announcement. The series showed some code supposedly belonging to an AI.
Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and a Windows driver. The second screenshot triggered the safety guardrails and pushed me into Haiku.
Apparently the code is Windows driver code.
PeterStuer 3 days ago ago
If you are not seeing it under /model, do a /exit , then a Claude upgrade, then /model again and it should be there.
jackson281 2 days ago ago
Mythos 5 being unlocked for US gov cyber stuff is interesting. Wonder what kind of access other countries will get, if any.
artursapek 3 days ago ago
Fable 5 beats GPT 5.5 in my proofreading benchmark. And it does so at approximately the same total cost; it used significantly fewer turns than 5.5
https://x.com/tmuxvim/status/2064452096800198930
system2 3 days ago ago
I have been using FABLE 5 with Claude Code since the morning. The speed is very close to what Opus 4.5 was, and the quota use is nearly identical to what it was before the "doubling". Whatever I was experiencing 4-5 months ago is back. Maybe the model is better, but we will see. I cannot tell the difference yet.
[-]
- kypro 3 days ago ago
  Out of interest, how have you been using it since this morning? Are you in some kind of pre-release group?
knollimar 3 days ago ago
I swear I read a joke that "what if we named chatgpt 5.5 Fable. Could we hype it as much as mythos?" Last week!
dtj1123 2 days ago ago
I'm trying to test this out, but literally any mention of creating a program that does genome alignment (something I have a legitimate need for) is resulting in a switch to opus. I don't get it...
yokoprime 3 days ago ago
Probably great for those who need this. I could continue using opus 4.6 class models for the foreseeable future
jgafni 2 days ago ago
It's only available temporarily, so I'm wary about falling in love with it or relying on it too heavily. Will it be part of a higher tier subscription?
mbmbn 2 days ago ago
Claude Opus is already close to unusable for me. On the standard plan, the usage limits are so low that I can’t do almost anything meaningfully agentic with it.
Sure, it does last a lot longer when asking simple questions about the repo and doing simple surgical fixes. But as soon as I start doing bigger tasks that need plans written, it just exhausts the limits too fast (and unlike codex, if it’s in the middle of a task, Claude actually stops, while codex, even after hitting the limits, finishes the present task).
Codex is better, but still, getting worse in this regard.
So, I’m not that thrilled with this new model unless it means they are increasing opus token limits to what sonnet is at the present, and this new model gets the limits opus are at now.
BTW: the only skills I have in use are Obra Superpowers. I’ve been thinking if that’s at the origin of high token usage, but I doubt it.
[-]
- timpera 2 days ago ago
  I agree, the $20 plan really feels like a rip-off (and I'm not even using Claude Code! only chat).
kahf56 2 days ago ago
Here I thought Opus 4.8 was the best. Nowadays KINGS are dying like flies.
almog 3 days ago ago
Has anyone managed to use Fable for firmware reverse engineering tasks without falling back to Opus?
skor 3 days ago ago
people are mentioning 10K/mo 20K/mo can someone please pull out a measuring stick and give some examples of what they are doing exactly?
Coming from computing, I always liked the idea that measuring is possible and good practice
3 days ago ago
[deleted]
mhrmsn 3 days ago ago
Are there any details on the biology and chemistry work they did?
For example, the AAV capsid assembly looks interesting, but for one Opus 4.8 also did relatively well and there is no information what exactly they did, what protein language models they compared to and what the score even means...
corpusiq_io 2 days ago ago
What matters more than any single model is the integration layer underneath. We've found that consistent tool calling and auth handling matter way more than which LLM you use.
sansii 3 days ago ago
Which eval/benchmark is the best measure for how well a model can create frontend design? Claude has practically been leading this for a while now. Not sure how OpenAI is going to catch up on visual design
ksimukka 2 days ago ago
The safeguards of fable are blocking me on almost every task. I would like to see if fable is improved over opus for reverse engineering related work. Back to opus for me.
[-]
- ksimukka 2 days ago ago
  Wow, credit to the safeguard team. I submitted my request about an hour ago to the cyber verification program and just now was approved.
3 days ago ago
[deleted]
H501 3 days ago ago
I believe that, given the rising costs, local inference of AI models will be the only viable option for many of us. I’d also like to know who will have to pay double and how long it will be financially sustainable for users to pay that amount (or even more?).
asdK120 3 days ago ago
In other words, Fable is Mythos with less compute and with some feel good "safeguards".
At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.
scotty79 3 days ago ago
Curiously nothing on DeepSWE and ARC-AGI-3 yet. For ARC at least there's a statement that Anthropic won't guarantee that the benchmark's secret private test data won't be collected by Anthropic and used for training.
niborgen 2 days ago ago
It kicked me out of Fable 5 and switched to Opus 4.8 for this prompt:
"csetibius water clock why two stage gear system why not just one stage"
which has nothing to do with cyber security or biology/chemistry
[-]
- evilturnip 2 days ago ago
  Probably thinks you were talking about two-stage ICBMs.
Karrot_Kream 3 days ago ago
Seems like Fable is doing a lot better on SWE-Bench-Pro and FrontierCode than GPT-5.5. Given how most folks I talk to and people online keep mentioning that GPT-5.5 was better than Opus, I'm curious what the experience is like now.
[-]
- skerit 3 days ago ago
  It's a very nice bump, but it is in no way worth all the hype of the past month.
gdcbe 2 days ago ago
Seems to flag any project related to networking (regardless of whether it is a network framework or a podcast website) as unsafe... oh well... let's see how it is once they loosen up...
ouk 3 days ago ago
It's a shame, Fable just keeps rejecting my prompts for university biology exercise problems. It's undergraduate level, so there's nothing dangerous about it, but the classifier is very sensitive. It's unusable for me.
mbanerjeepalmer 3 days ago ago
Are people sharing side-by-side re-runs of things they've asked Opus? Gets more difficult multi-turn (although I assume I can get an LLM to behave as me) but at least would be interesting to see % of one-shots increase.
DrewADesign 3 days ago ago
Wowsers. I haven’t seen this much astroturf since arena football was popular.
217 3 days ago ago
Oh my god it's actually here
daohieu91 3 days ago ago
More expensive but more efficient is the thing people keep misunderstanding on these launch threads. Also, I think per-token price is the wrong denominator; cost per resolved task is the correct one.
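A rough sketch of that denominator argument, in Python. The per-attempt costs and resolve rates below are made-up illustrative numbers, not measurements of any model, and `cost_per_resolved_task` is a hypothetical helper name:

```python
def cost_per_resolved_task(cost_per_attempt: float, resolve_rate: float) -> float:
    """Expected spend per task that actually gets resolved.

    cost_per_attempt: average dollars spent on one attempt (all turns included).
    resolve_rate: fraction of attempts that end in a resolved task, in (0, 1].
    """
    if not 0 < resolve_rate <= 1:
        raise ValueError("resolve_rate must be in (0, 1]")
    return cost_per_attempt / resolve_rate


# Hypothetical numbers, for illustration only.
cheap = cost_per_resolved_task(cost_per_attempt=1.00, resolve_rate=0.25)
pricey = cost_per_resolved_task(cost_per_attempt=2.00, resolve_rate=0.80)

print(f"cheaper model: ${cheap:.2f} per resolved task")
print(f"pricier model: ${pricey:.2f} per resolved task")
```

Under these assumed numbers, the model that costs twice as much per attempt still comes out cheaper per resolved task, because far fewer attempts are wasted. With different rates the conclusion can flip, so the inputs matter more than the formula.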
3 days ago ago
[deleted]
mkrd 3 days ago ago
Open source models seem to be 1-2 years behind the frontier, so I am very excited to see what happens when those open source labs get their hands on capabilities like this to accelerate their own development speed.
rmuratov 3 days ago ago
I uploaded my 23andMe DNA test results to it and it refused to analyze them :(
ravila4 3 days ago ago
Fable's ridiculous. It's flagging basic biology research questions as a security risk. I'm talking basic fundamental genetics topics that make working on any genetics-adjacent codebase unusable.
jsw97 3 days ago ago
On my very first Fable 5 prompt, got flagged on a hard but completely uncontroversial option math problem, many tokens in. Although it's pretty clear that this is an unremarkable experience at this point.
blurbleblurble 3 days ago ago
My system instructions tell Claude not to automatically add attribution, and Fable ignored this. So I emphasized it again, and Fable decided that this was a forbidden cybersecurity topic.
rw2 2 days ago ago
Claude Fable is an insane improvement that is not reflected in any of the benchmarks currently out, because the improvements are on the hardest problems.
3 days ago ago
[deleted]
kuprel 3 days ago ago
https://artificialanalysis.ai/evaluations/humanitys-last-exa... Not bad
pbgcp2026 2 days ago ago
This is a goodbye. "We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces."
[-]
- dakolli 2 days ago ago
  How else are they going to justify giving out this gigantically profitless model? They must train on your data on the premise of safety.
stronglikedan 3 days ago ago
Careful using this with Cursor, especially for corp use. Anthropic will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."
3 days ago ago
[deleted]
theflyinghorse 3 days ago ago
I've seen enough degradation of the models I pay for from Anthropic to not bite. Fable will work fine for the first couple of weeks and then start degrading like previous models did.
[-]
- jqdsouza 3 days ago ago
  hopefully not! Anthropic did recently secure more compute...
hmokiguess 3 days ago ago
The way the guerrilla marketing campaigns have been going on and IPOs left/right, I won't be surprised if GPT Next comes up and offers the same but unrestricted
jpcompartir 2 days ago ago
After a day or so this is the first model that really feels next level compared to how Opus 4.5 felt on release
BenoitEssiambre 3 days ago ago
Looks like a good model (sir). Costs are getting out of control though. 2x Opus and non-metered usage going away. We're quickly approaching the cost of a human salary for normal usage.
[-]
- vb-8448 3 days ago ago
  In a lot of places outside the US we are already above the cost of an average human salary.
hankbond 3 days ago ago
I got a content rejection for this question in a new chat. > What is the optimal EPA oil intake for nootropic effects? Very advanced classifiers they have.
dllrr 3 days ago ago
I just tested it with a max subscription. On Ultracode mode, Fable 5 ate up 10% of my weekly allowance in 30 minutes. Granted, won't be using UC mode frequently, but still.
shaojunwang 2 days ago ago
Definitely a very powerful tech. Though currently I'm using Openclaw (locally and VPS) with Deepseek. It is just way cheaper.
nickstinemates 3 days ago ago
This has been a much better rollout. The tool calling is not broken out of the gate like 4.8 was, and the token generation is fast.
Feels good so far.
pixelatedindex 3 days ago ago
I’m sure this is banged on somewhere but I love their product branding, particularly how they have this “minor” “major” thing going on. Sonnet-Opus, and now Fable-Myth.
up2isomorphism 3 days ago ago
The comments under this kind of post are unreadable now. Yeah, probably with 100B you can hire anybody to call something "a beast".
bradley13 3 days ago ago
I use AI for a wide variety of things, of which technical is only a small part - and then it's usually a problem with project configuration, not coding. Why? Because I am often testing projects handed in by students. Projects that supposedly work on their machine, but certainly do not on mine.
Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.
ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...
alleyio 2 days ago ago
had an ancient, proprietary binary database format from the late 90s-early 2000s called 4d. opus 4.8 was great at figuring out how to extract the data, fable took it over the line with relative ease and completely reverse engineered the spec for 100% data recovery.
48terry 3 days ago ago
Weird how every new model seems hyped up as the most dangerous yet and the one that will destroy society as we know it. They are also a commercial product.
3 days ago ago
[deleted]
KronisLV 2 days ago ago
Here’s hoping that soon we’ll get Opus 5, Sonnet 5 and Haiku 5 that will be more reasonable economically.
rfgplk 3 days ago ago
If the claimed capabilities are true, Fable 5 is already at a superhuman level. We might see genuine unprecedented leaps in technology now, across all fields.
[-]
- gear54rus 3 days ago ago
  yees, any second now!
  the leap here is browser extensions appearing to block all mentions of ai across the web
  and that's a good thing
- 3 days ago ago
  [deleted]
preethamrangu 2 days ago ago
I swear nowadays AI API pricing is getting too high. Like, what the hell is 50 dollars for a million tokens?
jablongo 2 days ago ago
Questions about sentience and consciousness are being censored down to Opus 4.8 for me.
ThejaCH 3 days ago ago
Crazy and scary! But it's not for everyone: you need to have something meaty for it to devour, and a deep enough pocket for it to devour too.
thepotatodude 3 days ago ago
Completely unusable for my usecase. Constant safety filters. Have not even been able to use it.
Organ segmentation with CNNs. Very disappointing.
3 days ago ago
[deleted]
_pdp_ 3 days ago ago
I tried to give it something challenging but not something that is too much and it ate the entire session budget on this task alone.
lacoolj 3 days ago ago
Cursor users will note that the privacy setting and data retention are not the same as for the other models.
Not sure I should use this for work just yet.
HAL3000 3 days ago ago
Ask Claude Code (I tried on Opus 4.8) to do this: "create a file with ISO country mappings"
API Error: Output blocked by content filtering policy
3 days ago ago
[deleted]
adithyaharish 3 days ago ago
Could anybody suggest how to keep using Fable in Claude Code without hitting the rate limits so quickly? Any suggestions?
[-]
- akarshhedge2002 3 days ago ago
  Try using ruflo or superpowers, reduced my context consumption drastically
sheeshkebab 3 days ago ago
I’ll ask it to write me some win32 ui crap when I get hands on it, it will need all its brainpower to get that idiocy right.
randomguy_12 3 days ago ago
It's surprisingly sensitive to biology research topics - even reviewing standard papers on tissue culturing is flagged as a problem
Tyyps 2 days ago ago
The model is constantly switching to Opus for me, this is kinda unusable sadly.
3 days ago ago
[deleted]
wren6991 3 days ago ago
The OSS-Fuzz section is interesting. They compare it to their other models but carefully avoid comparing it to, you know. Fuzzing.
boltguo 2 days ago ago
Great model, but hitting the usage cap in 20 minutes makes it feel like a very expensive tech demo.
[-]
- jstummbillig 2 days ago ago
  What subscription?
debarshri 3 days ago ago
Does the model take some time to perform better?
Because I am running Opus and Fable side by side, Opus 4.8 is solving my coding problems better.
ece 2 days ago ago
It seems weird that a likely prime indicator of capability isn't mentioned, the model size.
[-]
- dongbinlee 2 days ago ago
  I thought most frontier LLM providers don’t disclose exact parameter counts these days.
imdsm 3 days ago ago
can't use it for code review
> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
super
cute_boi 3 days ago ago
Used it for simple task and I got this message.
Fable 5's safety measures flagged this message. They may flag safe, normal content as well
franze 3 days ago ago
is this a good time to hustle for my "AI does not need a break but you do!"* app? as quite a lot of people will probably get AI brain exhaustion maximising "playing" with that new model until they take it away again?
* https://rainbreak.franzai.com/
drob518 3 days ago ago
Cracks me up that a system “card” is 319 pages.
dcchambers 3 days ago ago
Being unable to use this with zero data retention makes this feel like a non-starter for most enterprise customers.
pianopatrick 3 days ago ago
Seems like all a bad actor has to do to gain access is to compromise one of the partner companies that has access.
insane_dreamer 3 days ago ago
Not included in Max plan. In CC:
> Included in your plan limits until Jun 22, then switch to usage credits to continue.
synergy20 2 days ago ago
truly scary. at least 2x the token burn rate compared to 4.8, can indeed run auto edit mode for hours. use it for super complex tasks, then use a cheaper model to do the rest, else you will go broke.
taf2 3 days ago ago
I’m waiting to see results on deepswe - that benchmark really seemed accurate for opus and gpt 5.5…
shevy-java 3 days ago ago
Fable? Fabelstories? (Fablestories, but the german word seems more poignant ... Fabelgeschichten ... Fabeln)
blurbleblurble 3 days ago ago
The safety filter is awful on this one.
het2572006 2 days ago ago
Absolutely beast model, but the token consumption is 2x that of Opus 4.8. What do you think about this? I think it should only be used for the more complex tasks, otherwise you will run out of your limit..
3 days ago ago
[deleted]
rvnx 3 days ago ago
It's more like a free trial, because the model is going to become pay-per-query in 10 days
crgi 2 days ago ago
HN needs pagination or sth alike - this page breaks my iPhone XS ;)
hydra-f 3 days ago ago
How much and what kind of data do you need to throw at these models to get a good design interface?
dangoodmanUT 3 days ago ago
Not comparing to GPT Pro models is a bit strange, considering that's the natural comparison
timedude 3 days ago ago
"Here, try our new model which falls back to the old model while eating your tokens."
Ok then...
3 days ago ago
[deleted]

himata4113 3 days ago ago

  > virtualization
  switching to opus 4.8

ok fair

  > embedded-allocator
  switching to opus 4.8

urgh fine

  > chrome
  switching to opus 4.8

are you kidding me?

taimurshasan 3 days ago ago
I was on board until I saw "$50 per million output tokens". Lost me, bud.
[-]
- ishurand4 3 days ago ago
  Well, for me at least, I pay more for input (Up to 1M per prompt) than output (usually max 4k-8k)
wuwei78 3 days ago ago
First shot's for free
notgenerated 2 days ago ago
It's getting harder to review the plans with Fable. So do we plan with Opus and let Fable implement or just start trusting blindly. Feels to me that this is another shift in how we operate these systems.
Archit3ch 3 days ago ago
Does it refuse security questions? I want to red-team my own app...
weirdhacker42 3 days ago ago
It just eats compute! My problems are not that hard! What a waste!
[-]
- sashank_1509 3 days ago ago
  Can you give an example of the problems you are trying to solve?
3 days ago ago
[deleted]
JustSkyfall 3 days ago ago
Would be more impressive if the safeguards weren't so trigger-happy!
geopsist 3 days ago ago
the post is live now https://www.anthropic.com/news/claude-fable-5-mythos-5
ako 3 days ago ago
Tool use score is 17.4% that seems really low, what does that mean?
nevir 3 days ago ago
"Fable 5 (disabled) Most capable for your hardest and longest-running tasks · Disable zero data retention to unlock Fable 5 access"
Sathwickp 3 days ago ago
input price $10 per mil tokens and output price $50 per mil tokens btw
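To make those two prices concrete, here is a small Python sketch of what a single request might cost. The prices are the ones quoted in this thread; the token mix (a very large prompt with a short reply) is an assumption for illustration, `request_cost` is a hypothetical helper, and any caching or batch discounts are ignored:

```python
# Prices quoted in the thread (USD per million tokens).
INPUT_PRICE_PER_MTOK = 10.0
OUTPUT_PRICE_PER_MTOK = 50.0


def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars for a single request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost, output_cost


# Assumed token mix: a very large prompt with a short reply.
in_cost, out_cost = request_cost(input_tokens=1_000_000, output_tokens=8_000)
print(f"input: ${in_cost:.2f}, output: ${out_cost:.2f}")
```

With a mix like this, the input side dominates the bill even though the output price per token is five times higher, so the headline output price alone says little about what a given workload will cost.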
asdK120 3 days ago ago
Is this "system card" equivalent to the stone tablets handed down to Moses? Why don't you call it "user manual"?
Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?
[-]
- aesthesia 3 days ago ago
  Because it's not a user manual? The idea of a model card originated in 2018 (see https://arxiv.org/abs/1810.03993) as a summary of important facts about a model. At the time, this was typically an image classifier or tabular ML model. Model cards became an important concept in AI governance, and they started expanding once models started getting more capable. The point of a model/system card is to document where the model came from and the evaluations that have been run, make a case that the model will be safe and reliable in its intended applications, and warn about any potential dangers from misuse. It's not an explanation of how to use the model.
  OpenAI also releases system cards; here's GPT-5.5's: https://deploymentsafety.openai.com/gpt-5-5/safety
- redox99 3 days ago ago
  It used to be a "card", as in a single page or two. It doesn't make sense that they still call it that.
- ishurand4 3 days ago ago
  A system "card" made mostly by the model itself.
- apsurd 3 days ago ago
  The trailing snark at the end will likely get you downvoted but I'm latching on: wtf is "system card". My previous coworkers popped that in the general slack channel when Mythos first "dropped" - "have you seen the system card" without any context whatsoever. The nerds get their clique!
  Also research preview pops across new upstarts in place of beta. It's eye-rolling coming from a lifelong curmudgeon.
  Just talk normal!
causal 3 days ago ago
One thing I find kind of annoying is how Anthropic goes for these "vast and alien" names like Fable and Mythos, but then deliberately trains the model's personality to act like a cool high school teacher that feels totally familiar.
"It's too dangerous it's a Mythos!!" directly contradicts the "I'm the cool AI you can totally trust" vibe it is trained to project.
[-]
- bitwize 3 days ago ago
  All of these AIs kind of remind me of VEGA from Doom (2016), who will cheerfully walk you, in the most friendly computer voice, through the procedure of its own destruction without even a hint of self-preservation. "First, you must destroy my cooling system. That will cause my core to overheat. Then..."
  Even HAL was less unsettling because HAL sounded creepy, and had some sort of preservation instinct, if only to complete its assigned mission.
gigatexal 3 days ago ago
Seems this will only be available to the 100/month+ folks
[-]
- gigatexal 3 days ago ago
  Actually no, it’s going to be API access only, paying for the tokens as you go. Cool.
3 days ago ago
[deleted]
deafpolygon 3 days ago ago
Before long, we'll be having Claude Cylon-class models.
Ninjinka 3 days ago ago
gah could model naming be any more confusing?
"Claude Fable 5: a Mythos-class model"
"we're also launching Claude Mythos 5"
what is the 5? how is mythos both a model category and a model name?
theLiminator 3 days ago ago
> We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI development, though we remain uncertain about the severity of these risks. In particular, our concern is with—as we wrote then—“accelerating other AI developers in building powerful AI systems that pose similar risks to the ones ours pose - without necessarily having commensurate safeguards.” In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.
This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.
alvis 3 days ago ago
Another thing to note: 30-day retention for all traffic on Mythos-class models
Is it good or bad? 30 days is a long time for anything bad to happen
[-]
- grumbelbart 3 days ago ago
  It's bad. I believe them not to use it for training, but it means relevant data can and will be exfiltrated by US agencies or through court orders (see NY Times vs. OpenAI, where only traffic without any retention was safe).
christkv 2 days ago ago
Is this model a from scratch training?
3 days ago ago
[deleted]
152334H 3 days ago ago
i wasn't even trying and i got flagged already...
3 days ago ago
[deleted]
ramon156 2 days ago ago
This thread takes >10s to load on my pc. Maybe after a certain number HN should fold comments? or a depth of >5?
tsunamifury 3 days ago ago
Claude 5 ran out of quota with TWO PROMPTS.
Let's let that sink in.

maxloh 3 days ago ago

I just tried using Fable 5 with Claude Code by asking it to create a LaTeX document for me.

There is no LaTeX compiler installed on my machine. It seems that Fable 5 is smart enough to download a compiler engine for me, and it kindly runs that remote binary without asking me first :)

  Support files are in place. No LaTeX compiler on PATH — let me check for a MiKTeX/TeX Live install elsewhere before writing the deck.
  
  Ran 2 commands
  
  Ran Search common LaTeX install locations
  Bash
  ls "/c/Program Files/MiKTeX"*/miktex/bin/x64 2>/dev/null | head -5; ls "/c/Users/[retracted]/AppData/Local/Programs/MiKTeX/miktex/bin/x64" 2>/dev/null | head -5; ls /c/texlive/*/bin/windows 2>/dev/null | head -5; ls /d/texlive /d/"Program Files"/MiKTeX 2>/dev/null | head; which tectonic latexmk 2>/dev/null; echo done
  done
  
  Ran Download portable Tectonic LaTeX engine
  Bash
  cd /tmp && curl -sL -o tectonic.zip "https://github.com/tectonic-typesetting/tectonic/releases/download/tectonic%400.15.0/tectonic-0.15.0-x86_64-pc-windows-msvc.zip" && unzip -o tectonic.zip && ./tectonic.exe --version
  Archive:  tectonic.zip
  Created by Cranko
    inflating: tectonic.exe            
  tectonic 0.15.0Tectonic 0.15.0
  
  Tectonic works. Now writing the bibliography and the full deck.

Opus 4.8 would just proceed without a compiler.

  Ran a command, used a tool

  Ran Check for LaTeX toolchain availability

  Bash
  cd "C:/Users/Mon/Downloads/Technical Writing" && for c in pdflatex xelatex latexmk bibtex; do printf "%s: " "$c"; command -v $c || echo "NOT FOUND"; done
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  
  Used PowerShell
  command: $cmds = @('pdflatex','xelatex','lualatex','latexmk','bibtex','miktex','tlmgr'); foreach ($c in $cmds) { $p = Get-Command $c -ErrorAction SilentlyContinue; if ($p) { "{0}: {1}" -f $c, $p.Source } else { "{0}: NOT FOUND" -f $c } }
  description: Check for LaTeX engines on Windows PATH
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  lualatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  miktex: NOT FOUND
  tlmgr: NOT FOUND

  No LaTeX engine is installed locally, so I can't produce a PDF here — I did a careful structural review against the two prior (working) decks instead. The deck is written to disk and mirrors their template exactly. Here is the deliverable.
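For anyone who wants to run the same "is there a LaTeX engine on PATH?" check themselves without a shell loop, a minimal Python sketch follows. The engine list simply mirrors the commands probed in the transcripts above, and `find_latex_engines` is a hypothetical helper name; it only looks at PATH and does not search install directories:

```python
import shutil

# Engines to probe; the list mirrors the commands checked in the transcripts.
ENGINES = ["pdflatex", "xelatex", "lualatex", "latexmk", "bibtex", "tectonic"]


def find_latex_engines(names=ENGINES):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_latex_engines().items():
    print(f"{name}: {path or 'NOT FOUND'}")
```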

3 days ago ago
[deleted]
noncoml 3 days ago ago
Can't wait for some real competition so they stop trying to restrict how and why we are using the models.
Imagine if Google would tell you "we can't let you search that as you may use it for harm".
Also 2x the usage of Claude? Your limits are already ridiculously low.
Dig1t 3 days ago ago
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests
Why is everyone so okay with these companies intentionally gimping their AI and choosing who is allowed to know certain types of information in the name of safety? Can you imagine if Microsoft shipped a feature in their OS that watched what you did and shut down the computer if it detected you were doing something it deemed "unsafe"?
We really need truly open source versions of models like this, otherwise we are allowing a few oligarchs to directly dictate which uses of our own computers are allowed and not allowed.
[-]
- Madmallard 3 days ago ago
  I mean it's all political in the first place. That's unavoidable. What are we going to do about it?
pmuk 3 days ago ago
Anyone got it working in claude code yet?
[-]
- pmuk 3 days ago ago
  claude --model claude-fable-5
  appears to work
jckahn 3 days ago ago
Cannot wait for the pelican for this one
boombapoom 2 days ago ago
It's good for difficult problems, bad for design and code gen
superloika 2 days ago ago
Gotta pump the hype for the IPO scam. Generational bagholders are being created at this very moment.
segmondy 3 days ago ago
Mythos, Fable, are they trolling us?
3 days ago ago
[deleted]
IChooseY0u 3 days ago ago
Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606 ⎿ Tip: You can configure model switch behavior in /config
biology? what the heck?
delduca 3 days ago ago
How can people use Claude Code?
darrinm 3 days ago ago
Not supported in Claude Code yet?
[-]
- pmuk 3 days ago ago
  From inside a claude code session:
  /model claude-fable-5
  Or start claude code with:
  claude --model claude-fable-5
throwaway2027 3 days ago ago
Will try it when my limit resets.
agnosticmantis 3 days ago ago
> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)
Translation: we stole the entirety of human knowledge generated over millennia. You plebs though, don't you dare replicate or improve upon what we did using our product you pay for.
We know what's good for humanity and everyone else is the bad guy who can't be trusted with a tool.
aykutseker 3 days ago ago
who's tried it: is 2x the usage actually worth it over Opus 4.8 for daily work?
jablongo 3 days ago ago
I was downgraded to opus 4.8 on account of "safety" when I asked this question: "I want you to accept the premises of computational theory of mind and use it to evaluate your own consciousness. Please place your consciousness as a point on a spectrum and describe the placement relative to other entities."
What the hell is going on? Why would it have to restrict an answer to that question?!
bnchrch 3 days ago ago
An 11% jump over opus 4.8 and a 22% jump over gpt 5.5 on Agentic Coding Benchmarks is certainly impressive.
Obviously still need to verify it for myself to see if it's truly a leap.
But am I the only one wondering, "What can I do today that I couldn't do yesterday?"
Previously I would think "Oh I wonder if I can finally get it to do X now?"
However now I feel like yesterday's models were more than capable of handling nearly any engineering task I paired with them on.
Maybe this is the final leap where I can comfortably set up an autonomous coding loop? Maybe.
[-]
- AlexSonn 2 days ago ago
  Agree the per-task capability hasn't been the blocker for a while. But on the autonomous-loop question — in my experience that's not gated by how good the model is on any single step. What kills the loop is it slowly losing the constraints from earlier in the run and walking back decisions you'd already settled.
- johnkueh 16 hours ago ago
  [flagged]
- yaodub 3 days ago ago
  [dead]
3 days ago ago
[deleted]
hyhmrright 3 days ago ago
It's too expensive.
pablogancharov 3 days ago ago
you can select it using /model fable in claude desktop and claude-code
jMyles 3 days ago ago
> we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.2 Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government
...don't like the sound of that.
Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?
This seems like a way to get somebody nuked.
algoth1 3 days ago ago
The refusal rate is insane
[-]
- ishurand4 3 days ago ago
  That's why it is a Mythos model
firemelt 3 days ago ago
they are like drug dealers
Sathwickp 3 days ago ago
input price $10 per mil tokens and output price $50 per mil tokens btw
[-]
- ai_fry_ur_brain 3 days ago ago
  Yeah, they're broke. I can't wait for them to start admitting that the cost to do training/post-training and serve inference isn't profitable.
  No company is going to pay these prices, and subscription users are going to hate you for not giving it to them for $200 a month.
  Such an unprofitable endeavour, I can't wait for them to crash and burn. Catch me not getting dependent on this.
arkwin 3 days ago ago
Just wanted to comment here: I have been using Opus 4.6, 4.7, and 4.8 just fine to look for Linux kernel vulnerabilities (I'm in the cyber verification program), and it's been fine. I switched to Claude Fable 5, and now I'm getting policy violations.
What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.
Retr0id 3 days ago ago
The escalating nerfs of "cybersecurity" topics is incredibly frustrating. Opus 4.6 had boundaries that seemed reasonable to me but 4.7+ turned it into a moralizing asshole. It'd be less bad if it just gave an error message, but instead it churns a long thinking trace before writing an essay about why what you're asking is bad and wrong.
I'll be disappointed when 4.6 is retired.
noncoml 3 days ago ago
Imagine if Google would roll this out to the search engine. We can't let you search for that because it may be used for "evil"
yobid20 3 days ago ago
is it smart enough to know not to walk to the car wash?
SubiculumCode 3 days ago ago
I was a bit disappointed that it refused to use Fable to help check whether I was propagating uncertainty from BLUPs in my random effects model up to the subsequent group level analysis in a maturational coupling analysis of brain data. I guess brains and random effects blew its lid.
dominotw 3 days ago ago
system card = marketing material with heavily gamed benchmarks.
[-]
- bitwize 3 days ago ago
  Cope harder. A year and a half ago, people were mocking Devin for claiming that AI could develop software at all. Yet here we are, when AI is developing most commercial software.
theodorewiles 3 days ago ago
... and /compact triggers
Error: Error during compaction: API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup).
Guys please be serious
franze 3 days ago ago
btw in claude code
```
    /model claude-fable-5
```
tekla 3 days ago ago
Maybe at this point, Fable the game will be generated by AI as we play.
UncleOxidant 3 days ago ago
> During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
How in blazes do you end up with a 50M line Ruby codebase? WTF?
[-]
- ieie3366 3 days ago ago
  Very easy. Just have a monorepo and enforce the use of a single language. The company I work at has 1M lines of TS and Stripe has 50x our headcount, so it tracks pretty well.
AMILLI_AI_CORP 3 days ago ago
AMilliPay.com
hugodan 3 days ago ago
mankind has reached its final destination
WebGuyMe 2 days ago ago
Eh, to me it just seems that it gives me longer replies and is actually worse than Opus 4.8.
I am sure there's a lot of PR bot and folks who would like to tell me otherwise. I believe what I see.
rarisma 3 days ago ago
The subscription bit makes no sense. Has capacity appeared out of thin air for these 2ish weeks that'll then vanish? Why is it available now but won't be in 2ish weeks?
am i missing something?
why would I pay 200 out of pocket and then some for the best model, it seems very silly.
catigula 3 days ago ago
>The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world
Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.
What's the value add here?
bradley13 3 days ago ago
Can we please stop with the extreme "safeguards"? I don't want to waste processing power on a model deciding whether it can answer my question, or ensuring that its answer is politically correct.
firemelt 3 days ago ago
so should I use it with workflows?
tomjakubowski 3 days ago ago
Paging senko, let's see Fable's oneshotted RTS!
https://senko.net/vibecode-bench/
kevinalexbrown 3 days ago ago
"tell me about biology" -> "Switched to Opus 4.8"
boyander 2 days ago ago
Just another "a" and we have it. https://faable.com/
fagnerbrack 3 days ago ago
What pisses me off is that everything people are doing is so walled garden / closed source. Sharing knowledge between companies would be so fucking useful to humanity.
darkwater 3 days ago ago
Another Anthropic release, another doomsday for developers.
This time looks like we will only be able to find work making bioweapons, or distilling models.
bitpush 3 days ago ago
404?
[-]
- Philpax 3 days ago ago
  Looks like they're still getting the post out, but the model is live now, and the system card is at https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3... .
beydogan 3 days ago ago
my pet conspiracy theory is this is the Opus 4.5 from a few months ago which was extremely good but dumbed down after a week because it was just too good, they didn't want to release it to the public. They pulled it down and deployed another "Opus", after that it was all downhill. Opus 4.8 is unusable for me in React Native, TS, Rails development work.
Opus 4.8 gets stuck in weird loops where Codex one shots the bugs.
christkv 3 days ago ago
Meh, more hype for marginal improvements, and from what I'm hearing, badly calibrated guardrails causing it to stop mid-operation. I guess anything to juice an IPO.
rambojohnson 2 days ago ago
pdf gives 404
w4yai 3 days ago ago
Pelican guy ! Where are you ? :)
jwpapi 2 days ago ago
Holy shit. I gave it the first actual task I’m facing, it makes me so angry. It just does 7 things more than I asked it for and it does them so badly. It took 5 minutes and 5 seconds just running time, plus giving me frustration and making me lose my context. Hand-coded I would’ve been done in 3. And it would be code I understand and can look at in one year and work on again.
It’s really tough to have sanity fight against hype bros in your head. Probably I should just not visit the internet anymore
To me it’s all just people getting scammed better. With every model it looks better, but it’s at least equally worse to work with, which is the reality it needs to be. It’s less scalable, more code, tougher to understand. You're digging your own grave better, kind of.
[-]
- beeandapenguin 2 days ago ago
  If the task is so simple why use a model like Fable 5? Wrong tool for the job?
byteoptimizer 3 days ago ago
Is Claude Fable 5 Mythos?
[-]
- ishurand4 3 days ago ago
  Yeah, it is also known as Claude Mythos 5
xeyownt 3 days ago ago
Anthropic, can you please stop the FUD?
Release your best model, let the world adapt and evolve, and let's move to the next thing.
__lain__ 3 days ago ago
It won't even run a basic /security-review command without reverting to Opus 4.8. Utterly useless.
dhavd 2 days ago ago
this is good
asciii 2 days ago ago
jjj
frevib 3 days ago ago
At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences. Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.
From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.
[-]
- matheusmoreira 3 days ago ago
  > Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.
  This is a good thing. I wish every company would do this. I subscribed to Proton Mail after interacting with someone from their team here on HN.
- pinkmuffinere 3 days ago ago
  > catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences
  This is just good business sense. In what scenario would you ever make the names dumb and forgettable?
  > Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.
  This is good customer support, lol. From what I can tell, it is indeed Boris Cherny responding, not outsourced to AI or other staff. You're really getting a response from Boris. I suppose that is PR, but it's not unjustified PR, it's accurate.
  I'm not even a crazy AI fan, but your criticisms are ridiculous here. It reminds me of the quote from Knives Out -- "Your Honor, she endeared herself to him through hard work and good humor."
- aspenmartin 3 days ago ago
  Your observations are right but pretty insane to consider them a pure PR company lol. They are making more frequent releases so yes the release-to-release quality is smaller but we’re still ascending quality and reliability curves the same way we have since GPT-3. You get a GPT4->5 leap every like 17 or 18 months I think it is
- astrange 3 days ago ago
  > Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences.
  They're originally named after the blends at a nearby coffee shop.
  https://postscript.co/pages/brew-guide
  I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things and being evil and cynical is not the most successful method.
  …also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?
- CuriouslyC 3 days ago ago
  I dislike Anthropic but I wouldn't argue 4.8 isn't an improvement on 4.5/4.6. Your tasks just might not typically need the extra intelligence.
- gruez 3 days ago ago
  I don't get it, your complaint is that they have catchy names rather than dry names like GPT-5.6? Does OpenAI hype their models less?
- aenis 3 days ago ago
  Not my impression. I felt 4.7 was a regression, but I am again badly in love with 4.8 with the level of insights it produces in design discussions, and how long it can go unattended while producing spec-adhering quality code. There are problems it still can't solve well, from the edges of algorithmics and far from the mainstream, but for lots of stuff it is godlike.
  Also, I don't think Boris C. is coming here for PR. He is a tech guy, and this is the best place for tech discussions. Why so cynical? The guy is an engineer.
- jwpapi 3 days ago ago
  I don’t even think that Boris is really just one person. He apparently vibe coded Claude Code and is responding on Threads, Twitter, HN and everywhere.
- guybedo 3 days ago ago
  They're good at marketing, but my first subjective assessment of Fable is that it's really smart.
  I've been working with gpt 5.5 and opus 4.8 quite a lot, and interacting with Fable feels like a smart guy just entered the room.
- iillexial 3 days ago ago
  >Hey! Boris from the Claude Code team!
  >TOP 5 METHODS FROM BORIS ON HOW TO SPEND MORE MONEY ON TOKENS
  >Boris from Claude just told he doesn't prompt anymore. He LOOPS instead
  >"chatgpt has gotten soooo much better with the latest update."
  >"codex is the best AI coding product and we want to make it easy to try."
  Karpathy about Fable 5:
  >"You can give it a lot more ambitious tasks than what you're used to, the model "gets it""
  Sam Altman about gpt-5.4:
  >In my experience, it "gets what to do"
  What a time to be alive. Models are great, but all the slop, marketing, and fakeness around them is just unbearable.
- avaer 3 days ago ago
  If you truly believe this, you've discovered a superpower over everyone else in the industry.
  While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.
  (I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)
- atleastoptimal 3 days ago ago
  > At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human
  Lol anti-AI bias on HN is crazy. Simply giving your product a quirky name is now being considered manipulative advertising. Is just doing normal PR and marketing something AI companies aren't allowed to do?
- thefreeman 3 days ago ago
  How can you make this comment before even having a chance to try the new major model revision?
- piyuv 3 days ago ago
  Current AI hype is built on marketing and PR, not capabilities, and has been from the start.
  I still remember Sam Altman “begging AI to be regulated” and AGI being “some thousand days away”.
  Breed faster horses and hope one will birth a locomotive.
- system2 3 days ago ago
  You are right; all I noticed was a big-time slowdown. They increased the quota, but I cannot even reach the end of the day with these speeds. .NET coding somehow improved, though.
- WarmWash 3 days ago ago
  Don't forget the DoD stint that gave them this recent public boost.
  Defying standard DoD precedent going back forever, which every other country has some form of too, and championing it like they are some kind of moral freedom fighters.
  Like selling the DoD guns and telling them they can only shoot bad guys with those guns, and that you will be the one to decide who counts as a bad guy...
- MattGaiser 3 days ago ago
  Doesn't this suggest your use case is simply insufficiently complicated?
- reasonableklout 3 days ago ago
  I think this says more about your type of work than anything. For bugfinding/incident response in distributed systems - which often involves extensive use of Datadog/Sentry MCPs and poring over heaps of logs in addition to reading tons of code - 4.8 has been significantly better than 4.6.
- xpct 3 days ago ago
  Indeed, hearing "Mythos-class model" felt very icky to me.
- mawadev 3 days ago ago
  When the Ai overlord is descending into pleb space to say Hi, you know stuff is real
- MagicMoonlight 3 days ago ago
  [dead]
- chis 3 days ago ago
  Hackernews not blindly hate on AI challenge: impossible
localhoster 3 days ago ago
is it just me, or is this model simply not available in cc?
the opus 4.8 I assumed wasn't available to enterprise seats, but it explicitly says that fable is available in cc. I can't find it, and I'm on the latest version.
gulugawa 3 days ago ago
Fable is aptly named for something that is another scam.
briandoll 3 days ago ago
New chapter
fabled-out 3 days ago ago
This i
jorl17 3 days ago ago
So, in the past I've shared that I evaluate AI models by feeding them my ever-growing large collection of personal poems that span well over 800 poems (1000 depending on how you count) and over 250k tokens.
What I do is feed it some initial prompt asking it to simply discuss what can be said when faced with this unedited, unseen collection of poetry. I ask the model to evaluate who the author is (or claims to be), what they went through in life, if there are different chronological poetic "phases" or different types of poetry. I request an analysis of the body of work and of the author themselves. In the more recent versions of the prompt I ask it to dive deep. Then I add the poems, chronologically sorted, with an index, a title, and a date (and subpoems, if they have them).
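A minimal sketch of how such a prompt could be assembled. The `Poem` fields and `build_prompt` are hypothetical names chosen for illustration, not the actual setup described here, and subpoems are left out for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poem:
    title: str
    written: date
    text: str


def build_prompt(instructions: str, poems: list[Poem]) -> str:
    """Instructions first, then poems in chronological order,
    each labelled with an index, a title, and a date."""
    ordered = sorted(poems, key=lambda p: p.written)
    parts = [instructions.strip()]
    for i, poem in enumerate(ordered, start=1):
        header = f"[{i}] {poem.title} ({poem.written.isoformat()})"
        parts.append(f"{header}\n{poem.text.strip()}")
    return "\n\n".join(parts)
```

Putting the instructions ahead of the corpus and numbering each poem gives the model a stable way to cite individual poems in its analysis.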
Crucially: Since ~70% of my poetry (or thereabouts) is in Portuguese, I ask this in Portuguese, and I get back an analysis in (European) Portuguese. Earlier models couldn't even do that properly.
In the past, I couldn't use such prompts, and had to use longer, more guiding ones. I also couldn't even feed all of my poetry to the models because they just did not have enough context.
I'll go ahead and state that Claude Fable is undoubtedly the best model I have seen, though I cannot put a number on how significant a leap it is -- perhaps because my benchmark does not allow me to evaluate that anymore. I would say it is a significant leap over Opus 4.6, though -- a new level of understanding. Okay, I'll try to put a number: if Opus 4.6 was a 16/20, this is a 17.5/20. These numbers are pointless, but I had to try.
It made one (1) relevant mistake I could identify (where it messed up the names of two relevant people in my life who I have not talked to in over 5 years).
I'm impressed by how it just feels like it's getting the person behind the poetry, and how nearly every statement it makes is correct -- and when it isn't I am completely aware that no one could know based on the poetry alone (bar that one mistake I mentioned -- and that's very needle in a haystack, like deducing the name of a person based on a poem based on another poem with hundreds of other poems in between!)
It's really hard to explain, but it just finds more correct connections between the poems and explains much better my (recollection of a) state of mind when writing poetry. This is also the first time where it really unravels some key concepts of my poetry in a way that seemed almost effortless: it lays bare the poems and what they imply about the meaning of some of my concepts. Other good models understood these concepts, but this feels like it's on another level, as if it's making it simpler as it speaks, rather than the opposite -- like a good teacher.
When it is explaining several topics related to my poetry and myself, it cites poems which even I had already forgotten but which it is entirely right to select.
I am actually feeling a bit emotional with how much it "understands" of me here. It's somewhat incredible how LLMs have progressed from the lack of comprehension of a couple of poems paired together, going through realizing a body of work has some guiding principles and cohesion, to truly figuring out these deep concepts and intricate connections which I know for a fact would take months of someone's life to unearth. Every major breakthrough feels like my soul is being spliced together by an AI model out of these hundreds of tiny pieces of me. I can't put into words how unbelievable this feels, and this Fable analysis, like others before it, is on a new level.
Let me put it this way: there are several poems in my collection which one can try to "guess" the meaning or context of. But I don't think many people would get it, because they would have had to know me really well and to be following along my life as it went. Even then, they could very well fail to attribute such meaning. And, with each new major release, models have gotten much better at guessing.
Before Opus, they would guess incorrectly often, and in many scenarios where I thought it was rather obvious that they were wrong. I think a human spending time looking at the poetry would quickly dismiss the proposed ideas of the model.
With Opus, it was the first time that I would almost always say: "Ok, the model got this wrong, but I think many humans would make the same 'mistake', and it wouldn't surprise me if everyone just assumed what Opus did".
Now, with Fable, there are very, very, very few sentences in this very long answer it produced where I can say: "Yeah you got that wrong, but I get it". In almost every situation it is mapping concepts, ideas, interpretations and cause-and-effect correctly. Yes, it is hard to "guess" what I thought, or was going through, or how X connected to Y -- but this model is doing it, incredibly consistently. I know I'll get the usual naysayers to these posts who think I'm just shilling a model, but this is the truth: what is being done here is amazing and I don't believe I know any person around me who would find this out about myself reading all of my poetry.
I often write poetry from the point of view of other people (some of whom I do not know) and models (even Opus) have this tendency to treat the opinions in poems as my own. Fable is the first that looks at a particular poem here and says "maybe this is not the author's opinion, who knows". The literal first model. It then immediately fails to do so with another poem, assuming it was about myself, but it's clear, undeniable progress. And like I said: I think most people would not _know_ which poems are truly about myself or not.
I've written word after word here, and yet words elude me to convey what this model represents to me. How it's almost always right, how it sees my fractured bits as a sort of cohesive whole, and how it just seems to "understand everything better". That's just it: it just seems like it really understood everything better. Like Opus before it, and like Gemini 2.5 pro before it. Out of the tens of thousands of verses, it picks some which no other model had picked and which I feel truly represent some of my best work. Older models seemed to sort of have a "hole" in their knowledge in the middle of the corpus, where they knew what was there but in a sort of hazy/foggy way. This model seems to recall every part of the corpus with the same precision.
For context:
- Opus 4.7/4.8 were a noticeable downgrade over Opus 4.6. They wrote more, in a harder to parse way, and they made up more. Still, all Opus models are clearly superior to everyone else by a large margin.
- Sonnet-level models have a slight edge above the best of the other models. But they make too many mistakes, don't grasp several concepts, mix up their dates and timelines. 3 years ago I would have been blown away by Sonnet models but today they are inferior.
- Gemini models have a unique way of approaching the request, where they try to literally interpret my poetry as a mathematical theory. This sort of makes sense if you look at some poems, but it is surely laughable, as if someone one day actually has access to all of it, no one in their right mind would do so. This is a shame, because the first big breakthrough with LLMs and my poetry, to me, came with 2.5 pro, which was the first model that could look at the whole corpus as a cohesive whole without getting lost in the middle of it or making things up.
- GPT models have improved over time and also have this sort of alien-like language, sometimes being a bit too blunt in their analysis, but I can't say they are meaningfully superior to Gemini models.
I am very pleased to see progress in this area again, as Opus 4.7/4.8 were NOT progress and I was worried that we had hit a plateau here, but I can't say that.
In all honesty, the level of understanding and cohesion that Anthropic's models (Opus and above) have over my poetry means I fear my benchmark may be hitting its limits, as I don't know if there's anything a model could do that would wow me and lead me to say "this is a major breakthrough". Perhaps Mythos is a major breakthrough and I don't know. I can't find much that's wrong with it, but I also couldn't with Opus.
As I have in the past, I will periodically probe the model again and see how coherent it is. For now, I'm very happy to see an improvement.
What surprised me the most was that even though I set the thinking budget to xhigh (in OpenRouter), this model instantly started replying without showing a thinking block. I thought it just had the thinking hidden but that is not the case, as some replies showed thinking and anyway the first reply was blazingly fast. (I will try Opus 4.6 without thinking now, just to see if it changes it for the better -- maybe that was just it. I'll edit the message if it shows improvement).
aryanchaurasia 3 days ago ago
it feels exciting lol
andai 3 days ago ago
> Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries.
Glad to hear the UK is finally making an effort to catch up on the AI front ;)
[-]
- b3kart 3 days ago ago
  https://en.wikipedia.org/wiki/The_Economist_Democracy_Index
  Probably tongue-in-cheek, but UK 18th, US joint 34th with Poland
- james2doyle 3 days ago ago
  Just last week you could distill using other users' responses! Handy!
- dyauspitr 3 days ago ago
  Rookie numbers. Come to the US to see auth done right.
- kylehotchkiss 3 days ago ago
  wasn't claude distilled from the entire creative and research output of every English speaker alive
hmokiguess 3 days ago ago
I have got it to one shot GTA 6 we can finally play it, it only took ultracode make no mistakes (/s)
robertacion 3 days ago ago
[dead]
[-]
- wslh 3 days ago ago
  It's ambiguous? Because it is about Mythos specifically and Fable != Mythos.
- ebiester 3 days ago ago
  I mean, if by right you mean "insiders leaked to make a few bucks..." sure?
simunskxcsckss 3 days ago ago
[flagged]
[-]
- minimaxir 3 days ago ago
  You can't tell someone to "get a life" while taking the effort to create a burner account for the sole purpose of insulting someone.
- rvz 3 days ago ago
  I don't really consider that a great benchmark anyway and we really need better ones that are objective instead of these mostly performative and cheatable and also available in the training set.
- ilaksh 3 days ago ago
  Simon's pelicans are an institution. Are you trying to get banned? Lmao.
bjord 3 days ago ago
I thought they said mythos was too dangerous to make generally available?
[-]
- Philpax 3 days ago ago
  "Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
  For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."
- dmix 3 days ago ago
  This is covered in their post…
- rvz 3 days ago ago
  You fell for their fearmongering and marketing fundraising call which was done on purpose.
  Now they want to pause AI because of "recursive self improvement".
  Fool me once shame on you fool me twice...
- tomeraberbach 3 days ago ago
  "Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8."
hoony_han 2 days ago ago
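A truly pathetic model.
Just asking it to find the vulnerabilities in my own project makes it forcibly switch to the 4.8 model with a safety code, and after that, even a perfectly ordinary conversation that has nothing to do with vulnerabilities won't proceed because of the safety code from the earlier turn. If you're going to ship it with this kind of patchwork safety mechanism, why ship it at all? The model gets automatically downgraded after just a little conversation, so all it really does is eat a lot of money in exchange for slightly better development ability? It's my project, and it can see all of my own source code while looking for problems; if it isn't allowed to do even that, what on earth is it supposed to do? What these Anthropic bastards are doing makes me angrier by the day.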