Daily Model Quality

LLMs have wildly varying quality day to day. Over the weekend Claude was doing incredible work and, using Claude Hooks, was able to work overnight and solve something that I had previously wasted a week on with no progress. Then came today, Monday, and it feels completely useless.

When I asked it to tidy up the MR for the feature it worked out over the weekend, it simply couldn’t figure it out. GitLab CI is all of a sudden a foreign concept that it can’t work through.

Over the weekend I also made an attempt at building a brain for one of my projects, a small follow-on to Searching with AI where I was messing with LightRAG and Cognee. I spent about $200 in credits on the two solutions, and when I tried to mess with it tonight Claude just deleted all of the code that was made to collect and process the dataset. After that I made it tidy things up and commit its work so we wouldn’t lose anything else, and I asked it to get Cognee [[MCP]] working. Claude just couldn’t figure it out and started trying to make an ssh tunnel from my laptop, to my laptop…

There must be someone doing daily or even hourly benchmarks on these models. I don’t care what anyone says: there are absolutely long-term and even instantaneous changes in model quality.

Today at the office I was getting 529 status codes from Anthropic, and that should have been my sign that I wouldn’t get any good results from Claude today.

Anson's Brain

Anson Biggs
