LLM quality varies wildly from day to day. Over the weekend Claude was doing incredible work and, using Claude Hooks, was able to work overnight and solve something I had previously wasted a week on with no progress. Then came today, Monday, and it feels completely useless.

When I asked it to tidy up the MR for the feature it worked out over the weekend, it simply couldn’t figure it out. GitLab CI is all of a sudden a foreign concept that it can’t work through.

Over the weekend I also made an attempt at building a brain for one of my projects, a small follow-on to Searching with AI, where I was messing with LightRAG and Cognee. I spent about $… to my laptop…


There must be someone doing daily or even hourly benchmarks on these models. I don’t care what anyone says: there are absolutely long-term and even instantaneous changes in model quality.

Today at the office I was getting 529 status codes from Anthropic, and that should have been my sign that I wouldn’t get any good results from Claude today.
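
For what it's worth, 529 is the status Anthropic documents for an overloaded API, so it can at least be counted and retried rather than just suffered. Below is a minimal sketch of that idea, not anything I actually ran: `send` is a hypothetical callable standing in for whatever makes the request and returns `(status, body)`, and the retry count and delay values are assumptions picked for illustration. Whether a burst of 529s really predicts worse answers is only my hunch.

```python
import time
from typing import Callable, Tuple

OVERLOADED = 529  # status Anthropic documents for "overloaded_error"


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def call_with_retry(
    send: Callable[[], Tuple[int, str]],  # hypothetical request function
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, int]:
    """Call `send` until it stops returning 529 or the retries run out.

    Returns (status, body, overloaded_seen), where overloaded_seen is how
    many 529 responses came back along the way.
    """
    delays = backoff_delays(retries)
    overloaded_seen = 0
    attempt = 0
    while True:
        status, body = send()
        if status != OVERLOADED:
            return status, body, overloaded_seen
        overloaded_seen += 1
        if attempt >= len(delays):
            # Out of retries: hand the last 529 back to the caller.
            return status, body, overloaded_seen
        sleep(delays[attempt])
        attempt += 1
```

The third return value is the useful part for me: if `overloaded_seen` keeps coming back nonzero, that is a cheap signal to log before kicking off a long overnight run.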