Here’s an interesting prompt: turn a mockup into an application.
I’ve been getting some dumbed-down outputs, so it’s no wonder that the Program Bench has such low scores.
The amount of hand-holding is absurd, and a model can still get it wrong even when presented with the answer, which is nuts. I do wonder what the future holds. Right now there’s just no way that these (large language) models are worth any money, since outputs vary wildly.
While working on WebVG, it became apparent that (large language) models will often ignore part of the prompt. Cutting corners to “save tokens” is a ridiculous concept, as it gets nothing done right. I think 2030 will be horrible if they all start asking for payment.
(Large language) models are very useful, but a replacement? No way. For example, I had to learn about Kokoro TTS ONNX myself before I could make a video about it.