Evaluating LLMs that write code by their perceived soft skills
In the context of programming, Large Language Models (LLMs for short) are usually measured and compared only by their ability to write code. This makes sense: code output is easy to measure, even if some of the metrics have limited value.
We never focus on their soft skills, because we don't usually include LLMs in team meetings: they are given a task, they complete it quickly, and that's the whole interaction. But we humans do care about our interactions with LLMs, and we notice when their tone or general demeanour towards us changes.