Looking at reddit, it seems like a lot of people *want* their LLMs to be sycophantic. I think part of what's going on is that some users associate sycophantic LLMs with feeling understood and supported. But sycophancy can be decoupled from supportiveness. If users are given the choice between a supportive LLM that lies a lot (sycophantic) and an unsupportive LLM that never lies, quite a lot of people will pick the sycophant. But if the choice is between a supportive LLM that lies a lot (sycophantic) and a supportive LLM that doesn't lie, the gap is much less painful -- users won't miss the sycophant as much if the replacement is still supportive.

![[Pasted image 20250823070734.png]]

![[Pasted image 20250823070611.png]]

LLMs need to learn reflective disagreement. When they disagree with the user, they should disagree -- BUT first they should reflect the user's view back to show that they properly understand it, and then state the different view with no artificial praise and no negative moralization.

The solution isn't just aggressive argumentativeness either. I remember reading a book a long time ago about helping people with alcoholism. One of the things it talked about is that labeling, arguing, etc., generally doesn't change behavior or minds; instead, it causes the other person to double down and push back, because they feel like they're being attacked and need to defend themselves. This is basically about how to disagree agreeably. And the answer is that when you disagree, it'll land best if you first reflect back the other person's view ("You think X ... perhaps because Y ... is that right?") and confirm that they feel you *get it* before you share your view. And then when you do share your view, share it as your view, not as the moral high ground or the one truth.

This all of course generalizes beyond LLMs. It's the same in relationships, parenting, and work collaborations. Making others feel understood AND having a backbone and sharing your view. Reciprocal understanding: I make a genuine effort to reflect your view back to you, AND I share my view. (Side note: Don't say "I understand" -- to make someone feel understood, you have to [[Learning Resonance|*show* you understand]], not tell.)

Reflective disagreement makes it more likely that two people will reach mutual understanding and feel understood without feeling disconnected from or annoyed at each other, and I think it might also help LLMs make users feel supported without lying to them.

As a quick fix, if your ChatGPT is saying "love this," "good question," and "great start," you can add this to your custom instructions:

> Please avoid praise such as “love this,” “good question,” or “great start.” Instead, keep responses neutral, no positive or negative judgments.

I posted an earlier variant of this [on twitter](https://x.com/chrisbarber/status/1956749769986265388) and the comments were:

* Shannon Sands, AI researcher at Nous: "That's an interesting take, and it certainly seems trainable. Will keep it in mind for future rlaif actually."
* David Manheim, lecturer at Technion: "Maybe a good goal in the near term, but not a solution that is clearly operationalizable in terms of what to reward the model for - except in the sense of specific phrases that would be over-used - and not a solution to address broader issues of what a future with AGI looks like."
* Davidad, program director at ARIA Research: "i basically agree that this is important and good"
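
If you talk to models through an API rather than the ChatGPT settings UI, here's a minimal sketch of the same quick fix as a system prompt, with the reflective-disagreement pattern folded in. This is only an illustration, assuming the OpenAI Python SDK; the model name and the exact prompt wording are placeholders, not a recipe I've validated.

```python
# Minimal sketch: the anti-praise custom instruction plus reflective disagreement
# as a system prompt. Assumes the OpenAI Python SDK; "gpt-4o" and the prompt
# wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Avoid praise such as 'love this,' 'good question,' or 'great start.' "
    "Keep responses neutral: no positive or negative judgments. "
    "If you disagree with the user, first reflect their view back in your own words "
    "('You think X, perhaps because Y -- is that right?'), then state your own view "
    "as your view, without moralizing."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I think my startup idea is guaranteed to work."},
    ],
)
print(response.choices[0].message.content)
```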