
Talk:Reflection (artificial intelligence)


How you can contribute

  • Would it be better to shorten or to supplement the material carried over from Prompt engineering?
  • The history section could be expanded, ideally incorporating the achievements of external teams (both foundational developments and agent-based systems like DeepClaude).
  • A list of benchmarks and their results should be included.

TheTeslak (talk) 11:04, 5 February 2025 (UTC)

Thanks for leading this. As of now, it needs more references (e.g. the Coconut and R1 papers) and a mention of reinforcement learning applied to reasoning steps. Hplotter (talk) 16:34, 5 February 2025 (UTC)
The "Techniques" section was copied from the article prompt engineering. So it's mostly relevant for how humans should prompt LLMs, rather than for how to train LLMs for reflection. This part seems to need improvement. Alenoach (talk) 02:02, 6 February 2025 (UTC)[reply]

Non-LLM reflection


I support broadening the definition to any neural network that feeds back to neurons already passed, on the same input. This would also include, e.g., fully-connected modern Hopfield networks (hence my previous distinction), since such a definition would not be tied to LLMs in particular. However, I don't want to veer into WP:No original research, so if you are aware of literature supporting such a broader definition, please add it. Hplotter (talk) 10:18, 7 February 2025 (UTC)

Why is this needed?


I'm not at all sure that this article is needed; the original prompt engineering page is already questionable enough, given the pseudoscientific nature of the topic. Like that article, this one seems highly biased towards the non-consensus view that prompt engineering is an actual scientific discipline that can meaningfully be considered a kind of engineering. cgranade (talk) 21:39, 8 February 2025 (UTC)

I totally get your concerns about scientific rigor, but what’s your suggestion? Just delete both articles?
The general principles of reflection are important to describe, and reflection itself is a real and rapidly developing area within LLM research. The lack of a strong evidentiary base is precisely why there aren’t many well-developed articles on the topic yet. LLM research, in general, doesn’t follow the same academic and peer-review traditions as classical sciences, which makes it harder to meet traditional standards of rigor. But this article isn’t trying to position reflection in LLMs as a classical scientific discipline: it's documenting an emerging area.
I've read the discussion on prompt engineering and agree that, from a classical academic perspective, the evidentiary base is still developing. But what about fields like neurobiology? It would be unreasonable to say it doesn't exist or that articles on it aren't needed, even though its evidentiary standards can be debated. The same applies to political science, which deals with complex, often subjective topics yet is widely documented on Wikipedia.
If you'd like to discuss the scientific validity of specific claims or concerns about bias toward a non-consensus perspective, I'm happy to go over them. TheTeslak (talk) 02:20, 14 February 2025 (UTC)
Regarding bias, let’s look at OpenAI’s papers. They are, of course, an interested party, and their work hasn’t undergone independent peer review. But there’s no evidence that they are falsifying their results (please share if you have any). I understand that framing the question this way isn’t entirely scientific, but as I mentioned earlier, this is a rapidly evolving field and not a "pure" science. We do have confirmations from different sources showing similar results, and their papers reflect the broader development of models. Judging by the community discussions, this approach seems to be acceptable, as long as the necessary caveats are included.
I would add to the Criticism section that the research isn't rigorous enough, but what can I cite to ensure it doesn't come across as just personal opinion? Aside from the fact that these studies aren't peer-reviewed, it would be useful to have a breakdown of specific flaws, ideally in highly cited works as well. TheTeslak (talk) 02:40, 14 February 2025 (UTC)

Potential merge

[edit]

The article Reasoning language model covers a very similar topic. Maybe we should consider merging the two articles. Alenoach (talk) 20:36, 20 February 2025 (UTC)

I agree with the need for a merge, though I think we should keep the 'reflection' title, as 'reasoning' is unspecific (all non-reflective LLMs reason). Hplotter (talk) 11:49, 11 March 2025 (UTC)
I agree that a merge is necessary. Initially, I hadn't seen the RLM article by Cosmia Nebula (which is very good) and ended up here after reading the discussion at this talk page.
The main issue here is the naming. In the prompt engineering discussion, Alenoach suggested "reflection," which I believe is the most accurate term. Currently, there is no standard term; various studies and discussions refer to these models as Large Reasoning Models, Reasoning Language Models, and Reasoning Models.
I would also like to point out that another established term exists: VLM. However, it is used less frequently than LLM when referring to multimodal models. Moreover, I think that distinguishing such terms has its limitations. Modern models often have the ability to speak, listen, and handle various inputs, so introducing umbrella terms such as VARLM or splitting the concepts into V, A, and R would be excessive.
I would like to hear @Cosmia Nebula's opinion. TheTeslak (talk) 00:35, 14 March 2025 (UTC)
The name "Reflection" is unclear.
Is it supposed to cover all language models that "reflect"... on what? Does Chain of Thought count as reflection? If so, it does not fit the format shown in the lead image, which depicts a very specific idea of what "reflection" is supposed to be:
"Reflective agent architecture with self-reflection, evaluation, short/long-term memory, and environment interaction"
If so, then many reasoning language models do not reflect, yet they do reason. Specifically, DeepSeek-R1 and Grok 3 do not reflect; they simply generate one very long thinking trace.
In fact, it seems to me that "Reflection" is supposed to be a quite specific Cognitive architecture. If you consider "just run a very long chain of thought" to be a good cognitive architecture, then I have no reply, other than to note that in that case the lead image is wrong (see the sketch below).
I think the page should pivot to covering the specific "Reflection" approach to reasoning language models, instead of attempting to cover all reasoning language models and other test-time compute methods. pony in a strange land (talk) 02:10, 14 March 2025 (UTC)
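To make the distinction concrete, here is a minimal sketch contrasting the two behaviours discussed above. It is illustrative only: the llm completion function and the evaluate callback are assumed placeholders, not any lab's actual implementation.

def long_thinking_trace(llm, problem):
    # R1 / Grok 3 style: a single long chain of thought, no feedback loop.
    return llm(f"Problem: {problem}\nThink step by step, then give a final answer.")

def reflective_agent(llm, evaluate, problem, max_rounds=3):
    # Lead-image style: attempt, evaluate against the environment,
    # self-reflect on failure, and retry with the reflections kept in memory.
    memory = []  # accumulated self-reflections (the image's short-term memory)
    attempt = None
    for _ in range(max_rounds):
        attempt = llm(f"Problem: {problem}\nPast reflections: {memory}\nAnswer:")
        ok, feedback = evaluate(attempt)  # environment / evaluator signal
        if ok:
            break
        memory.append(llm(f"The attempt failed: {feedback}\nReflect on what to change."))
    return attempt

Under this reading, only the second function matches the lead image; the first is what R1-style models actually do.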
Okay. Do you view this article as a distinct topic, or do you believe it should be merged? Considering that both articles receive very few contributions, their completeness suffers. TheTeslak (talk) 18:52, 14 March 2025 (UTC)
I think the two pages should remain separate. If Wikipedia can have one page for every town and every WWII ship, it has enough room for both these pages. pony in a strange land (talk) 04:00, 24 April 2025 (UTC)
For me, what makes reflection specific is that there is a feedback connection to earlier layers: either after a full pass (the last layers connect back to the first ones after decoding, as in every commercial LLM that implements it to this day), or without decoding in between (continuously in latent space), or via inter-layer feedback connections (sub-network recurrence, so far only in a few research publications). That is, it refers to recurrence in depth, a topological property. I think this was clearer in one of my versions, but Alenoach partially reverted it. (See the sketch below.) Hplotter (talk) 16:54, 15 March 2025 (UTC)
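A minimal sketch of the latent-space variant of that recurrence in depth, in generic PyTorch-style code. The class name and loop count are made up for illustration, not taken from any published model.

import torch
import torch.nn as nn

class DepthRecurrentBlock(nn.Module):
    # The same layer stack is applied repeatedly to its own output, so later
    # layers feed back into earlier ones in latent space, with no decoding to
    # tokens in between (the second variant above). Commercial LLMs realize
    # the first variant instead: they decode a token and append it to the
    # context before the next full pass through all layers.
    def __init__(self, dim=512, n_loops=4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.n_loops = n_loops

    def forward(self, h):  # h: (batch, seq, dim)
        for _ in range(self.n_loops):
            h = self.layer(h)  # output re-enters the same weights: depth recurrence
        return h

h = DepthRecurrentBlock()(torch.randn(1, 16, 512))  # same weights traversed 4 times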
Looking back at one of my edits, there is indeed a sentence in the "Introduction" section that seemed clearer before. Feel free to modify it if the current phrasing is not great. Alenoach (talk) 01:10, 16 March 2025 (UTC)
I eventually modified the sentence in this edit to be closer to the original meaning. Alenoach (talk) 01:39, 16 March 2025 (UTC)
I think ‘Feedback (artificial intelligence)’ as a title for the merged article solves issues with both: it is used in the literature (both in AI and neuroscience, sometimes as ‘top-down feedback’), it is meaningful (unlike ‘reasoning’), and it is not specific to LLMs. Hplotter (talk) 08:24, 6 April 2025 (UTC)

@Alenoach: I think Reasoning language model should have its own page. Obviously the field is moving fast, but we should have a page that describes reasoning models as an object, rather than reflection as some specific technique or sub-module. It's a generic name for a specific thing. To be clear, it describes what OpenAI O1/O3/O4, R1, or Gemini Flash are: the reasoning is actually trained into the model using RL, not just elicited by structured chain-of-thought prompting, although perhaps that is the primitive form of it – Kjerish (talk) 05:07, 28 April 2025 (UTC)
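A minimal sketch of that distinction, with placeholder names throughout (cot_prompting, policy_llm.sample, policy_llm.log_prob, and verify are assumptions, not any lab's API): reasoning elicited purely by a prompt versus a REINFORCE-style update that trains long correct traces into the weights.

def cot_prompting(llm, problem):
    # Primitive form: reasoning elicited purely through the prompt.
    return llm(f"{problem}\nLet's think step by step.")

def rl_training_step(policy_llm, problems, verify, optimizer):
    # RLM-style: sample a long thinking trace, reward verified answers,
    # and update the weights so correct traces become more likely.
    # (A real recipe would add a baseline or GRPO-style normalization.)
    for problem in problems:
        trace = policy_llm.sample(problem)                    # long thinking trace
        reward = 1.0 if verify(problem, trace) else 0.0       # e.g. checked final answer
        loss = -reward * policy_llm.log_prob(problem, trace)  # REINFORCE objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()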

@Alenoach: @Hplotter: @TheTeslak: Update: There seemed to be a lot of sources of confusion, and hopefully it is clearer now. For one thing, a lot of the content in the old Reasoning language model article was just a re-hashing of LLM concepts instead of focusing on the additional techniques that make RLMs unique. I removed the LLM content. I also moved content from the Reflection article to the RLM article where RLMs were specifically discussed, which should not be controversial. The old version of the Reflection article gets these concepts confused: for example, "Reflective models require significantly more test-time compute than non-reasoning models", when it could have said "Reasoning models require... than non-reasoning models". It uses "reflective models" and "reasoning models" interchangeably. I think this merge proposal is reversed: we should move essentially all content to the RLM page, remove the Reflection page, and re-link the references to it, unless there is value in talking about reflection as a generalized technique. I am also okay with reverting both pages to the versions before I started editing today, but I think the difference is now much clearer. The primary article should be about a noun, not an adjective – Kjerish (talk) 22:18, 4 May 2025 (UTC)

Seems good to me. I believe the difference is too small to justify having two articles, and having only one article should facilitate maintenance. Alenoach (talk) 22:29, 4 May 2025 (UTC)
"unless there is value in talking about it as a generalized technique"
I think so, and if we're merging, it seems more logical to write about a specific application inside the article on the more general concept than the other way around, even if the specific application ends up being discussed more. As I said, perhaps the article should be renamed to something like 'Feedback (artificial neural network)', as that is a more common name.
Reflection is a noun.
I take this opportunity to note that 'thinking language model' would be a clearer and more revealing name than 'reasoning', and it is the one chosen by at least Google & Anthropic. Hplotter (talk) 16:35, 6 May 2025 (UTC)

Badly sourced promotional essay


It reads like an essay ("Introduction"?) and the sourcing is almost entirely arXiv preprints and promotional material from the companies themselves. Is this just advertising? - David Gerard (talk) 08:56, 10 March 2025 (UTC)