Pondering the mind manifold

Latent Space of Mind

Lately I’ve been enjoying imagining the space of my mind as the latent space of a neural network: a high-dimensional manifold, folded in such a way that every concept I hold can be unfurled to reveal further hidden associations, so that two ideas might appear close along some axes but distant along others. Thinking this way helps me conceptualize experience not as a flat sequence of thoughts and feelings, but as trajectories through a richly structured geometry.

What intrigues me is that this mind-manifold isn’t just filled with concepts. In its rich and convoluted landscape, it also contains my ways of forming concepts: my habits, intuitions, tendencies, and methods of meaning-making. Every moment of my experience invokes a hierarchical, complex traversal of this space, through the representation of what I’m seeing, how I’m seeing it, what it means to me, and how I’m holding myself throughout that moment, in my body, and in the seat of my mind. My sensations, thoughts, and actions are a continuous, interconnected choreography: activating one region inevitably excites a constellation of others in a never-ending, cosmic brain-dance.

Given the mind as a manifold, one might naturally start wondering: How is this space structured? How does it change as I gain new experiences? What makes one explanation feel coherent, useful, and satisfying, while another falls flat?

Latent Space of AI

The structure of this mind-manifold is a perfect analogy for the modern neural network. The paper Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis contrasts two possibilities for learned representations in AI systems. In a Unified Factored Representation (UFR), the internal geometry of a model is clean, compositional, and coherent: related things cluster together and knowledge generalizes smoothly. In a Fractured Entangled Representation (FER), representations are messy, fragmented, and inconsistent: related concepts are scattered across the space, far apart when they should be near, degrading the system’s capacity to generalize, learn continually, or be creative.

This also relates to the interpretability work being done at Goodfire, Understanding Memory in Loss Curvature, which tries to identify whether a network is storing a concept in a way that is “general”: if perturbing the activation vector representing that concept results in a large shift in the representations of other concepts, the storage is general; if not, it is isolated. They call the former “reasoning” and the latter “memory”, and their experiments suggest that these LLMs are memorizing concepts like math, but reasoning through concepts like boolean logic (perturb the representation of “if”, and everything else the model represents reacts to that perturbation).
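That perturbation probe can be sketched in miniature. The toy below is my own illustration, not Goodfire's actual method: a single hypothetical mixing layer whose `alpha` parameter controls how strongly token representations couple to one another. Perturbing one "concept" embedding and measuring the shift in the others distinguishes an entangled, reasoning-like layer (everything reacts) from an isolated, memory-like one (nothing else moves).

```python
import numpy as np

rng = np.random.default_rng(0)

def mixing_layer(X, W, alpha):
    # Each token's output blends its own features with the mean of
    # all tokens; alpha controls the cross-token coupling strength.
    mixed = (1 - alpha) * X + alpha * X.mean(axis=0, keepdims=True)
    return np.tanh(mixed @ W)

def downstream_shift(X, W, alpha, i, eps=0.1):
    # Perturb token i's embedding, then measure the mean L2 shift
    # in the *other* tokens' output representations.
    Xp = X.copy()
    Xp[i] += eps * rng.standard_normal(X.shape[1])
    out, out_p = mixing_layer(X, W, alpha), mixing_layer(Xp, W, alpha)
    others = [j for j in range(len(X)) if j != i]
    return float(np.linalg.norm(out[others] - out_p[others], axis=1).mean())

d = 16
X = rng.standard_normal((5, d))            # five toy "concept" embeddings
W = rng.standard_normal((d, d)) / np.sqrt(d)

entangled = downstream_shift(X, W, alpha=0.8, i=0)  # coupled layer
isolated = downstream_shift(X, W, alpha=0.0, i=0)   # decoupled layer
print(entangled > isolated)  # → True: only the coupled layer reacts
```

With `alpha=0.0` the other tokens shift by exactly zero, the signature of a stored, local representation; with coupling turned on, the perturbation propagates everywhere, the signature of a shared, general one.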

Reasoning

Let’s take a moment here to think through how we got to where we are. We trained a model to predict the next token in a sentence, essentially to autocomplete text, on the entire internet. The model is huge, its latent space is massive, and it eventually became extremely good at knowing what would come next. As a result, it was also good at being right about things (for the most part), because much of the time the most likely answer is also the correct one. So if you asked it a question, and it simulated to itself beginning to answer your question, it would be pretty good at getting there.

Then we began to ask ourselves: how can we make these models really smart, rather than just seeming smart? How do we make them represent the soundest and most profound way of thinking about a question? Is the average representation of all the human text on the internet reasonable? According to Kumar, Clune, Lehman, and Stanley, it’s not: it’s fractured and entangled. So we are trying to use reinforcement learning to patch the confused mind of the AI so that it reasons well, like taking a person who’s seen and heard it all and trying to get them to make sense of things.

Can we whack this thing into shape, or do we need to retrain the models from scratch on data that necessarily follows strict logical rules, rather than the vast and messy content of the internet? Maybe then the models would really only know how to form concepts that logically follow, and would be better suited to guide us towards insight into coherent truths. But what would that model even be like? What would it mean to only be able to think in logical truths?

Geometry of the Self

This makes me wonder about the way in which our human minds represent our concepts, and to what extent our inner worlds are fractured and entangled versus compositional and clean. In particular, I wonder about the representation that I have of myself in the manifold of my mind. If I perturbed that vector, how much else would change? If I perturbed it enough, would I be somebody else entirely? Would I start to believe things I never believed, because who-I-think-I-am has shifted, and therefore everything that follows from it has too?

In most of my experience I don’t doubt whether I am staying the same self, so this representation must be relatively coherent. There must be some mechanism, some strong force field in the state space, that holds it together: if the representation of who I am shifted, it would take far too much energy to move everything else around, so it’s locked in place by the pressure of those interconnected contingencies.

In my experience, that force takes the shape of narrative. It surfaces in my perception as ideas about who I am and, in particular, as fears of any evidence that those ideas may not be true. People may refer to this as ego, although I find that word carries negative connotations which may be unhelpful. It is an attachment to being what your mind has converged on representing you as, and a pushing away from anything that would steer that vector, because of the energy it would take to reorganize everything else.

The Reconfiguration

For me, awareness of all this lets the relationship with myself, my past, my concepts, and my ideas of how and who I am be seen from a mathematical perspective: as fields or forces pushing against one another. That makes it somehow less sticky. I can have compassion for the system that is governing my experience, and understand the nature of the friction I perceive as discomfort. The practice is to let the vectors move, slowly over time, and to allow the reconfiguration, which I do ultimately want to happen, despite the frustration it creates.