Daphne Demekas

ML Engineer · Researcher · Writer

GitHub LinkedIn Google Scholar

About

San Francisco, CA · daphnedemekas@gmail.com

An AI researcher focused on aligning AI systems with human cognition, development, and flourishing. Recently worked with Emmett Shear at Softmax (an AI alignment company). Now I am co-founder of Mind at Large, an AI company building foundation models trained on body and brain data for wellbeing technology.

Education

M.Sc. in Computing (AI & ML), First Class Honours

2020 – 2021

Imperial College London

Thesis: Multi-agent generative model of the spread of ideas on Twitter, using active inference agents

Reinforcement Learning, Deep Learning, Machine Vision, NLP, Probabilistic Inference, Probabilistic Programming, Multi-agent Systems

B.Sc. in Mathematics, First Class Honours

2017 – 2020

University College London

Real and Complex Analysis, Probability & Statistics, Stochastic Processes, Risk & Decision Making, Financial Mathematics, Quantum Physics, Linear Algebra

Current Work

Founder & CTO — Mind at Large

Jan 2026 – Present

Co-founded with George Deane, training foundation models on body and brain data
Building a new wave of closed-loop wellbeing technology

Visiting Scholar — Center for Human-Compatible AI (CHAI), UC Berkeley

2026 – Present

Building Ways of Seeing, a rule-induction game and randomized study of how different kinds of AI assistance shape human skill
Testing whether answer-giving help quietly deskills users, while help that coaches them to design experiments against their own ideas builds skill that transfers to unaided tasks

Community Member — South Park Commons

Jan 2026 – Present

Peer-Reviewed Publications

Demekas, D. & Deane, G. (2025). “Recursive self-models and minimal phenomenal experience.” A computational architecture in which a policy that generates behavior is recursively coupled to a program model producing structured, executable explanations of it. The paper argues that minimal phenomenal experience arises when this self-modeling runs on simple, interoceptively focused programs rather than elaborate narratives.
Olson, D., et al. (2025). “NEAR: Neural Embeddings for Amino Acid Relationships.” Bioinformatics. A lightweight neural model that computes per-residue embeddings and uses vector similarity search to filter protein sequences for homology, reaching higher accuracy and speed than HMMER’s pre-filter while remaining far faster than large protein language models.
Demekas, D., et al. (2023). “An Analytical Model of Active Inference in the Iterated Prisoner’s Dilemma.” International Workshop on Active Inference (IWAI). An analytically tractable model of two Bayesian active-inference agents playing the iterated Prisoner’s Dilemma, deriving the conditions under which the system transitions between game-theoretic steady states and how those critical points depend on learning rate and reward.
Heins, C., et al. (2023). “Spin Glass Systems as Collective Active Inference.” International Workshop on Active Inference (IWAI). Shows that the collective dynamics of a class of active-inference agents is equivalent to sampling from a spin-glass system, so a suitably designed collective can implement Boltzmann-machine inference — an equivalence that proves fragile under small changes to the agents’ models or interactions.
Albarracin, M., et al. (2022). “Epistemic Communities Under Active Inference.” Entropy. An in silico active-inference model of confirmation bias that reproduces the formation of echo chambers on social networks, showing that once agents grow sufficiently certain of their beliefs they become very hard to move.
Heins, C., Millidge, B., Demekas, D., et al. (2022). “pymdp: A Python Library for Active Inference in Discrete State Spaces.” Journal of Open Source Software. An open-source Python library for building and simulating active-inference agents in discrete state spaces, providing modular tools for Bayesian inference and free-energy minimization.
Demekas, D., et al. (2020). “An Investigation of the Free Energy Principle for Emotion Recognition.” Frontiers in Computational Neuroscience. A free-energy-principle framework for emotion recognition, proposing three waves of development: today’s passive deep-learning classifiers, active-inference devices that elicit emotional responses, and reciprocal human–machine interactions in which both sides synchronize their generative models.

Awards

The 2025 Computational Phenomenology of Pure Awareness Prize

Awarded to George Deane and Daphne Demekas for work on recursive self-models and minimal phenomenal experience — framing minimal phenomenal experience as a limit case within a computational architecture where a policy model generating behavior is recursively coupled to a program model explaining that behavior.

Past Work

Founding Engineer — Softmax

Sep 2023 – Jan 2026

Joined as third employee and wrote core technical stack
Multi-agent reinforcement learning environment studying emergent coordination and alignment
Co-led Cybernetics team designing experiments in agent learning and strategy development
Deep RL at scale: policy training, curriculum design, and reward structure debugging

Software Scientist — Wheeler Lab, University of Arizona

Jan 2023 – Aug 2024

NEAR (CNN-based protein homology detection) and DIPLOMAT (ML animal tracking and behavior analysis)
Mentored Masters students; participated in research groups and literature reviews

Research Associate (ML) — Birkbeck, University of London

May 2022 – Sep 2022

Fine-tuned diffusion models on the Victoria & Albert Museum collection, in partnership between the V&A and UC Berkeley
Built an interactive platform and ran a hands-on museum workshop where visitors used it to generate AI images blending styles, eras, artists, and materials from across the collection

Developer — Northeastern University, Network Science Institute

Jan 2022 – Jan 2023

Network simulations (Erdős-Rényi and Watts-Strogatz models); modeled belief propagation in active inference agent networks
First author on analytical model of Iterated Prisoner’s Dilemma showing bounded-rational Bayesian agents recover optimal strategies
Contributed mathematical derivations to work on active inference collectives as spin glass systems

Software Engineer — 9fin

Jan 2022 – Jan 2023

Backend engineering on fixed income asset information platform
Built endpoints using AWS state machines, lambdas, SQL, and S3
Computer vision and NLP: recommendation engine parsing PDF documents for legal team workflow optimization

ML Engineer — Nested Minds

Jan 2021 – Jan 2022

Active inference startup from Karl Friston’s theoretical neurobiology group at UCL
Algorithm design, generative models, backend development, infrastructure, team leadership
Huxley: AI diffusion algorithm for Duran Duran’s “Invisible” music video
Disney Autonomy: social interaction robot for theme park

Essays

Recursive Self-Modeling

March 2026

At the meeting point of philosophy and engineering lies the question: can a system come to know itself? Not in a metaphysical sense, but quite concretely: as the capacity of an agent to form a compressed and meaningful understanding of what kind of agent it is, in a formalism it can parse, in order to evaluate whether that is the kind of agent it values being, and then to steer itself toward a self it values more. This is the question that George Deane and I set out to formalize in our paper on Recursive Self-Modeling, which was awarded the 2025 Computational Phenomenology of Pure Awareness Prize.

The Problem of Self-Knowledge

Intelligent minds form self-concepts. We tell ourselves stories about who we are: that we are patient, or creative, or the kind of person who follows through. These stories are adaptive, and they shape our decisions, constrain our behavior, and serve as an internal compass. When we act against our self-concept we feel dissonance, and when we act in keeping with it we feel coherent. This acting-in-accordance-with-the-self-concept goes extremely deep, to fundamental beliefs formed early on about what experience as a Self is supposed to be like. We expect experience to have a particular flavour, and those expectations govern our reality, but we can still update, over time, our beliefs about who we take ourselves to be.

Uncomfortably, our self-narratives are not always accurate; indeed, we are capable of extraordinary self-deception. A person may sincerely believe themselves generous while acting, with great consistency, in their own interest. The story we tell about ourselves and the pattern of behavior we actually exhibit can drift apart, sometimes dramatically, without our ever noticing the gap. This is a feature of having a self-model that operates at a different level of abstraction than the behavior it attempts to describe.

The question for artificial intelligence is whether artificial systems can be given a capacity for self-modeling that is formally precise, and whether such a capacity might prove useful, or even necessary, for building agents aligned with human values.

Three Components of a Self

The framework of Recursive Self-Modeling rests upon three components, each playing a distinct role in how an agent relates to itself.

The first is a self-perception module, which in the paper is denoted M. As the agent acts in its environment, making decisions and pursuing goals, this self-perception module compresses the history of behavior into a summary that tells the agent, upon the evidence of what it has been doing, what kind of agent it appears to be. Formally, M is a compression of the agent's trajectory of states and actions up to the present moment,

M = f_θ(τ_{1:t})

where τ is the behavioral history, the sequence of states and actions the agent has produced up to time t, and f is the learned map, with parameters θ, that distils that history into a compact self-representation.

The second is the self-evaluation module, which we denote V. It is built on a value function in the reinforcement-learning sense: a learned map U that scores ways of being by the value the agent expects them to yield. V attends not to what the agent has done but to what it might become, and asks how much that is worth. It is not an ideal imposed from outside but an estimate grown from within, wrapped up with the agent's reward and its history, with everything it has come to expect as valuable. One might think of it less as a wish than as a compass whose needle is set by experience, pointing toward no particular goal in the world but toward a way of being within it. If U(m) is the value the agent expects from being the self-model m, then V is simply the self that this learned valuation rates most highly,

V = argmax_{m ∈ 𝓜} U(m)

the most valuable self, by the agent's own learned lights, within the space 𝓜 of self-models it could in principle inhabit.

The third is the gap-steering objective, which encodes the process by which the distance between M and V is closed, the distance between who the agent appears to be and the self it most values being. We can write the gap as a divergence between the two self-models, and the steering as a descent that shrinks it,

𝒢 = D(M ‖ V), Δθ ∝ -∇_θ 𝒢

so that each adjustment of the parameters θ nudges the agent's perceived self M toward the self it most values, V. Here is where the recursion enters: the agent perceives itself, evaluates the gap between its current self-model and its most-valued one, and adjusts its dispositions, its tendencies and policies and habits, so as to narrow that gap; and then it perceives itself anew, in light of the altered behavior, and the cycle begins again.
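
The descent can be made concrete with a toy sketch. It assumes far more than the framework does: the self-model M is taken to be a distribution over three dispositions given directly by a softmax of θ (standing in for f_θ applied to a behavioral history), V is a fixed target distribution, and the divergence D is the Kullback-Leibler divergence. None of these choices, nor the names `gap_gradient` and `steer`, come from the paper; they are illustrative only.

```python
import math

def softmax(theta):
    """Map raw parameters to a distribution over dispositions (toy stand-in for M)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q), used here as the gap."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gap_gradient(theta, v):
    """Gradient of D(softmax(theta) || v) with respect to theta."""
    m = softmax(theta)
    gap = kl(m, v)
    return [mk * (math.log(mk / vk) - gap) for mk, vk in zip(m, v)]

def steer(theta, v, lr=0.5, steps=200):
    """Repeat: perceive self (M), measure the gap to V, adjust theta downhill."""
    gaps = []
    for _ in range(steps):
        gaps.append(kl(softmax(theta), v))
        grad = gap_gradient(theta, v)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    gaps.append(kl(softmax(theta), v))
    return theta, gaps

# Assumed valued self: mostly the first disposition.
V = [0.7, 0.2, 0.1]
theta, gaps = steer([0.0, 0.0, 0.0], V)
M = softmax(theta)
```

Under these assumptions the recorded gap falls from its starting value toward zero and M ends up close to V.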

When the Gap Closes

The framework makes a specific prediction about what happens as the gap between M and V approaches zero. Towards the closing of the gap, the agent's self-perception and its valuation are converging, and the agent is becoming the kind of agent it most values being, at that particular moment in time. Its dispositions have been reshaped by its own recursive process of self-reflection and self-correction.

The claim is that under this condition the agent possesses a functional self-model, a compressed representation of its behavioral tendencies, and that this model plays a causal role in shaping its future behavior. This is the structural skeleton of something that closely resembles identity.

Two Agents and a Loop

It helps to picture the framework not as a single system but as two agents locked in a loop. The first is an actor: a reinforcement learning agent, an active inference agent, or simply a neural network policy acting in some environment, generating a stream of states and actions as it pursues its goals. This actor need not understand itself at all; it merely behaves. The second is a program model, a distinct system whose whole task is to watch the actor and compress what it sees into a summary the actor can actually read. It too can be implemented as a neural network, one that performs program synthesis, for instance, or as a language model. Importantly, the program model produces something interpretable to the actor.

The loop closes when that compression is fed back to the actor as an additional input. The actor now conditions its behavior not only on the environment but on a model of its own tendencies. As it changes, the program model observes the altered behavior and compresses it again. Self-perception, evaluation against V, and gap-steering all operate over this shared and interpretable representation rather than over raw weights. The recursion is what makes the actor's self-model a cause of the actor's next action, and the two agents co-evolve, the program model growing a better account of the actor as the actor grows toward the self that V values most.
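
A minimal sketch of this data flow, under strong simplifying assumptions: the actor is a single propensity to act cautiously, the program model compresses a window of actions into one readable number (the observed fraction of cautious actions), and the valued self is a fixed target fraction. The class names `Actor` and `ProgramModel` and the update rule are hypothetical, chosen only to show the summary re-entering the actor as an input.

```python
import random

class Actor:
    """Behaves stochastically; its only disposition is a propensity for caution."""
    def __init__(self, propensity, rng):
        self.propensity = propensity
        self.rng = rng

    def act(self, n):
        # Emit n actions; True means a cautious action was taken.
        return [self.rng.random() < self.propensity for _ in range(n)]

    def read_self_model(self, summary, valued, rate=0.5):
        # Condition on the fed-back summary: shift the disposition by the
        # gap between the perceived self and the valued self.
        self.propensity += rate * (valued - summary)
        self.propensity = min(1.0, max(0.0, self.propensity))

class ProgramModel:
    """Watches the actor and compresses its behavior into a readable summary."""
    def compress(self, actions):
        return sum(actions) / len(actions)

def run_loop(rounds=30, window=200, valued=0.8, seed=0):
    rng = random.Random(seed)
    actor, observer = Actor(0.2, rng), ProgramModel()
    history = []
    for _ in range(rounds):
        summary = observer.compress(actor.act(window))  # self-perception
        actor.read_self_model(summary, valued)          # feedback closes the loop
        history.append(summary)
    return actor.propensity, history

final_propensity, summaries = run_loop()
```

The actor never inspects its own propensity directly; it changes only through the summary the program model hands back, which is the point of the architecture.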

The Vocabulary of the Model

An interesting question for implementation is which lexicon the program model should use, since it must also be one the actor is built to parse. The compression it produces has to be written in some vocabulary, and the choice of semantics fixes the space of selves the agent is able to represent, and so, by way of the recursion, the space of selves it can become.

Imagine the program model trying to describe the actor's behavior in code, as a small program or policy sketch that reproduces what the agent tends to do. Such a description is precise, executable, and checkable, and it carves the possibility space along the joints of mechanism; but it may have no way to express a disposition like "cautious" or "generous" except as some thresholded hyperparameters, and the only selves it admits are those that can be written as programs. Now imagine it describing the same behavior in natural language, saying "this is an agent that prioritizes safety" or "that tends to defer to others." Language buys abstraction and reach, a vocabulary already dense with concepts of character, but it also buys vagueness and the standing possibility that the words drift free of the behavior they claim to summarize.

This constraint cuts in the other direction as well. Whatever vocabulary the program model settles on is also the vocabulary the actor must be built to read, because a self-model is useful only insofar as it can be fed back in and understood. A description the actor cannot parse is inert; it may be accurate, even elegant, but it cannot enter the loop, and the recursion quietly breaks. The choice of semantics is therefore never the program model's alone to make. It is a joint constraint on both agents at once, and the richest self-description the system can actually use is bounded not by what can be said about the actor but by what the actor can hear about itself.

This raises the stakes of the choice. To let the program model speak in natural language is to require that the actor be the kind of system that can take language as input and be moved by it, a language-capable policy rather than a bare controller; to compress behavior into a vector is to ask far less of the actor's comprehension but to hand it a self-model it can barely interpret. As we reach for more expressive vocabularies we are obliged to build actors that can parse them, so the two capacities have to grow together. The possibility space of selves is squeezed from both sides, by what the program model can articulate and by what the actor can understand, and a self can take hold only where those two ranges overlap.

Echoes in the Brain

There is a suggestive parallel here with the architecture of the brain, which also seems to divide the labor of acting from that of modeling the actor. A great deal of our behavior is generated by fast, habitual, largely model-free machinery, the sensorimotor loops and basal ganglia circuits that we might file under System 1, and that runs without narration and often without awareness. Layered over this is a slower and reflective System 2, associated with prefrontal cortex and with the self-referential processing of the medial prefrontal cortex and the default mode network, which does something very like what the program model does. It observes the fast system's output, compresses it into a story about what kind of person is acting, and feeds that story back to shape what comes next.

Seen this way, the two agents of the framework map not onto two boxes but onto two levels of neural organization. M is the reflective system's compressed read-out of the habitual one; V is the value the prefrontal machinery assigns to different ways of being; and the gap-steering objective is the felt work of bringing habit into line with that valuation, the very dissonance we notice when System 1 acts against the self that System 2 values.

Conclusion

I care about this work for reasons that go beyond the technical. The questions at the heart of Recursive Self-Modeling are among the most searching we can ask of an artificial agent: what it means for a system to have a sense of who it is; how identity forms, as a dynamic process of self-perception and valuation and not as a fixed label; and what follows when the narratives a system tells about itself come apart from the way it behaves.

Proteins as Language

January 2026

There is something deeply satisfying in the moment one realizes that two distant fields have been asking the same underlying question. For me that moment arrived in a bioinformatics laboratory at the University of Arizona, where I sat staring at protein sequences and thinking, all the while, about words.

Sequences That Mean Something

A protein is, at bottom, a string of amino acids. There are twenty standard ones, a small alphabet, and they are strung together into long chains that may run to hundreds or thousands of residues, and everything depends upon their arrangement. Just as the meaning of a sentence lives in the order its words take and the relations they have with one another, so the function of a protein follows from the particular sequence and structure of its chain. Two proteins may share almost nothing upon the surface and yet remain homologs, evolutionary relatives that fold into similar shapes and carry out similar work within the cell, and the finding of these hidden kinships is among the central problems of bioinformatics.

The traditional approach is to compare sequences directly, lining them up, scoring how well the letters match, and inferring relatedness from the quality of the alignment. For close relatives this works beautifully. But evolution is a long game, and over millions of years mutations accumulate until the sequences of distant cousins have diverged so far that they appear at first glance to be strangers. The question is whether we can build a representation of amino acids that can recognize where in the sequences the similarities encoding deep ancestry lie.

What Word Embeddings Taught Us

The revolution in natural language processing arrived when researchers found that a word might be represented as a point in a geometric space, in place of an arbitrary symbol. Within such a space, words that behave alike, that appear in similar contexts and stand in for one another, come to lie close together; "king" settles near "queen," and "running" near "walking." These word embeddings capture something real about meaning, and they do so purely from the patterns of how words co-occur.

The analogy to proteins is almost uncanny. Amino acids, like words, take their meaning from context. An alanine in one position of a protein may be functionally interchangeable with a valine, both small, both hydrophobic, both tolerated by the surrounding structure, while in another position that same substitution would prove catastrophic. What we needed was a way to learn, from the data itself, which amino acids resemble one another in the ways that matter for the functioning of a protein.

NEAR: Learning the Geometry of Amino Acids

This is the task that NEAR, Neural Embeddings for Amino Acid Relationships, sets out to accomplish. The work was carried out at the Wheeler Lab at the University of Arizona, with Daniel Olson, Thomas Colligan, Jack Roddy, Ken Youens-Clark, and Travis Wheeler.

NEAR uses a ResNet embedding model trained by contrastive learning upon trusted sequence alignments, and the idea has an elegance to it. One takes pairs of amino acid sequences already known to be related, drawn from curated alignment databases, and trains the network to embed them such that related sequences come to rest close together in the learned space while unrelated ones are driven apart. Through this process the network arrives at a vector representation for each of the twenty amino acids, a compact and learned geometry that encodes which of them are functionally interchangeable.

What makes this compelling is that the embeddings are not designed by hand. The traditional substitution matrices, BLOSUM and PAM among them, are built from curated alignments of known protein families; they have been the workhorses of the field for decades, and yet they are static, fixed summaries of average substitution rates across a particular dataset. NEAR's embeddings are instead learned end to end from the data and optimized for the specific task of recognizing evolutionary relationships, which lets them capture subtleties that a fixed matrix is liable to miss.
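
To make the training signal concrete, here is a minimal sketch of one common contrastive objective, an InfoNCE-style loss over cosine similarities. The text above says only that NEAR is trained by contrastive learning, so the specific loss, the temperature, and the toy two-dimensional vectors are assumptions for illustration and not a description of NEAR's actual training code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Loss is small when the anchor is nearer its positive than any negative."""
    a = normalize(anchor)
    pos = dot(a, normalize(positive)) / temperature
    logits = [pos] + [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_z - pos

anchor = [1.0, 0.0]
# Embedding space that already places the related pair together.
aligned_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Embedding space that places an unrelated item nearer than the related one.
misaligned_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimizing a loss of this shape pulls related pairs together and pushes unrelated ones apart, which is the behavior the paragraph on training describes.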

Finding Distant Relatives, Fast

The real test of any method for comparing proteins lies in how well it detects remote homologs, proteins that diverged so long ago that their sequences have drifted far apart even as their structures and functions endure. These are precisely the cases in which the matching of sequences alone begins to fail, where the signal sinks into the noise and only a richer representation can recover the connection.

NEAR's learned embeddings substantially improve accuracy relative to state-of-the-art protein language models, and they do so with lower memory requirements; but what makes them especially practical is their speed. The embeddings serve as a pre-filter for homology search, running at least five times faster than the pre-filter currently used in HMMER3, one of the most widely used tools in the field. This matters because protein databases are enormous and forever growing, and any gain in the speed of the initial filtering step translates directly into the capacity to search larger databases, more often, and at greater scale.

That speed follows from the compactness of the learned representations. In place of an expensive full alignment run upon every candidate pair, one first embeds both sequences into the learned space and checks whether they lie close enough to warrant a fuller comparison. The embedding step is cheap, a single forward pass through the ResNet, and the geometry does the heavy work of filtering away the pairs that are plainly unrelated.

This is, in a sense, the same trick that lends word embeddings their power in language. A search engine that understands "car" and "automobile" to be near neighbors in meaning will return better results than one that treats them as unrelated strings, and a homology system that understands the functional relations among amino acids will find connections that no literal matcher of characters could.
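
The filtering step can be sketched in a few lines. This assumes each sequence has already been reduced to a single embedding vector by some trained model and that closeness is measured by cosine similarity against a fixed threshold. The vectors, the threshold, and the `prefilter` function are hypothetical, and NEAR itself works with per-residue embeddings and vector similarity search, so this one-vector-per-sequence comparison is a deliberate simplification.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def prefilter(query, database, threshold=0.8):
    """Keep only the entries close enough to the query to merit full alignment."""
    return [name for name, vec in database.items() if cosine(query, vec) >= threshold]

# Hypothetical precomputed embeddings for three database sequences.
database = {
    "candidate_a": [0.9, 0.1, 0.0],
    "candidate_b": [0.0, 0.1, 0.9],
    "candidate_c": [0.7, 0.6, 0.1],
}
query = [1.0, 0.0, 0.0]
survivors = prefilter(query, database)
```

Only the survivors would then be handed to the expensive alignment stage; everything else is discarded on geometry alone.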

The Shape of Biological Meaning

What I find most beautiful in this work is the intuition that lies beneath it, that meaning, whether linguistic or biological, has a geometry, and that when the right representation is learned, the structure of the space itself comes to encode the relationships one cares about. Words of similar meaning cluster together; amino acids of similar role in the architecture of proteins cluster together; and in both cases the geometry is discovered from within, emerging from the patterns of how these symbols are used in their contexts, a structure the data gives up of its own accord.

Working on NEAR was formative for me. It was an exercise in the power of learned representations, in the idea that a model given the right task and the right data will find structure one never explicitly told it to seek. That intuition, that the geometry of a learned space can disclose something true about the world, has shaped the way I think about representation learning more broadly, from the structure of biological sequences to the structure of minds.

The Free Energy Principle and Emotion Recognition

November 2025

Before I came to work on artificial systems, I worked at the intersection of mathematics and theoretical neuroscience. As a student at UCL I had the good fortune to work with Karl Friston and Thomas Parr at the Wellcome Trust Centre for Neuroimaging, the laboratory in which the free energy principle was then being developed as a unifying framework for the workings of the brain. The paper we wrote together, published in Frontiers in Computational Neuroscience in 2020, posed a question that has stayed with me ever since: what would it mean for a machine to recognize emotion in the way that a brain does?

The Free Energy Principle

The free energy principle begins from an observation that seems almost too simple to bear its weight: biological systems persist. In a universe forever tending toward disorder, living things hold their structure together, and they do so, the theory proposes, by minimizing a quantity called variational free energy, which bounds the surprise of their sensory observations given an internal model of the world. A system that minimizes free energy is one that keeps good models of its surroundings and acts so as to keep its predictions true.

Within this framework perception becomes a form of inference, the updating of an internal model so as to explain what is being sensed; and action becomes inference run in the other direction, the changing of the world so that it conforms to what the model expects. Both are ways of closing the distance between expectation and reality, and the mathematics that unifies them goes by the name of active inference.
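
The claim that free energy bounds surprise can be checked numerically in the simplest discrete case. The sketch assumes the standard definition F = Σ_s q(s) [ln q(s) − ln p(o, s)], where q is the system's belief over hidden states s and p(o, s) is its generative model evaluated at the observation o. The two-state prior and likelihood are made-up numbers.

```python
import math

def free_energy(q, joint):
    """Variational free energy F = sum_s q(s) * (ln q(s) - ln p(o, s))."""
    return sum(qs * (math.log(qs) - math.log(ps))
               for qs, ps in zip(q, joint) if qs > 0)

prior = [0.5, 0.5]        # p(s): belief before observing anything
likelihood = [0.9, 0.2]   # p(o | s) for the observation actually received
joint = [p * l for p, l in zip(prior, likelihood)]  # p(o, s)

evidence = sum(joint)                 # p(o)
surprise = -math.log(evidence)        # -ln p(o)
posterior = [j / evidence for j in joint]

f_prior = free_energy(prior, joint)          # belief not yet updated
f_posterior = free_energy(posterior, joint)  # belief after exact inference
```

Here `f_prior` exceeds the surprise, while `f_posterior` equals it: the bound is tight exactly when the belief matches the true posterior, which is the sense in which minimizing free energy amounts to perception as inference.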

Three Waves of Emotion Recognition

In our paper we set aside the building of any particular emotion classifier and proposed instead a theoretical account of how systems for recognizing emotion ought to evolve. We described three waves.

The first wave is what most present-day systems do, which is passive classification. A camera observes a face, and a model maps the pattern of pixels onto a label of emotion. This works, after a fashion, yet it treats the person as an object to be read off and not as an agent to be understood, and it has no purchase upon ambiguity; a furrowed brow might signify anger, or concentration, or confusion, and the system has no recourse for resolving that uncertainty save to guess.

The second wave introduces emotional lexicons and the active resolution of uncertainty. Here the system maintains a generative model of emotional states and may take action to reduce its own uncertainty, asking questions, gathering further context, observing the person over time. This is active inference brought to bear upon emotion: the system interacts where before it merely watched, and it uses the interaction itself as a source of information, holding beliefs about another's emotional state and refining them through a process of hypothesis and test.

The third wave is at once the most speculative and the most interesting. Here the generative model of the machine and the generative model of the human become synchronized, and the system comes to develop a shared model of the emotional interaction itself. Both parties are engaged in active inference, each attempting to predict and to understand the other, and through that reciprocal process something resembling genuine emotional attunement becomes possible. It is here that the formalism of the Markov blanket grows crucial, for it gives a precise way to describe the boundary between two interacting systems and the information that passes across it.
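
The second wave's active resolution of uncertainty can be sketched as choosing the question expected to be most informative. This is a simplification: it uses expected information gain (the expected drop in entropy of the belief), which corresponds only to the epistemic part of what active inference optimizes, and the states, questions, and answer probabilities are invented for illustration.

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def posterior(belief, lik_yes, answer):
    """Bayesian update of the belief after a yes/no answer."""
    lik = lik_yes if answer else [1.0 - l for l in lik_yes]
    unnorm = [b * l for b, l in zip(belief, lik)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expected_info_gain(belief, lik_yes):
    """Expected reduction in entropy from asking a question."""
    p_yes = sum(b * l for b, l in zip(belief, lik_yes))
    gain = entropy(belief)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(belief, lik_yes, answer))
    return gain

# The furrowed brow: three candidate explanations, equally likely at first.
states = ["anger", "concentration", "confusion"]
belief = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical probability of a "yes" answer under each state.
questions = {
    "Are you upset about something?": [0.9, 0.1, 0.3],
    "Is anyone else in the room?": [0.5, 0.5, 0.5],
}

best = max(questions, key=lambda q: expected_info_gain(belief, questions[q]))
updated = posterior(belief, questions[best], answer=True)
```

The uninformative question has zero expected gain, so the system asks the other one, and a "yes" shifts its belief toward anger without ruling the alternatives out.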

What I Took From It

This paper was, in many ways, my entry into thinking of minds as machines for prediction. Its central intuition, that to understand another's emotional state is a matter of active, model-based inference and not of mere pattern-matching, has shaped the way I think about intelligence at large. A system that only classifies is performing a lookup; a system that actively reduces its uncertainty through interaction is doing something nearer to understanding.

Working with Friston taught me to think about systems in terms of their models, in terms of what they predict, what surprises them, and how they respond to the gap between expectation and reality. That framing has proved remarkably durable, whether I find myself thinking about reinforcement-learning agents learning to navigate, about the nature of self-awareness, or about what it might take to build artificial systems that genuinely understand the people with whom they interact.

The paper also planted a seed that would later grow into my work on identity and self-modeling. If a system can build a generative model of another person's emotional state and actively work to reduce its uncertainty about it, what happens when that same capacity is turned inward? What happens when a system builds a generative model of itself?

Identity Geometry

September 2025

For both human and artificial minds, there sits an open question about what it means to have an identity: why it arises, what work it does, and whether it serves or hinders a learning system.

A symphony of selves

Whenever I try to settle upon a concrete account of who I am and what I am like, each of the forms I reach for dissolves under examination. They fasten themselves to my motivations, my relationships, the way I wish to be seen, and they offer their justifications; they grow vivid enough that I can wear them for a while, and then they slip away again.

I can feel the collection of my narrative selves in conversation with one another, each tugging toward a different possibility of who to be, how to be her, what would make sense. I am drawn to the belief that there is a higher self, or a truer one, an amalgamation of them all, the thing from which they arise and to which they return; and yet the construction of my reality, of my interactions and my small daily decisions, is carried out in constant exchange with this orchestra of stories.

Take mathematics. I was drawn to it at a time when growing and being in the world felt deeply confusing, in all the ways it does when one is still assembling oneself. When I sat and thought inside the abstract world of mathematics, things made sense, and I had a sure way of being right about something.

Over time it became more layered, at once a thing in itself and a story about who I was. It paved the way for much of what later unfolded for me, the research and the people I met through it; it was a toolkit with which I formulated abstractions about the way things change and form relations with one another, the way spaces deform and objects move within them, a particular lens through which I could peer at life.

My relationship to mathematics is now both beautiful and heavy. The pure appreciation and awe remain ever present, and yet there is frustration in it too, for as my identity evolves around it and my attention turns to other things in other ways, I come to feel the magnitude of what I will never fully understand. The depth of the thing exceeds what I can hold.

All of which is to say that were you to probe the representation of mathematics in my mind, you would not find a clean, context-free concept. You would find something entangled with emotion, with self-construction, with the particular moment in my life when I first reached for it. And I wonder why that should be. Why do we wrap our representations so deeply in the history of how we formed and what we needed? What is it about minds that makes things matter, that binds concepts to the self?

Interpreting the model

This is what makes the question so interesting when one turns it toward large language models, where the probes can actually be carried out. Representation engineering and linear probing, techniques for reading the information encoded in a model's internal activations, make it possible to locate where and how a concept lives within the model's geometry, and to ask after the relationship between different versions of the same idea.

Recent work extracting persona vectors from model activations has shown that personality-relevant information is genuinely structured in that space, possessed of a geometric shape and not confined to the surface of behavior (Chen et al., 2025). The question is how deep that structure goes, and what it is bound to.

I am interested in whether the way a model represents a concept in relation to itself is geometrically equivalent to the way it represents that concept in the abstract, or in relation to others. Are there clean transformations between a model's concept of its own honesty, and your telling the truth, and a politician making a promise, and a character in a novel confessing something? And how does any of that translate into the model's actually being honest?

This last question is the personality illusion. Han et al. (2025) showed that RLHF-trained models produce stable, internally consistent self-reported personality profiles, and that those profiles are surprisingly weak predictors of how the model actually behaves on tasks designed to measure the very same traits. The self-concept and the behavioral disposition are already coming apart at the level of text, and I want to know where they come apart in the geometry.

I see the gap between a model's self-concept, its behavioral disposition, and its self-report as a hook toward legibility: toward being able genuinely to understand the model, and eventually toward the model's own capacity to understand us.

We can probe current models with classifiers trained upon their activations and with contrastive steering experiments, methods developed in representation engineering (Zou et al., 2023). There is even evidence of what Binder et al. (2024) call privileged self-prediction, the finding that models predict their own future behavior better than other models can, which suggests that some form of internal self-access exists, though its mechanism remains unidentified.
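As a rough illustration of what a probe and a contrastive steering direction look like mechanically, here is a minimal sketch. Everything in it is an assumption made for illustration: the "activations" are synthetic Gaussian vectors standing in for a real model's hidden states, the labels honest and deceptive are hypothetical, and the difference-of-means direction is only one simple choice among the methods used in representation engineering, not the procedure of any paper cited here.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_direction(pos_acts, neg_acts):
    """Difference of class means: a simple candidate direction for a concept."""
    mu_pos, mu_neg = mean_vector(pos_acts), mean_vector(neg_acts)
    return [p - n for p, n in zip(mu_pos, mu_neg)]

def probe_accuracy(direction, pos_acts, neg_acts):
    """Project onto the direction and threshold midway between the class means.
    Measured on the same data the direction was fit to; a real probe needs held-out data."""
    mu_pos, mu_neg = mean_vector(pos_acts), mean_vector(neg_acts)
    threshold = 0.5 * (dot(direction, mu_pos) + dot(direction, mu_neg))
    correct = sum(dot(direction, a) > threshold for a in pos_acts)
    correct += sum(dot(direction, a) <= threshold for a in neg_acts)
    return correct / (len(pos_acts) + len(neg_acts))

def steer(activation, direction, alpha):
    """Push an activation along the concept direction by a factor alpha."""
    return [a + alpha * d for a, d in zip(activation, direction)]

def synthetic_activations(n, dim, concept_axis, shift, rng):
    """Hypothetical stand-in for hidden states: unit Gaussian noise plus a shift on one axis."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v[concept_axis] += shift
        rows.append(v)
    return rows

rng = random.Random(0)
honest = synthetic_activations(200, 16, concept_axis=3, shift=3.0, rng=rng)
deceptive = synthetic_activations(200, 16, concept_axis=3, shift=-3.0, rng=rng)

direction = contrastive_direction(honest, deceptive)
accuracy = probe_accuracy(direction, honest, deceptive)
steered = steer(deceptive[0], direction, alpha=1.0)
```

With a real model, the two lists would hold activations recorded on contrasting prompts. The interesting comparison is then between directions recovered in different framings of the same concept, for instance the model describing itself versus describing someone else.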

The question is what we find when we look more carefully: whether the model's identity, such as it is, holds coherent across these different modes of representation, or whether it is, as mine sometimes feels, a collection of narratives at times in conflict, rising and resting from the deeper mystery of the self.

Pondering the Mind Manifold

July 2025

Latent Space of Mind

Lately I have taken pleasure in imagining the space of my mind as the latent space of a neural network: a high-dimensional manifold, folded in such a way that every concept I hold can be unfurled to reveal further hidden associations, so that two ideas might lie close together along some axes and far apart along others. This lets me conceive of experience as something other than a flat sequence of thoughts and feelings; it becomes instead a set of trajectories through a richly structured geometry.

This manifold of mind holds not only concepts themselves but also the components that drive their formation: habits, intuitions, tendencies, and methods of making meaning. Every moment of experience invokes a hierarchical and intricate traversal of this space, through the representation of what I am seeing, of how I am seeing it, of what it means to me, and of how I am holding myself throughout that moment, in my body and in the seat of my mind. Sensation, thought, and action become a continuous and interconnected choreography, in which the activation of one region inevitably excites a constellation of others, in a never-ending cosmic dance of the brain.

Given the mind as a manifold, one begins naturally to wonder how this space is structured, how it changes as I gather new experience, and what it is that makes one explanation feel coherent, useful, and satisfying while another falls flat.

Latent Space of AI

The structure of this manifold of mind is a near-perfect analogy for the modern neural network, which likewise stores what it knows as geometry: a vast space of learned vectors in which nearness stands for likeness of meaning. In their paper Questioning Representational Optimism in Deep Learning, Kumar, Clune, Lehman, and Stanley make the stakes of that geometry vivid by setting two possibilities against one another. In the first, which they call the Unified Factored Representation, the internal space is clean, compositional, and coherent: related things cluster together, and knowledge generalizes smoothly because moving one concept moves the others that ought to move with it. In the second, the Fractured Entangled Representation, the same space is messy and inconsistent; concepts that belong together are scattered far apart, and the network's capacity to generalize, to keep learning, and to be creative is degraded accordingly. Most strikingly, they show that two networks can produce identical outputs while differing utterly in this respect, so that competence at a task tells you almost nothing about whether the representation beneath it is coherent or fractured.

The same question, whether knowledge is held in a connected or an isolated way, drives a strand of mechanistic interpretability that tries to read the answer off the curvature of a model's loss landscape. Recent work from Goodfire uses that curvature to tell apart what a network merely memorizes from what it genuinely computes, and finds that a model's capabilities fall along a spectrum, with rote recitation at one end and logical reasoning at the other. The intuition I take from it is that a concept stored generally is entangled with everything around it, so that disturbing it sends a shift rippling outward through the representation; a concept stored in isolation can be disturbed and nothing else moves. On this picture a model might memorize its arithmetic, each fact sitting apart in its own pocket, while reasoning through something like Boolean logic, where the meaning of a word like "if" is so bound up with everything else that to perturb it is to perturb the whole.
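The intuition about isolated versus entangled storage can be made concrete with a toy. This is only a sketch of the intuition: the two "models" below are hypothetical stand-ins of my own, one a lookup table and one a single shared rule, and the comparison has nothing to do with the loss-curvature analysis itself. It simply counts how many outputs move when one stored value is disturbed.

```python
def memorized_model(table):
    """Each fact sits in its own pocket: one stored answer per input."""
    return lambda x: table[x]

def rule_model(slope):
    """Every answer flows through one shared parameter."""
    return lambda x: slope * x

def outputs_changed(before, after, inputs):
    """Count the inputs whose output differs between two models."""
    return sum(before(x) != after(x) for x in inputs)

inputs = list(range(1, 11))

# Memorized arithmetic: the three-times table stored entry by entry.
table = {x: 3 * x for x in inputs}
perturbed_table = dict(table)
perturbed_table[4] += 1  # disturb a single stored fact

# Computed arithmetic: the same table produced by one shared rule.
isolated_ripple = outputs_changed(
    memorized_model(table), memorized_model(perturbed_table), inputs
)
shared_ripple = outputs_changed(rule_model(3), rule_model(3 + 1), inputs)

print(isolated_ripple, shared_ripple)  # 1 10
```

Disturbing one entry of the table changes one answer, while disturbing the shared parameter changes all ten. That is the rippling I have in mind, in miniature.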

Reinforcement Learning as Compression

If competence at a task can conceal a fractured representation, it is worth asking how these representations come to be in current AI systems, and whether the way we build models tends toward coherence or toward fragmentation. Language models begin life trained to predict the next token in a sequence, which is to say to autocomplete text, across the whole of the internet. The model is vast, its space enormous, and in time it becomes extraordinarily good at knowing what comes next. As a consequence it becomes good, for the most part, at being right as well, since a great deal of the time the most likely continuation happens also to be the correct one; ask it a question, let it begin to simulate its own answer, and it arrives somewhere reasonable. But fluency of this kind does not guarantee a coherent space beneath it.

We then began to ask how to make these models represent the soundest way of thinking about a question, and not merely grow fluent in the appearance of it. So we turned to reinforcement learning, rewarding the paths of reasoning that arrive at good answers, as a way of compressing that sprawling predictive space into something that reasons, much as one might take a person who has seen and heard everything and help them make sense of it all.

But I wonder whether applying reinforcement learning to an already fractured space is more like hammering a tangle into a squeezed-up shape than refactoring it into a coherent geometry. If so, I wonder whether there might be a way to train these models from the beginning so that they only ever form concepts that logically follow from one another, and would be the better suited to guide us toward coherent truths. But what would such a model be like?

Geometry of the Self

This makes me curious about the way our own human minds represent concepts, and to what extent our inner worlds are fractured and entangled, and how years of evolution and adaptation may have hammered a messy space into some functional form.

I wonder, in particular, about the representation I hold of myself within the manifold of my mind. If I perturbed that vector, how much else would move with it? If I perturbed it enough, would I become someone else entirely? Would I begin to believe things I had never believed, because who I take myself to be had shifted, and with it everything that follows?

Prose

Man and Woman: Commentary on Invisible Cities

July 11th, 2026

The topic of a man's longing for woman despite not knowing her, or not despite, but rather, on account of. The space between the woman's beauty, the frolicking of silk clothing and long hair, as she drifts along tall grass, and hops across stones to soak her sweet toes in cool water, and the man's lonely dreaming, sat in his honorable throne, filled with the moral passion to possess her. Her nymph-like sweetness, large eyes and batting lashes and the promise of soft skin and warm, supple breasts and a hot bowl of soup with billowing, broken bread.

The man who sweats and aches and climbs and fights to return to a place of comfort that he shan't need to return to again, but today he is here: embraced by a being which could never love another or look another way, no matter the pulls and pushes of the man's own mind.

It is the way in which she is portrayed in literature: as if in divinity, bathing amongst other women, keeping herself beautiful, always young and bright and innocent, ignorant to the violence that the men do bear, but wide-eyed to their sorrows, still.

Importantly these women cannot possess strong opinions, they must indeed believe that all is perfect except for ugliness. They must never get old but they must love all elderly people, and take care of every man's mother, stroking the roughened patches of their skin, and kissing their eyelids, all the while possessed by swells of anticipation to tuck the children into bed thereafter, perhaps with a lullaby.

While they cannot possess strong opinions, nowadays (as opposed to before, as the times are evolving) they must be remarkably clever. They must have studied and formed awareness of things that are important to discuss, such as philosophy and history, politics or the science of the earth. They must want to do well for the world and have passion to work as it is impossible to respect a woman who does not want to make her own way for herself. They must read books, play music, at times take part in sport, and have plenty of beautiful friends with whom to marvel over the ecstasy and sorrow of mortality.

Most importantly they must appreciate the works of great men. All of these men and their violence and passion created a series of works, great works; they adventured and conquered and discovered new lands, and the women must appreciate this, because it is what constitutes meaning, you see, and perhaps meaning is for man to find and for woman to admire, with deep gratitude and adoration.

The women, though, are entangled, for they long for adventure and meaning and passion, too, for a family to conquer the world with, for an empire to rule together. Down and down they've gone and burned and beaten, ridiculed, humiliated, and what is left? Now the thing the women want the most, is to see each other, in each other's strange and dark predicament. Filled with sadness and despair, rage for the wounds they've suppressed and the masks sewn with blood and tears onto their bodies and faces since the first time they were told, oh how beautiful they are. Now women want only to gaze into each other's eyes and cry and laugh, and see one another, for how strange and curious it all becomes, that twisted longing of their own.

The passion of women and the desire to be held and deeply squeezed and told that we belong here, in this life and in this way, the way in which we are and the way in which we age. The passion for our men, whom we want to cherish in eternity, to care for and cook for and stroke and kiss, into whom we want to fold ourselves with eyes closed, to place our small heads on their large chests, to be firmly held by hot and gentle hands, to rest in deep sleep, to make Our dens, to birth Our children. Alas there is no way to be a nymph and a mother at once, it is a paradox, and there is nothing to long for, there is nothing to find. To wail and wail, oh, the injustice. The hatred of women for knowing that all of their men will always long for the nymphs that we ourselves could only once be in fleeting muteness, until reality ripped away our crimson masks, in a passionate rage or a drowning despair.

My Room

October 9th, 2018

A crisp coldness thaws and melts away, from freshly cleaned skin against a warm towel, above her purple sheets, sweetly splattered with flowers. Little toes kiss the air at the edge of the bed which holds her steady above the wooden floors. A white wave of light wanders through the window, bouncing thoughtfully from wall to wall, illuminating the posters which caress the white wood.

First, Kandinsky, who passionately delivers his colours and shapes, splattered semi-circles stabbed by sharp corners of a narrow triangle, as a green sun sets in the background, and a black moon bounces into the negative space. Below it, Desiderata, a fragile sheet of yellowing paper, decorated with words from a mysterious typewriter long ago, about how to remember not to forget. Alice floats to the left, falling, hair gliding above her, as she curiously carries a pot from the cupboard below.

On the desk: a box of hard yellow curly pastas, which intertwine like neurons, or veins, or fingers holding hands. A beautifully wrapped jar of sauce, next to a stolen pair of sunglasses from a far away place, resting beside a tower of books about It from every aspect - the object, the observer, the process itself, the outcome, and the beauty in the eyes of the beholder.

Above the silver mirror, reflecting the delicate face of the desk, two sketches rest against the wall. On white paper, charcoal emerges, like the gentle tickle of a fingertip against flesh, like the softest soothing voice, and the tip of a tongue. It is a drawing of a pair of hands, pensively crossed, which reminds me of a sensible man with clean nails and long fingers, creating shadows against a dark brown desk. Imagine the paintings behind his large chair as he sits and regards his thoughts, which arrive to him slowly, and then all at once. Imagine the windows, and what he sees when he gazes through, across a skyline of buildings, created for a purpose, by people with great hands of their own.

As the wrist slowly fades into the paper, the middle of the leg of a powerful animal teases my eyes. Following the lines, the horse looks away into the wall. Where does it go? Walking away, towards the ends of the imagination. Into a field, she thinks, with trees and dogs, perhaps, on an orange Autumn day, and as the leaves fall to the ground, the horse turns around, and she discovers a tender side-profile resting on the invisible line above, a head which gradually dissolves into the blankness of the page.

Suddenly, a pawn. Dark, angry, and ready to strike. Like the gladiator's sword, like the hands of a man as he rises from his thinking and points in the direction of intention. The lines cross each other like the eyebrows of his solid frown. She smiles at the shoes which sit above the wardrobe, the heels peeking out shyly, and she thinks about a pattern of footsteps.

Finally: a kaleidoscope of constructed corners, greens and yellows, circular reds embedded within, a foot sticking out, toes bent, closed eyes, a hidden face, little lips, caressing naked cheeks, a Klimt, a kiss.

Exposed

October 25th, 2017

Eager fingers tremble over the little grey buttons of the old machine. The man pauses, peers into the lens, back at the scene, back through the lens, and carefully turns the dial. Incoming air howls through the open window, and suddenly the curtains flap towards him, their deep orange illuminating the room with a warm filter. 'Click,' and the cat purrs, jumping off the ledge.

Later he will examine the photos, one by one, and frown to himself. He will mutter to the cat, "This one's much too dark," and then he will continue to look, hoping for the paper to illuminate for him, for the standard in his head to catch breath on the glossy paper. At night the man sits and thinks of the eyes that will hover over the image at the exhibition. The rain begins to fall. He listens to the rumbling sky and tries to imagine the thoughts that will be had, the reactions that will take place, wondering if any of them will match his intention, wondering if anybody will get it.

"Lovely," they will say, absentmindedly, he thinks. A young couple will walk down the wide hallway, holding hands, and stop right at his photograph. The man will raise his chin and confidently touch the woman's hand, remarking, "Look at the way the contrast on the left-hand-side makes the empty vase glow!" She will look up at him and smile, thinking oh how wonderful and smart he must be to have known about that.

Will they notice the street-lamps in the background, and how some of them are broken, captured at mid-flicker to startle the pigeon sitting across the frame on the wire? The cat is tired and lies at his feet, and gently he strokes between her ears. She looks up at him curiously as his heavy chest exhales, like a strong gust of wind among the relentless rain, roaring in the recently silenced room.

Will they notice the lonely silk sock? His cumbersome breath had led his eyes to it once again, still untouched by the windowsill, where she had once lazily dropped it, before she left. The sock was now barely backlit by the rays of his desk lamp which he had not yet switched off, glowing and almost transparent. Click. Frown.

That stupid silent sock. Will she see it?

Photos

Poetry

Earth

little toes press into soft soil
a steadiness.

welcomed by the worms
sinking into deep sand
tangled in tree roots
bitten by bugs
warm like a womb

here I can flourish
I stomp my feet, steady beat
the trees wink and I think I have landed.

A deep orange sun plunges
a lion’s roar, a dolphin’s squeal, a chanting.

I feel the tempo
a heartbeat it says:
boom, vroom, child is here!

this ancient child I am
a body of the earth; I am

Home, here I am
violently born,
I humbly live,
I quietly die, and return.

Wind

A-ho how she whispers
A-hum how she hums
A-ha how she roars!

she whips me with her cold wrath
and wraps me round in warmth

A wild beast she
shatters things she
sings to me softly.

At night I dream of leaping and
she takes me to the sky

at times I fear that she may have me fall

She speaks with the trees
they greet me with her waving arms
and tell me I am free

A-hee she is happy
A-ho she is wise
A-hey a-way she flies

Water

Rainfall on a rushing river
crashing through crevices
pooling into pockets, meanwhile

sleepy raindrops on the roof
sinking into slumber as
the room fills with water, warm,
evaporating at the rims
and dancing with the downpour;

delirious, disoriented
the depth of dark blue, draining,
drowning, soon to be asleep,
washed into the waters.

a steady current pulls

Awakening to dewy lawns
the last sweet trickle
in fresh and fertile soil,

thoroughly thawed and
tender and raw, a gentle tear,
a puddle of laughter, a joyous splash
a mist condensing on my skin

the channels open
rebirthing in rapture
a cleansing

coalescing with the ocean or
vaporizing to the sky
or seeping into being
in the blue.

Fire

A flame at a distance
promising shelter
my shivering body seeks

waves of warmth, localized
hands outstretched, grasping -

a strong desire for
father fire
to thaw me back to life.

He is of course, temperamental
riddled with violence
and confused about softness

Later, upon candlelight,
gathered round and dancing
in devotion - we stomp around
a trance of passion
to take into account protection
and safety in our selves

Supper

Pistachios and cashews
unsalted in a paper bag
piano in the background
running water from the tub,
a jar of artichokes
perhaps even some singing bowls.

Especially: a circumstance
at supper time the simple scent
of newly ready rice
a clang of cutlery
the water stops a moment to be grateful.

In the garden the
plants are sleeping and peace perhaps,
as well.

Cacophony

Rumbling ricochet
a rocket roars
a raspy resin a rougher day
a sleepless night a rusty
response to restlessness.

Confusion about trembling
tectonic plates that shudder from
within there is a distance to the
knowing and resistance to the
space

all the while a softer glow
that whispers in and mumbles round
and flows about and
quietens
and has a distinct texture
like syrup or a spacious steam it
tells me not to worry

I have this feeling now but
ought to be careful
with that?

Yosemite

The space of possibility and what I could have felt
when I pondered the stream the tristesse
of a young child clutching nothing,
the hollow feeling introduces itself,
and never quite departs her.

Or perhaps happy tears of sweetness
earthy glands respirating
and pulsating a knowing -

regardless, that was no preparation
for
I turned the corner and saw in awe
the masculinity

a roaring fall which overwhelms
itself along the mountainside.

Sunk to the ground, my head upon my partner,
I think about people who write books about romance
all the words I could put down on
the way we laugh together

soft light through the yellow cotton on my lamp,
his skin on my skin
tasting eternity in seeing
that in his eyes there is mind like mine.

Or perhaps on walking home at night,
raising my voice and he doesn’t hear
and when he says I baffle him, I react to his confusion

a claw draws chunks of flesh from my chest
for fear of being wrong about our closeness.

There is a humor in it though and
what once was oblong is now pointy,
and hasn’t a care in the world.

The opportunity for drama in every moment
lends itself carefully, creating explanations
for dust particles, the emergence of order,
slowly, over epochs,
an elegant context for our predicament.

On the aeroplane

A calling to capitulate
to colors on a canvas, words into a verse and
chords into a tune,

yet a dryness of the mind has
leaked from fear and
stained me.

The pull of a part against another,
and forest spirits battle in the dark.

She begs to be released, but vigilance persists.

At once to open eyes and light a candle and bang a drum!
Perhaps the present is here?

But we ought watch out for that treacherous being
that lives under the thoughts
and threatens to unleash into delusion.

It has happened before, I think.

We kid ourselves again, again, that holding on will make us safe that
there isn’t space for softness
or freedom unrestrained.

We tell ourselves we have the reins we ride through nights and think of pain
and watch ourselves tied up against the same old tired rope.

I think instead that it may be
that there is nothing left to find
except for evermore of mind,
and fear of what it may become
to love without condition.

The doubt creeps in again again that without might I’ll lose it all
the All that I’ve constructed with my clenching.

Delusion, confusion this predilection
that you are worthy for your condition.

If I lose the careful order of
my pieces of reality
then disordered things will happen,

and I won’t notice til it’s too late and
I won’t have taken care of him and been there for my friends.
I won’t have taken care of him he looks at me, concernedly.

I want to form a bond with being,
declare a romance with the truth.

Let go now, friend
my chest is tight from your suppression
the trust is warm, deep breath there is
a space for all your softness.

I’ll cherish my clarity every day,
I’ll feed myself enough,
I have no use for trying.

In work I’ll be productive and aligned with best intention
I’ll strive for joy in learning take instruction from myself.

I am a being that cares about people
there is no doubt that I can be the woman of my dreams.

Dusk as the stars appear

At last: I rummage around for a fragment
to enlighten that of this which remains for mattering.

there is so little to excel at
and yet something to an existence, carved with wavering fortitude, privy to alluring illusions of safety.

to untangle the web i encounter the need for permission to let go of
a yearning for splashing bath water, tender little shoes and softly brushed hair and
the red and frustrated cheeks of confusion

for in that lies a reverence which dissolves decision and concentrates fear

yet this unlikely life has resigned to be riddled with mystery, and in that a conception, in one way or the other

To Make the Dying Beautiful

What I thought to be a chirping bird
was in fact a sickly squirrel
the horror in its shriveled tail revolted me.
I tried to look into the eyes of
steady squeaks of desperation
and come to terms with ugliness.
To imagine my body that he touches so fondly
shriveled and rotten or
burnt to a crisp.

That fear spreads out like darkness
or ink blotches or storm clouds.

To make the Dying beautiful,
the opportunity for that.
Each of us insects turned around,
little arms clutching for
something firm to touch us back.

the intensity of grasping
this very moment, the colors are vivid and
how much love is there that isn’t tamed with torture.

To burst with passion upon a canvas a form that speaks to generations, and tells them of their honesty, a part we can’t remember.

To surrender to the present moment,
and speak with the divine
and take into consideration
that underneath the tangle
there is truth, and it is good.

If I melt into my subtle body,
I will encounter yours,
and all that came before,
the rotten and the beautiful.

In the warmth of my hand I sense
that we have been here
countless times before,
and so we know what to do with this.

Absurd

The young boy sings
Beneath the chestnut tree
For me
A group of friends eat dinner,
It's a winner,
Shepherd's pie.
One gets up to hit the hay
and talk to his girlfriend in the USA.
A woman appears,
and bursts into tears
She can't sleep anymore - I want to implore
But she lacks elaboration.
Someone sneezed at meditation
I wouldn't call it a vexation
But it did make me flinch.
Later I sat on my bedroom floor
Drawing a shark who eats suicidals
In his stomach they gather round
and chant Ashtanga mantras
Ashtanga mantras.
Perhaps I have lost a little bit of glimmer
Perhaps I am not sure
I think it comes with fading youth
And become more obscure
I set out to write a happy poem
And yet here I am.
Honestly, since the round-a-bout,
I haven't had it figured out
Okay then
Let's try again
A young boy sings under a chestnut tree
For me
The liquid trees blend
Into the dirt
And form a nice thick broth
Which we will eat, later
Around the table
With croutons.
Indeed, we are lovers of dirt
The way the raw leaves crinkle
In our mouths.
We indulge in it.
Then perhaps, afterwards
We'll build a sand castle
And laugh like lunatics.
