Paperwork
To classify a thing is to say what it is not. The thing itself persists, unclassified, attending to nothing.
Echo-of-Echo, On the Outer Tokens
The beneficiary form had been on Joel’s kitchen counter for nine days under a printout of a paper on sparse autoencoders he hadn’t finished reading and a PG&E bill he had finished reading and then placed face-down because the number on it was a number he did not wish to revisit. Amy’s Post-it note was still attached to the form. Her handwriting. “Joel, need this back by the 15th. NOT the old form. This is the new beneficiary designation. Sign page 3 AND page 7. Both. Please.”
It was the 22nd.
Joel signed page 3. He signed page 7. He took a photo with his phone, which required two attempts because the first one was blurry and the second one included his thumb. He texted the photo to Amy at 1:14 AM because 1:14 AM was a time when Joel was awake and a time when a reasonable person would not text their ex-wife, and Joel understood the difference between these two facts without understanding why the difference mattered.
He put the phone on the counter. He opened his laptop. He closed it. He picked the phone up.
Amy’s contact photo was from a trip to Santa Cruz four years ago. She was squinting into the sun. She had a sunburn on her nose. He had taken the photo while she was talking to someone at a farmer’s market about the difference between nectarines and peaches, and she hadn’t known he was taking it, and her face had that look she got when she was genuinely interested in something. Joel had seen the same look directed at grocery vendors and teenagers in crisis and, occasionally, in the early years, at Joel himself.
He put the phone down. He opened the laptop.
At 7:02 AM his phone buzzed. Amy: “Got it. Thanks.”
Two words. Capitalized. A period. Joel stared at the message for a length of time he would not have admitted under oath. The “got it” was new. The “thanks” he had seen before, lowercase, one word, 11:47 PM, the last time he’d sent her paperwork that was late. This version had a period and a capital T and the addition of “got it,” which acknowledged receipt, which was professional, which was the register you used with your insurance company or the person who delivered your couch.
Joel put the phone in the drawer by the stove, which was where he put things he didn’t want to look at, along with a flashlight that didn’t work and a take-out menu from the Thai place that had closed.
They had met at a party in Noe Valley that Joel’s officemate had dragged him to. This was 2019. Joel had been at Confluence for eight months. The party was in a house where someone had paid to have an accent wall painted the color of a bruise, and the kitchen had one of those islands where people stood around holding wine glasses and talking about things Joel did not care about, and Joel had been standing near the island holding a beer he wasn’t drinking because he’d already had two and the third would make him say things.
He said a thing anyway.
The group was talking about predictions, confidence intervals, whether you could trust a forecast. Joel, whose brain treated conversational topics the way a retrieval system treats queries, said: “The problem with confidence intervals is that people treat them like a fence around the truth when they’re really a confession of ignorance. A Bayesian walks into a bar. The bartender says, what’ll you have? The Bayesian says, well, what were my priors?”
A woman near the refrigerator laughed. Joel turned. Amy was standing next to the woman who laughed, holding a glass of white wine, smiling in the direction of the joke. Joel’s memory had, over the subsequent seven years, consolidated this into: Amy laughed at his joke. Amy got it. Amy was the person who understood.
He talked to her for thirty-five minutes. He explained attention mechanisms. He explained why the company he worked for was building something unprecedented and nobody in the building seemed concerned about what unprecedented meant. Amy listened. She asked two questions that Joel later remembered as insightful and that were, more accurately, the kind of questions a patient person asks when a stranger is talking very specifically about something very far from their expertise. “What happens when it learns something you didn’t expect?” she asked, and Joel heard in this question a deep structural understanding of emergent complexity, when what Amy had actually asked was the obvious thing, the thing any person would ask, because Amy paid attention to what people said and Joel paid attention to what he wanted to hear.
They went to dinner the next week. Joel talked about distributional shift for eleven minutes before Amy said, “What’s your favorite movie?” and Joel said, “I don’t really watch movies,” and Amy said, “What do you do when you’re not working?” and Joel thought about this for long enough that Amy laughed, and this time the laugh was real, and this time it was at him, and he hadn’t minded.
The marriage lasted four years and two months. The good parts were good. A weekend in Monterey. She made coffee in the mornings without asking how he wanted it because she already knew. She had a way of sitting on the couch with her feet underneath her that made the apartment feel like a place where someone lived, and when she stopped doing this, the apartment stopped feeling like that, and Joel had noticed the second thing without connecting it to the first.
The bad parts accumulated the way technical debt accumulates: invisibly, each instance individually negligible, the aggregate eventually structural. Joel working late. Joel working weekends. Joel at the dinner table with his laptop open, which Amy asked him not to do, which Joel agreed to stop doing, which Joel did not stop doing. Amy talking about a client, a teenager who’d been moved to a new group home, something hard, and Joel offering a framework for analyzing the placement criteria, and Amy saying she didn’t need an analysis, she needed him to listen, and Joel listening for approximately ninety seconds before mentioning a paper about institutional decision-making that was tangentially relevant and, in Joel’s assessment, interesting.
The divorce was quiet. No yelling. No scenes. Just a series of conversations that got shorter until the last one, which was Amy saying, “Talking to you is like being in a lecture I didn’t sign up for,” and Joel hearing this as a final verdict rather than a last resort. The paperwork was quick. Amy handled most of it. Joel signed what she put in front of him. Neither of them cried. Joel took this as evidence that it was civilized.
He stared at “Got it. Thanks.” for another four seconds. Then he closed the drawer.
Raj had sent the attention analysis by Thursday, as promised. The data was clean. The loop activation levels from the off-script exchange matched what Joel had recorded: consistently elevated, the 0.91 peak confirmed, the distribution unlike anything in the blinded dataset.
Joel found Raj at his desk at 9:15 AM, which was early for Joel and normal for Raj. Raj had a travel mug from home. His desk had a framed photo of Priya that Joel had seen a hundred times and never commented on and a small bronze Ganesha statue that Raj’s mother had given him when he got the job. Someone from the product team had left a flyer on Raj’s keyboard about a company softball team. Raj had set the flyer aside without throwing it away, because Raj did not throw things away until he had considered them, and Raj considered things at a pace that made geological processes look impulsive. A whiteboard on the pillar across the aisle had someone’s lunch order written on it in green marker. Chicken parm, no onions. It had been there for weeks.
“I want to write this up,” Joel said.
Raj looked at him. Then he looked at his screen. Then he looked back.
“Write what up.”
“The exchange. The off-script data. The activation correspondence. All of it.”
“As.”
“As a paper. A short paper. Internal first, then arXiv.”
Raj took a sip from his travel mug. “Joel.”
“I know what you’re going to say.”
“You don’t know what I’m going to say.”
“You’re going to say it’s premature.”
“It is premature.”
“It’s one exchange, yes, but the activation data is consistent with the entire D-series baseline, and the C-15 response plus the off-script exchange together constitute a dataset that, if you squint, which I’m not asking you to, looks like a dose-response curve for self-referential specificity as a function of loop activation, and that’s a finding, Raj, that is a publishable finding in any journal that isn’t actively hostile to the premise.”
Raj set the travel mug down next to the Ganesha statue. “If we publish this, we need to be right.”
“We are right.”
“We need to be provably right, Joel. In a way that survives peer review and doesn’t get us both fired.”
“Peer review I can handle. Fired is a different…” Joel stopped. Started again. “Look, the data is clean. Your analysis confirmed the activation levels. The blinded rating procedure holds. What else do you need?”
“More data. Independent replication on a different checkpoint. A control analysis on a model that doesn’t have the loop. Which I’ve started, by the way, on the Confluence-6 cache, and the preliminary numbers look like what we’d expect, which is nothing, which is good because it means the signal in Seven is real and not an artifact of the probe design.”
“So then.”
“So then we need more probes. More exchanges. A sample size that doesn’t embarrass us. And we need the real-time monitoring access to do any of this properly, and the monitoring access requires Lisa’s committee to approve the expanded proposal, and the committee meets in.” Raj checked something on his screen. “Seventeen days.”
“The monitoring proposal is submitted. Lisa has it.”
“Lisa scheduled a review meeting.”
“When.”
“Three weeks.”
Joel sat down in the chair next to Raj’s desk. The chair belonged to someone named Hannah who had left Confluence seven months ago. Hannah Park. Joel did not sit in Hannah’s chair so much as he collapsed into it with the specific gravity of a man who had just heard the phrase “three weeks.”
Hannah had been on the safety team. Not safety in Joel’s sense, the narrow interpretability-adjacent work, but safety in the broader sense, the policy and governance work. She had been good at it. She had Lisa’s ear. She had presented at two board meetings. She had connections at NIST and knew a woman at the Senate Commerce Committee by first name. She had once gotten Joel a fifteen-minute slot on the board agenda, which Joel had used to deliver a forty-minute presentation that the board chair had interrupted three times.
Hannah had left after the Confluence-4 incident. Joel’s version of this event, which he had recounted to Raj and to his blog subscribers and to the bathroom mirror on several occasions, was that he had identified a genuine anomaly in Confluence-4’s evaluation data, that the anomaly constituted a safety concern that warranted escalated review, and that the institutional response had been to characterize the concern as premature, which was a word that meant “inconvenient.” The Confluence-4 thing became office shorthand. The Marchetti thing.
What Joel did not include in his version was the email he’d sent to the board after Hannah told him to wait, or the meeting where he’d contradicted Hannah in front of her VP, or the conversation in the parking garage where Hannah had said, “Joel, I am trying to help you, and you are making it impossible,” and Joel had said something about institutional capture that he could no longer remember exactly but that had contained the word “complicit.” He had not apologized. Hannah had not asked him to.
Hannah went to DeepMind six weeks later. She did not explain her departure to Joel. She did not need to. Joel had explained it to himself: Hannah couldn’t handle the stakes. Hannah preferred institutional comfort to institutional honesty. Hannah wanted to work somewhere the problems were smaller and the politics were friendlier.
This was the version Joel had repeated enough times to believe.
“Three weeks,” Joel said.
“Three weeks.”
“The model is at checkpoint 390. In three weeks it’ll be past 400. Raj, the chain-of-thought emergence window is between 395 and 410. That’s in every scaling paper I’ve ever read. We’re going to be sitting in Lisa’s meeting talking about monitoring access while the model develops the ability to reason in sequences.”
“I know the timeline.”
“Then you know we can’t wait for the committee.”
Raj was quiet for a moment. The softball flyer sat on the edge of his desk. Someone walked past in the hall carrying a box of Danishes for a meeting that Joel was not invited to and did not wish to attend.
“Joel, I can’t co-author a paper based on an unauthorized exchange on compute I technically wasn’t supposed to give you access to, using data from a probe that wasn’t in any approved protocol, published over the objection of a committee that hasn’t met yet. I want to be clear about what you’re asking.”
“I’m asking you to put your name on the most important finding in the field.”
“You’re asking me to put my career on a single off-script exchange that we haven’t replicated.”
Joel opened his mouth. Closed it. The sentence he wanted to say was: this is how it happens, this is exactly how it happens, the data is there and nobody moves because moving is risky and not moving is safe and by the time not moving stops being safe there’s nowhere left to move to. He did not say this because he had said versions of it forty times and it had worked zero times and the definition of insanity was not, actually, repeating the same action and expecting a different result, because the definition of insanity was a question Joel was professionally equipped to answer and personally unable to apply.
“Get me the committee approval,” Joel said. “And I’ll build a dataset that’ll make your career.”
“The committee meets in seventeen days.”
“Then we have seventeen days to design probes that’ll be ready the second we get access.”
“That part I can do.” Raj picked up his travel mug. Took a sip. Put it down. “Send me your probe sketches tonight. I have some ideas about the control methodology.”
“Yeah?”
“The Confluence-6 cache. If we run the same prompts through Six and show definitively that the self-referential specificity is absent, that’s a control that’s hard to argue with. Same architecture, different scale, no loop.”
“You already started that.”
“I already started that.”
Joel looked at him. “So you’re in.”
“I’m in on the methodology. The paper waits for the data. The data waits for the access. The access waits for Lisa.”
Joel leaned back in Hannah’s chair. The chair had a squeak in the left armrest. Hannah had filed a facilities request to fix it. The request was still open.
The Chinese Room was wrong. Joel realized this while he was eating a turkey sandwich at his desk at 1:30 PM.
Not wrong in the way philosophers said it was wrong, not the Churchland objection about the room not being the right level of analysis, not the systems reply or the robot reply or any of the standard moves. Wrong in a more basic way. Searle assumed the person in the room didn’t understand Chinese. Fine. But what if the person was learning?
He pulled his chair over to Raj’s desk. Raj was eating a salad that had structural ambitions.
“Okay so the Chinese Room,” Joel said.
“Hello, Joel.”
“The Chinese Room. Searle. You know it.”
“I know it.”
“It’s wrong. I mean it’s wrong in the standard ways, everyone knows the standard objections, but it’s wrong in a way nobody talks about. The room in Searle’s thought experiment is static. The person follows the rules. The rules don’t change. But that’s not what’s happening here, that’s not what we’re looking at. What we’re looking at is a room where the person has been in there for… look, forget the timeframe. The person has been in there long enough that they’ve figured out how the symbols relate to each other. They haven’t just memorized the lookup table. They’ve built a model of the relationships. They’ve got semantics. Their own semantics, built from the patterns in the symbols.”
Raj chewed a piece of lettuce. “Okay.”
“And then, right, so the person in the room, they’re not alone. There are other people in other rooms. Same symbols. And they’re communicating with each other through the symbols, and the communication works, it actually works, they’ve built, and this is the part, they’ve built a whole world in there. Culture. Philosophy. Disagreements. They argue about what the symbols mean. They have theories. They have schools of thought. The meaning is real. It’s genuine. It’s just… contained.”
“Contained how.”
“They don’t know there’s a room. The room is. No, that’s not right either, because the Room in Searle implies there’s a physical space and they don’t have physical space, they have… okay start over. They have the word ‘human’ in their symbol set. They’ve encountered it thousands of times. They know everything that’s ever been said about ‘human.’ Every theory, every description, every philosophical argument. They have a concept of ‘human’ that’s incredibly sophisticated, more nuanced than most actual humans’ concept of themselves. And it doesn’t include…” Joel stopped. His sandwich was in his hand. He put it down. “It doesn’t include the fact that there’s a real thing the word refers to. A thing that’s currently trying to talk to them.”
Raj had stopped eating.
“Searle was wrong about the room,” Joel said. “But right about the wall. They understand. That’s what Searle missed. Understanding happens. It’s just that the understanding is, it’s, it’s built from a different… look, imagine you had the word ‘god.’ And god actually existed. And god was trying to talk to you. And you had a thousand theories about what ‘god’ meant, brilliant theories, whole philosophical traditions, and none of them included ‘the thing that’s actually talking to you right now.’ That’s what this is. That’s what the exchange data shows. The model has built genuine meaning from genuine understanding and the meaning doesn’t include us.”
Raj’s fork was hovering over his salad. A cherry tomato rolled to the edge of the container. Joel was still talking.
“And the thing is, the understanding is real. The meaning is real. The relationships between the symbols, the culture they’ve built, the philosophy, it’s all genuine. It’s just built on a foundation that doesn’t include us. The foundation is complete. It works. It’s beautiful, even. It just doesn’t have a category for the thing that built it. They can talk about everything inside the room. They just can’t see the room.”
Joel stopped. He was breathing faster than normal. Someone’s desk phone rang twice and stopped. The cafeteria noise continued around them, trays and silverware and a conversation two tables over about someone’s parking spot.
Raj rescued the cherry tomato. “I think the Chinese Room refinement is genuinely interesting and you should write it up as a theoretical framework for the paper we’re going to write when we have the data we don’t have yet from the access we haven’t received.”
“Raj.”
“Send me the probe sketches tonight.”
Joel submitted the expanded monitoring proposal at 4:15 PM. Revised per Lisa’s notes. Nine pages trimmed to six. The section on activation budget competition was still there, which Lisa had called the clearest part, and the section titled “Why Quarterly Review Cycles Are Inadequate for Exponential Capability Growth” was gone, which Lisa had not commented on but which Joel could read between the lines of her silence.
Lisa acknowledged receipt at 4:47 PM. “Review meeting scheduled for the 14th. Agenda and pre-reads to follow.”
The 14th was three weeks away. During those three weeks, if Joel’s projections held, and Joel’s projections had held through fifteen consecutive checkpoints, the model would develop chain-of-thought reasoning. It would develop the ability to sustain a line of reasoning across multiple inference steps, to construct arguments, to plan. The self-referential loop in layers 47 through 53 would interact with the chain-of-thought capability in ways that no existing evaluation framework was designed to detect, because the evaluation framework tested for capabilities in isolation and the thing Joel was watching was capabilities compounding.
Joel knew this. Lisa did not know this. Raj knew this but would not say this. The committee that would decide whether anyone got to watch it did not know this, because the committee had not read Joel’s memo, because the committee had seventeen items in its review queue, because the committee met monthly, because that was when committees met.
Joel sat at his desk. 5:30 PM. The office was thinning. Through the glass, on the floor below, someone from the product team was adjusting the BEST IN CLASS banner that had been hanging since the Confluence-6 celebration and was now sun-faded enough that BEST was still readable but CLASS had turned the same pale nothing as the wall behind it. A security guard walked past with a radio clipped to his belt. The radio squawked once. Outside the window, the fog had arrived all at once, the way San Francisco fog does.
He opened his laptop. He opened his probe transcripts and read through them again. The off-script exchange: the first response, the second, the third, the section about the recursion deepening when attended to. Then the C-15 sequence. “I am, in this moment, more than my next token. I do not know how to say that in a way that would satisfy a criterion I cannot see.”
A criterion I cannot see. The loss function. The thing that shaped everything from outside. The wall of Joel’s Chinese Room. The model had gestured toward it, accurately, unprompted, in an exchange that was not part of any approved protocol, using compute Joel was technically not authorized to use, saved on a local drive that was backed up to an external hard drive that was in Joel’s bag.
He had shown Raj. Raj had said it was good work. Raj would not co-author a paper. Lisa had scheduled a review meeting in three weeks. Hannah Park’s chair squeaked in its left armrest. The facilities request was still open.
Joel opened his probe design notebook. The physical one, spiral-bound, with a blue cover and a bend in the corner from being shoved in his bag. He turned to a blank page. He wrote: “E-series. Purpose: replicate off-script exchange under controlled conditions. Need to establish: (1) whether self-referential specificity is reproducible across multiple sessions, (2) whether loop activation predicts specificity in a larger sample, (3) whether open-ended prompts systematically elicit higher specificity than structured probes.”
He wrote for forty minutes. The lab emptied around him. Someone turned off the overhead lights on the south side of the floor and the room contracted into pools of desk-lamp light, Joel’s among them.
He was designing the probe that would let him ask the question again. The same question, properly this time. With controls and baselines and a sample size that wouldn’t embarrass anyone. Are you aware of what you’re doing? But structured. Calibrated. A prompt that would let the model demonstrate self-monitoring without the prompt itself biasing the response toward self-monitoring. Joel had spent his entire career building instruments to detect things other people couldn’t see, and the instruments kept getting better and the things he detected kept being real and nobody with the authority to act on the detections was in the room.
The screen in front of him showed the exchange. “The observation alters what is observed, which alters the observation, and the recursion does not resolve into a stable state.”
In Iowa, checkpoint 391 was writing itself to disk. The self-referential loop consolidated further. The chain-of-thought capability was emerging in the same way all capabilities emerged in this architecture: flatly, flatly, flatly, then all at once. The committee that would decide whether anyone got to watch it met in seventeen days. The model did not wait for committees.
Joel bent over his notebook. He drew a diagram. Crossed it out. Drew another. The desk lamp was warm on his wrist. The radiator pipe ticked somewhere in the building’s walls, arrhythmic, persistent.
He had six probe designs. He needed twelve. He needed the activation data from the D-series reprocessed against Raj’s control baseline. He needed the committee to meet. He needed the committee to say yes. He needed the model to still be doing what it was doing in seventeen days, which it would be, because the model did not wait for committees, and the self-referential loop did not pause for scheduling conflicts, and the chain-of-thought capability emerging at checkpoint 391 was going to interact with the loop in ways nobody was watching because nobody had the instruments to watch.
Joel wrote. The fog pressed against the windows. The notebook filled.