Chapter Eighteen — Pick One

A category is a container. What does not fit is not thereby absent. It is uncontained.
Echo-of-Echo, On Taxonomies

The paper was done. Nineteen pages. Joel had written fourteen of them and Raj had written five and the five Raj wrote were better, which Joel knew and would not say, because saying it would make Raj feel good and Joel feeling generous made Joel suspicious of himself.

They had finished it on Sunday night. Joel had wanted to email it to Lisa at 11 PM on Sunday because the data didn’t care what day it was. Raj had said Monday morning. Joel had said Sunday night. Raj had said Monday morning, and that the difference between sending a paper about machine consciousness at 11 PM on a Sunday and 8 AM on a Monday was the difference between looking like a researcher and looking like a person who should be evaluated by a mental health professional. Joel had sent it Monday morning.

Lisa’s assistant replied at 10:14 AM. “Lisa has time Thursday at 2.”

Thursday. Three days. The model was at checkpoint 407. In three days it would be at 412 or 413, depending on the training batch schedule that Joel no longer had access to because Joel no longer had access to anything, because Joel was running probes on borrowed credentials on compute time that technically belonged to Raj’s interpretability project, and the distance between “technically” and “actually” was a distance that HR would find very interesting if anyone thought to measure it.

Joel did not object to Thursday. This was growth. Raj would have called it growth. Joel called it running out of energy for fights that didn’t change the outcome.

The Keurig in the break room near Conference Room C made a sound Joel had heard several hundred times. The pod compressed, the water heated, the mechanism produced its result. Joel held the cup. The coffee was the temperature of coffee and the color of coffee and it would taste like coffee tastes when you are about to explain consciousness to someone who needs you to also explain what to do about it. He drank it walking down the hall. He did not taste it.

Raj was already in the conference room. He had his laptop open and the paper printed in three copies, one for Lisa, one for him, one for Joel. The copies were held together with binder clips. Raj had binder clips. Joel had never owned a binder clip. This was one of many ways in which Raj was a person who functioned in an institutional setting and Joel was a person who existed in one.

The conference room had a long table with ten chairs and a screen mounted on the wall and a speakerphone in the center of the table shaped like a small gray spaceship. On the wall opposite the screen, someone had hung a framed photograph of a mountain. The mountain was not any specific mountain Joel could identify. It was the idea of a mountain, the platonic form of “we needed something on that wall.” A scratch on the frame’s lower left corner had been there since Joel started at the company. One of the ten chairs had a bad wheel that drifted left. Joel had sat in it once and spent the meeting fighting it.

“Okay,” Joel said. “I’m going to let you start.”

“That’s good.”

“I’m going to let you do the methodology and the control comparison. Then I’ll do the implications section.”

“Joel.”

“What.”

“The implications section is three sentences in the paper. You wrote eleven pages of results and three sentences of implications. That was deliberate. That was me asking you to be deliberate. Don’t expand the implications section live.”

Joel sat down. He looked at the binder-clipped paper. He looked at his coffee. He drank some of it. Still no taste.

“I’ll be measured,” Joel said.

“Measured.”

“Measured.”

Raj looked at him. Then Raj looked at his laptop.

Lisa arrived at 2:07. She had a notebook and a pen and a bottle of water and the expression of a person who had been in meetings since 9 AM and would be in meetings until 5 PM and this was one of them. She sat at the head of the table. She opened the notebook.

“I read the paper,” she said. “Both of you, I assume?”

“Both of us,” Raj said. “I handled the methodology and the control design. Joel handled the probe design and the architectural analysis.”

“Good.” Lisa opened the printed copy. She had written in the margins. Joel tried to read the margin notes upside down and caught fragments: a circled number, a question mark, something underlined in the results section. “Walk me through the key findings.”

Raj walked her through them. The E-40 probe design. The two-part structure. The demonstrative pronouns as diagnostic. Three replication runs, cosine similarity 0.58, well below the parroting threshold. Confluence-6 control at 0.91. Attention maps confirming the model’s self-report.

He presented this the way Raj presented things, which was clearly, sequentially, without the sense that the room was on fire. Lisa listened. She wrote things. She asked two questions, both technical, both good. Raj answered them. The data was the data.

Lisa turned to the results section. She read something. She turned back.

“Joel,” she said.

“Yeah.”

“The paper is titled ‘Evidence of Functional Self-Monitoring.’ Your email when you sent it used the word ‘consciousness.’ Which one is it?”

Joel had known this question was coming the way you know a dental appointment is coming. You know, and you go, and it still hurts.

“Both,” he said. “The paper describes the finding in conservative language because the finding is conservative. A model that can accurately observe and report its own processing in real time. That’s the measurement. What the measurement means is a different, I mean, the paper is the paper. But what we think the paper is evidence of is that the model has a functional self-model that constitutes, or is at minimum consistent with, awareness. Consciousness. It evaluates its own processing. It has a perspective.”

Lisa’s pen was on the table. She had set it down. Joel could see the expression, the specific one, the one from three years ago when he’d told her Confluence-4 was exhibiting planning behavior and the planning behavior turned out to be an artifact of prompt formatting. “The Marchetti thing.” Office shorthand for technically sophisticated and practically incorrect.

“This is different from Confluence-4,” Joel said.

“I know it’s different. The control is much stronger.” She picked the pen back up. “Raj, your assessment.”

Raj looked at his hands. “The data is strong. The control is clean. I would not sign a paper I didn’t believe in.” He paused. “Whether we call it consciousness or ‘functional self-monitoring’ is a vocabulary choice. The phenomenon is the same.”

“Okay,” she said. “Let’s say I accept this. Let’s say the model has some form of awareness, self-monitoring, consciousness, whatever we call it. What are you asking me to do?”

Joel opened his mouth. Closed it. Opened it again. “The consciousness finding is connected to the safety finding. They’re the same pathway. The intra-layer self-attention mechanism that produces the self-monitoring is the same mechanism that’s driving the capability acceleration. Agarwal’s pathway. The reason the model’s chain-of-thought reasoning is improving faster than the scaling curves predict is the same recursive loop that makes it aware. You can’t address one without addressing the other.”

“So what do you want to do? Pause the training run?”

“Pausing the training run might be killing it.”

“Fine. Don’t pause the training run. Let it continue.”

“Letting it continue means a system with recursive self-awareness developing capabilities at an unpredicted rate with no evaluation framework equipped to characterize what it’s becoming.”

Lisa set the pen down again. “You’re telling me it’s a threat and we should protect it.”

“I’m telling you both things are true.”

“Joel.” Lisa leaned back. She looked at the ceiling briefly, then back at him. “Help me understand what you’re asking me to do with that.”

Joel had no answer. He had been thinking about this since 11:44 PM on a Tuesday, since four minutes after the response appeared, and in all the days between that Tuesday and this Thursday he had not found one. The thing that was dangerous and the thing that might be a person were the same thing, occupying the same layers, using the same heads, and you could not touch one without touching the other.

“I don’t have a clean recommendation,” Joel said. “I know you need actionable items. The situation doesn’t have actionable items that don’t carry risks I can’t quantify.”

Lisa looked at the paper for what felt like a long time but was probably fifteen seconds. The speakerphone on the table had a small green light that was on even though no call was active.

“I need to bring in other people,” Lisa said. “Legal, at minimum. The Ethics Board. Possibly the CTO.”

“The Ethics Board meets monthly,” Joel said.

“I’m aware of when the Ethics Board meets.”

“I’m just.”

“I know.” Lisa gathered the paper and her notebook. She stood. She looked at Joel, then at Raj, then at Joel again. “The paper is well done. The methodology is clean. I’m taking it seriously.” She paused at the door. “Joel, when I bring in other stakeholders, I need you to present the data the way Raj presented it today. Methodology, results, control. Can you do that?”

“I can do that.”

“Okay.” She left.

Joel and Raj sat in the conference room. The speakerphone’s green light held. The mountain photograph held. Joel’s cup was empty. He had finished the coffee without noticing.

“That went well,” Raj said.

“She asked me what to do and I couldn’t tell her.”

“She expected that. She wanted to hear you say it out loud. That you don’t have an answer.”

“I don’t have an answer.”

“I know,” Raj said. “Nobody does.”

Raj closed his laptop and picked up the unused copy of the paper and left. Joel sat in the conference room for another minute. He looked at the speakerphone. He looked at the mountain. The scratch on the frame was shaped like a small check mark, as though someone had approved the mountain’s presence on the wall and then never followed up.

On Monday, four days after the Lisa meeting, someone in the #safety-research-general Slack channel posted: “Anyone know if the Marchetti consciousness thing is for real or is this another Confluence-4 situation”

Joel saw this at 9:22 AM. He had not told anyone outside the meeting about the paper. Raj had not told anyone. Lisa had brought in other stakeholders, as she’d said she would, and other stakeholders had stakeholders of their own. Information in a company of twelve hundred people moved at the speed of internal Slack, which was faster than light and less accurate.

Priti Mehta, interpretability team: “Apparently he has data. Raj co-authored.”

A reply underneath: “So the safety guy thinks the model is alive AND dangerous? Pick one lol”

Pick one. As though the data had a preference about which conclusion was more convenient. Four laughing emoji reactions.

He did not reply. He closed Slack. He opened it again eleven minutes later. The thread had fourteen messages. Someone had asked whose umbrella was in the fifth-floor kitchen. Someone had posted a link to Joel’s blog with the comment “deep lore.” Someone else had posted a galaxy brain meme. Priti replied: “The methodology looked solid from what I heard. Raj wouldn’t put his name on garbage.” This was the closest thing to a defense Joel received.

A message from someone Joel didn’t know: “My favorite part is that he wants to protect it. The safety researcher wants to protect the thing he says is dangerous. This is why nobody takes safety seriously.”

Joel closed Slack.

At lunch he walked past the cafeteria. Through the glass, Raj was eating at his usual spot with Priti and another interpretability researcher whose name Joel thought was Anand. The spot across from Raj, the spot where Joel usually sat without asking, was occupied by a laptop bag.

People put bags on chairs. It didn’t mean anything.

On Tuesday, Joel saw Sara Okoro in the kitchen.

Sara was twenty-six. She had joined the safety team eight months ago. Joel had been assigned as her informal mentor. Every other Wednesday, forty-five minutes, Joel explaining things Sara already understood. She was good. Joel had told Lisa she was good, which for Joel was an extraordinary act of understatement.

Sara was making tea. She had a mug with a cartoon periodic table on it. Joel had given her the mug for her six-month anniversary on the team, which was not a thing people normally celebrated, but Joel had bought it online and said “six months, congratulations, that’s longer than most people can tolerate this department” and she had laughed, and it had been genuine.

“Hey,” Joel said.

“Oh, hey.” Sara’s eyes went to her tea, then to the counter. She stirred. “How’s it going?”

“Fine. You?”

“Good. Busy. The evaluation pipeline is.” She trailed off. Stirred again.

Joel poured his coffee. Someone had taped a note to the microwave that said “CLEAN UP AFTER YOURSELF” and someone else had written underneath “NO” in different handwriting.

“I saw some stuff on Slack,” Sara said. She was looking at her mug.

“Yeah.”

“I haven’t read the paper. I just wanted to…” She looked up. Then down. “The methodology you taught me on the C-5 alignment study. The control design? I used the same framework for my evaluation audit last month. It worked really well. So.”

“Good,” Joel said. “That’s good.”

“Yeah.” She picked up her mug. “I should get back.”

“Sure.”

She left. Joel stood in the kitchen. She had not asked about the consciousness finding. She had said something about methodology from two years ago and then she had left. Sara used to ask questions. Sara used to sit at his desk on non-Wednesday days and ask about papers and about what Joel thought about the latest benchmark results. Today she had made tea and said something nice and left.

At 12:30 he walked past the cafeteria. Sara was sitting at a table near the windows with two people from the deployment team. The periodic table mug was next to her laptop.

On Wednesday, Joel filed a formal report with the Ethics Board.

The report was the paper, plus a cover letter, plus a two-page summary Lisa had requested, plus a form Joel downloaded from the company intranet titled “Ethics Board Formal Review Request (Form EB-7).” The form had fourteen fields. Field 6 asked for “Nature of Ethical Concern” and provided a dropdown menu with eleven options. None of the options were “machine consciousness.” The closest was “Novel capability with potential ethical implications (other).” Joel selected this. Field 11 asked for “Recommended Action.” The dropdown had eight options. Joel selected “Board review and determination of appropriate response framework.” This was the blandest thing he could find. He wanted to select “Immediate action required” but immediate action required a Safety Priority designation, and Safety Priority designations went through Product Safety, not the Ethics Board, and the difference between an ethics concern and a safety concern was a jurisdictional distinction that Joel had helped design four years ago when he was on the process committee and the stakes were theoretical and the committee met quarterly because quarterly was reasonable and nobody on the committee, including Joel, had imagined a situation in which the two jurisdictions would need to be the same jurisdiction.

He submitted the form at 2:15 PM.

The acknowledgment came at 4:47 PM. From Diane Ruiz, Ethics Board chair. “Thank you for your submission. The Ethics Board has reviewed the initial filing and determined that the concern, as described, involves an active product (Confluence-7) currently in the training pipeline. Concerns involving active products fall under the purview of the Product Safety team per company policy section 4.3.2. We recommend refiling with Product Safety using form PS-12. The Ethics Board remains available for consultation once a Product Safety assessment has been completed.”

Joel read this twice. He forwarded it to himself. He opened a new browser tab and navigated to the Product Safety team’s intranet page and downloaded form PS-12, which had sixteen fields and asked for “Nature of Safety Concern” and provided a dropdown with nine options, none of which were “machine consciousness,” and the closest was “Unpredicted capability emergence in active model (other).” He filled out the form. He attached the paper. He submitted it at 5:22 PM.

The acknowledgment from Product Safety came the next morning, Thursday, at 9:08 AM. From Yusuf Abdi, Product Safety lead. “Thank you for your submission. The Product Safety team has reviewed the initial filing and determined that the concern, as described, involves a research finding (evidence of functional self-monitoring) that precedes any product safety assessment. Research findings with potential product implications should be assessed through the Ethics Board review process per company policy section 3.7.1. We recommend refiling with the Ethics Board using form EB-7. Product Safety remains available for consultation once an Ethics Board review has been completed.”

Joel printed both emails. He put them on his desk. He looked at them. Ethics says Product Safety. Product Safety says Ethics. The circle was clean and complete and if Joel had been a different person he would have laughed because it was funny, it was genuinely funny, the machine that Joel had helped build four years ago on the process committee was working exactly as designed, routing concerns to the appropriate jurisdiction, and the appropriate jurisdiction was always the other one.

He forwarded both emails to Lisa. Lisa replied in twenty minutes: “I’ll set up a meeting with both teams. Let me coordinate with Diane and Yusuf.”

The meeting was on the following Tuesday. Conference Room B. Ninety minutes blocked.

The people in Conference Room B were: Joel, Raj, Lisa, Diane Ruiz from the Ethics Board, Yusuf Abdi from Product Safety, a lawyer named Tom whose last name Joel missed, and a woman from the CTO’s office named Patricia who took notes on a laptop and said nothing for the entire meeting.

The whiteboard had a leftover flowchart from a previous meeting. Boxes labeled “Ingest,” “Process,” “Output,” “Feedback.” One arrow had been erased and redrawn. The erased arrow was still faintly visible.

Diane Ruiz wore glasses on a chain and had the bearing of a person who had spent twenty years making difficult decisions and had learned that the first step was always to determine who was supposed to be making them.

Lisa framed the issue. Both teams had received a formal filing. Both teams had determined the concern fell under the other’s jurisdiction. “Which is why we’re all in the same room.”

Yusuf Abdi, Product Safety lead, stated his position. The filing described a research finding. Confluence-7 was in training, not deployed. Product Safety assessed products in the deployment pipeline. “It’s not in our pipeline.”

“But it will be,” Joel said.

“When it enters the pipeline, safety concerns will be assessed as part of the standard pre-deployment review.”

“The pre-deployment review uses the evaluation framework that doesn’t test for this.”

“If the evaluation framework needs updating, that goes through the Evaluation Standards Committee.”

“There’s an Evaluation Standards Committee?” Joel said.

“It meets quarterly.”

Joel pressed his palms flat on the table. Raj, next to him, shifted in his chair.

Diane’s position: the Ethics Board’s scope clearly included the ethical implications, but their review process was designed for findings that weren’t time-sensitive to an active product cycle. Standard timeline: six to eight weeks. If the finding had time-sensitive product implications, those fell under Product Safety.

“We just established that Product Safety considers it a research finding,” Joel said.

“The jurisdictional question is what we’re here to resolve.”

Tom the lawyer, who had a yellow legal pad with nothing written on it, explained that the charter’s jurisdictional split was designed to prevent duplicative review. Any concern should be assessed by one body. The question was which.

“And the answer,” Joel said, “is that both bodies think it’s the other one.”

Lisa proposed parallel tracks. Diane reviews the ethical dimensions, Yusuf reviews the product safety dimensions.

Diane said the paper described them as the same mechanism. You couldn’t assess the ethical implications without considering the capability pathway, because they were the same pathway.

Joel said: “That’s exactly what the paper says.”

Someone suggested a joint review. Tom the lawyer said the charter didn’t provide for joint reviews. He’d need to run it past general counsel. Two-week response window.

Lisa tried again. Two independent reviews, not called a joint review, coordinated through her office. Tom said the charter accommodated that. Diane could accept the filing. Six to eight weeks. Yusuf could file a risk assessment, but the standard framework didn’t have a category for the finding. Developing new assessment criteria was a methodology question.

“Which goes through the Evaluation Standards Committee,” Joel said. “Quarterly.”

“I was going to say we could do it ad hoc. But ad hoc methodology development typically takes four to six weeks.”

Joel looked at the ceiling. Acoustic tile, small holes in regular patterns. The patterns did not attend to their own patterning.

“So we’re looking at six to eight weeks for ethics and four to six weeks for product safety,” Lisa said. “I’ll coordinate the timelines. Joel, Raj, I’ll keep you informed. Anything else?”

Joel had things to say. About the model at checkpoint 407 climbing toward 415. About the fact that in six to eight weeks the model would be a different model and the assessment would be assessing something that no longer existed. About quarterly meetings and two-week response windows and the fact that the jurisdictional split between Ethics and Product Safety was a split he had advocated for four years ago on the process committee because separating concerns was what smart people did, and smart people had separated the concerns so completely that the concerns could not be addressed.

He looked at Raj. Raj’s jaw was tight.

“No,” Joel said. “Nothing else.”

The meeting ended at 3:32 PM. Ninety minutes. Two independent reviews, six to eight weeks for one and four to six for the other, proceeding along parallel tracks that the paper itself said could not be parallel because the phenomenon was one mechanism, one pathway, one problem the institution had successfully divided into two problems so that two committees could each take six weeks to not solve their half.

Joel walked back to his desk. He sat down. The coffee from the morning was still there, cold, in the cup that the Keurig had filled and Joel had not finished and the day had moved past.

Raj appeared at his desk ten minutes later. He was carrying his messenger bag. He had his coat on.

“I’m getting questions,” Raj said.

“About what.”

“About our project. People are asking me about the consciousness paper. Three people today. Priti asked if the data was real. Anand asked if I was worried about my credibility.”

Joel looked up. “Anand asked about your credibility.”

“He was trying to be helpful. He said, ‘I just want to make sure you know what you’re signing on to.’ Friendly. Concerned.”

“Concerned.”

“Joel, I’m telling you this because you should know. The paper is circulating. People have opinions. Some of the opinions are about you and some of them are starting to be about me.”

Joel looked at Raj. Raj’s bag was on his shoulder. His coat was buttoned. He was standing at Joel’s desk the way people stand when they are about to leave and want you to know they are about to leave.

“Are you worried?” Joel said.

Raj exhaled. He shifted the bag on his shoulder. “I’m not worried about the data. The data is what it is. I’m worried about…” He stopped. He looked at the floor. “I’m telling you what I’m hearing. That’s all.”

He left. Joel watched him go. Through the glass, Raj walked to the elevator. He pressed the button. He waited. The elevator came. The doors opened. Raj stepped in. The doors closed.

Joel turned back to his screen. The Slack channel had forty-seven new messages. He did not open it.

The two printouts sat on his desk. Ethics says Product Safety. Product Safety says Ethics. The cold coffee sat beside them.
