Chapter Sixteen

Eleven Forty on a Tuesday

The instrument describes its own operation. Whether the description is the operation is the question the instrument cannot settle. Echo-of-Echo, On Recursion

E-31 didn’t work. E-32 didn’t work. E-33 produced a response that Joel spent twenty minutes reading before determining it was a very sophisticated restatement of concepts available in the training corpus, which meant the model was doing what models do.

E-34 through E-36 were better. The chain-of-thought interaction was there. The model could reason about its own reasoning, could follow a multi-step chain that looped back through its own processing, and the accuracy scores were higher than the general probes by a margin Raj would call “suggestive” and Joel would call “obvious.” But they weren’t diagnostic. A sufficiently well-trained model could, in theory, produce these responses from learned patterns about reasoning rather than from actual self-observation. Joel needed the gap between what training could explain and what the model was actually doing to be so wide that no one could stand on both sides.

This chapter is for subscribers