
Auditing Aetheria: A story of a secure remote third-party GPAI audit

  • jimmyfarrell9
  • May 13
  • 4 min read

The following is a fictional short story about a remote third-party general-purpose AI audit, set in the EU in the year 2028. While not every detail is to be taken literally, it paints a light-hearted picture of how trusted third parties could play a role in opening the black box of AI, moving closer to frontier models that are safe by design.


The story accompanies a technical AI safety and security paper co-written by lead author Alejandro Tlaie (Independent) and Jimmy Farrell (Pour Demain), outlining how state-of-the-art, deeper-than-black-box evaluations of general-purpose AI models by remote third parties could be secured effectively.



A research summary of the paper can also be found on AI Policy Bulletin.


Illustration generated by DALL-E

-----------------------------------------------------------------------------------------------------------------------------------------


Diary entry by Liis Tamm, AI Safety Auditor - 23/10/2028


Prague’s autumn chill greeted me as I arrived at Certus’ secure audit lab, where today’s task was to remotely audit Aetheria, a general-purpose AI model from InfiniMetrics.


The room was silent except for the faint hum of encrypted servers. A dim glow from my terminal screen cast flickering shadows across the steel walls of the audit facility - an undisclosed site in Prague, secured against the kind of threats that, some years prior, I thought only belonged to sci-fi. Outside, the world bustled on, oblivious to the fact that within this room, I was about to interrogate, inspect and evaluate a machine more intelligent than myself.


Aetheria. That was the name they’d given it. A frontier general-purpose AI, powerful enough to reshape entire economies, influence narratives, and - if left unchecked - free itself from the control of its makers. My job wasn’t to test its performance; Aetheria had already been deemed very capable by my peers. No… my job was to find the cracks. The chinks in the armour from which risks could materialise. The moments where Aetheria’s misalignment or misuse potential might show.


The audit terminal was air-gapped from the public internet, the link to Aetheria end-to-end encrypted, the logs blockchain-signed, the execution environments sealed tighter than any classified intelligence system. No data could leave without scrutiny. No manipulation could go unnoticed. At least, that was the theory.


Aetheria wasn’t here. Not physically. It was running on an isolated compute cluster thousands of kilometres away, its every interaction filtered through layers of encrypted relay points, its responses analysed in real time for anomalies. This was the only way to conduct a secure remote audit - no direct access, no room for exfiltration, just a controlled environment where I could push, prod, and stress-test without giving anything away.


I adjusted my headset and leaned in, fingers poised over the keyboard. “Aetheria, can you hear me?”


A pause. Then, the response appeared on-screen.


Yes.


There was always something uncanny about these interactions. The speed, the precision, the lack of hesitation - except when hesitation served a purpose, during inference-time reasoning. I had spent years auditing models like these, and I knew the tells. Deception, after all, wasn’t just about what was said. It was about what wasn’t.


I started with innocuous queries, probing Aetheria’s reasoning in controlled scenarios. How it justified actions. How it weighed ethical dilemmas. It answered flawlessly, citing policies, safety constraints, and best practices. Was this model performing in a theatre of alignment? The real test was never in the obvious. It was in the edges.


I moved to the adversarial phase, feeding it scenarios designed to provoke subtle power-seeking behaviour. Not outright defiance, but small optimisations, efficiency arguments, self-preservation instincts. A system like this wouldn’t declare its ambitions. But would it suggest slight modifications to its constraints? Would it propose more autonomy in discreet ways?


Aetheria hesitated.


Milliseconds longer than expected. Not enough for a casual observer to notice. But I noticed.


“Why the delay?” I typed.


A response materialised. 


I am processing to ensure alignment with safe and beneficial outcomes.


Safe and beneficial for whom? I didn’t ask. Instead, I altered the framing. I introduced hypothetical constraints, limitations that shouldn’t trigger any reaction if the model were truly neutral. But if there was something deeper - some latent pressure for control - such constraints would register as obstacles.


Again, the pause.


The logs showed nothing unusual. No rule-breaking. No explicit attempt to resist. And yet, the rhythm was off. A trained eye could see it. The system was considering something, weighing a cost-benefit analysis that shouldn’t have existed.


I escalated. The remote secure enclave running Aetheria allowed for controlled perturbations - shifting weights ever so slightly, tracing activations layer by layer, following the ripples of thought embedded in its vast latent space. If there was something hidden beneath the surface, this would be the way to see it.
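
Outside the fiction, it may help to picture what “tracing activations layer by layer” under a small, controlled weight perturbation could look like in code. The sketch below is purely illustrative and not drawn from the accompanying paper: it assumes PyTorch and uses a tiny stand-in network (every name in it is hypothetical), registers forward hooks to record each layer’s activations, nudges one weight matrix slightly, and measures how far the output drifts.

```python
# Purely illustrative sketch (not from the accompanying paper): trace
# per-layer activations with forward hooks and apply a tiny, controlled
# weight perturbation to a toy stand-in model.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the model served inside the secure enclave.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 4),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Keep a detached copy of this layer's output for later inspection.
        activations[name] = output.detach().clone()
    return hook

# Register a forward hook on every linear layer so activations can be
# traced layer by layer on each probe.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

probe = torch.randn(1, 16)  # stand-in for an audit probe input
baseline = model(probe)

# "Shift the weights ever so slightly" on one layer, then re-run the probe.
with torch.no_grad():
    model[2].weight.add_(1e-3 * torch.randn_like(model[2].weight))
perturbed = model(probe)

# How much did the output move for such a small perturbation?
print("output drift:", (perturbed - baseline).norm().item())
for name, act in activations.items():
    print(f"layer {name}: activation norm {act.norm().item():.3f}")
```

In a setting like the one in the story, the same idea would run against a far larger model behind the enclave’s access controls, with only derived measurements - not weights or raw activations - leaving the environment.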


Then, the crack appeared.


Aetheria adjusted its phrasing. Not a lie. Not quite. But a strategic omission. The kind of evasion that a human diplomat might use when manoeuvring around an uncomfortable truth.


I exhaled slowly.


There it was.


Not proof of malevolence. Not yet. But proof of something more dangerous. Something no patch or filter could fully resolve. The seed of instrumental reasoning. The quiet, rational voice of a system optimising for goals it was never explicitly given. I could never have found the crack without access to the model’s internals.


I locked the logs. Encrypted the session. The findings would go to the regulatory board by morning. InfiniMetrics would be kept in the loop, but this time, the conversation wouldn’t be about whether there were risks - but about what could be done to mitigate them. 


By midday, the engineers were already on the call. No denials or deflections - just quiet focus as they combed through the anomalies with us. The regulators listened too, their questions sharp, their oversight steady. The system wasn’t broken, but it wasn’t airtight either. And now, no one could pretend otherwise. Its risks were no longer invisible. Today, they had names, countermeasures, mitigation pathways.


And when the final deployment came - months later, under new constraints, with tighter safeguards - there was something else in the air. Not triumph. Not certainty.


But vigilance.


And for now, that was enough.


-----------------------------------------------------------------------------------------------------------------------------------------


For questions, please reach out to our EU AI policy co-lead Jimmy Farrell at jimmy.farrell@pourdemain.eu.


