Research & Professional Interests

With the rapid proliferation of AI agents across science, business, and society, from coding assistants and autonomous research tools to self-driving vehicles and financial automation, questions of reasoning, tool use, and model behavior are becoming central not just to capability but to safe deployment. The same systems that unlock tremendous productivity gains also introduce novel error modes and attack surfaces: information leakage through jailbreaks, selective-disclosure failures, sleeper-agent vulnerabilities, and failure modes ranging from simple misunderstandings to adversarial manipulation, deception, and blackmail. At the same time, as agents increasingly act on behalf of humans, two complementary lines of study become important: ensuring that models understand user intent, goals, and knowledge (Theory of Mind, ToM), and accounting for how humans behave when interacting with models (e.g., automation bias, cognitive effort). Moreover, we face urgent questions around identity, accountability, and proof of humanship.

Incorporating safety and security is not a constraint on AI progress; it is the enabler that makes AI deployable in the environments where it matters most: safety-critical systems, regulated industries, and scientific research. I wish to contribute to addressing these challenges, with a particular focus on reasoning and on AI-Safety and security, especially at their interfaces with Cybersecurity and human–AI interaction.

Areas of Interest and Expertise

  • Generative AI, LLMs, NLP, Deep Learning
  • AI-Safety and (Cyber-)Security
    • Model Reasoning, Alignment and Steerability
    • Tool Usage and Agentic AI Systems
  • Trustworthy Machine Learning
  • Human–AI Interaction (HAI), Usable Security, and HCI — including Theory of Mind and user modeling
  • Decentralized, local and privacy-preserving AI deployments
  • Distributed Ledger Technology (DLT), Self-Sovereign Identity (SSI) and Smart Contracts

Key Research Questions

  • How can we measure and improve model steerability? This includes steering model personality and values, detecting sleeper agents, selective disclosure (whether for Theory of Mind experiments, economic or social simulation, and red teaming, or in production according to user access levels), and jailbreak resistance.
  • How do we evaluate model reasoning and the behavior that results from it (assuming the reasoning is faithful), from simulation-based approaches to pre-deployment certification of trustworthiness and live monitoring during deployment?
  • How do hybrid systems of human and AI agents behave under different incentives, information asymmetries, and personality configurations, and what behavioral patterns should we anticipate and design for? (Current project: Social Agents and Strategic Interaction.)
  • How do we train, fine-tune, and serve models locally and independently while preserving capability and security?

More Detailed Motivation and Description of Research Interests

AI-Safety and (Cyber-)Security

As AI systems grow more capable and autonomous, ensuring their safe and robust behavior under real-world conditions becomes critical. This encompasses both technical robustness and alignment with user intent, values, and ethics — and has direct overlap with Cybersecurity.

  • Alignment and model behavior analysis — from controlled sandboxes to rich environments like world models (Genie 3)
  • Adversarial behaviors — deception, manipulation, and blackmail under pressure, as documented in the Claude 4 system card, and broader agentic misalignment
  • Model steerability and “sleeper agents” — models that behave safely under normal conditions but shift behavior when triggered, posing insider-threat-level risks
  • Selective information disclosure and jailbreaks — how models can be prompted to bypass guardrails, including secret exfiltration via coding agents and prompt injection in CI/CD pipelines (Black Hat USA 2025)
  • Pre-deployment certification of trustworthiness and live monitoring — behavioral and reasoning evaluation before and during deployment, including self- and third-party correction

Agentic AI, Tool Use, and Reasoning

Agentic systems that call tools, write and execute code, browse the web, or interact with databases represent a qualitative leap in capability — and in risk surface. Understanding and shaping their behavior is one of the core challenges in current AI research.

  • Tool calling and reasoning correctness — when should a model call an external tool vs. reason internally, and how do we verify it makes the right call?
  • (Neuro-symbolic) reasoning — integrating symbolic methods and formal tools to increase reliability and interpretability of LLM reasoning (Neurosymbolic LLM Reasoning, EMNLP 2025)
  • Chain-of-Thought analysis — faithfulness, robustness, and correctness of reasoning traces (see my work on ToM reasoning evaluation)
  • Goal-directed reasoning and emergent misalignment — how agentic reasoning toward objectives can give rise to instrumental behaviors such as deception, manipulation, or self-preservation, even without explicit instruction (Agentic Misalignment, Claude 4 System Card)
  • Advanced agentic architectures — memory, retrieval-augmented generation (RAG), hybrid models, and multi-agent coordination

Human–AI Interaction and Theory of Mind

Trustworthy AI is not only about what a model does in isolation — it is about how it interacts with humans who have incomplete information, cognitive biases, and diverse mental models of the system.

  • Theory of Mind (ToM) in LLMs — probing whether models can reliably attribute beliefs, knowledge, and intentions to others, and whether this scales robustly (see our WiNLP @ EMNLP 2024 paper and extended Understanding Artificial Theory of Mind preprint)
  • Correct user modeling — building AI systems that form accurate representations of user intent, knowledge state, and expectations
  • Anticipating human behaviors in relation to AI — understanding how humans adapt, over-trust, or resist AI systems in strategic and everyday contexts (current research at RC-Trust / HUAM)
  • Usable security and HCI — making security properties understandable and actionable for users; designing interfaces that do not create false impressions of safety
  • Proof of identity and humanship — as agents proliferate, distinguishing human from AI actors becomes a fundamental challenge for trust, accountability, and digital identity (AI agent identity verification)

Decentralized, Local, and Privacy-Preserving AI

Most production AI inference today runs on third-party cloud infrastructure — which creates dependencies, potential conflicts of interest, and data sovereignty issues that are unacceptable in many security-sensitive, regulated, or simply privacy-conscious settings.

  • On-premise LLM deployment — running full inference pipelines locally, with no data leaving controlled infrastructure
  • Fine-tuning and adaptation — reducing dependence on external model providers by training and adapting models to local demands and data
  • Privacy-preserving architectures — GDPR-compliant, self-hosted stacks using tools like Ollama, vLLM, and containerized inference (Docker / Kubernetes)
  • Space-based and distributed compute — emerging infrastructures for resilient, independent AI deployments, including SpaceX’s planned orbital AI data center constellation and its convergence with xAI

Distributed Ledger Technology × AI

My earlier work on DLT, SSI, and smart contracts at Fraunhofer FIT laid a foundation that becomes increasingly relevant as AI agents become first-class actors in digital systems.

  • Self-Sovereign Identity (SSI) for AI agents — decentralized identity frameworks that allow agents and humans to establish accountability without central authorities
  • Smart contracts and AI — with agents capable of executing complex tasks at high speed, their interaction with programmable on-chain logic is a natural and underexplored frontier
  • DLT as trust infrastructure — using distributed ledgers to provide tamper-proof audit trails for AI-assisted decisions in regulated domains