Announcing Concordance: Mech Interp and Inference Mods
Token-level interventions give you control, reliability, and observability

Custom Inference Controls & Applied Mechanistic Interpretability
We're excited to officially introduce Concordance. We want to share what we’ve discovered so far, as well as where we’re headed.
Concordance is building a software suite for applying mechanistic interpretability (“mech interp”) techniques to AI systems, with the thesis that doing so will improve control, reliability, and observability, and widen the design space for more complex applications. While much of the mech interp field is focused on alignment concerns, we are primarily focused on how these tools can impact the developer experience, the end user experience, and the model’s performance.
Our journey starts with granular token-level interventions, and we will soon release an SDK to build custom inference modifications (“mods”) that enable conditional forced tokens, backtracking, smarter tool-calling, per-token sampling strategies, and deep logit analytics.
The SDK is built around Events, Actions, Mods, and Flows. Events are emitted at important steps in the inference process (Prefill, ForwardPass, Sampled, and Added). Actions are responses that can steer the inference process after each Event. The Actions are AdjustPrefill, ForceTokens, AdjustLogits, ForceOutput, ForceToolCalls, and Backtrack. Mods are modules that ingest Events and return Actions. Mods can hold arbitrary state and be strung together with Flows to create complex inference-time steering. Below are a few examples.
Example 1
A simple token intervention: watch for simple math expressions and, when an = is generated, evaluate the expression and force the result as the next tokens.
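One way to sketch this intervention as a plain function, without the SDK: detect when the latest token completes an arithmetic expression ending in `=`, and if so return the text to force. The `math_mod` name and the string-based hook are hypothetical stand-ins for the SDK's Event/Action machinery.

```python
# Hedged sketch of Example 1. The hook shape (plain strings in, forced
# text out) is a simplification of the SDK's Event/Action interface.
import re

# Matches a simple chain of integer arithmetic ending in '=', e.g. "12 + 7 ="
MATH_EXPR = re.compile(r"(\d+(?:\s*[+\-*/]\s*\d+)+)\s*=$")

def math_mod(generated_text, new_token):
    """When '=' completes a simple arithmetic expression, return the text
    to force as the next tokens; otherwise return None to keep sampling."""
    text = generated_text + new_token
    m = MATH_EXPR.search(text)
    if m is None:
        return None
    # eval is acceptable here only because the regex admits nothing but
    # digits, whitespace, and + - * /
    result = eval(m.group(1))
    return f" {result}"
```

Because the forced result comes from the evaluator rather than the sampler, the arithmetic in the output is correct by construction under the stated pattern.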
Example 2
Use the Concordance SDK’s self-prompting techniques to route tool calls at inference time (no client back-and-forth).
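The core of this example can be sketched as a Mod that spots a tool-call marker in the generated text, runs the tool, and forces the result back into the stream so the client never sees a round-trip. The `<tool>…</tool>` syntax, the `TOOLS` registry, and the `tool_mod` hook are all illustrative assumptions, not the SDK's actual call format.

```python
# Hedged sketch of Example 2: resolving a tool call during inference so
# the result is injected without a client round-trip. The marker syntax
# and registry here are illustrative, not the SDK's actual format.
import re

TOOLS = {
    "weather": lambda city: f"Sunny in {city}",  # stand-in for a real API
}

# Matches a just-completed call like <tool>weather("Paris")</tool>
CALL = re.compile(r"<tool>(\w+)\((.*?)\)</tool>$")

def tool_mod(generated_text):
    """If the model just emitted a tool-call marker, run the tool and
    return the result text to force into the stream; else None."""
    m = CALL.search(generated_text)
    if m is None:
        return None
    name, arg = m.group(1), m.group(2).strip('"')
    result = TOOLS[name](arg)
    return f"<result>{result}</result>"
```

Keeping the call-and-result loop inside inference is what removes the multi-step agent round-trip: the model's next tokens condition on the tool result immediately.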
Example 3
An implementation of “Reasoning with Sampling: Your Base Model Is Smarter Than You Think” as a Mod.
Once you treat LLMs like complete programs, many more control surfaces appear. At the inference level, you can deterministically guarantee outputs under programmed conditions, alter style and constraints in real time (without a multi-step agent loop), and make LLM applications behave like engineered systems.
We believe the industry is mostly overlooking the potential improvements found at the inference level, beyond cost and speed optimizations. We’re excited to bring these controls to developers and product teams building AI systems.
We’re beginning a closed alpha that will allow developers to use mods across common architectures (Llama, Gemma, DeepSeek, GPT-OSS, and more).
If you want to join us and experiment with inference mods, send us a DM.
