Live

Runtime Intervention Engine

Token Generation Interventions

Open-source modified inference engine and SDK for token-level interventions during runtime.

Resources

Playground

Inspect token timelines and intervention behavior on live shared requests.

SDK Docs

Implement Python mods, action builders, and event-scoped control patterns.

GitHub

Open-source engine + SDK for practical runtime intervention workflows.

Research

Findings from token injection experiments across reasoning and non-reasoning models.

Core Runtime Features
Event-driven intervention surface for modifying generation behavior without model retraining.

Author Event-Driven Mods

Use the @mod decorator to respond to runtime events with an event-scoped ActionBuilder.

runtime_control.py
1
2
3
4
5
@mod
def runtime_control(event, actions: ActionBuilder, tokenizer):
if isinstance(event, (Prefilled, ForwardPass, Added)):
return actions.noop()
return actions.noop()

Adjust Prefill

Rewrite or replace prompt context before generation begins.

prefill_patch.py
1
2
3
4
5
6
7
8
@mod
def patch_prefill(event, actions: ActionBuilder, tokenizer):
if isinstance(event, Prefilled):
replacement = "System: answer concisely\n" + event.prefilled_text
return actions.adjust_prefill(
tokenizer.encode(replacement)
)
return actions.noop()

Adjust Logits

Mask or reweight token probabilities during forward pass for constrained outputs.

logits_guard.py
1
2
3
4
5
6
7
8
@mod
def logits_guard(event, actions: ActionBuilder, tokenizer):
if isinstance(event, ForwardPass):
logits = event.logits.clone()
blocked = tokenizer.encode("unsafe")[0]
logits[blocked] = -1e9
return actions.adjusted_logits(logits)
return actions.noop()

Force + Backtrack

Inject forced continuations, or rewind and replace problematic spans at runtime.

trajectory_rewrite.py
1
2
3
4
5
6
@mod
def revise_suffix(event, actions: ActionBuilder, tokenizer):
if isinstance(event, Added) and event.step > 30:
replacement = tokenizer.encode("Let's restate clearly:")
return actions.backtrack(n=8, replacement=replacement)
return actions.noop()

Tool Calls

Emit structured tool call payloads from mod logic when user or context conditions match.

tool_router.py
1
2
3
4
5
6
7
8
@mod
def route_tools(event, actions: ActionBuilder, tokenizer):
if isinstance(event, Prefilled):
return actions.tool_calls([{
"name": "search_docs",
"arguments": {"query": "token injection"}
}])
return actions.noop()

Self-Prompt Strategies

Run constrained generation flows with strategy constructors and completion controls.

flow_engine.py
1
2
3
4
5
6
7
8
9
10
11
12
runtime_tool_question = FlowQuestion(
name="runtime_tool_router",
prompt="Should I call a runtime tool?",
responses=["yes", "no"],
erase_mode="all",
)
@mod
def runtime_flow(event, actions: ActionBuilder, tokenizer):
if isinstance(event, (Prefilled, ForwardPass, Added)):
return ENGINE.handle_event(event, actions, tokenizer)
return actions.noop()
1 / 6
Contact
No spam. Builder-focused updates only.