Telemetry

msgFlux integrates with msgtrace-sdk — a lightweight wrapper around OpenTelemetry from the msg* library family — to provide production-grade observability for your AI systems.

All modules, agents, and tools are automatically instrumented. You can also add custom instrumentation to your own code using Spans.

✦₊⁺ Overview

The telemetry pipeline works at two levels:

Automatic — every Module, Agent, Tool, and functional operation emits spans with no extra code.
Manual — use Spans.instrument() / Spans.ainstrument() to trace your own functions.

Telemetry is disabled by default and has zero overhead when turned off.

from msgflux import Spans

1. Enabling Telemetry

Set the environment variable before running your application:

export MSGTRACE_TELEMETRY_ENABLED=true

Or configure it programmatically at startup:

from msgflux.telemetry.config import configure_msgtrace

configure_msgtrace(enabled=True)

2. Environment Variables

msgtrace-sdk (transport & exporter)

These variables control how traces are collected and exported.

Variable	Type	Default	Description
`MSGTRACE_TELEMETRY_ENABLED`	`bool`	`false`	Master switch — enable/disable all telemetry
`MSGTRACE_EXPORTER`	`str`	`"console"`	Exporter backend: `"console"` or `"otlp"`
`MSGTRACE_OTLP_ENDPOINT`	`str`	`"http://localhost:4318"`	OTLP collector endpoint (gRPC/HTTP)
`MSGTRACE_SERVICE_NAME`	`str`	`"msgflux"`	Service name shown in your tracing backend
`MSGTRACE_SAMPLING_RATIO`	`str`	`None`	Sampling ratio, e.g. `"0.5"` for 50%
`MSGTRACE_CAPTURE_PLATFORM`	`bool`	`true`	Attach OS/platform metadata to spans
`MSGTRACE_MAX_RETRIES`	`int`	`3`	Max retries on export failure

msgflux (what to capture)

Fine-grained control over the data included in spans.

Variable	Type	Default	Description
`MSGFLUX_TELEMETRY_CAPTURE_TOOL_CALL_RESPONSES`	`bool`	`true`	Include tool return values in spans
`MSGFLUX_TELEMETRY_CAPTURE_AGENT_PREPARE_MODEL_EXECUTION`	`bool`	`false`	Capture agent state, system prompt and tool schemas before each LM call
`MSGFLUX_TELEMETRY_CAPTURE_STATE_DICT`	`bool`	`false`	Attach the full `state_dict()` of a module to its span

3. Console Exporter (development)

The default exporter prints spans to stdout — useful during local development.

export MSGTRACE_TELEMETRY_ENABLED=true
# MSGTRACE_EXPORTER defaults to "console"

import msgflux as mf
import msgflux.nn as nn
from msgflux import Spans

model = mf.ChatCompletion("openai/gpt-4.1-mini")
agent = nn.Agent("MyAgent", model)
result = agent("What is the capital of France?")
# Span output will be printed to the console

4. OTLP Exporter (production)

Send traces to any OpenTelemetry-compatible backend (Jaeger, Tempo, Honeycomb, Datadog, etc.):

export MSGTRACE_TELEMETRY_ENABLED=true
export MSGTRACE_EXPORTER=otlp
export MSGTRACE_OTLP_ENDPOINT=http://localhost:4318
export MSGTRACE_SERVICE_NAME=my-ai-app

Quick start with Jaeger

docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

Then open http://localhost:16686 to browse traces.

5. Instrumenting Your Own Code

Use Spans.instrument() to add tracing to any function without changing its signature.

Sync functions

from msgflux import Spans

@Spans.instrument()
def fetch_documents(query: str) -> list[str]:
    # This function now emits a span automatically
    ...

Async functions

@Spans.ainstrument()
async def embed_and_store(texts: list[str]) -> None:
    ...

Custom span attributes

Pass arbitrary key/value pairs to attach metadata to the span:

@Spans.ainstrument(attributes={"pipeline.stage": "retrieval", "index": "products"})
async def retrieve(query: str) -> list[str]:
    ...

Context manager (manual span)

For finer control, use the context manager API directly:

from msgflux import Spans
from opentelemetry.trace import Status, StatusCode

with Spans.init_flow("my-pipeline") as span:
    try:
        result = run_pipeline()
        span.set_status(Status(StatusCode.OK))
    except Exception as e:
        span.record_exception(e)
        span.set_status(Status(StatusCode.ERROR, str(e)))
        raise

Async version:

async with Spans.ainit_flow("my-pipeline") as span:
    result = await run_pipeline_async()
    span.set_status(Status(StatusCode.OK))

6. Automatic Instrumentation

Modules and Agents

Every call to a Module subclass automatically creates a span. When the module is the entry point (no parent span), a flow span is created; nested modules get module spans.

import msgflux as mf
import msgflux.nn as nn

model = mf.ChatCompletion("openai/gpt-4.1-mini")
agent = nn.Agent("Summarizer", model)

# Emits: flow > module(Summarizer) > model call
result = agent("Summarize this document...")

Each span records:

Module name and type
Execution status (OK / ERROR)
Exception details on failure
Full state_dict() when MSGFLUX_TELEMETRY_CAPTURE_STATE_DICT=true

Tools

LocalTool and MCPTool emit spans with:

Tool name, description, and type
Tool call ID (for correlation with the LM call)
Input arguments (JSON-encoded)
Execution type (local or remote)
Protocol (mcp for MCP tools)
Return value when MSGFLUX_TELEMETRY_CAPTURE_TOOL_CALL_RESPONSES=true

Functional API

All operations in msgflux.nn.functional are automatically traced:

Function	Description
`map_gather` / `amap_gather`	Map over args and gather results
`scatter_gather` / `ascatter_gather`	Scatter inputs and gather outputs
`bcast_gather`	Broadcast and gather
`Inline`	DSL workflow execution
`spawn`	Background execution

7. Programmatic Configuration

You can configure everything at runtime instead of using environment variables:

from msgflux.telemetry.config import configure_msgtrace

configure_msgtrace(
    enabled=True,
    exporter="otlp",
    otlp_endpoint="http://otel-collector:4318",
    service_name="my-ai-app",
    sampling_ratio="1.0",
    capture_platform=True,
    max_retries=3,
)

Note

Call configure_msgtrace() before creating any modules or agents to ensure all spans are captured correctly.

8. Sampling

Use MSGTRACE_SAMPLING_RATIO to control the fraction of traces that are recorded:

# Record 10% of all traces
export MSGTRACE_SAMPLING_RATIO=0.1

# Record everything (default behaviour when unset)
export MSGTRACE_SAMPLING_RATIO=1.0

9. Reducing Span Payload Size

For high-throughput systems, disable verbose captures to keep span sizes small:

# Disable tool response capture
export MSGFLUX_TELEMETRY_CAPTURE_TOOL_CALL_RESPONSES=false

# Disable full agent state capture
export MSGFLUX_TELEMETRY_CAPTURE_AGENT_PREPARE_MODEL_EXECUTION=false

# Disable module state dict capture (already off by default)
export MSGFLUX_TELEMETRY_CAPTURE_STATE_DICT=false