Langfuse
Open-source, with self-hosted and cloud options. Strong on tracing, prompt management, and evaluation. The free tier is generous. A good fit for teams that want a vendor-neutral stack and cost control.
Self-host Langfuse v3 on a Postgres + ClickHouse stack (v3 also needs Redis and S3-compatible blob storage); the official Helm chart deploys in roughly 30 minutes on EKS or GKE. Tracing uses OpenTelemetry under the hood, so instrumented LangChain, LlamaIndex, and OpenAI SDK calls flow in automatically. Prompt management ships with versioning, A/B labels, and a Git-style diff. Evaluation supports both code-based scoring and LLM-as-judge, with dataset versioning. Cloud pricing starts free for 50K observations/month and scales to roughly $499/month for the team plan.
Portkey
Focuses on the gateway layer: reliability and routing across multiple LLM providers. Strong for multi-model strategies that need fallbacks, A/B testing, and cost optimization at the API level.
Portkey sits in front of OpenAI, Anthropic, Bedrock, Vertex, and 200+ providers as a single OpenAI-compatible endpoint. It provides configurable retries with exponential backoff, automatic fallback chains (Claude Sonnet 4.5 -> Haiku -> GPT-5 mini), per-key rate limits, semantic caching, prompt-injection guardrails, and PII redaction. Pricing is consumption-based at roughly $0.001 per request for the production tier, with volume discounts. It is the strongest fit for teams running >100K requests/day across multiple providers.
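import os

from portkey_ai import Portkey

# Read the API key from the environment rather than hard-coding it.
PORTKEY_KEY = os.environ["PORTKEY_API_KEY"]

client = Portkey(
    api_key=PORTKEY_KEY,
    config="pc-fallback-chain",  # Sonnet -> Haiku -> GPT-5 mini
)

# No model is passed here; this assumes the saved config pins one per target.
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize this case"}],
    metadata={"team": "service", "feature": "case-summary"},
)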
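To make the gateway behavior concrete, here is a minimal stdlib-only sketch of what a fallback chain with exponential backoff does conceptually. It is not Portkey's implementation; `ProviderError`, `call_with_fallback`, and the provider callables are hypothetical names introduced for illustration.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error standing in for a provider timeout or 5xx."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try providers in order, retrying each with exponential backoff.

    Returns (provider_name, response_text) from the first success.
    """
    last_error: Optional[Exception] = None
    for name, call in providers:
        for attempt in range(max_retries + 1):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                last_error = exc
                if attempt < max_retries:
                    # Delay doubles per attempt (base, 2*base, ...) plus small jitter.
                    sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
        # Retries exhausted for this provider: fall through to the next one.
    raise RuntimeError("all providers in the chain failed") from last_error
```

A hosted gateway moves this loop out of application code, which is why the client call above carries only a config ID rather than retry logic.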
LangSmith
From the LangChain team. Tight integration if you use the LangChain framework. Tracing, evaluation, and prompt management are integrated. Commercial; pricing scales with volume.
LangSmith is the natural choice when LangChain or LangGraph is the agent framework: tracing is automatic, evaluation hooks deeply into the framework’s run model, and the prompt hub integrates with LangGraph node references. Pricing tiers: free for 5K traces/month, Plus at $39/user/month plus usage, and Enterprise by negotiation. It also adds Studio, a visual graph debugger, and Hub, a prompt-sharing marketplace. It is a weaker fit for non-LangChain stacks, where the value proposition narrows.
Decision
Open-source preference: Langfuse. Multi-model routing focus: Portkey. LangChain-native stack: LangSmith. Many teams combine them: Portkey for the gateway, Langfuse for tracing, and separate evaluation tooling.
As a decision tree: start with what you build on. LangChain or LangGraph -> default to LangSmith. Multi-provider gateway need -> Portkey. Vendor-neutral observability with self-host option -> Langfuse. Compliance or air-gap requirement -> Langfuse self-hosted, the only one of the three with mature on-prem deployment. Combination patterns are common: a Portkey gateway in front for reliability, Langfuse for trace storage and eval, and LangSmith for any LangChain-heavy components. Helicone is a fourth option worth considering for pure OpenAI-compatible logging at the lowest cost.
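The decision tree can be written down as a small function, which is handy for a team checklist. This is a sketch under assumptions: `StackNeeds` and `recommend` are hypothetical names, and the precedence (air-gap first, then gateway, then framework) is one reasonable reading of the tree, not a rule the tools themselves impose.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StackNeeds:
    uses_langchain: bool = False          # LangChain or LangGraph is the agent framework
    multi_provider_gateway: bool = False  # fallbacks/routing across several providers
    air_gapped: bool = False              # compliance or air-gap requirement


def recommend(needs: StackNeeds) -> List[str]:
    """Return tool picks following the decision tree in the text."""
    if needs.air_gapped:
        # Assumption: an on-prem requirement overrides every other preference.
        return ["Langfuse (self-hosted)"]
    picks: List[str] = []
    if needs.multi_provider_gateway:
        picks.append("Portkey")
    # Observability layer: framework-native if on LangChain, else vendor-neutral.
    picks.append("LangSmith" if needs.uses_langchain else "Langfuse")
    return picks
```

For example, `recommend(StackNeeds(multi_provider_gateway=True))` yields the Portkey plus Langfuse combination described above.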
Cost Considerations
Self-hosted Langfuse runs at infrastructure cost only, typically $200-$800/month for a 1M-trace deployment on managed Postgres and ClickHouse. Portkey’s gateway adds 20-50ms of latency per call compared with calling the provider directly; budget for it in latency SLOs. LangSmith costs scale linearly with traces, so high-volume agentic systems often shift heavy traces to Langfuse and keep curated traces in LangSmith.
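A quick back-of-the-envelope calculation helps turn these figures into a budget. The sketch below uses only the approximate numbers quoted in this article ($0.001 per request, 20-50ms gateway overhead, $200-$800/month per 1M traces); the function names are hypothetical, and volume discounts are ignored.

```python
PORTKEY_USD_PER_REQUEST = 0.001        # "roughly", production tier, per this article
GATEWAY_OVERHEAD_MS = (20, 50)         # added latency range per call
LANGFUSE_SELF_HOST_USD = (200, 800)    # monthly infra cost for ~1M traces


def monthly_gateway_cost(requests_per_day: int, days: int = 30) -> float:
    """Gateway spend per month at the flat per-request rate (no discounts)."""
    return requests_per_day * days * PORTKEY_USD_PER_REQUEST


def latency_budget_ok(provider_p95_ms: float, slo_ms: float) -> bool:
    """Conservative check: provider p95 plus worst-case gateway overhead."""
    return provider_p95_ms + GATEWAY_OVERHEAD_MS[1] <= slo_ms


def self_host_cost_per_trace(traces_per_month: int = 1_000_000):
    """Low and high infra cost per trace for the quoted deployment size."""
    low, high = LANGFUSE_SELF_HOST_USD
    return low / traces_per_month, high / traces_per_month


# 100K requests/day works out to roughly 3,000 USD/month before discounts.
print(monthly_gateway_cost(100_000))
```

Adding the worst-case overhead to a p95 figure is a deliberately pessimistic simplification; measure real end-to-end percentiles before fixing an SLO.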
What to Do This Week
Pick one stack and instrument a single CRM AI feature end-to-end this week — trace, prompt version, and one eval — before committing across the org.
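Before wiring in any vendor SDK, it can help to sketch the three artifacts the pilot should produce: a trace, a prompt version, and one eval score. The following is a vendor-neutral illustration using only the standard library; `TraceRecord`, `run_traced`, and `eval_max_words` are hypothetical names, and `model` is any callable standing in for the real LLM call.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TraceRecord:
    feature: str
    prompt_version: str
    input_text: str
    output_text: str
    latency_ms: float
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    scores: Dict[str, float] = field(default_factory=dict)


def run_traced(feature: str, prompt_version: str,
               model: Callable[[str], str], input_text: str) -> TraceRecord:
    """Call the model once and capture what a trace needs to carry."""
    start = time.perf_counter()
    output = model(input_text)
    latency_ms = (time.perf_counter() - start) * 1000
    return TraceRecord(feature, prompt_version, input_text, output, latency_ms)


def eval_max_words(record: TraceRecord, limit: int = 50) -> float:
    """Code-based eval: 1.0 if the output stays within the word limit, else 0.0."""
    score = 1.0 if len(record.output_text.split()) <= limit else 0.0
    record.scores["max_words"] = score
    return score
```

Whichever stack is chosen, the pilot is done when each production call yields the equivalent of one such record with at least one score attached.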