TrueFoundry research companion
What is it
- Started in 2021
- Began as an MLOps tool for cloud and on-prem
- Pivoted to LLMOps (LLM gateway, deployment, fine-tuning, RAG, prompt library)
- Pivoted further to MCP and agentic gateways and governance, in the cloud and on-prem
MLflow and TrueFoundry
- MLflow
- ****************
- MLflow is open source and widely used for tracking and evaluating model runs during training
- It has also expanded into the LLM space: prompt effectiveness, evals, and cost control
- Free, but hosting and managing it falls on the enterprise
- ---------------
- TrueFoundry
- ***********************
- Managed
- Commercial only
- The shift to LLM and agentic features seems strong
- What enterprises have been building internally over the last couple of years with the LangChain ecosystem, libraries, and utilities is now offered by TrueFoundry
- ----------------------
- Caution
- ************************
- The question to ask is the licensing cost
- Vendor lock-in
- Probably good for the bottom 60% of LLM capabilities
- However, if one goes to AgentCore or a similar cloud-based platform, these capabilities may come out of the box
- The "agent runtime" is likely to be managed and controlled by frontier companies
- So it is not clear how TrueFoundry will integrate into the remaining 40% of the space
Questions to ask, based on the above
- Cost factors
- Vendor lock-in
- How does it compare to native AWS and Azure offerings?
- How can it help in "operationalizing" AI in enterprises, even at mid and lower levels?
- What would an enterprise want in its "operational AI platform"?
Another detailed question
- If one uses, say, Bedrock, how much of this functionality overlaps?
- What would one be missing from Bedrock that TrueFoundry still provides?
- Can you list the "absolute" requirements one would want in an AI operational platform?
Key features of interest for licensing
- AI Gateway: Routes LLM calls, enforces rate limits and quotas, meters usage per team, applies semantic caching and guardrails
- MCP Gateway: Central registry for all MCP servers with per-server RBAC, OAuth 2.0, and environment grouping (dev/staging/prod)
- Agent Gateway: Governs multi-agent workflows, traces agent-to-tool calls, enforces agent-level access policies, supports human-in-the-loop approvals
- Prompt Management: Versioned prompt templates stored centrally and shared across teams
- Control Plane UI: Single dashboard for administering all three gateways
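As a rough illustration of what "enforces rate limits and quotas, meters usage per team" means at the gateway, here is a minimal Python sketch. It is not TrueFoundry's implementation or API; `TeamPolicy`, `Gateway`, `admit`, and `record_usage` are hypothetical names, and the sliding one-minute window and monthly token quota are assumptions chosen to keep the example small.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TeamPolicy:
    """Hypothetical per-team limits; field names are illustrative only."""
    requests_per_minute: int
    monthly_token_quota: int


@dataclass
class TeamState:
    tokens_used: int = 0                          # metered usage so far
    window: list = field(default_factory=list)    # timestamps of admitted calls


class Gateway:
    def __init__(self, policies):
        self.policies = policies
        self.state = {team: TeamState() for team in policies}

    def admit(self, team, estimated_tokens, now=None):
        """Return (allowed, reason) for one incoming LLM call."""
        now = time.time() if now is None else now
        policy = self.policies.get(team)
        if policy is None:
            return False, "unknown team"
        state = self.state[team]
        # Keep only the calls admitted within the last 60 seconds.
        state.window = [t for t in state.window if now - t < 60]
        if len(state.window) >= policy.requests_per_minute:
            return False, "rate limit exceeded"
        if state.tokens_used + estimated_tokens > policy.monthly_token_quota:
            return False, "quota exceeded"
        state.window.append(now)
        return True, "ok"

    def record_usage(self, team, tokens):
        """Meter actual token usage after the model responds."""
        self.state[team].tokens_used += tokens


if __name__ == "__main__":
    gw = Gateway({"search": TeamPolicy(requests_per_minute=2,
                                       monthly_token_quota=1000)})
    print(gw.admit("search", estimated_tokens=100))
```

The point of the sketch is that both checks live in one place in front of every model call, so teams cannot bypass them from application code.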
Hyperscalers like AWS vs TrueFoundry
- First of all, hyperscalers have very similar features
- Both offer agent registries, MCP registries, RBAC, administration, etc.
- TrueFoundry may be more focused, at a higher cost
OK, true competitors
- Only hyperscalers, honestly
- There are many LLM gateways, but none that cover the MCP and agentic layers along with RBAC and control planes
- The latter will likely be done by frontier labs
Summary
- If an enterprise adopts AWS AgentCore, for example, it already has most of the functionality, so just use it
- Or use frontier agentic runtimes: Copilot, AgentCore, Google, etc.
- The space is still evolving
DevOps capabilities of TrueFoundry summarized
- Deploy AI assets, LLM Inference Servers, MCPs, and Agents in their runtime environments
- Dashboards to manage the assets and deploy
- Observability dashboards
- Logging dashboards
- Access control of asset dashboards
- CI/CD pipelines for DevOps automation via Git and other tools
- Environment promotion: promote MCP servers or agents from dev -> staging -> prod with approval gates
- Version management and rollbacks: track deployed versions, roll back to a prior version on failure
- Health checks and alerting: liveness/readiness probes on deployed assets with failure alerts
- Cost attribution per asset: track which agent, MCP server, or team is driving LLM spend
- Rate limiting and quota enforcement per asset: cap consumption per agent or MCP server
- Secret and credential management: centralized storage of API keys, OAuth tokens, and credentials without hardcoding in agent or MCP server code
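Environment promotion with approval gates and version rollback can be pictured with a small Python sketch. This is an assumption-laden model, not how TrueFoundry implements it: `Asset`, `deploy`, `promote`, and `rollback` are hypothetical names, and the approval gate is reduced to a required approver name.

```python
from dataclasses import dataclass, field

ENVIRONMENTS = ["dev", "staging", "prod"]  # promotion order from the notes


@dataclass
class Asset:
    """A deployable asset (agent or MCP server); names are hypothetical."""
    name: str
    # Per-environment version history; the last entry is the live version.
    history: dict = field(
        default_factory=lambda: {env: [] for env in ENVIRONMENTS}
    )

    def live(self, env):
        versions = self.history[env]
        return versions[-1] if versions else None


def deploy(asset, version):
    """New versions always enter at dev."""
    asset.history["dev"].append(version)


def promote(asset, source, approved_by=None):
    """Copy the live version of `source` into the next environment.

    The approval gate is modeled as a required approver name (an assumption).
    """
    idx = ENVIRONMENTS.index(source)
    if idx == len(ENVIRONMENTS) - 1:
        raise ValueError("prod is the last environment")
    version = asset.live(source)
    if version is None:
        raise ValueError(f"nothing deployed in {source}")
    if not approved_by:
        raise PermissionError("promotion requires an approver")
    target = ENVIRONMENTS[idx + 1]
    asset.history[target].append(version)
    return target


def rollback(asset, env):
    """Drop the live version in `env` and return to the previous one."""
    versions = asset.history[env]
    if len(versions) < 2:
        raise ValueError("no prior version to roll back to")
    versions.pop()
    return versions[-1]
```

For example, after `deploy(asset, "1.1")` and two approved promotions, `rollback(asset, "prod")` returns the version that was live in prod before "1.1".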
Security features
- Authentication & Identity: every agent, user, and service account has a verified identity before touching any resource; no anonymous access to MCP servers or LLM endpoints
- Authorization & RBAC: fine-grained control over who can invoke which agent, which MCP tool, and which LLM, enforced at the gateway, not the application
- Secret & Credential Management: API keys, OAuth tokens, and service credentials never hardcoded; centrally stored, rotated, and audited
- Audit Logging & Non-repudiation: immutable, tamper-evident logs of every LLM call, tool invocation, and agent action: who did what, when, and with what result
- Data Residency & Sovereignty: control over which regions or infrastructure handle AI traffic and data, especially for HIPAA, ITAR, and EU AI Act compliance
- Input/Output Guardrails: PII detection, prompt injection detection, and content filtering applied at the gateway before reaching the model or tool
- Prompt Injection Defense: agents that call external tools or ingest external content are vulnerable to malicious instructions embedded in tool outputs; the gateway needs to detect and block these
- Blast Radius Containment: rate limits, quotas, and environment isolation (dev/staging/prod) so a compromised or runaway agent cannot escalate or exhaust resources across the enterprise
- MCP Server Trust Verification: ensuring agents only connect to registered, verified MCP servers and not rogue or shadow tool endpoints
- Zero Trust Networking: gateway-to-tool and agent-to-agent traffic is authenticated and encrypted in transit; no implicit trust based on network location
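Three of the items above (gateway-enforced RBAC, MCP server trust verification, and tamper-evident audit logging) can be sketched together in a few lines of Python. Everything here is hypothetical: `ROLE_GRANTS`, `REGISTERED_SERVERS`, `authorize`, `AuditLog`, and `gateway_call` are illustrative names, not any vendor's API. The hash chain is one common way to make a log tamper-evident. A real deployment would also need to anchor the latest hash somewhere the log writer cannot modify, since a chain alone does not reveal truncation of the newest entries.

```python
import hashlib
import json

# Hypothetical policy data: role -> set of (mcp_server, tool) pairs it may call.
ROLE_GRANTS = {
    "support-agent": {("crm-mcp", "lookup_customer")},
}
# Only servers in this registry are treated as trusted.
REGISTERED_SERVERS = {"crm-mcp"}


def authorize(role, server, tool):
    """Deny by default: the server must be registered and the role granted."""
    if server not in REGISTERED_SERVERS:
        return False
    return (server, tool) in ROLE_GRANTS.get(role, set())


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, record):
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"record": record, "prev": prev_hash,
                 "hash": self._digest(prev_hash, record)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; an edited entry, or one removed from the
        middle, makes verification fail."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


def gateway_call(log, role, server, tool):
    """Check policy at the gateway and record the decision either way."""
    allowed = authorize(role, server, tool)
    log.append({"role": role, "server": server, "tool": tool,
                "allowed": allowed})
    return allowed
```

Denied calls are logged as well as allowed ones, which is what lets the log answer "who tried what" and not only "who did what".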
Some quoted competitors
- AWS Bedrock + AgentCore
- Azure AI Foundry + Azure API Management
- Kong AI Gateway
- Cloudflare AI Gateway
- TrueFoundry
- Portkey
- LiteLLM
Few architecture pictures
LLM gateway arch

MCP Gateway arch

Skills registry

Control plane architecture

Control plane

Gateway plane

Compute plane

Security model

Working with IdPs
