<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>SentryML</title><description>Engineering-focused coverage of ML observability and MLOps. Model monitoring, drift detection, training/serving skew, debugging production model failures, evaluation pipelines, and the tooling that actually works at scale.</description><link>https://sentryml.com/</link><language>en</language><item><title>Model Monitoring Tools in 2026: What&apos;s Changed, What to Use Now</title><link>https://sentryml.com/posts/model-monitoring-tools-2/</link><guid isPermaLink="true">https://sentryml.com/posts/model-monitoring-tools-2/</guid><description>The model monitoring tools landscape shifted in 2026 — WhyLabs shut down, LLM observability went mainstream, and open source caught up to managed SaaS. Here&apos;s the current map.</description><pubDate>Mon, 22 Jun 2026 00:00:00 GMT</pubDate><category>model-monitoring</category><category>drift-detection</category><category>mlops</category><category>tooling</category><category>llm-observability</category><author>SentryML Editorial</author></item><item><title>Predicting Model Behavior Before Release: What OpenAI&apos;s Deployment Simulation Means for MLOps</title><link>https://sentryml.com/posts/weekly-predicting-model-behavior-before-release-by-simulating-deplo-2/</link><guid isPermaLink="true">https://sentryml.com/posts/weekly-predicting-model-behavior-before-release-by-simulating-deplo-2/</guid><description>OpenAI&apos;s Deployment Simulation replays 1.3M real conversations through candidate models before release, hitting 1.5x median error on safety predictions and surfacing behaviors like &apos;calculator hacking&apos; that conventional evals never find.</description><pubDate>Mon, 22 Jun 2026 00:00:00 
GMT</pubDate><category>deployment-simulation</category><category>llm-safety</category><category>pre-deployment-evaluation</category><category>mlops</category><category>shadow-testing</category><category>model-behavior</category><author>SentryML Editorial</author></item><item><title>ML Model Deployment: Serving Frameworks, KV Cache, and the Latency Metrics That Matter</title><link>https://sentryml.com/posts/ml-model-deployment-2/</link><guid isPermaLink="true">https://sentryml.com/posts/ml-model-deployment-2/</guid><description>Once a model clears staging, the serving stack decision determines whether you hit your latency SLAs or spend a sprint chasing p99 spikes. Here&apos;s what to evaluate and what to instrument.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate><category>mlops</category><category>model-deployment</category><category>inference</category><category>latency</category><category>serving</category><author>SentryML Editorial</author></item><item><title>Replaying Production to Catch Drift: Inside OpenAI&apos;s Deployment Simulation Framework</title><link>https://sentryml.com/posts/weekly-predicting-model-behavior-before-release-by-simulating-deplo/</link><guid isPermaLink="true">https://sentryml.com/posts/weekly-predicting-model-behavior-before-release-by-simulating-deplo/</guid><description>OpenAI&apos;s deployment simulation replays 1.3M de-identified production conversations through a candidate model pre-release, catching behavior shifts static benchmarks miss. 
Here&apos;s how it works and what it means for teams running their own models.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate><category>mlops</category><category>evaluation</category><category>drift</category><category>model-monitoring</category><category>pre-deployment</category><category>safety</category><author>SentryML Editorial</author></item><item><title>Federated Learning in Production: What Substra Actually Does for Privacy-Preserving ML</title><link>https://sentryml.com/posts/creating-privacy-preserving-ai-with-substra/</link><guid isPermaLink="true">https://sentryml.com/posts/creating-privacy-preserving-ai-with-substra/</guid><description>Owkin&apos;s Substra framework keeps training data local while sharing only model weights — but federated architectures break standard MLOps assumptions around</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate><category>federated-learning</category><category>privacy</category><category>mlops</category><category>tooling</category><category>monitoring</category><category>data-governance</category><author>SentryML Editorial</author></item><item><title>OpenAI Tops Gartner&apos;s Coding-Agent Quadrant. Now You Own a Production ML System.</title><link>https://sentryml.com/posts/openai-named-a-leader-in-enterprise-coding-agents-by-gartner/</link><guid isPermaLink="true">https://sentryml.com/posts/openai-named-a-leader-in-enterprise-coding-agents-by-gartner/</guid><description>Gartner named OpenAI a Leader in its first Magic Quadrant for Enterprise AI Coding Agents. 
The operational story is the part the press release skips: a</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><category>llm-observability</category><category>drift</category><category>monitoring</category><category>evals</category><category>mlops</category><author>SentryML Editorial</author></item><item><title>The ML Monitoring Metrics Taxonomy: Drift, Data Quality, and Model Decay</title><link>https://sentryml.com/posts/ml-monitoring-metrics-taxonomy-drift-data-quality-decay/</link><guid isPermaLink="true">https://sentryml.com/posts/ml-monitoring-metrics-taxonomy-drift-data-quality-decay/</guid><description>A reference taxonomy of the signals that actually tell you a production ML system is failing — input drift, prediction drift, concept drift, data quality</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>monitoring</category><category>drift</category><category>data-quality</category><category>model-decay</category><category>observability</category><category>metrics</category><author>SentryML Editorial</author></item><item><title>OpenTelemetry GenAI Semantic Conventions: Instrument LLM Apps</title><link>https://sentryml.com/posts/opentelemetry-genai-semantic-conventions-instrumenting-llm-apps/</link><guid isPermaLink="true">https://sentryml.com/posts/opentelemetry-genai-semantic-conventions-instrumenting-llm-apps/</guid><description>How the OpenTelemetry GenAI semantic conventions standardize spans, metrics, and events for LLM apps, what they skip, and how to instrument without rework.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><category>observability</category><category>opentelemetry</category><category>llm-security</category><category>agents</category><category>monitoring</category><category>instrumentation</category><category>mlops</category><author>SentryML Editorial</author></item><item><title>Model Monitoring in Production: A Four-Layer 
Framework</title><link>https://sentryml.com/posts/model-monitoring-2/</link><guid isPermaLink="true">https://sentryml.com/posts/model-monitoring-2/</guid><description>Model monitoring covers more than drift detection. Here&apos;s the four-layer framework — software health, data quality, model quality, business KPIs — wired</description><pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate><category>model-monitoring</category><category>drift-detection</category><category>mlops</category><category>evidently</category><category>psi</category><author>SentryML Editorial</author></item><item><title>Model Monitoring for LLM Inference: Metrics Your APM Can&apos;t See</title><link>https://sentryml.com/posts/model-monitoring-3/</link><guid isPermaLink="true">https://sentryml.com/posts/model-monitoring-3/</guid><description>Model monitoring for LLM APIs requires a different metric set than traditional ML. Here&apos;s the signal hierarchy — TTFT, KV cache hit rate, output length</description><pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate><category>model-monitoring</category><category>llm-observability</category><category>ttft</category><category>drift-detection</category><category>mlops</category><category>vllm</category><author>SentryML Editorial</author></item><item><title>SmithDB and Five Other Things LangChain Shipped at Interrupt 2026</title><link>https://sentryml.com/posts/langchain-interrupt-2026-smithdb-announcements/</link><guid isPermaLink="true">https://sentryml.com/posts/langchain-interrupt-2026-smithdb-announcements/</guid><description>LangChain&apos;s Interrupt 2026 surfaced a purpose-built trace database, a context version-control system, and an automated failure-triage engine.</description><pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate><category>agent-observability</category><category>tracing</category><category>langsmith</category><category>mlops</category><category>infra</category><author>SentryML Editorial</author></item><item><title>LLM Benchmarks in 2026: 
Which Still Discriminate, and How to Run</title><link>https://sentryml.com/posts/llm-benchmarks-2/</link><guid isPermaLink="true">https://sentryml.com/posts/llm-benchmarks-2/</guid><description>Static benchmarks like MMLU and HumanEval have saturated for frontier models. Here&apos;s which LLM benchmarks still produce signal, why contamination is worse</description><pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>benchmarks</category><category>evaluation</category><category>model-selection</category><category>mlops</category><category>monitoring</category><author>SentryML Editorial</author></item><item><title>Watermarking Should Be Treated as a Monitoring Primitive</title><link>https://sentryml.com/posts/watermarking-should-be-treated-as-a-monitoring-primitive/</link><guid isPermaLink="true">https://sentryml.com/posts/watermarking-should-be-treated-as-a-monitoring-primitive/</guid><description>A new paper reframes LLM watermarking from an adversarial evasion problem into a monitoring infrastructure question.</description><pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate><category>watermarking</category><category>monitoring</category><category>provenance</category><category>attribution</category><category>mlops</category><author>SentryML Editorial</author></item><item><title>LLM Fine Tuning: Methods, Training Data, and Evaluation</title><link>https://sentryml.com/posts/llm-fine-tuning-2/</link><guid isPermaLink="true">https://sentryml.com/posts/llm-fine-tuning-2/</guid><description>A practitioner&apos;s guide to LLM fine tuning: how to pick between SFT, LoRA, and DPO, what your training data actually needs, and how to validate a</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>fine-tuning</category><category>mlops</category><category>lora</category><category>dpo</category><category>evaluation</category><author>SentryML Editorial</author></item><item><title>LLM Testing: A Guide to Evals, 
Metrics, and Production Monitoring</title><link>https://sentryml.com/posts/llm-testing/</link><guid isPermaLink="true">https://sentryml.com/posts/llm-testing/</guid><description>LLM testing spans offline evals, CI gate checks, and live production monitoring — three distinct jobs that need different tools.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>evaluation</category><category>monitoring</category><category>mlops</category><category>testing</category><category>observability</category><author>SentryML Editorial</author></item><item><title>ML Testing: A Checklist from Pre-Train Checks to Production Drift</title><link>https://sentryml.com/posts/ml-testing/</link><guid isPermaLink="true">https://sentryml.com/posts/ml-testing/</guid><description>ML testing spans pre-train sanity checks, behavioral validation, data integrity, and continuous drift monitoring.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><category>ml-testing</category><category>model-validation</category><category>drift-detection</category><category>mlops</category><category>data-quality</category><author>SentryML Editorial</author></item><item><title>Choosing MLOps Tools: A Decision Framework for Production Teams</title><link>https://sentryml.com/posts/mlops-tools-2/</link><guid isPermaLink="true">https://sentryml.com/posts/mlops-tools-2/</guid><description>Picking the wrong MLOps tools costs months of migration work. 
Here&apos;s how to evaluate experiment tracking, orchestration, monitoring, and serving options</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>tooling</category><category>mlops-tools</category><category>model-monitoring</category><category>orchestration</category><author>SentryML Editorial</author></item><item><title>When Embedding-Based Defenses Fail in Multi-Agent LLMs</title><link>https://sentryml.com/posts/embedding-defenses-fail-multi-agent-llm-logging/</link><guid isPermaLink="true">https://sentryml.com/posts/embedding-defenses-fail-multi-agent-llm-logging/</guid><description>A new arXiv paper shows that embedding-distance detectors miss three classes of adversarial agent. The fix lives in your observability stack, not your</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>multi-agent</category><category>observability</category><category>llm-monitoring</category><category>agent-telemetry</category><category>drift-detection</category><category>mlops</category><author>SentryML Editorial</author></item><item><title>LLM Benchmarks Explained: What the Numbers Mean and Miss</title><link>https://sentryml.com/posts/llm-benchmarks/</link><guid isPermaLink="true">https://sentryml.com/posts/llm-benchmarks/</guid><description>A practical guide to the major LLM benchmarks — MMLU, HumanEval, GPQA Diamond, SWE-bench — what they actually test, why saturation makes most scores</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>benchmarks</category><category>evaluation</category><category>mlops</category><category>model-selection</category><category>monitoring</category><author>SentryML Editorial</author></item><item><title>LLM Fine Tuning in Production: A Practical MLOps Guide</title><link>https://sentryml.com/posts/llm-fine-tuning/</link><guid isPermaLink="true">https://sentryml.com/posts/llm-fine-tuning/</guid><description>When to use LLM fine tuning over RAG, 
how LoRA and QLoRA cut GPU costs, and what to monitor after you ship a fine-tuned model — for ML engineers who own</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>fine-tuning</category><category>mlops</category><category>lora</category><category>model-drift</category><category>monitoring</category><author>SentryML Editorial</author></item><item><title>Machine Learning Pipeline: Stages, Failure Points, and Monitoring</title><link>https://sentryml.com/posts/machine-learning-pipeline/</link><guid isPermaLink="true">https://sentryml.com/posts/machine-learning-pipeline/</guid><description>A practitioner&apos;s guide to the machine learning pipeline — from data ingestion to production monitoring — covering common failure points, drift types, and</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>monitoring</category><category>drift</category><category>pipelines</category><category>data-validation</category><category>ci-cd</category><author>SentryML Editorial</author></item><item><title>ML Model Deployment: A Guide to Shipping Models That Stay Healthy</title><link>https://sentryml.com/posts/ml-model-deployment/</link><guid isPermaLink="true">https://sentryml.com/posts/ml-model-deployment/</guid><description>ML model deployment fails far more often than it should — typically before the model ever serves traffic. 
Here&apos;s what breaks, which deployment patterns</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>model-deployment</category><category>production-ml</category><category>monitoring</category><category>feature-store</category><author>SentryML Editorial</author></item><item><title>MLOps Best Practices: What Keeps Models Running in Production</title><link>https://sentryml.com/posts/mlops-best-practices/</link><guid isPermaLink="true">https://sentryml.com/posts/mlops-best-practices/</guid><description>A practitioner&apos;s guide to MLOps best practices — from CI/CD pipeline automation and model versioning to drift detection and continuous retraining — based</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>monitoring</category><category>drift</category><category>ci-cd</category><category>versioning</category><category>retraining</category><author>SentryML Editorial</author></item><item><title>MLOps Tools: A Practitioner&apos;s Map of the Production Stack</title><link>https://sentryml.com/posts/mlops-tools/</link><guid isPermaLink="true">https://sentryml.com/posts/mlops-tools/</guid><description>A category-by-category breakdown of MLOps tools — experiment tracking, orchestration, feature stores, serving, and monitoring — with honest tradeoffs for</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>tooling</category><category>experiment-tracking</category><category>orchestration</category><category>monitoring</category><author>SentryML Editorial</author></item><item><title>Model Monitoring Tools: A Technical Comparison for ML Teams</title><link>https://sentryml.com/posts/model-monitoring-tools/</link><guid isPermaLink="true">https://sentryml.com/posts/model-monitoring-tools/</guid><description>Evidently, Arize, WhyLabs, Fiddler, NannyML, Alibi Detect — how each tool actually detects drift, what it costs to run, and which one fits 
your stack.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>model-monitoring</category><category>drift-detection</category><category>tooling</category><category>mlops</category><category>observability</category><author>SentryML Editorial</author></item><item><title>Model Monitoring in Production: What to Track and When to Act</title><link>https://sentryml.com/posts/model-monitoring/</link><guid isPermaLink="true">https://sentryml.com/posts/model-monitoring/</guid><description>A practical guide to model monitoring for ML engineers: drift types, the metrics that actually matter, handling the no-ground-truth problem, and which</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>model-monitoring</category><category>data-drift</category><category>mlops</category><category>observability</category><category>concept-drift</category><author>SentryML Editorial</author></item><item><title>OpenAI&apos;s DeployCo Pushes the Observability Problem Onto You</title><link>https://sentryml.com/posts/openai-deployco-forward-deployed-observability/</link><guid isPermaLink="true">https://sentryml.com/posts/openai-deployco-forward-deployed-observability/</guid><description>OpenAI&apos;s new $10B deployment subsidiary will build production AI systems inside enterprises. 
What that means for ML platform teams who inherit the runbook</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>mlops</category><category>observability</category><category>drift</category><category>deployment</category><category>openai</category><category>platform-engineering</category><author>SentryML Editorial</author></item><item><title>Detection Engineering for LLM Apps: A MITRE ATLAS Runbook</title><link>https://sentryml.com/posts/llm-detection-engineering-mitre-atlas-runbook/</link><guid isPermaLink="true">https://sentryml.com/posts/llm-detection-engineering-mitre-atlas-runbook/</guid><description>Mapping LLM application telemetry to MITRE ATLAS techniques. Concrete log shapes, alerting heuristics, and a runbook structure that scales beyond ad-hoc</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>detection-engineering</category><category>blue-team</category><category>mitre-atlas</category><category>llm-security</category><category>siem</category><category>incident-response</category><author>SentryML Editorial</author></item><item><title>A Lean 4 Stability Proof for Tool-Mediated LLM Agents</title><link>https://sentryml.com/posts/lean4-stability-proof-tool-mediated-llm-agents/</link><guid isPermaLink="true">https://sentryml.com/posts/lean4-stability-proof-tool-mediated-llm-agents/</guid><description>A new arXiv paper certifies controllability and ISS robustness for an LLM-driven SOC agent using Lean 4. 
The MLOps takeaway is simpler than the math</description><pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate><category>agents</category><category>observability</category><category>formal-methods</category><category>llm-monitoring</category><category>mlops</category><author>SentryML Editorial</author></item><item><title>The Agent Authority Gap Is an Observability Problem</title><link>https://sentryml.com/posts/agent-authority-gap-observability-instrumentation/</link><guid isPermaLink="true">https://sentryml.com/posts/agent-authority-gap-observability-instrumentation/</guid><description>Orchid Security&apos;s framing of agent governance as a delegation problem lands in the lap of ML observability teams.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>agent-observability</category><category>identity</category><category>mlops</category><category>opentelemetry</category><category>governance</category><category>runbook</category><author>SentryML Editorial</author></item><item><title>Local Coding Assistants Crossed the Quality Bar: Now Observe Them</title><link>https://sentryml.com/posts/local-coding-assistants-quality-bar-observability/</link><guid isPermaLink="true">https://sentryml.com/posts/local-coding-assistants-quality-bar-observability/</guid><description>A practitioner&apos;s Reddit report on running Qwen3.6-27B locally signals a real inflection point. But moving off managed cloud APIs shifts monitoring</description><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><category>local-llm</category><category>inference</category><category>tooling</category><category>mlops</category><category>observability</category><category>serving</category><author>SentryML Editorial</author></item></channel></rss>