Mohamed Arbi Nsibi

Models Decay. Systems Don't Have To

April 19, 2026 · 13 min
Notes on MLOps, LLMOps, and what survives

Building an ML model in a notebook is the easy part. This article walks through what actually happens when you try to ship that model to real users: the practices, tools, and mindset shifts that fall under the umbrella of MLOps (and now LLMOps).

ML in research vs in production

Understanding how ML systems differ helps in designing and developing them. In this section, we'll go over how ML systems in production differ from both ML in research (as often taught in academia) and traditional software.

Because the use of ML in industry is still relatively new, most people with ML expertise gained it through academia: taking courses, doing research, reading papers. If that describes your background, you may find it hard to anticipate how ML systems behave in the real world and how to deal with the problems they cause. ML in production is very different from ML in research; the next table shows the major differences.

| Aspect | Research | Production |
|---|---|---|
| Objectives | Model performance (accuracy, …) | Different stakeholders have different objectives |
| Computational priority | Fast training, high throughput | Fast inference, low latency |
| Data | Static | Constantly changing over time |
| Fairness | Good to have (sadly) | Important |
| Interpretability | Good to have | Important |

Moving from research to production isn't just a technical step; it is a cultural shift. NVIDIA [2] offers this definition:

MLOps is a set of practices that helps data scientists and engineers manage the ML life cycle more efficiently. It aims to bridge the gap between development and operations for ML: the goal is to ensure that ML models are developed, tested, and deployed in a consistent and reliable way.

The "Ops" in MLOps comes from DevOps, short for Development and Operations. To operationalize something means to bring it into production, which includes deploying, monitoring, and maintaining it.

How is MLOps different from DevOps? DevOps is a set of practices that helps organizations bridge the gap between software development and operations teams. MLOps is a similar set of practices that specifically addresses the needs of ML models: on top of code, it must also version, test, and deploy data and models.

Even though they have their differences, the two share many common principles, such as the importance of automation, continuous improvement, and collaboration. Google [1] notes that organizations that have adopted DevOps practices can leverage them when implementing MLOps.

Benefits of MLOps

MLOps offers numerous benefits to organizations that adopt it, chief among them the consistent and reliable delivery of models described above.

Vocabulary:
- CI: testing code + data + model
- CD: deploying the model as a service
- CT (Continuous Training): the ML-specific addition
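To make that vocabulary concrete, here is a minimal sketch of what "testing code + data + model" can look like inside a CI job. The function names and thresholds are illustrative, not a standard API:

```python
# Hypothetical CI checks for an ML pipeline: besides the code itself,
# CI for ML also tests the data and the trained model artifact.

def check_schema(rows, required_columns):
    """Data test: every incoming row carries the columns the model expects."""
    return all(required_columns <= set(row) for row in rows)

def check_model_floor(accuracy, floor=0.80):
    """Model test: a candidate model must beat a minimum quality bar
    before CD is allowed to promote it to a service."""
    return accuracy >= floor
```

A CI pipeline would run these as part of the test suite, blocking the deploy step when either check fails.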

Basic components of MLOps [1]

MLOps consists of several components that work together to manage the ML life cycle, including:

1. Exploratory data analysis (EDA)

2. Data prep and feature engineering

Data preparation and feature engineering are critical steps in the MLOps process. Data preparation involves cleaning, transforming, and formatting the raw data to make it suitable for model training; feature engineering then turns that prepared data into the input signals the model actually learns from.

A feature store is a centralized repository that stores, versions, and serves features for both training and inference. It solves a classic MLOps headache: making sure the features your model sees in production are computed the exact same way as during training (avoiding training-serving skew).
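As a rough sketch of that idea (the registry and names below are invented for illustration, not a real feature-store API):

```python
import math

# Toy feature-store sketch: each feature is defined once, versioned by
# name, and served through the same code path to both training and
# inference, which is exactly what prevents training-serving skew.

FEATURES = {}

def feature(name):
    """Register a feature-computation function under a versioned name."""
    def register(fn):
        FEATURES[name] = fn
        return fn
    return register

@feature("basket_value_log_v1")
def basket_value_log(row):
    return math.log1p(row["basket_value"])

def compute_features(row, names):
    """Single code path for offline training and online serving."""
    return {name: FEATURES[name](row) for name in names}
```

Because both the training job and the serving endpoint call `compute_features`, a change to the transformation is versioned once (`_v1` → `_v2`) instead of being re-implemented, possibly differently, in two places.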

3. Model training and tuning

4. Model review and governance

5. Model inference and serving

6. Model monitoring

Our models are designed to decay: the data will shift, performance will drop, and yes, we can fight back. But you can't fix what you can't see, which is why we need monitoring: continuously observing the performance and behavior of the ML model in production, from prediction quality to latency and input distributions.
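A hedged sketch of the simplest useful monitor, a rolling accuracy window with an alert threshold (the class name and defaults are illustrative):

```python
from collections import deque

class AccuracyMonitor:
    """Track rolling accuracy over the last `window` labelled predictions
    and flag when it falls below an alert threshold."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True if prediction was correct
        self.threshold = threshold

    def record(self, prediction, label):
        self.outcomes.append(prediction == label)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold
```

In practice labels often arrive late, so real monitoring also watches proxy signals (input distributions, prediction confidence), but the alerting shape stays the same.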

7. Automated model retraining

Automated model retraining involves retraining the ML model when its performance degrades or when new data becomes available.
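As a minimal sketch, the retraining trigger can be a single predicate; the thresholds below are placeholders to be tuned per system:

```python
def should_retrain(current_accuracy, baseline_accuracy, new_samples,
                   *, max_drop=0.05, min_new_samples=10_000):
    """Trigger retraining when performance degrades beyond a tolerated
    drop, or when enough new labelled data has accumulated."""
    degraded = current_accuracy < baseline_accuracy - max_drop
    fresh_data = new_samples >= min_new_samples
    return degraded or fresh_data
```

A scheduler (cron, Airflow, etc.) would evaluate this predicate periodically and kick off the training pipeline when it returns True, which is the CT loop from the vocabulary above.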

From Notebooks to Production-Grade Code

Real-world MLOps requires moving beyond messy notebooks to structured software. To ensure the reliability mentioned above, engineers should adopt robust design patterns: type-safe data contracts (Pydantic), abstract interfaces for swappable models (ABCs), proper testing, and modular pipelines. I'll dedicate future posts to these patterns.
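As a taste of those patterns, here is a dependency-free sketch using stdlib dataclasses and ABCs (Pydantic would add runtime validation on top of the type hints; all names here are illustrative):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionRequest:
    """Typed data contract for the serving boundary."""
    user_id: str
    basket_value: float

class Model(ABC):
    """Abstract interface so serving code never depends on one model class;
    swapping XGBoost for a neural net means adding a subclass, not rewrites."""

    @abstractmethod
    def predict(self, request: PredictionRequest) -> float: ...

class BaselineModel(Model):
    def predict(self, request: PredictionRequest) -> float:
        return 0.1 * request.basket_value  # placeholder logic

def serve(model: Model, request: PredictionRequest) -> float:
    """The endpoint only knows the interface, not the implementation."""
    return model.predict(request)
```

The payoff is testability: `serve` can be exercised in unit tests with a stub `Model`, and the contract makes malformed requests fail loudly at the boundary instead of deep inside the pipeline.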

Traditional baseline: the core MLOps lifecycle

MLOps Lifecycle

The workflow below shows seven human processes, each of which informs the next:

Notice the colored dots: different personas own different stages. Let's meet them.

Human-Centric ML Workflow

An example of a human-centric machine learning discovery and development workflow, with the relevant tasks, personas, and interactions

Who are the players in this game?

As AI matures, roles have become more specialized. According to NVIDIA [2], here is how the team usually looks:

Gartner ML Pipeline

Gartner’s view of the machine-learning pipeline, source: NVIDIA blog

Myths of ML deployment

The main myths and their realities (see the Stanford MLSys seminars talk, Machine Learning Production Myths [Video]):

| Myth | Reality |
|---|---|
| #1 You only deploy one or two ML models at a time | Enterprise applications require compound systems and ensembles of dozens of specialized models and tools |
| #2 If we don’t do anything, model performance remains the same | Models degrade due to data distribution shifts; continuous monitoring is mandatory |
| #3 You won’t need to update your models as much | Continuous learning and online evaluation drive weekly, if not daily, updates to context and prompts |
| #4 Most ML engineers don’t need to worry about scale | LLMs introduce unprecedented computational bottlenecks and latency challenges in production |
| #5 (LLMOps-related) LLMs are plug-and-play and don’t need ‘Ops’ | LLMs are non-deterministic; they require prompt versioning and LLM-as-a-Judge evaluations to ensure a simple prompt change doesn’t break the entire system’s logic |

Myth #1: Uber has thousands of models in production. At any given moment, Google [1] has thousands of models training concurrently, some with hundreds of billions of parameters. Booking.com has 150+ models. A 2021 study by Algorithmia shows that among organizations with over 25,000 employees, 41% have more than 100 models in production.

The figure below shows the wide range of tasks that leverage ML at Netflix [3].

Different tasks that leverage ML at Netflix. Source: [Video]

Netflix ML Tasks

Myth #2 mentions “data distribution shifts”; that umbrella term actually splits into two distinct phenomena:

Concept drift: the statistical relationship between the input features and the target variable changes over time, meaning the “logic” of the model’s predictions is no longer valid.

Data drift: the underlying distribution of the input data changes (e.g., due to seasonal shifts or sensor changes), even if the fundamental relationship between those features and the output remains the same.
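One common way to quantify data drift is the Population Stability Index (PSI). Here is a minimal pure-Python sketch, assuming a feature already scaled to [0, 1); the rule-of-thumb threshold of 0.2 is a convention, not a law:

```python
import math

def psi(expected, actual, bins=10, lo=0.0, hi=1.0, eps=1e-6):
    """Population Stability Index between a reference sample (training
    data) and a live sample; values above ~0.2 usually flag drift."""
    def hist(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            counts[max(i, 0)] += 1
        return [c / len(values) for c in counts]

    p, q = hist(expected), hist(actual)
    # eps keeps the log defined when a bin is empty on either side
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))
```

PSI only sees the inputs, so it catches data drift; concept drift additionally requires comparing model performance against fresh labels.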

Matching the Solution to the Data

Not all MLOps is the same. The requirements change based on what you are processing.

The 4 stages of ML maturity

Scalability is not a feature you can add; it is a state you achieve by evolving through the four stages of continual learning (a framework adapted from Chip Huyen’s Designing Machine Learning Systems). Moving from manual scripts to full automation is a cultural transformation that bridges the gap between the data scientist’s focus on patterns and the ML engineer’s focus on infrastructure.

ML Maturity Stages

The LLMOps pivot: hallucination and prompt versioning

LLMOps is a specialized subset of MLOps that focuses specifically on the challenges and requirements of managing LLMs. While MLOps covers the general principles and practices of managing machine learning models, LLMOps addresses the unique characteristics of LLMs, such as their large size, complex training requirements, and high computational demands [1].

NVIDIA [2] defines it as a subset of the broader GenAIOps paradigm, focused on operationalizing transformer-based networks for language use cases in production applications. Language is a foundational modality that can be combined with other modalities to guide AI system behavior; for example, multimodal systems combine text and image data for visual content production.

The LLMOps workflow

According to Google [1], LLMOps involves a number of different steps beyond the classic MLOps lifecycle.

Benefits of LLMOps [1]

GenAIOps broadens these same operational capabilities to generative AI systems beyond language alone.

The transition to LLMs means we need to change how we monitor changes in data and focus more on understanding the subtleties of natural language and human intent. The unit of logic has moved from code to the prompt. The main problem with GenAI is hallucination; we mitigate it using RAG, which grounds the model in facts by fetching relevant context from enterprise documents. In this architecture the LLM is not a database but a reasoning engine.
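A toy sketch of that retrieve-then-ground loop, with hand-rolled cosine similarity standing in for a real vector database and embedding model (all names and the index format are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """index: list of (doc_text, embedding) pairs. Return top-k docs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, docs):
    """Ground the LLM: it reasons over retrieved facts rather than
    recalling them from its weights."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The grounded prompt is then sent to the model; the LLM acts as the reasoning engine while the index remains the source of truth.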

New lifecycle: RAG

In this new paradigm, the lifecycle expands: we are no longer just managing a model; we are managing the context. This introduces vector-retrieval skew: a situation where your vector database index falls out of sync with your primary data sources, leading the “reasoning engine” to provide outdated or incorrect facts. Managing this requires the same iterative process as traditional pipelines, but applied to embedding quality and retrieval accuracy [read more about RAG here ]
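A hedged sketch of one way to detect that skew, comparing a version marker (an `updated_at` timestamp or a content hash) between the source documents and what the vector index was built from; the function and field names are illustrative:

```python
def stale_documents(source_versions, index_versions):
    """Return doc ids whose source version no longer matches the version
    the vector index was embedded from: any mismatch (or document missing
    from the index) means retrieval may serve outdated facts and the
    document needs re-embedding."""
    return sorted(
        doc_id for doc_id, version in source_versions.items()
        if index_versions.get(doc_id) != version
    )
```

Running a check like this on a schedule, and re-embedding whatever it returns, is the RAG-era analogue of retraining on fresh data.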

To manage GenAI in production, architects deploy a GenAI API gateway, as shown in the figure below. This infrastructure layer handles the operational overhead: prompt versioning, rate limiting, cost tracking across different providers, and the enforcement of guardrails that filter adversarial inputs or unsafe outputs.
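A toy sketch of two of those gateway responsibilities, prompt versioning and rate limiting (class and method names are invented for illustration; a production gateway would add caching, cost tracking, and guardrails):

```python
import time

class GenAIGateway:
    """Minimal gateway sketch: versioned prompt templates plus a crude
    per-client sliding-window rate limit."""

    def __init__(self, max_requests_per_minute=60):
        self.prompts = {}   # (name, version) -> template string
        self.calls = {}     # client_id -> list of request timestamps
        self.limit = max_requests_per_minute

    def register_prompt(self, name, version, template):
        self.prompts[(name, version)] = template

    def render(self, name, version, **kwargs):
        """Pin callers to an explicit prompt version, so a template change
        ships as v2 instead of silently mutating v1."""
        return self.prompts[(name, version)].format(**kwargs)

    def allowed(self, client_id, now=None):
        """Allow at most `limit` requests per client per 60 seconds."""
        now = time.time() if now is None else now
        recent = [t for t in self.calls.get(client_id, []) if now - t < 60]
        self.calls[client_id] = recent
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

Because prompts are addressed by (name, version), rolling back a bad prompt change is a one-line config edit rather than a redeploy.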

In LLMOps, evals are the new unit tests. Every prompt change or model swap must be benchmarked against them to ensure the system is growing rather than just changing. Evaluation strategies range from deterministic checks to LLM-as-a-Judge scoring.
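A minimal sketch of evals-as-unit-tests: a fixed golden set, a pluggable judge (exact match here, though an LLM-as-a-Judge call has the same shape), and a pass-rate gate. All names are illustrative:

```python
def run_evals(generate, judge, golden_set, pass_rate=0.9):
    """Run the system under test (`generate`) over a fixed golden set and
    score each output with `judge`; a prompt change only ships when the
    pass rate clears the gate."""
    results = [judge(case["input"], generate(case["input"]), case["expected"])
               for case in golden_set]
    rate = sum(results) / len(results)
    return rate, rate >= pass_rate

def exact_match_judge(inp, output, expected):
    """Deterministic judge; swap in an LLM-as-a-Judge call that returns
    True/False for open-ended outputs."""
    return output.strip().lower() == expected.strip().lower()
```

Wiring `run_evals` into CI gives LLM systems the regression safety net that unit tests give ordinary code.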

Architecture shift: orchestrating compound systems

GenAI Architecture

We are moving away from simple REST APIs that serve static predictions. Today’s GenAI architecture requires an API gateway to handle routing, caching, and guardrails, which then feeds into an orchestrator that interacts with vector databases and foundation models dynamically.

The GenAI API gateway exists for ease of development, ease of switching models, logging, metric monitoring, alerting, and cost tracking. Downstream of it, the core idea of agents is to use a language model to choose a sequence of actions to take.

LLMOps vs MLOps

| Category | MLOps | GenAIOps / LLMOps |
|---|---|---|
| Core focus | Model training & tuning | Model interaction & context |
| Key asset | Feature stores | Vector databases & embeddings |
| Data management | Tabular & structured business data | Unstructured data (text, video, audio) |
| Evaluation | Standard deterministic metrics (accuracy, F1 score) | ‘LLM-as-a-Judge’, output safety, hallucination rates, and prompt versioning |

Traditional MLOps was all about structured data: managing feature stores and tracking metrics like accuracy.

LLMOps is a whole different beast: now we are dealing with unstructured text, managing prompts like code, using vector databases, and building guardrails that keep those powerful models from going off the rails.

It brings a new set of problems to solve: we are not tuning models so much as managing embeddings for RAG systems, versioning our prompts, detecting exactly when our model hallucinates, and figuring out how to build a reliable system when the outputs are not guaranteed to be the same twice.

LLMOps is not replacing MLOps; it is building right on top of it. All those fundamentals of data quality, monitoring, and automation matter more than ever.

As these AI systems become more creative and unpredictable, the job changes: the challenge is not just managing the lifecycle of the model; the real task is designing systems that let us not just manage AI, but trust it.

In a few short years we have moved from XGBoost to transformers, and the next architecture might already be on the horizon. The fundamentals of robust system design, however, are constant: reliability, scalability, and maintainability are the only things that survive a model’s obsolescence.

As we evaluate our pipelines today, we can ask ourselves: is our current infrastructure designed to facilitate the iterative process required for long-term survival, or have we built a static monument to a model that will be irrelevant in six months?

Consider this the foundation. There’s a lot more to dig into, piece by piece.

References
