Preparing a Model for TANGO
Before a machine learning model can be registered and managed within the TANGO infrastructure, it must be made compatible with TANGO's architecture. This preparation involves implementing a specific interface and structuring the model's files (its "artifacts") in a way that TANGO can understand.
This guide details the three essential steps for this preparation process.
The Core Architecture: TANGO as a Proxy
The most critical concept to understand is that TANGO acts as an intelligent proxy, not as a model execution engine. It does not run the model's prediction code directly. Instead, it manages requests and responses to and from an external endpoint where the model is already running.
This decoupled design provides two major benefits.
- Flexibility: TANGO can manage any model, regardless of the framework it was built in (Scikit-learn, PyTorch, TensorFlow, etc.), without needing to install its heavy dependencies.
- Stability: the TANGO environment remains lightweight and stable, as the computationally intensive work is handled by a dedicated, external model server.
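The proxy flow can be pictured with a short sketch. The `EchoMapper` class, the `proxy_predict` helper, and the injectable `send` callable are illustrative assumptions, not TANGO's actual internals; in a real deployment, `send` would be an HTTP POST to the external prediction endpoint.

```python
class EchoMapper:
    """Hypothetical mapper: wraps/unwraps the payload (stand-in for a TangoModel)."""

    def map_request(self, body: dict) -> dict:
        # Pre-processing: shape the raw request into the endpoint's format.
        return {"instances": [body]}

    def map_response(self, response: dict) -> dict:
        # Post-processing: shape the raw model output into a clean response.
        return {"prediction": response["predictions"][0]}


def proxy_predict(body: dict, mapper, send) -> dict:
    """Sketch of the proxy flow: pre-process, forward, post-process.

    `send` stands in for the network call to the external model server,
    so TANGO itself never executes the model's prediction code.
    """
    payload = mapper.map_request(body)      # pre-processing
    raw_output = send(payload)              # request to the model server
    return mapper.map_response(raw_output)  # post-processing
```

Because the heavy prediction work happens behind `send`, the proxy side needs none of the model's framework dependencies.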
Step 1: Fulfilling Infrastructure Prerequisites
Before writing any TANGO-specific code, the following two infrastructure components must be in place.
- An Artifact Repository: a location where all of the model's files are stored and accessible to TANGO. This can be a service like MLflow or Databricks, or even a properly configured file server or object storage bucket. TANGO reads from this repository to find the model's components.
- A Prediction Endpoint: a live, accessible URL where the trained model is already deployed and served. TANGO sends prediction requests to this endpoint. It is typically a microservice created with tools like FastAPI or Flask, or a dedicated serving platform like KServe.
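As a concrete illustration, a prediction endpoint can be a small HTTP service. The sketch below uses only the Python standard library; the `predict` logic and the `churn_probability` response shape are hypothetical stand-ins for a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: dict) -> dict:
    """Hypothetical model logic; a real endpoint would load a trained model."""
    score = 0.9 if features.get("tenure", 0) < 6 else 0.1
    return {"churn_probability": [score]}


class PredictionHandler(BaseHTTPRequestHandler):
    """Minimal JSON-over-HTTP wrapper around predict()."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        result = json.dumps(predict(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


# To serve: HTTPServer(("0.0.0.0", 8000), PredictionHandler).serve_forever()
```

In practice this role is usually filled by a FastAPI/Flask service or a platform like KServe, as noted above; the point is only that TANGO needs a reachable URL that accepts requests and returns predictions.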
Step 2: Implementing the TangoModel Interface
To enable TANGO to correctly format requests and interpret responses, a model's integration logic must extend the TangoModel base class.
This class requires the implementation of two mandatory methods responsible for data transformation.
- map_request(body: dict) -> any: handles all pre-processing. It receives the raw JSON body from an incoming API request to TANGO and transforms it into the precise format expected by the model's prediction endpoint (e.g., a pandas DataFrame, a NumPy array, a tensor, or a specific JSON structure).
- map_response(response: any) -> dict: handles all post-processing. It takes the raw output received from the model's prediction endpoint and transforms it into a clean, user-friendly JSON dictionary that is returned to the end user.
These two methods must be implemented by the model's creator, as they are specific to the model's functionality.
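The exact definition of TangoModel lives in the TANGO codebase; conceptually, it can be pictured as an abstract base class along these lines (a sketch, not the actual source):

```python
from abc import ABC, abstractmethod
from typing import Any


class TangoModel(ABC):
    """Sketch of the interface TANGO expects a model (or Mapper) to fulfil."""

    @abstractmethod
    def map_request(self, body: dict) -> Any:
        """Transform the raw JSON request body into the endpoint's input format."""

    @abstractmethod
    def map_response(self, response: Any) -> dict:
        """Transform the endpoint's raw output into a user-facing JSON dict."""
```

Any class that subclasses this interface and implements both methods can be slotted into the proxy flow, regardless of the ML framework behind the endpoint.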
Step 3: Structuring the Model Artifacts
A model registered in TANGO is not a single file but a collection of components, or "artifacts," stored in the repository.
The main components are the Model and optionally the Mapper and Explainers.
TANGO uses a config.json file to understand how these artifacts are organized.
The Mapper Component (Recommended)
A core part of making a model compatible with TANGO is implementing the TangoModel interface, which handles data transformation. This logic can exist in two ways.
- Directly within the served model: the simplest approach is for the deployed model at the prediction endpoint to directly extend the TangoModel class and implement the map_request and map_response methods itself.
- Using a separate Mapper: a more advanced approach, recommended for performance, is to create a separate, lightweight artifact called a Mapper.
A Mapper is a distinct component that contains only the TangoModel implementation for pre-processing (map_request) and post-processing (map_response). It does not contain the actual prediction code.
The key advantage of using a Mapper is performance and efficiency. The environment that runs the data transformation only needs to load the Mapper, which has minimal dependencies (e.g., just pandas).
The heavy model, with its large libraries like TensorFlow or PyTorch, remains isolated at its serving endpoint.
This reduces the size of the virtual environment needed for the transformation step, keeping the TANGO proxy lean and fast.
A Mapper class can be defined as follows:

```python
from interfaces.interfaces import TangoModel
import pandas as pd


class MyChurnModelMapper(TangoModel):
    """A mapper for the customer churn prediction model."""

    def map_request(self, body: dict) -> pd.DataFrame:
        # Transform the incoming dictionary into a pandas DataFrame.
        data = {'monthly_charges': [body['charges']], 'tenure': [body['tenure']]}
        return pd.DataFrame(data)

    def map_response(self, response: dict) -> dict:
        # Transform the model's raw output into a user-friendly dictionary.
        probability = response['churn_probability'][0]
        return {
            'prediction_label': 'Churn' if probability > 0.5 else 'No Churn',
            'churn_probability': float(probability)
        }
```
Including Explainers (Optional)
The TANGO framework also supports the integration of "explainer" models, which provide insights into a model's predictions. These explainers are treated as separate components within the artifact structure and can even have their own Mappers.
The config.json file
All related artifacts, such as mappers or explainers, must reside in the same registry connector; otherwise, the config.json file will fail to map them correctly.
Now that the components are defined, you must tell TANGO where to find them. This is done with a file named config.json, which acts as a "table of contents" for the model's artifacts. It must be placed in the root directory of the model's artifacts.
The file uses key-value pairs to point to the relative paths of the Mapper and any Explainers. If no model mapper is declared, TANGO first checks for the default mapper, named model_mapper; if that is not found, it uses the model itself as the mapper. In that case, if the model does not implement the TangoModel interface, TANGO will raise an error.
Even if the model has just one explainer, it must be mapped as explainer_path_01. Up to 10 explainers are supported, each with its own mapper, named explainer_mapper_01, explainer_mapper_02, and so on.
If this file is omitted, TANGO assumes the following default directory structure:
```json
{
  "model_mapper_path": "model_mapper",
  "model_path": "model",
  "explainer_path_01": "explainer_01",
  "explainer_mapper_path_01": "explainer_mapper_01"
}
```
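The lookup order described above can be illustrated with a small resolution sketch. The resolve_mapper helper and the DEFAULTS table are hypothetical, written only to make the fallback chain concrete:

```python
import json
from pathlib import Path

# Hypothetical defaults mirroring the fallback directory names.
DEFAULTS = {
    "model_path": "model",
    "model_mapper_path": "model_mapper",
}


def resolve_mapper(artifact_root: str) -> str:
    """Resolve which artifact supplies the TangoModel mapping logic.

    Order: explicit config.json entry -> default 'model_mapper' folder
    -> fall back to the model itself (which must implement TangoModel).
    """
    root = Path(artifact_root)
    config_file = root / "config.json"
    config = json.loads(config_file.read_text()) if config_file.exists() else {}

    # 1. Explicit mapper path declared in config.json.
    declared = config.get("model_mapper_path")
    if declared and (root / declared).exists():
        return declared
    # 2. Default mapper directory.
    if (root / DEFAULTS["model_mapper_path"]).exists():
        return DEFAULTS["model_mapper_path"]
    # 3. Fall back to the model artifact itself.
    return config.get("model_path", DEFAULTS["model_path"])
```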
Example: Mapping Artifacts with config.json
Consider an MLflow run that produces the following artifact structure with custom folder names:

```
/
├── config.json
├── random_forest_model/
├── forest_mapper/
├── explainer/
└── explainer_mapper/
```
To ensure TANGO can locate these components, the config.json file must map the corresponding paths.
By defining the keys as shown below, the system correctly identifies the main model in the random_forest_model folder
and its associated mapper in the forest_mapper folder, overriding the default names.
```json
{
  "model_path": "random_forest_model",
  "model_mapper_path": "forest_mapper",
  "explainer_path_01": "explainer",
  "explainer_mapper_path_01": "explainer_mapper"
}
```