Monitoring & Metrics

note

At the time of writing, the TANGO Private API server that serves these requests is hosted at this link.

TANGO provides a comprehensive monitoring system to track the health, performance, and usage of your models, sessions, and deployments. The infrastructure leverages Prometheus for efficient time-series data collection and Grafana for rich, interactive visualizations.

Contributors can access monitoring data in two ways:

Grafana Dashboard: A pre-configured, web-based dashboard for real-time visual monitoring.
Metrics API: A set of RESTful endpoints for programmatically querying metric data.

This document details both the Grafana dashboard and the API endpoints available for monitoring.

Grafana Dashboard

For a high-level, visual overview of the platform's status, TANGO provides a dedicated Grafana dashboard. This dashboard offers an intuitive way to monitor key performance indicators (KPIs) in real-time, helping you quickly identify trends, anomalies, and potential issues.

Key metrics displayed on the dashboard typically include:

Active session counts.
Total model invocation failure rates.
Model invocation latency.
Total model counts.

Registration

Access to the Grafana dashboard is provided to all contributors. Contact your TANGO administrator for the URL and login credentials.

Metrics API

The Metrics API, by contrast, provides direct programmatic access to the underlying Prometheus data.

List Available Metrics

GET `/api/metrics`

Retrieves a list of all queryable TANGO metrics, including their descriptions and available parameters. This endpoint is the starting point for discovering what data you can query.

Requires bearer token authentication (bearerAuth).
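As a minimal sketch of how a client might prepare this call, the snippet below builds an authenticated request with Python's standard library. The base URL `https://tango.example.com` and the helper name `build_list_metrics_request` are assumptions made for illustration; substitute the address of your TANGO Private API server. The token is sent in the standard `Authorization: Bearer <token>` header.

```python
import urllib.request

# Hypothetical base URL (assumption): replace with your TANGO Private API server.
BASE_URL = "https://tango.example.com"


def build_list_metrics_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request for /api/metrics."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/metrics",
        headers={
            "Authorization": f"Bearer {token}",  # bearer token authentication
            "Accept": "application/json",
        },
        method="GET",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the body with `json.load`.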

Available Metrics

The following metrics are available through the API:

Metric Name	Description	Required Parameters	Optional Parameters
`active_sessions`	Number of active TANGO sessions.	None	`model_code`, `model_version`, `user_id`, `workspace`
`closed_sessions`	Number of closed TANGO sessions.	None	`model_code`, `model_version`, `user_id`, `workspace`
`total_invocations`	Total number of invocations in TANGO.	None	`model_code`, `model_version`, `status`, `type`
`total_deployments`	Total number of deployments in TANGO.	None	`workspace`, `proxy_class`, `connector_class`
`total_models`	Total number of models in TANGO.	None	`workspace`, `model_code`, `model_version`
`invocation_failure_rate`	Percentage of invocations in TANGO that failed.	None	`model_code`, `model_version`
`last_model_invocation_duration_seconds`	Last model invocation duration in seconds.	`model_code`, `model_version`	`success`

Response Format

If successful, the endpoint returns a JSON array of metric definition objects.

[
  {
    "name": "active_sessions",
    "description": "Number of active TANGO sessions",
    "prometheus_query": "sum(active_sessions{__PARAMS__})",
    "required_params": [],
    "optional_params": ["model_code", "model_version", "user_id", "workspace"]
  },
  {
    "name": "last_model_invocation_duration_seconds",
    "description": "Last model invocation duration in seconds",
    "prometheus_query": "model_invocation_duration_seconds_sum{__PARAMS__}",
    "required_params": ["model_code", "model_version"],
    "optional_params": ["success"]
  }
]
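The `required_params` and `optional_params` fields make it possible to check a query on the client before sending it. The following sketch shows one way to do that; the helper name `check_params` is hypothetical, and whether the server itself rejects unknown parameters is not stated here, so treat the "unknown parameter" check as a client-side convenience only. Time-range parameters (`start`, `end`, `step`) are not part of a metric definition and should be handled separately.

```python
from typing import Iterable


def check_params(definition: dict, params: Iterable[str]) -> list[str]:
    """Compare supplied parameter names with a metric definition.

    Returns a list of human-readable problems; an empty list means the
    supplied names satisfy the definition.
    """
    supplied = set(params)
    required = set(definition.get("required_params", []))
    allowed = required | set(definition.get("optional_params", []))

    problems = [f"missing required parameter: {name}" for name in sorted(required - supplied)]
    problems += [f"unknown parameter: {name}" for name in sorted(supplied - allowed)]
    return problems


# Definition copied from the sample response above.
duration_metric = {
    "name": "last_model_invocation_duration_seconds",
    "required_params": ["model_code", "model_version"],
    "optional_params": ["success"],
}
```

For example, `check_params(duration_metric, ["model_code"])` reports that `model_version` is missing.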

Retrieve a Specific Metric

GET `/api/metrics/{metricName}`

Fetches the data for a specific metric. You can filter the data by providing parameters as query strings in the URL. This endpoint supports both instant queries (returning the latest value) and range queries (returning data over a period of time).

Requires bearer token authentication (bearerAuth).

Path Parameters

Name	Type	Required	Description	Example
`metricName`	string	Yes	The name of the metric to query (from the list above).	`invocation_failure_rate`

Query Parameters

Metric-Specific Parameters

These parameters correspond to the `required_params` and `optional_params` for each metric, as detailed in the "Available Metrics" table.

Example: To get the `total_invocations` for a specific model, you can add `?model_code=my-model&model_version=1.0` to your request.
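Building the query string by hand is error-prone once values contain characters that need escaping. A small sketch using `urllib.parse` is shown below; the helper name `build_metric_url` and the base URL are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def build_metric_url(base_url: str, metric_name: str, **params: str) -> str:
    """Return the URL for GET /api/metrics/{metricName} with encoded query parameters."""
    url = f"{base_url.rstrip('/')}/api/metrics/{quote(metric_name, safe='')}"
    query = urlencode(params)  # percent-encodes values and joins them with '&'
    return f"{url}?{query}" if query else url


# Hypothetical host (assumption); the parameters mirror the example above.
url = build_metric_url(
    "https://tango.example.com",
    "total_invocations",
    model_code="my-model",
    model_version="1.0",
)
```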

Time-Range Parameters

To perform a range query and retrieve data over a time period, you must provide all three of the following parameters. If none are provided, the API will return the latest instantaneous value.

important

When performing a range query, you must provide all three parameters: `start`, `end`, and `step`. Providing only one or two will result in an error.

Name	Type	Required (for range)	Description
`start`	string	Yes	The start of the time range. Can be a UNIX timestamp or an ISO 8601 string (`YYYY-MM-DDTHH:MM:SS`).
`end`	string	Yes	The end of the time range. Can be a UNIX timestamp or an ISO 8601 string (`YYYY-MM-DDTHH:MM:SS`).
`step`	string	Yes	The query resolution step width: a number followed by a unit suffix such as seconds (`s`), minutes (`m`), or hours (`h`), for example `5m`.
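Because a partial set of time-range parameters results in an error, a client can enforce the all-or-none rule before the request leaves the machine. The sketch below is one possible approach; the function name `range_params` is hypothetical and the check mirrors only the rule stated in this section, not the server's full validation.

```python
from typing import Optional


def range_params(
    start: Optional[str] = None,
    end: Optional[str] = None,
    step: Optional[str] = None,
) -> dict:
    """Return the time-range parameters to send.

    An empty dict means an instant query; a dict with all three keys means a
    range query. Supplying only one or two values raises ValueError.
    """
    given = {"start": start, "end": end, "step": step}
    present = {name: value for name, value in given.items() if value is not None}
    if present and len(present) != len(given):
        missing = [name for name in given if name not in present]
        raise ValueError(f"range query is missing: {', '.join(missing)}")
    return present
```

The returned dictionary can be merged with the metric-specific parameters before the query string is encoded.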

Example: Range Query

This example retrieves the `invocation_failure_rate` for the model `customer-churn-predictor` over a 2-hour period, with a data point every 5 minutes.

GET /api/metrics/invocation_failure_rate?model_code=customer-churn-predictor&start=2025-08-27T09:00:00&end=2025-08-27T11:00:00&step=5m

Response Format

If successful, the endpoint returns a JSON object containing the metric name, the executed Prometheus query, and the data returned by Prometheus.

{
  "metric": "invocation_failure_rate",
  "query": "sum(total_invocations{status=\"ERROR\",model_code=\"customer-churn-predictor\"}) / sum(total_invocations{model_code=\"customer-churn-predictor\"}) * 100",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {},
        "values": [
          [1724756400, "5.2"],
          [1724756700, "5.1"],
          [1724757000, "5.3"]
        ]
      }
    ]
  }
}
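In a `matrix` result, each entry in `values` is a pair of a UNIX timestamp and a sample value encoded as a string, so most clients convert the pairs before plotting or aggregating them. The following sketch assumes the response shape shown above; the function name `parse_matrix` is hypothetical, and it deliberately rejects other result types because only the `matrix` shape is documented here.

```python
from datetime import datetime, timezone


def parse_matrix(response: dict) -> list[tuple[datetime, float]]:
    """Flatten a range-query response into (UTC datetime, float) pairs."""
    data = response["data"]
    if data.get("resultType") != "matrix":
        raise ValueError(f"expected a matrix result, got {data.get('resultType')!r}")

    points = []
    for series in data["result"]:
        for timestamp, value in series["values"]:
            # Timestamps are seconds since the epoch; values arrive as strings.
            points.append((datetime.fromtimestamp(timestamp, tz=timezone.utc), float(value)))
    return points


# Trimmed copy of the sample response above.
sample = {
    "metric": "invocation_failure_rate",
    "data": {
        "resultType": "matrix",
        "result": [{"metric": {}, "values": [[1724756400, "5.2"], [1724756700, "5.1"]]}],
    },
}
points = parse_matrix(sample)
```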

note

An unsuccessful request may return a `400 Bad Request` if required parameters are missing or time formats are invalid, a `404 Not Found` if the metric name does not exist, or a `502 Bad Gateway` if the backend fails to fetch data from Prometheus.
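A client may want to translate these status codes into actionable messages. The mapping below is a small sketch based only on the three codes listed in this note; the names `ERROR_HINTS` and `explain_status` are hypothetical, and the API may return other codes that are not covered here.

```python
# Hints derived from the status codes documented in the note above.
ERROR_HINTS = {
    400: "Bad Request: check required parameters and the start/end/step formats.",
    404: "Not Found: the metric name does not exist; list metrics via /api/metrics.",
    502: "Bad Gateway: the backend could not fetch data from Prometheus; retry later.",
}


def explain_status(status: int) -> str:
    """Return a short hint for a documented error status, or a generic message."""
    return ERROR_HINTS.get(status, f"Unexpected HTTP status {status}.")
```

When using `urllib.request`, the status code is available as the `code` attribute of the `urllib.error.HTTPError` raised by `urlopen`.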
