Calls Metrics and Monitoring

This guide provides detailed information on monitoring Mattermost Calls performance and health through metrics and observability tools. Effective monitoring is essential for maintaining optimal call quality and quickly addressing any issues that arise.

Metrics Overview

Mattermost Calls provides metrics through Prometheus for both the Calls plugin and the RTCD service. These metrics help track:

  • Active call sessions and participants

  • Media track statistics

  • Connection states and errors

  • Resource utilization (CPU, memory, network)

  • WebSocket connections and events

The metrics are exposed through HTTP endpoints:

  • Calls Plugin: /plugins/com.mattermost.calls/metrics

  • RTCD Service: /metrics (default) or a configured endpoint

Resource utilization metrics (CPU, memory, network) are mainly provided by an external service (node-exporter).

Metrics for the Calls plugin are exposed through the /plugins/com.mattermost.calls/metrics subpath under the existing Mattermost server metrics endpoint. This endpoint is controlled by the Listen address for performance configuration setting and defaults to port 8067. For example: http://localhost:8067/plugins/com.mattermost.calls/metrics. The RTCD service exposes its /metrics endpoint on its HTTP API (e.g., http://localhost:8045/metrics).
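A quick way to confirm both endpoints respond is curl. Each returns metrics in the plain-text Prometheus exposition format; the sample below is a sketch (the metric name appears later in this guide, but the value and HELP/TYPE lines are illustrative):

```shell
# Live checks (adjust hosts and ports to your deployment):
#   curl -s http://localhost:8067/plugins/com.mattermost.calls/metrics
#   curl -s http://localhost:8045/metrics
#
# Both return the Prometheus text exposition format, e.g.:
cat <<'EOF'
# HELP rtcd_rtc_sessions_total Total number of active RTC sessions
# TYPE rtcd_rtc_sessions_total gauge
rtcd_rtc_sessions_total 42
EOF
```

If the command returns an empty response or an error, re-check the relevant listen-address settings before moving on to Prometheus configuration.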

Setting Up Monitoring

For instructions on deploying Prometheus and Grafana for Mattermost, please refer to the Deploy Prometheus and Grafana for Performance Monitoring guide.

Once Prometheus and Grafana are set up, you will need to configure Prometheus to scrape metrics from the Calls-related services.

Prometheus Scrape Configuration

Add the following jobs to your prometheus.yml configuration:

scrape_configs:
  - job_name: 'calls-plugin'
    metrics_path: /plugins/com.mattermost.calls/metrics
    static_configs:
      - targets: ['MATTERMOST_SERVER_IP:8067']
        labels:
          service_name: 'calls-plugin'

  - job_name: 'rtcd'
    metrics_path: /metrics
    static_configs:
      - targets: ['RTCD_SERVER_IP:8045']
        labels:
          service_name: 'rtcd'

  - job_name: 'rtcd-node-exporter'
    metrics_path: /metrics
    static_configs:
      - targets: ['RTCD_SERVER_IP:9100']
        labels:
          service_name: 'rtcd'

  - job_name: 'calls-offloader-node-exporter'
    metrics_path: /metrics
    static_configs:
      - targets: ['CALLS_OFFLOADER_SERVER_IP:9100']
        labels:
          service_name: 'offloader'

Replace the placeholder IP addresses with your actual server addresses:

  • MATTERMOST_SERVER_IP: IP address of your Mattermost server

  • RTCD_SERVER_IP: IP address of your RTCD server

  • CALLS_OFFLOADER_SERVER_IP: IP address of your calls-offloader server (if deployed)
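After editing prometheus.yml, it helps to validate the file before reloading. A sketch using promtool, which ships with Prometheus (the config path is an example; adjust to your installation):

```shell
# Validate the scrape configuration syntax:
promtool check config /etc/prometheus/prometheus.yml

# Reload Prometheus without a restart (only works if Prometheus runs
# with the --web.enable-lifecycle flag):
#   curl -X POST http://localhost:9090/-/reload
```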

Important

Metrics Configuration Notice: Use the service_name labels as shown in the configuration above. These labels help organize metrics in dashboards and enable proper service identification.

Note

  • node_exporter: Optional but recommended for system-level metrics (CPU, memory, disk, network). See node_exporter setup guide for installation instructions.

  • calls-offloader: Only needed if you have call recording/transcription enabled.

Mattermost Calls Grafana Dashboard

You can use the official Mattermost Calls Performance Monitoring dashboard to visualize these metrics.

Key Metrics to Monitor

RTCD Metrics

Process Metrics

These metrics help monitor the health and resource usage of the RTCD process:

  • rtcd_process_cpu_seconds_total: Total CPU time spent

  • rtcd_process_open_fds: Number of open file descriptors

  • rtcd_process_max_fds: Maximum number of file descriptors

  • rtcd_process_resident_memory_bytes: Memory usage in bytes

  • rtcd_process_virtual_memory_bytes: Virtual memory used

WebRTC Connection Metrics

These metrics track the WebRTC connections and media flow:

  • rtcd_rtc_conn_states_total{state="X"}: Count of connections in different states

  • rtcd_rtc_errors_total{type="X"}: Count of RTC errors by type

  • rtcd_rtc_rtp_tracks_total{direction="X"}: Count of RTP tracks (incoming/outgoing)

  • rtcd_rtc_sessions_total: Total number of active RTC sessions

WebSocket Metrics

These metrics track the signaling channel:

  • rtcd_ws_connections_total: Total number of active WebSocket connections. These are connections between the RTCD service and the Mattermost server, so the count should match the number of Mattermost nodes in the cluster.

  • rtcd_ws_messages_total{direction="X"}: Count of WebSocket messages (sent/received)
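Counters such as rtcd_ws_messages_total only ever increase, so dashboards typically graph their per-second rate rather than the raw value. As a sketch of the arithmetic behind PromQL's rate() function, the rate between two scrapes is simply the delta divided by the scrape interval (the sample values below are illustrative):

```shell
# Two hypothetical samples of rtcd_ws_messages_total, taken 15s apart:
prev=12000
curr=12450
interval=15

# Per-second message rate between the two scrapes:
awk -v a="$prev" -v b="$curr" -v i="$interval" \
  'BEGIN { printf "%.1f messages/s\n", (b - a) / i }'
# prints: 30.0 messages/s
```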

Calls Plugin Metrics

Similar metrics are available for the Calls plugin with the following prefixes:

  • Process metrics: mattermost_plugin_calls_process_*

  • WebRTC connection metrics: mattermost_plugin_calls_rtc_*

  • WebSocket metrics: mattermost_plugin_calls_websocket_*

  • Store metrics: mattermost_plugin_calls_store_ops_total
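To see which of these metrics your server actually exposes, you can filter the plugin endpoint by prefix. A sketch, assuming the default port 8067 and local access:

```shell
# List the first 20 Calls plugin metric samples by prefix:
curl -s http://localhost:8067/plugins/com.mattermost.calls/metrics \
  | grep -E '^mattermost_plugin_calls_' | head -n 20
```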

Performance Baselines

The following performance benchmarks provide baseline metrics for RTCD deployments under various load conditions and configurations.

Deployment specifications

  • 1x r6i.large nginx proxy

  • 3x c5.large MM app nodes (HA)

  • 2x db.x2g.xlarge RDS Aurora MySQL v8 (one writer, one reader)

  • 1x (c7i.xlarge, c7i.2xlarge, c7i.4xlarge) RTCD

  • 2x c7i.2xlarge load-test agents

App specifications

  • Mattermost v9.6

  • Mattermost Calls v0.28.0

  • RTCD v0.16.0

  • load-test agent v0.28.0

Media specifications

  • Speech sample bitrate: 80Kbps

  • Screen sharing sample bitrate: 1.6Mbps

Results

Below are the detailed benchmarks based on internal performance testing:

Calls  Participants/call  Unmuted/call  Screen sharing  CPU (avg)  Memory (avg)  Bandwidth (in/out)   Instance type (RTCD)
1      1000               2             no              47%        1.46GB        1Mbps / 194Mbps      c7i.xlarge
1      800                1             yes             64%        1.43GB        2.7Mbps / 1.36Gbps   c7i.xlarge
1      1000               1             yes             79%        1.54GB        2.9Mbps / 1.68Gbps   c7i.xlarge
10     100                1             yes             74%        1.56GB        18.2Mbps / 1.68Gbps  c7i.xlarge
100    10                 2             no              49%        1.46GB        18.7Mbps / 175Mbps   c7i.xlarge
100    10                 1             yes             84%        1.73GB        171Mbps / 1.53Gbps   c7i.xlarge
1      1000               2             no              20%        1.44GB        1.4Mbps / 194Mbps    c7i.2xlarge
1      1000               2             yes             49%        1.53GB        3.6Mbps / 1.79Gbps   c7i.2xlarge
2      1000               1             yes             73%        2.38GB        5.7Mbps / 3.06Gbps   c7i.2xlarge
100    10                 2             yes             60%        1.74GB        181Mbps / 1.62Gbps   c7i.2xlarge
150    10                 1             yes             72%        2.26GB        257Mbps / 2.30Gbps   c7i.2xlarge
150    10                 2             yes             79%        2.34GB        271Mbps / 2.41Gbps   c7i.2xlarge
250    10                 2             no              58%        2.66GB        47Mbps / 439Mbps     c7i.2xlarge
1000   2                  2             no              78%        2.31GB        178Mbps / 195Mbps    c7i.2xlarge
2      1000               2             yes             41%        2.6GB         7.23Mbps / 3.60Gbps  c7i.4xlarge
3      1000               2             yes             63%        3.53GB        10.9Mbps / 5.38Gbps  c7i.4xlarge
4      1000               2             yes             83%        4.40GB        14.5Mbps / 7.17Gbps  c7i.4xlarge
250    10                 2             yes             79%        3.49GB        431Mbps / 3.73Gbps   c7i.4xlarge
500    2                  2             yes             71%        2.54GB        896Mbps / 919Mbps    c7i.4xlarge

Troubleshooting Metrics Collection

Verify RTCD Metrics are Being Collected

To verify that Prometheus is successfully collecting RTCD metrics, use this command:

curl -s http://PROMETHEUS_IP:9090/api/v1/label/__name__/values | jq '.' | grep rtcd

This command queries Prometheus for all available metric names and filters for RTCD-related metrics.

If no RTCD metrics appear, check:

  1. RTCD is running

  2. Prometheus is configured to scrape the RTCD metrics endpoint

  3. RTCD metrics port is accessible from Prometheus (default: 8045)
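Step 3 can be checked directly from the Prometheus host. A sketch (replace RTCD_SERVER_IP with your server's address):

```shell
# Confirm the RTCD metrics endpoint is reachable from the Prometheus host:
if curl -sf --max-time 5 http://RTCD_SERVER_IP:8045/metrics > /dev/null; then
  echo "RTCD metrics endpoint reachable"
else
  echo "RTCD metrics endpoint NOT reachable"
fi
```

If the endpoint is not reachable, check firewalls and security groups between the two hosts before changing any Prometheus configuration.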

Check Prometheus Scrape Targets

To verify all Calls-related services are being scraped successfully:

  1. Open the Prometheus web interface (typically http://PROMETHEUS_IP:9090)

  2. Navigate to Status > Targets

  3. Look for your configured Calls services:

    • Mattermost server (for Calls plugin metrics)

    • RTCD service

Each target should show status “UP” in green. If a target shows “DOWN” or errors:

  • Verify the service is running

  • Check network connectivity between Prometheus and the target

  • Verify the metrics endpoint is accessible
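The same check can be scripted against the Prometheus targets API, where each entry in data.activeTargets carries a health field. A live check would be `curl -s http://PROMETHEUS_IP:9090/api/v1/targets`; the JSON below is a trimmed, illustrative response:

```shell
# Extract each target's job and health from a (trimmed, illustrative) response:
cat <<'EOF' | grep -oE '"(job|health)":"[^"]*"'
{"status":"success","data":{"activeTargets":[{"labels":{"job":"rtcd"},"health":"up"},{"labels":{"job":"calls-plugin"},"health":"up"}]}}
EOF
```

Any target whose health is not "up" corresponds to a "DOWN" entry on the Status > Targets page.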


Note: Configure Prometheus storage retention to balance disk usage against your needs: use a shorter retention period when storage is limited, and a longer one when storage is plentiful.
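The retention trade-off maps directly to Prometheus's storage flags. A sketch of a launch command (the path and values are examples to adapt):

```shell
# Cap retention by time and by total on-disk size; Prometheus removes the
# oldest data once either limit is reached. Tune both to your storage budget.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB
```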