Calls Metrics and Monitoring¶
Available on Enterprise and Enterprise Advanced plans
This guide provides detailed information on monitoring Mattermost Calls performance and health through metrics and observability tools. Effective monitoring is essential for maintaining optimal call quality and quickly addressing any issues that arise.
Metrics Overview¶
Mattermost Calls provides metrics through Prometheus for both the Calls plugin and the RTCD service. These metrics help track:
Active call sessions and participants
Media track statistics
Connection states and errors
Resource utilization (CPU, memory, network)
WebSocket connections and events
The metrics are exposed through HTTP endpoints:
Calls Plugin: /plugins/com.mattermost.calls/metrics
RTCD Service: /metrics (default) or a configured endpoint
Resource utilization metrics (CPU, memory, network) are mainly provided by an external service (node-exporter).
Metrics for the Calls plugin are exposed through the /plugins/com.mattermost.calls/metrics subpath under the existing Mattermost server metrics endpoint. This is controlled by the Listen address for performance configuration setting, which defaults to port 8067. For example: http://localhost:8067/plugins/com.mattermost.calls/metrics
The RTCD service /metrics endpoint is exposed on its HTTP API (e.g. http://localhost:8045/metrics).
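As a quick sanity check, you can fetch either endpoint with curl and reduce the response to the metric names it exposes. A minimal sketch, assuming the default ports above; the `metric_names` helper is illustrative, not part of Mattermost:

```shell
#!/bin/sh
# metric_names: reduce a Prometheus exposition-format payload to the
# unique metric names it contains (drops comments, labels, and values).
metric_names() {
  grep -v '^#' | sed 's/[{ ].*//' | sort -u
}

# In production, pipe the live endpoints through the filter, e.g.:
#   curl -s http://localhost:8045/metrics | metric_names
#   curl -s http://localhost:8067/plugins/com.mattermost.calls/metrics | metric_names
# Demonstration on sample lines in the same format the endpoints return:
printf '%s\n' \
  '# HELP rtcd_rtc_sessions_total Number of active RTC sessions' \
  'rtcd_rtc_sessions_total 42' \
  'rtcd_ws_messages_total{direction="in"} 1024' \
  'rtcd_ws_messages_total{direction="out"} 980' | metric_names
```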
Setting Up Monitoring¶
For instructions on deploying Prometheus and Grafana for Mattermost, please refer to the Deploy Prometheus and Grafana for Performance Monitoring guide.
Once Prometheus and Grafana are set up, you will need to configure Prometheus to scrape metrics from the Calls-related services.
Prometheus Scrape Configuration¶
Add the following jobs to your prometheus.yml configuration:
scrape_configs:
  - job_name: 'calls-plugin'
    metrics_path: /plugins/com.mattermost.calls/metrics
    static_configs:
      - targets: ['MATTERMOST_SERVER_IP:8067']
        labels:
          service_name: 'calls-plugin'
  - job_name: 'rtcd'
    metrics_path: /metrics
    static_configs:
      - targets: ['RTCD_SERVER_IP:8045']
        labels:
          service_name: 'rtcd'
  - job_name: 'rtcd-node-exporter'
    metrics_path: /metrics
    static_configs:
      - targets: ['RTCD_SERVER_IP:9100']
        labels:
          service_name: 'rtcd'
  - job_name: 'calls-offloader-node-exporter'
    metrics_path: /metrics
    static_configs:
      - targets: ['CALLS_OFFLOADER_SERVER_IP:9100']
        labels:
          service_name: 'offloader'
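Before reloading Prometheus, it's worth validating the edited configuration. `promtool check config prometheus.yml` (shipped with Prometheus) performs full validation; the helper below is a hypothetical extra check, sketched for this guide, that every scrape job also carries the `service_name` label the dashboards rely on:

```shell
#!/bin/sh
# check_scrape_labels FILE: illustrative helper (not a Prometheus tool)
# that verifies each job_name in a scrape config has a matching
# service_name label, as required by the configuration above.
check_scrape_labels() {
  jobs=$(grep -c 'job_name:' "$1")
  labels=$(grep -c 'service_name:' "$1")
  if [ "$jobs" -gt 0 ] && [ "$jobs" -eq "$labels" ]; then
    echo "OK: $jobs jobs, all labeled"
  else
    echo "mismatch: $jobs jobs, $labels service_name labels"
    return 1
  fi
}

# Usage: check_scrape_labels /etc/prometheus/prometheus.yml
```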
Replace the placeholder IP addresses with your actual server addresses:
MATTERMOST_SERVER_IP: IP address of your Mattermost server
RTCD_SERVER_IP: IP address of your RTCD server
CALLS_OFFLOADER_SERVER_IP: IP address of your calls-offloader server (if deployed)
Important
Metrics Configuration Notice: Use the service_name labels as shown in the configuration above. These labels help organize metrics in dashboards and enable proper service identification.
Note
node_exporter: Optional but recommended for system-level metrics (CPU, memory, disk, network). See node_exporter setup guide for installation instructions.
calls-offloader: Only needed if you have call recording/transcription enabled.
Mattermost Calls Grafana Dashboard¶
You can use the official Mattermost Calls Performance Monitoring dashboard to visualize these metrics.
To import it directly into Grafana, use dashboard ID 23225. The dashboard is also available as JSON source from the Mattermost performance assets repository for manual import or customization.
Key Metrics to Monitor¶
RTCD Metrics¶
Process Metrics¶
These metrics help monitor the health and resource usage of the RTCD process:
rtcd_process_cpu_seconds_total: Total CPU time spent
rtcd_process_open_fds: Number of open file descriptors
rtcd_process_max_fds: Maximum number of file descriptors
rtcd_process_resident_memory_bytes: Resident memory usage in bytes
rtcd_process_virtual_memory_bytes: Virtual memory used in bytes
WebRTC Connection Metrics¶
These metrics track the WebRTC connections and media flow:
rtcd_rtc_conn_states_total{state="X"}: Count of connections in different states
rtcd_rtc_errors_total{type="X"}: Count of RTC errors by type
rtcd_rtc_rtp_tracks_total{direction="X"}: Count of RTP tracks (incoming/outgoing)
rtcd_rtc_sessions_total: Total number of active RTC sessions
WebSocket Metrics¶
These metrics track the signaling channel:
rtcd_ws_connections_total: Total number of active WebSocket connections. These are connections between RTCD and the Mattermost servers, so the count should match the number of Mattermost nodes.
rtcd_ws_messages_total{direction="X"}: Count of WebSocket messages (sent/received)
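Once these metrics are flowing into Prometheus, a few example PromQL queries (assuming the scrape jobs configured earlier in this guide) can drive dashboard panels or alerts; treat them as starting points rather than tuned expressions:

```promql
# Active RTC sessions across all RTCD instances
sum(rtcd_rtc_sessions_total)

# RTC error rate over the last 5 minutes, broken down by type
sum by (type) (rate(rtcd_rtc_errors_total[5m]))

# WebSocket message throughput, split by direction (sent/received)
sum by (direction) (rate(rtcd_ws_messages_total[5m]))
```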
Calls Plugin Metrics¶
Similar metrics are available for the Calls plugin with the following prefixes:
Process metrics: mattermost_plugin_calls_process_*
WebRTC connection metrics: mattermost_plugin_calls_rtc_*
WebSocket metrics: mattermost_plugin_calls_websocket_*
Store metrics: mattermost_plugin_calls_store_ops_total
Performance Baselines¶
The following performance benchmarks provide baseline metrics for RTCD deployments under various load conditions and configurations.
Deployment specifications
1x r6i.large nginx proxy
3x c5.large MM app nodes (HA)
2x db.x2g.xlarge RDS Aurora MySQL v8 (one writer, one reader)
1x (c7i.xlarge, c7i.2xlarge, c7i.4xlarge) RTCD
2x c7i.2xlarge load-test agents
App specifications
Mattermost v9.6
Mattermost Calls v0.28.0
RTCD v0.16.0
load-test agent v0.28.0
Media specifications
Speech sample bitrate: 80Kbps
Screen sharing sample bitrate: 1.6Mbps
Results
Below are the detailed benchmarks based on internal performance testing:
| Calls | Participants/call | Unmuted/call | Screen sharing | CPU (avg) | Memory (avg) | Bandwidth (in/out) | Instance type (RTCD) |
|---|---|---|---|---|---|---|---|
| 1 | 1000 | 2 | no | 47% | 1.46GB | 1Mbps / 194Mbps | c7i.xlarge |
| 1 | 800 | 1 | yes | 64% | 1.43GB | 2.7Mbps / 1.36Gbps | c7i.xlarge |
| 1 | 1000 | 1 | yes | 79% | 1.54GB | 2.9Mbps / 1.68Gbps | c7i.xlarge |
| 10 | 100 | 1 | yes | 74% | 1.56GB | 18.2Mbps / 1.68Gbps | c7i.xlarge |
| 100 | 10 | 2 | no | 49% | 1.46GB | 18.7Mbps / 175Mbps | c7i.xlarge |
| 100 | 10 | 1 | yes | 84% | 1.73GB | 171Mbps / 1.53Gbps | c7i.xlarge |
| 1 | 1000 | 2 | no | 20% | 1.44GB | 1.4Mbps / 194Mbps | c7i.2xlarge |
| 1 | 1000 | 2 | yes | 49% | 1.53GB | 3.6Mbps / 1.79Gbps | c7i.2xlarge |
| 2 | 1000 | 1 | yes | 73% | 2.38GB | 5.7Mbps / 3.06Gbps | c7i.2xlarge |
| 100 | 10 | 2 | yes | 60% | 1.74GB | 181Mbps / 1.62Gbps | c7i.2xlarge |
| 150 | 10 | 1 | yes | 72% | 2.26GB | 257Mbps / 2.30Gbps | c7i.2xlarge |
| 150 | 10 | 2 | yes | 79% | 2.34GB | 271Mbps / 2.41Gbps | c7i.2xlarge |
| 250 | 10 | 2 | no | 58% | 2.66GB | 47Mbps / 439Mbps | c7i.2xlarge |
| 1000 | 2 | 2 | no | 78% | 2.31GB | 178Mbps / 195Mbps | c7i.2xlarge |
| 2 | 1000 | 2 | yes | 41% | 2.6GB | 7.23Mbps / 3.60Gbps | c7i.4xlarge |
| 3 | 1000 | 2 | yes | 63% | 3.53GB | 10.9Mbps / 5.38Gbps | c7i.4xlarge |
| 4 | 1000 | 2 | yes | 83% | 4.40GB | 14.5Mbps / 7.17Gbps | c7i.4xlarge |
| 250 | 10 | 2 | yes | 79% | 3.49GB | 431Mbps / 3.73Gbps | c7i.4xlarge |
| 500 | 2 | 2 | yes | 71% | 2.54GB | 896Mbps / 919Mbps | c7i.4xlarge |
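These bandwidth figures can be sanity-checked with back-of-the-envelope arithmetic: egress is roughly the number of receivers times the per-stream bitrates from the media specifications above. A sketch for the 1,000-participant screen-sharing row, ignoring protocol overhead:

```shell
# Rough egress estimate (illustrative only): each of the 1000 participants
# receives the 1.6 Mbps screen share plus one unmuted speaker's
# 0.08 Mbps (80 Kbps) audio stream.
awk 'BEGIN {
  participants = 1000
  screen_mbps  = 1.6
  audio_mbps   = 0.08
  speakers     = 1
  egress_mbps  = participants * (screen_mbps + speakers * audio_mbps)
  printf "estimated egress: %.2f Gbps\n", egress_mbps / 1000
}'
```

The result lands close to the 1.68 Gbps measured for that row, which suggests egress scales roughly linearly with receiver count.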
Troubleshooting Metrics Collection¶
Verify RTCD Metrics are Being Collected¶
To verify that Prometheus is successfully collecting RTCD metrics, use this command:
curl http://PROMETHEUS_IP:9090/api/v1/label/__name__/values | jq '.' | grep rtcd
This command queries Prometheus for all available metric names and filters for RTCD-related metrics.
If no RTCD metrics appear, check:
RTCD is running
Prometheus is configured to scrape the RTCD metrics endpoint
RTCD metrics port is accessible from Prometheus (default: 8045)
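To go one step further, the Prometheus targets API (/api/v1/targets) reports per-target scrape health. A minimal sketch, assuming Prometheus is reachable at PROMETHEUS_IP; the `count_targets` helper is illustrative:

```shell
#!/bin/sh
# count_targets: tally "up" vs "down" targets in a /api/v1/targets response.
count_targets() {
  awk '{
    up   += gsub(/"health":"up"/, "&")
    down += gsub(/"health":"down"/, "&")
  } END { printf "up=%d down=%d\n", up, down }'
}

# In production:
#   curl -s http://PROMETHEUS_IP:9090/api/v1/targets | count_targets
# Demonstration on a trimmed sample response:
printf '%s' '{"data":{"activeTargets":[
  {"labels":{"job":"rtcd"},"health":"up"},
  {"labels":{"job":"calls-plugin"},"health":"down"}]}}' | count_targets
```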
Check Prometheus Scrape Targets¶
To verify all Calls-related services are being scraped successfully:
Open the Prometheus web interface (typically http://PROMETHEUS_IP:9090)
Navigate to Status > Targets
Look for your configured Calls services:
Mattermost server (for Calls plugin metrics)
RTCD service
Each target should show status “UP” in green. If a target shows “DOWN” or errors:
Verify the service is running
Check network connectivity between Prometheus and the target
Verify the metrics endpoint is accessible
Other Calls Documentation¶
Calls Overview: Overview of deployment options and architecture
RTCD Setup and Configuration: Comprehensive guide for setting up the dedicated RTCD service
Calls Offloader Setup and Configuration: Setup guide for call recording and transcription
Calls Deployment on Kubernetes: Detailed guide for deploying Calls in Kubernetes environments
Calls Troubleshooting: Detailed troubleshooting steps and debugging techniques
Note: Configure Prometheus storage retention to balance disk usage with your retention needs: use a shorter retention period when storage is constrained, and a longer one when ample storage is available.