Calls Metrics and Monitoring#
Available only on Enterprise plans
This guide provides detailed information on monitoring Mattermost Calls performance and health through metrics and observability tools. Effective monitoring is essential for maintaining optimal call quality and quickly addressing any issues that arise.
Metrics Overview#
Mattermost Calls provides metrics through Prometheus for both the Calls plugin and the RTCD service. These metrics help track:
Active call sessions and participants
Media track statistics
Connection states and errors
Resource utilization (CPU, memory, network)
WebSocket connections and events
The metrics are exposed through HTTP endpoints:
Calls Plugin: /plugins/com.mattermost.calls/metrics
RTCD Service: /metrics (default) or a configured endpoint
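As a quick way to see what an endpoint exposes before Prometheus is set up, the sketch below builds the endpoint URL and lists the metric names found in its text output. It assumes the endpoints return the standard Prometheus text exposition format. The function names are illustrative and not part of Mattermost or RTCD, and the host and port are values you supply.

```python
from urllib.request import urlopen

# Paths taken from this guide; host and port are placeholders you must supply.
CALLS_PLUGIN_METRICS_PATH = "/plugins/com.mattermost.calls/metrics"
RTCD_METRICS_PATH = "/metrics"


def metrics_url(host: str, port: int, path: str) -> str:
    """Build the HTTP URL of a metrics endpoint."""
    return f"http://{host}:{port}{path}"


def metric_names(exposition_text: str) -> set[str]:
    """Return the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # A sample line looks like "name{labels} value" or "name value".
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(url: str, timeout: float = 5.0) -> set[str]:
    """Fetch a metrics endpoint and list the metric names it exposes."""
    with urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

For example, `fetch_metric_names(metrics_url("RTCD_SERVER_IP", 8045, RTCD_METRICS_PATH))` would list the RTCD metric names, assuming the default port is in use and reachable from where the script runs.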
Setting Up Monitoring#
Prerequisites#
To monitor Calls metrics, you’ll need:
Prometheus: For collecting and storing metrics
Grafana: For visualizing metrics (optional but recommended)
Installing Prometheus#
Download and install Prometheus:
Visit the Prometheus download page for installation instructions.
Configure Prometheus to scrape metrics from all Calls-related services:
Complete prometheus.yml configuration for Calls monitoring:
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['PROMETHEUS_IP:9090']
      - job_name: 'mattermost'
        metrics_path: /metrics
        static_configs:
          - targets: ['MATTERMOST_SERVER_IP:8067']
      - job_name: 'calls-plugin'
        metrics_path: /plugins/com.mattermost.calls/metrics
        static_configs:
          - targets: ['MATTERMOST_SERVER_IP:8067']
            labels:
              service_name: 'calls-plugin'
      - job_name: 'rtcd'
        metrics_path: /metrics
        static_configs:
          - targets: ['RTCD_SERVER_IP:8045']
            labels:
              service_name: 'rtcd'
      - job_name: 'rtcd-node-exporter'
        metrics_path: /metrics
        static_configs:
          - targets: ['RTCD_SERVER_IP:9100']
            labels:
              service_name: 'rtcd'
      - job_name: 'calls_offloader-node-exporter'
        metrics_path: /metrics
        static_configs:
          - targets: ['CALLS_OFFLOADER_SERVER_IP:9100']
            labels:
              service_name: 'offloader'
Replace the placeholder IP addresses with your actual server addresses:
MATTERMOST_SERVER_IP: IP address of your Mattermost server
RTCD_SERVER_IP: IP address of your RTCD server
CALLS_OFFLOADER_SERVER_IP: IP address of your calls-offloader server (if deployed)
PROMETHEUS_IP: IP address of your Prometheus server
Note: The configuration above uses the default ports (RTCD: 8045, Mattermost metrics: 8067, etc.). Adjust these ports in prometheus.yml if you have customized them.
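If you manage several environments, one option is to generate the scrape jobs from your real addresses instead of editing placeholders by hand. The following is a minimal sketch of that idea. The helper `scrape_job` is hypothetical, the 192.0.2.x addresses are documentation-only examples, and the output covers only the job entries that go under scrape_configs.

```python
def scrape_job(job_name, metrics_path, target, service_name=None):
    """Render one scrape_configs entry as YAML text (hypothetical helper)."""
    lines = [
        f"  - job_name: '{job_name}'",
        f"    metrics_path: {metrics_path}",
        "    static_configs:",
        f"      - targets: ['{target}']",
    ]
    if service_name is not None:
        # The service_name label matches the convention used in this guide.
        lines += ["        labels:", f"          service_name: '{service_name}'"]
    return "\n".join(lines)


# Example addresses (assumptions); ports are the defaults named in this guide.
mattermost_ip = "192.0.2.10"
rtcd_ip = "192.0.2.20"

jobs = [
    scrape_job("calls-plugin", "/plugins/com.mattermost.calls/metrics",
               f"{mattermost_ip}:8067", service_name="calls-plugin"),
    scrape_job("rtcd", "/metrics", f"{rtcd_ip}:8045", service_name="rtcd"),
]
print("scrape_configs:\n" + "\n".join(jobs))
```

Generating the text keeps the service_name labels consistent across jobs, which matters because the dashboard relies on them.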
Important
Metrics Path: Ensure the metrics paths are correct. The RTCD service exposes metrics at /metrics by default, and the Calls plugin at /plugins/com.mattermost.calls/metrics.
Important
Metrics Configuration Notice: Use the service_name labels as shown in the configuration above. These labels help organize metrics in dashboards and enable proper service identification.
Note
node_exporter: Optional but recommended for system-level metrics (CPU, memory, disk, network). See node_exporter setup guide for installation instructions.
calls-offloader: Only needed if you have call recording/transcription enabled.
Installing Grafana#
Download and install Grafana:
Visit the Grafana download page for installation instructions.
Configure Grafana to use Prometheus as a data source:
Add a new data source in Grafana
Select Prometheus as the type
Enter the URL of your Prometheus server
Test and save the configuration
Import the Mattermost Calls dashboard:
Navigate to Dashboards > Import in Grafana
Enter dashboard ID: 23225 or use the direct link: Mattermost Calls Performance Monitoring
Select your Prometheus data source, then confirm the port used for RTCD metrics (default is 8045) and the port used for the Calls plugin metrics (default is 8067)
Click Import to add the dashboard to your Grafana instance
Note
The dashboard is also available as JSON source from the Mattermost performance assets repository for manual import or customization.
Key Metrics to Monitor#
RTCD Metrics#
Process Metrics#
These metrics help monitor the health and resource usage of the RTCD process:
rtcd_process_cpu_seconds_total: Total CPU time spent
rtcd_process_open_fds: Number of open file descriptors
rtcd_process_max_fds: Maximum number of file descriptors
rtcd_process_resident_memory_bytes: Memory usage in bytes
rtcd_process_virtual_memory_bytes: Virtual memory used
Interpretation:
High CPU usage (>70%) may indicate the need for additional RTCD instances
Steadily increasing memory usage might indicate a memory leak
High number of file descriptors could indicate connection handling issues
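Because rtcd_process_cpu_seconds_total is a cumulative counter, a percentage such as the 70% guideline above has to be derived from the change between two scrapes. The sketch below shows that arithmetic along with a file descriptor usage ratio. Whether the 70% figure is meant per core or across all cores is not stated in this guide, so the `cores` parameter is an assumption you can adjust, and both function names are illustrative.

```python
def cpu_utilization_percent(cpu_seconds_start, cpu_seconds_end,
                            elapsed_seconds, cores=1):
    """CPU usage between two samples of rtcd_process_cpu_seconds_total.

    With cores=1 the result is relative to a single core and can exceed 100
    on multi-core hosts; pass the core count to normalize (an assumption).
    """
    if elapsed_seconds <= 0 or cores <= 0:
        raise ValueError("elapsed_seconds and cores must be positive")
    used = cpu_seconds_end - cpu_seconds_start
    if used < 0:
        # The counter went backwards, so the process most likely restarted.
        used = cpu_seconds_end
    return 100.0 * used / (elapsed_seconds * cores)


def fd_usage_percent(open_fds, max_fds):
    """Share of the file descriptor limit in use (open_fds / max_fds)."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return 100.0 * open_fds / max_fds


# Two scrapes 60 seconds apart: 42 CPU-seconds were consumed in between.
print(cpu_utilization_percent(100.0, 142.0, 60.0))           # single-core view
print(cpu_utilization_percent(100.0, 142.0, 60.0, cores=4))  # normalized to 4 cores
print(fd_usage_percent(512, 1024))
```

In Prometheus itself the same calculation is normally expressed with a rate over the counter; the Python version only makes the arithmetic explicit.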
WebRTC Connection Metrics#
These metrics track the WebRTC connections and media flow:
rtcd_rtc_conn_states_total{state="X"}: Count of connections in different states
rtcd_rtc_errors_total{type="X"}: Count of RTC errors by type
rtcd_rtc_rtp_tracks_total{direction="X"}: Count of RTP tracks (incoming/outgoing)
rtcd_rtc_sessions_total: Total number of active RTC sessions
Interpretation:
Increasing error counts may indicate connectivity or configuration issues
Track by state to see if connections are failing to establish or dropping
Larger track counts require proportionally more CPU and bandwidth
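To spot "increasing error counts", it helps to compare two scrapes of rtcd_rtc_errors_total and look at the increase per type rather than the absolute values. The sketch below assumes the metric behaves as a cumulative counter that resets to zero when RTCD restarts. The label values in the example are made up for illustration and are not the real error types.

```python
def counter_increase(previous: dict, current: dict) -> dict:
    """Per-label increase of a counter between two scrapes.

    Both arguments map a label value (for example an error type) to the
    counter value seen at that scrape.
    """
    increase = {}
    for key, value in current.items():
        before = previous.get(key, 0.0)
        # A value lower than before means the counter was reset (restart),
        # so everything counted since the reset is treated as new.
        increase[key] = value - before if value >= before else value
    return increase


# Hypothetical error types, for illustration only.
earlier = {"ice": 3, "rtp": 10}
later = {"ice": 5, "rtp": 2, "signaling": 1}
print(counter_increase(earlier, later))
```

A type whose increase keeps growing from one interval to the next is the one to investigate first.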
WebSocket Metrics#
These metrics track the signaling channel:
rtcd_ws_connections_total: Total number of active WebSocket connections
rtcd_ws_messages_total{direction="X"}: Count of WebSocket messages (sent/received)
Interpretation:
Connection count should match expected participant numbers
Unusually high message counts might indicate protocol issues
Connection drops might indicate network issues
Calls Plugin Metrics#
Similar metrics are available for the Calls plugin with the following prefixes:
Process metrics: mattermost_plugin_calls_process_*
WebRTC connection metrics: mattermost_plugin_calls_rtc_*
WebSocket metrics: mattermost_plugin_calls_websocket_*
Store metrics: mattermost_plugin_calls_store_ops_total
Performance Baselines#
The following performance benchmarks provide baseline metrics for RTCD deployments under various load conditions and configurations.
Deployment specifications
1x r6i.large nginx proxy
3x c5.large MM app nodes (HA)
2x db.x2g.xlarge RDS Aurora MySQL v8 (one writer, one reader)
1x (c7i.xlarge, c7i.2xlarge, c7i.4xlarge) RTCD
2x c7i.2xlarge load-test agents
App specifications
Mattermost v9.6
Mattermost Calls v0.28.0
RTCD v0.16.0
load-test agent v0.28.0
Media specifications
Speech sample bitrate: 80Kbps
Screen sharing sample bitrate: 1.6Mbps
Results
Below are the detailed benchmarks based on internal performance testing:
| Calls | Users/call | Unmuted/call | Screen sharing | CPU (avg) | Memory (avg) | Bandwidth (in/out) | Instance (EC2) |
|---|---|---|---|---|---|---|---|
| 100 | 8 | 2 | no | 60% | 0.5GB | 22Mbps / 125Mbps | c6i.xlarge |
| 100 | 8 | 2 | no | 30% | 0.5GB | 22Mbps / 125Mbps | c6i.2xlarge |
| 100 | 8 | 2 | yes | 86% | 0.7GB | 280Mbps / 2.2Gbps | c6i.2xlarge |
| 10 | 50 | 2 | no | 35% | 0.3GB | 5.25Mbps / 86Mbps | c6i.xlarge |
| 10 | 50 | 2 | no | 16% | 0.3GB | 5.25Mbps / 86Mbps | c6i.2xlarge |
| 10 | 50 | 2 | yes | 90% | 0.3GB | 32Mbps / 1.33Gbps | c6i.xlarge |
| 10 | 50 | 2 | yes | 45% | 0.3GB | 32Mbps / 1.33Gbps | c6i.2xlarge |
| 5 | 200 | 2 | no | 65% | 0.6GB | 8.2Mbps / 180Mbps | c6i.xlarge |
| 5 | 200 | 2 | no | 30% | 0.6GB | 8.2Mbps / 180Mbps | c6i.2xlarge |
| 5 | 200 | 2 | yes | 90% | 0.7GB | 31Mbps / 2.2Gbps | c6i.2xlarge |
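For a rough sanity check on the bandwidth column, the media payload can be estimated from the sample bitrates listed under Media specifications. The sketch below assumes each unmuted user sends one stream to RTCD and that RTCD forwards it to every other participant in the call. That fan-out model is an assumption of this sketch, not something stated in this guide. It counts payload only, so treat the result as a floor: the measured values in the table are higher, especially with screen sharing.

```python
def estimate_payload_bandwidth_mbps(calls, users_per_call, unmuted_per_call,
                                    voice_kbps=80.0,
                                    screen_shares_per_call=0,
                                    screen_kbps=1600.0):
    """Return (inbound, outbound) media payload in Mbps for an RTCD host.

    Defaults use the sample bitrates from this guide (80Kbps speech,
    1.6Mbps screen sharing). Protocol overhead is not included.
    """
    # Streams arriving at RTCD from the senders in one call.
    per_call_in_kbps = (unmuted_per_call * voice_kbps
                        + screen_shares_per_call * screen_kbps)
    # Assumed fan-out: each stream is forwarded to all other participants.
    per_call_out_kbps = per_call_in_kbps * (users_per_call - 1)
    return (calls * per_call_in_kbps / 1000.0,
            calls * per_call_out_kbps / 1000.0)


# First table row: 100 calls, 8 users each, 2 unmuted, no screen sharing.
inbound, outbound = estimate_payload_bandwidth_mbps(100, 8, 2)
print(inbound, outbound)
```

For that first row the estimate is 16 Mbps in and 112 Mbps out, against the measured 22Mbps / 125Mbps, which fits the expectation that real traffic carries overhead on top of the raw media bitrate.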
Troubleshooting Metrics Collection#
Verify RTCD Metrics are Being Collected#
To verify that Prometheus is successfully collecting RTCD metrics, use this command:
curl http://PROMETHEUS_IP:9090/api/v1/label/__name__/values | jq '.' | grep rtcd
This command queries Prometheus for all available metric names and filters for RTCD-related metrics.
If no RTCD metrics appear, check:
RTCD is running
Prometheus is configured to scrape the RTCD metrics endpoint
RTCD metrics port is accessible from Prometheus (default: 8045)
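The same check can be scripted without jq and grep. The sketch below calls the Prometheus label values API used in the curl command above and filters the returned names. The function names are illustrative, and the base URL is whatever address your Prometheus server listens on.

```python
import json
from urllib.request import urlopen


def filter_metric_names(api_response: dict, needle: str = "rtcd") -> list[str]:
    """Pick metric names containing `needle` from a label-values API response."""
    if api_response.get("status") != "success":
        raise RuntimeError(f"Prometheus returned: {api_response.get('status')}")
    return sorted(name for name in api_response.get("data", []) if needle in name)


def rtcd_metric_names(prometheus_base_url: str, timeout: float = 5.0) -> list[str]:
    """Ask Prometheus for all metric names and keep the RTCD-related ones."""
    url = prometheus_base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urlopen(url, timeout=timeout) as response:
        return filter_metric_names(json.load(response))
```

Calling `rtcd_metric_names("http://PROMETHEUS_IP:9090")` returns an empty list when no RTCD metrics have been collected, which is the case the checklist above helps you diagnose.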
Check Prometheus Scrape Targets#
To verify all Calls-related services are being scraped successfully:
Open the Prometheus web interface (typically http://PROMETHEUS_IP:9090)
Navigate to Status > Targets
Look for your configured Calls services:
Mattermost server (for Calls plugin metrics)
RTCD service
Each target should show status “UP” in green. If a target shows “DOWN” or errors:
Verify the service is running
Check network connectivity between Prometheus and the target
Verify the metrics endpoint is accessible
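The target status shown in the web interface is also available from the Prometheus HTTP API, which is convenient for scripted checks. The sketch below reads the targets endpoint and reports every active target whose health is not "up", together with the last scrape error. The function names are illustrative.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(api_response: dict) -> list[dict]:
    """List active targets that are not healthy, with job, URL and last error."""
    targets = api_response.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": target.get("labels", {}).get("job", ""),
            "scrape_url": target.get("scrapeUrl", ""),
            "last_error": target.get("lastError", ""),
        }
        for target in targets
        if target.get("health") != "up"
    ]


def check_targets(prometheus_base_url: str, timeout: float = 5.0) -> list[dict]:
    """Fetch /api/v1/targets from Prometheus and return the unhealthy ones."""
    url = prometheus_base_url.rstrip("/") + "/api/v1/targets"
    with urlopen(url, timeout=timeout) as response:
        return unhealthy_targets(json.load(response))
```

An empty result from `check_targets("http://PROMETHEUS_IP:9090")` corresponds to every target showing "UP" on the Targets page.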
Other Calls Documentation#
Calls Overview: Overview of deployment options and architecture
RTCD Setup and Configuration: Comprehensive guide for setting up the dedicated RTCD service
Calls Offloader Setup and Configuration: Setup guide for call recording and transcription
Calls Deployment on Kubernetes: Detailed guide for deploying Calls in Kubernetes environments
Calls Troubleshooting: Detailed troubleshooting steps and debugging techniques
Configure Prometheus storage accordingly to balance disk usage with retention needs.