PostgreSQL high availability cluster¶
This guide describes how to deploy a high availability PostgreSQL cluster for Mattermost using repmgr for replication management and automatic failover, HAProxy for connection routing, and Keepalived for Virtual IP (VIP) management.
This is infrastructure-level HA that operates independently of your Mattermost edition. It is compatible with any self-hosted Mattermost deployment.
Note
This guide has been validated on: Ubuntu 24.04 LTS, PostgreSQL 17, repmgr 5.5, HAProxy 2.8, Keepalived.
Architecture overview¶
A PostgreSQL HA cluster for Mattermost consists of three nodes running in parallel. Each node runs the full stack: PostgreSQL, the repmgr daemon (repmgrd), HAProxy, Keepalived, and a health-check service. A Virtual IP (VIP) floats across the nodes; clients connect to the VIP, and HAProxy on whichever node holds it routes each connection to the current primary.
VIP: <CLUSTER_VIP>
│
┌───────────────┼───────────────┐
│ │ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│ pg1 │ │ pg2 │ │ pg3 │
│ │ │ │ │ │
│ HAProxy │ │ HAProxy │ │ HAProxy │
│ Keepalived │ │ Keepalived │ │ Keepalived │
│ pgchk.py │ │ pgchk.py │ │ pgchk.py │
│ repmgrd │ │ repmgrd │ │ repmgrd │
├─────────────┤ ├─────────────┤ ├─────────────┤
│ PostgreSQL │ │ PostgreSQL │ │ PostgreSQL │
│ PRIMARY │ │ STANDBY │ │ STANDBY │
└─────────────┘ └─────────────┘ └─────────────┘
Components:
| Component | Version | Role |
|---|---|---|
| PostgreSQL | 17 | Primary database engine. Streaming replication with replication slots. |
| repmgr / repmgrd | 5.5 | Replication manager. Monitors cluster health and automatically promotes a standby when the primary fails. |
| HAProxy | 2.8 | TCP load balancer. Routes write traffic to the primary and read traffic to standbys via two ports. |
| Keepalived | — | Manages the VIP using VRRP. Keeps the VIP on a healthy node; HAProxy on that node routes traffic to the current primary. |
| pgchk.py | — | HTTP health-check endpoint (port 8008). HAProxy queries this to determine which node is the current primary. |
HAProxy ports:
| Port | Purpose |
|---|---|
| 5000 | Write traffic — routes to the current primary only |
| 5001 | Read traffic — load-balanced across all standbys |
Sizing: This architecture is appropriate for Mattermost deployments up to approximately 2,000 concurrent users. For larger deployments, see Scaling for Enterprise.
Before you begin¶
Is this the right architecture for you?¶
| Scenario | Recommendation | Why |
|---|---|---|
| Cloud-hosted on AWS/GCP/Azure | Use managed RDS/Cloud SQL with Multi-AZ | Managed failover, no infrastructure to operate |
| On-premises or private cloud, single site | This guide — single-DC HA cluster | Automatic failover within the datacenter, no cloud dependency |
| On-premises, two or more sites, DR required | Single-DC HA (this guide) + Multi-DC DR guide (coming soon) | Active/warm-standby across datacenters |
Requirements¶
Hardware (per node — minimum):
- Operating system: Ubuntu 24.04 LTS
- CPU: 2 cores
- RAM: 4 GB
- Disk: 50 GB
You need 3 nodes and one spare IP address on the same subnet for the VIP.
Network — ports that must be open between all three nodes:
| Port / protocol | Purpose |
|---|---|
| 22/tcp | SSH (administration) |
| 5432/tcp | PostgreSQL (replication, repmgr) |
| 8008/tcp | pgchk.py health check (HAProxy → database nodes) |
| VRRP (IP protocol 112) | Keepalived VIP election between nodes |
Ports that Mattermost application servers must reach:
| Port | Purpose |
|---|---|
| 5000 | Write connections (primary) |
| 5001 | Read connections (standbys) |
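As an illustration, one way to open these on a node with ufw (assumptions: ufw is your firewall, the peer rules are repeated once per other node, and <MM_SERVER_IP> is a placeholder for each Mattermost application server):
sudo ufw allow from <PG2_IP> to any port 5432 proto tcp
sudo ufw allow from <PG2_IP> to any port 8008 proto tcp
# VRRP is IP protocol 112, not a TCP/UDP port, so ufw's port syntax cannot
# match it; one option is a direct iptables rule (not persisted by ufw):
sudo iptables -A INPUT -p vrrp -j ACCEPT
sudo ufw allow from <MM_SERVER_IP> to any port 5000:5001 proto tcp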
Software: The following packages will be installed during setup. No pre-installation is required.
- postgresql-17
- postgresql-17-repmgr
- haproxy
- keepalived
- python3 (for pgchk.py)
Node planning worksheet¶
Complete this before starting. You will substitute these values throughout the guide.
| Node | Hostname | IP address | Initial role |
|---|---|---|---|
| 1 | pg1 | _______________ | Primary |
| 2 | pg2 | _______________ | Standby |
| 3 | pg3 | _______________ | Standby |
| VIP | — | _______________ | Floating (entry point for all client traffic) |
Subnet: _______________ (e.g. 10.0.1.0)
Time estimate¶
Allow 2–3 hours for a first-time setup on pre-provisioned servers.
Setup guide¶
Note
Throughout this guide, substitute the IP addresses and subnet you recorded in the node planning worksheet above.
Warning
Complete each phase in order. The checkpoint at the end of each phase must pass before you proceed.
Phase 1: Base installation (all nodes)¶
Run all steps in Phase 1 on pg1, pg2, and pg3.
Step 1.1 — Configure /etc/hosts
On each node, append to /etc/hosts:
<PG1_IP> pg1
<PG2_IP> pg2
<PG3_IP> pg3
Verify hostname resolution on each node:
ping -c 1 pg1 && ping -c 1 pg2 && ping -c 1 pg3
Expected: 3 successful pings.
Step 1.2 — Install PostgreSQL 17 and repmgr 5.5
sudo apt update
sudo apt install -y curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
--fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
> /etc/apt/sources.list.d/pgdg.list'
sudo apt update
sudo apt install -y postgresql-17 postgresql-17-repmgr
✅ Phase 1 checkpoint — run on every node:
sudo systemctl status postgresql@17-main | grep "active (running)"
/usr/lib/postgresql/17/bin/repmgr --version
Pass: the postgresql@17-main instance shows active (running); repmgr prints repmgr 5.5.x. (On Ubuntu, the umbrella postgresql unit reports active (exited), so check the instance unit.)
Fail: If PostgreSQL did not start, check journalctl -u postgresql@17-main for errors.
Phase 2: PostgreSQL configuration (all nodes)¶
Run all steps in Phase 2 on pg1, pg2, and pg3.
Step 2.1 — Configure postgresql.conf
Append to /etc/postgresql/17/main/postgresql.conf:
# Replication settings
listen_addresses = '*'
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
wal_log_hints = on
wal_keep_size = 1024
Step 2.2 — Configure pg_hba.conf
Append to /etc/postgresql/17/main/pg_hba.conf:
# repmgr access
host repmgr repmgr <SUBNET>/24 scram-sha-256
host repmgr repmgr 127.0.0.1/32 scram-sha-256
# Replication connections
host replication repmgr <SUBNET>/24 scram-sha-256
host replication repmgr 127.0.0.1/32 scram-sha-256
Create a .pgpass file for the postgres system user so repmgr can authenticate
without an interactive password prompt. Because repmgr commands run as
postgres, the file must live in that user's home directory
(/var/lib/postgresql on Ubuntu):
sudo -u postgres bash -c 'echo "*:*:repmgr:repmgr:<YOUR_REPMGR_PASSWORD>" >> ~/.pgpass && chmod 600 ~/.pgpass'
Warning
Lab and testing only: If you want to skip password authentication for
initial setup, you can temporarily use trust instead of
scram-sha-256. Do not use trust in production — it allows
passwordless connections from any host on the subnet.
Step 2.3 — Restart PostgreSQL
sudo systemctl restart postgresql
✅ Phase 2 checkpoint — run on every node:
sudo -u postgres psql -c "SHOW wal_level;"
sudo -u postgres psql -c "SHOW shared_preload_libraries;"
Pass: wal_level is replica; shared_preload_libraries contains repmgr.
Fail: If PostgreSQL did not restart, check journalctl -u postgresql@17-main.
Phase 3: repmgr configuration and cluster initialisation¶
Step 3.1 — Create repmgr user and database (pg1 only)
Note
repmgr requires superuser privileges to perform certain cluster operations
including pg_rewind (used to resync a failed primary as a standby) and
event notifications. If your security policy prohibits superuser accounts,
refer to the repmgr documentation on permissions for the
minimum required grants.
sudo -u postgres createuser --superuser repmgr
sudo -u postgres createdb --owner=repmgr repmgr
sudo -u postgres psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"
sudo -u postgres psql -c "ALTER USER repmgr PASSWORD '<YOUR_REPMGR_PASSWORD>';"
Step 3.2 — Create /etc/repmgr.conf (all nodes)
Create /etc/repmgr.conf on each node. Adjust node_id, node_name,
and host for each node:
pg1:
node_id=1
node_name='pg1'
conninfo='host=<PG1_IP> user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/17/main'
use_replication_slots=yes
monitoring_history=yes
log_level=INFO
pg_bindir='/usr/lib/postgresql/17/bin'
service_start_command='sudo /usr/bin/pg_ctlcluster 17 main start'
service_stop_command='sudo /usr/bin/pg_ctlcluster 17 main stop'
service_restart_command='sudo /usr/bin/pg_ctlcluster 17 main restart'
service_reload_command='sudo /usr/bin/pg_ctlcluster 17 main reload'
service_promote_command='sudo /usr/bin/pg_ctlcluster 17 main promote'
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
reconnect_attempts=3
reconnect_interval=5
monitor_interval_secs=2
pg2: Same as above with node_id=2, node_name='pg2', host=<PG2_IP>.
pg3: Same as above with node_id=3, node_name='pg3', host=<PG3_IP>.
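Note that the service_*_command entries above invoke sudo, so the postgres user needs passwordless sudo rights for pg_ctlcluster. A minimal sketch — adjust to your security policy — is to create the rule with sudo visudo -f /etc/sudoers.d/repmgr and add:
postgres ALL=(root) NOPASSWD: /usr/bin/pg_ctlcluster 17 main *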
Step 3.3 — Register primary (pg1 only)
sudo -u postgres repmgr -f /etc/repmgr.conf primary register
Step 3.4 — Clone standbys (pg2 and pg3)
Run on pg2, then pg3:
sudo systemctl stop postgresql
sudo -u postgres repmgr -h <PG1_IP> -U repmgr -d repmgr \
-f /etc/repmgr.conf standby clone --delete-existing-pgdata
sudo systemctl start postgresql
sudo -u postgres repmgr -f /etc/repmgr.conf standby register
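Before starting repmgrd, you can confirm each standby is streaming; pg_stat_wal_receiver is a standard PostgreSQL view populated on standbys:
sudo -u postgres psql -c "SELECT status, sender_host FROM pg_stat_wal_receiver;"
Expected: one row with status = streaming and the primary's address.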
Step 3.5 — Start repmgrd (all nodes)
Create /etc/systemd/system/repmgrd.service:
[Unit]
Description=repmgr daemon
After=postgresql.service
Requires=postgresql.service
[Service]
User=postgres
ExecStart=/usr/lib/postgresql/17/bin/repmgrd -f /etc/repmgr.conf --no-daemonize
Restart=on-failure
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable repmgrd
sudo systemctl start repmgrd
✅ Phase 3 checkpoint — run on any node:
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
Pass: Output shows all three nodes — pg1 as * running (primary), pg2 and
pg3 as running (standby). On pg1, the following query returns 2 rows:
sudo -u postgres psql -c "SELECT client_addr, state FROM pg_stat_replication;"
Fail: A standby showing ! running means replication did not establish.
Check journalctl -u postgresql@17-main (or /var/log/postgresql/postgresql-17-main.log)
on the failed standby. Common cause: firewall blocking port 5432 between nodes.
Phase 4: HAProxy, health check, and VIP (all nodes)¶
Run all steps in Phase 4 on pg1, pg2, and pg3.
Step 4.1 — Install HAProxy
sudo apt install -y haproxy
Step 4.2 — Configure HAProxy
Replace /etc/haproxy/haproxy.cfg:
global
log /dev/log local0
maxconn 4000
defaults
log global
mode tcp
timeout connect 5s
timeout client 30s
timeout server 30s
frontend pg_write
bind *:5000
default_backend pg_primary
frontend pg_read
bind *:5001
default_backend pg_replicas
backend pg_primary
option tcp-check
server pg1 <PG1_IP>:5432 check port 8008
server pg2 <PG2_IP>:5432 check port 8008 backup
server pg3 <PG3_IP>:5432 check port 8008 backup
backend pg_replicas
balance roundrobin
option tcp-check
server pg2 <PG2_IP>:5432 check port 8008
server pg3 <PG3_IP>:5432 check port 8008
server pg1 <PG1_IP>:5432 check port 8008 backup
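Validate the configuration before starting the service (haproxy -c performs a syntax check only):
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
The health checks will report all servers DOWN until pgchk.py (next step) is running on each node.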
Step 4.3 — Deploy pgchk.py
pgchk.py is a lightweight HTTP server that returns 200 OK when the local
node is the primary and 503 otherwise. HAProxy queries port 8008 on each
node to determine where to route connections.
On each node, create /usr/local/bin/pgchk.py with the following content:
#!/usr/bin/env python3
"""HTTP health-check endpoint for HAProxy: returns 200 when this node is the
PostgreSQL primary (on / or /master) or a replica (on /replica), 503 otherwise."""
import argparse
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_PORT = 8008
PG_USER = "postgres"
PG_DB = "postgres"
PG_PORT = "5432"


class PostgresHealthCheckHandler(BaseHTTPRequestHandler):
    def safe_write(self, data):
        # HAProxy may close the connection early; ignore broken pipes.
        try:
            self.wfile.write(data)
        except (BrokenPipeError, ConnectionResetError):
            pass

    def check_postgres_status(self):
        """Return True for standby, False for primary, None if unreachable."""
        try:
            cmd = ["psql", "-U", PG_USER, "-d", PG_DB, "-p", PG_PORT,
                   "-t", "-c", "SELECT pg_is_in_recovery();"]
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
            if result.returncode != 0:
                return None
            output = result.stdout.strip()
            if output == 't':
                return True   # Standby
            elif output == 'f':
                return False  # Primary
            return None
        except Exception:
            return None

    def do_GET(self):
        status = self.check_postgres_status()
        if status is None:
            self.send_response(503)
            self.end_headers()
            self.safe_write(b"PostgreSQL Unreachable\n")
            return
        if self.path in ('/', '/master'):
            if not status:
                self.send_response(200); self.end_headers()
                self.safe_write(b"OK - Primary\n")
            else:
                self.send_response(503); self.end_headers()
                self.safe_write(b"Service Unavailable - Not Primary\n")
        elif self.path == '/replica':
            if status:
                self.send_response(200); self.end_headers()
                self.safe_write(b"OK - Replica\n")
            else:
                self.send_response(503); self.end_headers()
                self.safe_write(b"Service Unavailable - Not Replica\n")
        else:
            self.send_response(404); self.end_headers()
            self.safe_write(b"Not Found\n")

    def log_message(self, format, *args):
        # Suppress per-request logging; HAProxy polls every few seconds.
        pass


def run(port=DEFAULT_PORT):
    httpd = HTTPServer(('', port), PostgresHealthCheckHandler)
    print(f"Starting PostgreSQL Health Check on port {port}...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='PostgreSQL Health Check for HAProxy')
    parser.add_argument('--port', type=int, default=DEFAULT_PORT)
    args = parser.parse_args()
    run(port=args.port)
Make the script executable:
sudo chmod +x /usr/local/bin/pgchk.py
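You can sanity-check the script in the foreground before creating the service (run it as postgres so the local psql call authenticates via peer auth):
sudo -u postgres python3 /usr/local/bin/pgchk.py --port 8008
From a second terminal:
curl -i http://127.0.0.1:8008/          # 200 on the primary, 503 on a standby
curl -i http://127.0.0.1:8008/replica   # 200 on a standby, 503 on the primary
Stop the foreground process with Ctrl+C before continuing.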
Create /etc/systemd/system/pgchk.service. The service must run as the postgres user so that the script's local psql call succeeds via peer authentication:
[Unit]
Description=PostgreSQL Health Check for HAProxy
After=postgresql.service
[Service]
User=postgres
ExecStart=/usr/bin/python3 /usr/local/bin/pgchk.py --port 8008
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable pgchk
sudo systemctl start pgchk
sudo systemctl enable haproxy
sudo systemctl start haproxy
Step 4.4 — Install and configure Keepalived
sudo apt install -y keepalived
First, identify the name of the network interface that carries your node’s IP address:
ip -o link show | awk '{print $2, $9}' | grep UP
Note the interface name (e.g. ens3, enp3s0, eth0). You will use it in
the next step.
Create /etc/keepalived/keepalived.conf. Replace <INTERFACE> with the
interface name from the previous step. Set priority: pg1 gets 101,
pg2 gets 100, pg3 gets 99. Set virtual_ipaddress to your VIP:
vrrp_instance VI_1 {
state BACKUP
interface <INTERFACE>
virtual_router_id 51
priority 101
advert_int 1
nopreempt
virtual_ipaddress {
<CLUSTER_VIP>/24
}
}
sudo systemctl enable keepalived
sudo systemctl start keepalived
✅ Phase 4 checkpoint — run on any node:
# VIP should be active on the primary node (pg1)
ip addr show | grep <CLUSTER_VIP>
# Port 5000 should connect to primary
psql -h <CLUSTER_VIP> -p 5000 -U repmgr -d repmgr \
-c "SELECT inet_server_addr(), pg_is_in_recovery();"
# Port 5001 should connect to a standby
psql -h <CLUSTER_VIP> -p 5001 -U repmgr -d repmgr \
-c "SELECT inet_server_addr(), pg_is_in_recovery();"
Pass: VIP visible on pg1. Port 5000 returns pg_is_in_recovery = f (primary).
Port 5001 returns pg_is_in_recovery = t (standby).
Fail: If the VIP is not on pg1, check journalctl -u keepalived. If HAProxy
is not routing correctly, check journalctl -u haproxy and verify pgchk.py
is responding: curl http://<PG1_IP>:8008 should return HTTP 200.
Phase 5: End-to-end validation¶
Run this phase after all four previous phases pass on all nodes. This confirms the cluster behaves correctly under failure before you connect Mattermost.
Step 5.1 — Confirm healthy starting state
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
Pass: pg1 is * running (primary); pg2 and pg3 are running (standby).
Step 5.2 — Simulate primary failure
On pg1:
sudo systemctl stop postgresql
Wait 30 seconds, then on pg2 or pg3:
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
Pass: One of pg2 or pg3 is now * running (primary). pg1 shows as ! running
(unreachable — expected).
Step 5.3 — Verify client traffic follows the new primary
psql -h <CLUSTER_VIP> -p 5000 -U repmgr -d repmgr \
-c "SELECT inet_server_addr(), pg_is_in_recovery();"
Pass: Returns the IP of the newly promoted node with pg_is_in_recovery = f. (The VIP itself stays on pg1 because the host is still up; pg1's HAProxy now routes port 5000 to the new primary.)
Step 5.4 — Recover the old primary as a standby
On pg1, leave PostgreSQL stopped — repmgr node rejoin rewinds the data directory and then starts the instance itself. Point -d at the current primary (replace <NEW_PRIMARY_IP> with the IP of the node promoted in Step 5.2):
sudo -u postgres repmgr -d 'host=<NEW_PRIMARY_IP> user=repmgr dbname=repmgr' \
  -f /etc/repmgr.conf node rejoin --force-rewind --config-files=postgresql.conf,pg_hba.conf
Then on any node:
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
Pass: All three nodes show running; pg1 is now a standby.
Note
Your cluster is ready for production. Connect Mattermost using the VIP
address and port 5000 as the primary datasource. Optionally configure
port 5001 as a read replica in config.json.
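For example, the relevant config.json fragment might look like this (SqlSettings keys are standard Mattermost configuration; mmuser, the database name, and the sslmode/connect_timeout parameters are placeholders to adjust for your environment):
"SqlSettings": {
    "DriverName": "postgres",
    "DataSource": "postgres://mmuser:<PASSWORD>@<CLUSTER_VIP>:5000/mattermost?sslmode=disable&connect_timeout=10",
    "DataSourceReplicas": [
        "postgres://mmuser:<PASSWORD>@<CLUSTER_VIP>:5001/mattermost?sslmode=disable&connect_timeout=10"
    ]
}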
Day-2 operations¶
Check cluster status¶
Run on any node:
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
Expected healthy output shows one * running primary and two running standbys.
Check replication lag¶
Run on the primary:
sudo -u postgres psql -c "
SELECT client_addr, application_name, state,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS lag
FROM pg_stat_replication;"
Normal lag is under 1 MB during steady state. Lag growing continuously indicates a replication problem — check network connectivity and standby PostgreSQL logs.
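For a time-based view, you can also query a standby directly; pg_last_xact_replay_timestamp() is a standard PostgreSQL function (it reports inflated values on an idle cluster, because no transactions are being replayed):
sudo -u postgres psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;"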
Controlled switchover (planned maintenance)¶
To move the primary role to a standby with zero data loss:
# Run on the TARGET standby (e.g. pg2)
sudo -u postgres repmgr -f /etc/repmgr.conf standby switchover --siblings-follow
repmgr demotes the old primary and promotes this node; --siblings-follow makes the remaining standby follow the new primary. HAProxy routing follows automatically via the health checks.
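You can validate the preconditions first without changing anything; --dry-run is a standard repmgr option that reports any blocking issues:
sudo -u postgres repmgr -f /etc/repmgr.conf standby switchover --siblings-follow --dry-run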
Add a standby node¶
1. Provision a new server and complete Phases 1–2 of the setup guide.
2. Create /etc/repmgr.conf with the next available node_id.
3. On the new node:
sudo systemctl stop postgresql
sudo -u postgres repmgr -h <PRIMARY_IP> -U repmgr -d repmgr \
  -f /etc/repmgr.conf standby clone --delete-existing-pgdata
sudo systemctl start postgresql
sudo -u postgres repmgr -f /etc/repmgr.conf standby register
4. Add the new node to /etc/haproxy/haproxy.cfg on all existing nodes and reload HAProxy: sudo systemctl reload haproxy.
Rejoin a failed node¶
After recovering a failed standby, leave its PostgreSQL instance stopped and rejoin it, pointing -d at the current primary:
sudo -u postgres repmgr -d 'host=<PRIMARY_IP> user=repmgr dbname=repmgr' \
  -f /etc/repmgr.conf node rejoin --force-rewind --config-files=postgresql.conf,pg_hba.conf
To recover a failed primary after automatic failover has already promoted a new primary, run the same command on the old primary; it rejoins the cluster as a standby. A crashed (not cleanly stopped) instance must first be started and shut down cleanly so pg_rewind can run.
Troubleshooting¶
repmgrd is not starting¶
Symptom: systemctl status repmgrd shows failed or activating.
Likely cause: PostgreSQL has not fully started yet, or the repmgr database is not accessible.
Resolution:
# Verify PostgreSQL is running first
sudo systemctl status postgresql@17-main
# Check repmgrd logs
journalctl -u repmgrd -n 50
# Test repmgr connection manually
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
Standby not replicating¶
Symptom: repmgr cluster show shows a standby as ! running, or
pg_stat_replication on the primary shows fewer than expected rows.
Likely cause: Network connectivity issue on port 5432, or pg_hba.conf
not permitting the replication connection.
Resolution:
# From the standby, test connectivity to the primary
pg_isready -h <PRIMARY_IP> -p 5432 -U repmgr
# Check PostgreSQL logs on the standby
sudo -u postgres tail -50 /var/log/postgresql/postgresql-17-main.log
VIP not moving after failover¶
Symptom: After a primary failure and successful repmgr promotion, the VIP remains on the failed node or does not appear on the new primary.
Likely cause: Keepalived is not running, or VRRP traffic is blocked by a firewall.
Resolution:
sudo systemctl status keepalived
journalctl -u keepalived -n 50
# Verify VRRP traffic is not blocked — check cloud security groups or
# iptables rules for protocol 112 (VRRP)
HAProxy routing to wrong node¶
Symptom: Connections on port 5000 land on a standby (writes fail), or port 5001 routes to the primary.
Likely cause: pgchk.py is not running or returning incorrect status.
Resolution:
# Check health check response on each node
curl -v http://<NODE_IP>:8008
# Primary should return HTTP 200; standbys should return HTTP 503
sudo systemctl status pgchk
journalctl -u pgchk -n 30
Split-brain prevention¶
Before initiating failover, repmgrd retries the connection to the primary
(reconnect_attempts=3 at reconnect_interval=5 seconds, roughly 15 seconds in
total), which avoids promoting a standby during transient network blips.
If you suspect a split-brain scenario (two nodes both believing they are
primary), do not write to either node. Check cluster status from a
third node and use repmgr node service --action=stop to fence the
unintended primary before recovering.
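A minimal check-and-fence sequence, assuming pg1 and pg2 both claim the primary role (f means the node believes it is primary):
# From pg3, query each suspect node:
psql -h <PG1_IP> -U repmgr -d repmgr -c "SELECT pg_is_in_recovery();"
psql -h <PG2_IP> -U repmgr -d repmgr -c "SELECT pg_is_in_recovery();"
# On the node that should NOT be primary, stop it via repmgr:
sudo -u postgres repmgr -f /etc/repmgr.conf node service --action=stop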