Administration

Standard Operating Procedures (SOP) for database administration tasks

How to maintain existing PostgreSQL clusters with Pigsty?

This section provides standard operating procedures (SOP) for common PostgreSQL administration tasks:

  • SOP: Standard operating procedures for creating/removing clusters and instances, backup & restore, rolling upgrades, etc.
  • Failure: Common failure troubleshooting strategies and handling methods, such as disk exhaustion, connection exhaustion, XID wraparound, etc.
  • Drop: Emergency procedures for handling accidental data deletion, table drops, and database drops
  • Maintain: Maintenance tasks including regular inspections, post-failover cleanup, bloat management, VACUUM FREEZE, etc.
  • Tuning: Automatic optimization strategies and adjustment methods for memory, CPU, storage parameters, etc.

1 - Troubleshooting

Common failures and analysis troubleshooting approaches

This document lists potential failures in PostgreSQL and Pigsty, as well as SOPs for locating, handling, and analyzing issues.


Disk Space Exhausted

Disk space exhaustion is the most common type of failure.

Symptoms

When the disk space where the database resides is exhausted, PostgreSQL will not work normally and may exhibit the following symptoms: database logs repeatedly report “no space left on device” errors, new data cannot be written, and PostgreSQL may even trigger a PANIC and force shutdown.

Pigsty includes a NodeFsSpaceFull alert rule that triggers when available filesystem space falls below 10%. Use the FS metrics panel in the monitoring system’s NODE Instance dashboard to locate the issue.

Diagnosis

You can also log into the database node and use df -h to view the usage of each mounted partition to determine which partition is full. For database nodes, focus on checking the following directories and their sizes to determine which category of files has filled up the space:

  • Data directory (/pg/data/base): Stores data files for tables and indexes; pay attention to heavy writes and temporary files
  • WAL directory (e.g., /pg/data/pg_wal): Stores PostgreSQL WAL; WAL accumulation (e.g., from retained replication slots) is a common cause of disk exhaustion.
  • Database log directory (e.g., /pg/log): If PG logs are not rotated in time and large volumes of errors are written, they can also consume significant space.
  • Local backup directory (e.g., /pg/backup): When using pgBackRest or similar tools to keep backups locally, backups may also fill up the disk.

If the issue occurs on the Pigsty admin node or monitoring node, also consider:

  • Monitoring data: VictoriaMetrics time-series metrics and VictoriaLogs log storage both consume disk space; check retention policies.
  • Object storage data: Pigsty’s integrated MinIO object storage may be used for PG backup storage.

After identifying the directory consuming the most space, you can further use du -sh <directory> to drill down and find specific large files or subdirectories.
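
For example, a quick drill-down session might look like this (a sketch assuming the default Pigsty directory layout; adjust paths to your environment):

df -h                                                    # find the partition that is full
du -sh /pg/data /pg/data/pg_wal /pg/log 2>/dev/null      # check the usual suspects
du -sh /pg/data/base/* | sort -h | tail -20              # largest databases under the data directory
find /pg/data/base -name 'pgsql_tmp*' -exec du -sh {} +  # leftover temporary files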

Resolution

Disk exhaustion is an emergency requiring immediate action to free space and keep the database running. If the data disk is not separated from the system disk, a full disk may even prevent shell commands from executing; in that case, delete the /pg/dummy placeholder file to free a small amount of emergency space so shell commands work again. If the database has already crashed because pg_wal filled up, restart the database service after clearing space and carefully verify data integrity.
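
A minimal emergency sequence might look like the following (a sketch assuming the default Pigsty layout; verify what you delete before deleting it):

rm -f /pg/dummy                    # remove the placeholder file to regain a little working space
df -h /pg                          # confirm some space is now available
# identify and remove/move the real culprit (rotated logs, local backups, etc.), then if PG crashed:
systemctl restart patroni          # restart PostgreSQL via Patroni and verify data integrity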


Transaction ID Wraparound

PostgreSQL uses 32-bit transaction IDs (XIDs) cyclically; when the usable XID space is exhausted, a “transaction ID wraparound” (XID Wraparound) failure occurs.

Symptoms

The typical sign in the first phase is when the age saturation in the PGSQL Persist - Age Usage panel enters the warning zone. Database logs begin to show messages like: WARNING: database "postgres" must be vacuumed within xxxxxxxx transactions.

If the problem continues to worsen, PostgreSQL enters protection mode: when remaining transaction IDs drop to about 1 million, the database switches to read-only mode; when reaching the limit of about 2.1 billion (2^31), it refuses any new transactions and forces the server to shut down to avoid data corruption.

Diagnosis

PostgreSQL and Pigsty enable automatic garbage collection (autovacuum) by default, so this type of failure usually has a deeper root cause. Common causes include: long-running transactions, misconfigured autovacuum, stuck replication slots, insufficient resources, storage engine/extension bugs, and bad disk blocks.

First identify the database with the highest age, then use the Pigsty PGCAT Database - Tables panel to confirm the age distribution of tables. Also review the database error logs, which usually contain clues to locate the root cause.
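
For example, the following queries against the standard system catalogs show where the XID age is concentrated:

-- databases ordered by transaction ID age
SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database ORDER BY xid_age DESC;

-- tables with the oldest relfrozenxid in the current database
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class WHERE relkind IN ('r', 't', 'm')
ORDER BY xid_age DESC LIMIT 20;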

Resolution

  1. Immediately freeze old transactions: If the database has not yet entered read-only protection mode, immediately execute a manual VACUUM FREEZE on the affected database. You can start by freezing the most severely aged tables one by one rather than doing the entire database at once to accelerate the effect. Connect to the database as a superuser and run VACUUM FREEZE table_name; on tables identified with the largest relfrozenxid, prioritizing tables with the highest XID age. This can quickly reclaim large amounts of transaction ID space.
  2. Single-user mode rescue: If the database is already refusing writes or has crashed for protection, you need to start the database in single-user mode to perform freeze operations. In single-user mode, run VACUUM FREEZE database_name; to freeze and clean the entire database. After completion, restart the database in multi-user mode. This can lift the wraparound lock and make the database writable again. Be very careful when operating in single-user mode and ensure sufficient transaction ID margin to complete the freeze.
  3. Standby node takeover: In some complex scenarios (e.g., when hardware issues prevent vacuum from completing), consider promoting a read-only standby node in the cluster to primary to obtain a relatively clean environment for handling the freeze. For example, if the primary cannot vacuum due to bad blocks, you can manually failover to promote the standby to the new primary, then perform emergency vacuum freeze on it. After ensuring the new primary has frozen old transactions, switch the load back.
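
A minimal sketch of steps 1 and 2 above (table names are placeholders; commands assume superuser/dbsu access and the default /pg/data data directory):

-- step 1: freeze the most aged tables first, then the whole database if time permits
VACUUM FREEZE VERBOSE some_old_table;
VACUUM FREEZE;

# step 2: if the server already refuses to assign XIDs, stop it and freeze in single-user mode
systemctl stop patroni
postgres --single -D /pg/data postgres <<< 'VACUUM FREEZE;'
systemctl start patroni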

Connection Exhaustion

PostgreSQL has a maximum connections configuration (max_connections). When client connections exceed this limit, new connection requests will be rejected. The typical symptom is that applications cannot connect to the database and report errors like FATAL: remaining connection slots are reserved for non-replication superuser connections or too many clients already. This indicates that regular connections are exhausted, leaving only slots reserved for superusers or replication.

Diagnosis

Connection exhaustion is usually caused by a large number of concurrent client requests. You can directly review the database’s current active sessions through PGCAT Instance / PGCAT Database / PGCAT Locks. Determine what types of queries are filling the system and proceed with further handling. Pay special attention to whether there are many connections in the “Idle in Transaction” state and long-running transactions (as well as slow queries).
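
For example, this query against pg_stat_activity summarizes who is holding connections and for how long:

SELECT state, usename, datname, count(*) AS conns, max(now() - xact_start) AS oldest_xact
FROM pg_stat_activity
GROUP BY state, usename, datname
ORDER BY conns DESC;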

Resolution

Kill queries: For situations where exhaustion has already blocked business operations, typically use pg_terminate_backend(pid) immediately for emergency pressure relief. For cases using connection pooling, you can adjust the connection pool size parameters and execute a reload to reduce the number of connections at the database level.
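
A hedged example of such emergency relief (review the matching sessions before terminating them):

-- terminate sessions that have been idle in transaction for more than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 min';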

You can also modify the max_connections parameter to a larger value, but this parameter requires a database restart to take effect.
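
In a Patroni-managed Pigsty cluster, change it through Patroni rather than editing postgresql.conf directly, for example (the value is illustrative):

pg edit-config <cls> --force -p max_connections=1000   # apply via patroni DCS config
pg restart <cls>                                        # restart the cluster to take effect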


etcd Quota Exhausted

An exhausted etcd quota will cause the PG high availability control plane to fail and prevent configuration changes.

Diagnosis

Pigsty uses etcd as the distributed configuration store (DCS) when implementing high availability. etcd itself has a storage quota (default is about 2GB). When etcd storage usage reaches the quota limit, etcd will refuse write operations and report “etcdserver: mvcc: database space exceeded”. In this case, Patroni cannot write heartbeats or update configuration to etcd, causing cluster management functions to fail.

Resolution

Pigsty versions from v2.0.0 to v2.5.1 are affected by this issue by default. Pigsty v2.6.0 added auto-compaction configuration to the deployed etcd, so if etcd is only used for PG high-availability leases, this issue no longer occurs in normal use.
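
If you do hit the quota on an affected version, a typical remediation with etcdctl looks like this (a sketch; endpoint addresses and credentials depend on your deployment):

etcdctl endpoint status -w table                                     # check DB size and alarms
rev=$(etcdctl endpoint status -w json | jq -r '.[0].Status.header.revision')
etcdctl compact "$rev"                                               # compact old revisions
etcdctl defrag                                                       # defragment each member to reclaim space
etcdctl alarm disarm                                                 # clear the NOSPACE alarm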


Defective Storage Engine

TimescaleDB’s experimental storage engine Hypercore has been shown to have defects, with reported cases where VACUUM could not reclaim dead tuples, leading to XID wraparound failures. Users of this feature should migrate to native PostgreSQL tables or TimescaleDB’s default engine promptly.

Detailed introduction: PG New Storage Engine Failure Case (Chinese)

2 - Database Management

Database management: create, modify, delete, rebuild databases, and clone databases using templates

In Pigsty, database management follows an IaC (Infrastructure as Code) approach—define in the configuration inventory, then execute playbooks.

When no baseline SQL is defined, executing the pgsql-db.yml playbook is idempotent. It adjusts the specified database in the specified cluster to match the target state in the configuration inventory.

Note that some parameters can only be specified at creation time. Modifying these parameters requires deleting and recreating the database (using state: recreate to rebuild).


Define Database

Business databases are defined in the cluster parameter pg_databases, which is an array of database definition objects. Databases in the array are created in definition order, so later-defined databases can use previously-defined databases as templates.

Here’s the database definition from the default cluster pg-meta in Pigsty’s demo environment:

pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_databases:
      - { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [{name: postgis, schema: public}, {name: timescaledb}]}
      - { name: grafana  ,owner: dbuser_grafana  ,revokeconn: true ,comment: grafana primary database }
      - { name: bytebase ,owner: dbuser_bytebase ,revokeconn: true ,comment: bytebase primary database }
      - { name: kong     ,owner: dbuser_kong     ,revokeconn: true ,comment: kong the api gateway database }
      - { name: gitea    ,owner: dbuser_gitea    ,revokeconn: true ,comment: gitea meta database }
      - { name: wiki     ,owner: dbuser_wiki     ,revokeconn: true ,comment: wiki meta database }
      - { name: noco     ,owner: dbuser_noco     ,revokeconn: true ,comment: nocodb database }

The only required field is name, which should be a valid and unique database name within the current PostgreSQL cluster—all other parameters have sensible defaults. For complete database definition parameter reference, see Database Configuration Reference.


Create Database

To create a new business database on an existing PostgreSQL cluster, add the database definition to all.children.<cls>.pg_databases, then execute:

bin/pgsql-db <cls> <dbname>    # Equivalent to: pgsql-db.yml -l <cls> -e dbname=<dbname>

Example: Create a business database named myapp

  1. Add the database definition to the configuration file:
pg-meta:
  vars:
    pg_databases:
      - name: myapp
        owner: dbuser_myapp
        schemas: [app]
        extensions:
          - { name: pg_trgm }
          - { name: btree_gin }
        comment: my application database
  2. Execute the create command:
bin/pgsql-db pg-meta myapp

Execution effects:

  • Creates database myapp on the primary
  • Sets database owner to dbuser_myapp
  • Creates schema app
  • Installs extensions pg_trgm and btree_gin
  • Configures default privileges (dbrole_readonly/readwrite/admin)
  • Adds database to Pgbouncer connection pool
  • Registers database as a Grafana data source

Modify Database

Modify database properties by updating the configuration and re-executing the playbook:

bin/pgsql-db <cls> <dbname>    # Idempotent operation, can be executed repeatedly

Modifiable Properties

| Property    | Description                                  | Example                     |
|-------------|----------------------------------------------|-----------------------------|
| owner       | Database owner                               | owner: dbuser_new           |
| tablespace  | Default tablespace (triggers data migration) | tablespace: fast_ssd        |
| is_template | Mark as template database                    | is_template: true           |
| allowconn   | Allow connections                            | allowconn: false            |
| connlimit   | Connection limit                             | connlimit: 100              |
| revokeconn  | Revoke PUBLIC CONNECT privilege              | revokeconn: true            |
| comment     | Comment                                      | comment: new comment        |
| parameters  | Database-level parameters                    | See examples below          |
| schemas     | Add/remove schemas (incremental)             | See Manage Schemas          |
| extensions  | Add/remove extensions (incremental)          | See Manage Extensions       |
| pgbouncer   | Include in connection pool                   | pgbouncer: false            |
| pool_*      | Connection pool parameters                   | See Connection Pool Config  |

Non-modifiable Properties

The following properties cannot be modified after database creation—use state: recreate to rebuild the database:

  • template - Template database
  • encoding - Character encoding
  • locale / lc_collate / lc_ctype - Locale settings
  • locale_provider / icu_locale / icu_rules / builtin_locale - Locale provider settings
  • strategy - Clone strategy

Change Owner

- name: myapp
  owner: dbuser_new_owner     # Change to new owner
bin/pgsql-db pg-meta myapp

Executed SQL:

ALTER DATABASE "myapp" OWNER TO "dbuser_new_owner";
GRANT ALL PRIVILEGES ON DATABASE "myapp" TO "dbuser_new_owner";

Change Connection Limit

- name: myapp
  connlimit: 100              # Limit to max 100 connections

Executed SQL:

ALTER DATABASE "myapp" CONNECTION LIMIT 100;

Revoke Public Connection Privilege

- name: myapp
  owner: dbuser_myapp
  revokeconn: true            # Revoke PUBLIC CONNECT privilege

Executed SQL:

REVOKE CONNECT ON DATABASE "myapp" FROM PUBLIC;
GRANT CONNECT ON DATABASE "myapp" TO "replicator";
GRANT CONNECT ON DATABASE "myapp" TO "dbuser_monitor";
GRANT CONNECT ON DATABASE "myapp" TO "dbuser_dba" WITH GRANT OPTION;
GRANT CONNECT ON DATABASE "myapp" TO "dbuser_myapp" WITH GRANT OPTION;

To restore public connection privilege, set revokeconn: false:

- name: myapp
  revokeconn: false           # Restore PUBLIC CONNECT privilege

Executed SQL:

GRANT CONNECT ON DATABASE "myapp" TO PUBLIC;

Mark as Template Database

- name: app_template
  is_template: true           # Allow any user with CREATEDB privilege to clone

Executed SQL:

ALTER DATABASE "app_template" IS_TEMPLATE true;

Manage Parameters

Database-level parameters are configured via the parameters dictionary, generating ALTER DATABASE ... SET statements.

Set Parameters

- name: myapp
  parameters:
    work_mem: '256MB'
    maintenance_work_mem: '512MB'
    statement_timeout: '30s'
    search_path: 'app,public'

Executed SQL:

ALTER DATABASE "myapp" SET "work_mem" = '256MB';
ALTER DATABASE "myapp" SET "maintenance_work_mem" = '512MB';
ALTER DATABASE "myapp" SET "statement_timeout" = '30s';
ALTER DATABASE "myapp" SET "search_path" = 'app,public';

Reset Parameters to Default

Use the special value DEFAULT (case-insensitive) to reset parameters to PostgreSQL defaults:

- name: myapp
  parameters:
    work_mem: DEFAULT         # Reset to PostgreSQL default
    statement_timeout: DEFAULT

Executed SQL:

ALTER DATABASE "myapp" SET "work_mem" = DEFAULT;
ALTER DATABASE "myapp" SET "statement_timeout" = DEFAULT;

Common Database-Level Parameters

| Parameter                           | Description                  | Example Value |
|-------------------------------------|------------------------------|---------------|
| work_mem                            | Query work memory            | '64MB'        |
| maintenance_work_mem                | Maintenance operation memory | '256MB'       |
| statement_timeout                   | Statement timeout            | '30s'         |
| lock_timeout                        | Lock wait timeout            | '10s'         |
| idle_in_transaction_session_timeout | Idle transaction timeout     | '10min'       |
| search_path                         | Schema search path           | 'app,public'  |
| default_tablespace                  | Default tablespace           | 'fast_ssd'    |
| temp_tablespaces                    | Temporary tablespaces        | 'temp_ssd'    |
| log_statement                       | Statement logging level      | 'ddl'         |

Manage Schemas

Schemas are configured via the schemas array, supporting create, assign owner, and delete operations.

Create Schemas

- name: myapp
  schemas:
    # Simple form: schema name only
    - app
    - api

    # Full form: specify owner
    - { name: core, owner: dbuser_myapp }

Executed SQL:

CREATE SCHEMA IF NOT EXISTS "app";
CREATE SCHEMA IF NOT EXISTS "api";
CREATE SCHEMA IF NOT EXISTS "core" AUTHORIZATION "dbuser_myapp";

Specify Schema Owner

Use the owner field to assign a schema owner—useful for multi-tenant or permission isolation scenarios:

- name: multi_tenant_db
  owner: dbuser_admin
  schemas:
    - { name: tenant_a, owner: dbuser_tenant_a }
    - { name: tenant_b, owner: dbuser_tenant_b }
    - { name: shared, owner: dbuser_admin }

Delete Schemas

Use state: absent to mark schemas for deletion:

- name: myapp
  schemas:
    - { name: deprecated_schema, state: absent }

Executed SQL:

DROP SCHEMA IF EXISTS "deprecated_schema" CASCADE;

Manage Extensions

Extensions are configured via the extensions array, supporting install and uninstall operations.

Install Extensions

- name: myapp
  extensions:
    # Simple form: extension name only
    - postgis
    - pg_trgm

    # Full form: specify schema and version
    - { name: vector, schema: public }
    - { name: pg_stat_statements, schema: monitor, version: '1.10' }

Executed SQL:

CREATE EXTENSION IF NOT EXISTS "postgis" CASCADE;
CREATE EXTENSION IF NOT EXISTS "pg_trgm" CASCADE;
CREATE EXTENSION IF NOT EXISTS "vector" WITH SCHEMA "public" CASCADE;
CREATE EXTENSION IF NOT EXISTS "pg_stat_statements" WITH SCHEMA "monitor" VERSION '1.10' CASCADE;

Uninstall Extensions

Use state: absent to mark extensions for uninstall:

- name: myapp
  extensions:
    - { name: pg_trgm, state: absent }    # Uninstall extension
    - { name: postgis }                    # Keep extension

Executed SQL:

DROP EXTENSION IF EXISTS "pg_trgm" CASCADE;
CREATE EXTENSION IF NOT EXISTS "postgis" CASCADE;

Delete Database

To delete a database, set its state to absent and execute the playbook:

pg_databases:
  - name: olddb
    state: absent
bin/pgsql-db <cls> olddb

Delete operation will:

  1. If database is marked is_template: true, first execute ALTER DATABASE ... IS_TEMPLATE false
  2. Force drop database with DROP DATABASE ... WITH (FORCE) (PG13+)
  3. Terminate all active connections to the database
  4. Remove database from Pgbouncer connection pool
  5. Unregister from Grafana data sources

Protection mechanisms:

  • System databases postgres, template0, template1 cannot be deleted
  • Delete operations only execute on the primary—streaming replication syncs to replicas automatically

Rebuild Database

The recreate state rebuilds a database, equivalent to delete then create:

pg_databases:
  - name: testdb
    state: recreate
    owner: dbuser_test
    baseline: test_init.sql    # Execute initialization after rebuild
bin/pgsql-db <cls> testdb

Use cases:

  • Test environment reset
  • Clear development database
  • Modify immutable properties (encoding, locale, etc.)
  • Restore database to initial state

Difference from manual DROP + CREATE:

  • Single command, no need for two operations
  • Automatically preserves Pgbouncer and Grafana configuration
  • Automatically loads baseline initialization script after execution

Clone Database

You can use an existing database as a template to create a new database, enabling quick replication of database structures.

Basic Clone

pg_databases:
  # 1. First define the template database
  - name: app_template
    owner: dbuser_app
    schemas: [core, api]
    extensions: [postgis, pg_trgm]
    baseline: app_schema.sql

  # 2. Create business database using template
  - name: app_prod
    template: app_template
    owner: dbuser_app

Specify Clone Strategy (PG15+)

- name: app_staging
  template: app_template
  strategy: FILE_COPY        # Or WAL_LOG
  owner: dbuser_app

| Strategy  | Description           | Use Case                                             |
|-----------|-----------------------|------------------------------------------------------|
| FILE_COPY | Direct data file copy | Large templates, general scenarios                   |
| WAL_LOG   | Copy via WAL logs     | Small templates, doesn’t block template connections  |

Use Custom Template Database

When using non-system templates (not template0/template1), Pigsty automatically terminates connections to the template database to allow cloning.

- name: new_db
  template: existing_db      # Use existing business database as template
  owner: dbuser_app

Mark as Template Database

By default, only superusers or database owners can use regular databases as templates. Using is_template: true allows any user with CREATEDB privilege to clone:

- name: shared_template
  is_template: true          # Allow any user with CREATEDB privilege to clone
  owner: dbuser_app

Use ICU Locale Provider

When using the icu locale provider, you must specify template: template0:

- name: myapp_icu
  template: template0        # Must use template0
  locale_provider: icu
  icu_locale: en-US
  encoding: UTF8

Connection Pool Config

By default, all business databases are added to the Pgbouncer connection pool.

Database-Level Connection Pool Parameters

- name: myapp
  pgbouncer: true              # Include in connection pool (default true)
  pool_mode: transaction       # Pool mode: transaction/session/statement
  pool_size: 64                # Default pool size
  pool_size_min: 0             # Minimum pool size
  pool_reserve: 32             # Reserved connections
  pool_connlimit: 100          # Maximum database connections
  pool_auth_user: dbuser_meta  # Auth query user

Generated Configuration

The generated configuration file is located at /etc/pgbouncer/database.txt:

myapp                       = host=/var/run/postgresql pool_mode=transaction pool_size=64

Hide Databases

Some internal databases may not need connection pool access:

- name: internal_db
  pgbouncer: false           # Don't add to connection pool

Pool Mode Explanation

| Mode        | Description                                        | Use Case                              |
|-------------|----------------------------------------------------|---------------------------------------|
| transaction | Return connection after transaction ends (default) | Most OLTP applications                |
| session     | Return connection after session ends               | Applications requiring session state  |
| statement   | Return connection after statement ends             | Stateless queries                     |

Locale Provider

PostgreSQL 15+ introduced the locale_provider parameter, supporting different locale implementations.

Use ICU Provider (PG15+)

- name: myapp_icu
  template: template0        # ICU must use template0
  locale_provider: icu
  icu_locale: en-US          # ICU locale rules
  encoding: UTF8

Use Builtin Provider (PG17+)

- name: myapp_builtin
  template: template0
  locale_provider: builtin
  builtin_locale: C.UTF-8    # Builtin locale rules
  encoding: UTF8

ICU Collation Rules (PG16+)

- name: myapp_custom_icu
  template: template0
  locale_provider: icu
  icu_locale: en-US
  icu_rules: '&V << w <<< W'  # Custom ICU collation rules

Provider Comparison

| Provider | Version Requirement | Features                                 |
|----------|---------------------|------------------------------------------|
| libc     | -                   | Traditional, depends on OS               |
| icu      | PG15+               | Cross-platform consistent, feature-rich  |
| builtin  | PG17+               | Most efficient C/C.UTF-8 collation       |

Quick Reference

Common Commands

| Operation                      | Command                                                   |
|--------------------------------|-----------------------------------------------------------|
| Create database                | bin/pgsql-db <cls> <dbname>                               |
| Modify database                | bin/pgsql-db <cls> <dbname>                               |
| Delete database                | Set state: absent then run bin/pgsql-db <cls> <dbname>    |
| Rebuild database               | Set state: recreate then run bin/pgsql-db <cls> <dbname>  |
| List databases                 | psql -c '\l'                                              |
| View connection pool databases | cat /etc/pgbouncer/database.txt                           |

Common Operation Examples

# Create basic database
- name: myapp
  owner: dbuser_myapp
  comment: my application database

# Create database with extensions
- name: geodata
  owner: dbuser_geo
  extensions: [postgis, postgis_topology]

# Private database with limited connections
- name: secure_db
  owner: dbuser_secure
  revokeconn: true
  connlimit: 10

# Set database-level parameters
- name: analytics
  owner: dbuser_analyst
  parameters:
    work_mem: '512MB'
    statement_timeout: '5min'

# Use ICU locale
- name: i18n_db
  template: template0
  locale_provider: icu
  icu_locale: zh-Hans
  encoding: UTF8

# Delete database
- name: old_db
  state: absent

# Rebuild database
- name: test_db
  state: recreate
  baseline: test_init.sql

Execution Flow

bin/pgsql-db executes these steps in order:

  1. Validate - Check dbname parameter and database definition
  2. Delete (if state=absent/recreate) - Execute DROP DATABASE
  3. Create (if state=create/recreate) - Execute CREATE DATABASE
  4. Configure - Execute ALTER DATABASE to set properties
  5. Initialize - Create schemas, install extensions, execute baseline
  6. Register - Update Pgbouncer and Grafana data sources

For database access permissions, refer to ACL: Database Privileges.

3 - HBA Management

PostgreSQL and Pgbouncer HBA rule management operations: refresh, reload, verify, and troubleshoot.

HBA rule changes require re-rendering configuration files and reloading services. This article covers HBA rule daily management operations.


Quick Reference

| Operation                  | Command                                                   |
|----------------------------|-----------------------------------------------------------|
| Refresh cluster HBA        | bin/pgsql-hba <cls>                                       |
| Refresh specific instances | bin/pgsql-hba <cls> <ip>...                               |
| Refresh PostgreSQL only    | ./pgsql.yml -l <cls> -t pg_hba,pg_reload                  |
| Refresh Pgbouncer only     | ./pgsql.yml -l <cls> -t pgbouncer_hba,pgbouncer_reload    |
| View current HBA           | psql -c "TABLE pg_hba_file_rules"                         |
| Verify HBA config          | psql -c "SELECT pg_reload_conf()"                         |

Refresh HBA Rules

After modifying HBA rules in pigsty.yml, you need to re-render configuration files and reload services.

Using the Admin Script

The recommended approach is using the bin/pgsql-hba script to refresh PostgreSQL and Pgbouncer HBA in one step:

# Refresh entire cluster's HBA rules
bin/pgsql-hba pg-meta

# Refresh specific instances (multiple IPs separated by spaces)
bin/pgsql-hba pg-meta 10.10.10.10
bin/pgsql-hba pg-meta 10.10.10.11 10.10.10.12

# View script help
bin/pgsql-hba --help

The script internally executes:

./pgsql.yml -l <cluster> -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload

Using Ansible Playbook

Directly use the relevant tags from the pgsql.yml playbook:

# Refresh PostgreSQL HBA and reload
./pgsql.yml -l pg-meta -t pg_hba,pg_reload

# Refresh Pgbouncer HBA and reload
./pgsql.yml -l pg-meta -t pgbouncer_hba,pgbouncer_reload

# Refresh both
./pgsql.yml -l pg-meta -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload

# Use extra variables to force reload
./pgsql.yml -l pg-meta -e pg_reload=true -t pg_hba,pg_reload

| Tag              | Description                                         |
|------------------|-----------------------------------------------------|
| pg_hba           | Render PostgreSQL HBA configuration file            |
| pg_reload        | Reload PostgreSQL config (requires pg_reload=true)  |
| pgbouncer_hba    | Render Pgbouncer HBA configuration file             |
| pgbouncer_reload | Reload Pgbouncer config                             |

Configuration File Locations

HBA configuration files are rendered by Ansible:

| Service    | Config File Path             | Template File                       |
|------------|------------------------------|-------------------------------------|
| PostgreSQL | /pg/data/pg_hba.conf         | roles/pgsql/templates/pg_hba.conf   |
| Pgbouncer  | /etc/pgbouncer/pgb_hba.conf  | roles/pgsql/templates/pgbouncer.hba |

Warning: Don’t edit these files directly—they will be overwritten the next time a playbook runs. All changes should be made in pigsty.yml.


Verify HBA Rules

View Currently Active HBA Rules

# Use psql to view PostgreSQL HBA rules
psql -c "TABLE pg_hba_file_rules"

# Or view the config file directly
cat /pg/data/pg_hba.conf

# View Pgbouncer HBA rules
cat /etc/pgbouncer/pgb_hba.conf

Check HBA Configuration Syntax

# PostgreSQL config reload (validates syntax)
psql -c "SELECT pg_reload_conf()"

# If there are syntax errors, check the logs
tail -f /pg/log/postgresql-*.log

Test Connection Authentication

# Test connection for specific user from specific address
psql -h <host> -p 5432 -U <user> -d <database> -c "SELECT 1"

# See which HBA rule matches the connection
psql -c "SELECT * FROM pg_hba_file_rules WHERE database @> ARRAY['<dbname>']::text[]"

Common Management Scenarios

Add New HBA Rule

  1. Edit pigsty.yml, add rule to the cluster’s pg_hba_rules:
pg-meta:
  vars:
    pg_hba_rules:
      - {user: new_user, db: new_db, addr: '192.168.1.0/24', auth: pwd, title: 'new app access'}
  2. Execute refresh:
bin/pgsql-hba pg-meta

Emergency IP Block

When detecting a malicious IP, quickly add a blocklist rule:

  1. Add high-priority (order: 0) deny rule:
pg_hba_rules:
  - {user: all, db: all, addr: '10.1.1.100/32', auth: deny, order: 0, title: 'emergency block'}
  2. Refresh immediately:
bin/pgsql-hba pg-meta

Role-Based Rules

Configure different HBA rules for primary and replica:

pg_hba_rules:
  # Only primary allows write users
  - {user: writer, db: all, addr: intra, auth: pwd, role: primary, title: 'writer on primary'}

  # Replicas allow read-only users
  - {user: reader, db: all, addr: world, auth: ssl, role: replica, title: 'reader on replica'}

After refresh, rules are automatically enabled or disabled based on the instance’s pg_role.

Refresh HBA After Cluster Expansion

When new instances are added to the cluster, rules using addr: cluster need refresh to include new members:

# Add new instance
./pgsql.yml -l 10.10.10.14

# Refresh all instances' HBA (includes new member IPs)
bin/pgsql-hba pg-meta

Refresh HBA After Failover

After Patroni failover, instance pg_role may not match the configuration. If HBA rules use role filtering:

  1. Update role definitions in pigsty.yml
  2. Refresh HBA rules
# Refresh after updating roles in config file
bin/pgsql-hba pg-meta

Troubleshooting

Connection Rejected

Symptom: FATAL: no pg_hba.conf entry for host "x.x.x.x", user "xxx", database "xxx"

Troubleshooting steps:

  1. Check current HBA rules:
psql -c "TABLE pg_hba_file_rules"
  2. Confirm whether the client IP, username, and database match any rule

  3. Check rule order (first match wins)

  4. Add the corresponding rule and refresh

Authentication Failed

Symptom: FATAL: password authentication failed for user "xxx"

Troubleshooting steps:

  1. Confirm password is correct
  2. Check password encryption method (pg_pwd_enc) compatibility with client
  3. Check if user exists: \du or SELECT * FROM pg_roles WHERE rolname = 'xxx'

HBA Rules Not Taking Effect

Troubleshooting steps:

  1. Confirm refresh command was executed
  2. Check if Ansible execution succeeded
  3. Confirm PostgreSQL reloaded:
psql -c "SELECT pg_reload_conf()"
  4. Check if the config file was updated:
head -20 /pg/data/pg_hba.conf

Rule Order Issues

HBA uses first-match-wins logic. If rules aren’t working as expected:

  1. Check order values
  2. Use psql -c "TABLE pg_hba_file_rules" to view actual order
  3. Adjust order values or rule positions

While you can directly edit /pg/data/pg_hba.conf and reload, this is not recommended:

# Direct edit (not recommended)
vi /pg/data/pg_hba.conf

# Reload config
psql -c "SELECT pg_reload_conf()"
# Or
pg_ctl reload -D /pg/data
# Or
systemctl reload postgresql

Problem: Manual changes will be overwritten the next time an Ansible playbook runs.

Correct approach: Always modify in pigsty.yml, then run bin/pgsql-hba to refresh.


Pgbouncer HBA Management

Pgbouncer HBA management is similar to PostgreSQL, with some differences:

Configuration Differences

  • Config file: /etc/pgbouncer/pgb_hba.conf
  • Doesn’t support db: replication
  • Authentication method: local connections use peer instead of ident

Refresh Commands

# Refresh Pgbouncer HBA only
./pgsql.yml -l pg-meta -t pgbouncer_hba,pgbouncer_reload

# Or use unified script (refreshes both PostgreSQL and Pgbouncer)
bin/pgsql-hba pg-meta

View Pgbouncer HBA

cat /etc/pgbouncer/pgb_hba.conf

Best Practices

  1. Always manage in config files: Don’t directly edit pg_hba.conf—all changes through pigsty.yml
  2. Verify in test environment first: HBA changes can cause connection issues—verify in test environment first
  3. Use order to control priority: Blocklist rules use order: 0 to ensure priority matching
  4. Refresh promptly: Refresh HBA after adding/removing instances or failover
  5. Principle of least privilege: Only open necessary access—avoid addr: world + auth: trust
  6. Monitor authentication failures: Watch for authentication failures in pg_stat_activity
  7. Backup configuration: Backup pigsty.yml before important changes

Command Quick Reference

# Refresh HBA (recommended)
bin/pgsql-hba <cluster>

# View PostgreSQL HBA
psql -c "TABLE pg_hba_file_rules"
cat /pg/data/pg_hba.conf

# View Pgbouncer HBA
cat /etc/pgbouncer/pgb_hba.conf

# Reload PostgreSQL config
psql -c "SELECT pg_reload_conf()"

# Test connection
psql -h <host> -U <user> -d <db> -c "SELECT 1"

# View authentication failure logs
tail -f /pg/log/postgresql-*.log | grep -i auth

4 - SOP

Common PostgreSQL administration procedures in Pigsty for maintaining production database clusters.

This document organizes common PostgreSQL administration procedures in Pigsty for maintaining production database clusters.

Here are the standard operating procedures for common PostgreSQL administration tasks:


Cheatsheet

PGSQL playbooks and shortcuts:

bin/pgsql-add   <cls>                   # create pgsql cluster <cls>
bin/pgsql-user  <cls> <username>        # create pg user <username> on <cls>
bin/pgsql-db    <cls> <dbname>          # create pg database <dbname> on <cls>
bin/pgsql-svc   <cls> [...ip]           # reload pg service of cluster <cls>
bin/pgsql-hba   <cls> [...ip]           # reload postgres/pgbouncer HBA rules of cluster <cls>
bin/pgsql-add   <cls> [...ip]           # append replicas for cluster <cls>
bin/pgsql-rm    <cls> [...ip]           # remove replicas from cluster <cls>
bin/pgsql-rm    <cls>                   # remove pgsql cluster <cls>

Patroni admin command and shortcuts:

pg list        <cls>                    # print cluster info
pg edit-config <cls>                    # edit cluster config
pg reload      <cls> [ins]              # reload cluster config
pg restart     <cls> [ins]              # restart pgsql cluster
pg reinit      <cls> [ins]              # reinit cluster members
pg pause       <cls>                    # enter maintenance mode (no auto failover)
pg resume      <cls>                    # exit maintenance mode
pg switchover  <cls>                    # switchover on cluster <cls> (primary is healthy)
pg failover    <cls>                    # failover on cluster <cls> (primary failed)

pgBackRest backup/restore command and shortcuts:

pb info                                 # print pgbackrest repo info
pg-backup                               # make a backup, incr, or full backup if necessary
pg-backup full                          # make a full backup
pg-backup diff                          # make a differential backup
pg-backup incr                          # make an incremental backup
pg-pitr -i                              # restore to most recent backup completion time (not common)
pg-pitr --time="2022-12-30 14:44:44+08" # restore to specific time point (e.g., in case of table/database drop)
pg-pitr --name="my-restore-point"       # restore to named restore point created by pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X            # restore immediately before LSN
pg-pitr --xid="1234567" -X -P           # restore immediately before specific transaction ID, then promote to primary
pg-pitr --backup=latest                 # restore to latest backup set
pg-pitr --backup=20221108-105325        # restore to specific backup set, can be checked with pgbackrest info

Systemd components quick reference:

systemctl stop patroni                  # start stop restart reload
systemctl stop pgbouncer                # start stop restart reload
systemctl stop pg_exporter              # start stop restart reload
systemctl stop pgbouncer_exporter       # start stop restart reload
systemctl stop node_exporter            # start stop restart
systemctl stop haproxy                  # start stop restart reload
systemctl stop vip-manager              # start stop restart reload
systemctl stop postgres                 # only when patroni_mode == 'remove'

Create Cluster

To create a new Postgres cluster, first define it in the inventory, then initialize:

bin/node-add <cls>                # init nodes for cluster <cls>           # ./node.yml  -l <cls>
bin/pgsql-add <cls>               # init pgsql instances of cluster <cls>  # ./pgsql.yml -l <cls>

Note: PGSQL module requires managed nodes. Use bin/node-add to manage nodes first.
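
For example, a minimal three-node cluster definition in the inventory might look like this (IPs and names are illustrative):

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica }
  vars: { pg_cluster: pg-test }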

Example: Create Cluster

asciicast


Create User

To create a new business user on an existing Postgres cluster, add the user definition to all.children.<cls>.pg_users, then create it using:

bin/pgsql-user <cls> <username>   # ./pgsql-user.yml -l <cls> -e username=<username>
Example: Create Business User

asciicast


Delete User

To delete a user from an existing Postgres cluster, set the user’s state to absent in the definition, then execute:

bin/pgsql-user <cls> <username>   # ./pgsql-user.yml -l <cls> -e username=<username>

For example, to delete dbuser_test user:

pg_users:
  - name: dbuser_test
    state: absent

The deletion process will:

  1. Use the pg-drop-role script to safely delete the user
  2. Automatically disable user login and terminate active connections
  3. Automatically transfer database/tablespace ownership to postgres
  4. Automatically handle object ownership and permissions in all databases
  5. Revoke all role memberships
  6. Create an audit log for traceability
  7. Remove the user from the Pgbouncer user list (if previously added)
  8. Reload Pgbouncer configuration

Protected System Users:

The following system users cannot be deleted via state: absent and will be automatically skipped:

  • postgres (superuser)
  • replicator (or the user configured in pg_replication_username)
  • dbuser_dba (or the user configured in pg_admin_username)
  • dbuser_monitor (or the user configured in pg_monitor_username)
Example: pg-drop-role Script Usage
# Check user dependencies (read-only operation)
pg-drop-role dbuser_old --check

# Preview deletion operation (don't actually execute)
pg-drop-role dbuser_old --dry-run -v

# Delete user, transfer objects to postgres
pg-drop-role dbuser_old

# Delete user, transfer objects to specified user
pg-drop-role dbuser_old dbuser_new

# Force delete (terminate active connections)
pg-drop-role dbuser_old --force

Create Database

To create a new database on an existing Postgres cluster, add the database definition to all.children.<cls>.pg_databases, then create the database as follows:

bin/pgsql-db <cls> <dbname>       # ./pgsql-db.yml -l <cls> -e dbname=<dbname>

Note: If the database specifies a non-default owner, the owner user must already exist, otherwise you must Create User first.

Example: Create Business Database

asciicast


Reload Service

Services are access points exposed by PostgreSQL (reachable via PGURL), served by HAProxy on host nodes.

Use this task when cluster membership changes, for example: appending/removing replicas, switchover/failover, exposing new services, or updating existing service configurations (e.g., LB weights).

To create new services or reload existing services on entire proxy cluster or specific instances:

bin/pgsql-svc <cls>               # pgsql.yml -l <cls> -t pg_service -e pg_reload=true
bin/pgsql-svc <cls> [ip...]       # pgsql.yml -l ip... -t pg_service -e pg_reload=true
Example: Reload PG Service to Remove an Instance

asciicast


Reload HBA

When your Postgres/Pgbouncer HBA rules change, you may need to reload HBA to apply the changes.

If you have any role-specific HBA rules, or IP address ranges referencing cluster member aliases, you may also need to reload HBA after switchover/cluster scaling.

To reload postgres and pgbouncer HBA rules on entire cluster or specific instances:

bin/pgsql-hba <cls>               # pgsql.yml -l <cls> -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload -e pg_reload=true
bin/pgsql-hba <cls> [ip...]       # pgsql.yml -l ip... -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload -e pg_reload=true
Example: Reload Cluster HBA Rules

asciicast


Config Cluster

To change configuration of an existing Postgres cluster, you need to issue control commands on the admin node using the admin user (the user who installed Pigsty, with nopass ssh/sudo):

Alternatively, on any node in the database cluster, using dbsu (default postgres), you can execute admin commands, but only for this cluster.

pg edit-config <cls>              # interactive config a cluster with patronictl

Change patroni parameters and postgresql.parameters, save and apply changes according to prompts.

Example: Non-Interactive Cluster Configuration

You can skip interactive mode and override postgres parameters using the -p option, for example:

pg edit-config -p log_min_duration_statement=1000 pg-test
pg edit-config --force -p shared_preload_libraries='timescaledb, pg_cron, pg_stat_statements, auto_explain'
Example: Change Cluster Config Using Patroni REST API

You can also use the Patroni REST API to change configuration non-interactively, for example:

$ curl -s 10.10.10.11:8008/config | jq .  # get current config
$ curl -u 'postgres:Patroni.API' \
        -d '{"postgresql":{"parameters": {"log_min_duration_statement":200}}}' \
        -s -X PATCH http://10.10.10.11:8008/config | jq .

Note: Patroni sensitive API access (e.g., restart) is restricted to requests from infra/admin nodes, with HTTP basic authentication (username/password) and optional HTTPS protection.

Example: Configure Cluster with patronictl

asciicast


Append Replica

To add a new replica to an existing PostgreSQL cluster, add its definition to the inventory all.children.<cls>.hosts, then:

bin/node-add <ip>                 # add node <ip> to Pigsty management
bin/pgsql-add <cls> <ip>          # init <ip> as new replica of cluster <cls>

This will add node <ip> to pigsty and initialize it as a replica of cluster <cls>.

Cluster services will be reloaded to accept the new member.

Example: Add Replica to pg-test

asciicast

For example, if you want to add pg-test-3 / 10.10.10.13 to existing cluster pg-test, first update the inventory:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary } # existing member
    10.10.10.12: { pg_seq: 2, pg_role: replica } # existing member
    10.10.10.13: { pg_seq: 3, pg_role: replica } # <--- new member
  vars: { pg_cluster: pg-test }

Then apply the changes as follows:

bin/node-add          10.10.10.13   # add node to pigsty
bin/pgsql-add pg-test 10.10.10.13   # init new replica for cluster pg-test on 10.10.10.13

This is similar to cluster initialization but works on a single instance:

[ OK ] Initialize instance 10.10.10.11 in pgsql cluster 'pg-test':
[WARN]   Reminder: add nodes to pigsty first, then install module 'pgsql'
[HINT]     $ bin/node-add  10.10.10.11  # run this first except for infra nodes
[WARN]   Init instance from cluster:
[ OK ]     $ ./pgsql.yml -l '10.10.10.11,&pg-test'
[WARN]   Reload pg_service on existing instances:
[ OK ]     $ ./pgsql.yml -l 'pg-test,!10.10.10.11' -t pg_service

Remove Replica

To remove a replica from an existing PostgreSQL cluster:

bin/pgsql-rm <cls> <ip...>        # ./pgsql-rm.yml -l <ip>

This will remove instance <ip> from cluster <cls>. Cluster services will be reloaded to remove the instance from load balancers.

Example: Remove Replica from pg-test

asciicast

For example, if you want to remove pg-test-3 / 10.10.10.13 from existing cluster pg-test:

bin/pgsql-rm pg-test 10.10.10.13  # remove pgsql instance 10.10.10.13 from pg-test
bin/node-rm  10.10.10.13          # remove node from pigsty (optional)
vi pigsty.yml                     # remove instance definition from inventory
bin/pgsql-svc pg-test             # refresh pg_service on existing instances to remove from load balancer
[ OK ] Remove pgsql instance 10.10.10.13 from 'pg-test':
[WARN]   Remove instance from cluster:
[ OK ]     $ ./pgsql-rm.yml -l '10.10.10.13,&pg-test'

And remove the instance definition from inventory:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica } # <--- remove this line after execution
  vars: { pg_cluster: pg-test }

Finally, you can reload PG service to remove the instance from load balancers:

bin/pgsql-svc pg-test             # reload service on pg-test

Remove Cluster

To remove an entire Postgres cluster, simply run:

bin/pgsql-rm <cls>                # ./pgsql-rm.yml -l <cls>
Example: Remove Cluster

asciicast

Example: Force Remove Cluster

Note: If pg_safeguard is configured for this cluster (or globally set to true), pgsql-rm.yml will abort to avoid accidental cluster removal.

You can explicitly override it with playbook command line parameters to force removal:

./pgsql-rm.yml -l pg-meta -e pg_safeguard=false    # force remove pg cluster pg-meta

Switchover

You can use the patroni command line tool to perform PostgreSQL cluster switchover.

pg switchover <cls>   # interactive mode, you can skip the wizard with the following parameter combination
pg switchover --leader pg-test-1 --candidate=pg-test-2 --scheduled=now --force pg-test
Example: pg-test Switchover

asciicast

$ pg switchover pg-test
Master [pg-test-1]:
Candidate ['pg-test-2', 'pg-test-3'] []: pg-test-2
When should the switchover take place (e.g. 2022-12-26T07:39 )  [now]: now
Current cluster topology
+ Cluster: pg-test (7181325041648035869) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Leader  | running |  1 |           | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-2 | 10.10.10.12 | Replica | running |  1 |         0 | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-3 | 10.10.10.13 | Replica | running |  1 |         0 | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Are you sure you want to switchover cluster pg-test, demoting current master pg-test-1? [y/N]: y
2022-12-26 06:39:58.02468 Successfully switched over to "pg-test-2"
+ Cluster: pg-test (7181325041648035869) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | stopped |    |   unknown | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-2 | 10.10.10.12 | Leader  | running |  1 |           | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-3 | 10.10.10.13 | Replica | running |  1 |         0 | clonefrom: true |
|           |             |         |         |    |           | conf: tiny.yml  |
|           |             |         |         |    |           | spec: 1C.2G.50G |
|           |             |         |         |    |           | version: '15'   |
+-----------+-------------+---------+---------+----+-----------+-----------------+

To perform this via Patroni API (e.g., switch primary from instance 2 to instance 1 at a specified time):

curl -u 'postgres:Patroni.API' \
  -d '{"leader":"pg-test-2", "candidate": "pg-test-1","scheduled_at":"2022-12-26T14:47+08"}' \
  -s -X POST http://10.10.10.11:8008/switchover

After either switchover or failover, you need to refresh services and HBA rules after cluster membership changes. You should complete this promptly (e.g., within a few hours or a day) after the change:

bin/pgsql-svc <cls>
bin/pgsql-hba <cls>

Backup Cluster

To create backups using pgBackRest, run the following commands as local dbsu (default postgres):

pg-backup       # make a backup, incremental or full if necessary
pg-backup full  # make a full backup
pg-backup diff  # make a differential backup
pg-backup incr  # make an incremental backup
pb info         # print backup info (pgbackrest info)

See Backup & Restore for more information.

Example: Create Backup

asciicast

Example: Create Scheduled Backup Task

You can add crontab to node_crontab to specify your backup strategy.

# Full backup daily at 1 AM
- '00 01 * * * postgres /pg/bin/pg-backup full'

# Full backup on Monday at 1 AM, incremental backups on the other days of the week
- '00 01 * * 1 postgres /pg/bin/pg-backup full'
- '00 01 * * 2,3,4,5,6,7 postgres /pg/bin/pg-backup'

Restore Cluster

To restore a cluster to a previous point in time (PITR), run the Pigsty helper script pg-pitr as local dbsu user (default postgres):

pg-pitr -i                              # restore to most recent backup completion time (not common)
pg-pitr --time="2022-12-30 14:44:44+08" # restore to specific time point (e.g., in case of table/database drop)
pg-pitr --name="my-restore-point"       # restore to named restore point created by pg_create_restore_point
pg-pitr --lsn="0/7C82CB8" -X            # restore immediately before LSN
pg-pitr --xid="1234567" -X -P           # restore immediately before specific transaction ID, then promote cluster to primary
pg-pitr --backup=latest                 # restore to latest backup set
pg-pitr --backup=20221108-105325        # restore to specific backup set, can be listed with pgbackrest info

The command will output an operations manual, follow the instructions. See Backup & Restore - PITR for details.

Example: PITR Using Raw pgBackRest Commands
# Restore to latest available point (e.g., hardware failure)
pgbackrest --stanza=pg-meta restore

# PITR to specific time point (e.g., accidental table drop)
pgbackrest --stanza=pg-meta --type=time --target="2022-11-08 10:58:48" \
   --target-action=promote restore

# Restore specific backup point, then promote (or pause|shutdown)
pgbackrest --stanza=pg-meta --type=immediate --target-action=promote \
  --set=20221108-105325F_20221108-105938I restore

Adding Packages

To add new RPM packages, add them to repo_packages and repo_url_packages.

Use ./infra.yml -t repo_build subtask to rebuild local repo on Infra node. Then you can install these packages using ansible’s package module:

ansible pg-test -b -m package -a "name=pg_cron_15,topn_15,pg_stat_monitor_15*"  # install some packages with ansible
Example: Manually Update Packages in Local Repo
# Add upstream repo on infra/admin node, then manually download required packages
cd ~/pigsty; ./infra.yml -t repo_upstream,repo_cache # add upstream repo (internet)
cd /www/pigsty;  repotrack "some_new_package_name"   # download latest RPM packages

# Update local repo metadata
cd ~/pigsty; ./infra.yml -t repo_create              # recreate local repo
./node.yml -t node_repo                              # refresh YUM/APT cache on all nodes

# You can also manually refresh YUM/APT cache on nodes using Ansible
ansible all -b -a 'yum clean all'                    # clean node repo cache
ansible all -b -a 'yum makecache'                    # rebuild yum/apt cache from new repo
ansible all -b -a 'apt clean'                        # clean APT cache (Ubuntu/Debian)
ansible all -b -a 'apt update'                       # rebuild APT cache (Ubuntu/Debian)

For example, you can install or upgrade packages as follows:

ansible pg-test -b -m package -a "name=postgresql15* state=latest"

Install Extension

If you want to install extensions on a PostgreSQL cluster, add them to pg_extensions, then execute:

./pgsql.yml -t pg_extension     # install extensions

Some extensions need to be loaded in shared_preload_libraries to take effect. You can add them to pg_libs, or configure an existing cluster.

Finally, execute CREATE EXTENSION <extname>; on the cluster’s primary to complete extension installation.
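
For example, a hedged inventory snippet that installs and preloads an extension for a cluster might look like this:

pg-test:
  vars:
    pg_cluster: pg-test
    pg_extensions: [ pg_cron ]                              # packages to install on cluster nodes
    pg_libs: 'pg_cron, pg_stat_statements, auto_explain'    # preload via shared_preload_libraries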

Example: Install pg_cron Extension on pg-test Cluster
ansible pg-test -b -m package -a "name=pg_cron_15"          # install pg_cron package on all nodes
# Add pg_cron to shared_preload_libraries
pg edit-config --force -p shared_preload_libraries='timescaledb, pg_cron, pg_stat_statements, auto_explain'
pg restart --force pg-test                                  # restart cluster
psql -h pg-test -d postgres -c 'CREATE EXTENSION pg_cron;'  # install pg_cron on primary

For more details, see PGSQL Extension Installation.


Minor Upgrade

To perform minor version upgrade/downgrade, first add packages to the local repo: latest PG minor version RPM/DEB.

First perform rolling upgrade/downgrade on all replicas, then perform cluster switchover to upgrade/downgrade the primary.

ansible <cls> -b -a "yum upgrade/downgrade -y <pkg>"    # upgrade/downgrade packages
pg restart --force <cls>                                # restart cluster
Example: Downgrade PostgreSQL 15.2 to 15.1

Add 15.1 packages to repo and refresh nodes’ yum/apt cache:

cd ~/pigsty; ./infra.yml -t repo_upstream               # add upstream repo
cd /www/pigsty; repotrack postgresql15-*-15.1           # add 15.1 packages to yum repo
cd ~/pigsty; ./infra.yml -t repo_create                 # rebuild repo metadata
ansible pg-test -b -a 'yum clean all'                   # clean node repo cache
ansible pg-test -b -a 'yum makecache'                   # rebuild yum cache from new repo

# For Ubuntu/Debian users, use apt instead of yum
ansible pg-test -b -a 'apt clean'                       # clean node repo cache
ansible pg-test -b -a 'apt update'                      # rebuild apt cache from new repo

Execute downgrade and restart cluster:

ansible pg-test -b -a "yum downgrade -y postgresql15*"  # downgrade packages
pg restart --force pg-test                              # restart entire cluster to complete upgrade
Example: Upgrade PostgreSQL 15.1 Back to 15.2

This time we’ll do a rolling upgrade:

ansible pg-test -b -a "yum upgrade -y postgresql15*"    # upgrade packages (or apt upgrade)
ansible pg-test -b -a '/usr/pgsql/bin/pg_ctl --version' # check binary version is 15.2
pg restart --role replica --force pg-test               # restart replicas
pg switchover --leader pg-test-1 --candidate=pg-test-2 --scheduled=now --force pg-test    # switchover
pg restart --role primary --force pg-test               # restart primary

Major Upgrade

The easiest way to perform a major upgrade is to create a new cluster using the new version, then perform online migration through logical replication and blue-green deployment.

You can also perform in-place major upgrades. When using only the database kernel itself, this is not complicated - use PostgreSQL’s built-in pg_upgrade:

Suppose you want to upgrade PostgreSQL major version from 14 to 15. First add packages to the repo and ensure core extension plugins are installed with the same version numbers on both major versions.

./pgsql.yml -t pg_pkg -e pg_version=15                         # install pg 15 packages
sudo su - postgres; mkdir -p /data/postgres/pg-meta-15/data/   # prepare directory for 15
pg_upgrade -b /usr/pgsql-14/bin/ -B /usr/pgsql-15/bin/ -d /data/postgres/pg-meta-14/data/ -D /data/postgres/pg-meta-15/data/ -v -c # precheck
pg_upgrade -b /usr/pgsql-14/bin/ -B /usr/pgsql-15/bin/ -d /data/postgres/pg-meta-14/data/ -D /data/postgres/pg-meta-15/data/ --link -j8 -v   # perform the upgrade
rm -rf /usr/pgsql; ln -s /usr/pgsql-15 /usr/pgsql;             # fix binary link
mv /data/postgres/pg-meta-14 /data/postgres/pg-meta-15         # rename data directory
rm -rf /pg; ln -s /data/postgres/pg-meta-15 /pg                # fix data directory link

5 - Cluster Management

Standard operation guide for creating/destroying PostgreSQL clusters and scaling existing clusters.

Create Cluster

To create a new Postgres cluster, first define it in the inventory, then initialize:

bin/node-add <cls>                # init nodes for cluster <cls>           # ./node.yml  -l <cls>
bin/pgsql-add <cls>               # init pgsql instances of cluster <cls>  # ./pgsql.yml -l <cls>

Note: PGSQL module requires managed nodes. Use bin/node-add to manage nodes first.

Example: Create Cluster

asciicast


Create User

To create a new business user on an existing Postgres cluster, add the user definition to all.children.<cls>.pg_users, then create it using:

bin/pgsql-user <cls> <username>   # ./pgsql-user.yml -l <cls> -e username=<username>
Example: Create Business User

asciicast


Create Database

To create a new database on an existing Postgres cluster, add the database definition to all.children.<cls>.pg_databases, then create the database as follows:

bin/pgsql-db <cls> <dbname>       # ./pgsql-db.yml -l <cls> -e dbname=<dbname>

Note: If the database specifies a non-default owner, the owner user must already exist, otherwise you must Create User first.

Example: Create Business Database

asciicast


Reload Service

Services are access points exposed by PostgreSQL (reachable via PGURL), served by HAProxy on host nodes.

Use this task when cluster membership changes, for example: appending/removing replicas, switchover/failover, exposing new services, or updating existing service configurations (e.g., LB weights).

To create new services or reload existing services on entire proxy cluster or specific instances:

bin/pgsql-svc <cls>               # pgsql.yml -l <cls> -t pg_service -e pg_reload=true
bin/pgsql-svc <cls> [ip...]       # pgsql.yml -l ip... -t pg_service -e pg_reload=true
Example: Reload PG Service to Remove an Instance

asciicast


Reload HBA

When your Postgres/Pgbouncer HBA rules change, you may need to reload HBA to apply the changes.

If you have any role-specific HBA rules, or IP address ranges referencing cluster member aliases, you may also need to reload HBA after switchover/cluster scaling.

To reload postgres and pgbouncer HBA rules on entire cluster or specific instances:

bin/pgsql-hba <cls>               # pgsql.yml -l <cls> -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload -e pg_reload=true
bin/pgsql-hba <cls> [ip...]       # pgsql.yml -l ip... -t pg_hba,pg_reload,pgbouncer_hba,pgbouncer_reload -e pg_reload=true
Example: Reload Cluster HBA Rules

asciicast


Config Cluster

To change the configuration of an existing Postgres cluster, issue control commands from the admin node as the admin user (the user who installed Pigsty, with passwordless ssh/sudo). Alternatively, you can run admin commands as the dbsu user (postgres by default) on any node of the cluster, but only for that cluster:

pg edit-config <cls>              # interactive config a cluster with patronictl

Change patroni parameters and postgresql.parameters, save and apply changes according to prompts.

Example: Non-Interactive Cluster Configuration

You can skip interactive mode and override postgres parameters using the -p option, for example:

pg edit-config -p log_min_duration_statement=1000 pg-test
pg edit-config --force -p shared_preload_libraries='timescaledb, pg_cron, pg_stat_statements, auto_explain' pg-test
Example: Change Cluster Config Using Patroni REST API

You can also use the Patroni REST API to change configuration non-interactively, for example:

$ curl -s 10.10.10.11:8008/config | jq .  # get current config
$ curl -u 'postgres:Patroni.API' \
        -d '{"postgresql":{"parameters": {"log_min_duration_statement":200}}}' \
        -s -X PATCH http://10.10.10.11:8008/config | jq .

Note: Patroni sensitive API access (e.g., restart) is restricted to requests from infra/admin nodes, with HTTP basic authentication (username/password) and optional HTTPS protection.

Example: Configure Cluster with patronictl

asciicast


Append Replica

To add a new replica to an existing PostgreSQL cluster, add its definition to the inventory all.children.<cls>.hosts, then:

bin/node-add <ip>                 # add node <ip> to Pigsty management
bin/pgsql-add <cls> <ip>          # init <ip> as new replica of cluster <cls>

This will add node <ip> to pigsty and initialize it as a replica of cluster <cls>.

Cluster services will be reloaded to accept the new member.

Example: Add Replica to pg-test

asciicast

For example, if you want to add pg-test-3 / 10.10.10.13 to existing cluster pg-test, first update the inventory:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary } # existing member
    10.10.10.12: { pg_seq: 2, pg_role: replica } # existing member
    10.10.10.13: { pg_seq: 3, pg_role: replica } # <--- new member
  vars: { pg_cluster: pg-test }

Then apply the changes as follows:

bin/node-add          10.10.10.13   # add node to pigsty
bin/pgsql-add pg-test 10.10.10.13   # init new replica for cluster pg-test on 10.10.10.13

This is similar to cluster initialization but works on a single instance:

[ OK ] Initialize instance 10.10.10.13 in pgsql cluster 'pg-test':
[WARN]   Reminder: add nodes to pigsty first, then install module 'pgsql'
[HINT]     $ bin/node-add  10.10.10.13  # run this first except for infra nodes
[WARN]   Init instance from cluster:
[ OK ]     $ ./pgsql.yml -l '10.10.10.13,&pg-test'
[WARN]   Reload pg_service on existing instances:
[ OK ]     $ ./pgsql.yml -l 'pg-test,!10.10.10.13' -t pg_service

Remove Replica

To remove a replica from an existing PostgreSQL cluster:

bin/pgsql-rm <cls> <ip...>        # ./pgsql-rm.yml -l <ip>

This will remove instance <ip> from cluster <cls>. Cluster services will be reloaded to remove the instance from load balancers.

Example: Remove Replica from pg-test

asciicast

For example, if you want to remove pg-test-3 / 10.10.10.13 from existing cluster pg-test:

bin/pgsql-rm pg-test 10.10.10.13  # remove pgsql instance 10.10.10.13 from pg-test
bin/node-rm  10.10.10.13          # remove node from pigsty (optional)
vi pigsty.yml                     # remove instance definition from inventory
bin/pgsql-svc pg-test             # refresh pg_service on existing instances to remove from load balancer

The removal playbook prints a summary similar to:

[ OK ] Remove pgsql instance 10.10.10.13 from 'pg-test':
[WARN]   Remove instance from cluster:
[ OK ]     $ ./pgsql-rm.yml -l '10.10.10.13,&pg-test'

And remove the instance definition from inventory:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
    10.10.10.12: { pg_seq: 2, pg_role: replica }
    10.10.10.13: { pg_seq: 3, pg_role: replica } # <--- remove this line after execution
  vars: { pg_cluster: pg-test }

Finally, you can reload PG service to remove the instance from load balancers:

bin/pgsql-svc pg-test             # reload service on pg-test

Remove Cluster

To remove an entire Postgres cluster, simply run:

bin/pgsql-rm <cls>                # ./pgsql-rm.yml -l <cls>
Example: Remove Cluster

asciicast

Example: Force Remove Cluster

Note: If pg_safeguard is configured for this cluster (or globally set to true), pgsql-rm.yml will abort to avoid accidental cluster removal.

You can explicitly override it with playbook command line parameters to force removal:

./pgsql-rm.yml -l pg-meta -e pg_safeguard=false    # force remove pg cluster pg-meta

6 - User Management

Creating PostgreSQL users/roles, managing connection pool roles, refreshing expiration times, user password rotation

Creating Users

To create a new business user on an existing Postgres cluster, add the user definition to all.children.<cls>.pg_users, then create it using the following command:

bin/pgsql-user <cls> <username>   # ./pgsql-user.yml -l <cls> -e username=<username>

Example: Creating a business user

asciicast


Defining Users

Pigsty defines roles and users in database clusters through two configuration parameters:

  • pg_default_roles: Defines globally unified roles and users
  • pg_users: Defines business users and roles at the database cluster level

The former is used to define roles and users shared across the entire environment, while the latter defines business roles and users specific to individual clusters. Both have the same format, being arrays of user definition objects.

You can define multiple users/roles. They are created in order: global roles first, then cluster-level users, each in array order, so a group role should be defined before the users that belong to it.
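
For example (a hypothetical snippet with illustrative names), a shared group role must be declared before any user that joins it:

pg_users:
  - { name: dbrole_analytics ,login: false ,comment: shared analytics group role }                     # group role defined first
  - { name: dbuser_report ,password: Report.Pass ,roles: [dbrole_analytics] ,comment: report user }    # member can now reference it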

Below is the business user definition in the default cluster pg-meta in the Pigsty demo environment:

pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta     ,password: DBUser.Meta     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view     ,password: DBUser.Viewer   ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
      - {name: dbuser_grafana  ,password: DBUser.Grafana  ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for grafana database    }
      - {name: dbuser_bytebase ,password: DBUser.Bytebase ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for bytebase database   }
      - {name: dbuser_kong     ,password: DBUser.Kong     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for kong api gateway    }
      - {name: dbuser_gitea    ,password: DBUser.Gitea    ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for gitea service       }
      - {name: dbuser_wiki     ,password: DBUser.Wiki     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for wiki.js service     }
      - {name: dbuser_noco     ,password: DBUser.Noco     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: admin user for nocodb service      }

Each user/role definition is an object that may include the following fields, using the dbuser_meta user as an example:

- name: dbuser_meta               # Required, `name` is the only mandatory field in a user definition
  password: DBUser.Meta           # Optional, password, can be a scram-sha-256 hash string or plaintext
  login: true                     # Optional, can log in by default
  superuser: false                # Optional, default is false, is this a superuser?
  createdb: false                 # Optional, default is false, can create databases?
  createrole: false               # Optional, default is false, can create roles?
  inherit: true                   # Optional, by default, can this role use inherited permissions?
  replication: false              # Optional, default is false, can this role perform replication?
  bypassrls: false                # Optional, default is false, can this role bypass row-level security?
  pgbouncer: true                 # Optional, default is false, add this user to the pgbouncer user list? (production users using connection pooling should explicitly set to true)
  connlimit: -1                   # Optional, user connection limit, default -1 disables limit
  expire_in: 3650                 # Optional, expiration time for this role: calculated as created time + n days (higher priority than expire_at)
  expire_at: '2030-12-31'         # Optional, time point when this role expires, specify a specific date using YYYY-MM-DD format string (lower priority than expire_in)
  comment: pigsty admin user      # Optional, description and comment string for this user/role
  roles: [dbrole_admin]           # Optional, default roles are: dbrole_{admin,readonly,readwrite,offline}
  parameters: {}                  # Optional, configure role-level database parameters for this role using `ALTER ROLE SET`
  pool_mode: transaction          # Optional, pgbouncer pool mode defaulting to transaction, at user level
  pool_connlimit: -1              # Optional, maximum database connections at user level, default -1 disables limit
  search_path: public             # Optional, role-level configuration parameters per the PostgreSQL documentation (e.g., the default search_path for this role)

  • The only required field is name, which should be a valid and unique username in the PostgreSQL cluster.
  • Roles don’t need a password, but for login-enabled business users, it’s usually necessary to specify a password.
  • password can be plaintext or a scram-sha-256 / md5 hash string; please avoid using plaintext passwords.
  • Users/roles are created sequentially in array order, so ensure role/group definitions come before members.
  • login, superuser, createdb, createrole, inherit, replication, bypassrls are boolean flags.
  • pgbouncer is disabled by default: to add business users to the pgbouncer user list, you should explicitly set it to true.

ACL System

Pigsty has a built-in, out-of-the-box access control / ACL system. You can easily use it by assigning the following four default roles to business users:

  • dbrole_readwrite: Role with global read-write access (production accounts primarily used by business should have database read-write permissions)
  • dbrole_readonly: Role with global read-only access (if other businesses want read-only access, they can use this role)
  • dbrole_admin: Role with DDL permissions (business administrators, scenarios requiring table creation in applications)
  • dbrole_offline: Role with restricted read-only access (can only access offline instances, typically for personal users)

If you want to redesign your own ACL system, consider customizing the relevant default role, privilege, and HBA parameters and templates.


Creating Users

Users and roles defined in pg_default_roles and pg_users will be automatically created in sequence during the PROVISION phase of cluster initialization. If you want to create users on an existing cluster, use the bin/pgsql-user tool: add the new user/role definition to all.children.<cls>.pg_users, then create the user with the following command:

bin/pgsql-user <cls> <username>    # pgsql-user.yml -l <cls> -e username=<username>

Unlike database creation, the user creation playbook is always idempotent: when the target user already exists, Pigsty modifies its attributes to match the configuration, so running it repeatedly on existing clusters typically won't cause issues.


Modifying Users

The method for modifying PostgreSQL user attributes is the same as Creating Users.

First, adjust your user definition, modify the attributes that need adjustment, then execute the following command to apply:

bin/pgsql-user <cls> <username>    # pgsql-user.yml -l <cls> -e username=<username>

Note that modifying a user does not delete and recreate it: Pigsty alters user attributes with the ALTER USER command, and grants newly listed roles with GRANT; it does not revoke existing privileges or group memberships.
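
Conceptually, applying a changed user definition boils down to statements like these (illustrative only; the actual SQL is generated by the playbook):

ALTER USER dbuser_meta PASSWORD 'DBUser.Meta' CONNECTION LIMIT 100 VALID UNTIL '2030-12-31';
GRANT dbrole_readonly TO dbuser_meta;   -- newly listed roles are granted; existing grants are not revoked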


Deleting Users

To delete a user, set its state to absent and execute the playbook:

pg_users:
  - name: dbuser_old
    state: absent

bin/pgsql-user <cls> dbuser_old

The deletion process will:

  1. Use the pg-drop-role script to safely delete the user
  2. Automatically disable user login and terminate active connections
  3. Automatically transfer database/tablespace ownership to postgres
  4. Automatically handle object ownership and permissions in all databases
  5. Revoke all role memberships
  6. Create an audit log for traceability
  7. Remove the user from the Pgbouncer user list (if previously added)
  8. Reload Pgbouncer configuration

Protected System Users:

The following system users cannot be deleted via state: absent and will be automatically skipped:

  • postgres (superuser)
  • replicator (or the user configured in pg_replication_username)
  • dbuser_dba (or the user configured in pg_admin_username)
  • dbuser_monitor (or the user configured in pg_monitor_username)

pg-drop-role Script

pg-drop-role is a safe user deletion script provided by Pigsty, located at /pg/bin/pg-drop-role.

Usage:

pg-drop-role <role_name> [successor_role] [options]

Common Options:

Option             Description
--check            Only check dependencies, don't execute deletion
--dry-run          Show SQL statements that would be executed, don't actually execute
--force            Force terminate active connections before deletion
-v, --verbose      Show verbose output
-h, --host         Database host
-p, --port         Database port

Examples:

# Check user dependencies (read-only operation)
pg-drop-role dbuser_old --check

# Preview deletion operation (don't actually execute)
pg-drop-role dbuser_old --dry-run -v

# Delete user, transfer objects to postgres
pg-drop-role dbuser_old

# Delete user, transfer objects to specified user
pg-drop-role dbuser_old dbuser_new

# Force delete (terminate active connections)
pg-drop-role dbuser_old --force

Deletion Process:

  1. Pre-check - Verify connection, check if user exists, check if protected
  2. Create audit snapshot - Record all user dependencies
  3. Disable login - ALTER ROLE ... NOLOGIN
  4. Terminate connections - Terminate active connections when using --force
  5. Transfer shared objects - Transfer database, tablespace ownership
  6. Process all databases - Execute REASSIGN OWNED + DROP OWNED in each database
  7. Revoke memberships - Revoke all role memberships
  8. Drop role - Execute DROP ROLE

Pgbouncer Users

Pgbouncer is enabled by default as connection pooling middleware, and Pigsty manages its user list automatically.

Pigsty defaults to adding all users in pg_users that explicitly have the pgbouncer: true flag to the pgbouncer user list.

Users in the Pgbouncer connection pool are listed in /etc/pgbouncer/userlist.txt:

"postgres" ""
"dbuser_wiki" "SCRAM-SHA-256$4096:+77dyhrPeFDT/TptHs7/7Q==$KeatuohpKIYzHPCt/tqBu85vI11o9mar/by0hHYM2W8=:X9gig4JtjoS8Y/o1vQsIX/gY1Fns8ynTXkbWOjUfbRQ="
"dbuser_view" "SCRAM-SHA-256$4096:DFoZHU/DXsHL8MJ8regdEw==$gx9sUGgpVpdSM4o6A2R9PKAUkAsRPLhLoBDLBUYtKS0=:MujSgKe6rxcIUMv4GnyXJmV0YNbf39uFRZv724+X1FE="
"dbuser_monitor" "SCRAM-SHA-256$4096:fwU97ZMO/KR0ScHO5+UuBg==$CrNsmGrx1DkIGrtrD1Wjexb/aygzqQdirTO1oBZROPY=:L8+dJ+fqlMQh7y4PmVR/gbAOvYWOr+KINjeMZ8LlFww="
"dbuser_meta" "SCRAM-SHA-256$4096:leB2RQPcw1OIiRnPnOMUEg==$eyC+NIMKeoTxshJu314+BmbMFpCcspzI3UFZ1RYfNyU=:fJgXcykVPvOfro2MWNkl5q38oz21nSl1dTtM65uYR1Q="
"dbuser_kong" "SCRAM-SHA-256$4096:bK8sLXIieMwFDz67/0dqXQ==$P/tCRgyKx9MC9LH3ErnKsnlOqgNd/nn2RyvThyiK6e4=:CDM8QZNHBdPf97ztusgnE7olaKDNHBN0WeAbP/nzu5A="
"dbuser_grafana" "SCRAM-SHA-256$4096:HjLdGaGmeIAGdWyn2gDt/Q==$jgoyOB8ugoce+Wqjr0EwFf8NaIEMtiTuQTg1iEJs9BM=:ed4HUFqLyB4YpRr+y25FBT7KnlFDnan6JPVT9imxzA4="
"dbuser_gitea" "SCRAM-SHA-256$4096:l1DBGCc4dtircZ8O8Fbzkw==$tpmGwgLuWPDog8IEKdsaDGtiPAxD16z09slvu+rHE74=:pYuFOSDuWSofpD9OZhG7oWvyAR0PQjJBffgHZLpLHds="
"dbuser_dba" "SCRAM-SHA-256$4096:zH8niABU7xmtblVUo2QFew==$Zj7/pq+ICZx7fDcXikiN7GLqKKFA+X5NsvAX6CMshF0=:pqevR2WpizjRecPIQjMZOm+Ap+x0kgPL2Iv5zHZs0+g="
"dbuser_bytebase" "SCRAM-SHA-256$4096:OMoTM9Zf8QcCCMD0svK5gg==$kMchqbf4iLK1U67pVOfGrERa/fY818AwqfBPhsTShNQ=:6HqWteN+AadrUnrgC0byr5A72noqnPugItQjOLFw0Wk="

User-level connection pool parameters are maintained in a separate file: /etc/pgbouncer/useropts.txt, for example:

dbuser_dba                  = pool_mode=session max_user_connections=16
dbuser_monitor              = pool_mode=session max_user_connections=8

When you create a database, Pgbouncer’s database list definition file will be refreshed and take effect through online configuration reload, without affecting existing connections.

Pgbouncer runs with the same dbsu as PostgreSQL, defaulting to the postgres OS user. You can use the pgb alias to access pgbouncer management functions using dbsu.
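
For example, assuming the pgb alias wraps psql against the local pgbouncer admin console, a quick health check might look like:

sudo su - postgres           # switch to the dbsu OS user
pgb -c 'SHOW POOLS;'         # inspect connection pool status
pgb -c 'SHOW DATABASES;'     # list databases known to pgbouncer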

Pigsty also provides a utility function, pgb-route, that can quickly switch pgbouncer database traffic to other nodes in the cluster for zero-downtime migration.

Connection pool user configuration files userlist.txt and useropts.txt will be automatically refreshed when you create users and take effect through online configuration reload, normally without affecting existing connections.

Note that the pgbouncer_auth_query parameter allows you to use dynamic queries to complete connection pool user authentication, which is a compromise solution when you don’t want to manage users in the connection pool.

7 - Parameter Tuning

Tuning Postgres Parameters

Pigsty provides four scenario-based parameter templates by default, which can be specified and used through the pg_conf parameter.

  • tiny.yml: Optimized for small nodes, VMs, and small demos (1-8 cores, 1-16GB)
  • oltp.yml: Optimized for OLTP workloads and latency-sensitive applications (4C8GB+) (default template)
  • olap.yml: Optimized for OLAP workloads and throughput (4C8G+)
  • crit.yml: Optimized for data consistency and critical applications (4C8G+)

Pigsty adopts different parameter optimization strategies for these four default scenarios, as described in the following sections.


Memory Parameter Tuning

Pigsty automatically detects the system’s memory size and uses it as the basis for setting the maximum number of connections and memory-related parameters.

  • pg_max_conn: PostgreSQL maximum connections, auto will use recommended values for different scenarios
  • pg_shared_buffer_ratio: Shared buffer memory ratio, default is 0.25

By default, Pigsty uses 25% of memory as PostgreSQL shared buffers, with the remaining 75% as the operating system cache.

By default, if the user has not set a pg_max_conn maximum connections value, Pigsty will use defaults according to the following rules:

  • oltp: 500 (pgbouncer) / 1000 (postgres)
  • crit: 500 (pgbouncer) / 1000 (postgres)
  • tiny: 300
  • olap: 300

For OLTP and CRIT templates, if the service is not pointing to the pgbouncer connection pool but directly connects to the postgres database, the maximum connections will be doubled to 1000.

After the maximum number of connections is determined, work_mem is calculated as shared_buffers divided by the maximum connections, clamped to the range of 64MB ~ 1GB.

{% raw %}
{% if pg_max_conn != 'auto' and pg_max_conn|int >= 20 %}{% set pg_max_connections = pg_max_conn|int %}{% else %}{% if pg_default_service_dest|default('postgres') == 'pgbouncer' %}{% set pg_max_connections = 500 %}{% else %}{% set pg_max_connections = 1000 %}{% endif %}{% endif %}
{% set pg_max_prepared_transactions = pg_max_connections if 'citus' in pg_libs else 0 %}
{% set pg_max_locks_per_transaction = (2 * pg_max_connections)|int if 'citus' in pg_libs or 'timescaledb' in pg_libs else pg_max_connections %}
{% set pg_shared_buffers = (node_mem_mb|int * pg_shared_buffer_ratio|float) | round(0, 'ceil') | int %}
{% set pg_maintenance_mem = (pg_shared_buffers|int * 0.25)|round(0, 'ceil')|int %}
{% set pg_effective_cache_size = node_mem_mb|int - pg_shared_buffers|int  %}
{% set pg_workmem =  ([ ([ (pg_shared_buffers / pg_max_connections)|round(0,'floor')|int , 64 ])|max|int , 1024])|min|int %}
{% endraw %}
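
As a worked example, assuming a 16GB node, the default pgbouncer service destination, and pg_shared_buffer_ratio = 0.25, the template above works out to roughly:

max_connections       = 500       # default when services point to pgbouncer
shared_buffers        = 4096MB    # 16384MB * 0.25
maintenance_work_mem  = 1024MB    # 25% of shared_buffers
effective_cache_size  = 12288MB   # 16384MB - 4096MB
work_mem              = 64MB      # min(max(floor(4096 / 500) = 8MB, 64MB), 1024MB)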

CPU Parameter Tuning

In PostgreSQL, there are 4 important parameters related to parallel queries. Pigsty automatically optimizes parameters based on the current system’s CPU cores. In all strategies, the total number of parallel processes (total budget) is usually set to CPU cores + 8, with a minimum of 16, to reserve enough background workers for logical replication and extensions. The OLAP and TINY templates vary slightly based on scenarios.

OLTP                                 Setting Logic                      Range Limits
max_worker_processes                 max(100% CPU + 8, 16)              CPU cores + 8, minimum 16
max_parallel_workers                 max(ceil(50% CPU), 2)              1/2 CPU rounded up, minimum 2
max_parallel_maintenance_workers     max(ceil(33% CPU), 2)              1/3 CPU rounded up, minimum 2
max_parallel_workers_per_gather      min(max(ceil(20% CPU), 2), 8)      1/5 CPU rounded up, minimum 2, maximum 8

OLAP                                 Setting Logic                      Range Limits
max_worker_processes                 max(100% CPU + 12, 20)             CPU cores + 12, minimum 20
max_parallel_workers                 max(ceil(80% CPU), 2)              4/5 CPU rounded up, minimum 2
max_parallel_maintenance_workers     max(ceil(33% CPU), 2)              1/3 CPU rounded up, minimum 2
max_parallel_workers_per_gather      max(floor(50% CPU), 2)             1/2 CPU rounded down, minimum 2

CRIT                                 Setting Logic                      Range Limits
max_worker_processes                 max(100% CPU + 8, 16)              CPU cores + 8, minimum 16
max_parallel_workers                 max(ceil(50% CPU), 2)              1/2 CPU rounded up, minimum 2
max_parallel_maintenance_workers     max(ceil(33% CPU), 2)              1/3 CPU rounded up, minimum 2
max_parallel_workers_per_gather      0, enable as needed

TINY                                 Setting Logic                      Range Limits
max_worker_processes                 max(100% CPU + 4, 12)              CPU cores + 4, minimum 12
max_parallel_workers                 max(floor(50% CPU), 1)             1/2 CPU rounded down, minimum 1
max_parallel_maintenance_workers     max(ceil(33% CPU), 1)              1/3 CPU rounded up, minimum 1
max_parallel_workers_per_gather      0, enable as needed

Note that the CRIT and TINY templates disable parallel queries by setting max_parallel_workers_per_gather = 0. Users can enable parallel queries as needed by setting this parameter.
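
For example, to re-enable parallel queries on such a cluster, you could raise the parameter through Patroni configuration management (this particular setting does not require a restart):

pg edit-config -p max_parallel_workers_per_gather=2 <cls>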

Both OLTP and CRIT templates additionally set the following parameters, doubling the parallel query cost to reduce the tendency to use parallel queries.

parallel_setup_cost: 2000           # double from 100 to increase parallel cost
parallel_tuple_cost: 0.2            # double from 0.1 to increase parallel cost
min_parallel_table_scan_size: 16MB  # double from 8MB to increase parallel cost
min_parallel_index_scan_size: 1024  # double from 512 to increase parallel cost

Note that adjustments to the max_worker_processes parameter only take effect after a restart. Additionally, when a replica’s configuration value for this parameter is higher than the primary’s, the replica will fail to start. This parameter must be adjusted through Patroni configuration management, which ensures consistent primary-replica configuration and prevents new replicas from failing to start during failover.
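
A sketch of adjusting it the recommended way, so the value lives in DCS and every current and future member picks it up consistently:

pg edit-config -p max_worker_processes=32 pg-test   # stored via patroni, keeping primary and replicas consistent
pg restart --force pg-test                          # the new value only takes effect after a restart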


Storage Space Parameters

Pigsty automatically detects the total space of the disk where the /data/postgres main data directory is located and uses it as the basis for specifying the following parameters:

{% raw %}
min_wal_size: {{ ([pg_size_twentieth, 200])|min }}GB                  # 1/20 disk size, max 200GB
max_wal_size: {{ ([pg_size_twentieth * 4, 2000])|min }}GB             # 2/10 disk size, max 2000GB
max_slot_wal_keep_size: {{ ([pg_size_twentieth * 6, 3000])|min }}GB   # 3/10 disk size, max 3000GB
temp_file_limit: {{ ([pg_size_twentieth, 200])|min }}GB               # 1/20 of disk size, max 200GB
{% endraw %}
  • temp_file_limit defaults to 5% of disk space, capped at 200GB.
  • min_wal_size defaults to 5% of disk space, capped at 200GB.
  • max_wal_size defaults to 20% of disk space, capped at 2TB.
  • max_slot_wal_keep_size defaults to 30% of disk space, capped at 3TB.

As a special case, the OLAP template allows 20% for temp_file_limit, capped at 2TB.
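
As a worked example, assuming a 2TB data disk (so pg_size_twentieth ≈ 100GB), the rules above yield roughly:

min_wal_size           = 100GB    # min(100, 200)
max_wal_size           = 400GB    # min(100 * 4, 2000)
max_slot_wal_keep_size = 600GB    # min(100 * 6, 3000)
temp_file_limit        = 100GB    # min(100, 200); the OLAP template would allow up to 400GB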


Manual Parameter Tuning

In addition to using Pigsty’s automatically configured parameters, you can also manually tune PostgreSQL parameters.

Use the pg edit-config <cluster> command to interactively edit cluster configuration:

pg edit-config pg-meta

Or use the -p parameter to directly set parameters:

pg edit-config -p log_min_duration_statement=1000 pg-meta
pg edit-config --force -p shared_preload_libraries='timescaledb, pg_cron, pg_stat_statements, auto_explain' pg-meta

You can also use the Patroni REST API to modify configuration:

curl -u 'postgres:Patroni.API' \
    -d '{"postgresql":{"parameters": {"log_min_duration_statement":200}}}' \
    -s -X PATCH http://10.10.10.10:8008/config | jq .

8 - Accidental Deletion

Handling accidental data deletion, table deletion, and database deletion

Accidental Data Deletion

If it’s a small-scale DELETE misoperation, you can consider using the pg_surgery or pg_dirtyread extension for in-place surgical recovery.

-- Immediately disable Auto Vacuum on this table and abort Auto Vacuum worker processes for this table
ALTER TABLE public.some_table SET (autovacuum_enabled = off, toast.autovacuum_enabled = off);

CREATE EXTENSION pg_dirtyread;
SELECT * FROM pg_dirtyread('tablename') AS t(col1 type1, col2 type2, ...);

If the deleted data has already been reclaimed by VACUUM, then use the general accidental deletion recovery process.

Accidental Object Deletion

When DROP/DELETE type misoperations occur, typically decide on a recovery plan according to the following process:

  1. Confirm whether this data can be recovered from the business system or other data systems. If yes, recover directly from the business side.
  2. Confirm whether there is a delayed replica. If yes, advance the delayed replica to the time point before deletion and query the data for recovery.
  3. If the data has been confirmed deleted, confirm backup information and whether the backup range covers the deletion time point. If it does, start PITR.
  4. Confirm whether to perform in-place cluster PITR rollback, or start a new server for replay, or use a replica for replay, and execute the recovery strategy.

Accidental Cluster Deletion

If an entire database cluster is accidentally removed through Pigsty management commands (for example, by mistakenly running the pgsql-rm.yml playbook or the bin/pgsql-rm command), the backup will be deleted along with the cluster unless you have set the pg_rm_backup parameter to false.

Warning: In this situation, your data will be unrecoverable! Think twice before proceeding!

Recommendation: For production environments, you can globally configure this parameter to false in the configuration manifest to preserve backups when removing clusters.
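
A minimal sketch of that global setting in the configuration manifest (placing it under all.vars is one reasonable location):

all:
  vars:
    pg_rm_backup: false    # keep pgbackrest backups when a cluster is removed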

9 - Clone Replicas

How to clone databases, database instances, and database clusters?

PostgreSQL can already replicate data through physical replicas and logical replicas, but sometimes you may need to quickly clone a database, database instance, or entire database cluster. The cloned database can be written to, evolve independently, and not affect the original database. In Pigsty, there are several cloning methods:

  • Clone Database: Clone a new database within the same cluster
  • Clone Instance: Clone a new instance on the same PG node
  • Clone Cluster: Create a new database cluster using PITR mechanism and restore to any point in time of the specified cluster

Clone Database

You can copy a PostgreSQL database through the template mechanism, but no active connections to the template database are allowed during this period.

If you want to clone the postgres database, run the following two statements back to back, ensuring that all connections to the postgres database have been terminated immediately before the clone starts:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'postgres';
CREATE DATABASE pgcopy TEMPLATE postgres STRATEGY FILE_COPY;

Instant Clone

If you are using PostgreSQL 18 or higher, Pigsty configures the file_copy_method parameter by default so that database cloning uses filesystem copy-on-write. This allows you to clone a database in roughly constant time (~200ms), regardless of its size, without physically copying data files.

However, you must explicitly use the FILE_COPY strategy to create the database. Since the STRATEGY parameter of CREATE DATABASE was introduced in PostgreSQL 15, the default value has been WAL_LOG. You need to explicitly specify FILE_COPY for instant cloning.

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'postgres';
CREATE DATABASE pgcopy TEMPLATE postgres STRATEGY FILE_COPY;

For example, when cloning a 30 GB database, a normal clone (WAL_LOG) takes about 18 seconds, while an instant clone (FILE_COPY) completes in roughly 200 milliseconds.

Since Pigsty v4.0, you can use strategy: FILE_COPY in the pg_databases parameter to achieve instant database cloning.

    pg-meta:
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary }
      vars:
        pg_cluster: pg-meta
        pg_version: 18
        pg_databases:
          - name: meta
          - name: meta_dev
            template: meta
            strategy: FILE_COPY         # <---- Introduced in PG 15, instant in PG 18

After configuration, use the standard database creation SOP to create the database:

bin/pgsql-db pg-meta meta_dev

Limitations and Notes

This feature is only available on supported file systems (xfs, btrfs, zfs, apfs). If the file system doesn’t support it, PostgreSQL will fail with an error.

By default, mainstream OS distributions’ xfs have reflink=1 enabled by default, so you don’t need to worry about this in most cases.
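
If in doubt, you can verify reflink support on the filesystem holding the data directory, for example (assuming it is mounted at /data):

xfs_info /data | grep -o 'reflink=[01]'   # reflink=1 means copy-on-write cloning is available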

If your PostgreSQL version is below 15, specifying strategy will have no effect.

Please don’t use the postgres database as a template database for cloning, as management connections typically connect to the postgres database, which prevents the cloning operation.

Use instant cloning with caution in extremely high concurrency/throughput production environments, as it requires clearing all connections to the template database within the cloning window (200ms), otherwise the clone will fail.

10 - Maintenance

Common system maintenance tasks

To ensure Pigsty and PostgreSQL clusters run healthily and stably, some routine maintenance work is required.


Regular Monitoring Review

Pigsty provides an out-of-the-box monitoring platform. We recommend you browse the monitoring dashboards once a day to keep track of system status. At a minimum, we recommend you review the monitoring at least once a week, paying attention to alert events that occur, which can help you avoid most failures and issues in advance.

Here is a list of pre-defined alert rules in Pigsty.


Failover Follow-up

Pigsty’s high availability architecture allows PostgreSQL clusters to automatically perform primary-replica switchovers, meaning operations and DBAs don’t need to intervene or respond immediately. However, users still need to perform the following follow-up work at an appropriate time (e.g., the next business day), including:

  • Investigate and confirm the cause of the failure to prevent recurrence
  • Restore the cluster’s original primary-replica topology as appropriate, or modify the configuration manifest to match the new primary-replica status.
  • Refresh load balancer configuration through bin/pgsql-svc to update service routing status
  • Refresh the cluster’s HBA rules through bin/pgsql-hba to avoid primary-replica-specific rule drift
  • If necessary, use bin/pgsql-rm to remove the failed server and expand with a new replica through bin/pgsql-add
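
A rough checklist of the corresponding commands, run from the admin node (pg-test is a placeholder cluster name):

pg list pg-test            # confirm the new primary-replica topology with patronictl
bin/pgsql-svc pg-test      # refresh haproxy service routing to match the new roles
bin/pgsql-hba pg-test      # re-render role-specific HBA rules on all members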

Table Bloat Management

Long-running PostgreSQL will experience “table bloat” / “index bloat” phenomena, leading to system performance degradation.

Regularly using pg_repack to perform online rebuilding of tables and indexes helps maintain PostgreSQL’s good performance. Pigsty has already installed and enabled this extension by default in all databases, so you can use it directly.

You can use Pigsty’s PGCAT Database - Table Bloat panel to confirm table bloat and index bloat in the database. Select tables and indexes with high bloat rates (larger tables with bloat rates above 50%) and use pg_repack for online reorganization:

pg_repack dbname -t schema.table

Reorganization does not affect normal read and write operations, but the switching moment after reorganization completes requires an AccessExclusive lock on the table, blocking all access. Therefore, for high-throughput businesses, it’s recommended to perform this during off-peak periods or maintenance windows. For more details, please refer to: Managing Relation Bloat


VACUUM FREEZE

Freezing expired transaction IDs (VACUUM FREEZE) is an important PostgreSQL maintenance task used to prevent transaction ID (XID) exhaustion leading to downtime. Although PostgreSQL already provides an automatic vacuum (AutoVacuum) mechanism, for high-standard production environments, we still recommend combining both automatic and manual approaches, regularly executing database-wide VACUUM FREEZE to ensure XID safety.

You can manually execute VACUUM FREEZE on a database using the following commands:

-- Execute VACUUM FREEZE on the entire database
VACUUM FREEZE;

-- Execute VACUUM FREEZE on a specific table
VACUUM FREEZE schema.table_name;

Or set up a scheduled task through crontab, for example, execute every Sunday morning:

# Run VACUUM FREEZE on all databases every Sunday at 3 AM (system crontab / cron.d format, runs as the postgres user)
0 3 * * 0 postgres vacuumdb --all --freeze

11 - Version Upgrade

How to upgrade (or downgrade) PostgreSQL minor version kernel, and how to perform major version upgrades

Minor Version Upgrade

To perform a minor version server upgrade/downgrade, you first need to add software to your local software repository: the latest PG minor version RPM/DEB.

First perform a rolling upgrade/downgrade on all replicas, then execute a cluster switchover to upgrade/downgrade the primary.

ansible <cls> -b -a "yum upgrade/downgrade -y <pkg>"    # Upgrade/downgrade packages
pg restart --force <cls>                                # Restart cluster

For example, perform a rolling upgrade of the pg-test cluster to the latest minor release (here 15.2):

ansible pg-test -b -a "yum upgrade -y postgresql15*"    # Upgrade packages (or apt upgrade)
ansible pg-test -b -a '/usr/pgsql/bin/pg_ctl --version' # Check binary version is 15.2
pg restart --role replica --force pg-test               # Restart replicas
pg switchover --leader pg-test-1 --candidate=pg-test-2 --scheduled=now --force pg-test    # Switchover primary and replica
pg restart --role primary --force pg-test               # Restart primary

Minor Version Downgrade

For example, to downgrade the pg-test cluster back to PostgreSQL 15.1, add the 15.1 packages to the software repository and refresh the nodes' yum/apt cache:

cd ~/pigsty; ./infra.yml -t repo_upstream               # Add upstream repository
cd /www/pigsty; repotrack postgresql15-*-15.1           # Add 15.1 packages to yum repository
cd ~/pigsty; ./infra.yml -t repo_create                 # Rebuild repository metadata
ansible pg-test -b -a 'yum clean all'                   # Clean node repository cache
ansible pg-test -b -a 'yum makecache'                   # Regenerate yum cache from new repository

# For Ubuntu/Debian users, use apt instead of yum
ansible pg-test -b -a 'apt clean'                       # Clean node repository cache
ansible pg-test -b -a 'apt update'                      # Regenerate apt cache from new repository

Execute downgrade and restart cluster:

ansible pg-test -b -a "yum downgrade -y postgresql15*"  # Downgrade packages
pg restart --force pg-test                              # Restart entire cluster to complete the downgrade

Major Version Upgrade

The simplest way to perform a major version upgrade is to create a new cluster using the new version, then perform online migration through logical replication and blue-green deployment.

You can also perform an in-place major version upgrade. When you only use the database kernel itself, this is not complicated; use PostgreSQL’s built-in pg_upgrade:

Suppose you want to upgrade the PostgreSQL major version from 14 to 15. First add the new packages to the repository, and make sure the core extensions you rely on are installed with matching extension versions under both major versions.

./pgsql.yml -t pg_pkg -e pg_version=15                         # Install pg 15 packages
sudo su - postgres; mkdir -p /data/postgres/pg-meta-15/data/   # Prepare directory for 15
pg_upgrade -b /usr/pgsql-14/bin/ -B /usr/pgsql-15/bin/ -d /data/postgres/pg-meta-14/data/ -D /data/postgres/pg-meta-15/data/ -v -c # Pre-check
pg_upgrade -b /usr/pgsql-14/bin/ -B /usr/pgsql-15/bin/ -d /data/postgres/pg-meta-14/data/ -D /data/postgres/pg-meta-15/data/ --link -j8 -v # Execute upgrade
rm -rf /usr/pgsql; ln -s /usr/pgsql-15 /usr/pgsql;             # Fix binary link
mv /data/postgres/pg-meta-14 /data/postgres/pg-meta-15         # Rename data directory
rm -rf /pg; ln -s /data/postgres/pg-meta-15 /pg                # Fix data directory link