Module: PILOT

Extra modules in pilot development.

1 - Module: MySQL

Deploy a MySQL 8.0 cluster with Pigsty for demonstration or benchmarking purposes.

MySQL used to be the “most popular open-source relational database in the world”.

Installation | Configuration | Administration | Playbook | Monitoring | Parameters


Overview

The MySQL module is currently available in Pigsty Pro as a Beta preview. Do NOT use this MySQL deployment in production environments.


Installation

You can install MySQL 8.0 from the official MySQL software repo directly on nodes managed by Pigsty:

# el 7,8,9
./node.yml -t node_install -e '{"node_repo_modules":"node,mysql","node_packages":["mysql-community-server,mysql-community-client"]}'

# debian / ubuntu
./node.yml -t node_install -e '{"node_repo_modules":"node,mysql","node_packages":["mysql-server"]}'

You can also add the MySQL package to the local repo and use the playbook mysql.yml for production deployment.


Configuration

This config snippet defines a single-node MySQL instance, along with its Databases and Users.

my-test:
  hosts: { 10.10.10.10: { mysql_seq: 1, mysql_role: primary } }
  vars:
    mysql_cluster: my-test
    mysql_databases:
      - { name: meta }
    mysql_users:
      - { name: dbuser_meta    ,host: '%' ,password: 'dbuser_meta'    ,priv: { "*.*": "SELECT, UPDATE, DELETE, INSERT" } }
      - { name: dbuser_dba     ,host: '%' ,password: 'DBUser.DBA'     ,priv: { "*.*": "ALL PRIVILEGES" } }
      - { name: dbuser_monitor ,host: '%' ,password: 'DBUser.Monitor' ,priv: { "*.*": "SELECT, PROCESS, REPLICATION CLIENT" } ,connlimit: 3 }

Administration

Here are some basic MySQL cluster management operations:

Create MySQL cluster with mysql.yml:

./mysql.yml -l my-test
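
After the playbook completes, you can run a quick smoke test with the mysql command-line client. This is a minimal sketch assuming the users, passwords, and host from the config snippet above; adjust them to your own inventory:

# connect as the business user and run a trivial query
mysql -h 10.10.10.10 -u dbuser_meta -p'dbuser_meta' meta -e 'SELECT 1'

# check the server version and databases as the admin user
mysql -h 10.10.10.10 -u dbuser_dba -p'DBUser.DBA' -e 'SELECT VERSION(); SHOW DATABASES;'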

Playbook

Pigsty has the following playbooks related to the MySQL module:

  • mysql.yml: Deploy MySQL according to the inventory

mysql.yml

The playbook mysql.yml contains the following subtasks:

mysql-id       : generate mysql instance identity
mysql_clean    : remove existing mysql instance (DANGEROUS)
mysql_dbsu     : create os user mysql
mysql_install  : install mysql rpm/deb packages
mysql_dir      : create mysql data & conf dir
mysql_config   : generate mysql config file
mysql_boot     : bootstrap mysql cluster
mysql_launch   : launch mysql service
mysql_pass     : write mysql password
mysql_db       : create mysql biz database
mysql_user     : create mysql biz user
mysql_exporter : launch mysql exporter
mysql_register : register mysql service to prometheus

Monitoring

Pigsty provides two built-in MySQL dashboards:

MYSQL Overview: MySQL cluster overview

MYSQL Instance: MySQL instance overview


Parameters

MySQL’s available parameters:

#-----------------------------------------------------------------
# MYSQL_IDENTITY
#-----------------------------------------------------------------
# mysql_cluster:           #CLUSTER  # mysql cluster name, required identity parameter
# mysql_role: replica      #INSTANCE # mysql role, required, could be primary or replica
# mysql_seq: 0             #INSTANCE # mysql instance seq number, required identity parameter

#-----------------------------------------------------------------
# MYSQL_BUSINESS
#-----------------------------------------------------------------
# mysql business object definition, overwrite in group vars
mysql_users: []                      # mysql business users
mysql_databases: []                  # mysql business databases
mysql_services: []                   # mysql business services

# global credentials, overwrite in global vars
mysql_root_username: root
mysql_root_password: DBUser.Root
mysql_replication_username: replicator
mysql_replication_password: DBUser.Replicator
mysql_admin_username: dbuser_dba
mysql_admin_password: DBUser.DBA
mysql_monitor_username: dbuser_monitor
mysql_monitor_password: DBUser.Monitor

#-----------------------------------------------------------------
# MYSQL_INSTALL
#-----------------------------------------------------------------
# - install - #
mysql_dbsu: mysql                    # os dbsu name, mysql by default, better not change it
mysql_dbsu_uid: 27                   # os dbsu uid and gid, 27 is the standard mysql uid/gid on EL systems
mysql_dbsu_home: /var/lib/mysql      # mysql home directory, `/var/lib/mysql` by default
mysql_dbsu_ssh_exchange: true        # exchange mysql dbsu ssh key among same mysql cluster
mysql_packages:                      # mysql packages to be installed, `mysql-community*` by default
  - mysql-community*
  - mysqld_exporter

# - bootstrap - #
mysql_data: /data/mysql              # mysql data directory, `/data/mysql` by default
mysql_listen: '0.0.0.0'              # mysql listen addresses, comma separated IP list
mysql_port: 3306                     # mysql listen port, 3306 by default
mysql_sock: /var/lib/mysql/mysql.sock # mysql socket file, `/var/lib/mysql/mysql.sock` by default
mysql_pid: /var/run/mysqld/mysqld.pid # mysql pid file, `/var/run/mysqld/mysqld.pid` by default
mysql_conf: /etc/my.cnf              # mysql config file, `/etc/my.cnf` by default
mysql_log_dir: /var/log              # mysql log dir, `/var/log` by default

mysql_exporter_port: 9104            # mysqld_exporter listen port, 9104 by default

mysql_parameters: {}                 # extra parameters for mysqld
mysql_default_parameters:            # default parameters for mysqld
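
Extra mysqld parameters can also be supplied at deploy time through mysql_parameters. Here is a hedged sketch using the same -e JSON pattern as the node.yml commands above; the parameter values are illustrative, not recommendations:

# pass extra mysqld parameters when deploying the my-test cluster
./mysql.yml -l my-test -e '{"mysql_parameters": {"max_connections": 500, "innodb_buffer_pool_size": "1G"}}'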

2 - Module: Kafka

Deploy a Kafka KRaft-mode cluster with Pigsty: an open-source distributed event streaming platform.

Kafka is an open-source distributed event streaming platform: Installation | Configuration | Administration | Playbook | Monitoring | Parameters | Resources


Overview

Kafka module is currently available in Pigsty Pro as a Beta Preview.


Installation

If you are using the open-source version of Pigsty, you can install Kafka and its Java dependency on designated nodes with the following commands.

Pigsty provides Kafka 3.8.0 RPM and DEB packages in the official Infra repository, which can be downloaded and installed directly.

./node.yml -t node_install  -e '{"node_repo_modules":"infra","node_packages":["kafka"]}'

Kafka requires a Java runtime, so an available JDK has to be installed alongside it (OpenJDK 17 is used by default, but other JDK distributions and versions, such as 8 and 11, also work).

# EL7 (no JDK 17 support)
./node.yml -t node_install  -e '{"node_repo_modules":"node","node_packages":["java-11-openjdk-headless"]}'

# EL8 / EL9 (use OpenJDK 17)
./node.yml -t node_install  -e '{"node_repo_modules":"node","node_packages":["java-17-openjdk-headless"]}'

# Debian / Ubuntu (use OpenJDK 17)
./node.yml -t node_install  -e '{"node_repo_modules":"node","node_packages":["openjdk-17-jdk"]}'

Configuration

Single-node Kafka configuration example. Note that in Pigsty's single-node deployment mode, port 9093 on the admin node is already taken by AlertManager, so it is recommended to pick another port (e.g. 9095) when installing Kafka on the admin node.

kf-main:
  hosts:
    10.10.10.10: { kafka_seq: 1, kafka_role: controller }
  vars:
    kafka_cluster: kf-main
    kafka_data: /data/kafka
    kafka_peer_port: 9095     # 9093 is already held by alertmanager

3-node KRaft-mode Kafka cluster configuration example:

kf-test:
  hosts:
    10.10.10.11: { kafka_seq: 1, kafka_role: controller   }
    10.10.10.12: { kafka_seq: 2, kafka_role: controller   }
    10.10.10.13: { kafka_seq: 3, kafka_role: controller   }
  vars:
    kafka_cluster: kf-test

Administration

Here are some basic Kafka cluster management operations:

Create Kafka clusters with kafka.yml playbook:

./kafka.yml -l kf-main
./kafka.yml -l kf-test
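
After the playbook finishes, you can check the service and its metrics exporter. This is a sketch assuming the systemd unit is named kafka and the exporter listens on its default port 9308 (see Parameters below):

# check the kafka service status (assuming a systemd unit named kafka)
systemctl status kafka

# probe the kafka exporter on its default port
curl -s localhost:9308/metrics | head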

Create a topic named test:

kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092

Here --replication-factor 1 means each message is stored with a single replica (no redundancy), and --partitions 1 means the topic has only one partition.
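
You can inspect the partition and replica assignment of the new topic with the --describe subcommand:

kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092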

Use the following command to view the list of Topics in Kafka:

kafka-topics.sh --bootstrap-server localhost:9092 --list

Use the built-in Kafka producer to send messages to the test Topic:

kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
>haha
>xixi
>hoho
>hello
>world
> ^D

Use the built-in Kafka consumer to read messages from the test Topic:

kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
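
To consume as part of a named consumer group and then inspect its offsets and lag, use the stock group tooling (the group name my-group is illustrative):

# consume as part of a consumer group
kafka-console-consumer.sh --topic test --group my-group --bootstrap-server localhost:9092

# list consumer groups, then show offsets and lag for one group
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group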

Playbook

Pigsty provides one playbook for the Kafka module, used to manage Kafka clusters.


kafka.yml

The kafka.yml playbook for deploying Kafka KRaft mode cluster contains the following subtasks:

kafka-id       : generate kafka instance identity
kafka_clean    : remove existing kafka instance (DANGEROUS)
kafka_user     : create os user kafka
kafka_pkg      : install kafka rpm/deb packages
kafka_link     : create symlink to /usr/kafka
kafka_path     : add kafka bin path to /etc/profile.d
kafka_svc      : install kafka systemd service
kafka_dir      : create kafka data & conf dir
kafka_config   : generate kafka config file
kafka_boot     : bootstrap kafka cluster
kafka_launch   : launch kafka service
kafka_exporter : launch kafka exporter
kafka_register : register kafka service to prometheus

Monitoring

Pigsty provides two built-in dashboards for the KAFKA module:

KAFKA Overview shows the overall monitoring metrics of the Kafka cluster.

KAFKA Instance shows the monitoring metrics details of a single Kafka instance.


Parameters

Available parameters for Kafka module:

#kafka_cluster:           #CLUSTER  # kafka cluster name, required identity parameter
#kafka_role: controller   #INSTANCE # kafka role, controller, broker, or controller-only
#kafka_seq: 0             #INSTANCE # kafka instance seq number, required identity parameter
kafka_clean: false                  # cleanup kafka during init? false by default
kafka_data: /data/kafka             # kafka data directory, `/data/kafka` by default
kafka_version: 3.8.0                # kafka version string
scala_version: 2.13                 # kafka binary scala version
kafka_port: 9092                    # kafka broker listen port
kafka_peer_port: 9093               # kafka broker peer listen port, 9093 by default (conflict with alertmanager)
kafka_exporter_port: 9308           # kafka exporter listen port, 9308 by default
kafka_parameters:                   # kafka parameters to be added to server.properties
  num.network.threads: 3
  num.io.threads: 8
  socket.send.buffer.bytes: 102400
  socket.receive.buffer.bytes: 102400
  socket.request.max.bytes: 104857600
  num.partitions: 1
  num.recovery.threads.per.data.dir: 1
  offsets.topic.replication.factor: 1
  transaction.state.log.replication.factor: 1
  transaction.state.log.min.isr: 1
  log.retention.hours: 168
  log.segment.bytes: 1073741824
  log.retention.check.interval.ms: 300000
  #log.retention.bytes: 1073741824
  #log.flush.interval.ms: 1000
  #log.flush.interval.messages: 10000

Resources

Pigsty provides some Kafka-related extension plugins for PostgreSQL:

  • kafka_fdw: A useful FDW that allows users to read and write Kafka Topic data directly from PostgreSQL
  • wal2json: Used to logically decode WAL from PostgreSQL and generate JSON-formatted change data
  • wal2mongo: Used to logically decode WAL from PostgreSQL and generate BSON-formatted change data
  • decoder_raw: Used to logically decode WAL from PostgreSQL and generate SQL-formatted change data
  • test_decoding: Used to logically decode WAL from PostgreSQL and generate RAW-formatted change data

3 - Module: DuckDB

Install DuckDB, a high-performance embedded analytical database component.

DuckDB is a fast in-process analytical database: Installation | Resources


Overview

DuckDB is an embedded database, so it does not require deployment or service management. You only need to install the DuckDB package on the node to use it.


Installation

Pigsty provides DuckDB packages (RPM / DEB) in the Infra software repository; you can install them with the following command:

./node.yml -t node_install  -e '{"node_repo_modules":"infra","node_packages":["duckdb"]}'
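
Since DuckDB is embedded, you can use it from the shell right away. A quick sketch; the CSV and database file paths here are illustrative:

# run an ad-hoc in-memory query by piping SQL into the duckdb shell
echo "SELECT 42 AS answer;" | duckdb

# query a CSV file directly with the built-in read_csv_auto function
echo "SELECT count(*) FROM read_csv_auto('/tmp/data.csv');" | duckdb

# persist data in a database file, sqlite3-style
duckdb /tmp/demo.duckdb "CREATE TABLE IF NOT EXISTS t AS SELECT 1 AS id;"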

Resources

Pigsty provides several DuckDB-related extensions for PostgreSQL:

  • pg_analytics: Add OLAP capabilities to PostgreSQL based on DuckDB
  • pg_lakehouse: Data lakehouse plugin by ParadeDB, wrapping DuckDB. (Currently planned to be renamed back to pg_analytics)
  • duckdb_fdw: Foreign data wrapper for DuckDB, read/write DuckDB data files from PG
  • pg_duckdb: WIP extension by MotherDuck (the company behind DuckDB) and Hydra (only available on EL systems as a pilot)

4 - Module: TigerBeetle

Deploy TigerBeetle, the Financial Transactions Database that is 1000x faster.

TigerBeetle is a financial accounting transaction database offering extreme performance and reliability.


Overview

The TigerBeetle module is currently available for Beta preview only in the Pigsty Professional Edition.


Installation

The Pigsty Infra repo provides RPM / DEB packages for TigerBeetle; use the following command to install:

./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["tigerbeetle"]}'

After installation, please refer to the official documentation for configuration: https://github.com/tigerbeetle/tigerbeetle
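
As a minimal local sketch based on the upstream quickstart (a single-replica cluster, not for production; the data file path is illustrative):

# format a single-replica data file (cluster 0, replica 0 of 1)
tigerbeetle format --cluster=0 --replica=0 --replica-count=1 /data/tigerbeetle/0_0.tigerbeetle

# start the replica listening on port 3000
tigerbeetle start --addresses=3000 /data/tigerbeetle/0_0.tigerbeetle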

5 - Module: Kubernetes

Deploy Kubernetes, the Production-Grade Container Orchestration Platform.

Kubernetes is a production-grade, open-source container orchestration platform. It helps you automate the deployment, scaling, and management of containerized applications.

Pigsty has native support for ETCD clusters, which can be used by Kubernetes. Therefore, the pro version also provides the KUBE module for deploying production-grade Kubernetes clusters.

The KUBE module is currently in Beta status and only available for Pro edition customers.

However, even with the open-source edition, you can specify node repositories in Pigsty, install the Kubernetes packages, and use Pigsty to adjust environment configuration and provision nodes for K8S deployment, solving the last-mile delivery problem.


SealOS

SealOS is a lightweight, high-performance, and easy-to-use Kubernetes distribution. It is designed to simplify the deployment and management of Kubernetes clusters.

Pigsty provides SealOS 5.0 RPM and DEB packages in the Infra repository, which you can install directly and then use to manage clusters.

./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["sealos"]}'
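
With SealOS installed, a cluster can be bootstrapped from a cluster image in one command. A hedged sketch using the upstream labring images; image tags and node IPs are illustrative:

# bootstrap a kubernetes cluster from sealos cluster images
sealos run labring/kubernetes:v1.28.0 labring/calico:v3.26.1 \
  --masters 10.10.10.11 --nodes 10.10.10.12,10.10.10.13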

Kubernetes

If you prefer to deploy Kubernetes with the classic kubeadm, install the packages below and refer to the parameter reference at the end of this section.

./node.yml -t node_install -e '{"node_repo_modules":"kube","node_packages":["kubeadm,kubelet,kubectl"]}'

Kubernetes supports multiple container runtimes. If you want to use Containerd as the container runtime, please make sure Containerd is installed on the node.

./node.yml -t node_install -e '{"node_repo_modules":"node,docker","node_packages":["containerd.io"]}'

If you want to use Docker as the container runtime, you need to install Docker and bridge with the cri-dockerd project (not available on EL9/D11/U20 yet):

./node.yml -t node_install -e '{"node_repo_modules":"node,infra,docker","node_packages":["docker-ce,docker-compose-plugin,cri-dockerd"]}'
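
Once the packages and a container runtime are in place, the control plane can be bootstrapped with kubeadm. This is a sketch wiring in the defaults from the Parameters section below, not a full deployment procedure:

# initialize the control plane with the registry / CIDR / version defaults listed below
kubeadm init \
  --kubernetes-version=1.31.0 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --pod-network-cidr=10.11.0.0/16 \
  --service-cidr=10.12.0.0/16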

Playbook

kube.yml playbook (TBD)


Monitoring

TBD


Parameters

Kubernetes module parameters:

#kube_cluster:                                          #IDENTITY# # define kubernetes cluster name
kube_role: node                                                    # default kubernetes role (master|node)
kube_version: 1.31.0                                               # kubernetes version
kube_registry: registry.aliyuncs.com/google_containers             # kubernetes image registry, aliyun k8s mirror by default
kube_pod_cidr: "10.11.0.0/16"                                      # kubernetes pod network cidr
kube_service_cidr: "10.12.0.0/16"                                  # kubernetes service network cidr
kube_dashboard_admin_user: dashboard-admin-sa                      # kubernetes dashboard admin user name

6 - Module: Consul

Deploy Consul, the alternative to Etcd, with Pigsty.

Consul is a distributed DCS + KV + DNS + service registry/discovery component.

In the old version (1.x) of Pigsty, Consul was used as the default high-availability DCS. Now this support has been removed, but it will be provided as a separate module in the future.


Configuration

To deploy Consul, you need to add the IP addresses and hostnames of all nodes to the consul group.

At least one node should be designated as the consul server with consul_role: server, while other nodes default to consul_role: node.

consul:
  hosts:
    10.10.10.10: { nodename: meta , consul_role: server }
    10.10.10.11: { nodename: node-1 }
    10.10.10.12: { nodename: node-2 }
    10.10.10.13: { nodename: node-3 }

For production deployments, we recommend using an odd number of Consul Servers, preferably three.
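
Once deployed, membership and raft status can be verified with the consul CLI:

# list datacenter members and their roles
consul members

# show raft peers and the current leader (run on a server node)
consul operator raft list-peers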


Parameters

#-----------------------------------------------------------------
# CONSUL
#-----------------------------------------------------------------
consul_role: node                 # consul role, node or server, node by default
consul_dc: pigsty                 # consul data center name, `pigsty` by default
consul_data: /data/consul         # consul data dir, `/data/consul`
consul_clean: true                # consul purge flag, if true, clean consul during init
consul_ui: false                  # enable consul ui, the default value for consul server is true

7 - Module: Victoria

Deploy VictoriaMetrics & VictoriaLogs, the in-place replacement for Prometheus & Loki.

VictoriaMetrics is the in-place replacement for Prometheus, offering better performance and compression ratio.


Overview

The Victoria module is currently available in Pigsty Pro as a Beta preview. It covers the deployment and management of the VictoriaMetrics and VictoriaLogs components.


Installation

The Pigsty Infra repo provides RPM / DEB packages for VictoriaMetrics; use the following commands to install:

./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["victoria-metrics"]}'
./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["victoria-metrics-cluster"]}'
./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["victoria-metrics-utils"]}'
./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["victoria-logs"]}'

For most users, the standalone version of VictoriaMetrics is sufficient; if you need to deploy a cluster, install the victoria-metrics-cluster package instead.
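
As a minimal sketch of running the standalone version (the flags and the 8428 API port are upstream defaults; the data path is illustrative, and the binary name is assumed to match the package name):

# start single-node victoria-metrics with a data dir and 12-month retention
victoria-metrics -storageDataPath=/data/victoria-metrics -retentionPeriod=12

# query the Prometheus-compatible API on the default port
curl -s 'http://localhost:8428/api/v1/query?query=up'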

8 - Module: Jupyter

Launch Jupyter notebook server with Pigsty, a web-based interactive scientific notebook.

To run a Jupyter notebook server with Docker, you have to:

  1. Change the default password JUPYTER_TOKEN in .env
  2. Create the data dir with proper permissions: make dir, owned by 1000:100
  3. Run make up to pull up Jupyter with docker compose

cd ~/pigsty/app/jupyter ; make dir up

Visit http://lab.pigsty or http://10.10.10.10:8888; the default password is pigsty.

Prepare

Create a data directory /data/jupyter, with the default uid & gid 1000:100:

make dir   # mkdir -p /data/jupyter; chown -R 1000:100 /data/jupyter

Connect to Postgres

Use the Jupyter terminal to install the psycopg2-binary & psycopg2 packages.

pip install psycopg2-binary psycopg2

# install with a tsinghua mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple psycopg2-binary psycopg2

# or set the mirror as the default index, then upgrade pip
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install --upgrade pip

Or, if you use conda, add the mirror channels first:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/

Then use the driver in your notebook:

import psycopg2

# connect with the admin user from the Pigsty defaults (dbuser_dba / DBUser.DBA)
conn = psycopg2.connect('postgres://dbuser_dba:DBUser.DBA@10.10.10.10:5432/meta')
cursor = conn.cursor()

# list current server activity
cursor.execute('SELECT * FROM pg_stat_activity')
for i in cursor.fetchall():
    print(i)

Alias

make up         # pull up jupyter with docker compose
make dir        # create required /data/jupyter and set owner
make run        # launch jupyter with docker
make view       # print jupyter access point
make log        # tail -f jupyter logs
make info       # introspect jupyter with jq
make stop       # stop jupyter container
make clean      # remove jupyter container
make pull       # pull latest jupyter image
make rmi        # remove jupyter image
make save       # save jupyter image to /tmp/docker/jupyter.tgz
make load       # load jupyter image from /tmp/docker/jupyter.tgz