datasketches

Approximate analytics sketches and aggregates for PostgreSQL

Overview

PackageVersionCategoryLicenseLanguage
datasketches1.7.0FUNCApache-2.0C++
IDExtensionBinLibLoadCreateTrustRelocSchema
4690datasketchesNoYesNoYesNoYes-

Built against Apache DataSketches C++ core 5.0.0.

Version

TypeRepoVersionPG VerPackageDeps
EXTPIGSTY1.7.01817161514datasketches-
RPMPIGSTY1.7.01817161514datasketches_$v-
DEBPIGSTY1.7.01817161514postgresql-$v-datasketches-
OS / PGPG18PG17PG16PG15PG14
el8.x86_64
el8.aarch64
el9.x86_64
el9.aarch64
el10.x86_64
el10.aarch64
d12.x86_64
d12.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0

Build

You can build the RPM / DEB packages for datasketches using pig build:

pig build pkg datasketches         # build RPM / DEB packages

Install

You can install datasketches directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install datasketches;          # Install for current active PG version
pig ext install -y datasketches -v 18  # PG 18
pig ext install -y datasketches -v 17  # PG 17
pig ext install -y datasketches -v 16  # PG 16
pig ext install -y datasketches -v 15  # PG 15
pig ext install -y datasketches -v 14  # PG 14
dnf install -y datasketches_18       # PG 18
dnf install -y datasketches_17       # PG 17
dnf install -y datasketches_16       # PG 16
dnf install -y datasketches_15       # PG 15
dnf install -y datasketches_14       # PG 14
apt install -y postgresql-18-datasketches   # PG 18
apt install -y postgresql-17-datasketches   # PG 17
apt install -y postgresql-16-datasketches   # PG 16
apt install -y postgresql-15-datasketches   # PG 15
apt install -y postgresql-14-datasketches   # PG 14

Create Extension:

CREATE EXTENSION datasketches;

Usage

Sources: README, latest release 1.7.0, Apache DataSketches

datasketches adds approximate analytics sketch types and aggregates to PostgreSQL. The upstream README lists CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches; the 1.7.0 release is the latest published GitHub release, while the default branch has already moved on to 1.8.0-SNAPSHOT.

CREATE EXTENSION datasketches;

Core Sketch Families

  • cpc_sketch and hll_sketch for approximate distinct counting.
  • theta_sketch for distinct counting plus set operations such as union, intersection, and A-not-B.
  • aod_sketch for tuple-style metrics keyed by identifiers with arrays of doubles.
  • kll_*_sketch and quantiles_*_sketch for quantiles, ranks, PMF, and CDF.
  • frequent_strings_sketch for heavy-hitter detection.

Common Patterns

Build a sketch from raw values:

SELECT cpc_sketch_build(1);
SELECT kll_float_sketch_build(value) FROM normal;

Use one-shot approximate aggregates:

SELECT cpc_sketch_distinct(id) FROM random_ints_100m;

Merge sketches across groups or cube dimensions:

SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5)
FROM kll_float_sketch_test;

Run set operations on Theta sketches:

SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2))
FROM theta_set_op_test;

Find frequent items above a threshold:

SELECT frequent_strings_sketch_result_no_false_negatives(
  frequent_strings_sketch_build(9, value),
  1000000
)
FROM zipf_1p1_8k_100m;

Caveats

  • Upstream documents PostgreSQL 9.6+ plus Boost 1.75.0 and DataSketches C++ core 5.0.0 or later as build dependencies.
  • These are approximate structures meant to be mergeable across dimensions; they are not exact replacements for COUNT(DISTINCT ...) or exact histograms.

Last Modified 2026-04-19: update extension stub docs (9f178c3)