datasketches
Overview
| Package | Version | Category | License | Language |
|---|---|---|---|---|
datasketches | 1.7.0 | FUNC | Apache-2.0 | C++ |
| ID | Extension | Bin | Lib | Load | Create | Trust | Reloc | Schema |
|---|---|---|---|---|---|---|---|---|
| 4690 | datasketches | No | Yes | No | Yes | No | Yes | - |
Built against Apache DataSketches C++ core 5.0.0.
Version
| Type | Repo | Version | PG Ver | Package | Deps |
|---|---|---|---|---|---|
| EXT | PIGSTY | 1.7.0 | 1817161514 | datasketches | - |
| RPM | PIGSTY | 1.7.0 | 1817161514 | datasketches_$v | - |
| DEB | PIGSTY | 1.7.0 | 1817161514 | postgresql-$v-datasketches | - |
Build
You can build the RPM / DEB packages for datasketches using pig build:
pig build pkg datasketches # build RPM / DEB packages
Install
You can install datasketches directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:
pig repo add pgsql -u # Add repo and update cache
Install the extension using pig or apt/yum/dnf:
pig install datasketches; # Install for current active PG version
pig ext install -y datasketches -v 18 # PG 18
pig ext install -y datasketches -v 17 # PG 17
pig ext install -y datasketches -v 16 # PG 16
pig ext install -y datasketches -v 15 # PG 15
pig ext install -y datasketches -v 14 # PG 14
dnf install -y datasketches_18 # PG 18
dnf install -y datasketches_17 # PG 17
dnf install -y datasketches_16 # PG 16
dnf install -y datasketches_15 # PG 15
dnf install -y datasketches_14 # PG 14
apt install -y postgresql-18-datasketches # PG 18
apt install -y postgresql-17-datasketches # PG 17
apt install -y postgresql-16-datasketches # PG 16
apt install -y postgresql-15-datasketches # PG 15
apt install -y postgresql-14-datasketches # PG 14
Create Extension:
CREATE EXTENSION datasketches;
Usage
Sources: README, latest release 1.7.0, Apache DataSketches
datasketches adds approximate analytics sketch types and aggregates to PostgreSQL. The upstream README lists CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches; the 1.7.0 release is the latest published GitHub release, while the default branch has already moved on to 1.8.0-SNAPSHOT.
CREATE EXTENSION datasketches;
Core Sketch Families
cpc_sketchandhll_sketchfor approximate distinct counting.theta_sketchfor distinct counting plus set operations such as union, intersection, and A-not-B.aod_sketchfor tuple-style metrics keyed by identifiers with arrays of doubles.kll_*_sketchandquantiles_*_sketchfor quantiles, ranks, PMF, and CDF.frequent_strings_sketchfor heavy-hitter detection.
Common Patterns
Build a sketch from raw values:
SELECT cpc_sketch_build(1);
SELECT kll_float_sketch_build(value) FROM normal;
Use one-shot approximate aggregates:
SELECT cpc_sketch_distinct(id) FROM random_ints_100m;
Merge sketches across groups or cube dimensions:
SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5)
FROM kll_float_sketch_test;
Run set operations on Theta sketches:
SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2))
FROM theta_set_op_test;
Find frequent items above a threshold:
SELECT frequent_strings_sketch_result_no_false_negatives(
frequent_strings_sketch_build(9, value),
1000000
)
FROM zipf_1p1_8k_100m;
Caveats
- Upstream documents PostgreSQL 9.6+ plus Boost 1.75.0 and DataSketches C++ core 5.0.0 or later as build dependencies.
- These are approximate structures meant to be mergeable across dimensions; they are not exact replacements for
COUNT(DISTINCT ...)or exact histograms.
Feedback
Was this page helpful?
Thanks for the feedback! Please let us know how we can improve.
Sorry to hear that. Please let us know how we can improve.