Getting Started with PipelineDB

Table of Contents

PipelineDB extends PostgreSQL with streaming primitives—continuous views over unbounded input. Although the upstream project was discontinued, the concepts remain useful.

Install & configure
#

PipelineDB ships as an extension. Install the RPM/DEB, then edit postgresql.conf:

shared_preload_libraries = 'pipelinedb'
max_worker_processes = 128

Restart Postgres. You must raise max_worker_processes or PipelineDB will fail to launch.

Example: Wikipedia page views
#

Create a stream (foreign table backed by the PipelineDB handler):

CREATE FOREIGN TABLE wiki_stream (
  hour timestamp,
  project text,
  title text,
  view_count bigint,
  size bigint
) SERVER pipelinedb;

Create a continuous view to materialize rolling aggregates:

CREATE VIEW wiki_stats WITH (action = materialize) AS
SELECT hour,
       project,
       count(*) AS total_pages,
       sum(view_count) AS total_views,
       min(view_count) AS min_views,
       max(view_count) AS max_views,
       avg(view_count) AS avg_views,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views,
       sum(size) AS total_bytes_served
FROM wiki_stream
GROUP BY hour, project;

Ingest data via COPY:

curl -sL http://pipelinedb.com/data/wiki-pagecounts | gunzip |
  psql -c "COPY wiki_stream (hour, project, title, view_count, size) FROM STDIN"

wiki_stats now updates continuously as new rows arrive.

Core concepts
#

Streams – foreign tables representing append-only input.
Continuous views – materialized aggregates that update incrementally as stream events arrive.
Transforms – optional preprocessing stages.

PipelineDB lets you express streaming jobs with plain SQL and reuse the Postgres toolchain you already know.

Install & configure#

Example: Wikipedia page views#

Core concepts#

Related

Install & configure
#

Example: Wikipedia page views
#

Core concepts
#