Agent monitoring and privacy
Data privacy and security
What DBtune collects
The agent collects only the metrics needed to optimize your database:
- Performance metrics: query runtime, throughput, cache hit ratios.
- System metrics: CPU utilization, memory usage, I/O statistics.
- Database statistics: connection counts, wait events, vacuum activity.
- Configuration parameters: current database configuration settings.
- Hardware information: CPU count, memory size, disk type, instance type.
- Schema metadata: catalogs describing your tables, columns, indexes, and constraints (no row data).
All transmitted data is either numeric, structural (catalog rows), or obfuscated query text used for query identification on the platform.
Query text
Query text collection is optional. Toggle it with the DBT_POSTGRESQL_INCLUDE_QUERIES environment variable, or with the per-collector include_queries knob on the pg_stat_statements collector.
If enabled, table and column names that appear in pg_stat_statements.query are transmitted along with the obfuscated parameters. See the PostgreSQL pg_stat_statements docs for what that field contains.
Query text is not used for optimization. Its only purpose is to let you identify queries in the platform, e.g. when picking queries for a Workload Fingerprint. DBtune still works without it; in that case we surface a query_id you can correlate manually with pg_stat_statements.
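With query text disabled, a query_id surfaced in the platform can still be resolved to its SQL locally, for example (a sketch assuming pg_stat_statements is installed in the connected database; the queryid value and database name are placeholders):

```shell
# Resolve a platform-surfaced query_id to its text, locally and privately.
# 123456789 and mydb are placeholders; substitute your own values.
psql -d mydb -c "SELECT query FROM pg_stat_statements WHERE queryid = 123456789;"
```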
If you previously enabled query text and then disabled it, the obfuscated text already collected is retained. Email support@dbtune.com if you want it deleted.
What DBtune does NOT collect
The agent never reads or transmits:
- Database contents: no row data, no stored values.
- Query parameters: obfuscated by pg_stat_statements, or omitted entirely when query text is disabled.
- Credentials: no passwords, API keys, or authentication tokens.
- Personal data: no end-user information.
How collection works
Collection is per-collector, not a single global stream. Each collector has its own purpose, its own data source, and its own schedule, ranging from every 5 seconds for hot signals like wait events down to every 5 minutes for slow-moving catalog data.
Every collector ships with two universal knobs:
- DBT_COLLECTOR_<NAME>_ENABLED. Set to false to turn the collector off.
- DBT_COLLECTOR_<NAME>_INTERVAL_SECONDS. Overrides the default cadence (subject to a per-collector minimum).
Some collectors expose extra knobs, listed inline below where they apply. Defaults are sensible for most workloads; you should not need to touch these unless you are deliberately tuning the agent's footprint or privacy posture.
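As an illustration, the universal knobs can be set like this (the concrete collector spellings substituted for <NAME> here are assumptions, not confirmed names):

```shell
# Turn the pg_locks collector off entirely
export DBT_COLLECTOR_PG_LOCKS_ENABLED=false
# Sample pg_stat_statements every 30 s instead of its default cadence
export DBT_COLLECTOR_PG_STAT_STATEMENTS_INTERVAL_SECONDS=30
```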
Collectors that target version-gated views (e.g. pg_stat_io on PG16+, pg_stat_checkpointer on PG17+) automatically skip themselves on older PostgreSQL releases.
Collector reference
The tables below list every collector the agent ships, grouped by what it watches. The "Source" column points at the underlying PostgreSQL view, system file, or cloud API.
Hot signals (every 5 seconds)
These are sampled aggressively because they drive real-time tuning decisions.
| Collector | Source | What it captures |
|---|---|---|
wait_events | pg_stat_activity | Counts of backends waiting on each wait event type (Lock, IO, IPC, LWLock, etc.) |
database_connections | pg_stat_activity | Active / idle / idle-in-transaction / aborted / total connection counts |
database_transactions | pg_stat_database | Cumulative commit count, converted to TPS between samples |
autovacuum_count | pg_stat_activity | Number of currently running autovacuum workers |
system_db_size | pg_database_size() | Total bytes across all databases on the instance |
server_uptime | pg_postmaster_start_time() | Server uptime in minutes |
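The commit-to-TPS conversion done by database_transactions amounts to differencing the cumulative counter between samples. A minimal Python sketch (illustrative only, not the agent's actual Go code; the counter-reset handling is an assumption):

```python
def tps_between_samples(prev_commits: int, curr_commits: int,
                        interval_seconds: float) -> float:
    """Convert two cumulative xact_commit readings from pg_stat_database
    into transactions per second over the sampling interval."""
    delta = curr_commits - prev_commits
    if delta < 0:
        # Counter went backwards (e.g. statistics reset); skip this sample.
        return 0.0
    return delta / interval_seconds
```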
Database-wide statistics (every minute)
| Collector | Source | What it captures |
|---|---|---|
pg_stat_database | pg_stat_database | Per-database commits, rollbacks, block I/O, tuple I/O, deadlocks, session time, temp files |
pg_stat_database_conflicts | pg_stat_database_conflicts | Replication conflict counts by type (PG17+) |
pg_stat_bgwriter | pg_stat_bgwriter | Background-writer checkpoints, buffers written, fsync time |
pg_stat_checkpointer | pg_stat_checkpointer | Checkpointer-process metrics (PG17+) |
pg_stat_io | pg_stat_io | Fine-grained I/O by backend type, object, and context (PG16+) |
pg_stat_wal | pg_stat_wal | WAL records, FPIs, bytes, buffer-full count, write/sync timing (PG14+) |
pg_stat_archiver | pg_stat_archiver | WAL archive successes/failures and last archived file |
pg_stat_slru | pg_stat_slru | SLRU buffer pool I/O (commit timestamps, multixact, etc.) |
pg_database | pg_database | Per-database metadata, frozen XID age, multixact age |
Query and session activity (every minute unless noted)
| Collector | Source | What it captures | Notes |
|---|---|---|---|
pg_stat_statements | pg_stat_statements | Per-statement execution time, call count, rows, buffer I/O, plan time, JIT, WAL stats | Every 5 s. Honors include_queries, max_query_text_length, diff_limit (see below) |
pg_stat_activity | pg_stat_activity | Backend PID, user, app, wait event, transaction age, query ID (never query text) | |
pg_locks | pg_locks + pg_blocking_pids() | Only blocked-or-blocking locks, with wait time and lock mode | Every 30 s |
pg_prepared_xacts | pg_prepared_xacts | Prepared (two-phase commit) transactions and their age | |
pg_stat_user_functions | pg_stat_user_functions | Per-function call count and execution time (top 500 by calls) | |
Configuration knobs for pg_stat_statements:
- include_queries (default: true). If false, the query column is sent as NULL instead of obfuscated SQL. Same effect as DBT_POSTGRESQL_INCLUDE_QUERIES=false.
- max_query_text_length (default and max: 8192). Truncates query text to this many characters.
- diff_limit (default and max: 500). Caps the number of statement deltas reported per tick.
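Taken together, the three knobs behave roughly like this Python sketch (illustrative only; ranking deltas by total execution time is an assumption, not documented agent behavior):

```python
def cap_statement_deltas(deltas, diff_limit=500, max_query_text_length=8192,
                         include_queries=True):
    """Apply the pg_stat_statements collector knobs to per-statement deltas.

    Each delta is a dict with at least 'query' (obfuscated SQL or None) and
    'total_exec_time'. Sketch only -- not the agent's actual code.
    """
    ranked = sorted(deltas, key=lambda d: d["total_exec_time"], reverse=True)
    capped = []
    for delta in ranked[:diff_limit]:          # diff_limit caps rows per tick
        delta = dict(delta)
        if not include_queries:
            delta["query"] = None              # query column sent as NULL
        elif delta["query"] is not None:
            # max_query_text_length truncates the obfuscated text
            delta["query"] = delta["query"][:max_query_text_length]
        capped.append(delta)
    return capped
```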
Tables and indexes (every minute)
| Collector | Source | What it captures | Notes |
|---|---|---|---|
pg_stat_user_tables | pg_stat_user_tables | Per-table seq/index scans, row churn, vacuum/analyze counts and timing, dead tuples | Sampled across three buckets (writes, reads, dead tuples), default 200 rows each |
pg_stat_user_indexes | pg_stat_user_indexes | Per-index scan count, last scan time, tuples read/fetched | Sampled across three buckets (scans, rows returned, unused) |
pg_statio_user_tables | pg_statio_user_tables | Per-table heap, index, and toast block reads/hits | Top 500 active tables |
pg_statio_user_indexes | pg_statio_user_indexes | Per-index block reads vs cache hits | Delta-mode after backfill; backfill batch default 2 000 |
Configuration knobs:
- category_limit on pg_stat_user_tables and pg_stat_user_indexes (default: 200). Caps rows per sampling bucket.
- backfill_batch_size on pg_statio_user_indexes (default: 2 000). Sets the initial-load batch size before delta mode kicks in.
Replication (every minute)
| Collector | Source | What it captures |
|---|---|---|
pg_stat_replication | pg_stat_replication | Per-standby LSN positions, replication lag, sync state, Xmin age |
pg_stat_wal_receiver | pg_stat_wal_receiver | Standby WAL receiver status and LSN positions |
pg_replication_slots | pg_replication_slots | Replication slot state, LSN positions, WAL retention, Xmin age, conflict status |
pg_stat_replication_slots | pg_stat_replication_slots | Slot streaming activity: spill / stream transactions and bytes (PG14+) |
pg_stat_subscription | pg_stat_subscription | Per-subscription worker state and LSN |
pg_stat_subscription_stats | pg_stat_subscription_stats | Logical-replication apply / sync error counts (PG15+) |
pg_stat_recovery_prefetch | pg_stat_recovery_prefetch | Standby recovery-prefetch metrics: cache hits, skips, I/O depth (PG15+) |
Progress reporting (every 30 seconds)
| Collector | Source | What it captures |
|---|---|---|
pg_stat_progress_vacuum | pg_stat_progress_vacuum | In-flight VACUUM / autovacuum phase, blocks scanned, dead tuples |
pg_stat_progress_analyze | pg_stat_progress_analyze | In-flight ANALYZE phase, sample blocks scanned, extended-stats progress |
pg_stat_progress_create_index | pg_stat_progress_create_index | In-flight CREATE INDEX phase, block and tuple progress |
Schema and catalog metadata (every 5 minutes)
These collectors describe what your schema looks like; they do not read row data.
| Collector | Source | What it captures | Notes |
|---|---|---|---|
pg_class | pg_class + pg_namespace + pg_am + pg_stat_user_tables | Tables, indexes, matviews: page count, tuple estimates, freezing age, access method, partitioning info | Backfill in 500-row batches; full rescan periodically |
pg_index | pg_index + pg_class | Index keys, opclasses, predicates, expressions, validity, and full DDL via pg_get_indexdef() | |
pg_attribute | pg_attribute + pg_class | Column metadata: type, length, storage method, NOT NULL, statistics target | |
pg_constraint | pg_constraint | Primary keys, unique, foreign-key, exclusion constraints | |
pg_type | pg_type + pg_namespace | Built-in and user-defined data types and their properties | Skipped if unchanged; heartbeat every 5 ticks |
pg_stats | pg_stats | Column statistics: null fraction, distinct count, correlation, histograms, MCVs | Backfill in 200-row batches. include_table_data=false strips MCV/histogram values for privacy |
ddl | pg_get_*def() | Hash and full DDL of every user-schema object (tables, views, indexes, sequences, functions, types) | Skipped if unchanged; heartbeat every 5 ticks |
Configuration knob:
- include_table_data on pg_stats (default: true). Set to false to omit the actual values inside MCV arrays and histogram bounds. Statistical shape is preserved either way.
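The effect of include_table_data=false can be pictured with this sketch (field names follow the pg_stats view; the exact set of columns the agent touches is an assumption):

```python
def strip_table_data(pg_stats_row: dict, include_table_data: bool = True) -> dict:
    """Sketch of the include_table_data knob on the pg_stats collector.

    When false, the actual values inside MCV arrays and histogram bounds are
    omitted, while value-free statistics are kept."""
    row = dict(pg_stats_row)
    if not include_table_data:
        row["most_common_vals"] = None     # drops real column values
        row["histogram_bounds"] = None     # drops real column values
        # most_common_freqs, null_frac, n_distinct, correlation are kept:
        # they describe statistical shape without revealing stored data.
    return row
```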
System monitoring
PostgreSQL statistics tell us what the database is doing; system monitoring tells us what the host underneath can take. The agent pairs the two so DBtune can size memory-bound parameters (shared_buffers, work_mem, effective_cache_size), reason about I/O headroom (random_page_cost, effective_io_concurrency), and avoid recommending settings the machine cannot support.
Host metrics (every 5 seconds)
The agent samples the live state of the host alongside the database. These are the metrics that move minute-to-minute and feed directly into tuning decisions.
| Group | Metrics |
|---|---|
| CPU | utilization (overall and, where available, per-core) |
| Memory | total, available, used, free, cached, buffers, shared, slab, active, inactive, percent used |
| Disk I/O | read and write counts, bytes, time, merged counts; busy time; queue depth where available |
| Network (containers only) | bytes in/out |
The exact source depends on where your database lives:
| Platform | Source | Notes |
|---|---|---|
| Self-hosted, Patroni, on-premise | gopsutil (host-local CPU, memory, disk I/O) | Highest fidelity: full per-device breakdown |
| Docker | Docker stats API | Limited to container-visible CPU and memory |
| Kubernetes / CNPG | Kubernetes metrics API + cAdvisor + kubelet | Container CPU, memory, disk usage, IOPS, network I/O |
| AWS RDS / Aurora | CloudWatch + Performance Insights | CPU utilization, memory used and freeable, IOPS read/write/total |
| Google Cloud SQL | Google Cloud Monitoring | CPU, memory, disk usage, IOPS, network bytes |
| Aiven | Aiven metrics API | 1-hour resolution, cached. Tuning is therefore coarser than on self-hosted setups |
| Azure Database for PostgreSQL Flexible Server | Azure Monitor | CPU, memory, IOPS |
Cloud platforms only expose what their metrics APIs allow; on managed services you can expect CPU, memory, and IOPS, but not the per-device disk breakdown available on self-hosted hosts.
System information (every minute)
Slow-moving facts about the machine and the database server are refreshed once a minute. These define the envelope DBtune tunes inside, and rarely change between samples.
| Group | What we collect |
|---|---|
| Hardware | Number of CPUs, total memory, available memory, disk type, cloud provider, instance type |
| Software | PostgreSQL version, operating system, maximum connections, total database size |
Hardware shape is what lets DBtune translate a recommendation like "use ~25% of RAM for shared_buffers" into an actual byte value, and what flags an instance as too small for a target before any change is made.
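That translation is simple arithmetic, sketched here for illustration (the 25% figure comes from the example above, not from a fixed DBtune rule):

```python
def shared_buffers_bytes(total_ram_bytes: int, fraction: float = 0.25) -> int:
    """Turn a 'use ~X% of RAM for shared_buffers' recommendation into bytes."""
    return int(total_ram_bytes * fraction)

# On a 16 GiB host, ~25% of RAM works out to 4 GiB.
```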
What system monitoring is not
The agent does not run shell commands on the host, does not read arbitrary files, and does not collect process-level metrics for processes outside the database. On managed platforms it never bypasses the provider's metrics API.
Data transmission
All metrics travel over encrypted HTTPS to the DBtune server, authenticated with your API key. Keep that key secure and rotate it periodically.
Compliance and trust
- Open source agent: the source lives at github.com/dbtuneai/dbtune-agent, so every collector and every query above is auditable.
- Network isolation: the agent runs inside your network and only egresses to the DBtune API.
- Production safety: built-in guardrails (e.g. memory thresholds) prevent unsafe configuration changes.
- Per-collector control: anything you do not want collected can be turned off with a single environment variable.
For more on how individual tuning targets are computed from these metrics, see Tuning targets.