BingsanBingsan
Architecture

Data Model

Database schema and metadata storage in Bingsan

Data Model

Bingsan stores all catalog metadata in PostgreSQL. This page describes the database schema.

Overview

┌─────────────────┐     ┌─────────────────┐
│   namespaces    │     │     tables      │
├─────────────────┤     ├─────────────────┤
│ id              │────▶│ namespace_id    │
│ name            │     │ name            │
│ properties      │     │ metadata_loc    │
└─────────────────┘     │ metadata        │
                        └─────────────────┘


┌─────────────────┐     ┌─────────────────┐
│     views       │     │   scan_plans    │
├─────────────────┤     ├─────────────────┤
│ namespace_id    │     │ table_id        │
│ name            │     │ plan_id         │
│ metadata_loc    │     │ status          │
│ metadata        │     │ tasks           │
└─────────────────┘     └─────────────────┘

Tables

namespaces

Stores namespace metadata.

ColumnTypeDescription
idBIGSERIALPrimary key
nameTEXT[]Namespace name as array
propertiesJSONBNamespace properties
created_atTIMESTAMPTZCreation timestamp
updated_atTIMESTAMPTZLast update timestamp

tables

Stores Iceberg table metadata.

ColumnTypeDescription
idBIGSERIALPrimary key
namespace_idBIGINTForeign key to namespaces
nameTEXTTable name
table_uuidUUIDIceberg table UUID
metadata_locationTEXTPath to current metadata file
metadataJSONBCached table metadata (optional)
created_atTIMESTAMPTZCreation timestamp
updated_atTIMESTAMPTZLast update timestamp

views

Stores Iceberg view metadata with similar structure to tables.

scan_plans

Stores scan plan state for server-side planning.

ColumnTypeDescription
idBIGSERIALPrimary key
plan_idUUIDExternal plan identifier
table_idBIGINTForeign key to tables
statusTEXTPlan status
requestJSONBOriginal plan request
tasksJSONBComputed scan tasks

Metadata Storage Strategy

Database vs Object Storage

Bingsan uses a hybrid approach:

PostgreSQL stores:

  • Namespace metadata
  • Table/view registry (name, UUID, location)
  • Scan plan state
  • Cached metadata (optional)

Object storage (S3/GCS) stores:

  • Full Iceberg metadata JSON files
  • Manifest lists
  • Manifests
  • Data files

Metadata Caching

Table metadata can be cached in PostgreSQL for faster reads, avoiding reading from object storage for every request.

Locking Model

Advisory Locks

PostgreSQL advisory locks ensure consistency:

-- Namespace-level lock
SELECT pg_advisory_lock(hashtext('ns:' || $1));

-- Table-level lock
SELECT pg_advisory_xact_lock($1);

Locks are held for the minimum necessary duration.

Migrations

Bingsan uses golang-migrate for schema migrations.

Automatic Migrations

Migrations run automatically on startup.

Manual Migrations

# Check current version
migrate -database "postgres://..." -path migrations version

# Apply migrations
migrate -database "postgres://..." -path migrations up

# Rollback one migration
migrate -database "postgres://..." -path migrations down 1

Indexes

Indexes are optimized for common query patterns:

QueryIndex Used
List namespacesidx_namespaces_name (GIN)
Get namespace by namenamespaces_name_key (UNIQUE)
List tables in namespaceidx_tables_namespace
Get table by nametables_namespace_id_name_key (UNIQUE)
Find table by UUIDidx_tables_uuid

Data Lifecycle

Namespace

  1. Create: Insert into namespaces
  2. Update: Update properties, set updated_at
  3. Delete: Delete from namespaces (must be empty)

Table

  1. Create: Insert into tables, write metadata to storage
  2. Commit: Update metadata_location, optional metadata cache
  3. Drop: Delete from tables, optionally purge storage

Backup and Recovery

PostgreSQL Backup

# Full backup
pg_dump -h localhost -U iceberg iceberg_catalog > backup.sql

# Restore
psql -h localhost -U iceberg iceberg_catalog < backup.sql

Since full Iceberg metadata is in object storage, the database can be rebuilt from metadata files if needed.

On this page