Scalability

Scaling strategies and capacity planning for Bingsan

Bingsan is designed to scale horizontally while maintaining consistency.

Horizontal Scaling

Stateless Architecture

Each Bingsan instance is stateless:

  • All state stored in PostgreSQL
  • No inter-node communication required
  • Any node can handle any request
  • Simple load balancing (round-robin works fine; see the sketch after the diagram)
┌──────────────────────────────────────────┐
│              Load Balancer               │
└────────────────────┬─────────────────────┘

     ┌───────────────┼───────────────┐
     ▼               ▼               ▼
┌─────────┐   ┌─────────┐   ┌─────────┐
│ Node 1  │   │ Node 2  │   │ Node N  │
└────┬────┘   └────┬────┘   └────┬────┘
     │             │             │
     └─────────────┼─────────────┘

          ┌───────────────┐
          │  PostgreSQL   │
          └───────────────┘
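
Because all state lives in PostgreSQL, the load balancer needs no session affinity. The sketch below is purely illustrative (the node URLs, the port, and the idea of hand-rolling a proxy are assumptions; in Kubernetes a Service or Ingress plays this role), but it shows how simple round-robin distribution can be when any node can serve any request:

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "sync/atomic"
)

func main() {
    // Hypothetical node addresses; any Bingsan instance can serve any request.
    backends := []string{
        "http://bingsan-0:8181",
        "http://bingsan-1:8181",
        "http://bingsan-2:8181",
    }

    proxies := make([]*httputil.ReverseProxy, len(backends))
    for i, raw := range backends {
        target, err := url.Parse(raw)
        if err != nil {
            log.Fatal(err)
        }
        proxies[i] = httputil.NewSingleHostReverseProxy(target)
    }

    var next uint64
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Plain round-robin: no sticky sessions, because all state is in PostgreSQL.
        i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
        proxies[i].ServeHTTP(w, r)
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}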

Kubernetes Deployment

Scale with Kubernetes HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bingsan
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bingsan
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

PostgreSQL Scaling

Connection Pool Sizing

Total connections across all instances:

total_connections = max_open_conns × num_instances

Example: 25 connections × 10 instances = 250 connections
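
The max_open_conns cap suggests a database/sql-style pool. A minimal sketch of how such a per-instance limit is typically applied in Go (the DSN, driver choice, and idle/lifetime settings are illustrative assumptions, not Bingsan's actual code):

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // illustrative PostgreSQL driver choice
)

func main() {
    // Hypothetical DSN for the catalog database.
    db, err := sql.Open("postgres",
        "postgres://bingsan@postgres:5432/iceberg_catalog?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }

    // Per-instance cap: with max_open_conns = 25 and 10 instances,
    // PostgreSQL must accept at least 250 connections.
    db.SetMaxOpenConns(25)
    db.SetMaxIdleConns(10)                  // illustrative
    db.SetConnMaxLifetime(30 * time.Minute) // illustrative
}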

Connection Pooling (PgBouncer)

When running many instances, place PgBouncer between Bingsan and PostgreSQL:

[databases]
iceberg_catalog = host=postgres port=5432 dbname=iceberg_catalog

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50

Performance Characteristics

Latency

Typical operation latencies:

Operation          p50      p99
List namespaces    2ms      10ms
Get namespace      1ms      5ms
List tables        3ms      15ms
Load table         5ms      25ms
Create table       20ms     100ms
Commit table       30ms     150ms

Throughput

Single node capacity (depends on hardware):

  • Reads: 5,000-10,000 requests/second
  • Writes: 500-2,000 requests/second

Aggregate throughput scales roughly linearly as nodes are added, until the shared PostgreSQL database becomes the bottleneck.

Resource Usage

Per instance:

Resource     Typical     Peak
Memory       50-100 MB   200 MB
CPU          0.2 cores   1 core
Goroutines   100-500     2,000

Bottlenecks and Solutions

PostgreSQL Connections

Symptom: "too many connections" errors

Solutions:

  • Use PgBouncer
  • Reduce max_open_conns per instance
  • Increase PostgreSQL max_connections

Lock Contention

Symptom: high commit latency and timeout errors

Solutions:

  • Increase lock_timeout (see the sketch after this list)
  • Reduce write frequency to same table
  • Partition workloads across tables
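
If lock_timeout maps to the PostgreSQL setting of the same name (an assumption; Bingsan may expose its own configuration key instead), the database-level default can be raised once so that every new catalog connection inherits it. A sketch, with a hypothetical administrative DSN:

package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // illustrative driver choice
)

func main() {
    // Connect as an administrative user (hypothetical DSN).
    db, err := sql.Open("postgres",
        "postgres://admin@postgres:5432/iceberg_catalog?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    // Assumption: lock_timeout is the PostgreSQL setting. A database-level
    // default applies to all new connections the catalog opens.
    if _, err := db.Exec(`ALTER DATABASE iceberg_catalog SET lock_timeout = '10s'`); err != nil {
        log.Fatal(err)
    }
}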

Capacity Planning

Estimating Instances

instances = (peak_requests_per_second / requests_per_instance) × 1.5

Example: 10,000 RPS at 5,000 RPS per instance = 2 instances × 1.5 = 3 instances

Database Sizing

Metadata size per table: ~10-50 KB

database_size ≈ num_tables × 30 KB + num_namespaces × 1 KB

10,000 tables ≈ 300 MB database
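
A small worked example in Go applying both estimates above, using the 1.5 headroom factor and the ~30 KB-per-table rule of thumb; the namespace count is illustrative:

package main

import (
    "fmt"
    "math"
)

// instancesNeeded applies the headroom formula:
// instances = (peak RPS / RPS per instance) × 1.5, rounded up.
func instancesNeeded(peakRPS, rpsPerInstance float64) int {
    return int(math.Ceil(peakRPS / rpsPerInstance * 1.5))
}

// databaseSizeKB applies the sizing rule of thumb:
// ~30 KB per table plus ~1 KB per namespace.
func databaseSizeKB(numTables, numNamespaces int) int {
    return numTables*30 + numNamespaces*1
}

func main() {
    fmt.Println(instancesNeeded(10000, 5000)) // 3 instances
    fmt.Println(databaseSizeKB(10000, 100))   // 300,100 KB ≈ 300 MB
}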

High Availability

Multiple Instances

Run at least 3 instances for HA:

spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

PostgreSQL HA

Use managed PostgreSQL with automatic failover:

  • AWS RDS Multi-AZ
  • GCP Cloud SQL HA
  • Azure Database for PostgreSQL

Or deploy with Patroni/Stolon for self-managed HA.

Multi-Region

Active-Passive

Single primary region, standby in secondary region with PostgreSQL replication.

Active-Active (Sharded)

Partition by namespace prefix:

analytics.* → Region A
raw.*       → Region B
staging.*   → Region C

Each region has its own database and Bingsan cluster.
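
A minimal sketch of prefix-based routing at a client or gateway layer, assuming requests carry the namespace name; the region endpoints are placeholder URLs:

package main

import (
    "fmt"
    "strings"
)

// regionByPrefix maps namespace prefixes to regional Bingsan endpoints
// (hypothetical URLs; each region owns its own catalog database).
var regionByPrefix = map[string]string{
    "analytics": "https://bingsan.region-a.example.com",
    "raw":       "https://bingsan.region-b.example.com",
    "staging":   "https://bingsan.region-c.example.com",
}

// endpointFor picks the cluster that owns a namespace such as "analytics.events".
func endpointFor(namespace string) (string, error) {
    prefix := strings.SplitN(namespace, ".", 2)[0]
    if endpoint, ok := regionByPrefix[prefix]; ok {
        return endpoint, nil
    }
    return "", fmt.Errorf("no region configured for namespace %q", namespace)
}

func main() {
    endpoint, err := endpointFor("analytics.events")
    if err != nil {
        panic(err)
    }
    fmt.Println(endpoint) // https://bingsan.region-a.example.com
}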

Monitoring for Scale

Key Metrics

# Request rate per instance
sum(rate(iceberg_catalog_http_requests_total[5m])) by (instance)

# Database connection utilization
iceberg_db_connections_in_use / iceberg_db_connections_max

# Lock wait time
rate(iceberg_db_wait_duration_seconds_total[5m])
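
The connection-utilization metrics above map naturally onto Go's sql.DBStats. Below is a sketch of how such gauges could be exported with the Prometheus Go client, assuming the catalog uses database/sql; the metric names mirror the queries above, but this is an illustration, not Bingsan's actual instrumentation, and the DSN and listen port are placeholders:

package main

import (
    "database/sql"
    "log"
    "net/http"

    _ "github.com/lib/pq" // illustrative driver choice
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    db, err := sql.Open("postgres",
        "postgres://bingsan@postgres:5432/iceberg_catalog?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }

    // Numerator of the utilization query: connections currently in use.
    prometheus.MustRegister(prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{Name: "iceberg_db_connections_in_use"},
        func() float64 { return float64(db.Stats().InUse) },
    ))
    // Denominator: the configured pool ceiling (max_open_conns).
    prometheus.MustRegister(prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{Name: "iceberg_db_connections_max"},
        func() float64 { return float64(db.Stats().MaxOpenConnections) },
    ))

    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":2112", nil))
}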

Scaling Triggers

Auto-scale based on:

  • CPU > 70%
  • Request latency p99 > 100ms
  • Request queue > 100
