# Scalability

Scaling strategies and capacity planning for Bingsan.

Bingsan is designed to scale horizontally while maintaining consistency.
## Horizontal Scaling

### Stateless Architecture

Each Bingsan instance is stateless:

- All state is stored in PostgreSQL
- No inter-node communication is required
- Any node can handle any request
- Simple load balancing (round-robin works fine)
```
┌──────────────────────────────────────────┐
│              Load Balancer               │
└────────────────────┬─────────────────────┘
                     │
     ┌───────────────┼───────────────┐
     ▼               ▼               ▼
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Node 1  │     │ Node 2  │     │ Node N  │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     └───────────────┼───────────────┘
                     ▼
             ┌───────────────┐
             │  PostgreSQL   │
             └───────────────┘
```

### Kubernetes Deployment
Scale with the Kubernetes Horizontal Pod Autoscaler (HPA):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bingsan
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bingsan
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

## PostgreSQL Scaling
### Connection Pool Sizing

Total connections across all instances:

```
total_connections = max_open_conns × num_instances
```

Example: 25 connections × 10 instances = 250 connections.
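The sizing rule above can be sketched as a small Go helper; `totalConnections` is a hypothetical function for illustration, not part of Bingsan's API:

```go
package main

import "fmt"

// totalConnections estimates the aggregate number of PostgreSQL
// connections when every Bingsan instance is configured with the
// same max_open_conns setting.
func totalConnections(maxOpenConns, numInstances int) int {
	return maxOpenConns * numInstances
}

func main() {
	// The example from the text: 25 connections × 10 instances.
	fmt.Println(totalConnections(25, 10)) // 250
}
```

Keep this total below PostgreSQL's `max_connections`, leaving headroom for maintenance sessions.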
### Connection Pooling (PgBouncer)

When running many instances, use PgBouncer:

```ini
[databases]
iceberg_catalog = host=postgres port=5432 dbname=iceberg_catalog

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
```

## Performance Characteristics
### Latency

Typical operation latencies:
| Operation | p50 | p99 |
|---|---|---|
| List namespaces | 2ms | 10ms |
| Get namespace | 1ms | 5ms |
| List tables | 3ms | 15ms |
| Load table | 5ms | 25ms |
| Create table | 20ms | 100ms |
| Commit table | 30ms | 150ms |
### Throughput

Single-node capacity (hardware-dependent):

- Reads: 5,000-10,000 requests/second
- Writes: 500-2,000 requests/second

Throughput scales roughly linearly as nodes are added.
### Resource Usage

Per instance:
| Resource | Typical | Peak |
|---|---|---|
| Memory | 50-100 MB | 200 MB |
| CPU | 0.2 cores | 1 core |
| Goroutines | 100-500 | 2,000 |
## Bottlenecks and Solutions

### PostgreSQL Connections

Symptom: `too many connections` errors

Solutions:

- Use PgBouncer
- Reduce `max_open_conns` per instance
- Increase PostgreSQL `max_connections`
### Lock Contention

Symptom: high commit latency, timeout errors

Solutions:

- Increase `lock_timeout`
- Reduce write frequency to the same table
- Partition workloads across tables
## Capacity Planning

### Estimating Instances

```
instances = ceil(peak_requests_per_second / requests_per_instance × 1.5)
```

Example: 10,000 RPS at 5,000 RPS per instance = 2 instances; with 1.5× headroom, provision 3 instances.
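The estimate can be written as a small Go helper with the 1.5× headroom baked in; `estimateInstances` is a hypothetical name for illustration:

```go
package main

import (
	"fmt"
	"math"
)

// estimateInstances divides peak RPS by per-instance capacity, adds 50%
// headroom, and rounds up: 10,000 / 5,000 = 2 instances, × 1.5 = 3.
func estimateInstances(peakRPS, rpsPerInstance float64) int {
	return int(math.Ceil(peakRPS / rpsPerInstance * 1.5))
}

func main() {
	fmt.Println(estimateInstances(10000, 5000)) // 3
}
```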
### Database Sizing

Metadata size per table: ~10-50 KB.

```
database_size ≈ num_tables × 30 KB + num_namespaces × 1 KB
```

Example: 10,000 tables ≈ 300 MB of metadata.
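The same rule of thumb as a Go helper; `estimateDatabaseBytes` and the namespace count in the example are illustrative assumptions:

```go
package main

import "fmt"

// estimateDatabaseBytes applies the rough sizing rule above:
// ~30 KB of metadata per table plus ~1 KB per namespace.
func estimateDatabaseBytes(numTables, numNamespaces int) int {
	return numTables*30*1024 + numNamespaces*1024
}

func main() {
	// 10,000 tables and an assumed 100 namespaces: roughly 300 MB.
	mb := estimateDatabaseBytes(10000, 100) / (1024 * 1024)
	fmt.Printf("≈ %d MB\n", mb) // ≈ 293 MB
}
```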
## High Availability

### Multiple Instances

Run at least three instances for HA:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
```

### PostgreSQL HA
Use managed PostgreSQL with automatic failover:
- AWS RDS Multi-AZ
- GCP Cloud SQL HA
- Azure Database for PostgreSQL
Or deploy with Patroni/Stolon for self-managed HA.
## Multi-Region

### Active-Passive

A single primary region, with a standby in a secondary region kept in sync via PostgreSQL replication.

### Active-Active (Sharded)
Partition by namespace prefix:

```
analytics.* → Region A
raw.*       → Region B
staging.*   → Region C
```

Each region has its own database and Bingsan cluster.
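A minimal Go sketch of prefix routing, assuming hypothetical region identifiers and a fallback region for unmapped prefixes:

```go
package main

import (
	"fmt"
	"strings"
)

// regionFor routes a namespace to a region by its first path segment,
// mirroring the prefix table above. Region names are illustrative.
func regionFor(namespace string) string {
	routes := map[string]string{
		"analytics": "region-a",
		"raw":       "region-b",
		"staging":   "region-c",
	}
	prefix, _, _ := strings.Cut(namespace, ".")
	if region, ok := routes[prefix]; ok {
		return region
	}
	return "region-a" // assumed default for unmapped prefixes
}

func main() {
	fmt.Println(regionFor("analytics.events")) // region-a
	fmt.Println(regionFor("raw.clickstream"))  // region-b
}
```

In practice this mapping would live at the routing layer (a smart client or a global load balancer), since each region's Bingsan cluster only knows its own database.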
## Monitoring for Scale

### Key Metrics

```promql
# Request rate per instance
sum(rate(iceberg_catalog_http_requests_total[5m])) by (instance)

# Database connection utilization
iceberg_db_connections_in_use / iceberg_db_connections_max

# Lock wait time
rate(iceberg_db_wait_duration_seconds_total[5m])
```

### Scaling Triggers
Auto-scale based on:
- CPU > 70%
- Request latency p99 > 100ms
- Request queue > 100