Architecture
Understanding Bingsan's architecture helps with deployment planning, performance tuning, and troubleshooting.
Overview
Bingsan is a stateless Go application that implements the Apache Iceberg REST Catalog specification. All persistent state is stored in PostgreSQL.
┌─────────────────────────────────────────────────────────────┐
│                           Clients                           │
│           (Spark, Trino, Flink, PyIceberg, etc.)            │
└─────────────────────────┬───────────────────────────────────┘
                          │ REST API (HTTP)
┌─────────────────────────▼───────────────────────────────────┐
│                       Bingsan Cluster                       │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐                      │
│  │ Node 1  │  │ Node 2  │  │ Node N  │    (Stateless)       │
│  │  :8181  │  │  :8181  │  │  :8181  │                      │
│  └────┬────┘  └────┬────┘  └────┬────┘                      │
│       │            │            │                           │
│       └────────────┼────────────┘                           │
│                    │  Distributed Locking                   │
└────────────────────┼────────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
┌───────▼───────┐        ┌────────▼────────┐
│  PostgreSQL   │        │    S3 / GCS     │
│  (Metadata)   │        │   (Data Lake)   │
└───────────────┘        └─────────────────┘
Key Components
HTTP Server
- Built on Fiber (fasthttp)
- High-performance, low-memory HTTP handling
- Supports HTTP/1.1 with keep-alive
Database Layer
- PostgreSQL for all metadata storage
- Connection pooling via pgx/v5
- Automatic schema migrations
- Advisory locks for distributed locking
Storage Integration
- Generates storage paths for tables
- Vends credentials for client data access
- Supports S3, GCS, and local filesystem
Event Streaming
- WebSocket-based real-time events
- Publish/subscribe model
- Namespace-level filtering
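The publish/subscribe model with namespace-level filtering can be illustrated with a small in-memory broker. This is only a sketch under stated assumptions: Go channels stand in for WebSocket connections, and the `Event`, `Broker`, `Subscribe`, and `Publish` names, the event type strings, and the drop-when-full policy are all hypothetical rather than Bingsan's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical catalog event; the field names are assumptions.
type Event struct {
	Namespace string
	Type      string
}

// Broker fans events out to subscribers. In a real server each subscriber
// would be backed by a WebSocket connection instead of a bare channel.
type Broker struct {
	mu   sync.RWMutex
	subs map[chan Event]string // channel -> namespace filter ("" matches all)
}

// NewBroker returns an empty broker.
func NewBroker() *Broker {
	return &Broker{subs: make(map[chan Event]string)}
}

// Subscribe registers a buffered channel that receives events for one
// namespace, or for every namespace when the filter is empty.
func (b *Broker) Subscribe(namespace string, buffer int) chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.subs[ch] = namespace
	b.mu.Unlock()
	return ch
}

// Publish delivers ev to every matching subscriber and returns how many
// received it. A subscriber whose buffer is full is skipped so that one
// slow consumer cannot block the publisher.
func (b *Broker) Publish(ev Event) int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	delivered := 0
	for ch, ns := range b.subs {
		if ns != "" && ns != ev.Namespace {
			continue // filtered out by namespace
		}
		select {
		case ch <- ev:
			delivered++
		default: // buffer full: drop for this subscriber
		}
	}
	return delivered
}

func main() {
	b := NewBroker()
	analytics := b.Subscribe("analytics", 4)
	all := b.Subscribe("", 4)

	b.Publish(Event{Namespace: "analytics", Type: "table_created"})
	b.Publish(Event{Namespace: "sales", Type: "table_dropped"})

	// The filtered subscriber holds one event, the unfiltered one holds two.
	fmt.Println(len(analytics), len(all))
}
```

Filtering on the server side means a client that cares about one namespace never pays for traffic from the others.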
Design Principles
Stateless Nodes
Each Bingsan instance is stateless:
- All state in PostgreSQL
- No inter-node communication
- Any node can handle any request
- Easy horizontal scaling
Optimistic Concurrency
Table commits use optimistic concurrency control:
- Client reads current metadata
- Client submits changes with requirements
- Server validates requirements against current state
- If valid, changes are applied atomically
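The four steps above can be sketched with an in-memory catalog. This is a minimal illustration, not Bingsan's implementation: the `Catalog`, `TableState`, `Requirement`, and `AssertSnapshotID` names are hypothetical, and a mutex stands in for the database transaction that would make validation and update atomic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrRequirementFailed is a hypothetical error returned for a stale commit.
var ErrRequirementFailed = errors.New("commit requirement failed")

// TableState is a simplified stand-in for table metadata.
type TableState struct {
	MetadataLocation string
	SnapshotID       int64
}

// Requirement is checked against the current state before a commit applies.
type Requirement func(current TableState) error

// AssertSnapshotID requires that the table still points at the snapshot
// the client read before preparing its changes.
func AssertSnapshotID(expected int64) Requirement {
	return func(current TableState) error {
		if current.SnapshotID != expected {
			return fmt.Errorf("%w: expected snapshot %d, found %d",
				ErrRequirementFailed, expected, current.SnapshotID)
		}
		return nil
	}
}

// Catalog holds a single table in memory. The mutex plays the role of the
// database transaction (an assumption made to keep the sketch runnable).
type Catalog struct {
	mu    sync.Mutex
	state TableState
}

// Load returns the current metadata (step 1).
func (c *Catalog) Load() TableState {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.state
}

// Commit validates every requirement against the current state (step 3)
// and applies the new state only if all of them pass (step 4).
func (c *Catalog) Commit(reqs []Requirement, next TableState) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, req := range reqs {
		if err := req(c.state); err != nil {
			return err // nothing is applied when a requirement fails
		}
	}
	c.state = next
	return nil
}

func main() {
	cat := &Catalog{state: TableState{MetadataLocation: "v1.metadata.json", SnapshotID: 1}}

	// Two writers read the same snapshot and build the same requirement (step 2).
	seen := cat.Load()
	reqs := []Requirement{AssertSnapshotID(seen.SnapshotID)}

	first := cat.Commit(reqs, TableState{MetadataLocation: "v2.metadata.json", SnapshotID: 2})
	second := cat.Commit(reqs, TableState{MetadataLocation: "v2b.metadata.json", SnapshotID: 3})

	// The first commit wins; the second is rejected as stale.
	fmt.Println(first, errors.Is(second, ErrRequirementFailed))
}
```

A rejected writer is expected to reload the metadata, rebuild its changes on top of the new state, and try again. No lock is held while the client prepares its changes.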
Distributed Locking
PostgreSQL row-level locking with configurable timeouts prevents concurrent modifications:
- Row-level locks with SELECT ... FOR UPDATE
- Configurable lock_timeout per transaction
- Automatic retry with exponential backoff
- Handles lock conflicts gracefully
See Distributed Locking for configuration details.
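As a rough illustration of retry with exponential backoff on a lock conflict, the sketch below retries an operation and doubles the delay after each conflict, up to a cap. Everything named here is an assumption made for the example: `WithRetry`, `RetryConfig`, and `ErrLockConflict` are hypothetical, and the operation is an ordinary function rather than a real transaction that issues SELECT ... FOR UPDATE under a lock_timeout.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrLockConflict is a hypothetical sentinel meaning the row lock could not
// be acquired before the timeout. A real implementation would map the
// database driver's lock-timeout error onto something like this.
var ErrLockConflict = errors.New("lock conflict")

// RetryConfig holds illustrative settings; these are not Bingsan's config keys.
type RetryConfig struct {
	MaxAttempts int
	BaseDelay   time.Duration
	MaxDelay    time.Duration
}

// WithRetry runs op until it succeeds, fails with an error other than a
// lock conflict, or MaxAttempts is reached. The wait doubles after every
// conflict and is capped at MaxDelay. It returns the number of attempts made.
func WithRetry(cfg RetryConfig, op func() error) (int, error) {
	if cfg.MaxAttempts < 1 {
		cfg.MaxAttempts = 1
	}
	delay := cfg.BaseDelay
	var err error
	for attempt := 1; ; attempt++ {
		err = op()
		if err == nil || !errors.Is(err, ErrLockConflict) {
			return attempt, err // success, or an error that retrying cannot fix
		}
		if attempt == cfg.MaxAttempts {
			return attempt, fmt.Errorf("giving up after %d attempts: %w", attempt, err)
		}
		time.Sleep(delay)
		delay *= 2
		if delay > cfg.MaxDelay {
			delay = cfg.MaxDelay
		}
	}
}

func main() {
	calls := 0
	cfg := RetryConfig{MaxAttempts: 5, BaseDelay: time.Millisecond, MaxDelay: 8 * time.Millisecond}

	attempts, err := WithRetry(cfg, func() error {
		calls++
		if calls < 3 {
			return ErrLockConflict // simulate two conflicts before success
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

Production code often adds random jitter to each delay so that competing writers do not retry in lockstep. That is omitted here for brevity.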
Object Pooling
Memory optimization through buffer reuse:
- sync.Pool-based buffer pooling
- Reduces GC pressure under high load
- Prometheus metrics for pool health
See Object Pooling for implementation details.
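The general pattern behind sync.Pool-based buffer reuse looks roughly like the following. It is a generic sketch, not code from Bingsan: `bufferPool`, `EncodeJSON`, and the `maxPooledCap` cutoff are hypothetical names and values, and the Prometheus metrics mentioned above are left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufferPool hands out reusable buffers so that each request does not
// allocate (and later garbage-collect) a fresh one.
var bufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// maxPooledCap is a hypothetical cutoff: buffers that grew beyond it are
// dropped instead of pooled, so one large response does not pin memory.
const maxPooledCap = 64 << 10

// EncodeJSON serializes v into a pooled buffer and returns a copy of the bytes.
func EncodeJSON(v any) ([]byte, error) {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from its previous use
	defer func() {
		if buf.Cap() <= maxPooledCap {
			bufferPool.Put(buf)
		}
	}()

	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}

	// Copy the result out: the buffer's memory is reused once it is returned
	// to the pool, so the caller must not keep a slice that aliases it.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	out, err := EncodeJSON(map[string]string{"namespace": "analytics"})
	fmt.Printf("%s%v\n", out, err)
}
```

The copy on the way out trades one allocation for safety. Code that writes the buffer straight to the response before returning it to the pool can avoid that copy.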
Sections
- Request Flow - How requests are processed
- Data Model - Database schema and metadata storage
- Scalability - Scaling strategies and limits