# Distributed Locking

PostgreSQL-based locking with retry logic.
Bingsan uses PostgreSQL row-level locking with configurable timeouts and retry logic to handle concurrent modifications safely across multiple instances.
## Overview

When multiple Bingsan instances modify the same resource simultaneously, locking ensures:

- **Consistency**: Only one operation modifies a resource at a time
- **Isolation**: Operations don't see partial states
- **Automatic recovery**: Failed lock attempts are retried with backoff
## Configuration

Configure locking via `config.yaml`:

```yaml
catalog:
  lock_timeout: 30s
  lock_retry_interval: 100ms
  max_lock_retries: 100
```

### Environment Variables
```bash
ICEBERG_CATALOG_LOCK_TIMEOUT=30s
ICEBERG_CATALOG_LOCK_RETRY_INTERVAL=100ms
ICEBERG_CATALOG_MAX_LOCK_RETRIES=100
```

## How It Works
### Lock Acquisition Flow

1. Begin a transaction
2. `SET LOCAL lock_timeout = '30s'`
3. Execute the operation (`SELECT ... FOR UPDATE`)
4. On success: commit the transaction
5. On lock timeout (SQLSTATE `55P03`): roll back, wait `lock_retry_interval`, retry
6. If `max_lock_retries` is exceeded: return `ErrLockTimeout`
### PostgreSQL Lock Timeout

Each transaction sets `lock_timeout` locally, so the setting applies only to that transaction:

```sql
BEGIN;
SET LOCAL lock_timeout = '30000ms';
SELECT * FROM tables WHERE id = $1 FOR UPDATE;
-- ... perform update ...
COMMIT;
```

## Configuration Options
| Option | Default | Description |
|---|---|---|
| `lock_timeout` | `30s` | Max time to wait for a single lock attempt |
| `lock_retry_interval` | `100ms` | Wait time between retry attempts |
| `max_lock_retries` | `100` | Maximum retry attempts before failing |
### Total Wait Time

Assuming the first attempt waits the full `lock_timeout` and each subsequent retry fails quickly, the worst-case wait is:

```
max_wait = lock_timeout + (max_lock_retries × lock_retry_interval)
         = 30s + (100 × 100ms)
         = 40s
```

## Tuning Guidelines
### High-Contention Workloads

```yaml
catalog:
  lock_timeout: 5s
  lock_retry_interval: 50ms
  max_lock_retries: 200
```

### Low-Contention Workloads
```yaml
catalog:
  lock_timeout: 60s
  lock_retry_interval: 500ms
  max_lock_retries: 10
```

### Batch Processing
```yaml
catalog:
  lock_timeout: 120s
  lock_retry_interval: 1s
  max_lock_retries: 60
```

## Error Handling
### `ErrLockTimeout`

Returned when all retries are exhausted. The client should retry the operation.
### Serialization Failures

PostgreSQL serialization errors (SQLSTATE `40001`) are also detected and handled.
## Monitoring

### Lock Wait Metrics

Monitor lock contention via PostgreSQL:

```sql
-- Locks that have been requested but not granted
SELECT * FROM pg_locks WHERE NOT granted;

-- Sessions currently waiting on a lock
SELECT * FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

### Application Metrics
```promql
# Rate of lock timeout errors
rate(iceberg_handler_errors_total{error="lock_timeout"}[5m])
```

## Best Practices
### Keep Transactions Short

```go
// Good: minimal work inside the lock
err := db.WithLock(ctx, cfg, func(tx pgx.Tx) error {
	_, err := tx.Exec(ctx, "UPDATE tables SET ...")
	return err
})

// Bad: external calls inside the lock
err = db.WithLock(ctx, cfg, func(tx pgx.Tx) error {
	callExternalService() // may be slow, and the row lock is held the whole time
	_, err := tx.Exec(ctx, "UPDATE tables SET ...")
	return err
})
```

## Troubleshooting
### Frequent Lock Timeouts

**Causes:**

- High write contention on the same tables
- Long-running transactions holding locks
- Database performance issues

**Solutions:**

- Increase `max_lock_retries`
- Decrease `lock_timeout` (fail faster, retry sooner)
- Check for slow queries holding locks
- Partition workloads across different tables