What are DataParts?
In TuringDB, graphs are versioned as a sequence of commits. Each commit represents a snapshot of the graph's state at a given point in time. But under the hood, each commit is composed of DataParts, the fundamental unit of storage in the TuringDB architecture.

DataParts Explained
- Every commit is partitioned into multiple DataParts
- Nodes and edges are stored within DataParts
- Once written, a DataPart is immutable
- Commits reference a collection of DataParts, both new ones and ones inherited from previous commits
A commit might reference DataPart 1, DataPart 2, etc., each storing a portion of the graph.
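To make this layout concrete, here is a minimal, purely illustrative sketch in Python. The class names and fields are hypothetical stand-ins for the concepts above, not the actual TuringDB storage structures.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative model only: DataPart and Commit here are hypothetical
# stand-ins, not the real TuringDB types.

@dataclass(frozen=True)          # frozen ~ immutable once written
class DataPart:
    nodes: Tuple[dict, ...]      # a slice of the graph's nodes
    edges: Tuple[dict, ...]      # a slice of the graph's edges

@dataclass(frozen=True)
class Commit:
    parts: Tuple[DataPart, ...]  # new parts plus parts inherited from the parent commit

# A commit referencing two parts, each holding a portion of the graph
part1 = DataPart(nodes=({"id": 1}, {"id": 2}), edges=())
part2 = DataPart(nodes=({"id": 3},), edges=({"src": 1, "dst": 3},))
snapshot = Commit(parts=(part1, part2))
```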
Why DataParts?
TuringDB is fundamentally a read-optimized analytical graph database, but DataParts are our answer to achieving high-performance parallel batch writes and data imports, especially for large-scale ingestion workloads.

Benefits
- Write Parallelism: Multiple threads or processes can write concurrently to their own private DataPart, without coordination, synchronisation or locking overhead.
- Batch Import Performance: Ingesting millions of nodes and edges becomes scalable and efficient, even in a system built for sub-millisecond analytics.
- Snapshot Safety: Each commit references a set of immutable DataParts, allowing us to maintain consistent snapshots and rollback history without duplication.
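As a rough sketch of why private parts make parallel ingestion simple, the toy Python example below has each worker build its own part with no shared state; the function and data shapes are hypothetical, not a TuringDB client API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of lock-free parallel ingestion: each worker
# materialises its own private part (modelled here as a plain dict),
# so no coordination is needed until the commit is assembled.

def build_part(batch):
    # Only this worker ever touches this part, so no locking is required.
    return {"nodes": [{"id": n} for n in batch], "edges": []}

batches = [range(0, 1000), range(1000, 2000), range(2000, 3000)]

with ThreadPoolExecutor(max_workers=3) as pool:
    parts = list(pool.map(build_part, batches))

# A single commit then references all freshly written parts.
commit = {"parts": parts}
```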
How TuringDB Uses DataParts
Each time you add new data or modify existing node/edge properties:
- TuringDB creates a new DataPart to store the changes.
- It reuses existing DataParts from the parent commit whenever possible.
- This leads to efficient incremental storage: only new or changed data consumes additional memory.
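The effect of this reuse can be sketched with the same kind of toy model (hypothetical types, not TuringDB internals): a child commit keeps references to its parent's parts and pays storage only for the delta.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical model: a child commit references its parent's immutable
# parts and adds one new part holding only the newly written data.

@dataclass(frozen=True)
class DataPart:
    nodes: Tuple[int, ...]

parent_parts = (DataPart(nodes=(1, 2)), DataPart(nodes=(3,)))  # from the parent commit
delta = DataPart(nodes=(4,))                                   # only the new data

child_parts = parent_parts + (delta,)

# Unchanged data is shared by reference, not copied, so storage grows
# only by the size of the delta part.
assert child_parts[0] is parent_parts[0]
```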

Like `git` objects, DataParts are immutable and shareable, enabling:
- Deduplication of unchanged data
- Consistent time-travel queries
- Audit-friendly storage history
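A tiny, self-contained illustration of these properties, using plain Python objects as stand-ins for DataParts:

```python
# Toy illustration of the git-like properties listed above; the objects
# are hypothetical stand-ins, not the actual TuringDB API.

base_part = frozenset({"alice", "bob"})   # an immutable chunk of nodes
new_part = frozenset({"carol"})

commit_v1 = (base_part,)                  # first snapshot
commit_v2 = (base_part, new_part)         # second snapshot reuses base_part

# Deduplication: both commits point at the same object for unchanged data.
assert commit_v2[0] is commit_v1[0]

# Time travel: reading commit_v1 still yields exactly the old snapshot,
# because nothing it references can ever be mutated.
assert set().union(*commit_v1) == {"alice", "bob"}
```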
Tuning for Performance
TuringDB can efficiently read and traverse graphs with up to 200 DataParts per commit. However, for optimal read performance, we aim to consolidate down to a single DataPart per commit. The fewer the DataParts, the faster the reads, due to improved locality, reduced CPU cache misses, and minimized lookup overhead.
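Conceptually, consolidation is just a merge of many immutable parts into one larger part referenced by a fresh commit. The sketch below is a hypothetical illustration of that idea, not the actual compaction code.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative compaction sketch (hypothetical types): merging many small
# parts into one means reads touch a single contiguous part.

@dataclass(frozen=True)
class DataPart:
    nodes: Tuple[int, ...]

def compact(parts):
    """Merge several immutable parts into a single consolidated part."""
    merged = tuple(n for part in parts for n in part.nodes)
    return DataPart(nodes=merged)

fragmented = tuple(DataPart(nodes=(i,)) for i in range(200))  # 200 parts in a commit
consolidated = compact(fragmented)                            # 1 part after compaction

assert len(consolidated.nodes) == 200
```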
Roadmap: Intelligent DataPart Merging
We are actively developing policies and algorithms to intelligently merge DataParts in the background. The goal is to:
- Automatically compact multiple DataParts into fewer ones
- Detect hot paths and frequently accessed subgraphs
- Optimize for query throughput and storage locality
Summary
| Feature | Benefit |
|---|---|
| Immutable DataParts | Safe versioning and reuse |
| Parallel write ingestion | High-performance batch processing |
| Shared storage across commits | Lower memory usage, fast snapshots |
| Merge roadmap | Compact layout for ultimate read speed |
Related Concepts
- ClickHouse Parts: a similar model used in high-performance columnar stores to enable immutability, versioning, and efficient compaction.