YDB Roadmap
Legend
We use the following symbols as abbreviations:
- ㉓ - feature appeared in the Roadmap for 2023;
- ㉔ - feature appeared in the Roadmap for 2024;
- ✅ - feature has been released;
- 🚧 - feature is partially available and is under development;
- ❌ - feature has been refused;
- 🔥 - not yet released, but we are in a rush.
Query Processor
- ㉔ Unique secondary indexes
- ㉔ Apply indexes automatically to optimize data fetching
- ㉔ Default values for table columns
- ㉔ Asynchronous LLVM JIT query compilation
- ㉔ Make parameters in the DECLARE clause optional for better SQL compatibility
- 🚧㉔ Cost-based optimizer for join order and join algorithm selection
- ㉔ INSERT INTO table FROM SELECT for large datasets
- ㉔ Support for transactional writes into both row and column tables
- ㉔ Support for computed columns in a table
- ㉔ Support for temporary tables
- ㉔ Support for VIEW SQL clause
- ㉔ Data spilling when there is an insufficient amount of RAM
- ㉔ TPC-H, TPC-DS for 100TB dataset
- ✅ ㉓ Support for Snapshot Readonly transactions mode
- 🚧 ㉓ Better resource management for KQP Resource Manager (share information about node resources, avoid OOMs)
- ✅ ㉓ Switch to New Engine for OLTP queries
- ✅ ㉓ Support not null for PK (primary key) table columns
- ✅ ㉓ Aggregates and predicates push down to column-oriented tables
- ✅ ㉓ Optimize data formats for data transition between query phases
- ✅ ㉓ Index Rename/Rebuild
- ✅ ㉓ KQP Session Actor as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
- PostgreSQL compatibility
  - ✅ ㉓ Support PostgreSQL datatypes serialization/deserialization in YDB Public API
  - 🚧 ㉓ PostgreSQL compatible query execution (TPC-C, TPC-H queries should work)
  - ✅ ㉓ Support for PostgreSQL wire protocol
- ㉓ Support a single Database connection string instead of multiple parameters
- ㉓ Support constraints in query optimizer
- Query Processor 3.0 (a set of tasks to bring query execution functionality closer to that of a traditional database)
- ㉓ Support for Streaming Lookup Join via MVCC snapshots (avoid distributed transactions, scalability is better)
- ㉓ Universal API call for DML, DDL with unlimited results size for OLTP/OLAP workload (aka ExecuteQuery)
- ✅ ㉓ Support for secondary indexes in ScanQuery
- ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
- ✅ ㉓ Computation graphs caching (compute/datashard programs) (optimize CPU usage)
- 🚧 ㉓ RPC Deadline & Cancellation propagation (smooth timeout management)
- ✅ ㉓ DDL for column-oriented tables
Database Core (Tablets, etc.)
- ㉔ Volatile transactions. YDB Distributed transactions 2.0, minimize network round trips in the happy path
- ㉔ Table statistics for cost-based optimizer
- ㉔ Memory optimization for row tables (avoid full SST index loading, dynamic cache adjusting)
- ㉔ Reduce the minimum number of cores required for a YDB node to 2
- ㉔ Incremental backup and Point-in-time recovery
- ㉔ ALTER CHANGEFEED
- ㉔ Async Replication between YDB databases (column tables, topics)
- ㉔ Async Replication between YDB databases (schema changes)
- ㉔ Support for Debezium format
- ㉔ Topics autoscaling (increase/decrease number of partitions in the topic automatically)
- ㉔ Extended Kafka API protocol support for YDB Topics (balance reads, support for v19)
- ㉔ Schema for YDB Topics
- ㉔ Message-level parallelism in YDB Topics
- ✅ ㉓ Get YDB topics (aka pers queue, streams) ready for production
- ✅ ㉓ Turn on MVCC support by default
- ✅ ㉓ Enable Snapshot read mode by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
- ✅ ㉓ Change Data Capture (be able to get change feed of table updates)
- 🔥 ㉓ Async Replication between YDB databases (first version, row tables, w/o schema changes)
- ✅ ㉓ Background compaction for DataShards
- ✅ ㉓ Compressed Backups. Add functionality to compress backup data
- ㉓ Extending State Storage without cluster downtime. If a cluster grows from, say, 9 nodes to 900 while the State Storage configuration stays the same (9 nodes), State Storage becomes a performance bottleneck.
- Split/Merge DataShards BY LOAD by default. Most users require this feature turned on by default
- ✅ ㉓ Support PostgreSQL datatypes in tablet local database
- Basic histogram for DataShards (first step towards cost-based optimizations)
- ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
- ㉓ Data Ingestion from topic to table (implement built-in compatibility to ingest data to YDB tables from topics)
- ㉓ Support snapshot read over read replicas (consistent reads against read replicas)
- ㉓ 🚧 Transactions between topics and tables
- ✅ ㉓ Support for Kafka API compatible protocol to YDB Topics
Hardcore or system wide
- ㉔ Tracing capabilities
- ㉔ Automatically balance tablet channels via BlobStorage groups
- ✅ ㉓ Datashard iterator reads via MVCC
- ❌ (refused) ㉓ Switch to TRope (or don't use TString/std::string directly, provide zero-copy data passing between components)
- ㉓ Avoid Node Broker as a single point of failure (NBS must work without Node Broker under emergency conditions)
- ㉓ Subscriptions in SchemeBoard (optimize interaction with SchemeBoard via subscription to updates)
Security
- ✅ ㉓ Basic LDAP Support
- ㉔ Support for OpenID Connect
- ㉔ Authentication via Keycloak
- ㉔ Support for SASL framework
BlobStorage
- ㉔ BlobStorage latency optimization (p999), less CPU consumption
- ㉔ ActorSystem performance optimizations
- ㉔ Optimize ActorSystem for ARM processors
- ㉔ Effortless initial cluster deployment (provide only nodes and disks description)
- ㉔ Reduce number of BlobStorage groups for a database (add ability to remove unneeded groups)
- ㉓ "One leg" storage migration without downtime (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
- ✅ ㉓ ActorSystem 1.5 (dynamically reassign threads in different thread pools)
- ✅ ㉓ Publish a utility for BlobStorage management (it's called ds_tool for now; improve it and open it up)
- ㉓ Self-heal for degraded BlobStorage groups (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
- ㉓ BlobDepot (a component for smooth blobs management between groups)
- ㉓ Avoid BSC (BlobStorage Controller) as a single point of failure (be able to run the cluster without BSC in emergency cases)
- ㉓ BSC manages static group (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller, as for any other group)
- ㉓ (Semi-)Hard disk space separation (Better guarantees for disk space usage by VDisks on a single PDisk)
- ㉓ Reduce space amplification (Optimize storage layer)
- ✅ ㉓ Storage nodes decommission (Add ability to remove storage nodes)
Analytical Capabilities
- ㉔ Backup for column tables
- ㉔ Column tables autosharding
- ㉓ 🚧 Log Store (log-friendly column-oriented storage that allows creating 1+ million tables for storing logs)
- ㉓ 🚧 Column-oriented Tables (introduce column-oriented tables in addition to row-oriented tables)
- ㉓ Tiered Storage for Column-oriented Tables (with the ability to store the data in S3)
Federated Query
- ✅ ㉓ Run the first version
Embedded UI
A detailed roadmap can be found in the YDB Embedded UI repo.
Command Line Utility
- 🚧 ㉓ Use a single ydb yql instead of ydb table query or ydb scripting
- ✅ ㉓ Interactive CLI
Tests and Benchmarks
- ㉓ Built-in load test for DataShards in YCSB manner
- ✅ ㉓ ydb workload for topics
- Jepsen tests support
Experiments
- ❌ (refused) Try RTMR-tablet for key-value workload