# YDB Roadmap
## Legend
We use the following symbols as abbreviations:
- ㉓ - feature appeared in the Roadmap for 2023;
- ㉔ - feature appeared in the Roadmap for 2024;
- ✅ - feature has been released;
- 🚧 - feature is partially available and is under development;
- ❌ - feature has been refused;
- 🔥 - not yet released, but we are in a rush.
## Query Processor
- ㉔ Unique secondary indexes
- ㉔ Apply indexes automatically to optimize data fetching
- ㉔ Default values for table columns
- ㉔ Asynchronous LLVM JIT query compilation
- ㉔ Parameters in DECLARE clause are becoming optional, better SQL compatibility
- ㉔ Cost-based optimizer for join order selection
- ㉔ `INSERT INTO table FROM SELECT` for large datasets
- ㉔ Support for transactional writes into both row and column tables
- ㉔ Support for computed columns in a table
- ㉔ Support for temporary tables
- ㉔ Support for VIEW SQL clause
- ㉔ Data Spilling in case there is an insufficient amount of RAM
- ㉔ TPC-H, TPC-DS for 100TB dataset
- ✅ ㉓ Support for Snapshot Readonly transactions mode
- 🚧 ㉓ Better resource management for KQP Resource Manager (share information about node resources, avoid OOMs)
- ✅ ㉓ Switch to New Engine for OLTP queries
- ✅ ㉓ Support `not null` for PK (primary key) table columns
- ✅ ㉓ Aggregates and predicates push down to column-oriented tables
- ✅ ㉓ Optimize data formats for data transition between query phases
- ✅ ㉓ Index Rename/Rebuild
- ✅ ㉓ KQP Session Actor as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
- PostgreSQL compatibility
  - ✅ ㉓ Support PostgreSQL datatypes serialization/deserialization in YDB Public API
  - 🚧 ㉓ PostgreSQL compatible query execution (TPC-C, TPC-H queries should work)
  - ✅ ㉓ Support for PostgreSQL wire protocol
- ㉔ Support a single Database connection string instead of multiple parameters
- ㉔ Support constraints in query optimizer
- Query Processor 3.0 (a set of tasks to make query execution behave more like that of traditional databases)
  - ㉓ Support for Streaming Lookup Join via MVCC snapshots (avoid distributed transactions, better scalability)
  - ㉓ Universal API call for DML, DDL with unlimited result size for OLTP/OLAP workload (aka ExecuteQuery)
  - ✅ ㉓ Support for secondary indexes in ScanQuery
  - ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are no longer buffered in RAM, but are written to disk and available to read by this transaction)
- ✅ ㉓ Computation graphs caching (compute/datashard programs) (optimize CPU usage)
- 🚧 ㉓ RPC Deadline & Cancellation propagation (smooth timeout management)
- ✅ ㉓ DDL for column-oriented tables
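Several Query Processor items above correspond to plain YQL statements. As a rough sketch of what the default-column-values and `INSERT INTO table FROM SELECT` items could look like once shipped (the table names and the exact `DEFAULT` syntax are illustrative assumptions, not the final design):

```yql
-- Hypothetical table using a default value for a column
CREATE TABLE orders (
    id Uint64,
    status Utf8 DEFAULT "new",      -- "Default values for table columns"
    created_at Timestamp,
    PRIMARY KEY (id)
);

-- Bulk copy between tables ("INSERT INTO table FROM SELECT for large datasets")
INSERT INTO orders_archive
SELECT id, status, created_at
FROM orders
WHERE created_at < CurrentUtcTimestamp() - Interval("P90D");
```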
## Database Core (Tablets, etc)
- ㉔ Volatile transactions. YDB Distributed transactions 2.0, minimize network round trips in the happy path
- ㉔ Table statistics for cost-based optimizer
- ㉔ Memory optimization for row tables (avoid full SST index loading, dynamic cache adjusting)
- ㉔ Reduce minimum requirements for the number of cores to 2 for a YDB node
- ㉔ Incremental backup and Point-in-time recovery
- ㉔ `ALTER CHANGEFEED`
- ㉔ Async Replication between YDB databases (column tables, topics)
- ㉔ Async Replication between YDB databases (schema changes)
- ㉔ Support for Debezium format
- ㉔ Topics autoscaling (increase/decrease number of partitions in the topic automatically)
- ㉔ Extended Kafka API protocol to YDB Topics support (balance reads, support for v19)
- ㉔ Schema for YDB Topics
- ㉔ Message-level parallelism in YDB Topics
- ✅ ㉓ Get YDB topics (aka pers queue, streams) ready for production
- ✅ ㉓ Turn on MVCC support by default
- ✅ ㉓ Enable Snapshot read mode by default (take and use MVCC snapshot for reads instead of running a distributed transaction for reads)
- ✅ ㉓ Change Data Capture (be able to get a change feed of table updates)
- 🔥 ㉓ Async Replication between YDB databases (first version, row tables, w/o schema changes)
- ✅ ㉓ Background compaction for DataShards
- ✅ ㉓ Compressed Backups. Add functionality to compress backup data
- ㉓ Extend State Storage without cluster downtime. If a cluster grows from, say, 9 nodes to 900, the State Storage configuration stays the same (9 nodes), which leads to a performance bottleneck
- Split/Merge DataShards BY LOAD by default. Most users need this feature turned on by default
- ✅ ㉓ Support PostgreSQL datatypes in tablet local database
- Basic histogram for DataShards (first step towards cost-based optimizations)
- ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are no longer buffered in RAM, but are written to disk and available to read by this transaction)
- ㉓ Data Ingestion from topic to table (implement built-in compatibility to ingest data to YDB tables from topics)
- ㉓ Support snapshot read over read replicas (consistent reads against read replicas)
- 🚧 ㉓ Transactions between topics and tables
- ✅ ㉓ Support for Kafka API compatible protocol to YDB Topics
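The CDC-related items above build on YDB's changefeed DDL, which the planned `ALTER CHANGEFEED` statement would extend. A minimal sketch of attaching a JSON changefeed to a row table (the table and feed names are hypothetical; see the YDB CDC documentation for the full set of options):

```yql
-- Attach a changefeed that emits a JSON record for every update
ALTER TABLE orders ADD CHANGEFEED updates_feed WITH (
    FORMAT = 'JSON',
    MODE = 'UPDATES'   -- emit only the updated column values
);
```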
## Hardcore or system wide
- ㉔ Tracing capabilities
- ㉔ Automatically balance tablet channels via BlobStorage groups
- ✅ ㉓ Datashard iterator reads via MVCC
- ❌ (refused) ㉓ Switch to TRope (or don't use TString/std::string directly, provide zero-copy data passing between components)
- ㉓ Avoid Node Broker as SPF (NBS must work without Node Broker under emergency conditions)
- ㉓ Subscriptions in SchemeBoard (optimize interaction with SchemeBoard via subscription to updates)
## Security
- ✅ ㉓ Basic LDAP Support
- ㉔ Support for OpenID Connect
- ㉔ Authentication via Keycloak
- ㉔ Support for SASL framework
## BlobStorage
- ㉔ BlobStorage latency optimization (p999), less CPU consumption
- ㉔ ActorSystem performance optimizations
- ㉔ Optimize ActorSystem for ARM processors
- ㉔ Effortless initial cluster deployment (provide only nodes and disks description)
- ㉔ Reduce number of BlobStorage groups for a database (add ability to remove unneeded groups)
- ㉔ "One leg" storage migration without downtime (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
- ✅ ㉓ ActorSystem 1.5 (dynamically reassign threads in different thread pools)
- ✅ ㉓ Publish a utility for BlobStorage management (it's called ds_tool for now; improve it and open it)
- ㉓ Self-heal for degraded BlobStorage groups (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
- ㉓ BlobDepot (a component for smooth blob management between groups)
- ㉓ Avoid BSC (BlobStorage Controller) as SPF (be able to run the cluster without BSC in emergency cases)
- ㉓ BSC manages static group (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller, as for any other group)
- ㉓ (Semi-)Hard disk space separation (better guarantees for disk space usage by VDisks on a single PDisk)
- ㉓ Reduce space amplification (optimize storage layer)
- ✅ ㉓ Storage nodes decommission (add ability to remove storage nodes)
## Analytical Capabilities
- ㉔ Backup for column tables
- ㉔ Column tables autosharding
- 🚧 ㉓ Log Store (log-friendly column-oriented storage which allows creating 1+ million tables for storing logs)
- 🚧 ㉓ Column-oriented Tables (introduce column-oriented tables in addition to row-oriented tables)
- ㉓ Tiered Storage for Column-oriented Tables (with the ability to store the data in S3)
## Federated Query
- ✅ ㉓ Run the first version
## Embedded UI
A detailed roadmap can be found in the YDB Embedded UI repo.
## Command Line Utility
- 🚧 ㉓ Use a single `ydb yql` instead of `ydb table query` or `ydb scripting`
- ✅ ㉓ Interactive CLI
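The consolidation above leaves one entry point for ad-hoc queries. A usage sketch (the endpoint and database path are placeholders for a real cluster):

```shell
# Run an ad-hoc YQL query through the unified subcommand
ydb -e grpc://localhost:2136 -d /local yql -s "SELECT 1 AS one;"
```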
## Tests and Benchmarks
- ㉓ Built-in load test for DataShards in YCSB manner
- ✅ ㉓ `ydb workload` for topics
- Jepsen tests support
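For the `ydb workload` item above, the topic load generator is driven from the CLI. A usage sketch (the endpoint, database path, and duration are placeholder values; flag names follow the current `ydb workload topic` subcommands and may change between versions):

```shell
# Create the helper topic and consumers for the benchmark
ydb -e grpc://localhost:2136 -d /local workload topic init

# Drive a 60-second write workload against the topic
ydb -e grpc://localhost:2136 -d /local workload topic run write --seconds 60

# Clean up the benchmark entities afterwards
ydb -e grpc://localhost:2136 -d /local workload topic clean
```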
## Experiments
- ❌ (refused) Try RTMR-tablet for key-value workload