# YDB Roadmap

## Legend

We use the following symbols as abbreviations:

1. ㉓ - feature appeared in the Roadmap for 2023;
1. ㉔ - feature appeared in the Roadmap for 2024;
1. ✅ - feature has been released;
1. 🚧 - feature is partially available and is under development;
1. ❌ - feature has been refused;
1. 🔥 - not yet released, but we are in a rush.

## Query Processor

1. ㉔ **Unique secondary indexes**
1. ㉔ Apply **indexes automatically** to optimize data fetching
1. ㉔ **Default values** for table columns
1. ㉔ **Asynchronous LLVM JIT** query compilation
1. ㉔ **Parameters in DECLARE clause are becoming optional**, better SQL compatibility
1. 🚧 ㉔ **Cost-based optimizer** for join order and join algorithm selection
1. ㉔ **``INSERT INTO table FROM SELECT``** for large datasets
1. ㉔ Support for **transactional writes into both row and column tables**
1. ㉔ Support for **computed columns in a table**
1. ㉔ Support for **temporary tables**
1. ㉔ Support for **VIEW** SQL clause
1. ㉔ **Data Spilling** when there is an insufficient amount of RAM
1. ㉔ **TPC-H, TPC-DS for 100TB** dataset
1. ✅ ㉓ Support for **Snapshot Readonly** transactions mode
1. 🚧 ㉓ **Better resource management** for KQP Resource Manager (share information about node resources, avoid OOMs)
1. ✅ ㉓ Switch to **New Engine** for OLTP queries
1. ✅ ㉓ Support **`not null` for PK (primary key) table columns**
1. ✅ ㉓ **Aggregates and predicates push down to column-oriented tables**
1. ✅ ㉓ **Optimize data formats** for data transition between query phases
1. ✅ ㉓ **Index Rename/Rebuild**
1. ✅ ㉓ **KQP Session Actor** as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
1. **PostgreSQL compatibility**
    * ✅ ㉓ Support PostgreSQL datatypes **serialization/deserialization** in YDB Public API
    * 🚧 ㉓ PostgreSQL compatible **query execution** (TPC-C, TPC-H queries should work)
    * ✅ ㉓ Support for PostgreSQL **wire protocol**
1. ㉓ Support a single **Database connection string** instead of multiple parameters
1. ㉓ Support **constraints in query optimizer**
1. **Query Processor 3.0** (a set of tasks to be more like a traditional database in terms of query execution functionality)
    * ㉓ Support for **Streaming Lookup Join** via MVCC snapshots (avoid distributed transactions, scalability is better)
    * ㉓ **Universal API call for DML, DDL with unlimited results size for OLTP/OLAP workload** (aka ExecuteQuery)
    * ✅ ㉓ Support for **secondary indexes in ScanQuery**
    * ✅ ㉓ **Transaction can see its own updates** (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
1. ✅ ㉓ **Computation graphs caching (compute/datashard programs)** (optimize CPU usage)
1. 🚧 ㉓ **RPC Deadline & Cancellation propagation** (smooth timeout management)
1. ✅ ㉓ **DDL for column-oriented tables**

## Database Core (Tablets, etc)

1. ㉔ **Volatile transactions**. YDB Distributed transactions 2.0, minimize network round trips in happy path
1. ㉔ **Table statistics** for cost-based optimizer
1. ㉔ **Memory optimization for row tables** (avoid full SST index loading, dynamic cache adjusting)
1. ㉔ Reduce minimum requirements for **the number of cores to 2** for a YDB node
1. ㉔ **Incremental backup** and **Point-in-time recovery**
1. ㉔ **``ALTER CHANGEFEED``**
1. ㉔ **Async Replication** between YDB databases (column tables, topics)
1. ㉔ **Async Replication** between YDB databases (schema changes)
1. ㉔ Support for **Debezium** format
1. ㉔ **Topics autoscaling** (increase/decrease number of partitions in the topic automatically)
1. ㉔ **Extended Kafka API** protocol to YDB Topics support (balance reads, support for v19)
1. ㉔ **Schema for YDB Topics**
1. ㉔ **Message-level parallelism** in YDB Topics
1. ✅ ㉓ Get **YDB topics** (aka pers queue, streams) ready for production
1. ✅ ㉓ Turn on **MVCC support** by default
1. ✅ ㉓ Enable **Snapshot read mode** by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
1. ✅ ㉓ **Change Data Capture** (be able to get change feed of table updates)
1. 🔥 ㉓ **Async Replication** between YDB databases (first version, row tables, w/o schema changes)
1. ✅ ㉓ **Background compaction for DataShards**
1. ✅ ㉓ **Compressed Backups**. Add functionality to compress backup data
1. ㉓ Process of **Extending State Storage** without cluster downtime. If a cluster grows from, say, 9 nodes to 900 while the State Storage configuration stays the same (9 nodes), this leads to a performance bottleneck.
1. **Split/Merge DataShards *BY LOAD* by default**. Most users require this feature turned on by default
1. ✅ ㉓ Support **PostgreSQL datatypes** in tablet local database
1. **Basic histogram for DataShards** (first step towards cost-based optimizations)
1. ✅ ㉓ **Transaction can see its own updates** (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
1. ㉓ **Data Ingestion from topic to table** (implement built-in compatibility to ingest data to YDB tables from topics)
1. ㉓ Support **snapshot read over read replicas** (consistent reads against read replicas)
1. ㉓ 🚧 **Transactions between topics and tables**
1. ✅ ㉓ Support for **Kafka API compatible protocol** to YDB Topics

### Hardcore or system wide

1. ㉔ **Tracing** capabilities
1. ㉔ Automatically **balance tablet channels** via BlobStorage groups
1. ✅ ㉓ **Datashard iterator reads via MVCC**
1. ❌ *(refused)* ㉓ **Switch to TRope** (or don't use TString/std::string directly, provide zero-copy data passing between components)
1. ㉓ **Avoid Node Broker as SPF** (NBS must work without Node Broker under emergency conditions)
1. ㉓ **Subscriptions in SchemeBoard** (optimize interaction with SchemeBoard via subscription to updates)

## Security

1. ✅ ㉓ Basic LDAP Support
1. ㉔ Support for OpenID Connect
1. ㉔ Authentication via Keycloak
1. ㉔ Support for SASL framework

## BlobStorage

1. ㉔ BlobStorage **latency optimization** (p999), less CPU consumption
1. ㉔ **ActorSystem performance optimizations**
1. ㉔ Optimize **ActorSystem for ARM processors**
1. ㉔ **Effortless initial cluster deployment** (provide only nodes and disks description)
1. ㉔ **Reduce number of BlobStorage groups** for a database (add ability to remove unneeded groups)
1. ㉓ **"One leg" storage migration without downtime** (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
1. ✅ ㉓ **ActorSystem 1.5** (dynamically reassign threads in different thread pools)
1. ✅ ㉓ **Publish a utility for BlobStorage management** (it's called ds_tool for now, improve it and open)
1. ㉓ **Self-heal for degraded BlobStorage groups** (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
1. ㉓ **BlobDepot** (a component for smooth blobs management between groups)
1. ㉓ **Avoid BSC (BlobStorage Controller) as SPF** (be able to run the cluster without BSC in emergency cases)
1. ㉓ **BSC manages static group** (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller, as for any other group)
1. ㉓ **(Semi-)Hard disk space separation** (better guarantees for disk space usage by VDisks on a single PDisk)
1. ㉓ **Reduce space amplification** (optimize storage layer)
1. ✅ ㉓ **Storage nodes decommission** (add ability to remove storage nodes)

## Analytical Capabilities

1. ㉔ **Backup** for column tables
1. ㉔ Column tables **autosharding**
1. ㉓ 🚧 **Log Store** (log-friendly column-oriented storage that allows creating 1+ million tables for storing logs)
1. ㉓ 🚧 **Column-oriented Tables** (introduce Column-oriented tables in addition to Row-oriented tables)
1. ㉓ **Tiered Storage for Column-oriented Tables** (with the ability to store the data in S3)

## Federated Query

1. ✅ ㉓ **Run the first version**

## Embedded UI

A detailed roadmap can be found at [YDB Embedded UI repo](https://github.com/ydb-platform/ydb-embedded-ui/blob/main/ROADMAP.md).

## Command Line Utility

1. 🚧 ㉓ Use a **single `ydb yql`** instead of `ydb table query` or `ydb scripting`
1. ✅ ㉓ Interactive CLI

## Tests and Benchmarks

1. ㉓ **Built-in load test for DataShards** in YCSB manner
1. ✅ ㉓ **`ydb workload` for topics**
1. **Jepsen tests support**

## Experiments

1. ❌ *(refused)* Try **RTMR-tablet** for key-value workload