YDB Roadmap

Query Processor

Support for Snapshot Read-Only transaction mode
Better resource management for KQP Resource Manager (share information about nodes resources, avoid OOMs)
✅ Switch to New Engine for OLTP queries
✅ Support NOT NULL for PK (primary key) table columns
Aggregate and predicate pushdown to column-oriented tables
Optimize data formats for data transfer between query phases
Index Rename/Rebuild
KQP Session Actor as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
PostgreSQL compatibility
- Support PostgreSQL datatypes serialization/deserialization in YDB Public API
- PostgreSQL compatible query execution (TPC-C, TPC-H queries should work)
- Support for PostgreSQL wire protocol
Support a single Database connection string instead of multiple parameters
Support constraints in query optimizer
Query Processor 3.0 (a set of tasks to bring query execution functionality closer to that of a traditional database)
- Support for Streaming Lookup Join via MVCC snapshots (avoids distributed transactions and scales better)
- Universal API call for DML and DDL with unlimited result size (aka StreamExecuteQuery, which allows executing any query)
- Support for secondary indexes in ScanQuery
- Transaction can see its own updates (updates made during transaction execution are no longer buffered in RAM; they are written to disk and can be read by the same transaction)
Computation graphs caching (compute/datashard programs) (optimize CPU usage)
RPC Deadline & Cancellation propagation (smooth timeout management)
DDL for column-oriented tables

Database Core (Tablets, etc)

✅ Get YDB topics (aka pers queue, streams) ready for production
✅ Turn on MVCC support by default
✅ Enable Snapshot read mode by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
✅ Change Data Capture (be able to get change feed of table updates)
Async Replication between YDB databases
✅ Background compaction for DataShards
✅ Compressed backups (add the ability to compress backup data)
Extending State Storage without cluster downtime (if a cluster grows from, say, 9 nodes to 900 while the State Storage configuration stays the same at 9 nodes, State Storage becomes a performance bottleneck)
Split/Merge DataShards by load by default (most users need this feature turned on)
Support PostgreSQL datatypes in tablet local database
Basic histogram for DataShards (first step towards cost based optimizations)
Transaction can see its own updates (updates made during transaction execution are no longer buffered in RAM; they are written to disk and can be read by the same transaction)
Data ingestion from topic to table (built-in support for ingesting data into YDB tables from topics)
Support snapshot read over read replicas (consistent reads against read replicas)
Transactions between topics and tables

Hardcore

Datashard iterator reads via MVCC
Switch to TRope (or don't use TString/std::string directly, provide zero-copy data passing between components)
Avoid Node Broker as an SPF (single point of failure; NBS must work without Node Broker under emergency conditions)
Subscriptions in SchemeBoard (optimize interaction with SchemeBoard via subscription to updates)

BlobStorage

"One leg" storage migration without downtime (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
ActorSystem 1.5 (dynamically reassign threads in different thread pools)
Publish a utility for BlobStorage management (it's called ds_tool for now; improve it and open it)
Self-heal for degraded BlobStorage groups (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
BlobDepot (a component for smooth blobs management between groups)
Avoid BSC (BlobStorage Controller) as an SPF (be able to run the cluster without BSC in emergency cases)
BSC manages static group (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller, as for any other group)
(Semi-)Hard disk space separation (better guarantees for disk space usage by VDisks on a single PDisk)
Reduce space amplification (optimize storage layer)
Storage nodes decommission (add ability to remove storage nodes)

Analytical Capabilities

Log Store (log-friendly column-oriented storage that allows creating 1+ million tables for storing logs)
Column-oriented Tables (introduce column-oriented tables in addition to row-oriented tables)
Tiered Storage for Column-oriented Tables (with the ability to store the data in S3)

Federated Query

Run the first version

Embedded UI

Support for all schema entities
- YDB Topics (add support for viewing metadata of YDB topics, their data, lag, etc.)
- CDC Streams
- Secondary Indexes
- Read Replicas
- Column-oriented Tables
Basic charts for database monitoring

Command Line Utility

Use a single ydb yql command instead of ydb table query or ydb scripting
Interactive CLI

Tests and Benchmarks

Built-in YCSB-style load test for DataShards
ydb workload for topics
Jepsen tests support

Experiments

Try RTMR-tablet for key-value workload
