YDB Roadmap

Legend

We use the following symbols as abbreviations:

㉓ - feature appeared in the Roadmap for 2023;
㉔ - feature appeared in the Roadmap for 2024;
✅ - feature has been released;
🚧 - feature is partially available and is under development;
❌ - feature has been refused;
🔥 - not yet released, but we are in a rush.

Query Processor

㉔ Unique secondary indexes
㉔ Apply indexes automatically to optimize data fetching
㉔ Default values for table columns
㉔ Asynchronous LLVM JIT query compilation
㉔ Parameters in the DECLARE clause become optional (better SQL compatibility)
㉔ Cost-based optimizer for join order selection
㉔ INSERT INTO table FROM SELECT for large datasets
㉔ Support for transactional writes into both row and column tables
㉔ Support for computed columns in a table
㉔ Support for temporary tables
㉔ Support for VIEW SQL clause
㉔ Data spilling when there is an insufficient amount of RAM
㉔ TPC-H, TPC-DS for 100TB dataset
✅ ㉓ Support for Snapshot Readonly transactions mode
🚧 ㉓ Better resource management for KQP Resource Manager (share information about node resources, avoid OOMs)
✅ ㉓ Switch to New Engine for OLTP queries
✅ ㉓ Support NOT NULL for PK (primary key) table columns
✅ ㉓ Aggregates and predicates push down to column-oriented tables
✅ ㉓ Optimize data formats for data transfer between query phases
✅ ㉓ Index Rename/Rebuild
✅ ㉓ KQP Session Actor as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
PostgreSQL compatibility
- ✅ ㉓ Support PostgreSQL datatypes serialization/deserialization in YDB Public API
- 🚧 ㉓ PostgreSQL compatible query execution (TPC-C, TPC-H queries should work)
- ✅ ㉓ Support for PostgreSQL wire protocol
㉓ Support a single Database connection string instead of multiple parameters
㉓ Support constraints in query optimizer
Query Processor 3.0 (a set of tasks to make YDB more like a traditional database in terms of query execution functionality)
- ㉓ Support for Streaming Lookup Join via MVCC snapshots (avoid distributed transactions, scalability is better)
- ㉓ Universal API call for DML, DDL with unlimited results size for OLTP/OLAP workload (aka ExecuteQuery)
- ✅ ㉓ Support for secondary indexes in ScanQuery
- ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
✅ ㉓ Computation graphs caching (compute/datashard programs) (optimize CPU usage)
🚧 ㉓ RPC Deadline & Cancellation propagation (smooth timeout management)
✅ ㉓ DDL for column-oriented tables

Database Core (Tablets, etc)

㉔ Volatile transactions. YDB Distributed transactions 2.0, minimize network round trips in happy path
㉔ Table statistics for cost-based optimizer
㉔ Memory optimization for row tables (avoid full SST index loading, dynamic cache adjusting)
㉔ Reduce minimum requirements for the number of cores to 2 for YDB node
㉔ Incremental backup and Point-in-time recovery
㉔ ALTER CHANGEFEED
㉔ Async Replication between YDB databases (column tables, topics)
㉔ Async Replication between YDB databases (schema changes)
㉔ Support for Debezium format
㉔ Topics autoscaling (increase/decrease number of partitions in the topic automatically)
㉔ Extended support for the Kafka API protocol in YDB Topics (balanced reads, support for v19)
㉔ Schema for YDB Topics
㉔ Message-level parallelism in YDB Topics
✅ ㉓ Get YDB topics (aka pers queue, streams) ready for production
✅ ㉓ Turn on MVCC support by default
✅ ㉓ Enable Snapshot read mode by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
✅ ㉓ Change Data Capture (be able to get change feed of table updates)
🔥 ㉓ Async Replication between YDB databases (first version, row tables, w/o schema changes)
✅ ㉓ Background compaction for DataShards
✅ ㉓ Compressed Backups. Add functionality to compress backup data
㉓ Process of extending State Storage without cluster downtime. If a cluster grows from, say, 9 nodes to 900 while the State Storage configuration stays the same (9 nodes), State Storage becomes a performance bottleneck.
Split/Merge DataShards BY LOAD by default. Most users require this feature turned on by default
✅ ㉓ Support PostgreSQL datatypes in tablet local database
Basic histogram for DataShards (first step towards cost based optimizations)
✅ ㉓ Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
㉓ Data Ingestion from topic to table (implement built-in compatibility to ingest data to YDB tables from topics)
㉓ Support snapshot read over read replicas (consistent reads against read replicas)
㉓ 🚧 Transactions between topics and tables
✅ ㉓ Support for Kafka API compatible protocol to YDB Topics

Hardcore or system wide

㉔ Tracing capabilities
㉔ Automatically balance tablet channels via BlobStorage groups
✅ ㉓ Datashard iterator reads via MVCC
❌ (refused) ㉓ Switch to TRope (or don't use TString/std::string directly, provide zero-copy data passing between components)
㉓ Avoid Node Broker as a SPOF, a single point of failure (NBS must work without Node Broker under emergency conditions)
㉓ Subscriptions in SchemeBoard (optimize interaction with SchemeBoard via subscription to updates)

Security

✅ ㉓ Basic LDAP Support
㉔ Support for OpenID Connect
㉔ Authentication via Keycloak
㉔ Support for SASL framework

BlobStorage

㉔ BlobStorage latency optimization (p999), less CPU consumption
㉔ ActorSystem performance optimizations
㉔ Optimize ActorSystem for ARM processors
㉔ Effortless initial cluster deployment (provide only nodes and disks description)
㉓ "One leg" storage migration without downtime (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
✅ ㉓ ActorSystem 1.5 (dynamically reassign threads in different thread pools)
✅ ㉓ Publish a utility for BlobStorage management (it's called ds_tool for now; improve it and open it)
㉓ Self-heal for degraded BlobStorage groups (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
㉓ BlobDepot (a component for smooth blobs management between groups)
㉓ Avoid BSC (BlobStorage Controller) as a SPOF, a single point of failure (be able to run the cluster without BSC in emergency cases)
㉓ BSC manages static group (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller, as for any other group)
㉓ (Semi-)Hard disk space separation (Better guarantees for disk space usage by VDisks on a single PDisk)
㉓ Reduce space amplification (Optimize storage layer)
㉓ Storage nodes decommission (Add ability to remove storage nodes)

Analytical Capabilities

㉔ Backup for column tables
㉔ Column tables autosharding
㉓ 🚧 Log Store (log-friendly column-oriented storage that allows creating 1+ million tables for storing logs)
㉓ 🚧 Column-oriented Tables (introduce Column-oriented tables in addition to Row-oriented tables)
㉓ Tiered Storage for Column-oriented Tables (with the ability to store the data in S3)

Federated Query

✅ ㉓ Run the first version

Embedded UI

Support for all schema entities
- ㉓ YDB Topics (add support for viewing metadata of YDB topics, their data, lag, etc.)
- ㉓ CDC Streams
- ㉓ Secondary Indexes
- ㉓ Read Replicas
- ✅ ㉓ Column-oriented Tables
㉓ Basic charts for database monitoring

Command Line Utility

🚧 ㉓ Use a single ydb yql instead of ydb table query or ydb scripting
✅ ㉓ Interactive CLI

Tests and Benchmarks

㉓ Built-in load test for DataShards in YCSB manner
✅ ㉓ ydb workload for topics
Jepsen tests support

Experiments

❌ (refused) Try RTMR-tablet for key-value workload
