YDB Roadmap

Legend

We use the following symbols as abbreviations:

  1. ㉓ - feature appeared in the Roadmap for 2023;
  2. ㉔ - feature appeared in the Roadmap for 2024;
  3. ✅ - feature has been released;
  4. 🚧 - feature is partially available and is under development;
  5. ❌ - feature has been refused;
  6. 🔥 - not yet released, but we are in a rush.

Query Processor

  1. Unique secondary indexes
  2. ㉔ Apply indexes automatically to optimize data fetching
  3. Default values for table columns
  4. Asynchronous LLVM JIT query compilation
  5. Make parameters in the DECLARE clause optional (better SQL compatibility)
  6. Cost-based optimizer for join order selection
  7. INSERT INTO table FROM SELECT for large datasets
  8. ㉔ Support for transactional writes into both row and column tables
  9. ㉔ Support for computed columns in a table
  10. ㉔ Support for temporary tables
  11. ㉔ Support for VIEW SQL clause
  12. Data spilling when there is an insufficient amount of RAM
  13. TPC-H, TPC-H for 100TB dataset
  14. ✅ ㉓ Support for Snapshot Readonly transactions mode
  15. 🚧 ㉓ Better resource management for KQP Resource Manager (share information about node resources, avoid OOMs)
  16. ✅ ㉓ Switch to New Engine for OLTP queries
  17. ✅ ㉓ Support NOT NULL for PK (primary key) table columns
  18. ✅ ㉓ Aggregates and predicates push down to column-oriented tables
  19. ✅ ㉓ Optimize data formats for data transfer between query phases
  20. ✅ ㉓ Index Rename/Rebuild
  21. ✅ ㉓ KQP Session Actor as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
  22. PostgreSQL compatibility
    • ✅ ㉓ Support PostgreSQL datatypes serialization/deserialization in YDB Public API
    • 🚧 ㉓ PostgreSQL compatible query execution (TPC-C, TPC-H queries should work)
    • ✅ ㉓ Support for PostgreSQL wire protocol
  23. ㉓ Support a single Database connection string instead of multiple parameters
  24. ㉓ Support constraints in query optimizer
  25. Query Processor 3.0 (a set of tasks to bring query execution functionality closer to that of a traditional database)
    • ㉓ Support for Streaming Lookup Join via MVCC snapshots (avoid distributed transactions, scalability is better)
    • Universal API call for DML, DDL with unlimited results size for OLTP/OLAP workload (aka ExecuteQuery)
    • ✅ ㉓ Support for secondary indexes in ScanQuery
    • ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
  26. ✅ ㉓ Computation graphs caching (compute/datashard programs) (optimize CPU usage)
  27. 🚧 ㉓ RPC Deadline & Cancellation propagation (smooth timeout management)
  28. ✅ ㉓ DDL for column-oriented tables

Database Core (Tablets, etc)

  1. Volatile transactions. YDB Distributed transactions 2.0, minimize network round trips in the happy path
  2. Table statistics for cost-based optimizer
  3. Memory optimization for row tables (avoid full SST index loading, dynamic cache adjusting)
  4. ㉔ Reduce minimum requirements for the number of cores to 2 for YDB node
  5. Incremental backup and Point-in-time recovery
  6. ALTER CHANGEFEED
  7. Async Replication between YDB databases (column tables, topics)
  8. Async Replication between YDB databases (schema changes)
  9. ㉔ Support for Debezium format
  10. Topics autoscaling (increase/decrease number of partitions in the topic automatically)
  11. Extended Kafka API protocol support for YDB Topics (balance reads, support for v19)
  12. Schema for YDB Topics
  13. Message-level parallelism in YDB Topics
  14. ✅ ㉓ Get YDB topics (aka pers queue, streams) ready for production
  15. ✅ ㉓ Turn on MVCC support by default
  16. ✅ ㉓ Enable Snapshot read mode by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
  17. ✅ ㉓ Change Data Capture (be able to get change feed of table updates)
  18. 🔥 ㉓ Async Replication between YDB databases (first version, row tables, w/o schema changes)
  19. ✅ ㉓ Background compaction for DataShards
  20. ✅ ㉓ Compressed Backups. Add functionality to compress backup data
  21. ㉓ Extending State Storage without cluster downtime. If a cluster grows from, say, 9 nodes to 900, the State Storage configuration stays the same (9 nodes), which leads to a performance bottleneck.
  22. Split/Merge DataShards BY LOAD by default. Most users require this feature turned on by default
  23. ✅ ㉓ Support PostgreSQL datatypes in tablet local database
  24. Basic histogram for DataShards (first step towards cost-based optimizations)
  25. ✅ ㉓ Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
  26. Data ingestion from topic to table (built-in support for ingesting data from topics into YDB tables)
  27. ㉓ Support snapshot read over read replicas (consistent reads against read replicas)
  28. ㉓ 🚧 Transactions between topics and tables
  29. ✅ ㉓ Support for a Kafka API-compatible protocol for YDB Topics

Hardcore or system wide

  1. Tracing capabilities
  2. ㉔ Automatically balance tablet channels via BlobStorage groups
  3. ✅ ㉓ Datashard iterator reads via MVCC
  4. ❌ Switch to TRope (or don't use TString/std::string directly, provide zero-copy data passing between components)
  5. Avoid Node Broker as SPOF (NBS must work without Node Broker under emergency conditions)
  6. Subscriptions in SchemeBoard (optimize interaction with SchemeBoard via subscription to updates)

Security

  1. ✅ ㉓ Basic LDAP Support
  2. ㉔ Support for OpenID Connect
  3. ㉔ Authentication via Keycloak
  4. ㉔ Support for SASL framework

BlobStorage

  1. ㉔ BlobStorage latency optimization (p999), less CPU consumption
  2. ActorSystem performance optimizations
  3. ㉔ Optimize ActorSystem for ARM processors
  4. Effortless initial cluster deployment (provide only nodes and disks description)
  5. "One leg" storage migration without downtime (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
  6. ✅ ㉓ ActorSystem 1.5 (dynamically reassign threads in different thread pools)
  7. ✅ ㉓ Publish a utility for BlobStorage management (it's called ds_tool for now; improve it and open it)
  8. Self-heal for degraded BlobStorage groups (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
  9. BlobDepot (a component for smooth blobs management between groups)
  10. Avoid BSC (BlobStorage Controller) as SPOF (be able to run the cluster without BSC in emergency cases)
  11. BSC manages static group (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller, as for any other group)
  12. (Semi-)Hard disk space separation (Better guarantees for disk space usage by VDisks on a single PDisk)
  13. Reduce space amplification (Optimize storage layer)
  14. Storage nodes decommission (Add ability to remove storage nodes)

Analytical Capabilities

  1. Backup for column tables
  2. ㉔ Column tables autosharding
  3. ㉓ 🚧 Log Store (log-friendly column-oriented storage that allows creating 1+ million tables for storing logs)
  4. ㉓ 🚧 Column-oriented Tables (introduce column-oriented tables in addition to row-oriented tables)
  5. Tiered Storage for Column-oriented Tables (with the ability to store the data in S3)

Federated Query

  1. ✅ ㉓ Run the first version

Embedded UI

  1. Support for all schema entities
    • YDB Topics (add support for viewing metadata of YDB topics, their data, lag, etc.)
    • CDC Streams
    • Secondary Indexes
    • Read Replicas
    • ✅ ㉓ Column-oriented Tables
  2. Basic charts for database monitoring

Command Line Utility

  1. 🚧 ㉓ Use a single ydb yql instead of ydb table query or ydb scripting
  2. ✅ ㉓ Interactive CLI

Tests and Benchmarks

  1. Built-in load test for DataShards in YCSB manner
  2. ✅ ㉓ ydb workload for topics
  3. Jepsen tests support

Experiments

  1. ❌ Try RTMR-tablet for key-value workload