SLO is a type of test in which an application based on ydb-sdk is tested against failing YDB cluster nodes, tablets, and network (situations that are possible for distributed databases with hundreds of nodes).
It has 3 commands:
- create - creates a table in the database
- cleanup - drops the table in the database
- run - runs the workload (reads from and writes to the table at the set RPS)

create:
$APP create grpcs://ydb.cool.example.com:2135 /some/folder -t tableName \
  -min-partitions-count 6 -max-partitions-count 1000 -partition-size 1 -c 1000 \
  -write-timeout 10000
cleanup:
$APP cleanup grpcs://ydb.cool.example.com:2135 /some/folder -t tableName
run:
$APP run grpcs://ydb.cool.example.com:2135 /some/folder -t tableName \
  -prom-pgw http://prometheus-pushgateway:9091 -report-period 250 \
  -read-rps 1000 -read-timeout 10000 \
  -write-rps 100 -write-timeout 10000 \
  -time 600 -shutdown-time 30
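The three example commands above can be chained into one test cycle. The following is a minimal sketch, assuming `$APP` points to the workload binary; the function name `slo_cycle` and the decision to run cleanup even when the workload fails are illustrative assumptions, not part of the SLO tooling.

```shell
#!/bin/sh
# Hypothetical helper: one SLO cycle (create -> run -> cleanup).
# APP must point to the workload binary; the flag values are taken from the examples above.
slo_cycle() {
    endpoint=$1
    db=$2
    table=$3

    # Create the table and fill it with initial rows; stop if this fails.
    $APP create "$endpoint" "$db" -t "$table" \
        -min-partitions-count 6 -max-partitions-count 1000 -partition-size 1 -c 1000 \
        -write-timeout 10000 || return 1

    # Run the workload and remember its exit status.
    $APP run "$endpoint" "$db" -t "$table" \
        -prom-pgw http://prometheus-pushgateway:9091 -report-period 250 \
        -read-rps 1000 -read-timeout 10000 \
        -write-rps 100 -write-timeout 10000 \
        -time 600 -shutdown-time 30
    status=$?

    # Always drop the table, even if the workload failed.
    $APP cleanup "$endpoint" "$db" -t "$table"

    return "$status"
}
```

With `APP` set, a cycle against the example cluster would be started as `slo_cycle grpcs://ydb.cool.example.com:2135 /some/folder tableName`. The function returns the exit status of the run step, so a failed workload is still visible to the caller after the table has been dropped.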
$APP create <endpoint> <db> [options]

Arguments:
  endpoint    YDB endpoint to connect to
  db          YDB database to connect to

Options:
  -t -table-name <string>          table name to create
  -min-partitions-count <int>      minimum number of partitions in the table
  -max-partitions-count <int>      maximum number of partitions in the table
  -partition-size <int>            partition size in MB
  -c -initial-data-count <int>     number of initially created rows
  -write-timeout <int>             write timeout in milliseconds

$APP cleanup <endpoint> <db> [options]

Arguments:
  endpoint    YDB endpoint to connect to
  db          YDB database to connect to

Options:
  -t -table-name <string>          table name to drop
  -write-timeout <int>             write timeout in milliseconds

$APP run <endpoint> <db> [options]

Arguments:
  endpoint    YDB endpoint to connect to
  db          YDB database to connect to

Options:
  -t -table-name <string>          table name to use
  -initial-data-count <int>        number of initially created rows
  -prom-pgw <string>               Prometheus push gateway
  -report-period <int>             Prometheus push period in milliseconds
  -read-rps <int>                  read RPS
  -read-timeout <int>              read timeout in milliseconds
  -write-rps <int>                 write RPS
  -write-timeout <int>             write timeout in milliseconds
  -time <int>                      run time in seconds
  -shutdown-time <int>             graceful shutdown time in seconds

The workload uses anonymous credentials.
When the run command is executed, the program creates three jobs: readJob, writeJob, and metricsJob.
- readJob - reads rows from the table one by one, using random identifiers generated by writeJob
- writeJob - generates and inserts rows
- metricsJob - periodically sends metrics to Prometheus

The table has these fields:
hash Uint64 Digest::NumericHash(id)
id Uint64
payload_double Double
payload_hash Uint64
payload_str UTF8
payload_timestamp Timestamp
Primary key: ("hash", "id")
Collected metrics:
- oks - number of OK requests
- not_oks - number of not OK requests
- inflight - number of requests in flight
- latency - summary of latencies in ms
- attempts - summary of the number of attempts per request
- error - number of errors
- query_latency - summary of query latencies in ms

You must reset the metrics before the jobs start and after they finish so that they stay at 0 in Prometheus and Grafana.
In PHP it looks like this:
// Delete the metrics pushed for the 'workload-php' job with these grouping labels.
$pushGateway->delete('workload-php', [
    'sdk' => 'php',
    'sdkVersion' => Ydb::VERSION
]);
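The same reset can also be done without the PHP client through the Pushgateway HTTP API, where a DELETE request to `/metrics/job/<job>/<label>/<value>` removes the metric group with that grouping key. The following is a minimal sketch, assuming `curl` is available and that the label values contain no `/` characters; the function names `pgw_group_url` and `pgw_reset` are hypothetical, and the SDK version has to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helpers for resetting metrics through the Pushgateway HTTP API.

# Build the URL of the metric group:
#   <base>/metrics/job/<job>/sdk/<sdk>/sdkVersion/<version>
# Assumption: label values contain no '/' (otherwise they would need special encoding).
pgw_group_url() {
    base=$1
    job=$2
    sdk=$3
    version=$4
    # ${base%/} strips a trailing slash so the path is not doubled.
    printf '%s/metrics/job/%s/sdk/%s/sdkVersion/%s\n' "${base%/}" "$job" "$sdk" "$version"
}

# Delete the group so stale values do not linger in Prometheus and Grafana.
pgw_reset() {
    curl -fsS -X DELETE "$(pgw_group_url "$@")"
}
```

For the push gateway from the run example, the call would look like `pgw_reset http://prometheus-pushgateway:9091 workload-php php "$SDK_VERSION"`, where `SDK_VERSION` is a placeholder for the value of `Ydb::VERSION`.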
You can get the dashboard used in this test here; you will need to import the JSON into Grafana.