SLO workload

SLO is a type of test in which an application based on ydb-sdk is tested against failing YDB cluster nodes, tablets, and network (situations that can occur in distributed databases with hundreds of nodes).

Usage:

It has 3 commands:

create - creates a table in the database
cleanup - drops the table from the database
run - runs the workload (reads from and writes to the table at the configured RPS)

Run examples with all arguments:

create:

$APP create grpcs://ydb.cool.example.com:2135 /some/folder -t tableName -min-partitions-count 6 -max-partitions-count 1000 -partition-size 1 -c 1000 -write-timeout 10000

cleanup:

$APP cleanup grpcs://ydb.cool.example.com:2135 /some/folder -t tableName

run:

$APP run grpcs://ydb.cool.example.com:2135 /some/folder -t tableName -prom-pgw http://prometheus-pushgateway:9091 -report-period 250 -read-rps 1000 -read-timeout 10000 -write-rps 100 -write-timeout 10000 -time 600 -shutdown-time 30
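The three commands are usually chained into one test cycle. The following is a minimal sketch of such a cycle, assuming APP points at the workload binary and that the table should be dropped even when the run step fails. The function name slo_cycle is hypothetical, and the flag values are taken from the examples above.

```shell
# Hypothetical helper: runs create -> run -> cleanup against one table.
# APP must point at the workload binary.
# Arguments: endpoint, database, table name.
slo_cycle() {
    endpoint=$1
    db=$2
    table=$3

    # Create the table and fill it with the initial rows.
    "$APP" create "$endpoint" "$db" -t "$table" -c 1000 || return 1

    # Run the workload; keep the exit code so cleanup still happens.
    "$APP" run "$endpoint" "$db" -t "$table" \
        -read-rps 1000 -write-rps 100 -time 600 -shutdown-time 30
    rc=$?

    # Always drop the table afterwards.
    "$APP" cleanup "$endpoint" "$db" -t "$table"
    return "$rc"
}
```

With APP set, a call such as slo_cycle grpcs://ydb.cool.example.com:2135 /some/folder tableName would run the whole cycle and return the exit code of the run step.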

Arguments for commands:

create

$APP create <endpoint> <db> [options]

Arguments:
  endpoint                        YDB endpoint to connect to
  db                              YDB database to connect to

Options:
  -t -table-name         <string> table name to create

  -min-partitions-count  <int>    minimum number of partitions in the table
  -max-partitions-count  <int>    maximum number of partitions in the table
  -partition-size        <int>    partition size in MB

  -c -initial-data-count <int>    number of rows created initially

  -write-timeout         <int>    write timeout in milliseconds

cleanup

$APP cleanup <endpoint> <db> [options]

Arguments:
  endpoint                        YDB endpoint to connect to
  db                              YDB database to connect to

Options:
  -t -table-name         <string> table name to drop

  -write-timeout         <int>    write timeout in milliseconds

run

$APP run <endpoint> <db> [options]

Arguments:
  endpoint                        YDB endpoint to connect to
  db                              YDB database to connect to

Options:
  -t -table-name         <string> table name to use

  -initial-data-count    <int>    number of rows created initially

  -prom-pgw              <string> Prometheus Pushgateway address
  -report-period         <int>    Prometheus push period in milliseconds

  -read-rps              <int>    read RPS
  -read-timeout          <int>    read timeout in milliseconds

  -write-rps             <int>    write RPS
  -write-timeout         <int>    write timeout in milliseconds

  -time                  <int>    run time in seconds
  -shutdown-time         <int>    graceful shutdown time in seconds

Authentication

The workload uses anonymous credentials.

What's inside

When the run command is executed, the program creates three jobs: readJob, writeJob, and metricsJob.

readJob reads rows from the table one at a time, picking random identifiers from those generated by writeJob
writeJob generates and inserts rows
metricsJob periodically sends metrics to Prometheus
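As a rough sanity check on the run options, the expected request volume can be estimated from the configured rates. This sketch assumes that the jobs sustain their target RPS for the whole run and that metricsJob pushes once per report period. The function name is hypothetical.

```shell
# Hypothetical helper: prints the expected number of reads, writes and metric pushes.
# Arguments: read RPS, write RPS, run time in seconds, report period in milliseconds.
slo_expected_volume() {
    read_rps=$1
    write_rps=$2
    time_s=$3
    report_ms=$4

    echo "reads=$((read_rps * time_s))"
    echo "writes=$((write_rps * time_s))"
    echo "pushes=$((time_s * 1000 / report_ms))"
}
```

For the run example above, slo_expected_volume 1000 100 600 250 prints reads=600000, writes=60000 and pushes=2400.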

The table has these fields:

hash Uint64 Digest::NumericHash(id)
id Uint64
payload_double Double
payload_hash Uint64
payload_str UTF8
payload_timestamp Timestamp

Primary key: ("hash", "id")
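The exact statement issued by the create command is not shown here. A roughly equivalent YQL definition might look like the sketch below, assuming the partitioning options map to YDB's auto-partitioning table settings. The helper name slo_table_ddl is hypothetical, and the hash column is expected to be filled by the application with Digest::NumericHash(id).

```shell
# Hypothetical helper: prints a YQL statement for a table with the fields above.
# Arguments: table name, min partitions, max partitions, partition size in MB.
slo_table_ddl() {
    cat <<EOF
CREATE TABLE \`$1\` (
    hash Uint64,
    id Uint64,
    payload_double Double,
    payload_hash Uint64,
    payload_str Utf8,
    payload_timestamp Timestamp,
    PRIMARY KEY (hash, id)
) WITH (
    AUTO_PARTITIONING_BY_SIZE = ENABLED,
    AUTO_PARTITIONING_PARTITION_SIZE_MB = $4,
    AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = $2,
    AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = $3
);
EOF
}
```

Calling slo_table_ddl tableName 6 1000 1 prints a statement that matches the values in the create example.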

Collected metrics

oks - number of OK requests
not_oks - number of not OK requests
inflight - number of requests in flight
latency - summary of latencies in ms
attempts - summary of the number of attempts per request
error - number of errors
query_latency - summary of query latencies in ms

You must reset the metrics before the jobs start and after they finish, so that they stay at 0 in Prometheus and Grafana.

In PHP it looks like this:

$pushGateway->delete('workload-php', [
    'sdk' => 'php',
    'sdkVersion' => Ydb::VERSION
]);
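The same reset can be done without the PHP client, because the Pushgateway exposes group deletion over HTTP as DELETE /metrics/job/<job>/<label>/<value>. The following is a minimal sketch, assuming curl is available and that label values contain no slashes. The function names are hypothetical.

```shell
# Builds the Pushgateway grouping-key URL.
# Arguments: base URL, job name, then any number of label/value pairs.
pgw_group_url() {
    url="${1%/}/metrics/job/$2"
    shift 2
    while [ "$#" -ge 2 ]; do
        url="$url/$1/$2"
        shift 2
    done
    printf '%s\n' "$url"
}

# Deletes all metrics pushed under that grouping key.
pgw_reset() {
    curl -fsS -X DELETE "$(pgw_group_url "$@")"
}
```

For example, pgw_reset http://prometheus-pushgateway:9091 workload-php sdk php sdkVersion "$SDK_VERSION" would clear the group used in the PHP snippet, where SDK_VERSION is a placeholder for the value of Ydb::VERSION.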

Look at metrics in Grafana

You can get the dashboard used in this test here; you will need to import the JSON into Grafana.
