OPTIMISATION.txt 8.0 KB

  1. This document describes configuration options which may help one to generate onions faster.
  2. First of all, default configuration options are tuned for portability, and may be a bit suboptimal.
  3. The user is expected to pick optimal settings depending on the hardware mkp224o will run on and the number of filters.
  4. ED25519 implementations:
  5. mkp224o includes multiple implementations of ed25519 code, tuned for different processors.
  6. The implementation is selected at configuration time, when running the `./configure` script.
  7. If you have already configured or compiled the code and want to change options, just re-run
  8. `./configure` and then run `make clean` to remove any previously compiled files.
  9. Note that options and CFLAGS/LDFLAGS settings do not carry over from a previous configure run,
  10. so you have to pass the previously used options again if you want them to remain.
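
Because options are not remembered between runs, it can help to keep them in one shell variable and always rebuild through a small wrapper. This is only a sketch: `CONFIGURE_OPTS`, `RUN` and `rebuild` are names made up for this example and are not part of mkp224o, and the option values are simply the ones used elsewhere in this document.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with mkp224o.
# All configure options live in one variable, so a re-run cannot drop one.
CONFIGURE_OPTS="--enable-amd64-51-30k --enable-intfilter"

rebuild() {
    # RUN is normally empty; with RUN=echo the commands are only printed.
    $RUN ./configure $CONFIGURE_OPTS &&
    $RUN make clean &&
    $RUN make
}
```

Calling `rebuild` inside the source directory runs the three steps; `RUN=echo rebuild` only prints them, which is a cheap way to check what would be executed.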
  11. At the time of writing, these implementations are present:
  12. +----------------+-----------------------+----------------------------------------------------------+
  13. | implementation | enable flag           | notes                                                    |
  14. |----------------+-----------------------+----------------------------------------------------------|
  15. | ref10          | --enable-ref10        | SUPERCOP' ref10, pure C, very portable, previous default |
  16. | amd64-51-30k   | --enable-amd64-51-30k | SUPERCOP' amd64-51-30k, only works on x86_64             |
  17. | amd64-64-24k   | --enable-amd64-64-24k | SUPERCOP' amd64-64-24k, only works on x86_64             |
  18. | ed25519-donna  | --enable-donna        | based on amd64-51-30k, C, portable, current default      |
  19. | ed25519-donna  | --enable-donna-sse2   | uses SSE2, needs x86 architecture                        |
  20. +----------------+-----------------------+----------------------------------------------------------+
  21. When to use what:
  22. - on 32-bit x86 architecture "--enable-donna" will probably be fastest, but one should try
  23. using "--enable-donna-sse2" too
  24. - on 64-bit x86 architecture, it really depends on your processor; "--enable-amd64-51-30k"
  25. worked best for me, but you should really benchmark on your own machine
  26. - on ARM "--enable-donna" will probably work best
  27. - otherwise you should benchmark, but "--enable-donna" will probably win
  28. Please note that these recommendations may become out of date if more implementations
  29. are added in the future; use `./configure --help` to obtain all available options.
  30. When in doubt, benchmark.
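
The list above can be turned into a first guess based on the machine type. The sketch below is an illustration only: `suggest_impl` is a made-up name, the `uname -m` patterns are assumptions about commonly reported machine names, and the result is just a starting point for benchmarking.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# starting point suggested in the list above. It is only a first guess.
suggest_impl() {
    case "$1" in
        x86_64|amd64)   echo "--enable-amd64-51-30k" ;;  # worked best for the author
        i?86)           echo "--enable-donna" ;;         # also try --enable-donna-sse2
        arm*|aarch64)   echo "--enable-donna" ;;
        *)              echo "--enable-donna" ;;         # fallback guess
    esac
}

# Print the suggestion for the current machine.
suggest_impl "$(uname -m)"
```

The printed flag can be passed to `./configure` and then compared against the other implementations on the same machine.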
  31. Onion filtering settings:
  32. mkp224o supports multiple algorithms and data types for filtering.
  33. Depending on your use case, picking the right settings may increase performance.
  34. At the time of writing, mkp224o supports 2 algorithms for filter searching:
  35. sequential and binary search. Sequential search is the default, and will probably
  36. be faster with a small number of filters. If you have lots of filters (let's say >100),
  37. then picking the binary search algorithm is the right way.
  38. mkp224o also supports multiple filter types: filters can be represented as integers
  39. instead of binary strings, which can allow better compiler optimizations
  40. and faster code (dealing with fixed-size integers instead of variable-length strings is simpler).
  41. On the other hand, fixed-size integers limit the length of filters, therefore
  42. binary strings are used by default.
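
As a rough illustration of that rule of thumb, the sketch below counts the filters in a file and suggests binary search only above the threshold. `suggest_search` is a hypothetical name, the file is assumed to hold one filter per line, and 100 is just the ballpark figure from the paragraph above, not a measured cut-off.

```shell
#!/bin/sh
# Hypothetical helper: count the non-empty lines of a filter file and print
# --enable-binsearch when there are more than 100 of them.
# Assumption: the file holds one filter per line.
suggest_search() {
    count=$(grep -c . "$1")
    if [ "$count" -gt 100 ]; then
        echo "--enable-binsearch"
    fi
}
```

For a small filter file the function prints nothing, which corresponds to keeping the default sequential search.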
  43. Current options, at the time of writing:
  44. --enable-binsearch      enable binary search algorithm; MUCH faster if there
  45.                         are a lot of filters. by default, if this isn't enabled,
  46.                         sequential search is used
  47. --enable-intfilter[=(32|64|128|native)]
  48.                         use integers of specific size (in bits) [default=64]
  49.                         for filtering. faster but limits filter length to:
  50.                         6 for 32-bit, 12 for 64-bit, 24 for 128-bit. by default,
  51.                         if this option is not enabled, binary strings are used,
  52.                         which are slower, but not limited in length.
  53. --enable-binfilterlen=VAL
  54.                         set binary string filter length (if you don't use intfilter).
  55.                         default is 32 (bytes), which is maximum key length.
  56.                         this may be useful for decreasing memory usage if you
  57.                         have a lot of short filters, but then using intfilter
  58.                         may be a better idea.
  59. --enable-besort         force intfilter binsearch case to use big endian
  60.                         sorting and not omit masks from filters; useful if
  61.                         your filters aren't of same length.
  62. let me elaborate on this one.
  63. by default, when the binary search algorithm is used with integer
  64. filters, we actually omit filter masks and use a global mask variable,
  65. because otherwise we couldn't reliably use integer comparison operations
  66. combined with per-filter masks, as the sorting order there is unclear.
  67. this is because the majority of processors we work with are little-endian.
  68. therefore, to achieve proper filtering in the case where filters
  69. aren't of the same length, we flatten them by inserting more filters.
  70. binary searching should balance the increased overhead here to some extent,
  71. but this is definitely not optimal and can bloat the filtering table
  72. very heavily in some cases (for example, if there is a 1-char filter
  73. and an 8-char filter, it will try to flatten the 1-char filter to 8 chars
  74. and add 32*32*32*32*32*32*32 filters to the table, which isn't really good).
  75. this option makes us use the big-endian way of integer comparison, which isn't
  76. native for current little-endian processors but should still work much better
  77. than binary strings. we are then also able to have proper per-filter masks,
  78. and don't do stupid flattening tricks which may backfire.
  79. TL;DR: it's quite a good idea to use this if you do "--enable-binsearch --enable-intfilter"
  80. and have some random filters which may have different lengths.
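
To get a feeling for how quickly flattening grows, the table blow-up from the example above can be computed directly. This is a sketch under the assumption stated in the text, namely 32 possibilities for every character that has to be filled in; `flatten_count` is a made-up helper name.

```shell
#!/bin/sh
# Number of table entries a short filter turns into when it is flattened to
# the length of a longer one: 32 possibilities per added character.
flatten_count() {
    short=$1
    long=$2
    n=1
    i=$short
    while [ "$i" -lt "$long" ]; do
        n=$((n * 32))
        i=$((i + 1))
    done
    echo "$n"
}

# The 1-char and 8-char case from the text.
flatten_count 1 8
```

For the 1-char and 8-char case this is 32 multiplied by itself 7 times, that is 34359738368 entries, provided the shell does 64-bit arithmetic. Flattening a 7-char filter to 8 chars adds only 32 entries, so the cost depends entirely on how far the filter lengths are apart.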
  81. Batch mode:
  82. mkp224o now includes an experimental key generation mode which performs certain operations in batches,
  83. and is around 15 times faster than the current default.
  84. It is activated by the -B run-time flag.
  85. The batched element count is configured by the --enable-batchnum=number option at configure time;
  86. increasing or decreasing it may make batch mode faster or slower, depending on hardware.
  87. Benchmarking:
  88. It's always a good idea to check whether your settings give you the desired effect.
  89. There currently isn't any automated way to benchmark different configuration options, but it's pretty simple to do by hand.
  90. For example:
  91. # prepare configuration script
  92. ./autogen.sh
  93. # try default configuration
  94. ./configure
  95. # compile
  96. make
  97. # benchmark implementation speed
  98. ./mkp224o -s -d res1 neko
  99. # wait for a while, copy statistics to some text editor
  100. ^C # stop experiment when you've collected enough data
  101. # try with different settings now
  102. ./configure --enable-amd64-64-24k --enable-intfilter
  103. # clean old compiled files
  104. make clean
  105. # recompile
  106. make
  107. # benchmark again
  108. ./mkp224o -s -d res2 neko
  109. # wait for a while, copy statistics to some text editor
  110. ^C # stop experiment when you've collected enough data
  111. # configure again, make clean, make, run test again.......
  112. # until you've got enough data to make decisions
  113. When benchmarking filtering settings, remember to actually use the filter files you're going to work with.
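
The manual steps above repeat for every option set, so they can be wrapped in a small shell function. This is a sketch, not a tool shipped with mkp224o: `bench_one`, `RUN`, `SECONDS_PER_RUN` and the `res-NAME` directory names are invented for this example, and `timeout` is assumed to be available (it is part of GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: configure, rebuild and run one timed benchmark.
# SECONDS_PER_RUN is an arbitrary choice; pick whatever gives stable numbers.
SECONDS_PER_RUN=60

bench_one() {
    # $1 is a label for the result directory, the rest are configure options.
    name=$1
    shift
    # RUN is normally empty; with RUN=echo the commands are only printed.
    $RUN ./configure "$@" &&
    $RUN make clean &&
    $RUN make &&
    $RUN timeout "$SECONDS_PER_RUN" ./mkp224o -s -d "res-$name" neko
}

# Example calls, to be run from the source directory after ./autogen.sh:
#   bench_one default
#   bench_one amd64 --enable-amd64-64-24k --enable-intfilter
```

Each call stops mkp224o after the chosen time instead of waiting for ^C; the printed statistics still have to be compared by hand.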
  114. What options I use:
  115. For my laptop with an old-ish i5 I do `./configure --enable-amd64-51-30k --enable-intfilter` in case I want a single onion,
  116. and `./configure --enable-amd64-51-30k --enable-intfilter --enable-binsearch --enable-besort` when playing with dictionaries.
  117. For my Raspberry Pi 2, `./configure --enable-donna --enable-intfilter`
  118. (plus `--enable-binsearch --enable-besort` for dictionaries).