README 5.9 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147
  1. This directory contains several hash-map implementations, similar in
  2. API to SGI's hash_map class, but with different performance
  3. characteristics. sparse_hash_map uses very little space overhead, 1-2
  4. bits per entry. dense_hash_map is very fast, particulary on lookup.
  5. (sparse_hash_set and dense_hash_set are the set versions of these
  6. routines.) On the other hand, these classes have requirements that
  7. may not make them appropriate for all applications.
  8. All these implementation use a hashtable with internal quadratic
  9. probing. This method is space-efficient -- there is no pointer
  10. overhead -- and time-efficient for good hash functions.
  11. COMPILING
  12. ---------
  13. To compile test applications with these classes, run ./configure
  14. followed by make. To install these header files on your system, run
  15. 'make install'. (On Windows, the instructions are different; see
  16. README_windows.txt.) See INSTALL for more details.
  17. This code should work on any modern C++ system. It has been tested on
  18. Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0,
  19. OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8.
  20. USING
  21. -----
  22. See the html files in the doc directory for small example programs
  23. that use these classes. It's enough to just include the header file:
  24. #include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
  25. google::sparse_hash_set<int, int> number_mapper;
  26. and use the class the way you would other hash-map implementations.
  27. (Though see "API" below for caveats.)
  28. By default (you can change it via a flag to ./configure), these hash
  29. implementations are defined in the google namespace.
  30. API
  31. ---
  32. The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
  33. dense_hash_set, are a superset of the API of SGI's hash_map class.
  34. See doc/sparse_hash_map.html, et al., for more information about the
  35. API.
  36. The usage of these classes differ from SGI's hash_map, and other
  37. hashtable implementations, in the following major ways:
  38. 1) dense_hash_map requires you to set aside one key value as the
  39. 'empty bucket' value, set via the set_empty_key() method. This
  40. *MUST* be called before you can use the dense_hash_map. It is
  41. illegal to insert any elements into a dense_hash_map whose key is
  42. equal to the empty-key.
  43. 2) For both dense_hash_map and sparse_hash_map, if you wish to delete
  44. elements from the hashtable, you must set aside a key value as the
  45. 'deleted bucket' value, set via the set_deleted_key() method. If
  46. your hash-map is insert-only, there is no need to call this
  47. method. If you call set_deleted_key(), it is illegal to insert any
  48. elements into a dense_hash_map or sparse_hash_map whose key is
  49. equal to the deleted-key.
  50. 3) These hash-map implementation support I/O. See below.
  51. There are also some smaller differences:
  52. 1) The constructor takes an optional argument that specifies the
  53. number of elements you expect to insert into the hashtable. This
  54. differs from SGI's hash_map implementation, which takes an optional
  55. number of buckets.
  56. 2) erase() does not immediately reclaim memory. As a consequence,
  57. erase() does not invalidate any iterators, making loops like this
  58. correct:
  59. for (it = ht.begin(); it != ht.end(); ++it)
  60. if (...) ht.erase(it);
  61. As another consequence, a series of erase() calls can leave your
  62. hashtable using more memory than it needs to. The hashtable will
  63. automatically compact at the next call to insert(), but to
  64. manually compact a hashtable, you can call
  65. ht.resize(0)
  66. I/O
  67. ---
  68. In addition to the normal hash-map operations, sparse_hash_map can
  69. read and write hashtables to disk. (dense_hash_map also has the API,
  70. but it has not yet been implemented, and writes will always fail.)
  71. In the simplest case, writing a hashtable is as easy as calling two
  72. methods on the hashtable:
  73. ht.write_metadata(fp);
  74. ht.write_nopointer_data(fp);
  75. Reading in this data is equally simple:
  76. google::sparse_hash_map<...> ht;
  77. ht.read_metadata(fp);
  78. ht.read_nopointer_data(fp);
  79. The above is sufficient if the key and value do not contain any
  80. pointers: they are basic C types or agglomorations of basic C types.
  81. If the key and/or value do contain pointers, you can still store the
  82. hashtable by replacing write_nopointer_data() with a custom writing
  83. routine. See sparse_hash_map.html et al. for more information.
  84. SPARSETABLE
  85. -----------
  86. In addition to the hash-map and hash-set classes, this package also
  87. provides sparsetable.h, an array implementation that uses space
  88. proportional to the number of elements in the array, rather than the
  89. maximum element index. It uses very little space overhead: 2 to 5
  90. bits per entry. See doc/sparsetable.html for the API.
  91. RESOURCE USAGE
  92. --------------
  93. * sparse_hash_map has memory overhead of about 4 to 10 bits per
  94. hash-map entry, assuming a typical average occupancy of 50%.
  95. * dense_hash_map has a factor of 2-3 memory overhead: if your
  96. hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
  97. total.
  98. Hashtables tend to double in size when resizing, creating an
  99. additional 50% space overhead. dense_hash_map does in fact have a
  100. significant "high water mark" memory use requirement, which is 6 times
  101. the size of hash entries in the table when resizing (when reaching
  102. 50% occupancy, the table resizes to double the previous size, and the
  103. old table (2x) is copied to the new table (4x)).
  104. sparse_hash_map, however, is written to need very little space
  105. overhead when resizing: only a few bits per hashtable entry.
  106. PERFORMANCE
  107. -----------
  108. You can compile and run the included file time_hash_map.cc to examine
  109. the performance of sparse_hash_map, dense_hash_map, and your native
  110. hash_map implementation on your system. One test against the
  111. SGI hash_map implementation gave the following timing information for
  112. a simple find() call:
  113. SGI hash_map: 22 ns
  114. dense_hash_map: 13 ns
  115. sparse_hash_map: 117 ns
  116. SGI map: 113 ns
  117. See doc/performance.html for more detailed charts on resource usage
  118. and performance data.
  119. ---
  120. 16 March 2005
  121. (Last updated: 12 September 2010)