123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147 |
- This directory contains several hash-map implementations, similar in
- API to SGI's hash_map class, but with different performance
- characteristics. sparse_hash_map uses very little space overhead, 1-2
- bits per entry. dense_hash_map is very fast, particulary on lookup.
- (sparse_hash_set and dense_hash_set are the set versions of these
- routines.) On the other hand, these classes have requirements that
- may not make them appropriate for all applications.
- All these implementation use a hashtable with internal quadratic
- probing. This method is space-efficient -- there is no pointer
- overhead -- and time-efficient for good hash functions.
- COMPILING
- ---------
- To compile test applications with these classes, run ./configure
- followed by make. To install these header files on your system, run
- 'make install'. (On Windows, the instructions are different; see
- README_windows.txt.) See INSTALL for more details.
- This code should work on any modern C++ system. It has been tested on
- Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0,
- OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8.
- USING
- -----
- See the html files in the doc directory for small example programs
- that use these classes. It's enough to just include the header file:
- #include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
- google::sparse_hash_set<int, int> number_mapper;
- and use the class the way you would other hash-map implementations.
- (Though see "API" below for caveats.)
- By default (you can change it via a flag to ./configure), these hash
- implementations are defined in the google namespace.
- API
- ---
- The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
- dense_hash_set, are a superset of the API of SGI's hash_map class.
- See doc/sparse_hash_map.html, et al., for more information about the
- API.
- The usage of these classes differ from SGI's hash_map, and other
- hashtable implementations, in the following major ways:
- 1) dense_hash_map requires you to set aside one key value as the
- 'empty bucket' value, set via the set_empty_key() method. This
- *MUST* be called before you can use the dense_hash_map. It is
- illegal to insert any elements into a dense_hash_map whose key is
- equal to the empty-key.
- 2) For both dense_hash_map and sparse_hash_map, if you wish to delete
- elements from the hashtable, you must set aside a key value as the
- 'deleted bucket' value, set via the set_deleted_key() method. If
- your hash-map is insert-only, there is no need to call this
- method. If you call set_deleted_key(), it is illegal to insert any
- elements into a dense_hash_map or sparse_hash_map whose key is
- equal to the deleted-key.
- 3) These hash-map implementation support I/O. See below.
- There are also some smaller differences:
- 1) The constructor takes an optional argument that specifies the
- number of elements you expect to insert into the hashtable. This
- differs from SGI's hash_map implementation, which takes an optional
- number of buckets.
- 2) erase() does not immediately reclaim memory. As a consequence,
- erase() does not invalidate any iterators, making loops like this
- correct:
- for (it = ht.begin(); it != ht.end(); ++it)
- if (...) ht.erase(it);
- As another consequence, a series of erase() calls can leave your
- hashtable using more memory than it needs to. The hashtable will
- automatically compact at the next call to insert(), but to
- manually compact a hashtable, you can call
- ht.resize(0)
- I/O
- ---
- In addition to the normal hash-map operations, sparse_hash_map can
- read and write hashtables to disk. (dense_hash_map also has the API,
- but it has not yet been implemented, and writes will always fail.)
- In the simplest case, writing a hashtable is as easy as calling two
- methods on the hashtable:
- ht.write_metadata(fp);
- ht.write_nopointer_data(fp);
- Reading in this data is equally simple:
- google::sparse_hash_map<...> ht;
- ht.read_metadata(fp);
- ht.read_nopointer_data(fp);
- The above is sufficient if the key and value do not contain any
- pointers: they are basic C types or agglomorations of basic C types.
- If the key and/or value do contain pointers, you can still store the
- hashtable by replacing write_nopointer_data() with a custom writing
- routine. See sparse_hash_map.html et al. for more information.
- SPARSETABLE
- -----------
- In addition to the hash-map and hash-set classes, this package also
- provides sparsetable.h, an array implementation that uses space
- proportional to the number of elements in the array, rather than the
- maximum element index. It uses very little space overhead: 2 to 5
- bits per entry. See doc/sparsetable.html for the API.
- RESOURCE USAGE
- --------------
- * sparse_hash_map has memory overhead of about 4 to 10 bits per
- hash-map entry, assuming a typical average occupancy of 50%.
- * dense_hash_map has a factor of 2-3 memory overhead: if your
- hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
- total.
- Hashtables tend to double in size when resizing, creating an
- additional 50% space overhead. dense_hash_map does in fact have a
- significant "high water mark" memory use requirement, which is 6 times
- the size of hash entries in the table when resizing (when reaching
- 50% occupancy, the table resizes to double the previous size, and the
- old table (2x) is copied to the new table (4x)).
- sparse_hash_map, however, is written to need very little space
- overhead when resizing: only a few bits per hashtable entry.
- PERFORMANCE
- -----------
- You can compile and run the included file time_hash_map.cc to examine
- the performance of sparse_hash_map, dense_hash_map, and your native
- hash_map implementation on your system. One test against the
- SGI hash_map implementation gave the following timing information for
- a simple find() call:
- SGI hash_map: 22 ns
- dense_hash_map: 13 ns
- sparse_hash_map: 117 ns
- SGI map: 113 ns
- See doc/performance.html for more detailed charts on resource usage
- and performance data.
- ---
- 16 March 2005
- (Last updated: 12 September 2010)
|