sobota, 3 kwietnia 2010

STL: map with string as key - access speedup

The idea is quite simple: we do not have a single stl::map<string, something>, but vector of maps, indexed with O(1) time - each map store keys sharing certain properties. Drawback: additional memory.

I've tested following grouping schemes:
  1. length of string,
  2. first letter of string (one level trie),
  3. both length and first letter.
Third is the fastest - around 60% faster then plain std::map from gcc (red-black tree).

Tests: my program read plain text (I've used The Illiad from http://gutenberg.org), text is splitted into words (~190000) and then each words is inserted into dictonary (~28000 distinct words); then the same words are searched in dictionaries. Table below summarizes results on my computer (gcc 4.3.4 from cygwin).

+-------------+--------------------------------+--------------------------------+
|             |          running time [ms]     |            speedup [%]         |
| data struct +----------+----------+----------+----------+----------+----------+
|             |    min   |   avg    |   max    |   min    |   avg    |   max    |
+-------------+----------+----------+----------+----------+----------+----------+
|                                          inserting                            |
+-------------+----------+----------+----------+----------+----------+----------+
|  std::map   |    269   |   287    |   355    |   100    |   100    |   100    |
+-------------+----------+----------+----------+----------+----------+----------+
| first char  |    218   |   241    |   395    |    81    |    84    |   111    |
+-------------+----------+----------+----------+----------+----------+----------+
|   length    |    218   |   240    |   345    |    81    |    84    |    97    |
+-------------+----------+----------+----------+----------+----------+----------+
| len./char   |    165   |   172    |   207    |    61    |    60    |    58    |
+-------------+----------+----------+----------+----------+----------+----------+
|                                            finding                            |
+-------------+----------+----------+----------+----------+----------+----------+
|  std::map   |    295   |   322    |   483    |   100    |   100    |   100    |
+-------------+----------+----------+----------+----------+----------+----------+
| first char  |    243   |   263    |   460    |    82    |    82    |    95    |
+-------------+----------+----------+----------+----------+----------+----------+
|   length    |    238   |   248    |   292    |    80    |    77    |    60    |
+-------------+----------+----------+----------+----------+----------+----------+
| len./char   |    184   |   190    |   241    |    62    |    60    |    50    |
+-------------+----------+----------+----------+----------+----------+----------+

Download test program.