In this post, we try to understand the size and performance of the Sets
and Maps
data collection. In the first, we review the structure of the Mutable and Immutabla Sets
and Maps
collections.
A quick review of each collection application may be found in the table below.
Collection | description |
HashSet | A concrete implementation of Set semantics is HashSet. The element's hashCode will be used as a key in the HashSet, allowing for a quick lookup of the element's value. HashSet has immutable and mutable type |
HashMap | The Scala Collection includes mutable and immutable HashMap. It's used to save and return a map of elements. A HashMap is a Hash Table data structure that stores a collection of key and value pairs. It implements Map in its most basic form. |
TreeSet | A set is a data structure that allows us to store distinct components. The Set does not provide element ordering, but a TreeSet will create items in a specific order. TreeSet includes two types in Scala: scala.collection.mutable.TreeSet and scala.collection.immutable.TreeSet. |
TreeMap | TreeMap is useful when performing range queries or traversing in order, whereas the map does not keep order. If you only need key lookups and don't care in which order key-values are traversed, Map will suffice, which will generally have better performance. |
BitSet | Bitsets are collections of non-negative integers that are expressed as 64-bit words with variable-size arrays of bits. The greatest number stored in a bitset determines its memory footprint. There are two versions of BitSet in Scala: scala.collection.immutable.BitSet and scala.collection.mutable.BitSet. |
ListMap | A ListMap is a collection of key and value pairs where the keys are backed by a List data structure. ListMap collections are used only for a small number of elements. |
WeakHashMap | A weak hash map is a special kind of hash map where the garbage collector does not follow links from the map to the keys stored in it. This means that a key and its associated value will disappear from the map if there is no other reference to that key. Weak hash maps are useful for tasks such as caching, where you want to re-use an expensive function's result if the function is called again on the same key. |
Benchmark Codes
The benchmark codes in this section are more similar to the Seq
Collection benchmark codes from a previous post. Only the benchmark functions for Sets
and Maps
are different.
The Map
benchmark code can be found here.
def benchmarkMap(x:scala.collection.Map[Int,Int], n:Int, m:Int): Map[String, Double] = {
Map(
"volume" -> estimate(x),
"lookup" -> timeElapsing(x.get(m))(n),
"add" -> timeElapsing(x ++ Map((m,m)))(n),
"remove" -> timeElapsing(x-0)(n),
"min" -> timeElapsing(x.minBy(_._2)._1)(n)
)
}
Similar to Map
, we define a benchmark function for Set
.
def benchmarkSet(x:scala.collection.Set[Int], n:Int, m:Int): Map[String, Double] = {
Map(
"volume" -> estimate(x),
"lookup" -> timeElapsing(x.contains(m))(n),
"add" -> timeElapsing(x ++ Map((m,m)))(n),
"remove" -> timeElapsing(x-0)(n),
"min" -> timeElapsing(x.min)(n)
)
}
In the below code, the definition of each collection can be found.
val stats: Seq[List[Map[String, String]]] = for(s <- sizes) yield {
val integers = 0 until s
List(
("Immutable_HashMap", integers.zipWithIndex.toMap),
("Immutable_TreeMap", scala.collection.immutable.TreeMap(integers.zipWithIndex:_*)),
("Immutable_ListMap",scala.collection.immutable.ListMap(integers.zipWithIndex:_*)),
("Mutable_HashMap", scala.collection.mutable.HashMap(integers.zipWithIndex:_*)),
("Mutable_WeakHashMap",scala.collection.mutable.WeakHashMap(integers.zipWithIndex:_*))
).map(x => {
Map("size" -> s.toString, "collection" -> x._1) ++
benchmarkMap(x._2, 100, s).map(x => (x._1, x._2.toString))
}) ++ List(
("Immutable_HashSet", integers.toSet),
("Immutable_TreeSet", scala.collection.immutable.TreeSet(integers:_*)),
("Immutable_BitSet", scala.collection.immutable.BitSet(integers:_*)),
("Mutable_HashSet", scala.collection.mutable.HashSet(integers:_*)),
("Mutable_BitSet", scala.collection.mutable.BitSet(integers:_*)),
("Mutable_TreeSet", scala.collection.mutable.TreeSet(integers:_*))
).map(x => {
Map("size" -> s.toString, "collection" -> x._1) ++
benchmarkSet(x._2, 100, s).map(x => (x._1, x._2.toString))
})
}
Object Size in Memory
The benchmark information is now ready. The table below shows the estimated size in bytes of various collections of varied sizes.
Immutability | Collection | 1 | 16 | 256 | 4,096 | 65,536 | 1,048,576 |
Immutable | HashMap | 40 | 1,136 | 24,304 | 430,384 | 7,050,528 | 111,206,912 |
HashSet | 32 | 744 | 14,184 | 235,944 | 3,906,968 | 60,877,432 | |
ListMap | 56 | 656 | 12,304 | 227,344 | 3,667,984 | 58,718,224 | |
TreeMap | 88 | 808 | 14,376 | 260,136 | 4,192,296 | 67,106,856 | |
TreeSet | 104 | 824 | 12,344 | 196,664 | 3,145,784 | 50,331,704 | |
Mutable | HashMap | 160 | 824 | 14,328 | 268,848 | 4,412,704 | 63,585,176 |
HashSet | 200 | 568 | 7,848 | 112,568 | 2,097,144 | 32,883,416 | |
TreeSet | 120 | 960 | 14,400 | 229,440 | 3,670,080 | 58,720,320 | |
WeakHashMap | 344 | 1,248 | 17,856 | 259,784 | 98,542,768 | 67,101,968 |
The average memory size of each object is calculated and shown below.
Methods Performance
The table below contains a list of the data collection methods used.
operations | description |
lookup | Testing whether an element is contained in set, or selecting a value associated with a key. |
add | Adding a new element to a set or key/value pair to a map. |
remove | Removing an element from a set or a key from a map. |
min | The smallest element of the set, or the smallest key of a map. |
Scala collection performance characteristics can be found in the Scala documents. For comparison with the empirical results, the Scala performance table is given below.
Immutability | Collection | lookup | add | remove | min |
Immutable | HashSet/HashMap | eC | eC | eC | L |
TreeSet/TreeMap | Log | Log | Log | Log | |
BitSet | C | L | L | eC1 | |
ListMap | L | L | L | L | |
Mutable | HashSet/HashMap | eC | eC | eC | L |
WeakHashMap | eC | eC | eC | L | |
BitSet | C | aC | C | eC | |
TreeSet | Log | Log | Log | Log |
The performance results of each method and collection are shown using a scatter plot. We add a regression line to the plot to see the growth rate.
A similar plot is plotted for mutable collection.