In this post, we try to understand the size and performance of the Sets and Maps data collection. In the first, we review the structure of the Mutable and Immutabla Sets and Mapscollections.

Figure 1: Immutable Maps and Sets Data Collection

Figure 2: Mutable Maps Data Collection

Figure 3: Mutable Sets Data Collection

A quick review of each collection application may be found in the table below.

Table 1: Maps and Sets Data Collections
Collection	description
HashSet	A concrete implementation of Set semantics is HashSet. The element's hashCode will be used as a key in the HashSet, allowing for a quick lookup of the element's value. HashSet has immutable and mutable type
HashMap	The Scala Collection includes mutable and immutable HashMap. It's used to save and return a map of elements. A HashMap is a Hash Table data structure that stores a collection of key and value pairs. It implements Map in its most basic form.
TreeSet	A set is a data structure that allows us to store distinct components. The Set does not provide element ordering, but a TreeSet will create items in a specific order. TreeSet includes two types in Scala: scala.collection.mutable.TreeSet and scala.collection.immutable.TreeSet.
TreeMap	TreeMap is useful when performing range queries or traversing in order, whereas the map does not keep order. If you only need key lookups and don't care in which order key-values are traversed, Map will suffice, which will generally have better performance.
BitSet	Bitsets are collections of non-negative integers that are expressed as 64-bit words with variable-size arrays of bits. The greatest number stored in a bitset determines its memory footprint. There are two versions of BitSet in Scala: scala.collection.immutable.BitSet and scala.collection.mutable.BitSet.
ListMap	A ListMap is a collection of key and value pairs where the keys are backed by a List data structure. ListMap collections are used only for a small number of elements.
WeakHashMap	A weak hash map is a special kind of hash map where the garbage collector does not follow links from the map to the keys stored in it. This means that a key and its associated value will disappear from the map if there is no other reference to that key. Weak hash maps are useful for tasks such as caching, where you want to re-use an expensive function's result if the function is called again on the same key.

Benchmark Codes

The benchmark codes in this section are more similar to the Seq Collection benchmark codes from a previous post. Only the benchmark functions for Sets and Maps are different. The Map benchmark code can be found here.

  def benchmarkMap(x:scala.collection.Map[Int,Int], n:Int, m:Int): Map[String, Double] = {
    Map(
      "volume" -> estimate(x),
      "lookup" -> timeElapsing(x.get(m))(n),
      "add" -> timeElapsing(x ++ Map((m,m)))(n),
      "remove" -> timeElapsing(x-0)(n),
      "min" -> timeElapsing(x.minBy(_._2)._1)(n)
    )
  }

Similar to Map, we define a benchmark function for Set.

  def benchmarkSet(x:scala.collection.Set[Int], n:Int, m:Int): Map[String, Double] = {
    Map(
      "volume" -> estimate(x),
      "lookup" -> timeElapsing(x.contains(m))(n),
      "add" -> timeElapsing(x ++ Map((m,m)))(n),
      "remove" -> timeElapsing(x-0)(n),
      "min" -> timeElapsing(x.min)(n)
    )
  }

In the below code, the definition of each collection can be found.

  val stats: Seq[List[Map[String, String]]] = for(s <- sizes) yield {
    val integers = 0 until s
    List(
      ("Immutable_HashMap", integers.zipWithIndex.toMap),
      ("Immutable_TreeMap", scala.collection.immutable.TreeMap(integers.zipWithIndex:_*)),
      ("Immutable_ListMap",scala.collection.immutable.ListMap(integers.zipWithIndex:_*)),
      ("Mutable_HashMap", scala.collection.mutable.HashMap(integers.zipWithIndex:_*)),
      ("Mutable_WeakHashMap",scala.collection.mutable.WeakHashMap(integers.zipWithIndex:_*))
    ).map(x => {
      Map("size" -> s.toString, "collection" -> x._1) ++ 
        benchmarkMap(x._2, 100, s).map(x => (x._1, x._2.toString))
    }) ++  List(
      ("Immutable_HashSet", integers.toSet),
      ("Immutable_TreeSet", scala.collection.immutable.TreeSet(integers:_*)),
      ("Immutable_BitSet", scala.collection.immutable.BitSet(integers:_*)),
      ("Mutable_HashSet", scala.collection.mutable.HashSet(integers:_*)),
      ("Mutable_BitSet", scala.collection.mutable.BitSet(integers:_*)),
      ("Mutable_TreeSet", scala.collection.mutable.TreeSet(integers:_*))
    ).map(x => {
      Map("size" -> s.toString, "collection" -> x._1) ++ 
        benchmarkSet(x._2, 100, s).map(x => (x._1, x._2.toString))
    })

  }

Object Size in Memory

The benchmark information is now ready. The table below shows the estimated size in bytes of various collections of varied sizes.

Table 2: Estimated Size of Scala Collections[Int] In Different Size (in bytes)
Immutability	Collection	1	16	256	4,096	65,536	1,048,576
Immutable	HashMap	40	1,136	24,304	430,384	7,050,528	111,206,912
HashSet	32	744	14,184	235,944	3,906,968	60,877,432
ListMap	56	656	12,304	227,344	3,667,984	58,718,224
TreeMap	88	808	14,376	260,136	4,192,296	67,106,856
TreeSet	104	824	12,344	196,664	3,145,784	50,331,704
Mutable	HashMap	160	824	14,328	268,848	4,412,704	63,585,176
HashSet	200	568	7,848	112,568	2,097,144	32,883,416
TreeSet	120	960	14,400	229,440	3,670,080	58,720,320
WeakHashMap	344	1,248	17,856	259,784	98,542,768	67,101,968

The average memory size of each object is calculated and shown below.

Methods Performance

The table below contains a list of the data collection methods used.

Table 3: Operations That Are Tested on Maps ans Sets
operations	description
lookup	Testing whether an element is contained in set, or selecting a value associated with a key.
add	Adding a new element to a set or key/value pair to a map.
remove	Removing an element from a set or a key from a map.
min	The smallest element of the set, or the smallest key of a map.

Scala collection performance characteristics can be found in the Scala documents. For comparison with the empirical results, the Scala performance table is given below.

Table 4: Performance Characteristics of Immutable Maps and Sets
Immutability	Collection	lookup	add	remove	min
Immutable	HashSet/HashMap	eC	eC	eC	L
TreeSet/TreeMap	Log	Log	Log	Log
BitSet	C	L	L	eC1
ListMap	L	L	L	L
Mutable	HashSet/HashMap	eC	eC	eC	L
WeakHashMap	eC	eC	eC	L
BitSet	C	aC	C	eC
TreeSet	Log	Log	Log	Log

The performance results of each method and collection are shown using a scatter plot. We add a regression line to the plot to see the growth rate.

Figure 4: Immutable Collection Methods Performance

A similar plot is plotted for mutable collection.

Figure 5: Mutable Collection Methods Performance

Scala Data Collections Performance: Maps and Sets

Benchmark Codes

Object Size in Memory

Methods Performance

Refrences