Featured image of post Scala Data Collections Performance: Maps and Sets

Scala Data Collections Performance: Maps and Sets

In this post, we try to understand the size and performance of the Sets and Maps data collection. In the first, we review the structure of the Mutable and Immutabla Sets and Mapscollections.

Immutable Maps and Sets Data Collection

Figure 1: Immutable Maps and Sets Data Collection

Mutable Maps Data Collection

Figure 2: Mutable Maps Data Collection

Mutable Sets Data Collection

Figure 3: Mutable Sets Data Collection

A quick review of each collection application may be found in the table below.

Benchmark Codes

The benchmark codes in this section are more similar to the Seq Collection benchmark codes from a previous post. Only the benchmark functions for Sets and Maps are different. The Map benchmark code can be found here.

  def benchmarkMap(x:scala.collection.Map[Int,Int], n:Int, m:Int): Map[String, Double] = {
    Map(
      "volume" -> estimate(x),
      "lookup" -> timeElapsing(x.get(m))(n),
      "add" -> timeElapsing(x ++ Map((m,m)))(n),
      "remove" -> timeElapsing(x-0)(n),
      "min" -> timeElapsing(x.minBy(_._2)._1)(n)
    )
  }

Similar to Map, we define a benchmark function for Set.

  def benchmarkSet(x:scala.collection.Set[Int], n:Int, m:Int): Map[String, Double] = {
    Map(
      "volume" -> estimate(x),
      "lookup" -> timeElapsing(x.contains(m))(n),
      "add" -> timeElapsing(x ++ Map((m,m)))(n),
      "remove" -> timeElapsing(x-0)(n),
      "min" -> timeElapsing(x.min)(n)
    )
  }

In the below code, the definition of each collection can be found.

  val stats: Seq[List[Map[String, String]]] = for(s <- sizes) yield {
    val integers = 0 until s
    List(
      ("Immutable_HashMap", integers.zipWithIndex.toMap),
      ("Immutable_TreeMap", scala.collection.immutable.TreeMap(integers.zipWithIndex:_*)),
      ("Immutable_ListMap",scala.collection.immutable.ListMap(integers.zipWithIndex:_*)),
      ("Mutable_HashMap", scala.collection.mutable.HashMap(integers.zipWithIndex:_*)),
      ("Mutable_WeakHashMap",scala.collection.mutable.WeakHashMap(integers.zipWithIndex:_*))
    ).map(x => {
      Map("size" -> s.toString, "collection" -> x._1) ++ 
        benchmarkMap(x._2, 100, s).map(x => (x._1, x._2.toString))
    }) ++  List(
      ("Immutable_HashSet", integers.toSet),
      ("Immutable_TreeSet", scala.collection.immutable.TreeSet(integers:_*)),
      ("Immutable_BitSet", scala.collection.immutable.BitSet(integers:_*)),
      ("Mutable_HashSet", scala.collection.mutable.HashSet(integers:_*)),
      ("Mutable_BitSet", scala.collection.mutable.BitSet(integers:_*)),
      ("Mutable_TreeSet", scala.collection.mutable.TreeSet(integers:_*))
    ).map(x => {
      Map("size" -> s.toString, "collection" -> x._1) ++ 
        benchmarkSet(x._2, 100, s).map(x => (x._1, x._2.toString))
    })

  }

Object Size in Memory

The benchmark information is now ready. The table below shows the estimated size in bytes of various collections of varied sizes.

The average memory size of each object is calculated and shown below.

Methods Performance

The table below contains a list of the data collection methods used.

Scala collection performance characteristics can be found in the Scala documents. For comparison with the empirical results, the Scala performance table is given below.

The performance results of each method and collection are shown using a scatter plot. We add a regression line to the plot to see the growth rate.

Immutable Collection Methods Performance

Figure 4: Immutable Collection Methods Performance

A similar plot is plotted for mutable collection.

Mutable Collection Methods Performance

Figure 5: Mutable Collection Methods Performance

Built with Hugo
Theme Stack designed by Jimmy