This is mainly a reply to a Twitter conversation I’ve been having with Ken Fogel, regarding the performance of various List implementations. But it’s of general interest to Java programmers, and probably also to others participating in the conversation, so I’m posting it publicly here.
Ken Fogel (Twitter: @omniprof) has posted some Java Collections Performance Benchmarks and is using the results to help guide what he teaches his students about data structures. Unfortunately, there are some issues with the benchmarks, which probably lead to misleading results.
Our conversation has mainly been about the java.util.LinkedList class, particularly compared to ArrayList. I’ll therefore focus on those, leaving aside the other classes benchmarked there.
Here’s a screenshot from the benchmark application:
Ken’s instructions were to click the buttons several times in order to warm up the JIT. The numbers bounced around a bit but I think the ones displayed here are representative. At first glance the numbers seem reasonable: element access in an ArrayList is uniform, and access to the ends of a LinkedList is fast whereas access to the middle is slow. The insert/erase numbers for LinkedList are similar.
Something strange is going on with the insert/erase numbers for ArrayList. (The benchmark actually just does insertion.) Inserting at the beginning is the worst case, since the entire array has to be shifted down one position. Inserting in the middle should take about half that time. Inserting at the end should be cheap, since usually no elements need to be shifted. What’s going on?
Here’s the code for that benchmark:
// Insert at end runningTotal = 0; for (int x = 0; x < REPETITIONS; ++x) { arrayListX = new ArrayList<String>(arrayList); startTime = System.nanoTime(); arrayListX.add("Dawson College"); endTime = System.nanoTime() - startTime; runningTotal += endTime; }
There are three issues that I want to discuss here.
The first issue is in regard to general issues with Java benchmarking. The JVM’s JIT can easily invalidate benchmark results if you’re not careful. One of the most common problems is dead code elimination. A full discussion of these issues is provided in this article. In general, I strongly recommend writing benchmarks using JMH, the Java (Micro)benchmark Harness. It tries very hard to prevent the JIT from invalidating your benchmark, and it’s also quite rigorous at measuring and collecting data.
The second issue is about System.nanoTime(). The code here is careful to try to measure only the actual work, setting aside loop overhead and the setup time of creating the ArrayList that’s to be modified. Unfortunately, System.nanoTime() isn’t sufficiently precise to be used this way. While it reports its results in units of nanoseconds, it’s not necessarily precise enough to measure things with a granularity of nanoseconds. For a full explanation of the issues with nanoTime(), see Nanotrusting the Nanotime, an article by Oracle performance guru Alexey Shipilëv (who is also the author of JMH). The upshot is that you can’t use nanoTime() to measure an operation that runs very quickly. For example, for the quickest operations, this benchmark reports results of around 36-38ns, whereas the JMH-reported results are around 3ns. (See below.)
The third issue is with the workload the code presents. The list is to be modified, so the pre-populated arrayList is first copied to arrayListX. Then, an element is added to arrayListX; that’s the operation that’s measured. This seems OK, but there’s more going on under the covers.
The ArrayList class stores its elements in an internal array. If the array fills up, it’s grown by 50%. Thus, the typical case is for an ArrayList to have excess capacity at the tail end of the array. Arrays in Java cannot be resized, so “growing” an array really means allocating a new array and copying the elements into it. Naturally, this can be expensive. The benchmark populates arrayList by creating it with the default length of 10 and then adding 10,000 elements. The internal array ends up being reallocated a bunch of times, and it ends up with a length of 14,053. But all of this allocation and copying happens outside the benchmark timing.
When an ArrayList is created using its copy constructor, the length of its internal array is exactly the same as the number of elements. So the new list arrayListX has an internal array length of 10,000 and it’s fully populated. When an element is added, an array of length 15,000 is created and the 10,000 elements are copied into it, along with the one added element. But arrayListX is then thrown away and a new instance is created, again with length 10,000. And again, the array is reallocated and copied when the element is added. This happens every time through the benchmark loop.
This is the worst possible case for ArrayList. The 50% growth policy is intended to allow amortization of the allocate-and-copy cost over a large number of operations, but the way this benchmark is written, no such amortization occurs. This doesn’t seem representative of ArrayList‘s performance. This also explains why insertion at the beginning and middle have about the same time as insertion at the end. I don’t know whether Ken intended for the benchmark to exercise the worst case instead of the average case. At least, this should be clarified, and an alternative benchmark provided. (For example, I changed the benchmark to remove an element instead of inserting one, and the numbers were very different.)
Benchmarks Recast into JMH
JMH-based benchmark code is here: gist
I’ve taken some of Ken’s benchmarks and have recast them into JMH. I benchmarked ArrayDeque (AD), ArrayList (AL), and LinkedList (LL). I modified Ken’s insertion benchmark so that the same list is modified every time. This avoids the copying issue I described above. However, since we’re operating on the same list, we have to remove the element that was inserted. I’m calling the result an “edit” operation (although it’s really two operations). I also copied his “access” benchmark. Finally, like Ken’s benchmarks, I also did operations on the first, middle, and last elements, with the exception of ArrayDeque which doesn’t support operations in the middle.
Ken’s benchmark also populated the lists using random words from a dictionary. I found that this caused a large amount of variability in some of the benchmarks, probably because of interference from garbage collection. I removed the dictionary and instead populated the lists with empty strings. This didn’t make any difference in the average results, but it reduced the variability considerably.
I’ve created a gist with the code for my JMH-based benchmark. It’s pretty straightforward as JMH benchmarks go. One thing of note is that I’ve arranged all of the benchmark methods to return a value. This isn’t strictly necessary, but it’s good benchmarking practice. If you don’t return a value, the JIT has the potential to declare the entire method to be dead code (if, for example, it can determine the method has no side effects), and optimize it away entirely. JMH consumes method return values in a such a way as to prevent the JIT from doing that.
My system is a 2014 MacBook Pro (MacBookPro11,1) with a dual-core Intel i7 running at 3 GHz. I’m running JDK 8u65. The benchmark results are as follows:
Benchmark (SIZE) Mode Samples Score Score error Units AD_accessFirst 10000 avgt 5 2.923 0.078 ns/op AD_accessLast 10000 avgt 5 2.979 0.176 ns/op AD_editFirst 10000 avgt 5 8.995 0.150 ns/op AD_editLast 10000 avgt 5 5.223 0.295 ns/op AL_accessFirst 10000 avgt 5 2.945 0.159 ns/op AL_accessLast 10000 avgt 5 2.964 0.255 ns/op AL_accessMiddle 10000 avgt 5 2.932 0.114 ns/op AL_editFirst 10000 avgt 5 1898.086 100.413 ns/op AL_editLast 10000 avgt 5 20.434 2.096 ns/op AL_editMiddle 10000 avgt 5 893.348 31.074 ns/op LL_accessFirst 10000 avgt 5 2.901 0.150 ns/op LL_accessLast 10000 avgt 5 3.001 0.312 ns/op LL_accessMiddle 10000 avgt 5 8645.194 358.769 ns/op LL_editFirst 10000 avgt 5 8.261 0.373 ns/op LL_editLast 10000 avgt 5 10.703 2.873 ns/op LL_editMiddleIndx 10000 avgt 5 17075.179 730.738 ns/op LL_editMiddleIter 10000 avgt 5 8273.984 345.155 ns/op
We can see that many of the results are as expected. Access to the ends of ArrayDeque and LinkedList is quite fast, and access to any ArrayList element is uniformly fast. Access to the middle of the LinkedList is very slow. Editing the ends of the ArrayDeque and LinkedList is also quite fast. Editing the last element of the ArrayList is pretty fast. Editing the middle is somewhat slow, and editing the front is twice as slow. This makes sense, because inserting or deleting elements requires a lot of copying. The closer to the front of the list the edit occurs, the more copying is required, so the slower it gets.
Something funny is going on with editing in the middle of the LinkedList. One benchmark, LL_editMiddleIndx, is exceptionally slow. What does it do? The benchmark has a variable “mid” which is the middle index in the list; it’s SIZE / 2. The obvious code to edit the middle of a list is this:
@Benchmark public String LL_editMiddleIndx() { linkedList.add(mid, "Dawson College"); return linkedList.remove(mid); }
This is a terrible way to edit a linked list, because it has to traverse to the middle of the list to do the insertion, and it has to traverse to the middle again in order to do the deletion. It turns out there’s a way to keep a cursor to a location in the middle of a list, using the ListIterator class. The LL_editMiddleIter benchmark does this:
@Benchmark public String LL_editMiddleIter() { ListIterator<String> iter = linkedList.listIterator(mid); iter.add("Dawson College"); String r = iter.previous(); iter.remove(); return r; }
It’s slightly odd because we have to call previous() to cue up the element for removal. However, as you can see from the benchmark results, this performs only one traversal instead of two, making it twice as fast as the LL_editMiddleIndx method, and it’s in line with LL_accessMiddle.
Summary & Observations
I and a few other folks on Twitter have been making the case against LinkedList. For example, at one point I said that “LinkedList is much less useful than people think.” (tweet) I think the benchmark results I’ve presented here bear this out.
Looking at just these operations on ArrayList and LinkedList, there are clear tradeoffs. Element access in ArrayList is faster, but editing potentially involves copying O(n) elements, which can be expensive. Many people claim that LinkedList has the advantage for editing operations because they are O(1). This is true, but only at the ends of the list, or if you already have the location in the list where you want to do the editing. (Such as with a ListIterator.) If you have to search for the location, or if the index is somewhere in the middle, you have to pay the traversal cost to get to that location.
The kicker here is that traversing through a LinkedList is considerably more expensive than copying the elements of an ArrayList. If we assume that the edit location is uniformly distributed within the list, the average cost of element copying in an ArrayList is around 0.9µs. For a LinkedList, however, the average cost of traversing to a location is over 4µs. There are clearly workloads where a LinkedList will outperform an ArrayList. However, in many cases, the traversal cost of the LinkedList is so much more expensive than the copying cost of the ArrayList, it more than offsets LinkedList‘s O(1) advantage in editing operations.
That’s why I claim that ArrayList is usually preferable, and LinkedList should (almost) never be used.
Leave a Reply