## Some Java List Benchmarks

This is mainly a reply to a Twitter conversation I’ve been having with Ken Fogel, regarding the performance of various List implementations. But it’s of general interest to Java programmers, and probably also to others participating in the conversation, so I’m posting it publicly here.

Ken Fogel (Twitter: @omniprof) has posted some Java Collections Performance Benchmarks and is using the results to help guide what he teaches his students about data structures. Unfortunately, there are some issues with the benchmarks, which probably lead to misleading results.

Our conversation has mainly been about the java.util.LinkedList class, particularly compared to ArrayList. I’ll therefore focus on those, leaving aside the other classes benchmarked there.

Here’s a screenshot from the benchmark application:

Ken’s instructions were to click the buttons several times in order to warm up the JIT. The numbers bounced around a bit but I think the ones displayed here are representative. At first glance the numbers seem reasonable: element access in an ArrayList is uniform, and access to the ends of a LinkedList is fast whereas access to the middle is slow. The insert/erase numbers for LinkedList are similar.

Something strange is going on with the insert/erase numbers for ArrayList. (The benchmark actually just does insertion.) Inserting at the beginning is the worst case, since the entire array has to be shifted down one position. Inserting in the middle should take about half that time. Inserting at the end should be cheap, since usually no elements need to be shifted. What’s going on?

Here’s the code for that benchmark:

// Insert at end
runningTotal = 0;
for (int x = 0; x < REPETITIONS; ++x) {
    arrayListX = new ArrayList<String>(arrayList);
    startTime = System.nanoTime();
    // (the timed insertion into arrayListX goes here; the call is missing from this listing)
    endTime = System.nanoTime() - startTime;
    runningTotal += endTime;
}

There are three issues that I want to discuss here.

The first issue concerns Java benchmarking in general. The JVM’s JIT can easily invalidate benchmark results if you’re not careful. One of the most common problems is dead code elimination. A full discussion of these issues is provided in this article. In general, I strongly recommend writing benchmarks using JMH, the Java (Micro)benchmark Harness. It tries very hard to prevent the JIT from invalidating your benchmark, and it’s also quite rigorous at measuring and collecting data.

The second issue is about System.nanoTime(). The code here is careful to try to measure only the actual work, setting aside loop overhead and the setup time of creating the ArrayList that’s to be modified. Unfortunately, System.nanoTime() isn’t sufficiently precise to be used this way. While it reports its results in units of nanoseconds, it’s not necessarily precise enough to measure things with a granularity of nanoseconds. For a full explanation of the issues with nanoTime(), see Nanotrusting the Nanotime, an article by Oracle performance guru Aleksey Shipilëv (who is also the author of JMH). The upshot is that you can’t use nanoTime() to measure an operation that runs very quickly. For example, for the quickest operations, this benchmark reports results of around 36-38ns, whereas the JMH-reported results are around 3ns. (See below.)
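
To get a rough feel for the granularity problem, here is a small probe, offered as a sketch rather than a rigorous measurement (it is itself a hand-rolled timing loop, with all the caveats that implies). The class and method names are made up for illustration. It reports the smallest nonzero step seen between consecutive nanoTime() readings and the average cost of a call; both depend heavily on the OS and hardware.

```java
class NanoTimeGranularity {

    /** Smallest nonzero gap seen between two consecutive nanoTime() readings. */
    static long smallestObservedStep(int samples) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            while (t1 == t0) {            // spin until the reading changes
                t1 = System.nanoTime();
            }
            best = Math.min(best, t1 - t0);
        }
        return best;
    }

    /** Average cost of a single nanoTime() call, in nanoseconds. */
    static double averageCallCost(int calls) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink ^= System.nanoTime();    // use the result so the call is not dropped
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            System.out.println("(unlikely) sink was zero");
        }
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        System.out.println("smallest observed step: " + smallestObservedStep(100_000) + " ns");
        System.out.println("average call cost:      " + averageCallCost(1_000_000) + " ns");
    }
}
```

If either number is comparable to the duration of the operation being timed, a single start/stop pair around that operation mostly measures the clock.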

The third issue is with the workload the code presents. The list is to be modified, so the pre-populated arrayList is first copied to arrayListX. Then, an element is added to arrayListX; that’s the operation that’s measured. This seems OK, but there’s more going on under the covers.

The ArrayList class stores its elements in an internal array. If the array fills up, it’s grown by 50%. Thus, the typical case is for an ArrayList to have excess capacity at the tail end of the array. Arrays in Java cannot be resized, so “growing” an array really means allocating a new array and copying the elements into it. Naturally, this can be expensive. The benchmark populates arrayList by creating it with the default length of 10 and then adding 10,000 elements. The internal array ends up being reallocated a bunch of times, and it ends up with a length of 14,053. But all of this allocation and copying happens outside the benchmark timing.

When an ArrayList is created using its copy constructor, the length of its internal array is exactly the same as the number of elements. So the new list arrayListX has an internal array length of 10,000 and it’s fully populated. When an element is added, an array of length 15,000 is created and the 10,000 elements are copied into it, along with the one added element. But arrayListX is then thrown away and a new instance is created, again with length 10,000. And again, the array is reallocated and copied when the element is added. This happens every time through the benchmark loop.
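
The capacity figures in the last two paragraphs can be checked with a small simulation. This is a sketch that assumes the growth rule is "old capacity plus half of it, rounded down", which matches the 50% policy described above; the real logic is internal to ArrayList and may differ between JDK versions. The class and method names are hypothetical.

```java
class ArrayListGrowthSketch {

    /** One growth step under the assumed 50% policy: old + old/2 (integer division). */
    static int grow(int capacity) {
        return capacity + (capacity >> 1);
    }

    /** Capacity reached after appending 'elements' items one by one. */
    static int capacityAfterAppends(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        while (capacity < elements) {
            capacity = grow(capacity);
        }
        return capacity;
    }

    /** Number of allocate-and-copy steps needed along the way. */
    static int reallocations(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < elements) {
            capacity = grow(capacity);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Populating from the default capacity of 10 up to 10,000 elements.
        System.out.println(capacityAfterAppends(10, 10_000)); // 14053
        System.out.println(reallocations(10, 10_000));        // 18
        // A full array of exactly 10,000 (the copy-constructor case) grows once on add.
        System.out.println(grow(10_000));                     // 15000
    }
}
```

Under this model, filling the original list costs 18 reallocations spread over 10,000 adds, while the benchmark loop pays one full reallocation for every single add.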

This is the worst possible case for ArrayList. The 50% growth policy is intended to allow amortization of the allocate-and-copy cost over a large number of operations, but the way this benchmark is written, no such amortization occurs. This doesn’t seem representative of ArrayList’s performance. It also explains why insertion at the beginning and in the middle takes about the same time as insertion at the end: in all three cases the time is largely spent reallocating and copying the whole array. I don’t know whether Ken intended for the benchmark to exercise the worst case instead of the average case. At least, this should be clarified, and an alternative benchmark provided. (For example, I changed the benchmark to remove an element instead of inserting one, and the numbers were very different.)

### Benchmarks Recast into JMH

JMH-based benchmark code is here: gist

I’ve taken some of Ken’s benchmarks and have recast them into JMH. I benchmarked ArrayDeque (AD), ArrayList (AL), and LinkedList (LL). I modified Ken’s insertion benchmark so that the same list is modified every time. This avoids the copying issue I described above. However, since we’re operating on the same list, we have to remove the element that was inserted. I’m calling the result an “edit” operation (although it’s really two operations). I also copied his “access” benchmark. Finally, like Ken’s benchmarks, I also did operations on the first, middle, and last elements, with the exception of ArrayDeque which doesn’t support operations in the middle.

Ken’s benchmark also populated the lists using random words from a dictionary. I found that this caused a large amount of variability in some of the benchmarks, probably because of interference from garbage collection. I removed the dictionary and instead populated the lists with empty strings. This didn’t make any difference in the average results, but it reduced the variability considerably.

I’ve created a gist with the code for my JMH-based benchmark. It’s pretty straightforward as JMH benchmarks go. One thing of note is that I’ve arranged all of the benchmark methods to return a value. This isn’t strictly necessary, but it’s good benchmarking practice. If you don’t return a value, the JIT has the potential to declare the entire method to be dead code (if, for example, it can determine the method has no side effects), and optimize it away entirely. JMH consumes method return values in such a way as to prevent the JIT from doing that.

My system is a 2014 MacBook Pro (MacBookPro11,1) with a dual-core Intel i7 running at 3 GHz. I’m running JDK 8u65. The benchmark results are as follows:

Benchmark            (SIZE)  Mode  Samples      Score  Score error  Units
AD_accessFirst        10000  avgt        5      2.923        0.078  ns/op
AD_accessLast         10000  avgt        5      2.979        0.176  ns/op
AD_editFirst          10000  avgt        5      8.995        0.150  ns/op
AD_editLast           10000  avgt        5      5.223        0.295  ns/op
AL_accessFirst        10000  avgt        5      2.945        0.159  ns/op
AL_accessLast         10000  avgt        5      2.964        0.255  ns/op
AL_accessMiddle       10000  avgt        5      2.932        0.114  ns/op
AL_editFirst          10000  avgt        5   1898.086      100.413  ns/op
AL_editLast           10000  avgt        5     20.434        2.096  ns/op
AL_editMiddle         10000  avgt        5    893.348       31.074  ns/op
LL_accessFirst        10000  avgt        5      2.901        0.150  ns/op
LL_accessLast         10000  avgt        5      3.001        0.312  ns/op
LL_accessMiddle       10000  avgt        5   8645.194      358.769  ns/op
LL_editFirst          10000  avgt        5      8.261        0.373  ns/op
LL_editLast           10000  avgt        5     10.703        2.873  ns/op
LL_editMiddleIndx     10000  avgt        5  17075.179      730.738  ns/op
LL_editMiddleIter     10000  avgt        5   8273.984      345.155  ns/op

We can see that many of the results are as expected. Access to the ends of ArrayDeque and LinkedList is quite fast, and access to any ArrayList element is uniformly fast. Access to the middle of the LinkedList is very slow. Editing the ends of the ArrayDeque and LinkedList is also quite fast. Editing the last element of the ArrayList is pretty fast. Editing the middle is somewhat slow, and editing the front is twice as slow. This makes sense, because inserting or deleting elements requires a lot of copying. The closer to the front of the list the edit occurs, the more copying is required, so the slower it gets.

Something funny is going on with editing in the middle of the LinkedList. One benchmark, LL_editMiddleIndx, is exceptionally slow. What does it do? The benchmark has a variable “mid” which is the middle index in the list; it’s SIZE / 2. The obvious code to edit the middle of a list is this:

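@Benchmark
public String LL_editMiddleIndx() {
    // (body missing from this listing: it inserts an element at index mid,
    //  then removes and returns the element at index mid)
}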

This is a terrible way to edit a linked list, because it has to traverse to the middle of the list to do the insertion, and it has to traverse to the middle again in order to do the deletion. It turns out there’s a way to keep a cursor to a location in the middle of a list, using the ListIterator class. The LL_editMiddleIter benchmark does this:

@Benchmark
public String LL_editMiddleIter() {
    // (the insertion through iter is missing from this listing)
    String r = iter.previous();
    iter.remove();
    return r;
}

It’s slightly odd because we have to call previous() to cue up the element for removal. However, as you can see from the benchmark results, this performs only one traversal instead of two, making it twice as fast as the LL_editMiddleIndx method, and it’s in line with LL_accessMiddle.
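
To make the difference concrete outside of JMH, here is a self-contained sketch of the two editing styles. It is not the benchmark code from the gist: the class name, method names, and the inserted value "x" are all assumptions made for illustration.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class MiddleEditSketch {

    /** Index-based edit: each call walks to 'mid' twice, once per operation. */
    static String editByIndex(List<String> list, int mid) {
        list.add(mid, "x");        // first traversal to the middle
        return list.remove(mid);   // second traversal to the middle
    }

    /** Cursor-based edit: the iterator already sits at the edit position. */
    static String editByIterator(ListIterator<String> iter) {
        iter.add("x");               // inserted just before the cursor
        String r = iter.previous();  // step back over the new element
        iter.remove();               // remove the element just returned
        return r;
    }

    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<>();
        for (int i = 0; i < 10_000; i++) {
            list.add("");
        }
        int mid = list.size() / 2;

        System.out.println(editByIndex(list, mid) + " size=" + list.size());

        // Create the iterator only after the index-based edits, otherwise it
        // would fail with ConcurrentModificationException.
        ListIterator<String> iter = list.listIterator(mid);
        System.out.println(editByIterator(iter) + " size=" + list.size());
    }
}
```

Creating the iterator with listIterator(mid) still costs one traversal, but that cost is paid once; every later edit through the same iterator is constant time as long as nothing else modifies the list.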

### Summary & Observations

I and a few other folks on Twitter have been making the case against LinkedList. For example, at one point I said that “LinkedList is much less useful than people think.” (tweet) I think the benchmark results I’ve presented here bear this out.

Looking at just these operations on ArrayList and LinkedList, there are clear tradeoffs. Element access in ArrayList is faster, but editing potentially involves copying O(n) elements, which can be expensive. Many people claim that LinkedList has the advantage for editing operations because they are O(1). This is true, but only at the ends of the list, or if you already have the location in the list where you want to do the editing (such as with a ListIterator). If you have to search for the location, or if the index is somewhere in the middle, you have to pay the traversal cost to get to that location.

The kicker here is that traversing through a LinkedList is considerably more expensive than copying the elements of an ArrayList. If we assume that the edit location is uniformly distributed within the list, the average cost of element copying in an ArrayList is around 0.9µs. For a LinkedList, however, the average cost of traversing to a location is over 4µs. There are clearly workloads where a LinkedList will outperform an ArrayList. However, in many cases, the traversal cost of the LinkedList is so much more expensive than the copying cost of the ArrayList, it more than offsets LinkedList’s O(1) advantage in editing operations.
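
One plausible way to arrive at averages of this size from the JMH table is sketched below. The model is an assumption, not something stated in the results: ArrayList copying cost is taken to fall linearly from the front of the list to the back, so its mean equals the cost at the middle, and LinkedList traversal cost is taken to rise linearly from either end to a peak in the middle, so its mean is about half the peak. The class and method names are hypothetical.

```java
class AverageEditCostSketch {

    // Scores copied from the JMH results table, in ns/op.
    static final double AL_EDIT_MIDDLE   = 893.348;
    static final double LL_ACCESS_MIDDLE = 8645.194;

    /** Linear cost from front to back: the mean equals the value at the midpoint. */
    static double arrayListAverageMicros() {
        return AL_EDIT_MIDDLE / 1000.0;
    }

    /** Cost rises linearly from each end to the middle: the mean is half the peak. */
    static double linkedListAverageMicros() {
        return (LL_ACCESS_MIDDLE / 2.0) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("ArrayList  average copy cost:      %.2f us%n", arrayListAverageMicros());
        System.out.printf("LinkedList average traversal cost: %.2f us%n", linkedListAverageMicros());
    }
}
```

Under these assumptions the two averages come out at roughly 0.89µs and 4.32µs.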

That’s why I claim that ArrayList is usually preferable, and LinkedList should (almost) never be used.

## Devoxx Antwerp 2015

I’m presenting one session at the Devoxx conference in Antwerp, at 15:10 on Wednesday 11 November 2015, in room 5. Here is the slide deck:

## My JavaOne 2015 Talks, Plus Recommendations

Hi everybody, JavaOne 2015 is already underway. For some reason my talks are all concentrated toward the end of the conference this year; in fact, three are on Wednesday! Here’s my talk schedule:

♦ API Design with Java 8 Lambdas and Streams [CON6851]
(with Brian Goetz)
Wed 2015-10-28 – 8:30am
Hilton Continental Ballroom 5
Slides: CON6851-API-Design-v2 (PDF)
Video: https://youtu.be/o10ETyiNIsM?t=24m (61 minutes)

♦ New Tricks for Old Dogs: Collections Enhancements in Java 8 [CON7432]
(with Mike Duigou)
Wed 2015-10-28 – 11:30am
Hilton Continental Ballroom 1/2/3
Slides: CON7432-Marks-CollectionsNewTricks-v3 (PDF)
See also JEP 269, “Convenience Factory Methods for Collections” (JDK 9 work-in-progress)

♦ Saving the Future from the Past: Innovations in Deprecation [CON6856]
(presented by Dr Deprecator)
Wed 2015-10-28 – 3:00pm
Hilton Continental Ballroom 5
Slides: CON6856-Marks-Deprecation-v3 (PDF)
Video: https://youtu.be/o10ETyiNIsM?t=6h54m41s (61 minutes)
News flash! JEP 277 “Enhanced Deprecation” has been posted.

♦ 20 Years of APIs: A Retrospective [CON6891]
Thu 2015-10-29 – 9:00am
Hilton Continental Ballroom 5
Slides: CON6891-Marks-API-Retrospective-v2 (PDF)
Video: https://youtu.be/0KlJSNb8GZU?t=26m25s (61 minutes)

Sorry, there’s no lambda tutorial (“Jump-Starting Lambda”) this year, nor is there a Lambda Hands on Lab. This is most unfortunate. I was planning to work with Simon Ritter (Twitter: @speakjava) on those sessions this year, with Simon taking the lead. Unfortunately, Simon was laid off from Oracle just a few weeks ago, leaving no time to rearrange the program or to find someone else to work on them. There are a number of Lambda and Streams talks that I can recommend, however:

♦ Programming with Lambdas [CON8366]
Venkat Subramaniam
Mon 2015-10-26 – 4:00pm
Hilton Continental Ballroom 5
Video: https://youtu.be/8RhwmJlZQgs?t=7h54m50s

♦ Journey’s End: Collection and Reduction in the Stream API [TUT5906]
Maurice Naftalin
Tue 2015-10-27 – 8:30am
Hilton Continental Ballroom 4

♦ Streams: the Real Powerhouse in Java 8 [CON8367]
Venkat Subramaniam
Tue 2015-10-27 – 11:00am
Hilton Continental Ballroom 4

♦ Effective Java Streams [CON7066]
Paul Sandoz
Tue 2015-10-27 – 2:30pm
Hilton Continental Ballroom 5
Video: https://youtu.be/iHHSa39p48I?t=6h15m55s

♦ Shooting the Rapids: Maximizing Performance of Java 8 Streams [CON5931]
Maurice Naftalin & Kirk Pepperdine
Wed 2015-10-28 – 3:00pm
Hilton Continental Ballroom 4

Enjoy the conference!

## Java Day Tokyo 2014 and JJUG CCC 2014 Spring

Since this year’s Java Day Tokyo 2015 is about to happen, I figure I should post my article about last year’s event. Unfortunately I won’t be able to attend this year. But last year I traveled to Japan for Java Day Tokyo 2014 and for a Japan Java User Group event. The trip was packed with events. I brought my family along with me, and fortunately we did have a couple days to travel around Tokyo to relax and do some sightseeing.

### JJUG CCC 2014 Spring, 18 May

The first event was the JJUG CCC Spring 2014 (Japan Java Users Group, Cross-Community Conference). This is a twice-per-year gathering of several JUGs from around Japan where they stage a full-day conference. It turned out that I was one of the keynote speakers! I was told there were over 300 people attending, making it one of the biggest JJUG events ever. Wow, I’m honored.

My presentation was Overview of Java 8 Lambda and Streams, which covered not only those topics but also default methods and method references. That’s a lot to cover, and I couldn’t go very fast because I had to pause after every sentence for consecutive translation. Still, people said they enjoyed the presentation and that they found it helpful.

Here are some pictures Yuichi Sakuraba took at the event. (He seems to be the designated conference photographer in Japan, when he’s not busy taking pictures of food.)

(photo: Yuichi Sakuraba, 2014-05-18, CC BY-NC 2.0, original on Flickr)

(photo: Yuichi Sakuraba, 2014-05-18, CC BY-NC 2.0, original on Flickr)

Yuichi has posted a Flickr photo set of the entire event, including a few more of me.

### Java Day Tokyo, 22 May 2014

This was the main event. It was jam packed with sessions, including a set of keynotes in the morning, and five tracks in parallel in the afternoon. Here’s the agenda, and here are slides and videos from the subset of sessions that were recorded. I had two sessions in the afternoon: the first on Java 8 Lambdas, and the second on Java 8’s new Streams API. Here are some pictures I took during the keynotes.

Nandini Ramani (former VP, Oracle Java Platform Group) and Shin Ishiguro (NEC) showing off the NEC PaPeRo robot featuring Embedded Java SE:

Stephen Chin and Cameron Purdy demonstrating the Lego Duke balancing on two wheels:

That evening after a full day of sessions, there was a two hour “Ask the Experts” panel and I was on the panel. David Buck (Oracle JVM Sustaining) was pressed into service doing consecutive translation in both directions between the audience and the panelists. I think he did quite well considering he’s not a professional translator.

Not surprisingly (as Java 8 had just been released) most of the questions were about Lambdas and Streams. There were some pretty good questions. One question asked about some details of how lambdas are implemented. I replied that I’d try to be brief and hold my remarks to under half an hour. That got a laugh out of the audience (a Japanese audience — a first for me!). David did pretty well translating my answer, until I got to the part about the “lambda metafactory.” I’m not the real expert at this, though. Brian Goetz is, and he’s given a talk called Lambda: A Peek Under The Hood that explains the lambda implementation in great detail.

The following day (not officially part of the conference), we had a hands-on lab in the Oracle offices where we let participants try their hand at a set of exercises that can be solved using Java 8 Lambdas and Streams. This is similar to labs we’ve had at JavaOne and Devoxx and other conferences:

Like most labs, after a brief introduction, most of the participants went heads-down and worked steadily on the problems. They must have been pretty good problems, since most people were still working on them when we ran out of time!

I’m sad to be missing this year’s Japan event. Make sure you go if you get a chance. It looks like it’ll be as good as, if not better than, last year’s!

## Math. It Works, Bitches.

That was what I tweeted the other day. Of course, it’s an homage to this famous xkcd cartoon. The tweet was a link to this Stack Overflow question and my answer. This article provides a bit more background and describes a little adventure I had computing the probability I gave in the answer.

The background is that the original poster (OP) of the question was generating a million data objects and was storing them in a TreeSet. But the TreeSet ended up with only 975,000 elements in it. The obvious reason that there would be fewer elements in a set than were added is that some of the data objects are duplicates. Somebody asked about this, and the OP said the chance of generating duplicate data objects was “minuscule.” To somebody like me, this is like waving a red flag in front of a bull, so I had to go investigate. (You know, Duty Calls.)

I estimated that there was a possible space of 18,000,000 data objects, and the OP was generating 1,000,000 of them at random. What’s the probability of there being at least one pair of duplicates among the generated objects? This is a variation of the Birthday Problem, also known as the Birthday Paradox. It’s not really a paradox. It is, however, quite counterintuitive how quickly the probability approaches certainty that there will be a duplicate as the number of trials increases.

Briefly, the birthday problem is, given a certain number of people, what’s the probability that two will have the same birthday? The probability reaches 50% at only 23 people, and at 70 people it has risen above 99.9%. Most people find this pretty surprising. Certainly the OP did; given 1,000,000 generated objects out of a space of 18,000,000, the probability is not minuscule at all, but is in fact astonishingly close to 100%.

It’s actually a bit easier to talk about the probability of there not being a duplicate, that is, the probability of the choices all being unique, so I’ll talk about that. (Of course, the probability of the choices being unique is simply 1.0 minus the probability of a collision.) The Wikipedia article gives a couple formulas for computing this probability. One is an approximation:

$\displaystyle \left(\frac{d - 1}{d}\right)^{n(n-1)/2}$

The second is a product involving a large number of factors:

$\displaystyle \prod\limits_{k=1}^{n-1}(1 - \textstyle\frac{k}{d})$

In both formulas, d is the number of possible values in the domain, and n is the number of elements chosen. Naturally, the closed form approximation involves many fewer computations, so let’s start with that one.
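
One way to see how the two formulas relate (a sketch, assuming k is much smaller than d): a first-order binomial expansion gives

$\displaystyle 1 - \frac{k}{d} \approx \left(1 - \frac{1}{d}\right)^{k} = \left(\frac{d - 1}{d}\right)^{k}$

and multiplying over k = 1, …, n-1 adds up the exponents:

$\displaystyle \prod\limits_{k=1}^{n-1}\left(1 - \frac{k}{d}\right) \approx \left(\frac{d - 1}{d}\right)^{\sum_{k=1}^{n-1} k} = \left(\frac{d - 1}{d}\right)^{n(n-1)/2}$

The exponent n(n-1)/2 is also the number of distinct pairs among the n chosen elements, so the approximation can be read as treating every pair as an independent chance of a match. Since each factor on the right is slightly larger than the one it replaces, the approximation somewhat overestimates the probability of all choices being unique.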

What should we use to do the computation? Well I’m an old Unix hack, so I immediately reached for the venerable bc program. First let’s try some of the cases from the original birthday problem to see if we have this right (bold italic text is the program’s output):

$ bc -l
(364/365)^(23*22/2)
***.49952284596341798480***
(364/365)^(70*69/2)
***.00132609259546606814***

These are only approximations, but they seem about right. Let’s try the exact computations:

p=1.0
for (k=1; k<23; k++) p *= (1 - k/365)
p
***.49270276567601459277***
p=1.0
for (k=1; k<70; k++) p *= (1 - k/365)
p
***.00084042403484290862***

The result for 23 people matches the figure given in the Wikipedia article (at least, to six decimal places) so it seems accurate. Great! Now let’s try the real problem.

d=18000000
n=1000000
((d-1)/d)^(n*(n-1)/2)
***Runtime error (func=(main), adr=19): exponent too large in raise***

Hm, that didn’t work. If the power operator isn’t working, let’s try the old trick of taking the logarithm, multiplying, and then exponentiating:

e(l((d-1)/d)*n*(n-1)/2)

I let this run at 100% CPU for five minutes and I didn’t get any output. I don’t know whether it was an infinite loop or what, but it certainly didn’t seem promising. All right then, let’s just try the exact computation:

p=1.0
for (k=1; k<n; k++) p *= (d-k)/d
p
***0***

Zero. Crap, underflow. The probabilities get pretty small, so I guess I shouldn’t be surprised. Let’s try Java instead.

static void doubleApprox() {
    double d = 18_000_000.0;
    double n = 1_000_000.0;
    System.out.println(Math.pow((d-1.0)/d, n*(n-1.0)/2.0));
}

***0.0***

Underflow again. At least it ran quickly instead of looping infinitely. Let’s try the exact computation:

static void doubleProduct() {
    int d = 18_000_000;
    int n = 1_000_000;
    double p = 1.0;
    for (int k = 1; k < n; k++) {
        p *= (double)(d - k) / (double)d;
    }
    System.out.println(p);
}

***4.4E-323***

Aha! Now we’re getting somewhere. I put this into the initial version of my answer and declared it done.

## ∞

But there were a couple suspicious things nagging me about this result. First, the exponent of -323 seemed awfully familiar. Second, there are only two digits of precision. Usually a floating point double gives about 17 digits.
It turns out that this result is very close to Double.MIN_VALUE, which is about 4.9E-324. When the numbers are this small, they are denormalized. As they get smaller, they have fewer and fewer digits of precision. With such a huge loss of precision, continued multiplication by a fraction such as (d - k) / d becomes highly inaccurate. It turns out that this result of 4.4E-323 is incredibly inaccurate. (In fact, as we’ll see later, it’s off by more than ten thousand orders of magnitude.)

In order to combat the underflow problem, I put a little hack into the loop to scale up the partial product by 10 until it was above 1.0. That should keep the values well within range, so we avoid precision loss. Of course, I kept track of the number of times I scaled by 10. The count is negative because scaling up by 10 means a negative exponent. (I have no idea whether this is acceptable numeric computation practice, but it seemed to work out in the end.) Here’s the code to do that, and the result.

static void doubleScaled() {
    int d = 18_000_000;
    int n = 1_000_000;
    int scale = 0;
    double p = 1.0;
    for (int k = 1; k < n; k++) {
        p *= (double)(d - k) / (double)d;
        while (p < 1.0) {
            p *= 10.0;
            scale--;
        }
    }
    System.out.printf("%11.9fE%d%n", p, scale);
}

***2.843374644E-12294***

Ten to the minus twelve thousandth? Nah, that can’t be right. Can it?

I wasn’t sure how to verify this, so I talked to my friend and colleague Joe Darcy (blog, twitter). He suggested I use the floating-point mode of Java’s BigDecimal. Floating point? I thought BigDecimal only supported fixed-point arithmetic. In fact, there are variations of the BigDecimal operations that take a MathContext object, and if you set it up properly, it will perform floating point decimal arithmetic. Cool! Joe also mentioned that when used in this mode, BigDecimal stores its exponent as an int, so this should help avoid underflow.
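
As a small illustration of that mode, here is a sketch with made-up values (1E-200 raised to the 100th power, and a 10-digit MathContext); the class and method names are hypothetical. It contrasts a double computation that underflows with a BigDecimal computation that keeps a bounded number of significant digits while the exponent ranges far beyond what a double can hold.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class BigDecimalFloatingSketch {

    // Ten significant digits, rounded half-even.
    static final MathContext MC = new MathContext(10, RoundingMode.HALF_EVEN);

    /** 1E-200 raised to the 100th power, far below the smallest double. */
    static BigDecimal tinyPower() {
        return new BigDecimal("1E-200").pow(100, MC);
    }

    /** A division whose result is rounded to the context's precision. */
    static BigDecimal oneThird() {
        return BigDecimal.ONE.divide(new BigDecimal(3), MC);
    }

    public static void main(String[] args) {
        System.out.println(Math.pow(1e-200, 100)); // the double version underflows to zero
        System.out.println(tinyPower());           // still a nonzero value
        System.out.println(oneThird());            // ten significant digits
    }
}
```

The precision is fixed by the MathContext rather than by the magnitude of the number, which is the property that makes very small probabilities tractable.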
Let’s try out the approximation first:

static void bdApprox() {
    int id = 18_000_000;
    int n = 1_000_000;
    MathContext mc = new MathContext(10, RoundingMode.HALF_EVEN);
    BigDecimal d = new BigDecimal(id, mc);
    BigDecimal base = d.subtract(BigDecimal.ONE, mc).divide(d, mc);
    BigDecimal result = base.pow(n * (n - 1) / 2, mc);
    System.out.println(result);
}

***622319181.9***

WAT. This is totally wrong. Has Joe led me astray? Well, no. It turns out that BigDecimal.pow() takes an int argument as the exponent, and given n = 1,000,000 this clearly overflows an int. Whoops. All right then, let’s just go straight to the exact product computation:

static void bdExact() {
    int id = 18_000_000;
    int n = 1_000_000;
    MathContext mc = new MathContext(20, RoundingMode.HALF_EVEN);
    BigDecimal prob = new BigDecimal(1, mc);
    BigDecimal d = new BigDecimal(id, mc);
    for (int k = 1; k < n; k++) {
        BigDecimal num = new BigDecimal(id - k, mc);
        prob = prob.multiply(num, mc)
                   .divide(d, mc);
    }
    System.out.println(prob);
}

***2.8433746444606670057E-12294***

Whoa. Look at that: the same answer, to as many significant digits as I printed out from the scaled double precision computation.

That’s a pretty amazing number. The probability of choosing 1,000,000 unique, random values from a space of 18,000,000 is ten to the minus fricken’ twelve thousand. That’s what I call minuscule. And it totally explains why the Stack Overflow poster was getting duplicates.

Math. It works, bitches.

And BigDecimal too.

## Writing Stateful Stream Operations

The distinct() stream operation compares the stream’s elements using Object.equals(). That is, for any set of stream elements that are all equals() to each other, the distinct() operation will let just one of them through. However, sometimes you want the notion of “distinct” to be based on some property or other value derived from the stream element, but not the value itself.
You could use map() to map the stream element into some derived value and use distinct() on those, but the result would be a stream of those derived values, not the original stream element. It would be nice if there were some construct like

distinct(Function<T,U> keyExtractor)

that would call keyExtractor to derive the values that are compared for uniqueness, but there isn’t. However, it’s not too difficult to write your own.

The first insight is that you can think of the distinct() operation as a stateful filter. It’s like a filter() operation, which takes a predicate that determines whether to let the element through. It’s stateful because whether it lets an element through is determined by what elements it has seen previously.

This state needs to be maintained somewhere. Internally, the distinct() operation keeps a Set that contains elements that have been seen previously, but it’s buried inside the operation and we can’t get to it from application code. But we could write something similar ourselves. The usual way to maintain state in Java is to create a class that has fields in which the state is maintained. We need a predicate, and that predicate could be a method on that class. This will work, but it’s rather cumbersome.

The second insight is that lambdas can capture local variables from their enclosing lexical environment. These local variables cannot be mutated, but if they are references to mutable objects, those objects can be mutated. Thus we can write a higher-order function whose local variables contain references to the state objects, and we can have our higher-order function return a lambda that captures those locals and does its processing based on the captured, mutable state.

This function will want to take a keyExtractor function that’s used to derive a value from each stream element. Conceptually we’ll want a Set to keep track of values we’ve seen already.
However, in case our stream is run in parallel, we’ll want some thread-safe data structure. A ConcurrentHashMap is a simple way to do this, with each existing key representing membership in the set, and the value being a dummy object such as the empty string. (That’s how many Set implementations in the JDK work already.) Ideally we’d want to use an existing object as the dummy value and not create one each time. The empty string literal is used many times in the core JDK classes, so it’s certainly already in the constant pool.

Here’s what the code looks like:

public static <T> Predicate<T> distinctByKey(
        Function<? super T,Object> keyExtractor) {
    Map<Object,String> seen = new ConcurrentHashMap<>();
    return t -> seen.put(keyExtractor.apply(t), "") == null;
}

This is a bit subtle. This is intended to be used within a filter() operation, so we’re returning a lambda that’s a predicate that computes a boolean based on whether it’s seen the value before. This value is derived from the stream element by calling the key extractor function. The put() method returns the previous value in the map, or null if there was no value. That’s the case we’re interested in, so if it returns null, we want the predicate to return true just for this first time. Subsequent times it will return non-null, so we return false those times, so the filter operation won’t pass through the element in those cases. I had used putIfAbsent() at first, since it has first-time-only semantics, but it turns out to be unnecessary, and using put() makes the code a bit shorter.

Here’s how it’s used. Suppose we have a Book class that has fields for title and author, and the obvious constructor and getters, and we have a list of books that we want to process:

List<Book> list = Arrays.asList(
    new Book("This Side of Paradise", "F. Scott Fitzgerald"),
    new Book("The Beautiful and Damned", "F. Scott Fitzgerald"),
    new Book("The Great Gatsby", "F. Scott Fitzgerald"),
    new Book("Tender is the Night", "F. Scott Fitzgerald"),
    new Book("The Sound and the Fury", "William Faulkner"),
    new Book("Absalom, Absalom!", "William Faulkner"),
    new Book("Intruder in the Dust", "William Faulkner"),
    new Book("The Sun Also Rises", "Ernest Hemingway"),
    new Book("A Farewell to Arms", "Ernest Hemingway"),
    new Book("The Old Man and the Sea", "Ernest Hemingway"),
    new Book("For Whom the Bell Tolls", "Ernest Hemingway"),
    new Book("A Moveable Feast", "Ernest Hemingway")
);

If we wanted one book from each author, we could do this:

list.stream()
    .filter(distinctByKey(Book::getAuthor))
    .forEach(System.out::println);

The output from running this is:

Book[This Side of Paradise,F. Scott Fitzgerald]
Book[The Sound and the Fury,William Faulkner]
Book[The Sun Also Rises,Ernest Hemingway]

Since this is a sequential stream, the first book by each author is the one that ends up in the output. If we were to run this stream in parallel, it would still work “correctly” in that one book from each author would be output. However, which book is output would differ from run to run.

It takes a bit of tinkering to use higher-order functions to write stateful stream operations. You can see the evolution of my thinking by looking at my answer to a Stack Overflow question on this topic. I started out writing a class, but after chipping away at it a bit I realized a class was no longer necessary and a higher-order function could be used instead. This is a powerful technique that can be used to write all kinds of stateful stream operations. You just have to make sure to be careful they’re thread-safe.

## Knuth: Christmas Trees and Grand Theft Auto

Earlier this month I had the pleasure of attending Donald Knuth’s 20th annual Christmas Tree Lecture. (announcement, video) For me it was quite a bit of nostalgia. When I was a Computer Science student at Stanford, uh, a number of decades ago, of course I had to take a class from Knuth, who was (and still is) a Very Famous Computer Science Professor.
The class was nominally a computer science class, but it was mostly math. In fact, it was all math. I’m good at math, having completed several semesters of college-level math even after placing out of introductory calculus classes. However, I’m not really, really good at math, which is what you have to be to keep up with Knuth. As a result I spent most of that class wallowing in confusion. Knuth’s Christmas Tree lectures aren’t about Christmas trees, of course, but they are ostensibly about the Computer Science kind of trees, and they occur around Christmastime. Hence, Christmas Tree. But they aren’t so much about Computer Science as they are about math. So it was quite a familiar feeling for me to sit in a Stanford lecture hall, for the first time in decades, listening to Knuth give a lecture on math, with me wallowing in confusion most of the time. # ∞ A few years after I left Stanford, Knuth (along with Ronald Graham and Oren Patashnik) published Concrete Mathematics: For some reason I was compelled to buy it. I probably bought it because it must be a Very Important Computer Science Book because it has the name of a Very Famous Computer Science Professor on it. So I bought it and flipped through it, but it was mostly about math, and I’m good at math but not that good, so I eventually shelved it and didn’t look at it for a long time. # ∞ A number of years later (perhaps April, 2008) I was quite into playing Grand Theft Auto: Vice City. Yes, this is that notorious game where you’re the criminal and you do a variety of nasty things. One of the different things you can do is to steal a police car and run “Vigilante Missions” where you chase down and kill criminals. You start off at the first level where the goal is to kill one criminal, and you get a reward of$50. At the second level, you have to kill two criminals, and the reward is $200. At level three, there are three criminals, and the reward is$450. 
At level four the reward is $800, and at level five the reward is$1,250. The rewards keep rising at each level, even though the number of criminals maxes out after a while.

The reward given at each level isn’t necessarily obvious. After playing for a while, I paused the game to see if there was any information about this on the internet. Various gaming guide and FAQ authors have gone to the trouble of documenting the reward amounts. For example, the Grand Theft Auto: Vice City Vigilante Reward FAQ lists the reward for each level, as well as the cumulative reward, for every level up to level 1,000! Furthermore, it gives a formula of sorts for computing the reward amount for each level:

You take the level you are at, subtract one, multiply by 100, add 50 and add that to the previous level’s reward to compute the level you want.

(edited for clarity)

Stated mathematically, this is a recurrence relation where the reward for level n is given as follows:

$R_n = 100(n-1) + 50 + R_{n-1}$

This can be simplified a bit:

$R_n = 100n - 50 + R_{n-1}$

It’s pretty easy to see that each level k adds a reward of $100k - 50$ dollars, so this can be expressed as a summation:

$R_n = \sum\limits_{k=1}^{n}(100k - 50)$

This is pretty easy to simplify to closed form:

\begin{aligned} \\ R_n &= \sum\limits_{k=1}^{n}(100k - 50) \\ &= 100 \sum\limits_{k=1}^{n} k - \sum\limits_{k=1}^{n} 50 \\ \end{aligned}

That first summation term is simply the sum of integers from 1 to n so it can be replaced with the well-known formula for that sum:

\begin{aligned} \\ R_n &= 100\frac{n(n+1)}{2} - 50n \\ &= 50n^2 \end{aligned}
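If you don’t trust the algebra, it’s easy to check numerically. Here’s a minimal sketch that compares the recurrence against the closed form. The class and method names are made up for illustration, and it assumes a base case of $R_0 = 0$, which is what makes the level 1 reward come out to $50.

```java
// Hypothetical helper for illustration; not taken from the game or the FAQ.
final class VigilanteReward {

    // Recurrence: R_n = 100n - 50 + R_(n-1), assuming R_0 = 0.
    static long rewardByRecurrence(int level) {
        long reward = 0;
        for (int k = 1; k <= level; k++) {
            reward += 100L * k - 50;
        }
        return reward;
    }

    // Closed form derived above: R_n = 50n^2.
    static long rewardClosedForm(int level) {
        return 50L * level * level;
    }

    public static void main(String[] args) {
        // Print both values side by side for the first five levels.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("level %d: recurrence=%d closed form=%d%n",
                    n, rewardByRecurrence(n), rewardClosedForm(n));
        }
    }
}
```

For levels 1 through 5 both methods should agree with the rewards quoted earlier: $50, $200, $450, $800, and $1,250.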

OK, pretty simple. But this is just the reward for a given level n. What about the cumulative reward for having completed all levels 1 through n? Now this is a bit more interesting. Let’s define the cumulative reward for level n as follows:

\begin{aligned} \\ S_n &= \sum\limits_{k=1}^{n} 50k^2 \\ \end{aligned}

I don’t know how the guy made the table for levels up to 1,000. He might have used a spreadsheet, or he might have even played out all 1,000 levels and painstakingly kept track of the reward and cumulative reward at each level. (That would be entirely believable; gamers are nothing if not compulsive.) Looking at the “formula” he gave for computing the reward at each level, I’m quite certain he didn’t compute a closed form for the cumulative reward at each level. Clearly, it was important for me to figure that out.

At this point I had forgotten the game and was busy filling up sheets of paper with equations.

Deriving a closed form for this should be simple, right? Just perturb the indexes:

\begin{aligned} \\ S_n &= \sum\limits_{k=1}^{n} 50k^2 \\ S_{n+1} &= \sum\limits_{k=1}^{n+1} 50k^2 \\ S_n + 50(n+1)^2 &= 50 + \sum\limits_{k=2}^{n+1}50k^2 \\ &= 50 + \sum\limits_{k=1}^{n}50(k+1)^2 \\ &= 50 + \sum\limits_{k=1}^{n}50(k^2 + 2k + 1) \\ &= 50 + \sum\limits_{k=1}^{n}50k^2 + \sum\limits_{k=1}^{n} 100k + \sum\limits_{k=1}^{n} 50 \\ &= 50 + S_n + 100 \sum\limits_{k=1}^{n} k + 50n \\ \end{aligned}

Oh crap. The $S_n$ terms just cancel out. That’s what we’re trying to solve for. I tried several different techniques but nothing I tried worked. I was stumped. Hunting around on the web didn’t give me any helpful ideas, either.

At this point I couldn’t finish the game, because I just had to know how to compute the cumulative reward for completing n levels of the mission. I mean, this was important, right?

Swimming from the depths of my memory came a realization that somewhere I might have a book that would have some helpful information about solving sums and recurrences like this. I quickly found and pulled Concrete Mathematics from the shelf, blew the dust off, and curled up on the couch and started reading. Sure enough, right there at the start of section 2.5, General Methods, there’s a discussion of how to use several techniques to find the sum of the first n squares. (Page 41, equation 2.37) Great! How did Knuth et al. solve it?

Method 0 is simply to look it up. There’s a well-known formula for this available in many reference books such as the CRC Handbook. But that’s cheating. I needed to know how to derive it. I kept reading.

Method 1 is to guess the answer and prove it by induction. I’m really bad at guessing stuff like this. I kept reading.

Method 2 is to perturb the sum. Aha! This is what I tried to do. Where did I go wrong? Turns out they did the same thing I did and ran into exactly the same problem — the desired sum cancels out. But they had one additional bit of insight: in attempting to find the sum of n squares, they ended up deriving the sum of the first n integers. Huh? Let’s pick up where we left off:

\begin{aligned} \\ S_n + 50(n+1)^2 &= 50 + S_n + 100 \sum\limits_{k=1}^{n} k + 50n \\ 50n^2 + 50n &= 100 \sum\limits_{k=1}^{n} k \\ \frac{n(n+1)}{2} &= \sum\limits_{k=1}^{n} k \\ \end{aligned}

Knuth et al. continue by conjecturing that, if we tried to perturb the sum of cubes, a closed form for the sum of squares might just fall out. Let’s try that!

\begin{aligned} \\ T_n &= \sum\limits_{k=1}^{n} k^3 \\ T_{n+1} &= \sum\limits_{k=1}^{n+1} k^3 \\ T_n + (n+1)^3 &= 1 + \sum\limits_{k=2}^{n+1} k^3 \\ &= 1 + \sum\limits_{k=1}^{n} (k+1)^3 \\ &= 1 + \sum\limits_{k=1}^{n} (k^3 + 3k^2 + 3k + 1) \\ &= 1 + \sum\limits_{k=1}^{n} k^3 + 3 \sum\limits_{k=1}^{n} k^2 + 3 \sum\limits_{k=1}^{n} k \ + \sum\limits_{k=1}^{n} 1 \\ &= 1 + T_n + 3 \sum\limits_{k=1}^{n} k^2 + \frac{3}{2}n(n+1) + n \end{aligned}

OK, there’s our sum of squares term. Canceling $T_n$ and simplifying,

\begin{aligned} \\ n^3 + 3n^2 + 3n &= 3 \sum\limits_{k=1}^{n} k^2 + \frac{3}{2}n^2 + \frac{3}{2}n + n \\ n^3 + \frac{3}{2}n^2 + \frac{1}{2}n &= 3 \sum\limits_{k=1}^{n} k^2 \\ \frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n &= \sum\limits_{k=1}^{n} k^2 \\ \frac{2n^3 + 3n^2 + n}{6} &= \sum\limits_{k=1}^{n} k^2 \\ \frac{n(n+1)(2n+1)}{6} &= \sum\limits_{k=1}^{n} k^2 \\ \end{aligned}

And there we have it. This matches the formula given in the CRC Handbook. The solution to my particular problem, the cumulative reward for Vigilante Mission levels, is simply 50 times this quantity.
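Written out, that’s $S_n = 50 \cdot \frac{n(n+1)(2n+1)}{6}$. Here’s a small sketch that checks this against a brute-force sum of the per-level rewards. Again, the class and method names are just ones I’ve made up for illustration.

```java
// Hypothetical helper for illustration; not taken from the game or the FAQ.
final class VigilanteCumulative {

    // Brute force: add up the per-level rewards 50k^2 for k = 1..level.
    static long cumulativeBySum(int level) {
        long total = 0;
        for (int k = 1; k <= level; k++) {
            total += 50L * k * k;
        }
        return total;
    }

    // Closed form: S_n = 50 * n(n+1)(2n+1) / 6.
    // The product n(n+1)(2n+1) is always divisible by 6, so the
    // integer division is exact.
    static long cumulativeClosedForm(int level) {
        long n = level;
        return 50L * (n * (n + 1) * (2 * n + 1) / 6);
    }

    public static void main(String[] args) {
        // Print both values side by side for the first five levels.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("levels 1..%d: sum=%d closed form=%d%n",
                    n, cumulativeBySum(n), cumulativeClosedForm(n));
        }
    }
}
```

As a quick check by hand, completing levels 1 through 5 pays 50 + 200 + 450 + 800 + 1,250 = 2,750 dollars, and the closed form gives 50 · (5 · 6 · 11) / 6 = 2,750 as well.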

# ∞

What does this have to do with Christmas trees? Not much, but it does have a lot to do with Donald Knuth. It was good to see his lecture, even if I didn’t understand a lot of it. But I did apparently pick up something from his class, and I was able to figure out how to solve a problem (even if it was a silly one) with help from one of his books. And the equations in this article use the WordPress plugin for LaTeX, which is built on TeX, which of course originally came from Knuth. So here’s to you, Don, have a happy new year!