Occasionally I see mention of a “fail-safe” Iterator in Java. What does this mean?
The context here is what happens when a collection is modified while it’s being iterated, or “concurrent modification” for short. Each collection defines its own policy for handling concurrent modification. Note that the issue is when the collection is modified directly, from “outside” the iterator. Modifying the collection via the iterator is generally permitted, with some specific exceptions.
The term “fail-safe” is not used anywhere in the Java SE specification. Instead, the specification describes four different policies for concurrent modification: fail-fast, weakly consistent, snapshot, and undefined.
1. fail-fast
Most of the non-concurrent collections in java.util
have fail-fast iterators. See the ArrayList specification for example:
The iterators returned by this class’s
iterator
andlistIterator
methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator’s ownremove
oradd
methods, the iterator will throw aConcurrentModificationException
. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
2. weakly consistent
Most of the concurrent collections in java.util.concurrent
, such as ConcurrentHashMap
, are weakly consistent. The definition from the java.util.concurrent package specification is:
Most concurrent Collection implementations (including most Queues) also differ from the usual
java.util
conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
- they may proceed concurrently with other operations
- they will never throw
ConcurrentModificationException
- they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
3. snapshot
A third policy is snapshot semantics. The main example of this is CopyOnWriteArrayList:
All mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
The “snapshot” style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw
ConcurrentModificationException
. The iterator will not reflect additions, removals, or changes to the list since the iterator was created.
4. undefined
The fourth policy, if it can be called a policy, is that nothing is specified at all. The results of modifying a collection during iteration are undefined and may result in inconsistencies. The examples here are the legacy collections Vector
and Hashtable
and their methods that return Enumeration
, including Vector.elements
, Hashtable.elements
, and Hashtable.keys
.
If you iterate an Enumeration
from the Vector.elements
method, it’s not difficult to get odd behaviors such as elements being skipped, elements appearing twice in the iteration, or getting an exception unexpectedly.
As an aside, when the Collections Framework was introduced in JDK 1.2, Vector
and Hashtable
were retrofitted to implement the new interfaces. For example, if you have a Vector
, you can get an Enumeration
using the Vector.elements
method, or you can get an Iterator
using the Vector.iterator
method. The Iterator
has a fail-fast policy, while the Enumeration
does not. Hashtable
also provides both an Enumeration
and an Iterator
, and only the latter has the fail-fast policy. What’s interesting is that Hashtable
‘s Iterator
and Enumeration
are implemented by the same class which has a flag that determines whether it should behave as an Iterator
or Enumeration
.
∞
After all this, where does “fail-safe” come into the picture? Answer: it doesn’t. The words “fail-safe” are never used in the Java SE specifications that describe the concurrent modification policy of a collection. As such, there is no reliable, consistent definition of “fail-safe” for an Iterator. One can attempt to apply the general concept of “fail-safety” to an Iterator, but this is open to varying, misleading, and even contradictory interpretations.
Don’t use “fail-safe” to describe a Java Iterator. Instead, use one of the documented policies listed above.
Hi Stuart, if you may please explain, with an example, the following piece under weakly consistent section:
“they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.”
From my understanding if an Iterator is iterating a collection and, say, an element got deleted by any other agency, then it is possible that:
1. Iterator would still iterate it or,
2. The removal of the element would be “visible” to the Iterator and consequently, the element would not be iterated.
Sincerely,
Nawazish
Hi Nawazish, your understanding is correct; either of those possibilities might occur. Here’s an example.
Suppose you have a ConcurrentHashMap that contains keys A, B, C, D, and E. You start iterating the keySet(), and the iterator has seen A, B, and then C so far. Then, another thread deletes D from the map. What elements does the iterator see when it continues? It might either see D and E, or it might see just E. You can’t guarantee which behavior will occur, and in fact it might differ from run to run. That’s the “weak” part of “weakly consistent.”
So, what’s consistent about this? The consistency properties guarantee that an element isn’t repeated, and that an element isn’t skipped if it’s present in the map during the entire iteration.
Hi Stuart,
Thanks for your invaluable comment and the time you gave to reply me.
Just to further enhance my understanding, as the weakly consistent specification says, “…as they existed upon construction”. So what happens during the “construction” of an Iteration.
After reading your post, spec and your comment I could understand that, while the Iterator is being constructed, it will “see” *all* of the (key) Set’s elements which ever was visible at that moment.
And then, according to the consistency guarantees, (i) the Iterator will not repeat an iterated element again and (ii) element, if present, will not be skipped.
Sincerely,
Nawazish
Hi Nawazish, the phrase “as they existed upon construction” sounds like it’s saying that construction of the Iterator makes a snapshot, but that’s not the case. It’s saying that the elements as they existed upon construction of the Iterator are those that the Iterator has the *potential* to see. If no concurrent modifications occur, the Iterator will see all of them. If an element is deleted during iteration, the iteration might or might not see that element; and if an element is added, the iteration might or might not see that element either. And yes, elements present during the entire iteration won’t be skipped or repeated.
Hi Stuart,
Thanks a lot!!!
I humbly and yet strongly request that you should write books explaining Java (including its specification); you are awesome in your thought clarity. Hats off to you Sir!
Sincerely,
Nawazish
Hi Stuart, thank you for posting such an awsome article.
Since I felt this article would be quite helpful for Japanese JAVA developers, can I translate the whole paragraphs of this article and post to Qiita, which is a Japanese blog service for engineer?
Sincerely,
chooyan
Hi chooyan, thanks for taking the time to translate this. Yes, please do go ahead and translate some of the text, but please make sure to link back to the original article. Note also that this article contains quotations from the Java SE specifications, so if you translate those parts, please also make sure to link to those as well. I’ve included links to the source of any specifications I’ve quoted. Thanks!
Hi Stuart, thank you for allowing my translating!
Now I start to translate this article with suitable links
I’ll let you know when I post the translation to Qiita.
Sincerely,
chooyan
Hi Stuart.
I have posted an article of translation to Qiita just now. The link is below.
http://qiita.com/chooyan_eng/items/7f0d853844fee453325d
If you have time, could you check the article whether the consideration, such as link back or quotation, is appropriately included? (The article is written in Japanese, though…)
Thank you very much for giving me a chance to translate such an invaluable article!
Sincerely,
chooyan
[…] Ссылка: https://stuartmarks.wordpress.com/2016/07/27/there-is-no-such-thing-as-a-fail-safe-iterator-in-java/ […]
Thanks, nice explanation.
great explanation and very useful.
hi,
Thanks for the material first.
I have one question regarding the CopyOneWriteArrayList. How come the COW iterator are guaranteed to see the “snapshot” through it’s lifetime? I mean what if a write operation completes and pushes the changes by calling setArray() right before/after it exits it’s synchronized block as another thread was still iterating over it?
(Hi, sorry for the long delay in approving and replying to this comment. I missed the notification.)
Consider a simple operation on COWAL like add():
http://hg.openjdk.java.net/jdk/jdk15/file/jdk-15-ga/src/java.base/share/classes/java/util/concurrent/CopyOnWriteArrayList.java#l428
It does the following: 1) takes the lock, 2) gets the array, 3) does some manipulation, 4) sets the array, and finally 5) releases the lock. (All the mutator methods pretty much follow the same schema.)
The key here is that in step 3), the data is copied from the old array into a new array, the new array is then modified as necessary, and the new array is set into the “array” field using setArray(). (It is called “copy-on-write” after all.) 🙂 Once the array has been set into the “array” field, the array itself is never modified again after that. Also, the “array” field is volatile, which means that writing to it guarantees that all previous writes in this thread “happen-before” the volatile write. This is ensures proper memory visibility.
Any operation that does reading does a getArray(), which does is a volatile read from the “array” field. This ensures that the volatile read happens-before reads from the array, ensuring that they are up-to-date. Since (per the above) the array is never modified after it’s been set into the “array” field, any read operations are over a stable snapshot. If for example some thread is iterating the array and another thread performs a write operation, that operation operates on a new and different array, which doesn’t disturb any iterations already in-progress over the old array.