Java Collections API Design FAQ |
This is the most controversial design decision in the whole API. Clearly, static (compile time) type checking is highly desirable, and is the norm in Java. We would have supported it if we believed it were feasible. Unfortunately, attempts to achieve this goal cause an explosion in the size of the interface hierarchy, and do not succeed in eliminating the need for runtime exceptions (though they reduce it substantially).
Doug Lea, who wrote a popular Java collections package that did reflect mutability distinctions in its interface hierarchy, no longer believes it is a viable approach, based on user experience with his collections package. In his words (from personal correspondence) "Much as it pains me to say it, strong static typing does not work for collection interfaces in Java."
To illustrate the problem in gory detail, suppose you want to add the notion of modifiability to the Hierarchy. You need four new interfaces: ModifiableCollection, ModifiableSet, ModifiableList, and ModifiableMap. What was previously a simple hierarchy is now a messy heterarchy. Also, you need a new Iterator interface for use with unmodifiable Collections, that does not contain the remove operation. Now can you do away with UnsupportedOperationException? Unfortunately not.
Consider arrays. They implement most of the List operations, but not remove and add. They are "fixed-size" Lists. If you want to capture this notion in the hierarchy, you have to add two new interfaces: VariableSizeList and VariableSizeMap. You don't have to add VariableSizeCollection and VariableSizeSet, because they'd be identical to ModifiableCollection and ModifiableSet, but you might choose to add them anyway for consistency's sake. Also, you need a new variety of ListIterator that doesn't support the add and remove operations, to go along with unmodifiable List. Now we're up to ten or twelve interfaces, plus two new Iterator interfaces, instead of our original four. Are we done? No.
Consider logs (such as error logs, audit logs and journals for recoverable data objects). They are natural append-only sequences, that support all of the List operations except for remove and set (replace). They require a new core interface, and a new iterator.
And what about immutable Collections, as opposed to unmodifiable ones? (i.e., Collections that cannot be changed by the client AND will never change for any other reason). Many argue that this is the most important distinction of all, because it allows multiple threads to access a collection concurrently without the need for synchronization. Adding this support to the type hierarchy requires four more interfaces.
Now we're up to twenty or so interfaces and five iterators, and it's almost certain that there are still collections arising in practice that don't fit cleanly into any of the interfaces. For example, the collection-views returned by Map are natural delete-only collections. Also, there are collections that will reject certain elements on the basis of their value, so we still haven't done away with runtime exceptions.
When all was said and done, we felt that it was a sound engineering compromise to sidestep the whole issue by providing a very small set of core interfaces that can throw a runtime exception.
It was never our intention that programs should catch these exceptions: that's why they're unchecked (runtime) exceptions. They should only arise as a result of programming errors, in which case, your program will halt due to the uncaught exception.
The Collection interface provides this functionality. We are not providing any public implementations of this interface, as we think that it wouldn't be used frequently enough to "pull its weight." We occasionally return such Collections, which are implemented easily atop AbstractCollection (for example, the Collection returned by Map.values).
We are extremely sympathetic to the desire for type-safe collections. Rather than adding a "band-aid" to the framework that enforces type-safety in an ad hoc fashion, the framework has been designed to mesh with all of the parameterized-types proposals currently being discussed. In the event that parameterized types are added to the language, the entire collections framework will support compile-time type-safe usage, with no need for explicit casts. Unfortunately, this won't happen in the the 1.2 release. In the meantime, people who desire runtime type safety can implement their own gating functions in "wrapper" collections surrounding JDK collections.
While the names of the new collections methods do not adhere to the "Beans naming conventions", we believe that they are reasonable, consistent and appropriate to their purpose. It should be remembered that the Beans naming conventions do not apply to the JDK as a whole; the AWT did adopt these conventions, but that decision was somewhat controversial. We suspect that the collections APIs will be used quite pervasively, often with multiple method calls on a single line of code, so it is important that the names be short. Consider, for example, the Iterator methods. Currently, a loop over a collection looks like this:
for (Iterator i = c.iterator(); i.hasNext(); ) System.out.println(i.next());Everything fits neatly on one line, even if the Collection name is a long expression. If we named the methods "getIterator", "hasNextElement" and "getNextElement", this would no longer be the case. Thus, we adopted the "traditional" JDK style rather than the Beans style.
Many Collection implementations (including all of the ones provided by the JDK) will have a public clone method, but it would be mistake to require it of all Collections. For example, what does it mean to clone a Collection that's backed by a terabyte SQL database? Should the method call cause the company to requisition a new disk farm? Similar arguments hold for serializable.
If the client doesn't know the actual type of a Collection, it's much more flexible and less error prone to have the client decide what type of Collection is desired, create an empty Collection of this type, and use the addAll method to copy the elements of the original collection into the new one.
This is what is referred to as an "Internal Iterator" in the "Design Patterns" book (Gamma et al.). We considered providing it, but decided not to as it seems somewhat redundant to support internal and external iterators, and Java already has a precedent for external iterators (with Enumerations). The "throw weight" of this functionality is increased by the fact that it requires a public interface to describe upcalls.
It's easy to implement this functionality atop Iterators, and the resulting code may actually look cleaner as the user can inline the predicate. Thus, it's not clear whether this facility pulls its weight. It could be added to the Collections class at a later date (implemented atop Iterator), if it's deemed useful.
Because we don't believe in using Enumerations (or Iterators) as "poor man's collections." This was occasionally done in prior releases, but now that we have the Collection interface, it is the preferred way to pass around abstract collections of objects.
Again, this is an instance of an Enumeration serving as a "poor man's collection" and we're trying to discourage that. Note however, that we strongly suggest that all concrete implementations should have constructors that take a Collection (and create a new Collection with the same elements).
The semantics are unclear, given that the contract for Iterator makes no guarantees about the order of iteration. Note, however, that ListIterator does provide an add operation, as it does guarantee the order of the iteration.
People were evenly divided as to whether List suggests linked lists. Given the implementation naming convention, <Implementation><Interface>, there was a strong desire to keep the core interface names short. Also, several existing names (AbstractSequentialList, LinkedList) would have been decidedly worse if we changed List to Sequence. The naming conflict can be dealt with by the following incantation:
import java.util.*; import java.awt.*; import java.util.List; // Dictates interpretation of "List"
It was decided that the "set/get" naming convention was strongly enough enshrined in the language that we'd stick with it.
This was by design. We feel that mappings are not collections and collections are not mappings. Thus, it makes little sense for Map to extend the Collection interface (or vice versa).
If a Map is a Collection, what are the elements? The only reasonable answer is "Key-value pairs", but this provides a very limited (and not particularly useful) Map abstraction. You can't ask what value a given key maps to, nor can you delete the entry for a given key without knowing what value it maps to.
Collection could be made to extend Map, but this raises the question: what are the keys? There's no really satisfactory answer, and forcing one leads to an unnatural interface.
Maps can be viewed as Collections (of keys, values, or pairs), and this fact is reflected in the three "Collection view operations" on Maps (keySet, entrySet, and values). While it is, in principle, possible to view a List as a Map mapping indices to elements, this has the nasty property that deleting an element from the List changes the Key associated with every element before the deleted element. That's why we don't have a map view operation on Lists.
We view the method names for Enumeration as unfortunate. They're very long, and very frequently used. Given that we were adding a method and creating a whole new framework, we felt that it would be foolish not to take advantage of the opportunity to improve the names. Of course we could support the new and old names in Iterator, but it doesn't seem worthwhile.
It can be implemented atop the current Iterators (a similar pattern to java.io.PushbackInputStream). We believe that its use would be rare enough that it isn't worth including in the interface that everyone has to implement.
If you examine the goals for our Collections framework (in the Overview), you'll see that we are not really "playing in the same space" as JGL. Quoting from the "Design Goals" Section of the Java Collections Overview: "Our main design goal was to produce an API that was reasonably small, both in size, and (more importantly) in 'conceptual weight.'"
JGL consists of approximately 130 classes and interfaces; its main goal was consistency with the C++ Standard Template Library (STL). This was not one of our goals. Java has traditionally stayed away from C++'s more complex features (e.g., multiple inheritance, operator overloading). Our entire framework, including all infrastructure, contains approximately 25 classes and interfaces.
While this may cause some discomfort for some C++ programmers, we feel that it will be good for Java in the long run. As the Java libraries mature, they inevitably grow, but we are trying as hard as we can to keep them small and manageable, so that Java continues to be an easy, fun language to learn and to use.
Given that we provide core collection interfaces behind which programmers can "hide" their own implementations, there will be aliased collections whether the JDK provides them or not. Eliminating all views from the JDK would greatly increase the cost of common operations like making a Collection out of an array, and would do away with many useful facilities (like synchronizing wrappers). One view that we see as being particularly useful is List.subList. The existence of this method means that people who write methods taking List on input do not have to write secondary forms taking an offset and a length (as they do for arrays).
Primarily, resource constraints. If we're going to commit to such an API, it has to be something that works for everyone, that we can live with for the long haul. We may provide such a facility some day. In the meantime, it's not difficult to implement such a facility on top of the public APIs.
Copyright ©
1997-1999
Sun Microsystems, Inc., 901 San Antonia Ave., Palo Alto, CA 94303 USA.
All rights reserved.
Please send comments to: collections-comments@java.sun.com |
Java Software |