Collections

Introduction

Collections are objects that group other objects according to a conceptual scheme. For example, a card deck is a collection of cards; a dictionary is a collection of words. Collection is another term for data structures. They are used to store, manipulate and retrieve aggregate data. We will distinguish collections in the following ways:
  • Linear   (arrays, lists, stacks, queues)
  • Hierarchical   (various kinds of trees)
  • Graphs
  • In a linear collection each element (except the first and last) has a unique predecessor and successor.

    In a hierarchical collection, each element (except one, called the root) has a unique predecessor and zero or more successors. In the next picture, the root is in red

    In a graph, each element has many predecessors and successors.

    Java collections (called the Java Collections Framework) is a set of interfaces and generified classes found in the package java.util. They are defined through three major interfaces: List, Set and Map:

     Interfaces   Implementations 
     List   LinkedList, ArrayList 
     Set   HashSet, TreeSet 
     Map   HashMap, TreeMap, Hashtable 

     

    Sets are collections in which the number of occurrences of any object is at most one. There is a special kind of a Set called a SortedSet, which maintains its elements in sorted order.

    Lists (or Sequences) are ordered collections that may contain duplicate elements, and have elements that are accessible via array-like indexes.

    Maps are collections of pairs (key, value). In other words, it maps keys to values. A Map cannot contain duplicate keys; and each key can map to at most one value. An example of a Map is a dictionary where words are the keys and definitions are the values.

    The idea behind sets and maps is much older than the Java. They are important tools in mathematics and computer science and their properties are well understood. Most of you probably learned some elements of set theory in elementary school.

    As an example of working with Java's collection classes, we consider the Lotto 6/49 lottery game. In the code below we shuffle an array of 49 numbers, randomly choose 6 of them, sort them in ascending order and finally print them:

    ArrayList num = new ArrayList(49);
    for (int i = 1; i <= 49; i++) num.add(i);
    Collections.shuffle(num);
    List<Integer> win = num.subList(0, 6);
    Collections.sort(win);
    System.out.println(win);
    

    As a user of the collection classes, you do not need to know a lot about data structures, but you do need to know what kinds of operations you can perform on collections. Here is a list of the most common operations

  • Searching
  • Deletion
  • Insertion
  • Traversal
  • Sorting
  • Cloning
  • How would one support all these operations over different data types?

    Generics

    Generic types (or generics) are a powerful tool to write reusable object-oriented components. Their full understanding requires the knowledge of the type theory of programming languages. In this chapter, we will take a brief look at generic programming in Java. Generic programming refers to writing code that will work for many types of data.

    A class (or an interface) can be declared to take one or more parameters. Parameters are references and written in angle brackets. In the code below we create an ArrayList of Integers and then add a number to it:

    ArrayList<Integer> nums = new ArrayList<Integer> ();
    nums.add(1);
    Integer x = nums.get(0);
    

    With generics we restrict the list to contain a particular data type. We say that ArrayList takes a type parameter, which in our case is Integer. Without generics, the type parameters are omitted, but you must explicitly cast whenever an element is extracted from the list.

    ArrayList nums = new ArrayList ();
    nums.add(1);
    Integer x = (Integer) nums.get(0);
    

    Generics implicitely perform the same cast that is performed without generics. What is the advantage of this? You get typechecking at compile-time. Generics provide compile-time safety - when you accidentally mistake the type, you will get a compile-time error.

    You can create your own generic types

    public class Box<AnyType>
    {
       private ArrayList<AnyType> data;
       public Box()
       {
          data = new ArrayList<AnyType>();
       }
       public void add(AnyType x)
       {
          data.add(x);
       }
       public AnyType remove()
       {
          if (data.size() > 0)
             return data.remove(0);
          else
             return null;
       }
    }
    

    Here AnyType is a formal type parameter and does not have any special meaning. In the invocation (runtime), all occurrences of the formal type parameter (AnyType in this case) are replaced by the actual type. Generics are not available at runtime! This is somewhat similar to method's formal value parameters. When a method is invoked, actual arguments are substituted for the formal parameters,

    Learning to use generic types can get very complicated! As an example, Java does not permit the construction of generic arrays. Suppose, for example, that you want to design a class that has an array field

    public class Box<AnyType>
    {
       private AnyType[] contents = new AnyType[5];
        ...
    }
    

    It's hard to understand, but this is illegal! A workaround is to construct an array of objects with a type-cast:

    public class Box<AnyType>
    {
       private AnyType[] stack = (AnyType[]) new Object[5];
        ...
    }
    

    Nevertheless, this will generate a compiler warning about an unchecked type conversion. There is no way to get rid of that warning as to use annotations:

    @SuppressWarnings("unchecked")
    public class Box<AnyType>
    {
       private AnyType[] stack = (AnyType[]) new Object[5];
        ...
    }
    

    Learning generics be prepared for a bit of head-scratching along the way.

    Iterator

    The iterators are used to provide a uniform way of accessing collection elements sequentially without exposing its underlying representation. Whatever kind of collection you are dealing with, you always know how to traverse its elements. An iterator is an object that implements the Iterator interface

    public interface Iterator<AnyType>
    {
        boolean hasNext();
        AnyType next();
        void remove();
    }
    

    In this code example we fill ArrayList with integers and sum it up by using an iterator:

    ArrayList<Integer> num = new ArrayList<Integer>();
    for(int i = 0; i < 5; i++) num.add(i);
    
    Iterator<Integer> itr = num.iterator();
    int sum = 0;
    while (itr.hasNext()) sum += itr.next();
    
    System.out.println(sum);
    

    Think of an iterator as another object that contains objects available for accessing in a linear order. The picture below illustrates dependencies between objects

    As you see from the picture, the ArrayList object now has two references and therefore its state can be changed by using either of them. Eventually, using two references might lead to unforseen effects beetwen ArrayList and Iterator objects. For example, changes in the ArrayList are unvisible for the Iterator. Such changes are called concurrent modification. Iterator implementation does not permit changes in the object while it is iterating over that object. In particular, you cannot remove directly from the ArrayList object:

    Iterator<Integer> itr = num.iterator();
    while (itr.hasNext())
       if(itr.next() % 2 == 0) num.remove();
    

    This causes the ConcurrentModificationException.

    foreach loop

    Java 1.5 introduced a new loop idiom called for-each, though it is written using the keyword for. In the for-each loop, you don't declare the iterator - the compiler does this for you implicitly

    ArrayList<Integer> num = new ArrayList<Integer>();
    for(int i = 0; i < 5; i++) num.add(i);
    int sum = 0;
    
    for(Integer x : num) sum += x
    System.out.println(sum);
    

    The loop above reads as "for each Integer x in collection num." The for-each construct combines nicely with generics - it preserves all of the type safety.

    The foreach loop can be appplied to any collections or class that implements interface Iterable.

    public interface Iterable<AnyType>
    {
        Iterator<AnyType> iterator();
    }
    

    It also can be applied to an array

    int[] a = {1, 2, 3, 4, 5};
    sum = 0;
    for(int x : a)	sum += x;
    System.out.println(sum);
    

    You cannot use the for-each construct everywhere. Since the for-each loop hides the iterator, you cannot remove items. Similarly it is not usable for loops where you need to replace elements in a collection as you traverse it. The folloing code will result in ConcurrentModificationException.

    for( Integer x : lst )
       if( x % 2 == 0 ) lst.remove(x);