Bad programmers worry about the code. Good programmers worry about data structures and their relationships. — Linus Torvalds
The Java collections framework is a set of classes and interfaces for implementing a commonly used collection data structure. Examples include lists, queues, etc. Joshua Bloch designed and developed the collections framework, which debuted in JDK 1.2.
Because these collections are well tested, we do not have to implement every algorithm and data structure from scratch.
Almost all collections except the map can be derived from the java.util.Collection interface. The toArray() method converts any Collection into a one-dimensional array. We can use the for-each loop because the Collection interface extends the java.lang.Iterable interface.
The data structure allows us to store and manipulate data efficiently. With proper data structure selection, we can make algorithms faster. Without the proper data structure (PriorityQueue), we cannot solve Dijkstra’s shortest path problem in O((E+V)LogV) rather it will take O(V²) when List is used.
Therefore, data structures have a significant impact on the final performance of the software. Let’s examine them in more detail.
List
The List interface is an ordered collection that allows us to store and access items in a sequential manner.
It extends the Collection interface which extends the Iterable interface.
Lists can include duplicate elements.
It gives the user full visibility and control over the ordering of its elements.
ArrayList
Resizable-array implementation of the List interface.
This class is roughly equivalent to Vector, except that it is unsynchronized.
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. As elements are added to an ArrayList, its capacity grows automatically.
Random access with index takes O(1) time complexity.
Insertion/deletion at the given index takes O(N) time complexity.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Doubly-linked list implementation of the List and Deque interfaces.
This implementation is not synchronized
Since the items are not stored next to each other in the memory, so there is no random access.
Manipulation (insertion/deletion) of the first & last item will take O(1) time complexity.
Manipulating an arbitrary item will O(N) time complexity.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Let’s see the performance of ArrayList and LinkedList when insertion happens at the start of the list.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Output on my system Time taken by ArrayList : 519 Time taken by LinkedList : 4
LinkedList performance is way better than ArrayList as no data shift happened.
Vector
Vector is similar to ArrayList but it’s synchronized.
If a thread-safe implementation is not needed, it is recommended to use ArrayList in place of Vector.
Stack
The Stack class represents a last-in-first-out (LIFO) stack of objects.
It is an abstract data type and it can be implemented either with an ArrayList or with a LinkedList.
Basic operations are push(), pop() and peek().
A more complete and consistent set of LIFO stack operations is provided by the Deque interface and its implementations, which should be used in preference to this class.
Graph algorithms rely heavily on stacks such as depth-first search.
Finding Eulerial cycles in a G(V, E) graph.
Finding strongly connected components in a given G(V, E) graph.
Processing function call.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Stack extends the Vector class which means that stacks are inherently synchronized. However, if synchronization is not needed, in that case, prefer using ArrayDeque.
Queues
A collection designed for holding elements prior to processing.
It is an abstract data type and it can be implemented either with an ArrayList or a LinkedList.
Queues typically, but do not necessarily, order elements in a FIFO (first-in-first-out) manner. Among the exceptions are priority queues, which order elements according to a supplied comparator, or the elements’ natural ordering.
Graph algorithm breadth-first search use queue as an underlying abstract data type.
Threads are stored in queues.
CPU scheduling uses queues.
Besides basic Collection operations, queues provide additional insertion, extraction, and inspection operations. Each of these methods exists in two forms: one throws an exception if the operation fails, and the other returns a special value.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used.
A priority queue does not permit null elements.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A linear collection that supports element insertion and removal at both ends. The name deque is short for “double-ended queue”.
This interface defines methods to access the elements at both ends of the deque.
This interface extends the Queue interface. When a deque is used as a queue, FIFO (First-In-First-Out) behavior results. Elements are added at the end of the deque and removed from the beginning.
Deques can also be used as LIFO (Last-In-First-Out) stacks. This interface should be used in preference to the legacy Stack class. When a deque is used as a stack, elements are pushed and popped from the beginning of the deque.
ArrayDeque
Resizable-array implementation of the Deque interface.
They are not thread-safe.
Null elements are prohibited.
This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Because Stack is synchronized as it extends the Vector class this is why it is going to be slower than the ArrayDequeue. So it’s advisable to use ArrayDeque if we want a LIFO structure.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Output on my system Time taken with ArrayDeque: 9ms Time taken with Stack: 14ms
Set
Set are abstract data types that can store certain values without any particular order.
Set contains no repeated values or duplicates.
HashSet
This class implements the Set interface, backed by a hash table (actually a HashMap instance).
It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
This class permits the null element.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Hash table and linked list implementation of the Set interface, with predictable iteration order.
This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A Set that further provides a total ordering of its elements.
The elements are ordered using their natural ordering, or by a Comparator typically provided at a sorted set creation time.
All elements inserted into a sorted set must implement the Comparable interface
TreeSet
A NavigableSet implementation based on a TreeMap.
The elements are ordered using their natural ordering, or by a Comparator provided at a set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove, and contains).
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
HashSet and LinkedHashSet rely heavily on a one-dimensional array and a hash function.
HashSet and LinkedHashSet have an average time complexity of O(1) but on collision, it can be O(N).
HashSet and LinkedHashSet can store null keys as well.
LinkedHashSet uses a doubly-linked list data structure in addition to maintaining insertion order.
TreeSet uses a balanced binary search tree and its time complexity is O(logN).
TreeSet cannot store null keys.
TreeSet uses less memory compared to HashSet/LinkedHashSet.
Map
An object that maps keys to values.
A map cannot contain duplicate keys; each key can map to at most one value.
This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.
Hashtable
This class implements a hash table, which maps keys to values.
Any non-null object can be used as a key or as a value.
To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.
An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
Hashtable is similar to HashMap but it’s synchronized so it’s thread-safe.
HashMap
Hash table based implementation of the Map interface.
The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Hash table and linked list implementation of the Map interface, with predictable iteration order.
This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
A linked hash map has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashMap.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Before we talk about TreeMap let’s first understand the binary search trees.
ArrayList can manipulate the last item in O(1) constant time complexity.
LinkedList can manipulate the first item of the data structure fast.
Searching for an arbitrary item takes O(N) linear running time for both ArrayList and LinkedList.
What if the array data structure is sorted?
We can search for an arbitrary item in O(logN) time complexity, which is the concept behind the binary search.
Binary Search Trees
Every node in the tree can have at most 2 children (left and right)
The left child is smaller than the parent node.
The right child is greater than the parent node.
We can access the root node exclusively and all other nodes can be accessed via the root node.
Red-Black Trees
It is a balanced data structure that has a guaranteed O(logN) time complexity.
The time complexity of binary search trees depends on the height of the binary search tree.
AVL trees are faster than red-black trees because they are more rigidly balanced but need more work.
It’s faster to construct a red-black tree and many operating systems rely on it.
TreeMap
A Red-Black tree-based NavigableMap implementation.
The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters