Java Data Structures (2nd edition)

© 1996-2001, Particle

Introduction...

    Welcome to Java Data Structures (2nd edition). This document was created with an intent to show people how easy Java really is, and to clear up a few things I've missed in the previous release of the document.

    This is a growing document; as new features are added to the language, new techniques are discovered or realized, this document shall be updated to try to accommodate them all. If you have suggestions, or requests, (or spelling/grammar errors) just e-mail them, and I'll try to add the suggested topics into the subsequent release. Because this document is changing so much, I've decided to implement a version number. This release is: v2.2.11, updated: May 7th, 2002.

    Current release of the document, including all the sources, can be downloaded here:

[download zip with sources and everything]

    Of course, this document is free, and I intend to keep it that way. Selling of this document is NOT permitted. You WILL go to hell if you do (trust me). (not that anybody would want to buy it...) You may distribute this document (in ANY form), provided you don't change it. (yes, you CAN include it in a book provided you notify me and give me credit <and give me one free copy of the book>) To my knowledge, this document has already been reproduced and distributed within some corporations, schools and colleges, but has yet to be formally published in a book.

    I take no responsibility for ANYTHING. I am only responsible for all the good things you like about the article. So, remember, if it's bad, don't blame me, if it's good, thank me (give me credit).

    All the source has been compiled and tested using JDK v1.2. Although most things should work flawlessly with previous versions, there are things where JDK 1.2 is more appropriate. If you find problems and/or errors, please let me know.

    Although this document should be read in sequence, it is divided into several major sections, here they are:

Variables

Arrays
Array Stack
Array Queue
Array List
The Vector

Nodes

Linked Lists
Reusing Tricks

Trees
Generic Tree
Comparing Objects
Binary Search Trees
Tree Traversals

Node Pools
Node Pool Nodes
Node Pool Generic Trees
Node Pool Sort Trees

Priority Vectors
Sorting
Sorting JDK 1.2 Style
Sorting using Quicksort
Optimizing Quicksort
Radix Sort
Improving Radix Sort

Reading and Writing Trees (Serialization)
Deleting items from a Binary Search Tree
Determining Tree Depth

Advanced Linked Lists
Doubly Linked Lists (with Enumeration)

Binary Space Partition Trees (BSP)
Binary Space Partition Tree DEMO (Dog 3D)
Binary Space Partition Tree DEMO with Lighting (Dog 3D)

Kitchen Sink Methods
Java Native Interface (JNI)

Bibliography
Special Thanks
Contact Info


    In contrast to what most people think about Java, it being a language with no pointers, data structures are quite easy to implement. In this section, I'll demonstrate few basic data structures. By learning how easy they are to implement in Java, you'll be able to write any implementation yourself.

    I also think that this document is a pretty good introduction to Data Structures in general. All these concepts can be applied in any programming language. Incidentally, most of these programs are ported from their C++ counterparts. So, if you want to learn Data Structures in C/C++, you'll still find this document useful! Java is an Object Oriented language, but more so than C++, so, most data structure concepts are expressed and illustrated "more naturally" in Java! (try not to raise your blood pressure from all the caffeine)

    I suggest that you be familiar with Java format, and know some other programming language in advance. Coincidentally, I and a couple of my friends are in the process of writing a C language book, which deals with all that "start up" stuff.

    The way most examples are executed is through the JDK's command line Java interpreter. (at the prompt, you just type "java" and the name of the class to run.)


Variables...

    Variables are the key to any program. There are variables called registers inside every CPU (Central Processing Unit). Every program ever written uses some form of variables. Believe it or not, the way you use variables can significantly impact your program. This section is a very simple introduction to what variables are, and how they're used in programs.

    Usually, a variable implies a memory location to hold one instance of one specific type. What this means is that if there is an integer variable, it can only hold one integer, and if there is a character variable, it can only hold one character.

    There can be many different types of variables, including of your own type. A sample declaration for different variable types is given below.

boolean t;
byte b;
char c;
int i;
long l;

    I believe the above is straight forward, and doesn't need much explanation. Variable 't' is declared as boolean type, and 'b' as of byte type, etc.

    The above variables are what's know as primitive types. Primitive types in Java means that you don't have to create them, they're already available as soon as you declare them. (you'll see what I mean when we deal with Objects) It also means that there is usually some hardware equivalent to these variables. For example, an int type, can be stored in a 32 bit hardware register.

    The other types of variables are instances of classes or Objects. Java is a very Object Oriented language, and everything in it, is an object. An object is an instance of a class. Your Java programs consist of classes, in which you manipulate objects, and make the whole program do what you want. This concept will be familiar to you if you've ever programmed C++, if not, think of objects as structures. An example of a simple class would be:

public class pSimpleObject{
    int i;
    public pSimpleObject(){
        i = 0;
    }
    public int get(){
        return i;
    }
    public void set(int n){
        i = n;
    }
}

    As you can see, first we specify that the class is public, this means that it can be visible to other objects outside it's file. We later say that it's a class, and give it's name, which in this case is: pSimpleObject. Inside of it, the class contains an integer named 'i', and three functions. The first function named pSimpleObject(), is the constructor. It is called every time an object is created using this class. The set() and get() functions set and get the value of 'i' respectively. One useful terminology is that functions in objects are not called functions, they're called methods. So, to refer to function set(), you'd say "method set()."   That's all there is to objects!

    The way you declare a variable, or in this case, an object of that class, is:

pSimpleObject myObject;
myObject = new pSimpleObject();

or

pSimpleObject myObject = new pSimpleObject();

    The first example illustrates how you declare an object named myObject, of class pSimpleObject, and later instantiate it (a process of actual creation, where it calls the object's constructor method). The second approach illustrates that it all can be done in one line. The object does not get created when you just declare it, it's only created when you do a new on it.

    If you're familiar with C/C++, think of objects as pointers. First, you declare it, and then you allocate a new object to that pointer. The only limitation seems to be that you can't do math on these pointers, other than that, they behave as plain and simple C/C++ pointers. (You might want to think of objects as references however.)

    Using variables is really cool, and useful, but sometimes we'd like to have more. Like the ability to work with hundreds or maybe thousands of variables at the same time. And here's where our next section starts, the Arrays!


Arrays...

    One of the most basic data structures, is an array. An array is just a number of items, of same type, stored in linear order, one after another. Arrays have a set limit on their size, they can't grow beyond that limit. Arrays usually tend to be easier to work with and generally more efficient than other structural approaches to organizing data; way better than a no formal structure approach.

    For example, lets say you wanted to have 100 numbers. You can always resort to having 100 different variables, but that would be a pain. Instead, you can use the clean notation of an array to create, and later manipulate those 100 numbers. For example, to create an array to hold 100 numbers you would do something like this:

int[] myArray;
myArray = new int[100];

or

int[] myArray = new int[100];

or

int myArray[] = new int[100];

    The three notations above do exactly the same thing. The first declares an array, and then it creates an array by doing a new. The second example shows that it can all be one in one line. And the third example shows that Java holds the backwards compatibility with C++, where the array declaration is: int myArray[]; instead of int[] myArray;. To us, these notations are exactly the same. I do however prefer to use the Java one.

    Working with arrays is also simple, think of them as just a line of variables, we can address the 5th element (counting from 0, so, it's actually the 6th element) by simply doing:

int i = myArray[5];

    The code above will set integer 'i' to the value of the 5th (counting from 0) element of the array. Similarly, we can set an array value. For example, to set the 50th element (counting from 0), to the value of 'i' we'd do something like:

myArray[50] = i;

    As you can see, arrays are fairly simple. The best and most convenient way to manipulate arrays is using loops. For example, lets say we wanted to make an array from 1 to 100, to hold numbers from 1 to 100 respectively, and later add seven to every element inside that array. This can be done very easily using two loops. (actually, it can be done in one loop, but I am trying to separate the problem into two)

int i;
for(i=0;i<100;i++)
    myArray[i] = i;
for(i=0;i<100;i++)
    myArray[i] = myArray[i] + 7;

    In Java, we don't need to remember the size of the array as in C/C++. Here, we have the length variable in every array, and we can check it's length whenever we need it. So to print out any array named: myArray, we'd do something like:

for(int i = 0;i<myArray.length;i++)
    System.out.println(myArray[i]);

    This will work, given the objects inside the myArray are printable, (have a corresponding toString() method), or are of primitive type.

    One of the major limitations on arrays is that they're fixed in size. They can't grow or shrink according to need. If you have an array of 100 max elements, it will not be able to store 101 elements. Similarly, if you have less elements, then the unused space is being wasted (doing nothing).

    Java API provides data storage classes, which implement an array for their storage. As an example, take the java.util.Vector class (JDK 1.2), it can grow, shrink, and do some quite useful things. The way it does it is by reallocating a new array every time you want to do some of these operations, and later copying the old array into the new array. It can be quite fast for small sizes, but when you're talking about several megabyte arrays, and every time you'd like to add one more number (or object) you might need to reallocate the entire array; that can get quite slow. Later, we will look at other data structures where we won't be overly concerned with the amount of the data and how often we need to resize.

    Even in simplest situations, arrays are powerful storage constructs. Sometimes, however, we'd like to have more than just a plain vanilla array.


Array Stack...

    The next and more serious data structure we'll examine is the Stack. A stack is a FILO (First In, Last Out), structure. For now, we'll just deal with the array representation of the stack. Knowing that we'll be using an array, we automatically think of the fact that our stack has to have a maximum size.

    A stack has only one point where data enters or leaves. We can't insert or remove elements into or from the middle of the stack. As I've mentioned before, everything in Java is an object, (since it's an Object Oriented language), so, lets write a stack object!

public class pArrayStackInt{
    protected int head[];
    protected int pointer;

    public pArrayStackInt(int capacity){
        head = new int[capacity];
        pointer = -1;
    }
    public boolean isEmpty(){
        return pointer == -1;
    }
    public void push(int i){
        if(pointer+1 < head.length)
            head[++pointer] = i;
    }
    public int pop(){
        if(isEmpty())
            return 0;
        return head[pointer--];
    }
}

    As you can see, that's the stack class. The constructor named pArrayStackInt() accepts an integer. That integer is to initialize the stack to that specific size. If you later try to push() more integers onto the stack than this capacity, it won't work. Nothing is complete without testing, so, lets write a test driver class to test this stack.

import java.io.*;
import pArrayStackInt;

class pArrayStackIntTest{
    public static void main(String[] args){
        pArrayStackInt s = new pArrayStackInt(10);
        int i,j;
        System.out.println("starting...");
        for(i=0;i<10;i++){
            j = (int)(Math.random() * 100);
            s.push(j);
            System.out.println("push: " + j);
        }
        while(!s.isEmpty()){
            System.out.println("pop: " + s.pop());
        }
        System.out.println("Done ;-)");
    }
}

    The test driver does nothing special, it inserts ten random numbers onto the stack, and then pops them off. Writing to standard output exactly what it's doing. The output gotten from this program is:

starting...
push: 33
push: 66
push: 10
push: 94
push: 67
push: 79
push: 48
push: 7
push: 79
push: 32
pop: 32
pop: 79
pop: 7
pop: 48
pop: 79
pop: 67
pop: 94
pop: 10
pop: 66
pop: 33
Done ;-)

    As you can see, the first numbers to be pushed on, are the last ones to be popped off. A perfect example of a FILO structure. The output also assures us that the stack is working properly.

    Now that you've had a chance to look at the source, lets look at it more closely.

    The pArrayStackInt class is using an array to store it's data. The data is int type (for simplicity). There is a head data member, that's the actual array. Because we're using an array, with limited size, we need to keep track of it's size, so that we don't overflow it; we always look at head.length to check for maximum size.

    The second data member is pointer. Pointer, in here, points to the top of the stack. It always has the position which had the last insertion, or -1 if the stack is empty.

    The constructor: pArrayStackInt(), accepts the maximum size parameter to set the size of the stack. The rest of the functions is just routine initialization. Notice that pointer is initialized to -1, this makes the next position to be filled in an array, 0.

    The isEmpty() function is self explanatory, it returns true if the stack is empty (pointer is -1), and false otherwise. The return type is boolean.

    The push(int) function is fairly easy to understand too. First, it checks to see if the next insertion will not overflow the array. If no danger from overflow, then it inserts. It first increments the pointer and then inserts into the new location pointed to by the updated pointer. It could easily be modified to actually make the array grow, but then the whole point of "simplicity" of using an array will be lost.

    The int pop() function is also very simple. First, it checks to see if stack is not empty, if it is empty, it will return 0. In general, this is a really bad error to pop of something from an empty stack. You may want to do something more sensible than simply returning a 0 (an exception throw would not be a bad choice). I did it this way for the sake of simplicity. Then, it returns the value of the array element currently pointed to by pointer, and it decrements the pointer. This way, it is ready for the next push or pop.

    I guess that just about covers it. Stack is very simple and is very basic. There are tons of useful algorithms which take advantage of this FILO structure. Now, lets look at alternative implementations...

    Given the above, a lot of the C++ people would look at me strangely, and say: "All this trouble for a stack that can only store integers?" Well, they're probably right for the example above. It is too much trouble. The trick I'll illustrate next is what makes Java my favorite Object Oriented language.

    In C, we have the void* type, to make it possible to store "generic" data. In C++, we also have the void* type, but there, we have very useful templates. Templates is a C++ way to make generic objects, (objects that can be used with any type). This makes quite a lot of sense for a data storage class; why should we care what we're storing?

    The way Java implements these kinds of generic classes is by the use of parent classes. In Java, every object is a descendent of the Object class. So, we can just use the Object class in all of our structures, and later cast it to an appropriate type. Next, we'll write an example that uses this technique inside a generic stack.

public class pArrayStackObject{
    protected Object head[];
    protected int pointer;

    public pArrayStackObject(int capacity){
        head = new Object[capacity];
        pointer = -1;
    }
    public boolean isEmpty(){
        return pointer == -1;
    }
    public void push(Object i){
        if(pointer+1 < head.length)
            head[++pointer] = i;
    }
    public Object pop(){
        if(isEmpty())
            return null;
        return head[pointer--];
    }
}

    The above is very similar to the int only version, the only things that changed are the int to Object. This stack, allows the push() and pop() of any Object. Lets convert our old test driver to accommodate this new stack. The new test module will be inserting java.lang.Integer objects (not int; not primitive type).

import java.io.*;
import pArrayStackObject;

class pArrayStackObjectTest{
    public static void main(String[] args){
        pArrayStackObject s = new pArrayStackObject(10);
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<10;i++){
            j = new Integer((int)(Math.random() * 100));
            s.push(j);
            System.out.println("push: " + j);
        }
        while(!s.isEmpty()){
            System.out.println("pop: " + ((Integer)s.pop()));
        }
        System.out.println("Done ;-)");
    }
}

    And for the sake of being complete, I'll include the output. Notice that here, we're not inserting elements of int type, we're inserting elements of java.lang.Integer type. This means, that we can insert any Object.

starting...
push: 45
push: 7
push: 33
push: 95
push: 28
push: 98
push: 87
push: 99
push: 66
push: 40
pop: 40
pop: 66
pop: 99
pop: 87
pop: 98
pop: 28
pop: 95
pop: 33
pop: 7
pop: 45
Done ;-)

    I guess that covers stacks. The main idea you should learn from this section is that a stack is a FILO data structure. After this section, non of the data structures will be working with primitive types, and everything will be done solely with objects. (now that you know how it's done...)

    And now, onto the array relative of Stack, the Queue.


Array Queues...

    A queue is a FIFO (First In, First Out) structure. Anything that's inserted first, will be the first to leave (kind of like the real world queues.) This is totally the opposite of what a stack is. Although that is true, the queue implementation is quite similar to the stack one. It also involves pointers to specific places inside the array.

    With a queue, we need to maintain two pointers, the start and the end. We'll determine when the queue is empty if start and end point to the same element. To determine if the queue is full (since it's an array), we'll have a boolean variable named full. To insert, we'll add one to the start, and mod (the % operator) with the size of the array. To remove, we'll add one to the end, and mod (the % operator) with the size of the array. Simple? Well, lets write it.

public class pArrayQueue{
    protected Object[] array;
    protected int start,end;
    protected boolean full;

    public pArrayQueue(int maxsize){
        array = new Object[maxsize];
        start = end = 0;
        full = false;
    }

    public boolean isEmpty(){
        return ((start == end) && !full);
    }

    public void insert(Object o){
        if(!full)
            array[start = (++start % array.length)] = o;
        if(start == end)
            full = true;
    }

    public Object remove(){
        if(full)
            full = false;
        else if(isEmpty())
            return null;
        return array[end = (++end % array.length)];
    }
}

    Well, that's the queue class. In it, we have four variables, the array, the start and end, and a boolean full. The constructor pArrayQueue(int maxsize) initializes the queue, and allocates an array for data storage. The isEmpty() method is self explanatory, it checks to see if start and end are equal; this can only be in two situations: when the queue is empty, and when the queue is full. It later checks the full variable and returns whether this queue is empty or not.

    The insert(Object) method, accepts an Object as a parameter, checks whether the queue is not full, and inserts it. The insert works by adding one to start, and doing a mod with array.length (the size of the array), the resulting location is set to the incoming object. We later check to see if this insertion caused the queue to become full, if yes, we note this by setting the full variable to true.

    The Object remove() method, doesn't accept any parameters, and returns an Object. It first checks to see if the queue is full, if it is, it sets full to false (since it will not be full after this removal). If it's not full, it checks if the queue is empty, by calling isEmpty(). If it is, the method returns a null, indicating that there's been an error. This is usually a pretty bad bug inside a program, for it to try to remove something from an empty queue, so, you might want to do something more drastic in such a situation (like an exception throw). The method continues by removing the end object from the queue. The removal is done in the same way insertion was done. By adding one to the end, and later mod it with array.length (array size), and that position is returned.

    There are other implementations of the same thing, a little re-arrangement can make several if() statements disappear. The reason it's like this is because it's pretty easy to think of it. Upon insertion, you add one to start and mod, and upon removal, you add one to end and mod... easy?

    Well, now that we know how it works, lets actually test it! I've modified that pretty cool test driver from the stack example, and got it ready for this queue, so, here it comes:

import java.io.*;
import pArrayQueue;

class pArrayQueueTest{
    public static void main(String[] args){
        pArrayQueue q = new pArrayQueue(10);
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<10;i++){
            j = new Integer((int)(Math.random() * 100));
            q.insert(j);
            System.out.println("insert: " + j);
        }
        while(!q.isEmpty()){
            System.out.println("remove: " + ((Integer)q.remove()));
        }
        System.out.println("Done ;-)");
    }
}

    As you can see, it inserts ten random java.lang.Integer Objects onto the queue, and later prints them out. The output from the program follows:

starting...
insert: 3
insert: 70
insert: 5
insert: 17
insert: 26
insert: 79
insert: 12
insert: 44
insert: 25
insert: 27
remove: 3
remove: 70
remove: 5
remove: 17
remove: 26
remove: 79
remove: 12
remove: 44
remove: 25
remove: 27
Done ;-)

    I suggest you compare this output to the one from stack. It's almost completely different. I guess that's it for this array implementation of this FIFO data structure. And now, onto something more complex...


Array Lists...

    The next step up in complexity is a list. Most people prefer to implement a list as a linked list (and I'll show how to do that later), but what most people miss, is that lists can also be implemented using arrays. A list has no particular structure; it just has to allow for the insertion and removal of objects from both ends, and some way of looking at the middle elements.

    A list is kind of a stack combined with a queue; with additional feature of looking at the middle elements. Preferably, a list should also contain the current number of elements. Well, lets not just talk about a list, but write one!

public class pArrayList{
    protected Object[] array;
    protected int start,end,number;

    public pArrayList(int maxsize){
        array = new Object[maxsize];
        start = end = number = 0;
    }
    public boolean isEmpty(){
        return number == 0;
    }
    public boolean isFull(){
        return number >= array.length;
    }
    public int size(){
        return number;
    }
    public void insert(Object o){
        if(number < array.length){
            array[start = (++start % array.length)] = o;
            number++;
        }
    }
    public void insertEnd(Object o){
        if(number < array.length){
            array[end] = o;
            end = (--end + array.length) % array.length;
            number++;
        }
    }
    public Object remove(){
        if(isEmpty())
            return null;
        number--;
        int i = start;
        start = (--start + array.length) % array.length;
        return array[i];
    }
    public Object removeEnd(){
        if(isEmpty())
            return null;
        number--;
        return array[end = (++end % array.length)];
    }
    public Object peek(int n){
        if(n >= number)
            return null;
        return array[(end + 1 + n) % array.length];
    }
}

    The class contains four data elements: array, start, end, and number. The number is the number of elements inside the array. The start is the starting pointer, and the end is the ending pointer inside the array (kind of like the queue design).

    The constructor, pArrayList(), and methods isEmpty(), isFull(), and size(), are pretty much self explanatory. The insert() method works exactly the same way as an equivalent queue method. It just increments the start pointer, does a mod (the % symbol), and inserts into the resulting position.

    The insertEnd(Object) method, first checks that there is enough space inside the array. It then inserts the element into the end location. The next trick is to decrement the end pointer, add the array.length, and do a mod with array.length. This had the effect of moving the end pointer backwards (as if we had inserted something at the end).

    The Object remove() method works on a very similar principle. First, it checks to see if there are elements to remove, if not, it simply returns a null (no Object). It then decrements number. We're keeping track of this number inside all insertion and removal methods, so that it always contains the current number of elements. We then create a temporary variable to hold the current position of the start pointer. After that, we update the start pointer by first decrementing it, adding array.length to it, and doing a mod with array.length. This gives the appearance of removing an element from the front of the list. We later return the position inside the array, which we've saved earlier inside that temporary variable 'i'.

    The Object removeEnd() works similar to the insert() method. It checks to see if there are elements to remove by calling isEmpty() method, if there aren't, it returns null. It then handles the number (number of elements) business, and proceeds with updating the end pointer. It first increments the end pointer, and then does a mod with array.length, and returns the resulting position. Simple?

    This next Object peek(int n) method is the most tricky one. It accepts an integer, and we need to return the number which this integer is pointing to. This would be no problem if we were using an array that started at 0, but we're using our own implementation, and the list doesn't necessarily start at array position 0. We start this by checking if the parameter 'n' is not greater than the number of elements, if it is, we return null (since we don't want to go past the bounds of the array). What we do next is add 'n' (the requesting number) to an incremented end pointer, and do a mod array.length. This way, it appears as if this function is referencing the array from 0 (while the actual start is the incremented end pointer).

    As I've said previously, the code you write is useless, unless it's working, so, lets write a test driver to test our list class. To write the test driver, I've converted that really cool Queue test driver, and added some features to test out the specifics of lists. Well, here it is:

import java.io.*;
import pArrayList;

class pArrayListTest{
    public static void main(String[] args){
        pArrayList l = new pArrayList(10);
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<5;i++){
            j = new Integer((int)(Math.random() * 100));
            l.insert(j);
            System.out.println("insert: " + j);
        }
        while(!l.isFull()){
            j = new Integer((int)(Math.random() * 100));
            l.insertEnd(j);
            System.out.println("insertEnd: " + j);
        }
        for(i=0;i<l.size();i++)
            System.out.println("peek "+i+": "+l.peek(i));
        for(i=0;i<5;i++)
            System.out.println("remove: " + ((Integer)l.remove()));
        while(!l.isEmpty())
            System.out.println("removeEnd: " + ((Integer)l.removeEnd()));
        System.out.println("Done ;-)");
    }
}

    The test driver is nothing special, it inserts (in front) five random numbers, and the rest into the back (also five). It then prints out the entire list by calling peek() inside a for loop. It then continues with the removal (from front) of five numbers, and later removing the rest (also five). At the end, the program prints "Done" with a cute smiley face ;-)

    The output from this test driver is given below. I suggest you examine it thoroughly, and make sure you understand what's going on inside this data structure.

starting...
insert: 14
insert: 72
insert: 71
insert: 11
insert: 27
insertEnd: 28
insertEnd: 67
insertEnd: 36
insertEnd: 19
insertEnd: 45
peek 0: 45
peek 1: 19
peek 2: 36
peek 3: 67
peek 4: 28
peek 5: 14
peek 6: 72
peek 7: 71
peek 8: 11
peek 9: 27
remove: 27
remove: 11
remove: 71
remove: 72
remove: 14
removeEnd: 45
removeEnd: 19
removeEnd: 36
removeEnd: 67
removeEnd: 28
Done ;-)

    Well, if you really understand everything up to this point, there is nothing new anybody can teach you about arrays (since you know all the basics). There are however public tools available to simplify your life. Some are good, some are bad, but one that definitely deserves to have a look at is the java.util.Vector class; and that's what the next section is about!


The Vector...

    The java.util.Vector class is provided by the Java API, and is one of the most useful array based data storage classes I've ever seen. The information provided here is as far as JDK 1.2 goes, future versions may have other implementations; still, the functionality should remain the same. A vector, is a growing array; as more and more elements are added onto it, the array grows. There is also a possibility of making the array smaller.

    But wait a minute, all this time I've been saying that arrays can't grow or shrink, and it seems Java API has done it. Not quite. The java.util.Vector class doesn't exactly grow, or shrink. When it needs to do these operations, it simply allocates a new array (of appropriate size), and copies the contents of the old array into the new array. Thus, giving the impression that the array has changed size.

    All these memory operations can get quite expensive if a Vector is used in a wrong way. Since a Vector has a similar architecture to the array stack we've designed earlier, the best and fastest way to implement a Vector is to do stack operations. Usually, in programs, we need a general data storage class, and don't really care about the order in which things are stored or retrieved; that's where java.util.Vector comes in very useful.

    Using a Vector to simulate a queue is very expensive, since every time you insert or remove, the entire array has to be copied (not necessarily reallocated but still involves lots of useless work).

    Vector allows us to view it's insides using an Enumerator; a class to go through objects. It is very useful to first be able to look what you're looking for, and only later decide whether you'd like to remove it or not. A sample program that uses java.util.Vector for it's storage follows.

import java.io.*;
import java.util.*;

class pVectorTest{
    public static void main(String[] args){
        Vector v = new Vector(15);
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<10;i++){
            j = new Integer((int)(Math.random() * 100));
            v.addElement(j);
            System.out.println("addElement: " + j);
        }
        System.out.println("size: "+v.size());
        System.out.println("capacity: "+v.capacity());

        Enumeration enum = v.elements();
        while(enum.hasMoreElements())
            System.out.println("enum: "+(Integer)enum.nextElement());

        System.out.println("Done ;-)");
    }
}

    The example above should be self evident (if you paid attention when I showed test programs for the previous data structures). The main key difference is that this one doesn't actually remove objects at the end; we just leave them inside. Removal can be accomplished very easily, and if you'll be doing anything cool with the class, you'll sure to look up the API specs.

    Printing is accomplished using an Enumerator; which we use to march through every element printing as we move along. We could also have done the same by doing a for loop, going from 0 to v.size(), doing a v.elementAt(int) every time through the loop. The output from the above program follows:

starting...
addElement: 9
addElement: 5
addElement: 54
addElement: 49
addElement: 60
addElement: 81
addElement: 8
addElement: 91
addElement: 76
addElement: 81
size: 10
capacity: 15
enum: 9
enum: 5
enum: 54
enum: 49
enum: 60
enum: 81
enum: 8
enum: 91
enum: 76
enum: 81
Done ;-)

    You should notice that when we print the size and capacity, they're different. The size is the current number of elements inside the Vector, and the capacity, is the maximum possible without reallocation.

    A trick you can try yourself when playing with the Vector is to have Vectors of Vectors (since Vector is also an Object, there shouldn't be any problems of doing it). Constructs like that can lead to some interesting data structures, and even more confusion. Just try inserting a Vector into a Vector ;-)

    I guess that covers the Vector class. If you need to know more about it, you're welcome to read the API specs for it. I also greatly encourage you to look at java.util.Vector source, and see for yourself what's going on inside that incredibly simple structure.


Nodes...

    The other type of data structures are what's called Node based data structures. Instead of storing data in it's raw format, Node based data structures store nodes, which in turn store the data. Think of nodes as being elements, which may have one or more pointers to other nodes.

    Yes, I did say the "pointer" word. Many people think that there are no pointers in Java, but just because you don't see them directly, doesn't mean they're not there. In fact, you can treat any object as a pointer.

    Thus, the Node structure should have a data element, and a reference to another node (or nodes). Those other nodes which are referenced to, are called child nodes. The node itself is called the parent node (or sometimes a "father" node) in reference to it's children. (nice big happy family)

    Well, the best way to visualize a node is to create one, so, lets do it. The node we'll create will be a one child node (it will have only one pointer), and we'll later use it in later sections to build really cool data structures. The source for our one child node follows:

public class pOneChildNode{
    protected Object data;
    protected pOneChildNode next;

    public pOneChildNode(){
        next = null;
        data = null;
    }
    public pOneChildNode(Object d,pOneChildNode n){
        data = d;
        next = n;
    }
    public void setNext(pOneChildNode n){
        next = n;
    }
    public void setData(Object d){
        data = d;
    }
    public pOneChildNode getNext(){
        return next;
    }
    public Object getData(){
        return data;
    }
    public String toString(){
        return ""+data;
    }
}

    Go over the source, notice that it's nothing more than just set and get functions (pretty simple). The two data members are the data and next. The data member holds the data of the node, and next holds the pointer to the next node. Notice that next is of the same type as the class itself; it effectively points to the object of same class!

    The String toString() method is the Java's standard way to print things. If an object wants to be printed in a special way, it will define this method, with instructions on how to print this object. In our case, we just want to print the data. Adding data to a bunch of quotation marks automatically converts it to type String (hopefully, our data will also have a toString() method defined on it). Without this method, we get the actual binary representation of the data members of this class (not a pretty nor meaningful printout).

    Node based data structures provide for dynamic growing and shrinking, and are the key to some complex algorithms (as you'll see later). Now that we know how to implement a Node, lets get to something cool...


Linked Lists...

    A linked list is just a chain of nodes, with each subsequent node being a child of the previous one. Many programs rely on linked lists for their storage because these don't have any evident restrictions. For example, the array list we did earlier could not grow or shrink, but node based ones can! This means there is no limit (other than the amount of memory) on the number of elements they can store.

    A linked list has just one node, that node, has links to subsequent nodes. So, the entire list can be referenced from that one node. That first node is usually referred to as the head of the list. The last node in the chain of nodes usually has some special feature to let us know that it's last. That feature, most of the time is a null pointer to the next node.

[node0]->[node1]->[node2]->[node3]->[node4]->null

    The example above illustrates the node organization inside the list. In it, node0 is the head node, and node4 is the last node, because it's pointer points to null. Well, now that you know how it's done, and what is meant by a linked list, lets write one. (I mean, that's why we're here, to actually write stuff!)

import pOneChildNode;

public class pLinkedList{
    protected pOneChildNode head;
    protected int number;

    public pLinkedList(){
        head = null;
        number = 0;
    }
    public boolean isEmpty(){
        return head == null;
    }
    public int size(){
        return number;
    }
    public void insert(Object obj){
        head = new pOneChildNode(obj,head);
        number++;
    }
    public Object remove(){
        if(isEmpty())
            return null;
        pOneChildNode tmp = head;
        head = tmp.getNext();
        number--;
        return tmp.getData();
    }
    public void insertEnd(Object obj){
        if(isEmpty())
            insert(obj);
        else{
            pOneChildNode t = head;
            while(t.getNext() != null)
                t=t.getNext();
            pOneChildNode tmp =
                new pOneChildNode(obj,t.getNext());
            t.setNext(tmp);
            number++;
        }
    }
    public Object removeEnd(){
        if(isEmpty())
            return null;
        if(head.getNext() == null)
            return remove();
        pOneChildNode t = head;
        while(t.getNext().getNext() != null)
            t = t.getNext();
        Object obj = t.getNext().getData();
        t.setNext(t.getNext().getNext());
        number--;
        return obj;
    }
    public Object peek(int n){
        pOneChildNode t = head;
        for(int i = 0;i<n && t != null;i++)
            t = t.getNext();
        return t.getData();
    }
}

    Before we move on, lets go over the source. There are two data members, one named head, and the other named number. Head is the first node of the list, and number is the total number of nodes in the list. Number is primarily used for the size() method. The constructor, pLinkedList() is self explanatory. The size() and isEmpty() methods are also pretty easy.

    Here comes the hard part, the insertion and removal methods. Method insert(Object) creates a new pOneChildNode object with next pointer pointing to the current head, and data the data which is inserted. It then sets the head to that new node. If you think about it, you'll notice that the head is still saved, and the new node points to it.

    Method Object remove() works in a very similar fashion, but instead of inserting, it is removing. It first checks to see if the list is isEmpty() or not, if it is, it returns a null. It then saves the current head node, and then changes it to accommodate the removal (think about the logic), decrements the number, and returns the data from the previously saved node.

    In the method insertEnd(Object), we're actually inserting at the end of the list. We first check to see if the list is isEmpty(), if it is, we do a regular insertion (since it really doesn't matter which direction we're coming from if the list is empty). We then setup a loop to search for the end. The end is symbolized by the next pointer of the node being null. When we get to the end, we create a new node, and place it at the end location. Incrementing number before we return.

    Method Object removeEnd() works in a similar fashion as insertend(Object) method. It also goes through the whole list to look for the end. At the beginning, we check if the list is not isEmpty(), if it is, we return a null. We then check to see if there is only one element in the list, if there is only one, we remove it using regular remove(). We then setup a loop to look for the node who's child is the last node. It is important to realize that if we get to the last node, we won't be able to erase it; we need the last node's parent node. When we find it, we get the data, setup necessary links, decrement number, and return the data.

    The Object peek(int) method simply goes through the list until it either reaches the element requested, or the end of the list. If it reaches the end, it should return a null, if not, it should return the correct location inside the list.

    As I have said before, it is very important to actually test. The ideas could be fine, and logical, but if it doesn't work, it doesn't work. So, lets convert our pArrayListTest driver to accommodate this class.

import java.io.*;
import pLinkedList;

class pLinkedListTest{
    public static void main(String[] args){
        pLinkedList l = new pLinkedList();
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<5;i++){
            j = new Integer((int)(Math.random() * 100));
            l.insert(j);
            System.out.println("insert: " + j);
        }
        for(;i<10;i++){
            j = new Integer((int)(Math.random() * 100));
            l.insertEnd(j);
            System.out.println("insertEnd: " + j);
        }
        for(i=0;i<l.size();i++)
            System.out.println("peek "+i+": "+l.peek(i));
        for(i=0;i<5;i++)
            System.out.println("remove: " + ((Integer)l.remove()));
        while(!l.isEmpty())
            System.out.println("removeEnd: " + ((Integer)l.removeEnd()));
        System.out.println("Done ;-)");
    }
}

    The test driver is nothing special, it's just a pretty simple conversion of the old test driver, so, I wont spend any time discussing it. The output follows.

starting...
insert: 65
insert: 78
insert: 21
insert: 73
insert: 62
insertEnd: 82
insertEnd: 63
insertEnd: 6
insertEnd: 95
insertEnd: 57
peek 0: 62
peek 1: 73
peek 2: 21
peek 3: 78
peek 4: 65
peek 5: 82
peek 6: 63
peek 7: 6
peek 8: 95
peek 9: 57
remove: 62
remove: 73
remove: 21
remove: 78
remove: 65
removeEnd: 57
removeEnd: 95
removeEnd: 6
removeEnd: 63
removeEnd: 82
Done ;-)

    Look over the output, make sure you understand why you get what you get. Linked lists are one of the most important data structures you'll ever learn, and it really pays to know them well. Don't forget that you can always experiment. One exercise I'd like to leave up to the reader is to create a circular list. In a circular list, the last node is not pointing to null, but to the first node (creating a circle). Sometimes, lists are also implemented using two pointers; and there are many other variations you should consider and try yourself. You can even make it faster by having a "dummy" first node and/or "tail" node. This will eliminate most special cases, making it faster on insertions and deletions.

    I guess that's it for the lists, next, I'll show you how to do simple and easy tricks to re-use code.


Reusing Tricks...

    We have already written quite a lot of useful stuff, and there might come a time, when you're just too lazy to write something new, and would like to reuse the old source. This section will show you how you can convert some data structures previously devised, to implement a stack and a queue, with almost no creativity (by simply reusing the old source).

    Before we start, lets review the function of a stack. It has to be able to push and pop items of from one end. What structure do we know that can do something similar? A list! Lets write a list implementation of a stack.

import pLinkedList;

public class pEasyStack{
    protected pLinkedList l;

    public pEasyStack(){
        l = new pLinkedList();
    }
    public boolean isEmpty(){
        return l.isEmpty();
    }
    public void push(Object o){
        l.insert(o);
    }
    public Object pop(){
        return l.remove();
    }
}

    See how easily the above code simulates a stack by using a list? It may not be the best implementation, and it's certainly not the fastest, but when you need to get the project compiled and tested, you don't want to spend several unnecessary minutes writing a full blown optimized stack. Test for the stack follows:

import java.io.*;
import pEasyStack;

class pEasyStackTest{
    public static void main(String[] args){
        pEasyStack s = new pEasyStack();
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<10;i++){
            j = new Integer((int)(Math.random() * 100));
            s.push(j);
            System.out.println("push: " + j);
        }
        while(!s.isEmpty()){
            System.out.println("pop: " + ((Integer)s.pop()));
        }
        System.out.println("Done ;-)");
    }
}

    The stack test program is exactly the same as for the previous stack version (doesn't really need explanation). For the completion, I'll also include the output.

starting...
push: 23
push: 99
push: 40
push: 78
push: 54
push: 27
push: 52
push: 34
push: 98
push: 89
pop: 89
pop: 98
pop: 34
pop: 52
pop: 27
pop: 54
pop: 78
pop: 40
pop: 99
pop: 23
Done ;-)

    You've seen how easily we can make a stack. What about other data structures? Well, we can just as easily implement a queue. One thing though, now instead of just inserting and removing, we'll be removing from the end (the other from the one we're inserting).

import pLinkedList;

public class pEasyQueue{
    protected pLinkedList l;

    public pEasyQueue(){
        l = new pLinkedList();
    }
    public boolean isEmpty(){
        return l.isEmpty();
    }
    public void insert(Object o){
        l.insert(o);
    }
    public Object remove(){
        return l.removeEnd();
    }
}

    Pretty easy huh? Well, the test driver is also easy.

import java.io.*;
import pEasyQueue;

class pEasyQueueTest{
    public static void main(String[] args){
        pEasyQueue s = new pEasyQueue();
        Integer j = null;
        int i;
        System.out.println("starting...");
        for(i=0;i<10;i++){
            j = new Integer((int)(Math.random() * 100));
            s.insert(j);
            System.out.println("insert: " + j);
        }
        while(!s.isEmpty()){
            System.out.println("remove: " + ((Integer)s.remove()));
        }
        System.out.println("Done ;-)");
    }
}

    I guess you get the picture, reusing code may not always be the best choice, but it sure is the easiest! Definitely, if you have time, always write a better implementation; these approaches are only good for the deadline, just to compile and test the code before the actual hard work of optimizing it. For the sake of completeness, the output of the above program follows:

starting...
insert: 77
insert: 79
insert: 63
insert: 59
insert: 22
insert: 62
insert: 54
insert: 58
insert: 79
insert: 25
remove: 77
remove: 79
remove: 63
remove: 59
remove: 22
remove: 62
remove: 54
remove: 58
remove: 79
remove: 25
Done ;-)

    I guess that covers code reuse! And remember, you can also reuse the code in the Java API. The reuse can be of many forms. You can reuse it the way I've shown in this section, or you can extend that particular class to provide the necessary functions. (Extending a java.util.Vector class is very useful ;-)


Trees...

    The next major set of data structures belongs to what's called Trees. They are called that, because if you try to visualize the structure, it kind of looks like a tree (root, branches, and leafs). Trees are node based data structures, meaning that they're made out of small parts called nodes. You already know what a node is, and used one to build a linked list. Tree Nodes have two or more child nodes; unlike our list node, which only had one child.

    Trees are named by the number of children their nodes have. For example, if a tree node has two children, it is called a binary tree. If it has three children, it is called tertiary tree. If it has four children, it is called a quad tree, and so on. Fortunately, to simplify things, we only need binary trees. With binary trees, we can simulate any tree; so the need for other types of trees only becomes a matter of simplicity for visualization.

    Since we'll be working with binary trees, lets write a binary tree node. It's not going to be much different from our pOneChildNode class; actually, it's going to be quite the same, only added support for one more pointer. The source for the follows:

public class pTwoChildNode{
    protected Object data;
    protected pTwoChildNode left,right;

    public pTwoChildNode(){
        data = null;
        left = right = null;
    }
    public pTwoChildNode(Object d){
        data = d;
        left = right = null;
    }
    public void setLeft(pTwoChildNode l){
        left = l;
    }
    public void setRight(pTwoChildNode r){
        right = r;
    }
    public void setData(Object d){
        data = d;
    }
    public pTwoChildNode getLeft(){
        return left;
    }
    public pTwoChildNode getRight(){
        return right;
    }
    public Object getData(){
        return data;
    }
    public String toString(){
        return ""+data;
    }
}

    This node is so much similar to the previous node we did, that I'm not even going to cover this one. (I assume you'll figure it out by simply looking at the source) The children of the node are named left and right; these will be the left branch of the node and a right branch. If a node has no children, it is called a leaf node. If a node has no parent (it's the father of every node), it is called the root of the tree. This weird terminology comes from the tree analogy, and from the family tree analogy.

    Some implementations of the tree node, might also have a back pointer to the parent node, but for what we'll be doing with the nodes, it's not necessary. The next section will talk about a generic binary tree which will be later used to create something cool.


Generic Tree...

    Binary Trees are quite complex, and most of the time, we'd be writing a unique implementation for every specific program. One thing that almost never changes though is the general layout of a binary tree. We will first implement that general layout as an abstract class (a class that can't be used directly), and later write another class which extends our layout class.

    Trees have many different algorithms associated with them. The most basic ones are the traversal algorithms. Traversals algorithms are different ways of going through the tree (the order in which you look at it's values). For example, an in-order traversal first looks at the left child, then the data, and then the right child. A pre-order traversal, first looks at the data, then the left child, and then the right; and lastly, the post-order traversal looks at the left child, then right child, and only then data. Different traversal types mean different things for different algorithms and different trees. For example, if it's binary search tree (I'll show how to do one later), then the in-order traversal will print elements in a sorted order.

    Well, lets not just talk about the beauties of trees, and start writing one! The code that follows creates an abstract Generic Binary Tree.

import pTwoChildNode;

public abstract class pGenericBinaryTree{
    private pTwoChildNode root;

    protected pTwoChildNode getRoot(){
        return root;
    }
    protected void setRoot(pTwoChildNode r){
        root = r;
    }
    public pGenericBinaryTree(){
        setRoot(null);
    }
    public pGenericBinaryTree(Object o){
        setRoot(new pTwoChildNode(o));
    }
    public boolean isEmpty(){
        return getRoot() == null;
    }
    public Object getData(){
        if(!isEmpty())
            return getRoot().getData();
        return null;
    }
    public pTwoChildNode getLeft(){
        if(!isEmpty())
            return getRoot().getLeft();
        return null;
    }
    public pTwoChildNode getRight(){
        if(!isEmpty())
            return getRoot().getRight();
        return null;
    }
    public void setData(Object o){
        if(!isEmpty())
            getRoot().setData(o);
    }
    public void insertLeft(pTwoChildNode p,Object o){
        if((p != null) && (p.getLeft() == null))
            p.setLeft(new pTwoChildNode(o));
    }
    public void insertRight(pTwoChildNode p,Object o){
        if((p != null) && (p.getRight() == null))
            p.setRight(new pTwoChildNode(o));
    }
    public void print(int mode){
        if(mode == 1) pretrav();
        else if(mode == 2) intrav();
        else if(mode == 3) postrav();
    }
    public void pretrav(){
        pretrav(getRoot());
    }
    protected void pretrav(pTwoChildNode p){
        if(p == null)
            return;
        System.out.print(p.getData()+" ");
        pretrav(p.getLeft());
        pretrav(p.getRight());
    }
    public void intrav(){
        intrav(getRoot());
    }
    protected void intrav(pTwoChildNode p){
        if(p == null)
            return;
        intrav(p.getLeft());
        System.out.print(p.getData()+" ");
        intrav(p.getRight());
    }
    public void postrav(){
        postrav(getRoot());
    }
    protected void postrav(pTwoChildNode p){
        if(p == null)
            return;
        postrav(p.getLeft());
        postrav(p.getRight());
        System.out.print(p.getData()+" ");
    }
}

    Now, lets go over it. The pGenericBinaryTree is a fairly large class, with a fair amount of methods. Lets start with the one and only data member, the root! In this abstract class, root is a private head of the entire tree. Actually, all we need is root to access anything (and that's how you'd implement it in other languages). Since we'd like to have access to root from other places though (from derived classes, but not from the "outside," we've also added two methods, named getRoot(), and setRoot() which get and set the value of root respectively.

    We have two constructors, one with no arguments (which only sets root to null), and another with one argument (the first element to be inserted on to the tree). Then we have a standard isEmpty() method to find out if the tree is empty. You'll also notice that implementing a counter for the number of elements inside the tree is not a hard thing to do (very similar to the way we did it in a linked list).

    The getData() method returns the data of the root node. This may not be particularly useful to us right now, but may be needed in some derived class (so, we stick it in there just for convenience). Throughout data structures, and mostly entire programming world, you'll find that certain things are done solely for convenience. Other "convenient" methods are getLeft(), getRight() and setData().

    The two methods we'll be using later (for something useful), are: insertLeft(pTwoChildNode,Object), and insertRight(pTwoChildNode,Object). These provide a nice way to quickly insert something into the left child (sub-tree) of the given node.

    The rest are just print methods. The trick about trees are that they can be traversed in many different ways, and these print methods print out the whole tree, in different traversals. All of these are useful, depending on what you're doing, and what type of tree you have. Sometimes, some of them make absolutely no sense, depending on what you're doing.

    Printing methods are recursive; a lot of the tree manipulation functions are recursive, since they're described so naturally in recursive structures. A recursive function is a function that calls itself (kind of like pretrav(), intrav(), and postrav() does).

    Go over the source, make sure you understand what each function is doing (not a hard task). It's not important for you to understand why we need all these functions at this point (for now, we "just" need them); you'll understand why we need some of them in the next few sections, when we extend this class to create a really cool sorting engine.


Comparing Objects...

    Comparing Objects in Java can be a daunting task, especially if you have no idea how it's done. In Java, we can only compare variables of native type. These include all but the objects (ex: int, float, double, etc.). To compare Objects, we have to make objects with certain properties; properties that will allow us to compare.

    We usually create an interface, and implement it inside the objects we'd like to compare. In our case, we'll call the interface pComparable. Interfaces are easy to write, since they're kind of like abstract classes.

public interface pComparable{
    public int compareTo(Object o);
}

    As you can see, there is nothing special to simple interfaces. Now, the trick is to implement it. You might be saying, why am I covering comparing of objects right in the middle of a binary tree discussion... well, we can't have a binary search tree without being able to compare objects. For example, if we'd like to use integers in our binary search tree, we'll have to design our own integer, and let it have a pComparable interface.

    Next follows our implementation of pInteger, a number with a pComparable interface. I couldn't just extend the java.lang.Integer, since it's final (cannot be extended) (those geniuses!).

public class pInteger implements pComparable{
    int i;

    public pInteger(){
    }
    public pInteger(int j){
        set(j);
    }
    public int get(){
        return i;
    }
    public void set(int j){
        i = j;
    }
    public String toString(){
        return ""+get();
    }
    public int compareTo(Object o){
        if(o instanceof pInteger)
            if(get() > ((pInteger)o).get())
                return 1;
            else if(get() < ((pInteger)o).get())
                return -1;
        return 0;
    }
}

    I believe most of the interface is self explanatory, except maybe for the compareTo(Object) method. In the method, we first make sure that the parameter is of type pInteger, and later using casting, and calling methods, we compare the underlying native members of pInteger and return an appropriate result.

    A note on JDK 1.2: In the new versions of the JDK, you won't need to implement your own pComparable, or your own pInteger; since it's built in! There is a Comparable interface, and it's already implemented by the built in java.lang.Integer, java.lang.String, and other classes where you might need comparisons. I'm doing it this way only for compatibility with the older versions of JDK. I'll talk about JDK 1.2 features later in this document (hopefully).


Binary Search Trees...

    And now, back to the show, or shall I say Binary Trees! A binary tree we'll be designing in this section will be what's known as binary search tree. The reason it's called this is that it can be used to sort numbers (or objects) in a way, that makes it very easy to search them; traverse them. Remember how I've said that traversals only make sense in some specific context, well, in binary search tree, only the in-traversal makes sense; in which numbers (objects) are printed in a sorted fashion. Although I'll show all traversals just for the fun of it.

    A binary search tree will extend our pGenericBinaryTree, and will add on a few methods. One that we definitely need is the insert() method; to insert objects into a tree with binary search in mind. Well, instead of just talking about it, lets write the source!

import pComparable;

public class pBinarySearchTree extends pGenericBinaryTree{

    public pBinarySearchTree(){
        super();
    }

    public pBinarySearchTree(Object o){
        super(o);
    }

    public void print(){
        print(2);
    }

    public void insert(pComparable o){
        pTwoChildNode t,q;
        for(q = null, t = getRoot();
            t != null && o.compareTo(t.getData()) != 0;
            q = t,t = o.compareTo(t.getData()) < 0
                ? t.getLeft():t.getRight());
        if(t != null)
            return;
        else if(q == null)
            setRoot(new pTwoChildNode(o));
        else if(o.compareTo(q.getData()) < 0)
            insertLeft(q,o);
        else
            insertRight(q,o);
    }
}

    As you can obviously see, the insert(pComparable) method is definitely the key to the whole thing. The method starts out by declaring two variables, 't', and 'q'. It then falls into a for loop. The condition inside the for loop is that 't' does not equal to null (since it was initially set to getRoot(), which effectively returns the value of root), and while the object we're trying to insert does not equal to the object already inside the tree.

    Usually, a binary search tree does not allow duplicate insertions, since they're kind of useless; that's why we're attempting to catch the case where we're trying to insert a duplicate. Inside the for loop, we set 'q' to the value of the next node to be examined. We do this by first comparing the data we're inserting with the data in the current node, if it's greater, we set 't' to the right node, if less, we set it to the left node (all this is cleverly disguised inside that for statement).

    We later check the value of 't' to make sure we've gotten to the end (or leaf) of the tree. If 't' is not null, that means we've encountered a duplicate, and we simply return. We then check to see if the tree is empty (didn't have a root), if it didn't, we create a new root by calling setRoot() with a newly created node holding the inserted data.

    If all else fails, simply insert the object into the left or the right child of the leaf node depending on the value of the data. And that's that!

    Understanding binary search trees is not easy, but it is the key to some very interesting algorithms. So, if you miss out on the main point here, I suggest you read it again, or get a more formal reference (where I doubt you'll learn more).

    Anyway, as it was with our stacks and queues, we always had to test everything, so, lets test it! Below, I give you the test module for the tree.

import java.io.*;
import pInteger;
import pBinarySearchTree;

class pBinarySearchTreeTest{
    public static void main(String[] args){
        pBinarySearchTree tree = new pBinarySearchTree();
        pInteger n;
        int i;
        System.out.println("Numbers inserted:");
        for(i=0;i<10;i++){
            tree.insert(n=new pInteger((int)(Math.random()*1000)));
            System.out.print(n+" ");
        }
        System.out.println("\nPre-order:");
        tree.print(1);
        System.out.println("\nIn-order:");
        tree.print();
        System.out.println("\nPost-order:");
        tree.print(3);
    }
}

    As you can see, it's pretty simple (and similar to our previous tests). It first inserts ten pInteger (pComparable) objects in to the tree, and then traverses the tree in different orders. These different orders print out the whole tree. Since we know it's a binary search tree, the in-order traversal should produce an ordered output. So, lets take a look at the output!

Numbers inserted:
500 315 219 359 259 816 304 902 681 334
Pre-order:
500 315 219 259 304 359 334 816 681 902
In-order:
219 259 304 315 334 359 500 681 816 902
Post-order:
304 259 219 334 359 315 681 902 816 500

    Well, our prediction is confirmed! The in-order traversal did produce sorted results. There is really nothing more I can say about this particular binary search tree, except that it's worth knowing. This is definitely not the fastest (nor was speed an issue), and not necessarily the most useful class, but it sure may proof useful in teaching you how to use trees.

    And now, onto something completely different! NOT! We're going to be doing trees for a while... I want to make sure you really understand what they are, and how to use them. (and to show you several tricks other books try to avoid <especially Java books>)


Tree Traversals...

    I've talked about tree traversals before in this document, but lets review what I've said. Tree's are created for the sole purpose of traversing them. There are two major traversal algorithms, the depth-first, and breadth-first.

    So far, we've only looked at depth-first. Pre-traversal, in-traversal, and post-traversal are subsets of depth-first traversals. The reason it's named depth-first, is because we eventually end up going to the deepest node inside the tree, while still having unseen nodes closer to the root (it's hard to explain, and even harder to understand). Tracing a traversal surely helps; and you can trace that traversal from the previous section (it's only ten numbers!).

    The other type of traversal is more intuitive; more "human like." Breadth-first traversal goes through the tree top to bottom, left to right. Lets say you were given a tree to read (sorry, don't have a non-copyrighted picture I can include), you'd surely read it top to bottom, left to right (just like a page of text, or something).

    Think of a way you visualize a tree... With the root node on top, and all the rest extending downward. What Breadth-First allows us to do is to trace the tree from top to bottom as you see it. It will visit each node at a given tree depth, before moving onto the the next depth.

    A lot of the algorithms are centered around Breadth-First method. Like the search tree for a Chess game. In chess, the tree can be very deep, so, doing a Depth-First traversal (search) would be costly, if not impossible. With Breadth-First as applied in Chess, the program only looks at several moves ahead, without looking too many moves ahead.

    The Breadth-First traversal is usually from left to right, but that's usually personal preference. Because the standard consul does not allow graphics, the output may be hard to correlate to the actual tree, but I will show how it's done.

    As with previous examples, I will provide some modified source that will show you how it's done. An extended pBinarySearchTree is shown below:

import pTwoChildNode;
import pBinarySearchTree;
import pEasyQueue;

public class pBreadthFirstTraversal extends pBinarySearchTree{

    public void breadth_first(){
        pEasyQueue q = new pEasyQueue();
        pTwoChildNode tmp;
        q.insert(getRoot());
        while(!q.isEmpty()){
            tmp = (pTwoChildNode)q.remove();
            if(tmp.getLeft() != null)
                q.insert(tmp.getLeft());
            if(tmp.getRight() != null)
                q.insert(tmp.getRight());
            System.out.print(tmp.getData()+" ");
        }
    }
}

    As you can see, the class is pretty simple (only one function). In this demo, we're also using pEasyQueue, developed earlier in this document. Since breadth first traversal is not like depth first, we can't use recursion, or stack based methods, we need a queue. Any recursive method can be easily simulated using a stack, not so with breadth first, here, we definitely need a queue.

    As you can see, we start by first inserting the root node on to the queue, and loop while the queue is not isEmpty(). If we have a left node in the node being examined, we insert it in to the queue, etc. (same goes for the right node). Eventually, the nodes inserted in to the queue, get removed, and subsequently, have their left children examined. The process continues until we've traversed the entire tree, from top to bottom, left to right order.

    Now, lets test it. The code below is pretty much the same code used to test the tree, with one minor addition; the one to test the breadth-first traversal!

import java.io.*;
import pInteger;
import pBinarySearchTree;

class pBreadthFirstTraversalTest{
    public static void main(String[] args){
        pBreadthFirstTraversal tree = new pBreadthFirstTraversal();
        pInteger n;
        int i;
        System.out.println("Numbers inserted:");
        for(i=0;i<10;i++){
            tree.insert(n=new pInteger((int)(Math.random()*1000)));
            System.out.print(n+" ");
        }
        System.out.println("\nPre-order:");
        tree.print(1);
        System.out.println("\nIn-order:");
        tree.print();
        System.out.println("\nPost-order:");
        tree.print(3);
        System.out.println("\nBreadth-First:");
        tree.breadth_first();
    }
}

    As you can see, nothing too hard. Next, goes the output of the above program, and you and I will have to spend some time on the output.

Numbers inserted:
890 474 296 425 433 555 42 286 724 88
Pre-order:
890 474 296 42 286 88 425 433 555 724
In-order:
42 88 286 296 425 433 474 555 724 890
Post-order:
88 286 42 433 425 296 724 555 474 890
Breadth-First:
890 474 296 555 42 425 724 286 433 88

    Looking at the output in this format is very abstract and is not very intuitive. Lets just say we have some sort of a tree, containing these numbers above. We were looking at the root node. Now, looking at the output of this program, can you guess what the root node is? Well, it's the first number in breadth-first: 890. The left child of the root is: 474. Basically, the tree looks like:

            |--[890]
            |
       |--[474]--|
       |         |
  |--[296]--|  [555]--|
  |         |         |
[ 42]--|  [425]--|  [724]
       |         |
  |--[286]     [433]
  |
[ 88]

    As is evident from this picture, I'm a terrible ASCII artist! (If it's screwed up, sorry).

    What you can also see is that if you read the tree form left to right, top to bottom, you end up with the breadth first traversal. Actually, this is a good opportunity for you to go over all traversals, and see how they do it, and if they make sense.

    There are many variations to these traversals (and I'll show a few in subsequent sections), for now, however, try to understand these four basic ones.

    Well, see you again in the next section, where we examine some ways you can speed up your code and take a look at JDK 1.2 features, which will simplify our life a LOT!


Node Pools...

    Since you've looked at node based data structures, and their flexibility, you might wonder why would we want to use anything else. Well, sometimes, the situation comes where the standards are not enough. Where we need that extra boost of performance, which is currently occupied by the Java's garbage collection.

    In any environment, allocating and releasing memory is slow! Most of the time, you should try to avoid the procedure, unless you really need it (or it's not in a time critical loop). You certainly wouldn't want to allocate and/or release memory every time you do anything to a list, or a binary tree. What can we do to avoid the natural way? Plenty!

    The most common way around these kinds of problems is to use our own allocation technique, which is perfectly suited for our purposes, and will generally be much faster than the current system's approach (but always make sure; system techniques are quite fast as well).

    Using our own allocation scheme does imply that we'll have to restrict our selves, but most of the times (when you really need it), the restriction is worth while. What we will be doing is simulating the allocation and release of nodes, using an array of nodes, stored in a linked list fashion. It may seem complex, but in reality, it's not.

    This array of nodes, is usually referred to as the "Node Pool" (we "catch" one when we need one, and "throw it back" when we don't). I know, I know, I haven't been the same since that girl on IRC has stolen my humor.

    There are two considerations we should think about before we start programming, however. These two are whether we want to make the node-pool itself static or dynamic? Shall we allocate the node-pool with every new instance of the object, or shall we just share one common node-pool for all the objects of that class?

    These two approaches are both quite useful in different situations. For example, when was the last time you've ever used more than 52 cards in a deck of cards? That implies that your deck of cards class may have no more than 52 cards at a time; a perfect candidate for a Node Pool! Inside your game, you have this one class, which can only hold 52 cards, no more. Then again, what if your card game has to have several decks? You might want to extend that a little more; like have a separate deck for each deck! (am I making any sense?)

    The reason I describe this deck of cards example is a historical one; it was the first program I've ever used with Node Pools (long time ago; in C++). The one static node-pool allows other instances of the class (inside the same program) to know which cards are missing from this universal deck, and which are present. On the other hand, if you have a local (non-static) node-pool, each deck handles it's own cards, and non of the other decks know anything about it.

    It is important to understand this concept, since it makes a lot of sense <sometimes> (and will allow you to write robust code that doesn't waste memory). Then again, sometimes it makes absolutely no sense to use a node-pool at all!

    The example we will design in the next few sections will deal with a binary tree, which does not use dynamic memory when inserting (or removing) nodes. It will have a non-static node-pool; meaning that you'll have to specify the maximum number of elements when it's instantiated (create an instance of it). I believe the example will be general enough for you to learn to apply node-pools in various different situations. Since I've already showed how to use my own pComparable for JDK 1.1 (or lower), this next example will deal only with JDK 1.2 java.lang.Comparable interface for most purposes. (A java.lang.Comparable interface is worth learning, since it's very useful; I only wonder why they haven't come up with something similar earlier?)


Node Pool Nodes...

    As with everything else, there are TONS of ways to write exactly the same thing! With nodes, there is no exception. I prefer to implement node-pool nodes with integer children; where the integer points to the next child inside the node-pool array. This may not be the best Object Oriented way to do this, since it's binding the node-pool node to the node-pool tree. However, there are many other solutions to this, and many are more "elegant" than the one I'll be using here. So, if you don't like it, implement your own version!

    If you've paid close attention to the discussion about regular binary trees (somewhere in this document), you'll find this discussion of node-pool stuff pretty easy and similar to that "regular" stuff. Enough talk, lets go and write this node-pool node!

public class pNodePoolTreeNode{

    private Object data;
    private int left;
    private int right;

    public pNodePoolTreeNode(){
        left = right = -1;
        data = null;
    }
    public pNodePoolTreeNode(int l,int r){
        left = l;
        right = r;
        data = null;
    }
    public Object getData(){
        return data;
    }
    public void setData(Object o){
        data = o;
    }
    public int getLeft(){
        return left;
    }
    public void setLeft(int l){
        left = l;
    }
    public int getRight(){
        return right;
    }
    public void setRight(int r){
        right = r;
    }
    public String toString(){
        return new String(data.toString());
    }
}

    The code above is pretty much self explanatory (if it's not, take a look back at binary tree nodes section, and you'll figure it out). The only significant thing you should notice is that children of the node are now integers, instead of references to other node objects. Although, it later turns out that what these integer children are referencing are in fact nodes.

    Well, that was easy enough, are you ready to figure out how a tree actually manipulates these?


Node Pool Generic Trees...

    A node-pool tree is not that much different from a regular tree. The only key differences is that it has to maintain a linked list (more or less a "stack") of free nodes, and allocate them when needed.

    Another tricky thing is that now, the node's children are pointed to by integers, which are references into a node-pool array; so, watch out. You'll notice that a lot of the "tree stuff" has been re-used from the previous version, and you're right! It makes it a lot easier to understand on your part, and a lot easier to write on mine! (oh, I just ran out of coffee...)

import pNodePoolTreeNode;
import java.io.*;

public abstract class pNodePoolArrayTree{

    private int numNodes,nextAvail;
    private pNodePoolTreeNode[] arrayNodes;
    protected int root;

    private void init(){
        int i;
        root = -1;
        arrayNodes = new pNodePoolTreeNode[numNodes];
        nextAvail = 0;
        for(i=0;i<numNodes;i++)
            arrayNodes[i] = new pNodePoolTreeNode(-1,i+1);
        getNode(i-1).setRight(-1);
    }
    public pNodePoolArrayTree(){
        numNodes = 500;
        init();
    }
    public pNodePoolArrayTree(int n){
        numNodes = n;
        init();
    }
    protected int getNode(){
        int i = nextAvail;
        nextAvail = getNode(nextAvail).getRight();
        if(nextAvail == -1){
            nextAvail = i;
            return -1;
        }
        return i;
    }
    protected void freeNode(int n){
        getNode(n).setRight(nextAvail);
        nextAvail = n;
    }
    public boolean isEmpty(){
        return getRoot() == -1;
    }
    public boolean isFull(){
        int i = getNode();
        if(i != -1){
            freeNode(i);
            return false;
        }
        return true;
    }
    protected Object getData(){
        if(isEmpty())
            return null;
        return getNode(getRoot()).getData();
    }
    protected void setData(Object o){
        if(isEmpty())
            root = getNode();
        getNode(root).setData(o);
        getNode(root).setLeft(-1);
        getNode(root).setRight(-1);
    }
    protected int getLeft(){
        if(isEmpty())
            return -1;
        return getNode(getRoot()).getLeft();
    }
    protected void setLeft(int n){
        if(isFull())
            return;
        if(isEmpty()){
            root = getNode();
            getNode(getRoot()).setRight(-1);
        }
        getNode(getRoot()).setLeft(n);
    }
    protected void setRight(int n){
        if(isFull())
            return;
        if(isEmpty()){
            root = getNode();
            getNode(getRoot()).setLeft(-1);
        }
        getNode(getRoot()).setRight(n);
    }
    protected int getRight(){
        if(isEmpty())
            return -1;
        return getNode(root).getRight();
    }
    protected int getRoot(){
        return root;
    }
    protected pNodePoolTreeNode getNode(int n){
        if(n != -1)
            return arrayNodes[n];
        else return null;
    }
    protected void insertLeft(int node,Object o){
        if((node != -1) && (getNode(node).getLeft() == -1)
            && !isFull()){
            int i = getNode();
            getNode(i).setData(o);
            getNode(i).setRight(-1);
            getNode(node).setLeft(i);
        }
    }
    protected void insertRight(int node,Object o){
        if((node != -1) && (getNode(node).getRight() == -1)
            && !isFull()){
            int i = getNode();
            getNode(i).setData(o);
            getNode(i).setRight(-1);
            getNode(node).setRight(i);
        }
    }
    public void print(int mode){
        switch(mode){
        case 1:
            pretrav();
            break;
        case 2:
            intrav();
            break;
        case 3:
            postrav();
            break;
        }
    }
    public void pretrav(){
        pretrav(getRoot());
    }
    private void pretrav(int p){
        if(p == -1)
            return;
        System.out.print(getNode(p).getData()+" ");
        pretrav(getNode(p).getLeft());
        pretrav(getNode(p).getRight());
    }
    public void intrav(){
        intrav(getRoot());
    }
    private void intrav(int p){
        if(p == -1)
            return;
        intrav(getNode(p).getLeft());
        System.out.print(getNode(p).getData()+" ");
        intrav(getNode(p).getRight());
    }
    public void postrav(){
        postrav(getRoot());
    }
    private void postrav(int p){
        if(p == -1)
            return;
        postrav(getNode(p).getLeft());
        postrav(getNode(p).getRight());
        System.out.print(getNode(p).getData()+" ");
    }
}

    There are not that many hard methods in this one; so, lets focus in on the hard ones, and I'll leave you to figure out the "easy" ones yourself.

    One thing I'd like to mention of the top so that it causes as little confusion as possible: now that we're not using objects, but integers, we still have to note the "null node" somehow, so, in our example, I'm using -1 as the "null node" indicator.

    Data members numNodes, nextAvail, and arrayNodes are there to keep track of the linked list of free and occupied nodes. It's not exactly a linked list (although many people like to think of it that way), it's more or less a simple array stack we've implemented at the beginning of this document. The numNodes is the maximum number of nodes inside this tree; we could just as well substitute it for arrayNodes.length, but I think this "separation" is more convenient.

    The nextAvail member is the one that points to the next available node. Our allocation methods, getNode() and freeNode(int n), uses this particular pointer to allocate and release nodes. And arrayNodes is just an array holding the node-pool. All of the integer child references are into this arrayNodes array; which we allocate dynamically during instantiation (creation) of the object.

    Inside the init() function, we allocate the arrayNodes array to hold the node-pool. We later loop through all the nodes inside of it, and turn them into a linked list (with right child pointing to the next available node and left child being -1). The last node in the list is apparently marked -1; since it's last!

    I suggest you go over the source for getNode() and freeNode(int n); try to understand what it's doing. Which is nothing more complex than what we've already done! (look over the linked list description if you find it confusing)

    The rest of the functions are pretty much self explanatory (you should be able to figure them out). Most of them are just a port of the old tree functions to use integer children with node-pool nodes. Well, are you ready for a binary sort tree implementing a node-pool (ready or not, we're going there)!


Node Pool Sort Trees...

    As I've mentioned before, node-pool implementations are fairly similar to the regular ones, so, our old sort tree will not change much. The only things that will change however, are the node handling methods. Since we already know what needs to be done (review the binary sort example described previously in this document), we can just go ahead and write the source.

import pNodePoolTreeNode;
import pNodePoolArrayTree;
import java.lang.*;

public class pNodePoolSortArrayTree extends pNodePoolArrayTree{

    public pNodePoolSortArrayTree(){
        super();
    }
    public pNodePoolSortArrayTree(int n){
        super(n);
    }
    public void print(){
        print(2);
    }
    public void insert(Comparable obj){
        int t,q = -1;
        t = getRoot();
        while(t != -1 && !(obj.equals(getNode(t).getData()))){
            q = t;
            if(obj.compareTo(getNode(t).getData()) < 0)
                t = getNode(t).getLeft();
            else
                t = getNode(t).getRight();
        }
        if(t != -1)
            return;
        if(q == -1){
            setData(obj);
            return;
        }
        if(obj.compareTo(getNode(q).getData()) < 0)
            insertLeft(q,obj);
        else
            insertRight(q,obj);
    }
}

    Now, the following only makes sense inside a JDK 1.2 (or above) context. Earlier version of the JDK do not support the Comparable interface. I just though it would be nice to show how to use all these cool interfaces introduced with JDK 1.2!

    The code does nothing special, and chances are, you can figure it out on your own. The only seeming difference between this source and the one presented earlier, is that in this one, whenever we test for a null node, we're not testing the node, we're testing whether if it has value of -1. Other than that, the code looks almost identical.

    This code is pretty bad, however, in illustrating the beauty of Object Oriented programming. With our approach, we needed to change the whole thing just to implement node-pools. Ideally, we should strive to only have to change one part of the system. For example, we could have only changed the abstract parent class, and leave everything else as it was. Or we could have changed the node a little, and make the node itself support stuff, but I just thought this was a clearer approach, without dragging all that Object Oriented stuff into the issue.

    I think I've made this Node Pool section longer than it should be; so, let me quickly wrap it up by getting right to the point. Lets write a test driver (which will be an almost identical copy of the old test driver).

import java.io.*;
import pNodePoolSortArrayTree;

public class pNodePoolSortArrayTreeTest{

    public static void main(String[] args){
        pNodePoolSortArrayTree tree =
            new pNodePoolSortArrayTree(100);
        int i,n;
        System.out.println("Numbers inserted:");
        for(i=0;i<10;i++){
            tree.insert(new Integer(n=(int)(Math.random()*1000)));
            System.out.print(n+" ");
        }
        System.out.println("\nPre-order:");
        tree.print(1);
        System.out.println("\nIn-order:");
        tree.print(2);
        System.out.println("\nPost-order:");
        tree.print(3);
    }
}

    You'll notice that the code above uses java.lang.Integer object, instead of our own pInteger one. That's because in JDK 1.2, an Integer is implementing Comparable interface, simplifying a whole lot of stuff. Other than that, the code works identically to the one shown earlier in the document. Just to be complete, however, lets include the output from the above program.

Numbers inserted:
103 82 165 295 397 828 847 545 766 384
Pre-order:
103 82 165 295 397 384 828 545 766 847
In-order:
82 103 165 295 384 397 545 766 828 847
Post-order:
82 165 295 397 384 828 545 766 847 103

    I guess that wraps it up. (finally!) Now, I'm gonna take a little break, and try to come up with some cool programs that use these data structures. The only way to actually learn what they are, and how to use them, is to see them in action. And some of these can be VERY useful in a wide variety of applications, ranging form databases, to graphics programming.

    So, now that you know all the basics of everything about data structures, the remainder of this document will mostly concentrate on interesting approaches, and algorithms using these. (You'll be surprised how far a binary sort tree can go when it comes to creating virtual 3D worlds!)


Priority Vectors, etc.

    We've learned about the structure of vectors, lists, queues, etc., but what if you need some order in the way data is arranged? You have several options to this; you can simply implement a sort method inside the data storage class, or you can make the class itself a priority one.

    Priority classes are widely used in programs because they simplify a lot of things. For example, you don't have to worry about sorting data, the class itself does it for you. Imagine a simulation of a real world queue at a doctor's office, where people with life threatening injuries get priority over everyone else. The queue has to accept all people, and sort them according to the seriousness of their illness; with most serious ending up seeing the doctor first.

    The priority concept can be applied anywhere, not just to queues. It can easily be extended to lists, and vectors. In this section, we will talk about creating a priority vector (since it's fairly simple). We will also cover some "well known" sorting algorithms.

    How simple is it? The answer is: very! All we need is to extend the java.util.Vector class, add one insert function, and we are done. And here's the source:

import java.lang.Comparable;

public class pPriorityVector extends java.util.Vector{
    public void pAddElement(Comparable o){
        int i,j = size();
        for(i=0;i<j&&(((Comparable)(elementAt(i))).compareTo(o)<0);i++);
        insertElementAt(o,i);
    }
}

    As you can see, it's VERY simple. We simply use this class the way we would use any other java.util.Vector, accept, when we use pAddElement(Comparable) method, we are inserting in accenting order. We could have just as well written a descending order one; I think you get the picture...

    Lets test this class, and see if it really works.

import java.io.*;
import java.util.*;
import pPriorityVector;

class pPriorityVectorTest{
    public static void main(String[] args){
        pPriorityVector v = new pPriorityVector();
        System.out.print("starting...\nadding:");
        for(int i=0;i<10;i++){
            Integer j = new Integer((int)(Math.random()*100));
            v.pAddElement(j);
            System.out.print(" " + j);
        }
        System.out.print("\nprinting:");
        Enumeration enum = v.elements();
        while(enum.hasMoreElements())
            System.out.print(" "+(Integer)enum.nextElement());
        System.out.println("\nDone ;-)");
    }
}

    The above test program simply inserts ten random java.lang.Integer objects, and then prints them out. If our class works, the output should produce a sorted list of number. Our prediction is correct, the output follows.

starting...
adding: 95 85 62 16 39 73 84 43 77 61
printing: 16 39 43 61 62 73 77 84 85 95
Done ;-)

    As you can see, we are inserting numbers in an unsorted order, and get a sorted list when we finally print them out. The technique illustrated above sorts the numbers upon insertion, some algorithms do sorting when retrieving data.


Sorting...

    Sorting, in general, means arranging something in some meaningful manner. There are TONS of ways of sorting things, and we will only look at the most popular ones. The one we won't cover here (but one you should know) is Quick Sort. You can find a really good Quick Sort example in the JDK's sample applets. We have also learned that we can sort using binary trees, thus, in this section, we won't cover that approach.

    We will be using JDK 1.2's java.lang.Comparable interface in this section, thus, the examples will not work with earlier versions of the JDK. The sorting algorithms we'll look at in this section are: Bubble Sort, Selection Sort, Insertion Sort, Heap Sort, and lastly, Merge Sort. If you think that's not enough, you're welcome to find other algorithms from other sources. One source I'd recommend is the JDK's sources. The JDK has lots of code, and lots of algorithms that may well be very useful.

    We won't exactly cover the algorithms; I will simply illustrate them, and you'll have to figure out how they work from the source. (By tracing them.) I don't have the time to go over each and every algorithm; I'm just taking my old C source, converting it to Java, and inserting it into this document ;-)

import java.io.*;
import java.util.*;
import java.lang.*;

public class pGeneralSorting{

    public static void BubbleSort(Comparable[] a){
        boolean switched = true;
        for(int i=0;i<a.length-1 && switched;i++){
            switched = false;
            for(int j=0;j<a.length-i-1;j++)
                if(a[j].compareTo(a[j+1]) > 0){
                    switched = true;
                    Comparable hold = a[j];
                    a[j] = a[j+1];
                    a[j+1] = hold;
                }
        }
    }

    public static void SelectionSort(Comparable[] a){
        for(int i = a.length-1;i>0;i--){
            Comparable large = a[0];
            int indx = 0;
            for(int j = 1;j <= i;j++)
                if(a[j].compareTo(large) > 0){
                    large = a[j];
                    indx = j;
                }
            a[indx] = a[i];
            a[i] = large;
        }
    }

    public static void InsertionSort(Comparable[] a){
        int i,j;
        Comparable e;
        for(i=1;i<a.length;i++){
            e = a[i];
            for(j=i-1;j>=0 && a[j].compareTo(e) > 0;j--)
                a[j+1] = a[j];
            a[j+1] = e;
        }
    }

    public static void HeapSort(Comparable[] a){
        int i,f,s;
        for(i=1;i<a.length;i++){
            Comparable e = a[i];
            s = i;
            f = (s-1)/2;
            while(s > 0 && a[f].compareTo(e) < 0){
                a[s] = a[f];
                s = f;
                f = (s-1)/2;
            }
            a[s] = e;
        }
        for(i=a.length-1;i>0;i--){
            Comparable value = a[i];
            a[i] = a[0];
            f = 0;
            if(i == 1)
                s = -1;
            else
                s = 1;
            if(i > 2 && a[2].compareTo(a[1]) > 0)
                s = 2;
            while(s >= 0 && value.compareTo(a[s]) < 0){
                a[f] = a[s];
                f = s;
                s = 2*f+1;
                if(s+1 <= i-1 && a[s].compareTo(a[s+1]) < 0)
                    s = s+1;
                if(s > i-1)
                    s = -1;
            }
            a[f] = value;
        }
    }

    public static void MergeSort(Comparable[] a){
        Comparable[] aux = new Comparable[a.length];
        int i,j,k,l1,l2,size,u1,u2;
        size = 1;
        while(size < a.length){
            l1 = k = 0;
            while((l1 + size) < a.length){
                l2 = l1 + size;
                u1 = l2 - 1;
                u2 = (l2+size-1 < a.length) ?
                    l2 + size-1:a.length-1;
                for(i=l1,j=l2;i <= u1 && j <= u2;k++)
                    if(a[i].compareTo(a[j]) <= 0)
                        aux[k] = a[i++];
                    else
                        aux[k] = a[j++];
                for(;i <= u1;k++)
                    aux[k] = a[i++];
                for(;j <= u2;k++)
                    aux[k] = a[j++];
                l1 = u2 + 1;
            }
            for(i=l1;k < a.length;i++)
                aux[k++] = a[i];
            for(i=0;i < a.length;i++)
                a[i] = aux[i];
            size *= 2;
        }
    }

    public static void main(String[] args){
        Integer[] a = new Integer[10];
        System.out.print("starting...\nadding:");
        for(int i=0;i<a.length;i++){
            a[i] = new Integer((int)(Math.random()*100));
            System.out.print(" " + a[i]);
        }
        BubbleSort(a);
        System.out.print("\nprinting:");
        for(int i=0;i<a.length;i++){
            System.out.print(" " + a[i]);
        }
        System.out.println("\nDone ;-)");
    }
}

    The above program both illustrates the algorithms, and tests them. Some of these may look fairly complicated; usually, the more complicated a sorting algorithm is, the faster and/or more efficient it is. That's the reason I'm not covering Quick Sort; I'm too lazy to convert that huge thing! Some sample output from the above program follows:

starting...
adding: 22 63 33 19 82 59 70 58 98 74
printing: 19 22 33 58 59 63 70 74 82 98
Done ;-)

    I think you get the idea bout sorting. You can always optimize the above sorting methods, use native types, etc. You can also use these in derived classes from java.util.Vector, using the java.util.Vector data directly. And just when you though we were done with sorting, here we go again...


Sorting JDK 1.2 Style...

    Wasn't the last section scary and a bit confusing? If you said "yes," then you're not alone. Java implementers also think so; that's why a lot of the sorting thingies are already built in! JDK 1.2 introduced the Collection interface; using it, we can do all kinds of data structures effects like sorting and searching. For example, to sort a Vector, all we need to do is call a sort() method of the java.util.Collections class.

import java.io.*;
import java.util.*;

public class pJDKVectorSortingUp{
    public static void main(String[] args){
        Vector v = new Vector();
        System.out.print("starting...\nadding:");
        for(int i=0;i<10;i++){
            Integer j = new Integer((int)(Math.random()*100));
            v.addElement(j);
            System.out.print(" " + j);
        }
        Collections.sort(v);
        System.out.print("\nprinting:");
        Enumeration enum = v.elements();
        while(enum.hasMoreElements())
            System.out.print(" "+(Integer)enum.nextElement());
        System.out.println("\nDone ;-)");
    }
}

    See how simple it is? The output follows...

starting...
adding: 91 37 16 53 11 15 89 44 90 58
printing: 11 15 16 37 44 53 58 89 90 91
Done ;-)

    All this is nice and useful, but what if you need descending order, instead of the "default" accenting one? Guess what?, the implementers of the language have though about that as well! We have a java.util.Comparator interface. With this Comparator, we can specify our own compare function, making it possible to sort in descending order (or any other way we want... i.e.: sorting absolute values, etc.).

import java.io.*;
import java.util.*;

public class pJDKVectorSortingDown implements Comparator{

    public int compare(Object o1,Object o2){
        return -((Comparable)o1).compareTo(o2);
    }

    public static void main(String[] args){
        Vector v = new Vector();
        System.out.print("starting...\nadding:");
        for(int i=0;i<10;i++){
            Integer j = new Integer((int)(Math.random()*100));
            v.addElement(j);
            System.out.print(" " + j);
        }
        Collections.sort(v,new pJDKVectorSortingDown());
        System.out.print("\nprinting:");
        Enumeration enum = v.elements();
        while(enum.hasMoreElements())
            System.out.print(" "+(Integer)enum.nextElement());
        System.out.println("\nDone ;-)");
    }
}

    As you can see, this time, we're sending a Comparator object to the Collections.sort() method. This technique is really cool and useful. The output from the above program follows:

starting...
adding: 9 96 58 64 13 99 91 55 51 95
printing: 99 96 95 91 64 58 55 51 13 9
Done ;-)

    I suggest you go over all those JDK classes and their methods. There is a LOT of useful stuff there. Now, say good bye to sorting for a while; we're moving into something totally different; NOT!


Sorting using Quicksort...

    Sorting using Quicksort is something that I wasn't planning to cover in this document due to the fact that the algorithm solely depends on the data it receives. If the data has certain properties, quicksort is one of the fastest, if not, quicksort can be excruciatingly slow. By now, you probably already figured out that bubble sort (covered earlier) is pretty slow; put it another way: forget about using bubble sort for anything important.

    Bubble sort tends to be O(n^2); in other words, pretty slow. Now, quicksort can perform quite fast, on average about O(n log n), but it's worst case, is a humiliating O(n^2). For quicksort, the worst case is usually when the data is already sorted. There are many algorithms to make this "less likely to occur," but it does happen. (the O notation is paraphrased "on the order of")

    If you need a no-worst-case fast sorting algorithm, then by all means, use heap sort (covered earlier). In my opinion, heap sort is more elegant than any other sort <it is in place O(n log n) sort>. However, on average, it does tend to perform twice as slow as quicksort.

    UPDATE: I've recently attended a conference at the [insert fancy name here] Science Society, and heard a nice talk by a professor from Princeton. The talk was totally dedicated to Quicksort. Apparently, if you modify the logic a bit, you can make Quicksort optimal! This is a very grand discovery! (unfortunately, I didn't write down the name of anybody; not even the place!) The idea (among other things) involves moving duplicate elements to one end of the list, and at the end of the run, move all of them to the middle, then sort the left and right sides of the list. Since now Quicksort performs well with lots of duplicate values (really well actually ;-), you can create a radix-like sort that uses embedded Quicksort to quickly (very quickly) sort things. There's also been a Dr.Dobbs article (written by that same professor ;-) on this same idea, and I'm sorry for not having any more references than I should. I still think this is something that deserved to be mentioned.

    We will start out by writing a general (simplest) implementation of quicksort. After we thoroughly understand the algorithm we will re-write it implementing all kinds of tricks to make it faster.

    Quicksort is naturally recursive. We partition the array into two sub-arrays, and then re-start the algorithm on each of these sub-arrays. The partition procedure involves choosing some object (usually, already in the array); If some other object is greater than the chosen object, it is added to one of the sub-arrays, if it's less than the chosen object, it's added to another sub-array. Thus, the entire array is partitioned into two sub-arrays, with one sub-array having everything that's larger than the chosen object, and the other sub-array having everything that's smaller.

    The algorithm is fairly simple, and it's not hard to implement. The tough part is making it fast. The implementation that follows is recursive and is not optimized, which makes this function inherently slow (most sorting functions try to avoid recursion and try to be as fast as possible). If you need raw speed, you should consider writing a native version of this algorithm.

public class pSimpleQuicksort{

    public static void qsort(Comparable[] c,int start,int end){
        if(end <= start) return;
        Comparable comp = c[start];
        int i = start,j = end + 1;
        for(;;){
            do i++; while(i<end && c[i].compareTo(comp)<0);
            do j--; while(j>start && c[j].compareTo(comp)>0);
            if(j <= i)   break;
            Comparable tmp = c[i];
            c[i] = c[j];
            c[j] = tmp;
        }
        c[start] = c[j];
        c[j] = comp;
        qsort(c,start,j-1);
        qsort(c,j+1,end);
    }

    public static void qsort(Comparable[] c){
        qsort(c,0,c.length-1);
    }

    public static void main(String[] args){
        int i;
        Integer[] arr = new Integer[20];
        System.out.println("inserting: ");
        for(i=0;i<arr.length;i++){
            arr[i] = new Integer((int)(Math.random()*99));
            System.out.print(arr[i]+" ");
        }
        pSimpleQuicksort.qsort(arr);
        System.out.println("\nsorted: ");
        for(i=0;i<arr.length;i++)
            System.out.print(arr[i]+" ");
        System.out.println("\nDone ;-)");
    }
}

    As you can see, the qsort() functions itself is fairly short. One of them is just a wrapper for the other one. The idea is that qsort() receives the array, and then the start, and end pointer between which you want everything sorted. So, the starting call starts at array position zero, and ends at the last valid position in the array. Some sample output follows:

inserting:
58 52 82 27 23 67 37 63 68 18 95 41 87 6 53 85 65 30 10 3
sorted:
3 6 10 18 23 27 30 37 41 52 53 58 63 65 67 68 82 85 87 95
Done ;-)

    The sorts starts out by first checking to see if the end pointer is less then or equal to the start pointer. It if is less, then there is nothing to sort, and we return. If they are equal, we have only one element to sort, and an element by itself is always already sorted, so we return.

    If we did not return, we make a pick. In our example, we simply choose the first element in the array to use as our partition element (some call it pivot element). To some people, this is the most critical point of the sort, since depending on what element you choose, you'll get either good performance, or bad. A lot of the times, instead of choosing the first, people choose the last, or take a median of the first, last and middle elements. Some even have a random number generator to pick a random element. However, all these techniques are useless against random data, in which case, all those tricky approaches can actually prove to worsen results. Then again, most of the times, they do work quite nicely... (it really depends on the type of data you are working with)

    After we have picked the element, we setup i and j, and fall into the a for loop. Inside that loop, we have two inner loops. Inside the first inner loop, we scan the array looking for the first element which is larger than our picked element, moving from left to right of the array. Inside the second inner loop, we scan to find an element smaller than the picked, moving from right to left in the array. Once we've found them (fallen out of both loops), we check to see if the pointers haven't crossed (since if they did, we are done). We then swap the elements, and continue on to the next iteration of the loop.

    You should notice that in all these loops, we are doing one simple thing. We are making sure that only one element gets to it's correct position. We deal with one element at a time. After we find that correct position of our chosen element, all we need to do is sort the elements on it's right, and left sides, and we're done. You should also notice that the algorithm above is lacking in optimization. The inner loops, where most of the time takes place are not as optimized as they should be. However, for the time being, try to understand the algorithm; and we will get back to optimizing a bit later.

    When we fall out of the outer loop, we put our chosen element into it's correct position, calculate the upper and lower bounds of the left and right arrays (from that chosen elements), and recursively call the method on these new computed arrays.

    That's basically it for the general algorithm. In the next section, you'll be amazed at how much faster we can make this work. Basically, in the next section, we will implement a sort function to use everywhere (it will be faster than most other approaches).


Optimizing Quicksort...