| ![]() |
|
|
|
Introduction...
|
In contrast to what most people think about Java, it being a language with no pointers, data structures are quite easy to implement. In this section, I'll demonstrate few basic data structures. By learning how easy they are to implement in Java, you'll be able to write any implementation yourself.
I also think that this document is a pretty good introduction to Data Structures in general. All these concepts can be applied in any programming language. Incidentally, most of these programs are ported from their C++ counterparts. So, if you want to learn Data Structures in C/C++, you'll still find this document useful! Java is an Object Oriented language, but more so than C++, so, most data structure concepts are expressed and illustrated "more naturally" in Java! (try not to raise your blood pressure from all the caffeine)
I suggest that you be familiar with Java format, and know some other programming language in advance. Coincidentally, I and a couple of my friends are in the process of writing a C language book, which deals with all that "start up" stuff.
The way most examples are executed is through the
JDK's command line Java interpreter. (at the prompt, you just type "java"
and the name of the class to run.)
Variables are the key to any program. There are variables called registers inside every CPU (Central Processing Unit). Every program ever written uses some form of variables. Believe it or not, the way you use variables can significantly impact your program. This section is a very simple introduction to what variables are, and how they're used in programs.
Usually, a variable implies a memory location to hold one instance of one specific type. What this means is that if there is an integer variable, it can only hold one integer, and if there is a character variable, it can only hold one character.
There can be many different types of variables, including of your own type. A sample declaration for different variable types is given below.
boolean t;
byte b;
char c;
int i;
long l;
I believe the above is straight forward, and doesn't
need much explanation. Variable 't' is declared as boolean type,
and 'b' as of byte type, etc.
The above variables are what's know as primitive
types. Primitive types in Java means that you don't have to create them, they're already
available as soon as you declare them. (you'll see what I mean when we deal with Objects)
It also means that there is usually some hardware equivalent to these variables. For
example, an int type, can be stored in a 32 bit hardware register.
The other types of variables are instances of classes or Objects. Java is a very Object Oriented language, and everything in it, is an object. An object is an instance of a class. Your Java programs consist of classes, in which you manipulate objects, and make the whole program do what you want. This concept will be familiar to you if you've ever programmed C++, if not, think of objects as structures. An example of a simple class would be:
public class pSimpleObject{
int i;
public pSimpleObject(){
i = 0;
}
public int get(){
return i;
}
public void set(int n){
i = n;
}
}
As you can see, first we specify that the class is public,
this means that it can be visible to other objects outside it's file. We later say that
it's a class, and give it's name, which in this case is: pSimpleObject.
Inside of it, the class contains an integer named 'i', and three functions.
The first function named pSimpleObject(), is the constructor. It is called
every time an object is created using this class. The set() and get()
functions set and get the value of 'i' respectively. One useful terminology
is that functions in objects are not called functions, they're called methods. So, to
refer to function set(), you'd say "method set()."
That's all there is to objects!
The way you declare a variable, or in this case, an object of that class, is:
pSimpleObject myObject;
myObject = new pSimpleObject();
or
pSimpleObject myObject = new pSimpleObject();
The first example illustrates how you declare an
object named myObject, of class pSimpleObject, and later
instantiate it (a process of actual creation, where it calls the object's constructor
method). The second approach illustrates that it all can be done in one line. The object
does not get created when you just declare it, it's only created when you do a new
on it.
If you're familiar with C/C++, think of objects as pointers. First, you declare it, and then you allocate a new object to that pointer. The only limitation seems to be that you can't do math on these pointers, other than that, they behave as plain and simple C/C++ pointers. (You might want to think of objects as references however.)
Using variables is really cool, and useful, but sometimes we'd like to have more. Like the ability to work with hundreds or maybe thousands of variables at the same time. And here's where our next section starts, the Arrays!
One of the most basic data structures, is an array. An array is just a number of items, of same type, stored in linear order, one after another. Arrays have a set limit on their size, they can't grow beyond that limit. Arrays usually tend to be easier to work with and generally more efficient than other structural approaches to organizing data; way better than a no formal structure approach.
For example, lets say you wanted to have 100 numbers. You can always resort to having 100 different variables, but that would be a pain. Instead, you can use the clean notation of an array to create, and later manipulate those 100 numbers. For example, to create an array to hold 100 numbers you would do something like this:
int[] myArray;
myArray = new int[100];
or
int[] myArray = new int[100];
or
int myArray[] = new int[100];
The three notations above do exactly the same thing.
The first declares an array, and then it creates an array by doing a new. The
second example shows that it can all be one in one line. And the third example shows that
Java holds the backwards compatibility with C++, where the array declaration is: int
myArray[]; instead of int[] myArray;. To us, these notations are
exactly the same. I do however prefer to use the Java one.
Working with arrays is also simple, think of them as just a line of variables, we can address the 5th element (counting from 0, so, it's actually the 6th element) by simply doing:
int i = myArray[5];
The code above will set integer 'i' to
the value of the 5th (counting from 0) element of the array. Similarly, we can set an
array value. For example, to set the 50th element (counting from 0), to the value of 'i'
we'd do something like:
myArray[50] = i;
As you can see, arrays are fairly simple. The best and most convenient way to manipulate arrays is using loops. For example, lets say we wanted to make an array from 1 to 100, to hold numbers from 1 to 100 respectively, and later add seven to every element inside that array. This can be done very easily using two loops. (actually, it can be done in one loop, but I am trying to separate the problem into two)
int i;
for(i=0;i<100;i++)
myArray[i] = i;
for(i=0;i<100;i++)
myArray[i] = myArray[i] + 7;
In Java, we don't need to remember the size of the
array as in C/C++. Here, we have the length variable in every array, and we can check it's
length whenever we need it. So to print out any array named: myArray, we'd do
something like:
for(int i = 0;i<myArray.length;i++)
System.out.println(myArray[i]);
This will work, given the objects inside the myArray
are printable, (have a corresponding toString() method), or are of primitive
type.
One of the major limitations on arrays is that they're fixed in size. They can't grow or shrink according to need. If you have an array of 100 max elements, it will not be able to store 101 elements. Similarly, if you have less elements, then the unused space is being wasted (doing nothing).
Java API provides data storage classes, which
implement an array for their storage. As an example, take the java.util.Vector
class (JDK 1.2), it can grow, shrink, and do some quite useful things. The way it does it
is by reallocating a new array every time you want to do some of these operations, and
later copying the old array into the new array. It can be quite fast for small sizes, but
when you're talking about several megabyte arrays, and every time you'd like to add one
more number (or object) you might need to reallocate the entire array; that can get quite
slow. Later, we will look at other data structures where we won't be overly concerned with
the amount of the data and how often we need to resize.
Even in simplest situations, arrays are powerful storage constructs. Sometimes, however, we'd like to have more than just a plain vanilla array.
The next and more serious data structure we'll examine is the Stack. A stack is a FILO (First In, Last Out), structure. For now, we'll just deal with the array representation of the stack. Knowing that we'll be using an array, we automatically think of the fact that our stack has to have a maximum size.
A stack has only one point where data enters or leaves. We can't insert or remove elements into or from the middle of the stack. As I've mentioned before, everything in Java is an object, (since it's an Object Oriented language), so, lets write a stack object!
public class pArrayStackInt{
protected int head[];
protected int pointer;
public pArrayStackInt(int capacity){
head = new int[capacity];
pointer = -1;
}
public boolean isEmpty(){
return pointer == -1;
}
public void push(int i){
if(pointer+1 < head.length)
head[++pointer] = i;
}
public int pop(){
if(isEmpty())
return 0;
return head[pointer--];
}
}
As you can see, that's the stack class. The
constructor named pArrayStackInt() accepts an integer. That integer is to
initialize the stack to that specific size. If you later try to push() more
integers onto the stack than this capacity, it won't work. Nothing is complete without
testing, so, lets write a test driver class to test this stack.
import java.io.*;
import pArrayStackInt;
class pArrayStackIntTest{
public static void main(String[] args){
pArrayStackInt s = new pArrayStackInt(10);
int i,j;
System.out.println("starting...");
for(i=0;i<10;i++){
j = (int)(Math.random() * 100);
s.push(j);
System.out.println("push: " + j);
}
while(!s.isEmpty()){
System.out.println("pop: " + s.pop());
}
System.out.println("Done ;-)");
}
}
The test driver does nothing special, it inserts ten random numbers onto the stack, and then pops them off. Writing to standard output exactly what it's doing. The output gotten from this program is:
starting...
push: 33
push: 66
push: 10
push: 94
push: 67
push: 79
push: 48
push: 7
push: 79
push: 32
pop: 32
pop: 79
pop: 7
pop: 48
pop: 79
pop: 67
pop: 94
pop: 10
pop: 66
pop: 33
Done ;-)
As you can see, the first numbers to be pushed on, are the last ones to be popped off. A perfect example of a FILO structure. The output also assures us that the stack is working properly.
Now that you've had a chance to look at the source, lets look at it more closely.
The pArrayStackInt class is using an
array to store it's data. The data is int type (for simplicity). There is a
head data member, that's the actual array. Because we're using an array, with limited
size, we need to keep track of it's size, so that we don't overflow it; we always look at head.length
to check for maximum size.
The second data member is pointer.
Pointer, in here, points to the top of the stack. It always has the position which had the
last insertion, or -1 if the stack is empty.
The constructor: pArrayStackInt(),
accepts the maximum size parameter to set the size of the stack. The rest of the functions
is just routine initialization. Notice that pointer is initialized to -1, this makes the
next position to be filled in an array, 0.
The isEmpty() function is self
explanatory, it returns true if the stack is empty (pointer is
-1), and false otherwise. The return type is boolean.
The push(int) function is fairly easy
to understand too. First, it checks to see if the next insertion will not overflow the
array. If no danger from overflow, then it inserts. It first increments the pointer and
then inserts into the new location pointed to by the updated pointer. It
could easily be modified to actually make the array grow, but then the whole point of
"simplicity" of using an array will be lost.
The int pop() function is also very
simple. First, it checks to see if stack is not empty, if it is empty, it will return 0.
In general, this is a really bad error to pop of something from an empty stack. You may
want to do something more sensible than simply returning a 0 (an exception throw would not
be a bad choice). I did it this way for the sake of simplicity. Then, it returns the value
of the array element currently pointed to by pointer, and it decrements the pointer. This
way, it is ready for the next push or pop.
I guess that just about covers it. Stack is very simple and is very basic. There are tons of useful algorithms which take advantage of this FILO structure. Now, lets look at alternative implementations...
Given the above, a lot of the C++ people would look at me strangely, and say: "All this trouble for a stack that can only store integers?" Well, they're probably right for the example above. It is too much trouble. The trick I'll illustrate next is what makes Java my favorite Object Oriented language.
In C, we have the void* type, to make
it possible to store "generic" data. In C++, we also have the void*
type, but there, we have very useful templates. Templates is a C++ way to make generic
objects, (objects that can be used with any type). This makes quite a lot of sense for a
data storage class; why should we care what we're storing?
The way Java implements these kinds of generic
classes is by the use of parent classes. In Java, every object is a descendent of the Object
class. So, we can just use the Object class in all of our structures, and
later cast it to an appropriate type. Next, we'll write an example that uses this
technique inside a generic stack.
public class pArrayStackObject{
protected Object head[];
protected int pointer;
public pArrayStackObject(int capacity){
head = new Object[capacity];
pointer = -1;
}
public boolean isEmpty(){
return pointer == -1;
}
public void push(Object i){
if(pointer+1 < head.length)
head[++pointer] = i;
}
public Object pop(){
if(isEmpty())
return null;
return head[pointer--];
}
}
The above is very similar to the int
only version, the only things that changed are the int to Object.
This stack, allows the push() and pop() of any Object.
Lets convert our old test driver to accommodate this new stack. The new test module will
be inserting java.lang.Integer objects (not int; not primitive
type).
import java.io.*;
import pArrayStackObject;
class pArrayStackObjectTest{
public static void main(String[] args){
pArrayStackObject s = new pArrayStackObject(10);
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<10;i++){
j = new Integer((int)(Math.random() * 100));
s.push(j);
System.out.println("push: " + j);
}
while(!s.isEmpty()){
System.out.println("pop: " + ((Integer)s.pop()));
}
System.out.println("Done ;-)");
}
}
And for the sake of being complete, I'll include the
output. Notice that here, we're not inserting elements of int type, we're
inserting elements of java.lang.Integer type. This means, that we can insert
any Object.
starting...
push: 45
push: 7
push: 33
push: 95
push: 28
push: 98
push: 87
push: 99
push: 66
push: 40
pop: 40
pop: 66
pop: 99
pop: 87
pop: 98
pop: 28
pop: 95
pop: 33
pop: 7
pop: 45
Done ;-)
I guess that covers stacks. The main idea you should learn from this section is that a stack is a FILO data structure. After this section, non of the data structures will be working with primitive types, and everything will be done solely with objects. (now that you know how it's done...)
And now, onto the array relative of Stack, the Queue.
A queue is a FIFO (First In, First Out) structure. Anything that's inserted first, will be the first to leave (kind of like the real world queues.) This is totally the opposite of what a stack is. Although that is true, the queue implementation is quite similar to the stack one. It also involves pointers to specific places inside the array.
With a queue, we need to maintain two pointers, the start
and the end. We'll determine when the queue is empty if start
and end point to the same element. To determine if the queue is full (since
it's an array), we'll have a boolean variable named full. To
insert, we'll add one to the start, and mod (the % operator)
with the size of the array. To remove, we'll add one to the end, and mod (the
% operator) with the size of the array. Simple? Well, lets write it.
public class pArrayQueue{
protected Object[] array;
protected int start,end;
protected boolean full;
public pArrayQueue(int maxsize){
array = new Object[maxsize];
start = end = 0;
full = false;
}
public boolean isEmpty(){
return ((start == end) && !full);
}
public void insert(Object o){
if(!full)
array[start = (++start % array.length)] = o;
if(start == end)
full = true;
}
public Object remove(){
if(full)
full = false;
else if(isEmpty())
return null;
return array[end = (++end % array.length)];
}
}
Well, that's the queue class. In it, we have four
variables, the array, the start and end, and a boolean
full. The constructor pArrayQueue(int maxsize) initializes the queue,
and allocates an array for data storage. The isEmpty() method is
self explanatory, it checks to see if start and end are equal;
this can only be in two situations: when the queue is empty, and when the queue is full.
It later checks the full variable and returns whether this queue is empty or
not.
The insert(Object) method, accepts an Object
as a parameter, checks whether the queue is not full, and inserts it. The
insert works by adding one to start, and doing a mod with array.length
(the size of the array), the resulting location is set to the incoming
object. We later check to see if this insertion caused the queue to become full, if yes,
we note this by setting the full variable to true.
The Object remove() method, doesn't
accept any parameters, and returns an Object. It first checks to see if the
queue is full, if it is, it sets full to false
(since it will not be full after this removal). If it's not full,
it checks if the queue is empty, by calling isEmpty(). If it is, the method
returns a null, indicating that there's been an error. This is usually a
pretty bad bug inside a program, for it to try to remove something from an empty queue,
so, you might want to do something more drastic in such a situation (like an exception
throw). The method continues by removing the end object from the queue. The removal is
done in the same way insertion was done. By adding one to the end, and later
mod it with array.length (array size), and that position is
returned.
There are other implementations of the same thing, a
little re-arrangement can make several if() statements disappear. The reason
it's like this is because it's pretty easy to think of it. Upon insertion, you add one to start
and mod, and upon removal, you add one to end and mod... easy?
Well, now that we know how it works, lets actually test it! I've modified that pretty cool test driver from the stack example, and got it ready for this queue, so, here it comes:
import java.io.*;
import pArrayQueue;
class pArrayQueueTest{
public static void main(String[] args){
pArrayQueue q = new pArrayQueue(10);
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<10;i++){
j = new Integer((int)(Math.random() * 100));
q.insert(j);
System.out.println("insert: " + j);
}
while(!q.isEmpty()){
System.out.println("remove: " + ((Integer)q.remove()));
}
System.out.println("Done ;-)");
}
}
As you can see, it inserts ten random java.lang.Integer
Objects onto the queue, and later prints them out. The output from the program
follows:
starting...
insert: 3
insert: 70
insert: 5
insert: 17
insert: 26
insert: 79
insert: 12
insert: 44
insert: 25
insert: 27
remove: 3
remove: 70
remove: 5
remove: 17
remove: 26
remove: 79
remove: 12
remove: 44
remove: 25
remove: 27
Done ;-)
I suggest you compare this output to the one from stack. It's almost completely different. I guess that's it for this array implementation of this FIFO data structure. And now, onto something more complex...
The next step up in complexity is a list. Most people prefer to implement a list as a linked list (and I'll show how to do that later), but what most people miss, is that lists can also be implemented using arrays. A list has no particular structure; it just has to allow for the insertion and removal of objects from both ends, and some way of looking at the middle elements.
A list is kind of a stack combined with a queue; with additional feature of looking at the middle elements. Preferably, a list should also contain the current number of elements. Well, lets not just talk about a list, but write one!
public class pArrayList{
protected Object[] array;
protected int start,end,number;
public pArrayList(int maxsize){
array = new Object[maxsize];
start = end = number = 0;
}
public boolean isEmpty(){
return number == 0;
}
public boolean isFull(){
return number >= array.length;
}
public int size(){
return number;
}
public void insert(Object o){
if(number < array.length){
array[start = (++start % array.length)] = o;
number++;
}
}
public void insertEnd(Object o){
if(number < array.length){
array[end] = o;
end = (--end + array.length) % array.length;
number++;
}
}
public Object remove(){
if(isEmpty())
return null;
number--;
int i = start;
start = (--start + array.length) % array.length;
return array[i];
}
public Object removeEnd(){
if(isEmpty())
return null;
number--;
return array[end = (++end % array.length)];
}
public Object peek(int n){
if(n >= number)
return null;
return array[(end + 1 + n) % array.length];
}
}
The class contains four data elements: array,
start, end, and number. The number is
the number of elements inside the array. The start is the starting pointer,
and the end is the ending pointer inside the array (kind of like
the queue design).
The constructor, pArrayList(), and
methods isEmpty(), isFull(), and size(), are pretty
much self explanatory. The insert() method works exactly the same way as an
equivalent queue method. It just increments the start pointer, does a mod
(the % symbol), and inserts into the resulting position.
The insertEnd(Object) method, first
checks that there is enough space inside the array. It then inserts the
element into the end location. The next trick is to decrement the end
pointer, add the array.length, and do a mod with array.length.
This had the effect of moving the end pointer backwards (as if we had
inserted something at the end).
The Object remove() method works on a
very similar principle. First, it checks to see if there are elements to remove, if not,
it simply returns a null (no Object). It then decrements number.
We're keeping track of this number inside all insertion and removal methods,
so that it always contains the current number of elements. We then create a
temporary variable to hold the current position of the start pointer. After
that, we update the start pointer by first decrementing it, adding array.length
to it, and doing a mod with array.length. This gives the appearance of
removing an element from the front of the list. We later return the position inside the
array, which we've saved earlier inside that temporary variable 'i'.
The Object removeEnd() works similar to
the insert() method. It checks to see if there are elements to remove by
calling isEmpty() method, if there aren't, it returns null. It
then handles the number (number of elements) business, and proceeds with
updating the end pointer. It first increments the end pointer,
and then does a mod with array.length, and returns the resulting position.
Simple?
This next Object peek(int n) method is
the most tricky one. It accepts an integer, and we need to return the number which this
integer is pointing to. This would be no problem if we were using an array
that started at 0, but we're using our own implementation, and the list
doesn't necessarily start at array position 0. We start this by
checking if the parameter 'n' is not greater than the number of
elements, if it is, we return null (since we don't want to go past the bounds
of the array). What we do next is add 'n' (the requesting
number) to an incremented end pointer, and do a mod array.length.
This way, it appears as if this function is referencing the array from 0
(while the actual start is the incremented end pointer).
As I've said previously, the code you write is useless, unless it's working, so, lets write a test driver to test our list class. To write the test driver, I've converted that really cool Queue test driver, and added some features to test out the specifics of lists. Well, here it is:
import java.io.*;
import pArrayList;
class pArrayListTest{
public static void main(String[] args){
pArrayList l = new pArrayList(10);
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<5;i++){
j = new Integer((int)(Math.random() * 100));
l.insert(j);
System.out.println("insert: " + j);
}
while(!l.isFull()){
j = new Integer((int)(Math.random() * 100));
l.insertEnd(j);
System.out.println("insertEnd: " + j);
}
for(i=0;i<l.size();i++)
System.out.println("peek "+i+": "+l.peek(i));
for(i=0;i<5;i++)
System.out.println("remove: " + ((Integer)l.remove()));
while(!l.isEmpty())
System.out.println("removeEnd: " + ((Integer)l.removeEnd()));
System.out.println("Done ;-)");
}
}
The test driver is nothing special, it inserts (in
front) five random numbers, and the rest into the back (also five). It then prints out the
entire list by calling peek() inside a for loop. It then
continues with the removal (from front) of five numbers, and later removing the rest (also
five). At the end, the program prints "Done" with a cute smiley face ;-)
The output from this test driver is given below. I suggest you examine it thoroughly, and make sure you understand what's going on inside this data structure.
starting...
insert: 14
insert: 72
insert: 71
insert: 11
insert: 27
insertEnd: 28
insertEnd: 67
insertEnd: 36
insertEnd: 19
insertEnd: 45
peek 0: 45
peek 1: 19
peek 2: 36
peek 3: 67
peek 4: 28
peek 5: 14
peek 6: 72
peek 7: 71
peek 8: 11
peek 9: 27
remove: 27
remove: 11
remove: 71
remove: 72
remove: 14
removeEnd: 45
removeEnd: 19
removeEnd: 36
removeEnd: 67
removeEnd: 28
Done ;-)
Well, if you really understand everything up to this
point, there is nothing new anybody can teach you about arrays (since you know all the
basics). There are however public tools available to simplify your life. Some are good,
some are bad, but one that definitely deserves to have a look at is the java.util.Vector
class; and that's what the next section is about!
The java.util.Vector class is provided
by the Java API, and is one of the most useful array based data storage classes I've ever
seen. The information provided here is as far as JDK 1.2 goes, future versions may have
other implementations; still, the functionality should remain the same. A vector, is a
growing array; as more and more elements are added onto it, the array grows. There is also
a possibility of making the array smaller.
But wait a minute, all this time I've been saying
that arrays can't grow or shrink, and it seems Java API has done it. Not quite. The java.util.Vector
class doesn't exactly grow, or shrink. When it needs to do these operations, it simply
allocates a new array (of appropriate size), and copies the contents of the old array into
the new array. Thus, giving the impression that the array has changed size.
All these memory operations can get quite expensive
if a Vector is used in a wrong way. Since a Vector has a similar
architecture to the array stack we've designed earlier, the best and fastest way to
implement a Vector is to do stack operations. Usually, in programs, we need a
general data storage class, and don't really care about the order in which things are
stored or retrieved; that's where java.util.Vector comes in very useful.
Using a Vector to simulate a queue is
very expensive, since every time you insert or remove, the entire array has to be copied
(not necessarily reallocated but still involves lots of useless work).
Vector allows us to view it's insides
using an Enumerator; a class to go through objects. It is very useful to
first be able to look what you're looking for, and only later decide whether you'd like to
remove it or not. A sample program that uses java.util.Vector for it's
storage follows.
import java.io.*;
import java.util.*;
class pVectorTest{
public static void main(String[] args){
Vector v = new Vector(15);
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<10;i++){
j = new Integer((int)(Math.random() * 100));
v.addElement(j);
System.out.println("addElement: " + j);
}
System.out.println("size: "+v.size());
System.out.println("capacity: "+v.capacity());
Enumeration enum = v.elements();
while(enum.hasMoreElements())
System.out.println("enum: "+(Integer)enum.nextElement());
System.out.println("Done ;-)");
}
}
The example above should be self evident (if you paid attention when I showed test programs for the previous data structures). The main key difference is that this one doesn't actually remove objects at the end; we just leave them inside. Removal can be accomplished very easily, and if you'll be doing anything cool with the class, you'll sure to look up the API specs.
Printing is accomplished using an Enumerator;
which we use to march through every element printing as we move along. We could also have
done the same by doing a for loop, going from 0 to v.size(),
doing a v.elementAt(int) every time through the loop. The output from the
above program follows:
starting...
addElement: 9
addElement: 5
addElement: 54
addElement: 49
addElement: 60
addElement: 81
addElement: 8
addElement: 91
addElement: 76
addElement: 81
size: 10
capacity: 15
enum: 9
enum: 5
enum: 54
enum: 49
enum: 60
enum: 81
enum: 8
enum: 91
enum: 76
enum: 81
Done ;-)
You should notice that when we print the size
and capacity, they're different. The size is the current number
of elements inside the Vector, and the capacity, is the maximum
possible without reallocation.
A trick you can try yourself when playing with the Vector
is to have Vectors of Vectors (since Vector is also
an Object, there shouldn't be any problems of doing it). Constructs like that
can lead to some interesting data structures, and even more confusion. Just try inserting
a Vector into a Vector ;-)
I guess that covers the Vector class.
If you need to know more about it, you're welcome to read the API specs for it. I also
greatly encourage you to look at java.util.Vector source, and see for
yourself what's going on inside that incredibly simple structure.
The other type of data structures are what's called Node based data structures. Instead of storing data in it's raw format, Node based data structures store nodes, which in turn store the data. Think of nodes as being elements, which may have one or more pointers to other nodes.
Yes, I did say the "pointer" word. Many people think that there are no pointers in Java, but just because you don't see them directly, doesn't mean they're not there. In fact, you can treat any object as a pointer.
Thus, the Node structure should have a data element, and a reference to another node (or nodes). Those other nodes which are referenced to, are called child nodes. The node itself is called the parent node (or sometimes a "father" node) in reference to it's children. (nice big happy family)
Well, the best way to visualize a node is to create one, so, lets do it. The node we'll create will be a one child node (it will have only one pointer), and we'll later use it in later sections to build really cool data structures. The source for our one child node follows:
public class pOneChildNode{
protected Object data;
protected pOneChildNode next;
public pOneChildNode(){
next = null;
data = null;
}
public pOneChildNode(Object d,pOneChildNode n){
data = d;
next = n;
}
public void setNext(pOneChildNode n){
next = n;
}
public void setData(Object d){
data = d;
}
public pOneChildNode getNext(){
return next;
}
public Object getData(){
return data;
}
public String toString(){
return ""+data;
}
}
Go over the source, notice that it's nothing more
than just set and get functions (pretty simple). The two data members are the data
and next. The data member holds the data of the node, and next
holds the pointer to the next node. Notice that next is of the same type as
the class itself; it effectively points to the object of same class!
The String toString() method is the
Java's standard way to print things. If an object wants to be printed in a special way, it
will define this method, with instructions on how to print this object. In our case, we
just want to print the data. Adding data to a bunch of quotation
marks automatically converts it to type String (hopefully, our data will also
have a toString() method defined on it). Without this method, we get the
actual binary representation of the data members of this class (not a pretty nor
meaningful printout).
Node based data structures provide for dynamic growing and shrinking, and are the key to some complex algorithms (as you'll see later). Now that we know how to implement a Node, lets get to something cool...
A linked list is just a chain of nodes, with each subsequent node being a child of the previous one. Many programs rely on linked lists for their storage because these don't have any evident restrictions. For example, the array list we did earlier could not grow or shrink, but node based ones can! This means there is no limit (other than the amount of memory) on the number of elements they can store.
A linked list has just one node, that node, has
links to subsequent nodes. So, the entire list can be referenced from that one node. That
first node is usually referred to as the head of the list. The last node in the chain of
nodes usually has some special feature to let us know that it's last. That feature, most
of the time is a null pointer to the next node.
[node0]->[node1]->[node2]->[node3]->[node4]->null
The example above illustrates the node organization
inside the list. In it, node0 is the head node, and node4 is the
last node, because it's pointer points to null. Well, now that you know how
it's done, and what is meant by a linked list, lets write one. (I mean, that's why we're
here, to actually write stuff!)
import pOneChildNode;
public class pLinkedList{
protected pOneChildNode head;
protected int number;
public pLinkedList(){
head = null;
number = 0;
}
public boolean isEmpty(){
return head == null;
}
public int size(){
return number;
}
public void insert(Object obj){
head = new pOneChildNode(obj,head);
number++;
}
public Object remove(){
if(isEmpty())
return null;
pOneChildNode tmp = head;
head = tmp.getNext();
number--;
return tmp.getData();
}
public void insertEnd(Object obj){
if(isEmpty())
insert(obj);
else{
pOneChildNode t = head;
while(t.getNext() != null)
t=t.getNext();
pOneChildNode tmp =
new pOneChildNode(obj,t.getNext());
t.setNext(tmp);
number++;
}
}
public Object removeEnd(){
if(isEmpty())
return null;
if(head.getNext() == null)
return remove();
pOneChildNode t = head;
while(t.getNext().getNext() != null)
t = t.getNext();
Object obj = t.getNext().getData();
t.setNext(t.getNext().getNext());
number--;
return obj;
}
public Object peek(int n){
pOneChildNode t = head;
for(int i = 0;i<n && t != null;i++)
t = t.getNext();
return t.getData();
}
}
Before we move on, lets go over the source. There
are two data members, one named head, and the other named number.
Head is the first node of the list, and number is the total number of nodes
in the list. Number is primarily used for the size() method. The constructor,
pLinkedList() is self explanatory. The size() and isEmpty()
methods are also pretty easy.
Here comes the hard part, the insertion and removal
methods. Method insert(Object) creates a new pOneChildNode
object with next pointer pointing to the current head, and data
the data which is inserted. It then sets the head to that new node. If you
think about it, you'll notice that the head is still saved, and the new node
points to it.
Method Object remove() works in a very
similar fashion, but instead of inserting, it is removing. It first checks to see if the
list is isEmpty() or not, if it is, it returns a null. It then
saves the current head node, and then changes it to accommodate the removal
(think about the logic), decrements the number, and returns the data
from the previously saved node.
In the method insertEnd(Object), we're
actually inserting at the end of the list. We first check to see if the list is isEmpty(),
if it is, we do a regular insertion (since it really doesn't matter which direction we're
coming from if the list is empty). We then setup a loop to search for the
end. The end is symbolized by the next pointer of the node being null.
When we get to the end, we create a new node, and place it at the end location.
Incrementing number before we return.
Method Object removeEnd() works in a
similar fashion as insertend(Object) method. It also goes through the whole
list to look for the end. At the beginning, we check if the list is not isEmpty(),
if it is, we return a null. We then check to see if there is only one element
in the list, if there is only one, we remove it using regular remove(). We
then setup a loop to look for the node who's child is the last node. It is important to
realize that if we get to the last node, we won't be able to erase it; we need the last
node's parent node. When we find it, we get the data, setup necessary links,
decrement number, and return the data.
The Object peek(int) method simply goes
through the list until it either reaches the element requested, or the end of the list. If
it reaches the end, it should return a null, if not, it should return the
correct location inside the list.
As I have said before, it is very important to
actually test. The ideas could be fine, and logical, but if it doesn't work, it doesn't
work. So, lets convert our pArrayListTest driver to accommodate this class.
import java.io.*;
import pLinkedList;
class pLinkedListTest{
public static void main(String[] args){
pLinkedList l = new pLinkedList();
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<5;i++){
j = new Integer((int)(Math.random() * 100));
l.insert(j);
System.out.println("insert: " + j);
}
for(;i<10;i++){
j = new Integer((int)(Math.random() * 100));
l.insertEnd(j);
System.out.println("insertEnd: " + j);
}
for(i=0;i<l.size();i++)
System.out.println("peek "+i+": "+l.peek(i));
for(i=0;i<5;i++)
System.out.println("remove: " + ((Integer)l.remove()));
while(!l.isEmpty())
System.out.println("removeEnd: " + ((Integer)l.removeEnd()));
System.out.println("Done ;-)");
}
}
The test driver is nothing special, it's just a pretty simple conversion of the old test driver, so, I wont spend any time discussing it. The output follows.
starting...
insert: 65
insert: 78
insert: 21
insert: 73
insert: 62
insertEnd: 82
insertEnd: 63
insertEnd: 6
insertEnd: 95
insertEnd: 57
peek 0: 62
peek 1: 73
peek 2: 21
peek 3: 78
peek 4: 65
peek 5: 82
peek 6: 63
peek 7: 6
peek 8: 95
peek 9: 57
remove: 62
remove: 73
remove: 21
remove: 78
remove: 65
removeEnd: 57
removeEnd: 95
removeEnd: 6
removeEnd: 63
removeEnd: 82
Done ;-)
Look over the output, make sure you understand why
you get what you get. Linked lists are one of the most important data structures you'll
ever learn, and it really pays to know them well. Don't forget that you can always
experiment. One exercise I'd like to leave up to the reader is to create a circular list.
In a circular list, the last node is not pointing to null, but to the first
node (creating a circle). Sometimes, lists are also implemented using two pointers; and
there are many other variations you should consider and try yourself. You can even make it
faster by having a "dummy" first node and/or "tail" node. This will
eliminate most special cases, making it faster on insertions and deletions.
I guess that's it for the lists, next, I'll show you how to do simple and easy tricks to re-use code.
We have already written quite a lot of useful stuff, and there might come a time, when you're just too lazy to write something new, and would like to reuse the old source. This section will show you how you can convert some data structures previously devised, to implement a stack and a queue, with almost no creativity (by simply reusing the old source).
Before we start, lets review the function of a stack. It has to be able to push and pop items of from one end. What structure do we know that can do something similar? A list! Lets write a list implementation of a stack.
import pLinkedList;
public class pEasyStack{
protected pLinkedList l;
public pEasyStack(){
l = new pLinkedList();
}
public boolean isEmpty(){
return l.isEmpty();
}
public void push(Object o){
l.insert(o);
}
public Object pop(){
return l.remove();
}
}
See how easily the above code simulates a stack by using a list? It may not be the best implementation, and it's certainly not the fastest, but when you need to get the project compiled and tested, you don't want to spend several unnecessary minutes writing a full blown optimized stack. Test for the stack follows:
import java.io.*;
import pEasyStack;
class pEasyStackTest{
public static void main(String[] args){
pEasyStack s = new pEasyStack();
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<10;i++){
j = new Integer((int)(Math.random() * 100));
s.push(j);
System.out.println("push: " + j);
}
while(!s.isEmpty()){
System.out.println("pop: " + ((Integer)s.pop()));
}
System.out.println("Done ;-)");
}
}
The stack test program is exactly the same as for the previous stack version (doesn't really need explanation). For the completion, I'll also include the output.
starting...
push: 23
push: 99
push: 40
push: 78
push: 54
push: 27
push: 52
push: 34
push: 98
push: 89
pop: 89
pop: 98
pop: 34
pop: 52
pop: 27
pop: 54
pop: 78
pop: 40
pop: 99
pop: 23
Done ;-)
You've seen how easily we can make a stack. What about other data structures? Well, we can just as easily implement a queue. One thing though, now instead of just inserting and removing, we'll be removing from the end (the other from the one we're inserting).
import pLinkedList;
public class pEasyQueue{
protected pLinkedList l;
public pEasyQueue(){
l = new pLinkedList();
}
public boolean isEmpty(){
return l.isEmpty();
}
public void insert(Object o){
l.insert(o);
}
public Object remove(){
return l.removeEnd();
}
}
Pretty easy huh? Well, the test driver is also easy.
import java.io.*;
import pEasyQueue;
class pEasyQueueTest{
public static void main(String[] args){
pEasyQueue s = new pEasyQueue();
Integer j = null;
int i;
System.out.println("starting...");
for(i=0;i<10;i++){
j = new Integer((int)(Math.random() * 100));
s.insert(j);
System.out.println("insert: " + j);
}
while(!s.isEmpty()){
System.out.println("remove: " + ((Integer)s.remove()));
}
System.out.println("Done ;-)");
}
}
I guess you get the picture, reusing code may not always be the best choice, but it sure is the easiest! Definitely, if you have time, always write a better implementation; these approaches are only good for the deadline, just to compile and test the code before the actual hard work of optimizing it. For the sake of completeness, the output of the above program follows:
starting...
insert: 77
insert: 79
insert: 63
insert: 59
insert: 22
insert: 62
insert: 54
insert: 58
insert: 79
insert: 25
remove: 77
remove: 79
remove: 63
remove: 59
remove: 22
remove: 62
remove: 54
remove: 58
remove: 79
remove: 25
Done ;-)
I guess that covers code reuse! And remember, you
can also reuse the code in the Java API. The reuse can be of many forms. You can reuse it
the way I've shown in this section, or you can extend that particular class to provide the
necessary functions. (Extending a java.util.Vector class is very useful ;-)
The next major set of data structures belongs to what's called Trees. They are called that, because if you try to visualize the structure, it kind of looks like a tree (root, branches, and leafs). Trees are node based data structures, meaning that they're made out of small parts called nodes. You already know what a node is, and used one to build a linked list. Tree Nodes have two or more child nodes; unlike our list node, which only had one child.
Trees are named by the number of children their nodes have. For example, if a tree node has two children, it is called a binary tree. If it has three children, it is called tertiary tree. If it has four children, it is called a quad tree, and so on. Fortunately, to simplify things, we only need binary trees. With binary trees, we can simulate any tree; so the need for other types of trees only becomes a matter of simplicity for visualization.
Since we'll be working with binary trees, lets write
a binary tree node. It's not going to be much different from our pOneChildNode
class; actually, it's going to be quite the same, only added support for one more pointer.
The source for the follows:
public class pTwoChildNode{
protected Object data;
protected pTwoChildNode left,right;
public pTwoChildNode(){
data = null;
left = right = null;
}
public pTwoChildNode(Object d){
data = d;
left = right = null;
}
public void setLeft(pTwoChildNode l){
left = l;
}
public void setRight(pTwoChildNode r){
right = r;
}
public void setData(Object d){
data = d;
}
public pTwoChildNode getLeft(){
return left;
}
public pTwoChildNode getRight(){
return right;
}
public Object getData(){
return data;
}
public String toString(){
return ""+data;
}
}
This node is so much similar to the previous node we
did, that I'm not even going to cover this one. (I assume you'll figure it out by simply
looking at the source) The children of the node are named left and right;
these will be the left branch of the node and a right branch. If a node has no children,
it is called a leaf node. If a node has no parent (it's the father of every node), it is
called the root of the tree. This weird terminology comes from the tree analogy, and from
the family tree analogy.
Some implementations of the tree node, might also have a back pointer to the parent node, but for what we'll be doing with the nodes, it's not necessary. The next section will talk about a generic binary tree which will be later used to create something cool.
Binary Trees are quite complex, and most of the time, we'd be writing a unique implementation for every specific program. One thing that almost never changes though is the general layout of a binary tree. We will first implement that general layout as an abstract class (a class that can't be used directly), and later write another class which extends our layout class.
Trees have many different algorithms associated with them. The most basic ones are the traversal algorithms. Traversals algorithms are different ways of going through the tree (the order in which you look at it's values). For example, an in-order traversal first looks at the left child, then the data, and then the right child. A pre-order traversal, first looks at the data, then the left child, and then the right; and lastly, the post-order traversal looks at the left child, then right child, and only then data. Different traversal types mean different things for different algorithms and different trees. For example, if it's binary search tree (I'll show how to do one later), then the in-order traversal will print elements in a sorted order.
Well, lets not just talk about the beauties of trees, and start writing one! The code that follows creates an abstract Generic Binary Tree.
import pTwoChildNode;
public abstract class pGenericBinaryTree{
private pTwoChildNode root;
protected pTwoChildNode getRoot(){
return root;
}
protected void setRoot(pTwoChildNode r){
root = r;
}
public pGenericBinaryTree(){
setRoot(null);
}
public pGenericBinaryTree(Object o){
setRoot(new pTwoChildNode(o));
}
public boolean isEmpty(){
return getRoot() == null;
}
public Object getData(){
if(!isEmpty())
return getRoot().getData();
return null;
}
public pTwoChildNode getLeft(){
if(!isEmpty())
return getRoot().getLeft();
return null;
}
public pTwoChildNode getRight(){
if(!isEmpty())
return getRoot().getRight();
return null;
}
public void setData(Object o){
if(!isEmpty())
getRoot().setData(o);
}
public void insertLeft(pTwoChildNode p,Object o){
if((p != null) && (p.getLeft() == null))
p.setLeft(new pTwoChildNode(o));
}
public void insertRight(pTwoChildNode p,Object o){
if((p != null) && (p.getRight() == null))
p.setRight(new pTwoChildNode(o));
}
public void print(int mode){
if(mode == 1) pretrav();
else if(mode == 2) intrav();
else if(mode == 3) postrav();
}
public void pretrav(){
pretrav(getRoot());
}
protected void pretrav(pTwoChildNode p){
if(p == null)
return;
System.out.print(p.getData()+" ");
pretrav(p.getLeft());
pretrav(p.getRight());
}
public void intrav(){
intrav(getRoot());
}
protected void intrav(pTwoChildNode p){
if(p == null)
return;
intrav(p.getLeft());
System.out.print(p.getData()+" ");
intrav(p.getRight());
}
public void postrav(){
postrav(getRoot());
}
protected void postrav(pTwoChildNode p){
if(p == null)
return;
postrav(p.getLeft());
postrav(p.getRight());
System.out.print(p.getData()+" ");
}
}
Now, lets go over it. The pGenericBinaryTree
is a fairly large class, with a fair amount of methods. Lets start with the one and only
data member, the root! In this abstract class, root
is a private head of the entire tree. Actually, all we need is root
to access anything (and that's how you'd implement it in other languages). Since we'd like
to have access to root from other places though (from derived classes, but
not from the "outside," we've also added two methods, named getRoot(),
and setRoot() which get and set the value of root respectively.
We have two constructors, one with no arguments
(which only sets root to null), and another with one argument (the first
element to be inserted on to the tree). Then we have a standard isEmpty()
method to find out if the tree is empty. You'll also notice that implementing a counter
for the number of elements inside the tree is not a hard thing to do (very similar to the
way we did it in a linked list).
The getData() method returns the data
of the root node. This may not be particularly useful to us right now, but
may be needed in some derived class (so, we stick it in there just for convenience).
Throughout data structures, and mostly entire programming world, you'll find that certain
things are done solely for convenience. Other "convenient" methods are getLeft(),
getRight() and setData().
The two methods we'll be using later (for something
useful), are: insertLeft(pTwoChildNode,Object), and insertRight(pTwoChildNode,Object).
These provide a nice way to quickly insert something into the left child (sub-tree) of the
given node.
The rest are just print methods. The trick about trees are that they can be traversed in many different ways, and these print methods print out the whole tree, in different traversals. All of these are useful, depending on what you're doing, and what type of tree you have. Sometimes, some of them make absolutely no sense, depending on what you're doing.
Printing methods are recursive; a lot of the tree
manipulation functions are recursive, since they're described so naturally in recursive
structures. A recursive function is a function that calls itself (kind of like pretrav(),
intrav(), and postrav() does).
Go over the source, make sure you understand what
each function is doing (not a hard task). It's not important for you to understand why we
need all these functions at this point (for now, we "just" need them); you'll
understand why we need some of them in the next few sections, when we extend
this class to create a really cool sorting engine.
Comparing Objects in Java can be a daunting task,
especially if you have no idea how it's done. In Java, we can only compare variables of native
type. These include all but the objects (ex: int, float, double,
etc.). To compare Objects, we have to make objects with certain properties; properties
that will allow us to compare.
We usually create an interface, and implement
it inside the objects we'd like to compare. In our case, we'll call the interface
pComparable. Interfaces are easy to write, since they're kind of like abstract
classes.
public interface pComparable{
public int compareTo(Object o);
}
As you can see, there is nothing special to simple
interfaces. Now, the trick is to implement it. You might be saying, why am I
covering comparing of objects right in the middle of a binary tree discussion... well, we
can't have a binary search tree without being able to compare objects. For example, if
we'd like to use integers in our binary search tree, we'll have to design our own integer,
and let it have a pComparable interface.
Next follows our implementation of pInteger,
a number with a pComparable interface. I couldn't just extend
the java.lang.Integer, since it's final (cannot be extended)
(those geniuses!).
public class pInteger implements pComparable{
int i;
public pInteger(){
}
public pInteger(int j){
set(j);
}
public int get(){
return i;
}
public void set(int j){
i = j;
}
public String toString(){
return ""+get();
}
public int compareTo(Object o){
if(o instanceof pInteger)
if(get() > ((pInteger)o).get())
return 1;
else if(get() < ((pInteger)o).get())
return -1;
return 0;
}
}
I believe most of the interface is self
explanatory, except maybe for the compareTo(Object) method. In the method, we
first make sure that the parameter is of type pInteger, and later using
casting, and calling methods, we compare the underlying native members of pInteger
and return an appropriate result.
A note on JDK 1.2: In the new versions of the JDK,
you won't need to implement your own pComparable, or your own pInteger;
since it's built in! There is a Comparable interface, and it's already implemented
by the built in java.lang.Integer, java.lang.String, and other
classes where you might need comparisons. I'm doing it this way only for compatibility
with the older versions of JDK. I'll talk about JDK 1.2 features later in this document
(hopefully).
And now, back to the show, or shall I say Binary Trees! A binary tree we'll be designing in this section will be what's known as binary search tree. The reason it's called this is that it can be used to sort numbers (or objects) in a way, that makes it very easy to search them; traverse them. Remember how I've said that traversals only make sense in some specific context, well, in binary search tree, only the in-traversal makes sense; in which numbers (objects) are printed in a sorted fashion. Although I'll show all traversals just for the fun of it.
A binary search tree will extend our pGenericBinaryTree,
and will add on a few methods. One that we definitely need is the insert()
method; to insert objects into a tree with binary search in mind. Well,
instead of just talking about it, lets write the source!
import pComparable;
public class pBinarySearchTree extends pGenericBinaryTree{
public pBinarySearchTree(){
super();
}
public pBinarySearchTree(Object o){
super(o);
}
public void print(){
print(2);
}
public void insert(pComparable o){
pTwoChildNode t,q;
for(q = null, t = getRoot();
t != null && o.compareTo(t.getData()) != 0;
q = t,t = o.compareTo(t.getData()) < 0
? t.getLeft():t.getRight());
if(t != null)
return;
else if(q == null)
setRoot(new pTwoChildNode(o));
else if(o.compareTo(q.getData()) < 0)
insertLeft(q,o);
else
insertRight(q,o);
}
}
As you can obviously see, the insert(pComparable)
method is definitely the key to the whole thing. The method starts out by declaring two
variables, 't', and 'q'. It then falls into a for
loop. The condition inside the for loop is that 't' does not
equal to null (since it was initially set to getRoot(), which
effectively returns the value of root), and while the object we're trying to insert
does not equal to the object already inside the tree.
Usually, a binary search tree does not allow
duplicate insertions, since they're kind of useless; that's why we're attempting to catch
the case where we're trying to insert a duplicate. Inside the for
loop, we set 'q' to the value of the next node to be examined. We do this by
first comparing the data we're inserting with the data in the current node, if it's
greater, we set 't' to the right node, if less, we set it to the left node
(all this is cleverly disguised inside that for statement).
We later check the value of 't' to make
sure we've gotten to the end (or leaf) of the tree. If 't' is not null,
that means we've encountered a duplicate, and we simply return. We then check
to see if the tree is empty (didn't have a root), if it didn't, we create a new
root by calling setRoot() with a newly created node holding the
inserted data.
If all else fails, simply insert the
object into the left or the right child of the leaf node depending on the value of the
data. And that's that!
Understanding binary search trees is not easy, but it is the key to some very interesting algorithms. So, if you miss out on the main point here, I suggest you read it again, or get a more formal reference (where I doubt you'll learn more).
Anyway, as it was with our stacks and queues, we always had to test everything, so, lets test it! Below, I give you the test module for the tree.
import java.io.*;
import pInteger;
import pBinarySearchTree;
class pBinarySearchTreeTest{
public static void main(String[] args){
pBinarySearchTree tree = new pBinarySearchTree();
pInteger n;
int i;
System.out.println("Numbers inserted:");
for(i=0;i<10;i++){
tree.insert(n=new pInteger((int)(Math.random()*1000)));
System.out.print(n+" ");
}
System.out.println("\nPre-order:");
tree.print(1);
System.out.println("\nIn-order:");
tree.print();
System.out.println("\nPost-order:");
tree.print(3);
}
}
As you can see, it's pretty simple (and similar to
our previous tests). It first inserts ten pInteger (pComparable)
objects in to the tree, and then traverses the tree in different orders. These different
orders print out the whole tree. Since we know it's a binary search tree, the in-order
traversal should produce an ordered output. So, lets take a look at the output!
Numbers inserted:
500 315 219 359 259 816 304 902 681 334
Pre-order:
500 315 219 259 304 359 334 816 681 902
In-order:
219 259 304 315 334 359 500 681 816 902
Post-order:
304 259 219 334 359 315 681 902 816 500
Well, our prediction is confirmed! The in-order traversal did produce sorted results. There is really nothing more I can say about this particular binary search tree, except that it's worth knowing. This is definitely not the fastest (nor was speed an issue), and not necessarily the most useful class, but it sure may proof useful in teaching you how to use trees.
And now, onto something completely different! NOT! We're going to be doing trees for a while... I want to make sure you really understand what they are, and how to use them. (and to show you several tricks other books try to avoid <especially Java books>)
I've talked about tree traversals before in this document, but lets review what I've said. Tree's are created for the sole purpose of traversing them. There are two major traversal algorithms, the depth-first, and breadth-first.
So far, we've only looked at depth-first. Pre-traversal, in-traversal, and post-traversal are subsets of depth-first traversals. The reason it's named depth-first, is because we eventually end up going to the deepest node inside the tree, while still having unseen nodes closer to the root (it's hard to explain, and even harder to understand). Tracing a traversal surely helps; and you can trace that traversal from the previous section (it's only ten numbers!).
The other type of traversal is more intuitive; more "human like." Breadth-first traversal goes through the tree top to bottom, left to right. Lets say you were given a tree to read (sorry, don't have a non-copyrighted picture I can include), you'd surely read it top to bottom, left to right (just like a page of text, or something).
Think of a way you visualize a tree... With the root node on top, and all the rest extending downward. What Breadth-First allows us to do is to trace the tree from top to bottom as you see it. It will visit each node at a given tree depth, before moving onto the the next depth.
A lot of the algorithms are centered around Breadth-First method. Like the search tree for a Chess game. In chess, the tree can be very deep, so, doing a Depth-First traversal (search) would be costly, if not impossible. With Breadth-First as applied in Chess, the program only looks at several moves ahead, without looking too many moves ahead.
The Breadth-First traversal is usually from left to right, but that's usually personal preference. Because the standard consul does not allow graphics, the output may be hard to correlate to the actual tree, but I will show how it's done.
As with previous examples, I will provide some
modified source that will show you how it's done. An extended pBinarySearchTree
is shown below:
import pTwoChildNode;
import pBinarySearchTree;
import pEasyQueue;
public class pBreadthFirstTraversal extends pBinarySearchTree{
public void breadth_first(){
pEasyQueue q = new pEasyQueue();
pTwoChildNode tmp;
q.insert(getRoot());
while(!q.isEmpty()){
tmp = (pTwoChildNode)q.remove();
if(tmp.getLeft() != null)
q.insert(tmp.getLeft());
if(tmp.getRight() != null)
q.insert(tmp.getRight());
System.out.print(tmp.getData()+" ");
}
}
}
As you can see, the class is pretty simple (only one
function). In this demo, we're also using pEasyQueue, developed earlier in
this document. Since breadth first traversal is not like depth first, we can't use
recursion, or stack based methods, we need a queue. Any recursive method can be easily
simulated using a stack, not so with breadth first, here, we definitely need a queue.
As you can see, we start by first inserting the root
node on to the queue, and loop while the queue is not isEmpty(). If we have a
left node in the node being examined, we insert it in to the queue, etc. (same goes for
the right node). Eventually, the nodes inserted in to the queue, get removed, and
subsequently, have their left children examined. The process continues until we've
traversed the entire tree, from top to bottom, left to right order.
Now, lets test it. The code below is pretty much the same code used to test the tree, with one minor addition; the one to test the breadth-first traversal!
import java.io.*;
import pInteger;
import pBinarySearchTree;
class pBreadthFirstTraversalTest{
public static void main(String[] args){
pBreadthFirstTraversal tree = new pBreadthFirstTraversal();
pInteger n;
int i;
System.out.println("Numbers inserted:");
for(i=0;i<10;i++){
tree.insert(n=new pInteger((int)(Math.random()*1000)));
System.out.print(n+" ");
}
System.out.println("\nPre-order:");
tree.print(1);
System.out.println("\nIn-order:");
tree.print();
System.out.println("\nPost-order:");
tree.print(3);
System.out.println("\nBreadth-First:");
tree.breadth_first();
}
}
As you can see, nothing too hard. Next, goes the output of the above program, and you and I will have to spend some time on the output.
Numbers inserted:
890 474 296 425 433 555 42 286 724 88
Pre-order:
890 474 296 42 286 88 425 433 555 724
In-order:
42 88 286 296 425 433 474 555 724 890
Post-order:
88 286 42 433 425 296 724 555 474 890
Breadth-First:
890 474 296 555 42 425 724 286 433 88
Looking at the output in this format is very
abstract and is not very intuitive. Lets just say we have some sort of a tree, containing
these numbers above. We were looking at the root node. Now, looking at the output of this
program, can you guess what the root node is? Well, it's the first number in
breadth-first: 890. The left child of the root is: 474.
Basically, the tree looks like:
|--[890]
|
|--[474]--|
| |
|--[296]--| [555]--|
| | |
[ 42]--| [425]--| [724]
| |
|--[286] [433]
|
[ 88]
As is evident from this picture, I'm a terrible ASCII artist! (If it's screwed up, sorry).
What you can also see is that if you read the tree form left to right, top to bottom, you end up with the breadth first traversal. Actually, this is a good opportunity for you to go over all traversals, and see how they do it, and if they make sense.
There are many variations to these traversals (and I'll show a few in subsequent sections), for now, however, try to understand these four basic ones.
Well, see you again in the next section, where we examine some ways you can speed up your code and take a look at JDK 1.2 features, which will simplify our life a LOT!
Since you've looked at node based data structures, and their flexibility, you might wonder why would we want to use anything else. Well, sometimes, the situation comes where the standards are not enough. Where we need that extra boost of performance, which is currently occupied by the Java's garbage collection.
In any environment, allocating and releasing memory is slow! Most of the time, you should try to avoid the procedure, unless you really need it (or it's not in a time critical loop). You certainly wouldn't want to allocate and/or release memory every time you do anything to a list, or a binary tree. What can we do to avoid the natural way? Plenty!
The most common way around these kinds of problems is to use our own allocation technique, which is perfectly suited for our purposes, and will generally be much faster than the current system's approach (but always make sure; system techniques are quite fast as well).
Using our own allocation scheme does imply that we'll have to restrict our selves, but most of the times (when you really need it), the restriction is worth while. What we will be doing is simulating the allocation and release of nodes, using an array of nodes, stored in a linked list fashion. It may seem complex, but in reality, it's not.
This array of nodes, is usually referred to as the "Node Pool" (we "catch" one when we need one, and "throw it back" when we don't). I know, I know, I haven't been the same since that girl on IRC has stolen my humor.
There are two considerations we should think about before we start programming, however. These two are whether we want to make the node-pool itself static or dynamic? Shall we allocate the node-pool with every new instance of the object, or shall we just share one common node-pool for all the objects of that class?
These two approaches are both quite useful in different situations. For example, when was the last time you've ever used more than 52 cards in a deck of cards? That implies that your deck of cards class may have no more than 52 cards at a time; a perfect candidate for a Node Pool! Inside your game, you have this one class, which can only hold 52 cards, no more. Then again, what if your card game has to have several decks? You might want to extend that a little more; like have a separate deck for each deck! (am I making any sense?)
The reason I describe this deck of cards example is a historical one; it was the first program I've ever used with Node Pools (long time ago; in C++). The one static node-pool allows other instances of the class (inside the same program) to know which cards are missing from this universal deck, and which are present. On the other hand, if you have a local (non-static) node-pool, each deck handles it's own cards, and non of the other decks know anything about it.
It is important to understand this concept, since it makes a lot of sense <sometimes> (and will allow you to write robust code that doesn't waste memory). Then again, sometimes it makes absolutely no sense to use a node-pool at all!
The example we will design in the next few sections
will deal with a binary tree, which does not use dynamic memory when inserting (or
removing) nodes. It will have a non-static node-pool; meaning that you'll have to specify
the maximum number of elements when it's instantiated (create an instance of it). I
believe the example will be general enough for you to learn to apply node-pools in various
different situations. Since I've already showed how to use my own pComparable
for JDK 1.1 (or lower), this next example will deal only with JDK 1.2 java.lang.Comparable
interface for most purposes. (A java.lang.Comparable interface
is worth learning, since it's very useful; I only wonder why they haven't come up with
something similar earlier?)
As with everything else, there are TONS of ways to write exactly the same thing! With nodes, there is no exception. I prefer to implement node-pool nodes with integer children; where the integer points to the next child inside the node-pool array. This may not be the best Object Oriented way to do this, since it's binding the node-pool node to the node-pool tree. However, there are many other solutions to this, and many are more "elegant" than the one I'll be using here. So, if you don't like it, implement your own version!
If you've paid close attention to the discussion about regular binary trees (somewhere in this document), you'll find this discussion of node-pool stuff pretty easy and similar to that "regular" stuff. Enough talk, lets go and write this node-pool node!
public class pNodePoolTreeNode{
private Object data;
private int left;
private int right;
public pNodePoolTreeNode(){
left = right = -1;
data = null;
}
public pNodePoolTreeNode(int l,int r){
left = l;
right = r;
data = null;
}
public Object getData(){
return data;
}
public void setData(Object o){
data = o;
}
public int getLeft(){
return left;
}
public void setLeft(int l){
left = l;
}
public int getRight(){
return right;
}
public void setRight(int r){
right = r;
}
public String toString(){
return new String(data.toString());
}
}
The code above is pretty much self explanatory (if it's not, take a look back at binary tree nodes section, and you'll figure it out). The only significant thing you should notice is that children of the node are now integers, instead of references to other node objects. Although, it later turns out that what these integer children are referencing are in fact nodes.
Well, that was easy enough, are you ready to figure out how a tree actually manipulates these?
A node-pool tree is not that much different from a regular tree. The only key differences is that it has to maintain a linked list (more or less a "stack") of free nodes, and allocate them when needed.
Another tricky thing is that now, the node's children are pointed to by integers, which are references into a node-pool array; so, watch out. You'll notice that a lot of the "tree stuff" has been re-used from the previous version, and you're right! It makes it a lot easier to understand on your part, and a lot easier to write on mine! (oh, I just ran out of coffee...)
import pNodePoolTreeNode;
import java.io.*;
public abstract class pNodePoolArrayTree{
private int numNodes,nextAvail;
private pNodePoolTreeNode[] arrayNodes;
protected int root;
private void init(){
int i;
root = -1;
arrayNodes = new pNodePoolTreeNode[numNodes];
nextAvail = 0;
for(i=0;i<numNodes;i++)
arrayNodes[i] = new pNodePoolTreeNode(-1,i+1);
getNode(i-1).setRight(-1);
}
public pNodePoolArrayTree(){
numNodes = 500;
init();
}
public pNodePoolArrayTree(int n){
numNodes = n;
init();
}
protected int getNode(){
int i = nextAvail;
nextAvail = getNode(nextAvail).getRight();
if(nextAvail == -1){
nextAvail = i;
return -1;
}
return i;
}
protected void freeNode(int n){
getNode(n).setRight(nextAvail);
nextAvail = n;
}
public boolean isEmpty(){
return getRoot() == -1;
}
public boolean isFull(){
int i = getNode();
if(i != -1){
freeNode(i);
return false;
}
return true;
}
protected Object getData(){
if(isEmpty())
return null;
return getNode(getRoot()).getData();
}
protected void setData(Object o){
if(isEmpty())
root = getNode();
getNode(root).setData(o);
getNode(root).setLeft(-1);
getNode(root).setRight(-1);
}
protected int getLeft(){
if(isEmpty())
return -1;
return getNode(getRoot()).getLeft();
}
protected void setLeft(int n){
if(isFull())
return;
if(isEmpty()){
root = getNode();
getNode(getRoot()).setRight(-1);
}
getNode(getRoot()).setLeft(n);
}
protected void setRight(int n){
if(isFull())
return;
if(isEmpty()){
root = getNode();
getNode(getRoot()).setLeft(-1);
}
getNode(getRoot()).setRight(n);
}
protected int getRight(){
if(isEmpty())
return -1;
return getNode(root).getRight();
}
protected int getRoot(){
return root;
}
protected pNodePoolTreeNode getNode(int n){
if(n != -1)
return arrayNodes[n];
else return null;
}
protected void insertLeft(int node,Object o){
if((node != -1) && (getNode(node).getLeft() == -1)
&& !isFull()){
int i = getNode();
getNode(i).setData(o);
getNode(i).setRight(-1);
getNode(node).setLeft(i);
}
}
protected void insertRight(int node,Object o){
if((node != -1) && (getNode(node).getRight() == -1)
&& !isFull()){
int i = getNode();
getNode(i).setData(o);
getNode(i).setRight(-1);
getNode(node).setRight(i);
}
}
public void print(int mode){
switch(mode){
case 1:
pretrav();
break;
case 2:
intrav();
break;
case 3:
postrav();
break;
}
}
public void pretrav(){
pretrav(getRoot());
}
private void pretrav(int p){
if(p == -1)
return;
System.out.print(getNode(p).getData()+" ");
pretrav(getNode(p).getLeft());
pretrav(getNode(p).getRight());
}
public void intrav(){
intrav(getRoot());
}
private void intrav(int p){
if(p == -1)
return;
intrav(getNode(p).getLeft());
System.out.print(getNode(p).getData()+" ");
intrav(getNode(p).getRight());
}
public void postrav(){
postrav(getRoot());
}
private void postrav(int p){
if(p == -1)
return;
postrav(getNode(p).getLeft());
postrav(getNode(p).getRight());
System.out.print(getNode(p).getData()+" ");
}
}
There are not that many hard methods in this one; so, lets focus in on the hard ones, and I'll leave you to figure out the "easy" ones yourself.
One thing I'd like to mention of the top so that it
causes as little confusion as possible: now that we're not using objects, but integers, we
still have to note the "null node" somehow, so, in our example, I'm
using -1 as the "null node" indicator.
Data members numNodes, nextAvail,
and arrayNodes are there to keep track of the linked list of free and
occupied nodes. It's not exactly a linked list (although many people like to think of it
that way), it's more or less a simple array stack we've implemented at the beginning of
this document. The numNodes is the maximum number of nodes inside this tree;
we could just as well substitute it for arrayNodes.length, but I think this
"separation" is more convenient.
The nextAvail member is the one that
points to the next available node. Our allocation methods, getNode() and freeNode(int
n), uses this particular pointer to allocate and release nodes. And arrayNodes
is just an array holding the node-pool. All of the integer child references are into this arrayNodes
array; which we allocate dynamically during instantiation (creation) of the object.
Inside the init() function, we allocate
the arrayNodes array to hold the node-pool. We later loop through all the
nodes inside of it, and turn them into a linked list (with right child pointing to the
next available node and left child being -1). The last node in the list is
apparently marked -1; since it's last!
I suggest you go over the source for getNode()
and freeNode(int n); try to understand what it's doing. Which is nothing more
complex than what we've already done! (look over the linked list description if you find
it confusing)
The rest of the functions are pretty much self explanatory (you should be able to figure them out). Most of them are just a port of the old tree functions to use integer children with node-pool nodes. Well, are you ready for a binary sort tree implementing a node-pool (ready or not, we're going there)!
As I've mentioned before, node-pool implementations are fairly similar to the regular ones, so, our old sort tree will not change much. The only things that will change however, are the node handling methods. Since we already know what needs to be done (review the binary sort example described previously in this document), we can just go ahead and write the source.
import pNodePoolTreeNode;
import pNodePoolArrayTree;
import java.lang.*;
public class pNodePoolSortArrayTree extends pNodePoolArrayTree{
public pNodePoolSortArrayTree(){
super();
}
public pNodePoolSortArrayTree(int n){
super(n);
}
public void print(){
print(2);
}
public void insert(Comparable obj){
int t,q = -1;
t = getRoot();
while(t != -1 && !(obj.equals(getNode(t).getData()))){
q = t;
if(obj.compareTo(getNode(t).getData()) < 0)
t = getNode(t).getLeft();
else
t = getNode(t).getRight();
}
if(t != -1)
return;
if(q == -1){
setData(obj);
return;
}
if(obj.compareTo(getNode(q).getData()) < 0)
insertLeft(q,obj);
else
insertRight(q,obj);
}
}
Now, the following only makes sense inside a JDK 1.2
(or above) context. Earlier version of the JDK do not support the Comparable interface.
I just though it would be nice to show how to use all these cool interfaces introduced
with JDK 1.2!
The code does nothing special, and chances are, you
can figure it out on your own. The only seeming difference between this source and the one
presented earlier, is that in this one, whenever we test for a null node, we're not
testing the node, we're testing whether if it has value of -1. Other than
that, the code looks almost identical.
This code is pretty bad, however, in illustrating
the beauty of Object Oriented programming. With our approach, we needed to change the
whole thing just to implement node-pools. Ideally, we should strive to only have to change
one part of the system. For example, we could have only changed the abstract
parent class, and leave everything else as it was. Or we could have changed the node a
little, and make the node itself support stuff, but I just thought this was a clearer
approach, without dragging all that Object Oriented stuff into the issue.
I think I've made this Node Pool section longer than it should be; so, let me quickly wrap it up by getting right to the point. Lets write a test driver (which will be an almost identical copy of the old test driver).
import java.io.*;
import pNodePoolSortArrayTree;
public class pNodePoolSortArrayTreeTest{
public static void main(String[] args){
pNodePoolSortArrayTree tree =
new pNodePoolSortArrayTree(100);
int i,n;
System.out.println("Numbers inserted:");
for(i=0;i<10;i++){
tree.insert(new Integer(n=(int)(Math.random()*1000)));
System.out.print(n+" ");
}
System.out.println("\nPre-order:");
tree.print(1);
System.out.println("\nIn-order:");
tree.print(2);
System.out.println("\nPost-order:");
tree.print(3);
}
}
You'll notice that the code above uses java.lang.Integer
object, instead of our own pInteger one. That's because in JDK 1.2, an Integer
is implementing Comparable interface, simplifying a whole lot of stuff. Other
than that, the code works identically to the one shown earlier in the document. Just to be
complete, however, lets include the output from the above program.
Numbers inserted:
103 82 165 295 397 828 847 545 766 384
Pre-order:
103 82 165 295 397 384 828 545 766 847
In-order:
82 103 165 295 384 397 545 766 828 847
Post-order:
82 165 295 397 384 828 545 766 847 103
I guess that wraps it up. (finally!) Now, I'm gonna take a little break, and try to come up with some cool programs that use these data structures. The only way to actually learn what they are, and how to use them, is to see them in action. And some of these can be VERY useful in a wide variety of applications, ranging form databases, to graphics programming.
So, now that you know all the basics of everything about data structures, the remainder of this document will mostly concentrate on interesting approaches, and algorithms using these. (You'll be surprised how far a binary sort tree can go when it comes to creating virtual 3D worlds!)
We've learned about the structure of vectors, lists, queues, etc., but what if you need some order in the way data is arranged? You have several options to this; you can simply implement a sort method inside the data storage class, or you can make the class itself a priority one.
Priority classes are widely used in programs because they simplify a lot of things. For example, you don't have to worry about sorting data, the class itself does it for you. Imagine a simulation of a real world queue at a doctor's office, where people with life threatening injuries get priority over everyone else. The queue has to accept all people, and sort them according to the seriousness of their illness; with most serious ending up seeing the doctor first.
The priority concept can be applied anywhere, not just to queues. It can easily be extended to lists, and vectors. In this section, we will talk about creating a priority vector (since it's fairly simple). We will also cover some "well known" sorting algorithms.
How simple is it? The answer is: very! All we need
is to extend the java.util.Vector class, add one insert function, and we are
done. And here's the source:
import java.lang.Comparable;
public class pPriorityVector extends java.util.Vector{
public void pAddElement(Comparable o){
int i,j = size();
for(i=0;i<j&&(((Comparable)(elementAt(i))).compareTo(o)<0);i++);
insertElementAt(o,i);
}
}
As you can see, it's VERY simple. We simply use this
class the way we would use any other java.util.Vector, accept, when we use pAddElement(Comparable)
method, we are inserting in accenting order. We could have just as well written a
descending order one; I think you get the picture...
Lets test this class, and see if it really works.
import java.io.*;
import java.util.*;
import pPriorityVector;
class pPriorityVectorTest{
public static void main(String[] args){
pPriorityVector v = new pPriorityVector();
System.out.print("starting...\nadding:");
for(int i=0;i<10;i++){
Integer j = new Integer((int)(Math.random()*100));
v.pAddElement(j);
System.out.print(" " + j);
}
System.out.print("\nprinting:");
Enumeration enum = v.elements();
while(enum.hasMoreElements())
System.out.print(" "+(Integer)enum.nextElement());
System.out.println("\nDone ;-)");
}
}
The above test program simply inserts ten random java.lang.Integer
objects, and then prints them out. If our class works, the output should produce a sorted
list of number. Our prediction is correct, the output follows.
starting...
adding: 95 85 62 16 39 73 84 43 77 61
printing: 16 39 43 61 62 73 77 84 85 95
Done ;-)
As you can see, we are inserting numbers in an unsorted order, and get a sorted list when we finally print them out. The technique illustrated above sorts the numbers upon insertion, some algorithms do sorting when retrieving data.
Sorting, in general, means arranging something in some meaningful manner. There are TONS of ways of sorting things, and we will only look at the most popular ones. The one we won't cover here (but one you should know) is Quick Sort. You can find a really good Quick Sort example in the JDK's sample applets. We have also learned that we can sort using binary trees, thus, in this section, we won't cover that approach.
We will be using JDK 1.2's java.lang.Comparable
interface in this section, thus, the examples will not work with earlier versions of the
JDK. The sorting algorithms we'll look at in this section are: Bubble Sort, Selection
Sort, Insertion Sort, Hea