Sorted Word List Programming Assignment

CS230, Winter 2002-3
Date Assigned: Wednesday, December 4, 2002
Milestone: On Wednesday, Dec 11, I will ask you to show what you have working so far.
Date Due: Monday, December, 16, 2002 at 11:59 PM

I believe that the document is complete now. Let me know if you see anything that appears to be incorrect or needs further clarification.

You may work with a partner (of your choosing) for this assignment, or you may work alone if you prefer (but don't expect me or one of the assistants to act as a surrogate partner for you). Your partner (if you choose to accept one) must be from your section. If you wish to work with a partner but do not find someone to work with on your own, let me know; if someone else is also looking, I will help you to get connected. I highly recommend finding someone with similar abilities, commitment to the course, etc.

Some of the later assignments will leave more of the design decisions to you. But it is very important to get practice implementing other people's specifications and designs.

Note that it is quite possible to do this assignment incrementally. Each of the required classes may be implemented and initially tested independent of the others.

Most of the credit for this assignment will be based on correctness. But quality counts. In particular, you should try to avoid writing the same code in multiple placesw if it is easy to avoid doing so.

Contents of this document

Overview of the assignment
List of required classes
WordListPrinter classes
Hints and suggestions for WordListPrinter classes
PrintStream classes
Hints and suggestions for PrintStream classes
Analysis and Timing Tests
Deliverables and how to turn in your code
How I will test your code
Examples

Overview of the assignment

For the purposes of this assignment, we define a word to be a maximal sequence of non-whitespace characters. We define whitespace characters to be space and tab. We will not have to deal explicitly with end-of-line characters because the readLine method from the PrintStream class handles that automatically for us.

I have provided an abstract class called WordListPrinter that declares a constructor
    public WordListPrinter(BufferedReader _in, PrintStream _out)
and a single public method:
    public abstract void outputWordList();
When this method is called, it reads all of the words from the input (the one provided as the first argument to the constructor), and prints to the output (the one provided as the second argument of the constructor) a sorted list of these words (with no duplicate words, but note that "sleep" and "sleep." are two different words). The list should be sorted according to the normal ASCII ordering provided by String's compareTo method.

For additional information on the WordListPrinter class, see the Javadoc documentation.

In this abstract class, I have provided the concrete method iterateAndOutput that your outputWordList method should call to actually produce the output (this should ensure that everyone's output is identical). Your outputWordList method should not produce any output other than that generated by its call to iterateAndOutput.

You will experiment with using different data structures and algorithms in four different extensions of WordListPrinter:

  1. ArrayListWordListPrinter uses an ArrayList that is not kept sorted durring the input process, but is sorted later using a call to a Collections.sort method before it outputs the word list.
  2. SortedArrayListWordListPrinter uses an ArrayList that it maintains in sorted order with no duplicates after each input word is added.
  3. SortedLinkedListWordListPrinter uses a LinkedList that it maintains in sorted order with no duplicates after each input word is added.
  4. TreeSetWordListPrinter uses a TreeSet that will automatically do the ordering and removal of duplicates for you.
You will run some experiments in which you compare the processing times using these four extensions.

Finally, you will experiment with supplying different PrintStream objects (some of which you will create) as the second argument of outputWordList, in order to format the output in different ways.

List of required public classes

A. The abstract class WordListPrinter and its extensions
1. ArrayListWordListPrinter
2. SortedArrayListWordListPrinter
3. SortedLinkedListWordListPrinter
4. TreeSetWordListPrinter
B. (decorated) Extensions of PrintStream:
1. OnePerLinePrintStream
2. TextFillPrintStream
3. JustifyingPrintStream

WordListPrinter classes

Basically all that you have to implement for each extension of the WordListPrinter class are

Hints and suggestions for WordListPrinter classes

When you create a new JCreator project for this assignment, be sure to choose either Basic Console Application or Empty Project. Your program is not supposed to have any GUI components; if you include them, your code will not work with the script that I will use to check the correctness of your code.

All of the classes that you create should be in the default package. I.e., do not include a package declaration anywhere in your code.

You can use the approach of lines 23-24 of DS page 42 to read the input a line at a time, and then create a StringTokenizer (with space and tab as its delimiters) to process each line a word at a time.

You may want to define one or more additional methods in the abstract class, in order to avoid having to write identical code in each extension class.

In your ArrayListWordListPrinter class, simply add each word to the end of the theList ArrayList if that word does not duplicate a word that it already there. When all of the words are there, sort the list using the Collections.sort method. An alternative is to add all of the input words, sort the list, and then remove duplicates. Which approach is better? Can you figure out an O(N) way to remove duplicate items from a sorted ArrayList?

In your SortedArrayListWordListPrinter class, you can check (rapidly, using Collections.binarySearch) to see if a new word is already in theList, and where it should be inserted if it is not there. In this way, the list can be always kept sorted.

In your SortedLinkedListWordListPrinter class, the process (and code) will be almost identical to the SortedArrayListWordListPrinter class. This class is only here for efficiency comparison with the previous class.

The last class, TreeSetWordListPrinter, should have the shortest code, since a TreeSet object takes care of keeping the set sorted and avoiding duplicates.

PrintStream classes

The outputWordList method's second argument is a PrintStream object. All of the output to that PrintStream is done via the calls to print and (the zero-argument version of) println in the iterateAndOutput method.

If the second argument to outputWordList is a "plain old" PrintStream object, e.g., System.out, then all of the words will be output as one line (unless a new line is written automatically when the buffer gets flushed; the JDK documentation does not seem to say whether this ever happens).

You are to write three classes that extend PrintStream:

Hints and suggestions for PrintStream classes

PrintStream has a constructor that takes an OutputStream as its argument.

You may want to consider having JustifyingPrintStream extend TextFillPrintStream.

Here is a simple approach to doing the justification: Create an ArrayList (I will call it blankIndex) that contains the positions of all spaces in the line. Choose a random number, use that number as an index into blankIndex, use the number stored there as an indication of where to add an extra space in the output line, add that space, and adjust the later numbers in blankIndex. Repeat this until the line has the correct length.

While this approach is simple, it is not very efficient, due to the overhead of inserting blanks into a string and adjusting blankIndex after each insertion. Perhaps it would be faster to copy the line into a linked list of characters, and let blankIndex be an ArrayList of references to the nodes in the list. Then when we want to insert a blank, we just insert a new node in the middle of the linked list (nothing else has to be moved), and no changes need to be made to blankListitself. After inserting the correct number of blanks, build a new String from the linked list.

Analysis and Timing Tests

You should submit (in class on Wednesday, Dec 18) a separate printed document that deals with running time issues. You should do an analysis based of the running times for building and printing the sorted list using your different WordListPrinter classes. Write down the results of your analysis (big-Oh results are fine) and explain how you got them.

You should run some tests in which you compare running times with various WordListPrinter objects. You should use a "generic" PrintWriter for output for all tests, so that you can focus on timing the list-building portion. Try different sized input files with each WordListPrinter class.

Present your data in tabular and/or graphical form. Does the data that you collected correlate with what was predicted by your analysis?

You may find two methods in the java.jang.System class helpful. They are currentTimeMillis and gc (can you see why the latter might be helpful?

Deliverables and how to turn in your code

Java source code files for the eight specified classes must be placed in a directory called SortedWordLists in your turnin directory on AFS (That directory is /class/cs/cs230/turnin/username). It is not necessary (or desirable) to place your entire JCreator project folder there; just place the Java files there.

If you work with a partner, only one of you should place code in your turnin directory. In the SortedWordLists directory in your turnin directory, place a plain taxt file called partner.txt that contains only one thing: the username of your partner.

You should also submit the printed "analysis and timing" document described in the previous section. This should be turned in during Wednesday's class.

How I will test your code

The testing for correctness will be done by a script. It is crucial that all of your classes have exactly the right names and that your constructors and methods have the correct signatures.

Your code should not generate any superfluous output, such as a prompt for suer input or a headder message such as "Alphabetical Lists of words:" .

My script will:

* cd to your SortedWordList directory.
* Remove all .class files that are there.
* javac *.java to compile all source code.
* For each of my test classes
*       Copy my source file to your directory.
*       Compile and execute my code.
*       Compare your output to the correct output. In all cases except the justified
          output, your output should exactly match mine.
         
* Perhaps do some timing tests.

PLEASE MAKE SURE THAT YOU DO NOT SUBMIT CODE THAT GOES INTO AN INFINITE LOOP.

Examples

I have placed a number of simple test programs in the test-files directory in the SortedWordLists assignment directory. You should be able to use them without modification. You should of course run tests with other input files; in particular some longer and shorter ones. Also ones that put the text fill and justification code through its paces.
sliderule 4:46pm > more TestArrayListGeneric.java
import java.io.*;

public class TestArrayListGeneric {

    public static void main(String args[]) {
        new TestArrayListGeneric();
    }

    public TestArrayListGeneric() {
        try {
            WordListPrinter wl = new ArrayListWordListPrinter(new
                BufferedReader(new FileReader("words")), System.out);
            wl.outputWordList();
        } catch (FileNotFoundException e) {
            System.err.println("File not found");
            e.printStackTrace();
        }
    }
}
sliderule 4:46pm > javac TestArrayListGeneric.java
sliderule 4:47pm > java TestArrayListGeneric 
-- Call Cato I If Ishmael. It November Some There This Whenever With a about account ago all almost an and as ball. before bringing but can. cherish circulation. coffin damp, degree, deliberately driving drizzly especially every feelings find flourish for from funeral get grim growing hand hats have having high himself his how hypos in interest into involuntarily is it it, knew knocking little long me me, me. meet; men methodically mind money moral mouth; my myself nearly never no nothing ocean of off on or other, part particular pausing people's philosophical pistol precisely prevent principle purse, quietly rear regulating requires sail same sea see ship. shore, some soon soul; spleen, stepping street, strong substitute such surprising sword; take that the their then, they this. thought throws time to towards up upon upper very warehouses, watery way whenever with world. would years 
sliderule 4:47pm > more TestTreeSetJustifying.java
import java.io.*;

public class TestTreeSetJustifying {

    public static void main(String args[]) {
        new TestTreeSetJustifying();
    }

    public TestTreeSetJustifying() {
        try {
            WordListPrinter wl = new TreeSetWordListPrinter(new
                BufferedReader(new FileReader("words")), new
                    JustifyingPrintStream(50,System.out));
            wl.outputWordList();
        } catch (FileNotFoundException e) {
            System.err.println("File not found");
            e.printStackTrace();
        }
    }
}

sliderule 4:47pm > java TestTreeSetJustifying
-- Call Cato I If Ishmael. It November  Some There
This Whenever With  a about account ago all almost
an and  as ball. before bringing but  can. cherish
circulation.   coffin  damp,  degree, deliberately
driving   drizzly  especially every  feelings find
flourish for from  funeral  get grim growing  hand
hats  have having high himself  his  how  hypos in
interest   into   involuntarily is   it it,   knew
knocking  little     long  me me,  me.   meet; men
methodically mind money  moral  mouth;  my  myself
nearly  never no nothing ocean of off on or other,
part particular  pausing  people's   philosophical
pistol precisely  prevent principle purse, quietly
rear regulating  requires sail same  sea see ship.
shore, some   soon soul; spleen, stepping  street,
strong substitute such surprising sword; take that
the their then, they this. thought throws  time to
towards up upon upper very warehouses, watery  way
whenever          with  world.     would     years
sliderule 4:50pm > javac TestTreeSetJustifying.java
sliderule 4:57pm > java TestTreeSetJustifying
-- Call Cato I If Ishmael. It  November Some There
This Whenever With a about account ago all  almost
an and as ball. before  bringing but can.  cherish
circulation. coffin  damp,   degree,  deliberately
driving  drizzly especially   every  feelings find
flourish  for from funeral  get  grim growing hand
hats  have having  high  himself  his how hypos in
interest into involuntarily  is  it   it,     knew
knocking    little  long  me  me, me. meet;    men
methodically   mind  money moral mouth;  my myself
nearly never no nothing  ocean of off on or other,
part   particular   pausing people's philosophical
pistol  precisely prevent principle purse, quietly
rear regulating requires sail same   sea see ship.
shore, some soon  soul; spleen,   stepping street,
strong substitute such surprising sword; take that
the their  then, they this. thought throws time to
towards up upon  upper very warehouses, watery way
whenever      with     world.       would    years
Note that the extra spaces were inserted in different places in the two runs of the last program.