Comp 212 - Lab 14: Java Stream I/O


Introduction

In Java, input and output is defined in terms of an abstract concept called "stream".  A stream is a sequence of data.  If it is an input stream, it has source.  If it is an output stream, it has a destination.  There are two kinds of streams: byte streams and character streams.  The java.io package provides a large number of classes to perform stream I/O.  Mastering Java stream I/O seems like a daunting task.  Do not fret.  In the beginning, you only need to learn to how to manipulate a few I/O classes.  As you progress, you can figure out on your own how to use the other I/O classes and even design new customized I/O classes.  This tutorial describes a few commonly used I/O classes and a class used to parse the input streams called StreamTokenizer.  To illustrate the use StreamTokenizer, the tutorial provides you with a few code examples, and leads you to the process of developing a recursive descent parser to parse scheme-like arithmetic expressions and build the corresponding abstract syntax trees.


The InputStream and OutputStream classes

Input and output in Java of byte streams are handled through subclasses of the java.io.InputStream and java.io.OutputStream classes.  These classes are abstractions that Java provides for dealing with reading and writing information sequentially to/from anything you want, be it a disk, a string buffer, an enumeration, or an array of bytes.  Below is the UML diagram for a few commonly used concrete subclasses of InputStream.  Note that the StreamTokenizer class shown in the diagram is NOT a subclass of InputStream.   StreamTokenizer is a utility class used to parse the stream of input.   StreamTokenizer is discussed in a separate section below.

inputstream.png (31799 bytes)

The first step in using a concrete input stream class is to specify its source for its constructor.  For instance, if I want to read the contents of a file into a String, you may want to use the FileInputStream class and specify the name of the input file in the constructor for FileInputStream.  The following is a sample code.  Try it out.

    import java.io.*;
 

    ...
 

    String buffer = new String();

    // read bytes until eof
    try  {
        // FileInputStream constructor takes either a String, a File object, or a FileDescriptor object
        FileInputStream infile = new FileInputStream("myfile.txt");

        for(int i = infile.read(); i != -1; i = infile.read())
        {
            buffer += (char) i;
        }

        infile.close();
    }
    catch(IOException e) {
        System.err.println("Could not read from myfile.txt");
    }
    catch(FileNotFoundException e) {
        System.err.println("myfile.txt not found");
    }
 

The InputStream's available() method returns the number of bytes that can be read from the stream, and the close() method closes the link between the program and the source of the data.  After you close the stream, you cannot read from it anymore until you open it up again.  The reset() method sends the "pointer" to the stream back to the top of the file.  Several read() methods exist that you can use.  All data comes out as ints or bytes--to convert to chars you will have to cast them.

Similar methods exist for OutputStream.  Below is a UML diagram for commonly used subclasses of OutputStream.

outputstream.png (24895 bytes)


The System class

The java.lang.System class is a useful class containing useful static members doing useful things.  This should look familiar:

    System.out.println("Hello world!");

I'm sure you've done calls like this a million times already, but what's really going on?  out is a static member of the System class; it's a PrintStream object.  A PrintStream is a grandchild of OutputStream in the Java class hierarchy--it has methods implemented that print lines of text at a time as opposed to each character at a time.  System.out is initialized when a program starts to what is known as standard output (stdout).  Stdout is usually the monitor screen, but you can also send stdout to a file at runtime by redirecting it from the Unix command line.  For example, to send the stdout to file "outfile.txt", we do the following:

    % java MyClass > outfile.txt

There is also a System.in class popularly known as standard input (stdin).  Stdin is an InputStream, initially set to taking input from the keyboard, but it can also read from a file at runtime like this:

    % java MyClass < infile.txt

There's a third System i/o file called standard error (stderr).  System.err is another PrintStream designed to direct error messages in case you don't want your output and error messages going to the same place.  Stderr is also initialized to the monitor, but you can redirect it like this:

    % java MyClass >& errfile.txt

And you can combine the redirections:

    % java MyClass < infile.txt > outfile.txt >& errfile.txt

There's lots of other stuff the System class provides for you--check it out.

NOTE: System.out and System.err are the ONLY PrintStream object you should use.  Class PrintStream is deprecated.  To output character streams, you should use PrintWriter (shown in a UML diagram below and in  TestStreamTokenizer.java) instead.

So what would you do if you wanted to, oh, I don't know, read in an arbitrarily long text file of floating point numbers?


The Reader and Writer classes

Input and output for characters streams are handled through subclasses of the java.io.Reader and java.io.Writer classes.  These classes are abstractions that Java provides for dealing with reading and writing character data.  Below is the UML diagram for a few commonly used Reader classes.

reader.png (23099 bytes)

And here is a UML diagram for common Writer classes.

writer.png (24678 bytes)

 


Command-line arguments

Sometimes when running large programs you want to be able to set options about how your program should behave at runtime.  For instance, you might have a hierarchy of debugging statements that you want to suppress at run-time, or you might have a variety of features that you want your users to have access to (check out the manual page for the Unix command "ls"--there's a vast amount of features regarding how to display your data, what to display, ad nauseam).  You can handle this in several ways, but most people make use of command-line arguments to do it.

Remember how all your main functions for your programs have to have the same signature?

        public static void main(String[] argv) { ... }

The array of Strings is where you keep your command-line arguments.  Everything after the class name is included in argv. Let's say we wanted to write a program that just echoed back the command-line arguments to stdout. Here's how we'd do it:

    public class echo {

        public static void main(String[] argv) {
            for(int j = 0; j < argv.length; j++)
                System.out.print(argv[j] + " ");  // don't insert an end-line character yet
            System.out.println();  // now print an end-line ('\n')
        }
    }
 


StreamTokenizer and Parsing

StreamTokenizer

Many times when reading an input stream of characters, we need to parse it to see if what we are reading is a word or a number.  This process is called "tokenizing".  java.io.StreamTokenizer is a class that can do simple tokenization.  TestStreamTokenizer.java is a sample program showing how to use StreamTokenizer.  Copy this file and the file input.txt to your local directory and test it out.

Abstract Syntax Tree

To further illustrate the use of file IO and StreamTokenizer to parse an input file, we write a program to read in scheme-like arithmetic expressions in pre-fix format, such as (* 3  (+ 4  5)), and build the corresponding abstract syntax tree  (AST):

    *
  /    \
3      +
      /     \
    4        5

Once the AST is built, one can then traverse and evaluate it, as done in Homework #3 (download the solution)!  We shall restrict ourselves to arithmetic expressions containing integers, strings (representing variables), and binary operations such as  addition and multiplication.  The syntax of the Scheme-like arithmetic expression can be expressed as a "grammar" with the following rules.

Scheme Arithmetic Expression Grammar Rules

Description

E :: Leaf | BinOp An expression E  is either a Leaf or a BinOp.
Leaf :: VarLeaf | IntLeaf A Leaf is either a VarLeaf or an IntLeaf
VarLeaf :: id A VarLeaf is an id, where id is a token representing a variable.
IntLeaf :: int An IntLeaf is an int, where int is a token representing an integer.
BinOp :: AddOp | MulOp A BinOp is either an AddOp or a MulOp
AddOp :: ( + E E ) An AddOp is a left parenthesis token,  followed by a + token, followed by an expression E, followed by another expression E, followed by a right parenthesis token.
MulOp :: ( * E E ) An MulOp is a left parenthesis token,  followed by a * token, followed by an expression E, followed by another expression E, followed by a right parenthesis token.

In the above grammar rules, 

Recursive Descent Parsing

The manufacturing of an abstract syntax tree (AST) for the above grammar can be thought of a factory method, makeAST(), of some abstract factory, IASTFactory.  How the AST is created is a variant as there are many ways to parse the input stream.  We shall implement a special parsing technique called "recursive descent parsing" (RDP).

In RDP, we use a "tokenizer" to scan and "tokenize" the input from left to right, and build the AST from the top down, based on the value of the tokens.

In order for RDP to make an AST, it needs to know how to make a Leaf AST and a BinOp AST.

Here is some pseudo-code.

makeAST():

makeLeaf():

makeIntLeaf(): trivial

makeVarLeaf(): trivial

makeBinOp():

makeAddOp():

makeMulOp(): analogous to makeAddOp().

In short, a RDP has a tokenizer and maintains a current token.  It has a method for each of the non-terminal symbols in the grammar.  The method that corresponds to the start symbol initiates the parsing.  Usually, parsing only involves checking for the syntax of an input expression.  However, in our case, we decide to build the abstract syntax tree as we parse.

public class RdpASTFactory implements IASTFactory {
    private AToken _currentTok;
    private ITokenizer _tokenizer:
    public RdpASTFactory(ITokenizer tokenizer) { _tokenizer = tokenizer;}
    public IAST makeAST() { // etc...}
    public IAST makeLeaf() { // etc...}
    // other factory methods, etc.
}

Tokens and Tokenizer

The above pseudo-code suggests design a union of token objects coupled with visitors, and an abstract tokenizer class that knows how to get the next token.  One way to implement this tokenizer class is to wrap a StreamTokenizer object.

public abstract class AToken {
    protected String _lexeme;
    protected AToken(String lexeme) { _lexeme = lexeme; }
    public String getLexeme() { return_lexeme;}
    public abstract Object execute(ITokVisitor v, Object inp);
}

public class IdToken extends AToken{
    // constructor, etc.
    public Object execute(ITokVisitor v, Object inp) { return v.idCase(this, inp);)
}

interface ITokVisitor {
    public Object idCase(IdToken host, Object inp);
    public Object intCase(IntToken host, Object inp);
    // etc...
}

interface ITokenizer (
   public AToken getNextToken();
}

public class StreamTokWrapper implements ITokenizer {
    private StreamTokenizer _st;
    // constructor and methods, etc...
}

Lab Activities

Divide the lab into 4 groups.

To test the code, download the solution to homework #3, unzip it, and move the various java files into their appropriate subdirectories to match the packaging.



Dung X. Nguyen & Jim O'Donnell
revised 12/03/01