Tutorial 09: Java Stream I/O

Introduction

In Java, input and output are defined in terms of an abstract concept called a "stream". A stream is a sequence of data. If it is an input stream, it has a source. If it is an output stream, it has a destination. There are two kinds of streams: byte streams and character streams. The java.io package provides a large number of classes to perform stream I/O. Mastering Java stream I/O seems like a daunting task. Do not fret. In the beginning, you only need to learn how to manipulate a few I/O classes. As you progress, you can figure out on your own how to use the other I/O classes and even design new customized I/O classes. This tutorial describes a few commonly used I/O classes and StreamTokenizer, a class used to parse input streams. The tutorial also provides you with a few code examples.

The InputStream and OutputStream classes

Input and output of byte streams in Java are handled through subclasses of the java.io.InputStream and java.io.OutputStream classes. These classes are abstractions that Java provides for reading and writing information sequentially from or to anything you want, be it a disk, a string buffer, an enumeration, or an array of bytes. Below is the UML diagram for a few commonly used concrete subclasses of InputStream. Note that the StreamTokenizer class shown in the diagram is NOT a subclass of InputStream. StreamTokenizer is a utility class used to parse the stream of input. StreamTokenizer is discussed in a separate section below.

[Figure: inputstream.png, UML diagram of commonly used InputStream subclasses]

The first step in using a concrete input stream class is to specify its source in its constructor. For instance, if you want to read the contents of a file into a String, you may want to use the FileInputStream class and specify the name of the input file in the constructor for FileInputStream. The following is some sample code. Try it out.

import java.io.*;

...

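    // collect the characters here; a StringBuilder avoids creating a new String on every append
    StringBuilder buffer = new StringBuilder();

    // read bytes until eof
    try
    {
        // FileInputStream constructor takes either a String, a File object, or a FileDescriptor object
        FileInputStream infile = new FileInputStream("myfile.txt");

        // read() returns the next byte as an int, or -1 at end of file
        for(int i = infile.read(); i != -1; i = infile.read())
        {
            buffer.append((char) i);
        }

        infile.close();
    }
    catch(FileNotFoundException e)
    {
        // FileNotFoundException is a subclass of IOException, so it must be caught first;
        // with the catch blocks in the other order this code does not compile
        System.err.println("myfile.txt not found");
    }
    catch(IOException e)
    {
        System.err.println("Could not read from myfile.txt");
    }

    String contents = buffer.toString();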

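The InputStream's available() method returns an estimate of the number of bytes that can be read from the stream without blocking, and the close() method closes the link between the program and the source of the data. After you close the stream, you cannot read from it anymore; to read the data again you have to create a new stream. The reset() method sends the "pointer" to the stream back to the position where mark() was last called. It works only on streams whose markSupported() method returns true, and FileInputStream is not one of them. Several read() methods exist that you can use. The single-byte read() returns an int (the byte value from 0 to 255, or -1 at the end of the stream) and the other versions fill an array of bytes; to convert the data to chars you will have to cast it.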
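The methods described above can be tried out without a file on disk. The following is a minimal sketch, not part of the original tutorial code: the class name MarkResetDemo and the sample text are made up for illustration, and a ByteArrayInputStream stands in for the data source because it supports mark() and reset().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and the sample data are assumptions.
class MarkResetDemo {

    // Reads the first n bytes one at a time, rewinds with reset(),
    // then reads them again in bulk into a byte array.
    static String readTwice(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("this stream does not support mark/reset");
        }
        StringBuilder sb = new StringBuilder();
        in.mark(n);                       // remember the current position
        for (int i = 0; i < n; i++) {
            int b = in.read();            // an int from 0 to 255, or -1 at end of stream
            if (b == -1) {
                break;
            }
            sb.append((char) b);          // cast to char, as described above
        }
        in.reset();                       // back to the marked position
        byte[] chunk = new byte[n];
        int count = in.read(chunk);       // bulk read; returns the number of bytes read
        if (count > 0) {
            sb.append(new String(chunk, 0, count, StandardCharsets.US_ASCII));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "hello world".getBytes(StandardCharsets.US_ASCII));
        System.out.println(in.available());    // prints 11
        System.out.println(readTwice(in, 5));  // prints hellohello
        System.out.println(in.available());    // prints 6
        in.close();
    }
}
```

To get the same behavior from a file, one option is to wrap the FileInputStream in a java.io.BufferedInputStream, which does support mark() and reset().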

Similar methods exist for OutputStream. Below is a UML diagram for commonly used subclasses of OutputStream.

[Figure: outputstream.png, UML diagram of commonly used OutputStream subclasses]
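To see the output side in action, here is a small sketch that mirrors the read() loop shown earlier. The class name OutputStreamDemo and the helper writeAscii are invented for this illustration, and a ByteArrayOutputStream is used as the destination so that the result can be inspected in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; writeAscii is an illustrative helper, not a library method.
class OutputStreamDemo {

    // Writes each character of text as one byte. This is only safe for
    // ASCII text, because write(int) keeps just the low-order 8 bits.
    static void writeAscii(OutputStream out, String text) throws IOException {
        for (int i = 0; i < text.length(); i++) {
            out.write(text.charAt(i));
        }
        out.flush();   // push any buffered bytes to the destination
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeAscii(sink, "Hello");
        sink.close();
        System.out.println(sink.size());      // prints 5
        System.out.println(sink.toString());  // prints Hello
    }
}
```

Passing a new FileOutputStream("outfile.txt") instead of the ByteArrayOutputStream should send the same bytes to a file; the file name here is only an example.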

The System class

The java.lang.System class is a useful class containing useful static members doing useful things. This should look familiar:

System.out.println("Hello world!");

I'm sure you've made calls like this a million times already, but what's really going on? out is a static member of the System class; it's a PrintStream object. A PrintStream is a grandchild of OutputStream in the Java class hierarchy--it has methods that print a whole line of text at a time as opposed to one character at a time. When a program starts, System.out is initialized to what is known as standard output (stdout). Stdout is usually the monitor screen, but you can also send stdout to a file at runtime by redirecting it from the Unix command line. For example, to send stdout to the file "outfile.txt", we do the following:

% java MyClass > outfile.txt

There is also a System.in object, popularly known as standard input (stdin). Stdin is an InputStream, initially set to take input from the keyboard, but it can also read from a file at runtime like this:

% java MyClass < infile.txt

There's a third System I/O stream called standard error (stderr). System.err is another PrintStream, meant for error messages in case you don't want your output and error messages going to the same place. Stderr is also initialized to the monitor. In csh or tcsh, the >& operator redirects it, together with stdout, to a file like this:

% java MyClass >& errfile.txt

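And you can combine the redirections. Because >& also captures stdout, csh and tcsh need the stdout redirection placed inside parentheses so that only stderr reaches the second file:

% (java MyClass < infile.txt > outfile.txt) >& errfile.txt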
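If you want a program to try the redirections on, the following sketch uses all three standard streams. The class name CopyStdin and the helper copy are made up for this example: the program copies stdin to stdout byte by byte and reports the byte count on stderr.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class; copy is an illustrative helper.
class CopyStdin {

    // Copies every byte from in to out and returns how many bytes were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {   // -1 signals end of stream
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        long total = copy(System.in, System.out);
        // The report goes to stderr so it stays out of redirected output.
        System.err.println("copied " + total + " bytes");
    }
}
```

Run with stdin and stdout redirected, the copied data should end up in the output file while the byte count still appears wherever stderr points.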

There's lots of other stuff the System class provides for you--check it out.

NOTE: System.out and System.err are usually the ONLY PrintStream objects you need to use. Class PrintStream is not deprecated in current Java releases, but it is a byte stream that converts characters to bytes for you. To output character streams, you should use PrintWriter (shown in a UML diagram below and in TestStreamTokenizer.java) instead.
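As a rough illustration of the PrintWriter advice, the sketch below formats a few numbers through a PrintWriter and also wraps System.out in one. The class name PrintWriterDemo and the helper format are assumptions made for this example; they do not come from TestStreamTokenizer.java.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class; format is an illustrative helper.
class PrintWriterDemo {

    // Prints each value followed by a space into an in-memory character sink.
    static String format(double[] values) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        for (int i = 0; i < values.length; i++) {
            out.print(values[i]);
            out.print(' ');
        }
        out.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        // The second argument turns on automatic flushing for println().
        PrintWriter stdout = new PrintWriter(System.out, true);
        stdout.println(format(new double[] {1.5, 2.0}));  // prints "1.5 2.0 "
    }
}
```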

So what would you do if you wanted to, oh, I don't know, read in an arbitrarily long text file of floating point numbers?

The Reader and Writer classes

Input and output for character streams are handled through subclasses of the java.io.Reader and java.io.Writer classes. These classes are abstractions that Java provides for reading and writing character data. Below is the UML diagram for a few commonly used Reader classes.

[Figure: reader.png, UML diagram of commonly used Reader classes]

And here is a UML diagram for common Writer classes.

[Figure: writer.png, UML diagram of commonly used Writer classes]
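Here is a short sketch of a Reader and a Writer working together. The class name LineNumberer and the helper numberLines are invented for illustration; the code reads text one line at a time through a BufferedReader and writes each line back out with its line number.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical demo class; numberLines is an illustrative helper.
class LineNumberer {

    // Copies each line from source to sink, prefixed with its line number,
    // and returns the number of lines seen.
    static int numberLines(Reader source, Writer sink) throws IOException {
        BufferedReader in = new BufferedReader(source);
        PrintWriter out = new PrintWriter(sink);
        int count = 0;
        // readLine() returns null at the end of the stream
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            count++;
            out.print(count + ": " + line + "\n");
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        int lines = numberLines(new StringReader("alpha\nbeta"), result);
        System.out.print(result);    // prints "1: alpha" and "2: beta" on separate lines
        System.out.println(lines);   // prints 2
    }
}
```

Because numberLines only asks for a Reader and a Writer, the same method should also work with a FileReader as the source and a FileWriter as the sink.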

StreamTokenizer

Many times when reading an input stream of characters, we need to parse it to see if what we are reading is a word or a number. This process is called "tokenizing". java.io.StreamTokenizer is a class that can do simple tokenization. TestStreamTokenizer.java is a sample program showing how to use StreamTokenizer. Copy this file and the file ~comp212/tutorials/09/input.txt to your local directory and test it out.
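In case you do not have TestStreamTokenizer.java at hand, the following is an independent minimal sketch (it is not that program) of how StreamTokenizer can pull the numbers out of a stream of text. The class name TokenizerSketch and the helper sumNumbers are assumptions made for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical demo class; sumNumbers is an illustrative helper.
class TokenizerSketch {

    // Adds up every number token in the source and ignores everything else.
    static double sumNumbers(Reader source) throws IOException {
        StreamTokenizer tokens = new StreamTokenizer(source);
        double sum = 0.0;
        // nextToken() returns the token type and also stores it in ttype
        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokens.ttype == StreamTokenizer.TT_NUMBER) {
                sum += tokens.nval;   // numeric value of the current token
            }
            // For TT_WORD tokens the text is in tokens.sval; it is skipped here.
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Reader input = new StringReader("1.5 apples 2.25\n4 pears");
        System.out.println(sumNumbers(input));   // prints 7.75
    }
}
```

Since the loop runs until TT_EOF, the input can be arbitrarily long; passing a FileReader should let the same method read a text file of floating point numbers. One caveat: with its default settings StreamTokenizer does not understand exponent notation such as 1e5, and it treats '/' as the start of a comment.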

Command-line arguments

Sometimes when running large programs you want to be able to set options about how your program should behave at runtime. For instance, you might have a hierarchy of debugging statements that you want to suppress at runtime, or you might have a variety of features that you want your users to have access to (check out the manual page for the Unix command "ls"--there's a vast number of options regarding how to display your data, what to display, ad nauseam). You can handle this in several ways, but most people make use of command-line arguments to do it.

Remember how all your main functions for your programs have to have the same signature?

public static void main(String[] argv) { ... }

The array of Strings is where Java puts your command-line arguments. Everything after the class name on the command line is included in argv. Let's say we wanted to write a program that just echoed back the command-line arguments to stdout. Here's how we'd do it:

public class echo {

    public static void main(String[] argv)
    {
        for(int j = 0; j < argv.length; j++)
            System.out.print(argv[j] + " "); // don't insert an end-line character yet
        System.out.println(); // now print an end-line ('\n')
    }
}
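To connect this back to setting options at runtime, here is one possible sketch of an echo program with an option. The class name EchoOptions and the "-v" flag are hypothetical, chosen only for illustration: arguments that start with "-" are treated as flags, and everything else is echoed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the "-v" flag is an assumption made for this example.
class EchoOptions {

    // Returns true if the flag "-v" appears anywhere in argv.
    static boolean isVerbose(String[] argv) {
        for (String arg : argv) {
            if (arg.equals("-v")) {
                return true;
            }
        }
        return false;
    }

    // Returns the arguments that are not flags, in their original order.
    static List<String> operands(String[] argv) {
        List<String> result = new ArrayList<String>();
        for (String arg : argv) {
            if (!arg.startsWith("-")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] argv) {
        List<String> words = operands(argv);
        if (isVerbose(argv)) {
            // Diagnostics go to stderr so they do not mix with the echoed text.
            System.err.println(words.size() + " operand(s)");
        }
        System.out.println(String.join(" ", words));
    }
}
```

For example, running "java EchoOptions -v a b" should print "a b" on stdout and "2 operand(s)" on stderr.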

Dung X. Nguyen & Jim O'Donnell
revised 11/05/00