Comp 210 Challenge Lab 2: More I/O, Web Server


Review of simple I/O from last challenge lab

Input-ports have a state associated with them: there is either more left to read, or we are at the end of the file (or more generally, the end of the stream of data). The read function returns either a useful s-expression, or a special end of file marker that we can test for with eof-object?.

Important functions for reading:
open-input-file : string -> input-port
open-input-string : string -> input-port
read : input-port -> s-expression
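
As a quick refresher, here is a small sketch of these functions working together. The name read-all is made up for this example, and the #lang racket line is an assumption for running it in a current Racket rather than the lab's DrScheme setup.

```scheme
#lang racket
;; read-all: input-port -> (listof s-expression)
;; Collects every top-level s-expression left on the port into a list.
(define (read-all iport)
  (local [(define next-sexpr (read iport))]
    (cond [(eof-object? next-sexpr) empty]
          [else (cons next-sexpr (read-all iport))])))

(read-all (open-input-string "hello (a b) 42"))
;; = (list 'hello (list 'a 'b) 42)
```

Notice that the list (a b) comes back as a single s-expression, and that once the end of the stream is reached every further call to read returns the end of file marker.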

Here is the solution to one of the exercises from last lab. We wanted to write a contains-word? function which would search through a file for occurrences of a particular word, expressed as a symbol.

;; contains-word?: input-port symbol -> boolean
;; Returns true if the symbol is found in the file, at the "top level"
;; i.e., not inside a list
(define (contains-word? iport word)
  (local [(define next-sexpr (read iport))]
    (cond [(eof-object? next-sexpr) false]
          [(symbol? next-sexpr) 
           (or (symbol=? word next-sexpr) (contains-word? iport word))]
          [else (contains-word? iport word)])))

Before we start: characters, begin

We've seen a few simple or atomic types so far in Scheme: numbers, symbols, booleans, strings. There is another type called a character, suitable for representing single ASCII characters or bytes. (What are the pros and cons of having bytes identified with characters?)

Characters are denoted as a hash sign, a backslash, followed either by a single character literal, or a multicharacter name. For example, the following is a list of valid characters, representing the letters lowercase 'a', uppercase 'B', the numeral 9, a newline character, a space, a backslash, a hash sign, and the null character (ASCII value 0).

(list #\a #\B #\9 #\newline #\space #\\ #\# #\null)

You can construct a string from characters with the string function. For example, (string #\W #\a #\z #\z #\u #\p) = "Wazzup". Other interesting functions on characters include char?, char=?, string->list, list->string, and the I/O functions introduced in the next section. To convert between characters and their ASCII numbers, look at char->integer and integer->char. In this lab, we will only use the I/O functions.
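
If you want to see the character functions in action, the following sketch tries a few of them. The variable name greeting is made up, and #lang racket is an assumption for running this outside the lab's DrScheme setup.

```scheme
#lang racket
;; Building a string from characters, and taking one apart again.
(define greeting (string #\W #\a #\z #\z #\u #\p))

(string->list greeting)                 ; = (list #\W #\a #\z #\z #\u #\p)
(list->string (list #\h #\i))           ; = "hi"
(char->integer #\A)                     ; = 65
(integer->char 97)                      ; = #\a
(char=? #\newline (integer->char 10))   ; = true
```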

begin is a special form that lets one perform multiple actions in a sequence. We'll need to use begin when we talk about writing files, and then we'll have some concrete examples, but here is the syntax:
(begin
  (do-first-thing args ...)
  (do-second-thing more args ...)
  ...
  (do-last-thing whew ...))

Can you think of any applications for begin in past homeworks? Perhaps when using the draw library, when you wanted to draw multiple shapes:

(begin
  (draw-circle circle1)
  (draw-square square1))
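
Here is a version you can run without the draw library. The function announce is made up for illustration, and #lang racket is an assumption for a current Racket. It shows the two things worth remembering about begin: the steps run in order, and the value of the whole form is the value of the last step.

```scheme
#lang racket
;; announce: string -> number
;; Prints one line to the Interactions window, then returns the length
;; of the message. begin runs the steps in order and returns the value
;; of the last one.
(define (announce message)
  (begin
    (display "Now serving: ")
    (display message)
    (newline)
    (string-length message)))

(announce "index.html")   ; prints "Now serving: index.html", = 10
```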

Reading and writing characters

read provides appropriate functionality if you want to read in s-expressions and you don't care about things like whitespace or comments. There are several other reading functions in DrScheme. Also, reading files is of limited use if you cannot also write them. One writes files by opening an output-port and calling the appropriate write functions on it. In this section, we'll talk about reading and writing a character at a time.

read-char and write-char: read-char reads in the next character from the input-port. Like read, it returns an end of file object if nothing is left. write-char will write out exactly one character (byte) to an output-port.
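
To see how read-char differs from read, here is a small sketch that counts the characters left on a port. The name count-chars is made up, and #lang racket is an assumption for a current Racket.

```scheme
#lang racket
;; count-chars: input-port -> number
;; Counts the characters left on the port, whitespace included.
(define (count-chars iport)
  (local [(define next-char (read-char iport))]
    (cond [(eof-object? next-char) 0]
          [else (+ 1 (count-chars iport))])))

(count-chars (open-input-string "a b\n"))   ; = 4
```

The space and the newline are counted too. read would have skipped over them and returned only the symbols a and b.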

How do we get output-ports to write to? We can open a file for writing using open-output-file. When we open a file, we need to tell DrScheme what to do if the file already exists. The default behavior is to signal an error. We can also overwrite the file ('replace) or append to the end of it ('append). There are other options (if you're interested, read about Opening File Ports in the Help Desk). We can also get an output-port corresponding to DrScheme's Interactions window by calling current-output-port.

;;open-output-file: string symbol -> output-port
(open-output-file "/var/log/httpd.log" 'append)
(open-output-file "program-config" 'replace)

;;current-output-port: -> output-port
(current-output-port)

Here's a very simple example: copying one file to another location, replacing the target file if it exists.

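;; copy-file: string string -> void
;; Copies the file named from to the file named to, one character at a time.
(define (copy-file from to)
  (local [(define (copy-loop in out)
            (local [(define next-char (read-char in))]
              (cond [(eof-object? next-char)
                     ;; done: close both ports so the output reaches the disk
                     (begin (close-input-port in) (close-output-port out))]
                    [else (begin (write-char next-char out)
                                 (copy-loop in out))])))]
    (copy-loop (open-input-file from) (open-output-file to 'replace))))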

There are also display, write, print, read-line, printf, fprintf, open-output-string, etc. For formatted output, printf and fprintf are easily the most useful.
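
As a taste of formatted output, the sketch below writes one line with fprintf. The function log-request and its message format are made up for illustration. It also uses get-output-string, which is not mentioned above: it returns everything written so far to a port made by open-output-string. #lang racket is an assumption for a current Racket.

```scheme
#lang racket
;; log-request: output-port string number -> void
;; Writes one formatted line describing a request to the given port.
;; In the format string, ~a displays the next argument and ~n is a newline.
(define (log-request out path size)
  (fprintf out "served ~a (~a bytes)~n" path size))

;; A string port collects everything written to it.
(define out (open-output-string))
(log-request out "/pub/release/changelog" 1024)
(get-output-string out)   ; = "served /pub/release/changelog (1024 bytes)\n"
```

Passing (current-output-port) instead of out would print the same line in the Interactions window.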

To do, if you want more practice:


Network I/O

So far we've read from files and strings and written to files and the interactions window. It turns out that we can talk to other computers over the internet using the same functions. We just need some way to get input and output ports associated with our connection to the other computer. Other than that step, talking to another computer is just like reading and writing files.

The protocol by which most communication happens on the internet is called TCP (Transmission Control Protocol). So to talk to another computer, we need a TCP connection. We'll use tcp-listen and tcp-accept in this lab. Another interesting function is tcp-connect, which makes a connection with a server.

Organization of a web server

A web server does its job by announcing to the operating system that it wants to listen for connections at a certain address, called a port (for web servers, usually port 80). The server then listens on that port for incoming connections. When a client (like a web browser) makes a connection, the server gets an input-port and an output-port which it can use to talk to the client. The client sends a request in a certain format. The server decodes the request, and if it refers to a file the web server is willing and able to serve, the web server sends the contents of that file to the client through the output-port. If not, the server sends an error page to the client over the output-port. The server then closes both ports and listens for a new connection.

The following is a dramatization of this protocol, using the metaphor of a switchboard operator. We are going to implement a simple version of HTTP/0.9, the first web protocol defined. Modern web servers and browsers speak HTTP/1.1, which adds many new features and options. It's also a lot more work, so we'll stick with the very simple stuff.

Web server to Front desk: I'm listening for connections on port 80.
Client to Front desk: I'd like to speak with the program at port 80, please.
Front desk to Client: Very well. <transfers the call to Web server>
Web server to Client: Hello?
Client to Web server: Give me a file called /pub/release/changelog
Web server (thinking): Hmm... I'll look for that starting in /home/httpd/html, so let me see if there's a file at /home/httpd/html/pub/release/changelog... yep, and it's readable!
Web server to Client: <sends the file>
Web server: Bye.
...
Client2 to Front desk: I'd like to speak with the program at port 80, please.
...

What library functions do we need in order to write a web server?

;; tcp-listen : number -> tcp-listener
tcp-listen tells the operating system that you want to listen for connections on the given port, and it returns a listener that you can use to receive connections (see tcp-accept). For your web server, listen on port 9999 (or something close to that; some ports are in use and some are restricted).

;; (call-with-values (lambda () (tcp-accept listener)) my-handler-function)
Because of the way tcp-accept returns the input-port and output-port, we need this magic syntax here (for the really curious, read about values in the Help Desk). This calls tcp-accept on a listener called listener and then calls my-handler-function with two arguments: the input-port and output-port.
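
To try the accept pattern without a web browser, the sketch below plays both roles in one program: it listens, connects to itself with tcp-connect, and then accepts that connection. Several things here are assumptions rather than part of the lab: the handler name greet, the address "127.0.0.1", the extra arguments to tcp-listen (a backlog of 4 and permission to reuse the port), and #lang racket for a current Racket. If port 9999 is busy on your machine, pick another one.

```scheme
#lang racket
;; Loopback demonstration: this one program is both server and client.
(define listener (tcp-listen 9999 4 #t "127.0.0.1"))

;; Client side: tcp-connect also returns an input-port and an output-port.
(define from-server
  (call-with-values
   (lambda () (tcp-connect "127.0.0.1" 9999))
   (lambda (in out)
     (begin (write 'hello out)
            (newline out)
            (close-output-port out)   ; flushes what we wrote
            in))))

;; greet: input-port output-port -> s-expression
;; Reads one s-expression from the client, closes both ports, returns it.
(define (greet in out)
  (local [(define request (read in))]
    (begin (close-input-port in)
           (close-output-port out)
           request)))

;; Server side: accept the waiting connection and hand both ports to greet.
(define received
  (call-with-values (lambda () (tcp-accept listener)) greet))

(close-input-port from-server)
(tcp-close listener)
received   ; = 'hello
```

The connection made by tcp-connect waits in the listener's queue until tcp-accept picks it up, which is why a single program can do both steps one after the other.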

;; find-file : string string -> string or false
This function, provided by the teachpack, takes a directory to use as the base (or root) directory and a path to locate relative to that directory. The function returns the real path if the file exists and is readable, or false if not.

The teachpack also provides the url structure and some associated functions. There is one interesting selector for the purposes of this lab: url-path. There is also a string->url function, which parses a string like "http://www.owlnet.rice.edu/~comp210/Labs/" into (make-url "http" "www.owlnet.rice.edu" "/~comp210/Labs/" ...). We can use this to extract the path from the request sent by the client.

Web server walkthrough:

  1. Load the teachpack. It should be located at /home/comp210/Labs/challenge/lab2teachpack.ss. The teachpack provides functions to resolve request paths into real paths. It also provides access to the url structure and functions.
  2. Write serve-file: string output-port -> void. This function should take in a filename (which is guaranteed to exist and be readable) and an output-port. The function should open the file for reading and then send the contents to the output-port.
  3. Write parse-request: input-port -> string or false. This function should take an input-port and read an HTTP request from it. An HTTP request has the form "GET <url> ..." where <url> is the URL the client wants to retrieve. (Hint: use read, symbol->string, string->url, find-file). Return false if the request isn't in the right form, or if the file is invalid (doesn't exist, can't read, etc).
  4. Write server-loop: tcp-listener -> (loops forever). This function should accept a connection from the tcp-listener, get the request, handle it, and then close the input-port and output-port of the connection. It should then wait for a new connection (by recurring). Question: how should you handle bad requests?
  5. Write http-server: num -> (loops forever). This function should create a listener on the given port number and then start server-loop to handle requests.
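
The hint in step 3 can be tried out on a string port before any network code exists. The sketch below covers only the first part of parse-request: it reads the request with read and turns the requested path into a string. The name request-path is made up, it deliberately stops before string->url and find-file (those come from the teachpack), and #lang racket is an assumption for a current Racket.

```scheme
#lang racket
;; request-path: input-port -> string or false
;; Reads the first two s-expressions of a request. Returns the requested
;; path as a string if the request starts with GET, and false otherwise.
(define (request-path iport)
  (local [(define method (read iport))
          (define target (read iport))]
    (cond [(and (symbol? method) (symbol=? method 'GET) (symbol? target))
           (symbol->string target)]
          [else false])))

(request-path (open-input-string "GET /pub/release/changelog\r\n"))
;; = "/pub/release/changelog"
(request-path (open-input-string "HELLO there"))   ; = false
(request-path (open-input-string ""))              ; = false
```

This works because read treats both GET and the path as symbols. A request containing characters that read handles specially, such as unbalanced parentheses, would need more care.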

Extensions

Once you've gotten the basic web server done, there are many directions you can go to extend it to do more interesting things. I'll describe some of them below. Many of these will require additional support from built-in libraries, and some are very challenging.
  1. In most web servers, if you request something that is the name of a directory, the server will look for a file called index.html (or something similar) in that directory. Add this feature. Sometimes, if index.html is not found, the web server will print out an html listing (with links) of all the files in that directory.
  2. Add a logging capability to your web server. You can record requested urls, times, responses, client address (read more about TCP ports in the Help Desk), and other information. If you structure your log files in a nice way, you could even write programs that operate on your log files, for example to determine which IPs are infected by Code Red and are hitting your server trying to infect it.
  3. Read about HTTP/1.0 and implement a useful subset of it. You can add support for headers and return codes (e.g., 200 OK, 404 Not found) without too much work.
  4. Turn your web server into a web proxy. If you receive a request, find the host part of the url and connect to that site, and then just repeat the request. The host you contact will give you a response, which you can pass back to your client. (This actually isn't that hard.)
  5. Add CGI support to your web server. This is a nontrivial task. Look at the function process to figure out how this works.
  6. Add a servlet engine. This is also nontrivial, but so much more interesting than plain CGI. Much of the work is the same. If the file requested matches a certain pattern (e.g., the request starts with "/servlets", or the filename ends with ".ss"), then that file should be loaded and the result of running it sent back to the client. There is a lot of work here both in implementing this and designing the interface between the server and the servlets it executes. How do you keep the servlets from redefining important server functions or calling (exit) and quitting the server?
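
For extension 3, the return codes are less work than they sound. An HTTP/1.0 response begins with a status line, then any headers, then an empty line, and only then the contents of the file. Each line ends with a carriage return and a newline. The sketch below writes the simplest possible version, with no headers at all. The name send-header is made up, and #lang racket is an assumption for a current Racket.

```scheme
#lang racket
;; send-header: output-port number string -> void
;; Writes an HTTP/1.0 status line followed by the empty line that
;; separates it from the body of the response.
(define (send-header out code reason)
  (fprintf out "HTTP/1.0 ~a ~a\r\n\r\n" code reason))

;; Try it on a string port before using a real connection.
(define out (open-output-string))
(send-header out 404 "Not found")
(get-output-string out)   ; = "HTTP/1.0 404 Not found\r\n\r\n"
```

In the server you would call something like this on the connection's output-port just before sending the file or the error page.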