Big picture:

xexpr, xml, and html

Information has structure. For instance, perhaps a picture-info consists of a caption and a filename (ending in ".jpeg" or ".gif"); in turn, a gallery consists of a title and a list of picture-infos. We have seen how to represent such data in scheme programs, but non-programmers also want to represent such information. X-expressions and xml are two different but equivalent ways of doing this. We will provide functions to translate back and forth between these formats.
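
As a reminder of the struct-based representation, here is a minimal sketch of the picture-info and gallery data described above. The struct and field names are our own choice for illustration (they are not part of any provided library), and the sketch assumes a caption is a plain string:

```scheme
#lang racket

;; A picture-info is (make-picture-info string string).
;; (Names are illustrative assumptions, not from the assignment.)
(define-struct picture-info (caption filename))

;; A gallery is (make-gallery string list-of-picture-info).
(define-struct gallery (title pictures))

(define my-gallery
  (make-gallery
   "Me at the Britney Spears Concert"
   (list (make-picture-info "Waiting in line for a Pepsi." "pict01.jpg")
         (make-picture-info "Waiting in line for the bathroom." "pict19.jpg"))))

;; all-filenames : gallery -> list-of-string
;; Return the filename of every picture in the gallery, in order.
(define (all-filenames g)
  (map picture-info-filename (gallery-pictures g)))

(all-filenames my-gallery)  ; => (list "pict01.jpg" "pict19.jpg")
```

Note that a caption containing an emphasized word would not fit in a plain string field; that kind of nesting is what X-expressions handle naturally.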

So although your assignment will only process xexprs, the provided functions will let you process data files written in the xml format. Since html is just one type of xml, you will be able to write programs that take in and/or produce web pages. (Details on these functions are forthcoming, though you won't need to use them for the homework.)

For example, here is some information in xml format, as might be written by a non-programmer photo archivist:

<gallery>
  <title>Me at the Britney Spears Concert</title>
  <picture>
    <filename>pict01.jpg</filename>
    <caption>Waiting in line for a Pepsi.</caption>
    </picture>
  <picture>
    <filename>pict07.jpg</filename>
    <caption>Waiting in line for <em>another</em> Pepsi.</caption>
    </picture>
  <picture>
    <caption>Waiting in line for the bathroom.</caption>
    <filename>pict19.jpg</filename>
    </picture>
  </gallery>

This xml is all well and good, but we want to process this information in scheme. We'll provide a function which will translate the above into a corresponding X-expression:
(list 'gallery
      (list 'title "Me at the Britney Spears Concert")
      (list 'picture
            (list 'filename "pict01.jpg")
            (list 'caption "Waiting in line for a Pepsi."))
      (list 'picture
            (list 'filename "pict07.jpg")
            (list 'caption "Waiting in line for " (list 'em "another") " Pepsi."))
      (list 'picture
            (list 'caption "Waiting in line for the bathroom.")
            (list 'filename "pict19.jpg")))


;; Or, equivalently, using the strictly-optional quoted-list form,
;; as mentioned in lecture on Friday:
;;
'(gallery (title "Me at the Britney Spears Concert")
          (picture (filename "pict01.jpg")
                   (caption "Waiting in line for a Pepsi."))
          (picture (filename "pict07.jpg")
                   (caption "Waiting in line for " (em "another") " Pepsi."))
          (picture (caption "Waiting in line for the bathroom.")
                   (filename "pict19.jpg")))
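
Once the information is an X-expression, ordinary list-processing functions apply. Here is a minimal sketch that collects every filename from the gallery above. It assumes an xexpr is either a string or (cons symbol list-of-xexpr); the function names are our own, not part of the provided library:

```scheme
#lang racket

(define my-gallery
  '(gallery (title "Me at the Britney Spears Concert")
            (picture (filename "pict01.jpg")
                     (caption "Waiting in line for a Pepsi."))
            (picture (filename "pict07.jpg")
                     (caption "Waiting in line for " (em "another") " Pepsi."))
            (picture (caption "Waiting in line for the bathroom.")
                     (filename "pict19.jpg"))))

;; filenames : xexpr -> list-of-string
;; Collect the contents of every 'filename element, in document order.
(define (filenames x)
  (cond [(string? x) empty]
        [(symbol=? (first x) 'filename) (rest x)]
        [else (filenames-in-list (rest x))]))

;; filenames-in-list : list-of-xexpr -> list-of-string
;; Collect the filenames found in each xexpr of the list.
(define (filenames-in-list xs)
  (cond [(empty? xs) empty]
        [else (append (filenames (first xs))
                      (filenames-in-list (rest xs)))]))

(filenames my-gallery)  ; => (list "pict01.jpg" "pict07.jpg" "pict19.jpg")
```

Because the function follows the shape of the data, it does not care whether the caption or the filename comes first inside a picture.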

Further examples of data.

A document is also structured information, of course: it consists of paragraphs, and each paragraph is a list of words, links, and emphasized sections. Furthermore, links themselves are words (that can be clicked on). You can express all of this in xml by declaring certain tags to represent paragraphs, lists, list-items, etc. This is exactly what html is: a set of xml tags used to represent the structure of text documents. The name even comes from "hypertext markup language". For example, here is some xml which also happens to be html:
Below is the html source, followed by how your browser interprets this info:
<p>
This sentence is a list of words,
<em>some of which are meant 
to be emphasized!</em>
Even <em>sub</em>parts 
of words can be emphasized.  We indicate 
what to emphasize by placing it between 
matching "em" tags.
</p>



<p> Each paragraph is delimited
between a "p" tag and its matching
closing tag, "/p", as you



see.  Blank lines don't count.
Thought of as an xexpr, it
becomes clear this whole thing is 
a list of (four) (paragraph) xexprs;
each paragraph itself contains 
a list of xexprs...
</p>



<p>
An ordered list, "ol", is also structured data:
a sequence of list-items.
<ol>
<li> this is the first item. </li>
<li> this is the second. </li>
<li> note that the actual numbering of 
the items is not included; 
that's implicit. </li>
</ol> 
</p>

This sentence is a list of words, some of which are meant to be emphasized! Even subparts of words can be emphasized. We indicate what to emphasize by placing it between matching "em" tags.

Each paragraph is delimited between a "p" tag and its matching closing tag, "/p", as you see. Blank lines don't count. Thought of as an xexpr, it becomes clear this whole thing is a list of (four) (paragraph) xexprs; each paragraph itself contains a list of xexprs...

An ordered list, "ol", is also structured data: a sequence of list-items.

  1. this is the first item.
  2. this is the second.
  3. note that the actual numbering of the items is not included; that's implicit.

Of course, there are many other tags used in html. (Try the "view-source" option in your browser, to see the xml for this document itself!)
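
To see how an xexpr corresponds to the html text, here is a minimal sketch of a translation from an attribute-free xexpr to an xml string. This is our own illustration, not the provided library function; among other simplifications, it does not escape special characters such as "<" or "&" inside strings:

```scheme
#lang racket

;; xexpr->xml : xexpr -> string
;; A string stands for itself; a tagged xexpr becomes its contents
;; wrapped in matching opening and closing tags.
(define (xexpr->xml x)
  (cond [(string? x) x]
        [else (string-append "<" (symbol->string (first x)) ">"
                             (xlist->xml (rest x))
                             "</" (symbol->string (first x)) ">")]))

;; xlist->xml : list-of-xexpr -> string
;; Translate each xexpr in the list and join the results.
(define (xlist->xml xs)
  (cond [(empty? xs) ""]
        [else (string-append (xexpr->xml (first xs))
                             (xlist->xml (rest xs)))]))

(xexpr->xml '(p "Even " (em "sub") "parts of words can be emphasized."))
;; => "<p>Even <em>sub</em>parts of words can be emphasized.</p>"
```

The two functions call each other because an xexpr contains a list of xexprs, and that list in turn contains xexprs.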

Note, for those who know some html: our data definition for xexpr doesn't include attributes. If you want, you can do this whole assignment with a slightly different data definition, in which a tagged-xexpr is instead (cons symbol (cons attr-list xlist)). Here the second item, the attr-list, is a list of entries, where each entry is (list symbol string). The provided library, which will let you read/write xexprs as xml files, will contain a flag letting you say whether you are imitating attributes as 'attr elements (as in the regular assignment) or using this real version of xexprs.
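
For the curious, here is a minimal sketch of the alternative data definition, in which every tagged-xexpr carries an attr-list as its second item. The example data and the helper names are our own assumptions for illustration; here the filename is stored as an attribute of the picture rather than as an element:

```scheme
#lang racket

;; A tagged-xexpr is (cons symbol (cons attr-list xlist)).
;; An attr-list is a list of entries, each (list symbol string).
(define pict
  '(picture ((filename "pict01.jpg"))
            (caption () "Waiting in line for a Pepsi.")))

;; attrs-of : tagged-xexpr -> attr-list
(define (attrs-of x) (second x))

;; contents-of : tagged-xexpr -> list-of-xexpr
(define (contents-of x) (rest (rest x)))

;; lookup-attr : symbol attr-list -> string or #f
;; Return the value stored under name, or #f if there is no such entry.
(define (lookup-attr name attrs)
  (cond [(empty? attrs) #f]
        [(symbol=? (first (first attrs)) name) (second (first attrs))]
        [else (lookup-attr name (rest attrs))]))

(lookup-attr 'filename (attrs-of pict))  ; => "pict01.jpg"
(lookup-attr 'title (attrs-of pict))     ; => #f
```

Even an element with no attributes, such as the caption, must include an empty attr-list so that the second item always has the same meaning.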

(Further aside, on attributes vs. elements: Note that both approaches convey the same information; when designing an xml language, it is sometimes a toss-up whether to include something as an attribute, or an element. The only guideline is that attributes are only simple strings -- not xexprs that can contain further elements.)