Information has structure. For instance, a picture-info might consist of a caption and a filename (ending in ".jpg" or ".gif"); in turn, a gallery consists of a title and a list of picture-infos. We have seen how to represent this data in scheme programs, but non-programmers also want to represent such information. X-expressions and xml are two different but equivalent ways of doing this. We will provide functions to translate back and forth between these formats.
So although your assignment will only process xexprs, the provided functions will let you process data files written in the xml format. Since html is just one type of xml, you will be able to write programs that take in and/or produce web pages. (Details on these functions are forthcoming, though for the homework you won't need to use them.)
For example, here is some information in xml format, as might be written by a non-programmer photo archivist:
    <gallery>
      <title>Me at the Britney Spears Concert</title>
      <picture>
        <filename>pict01.jpg</filename>
        <caption>Waiting in line for a Pepsi.</caption>
      </picture>
      <picture>
        <filename>pict07.jpg</filename>
        <caption>Waiting in line for <em>another</em> Pepsi.</caption>
      </picture>
      <picture>
        <caption>Waiting in line for the bathroom.</caption>
        <filename>pict19.jpg</filename>
      </picture>
    </gallery>

A few notes on this:
Aside:
In general, if you want other people to use and understand
an xml format you've made up, you should formally specify
what tags are necessary, optional, etc.
There is a special syntax for doing this, called a "DTD" --
a document type definition (much like a data definition; sound familiar?).
(Annoyingly, this special syntax could've itself used
XML, but the designers failed to do this,
and so DTD details are yet another language to learn...)
We won't touch DTDs for this class.
Here is the same information, as an xexpr:

    (list 'gallery
          (list 'title "Me at the Britney Spears Concert")
          (list 'picture
                (list 'filename "pict01.jpg")
                (list 'caption "Waiting in line for a Pepsi."))
          (list 'picture
                (list 'filename "pict07.jpg")
                (list 'caption "Waiting in line for " (list 'em "another") " Pepsi."))
          (list 'picture
                (list 'caption "Waiting in line for the bathroom.")
                (list 'filename "pict19.jpg")))

    ;; Or, equivalently, using the strictly-optional quoted-list form,
    ;; as mentioned in lecture friday:
    ;;
    '(gallery
      (title "Me at the Britney Spears Concert")
      (picture
       (filename "pict01.jpg")
       (caption "Waiting in line for a Pepsi."))
      (picture
       (filename "pict07.jpg")
       (caption "Waiting in line for " (em "another") " Pepsi."))
      (picture
       (caption "Waiting in line for the bathroom.")
       (filename "pict19.jpg")))
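As a quick check that the `list` form and the quoted form really denote the same value, here is a small sketch using just one picture from the gallery. It assumes Racket (`#lang racket`); the names `pic-with-list` and `pic-with-quote` are made up for this illustration.

```scheme
#lang racket

;; One picture, built explicitly with `list` ...
(define pic-with-list
  (list 'picture
        (list 'filename "pict07.jpg")
        (list 'caption "Waiting in line for " (list 'em "another") " Pepsi.")))

;; ... and the same picture, written with the quoted-list shorthand.
(define pic-with-quote
  '(picture
    (filename "pict07.jpg")
    (caption "Waiting in line for " (em "another") " Pepsi.")))

;; Both expressions build the same nested list.
(equal? pic-with-list pic-with-quote)   ; => #t

;; The first item of an xexpr list is always its tag.
(first pic-with-list)                   ; => 'picture
```

Either way of writing it is fine; the quoted form just saves typing `list` and the individual quote marks.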
Note the similarity between xml and scheme lists:
In scheme there are only three types of parens: round (,) and square [,] and squirrely {,}; these can all nest, and each open must match its close.
In xml, it's the same, except there are lots of types of parens (which take more than a single character to write): there are gallery-parens <gallery>,</gallery> and em-parens <em>,</em>, etc. As you'd expect, these parens can nest, and each open must match its close.
Xexpr is just a convention to represent this plethora of parentheses inside of scheme: we pretend that we have all these kinds of parentheses by insisting that each list begin with a symbol (representing the name of the xml parentheses, or "tag"). We don't repeat that symbol at the closing paren, since it's implicit.
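To make the "tag as parentheses" idea concrete, here is a minimal sketch of how an xexpr could be turned back into xml text: the leading symbol is written once as the opening tag and once more as the closing tag. It assumes Racket and the attribute-free data definition (an xexpr is a string, or a list starting with a symbol followed by xexprs). The function `xexpr->xml-string` is a hypothetical helper for illustration, not one of the provided functions, and it does not escape special characters such as `<` or `&`.

```scheme
#lang racket

;; xexpr->xml-string : xexpr -> string
;; An xexpr is either a string, or (cons symbol list-of-xexprs).
;; (Hypothetical helper; no escaping of special characters.)
(define (xexpr->xml-string x)
  (cond
    [(string? x) x]
    [else
     (let ([tag (symbol->string (first x))])
       (string-append
        "<" tag ">"                                        ; the open "paren"
        (apply string-append (map xexpr->xml-string (rest x)))
        "</" tag ">"))]))                                  ; the implicit close

(xexpr->xml-string
 '(caption "Waiting in line for " (em "another") " Pepsi."))
;; => "<caption>Waiting in line for <em>another</em> Pepsi.</caption>"
```

Note how the function follows the shape of the data definition: one case for strings, one case for tagged lists, with a recursive call on each contained xexpr.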
A document is also structured info, of course: it consists of paragraphs; paragraphs are a list of words, links, and emphasized sections. Furthermore, links themselves are words (that can be clicked on). You can express all of this in xml, by declaring certain tags to represent paragraphs, lists, list-items, etc. This is exactly what html is: a set of xml tags, used to represent the structure of text documents. The name even comes from "hypertext markup language". For example, here is some xml which also happens to be html:
html source info:

    <p>
      This sentence is a list of words,
      <em>some of which are meant to be emphasized!</em>
      Even <em>sub</em>parts of words can be emphasized.
      We indicate what to emphasize by placing it between matching "em" tags.
    </p>
    <p>
      Each paragraph is delimited by a "p" tag and its matching closing tag, "/p", as you see.
      Blank lines don't count.
      Thought of as an xexpr, it becomes clear this whole thing is a list of
      (three) (paragraph) xexprs; each paragraph itself contains a list of xexprs...
    </p>
    <p>
      An ordered list, "ol", is also structured data: a sequence of list-items.
      <ol>
        <li> this is the first item. </li>
        <li> this is the second. </li>
        <li> note that the actual numbering of the items is not included; that's implicit. </li>
      </ol>
    </p>

How your browser interprets this info:

This sentence is a list of words, some of which are meant to be emphasized! Even subparts of words can be emphasized. We indicate what to emphasize by placing it between matching "em" tags.

Each paragraph is delimited by a "p" tag and its matching closing tag, "/p", as you see. Blank lines don't count. Thought of as an xexpr, it becomes clear this whole thing is a list of (three) (paragraph) xexprs; each paragraph itself contains a list of xexprs...

An ordered list, "ol", is also structured data: a sequence of list-items.

1. this is the first item.
2. this is the second.
3. note that the actual numbering of the items is not included; that's implicit.
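The ordered list from the html example can be written as an xexpr too, and since the numbering is implicit, a program has to compute it. The following is a small sketch, assuming Racket; `steps`, `li?`, and `count-items` are names invented for this illustration.

```scheme
#lang racket

;; The "ol" element from the html example, as an xexpr.
(define steps
  '(ol (li "this is the first item.")
       (li "this is the second.")
       (li "note that the actual numbering of the items is not included; that's implicit.")))

;; li? : xexpr -> boolean
;; Is this xexpr a list-item element (as opposed to a string or another tag)?
(define (li? x)
  (and (pair? x) (eq? (first x) 'li)))

;; count-items : xexpr -> number
;; How many list-items does an "ol" xexpr contain directly?
(define (count-items an-ol)
  (length (filter li? (rest an-ol))))

(count-items steps)   ; => 3
```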
Note, for those who know some html:
our data definition for xexpr doesn't include attributes.
If you want, you can do this whole assignment with
a slightly different data definition:
a tagged-xexpr is (instead)
    (cons symbol (cons attr-list xlist)),
where the second item, the attr-list,
is a list of entries, where each entry
is (list symbol string).
The provided library, which will let you read/write xexprs as
xml-files, will contain a flag letting you say whether
you are imitating attributes as 'attr elements
(as in the regular assignment),
or using this real version of xexprs.
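To see what the alternative data definition looks like in practice, here is a sketch of one tagged-xexpr with an attr-list, plus a lookup function. It assumes Racket; the example link, the selector names, and `attr-value` are all hypothetical and are not part of the provided library.

```scheme
#lang racket

;; A tagged-xexpr is (cons symbol (cons attr-list xlist)),
;; where attr-list is a list of (list symbol string) entries.
(define a-link
  (list 'a
        (list (list 'href "pict01.jpg")          ; the attr-list ...
              (list 'title "The Pepsi line"))
        "Waiting in line for a " (list 'em '() "Pepsi") "."))   ; ... then the xlist

;; Selectors that follow the data definition directly.
(define (tagged-xexpr-tag x)     (first x))
(define (tagged-xexpr-attrs x)   (second x))
(define (tagged-xexpr-content x) (rest (rest x)))

;; attr-value : tagged-xexpr symbol -> string or #f
;; Look up an attribute by name; #f if the element has no such attribute.
(define (attr-value x name)
  (let ([entry (assq name (tagged-xexpr-attrs x))])
    (if entry (second entry) #f)))

(attr-value a-link 'href)    ; => "pict01.jpg"
(attr-value a-link 'width)   ; => #f
```

Note that under this definition every element carries an attr-list, even an empty one, which is why the nested `em` is written with `'()` as its second item.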
(Further aside, on attributes vs. elements: Note that both approaches convey the same information; when designing an xml language, it is sometimes a toss-up whether to include something as an attribute, or an element. The only guideline is that attributes are only simple strings -- not xexprs that can contain further elements.)