Comp 210 Lab 8: Disks and Mutual Recursion

In a previous lecture we talked about files and directories, as an example of mutual recursion, and had problems about finding all files with a specific name (buried anywhere inside a given folder), etc. Today, we'll look at those ideas, and combine them with specific functions which actually look the file structure of the disk you are using.

  • TodoFrom the Help menu, start up help-desk and search for "Filesystem Utilities". (Use the version in "mzscheme language manual", not in the "mzlib libraries".) It starts with a discussion of build-path.
  • Go to the menu Language:Add Teachpack..., and add a teachpack which is sitting in today's Lab directory, /home/comp210/Labs/Lab08/lab08-utils.ss. This teachpack imports the mzscheme functions discussed in help-desk above, and makes them available in your intermediate-student level.
  • To make sure it works, type in the following:
    ; A couple of functions on strings which we'll use later:
    (has-suffix?   "caprice" "rice")
    (remove-suffix "caprice" "rice")
    
    
    
    ; Functions which access the file system!
    ;
    (current-directory)
    
    (directory-list (current-directory))
    
    (define m (build-path (current-directory) "zzz"))   ; Well, use the name of a file or directory here.
    m
    
    (file-exists? m)
    (directory-exists? m)
    (link-exists? m)
    
    (directory-list m)
    
    Take a moment to look at what each of these does.
  • Data Definitions

    First, note that rather than have a structure containing all the information about a file, Helpdesk indicates that all the built-in functions involve using a file/directory's "pathname".

    Aside: Ideally, a path name would not be a string, with directories seperated by / (unix) or \ (DOS) or : (Mac) or ... . In a perfect world, a path would be a list of strings (each corresponding to directory, and the last being a file). Alas, for historical reasons, mzscheme chose to stick with a pathname being one big long string. (It provides build-path to abstract over the different platforms.) (As mentioned in a previous lab, strings are often inappropriate as string representations ignore structure).

    Recall the data definition used back in lecture, for files and directories. This will not be quite the definition we use, but it's good to keep in mind.

    ;; An Disk-Entry is
    ;; -- File, or
    ;; -- Folder
    
    
    ;; A File is structure with a name and a size.
    ;;   (make-File string num)
    (define-struct File (name size))
    
    
    ;; A Folder is
    ;; (make-Folder symbol list-of-Disk-Entries) 
    (define-struct Folder (name entries))
    
    ;; list-of-Disk-Entries is
    ;; -- empty
    ;; -- (cons Disk-Entry list-of-Disk-Entries)
    
    
    ;; Template for Disk-Entry
    (define (f-Entry entry)
      (cond [(File? entry)   (f-File entry)]
            [(Folder? entry) (f-Folder entry)]))
    
    ;; Template for File
    (define (f-File file)
      (File-name file)..(File-size file)..)
    
    ;; Template for Folder
    (define (f-Folder folder)
      (Folder-name folder)..(f-LOI (Folder-loi folder))..)
    
    ;; Template for List-of-Disk-Entry
    (define (f-LOI dir)
      (cond [(empty? dir)]
            [(cons? dir) ..(f-Entry (first dir))..(f-LOI (rest dir))]))
    

    As mentioned, we will represent files and folders both as pathnames, which in turn are represented as strings. However, we will still need to have a function which handles a single disk-entry, as well as a function that handles a list of disk-entries.

         ;;; Our real data definition
    
    ;; A disk-entry is:
    ;;  - a filename, or
    ;;  - a directory-name.
    ;;  - a linkname  [see below]
    ;;
    ;; A filename is a pathname.
    ;; A directory-name is a pathname; use directory-list to get the list of enclosed disk-entries.
    ;; A linkname is a pathname; see below.
    ;;
    ;; A pathname is a string.
    
    
    ;; handle-disk-entry: disk-entry --> ??
    ;;
    (define (handle-disk-entry a-de)
      (cond [(file-exists? a-de) ..]
            [(directory-exists? a-de) ..(directory-list a-de)..]
            [(link-exists? a-de) ..]))   ; A dangling link.
    

    What's a link?

    Windows' "shortcuts", the mac's "aliases", and *nix's "soft links" are small files that simply refer to some other filename. For instance, you might have a drscheme shortcut/alias/link on your desktop, which does nothing but give the name of some (deeply nested) disk-entry. Usually, opening or asking about a linkname automatically translates to a question about the referred-to entry.

    However, occasionally something can be a linkname w/o being a filename or a directory-name -- usually this is a dangling link, (though it could also be a set of circular links). Thus the above data definition is a bit off; if you have a link to a directory and ask if it's a directory, both directory-exists? and link-exists? will return true.

    We'll be a bit shady here, and by asking link-exists? after the other questions, that branch is only taken for dangling (or cyclic) links.

    Todo Let's write a function "find-suff", which is given a (pathname of a) disk entry, and returns a list of all the contained disk entries which end in a given suffix. That is, (find-suff "/home/ian" ".ss") would return all the functions in my (unix) home directory which end in .ss.

    As the templates above suggest, you'll have one function which takes the initial disk-entry, determines whether it's a file or a directory, and handles it appropriately. The two helper functions will do the work. What are the contracts and descriptions for each of these functions?

    To process an individual file, you can use has-suffix. To process a directory, you can return a list which may or may not contain its own name, and then a helper which recurs on each of the files it contains (found by calling directory-list.

    A problem! You try running your program, and you see that on a single file it works. But the moment you run on a directory, you get an error along the lines of "cond: all branches false", indicating we're calling our function on something that is neither a file (pathname) nor a directory (pathname).

    What do we do? Give up in Despair, Disgust? No! We could try the stepper, if that worked on intermediate, but it doesn't. So the tried-and-true method is, use your noggin, to trace through exactly what gets returned! We can even type in intermediate expressions, to see what they evaluate to.

    Alternately, you can look up the error function in help-desk:
    (define (max-of-list nums)
      (cond [(empty? nums) (error 'max-of-list "No max for an empty list.")]
            ...))
    
    Note that this function is helpful in conjunction with the function format, which can produce a string made out of other (non-string) values:
    (define (add17 n)
      (cond [(number? n) (+ n 17)]
            [else (error 'add17 (format "Expected a number; got ~a." n))]))
    
    format is useful by itself:
    (format "I am ~a years old." 17)
    
    
    ;; discuss-age: Person -> string
    ;; Given a person, return a string talking about their age.
    ;;
    ;(define (discuss-age p)
    ;  (format "~a is ~a years old." (person-name p) (- 2002 (person-year p)))
    
    ; The above version has a couple of problems, rectified below:
    
    ;; discuss-age: Person -> string
    ;; Given a person, return a string talking about their age.
    ;;
    (define (discuss-age p)
      (local {(define current-year 2002)
              (define age (- current-year (person-year p)))}
        (format "~a is ~a year~a old."
                (person-name p)
                age
                (cond [(= 1 age) ""]   ; Get correct plural-ending.
                      [else "s"]))
    

    When you discover the problem, write a new version of directory-list-full, which returns a list of pathnames relative to the current directory!
    Optional: Use map (and, it turns out, lambda) to do this!
    Now go back and complete your function.

    Todo: When you've completed that, choose one of the following (perhaps the last one) to work on:

    A word of warning: If we were representing Disk Entries as self-contained structures, the template solution is guaranteed to never loop. However, we're just looking at the disk, and it turns out that links can refer to folders "above" themselves, which means following the link begins a cycle, which your code could follow forever!
    To ponder: What are ways to get your code to detect a cycle?