COMP 210 Lab 13: Sharing Local State

We have seen several examples of functions which share local state. This is another example, which also introduces the abstract data structure of a dictionary.


Dictionaries

An early homework involved a phone directory, implemented as a list-of-name/number pairs. We could add name/number pairs to the directory, and we could use a name to look up the associated number in the directory..

There is no need to restrict ourselves to name/number pairs. In general, a dictionary is a set of key/value pairs. You can add a key/value pair (an entry) to the dictionary, and you can use a key to look up the associated value. (There can only one value associated with a given key.)

We can use dictionaries in many contexts, such as


Dictionary Implementations

We will implement dictionaries in several different ways. We'll start by writing a dictionary implementation that has some problems, and then we'll incrementally improve it. (Of course, test each part thoroughly, before proceeding to the next step.)

The implementations will differ on the details. For example, some will use a list of entries, while some will use a vector of entries. But code using dictionaries shouldn't care about how the internal details; it should simple call functions to add and lookup entries. Others use the dictionary as an abstract data type.

Dictionary exercise

Here's the definition of an entry:

       (define-struct entry (key val))
       
And a useful helper function:
       ;; have-key? key entry --> boolean
       ;; Returns whether the key is the same as that of the entry.
       (define (have-key? key entry)
          (equal? key (entry-key entry)))
       

The following example won't use visitors, just in case they would distract from the points of the lab. If you have time, go back and redo the exercises using visitors.

Also, in these examples, adding an entry will modify the current dictionary. Contrast this with, e.g., the in-class stack example which created a new stack when you added an element.

  1. Create a global list of entries, named contents, which will be the "datastore" for our dictionary. Then, finish writing the two functions lookup and add! below. This should be really easy.

                  ;; add!: ANY ANY --> list-of-entries
                  ;; SIDE-EFFECT: change the global placeholder "contents".
                  ;; We return the updated contents.
                  ;;
                  (define (add! key val)
                    …)
    
                  ;; lookup: key --> entry or false
                  ;; Does the key occur inside "contents"?
                  ;; If so, return that entry, else return the sentinel false.
                  ;;
                  (define (lookup name)
                    (local [;; helper: list-of-entries --> entry or false
                            ;; Does the key occur inside "entries"?
                            ;; If so, return that entry, else return false.
                            ;;
                            (define (helper entries)
                              (cond [(empty? entries) …]
                                    [(cons?  entries) …]))]
                       (helper contents)))
                  

    Sample solution.

  2. The above code should work and test okay. However, it has a couple of huge drawbacks:

    • Anybody might set! contents directly.

      Even if someone is trying to use contents correctly, they could easily make mistakes. E.g., if we were keeping it in sorted order as a way to improve lookup's efficiency, everything could get screwed up.

    • We can only make one single dictionary, ever! We'd certainly like to write a program which used several independent dictionaries.

    We'll fix these flaws by encapsulating the contents and two functions in a single data structure. However, rather than making the contents explicit, we will hide it in the closures of the two functions.

                  ;; A Dictionary is a structure with two "methods"
                  ;;   (fields which are functions):
                  ;; add-fn!  : ANY ANY --> list-of-entries
                  ;;   SIDE-EFFECT: change the contents.
                  ;; lookup-fn: key --> entry or false
                  ;;   Does the key occur inside "contents"?
                  ;;   If so, return that entry, else return the sentinel false.
                  ;;
                  (define-struct dict (add! lookup))
                  

    Write a function dict-Factory, which takes no arguments but returns a new, independent dictionary that initially has no entries. Make sure you can create several dictionaries which don't interact.

    Sample solution.

  3. Suppose we'd like to keep track of how many dictionaries get made. We need to keep track of this information, and give access to it, so that it is useful.

    1. Use a global variable num-dicts to keeps track of this.

      Add another field to dictionaries, id, so that the first dictionary has an id of 1, the second an id of 2, etc. Making new dictionaries should not change the ids of existing dictionaries, of course.

    2. Discuss with your partner: What are the drawbacks of making this id field a number rather than a function returning a number? If your code is going to be used by many other programmers, how easy is it (intentionally or accidentally) for them to subvert your data?

    3. Then, the tricky part, make this variable not be global. But, clearly it does need to have dict-Factory inside its scope, and we only want one variable, not one per call to dict-Factory.

      Hint:

                            (define dict-Factory
                              (local […]
                                 (lambda () …)))
                            

    Sample solution.

  4. Note that noone using your dictionary knows that you are using a list-of-Entries to store your data; they just call the add!-fn and lookup-fn.

    But, other programmers tell you that they often use your dictionary and would like it to be faster. Furthermore, they also tell you that the keys are natural numbers in some bounded range: say, 1..100 for the programmer keeping track of grades, and 1..1000000 (Rice student ID) for the Registrar's database, or 100..800 for MUSI course numbers.

    We can make a dictionary which is optimized for quick lookups in this case. Rather than lists, we can use vectors. Write the factory below:

                  ;; dict-n-Factory: natnum --> dictionary
                  ;; Returns a dictionary where keys are natural numbers
                  ;; in the range 0..n-1.
                  ;; 
                  (define (dict-n-Factory num-keys)
                     …)
                  

    For the moment, write this assuming the key will always be in range.

    Sample solution.

  5. To be fair, the previous function doesn't quite return a true dictionary, since dictionaries allow the key to be anything. Modify your previous function so that if the key is not a natural number in range, it still works.

    How? Well, we could have our dictionary contain both a vector (to hold all the numeric keys in range) and a list (for the rest).

    Hint: You could copy/paste code from dict-Factory. But we hate repeating code, remember? How about, having your dict-n-Factory secretly containing not just a vector (for numeric keys in range), but also a whole other true dictionary (for for the rest).

    Sample solution.

  6. If you have time: Let's assume all our keys will be natural numbers. Vectors are great when we know how many entries will be used in advance. But what if we don't? What if somebody provides a numeric key greater than or equal to num-keys? We want the unbounded nature of a list, with the random-access nature of a vector.

    Here's a clever trick so that we can have our cake and eat it too. Start with a vector of size (say) 100. For keys of size less than 100, no problem. If we eventually get a key that is 100 or bigger, then we'll make a whole new vector of size 200, copy over the contents of our size-100 vector, keeping only the new, bigger vector! This "repeated doubling" technique is handy. Theoretically, it averages out to be half as fast as a regular vector implementation.

    Write such a new version.

    Sample solution.