(* This file is distributed under the terms of the Q Public License version 1.0, a copy of which is included at the very end of this file. All MetaOCaml source code must be QPL'ed because the staging runtime includes the OCaml compiler which is QPL'ed. I'd be delighted to receive any comments on this document, so come bug me by all means: Jun Inoue *) (* Delimited Assumptions: How to Supercompile with Staging Based on: Morten Heine Sorensen, Robert Gluck & Neil D. Jones, "Towards Unifying Partial Evaluation, Deforestation, Supercompilation, and GPC". In this article, I show that staging can express the information propagation in Sorensen et al.'s positive supercompiler by updating the static portion of partially static data structures in-place to reflect conditional assumptions. The technique developed here may be a very useful tool for writing self-optimizing libraries; such libraries are more robust and analyzable when expressed within a clearly defined language rather than by relying on compiler optimizations and ad-hoc pragmas. This case study is partly meant to remedy the present situation in which the literature has a wealth of multi-stage type systems but a dearth of documentation on practical multi-stage programming techniques. This study in particular shows that staging with effects can do more than partial evaluation, which adds to the motivation and recent interest in effectful multi-stage type systems. [Implementation Notes] This file implements the KMP string-search algorithm, but for convenience it works on char list instead of string. In comments I use "this syntax" for writing char list literals. You might find str_to_list : string -> char list at the end of this file handy if you want to try out the search algorithm interactively. The redundant (* *) below some comments are inserted to prevent Emacs' tuareg-mode from screwing up code indentation whenever I do paragraph-fill. It doesn't always work though...sigh. I will present several implementations of the same function that builds up to the final version; some of the implementations work while others don't. Different versions of the same function are given identical names but are defined in different local modules, so that all versions are accessible without modifying this file. *) (*** *** Introduction and summary of Sorensen et al. ***) (* In a nutshell, Sorensen et al.'s positive supercompiler is partial evaluation (PE) using partially static data with conditional assumptions. Specifically: (a) upon entering a branch of a dynamic pattern-match, the super- compiler adds assumptions to the scrutinee that is valid only while processing that branch, making the scrutinee partially static, and (b) it propagates that assumption into the context by replacing all references to the scrutinee (in a copy of the context) with the partially static datum. [FIXME: what's the relationship with conditionally static data? See Charles Consel and Olivier Danvy, "For a Better Support of Static Data Flow".] This file is all about simulating these mechanisms by staging. The propagation (b) is modeled by updating partially static data in place; the delimited nature of the assumptions (a) is modeled by installing a dynamic-wind, which undoes the update when leaving the conditional branch. I'll elaborate on this description using Sorensen et al.'s running example, the KMP pattern-matcher. Here's the reference implementation: ls_match is a naive, generic implementation. It takes two strings p and s, and decides whether p is a substring of s. naive_aab is ls_match naively specialized for the pattern "aab". kmp_aab is the manually specialized ls_match for "aab", which implements the Knuth-Morris-Pratt (KMP) algorithm. *) (* *) module Reference = struct (* Decide whether p is a substring of s. Here are some mnemonics for variable names: p, pp -- Pattern to search for s, ss -- String to search over op -- Original Pattern, kept for backtracking os -- Original String, kept for backtracking *) (* ls_match : char list -> char list -> bool *) let rec ls_match p s = loop p s p s and loop pp ss op os = match pp, ss with | [], _ -> true | _, [] -> false | p::pp', s::ss' -> if p = s then loop pp' ss' op os else next op os and next op = function | [] -> false | s::ss -> loop op ss op ss (* Naively specialized for p = "aab". Suffixes indicate what the remainder of the pattern looks like at that point. *) let rec naive_aab ss = ls_match_aab ss ss and ls_match_aab ss os = match ss with | [] -> false | x::xs -> if x = 'a' then ls_match_ab xs os else next os and ls_match_ab ss os = match ss with | [] -> false | x::xs -> if x = 'a' then ls_match_b xs os else next os and ls_match_b ss os = match ss with | [] -> false | x::xs -> if x = 'b' then true else next os and next = function | [] -> false | _::xs -> ls_match_aab xs xs (* NB: we can do better than this *) (* KMP-style matcher, specialized for p = "aab". No more `os'. *) let rec kmp_aab = function | [] -> false | x::xs -> if x = 'a' then kmp_ab xs else kmp_aab xs and kmp_ab = function | [] -> false | x::xs -> if x = 'a' then kmp_b xs else kmp_aab xs and kmp_b = function | [] -> false | x::xs -> if x = 'b' then true else (* NB: better than resetting to kmp_aab *) if x = 'a' then kmp_b xs else kmp_ab xs end (* A review of the clever shortcuts that KMP takes should make it clear where we're heading with this staging exercise. Consider the call to the naively specialized function naive_aab (str_to_list "aaab") i.e. naive_aab ('a'::'a'::'a'::'b'::[]) As far as the matcher is concerned, the contents of the string is unknown at the beginning of the call: let's denote that by writing naive_aab _ where _ means "some data, don't know what it contains". Then naive_aab calls ls_match_aab and we arrive at the pattern-match match _ with | [] -> false | x::xs -> ... The _ is, unbeknownst to the matcher, equal to 'a'::'a'::'a'::'b'::[], so the second branch is taken. This reveals the head constructor of the previously unknown data, i.e., _ = 'a'::_' where _' is a fresh unknown. Proceeding in this manner, one more pattern-match and two equality tests reveal the first two elements of the input, and we're at the call ls_match_b __ ('a'::'a'::__) where __ is another fresh unknown. Then ls_match_b pattern-matches on the first argument, hence getting to if x = 'b' then true else next os where the environment is: ss = __ os = 'a'::'a'::__ x = 'a' xs = 'a'::__ So we jump to next ('a'::'a'::__) thence to ls_match_aab ('a'::'a'::__) ('a'::'a'::__) which re-inspects the first argument and checks again that the first two characters are 'a'. But we can do better than that. We know the first two characters at this point, so we can safely zip right to the if-then-else in ls_match_b, jumping directly from the call to `next' to ls_match_b __ __ This is exactly what KMP does. In kmp_b, when the test x = 'b' fails, kmp_b resets to kmp_b or kmp_ab, whichever is allowable and is the shorter cut. Note that the KMP matcher doesn't need to keep around the parameter `os', because KMP never has to backtrack once the pattern to search for is fixed; all branches can be statically resolved up to the point where the matcher is forced to inspect an unknown cell. The DFA constructed in a textbook formulation of KMP encodes this kind of knowledge: if we reach a certain point in the code then we partially know the input, therefore we can skip some steps. In the code above, the DFA is embodied in the recursive structure of the program. *) (* *) (* There are two challenges in implementing this optimization. One is that the partially static knowledge about the text (`ss' and `os') are *delimited assumptions*, not unconditional facts. If we have match ls with | [] -> (* A *) | x::xs -> (* B *) then in B we know ls = _::__, but this knowledge is void both in A and outside of the match. In the previous example, we knew that ls_match_b always receives ('a'::'a'::_) precisely because reaching that function entails taking the cons-branches of some matches and the affirmative branches of some "if x=y" statements. So we need to introduce assumptions when we enter a conditional branch and make them expire when we leave the branch. The other issue is that it's our partially static knowledge about `os' that saves us computation, yet we never inspect `os' to get that knowledge. In any call to ls_match_b, we know os = 'a'::'a'::_ because of pattern-matches on `ss', not on `os'! So somehow the information learned about `ss' has to trickle to `os'. This is the crucial difference between supercompilation and PE. Sorensen et al.'s positive supercompiler gives a unique name to each data so that all copies of the same data have the same name. When the supercompiler enters a conditional branch within which `ss' = 'a'::_ holds, then it propagates this information to all copies by replacing all occurrences of (the unique name given to) `ss' in the context and in all subterms. The main idea of this article is to capture this information sharing by physically sharing partially static data, and updating it in-place. In KMP, `ss' and `os' will share the same memory location. Pattern-matches on `ss' will modify `ss', and therefore `os' as well, to reflect the assumptions introduced by the match. We install a handler that will ensure that this update is undone upon leaving the branch. *) (* *) (*** *** Staging the matcher without supercompilation. ***) (* So let's try to stage the naive matcher; I'll gradually work up to a complete solution. First, a naive specialization on the pattern doesn't work at all. Because the termination measure of the matcher includes the text and not just the pattern, we end up with a nonterminating generator. Basically, it keeps generating dynamic matches that deconstruct `ss' further and further ad infinitum. *) (* A non-terminating matcher specializer/generator. *) module Nonterm = struct (* This doesn't terminate. I need to refer recursively to the generated code. *) let rec ls_match_gen p = . .~(loop_gen p .. p ..)>. and loop_gen pp ss op os = match pp with | [] -> .. | p::pp -> . false | s::ss when p = s -> .~(loop_gen pp .. op os) | s::ss -> .~(next_gen op os)>. and next_gen op os = . false | s::ss -> (* Need to lookup memo table here. *) .~(loop_gen op .. op ..) >. end (* So we can't just keep unrolling indefinitely. We need to generate recursive functions that take the text as input, and if we're ever asked to specialize for the same pattern again, to recursively call that function (in PE-speak, we introduce specialization points to avoid specializing the same expression twice). It's kind of like infinite CSE. The technique is basically that of: Kedar Swadi, Walid Taha and Oleg Kiselyov, "A Monadic Approach for Avoiding Code Duplication when Staging Memoized Functions". *) (* Memoization table implementation. *) (* BUG: metaocaml crashes if we put these functions at toplevel, hence the gratuitous module. metaocamlopt seems to work fine either way. *) module MemoTable = struct let lookup ?eq x alist = let eq = match eq with Some eq -> eq | None -> (=) in let rec assoc = function | [] -> None | (y,v)::rest -> if eq x y then Some v else assoc rest in assoc alist and add x y alist = (x,y)::alist and empty = [] end (* Here's a naively memoized version that produces correct but inefficient code; basically the same code as Reference.naive_aab. The reason is as explained in the Intro: we don't propagate assumptions about `ss' to `os'. [Implementation Note] The let recs generated by NaiveMemo.ls_match_gen are nested rather than simultaneous, i.e. (* A *) let rec loop_1 ss os = ... let rec loop_2 ss os = ... let rec loop_3 ss os = ... in ... in ... in ... instead of (* B *) let rec loop_1 ss os = ... and loop_2 ss os = ... and loop_3 ss os = ... in ... I can't seem to generate B directly because the number of simultaneous bindings to be generated depends on the length of the pattern we're specializing for. I haven't checked if style A has any performance hits, but regardless it's ugly and worth trying to flatten into B. Lambda-lifting might let us do that, but we might have to grapple with scope extrusion issues. *) module NaiveMemo = struct open MemoTable let rec ls_match_gen p = . .~(loop_gen empty p .. p ..)>. and loop_gen memo pp ss op os = (* Don't need to index the table by op because op is constant. *) match lookup pp memo with | Some f -> .<.~f .~ss .~os>. | None -> match pp with | [] -> .. | p::pp' -> .. memo in . false | s::ss'' when p = s -> .~(loop_gen memo pp' .. op ..) | s::ss'' -> .~(next_gen memo op ..)>.) in loop .~ss .~os>. and next_gen memo op os = . false | s::ss -> .~(loop_gen memo op .. op ..)>. let ls_match p = .!ls_match_gen p end (*** *** Staging with delimited assumptions. ***) (* Quick recap. The partial knowledge about os is: - determined by pattern matching on ss, without touching os - different for each branch of the "match ss' with" in loop_gen. So we need to: - have partially static data - share static assumptions between copies of the same data - delimit the lifetime of the assumption *) (* Partially static data types with knowledge shared through mutation. *) module PSDataMut = struct (* FIXME: a production-grade generic library should additionally provide a future-stage variable bound to the Known value. The value is for static computation, the variable is for injecting the value into generated code while avoiding CSP overhead. *) (* partial_raw is a wrapper providing the classical partially static data structure idiom. A partial_raw value should be of the form (Unknown ..) or (Known (* some const *)). 'e is the environment classifier 's is the static representation of the payload 'd is the dynamic payload type 's and 'd may not match in recursive datatypes, like lists: 's = int ps_list whereas 'd = int list. The static representation is not just int list partial_raw because every cons must be marked Unknown/Known (i.e. partial_raw is a mixin not a simple wrapper). We need to know both 's and 'd to be able to type the `forget' functions, which strip the Unknown/Known constructors. See `forget_static_list' below. [FIXME: I'm not really convinced myself that we can't work around this.] *) type ('e, 's, 'd) partial_raw = Unknown of ('e, 'd) code | Known of 's (* partial makes partial_raw mutable, so that we can share information across branches. *) type ('e, 's, 'd) partial = ('e, 's, 'd) partial_raw ref (* Grab a dynamic or partially static value and package it as a partially static type. *) let unknown : ('e,'d) code -> ('e,'s,'d) partial = fun x -> ref (Unknown x) and known : 's -> ('e,'s,'d) partial = fun x -> ref (Known x) (* In a partially static list, we need to mix in Known/Unknown throughout the whole structure. FIXME: Type-level open recursion would be the general mechanism to use here, but in OCaml it makes type errors completely unreadable, so I'll hard-code the mixin here for now. *) (* The car is always partial because we don't know if it even exists. *) type ('e,'s,'d) ps_list_raw = Nil | Cons of ('e,'s,'d) partial * ('e,'s,'d) ps_list and ('e,'s,'d) ps_list = ('e, ('e,'s,'d) ps_list_raw, 'd list) partial (* Smart constructors. *) let nil = ref (Known Nil) and cons x xs = ref (Known (Cons (x, xs))) (* Introduce a (temporally) delimited assumption x = v, valid only while executing the thunk. *) let assuming x v thunk = let saved = !x in try x := v; let ret = thunk () in x := saved; ret with e -> x := saved; raise e (* Forget partially static information and return opaque code. User needs to write a callback to handle recursive structures. Usually, use of forget functions means you're not making full use of your assumptions. *) let forget_static walk x = match !x with | Unknown x -> x | Known x -> walk x (* For non-recursive or otherwise fully dynamic types. These functions are eta-expanded to get around value restriction. *) let forget_static_ground y = forget_static (fun x -> ..) y (* Monomorphic versions are pretty-printed better. *) let forget_static_int y = forget_static (fun (x:int) -> ..) y let forget_static_string y = forget_static (fun (x:string) -> ..) y let forget_static_float y = forget_static (fun (x:float) -> ..) y let forget_static_bool y = forget_static (fun (x:bool) -> ..) y let rec forget_static_list for_elt = let rec forget x = forget_static walk x and walk = function | Nil -> .<[]>. | Cons (x,xs) -> .<.~(for_elt x) :: .~(forget xs)>. in forget (* Examples. *) let eg_forget_unknown = . .~(forget_static_list forget_static_int (unknown ..))>. (* ==> . x_1>. *) let eg_forget_partial = . .~(forget_static_list forget_static_int (cons (known 0) (cons (unknown ..) (unknown ..))))>. (* ==> . fun y_3 -> (0 :: x_2 :: y_3)>. *) end (* Now we're ready to tackle supercompilation. The code itself is fairly straightforward, if extremely verbose. Basically, before every match we check if the head constructor is known. If it is, short-circuit; otherwise, generate a dynamic match and introduce delimited assumptions. This simple strategy gives a generator that returns the same code as Sorensen et al., except that let recs are nested. The positive supercompiler (and hence the generator here) generates redundant conditionals, which look like if x = 'a' then foo else if x = 'a' then bar else baz This happens because we track positive information, x = 'a', but not negative information, x <> 'a'. We could probably easily extend this approach by carrying more general characterizations of the data in the partially static data type. [FIXME: try it!] *) (* *) (* Using memoization + the partially static type with mutation-based information sharing. No attempt made at factoring common components into concise combinators here, so it's pretty verbose. But it's a repetition of the same basic pattern everywhere, so it shouldn't be hard to navigate through. This matcher doesn't use `forget', so it makes full use of the assumptions it introduced. *) module MemoMut = struct open PSDataMut open MemoTable (* The p can be any 'a list, but restricting polymorphism improves pretty-printing, which helps with debugging. *) let rec ls_match_gen (p : char list) = . .~(let s' = unknown .. in loop_gen [] p s' p s')>. and loop_gen memo pp ss op os = match pp with | [] -> .. | p::pp' -> match !ss with | Known Nil -> .. | Known (Cons (x,xs)) -> (match !x with | Known x' -> if x' = p then loop_gen memo pp' xs op os else next_gen memo op os | Unknown x' -> . loop_gen memo pp' xs op os)) else .~(next_gen memo op os)>.) | Unknown ssvar -> match lookup pp memo with | Some f -> .<.~f .~ssvar>. | None -> (* function makes pretty-printing much more concise than fun+match. *) . false | x::xs -> .~(let memo = add pp .. memo and x' = unknown .. and xs' = unknown .. in assuming ss (Known (Cons (x', xs'))) (fun () -> . loop_gen memo pp' xs' op os)) else .~(next_gen memo op os)>.)) in loop .~ssvar>. and next_gen memo op os = match !os with | Known Nil -> .. | Known (Cons (x, xs)) -> loop_gen memo op xs op xs | Unknown xs -> . false | s::ss -> .~(let ss' = unknown .. in loop_gen memo op ss' op ss')>. let ls_match p = .!ls_match_gen p end (*** *** Packaging delimited assumptions into combinators. ***) (* TODO *) (* The following is a brief explanation of the challenge in developing convenient combinators for using delimited assumptions. *) (* Let's try to streamline the technique explaiend above and provide combinators to make the code concise. We want to be able to write something very close the reference source, so e.g. match x with | [] -> foo | x::xs -> bar should be staged like match_pslist ls (fun () -> foo) (fun x xs -> bar) match_ps_list inspects ls and if the head constructor is known, deconstructs it statically. Otherwise, match_ps_list generates a dynamic match and executes the callbacks, wrapped in PSDataMut.assuming. A similar combinator can be defined for "if x = y". But there's a problem with this interface. Consider a simplified implementation: let match_ps_list for_nil for_cons = function | Unknown xs -> . .~((* A *) for_nil ()) | x::xs -> .~((* B *) for_cons (unknown ..) (unknown ..))>. | Known Nil -> (* C *) for_nil () | Known (Cons (x,xs)) -> (* D *) for_cons x xs for_nil is called in both branches A and C, but the callee usually wants to use a different implementation in those two branches. If we ask for four callbacks or supply an extra parameter, however, this combinator wouldn't be terribly concise: (* 4 callbacks *) match_ps_list' ls (fun () -> foo1 (* ls = Known Nil *)) (fun () -> foo2 (* ls = Unknown, in [] case of dynamic match *)) (fun x xs -> foo3 (* if ls = Known Cons *)) (fun x xs -> foo4 (* if ls = Unknown, in ::-case of dynamic match *)) (* extra parameter *) (* type wasknown = WasKnown | WasUnknown *) match_ps_list' ls (function | WasKnown -> foo1 | WasUnknown -> foo2) (fun x xs -> function | WasKnown -> foo3 | WasUnknown -> foo4) (* no combinator *) match ls with | Known Nil -> foo1 | Known (Cons (x,xs)) -> foo3 | Unknown ls -> . .~(assuming ls (known nil) (fun () -> foo2)) | x::xs -> .~(assuming ls (known (cons (unknown ..) (unknown ..))) (fun () -> foo4)) *) (*** *** Test code. ***) let enum start stop = let rec go acc n = if n < start then acc else go (n::acc) (n-1) in go [] stop let str_to_list s = List.map (String.get s) (enum 0 (String.length s - 1)) let test_cases = [("a","",false); ("a","abc",true); ("a","ba",true); ("a","aaa",true); ("a","aa",true); ("a","bbb",false); ("aa","",false); ("aa","abc",false); ("aa","ba",false); ("aa","aaa",true); ("aa","aa",true); ("aa","bbb",false); ("aa","bba",false); ("aab","",false); ("aab","abc",false); ("aab","ba",false); ("aab","aaa",false); ("aab","aaab",true); ("aab","aaba",true); ("aab","caba",false); ("aab","aa",false); ("aab","bbb",false); ("aab","bba",false); ("aaab","",false); ("aaab","abc",false); ("aaab","ba",false); ("aaab","aaa",false); ("aaab","aaab",true); ("aaab","aaaba",true); ("aaab","aaaab",true); ("aaab","aaba",false); ("aaab","caba",false); ("aaab","aa",false); ("aaab","bbb",false); ("aaab","bba",false); (* For testing the test code. *) (* ("this should fail","bba",true); ("this should fail","this should fail",false); *) ] (* Test a matcher. *) let test f = List.iter (fun (pat,text,expect) -> if f (str_to_list pat) (str_to_list text) <> expect then Printf.printf "FAILED: \"%s\" \"%s\"\n" pat text) test_cases (* Test a matcher that is specialized to a fixed pattern. The pattern must appear in the test_cases above. *) let test_spec f pat = List.iter (fun (pat',text,expect) -> if pat = pat' && f (str_to_list text) <> expect then Printf.printf "FAILED: \"%s\" \"%s\"\n" pat text) test_cases (* As a special exception to the Q Public Licence, you may develop application programs, reusable components and other software items that link with the original or modified versions of the Compiler and are not made available to the general public, without any of the additional requirements listed in clause 6c of the Q Public licence. ---------------------------------------------------------------------- THE Q PUBLIC LICENSE version 1.0 Copyright (C) 1999 Troll Tech AS, Norway. Everyone is permitted to copy and distribute this license document. The intent of this license is to establish freedom to share and change the software regulated by this license under the open source model. This license applies to any software containing a notice placed by the copyright holder saying that it may be distributed under the terms of the Q Public License version 1.0. Such software is herein referred to as the Software. This license covers modification and distribution of the Software, use of third-party application programs based on the Software, and development of free software which uses the Software. Granted Rights 1. You are granted the non-exclusive rights set forth in this license provided you agree to and comply with any and all conditions in this license. Whole or partial distribution of the Software, or software items that link with the Software, in any form signifies acceptance of this license. 2. You may copy and distribute the Software in unmodified form provided that the entire package, including - but not restricted to - copyright, trademark notices and disclaimers, as released by the initial developer of the Software, is distributed. 3. You may make modifications to the Software and distribute your modifications, in a form that is separate from the Software, such as patches. The following restrictions apply to modifications: a. Modifications must not alter or remove any copyright notices in the Software. b. When modifications to the Software are released under this license, a non-exclusive royalty-free right is granted to the initial developer of the Software to distribute your modification in future versions of the Software provided such versions remain available under these terms in addition to any other license(s) of the initial developer. 4. You may distribute machine-executable forms of the Software or machine-executable forms of modified versions of the Software, provided that you meet these restrictions: a. You must include this license document in the distribution. b. You must ensure that all recipients of the machine-executable forms are also able to receive the complete machine-readable source code to the distributed Software, including all modifications, without any charge beyond the costs of data transfer, and place prominent notices in the distribution explaining this. c. You must ensure that all modifications included in the machine-executable forms are available under the terms of this license. 5. You may use the original or modified versions of the Software to compile, link and run application programs legally developed by you or by others. 6. You may develop application programs, reusable components and other software items that link with the original or modified versions of the Software. These items, when distributed, are subject to the following requirements: a. You must ensure that all recipients of machine-executable forms of these items are also able to receive and use the complete machine-readable source code to the items without any charge beyond the costs of data transfer. b. You must explicitly license all recipients of your items to use and re-distribute original and modified versions of the items in both machine-executable and source code forms. The recipients must be able to do so without any charges whatsoever, and they must be able to re-distribute to anyone they choose. c. If the items are not available to the general public, and the initial developer of the Software requests a copy of the items, then you must supply one. Limitations of Liability In no event shall the initial developers or copyright holders be liable for any damages whatsoever, including - but not restricted to - lost revenue or profits or other direct, indirect, special, incidental or consequential damages, even if they have been advised of the possibility of such damages, except to the extent invariable law, if any, provides otherwise. No Warranty The Software and this license document are provided AS IS with NO WARRANTY OF ANY KIND, INCLUDING THE WARRANTY OF DESIGN, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Choice of Law This license is governed by the Laws of France. *)