Designing a library for printing recursive data structures

I have more than once found myself in need of a function for pretty-printing some recursive data type; Be it a prefix search tree, an abstract syntax tree for a domain specific language, XML or something else. Getting tired of having to implement the same type of logic over and again I decided to generalize the pattern. In the following sections I discuss the design of a tiny library for addressing this problem. It’s not a particularly challenging task but provides a good opportunity to touch on a few different concepts in functional programming. Examples include deep and shallow embeddings, monoids and equational reasoning. The implementation is given in F#.

The Problem

To give a motivating example, consider a (simplistic) type for representing XML data:

/// Attribute
type Attribute = string * string

/// XML is either a text-node or an element with attributes and children.
type XML =
    | Text of string
    | Node of string * list<Attribute> * list<XML>

An Attribute is a key-value pair and an XML node is either some text or an element with attributes and a list of children.

In order to render xml as a set of nested blocks with proper indentation levels, we wish to design a general purpose printing library.

Deriving the API

As a first task we need to decide on a name for the thing we’re trying to abstract; I call it Printer. At the bare minimum we also need a function for evaluating a printer by turning it into a string:

/// Executes a printer, producing a string value.
val run : Printer -> string

Let’s think about how to produce printers. We should naturally provide a function for lifting simple string values:

/// Produces a printer that prints a string.
val print : string -> Printer

The interesting part is the ability to achieve nesting or indentation. The most intuitive way to express this is via a function that operates on a printer by indenting it one level:

// Indents a printer one level.
val indent : Printer -> Printer

Finally we need some means for composing printers; One way to achieve compositionality is to require that our printer type forms a monoid by introducing an empty printer along with a binary operator for combining two printers:

/// Doesn't print anything.
val empty : Printer

/// Combine two printers.
val add : Printer -> Printer -> Printer

You may wonder what good empty brings; One benefit is getting sequencing for free, that is the ability to combine a list of printers:

/// Sequences a list of printers.
let sequence : seq<Printer> -> Printer = Seq.fold add empty

With the above interface, printing nested documents is straight forward. Here is an example for manually outputting some HTML:

let htmlPrinter =
    sequence [
        print "<html>"
        indent <|
            sequence [
                print "<body>"
                indent <|
                    print "Printed"
                print "</body>"
            ]
        print "</html>"
    ]

With the expectation that running this printer:

printfn "%s" (run htmlPrinter)

produces output similar to:

<html>
  <body>
    Printed
  </body>
</html>

Here is the complete set of combinators defining the API:

type Printer
/// Evaluates a printer.
val run : Printer -> string
/// Create a printer that prints the given string.
val print : string -> Printer
/// Indent a printer one level.
val indent : Printer -> Printer
/// The empty printer doesn't output anything.
val empty : Printer
/// Add two printers in a sequence.
val add : Printer -> Printer -> Printer

To ensure that the semantics is intuitive, there are a number of constraints that need to be satisfied by any particular implementation. I’ve identified the following ones:

print >> run = id
forall p: add p empty = p
forall p: add empty p = p
forall p1,p2,p3: add p1 (add p2 p3) = add (add p1 p2) p3
forall p1,p2,p3: indent (add p1 p2) = add (indent p1) (indent p2)

(1) states that run is the inverse of print, i.e. printing a string and then running it gives back the same string. (2) and (3) means that empty must be left and right identity for add which is required for Printer to form a monoid. (4) is also part of the monoid constraints and implies that add is associative. (5) states that indent is distributive over add; This is needed for safely being able to refactor expressions. Guaranteeing (2), (3) and (4) is necessary in order to provide intuitive semantics for sequence, for instance by ensuring that the following two printers are identical:

let pc1 = sequence [ p1; p2; p3 ]
let pc2 = sequence [p1 ; sequence [p2; p3]]

This is exactly why the monoid pattern is useful.

A shallow embedding

To complete the library we now need to find a definition of the type Printer that allows for a feasible implementations of the required functions.

Thinking of the API as a small Embedded Domain Specific Language (EDSL), there are broadly speaking two implementation strategies available - Deep and shallow embeddings. Deep embeddings preserve the expression structure of the operations; This generally enables more optimization capabilities and also makes it possible to provide multiple interpreters. In a shallow embedding no intermediate data structure is used for building up expression trees, instead the semantics of an operation is part of its definition.

Let’s start by the simplest possible solution, not worrying about whether to support multiple interpreters or not. Applying the principal of Denotational Design, we need to precisely define what it means to be a Printer. A printer is something that has the ability to output a nested structure. Parameterizing over the choice of how to print a line given an indentation level, this can be represented by the following function:

    type Printer = (int -> string -> string) -> string

In other words a printer is a black-box that when applied to a function from an indentation level and a string to string, returns a pretty-printed structure. All the information of what to output is contained within the closure of the function. The fact that it’s opaque, i.e. that it is not possible to peek inside a printer to find out how it was constructed, places it in the category of shallow embeddings.

Let’s see if the type is sufficient for implementing the interface. Starting with run:

let run (p: Printer) =
    p <| fun n s ->
        let space = String.Join("", List.replicate (n * 2) " ")
        sprintf "%s%s\n" space s

Run simply invokes the printer with a function that indents each line with two spaces per indentation level.

Lifting a string into a printer is also straight forward:

let print s = fun ind -> ind 0 s

The function turns a string into a printer that invokes the indentation argument with level 0.

To implement indent we need to transform a printer into a new one that when executed invokes its given indent function with a greater indentation level:

let indent p = fun ind -> p (fun n -> ind (n + 1))

When it comes to empty we’re left with little choice but to output an empty string:

let empty = fun _ -> ""

Last one is add:

let add p1 p2 = fun ind -> p1 ind + p2 ind

Which simply runs both printers and concatenates their output.

We further need to ensure that the implementation is compatible with the semantical constraints.

First, (print >> run) must be equivalent with the identity function:

(print >> run) s                                                    =
// Definition of function composition:
run (print s)                                                       =
// Definition of print:
run (fun ind -> ind 0 s)                                            =
// Definition of run:
(fun ind -> ind 0 s) (fun n s -> sprintf "%s%s" (space n) s)        =
// Apply the arguments (beta reduction):
sprintf "%s%s" (space 0) s)                                         =
// Definition of space and sprintf:
s

What about the monoid constraints for add and empty? Here is a proof for left identity (2):

add p empty =
// Definition of add:
fun ind -> p ind + empty ind                                        =
// Definition of empty:
fun ind -> p ind + ((fun _ -> "") ind)                              =
// Beta reduction
fun ind -> p ind + ""                                               =
// Empty string is identity of string concatenation:
fun ind -> p ind                                                    =
p

In fact, these properties follow from the monoid properties of string. The proof in the other direction is symmetric. We also need to show that add is associative:

add p1 (add p2 p3)                                                  =
// Definition of add on the outer argument:
fun ind -> p1 ind + ((add p2 p3) ind)                               =
// Definition of add on inner argument:
fun ind -> p1 ind + ((fun ind -> p2 ind + p3 ind)   ind)            =
// Beta reduction:
fun ind -> p1 ind + p2 ind + p3 ind                                 =
// Associativity of string concatenation:
fun ind -> (p1 ind + p2 ind) + p3 ind                               =
/// Definition of add:
add (add p1 p2) p3

Again, the proof relies on the associativity of string concatenation.

Finally we need to show that indent is distributive according to (5). This follows directly from the definitions of the two functions:

indent (add p1 p2)                                                  =
// Definition of indent:
fun ind -> (add p1 p2) (fun n -> ind (n + 1))                       =
// Definition of add
fun ind -> (fun ind -> p1 ind + p2 ind) (fun n -> ind (n + 1))      =
// Beta reduction:
fun ind -> (p1 (fun n -> ind (n + 1)) + p2 (fun n -> ind (n + 1)))  =
// Definition of indent:
add (indent p1) (indent p2)

To wrap it up, below is the complete listing of the implementation. I made the definition of Printer private, added a function for running printers with a custom indentation parameter and included operator aliases for sequence print, and add:

open System

/// A printer is a function from an indentation level to a list of strings.
type Printer = private { Run : (int -> string -> string) -> string}

/// Creates a printer.
let private mkPrinter f = {Run = f}

/// Creates a string of whitespace of the given length.
let private space n = String.Join("", List.replicate (n * 2) " ")

/// Create a printer from  a string.
let print s = mkPrinter <| fun ind -> ind 0 s

/// Indents a printer.
let indent p = mkPrinter <| fun ind -> p.Run ((+) 1 >> ind)

/// Runs a printer returning a string.
let runWith ind p = p.Run ind

/// Runs a printer returning a string.
let run = runWith <| fun n -> sprintf "%s%s\n" (space n)

/// An empty printer.
let empty = mkPrinter <| fun _ -> ""

/// Adds two printers.
let add tp1 tp2 = mkPrinter <| fun ind ->
     tp1.Run ind + tp2.Run ind

/// Concatenates a sequence of printers.
let sequence = Seq.fold add empty

/// Short for sequence.
let (!<) = sequence

/// Short for add.
let (<+>) = add

/// Short for print.
let (!) = print

Returning to the motivational example of printing XML, here is a complete implementation of a show function for the XML type:

/// Attribute
type Attribute = string * string

/// XML is either a text-node or an element with attributes and children.
type XML =
    | Text of string
    | Node of string * list<Attribute> * list<XML>

/// Pretty-prints an xml value.
let show =
    let showAttrs attrs =
        let showAttr (n,v) = sprintf " %s=%s" n v
        String.Join("", List.map showAttr attrs)

    let rec show = function
        | Text t                ->
            !t
        | Node (name,atrs,chs)  ->
            !<[
                !(sprintf "<%s%s>" name (showAttrs atrs))
                indent !<(List.map show chs)
                !(sprintf "</%s>" name)
            ]
    show >> run

Hopefully the example is straight forward to follow. In case you don’t like the prefix operators, you could change the definition to use sequence instead of !<, print instead of !.

A deep embedding

A deep embedding must preserve the structure of how a printer is assembled. This is required whenever you need to support multiple back-ends or different ways of interpreting expressions. Creating a data type for a deep embedding is straight forward, we basically just need to list the distinct language constructs. The following type will do:

// Deep embedding of printer type.
type Printer =
    | Empty
    | Print of string
    | Indent of Printer
    | Add of Printer * Printer

In this way, all operations are trivial. Here are the functions mirroring the constructors:

// Empty printer.
let empty = Empty

// Print a string.
let print = Print

// Indent a printer.
let indent p = Indent p

// Composing two printers.
let add p1 p2 =
    match p1,p2 with
    | Empty, p
    | p, Empty  -> p
    | _         -> Add(p1,p2)

The only interesting part is add which contains an optimization step for implementing left and right identity for empty in accordance with the specified semantics. All the work of evaluating a printer is pushed to the interpretors, in our case a function for constructing a string. Here are the equivalent runWith and run functions:

// Executes a printer.
let runWith ind p =
    let sb = new Text.StringBuilder()
    let rec go n = function
        | Empty         -> ()
        | Print l       -> ignore <| sb.AppendLine (ind n l)
        | Indent p      -> go (n+1) p
        | Add (p1,p2)   -> go n p1 ; go n p2
    go 0 p
    sb.ToString()

/// Runs a printer returning a string.
let run = runWith <| fun n -> sprintf "%s%s\n" (space n)

Each language construct is handled separately with Indent and Add traversing their arguments recursively. A Text.StringBuilder object is used to accumulate the output of printed lines in order to improve on efficiency.

What about the semantics, how do we prove that the definition is compatible with the constraints listed above? What we really need to check is the validity of expressions with respect to a particular interpreter (in this case run). For instance looking at constraint (4) concerning associativity of add: `(add

p1 (add p2 p3) = add (add p1 p2) p3)`, we’re not interested in whether these expressions are identical or not; Only that they produce the same output for a given interpreter. However, whenever we are able to show that two expression are in fact identical it naturally follows that all possible interpretations are identical.

Showing that constraints (1) and (4) holds is similar to the example above. For left and right identity (2,3) it’s possible to leverage the definition of add canceling out Empty values, but only in case the type constructors are hidden in order to rule out the construction of values such as (Add (Print "Hello"), Empty). This fact introduces a subtle problem; On the one hand exposing the printer type is necessary for allowing different interpretors to be defined. On the other hand, providing access to the constructors removes the control over how values are constructed. One solution would be to expose the core definitions in a separate module.

Using equational reasoning is slightly more complicated given the imperative style of the runWith function. The complete proofs are left as an exercise.

Another approach is to provide a mapping from the deep to the shallow embedding. Assuming a module ShallowPrinter containing the shallow implementation from above, here is a function for performing the translation along with runWith function:

module SP = ShallowPrinter

/// Transforms a deep printer to a shallow one.
let rec toShallow = function
    | Empty         -> SP.empty
    | Print p       -> SP.print p
    | Indent p      -> SP.indent (toShallow p)
    | Add (p1,p2)   -> SP.add (toShallow p1) (toShallow p2)

/// Reusing the interpretor from the shallow embedding.
let runWith ind = toShallow >> SP.runWith ind

/// Runs a printer returning a string.
let run =  runWith <| fun n -> sprintf "%s%s\n" (space n)

Now, all proofs concerning the shallow implementation can be safely reused in order to show that the constraints are fulfilled for this definition of run.

Here is the a complete listing of a stand-alone deep embedding:

open System

// Deep embedding of printer type.
type Printer =
    | Empty
    | Print of string
    | Indent of Printer
    | Add of Printer * Printer

/// Creates a string of whitespace of the given length.
let private space n = String.Join("", List.replicate (n * 2) " ")

// Empty printer.
let empty = Empty

// Print a string.
let print = Print

// Indent a printer.
let indent p = Indent p

// Composing two printers.
let add p1 p2 =
    match p1,p2 with
    | Empty, p
    | p, Empty  -> p
    | _         -> Add(p1,p2)

// Execute a printer.
let runWith ind p =
    let sb = new System.Text.StringBuilder()
    let rec go n = function
        | Empty         ->
            ()
        | Print l        ->
            ignore <| sb.AppendLine (ind n l)
        | Indent p      ->
            go (n+1) p
        | Add (p1,p2)   ->
            go n p1
            go n p2
    go 0 p
    sb.ToString()

/// Runs a printer returning a string.
let run =  runWith <| fun n -> sprintf "%s%s\n" (space n)

/// Concatenates a sequence of printers.
let sequence ps = Seq.fold add empty ps

/// Short for sequence.
let (!<) = sequence

/// Short for add.
let (<+>) = add

/// Short for print.
let (!) = print

At last, to illustrate that it is in fact possible to define alternative interpretors of printer expressions, consider the following example that given a printer, generates F# code for printing the expression itself:

let showFSharp : Printer -> string =
    let rec show = function
        | Empty         ->
            !"Empty"
        | Print s       ->
            !(sprintf "Print \"%s\"" s)
        | Indent p      ->
            !<[
                !"Indent ("
                indent (show p)
                !")"
            ]
        | Add (p1,p2)   ->
            !<[
                !"Add ("
                indent (show p1)
                indent !","
                indent (show p2)
                !")"
            ]
    show >> run

Using this function on the initial htmlPrinter example, we actually retrieve and equivalent F# expression for recreating the printer:

// Define a custom printer.
let htmlPrinter =
    sequence [
        print "<html>"
        indent <|
            sequence [
                print "<body>"
                indent <|
                    print "Printed"
                print "</body>"
            ]
        print "</html>"
    ]

printfn "%s" (showFSharp htmlPrinter)

Yielding the output:

Add (
  Add (
    Print "<html>"
    ,
    Indent (
      Add (
        Add (
          Print "<body>"
          ,
          Indent (
            Print "Printed"
          )
        )
        ,
        Print "</body>"
      )
    )
  )
  ,
  Print "</html>"
)

Which indeed valid F#. This would not have been possible using the shallow embedding.

Summary

In this post I’ve addressed the problem of designing a library for pretty-printing recursive data structures. The approach taken is general and starts by identifying a minimal set of required operations needed and then for each operation define a set of constraints (or laws) that any implementation must obey. In order support compositionally of printers the monoid pattern was used.

Finally, two different realizations of the library was given, one shallow and one deep embedding. By using equational reasoning we were able to show that the semantical constraints were satisfied.