Getting started
In the following sections we'll look at how to get started using Popper for writing and running tests.
Popper consists of two parts:
-
A library
popper
— containing functions for building and executing tests. -
A library
ppx_deriving_popper
— for deriving functions to sample and compare test data for custom data-types.
Installation
Installing ppx_deriving_popper
via OPAM brings in
both the ppx and the testing library.
> opam install ppx_deriving_popper
Similarly, if you're using the ppx, adding a preprocessor dependency on
ppx_deriving_popper
includes the runtime library popper
as well.
Currently, using the [@@deriving ... popper]
attribute also requires
dependencies for deriving show
and ord
for your data types. Below is an
example dune file for defining a test. It
assumes a file run.ml
:
(test
(name run)
(preprocess
(pps ppx_deriving.show ppx_deriving.ord ppx_deriving_popper)))
Unit tests
As a first example, here is a simple unit-test verifying that the function
List.rev
behaves as expected when applied with the list [1; 2; 3]
:
open Popper
let () =
check @@ fun () ->
equal Comparator.(list int) (List.rev [ 1; 2; 3 ]) [ 3; 2; 1 ]
When run with dune runtest
, it produces the following output:
The function check
is for running a single (anonymous) test. Its signature
is:
val check : ?config:Config.t -> (unit -> Proposition.t Sample.t) -> unit
It takes a function that produces a value of type Proposition.t Sample.t
and
executes it.
We'll discuss the Sample.t
part in the next sections. The Proposition.t
value is the result of the test, and evaluates to one of the following values:
pass
— the proposition istrue
, the test passes.fail
— the proposition isfalse
, the test fails.discarded
— the proposition is neithertrue
norfalse
, the test is discarded.
The function equal
constructs a proposition stating that two values are equal
according to the given Comparator.t
value:
val equal : ?loc:string -> 'a Comparator.t -> 'a -> 'a -> Proposition.t Sample.t
A value of type a Comparator.t
wraps a compare function and a
pretty-printer for the type a
. For convenience, the function returns in
the Sample.t
context.
In this case we construct an int list Comparator.t
value using the list
combinator from the Comparator
module.
Let's see what happens when a test fails. Modifying the example above slightly:
let () =
check @@ fun () ->
equal Comparator.(list int) (List.rev [ 1; 2; 3 ]) [ 2; 3; 1 ]
This yields:
The comparator embeds a pretty-printer for being able to explain why a particular proposition failed for the given arguments.
Property-based tests
A property-based test is one that, instead of verifying a concrete test-case, confirms that a certain property holds for a large number of sampled data points. The technique was popularized by the Haskell library QuickCheck. Using property-based testing effectively is a topic of its own and is independent of the particular choice of library or language. Here is a good article with examples using F#.
The school-book example for a property-based test is one that verifies that reversing a list twice gives the same list back. Using Popper, it can be expressed as follows:
open Popper
open Sample.Syntax
let () =
check @@ fun () ->
let* xs = Sample.(list int) in
equal Comparator.(list int) (List.rev (List.rev xs)) xs
In the test, xs
is a sample of type list of integers. The test returns a
proposition that asserts that after reversing xs
twice we end up with the same list.
The let*
syntax defined in the Sample.Syntax
module provides a convenient
method for expressing this in a declarative way.
When run, it produces the following output:
The function check
evaluates the property multiple times (300 by default), by
drawing different samples and verifying the resulting proposition for each of
them.
Note
In Popper, there is no fundamental difference between unit tests and property-based tests. They are constructed and run using the same functions.
To see what happens when a test fails, here is an example to confirm that
List.sort
returns a sorted list, but where the condition does not account for
lists containing duplicated elements.
let rec is_sorted = function
| x :: y :: ys -> x < y && is_sorted (y :: ys)
| _ -> true
let () =
check @@ fun () ->
let* xs = Sample.(list int) in
is_true (is_sorted @@ List.sort Int.compare xs)
The function is_true
is used to construct a proposition stating that the given
boolean condition is true
.
This test fails with:
The output is not very informative as all it says is that the proposition failed
because the boolean condition was false
. In order to actually observe the
drawn sample for which the condition is_sorted
did not hold, we may add add
some logging. The simplest way is to use the function Sample.with_log
:
val with_log
: string
-> (Stdlib.Format.formatter -> 'a -> unit)
-> 'a Sample.t
-> 'a Sample.t
It takes a string value for the key, a formatter, and a sampler. It returns
a new Sample.t
value that also records the sampled values.
To quickly obtain a formatter (aka pretty-printer) for lists of integers, you
may derive it using [@@deriving show]
. We'll soon look at how to also derive
samplers and comparators as well. For now:
type int_list = int list [@@deriving show]
let () =
check @@ fun () ->
let* xs = Sample.(with_log "xs" pp_int_list (list int)) in
is_true (is_sorted @@ List.sort Int.compare xs)
The result, when run:
The log section now displays the sampled value xs
for which the test failed.
A property-based test such as the one above is run multiple times, a simple
unit-test is only run once. How does check
know how to distinguish between the
two? The answer is that a simple unit test does not consume any input when
generating the resulting proposition. Any property-based test must consume some
data for drawing (pseudo-random) samples. A value of type a Sample.t
is really
a parser that reads input from a stream of numbers (int32
values) and
produces a result of type a
. By varying the input stream we feed to the
parser, we obtain different results.
Note
Whenever a failing sample is encountered, the check/run
function attempts to find
a smaller counter-example by shrinking the input stream that was used for producing
the sample.
Test suites
The function check
is convenient for quickly running individual tests, but
when working with multiple tests it's likely better to bundle and run them
together.
There are three essential functions for this purpose:
test
for creating a test.suite
for naming and grouping tests.run
for running a test (or a test suite).
Their signatures are:
val test : ?config:Config.t -> (unit -> Proposition.t Sample.t) -> Test.t
val suite : (string * Test.t) list -> Test.t
val run : ?config:Config.t -> Test.t -> unit
Here's how you can combine the tests from above in a suite, and run it:
let test_rev =
test @@ fun () ->
equal Comparator.(list int) (List.rev [ 1; 2; 3 ]) [ 3; 2; 1 ]
let test_rev_twice =
test @@ fun () ->
let* xs = Sample.(list int) in
equal Comparator.(list int) (List.rev (List.rev xs)) xs
let test_sort =
test @@ fun () ->
let* xs = Sample.(with_log "xs" pp_int_list (list int)) in
is_true (is_sorted @@ List.sort Int.compare xs)
let tests =
suite
[ ("Reverse", test_rev)
; ("Reverse twice", test_rev_twice)
; ("Sort", test_sort)
]
let () = run tests
Hopefully the definitions are straight-forward. There are three tests which are
defined using the function test
. They are packed together by the suite
combinator and executed with run
.
This results in the output:
Note that the value tests
has the same type as test_rev
, test_rev_twice
,
and test_sort
, namely Test.t
. That means that test suites may also be
nested. For instance, as in:
let rev_suite =
suite [ ("Simple list", test_rev); ("Reverse twice", test_rev_twice) ]
let tests = suite [ ("Reverse", rev_suite); ("Sort", test_sort) ]
let () = run tests
Which yields:
The nesting is reflected in the output, such as Reverse -> Simple list
.
Note
There is only one type of tests — Test.t
values — and tests may be nested
arbitrarily using the suite
combinator.
Deriving samples
For property-based tests, we sample data of some type and use the values for verifying certain properties. There are two options for sampling the input data:
- Sample the data and define comparators manually (as done in the examples above).
- Derive sample and comparator functions using the
ppx_deriving_popper
ppx.
For example, consider a simple data type, exp
, for representing boolean
expressions:
type exp =
| Lit of bool
| And of exp * exp
| Or of exp * exp
| Not of exp
Along with an evaluation function, eval
, with a signature:
val eval : exp -> bool
What if we would like to write a test that verifies that Lit false
is the
identity expression for Or
? That is, for any expression e
, the following
proposition should evaluate to true/pass:
eval e = eval (Or (Lit false, e))
How can we sample exp
values? We'll cover how to do this manually in a bit
but the easiest way forward is to derive it. All that is needed is adding an
attribute [@@deriving .. popper]
. It also requires deriving ord
and show
for the comparator part.
type exp =
| Lit of bool
| And of exp * exp
| Or of exp * exp
| Not of exp
[@@deriving show, ord, popper]
At the preprocessing stage, the popper ppx generates two functions:
val exp_comparator : exp Comparator.t
val exp_sample : exp Sample.t
Alternatively, if you only care about deriving the sample function you may skip
show
and ord
and use sample
instead of popper
, i.e.:
type exp = ... [@@deriving sample]
Before writing the property-based test, here's (a buggy) implementation of eval
:
let rec eval = function
| Lit b -> b
| And (e1, e2) -> eval e1 && eval e2
| Or (e1, e2) -> eval e1 && eval e2
| Not b -> not @@ eval b
Using the exp_sample
function, the test is straight forward:
let test_or =
test @@ fun () ->
let* e = exp_sample in
is_true (eval e = eval (Or (Lit false, e)))
let () = run test_or
The test now fails after 5 successful samples:
Again, in order to actually see the expression that was generated in order for the proposition to fail, we need to add some logging:
let test_or =
test @@ fun () ->
let* e = Sample.with_log "e" pp_exp exp_sample in
is_true (eval e = eval (Or(Lit false, e)))
let () = run test_or
The with_log
requires providing a pretty-printer. We use pp_exp
which is
derived by show
.
When running the test, you now also see the logged values in the output:
Note
Deriving popper for a type called t
, generates functions called
sample
and comparator
, rather than t_sample
and t_comparator
.
This is to mirror the deriving behavior for show
and ord
.
Combining propositions
Just like tests can be composed into test suites, propositions can also be bundled. There are two functions:
val all : Proposition.t Sample.t list -> Proposition.t Sample.t
val any : Proposition.t Sample.t list -> Proposition.t Sample.t
They both take a list of propositions (for convenience Proposition.t
Sample.t
values) and collapse them into a single proposition. As the names
suggest, all
requires all propositions to pass, and any
requires at least
one to do so, in order to return a successful result.
Here's an example:
let test_id =
test @@ fun () ->
let* e = exp_sample in
all
[ is_true (eval e = eval (Or (Lit false, e)))
; is_true (eval e = eval (And (Lit true, e)))
]
You may also use the pure combinators from the Proposition
module,
which do not return in the Sample
context.
Configurations
As you may have noticed, the functions test
, check
, and run
all take an
optional configuration argument called config
. This allows overriding some
settings. Either at the level of an individual test or for all tests when
passed to run
. In case there are overlapping settings, the ones defined at
the lowest level, i.e. the ones passed to an individual test
take precedence.
Here are some examples of what may be configured:
num_samples
— for specifying the number of samples to be drawn for property-based tests. The default value is 300.verbose
— whether or not the logged values should be displayed for each sample.max_size
— this is a parameter that controls the maximum size parameter for data-structures like lists. It defaults to 100.
For the complete list, see the module Popper.Config.
First, let's see what happens when we run a test configured with verbose
.
Here's a modified version of the test_or
function from above:
let test_or =
test ~config:Config.verbose @@ fun () ->
let* e = Sample.with_log "e" pp_exp exp_sample in
is_true (eval e = eval (Or(Lit false, e)))
let () = run test_or
You now see what all the drawn expression samples look like:
Also changing the number of samples to be tested can be done by combining Config.t
values
with the Config.all
combinator. For example:
let test_not =
let config = Config.(all [verbose; num_samples 1000]) in
test ~config @@ fun () ->
let* e = Sample.with_log "e" pp_exp exp_sample in
is_true (not (eval e) = eval (Not e))
let () = run test_not
The output confirms that the number of evaluated samples is now 1000:
Note
If property-based tests are slow to execute, you may decrease the number of
samples using Config.num_samples
or decrease the max-size parameter with
Config.max_size
. This, of course, limits the test coverage.
Custom samples
It is not always feasible to derive generic sample functions for the data-types you need for testing. Specifically when:
- The data-type is abstract.
- There are invariants that the samples must satisfy.
In those cases you need to write your own sample function.
To give an example, consider testing an module, Image
, with the following
signature:
type color = White | Black | Transparent
type t
val make : width:int -> height:int -> (x:int -> y:int -> color) -> t
val render : t -> string
val invert : t -> t
Here's an implementation:
module Image = struct
type color = White | Black | Transparent
type t =
{ width : int
; height : int
; get_pixel : x:int -> y:int -> color
}
let make ~width ~height get_pixel = { width; height; get_pixel }
let render { width; height; get_pixel } =
List.init height (fun y ->
List.init width (fun x ->
match get_pixel ~x ~y with
| White -> "w "
| Black -> "b "
| Transparent -> "- ")
|> String.concat "")
|> String.concat "\n"
let invert { width; height; get_pixel } =
let get_pixel ~x ~y =
match get_pixel ~x ~y with
| White -> Black
| Black -> White
| Transparent -> White
in
{ width; height; get_pixel }
end
Now say you would like to write a test for a property that states that inverting
the image twice results in the same image when rendered. However, it is not
possible to derive a sample function for the abstract type Image.t
. Further,
even if the type had been exposed, we need to constrain the width
and the
height
parameters. Luckily, writing the sample function yourself is straight
forward — it's just a matter of generating the different parameters. Here is a
version where the width and height varies between 0
and 14
:
let sample_img =
let* width = Sample.Int.range 0 15 in
let* height = Sample.Int.range 0 15 in
let* lookup =
Sample.fn
(Sample.one_value_of [ Image.Black; Image.White; Image.Transparent ])
in
let get_pixel ~x ~y = lookup (x, y) in
Sample.return (Image.make ~width ~height get_pixel)
And here is how to use the function for defining a test:
let test_invert_twice =
test @@ fun () ->
let* img = sample_img in
equal
Comparator.string
(Image.render img)
(Image.render (Image.invert (Image.invert img)))
let suite = suite [ ("Invert twice", test_invert_twice) ]
let () = run suite
Running the test reveals a bug:
The test failed after the first sample, and 13 successful shrink-steps were performed.
Note
An important feature of Popper is that shrinking is embedded. It means that
when a failing sample is encountered and an attempt is made to shrink it,
the invariants used for constructing the sample are always respected! Thus,
in the example above — with the Image.t sample
— shrinking would never
cause, say, the height to be negative.
You can read more about the mechanisms behind sampling and shrinking in the how it works section.