diff --git a/episodes/09-testing-output-files.Rmd b/episodes/09-testing-output-files.Rmd
index 56b81327..9c32440b 100644
--- a/episodes/09-testing-output-files.Rmd
+++ b/episodes/09-testing-output-files.Rmd
@@ -1,223 +1,434 @@
---
-title: 'Regression Testing and Plots'
+title: 'Regression Tests'
teaching: 10
-exercises: 2
+exercises: 3
---

:::::::::::::::::::::::::::::::::::::: questions

-- How to test for changes in program outputs?
-- How to test for changes in plots?
+- How can we detect changes in program outputs?
+- How can snapshots make this easier?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

-- Learn how to test for changes in images & plots
+- Explain what regression tests are and when they’re useful
+- Write a manual regression test (save output and compare later)
+- Use Snaptol snapshots to simplify output/array regression testing
+- Use tolerances (rtol/atol) to handle numerical outputs safely

::::::::::::::::::::::::::::::::::::::::::::::::

-## Regression testing
+## Setup

-When you have a large processing pipeline or you are just starting out adding tests to an existing project, you might not have the
-time to carefully define exactly what each function should do, or your code may be so complex that it's hard to write unit tests for it all.
+To use the packages in this module, you will need to install them via,

-In these cases, you can use regression testing. This is where you just test that the output of a function matches the output of a previous version of the function.
+```bash
+pip install "git+https://github.com/PlasmaFAIR/snaptol"
+```

-The library `pytest-regtest` provides a simple way to do this. When writing a test, we pass the argument `regtest` to the test function and use `regtest.write()` to log the output of the function.
-This tells pytest-regtest to compare the output of the test to the output of the previous test run.
+## 1) Introduction

-To install `pytest-regtest`:
+In short, a regression test asks "this test used to produce X, does it still produce X?". This can help us detect
+unexpected or unwanted changes in the output of a program.

-```bash
-pip install pytest-regtest
-```
+It is good practice to add these types of tests to all projects. They are particularly useful,
+
+- when beginning to add tests to an existing project,
+
+- when adding unit tests to all parts of a project is not feasible,
+
+- to quickly achieve good test coverage,

-::::::::::::::::::::::: callout
+- when it does not matter whether the output is correct, only that it does not change unexpectedly.

-This `regtest` argument is actually a fixture that is provided by the `pytest-regtest` package. It captures
-the output of the test function and compares it to the output of the previous test run. If the output is
-different, the test will fail.
+These types of tests are not a substitute for unit tests, but rather are complementary.

-:::::::::::::::::::::::::::::::
-Let's make a regression test:
+## 2) Manual example

-- Create a new function in `statistics/stats.py` called `very_complex_processing()`:
+Let's make a regression test in a `test.py` file.
It is going to utilise a "very complex" processing function to +simulate the processing of data, ```python +# test.py def very_complex_processing(data: list): + return [x ** 2 - 10 * x + 42 for x in data] +``` + +Let's write the basic structure for a test with example input data, but for now we will simply print the output, + +```python +# test.py continued - # Do some very complex processing - processed_data = [x * 2 for x in data] +def test_something(): + input_data = [i for i in range(8)] - return processed_data + processed_data = very_complex_processing(input_data) + + print(processed_data) ``` -- Then in `test_stats.py`, we can add a regression test for this function using the `regtest` argument. +Let's run `pytest` with reduced verbosity `-q` and print the statement from the test `-s`, + +```console +$ pytest -qs test.py +[42, 33, 26, 21, 18, 17, 18, 21] +. +1 passed in 0.00s +``` + +We get a list of output numbers that simulate the result of a complex function in our project. Let's save this data at +the top of our `test.py` file so that we can `assert` that it is always equal to the output of the processing function, ```python -import pytest +# test.py -from stats import very_complex_processing +SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21] + +def very_complex_processing(data: list): + return [x ** 2 - 10 * x + 42 for x in data] -def test_very_complex_processing(regtest): +def test_something(): + input_data = [i for i in range(8)] - data = [1, 2, 3] - processed_data = very_complex_processing(data) + processed_data = very_complex_processing(input_data) - regtest.write(str(processed_data)) + assert SNAPSHOT_DATA == processed_data ``` -- Now because we haven't run the test yet, there is no reference output to compare against, -so we need to generate it using the `--regtest-generate` flag: +We call the saved version of the data a "snapshot". -```bash -pytest --regtest-generate +We can now be assured that any development of the code that erroneously alters the output of the function will cause the +test to fail. For example, suppose we slightly altered the `very_complex_processing` function, + +```python +def very_complex_processing(data: list): + return [3 * x ** 2 - 10 * x + 42 for x in data] +# ^^^^ small change ``` -This tells pytest to run the test but instead of comparing the result, it will save the result for use in future tests. +Then, running the test causes it to fail, +```console +$ pytest -q test.py +F +__________________________________ FAILURES _________________________________ +_______________________________ test_something ______________________________ -- Try running pytest and since we haven't changed how the function works, the test should pass. + def test_something(): + input_data = [i for i in range(8)] -- Then change the function to break the test and re-run pytest. The test will fail and show you the difference between the expected and actual output. + processed_data = very_complex_processing(input_data) + +> assert SNAPSHOT_DATA == processed_data +E assert [42, 33, 26, 21, 18, 17, ...] == [42, 35, 34, 39, 50, 67, ...] +E At index 1 diff: 33 != 35 + +test.py:12: AssertionError +1 failed in 0.03s +``` + +If the change was intentional, then we could print the output again and update `SNAPSHOT_DATA`. Otherwise, we would want +to investigate the cause of the change and fix it. 
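+
+One way to reduce the clutter of keeping `SNAPSHOT_DATA` at the top of the test file would be to store it in its own
+file and load it back inside the test. A minimal hand-rolled sketch is shown below; the file name and JSON layout are
+only illustrative, not part of the project used in this episode,
+
+```python
+# test.py (illustrative manual file-based snapshot)
+import json
+from pathlib import Path
+
+# Hypothetical snapshot location; any stable path would do.
+SNAPSHOT_FILE = Path("snapshot_very_complex_processing.json")
+
+def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something():
+    input_data = [i for i in range(8)]
+
+    processed_data = very_complex_processing(input_data)
+
+    # On the very first run the file does not exist yet, so we would have to
+    # create it by hand, e.g. SNAPSHOT_FILE.write_text(json.dumps(processed_data)).
+    snapshot = json.loads(SNAPSHOT_FILE.read_text())
+
+    assert snapshot == processed_data
+```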
-```bash -=== FAILURES === -___ test_very_complex_processing ___ +## 3) Snaptol -regression test output differences for statistics/test_stats.py::test_very_complex_processing: -(recorded output from statistics/_regtest_outputs/test_stats.test_very_complex_processing.out) +So far, performing a regression test manually has been a bit tedious. Storing the output data at the top of our test +file, -> --- current -> +++ expected -> @@ -1 +1 @@ -> -[3, 6, 9] -> +[2, 4, 6] +- adds clutter, + +- is laborious, + +- is prone to errors. + +We could move the data to a separate file, but once again we would have to handle its contents manually. + +There are tools out there that can handle this for us, one widely known is Syrupy. A new tool has also been developed +called Snaptol, that we will use here. + +Let's use the original `very_complex_processing` function, and introduce the `snaptolshot` fixture, + +```python +# test.py + +def very_complex_processing(data: list): + return [x ** 2 - 10 * x + 42 for x in data] + +def test_something(snaptolshot): + input_data = [i for i in range(8)] + + processed_data = very_complex_processing(input_data) + + assert snaptolshot == processed_data ``` -Here we can see that it has picked up on the difference between the expected and actual output, and displayed it for us to see. +Notice that we have replaced the `SNAPSHOT_DATA` variable with `snaptolshot`, which is an object provided by +Snaptol that can handle the snapshot file management, amongst other smart features, for us. -Regression tests, while not as powerful as unit tests, are a great way to quickly add tests to a project and ensure that changes to the code don't break existing functionality. -It is also a good idea to add regression tests to your main processing pipelines just in case your unit tests don't cover all the edge cases, this will -ensure that the output of your program remains consistent between versions. +When we run the test for the first time, we will be met with a `FileNotFoundError`, -## Testing plots +```console +$ pytest -q test.py +F +================================== FAILURES ================================= +_______________________________ test_something ______________________________ -When you are working with plots, you may want to test that the output is as expected. This can be done by comparing the output to a reference image or plot. -The `pytest-mpl` package provides a simple way to do this, automating the comparison of the output of a test function to a reference image. + def test_something(snaptolshot): + input_data = [i for i in range(8)] -To install `pytest-mpl`: + processed_data = very_complex_processing(input_data) -```bash -pip install pytest-mpl +> assert snaptolshot == processed_data + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +test.py:10: +_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ +.../snapshot.py:167: FileNotFoundError +========================== short test summary info ========================== +FAILED test.py::test_something - FileNotFoundError: Snapshot file not found. +1 failed in 0.03s ``` -- Create a new folder called `plotting` and add a file `plotting.py` with the following function: +This is because we have not yet created the snapshot file. Let's run `snaptol` in update mode so that it knows to create +the snapshot file for us. This is similar to the print, copy and paste step in the manual approach above, -```python -import matplotlib.pyplot as plt +```console +$ pytest -q test.py --snaptol-update +. 
+1 passed in 0.00s +``` -def plot_data(data: list): - fig, ax = plt.subplots() - ax.plot(data) - return fig +This tells us that the test performed successfully, and, because we were in update mode, an associated snapshot file was +created with the name format `..json` in a dedicated directory, + +```console +$ tree +. +├── __snapshots__ +│ └── test.test_something.json +└── test.py ``` -This function takes a list of points to plot, plots them and returns the figure produced. +The contents of the JSON file are the same as in the manual example, +```json +[ + 42, + 33, + 26, + 21, + 18, + 17, + 18, + 21 +] +``` + +As the data is saved in JSON format, almost any Python object can be used in a snapshot test – not just integers and +lists. + +Just as previously, if we alter the function then the test will fail. We can similarly update the snapshot file with +the new output with the `--snaptol-update` flag as above. + +::::::::::::::::::::::::::::::::::::: callout -In order to test that this funciton produces the correct plots, we will need to store the correct plots to compare against. -- Create a new folder called `test_plots` inside the `plotting` folder. This is where we will store the reference images. +**Note:** `--snaptol-update` will only update snapshot files for tests that failed in the previous run of `pytest`. This +is because the expected workflow is 1) run `pytest`, 2) observe a test failure, 3) if happy with the change then run +the update, `--snaptol-update`. This stops the unnecessary rewrite of snapshot files in tests that pass – which is +particularly important when we allow for tolerance as explained in the next section. -`pytest-mpl` adds the `@pytest.mark.mpl_image_compare` decorator that is used to compare the output of a test function to a reference image. -It takes a `baseline_dir` argument that specifies the directory where the reference images are stored. +::::::::::::::::::::::::::::::::::::::::::::: -- Create a new file called `test_plotting.py` in the `plotting` folder with the following content: + +### Floating point numbers + +Consider a simulation code that uses algorithms that depend on convergence – perhaps a complicated equation that does +not have an exact answer but can be approximated numerically within a given tolerance. This, along with the common use +of controlled randomised initial conditions, can lead to results that differ slightly between runs. + +In the example below, we approximate the value of pi using a random sample of points in a square. The exact +implementation of the algorithm is not important, but it relies on the use of randomised input and as a result the +determined value will vary slightly between runs. ```python -import pytest -from plotting import plot_data +# test_tol.py +import numpy as np -@pytest.mark.mpl_image_compare(baseline_dir="test_plots/") -def test_plot_data(): - data = [1, 3, 2] - fig = plot_data(data) - return fig -``` +def approximate_pi(random_points: np.ndarray): + return 4 * np.mean(np.sum(random_points ** 2, axis=1) <= 1) -Here we have told pytest that we want it to compare the output of the `test_plot_data` function to the images in the `test_plots` directory. 
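+
+# approximate_pi is a Monte Carlo estimate: points drawn uniformly in the
+# square [-1, 1] x [-1, 1] fall inside the unit circle with probability pi / 4,
+# so 4 times the observed fraction approximates pi.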
+def test_something(snaptolshot): + rng = np.random.default_rng() -- Run the following command to generate the reference image: -(make sure you are in the base directory in your project and not in the plotting folder) + random_points_in_square = rng.uniform(-1.0, 1.0, size=(10000000, 2)) -```bash -pytest --mpl-generate-path=plotting/test_plots + result = approximate_pi(random_points_in_square) + + print(result) + + assert snaptolshot(rtol=1e-03, atol=0.0) == result ``` -This tells pytest to run the test but instead of comparing the result, it will save the result into the `test_plots` directory for use in future tests. +Let's run the test initially like before but create the snapshot file straight away by running in update mode, -Now we have the reference image, we can run the test to ensure that the output of `plot_data` matches the reference image. -Pytest doesn't check the images by default, so we need to pass it the `--mpl` flag to tell it to check the images. +```console +$ pytest -qs test_tol.py --snaptol-update-all +3.1423884 +. +1 passed in 0.30s +``` -```bash -pytest --mpl +Even with ten million data points, the approximation of pi, 3.1423884, isn't great! + +::::::::::::::::::::::::::::::::::::: callout + +**Note:** remember that the result of a regression test is not the important part, but rather on how that result changes +in future runs. We want to focus on whether our code reproduces the result in future runs – in this case within a given +tolerance to account for the randomness. + +::::::::::::::::::::::::::::::::::::::::::::: + +In the test above, you may have noticed that we supplied `rtol` and `atol` arguments to the `snaptolshot` fixture. These +are used to control the tolerance of the comparison between the snapshot and the actual output. This means on future +runs of the test, the computed value will not be required to exactly match the snapshot, but rather within the given +tolerance. Remember, + +- `rtol` is the relative tolerance, useful for handling large numbers (e.g magnitude much greater than 1), +- `atol` is the absolute tolerance, useful for numbers "near zero" (e.g magnitude much less than 1). + +If we run the test again, we see the printed output is different to that saved to file, but the test still passes, + +```console +$ pytest -qs test_tol.py +3.1408724 +. +1 passed in 0.24s ``` -Since we just generated the reference image, the test should pass. -Now let's edit the `plot_data` function to plot a different set of points by adding a 4 to the data: +## Exercises + +::::::::::::::::::::::::::::::::::::: challenge + +## Create your own regression test + +- Add the below code to a new file and add your own code to the `...` sections. + +- On the first run, capture the output of your implemented `very_complex_processing` function and store it +appropriately. + +- After, ensure the test compares the stored data to the result, and passes successfully. Avoid using `float`s for now. ```python -import matplotlib.pyplot as plt +def very_complex_processing(data): + return ... + +def test_something(): + input_data = ... + + processed_data = very_complex_processing(input_data) -def plot_data(data: list): - fig, ax = plt.subplots() - # Add 4 to the data - data.append(4) - ax.plot(data) - return fig + assert ... ``` -- Now re-run the test. You should see that it fails. +:::::::::::::::::::::::: solution -```bash -=== FAILURES === -___ test_plot_data ___ -Error: Image files did not match. 
- RMS Value: 15.740441786649093 - Expected: - /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/baseline.png - Actual: - /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/result.png - Difference: - /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/result-failed-diff.png - Tolerance: - 2 +```python +SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21] + +def very_complex_processing(data: list): + return [x ** 2 - 10 * x + 42 for x in data] + +def test_something(): + input_data = [i for i in range(8)] + + processed_data = very_complex_processing(input_data) + + assert SNAPSHOT_DATA == processed_data ``` -Notice that the test shows you three image files. -(All of these files are stored in a temporary directory that pytest creates when running the test. -Depending on your system, you may be able to click on the paths to view the images. Try holding down CTRL or Command and clicking on the path.) +::::::::::::::::::::::::::::::::: +::::::::::::::::::::::::::::::::::::::::::::::: -- The first, "Expected" is the reference image that the test is comparing against. -- The second, "Actual" is the image that was produced by the test. -- And the third is a difference image that shows the differences between the two images. This is very useful as it enables us to cleraly see -what went wrong with the plotting, allowing us to fix the issue more easily. In this example, we can clearly see that the axes ticks are different, and -the line plot is a completely different shape. +::::::::::::::::::::::::::::::::::::: challenge -This doesn't just work with line plots, but with any type of plot that matplotlib can produce. +## Implement a regression test with Snaptol -Testing your plots can be very useful especially if your project allows users to define their own plots. +- Using the `approximate_pi` function above, implement a regression test using the `snaptolshot` object. +- On the first pass, ensure that it fails due to a `FileNotFoundError`. -::::::::::::::::::::::::::::::::::::: keypoints +- Run it in update mode to save the snapshot, and ensure it passes successfuly on future runs. -- Regression testing ensures that the output of a function remains consistent between changes and are a great first step in adding tests to an existing project. -- `pytest-regtest` provides a simple way to do regression testing. -- `pytest-mpl` provides a simple way to test plots by comparing the output of a test function to a reference image. +:::::::::::::::::::::::: solution -:::::::::::::::::::::::::::::::::::::::::::::::: +```python +import numpy as np + +def approximate_pi(random_points: np.ndarray): + return 4 * np.mean(np.sum(random_points ** 2, axis=1) <= 1) + +def test_something(snaptolshot): + rng = np.random.default_rng() + + random_points_in_square = rng.uniform(-1.0, 1.0, size=(10000000, 2)) + + result = approximate_pi(random_points_in_square) + + assert snaptolshot(rtol=1e-03, atol=0.0) == result +``` + +::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: challenge + +## More complex regression tests + +- Create two separate tests that both utilise the `approximate_pi` function as a fixture. + +- Using different tolerances for each test, assert that the first passes successfully, and assert that the second raises +an `AssertionError`. 
Hints: 1) remember to look back at the "Testing for Exceptions" and "Fixtures" modules, 2) the +error in the pi calculation algorithm is $\frac{1}{\sqrt{N}}$ where $N$ is the number of points used. + +:::::::::::::::::::::::: solution + +```python +import numpy as np +import pytest + +@pytest.fixture +def approximate_pi(): + rng = np.random.default_rng() + + random_points = rng.uniform(-1.0, 1.0, size=(10000000, 2)) + + return 4 * np.mean(np.sum(random_points ** 2, axis=1) <= 1) + +def test_pi_passes(snaptolshot, approximate_pi): + # Passes due to loose tolerance. + assert snaptolshot(rtol=1e-03, atol=0.0) == approximate_pi + +def test_pi_fails(snaptolshot, approximate_pi): + # Fails due to tight tolerance. + with pytest.raises(AssertionError): + assert snaptolshot(rtol=1e-04, atol=0.0) == approximate_pi +``` + +::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::::::::: + + +::::::::::::::::::::::::::::::::::::: keypoints + +- Regression testing ensures that the output of a function remains consistent between test runs. +- The `pytest` plugin, `snaptol`, can be used to simplify this process and cater for floating point numbers that may +need tolerances on assertion checks. +::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/learners/setup.md b/learners/setup.md index bf0b22e4..2af91ed7 100644 --- a/learners/setup.md +++ b/learners/setup.md @@ -36,7 +36,7 @@ conda activate myenv There are some python packages that will be needed in this course, you can install them using the following command: ```bash -pip install numpy pandas matplotlib pytest pytest-regtest pytest-mpl +pip install numpy pandas matplotlib pytest pytest-regtest pytest-mpl snaptol ``` ### Git