From 12dfef88b04f73cd0d00eb313b8a69c421d9d9c9 Mon Sep 17 00:00:00 2001
From: Ava Dean
Date: Thu, 12 Feb 2026 19:22:56 +0000
Subject: [PATCH] Overhaul of the regression testing module. Now introduces
 regression testing via an example of a manual test to illustrate the idea and
 the drawbacks of doing it manually. Then introduces Snaptol to automate the
 file management process, as well as introducing support for tolerances on
 comparisons involving floating-point numbers. Introduces 3 exercises of
 increasing difficulty, along with solutions.

---
 episodes/09-testing-output-files.Rmd | 467 +++++++++++++++++++--------
 learners/setup.md                    |   2 +-
 2 files changed, 340 insertions(+), 129 deletions(-)

diff --git a/episodes/09-testing-output-files.Rmd b/episodes/09-testing-output-files.Rmd
index 56b81327..9c32440b 100644
--- a/episodes/09-testing-output-files.Rmd
+++ b/episodes/09-testing-output-files.Rmd
@@ -1,223 +1,434 @@
 ---
-title: 'Regression Testing and Plots'
+title: 'Regression Tests'
 teaching: 10
-exercises: 2
+exercises: 3
 ---

 :::::::::::::::::::::::::::::::::::::: questions

-- How to test for changes in program outputs?
-- How to test for changes in plots?
+- How can we detect changes in program outputs?
+- How can snapshots make this easier?

 ::::::::::::::::::::::::::::::::::::::::::::::::

 ::::::::::::::::::::::::::::::::::::: objectives

-- Learn how to test for changes in images & plots
+- Explain what regression tests are and when they're useful
+- Write a manual regression test (save output and compare later)
+- Use Snaptol snapshots to simplify output/array regression testing
+- Use tolerances (rtol/atol) to handle numerical outputs safely

 ::::::::::::::::::::::::::::::::::::::::::::::::

-## Regression testing
+## Setup

-When you have a large processing pipeline or you are just starting out adding tests to an existing project, you might not have the
-time to carefully define exactly what each function should do, or your code may be so complex that it's hard to write unit tests for it all.
+To use the package introduced in this module, you will need to install it via,

-In these cases, you can use regression testing. This is where you just test that the output of a function matches the output of a previous version of the function.
+```bash
+pip install "git+https://github.com/PlasmaFAIR/snaptol"
+```

-The library `pytest-regtest` provides a simple way to do this. When writing a test, we pass the argument `regtest` to the test function and use `regtest.write()` to log the output of the function.
-This tells pytest-regtest to compare the output of the test to the output of the previous test run.
+## 1) Introduction

-To install `pytest-regtest`:
+In short, a regression test asks "this test used to produce X, does it still produce X?". This can help us detect
+unexpected or unwanted changes in the output of a program.

-```bash
-pip install pytest-regtest
-```
+It is good practice to add these types of tests to all projects. They are particularly useful,
+
+- when beginning to add tests to an existing project,
+
+- when adding unit tests to all parts of a project is not feasible,
+
+- to quickly achieve good test coverage,

-::::::::::::::::::::::: callout
+- when it is consistency of the output that matters, rather than verified correctness.

-This `regtest` argument is actually a fixture that is provided by the `pytest-regtest` package. It captures
-the output of the test function and compares it to the output of the previous test run. If the output is
-different, the test will fail.
+These types of tests are not a substitute for unit tests, but rather are complementary.

-:::::::::::::::::::::::::::::::

-Let's make a regression test:
+## 2) Manual example

-- Create a new function in `statistics/stats.py` called `very_complex_processing()`:
+Let's make a regression test in a `test.py` file. It is going to utilise a "very complex" processing function to
+simulate the processing of data,

 ```python
+# test.py
 def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]
+```
+
+Let's write the basic structure for a test with example input data, but for now we will simply print the output,
+
+```python
+# test.py continued

-    # Do some very complex processing
-    processed_data = [x * 2 for x in data]
+def test_something():
+    input_data = [i for i in range(8)]

-    return processed_data
+    processed_data = very_complex_processing(input_data)
+
+    print(processed_data)
 ```

-- Then in `test_stats.py`, we can add a regression test for this function using the `regtest` argument.
+Let's run `pytest` with reduced verbosity (`-q`) and with output capturing disabled (`-s`) so that we can see the
+printed output,
+
+```console
+$ pytest -qs test.py
+[42, 33, 26, 21, 18, 17, 18, 21]
+.
+1 passed in 0.00s
+```
+
+We get a list of output numbers that simulate the result of a complex function in our project. Let's save this data at
+the top of our `test.py` file so that we can `assert` that it is always equal to the output of the processing function,

 ```python
-import pytest
+# test.py

-from stats import very_complex_processing
+SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21]
+
+def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]

-def test_very_complex_processing(regtest):
+def test_something():
+    input_data = [i for i in range(8)]

-    data = [1, 2, 3]
-    processed_data = very_complex_processing(data)
+    processed_data = very_complex_processing(input_data)

-    regtest.write(str(processed_data))
+    assert SNAPSHOT_DATA == processed_data
 ```

-- Now because we haven't run the test yet, there is no reference output to compare against,
-so we need to generate it using the `--regtest-generate` flag:
+We call the saved version of the data a "snapshot".

-```bash
-pytest --regtest-generate
+We can now be assured that any development of the code that erroneously alters the output of the function will cause the
+test to fail. For example, suppose we slightly altered the `very_complex_processing` function,
+
+```python
+def very_complex_processing(data: list):
+    return [3 * x ** 2 - 10 * x + 42 for x in data]
+#          ^^^^ small change
 ```

-This tells pytest to run the test but instead of comparing the result, it will save the result for use in future tests.
+Then, running the test causes it to fail,

+```console
+$ pytest -q test.py
+F
+__________________________________ FAILURES _________________________________
+_______________________________ test_something ______________________________

-- Try running pytest and since we haven't changed how the function works, the test should pass.
+    def test_something():
+        input_data = [i for i in range(8)]

-- Then change the function to break the test and re-run pytest. The test will fail and show you the difference between the expected and actual output.
+        processed_data = very_complex_processing(input_data)
+
+>       assert SNAPSHOT_DATA == processed_data
+E       assert [42, 33, 26, 21, 18, 17, ...] == [42, 35, 34, 39, 50, 67, ...]
+E       At index 1 diff: 33 != 35
+
+test.py:12: AssertionError
+1 failed in 0.03s
+```
+
+If the change was intentional, then we could print the output again and update `SNAPSHOT_DATA`. Otherwise, we would want
+to investigate the cause of the change and fix it.

-```bash
-=== FAILURES ===
-___ test_very_complex_processing ___
-
-regression test output differences for statistics/test_stats.py::test_very_complex_processing:
-(recorded output from statistics/_regtest_outputs/test_stats.test_very_complex_processing.out)
-
-> --- current
-> +++ expected
-> @@ -1 +1 @@
-> -[3, 6, 9]
-> +[2, 4, 6]
-```
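
+If we wanted to keep the snapshot out of the test file, we could store it in a separate file ourselves. The sketch
+below shows one hypothetical way to do this with the standard library's `json` module (the file name and the
+first-run behaviour here are our own choices for illustration, not part of any tool),
+
+```python
+# test_file.py - a hypothetical file-based version of the manual approach
+import json
+from pathlib import Path
+
+SNAPSHOT_FILE = Path("test_something.snapshot.json")
+
+def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something():
+    processed_data = very_complex_processing([i for i in range(8)])
+
+    if not SNAPSHOT_FILE.exists():
+        # First run: record the current output as the snapshot.
+        SNAPSHOT_FILE.write_text(json.dumps(processed_data))
+
+    # Compare against the recorded snapshot.
+    assert json.loads(SNAPSHOT_FILE.read_text()) == processed_data
+```
+
+Even this small sketch involves real bookkeeping: we must create, locate and update the snapshot file ourselves, which
+motivates the tooling introduced next.
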
+## 3) Snaptol

-Here we can see that it has picked up on the difference between the expected and actual output, and displayed it for us to see.
-
-Regression tests, while not as powerful as unit tests, are a great way to quickly add tests to a project and ensure that changes to the code don't break existing functionality.
-It is also a good idea to add regression tests to your main processing pipelines just in case your unit tests don't cover all the edge cases, this will
-ensure that the output of your program remains consistent between versions.
+So far, performing a regression test manually has been a bit tedious. Storing the output data at the top of our test
+file,
+
+- adds clutter,
+
+- is laborious,
+
+- is prone to errors.
+
+We could move the data to a separate file, but once again we would have to handle its contents manually.

-## Testing plots
-
-When you are working with plots, you may want to test that the output is as expected. This can be done by comparing the output to a reference image or plot.
-The `pytest-mpl` package provides a simple way to do this, automating the comparison of the output of a test function to a reference image.
+There are tools that can handle this for us; one widely known example is Syrupy. A newer tool, Snaptol, has also been
+developed, and we will use it here.

-To install `pytest-mpl`:
-
-```bash
-pip install pytest-mpl
-```
+Let's use the original `very_complex_processing` function, and introduce the `snaptolshot` fixture,
+
+```python
+# test.py
+
+def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something(snaptolshot):
+    input_data = [i for i in range(8)]
+
+    processed_data = very_complex_processing(input_data)
+
+    assert snaptolshot == processed_data
+```

-- Create a new folder called `plotting` and add a file `plotting.py` with the following function:
+Notice that we have replaced the `SNAPSHOT_DATA` variable with `snaptolshot`, which is a fixture provided by
+Snaptol that handles the snapshot file management, amongst other smart features, for us.

-```python
-import matplotlib.pyplot as plt
+When we run the test for the first time, we will be met with a `FileNotFoundError`,
+
+```console
+$ pytest -q test.py
+F
+================================== FAILURES =================================
+_______________________________ test_something ______________________________
+
+    def test_something(snaptolshot):
+        input_data = [i for i in range(8)]
+
+        processed_data = very_complex_processing(input_data)
+
+>       assert snaptolshot == processed_data
+        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+test.py:10:
+_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
+.../snapshot.py:167: FileNotFoundError
+========================== short test summary info ==========================
+FAILED test.py::test_something - FileNotFoundError: Snapshot file not found.
+1 failed in 0.03s
+```

-def plot_data(data: list):
-    fig, ax = plt.subplots()
-    ax.plot(data)
-    return fig
-```
+This is because we have not yet created the snapshot file. Let's run `snaptol` in update mode so that it knows to create
+the snapshot file for us. This is similar to the print, copy and paste step in the manual approach above,
+
+```console
+$ pytest -q test.py --snaptol-update
+.
+1 passed in 0.00s
+```

-This function takes a list of points to plot, plots them and returns the figure produced.
+This tells us that the test passed, and, because we were in update mode, an associated snapshot file was
+created with the name format `<file>.<test>.json` in a dedicated directory,
+
+```console
+$ tree
+.
+├── __snapshots__
+│   └── test.test_something.json
+└── test.py
+```

-In order to test that this funciton produces the correct plots, we will need to store the correct plots to compare against.
-- Create a new folder called `test_plots` inside the `plotting` folder. This is where we will store the reference images.
+The contents of the JSON file are the same as in the manual example,
+
+```json
+[
+  42,
+  33,
+  26,
+  21,
+  18,
+  17,
+  18,
+  21
+]
+```
+
+As the data is saved in JSON format, almost any Python object can be used in a snapshot test – not just integers and
+lists.
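
+For example, a whole dictionary of results can be snapshotted in one assertion. The sketch below is illustrative only
+(the `summarise` function and its values are invented here); it reuses the `snaptolshot` fixture exactly as above,
+
+```python
+# test_summary.py
+
+def summarise(data: list):
+    # Reduce a dataset to a few summary values.
+    return {"count": len(data), "minimum": min(data), "maximum": max(data)}
+
+def test_summary(snaptolshot):
+    summary = summarise([42, 33, 26, 21, 18, 17, 18, 21])
+
+    # The whole dictionary is stored in, and compared against, the JSON snapshot.
+    assert snaptolshot == summary
+```
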
+Just as previously, if we alter the function then the test will fail. We can similarly update the snapshot file with
+the new output using the `--snaptol-update` flag, as above.

-`pytest-mpl` adds the `@pytest.mark.mpl_image_compare` decorator that is used to compare the output of a test function to a reference image.
-It takes a `baseline_dir` argument that specifies the directory where the reference images are stored.
+::::::::::::::::::::::::::::::::::::: callout
+
+**Note:** `--snaptol-update` will only update snapshot files for tests that failed in the previous run of `pytest`. This
+is because the expected workflow is 1) run `pytest`, 2) observe a test failure, 3) if happy with the change, run the
+update with `--snaptol-update`. This stops the unnecessary rewrite of snapshot files in tests that pass – which is
+particularly important when we allow for tolerance as explained in the next section.
+
+:::::::::::::::::::::::::::::::::::::::::::::

-- Create a new file called `test_plotting.py` in the `plotting` folder with the following content:
+
+### Floating point numbers

-```python
-import pytest
-from plotting import plot_data
+Consider a simulation code that uses algorithms that depend on convergence – perhaps a complicated equation that does
+not have an exact answer but can be approximated numerically within a given tolerance. This, along with the common use
+of controlled randomised initial conditions, can lead to results that differ slightly between runs.

-@pytest.mark.mpl_image_compare(baseline_dir="test_plots/")
-def test_plot_data():
-    data = [1, 3, 2]
-    fig = plot_data(data)
-    return fig
-```
+In the example below, we approximate the value of pi using a random sample of points in a square. The exact
+implementation of the algorithm is not important, but it relies on the use of randomised input and as a result the
+determined value will vary slightly between runs.
+
+```python
+# test_tol.py
+import numpy as np
+
+def approximate_pi(random_points: np.ndarray):
+    return 4 * np.mean(np.sum(random_points ** 2, axis=1) <= 1)
+
+def test_something(snaptolshot):
+    rng = np.random.default_rng()
+
+    random_points_in_square = rng.uniform(-1.0, 1.0, size=(10000000, 2))
+
+    result = approximate_pi(random_points_in_square)
+
+    print(result)
+
+    assert snaptolshot(rtol=1e-03, atol=0.0) == result
+```

-Here we have told pytest that we want it to compare the output of the `test_plot_data` function to the images in the `test_plots` directory.
-
-- Run the following command to generate the reference image:
-(make sure you are in the base directory in your project and not in the plotting folder)
+Let's run the test initially like before, but create the snapshot file straight away by running in update-all mode,
+which (unlike `--snaptol-update`) does not require a previous failing run,

-```bash
-pytest --mpl-generate-path=plotting/test_plots
-```
+```console
+$ pytest -qs test_tol.py --snaptol-update-all
+3.1423884
+.
+1 passed in 0.30s
+```

-This tells pytest to run the test but instead of comparing the result, it will save the result into the `test_plots` directory for use in future tests.
+Even with ten million data points, the approximation of pi, 3.1423884, isn't great!

-Now we have the reference image, we can run the test to ensure that the output of `plot_data` matches the reference image.
-Pytest doesn't check the images by default, so we need to pass it the `--mpl` flag to tell it to check the images.
+::::::::::::::::::::::::::::::::::::: callout

-```bash
-pytest --mpl
-```
+**Note:** remember that the exact result of a regression test is not the important part; what matters is how that result
+changes in future runs. We want to focus on whether our code reproduces the result – in this case within a given
+tolerance to account for the randomness.
+
+:::::::::::::::::::::::::::::::::::::::::::::

-Since we just generated the reference image, the test should pass.
-Now let's edit the `plot_data` function to plot a different set of points by adding a 4 to the data:
+In the test above, you may have noticed that we supplied `rtol` and `atol` arguments to the `snaptolshot` fixture. These
+are used to control the tolerance of the comparison between the snapshot and the actual output. This means that on
+future runs of the test, the computed value will not be required to match the snapshot exactly, but only to match it
+within the given tolerance. Remember,
+
+- `rtol` is the relative tolerance, useful for handling large numbers (e.g. magnitude much greater than 1),
+- `atol` is the absolute tolerance, useful for numbers "near zero" (e.g. magnitude much less than 1).
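
+To get a feel for what these tolerances permit, the sketch below checks the two pi values from this section with
+`numpy.isclose`, which combines the tolerances as `|result - snapshot| <= atol + rtol * |snapshot|`. We assume here
+that Snaptol combines `rtol` and `atol` in this standard way,
+
+```python
+import numpy as np
+
+snapshot = 3.1423884  # the value recorded in the snapshot file
+result = 3.1408724    # a value computed on a later run
+
+# rtol=1e-03 allows a difference of ~0.0031; the actual difference is ~0.0015, so this passes.
+print(np.isclose(result, snapshot, rtol=1e-03, atol=0.0))  # True
+
+# rtol=1e-04 only allows a difference of ~0.00031, so the same comparison now fails.
+print(np.isclose(result, snapshot, rtol=1e-04, atol=0.0))  # False
+```
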
+If we run the test again, we see the printed output is different to that saved to file, but the test still passes,

-```python
-import matplotlib.pyplot as plt
-
-def plot_data(data: list):
-    fig, ax = plt.subplots()
-    # Add 4 to the data
-    data.append(4)
-    ax.plot(data)
-    return fig
-```
+```console
+$ pytest -qs test_tol.py
+3.1408724
+.
+1 passed in 0.24s
+```

-- Now re-run the test. You should see that it fails.
+## Exercises

-```bash
-=== FAILURES ===
-___ test_plot_data ___
-Error: Image files did not match.
-  RMS Value: 15.740441786649093
-  Expected:
-   /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/baseline.png
-  Actual:
-   /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/result.png
-  Difference:
-   /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/result-failed-diff.png
-  Tolerance:
-   2
-```
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Create your own regression test
+
+- Add the code below to a new file and fill in the `...` sections with your own code.
+
+- On the first run, capture the output of your implemented `very_complex_processing` function and store it
+appropriately.
+
+- Afterwards, ensure the test compares the stored data to the result, and passes successfully. Avoid using `float`s for now.
+
+```python
+def very_complex_processing(data):
+    return ...
+
+def test_something():
+    input_data = ...
+
+    processed_data = very_complex_processing(input_data)
+
+    assert ...
+```

-Notice that the test shows you three image files.
-(All of these files are stored in a temporary directory that pytest creates when running the test.
-Depending on your system, you may be able to click on the paths to view the images. Try holding down CTRL or Command and clicking on the path.)
+:::::::::::::::::::::::: solution

-- The first, "Expected" is the reference image that the test is comparing against.
-- The second, "Actual" is the image that was produced by the test.
-- And the third is a difference image that shows the differences between the two images. This is very useful as it enables us to cleraly see
-what went wrong with the plotting, allowing us to fix the issue more easily. In this example, we can clearly see that the axes ticks are different, and
-the line plot is a completely different shape.
+```python
+SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21]
+
+def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something():
+    input_data = [i for i in range(8)]
+
+    processed_data = very_complex_processing(input_data)
+
+    assert SNAPSHOT_DATA == processed_data
+```
+
+:::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::::::::::::

-This doesn't just work with line plots, but with any type of plot that matplotlib can produce.
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Implement a regression test with Snaptol

-Testing your plots can be very useful especially if your project allows users to define their own plots.
+- Using the `approximate_pi` function above, implement a regression test using the `snaptolshot` fixture.
+
+- On the first pass, ensure that it fails due to a `FileNotFoundError`.
+
+- Run it in update mode to save the snapshot, and ensure it passes successfully on future runs.
+
+:::::::::::::::::::::::: solution
+
+```python
+import numpy as np
+
+def approximate_pi(random_points: np.ndarray):
+    return 4 * np.mean(np.sum(random_points ** 2, axis=1) <= 1)
+
+def test_something(snaptolshot):
+    rng = np.random.default_rng()
+
+    random_points_in_square = rng.uniform(-1.0, 1.0, size=(10000000, 2))
+
+    result = approximate_pi(random_points_in_square)
+
+    assert snaptolshot(rtol=1e-03, atol=0.0) == result
+```
+
+:::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## More complex regression tests
+
+- Create two separate tests that both utilise the `approximate_pi` function as a fixture.
+
+- Using different tolerances for each test, assert that the first passes successfully, and assert that the second raises
+an `AssertionError`. Hints: 1) remember to look back at the "Testing for Exceptions" and "Fixtures" modules, 2) the
+statistical error in the pi calculation scales as $\frac{1}{\sqrt{N}}$, where $N$ is the number of points used.
+
+:::::::::::::::::::::::: solution
+
+```python
+import numpy as np
+import pytest
+
+@pytest.fixture
+def approximate_pi():
+    rng = np.random.default_rng()
+
+    random_points = rng.uniform(-1.0, 1.0, size=(10000000, 2))
+
+    return 4 * np.mean(np.sum(random_points ** 2, axis=1) <= 1)
+
+def test_pi_passes(snaptolshot, approximate_pi):
+    # Passes due to loose tolerance.
+    assert snaptolshot(rtol=1e-03, atol=0.0) == approximate_pi
+
+def test_pi_fails(snaptolshot, approximate_pi):
+    # Fails due to tight tolerance.
+    with pytest.raises(AssertionError):
+        assert snaptolshot(rtol=1e-04, atol=0.0) == approximate_pi
+```
+
+:::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::::::::::

 ::::::::::::::::::::::::::::::::::::: keypoints

-- Regression testing ensures that the output of a function remains consistent between changes and are a great first step in adding tests to an existing project.
-- `pytest-regtest` provides a simple way to do regression testing.
-- `pytest-mpl` provides a simple way to test plots by comparing the output of a test function to a reference image.
+- Regression testing ensures that the output of a function remains consistent between test runs.
+- The `pytest` plugin `snaptol` can be used to simplify this process and to cater for floating-point numbers that may
+need tolerances on assertion checks.

-::::::::::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::::::::::::

diff --git a/learners/setup.md b/learners/setup.md
index bf0b22e4..2af91ed7 100644
--- a/learners/setup.md
+++ b/learners/setup.md
@@ -36,7 +36,7 @@ conda activate myenv
 There are some python packages that will be needed in this course, you can install them using the following command:

 ```bash
-pip install numpy pandas matplotlib pytest pytest-regtest pytest-mpl
+pip install numpy pandas matplotlib pytest pytest-regtest pytest-mpl snaptol
 ```

 ### Git