How to write tests in Python - Writing Tests

Writing tests differs slightly from writing software. When writing software, it's usually considered good practice to make your code as abstract and DRY as possible. However, this is not always true for tests: writing completely DRY and abstract test code requires logic, and introducing logic introduces chances for bugs.

First and foremost: you should always be able to trust your tests. If you don't trust your tests, you can't trust the software they're testing. To develop that trust, write tests with as little logic as possible; each branch introduced into a test is a new chance for failure. This doesn't restrict the use of functions or for loops in tests, but I would keep for loops to a minimum: heavy looping usually ends up requiring some sort of aggregation of results for the test output to be human readable, and, like if statements, aggregation is another chance for failure.
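For example, here's a hypothetical sketch (double() is an imaginary function under test, not something from this article) contrasting a loop-and-aggregation test with a logic-free one:

# a test with logic: the loop and the aggregation are extra places for the
# test itself to be wrong, and a failure doesn't point at the broken case
def test_double_with_a_loop():
    failures = []
    for value, expected in [(1, 2), (2, 4), (3, 6)]:
        if double(value) != expected:
            failures.append(value)
    assert not failures, 'double() failed for: {}'.format(failures)


# the same coverage with no branching: each assert reports its own failure
def test_double_explicitly():
    assert double(1) == 2
    assert double(2) == 4
    assert double(3) == 6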

Contracts and Requirements

There are two types of tests: unit tests and system tests (sometimes known as integration tests). Both, in some way, shape, or form, test contracts. A contract is an agreement between two parties: given a certain input, a specific output will be returned or action performed. Another way to think about it is via requirements. Before code is written, there is usually a set of requirements that dictates what the program is supposed to do. Say that you've been asked to write a program that takes in a string and prints the string in reverse order. In this case, the requirements are simply to reverse the string and print out the result. When writing tests, the requirements (a.k.a. the contract) should be the only thing tested! Allow me to demonstrate my point with an example.

First, I will write a simple test (using no particular framework, only Python's assert statement):

def test_the_program():
    expected = 'fdsa'
    actual = my_function('asdf')
    assert actual == expected, 'Actual "{}" != expected "{}"'.format(actual, expected)

And a possible implementation for the above requirements might be:

def my_function(s):
    chars = list(s)
    chars.reverse()
    s = ''
    for c in chars:
        s += c
    return s

If I were to call test_the_program() against my reversing function, it would pass (by not raising an error). And since my test is testing the requirements and not the implementation itself, I could rewrite my_function() (maybe optimize it?) and the test should still pass.

def my_function(s):
    return ''.join(c for c in reversed(s))

Running test_the_program() one more time proves that my new implementation works just like the previous implementation. I know that the requirements are met since the test passes.
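As an aside, since no framework is involved here, one simple way to run this check by hand (assuming my_function() and test_the_program() live in the same module) is a small guard at the bottom of the file:

# lets the check be run with `python the_module.py`; nothing prints on failure
# except the assertion error itself
if __name__ == '__main__':
    test_the_program()
    print('test passed')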

Unit Tests

A unit test will test the smallest possible unit. In the case of Python, the function is usually considered to be the smallest unit. A unit test should accurately reflect the contract for that unit. E.g. if a contract states that given an input X the function will return the output Y, then there should be a test calling that function with the value X and asserting that Y was returned.

For example, the contract for the foo() function might state that

If given 'X' foo() will return 'Y', otherwise return 'Z'

Then, a possible implementation might look like:

# example/subpackage1/module1.py

def foo(parameter):
    """
    If given 'X', this function will return 'Y', otherwise 'Z'

    Args:
        parameter (str): the parameter passed to foo

    Returns:
        str: the value to return
    """
    if parameter == 'X':
        return 'Y'
    return 'Z'

There should then be two tests to fully test the foo() function. One for each decision branch so that return 'Y' and return 'Z' are both covered.

# tests/subpackage1/module1/test_foo.py

from . import BaseModule1TestCase

from example.subpackage1 import module1


class FooTestCase(BaseModule1TestCase):
    def test_that_when_passed_X_then_Y_is_returned(self):
        actual = module1.foo('X')
        expected = 'Y'
        self.assertEqual(actual, expected, 'Y should be the return value.')

    def test_that_when_something_other_than_X_is_passed_in_Z_is_returned(self):
        actual = module1.foo('something else')
        expected = 'Z'
        self.assertEqual(actual, expected, 'Z should be the return value.')

Both of these tests should pass, and foo()'s requirements are then completely covered.

Note: the phrase decision branch might imply that I'm testing the implementation rather than the requirements; however, the phrase is used here to describe the requirements of the function, not the implementation. An alternate implementation will still pass the same tests:

# example/subpackage1/module1.py
return_values = {'X': 'Y'}
def foo(parameter):
    return return_values.get(parameter, 'Z')

Problems with Unit Testing

From experience, I've found that the benefits of unit testing usually don't outweigh the costs. The only time that unit tests seem to be worth the effort is for (mostly) pure functions. A pure function is a function that establishes a solid contract and does not rely on the environment or a certain state in order to satisfy that contract. I normally only like to write unit tests for utility functions as they are usually pure. Almost everything else is better tested with a system test because it relies on environment or state in order to be tested.
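To illustrate the distinction (a hypothetical sketch, not code from this article's project): slugify() below is pure and trivial to unit test, while save_report() depends on the filesystem and the clock, so it's better exercised by a system test.

import datetime


def slugify(title):
    # pure: the output depends only on the input
    return '-'.join(title.lower().split())


def save_report(report_dir, contents):
    # impure: depends on the filesystem and the current date
    filename = '{}/report-{}.txt'.format(
        report_dir, datetime.date.today().isoformat())
    with open(filename, 'w') as f:
        f.write(contents)
    return filename


def test_slugify():
    # no setup, no mocking, no environment: just input and output
    assert slugify('Hello World') == 'hello-world'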

If the function requires mocking, the tests for that function are usually coupled to the function's implementation; if the implementation changes, the test will fail even when the contract is still met. Unless the function is pure (it only relies on standard library functions and there isn't a lot of setup required to call it), it's usually hard to test without mocking. That need comes up more often than not, and the maintenance required for tests with mocking is usually not worth the potential bugs they might catch. A system test will usually catch more bugs with less maintenance overhead, though there is a significant amount of up-front work in creating one, since some sort of environment needs to be set up and managed.
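Here's a hypothetical example of that coupling (it assumes Python 3's unittest.mock; load_greeting() is made up for illustration). The test only passes while load_greeting() happens to call open(); switch the implementation to read from an environment variable instead and the test breaks even though the contract hasn't changed.

from unittest import TestCase, mock


def load_greeting():
    # implementation detail: the greeting happens to come from a file
    with open('greeting.txt') as f:
        return f.read().strip()


class LoadGreetingTestCase(TestCase):
    def test_greeting_is_loaded(self):
        # the mock mirrors the implementation, not the contract
        fake_open = mock.mock_open(read_data='Hello, World!\n')
        with mock.patch('builtins.open', fake_open):
            self.assertEqual(load_greeting(), 'Hello, World!')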

System Tests

Before an application has code written, it should have a set of requirements to define what the application is and does. The requirements at this level are considered top-level requirements as they govern what the entire system is supposed to do. A system test is a test that tests one of these top-level requirements.

Much like contracts for unit tests: when the user provides some input or triggers some event X, event Y should occur. System tests are usually more reliable than unit tests because they require the software to be set up and initialized in the same way it will actually be used. That means the tests need to reproduce the environment the software runs in. For frameworks, this might require that temporary projects be created and destroyed per test. For libraries or simple console scripts, this might require a file or set of files to be created and either stored as assets in the test repo or created dynamically per test.
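For instance, a per-test file could be created in setUp() and destroyed in tearDown(). This is just a standard library sketch, not the helper code used later in this article:

import os
import tempfile
import unittest


class TempFileTestCase(unittest.TestCase):
    def setUp(self):
        # create a fresh file for every test and remember its path
        fd, self.filename = tempfile.mkstemp()
        with os.fdopen(fd, 'w') as f:
            f.write('Hello, World!\n')

    def tearDown(self):
        # always clean up, even when the test fails
        os.remove(self.filename)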

I've found that there is usually a fair amount of overhead in creating a framework for system tests. But, if it's set up properly, it will be easy to extend and maintain. Once the framework itself is in place, writing system tests is no more difficult than writing unit tests, and the tests are much more valuable.

An example

Let's try to implement a simple version of the cat program. For those unfamiliar with cat, it is a program that prints the contents of a file to the console. The requirements for our cat doppelganger will be:

  • given a filename, print the file's contents to stdout and return a zero status code
  • if the filename given doesn't exist, print an error message and return a non-zero status code

Setting up the testing framework

In order to test this application, we will need to provide an actual file on disk. If any of my projects require some external file to be available for a test, I will either store an example file as an asset within the project repo or I will create the file dynamically at test time. I don't necessarily think one way is better than the other; it's all dependent upon the test being written.

To write this test, I will need to know the contents of the file to make sure that the program's output matches the contents of the file. To do this, I'm going to create an asset file in a folder named assets. I will then need to read from that asset file and store the contents as the expected value. The actual output from running our cat should match the expected string.

All of this functionality could be written inside of the test itself. However, in the event that another test needs similar functionality, it will probably be best to add all of the functionality to the base test case class.

Somewhere, configuration should occur. E.g. we need to tell the test framework where the assets directory lives relative to the tests. For something like this, I would create a constant that lives in the same module as the base test case class and set its value to the path to the assets directory relative to that module.

import os

# assume that the `assets` dir is adjacent to the `tests` dir
ASSETS_DIR = os.path.normpath(os.path.join(
    os.path.dirname(__file__), '..', '..', 'assets'))

Then, I'll create a helper method that retrieves the path to an asset file given the relative path to the asset file from the asset root.

from unittest import TestCase

# ...


class BaseTestCase(TestCase):
    def asset_filename(self, path):
        """
        Given a path relative to the assets directory, returns the absolute
        path to the filename

        Returns:
            str: absolute path to the asset file given

        Example:
            >>> asset_filename('path/to/file.txt')
            '/home/user/aaron/cat/assets/path/to/file.txt'
        """
        return os.path.normpath(os.path.join(ASSETS_DIR, path))

Another useful helper method would be to read the contents of an asset file and return it as a string.

    # ...
    def asset_contents(self, path):
        """
        Returns:
            str: contents of the asset file given the path to the asset file
                 relative to the asset dir.
        """
        filename = self.asset_filename(path)
        with open(filename, 'rb') as f:
            return f.read()

Lastly, as this is a system test, there needs to be a way to invoke the program as the end user will invoke it. We could simply define a main() function in cat.py, but there is usually some amount of setup (like argument parsing) that happens even before main() is called, and calling main() by itself will not exercise that. Also, importing main() and calling it will cause any environment setup done by the testing framework (such as extending PYTHONPATH) to be inherited by cat.py, which could cause unwanted side effects. To prevent those side effects and to invoke the program the way the end user will, we'll create a new Python process, passing in the entry module along with whatever command line arguments are necessary. This requires one more helper method.

import subprocess
import sys

# ...
PATH_TO_CAT_PY = os.path.join(
    os.path.dirname(__file__), '..', '..', 'src', 'cat.py')

    # ...
    def run_program(self, args=None, status=0):
        # any extra args to pass to the program
        if args is None:
            args = []

        # default args needed to call the program
        _args = [sys.executable, PATH_TO_CAT_PY]

        p = subprocess.Popen(
            _args + args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        self.stdout, self.stderr = p.communicate()

        if p.returncode != status:
            self.fail(
                'Expected returncode: {}\n'
                'Actual returncode: {}'.format(status, p.returncode)
            )

Note: I called this method run_program specifically because I'm using the unittest.TestCase class which already has a built-in run() method.

Lastly, we need to write an asset file for the tests. I'm going to create a simple text file and place it in assets/my_simple_file.txt:

Hello, World!

Now, we're ready to write the tests.

Writing the cat tests

I like to follow what's known as test-driven development (TDD). The gist of TDD is that you

  • write tests for the requirements
  • make sure that all tests fail
  • write the code needed to satisfy the requirements
  • ensure that all tests pass

I've found that trying to think about how to test the code greatly helps in keeping the code modular and reusable because it ends up forcing you to write code that is also easy to test. So, we'll write the tests first.

First, the setup of the TestCase class. Inside of tests/test_cat.py:

import os

from testlib import BaseTestCase


class TestCat(BaseTestCase):
    pass

Next, let's test the first requirement. In order to do this, we need to run the cat program with a filename and assert that stdout contains the exact same contents as the contents found within the file. Also, we will need to make sure that the returncode from cat was zero (indicating success). This can be done easily with all of the test framework helper methods.

def test_that_given_a_path_to_a_file_the_files_contents_are_output_with_zero_return_code(self):
    # First, we need to get the asset filename
    filename = self.asset_filename('my_simple_file.txt')
    # next, get the file's contents; this is the expected output to stdout
    expected = self.asset_contents('my_simple_file.txt')

    # cat.py expects to be called like so:
    #     python cat.py path/to/file
    # so we need to build up the args to call it like that.
    # the helper function `run_program` already handles calling
    # python with cat.py as the args, so all we need to pass is
    # just the filename.
    self.run_program(args=[filename])

    # calling `run_program` will store any stdout or stderr output
    # into self.stdout and self.stderr respectively. also, calling
    # `run_program` implicitly checks the returncode to make sure it's zero.
    # all that's left to do now is make sure that the output to stdout
    # matches the contents of the file.
    self.assertEqual(expected, self.stdout)

And that's it. The first requirement is fully tested.

The next requirement says that cat should print an error message and return a non-zero status code in the event that the filename given doesn't exist. Let's write that test now.

def test_that_when_a_non_existent_filename_is_given_cat_returns_non_zero_status_code(self):
    filename = 'this_filename_should_not_ever_exist.like_never_ever.txt'
    # just a sanity check to make sure that the filename doesn't exist
    self.assertFalse(
        os.path.exists(filename),
        'The filename used for this test actually exists on disk when it '
        'should not.'
    )
    # this test is as simple as calling cat.py with a non-existent filename.
    # `run_program` accepts an optional int `status` parameter. This
    # parameter asserts that when cat.py exits, the status code given matches
    # the status code returned.
    self.run_program(args=[filename], status=1)
    # make sure that the correct error message was printed.
    # self.stdout is bytes, so encode the expected message before comparing.
    self.assertIn(
        'The filename "{}" does not exist on disk.'.format(filename).encode('utf-8'),
        self.stdout
    )

That's it. All requirements are now fully tested. And, as expected, all tests fail upon running (since cat.py shouldn't exist at this point). All that's left to do is actually write the code.

Writing cat.py

The main program itself will be written in a single module named cat.py and will live in the src/ directory.

First, we'll start out by doing a little argument parsing to get the filename from the command line:

# cat.py

def cat(filename):
    pass


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('filename', help='The filename to output to stdout.')

    args = parser.parse_args()

    cat(args.filename)

Great. Now, if we run the tests again, there shouldn't be any errors about cat.py not existing. Instead, the tests should fail because the stdout output and return codes don't yet match what's expected.

Let's first satisfy the first requirement: output the contents of the file to stdout. All of this will be done inside of the cat() function.

import sys
# ...
def cat(filename):
    with open(filename, 'rb') as f:
        contents = f.read()
    # write the raw bytes so the output matches the file contents exactly
    # (sys.stdout.buffer on Python 3; plain sys.stdout accepts bytes on Python 2)
    getattr(sys.stdout, 'buffer', sys.stdout).write(contents)

That looks good. Now, running the tests should show that one test passes while the other fails. Great!

Now, let's satisfy the last requirement: if the file doesn't exist, an error message should be printed and a non-zero status code returned. I went ahead and (arbitrarily) decided that the status code should be 1 whenever this particular error occurs. The test has already accounted for this by setting status=1 in the call to run_program(). Let's write the code to satisfy that test:

import os
import sys
#...
def cat(filename):
    if not os.path.exists(filename):
        # print a nice error message
        print('The filename "{}" does not exist on disk.'.format(filename))
        # exit with the 1 status code for this error
        sys.exit(1)

    with open(filename, 'rb') as f:
        contents = f.read()
    # write the raw bytes so the output matches the file contents exactly
    getattr(sys.stdout, 'buffer', sys.stdout).write(contents)

That should do it. All of the tests should run and pass.

To view the whole project at once, I have created a zipfile.

Note: since I'm using unittest, I'm also using its built-in test discovery to run the tests. To do this, change directory to the tests directory in the shell, then run python -m unittest. If you're running Python 2 you will also need to include the discover subcommand: python -m unittest discover. This should find the two tests and run them.


