The hidden dependencies in your regression tests

At Reflect, we think a lot about how to improve the state of end-to-end testing. We knew that improved tooling and faster execution were clear wins, but one thing that wasn’t obvious to us was how correlated an application’s state is to the complexity of testing that application.

The practice of managing the state of the application under test, known as test data management, is often overlooked. In this article, we’ll cover how teams typically manage (or don’t manage) this data, and present a new method we’ve developed to make this process easier.

As application state changes, tests are more prone to failure

To understand how application state correlates to test complexity, let’s look at an example.

Imagine you’re building a competitor to Shopify, and as part of your testing strategy you want an automated end-to-end test of your checkout flow. You create a test that clicks through the site, adds a product to the cart, enters a dummy payment method, and validates that an order has been placed. Job well done, you continue development knowing that this critical workflow now has test coverage. And in your next commit, the test fails.

You investigate the failure and find that the test is failing not because you shipped a bug, but because the product is no longer available for purchase. Your product database maintains real-time inventory information, and because you created a bunch of orders for this product when debugging your test, the product is now out-of-stock. No worries! To get your test passing again, you go into your database, manually set the inventory level for this product to 100,000 units, and call the problem fixed.

Next week, the test fails again. And like before, it’s not failing due to a code regression. This time the product is available for purchase, but the size and color that your test selects are no longer available due to a recent sync of product data from your dropship vendor. You could fix the data up again, but it’s just going to get overwritten by the next run of your nightly sync job. It’s important that the data sync is also tested, so it can’t just be turned off. You opt for the quick fix here and modify the test to use a different size and color so the test passes.

Do these kinds of failure modes sound familiar? If so, you’ve experienced test failures due to improperly managed test data. Whether you realize it or not, every end-to-end test makes implicit assumptions about the state of your application. The most basic assumption is that the application under test is running and accessible. But there can be many others, even for seemingly simple tests.

Let’s break down the example above to see what assumptions we’re making in this test.

Before even reaching the product page, we may be making several assumptions about the state of the application:

If we reach the product by searching for it, then we’re assuming that the search page will yield consistent results on every execution.
If we reach the product by navigating to it, then we’re assuming that the site’s category hierarchy and the category this product is associated with have not changed.

On the product page, we likely assume that:

Key properties of the product like its name, description, and price have not changed.
The sizes and colorways associated with the product have not changed.
The product is in-stock and available for purchase.

On the checkout page, we likely assume that:

Shipping and tax rates have not changed.
A valid payment method can be entered.

On the surface this test isn’t terribly complicated, but in fact there’s a lot going on that could go wrong and cause this test to fail.

Anti-patterns when managing state

Before covering some viable strategies for handling this problem, let’s discuss two approaches we consider anti-patterns for test data management.

Resetting data within a test

When hitting repeatability issues with tests that mutate application state, it’s tempting to just throw in some steps at the end that reset the application back to its original state. For example, imagine a test that disables some setting in the application, validates some behavior on the application, and then re-enables the setting. This can seem like an easy fix, but if the test fails before the setting is toggled back on, then the key assumption that this test makes (i.e. that the setting starts off as enabled) is invalid.

Further, when attempting to fix up this kind of test later, it’s often not obvious how to get the application back to the proper state.

Mocking APIs

Mocks and fakes are very common in unit testing to explicitly define behavior that’s outside the scope of what’s being tested. We don’t recommend taking this approach for end-to-end tests because it ties your tests too closely to the underlying implementation. Unlike a unit test, a single end-to-end test could touch tens of API endpoints. Managing even one of these tests through mocking API interfaces would be time-consuming and error-prone, since any change to the API could cause the test to break.

Strategies for managing state

Refactor your tests to make more generic assumptions

The most obvious approach is to make fewer assumptions about the application. In our example above, we may not need to validate that things like prices, shipping rates, and product descriptions exactly match a predetermined value. Maybe it’s sufficient to do a fuzzier match, like validating that numeric values are within a range, or something even more general like just validating that some value exists.

It’s definitely worthwhile to question what things you actually want to validate in a test. And certainly if you’re validating these values through other means, such as an integration test that hits your API to validate product metadata, then an end-to-end test that duplicates these validations may not be providing much value.
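As a rough illustration of the fuzzier checks described above, here is a minimal sketch in TypeScript. The helper names (parsePrice, isPriceInRange, hasValue) are hypothetical and not part of any testing tool; the idea is only to show a range check and an existence check in place of an exact match.

```typescript
// Hypothetical helpers for illustration; not part of any testing tool.

// Parses a displayed price such as "$1,299.00" into a number.
function parsePrice(text: string): number {
  const cleaned = text.replace(/[^0-9.]/g, "");
  const value = Number.parseFloat(cleaned);
  if (Number.isNaN(value)) {
    throw new Error(`Could not parse a price from "${text}"`);
  }
  return value;
}

// Looser check: the price falls within a plausible range
// instead of matching one predetermined value.
function isPriceInRange(text: string, min: number, max: number): boolean {
  const value = parsePrice(text);
  return value >= min && value <= max;
}

// Loosest check: some non-empty value is present.
function hasValue(text: string | null | undefined): boolean {
  return text !== null && text !== undefined && text.trim().length > 0;
}
```

A test might assert isPriceInRange(displayedPrice, 1, 500) and hasValue(description), which keeps passing when the exact price or copy changes.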

Write tests that operate on new data rather than edit existing data

Part of the challenge of end-to-end tests is that any actions the tests take can cause the application’s state to be different the next time the tests run. Tests that fall into this category, such as a test that edits an existing record in an application, can sometimes be refactored to first create the data that they’re about to edit. For example, instead of editing an existing record straightaway, the test creates a new record and then edits it.

While this makes the test have more steps and take longer to execute, it reduces the dependencies this test has on the application state. That’s because if the test fails in the middle, it doesn’t affect the next run of the test, which would be creating a new record anyway.
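The create-then-edit pattern can be sketched as follows. FakeApp is a hypothetical in-memory stand-in for the application under test, used here only so the example is self-contained; in a real end-to-end test these steps would be UI or API interactions.

```typescript
// Hypothetical stand-in for the application under test: an in-memory record store.
interface AppRecord {
  id: number;
  title: string;
}

class FakeApp {
  private records = new Map<number, AppRecord>();
  private nextId = 1;

  createRecord(title: string): AppRecord {
    const record: AppRecord = { id: this.nextId++, title };
    this.records.set(record.id, record);
    return { ...record };
  }

  editRecord(id: number, title: string): AppRecord {
    const record = this.records.get(id);
    if (!record) {
      throw new Error(`No record with id ${id}`);
    }
    record.title = title;
    return { ...record };
  }

  getRecord(id: number): AppRecord | undefined {
    const record = this.records.get(id);
    return record ? { ...record } : undefined;
  }
}

// The test creates its own record and then edits it, so it never depends on
// a record left behind (or left half-edited) by an earlier run.
function editRecordTest(app: FakeApp): boolean {
  const created = app.createRecord(`test-record-${Date.now()}`);
  app.editRecord(created.id, "Edited title");
  return app.getRecord(created.id)?.title === "Edited title";
}
```

Running editRecordTest repeatedly against the same FakeApp instance gives the same result each time, because every run works on a record it created itself.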

Resetting the database before running the test suite

Rather than refactoring the tests to make fewer assumptions about application state, another alternative is to manage that application state directly. Common approaches are:

Restoring from a database snapshot: This uses the backup-and-restore features of your underlying database to get the application back into its original state.
Running a set of SQL scripts: The scripts at a minimum include insert statements for the data that the tests depend upon, along with SQL commands to clear out data prior to making those inserts.
Applying “masked” data from production: Test databases are sometimes populated with data from production which has been modified to remove personal or sensitive information.

While these approaches get us closer to managing application state directly, they have some major drawbacks. The snapshotting approach makes data consistent between test runs, but it gives you no visibility into what data is in the system, and no method for incrementally updating that data as tests are added. SQL scripts make it so test data can be version-controlled, but this approach doesn’t allow you to manage test data in third-party APIs, and is easy to break as your underlying data model changes. Using masked data from production helps your test environment more closely match what end users experience in production, but it can lead to the loading of a lot of junk data and runs the risk of privacy and security violations.
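To make the SQL script approach concrete, here is a minimal sketch that generates a clear-then-insert script from rows kept in version control. The table name, column names, and the buildResetScript helper are all assumptions for illustration, and the quoting is deliberately simplistic; a real script would need to match your schema and handle foreign-key ordering.

```typescript
// A row of seed data: column name -> value. Hypothetical shape for illustration.
type Row = Record<string, string | number>;

// Very simple SQL literal quoting (numbers as-is, strings with '' escaping).
function quote(value: string | number): string {
  return typeof value === "number"
    ? String(value)
    : `'${value.replace(/'/g, "''")}'`;
}

// Builds statements that first clear the table, then insert the rows
// the tests depend on.
function buildResetScript(table: string, rows: Row[]): string[] {
  const statements: string[] = [`DELETE FROM ${table};`];
  for (const row of rows) {
    const entries = Object.entries(row);
    const columns = entries.map(([column]) => column).join(", ");
    const values = entries.map(([, value]) => quote(value)).join(", ");
    statements.push(`INSERT INTO ${table} (${columns}) VALUES (${values});`);
  }
  return statements;
}
```

Note that the column names are baked into the seed rows, which is one way this approach breaks when the underlying data model changes.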

A new approach to test data management

Existing tools and approaches to managing test data each have their own drawbacks, so we created a new test data management tool that implements our wishlist for managing state:

Test data is defined explicitly. I want to know exactly what data the tests depend on.
Test data is version controlled. I want a record of changes to the test data, and I want those changes reviewed.
Test data living in both internal data stores and third-party APIs can be managed. Oftentimes you’ll want to test data that’s coming from a third-party API. Think payment flows (Stripe API) or add-to-cart flows (Shopify API). Your tests are taking dependencies on that data, too.

We built an open-source library as a new alternative for managing test data. It’s called tdm, and it runs alongside your end-to-end testing tool (be it Reflect, Cypress, or something else) to get your application into the state that your tests expect.

tdm operates like a Terraform for test data; you describe the state that your data should be in, and tdm takes care of putting your data into that state. Rather than accessing your database directly, tdm interfaces with your APIs. This means that the same approach to managing your first-party data can also be used to manage test data in third-party APIs. Test data is defined as fixtures that are checked into source control. These fixtures look like JSON but are actually TypeScript objects, which means your data gets compile-time checks, and you get the structural-typing goodness of TS to wrangle your data as you see fit. (Side-note: Here’s an article we wrote about why we decided to use TypeScript instead of JSON for the fixtures.)
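To give a feel for what typed fixtures buy you, here is a small sketch. The ProductFixture interface and its field names are hypothetical and do not reflect tdm’s actual fixture format; the point is only that fixtures written as TypeScript objects are checked by the compiler and can be derived from one another.

```typescript
// Hypothetical fixture shape for illustration; not tdm's actual fixture format.
interface ProductFixture {
  sku: string;
  name: string;
  priceCents: number;
  inventory: number;
  sizes: string[];
}

// A misspelled field or a string where a number is expected
// would be rejected at compile time.
const baseShirt: ProductFixture = {
  sku: "SHIRT-001",
  name: "Test Shirt",
  priceCents: 2500,
  inventory: 100000,
  sizes: ["S", "M", "L"],
};

// Because fixtures are ordinary objects, variations can be derived in code.
const outOfStockShirt: ProductFixture = {
  ...baseShirt,
  sku: "SHIRT-002",
  inventory: 0,
};

const productFixtures: ProductFixture[] = [baseShirt, outOfStockShirt];
```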

Similar to Terraform, you can run tdm in a dry-run mode to first check what changes will be applied, and then run a secondary command to apply those changes. With this “diffing” approach, any data that’s generated by the tests themselves gets cleared out for the next run.
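The plan-then-apply idea can be sketched in a few lines. This is not tdm’s implementation or API; Item, planChanges, and applyChanges are hypothetical names, and the “current state” is a plain array here rather than something fetched from an API.

```typescript
// Hypothetical data shape for illustration.
interface Item {
  id: string;
  name: string;
  inventory: number;
}

type Change =
  | { action: "create"; item: Item }
  | { action: "update"; item: Item }
  | { action: "delete"; id: string };

// "Dry run": compute what would change without changing anything.
function planChanges(desired: Item[], current: Item[]): Change[] {
  const currentById = new Map(current.map((item) => [item.id, item] as const));
  const desiredIds = new Set(desired.map((item) => item.id));
  const changes: Change[] = [];

  for (const item of desired) {
    const existing = currentById.get(item.id);
    if (!existing) {
      changes.push({ action: "create", item });
    } else if (existing.name !== item.name || existing.inventory !== item.inventory) {
      changes.push({ action: "update", item });
    }
  }
  // Anything not described by the fixtures (for example, data created
  // by a previous test run) is scheduled for deletion.
  for (const item of current) {
    if (!desiredIds.has(item.id)) {
      changes.push({ action: "delete", id: item.id });
    }
  }
  return changes;
}

// "Apply": produce the new state from the planned changes.
function applyChanges(current: Item[], changes: Change[]): Item[] {
  const byId = new Map(current.map((item) => [item.id, item] as const));
  for (const change of changes) {
    if (change.action === "delete") {
      byId.delete(change.id);
    } else {
      byId.set(change.item.id, change.item);
    }
  }
  return Array.from(byId.values());
}
```

After applying a plan, running planChanges again against the resulting state yields an empty plan, which is what makes the approach repeatable between test runs.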

We’re hoping that tdm helps software teams make their tests simpler and less flaky by making it easy to manage the underlying state of the application. Contributions are very much welcome!