Developer's Guide to Regression Testing

Our comprehensive guide to creating automated test suites for web applications that aren't a pain to maintain.

Who is this guide for?

This is a guide for developers who want to learn how to effectively build and maintain automated regression tests for web applications. We wrote this guide because:

Who we are

We’re the creators of Reflect, a no-code tool for creating automated regression tests for web applications. We’ve talked to hundreds of companies about testing, and have some admittedly strong opinions about how to go about building test automation.

But don’t worry, this guide is written to be framework-agnostic, so whether you use our product or some other solution for regression testing, you’ll be able to apply all of the principles we’ve outlined here.

How to use this guide

This guide is written as a reference so you can read any section without having to read all prior sections.

If you’ve never written automated end-to-end tests before, we recommend reading the Getting Started section in its entirety. This will give you a good foundation on regression testing, educate you on the misconceptions about this type of testing, and help you understand the common mistakes made when building a regression test suite.

Getting Started

Success requires a different approach

If you take away one thing from this guide, it should be: Don’t treat end-to-end tests like unit tests. It’s tempting to write an end-to-end test the same way you’d write unit tests, but they are completely different beasts. A different approach is required for you to effectively write end-to-end tests. Why, you ask?

Well, the reality of regression testing is that you’re going to be driving one of the most complex pieces of software ever built: the web browser. Browsers support thirty years’ worth of standards, each implemented by every browser vendor to a slightly different approximation of the spec. On top of those standards they execute JavaScript frameworks and third-party plugins that layer on their own abstractions, all compiled via language transpilers, CSS-in-JS libraries, tree-shakers, and minifiers, with the whole works delivered over a network connection.

Atop this mountain of complexity lies the custom code that you want to test.

Because of the complexities mentioned above, regression tests should be treated as black-box tests. The key to making tests maintainable is to ensure they stay repeatable and are resilient to changes in the underlying state over time. We recommend making as few assumptions about the underlying implementation as possible. For example, mocks and fakes, which are common in unit testing, should be avoided. Instead, you should make reasonable assumptions about the state of your application. A reasonable assumption about your application state may be that a test user exists with a given username and password. Making this assumption allows you to test functionality using that user account, and lets you avoid starting the test by creating a brand new user via your application’s sign-up flow.

Managing Complexity

Keep Tests Small

We recommend keeping tests as small as possible, but no smaller. What this means is that a test should replicate how real users interact with your web application, including all of their logically-related actions, but no additional, unrelated actions. Sometimes this requires a liberal use of visual assertions to verify that the site looks like you expect. What we advise against is chaining actions into a single test which could be split up into separate tests. This has the dual advantage of making your tests run faster (if you’re running them in parallel), as well as making the issue clearer when a test run fails.

Don’t Repeat Yourself

If you duplicate the same workflows across multiple tests, you’ll need to make the same updates across those tests when your application changes. Workflows that appear in multiple tests should be managed in a single location and referenced by other tests. Other workflows should be tested once and not duplicated.

Create tests that are repeatable

Automated tests need to be able to run over and over without changes.

The common cause for a test not being repeatable is underlying data changes that invalidate assumptions about the state of the application. Examples of this include:

Another common issue with repeatability is due to the timing of tests. Examples here are:

Repeatability issues can also occur in scenarios that require unique names or values, such as:

Create tests that can run in parallel

End-to-end tests run much slower than unit and integration tests. To keep test suites running in a reasonable amount of time, you should make tests small, and run those tests in parallel. This brings us to the second most important takeaway that we hope folks get from this guide: factor your tests so they can run in parallel, and run in an arbitrary order. This guidance sounds easy, but it takes work to make it happen. A lot of the recommendations in this guide are geared towards making your tests able to run in parallel and in an arbitrary order.

When designing your tests with parallelism in mind, one key thing to remember is to make sure your tests are not modifying state that’s depended upon by other tests.

Running tests in parallel is the single best way to decrease the running time of your test suite. This is critical in order to ensure you get feedback as quickly as possible.

Parallelism can take what was a 20-minute test suite down to 5 minutes or less with minimal work. Note however that your tests must be factored so that running tests in parallel and in an arbitrary order is possible. The key principle to making a test parallelizable is to ensure that the actions in that test do not affect other tests. For example, if you have a test that modifies an existing user, you should ensure that the modification is not depended upon by some test that normally executes after it.

A good approach to ensure tests are parallelizable is to have tests that mutate state do so on records that are newly created. Take as an example a test where you can modify an existing record within your application. The straightforward approach to implementing this is to find an existing record, modify it, and then verify the modification happened. But what if that record is depended upon by another test? Perhaps you modified the name of the record, and another test is looking up that record by name and no longer finds it. Or perhaps another test is verifying the ‘last modified date’ of the record, and that date now is updated every time you run this test. If you’re going to modify existing records, you should ensure that that modification doesn’t break other tests.

An alternative approach to implementing this test would be to first create a new record and then immediately modify the record. This test will take longer to execute since it’s creating a new record, but the advantage here is that the test is much more likely to not break other tests. Another advantage to this approach is that you’re working with a clean slate on every test run. It’s no problem if the test fails in the middle of the run and leaves a half-created record in some unexpected state. This is not the case for a test that modifies an existing record. If the test fails in the middle of the run, it may leave that record in an unknown state that you’ll need to fix up manually before your tests can pass again.

Use stable selectors/locators

Selectors (sometimes called locators) are the method by which most test automation tools identify an element to interact with. There are two common kinds of selectors: CSS selectors and XPath expressions.

We recommend using CSS selectors, as they are easier to read and will be more familiar to front-end developers.

The process of manually creating selectors is time-consuming. Selectors are also the cause of a lot of false-positive failures. This is because as the application changes, selectors may no longer target the right element, or may not target any element at all. Some automation frameworks can automatically build selectors for you. Alternatively, if you’re building your selectors manually, one way to reduce the chance of stale selectors is to add attributes to the markup in your application that are used exclusively by the selectors in your tests. By adding these attributes, it will make it more obvious to developers that a given change may break existing automated tests, and it will give your automation an explicit, and hopefully static, element identifier.

The most common attributes used for this purpose are data-testid, data-test, and data-cy.
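As an illustration, suppose your markup carries a test-only data-testid attribute; the test then targets the element with an attribute selector rather than a brittle class-based or positional one. The markup and attribute value below are hypothetical, and the tiny parser is just a stand-in for a browser’s querySelector:

```python
from html.parser import HTMLParser

# Hypothetical markup: styling classes may churn, but the test-only
# data-testid attribute stays stable.
MARKUP = ('<form><button class="btn btn-primary" '
          'data-testid="submit-order">Place order</button></form>')

# The stable attribute selector a test would use:
SELECTOR = '[data-testid="submit-order"]'

class TestIdFinder(HTMLParser):
    """Minimal stand-in for querySelector that matches only on the
    data-testid attribute, ignoring classes and element position."""
    def __init__(self, test_id):
        super().__init__()
        self.test_id = test_id
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("data-testid") == self.test_id:
            self.found_tag = tag

finder = TestIdFinder("submit-order")
finder.feed(MARKUP)
# finder.found_tag now identifies the element, regardless of styling.
```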

Isolate data that’s touched by automated tests

The users and accounts in your application that are used by test automation should not be used for any other purpose. This is because users can end up modifying the underlying state of workflows that are tested by your automation, invalidating assumptions about that state and causing your tests to fail. These types of failures can be particularly hard to debug because modifications to the state often aren’t obvious, and there may not be an audit trail pointing to how the data was modified. For example, updating a record on the account may cause that record to appear at the top of a list sorted by recent edits, which in turn can cause tests that make assumptions about the ordering of that list to fail.

To avoid these issues, separate manual and automated testing at the highest level possible. Separate environments are best, separate accounts within the same environment is good, and anything finer grained than that is to be avoided.

Testing As Part of Your Development Process

Once you’ve built up your initial test suite, you’ll want to make a plan for updating and maintaining the suite over time. Your plan should cover the following items:

  1. Who is responsible for maintaining tests. Smaller teams can identify an owner or small set of owners for the entire suite. Larger teams are usually split up into different feature areas; you’ll want to split ownership so feature teams own their respective tests.
  2. What tests are added as part of new feature development. A process should exist that has teams identify what tests, if any, should be added when a new feature is released.
  3. What tests are added when bugs occur. If your team does post-mortems, then adding an end-to-end test should be one of the action items that are considered when writing a post-mortem.

Running the same tests across different environments

If you have multiple non-production environments (such as a QA environment plus a staging environment), you should consider running at least a small subset of your test suite in those additional environments whenever a deployment occurs.

When running tests in a separate environment, you’ll need to account for changes in the environment. For example, if your tests involve authenticating as a user, you’ll likely need to authenticate with different credentials vs. your normal test environment. Your test will also need to account for the application hosted using a different hostname.
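One way to handle these differences is to centralize per-environment values in a single configuration lookup that tests read at startup. A minimal sketch, where the hostnames and credentials are placeholders:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvConfig:
    base_url: str
    username: str
    password: str

# Hypothetical environments; hostnames and credentials are placeholders
# for your own (ideally pulled from a secrets store, not source code).
ENVIRONMENTS = {
    "qa": EnvConfig("https://qa.example.com", "qa-testuser@example.com", "********"),
    "staging": EnvConfig("https://staging.example.com", "stg-testuser@example.com", "********"),
}

def config_for(env_name: str) -> EnvConfig:
    """Resolve the base URL and credentials for the target environment."""
    if env_name not in ENVIRONMENTS:
        raise ValueError(f"No test configuration for environment: {env_name}")
    return ENVIRONMENTS[env_name]
```

Tests then build URLs from config_for(env).base_url instead of hard-coding a hostname, so the same suite runs unchanged against QA or staging.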

End-to-end tests in a microservice environment

Many applications today are built using a collection of microservices. This presents a challenge for end-to-end testing, as a change to any microservice could cause a regression to an end-to-end workflow.

You could run your full end-to-end test suite after every microservice deployment, but this may not be feasible or desirable depending on how many services are in your architecture, and how often you deploy them.

An alternative approach is the following:

Tools like Nx can determine which parts of your codebase are affected by a given changeset, giving you the ability to kick off only the subset of your test suite that covers the affected features.

Synthetic testing in production

One advantage of having an end-to-end testing suite is that it can also be used as a kind of monitor in production. This type of testing is called synthetic testing, and it’s a hybrid of automated testing and monitoring. We recommend taking your most critical and high-trafficked workflows (sign-in, sign-up, placing an order) and running them on a schedule in production. Having a small subset of your regression suite run as synthetic tests in production can be a great way to augment a traditional monitoring suite and get additional mileage out of your test automation efforts.

Running regression tests on every pull request

This is sometimes referred to as “shift-left” testing, meaning that if you consider the software development lifecycle as a process defined from left to right, “shifting left” moves testing closer to the point of development. Open source tools like Docker and Kubernetes, PaaS features like Heroku Review Apps and Vercel Preview Environments, and fully-managed tools like Render make spinning up infrastructure on every PR possible.

Common Scenarios

Testing a login workflow

Your sign-in page is among the simplest workflows to test, as well as a very critical workflow to have test coverage for. We recommend starting with a single happy-path test that verifies that a user can successfully log into your application. You should create a user specifically for this test. This user should either be in an environment that has no real production data, or it should be in an isolated account with no ability to view real customer data. See the Test Data Management section for more info.

In addition to running this test as part of your post-deployment test suite, tests of your sign-in workflow are also a good candidate for synthetic testing in production since it’s a relatively easy test to implement and isn’t going to create dummy data or potentially inflate your usage metrics compared to something like a test of your new user registration workflow.

When implementing a login test, be sure to include a final assertion at the end of the test that validates that you are successfully logged in. This may seem obvious, but it’s not enough to validate that you can enter in your username and password and click the Login button. You need to also validate that the user is logged in successfully. Be sure that the assertion you’ve added will only be true if the user is in fact logged in. We recommend validating something like an avatar or user’s name on the subsequent page, as that would only appear if the user is logged in, and it uniquely identifies the user.

A common shortcut is to instead validate the presence of an authentication token. In Java servlet applications this is usually a cookie called JSESSIONID; third-party authorization tools like AWS Cognito and Auth0 store JSON Web Tokens (JWTs) in Local Storage. We don’t recommend this approach because it violates the principle of validating things from the user’s perspective; the validation is too implementation-specific. Not only could this cause false positive failures when the underlying implementation changes (such as if you upgrade your authentication library), but it will miss bugs where the authorization cookie looks good but the experience is broken for the user, such as:

Testing the authorization endpoint directly

Verifying the endpoint used to authenticate a user is a different test than verifying that your login workflow is working. The former case is easier to implement and gives you test coverage for a specific API endpoint. If you’re using a third party authentication library, this endpoint may be a third party endpoint.

While there is value in this type of test, note that it is really an integration test or API test: just like in the previous example, any bugs occurring outside of the API endpoint itself will not be caught with this approach.

Testing a signup workflow

New user registration is often a critical workflow to the business, and should be near the top of the list when building your automation test suite. There are a few items however that can present difficulties in automating this workflow:

Each test needs to create a unique user

User registration workflows only allow you to register an email address or username once. This means that if you were to attempt to register the same email address in an automation script, it would fail on the 2nd run and on any subsequent run. The fix here is to register with a unique email address every time. We recommend using a common prefix such as ‘testuser’ with a random series of characters before the ‘@’. This will make it easier for you to find and clear out test users in your database should you need to do so.
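A minimal sketch of such a generator, where the prefix and domain are placeholders you would adapt:

```python
import secrets

def unique_test_email(prefix: str = "testuser", domain: str = "example.com") -> str:
    """Generate a registration email that is unique on every test run.
    The shared prefix makes test users easy to find and purge later."""
    return f"{prefix}-{secrets.token_hex(4)}@{domain}"
```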

You may need to contend with captchas

Captchas are simple puzzles that are easy for humans to solve, and very difficult for computers to solve in an automated way. For tips on how to work around captchas, see our Handling Captchas section.

You may need to verify your email address

Many user registration workflows require the user to verify their email address before completing the registration process. This can be difficult to automate because it requires the test to wait for an email to arrive, parse the email, and then either click a link within the email or copy a verification code into the web application. We recommend using a tool that has email testing built in, as the alternative is to script test steps that log into a web email provider and extract the email from there, which is a very brittle and error-prone approach.

Testing a date picker / calendar widget

Date pickers and calendar widgets can be difficult to automate because the dates they present change over time. We recommend two strategies here: a click-driven one, which steps through the calendar UI to reach a target date, and an input-driven one, which types a computed date value directly into the field.
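A sketch of the input-driven strategy: compute the date relative to “today” so the test stays repeatable no matter when it runs, then type the result into the field. The date format is an assumption; match whatever your widget accepts:

```python
from datetime import date, timedelta

def relative_date_string(days_from_today: int, fmt: str = "%Y-%m-%d") -> str:
    """Compute the value to type into a date field relative to 'today',
    so the test stays valid no matter when it runs."""
    return (date.today() + timedelta(days=days_from_today)).strftime(fmt)

# e.g. a hypothetical booking test might fill its fields with:
checkin = relative_date_string(7)    # one week out
checkout = relative_date_string(10)  # three nights later
```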

Testing workflows using Google sign-in

Testing Google auth workflows is tricky, but possible. To ensure Google does not detect and block the automation, you should set up your Google Workspace account in the following way:

  1. Use a GSuite user rather than a personal Gmail account when testing. Likely to mitigate spam-related abuse, Google will block logins from personal accounts when it detects they’re being driven by automation, but this is not the case for GSuite users.

  2. For your GSuite user, you’ll need to update a setting within Google Apps to disable some security checks for the user you’re signing in as. This is because when testing Google sign-in, Google may sometimes detect that it’s running under automation and block the sign-in with a warning message.


You should be able to allow automation to work properly by logging into that Google account and lowering its security settings. Note that we recommend using a GSuite account that is only used for testing, since you’re going to be reducing the security levels for that account.

Testing role-based access controls (RBAC)

Many applications support role-based access for various features in their app. To design tests for role-based access controls (sometimes referred to by the acronym RBAC), start by listing out each user role supported in your application, and identify what features are accessible for each role. The table below illustrates access controls for our example application:

Role View Widgets Edit Widget Manage Users Manage Billing

Next, create two tests for each feature, with one test verifying that the feature can be exercised, and one test verifying that the feature is not accessible. Each test should be factored such that the inputs that differ for each user role is overrideable.

Finally, we’ll create a spreadsheet where each row represents a role-feature permutation to test, and each column represents the inputs (and optionally assertions) that should be overwritten for that permutation.

Test Name                | Username | Password
-------------------------|----------|---------
View Widgets - Success   |          | ********
Edit Widget - Success    |          | ********
Manage Users - Success   |          | ********
Manage Billing - Success |          | ********
View Widgets - Success   |          | ********
Edit Widget - Success    |          | ********
Manage Users - Success   |          | ********
Manage Billing - Blocked |          | ********
View Widgets - Success   |          | ********
Edit Widget - Success    |          | ********
Manage Users - Blocked   |          | ********
Manage Billing - Blocked |          | ********
View Widgets - Success   |          | ********
Edit Widget - Blocked    |          | ********
Manage Users - Blocked   |          | ********
Manage Billing - Blocked |          | ********
View Widgets - Blocked   |          | ********
Edit Widget - Blocked    |          | ********
Manage Users - Blocked   |          | ********
Manage Billing - Blocked |          | ********

This type of testing, known as data-driven testing, is very powerful since it allows you to test many different scenarios with relatively few tests. In the example above we need only create eight tests in order to test 20 different permutations of user roles + features.

In terms of downsides, data-driven tests are more complex to set up: you’ll need a way to parse a spreadsheet or CSV, kick off tests for each row, and factor each test to be overridable. These tests also add significantly to the runtime of your test suite, especially when you’re testing many permutations, and potentially across browsers.
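A minimal sketch of that parsing-and-dispatch plumbing. The column names and usernames here are hypothetical, and run_test stands in for however your framework launches a parameterized test:

```python
import csv
import io

# Two rows of the permutation spreadsheet as CSV (passwords elided;
# usernames are hypothetical).
PERMUTATIONS_CSV = """test_name,username,password,expect
View Widgets,admin-user,********,Success
Manage Billing,viewer-user,********,Blocked
"""

def load_permutations(csv_text: str) -> list:
    """Each row becomes the input overrides for one test run."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def run_permutation(row: dict, run_test):
    """Dispatch one role/feature permutation to the matching test,
    overriding its credentials and expected outcome."""
    return run_test(row["test_name"], username=row["username"],
                    password=row["password"], expect=row["expect"])
```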

Localization testing

Testing localized text can be done with a data-driven approach similar to testing RBAC. First, create a series of tests that validate various text in your application using the application’s default language. Next, create a set of overrides for each language that replace the expected values in the default language with the text localized to another language. Triggering an application to render text in a different language usually involves either running the test against a locale-specific domain or subdomain (for example, a Mexico-specific domain to verify Spanish text within the Mexico locale), or passing an Accept-Language HTTP header for the locale you want to test (e.g. es-mx).
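A sketch of the header-based approach, with illustrative (not authoritative) translations standing in for your application’s real strings:

```python
# Expected strings per locale; a data-driven localization test swaps
# these in as overrides for the default-language assertions.
# Translations here are illustrative placeholders.
EXPECTED_HEADING = {
    "en-us": "Welcome back",
    "es-mx": "Bienvenido de nuevo",
}

def request_headers(locale: str) -> dict:
    """Headers to send so the application renders in the target locale."""
    return {"Accept-Language": locale}
```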

Testing one-off integrations with simple smoke tests

Many companies offer integrations that can be embedded directly into a customer’s own web application. These types of integrations often provide different types of customizations, either via a set of options that the customer can choose themselves, or via more sophisticated customizations that could be unique to each customer.

Having test coverage for one-off integrations straddles the line between testing and monitoring, and can be a powerful approach for getting coverage for actions that are more complex than a simple check of an HTTP endpoint. We recommend keeping these tests simple: Make very few simple checks in each test so that you can sanity check that each integration is working without having the tests be a maintenance burden. More comprehensive testing should be done on an example integration that you completely control.

Bypassing authentication

You’ll likely have many tests that require a user to log in. One performance optimization to speed up your tests is to bypass the username and password entry and start each test as a logged in user. How you implement this depends on how authentication works on your application. Your application may support the ability to pass a short-lived token in a query parameter that serves to log in the user. Otherwise you could set the session to be authenticated by setting the proper cookies or session/localStorage values that represent a logged-in user.

The downside of this approach is that you’ll need to pass in new values each time you run your test, as the authenticated state usually becomes invalid after a short period of time.

If you’re bypassing authentication in your tests, we recommend having an isolated test of your full user authentication flow so you have end-to-end coverage for this critical workflow.

Handling multi-user test scenarios

Testing scenarios that involve multiple users can be challenging. Consider a multi-user workflow that involves one user requesting approval, and another user at a different permission level approving or denying the request. Here are two ways we could set up this test:

Running the workflow sequentially

This is usually what people default to. With this approach, the test would follow these steps:

There are a couple of problems with this approach:

A more maintainable approach here would be splitting this one large test into three separate tests:

We recommend combining this with a test data management approach that makes the following DB updates prior to running the test suite:

One additional note: to minimize the chances of missing bugs between the state changes in the tests, we recommend ending each test with a validation that the state change has completed successfully, such as a visual check that the item now shows up with the new status displayed on the page.

Running each user workflow in parallel

Consider a different example where you’re testing an interactive chat program. One key test is to validate that one user can post a message, and another user can see that message. Running two tests in parallel can be an effective way of testing this type of workflow.

Handling captchas

Captchas are questions or challenges that are easy for humans to solve, but very difficult for machines to solve. You’ll often find these in things like registration workflows to prevent spammers from creating accounts. Due to the nature of how captchas work, they cannot be automated. Instead, you’ll need to either disable captchas in the environment you’re testing, or have some mechanism for conditionally disabling captchas for test accounts.

If you are using Google’s reCAPTCHA, you can use a separate key for testing environments that automatically passes all verification requests. For more information about enabling automated tests in your environments with reCAPTCHA, read Google’s guide.

Handling A/B tests

Like captchas, the best way to avoid A/B tests causing problems with your automation is to disable them in your test environment. Otherwise, most A/B testing frameworks will have some option for disabling A/B tests entirely, usually by setting a cookie value at the outset of your test. If neither of these options is available, the next best option is to preset whether the user gets the A or B variant in each running test. This can usually be done by copying the cookie value controlling the A/B tests from an existing browser session.

Handling sites with HTTP basic auth

HTTP Basic Authentication is sometimes used as a means of preventing test environments from being publicly accessible. A page with HTTP Basic Authentication enabled will prompt the user to enter a username and password using a native browser dialog.

Rather than attempting to automate the entry of the username and password into the dialog, you can bypass it by including the username and password in the URL of the site. Including the credentials in the URL directly should grant access and prevent the dialog from appearing for the duration of the browser session.
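A small helper can build such a URL from the site address and credentials, taking care to percent-encode characters that aren’t allowed in the userinfo component. The hostname and credentials below are placeholders:

```python
from urllib.parse import quote, urlsplit, urlunsplit

def with_basic_auth(url: str, username: str, password: str) -> str:
    """Embed credentials in the URL's userinfo component so the browser
    skips the HTTP Basic Auth dialog (e.g. https://user:pass@host/path)."""
    parts = urlsplit(url)
    # Percent-encode the credentials so characters like '@' are safe.
    netloc = f"{quote(username, safe='')}:{quote(password, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```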

Visual Testing

Whereas functional regressions are bugs that affect the functionality of your site, such as a button that shows an unexpected error when clicked, visual regressions are bugs that affect the look and feel of your application. These can be just as common as functional issues and have a similar level of impact to your users, but strangely most testing frameworks provide no facilities for getting test coverage for this class of issues.

We recommend using a testing tool that has built-in support for visual testing, or adding on a third-party library that is purpose-built for visual testing.

Adding visual checks to your tests

Imagine you are documenting a manual test case for an add-to-cart workflow. A simplified version of that test case might look like the following:

When translating this into an automated test, each of these “Observe” steps could be represented as visual checks. This gives you coverage for changes to the product name, quantity, and price.

Additionally, you could add visual checks for other things that you would likely be checking when manually testing, but that are not documented in the manual test case. Things like:

What is not worth visually validating are the elements on the page that are not pertinent to the test, and there are a lot of them: things like the navigation bar, the footer, the sidebar, etc. Validating these elements leads to noisy tests that fail for reasons you probably don’t care about, such as the ordering of categories in the sidebar changing.

That is why for visual tests we recommend only validating the elements most pertinent to the test, and to not have the same visual assertions across multiple tests. For instance if you are going to validate the top navigation bar, do it in a single test and not on every test that interacts with the top nav. It is for this reason that we also recommend not using solutions that screenshot the entire browser window (these also throw false positives due to slight changes in viewport scroll position), and instead use tools that screenshot specific elements on the page when looking for visual differences.

Using visual checks as smart “wait” steps

Visual checks can also be used as a signal to the test to wait before proceeding. In the case of the add to cart flow above, imagine that it takes several seconds for the shopping cart widget to appear after clicking Add to Cart. For long-running actions like this you may be tempted to add a hard-coded wait step that tells the test runner to ‘sleep’ for a fixed number of seconds. This is a bad approach: in the best case your test waits longer than necessary to proceed to the next step, and in the worst case the time you chose to sleep won’t be long enough and the test will fail because the operation hasn’t completed yet.

Instead of hard-coded sleeps, you should set up your tests to wait for a specific element to appear in the DOM, or wait for an element to have the expected visual state before proceeding. The former is a good approach when you know that the element will only appear after the long running operation is complete. The latter approach is better if the element changes its visual state while the long running operation is going on, such as a button which appears with a disabled state until the operation is complete.
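To make the contrast with hard-coded sleeps concrete, here is a minimal polling helper. In practice your test framework almost certainly ships built-in waits you should prefer; this just illustrates the poll-until-ready-or-timeout shape:

```python
import time

def wait_until(condition, timeout_s: float = 10.0, interval_s: float = 0.25) -> bool:
    """Poll 'condition' until it returns truthy or the timeout elapses.
    Unlike a fixed sleep, this proceeds as soon as the app is ready,
    and fails loudly (rather than flakily) when it never is."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval_s)
    raise TimeoutError(f"condition not met within {timeout_s:.1f}s")
```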

Test Data Management

Reasoning about test data management

In our Getting Started section, we talked about how your end-to-end tests should make assumptions about the state of the application, and not the implementation of the application. The more you can safely assume about the state of your application, the simpler your tests become. Actively managing the state of your application via an approach called test data management is an effective way of keeping your tests maintainable and repeatable over time.

With test data management, you are writing automation that directly updates the underlying data used in your application. You can think of this as taking the concept of managing state one step further: instead of just making assumptions about the state, you are actively defining what that state is.

Managing first-party test data with DB calls or API calls

You can manage your test data in several ways:

  1. Restore your database from a snapshot before executing your tests. This is a heavy-handed approach and works best for ephemeral environments that are used for testing on every Pull Request.
  2. Run a script to reset a subset of data prior to executing your tests. We recommend this approach if you’re testing against a long-living environment like a QA or Staging environment.
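A sketch of the second approach, using SQLite for illustration; your schema, seed data, and the 'testuser-' naming convention are all assumptions to adapt:

```python
import sqlite3

# Pre-suite reset script: wipe and re-seed only the rows owned by the
# automation, leaving all other data in the environment untouched.
SEED_USERS = [
    ("testuser-1@example.com", "Test User One"),
    ("testuser-2@example.com", "Test User Two"),
]

def reset_test_data(conn: sqlite3.Connection) -> None:
    with conn:  # single transaction: the reset is all-or-nothing
        conn.execute("DELETE FROM users WHERE email LIKE 'testuser-%'")
        conn.executemany(
            "INSERT INTO users (email, name) VALUES (?, ?)", SEED_USERS)
```

Because the script only deletes rows matching the automation’s naming convention, it is safe to run against a long-living shared environment before every suite execution.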

Managing third-party test data with API calls

Your application likely has integrations with third-party providers for things like authentication, billing, etc. Managing test data within third party systems is more challenging than systems you control. You can use the provider’s underlying APIs to manage the data.

Set up data before tests run, don’t tear down data after tests finish

A frequent pattern in unit testing is to have a ‘setup’ phase that sets up state before the tests run, and a ‘teardown’ phase that cleans up state after the tests run. For end-to-end tests we recommend only setting up state during setup and ditching the ‘teardown’ concept. The reason is that if tests fail, skipping teardown allows you to view the application in its failed state, either by using the application directly or by querying the database. Running a teardown script to clear state can remove crucial information needed to reproduce, diagnose, and root-cause any bugs found by your automation.

Frequently Asked Questions

What kinds of workflows should you be testing in a regression test?

Focus on the critical, high-traffic workflows that your users depend on, such as signing in, signing up, and placing an order. Creating effective tests for these workflows will reduce bug counts and will make you feel like you’re making progress in increasing the quality of your software.

What shouldn’t you be testing in a regression test?

Avoid creating tests that would be better served as integration tests, or API tests, or pure monitoring/liveness scenarios that are not interacting with the application. Good end-to-end tests should closely mimic how a user interacts with your site. Initiating network calls directly or validating network responses and status codes are signs that your test would be better served as an integration test using a tool like Postman.

Should I still be writing unit tests and integration tests?

Yes, end-to-end tests aren’t a replacement for unit and integration tests. You should create all three types of tests so that you have defense in depth for bugs, and so that you’re testing assumptions and requirements at different levels of your application architecture.

When should you run your regression test suite?

Start by running your tests in the first environment that code reaches after it is merged. Ideally you’ll run these tests automatically after every deployment to this environment. Once your tests are stable, expand them to run in other environments. Running end-to-end tests on every Pull Request is a powerful concept but is harder to pull off.

Get started with Reflect today

Create your first test in 2 minutes, no installation or setup required. Accelerate your testing efforts with fast and maintainable test suites without writing a line of code.

Copyright © 2022 Reflect Software Inc. All Rights Reserved.