Why code-first tools are failing

Automated testing is now relatively mainstream. In the United Kingdom, where I am based, any decent-sized project can expect to have team members dedicated to writing coded automation tests. The job of this new breed of developer is to test the correctness of the applications being built by writing code that uses an existing test framework. The resulting tests are then run in a continuous integration environment whenever the application code is modified. The goal is to alleviate the tedious repetitiveness of manual testing and cut our reliance on human beings to execute the tests.

There are now dedicated people to write these tests, and we don’t want them sitting idle. What if we stated in our definition of done that every new feature must have an automated test covering its functionality?

If we followed this doctrine, we would surely see fewer bugs in production and massive savings in the manual testing effort.

I have been a contractor in the United Kingdom for over 20 years, and I have worked in banks, startups and local government. Unfortunately, we are unleashing a world of pain from the gates of hell if we start adding automated tests to test each new feature.

Every end-to-end test comes with a maintenance cost that recurs every time we change the feature it tests. The more tests we have, the more they slow down our ability to adapt quickly to changing requirements.

Latency is the devil

If you test a static web page with no dynamic content, an established testing automation framework will work fine.

Unfortunately, certain conditions are going to make testing trickier, such as:

Asynchronous reads and writes from an API
JavaScript updates the page dynamically
JavaScript or CSS is loaded from a remote server
CSS or JavaScript creates animations
A framework such as React, Angular or Vue renders the HTML

Each of the conditions above will introduce latency, and things will not happen all at once. The average modern web page has a high level of asynchronicity.

If we start writing test helpers as we have below, then our troubles are only just beginning:

const el = this.$(selector);
waitFor(el, 2000);

In the example above, if the element is not present in the browser after 2000 milliseconds, an exception is thrown.

It is often the case that we need a perfect storm to occur for all our wait conditions to finish at the right time. In isolation, one waitFor condition is fine, but there are many things that we might need to wait for, such as asynchronous JavaScript calls or CSS animations. On top of that, the tests are constantly running on a continuous integration server that can be much slower than our high-powered laptop.
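To make the timing problem concrete, here is a minimal sketch of what a polling wait helper might look like. Both `waitForCondition` and `elementAppearingAfter` are hypothetical names invented for this illustration; they are not part of Selenium, Cypress, or the `waitFor` helper shown earlier.

```javascript
// Hypothetical helper: polls `predicate` until it returns true or the timeout elapses.
function waitForCondition(predicate, { timeoutMs = 2000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const poll = () => {
      if (predicate()) {
        resolve();
      } else if (Date.now() >= deadline) {
        reject(new Error(`Condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(poll, intervalMs);
      }
    };
    poll();
  });
}

// Simulates an element that only "appears" after `delayMs`,
// for example once an API call has returned and the page has re-rendered.
function elementAppearingAfter(delayMs) {
  let present = false;
  setTimeout(() => { present = true; }, delayMs);
  return () => present;
}
```

If the simulated element appears after 20 ms on a fast laptop but after 300 ms on a loaded build agent, a 50 ms timeout passes in the first case and rejects in the second, even though the application code is identical.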

We may need a brave, new world for every test

Another problem when testing against real data in an actual database is that we might need specific data to be present to execute the current test. After we run the test, there is a high chance that the test will have changed this data. Changing the data might break other unrelated tests, and if we want this test to run again, the data will somehow need to be reset to its original state.

Resetting a database for one or two tests is fine. But what if we have tens or even hundreds of tests?
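As a rough illustration of the reset problem, the sketch below uses a hypothetical in-memory store in place of a real database. The name `createTestStore` and its methods are assumptions made up for this example, not an API from any testing library.

```javascript
// Hypothetical in-memory stand-in for a database, used only for illustration.
function createTestStore(seed) {
  // Copy the seed rows so that tests never mutate the original fixture data.
  const copySeed = () => seed.map((row) => ({ ...row }));
  let rows = copySeed();
  return {
    all: () => rows,
    insert: (row) => { rows.push(row); },
    // Restore the known starting state so the next test is not affected.
    reset: () => { rows = copySeed(); },
  };
}

const store = createTestStore([{ id: 1, title: 'In the Garden of Beasts' }]);

// "Test A" changes the data ...
store.insert({ id: 2, title: 'The Devil in the White City' });
const countAfterTestA = store.all().length;

// ... so the state has to be reset before "test B" can rely on it.
store.reset();
const countBeforeTestB = store.all().length;
```

In memory the reset is nearly free. Against a real database, each reset typically means clearing and reseeding tables, and that cost is paid once per test.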

Enter the infamous flake

Non-determinism is probably the most challenging thing a developer experiences. Running the same test more than once and getting a combination of passes and fails is the stuff of nightmares.

Non-determinism gave birth to the immortal line “it works on my machine”.

Non-deterministic tests have been christened flakes by stressed-out developers and testers. Once you have flakes in your test suite, you can watch confidence in your tests evaporate out of all existence.

Waiting for specific elements or combinations in tests and constantly resetting resources becomes a massive pain on the often under-specified continuous integration server, and non-determinism sprouts like ivy on a wall.

Good developers like to “fix” things, and we might develop many clever methods of “fixing” these problems. These “fixes” might not work, and they can make things significantly worse.

A good test should be atomic and not rely on other tests. I see a lot of examples where existing tests are used to seed the database into a known state. I worked on one project, a browser application where a user completed several forms before submitting them all at once at the end. When writing automated tests for this, the testers wrote a test for each step. Each step needed the database to be in a specific state and, unfortunately, a tester writing a test for step 4 would run the tests for steps 1, 2 and 3 to get the data into the correct state. The tests exploded exponentially and were eventually consigned to the virtual rubbish bin.
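One way to keep such a test atomic is to seed the state it needs directly rather than replaying earlier tests. The sketch below assumes a hypothetical `buildFormState` fixture builder; the field names and values are invented for illustration and do not come from the project described above.

```javascript
// Hypothetical fixture builder for a multi-step form; all names are illustrative.
function buildFormState(upToStep) {
  const answers = {
    1: { name: 'Ada Lovelace' },
    2: { email: 'ada@example.com' },
    3: { address: '12 Example Street' },
  };
  const state = { completedSteps: [], data: {} };
  for (let step = 1; step <= upToStep; step += 1) {
    state.completedSteps.push(step);
    Object.assign(state.data, answers[step]);
  }
  return state;
}

// A test for step 4 seeds steps 1 to 3 directly instead of running their tests.
const stateForStep4 = buildFormState(3);
```

With a builder like this, a failure in the step 2 test no longer takes the step 4 test down with it.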

Established automated testing is not working

The two most established automated testing frameworks that I see in the field are Selenium and Cypress. Both require a developer of some description to write the tests or scenarios. There is no substitute for experience when it comes to writing these tests. Writing end-to-end tests is complex and very easy to get wrong. Unfortunately, I have seen inexperienced testers with minimal development experience thrust into writing these problematic tests.

Anatomy of a code-first test

We can get fancy with our automated tests and present them in something that almost looks like the written word in a style known as BDD or behaviour-driven development.

Scenario: Correct non-zero number of books found by author
  Given I have the following books in the store
    | title                                | author      |
    | The Devil in the White City          | Erik Larson |
    | The Lion, the Witch and the Wardrobe | C.S. Lewis  |
    | In the Garden of Beasts              | Erik Larson |
  When I search for books by author Erik Larson
  Then I find 2 books

These tests are written in a given, when, then style.

The given part describes the state of the world before you begin the behaviour you’re specifying in this scenario. You can think of it as the pre-conditions to the test.
The when section is the behaviour that you’re specifying.
Finally, the then section describes the changes you expect due to the specified behaviour.

The test above looks splendid, so what exactly is the problem?

A code file containing a BDD test like the above is often called a feature file and there is no magic parser that will turn the English prose into an executable test. Instead, every feature file might have an adjacent step file written in a lower-level language such as Java or JavaScript to turn the instructions into executable steps for the test framework of choice. All this code needs to be maintained no matter how good your reuse strategy is. The more code we write, the more code we need to change as our requirements change.
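To give a feel for the glue code involved, here is a deliberately tiny, hand-rolled step registry for the scenario above. It is a sketch only: `defineStep` and `runStep` are hypothetical names and are not the API of Cucumber or any other real BDD framework, and the data table is passed in as a plain array rather than parsed from the feature file.

```javascript
// Hypothetical, hand-rolled step registry; NOT the API of any real BDD framework.
const steps = [];
const defineStep = (pattern, handler) => steps.push({ pattern, handler });

// Finds the first step definition whose pattern matches the phrase and runs it.
function runStep(text, world, table) {
  for (const { pattern, handler } of steps) {
    const match = text.match(pattern);
    if (match) {
      return handler(world, match.slice(1), table);
    }
  }
  throw new Error(`No step definition matches: "${text}"`);
}

// Glue code: one definition for every phrase used in the feature file.
defineStep(/^I have the following books in the store$/, (world, _args, table) => {
  world.books = table;
});
defineStep(/^I search for books by author (.+)$/, (world, [author]) => {
  world.results = world.books.filter((book) => book.author === author);
});
defineStep(/^I find (\d+) books$/, (world, [count]) => {
  if (world.results.length !== Number(count)) {
    throw new Error(`Expected ${count} books, found ${world.results.length}`);
  }
});

// Running the scenario by hand, one phrase at a time.
const world = {};
runStep('I have the following books in the store', world, [
  { title: 'The Devil in the White City', author: 'Erik Larson' },
  { title: 'The Lion, the Witch and the Wardrobe', author: 'C.S. Lewis' },
  { title: 'In the Garden of Beasts', author: 'Erik Larson' },
]);
runStep('I search for books by author Erik Larson', world);
runStep('I find 2 books', world);
```

Even in this toy version, rewording a single phrase in the feature file means finding and updating the matching pattern, which is the maintenance cost described above.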

Another problem here is that the inputs we are asserting against are finite. In the above example, the title and author are examples of inputs. If we want to test more inputs, we must write more code. This test will only pass when these finite inputs are the same each run.

Many tests pass every time because we have written them to pass every time. They make our builds slower and increase our cloud bill as we add more infrastructure for the ever-increasing number of tests.

Are you saying we should not write automated tests?

I have painted a very bleak picture of the world as I see it when using an established automation testing framework such as Cypress or Selenium.

Unfortunately, I have experienced this in the field, but I am not saying we should never write automation tests.

I recommend writing a handful of end-to-end tests that check that the critical business functionality is working, and then using unit tests or in-memory tests to test whatever else needs testing. Manual testing cannot be replaced at the current level of tooling. In time this might change.

What often gets forgotten is the maintenance of these tests. If you have too many, then every change to a feature can break a waterfall of seemingly unrelated tests.

Automated testing can save your neck when used sparingly, but it can break your neck if there are too many.
