
Vespa application system tests

System tests are an invaluable tool, both when developing and when maintaining a complex Vespa application. They are functional tests that run against a deployment of the application package under test. They use its HTTP APIs to execute feed and query operations, and compare the results with expected outcomes. Vespa provides two formalizations of this:

These two frameworks also include an upgrade (or staging) test construct, for scenarios where the application is upgraded and the state in the backend depends on the old application configuration. They also include a production verification test, which is essentially a health check for production deployments. For system and staging tests, the frameworks provide an easy way to perform HTTP requests against a designated test deployment, separating the tests from the deployment and configuration of the test clusters.
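To give a rough idea of what such HTTP requests look like, here is a minimal Python sketch. It builds feed, query, and delete-all requests against a test deployment, using Vespa's `/document/v1` and `/search/` HTTP APIs. The endpoint, the namespace `test`, the document type `music`, and the content cluster name `music` are assumptions for illustration only. In practice, the test frameworks locate the designated test deployment for you.

```python
import json
import urllib.parse
import urllib.request

# Assumed values for illustration; a real test gets the endpoint from its test framework.
ENDPOINT = "http://localhost:8080"
NAMESPACE = "test"
DOCTYPE = "music"
CLUSTER = "music"


def feed_request(doc_id: str, fields: dict) -> urllib.request.Request:
    """Build a Document v1 POST (put) request for one document."""
    url = f"{ENDPOINT}/document/v1/{NAMESPACE}/{DOCTYPE}/docid/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps({"fields": fields}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def query_request(yql: str) -> urllib.request.Request:
    """Build a query request for the /search/ API."""
    return urllib.request.Request(f"{ENDPOINT}/search/?" + urllib.parse.urlencode({"yql": yql}),
                                  method="GET")


def delete_all_request() -> urllib.request.Request:
    """Build a request that deletes every document of the type in the given cluster."""
    params = urllib.parse.urlencode({"selection": "true", "cluster": CLUSTER})
    return urllib.request.Request(f"{ENDPOINT}/document/v1/{NAMESPACE}/{DOCTYPE}/docid?{params}",
                                  method="DELETE")


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running test deployment)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

A test would then call `send(feed_request(...))` followed by `send(query_request(...))`, and compare the returned hits with the expected outcome.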

The rest of this document describes how each of these test categories can be run as part of an imagined CI/CD system for safely deploying changes to a Vespa application in a continuous manner.

System tests

System tests are simply functional tests that verify a deployed Vespa application behaves as expected when fed and queried. Running a system test is as simple as making a separate deployment of the application package to test, and then running the whole system test suite, or just one or a few of those tests.

Each system test should be self-contained, i.e., it should be possible to run each test in isolation, or all tests in any order. To achieve this, a system test should generally start by clearing all documents from the cluster under test. Our sample system tests do exactly this, so take care not to run them against a production cluster.
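The self-containment property can be illustrated with a small Python sketch. Each case below clears the cluster before doing anything else, so the suite passes in any shuffled order. `FakeCluster` is a hypothetical in-memory stand-in for a test deployment; real tests would issue feed, query, and delete requests over HTTP instead.

```python
import random


class FakeCluster:
    """Hypothetical in-memory stand-in for a test deployment."""

    def __init__(self):
        self.docs = {}

    def clear_all(self):
        self.docs.clear()

    def feed(self, doc_id, fields):
        self.docs[doc_id] = dict(fields)

    def query_count(self):
        return len(self.docs)


def feed_one_document(cluster):
    cluster.clear_all()  # start from a known, empty state
    cluster.feed("id:test:music::1", {"title": "A"})
    assert cluster.query_count() == 1


def feed_two_documents(cluster):
    cluster.clear_all()
    cluster.feed("id:test:music::1", {"title": "A"})
    cluster.feed("id:test:music::2", {"title": "B"})
    assert cluster.query_count() == 2


def starts_empty(cluster):
    cluster.clear_all()
    assert cluster.query_count() == 0


CASES = [feed_one_document, feed_two_documents, starts_empty]


def run_suite(cluster, cases, seed=None):
    """Run all cases in a shuffled order; returns the order that was used."""
    order = list(cases)
    random.Random(seed).shuffle(order)
    for case in order:
        case(cluster)
    return [c.__name__ for c in order]
```

Because every case wipes all documents first, the same property that makes the suite order-independent also makes it destructive. This is why it must only target a dedicated test deployment.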

System tests mostly need to be updated because of changes to the application package. More rarely, an upgrade of the Vespa version may also change functionality, but within a major version this should only be new features and bug fixes. In either case, it is a good idea to always run the system tests against a dedicated test deployment before deploying a change to production, whether the change is an upgrade of the Vespa platform or of the application package.

Staging tests

The goal of staging (upgrade) tests is not to ensure the new deployment satisfies its functional specifications, as that should be covered by system tests; rather, it is to ensure the upgrade of the application package and/or Vespa platform does not break the application, and is compatible with the behavior expected by existing clients.

Running a staging test therefore requires more steps than a system test:

  1. First, a dedicated deployment is made with the current setup (package and Vespa version).

  2. Next, staging setup code is run to put the test cluster in a particular state—typically one that mimics the state in production clusters.

  3. When this is done, the deployment is upgraded to the new setup (package and/or Vespa version).

  4. Finally, staging test code is run to verify the cluster behaves as expected post-upgrade.
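The four steps above can be sketched as a small Python orchestration function. `FakeDeployment` is hypothetical and only records what happens. The `setup` and `verify` callables stand in for the staging setup code and the staging test code. The package names and version strings are arbitrary placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDeployment:
    """Hypothetical stand-in for a dedicated staging deployment; records every action."""
    package: str = ""
    vespa_version: str = ""
    log: list = field(default_factory=list)

    def deploy(self, package, vespa_version):
        self.package, self.vespa_version = package, vespa_version
        self.log.append(f"deploy {package}@{vespa_version}")


def run_staging_test(deployment, current, new, setup, verify):
    """Run the staging steps in order; `current` and `new` are (package, version) pairs."""
    deployment.deploy(*current)  # 1. deploy the current setup
    setup(deployment)            # 2. put the cluster in a production-like state
    deployment.deploy(*new)      # 3. upgrade to the new setup
    verify(deployment)           # 4. check behaviour after the upgrade
```

Here, `setup` always sees the old setup and `verify` always sees the new one. That ordering is the whole point of a staging test.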

As an example, consider a change in how documents are indexed, e.g., adding a new document processor. A system test would verify this new behavior by feeding a document, and then checking that the document processor modified the document, or did whatever else it is meant to do. A staging test, on the other hand, would feed the document before the document processor was added. Querying for the document after the upgrade could then give different results from what the system test expects.
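As a toy Python model of this example, the processor `add_lowercase_title` and the field `title_lc` below are hypothetical. A real Vespa document processor is a Java component configured in the application package, and real tests would use feed and query requests rather than an in-memory dict.

```python
def add_lowercase_title(fields):
    """Hypothetical document processor added by the change: stores a lowercased title."""
    out = dict(fields)
    out["title_lc"] = fields["title"].lower()
    return out


class FakeCluster:
    """In-memory cluster whose feed path applies the configured document processors."""

    def __init__(self, processors=()):
        self.processors = list(processors)
        self.docs = {}

    def feed(self, doc_id, fields):
        for proc in self.processors:
            fields = proc(fields)
        self.docs[doc_id] = fields

    def get(self, doc_id):
        return self.docs[doc_id]


# System test view: the new application processes every document fed to it.
new_app = FakeCluster([add_lowercase_title])
new_app.feed("1", {"title": "Hello"})
system_view = new_app.get("1")

# Staging test view: the document was fed before the upgrade, so the processor never ran.
old_app = FakeCluster()
old_app.feed("1", {"title": "Hello"})
old_app.processors.append(add_lowercase_title)  # the upgrade adds the processor
staging_view = old_app.get("1")
```

The same document ends up different in the two views. A staging test is what reveals that existing documents need refeeding, or that clients must tolerate a missing `title_lc`.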

Many such changes, which require additional action after deployment, are also guarded by validation overrides. The staging test is then a great way to figure out the exact consequences of the change, and how to deal with them.

As opposed to system tests, staging tests are not self-contained, as the state change during the upgrade is precisely what is tested. Instead, the execution order of any staging tests that modify state, particularly after the upgrade, must be controlled. Indeed, some changes will require refeeding data, and this should then be part of the staging test code. Finally, it is also good to verify the expected state before the upgrade.
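The Python sketch below shows staging code that runs in a fixed, explicit order. It verifies the state before the upgrade, refeeds after it, and then verifies again. `FakeCluster` and its `schema_version` attribute are hypothetical stand-ins for a deployment and its application package, and the upgrade is modelled as a plain step in the list.

```python
class FakeCluster:
    """Hypothetical cluster; `schema_version` stands in for the deployed application package."""

    def __init__(self):
        self.schema_version = 1
        self.docs = {}

    def feed(self, doc_id, fields):
        # Each stored document records which schema version indexed it.
        self.docs[doc_id] = {**fields, "indexed_with": self.schema_version}


CORPUS = {"1": {"title": "A"}, "2": {"title": "B"}}


def staging_setup(cluster):
    for doc_id, fields in CORPUS.items():
        cluster.feed(doc_id, fields)


def verify_before_upgrade(cluster):
    assert all(d["indexed_with"] == 1 for d in cluster.docs.values())


def upgrade(cluster):
    cluster.schema_version = 2


def refeed_after_upgrade(cluster):
    # This change requires refeeding, so the staging test code does it explicitly.
    staging_setup(cluster)


def verify_after_upgrade(cluster):
    assert len(cluster.docs) == len(CORPUS)
    assert all(d["indexed_with"] == 2 for d in cluster.docs.values())


# Order matters: each step depends on the state left by the previous one.
STAGING_STEPS = [staging_setup, verify_before_upgrade, upgrade,
                 refeed_after_upgrade, verify_after_upgrade]


def run_in_order(cluster, steps):
    for step in steps:
        step(cluster)
```

Unlike the shuffled system test suite, running any of these steps out of order, or in isolation, is expected to fail.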

The clients of a Vespa application should be compatible with both the system and staging test expectations, and this dictates the workflow when deploying a breaking change:

  1. First, the application code and the system and staging tests are updated so that the tests pass, and clients are updated to match the updated test code.

  2. Next, the application is upgraded.

  3. Finally, the staging setup code is updated to match the new application code.

Again, it is a good idea to always run staging tests before deployment of every change—be it a change in the application package, or an upgrade of the Vespa platform.
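In an imagined CI/CD system, this advice amounts to a gate in front of production. The minimal Python sketch below treats a change as an opaque label, which could be a new application package or a Vespa platform upgrade. The test stages and the production deployment are hypothetical callables.

```python
from typing import Callable, List


def deploy_change(change: str,
                  system_tests: Callable[[str], bool],
                  staging_tests: Callable[[str], bool],
                  deploy_production: Callable[[str], None]) -> List[str]:
    """Hypothetical CI/CD gate: a change reaches production only if both test stages pass."""
    steps = []
    for name, stage in (("system", system_tests), ("staging", staging_tests)):
        steps.append(name)
        if not stage(change):
            steps.append(f"stopped: {name} tests failed")
            return steps  # never reach production with a failing stage
    deploy_production(change)
    steps.append("production")
    return steps
```

Running the system tests first is a cheap ordering choice. Functional breakage is caught before the more involved staging test, which needs two deployments, is even attempted.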

Production tests

Some changes cannot be properly tested outside of a production setting; examples include effects on user engagement and other high-level metrics. Upgrading a subset of production clusters first makes it possible to detect a regression in such metrics, and to stop the change from deploying to the rest of the production clusters.

These tests are completely domain-dependent, and are typically not run against the Vespa application itself. The test framework therefore offers less tooling here, and really only specifies that this is a separate category.
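A staged production rollout with a verification step between clusters might look like the Python sketch below. The cluster names, the engagement metric, and the 2% tolerance in `engagement_ok` are hypothetical. They illustrate a domain-specific check, not anything the test framework provides.

```python
from typing import Callable, Iterable, List


def engagement_ok(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Hypothetical domain check: engagement may drop at most `tolerance` below baseline."""
    return current >= baseline * (1.0 - tolerance)


def rollout(clusters: Iterable[str],
            deploy: Callable[[str], None],
            verify: Callable[[str], bool]) -> List[str]:
    """Deploy cluster by cluster, stopping at the first failed production verification.

    Returns the clusters the change was deployed to.
    """
    deployed = []
    for cluster in clusters:
        deploy(cluster)
        deployed.append(cluster)
        if not verify(cluster):
            break  # regression detected: keep the change away from the remaining clusters
    return deployed
```

Putting a small cluster first in the list limits how much production traffic is exposed to a regression before the rollout is stopped.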