Testing Design: Difference between revisions

Revision as of 10:55, 18 May 2006

Overview

The VTK and ParaView projects have a long history of regression testing using the tools provided by CMake (ctest, dashboards, etc.). With the move to Qt as the user interface toolkit in ParaView, it is necessary to revisit how application and user interface testing is performed. Several testing frameworks (QtTest, Squish, KDExecutor) were evaluated, but none meets ParaView's needs. Thus, a new testing framework has been created; this document describes it in sufficient detail to allow outside contribution.

Requirements

  • The framework must have an open source license consistent with the ParaView / VTK license. Rationale: to allow all ParaView users to contribute and run test cases.
  • The framework should support easy test case creation to encourage user-contributed tests and bug reporting.
  • The framework should support multiple ways of validating whether a test passes or fails, including but not limited to image-based validation.
  • The framework should support test cases that can be recorded as scripts. Rationale: a test case recorded as a script can apply complex logic for quantitative verification and validation of results. For example, a scripted test case could check the values of internal data structures during playback, instead of relying on image-based validation.
  • The framework should support test cases that can be recorded as low-level "metafiles". Rationale: simplicity, and reduced dependence on any given script engine for testing.
  • Where possible, test cases should continue to play back correctly despite modifications to the user interface.

Challenges

In theory, user interaction with a graphical user interface can be broken down into a handful of low-level "events": mouse button press, mouse button release, mouse movement, key press, and key release. Recording these low-level events and their parameters as they happen, then synthesizing the same events in the same order during playback, would put the application into the same state. In practice, this approach does not work well due to differences in user interface "skins", platform, hardware, and operating environment:

  • Due to user interface skinning or platform-specific look-and-feel, the width of a splitter bar may vary by a few pixels. During playback of a test case where the user resizes a splitter, this could cause the mouse to "miss" the splitter, sending the subsequent mouse movement events into adjacent controls instead of to the splitter where they belong.
  • When recording typical user interaction with a file dialog, the test case would include some combination of scrollbar movement and double-clicks as the user navigated through their filesystem. Unfortunately, this recording would be heavily dependent on the contents of the filesystem - nearly any change in filesystem contents between recording and playback will cause the test to fail, since filenames would be in different locations relative to the mouse.
  • A recording of user interaction with a spin button would fail during playback if the spin button is replaced with a slider, despite the fact that both represent an integer quantity.

Given these and many similar issues, it isn't enough for a test case to record what the user did - a test case should record what the user intended. Thus, test case recording and playback becomes a matter of mapping low-level user interface events to high-level user commands.

  • Test commands are stored with a high level of abstraction instead of low-level events, to reduce test-case breakage (e.g. "button push" instead of "mouse down at x,y location").
  • Test commands still work if a widget moves to a new location in the object tree (a docking window can be top-level or a child), to reduce test-case breakage - implies a flat naming scheme for UI components.
  • Test commands are not completely tied to the type of object (a spin box and a slider bar represent the same thing, so swapping one for the other should not break the test).
  • Support verification - check that a line edit has the right text at a certain point in the test, or (more complex) retrieve data from VTK objects.
  • Support an inspector to navigate the object tree as an aid for hand-editing.
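
The splitter problem from the Challenges list can be made concrete with a small sketch. It is not framework code: the widget names, pixel sizes, and helper functions below are all hypothetical, chosen only to show why replaying a recorded pixel coordinate can miss its target when a skin changes the splitter width, while a command addressed to a widget by name still reaches it.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    name: str
    x: int      # left edge, in pixels
    width: int  # width, in pixels


def layout(splitter_width):
    """Left pane, splitter, right pane side by side (hypothetical sizes)."""
    left = Widget("left_pane", 0, 200)
    splitter = Widget("splitter", left.x + left.width, splitter_width)
    right = Widget("right_pane", splitter.x + splitter.width, 200)
    return [left, splitter, right]


def widget_at(widgets, x):
    """Hit test: name of the widget under pixel column x, or None."""
    for w in widgets:
        if w.x <= x < w.x + w.width:
            return w.name
    return None


def widget_named(widgets, name):
    """Name-based lookup: independent of where the widget ends up on screen."""
    for w in widgets:
        if w.name == name:
            return w.name
    return None


# Recording: a skin with a 6-pixel splitter; the user presses at x = 204.
recorded_x = 204
recorded_target = widget_at(layout(6), recorded_x)

# Playback: a skin with a 4-pixel splitter.
replayed_by_coords = widget_at(layout(4), recorded_x)  # low-level replay
replayed_by_name = widget_named(layout(4), "splitter")  # high-level replay

print(recorded_target, replayed_by_coords, replayed_by_name)
```

With these assumed sizes, the coordinate-based replay lands on the right pane instead of the splitter, whereas the name-based replay still finds the splitter.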

Design

The testing framework is centered around "recording" and "playback" of user interaction.

For recording, an instance of pqEventTranslator is created, which intercepts Qt events for the entire application. pqEventTranslator manages a collection of pqWidgetEventTranslator objects, which are specialized for specific Qt widget types. The translators convert low-level Qt events ("mouse move", "button down", "button up") into higher-level ParaQ events that can be usefully serialized and played back ("button activated"). Each high-level event is encapsulated in three strings: a widget identifier, the name of the event, and optional arguments for the event.

Multiple recording "back ends" can be attached to pqEventTranslator to serialize events. pqEventObserverStdout and pqEventObserverXML are existing examples; a hypothetical pqEventObserverPython object could "store" events in Python code. pqWidgetEventTranslator derivatives will provide coverage of all "native" Qt widgets, so recording will "just work" for any UI created with stock Qt components. For non-standard widgets, developers may create their own pqWidgetEventTranslator implementations and register them with pqEventTranslator at runtime.
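
The recording flow described above can be sketched in a few lines. This is an illustration only, written in Python rather than against Qt: the class names, method names, and event strings are assumptions made for the example and are not the actual pqEventTranslator or pqWidgetEventTranslator interfaces. It shows a central translator dispatching to a widget-specific translator, which collapses low-level events into one high-level event made of three strings that is then passed to every attached observer.

```python
class ButtonTranslator:
    """Hypothetical widget translator: a press followed by a release on the
    same button becomes a single high-level 'activate' event."""

    def __init__(self):
        self._pressed = None

    def translate(self, widget_type, widget_name, event, emit):
        if widget_type != "button":
            return False  # not our widget type; let another translator try
        if event == "mouse_press":
            self._pressed = widget_name
        elif event == "mouse_release":
            if self._pressed == widget_name:
                emit(widget_name, "activate", "")
            self._pressed = None
        return True  # other button events (e.g. mouse_move) are swallowed


class EventTranslator:
    """Hypothetical central translator with pluggable observers (back ends)."""

    def __init__(self):
        self.translators = []
        self.observers = []

    def add_translator(self, translator):
        self.translators.append(translator)

    def add_observer(self, observer):
        self.observers.append(observer)

    def handle(self, widget_type, widget_name, event):
        # Offer the low-level event to each widget translator in turn.
        for translator in self.translators:
            if translator.translate(widget_type, widget_name, event, self._emit):
                return

    def _emit(self, widget, command, arguments):
        # Every high-level event is three strings, sent to all observers.
        for observer in self.observers:
            observer(widget, command, arguments)


recorded = []
translator = EventTranslator()
translator.add_translator(ButtonTranslator())
translator.add_observer(lambda w, c, a: recorded.append((w, c, a)))

for low_level_event in ("mouse_move", "mouse_press", "mouse_release"):
    translator.handle("button", "ok_button", low_level_event)

print(recorded)
```

Three low-level events go in and a single (widget, event, arguments) triple comes out, which is the form an observer would serialize.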

For playback, an instance of pqEventPlayer is created. pqEventPlayer manages a collection of pqWidgetEventPlayer objects, which are responsible for converting high-level ParaQ events into interaction with the UI. Note that there is not necessarily a one-to-one correspondence between pqWidgetEventTranslator and pqWidgetEventPlayer objects - a single pqAbstractIntEventPlayer object is capable of handling events generated by both pqSpinBoxEventTranslator and pqAbstractSliderEventTranslator, because they map dissimilar Qt events into a single, abstract "set_int" event. This makes it possible for a test case to continue working when the UI changes from one compatible widget (e.g. a spin box) to another (a dial or slider).
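
As a rough illustration of why an abstract "set_int" event survives a widget swap, the sketch below plays the same recorded event against two versions of a UI. Everything in it is hypothetical: the stand-in widget classes, the player classes, and the "opacity" widget name are assumptions for the example, not the real pqEventPlayer interfaces.

```python
class SpinBox:
    """Stand-in for a spin box holding an integer value."""

    def __init__(self):
        self.value = 0

    def set_value(self, value):
        self.value = value


class Slider:
    """Stand-in for a slider holding an integer value."""

    def __init__(self):
        self.value = 0

    def set_value(self, value):
        self.value = value


class AbstractIntPlayer:
    """Hypothetical player: handles 'set_int' for any integer-valued widget."""

    def play(self, widget, command, arguments):
        if command != "set_int" or not isinstance(widget, (SpinBox, Slider)):
            return False
        widget.set_value(int(arguments))
        return True


class EventPlayer:
    """Hypothetical central player: resolves the widget, then tries each player."""

    def __init__(self, widgets):
        self.widgets = widgets  # maps widget identifier -> widget object
        self.players = [AbstractIntPlayer()]

    def play(self, widget_name, command, arguments):
        widget = self.widgets[widget_name]
        for player in self.players:
            if player.play(widget, command, arguments):
                return True
        return False


# One recorded test case, stored as (widget, event, arguments) strings.
test_case = [("opacity", "set_int", "42")]

ui_v1 = {"opacity": SpinBox()}  # UI at recording time
ui_v2 = {"opacity": Slider()}   # UI after the spin box was replaced

for ui in (ui_v1, ui_v2):
    player = EventPlayer(ui)
    for widget_name, command, arguments in test_case:
        player.play(widget_name, command, arguments)

print(ui_v1["opacity"].value, ui_v2["opacity"].value)
```

The recorded test case never mentions the widget type, so the same triple drives both the spin box and the slider.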

Widget Namespaces

The process of translating low-level events into serializable "scripts" and playing them back is straightforward and well understood; see (shameless plug) [K-3D] for another 3D graphics system that provides this capability. The one fundamentally tough problem is how to generate a persistent, repeatable "address" for a given user interface component. There must be some way to uniquely identify each user interface component for which user interaction is recorded, so that events can be directed to the correct components during playback. The main difficulty lies in generating globally unique addresses while allowing for reasonable changes to the user interface without breaking test cases. As an example, a test recorded on one system may be played back on another, with a different configuration of docked / floating toolbars. Ideally, the test case should continue to function.

Some alternatives:

  • Flat - Assign a globally unique name to every widget. Pros: test cases work no matter how the UI is rearranged. Cons: Developers must explicitly name everything, names quickly become unwieldy, Qt Designer limits widget names (e.g. can't contain slashes), and Qt Designer-generated instance variables will share the long, unwieldy names.
  • The MS Way - Same as "Flat", only use 128-bit UUIDs as identifiers. Insert Homer Simpson-like shudder here ...
  • Widget Hierarchy - Use the Qt widget hierarchy to generate unique names. Pros: developers can use straightforward names for widgets in Qt Designer, and only have to explicitly name top-level objects like dialogs. Cons: rearranging the widget hierarchy breaks test cases, e.g. floating/docking windows.
  • The K-3D Way - Use a hierarchical naming scheme, but make it orthogonal to the Qt widget hierarchy. Pros: same as "Widget Hierarchy", plus floating and docking windows no longer break test cases. Cons: since this is a separate hierarchy, it has to be maintained at runtime, e.g. by explicitly registering widgets with some central manager.

And the winner is ...

... Widget Hierarchy. The QtTesting library generates a UI component address by walking the Qt widget hierarchy, concatenating object names (separated by slashes) into a hierarchical "path" string. It was decided that, although this method is especially brittle in the face of UI modifications, it requires the least developer effort, and integrates well with custom user interfaces and Qt Designer.
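
The address scheme and its brittleness can both be shown with a minimal sketch. It assumes a simplified stand-in for the widget tree (an object with a name and a parent); the Widget class, the widget_path function, and the widget names are hypothetical, and the exact strings produced by the QtTesting library may differ.

```python
class Widget:
    """Simplified stand-in for a named widget with an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def widget_path(widget):
    """Walk up the parent chain and join the object names with slashes."""
    names = []
    while widget is not None:
        names.append(widget.name)
        widget = widget.parent
    return "/".join(reversed(names))


main_window = Widget("main_window")
dock = Widget("pipeline_dock", parent=main_window)
button = Widget("apply", parent=dock)

docked_path = widget_path(button)

# Floating the dock makes it a top-level window: its parent goes away,
# and the address of every widget inside it changes with it.
dock.parent = None
floating_path = widget_path(button)

print(docked_path)
print(floating_path)
```

A test recorded against the docked path would no longer resolve once the dock floats, which is the trade-off accepted in exchange for not having to name or register widgets by hand.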