Testing Design
Revision as of 11:39, 18 May 2006
Overview
The VTK and ParaView projects have a long history of regression testing using the tools (ctest, dashboards, etc.) provided by CMake. With the move to Qt as the user interface toolkit in ParaView, it is necessary to revisit how application and user interface testing is performed. Several testing frameworks (QtTest, Squish, KDExecutor) were evaluated, but none met ParaView's needs, due to licensing and other issues. Thus, a new testing framework has been created. This document describes the new framework in sufficient detail to allow outside contribution.
Requirements
- The framework must have an open source license consistent with the ParaView / VTK license. Rationale: to allow all ParaView users to contribute and run test cases.
- The framework should support easy test case creation to encourage user-contributed tests and bug reporting.
- The framework should allow for multiple forms of validation to determine whether a test passes or fails, including but not limited to image-based validation.
- The framework should support test cases that can be recorded as scripts. Rationale: a test case recorded as a script can apply complex logic for quantitative verification and validation of results. For example, a scripted test case could check the values of internal data structures during playback, instead of relying on image-based validation.
- The framework should support test cases that can be recorded as low-level "metafiles". Rationale: simplicity, and reduced dependence on any particular script engine for testing.
- Where possible, test cases should continue to play back correctly despite modifications to the user interface.
- To reduce developer burden, the framework should be as non-intrusive as possible. In particular, it should be possible to use the framework with existing code and with interfaces built using Qt Designer, and the framework should not require the developer to use "special" widgets, multiple inheritance, or the like.
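As a rough illustration of the metafile requirement, the sketch below assumes a hypothetical line-oriented format: one event per line, with the widget identifier, event name, and arguments separated by tabs. The names RecordedEvent, writeMetafile, and readMetafile are invented for this example, and the format is not ParaView's actual metafile format; the point is only that a flat, engine-independent record is trivial to write and read back.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: the three strings that describe one high-level event.
struct RecordedEvent {
  std::string object;     // widget identifier, e.g. a slash-separated path
  std::string command;    // event name, e.g. "activate"
  std::string arguments;  // optional; may be empty
};

// Hypothetical metafile format: one event per line, fields separated by tabs.
// Assumes no field ever contains a tab or newline character.
std::string writeMetafile(const std::vector<RecordedEvent>& events) {
  std::ostringstream out;
  for (const auto& e : events) {
    out << e.object << '\t' << e.command << '\t' << e.arguments << '\n';
  }
  return out.str();
}

// Parse the format above back into events, skipping blank lines.
std::vector<RecordedEvent> readMetafile(const std::string& text) {
  std::vector<RecordedEvent> events;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    std::istringstream fields(line);
    RecordedEvent e;
    std::getline(fields, e.object, '\t');
    std::getline(fields, e.command, '\t');
    std::getline(fields, e.arguments);  // rest of the line; empty if none
    events.push_back(e);
  }
  return events;
}
```

Because playback needs nothing beyond string splitting, a metafile like this can be replayed without linking any script engine.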
Challenges
Low-Level vs. High-Level Events
In theory, all user interaction with a graphical user interface can be broken down into a handful of low-level "events": mouse button press, mouse button release, mouse movement, key press, and key release. Recording these low-level events and their parameters as they happen, then synthesizing the same events in the same order during playback, should put the application into the same state. In practice, this approach does not work well due to differences in user interface "skins", platform, hardware, and operating environment:
- Differences in window manager between platforms (or even on the same platform, due to user choices) may cause the sizes and arrangement of windows to vary between test case recording and playback.
- Due to user interface skinning or platform-specific look-and-feel, the sizes of widgets may vary, even if the top-level window is consistently sized. As an example, the width of a splitter bar may vary by a few pixels between two platforms. During playback of a test case where the user resizes a splitter, this could cause the mouse to "miss" the splitter, sending the subsequent mouse movement events into adjacent controls instead of to the splitter where they belong.
- When recording typical user interaction with a file dialog, the test case would include some combination of scrollbar movement and double-clicks as the user navigates through the filesystem. Unfortunately, this recording would be heavily dependent on the contents of the filesystem: nearly any change in filesystem contents between recording and playback will cause the test to fail, since filenames would be in different locations relative to the mouse.
- A recording of user interaction with a spin button would fail during playback if the spin button is replaced with a slider, despite the fact that both represent an integer quantity.
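To make the splitter problem concrete, the following sketch models a window as a one-dimensional row of widgets and replays a recorded mouse press purely by its coordinate, as a low-level recorder would. WidgetSpan, MouseEvent, and widgetAt are hypothetical names, and the pixel values used below are made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical 1-D layout: each widget occupies [left, right) along x.
struct WidgetSpan {
  std::string name;
  int left;
  int right;
};

// A recorded low-level event carries only raw coordinates, not a target.
struct MouseEvent {
  std::string type;  // "press", "move", or "release"
  int x;
};

// Simple hit test: return the widget under x, or "" if there is none.
std::string widgetAt(const std::vector<WidgetSpan>& layout, int x) {
  for (const auto& w : layout) {
    if (x >= w.left && x < w.right) return w.name;
  }
  return "";
}
```

For instance, if the splitter occupies [200, 206) when recording but [202, 205) on the playback platform, a press recorded at x = 201 resolves to "splitter" during recording and to "leftPanel" during playback, so the subsequent drag goes to the wrong control.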
For these and many similar issues, it isn't enough for a test case to record what the user did; a test case should record what the user intended. Test case recording and playback thus becomes a matter of mapping between low-level user interface events and high-level events. Rather than recording a button activation as a mouse press event followed by a mouse release event at particular coordinates, a single high-level "button activation" event can be recorded, so that the correct button is activated during playback regardless of its screen coordinates. Similarly, all of the mouse clicks and scrolling within a file dialog can be replaced by a single "file selected" event that includes the file name as a parameter; during playback, the correct file can be selected regardless of any changes to the underlying filesystem. When recording interaction with a spin button, the series of mouse clicks and keyboard input can be replaced with a single "set integer" event that includes the final value as a parameter; during playback, the "set integer" event can be interpreted by any widget that represents integers (spin buttons, sliders, dials, etc.), not just the type of widget used to record the test.
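The spin-button case can be sketched as a function that collapses a sequence of low-level interactions into the single event the user intended. HighLevelEvent, SpinInput, and collapseSpinInteraction are hypothetical names, and the "set_int" event name is borrowed from the Design section; this illustrates the idea rather than reproducing ParaView's translator code.

```cpp
#include <string>
#include <vector>

// Hypothetical high-level event: widget identifier, event name, arguments.
struct HighLevelEvent {
  std::string object;
  std::string command;
  std::string arguments;
};

// Hypothetical low-level spin-box interaction: an arrow click (+1 / -1)
// or a value typed directly into the editor.
struct SpinInput {
  enum Kind { StepUp, StepDown, Typed } kind;
  int typedValue = 0;  // only meaningful when kind == Typed
};

// Replay the inputs locally and keep only what the user meant: the final value.
HighLevelEvent collapseSpinInteraction(const std::string& object, int initial,
                                       const std::vector<SpinInput>& inputs) {
  int value = initial;
  for (const auto& input : inputs) {
    switch (input.kind) {
      case SpinInput::StepUp:   ++value; break;
      case SpinInput::StepDown: --value; break;
      case SpinInput::Typed:    value = input.typedValue; break;
    }
  }
  return {object, "set_int", std::to_string(value)};
}
```

Starting from 3, the inputs up, up, down, typed 10, up produce one event whose argument is "11"; none of the intermediate values or click positions end up in the recording.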
Widget Naming
While the notion of mapping between low-level and high-level events is straightforward, one problem remains: how to provide a serializable name for each widget, so that high-level events can be directed to the correct widget during playback. Qt widgets are normally accessed at runtime through pointers or references, but these are obviously inadequate as serializable identifiers. The main problem is generating globally-unique names while allowing reasonable changes to the user interface without breaking test cases. For example, a test recorded on one system may be played back on another with a different configuration of docked and floating toolbars. Several alternatives have been discussed:
- Flat - Assign a globally-unique name to every widget. Pros: test cases work no matter how the UI is rearranged. Cons: developers must explicitly name everything, names quickly become unwieldy, Qt Designer limits widget names (e.g. they can't contain slashes), and Qt Designer-generated instance variables will share the long, unwieldy names.
- The MS Way - Same as "Flat", only use 128-bit UUIDs as identifiers. Insert Homer Simpson-like shudder here ...
- Widget Hierarchy - Use the Qt widget hierarchy to generate unique names. Pros: developers can use straightforward names for widgets in Qt Designer, and only have to explicitly name top-level objects like dialogs. Cons: rearranging the widget hierarchy breaks test cases, e.g. floating/docking windows.
- The K-3D Way - Use a hierarchical naming scheme, but make it orthogonal to the Qt widget hierarchy. Pros: same as "Widget Hierarchy", plus floating and docking windows no longer break test cases. Cons: since this is a separate hierarchy, it has to be maintained at runtime, e.g. by explicitly registering widgets with some central manager, implying a highly-intrusive mechanism of custom widgets or the like.
After considering these options, it was decided that the framework would use the "Widget Hierarchy" method, generating widget names by walking the Qt widget hierarchy and concatenating object names (separated by slashes) into a hierarchical "path" string. Although this method is especially brittle in the face of UI modifications, it requires the least developer effort and integrates well with custom user interfaces and with interfaces created in Qt Designer.
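A minimal sketch of hierarchical naming, assuming a hypothetical Node type that stands in for QObject. In real Qt code the upward walk would use QObject::parent() and read each name with QObject::objectName(); Node, pathOf, and findByPath are illustrative names only. pathOf builds the slash-separated path at record time, and findByPath resolves it again during playback.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a QObject: a name plus parent/child links.
struct Node {
  std::string name;
  Node* parent = nullptr;
  std::vector<std::unique_ptr<Node>> children;

  Node* addChild(const std::string& childName) {
    children.push_back(std::make_unique<Node>());
    Node* child = children.back().get();
    child->name = childName;
    child->parent = this;
    return child;
  }
};

// Walk from the widget up to its top-level ancestor, then join the names
// top-down with '/'.
std::string pathOf(const Node* node) {
  std::vector<std::string> names;
  for (; node != nullptr; node = node->parent) names.push_back(node->name);
  std::string path;
  for (auto it = names.rbegin(); it != names.rend(); ++it) {
    if (it != names.rbegin()) path += '/';
    path += *it;
  }
  return path;
}

// Resolve a path by descending from a top-level node one name at a time.
// Returns nullptr if any component is missing.
Node* findByPath(Node* root, const std::string& path) {
  std::istringstream parts(path);
  std::string part;
  if (!std::getline(parts, part, '/') || part != root->name) return nullptr;
  Node* current = root;
  while (std::getline(parts, part, '/')) {
    Node* next = nullptr;
    for (auto& child : current->children) {
      if (child->name == part) { next = child.get(); break; }
    }
    if (next == nullptr) return nullptr;
    current = next;
  }
  return current;
}
```

If a widget ends up under a different parent, for example after a UI rearrangement relocates a panel, the recorded path no longer resolves and findByPath returns nullptr, which is the brittleness acknowledged above.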
Design
The testing framework is centered around "recording" and "playback" of user interaction.
For recording, an instance of pqEventTranslator is created, which intercepts Qt events for the entire application. pqEventTranslator manages a collection of pqWidgetEventTranslator objects, each specialized for a specific Qt widget type. The translators convert low-level Qt events ("mouse move", "button down", "button up") into higher-level ParaQ events that can be usefully serialized and played back ("button activated"). Each high-level event is encapsulated in three strings: a widget identifier, the name of the event, and optional arguments for the event. Multiple recording "back ends" can be attached to pqEventTranslator to serialize events. pqEventObserverStdout and pqEventObserverXML are existing examples; a hypothetical pqEventObserverPython object could "store" events as Python code. Classes derived from pqWidgetEventTranslator will provide coverage of all "native" Qt widgets, so recording will "just work" for any UI created with stock Qt components. For non-standard widgets, developers may create their own pqWidgetEventTranslator implementations and register them with pqEventTranslator at runtime.
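The recording dispatch can be sketched in plain C++ without Qt. EventTranslator, WidgetTranslator, ButtonTranslator, SpinBoxTranslator, RawEvent, and Observer are hypothetical, simplified analogues of pqEventTranslator, pqWidgetEventTranslator, and the observer back ends; the real classes work with QObject and QEvent pointers, and their signatures are not reproduced here. The sketch assumes that the first translator claiming a widget consumes the event, and that every attached observer receives every high-level event.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified low-level event; the widget path is assumed already resolved.
struct RawEvent {
  std::string widgetPath;  // e.g. "MainWindow/applyButton"
  std::string widgetType;  // e.g. "QPushButton", "QSpinBox"
  std::string type;        // illustrative event names, not real QEvent types
  std::string value;       // payload for value-changing events
};

// A back end receives each high-level event as three strings.
using Observer = std::function<void(const std::string& object,
                                    const std::string& command,
                                    const std::string& arguments)>;

class WidgetTranslator {
 public:
  virtual ~WidgetTranslator() = default;
  // Returns true if this translator claimed (consumed) the event.
  virtual bool translate(const RawEvent& e, const Observer& sink) = 0;
};

class ButtonTranslator : public WidgetTranslator {
 public:
  bool translate(const RawEvent& e, const Observer& sink) override {
    if (e.widgetType != "QPushButton") return false;
    if (e.type == "MouseButtonRelease") sink(e.widgetPath, "activate", "");
    return true;  // presses and moves on buttons are swallowed silently
  }
};

class SpinBoxTranslator : public WidgetTranslator {
 public:
  bool translate(const RawEvent& e, const Observer& sink) override {
    if (e.widgetType != "QSpinBox") return false;
    if (e.type == "ValueChanged") sink(e.widgetPath, "set_int", e.value);
    return true;
  }
};

class EventTranslator {
 public:
  void addWidgetTranslator(std::unique_ptr<WidgetTranslator> t) {
    translators_.push_back(std::move(t));
  }
  void addObserver(Observer o) { observers_.push_back(std::move(o)); }

  void onEvent(const RawEvent& e) const {
    // Fan each high-level event out to every attached back end.
    Observer broadcast = [this](const std::string& obj, const std::string& cmd,
                                const std::string& args) {
      for (const auto& o : observers_) o(obj, cmd, args);
    };
    for (const auto& t : translators_) {
      if (t->translate(e, broadcast)) return;  // first match wins
    }
  }

 private:
  std::vector<std::unique_ptr<WidgetTranslator>> translators_;
  std::vector<Observer> observers_;
};
```

The callback parameter is named sink rather than emit because emit is a macro in Qt code. Supporting a custom widget amounts to registering one more WidgetTranslator before the generic ones.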
For playback, an instance of pqEventPlayer is created. pqEventPlayer manages a collection of pqWidgetEventPlayer objects, which are responsible for converting high-level ParaQ events into interaction with the UI. Note that there is not necessarily a one-to-one correspondence between pqWidgetEventTranslator and pqWidgetEventPlayer objects: a single pqAbstractIntEventPlayer object is capable of handling events generated by both pqSpinBoxEventTranslator and pqAbstractSliderEventTranslator, because both map dissimilar Qt events into a single, abstract "set_int" event. This makes it possible for a test case to continue working when the UI changes from one compatible widget (e.g. a spin box) to another (a dial or slider).
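Playback can be sketched the same way. EventPlayer, WidgetPlayer, AbstractIntPlayer, and FakeWidget are hypothetical stand-ins for pqEventPlayer, pqWidgetEventPlayer, pqAbstractIntEventPlayer, and a live Qt widget. Looking up the widget by its recorded name is omitted, so the target is passed in directly. The point of the sketch is that one player accepts "set_int" for spin boxes, sliders, and dials alike.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a live widget holding an integer value.
struct FakeWidget {
  std::string type;  // e.g. "QSpinBox", "QSlider", "QDial"
  int value = 0;
};

class WidgetPlayer {
 public:
  virtual ~WidgetPlayer() = default;
  // Returns true if this player applied the command to the widget.
  virtual bool play(FakeWidget& w, const std::string& command,
                    const std::string& arguments) = 0;
};

// One player serves every widget type that represents an integer.
class AbstractIntPlayer : public WidgetPlayer {
 public:
  bool play(FakeWidget& w, const std::string& command,
            const std::string& arguments) override {
    const bool isIntWidget =
        w.type == "QSpinBox" || w.type == "QSlider" || w.type == "QDial";
    if (!isIntWidget || command != "set_int") return false;
    w.value = std::stoi(arguments);
    return true;
  }
};

class EventPlayer {
 public:
  void addWidgetPlayer(std::unique_ptr<WidgetPlayer> p) {
    players_.push_back(std::move(p));
  }
  // Offer the event to each player in turn; false if none accepts it.
  bool playEvent(FakeWidget& w, const std::string& command,
                 const std::string& arguments) {
    for (auto& p : players_) {
      if (p->play(w, command, arguments)) return true;
    }
    return false;
  }

 private:
  std::vector<std::unique_ptr<WidgetPlayer>> players_;
};
```

A "set_int" event recorded against a spin box therefore still plays back after the spin box is replaced by a slider, while an event that no player understands makes playEvent return false, which a test harness could report as a failure.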