Comments on: Test suite - A progress report
Just a quick progress report on the R3 test suite...
The new test suite is running, as a first draft design. It will be published soon (maybe today?) for the R3-Alpha test group (a private test group). In addition, I will be providing documentation to explain how it works, and how users can extend the tests simply by modifying the test suite data files. Once the R3-Alpha group approves the tests, they will be released more openly.
The test engine is built mainly on the reflective test approach, where thousands of test vectors (singular test expressions) are generated and then tested, driven by small test specification files -- each typically only a page in length.
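As a rough illustration of how a short specification can expand into many test vectors, here is a minimal Python sketch. The spec layout, the function names, and the sample values are all hypothetical; the actual R3 specification file format is not described here.

```python
from itertools import product

# Hypothetical spec: a few function names plus sample values per datatype.
# The real R3 test specification format is not shown in this post.
SPEC = {
    "functions": ["add", "subtract", "equal?"],
    "values": {
        "integer!": ["0", "1", "-1"],
        "decimal!": ["0.0", "1.5"],
    },
}

def generate_vectors(spec):
    """Return every 'function arg1 arg2' expression implied by the spec."""
    # Flatten the per-datatype samples into one list of argument values.
    samples = [v for values in spec["values"].values() for v in values]
    # Pair every function with every ordered pair of sample values.
    return [
        f"{fn} {a} {b}"
        for fn, a, b in product(spec["functions"], samples, samples)
    ]

vectors = generate_vectors(SPEC)
```

Even this toy spec of three functions and five values yields 75 expressions, which hints at how quickly the count grows as datatypes are added.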
As a result, the tests require validation through inspection of their log files. Once inspected, a hash checksum tracks if changes have occurred, in either the results (test failures) or from the introduction of new tests.
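The checksum step might look something like the following sketch. It assumes the log is a sequence of text lines and uses SHA-256; the hash function actually used by the test suite is not stated here, and the log line format shown is invented for illustration.

```python
import hashlib

def log_checksum(log_lines):
    """Hash a sequence of log lines into one hex digest."""
    digest = hashlib.sha256()
    for line in log_lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # keep line boundaries significant
    return digest.hexdigest()

def has_changed(log_lines, baseline):
    """True if the log no longer matches the inspected baseline digest."""
    return log_checksum(log_lines) != baseline

# Hypothetical log lines that have already been inspected and approved.
approved = ["add 0 0 => 0", "add 0 1 => 1"]
baseline = log_checksum(approved)
```

A changed result and a newly added test both alter the digest, so the checksum signals that something is different without saying what.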
The method seems sound, but I think there could be a problem with the quantity of tests generated. Currently, with only five (of 56) datatypes tested, there are more than 22,000 test vectors, and the count grows exponentially. We may end up with more than a million test vectors for the final suite.
Although the hash approach works well for detecting failures and other variations, one must compare the test log files to determine specific differences. And, I am a bit concerned that the comparison required may be beyond the capabilities of most "diff" (change differencing) programs.
This will require some thought. Since the log output is a set of vectors itself, it may be possible to develop a specific diff method that works well for these larger data sets. (If not, then we may need to borrow La Sorbonne's supercomputer to analyze our results each time.)
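One possible shape for such a specialized diff, sketched under the assumption that each log line has the form "expression => result" (a hypothetical format, not the suite's actual one): index both logs by expression and compare the results by key, rather than aligning lines the way a general-purpose diff does.

```python
def parse_log(lines, sep=" => "):
    """Map each test expression to its logged result."""
    results = {}
    for line in lines:
        expr, _, result = line.partition(sep)
        results[expr] = result
    return results

def diff_logs(old_lines, new_lines):
    """Report added, removed, and changed test vectors between two logs."""
    old, new = parse_log(old_lines), parse_log(new_lines)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            k for k in old.keys() & new.keys() if old[k] != new[k]
        ),
    }

old_log = ["add 0 0 => 0", "add 0 1 => 1", "add 1 1 => 2"]
new_log = ["add 0 0 => 0", "add 0 1 => 2", "subtract 1 1 => 0"]
report = diff_logs(old_log, new_log)
```

Because the comparison is a dictionary lookup per vector, the work grows roughly linearly with the number of vectors and does not depend on the order of lines in the log.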
Once the tests are released, let me know your opinion in this area.
I wonder if a couple of nVidia 8800 cards would offer some useful processing power if these results were images and not log files. A complete run would be a complete image (for the UnitTest world I suppose it would be green and red, and ignored by those who are red/green color-blind), but then random test runs might show patterns that can be seen at a glance (some research indicated young primates out-perform older primates on such 'diff' tasks). The image could use color gradations, or even more than two dimensions, to reveal flaws in one component that affect another component; a tiled surface might be as interesting, or image layers. A tiled surface seems to allow for handling the exponential growth. (A 'heatmap' image is only so useful, but those images on surfaces might be more interesting on a multi-faceted 'spindle'; I think I have seen such a rotating catalog in a paint shop.)
I may have first had this idea for browsing the million+ galaxies at www.galaxyzoo.org without the cataloging being 'fixed' in red versus blue (elliptical versus spiral+) as they go through multiple passes of 'cataloging'. And there are interesting options for side-by-side comparison.