Common Problems in Interpreting and Graphing Data
Samples 1 & 2 (taken from student paper)

Explanation of issues and suggested improvements for Samples 1 & 2
In the above samples, the students fail to explain key features of the data they produced. In Sample 1, the plot lines jump at 6000 edges, departing from a smooth curve that otherwise reflects the expected theoretical asymptotic behavior. Explaining this jump should be part of the analysis. Potential causes to explore include:
- Atypical test data at 6000 (and 7000) edges. To reduce the effect of atypical inputs, you typically want to average your timings over several test inputs of each size (a minimal timing sketch follows this list).
- Memory hierarchy effects at this particular data size. You should be familiar with the details of your platform's memory hierarchy to be able to explore such issues.
- Stair-step behavior that isn't captured by the chosen data sizes. Test for this by adding more data points.
- Data density issues. Edge density in this case is increasing while the number of vertices is held constant. At 6000 edges, the graph may be sufficiently dense for the algorithm to be significantly slower.
- System load. Your timings should not include CPU time spent on other tasks, and you should limit what other system resources are needed during testing. This is unlikely to be the cause here, since the behavior is consistent across algorithms.
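As a rough illustration of the averaging suggestion above, the following Python sketch times a placeholder algorithm on several independently generated inputs of each size and reports the mean CPU time. Here make_random_graph and run_algorithm are hypothetical stand-ins for the actual test data generator and the algorithm under test.

```python
import random
import statistics
import time

def make_random_graph(num_vertices, num_edges):
    """Hypothetical input generator: a random edge list over a fixed vertex set."""
    return [(random.randrange(num_vertices), random.randrange(num_vertices))
            for _ in range(num_edges)]

def run_algorithm(edges):
    """Placeholder for the graph algorithm under test."""
    return sorted(edges)

def mean_cpu_time(num_vertices, num_edges, trials=5):
    """Average CPU time over several independently generated inputs of one size."""
    samples = []
    for _ in range(trials):
        graph = make_random_graph(num_vertices, num_edges)
        start = time.process_time()   # CPU time of this process only
        run_algorithm(graph)
        samples.append(time.process_time() - start)
    return statistics.mean(samples)

for m in range(1000, 8001, 1000):
    print(m, "edges:", mean_cpu_time(num_vertices=1000, num_edges=m))
```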
In addition to failing to explain the data anomaly in Sample 1, the authors fail to explain the spike at 3000 vertices in Sample 2, which is even more anomalous since the algorithm is slower at 3000 vertices than at 4000 vertices.
Finally, the authors claim that the plot lines in Sample 1 conform to the theoretical quadratic asymptote. While that looks plausible, the claim could be confirmed by fitting and plotting a quadratic curve.
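One way to check such a claim is to fit a quadratic to the measured timings and plot it alongside the data. A minimal Python sketch, using synthetic stand-in data in place of the students' actual measurements:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data; replace with the actual (edge count, seconds) measurements.
rng = np.random.default_rng(0)
edges = np.arange(1000, 8001, 1000, dtype=float)
seconds = 1e-8 * edges**2 + rng.normal(0.0, 0.02, edges.size)

# Least-squares fit of a degree-2 polynomial to the timings.
coeffs = np.polyfit(edges, seconds, deg=2)
fitted = np.polyval(coeffs, edges)

plt.plot(edges, seconds, "o", label="measured")
plt.plot(edges, fitted, "-", label="quadratic fit")
plt.xlabel("edges")
plt.ylabel("seconds")
plt.legend()
plt.savefig("quadratic_fit.png")
```

If the fitted curve tracks the measurements within the margin of error, the quadratic claim is supported; a systematic departure would argue against it.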
Sample 3 (taken from student paper)

Explanation of issues and suggested improvements for Sample 3
This sample has some artifacts that could be interesting, but they are not explained.
- The lines criss-cross in several places. Because the lines are always close together (i.e., within a small percentage of each other), the authors might reasonably treat the crossings as insignificant noise. If so, that argument should be made explicitly and tied to the margin of error of the timing mechanism (a sketch for estimating that margin follows this list). Also, was the same data used for each algorithm? It should be, for consistency.
- The data exhibits significant stair-stepping. This is typically a result of either the algorithm itself or memory hierarchy issues.
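To put a number on that margin of error, one rough approach is to time the same input repeatedly and look at the spread of the measurements. A minimal Python sketch, with run_algorithm as a hypothetical stand-in for one of the algorithms being compared:

```python
import statistics
import time

def run_algorithm(data):
    """Placeholder for one of the algorithms being compared."""
    return sorted(data)

def timing_spread(data, trials=20):
    """Time the same input repeatedly; the standard deviation gives a rough
    margin of error for the timing mechanism on this platform."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_algorithm(data)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

fixed_input = list(range(100000, 0, -1))   # the same input is reused for every trial
mean, spread = timing_spread(fixed_input)
print(f"mean {mean:.6f} s, stdev {spread:.6f} s")
# If the plot lines differ by less than a few standard deviations,
# attributing the criss-crossing to measurement noise is defensible.
```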
Sample 4 (taken from student paper)

Explanation of issues and suggested improvements for Sample 4
The table and associated figures shown above have two problems.
- Only three data sizes are used, an insufficient number to draw strong conclusions about trends.
- While the authors correctly state that the data confirms the expected conclusions, they should be more explicit and say, for example, that the results illustrate that pmb is consistently faster than pmf, which is consistently faster than pma, as expected from the algorithms' descriptions. It is somewhat surprising, however, that kmt isn't consistently faster than kms. The paucity of data makes it hard to say how consistent this behavior is (a quick consistency check is sketched after this list).
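With timings in hand, a claim like "pmb is consistently faster than pmf" can be checked mechanically rather than by eyeballing the plots. A small Python sketch; the algorithm names come from the sample, but the numbers are hypothetical placeholders to be replaced with real measurements:

```python
# Hypothetical timings in seconds, keyed by algorithm and then by data size.
timings = {
    "pmb": {1000: 0.10, 2000: 0.25, 3000: 0.47},
    "pmf": {1000: 0.12, 2000: 0.30, 3000: 0.55},
    "pma": {1000: 0.15, 2000: 0.38, 3000: 0.70},
    "kms": {1000: 0.20, 2000: 0.41, 3000: 0.66},
    "kmt": {1000: 0.18, 2000: 0.44, 3000: 0.60},
}

def consistently_faster(a, b, timings):
    """True if algorithm a beats algorithm b at every measured size."""
    return all(timings[a][n] < timings[b][n] for n in timings[a])

for a, b in [("pmb", "pmf"), ("pmf", "pma"), ("kmt", "kms")]:
    print(f"{a} consistently faster than {b}: {consistently_faster(a, b, timings)}")
```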