[RiceCS]

COMP 320 Tutorial - Performance Profiling

[Rice]

Tutorial goals:


Performance Profiling

Often you want to know about the performance aspects of your program. Of course, when you are developing your program, you should understand its asymptotic "big-O" complexity behavior. But frequently you want more detail than that.

Modern architectures are sufficiently complicated (by features like superscalar pipelining and multi-level caching) that static analyses cannot both accurately and precisely describe program running time. What we are left with are dynamic analyses, i.e., running or simulating programs and analyzing what happened.

Whole-program profiling

You might already have discovered two simple UNIX tools:

However, neither of these can show the resource usage of individual modules or functions.

time and top Exercises
  1. Compile the sample program without any compiler optimizations and use time on it.

  2. Since this program involves user input, the elapsed time includes however long it takes you to enter some data. time is most useful on programs not involving human input. Here, you'll get the most useful results if you redirect the input from a file.

    Create a text file with your input. Use time on the program, redirecting the program's input from this file.

  3. Run it a few more times. Observe if and how the time statistics change.

  4. Compile it again using various compiler optimizations (-O1 or -O2 or -O3) and use time again, and compare.

  5. Run top while the sample program is running.

More detailed profiling

The three pieces of performance information that developers are usually most interested in are

How could you track these? We'll change the program's source code to track these.

Note that in each case, the act of profiling will change the profile at least a little.

Note: Hardware profiling
Many modern processors have profiling abilities built-in. Profiling still requires a software interface to this hardware. Advantages include that profiling doesn't require modifying the source code, and thus the act of profiling doesn't modify the code's behavior. Disadvantages include that such profiling has a fixed hardware-determined set of capabilities, and it is more difficult to map the profiling results back to the source code.

We could integrate the first two of these with the printf-debugging advocated in a prior tutorial. Instead, UNIX has standard tools for these:

UNIX has no standard tool for profiling space usage, although several debugging & profiling tools are described here.

gprof

prof and gprof are essentially identical in purpose. We'll use the latter, since it has more features. They are both tools for analyzing the run-time required in a program, and breaking this time down among all the various defined functions. In essence, it is a tool for automatically adding a stopwatch to each function.

Using gprof is a three-step process:

  1. Compile with the -pg flag with gcc. (Use the -xpg flag with cc.) This adds code into the resulting executable to do the profiling.

  2. Run the program program. The resulting profiling information is put into the file gmon.out.

  3. Run gprof program, which interprets the information in gmon.out and prints a human-readable summary.

    By default, this produces lots of information. (Use gprof program | more to view it all!) For each function, it shows the percentage of time spent in a function and how that is broken down into the functions it calls. (Note: It includes all the functions included in system libraries, too.) Various command-line options are available to limit what you see.

Of course, for more details, look at man gprof, or this manual.

gprof Exercise

Compile the sample program without any compiler optimizations, and use gprof on it. You can use our Makefile. What function takes the most time?

Compile the program using various compiler optimizations (-O1 or -O2 or -O3) and use gprof. How much does the performance improve with various optimizations?

E.g., here's the unoptimized and optimized results for one set of test inputs, as executed on one of the Owlnet servers.

See also [BO] Section 5.15.1.

tcov

Note: tcov is used with cc, while gcov is used with gcc. However, gcov is not yet available on Owlnet.

tcov is similar to prof/gprof. But, rather than timing each function, it counts the number of times each statement is executed. Using it is a four-step process:

  1. Compile with the -xprofile=tcov flag with cc. Again, you can use our Makefile. This adds code into the resulting executable to do the profiling.

  2. Run the program program. The resulting profiling information is put into the file program.profile/tcovd.

  3. Run tcov -x program.profile source1.c source2.c ... (listing all your source files, or using *.c) to separate that profiling information into readable files source1.tcov, source2.tcov, ...

  4. Look at the files *.tcov.

tcov Exercise
  1. Compile the sample program, and use tcov on it. Be sure to run the profiled program only once for now.

    What statement is executed the most times?

  2. Without recompiling, run the profiled program again, and run tcov again.

    The statistics should now represent both runs together. The counts are set back to zero only when you recompile.


Using profiling information

Now that you have this information, what do you do with it?

Within this course, a few classes have described a few optimization techniques, and more will be described in later classes. It will be helpful to use some of these in the last assignment, where optimizing time and space resources is part of the requirements.

Exercise
  1. Optimize the one function of the sample program that you previously identified as being the bottleneck.

    You'll need to use ideas that the compiler couldn't figure out. E.g., replace recursion with iteration (gcc should be able to do this better than it does) or pick different algorithms or data structures. Change anything except main() and the observable behavior.

  2. Again use gprof and tcov. How much did this one localized improvement help?

  3. Optimize the whole program. Compare results with your friends.

See also [BO] Section 5.15.2.