OCaml native code profiler for x86 + gcc
Of course, even without any patch you can always pass the -ccopt -p option to
ocamlopt and get some profiling information, but with -ccopt -p you only get
the "flat" profile, while with my patch you also get the full call
graph, including information on
- how many times each function was called
- which functions were called from some particular function and how many times
- how much time was spent in each function
- how much time was spent inside each function including the function itself, its
children, grandchildren, great-grandchildren, etc., and how this time is distributed
among the child functions
- which functions called some particular function
I wrote my patch to mimic the behavior of gcc -pg: the patched ocamlopt works
in exactly the same way as the usual ocamlopt unless it is given the -p option.
When given the -p option, the patched ocamlopt:
- inserts calls to the counter function, mcount, at the beginning of each function
- passes the -pg option to gcc
- links the code against the libasmprun library, which is the profiled version of the
usual libasmrun: it is compiled with the -pg option and also includes a
modified (profiled "by hand") version of i386.S.
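To illustrate the kind of bookkeeping that a counter call at the start of each function makes possible, here is a minimal C sketch. It is not the real mcount and not part of the patch: the real mcount identifies caller and callee from return addresses, while this sketch passes the caller's name explicitly as a string. The names record_call, arc_count, square and sum_of_squares are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ARCS 64

/* One caller -> callee edge of the call graph and how often it was taken.
   Hypothetical structure; the real profiler keys arcs by code addresses. */
struct arc {
    const char *caller;
    const char *callee;
    uint32_t count;
};

static struct arc arcs[MAX_ARCS];
static int n_arcs = 0;

/* Simplified stand-in for the counter function: record one call. */
void record_call(const char *caller, const char *callee)
{
    for (int i = 0; i < n_arcs; i++) {
        if (strcmp(arcs[i].caller, caller) == 0 &&
            strcmp(arcs[i].callee, callee) == 0) {
            arcs[i].count++;
            return;
        }
    }
    if (n_arcs < MAX_ARCS) {          /* silently drop arcs past the limit */
        arcs[n_arcs].caller = caller;
        arcs[n_arcs].callee = callee;
        arcs[n_arcs].count = 1;
        n_arcs++;
    }
}

/* Look up how many times `caller` called `callee` (0 if never). */
uint32_t arc_count(const char *caller, const char *callee)
{
    for (int i = 0; i < n_arcs; i++) {
        if (strcmp(arcs[i].caller, caller) == 0 &&
            strcmp(arcs[i].callee, callee) == 0)
            return arcs[i].count;
    }
    return 0;
}

/* "Instrumented" functions: the counter call is the first statement. */
int square(const char *caller, int x)
{
    record_call(caller, "square");
    return x * x;
}

int sum_of_squares(const char *caller, int n)
{
    record_call(caller, "sum_of_squares");
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += square("sum_of_squares", i);
    return total;
}
```

Calling sum_of_squares("main", 3) records one arc from main to sum_of_squares and three calls along the arc from sum_of_squares to square. This per-arc information is what a flat profile lacks.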
I tested my patch only on a RedHat Linux 5.1 system,
but it will probably work fine on any other x86 + gcc system, and I believe that it should
not be too hard to port it to other C compilers (you will probably need to
change -pg to -p and mcount to something else) or even to some
other hardware architecture.
Of course there is NO WARRANTY - it works for me, but you are using it at your own
risk.
To use the profiler:
- Apply my patch and recompile ocaml
- Recompile your code with the -p option (only .ml files; you do not have to
recompile .mli files).
- Run your code. It should generate the gmon.out file.
- Run gprof <program executable>.
Please note:
- The profiled binary will run much slower and the timing information will not be
completely accurate.
- You will not get much information for functions that were compiled without the -p option
(e.g. the standard library).
- If you want to get a very accurate call graph (at the expense of less accurate timing
information), use -p -compact -inline 0.
- If some function is called more than 2^32 times, the profiling information becomes
wrong. This happens especially quickly when you compile your code with the -compact option
and your code starts calling the caml_alloc functions (caml_alloc, caml_alloc2,
caml_alloc3 and caml_alloc4).
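The 2^32 limit is consistent with call counts being kept in 32-bit unsigned counters, which is an assumption here rather than something verified against the profiling runtime. Under that assumption, the following C sketch shows how such a counter silently wraps around. The function names bump and reported_calls are hypothetical.

```c
#include <stdint.h>

/* Increment a call counter that is assumed to be 32 bits wide.
   Unsigned arithmetic wraps modulo 2^32, so UINT32_MAX + 1 becomes 0. */
uint32_t bump(uint32_t count)
{
    return count + 1u;
}

/* The count a 32-bit counter would show after `real_calls` actual calls:
   only the low 32 bits survive. */
uint32_t reported_calls(uint64_t real_calls)
{
    return (uint32_t)real_calls;
}
```

For example, after 2^32 + 5 real calls such a counter would report only 5, so a frequently called allocation function could appear to be called far less often than it really was.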
Last update: Friday, December 14, 2001 by Aleksey Nogin (e-mail: Nogin@CS.Cornell.EDU)