I use std::tr1::shared_ptr extensively throughout my application. This includes passing objects in as function arguments. Consider the following:
class Dataset {...}
void f( shared_ptr< Dataset const > pds ) {...}
void g( shared_ptr< Dataset const > pds ) {...}
...
While passing a Dataset object around via shared_ptr guarantees its existence inside f and g, the functions may be called millions of times, which causes a lot of shared_ptr objects to be created and destroyed. Here's a snippet of the flat gprof profile from a recent run:
Each sample counts as 0.01 seconds.
  %   cumulative    self                  self     total
 time    seconds   seconds       calls   s/call   s/call  name
 9.74     295.39     35.12  2451177304     0.00     0.00  std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&)
 8.03     324.34     28.95  2451252116     0.00     0.00  std::tr1::__shared_count::~__shared_count()
So roughly 17.8% of the runtime (9.74% + 8.03%) was spent on reference counting with shared_ptr objects. Is this normal?
A large portion of my application is single-threaded and I was thinking about re-writing some of the functions as
void f( const Dataset& ds ) {...}
and replacing the calls
shared_ptr< Dataset > pds( new Dataset(...) );
f( pds );
with
f( *pds );
in places where I know for sure the object will not get destroyed while the flow of the program is inside f(). But before I run off to change a bunch of function signatures and calls, I wanted to know what the typical performance hit of passing by shared_ptr is. It seems like shared_ptr should not be passed by value to functions that get called very often.
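To make the difference between the calling conventions concrete, here is a minimal sketch. It assumes C++11's std::shared_ptr rather than std::tr1::shared_ptr, and the Dataset contents and the function names (use_count_by_value, use_count_by_ref, sum) are hypothetical stand-ins, not the real code:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real Dataset class.
struct Dataset {
    std::vector<double> values;
};

// Pass by value: the shared_ptr is copied, so the reference count is
// incremented on entry and decremented again when the function returns.
long use_count_by_value(std::shared_ptr<const Dataset> pds) {
    return pds.use_count();
}

// Pass the shared_ptr by const reference: no copy is made as long as the
// argument already has exactly this type. Passing a shared_ptr<Dataset>
// here would still create a converting temporary and touch the count.
long use_count_by_ref(const std::shared_ptr<const Dataset>& pds) {
    return pds.use_count();
}

// Pass the object itself by const reference: no shared_ptr is involved,
// but the caller must keep the object alive for the duration of the call.
double sum(const Dataset& ds) {
    double total = 0.0;
    for (std::size_t i = 0; i < ds.values.size(); ++i) {
        total += ds.values[i];
    }
    return total;
}
```

Given a single owner of type std::shared_ptr<const Dataset>, use_count_by_value sees a count of 2 during the call, while use_count_by_ref and sum(*pds) leave it at 1. Only the by-value version pays for the copy constructor and destructor that show up in the profile.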
Any input would be appreciated. Thanks for reading.
-Artem
Update: After changing a handful of functions to accept const Dataset&, the new profile looks like this:
Each sample counts as 0.01 seconds.
  %   cumulative    self                  self     total
 time    seconds   seconds       calls   s/call   s/call  name
 0.15     241.62      0.37    24981902     0.00     0.00  std::tr1::__shared_count::~__shared_count()
 0.12     241.91      0.30    28342376     0.00     0.00  std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&)
I'm a little puzzled by the number of destructor calls being smaller than the number of copy constructor calls, but overall I'm very pleased with the decrease in the associated run-time. Thanks to all for their advice.