Wow. OK -- My original performance assessment was flat out wrong. Color me stupid.
Not so stupid. My performance test was wrong. Fixed. Along with a deep dive into the GCD code.
Update: Code for the benchmark can be found here: https://github.com/bbum/StackOverflow Hopefully, it is correct now. :)
Update2: Added a 10 queue version of each kind of test.
OK. Rewriting the answer:
* @synchronized() has been around for a long time. It is implemented as a hash lookup to find a lock that is then locked. It is "pretty fast" -- generally fast enough -- but can be a burden under high contention (as can any synchronization primitive).
* dispatch_sync() doesn't necessarily require a lock, nor does it require the block to be copied. Specifically, in the fastpath case, dispatch_sync() will call the block directly on the calling thread without copying the block. Even in the slowpath case, the block won't be copied as the calling thread has to block until execution anyway (the calling thread is suspended until whatever work is ahead of the dispatch_sync() is finished, then the thread is resumed). The one exception is invocation on the main queue/thread; in that case, the block still isn't copied (because the calling thread is suspended and, therefore, using a block from the stack is OK), but there is a bunch of work done to enqueue on the main queue, execute, and then resume the calling thread.
* dispatch_async() requires that the block be copied, as it cannot execute on the current thread nor can the current thread be blocked (because the block may immediately lock on some thread-local resource that is only made available on the line of code after the dispatch_async()). While expensive, dispatch_async() moves the work off the current thread, allowing it to resume execution immediately.
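To make the "hash lookup to find a lock that is then locked" description of @synchronized() a bit more concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not the Objective-C runtime's code: the names lock_for_object, sync_enter, and sync_exit and the fixed 64-bucket table are made up for illustration, and in this simplified version two unrelated objects that hash to the same bucket end up sharing a lock.

```c
#define _XOPEN_SOURCE 700   /* asks strict-C builds to expose PTHREAD_MUTEX_RECURSIVE */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical sketch only; the bucket count is an arbitrary assumption. */
#define LOCK_BUCKETS 64

static pthread_mutex_t lock_table[LOCK_BUCKETS];
static pthread_once_t lock_table_once = PTHREAD_ONCE_INIT;

/* Build the table once. The mutexes are recursive so that the same
 * thread can re-enter, as nested @synchronized blocks can. */
static void lock_table_init(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    for (int i = 0; i < LOCK_BUCKETS; i++) {
        pthread_mutex_init(&lock_table[i], &attr);
    }
    pthread_mutexattr_destroy(&attr);
}

/* The "hash lookup": map an object's address to one lock in the table. */
static pthread_mutex_t *lock_for_object(const void *obj) {
    pthread_once(&lock_table_once, lock_table_init);
    uintptr_t key = (uintptr_t)obj;
    return &lock_table[(key >> 4) % LOCK_BUCKETS];
}

/* Roughly what entering and leaving a synchronized region amounts to here. */
static int sync_enter(const void *obj) {
    return pthread_mutex_lock(lock_for_object(obj));
}

static int sync_exit(const void *obj) {
    return pthread_mutex_unlock(lock_for_object(obj));
}
```

Every caller that passes the same object gets the same lock, and each enter pays for the lookup before it pays for the lock itself.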
End result -- dispatch_sync() is faster than @synchronized, but not by a generally meaningful amount (measured on a '12 iMac and an '11 Mac mini; the numbers from the two machines are very different, btw... joys of concurrency). Using dispatch_async() is slower than both in the uncontended case, but not by much. However, dispatch_async() is significantly faster when the resource is under contention.
@synchronized uncontended add: 0.14305 seconds
Dispatch sync uncontended add: 0.09004 seconds
Dispatch async uncontended add: 0.32859 seconds
Dispatch async uncontended add completion: 0.40837 seconds
Synchronized, 2 queue: 2.81083 seconds
Dispatch sync, 2 queue: 2.50734 seconds
Dispatch async, 2 queue: 0.20075 seconds
Dispatch async 2 queue add completion: 0.37383 seconds
Synchronized, 10 queue: 3.67834 seconds
Dispatch sync, 10 queue: 3.66290 seconds
Dispatch async, 10 queue: 0.19761 seconds
Dispatch async 10 queue add completion: 0.42905 seconds
Take the above with a grain of salt; it is a micro-benchmark of the worst kind in that it does not represent any real world common usage pattern. The "unit of work" is as follows and the execution times above represent 1,000,000 executions.
- (void) synchronizedAdd:(NSObject*)anObject
{
    @synchronized(self) {
        [_a addObject:anObject];
        [_a removeLastObject];
        _c++;
    }
}

- (void) dispatchSyncAdd:(NSObject*)anObject
{
    dispatch_sync(_q, ^{
        [_a addObject:anObject];
        [_a removeLastObject];
        _c++;
    });
}

- (void) dispatchASyncAdd:(NSObject*)anObject
{
    dispatch_async(_q, ^{
        [_a addObject:anObject];
        [_a removeLastObject];
        _c++;
    });
}
(_c is reset to 0 at the beginning of each pass and asserted to be == to the # of test cases at the end to ensure that the code is actually executing all the work before spewing the time.)
For the uncontended case:
start = [NSDate timeIntervalSinceReferenceDate];
_c = 0;
for(int i = 0; i < TESTCASES; i++ ) {
    [self synchronizedAdd:o];
}
end = [NSDate timeIntervalSinceReferenceDate];
assert(_c == TESTCASES);
NSLog(@"@synchronized uncontended add: %2.5f seconds", end - start);
For the contended, 2 queue, case (serial1 and serial2 are serial queues):
#define TESTCASE_SPLIT_IN_2 (TESTCASES/2)
start = [NSDate timeIntervalSinceReferenceDate];
_c = 0;
dispatch_group_async(group, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    dispatch_apply(TESTCASE_SPLIT_IN_2, serial1, ^(size_t i){
        [self synchronizedAdd:o];
    });
});
dispatch_group_async(group, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    dispatch_apply(TESTCASE_SPLIT_IN_2, serial2, ^(size_t i){
        [self synchronizedAdd:o];
    });
});
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
end = [NSDate timeIntervalSinceReferenceDate];
assert(_c == TESTCASES);
NSLog(@"Synchronized, 2 queue: %2.5f seconds", end - start);
The above are simply repeated for each work unit variant (no tricksy runtime-y magic in use; copypasta FTW!).
With that in mind:
* Use @synchronized() if you like how it looks. The reality is that if your code is contending on that array, you probably have an architecture issue. Note: using @synchronized(someObject) may have unintended consequences in that it may cause additional contention if the object internally uses @synchronized(self)!
* Use dispatch_sync() with a serial queue if that is your thing. There is no added overhead compared to @synchronized() -- it is actually faster in both the contended and uncontended case -- and queues are both easier to debug and easier to profile, in that Instruments and the Debugger both have excellent tools for debugging queues (and they are getting better all the time), whereas debugging locks can be a pain.
* Use dispatch_async() with immutable data for heavily contended resources. I.e.:
- (void) addThing:(NSString*)thing {
    thing = [thing copy];
    dispatch_async(_myQueue, ^{
        [_myArray addObject:thing];
    });
}
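Why the copy in addThing: matters can be shown with a rough plain-C analogy. This is not GCD code. The pending_add struct stands in for a block sitting on a queue, and enqueue_add and run_add are hypothetical names invented for this sketch. The point is only that deferred work should own a private snapshot of its input, because the caller is free to change the original before the work runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a block waiting on a queue. */
typedef struct {
    char *thing;   /* private snapshot owned by the pending work */
} pending_add;

/* Analogous to `thing = [thing copy]` followed by dispatch_async():
 * take a snapshot now and return to the caller immediately. */
static pending_add enqueue_add(const char *thing) {
    pending_add p;
    size_t len = strlen(thing) + 1;
    p.thing = malloc(len);
    if (p.thing != NULL) {
        memcpy(p.thing, thing, len);
    }
    return p;
}

/* Analogous to the queue running the block later. Ownership of the
 * snapshot passes to the caller, who must free() it. */
static char *run_add(pending_add *p) {
    char *result = p->thing;
    p->thing = NULL;
    return result;
}
```

If the caller overwrites its buffer between enqueue_add() and run_add(), the deferred work still sees the value from the time it was enqueued. Without the snapshot it would see whatever the buffer holds when it finally runs.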
Finally, it shouldn't really matter which one you use for maintaining the contents of an array. The cost of contention is exceedingly high for the synchronous cases. For the asynchronous case, the cost of contention goes way down, but the potential for complexity or weird performance issues goes way up.
When designing concurrent systems, it is best to keep the boundary between queues as small as possible. A big part of that is ensuring that as few resources as possible "live" on both sides of a boundary.