In general, lock free algorithms are less efficient per thread - you're doing more work, as you mentioned, in order to implement a lock free algorithm than a simple lock.
However, they do tend to dramatically improve the overall throughput of the algorithm as a whole in the face of contention. Thread switching latency and context switches, which fast, over many threads slow down the throughput of your application dramatically. Lock free algorithms effectively are implementing their own "locks," but they do so in a manner that prevents or reduces the number of context switches, which is why they tend to out perform their locking counterparts.
That being said - most of this depends on the algorithm (and implementation) in question. For example, I've got some routines that I managed to switch to .NET 4's new concurrent collections instead of using the previous locking mechanisms, and have measured improvements over near 30% in my total algorithm speed. That being said, there are many benchmarks you can find that show reduced performance using some of these same collections when compared to a basic lock. As with all performance optimizations - you really don't know until you measure.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…