Unfortunately, the wrapped native cryptographic service provider (CSP) for MD5, MD5CryptoServiceProvider, is significantly slower than a pure managed implementation. The view that native code is unequivocally faster than managed code is an obstinate one; in many cases the opposite is true. This is such a case, at least in head-to-head measurements.
Using the translated reference MD5 implementation by David Anson, I constructed a quick performance test (source) that aims to measure any large differences in performance between the two implementations. For small data arrays the differences are negligible, as expected, but at around 16 kB the native implementation starts to show a potentially significant delay, on the order of milliseconds. This might not seem like much, but it is orders of magnitude slower than the pure managed implementation. The difference persists as the size of the data being hashed increases, and at the largest tested data array (~250 MB) the difference in CPU time was about 8.5 seconds. Since a hash like this is often used to fingerprint very large files, the extra delay would become noticeable, even against the often much larger delays from I/O.
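A test of this kind can be sketched roughly as follows. This is a minimal illustration rather than the actual test source: the `HashTiming` class, the buffer sizes, and the iteration count are all assumptions, and it measures wall-clock time with `Stopwatch` rather than the CPU time reported above. Only the framework's `MD5CryptoServiceProvider` is timed here; the managed implementation would be passed in through the same factory parameter (its class name, shown in a comment as `MD5Managed`, is assumed).

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

static class HashTiming
{
    // Average wall-clock seconds per ComputeHash call for one algorithm on one buffer.
    public static double SecondsPerHash(Func<HashAlgorithm> create, byte[] data, int iterations)
    {
        using (HashAlgorithm algorithm = create())
        {
            algorithm.ComputeHash(data); // warm-up call, excluded from the timing

            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                algorithm.ComputeHash(data);
            }
            timer.Stop();

            return timer.Elapsed.TotalSeconds / iterations;
        }
    }

    static void Main()
    {
        Random random = new Random(42); // fixed seed so every run hashes the same bytes

        // Sizes from 1 kB to 16 MB, quadrupling each step (an arbitrary choice).
        for (int size = 1024; size <= 16 * 1024 * 1024; size *= 4)
        {
            byte[] data = new byte[size];
            random.NextBytes(data);

            double native = SecondsPerHash(() => new MD5CryptoServiceProvider(), data, 10);

            // For the comparison, time the managed class the same way, for example:
            //   SecondsPerHash(() => new MD5Managed(), data, 10)   // class name assumed
            Console.WriteLine("{0,10} bytes: {1:E3} s per hash (native CSP)", size, native);
        }
    }
}
```

Taking the algorithm as a `Func<HashAlgorithm>` keeps the timing loop identical for both implementations, so any measured difference comes from the hash classes themselves rather than from the harness.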
It's not entirely clear where the delay comes from, since I did not run a pure native test (one that would dispense with wrapping the CSP and consuming it from managed code). Given the nearly identical shape of the two graphs on the log scale, however, it appears that the managed and native implementations have the same intrinsic performance, and that the native results are "shifted" toward slower times, likely by the cost of the interop between native and managed code at runtime. This performance difference between wrapped native CSPs and pure managed implementations has also been reproduced and documented by other investigators.
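The "shift" reading can be made a little more precise. On a logarithmic time axis, two curves of the same shape separated by a fixed vertical offset differ by a constant factor. Writing $t_{\text{managed}}(n)$ and $t_{\text{native}}(n)$ for the time to hash $n$ bytes and $k$ for that factor (symbols introduced here purely for illustration, with no value of $k$ measured or implied):

```latex
\begin{align*}
t_{\text{native}}(n) &= k \, t_{\text{managed}}(n) \\
\log t_{\text{native}}(n) &= \log k + \log t_{\text{managed}}(n) \\
t_{\text{native}}(n) - t_{\text{managed}}(n) &= (k - 1) \, t_{\text{managed}}(n)
\end{align*}
```

Under that assumption the absolute gap grows in step with the hashing time itself, which is consistent with a difference of milliseconds at small sizes becoming seconds at the largest array tested.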
In addition to answering the question "how much faster is the native implementation" in this particular case, I hope this evidence prompts more reflection and investigation whenever the question of native vs. managed arises, and helps break the long-standing and pernicious reflex of assuming that native code is always faster and thus, somehow, better. Managed code is clearly very fast, even in this performance-sensitive domain of bulk data hashing.