Imagine a CPU that performs only 64 bit arithmetic operations. Now imagine how you would implement an unsigned 8 bit addition on such CPU. It would necessarily involve more than one operation to get the right result. On such CPU, 64 bit operations are faster than operations on other integer widths. In this situation, all of Xint_fastY_t
might presumably be an alias of the 64 bit type.
If a CPU supports fast operations for narrow integer types and thus a wider type is not faster than a narrower one, then Xint_fastY_t
will not (should not) be an alias of the wider type than is necessary to represent all Y bits.
Out of curiosity, I checked the sizes on a particular implementation (GNU, Linux) on some architectures. These are not same across all implementations on same architecture:
┌────╥───────────────────────────────────────────────────────────┐
│ Y ║ sizeof(Xint_fastY_t) * CHAR_BIT │
│ ╟────────┬─────┬───────┬─────┬────────┬──────┬────────┬─────┤
│ ║ x86-64 │ x86 │ ARM64 │ ARM │ MIPS64 │ MIPS │ MSP430 │ AVR │
╞════╬════════╪═════╪═══════╪═════╪════════╪══════╪════════╪═════╡
│ 8 ║ 8 │ 8 │ 8 │ 32 │ 8 │ 8 │ 16 │ 8 │
│ 16 ║ 64 │ 32 │ 64 │ 32 │ 64 │ 32 │ 16 │ 16 │
│ 32 ║ 64 │ 32 │ 64 │ 32 │ 64 │ 32 │ 32 │ 32 │
│ 64 ║ 64 │ 64 │ 64 │ 64 │ 64 │ 64 │ 64 │ 64 │
└────╨────────┴─────┴───────┴─────┴────────┴──────┴────────┴─────┘
Note that although operations on the larger types may be faster, such types also take more space in cache, and thus using them doesn't necessarily yield better performance. Furthermore, one cannot always trust that the implementation has made the right choice in the first place. As always, measuring is required for optimal results.
Screenshot of table, for Android users:
(Android doesn't have box-drawing characters in the mono font - ref)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…