Four cases for reinterpret_cast
23 Sep 2020 - John Z. Li
A previous blog post of mine Type punning in C++
mentioned that using reinterpret_cast
to do type punning can easily leads to code that
violates strict alias rule, thus undefined behavior, as the following code does
int d = 1234567;
//don’t do this, undefined behavior
std::cout << *reinterpret_cast<float *>(&d) <<std::endl;
There are quite some posts on the Internet about reinterpret_cast
that actually involves examples violating strict alias rule unintentionally.
But when is it safe to use reinterpret_cast?
I will give 4 cases where reinterpret_cast
can be safely used in this post.
The list is not exhaustive, only including most common cases.
First of all, don’t use C-style cast in C++ code. What C-style cast does is
trying any of static_cast
, const_cast
, reinterpret_cast
or
combinations of the three to get its job done.
It is kind of hard to reason what happens with a C-style cast while
reading C++ code.
When static_cast
or dynamic_cast
expresses intention of the programmer better,
one should choose one of them specifically. With that in mind, here we go.
Case 1: cast between a pointer to signed integer types (char, short, int, etc.,) and a pointer to its corresponding unsigned pointer like below
char a[] = "a char string";
unsigned char* up = reinterpret_cast<unsigned char *>(a);
As a side note, it is also OK to convert a pointer to std::byte
(introduced in C++17)
from a pointer to (signed or unsigned) char.
This is useful when working with third party interfaces that takes unsigned char*
but you have char *
, or std::byte
.
Case 2: cast an object pointer to a pointer to raw bytes. For example, a possible implementation of a string class is as below
class string{
char * data;
size_t capacity;
size_t size;
};
The most significant bit of the most significant byte of the member variable size
,
when set to 1, indicates that the current string object is a small string,
which is stored in the remaining bytes of the object (including an appending ‘\0’ ).
(Remaining 7 bits of that byte is used to store the size information of a small string.)
Suppose it is a little-endian 64-bit machine ,
a strings with its size being smaller or equal to 22 (3 times 8 minus 2)
does not incur heap allocation.
To check whether a string instance is a small string, one needs to perform something like
//string s;
bool is_small_string = reinterpret_cast<char*>(&s)[24] & 0x80 ? true : false ;
to check and let front()
refers to reinterpret_cast<char*>(s)[1]
if it is.
Case 3: cast to escape from nested structs, arrays or unions. For examples, with
struct A{int a;};
struct B{
A a;
int b;
};
struct C{
B b;
int c;
};
C c{{{1}, 2}, 3};
int (& ints)[3] = reinterpret_cast<int(&)[3]>(c);
// to treat c as an array of three ints
object c
, after compilation, boils down to three contiguous ints.
So we can use reinterpret_cast
to treat c
as an array of int of length 3.
This is useful when one works with complex numbers.
No matter how complex number is implemented,
the standard guarantees that for a complex number z, the following is true:
// T is the underlying type
reinterpret_cast<T(&)[2]>(z)[0] == z.real();
reinterpret_cast<T(&)[2]>(z)[1] == z.imag();
and for an array of complex numbers, p, there is
reinterpret_cast<T*>(p)[2*i] == p[i].real();
reinterpret_cast<T*>(p)[2*i + 1] == p[i].imag();
Case 4: cast to restore or construct nested structs, arrays or unions. This is essentially the reverse procedure of Case 3. It is especially useful when working with matrices (two-dimensional arrays) and tensors (arrays with dimension equal to or larger than three). For example,
loat fa[16] = {};
//fb is a reference to a 4-by-4 matrix
float (& fb)[4][4] = reinterpret_cast<float(&)[4][4]>(fa);
//fc is a reference to a 4-by-2-by-2 tensor.
float (& fc)[4][2][2] = reinterpret_cast<float(&)[4][2][2]>(fa);
When dealing with matrices or tensors, casting around like this could be handy.