Do not redefine type char
06 Nov 2020 - John Z. Li
The C (and C++) language standard doesn’t specify whether the char type is signed or unsigned. This annoys some programmers.
I’ve seen suggestions on Stack Overflow that you explicitly define char as signed or unsigned using a #define directive.
The problem with this is that char, unsigned char, and signed char are, by the language standard, three distinct types.
Whether char is implemented like signed char or like unsigned char is implementation-defined; that is, it is a matter of representation.
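The distinction between the three types can be checked at compile time. The following is a minimal sketch, assuming a C++17 compiler; the overloaded function which is a hypothetical name used only for illustration.

```cpp
#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// no matter which representation the implementation picks for it.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Because the three types are distinct, each can have its own overload.
// (which is a hypothetical name, used only for this illustration.)
constexpr int which(char)          { return 0; }
constexpr int which(signed char)   { return 1; }
constexpr int which(unsigned char) { return 2; }

// In C++ a character literal has type char, so the first overload is chosen.
static_assert(which('a') == 0);
```

If char were merely an alias for one of the other two types, the three overloads would collide and the code would not compile.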
At the level of the type system, the internal representation of a type is
independent of the semantics of that type.
So, it is never a good idea to redefine fundamental types of the language.
If you do it, you will hardwire a specific representation of a type into
the type system.
This might cause weird problems when interacting with other code
that has a different idea about the signedness of the char type.
One should never write code that assumes a particular signedness of the char type.
Doing so breaches type constraints; that is, you are assuming things that the type system has
never promised.
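One way to avoid depending on signedness is to convert to unsigned char explicitly at the point where the numeric value matters. The sketch below assumes C++17 and an 8-bit byte; table_index and is_non_ascii are hypothetical helpers, not part of any library.

```cpp
#include <cstddef>

// Hypothetical helper: turn a char into a non-negative table index.
// Going through unsigned char gives the same result whether plain char
// is signed or unsigned on the current platform.
constexpr std::size_t table_index(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: writing "c > 127" directly would never be true
// on a platform where char is a signed 8-bit type.
constexpr bool is_non_ascii(char c) {
    return static_cast<unsigned char>(c) > 0x7F;
}

static_assert(table_index('A') == 65);
static_assert(is_non_ascii('\xE9'));
```

The same conversion is what the character classification functions in <cctype> expect from their callers.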
If you mean char, just use char. If, for example,
a third-party library exposes its interface using signed or unsigned char,
use the corresponding type accordingly,
though I believe it is a mistake to expose an interface via signed or unsigned chars.
If, on the other hand, everyone just sticks to type char,
then on a given platform and toolchain everything just works.
When cross compiling, gcc provides the compiler options -fsigned-char and -funsigned-char,
which force the underlying representation of char to be signed or unsigned respectively.
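If code needs to know which representation the toolchain chose, it can ask the implementation rather than assume. A minimal sketch, assuming C++17; char_is_signed is a name introduced here for illustration.

```cpp
#include <climits>
#include <limits>

// True when plain char has the same range as signed char on this toolchain.
// (char_is_signed is a name introduced for this illustration.)
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

// The two ways of asking must agree: CHAR_MIN is negative exactly when
// char is signed, and zero otherwise.
static_assert(char_is_signed == (CHAR_MIN < 0));
```

Building the same file with -funsigned-char should flip char_is_signed to false, and with -fsigned-char to true. In both cases char remains a type distinct from signed char and unsigned char.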
Another thing to notice is that char doesn’t have to be 8 bits wide.
In C (and C++), char is synonymous with byte,
the width of which is platform dependent.
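The width of a byte is exposed through the CHAR_BIT macro. The sketch below, assuming C++17, shows what the standard does guarantee; storage_bits is a hypothetical helper.

```cpp
#include <climits>

// sizeof(char) is 1 by definition; CHAR_BIT says how many bits that byte holds.
static_assert(sizeof(char) == 1);
static_assert(CHAR_BIT >= 8);  // at least 8 bits, possibly more

// Hypothetical helper: number of bits of storage occupied by a type
// (this counts padding bits, if the type has any).
template <typename T>
constexpr int storage_bits = static_cast<int>(sizeof(T)) * CHAR_BIT;

static_assert(storage_bits<char> == CHAR_BIT);
```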
So, don’t use signed char as a replacement for an 8-bit signed integer type; use int8_t instead.
Likewise, don’t use unsigned char as a replacement for an 8-bit unsigned integer type; use uint8_t instead.
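A short sketch of using the fixed-width types, assuming C++17 and a platform that provides them (int8_t and uint8_t are optional and exist only where an exact 8-bit type is available); low_byte is a hypothetical helper.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// When int8_t and uint8_t exist, they are exactly 8 bits wide.
static_assert(sizeof(std::int8_t) * CHAR_BIT == 8);
static_assert(std::numeric_limits<std::int8_t>::min() == -128);
static_assert(std::numeric_limits<std::uint8_t>::max() == 255);

// Hypothetical helper: extract the lowest 8 bits of a 32-bit value.
// The return type states the intent (an 8-bit number, not a character).
constexpr std::uint8_t low_byte(std::uint32_t value) {
    return static_cast<std::uint8_t>(value & 0xFFu);
}

static_assert(low_byte(0x12345678u) == 0x78);
```

On a platform without an exact 8-bit type this fails to compile, which is preferable to silently computing with a wider byte.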
Likewise, don’t assume that char8_t, newly introduced in C++20, is synonymous with unsigned char.
Although the language standard says that the underlying representation of the type is the same as that of unsigned char,
that doesn’t mean they are the same type. In fact,
a standard-compliant implementation should evaluate std::is_same_v<unsigned char, char8_t>
to be false.
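This too can be checked at compile time. The sketch below is guarded by the __cpp_char8_t feature-test macro so that it still compiles before C++20; kind is a hypothetical overloaded function used only for illustration.

```cpp
#include <type_traits>

// Hypothetical overload set: one overload per distinct type.
constexpr int kind(unsigned char) { return 1; }

#if defined(__cpp_char8_t)  // defined when the compiler supports char8_t
constexpr int kind(char8_t) { return 2; }

// Same size and signedness as unsigned char, yet a separate type.
static_assert(sizeof(char8_t) == sizeof(unsigned char));
static_assert(std::is_unsigned_v<char8_t>);
static_assert(!std::is_same_v<unsigned char, char8_t>);

// With char8_t enabled, a u8 character literal has type char8_t.
static_assert(kind(u8'a') == 2);
#endif

static_assert(kind(static_cast<unsigned char>('a')) == 1);
```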
It is good to be honest with the type system.