strlen vs. wcslen

When working in C++, it is usually better to use C++ strings than NUL-terminated arrays of characters, also known as C-strings. Still, for a number of reasons, C-strings remain popular in C++ code.

A nuisance of string management is that there are actually two base character types: the "normal" char, usually stored in 8 bits and used for representing ASCII strings, and the "wide" character (wchar_t), which takes 16 or 32 bits depending on the platform. Because C lacks function overloading, this leads to two sets of C functions with different names doing the same job on different strings.
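To make the size difference concrete, here is a minimal sketch. The helper storageBytes() is a made-up name, not a standard function; it only multiplies the element count of an array by the size of its character type, and the checks run at compile time.

```cpp
#include <cstddef>

// Hypothetical helper: bytes occupied by a string literal or character array,
// terminator included. N is deduced from the array type at compile time.
template <typename CharT, std::size_t N>
constexpr std::size_t storageBytes(const CharT (&)[N])
{
    return N * sizeof(CharT);
}

// "abc" holds 4 chars ('a', 'b', 'c', '\0'), one byte each by definition.
static_assert(storageBytes("abc") == 4, "char is always one byte");

// L"abc" also holds 4 elements, but each one is as wide as wchar_t
// happens to be on the current platform.
static_assert(storageBytes(L"abc") == 4 * sizeof(wchar_t),
              "wide elements are wchar_t-sized");
```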

So, to get the length of a string, we have two functions: strlen() for char-based strings, and wcslen() for wide-char strings.

Another point about C-strings is that they are relatively easy to get wrong. The terminating '\0' has to be put at the end of the character sequence "by hand", and it is not difficult to forget it, or to overwrite it by mistake. But strlen() has been designed to be fast, not smart: it just runs over the string looking for the first NUL occurrence, and when it finds one, it returns its distance from the beginning of the string. If the string is not correctly terminated, strlen() reads past the end of the array, which is undefined behavior: we could get weird results, or a crash.
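Just to show the idea, this is roughly what such a function does. It is an illustrative sketch, not the code of any real library (actual implementations are heavily optimized), and the name naiveStrLen() is invented to avoid clashing with the real one.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative re-implementation of the strlen() idea: count characters
// until the first NUL. Like the real function, it has no way of knowing
// where the underlying array ends.
std::size_t naiveStrLen(const char* s)
{
    assert(s != nullptr); // a null pointer is not a valid C-string

    std::size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}
```

Calling naiveStrLen("abc\0def") gives 3: everything after the first NUL is simply invisible to the function, even though the literal is eight bytes long.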

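If we know the expected maximum length of the string, we can use it as a safe limit and call strnlen() / wcsnlen() to stay out of trouble. Notice that these two functions come from POSIX, not from the ISO C or C++ standards, even if the most common platforms provide them.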
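When the string lives in a fixed-size array, the array size itself is a natural limit. The sketch below assumes a platform that provides the POSIX strnlen(); boundedLength() and isTerminated() are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h> // strnlen() is POSIX, so we go through the C header

// Hypothetical helper: length of the text stored in a fixed-size buffer
// that may or may not contain a terminator. Never reads more than N chars.
template <std::size_t N>
std::size_t boundedLength(const char (&buffer)[N])
{
    const std::size_t len = strnlen(buffer, N);
    assert(len <= N); // strnlen() never reports more than the limit
    return len;
}

// Hypothetical helper: a result equal to N means no NUL was found in the buffer.
template <std::size_t N>
bool isTerminated(const char (&buffer)[N])
{
    return strnlen(buffer, N) < N;
}
```

With char raw[4] = { 'a', 'b', 'c', 'd' }, calling strlen(raw) would run off the end of the array, while boundedLength(raw) stops at 4 and isTerminated(raw) tells us something is wrong.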

Here is how to use these functions:

char s[] = "Nothing more than a simple string";
std::cout << "String length: " << strlen(s) << " - " << strnlen(s, 10) << std::endl;

wchar_t ws[] = L"Nothing more than a simple string"; // 1.
std::cout << "Wide string length: " << wcslen(ws) << " - " << wcsnlen(ws, 10) << std::endl;

1. Notice the L prefix on the string literal, which specifies that it is a wide string.

For a C++ programmer, this is a bore. Why should we take care of checking the actual base type of a string, when we could rely on the compiler for such a trivial job? Wouldn't it be fun to have a template function that takes as input the C-string we want to check, and lets the compiler do the dirty job of selecting the right string length function?

printStrLen(s);
printStrLen(ws);

We are in C++, so we can use function overloading. Let's wrap the standard C functions in a couple of C++ function wrappers:

size_t myStrLen(const char* s) { return strlen(s); }
size_t myStrLen(const wchar_t* s) { return wcslen(s); }

size_t myStrNLen(const char* s, size_t n) { return strnlen(s, n); }
size_t myStrNLen(const wchar_t* s, size_t n) { return wcsnlen(s, n); }

Now we can create our template function in this way:

template <typename T>
void printStrLen(T str)
{
    std::cout << "String length: " << myStrLen(str) << " - " << myStrNLen(str, 10) << std::endl;
}

Our printStrLen() lets us delegate to the compiler the boring job of selecting the correct function for each C-string. And since the compiler is a smart guy, it helps us avoid silly mistakes. For instance, we can't call printStrLen() passing a pointer to int:

int x[20] = { 12 };
printStrLen(x); // compiler error: no myStrLen() overload accepts an int*
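If we want to see the compiler's decision without breaking the build, one possible approach is a small detection trait. This is just a sketch: CanMeasure is a made-up name, and the trait only reports whether a call to myStrLen() would compile for a given type.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <type_traits>
#include <utility>

// The same pair of overloads wrapping the standard C functions.
inline std::size_t myStrLen(const char* s)    { return std::strlen(s); }
inline std::size_t myStrLen(const wchar_t* s) { return std::wcslen(s); }

// Hypothetical trait: false by default...
template <typename T, typename = void>
struct CanMeasure : std::false_type {};

// ...and true only when the expression myStrLen(T) is well-formed.
template <typename T>
struct CanMeasure<T, decltype(void(myStrLen(std::declval<T>())))>
    : std::true_type {};

static_assert(CanMeasure<const char*>::value, "char strings are accepted");
static_assert(CanMeasure<wchar_t*>::value, "wide strings are accepted");
static_assert(!CanMeasure<int*>::value, "a pointer to int is rejected");
```

The last line is the same check the compiler performs when it refuses printStrLen(x): an int* converts to neither const char* nor const wchar_t*, so no overload is viable.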
