
From heap to std::string

I'm writing a slightly more complex example on ASIO TCP than the first one, where the server sends a message through the socket to the client that reads it, and then both terminate.

The problem we face in this post has little to do with ASIO. It is about how we could smartly create a std::string from a C-string on the heap without risking memory leaks.

This issue arises from the fact that I want to send/receive through a socket an integer representing the C-string size, and then the string itself, as a stream of characters. The sending part goes smoothly, but receiving is a bit bumpy. The message has a variable size, so we are forced to store it on the heap, and we can allocate it only after we know its size. This pushes us toward clumsy code: we have to remember to delete the buffer after the std::string has been generated from it, and we have to pay attention not to introduce memory leaks.

This was the first clumsy version:
std::string f(/* ... */)
{
    // ...
    int len;
    boost::asio::read(sock_, boost::asio::buffer(&len, sizeof(int)));

    char* local = new char[len+1];
    boost::asio::read(sock_, boost::asio::buffer(local, len)); // 1
    local[len] = '\0';

    std::string message(local);
    delete[] local; // 2
    return message; // 3
}
1. First issue: if an exception is thrown here, the memory associated with "local" is lost in space (or better, in the heap).
2. Second issue: we must delete what we have newed (using delete[], since it was allocated by new[]), and we must do it before exiting the scope, so this is exactly where this line belongs. It is correct. But does it look elegant to you?
3. Third issue: had we returned the result of the string ctor directly, practically any compiler would optimize the unnecessary copy out. Here we return a named local instead, so that optimization is less certain. Sure, if your compiler already supports C++0x and the std::string implementation you use has a move constructor, you can take advantage of it here.

So, you could get rid of issue 3 using the std::string move constructor, if available, and issue 2 is a minor one, but issue 1 should not be taken lightly.

A solution is encapsulating that code in a class, letting the C++ stack unwinding take care of the details:
class Reader
{
private:
    char* local_;
public:
    Reader() : local_(nullptr) {}
    ~Reader() { delete[] local_; } // 1

    std::string read(boost::asio::ip::tcp::socket& sock_)
    {
        int len;
        boost::asio::read(sock_, boost::asio::buffer(&len, sizeof(int)));

        local_ = new char[len+1];
        boost::asio::read(sock_, boost::asio::buffer(local_, len)); // 2
        local_[len] = '\0';

        return std::string(local_); // 3
    }
};

// ...

std::string f()
{
    Reader r; // 4
    return r.read(sock_);
}
1. The call to delete is nicely put in the class destructor.
2. Even in case of exception, the dtor is called, so we can ensure delete is called on local_.
3. No extra copy object is generated by your compiler for this code.
4. The reader object is on the stack, so we can rely on the compiler for its correct memory management


std::equal

When you compare two sequences, you usually want to know whether every pair of corresponding elements is equal or whether there is at least one difference. If this is the case, you can use std::equal(). But you should use it with some care: you can't compare apples with oranges, and you should ensure the second sequence is not shorter than the first, since the comparison is done for each element of the first one, till a difference, or its end, is met. If the second sequence is shorter, we risk a catastrophic dereferencing of an invalid iterator.

Here is the class I used as a base for creating a couple of tests for std::equal():

class EqualTest : public ::testing::Test
{
protected:
    EqualTest()
    {
        vi_.reserve(10);
        vc_.reserve(10);

        for(int i = 0; i < 10; ++i)
            vi_.push_back(i * i);

        std::copy(vi_.begin(), vi_.end(), std::back_inserter(vc_));
    }

    std::vector<int> vi_;
    std::vector<int> vc_;
};

In the first test I show the normal usage: we simply pass to equal() the beginning/end iterators for the first sequence, and just the beginning of the second one. Notice that before using equal() I also check the vector sizes, asserting that the second sequence is at least as long as the first one, to ensure we don't risk a crash while comparing:

TEST_F(EqualTest, Same)
{
    ASSERT_LE(vi_.size(), vc_.size());
    EXPECT_TRUE(std::equal(vi_.begin(), vi_.end(), vc_.begin()));
}

The second test is more interesting, since we use a custom comparison function (actually, a lambda function) to force std::equal() to behave how we want:

TEST_F(EqualTest, Lambda)
{
    const int step = 3;
    std::for_each(vi_.begin(), vi_.end(), [step](int& i){ i += step; });

    // the second sequence must not be shorter than the first one
    ASSERT_LE(vi_.size(), vc_.size());
    EXPECT_TRUE(std::equal(vi_.begin(), vi_.end(), vc_.begin(),
        [step](int i, int j) { return i == j + step; } ));
}


std::find and std::find_if

If we are looking for an element in a sequence, and we don't know much about the sequence itself besides the fact that it could contain a specific value, we will probably use find() or find_if(), both included as algorithms in the C++ STL.

The Google Test fixture I use for some examples on these algorithms is based on this class:
class FindTest : public ::testing::Test
{
protected:
   FindTest()
   {
      vi_.reserve(10);
      for(int i = 1; i <= 10; ++i)
         vi_.push_back(i*i);
   }

   std::vector<int> vi_;
};
As you can see, we have a vector containing the first 10 natural square numbers (zero excluded).

Find - Negative test.

The find algorithm scans the items in the passed sequence, comparing them with the value passed as third parameter. If successful, it returns the iterator to the first matching item, otherwise it returns the right delimiter of the interval.

Let's see if find() works as expected when looking for a value that is not in the sequence:
TEST_F(FindTest, CantFind)
{
   auto it = std::find(vi_.begin(), vi_.end(), 42);

   EXPECT_TRUE(it == vi_.end());
}
Notice that I use the C++11 keyword "auto" to save a bit of typing. If your compiler does not support it yet, you should tell it the exact type, std::vector<int>::iterator, an iterator to a vector of int. Since the compiler can deduce that without our help, it's nice to let it do the job.

Find - Positive test.

Easy enough. If we pass to find() a (non-zero) square number less than or equal to 100, we expect it to be found:
TEST_F(FindTest, Find)
{
   auto it = std::find(vi_.begin(), vi_.end(), 64);

   ASSERT_TRUE(it != vi_.end());
   EXPECT_EQ(*it, 64);
}
From the point of view of the Google Test user, this code is (mildly) interesting because it shows the use of an ASSERT macro. Since we don't want to dereference the iterator in case it turns out to be an invalid one, we assert that it should be different from end() on the vector, so that the test stops there if it is not.

Find If - Function.

If we want to find the first element in a sequence of integers that has 5 as a divisor, we could create a function that checks its parameter for that, and use it as the third parameter in a find_if() call:
bool five(int i) { return i%5 == 0; }

TEST_F(FindTest, FindIfFunction)
{
   auto it = std::find_if(vi_.begin(), vi_.end(), five);

   ASSERT_TRUE(it != vi_.end());
   EXPECT_EQ(*it, 25);
}
Find If - Functor.

If we are looking for a more generic solution, a functor could be the answer. Here we use Divisor, a functor that checks if the passed value has as factor (aka divisor) the number we passed to its ctor:
class Divisor
{
private:
   int mod_;
public:
   Divisor(int mod) : mod_(mod) {}
   bool operator()(int i) const { return i % mod_ == 0; }
};

TEST_F(FindTest, FindIfFunctor)
{
   auto it = std::find_if(vi_.begin(), vi_.end(), Divisor(5));

   ASSERT_TRUE(it != vi_.end());
   EXPECT_EQ(*it, 25);
}
Find If - Reusing standard functors by adaptors.

Instead of creating a new class, it is better to check in the libraries at hand if there is anything we could reuse. In this specific case, we could think of using the modulus<> functor. That could be a good idea, but we should take care of the fact that find_if() expects a functor accepting one parameter, and not two, as modulus<> does. This is not an issue, since we can wrap it with bind2nd, passing 5 as second parameter. Still, that is not all, because we need to reverse the result: our number has five as divisor if it is zero modulus five, and zero converts to false. This is not a problem either. We just apply the unary negator not1 to the result. (Be aware that bind2nd and not1 are deprecated in C++11 and removed in C++17, so this example only builds against the older standards.)
TEST_F(FindTest, FindFiveMod)
{
   const int mod = 5;
   auto it = std::find_if(vi_.begin(), vi_.end(),
   std::not1(std::bind2nd(std::modulus<int>(), mod)));

   ASSERT_TRUE(it != vi_.end());
   EXPECT_EQ(*it % mod, 0);
}
Find If - Lambda.

The previous example was cool, but probably a bit too complex for such an easy task as the one we are dealing with. We can get more readable code using a lambda function (if this C++11 feature has already reached your compiler):
TEST_F(FindTest, FindFiveLambda)
{
   const int mod = 5;
   auto it = std::find_if(vi_.begin(), vi_.end(), [mod](int i)
   {
      return i % mod == 0;
   });

   ASSERT_TRUE(it != vi_.end());
   EXPECT_EQ(*it % mod, 0);
}


Changing a collection via for_each

If our compiler supports the new C++0x (now C++ 2011) range-based for loop (the so-called "for each" loop), we should probably use it. Until then, when we need to loop on a collection to change its values in some way, a good option is the STL standard algorithm for_each().

for_each() is classified as a non-modifying algorithm, in the sense that it does not change the structure of the collection on which it operates - we can't add or remove items - but it lets us change the item values.

It requires as parameters two input iterators, delimiting the sequence, and a function object that is applied to every single item in the passed sequence.

To practice on for_each I wrote a simple test fixture (using Google Test) that works on an integer vector:
class ForEachTest : public ::testing::Test
{
protected:
    ForEachTest() // 1.
    {
        vi_.reserve(10);
        for(int i = 0; i < 10; ++i)
            vi_.push_back(i * i);

        vc_.reserve(10);
        std::copy(vi_.begin(), vi_.end(), std::back_inserter(vc_)); // 2.
    }

    void checkVector(const int increase) // 3.
    {
        for(size_t i = 0; i < vi_.size(); ++i)
            EXPECT_EQ(vc_[i] + increase, vi_[i]);
    }

    std::vector<int> vi_; // used by test cases
private:
    std::vector<int> vc_; // copy for comparison
};

1. Since I planned to do a few tests all sharing the same structure, I decided to use a fixture. This class provides the basic setup. The ctor initializes an int vector that is going to be changed by the for_each() usage - a copy of the original vector is kept so that we can perform a test on the result.
2. It could be useful to notice how I copied the original vector using the STL copy() algorithm. A back_inserter() on the destination was required since we actually have to push any new item in it.
3. Utility function used to compare the expected result with the actual one.

A normal free function.

If the job we want to perform on our sequence is trivial (as here: increasing each element by three) we could use just a free function:
void increase3(int& value)
{
    value += 3;
}

TEST_F(ForEachTest, IncreaseFunction)
{
    std::for_each(vi_.begin(), vi_.end(), increase3);
    checkVector(3);
}

Functor.

Often a free function is not enough. For instance, we could be interested in using different increase values, so a functor looks like a more suitable solution:
class Increaser
{
private:
    int step_;
public:
    Increaser(int step) : step_(step) {}
    void operator()(int& value) { value += step_; }
};

TEST_F(ForEachTest, IncreaseFunctor)
{
    const int step = 5;
    std::for_each(vi_.begin(), vi_.end(), Increaser(step));

    checkVector(step);
}

Lambda

Sometimes we'd love to have the flexibility offered by functors, but we are not so keen on having a standalone class just for such a limited job. In this case we can use a lambda function:
TEST_F(ForEachTest, IncreaseLambda)
{
    const int step = 3;
    std::for_each(vi_.begin(), vi_.end(), [step] (int& i) { i += step; });

    checkVector(step);
}

Lambda functions are a C++0x feature; if your compiler does not support them yet, you could use the Boost implementation (Boost.Lambda) instead.


C++ 2011

The final C++0x draft has been approved by the ISO C++ committee! More details on slashdot.


Template and insertion sort

I have written in the previous post a minimal C function that implements the insertion sort algorithm. It actually works, but it could be a lot better. Here we are going to use some C++ functionality to make it stronger and more flexible.

We don't want many limits on the collection to be sorted, so we pass to the function two iterators, one to the beginning and the other to the end. Since the code relies on iterator arithmetic and ordering (begin + 1, back - 1, cur < end), they actually have to be random-access iterators, even if the template parameter is named InputIt. We also want to use our function on almost any data type - we just require that it supports the comparison we use, so that we can actually sort the collection.

All this led me to rewrite the function in this way:
template <typename InputIt, typename Compare> // 1.
void insertionSort(InputIt begin, InputIt end, Compare comp) // 2.
{
    if(end - begin < 2) // 3.
        return;

    for(InputIt cur = begin + 1; cur < end; ++cur) // 4.
    {
        // "typename" is required here, since value_type is a dependent name
        typename std::iterator_traits<InputIt>::value_type temp = *cur; // 5.

        InputIt back = cur; // 6.
        for(;back > begin && comp(*(back-1), temp); --back) // 7.
            *(back) = *(back-1);
        *back = temp; // 8.
    }
}

1. It's a template function, based on the iterator type and the comparison functor type we are about to use.
2. We pass to the function two iterators delimiting the interval on which we want to operate; the third parameter is the functor specifying how we want to sort the data. To get the usually expected behaviour (first element smaller, last element bigger) we should pass the std::greater comparison functor. Notice that this is the opposite of the std::sort() convention, where std::less gives the increasing order.
3. This check was implicit in the C code of the previous example. Here we have to make it explicit: in case the passed collection has less than two elements, there is nothing to do.
4. External loop. We use another iterator, of the same type as the ones passed to the function, initialized to be the next after begin; and we loop until we reach the end.
5. The type declaration for the temp variable could look quite impressive. We want to have a local variable for storing the value referenced by cur; we don't have explicit access to this type, but it is implicitly available since we know the type of the iterators on it. The std::iterator_traits<InputIt> structure encapsulates the properties of the specified iterator, and its value_type is exactly what we were looking for. If the iterators passed to this function were defined to work on int, this value_type is set to int, so in this line we would actually say that temp is an int, set to the value pointed to by cur. We could have saved some typing using the C++0x auto type declaration:
auto temp = *cur;
Since the compiler could easily deduce the temp object type from the iterator dereferencing on the other side of the assignment operator.
6. We have to twist the logic of the original C code a bit here, since we are working on a straight iterator mimicking what should be the use of a reverse iterator. The problem is that there is nothing before the first valid iterator on a collection (while rend() would refer exactly to that position) but the original code was designed to work in that way. Here we work in a less natural way, initializing our back iterator to be equal to the "end" iterator for the already ordered section of our collection, and using the element pointed to by its predecessor.
7. Internal loop. We scan the left part of the collection, the "sorted" one, moving each element one step to the right, as long as the elements satisfy the comparator passed to the function and we have not reached the collection left limit.
8. Finally, we put the temp value in its new (or back in its original) position.

It would be nice for the user programmer to have an overload that requires just a couple of iterators, and uses the "normal" comparison operator:
template <typename InputIt>
void insertionSort(InputIt begin, InputIt end)
{
    // "typename" is required here too, for the same reason as above
    typedef typename std::iterator_traits<InputIt>::value_type T;
    insertionSort(begin, end, std::greater<T>());
}

Here are some unit tests I wrote during the development:
TEST(TestK02, Empty) {
    std::vector<int> vi;
    insertionSort(vi.begin(), vi.end());
    EXPECT_EQ(0, vi.size());
}

TEST(TestK02, Normal) {
    const int SIZE = 10;
    int values[SIZE] = { 42, 12, 94, 45, 1, 55, 95, 34, 73, 29 };
    std::vector<int> vi(values, values + SIZE);

    insertionSort(vi.begin(), vi.end());

    // vector expected sorted
    for(int i = 1; i < SIZE; ++i)
        EXPECT_GE(vi[i], vi[i-1]);
}

TEST(TestK02, Strings) {
    const int SIZE = 10;
    std::string values[SIZE] = { "ft", "te", "nf", "ab", "o", "fr", "ng", "tf", "st", "tn" };
    std::vector<std::string> vi(values, values + SIZE);

    insertionSort(vi.begin(), vi.end());

    // vector expected sorted increasing
    for(size_t i = 1; i < vi.size(); ++i)
        EXPECT_GE(vi[i], vi[i-1]);
}

TEST(TestK02, StrDecr) {
    const int SIZE = 10;
    std::string values[SIZE] = { "ft", "te", "nf", "ab", "o", "fr", "ng", "tf", "st", "tn" };
    std::vector<std::string> vi(values, values + SIZE);

    insertionSort(vi.begin(), vi.end(), std::less<std::string>());

    // vector expected sorted decreasing
    for(size_t i = 1; i < vi.size(); ++i)
        EXPECT_LE(vi[i], vi[i-1]);
}


Insertion sort

I reckon there is no practical use in talking once more about insertion sort, if not to have a bit of (twisted) fun. The idea of this cute sorting algorithm is that we partition our to-be-sorted collection in two blocks: the elements already ordered, and the ones that still have to be processed. We take the elements of the second group one by one, checking each of them against the ones in the first group to find its place, till the job is done.

It should be enough to read this high level description to see that it relies on two nested loops, each proportional to the collection size, meaning that we are in the realm of the Big Oh - N Square algorithms, O(N^2).

On the other hand it has a few advantages. First of all, it is easy to design and develop. Then we should appreciate the fact that it is an in-place algorithm (we need just one temporary element), stable and relatively efficient if the array is already near-sorted (in the best case it becomes a Big Oh - N algorithm, O(N)).

I think that this simple C implementation should help in understanding how the algorithm works:

void insertionSort(int* vi, int len) // 1.
{
    for(int j = 1; j < len; ++j) // 2.
    {
        int temp = vi[j]; // 3.

        // 4.
        int i = j - 1; // index of last sorted value
        for(;i >= 0 && vi[i] > temp; --i)
            vi[i+1] = vi[i]; // 5.

        // 6.
        vi[i+1] = temp;
    }
}

1. It expects in input a pointer to an integer array and its size. Here is the main weakness of this implementation: it is error prone, and works on just one data type. We'll address these issues in the next post; for the moment we just want to understand the algorithm better.
2. External loop. We have divided the array in two parts. On the left we have just one element - that we call "sorted" - on the right we have all the others - that we assume "unsorted".
3. We consider the first element of the "unsorted" subarray. We are duplicating it in a temporary variable, so that we can reuse its location in the array, if we need to move the "sorted" elements.
4. Inner loop. We scan the "sorted" elements backward until we see that the current "sorted" element is not greater than the currently checked "unsorted" one. Naturally we pay attention to stop when we get to the beginning of the array.
5. We are making room for the "unsorted" element, moving all the "sorted" elements bigger than it one step to the right.
6. We have made room for "temp" in the "sorted" area at position "i+1". We put it there.

Running the code step by step in debug mode should help a lot in understanding how things work.


std::move()

We have seen how we can use Rvalue references to improve performance. Given Rvalue references, we can overload functions to take advantage of the fact that we know that the passed object is just a temporary.

It would be a pity if we couldn't use such a mechanism on an lvalue too, when we are willing to take responsibility for it. After all, C++ is known for letting programmers do whatever they want, for good and for bad. And indeed we can use std::move() for this.

Recalling the code we used in the previous post, say we have an instance of Something that we don't care about anymore: we can treat it as if it were a temporary. We tell this to the compiler using std::move():

Something s5("hello");

Something s6 = std::move(s5); // 1.
s5 = std::move(s6); // 2.

1. Here we are telling the compiler that we want to create a Something object from s5, and that it can call the ctor for Rvalue reference to do it. Notice that std::move() does not move anything by itself: it is just a cast to an Rvalue reference, and the actual stealing is done by the ctor. We are aware of the consequences, or at least we should be.
2. Naturally, we can do the same for calling the assignment operator overloaded for Rvalue reference. Same caveat: the compiler trusts us, we should have designed the function correctly and we should accept the result of what is going on.


What is Rvalue reference good for

C++0x gives us a way of working not only with Lvalue reference (the "classic" reference provided by C++) but also with Rvalue reference. This helps us write better performing code. Let's see how.

Here is a simple class wrapping a resource that we should think of as expensive to create and copy:

// requires <cstring> and <iostream>
class Something
{
private:
    char* s_;

    void init(const char* s)
    {
        if(!s) // nothing to copy from an empty object
        {
            s_ = nullptr;
            return;
        }

        std::cout << "expensive operation" << std::endl;
        s_ = new char[strlen(s)+1];
        strcpy(s_, s);
    }
public:
    Something(const char* s = nullptr)
    {
        std::cout << "ctor for " << (s ? s : "nullptr") << std::endl;
        init(s);
    }

    Something(const Something& rhs)
    {
        std::cout << "copy ctor" << std::endl;
        init(rhs.s_);
    }

    Something& operator=(const Something& rhs)
    {
        std::cout << "assignment operator" << std::endl;
        if(this == &rhs)
            return *this;

        delete[] s_; // allocated by new[], so released by delete[]
        init(rhs.s_);
        return *this;
    }

    const char* get() const { return s_ ? s_ : ""; }

    ~Something()
    {
        std::cout << "dtor for " << (s_ ? s_ : "nullptr") << std::endl;
        delete[] s_;
    }
};

It is often useful to provide a factory method. Let's see a couple of such functions:

Something createSomething(const char* st)
{
    std::cout << "A local object that could be easily optimized" << std::endl;
    return Something(st); // 1.
}

Something createSomething2(const char* st)
{
    std::cout << "A local object that can't be easily optimized" << std::endl;
    Something s(st); // 2.

    std::cout << "Some job required between object creation and return" << std::endl;
    return s;
}

1. If we can put the creation of the object in the return statement, we can usually rely on the compiler to perform all the possible optimizations. Theoretically speaking we should create a temporary object here, and copy it to its natural destination in the caller code, but almost any compiler is smart enough to remove this copy in the produced code (return value optimization).
2. If the object is created as a named local and returned later, removing the copy (named return value optimization) is allowed but harder, and the compiler may not do it, as happened in the build that produced the log shown below.

We usually call these functions in this way:

Something s4 = createSomething("wow");
Something s5 = createSomething2("mow");

Having a look at the log produced by the first call:

A local object that could be easily optimized
ctor for wow
expensive operation

We see that the compiler actually removed the creation/deletion of unnecessary temporary objects, minimizing the operation cost.

For the second call we are not so lucky:

A local object that can't be easily optimized
ctor for mow
expensive operation
Some job required between object creation and return
copy ctor
expensive operation
dtor for mow

As we see, there is an unnecessary creation/deletion of a temporary object, which implies one call too many to the operation marked as expensive.

Let's use the new Rvalue reference to improve the Something class:

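Something(Something&& rhs) // 1.
{
    std::cout << "Move copy ctor for " << (rhs.s_ ? rhs.s_ : "nullptr") << std::endl;

    s_ = rhs.s_;
    rhs.s_ = nullptr;
}

Something& operator=(Something&& rhs) // 2.
{
    std::cout << "Move assignment for " << (rhs.s_ ? rhs.s_ : "nullptr") << std::endl;
    if(this == &rhs) // moving an object onto itself: nothing to do
        return *this;

    delete[] s_; // the resource was allocated by new[]
    s_ = rhs.s_;
    rhs.s_ = nullptr;

    return *this;
}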

1. A move ctor, that is, a ctor that relies on the fact that the passed object is a temporary. We don't need to create a new copy of the passed resource, we can simply steal it!
2. Same for the assignment operator.

If we use our improved Something class definition, we have this log:

A local object that can't be easily optimized
ctor for mow
expensive operation
Some job required between object creation and return
Move copy ctor for mow
dtor for nullptr

We still have a temporary object, but we take advantage of knowing its nature, and we spare the unnecessary call to the expensive operation.


Lvalue and Rvalue reference

Before C++0x, a variable could contain a value, a pointer to a value, or a reference to a value. C++0x makes a distinction between Lvalue reference - the reference as we already knew it - and Rvalue reference. Here we see what all this is about, then we'll see what an Rvalue reference is good for.

Let's declare three variables:

int i = 7; // 1.
int* pi = &i; // 2.
int& ri = i; // 3.

1. i is a variable of type int. Its value is an integer currently set to 7.
2. pi is a pointer to an int. It has been initialized with the address of i, so we say that it points to i.
3. ri is a reference to an int. It has been initialized as an alias of i.

Pointers and references are not so different: both of them let us access another memory location containing a value of a certain type. A difference is that we use a reference in a more natural way, the same as a "normal" value variable, while using a pointer requires dereferencing it, leading to a more complicated syntax. On the other hand a reference is less flexible, since it must be initialized with a referenced object that can't be changed afterwards.

Now about the difference between Lvalue and Rvalue reference. As we have seen, a "classic" reference is associated at its creation with another variable that should be an Lvalue. Lvalue is short for "Left Value", meaning "something that could be on the left side of an assignment operator". What has to be on the right side is called, you should have guessed it, an Rvalue ("Right Value").

This means we can't write code like this:
int& j = 7; // this does not compile
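As we said, a (non-const) reference should be initialized with an Lvalue, but a literal like 7 can't be on the left side of an assignment. (A reference to const, like const int&, is allowed to refer to 7, but then we can't change the value through it.)

If we want to create a reference associated with an Rvalue we can (now) use an Rvalue reference: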
int&& j = 7;

All of this should be clear; the question is: why introduce Rvalue reference in C++? We'll see that in the next post.


String, const, access violation

It's easy to say C-string, but there are actually many different ways of declaring a C-string, each with its own peculiar characteristics. And using the wrong one for the job at hand could even lead to an access violation exception.

A first way of declaring a C-string is this:
char s[] = "Nothing more than a simple string";
Here we are using the array notation. So we are telling the compiler that the name "s" will never refer to anything other than that memory location, initialized to contain the specified text (and a '\0' to mark its end). Sure, we can change the memory itself; for instance we could change the first letter of the string:
s[0] = 'M';
But we can't reuse "s", this line won't even compile:
s = "Another string"; // compiler error

If we want to reuse the variable name to point to different strings in memory, we should use instead the pointer notation:
char* s2 = "Nothing more than a simple string";
In this case it is legal to assign a new string to our variable:
s2 = "Another string";
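But we can't modify the string itself: a string literal is constant, and its conversion to a plain char* is allowed only for C compatibility (it is deprecated in C++03 and no longer valid in C++11, so a modern compiler warns about it or rejects it). If we try to do something like this:
s2[0] = 'B'; // access violation
the behavior is undefined, and on Windows we are typically about to crash our application.
If we are using Visual C++, we can use a Microsoft extension (structured exception handling) to try-catch the code: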

#include <windows.h> // EXCEPTION_ACCESS_VIOLATION
#include <excpt.h>

// ...

void aFunction()
{
    char* s2 = "Nothing more than a simple string";

    __try // 1.
    {
        s2[0] = 'B'; // access violation
    }
    __except(EXCEPTION_EXECUTE_HANDLER) // 2.
    {
        if(GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION) // 3. (0xc0000005)
        {
            // access violation
        }
    }
}

1. The __try keyword introduces something very similar to the standard C++ try block.
2. And __except is something like the C++ catch clause, but it expects an integer defining what to do. The define EXCEPTION_EXECUTE_HANDLER specifies to stop the current execution flow, passing to the associated block, and then to the first line after the __except block.
3. GetExceptionCode() returns the code identifying the reason why the exception was generated. The exception code for an access violation is defined in the standard Windows include files as EXCEPTION_ACCESS_VIOLATION and STATUS_ACCESS_VIOLATION.

If you are writing code that could generate such a problem in C++ (again, Visual C++) you can use the standard try/catch mechanism, with a catch(...) clause, but you should tell the compiler that you want to use the asynchronous exception handling model (option /EHa - and not the standard /EHs). This leads to code that is bigger and slower - but safer.

We have seen that we have two choices: having a variable bound to a block of memory that we can modify as we please (array notation), or using a pointer, that we can freely associate with different strings that can't be changed.
But using pointers we can easily get much more freedom:

char* p = s; // 1.
*p = 'B'; // 2.
p = s + 10; // 3.

1. We say that p points to the same chunk of memory as s, which has been initialized to be modifiable.
2. Since the string pointed to by p is modifiable, well, we can modify it.
3. And it is a pointer, free as air, so we can reuse it to point to something else (here: the substring starting 10 characters to the right of the beginning of s).

A lot of freedom indeed. Someone could think too much freedom. Sometimes we want to put a limit on what one can do with a string.

If we don't want the memory to be changed, we can declare our pointer as a pointer to const:

const char* r = s;
*r = 'S'; // compiler error
r = s + 15;

A "const char*" is a pointer to a string that can't be changed through it. If we try to modify the associated memory using this pointer, we'll get a compiler error. On the other hand, we can reuse the pointer to refer to another memory location.

If we want the pointer to never change its associated block of memory, we write:

char* const t = s;
*t = 'S';
t = s + 12; // compiler error

A "char* const" allows us to modify the memory it points to, but if we try to reuse it for another memory location we get a compiler error.

And sure, we can combine the two kinds of constness:

const char* const z = s;
*z = 'K'; // compiler error
z = s + 10; // compiler error


strlen vs. wcslen

When working in C++ it would usually be better to use C++ strings instead of NUL terminated arrays of characters, AKA c-strings. Anyway, for a number of reasons, c-strings are still popular in C++ code too.

A nuisance about string management is that we actually have two base types: the "normal" char, usually stored in 8 bits, used for representing ASCII strings; and the "wide" character (wchar_t), taking 16 or 32 bits. Due to the lack of function overloading in C, this leads to two sets of C functions with different names for doing the same job on different strings.

So, to get the length of a string, we have two functions: strlen(), for char based strings, and wcslen(), for wide char strings.

Another point about a C-string is that it is relatively easy for it to be buggy. The fact is that a '\0' has to be put at the end of the character sequence "by hand". And it is not difficult to forget to do that, or to overwrite the terminator by mistake. But strlen() has been designed to be fast, not smart. It just runs over the string looking for the first NUL occurrence. When it finds that, it returns its distance from the string beginning. If no correct end of string is set we could get weird results.

If we know the expected max value for the string length, we can use it as a safe limit, using strnlen() / wcsnlen() to avoid troubles. Be aware that these two functions are not part of the C++ standard: they come from POSIX, and are also provided by Visual C++.

Here is how to use these functions:

char s[] = "Nothing more than a simple string";
std::cout << "String length: " << strlen(s) << " - " << strnlen(s, 10) << std::endl;

wchar_t ws[] = L"Nothing more than a simple string"; // 1.
std::cout << "Wide string length: " << wcslen(ws) << " - " << wcsnlen(ws, 10) << std::endl;

1. Notice the L introducing a constant string, to specify it is a wide string.

For a C++ programmer, this is a bore. Why should we take care of checking the actual base type of strings, when we could rely on the compiler for such a trivial job? Wouldn't it be fun to have a template function that takes as input the C-string we want to check, and lets the compiler do the dirty job of selecting the right string length function?

printStrLen(s);
printStrLen(ws);

We are in C++, so we can use function overloading. Let's wrap the standard C functions in a couple of C++ function wrappers:

size_t myStrLen(const char* s) { return strlen(s); }
size_t myStrLen(const wchar_t* s) { return wcslen(s); }

size_t myStrNLen(const char* s, size_t n) { return strnlen(s, n); }
size_t myStrNLen(const wchar_t* s, size_t n) { return wcsnlen(s, n); }

Now we can create our template function in this way:

template <typename T>
void printStrLen(T str)
{
    std::cout << "String length: " << myStrLen(str) << " - " << myStrNLen(str, 10) << std::endl;
}

Our printStrLen() lets us delegate to the compiler the boring job of selecting the correct function for each C-string. And since the compiler is a smart guy, it helps us avoid silly mistakes. For instance, we can't call printStrLen() passing a pointer to int:

int x[20] = { 12 };
printStrLen(x); // compiler error
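
Since strnlen() and wcsnlen() are not available everywhere, here is a minimal sketch of a variation that does not need the wrapper pairs at all. It is not the code of this post: genericStrLen and genericStrNLen are hypothetical names, the first relies on the standard std::char_traits<CharT>::length(), and the second is a plain bounded loop written for the occasion.

```cpp
#include <cstddef>
#include <string>

// Length of a NUL terminated string of any standard character type.
// std::char_traits picks the right implementation for char, wchar_t, ...
template <typename CharT>
std::size_t genericStrLen(const CharT* s)
{
    return std::char_traits<CharT>::length(s);
}

// Bounded version: never looks at more than n characters, so a missing
// terminator cannot make it run past the limit we pass in.
template <typename CharT>
std::size_t genericStrNLen(const CharT* s, std::size_t n)
{
    std::size_t i = 0;
    while (i < n && s[i] != CharT())
        ++i;
    return i;
}
```

With the strings used above, genericStrLen(s) and genericStrLen(ws) count the same 33 characters, while genericStrNLen(s, 10) stops at the limit.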


Waiting for C++0x for each

Among the many useful things we are going to have at hand with the shiny new C++0x standard, there is also a new way of looping, the so-called "for each" loop. If you use other languages (such as Java or Perl, just to name a couple) this is no news at all and, actually, the standard C++ for each closely resembles them, especially the Java "enhanced for loop", as they call it.

As far as I can see, there is only one issue with the C++0x for each: currently it is not available. At least, not unless you are using the GNU compiler.

If you are not in the GNU club, you still have at least a couple of alternatives, non-standard implementations provided by Boost and Microsoft. I wouldn't recommend using them, but you may find them in the code you are working with.

Say that we have a vector of integers, like this:
std::vector<int> vi;
vi.reserve(10);

for(int i = 0; i < 10; ++i)
    vi.push_back(i);

And we want to print all its elements on one line, using a blank as a delimiter.

A classic way of doing that is looping with a for from zero to the size of the vector:
for(size_t i = 0; i < vi.size(); ++i)
    std::cout << vi[i] << ' '; // print the element, not its index
std::cout << std::endl;

There's nothing wrong with this solution, but it is not very expressive of what we are doing: we want to print every single value of the vector. It could be said louder.

If we are working with Visual C++, we can say it using the Microsoft proprietary version of for each:
for each(const int& i in vi)
    std::cout << i << ' ';
std::cout << std::endl;

In this way we are saying explicitly that we are doing our job for each of the elements of our vector - but we are saying it in a non-portable way. So, better to avoid it.

A portable way of doing the same is using the BOOST_FOREACH macro:
BOOST_FOREACH(const int& i, vi)
    std::cout << i << ' ';
std::cout << std::endl;

The Boost libraries are available for a vast number of C++ compilers, so portability shouldn't be an issue. On the other hand, this is not standard, and some people (me included) don't much like using a macro when an alternative is available.

And a good alternative is provided by the standard algorithm for_each - better if coupled with a lambda function:
std::for_each(vi.begin(), vi.end(), [] (const int& i) {std::cout << i << ' ';} );
std::cout << std::endl;

Admittedly, if you are not acquainted with lambda functions, this piece of code is not exactly as readable as the previous for each versions but, well, it is so cool. And it says exactly what we are going to do. We can read it in this way: for each element in the interval from begin to end of the vector vi, execute the lambda function that takes as input the element currently fetched and outputs it, followed by a blank, to the console.

But, do we really need to perform a for loop? In this specific case, a natural way of implementing a solution would require us calling the standard algorithm copy, using an ostream_iterator as destination:
std::copy(vi.begin(), vi.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;

Isn't that a beauty?
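
For comparison, the loop that ended up in the C++11 standard is the range-based for. What follows is just a minimal sketch of it, assuming a C++11 compiler: joinWithBlank is a hypothetical helper name, and it writes to a string stream instead of std::cout only to make the result easy to check.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects all the elements on one line,
// each one followed by a blank, using the C++11 range-based for.
std::string joinWithBlank(const std::vector<int>& values)
{
    std::ostringstream out;
    for (const int& value : values)  // "for each value in values"
        out << value << ' ';
    return out.str();
}
```

Passing it the vector vi built above produces the same line the other versions send to the console.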


Focusing on an element via anonymous function

Actually, the most challenging part of this post was the title. Defining its content in a few words was not as easy as writing it.

The problem is writing a piece of JavaScript code to set the focus on an element in the HTML page when the page is loaded. It is a pretty simple issue, but I felt it was worth spending a few words on it.

Say that we want to put the focus on an element with id "focusMe" when the page is loaded. This is the code that does the trick:

<script type="text/javascript">
window.onload = function () { document.getElementById('focusMe').focus(); }

// ...
</script>

The code included in the function body should be clear: we get the element with id "focusMe" and call focus() on it.

More interesting is how we assign it to the window onload property, the one responsible for keeping the reference to the function that has to be called at page load time: we create a function with no name, and we include in its body the code we want to be called.

We could have given a name to this function, but why bother? The only caller of this piece of code already knows where to find it, and no one else would need it.


Using a DLL

The easiest way of using a DLL in our Windows C/C++ code requires the LIB file (the import library) generated when the DLL is built, which lets the linker know about the exported symbols. A third kind of file that saves us some trouble in using a DLL is a header (or more than one) declaring the available functions.

In our project property pages, Linker section, General tab, we add to the Additional Library Directories the folder where the LIB file for our DLL is. Again in the Linker section, but in the Input tab, we change the Additional Dependencies by adding the LIB file for our DLL.

Now we can use the DLL in our code:

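#include <cstdlib>   // std::system
#include <iostream>
#include <tchar.h>   // _tmain
#include "..\xDll\xDll.h"

int _tmain()
{
    std::cout << "A banal window app w/DLL" << std::endl;
    f(); // declared in xDll.h, defined in the DLL

    std::system("pause");
    return 0;
}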

By including the xDll.h header file we get the declaration of the function f(), defined in the DLL, so we can use it in our code.


Simple DLL

If you want to share functions among different applications in a Windows environment using C/C++, you can use a DLL, Dynamic Link Library.

Creating a DLL using Visual Studio is pretty easy. Create a New Project, and specify DLL as application type.

A DLL may have a sort of main function, which is called at startup and is a good place to put initialization code. Here is a possible implementation of such a function, which we use just to print a log message when the DLL is attached to a process:

#include <windows.h> // BOOL, APIENTRY, HMODULE, DLL_PROCESS_ATTACH, ...
#include <iostream>

BOOL APIENTRY DllMain(HMODULE hModule,
                      DWORD ul_reason_for_call,
                      LPVOID lpReserved)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        std::cout << "DLL process attached" << std::endl;
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

To make functions available to the user code, we write a header file where we put the function declarations:

#pragma once

#ifndef XDLL_API
#define XDLL_API extern "C" __declspec(dllimport)
#endif

XDLL_API void f();

We define a symbol (here we call it XDLL_API) that is a declaration specification for the functions. When we include this header in the application using the DLL, XDLL_API has not been previously defined, so we use the definition available here, which specifies that the functions are "dllimport".

Here is how we define a function in a DLL:

#include <iostream>

// XDLL_API defined for export
#define XDLL_API extern "C" __declspec(dllexport)
#include "xDll.h"

void f()
{
    std::cout << "Hello from DLL" << std::endl;
}

Before including the header file containing the prototypes, we provide an alternative definition for the XDLL_API symbol, the one that is going to be used in the DLL. Notice that in the function definition we don't even use it - it is not necessary, since the compiler is smart enough to match the declaration with the definition even if we don't duplicate the declaration specification.

Basically, that's all. We compile, and we have a DLL ready to be used by other applications.


Dump that table

In a previous post we have seen how to connect to a database using the Perl DBI module. Here we write a subroutine, to be called after the connection has been established and before disconnecting, to actually perform a query.

Assuming we have a table named t in the current database schema, here it is:

sub dumpThatTable
{
    my $dbh = shift;
    my $sth = $dbh->prepare("select * from t");
    $sth->execute();
    if($sth->err())
    {
        print "An error occurred: ".$sth->errstr()."\n";
        return;
    }

    my @row;
    print "@row\n" while(@row = $sth->fetchrow_array());
    print "An error occurred: ".$sth->errstr()."\n" if $sth->err();
}

The subroutine expects to be called with a valid database handle as a parameter, which we extract and put in the local variable $dbh. Then we prepare a SQL statement, putting the result in another local variable, $sth. We try to execute the statement and, in case of success, we loop calling fetchrow_array() on the statement to get the next row, and we simply print it. When the loop ends, we check the statement once more, to tell a fetch error from the normal end of the data.


Perl chop

Perl chop() shouldn't be easily confused with a pork chop, but there could be a mixup with another, more popular, Perl function: chomp(). The latter removes the trailing line terminator (usually a backslash-n) from the string, while chop() is less selective, and removes the last character from a string, whatever it is.

We usually pass a scalar to the chop() function, and usually it is a string. The function returns the character it has removed (or an empty string, for which ord() gives 0, if there is nothing to chop in the string) and it has the side effect of removing that last character from the string itself.

Here is a short example:

my $countdown = "0123456789";
while(1)
{
    my $current = chop($countdown);
    last if(ord($current) == 0);

    print "$current\n";
}

In an infinite loop we iteratively chop a string, outputting the chopped character. If ord() of the value returned by chop() is zero, there was nothing left to chop and we terminate the loop, otherwise we print the character on a dedicated line.
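
For readers coming from the C++ posts on this page, the difference between the two functions can be sketched with std::string. This is only a rough analogy under a few assumptions: chop and chomp here are hypothetical C++ helpers, not Perl code, the chomp analog only knows about a single trailing '\n', and a C++11 compiler is needed for back() and pop_back().

```cpp
#include <cstddef>
#include <string>

// Rough analog of Perl chop(): removes the last character, whatever it is,
// and returns it. An empty result means there was nothing to chop.
std::string chop(std::string& s)
{
    if (s.empty())
        return std::string();

    std::string last(1, s.back());
    s.pop_back();
    return last;
}

// Rough analog of Perl chomp() with the default terminator: removes one
// trailing '\n' if present and returns how many characters were removed.
std::size_t chomp(std::string& s)
{
    if (!s.empty() && s.back() == '\n')
    {
        s.pop_back();
        return 1;
    }
    return 0;
}
```

Calling chomp() on "abc\n" leaves "abc" and a second call changes nothing, while each call to chop() keeps eating one more character until the string is empty.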


Hello DBI

The most commonly used Perl module to access a database is DBI. I installed it on my current machine by simply opening a cpan shell, entering the command "install DBI", and letting cpan do its dirty job (obviously an active connection to the internet is required). Then I wrote a silly little Perl script to test that everything works fine.

This code just opens a connection to a MySQL database, and then closes it:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:test", "root", "password") ||
die "Connection error: $DBI::errstr\n";

$dbh->disconnect();
print "Done.";

There is something interesting even in these few lines.

First of all we see that to start a connection we call the connect() function in the DBI package passing three parameters: the data source we are about to connect to (driver and database name), the database user, and its password.

If connect() fails, we let our script die, showing the error message that is stored in $DBI::errstr.

Finally (!) we close the connection calling the disconnect() method on the $dbh object returned by the connect() call.
