Race condition

When different threads or processes make use of the same data, we should pay attention not to incur race conditions.

We have a race condition when two or more concurrent branches of execution depend on the same mutable shared state.

Let's look at a simple example of a race condition.

Say that we have to manage directly a linked list of elements, defined in this naive way:
struct Element
{
    Element(int v, Element* n) : value(v), next(n) {}

    int value;
    Element* next;
};

To print all the items in a list of such Element's we could use this raw function:
void dumpList(Element* curr)
{
    while (curr != nullptr)
    {
        std::cout << curr->value << ' ';
        curr = curr->next;
    }
    std::cout << std::endl;
}

Adding an element to the front of our linked list is a bit more interesting:
void addFront(Element*& head, int value) // 1
{
    Element* t = head ? head : nullptr; // 2
    Element* e = new Element(value, t); // 3

    head = e; // 4
}

1. We pass to our function the pointer to the current head of the list, and we pass it by reference, since we want to be able to change it: the new element is going to become the new head of the list.
2. This line could be merged into the next one - we don't actually need a temporary Element here - but it should make the code a bit clearer: only if we have a "real" current head do we need to set it as the next of the new element we are about to create.
3. A new Element is created.
4. The Element becomes the new head of the list.

The normal usage of our list is this:
Element* head = nullptr;

addFront(head, 42);
addFront(head, 24);

Everything works fine - as long as we stick to a single-threaded environment. Using such code in a multithreaded environment is asking for trouble.

But let's start looking at a case where our code still works:

Element* head = nullptr;

boost::thread t1(addFront, std::ref(head), 42); // 1
boost::this_thread::sleep(boost::posix_time::millisec(250)); // 2
boost::thread t2(addFront, std::ref(head), 24); // 3

t1.join();
t2.join();
dumpList(head);

1. This new thread accesses local data - the pointer to the list head - by reference. That means that both threads - main and worker - are accessing the same data. We are risking a race condition.
2. By introducing a relatively long sleep, we serialize the two insertions, effectively removing the concurrency from this application.
3. This new thread, too, uses the same head - the two worker threads are competing on the same state. If we don't acknowledge this situation, we could expect big problems.

And it is quite easy to get in trouble - it is enough to remove the sleep in [2]. Without it, we should expect a mix-up in addFront(), usually leading to data loss: only one Element would win the insertion in the list.

To better see what is going on, and in the meantime make the issue more visible, let's rewrite the addFront() function. We'll just add a sort of emulation of the classic stepping debug function, designed to work effectively in a multithreaded environment:
boost::mutex mio; // 1

void step(int i)
{
    boost::lock_guard<boost::mutex> l(mio); // 2
    std::cout << boost::this_thread::get_id() << "/" << i << std::endl;

    boost::this_thread::sleep(boost::posix_time::millisec(50)); // 3
}

1. A mutex to protect the output console, since it is a shared resource. Strictly speaking, we should use it in the dumpList() function too - but this is not an issue here, since the dumping is done only at the end of our minimal application, when the worker threads have already been joined.
2. Lock the mutex before using the shared resource.
3. Let's make the competing effect more visible by adding a sleep.

Let's use the stepping function in this way:
void addFront(Element*& head, int value)
{
    step(1);
    Element* t = head ? head : nullptr;
    step(2);
    Element* e = new Element(value, t);
    step(3);
    head = e;
}

Now, running the two worker threads (without the sleep between their creation) should lead to a (catastrophic) result: the step traces of the two threads interleave, showing both of them going through addFront() at the same time, and only one of the two values ends up in the list.

It is easy to see where the problem arises: there is a race to assign the element created by each thread to the head of the list, which is a shared resource. The last thread that writes head wins, since it overrides the write previously done by the other thread, which simply gets lost.

The solution to the issue is easy: define another mutex, and use it to protect access to the shared resource.
