I reckon you already know the Fibonacci function, which is described recursively in this way:

Fibonacci(n) = Fibonacci(n-2) + Fibonacci(n-1)

Where Fibonacci(0) and Fibonacci(1) are defined to be 0 and 1.

If this looks new to you, you can read more about it on Wikipedia.

One way of calculating a Fibonacci number would be to just sit and wait for the user to tell us which Fibonacci number they actually want, and only then start to calculate it. But that would be a waste of time. While the user is pondering their choice, we could already start doing the job, putting the results in a cache. When the user finally tells us which number they want, we check if we are lucky and the result is already available.

That means we should have a couple of threads: one waiting for the user's choice, one working on the Fibonacci calculation.

Then, it is quite natural to think about another optimization. The function that performs the Fibonacci calculation could take advantage of the cache, too. This is not very interesting from the point of view of the multithreading design of this piece of code, but it has a (big) impact on the execution time. You could have fun checking the change in the application performance when using it.

Here is the first part of the code, the header file with the Fibonacci class declaration:

#pragma once // 1

#include <vector>
#include <boost/thread.hpp>
#include <boost/thread/condition.hpp>

class Fibonacci
{
private:
    std::vector<unsigned int> values_; // 2
    bool optim_; // 3
    boost::mutex mx_; // 4
    boost::condition cond_; // 5
    boost::thread tCalc_; // 6

    void precalc(); // 7
    unsigned int getValue(unsigned int index); // 8
public:
    static const int MAX_INDEX = 40; // 9

    Fibonacci(bool optim = false);
    ~Fibonacci();

    unsigned int get(unsigned int index); // 10
};

1. This code is for MSVC++ 2010, which supports this useful pragma directive ("once") to avoid multiple inclusion of the same file.

2. This vector is used to cache the intermediate values.

3. Flag for the performance optimization, by default it is not used.

4. Mutex to rule the access to the cache.

5. Condition used to notify the reader thread waiting on the vector that the writer thread has generated a new value.

6. The thread that takes care of calculating the Fibonacci numbers.

7. The thread (6) executes this function to precalculate the Fibonacci numbers.

8. Internal function to get a specific Fibonacci number.

9. This is a toy Fibonacci calculator: the biggest index we accept is 40.

10. The user of the Fibonacci class can ask for a Fibonacci number through this method.

It's quite easy to use the Fibonacci class. All one has to do is create an object and call its method get(), passing the index of the required Fibonacci number:

#include <iostream>
#include "Fibonacci.h"

void fib()
{
    Fibonacci fib;

    int input;
    while(true)
    {
        std::cout << "Your input [0 .. " << Fibonacci::MAX_INDEX << "]: ";
        std::cin >> input;
        if(input < 0 || input > Fibonacci::MAX_INDEX)
            break;

        std::cout << "Fibonacci of " << input << " is " << fib.get(input) << std::endl;
    }
    std::cout << "Bye" << std::endl;
}

Let's have a look now at the Fibonacci class implementation.

Here is the constructor:

Fibonacci::Fibonacci(bool optim) : optim_(optim)
{
    values_.reserve(MAX_INDEX + 1); // indices 0 .. MAX_INDEX, so MAX_INDEX + 1 elements
    tCalc_ = boost::thread(&Fibonacci::precalc, this);
}

We already know the max size for our vector, so we immediately reserve enough room for all its elements (MAX_INDEX + 1 of them, the indices going from 0 to MAX_INDEX). Then we start a thread on Fibonacci::precalc() - since it is a non-static member function, we have to pass to the boost::thread constructor a pointer to the object we are using, that is "this".

The destructor just has to wait for the tCalc thread to terminate, so it calls join() on it. But, since calculating Fibonacci numbers could be a long and boring affair, it is better to call interrupt() on that thread first, to see if we can cut it short. In any case we are destroying the object, so no more calculation is required:

Fibonacci::~Fibonacci()
{
    tCalc_.interrupt();
    tCalc_.join();
}

To get a Fibonacci number we call the get() method. It tries to read from the vector the Fibonacci number at the index passed by the caller. Notice that, in order to get exclusive access to the vector, it uses a lock on the mutex we created just for that reason.

If the element is not available yet, we just wait on the condition for the other thread to do its job. Any time a new Fibonacci number is generated we get a notification, so another check is done on the vector size, and the user gets either another waiting message - if the expected number has not been calculated yet - or the return value:

unsigned int Fibonacci::get(unsigned int index)
{
    if(index > MAX_INDEX) // index is unsigned, so it can't be negative
        return 0;

    boost::unique_lock<boost::mutex> lock(mx_);
    while(index >= values_.size())
    {
        std::cout << "Please wait ..." << std::endl;
        cond_.wait(lock);
    }
    return values_.at(index);
}

Here is precalc(), the function run by the other thread. It is just a loop that calls getValue(), the core functionality of this class, the place where the Fibonacci numbers are actually computed; then it gains exclusive access to the vector through the mutex, puts the newly calculated value in it, and finally notifies the other thread that something has changed:

void Fibonacci::precalc()
{
    for(int iteration = 0; iteration <= MAX_INDEX; ++iteration)
    {
        unsigned int value = getValue(iteration);

        boost::lock_guard<boost::mutex> lock(mx_);
        values_.push_back(value);
        cond_.notify_one();
    }
}

And finally, the actual Fibonacci number calculation. To shorten the working time we can use the cache - in any case it is already there - otherwise we figure out the value by calling getValue() recursively.

Notice that we put an interruption_point() before the internal calls; in this way we can cancel the execution there when required:

unsigned int Fibonacci::getValue(unsigned int index)
{
    if(optim_)
    {
        boost::lock_guard<boost::mutex> lock(mx_);
        if(index < values_.size())
            return values_.at(index);
    }

    switch(index)
    {
    case 0:
        return 0;
    case 1:
        return 1;
    default:
        boost::this_thread::interruption_point();
        return getValue(index - 2) + getValue(index - 1);
    }
}

The two executions of the lock function for each iteration of the loop will kill all the performance gain from multithreading.

Hi Borsuk. Yes, you are right: locks are a well-known bottleneck, and performance-wise this design is far from optimal. Still, the point here was to show how to take advantage of multithreading to start calculating a result before the input is known.

Thank you for your great blog.. very useful!

I just wanted to contribute a tip that can dramatically increase the performance of the program by re-using the cache in optimized mode:

(Not perfect yet, as I have removed the lock because I've got some deadlock issues.)

unsigned int getValue(unsigned int index)
{
    if(index < 2)
        return index;

    boost::this_thread::interruption_point();

    if(optim_)
    {
        if(index < values_.size())
            return values_.at(index);
        else if(index - 1 < values_.size())
            return getValue(index - 2) + values_.at(index - 1);
        else if(index - 2 < values_.size())
            return values_.at(index - 2) + getValue(index - 1);
    }

    return getValue(index - 2) + getValue(index - 1);
}

Hi Hatem, thank you for the hint. As soon as I have some spare time, I'll integrate it in the sample.
