Codility MissingInteger

Given a non-empty integer vector, return the minimal positive integer not present in the input. You could find this problem in the Codility test Counting Elements section. It is marked as "respectable", even though it doesn't seem more complicated than an average "painless" one.

To clarify what we are looking for, here is a bunch of C++11 GTest test cases:
TEST(MissingInteger, One) // 1
{
    std::vector<int> input { 1 };
    ASSERT_EQ(2, solution(input));
}

TEST(MissingInteger, Neg) // 2
{
    std::vector<int> input { -3 };
    ASSERT_EQ(1, solution(input));
}

TEST(MissingInteger, Given) // 3
{
    std::vector<int> input { 1, 3, 6, 4, 1, 2 };
    ASSERT_EQ(5, solution(input));
}

TEST(MissingInteger, Sequence) // 4
{
    std::vector<int> input { 1, 2, 3, 4, 5, 6, 7 };
    ASSERT_EQ(8, solution(input));
}
1. Only one integer in input.
2. We could have negative integer in input.
3. Example provided by Codility.
4. Natural sequence.

An intuitive solution would be checking all the positive numbers in [1..n], where n is the vector size, and returning as soon as we find a missing value. But think of test 4: it is the worst case scenario. I would check all the elements, see that none of them is missing, and so return n+1, with an n-squared time complexity.
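
Just to make the comparison concrete, such a naive solution could look something like this (my own sketch, not code I'd actually submit):
int solutionNaive(std::vector<int>& input)
{
    // check each candidate in [1..n] with a full scan: O(N**2) in the worst case
    for(int candidate = 1; candidate <= static_cast<int>(input.size()); ++candidate)
        if(std::find(input.begin(), input.end(), candidate) == input.end())
            return candidate;

    return input.size() + 1; // the whole natural sequence [1..n] is there
}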

To achieve a time-linear solution, we should use some extra memory. We'll scan the input vector just once, setting a flag for each positive number we find. Then we'll search for the first unset flag, and that's it:
int solution(std::vector<int>& input)
{
    std::vector<bool> flags(input.size()); // 1

    std::for_each(input.begin(), input.end(), [&flags](int cur){ // 2
        if(cur > 0 && cur <= static_cast<int>(flags.size())) // 3
            flags[cur - 1] = true;
    });

    return std::find(flags.begin(), flags.end(), false) - flags.begin() + 1; // 4
}
1. A flag for each expected (in the worst case) positive integer in input.
2. Loop on all the input data
3. If the current value is positive and in the expected range, convert it to a valid index for the flags buffer and mark that element as seen.
4. Find the index of the first missing element (remember that the STL find() function returns the end() iterator in case it doesn't find anything, and this behavior is just what we want here), convert it back to the related value and return it.


Codility PermCheck

We get an integer vector in input, and we want to check whether it is a permutation of the natural sequence [1..n], where n is the vector size.
You could find this problem in the Codility test Counting Elements section.

I have already written a post about it, where I provide some test cases and a few different C++ solutions. Some time has passed and I am solving these problems again from scratch, this time using C++11, which in the meantime has become supported by Codility. Please have a look at the old post for more details.
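
A couple of GTest cases in the same style as the other posts should be enough to pin down the expected behavior (these are my own examples, not necessarily the ones in the old post):
TEST(PermCheck, Perm)
{
    std::vector<int> input { 4, 1, 3, 2 };
    ASSERT_EQ(1, solution(input));
}

TEST(PermCheck, NotPerm)
{
    std::vector<int> input { 4, 1, 3 };
    ASSERT_EQ(0, solution(input));
}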

Here is my current solution:
int solution(std::vector<int>& input)
{
    std::vector<bool> flags(input.size()); // 1

    for(auto it = input.begin(); it != input.end(); ++it) // 2
    {
        unsigned index = *it - 1; // 3
        if(index >= input.size() || flags[index]) // 4
            return 0;
        flags[index] = true;
    }

    return 1; // 5
//    return std::find(flags.begin(), flags.end(), false) == flags.end();
}
1. The idea is to use a buffer where I keep track of all the expected elements. Initially none has been seen; once I find a number, I flag it.
2. Loop on all the elements.
3. Convert the current value in its associated zero-based vector index.
4. If the index is outside the expected range, or if the flag has already been set for this value, the sequence is not a permutation. Return the required failure value.
5. If I get here, the sequence has been checked and found OK; return the success value. Actually, also this time I didn't notice that at first, and I originally wrote the next line, now commented out, to explicitly check that every number has been found. This is useless, since I have already covered all possible failure cases in (4).


Codility FrogRiverOne

This problem is described in the Codility web site, Counting Elements section.
We are asked to find the earliest position in the input integer vector by which all the values in the natural sequence from 1 up to a passed number have been seen, or -1 if that never happens.

I have already given a couple of C++ solutions (a Big-Oh-Squared one and a linear one) in a previous post. Here I give the C++ solution that came to my mind now, which looks a bit cleaner to me.

First of all, a couple of test cases (based on the GTest framework) that should clarify the problem:
TEST(FrogRiverOne, Given)
{
    std::vector<int> input { 1, 3, 1, 4, 2, 3, 5, 4 }; // 1
    ASSERT_EQ(6, solution(5, input));
}

TEST(FrogRiverOne, Cannot)
{
    std::vector<int> input { 1, 3, 1, 4, 2, 3, 5, 4 }; // 2
    ASSERT_EQ(-1, solution(6, input));
}
1. The Codility given test case. Given this vector, we want to get the index of the first element by which every value in [1 .. 5] is available. That means 6, given that 1 is at position 0 (and 2), 2 at 4, 3 at 1 (and 5), 4 at 3 (and 7), and finally 5 at 6.
2. A negative case, 6 is not in the passed vector, so we should return -1.

Here is my latest solution:
int solution(int x, std::vector<int>& input)
{
    assert(x > 0); // 1
    assert(!input.empty());

    std::vector<bool> flags(x); // 2
    for(unsigned i = 0; i < input.size(); ++i) // 3
    {
        unsigned index = input[i] - 1; // 4
        if(!flags[index]) // 5
        {
            flags[index] = true;
            if(--x == 0)
                return i; // 6
        }
    }

    return -1; // 7
}
1. Enforce the most sensitive problem requisites. This is not requested by Codility, which ensures that only "clean" input data is provided.
2. When I see a number in [1..x] I mark it as found. Here I store the flags for this purpose.
3. Loop on all the vector input elements.
4. Convert the current input value so that I can use it as an index for the flags vector. Production code should ensure that the index is not outside the range of valid values (see the sketch after this list).
5. If I see this value for the first time, I mark it as found, and I decrease the number of missing values that I am looking for.
6. When I have found all the expected elements, I return the current index in the input vector.
7. Otherwise not-found is returned.
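
As hinted in (4), a more defensive variant could simply discard any value outside [1..x] before using it as an index. A minimal sketch of that idea (my own variation, not part of the Codility submission):
int solutionChecked(int x, std::vector<int>& input)
{
    std::vector<bool> flags(x);
    int missing = x; // how many values in [1..x] are still unseen

    for(unsigned i = 0; i < input.size(); ++i)
    {
        if(input[i] < 1 || input[i] > x) // skip out-of-range values
            continue;

        unsigned index = input[i] - 1;
        if(!flags[index])
        {
            flags[index] = true;
            if(--missing == 0)
                return i;
        }
    }

    return -1;
}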


Codility FrogJmp

You could find the full description of this problem, and check your solution in (possibly) your favorite programming language, on the Codility web site, Time Complexity section. To simply put it, we have to calculate a distance, and determine how many segments of a given size we need to cover that distance.

It is marked as "painless", even if "embarrassingly simple" would probably be a better description. Still, even in this case a mistake could be lurking somewhere, ready to catch us off guard. Here the issue is that we need to consider two cases, and one of them could be overlooked. Here are a couple of test cases (C++, GoogleTest) that clarify the matter:
TEST(FrogJmp, Given) // 1
{
    ASSERT_EQ(3, solution(10, 85, 30));
}

TEST(FrogJmp, Precise) // 2
{
    ASSERT_EQ(4, solution(10, 90, 20));
}
1. A frog starts its trip on 10, wants to reach 85, and its jump is sized 30. Obviously it needs three steps to get there.
2. A weaker frog on a longer path. In four steps it lands exactly on its end point.

Here is how I implemented my solution:
int solution(int x, int y, int d)
{
    assert(x < y); // 1
    assert(d > 0);

    int distance = y - x; // 2
    int steps = distance / d; // 3
    if(distance % d != 0) // 4
        ++steps;
    return steps;
}
1. Assuming no tricky input. The frog starts on the left and goes to the right.
2. This is the distance it has to travel.
3. The solution is just a matter of dividing the distance by the step size ...
4. ... plus one, in case our frog is not lucky enough to drop right on its path end.
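
As a side note, the same result could be computed with a single expression, using the usual integer ceiling-division idiom. A tiny sketch, assuming the same preconditions on the input (and values small enough not to overflow an int):
int solutionCeil(int x, int y, int d)
{
    return (y - x + d - 1) / d; // ceiling of (y - x) / d, integer arithmetic only
}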


Codility Tape Equilibrium

The Codility Tape Equilibrium problem is marked as "painless" in their Time Complexity section. Basically, we have a vector in input, containing at least two items, and we want to split it in two parts in a way that minimizes the absolute difference between the sums of the two sections. Our function should return that minimal value.

A few test cases would make clearer what we are looking for:

TEST(TapeEquilibrium, Given) // 1
{
    std::vector<int> input { 3, 1, 2, 4, 3 };
    ASSERT_EQ(1, solution(input));
}

TEST(TapeEquilibrium, Couple)
{
    std::vector<int> input { 3, 3 };
    ASSERT_EQ(0, solution(input));
}

TEST(TapeEquilibrium, Couple2)
{
    std::vector<int> input { 1000, -1000 };
    ASSERT_EQ(2000, solution(input));
}

TEST(TapeEquilibrium, HundredK) // 2
{
    std::vector<int> input(100000, 1000);

    EXPECT_EQ(0, solution(input));
}

TEST(TapeEquilibrium, HundredKMinusOne)
{
    std::vector<int> input(99999, 1000);

    EXPECT_EQ(1000, solution(input));
}
1. This is the test case provided by Codility. It should be easy to see how we minimize the difference by splitting the input in {3, 1, 2} and {4, 3}. The expected output of our function is one.
2. A hundred thousand items, each of them set to one thousand. Lots of work for our function, but it is easy to see that the solution should be zero.

If you wonder, these are C++11 tests written for the GoogleTest xUnit framework. If you are using an older C++ compiler, you could have a look at a previous version of this post, which also shows a bad, O(N**2), alternative solution.
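
Just to give an idea of it, such a quadratic approach could recompute both sums from scratch for each split point, more or less like this (a rough sketch, not necessarily the code from the old post):
int solutionQuadratic(std::vector<int>& input)
{
    int result = std::numeric_limits<int>::max();
    for(unsigned p = 1; p < input.size(); ++p) // split in [0..p) and [p..size)
    {
        int left = std::accumulate(input.begin(), input.begin() + p, 0);
        int right = std::accumulate(input.begin() + p, input.end(), 0);
        result = std::min(result, std::abs(left - right));
    }
    return result;
}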

My linear time solution starts by creating two partial sums, left and right, where initially the left one contains only the leftmost element and the right one all the others. Then I remove one element at a time from the right sum and add it to the left one, repeating the check each time to see if I have found a better solution.

Here is how I implemented it:
int solution(std::vector<int>& input) // 1
{
    assert(input.size() > 1); // 2

    int left = input.front(); // 3
    int right = std::accumulate(input.begin() + 1, input.end(), 0); // 4
    int result = std::abs(left - right); // 5

    std::for_each(input.begin() + 1, input.end() - 1, [&](int cur){ // 6
        left += cur;
        right -= cur;
        int tentative = std::abs(left - right);
        if(tentative < result) // 7
            result = tentative;
    });

    return result;
}
1. Actually, the input vector could and should be a const reference. Codility requirements are sometimes a bit sloppy.
2. Not required by the problem; however, I felt bad assuming such a requisite without enforcing it in the code in some way.
3. Initializing the left sum is easy.
4. The right sum requires a bit more work. To keep the code more readable I used the handy STL accumulate function.
5. Initialize the function result. Notice that we are interested in the difference between left and right without caring about its sign.
6. A nice lambda function in a for_each loop keeps the code compact and, hopefully, readable. I start looping from the second element and stop one before the end, since neither section may be empty.
7. This is a better candidate, save it.


Shortest path by breadth-first search

Given an undirected unweighted graph that has no loop, we can use a basic breadth-first search algorithm to determine the shortest path from a specified vertex to any (other) one in the graph.

In the previous post I showed a way to store a graph in a C++11 STL container (a vector of forward lists of unsigned int). This is the Graph class you are going to see in the code below. Besides, Vertex is a typedef for unsigned int, meaning that we identify a vertex simply by an unsigned number, starting from zero.

This is the class I want to develop:
class BreadthFirstSearch
{
private:
    std::vector<Vertex> parents_; // 1
    Vertex start_; // 2
public:
    BreadthFirstSearch(const Graph& graph, Vertex start); // 3
    std::vector<Vertex> path(Vertex end); // 4
    std::vector<Vertex> backpath(Vertex end); // 5
    void printPath(Vertex end); // 6
};
1. As a result of the breadth-first search, this vector will hold, for each vertex in the associated graph, its parent on the shortest path back to the start vertex.
2. This is the vertex that I want to use as the starting point for the BFS algorithm.
3. The ctor gets in input a Graph and one of its vertices, and fills the parents_ vector as expected.
4. This method returns the path from the start vertex, as specified by the ctor, to the passed end one.
5. Not strictly a necessity, it could be a private method. Provides the shortest path in reversed order, from end to start.
6. Utility method, dumps a shortest path to standard output.

The BreadthFirstSearch constructor implements the BFS algorithm in this way:
BreadthFirstSearch::BreadthFirstSearch(const Graph& graph, Vertex start) :
        parents_(graph.vertices_.size(), std::numeric_limits<Vertex>::max()), start_(start) // 1
{
    if (start >= graph.vertices_.size()) // 2
        return;

    std::vector<bool> seen(graph.vertices_.size()); // 3
    std::queue<Vertex> queue; // 4

    queue.push(start); // 5
    seen[start] = true;
    while (!queue.empty()) // 6
    {
        Vertex vertex = queue.front();
        queue.pop();

        for (auto it = graph.vertices_[vertex].begin(); it != graph.vertices_[vertex].end(); ++it) // 7
        {
            Vertex next = *it;
            if (!seen[next]) // 8
            {
                queue.push(next);
                seen[next] = true;
                parents_[next] = vertex;
            }
        }
    }
}
1. Initially, the parents_ vector contains just "no parent" elements. I used the largest value available for a Vertex to represent that state.
2. Even though this code is not production ready, I couldn't help putting at least some minimal error handling in it. The vertex passed as the starting point should be an actual Graph element.
3. This vector keeps track of all the vertices that have already been checked in a previous step of the algorithm. Initially none is, so we can rely on the default behavior of the vector ctor, which sets all its elements to false.
4. On this queue I'll put all the vertices that are connected to the one currently being checked.
5. Let's put the control variables in the initial condition. The start vertex is enqueued, and it is marked as seen.
6. Loop until all the elements in the queue are processed.
7. Loop on all the vertices connected to the current one.
8. If this vertex has not been processed already, push it in the queue, mark it as seen, and set the current vertex as its parent.

The BreadthFirstSearch ctor has filled the parents_ vector, now we can use it to create the shortest path from the start vertex to a specific one:
std::vector<Vertex> BreadthFirstSearch::path(Vertex end)
{
    std::vector<Vertex> backtrace = backpath(end); // 1
    return std::vector<Vertex>(backtrace.rbegin(), backtrace.rend()); // 2
}
1. Actually, the real job is delegated to backpath().
2. Since backpath() returns a reversed path, the only real task of this method is reversing the vector to return a "straight" one.

If you want the path to be stored in a vector, you will find that it is in the nature of the problem to generate a reversed solution. Or maybe you could use a deque, filling it from the front. Anyway, this is my solution:
std::vector<Vertex> BreadthFirstSearch::backpath(Vertex end)
{
    std::vector<Vertex> backtrace; // 1
    if (end >= parents_.size()) // 2
        return backtrace;

    for (Vertex cur = end; cur != std::numeric_limits<Vertex>::max(); cur = parents_[cur])
        backtrace.push_back(cur); // 3
    return backtrace;
}
1. I am going to backtrack from the passed graph vertex up to the starting one, pushing each parent into this vector.
2. Better to ensure the caller won't pass a nonexistent vertex.
3. Each ancestor is pushed back in the vector, until the "no parent" element is found.

The utility method that dumps the path to standard output shows how to use backpath():
void BreadthFirstSearch::printPath(Vertex end)
{
    std::vector<Vertex> vxs = backpath(end);
    std::copy(vxs.rbegin(), vxs.rend(), std::ostream_iterator<Vertex>(std::cout, " "));
    std::cout << std::endl;
}
Here is a piece of code that uses my BreadthFirstSearch class:
BreadthFirstSearch bfs(graph, 0); // 1
bfs.printPath(3); // 2
1. I pass to bfs the graph object as created in the previous post example, and I specify zero as the starting vertex.
2. Ask to print the shortest path from zero to three.

The expected result is:
0 4 3 
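
For reference, graph here is the same six-vertex object built in the adjacency list post below:
Edges edges { { 0, 1 }, { 0, 4 }, { 0, 5 }, { 1, 2 }, { 1, 4 }, { 2, 3 }, { 3, 4 } };
Graph graph(6, edges);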


Graph by adjacency list

If you need to work with graphs in your C++11 code, you'd usually rely on someone else's job, like the Boost Graph Library, friendly known as BGL. Sometimes it happens that you simply can't, and you have to work it out by yourself. Here I am writing a trivial Graph class that lets me store an undirected unweighted graph in a compact form.

I have a simple graph like the one shown in the picture. Each vertex is represented by an unsigned integer starting from zero, which helps me keep the code even simpler. Edges have neither weight nor direction, so we can move from vertex 0 to vertex 5 and vice versa, and we are not interested in the cost of moving from one vertex to another. We only want to know if we can actually go from here to there.

The two common ways to represent a graph differ by using a matrix or a list to store the adjacency of each vertex. As often happens, you should know the actual problem you are tackling to decide which data structure suits you better. Still, the list is usually the primary suspect (a matrix sketch is shown after the class, for comparison).

In this first implementation, my class Graph provides only a constructor to set it up and a print method to show what it has in its belly. The main focus here is about showing how the data is stored in it.
using Vertex = unsigned int; // 1
using Edge = std::pair<Vertex, Vertex>; // 2
using Edges = std::vector<Edge>; // 3
using Vertices = std::forward_list<Vertex>; // 4

class Graph
{
public:
    std::vector<Vertices> vertices_; // 5

    Graph(int nv, Edges edges) : vertices_(nv) // 6
    {
        std::for_each(edges.begin(), edges.end(), [this](const Edge& edge) // 7
        {
            if(edge.first < vertices_.size() && edge.second < vertices_.size()) // 8
            {
                vertices_[edge.first].push_front(edge.second); // 9
                vertices_[edge.second].push_front(edge.first);
            }
        });
    }

    void print() // 10
    {
        for(Vertex i = 0; i < vertices_.size(); ++i)
        {
            std::cout << i << ": ";
            std::copy(vertices_[i].begin(), vertices_[i].end(), std::ostream_iterator<Vertex>(std::cout, " "));
            std::cout << std::endl;
        }
    }
};
1. Each vertex is represented by an unsigned integer, starting from zero.
2. An edge is defined by the two vertices delimiting it.
3. I want to pass all the edges in my graph to the class constructor. This is the collection I am going to use for this task.
4. Any vertex in the graph has an associated collection of vertices, all the ones to which it is connected. The cheap C++11 forward_list suffices for this job.
5. A graph is a collection of Vertices objects: each position in the vector represents an actual vertex of the graph, and the associated Vertices list keeps track of all the vertices connected to it.
6. The Graph constructor requires as input the number of vertices in the graph, and all the edges on it. The data member vertices_ is initialized as a collection of empty Vertices.
7. Loop on all the passed edges to associate each vertex in the graph to its connections.
8. A real piece of code should have more effective error handling than this. Here I just discard any wrong edge. It would make sense to let the user know that something went wrong.
9. Being the graph undirected, any edge creates two relations.
10. Utility method, just to show that everything worked as expected (hopefully).
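
For comparison, the matrix-based alternative mentioned above could be little more than a square table of booleans, paying O(V**2) space whatever the number of edges. A minimal sketch (my own addition, nothing in the rest of the code depends on it):
class MatrixGraph
{
public:
    std::vector<std::vector<bool>> matrix_; // matrix_[a][b] is true when a and b are connected

    MatrixGraph(int nv, const Edges& edges) : matrix_(nv, std::vector<bool>(nv))
    {
        for(const Edge& edge : edges)
        {
            if(edge.first < matrix_.size() && edge.second < matrix_.size())
            {
                matrix_[edge.first][edge.second] = true; // undirected: set both directions
                matrix_[edge.second][edge.first] = true;
            }
        }
    }
};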

Here is how my Graph class is used:
Edges edges { { 0, 1 }, { 0, 4 }, { 0, 5 }, { 1, 2 }, { 1, 4 }, { 2, 3 }, { 3, 4 } };
Graph graph(6, edges);
graph.print();
The expected output:
0: 5 4 1 
1: 4 2 0 
2: 3 1 
3: 4 2 
4: 3 1 0 
5: 0 


Greedy algorithm for activity selection

A typical example of a problem that has an optimal greedy solution is the activity selection one; here is its description on wikipedia. In a few words, we have a bunch of activities, identified by a start and an end time, and we want to find a maximum selection of non-conflicting elements.

A couple of test cases (written in C++11 for GoogleTest) should clarify the problem:
typedef std::pair<int, int> Activity;
typedef std::vector<Activity> Activities;

TEST(ActSel, Simple)
{
  Activities input { {1, 2}, {5, 9}, {0, 6}, {8, 9}, {3, 4}, {5, 7} };

  Activities output = selectMax(input);
  ASSERT_EQ(4, output.size());
  for(unsigned i = 1; i < output.size(); ++i)
    ASSERT_LE(output[i-1].second, output[i].first);
}

TEST(ActSel, Simple2)
{
  Activities input { {1, 4}, {3, 5}, {0, 6}, {3, 9}, {5, 9}, {5, 7}, {6, 10}, {8, 11}, {8, 12}, {2, 14}, {12, 16} };

  Activities output = selectMax(input);
  ASSERT_EQ(4, output.size());
  for(unsigned i = 1; i < output.size(); ++i)
    ASSERT_LE(output[i-1].second, output[i].first);
}
In both cases I expect a selection of four Activity objects in output. In the first case these elements: (1, 2) (3, 4) (5, 7) (8, 9), in the second one (1, 4) (5, 7) (8, 12) (12, 16), or maybe (8, 11) instead of (8, 12). As you can see, there could be more solutions, and the problem doesn't require you to be particularly choosy. Once you maximize the number of selected items, the actual value of each of them is not an issue.

Still, I want to ensure in my test cases that I pick a valid solution, so I check, through ASSERT_LE, that all the elements in the extracted sequence are ordered as expected.

As said above, this problem has a greedy optimal solution. What we have to do is just sort the input elements by their second component (the end time), and then greedily accept all the elements we can. As in this implementation:
Activities selectMax(Activities& input) // 1
{
  std::sort(input.begin(), input.end(), [](Activity a, Activity b) { return a.second < b.second; }); // 2

  Activities output;
  output.push_back(input[0]); // 3

  for(unsigned i = 0, j = 1; j < input.size(); ++j) // 4
  {
    if(input[j].first >= input[i].second) // 5
    {
      output.push_back(input[j]);
      i = j;
    }
  }

  return output;
}
1. We don't mind if this function modifies the input parameter, so it is passed as a non-constant reference. Beware that this should be known and accepted by the callers.
2. The "normal" STL sort() function would order the passed sequence by its first component. So we need to use the overloaded version that let as pass a predicate to be used as comparator. Using a C++11 lambda function, as shown here, makes it simple and elegant.
3. The first element is always selected.
4. Now we are ready to loop on all the other elements in the sequence. The real looping variable is j, while i is used to keep track of the last accepted element.
5. Any element that starts no earlier than the end of the last accepted one is pushed into the output sequence and becomes the new last accepted element.
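
A quick usage example on the first test input; after the sort, the greedy scan picks (1, 2) (3, 4) (5, 7) (8, 9):
Activities input { {1, 2}, {5, 9}, {0, 6}, {8, 9}, {3, 4}, {5, 7} };
for(const Activity& activity : selectMax(input))
  std::cout << '(' << activity.first << ", " << activity.second << ") ";
std::cout << std::endl;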


Rod cutting by dynamic programming

This is a typical problem well suited to show how dynamic programming works. We have a rod sized up to, let's say, 10. We can freely cut it in pieces (integer sized) to sell them at the best price. Given a price table, find out the way to get the most from it.

Here is a C++11 test case for GoogleTest that should clarify the requirements:
typedef std::vector<int> Vector;

unsigned cutRod(const Vector& price, unsigned size);

TEST(CutRod, Simple)
{
  Vector price { 1, 5, 8, 9, 10, 17, 17, 20, 24, 30 };

  ASSERT_EQ(30, cutRod(price, 10));
  ASSERT_EQ(25, cutRod(price, 9));
  ASSERT_EQ(18, cutRod(price, 7));
  ASSERT_EQ(10, cutRod(price, 4));
}
Given that price list, we see immediately that if we have in input a rod sized up to 3, the best strategy is selling it in a single piece.
But if we have a rod sized four, selling it untouched we'd get 9. Better to split it in two rods both sized two, which gives us 5 + 5 = 10.
Similarly, a rod sized 7 is priced 17. If we split it in two parts sized 6 and 1, we'll get 17 + 1 = 18.

Brute force

We may think of applying a recursive approach to this problem, checking all the possible cut combinations we can think of. It is very easy to write the corresponding code, but we can't expect it to scale well: a rod of size n can be cut into ordered integer pieces in 2^(n-1) different ways, so the recursion tree grows exponentially.
unsigned cutRod(const Vector& price, unsigned size)
{
  unsigned result = 0;
  for(unsigned i = 0; i < size; ++i)
    result = std::max(result, price[i] + cutRod(price, size - (i+1)));

  return result;
}
It is just a matter of recursively calling our function, reducing each time the size of the rod we are considering. Each time we compare the partial result with the one we have previously stored, keeping just the best one.

Top-down dynamic programming

One obvious problem in the previous solution is that we solve the same sub-problems again and again. We could save a lot of running time by storing their results in a buffer. This simple but effective idea is the basis of the dynamic programming technique.

In this context, the bargain of using space to avoid spending time repeating the same task to get a partial result is called memoization (as in keeping a memo).

Here is a possible top-down implementation, very close to the naive version seen above:
unsigned cutRod(const Vector& price, unsigned size)
{
  Vector memo(size + 1, -1);
  memo[0] = 0;

  return memoCutRod(price, size, memo);
}
Here cutRod() just creates a memo vector that stores the value of each sub-problem as soon as we get its result. Then it starts the recursion by calling a support function.

Notice that the memo buffer has one element more than the rod size. This is for storing also the value of the dummy cut sized zero. It is not a strict necessity, since we know that it won't cost anything, but it helps to make our code cleaner.
unsigned memoCutRod(const Vector& price, unsigned size, Vector& memo)
{
  if(memo[size] >= 0) // 1
    return memo[size];

  unsigned result = 0; // 2
  for(unsigned i = 0; i < size; ++i)
    result = std::max(result, price[i] + memoCutRod(price, size - (i+1), memo));

  return memo[size] = result; // 3
}
1. If the relative memo element is not negative, we have already calculated it. Job already done.
2. Otherwise we calculate the best price as seen before.
3. And we set a memo before returning it.

Bottom-up approach

Again dynamic programming, still using memoization as we have just seen, but starting from the bottom of the problem and crawling up to its top. In this case the implementation is even simpler, and it spares us the pain and the cost of recursion:
unsigned cutRod(const Vector& price, unsigned size)
{
  Vector memo(size + 1); // 1
  for(unsigned i = 1; i <= size; ++i) // 2
  {
    int value = -1; // 3
    for(unsigned j = 0; j < i; ++j) // 4
      value = std::max(value, price[j] + memo[i-j-1]);
    memo[i] = value;
  }

  return memo.back(); // 5
}
1. As in the top-down approach, we get an extra element in the memo vector, just to keep the code simpler. But this time we don't need to initialize it to a "bad" value, because we are setting it up iteratively starting from the beginning.
2. The first element in memo is already set to its expected value (that is, zero) courtesy of the vector constructor. We need to calculate all the other elements, up to the rightmost one.
3. Initialize the current memo value to less than the minimum acceptable value (meaning, less than zero).
4. Basically it is the same loop we have seen in the previous implementations, but here we explicitly go for the smaller element first.
5. End of the story, the answer is stored in the rightmost memo element.

Check on github for full C++11 code.


Quicksort

Quicksort is known to be a fast O(N lg N) divide and conquer sorting algorithm, in its average behavior. Still, we have to pay attention to the worst case scenario, which brings it to an O(N ** 2) time cost.

The idea is to repeatedly partition the data collection, splitting it in two parts so that the chosen pivot is equal to or greater than the values in its left partition, and then call the quicksort procedure again, until there is nothing left to sort. As one could easily spot, it is a bad choice of the pivot that could lead to poor performance.

The resulting code should be something like this:
void quicksort(std::vector<int>& data, int left, int right) // 1
{
  if(left < right) // 2
  {
    int pivot = partition(data, left, right); // 3
    quicksort(data, left, pivot - 1); // 4
    quicksort(data, pivot + 1, right);
  }
}
1. The function requires in input the collection on which it should operate and the indexes of its leftmost and rightmost elements.
2. Check that the interval contains more than one element; otherwise there is nothing left to sort.
3. Split the original interval in two parts. On the left side we have all the values less than or equal to the value in the pivot element.
4. Call quicksort again on the left and right partitions. Notice that the pivot element is already in its final place, and doesn't need to be considered anymore.

We just need to partition a collection as expected:
int partition(std::vector<int>& data, int left, int right)
{
  int pivot = data[right]; // 1
  int index = left - 1; // 2

  for(int i = left; i < right; ++i) // 3
  {
    if(data[i] <= pivot) // 4
      std::swap(data[++index], data[i]);
  }

  std::swap(data[++index], data[right]); // 5
  return index;
}
1. OK, this doesn't look smart. As pivot we always select the rightmost element in the interval.
2. Initialize index to the first-before-beginning position in the interval.
3. Loop on all the items in the interval, but the last one (that is, the pivot).
4. If the current element value is less than or equal to the pivot, let's swap it with the first not-yet-used element on the left.
5. Finally, we swap the pivot (rightmost value in the interval) with the element next to index.
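
About the pivot selection in (1): a common way to make the worst case unlikely is picking the pivot at random and moving it to the rightmost position before partitioning as above. A sketch of the idea (my own addition, not in the linked code):
int randomizedPartition(std::vector<int>& data, int left, int right)
{
  std::swap(data[left + std::rand() % (right - left + 1)], data[right]); // a random element becomes the pivot
  return partition(data, left, right);
}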

Full C++ code on github.


Heapsort

Heapsort is an in-place sorting algorithm, like insertion sort, that asymptotically scores a nice O(N lg N) time complexity, like merge sort.

It makes use of the heap data structure, which is a normal array seen as a nearly complete binary tree, here in its max-heap flavor, meaning that the biggest value is placed in the first element of the array (considered as the root of the tree).

Implementing heapsort in C++ is pretty trivial, since it is just a matter of calling two STL algorithm functions:
#include <vector>
#include <algorithm>

void heapsort(std::vector<int>& data)
{
  std::make_heap(data.begin(), data.end()); // 1
  std::sort_heap(data.begin(), data.end()); // 2
}
1. This make_heap() call rearranges the passed elements as a max-heap.
2. This sort_heap() call assumes that the passed sequence is a max-heap and sorts it in ascending order.

But let's have some fun reimplementing by hand these two functions:
typedef std::vector<int> Vector;

void heapsort(Vector& data)
{
  buildMaxHeap(data);
  sortHeap(data);
}
We'll need a way to navigate down the binary heap:
unsigned childLeft(unsigned i) { return (2 * i) + 1; }
unsigned childRight(unsigned i) { return (2 * i) + 2; }
The root element is at index 0. Its children are on 1 and 2.
The left child of the root (index 1) has its own children on 3 and 4; its sibling (index 2) on 5 and 6.
We can get the index of the children of a generic node in a binary heap just by multiplying its index by two and adding 1 (for the left one) or 2 (for the right one).
And we'll need a function to ensure that a node in the data structure complies with the binary max-heap requisite (it should not be smaller than its children):
void maxHeapify(Vector& data, unsigned i, unsigned len) // 1
{
  unsigned left = childLeft(i);
  unsigned right = childRight(i);

  unsigned largest = (left < len && (data[left] > data[i])) ? left : i;
  if(right < len && (data[right] > data[largest]))
    largest = right;

  if(largest != i) // 2
  {
    std::swap(data[i], data[largest]); // 3
    maxHeapify(data, largest, len); // 4
  }
}
1. We pass to the function the data collection, the index of the element that we are checking, and the number of elements in the heap.
2. We have compared the current node value against the ones of its left and right children. If the largest one is a child, the heap rule is currently violated. We need to rearrange the nodes.
3. Firstly, we need to swap the nodes so that the largest one is above the other ones.
4. Then, we need to ensure that the swapping has not corrupted the max-heap structure.

We are finally ready to implement the two main functions:
void buildMaxHeap(Vector& data) // 1
{
  for(int i = data.size() / 2; i >= 0; --i) // 2
    maxHeapify(data, i, data.size());
}

void sortHeap(Vector& heap) // 3
{
  for(int i = heap.size() - 1; i > 0; --i) // 4
  {
    std::swap(heap[0], heap[i]); // 5
    maxHeapify(heap, 0, i); // 6
  }
}
1. Given an arbitrary collection of values, convert it to a max-heap.
2. Start from the lowest parent nodes and move up to the root.
3. We assume that the passed data respects the max-heap constraints.
4. We scan the heap starting from the rightmost element down to the second one.
5. We know that the heap root is the biggest element in the collection, so we swap it to the rightmost position.
6. Before starting a new iteration, we ensure that the data collection (except the one we have already sorted) is still a max-heap.
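
A minimal GoogleTest case along these lines could be used to check the custom implementation (just a sketch, not necessarily one of the tests in the repository):
TEST(Heapsort, Simple)
{
  std::vector<int> data { 5, 2, 8, 1, 9, 3 };
  heapsort(data);
  ASSERT_TRUE(std::is_sorted(data.begin(), data.end()));
}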

Full C++ source code and a couple of test cases for Google test on github.


Maximum subarray by Kadane

I have already shown a solution to the maximum subarray problem, based on an algorithm that follows the divide and conquer recipe. Here I give a couple of C++11 implementations based on the asymptotically better algorithm devised by Professor Kadane, in the dynamic programming spirit.

The basic idea is keeping track of the highest sum reached so far while growing a running sum. When the running sum gets higher than the historical one, the latter is updated too.

This is a first version, that calculates just the sum:
typedef std::vector<int> Vector;

int maxSubAr(const Vector& data)
{
  int sum = 0;
  int sumTmp = 0;

  for(unsigned i = 0; i < data.size(); ++i)
  {
    int value = sumTmp + data[i]; // 1
    if(value > 0)
      sumTmp = value;
    else
      sumTmp = 0;

    if(sumTmp > sum) // 2
      sum = sumTmp;
  }

  return sum;
}
1. Add the current element value to the temporary sum. If this leads to a positive number, this will be the new temporary sum, otherwise I reset it.
2. If the temporary sum is bigger than the previously saved sum, I save this new value.

And that's it. Incredibly simple and effective.

Things get a bit more complicated when we also want to get the first and last indexes of the subsequence:
typedef std::array<int, 3> Info;
typedef std::vector<int> Vector;

Info maxSubArray(const Vector& data)
{
  int left = 0;
  int right = 0;
  int sum = 0;

  int leftTmp = 0;
  int sumTmp = 0;

  for(unsigned i = 0; i < data.size(); ++i)
  {
    int value = sumTmp + data[i];
    if(value > 0)
    {
      if(sumTmp == 0) // 1
        leftTmp = i;
      sumTmp = value;
    }
    else
      sumTmp = 0;

    if(sumTmp > sum) // 2
    {
      left = leftTmp;
      right = i;
      sum = sumTmp;
    }
  }

  return {{ left, right, sum }};
}
1. If I am at the beginning of the sequence, or if I have just reset the temporary sum, the current element is the tentative first element of the subsequence.
2. When I see that the current sum is bigger than the one already discovered, I adjust also the left/right indexes.
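
A test case on a classic input could look like this (my own sketch, not necessarily one of the tests in the repository):
TEST(MaxSubArray, Classic)
{
  Vector data { -2, 1, -3, 4, -1, 2, 1, -5, 4 };
  Info result = maxSubArray(data);

  EXPECT_EQ(3, result[0]); // the best subsequence is {4, -1, 2, 1} ...
  EXPECT_EQ(6, result[1]); // ... spanning indexes 3 to 6 ...
  EXPECT_EQ(6, result[2]); // ... and its sum is 6
}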

Full C++11 code is on github, with a few GoogleTest tests as a bonus.
