Graph by adjacency list

If you need to work with graphs in your C++11 code, you'd usually rely on someone else's work, like the Boost Graph Library, commonly known as BGL. Sometimes it happens that you simply can't, and you have to work it out by yourself. Here I am writing a trivial Graph class that lets me store an undirected unweighted graph in a compact form.

I have a simple graph like the one shown in the picture. Each vertex is represented by an unsigned integer starting from zero, which helps keep the code even simpler. Edges have no weight nor direction, so we can move from vertex 0 to vertex 5 and vice versa, and we are not interested in the cost of moving from one vertex to another. We only want to know if we can actually go from here to there.

The two common ways to represent a graph differ by using a matrix or a list to store the adjacency of each vertex. As often happens, you should know the actual problem you are tackling to decide which data structure suits you better. Still, the list is usually the primary suspect.

In this first implementation, my class Graph provides only a constructor to set it up and a print method to show what it has in its belly. The main focus here is showing how the data is stored in it.
using Vertex = unsigned int; // 1
using Edge = std::pair<Vertex, Vertex>; // 2
using Edges = std::vector<Edge>; // 3
using Vertices = std::forward_list<Vertex>; // 4

class Graph
{
public:
    std::vector<Vertices> vertices_; // 5

    Graph(int nv, Edges edges) : vertices_(nv) // 6
    {
        std::for_each(edges.begin(), edges.end(), [this](const Edge& edge) // 7
        {
            if(edge.first < vertices_.size() && edge.second < vertices_.size()) // 8
            {
                vertices_[edge.first].push_front(edge.second); // 9
                vertices_[edge.second].push_front(edge.first);
            }
        });
    }

    void print() // 10
    {
        for(Vertex i = 0; i < vertices_.size(); ++i)
        {
            std::cout << i << ": ";
            std::copy(vertices_[i].begin(), vertices_[i].end(), std::ostream_iterator<Vertex>(std::cout, " "));
            std::cout << std::endl;
        }
    }
};
1. Each vertex is represented by an unsigned integer, starting from zero.
2. An edge is defined by the two vertices delimiting it.
3. I want to pass all the edges in my graph to the class constructor. This is the collection I am going to use for this task.
4. Any vertex in the graph has an associated collection of vertices, all the ones to which it is connected. The cheap C++11 forward_list suffices for this job.
5. A graph is a collection of Vertices. Each element in the vector is an actual vertex of the graph and the associated Vertices keeps track of all the connected vertices.
6. The Graph constructor requires as input the number of vertices in the graph, and all the edges on it. The data member vertices_ is initialized as a collection of empty Vertices.
7. Loop on all the passed edges to associate each vertex in the graph to its connections.
8. A real piece of code should have more effective error handling than this. Here I just discard any wrong edge. It would make sense to let the user know that something went wrong.
9. Since the graph is undirected, any edge creates two relations.
10. Utility method, just to show that everything worked as expected (hopefully).

Here is how my Graph class is used:
Edges edges { { 0, 1 }, { 0, 4 }, { 0, 5 }, { 1, 2 }, { 1, 4 }, { 2, 3 }, { 3, 4 } };
Graph graph(6, edges);
graph.print();
The expected output:
0: 5 4 1 
1: 4 2 0 
2: 3 1 
3: 4 2 
4: 3 1 0 
5: 0 
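Since the stated goal is just knowing whether we can go from here to there, a plain breadth-first search on the adjacency lists answers the question. This is a sketch of my own, not part of the Graph class above: connected() is a hypothetical helper that assumes the same vertices_ layout.

```cpp
#include <forward_list>
#include <queue>
#include <vector>

using Vertex = unsigned int;
using Vertices = std::forward_list<Vertex>;

// Check whether "to" is reachable from "from", doing a BFS on the
// adjacency lists (vertices_ laid out as in the Graph class above).
bool connected(const std::vector<Vertices>& vertices_, Vertex from, Vertex to)
{
    std::vector<bool> seen(vertices_.size(), false);
    std::queue<Vertex> frontier;
    frontier.push(from);
    seen[from] = true;

    while(!frontier.empty())
    {
        Vertex cur = frontier.front();
        frontier.pop();
        if(cur == to)
            return true;
        for(Vertex next : vertices_[cur]) // visit each unseen neighbor once
            if(!seen[next])
            {
                seen[next] = true;
                frontier.push(next);
            }
    }
    return false;
}
```

On the sample graph above, any two vertices are connected, since the graph has a single component.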


Greedy algorithm for activity selection

A typical example of a problem that has an optimal solution by implementing a greedy algorithm is the activity selection one; here is its description on wikipedia. In a few words, we have a bunch of activities, identified by a start and an end time, and we want to find a maximum selection of non-conflicting elements.

A couple of test cases (written in C++11 for GoogleTest) should clarify the problem:
typedef std::pair<int, int> Activity;
typedef std::vector<Activity> Activities;

TEST(ActSel, Simple)
{
  Activities input { {1, 2}, {5, 9}, {0, 6}, {8, 9}, {3, 4}, {5, 7} };

  Activities output = selectMax(input);
  ASSERT_EQ(4, output.size());
  for(unsigned i = 1; i < output.size(); ++i)
    ASSERT_LE(output[i-1].second, output[i].first);
}

TEST(ActSel, Simple2)
{
  Activities input { {1, 4}, {3, 5}, {0, 6}, {3, 9}, {5, 9}, {5, 7}, {6, 10}, {8, 11}, {8, 12}, {2, 14}, {12, 16} };

  Activities output = selectMax(input);
  ASSERT_EQ(4, output.size());
  for(unsigned i = 1; i < output.size(); ++i)
    ASSERT_LE(output[i-1].second, output[i].first);
}
In both cases I expect a selection of four Activity objects in output. In the first case these elements: (1, 2) (3, 4) (5, 7) (8, 9), in the second one (1, 4) (5, 7) (8, 12) (12, 16), or maybe (8, 11) instead of (8, 12). As you can see, there could be more than one solution, and the problem doesn't require you to be particularly choosy. Once you maximize the number of selected items, the actual value of each of them is not an issue.

Still, I want to ensure in my test cases that I pick a valid solution, so I check, through ASSERT_LE, that all the elements in the extracted sequence are ordered as expected.

As said above, this problem has a greedy optimal solution. What we have to do is just sort the input elements by their second component (the end time), and then greedily accept all the elements we can. As in this implementation:
Activities selectMax(Activities& input) // 1
{
  std::sort(input.begin(), input.end(), [](Activity a, Activity b) { return a.second < b.second; }); // 2

  Activities output;
  if(input.empty())
    return output;
  output.push_back(input[0]); // 3

  for(unsigned i = 0, j = 1; j < input.size(); ++j) // 4
  {
    if(input[j].first >= input[i].second) // 5
    {
      output.push_back(input[j]);
      i = j;
    }
  }

  return output;
}
1. We don't mind if this function modifies the input parameter, so it is passed as a non-constant reference. Beware that this should be known and accepted by the callers.
2. The plain STL sort() function would order the passed sequence by its first component. So we need to use the overloaded version that lets us pass a predicate to be used as comparator. Using a C++11 lambda function, as shown here, makes it simple and elegant.
3. The first element, the one ending earliest, is always selected.
4. Now we are ready to loop on all the other elements in the sequence. The real looping variable is j, while i is used to keep track of the last accepted element.
5. Any element that starts no earlier than the end of the last accepted one is pushed into the output sequence and becomes the new reference element.


Rod cutting by dynamic programming

A typical problem well suited to showing how dynamic programming works. We have a rod sized up to, let's say, 10. We can freely cut it in pieces (integer sized) to sell them at the best price. Given a price table, find out how to get the most from it.

Here is a C++11 test case for GoogleTest that should clarify the requirements:
typedef std::vector<int> Vector;

unsigned cutRod(const Vector& price, unsigned size);

TEST(CutRod, Simple)
{
  Vector price { 1, 5, 8, 9, 10, 17, 17, 20, 24, 30 };

  ASSERT_EQ(30, cutRod(price, 10));
  ASSERT_EQ(25, cutRod(price, 9));
  ASSERT_EQ(18, cutRod(price, 7));
  ASSERT_EQ(10, cutRod(price, 4));
}
Given that price list, we see immediately that if we have in input a rod sized up to 3, the best strategy is selling it in a single piece.
But if we have a rod sized four, selling it untouched we'll get 9. Better to split it in two rods both sized two, which gives us 5 + 5 = 10.
Similarly, a rod sized 7 is priced 17. If we split it in two parts sized 6 and 1, we'll get 17 + 1 = 18.

Brute force

We may think of applying a recursive approach to this problem, checking all the possible cut combinations we can think of. It is very easy to write the relevant code, but we can't expect it to scale well:
unsigned cutRod(const Vector& price, unsigned size)
{
  unsigned result = 0;
  for(unsigned i = 0; i < size; ++i)
    result = std::max(result, price[i] + cutRod(price, size - (i+1)));

  return result;
}
It is just a matter of recursively calling our function, reducing each time the size of the rod we are considering. Each time we compare the partial result with the one we have previously stored, keeping just the best one.

Top-down dynamic programming

One obvious problem in the previous solution is that we solve the same sub-problems again and again. We could save a lot of running time by storing them in a buffer. This simple but effective idea is the basis of the dynamic programming technique.

In this context, the bargain of using space to avoid spending time repeating the same task to get a partial result is called memoization (as keeping a memo).

Here is a possible top-down implementation, very close to the naive version seen above:
unsigned cutRod(const Vector& price, unsigned size)
{
  Vector memo(size + 1, -1);
  memo[0] = 0;

  return memoCutRod(price, size, memo);
}
Here cutRod() just creates a memo vector that stores the value of each sub-problem as soon as we get its result. Then it starts the recursion by calling a support function.

Notice that the memo buffer has one element more than the price list. This is for storing also the value of the dummy cut sized zero. It is not a strict necessity, since we know that it won't cost anything, but it would help to make our code cleaner.
unsigned memoCutRod(const Vector& price, unsigned size, Vector& memo)
{
  if(memo[size] >= 0) // 1
    return memo[size];

  unsigned result = 0; // 2
  for(unsigned i = 0; i < size; ++i)
    result = std::max(result, price[i] + memoCutRod(price, size - (i+1), memo));

  return memo[size] = result; // 3
}
1. If the relative memo element is non-negative, we have already calculated it. Job already done.
2. Otherwise we calculate the best price as seen before.
3. And we set a memo before returning it.

Bottom-up approach

Again dynamic programming, still using memoization as we have just seen, but starting from the bottom of the problem and crawling up to its top. In this case the implementation is even simpler, and spares us the pain and the cost of recursion:
unsigned cutRod(const Vector& price, unsigned size)
{
  Vector memo(size + 1); // 1
  for(unsigned i = 1; i <= size; ++i) // 2
  {
    int value = -1; // 3
    for(unsigned j = 0; j < i; ++j) // 4
      value = std::max(value, price[j] + memo[i-j-1]);
    memo[i] = value;
  }

  return memo.back(); // 5
}
1. As in the top-down approach, we get an extra element in the memo vector, just to keep the code simpler. But this time we don't need to initialize it to a "bad" value, because we are setting it up iteratively starting from the beginning.
2. First element in memo is already set to its expected value (that is, zero) as courtesy of the vector constructor. We need to calculate all the other elements, up to the rightmost one.
3. Initialize the current memo value to less than the minimum acceptable value (meaning, less than zero).
4. Basically it is the same loop we have seen in the previous implementations, but here we explicitly go for the smaller element first.
5. End of the story, the answer is stored in the rightmost memo element.

Check on github for full C++11 code.


Quicksort

Quicksort is known to be a fast O(N lg N) divide and conquer sorting algorithm, in its average behavior. Still, we have to pay attention to the worst case scenario, which brings it to an O(N**2) time cost.

The idea is to repetitively partition the data collection, splitting it in two parts so that a chosen pivot is equal to or greater than the values in its left partition, and then call the quicksorting procedure again, until there is nothing left to sort. As one could easily spot, it is a possible bad choice of the pivot that could lead to poor performance.

The resulting code should be something like this:
void quicksort(std::vector<int>& data, int left, int right) // 1
{
  if(left < right) // 2
  {
    int pivot = partition(data, left, right); // 3
    quicksort(data, left, pivot - 1); // 4
    quicksort(data, pivot + 1, right);
  }
}
1. The function requires in input the collection on which it should operate and the indexes of its leftmost and rightmost elements.
2. Check if the interval is not empty.
3. Split the original interval in two parts. On the left side we have all the values less or equal to the value in the pivot element.
4. Call quicksort again on the left and right partitions. Notice that the pivot element is already in the right place, and doesn't need to be considered anymore.

We just need to partition a collection as expected:
int partition(std::vector<int>& data, int left, int right)
{
  int pivot = data[right]; // 1
  int index = left - 1; // 2

  for(int i = left; i < right; ++i) // 3
  {
    if(data[i] <= pivot) // 4
      std::swap(data[++index], data[i]);
  }

  std::swap(data[++index], data[right]); // 5
  return index;
}
1. OK, this doesn't look smart. As pivot we always select the rightmost element in the interval.
2. Initialize index to the first-before-beginning position in the interval.
3. Loop on all the items in the interval, but the last one (that is, the pivot).
4. If the current element value is not greater than the pivot, let's swap it with the first not-yet-used element on the left.
5. Finally, we swap the pivot (rightmost value in the interval) with the element next to index.

Full C++ code on github.


Heapsort

Heapsort is an in-place sorting algorithm, like insertion sort, that asymptotically scores a nice O(N lg N) time complexity, like merge sort.

It makes use of the heap data structure, a normal array seen as a nearly complete binary tree, here in its max-heap flavor. Meaning that the biggest value is placed in the first element of the array (considered as the root of the tree).

Implementing heapsort in C++ is pretty trivial, since it is just a matter of calling two STL algorithm functions:
#include <vector>
#include <algorithm>

void heapsort(std::vector<int>& data)
{
  std::make_heap(data.begin(), data.end()); // 1
  std::sort_heap(data.begin(), data.end()); // 2
}
1. This make_heap() call rearranges the passed elements as a max-heap.
2. This sort_heap() call assumes that the passed sequence is a max-heap and sort it in ascending order.

But let's have some fun reimplementing by hand these two functions:
typedef std::vector<int> Vector;

void heapsort(Vector& data)
{
  buildMaxHeap(data);
  sortHeap(data);
}
We'll need a way to navigate down the binary heap:
unsigned childLeft(unsigned i) { return (2 * i) + 1; }
unsigned childRight(unsigned i) { return (2 * i) + 2; }
The root element is at index 0. Its children are on 1 and 2.
The left child of the root (index 1) has its own children on 3 and 4; its sibling (index 2) on 5 and 6.
We can get the index of the children of a generic node in a binary heap just multiplying its index by two and adding 1 (for the left one) or 2 (for the right one).
And we'll need a function to ensure that a node in the data structure complies with the binary max-heap requisite (it should not be smaller than its children):
void maxHeapify(Vector& data, unsigned i, unsigned len) // 1
{
  unsigned left = childLeft(i);
  unsigned right = childRight(i);

  unsigned largest = (left < len && (data[left] > data[i])) ? left : i;
  if(right < len && (data[right] > data[largest]))
    largest = right;

  if(largest != i) // 2
  {
    std::swap(data[i], data[largest]); // 3
    maxHeapify(data, largest, len); // 4
  }
}
1. We pass to the function the data collection, the index of the element that we are checking, and the number of elements in the heap.
2. We have compared the current node value against those of its left and right children. If the largest one is a child, the rule of the heap is currently violated. We need to rearrange the nodes.
3. Firstly, we swap the nodes so that the largest one is above the other two.
4. Then, we need to ensure that the swapping has not corrupted the max-heap structure.

We are finally ready to implement the two main functions:
void buildMaxHeap(Vector& data) // 1
{
  for(int i = data.size() / 2; i >= 0; --i) // 2
    maxHeapify(data, i, data.size());
}

void sortHeap(Vector& heap) // 3
{
  for(int i = heap.size() - 1; i > 0; --i) // 4
  {
    std::swap(heap[0], heap[i]); // 5
    maxHeapify(heap, 0, i); // 6
  }
}
1. Given an arbitrary collection of values, convert it to a max-heap.
2. Start from the bottom, up to the root.
3. We assume that the passed data respects the max-heap constraint.
4. We scan the heap starting from the rightmost element down to the second one.
5. We know that the heap root is the biggest element in the collection, so we swap it to the rightmost position.
6. Before starting a new iteration, we ensure that the data collection (except the one we have already sorted) is still a max-heap.

Full C++ source code and a couple of test cases for Google test on github.


Maximum subarray by Kadane

I have already shown a solution to the maximum subarray problem, based on an algorithm that follows the divide and conquer recipe. Here I give a couple of C++11 implementations based on the asymptotically better algorithm devised by Professor Kadane, in the dynamic programming spirit.

The basic idea is keeping memory of the highest sum already reached while increasing a current sum. When the current sum gets higher than the historical one, the latter is updated.

This is a first version, that calculates just the sum:
typedef std::vector<int> Vector;

int maxSubAr(const Vector& data)
{
  int sum = 0;
  int sumTmp = 0;

  for(unsigned i = 0; i < data.size(); ++i)
  {
    int value = sumTmp + data[i]; // 1
    if(value > 0)
      sumTmp = value;
    else
      sumTmp = 0;

    if(sumTmp > sum) // 2
      sum = sumTmp;
  }

  return sum;
}
1. Add the current element value to the temporary sum. If this leads to a positive number, this will be the new temporary sum, otherwise I reset it.
2. If the temporary sum is bigger than the previously saved sum, I save this new value.

And that's it. Incredibly simple and effective.

Things get a bit more complicated when we want to get also the first-last index of the subsequence:
typedef std::array<int, 3> Info;
typedef std::vector<int> Vector;

Info maxSubArray(const Vector& data)
{
  int left = 0;
  int right = 0;
  int sum = 0;

  int leftTmp = 0;
  int sumTmp = 0;

  for(unsigned i = 0; i < data.size(); ++i)
  {
    int value = sumTmp + data[i];
    if(value > 0)
    {
      if(sumTmp == 0) // 1
        leftTmp = i;
      sumTmp = value;
    }
    else
      sumTmp = 0;

    if(sumTmp > sum) // 2
    {
      left = leftTmp;
      right = i;
      sum = sumTmp;
    }
  }

  return {{ left, right, sum }};
}
1. If I am at the beginning of the sequence, or if I have just reset the temporary sum, the current element is the tentative first element of the subsequence.
2. When I see that the current sum is bigger than the one already discovered, I adjust also the left/right indexes.

Full C++11 code is on github, with a few GoogleTest test as a bonus.


Maximum subarray by divide and conquer

Given an array containing positive and negative integers, we want to determine a subarray containing the largest sum of elements.

A couple of test cases, written in C++11 for GoogleTest, should make the problem clearer:
typedef std::array<int, 3> Info; // 1
typedef std::vector<int> Vector; // 2

TEST(MaxSub, Simple) // 3
{
  Vector input { 2, 3, 4, 5, 7 };

  unsigned last = input.size() - 1;
  Info sub = maxSubArray(input, 0, last);
  EXPECT_EQ(0, sub[0]);
  EXPECT_EQ(last, sub[1]);
  EXPECT_EQ(21, sub[2]);
}

TEST(MaxSub, Simple2) // 4
{
  Vector input {-2, -5, 6, -2, -3, 1, 5, -6};

  Info sub = maxSubArray(input, 0, input.size() - 1);
  EXPECT_EQ(2, sub[0]);
  EXPECT_EQ(6, sub[1]);
  EXPECT_EQ(7, sub[2]);
}

TEST(MaxSub, Negative) // 5
{
  Vector input {-2, -5, -2, -3, -6};

  Info sub = maxSubArray(input);
  EXPECT_EQ(0, sub[0]);
  EXPECT_EQ(0, sub[1]);
  EXPECT_EQ(0, sub[2]);
}
1. I want the function to return three values: the delimiting subarray indexes and the found sum.
2. This is the container used to keep the input data.
3. Trivial case, all positive elements: the function returns 0, the last element index, and the sum of all the elements.
4. A typical case, nothing fancy.
5. If no positive element is in input, the result is an empty subarray.

We could use a divide and conquer approach to get a solution. Here is my C++11 implementation, starting with the divide part:
Info maxSubArray(const Vector& data, int left, int right)
{
  if(left == right) // 1
    return {{ left, right, data[left] }};

  int middle = (left + right) / 2; // 2

  Info subLeft = maxSubArray(data, left, middle); // 3
  Info subRight = maxSubArray(data, middle + 1, right);
  Info crossing = maxCrossing(data, left, middle, right); // 4

  return max(subLeft, subRight, crossing); // 5
}
1. If the left and right indexes are actually the same, the problem is trivial.
2. Otherwise we split the interval in two parts.
3. And recursively call the divide function on the left and right parts.
4. The hard job is done here. We need to check also the sequences that start before the middle point and end after it.
5. Once we get the three partial results, it is just a matter of checking which one has the highest sum and returning it. To do that I have written a max() function that I guess you won't need to see to get how it works. In any case you will find it on github.
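For completeness, here is a possible shape for that max() helper, my own guess rather than the exact version on github: it just keeps the Info candidate with the highest sum, stored in the third element.

```cpp
#include <array>

typedef std::array<int, 3> Info;

// return the candidate with the biggest sum (the element at index 2)
Info max(const Info& a, const Info& b, const Info& c)
{
  const Info& ab = (a[2] >= b[2]) ? a : b;
  return (ab[2] >= c[2]) ? ab : c;
}
```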

Let's see how to get the max subarray that crosses the central element. The idea is pretty simple: get the left and right max sums, starting from the middle point and moving outward, and then merge them:
Info maxCrossing(const Vector& data, int left, int middle, int right)
{
  int sum = 0;

  int maxLeft = middle;
  int leftSum = std::numeric_limits<int>::min();
  for(int i = middle; i >= left; --i) // 1
  {
    sum += data[i]; // 2
    if(sum > leftSum)
    {
      leftSum = sum;
      maxLeft = i;
    }
  }

  sum = 0; // 3
  int maxRight = middle + 1;
  int rightSum = std::numeric_limits<int>::min();
  for(int i = middle + 1; i <= right; ++i)
  {
    sum += data[i];
    if(sum > rightSum)
    {
      rightSum = sum;
      maxRight = i;
    }
  }

  return {{ maxLeft, maxRight, leftSum + rightSum }}; // 4
}
1. Loop on the elements, starting from the middle element to the leftmost one.
2. Tentatively add the current value to the sum. If it is bigger than the previously stored left sum value, adjust it and its leftmost index.
3. The right part is scanned the same way, mirrored.
4. Put together left and right partial result to get the return value.

Full source C++11 code and a few test cases are on github.


Merge sort

Typical example of a divide and conquer approach applied to the sorting problem. Its O(n lg n) time complexity makes it interesting for large enough data collections.

The idea of this algorithm is that we can recursively split the problem until we have subproblems so tiny that they are trivial to solve. Then we just have to merge the partial results to get the actual solution.

Here is the "divide" part, implemented in C++:
typedef std::vector<int> Data;

void mergeSort(Data& data, int left, int right) // 1
{
    if(left < right) // 2
    {
        int center = (left + right) / 2;
        mergeSort(data, left, center); // 3
        mergeSort(data, center + 1, right);
        merge(data, left, center, right); // 4
    }
}
1. The function expects in input the collection to sort and the first/last indexes to consider.
2. If there is just one element to sort, there is nothing to do.
3. Split the problem it two parts and recursively call the divide function on them.
4. Merge the partial solutions.

As one would expect, a large part of the job is done by the merging function:
typedef std::queue<int> Queue;

void merge(Data& data, int left, int center, int right)
{
    Queue low; // 1
    Queue high;

    for(int i = left; i <= center; ++i) // 2
        low.push(data[i]);
    for(int i = center + 1; i <= right; ++i)
        high.push(data[i]);

    int i = left;
    while(!low.empty() && !high.empty()) // 3
    {
        if(low.front() <= high.front())
        {
            data[i++] = low.front();
            low.pop();
        }
        else
        {
            data[i++] = high.front();
            high.pop();
        }
    }

    while(!low.empty()) // 4
    {
        data[i++] = low.front();
        low.pop();
    }
    while(!high.empty())
    {
        data[i++] = high.front();
        high.pop();
    }
}
1. A couple of queues are used to temporarily store the data while processing them.
2. Fill the queues with the data coming from the input collection.
3. Compare the data on the two queues, rearranging them in the original container.
4. Ensure all the possible elements left in the temporary queues are copied back to the input data.

Full code is on github. As a bonus, you will also find there a simple xUnit test for GoogleTest.


Insertion sort

It's a simple sorting algorithm that, even if asymptotically expensive, O(N**2), turns out to be cheap for small data sets.

Its idea is comparing each element, starting from the second up to the last one, with its predecessors. While we find bigger items on its left, we move them to the right, to make room for it.

Here is my C++ implementation:
void insertionSort(std::vector<int>& data)
{
    for(unsigned i = 1; i < data.size(); ++i) // 1
    {
        int value = data[i]; // 2
        int j = i - 1; // 3
        while(j >= 0 && data[j] > value)
        {
            data[j+1] = data[j]; // 4
            --j;
        }
        data[j+1] = value; // 5
    }
}
1. Loop on all the elements after the first one.
2. Cache the current element.
3. Loop backward on the elements to the left of the current one until we find something not bigger, or there is nothing more to check.
4. Make room for the current element.
5. Place the element in order.

Full code is on github. As a bonus, you will also find there a basic GoogleTest test.


Longest Common Subsequence by Dynamic Programming

Given two strings, find their longest common subsequence.

This is a well known programming problem (to read more about it, you could go to this wikipedia page) that is commonly used as introduction to the dynamic programming solving method.

Its simpler version requires returning just the size of the subsequence. Determining the actual result is a bit more complicated.

I have written a few test cases to clarify to myself what the problem is about, and to drive the software development. Here are a couple of them (written for the C++ GoogleTest framework):
int lcsSize(const std::string& first, const std::string& second);
std::string lcsStr(const std::string& first, const std::string& second);

TEST(Lcs, CaseSize)
{
  EXPECT_EQ(2, lcsSize("AGCAT", "GAC"));
  EXPECT_EQ(3, lcsSize("ABCDGH", "AEDFHR"));
  EXPECT_EQ(4, lcsSize("AGGTAB", "GXTXAYB"));
}

TEST(Lcs, CaseStr)
{
  EXPECT_EQ("GA", lcsStr("AGCAT", "GAC"));
  EXPECT_EQ("ADH", lcsStr("ABCDGH", "AEDFHR"));
  EXPECT_EQ("GTAB", lcsStr("AGGTAB", "GXTXAYB"));
}
As you would have already gathered, lcsSize() is the simpler one, and lcsStr() the more complete. Both functions require generating a matrix where all the subproblem results are stored. If we want just the size of the result, we pick the value of the last one (the bottom-right element). Otherwise we'll need to navigate the matrix to build the string up:
std::vector<std::vector<int>> lcs(const std::string& lhs, const std::string& rhs)
{
  const unsigned rows = lhs.size() + 1; // 1
  const unsigned cols = rhs.size() + 1;

  std::vector<std::vector<int>> buffer(rows); // 2
  for(unsigned i = 0; i < rows; ++i) // 3
    buffer[i].resize(cols);

  for(unsigned i = 1; i < rows; ++i) // 4
    for(unsigned j = 1; j < cols; ++j)
      buffer[i][j] = (lhs[i-1] == rhs[j-1]) ? // 5
          buffer[i-1][j-1] + 1 : std::max(buffer[i-1][j], buffer[i][j-1]);

  return buffer;
}
1. The matrix is going to have an extra row and column. This is not a strict necessity, but it makes the code more readable.
2. A simple way to implement a matrix is making it a vector of vectors. Notice the double '>' sign: it is legal in C++11, but it would confuse older compilers.
3. The vector-of-vectors ctor in the line above creates "rows" zero-sized rows. Here I assign them the right size. Remember that each element is initialized with the default value, that is zero.
4. Loop on all the "real" elements to solve the problem using the previous results.
5. To determine the current value, I check if the relative letters in the input strings are the same (notice the "-1": it is due to the fake top/left elements in the matrix). If that is the case, I have found another matching character, so I increase the value. Otherwise the lcs is not increasing; take the current biggest value and use it.

Now it is trivial to get the common subsequence size, it is the value stored in the last visited cell in the matrix:
int lcsSize(const std::string& lhs, const std::string& rhs)
{
  std::vector<std::vector<int>> buffer = lcs(lhs, rhs);
  return buffer[lhs.size()][rhs.size()];
}
Returning the actual subsequence requires some more code:
std::string lcsStr(const std::string& lhs, const std::string& rhs)
{
  std::vector<std::vector<int>> buffer = lcs(lhs, rhs);

  std::pair<unsigned, unsigned> cur { lhs.size(), rhs.size() }; // 1
  std::vector<char> result; // 2
  while(buffer[cur.first][cur.second] > 0) // 3
  {
    if(buffer[cur.first-1][cur.second] < buffer[cur.first][cur.second] &&
        buffer[cur.first][cur.second-1] < buffer[cur.first][cur.second])
    { // 4
      result.push_back(rhs[cur.second-1]);
      --cur.first;
      --cur.second;
    }
    else if(buffer[cur.first][cur.second-1] == buffer[cur.first][cur.second]) // 5
      --cur.second;
    else
      --cur.first;
  }

  return std::string(result.rbegin(), result.rend()); // 6
}
1. The starting point is the matrix right bottom.
2. Each time I find a subsequence character, I'll push it here.
3. The loop goes on until the value stored in the current cell drops to zero, meaning there are no more common characters to collect.
4. This cell corresponds to a matching character. Save that character and move up and to the left.
5. Otherwise, move in the matrix (to the left or up), until we find another border character.
6. Now it is just a matter of building a string that reverses the sequence stored in the resulting vector.


Double square numbers

A number that is the sum of two perfect squares is called a double square. Notice that any perfect square number is a double square, since it can be expressed as its square root squared plus zero squared:
16 = 4**2 + 0**2
Besides the trivial case shown before, we could think of cases like 50, where more than one pair of integers could generate it:
50 = 1**2 + 7**2 = 5**2 + 5**2
We want to write a function that checks whether any positive integer up to a couple of billion (so that we could store it in a signed 32 bit int) is such a number.

I have found this problem on CodeEval, under the name Double Squares; however, they credit the Facebook Hacker Cup 2011 as their source. They suggest not even thinking about a brute force approach, and trying to be smarter than that.

When working on this post, I found it nicer to change the requirements slightly, asking the function to return the collection of pairs that generate the number as a double square. If the passed number is not such, the function returns an empty collection.

Here are a few test cases (GoogleTest for C++) that show what I want from this function:
typedef std::vector< std::pair<int, int> > Generators; // 1

Generators doubleSquares(int value); // 2

TEST(DoubleSquares, Sixteen) // 3
{
    Generators output = doubleSquares(16);

    ASSERT_EQ(1, output.size());
    EXPECT_EQ(0, output[0].first);
    EXPECT_EQ(4, output[0].second);
}

TEST(DoubleSquares, Fifty) // 4
{
    Generators output = doubleSquares(50);

    ASSERT_EQ(2, output.size());
    EXPECT_EQ(1, output[0].first);
    EXPECT_EQ(7, output[0].second);
    EXPECT_EQ(5, output[1].first);
    EXPECT_EQ(5, output[1].second);
}

TEST(DoubleSquares, BigOne) // 5
{
    Generators output = doubleSquares(5882353);

    ASSERT_EQ(1, output.size());
    EXPECT_EQ(588, output[0].first);
    EXPECT_EQ(2353, output[0].second);
}
1. As container for the generators I am going to use a vector of pairs.
2. My function prototype.
3. Passing 16 in, I expect as output a single pair, (0, 4).
4. Both (1, 7) and (5, 5) generate 50.
5. A (relatively) big number, 5882353, generated by (588, 2353).

Here is how I have implemented the function:
Generators doubleSquares(int value)
{
    Generators result;
    for (int i = 0; i <= std::sqrt(value / 2); i++) // 1
    {
        double j = std::sqrt(value - std::pow(i, 2)); // 2
        if (std::floor(j) == j) // 3
            result.push_back( { i, static_cast<int>(j) } ); // 4
    }
    return result;
}
1. I am checking whether value has a pair of generators. Each of them must be a non-negative integer, and the smaller one can't be bigger than the square root of half of value, otherwise the sum of the two squares would exceed value. To see why, think of a concrete case, 64 for instance: since 6**2 + 6**2 = 72 is already bigger than 64, the smaller generator can't be bigger than 5.
2. Assuming i is a generator for value, the other generator must account for the remaining part. So I take the square root of the difference between value and the square of i.
3. I still can't accept j as the second generator for value until I have checked that it is an integer. This is probably the fastest way to do it: the standard math function floor() returns the nearest integer not greater than its input, so if its returned value is exactly equal to the original j, then j is a whole number.
4. I am using the very handy C++11 brace syntax to build an STL pair on the fly. If your compiler does not support it, you have to fall back to the less immediate make_pair() function template. Notice also that I have to tell the compiler that I am aware I am casting a double to an int; I am sure of what I am doing, thanks to the check on the previous line.
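To exercise the bound and the floor check outside GoogleTest, the function can also be driven with plain asserts. The snippet below restates the implementation so it compiles on its own:

```cpp
#include <cmath>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, int>> Generators;

// Returns every pair (i, j), with i <= j, such that i*i + j*j == value;
// an empty vector means value is not a double square.
Generators doubleSquares(int value)
{
    Generators result;
    // the smaller generator can't exceed sqrt(value / 2)
    for (int i = 0; i <= std::sqrt(value / 2); i++)
    {
        double j = std::sqrt(value - std::pow(i, 2));
        if (std::floor(j) == j) // j is a whole number: (i, j) generates value
            result.push_back({ i, static_cast<int>(j) });
    }
    return result;
}
```

For instance, doubleSquares(50) yields the two pairs (1, 7) and (5, 5), while doubleSquares(3) comes back empty.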

Go to the full post

Dominator by golden leader

I want to check if in a collection of integers there is a value that occurs more than half the times. I have already given a solution to this problem in the previous post, but I need something smarter, linear in time and constant in space complexity.

To achieve this result I am about to implement an algorithm based on the discussion in a paper available on Codility; look at the end of that document for a Python function named goldenLeader.

In a few words, the idea is that if a dominator exists, we can keep discarding pairs of different elements, and in the end we should be left with at least one dominator element. If no dominator exists, we'll find that we have thrown away all the elements looking for it.

However, our function should not return just the dominant value, but an index in the original vector that refers to an element with that value. So we need to twist that algorithm a bit.
typedef std::pair<int, int> ValInd; // 1

int solution(const std::vector<int>& input)
{
    int count = 0; // 2
    ValInd candidate; // 3
    for (unsigned i = 0; i < input.size(); ++i)
    {
        if (count == 0) // 4
        {
            count = 1;
            candidate.first = input[i];
            candidate.second = i;
        }
        else // 5
        {
            count += (candidate.first == input[i]) ? 1 : -1;
        }
    }

    if (count == 0) // 6
        return -1;

    if (std::count(input.begin(), input.end(), candidate.first) > (int) input.size() / 2)
        return candidate.second;

    return -1;
}
1. I need to remember the current candidate value and the index where I have found it.
2. I am going to count how many elements, supposedly dominators, I have found that are not yet matched with a non-dominator. I could have pushed them on a stack, but it is obviously cheaper to do it this way.
3. Here I am going to store the current candidate.
4. No previous candidate survived the looping, so the current element is chosen.
5. I have a candidate, and I compare it against the current element: if they have the same value I increase the counter, otherwise I decrease it.
6. If no candidate is pending, I am sure there is no dominant value. Otherwise I count how many elements with the candidate value are in the input collection; if they are enough, bingo.
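The loop is essentially the Boyer-Moore majority vote algorithm, with a verification pass at the end. Here is a self-contained version together with a quick sanity check; the sample inputs below are mine, not from the original problem:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<int, int> ValInd; // (candidate value, its index)

// Boyer-Moore style voting plus a final verification count.
int solution(const std::vector<int>& input)
{
    int count = 0;
    ValInd candidate;
    for (unsigned i = 0; i < input.size(); ++i)
    {
        if (count == 0) // no pending candidate: adopt the current element
        {
            count = 1;
            candidate.first = input[i];
            candidate.second = i;
        }
        else // a match strengthens the candidate, a mismatch burns one copy
        {
            count += (candidate.first == input[i]) ? 1 : -1;
        }
    }

    if (count == 0)
        return -1;

    if (std::count(input.begin(), input.end(), candidate.first)
            > static_cast<int>(input.size()) / 2)
        return candidate.second;

    return -1;
}
```

For {3, 4, 3, 2, 3, -1, 3, 3} the function returns the index of an element with value 3, the dominator; for {1, 2, 3} it returns -1, since no value occurs more than half the times.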

Go to the full post