This Thread: February 2014

Maximum subarray by Kadane

I have already shown a solution to the maximum subarray problem, based on an algorithm that follows the divide and conquer recipe. Here I give a couple of C++11 implementations based on the asymptotically better algorithm, O(n) against O(n lg n), devised by Professor Kadane in the dynamic programming spirit.

The basic idea is to keep track of the highest sum reached so far while growing a current sum. Whenever the current sum gets higher than the historical one, the historical one is updated too.

This is a first version, that calculates just the sum:

typedef std::vector<int> Vector;

int maxSubAr(const Vector& data)
{
  int sum = 0;
  int sumTmp = 0;

  for(unsigned i = 0; i < data.size(); ++i)
  {
    int value = sumTmp + data[i]; // 1
    if(value > 0)
      sumTmp = value;
    else
      sumTmp = 0;

    if(sumTmp > sum) // 2
      sum = sumTmp;
  }

  return sum;
}

1. Add the current element value to the temporary sum. If this leads to a positive number, this will be the new temporary sum, otherwise I reset it.
2. If the temporary sum is bigger than the previously saved sum, I save this new value.

And that's it. Incredibly simple and effective.
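
To see the scan at work, here is a minimal sketch of the same idea written with a range-based for loop. The name maxSubArraySum is mine, not the one used in the repository. On the input { -2, -5, 6, -2, -3, 1, 5, -6 } the running sum goes 0, 0, 6, 4, 1, 2, 7, 1, so the result is 7.

```cpp
#include <vector>

// Sum-only Kadane scan (hypothetical name, for illustration only).
// Returns 0, the sum of the empty subarray, when no element is positive.
int maxSubArraySum(const std::vector<int>& data)
{
  int best = 0;     // highest sum seen so far
  int current = 0;  // sum of the best run ending at the current element

  for(int value : data)
  {
    current += value;
    if(current < 0)
      current = 0;  // a negative run can never help what follows: restart
    if(current > best)
      best = current;
  }

  return best;
}
```

Calling maxSubArraySum() on a vector holding only negative values gives 0, the same behavior as the function above.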

Things get a bit more complicated when we also want the first and last index of the subsequence:

typedef std::array<int, 3> Info;
typedef std::vector<int> Vector;

Info maxSubArray(const Vector& data)
{
  int left = 0;
  int right = 0;
  int sum = 0;

  int leftTmp = 0;
  int sumTmp = 0;

  for(unsigned i = 0; i < data.size(); ++i)
  {
    int value = sumTmp + data[i];
    if(value > 0)
    {
      if(sumTmp == 0) // 1
        leftTmp = i;
      sumTmp = value;
    }
    else
      sumTmp = 0;

    if(sumTmp > sum) // 2
    {
      left = leftTmp;
      right = i;
      sum = sumTmp;
    }
  }

  return {{ left, right, sum }};
}

1. If I am at the beginning of the sequence, or if I have just reset the temporary sum, the current element is the tentative first element of the subsequence.
2. When I see that the current sum is bigger than the one already discovered, I also adjust the left/right indexes.
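
A handy way to double check a linear scan like this one is to compare it against a brute force version on small inputs. What follows is just a sketch under my own assumptions: the name maxSubArrayBrute is hypothetical, and it is not part of the code on github. It tries every left/right pair, so it is O(n^2), and it starts from the empty subarray as the Kadane version does.

```cpp
#include <array>
#include <vector>

typedef std::array<int, 3> Info;  // { left, right, sum }
typedef std::vector<int> Vector;

// Hypothetical test oracle: checks every subarray, O(n^2).
Info maxSubArrayBrute(const Vector& data)
{
  Info best = {{ 0, 0, 0 }};  // empty subarray, sum zero

  for(unsigned left = 0; left < data.size(); ++left)
  {
    int sum = 0;
    for(unsigned right = left; right < data.size(); ++right)
    {
      sum += data[right];  // sum of data[left..right]
      if(sum > best[2])
        best = Info{{ static_cast<int>(left), static_cast<int>(right), sum }};
    }
  }

  return best;
}
```

When several subarrays share the same best sum, the two versions may report different indexes, so in a test only the sum should be expected to match in that case.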

Full C++11 code is on github, with a few GoogleTest tests as a bonus.

Maximum subarray by divide and conquer

Given an array containing positive and negative integers, we want to determine the contiguous subarray whose elements have the largest sum.

A couple of test cases, written in C++11 for GoogleTest, should make the problem clearer:

typedef std::array<int, 3> Info; // 1
typedef std::vector<int> Vector; // 2

TEST(MaxSub, Simple) // 3
{
  Vector input { 2, 3, 4, 5, 7 };

  unsigned last = input.size() - 1;
  Info sub = maxSubArray(input, 0, last);
  EXPECT_EQ(0, sub[0]);
  EXPECT_EQ(last, sub[1]);
  EXPECT_EQ(21, sub[2]);
}

TEST(MaxSub, Simple2) // 4
{
  Vector input {-2, -5, 6, -2, -3, 1, 5, -6};

  Info sub = maxSubArray(input, 0, input.size() - 1);
  EXPECT_EQ(2, sub[0]);
  EXPECT_EQ(6, sub[1]);
  EXPECT_EQ(7, sub[2]);
}

TEST(MaxSub, Negative) // 5
{
  Vector input {-2, -5, -2, -3, -6};

  Info sub = maxSubArray(input);
  EXPECT_EQ(0, sub[0]);
  EXPECT_EQ(0, sub[1]);
  EXPECT_EQ(0, sub[2]);
}

1. I want the function to return three values: the two indexes delimiting the subarray, and the sum found.
2. This is the container used to keep the input data.
3. Trivial case: all the elements are positive, so the function returns 0, the last element index, and the sum of all the elements.
4. A typical case, nothing fancy.
5. If there is no positive element in input, the result is an empty subarray.

We could use a divide and conquer approach to get a solution. Here is my C++11 implementation. Firstly, here is the divide part:

Info maxSubArray(const Vector& data, int left, int right)
{
  if(left == right) // 1
    return {{ left, right, data[left] }};

  int middle = (left + right) / 2; // 2

  Info subLeft = maxSubArray(data, left, middle); // 3
  Info subRight = maxSubArray(data, middle + 1, right);
  Info crossing = maxCrossing(data, left, middle, right); // 4

  return max(subLeft, subRight, crossing); // 5
}

1. If the left and right indexes are actually the same, the problem is trivial.
2. Otherwise we split the interval in two parts.
3. We recursively call the divide function on the left and right parts.
4. The hard job is done here. We also need to check the sequences that start before the middle point and end after it.
5. Once we get the three partial results, it is just a matter of checking which one has the highest sum and returning it. To do that I have written a max() function that I guess you won't need to see to get how it works. In any case, you will find it on github.
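
In case you are curious, a three-way max() could look more or less like this. It is only my guess at a minimal version, written for this page under the assumption that the sum is stored at index 2 of Info. The one on github may well differ.

```cpp
#include <array>

typedef std::array<int, 3> Info;  // { left, right, sum }

// Hypothetical helper: returns the candidate with the highest sum.
// On ties, the earlier argument wins.
Info max(const Info& a, const Info& b, const Info& c)
{
  const Info& ab = (b[2] > a[2]) ? b : a;
  return (c[2] > ab[2]) ? c : ab;
}
```

Preferring the earlier argument on ties means that, called as max(subLeft, subRight, crossing), the leftmost candidate is returned when the sums are equal.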

Let's see how to get the max subarray that crosses the central element. The idea is pretty simple: get the left and right max sums, starting from the middle point and moving outward, and then merge them:

Info maxCrossing(const Vector& data, int left, int middle, int right)
{
  int sum = 0;

  int maxLeft = middle;
  int leftSum = std::numeric_limits<int>::min();
  for(int i = middle; i >= left; --i) // 1
  {
    sum += data[i]; // 2
    if(sum > leftSum)
    {
      leftSum = sum;
      maxLeft = i;
    }
  }

  sum = 0; // 3
  int maxRight = middle + 1;
  int rightSum = std::numeric_limits<int>::min();
  for(int i = middle + 1; i <= right; ++i)
  {
    sum += data[i];
    if(sum > rightSum)
    {
      rightSum = sum;
      maxRight = i;
    }
  }

  return {{ maxLeft, maxRight, leftSum + rightSum }}; // 4
}

1. Loop on the elements, starting from the middle element and going to the leftmost one.
2. Tentatively add the current value to the sum. If the result is bigger than the previously stored left sum, adjust it and its leftmost index.
3. The right part is scanned symmetrically.
4. Put together the left and right partial results to get the return value.
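
Since the two loops differ only in their direction, they could be folded in a single helper. This is a sketch of that refactoring, not the code on github: bestRun() and maxCrossingCompact() are names I made up here, and I assume left <= middle < right, as it is when the function is called from the divide part. On { -2, -5, 6, -2, -3, 1, 5, -6 } with left 0, middle 3, right 7, the left scan stops at index 2 with sum 4, the right one at index 6 with sum 3, so the crossing result is { 2, 6, 7 }.

```cpp
#include <array>
#include <limits>
#include <utility>
#include <vector>

typedef std::array<int, 3> Info;  // { left, right, sum }
typedef std::vector<int> Vector;

// Hypothetical helper: walks from index `from` to index `to` (inclusive),
// moving by `step` (+1 or -1), and returns { index, sum } of the best run.
std::pair<int, int> bestRun(const Vector& data, int from, int to, int step)
{
  int sum = 0;
  int best = std::numeric_limits<int>::min();
  int bestIndex = from;

  for(int i = from; i != to + step; i += step)
  {
    sum += data[i];
    if(sum > best)
    {
      best = sum;
      bestIndex = i;
    }
  }

  return { bestIndex, best };
}

// Assumes left <= middle < right, so both halves have at least one element.
Info maxCrossingCompact(const Vector& data, int left, int middle, int right)
{
  std::pair<int, int> l = bestRun(data, middle, left, -1);      // leftward
  std::pair<int, int> r = bestRun(data, middle + 1, right, 1);  // rightward
  return {{ l.first, r.first, l.second + r.second }};
}
```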

Full source C++11 code and a few test cases are on github.

Merge sort

A typical example of the divide and conquer approach applied to the sorting problem. Its O(n lg n) time complexity makes it interesting for large enough data collections.

This algorithm's idea is that we can recursively split the problem until we have a subproblem so tiny that it is trivial to solve. Then we just have to merge the partial results to get the actual solution.

Here is the "divide" part, implemented in C++:

typedef std::vector<int> Data;

void mergeSort(Data& data, int left, int right) // 1
{
    if(left < right) // 2
    {
        int center = (left + right) / 2;
        mergeSort(data, left, center); // 3
        mergeSort(data, center + 1, right);
        merge(data, left, center, right); // 4
    }
}

1. The function expects as input the collection to sort and the first/last indexes to consider.
2. If there is just one element to sort, there is nothing to do.
3. Split the problem in two parts and recursively call the divide function on them.
4. Merge the partial solutions.

As one would expect, a large part of the job is done by the merging function:

typedef std::queue<int> Queue;

void merge(Data& data, int left, int center, int right)
{
    Queue low; // 1
    Queue high;

    for(int i = left; i <= center; ++i) // 2
        low.push(data[i]);
    for(int i = center + 1; i <= right; ++i)
        high.push(data[i]);

    int i = left;
    while(!low.empty() && !high.empty()) // 3
    {
        if(low.front() <= high.front())
        {
            data[i++] = low.front();
            low.pop();
        }
        else
        {
            data[i++] = high.front();
            high.pop();
        }
    }

    while(!low.empty()) // 4
    {
        data[i++] = low.front();
        low.pop();
    }
    while(!high.empty())
    {
        data[i++] = high.front();
        high.pop();
    }
}

1. A couple of queues are used to temporarily store the data while processing them.
2. Fill the queues with the data coming from the input collection.
3. Compare the data on the two queues, rearranging them in the original container.
4. Ensure all the possible elements left in the temporary queues are copied back to the input data.
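
If you don't need to write the merge step by hand, the standard library provides std::inplace_merge() in <algorithm>, which merges two consecutive sorted ranges. Here is a sketch of the same recursion built on it, a swapped-in alternative and not the code on github. The name mergeSortStd and the one-argument overload are my own additions.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<int> Data;

// Same divide step as above, with the merge delegated to the standard library.
void mergeSortStd(Data& data, int left, int right)
{
  if(left < right)
  {
    int center = left + (right - left) / 2;
    mergeSortStd(data, left, center);
    mergeSortStd(data, center + 1, right);

    // Merge the sorted ranges [left, center] and [center + 1, right].
    // The iterators delimit half-open ranges, hence the "+ 1".
    std::inplace_merge(data.begin() + left, data.begin() + center + 1,
        data.begin() + right + 1);
  }
}

// Hypothetical convenience overload that sorts the whole container.
void mergeSortStd(Data& data)
{
  if(!data.empty())
    mergeSortStd(data, 0, static_cast<int>(data.size()) - 1);
}
```

Like the version with the two queues, std::inplace_merge() keeps equal elements in their original relative order.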

Full code is on github. As a bonus you will also find there a simple test written for GoogleTest.

Insertion sort

It's a simple sorting algorithm that, even if asymptotically expensive, O(N**2), turns out to be cheap for small data sets.

Its idea is to compare each element, from the second up to the last one, with the elements that precede it. While we find bigger items on its left, we move them one step to the right, to make room for it.

Here is my C++ implementation:

void insertionSort(std::vector<int>& data)
{
    for(unsigned i = 1; i < data.size(); ++i) // 1
    {
        int value = data[i]; // 2
        int j = i - 1; // 3
        while(j >= 0 && data[j] > value)
        {
            data[j+1] = data[j]; // 4
            --j;
        }
        data[j+1] = value; // 5
    }
}

1. Loop on all the elements after the first one.
2. Cache the current element.
3. Loop backward on the elements on the left of the current one until we find something that is not bigger, or there is nothing more to check.
4. Make room for the current element.
5. Place the element in order.
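
The same "find the place, then shift" idea can also be expressed with standard algorithms. This is a sketch of a well known idiom, with the hypothetical name insertionSortStd: std::upper_bound() finds where the current element belongs in the already sorted prefix, and std::rotate() moves it there, shifting the bigger elements one step to the right.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

void insertionSortStd(std::vector<int>& data)
{
  for(auto it = data.begin(); it != data.end(); ++it)
  {
    // [begin, it) is already sorted: find the first element bigger than *it
    auto pos = std::upper_bound(data.begin(), it, *it);
    // move *it to pos, shifting [pos, it) one step to the right
    std::rotate(pos, it, std::next(it));
  }
}
```

The binary search saves some comparisons, but the shifting is still linear, so the overall cost stays quadratic.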

Full code is on github. As a bonus you will also find there a basic GoogleTest test.

Longest Common Subsequence by Dynamic Programming

Given two strings, find their longest common subsequence.

This is a well known programming problem (to read more about it, you could go to this wikipedia page) that is commonly used as an introduction to the dynamic programming method.

Its simpler version asks for just the size of the subsequence. Determining the actual subsequence is a bit more complicated.

I have written a few test cases to clarify to myself what the problem is about, and to drive the development. Here are a couple of them (written for the C++ GoogleTest framework):

int lcsSize(const std::string& first, const std::string& second);
std::string lcsStr(const std::string& first, const std::string& second);

TEST(Lcs, CaseSize)
{
  EXPECT_EQ(2, lcsSize("AGCAT", "GAC"));
  EXPECT_EQ(3, lcsSize("ABCDGH", "AEDFHR"));
  EXPECT_EQ(4, lcsSize("AGGTAB", "GXTXAYB"));
}

TEST(Lcs, CaseStr)
{
  EXPECT_EQ("GA", lcsStr("AGCAT", "GAC"));
  EXPECT_EQ("ADH", lcsStr("ABCDGH", "AEDFHR"));
  EXPECT_EQ("GTAB", lcsStr("AGGTAB", "GXTXAYB"));
}

As you have probably guessed, lcsSize() is the simpler one, and lcsStr() the more complete. Both functions need to generate a matrix where all the subproblem results are stored. If we want just the result size, we only have to peek at the last element (the bottom-right one). Otherwise we need to navigate the matrix to build the string up:

std::vector<std::vector<int>> lcs(const std::string& lhs, const std::string& rhs)
{
  const unsigned rows = lhs.size() + 1; // 1
  const unsigned cols = rhs.size() + 1;

  std::vector<std::vector<int>> buffer(rows); // 2
  for(unsigned i = 0; i < rows; ++i) // 3
    buffer[i].resize(cols);

  for(unsigned i = 1; i < rows; ++i) // 4
    for(unsigned j = 1; j < cols; ++j)
      buffer[i][j] = (lhs[i-1] == rhs[j-1]) ? // 5
          buffer[i-1][j-1] + 1 : std::max(buffer[i-1][j], buffer[i][j-1]);

  return buffer;
}

1. The matrix is going to have an extra row and column. This is not a strict necessity, but it makes the code more readable.
2. A simple way to implement a matrix is making it a vector of vectors. Notice the double '>' sign, it is legal in C++11 but it would confuse older compilers.
3. The vector of vectors ctor in the line above creates "rows" zero-sized rows. Here I assign them the right size. Remember that each element is initialized with the default value, that is zero.
4. Loop on all the "real" elements to solve the problem using the previous results.
5. To determine the current value, I check whether the corresponding letters in the input strings are the same (notice the "-1", due to the fake top/left elements in the matrix). If so, I have found another matching character, and I increase the value taken from the upper-left cell. Otherwise the lcs is not growing, so I take the bigger of the values in the cell above and in the cell to the left.
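
To make the matrix less abstract, here is a sketch that builds it in a slightly more compact way, letting the vector constructor size and zero all the rows in one step. The name lcsTable and the Matrix typedef are mine. The comment shows the table I get by hand for "AGCAT" and "GAC": the bottom-right cell holds 2, the size of their longest common subsequence.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef std::vector<std::vector<int>> Matrix;

// Table for lhs = "AGCAT" (rows) and rhs = "GAC" (columns):
//
//        G  A  C
//     0  0  0  0
//  A  0  0  1  1
//  G  0  1  1  1
//  C  0  1  1  2
//  A  0  1  2  2
//  T  0  1  2  2
Matrix lcsTable(const std::string& lhs, const std::string& rhs)
{
  // (lhs.size() + 1) rows of (rhs.size() + 1) zeros each
  Matrix table(lhs.size() + 1, std::vector<int>(rhs.size() + 1, 0));

  for(std::size_t i = 1; i <= lhs.size(); ++i)
    for(std::size_t j = 1; j <= rhs.size(); ++j)
      table[i][j] = (lhs[i-1] == rhs[j-1]) ?
          table[i-1][j-1] + 1 : std::max(table[i-1][j], table[i][j-1]);

  return table;
}
```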

Now it is trivial to get the common subsequence size, it is the value stored in the last visited cell in the matrix:

int lcsSize(const std::string& lhs, const std::string& rhs)
{
  std::vector<std::vector<int>> buffer = lcs(lhs, rhs);
  return buffer[lhs.size()][rhs.size()];
}
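
When only the size is required, there is no need to keep the full matrix: each row depends just on the one above it. Here is a sketch of this space-saving variant, assuming we never want to rebuild the string afterwards. The name lcsSizeTwoRows is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

int lcsSizeTwoRows(const std::string& lhs, const std::string& rhs)
{
  // Element 0 of both rows plays the fake left column: it is never written.
  std::vector<int> previous(rhs.size() + 1, 0);
  std::vector<int> current(rhs.size() + 1, 0);

  for(std::size_t i = 1; i <= lhs.size(); ++i)
  {
    for(std::size_t j = 1; j <= rhs.size(); ++j)
      current[j] = (lhs[i-1] == rhs[j-1]) ?
          previous[j-1] + 1 : std::max(previous[j], current[j-1]);

    // The row just computed becomes the "previous" one for the next round.
    std::swap(previous, current);
  }

  return previous[rhs.size()];
}
```

The memory used drops from one cell per pair of characters to two rows as long as the second string, while the running time stays the same.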

Returning the actual subsequence requires some more code:

std::string lcsStr(const std::string& lhs, const std::string& rhs)
{
  std::vector<std::vector<int>> buffer = lcs(lhs, rhs);

  std::pair<unsigned, unsigned> cur { lhs.size(), rhs.size() }; // 1
  std::vector<char> result; // 2
  while(buffer[cur.first][cur.second] > 0) // 3
  {
    if(buffer[cur.first-1][cur.second] < buffer[cur.first][cur.second] &&
        buffer[cur.first][cur.second-1] < buffer[cur.first][cur.second])
    { // 4
      result.push_back(rhs[cur.second-1]);
      --cur.first;
      --cur.second;
    }
    else if(buffer[cur.first][cur.second-1] == buffer[cur.first][cur.second]) // 5
      --cur.second;
    else
      --cur.first;
  }

  return std::string(result.rbegin(), result.rend()); // 6
}

1. The starting point is the bottom-right cell of the matrix.
2. Each time I find a subsequence character, I push it here.
3. The value in the current cell is the length of the subsequence still to be collected, so we stop looping when it gets down to zero.
4. Both the cell above and the one to the left hold a smaller value, so this cell comes from a matching character. Save the character and move up and to the left.
5. Otherwise, move in the matrix (to the left or up), until we find another matching character.
6. Now it is just a matter of building a string that reverses the sequence stored in the resulting vector.
