Splitting a string in plain C++

To split a string I would normally use the Boost tokenizer function. Sometimes I can't. For instance when I am having fun in solving some online programming problem, where no extra library could be used. In this case I fall back to this homemade split function.

boost::tokenizer() is smarter, however this bare version is usually enough for what I need:
std::vector<std::string> split(const std::string& input, char sep) // 1
{
    std::vector<std::string> tokens;
    std::size_t beg = 0, end = 0; // 2
    while ((end = input.find(sep, beg)) != std::string::npos) // 3
    {
        if(beg < end) // 4
            tokens.push_back(input.substr(beg, end - beg));
        beg = end + 1; // 5
    }
    if(beg < input.size()) // 6
        tokens.push_back(input.substr(beg));

    return tokens;
}
1. It accepts in input a constant reference to the string we want to split and the unique character expected as separator. As output we get the found tokens in a vector of strings.
2. I am going to loop on the input string, putting in beg and end the delimiter positions for each token.
3. Find the next position for the separator, until one of them is available at all.
4. If the token is not empty, extract it from input and push it in the tokens vector.
5. The next token would start after the current separator position.
6. Push the last token, not considering a possible empty one at the end of the input string.

As example, see how I have used this split() function to solve the Swap Numbers codeeval problem, that asks to swap the first and last character in each word in a blank separated string:
std::string solution(const std::string& input)
{
    std::vector<std::string> words = split(input, ' '); // 1
    for(std::string& word : words) // 2
        std::swap(word.front(), word.back());

// ...
1. Using the above defined split() function on the input string for the blank separator.
2. Each word in the vector has its first and last character swapped.

Go to the full post

CodeEval magic numbers

We have to detect all the numbers in a given interval that are "magic".

You could find the full description of the problem, and input your solution, on CodeEval.
The most interesting part is the description of the numbers we consider as magic:

* No digits repeat.
* Beginning with the leftmost digit, take the value of the digit and move that number of digits to the right. Repeat the process again using the value of the current digit to move right again. Wrap back to the leftmost digit as necessary. A magic number will visit every digit exactly once and end at the leftmost digit.

6231 is magic because there is no duplicate digit and:
* in position 0 there is a 6, move to position 2 -> (0 + 6) % 4 = 2
* in position 2 there is a 3, move to position 1 -> (2 + 3) % 4 = 1
* in position 1 there is a 2, move to position 3 -> (1 + 2) % 4 = 3
* in position 3 there is a 1, move to position 0 -> (3 + 1) % 4 = 0

Stripping down all the less interesting code, here is how implemented this algorithm in C++.

Firstly ensure there is no duplicated digit in the number:
for(auto it = number.begin(); it != number.end(); ++it)
{
  if(std::find(it + 1, number.end(), *it) != number.end())
    return false;
}
Then loop on the digits until we land on a digit that we have already visited:
std::vector visited(number.size(), false);
int pos = 0;
while(!visited[pos])
{
  visited[pos].flip();
  int cur = number[pos] - '0';
  pos = (pos + cur) % number.size();
}

Finally return success if we ended at the beginning of the string and all the digits have been visited:
return pos == 0 && std::find(visited.begin(), visited.end(), false) == visited.end();

Go to the full post

CodeEval knight moves

Given the position of a knight on the chessboard, return all its possible moves. In alphabetical order.

You could solve this problem as a CodeEval easy challenge.

It came natural to me to solve it in a functional way, still writing C++11 code.

My solution function would put the result in a string using a local lambda function:
std::string result;

auto push_back = [&result] (char x, char y)
{
  result.push_back(x);
  result.push_back(y);
  result.push_back(' ');
};
Nothing of much interest here. My push_back() lambda captures the result string, and push back to it its parameters, adding a blank at the end as separator for the possible next element.

More interesting this other lambda. It captures the string passed as input parameter, that would contain the current knight position in a format like "g2", and the above defined push_back lambda. As parameter it gets the column where I want to move the knight and how many squares I could move up or down:
auto add = [&input, &push_back](char x, int step)
{
  if (input[1] > '0' + step)
    push_back(x, input[1] - step);
  if (input[1] < '9' - step)
    push_back(x, input[1] + step);
};
If the knight does not fall off the chessboard, I push back the resulting positions using the push_back lambda. Now I just have to call add() for each meaningful column. I can do that in this way:
char x = input[0];
if (x > 'b')
  add(x - 2, 1);
if (x > 'a')
  add(x - 1, 2);
if (x < 'h')
  add(x + 1, 2);
if (x < 'g')
  add(x + 2, 1);
Just for readability, I introduced the x variable that should contain the name of the column where the knight is placed. If it is to the right of the 'b' column, I could move two steps to the left, and then call the add() lambda to verify if it could be moved one square more up or down. And so on. Full C++ code is available on github. As well as a few test cases.

Go to the full post

CodeEval string permutations

Given in input a string that could contains letters and numbers, we should output all its permutations comma separated and alphabetically ordered according to the usual conventions.

This is a CodeEval sponsored challenge, so I'll just write here a few hints and not the full solution.

There is a very useful C++ STL function that does all the job for us, namely std::next_permutation(). It assumes the collection it works on has been in the initial state, and it modifies it to the next permutation, until there is anything to permutate. When there is nothing more to do, it returns false.

So we would std::sort() our collection - that should be modifiable, and then std::next_permutation() it in a loop, storing the permutations in a temporary buffer, or maybe directly outputting it to standard output.
std::sort(s.begin(), s.end());
// ...
do {
  // ...
} while (std::next_permutation(s.begin(), s.end()));
The only minor nuisance left now is the comma. We should add it to each permutation but the last one. There are a number of ways to address this point, for instance you could see it in a different perspective. Each permutation has a comma before it, but the first one. Or maybe you could blissfully ignore the point when you build the solution, and simply remove the last character before returning your solution. If you are following this strategy, the C++11 pop_back() std::string method could be useful.

If pop_back() is not available yet on your compiler, here is an alternative solution:
s.erase(s.size() - 1, 1); // s.pop_back();
In any case you should remember to check the size of your input string. If it is empty or consisting of just one character there is nothing to do. Otherwise you are safe in assuming there is something you could erase in the resulting output.

Go to the full post

Couples and Single

A vector of integers is given. All values are in the range [1..100,000,000] and present in couples, but one that is single. Return it.

A C++11 GoogleTest test case to clarify the problem:
TEST(CouplesAndSingle, Simple)
{
    ASSERT_EQ(42, solution({ 1, 42, 1 }));
}
My solution uses an hash table where I store each new value value I see in input. If I see it a second time, I remove it from the table. In the end, I should have just one element in the hash table, the single value. It is not required, however I decided to return 0 in case of unexpected input.

Here is how I have implemented it:
int solution(const std::vector<int>& input)
{
    std::unordered_set<int> buffer; // 1
    for(int value : input) // 2
    {
        if(value < 1 || value > 100000000) // 3
            return 0;

        auto it = buffer.find(value); // 4
        if(it == buffer.end())
            buffer.insert(value);
        else
            buffer.erase(it);
    }
    return (buffer.size() != 1) ? 0 : *buffer.begin(); // 5
}
1. STL unordered_set implements hash table in C++11.
2. For each element in input.
3. Paranoid check.
4. If the current value is not in buffer, insert it. Otherwise remove it.
5. The single value is the first (and unique) element in buffer.

Go to the full post

Codility Peaks

Given a vector of integers, we want to split it in the maximum numbers of same-sized blocks, such as any block would contain at least one peak (aka local maximum).

The problem, part of the Codility Prime and composite numbers section, is not complicated, however I needed to spend some time thinking over it to understand what really was its point. As usual, I found helpful writing test cases, here are a few of them:
TEST(Peaks, Given) // 1
{
    std::vector<int> input { 1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2 };
    ASSERT_EQ(3, solution(input));
}

TEST(Peaks, ThreeSpikes) // 2
{
    std::vector<int> input { 0, 1, 0, 0, 1, 0, 0, 1, 0 };
    ASSERT_EQ(3, solution(input));
}

TEST(Peaks, ElevenSaw) // 3
{
    std::vector<int> input { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0 };
    EXPECT_EQ(1, solution(input));
}

TEST(Peaks, TwelveSaw) // 4
{
    std::vector<int> input { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
    EXPECT_EQ(4, solution(input));
}

TEST(Peaks, NoPeak) // 5
{
    std::vector<int> input(10);
    std::iota(input.begin(), input.end(), 1);
    ASSERT_EQ(0, solution(input));
}

TEST(Peaks, MaxPlain) // 6
{
    std::vector<int> input(100000);
    for(unsigned i = 1; i < input.size(); i += 2)
        input[i] = 1;
    ASSERT_EQ(25000, solution(input));
}
1. Codility provided test case. In this case the smallest size we can chose to have subsequences of the same size, each of them containing a peak, is four. Leading to a solution of 3.
2. Very plain case. Three blocks sized three are the optimal solution.
3. The input sequence is sized 11. That is, a prime number, we can't find any subinterval in this case, we just have to check if there is at least a peak, to decide to return 0 or 1.
4. Twelve could be split in 2, 3, 4, 6 sized units.
5. I have used iota() to generate a monotone function. No peak in it.
6. This test case is to check the algorithm performance. It should be easy to see that the minimum acceptable block size is four (two won't work because of the last interval).

Here is my C++11 solution:
int solution(std::vector<int>& input)
{
    if(input.size() < 3) // 1
        return 0;

    std::vector<unsigned> peaks(input.size() + 1); // 2
    for(std::size_t i = 1; i < input.size() - 1; ++i)
    {
        if(input[i] > input[i - 1] && input[i] > input[i + 1])
            peaks[i+1] = 1;
    }
    std::partial_sum(peaks.begin(), peaks.end(), peaks.begin());

    for(std::size_t size = 3; size <= input.size() / 2; ++size) // 3
    {
        if(input.size() % size) // 4
            continue;

        bool hasPeaks = true;
        for(std::size_t end = size; end < peaks.size(); end += size) // 5
        {
            if(peaks[end - size] == peaks[end])
            {
                hasPeaks = false;
                break;
            }
        }
        if(hasPeaks) // 6
            return input.size() / size;
    }

    return peaks.back() ? 1 : 0; // 7
}
1. Get rid of trivial cases.
2. I am going to use partial sum to check if there is any peak in a give interval (some more detail on this STL function and its mathematical meaning in a previous post). To simplify the code, I add a spurious element at the beginning, set to zero, and shift all the partial sums one position to the right.
3. Check all the possible subsequence, from 3 to half the size of the input sequence. It won't make sense to check for subsequences sized less than three, since it is not possible to conceive a to have a local maximum in this case - see comments for more details.
4. We want all-equal blocks, so we accept only sizes that are divisor of the input sequence size.
5. Check all the subsequence to ensure they contain at least a peak.
6. If the current block size is good, return the number of them in the input sequence.
7. We couldn't find any way of splitting the sequence, return 1 if there is at least a peak, 0 otherwise.

Go to the full post

Codility MinPerimeterRectangle

Given an integer that represents the area of a rectangle, return the minimal perimeter of a rectangle having that area and integer sides.

You could find and submit your solution to this problem in the Codility Prime and composite numbers section.

It is easy to see that if a rectangle has area 30, its integers sides could only be (1, 30), (2, 15), (3, 10), and (5, 6). The latter giving the minimal perimeter.

My original C++ solution was clumsier than the one you are seeing now. I was checking the input for squares and worse, I started looping from the wrong side. Thanks to the anonymous suggestion below, I have cleaned up my code in this way:
int solution(int input)
{
    assert(input > 0 && input < 1000000001); // 1

    for(int i = static_cast<int>(std::sqrt(input)); i > 0; --i) // 2
    {
        if(input % i == 0) // 3
            return 2 * ((input / i) + i);
    }

    return 0; // 4
}
1. Enforce the problem requirements in the code through an assert.
2. Remembering that among all rectangles having the same area, the square is the one with minimal perimeter, I should start checking from the closest to the optimum down to the minimal rectangle.
3. If the current candidate divides the area with no remainder, it is the best solution. Calculate the perimeter and return it.
4. We should never reach this line. However usually the C++ compilers complain for a missing return value if they don't see it.

Go to the full post

Codility CountFactors

Return the number of factors for a given positive integer.

You can find this problem in the Codility Prime and composite numbers section. In the pdf associated to this block of exercises it is explained why we shouldn't check all the possible factors, but we could stop at the square root of the examined number.

Still, the solution there is written in Python. If you want to use a strong typed language, you'd better pay attention to large int's, because overflow is a risk we should directly take care of.

That's the reason why I added to the test case provided with the problem also an explicit check on the biggest integer that could be passed in input:
TEST(CountFactors, Given)
{
    ASSERT_EQ(8, solution(24));
}

TEST(CountFactors, Simple)
{
    EXPECT_EQ(1, solution(1));
    EXPECT_EQ(2, solution(2));
    EXPECT_EQ(2, solution(3));
    EXPECT_EQ(3, solution(4));
    EXPECT_EQ(2, solution(5));
}

TEST(CountFactors, Max)
{
    ASSERT_EQ(2, solution(std::numeric_limits<int>::max()));
}
Here is my C++ solution. Since this is a problem almost completely based on numbers properties, it is very simple to port this piece of code to other languages:
int solution(int input)
{
    assert(input > 0);
    if(input == 1) // 1
        return 1;

    int result = 2; // 2
    unsigned i = 2; // 3
    for(; i * i < static_cast<unsigned>(input); ++i)
        if(input % i == 0)
            result += 2;
    if(i * i == static_cast<unsigned>(input))
        ++result;

    return result;
}
1. Trivial case.
2. The algorithm is designed to be symmetric, any factor we find has its own double that has to be considered. Here we are considering 1, that implies also input itself.
3. Let's loop on all the other possible factors, starting from 2, up to the root of the input value. As I hinted above, I should pay attention to possible overflow, this is why I am using unsigned int instead of plain int.

Go to the full post

Codility MaxSliceSum

Check the passed integer vector for the maximum sum of any non empty possible slice in it. You should expect also possible negative results.

This is a slight variation on the well known Kadane's algorithm, that you can find in the Maximum slice Codility section.

To stress the difference against the standard behavior, I have written a couple of test case that I added to the one provided in the problem definition itself:
TEST(MaxSliceSum, Given)
{
    std::vector<int> input { 3, 2, -6, 4, 0 }; // 1
    ASSERT_EQ(5, solution(input));
}

TEST(MaxSliceSum, Minimal)
{
    std::vector<int> input { -1000000 }; // 2
    ASSERT_EQ(-1000000, solution(input));
}

TEST(MaxSliceSum, Minimal2)
{
    std::vector<int> input { -1000000, -1000000 };
    ASSERT_EQ(-1000000, solution(input));
}
1. As in the normal case, the winning slice is the one including the first two elements.
2. At least an element should be considered. Here the input collection contains just an element, so the winning slice contains it.

And here is my C++11 solution:
int solution(std::vector<int>& input)
{
    assert(!input.empty()); // 1

    int result = input.front(); // 2
    int candidate = input.front();
    for(auto it = input.begin() + 1; it != input.end(); ++it) // 3
    {
        candidate = std::max(*it, candidate + *it); // 4
        result = std::max(result, candidate);
    }

    return result;
}
1. Not required by codility, just to stress the fact that it is assumed the input collection is not empty.
2. The first candidate is a slice that include only the first element.
3. Let's loop on all the other elements.
4. As for the standard Kadane's algorithm, but here the new candidate comes from the comparison against the current element and the sum between it and the previously candidate.

Go to the full post

Codility MaxDoubleSliceSum (Kadane)

Given a vector of integers, return the maximal sum of any double slice.

See the previous post for details and test cases. There I solved the problem using an intuitive but performance-wise terrible algorithm. Here I give a linear solution based on the Kadane's algorithm.

Codility itself gives us a big hint placing this problem in its Maximum slice section. The question is, how to adapt the Kadane's algorithm here, where we have an hole in the interval?

The solution requires us to change perspective. Instead of looking at the central element as an enemy, think to it as the pivot of the solution. This leads almost automatically to a clean specular solution, where we have two sub-intervals that go from one limit of the vector to the central pivot.

Looking at the code that I have written should make things clearer:
int solution(std::vector<int>& input)
{
    if(input.size() < 4)
        return 0;

    std::vector<int> left; // 1
    left.reserve(input.size() - 2);
    left.push_back(0); // 2
    for(auto it = input.begin() + 1; it < input.end() - 2; ++it) // 3
    {
        left.push_back(std::max(0, left.back() + *it));
    }

    std::vector<int> right; // 4
    right.reserve(input.size() - 2);
    right.push_back(0);
    for(auto it = input.rbegin() + 1; it < input.rend() - 2; ++it)
    {
        right.push_back(std::max(0, right.back() + *it));
    }

    int result = 0; // 5
    auto ir = right.rbegin();
    for(auto il = left.begin(); il != left.end(); ++il, ++ir)
    {
        int tentative = *il + *ir;
        result = std::max(result, tentative);
    }
    return result;
}
1. In the left vector I am going to store all the maximum subarray values from the first interesting value up to the last one. Each value represent the best result for the left slice, up to the current pivot position.
2. If the pivot is in the first valid position, there wouldn't be anything to its left, so the first element in the left vector would be a zero.
3. All the other left elements are calculated following the Kadane's algorithm.
4. The right vector is exactly the same of the left one, just mirrored through the pivot. If the pivot was in the most rightwise position, there would be nothing in the right slice, and a zero in the right vector. Than, it is just applying Kadane moving from right to left on the input collection.
5. Now we have to combine the two partial results to get the expected solution. The vector left contains the maximum subarray values up to the pivot, the right one down to it. So, what we need to do is adding the first element in left with the last element in right, and move step by step to check all the possible solutions.

Go to the full post

Codility MaxDoubleSliceSum (naive)

Given a vector of integers, return the maximal sum of any double slice.

What's a double slice is defined in this Codility problem, that you can find in their Maximum slice section, dedicated to exercises that could be solved by the Kadane's algorithm.

In a few words, a double slice is defined by an ordered triplet of indexes of a vector. The elements we have to consider are the ones strictly contained in the sub-intervals. A few examples in form of C++11 GoogleTest test cases should clarify the concept:
TEST(MaxDoubleSliceSum, Empty)
{
    std::vector<int> input { 3, 2, 6 }; // 1
    ASSERT_EQ(0, solution(input));
}

TEST(MaxDoubleSliceSum, Minimal)
{
    std::vector<int> input { 3, 2, 6, 1 }; // 2
    ASSERT_EQ(6, solution(input));
}

TEST(MaxDoubleSliceSum, Given)
{
    std::vector<int> input { 3, 2, 6, -1, 4, 5, -1, 2 }; // 3
    ASSERT_EQ(17, solution(input));
}
1. We can think only of one triplet for this test case, that would contain no element.
2. Two possible interesting triples: (0, 1, 3) and (0, 2, 3). The first one has an empty left sub-interval and a right one containing just the element '6'. The second one has only the left sub-interval non empty, with the element '2' inside. The maximal sum is given by the first case.
3. This is the test case provided by codility. The triplet (0, 3, 6) leads to the best solution.

A naive solution

In this post I provide a natural implementation that ends up to be far too costly to be acceptable on a reasonably large input. I simply check all the possible triples elements:
int simpleSolution(std::vector<int>& input)
{
    if(input.size() < 4) // 1
        return 0;

    int result = 0;
    for(unsigned beg = 1; beg < input.size() - 2; ++beg) // 2
    {
        for(unsigned end = beg + 2; end < input.size(); ++end) // 3
        {
            int max = std::accumulate(input.begin() + beg, input.begin() + end, 0); // 4
            int tentative = 0;
            for(unsigned i = beg; i < end; ++i) // 5
            {
                tentative = max - input[i];
                result = std::max(result, tentative); // 6
            }
        }
    }

    return result;
}
1. As we have seen in the first test case, if container has less than four elements, there is no interesting double slice.
2. The first element in a triplet could be the second element in the vector. Go from this element up to the last useful one.
3. The possible "end" element (last + 1) for a triplet is in this range.
4. Sum all the element in the beg..end interval.
5. Tentatively remove a single element in the interval, to represent the median element in the triplet.
6. If a better possible solution is found, save it.

This solution works fine against the test cased proposed above. But think what would happen in on a larger input:
TEST(MaxDoubleSliceSum, Hundreds)
{
    std::vector<int> input(601);
    std::iota(input.begin(), input.end(), -300);
    ASSERT_EQ(44850, solution(input));
}
A few hundred elements should be enough to perceive a substantial performance degradation.

Removing the most internal loop (5) doesn't seem complicated, still it won't help much. We still have a O(n**2) algorithm to deal with.

We need to take a completely different way. And Kadane is there to pointing us in the right direction. See next post for details.

Go to the full post

Codility MaxProfit

A vector of integers represents the price evolution of a stock share in a period. We should return the highest profit we could had buying and then selling that share.

If you have read the post about the maximum slice problem, you should see how this Codility problem could be solved in linear time, using what is commonly known as Kadane's algorithm.

Here I implemented the given test case in C++11 for GoogleTest:
TEST(MaxProfit, Given)
{
    std::vector<int> input { 23171, 21011, 21123, 21366, 21013, 21367 };
    ASSERT_EQ(356, solution(input));
}
Buying on day 0 and selling on day 1 would lead to a loss, while 1 -> 2 would give a gain of 21123 - 21011 = 112.
The best result that we could get is 21367 − 21013 = 354.

Following the max slice algorithm, I have implemented a solution in this way:
int solution(std::vector<int>& input)
{
    if(input.size() < 2) // 1
        return 0;

    int gain = 0;
    int buy = input.front();

    for(auto it = input.begin() + 1; it != input.end(); ++it) // 2
    {
        int sell = std::max(0, *it - buy); // 3
        gain = std::max(sell, gain); // 4
        buy = std::min(buy, *it); // 5
    }

    return gain;
}
1. Trivial cases, I can't get any real gain.
2. Loop on all the input elements, starting from the second one.
3. It is worthy to sell today only if we should get a positive outcome.
4. If selling today leads to the current best result, store this gain.
5. This is the key point of the algorithm. If the current stock price is lower that the previously stored one, we could get rid of that one and start using the new one.

Go to the full post