Pages

Matching bracket

Think of a string (it could be very long) composed exclusively by round brackets (aka parentheses). We want to check if it is correct, in the sense that each open bracket should have a matching close one. So, a string starting with a ")" is considered wrong, while the "()()()" string is a good one. Nested brackets are allowed, so "((()())(()()))" is a good sequence.

It is not difficult to think to a solution, still, you should not forget that unharmful looking hint on the possible large input size.

I bumped into this problem when having a coffee-break with colleagues. We were talking about interview questions, and someone remembered this one. In a few minutes we envisioned a couple of possible solutions, neither of them looked satisfactory. After a bit more of talking, and we found out a third candidate, that looked quite good.

First attempt

Naively, we could iteratively remove each "()" from the input string, until we could not find anymore such pattern. After that, we simply check if the string is empty. The good thing about this solution is that is easy to describe and implement. Less convincing is that we should copy the input to a buffer, and then perform on it a lot of splitting and splicing of substrings.

Let's see a possible C++ implementation of this idea:
bool simpleChecker(const std::string& input)
{
  std::string buffer(input);

  if(buffer.empty()) // 1
    return true;

  if(buffer.length() % 2) // 2
    return false;

  while(true)
  {
    std::string::size_type pos = buffer.find("()"); // 3
    if(pos == std::string::npos)
      break;

    buffer.replace(pos, 2, ""); // 4
  }

  return buffer.empty(); // 5
}
1. The requisites do not say if an empty string should be considered valid or not. Let's assume it should.
2. If we have an odd number of elements, the string is obviously invalid.
3. We stop looping if we can't find a "()" element.
4. When we find a "()", we replace it with an empty string (i.e., we remove it).
5. If the string is empty, the validation succeeded.

This code works fine, even for moderately large strings. In the best case, a monotonous sequence of "()()()()", the standard string replace() function is so nicely written that on my machine it takes a handful of millisecs to check an input of thousands characters. On the other side, it is easy to spot a sequence that gives troubles to replace(). The test case here below takes me something around one second on a 50K string:
TEST(CPPSimple, FiftyKWorstCase)
{
  const int SIZE = 50 * 1024;

  char buffer[SIZE + 1];
  for(int i = 0; i < SIZE / 2; ++i)
    buffer[i] = '(';
  for(int i = SIZE / 2; i < SIZE; ++i)
    buffer[i] = ')';
  buffer[SIZE] = '\0';

  EXPECT_TRUE(simpleChecker(buffer));
}
And his bigger brother, 250K sized, takes something like half a minute.

Recursive approach

The main issue of the first try is that it modifies the string. A better approach would be checking the string for each matching parenthesis, without doing any change. We read the first character in the passed string, it should be an '(', if the next one is a ')', cool, we have completed a sequence, we can move to the third character. Otherwise, we call recursively the same function, to extract a subsequence. It should return the length of the found sequence, or zero, if the subsequence is not valid. The caller would check the returned value, and move in the string accordingly, to see if the next character closes its sequence, or if it has to call again itself to check if we have another subsequence.

As you can see, this solution is more contrived, but promises to be more performant. Still, it has a big issue. In the same worst case as seen above, we have such a huge number of recursive function calls that it could easily lead to corruption in the memory stack.

Just count them

Yes, just count the brackets. After all, what we want is having the same number of open and close parenthesis.

Actually, we have also to take care that we don't want to see a close bracket when there is nothing to close, but it is easy to take care of it. Each time we have an open bracket, we put it on a stack, and we pop out one of them as we get a close bracket. If we get a closing bracket before any opening one has been scanned, the stack is empty, and so we know the string is not correct.

Here is how I have implemented this solution:
bool stackChecker(const char* input)
{
  int tracker = 0; // 1

  for(int i = 0; input[i] != '\0'; ++i)
  {
    switch(input[i])
    {
    case '(': // 2
      ++tracker;
      break;
    case ')': // 3
      if(tracker)
        --tracker;
      else
        return false;
    } // 4
  }

  return tracker ? false : true; // 5
}
1. I use this integer as the stack I talked above. Since I am not interested in the actual position of the brackets, I don't need a real stack, I just need to keep track of how many open brackets have been read.
2. I see an open bracket in the string, put it in the stack, and go to the next character.
3. Close bracket. If the stack is not empty, it matches with the last open bracket in the stack, I pop it and I am ready to check the next one. If the stack is empty, the input string is not valid.
4. What if the string contains anything else? I assumed a "don't care" behavior, but possibly the string could be considered invalid. If that is the case, a default section should be added to the switch, to always return false.
5. The complete string has been scanned, if no open bracket is left in the stack, we consider it valid.

No comments:

Post a Comment