Commuting Engineer CodeEval problem

We have the coordinates of a bunch of places that we want to visit. Being nerds, we are not happy if we don't generate an algorithm to determine our path. But, the nature of the problem is such that we are ready to deal with a loose approximation to a good solution. Frankly speaking, we are aiming to get what could look, at least at first sight, as a not-so-bad solution.

You can get a more detailed description of the problem on the CodeEval blog. There you can even enter your solution that would be used (at least at the time when I am writing this post) as a first screening for some job interview.

You can't use the solution I am proposing here for a few good reason. Firstly, it is not a good idea to use a piece of code written by someone else to represent you. Secondly, I have written and tested my code for C++11 on GCC 4.8, that is a bit too modern stuff for the current CodeEval requirements. And thirdly, come on, you don't want to give away a good chance of having some fun writing your own solution.

As I said, I am not aiming to the best possible solution, and this is not caused by sloppiness, there are good reasons for that. Let's see a couple of them.

NP-hard

If you have some knowledge of theory of computation, you should have recognized the problem as an instance of the well known Traveling Salesman Problem, commonly called just TSP. It is an interesting problem because it can be stated in a few words, it looks very easy indeed, but it comes out to be an NP-hard (Non-deterministic Polynomial-time hard) one. That means, forget about to come out with an elegant solution.

In the real life, what I would do if I had to solve a problem like that, is checking for a library provinding some adequate algorithm. For instance, you could have a look to BGL, the Boost Graph Library.

But here we can't use external libraries, we have to rely just on the standard ones. So, what I'll do, is implementing a greedy algorithm, choosing any time the local best solution. It is easy to show as such a strategy is heavily flawed, but at least it assure us we get a solution that is not the worst one, and it does it in a reasonable time.

Geography

As you should know, Earth is not flat. We usually think to it like a sort of sphere, but also this one is nothing more than a weak approximation. This implies that we shouldn't consider the coordinates of a place as they were on a two-dimensional surface.

Besides, we are going to check the distance as if we could go in a straight line from one point to the other, and this is usually not the case.

Splitting the problem

I have reduced the original problem to something that I can actually solve with a relatively simple piece of code. Now I split it in a few simpler problems, for each of them I could write a function that solve it.

Parsing the input

Our input is a number of strings, each of them like this one:
1 | CodeEval 1355 Market St, SF (37.7768016, -122.4169151)
We are interested in its ordinal number, and in the place longitude and latitude.

I decided to organize my data using STL pair's and a vector, and I gave them names that hopefully help to understand better what going on in the code:
using Position = std::pair<double, double>;
using Location = std::pair<int, Position>;
using Locations = std::vector<Location>;
Given that, what I want is to extract a Location from each input string, that is going to be pushed in a Locations container. This is the declaration of the function I am thinking of:
Location parse(const std::string& input);
This test case (written for Google Test) shows how I expect it to behave:
TEST(CommEng, Parse1)
{
    std::string input("1 | CodeEval 1355 Market St, SF (37.7768016, -122.4169151)");
    Location loc = parse(input);

    EXPECT_EQ(1, loc.first);
    EXPECT_DOUBLE_EQ(37.7768016, loc.second.first);
    EXPECT_DOUBLE_EQ(-122.4169151, loc.second.second);
}
Notice I use the EXPECT_DOUBLE_EQ() gtest macro to check the actual value extracted from the input string. This is to avoid, or at least reducing, rounding problems. What I basically do using this macro is delegating to GoogleTest the job of choosing an appropriate epsilon that determines when the two compared values are considered about equal.

Here is my function implementation:
Location parse(const std::string& input)
{
    int nr = std::stoi(input); // 1

    std::string::size_type bracket = input.find('('); // 2
    if(bracket == std::string::npos)
        return {}; // 3

    std::string::size_type comma = input.find(',', bracket);
    if(comma == std::string::npos)
        return {};

    double lon = std::stod(input.substr(bracket + 1)); // 4
    double lat = std::stod(input.substr(comma + 1));
    return { nr, {lon, lat}}; // 5
}
1. stoi() is the standard C++11 function similar to old atoi() but having as input parameter an STL-string and not a C-string. Here I am extracting the place ordinal number, that is expected to be right at the beginning of the string. Real code should be more robust, and be ready to (probably) throw an exception.
2. Just a minimal error checking, if no open bracket and following comma is found, an empty Location is returned.
3. Maybe is worthy to remember that this C++11 notation means "call the default ctor for the expected object". So, what I am doing here is building an "empty" Location.
4. Extract the input substring from the expected position (again, more error handling required in production code), than convert it to double using the C++11 stod() function.
5. Construct a Location object and return it to the caller.

Approximated distance

Just apply the pythagorean theorem to calculate the distance between to positions. For what I have said above, the result should be considered just an approximation:
double distance(const Position& beg, const Position& end)
{
    return std::sqrt(std::pow(beg.first - end.first, 2) + pow(beg.second - end.second, 2));
}
The closest point

My greedy algorithm needs to identify the closest Location to a specific Position:
Locations::iterator findClosest(Position& beg, Locations& others)
{
    double closest = std::numeric_limits<double>::max(); // 1
    Locations::iterator pos = others.end();
    for(Locations::iterator it = others.begin(); it != others.end(); ++it) // 2
    {
        double current = distance(beg, it->second); // 3
        if(current < closest)
        {
            closest = current;
            pos = it;
        }
    }

    return pos; // 4
}
1. Initially I set as a solution as "nothing sensible", so the closest distance found is set the the biggest double number available, and the found position to invalid - the end() iterator.
2. Loop on all the available other points.
3. Calculate the current distance, if I found a good candidate, mark it as such, and check if there is anything better.
4. Return the iterator to the best solution I found.

Generating a path

The core of my algorithm is a function that gets in input a container of Locations and gives back a vector containing the path, where each step is identified by the Location descriptor.
std::vector<int> getPath(const Locations& input)
{
    if(input.empty()) // 1
        return {};

    Locations locations(input); // 2
    std::vector<int> results;
    Locations::iterator current = locations.begin(); // 3
    do { // 4
        Position curPos = current->second; // 5
        results.push_back(current->first);
        locations.erase(current);

        current = findClosest(curPos, locations); // 6

    } while(!locations.empty());

    return results;
}
1. Trivial case, nothing in input, nothing in output.
2. Create an input local copy, since I am about to modify it.
3. As for requirements, the path should start from the first element in the provided list.
4. Loop until all the input is consumed.
5. Erase the current position from the list of Locations that I haven't visited yet, but before that, push its descriptor in the output vector.
6. Find the next "current" element.

Full C++11 source code, and some more test cases, on github.

3 comments:

  1. I wonder, did this pass the codeeval submission. Because I would suspect that they are expecting an exact sol'n and not an approximation.

    ReplyDelete
    Replies
    1. Hey Mayo. This was my first CodeEval problem. I was so crossed when I found out that C++11 was not supported that I couldn't find the push to backport it to C++98. I should try it now, since they updated their compiler.
      I don't think they want an exact solution, but sometimes it is not easy to get properly their requisites. It would be nice if they added a Q&A section to their site, so that we could ask clarifications.

      Delete
    2. I just found out that it won't pass if you assume an Euclidian plane or (probably) use a greedy algorithm.

      Delete