Pages

Sorted vector instead of map

A map/multimap is a perfect container if you plan to perform on it a random mix of insertions, lookups, erasures.

If there is a pattern in its usage (a setup where data are inserted, a lookup where data are retrieved, a reorganization when the data is changed, ...) it could be worthy using a less fancy but more efficient sorted vector.

The idea is setting the data, sorting the vector, and then looking up using the standard searching functions for sorted containers. Then maybe change the data, sort again, and go on again with searching phase.

As we are about to see, the main difference to the sorted vector for set usage is in the way to compare values to the elements in the vector, that requires some caution.

After defining a synonim to the pair that we are going to use, we write a functor that we are going to use for comparisons. We actually have to write three overloads for operator(), since we have to manage comparisons among pairs, and pair against key both as first or second argument:

typedef std::pair<std::string, int> Data;

class DataCompare
{
private:
bool keyLess(const Data::first_type& k1, const Data::first_type& k2) const
{
return k1 < k2;
}

public:
bool operator()(const Data& lhs, const Data& rhs)
{
return keyLess(lhs.first, rhs.first);
}

bool operator()(const Data& lhs, const Data::first_type& k) const
{
return keyLess(lhs.first, k);
}

bool operator()(const Data::first_type& k, const Data& rhs) const
{
return keyLess(k, rhs.first);
}
};

Now we define an overload for the "put to" operator on ostream - this is for enabling us to use the utility dump() function previously described:

std::ostream& operator<<(std::ostream& s, const Data& p)
{
return s << '(' << p.first << ", " << p.second << ')';
}

So, we are ready to work with our vector that is about to emulate a map. We put some data in it, converting an int to a std::string, and we shuffle the data to simulate a random data creation:

std::stringstream ss;

std::vector<Data> vd;
vd.reserve(10);
for(int i = 0; i < 10; ++i)
{
ss << i;
Data d(ss.str(), i * i);
ss.str("");

vd.push_back(d);
}

std::random_shuffle(vd.begin(), vd.end());
dump(vd);

Before working on this vector, we have to sort it:

std::sort(vd.begin(), vd.end());
dump(vd);

And now we are ready to search data on it:

std::string target = "5";
if(std::binary_search(vd.begin(), vd.end(), target, DataCompare()))
std::cout << "binary_search found " << target << std::endl;

Let's use now our vector as a multimap emulation, adding some duplicated data. Before using it we should remember to sort again the vector:

vd.reserve(20);
for(int i = 0; i < 10; ++i)
{
ss << i;
Data d(ss.str(), i * i);
ss.str("");

vd.push_back(d);
}

std::sort(vd.begin(), vd.end());
dump(vd);

We are ready to search. First thing we typedef the constant iterator on the vector, to save some typing and keep the code readable, than we search through lower_bound(), paying attention to the comparison clause, and equal_range(), using a pair of iterators on the vector to check the result:

typedef std::vector<Data>::const_iterator CIT;
CIT it = std::lower_bound(vd.begin(), vd.end(), target, DataCompare());
if(it != vd.end() && !(it->first < target))
std::cout << "lower_bound found: " << *it << std::endl;

std::pair<CIT, CIT> range = std::equal_range(vd.begin(), vd.end(), target, DataCompare());
if(range.first != range.second)
std::cout << "equal_range found: " << *(range.first) << std::endl;

Many more details on sorted vectors in Item 23 of Effective STL, part of the Effective C++ book series by Scott Meyers.

No comments:

Post a Comment