Heapsort

Heapsort is an in-place sorting algorithm, like insertion sort, that asympthotically scores a nice O(N lg N) time complexity, like merge sort.

It makes use of the heap data structure, that is a normal array seen as a nearly complete binary tree, in its max-heap flavor. Meaning that its biggest value is placed in the first element of the array (considered as the root of the tree).

Implementing heapsort in C++ is pretty trivial, since it just a matter of calling two STL algorithm functions:
#include <vector>
#include <algorithm>

void heapsort(std::vector<int>& data)
{
  std::make_heap(data.begin(), data.end()); // 1
  std::sort_heap(data.begin(), data.end()); // 2
}
1. This make_heap() call rearranges the passed elements as a max-heap.
2. This sort_heap() call assumes that the passed sequence is a max-heap and sort it in ascending order.

But let's have some fun reimplementing by hand these two functions:
typedef std::vector<int> Vector;

void heapsort(Vector& data)
{
  buildMaxHeap(data);
  sortHeap(data);
}
We'll need a way to navigate down the binary heap:
unsigned childLeft(unsigned i) { return (2 * i) + 1; }
unsigned childRight(unsigned i) { return (2 * i) + 2; }
The root element is at index 0. Its children are on 1 and 2.
The left child of the root (index 1) has its own children on 3 and 4; its sibling (index 2) on 5 and 6.
We can get the index of the children of a generic node in a binary heap just multiplying its index by two and adding 1 (for the left one) or 2 (for the right one).
And we'll need a function to ensure that a node in the data structure is complying to the binary max-heap requisite (it should be bigger than its children):
void maxHeapify(Vector& data, unsigned i, unsigned len) // 1
{
  unsigned left = childLeft(i);
  unsigned right = childRight(i);

  unsigned largest = (left < len && (data[left] > data[i])) ? left : i;
  if(right < len && (data[right] > data[largest]))
    largest = right;

  if(largest != i) // 2
  {
    std::swap(data[i], data[largest]); // 3
    maxHeapify(data, largest, len); // 4
  }
}
1. We pass to the function the data collection, the index of the element that we are checking, and the number of element in the heap.
2. We have compared the current node value against the ones of its left and right children. If the largest one is a children, the rule of the heap is currently violated. We need to rearrange the nodes.
3. Firstly, we need to swap the nodes so that the largest one is above the other ones.
4. Then, we need to ensure that the swapping has not corrupted the max-heap structure.

We are finally ready to implement the two main functions:
void buildMaxHeap(Vector& data) // 1
{
  for(int i = data.size() / 2; i >= 0; --i) // 2
    maxHeapify(data, i, data.size());
}

void sortHeap(Vector& heap) // 3
{
  for(int i = heap.size() - 1; i > 0; --i) // 4
  {
    std::swap(heap[0], heap[i]); // 5
    maxHeapify(heap, 0, i); // 6
  }
}
1. Given an arbitrary collection of values, convert it to a max-heap.
2. Start from the bottom, up to the root.
3. We assume that the passed data respect the max-heap constrains.
4. We scan the heap starting from the rightmost element up to the second one.
5. We know that the heap root is the biggest element in the collection, so we swap it to the rightmost position.
6. Before starting a new iteration, we ensure that the data collection (except the one we have already sorted) is still a max-heap.

Full C++ source code and a couple of test cases for Google test on github.

No comments:

Post a Comment