Hello libcurl world

Reading a resource on the Internet over HTTP with libcurl is very easy. Here I get the Google news feed and write it to standard output.

Getting curl

There are a number of ways to get libcurl on your machine, according to your development platform. You would probably want to check the official documentation on curl.haxx.se to find the right solution for you.

In my case, developing on an Ubuntu box for GCC C++, I could rely on the Debian repository to install the curl 4 development package for GnuTLS, which simplifies this step a bit:
sudo apt-get install libcurl4-gnutls-dev
After that, I can see a new /usr/include/curl directory containing all the .h files I need. There is also a libcurl.so that I need to link to my C++ project. In my case, I find it in the /usr/lib/x86_64-linux-gnu directory (I have a 64-bit distribution).

Libcurl is written in C and exposes a C API. There is a C++ wrapper, curlpp, that aims to provide simpler access to it. I am not especially fond of it; what I usually do instead is create my own thin C++ layer around the C API. In any case, here I show the bare C access to the library.


Before using the curl functionality in the code, we have to initialize the library. And, symmetrically, we should also clean up when we are done. So, typically, we'll have something like:
#include <curl/curl.h>
#include <iostream>

// ...

CURLcode result = curl_global_init(CURL_GLOBAL_NOTHING); // 1
if(result != CURLE_OK) // 2
{
  std::cout << "Curl initialization failure: " << result << std::endl;
  return result;
}

// ...

curl_global_cleanup(); // 3
1. There are a few options we could pass to the curl global initialization routine. Here I am happy with the plain vanilla setup, so I pass the CURL_GLOBAL_NOTHING flag, which asks for nothing extra.
2. In case of error, we can't use curl.
3. We are done with curl.

CURL object

Once we have the curl library correctly initialized, we ask it for a CURL handle on which we will perform our request. Again, when we are done with it, we should clean up.

The usual way to perform a call with curl would follow this pattern:
CURL* curl = curl_easy_init(); // 1
if(!curl)
  return CURLE_FAILED_INIT; // 2

// ...

CURLcode code = curl_easy_perform(curl); // 3
curl_easy_cleanup(curl); // 4

return code; // 5
1. The curl library exposes two interfaces: the easy one, which performs one blocking transfer at a time, and the multi one, which handles several transfers at once. Let's keep it simple.
2. If curl-easy can't provide us with a CURL handle, we can't do anything more than return an error code.
3. After setting up our request, we ask curl-easy to perform it. We get back a status code.
4. We are done with the session, clean it up.
5. The caller could be interested in knowing what happened to the request. For instance, if the resource can't be fetched because a timeout occurred, a 28, defined as CURLE_OPERATION_TIMEDOUT in curl.h, is returned.

Setting options on CURL

We tell libcurl what we want to do by setting options on the CURL object we get back from the easy_init function. An example clarifies the matter:
curl_easy_setopt(curl, CURLOPT_URL, url.c_str()); // 1
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &curlWriter); // 2
curl_easy_setopt(curl, CURLOPT_TIMEOUT, timeout); // 3
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); // 4
1. The URL I want it to access. I had stored it in an STL string, so I have to extract a C-string from it.
2. The address of the function that it has to use to give me back what it gets by calling that URL. Be patient, this callback function is shown below.
3. I am very impatient. I want the call to fail if libcurl is not able to get a result within a specific time. Be aware that the timeout is specified in seconds and has to be passed as a long.
4. If the resource I request has changed its URL, libcurl could find just a redirect instruction there. Do we want to follow it or not? Setting this option to one (as a long) means that, yes, we want to follow it.

Write function

As promised, here is the callback function I passed to libcurl to let it give me back what it got:
size_t curlWriter(char* buf, size_t size, size_t nmemb, void* /* userdata */) // 1
{
  if(std::cout.write(buf, size * nmemb)) // 2
    return size * nmemb; // 3
  return 0;
}
1. The callback function used by libcurl expects four parameters: buf is the pointer to the block of memory where it put a chunk of the answer, size is the size of each element in that chunk, nmemb is the number of elements we have there, and the last one is a user pointer that can be set through CURLOPT_WRITEDATA, unused here.
2. I just take that block of memory and send it to standard output, assuming it is a text resource.
3. Here I am telling libcurl that I have correctly handled everything it passed me, so please send me some more, if you have it.
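
Printing to the console is fine for a hello world, but often we want the answer in memory. The write callback documented by libcurl receives, as its last argument, a user pointer that is set with the CURLOPT_WRITEDATA option. The sketch below assumes that pointer is the address of a std::string; appendToString is a name chosen for this example only.

```cpp
#include <cstddef>
#include <string>

// Write callback with the four-parameter signature libcurl documents.
// userdata is whatever pointer was registered with CURLOPT_WRITEDATA;
// here it is assumed to point to a std::string.
std::size_t appendToString(char* ptr, std::size_t size, std::size_t nmemb, void* userdata)
{
  std::string* body = static_cast<std::string*>(userdata);
  const std::size_t bytes = size * nmemb;
  try
  {
    body->append(ptr, bytes);
  }
  catch(...)
  {
    // Do not let an exception cross the C boundary: a return value
    // different from the chunk size tells libcurl that the write failed.
    return 0;
  }
  return bytes; // the whole chunk has been consumed
}
```

To use it, I would pass `appendToString` as CURLOPT_WRITEFUNCTION and the address of a std::string as CURLOPT_WRITEDATA before calling curl_easy_perform. After the call, the string holds the whole body.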

Full source code on github.
