Showing posts with label libcurl. Show all posts
Showing posts with label libcurl. Show all posts

CURLOPT_WRITEFUNCTION and C++

Even though I am developing for C++, I don't use the curlpp wrapper to libcurl, the well known file transfer library, preferring to access its bare C interface. I feel more comfortable in this way, still there are a few low level details that require to be explicitly considered. For instance, the CURLOPT_WRITEFUNCTION option accepts as parameter only addresses to free functions (or static member functions). When we want to use a (non static) member function, we have to be prepared to deal with a certain amount of ugliness.

I'd like Curl to put the data it fetches (for more details, please have a look to the previous post where I talked about the plain Curl setup) in a (non static) member variable of the same class from which curl_easy_perform() is called. Something like that:
class CurledClass
{
  // ...

private:
  std::string data_; // 1

  CURLcode curling(/* ... */)
  {
    CURL* curl = curl_easy_init();

    // ...

    CURLcode code = curl_easy_perform(curl); // 2
    curl_easy_cleanup(curl);

    return code;
  }
};
1. I want to store the answer I get from Curl in this data member.
2. Calling Curl to perform the job.

As I said before, we can't pass a pointer to a non-static member function to CURLOPT_WRITEFUNCTION, but we pass a pointer to a static function, but we can ask Curl to pass an extra parameter to that function. We can let that parameter, CURLOPT_WRITEDATA, to be the pointer to this object, so that our static function could actually call a non-static member function:
CURL* curl = curl_easy_init();

// ...
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &CurledClass::write); // 1
curl_easy_setopt(curl, CURLOPT_WRITEDATA, this); // 2
// ...

data_.clear(); // 3
// ...
curl_easy_perform(curl);
1. I specify the static member function I want to be called.
2. I put in the CURLOPT_WRITEDATA option the pointer to the current CurledClass object
3. It is usually a good idea to cleanup the buffer before using it.

The static function specified as CURLOPT_WRITEFUNCTION now just acts like an adapter to the member function I want to use instead:
class CurledClass
{
  // ...
  static size_t write(void* buf, size_t size, size_t nr, void* self) // 1
  {
    return static_cast<LbsTest*>(self)->mWrite(static_cast<char*>(buf), size, nr); // 2
  }

  size_t mWrite(char* buf, size_t size, size_t nr) // 3
  {
    data_.append(buf, size * nr); // 4
    return size * nr;
  }
};
1. Notice the last parameter, that I called "self". It stores the value we have put to CURLOPT_WRITEDATA.
2. The static function write() simply calls the member function mWrite(). To do that it (unsafely) casts the self parameter to this, and the first parameter to a pointer to char..
3. Here is the member function that does the job of copying the data from the Curl buffer to the local one.
4. Remember that Curl could call many times the CURLOPT_WRITEFUNCTION, and we are expected to splice the passed data to build the actual result. I am using a standard STL string as buffer, so I just append the new data as I get it.

Go to the full post

Hello libcurl world

Reading a resource on the Internet through http via libcurl is very easy. Here I get the google news feed and I output it to the standard console.

Getting curl

There a number of ways to get the libcurl on your machine, accordingly to your development platform. You would probably want to check the official documentation on curl.haxx.se to get the right solution for you.

In my case, developing on a Ubuntu box for GCC C++, I could rely on the Debian repository to install the curl 4 development package for GnuTLS, simplifying a bit this step:
sudo apt-get install libcurl4-gnutls-dev
After that, I can see a new usr/include/curl directory containing all the .h files I need. There is also a libcurl.so, that I need to link to my C++ project. In my case, I find it in the /usr/lib/x86_64-linux-gnu directory (I have a 64 bit distribution).

Libcurl is written in C-language, and exposes an API in C. There is C++ wrapper, curlpp, that aims to provide a simpler access to them. I am not especially fond of it, what I usually do instead is creating my own thin C++ layer around the C-API. In any case, here I show the bare C access to the library.

Initialization

Before using the curl functionality in the code, we have to initialize the library. And, symmetrically, we should also cleanup when we are done. So, typically, we'll have something like:
#include <curl/curl.h>
#include <iostream>

// ...

CURLcode result = curl_global_init(CURL_GLOBAL_NOTHING); // 1
if(result != CURLE_OK) // 2
{
  std::cout << "Curl initialization failure" << result << std::endl;
  return result;
}

// ...

curl_global_cleanup(); // 3
1. There are a few option we could pass to the curl global initialization routine. Here I am happy with the plain vanilla setup, so I pass a nothing (more than the usual stuff) flag.
2. In case of error, we can't use curl.
3. We are done with curl.

CURL object

Once we have the curl library correctly initialized, we ask to it a CURL handler on which we will perform our request. Again, when we are done with it, we should cleanup.

The usual way to perform a call with curl would follow this pattern:
CURL* curl = curl_easy_init(); // 1
if(!curl)
  return CURLE_FAILED_INIT; // 2

// ...

CURLcode code = curl_easy_perform(curl); // 3
curl_easy_cleanup(curl); // 4

return code; // 5
1. The curl library exposes two interfaces. The classic one, and the easy one. Let's keep it simple.
2. If curl-easy can't provide us a CURL handler, we can't do anything more than returning an error code.
3. After setting up our request, we ask curl-easy to perform it. We get back a status code.
4. We are done with the session, clean it up.
5. The user could be interested in knowing what happened to his job. For instance, if the resource can't be fetched because a timeout occurred, a 28, defined as CURLE_OPERATION_TIMEDOUT in curl.h, is returned.

Setting options on CURL

We tell to libcurl what we want to do setting options on the CURL object we get back from the easy_init function. An example clarifies the matter:
curl_easy_setopt(curl, CURLOPT_URL, url.c_str()); // 1
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &curlWriter); // 2
curl_easy_setopt(curl, CURLOPT_TIMEOUT, timeout); // 3
if(follow)
  curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); // 4
1. The URL I want it to access. I had stored it in a STL string, so I have to extract a C-string from it.
2. The address of a function that it has to use to give my back what it gets calling that URL. Be patient, this callback function is showed below.
3. I am very impatient. I want the call to fail if libcurl is not able to get result in a specific time. Be aware that the timeout is specified in seconds.
4. If the resource I request has changed is URL, libcurl could find just a redirect instruction there. Do we want to follow it or not? Setting this option to one (as long) means that, yes, we want to follow it.

Write function

As promised, here is the callback function I passed to libcurl to let it give me back what it got:
size_t curlWriter(void* buf, size_t size, size_t nmemb) // 1
{
  if(std::cout.write(static_cast<char*>(buf), size * nmemb)) // 2
    return size * nmemb; // 3
  return 0;
}
1. The minimal interface for the callback function used by libcurl expects three parameters, buf is the pointer to the block of memory where it put a chunk of the answer, size is the block size in which that chunk is organized, and nmemb is the number of members we have there.
2. I just get that block of memory, and send it to the standard output console, assuming it is text resource.
3. Here I am saying to libcurl that I have correctly managed all the stuff it passed me, please send me some more - if you have it.

Full source code on github.

Go to the full post