Showing posts with label Apache. Show all posts

At least one JAR was scanned for TLDs yet contained no TLDs

As soon as I added a JAR to my Tomcat-based web application, I got this new info log message.
INFO [main] org.apache.jasper.servlet.TldScanner.scanJars
At least one JAR was scanned for TLDs yet contained no TLDs
It makes sense, since it was a JDBC driver, and I would have been very surprised if it contained a TLD.

Let's see how to get rid of it.

Firstly, I read the rest of the logging message
Enable debug logging for this logger for a complete list
of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can
improve startup time and JSP compilation time.
Actually, I already knew the name of the JAR culprit, since I just added it, so I could have skipped directly to the second step. However, I followed the instructions and ...

Tomcat logging properties

I (temporarily) changed the logging.properties for my Tomcat instance, adding this line
org.apache.jasper.servlet.TldScanner.level = FINE

After that, my log file got more interesting, and I focused my attention on this line.
FINE [main] org.apache.jasper.servlet.TldScanner$TldScannerCallback.scan
No TLD files were found in [... x.jar] Consider adding the
JAR to the tomcat.util.scan.StandardJarScanFilter.jarsToSkip
property in CATALINA_BASE/conf/catalina.properties file.
Very clear indeed.

Tomcat Catalina properties

I easily found the tomcat.util.scan.StandardJarScanFilter.jarsToSkip property in the catalina.properties file in my Tomcat configuration folder. It is a longish comma-separated string listing the JARs that are known not to contain TLDs.

I just added the name of my JAR to it, and the case was closed.


Httpd virtual hosts

I wanted to manage a couple of web sites, let's call them one.dd and two.dd, with my Apache web server, and I wanted them to live on the same machine, sharing the same IP address. In real life I could not just pick a fancy name at random, as I have just done; I would have to register a proper domain name, under the well-known limitations. But as long as I play on my local machine(s) only, I can forget about that and be free and foolish. Still, I have to follow a few basic rules.

I am working on a Debian box, on an Apache httpd 2.2 built from source, after downloading the package from the official Apache site. I reckon you can very easily adapt what I have done to your current setup.

Setting the hosts

The operating system should be aware of the names I want to use on the current machine. This is done in a text file, typically (for *x environments) named /etc/hosts. There we see, among the other things, the standard mapping between 127.0.0.1 and localhost, and we are about to extend it to add our two host names:
127.0.0.1 localhost one.dd two.dd

Setting the httpd configuration

Apache has to know how to manage our virtual hosts, too. The standard httpd configuration file, conf/httpd.conf, has a commented line that, when activated, includes the specific configuration file for virtual hosts.
# Virtual hosts
#Include conf/extra/httpd-vhosts.conf
It is usually considered a better idea to leave the provided example alone, and work on a different file.

This is my virtual host configuration file:
# Virtual Hosts
NameVirtualHost *

# ServerName must match the names mapped in /etc/hosts
<VirtualHost *>
    DocumentRoot /site/www/one.dd
    ServerName one.dd
</VirtualHost>

<VirtualHost *>
    DocumentRoot /site/www/two.dd
    ServerName two.dd
</VirtualHost>
I guess this is the simplest configuration file one could conceive.

The NameVirtualHost directive tells Apache that we want to attach one or more name-based virtual hosts to the specified address/port. Here I passed a star to it, meaning "any address this server gets requests on". Usually you want to be more choosy. Besides, I didn't specify any port number. In this case, Apache assumes I expect it to use the one specified in the Listen directive.

Then I have a VirtualHost block for each host I want to define. If a request gets here that does not match any of the specified ServerName values, the first block is used as the default one.
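
To make the "first block is the default" rule more concrete, here is a tiny C++ sketch of the selection logic. It is only a simplified model under my own assumptions: the VirtualHost struct and the select_host() function are hypothetical names, not Apache code, and the real server also considers things like ServerAlias and case-insensitive matching.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of one <VirtualHost> block.
struct VirtualHost {
    std::string server_name;
    std::string document_root;
};

// Returns the host whose ServerName equals the request's Host header,
// falling back to the first configured host when nothing matches.
// Assumes `hosts` is not empty.
const VirtualHost& select_host(const std::vector<VirtualHost>& hosts,
                               const std::string& host_header) {
    for (const VirtualHost& vh : hosts) {
        if (vh.server_name == host_header) {
            return vh;
        }
    }
    return hosts.front();
}
```

With the two blocks defined above, a request whose Host header matches neither ServerName would end up being served from the document root of the first block.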

The DocumentRoot says to Apache which directory to use as root for the site. I have created the specified document root directories, and put in both of them an index.html file.

Looks easy, doesn't it? Still, even at this basic level, there are a few things that could go wrong. And the resulting error messages could look cryptic.

Wrong!

If NameVirtualHost does not match any VirtualHost (a different port number is enough), Apache doesn't know what to do with that directive, and issues a "NameVirtualHost has no VirtualHosts" warning at startup.

I have already noted that if the NameVirtualHost port is not explicitly given, the one specified in the Listen directive is used. But you should make sure to keep the same convention for the associated VirtualHost blocks, too. Otherwise you could get a "VirtualHost mixing * ports and non-* ports with a NameVirtualHost address is not supported, proceeding with undefined results".


Iterating over an Apache apr_table_t

A common data structure that is very useful to have at hand when working with a web server is an associative array where both key and value are strings. If Apache httpd had been developed in C++, they would probably have used an STL unordered_map, but here we are dealing with pure C, so an internal data structure named apr_table_t has been designed expressly for this purpose, with a bunch of associated functions to manage it.

Here I am going to write an example that uses apr_table_do() to loop over all the elements in an Apache table.

What I want to do is write an Apache2 module that generates as output an HTML page listing all the properties in the request header.

If we have a look at apr_tables.h, we'll find this couple of interesting lines:
typedef int (apr_table_do_callback_fn_t)(
    void* rec, const char* key, const char* value);

int apr_table_do(apr_table_do_callback_fn_t* comp,
    void* rec, const apr_table_t* t, ...);
apr_table_do() takes a callback function as its first parameter, then a pointer that is passed through untouched to the callback (here, the module request record), and the Apache table we want to loop over. Finally comes a NULL-terminated list of the keys we are interested in; passing just NULL means we want to go through all the elements.

Here is the function I want to use as callback, a simple output of the current key-value pair:
int print(void* rec, const char* key, const char* value)
{
    request_rec* r = static_cast<request_rec*>(rec); // 1
    ap_rprintf(r, "%s: %s<br />\n", key, value); // 2

    return 1; // 3
}
1. Tiny nuisance: the callback prototype sees the request_rec as a void pointer (to allow more flexibility, I reckon), so we need to cast it back to its original type. I was about to check the cast result, but in the end I decided that was a bit too paranoid for such a basic example.
2. Dump the pair to the HTML response that the module is generating.
3. And finally return a non-zero value, to tell apr_table_do() to keep iterating; returning zero would stop the loop.

In the handler, I'll have something like:
int handler(request_rec* r)
{
    // ...
    apr_table_do(print, r, r->headers_in, NULL);

    // ...
    return OK;
}
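
If you want to play with this callback style without an Apache installation at hand, the following self-contained sketch mimics it. Everything here is my own stand-in: Table, table_do(), collect() and first_only() are hypothetical names, the table is just a vector of pairs, and the return convention (0 when a callback stopped the loop, 1 otherwise) is what I understand apr_table_do() to do.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an APR table: an ordered list of key/value pairs.
using Table = std::vector<std::pair<std::string, std::string>>;

// Same shape as apr_table_do_callback_fn_t: context pointer, key, value.
using Callback = int (*)(void* rec, const char* key, const char* value);

// Calls `fn` for each entry; stops early when the callback returns 0.
// Returns 0 if the iteration was interrupted, 1 otherwise.
int table_do(Callback fn, void* rec, const Table& table) {
    for (const auto& entry : table) {
        if (fn(rec, entry.first.c_str(), entry.second.c_str()) == 0) {
            return 0;
        }
    }
    return 1;
}

// Callback that appends "key: value\n" to the std::string passed as context.
int collect(void* rec, const char* key, const char* value) {
    std::string* out = static_cast<std::string*>(rec);
    *out += std::string(key) + ": " + value + "\n";
    return 1;  // non-zero: keep iterating
}

// Callback that records the first entry and then stops the iteration.
int first_only(void* rec, const char* key, const char* value) {
    collect(rec, key, value);
    return 0;  // zero: stop
}
```

The void pointer plays the same role as the request_rec above: the looping function knows nothing about it, it just hands it back to the callback on each call.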
The full C++ source code is on github. You should compile it, possibly using a Makefile like the one shown in the previous post, and make the resulting shared object available to Apache.

In the httpd configuration file, we should explain to Apache how to map a request to the server to a call for our module, and how to load the module:
<Location /info>
    SetHandler info
</Location>

# ...

LoadModule info_module modules/mod_info.so
What is left to do, to have the new module available, is just to stop and start your Apache server.


Makefile for C++ Apache module

The Apache web server (AKA httpd, or just Apache) is written in C, but this is not a compelling reason for us to write our modules in the same language. And, as you could expect, it is pretty easy to use C++ instead.

Converting the minimal Hello World and the simple example from C to C++ (actually g++ 4.4.5 on Linux Debian for Apache 2.2) took minimal effort.

What I had to do was add an explicit include directive for http_protocol.h, to let the less forgiving C++ compiler properly check the calls to a few functions. Not doing it was leading to these errors:
error: ‘ap_set_content_type’ was not declared in this scope
error: ‘ap_rputs’ was not declared in this scope
error: ‘ap_rprintf’ was not declared in this scope
Besides, I also removed the static specifier from all the local functions, and put them in an unnamed namespace instead.

Finally I wrote this Makefile:
all: mod_hello.so

mod_hello.o : mod_hello.cpp
    g++ -c -I/path/to/apache22/include -fPIC mod_hello.cpp

mod_hello.so : mod_hello.o
    g++ -shared -o mod_hello.so mod_hello.o

clean:
    rm -rf mod_hello.o mod_hello.so
To build the object I called g++ with a few options:
-c because I don't want it to run the linker, its output should be the object file.
-I to specify the apache include directory (put there your actual one).
-fPIC is due to the fact that we are about to create a shared object, so we need g++ to generate position-independent code.

The actual generation of the shared object is accomplished by a second call to g++, this time specifying as options:
-shared to let it know that a shared object is what we want.
-o to specify the output file name.

Remember that in a Makefile each recipe line must start with a TAB, and only a TAB (no spaces at all!), if you don't want to get a puzzling error like this:
Makefile:6: *** missing separator.  Stop.


Simple Apache module

My first Apache module needs to be improved in many ways. Here I am addressing a few basic requirements: answering only one specific request, logging through the Apache built-in facility, and generating an HTML document as answer.

We want to set up our Apache web server so that it answers with an HTML page containing some information on the request actually received, and we want it to give this feedback only when we ask it for "hello".

Request-module connection

Apache should know how to associate a user request with the handler that our module is designed to manage. To do that we add a Location directive in the Apache httpd configuration file.

You'll find this file in the Apache conf directory, named httpd.conf. We open it and add this section:
<Location /hello>
    SetHandler hello
</Location>
We ask Apache to set the handler "hello", so that it is invoked when a request is issued for the hello page directly under the root of my server.

Secondly, we change the C source code, so that we check the passed request, and we ensure the passed handler matches our expectation:
// ...

static const char* MOD_HANDLER = "hello"; // 1

static int handler(request_rec* r)
{
    if(r->handler == NULL || strcmp(r->handler, MOD_HANDLER)) // 2
    {
        // ...
        return DECLINED; // 3
    }

    // ...

    return OK; // 4
}
1. The handler name for this module; its value is used in httpd.conf, as shown above.
2. Check the handler as passed by Apache.
3. If our module has nothing to say here we return DECLINED, to let Apache know that it has to look elsewhere.
4. Otherwise, after we have done our job, we return OK, telling Apache that it can consider the request fulfilled.
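
The accept-or-decline test can be tried out on its own, without the Apache headers. This is just a sketch under my assumptions: check_handler() is a hypothetical helper, and SKETCH_OK and SKETCH_DECLINED are local stand-ins that I set to the values that OK and DECLINED have in httpd.h (0 and -1).

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for the httpd.h constants, so that the sketch compiles
// without the Apache headers.
const int SKETCH_OK = 0;
const int SKETCH_DECLINED = -1;

const char* const MOD_HANDLER = "hello";

// Mirrors the test at the top of handler(): accept only requests whose
// handler name is exactly MOD_HANDLER, decline everything else.
int check_handler(const char* handler_name) {
    if (handler_name == nullptr || std::strcmp(handler_name, MOD_HANDLER) != 0) {
        return SKETCH_DECLINED;
    }
    return SKETCH_OK;
}
```

Notice that the NULL check has to come first, since strcmp() must not be called on a null pointer.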

Apache logging

It is often a good idea to log what is going on in our code, both for debugging and administrative purposes. In this module, it would be nice to have some feedback also when a request is discarded. We get this effect by adding this line before "return DECLINED":
ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r, "handling declined: %s", r->handler);
The Apache ap_log_rerror() lets us write to the Apache error log file; you'll find it in the logs folder of your Apache httpd installation directory, named "error_log".
APLOG_MARK is just a way to combine the standard __FILE__ and __LINE__ defines in a single symbol, to save a few keystrokes.
After it, we specify the log level, which ranges from APLOG_EMERG down to APLOG_DEBUG (not to mention the TRACE defines, not commonly used, as far as I know). In the httpd.conf file we configure which level of logging we actually want to print to the log file, setting the log level:
LogLevel debug
As you can imagine, in production it is usually a smart move to configure a less verbose level than debug, such as warn.
The next parameter, here set to 0, is the APR status code associated with the message; it is not very interesting here, since we have no error to report. It is followed by the pointer to the request, as we get it from Apache.
Finally we have a string representing what we want to log, which can include printf-style parameters.
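
Since "higher" and "lower" are easy to mix up when talking about log levels, here is a small sketch of the filtering rule as I understand it. The Level enum and is_written() are hypothetical names of mine; the numeric order follows the APLOG_* constants in http_log.h, where 0 is the most severe, and the rule is simplified (a real server may apply extra exceptions).

```cpp
#include <cassert>

// Syslog-style severities; the numeric order follows the APLOG_* constants
// in http_log.h (0 is the most severe).
enum class Level { Emerg = 0, Alert, Crit, Err, Warning, Notice, Info, Debug };

// Simplified rule: a message is written when it is at least as severe
// as the configured LogLevel.
bool is_written(Level message, Level configured) {
    return static_cast<int>(message) <= static_cast<int>(configured);
}
```

So our "handling declined" message, logged at debug level, shows up only when the configured LogLevel is debug, and it disappears as soon as we move to something like warn.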

Building an HTML response

Admittedly, an Apache module is not the most programmer-friendly tool to create an HTML page. But it still makes sense in such a simple case:
ap_set_content_type(r, "text/html"); // 1
ap_rputs("<html><head><title>Hello Module</title></head><body>", r); // 2
ap_rprintf(r, "handler: %s<br />\n", r->handler); // 3
ap_rprintf(r, "filename: %s<br />\n", r->filename);
ap_rprintf(r, "the_request: %s<br />\n", r->the_request);
ap_rprintf(r, "header_only: %d<br />\n", r->header_only);
ap_rprintf(r, "hostname: %s<br />\n", r->hostname);
ap_rputs("</body></html>", r);
1. Firstly, set the reply content type.
2. To put a static string, as this one, ap_rputs() does an excellent job.
3. When we need to send parametrized stuff, we'd better use ap_rprintf().

As you can see, I generated the answer from the request, extracting a few (more or less) interesting values as received from Apache.

The complete C code for this module is on github. To compile it and add the generated .so to the server, you can use the apxs utility (more details in the previous post). Remember to set the Apache configuration file with the Location/SetHandler directives.


A minimal Hello World Apache module

After you have set up the Apache development environment, it is time to create a first module.

I tried to create a minimal module, following the spirit of K&R's "Hello World", which I reckon is so rewarding when you are approaching new stuff.

This module is going to be very impolite, trying to answer all the user requests to the Apache server in the same way. It would even override the standard index.html page on root.

It is made of a module declaration, logically the first thing we are interested in, but traditionally placed at the bottom of the file, and a couple of static functions (I forgot to mention it, but I am developing in plain old C).

One of the two functions, hooks(), is passed to the module declaration, and it sets up a connection between Apache and the other function, named handler(), which is going to be called to handle the user's requests.

Here are the three guys mentioned above, in detail:
static int handler(request_rec* r) // 1
{
    ap_set_content_type(r, "text/plain"); // 2
    ap_rputs("Hello Apache httpd module", r); // 3
    return OK; // 4
}

static void hooks(apr_pool_t* p) // 5
{
    ap_hook_handler(handler, NULL, NULL, APR_HOOK_REALLY_FIRST); // 6
}

module AP_MODULE_DECLARE_DATA hello_module = // 7
{
    STANDARD20_MODULE_STUFF, NULL, NULL, NULL, NULL, NULL, hooks // 8
};
1. This function is called by Apache so that we can provide a reply to a client request. As we can see, as parameter we get a pointer to a structure that actually represents the user request.
2. I am not doing any check here, I always prepare a plain text reply, by calling the Apache function that sets the content type, ...
3. ... and I fill it with a puny string, with the Apache version of the well known puts() function, but reinterpreted for working on a request_rec.
4. Finally I return OK, meaning that my module has been able to fulfill the user request, and Apache could happily consider this job as done.
5. Here I set the hooks that let Apache know which are the functions in my module it can call.
6. The first parameter to ap_hook_handler() is a function that Apache could call to reply to a user request, and the last one is the priority this hook should have in the collection of hooks owned by Apache. Here I am saying that I want full priority.
7. Here is my module declaration. It is known to Apache by the suggestive name of hello_module.
8. And this is a bunch of information we are passing to Apache about our module. The STANDARD20_MODULE_STUFF define is an aggregate of constants telling Apache that this is a standard module version 2, and there is not much more to say about it. We'll say something more on the subsequent five NULLs, but more interesting is the last parameter, the function Apache needs to call to perform the module initialization.

This is almost everything about it. There are a couple of header inclusions you have to perform to let the compiler know what the heck all those ap_... things are, namely httpd.h and http_config.h, but you can see the full code on github.

And, well, you have to compile and register this code before you can actually use it on Apache. To do that, there is a nifty Apache utility, apxs, that basically does the whole job for us.

In this case, I would call it something like this:
/path/to/apache/bin/apxs -a -i -c mod_hello.c
Then I stop and start Apache, submit any request whatsoever to it through a web browser, and I should always get back the same reply.


Installing Apache Httpd

In my current project I am also developing some stuff on the Apache HTTP Server, also known as Apache httpd, or even just Apache, on a Debian box. The environment setup is not complicated, but it does have a couple of twists. So I reckoned it was worth putting down a few notes about the process.

Currently, on the official Apache httpd download page, we have access to three versions (2.0, 2.2, and 2.4) and a number of different formats, ready to be grabbed.

Since I want a developer install, I should avoid the lean standard cuts provided by apt-get (for Debian) or packaged in an msi (for Windows), and I should go for the "Source" releases. So in my case, I go for a 2.2 (this is the version used at work) Unix Source download.

If you check your chosen link, you will see a different provider according to your geographical position, but the archive name should end like ".../httpd/httpd-2.2.22.tar.gz" (2.2.22 being the current 2.2 version). I downloaded it through wget, and then I extracted it by tar xvfz (that is why I picked the tar.gz flavor), getting all the raw stuff in a subfolder. I changed directory to there and, before starting the real installation process, I decided where to put the thing, let's call it $APACHE2, which would usually be something like $HOME/apache2.

Firstly I have to prepare the configuration, this is usually done by calling the command
./configure --prefix=$APACHE2
In my case, I want to enable the Dynamic Shared Object (DSO) support, so that I can install shared modules in a later step. To do that, configure has to be called passing the --enable-so option too:
./configure --prefix=$APACHE2 --enable-so
Once configure has run, it is time to make Apache. This needs two steps, a first call to "make", and a second one, to "make install".

Almost done. Still I had to set the Apache endpoint in its configuration file: I went to $APACHE2/conf, I edited httpd.conf, setting the property Listen to localhost:8080 - a common setup.

Now I can start and stop my Apache HTTP server, going to $APACHE2/bin, and running:
./apachectl start
./apachectl stop
After starting, and before stopping, I should be able to connect from my web browser to the Apache local home page, by accessing the page on localhost:8080 - if this is not the case, that means I have some trouble.

Building Apache 1.3 on a modern environment

If you need to install a legacy Apache httpd server, you would follow more or less the same procedure, but you could bump into a couple of issues.

Firstly, running configure could lead to errors and a garbled output. This is easy to solve, just run it through a new bash shell, like this:
bash ./configure --prefix=$APACHE13 --enable-module=so
Secondly, you could get a failure in making Apache, due to the use of the symbol getline, which is now declared by the system C library (getline() was standardized in POSIX.1-2008).
The solution here is editing the offending files, htdigest.c, htpasswd.c, and logresolve.c, to rename the local getline function.


Hello PHP

Writing a hello program for PHP is quite straightforward. It just takes a bit to set up the environment, especially if you are using a Windows machine.

I use Apache as HTTP server, version 2.2, actually. Since the target operating system is Windows, I downloaded the msi installer - that makes the installing job a bit easier.

The only change I made is about the starting/stopping. I changed the Apache2.2 service start mode to manual, and I wrote a couple of tiny scripts to do the job - but I could have used the cool Apache tool on the tray-bar, instead.

In any case, here are the two one-liners that start and stop the service (they need to be executed by a user having administrator privileges):

net start Apache2.2

net stop Apache2.2

Then I downloaded PHP for Windows/Apache. Notice that for Apache you need the VC6 version, and not the VC9. I got the msi again, expecting to have less work to do in this way.

Actually, something went wrong, and after installing PHP, Apache did not start up anymore. I had a look at the Apache configuration file, conf\httpd.conf, and I found out that the PHP section was wrong - I had to manually correct the directories in this way:

#BEGIN PHP INSTALLER EDITS - REMOVE ONLY ON UNINSTALL
PHPIniDir "C:/dev/PHP"
LoadModule php5_module "C:/dev/PHP/php5apache2_2.dll"
#END PHP INSTALLER EDITS - REMOVE ONLY ON UNINSTALL

Notice the use of forward slash and not the "windows style" backslash. Obviously the pathname could be different in your setting.

In any case, once that was done, I could run my hello php file. I put in the Apache htdocs directory a file named hello.php containing just this line:
<?php phpinfo(); ?>
The result is a huge HTML page with lots of information on the current PHP configuration.


RAII for Xerces

We have already seen how to install and run Apache Xerces-C 3 on Windows 7 / VC++ 2010.

Before starting to use it, let's simplify our life a bit by creating a little class that spares us the bore of explicitly initializing and terminating Xerces.

The rationale behind it is that Xerces is a resource that has to be initialized before using it and released at the end, so it makes perfect sense to apply the RAII (Resource Acquisition Is Initialization) paradigm.

Besides, it is very easy to design and implement. It is just a matter of writing this tiny class:
#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>
XERCES_CPP_NAMESPACE_USE

class XercesManager
{
public:
   XercesManager()
   {
      std::cout << "Initializing Xerces" << std::endl;
      XMLPlatformUtils::Initialize();
   }

   ~XercesManager()
   {
      std::cout << "Terminating Xerces" << std::endl;
      XMLPlatformUtils::Terminate();
   }
};
Given this wrapper class, our main becomes:
int main(int argc, char* argv[])
{
   try
   {
      XercesManager xm;
      someFunction(argc, argv); 
   }
   catch(const XMLException& ex)
   {
      std::cout << "Failure on Xerces: " << ex.getMessage() << std::endl;
   }

   system("pause");
   return 0;
}
Where in someFunction() there will be the actual code requiring Xerces.

We put a XercesManager instance on the stack. Its construction triggers the Xerces initialization; and when we leave the scope (both in case of exception and regular termination) the termination call is made through the destructor.
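
To convince myself that the destructor really runs on both paths, the same idea can be checked without Xerces. In this sketch everything is a stand-in of mine: library_initialize(), library_terminate(), LibraryGuard and run() are hypothetical names, and a plain counter plays the role of the library state. I have also made the guard non-copyable, which is my own design choice and not something the class above does.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a library with global init/terminate calls.
int g_active = 0;
void library_initialize() { ++g_active; }
void library_terminate() { --g_active; }

// RAII guard: acquires in the constructor, releases in the destructor.
class LibraryGuard {
public:
    LibraryGuard() { library_initialize(); }
    ~LibraryGuard() { library_terminate(); }

    // Copying would lead to an unbalanced terminate call, so forbid it.
    LibraryGuard(const LibraryGuard&) = delete;
    LibraryGuard& operator=(const LibraryGuard&) = delete;
};

// Returns true when the work completed, false when it threw.
bool run(bool fail) {
    try {
        LibraryGuard guard;
        if (fail) {
            throw std::runtime_error("simulated failure");
        }
        return true;
    } catch (const std::runtime_error&) {
        // The guard has already been destroyed when control gets here.
        return false;
    }
}
```

Whatever the outcome of run(), the counter is back to zero afterwards. Notice also that the catch block is executed after the guard has been destroyed, which is worth keeping in mind when the handler itself wants to use the library.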

And now we can focus on the real job.


Xerces 3.1.1

I have to do some XML-related work in C++, so I'm installing Apache Xerces-C 3.1.1 on my current test environment (Windows 7 - VC++ 2010).

Once you install Xerces in [YOUR_XERCES_PATH], you have to set a few values in your project property pages:

In C/C++, General, Additional Include Directories: [YOUR_XERCES_PATH]\include
In Linker, General, Additional Library Directories: [YOUR_XERCES_PATH]\lib
In Linker, Input, Additional Dependencies: xerces-c_3.lib

And xerces-c_3_1.dll should be available to the application you are going to write. You could easily achieve this by adding [YOUR_XERCES_PATH]\bin to your system PATH.

Once you have done all of this, you are ready for your first, minimal Xerces application, as reported by the official Xerces C++ 3.1.1 Programming Guide.

It is the bit of code that any application designed to use Xerces has to implement, and it just initializes and terminates the Xerces system:
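#include <cstdlib>
#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>

XERCES_CPP_NAMESPACE_USE

int main(int argc, char* argv[])
{
   std::cout << "Initializing Xerces" << std::endl;
   try
   {
      XMLPlatformUtils::Initialize();
   }
   catch(const XMLException& toCatch)
   {
      // getMessage() returns an XMLCh string: transcode it before printing
      char* message = XMLString::transcode(toCatch.getMessage());
      std::cout << "Failure on Xerces: " << message << std::endl;
      XMLString::release(&message);
      return 1;
   }

   // Do your actual work with Xerces-C++ here.

   std::cout << "Terminating Xerces" << std::endl;
   XMLPlatformUtils::Terminate();

   system("pause"); // Windows only: keep the console window open
   return 0;
}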
