Downloads
PHP Tools version 2.1 Final


WURFL and PHP: Great Combination
by Andrea Trasatti (atrasatti at users dot sourceforge dot net)

PHP is a great platform for WAP development. Thousands of developers worldwide love PHP for its performance and for the simplicity of its model.
It should come as no surprise that PHP developers quickly built tools to tap the power of WURFL from PHP.

One easy way to play with WURFL is to use what I call the "WURFL PHP Library". The package includes the library and some support files: a readme (I STRONGLY SUGGEST THAT EVERYONE READ IT), check_wurfl.php, which lets you quickly read all the capabilities of a selected device, and update_cache.php, which is described later. The library itself is made of three files:

  • wurfl_parser.php: parses the XML and puts it in an array. Before starting, make sure PHP was compiled with XML support ( http://www.php.net/manual/en/ref.xml.php ); wurfl_parser.php uses the basic XML functions that are available when PHP is compiled with expat.
  • wurfl_class.php: this script lets you access the data in the wurfl array (see previous bullet) in an object-oriented fashion.
  • wurfl_config.php: with the growing number of possible configurations in the PHP library, I am now introducing a single file for configuration. This should hopefully make it clearer where you need to configure the scripts and maybe also make it easier to integrate it with your central configuration files (if your application has any)

wurfl_parser.php

wurfl_parser.php is a VERY simple XML parser that reads WURFL and puts all the needed data into an array. Considering how long the parser takes to parse the XML, I thought that some kind of caching mechanism was a good idea. Caching is discussed later.
The generated array is an associative array that looks like this:
$wurfl["devices"]["ericsson_generic"]["fall_back"]="generic";
The first key is "devices". Other possible first-level keys, such as authors and contributors, are probably not so interesting to you.

The second key is the UNIQUE ID assigned to each user agent. All other keys are related to the attributes or groups and capabilities of the device.
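
To make the layout concrete, here is a small hand-built array in the same shape. Only the "devices" key, the ID "ericsson_generic" and the "fall_back" attribute come from the example above; the group name "display", the capability names and the helper function are placeholders, and the helper assumes that a device inherits missing values from its fall_back device.

```php
<?php
// Hand-built sample in the shape described above.
// The real array is produced by wurfl_parser.php and is much larger.
$wurfl = array(
    'devices' => array(
        'generic' => array(
            'fall_back' => 'root',
            'display'   => array('columns' => 11, 'rows' => 6), // placeholder group
        ),
        'ericsson_generic' => array(
            'fall_back' => 'generic',
            'display'   => array('columns' => 15),
        ),
    ),
);

// Hypothetical helper: look for a capability on the device itself,
// then follow the fall_back chain until a value is found.
function lookup_capability($wurfl, $id, $group, $capability) {
    while (isset($wurfl['devices'][$id])) {
        $device = $wurfl['devices'][$id];
        if (isset($device[$group][$capability])) {
            return $device[$group][$capability];
        }
        $id = isset($device['fall_back']) ? $device['fall_back'] : '';
    }
    return null; // not defined anywhere in the chain
}

echo lookup_capability($wurfl, 'ericsson_generic', 'display', 'columns'), "\n"; // the device's own value
echo lookup_capability($wurfl, 'ericsson_generic', 'display', 'rows'), "\n";    // inherited from "generic"
```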

Of course, this may not seem very practical, since devices send us their user agent strings, not their WURFL IDs.
This is solved by an additional array called $wurfl_agents, which is simply an associative array mapping each user agent to its unique ID. This array makes your life much simpler (and your search much faster). $wurfl_agents is cached too.
Let's look at a concrete example. Someone visits your site with, say, a Siemens S45 (user agent: "SIE-S45/24 UP.Browser/5.0"). Without $wurfl_agents, you would have to cycle through most of the $wurfl array to retrieve the ID before you could look up the actual capabilities.
Thanks to $wurfl_agents, checking whether an ID is known is as simple as:

      if ( in_array($id, $wurfl_agents) ) {
      	echo "The ID $id is known<br/>\n";
      }
Example of finding an ID when you know the user agent:

      // $wurfl_agents is keyed by user agent, so an exact match is a direct lookup
      if ( isset($wurfl_agents[$user_agent]) ) {
      	return $wurfl_agents[$user_agent];
      }
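
An exact lookup only helps when the visitor's user agent is listed verbatim. The sketch below shows one possible way to fall back to a near match. It is an illustration only: the matching strategy (longest shared leading substring), the function name, the device IDs and the minimum prefix length of 4 are all assumptions, and the search in wurfl_class.php may well work differently.

```php
<?php
// Sample data; the real $wurfl_agents is generated by wurfl_parser.php.
// The device IDs here are made up for the example.
$wurfl_agents = array(
    'SIE-S45/00'    => 'sie_s45_ver1',
    'Nokia7110/1.0' => 'nokia_7110_ver1',
);

// Hypothetical matcher: exact lookup first, then shorten the visitor's
// user agent one character at a time until a known user agent starts
// with the remaining prefix.
function find_device_id($wurfl_agents, $user_agent) {
    if (isset($wurfl_agents[$user_agent])) {
        return $wurfl_agents[$user_agent];
    }
    for ($len = strlen($user_agent) - 1; $len >= 4; $len--) {
        $prefix = substr($user_agent, 0, $len);
        foreach ($wurfl_agents as $known => $id) {
            if (strncmp($known, $prefix, $len) === 0) {
                return $id;
            }
        }
    }
    return 'generic'; // nothing close enough
}

echo find_device_id($wurfl_agents, 'SIE-S45/24 UP.Browser/5.0'), "\n";
```

With the sample data, the Siemens S45 user agent from the example above is matched through the shared prefix "SIE-S45/".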

wurfl_class.php

wurfl_class.php was created to make our life simpler. Once we had the data ready in an array, we needed an easy way to access it and some methods to manipulate it. Looking closely at what had been done with the Java API, I reproduced many of its useful methods in PHP.

The class loads the parser automatically. For this to work, the corresponding define (WURFL_PARSER_FILE) must be set appropriately in the config file.

The class is initialized by calling the constructor, wurfl_class, and passing two variables that may be empty. The first is the full parsed XML (the array produced by the parser) and the second is the array of user agents and IDs, also generated by the parser. Pass two empty variables (or nothing) if you want the class to fill them when needed, or the real values if you already have them. The class will check the values and the cache files (if enabled) and decide what to do. Also check your configuration, because it changes this behaviour.
Use the public method GetDeviceCapabilitiesFromAgent(), passing the user agent, to make the class search for the best fit and fill the object's properties. Once again, what the class does depends on your configuration: whether you enabled cache files and so on.
Once you have instantiated the object and passed a user agent, you may use all the class's methods.
Here is a list of properties:
$wurfl_class->_wurfl		is the WURFL array (all of it)

$wurfl_class->_wurfl_agents	is the associative array made of the user agents
                                and unique id's

$wurfl_class->user_agent	the visitor's user agent

$wurfl_class->wurfl_agent	the WURFL's best fitting user agent

$wurfl_class->id		the corresponding id

$wurfl_class->GUI		true if the device supports Openwave's GUI
                                extensions

$wurfl_class->browser_is_wap	true if the device is WAP capable. It is here
                                only for legacy support; you should use the
                                is_wireless_device capability from WURFL.
                                browser_is_wap now has the same value as the
                                capability, when found. If you want to take full
                                advantage of this capability you should download
                                the web patch from the WURFL site or CVS!

$wurfl_class->capabilities	the array of the device's capabilities
Note: PHP (up to version 4.3) does not distinguish between private and public methods. In the wurfl_class implementation, I named all private methods with a leading underscore to distinguish them from public methods, which have no underscore. If you are interested in the details, just open the class; there are beautiful JAVADOC-like comments for each variable and method.
These libraries no longer require register_globals (from version 2 and up), but will not work with PHP versions before 4.1.
Here is a list of the public methods:
wurfl_class($wurfl, $wurfl_agents)	is the constructor. Built to work
					best with the wurfl_parser.

GetDeviceCapabilitiesFromAgent($ua)	given a user agent it will search WURFL
					for the best fit

getDeviceCapability($capability)	given a capability it will tell you the
					value. Remember that capabilities might be string, 
					integer or boolean.
wurfl_config.php

The purpose of this file is straightforward: modify it as you wish to configure the library to behave the way you like best.
Please check the paragraph about caching for more info about cache files.
Here is a quick explanation of all the fields:
WURFL_CONFIG		boolean, this is set to true by default, it's used as a simple
			check to make sure the configuration was included. Add this to your
			configuration files if you won't use wurfl_config.php, otherwise just
			leave it as it is

DATADIR			string, where all data is stored (wurfl.xml, cache file, logs, etc)

WURFL_FILE		string, full path and filename of wurfl.xml

WURFL_PARSER_FILE	string, full path and filename of wurfl_parser.php

WURFL_CLASS_FILE	string, full path and filename of wurfl_class.php

WURFL_USE_CACHE		boolean, true if you want to use a cache file (strongly
			suggested). If this is the only cache option set to true, cache.php
			will be used.

WURFL_USE_MULTICACHE	boolean, true if you want to use Multicache files
			instead of a single BIG cache file (cache.php)

MULTICACHE_DIR		string, used only if you enabled Multicache, defines where
			the cache files will be stored. WARNING: while cache.php will grow
			in size but remain a single file, here the files will grow in
			number. Expect more than 5000 tiny files.

MULTICACHE_SUFFIX 	string, suffix for the files generated using Multicache.
			Useful if you use a caching system and don't want to load your
			shared memory with a ton of tiny files.

CACHE_FILE		string, with full path and filename of the cache file to use
			(refreshed when a new WURFL is found, if WURFL_CACHE_AUTOUPDATE is
			set to true)

WURFL_CACHE_AUTOUPDATE	boolean, tells the class to automatically update the
			cached files when a new XML is found. This is NOT suggested when
			using multicache because of the high number of files to be updated:
			race conditions are very likely. The use of
			update_cache.php is strongly suggested for production
			environments

WURFL_PATCH_FILE	string, optional patch file for WURFL

WURFL_AGENT2ID_FILE	string, used by wurfl_class.php. Used only when
			WURFL_USE_CACHE is set to true

MAX_UA_CACHE		integer, max number of user agents to store in
			WURFL_AGENT2ID_FILE. A limit that is too high might have the
			opposite effect and slow things down.

WURFL_LOG_FILE 		string, defines full path and filename for logging

LOG_LEVEL		integer, desired logging level. Use the same constants as for PHP
			logging

WURFL_AUTOLOAD		boolean, true if you want the XML to be loaded at every
			startup. If not, the XML will be loaded when needed.
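
As an illustration, a configuration using the fields above might look like the following. The constant names are the ones listed here; every value (the paths, the ".php" suffix, the file name agent2id.php, the log level) is an example chosen for this sketch, not a default shipped with the library. WURFL_PATCH_FILE is optional and left out.

```php
<?php
// Example values only: adapt every path to your own installation.
define('WURFL_CONFIG', true);
define('DATADIR', '/var/www/wurfl/data/');
define('WURFL_FILE', DATADIR . 'wurfl.xml');
define('WURFL_PARSER_FILE', '/var/www/wurfl/wurfl_parser.php');
define('WURFL_CLASS_FILE', '/var/www/wurfl/wurfl_class.php');

// Caching: multicache enabled, automatic refresh disabled (production style).
define('WURFL_USE_CACHE', true);
define('WURFL_USE_MULTICACHE', true);
define('MULTICACHE_DIR', DATADIR . 'multicache/'); // note the trailing slash
define('MULTICACHE_SUFFIX', '.php');
define('CACHE_FILE', DATADIR . 'cache.php');
define('WURFL_CACHE_AUTOUPDATE', false);
define('WURFL_AGENT2ID_FILE', DATADIR . 'agent2id.php');
define('MAX_UA_CACHE', 30);

// Logging: LOG_WARNING is one of PHP's standard logging constants.
define('WURFL_LOG_FILE', DATADIR . 'wurfl.log');
define('LOG_LEVEL', LOG_WARNING);

// Load the XML only when needed.
define('WURFL_AUTOLOAD', false);
```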

Caching

Considering how slow PHP can be when parsing a big XML file, caching was a must.
Currently there are two caching systems. The older one is activated by setting WURFL_USE_CACHE to true and uses DATADIR to store its files. The concept is quite simple: dump the array generated by the parser into one big file, called cache.php by default (set by the define CACHE_FILE). This file also stores the array called $wurfl_agents and a timestamp, useful to check whether a new XML was deployed.
This system is very simple in concept and worked well for quite some time. As cache.php grew bigger, it became necessary (and in fact I always strongly suggested it) to use a PHP-level caching system such as Zend Accelerator, Turck cache or APC 2.0. Such tools let you store the cache file (cache.php) in shared memory and provide really good performance from the third hit on (first hit: the XML is parsed, slow; second hit: the cache is stored in shared memory, slow; third hit and on: the cache is read from shared memory, fast!).
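
The single-file idea can be sketched in a few lines: write the arrays out as PHP source, then include the file to get them back. The helper names, the variable name $cache_stamp and the exact file layout are assumptions made for this example; the file written by the real library may look different.

```php
<?php
// Hypothetical helper: dump both arrays and a timestamp as PHP source.
function write_cache_file($file, $wurfl, $wurfl_agents) {
    $code = "<?php\n"
          . '$cache_stamp = ' . time() . ";\n"
          . '$wurfl = ' . var_export($wurfl, true) . ";\n"
          . '$wurfl_agents = ' . var_export($wurfl_agents, true) . ";\n";
    return file_put_contents($file, $code) !== false;
}

// Hypothetical helper: including the file defines the three variables
// in the function's scope, so they can be returned to the caller.
function read_cache_file($file) {
    include $file;
    return array($cache_stamp, $wurfl, $wurfl_agents);
}

// Round trip with a tiny sample written to the system temp directory.
$file = sys_get_temp_dir() . '/wurfl_cache_demo.php';
write_cache_file(
    $file,
    array('devices' => array('generic' => array('fall_back' => 'root'))),
    array('SIE-S45/00' => 'sie_s45_ver1')
);
list($stamp, $wurfl, $wurfl_agents) = read_cache_file($file);
echo $wurfl['devices']['generic']['fall_back'], "\n";
```

Because the cache is plain PHP source, an opcode cache can keep it in shared memory, which is where the speed-up described above comes from.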

The new caching system was dubbed "multicache" because instead of generating a single big cache file it generates one cache file for every device in WURFL. For this reason you should create a dedicated directory for it (or at least this is suggested), because the library will generate about 6000 tiny files when this feature is activated.
To activate the multicache system you need to set WURFL_USE_CACHE to true.
CACHE_FILE will still be used, but the file will only contain the array $wurfl_agents and the timestamp.
Also set WURFL_USE_MULTICACHE to true and set the appropriate path for MULTICACHE_DIR; an absolute path is suggested. Don't forget the trailing slash (for example '/tmp/cache/multicache/').
MULTICACHE_SUFFIX should be left unchanged in most cases. It defines the extension of the tiny files. You might want to set an unusual extension if, for example, you want to prevent a PHP cache from storing those files in shared memory. Change it only if you know what you're doing!
The multicache system provides MUCH faster data retrieval: the files are smaller, so it takes far less time to read them. It also means more I/O on your system; consider returning to the older cache system if you have problems. Generating many tiny files also opens the door to race conditions if a new XML is deployed and the library is configured to update the cache automatically. Read on for more info.
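
A minimal sketch of the one-file-per-device idea follows. The function names and the file naming scheme (device ID plus suffix) are assumptions for illustration and are not taken from the library.

```php
<?php
// Hypothetical writer: one small PHP file per device, named <id><suffix>.
function multicache_write($dir, $suffix, $devices) {
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    foreach ($devices as $id => $data) {
        $code = '<?php return ' . var_export($data, true) . ";\n";
        file_put_contents($dir . $id . $suffix, $code);
    }
}

// Hypothetical reader: only the file of the requested device is touched.
function multicache_read($dir, $suffix, $id) {
    $file = $dir . $id . $suffix;
    if (!is_file($file)) {
        return null;
    }
    return include $file;
}

// The directory must end with a slash, as noted above.
$dir = sys_get_temp_dir() . '/wurfl_multicache_demo/';
multicache_write($dir, '.php', array(
    'generic'          => array('fall_back' => 'root'),
    'ericsson_generic' => array('fall_back' => 'generic'),
));
$device = multicache_read($dir, '.php', 'ericsson_generic');
echo $device['fall_back'], "\n";
```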

If WURFL_CACHE_AUTOUPDATE is set to true, the library (specifically wurfl_class.php) checks the timestamp in cache.php against the mtime of wurfl.xml. If the XML is newer than the cache, it is reloaded. This is not suggested for production environments: with many concurrent hits you might have more than one process trying to refresh the same cache, wasting a lot of resources. To avoid this, set the automatic update to false and use the ad hoc script called update_cache.php. This script was created to be called from the command line (or from a hidden URL if you'd like, but the command line is suggested when available) to force a cache update. This way the new cache is prepared separately and the file is swapped in at the very last second, so a single process does the work and a lot of resources are saved. On sites with MANY hits you might also consider preparing the cache files on a separate system and moving them to the production server all at once. There isn't a script to do this automatically at this time, but the new update_cache.php is already a step in that direction.
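
The two ideas in this paragraph, comparing the stored timestamp with the XML's mtime and swapping the finished file in at the last moment, can be sketched as follows. Both function names are hypothetical, and the swap relies on rename() replacing the target in a single step, which holds on POSIX filesystems when both files are on the same volume.

```php
<?php
// Hypothetical check: true when the XML was modified after the
// timestamp stored in the cache.
function cache_is_outdated($cache_stamp, $xml_file) {
    clearstatcache();
    return filemtime($xml_file) > $cache_stamp;
}

// Hypothetical swap: build the new cache under a temporary name, then
// move it over the old one so readers never see a half-written file.
function replace_cache($cache_file, $contents) {
    $tmp = $cache_file . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, $contents) === false) {
        return false;
    }
    return rename($tmp, $cache_file);
}

$xml   = sys_get_temp_dir() . '/wurfl_demo.xml';
$cache = sys_get_temp_dir() . '/wurfl_demo_cache.php';
file_put_contents($xml, '<wurfl/>');
touch($xml, 1112478081); // pretend the XML was deployed at this time

if (cache_is_outdated(1112470000, $xml)) {
    replace_cache($cache, "<?php\n\$cache_stamp = " . filemtime($xml) . ";\n");
}
```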

Setting WURFL_USE_CACHE to true also enables another simple caching system (active with both the standard cache and multicache). When a user agent hits your site, it will most likely hit it again a few more times. It would be wasteful to search for all its capabilities on every subsequent hit. For this reason the user agent and its capabilities are stored in the file named by WURFL_AGENT2ID_FILE, which makes every subsequent hit A LOT faster.
Storing every user agent that hits your site would end up producing a second full cache file, and the benefit would drop to zero. For this reason you can limit the number of stored user agents with MAX_UA_CACHE. The "perfect" value depends on your server's performance, on the variety of user agents visiting your site and so on. A good number is between 30 and 50. I suggest you start with 30, then check the logs to see how often the oldest user agents are purged from the cache and how often a user agent that is still visiting your site is purged and has to be searched again.
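
The effect of the limit can be modelled in memory as below. This is a sketch under assumptions: the function name is hypothetical, the eviction order (oldest entry first) is one reasonable reading of the text, and the real library persists this data in WURFL_AGENT2ID_FILE instead of a plain variable.

```php
<?php
// Hypothetical model of the agent cache: newest entries at the end,
// oldest entries dropped once the limit is exceeded.
function remember_agent($cache, $user_agent, $capabilities, $max) {
    unset($cache[$user_agent]);          // re-inserting moves the agent to the end
    $cache[$user_agent] = $capabilities;
    while (count($cache) > $max) {
        reset($cache);
        unset($cache[key($cache)]);      // drop the oldest entry
    }
    return $cache;
}

$cache = array();
$cache = remember_agent($cache, 'UA-1', array('id' => 'a'), 2);
$cache = remember_agent($cache, 'UA-2', array('id' => 'b'), 2);
$cache = remember_agent($cache, 'UA-3', array('id' => 'c'), 2); // pushes out UA-1
print_r(array_keys($cache));
```

A user agent that keeps being pushed out and re-added, as UA-1 would be here if it came back, is the sign in the logs that the limit is too low for your traffic.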

A direct consequence of the cache system is that you do not need to read and parse the entire XML every time you create the wurfl object. You may still want to force this, for debugging for example. You can do so by setting the define named WURFL_AUTOLOAD to true. If you are using any of the caching systems, I suggest you disable it. If you're not using any cache, the XML will be loaded anyway, so just set it to true if you'd like.


Logging

While logging is outside the scope of the WURFL PHP Libraries, and I suggest you integrate the libraries with your own logging system (if you have one), a basic logging feature is included. It should work fine on Linux, Solaris and Windows.
Logs are written to the file configured with WURFL_LOG_FILE. The log level follows the PHP constants and is set with the define named LOG_LEVEL. Logging was buggy in previous releases; it has been changed and is now supposed to work properly. When set to the highest level, the library might generate a lot of logs.
In any case, the logs should give you all the info you might need. This is a sample log at the highest detail level:
<date> [LuckyTitan.local 327][constructor] Class Initiated
<date> [LuckyTitan.local 327][GetDeviceCapabilitiesFromAgent] searching for SonyEricssonZ600/R601
<date> [LuckyTitan.local 327][_cacheIsValid] cache file is outdated
<date> [LuckyTitan.local 327][GetDeviceCapabilitiesFromAgent] cache enabled, WURFL is not loaded, now loading
<date> [LuckyTitan.local 327][GetDeviceCapabilitiesFromAgent] loading WURFL from XML
<date> [LuckyTitan.local 327][parse] No XML patch file defined
<date> [LuckyTitan.local 327][GetDeviceCapabilitiesFromAgent] Searching in the agent database
<date> [LuckyTitan.local 327][_GetFullCapabilities] searching for sonyericsson_z600_ver1_subr601
<date> [LuckyTitan.local 327][_GetDeviceCapabilitiesFromId] reading id:sonyericsson_z600_ver1_subr601
<date> [LuckyTitan.local 327][_GetDeviceCapabilitiesFromId] I have it in wurfl_agents cache, done
As you can see you will get:
   date [hostname pid][function_name] message

When the log is a warning or an error you will also see a special banner, like in this example:
<date> [LuckyTitan.local 13279][GetDeviceCapabilitiesFromAgent] WARNING: I couldn't find the device in my list
The date field was removed here for space reasons. The date format is:
Sat,  2 Apr 2005 23:41:21 +0200
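
If you want to produce lines in the same layout from your own code, a formatter along these lines would do. The function is hypothetical and is not part of the library. Note that the 'd' format character pads the day with a zero, while the sample date above pads it with a space.

```php
<?php
// Hypothetical formatter for the layout: date [hostname pid][function_name] message
function format_log_line($function, $message, $timestamp, $host, $pid) {
    return date('D, d M Y H:i:s O', $timestamp) . " [$host $pid][$function] $message";
}

// php_uname('n') returns the host name, getmypid() the process ID.
echo format_log_line('constructor', 'Class Initiated', time(), php_uname('n'), getmypid()), "\n";
```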
Known issues and TODOs
  1. the library is currently limited to a single patch file, configured with a define. There are two possible solutions: define a list and load the files in the list, or change from a define to an array and load more than one file. The parser will need to be adapted. Not hard.
  2. the library currently implements (in wurfl_class.php) a simple recognition of web browsers that avoids searching WURFL. This is incompatible with the new web patch; you can work around it by skipping the check. The current automatic recognition is much faster than searching WURFL, though.
  3. a possible improvement to the multicache system would be to use a directory tree instead of keeping all the files in a single directory. This should provide better I/O performance.
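
For the third item, one possible layout is sketched here. It is purely a suggestion and not something the library implements: the function name and the choice of two directory levels taken from an md5 of the device ID are assumptions.

```php
<?php
// Hypothetical path scheme: spread the tiny files over a two-level tree
// so that no single directory holds thousands of entries.
function multicache_tree_path($base_dir, $id, $suffix) {
    $hash = md5($id);
    return $base_dir
         . substr($hash, 0, 2) . '/'
         . substr($hash, 2, 2) . '/'
         . $id . $suffix;
}

echo multicache_tree_path('/tmp/cache/multicache/', 'ericsson_generic', '.php'), "\n";
```

With two hexadecimal characters per level there are 256 directories at each level, so about 6000 files end up spread very thinly.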
More PHP Tools
  • php_demo: When Luca and Laszlo designed and implemented the JSP tag-lib and the portal demo, I thought the demo could easily be ported to PHP.
    The concept behind the demo is quite simple: you access the site with any device and, depending on its capabilities, your page is given a different set of links: basic WML pages, color images, Java link, MMS. This is a great way to show how easy it is to exploit the power of WURFL and use it in a real portal application.
    NOTICE: this demo works with the latest libraries; previous versions need a different startup configuration (defines). Here are some code snippets:
    <?php
    // WURFL demo by "Andrea Trasatti" <atrasatti AT users DOT sourceforge DOT net>
    
    require_once('./wurfl_config.php'); // include the configuration, make sure to configure it properly
    require_once(WURFL_CLASS_FILE); // include the main class. This is defined in the configuration file
    
    // creating the WURFL object
    $myDevice = new wurfl_class($wurfl, $wurfl_agents);
    $myDevice->GetDeviceCapabilitiesFromAgent($_SERVER["HTTP_USER_AGENT"]);
    if ( $myDevice->getDeviceCapability('is_wireless_device') ) {
      header("Content-Type: text/vnd.wap.wml");
      echo '<?xml version="1.0" encoding="ISO-8859-1"?>'."\n";
    ?>
    <!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org//DTD//wml_1.1.xml">
    <wml>
     <card>
      <p mode="nowrap">
    
    <?php
      if ( $myDevice->getDeviceCapability('gif') ) {
        echo '<img src="logo.gif" alt="Global TEL" />'."\n";
      } else {
        echo '<img src="logo.wbmp" alt="Global TEL" />'."\n<br/>\n";
      }
    ?>
    
    <a href="index.php">Home</a><br/>
     </p>
     </card>
    </wml>
    
    <?php
    } else {
    ?>
    <img src="logo.gif"><br><br>
    Welcome Web browser.<br>
    We are sorry, but we are only offering WAP services, at this time.<br>
    <?php } ?>
    

As you can see, getting started with the PHP libraries is pretty easy. Just dedicate the time needed to configure everything properly.
To make sure browsers are correctly recognized as mobile or desktop, make sure you are using the web patch. You may download it from this site or from CVS.
Copyright © 2008, Luca Passani