WURFL Evolution
===============
Luca Passani & Andrea Trasatti
This document describes the evolution of WURFL. There are quite a few
features that people are requesting or that it makes sense to have.
The purpose of this document is to discuss those features, the APIs
to implement them, and how to get them all to play together.
Let's make it clear that there is no guarantee that backward compatibility
will be preserved. In fact, backward compatibility will most likely
be lost.
Some of these new features impact the XML format used in WURFL and, as such,
have an impact on the Java and PHP APIs alike. (They also have an impact
on all other APIs, of course.)
Some of the new proposed features only apply to Java.
New features
============
- WURFL Modularization
- Test Suite
- Better UA Matchers
- Beyond UA Matchers
- Make UA matchers pluggable (Spring Framework?)
- Multiple patch-files
- Separation of WALL/WURFL API
- introduction of logging (Log4J) for Java
- let the API load wurfl.xml from a zip file
- better management for singletons (Spring Framework?)
WURFL Modularization
====================
We are seeing more and more mobile-related technologies show up
every year. Invariably, those new technologies encounter
'device fragmentation' as the main obstacle to their success.
Invariably, many developers find in WURFL a solution to their problems.
By now, though, the WURFL file has grown very large. In addition, the
fall_back model is getting in the way: the current WURFL hierarchy
(originally based on the features of the respective WAP browsers)
does not always match what would be an optimal break-down for
other technologies.
For example, two devices may have identical browsers, but very different
J2ME support. Deriving one from the other is not always the best
solution. While some have suggested the introduction of some kind of
'multiple inheritance' to address the problem, we feel that this would
complicate the WURFL model without any sensible advantage for the
'end developer'. After all, only a few developers need to access
different classes of capabilities within one application.
And even for those developers, WURFL modularization would still
be a better option.
WURFL modularization is about splitting WURFL into multiple files,
each grouping together related capability groups. For example, the
WAP and mark-up groups belong nicely together in one module,
while J2ME is a strong candidate for a module of its own.
The actual break-down will of course need to be discussed
on WMLProgramming, but mark-ups, J2ME, and picture-rendering
are obvious candidates for modules of their own.
One of the advantages of modularization is that it will make it
easier for the WURFL core team to delegate tasks related to the
maintenance of groups of capabilities to other individuals
(J2ME developers may be more willing to contribute J2ME device
information if they don't have to handle the extra complexity
of dealing with devices and capabilities they do not care about).
From a technology viewpoint, the key point about the introduction
of modularization is that fall_back consistency will only
be guaranteed inside each module.
This means that looking up a device capability in a module may
lead the API to follow a different fall_back route as compared
to looking for a capability for the same device in a different
module.
For example, looking up a given WALL capability for a
Nokia6600 may lead the API to find it in generic_nokia_browser,
while looking up a J2ME capability may lead to finding it
in generic_series_60, where generic_nokia_browser and
generic_series_60 only show up in their respective modules.
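The per-module lookup just described can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual API: the class WurflModule, its methods, and the device ids and capability values in the usage note are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one instance per WURFL module, each with its own
// fall_back chain. Not part of the existing WURFL API.
class WurflModule {
    // device id -> id of the device it falls back to (null for the root)
    private final Map<String, String> fallBack = new HashMap<>();
    // device id -> capabilities defined directly on that device
    private final Map<String, Map<String, String>> capabilities = new HashMap<>();

    void addDevice(String id, String fallBackId) {
        fallBack.put(id, fallBackId);
        capabilities.computeIfAbsent(id, k -> new HashMap<>());
    }

    void setCapability(String id, String name, String value) {
        capabilities.computeIfAbsent(id, k -> new HashMap<>()).put(name, value);
    }

    // Follows the fall_back chain of this module only and returns the id of
    // the first device that defines the capability, or null if none does.
    String findDefiningDevice(String id, String name) {
        String current = id;
        while (current != null) {
            Map<String, String> caps = capabilities.get(current);
            if (caps != null && caps.containsKey(name)) {
                return current;
            }
            current = fallBack.get(current);
        }
        return null;
    }

    // Returns the capability value as seen through this module's chain.
    String getCapability(String id, String name) {
        String owner = findDefiningDevice(id, name);
        return owner == null ? null : capabilities.get(owner).get(name);
    }
}
```

With a markup module in which nokia_6600 falls back to generic_nokia_browser, and a J2ME module in which the same device falls back to generic_series_60, the two lookups follow different routes even though the calling code looks the same. The sketch does not guard against cycles in the fall_back data.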
Since this will happen internally, the API to access the WURFL
will be *almost* unchanged, the only difference being that developers
will now need to provide the module name when using the API.
This could be done at the level of individual API calls, by adding the
name of the module as one of the parameters. But there is probably
a better way than this.
For example, the singletons that we have today could be
turned into normal objects and created with the knowledge of which
module they represent.
In other words, if what we have today is:
UAManager uam = ObjectsManager.getUAManagerInstance();
CapabilityMatrix cm = ObjectsManager.getCapabilityMatrixInstance();
String device_id = uam.getDeviceIDFromUALoose(UA);
String capability_value = cm.getCapabilityForDevice(device_id, "preferred_markup");
With the new Modularization API, it will look something like:
String device_id = uam_markup.getDeviceIDFromUALoose(UA);
String capability_value = cm_markup.getCapabilityForDevice(device_id, "preferred_markup");
where the CapabilityMatrix and UAManager are no longer (necessarily)
singletons, but rather simple objects initialized according to the
Spring Framework paradigm (i.e. initialization goes into a Spring XML
file and the fully initialized POJOs are injected into your application)
with methods like:
UAManager uam_markup;
CapabilityMatrix cm_markup;
:
public void setUAManagerMarkup(UAManager uam) {
    uam_markup = uam;
}
public void setCapabilityMatrixMarkup(CapabilityMatrix cm) {
    cm_markup = cm;
}
This, of course, assumes that having a unique wurfl.xml file in the
system does not cut it anymore. As a minimum, we should have
wurfl_markups.xml as a default choice, even though we may want to
exploit the fact that we need to reorganize a lot of things to
get a bunch of things right (not that what we did so far was
wrong, but WURFL's popularity demands that more powerful
mechanisms be put in place).
As far as Java is concerned, the Spring Framework seems a wonderful
candidate for making the new version of the API totally generic
while delegating configuration to Spring wiring.
Back to our scenario of applications which in fact need to query
different modules, the new API allows for that:
String dev_id_j2me = uam_j2me.getDeviceIDFromUALoose(UA);
String j2me_wmapi_1_1 = cm_j2me.getCapabilityForDevice(dev_id_j2me, "j2me_wmapi_1_1");
String dev_id_markups = uam_markups.getDeviceIDFromUALoose(UA);
String preferred_markup = cm_markups.getCapabilityForDevice(dev_id_markups, "preferred_markup");
if ("true".equals(preferred_markup) && "false".equals(j2me_wmapi_1_1)) {
    :
    //I don't know why you may need this, but I'm sure you do
}
Test Suite
==========
Once the modularization is in place, it is time to organize a
long overdue WURFL test suite. People around the planet have the
strangest devices in their hands. We need to enable them to check what
a device can or can't do with just a few clicks.
This requires the hosting and also a mechanism to capture device info
through a web interface. So many ideas, so little time.
Better UA Matchers
==================
When there's no exact match in WURFL, the API uses heuristics to
find a best match fast.
Currently, most heuristics are just simple tricks (e.g. remove
"Vodafone/" and retry).
The main heuristic currently in use is the one that, given a user-agent
which does not match exactly, removes one character at a time from
the end of the UA and tries to match what is left. So the following
(fictional) User-Agent:
NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-1.0
will be compared with
NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-1.
NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-1
NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-
NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC
:
NokiaN-Gage/1.0 (6
NokiaN-Gage/1.0 (
Here WURFL stops. A match is found! => nokia_ngage_ver1_sub403
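A rough Java sketch of this trimming loop follows. It rests on one assumption that the text above does not spell out: a shortened UA counts as a match when some UA known to WURFL starts with it. The class name, the minLength parameter, and the known UA in the usage note are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical sketch of the "trim from the end" heuristic. It assumes that a
// shortened UA matches when some UA known to WURFL starts with it.
class TrimFromEndMatcher {
    // known UA string -> device id, kept sorted so prefix lookups are cheap
    private final TreeMap<String, String> knownUAs = new TreeMap<>();

    void addKnownUA(String ua, String deviceId) {
        knownUAs.put(ua, deviceId);
    }

    // Drops one trailing character at a time until a known UA starts with
    // what is left. Gives up once fewer than minLength characters remain.
    String match(String ua, int minLength) {
        int floor = Math.max(1, minLength);
        for (int len = ua.length(); len >= floor; len--) {
            String prefix = ua.substring(0, len);
            // Smallest known UA that sorts at or after the prefix; if any
            // known UA starts with the prefix, this one does.
            String candidate = knownUAs.ceilingKey(prefix);
            if (candidate != null && candidate.startsWith(prefix)) {
                return knownUAs.get(candidate);
            }
        }
        return "generic";
    }
}
```

If the only known N-Gage UA were one that begins with "NokiaN-Gage/1.0 (4.03)", the fictional UA above would be trimmed all the way back to "NokiaN-Gage/1.0 (" before matching. The minLength floor is there so that a UA is never matched on a meaninglessly short prefix.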
This approach has served the community well for a long time, but by now
we know it's not enough.
We have found that some devices would be better served by
different algorithms.
For example, there is a way to measure the so-called
Levenshtein distance between two strings: the minimum number of
single-character insertions, deletions and substitutions needed to
turn one string into the other.
Many devices that claim to be Internet Explorer
would be better served by this strategy. Look at the following UA
string, for example:
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; PalmSource/Palm-TnT5; Blazer/4.0) 16;320x448
It goes without saying that UA strings like this go a long way
in generating false positives one way or the other.
A solution to this problem can be found by adding one extra level
to the UA analysis. For example, if a device UA starts with
"Nokia", we could assume that the usual matching could give
an optimal match. At the same time, if the UA string
contains "Blazer" then we may want to choose the Levenshtein
distance to match the UA.
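For reference, here is a minimal sketch of the Levenshtein distance, together with a naive way to pick the closest known UA. The distance function is the textbook dynamic-programming version; the class name and the closestDeviceId helper are hypothetical and not part of the WURFL API.

```java
import java.util.Map;

// Hypothetical helper class, not part of the existing WURFL API.
class LevenshteinSketch {
    // Classic dynamic-programming edit distance, keeping only two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Scans every known UA (UA string -> device id) and returns the id of
    // the closest one, or "generic" when the map is empty.
    static String closestDeviceId(String ua, Map<String, String> knownUAs) {
        String bestId = "generic";
        int best = Integer.MAX_VALUE;
        for (Map.Entry<String, String> e : knownUAs.entrySet()) {
            int d = distance(ua, e.getKey());
            if (d < best) {
                best = d;
                bestId = e.getValue();
            }
        }
        return bestId;
    }
}
```

A linear scan like this costs one distance computation per known UA, which is one more reason to apply it only to the families of UA strings that need it.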
Here is a snippet of code that gives an idea of what could be done:
public String getDeviceIDFromUALoose(String ua) {
    //bogus data
    if (ua == null || ua.length() == 0) {
        return "generic";
    }
    //stabilize UA: no Gateway substring attachments
    ua = UAUtils.removeUPLinkFromUA(ua);
    //direct matches are direct matches. You are home.
    Object strictUA = listOfUAWithDeviceID.get(ua);
    if (strictUA != null) {
        return (String)strictUA;
    }
    //stabilize Vodafone
    if (ua.startsWith("Vodafone")) {
        String result = handleVodafone(ua);
        if (!result.equals("generic")) {
            return result;
        }
    }
    //Nokia
    if (ua.startsWith("Nokia")) {
        String result = applyNokiaHeuristic(ua);
        if (!result.equals("generic")) {
            return result;
        }
    }
    //MOT ("MOT-" string not always in the beginning of the string)
    if (ua.indexOf("MOT-") != -1) {
        String result = applyMotorolaHeuristic(ua);
        if (!result.equals("generic")) {
            return result;
        }
    }
    //Siemens
    if (ua.startsWith("SIE-")) {
        String result = applySiemensHeuristic(ua);
        if (!result.equals("generic")) {
            return result;
        }
    }
    //similarly for other manufacturers families
    //Sharp, Alcatel, Sagem, NEC, Philips
    :
    //Mozilla: could be Microsoft, could be web browsers,
    //could be WebTV or a bunch of PDAs
    if (ua.startsWith("Mozilla/")) {
        String result = applyMozillaHeuristic(ua);
        if (!result.equals("generic")) {
            return result;
        }
    }
    //if everything else fails check for known
    //substrings (UP.Browser4,5,6,Nokia series 40,60,80)
    return UAUtils.lastAttempts(ua);
}
At this point the problem has been pushed down, with a big advantage,
though: it's a bunch of smaller problems. Parsing a given Nokia UA is
much simpler than parsing a UA string that could still be a Motorola too
or a Microsoft Smartphone.
Heuristics could be as simple as:
String applyNokiaHeuristic(String ua) {
    //pass on the char that is needed to understand
    //the minimal number of chars required for a valid match
    String result = matchFromBottom(ua,"/");
    return result; //could still be "generic"
}
At the same time, Mozilla devices might be better served with something
more complicated:
String applyMozillaHeuristic(String ua) {
    if (...figure out if it's a Microsoft device) {
        //might want to apply Levenshtein distance here
    }
    if (...web TV..PDAs...web browser....) {
        //we will need some thinking here too
    }
    return "generic"; //nothing matched
}
Of course, if the API relies on the Spring Framework, it will be
possible for developers to provide new implementations of methods and
classes to override the ones provided with the standard installation.
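One way to make the heuristics replaceable is to hide each one behind a small interface and let the matcher walk an ordered list of them. The sketch below is only an idea of how that could look: UAHeuristic, HeuristicChain and FixedPrefixHeuristic are hypothetical names, and the list is filled by hand here where a Spring configuration file would otherwise do the wiring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: one implementation per family of user agents.
interface UAHeuristic {
    boolean appliesTo(String ua);

    // Returns a device id, or "generic" when no match is found.
    String match(String ua);
}

// Toy implementation used only to exercise the chain below: it claims every
// UA with a given prefix and always answers with the same device id.
class FixedPrefixHeuristic implements UAHeuristic {
    private final String prefix;
    private final String deviceId;

    FixedPrefixHeuristic(String prefix, String deviceId) {
        this.prefix = prefix;
        this.deviceId = deviceId;
    }

    public boolean appliesTo(String ua) {
        return ua.startsWith(prefix);
    }

    public String match(String ua) {
        return deviceId;
    }
}

// Tries the heuristics in order; the first non-"generic" answer wins.
class HeuristicChain {
    private final List<UAHeuristic> heuristics = new ArrayList<>();

    HeuristicChain add(UAHeuristic heuristic) {
        heuristics.add(heuristic);
        return this;
    }

    String getDeviceIDFromUALoose(String ua) {
        if (ua == null || ua.isEmpty()) {
            return "generic";
        }
        for (UAHeuristic heuristic : heuristics) {
            if (heuristic.appliesTo(ua)) {
                String result = heuristic.match(ua);
                if (!"generic".equals(result)) {
                    return result;
                }
            }
        }
        return "generic";
    }
}
```

Replacing the Nokia heuristic then means swapping one entry in the list, without touching the matcher itself.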
Beyond UA Matchers
==================
Matching the User-Agent string does the job, and
the WURFL core team is betting that it still will for a
long time to come.
Having said this, WURFL relies on the assumption that the UA string
uniquely identifies a certain device. This is almost always true,
but there are a handful of exceptions, with MS SmartPhone as the
most notable culprit.
Look at the headers generated by these two devices:
Orange SPV M2000:
HTTP_ACCEPT="*/*"
HTTP_ACCEPT_ENCODING="gzip, deflate"
HTTP_ACCEPT_LANGUAGE="en-us"
HTTP_CONNECTION="Keep-Alive"
HTTP_UA_COLOR="color16"
HTTP_UA_CPU="Intel(R) PXA263"
HTTP_UA_LANGUAGE="JavaScript"
HTTP_UA_OS="Windows CE (Pocket PC) - Version 4.21"
HTTP_UA_PIXELS="240x320"
HTTP_UA_VOICE="TRUE"
HTTP_USER_AGENT="Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"
and HP iPAQ h6330
HTTP_ACCEPT="*/*"
HTTP_ACCEPT_ENCODING="gzip, deflate"
HTTP_CONNECTION="Keep-Alive"
HTTP_UA_COLOR="color16"
HTTP_UA_CPU="TI OMAP1510"
HTTP_UA_LANGUAGE="JavaScript"
HTTP_UA_OS="Windows CE (Pocket PC) - Version 4.20"
HTTP_UA_PIXELS="240x320"
HTTP_UA_VOICE="TRUE"
HTTP_USER_AGENT="Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"
Since the User-Agent is exactly the same, WURFL is unable to tell
one device from the other.
WURFL should be extended to allow the framework to tell two different
devices apart in spite of the fact that they share an identical
UA String.
The basic assumption is that, even when the UA is the same,
some other HTTP header will be different, as in the case above.
WURFL should model those differences in XML and the API should be
enhanced to map HTTP requests to devices accurately.
Since the existing mechanism has worked so well so far, the new
mechanism should be backward compatible, meaning that UA matching
will happen as before unless some extra information is provided.
Today we have something like:
Troubles start when we need to model two devices such as the ones
above (SPV M2000 and HP iPAQ h6330). While we can model it in WURFL, the
fact that the UA string is identical would confuse the API, which would be
unable to tell the two devices apart.
The solution could be something like this
:
:
In other words, for devices with the same UA, WURFL now introduces
a special kind of device which resolves the conflict by looking at
other HTTP headers too.
Notes:
1 - It will be wise to provide a sensible default in case of no match
2 - The first match wins. I.e. let's keep things simple (and leave the default
as the last entry)
3 - the 'logic' attributes should allow 3 or 4 keywords:
contains, equals, startsWith, endsWith
(we can always add fancier stuff later)
4 - There will be an impact on the API (we also need to pass the HTTP Request
object now):
String device_id = uam.getDeviceIDFromUALoose(request);
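The notes above translate fairly directly into code. The following is a minimal sketch, assuming the request headers have already been copied into a map keyed the same way as in the dumps above (HTTP_UA_CPU and so on); HeaderRule, HeaderDisambiguator and the device ids in the usage note are hypothetical, and no XML format is implied.

```java
import java.util.List;
import java.util.Map;

// Hypothetical rule: "if header <header> <logic> <value>, the device is <deviceId>".
class HeaderRule {
    final String header;
    final String logic; // one of: contains, equals, startsWith, endsWith
    final String value;
    final String deviceId;

    HeaderRule(String header, String logic, String value, String deviceId) {
        this.header = header;
        this.logic = logic;
        this.value = value;
        this.deviceId = deviceId;
    }

    boolean matches(Map<String, String> headers) {
        String actual = headers.get(header);
        if (actual == null) {
            return false; // a missing header never matches
        }
        switch (logic) {
            case "contains":
                return actual.contains(value);
            case "equals":
                return actual.equals(value);
            case "startsWith":
                return actual.startsWith(value);
            case "endsWith":
                return actual.endsWith(value);
            default:
                throw new IllegalArgumentException("unknown logic: " + logic);
        }
    }
}

class HeaderDisambiguator {
    // The first matching rule wins; defaultId is used when no rule matches.
    static String resolve(List<HeaderRule> rules, Map<String, String> headers,
                          String defaultId) {
        for (HeaderRule rule : rules) {
            if (rule.matches(headers)) {
                return rule.deviceId;
            }
        }
        return defaultId;
    }
}
```

For the two devices above, one rule on HTTP_UA_CPU containing "PXA263" and another on HTTP_UA_CPU starting with "TI OMAP" would be enough to tell them apart, with the default id covering any other device that sends the same UA.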
Multiple Patch Files
====================
Having one patch file is useful, but why not two? Or three?
After all, we already have the web patch and someone may want
to have some extra patches too.
Separation of WALL/WURFL API
============================
At the time of this writing, WALL and the WURFL API are tightly
coupled: WALL tags directly invoke the WURFL API, which makes it
hard for people to use WALL against a different API or even
a different repository if they wish.
I say 'hard' and not impossible, because some did manage to
use WALL against a different API. They achieved this by rebuilding
the WALL tags against new WURFL API classes.
This being the case, it makes sense to generalize WALL in ways
that the access to the API becomes pluggable.
Of course, this may go hand-in-hand with the introduction
of the Spring Framework in WURFL: the actual API is just a class
you wire in in the Spring conf file.
introduction of logging (Log4J) for Java
========================================
Introducing logging (Log4J) was also evaluated before
the release of WURFL/WALL's latest version.
The reason why it was decided not to use it was performance:
since the library may be used in applications that require
high-performance (operator portals, for example), it was decided
that a commented out 'System.out.print' would be perfect for
the job.
Now it's not like I got huge amounts of complaints about the
lack of logging, but I did get some. Also, I think it would be
more elegant and useful than System.out.print. So, I'll
go for it.
The way I understand it is that there are three ways logging could
be added:
Simple:
_logger.debug("nice message number " + i +
    ", but concatenating strings is expensive");
Less simple, but preserving performance:
if (_logger.isDebugEnabled()) {
    _logger.debug("Makes me wonder why I didn't stick to System.out.print().");
}
Best in terms of performance, but not the simplest:
if (Constants.MYDEBUG) {
    _logger.debug("Still verbose, but the compiler will clean everything.");
    _logger.debug("if statement still a pain in the backside.");
}
Which one do you recommend?
(I wonder if I should make the logger also pluggable using the Spring
framework....).
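The cost difference between the three styles can be made visible with a small experiment. To keep it free of extra jars, the sketch below uses java.util.logging as a stand-in for Log4J (isLoggable(Level.FINE) plays the role of isDebugEnabled()); the class name, the counter and the MYDEBUG constant are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical experiment: counts how often a log message is actually built.
// java.util.logging is used here only as a stand-in for Log4J.
class LoggingCostSketch {
    // Compile-time constant: the compiler can drop code guarded by it.
    static final boolean MYDEBUG = false;

    static int messagesBuilt = 0;

    // Stands for any expensive message construction (string concatenation).
    static String buildMessage(int i) {
        messagesBuilt++;
        return "nice message number " + i;
    }

    // Style 1: the message is built even when the level is disabled.
    static void simple(Logger logger, int i) {
        logger.fine(buildMessage(i));
    }

    // Style 2: the message is built only when the level is enabled.
    static void guarded(Logger logger, int i) {
        if (logger.isLoggable(Level.FINE)) {
            logger.fine(buildMessage(i));
        }
    }

    // Style 3: with MYDEBUG false, nothing is evaluated at run time.
    static void compileTime(Logger logger, int i) {
        if (MYDEBUG) {
            logger.fine(buildMessage(i));
        }
    }
}
```

With the logger level set to INFO, only the first style pays for building the message. The second style adds one cheap check per call, while the third cannot be switched on without recompiling.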
Load wurfl.xml from a zip file
==============================
This is a simple one, but also very useful as the wurfl file gets
increasingly large.
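The JDK already has what is needed for this in java.util.zip. Here is a minimal sketch, assuming the archive contains an entry named wurfl.xml; the class and method names are hypothetical, and writeSampleZip exists only so that the sketch can be tried without a real WURFL archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the existing WURFL API.
class WurflZipLoader {
    // Reads one entry of the zip file fully into memory.
    static byte[] readEntry(File zip, String entryName) {
        try (ZipFile zipFile = new ZipFile(zip)) {
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new IOException(entryName + " not found in " + zip);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary one-entry zip file, only for trying the sketch.
    static File writeSampleZip(String entryName, byte[] content) {
        try {
            File zip = File.createTempFile("wurfl", ".zip");
            zip.deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(content);
                out.closeEntry();
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real API the InputStream would more likely be handed straight to the XML parser while the ZipFile is still open, rather than copied into a byte array first.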
better management for singletons (Spring Framework?)
====================================================
I have been keeping an eye on the Spring Framework in the past few months:
http://www.springframework.org/
My understanding of the framework is that (among other things) Spring is
about placing everything that has to do with initialization
into an XML configuration file. Once that is done, you no longer
need to initialize a thing in your classes: database connections are
there, your dependencies are correct and you find them ready for use
in your classes.
The advantage of this is that your code no longer needs to worry
whether it's running in a servlet or not, or if your connection
to the database is active or not. All of this is taken care of by
the framework.
In the specific case of WURFL, the introduction of Spring would
bring a few advantages:
- users don't have to worry about service initialization
- if implemented from interfaces, all the services become pluggable
- Spring does not seem to be very invasive, which is good.
- by integrating with the Spring Framework, the problem with the WURFL singletons
(which has been bothering many) should be automatically solved
( http://wiki.apache.org/tomcat/OutOfMemory ). I understand that
Spring will initialize and release context resources automatically.
Of course, there is a price to pay:
- having to introduce dependencies and jars to the distribution
(which I always tried to avoid)
- more configuration files to manage, which increases the complexity
and scares newbies away.
Here is how I envision the default spring configuration file:
/WEB-INF/wurfl.xml
/WEB-INF/wurfl_patch.xml
One implication of this is that many WURFL classes may have to be turned
into interfaces, in order to let people provide alternative implementations
that they can plug in (or 'wire in', in Spring lingo) through Spring.
It should not be hard to separate WALL from the WURFL APIs in ways that
a different API (even one that does not talk to WURFL!) can be used.