WURFL Evolution
===============

Luca Passani & Andrea Trasatti

This document describes the evolution of WURFL. There are quite a few features that people are requesting or that simply make sense to have. The purpose of this document is to discuss those features, the APIs to implement them, and how to get them all to play together.

Let us be clear: there is no guarantee that backward compatibility will be preserved. In fact, backward compatibility will most likely be lost.

Some of these new features impact the XML format used in WURFL and, as such, affect the Java and PHP APIs alike (and, of course, all other APIs too). Some of the newly proposed features only apply to Java.

New features
============

- WURFL Modularization
- Test Suite
- Better UA Matchers
- Beyond UA Matchers
- Make UA matchers pluggable (Spring Framework?)
- Multiple patch files
- Separation of WALL/WURFL API
- Introduction of logging (Log4J) for Java
- Let the API load wurfl.xml from a zip file
- Better management for singletons (Spring Framework?)

WURFL Modularization
====================

We are seeing more and more mobile-related technologies show up every year. Invariably, those new technologies encounter 'device fragmentation' as the main obstacle to their success, and, invariably, many developers find in WURFL a solution to their problems.

By now, though, the WURFL file has grown very large. In addition, the fall_back model is getting in the way: the current WURFL hierarchy (originally based on the features of the respective WAP browsers) does not always match what would be an optimal breakdown for other technologies. For example, two devices may have identical browsers, but very different J2ME support. Deriving one from the other is not always the best solution.
While some have suggested introducing some kind of 'multiple inheritance' to address the problem, we feel that this would complicate the WURFL model without any sensible advantage for the 'end developer'. After all, only a few developers need to access different classes of capabilities within one application, and even for those developers WURFL modularization would still be a better option.

WURFL modularization is about splitting WURFL over multiple files, each of which groups together related capability groups. For example, the WAP and markup groups belong nicely together in one module, while J2ME is a strong candidate for a module of its own. The actual breakdown will of course need to be discussed on WMLProgramming, but markups, J2ME and picture rendering are obvious candidates for modules of their own.

One of the advantages of modularization is that it will make it easier for the WURFL core team to delegate the maintenance of groups of capabilities to other individuals (J2ME developers may be more willing to contribute J2ME device information if they don't have to handle the extra complexity of devices and capabilities they do not care about).

From a technology viewpoint, the key point about modularization is that fall_back consistency will only be guaranteed inside each module. This means that looking up a device capability in one module may lead the API to follow a different fall_back route than looking up a capability for the same device in a different module. For example, looking up a given WALL capability for a Nokia6600 may lead to finding the capability in generic_nokia_browser, while looking up a J2ME capability may lead to finding it in generic_series_60, where generic_nokia_browser and generic_series_60 only show up in their respective modules.
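To make the per-module fall_back idea concrete, here is a minimal Java sketch. It is not the WURFL API: `WurflModule`, `whereDefined`, `ModularizationDemo` and the in-memory maps are hypothetical, the `nokia_6600` id and all capability values are illustrative, and only generic_nokia_browser and generic_series_60 come from the example above. Each module keeps its own fall_back tree, so the same device id can resolve through different ancestors depending on the module being queried.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a single WURFL module (not the real API):
// every module carries its own fall_back tree and its own capability values.
class WurflModule {
    private final String name;
    private final Map<String, String> fallBack = new HashMap<>();
    private final Map<String, Map<String, String>> capabilities = new HashMap<>();

    WurflModule(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Registers a device and its fall_back parent inside this module (null = root).
    WurflModule device(String id, String fallBackId) {
        fallBack.put(id, fallBackId);
        return this;
    }

    // Sets a capability value on one device of this module.
    WurflModule capability(String id, String capName, String value) {
        capabilities.computeIfAbsent(id, k -> new HashMap<>()).put(capName, value);
        return this;
    }

    // Follows this module's fall_back chain and returns the id of the first
    // device that defines the capability, or null if no ancestor does.
    // Assumes the fall_back tree has no cycles.
    String whereDefined(String deviceId, String capName) {
        String current = deviceId;
        while (current != null) {
            Map<String, String> caps = capabilities.get(current);
            if (caps != null && caps.containsKey(capName)) {
                return current;
            }
            current = fallBack.get(current);
        }
        return null;
    }

    String getCapability(String deviceId, String capName) {
        String owner = whereDefined(deviceId, capName);
        return owner == null ? null : capabilities.get(owner).get(capName);
    }
}

// Two modules describing the same device with different fall_back routes.
// Ids not named in the text, and all values, are illustrative only.
class ModularizationDemo {
    static WurflModule markup() {
        return new WurflModule("markup")
                .device("generic", null)
                .device("generic_nokia_browser", "generic")
                .device("nokia_6600", "generic_nokia_browser")
                .capability("generic", "preferred_markup", "wml_1_1")
                .capability("generic_nokia_browser", "preferred_markup", "html_wi_oma_xhtmlmp_1_0");
    }

    static WurflModule j2me() {
        return new WurflModule("j2me")
                .device("generic", null)
                .device("generic_series_60", "generic")
                .device("nokia_6600", "generic_series_60")
                .capability("generic", "j2me_midp_2_0", "false")
                .capability("generic_series_60", "j2me_midp_2_0", "true");
    }
}
```

With this data, `ModularizationDemo.markup().whereDefined("nokia_6600", "preferred_markup")` returns generic_nokia_browser, while `ModularizationDemo.j2me().whereDefined("nokia_6600", "j2me_midp_2_0")` returns generic_series_60: same device, two different routes.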
Since this will happen internally, the API to access WURFL will be *almost* unchanged, the only difference being that developers will now need to specify the module when using the API. This could be done at the level of individual methods, by adding the module name as one of the parameters, but there is probably a better way. For example, the singletons that we have today could be turned into normal objects, each created with the knowledge of which module it represents. In other words, if what we have today is:

    UAManager uam = ObjectsManager.getUAManagerInstance();
    CapabilityMatrix cm = ObjectsManager.getCapabilityMatrixInstance();
    String device_id = uam.getDeviceIDFromUALoose(UA);
    String capability_value = cm.getCapabilityForDevice(device_id, "preferred_markup");

with the new Modularization API it will look something like:

    String device_id = uam_markup.getDeviceIDFromUALoose(UA);
    String capability_value = cm_markup.getCapabilityForDevice(device_id, "preferred_markup");

where CapabilityMatrix and UAManager are no longer (necessarily) singletons, but rather simple objects initialized according to the Spring Framework paradigm (i.e. initialization goes into a Spring XML file, and the fully initialized POJOs are injected into your application through methods like these):

    UAManager uam_markup;
    CapabilityMatrix cm_markup;
    // ...
    public void setUAManagerMarkup(UAManager uam) {
        uam_markup = uam;
    }

    public void setCapabilityMatrixMarkup(CapabilityMatrix cm) {
        cm_markup = cm;
    }

This, of course, assumes that a single wurfl.xml file in the system does not cut it anymore. As a minimum, we should have wurfl_markups.xml as the default choice, although we may want to exploit the fact that we need to reorganize a lot of things anyway to get a number of things right (not that what we did so far was wrong, but WURFL's popularity demands that more powerful mechanics be put in place).
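Without Spring, the same injection can be written out by hand, which shows what the Spring XML file would do for us. This is a sketch under assumptions: `UAManager` and `CapabilityMatrix` are declared here as simplified one-method interfaces (the real WURFL classes differ), `MapUAManager`, `MapCapabilityMatrix`, `MarkupAwarePage` and `HandWiring` are hypothetical names, the UA lookup is exact-match only, and the UA and capability values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the types named in the text (not the real WURFL classes).
interface UAManager {
    String getDeviceIDFromUALoose(String ua);
}

interface CapabilityMatrix {
    String getCapabilityForDevice(String deviceId, String capability);
}

// Hypothetical in-memory implementation; one instance per module.
// Exact lookup only: no loose-matching heuristics here.
class MapUAManager implements UAManager {
    private final Map<String, String> uaToId;

    MapUAManager(Map<String, String> uaToId) {
        this.uaToId = uaToId;
    }

    public String getDeviceIDFromUALoose(String ua) {
        return uaToId.getOrDefault(ua, "generic");
    }
}

class MapCapabilityMatrix implements CapabilityMatrix {
    private final Map<String, Map<String, String>> caps;

    MapCapabilityMatrix(Map<String, Map<String, String>> caps) {
        this.caps = caps;
    }

    public String getCapabilityForDevice(String deviceId, String capability) {
        Map<String, String> deviceCaps = caps.get(deviceId);
        return deviceCaps == null ? null : deviceCaps.get(capability);
    }
}

// Application bean: it never builds its collaborators, it only exposes setters.
class MarkupAwarePage {
    private UAManager uam_markup;
    private CapabilityMatrix cm_markup;

    public void setUAManagerMarkup(UAManager uam) { uam_markup = uam; }
    public void setCapabilityMatrixMarkup(CapabilityMatrix cm) { cm_markup = cm; }

    String preferredMarkup(String ua) {
        String deviceId = uam_markup.getDeviceIDFromUALoose(ua);
        return cm_markup.getCapabilityForDevice(deviceId, "preferred_markup");
    }
}

// What a Spring configuration would do declaratively, written out by hand.
class HandWiring {
    static MarkupAwarePage wire() {
        Map<String, String> uas = new HashMap<>();
        uas.put("Nokia6600/1.0", "nokia_6600");

        Map<String, String> nokia = new HashMap<>();
        nokia.put("preferred_markup", "html_wi_oma_xhtmlmp_1_0");
        Map<String, Map<String, String>> caps = new HashMap<>();
        caps.put("nokia_6600", nokia);

        MarkupAwarePage page = new MarkupAwarePage();
        page.setUAManagerMarkup(new MapUAManager(uas));
        page.setCapabilityMatrixMarkup(new MapCapabilityMatrix(caps));
        return page;
    }
}
```

Here `HandWiring.wire().preferredMarkup("Nokia6600/1.0")` returns html_wi_oma_xhtmlmp_1_0. A second bean wired with J2ME-module instances would need no code changes, only different setter arguments, which is the point of moving this wiring into configuration.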
As far as Java is concerned, the Spring Framework seems a wonderful candidate for keeping the new version of the API totally generic while delegating configuration to Spring wiring.

Back to our scenario of applications which in fact need to query different modules, the new API allows for that:

    String dev_id_j2me = uam_j2me.getDeviceIDFromUALoose(UA);
    String j2me_wmapi_1_1 = cm_j2me.getCapabilityForDevice(dev_id_j2me, "j2me_wmapi_1_1");
    String dev_id_markups = uam_markups.getDeviceIDFromUALoose(UA);
    String preferred_markup = cm_markups.getCapabilityForDevice(dev_id_markups, "preferred_markup");
    if ("wml_1_1".equals(preferred_markup) && "false".equals(j2me_wmapi_1_1)) {
        // ...
        // I don't know why you may need this, but I'm sure you do
    }

Test Suite
==========

Once modularization is in place, it is time to organize a long overdue WURFL test suite. People around the planet have the strangest devices in their hands. We need to enable them to check what a device can or can't do with just a few clicks. This requires hosting, and also a mechanism to capture device info through a web interface. So many ideas, so little time.

Better UA Matchers
==================

When there's no exact match in WURFL, the API uses heuristics to find a best match fast. Currently, most heuristics are just simple tricks (e.g. remove "Vodafone/" and retry). The main heuristic currently in use is the one that, given a user-agent which does not match exactly, starts removing one char at a time from the end of the UA and tries to match what is left. So the following (fictional) User-Agent:

    NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-1.0

will be compared with

    NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-1.
    NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-1
    NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC-
    NokiaN-Gage/1.0 (6.4) SymbianOS/6.1 Series60/0.9 Profile/MIDP-1.0 Configuration/CLDC
    :
    NokiaN-Gage/1.0 (6
    NokiaN-Gage/1.0 (

here WURFL stops. A match is found! => nokia_ngage_ver1_sub403

This approach has served the community well for a long time, but by now we know it's not enough. We have found that some devices would be better served by different algorithms. For example, the so-called Levenshtein distance between two strings (the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other) can be used to pick the closest known UA. Many devices that claim to be Internet Explorer would be better served by this strategy. Look at the following UA string, for example:

    Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; PalmSource/Palm-TnT5; Blazer/4.0) 16;320x448

It goes without saying that UA strings like this go a long way towards generating false positives one way or the other.

A solution to this problem is to add one extra level to the UA analysis. For example, if a device UA starts with "Nokia", we could assume that the usual matching gives an optimal match. At the same time, if the UA string contains "Blazer", then we may want to use the Levenshtein distance to match the UA. Here is a snippet of code that gives an idea of what could be done:

    public String getDeviceIDFromUALoose(String ua) {
        // bogus data
        if (ua == null || ua.length() == 0) {
            return "generic";
        }
        // stabilize UA: no gateway substring attachments
        ua = UAUtils.removeUPLinkFromUA(ua);
        // direct matches are direct matches. You are home.
        Object strictUA = listOfUAWithDeviceID.get(ua);
        if (strictUA != null) {
            return (String) strictUA;
        }
        // stabilize Vodafone
        if (ua.startsWith("Vodafone")) {
            String result = handleVodafone(ua);
            if (!result.equals("generic")) {
                return result;
            }
        }
        // Nokia
        if (ua.startsWith("Nokia")) {
            String result = applyNokiaHeuristic(ua);
            if (!result.equals("generic")) {
                return result;
            }
        }
        // MOT ("MOT-" string not always at the beginning of the string)
        if (ua.indexOf("MOT-") != -1) {
            String result = applyMotorolaHeuristic(ua);
            if (!result.equals("generic")) {
                return result;
            }
        }
        // Siemens
        if (ua.startsWith("SIE-")) {
            String result = applySiemensHeuristic(ua);
            if (!result.equals("generic")) {
                return result;
            }
        }
        // similarly for other manufacturer families
        // (Sharp, Alcatel, Sagem, NEC, Philips)
        // ...
        // Mozilla: could be Microsoft, could be web browsers,
        // could be WebTV or a bunch of PDAs
        if (ua.startsWith("Mozilla/")) {
            String result = applyMozillaHeuristic(ua);
            if (!result.equals("generic")) {
                return result;
            }
        }
        // if everything else fails, check for known
        // substrings (UP.Browser 4, 5, 6, Nokia Series 40, 60, 80)
        return UAUtils.lastAttempts(ua);
    }

At this point the problem has been pushed down, with a big advantage, though: it is now a bunch of smaller problems. Parsing a UA that is known to come from a Nokia is much simpler than parsing one that could still be a Motorola or a Microsoft Smartphone. Heuristics could be as simple as:

    String applyNokiaHeuristic(String ua) {
        // pass the char that marks the minimal number of chars
        // required for a valid match
        String result = matchFromBottom(ua, "/");
        return result; // could still be "generic"
    }

At the same time, Mozilla devices might be better served by something more complicated:

    String applyMozillaHeuristic(String ua) {
        if (...figure out if it's a Microsoft device) {
            // might want to apply Levenshtein distance here
        }
        if (...web TV..PDAs...web browser....)
        {
            // we will need some thinking here too
        }
        return "generic"; // nothing conclusive found
    }

Of course, if the API relies on the Spring Framework, developers will be able to provide new implementations of methods and classes that override the ones shipped with the standard installation.

Beyond UA Matchers
==================

Matching the User-Agent string does the job, and the WURFL core team is betting that it still will for a long time to come. Having said this, WURFL relies on the assumption that the UA string uniquely identifies a certain device. This is almost always true, but there are a handful of exceptions, with MS SmartPhone as the most notable culprit. Look at the headers generated by these two devices:

Orange SPV M2000:

    HTTP_ACCEPT="*/*"
    HTTP_ACCEPT_ENCODING="gzip, deflate"
    HTTP_ACCEPT_LANGUAGE="en-us"
    HTTP_CONNECTION="Keep-Alive"
    HTTP_UA_COLOR="color16"
    HTTP_UA_CPU="Intel(R) PXA263"
    HTTP_UA_LANGUAGE="JavaScript"
    HTTP_UA_OS="Windows CE (Pocket PC) - Version 4.21"
    HTTP_UA_PIXELS="240x320"
    HTTP_UA_VOICE="TRUE"
    HTTP_USER_AGENT="Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"

and HP iPAQ h6330:

    HTTP_ACCEPT="*/*"
    HTTP_ACCEPT_ENCODING="gzip, deflate"
    HTTP_CONNECTION="Keep-Alive"
    HTTP_UA_COLOR="color16"
    HTTP_UA_CPU="TI OMAP1510"
    HTTP_UA_LANGUAGE="JavaScript"
    HTTP_UA_OS="Windows CE (Pocket PC) - Version 4.20"
    HTTP_UA_PIXELS="240x320"
    HTTP_UA_VOICE="TRUE"
    HTTP_USER_AGENT="Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"

Since the User-Agent is exactly the same, WURFL is unable to tell one device from the other. WURFL should be extended to allow the framework to tell two different devices apart in spite of the fact that they share an identical UA string. The basic assumption is that, even when the UA is the same, some other HTTP header will be different, as in the case above. WURFL should model those differences in XML, and the API should be enhanced to map HTTP requests to devices accurately.
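On the API side, that mapping could be sketched in Java as follows. This is a minimal sketch under our own assumptions: headers arrive as a plain `Map` of CGI-style names (as in the dumps above) rather than an `HttpServletRequest`, rules are tried in order with the first match winning, a default id covers the no-match case, and `HeaderRule`, `SharedUADevice` and all device ids are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One header test; the Logic values correspond to contains/equals/startsWith/endsWith.
class HeaderRule {
    enum Logic { CONTAINS, EQUALS, STARTS_WITH, ENDS_WITH }

    final String header;
    final Logic logic;
    final String value;
    final String deviceId;

    HeaderRule(String header, Logic logic, String value, String deviceId) {
        this.header = header;
        this.logic = logic;
        this.value = value;
        this.deviceId = deviceId;
    }

    boolean matches(Map<String, String> headers) {
        String actual = headers.get(header);
        if (actual == null) {
            return false; // header missing: this rule cannot match
        }
        switch (logic) {
            case CONTAINS:    return actual.contains(value);
            case EQUALS:      return actual.equals(value);
            case STARTS_WITH: return actual.startsWith(value);
            case ENDS_WITH:   return actual.endsWith(value);
            default:          return false;
        }
    }
}

// All devices sharing one UA string: rules are tried in order, the first
// match wins, and the default id is returned when no rule matches.
class SharedUADevice {
    private final List<HeaderRule> rules = new ArrayList<>();
    private final String defaultDeviceId;

    SharedUADevice(String defaultDeviceId) {
        this.defaultDeviceId = defaultDeviceId;
    }

    SharedUADevice add(HeaderRule rule) {
        rules.add(rule);
        return this;
    }

    String resolve(Map<String, String> headers) {
        for (HeaderRule rule : rules) {
            if (rule.matches(headers)) {
                return rule.deviceId;
            }
        }
        return defaultDeviceId;
    }

    // Helper: builds a header map from name/value pairs.
    static Map<String, String> headers(String... nameValuePairs) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            map.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return map;
    }

    // The two Pocket PCs above, told apart by HTTP_UA_CPU (ids are hypothetical).
    static SharedUADevice pocketPcExample() {
        return new SharedUADevice("generic_ms_pocketpc")
                .add(new HeaderRule("HTTP_UA_CPU", HeaderRule.Logic.CONTAINS, "PXA263", "orange_spv_m2000"))
                .add(new HeaderRule("HTTP_UA_CPU", HeaderRule.Logic.CONTAINS, "OMAP1510", "hp_ipaq_h6330"));
    }
}
```

Keeping the default outside the rule list makes "no match" explicit and keeps plain UA matching untouched for every device that has no rules at all.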
Since the existing mechanism has worked so well so far, the new mechanism should be backward compatible, meaning that UA matching will happen as before unless some extra information is provided. Today we have something like:

Trouble starts when we need to model two devices such as the ones above (SPV M2000 and HP iPAQ h6330). While we can model them in WURFL, the fact that the UA string is identical would confuse the API, which would be unable to tell the two devices apart.

The solution could be something like this:

    :

In other words, in the case of devices with the same UA, WURFL now introduces a special kind of device which resolves the conflict by looking at other HTTP headers too.

Notes:

1 - It will be wise to provide a sensible default in case of no match.
2 - The first match wins, i.e. let's keep things simple (and leave the default as the last entry).
3 - The 'logic' attribute should allow 3 or 4 keywords: contains, equals, startsWith, endsWith (we can always add fancier stuff later).
4 - There will be an impact on the API (we also need to pass the HTTP request object now):

    String device_id = uam.getDeviceIDFromUALoose(request);

Multiple Patch Files
====================

Having one patch file is useful, but why not two? Or three? After all, we already have the web patch, and someone may want to have some extra patches too.

Separation of WALL/WURFL API
============================

At the time of this writing, WALL and the WURFL API are tightly coupled: WALL tags directly invoke the WURFL API, which makes it hard for people to use WALL against a different API, or even a different repository, if they wish. I say 'hard' and not 'impossible', because some did manage to use WALL against a different API. They achieved this by rebuilding the WALL tags against new WURFL API classes. This being the case, it makes sense to generalize WALL so that access to the API becomes pluggable.
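One way to make that access pluggable is to have the tags depend on a small interface rather than on the WURFL classes. The sketch below is hypothetical throughout: `CapabilityProvider`, both providers and `LineBreakTag` are illustrative names, not existing WALL or WURFL classes, the WURFL-backed provider is reduced to a UA-to-markup table, and the tag is only loosely modelled on a WALL line-break tag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface a decoupled WALL could depend on.
interface CapabilityProvider {
    String getCapability(String userAgent, String capability);
}

// Stand-in for a WURFL-backed provider: a UA -> preferred_markup table.
class WurflBackedProvider implements CapabilityProvider {
    private final Map<String, String> markupByUA;

    WurflBackedProvider(Map<String, String> markupByUA) {
        this.markupByUA = new HashMap<>(markupByUA);
    }

    @Override
    public String getCapability(String userAgent, String capability) {
        return "preferred_markup".equals(capability) ? markupByUA.get(userAgent) : null;
    }
}

// A provider that does not talk to WURFL at all.
class FixedMarkupProvider implements CapabilityProvider {
    private final String markup;

    FixedMarkupProvider(String markup) {
        this.markup = markup;
    }

    @Override
    public String getCapability(String userAgent, String capability) {
        return "preferred_markup".equals(capability) ? markup : null;
    }
}

// Simplified tag: it only sees the interface, so any provider can be plugged in.
class LineBreakTag {
    private final CapabilityProvider provider;

    LineBreakTag(CapabilityProvider provider) {
        this.provider = provider;
    }

    String render(String userAgent) {
        String markup = provider.getCapability(userAgent, "preferred_markup");
        // WML and XHTML need a self-closing tag; other or unknown markups get <br>.
        if (markup != null && (markup.startsWith("wml") || markup.contains("xhtml"))) {
            return "<br/>";
        }
        return "<br>";
    }
}
```

Swapping `WurflBackedProvider` for `FixedMarkupProvider` changes where the data comes from without touching the tag code.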
Of course, making the API pluggable may go hand in hand with the introduction of the Spring Framework in WURFL: the actual API is just a class you wire in through the Spring configuration file.

Introduction of logging (Log4J) for Java
========================================

Introducing logging (Log4J) was also evaluated before the release of the latest version of WURFL/WALL. The reason why it was decided not to use it was performance: since the library may be used in applications that require high performance (operator portals, for example), it was decided that a commented-out 'System.out.print' would be perfect for the job.

Now, it's not like I got huge amounts of complaints about the lack of logging, but I did get some. Also, I think logging would be more elegant and useful than System.out.print. So, I'll go for it. The way I understand it, there are three ways logging could be added:

Simple:

    _logger.debug("nice message number " + i + ", but concatenating strings is expensive");

Less simple, but preserving performance:

    if (_logger.isDebugEnabled()) {
        _logger.debug("Makes me wonder why I didn't stick to System.out.print().");
    }

Best in terms of performance, but not the simplest:

    if (Constants.MYDEBUG) {
        _logger.debug("Still verbose, but the compiler will clean everything.");
        _logger.debug("if statement still a pain in the backside.");
    }

The last option only pays off if Constants.MYDEBUG is a static final boolean compile-time constant: when it is false, javac leaves the guarded block out of the bytecode.

Which one do you recommend? (I wonder if I should make the logger pluggable too, using the Spring Framework...)

Load wurfl.xml from a zip file
==============================

This is a simple one, but also very useful as the WURFL file gets increasingly large.

Better management for singletons (Spring Framework?)
====================================================

I have been keeping an eye on the Spring Framework over the past few months: http://www.springframework.org/

My understanding of the framework is that (among other things) Spring is about placing everything that has to do with initialization into an XML configuration file.
Once that is done, you no longer need to initialize anything in your classes: database connections are there, your dependencies are correct, and you find them ready for use. The advantage is that your code no longer needs to worry whether it's running in a servlet or not, or whether its connection to the database is active or not. All of this is taken care of by the framework.

In the specific case of WURFL, the introduction of Spring would bring a few advantages:

- users don't have to worry about service initialization
- if implemented from interfaces, all the services become pluggable
- Spring does not seem to be very invasive, which is good
- by integrating with the Spring Framework, the problem with the WURFL singletons (which has been bothering many) should be automatically solved ( http://wiki.apache.org/tomcat/OutOfMemory ). I understand that Spring will initialize and release context resources automatically.

Of course, there is a price to pay:

- having to introduce dependencies and jars into the distribution (which I always tried to avoid)
- more configuration files to manage, which increases complexity and scares newbies away.

Here is how I envision the default Spring configuration file:

    /WEB-INF/wurfl.xml
    /WEB-INF/wurfl_patch.xml

One implication of this is that many WURFL classes may have to be turned into interfaces, in order to let people provide alternative implementations that they can plug in (or 'wire in', in Spring lingo) through Spring. It should not be hard to separate WALL from the WURFL API so that a different API (even one that does not talk to WURFL!) can be used.