Popular Posts

java.util.uuid FAQ
XPath Engine Comparison
Author Tips

Jan
4th
2005
Tue
permalink

java.util.UUID mini-FAQ

J2SE 1.5 gives us a new class to generate identifiers. Here are some details I found after writing a bit of code and doing some reading.

What is a UUID?

According to Wikipedia:

“…an identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that that identifier will never be unintentionally used by anyone for anything else. Information labelled with UUIDs can therefore be later combined into a single database without need to resolve name conflicts. The most widespread use of this standard is in Microsoft’s Globally Unique Identifiers (GUIDs) which implement this standard.

A UUID is essentially a 16-byte number and in its canonical form a UUID may look like this:

550E8400-E29B-11D4-A716-446655440000

How do I generate a new UUID?

Calling the randomUUID() factory method will give you a fresh instance. Calling toString() will provide a spec-compliant canonical string representation. See the testRandom method in the source below.

I thought these were called GUIDs. What is a UUID?

Per the draft spec referenced below these are synonymous:

…UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
Unique IDentifier).

I’m a Windows guy. Is this the same thing as CoCreateGuid?

According to MSDN the current version of CoCreateGuid also generates a ‘version 4′ random number based UUID.

The CoCreateGuid function calls the RPC function UuidCreate, which creates a GUID, a globally unique 128-bit integer. Use the CoCreateGuid function when you need an absolutely unique number that you will use as a persistent identifier in a distributed environment.To a very high degree of certainty, this function returns a unique value no other invocation, on the same or any other system (networked or not), should return the same value.

For security reasons, it is often desirable to keep ethernet/token ring addresses on networks from becoming available outside a company or organization. In Windows XP/2000, the UuidCreate function generates a UUID that cannot be traced to the ethernet/token ring address of the computer on which it was generated. It also cannot be associated with other UUIDs created on the same computer. If you do not need this level of security, your application can use the UuidCreateSequential function, which behaves exactly as the UuidCreate function does on all other versions of the operating system.

In Windows NT 4.0, Windows 95, DCOM release, and Windows 98, UuidCreate returns RPC_S_UUID_LOCAL_ONLY when the originating computer does not have an ethernet/token ring (IEEE 802.x) address. In this case, the generated UUID is a valid identifier, and is guaranteed to be unique among all UUIDs generated on the computer. However, the possibility exists that another computer without an ethernet/token ring address generated the identical UUID. Therefore you should never use this UUID to identify an object that is not strictly local to your computer. Computers with ethernet/token ring addresses generate UUIDs that are guaranteed to be globally unique.

I thought UUIDs used the MAC address to guarantee uniqueness

The specification defines multiple types of UUIDs. Version 1 UUIDs include a MAC address. Version 4 UUIDs are generated from a large random number and do not include the MAC address. The implementation of java.util.UUID creates version 4 UUIDs. See the testVersion method in the source below.

So java.util.UUID can only generate version 4 UUIDs. Can it parse and handle the other types?

Yes. Using the fromString method you can instantiate and interrogate a UUID of version 1,2,3 or 4. See the testFromString method in the source below; note it’s a version 1 UUID.

The draft spec talks about multiple variant types of UUIDs. Which variant types are supported by java.util.UUID?

As indicated by the javadoc for getVariant(), java.util.UUID only supports variant type 2.

I like the idea of using the MAC address. What makes version 4 UUIDs better?

Using a version 4 UUID could save you 20 months in a United States Federal Prison. Evidently the writer of the Melissa worm was tracked down in part by the MAC address in a UUID.

So these version 4 UUIDs are basically random numbers. Won’t my UUID collide with someone elses?

There are 122 significant bits in a type 4 UUID. 2122 is a *very* large number. Assuming a random distribution of these bits, the probability of collission is *very* low. How is the “randomness” determined?

Under the hood java.util.UUID is creating an instance of SecureRandom and using that to generate new UUIDs. If you are using the default Sun provider and default java.security file, you are using a SHA1PRNG ( Pseudo Random Number Generator based on Secure Hash Algorigthm 1 ) seeded from the operating system.

java.security

#
# Select the source of seed data for SecureRandom. By default an
# attempt is made to use the entropy gathering device specified by
# the securerandom.source property. If an exception occurs when
# accessing the URL then the traditional system/thread activity
# algorithm is used.
#
# On Solaris and Linux systems, if file:/dev/urandom is specified and it
# exists, a special SecureRandom implementation is activated by default.
# This "NativePRNG" reads random bytes directly from /dev/urandom.
#
# On Windows systems, the URLs file:/dev/random and file:/dev/urandom
# enables use of the Microsoft CryptoAPI seed functionality.
#
securerandom.source=file:/dev/urandom

On WinTel platforms this may map down to hardware

More on randomness

I want to preserve backward JDK compatibilty. I’ll just use java.rmi.server.UID. Cool?

Probably a bad idea. Per the docs this gives you 216 significant digits. It is also makes no provision for the system clock being set backward.

What other options do I have?

commons-id is a general purpose identifier generator capable of generating UUIDs.

Where can I read more about UUIDs?

Current Draft Specification

Sample Code

import java.util.UUID;
  import junit.framework.TestCase;
public class UUIDTest extends TestCase {  	
	public void testRandom() { 		
		UUID a = UUID.randomUUID();
		UUID b = UUID.randomUUID();
		assertFalse(a.equals(b));
	}  	

	public void testVariant() { 		
		UUID a = UUID.randomUUID();
		assertEquals(a.variant(), 2);
	}  	

	public void testVersion() { 		
		UUID a1 = UUID.randomUUID();
		assertEquals(a1.version(), 4);
		// here is a version 1 UUID plucked from my own HKLMSoftwareClasses 		
		// for history of UUIDs ending in 444553540000 		
		// http://blogs.msdn.com/oldnewthing/archive/2004/02/11/71307.aspx#71356 		
		UUID a2 = UUID.fromString("d27cdb6e-ae6d-11cf-96b8-444553540000");
		assertEquals(a2.variant(), 2);
		assertEquals(a2.version(), 1);
	}  	

	public void testFromString() { 		
		String s = "d27cdb6e-ae6d-11cf-96b8-444553540000";
		UUID a = UUID.fromString(s);
		assertEquals(a.toString(), s);
	}  	

	public void testVersionFromCommonsTestCase() {  		
		// these UUIDs are from commons-id 		
		UUID v1 = UUID.fromString("3051a8d7-aea7-1801-e0bf-bc539dd60cf3");
		UUID v2 = UUID.fromString("3051a8d7-aea7-2801-e0bf-bc539dd60cf3");
		UUID v3 = UUID.fromString("3051a8d7-aea7-3801-e0bf-bc539dd60cf3");
		UUID v4 = UUID.fromString("3051a8d7-aea7-4801-e0bf-bc539dd60cf3");

		//UUID v5 = UUID.fromString("3051a8d7-aea7-3801-e0bf-bc539dd60cf3");
		assertEquals(1, v1.version());
		assertEquals(2, v2.version());
		assertEquals(3, v3.version());
		assertEquals(4, v4.version());

		// java.util.UUID doesn't support version 5 UUIDs while commons-id does 		
		//assertEquals(5, v5.version());
	} 
}
Comments (View)    
Dec
11th
2004
Sat
permalink

2 new java.util.Collections methods in JDK 1.5

java.util.Collections.frequency provides a useful way to determine number of elements in a collection

java.util.Collections.disjoint facilitates comparison of overlap
in collections

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import junit.framework.TestCase;

public class CollectionMethodsTest extends TestCase {

	public void testDisjointTrue() {
		Set a = new HashSet();
		a.add("zero");
		a.add("un");
		a.add("deux");

		Set b = new HashSet();
		b.add("trois");
		b.add("quatre");
		b.add("cinq");

		// disjoint will return TRUE since there IS NOT overlap
		assertTrue(Collections.disjoint(a, b));
	}

	public void testDisjointFalse() {
		Set a = new HashSet();
		a.add("zero");
		a.add("un");
		a.add("deux"); // both a and b contain this entry
	
		Set b = new HashSet();
		b.add("deux"); // both a and b contain this entry
		b.add("trois");
		b.add("quatre");
		b.add("cinq");

		// disjoint will return FALSE since there IS overlap
		assertFalse(Collections.disjoint(a, b));
	}

	public void testFrequency() {
		List a = new ArrayList();
		a.add("brun");

		a.add("bleu");
		a.add("bleu");

		a.add("rouge");
		a.add("rouge");
		a.add("rouge");

		assertEquals(Collections.frequency(a, "vert"), 0);
		assertEquals(Collections.frequency(a, "brun"), 1);
		assertEquals(Collections.frequency(a, "bleu"), 2);
		assertEquals(Collections.frequency(a, "rouge"), 3);

	}
}
Comments (View)    
permalink

Useful new JDK 1.5 method: java.util.regex.Pattern.quote( String s )

Nosing around with JDK 1.5 changes this evening. Pattern now implements a convenience method quote. Useful to if your application takes a user-entered search string that gets passed into Pattern.compile. No need to worry about a malformed expression or worse yet a regexp-injection exploit.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import junit.framework.TestCase;

public class QuoteTest extends TestCase {

	/**
	 * with JDK 1.5.0
	 * pass the pattern text through Pattern.quote
	 *
	 */
	public void testQuoteFunction() {
		String s = Pattern.quote("[test]");
		Pattern p = Pattern.compile(s);
		Matcher m = p.matcher("[test]");
		assertTrue(m.matches());
	}

	/**
	 * prior to JDK 1.5.0
	 * Q ...everything quoted until... E
	 * could still be problematic if user entered Q or E
	 *
	 */
	public void testQuote() {

		String s = "Q[test]E";
		Pattern p = Pattern.compile(s);
		Matcher m = p.matcher("[test]");
		assertTrue(m.matches());
	}

	/**
	 * prior to JDK 1.5.0
	 * escape each chacter with
	 *
	 */
	public void testCharacterEscape() {
		String s = "[test]";
		Pattern p = Pattern.compile(s);
		Matcher m = p.matcher("[test]");
		assertTrue(m.matches());
	}

}
Comments (View)    
Nov
16th
2004
Tue
permalink

Java XPath 1.0 Engine Comparison - Compliance

To address a couple of questions that have come up: All tests used DOM since it is the most widely supported data model. The latest stable binary releases of each library were employed. At the time of testing this was:

  • Resin 3.0.9
  • Saxon 8.0.1
  • Xalan 2.6.0
  • JXpath 1.2
  • JXP 1.3.5
  • Jaxen 1.0
  • J2SE1.5.0

Compliance Notes

  • J2SE 1.5 and Xalan - Excellent compliance to spec. Oddly, perhaps a little too compliant? If I cranked the iterations up I encountered the effects of this known issue.
  • Jaxen - Overall reasonable compliance. I logged this issue w/substring calls here. Seems to be an issue navigating the ancestor-or-self axis as well; it’s picking up 6 nodes instead of 5.
  • JXP - Version 1.3.3 ( the one I started with ) had some issue navigating the following-sibling axis and some glitches in substring handling. Alexandre and I exchanged some emails and he released 1.3.4 and 1.3.5 to address these concerns. We still have to work through this name(node()) failure and see if I’m doing something wrong.
  • JXPath - I began w/version 1.1 and moved to 1.2. This addressed some issues I encountered. Still an odd problem with the //* and //*[contains(string(.),’Capulet’)] test where it seems there’s a node missing from the result sets. I need to log this in bugzilla.
  • Resin - Failed to seemingly very simple and fundamental tests //speech[speaker = ‘Rom.’] and //speech[speaker = ‘Jul.’]. An issue is logged with caucho, but it has received no attention.
  • Saxon - Some small issues with substring / floor / ceiling. Otherwise excellent compliance. I need to log these issues with the Saxon project.

Summary

  • This work turned into more a compliance test than a performance test. I began with performance questions but quickly shifted to concerns about poor compliance to the XPath 1.0 spec. Each project has it’s own way to perform regression checks, but there is little consistency across libraries. There is clearly a need here for a cross library / platform XPath compliance test to reduce the duplicate effort going on in each project and improve overall compliance levels. OASIS has some tests, but they are coupled to XSLT. This also has me curious about XPath 2.0 and XQuery compliance.
  • There’s a clear lesson as well: Beware of swapping out your Xpath engine without some serious regression tests.
Comments (View)    
Nov
13th
2004
Sat
permalink

Java XPath 1.0 Engine Comparison - Performance

The Tests:

The Results: Each test executed 10x. J2SE 1.5. Intel 2.4Ghz. Results in milliseconds. Grey boxes are compliance failures.

General Performance Observations

  • Navigating the descendant axis is slow - Expressions that included descendant axis traversal tend to be quite slow.
  • Function evaluation is fast - Expressions that include no node traversal were in general quite fast.
  • Explicit path select is fast - Expressions that spell out the path to desired nodes are in general quite fast.

Library Specific Observations

  • J2SE 1.5 and Xalan - These are the same engines. Sun wraps Xalan with a thin facade ( com.sun.org.apache.xpath.internal ). Performance and compliance was good. What you would expect from a mature library.
  • Jaxen - Slowest of the bunch. Mostly attributable to descendant axis traversal. Hope? http://jira.werken.com/browse/JAXEN-31
  • JXP - Speed is it’s proposed value proposition. While it’s not the slowest, it’s not the fastest either. Perhaps the time sink is in node retrieval and not traversal ( sum(//act/@id) took 15ms but //* took 2797ms )
  • JXPath - Reasonable performance. Does not like ancestor-or-self::*.
  • Resin - Wicked fast under certain circumstances ( //* in 63ms ! ). Extremely slow under others ( //line | //speaker in 5265ms ! ).
  • Saxon - Good performance. following-sibling traversal a bit slow.

Some take aways:

  • Measure your expression execution early and often; don’t assume.
  • Don’t get lazy. Avoid descendant navigation if you know the path to the nodes you are looking for.
  • Function evaluation is fast. Don’t be afraid to use them. Don’t overlook your XPath engine if your app needs simple expression evaluation.

Sidebar: Seems like a cross library / platform performance test suite could be created. A java specific harness could also be crafted to save some double effort and promote performant implementations.

Comments (View)    
permalink

Java XPath 1.0 Engine Comparison

First post. Be kind.

Earlier in the year, I wrapped up a project that made extensive use of Jaxen XPath 1.0 expressions evaluated against JDOM documents. After completing the project I began to explore JDOM’s ability to plug in additional XPath engines; which led me to wonder:

Why Jaxen and not one of the other engines out there?

I did a bit of googling and located these XPath 1.0 implementations:

How do the implementations differ in terms of:

  • Performance
  • Compliance to the spec
  • Support for variable binding
  • Support for precompiled expressions
  • Namespace handling
  • Object graph support ( DOM, JDOM, beans, …)
  • key()/id() support
  • Ability to modify object graphs
  • Pointer support
  • Custom function support
  • Extensibility
  • Project activity
  • Implementation details
  • Licensing

I put together a small suite of expressions and spent some time with each of the libraries mentioned. I’ll post my findings as time permits in subsequent blog entries. If there is any interest, I can post the source to the performance and compliance test harness.

Comments (View)