What Does Equality Mean?

One of my colleagues pointed out something interesting in the Java class documentation a few days ago, so I thought I'd make a small demo and write about it. Take a look at the following simple piece of Java code. As you can see, it creates two URL objects and then prints whether they're equal to the console:

import java.net.URL;
import java.net.MalformedURLException;

class Equals {
    public static void main(String[] args) throws MalformedURLException {
        // Two URLs with different host names
        URL url = new URL("http://www.example.com/");
        URL url2 = new URL("http://www.westpoint.ltd.uk/");

        // Print whether Java considers the two URLs equal
        System.out.println(url.equals(url2));
    }
}

Running the code offers no surprises - as expected Java quite correctly says they're not the same:

rich@smithers:~/equals_v1> java Equals
false

Okay, so that wasn't very interesting. Let's try something slightly different. I'll change the URLs so the code reads like this:

        URL url = new URL("http://xmelegance.org/");
        URL url2 = new URL("http://needcoffee.co.uk/");

I didn't make any other changes, but now if we look at the result we see this:

rich@smithers:~/equals_v2> java Equals
true

Err. What's going on? Those URLs are clearly different, but Java thinks they're the same. The answer lies in the line of the documentation my colleague spotted: "Two hosts are considered equivalent if both host names can be resolved into the same IP addresses." As it happens, I know that those two domains are hosted as different vhosts on the same IP address, which explains the result.
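To make the rule concrete, here is a rough sketch of the host comparison as the documentation describes it. It is not the JDK's actual code: the HostCompare class and its resolve and sameHost helpers are hypothetical names invented for this illustration. The fallback to a case-insensitive name comparison follows the next clause of the same documentation sentence, which covers host names that can't be resolved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostCompare {
    // Hypothetical helper: resolve a host name, or return null if it can't be resolved.
    static InetAddress resolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // Hosts are treated as equal if both resolve to the same IP address.
    // If either fails to resolve, compare the names without regard to case.
    static boolean sameHost(String hostA, String hostB) {
        InetAddress a = resolve(hostA);
        InetAddress b = resolve(hostB);
        if (a != null && b != null) {
            return a.equals(b);
        }
        return hostA.equalsIgnoreCase(hostB);
    }

    public static void main(String[] args) {
        // The result depends on what your DNS server says at the time you run it.
        System.out.println(sameHost("xmelegance.org", "needcoffee.co.uk"));
    }
}
```

Nothing in sameHost looks at the text of the two names unless resolution fails, which is why two different-looking hosts can compare as equal.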

We can see this happening if we fire up Ethereal and watch the network as we run the code. Just the act of comparing the two URLs is enough to make Java send off two DNS requests and then wait for the DNS server to reply. If the DNS server doesn't have the answer cached, this can take around 100ms, which is a long time in computing terms.
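If you don't have a packet sniffer to hand, a crude way to see the cost is to time the call. This is only a sketch: the EqualsTiming class and its timeEquals helper are names made up for this example, and the number you get will depend entirely on your network and on whatever your resolver and the JVM have already cached.

```java
import java.net.MalformedURLException;
import java.net.URL;

class EqualsTiming {
    // Hypothetical helper: how long one URL.equals() call takes, in nanoseconds.
    static long timeEquals(String a, String b) {
        try {
            URL first = new URL(a);
            URL second = new URL(b);
            long start = System.nanoTime();
            first.equals(second); // may trigger a DNS lookup for each host
            return System.nanoTime() - start;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL", e);
        }
    }

    public static void main(String[] args) {
        long nanos = timeEquals("http://xmelegance.org/", "http://needcoffee.co.uk/");
        System.out.println("equals() took " + (nanos / 1_000_000.0) + " ms");
    }
}
```

The first run is the interesting one, since later lookups are likely to be answered from a cache.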

Of course, not every system has DNS set up... What do you think happens if we disable it then rerun the exact same code?

rich@smithers:~/equals_v2> java Equals
false

Well, that's confusing! The same code can give either answer depending on whether we have DNS or not. In fact, things are even worse: if we take a site like Google that can return a different IP address each time the DNS is queried, then we can actually see Java tell us that two identical-looking URLs are different.

The implications of this behaviour are important to be aware of (at least if you're writing Java code). The equals() method is not a string comparison as you'd expect. In fact, it makes DNS requests to resolve the hostname in each URL to an IP address. This means that calls to it will often involve at least one network round trip (with all the performance costs that implies).

It also means that if you're writing security-sensitive code then you need to be extremely careful when comparing URLs, because the chances are that using the equals() method will not do what you'd expect. You might want to rewrite such code to use the URI class instead, which has much less surprising behaviour.
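As a minimal sketch of that suggestion, here are the same two addresses compared as URI objects. The UriEquals class and the sameUri helper are hypothetical names for this example. URI.equals() works purely on the parsed components and never touches the network, although it does ignore case in the scheme and host.

```java
import java.net.URI;

class UriEquals {
    // Hypothetical helper: compare two addresses as URIs, with no DNS lookups.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Different hosts on the same IP address are still different URIs.
        System.out.println(sameUri("http://xmelegance.org/", "http://needcoffee.co.uk/")); // false
        // Host names are compared without regard to case.
        System.out.println(sameUri("http://EXAMPLE.com/", "http://example.com/")); // true
    }
}
```

Bear in mind that this is still not a full normalisation: for instance, "http://example.com:80/" and "http://example.com/" are different as far as URI.equals() is concerned, so you may need to canonicalise the inputs yourself before comparing.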

If you want to see more about how Java compares URLs, take a look at the reference documentation for the equals method of URL.
