Mailing list ARC-DEV: Archives

URI handling in RDF/XML generation

From: "Stuckman, Jeffrey C." 
Subject: URI handling in RDF/XML generation
Date: Mon, 23 Nov 2009 19:01:52 -0500


Hello,

I have discovered a problem in ARC2 that causes RDF/XML generation to fail
in some cases.

To generate RDF/XML, prefixes (or namespaces) must be inferred from the
subject URIs in order to represent subjects in RDF/XML. In ARC2, this logic
is apparently handled by the splitURI() function.

Due to the way the splitURI() function is written, it will fail if the
incoming URI does not contain any forward slashes or hash marks. Such URIs
are legal per the RDF/XML specification and the URI RFC. The splitURI()
function will also fail if the URI contains other characters in certain
positions that are illegal in XML element names.

The offending lines in splitURI() are here:

  function splitURI($v) {
    $parts = preg_match('/^(.*[\/\#])([^\/\#]+)$/', $v, $m) ? array($m[1], $m[2]) : array($v);
...

To fix the problem locally, I replaced the entire splitURI() method with
the following:

    function splitURI($uri) {
        return preg_match('|^(.*?)([A-Z_a-z][-A-Z_a-z0-9.]*)$|S', $uri, $m)
            ? array($m[1], $m[2])
            : array($uri);
    }

Applying my fix allowed my RDF/XML to be generated successfully, and the
behavior of splitURI() now matches that of the equivalent method in Jena.
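
As a rough illustration (not part of ARC2), the following standalone script
runs both regular expressions against a few sample URIs. The function names
splitUriOld() and splitUriNew() and the example URIs are made up for this
sketch; only the two regular expressions are taken from the code above.

```php
<?php
// Hypothetical wrapper around the original ARC2 expression:
// the local name is everything after the last '/' or '#'.
function splitUriOld($v) {
    return preg_match('/^(.*[\/\#])([^\/\#]+)$/', $v, $m)
        ? array($m[1], $m[2])
        : array($v);
}

// Hypothetical wrapper around the patched expression:
// the local name is the longest trailing run that starts with a letter
// or underscore and contains only letters, digits, '-', '_' and '.'.
function splitUriNew($uri) {
    return preg_match('|^(.*?)([A-Z_a-z][-A-Z_a-z0-9.]*)$|S', $uri, $m)
        ? array($m[1], $m[2])
        : array($uri);
}

// Sample URIs invented for this sketch.
$samples = array(
    'http://example.org/ns#name',        // ordinary hash namespace
    'urn:example:thing',                 // no '/' or '#' at all
    'http://example.org/vocab/2009name', // last segment starts with a digit
    'http://example.org/item/123',       // last segment is digits only
);

foreach ($samples as $uri) {
    echo $uri, "\n";
    echo '  old: ', implode(' | ', splitUriOld($uri)), "\n";
    echo '  new: ', implode(' | ', splitUriNew($uri)), "\n";
}
```

In this sketch the old expression leaves 'urn:example:thing' unsplit, while
the new one splits it into 'urn:example:' and 'thing'. For the URI ending in
'2009name', the old expression yields the local name '2009name', which cannot
be used as an XML element name, whereas the new one yields 'name'. A URI whose
tail contains no letter or underscore, such as the last sample, is still
returned unsplit by the new expression, so callers presumably still need to
handle the one-element result.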

You will notice that my patch omits the logic in the second half of the
method, because I was unable to figure out what that code is supposed to do.
If this logic needs to be preserved, I believe you can do so by keeping the
old code in the method and replacing only the old regular expression with my
modified one.

Can one of the developers update splitURI() to include my patch?

Thanks,
Jeff