URI handling in RDF/XML generation
From: "Stuckman, Jeffrey C."
Subject: URI handling in RDF/XML generation
Date: Mon, 23 Nov 2009 19:01:52 -0500
Hello,
I have discovered a problem in ARC2 that causes RDF/XML generation to fail in some cases.
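To generate RDF/XML, namespace prefixes must be inferred from the URIs that have to be written as XML element names (property URIs, and class URIs for typed nodes), because RDF/XML can only represent those URIs as qualified names. In ARC2, this logic appears to be handled by the splitURI() function.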
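To illustrate why the split matters, here is a minimal sketch using a hypothetical helper (propertyElementTag is not part of ARC2). It assumes the splitter returns either a [namespace, localName] pair or a one-element array when no split is found. Only a pair whose local name is a valid XML name can become an element; otherwise the serializer has nothing to write.

```php
<?php
// Hypothetical helper, not part of ARC2: turn a [namespace, localName] pair,
// as returned by a URI splitter, into the opening tag of an RDF/XML property
// element. Returns null when the pair cannot be used as an element name.
function propertyElementTag(array $parts, string $prefix = 'ns0'): ?string
{
    if (count($parts) !== 2) {
        return null; // the URI was not split, so there is no local name
    }
    [$ns, $local] = $parts;
    // Conservative ASCII-only check for a colon-free XML name.
    if (!preg_match('/^[A-Z_a-z][-A-Z_a-z0-9.]*$/', $local)) {
        return null;
    }
    $nsAttr = htmlspecialchars($ns, ENT_QUOTES); // escape for use in an attribute
    return "<{$prefix}:{$local} xmlns:{$prefix}=\"{$nsAttr}\">";
}

echo propertyElementTag(['http://xmlns.com/foaf/0.1/', 'name']), "\n";
// prints <ns0:name xmlns:ns0="http://xmlns.com/foaf/0.1/">
var_dump(propertyElementTag(['urn:example:foo']));               // NULL: no split
var_dump(propertyElementTag(['http://example.org/ns/', '123'])); // NULL: bad name
```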
Because of the way splitURI() is written, it fails if the incoming URI does not contain any forward slashes or hash marks. Such URIs are legal under the URI specification (RFC 3986) and the RDF/XML specification. splitURI() also fails when the part after the last slash or hash contains characters, in certain positions, that are illegal in XML element names (for example, a local part that starts with a digit).
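Both failure modes can be reproduced with just the regular expression quoted below. This sketch wraps that single statement in a hypothetical free function, oldSplitURI; the rest of the original method is elided in this message and is not reproduced.

```php
<?php
// Hypothetical wrapper around the first statement of ARC2's splitURI() as
// quoted in this message; the remainder of that method is not reproduced.
function oldSplitURI(string $v): array
{
    return preg_match('/^(.*[\/\#])([^\/\#]+)$/', $v, $m) ? array($m[1], $m[2]) : array($v);
}

// No '/' or '#': the pattern cannot match, so the URI comes back unsplit,
// i.e. ['urn:example:foo'].
print_r(oldSplitURI('urn:example:foo'));

// The split succeeds, but the local part "123" cannot start an XML element name.
print_r(oldSplitURI('http://example.org/ns/123'));
```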
The offending lines in splitURI() are here:
function splitURI($v) {
  $parts = preg_match('/^(.*[\/\#])([^\/\#]+)$/', $v, $m) ? array($m[1], $m[2]) : array($v);
  ...
To fix the problem locally, I replaced the entire splitURI() method with the following:
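function splitURI($uri) {
  // Split into (namespace, local name): the local name is the longest trailing
  // run that starts with a letter or '_' and continues with letters, digits,
  // '-', '_' or '.'; if no such run exists, the URI is returned unsplit.
  return preg_match('|^(.*?)([A-Z_a-z][-A-Z_a-z0-9.]*)$|S', $uri, $m) ? array($m[1], $m[2]) : array($uri);
}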
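For comparison, the same URIs run through the replacement. This sketch copies the patched function above but declares it as a free function with type hints so it runs standalone (in ARC2 it is a class method).

```php
<?php
// The replacement splitURI() from this message, as a free function.
function splitURI(string $uri): array
{
    // The lazy (.*?) keeps the namespace as short as possible, so the local
    // name is the longest valid trailing run; the 'S' (study) modifier is
    // only an optimisation hint.
    return preg_match('|^(.*?)([A-Z_a-z][-A-Z_a-z0-9.]*)$|S', $uri, $m)
        ? array($m[1], $m[2])
        : array($uri);
}

print_r(splitURI('urn:example:foo'));              // ['urn:example:', 'foo']
print_r(splitURI('http://example.org/ns/123abc')); // ['http://example.org/ns/123', 'abc']
print_r(splitURI('http://example.org/ns#Person')); // ['http://example.org/ns#', 'Person']
```

Note that the pattern accepts only ASCII name characters, which is stricter than XML names allow, and a URI such as urn:isbn:0451450523 (no trailing run starting with a letter or underscore) is still returned unsplit.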
Applying my fix allowed my RDF/XML to be generated successfully, and splitURI() now behaves like the equivalent method in Jena.
You will notice that my patch omits the logic in the second half of the method, because I was unable to figure out what that code is supposed to do. If that logic needs to be preserved, I believe you can keep the old code in the method and simply replace the old regular expression with my modified one.
Can one of the developers update splitURI() to include my patch?
Thanks,
Jeff