Mailing list ARC-DEV: Archives

Re: [arc-dev] LOADing DBpedia URIs with escaped chars (followup of twitter convo)

From: "Patrick Murray-John" 
Subject: Re: [arc-dev] LOADing DBpedia URIs with escaped chars
	(followup of twitter convo)
Date: Sun, 24 May 2009 17:53:59 -0400


Benji,

Many thanks...I had sorta discounted that possibility, but looks like it's
something to consider after all.  I'll be happy to send something on to
the dbpedia folks if ya want.  Indeed, could definitely get annoying on
that level!

Thanks (and hope you're recovering from your trip happily!),
Patrick

>>> Benjamin Nowack <bnowack@semsol.com> 05/24/09 1:27 PM >>>


Hi Patrick,

ARC's HTTP reader does the RDF discovery for you, but it does not
convert URIrefs in any way. DBpedia 303-redirects to
"...(learning_theory).xml", where it only serves one triple. If the
303 contained "...%28learning_theory%29.xml", ARC would have found the
full data, which is served there. Not sure if that's a bug in ARC, but
it looks like the DBpedia server should provide the full data at both
URLs. Might be worth asking the dbpedia folks, I can send a mail to the
LOD list. If HTTP clients have to normalize URIrefs, then I'm not sure if
I'll be able to get that right on a general level. I could %xx-escape
special characters, but then there are also IRIrefs, which might be fine
with "(" and ")". Annoying...
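
As a rough illustration of the %xx-escaping idea above, here is a minimal
Python sketch (not ARC's code, which is PHP). The helper name `escape_path`
is made up for this example, and it assumes that every path character
outside the unreserved set may be encoded, which is exactly the assumption
that may not hold for IRIrefs.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path(uri: str) -> str:
    """Percent-encode special characters in the path of a URI.

    Hypothetical helper for illustration only. Existing %xx escapes and
    path separators are left alone, so the function is safe to apply twice.
    Caveat: non-ASCII characters are also encoded (as UTF-8 escapes), which
    may be unwanted for IRIrefs.
    """
    parts = urlsplit(uri)
    # safe="/%" keeps "/" and already-escaped sequences untouched;
    # "(" and ")" are not in quote()'s always-safe set, so they become %28/%29.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


redirect_target = "http://dbpedia.org/data/Constructivism_(learning_theory).xml"
print(escape_path(redirect_target))
```

Applied to the redirect target from this thread, the sketch turns
"...(learning_theory).xml" into "...%28learning_theory%29.xml" and leaves
an already escaped URL unchanged.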

Best,
Benji

Patrick Murray-John wrote:
>
> Benji,
>
> Thanks for responding about loading DBpedia URIs.  I'm using the latest
> version, and from the endpoint am trying to do this LOAD:
>
> LOAD <http://dbpedia.org/resource/Constructivism_%28learning_theory%29>
>
> The interesting thing is it is returning just one triple, which uses the
> unescaped parens:
>
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix ns0: <http://dbpedia.org/class/yago/> .
>
> <http://dbpedia.org/resource/Constructivism_(learning_theory)> rdf:type ns0:PsychologicalTheories
>
>
> This leads me into areas beyond my understanding, but here are some
> results from trying to chase things down with curl:
>
>
> // Request the resource as RDF/XML and follow DBpedia's redirect.
> $ch = curl_init('http://dbpedia.org/resource/Constructivism_%28learning_theory%29');
> curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/rdf+xml'));
> curl_setopt($ch, CURLOPT_HEADER, 1);            // include response headers in the output
> curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // follow the 303 redirect
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the response instead of printing it
> $data = curl_exec($ch);
> curl_close($ch);
> print $data;
>
> and the $data is:
>
> HTTP/1.1 303 See Other
> Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64  VDB
> Connection: close
> Date: Sun, 24 May 2009 14:50:21 GMT
> Accept-Ranges: bytes
> TCN: choice
> Vary: negotiate,accept
> Content-Location: Constructivism_(learning_theory).xml
> Content-Type: application/rdf+xml; qs=0.95
> Location: http://dbpedia.org/data/Constructivism_(learning_theory).xml
> Content-Length: 0
>
> HTTP/1.1 200 OK
> Server: Virtuoso/05.11.3039 (Solaris) x86_64-sun-solaris2.10-64  VDB
> Connection: Keep-Alive
> Date: Sun, 24 May 2009 14:50:21 GMT
> Accept-Ranges: bytes
> Content-Type: application/rdf+xml; charset=UTF-8
> Content-Length: 354
>
> <?xml version="1.0" encoding="utf-8" ?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
> <rdf:Description rdf:about="http://dbpedia.org/resource/Constructivism_(learning_theory)"><rdf:type rdf:resource="http://dbpedia.org/class/yago/PsychologicalTheories"/></rdf:Description>
> </rdf:RDF>
>
> Same result using curl from terminal.
> (The real data I'm trying to get to has many triples.)
>
> I don't have a good understanding of HTTP, but this redirect to the xml
> file with unescaped parens makes me wonder if there's something I'm
> missing in the way PHP and/or cURL is configured on my laptop, because it
> looks like the data returned from LOAD in the ARC endpoint is the same as
> the data from a straight cURL, bypassing ARC altogether.
>
> Any thoughts are much appreciated!
>
> Patrick

