Re: [arc-dev] erdf parser
From: Robert Goené
Date: Fri, 16 May 2008 15:17:29 +0200
Hi Benjamin,
Thanks! I think the 'or' rule should be implemented as the spec
suggests: if there is a title, use it as the object; otherwise use the
content value.
Could you tell me how you would implement it? I would like to
understand the parser a bit better and use it right away. I was
thinking of adjusting the following block:
    /* imgs */
    if ($n['tag'] == 'img') {
      /* emit an rdfs:label triple for the image, using the current literal value */
      if (($s = $this->v('src iri', '', $n['a'])) && $ct['cur_obj_literal']['val']) {
        $t = array(
          's' => $s,
          's_type' => 'iri',
          'p' => $ct['ns']['rdfs'] . 'label',
          'o' => $ct['cur_obj_literal']['val'],
          'o_type' => 'literal',
          /* a typed literal carries no language tag */
          'o_lang' => $ct['cur_obj_literal']['dt'] ? '' : $ct['cur_obj_literal']['lang'],
          'o_dt' => $ct['cur_obj_literal']['dt'],
        );
        $this->addT($t);
      }
    }
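For reference, the 'or' rule for anchors can be sketched as a stand-alone helper. This is only an illustration under stated assumptions, not ARC2 code: the function name anchorLabelTriple, the plain attribute map, and the assumption that href is already an absolute IRI are all hypothetical. It returns a triple array in the same shape as the snippet above, preferring @title and falling back to the element's string content.

```php
<?php
// Hypothetical helper (not part of ARC2): applies the eRDF 'or' rule to one anchor.
// $attrs   : assumed attribute map, e.g. array('href' => '...', 'title' => '...')
// $content : assumed to be the string-value of the anchor element's content
function anchorLabelTriple(array $attrs, $content, $rdfsNs = 'http://www.w3.org/2000/01/rdf-schema#') {
  // The subject is derived from @href; this sketch assumes it is already absolute.
  $href = isset($attrs['href']) ? trim($attrs['href']) : '';
  if ($href === '') {
    return null; // no subject, so no triple
  }
  // 'or' rule: use @title if present and non-empty, else the element content.
  $title = isset($attrs['title']) ? trim($attrs['title']) : '';
  $label = ($title !== '') ? $title : trim($content);
  if ($label === '') {
    return null; // nothing to use as a label
  }
  return array(
    's' => $href,
    's_type' => 'iri',
    'p' => $rdfsNs . 'label',
    'o' => $label,
    'o_type' => 'literal',
    'o_lang' => '',
    'o_dt' => '',
  );
}
```

Calling it with array('href' => 'http://example.org/', 'title' => 'Example') and the content 'click here' would yield a label of 'Example'; dropping the title from the map would yield 'click here' instead.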
Thanks in advance!
On 16 May 2008, at 11:10, Benjamin Nowack wrote:
> Hi Robert,
>
> I *think* I had support for anchors in an earlier stand-alone
> eRDF parser, but forgot to implement them when I switched to the
> extractor approach. Anyway, I'll add the label generation in the
> next rev. Generating two triples per anchor would need more work
> as the "current literal value" is generated in a separate method
> that prioritizes @title over plain node content. (And some people
> would possibly complain about triple bloat.) You'd probably have
> to write a dedicated (@title-ignoring) extractor for anchors.
>
> Thanks for spotting this!
>
> Cheers,
> Benji
>
> --
> Benjamin Nowack
> http://bnode.org/
>
> On 15.05.2008 23:37:13, Robert Goené wrote:
>> Hi!
>>
>> I am using ARC2's eRDF parser extensively and keep discovering new
>> and useful ways of using embedded RDF in plain HTML.
>>
>> I keep running into the following issue: the parsing of anchor
>> elements does not conform to the specification. The title
>> attribute or the element's content should be added as rdfs:label
>> values. Without this feature, we cannot represent our everyday use
>> of links with RDF triples.
>>
>> The eRDF summary states the following:
>>
>> "In addition, anchors generate triples with:
>>
>> * a subject URI derived from the href attribute
>> * a predicate of rdfs:label
>> * a literal value equal to the value of the 'title' attribute
>>   if present, or the string-value of the anchor element's content if
>>   not."
>> http://research.talis.com/2005/erdf/wiki/Main/SummaryOfTripleProductionRules
>>
>> I would even say that the title and the element's content should BOTH
>> produce rdfs:label values, as both are ways of describing the
>> hyperlink.
>>
>> WYT?
>>
>> Regards, Robert Goené