Mailing list ARC-DEV: Archives

Spurious bnodes in Turtle output

From: Alex Cozzi 
Subject: Spurious bnodes in Turtle output
Date: Wed, 22 Apr 2009 15:27:22 -0700


I am using ARC2 to parse and generate Turtle output, and I observe that
it generates spurious bnodes. Here is the smallest example I could find
that shows the problem.
Given this PHP code:

<html>
<head>
<style>
div.turtle {
     border: 1px;
     font-family: consolas, courier, sans-serif;
     font-size: 8pt;
     white-space: pre;
     background-color: #e5e5e5;
}
</style>
<title>ARC Turtle test</title>
</head>
<body>
<?php
include_once("./arc/ARC2.php");
$parser = ARC2::getSemHTMLParser();
$parser->parse("test.xml");
$parser->extractRDF('rdfa');
$triples = $parser->getTriples();
$serializer = ARC2::getTurtleSerializer();
echo "<h2 align=\"center\">RDF content (Turtle)</h2>";
echo "<div class=\"turtle\">" . htmlspecialchars($serializer->toTurtle($triples)) . "</div>\n";
?>
</body>
</html>

and this test.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns:dc="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<body>
<p rel="dc:subject">
         <span typeof="foaf:name"/>
</p>
</body>
</html>

I get the following Turtle output:

RDF content (Turtle)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ns0: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://purl.org/dc/terms/> .
  _:arcf418b2 rdf:type ns0:name .
<file:///Users/xela/Sites/test.xml> ns1:subject _:arcf418b2 , _:arcf418b1 .

As you can see, two blank nodes were generated, whereas I was
expecting only one.
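
To check the count independently of the serializer, one could collect the
distinct blank node labels straight from the triples array. This is only a
minimal sketch: it assumes each triple is an associative array with 's',
'o', 's_type' and 'o_type' keys and that blank nodes are marked with the
type 'bnode'. The helper name distinctBnodes is made up for illustration,
and the sample triples are written by hand to mirror the Turtle output
above; they are not actual parser output.

```php
<?php
// Hypothetical helper: return the distinct blank node labels found in
// subject or object position of a triples array.
// Assumption: triples are arrays with 's', 'o', 's_type', 'o_type' keys.
function distinctBnodes(array $triples) {
    $seen = array();
    foreach ($triples as $t) {
        foreach (array('s', 'o') as $pos) {
            if (isset($t[$pos . '_type']) && $t[$pos . '_type'] === 'bnode') {
                $seen[$t[$pos]] = true; // label used as key, so repeats collapse
            }
        }
    }
    return array_keys($seen);
}

// Hand-written triples mirroring the Turtle output above.
$triples = array(
    array('s' => '_:arcf418b2', 's_type' => 'bnode',
          'p' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
          'o' => 'http://xmlns.com/foaf/0.1/name', 'o_type' => 'uri'),
    array('s' => 'file:///Users/xela/Sites/test.xml', 's_type' => 'uri',
          'p' => 'http://purl.org/dc/terms/subject',
          'o' => '_:arcf418b2', 'o_type' => 'bnode'),
    array('s' => 'file:///Users/xela/Sites/test.xml', 's_type' => 'uri',
          'p' => 'http://purl.org/dc/terms/subject',
          'o' => '_:arcf418b1', 'o_type' => 'bnode'),
);

echo count(distinctBnodes($triples)), "\n"; // prints 2
```

In the real script the same call could be made on the result of
$parser->getTriples(), provided the array has the shape assumed here.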

For comparison, the pyRDFa distiller generates:

<test.xml> dc:subject
          [ a foaf:name
          ] .