RDF/XML Parser (v1)

This parser converts RDF/XML into an array of triples. It passes all 128 positive parser tests.

Setup

Simply include the parser class:
include_once("path/to/arc/ARC_rdfxml_parser.php");

Instantiation

The parser can be instantiated with an array of parameters. The following keys are supported (all optional):
  • base
  • bnode_prefix (custom bnode prefix)
  • encoding
  • proxy_host
  • proxy_port
  • user_agent (custom User-Agent string)
  • headers (an array of HTTP headers)
  • save_data (parsed RDF/XML chunks will be stored in a variable during parsing and can be retrieved with get_data())
e.g.
$args = array(
  "bnode_prefix" => "genid",
  "base" => ""
);
$parser = new ARC_rdfxml_parser($args);

Parsing

There are three different methods for parsing:
  • parse_web_file($url)
  • parse_file($path)
  • parse_data($data)
The parse_web_file method sends an "Accept: application/rdf+xml" header and follows up to 4 HTTP redirects. Here is an example for parsing an RDF/XML file from the Web:
$url = "http://www.example.com/data.rdf";
$result = $parser->parse_web_file($url);
if (is_array($result)) {
  echo count($result) . " triples found";
}
else {
  echo "couldn't parse " . $url . ": " . $result;
}
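
Because every parse method returns either an array or an error string, the type check above can be wrapped in a small helper. The following is a minimal sketch; describe_parse_result() is a hypothetical function that is not part of ARC, and the sample values are hand-written stand-ins for real parser output.

```php
<?php
// Hypothetical helper (not part of ARC): turns a parse result into a status message.
// Assumes, as described above, that a successful parse yields an array of triples
// and a failed parse yields an error string.
function describe_parse_result($result, $source) {
  if (is_array($result)) {
    return count($result) . ' triples found in ' . $source;
  }
  return 'could not parse ' . $source . ': ' . $result;
}

// Hand-written sample values standing in for real parser output.
$url = 'http://www.example.com/data.rdf';
echo describe_parse_result(array(array(), array()), $url) . "\n";
echo describe_parse_result('sample error message', $url) . "\n";
```

With a real parser instance, the first argument would be the return value of parse_web_file(), parse_file() or parse_data().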

Triples array structure

The triples array returned by the parser is a flat array of associative arrays. It can be processed with a simple loop:
$triples = $parser->parse_web_file($url);
if (is_array($triples)) { // on failure the parser returns an error string
  for ($i = 0, $i_max = count($triples); $i < $i_max; $i++) {
    $triple = $triples[$i];
    echo 'triple ' . $i . ': ';
    print_r($triple);
  }
}
A single triple is structured as follows ("|" separates alternatives):
$triple = array(
  's' => array(
      'type' => 'uri|bnode', 
      'uri|bnode_id' => '...' // subject value
  ),
  'p' => '...', // property URI
  'o' => array(
      'type' => 'uri|bnode|literal', 
      'uri|bnode_id|val' => '...', // object value
      'dt' => '...', // datatype URI
      'lang' => '...', // language
  )
);
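
To illustrate how this structure can be consumed, here is a minimal sketch that renders one triple as an N-Triples-like line. The functions term_to_string() and triple_to_string() are hypothetical helpers, not part of ARC. The sketch assumes that the value key matches the node type ('uri' for URIs, 'bnode_id' for blank nodes, 'val' for literals) and that 'dt' and 'lang' may be empty or absent. The sample triple is hand-written.

```php
<?php
// Hypothetical helpers (not part of ARC) that render a triple as an
// N-Triples-like line. Assumption: the value key matches the node type,
// i.e. 'uri' for URIs, 'bnode_id' for blank nodes and 'val' for literals.
function term_to_string($term) {
  if ($term['type'] == 'uri') {
    return '<' . $term['uri'] . '>';
  }
  if ($term['type'] == 'bnode') {
    // Assumption: the id may or may not already carry the "_:" prefix.
    $id = $term['bnode_id'];
    return (strpos($id, '_:') === 0) ? $id : '_:' . $id;
  }
  // Literal: escape backslashes, double quotes and newlines.
  $escaped = str_replace(
    array('\\', '"', "\n"),
    array('\\\\', '\\"', '\\n'),
    $term['val']
  );
  $out = '"' . $escaped . '"';
  if (!empty($term['dt'])) {
    return $out . '^^<' . $term['dt'] . '>';
  }
  if (!empty($term['lang'])) {
    return $out . '@' . $term['lang'];
  }
  return $out;
}

function triple_to_string($triple) {
  return term_to_string($triple['s']) . ' <' . $triple['p'] . '> '
    . term_to_string($triple['o']) . ' .';
}

// Hand-written sample triple following the structure described above.
$sample = array(
  's' => array('type' => 'uri', 'uri' => 'http://www.example.com/doc'),
  'p' => 'http://purl.org/dc/elements/1.1/title',
  'o' => array('type' => 'literal', 'val' => 'Example', 'dt' => '', 'lang' => 'en')
);
echo triple_to_string($sample) . "\n";
```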

Methods

set_base($base)
expects a URL for $base.
init()
resets the parser and re-processes the array of parameters that were passed when the parser was instantiated.
parse_web_file($url)
expects the URL of an RDF/XML document for $url and returns an array of triples or an error string. This method considers proxy settings and additional headers.
parse_file($path)
expects the path to an RDF/XML document for $path and returns an array of triples or an error string. $path can be a URL.
parse_data($data)
expects an RDF/XML string for $data and returns an array of triples or an error string.
get_triples()
returns the current array of triples.
get_target_encoding()
returns the value of the parser's target encoding option ("UTF-8", "ISO-8859-1" or "US-ASCII").
get_data()
returns the parsed RDF/XML if the parser was initialized with "save_data" => true.
get_result_headers()
returns an array of HTTP headers if the method parse_web_file() was called.
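
Since get_triples() and the parse methods return a flat array, grouping triples by subject is left to the caller. The following is a minimal sketch of such a grouping; group_by_subject() is a hypothetical helper that is not part of ARC, and it assumes that the subject value is stored under 'uri' or 'bnode_id' depending on its 'type'. The sample triples are hand-written.

```php
<?php
// Hypothetical helper (not part of ARC): indexes a flat triples array by subject.
// Assumption: the subject value is stored under 'uri' or 'bnode_id',
// matching the subject's 'type'.
function group_by_subject($triples) {
  $index = array();
  foreach ($triples as $triple) {
    $s = $triple['s'];
    $key = ($s['type'] == 'bnode') ? $s['bnode_id'] : $s['uri'];
    if (!isset($index[$key])) {
      $index[$key] = array();
    }
    $index[$key][] = $triple;
  }
  return $index;
}

// Hand-written sample triples standing in for the result of get_triples().
$doc = array('type' => 'uri', 'uri' => 'http://www.example.com/doc');
$sample_triples = array(
  array('s' => $doc, 'p' => 'http://purl.org/dc/elements/1.1/title',
        'o' => array('type' => 'literal', 'val' => 'Example')),
  array('s' => $doc, 'p' => 'http://purl.org/dc/elements/1.1/creator',
        'o' => array('type' => 'bnode', 'bnode_id' => 'genid1')),
  array('s' => array('type' => 'bnode', 'bnode_id' => 'genid1'),
        'p' => 'http://xmlns.com/foaf/0.1/name',
        'o' => array('type' => 'literal', 'val' => 'Alice'))
);

foreach (group_by_subject($sample_triples) as $subject => $group) {
  echo $subject . ': ' . count($group) . " triple(s)\n";
}
```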