ARC Bugs and Feature Requests

Open Issues

  • Post-consolidation query results sometimes still contain smushed-away resources (suggested fix: split the o2val/s2val tables)

Closed Bugs

  • With MySQL 4.0.x, the temporary tables are not always cleaned up properly, which can lead to long-running processes [fixed, testing]
  • The ID/Value lookup is inefficient and leads to full table scans (by Peter Krantz) [fixed]
  • The SemHTMLParser should replace HTML entities with actual characters [fixed via html_entity_decode, if available]
  • The RDFa Extractor types multi-line plain literals as rdf:XMLLiterals [fixed]
  • The RDFa Extractor creates more triples than wanted [fixed (sort of), removed RDFa from the default extractors]
  • The TableManager class has problems with newer MySQL versions (no default allowed in TEXT) [fixed]
  • Make auto-adding of arc:label triples in DESCRIBE queries a config option (default: off) [fixed]
  • The Turtle Serializer has problems with literals that contain a mix of single and double quotation marks [fixed, testing]
  • RDF Collections are not parsed correctly [fixed]
  • The IRC logger has problems with multiple messages within a single second [fixed]
  • mysql_real_escape_string is sometimes called w/o an active DB connection, which makes it fail [fixed]
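
The Turtle Serializer item above concerns literals that mix single and double quotation marks. As a rough illustration only (this is not ARC's code, and the function name turtle_literal is hypothetical), one common approach is to always emit a double-quoted short literal and escape the backslash, the double quote, and line breaks; single quotes can then pass through untouched:

```python
def turtle_literal(value: str) -> str:
    """Return `value` as a double-quoted (short form) Turtle literal.

    Hypothetical helper for illustration; not part of ARC.
    """
    # Backslash and double quote must be escaped inside "..." literals.
    # Line breaks are not allowed in the short form, so they are escaped too.
    # Single quotes need no escaping when the delimiter is a double quote.
    escapes = {
        "\\": "\\\\",
        '"': '\\"',
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
    }
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'


if __name__ == "__main__":
    print(turtle_literal('He said "it\'s fine"'))
```

Choosing one fixed delimiter avoids having to inspect the literal to decide which quote style is safe, which is where mixed-quote values tend to cause trouble.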

See the release notes for changes in the latest revisions.

Recently implemented

  • Support for predefined namespaces in the SPARQL Parser (by Michael Hausenblas) [done]
  • Easy drop-in directory for SPARQL+ triggers (à la "onLOAD: do foo") (by Dan Brickley) [done]
  • Support for POSTs in the WebReader class (by Morten Frederiksen) [done]
  • A script for file updates via SPARQL INSERT or DELETE (by Tim Berners-Lee) [done: Data Wiki Plugin]
  • An ARC WordPress plugin (by Dan Brickley) [done]
  • An N-Triples parser or an enhanced Turtle parser with support for N-Triples' Unicode escape syntax (by Morten Frederiksen) [done, though the escaped characters are not un-escaped]
  • Support for ARC "plugins" (by Morten Frederiksen) [done]
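
The N-Triples item above notes that escaped characters are not un-escaped. For readers who need that step, here is a minimal sketch of what it could look like (not ARC's implementation; the name unescape_uchars is hypothetical). It assumes only the \uXXXX and \UXXXXXXXX forms should be decoded and leaves every other escape sequence as it is:

```python
import re

# Match an escaped backslash first so that "\\u0041" (a literal backslash
# followed by the text "u0041") is not mistaken for a Unicode escape.
_UCHAR = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchars(text: str) -> str:
    """Decode \\uXXXX and \\UXXXXXXXX escapes; hypothetical helper, not part of ARC."""

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # This was an escaped backslash; keep it unchanged.
            return match.group(0)
        return chr(int(hex_digits, 16))

    return _UCHAR.sub(replace, text)


if __name__ == "__main__":
    print(unescape_uchars(r"caf\u00E9"))
```

Other string escapes (such as \n or \") would need a separate pass; they are deliberately left alone here.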

Feature Requests

  • Split up the triple table space for improved scalability and performance
  • Support for mysqli
  • Speed up the LOADers by consuming ARC's internal structures directly (i.e. w/o the Turtle round-trip) (by Morten Frederiksen)
  • Definition of a default namespace in the RDF/XML Serializer (by Ivan Garcia Tora)
  • Stand-alone microformats extractor(s)
  • In-Memory SPARQLing
  • Add database (not just table) creation to setUp method (by Tuukka Hastrup)