Setup
An ARC Store is instantiated like any other component:
/* ARC2 static class inclusion */
include_once('path/to/arc/ARC2.php');
/* configuration */
$config = array(
/* db */
'db_host' => 'localhost', /* optional, default is localhost */
'db_name' => 'my_db',
'db_user' => 'user',
'db_pwd' => 'secret',
/* store name (= table prefix) */
'store_name' => 'my_store',
);
/* instantiation */
$store = ARC2::getStore($config);
Creating the MySQL tables
if (!$store->isSetUp()) {
$store->setUp();
}
Running Queries
$q = 'SELECT ...';
$rs = $store->query($q);
if (!$store->getErrors()) {
$rows = $rs['result']['rows'];
...
}
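The error check above can be wrapped in a small helper. The following is a minimal sketch, not part of ARC: select_rows() is a hypothetical name, and it relies only on the query() and getErrors() calls shown above. It assumes that getErrors() returns an array of messages that is empty when nothing went wrong.

```php
<?php
/* Hypothetical helper (not an ARC method): run a SELECT and return its rows,
   or throw if the store reported errors. $store is expected to be the object
   returned by ARC2::getStore($config). */
function select_rows($store, $q) {
  $rs = $store->query($q);
  $errors = $store->getErrors();
  if ($errors) {
    /* assumption: getErrors() returns an array of error messages */
    throw new RuntimeException('Query failed: ' . implode('; ', (array) $errors));
  }
  return $rs['result']['rows'];
}
```

Calling select_rows($store, $q) should then yield the same $rows as the snippet above, with failures surfacing as exceptions instead of being silently skipped.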
ARC supports standard SPARQL queries as well as SPARQL+ for write operations.
Result formats
By default, the query() method returns an associative array with two keys: "query_time" and "result". The former tells how long the SPARQL engine needed to process the query (excluding parse time); the latter contains query-dependent sub-structures. The query() method also accepts a second parameter to specify a result format. Examples are listed below:
-
query('SELECT ?fname ...')
$duration = $rs['query_time'];
$rows = $rs['result']['rows'];
$row = $rows[0];
$val = $row['fname'];
-
query('SELECT ?fname ...', 'rows')
-
query('SELECT ?fname ...', 'row')
-
query('ASK ...')
$duration = $rs['query_time'];
$true_or_false = $rs['result'];
-
query('ASK ...', 'raw')
-
query('DESCRIBE <http://example.com/>...')
$duration = $rs['query_time'];
$index = $rs['result'];
$res = $index['http://example.com/'];
The index format is described in Internal Structures.
-
query('DESCRIBE <http://example.com/>...', 'raw')
-
query('CONSTRUCT ...') works analogously to DESCRIBE
-
query('LOAD ...')
$duration = $rs['query_time'];
$added_triples = $rs['result']['t_count'];
$load_time = $rs['result']['load_time'];
$index_update_time = $rs['result']['index_update_time'];
-
query('LOAD ...', 'raw')
$added_triples = $rs['t_count'];
$load_time = $rs['load_time'];
$index_update_time = $rs['index_update_time'];
-
query('INSERT ...') works analogously to LOAD
-
query('DELETE ...')
$duration = $rs['query_time'];
$removed_triples = $rs['result']['t_count'];
$delete_time = $rs['result']['delete_time'];
$index_update_time = $rs['result']['index_update_time'];
-
query('DELETE ...', 'raw')
$removed_triples = $rs['t_count'];
$delete_time = $rs['delete_time'];
$index_update_time = $rs['index_update_time'];
-
query('DUMP') creates (and outputs) a store backup (see the dump() method below); the result format parameter has no effect.
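How the optional formats relate to the default structure can be sketched with plain arrays. This is a hypothetical illustration, not ARC code: unwrap_result() is an invented name and the sample data is made up. The 'raw' case follows the LOAD and DELETE examples above, where the "result" sub-structure is returned directly. The 'rows' and 'row' cases are assumptions (the row list, and the first row or an empty array).

```php
<?php
/* Hypothetical illustration, not ARC code: derive what each result format
   would hand back from the default result structure. */
function unwrap_result($rs, $format = '') {
  if ($format === 'raw') {
    return $rs['result'];               /* drop the query_time wrapper */
  }
  $rows = isset($rs['result']['rows']) ? $rs['result']['rows'] : array();
  if ($format === 'rows') {
    return $rows;                       /* assumption: the list of rows only */
  }
  if ($format === 'row') {
    return $rows ? $rows[0] : array();  /* assumption: first row, or empty array */
  }
  return $rs;                           /* default: query_time + result */
}

/* made-up sample in the shape of a default SELECT result */
$rs = array(
  'query_time' => 0.01,
  'result' => array('rows' => array(array('fname' => 'Alice'), array('fname' => 'Bob'))),
);
$row = unwrap_result($rs, 'row');
echo $row['fname'], "\n"; /* prints "Alice" */
```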
Advanced query parameters
Besides query and result_format, the query() method accepts two other parameters: query_base and keep_bnode_ids.
- "query_base" (parameter #3, default: empty) allows you to specify a base for the query (e.g. if the query contains relative paths, but no BASE).
- "keep_bnode_ids" (parameter #4, default: false) is an advanced trigger that enables deletes and updates of blank nodes. ARC supports bnode identification for read operations, i.e. bnode IDs returned by a SELECT can be used in successive queries, if masked as URIs (e.g. <_:bn27>). Likewise, ARC can be told to write bnodes to the store without changing their IDs:
$q1 = 'DELETE FROM <...> { <_:methuselah> ex:age ?age . }';
$q2 = 'INSERT INTO <...> { <_:methuselah> ex:age 969 . }';
$store->query($q1, 'raw', '', true);
$store->query($q2, 'raw', '', true);
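Masking a bnode ID as a URI, as described above, amounts to wrapping it in angle brackets. A minimal sketch, assuming IDs arrive in the "_:name" form; mask_bnode() is a hypothetical helper, not an ARC method, and the prefix declaration for ex: is left out as in the example above.

```php
<?php
/* Hypothetical helper (not an ARC method): wrap a bnode ID such as "_:bn27"
   in angle brackets so it can be written like a URI in a follow-up query. */
function mask_bnode($id) {
  if (strpos($id, '_:') !== 0) {
    throw new InvalidArgumentException('Not a bnode ID: ' . $id);
  }
  return '<' . $id . '>';
}

/* build a follow-up query around an ID returned by an earlier SELECT */
$q = 'SELECT ?age WHERE { ' . mask_bnode('_:bn27') . ' ex:age ?age . }';
```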
Other methods
-
reset()
All tables are emptied.
-
drop()
All tables are deleted.
-
insert($doc, $g, $keep_bnode_ids = 0)
A convenience method. $doc can be an ARC structure or a document in an ARC-supported RDF format (including HTML); $g is the target graph URI; $keep_bnode_ids is explained in the section above.
-
dump()
Creates a SPOG document from all quads in the store. This method can be used for streamed store backups.
-
createBackup($path, $q = '')
Saves a SPOG file that either contains a complete store dump, or triples/quads from a custom, SPO(G)-compliant SELECT query (via the $q parameter).
-
replicateTo($name)
Creates a new store and replicates all tables and quads to it.
-
renameTo($name)
Renames the store's underlying database tables.
-
optimizeTables($level = 2) /* 1: triple + g2t, 2: triple + *2val, 3: all tables */
Defragments the MySQL tables. This method is automatically called every ~50th LOAD or DELETE query. You can also call it explicitly, though, when queries are getting slower than they should due to store updates.
-
extendColumns()
Changes the table column types from MEDIUMINT to INT for scaling beyond 16M triples. Called automatically by the RDF loader.
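Some of these methods combine naturally, for example saving a backup before emptying a store. The following is a sketch under stated assumptions: backup_and_reset() is a hypothetical name, and the code assumes that a failed createBackup() surfaces through getErrors(), which the text above does not confirm.

```php
<?php
/* Hypothetical helper (not an ARC method): save a backup file, then empty
   the store, but only if the backup step reported no errors. */
function backup_and_reset($store, $path) {
  $store->createBackup($path);
  /* assumption: backup problems are reported via getErrors() */
  if ($store->getErrors()) {
    return false;  /* keep the data if the backup may be incomplete */
  }
  $store->reset();
  return true;
}
```

It may be worth confirming that the file at $path exists and is non-empty before relying on this in practice.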
Advanced configuration options
-
store_indexes (default: array('sp (s,p)', 'os (o,s)', 'po (p,o)'))
Custom MySQL triple table indexes.
-
store_write_buffer (default: 2500)
This option lets you set the batch size of triples written to the MySQL tables via SQL.
-
store_engine_type (default: MyISAM)
This option lets you set the MySQL engine type used by ARC, in case your application environment works better with InnoDB, or maybe even MEMORY.
-
store_strip_mb_comp_str (default: false)
If you encounter UTF-8/multibyte-related MySQL errors on your system during INSERTs or LOADs, you can try setting this flag to "1". Multibyte comparisons may then return inaccurate results, but the errors should go away.
-
max_errors (default: 25)
This option lets you set the maximum number of errors before ARC stops processing (e.g. during LOADs or streaming parsing).
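A configuration that uses some of these options might look like the sketch below. It assumes the advanced options go into the same $config array as the database settings from the Setup section; the values are placeholders, not recommendations.

```php
<?php
/* Example configuration: basic settings plus a few advanced options.
   All values are placeholders chosen for illustration. */
$config = array(
  /* db */
  'db_name' => 'my_db',
  'db_user' => 'user',
  'db_pwd' => 'secret',
  /* store name (= table prefix) */
  'store_name' => 'my_store',
  /* advanced options (assumed to live in the same array) */
  'store_engine_type' => 'InnoDB',
  'store_write_buffer' => 5000,
  'max_errors' => 100,
);
/* the store would then be created as in the Setup section:
   $store = ARC2::getStore($config); */
```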
Querying remote SPARQL endpoints
ARC provides a dedicated "RemoteStore" component for running queries against Web-accessible SPARQL endpoints.