Mailing list ARC-DEV: Archives

From: Will Daniels 
Subject: Restricting Endpoint Graphs?
Date: Thu, 10 Sep 2009 00:06:01 +0300


Hi Benji et al.,

I want to run a public SPARQL endpoint (for SIOC data) on my site.
However, there are some triples that I want to keep in the same store
but not make available to all and sundry via the endpoint (basically
people's email addresses and perhaps also IP addresses).

Having scoured the ARC documentation, I could not find any existing
mechanism for this other than partitioning the data into separate
stores. But I do not want to do that, because it would make things
unnecessarily awkward for my application code. So I have been trying to
think of a simpler solution.

Now, filtering out individual triple patterns did not seem a sensible
way to go, so as a first step I have isolated the "private" data in its
own graph. The natural requirement, then, is a way to exclude that graph
from endpoint queries.
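For illustration, excluding a graph amounts to allowing every other
graph, provided the full list of graph URIs in the store is known. A
minimal PHP sketch of that equivalence; public_graphs() and the
example.org URIs are hypothetical and not part of ARC2:

```php
<?php
// Hypothetical helper, not part of ARC2: express "exclude these graphs"
// as "allow every other graph", given the full list of graph URIs.
// How $allGraphs is obtained from the store is left open here.
function public_graphs(array $allGraphs, array $privateGraphs) {
  // array_diff() preserves keys, so re-index the result as a plain list.
  return array_values(array_diff($allGraphs, $privateGraphs));
}

// Example with made-up graph URIs.
$all = array(
  'http://example.org/graphs/sioc',
  'http://example.org/graphs/private',
  'http://example.org/graphs/foaf',
);
$public = public_graphs($all, array('http://example.org/graphs/private'));
echo implode("\n", $public), "\n";
// prints the sioc and foaf graph URIs, one per line
```

One drawback of the exclusion form: any graph added to the store later
becomes queryable automatically unless the private list is kept up to
date.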

However, on looking at the ARC2_StoreEndpoint class, I think the
opposite approach would actually be easier: restrict the endpoint to
querying a single "public" graph instead of excluding a "private" one.
It seems (and a quick test appears to confirm the theory) that all I
would then need to do is rewrite the dataset in adjustQueryInfos,
something like this:

  function adjustQueryInfos($infos) {
    /* limit */
    if ($max_l = $this->v('endpoint_max_limit', 0, $this->a)) {
      if ($this->v('limit', $max_l + 1, $infos['query']) > $max_l) {
        $infos['query']['limit'] = $max_l;
      }
    }
    /* >>>>> wgd: begin change */
    /* new option: restrict the endpoint to a single graph, ignoring any
       default-graph-uri / named-graph-uri the client sent */
    if ($restrict = $this->v('endpoint_single_graph', '', $this->a)) {
      $ngs = array($restrict);
      $dgs = array();
    }
    else {
      /* default-graph-uri / named-graph-uri */
      $dgs = $this->p('default-graph-uri', '', 1);
      $ngs = $this->p('named-graph-uri', '', 1);
    }
    /* <<<<< wgd: end change */
    if (count(array_merge($dgs, $ngs))) {
      $ds = array();
      foreach ($dgs as $g) {
.....
    return $infos;
  }

(NB: You could alternatively restrict to multiple graphs this way too.)
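A rough, standalone PHP sketch of the multi-graph variant, assuming a
whitelist of public graph URIs. restrict_graphs() and the example.org
URIs are hypothetical and not part of ARC2; inside adjustQueryInfos the
result would stand in for array($restrict). It never returns an empty
list, because with empty $dgs and $ngs the count(array_merge($dgs, $ngs))
check in adjustQueryInfos is false and the dataset would presumably not
be overridden at all:

```php
<?php
// Hypothetical helper, not part of ARC2: given the graph URIs a client
// asked for (e.g. via default-graph-uri / named-graph-uri) and a
// whitelist of public graphs, return the graphs the query may use.
function restrict_graphs(array $requested, array $allowed) {
  // Keep only requested graphs that are on the whitelist.
  $kept = array_values(array_intersect($requested, $allowed));
  // Never return an empty list: fall back to the whole whitelist so the
  // caller always has a non-empty dataset to force onto the query.
  return count($kept) ? $kept : array_values($allowed);
}

// Example with made-up graph URIs.
$allowed = array(
  'http://example.org/graphs/sioc',
  'http://example.org/graphs/foaf',
);
echo implode(', ', restrict_graphs(array(), $allowed)), "\n";
// prints both whitelisted URIs
echo implode(', ', restrict_graphs(
  array('http://example.org/graphs/foaf', 'http://example.org/graphs/private'),
  $allowed)), "\n";
// prints only http://example.org/graphs/foaf
```

Falling back to the whole whitelist still means answering a different
query from the one submitted; refusing such requests outright would be
the stricter alternative.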

But I'm not familiar with the specifications governing SPARQL endpoints,
so I can't tell whether making the endpoint behave like this (i.e.
potentially answering a different query from the one submitted) would be
a problem.

Furthermore, I'm anxious to hear if anybody has better suggestions for
how to deal with the requirement in the first place.

Your thoughts please...?

Cheers,
Will