Mailing list ARC-DEV: Archives

Re: [arc-dev] SPARQL OPTIONAL Odd Behaviour?

From: Will Daniels 
Subject: Re: [arc-dev] SPARQL OPTIONAL Odd Behaviour?
Date: Wed, 25 Mar 2009 15:01:05 +0200


O_o patches! Now there's an invitation I can never refuse ;)

It just so happens I have a day off today too...if I can just get ARC to 
COALESCE bindings on the same variable from sibling OPTIONALs I guess 
that would suit? I hope it doesn't turn out to be more complicated 
than that because my head is quite fuzzy today :D
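
For reference, here is a rough sketch of what COALESCE-ing the bindings of 
sibling OPTIONALs could look like at the SQL level. It is not ARC's code: 
the single `triple` table with plain-text s/p/o columns and the function 
name are assumptions made up for illustration (ARC's real schema, with 
jos_rdf_triple and jos_rdf_g2t, is more involved). The second LEFT JOIN 
only matches when the first left ?version unbound or agrees with it, and 
COALESCE then picks whichever side supplied a value.

```python
import sqlite3


def coalesce_sibling_optionals(triples):
    """Emulate  ?id a owl:Ontology .
                OPTIONAL { ?id owl:versionInfo ?version }
                OPTIONAL { ?id dc:date ?version }
    on a hypothetical one-table triple store (not ARC's schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    rows = con.execute(
        """
        SELECT t.s AS id, COALESCE(v1.o, v2.o) AS version
        FROM triple t
        LEFT JOIN triple v1
               ON v1.s = t.s AND v1.p = 'owl:versionInfo'
        LEFT JOIN triple v2
               ON v2.s = t.s AND v2.p = 'dc:date'
              -- the second OPTIONAL may only extend a compatible solution
              AND (v1.o IS NULL OR v2.o = v1.o)
        WHERE t.p = 'rdf:type' AND t.o = 'owl:Ontology'
        ORDER BY id, version
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    foaf = "http://xmlns.com/foaf/0.1/"
    data = [
        (foaf, "rdf:type", "owl:Ontology"),
        (foaf, "dc:date", "$Date: 2007-06-16 23:18:26 $"),
    ]
    # owl:versionInfo is absent here, so ?version comes from dc:date
    print(coalesce_sibling_optionals(data))
```

When both properties are present with different values, the row keeps the 
owl:versionInfo value, because the dc:date binding conflicts with it and so 
does not extend the solution.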

Cheers,
Will


Benjamin Nowack wrote:
> Hi Will,
>
> Yes, you're right, sibling optionals should be fixed, although ARC's 
> approach of de-normalizing graphs from the triples makes the SQL 
> generation often trickier than I would have hoped it'd be. Patches are
> welcome ;)
>
> I'm not sure if it works as expected, but for the time being, you 
> *could* perhaps try something along these lines:
>
> SELECT * FROM <urn:/test/optional> WHERE {
>   ?id a owl:Ontology . 
>   OPTIONAL { 
>     ?id ?version_p ?version . 
>     FILTER(?version_p = owl:versionInfo || ?version_p = dc:date)
>   }
> }
>
> This would at least decrease the LEFT JOIN dependencies that ARC often
> gets wrong.
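
The workaround above boils down to one LEFT JOIN whose condition accepts 
either predicate. A minimal sketch of that shape follows, again on a 
made-up single `triple` table rather than ARC's actual schema; the table, 
column and function names are all hypothetical.

```python
import sqlite3


def versions_single_join(triples):
    """One LEFT JOIN with the predicate filter folded into the ON clause,
    roughly OPTIONAL { ?id ?version_p ?version . FILTER(...) }."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    rows = con.execute(
        """
        SELECT t.s AS id, v.p AS version_p, v.o AS version
        FROM triple t
        LEFT JOIN triple v
               ON v.s = t.s
              AND v.p IN ('owl:versionInfo', 'dc:date')
        WHERE t.p = 'rdf:type' AND t.o = 'owl:Ontology'
        ORDER BY id, version_p
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    foaf = "http://xmlns.com/foaf/0.1/"
    print(versions_single_join([
        (foaf, "rdf:type", "owl:Ontology"),
        (foaf, "dc:date", "$Date: 2007-06-16 23:18:26 $"),
    ]))
```

Note that this is not equivalent to two sibling OPTIONALs: an ontology that 
carries both properties comes back as two rows, one per predicate.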
>
> HTH, and thx for the feedback,
> Benji
>
> --
> Benjamin Nowack
> http://bnode.org/
> http://semsol.com/
>
> On 25.03.2009 01:27:24, Will Daniels wrote:
>   
>> Hi Benji,
>>
>> Thanks for the prompt reply :) I think the relevant part of the spec is 6.1:
>>
>> "In an optional match, either the optional graph pattern matches a 
>> graph, thereby defining and adding bindings to one or more solutions, or 
>> it leaves a solution unchanged without adding any additional bindings."
>>
>> To my mind, "either" does not permit doing both here, and this also seems 
>> most logical to me. But I'll certainly raise it on the W3C list for 
>> clarification since it does not say explicitly that the OPTIONAL pattern 
>> can/should not add additional *solutions* :P
>>
>> Anyway, I started digging into ARC to see what it is doing, and I 
>> started to see what you mean about the difficulty of implementing this 
>> in a single query:
>>
>> SELECT ...vars... FROM jos_rdf_triple
>> JOIN jos_rdf_g2t ...named graph...
>> LEFT JOIN jos_rdf_triple ...optional dc:date...
>> LEFT JOIN jos_rdf_g2t ...optional named graph...
>> WHERE ...a owl:Ontology...
>>
>> UNION ALL
>>
>> SELECT ...vars... FROM jos_rdf_triple
>> JOIN jos_rdf_g2t ...named graph...
>> LEFT JOIN jos_rdf_triple ...optional owl:versionInfo...
>> LEFT JOIN jos_rdf_g2t ...optional named graph...
>> WHERE ...a owl:Ontology...
>>
>> My immediate reaction was that perhaps a better way of doing UNIONs 
>> would be to map them into the join condition (as ORs) in a single [LEFT] 
>> JOIN for the group graph pattern. But then I realised that it would not 
>> work with all the other stuff like GraphGraphPatterns that are allowed 
>> in GroupGraphPattern needing to join g2t...so without using nested 
>> SELECTs I think you are right, that it would have to be a "Won't Fix" in 
>> the case that this deviates from the spec :(
>>
>> However, I almost forgot why I had written that query in the first 
>> place. I was actually going for, in the first instance, something more like:
>>
>> SELECT * FROM <urn:/test/optional> WHERE
>> { ?id a owl:Ontology . OPTIONAL { ?id owl:versionInfo ?version } . 
>> OPTIONAL { ?id dc:date ?version } }
>>
>> And that was the first error that I found (this one is definitely wrong) 
>> in ARC's mapping of SPARQL to the regular relational algebra of SQL, in 
>> that an unbound variable in the first OPTIONAL pattern here results in 
>> an unbound variable in the solution, rather than the correct RDF 
>> relational semantics whereby only a join *conflict* to the left prevents 
>> the second (or subsequent right-side OPTIONAL patterns) from binding 
>> ?version in the result. This one we should be able to fix I think!?
>>
>> Best regards,
>> Will
>>
>>
>>
>> Benjamin Nowack wrote:
>>     
>>> Hi Will,
>>>
>>> To be honest, I'm not sure if it's wrong or right. ARC tries to map 
>>> SPARQL to a single SQL query based on (My)SQL's relational algebra.
>>> This is not always possible, and may sometimes lead to unexpected 
>>> results. Putting a UNION into an OPTIONAL sounds like a good candidate
>>> for fuzzy results. Might be worth asking on public-sparql-dev [1] what
>>> the correct results should look like, I'd be interested as well. If
>>> it's wrong, however, it'll most likely be a "Won't fix" in ARC's 
>>> SQL-based processor where OPTIONALs are simply translated to LEFT 
>>> JOINs.
>>>
>>> Regards,
>>> Benji
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-sparql-dev/
>>>
>>> --
>>> Benjamin Nowack
>>> http://bnode.org/
>>> http://semsol.com/
>>>
>>>
>>> On 24.03.2009 01:54:57, Will Daniels wrote:
>>>   
>>>       
>>>> Hello!
>>>>
>>>> I'm finding some behaviour in ARC2's SPARQL implementation that doesn't 
>>>> look quite right to me. In certain formulations, an OPTIONAL pattern 
>>>> appears to cause duplication in the results such that where the optional 
>>>> pattern matches, I get two solutions, one extended with the optional 
>>>> variable, and one without it.
>>>>
>>>> Take for example:
>>>>
>>>>  LOAD <http://xmlns.com/foaf/0.1/> INTO <urn:/test/optional>
>>>>
>>>> Then run the query:
>>>>
>>>>  PREFIX dc: <http://purl.org/dc/elements/1.1/>
>>>>  PREFIX owl: <http://www.w3.org/2002/07/owl#>
>>>>
>>>>  SELECT * FROM <urn:/test/optional> WHERE
>>>>  { ?id a owl:Ontology . OPTIONAL { { ?id dc:date ?version } UNION { ?id 
>>>> owl:versionInfo ?version } } }
>>>>
>>>> You get:
>>>>
>>>>  0 =>
>>>>    array (
>>>>      'id' => 'http://xmlns.com/foaf/0.1/',
>>>>      'id type' => 'uri',
>>>>      'version' => '$Date: 2007-06-16 23:18:26 $',
>>>>      'version type' => 'literal',
>>>>    ),
>>>>  1 =>
>>>>    array (
>>>>      'id' => 'http://xmlns.com/foaf/0.1/',
>>>>      'id type' => 'uri',
>>>>    ),
>>>>
>>>> It seems that the unbound solution { ?id owl:versionInfo ?version } from 
>>>> the alternative UNION pattern is still being used to extend the 
>>>> solution, which to my interpretation of the spec is not right. I tried 
>>>> this out in Virtuoso before raising the issue here, and Virtuoso seems 
>>>> to agree with me...I only get the one solution.
>>>>
>>>> Or is there something I have misunderstood about it all?
>>>>
>>>> Thanks,
>>>> Will
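
To make the expected result concrete, here is a small sketch of the SPARQL 
algebra operators involved (LeftJoin for OPTIONAL, Union for UNION), 
evaluated over in-memory tuples. It is only an illustration of the 
semantics, not ARC's or Virtuoso's implementation; the helper names and the 
two-triple data set are assumptions, the latter modelled on the FOAF result 
shown above, where no owl:versionInfo appears to be present.

```python
def match(triples, pattern):
    """Solutions for one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            solutions.append(binding)
    return solutions


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[var] == val for var, val in a.items() if var in b)


def union(left, right):
    return left + right


def left_join(left, right):
    """OPTIONAL: extend each left solution where possible, else keep it as-is."""
    out = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extended or [a])
    return out


def optional_union(triples):
    """?id a owl:Ontology . OPTIONAL { {?id dc:date ?version} UNION {?id owl:versionInfo ?version} }"""
    ontologies = match(triples, ("?id", "rdf:type", "owl:Ontology"))
    dates = match(triples, ("?id", "dc:date", "?version"))
    infos = match(triples, ("?id", "owl:versionInfo", "?version"))
    return left_join(ontologies, union(dates, infos))


def sibling_optionals(triples):
    """?id a owl:Ontology . OPTIONAL {?id owl:versionInfo ?version} OPTIONAL {?id dc:date ?version}"""
    ontologies = match(triples, ("?id", "rdf:type", "owl:Ontology"))
    infos = match(triples, ("?id", "owl:versionInfo", "?version"))
    dates = match(triples, ("?id", "dc:date", "?version"))
    return left_join(left_join(ontologies, infos), dates)


if __name__ == "__main__":
    foaf = "http://xmlns.com/foaf/0.1/"
    data = [
        (foaf, "rdf:type", "owl:Ontology"),
        (foaf, "dc:date", "$Date: 2007-06-16 23:18:26 $"),
    ]
    print(optional_union(data))     # a single solution with ?version bound
    print(sibling_optionals(data))  # likewise a single solution
```

Under these definitions the empty branch of the UNION contributes no 
solutions at all, so it cannot produce a second, unextended row.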


""" ;
         ns1:xSpamStatus """No, hits=-3.1 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE 
	autolearn=ham version=2.64