Re: [arc-dev] SPARQL OPTIONAL Odd Behaviour?
From: Will Daniels
Subject: Re: [arc-dev] SPARQL OPTIONAL Odd Behaviour?
Date: Wed, 25 Mar 2009 15:01:05 +0200
This is a multi-part message in MIME format.
--------------060108010709060106060801
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
O_o patches! Now there's an invitation I can never refuse ;)
It just so happens I have a day off today too...if I can just get ARC to
COALESCE bindings on the same variable from sibling OPTIONALs, I guess
that would suit? I hope it doesn't turn out to be more complicated
than that because my head is quite fuzzy today :D
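
A minimal sketch of the COALESCE idea, run against SQLite from Python rather
than ARC's MySQL backend. The single `triple` table, its column names and the
`ex:` sample data are assumptions made up for illustration; this is not ARC's
actual schema or the SQL it generates. The extra condition on the second LEFT
JOIN is one possible way to express "only bind if there is no conflict to the
left".

```python
import sqlite3

# Hypothetical, simplified schema: one table of (s, p, o) strings.
# ARC's real triple/graph tables are more involved than this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("ex:a", "rdf:type", "owl:Ontology"),
        ("ex:a", "dc:date", "2007-06-16"),       # only dc:date present
        ("ex:b", "rdf:type", "owl:Ontology"),
        ("ex:b", "owl:versionInfo", "1.0"),      # both present
        ("ex:b", "dc:date", "2009-01-01"),
        ("ex:c", "rdf:type", "owl:Ontology"),    # neither present
    ],
)

# Two sibling OPTIONALs binding the same variable, as two LEFT JOINs.
# The second join only matches when the first left ?version unbound
# (or bound it to the same value); COALESCE then picks the binding.
SQL = """
SELECT t.s AS id, COALESCE(v1.o, v2.o) AS version
FROM triple t
LEFT JOIN triple v1
       ON v1.s = t.s AND v1.p = 'owl:versionInfo'
LEFT JOIN triple v2
       ON v2.s = t.s AND v2.p = 'dc:date'
      AND (v1.o IS NULL OR v1.o = v2.o)
WHERE t.p = 'rdf:type' AND t.o = 'owl:Ontology'
ORDER BY t.s
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

With this sample data, `ex:a` picks up its dc:date, `ex:b` keeps its
owl:versionInfo, and `ex:c` stays unbound (NULL).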
Cheers,
Will
Benjamin Nowack wrote:
> Hi Will,
>
> Yes, you're right, sibling optionals should be fixed, although ARC's
> approach of de-normalizing graphs from the triples makes the SQL
> generation often trickier than I would have hoped it'd be. Patches are
> welcome ;)
>
> I'm not sure if it works as expected, but for the time being, you
> *could* perhaps try something along these lines:
>
> SELECT * FROM <urn:/test/optional> WHERE {
> ?id a owl:Ontology .
> OPTIONAL {
> ?id ?version_p ?version .
> FILTER(?version_p = owl:versionInfo || ?version_p = dc:date)
> }
> }
>
> This would at least decrease the LEFT JOIN dependencies that ARC often
> gets wrong.
>
> HTH, and thx for the feedback,
> Benji
>
> --
> Benjamin Nowack
> http://bnode.org/
> http://semsol.com/
>
> On 25.03.2009 01:27:24, Will Daniels wrote:
>
>> Hi Benji,
>>
>> Thanks for the prompt reply :) I think the relevant part of the spec is 6.1:
>>
>> "In an optional match, either the optional graph pattern matches a
>> graph, thereby defining and adding bindings to one or more solutions, or
>> it leaves a solution unchanged without adding any additional bindings."
>>
>> To my mind, "either" does not permit doing both here, and this also seems
>> most logical to me. But I'll certainly raise it on the W3C list for
>> clarification, since it does not say explicitly that the OPTIONAL pattern
>> can/should not add additional *solutions* :P
>>
>> Anyway, I started digging into ARC to see what it is doing, and I
>> started to see what you mean about the difficulty of implementing this
>> in a single query:
>>
>> SELECT ...vars... FROM jos_rdf_triple
>> JOIN jos_rdf_g2t ...named graph...
>> LEFT JOIN jos_rdf_triple ...optional dc:date...
>> LEFT JOIN jos_rdf_g2t ...optional named graph...
>> WHERE ...a owl:Ontology...
>>
>> UNION ALL
>>
>> SELECT ...vars... FROM jos_rdf_triple
>> JOIN jos_rdf_g2t ...named graph...
>> LEFT JOIN jos_rdf_triple ...optional owl:versionInfo...
>> LEFT JOIN jos_rdf_g2t ...optional named graph...
>> WHERE ...a owl:Ontology...
>>
>> My immediate reaction was that perhaps a better way of doing UNIONs
>> would be to map them into the join condition (as ORs) in a single [LEFT]
>> JOIN for the group graph pattern. But then I realised that it would not
>> work with all the other stuff like GraphGraphPatterns that are allowed
>> in GroupGraphPattern needing to join g2t...so without using nested
>> SELECTs I think you are right, that it would have to be a "Won't Fix" in
>> the case that this deviates from the spec :(
>>
>> However, I almost forgot why I had written that query in the first
>> place. I was actually going for, in the first instance, something more like:
>>
>> SELECT * FROM <urn:/test/optional> WHERE
>> { ?id a owl:Ontology . OPTIONAL { ?id owl:versionInfo ?version } .
>> OPTIONAL { ?id dc:date ?version } }
>>
>> And that was the first error that I found (this one is definitely wrong)
>> in ARC's mapping of SPARQL to the regular relational algebra of SQL: an
>> unbound variable in the first OPTIONAL pattern here results in an
>> unbound variable in the solution. Under the correct RDF relational
>> semantics, only a join *conflict* to the left prevents the second (or
>> any subsequent right-side) OPTIONAL pattern from binding ?version in
>> the result. This one we should be able to fix I think!?
>>
>> Best regards,
>> Will
>>
>>
>>
>> Benjamin Nowack wrote:
>>
>>> Hi Will,
>>>
>>> To be honest, I'm not sure if it's wrong or right. ARC tries to map
>>> SPARQL to a single SQL query based on (My)SQL's relational algebra.
>>> This is not always possible, and may sometimes lead to unexpected
>>> results. Putting a UNION into an OPTIONAL sounds like a good candidate
>>> for fuzzy results. Might be worth asking on public-sparql-dev [1] what
>>> the correct results should look like, I'd be interested as well. If
>>> it's wrong, however, it'll most likely be a "Won't fix" in ARC's
>>> SQL-based processor where OPTIONALs are simply translated to LEFT
>>> JOINs.
>>>
>>> Regards,
>>> Benji
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-sparql-dev/
>>>
>>> --
>>> Benjamin Nowack
>>> http://bnode.org/
>>> http://semsol.com/
>>>
>>>
>>> On 24.03.2009 01:54:57, Will Daniels wrote:
>>>
>>>
>>>> Hello!
>>>>
>>>> I'm finding some behaviour in ARC2's SPARQL implementation that doesn't
>>>> look quite right to me. In certain formulations, an OPTIONAL pattern
>>>> appears to cause duplication in the results such that where the optional
>>>> pattern matches, I get two solutions, one extended with the optional
>>>> variable, and one without it.
>>>>
>>>> Take for example:
>>>>
>>>> LOAD <http://xmlns.com/foaf/0.1/> INTO <urn:/test/optional>
>>>>
>>>> Then run the query:
>>>>
>>>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>>>> PREFIX owl: <http://www.w3.org/2002/07/owl#>
>>>>
>>>> SELECT * FROM <urn:/test/optional> WHERE
>>>> { ?id a owl:Ontology . OPTIONAL { { ?id dc:date ?version } UNION { ?id
>>>> owl:versionInfo ?version } } }
>>>>
>>>> You get:
>>>>
>>>> 0 =>
>>>> array (
>>>> 'id' => 'http://xmlns.com/foaf/0.1/',
>>>> 'id type' => 'uri',
>>>> 'version' => '$Date: 2007-06-16 23:18:26 $',
>>>> 'version type' => 'literal',
>>>> ),
>>>> 1 =>
>>>> array (
>>>> 'id' => 'http://xmlns.com/foaf/0.1/',
>>>> 'id type' => 'uri',
>>>> ),
>>>>
>>>> It seems that the unbound solution { ?id owl:versionInfo ?version } from
>>>> the alternative UNION pattern is still being used to extend the
>>>> solution, which to my interpretation of the spec is not right. I tried
>>>> this out in Virtuoso before raising the issue here, and Virtuoso seems
>>>> to agree with me...I only get the one solution.
>>>>
>>>> Or is there something I have misunderstood about it all?
>>>>
>>>> Thanks,
>>>> Will
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>
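
For reference, the behaviour discussed in the quoted thread can be sketched
in a few lines of Python. This models OPTIONAL as a left join over solution
mappings (plain dicts), the way the SPARQL algebra describes it; it is an
illustrative sketch, not ARC's or Virtuoso's implementation. The helper
names and the `ex:foaf` sample data are made up for the example.

```python
# Hypothetical sample data: an ontology with dc:date but no owl:versionInfo.
TRIPLES = [
    ("ex:foaf", "rdf:type", "owl:Ontology"),
    ("ex:foaf", "dc:date", "2007-06-16"),
]


def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def left_join(left, right):
    """OPTIONAL: extend each left solution with every compatible right
    solution, or keep it unchanged if there is none (never both)."""
    out = []
    for l in left:
        extended = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(extended if extended else [l])
    return out


def match(pred):
    """Solutions for the pattern { ?id <pred> ?version }."""
    return [{"id": s, "version": o} for (s, p, o) in TRIPLES if p == pred]


# { ?id a owl:Ontology }
base = [{"id": s} for (s, p, o) in TRIPLES
        if p == "rdf:type" and o == "owl:Ontology"]

# Sibling OPTIONALs: the first leaves ?version unbound, so the second binds it.
sibling = left_join(left_join(base, match("owl:versionInfo")),
                    match("dc:date"))

# UNION inside OPTIONAL: the empty branch contributes no solutions at all,
# so there is nothing to produce a second, unextended row from.
union_in_optional = left_join(base,
                              match("dc:date") + match("owl:versionInfo"))

print(sibling)
print(union_in_optional)
```

Both queries yield a single solution with ?version bound to the dc:date value.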
--------------060108010709060106060801--
""" ;
ns1:returnPath "<mail@willdaniels.co.uk>" ;
ns1:xOriginalTo "arc-dev@semsol.org" ;
ns1:deliveredTo "web11p1@p15192371.pureserver.info" ;
ns1:received """from [192.168.1.100] (unknown [77.49.238.129])
by smtp1.servage.net (Postfix) with ESMTP id 6A49FF98179
for <arc-dev@semsol.org>; Wed, 25 Mar 2009 12:59:26 +0000 (GMT)""" ;
ns1:messageID "<49CA2B11.4000800@willdaniels.co.uk>" ;
ns1:date "Wed, 25 Mar 2009 15:01:05 +0200" ;
ns1:from "Will Daniels <mail@willdaniels.co.uk>" ;
ns1:userAgent "Thunderbird 2.0.0.21 (X11/20090318)" ;
ns1:mIMEVersion "1.0" ;
ns1:to "arc-dev <arc-dev@semsol.org>" ;
ns1:subject "Re: [arc-dev] SPARQL OPTIONAL Odd Behaviour?" ;
ns1:references "<49C82151.60207@willdaniels.co.uk> <PM-GA.20090324084038.D4C06.1.1D@semsol.com> <49C96C5C.9090904@willdaniels.co.uk> <PM-GA.20090325112738.C545B.1.1D@semsol.com>" ;
ns1:inReplyTo "<PM-GA.20090325112738.C545B.1.1D@semsol.com>" ;
ns1:xEnigmailVersion "0.95.6" ;
ns1:contentType '''multipart/alternative;
boundary="------------060108010709060106060801"''' ;
ns1:xSpamCheckerVersion """SpamAssassin 2.64 (2004-01-11) on
p15192371.pureserver.info""" ;
ns1:xSpamLevel "" ;
ns1:xSpamStatus """No, hits=-3.1 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE
autolearn=ham version=2.64