Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

09:40:21 kwijibo: arc-bot: help
12:32:28 kwijibo: hey bengee, how goes?
12:32:45 bengee: relaxed :)
12:32:52 kwijibo: :)
12:32:59 kwijibo: nice weather?
12:33:04 bengee: yeah
12:33:29 bengee: spent the last 2 days on the balcony
12:33:46 kwijibo: not too bad here either - yesterday was the warmest this year so far I think
12:33:57 kwijibo: lightning storm last night though
12:34:32 bengee: yeah, things may change here, too, in a day or two
12:35:36 kwijibo: I was wondering - should I test a fix to the RDF/XML parser, or have you done it already?
12:35:59 kwijibo: re the entity thing last week
12:36:20 bengee: I was wondering if the current behaviour wasn't correct, perhaps
12:36:58 bengee: i.e. that entities should be decoded (unless nested)
12:37:27 kwijibo: how come?
12:37:36 bengee: as e.g. other rdf formats such as turtle use different escape mechanisms
12:38:02 bengee: i.e. that the canonical/internal representation should be entity-free
12:38:35 kwijibo: nah, I'm not convinced
12:39:34 kwijibo: i think if it's a plain literal, "&lt;" is different from "<"
12:40:13 kwijibo: kwijibo rethinks
12:41:03 tuukkah: =) in xml file, &lt; is how you write literal <
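
The point about entities can be checked with any XML parser. The following is a minimal sketch in Python using the standard library, not ARC2 code; the element name `label` and the sample strings are made up for illustration. It shows that `&lt;` in the source arrives as a literal `<` in the parsed value, and that a nested (double-escaped) entity is only decoded one level.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; "label" is an illustrative name only.
doc = "<label>a &lt; b</label>"

# The parser decodes the entity, so the in-memory value is entity-free.
text = ET.fromstring(doc).text

# A nested entity (&amp;lt;) is decoded once and stays as the text "&lt;".
nested = ET.fromstring("<label>&amp;lt;</label>").text

print(repr(text))
print(repr(nested))
```

Under this reading, a plain literal that should contain the five characters `&lt;` has to be written as `&amp;lt;` in the RDF/XML source.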
12:42:56 tuukkah: actually, just wanted to leave feedback that i would've expected more talk about quads in arc2 documentation
12:45:04 bengee: the store and sparql stuff is based on quads, the internal structures are triple indexes/arrays
12:46:17 bengee: so you don't get in direct touch with quads too much
12:46:42 tuukkah: http://arc.semsol.org/features says "2 internal structures: resource-centric processing, triple/quad-centric processing"
12:47:22 bengee: hmm, who wrote that?
12:47:24 bengee: ;)
12:48:41 tuukkah: maybe the above point about RDF storage should say it stores quads in MySQL
12:49:03 tuukkah: or does arc2 support other databases?
12:49:26 bengee: no
12:50:00 bengee: bengee changes "triple/quad" to "statement"
12:50:30 bengee: the internal structures enable quad-based processing, though
12:51:02 bengee: but there is no in-memory store that would directly provide an API yet
12:52:41 bengee: bengee adds "using MySQL" to Storage list item
12:52:48 bengee: thx for the suggestion
12:53:23 tuukkah: would be nice to have a mention of quads too, since that was the only one on the whole site
12:54:06 tuukkah: or are quads always implied by sparql?
12:54:14 bengee: I think so
12:54:47 bengee: unless you drop support for graphs and datasets
12:55:44 tuukkah: sparql spec doesn't explicitly mention quads at all
12:56:46 bengee: It could be interesting to have some quad-requiring features baked into a toolkit, such as trust/provenance checking
12:57:20 bengee: sparql partly has those, via GRAPH and FROM/FROM NAMED
12:57:43 tuukkah: spec says "Many RDF data stores hold multiple RDF graphs and record information about each graph, allowing an application to make queries that involve information from more than one graph."
12:58:53 bengee: there are some unresolved issues around graphs and datasets, so the DAWG was prolly very careful not to specify things to strictly
12:59:06 tuukkah: where the important word would be "many", so it's not necessary that arc2 is one of those
12:59:11 bengee: s/to strictly/too strictly/
13:00:58 tuukkah: to make a secure sparql endpoint, do i just leave out "load", "insert", "delete" in "endpoint_features"?
13:01:11 bengee: yes
13:01:41 tuukkah: tuukkah goes and adds one to smob
13:04:30 tuukkah: php markers were missing in the example :-)
13:04:48 bengee: yeah, the wiki doesn't support them ;)
13:05:02 bengee: (IIRC)
13:10:43 tuukkah: hmm, with output format "default" the result doesn't have proper content-type
13:11:54 tuukkah: oh it does but my firefox doesn't recognise it and thinks it's a php script
13:12:11 tuukkah: application/sparql-results+xml that is
13:13:52 bengee: I personally find those content-type headers pretty annoying, at least when you're trying to debug things with a browser
13:14:32 tuukkah: i think firefox is stupid not to offer to view it as plain text. the download dialog is clearly missing this option
13:14:56 bengee: yeah
13:15:16 tuukkah: or in this case, it should know +xml means it can be viewed as xml
13:21:25 scor: scor likes curl
13:30:33 tuukkah: here we go: http://smob.sioc-project.org/server/sparql
13:41:25 tuukkah: "ORDER BY ?unused" results in an SQL syntax error - do you consider this a bug?
13:42:46 bengee: not a bug, but it should be reported by the SPARQL processor one day, that sort of validation isn't there yet, though
13:43:00 bengee: smob looks nice, btw
13:43:02 tuukkah: ok
13:43:05 tuukkah: thanks :-)
13:49:45 tuukkah: one ugly part about installation is the creation of the database
13:50:16 tuukkah: could $store->setUp() do that?
13:50:56 bengee: hmm
13:52:15 bengee: that's usually not part of ARC users' privileges, but I could perhaps add it as an option or dedicated method
13:53:23 tuukkah: well, if it's unusual then it doesn't make much sense
13:54:41 tuukkah: oh, rather: $store->setUp() could try to do it and return an error if it didn't have the privileges
13:55:13 tuukkah: the current error message is rather cryptic
13:56:30 tuukkah: what would you suggest in the documentation? echo "create database smob;" | mysql -u yourusername -p?
13:57:19 bengee: hmm, not sure
13:58:52 tuukkah: a bit better: mysql -h localhost -u yourusername -p -e "create database smob;"
14:23:10 tuukkah: wow, is this really acceptable SPARQL: CONSTRUCT {<>rdfs:seeAlso?g.}WHERE{GRAPH?g{?s?p?o.}}
14:23:33 tuukkah: the space after CONSTRUCT is all ARC2 requires
14:26:04 bengee: sparql is less strict ws-wise than turtle
14:27:33 tuukkah: requiring CONSTRUCT doesn't seem to be in line with the grammar
14:27:47 tuukkah: i mean, "CONSTRUCT "
14:28:16 bengee: possibly
14:30:57 bengee: I'll fix it in the next rev
15:02:53 tuukkah: hmm, how do i get error messages for a failed LOAD query?
15:03:27 kwijibo: tuukkah: $store->getErrors()
15:03:41 tuukkah: what's the type of the result?
15:03:52 kwijibo: an array of strings
15:04:26 tuukkah: ok, thanks!
15:04:32 kwijibo: np
15:18:29 tuukkah: "LOAD <http://www.johnbreslin.com/foaf/foaf.rdf" fails with an SQL syntax error
15:18:45 tuukkah: sorry, "LOAD <http://www.johnbreslin.com/foaf/foaf.rdf>"
15:19:28 tuukkah: meanwhile, rapper parses it just fine
15:32:56 kwijibo: tuukkah: it's not a problem with the parsing though is it?
15:33:15 kwijibo: it's the SQL INSERT statement generator
15:33:53 tuukkah: well, to me as a user, the inserts are part of the parsing
15:35:49 tuukkah: indeed, "-- at line 1 (INSERT INTO smob_triple (--"
15:38:50 tuukkah: i don't see what's wrong with it
15:39:17 tuukkah: you can see the full error message at http://tuukka.sioc-project.org/smob/server/load/index.php?data=http://www.johnbreslin.com/foaf/foaf.rdf
16:08:25 kwijibo: hmm
16:08:48 kwijibo: I haven't actually used MySQL in ages now,
16:08:56 kwijibo: but could it be the , , ?
16:09:15 kwijibo: can you have empty columns like that? or does it have to be , null, ?
16:14:04 scor: kwijibo, it should be '' (empty string)
16:14:32 scor: , '',
16:20:44 kwijibo: scor: ta, I guess that's it then
16:21:07 kwijibo: tuukkah: do you want to email bengee with that?
16:21:09 scor: kwijibo: I get the exact same error - adding '' fixes it
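
The difference between an empty slot and an empty string in a VALUES list can be reproduced in isolation. This is a sketch only: it uses Python's built-in SQLite as a stand-in for MySQL, and the table name `triple` and the columns other than `s_type` are assumptions, not ARC2's actual schema. The assumption is that MySQL rejects the empty slot in the same way, which matches what scor reports.

```python
import sqlite3

# In-memory database with a hypothetical three-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, s_type TEXT, p TEXT)")

# Nothing between two commas is not a value: the statement fails to parse.
try:
    conn.execute("INSERT INTO triple (s, s_type, p) VALUES ('a', , 'b')")
    empty_slot_ok = True
except sqlite3.OperationalError as err:
    empty_slot_ok = False
    print("rejected:", err)

# An explicit empty string is a valid value for the same column.
conn.execute("INSERT INTO triple (s, s_type, p) VALUES ('a', '', 'b')")
rows = conn.execute("SELECT s, s_type, p FROM triple").fetchall()
print(rows)
```

A statement generator that concatenates values without quoting would produce the first form whenever a value is missing, which fits the guess that the parser left s_type unset for that triple.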
16:21:55 kwijibo: scor: cool
16:22:26 scor: but it's strange that such an error didn't come up before...
16:23:14 kwijibo: the empty column is s_type?
16:23:15 scor: tuukkah: do you know what in the RDF file would produce such an empty element?
16:23:34 scor: s_type yes
16:23:49 tuukkah: so something is without a type?
16:24:41 kwijibo: well, the parser isn't picking it up
16:24:42 kwijibo: <foaf:phone rdf:type="http://skype.com/" rdf:resource="callto://johnbreslin"/>
16:26:07 kwijibo: a bit of an odd construction
16:26:29 kwijibo: an edge case for the parser maybe
16:29:43 kwijibo: tuukkah: does http://convert.test.talis.com/?data-uri=http%3A%2F%2Fwww.johnbreslin.com%2Ffoaf%2Ffoaf.rdf&input=&output=rdf&callback= work?
16:29:51 kwijibo: i mean, if you LOAD that?
16:30:14 kwijibo: just to double check that it's the particular serialisation that the parser isn't coping with
16:31:46 tuukkah: i think john said the file is from some foaf generator
16:32:01 kwijibo: yes - I mean, the rdf/xml is totally valid
16:32:39 kwijibo: but what I think is that the arc rdf/xml parser is missing the s_type in that particular expression of the triple
16:42:41 tuukkah: yes, this worked: http://tuukka.sioc-project.org/smob/server/load/index.php?data=http%3A%2F%2Fconvert.test.talis.com%2F%3Fdata-uri%3Dhttp%253A%252F%252Fwww.johnbreslin.com%252Ffoaf%252Ffoaf.rdf%26input%3D%26output%3Drdf%26callback%3D
19:08:33 tuukkah: trying to load a text/html resource doesn't result in error messages
21:42:21 tuukkah: kwijibo, do you think my parser issue will be resolved in the next arc release?
21:43:25 kwijibo: tuukkah: no idea I'm afraid :) ask bengee
21:44:15 kwijibo: hope so though
21:44:31 kwijibo: bengee said there would be another release soonish
21:44:38 tuukkah: seems serious enough at least
21:45:03 kwijibo: i imagine it will be a trivial enough thing to fix
21:45:16 tuukkah: what about, would you have an idea how to get an error message when someone tries to LOAD a text/html document?
21:46:03 kwijibo: hmm - what kind of thing do you mean?
21:46:28 kwijibo: you wouldn't want html to parse?
21:46:39 tuukkah: the smob aggregator has clients provide it URIs and it will LOAD them
21:46:58 kwijibo: by default *I think* arc will glean a few triples from even plain html
21:47:17 kwijibo: it's configurable anywa
21:47:19 kwijibo: *anyway
21:47:39 kwijibo: I'm not sure if it would provoke an error, but you might get documents that LOAD no triples
21:48:06 kwijibo: the result of the query is an associative array, which includes things like a triple count
21:48:21 tuukkah: you're right, i get dc:title
21:48:34 kwijibo: so you could count the triples, and if there are none, write your error out to the user
21:49:02 kwijibo: you should be able to turn that off in your configuration if you don't want it to generate triples from plain html
21:49:59 kwijibo: 'sem_html_formats' => 'rdfa xfn', // I think it's this bit
21:50:10 kwijibo: (from http://arc.semsol.org/docs/v2/getting_started)
21:51:03 tuukkah: yeah, so i want to remove dc which is there by default
21:51:47 kwijibo: i imagine if you have sem_html_formats => '', then it won't extract any triples from html
21:52:04 kwijibo: and you can add in the ones you want to support
21:52:41 tuukkah: i suppose it would be good to support rdfa at least