Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

03:53:57 jscheel: hello!
10:05:12 bengee: bengee switches from "CREATE tmp SELECT" to "CREATE tmp" + "INSERT INTO tmp SELECT"
10:05:53 kwijibo: hmmm
10:05:55 bengee: getting rid of the apparently problematic "ALTER tmp ADD _pos_" for queries that contain ORDER BY
10:06:52 bengee: with a separate CREATE, I can add the sort column before the table is filled
10:07:53 bengee: and mysql will hopefully stop hanging
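For readers following along, here is a minimal sketch of the two-step pattern described above. It uses Python's built-in sqlite3 as a stand-in for MySQL, where the one-step form is `CREATE TABLE ... SELECT`. The table names and the auto-numbered `_pos_` column are assumptions for illustration, not ARC's actual schema. The point is only that the sort column exists before the `INSERT ... SELECT` runs, so no `ALTER` is needed afterwards.

```python
import sqlite3

# Hypothetical triple table standing in for ARC's MySQL tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [("ex:b", "rdfs:label", "Beta"), ("ex:a", "rdfs:label", "Alpha")],
)

# Step 1: create the temp table with the sort column already in place.
# INTEGER PRIMARY KEY auto-numbers rows in SQLite (AUTO_INCREMENT in MySQL).
conn.execute("CREATE TEMP TABLE tmp (s TEXT, o TEXT, _pos_ INTEGER PRIMARY KEY)")

# Step 2: fill it in the desired order; _pos_ is assigned as rows arrive,
# so there is no "ALTER TABLE tmp ADD _pos_" on an already-filled table.
conn.execute("INSERT INTO tmp (s, o) SELECT s, o FROM triple ORDER BY o")

rows = conn.execute("SELECT _pos_, s, o FROM tmp ORDER BY _pos_").fetchall()
print(rows)
```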
10:08:22 kwijibo: weird
10:08:58 kwijibo: with Jena, variable predicates make queries slower
10:09:12 kwijibo: with arc i'm getting the opposite!
10:09:18 bengee: heh
10:09:19 kwijibo: can't be right
10:12:35 kwijibo: DESCRIBE ?s WHERE { ?s ?p ?o . FILTER(regex(?o, "machine")) } -> 5.82785010338
10:13:06 kwijibo: DESCRIBE ?s WHERE { ?s rdfs:label ?o . FILTER(regex(?o, "machine")) } --> emm, still not finished!
10:13:51 kwijibo: still not finished ...
10:14:00 kwijibo: last time it was 111 seconds
10:17:17 kwijibo: this time 364 seconds
10:18:46 kwijibo: bengee - any idea why that is?
10:22:03 bengee: might make sense to do a query(..., 'sql') and run an EXPLAIN against that
10:22:12 bengee: most probably index-related
10:24:26 kwijibo: ah
10:28:44 kwijibo: bengee: is there any docs about the indexing?
10:28:51 kwijibo: *are there
10:29:21 bengee: no
10:29:51 bengee: did you find out anything interesting?
10:38:58 kwijibo: not really familiar with the output of explain :(
10:41:02 kwijibo: the rdfs:label query has os,po in the possible keys column, whereas the ?p query has null
10:41:59 kwijibo: sorry, that should have been cid instead of os,po
10:44:54 bengee: hmm, no index on p, that's odd
10:45:17 bengee: ah, a variable p, ok
10:46:33 bengee: ok, then it perhaps does a table scan for ?p, but a (for some reason slow) index lookup for a given p
10:47:25 bengee: did you try $store->optimizeTables()?
10:47:45 bengee: maybe the index is fragmented
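The EXPLAIN difference discussed above (no usable key for a variable ?p, an index candidate for a fixed p) can be reproduced in miniature. This sketch again uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN`. The table and the `po` index are assumptions loosely mirroring the key names mentioned in the log, not ARC's real layout. It only shows the plan shape (scan vs. index lookup), not why the index lookup was slower in practice.

```python
import sqlite3

# Hypothetical triple table with an index on (p, o).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.execute("CREATE INDEX po ON triple (p, o)")

def plan(sql, params=()):
    # EXPLAIN QUERY PLAN returns 4-column rows; the last column is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Variable predicate (?p): nothing constrains p, so expect a full table scan.
var_p = plan("SELECT s FROM triple WHERE o LIKE ?", ("%machine%",))

# Fixed predicate (rdfs:label): the leading index column p allows an index lookup.
fixed_p = plan(
    "SELECT s FROM triple WHERE p = ? AND o LIKE ?",
    ("rdfs:label", "%machine%"),
)

print(var_p)    # e.g. a "SCAN triple" entry
print(fixed_p)  # e.g. a "SEARCH triple USING INDEX po (p=?)" entry
```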
11:00:17 kwijibo: bengee: what's sparqlxmlresultsloader?
11:00:55 bengee: I don't think that exists yet
11:01:19 kwijibo: mortenf tweeted about sending it to you :)
11:02:13 bengee: it'd be a streaming rdf loader from a predefined sparql xml result (e.g. g = graph, s = subject, p = predicate, etc.)
11:02:26 bengee: for store replication
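To make the idea concrete, here is a minimal sketch of streaming quads out of a result in the W3C SPARQL Query Results XML Format, one `<result>` element at a time. It assumes the query projected variables named g, s, p and o, as in the convention described above. The function name `stream_quads` is hypothetical and this is not ARC's loader: it returns plain string values and ignores term types (URI vs. literal vs. bnode).

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = "{http://www.w3.org/2005/sparql-results#}"

def stream_quads(source):
    """Yield (g, s, p, o) tuples from a SPARQL XML result, one <result> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "result":
            continue
        row = {}
        for binding in elem.findall(NS + "binding"):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text or ""
        yield tuple(row.get(k) for k in ("g", "s", "p", "o"))
        elem.clear()  # drop the parsed bindings so memory stays small

sample = b"""<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="g"/><variable name="s"/><variable name="p"/><variable name="o"/></head>
  <results>
    <result>
      <binding name="g"><uri>http://example.org/graph</uri></binding>
      <binding name="s"><uri>http://example.org/a</uri></binding>
      <binding name="p"><uri>http://www.w3.org/2000/01/rdf-schema#label</uri></binding>
      <binding name="o"><literal>machine</literal></binding>
    </result>
  </results>
</sparql>"""

quads = list(stream_quads(io.BytesIO(sample)))
print(quads)
```

Because `iterparse` hands over each `<result>` as soon as it is complete, a replication script could insert each quad into the target store immediately instead of loading the whole result document first.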
11:06:01 kwijibo: bengee - optimising tables made the rdfs:label query a lot better
11:06:06 kwijibo: but the ?p query a lot worse
11:06:48 kwijibo: rdfs:label now takes only 15.0095608234
11:06:59 bengee: heh, but that's how it was supposed to be, no? ;)
11:07:03 kwijibo: but ?p now takes 76.0367071629
11:07:15 kwijibo: (whereas before it took 5-7 seconds)
11:07:41 kwijibo: yeah, I suppose it depends though - optimised for what? :)
11:07:59 bengee: some query cache thing perhaps, too?
11:08:12 bengee: not sure
12:47:17 bengee: bengee mangs gets rid of the table locks for DELETE queries
12:47:40 bengee: heh, s/mangs/manages to/
12:55:52 kwijibo: bengee: I seem to be managing to stream large datasets into the platform now thanks to ARC's streaming parser :)
12:56:03 bengee: oh, cool :)
12:56:29 kwijibo: script hasn't finished yet - but it's not died yet either :) and the triples are definitely going in :)
12:56:42 bengee: yay
12:57:59 kwijibo: I have a question about the format detector - I pointed it at a local file (on a Mac) with a .rdf extension, containing rdf/xml, and it detected xml rather than rdf/xml
13:00:22 bengee: I'd need a snippet of the first 1000 chars to see why it's not working
13:00:44 bengee: (or 1st couple of lines)
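Why the first 1000 characters matter: a content-based detector typically only sees a prefix of the document. The sketch below is a hypothetical heuristic, not ARC's actual detector. It shows one way such a check can report plain XML for an RDF/XML file: the RDF namespace or `rdf:RDF` root has to appear inside the sampled prefix, and a long comment or DOCTYPE before the root can push it out.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def guess_format(text: str, sample_size: int = 1000) -> str:
    """Rough guess from the first sample_size characters (hypothetical heuristic)."""
    head = text[:sample_size]
    if head.lstrip().startswith("<"):
        # RDF/XML is XML that declares the RDF namespace or uses an rdf:RDF root.
        if RDF_NS in head or re.search(r"<rdf:RDF[\s>]", head):
            return "rdfxml"
        return "xml"
    return "unknown"

rdf_doc = '<?xml version="1.0"?>\n<rdf:RDF xmlns:rdf="' + RDF_NS + '"></rdf:RDF>'
padded_doc = "<!-- " + "x" * 1200 + " -->" + rdf_doc[21:]  # root pushed past 1000 chars

print(guess_format(rdf_doc))     # rdfxml
print(guess_format(padded_doc))  # xml
```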
21:54:19 Rabur: Hi
21:54:26 Rabur: Anybody here?