Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

09:29:10 kwijibo: bengee: did chameleon95 say how long it took for him to get his 11m triples in there?
09:30:47 bengee: no
09:31:04 bengee: I'd be interested, too
09:32:34 kwijibo: kwijibo trying to load a 95mb file, and wondering how long that will take
09:32:50 kwijibo: been 90mins so far
09:34:49 kwijibo: it occurred to me this morning that the simplest answer to my problem yesterday is simply to convert the bnodeids to uris myself instead of letting the platform do it
09:35:45 bengee: did you apply the binary patch?
09:35:49 kwijibo: $storeURI.'/bnodes/'.substr($bnodeID, 2)
09:36:01 kwijibo: no - should I have done?
09:36:07 bengee: yeah
09:36:12 kwijibo: oops :D
09:36:33 bengee: I should put up a new rev
09:37:25 kwijibo: i decided to try loading the data before i'd woken up properly and didn't really think it through
09:37:28 kwijibo: :p
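
A minimal sketch of the bnode-to-URI conversion kwijibo describes at 09:34:49 and 09:35:49, transliterated from the PHP expression `$storeURI.'/bnodes/'.substr($bnodeID, 2)` into Python. The function name and the example store URI are hypothetical. It assumes bnode IDs start with the two-character prefix `_:`, which is what dropping the first two characters suggests.

```python
def bnode_to_uri(store_uri: str, bnode_id: str) -> str:
    """Turn a blank node ID into a URI under the store's own namespace.

    Mirrors $storeURI.'/bnodes/'.substr($bnodeID, 2) from the log.
    Assumption: the ID carries a leading "_:" prefix, which is dropped.
    """
    if not bnode_id.startswith("_:"):
        raise ValueError(f"expected an ID starting with '_:', got {bnode_id!r}")
    # Drop the first two characters and append the rest to the store URI.
    return store_uri + "/bnodes/" + bnode_id[2:]


if __name__ == "__main__":
    # Hypothetical store URI and bnode ID, for illustration only.
    print(bnode_to_uri("http://example.org/store", "_:b42"))
```

Doing the conversion before loading means the same blank node always maps to the same URI, instead of depending on whatever identifier the store assigns.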
09:38:32 bengee: you can add a "store_log_inserts" => 1 to the configuration, and ARC will generate an insert speed log file
09:38:59 kwijibo: oh that would be handy
09:39:27 kwijibo: does that write the log file as it's happening, or at the end?
09:40:24 bengee: it's written/extended after each INSERT IIRC
09:42:07 bengee: bengee prepares a new rev
09:42:34 kwijibo: I wish I'd done that then :), could see the progress
10:15:28 bengee: ok, new rev is online
10:24:34 kwijibo: coolio
10:27:30 kwijibo: hmm, it's been 2hr 25 mins now is that normal, or is my machine jamming up?
10:29:01 bengee: arc should be at 1-2 MT now
10:29:23 bengee: well, depending on the machine
10:30:03 kwijibo: I wonder how many triples in 95mb of average RDF/XML
10:32:32 bengee: I have 120KT of DBLP in 6MB RDF/XML
10:33:20 bengee: and 3MT of RSS in 400MB
10:35:23 kwijibo: ah, I think it's approx 1 305 000 triples
10:35:47 kwijibo: that's quite a lot
10:36:20 kwijibo: is the RSS for something? or just convenient test data?
10:36:44 bengee: testdata from environmentalhealthnews.org
10:37:44 kwijibo: rss would have a relatively low mb/triples ratio wouldn't it? long literals
10:37:53 bengee: yeah
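
The figures quoted between 10:32:32 and 10:35:23 can be compared as triples per megabyte. The short sketch below only does that arithmetic, using the numbers from the log (KT read as thousand triples, MT as million triples); the dataset labels and function name are made up for illustration.

```python
# (triples, megabytes of RDF/XML), taken from the figures in the log.
datasets = {
    "DBLP (bengee)": (120_000, 6),
    "RSS (bengee)": (3_000_000, 400),
    "kwijibo's file": (1_305_000, 95),
}


def triples_per_mb(triples: int, megabytes: float) -> float:
    """Average number of triples per megabyte of serialised RDF/XML."""
    return triples / megabytes


densities = {name: triples_per_mb(t, mb) for name, (t, mb) in datasets.items()}

if __name__ == "__main__":
    for name, density in densities.items():
        print(f"{name}: {density:,.0f} triples/MB")
```

The RSS data comes out least dense in triples per megabyte, which fits kwijibo's point about long literals.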
10:38:14 kwijibo: how's the query performance at that size?
10:38:22 bengee: dunno
10:38:57 bengee: depends on your index settings
15:31:11 kwijibo: would code for reserialising ARC's SPARQL parser output be useful to anyone?
15:33:05 bengee: maybe
15:33:31 bengee: might be worth asking on the ml, perhaps
16:07:35 danbri: yes, it could be
16:07:38 danbri: if you can edit the query
16:07:50 danbri: some discussions around on doing ACL by query rewrites
16:09:37 kwijibo: danbri: how would you do that with ARC though? isn't every graph included unless a FROM is given?
16:10:07 danbri: put a GRAPH ?g around unqualified parts
16:10:13 danbri: and constrain things against ?g
16:10:19 danbri: not the most efficient, but should work
16:10:29 danbri: plenty devilish detail
16:11:03 kwijibo: filter(! regex(str(?g), "private-graph") ) ?
16:11:54 bengee: I realized that the idea of putting a single GRAPH graph pattern around the whole query doesn't work
16:12:14 bengee: so yes, you'd have to put them around each single pattern
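
A rough sketch of the rewrite discussed from 16:10:07 to 16:12:14: each triple pattern gets its own GRAPH wrapper plus the filter kwijibo suggests at 16:11:03. It works on a plain list of pattern strings rather than on ARC's parser output, and the function names, the per-pattern variables `?g0`, `?g1`, ... and the example patterns are all assumptions for illustration. Using a separate variable per pattern is one possible reading of bengee's point, since a shared `?g` would force every pattern to match in the same graph.

```python
from typing import List


def restrict_patterns(patterns: List[str], private_marker: str = "private-graph") -> str:
    """Wrap each triple pattern in its own GRAPH block and filter its graph.

    Assumptions: patterns are plain triple-pattern strings without a trailing
    dot, and private_marker contains no double quotes.
    """
    parts = []
    for i, pattern in enumerate(patterns):
        var = f"?g{i}"  # one graph variable per pattern
        parts.append(f"GRAPH {var} {{ {pattern} }}")
        parts.append(f'FILTER(!regex(str({var}), "{private_marker}"))')
    return "\n".join(parts)


def build_query(select_vars: List[str], patterns: List[str]) -> str:
    """Assemble a SELECT query whose patterns are all graph-restricted."""
    body = restrict_patterns(patterns)
    return f"SELECT {' '.join(select_vars)} WHERE {{\n{body}\n}}"


if __name__ == "__main__":
    print(build_query(
        ["?name"],
        [
            "?s <http://xmlns.com/foaf/0.1/name> ?name",
            "?s <http://xmlns.com/foaf/0.1/mbox> ?mbox",
        ],
    ))
```

This covers only flat lists of triple patterns; OPTIONAL, UNION and patterns already inside a GRAPH block are the kind of detail danbri warns about and are not handled here.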
16:13:34 kwijibo: I still think ACL would be better done at a lower level
16:13:56 kwijibo: maybe there would need to be an extra column
16:16:16 kwijibo: I dunno - just parsing each sparql query twice seems kind of hacky for something as fundamental as ACL
16:17:33 kwijibo: it's funny, I think we need to solve the same problem at Talis, but coming from the opposite direction
16:18:09 kwijibo: we have graph based ACL, but we don't have user-definable named graphs
16:19:13 kwijibo: and (I think) we would want to have it work like arc basically, where you can use graphs for provenance or whatever without making it ridiculously difficult to write sparql queries (naming all the graphs explicitly)
16:20:01 kwijibo: but we would prolly still want private graphs to need to be queried explicitly
16:21:24 kwijibo: and we've got the 2 different endpoints as well - the public one and the authenticated one for the private graphs
16:22:14 kwijibo: anyway, sorry for rambling, just trying to get a plausible picture in my head of how I think it should work
16:33:13 danbri: there could be various graphs already
16:33:23 danbri: some with grounded URIs, some with variables
16:34:11 danbri: it could be hacky, ... but it could also be a stepping stone to having the access control done within the query engine itself
16:43:40 kwijibo: well, I'll package up the code later in case anyone wants to have a play anyway
20:20:31 bengee: bengee wonders how silly it'd be to create an RDF/HTML entirely based on form elements
20:21:51 bengee: hmm, no nested forms
20:22:42 bengee: which may not be a problem for an indexed structure
20:24:01 bengee: hmm, can't properly render markup in form elements, I guess
20:26:53 bengee: ok, so the only attributes that work on any tag are class, id, and title
20:26:58 kwijibo: why would you want to render it in a form element?
20:27:29 bengee: because any form element can have @name
20:28:18 bengee: would have been handy for URIs
20:28:23 kwijibo: what problem do you need to solve that isn't covered by an existing rdf in html syntax?
20:28:33 bengee: no prefixes
20:28:41 bengee: html validity
20:29:23 kwijibo: no prefixes? or no prefixes and no uris?
20:29:27 bengee: pure client-side switching from view to edit
20:29:38 bengee: no qnames/curies
20:32:51 bengee: I used @title for URIs until now, but I guess screen readers will read them, and the tooltips are annoying
20:34:10 kwijibo: maybe better to use xhtml namespaced custom attributes than misuse existing html attributes?
20:34:35 bengee: xhtml1 doesn't allow them, unfortunately
20:35:40 bengee: I guess I really just need something for the arcs, the subject and object could be more native html structures
22:30:54 kwijibo: danbri: any idea how to write tests for the SPARQL serialiser? the only thing I can think of is to parse and reserialise, and see if both queries return the same results
22:34:49 danbri: that sounds reasonable
22:35:03 danbri: you might also count various observable things
22:35:09 danbri: number of named variables
22:35:16 danbri: include those in tests
22:35:36 danbri: some things will be ok to vary though, eg number of lines in the text file
22:35:39 danbri: interesting problem :)
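
One way danbri's suggestion of counting observable things might look: compare the set of named variables in the original and the reserialised query, while ignoring layout such as line count. This is a sketch under assumptions, not a test of the actual serialiser. The regex is naive (it would also match `?x` inside a string literal or IRI), and the reserialised query below is written by hand as a stand-in for real parser and serialiser output.

```python
import re
from typing import Set

# Naive match for SPARQL variables written as ?name or $name.
VARIABLE = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")


def named_variables(query: str) -> Set[str]:
    """Return the distinct variable names that appear in the query text."""
    return {match.group(1) for match in VARIABLE.finditer(query)}


def same_observables(original: str, reserialised: str) -> bool:
    """True if both queries use the same set of named variables."""
    return named_variables(original) == named_variables(reserialised)


original = """SELECT ?s ?name
WHERE {
  ?s <http://xmlns.com/foaf/0.1/name> ?name .
}"""

# Hand-written stand-in for what a reserialiser might emit.
reserialised = "SELECT ?s ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name . }"

if __name__ == "__main__":
    print(same_observables(original, reserialised))
```

A check like this complements the round trip kwijibo proposes (parse, reserialise, compare query results) and does not need a loaded store to run.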