Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

13:34:27 PeterKz: Hi all! bengee: have you had time to look at performance in ARC2 yet?
13:35:28 PeterKz: I have loaded a subset of data for my EU legislation site (take a peek at e.g. http://eurlex.nu/doc/32007D0849)
13:36:17 PeterKz: ...and each LOAD operation seems to make a full table scan in the db, meaning that time for a LOAD increases linearly right now.
13:38:18 bengee: haven't really looked at performance yet, but it's getting urgent
13:38:43 bengee: do you know which table is scanned?
13:38:47 PeterKz: no problem. I tried to take a look, but my sql skills are out of date...:-)
13:39:19 PeterKz: I had the misfortune of a temporary ban from the webhosting company for running my LOAD job:-)
13:39:30 bengee: the "triple" one?
13:39:33 bengee: ugh
13:39:52 bengee: bengee wonders if that might be caused by INSERT IGNORE..
13:40:30 bengee: ... once the index doesn't fit into mem any more
13:41:32 PeterKz: maybe duplicate management requires full table scans?
13:41:51 bengee: yeah, maybe
13:44:23 PeterKz: The only info I got from the hosting company was this: http://pastie.caboo.se/161761
13:45:00 bengee: oh, that's the id lookup
13:48:11 PeterKz: Doing a LOAD of an xhtml2 doc with RDFa (15 triples) takes around 10 secs currently. For an empty db it takes < 1 sec.
13:49:23 PeterKz: Db size is 1M triples
13:52:06 PeterKz: brb phonecall
14:02:28 bengee: I should split the id lookup into individual selects instead of one big union; the LIMIT 1 may not be sufficient for an internal optimization to stop once a matching row is found
14:03:55 bengee: and I should introduce a hash column for the id lookups
14:07:18 PeterKz: that sounds interesting