This is a public chat log generated from the #semsol IRC channel.
13:34:27
Hi all! bengee: have you had time to look at performance in ARC2 yet?
13:35:28
I have loaded a subset of data for my EU legislation site (take a peek at e.g. http://eurlex.nu/doc/32007D0849)
13:36:17
...and each LOAD operation seems to do a full table scan in the db, meaning that the time for a LOAD grows linearly with the size of the db right now.
13:38:18
haven't really looked at performance yet, but it's getting urgent
13:38:43
do you know which table is scanned?
13:38:47
no problem. I tried to take a look, but my sql skills are out of date...:-)
13:39:19
I had the misfortune of a temporary ban from the webhosting company for running my LOAD job :-)
13:39:30
the "triple" one?
13:39:33
ugh
13:39:52
bengee wonders if that might be caused by INSERT IGNORE..
13:40:30
... once the index doesn't fit into mem any more
13:41:32
maybe duplicate management requires full table scans?
13:41:51
yeah, maybe
13:44:23
The only info I got from the hosting company was this: http://pastie.caboo.se/161761
13:45:00
oh, that's the id lookup
13:48:11
Doing a LOAD of an xhtml2 doc with RDFa (15 triples) takes around 10 secs currently. For an empty db it takes < 1 sec.
13:49:23
Db size is 1M triples
13:52:06
brb phonecall
14:02:28
I should split the id lookup into individual selects instead of one big union. The LIMIT 1 may not be sufficient for an internal optimization to stop once a matching row is found
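(A minimal sketch of the "individual selects" idea, not ARC2's actual code. It assumes a single hypothetical lookup table `id2val` with columns `id` and `val`, and uses Python's sqlite3 as a stand-in for MySQL. Each value gets its own small indexed query with LIMIT 1, so no single statement has to evaluate a large UNION.)

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not ARC2's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id2val (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE INDEX idx_val ON id2val (val)")
conn.executemany(
    "INSERT INTO id2val (id, val) VALUES (?, ?)",
    [(1, "http://example.org/a"), (2, "http://example.org/b")],
)

def lookup_ids(conn, values):
    """Resolve each value with its own indexed SELECT ... LIMIT 1."""
    ids = {}
    for value in values:
        row = conn.execute(
            "SELECT id FROM id2val WHERE val = ? LIMIT 1", (value,)
        ).fetchone()
        # None marks a value that has no id yet.
        ids[value] = row[0] if row else None
    return ids
```

(The trade-off is more round trips to the db in exchange for queries that are each trivially answered from the index. Whether that wins on MySQL would need to be checked with EXPLAIN on the real tables.)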
14:03:55
and I should introduce a hash column for the id lookups
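(A rough sketch of what a hash column for id lookups could look like. Everything here is an assumption for illustration: the table `id2val`, the column `val_hash`, the choice of CRC32, and sqlite3 standing in for MySQL. The idea is that the index covers a small integer instead of long URI strings, and the full value is still compared to rule out hash collisions.)

```python
import sqlite3
import zlib

def val_hash(val):
    """Hypothetical hash function: CRC32 of the UTF-8 encoded value."""
    return zlib.crc32(val.encode("utf-8"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE id2val (id INTEGER PRIMARY KEY, val TEXT, val_hash INTEGER)"
)
# Index the compact integer hash rather than the long string value.
conn.execute("CREATE INDEX idx_val_hash ON id2val (val_hash)")

def insert_val(conn, val):
    """Store a value together with its hash and return the new id."""
    cur = conn.execute(
        "INSERT INTO id2val (val, val_hash) VALUES (?, ?)", (val, val_hash(val))
    )
    return cur.lastrowid

def find_id(conn, val):
    """Look up by hash first; compare val as well to guard against collisions."""
    row = conn.execute(
        "SELECT id FROM id2val WHERE val_hash = ? AND val = ? LIMIT 1",
        (val_hash(val), val),
    ).fetchone()
    return row[0] if row else None
```

(A smaller index is more likely to stay in memory as the db grows, which is the concern raised earlier about the index no longer fitting into mem.)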
14:07:18
that sounds interesting
