Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

10:37:02 KiYanWang: KiYanWang waves hello again
12:50:55 edsu: anyone notice memory climb during largish loads? i'm loading 2,625,020 n-triples and the process seems to be using 349MB of memory after running for like 20 mins
12:51:30 edsu: $store->query('LOAD <http://lcsh.info/static/lcsh.nt>'); # if anyone is interested
12:51:50 bengee: yeah, may be. The only real streaming parser is the rdf/xml one
12:52:15 edsu: bengee: oh!
12:53:29 edsu: bengee: thnx
12:53:54 bengee: the others sort-of stream, but I didn't really test them with large files yet
12:54:11 edsu: it's growing relatively slowly
12:54:28 edsu: i guess i could take a look :)
12:54:51 edsu: open source and all ... nice work though, i was amazed at how quickly it was to get going w/ arc
12:55:32 bengee: hey, thanks :)
12:59:06 bengee: bengee checks Turtle Parser
12:59:55 bengee: could be some buffering, or the dupe check perhaps
13:01:12 bengee: are triples still added during the growing memory consumption?
13:03:47 edsu: i think so yeah
13:03:53 edsu: i wasn't really tracking that
13:04:24 edsu: but i'm not sure
13:04:30 bengee: you can add a config setting "store_log_inserts" => 1
13:04:51 bengee: ARC will then try to create an arc_insert_log.txt in the scripts dir
13:04:56 edsu: sweet
13:05:04 bengee: s/scripts/script's/
13:06:15 bengee: bengee tries to load the .nt file
13:07:11 edsu: yup, i see triples being written
13:08:07 edsu: http://paste.lisp.org/display/59107
13:09:51 bengee: ok
13:10:15 bengee: the turtle parser is rather slow
13:10:17 edsu: so it's streaming for sure :)
13:10:38 edsu: lots of regexen instead of xml parser written in c i guess?
13:11:32 bengee: yeah, and written along the SPARQL spec, i.e. not really optimized for average triple patterns
13:11:35 edsu: edsu converts n-triples to rdf/xml
13:13:29 bengee: I expected it to be even slower, though
13:14:25 bengee: the rdf/xml parser will start at around 1200 t/sec but insert speed goes down faster as the db bottleneck kicks in earlier
13:23:58 bengee: bengee belatedly waves at KiYanWang
16:04:42 edsu: edsu tries rdf/xml now
16:11:08 edsu: bengee: seems to exhibit similar memory growth problems :(
16:12:43 edsu: http://paste.lisp.org/display/59110 and I'm at 177956k
16:12:54 edsu: bengee: perhaps i'm loading wrong?
16:14:46 edsu: bengee: http://paste.lisp.org/display/59110#1
16:14:51 edsu: bengee: that's the script
16:16:09 bengee: store->drop will delete the db tables
16:16:45 bengee: mem looked fine here
16:17:13 bengee: you should run a store->setUp() once, then you can use query
16:17:52 bengee: unless the loader auto-setups the store
16:19:08 edsu: oh hmmm, forgot i had that in there
16:19:29 bengee: reset() will remove triples, but keep the tables
16:21:43 edsu: bengee++ # now it's humming along nicely
16:21:53 bengee: phew :)
16:22:12 edsu: 22k or so :)
16:22:35 edsu: i wonder if that was my problem with the n-triples
16:22:41 edsu: edsu tries n-triples again
16:24:41 bengee: n-triples *seemed* to work ok here. Slower, but w/o mem leaks as far as I could see (just loaded 500K triples, though)
16:24:53 edsu: yup, now it's working fine
16:25:02 bengee: cool
16:25:04 edsu: with the tables not dropped
16:25:09 edsu: :)
16:26:31 bengee: I forgot to set up the tables myself once and thought I found a way to increase the insert speed ;)
16:27:28 bengee: maybe I should add some error catching that stops loading when the tables are missing
16:27:49 bengee: ah
16:28:08 edsu: looks like rdf/xml is like 40% faster than n3
16:28:09 bengee: the increased memory may well be caused be the growing error log
16:28:22 edsu: bengee: that sounds right :)
16:28:23 bengee: s/be the/by the/
16:28:36 edsu: bengee: heheh
16:29:04 edsu: this is awesome, thanks for the help
16:29:25 bengee: np. I'm glad it wasn't a memory leak
16:29:51 edsu: the only memory leak was in my cranium
16:30:47 bengee: ;)