This is a public chat log generated from the #semsol IRC channel.
10:37:02
KiYanWang waves hello again
12:50:55
anyone notice memory climb during largish loads? i'm loading 2,625,020 n-triples and the process seems to be using 349MB of memory after running for like 20 mins
12:51:30
$store->query('LOAD <http://lcsh.info/static/lcsh.nt>'); # if anyone is interested
12:51:50
yeah, may be. The only real streaming parser is the rdf/xml one
12:52:15
bengee: oh!
12:53:29
bengee: thnx
12:53:54
the others sort-of stream, but I didn't really test them with large files yet
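For contrast, N-Triples is easy to stream for real, because each triple sits on its own line. The sketch below is a minimal PHP illustration of that idea. It is not ARC's Turtle/N-Triples parser. The function name `stream_ntriples` is hypothetical, and the regex is deliberately simplified: no `\uXXXX` unescaping and no full IRI validation.

```php
<?php
// Hypothetical sketch, NOT ARC's parser: read N-Triples one line at a
// time and yield [subject, predicate, object] so memory stays flat.
// Simplified grammar: IRIs, blank nodes, and plain/typed/language-tagged
// literals; escape sequences are matched but not unescaped.
function stream_ntriples($handle): Generator
{
    $subject   = '(<[^>]*>|_:\S+)';
    $predicate = '(<[^>]*>)';
    $object    = '(<[^>]*>|_:\S+|"(?:[^"\\\\]|\\\\.)*"(?:@[a-zA-Z-]+|\^\^<[^>]*>)?)';
    $pattern   = '/^' . $subject . '\s+' . $predicate . '\s+' . $object . '\s*\.$/';

    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        if (preg_match($pattern, $line, $m)) {
            yield [$m[1], $m[2], $m[3]]; // only the current line is held in memory
        }
    }
}
```

Usage would be something like `foreach (stream_ntriples(fopen('lcsh.nt', 'r')) as $t) { ... }`, where the local file name is an assumption. Because the generator holds only one line at a time, memory use stays roughly constant regardless of file size.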
12:54:11
it's growing relatively slowly
12:54:28
i guess i could take a look :)
12:54:51
open source and all ... nice work though, i was amazed at how quick it was to get going w/ arc
12:55:32
hey, thanks :)
12:59:06
bengee checks Turtle Parser
12:59:55
could be some buffering, or the dupe check perhaps
13:01:12
are triples still added during the growing memory consumption?
13:03:47
i think so yeah
13:03:53
i wasn't really tracking that
13:04:24
but i'm not sure
13:04:30
you can add a config setting "store_log_inserts" => 1
13:04:51
ARC will then try to create an arc_insert_log.txt in the scripts dir
13:04:56
sweet
13:05:04
s/scripts/script's/
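For reference, the setting goes into the same config array that is passed to `ARC2::getStore($config)`. Here is a minimal sketch of that array. The database credentials and store name are placeholders (assumptions). Only `store_log_inserts` comes from the discussion above.

```php
<?php
// Sketch of an ARC2 store config with insert logging switched on.
// db_* values and store_name are placeholders, not real settings.
$config = array(
    'db_host' => 'localhost',
    'db_name' => 'arc_db',
    'db_user' => 'arc_user',
    'db_pwd' => 'secret',
    'store_name' => 'lcsh',
    'store_log_inserts' => 1, // ask ARC to write arc_insert_log.txt in the script's dir
);
```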
13:06:15
bengee tries to load the .nt file
13:07:11
yup, i see triples being written
13:08:07
http://paste.lisp.org/display/59107
13:09:51
ok
13:10:15
the turtle parser is rather slow
13:10:17
so it's streaming for sure :)
13:10:38
lots of regexen instead of an xml parser written in c i guess?
13:11:32
yeah, and written along the SPARQL spec, i.e. not really optimized for average triple patterns
13:11:35
edsu converts n-triples to rdf/xml
13:13:29
I expected it to be even slower, though
13:14:25
the rdf/xml parser will start at around 1200 t/sec but insert speed goes down faster as the db bottleneck kicks in earlier
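As a rough back-of-envelope check using only numbers from this log (the 2,625,020-triple LCSH dump and the ~1200 triples/sec starting rate of the RDF/XML parser), the load cannot finish much faster than about 36 minutes. Because insert speed drops once the db becomes the bottleneck, the real time would be longer:

```latex
\[
t_{\min} = \frac{2{,}625{,}020\ \text{triples}}{1200\ \text{triples/s}} \approx 2188\ \text{s} \approx 36\ \text{min}
\]
```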
13:23:58
bengee belatedly waves at KiYanWang
16:04:42
edsu tries rdf/xml now
16:11:08
bengee: seems to exhibit similar memory growth problems :(
16:12:43
http://paste.lisp.org/display/59110 and I'm at 177956k
16:12:54
bengee: perhaps i'm loading wrong?
16:14:46
bengee: http://paste.lisp.org/display/59110#1
16:14:51
bengee: that's the script
16:16:09
store->drop will delete the db tables
16:16:45
mem looked fine here
16:17:13
you should run a store->setUp() once, then you can use query
16:17:52
unless the loader auto-setups the store
16:19:08
oh hmmm, forgot i had that in there
16:19:29
reset() will remove triples, but keep the tables
16:21:43
bengee++ # now it's humming along nicely
16:21:53
phew :)
16:22:12
22k or so :)
16:22:35
i wonder if that was my problem with the n-triples
16:22:41
edsu tries n-triples again
16:24:41
n-triples *seemed* to work ok here. Slower, but w/o mem leaks as far as I could see (just loaded 500K triples, though)
16:24:53
yup, now it's working fine
16:25:02
cool
16:25:04
with the tables not dropped
16:25:09
:)
16:26:31
I forgot to set up the tables myself once and thought I found a way to increase the insert speed ;)
16:27:28
maybe I should add some error catching that stops loading when the tables are missing
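The fail-fast idea can be sketched as a loop that aborts at the first failed insert, instead of recording one error per triple and carrying on. This is a hypothetical PHP sketch, not ARC2 code. `load_fail_fast` and the `$insert` callable are made-up names standing in for the store's real insert step.

```php
<?php
// Hypothetical sketch (not an ARC2 API): stop a bulk load at the first
// failed insert. $insert returns true on success, or an error string.
function load_fail_fast(iterable $triples, callable $insert): array
{
    $loaded = 0;
    foreach ($triples as $triple) {
        $result = $insert($triple);
        if ($result !== true) {
            // With missing tables every later insert would fail too,
            // so bail out instead of piling up one error per triple.
            return ['loaded' => $loaded, 'error' => $result];
        }
        $loaded++;
    }
    return ['loaded' => $loaded, 'error' => null];
}
```

Stopping early also keeps memory bounded. A loader that keeps going would append an error message for every triple in a multi-million-triple file.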
16:27:49
ah
16:28:08
looks like rdf/xml is like 40% faster than n3
16:28:09
the increased memory may well be caused be the growing error log
16:28:22
bengee: that sounds right :)
16:28:23
s/be the/by the/
16:28:36
bengee: heheh
16:29:04
this is awesome, thanks for the help
16:29:25
np. I'm glad it wasn't a memory leak
16:29:51
the only memory leak was in my cranium
16:30:47
;)
