This is a public chat log generated from the #semsol IRC channel.
22:39:49
kwijibo wondering if maybe INSERT INTO <http://example.org> { ?s ?p ?o } shouldn't parse
07:08:35
bengee: you about?
07:09:25
hey, resurfacing from rick watching?
07:09:34
ooh
07:09:40
thanks for reminding me
07:09:53
kwijibo starts watching rick astley again
07:10:47
http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/Talis_StorePlugin.php my Talis_Store::import() method eventually runs out of memory, depending on the memory limit and the size of the data - I was wondering if you could see a way to make it not run out of memory :p
07:16:35
hmm
07:18:44
do you think it's the arc store not garbage-collecting properly?
07:19:03
I don't really understand how the memory usage works - I'd have thought that at the end of the loop it would all be cleared by the garbage collector and start over
07:19:11
but apparently not
07:20:07
is it php that uses the memory, or mysql perhaps?
07:20:19
(just to be sure)
07:20:23
php is the one that says it runs out of memory
07:20:27
ok
07:21:05
did you try to comment-out the $this->insert call
07:21:20
i.e. just loop through the record sets first
07:21:23
no - I'll try that
07:24:15
and doesn't (empty($data)) evaluate to true even if the construct is empty?
07:24:29
ah, no
07:24:33
"raw"
07:27:35
hmm, good call, that doesn't seem to be stopping
07:27:48
although I wish I'd lowered the memory before starting
07:30:09
hmm, seems to have slowed down considerably
07:30:29
I wonder if I'm going to get a row for D.O.Sing the platform again :p
07:31:58
ok, gotta go for a train now, cheers bengee cu l8r
07:32:05
ok
07:32:32
i wonder what's wrong with the insert call
12:03:11
hey bengee
12:03:15
back online now
12:03:35
it was because I was creating a new parser with each ->insert()
12:04:04
I'd have thought php could've coped with clearing up that, but apparently not
12:05:40
ah
12:09:26
I also thought iand told me that you needed a new parser for each document with arc2, but that seems not to be the case?
12:09:48
it seems to work just reusing the parser instance anyway
12:10:23
it depends, I think
12:10:43
some local variables may not be reset
12:11:10
unless you manually call __init()
12:11:13
what about if __init() was called at the start of parse() ?
12:11:18
;)
12:12:55
the reader stuff may be problematic
12:13:32
you may have to unset $this->reader
12:13:40
so that a new socket can be opened etc
12:14:12
it's probably less dangerous to create new parser objects
12:14:43
there might be other dependencies
12:15:28
or conflicts
12:19:05
in this case, the parser is being passed the document, so shouldn't need to open a new socket
12:19:22
ok, then it may work
12:19:46
seems less harmful than the script dying from memory overflow anyway
12:21:00
can I request a reusable parser for a future revision? :D
12:21:21
or is it more problematic than it sounds?
12:21:45
it feels problematic
12:21:50
not sure if it is
12:22:15
the parser->sub_parser chain makes things a bit complicated
12:22:35
i wonder why the parser object is hanging around in memory
12:23:03
don't I unset() it after parsing?
12:23:10
oh well, as I said, I don't understand how the garbage collector works
12:23:27
even if I unset it manually, I still get the memory overflow
12:23:55
I mean: unset($parser); return $foo;
12:25:36
maybe the xml parser has some weird global scope and isn't freed
12:25:59
that sounds plausible
12:28:36
hmm, which parser are we talking about, btw? the rdfxml one?
12:28:52
and you pass in a string?
12:29:16
maybe it's the reader that consumes all the memory
12:30:14
a data reader keeps the string in memory and uses a pointer to walk through it.
12:30:49
so, if the reader isn't properly killed, you'll keep all the data strings in mem
12:32:05
arc calls closeStream after parsing, though, so this shouldn't happen
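
A minimal sketch of the fix mentioned above: reusing one parser instead of building a new one inside every insert() call. StubParser and both insert functions are hypothetical stand-ins, not ARC's classes or API; the stub only counts how many parser objects get constructed, so the difference between the two patterns is visible. Whether a real ARC parser carries stale state from one document to the next is the open question from the discussion, and this sketch does not settle it.

```php
<?php
// Hypothetical stand-in for a parser; NOT ARC's real class.
class StubParser
{
    public static int $created = 0;
    private array $triples = [];

    public function __construct()
    {
        self::$created++;
    }

    // Reset per-document state, then "parse" one "s p o" triple per line.
    public function parse(string $data): array
    {
        $this->triples = [];
        foreach (explode("\n", trim($data)) as $line) {
            if ($line !== '') {
                $this->triples[] = explode(' ', $line, 3);
            }
        }
        return $this->triples;
    }
}

// Pattern described as running out of memory: a new parser per insert.
function insertWithNewParser(string $doc): int
{
    $parser = new StubParser();
    return count($parser->parse($doc));
}

// Pattern reported to work: one parser shared across all inserts.
function insertWithSharedParser(StubParser $parser, string $doc): int
{
    return count($parser->parse($doc));
}

$docs = ["a b c\nd e f", "g h i"];

StubParser::$created = 0;
foreach ($docs as $doc) {
    insertWithNewParser($doc);
}
$perCall = StubParser::$created;      // one parser object per document

StubParser::$created = 0;
$shared = new StubParser();
$total = 0;
foreach ($docs as $doc) {
    $total += insertWithSharedParser($shared, $doc);
}
$sharedCount = StubParser::$created;  // a single parser object overall
```

The stub resets its own state at the top of parse(), which is the behaviour suggested in the chat ("what about if __init() was called at the start of parse()?"); a real parser with sub-parsers and readers may need more than that.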
12:34:55
maybe there should be a streaming dumpTrix in arc which could then be accessed from a streaming trix loader/parser
12:35:36
i.e. $new_store->query("LOAD <oldendpoint/dumpTrix>")
12:36:50
bengee: emm, I think the turtle one actually - though I was using getRDFParser()
12:37:37
and I pass in a string
12:42:20
s/trix/sparql xml result/ would probably work, too, if ARC could stream those somehow
12:43:24
not sure I get the point of the streaming trix ?
12:43:53
you wouldn't have the offset/multi-parsing problem
12:44:16
you stream-insert in one store what's streaming out of another store
12:47:46
how would you stream the output?
12:48:11
yeah, that's the big question ;)
12:48:57
I'd need a single query that can generate g, s, p, o, s_type, o_type, o_dt, o_lang from ARC's normalized tables
12:49:49
+ mysql_unbuffered_query()
12:50:00
+ echo + flush()
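
The idea in the last three messages (one query, read row by row, each row written out immediately) can be sketched roughly as below. This is an illustration under assumptions, not ARC code: quadRows() is a hypothetical generator standing in for an unbuffered database result, and the tab-separated output format is invented for the example.

```php
<?php
// Hypothetical row source standing in for an unbuffered query result.
function quadRows(): Generator
{
    yield ['g1', 's1', 'p1', 'o1'];
    yield ['g1', 's2', 'p2', 'o2'];
}

// Write each row as soon as it is read instead of building one big string,
// so memory use stays flat no matter how many rows there are.
function streamRows(iterable $rows): int
{
    $count = 0;
    foreach ($rows as $row) {
        echo implode("\t", $row), "\n";
        flush(); // ask PHP to push the output towards the client now
        $count++;
    }
    return $count;
}
```

Calling streamRows(quadRows()) prints one line per row and returns the number of rows written.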
12:51:52
the OFFSET thing isn't a hardship for me, because I have to lump the triples into documents to send over http anyway
12:53:04
I'm just looking at how to make LOAD scalable with the streaming parsing, like it is in the arc store
12:53:20
that will be nice
12:54:10
one of the pains of using a talis store is getting medium-large amounts of data in there - it has to be chunked
12:54:30
ah, ok
12:54:31
and you've already solved that
12:54:40
:)
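
The chunking described above (lumping triples into documents small enough to send over HTTP) might look roughly like this. chunkTriples() is a hypothetical helper, not part of ARC or the Talis plugin, and the batch size of 2 is only for illustration.

```php
<?php
// Group an iterable of triples into batches of at most $size items.
function chunkTriples(iterable $triples, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('size must be at least 1');
    }
    $batch = [];
    foreach ($triples as $triple) {
        $batch[] = $triple;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final, possibly smaller, batch
    }
}

$triples = [
    ['s1', 'p', 'o'],
    ['s2', 'p', 'o'],
    ['s3', 'p', 'o'],
    ['s4', 'p', 'o'],
    ['s5', 'p', 'o'],
];

$sizes = [];
foreach (chunkTriples($triples, 2) as $batch) {
    // In a real import each batch would be serialized and sent as one document.
    $sizes[] = count($batch);
}
```

Because the helper is a generator, only one batch is held in memory at a time, which is the property the import loop needs.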
12:59:56
that's how I managed to get wordnet into the schema-cache store - imported it into arc, and imported into the platform from my arc store
13:00:43
heh
13:00:56
how many triples was that?
13:01:21
1.3 million in the biggest file
13:01:36
which was 95mb
13:01:41
I already had the others in
13:01:59
interesting
17:25:42
Hello
