Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

15:42:13 kwijibo: bengee: ?
15:42:22 bengee: yep
15:42:41 kwijibo: i'm noticing some weird behaviour with the parser/serialiser
15:42:50 kwijibo: actually I should narrow down which it is first
15:42:54 kwijibo: then bug you
15:55:02 kwijibo: bengee: yes, it's the parser
15:55:15 kwijibo: it's decoding html entities
15:55:28 kwijibo: i wonder if that was my 'fix' that did that :(
15:55:38 kwijibo: for the xmlliterals the other month
15:57:05 bengee: which parser is that?
15:57:36 kwijibo: rdf/xml
15:57:38 kwijibo: sorry
15:57:46 kwijibo: forgot to say
15:59:00 bengee: hmm, I thought your fix prevented the decoding..
16:00:08 bengee: maybe we have to make that more liberal if it's still happening
16:04:15 kwijibo: bengee: yeah, you're right
17:37:28 danja: bengee, 35MB of crufty rdf/xml parser/serialiser test data for ya : http://hyperdata.org/knobot_2007-12-28.rdf
17:37:53 danja: was output from Jena, but I think includes things like bad URIs
17:37:59 bengee: hey, cool
17:38:54 danja: prolly what kwijibo was hitting problems with
17:39:13 kwijibo: danja: yeah, actually i think it's the parser rather than your data
17:39:41 bengee: you're trying to move your blog data into a talis store?
17:40:09 danja: yep
17:40:39 danja: I still want to use knobot (to keep reto happy), but not be totally reliant on it...
17:41:04 danja: also link other stuff in (which knobot can do, but it can be clunky UI-wise)
17:41:26 danja: danja -> dogwalk
17:41:40 bengee: I see
17:41:59 kwijibo: kwijibo wonders how bengee intuited danja's evil plans
17:45:13 bengee: ;)
17:47:57 kwijibo: bengee: RDFa doctype triggers quirksmode?
17:49:12 bengee: it's a FF thing, it seems
17:49:50 kwijibo: is it in quirksmode? or is it some other issue?
17:50:09 bengee: render mode: standards compliant
17:51:14 kwijibo: weird
17:51:23 bengee: yeah
17:51:33 bengee: the css is a whole mess, but still..
17:52:06 bengee: bengee tries another template
17:53:27 bengee: yep, same problem
18:13:35 danja: danja does a drive-by bwahahaha
18:17:29 kwijibo: bengee: add $d = htmlspecialchars($d, ENT_NOQUOTES);
18:17:30 kwijibo: to h4Cdata ?
18:20:12 kwijibo: danja: Invalid URIs are not permitted in RDF/XML documents. Please replace the uri <mbox:thaynes{at}openlinksw.com> with a valid one.
18:22:57 danja: danja un-bwahahas
18:23:52 bengee: kwijibo, is the xmlliteral not coming in via parseType=literal, but as a plain literal with a datatype=rdf:XMLLiteral attribute perhaps?
18:25:23 bengee: that would explain why the parser doesn't change the state to 6
18:26:09 bengee: maybe this could then be fixed by adding a datatype check to h4Cdata
18:29:03 bengee: bengee adds htmlspecialchars to h4Cdata and runs syntax tests; maybe we can just add it for all cdata
18:32:00 bengee: hmm, there is no test for this, it seems
18:32:56 bengee: bengee takes danja's export as test file
18:43:27 bengee: bengee thinks he actually prefers to have unescaped markup after parsing
18:47:32 bengee: e.g. turtle has different escaping rules, seems to make sense to not pass entities to its serialiser etc
18:53:13 bengee: "parsed 291785 triples in 43.061 secs". the data *seems* to be ok
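A minimal sketch of the h4Cdata idea discussed above, written in Python rather than ARC2's PHP. It assumes that, as with PHP's expat-based xml_parser, the character-data handler receives text with entities already decoded. `collect_cdata` and its `re_escape` switch are hypothetical names for illustration; `html.escape(..., quote=False)` stands in for `htmlspecialchars($d, ENT_NOQUOTES)`, escaping `&`, `<` and `>` while leaving quotes alone.

```python
import html
import xml.parsers.expat


def collect_cdata(doc, re_escape):
    """Parse doc with expat and return the concatenated character data.

    Hypothetical helper: with re_escape=True each chunk goes through
    html.escape(quote=False), an analogue of PHP's
    htmlspecialchars($d, ENT_NOQUOTES) (escapes &, < and >, not quotes).
    """
    chunks = []

    def on_cdata(data):
        # expat delivers character data with entities already decoded,
        # possibly split across several calls; escaping per chunk is safe.
        chunks.append(html.escape(data, quote=False) if re_escape else data)

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_cdata
    parser.Parse(doc, True)
    return "".join(chunks)


doc = '<p>a &lt;b&gt; &amp; "c"</p>'
print(collect_cdata(doc, re_escape=False))  # a <b> & "c"
print(collect_cdata(doc, re_escape=True))   # a &lt;b&gt; &amp; "c"
```

With `re_escape=False` the markup comes back unescaped, which matches what bengee ends up preferring so that each serialiser (e.g. Turtle) can apply its own escaping rules; `re_escape=True` corresponds to kwijibo's proposed `htmlspecialchars` change.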