This is a public chat log generated from the #semsol IRC channel.
15:42:13
bengee: ?
15:42:22
yep
15:42:41
i'm noticing some weird behaviour with the parser/serialiser
15:42:50
actually I should narrow down which it is first
15:42:54
then bug you
15:55:02
bengee: yes, it's the parser
15:55:15
it's decoding html entities
15:55:28
i wonder if that was my 'fix' that did that :(
15:55:38
for the xmlliterals the other month
15:57:05
which parser is that?
15:57:36
rdf/xml
15:57:38
sorry
15:57:46
forgot to say
15:59:00
hmm, I thought your fix prevented the decoding..
16:00:08
maybe we have to make that more liberal if it's still happening
16:04:15
bengee: yeah, you're right
17:37:28
bengee, 35MB of crufty rdf/xml parser/serialiser test data for ya : http://hyperdata.org/knobot_2007-12-28.rdf
17:37:53
was output from Jena, but I think includes things like bad URIs
17:37:59
hey, cool
17:38:54
prolly what kwijibo was hitting problems with
17:39:13
danja: yeah, actually i think it's the parser rather than your data
17:39:41
you're trying to move your blog data into a talis store?
17:40:09
yep
17:40:39
I still want to use knobot (to keep reto happy), but not be totally reliant on it...
17:41:04
also link other stuff in (which knobot can do, but it can be clunky UI-wise)
17:41:26
danja -> dogwalk
17:41:40
I see
17:41:59
kwijibo wonders how bengee intuited danja's evil plans
17:45:13
;)
17:47:57
bengee: RDFa doctype triggers quirksmode?
17:49:12
it's a FF thing, it seems
17:49:50
is it in quirksmode? or is it some other issue?
17:50:09
render mode: standards compliant
17:51:14
weird
17:51:23
yeah
17:51:33
the css is a whole mess, but still..
17:52:06
bengee tries another template
17:53:27
yep, same problem
18:13:35
danja does a drive-by bwahahaha
18:17:29
bengee: add $d = htmlspecialchars($d, ENT_NOQUOTES);
18:17:30
to h4Cdata ?
18:20:12
danja: Invalid URIs are not permitted in RDF/XML documents. Please replace the uri <mbox:thaynes{at}openlinksw.com> with a valid one.
18:22:57
danja un-bwahahas
18:23:52
kwijibo, is the xmlliteral not coming in via parseType=literal, but as a plain literal with a datatype=rdf:XMLLiteral attribute perhaps?
18:25:23
that would explain why the parser doesn't change the state to 6
18:26:09
maybe this could then be fixed by adding a datatype check to h4Cdata
18:29:03
bengee adds htmlspecialchars to h4Cdata and runs syntax tests, maybe we can just add it for all cdata
18:32:00
hmm, there is no test for this, it seems
18:32:56
bengee takes danja's export as test file
18:43:27
bengee thinks he actually prefers to have unescaped markup after parsing
18:47:32
e.g. turtle has different escaping rules, seems to make sense to not pass entities to its serialiser etc
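A minimal sketch of the trade-off discussed above, written in Python rather than the parser's own PHP. It assumes that `html.escape(text, quote=False)` is a fair stand-in for `htmlspecialchars($d, ENT_NOQUOTES)`, since both escape only `&`, `<` and `>`. The function names are hypothetical and are not part of the parser under discussion. The idea is that the parser hands on unescaped text and each serialiser applies its own escaping rules.

```python
import html


def parse_cdata(raw: str) -> str:
    """Decode entities, as an XML parser does when it delivers character data."""
    return html.unescape(raw)


def escape_for_rdfxml(text: str) -> str:
    """Escape &, < and > only (assumed analogue of ENT_NOQUOTES behaviour)."""
    return html.escape(text, quote=False)


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle short string: backslash escapes, no entities."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


if __name__ == "__main__":
    raw = 'AT&amp;T says &lt;b&gt;"hi"&lt;/b&gt;'
    value = parse_cdata(raw)          # unescaped markup after parsing
    print(value)
    print(escape_for_rdfxml(value))   # re-escaped only when writing RDF/XML
    print(turtle_literal(value))      # Turtle needs different escaping
```

Keeping the parsed value unescaped means the RDF/XML serialiser can round-trip the original entities, while the Turtle serialiser never sees them.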
18:53:13
"parsed 291785 triples in 43.061 secs". the data *seems* to be ok
