Channel #semsol: Logs

This is a public chat log generated from the #semsol IRC channel.

14:10:41 danbri: bengee, does arc have any foaf discovery code?
14:10:54 danbri: given http://danbri.org/ can it sniff its way to http://danbri.org/foaf.rdf ?
14:10:59 bengee: hmm
14:11:09 bengee: maybe the dc extractor
14:11:15 bengee: lemme check
14:11:27 danbri: or even a function to read the link-rels would be fine
14:11:45 danbri: i'm trying it inside a simple php/openid session
14:11:57 danbri: at moment i read only the rdfa in my homepage, which was enough to prove a point
14:12:26 danbri: also, is there a utility method that uses and destroys/hides a uuid: named graph for when you want to sparql, but don't care about keeping the data in mysql?
14:12:39 danbri: not sure what i mean by 'hides'; perhaps 'caches'
14:13:31 danbri: in current hack, i'm pulling the remote data into tag:danbri.org:2008:phpsid:6923c94fb05995a11cb4e415c133a220 ... named after the php session
14:13:59 bengee: the dc extractor should give you <> rdfs:seeAlso <link.rel=meta|alternate.href> . <link.rel=meta|alternate.href> dc:title "link.title" ; dc:format "link.type" .
14:14:01 danbri: and i guess i might want to make a graph that superimposes homepage markup (rdfa, microformats) with nearby feeds, foaf.rdf etc
14:14:18 danbri: next hope was to try an oauth contact-list importer but am still nosing around those apis
14:15:02 bengee: interesting
14:15:50 danbri: re the above, i mean basically a way of pretending you offer in-memory sparql query, by making the sql store an implementation detail
14:16:10 danbri: would require some thought re data visibility (caching could be useful; privacy of data to app could be useful; some tension there)
14:17:23 bengee: bengee has been using temporary graphs, too
14:18:28 bengee: I was wondering about a separate "inbox" store which is then used to pull verified/tweaked data from into the app's main store
14:19:07 bengee: but there is no utility method which does that transparently
14:19:55 bengee: you have to do the LOAD INTO / DELETE FROM manually
14:23:30 bengee: if you don't need sophisticated sparql, you can of course just load/parse the source into an index structure
14:25:05 bengee: and then do $see_also = $index['http://danbri.org']['http://...seeAlso'][0]['value']
14:25:34 danbri: ah yeah i tried navigating the nest of arrays i got back from the parser
14:25:40 danbri: then thought 'sod it, i need sparql'
14:25:47 bengee: heh, ok
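
The "index structure" lookup bengee describes can be sketched roughly as below. This is a minimal Python illustration of the nested subject/predicate/object layout, not ARC's parser output: `build_index` and the sample triple are hypothetical, and the only names taken from the log are the rdfs:seeAlso predicate and the two danbri.org URLs.

```python
from collections import defaultdict

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def build_index(triples):
    """Group (subject, predicate, object, object_type) tuples into
    index[subject][predicate] = [{"value": ..., "type": ...}, ...]."""
    index = defaultdict(lambda: defaultdict(list))
    for s, p, o, o_type in triples:
        index[s][p].append({"value": o, "type": o_type})
    return index


# Illustrative data only: one seeAlso triple of the kind discussed in the log.
triples = [
    ("http://danbri.org/", RDFS_SEE_ALSO, "http://danbri.org/foaf.rdf", "uri"),
]

index = build_index(triples)

# Same shape as the PHP lookup quoted above: subject, predicate, first object.
see_also = index["http://danbri.org/"][RDFS_SEE_ALSO][0]["value"]
```

This is fine for a single known subject and predicate; anything involving joins across resources is where a nest of arrays gets painful and a query language starts to pay off.
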
14:26:53 danbri: also wondering about 'nice' ways of superimposing graphs
14:27:33 danbri: imagine danbri's homepage/mf/rdfa, rss feed, foaf.rdf, flickr-foaf.rdf etc are all in different named graphs
14:27:38 danbri: can we do
14:27:41 danbri: WHERE {
14:27:50 danbri: GRAPH ?g { stuff we ask of the aggregate }
14:28:23 danbri: ?g xyz:somerelation <http://danbri.org/>
14:28:32 danbri: ... and expect reasonable performance?
14:28:45 danbri: or some other grouping construct, eg. more uuid / session oriented
14:29:18 bengee: haven't ried, but it could work
14:29:24 bengee: *t*ried
14:32:17 danbri: i think i need to try
14:32:34 danbri: so main Q of all these, is help poking around the link-rel from a .html page (whether xhtml or not)
14:33:29 bengee: a LOAD with default extractor settings should give you seeAlsos
14:33:47 bengee: if you used rel=meta|alternate
14:35:19 bengee: bengee tries locally
14:36:28 bengee: yep, works here
14:36:33 danbri: ah sweet
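
For readers without ARC at hand, the link-rel discovery asked about above can be approximated with the Python standard library. This is a hedged sketch, not ARC's extractor: `LinkRelParser`, `discover_links`, and the sample markup are all hypothetical, and only the rel=meta|alternate convention comes from the log.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelParser(HTMLParser):
    """Collect <link> elements whose rel contains "meta" or "alternate"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = set((a.get("rel") or "").lower().split())
        if a.get("href") and rels & {"meta", "alternate"}:
            self.links.append({
                # Resolve relative hrefs against the page URL.
                "href": urljoin(self.base_url, a["href"]),
                "type": a.get("type"),
                "title": a.get("title"),
            })


def discover_links(html, base_url):
    parser = LinkRelParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, not the actual content of any homepage.
SAMPLE = (
    '<html><head>'
    '<link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf" />'
    '<link rel="stylesheet" href="style.css">'
    '</head><body></body></html>'
)

links = discover_links(SAMPLE, "http://danbri.org/")
```

Each hit carries the three pieces the extractor description mentions: the href (the seeAlso target), the type (dc:format), and the title (dc:title).
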
14:36:46 danbri: if there are multiple formats, should i get all of 'em?
14:36:54 danbri: rdfa plus microformats plus html-based?
14:36:59 danbri: in the same graph?
14:37:11 danbri: (assuming they're all active extractors)
14:37:17 bengee: unless you do conneg, yes
14:37:34 danbri: cool
14:37:37 bengee: (e.g. if you serve rdf/xml directly, instead of the html)
14:37:59 bengee: when you serve html, all active extractors are applied
14:39:27 bengee: rdfa needs the rdfa doctype, and erdf needs the profile hook, though
14:40:25 bengee: things became a bit too slow otherwise
14:42:06 bengee: I'm trying to implement a "one pass for all posh formats" approach to accelerate the extractor, but atm, each extractor needs a separate walk through the dom
14:48:10 danbri: aw, can't the rdfa be sniffed?
14:48:28 kwijibo: how?
14:48:48 bengee: yeah, by one of their dozen new attributes
14:48:49 danbri: hmm there's a thread in my mailbox with brad fitzpatrick re google sgapi doing a sniff
14:49:09 bengee: but the dtd was easier for now
14:49:12 danbri: there might even be perl around
14:49:14 danbri: i'll dig it out
14:49:16 danbri: but i better get to my departure gate 1st
14:49:49 bengee: I guess the ns defs make a good indicator, too
14:50:14 kwijibo: would you have to parse the whole doc first though to look for rdfa attributes?
14:50:42 bengee: but that won't work for reserved predicates (license and such), so the dtd is the only reliable thing (or profile)
14:51:06 bengee: ARC parses the whole doc once anyway
14:51:29 kwijibo: before invoking the specific parsers?
14:51:35 bengee: yes
14:52:05 bengee: it's not the most efficient approach, I admit, but the extarctors work on a node index
14:52:28 bengee: heh, "extarctors"
14:52:32 kwijibo: one common false positive you might get with the ns, is xslt generated html, where the ns leak into the output
14:53:55 danbri: you could scan the document-as-string looking for strong indicators of rdfa, then do a real parse
14:54:18 danbri: false positives get stripped out by the real parse, so heuristic is harmless?
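
The scan-then-parse idea can be sketched as a cheap string prefilter in front of the real parser. This is a rough illustration under assumptions: the attribute list treated as a "strong indicator" is a guess at what would work, and `real_parse` is a hypothetical stand-in for whatever full RDFa extractor is in use.

```python
import re

# Assumption: these RDFa-specific attribute names are strong enough hints.
RDFA_HINT = re.compile(
    r"""\s(?:about|property|typeof|datatype|resource)\s*=\s*["']""",
    re.IGNORECASE,
)


def might_contain_rdfa(html_text):
    """Cheap check on the raw string; may give false positives."""
    return RDFA_HINT.search(html_text) is not None


def extract_if_likely(html_text, real_parse):
    """Run the expensive parse only when the prefilter fires.

    real_parse is a hypothetical callable that returns a list of triples.
    """
    if not might_contain_rdfa(html_text):
        return []
    return real_parse(html_text)
```

A false positive only costs one unnecessary parse. A false negative silently drops data, which is the case bengee raises about reserved predicates such as license that need none of these attributes.
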
14:54:36 bengee: or the rdf-in-xhtml group could come up with a proposal that's more practical ;)
14:54:54 danbri: any suggestions?
14:55:17 kwijibo: actually, what you said about the node index does sound pretty efficient anyway
14:57:46 bengee: would be nice if rdfa, mfs, erdf, and posh formats were based on a common approach that allowed single-pass processing
14:58:20 kwijibo: what are posh formats?
14:58:32 bengee: plain old semantic html
14:59:22 bengee: the stuff that grddl is made for
14:59:24 kwijibo: i mean, what examples are there?
14:59:55 bengee: consistent class names, title, h1-h6, ol/li, etc
15:01:05 kwijibo: does arc parse any posh ?
15:01:18 bengee: not in a generic way
15:02:11 bengee: I'm dreaming of something like DIY grddl where you define some mapping between html hooks and rdf triples
15:02:32 kwijibo: cool
15:02:38 bengee: ARC could have a set of predefined definitions for MFs
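
The "DIY grddl" idea, a declared mapping from HTML hooks to RDF triples applied in one pass, might look something like the sketch below. Everything here is hypothetical: the `MAPPING` table, the `HookExtractor` class, and the choice of Dublin Core predicates are illustrative assumptions, not anything ARC provides.

```python
from html.parser import HTMLParser

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping: (tag name, class name or None) -> predicate URI.
MAPPING = {
    ("h1", None): DC + "title",
    ("span", "author"): DC + "creator",
}


class HookExtractor(HTMLParser):
    """Single pass over the markup; emits (subject, predicate, text) triples."""

    def __init__(self, subject, mapping):
        super().__init__()
        self.subject = subject
        self.mapping = mapping
        self.triples = []
        self._capture = None  # state for the element currently being captured

    def _lookup(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for key in [(tag, c) for c in classes] + [(tag, None)]:
            if key in self.mapping:
                return self.mapping[key]
        return None

    def handle_starttag(self, tag, attrs):
        if self._capture is not None:
            # Track nesting of the same tag so capture ends at the right close.
            if tag == self._capture["tag"]:
                self._capture["depth"] += 1
            return
        predicate = self._lookup(tag, attrs)
        if predicate:
            self._capture = {"tag": tag, "predicate": predicate,
                             "depth": 1, "text": []}

    def handle_data(self, data):
        if self._capture is not None:
            self._capture["text"].append(data)

    def handle_endtag(self, tag):
        cap = self._capture
        if cap is None or tag != cap["tag"]:
            return
        cap["depth"] -= 1
        if cap["depth"] == 0:
            value = " ".join("".join(cap["text"]).split())  # normalise whitespace
            self.triples.append((self.subject, cap["predicate"], value))
            self._capture = None


# Illustrative markup only.
SAMPLE = ('<body><h1>Example  page</h1>'
          '<p>by <span class="author">Dan <span>B.</span></span></p></body>')

extractor = HookExtractor("http://example.org/", MAPPING)
extractor.feed(SAMPLE)
extractor.close()
triples = extractor.triples
```

Predefined mappings for microformats would then just be additional tables passed to the same walker, which is what makes the single-pass approach attractive.
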