This is a public chat log generated from the #semsol IRC channel.
14:10:41
bengee, does arc have any foaf discovery code?
14:10:54
given http://danbri.org/ can it sniff its way to http://danbri.org/foaf.rdf ?
14:10:59
hmm
14:11:09
maybe the dc extractor
14:11:15
lemme check
14:11:27
or even a function to read the link-rels would be fine
14:11:45
i'm trying it inside a simple php/openid session
14:11:57
at moment i read only the rdfa in my homepage, which was enough to prove a point
14:12:26
also, is there a utility method that uses and destroys/hides a uuid: named graph for when you want to sparql, but don't care about keeping the data in mysql?
14:12:39
not sure what i mean by 'hides'; perhaps 'caches'
14:13:31
in current hack, i'm pulling the remote data into tag:danbri.org:2008:phpsid:6923c94fb05995a11cb4e415c133a220 ... named after the php session
14:13:59
the dc extractor should give you <> rdfs:seeAlso <link.rel=meta|alternate.href> . <link.rel=meta|alternate.href> dc:title "link.title" ; dc:format "link.type" .
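[note] The link-rel sniffing asked about above can be approximated without ARC. This is a minimal sketch in Python's standard library, not ARC's PHP extractor; the names `LinkRelSniffer` and `discover_links` are made up for illustration, and it only assumes the page advertises its RDF through `<link rel="meta">` or `<link rel="alternate">`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelSniffer(HTMLParser):
    """Collect <link rel="meta|alternate"> elements from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely
        rels = set((a.get("rel") or "").lower().split())
        if a.get("href") and rels & {"meta", "alternate"}:
            self.links.append({
                "href": urljoin(self.base_url, a["href"]),  # resolve relative hrefs
                "title": a.get("title"),
                "type": a.get("type"),
            })


def discover_links(html, base_url):
    """Return the candidate seeAlso targets found in the page head."""
    sniffer = LinkRelSniffer(base_url)
    sniffer.feed(html)
    return sniffer.links
```

Each returned entry carries the same three pieces the dc extractor is described as emitting: the href (seeAlso target), the link title, and the link type. It works on non-XHTML pages too, since `html.parser` is tolerant of tag soup.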
14:14:01
and i guess i might want to make a graph that superimposes homepage markup (rdfa, microformats) with nearby feeds, foaf.rdf etc
14:14:18
next hope was to try an oauth contact-list importer but am still nosing around those apis
14:15:02
interesting
14:15:50
re the above, i mean basically a way of pretending you offer in-memory sparql query, by making the sql store an implementation detail
14:16:10
would require some thought re data visibility (caching could be useful; privacy of data to app could be useful; some tension there)
14:17:23
bengee has been using temporary graphs, too
14:18:28
I was wondering about a separate "inbox" store which is then used to pull verified/tweaked data from into the app's main store
14:19:07
but there is no utility method which does that transparently
14:19:55
you have to do the LOAD INTO / DELETE FROM manually
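[note] The manual LOAD INTO / DELETE FROM dance could be wrapped in a small helper. A rough sketch in Python, not an existing ARC utility: `run_query` is a hypothetical callable that sends one query string to the store, and the `urn:uuid:` graph name is just one way to get a throwaway graph.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def temporary_graph(run_query, source_url):
    """Load source_url into a throwaway named graph and drop it afterwards.

    run_query is a hypothetical callable that passes a query string to the
    store; the LOAD/DELETE strings follow the statements named in the chat.
    """
    graph = "urn:uuid:" + str(uuid.uuid4())
    try:
        run_query("LOAD <%s> INTO <%s>" % (source_url, graph))
        yield graph
    finally:
        # Runs even if the caller's queries raise, so no stale data is kept
        run_query("DELETE FROM <%s>" % graph)
```

Usage would look like `with temporary_graph(run_query, url) as g:` followed by queries restricted to `GRAPH <g>`, which makes the SQL store an implementation detail from the caller's point of view.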
14:23:30
if you don't need sophisticated sparql, you can of course just load/parse the source into an index structure
14:25:05
and then do $see_also = $index['http://danbri.org']['http://...seeAlso'][0]['value']
14:25:34
ah yeah i tried navigating the nest of arrays i got back from the parser
14:25:40
then thought 'sod it, i need sparql'
14:25:47
heh, ok
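[note] The "nest of arrays" is a subject → predicate → list-of-objects index. A minimal Python sketch of that shape, assuming triples arrive as (subject, predicate, object, object type) tuples; `build_index` and `first_value` are illustrative names, not ARC functions.

```python
from collections import defaultdict

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def build_index(triples):
    """Group triples as index[subject][predicate] = [{"value": ..., "type": ...}]."""
    index = defaultdict(lambda: defaultdict(list))
    for s, p, o, o_type in triples:
        index[s][p].append({"value": o, "type": o_type})
    return index


def first_value(index, subject, predicate):
    """Return the first object value, or None when the path is missing."""
    objects = index.get(subject, {}).get(predicate, [])
    return objects[0]["value"] if objects else None
```

A guarded accessor like `first_value` takes some of the pain out of walking the nesting by hand, though anything involving joins across subjects is where SPARQL starts to pay off.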
14:26:53
also wondering about 'nice' ways of superimposing graphs
14:27:33
imagine danbri's homepage/mf/rdfa, rss feed, foaf.rdf, flickr-foaf.rdf etc are all in different named graphs
14:27:38
can we do
14:27:41
WHERE {
14:27:50
GRAPH ?g { stuff we ask of the aggregate }
14:28:23
?g xyz:somerelation <http://danbri.org/> }
14:28:32
... and expect reasonable performance?
14:28:45
or some other grouping construct, eg. more uuid / session oriented
14:29:18
haven't tried, but it could work
14:32:17
i think i need to try
14:32:34
so main Q of all these, is help poking around the link-rel from a .html page (whether xhtml or not)
14:33:29
a LOAD with default extractor settings should give you seeAlsos
14:33:47
if you used rel=meta|alternate
14:35:19
bengee tries locally
14:36:28
yep, works here
14:36:33
ah sweet
14:36:46
if there are multiple formats, should i get all of 'em?
14:36:54
rdfa plus microformats plus html-based?
14:36:59
in the same graph?
14:37:11
(assuming they're all active extractors)
14:37:17
unless you do conneg, yes
14:37:34
cool
14:37:37
(e.g. if you serve rdf/xml directly, instead of the html)
14:37:59
when you serve html, all active extractors are applied
14:39:27
rdfa needs the rdfa doctype, and erdf needs the profile hook, though
14:40:25
things became a bit too slow otherwise
14:42:06
I'm trying to implement a "one pass for all posh formats" approach to accelerate the extractor, but atm, each extractor needs a separate walk through the dom
14:48:10
aw, can't the rdfa be sniffed?
14:48:28
how?
14:48:48
yeah, by one of their dozen new attributes
14:48:49
hmm there's a thread in my mailbox with brad fitzpatrick re google sgapi doing a sniff
14:49:09
but the dtd was easier for now
14:49:12
there might even be perl around
14:49:14
i'll dig it out
14:49:16
but i better get to my departure gate 1st
14:49:49
I guess the ns defs make a good indicator, too
14:50:14
would you have to parse the whole doc first though to look for rdfa attributes?
14:50:42
but that won't work for reserved predicates (license and such), so the dtd is the only reliable thing (or profile)
14:51:06
ARC parses the whole doc once anyway
14:51:29
before invoking the specific parsers?
14:51:35
yes
14:52:05
it's not the most efficient approach, I admit, but the extarctors work on a node index
14:52:28
heh, "extarctors"
14:52:32
one common false positive you might get with the ns, is xslt generated html, where the ns leak into the output
14:53:55
you could scan the document-as-string looking for strong indicators of rdfa, then do a real parse
14:54:18
false positives get stripped out by the real parse, so heuristic is harmless?
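[note] The scan-the-string idea might look like the following. A heuristic sketch in Python, assuming the "strong indicators" are the attributes RDFa adds to XHTML (about, property, typeof, resource, datatype) or the XHTML+RDFa doctype; the pattern and the name `might_contain_rdfa` are illustrative guesses, not what ARC or anyone in the chat actually uses.

```python
import re

# Attributes introduced by RDFa, or the XHTML+RDFa public identifier.
RDFA_HINTS = re.compile(
    r'\b(?:about|property|typeof|resource|datatype)\s*=\s*["\']'
    r'|-//W3C//DTD XHTML\+RDFa',
    re.IGNORECASE,
)


def might_contain_rdfa(document):
    """Cheap pre-check on the raw string; a real parse must confirm the result."""
    return RDFA_HINTS.search(document) is not None
```

A hit only means the real parse is worth running. False positives (for example an unrelated `data-property="..."` attribute) cost one wasted parse, while a page that uses only reserved rel values such as license would slip past the attribute check and be caught only by the doctype.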
14:54:36
or the rdf-in-xhtml group could come up with a proposal that's more practical ;)
14:54:54
any suggestions?
14:55:17
actually, what you said about the node index does sound pretty efficient anyway
14:57:46
would be nice if rdfa, mfs, erdf, and posh formats were based on a common approach that allowed single-pass processing
14:58:20
what are posh formats?
14:58:32
plain old semantic html
14:59:22
the stuff that grddl is made for
14:59:24
i mean, what examples are there?
14:59:55
consistent class names, title, h1-h6, ol/li, etc
15:01:05
does arc parse any posh?
15:01:18
not in a generic way
15:02:11
I'm dreaming of something like DIY grddl where you define some mapping between html hooks and rdf triples
15:02:32
cool
15:02:38
ARC could have a set of predefined definitions for MFs
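[note] The "DIY grddl" idea, a declared mapping between html hooks and rdf triples, could be sketched roughly as below. This is Python for illustration only and does not reflect anything ARC ships: the `MAPPING` table, `HookExtractor`, and `extract` are hypothetical, and the hooks are limited to (tag, class) pairs whose text content becomes a literal.

```python
from html.parser import HTMLParser

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical hook table: (tag, class or None) -> predicate URI
MAPPING = {
    ("title", None): DC + "title",
    ("span", "author"): DC + "creator",
}


class HookExtractor(HTMLParser):
    """Turn the text of mapped elements into (subject, predicate, literal) triples."""

    def __init__(self, subject, mapping):
        super().__init__()
        self.subject = subject
        self.mapping = mapping
        self.triples = []
        self._open = []  # mapped elements that are still open

    def handle_starttag(self, tag, attrs):
        # Count nested same-name tags so the right end tag closes each hook
        for entry in self._open:
            if entry["tag"] == tag:
                entry["depth"] += 1
        classes = (dict(attrs).get("class") or "").split()
        for key in [(tag, c) for c in classes] + [(tag, None)]:
            if key in self.mapping:
                self._open.append(
                    {"tag": tag, "pred": self.mapping[key], "parts": [], "depth": 0})
                break

    def handle_data(self, data):
        for entry in self._open:
            entry["parts"].append(data)

    def handle_endtag(self, tag):
        top = self._open[-1] if self._open else None
        if top and top["tag"] == tag and top["depth"] == 0:
            self._open.pop()
            text = "".join(top["parts"]).strip()
            if text:
                self.triples.append((self.subject, top["pred"], text))
        for entry in self._open:
            if entry["tag"] == tag and entry["depth"] > 0:
                entry["depth"] -= 1


def extract(subject, html, mapping):
    parser = HookExtractor(subject, mapping)
    parser.feed(html)
    return parser.triples
```

Because every hook is matched during one walk over the markup, a table like this would also fit the single-pass goal mentioned earlier; predefined tables for individual microformats could then be plain data rather than separate extractors.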
