Re: [arc-dev] Some questions about ARC
From: "John A. Crow"
Subject: Re: [arc-dev] Some questions about ARC
Date: Tue, 19 Aug 2008 09:56:07 -0500
Ben -
"Between vacations" ??? That's a cool way to look at work ... :)
On Aug 19, 2008, at 8:23 AM, Benjamin Nowack wrote:
>
>
> Hi,
>
> just a quick reply (I'm between two vacations, just went online to do some
> email inbox housekeeping).
>
> On 15.08.2008 12:52:20, Bruno Barberi Gnecco wrote:
>
>> * has anyone used it with large databases (say, >10 million triples)?
> Highest reported number of triples using the available codebase so far
> was 11MT. Updates are horribly slow at that scale, though. I did a
> couple of successful tests for mid-size datasets (10-50 MT), but the
> required store optimizer code (which splits the triple tables) is not
> released yet.
>
>> Analyzing the SQL tables I noticed that mediumint is used for everything,
>> including primary keys for triples (limiting it to 16M triples). Any reason
>> for that?
> Efficiency. The optimizer mentioned above will auto-adjust column types
> when the store size grows beyond a certain point.
>
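For reference, the 16M figure follows from the 3-byte width of MEDIUMINT. Below is a rough Python sketch of the arithmetic, assuming the key columns are UNSIGNED (which matches the number quoted above). It is not ARC code and says nothing about how the unreleased optimizer picks types; the helper names are made up for illustration.

```python
# Sketch only: MySQL integer column ranges derived from their storage widths.
# The byte widths are MySQL's documented sizes; the function names are hypothetical.
BYTE_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def max_unsigned(column_type):
    # Largest value an UNSIGNED column of this type can hold.
    return 2 ** (8 * BYTE_WIDTHS[column_type]) - 1


def smallest_type_for(row_count):
    # Narrowest unsigned type whose maximum still covers row_count ids.
    for name in sorted(BYTE_WIDTHS, key=BYTE_WIDTHS.get):
        if max_unsigned(name) >= row_count:
            return name
    raise ValueError("row count exceeds BIGINT UNSIGNED")
```

With these widths, an 11MT store still fits in MEDIUMINT ids, while a 50MT store would need at least INT.
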
>> * any performance benchmarks? The only one I found seems to be rather old:
>> http://cweiske.de/tagebuch/SPARQL Engines Benchmark Results.htm
> None yet, but I've already explored the different benchmark options and
> will use SP²B [1] for tests soon. Again, the store optimizer is the main
> dependency that's still missing.
>
>> * what is the triple_backup table used for?
> The backup table is no longer used. If you are using one of the latest
> revisions, you can delete it.
>
>> In fact, is there some place which
>> describes all the mysql tables?
> Not yet, things are still evolving. I'll describe them once the dust has
> settled.
>
>> * the end user documentation is awesome, but I couldn't find any docs on the
>> internals of ARC. Does anything like that exist?
> Not really. I occasionally blog about findings, but the limited resources so
> far have been spent on code revisions and HowTos.
>
>> * what is the process for accepting patches, should I post them here? Something
>> which I could immediately use is a setDBCon() in ARC2_Store, to avoid opening a
>> 2nd DB connection if another one is already open.
> ARC should check that automatically (only for single-store setups, though), but
> in general, yes, just send your patches to me or the list. I guess one day I
> have to set up some sort of collaborative repository, so far it still seems
> to work with me being the bottleneck.
>
>> * any reason for using mysql instead of mysqli? Is there interest in a port to
>> mysqli?
> I'm using mysql_* functions in so many places that maintaining both would be
> a bit too much work. I am thinking of a "custom ARC" generator, though, as
> there is an increasing number of ARC users who only need a subset of the
> grown codebase (e.g. Talis doesn't use the store or SPARQLScript, others may
> only need the parsers, or the microformat extractors). Such a generator would
> allow me to simply str_replace the mysql commands with corresponding mysqli
> ones. For people without mysql_*, I could perhaps add a small drop-in that
> redefines the missing functions, but so far, I didn't have such requests.
>
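The str_replace idea above could look roughly like the following build-time pass, sketched here in Python rather than PHP. This is only an illustration, not the planned generator: the RENAMES table and the port_source name are hypothetical, and only a few real function pairs are listed. A plain rename is also not a complete port, since the procedural mysqli functions such as mysqli_query() take the connection as their first argument, so call sites would still need their arguments reordered.

```python
# Sketch only: rename a few mysql_* calls to their mysqli_* counterparts.
# The mapping and function name are illustrative, not part of ARC.
RENAMES = {
    "mysql_query(": "mysqli_query(",
    "mysql_fetch_array(": "mysqli_fetch_array(",
    "mysql_num_rows(": "mysqli_num_rows(",
    "mysql_error(": "mysqli_error(",
}


def port_source(php_source):
    # Plain substring replacement, the same idea as PHP's str_replace.
    # Argument order is left untouched and must be fixed separately.
    for old, new in RENAMES.items():
        php_source = php_source.replace(old, new)
    return php_source
```

Running the pass twice is harmless, because the string "mysqli_query(" does not contain "mysql_query(" as a substring.
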
>> Thanks a lot for ARC2 and any replies!
> Thanks for the questions! Much appreciated.
>
> Benji
>
> [1] http://dbis.informatik.uni-freiburg.de/index.php?project=SP2B
>
>
>> --
>> Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
>>
>
>
> --
> Benjamin Nowack
> http://bnode.org/
>
""" ;
ns1:returnPath "<crow@umn.edu>" ;
ns1:xOriginalTo "arc-dev@semsol.org" ;
ns1:deliveredTo "web11p1@p15192371.pureserver.info" ;
ns1:received """from [128.101.213.49] (x-128-101-213-49.wireless.umn.edu [128.101.213.49])
by mta-a2.tc.umn.edu (UMN smtpd) with ESMTP
for <arc-dev@semsol.org>; Tue, 19 Aug 2008 09:56:15 -0500 (CDT)""" ;
ns1:xUmnRemoteMta "[N] x-128-101-213-49.wireless.umn.edu [128.101.213.49] #+LO+TS+AU+HN" ;
ns1:xUmnClassification "local" ;
ns1:mimeVersion "1.0 (Apple Message framework v753.1)" ;
ns1:inReplyTo "<PM-GA.20080819152337.3CE8C.1.1D@semsol.com>" ;
ns1:references "<48A5A634.4010409@gmail.com> <PM-GA.20080819152337.3CE8C.1.1D@semsol.com>" ;
ns1:contentType "text/plain; charset=ISO-8859-1; delsp=yes; format=flowed" ;
ns1:messageId "<8B3D418F-D539-4751-801F-FF3709CE62A4@umn.edu>" ;
ns1:contentTransferEncoding "quoted-printable" ;
ns1:from '"John A. Crow" <crow@umn.edu>' ;
ns1:subject "Re: [arc-dev] Some questions about ARC" ;
ns1:date "Tue, 19 Aug 2008 09:56:07 -0500" ;
ns1:to '"arc-dev" <arc-dev@semsol.org>' ;
ns1:xMailer "Apple Mail (2.753.1)" ;
ns1:xSpamCheckerVersion """SpamAssassin 2.64 (2004-01-11) on
p15192371.pureserver.info""" ;
ns1:xSpamLevel "" ;
ns1:xSpamStatus """No, hits=-1.5 required=5.0 tests=BAYES_01 autolearn=ham
version=2.64