Thursday, February 16, 2006

Topic Maps and RDF scutters (assertion spidering bots): State of the art?

I am looking for the state of the art in Topic Maps and RDF scutters (assertion spidering bots).
Do you have any useful pointers/hints?

The growing use of semantic knowledge technologies like RDF and Topic Maps should result in larger collections of represented assertions available on the internet. Scutters (information agents spidering such assertions) could collect and integrate them.

One example of such an assertion scutter is:
http://rdfweb.org/topic/Scutter
http://rdfweb.org/topic/ScutterVocab
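
A minimal sketch of what such a scutter does, assuming it discovers further documents by following rdfs:seeAlso links and records the source document of every triple. The function names, the toy in-memory "web" and the example.org URLs are hypothetical; a real scutter would plug an HTTP client and an RDF parser into fetch_triples.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def scutter(seed_urls, fetch_triples, max_docs=100):
    """Breadth-first crawl; returns (subject, predicate, object, source_url) quads."""
    queue = deque(seed_urls)
    visited = set()
    store = []
    while queue and len(visited) < max_docs:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            triples = fetch_triples(url)
        except Exception:
            continue  # unreachable or unparsable document: skip it
        for s, p, o in triples:
            store.append((s, p, o, url))  # keep provenance with each triple
            if p == RDFS_SEE_ALSO and o not in visited:
                queue.append(o)  # follow the pointer to another RDF document
    return store


# Hypothetical toy "web" standing in for real RDF documents.
FAKE_WEB = {
    "http://example.org/a.rdf": [
        ("http://example.org/a#me", FOAF_NAME, "Alice"),
        ("http://example.org/a#me", RDFS_SEE_ALSO, "http://example.org/b.rdf"),
    ],
    "http://example.org/b.rdf": [
        ("http://example.org/b#me", FOAF_NAME, "Bob"),
        ("http://example.org/b#me", RDFS_SEE_ALSO, "http://example.org/a.rdf"),
        ("http://example.org/b#me", RDFS_SEE_ALSO, "http://example.org/missing.rdf"),
    ],
}


def fake_fetch(url):
    # Raises KeyError for documents that do not exist in the toy web.
    return FAKE_WEB[url]


quads = scutter(["http://example.org/a.rdf"], fake_fetch)
```

The visited set keeps the crawl from looping when two documents point at each other, and max_docs is a crude politeness limit.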

I have sketched the idea in my blog entry dated 24th Nov., 2005.
http://asigel.blogspot.com/2005/11/ideas-for-aggregation-of-distributed.html

Do you have any information concerning the following four questions?

(1)
Which available collections of statements/assertions do you know and can you recommend to me for an aggregation scenario?


I want to use them in a content aggregation scenario where statements about the same subjects are collocated. Ideally, such a collection would use Published Subjects (or subject indicators). I am particularly looking for topic map data, but would also like to know about RDF data, since according to the latest guidelines on semantic interoperability, useful mappings are possible between Topic Maps and RDF.
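
What I mean by collocation can be sketched as follows, under the simplifying assumption that every statement already carries a subject identifier. The tuple layout, the source names and all URIs are made up for illustration.

```python
from collections import defaultdict


def collocate(statements):
    """Group (source, subject_identifier, property, value) statements by subject."""
    by_subject = defaultdict(list)
    for source, subject_identifier, prop, value in statements:
        by_subject[subject_identifier].append((prop, value, source))
    return dict(by_subject)


# Hypothetical statements harvested from two different sources.
statements = [
    ("source-1", "http://psi.example.org/composer/puccini", "name", "Giacomo Puccini"),
    ("source-2", "http://psi.example.org/composer/puccini", "born", "1858"),
    ("source-2", "http://psi.example.org/opera/tosca", "name", "Tosca"),
]

collocated = collocate(statements)
for subject, facts in collocated.items():
    print(subject, facts)
```

Statements from different sources end up side by side only because they use the same identifier, which is why a shared collection of Published Subjects matters so much for this scenario.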

MusicBrainz is one example of a semantic web service with RDF.
There are also approaches for converting genealogical data (GEDCOM) to RDF (FOAF),
or one might use the DMOZ RDF data.

(2)
Which scutters (spidering information agents for RDF and/or topic maps,
or fragments thereof) do you know or can you recommend?

I am planning to use:
http://search.cpan.org/~kjetilk/RDF-Scutter/
an LWP agent based on RDF::Redland.

Does something similar already exist for Topic Maps?

I know of some Java agents,
in particular the CC-licenced
Slug: A Simple Semantic Web Crawler (December 09, 2004) http://www.ldodds.com/blog/archives/000167.html
http://aloo.gnomehack.com/~ldodds/projects/slug/javadoc/

SECO contains an
RDF Crawler: Scutter (Bash and Python for Scuttering) http://triple.semanticweb.org/svn/aharth/2004/wwwnyc/seco-talk.html
http://www.harth.org/andreas/2004/ieeeis/
Harth, A.: SECO: Mediation Services for Semantic Web Data. IEEE Intelligent Systems (USA), Vol. 19, No. 3, May/June 2004, pp. 66ff.
Harth and Gassert describe a 103 MB test data set they compiled:
On Searching and Displaying RDF Data from the Web http://sw.deri.org/2004/12/derisearch/Eswc2005Demo.pdf

In addition, researchers in SNA (Social Network Analysis) write scutters.
The data set compiled e.g. by PhD student Peter Mika is impressive:
Social Networks and the Semantic Web
http://doi.ieeecomputersociety.org/10.1109/WI.2004.10039

There exists a Redfoot-RDF-Scutter in Python:
http://redfoot.net/scutter/
for which a REST interface has been proposed:
Sun, 29 Jan 2006
A RESTful Scutter Protocol for Redfoot Kernel http://copia.ogbuji.net/blog/2006-01-29/A_RESTful_

There is a Javascript extension for Mozilla:
Scuttering Composite RDF Datasource
http://nachbaur.com/software/mozilla/objects/index.xhtml

In his research proposal "Mining the Semantic Web", Ajay Chakravarthy in section 2.4 names some existing tools http://www.dcs.shef.ac.uk/~ajay/reports/Research%20Proposal.pdf
(Ontotext, Hackdiary, others with poor performance)
HyperSpider - HyperSpider (Java app) collects the link structure of a website. Data import/export from/to database and CSV-files. Export to Graphviz DOT, Resource Description Framework (RDF/DC), XML Topic Maps (XTM), Prolog, HTML. Visualization as hierarchy and map.
http://hyperspider.sourceforge.net/
(I could export website interlinkings with this, but this is formal metadata)

A List of RDF Crawlers
http://www.dbis.informatik.uni-frankfurt.de/~tolle/RDF/RDFReferences.html
(4 entries)
* RDF Crawler (in Java) from Institute AIFB, University of Karlsruhe, Germany
* Decentralised and reliable resource discovery using RDF metadata (also known as Fydra)
* DAML Crawler
* RDF Crawling Services - RDF Gateway
LuMriX, which is topic map-based, contains a crawler, but I do not know enough about it.
http://www.lumrix.de/xmlsearch_keyfacts.php

(3)
Which sites freely offer semantic web services?
I want to retrieve assertions, i.e. fragments of knowledge networks realized with Topic Maps or RDF.
Preferably with a possibility to retrieve by published subject (or subject indicator).
Indirect search by name where I assert the identity of the subject might do for the moment.

(4)
Do you know of demo sites which can be externally queried with TMRAP 0.2 (or higher: 1.0, 2.0)?


Scratchpad of additional references:
(not yet checked)
------------------------------------

Current State of Semantic Web Mining
http://www.fernuni-hagen.de/DVT/Aktuelles/zhao_yi.pdf
Check from slide 38 onward; not very useful for this purpose, though
Ontobroker, which includes
an ontology-based web-crawler

DefineCrawler
http://www.lalic.paris4.sorbonne.fr/stic/octobre/octobre/apr/Nauer.pdf

RDFWeb notebook: aggregation strategies
http://rdfweb.org/2001/01/design/smush.html
(describing Swoogle)

Finding and Ranking Knowledge on the Semantic Web http://www.cs.umbc.edu/~ypeng/Publications/2005/iswcLiDing.pdf

Search on the Semantic Web
http://www.cs.umbc.edu/~ypeng/Publications/2005/IeeeSemanticWebSearch.pdf

JNotes. Automatic Generation of Semantic Networks
http://www.jnotes.de/JNotes/jnotes_webware.nsf/0/2DC6FB39AE566557C12570EC00307C3B?openDocument

[xtm-wg] Sketch of a Possible Algorithm for Fragment Grabbing (2000) http://lists.oasis-open.org/archives/topicmaps-comment/200007/msg00018.html

Pragmatic applications of the Semantic Web using SemTalk.
The agents are supported by crawlers searching proactively or after request for existing models to generate index files for the agents. The crawlers do not only look in the local filesystem, but also in the Semantic Web, for available knowledge sources in the RDFS format.
http://www.semtalk.com/pub/KnowTech2001.htm

Metadata-based Web Querying
http://www.cs.bilkent.edu.tr/~ismaila/research_projects.htm

RDFStore
Perl/C RDF storage and API
http://rdfstore.sourceforge.net/

CARA is an RDF API written in Perl
http://cara.sourceforge.net/

---

TMRA 2006: International Conference on Topic Maps Research and Applications, Leipzig (DE)

TMRA 2006 - International Conference on Topic Maps Research and Applications
"Leveraging the Semantics"
Leipzig, Germany, 11-12 October 2006
http://www.informatik.uni-leipzig.de/~tmra/2006/

Full disclosure: I am co-chair of the program committee

Thursday, November 24, 2005

Ideas for the aggregation of distributed (P2P) RDF and Topic Maps

This blog entry collects some ideas on how to aggregate content with RDF and topic map markup that is distributed over the web. The aggregation may also be accomplished in a P2P fashion.

1. Problem space/application areas
-----------------------------------
I want to use existing RDF/Topic Maps annotations. Those can be understood as disaggregations of underlying texts, derived by human annotators.

My aggregation will lead to innovative knowledge products and knowledge services.

I will not only aggregate assertions, but also aggregate (compose) knowledge services. A knowledge service can be implemented with TMRAP on top of a topic map.

See e.g. the slides of my talk
kPeer (Knowledge Peers): Informationssuche beim verteilten SemBloggen (Information Search in Distributed Semblogging)

The project "SemNetMan" (Semantisch basiertes Netzwerkmanagement, i.e. semantically based network management) [1] combines:
SNA (Social Network Analysis) with Semantic Web Technologies

Similarly, Peter Mika is doing his dissertation research [2] on Social Networks and the Semantic Web, applying it to Flink [3].

I myself want to achieve semblogging, i.e. the integration of content from weblogs that carry semantic markup.

2. Architecture
---------------
According to [4], three elements are useful for this:
(1) RTM (RDF to topic maps mapping),
(2) PSIs, and
(3) topic map content distributed on the web

RTM is currently implemented in the commercial solution OKS and the Omnigator, and according to the slides, a TMAPI implementation is also under way (but I do not know the status and which backend would support that already).

A tutorial-like description for RTM mapping (for SKOS) is given in [5], and a description of RTM itself can be found in [6, 7].
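
To give a feeling for what such a mapping does, here is a much simplified sketch in the spirit of RTM: each RDF property is declared to map to a topic map construct (name, occurrence, association, type or subject identifier), and the triples are translated accordingly. The dictionary-based "topic map", the mapping keywords and the example.org data are my own assumptions for illustration; role types, scope and the actual RTM vocabulary are left out, so see [6] for the real definition.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping table: RDF property -> topic map construct.
MAPPING = {
    FOAF + "name": "basename",
    FOAF + "homepage": "occurrence",
    FOAF + "knows": "association",
    RDF_TYPE: "instance-of",
}


def rdf_to_topic_map(triples, mapping):
    """Translate triples into (topics, associations); unmapped properties are skipped."""
    topics = {}
    associations = []

    def topic(uri):
        # One topic per RDF resource, created on first use.
        return topics.setdefault(
            uri, {"identifiers": set(), "names": [], "occurrences": [], "types": set()}
        )

    for s, p, o in triples:
        kind = mapping.get(p)
        if kind is None:
            continue
        t = topic(s)
        if kind == "basename":
            t["names"].append(o)
        elif kind == "occurrence":
            t["occurrences"].append((p, o))
        elif kind == "instance-of":
            topic(o)
            t["types"].add(o)
        elif kind == "association":
            topic(o)
            associations.append((p, s, o))
        elif kind == "subject-identifier":
            t["identifiers"].add(o)
    return topics, associations


triples = [
    ("http://example.org/a#me", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/a#me", FOAF + "name", "Alice"),
    ("http://example.org/a#me", FOAF + "homepage", "http://example.org/alice/"),
    ("http://example.org/a#me", FOAF + "knows", "http://example.org/b#me"),
    ("http://example.org/a#me", FOAF + "nick", "ali"),  # not in MAPPING: ignored
]

topics, associations = rdf_to_topic_map(triples, MAPPING)
```

The point of the sketch is only that the same triple ends up as a different construct depending on the mapping declared for its property, something the RDF data alone does not tell us.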

So, where do I get a PSI collection from?

TopicMapster (TMShare) [8] is the idea of exchanging topic map fragments in P2P fashion.

This has meanwhile become possible in practice with TMRAP (Topic Maps Remote Access Protocol) [9]; version 1.0 will be available in OKS at the end of 2005.
It is a web service protocol for Topic Maps.
For an explanation, see [10, 11].

Which (free) topic map software does/will support TMRAP?
(There seems to be some early TMRAP support in TM4J; see the SourceForge CVS of TM4Web.)

Alternative approaches:
TMIP [12], or SNAPI [13]

3. Technical tools
-----------------
To collect RDF triples from various sources into one store, one could use, on an experimental basis, RDF::Scutter [14]. It is a web robot collecting distributed RDF into a central store.

It is based on RDF::Redland [15], a Perl binding for the Redland framework [16]. Redland itself is a "free, open source C library for parsing, storing and querying RDF files".

(Yes, there is also a Java binding for Redland) [17]

Redland uses RDQL for queries [18].

There are two tutorials on how to use RDF::Redland from Perl [19, 20].

To bind RDF triples to Perl objects in general, one can use Class::RDF [21]

References
[1]
http://www.semantics2005.net/semnetman-semantisch-basiertes-management-sozialer-netzwerke.workshop.60.11.htm

[2]
http://www.cs.vu.nl/~pmika/research.html
http://www.cs.vu.nl/~pmika/research/papers/VUBIS-PhDproposal.doc

[3]
http://flink.semanticweb.org/

[4]
[Garshol & Naito 2004] "Realization of seamless knowledge: connecting distributed RDF and Topic Maps", presented 2004-11-06 at SIG-SWO-A403-04
http://www.jaist.ac.jp/ks/labs/kbs-lab/sig-swo/papers/SIG-SWO-A403/SIG-SWO-A403-04.pdf (2 pages, Abstract)
http://www.knowledge-synergy.com/topicmaps/document/sig-swo.pdf (16 slides, PDF)

[5]
Garshol, Marius: SKOS in Topic Maps
Blog entry 2005-10-24
http://www.garshol.priv.no/blog/10.html

[6]
The RTM RDF to topic maps mapping: Definition and introduction
2003-12-28
http://www.ontopia.net/topicmaps/materials/rdf2tm.html

[7]
Lars Marius Garshol: Living with topic maps and RDF: Topic maps, RDF, DAML, OIL, OWL, TMCL
http://www.ontopia.net/topicmaps/materials/tmrdf.html

[8]
Ahmed, Khalil (2003): TMShare - Topic Map Fragment Exchange In a Peer-To-Peer Application. In: Procs. XML Europe 2003, 2003. http://www.idealliance.org/papers/dx_xmle03/papers/02-03-03/02-03-03.html
http://www.techquila.com/topicmapster.html

[9]
Pepper, Steve (2004-04-17): Topic Maps Remote Access Protocol 0.2
(Technical report)
http://www.jtc1sc34.org/repository/0507.htm

[10]
[Garshol 2006] Garshol, Lars Marius (2006): TMRAP: A Web Service Protocol for Topic Maps. Procs. TMRA'05, International Workshop on Topic Map Research and Applications, Leipzig, Oct 6-7, 2005. Springer (under preparation)
Online: http://www.informatik.uni-leipzig.de/~tmra05/PRES/LMGa.pdf (slides)

[11]
[Pepper & Garshol 2004] Pepper, Steve & Garshol, Lars Marius (2004): Seamless Knowledge. Spontaneous Knowledge Federation using Topic Maps. Late breaking talk, presented at Extreme Markup Languages 2004, Montréal, Quebec, Canada, August 2-6.
Online:
http://www.ontopia.net/topicmaps/materials/Seamless%20Knowledge%20with%20TMRAP.ppt
(slides, with title: Seamless Knowledge. Spontaneous Knowledge Federation using TMRAP)

[12]
Barta, Robert: TMIP, a RESTful Topic Maps Interaction Protocol,
Presentation at Extreme Markup 2005
http://www.mulberrytech.com/Extreme/Proceedings/xslfo-pdf/2005/Barta01/EML2005Barta01.pdf

[13]
SNAPI - Semantic Network API
http://sourceforge.net/projects/snapi

[14]
RDF::Scutter
http://search.cpan.org/~kjetilk/RDF-Scutter-0.1/lib/RDF/Scutter.pm

[15]
RDF::Redland
http://search.cpan.org/~djbeckett/Redland-0.9.14.1/

[16]
Redland
http://librdf.org/

[17]
Java-Binding for Redland
http://librdf.org/docs/java.html

[18]
RDQL
http://www.w3.org/Submission/RDQL/

[19]
http://www.robertprice.co.uk/robblog/archive/2004/10/Querying_RDF_In_Perl_with_RDF_Redland.shtml

[20]
Barta, Robert (Perl expert in the topic maps community)
http://james.bond.edu.au/courses/inft73371/043/redland.mc

[21]
Class::RDF
http://search.cpan.org/~zooleika/Class-RDF-0.20/

Tuesday, October 18, 2005

SnipSnap (Weblog and Wiki) and K-Logs

While checking SnipSnap, the free "easy Weblog and Wiki Software", I encountered the concept of K-Logs (knowledge management weblogs, in short: klogs), and hence klogging. Cool!
http://www.snipsnap.org/space/k-logs.

In consequence, this might lead us to "semklogging" instead of semblogging, of course :-)

Monday, October 17, 2005

Sent Lutz Maicher some comments on his German Terminology of TMDM Topic Map terms

http://www.informatik.uni-leipzig.de/~maicher/tmt/latest_tmt.pdf

I should probably one day check the finalized version against my thesis.

Friday, October 14, 2005

Semantic Wikis (SemWikis), Topic Map Wikis and Semblogging

How can I couple (Topic Map-based) semblogging with SemWikis?

Would one write LTM on the wiki pages and/or use a topic map editor?

Karsten Böhm (boehm@informatik.uni-leipzig.de) kindly pointed me to some references in SemWiki land at University of Leipzig, Germany:

In addition I found:

Semantic MediaWiki is a recently started (pre-alpha) project coupling MediaWiki (the Wikipedia software) with semantics (http://meta.wikimedia.org/wiki/Semantic_MediaWiki). People at AIFB Karlsruhe have a project on a semantic MediaWiki (http://www.aifb.uni-karlsruhe.de/Projekte/viewProjektenglish?id_db=67)
http://lists.w3.org/Archives/Public/www-html/2005Sep/0002.html talks inter alia about Topic Maps. I have not found more co-citing of Topic Maps and SemWikis.

They want to employ Redland as RDF backend.

There exist topic map-based wikis, e.g. Topiki (http://www.shelter.nu/blog-070.html) or TMWiki by Hendrik Thomas (http://www.topic-maps.org/)

Addition 2005-12-08:
================
IkeWiki is an RDF/OWL-based rewrite of MediaWiki, based on Jena
<http://ikewiki.salzburgresearch.at/>
(Sourceforge: <https://sourceforge.net/projects/ikewiki/> )
<http://jena.sourceforge.net/>

Thursday, October 13, 2005

Topic Properties, PSIs and more (Email by Jack Park)

As a follow-up to informal discussions with Jack Park at TMRA05 in Leipzig on a PSI registry, how difficult this might be with notions in flux, and what this has to do with subject identification/identity, he sent an email today. I do not fully agree and want to come back to this soon (in a comment). We should probably discuss this at Bernard Vatant's blog.

In my understanding Jack's main points are:
* Topic Maps can help humans to mediate among heterogeneous populations of ontologies
* we need a working ontology to map most other ontologies
* a lone PSI is not sufficient to unambiguously identify a subject among a universe of subjects
* TMRM acknowledges this and allows specifying the properties of subjects
* subject properties are key-value pairs (key-value-valueType or propertyType-value-valueType triples)
* we need PSIs for keys (propertyTypes) and valueTypes (derived from XML dataTypes)
* AI feature vectors might be related to subject properties
* we need a TMA (topic map application) to identify all the ontologies to define the notion of "subject properties"
* the most important aspect of future mapping of subjects with topic maps will be to establish subject identity by way of subject property declarations
* in order to not reinvent terminology, such a TMA should attempt to include all relevant metadata/ontology standards.
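
To make the notion of subject properties more tangible, here is a small sketch of how I read the points above: a subject is described by propertyType-value-valueType triples, and two descriptions are taken to denote the same subject when they agree on at least one property that has been declared identifying. The class, the function, the matching rule and the psi.example.org URIs are my own illustrative assumptions, not something taken from the TMRM or from Jack's email.

```python
from typing import NamedTuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Hypothetical PSIs for property types (the "keys").
ISBN = "http://psi.example.org/property/isbn"
TITLE = "http://psi.example.org/property/title"
PUBLISHED = "http://psi.example.org/property/published"


class SubjectProperty(NamedTuple):
    property_type: str  # PSI of the key
    value: str
    value_type: str  # PSI of the datatype


def same_subject(props_a, props_b, identifying_types):
    """True if both descriptions share an identical identifying property."""
    keys_a = {p for p in props_a if p.property_type in identifying_types}
    keys_b = {p for p in props_b if p.property_type in identifying_types}
    return bool(keys_a & keys_b)


book_in_catalogue = [
    SubjectProperty(ISBN, "0-201-74960-2", XSD_STRING),
    SubjectProperty(TITLE, "XML Topic Maps", XSD_STRING),
]
book_in_blog = [
    SubjectProperty(ISBN, "0-201-74960-2", XSD_STRING),
    SubjectProperty(PUBLISHED, "2002-07-01", XSD_DATE),
]
other_book = [
    SubjectProperty(TITLE, "XML Topic Maps", XSD_STRING),
]

print(same_subject(book_in_catalogue, book_in_blog, {ISBN}))  # shared ISBN
print(same_subject(book_in_catalogue, other_book, {ISBN}))    # only the title matches
```

Which property types count as identifying is exactly the kind of decision a TMA would have to declare; here it is simply passed in as a set.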

Wiki on Topic Maps in Libraries (Suellen Stringer-Hye)

As announced on topicmapmail, Suellen Stringer-Hye of Vanderbilt University is putting together a Topic Maps for Libraries wiki (with pmwiki) at
<http://tm4lib.library.vanderbilt.edu/wiki/>
and has established a Topic Maps Interest Group within LITA.

Website: http://staffweb.library.vanderbilt.edu/libtech/stringer/
Email: suellen.stringer-hye@Vanderbilt.Edu

Monday, October 10, 2005

Semblogging use case: 40,000 digital photos of destroyed paintings online

Reading the article [1] about the 40,000 digital photos now online at http://www.zi.fotothek.org/, it occurred to me that this might be an interesting user community for semblogging (semantic blogging). I hope to be able to look into providing a first semblogging facility for such users in a few weeks.

Unfortunately, the application uses popup windows and not REST style!

[1] "Jetzt online: Hitlers Dia-Sammlung im Netz" ("Now online: Hitler's slide collection on the web"). Süddeutsche Zeitung, Feuilleton, Friday, 7 October 2005, p. 14

For Semblogging, see my open space presentation slide at TMRA05, included below:

Semblogging with Topic Maps

Motivation
----------
* smart content aggregation of blog entries needs more semantics than just tag clouds
* see e.g. Jack Park's elaboration on tagging in his "just for me" paper
* Semblogging as a special case of semantic annotation in line with DKM (Distributed Knowledge Management)

Prior work
----------
* seminal work by Cayzer (RDF semblogging concept and prototype), citing the XTM book
* redo it, and even better with Topic Maps!
* ideas by Jack Park (on semblogging as an example for Augmented Storytelling)
* idea of Dmitry Bogachev. Prototypical OKS semblogging application by Lars Marius Garshol

Current work
------------
* Developing some Use Cases
* One student implementing in his diploma thesis a prototype coupling blojsom with tmapi and TM engine
* Further work under way looking more into distributed aspects and semantic web services

What to Semblog about?
----------------------
* bibMap or the Topic Map Research and Applications Landscape?
* All TMRA05 participants and community semblogging on Topic Maps? SemWikiBlogging?
* Learning process in a teaching course in information and knowledge management?
(And now added: the collection of 40,000 digital photos, as discussed above)

Your ideas? Who is interested in what? Who will contribute what?