Thursday, October 13, 2005

Topic Properties, PSIs and more (Email by Jack Park)

Following up on informal discussions with Jack Park at TMRA05 in Leipzig about a PSI registry, how difficult such a registry might be when notions are in flux, and what this has to do with subject identification and identity, Jack sent an email today. I do not fully agree and want to come back to this soon (in a comment). We should probably discuss this on Bernhard Vatant's blog.

In my understanding Jack's main points are:
* Topic Maps can help humans to mediate among heterogeneous populations of ontologies
* we need a working ontology to map most other ontologies
* a lone PSI is not sufficient to unambiguously identify a subject among a universe of subjects
* TMRM acknowledges this and allows specifying the properties of subjects
* subject properties are key-value pairs (key-value-valueType or propertyType-value-valueType triples)
* we need PSIs for keys (propertyTypes) and valueTypes (derived from XML datatypes)
* AI feature vectors might be related to subject properties
* we need a TMA (topic map application) that identifies all the ontologies needed to define the notion of "subject properties"
* the most important aspect of future mapping of subjects with topic maps will be to establish subject identity by way of subject property declarations
* in order to not reinvent terminology, such a TMA should attempt to include all relevant metadata/ontology standards.
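Jack's notion of subject properties as propertyType-value-valueType triples, with PSIs for both the property types and the value types, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the property PSIs under `http://psi.example.org/` are hypothetical, the value types reuse the XML Schema datatype namespace, and the rule in `same_subject` (two subjects are the same if both declare all identifying properties and agree on them) is one possible merging rule, not a rule taken from TMRM.

```python
from dataclasses import dataclass

# Real XML Schema datatype namespace; the property PSIs below are hypothetical.
XSD = "http://www.w3.org/2001/XMLSchema#"
PROP = "http://psi.example.org/property/"


@dataclass(frozen=True)
class SubjectProperty:
    """A propertyType-value-valueType triple whose types are identified by PSIs."""
    property_type: str  # PSI of the key (property type)
    value: str          # lexical form of the value
    value_type: str     # PSI of the value type, here an XML Schema datatype


def project(props, keys):
    """Keep only the properties whose type is one of the identifying keys."""
    return frozenset(p for p in props if p.property_type in keys)


def same_subject(a, b, keys):
    """Assumed merging rule: both subjects declare every identifying key
    and agree on all declared values for those keys."""
    pa, pb = project(a, keys), project(b, keys)
    declared_a = {p.property_type for p in pa}
    declared_b = {p.property_type for p in pb}
    return declared_a == declared_b == set(keys) and pa == pb


# Two sources describing J. S. Bach; only birth date and birthplace identify.
keys = {PROP + "birth-date", PROP + "birthplace"}
from_catalogue = {
    SubjectProperty(PROP + "birth-date", "1685-03-31", XSD + "date"),
    SubjectProperty(PROP + "birthplace", "Eisenach", XSD + "string"),
    SubjectProperty(PROP + "occupation", "composer", XSD + "string"),
}
from_encyclopedia = {
    SubjectProperty(PROP + "birth-date", "1685-03-31", XSD + "date"),
    SubjectProperty(PROP + "birthplace", "Eisenach", XSD + "string"),
}
print(same_subject(from_catalogue, from_encyclopedia, keys))
```

The extra `occupation` property in the catalogue does not prevent the match, because only the declared identifying keys are compared; which keys count as identifying is exactly the open question raised above.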


Blogger Alexander Sigel said...

Some first comments, to be extended later:

• Yes, Topic Maps can help humans to mediate among heterogeneous populations of ontologies, because humans can read and interpret PSI documentation.
• It is not our job to do this but to provide an open infrastructure where others can do this and where the emergent knowledge structure evolves in a useful direction.
• Yes, notions are in continuous flux, and the semantics of a PSI can’t be changed after its publication. But these two facts do not contradict each other. A given assertion refers to a PSI. The PSI documents the fixed notion that the asserting person had in mind and that was valid in that person's view at the time of the assertion. If the notion changes, even slightly, a new PSI must be issued (which could be related to the previous one by an evolution relationship). There may be many PSIs with the same base name, even within the same discourse community, when a term has evolved.
• Humans can identify similarity and equality of subjects depending on their interpretation and viewpoint. Computers can help to formalize such viewpoints and semi-automatically propose similar subjects, but they cannot establish more complex conceptual similarities because we cannot formalize such concepts well. Establishing the identity of the referent is a cognitive process that humans perform better than computers.
• With a PSI, even without its documentation (just the URI), humans and computers can __distinguish__ between subjects. However, two different PSIs might have the same referent; their identity has simply not (yet) been established.
• A PSI can help to establish identity, but it is not meant to be a formal specification (its semantics lies in the corresponding textual documentation). However, in addition to a textual explanation, we _can_ attach a more computer-interpretable specification to the PSI, grounded in a framework (a theory?) of which characteristics characterize a subject, and how. Maybe this is what drives what you call a TMA to disclose the ontology of what "subject properties" means?
• This is probably related to "essential characteristics" in knowledge organization theory and to Fred Riggs's onomasiological approach in terminology/conceptology (see my chapter on knowledge organization in Jack’s topic map book), e.g. as implemented by Gerardo Sierra:
Gerardo Sierra (Instituto de Ingeniería, UNAM, Mexico) and John McNaught (UMIST, Manchester): Design of an onomasiological search system: A concept-oriented tool for terminology. In: Terminology 6:1 (2000), pp. 1–34.
(For more, see Sierra's publications, including conference papers.)
• The set of essential characteristics depends on the interests and viewpoints of discourse communities (what is important to them in viewing entities and establishing their identity). Because of epistemological openness, the set is infinite regarding all potential future usages, but I expect that in practice it can be handled quite well.
• A concept can be internally defined by its essential characteristics, which can be key-value pairs instantiating corresponding types. The concept is a single topic in a topic map. In the sense of Jack Park’s drill-down topic map feature, the full set of characteristics of each concept topic can, given a proper ontology, be modelled in a detailed topic map which the concept topic reifies.
• Let’s assume, for example, that a person might be characterized by date of birth, date of death, birthplace, occupations, works produced, influence on the thinking and works of others, etc. We can write this data as human-interpretable text into the PSI documentation, or we can attach a fully-fledged topic map to the PSI!
• From everything known about an entity (and represented with topic maps) we have to select all essential characteristics. (I do not know how.) These characteristics might diverge between two discourse communities. In addition, the identity assertion might only hold from a certain point of view. Within a discourse community, it might be possible to agree on the essential characteristics of an item of interest (e.g. artist, writer, composer).
• How can we agree on a common ontology that specifies how to establish subject identity? How can two people agree that an entity is the same? E.g. one discourse community might want to establish identity via the paintings a painter produced, whereas a second one might want to establish identity via the publications a writer wrote. Both might be able to agree on the meaning of “work” and on the creator-created relationship for this person. Something like FRBR (work-expression-manifestation-item) could be used (see my earlier attempts on this blog).
• A lone PSI can identify a subject if the notion is made explicit with a definition topic map attached to this PSI, as sketched above.
• I do not think that feature vectors are closely related to essential characteristics: e.g. in text retrieval, feature vectors can be statistically derived from a bag-of-words tokenization of texts (Joachims 2002). This means that the features are not well interpretable.
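The rule that a published PSI keeps its meaning, while a changed notion gets a new PSI linked to its predecessor by an evolution relationship, can be sketched as a tiny registry. The class `PSIRegistry`, its methods and the example URIs are hypothetical and only meant to make the rule concrete; they do not describe any existing PSI registry.

```python
class PSIRegistry:
    """Hypothetical registry in which published PSIs are never redefined."""

    def __init__(self):
        self._docs = {}          # PSI URI -> documentation, fixed once published
        self._evolved_from = {}  # newer PSI -> the PSI it evolved from

    def publish(self, psi, documentation):
        """Publish a new PSI; refuse to change the meaning of an existing one."""
        if psi in self._docs:
            raise ValueError(f"{psi} is already published; issue a new PSI instead")
        self._docs[psi] = documentation

    def evolve(self, old_psi, new_psi, documentation):
        """Issue a new PSI for a changed notion and link it to its predecessor."""
        if old_psi not in self._docs:
            raise KeyError(old_psi)
        self.publish(new_psi, documentation)
        self._evolved_from[new_psi] = old_psi

    def documentation(self, psi):
        return self._docs[psi]

    def lineage(self, psi):
        """Follow evolution links from psi back to the earliest version."""
        chain = [psi]
        while chain[-1] in self._evolved_from:
            chain.append(self._evolved_from[chain[-1]])
        return chain


# Hypothetical PSIs for two successive versions of the notion "work".
V1 = "http://psi.example.org/notion/work-2005"
V2 = "http://psi.example.org/notion/work-2006"

registry = PSIRegistry()
registry.publish(V1, "Notion of 'work' as documented in 2005.")
registry.evolve(V1, V2, "Notion of 'work' after a later community discussion.")
print(registry.lineage(V2))
```

An assertion made in 2005 keeps pointing at `V1` with its original documentation, while `lineage(V2)` still lets a reader trace the newer notion back to it.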

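The point that an identity assertion may hold only from a certain point of view can be illustrated by letting each discourse community declare its own set of essential characteristics. The community names, characteristic keys and the two records of a fictitious person below are invented for illustration; how communities actually select their essential characteristics remains the open question stated above.

```python
# Hypothetical essential characteristics agreed on within two discourse communities.
ESSENTIAL = {
    "art historians": {"birth_date", "birthplace", "paintings"},
    "literary scholars": {"birth_date", "birthplace", "publications"},
}


def identical_for(community, a, b):
    """Identity from one community's viewpoint: both descriptions declare all of
    that community's essential characteristics and agree on each of them."""
    keys = ESSENTIAL[community]
    if not (keys.issubset(a) and keys.issubset(b)):
        return False  # identity cannot (yet) be established from this viewpoint
    return all(a[k] == b[k] for k in keys)


# Two invented descriptions of the same fictitious painter and writer.
museum_record = {
    "birth_date": "1880-01-01",
    "birthplace": "Sometown",
    "paintings": frozenset({"Painting A", "Painting B"}),
    "publications": frozenset({"Essay X"}),
}
library_record = {
    "birth_date": "1880-01-01",
    "birthplace": "Sometown",
    "publications": frozenset({"Essay X"}),
}

for community in ESSENTIAL:
    print(community, identical_for(community, museum_record, library_record))
```

Here the literary scholars can establish identity, while the art historians cannot, because the library record says nothing about paintings. A shared notion of "work" (e.g. along FRBR lines) would be one way to let both communities compare paintings and publications under a common characteristic.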
9:39 AM  
