Topic Merging


Automatic topic merging is a key feature of topic maps and one that brings many benefits to topic map development and to applications that make use of topic maps for managing and exchanging data.

 

The principle behind topic merging is that, in any given topic map, each subject described by the topic map must be represented by one and only one topic. It is therefore the responsibility of the topic map processor to identify when two topics represent the same subject and to process them so that only one topic remains. This is the process of merging.

 

Identifying when two topics represent the same subject is achieved by applying heuristics. The topic maps standard defines a set of basic heuristics:

 

1. If two topics share the same source locator, then they have been parsed from the same topic map source and must be considered to represent the same concept.
2. If two topics have the same subject locator, then they both identify the same network resource as being the thing that they represent and must be considered to represent the same concept.
3. If two topics have the same subject indicator, then they are both using the same resource to describe the concept that they represent and must be considered to represent the same concept.
4. Finally, a topic map application may make use of any domain-specific information it has to determine that two topics represent the same concept.
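
The first three heuristics can be sketched in code. The following is a minimal illustration in Python, not the API of any particular topic map processor: the `Topic` class, its field names, and the `find_merge_groups` function are all hypothetical. It assumes that identifiers are compared as plain strings and only against identifiers of the same kind, and it groups topics transitively, so that if A matches B and B matches C, all three fall into one group.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory topic; field names are illustrative only."""
    label: str
    source_locators: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)


def identity_keys(topic):
    """Yield (kind, uri) pairs so that only identifiers of the same kind match."""
    for uri in topic.source_locators:
        yield ("source_locator", uri)
    for uri in topic.subject_locators:
        yield ("subject_locator", uri)
    for uri in topic.subject_indicators:
        yield ("subject_indicator", uri)


def find_merge_groups(topics):
    """Return lists of indexes into `topics` that should be merged together."""
    parent = list(range(len(topics)))  # union-find forest over topic indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (kind, uri) -> index of the first topic carrying it
    for index, topic in enumerate(topics):
        for key in identity_keys(topic):
            if key in first_seen:
                parent[find(index)] = find(first_seen[key])
            else:
                first_seen[key] = index

    groups = {}
    for index in range(len(topics)):
        groups.setdefault(find(index), []).append(index)
    return [members for members in groups.values() if len(members) > 1]
```

A union-find structure is used here only because merging is transitive; a real processor may well detect matches incrementally as topics are added.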

 

Item (3) in the list above shows the importance of selecting a good resource as the description for a concept. If the description is ambiguous, or if the resource addressed is not sufficiently well defined, two different topic map authors might use the same resource as a descriptor for different concepts, leading to undesired merging. In our experience, good resources for subject descriptors are ones created specifically to describe a single subject: the pages at wikipedia.org, for example, or pages created by the topic map author(s) or by a community of practitioners to define a controlled vocabulary.

 

Item (4) allows applications to extend the Topic Maps standard's set of merging criteria with application-specific criteria. These could include criteria based on more than a straightforward string or URI comparison. For example, an application might know that "The Duke" and "John Wayne" are names for the same actor and merge two topics on that basis.
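
As an illustration of such an application-specific criterion, the sketch below treats two topics as the same subject when any of their names resolve to the same canonical name. The alias table, the function names, and the choice to compare names case-insensitively are all assumptions made for this example; a real application would draw this knowledge from its own domain data.

```python
# Hypothetical domain knowledge: alias -> canonical name (both in folded form).
KNOWN_ALIASES = {
    "the duke": "john wayne",
}


def canonical_name(name):
    """Normalise whitespace and case, then resolve any known alias."""
    key = " ".join(name.split()).casefold()
    return KNOWN_ALIASES.get(key, key)


def same_subject_by_name(names_a, names_b):
    """True if the two sets of names share at least one canonical name."""
    canonical_a = {canonical_name(name) for name in names_a}
    canonical_b = {canonical_name(name) for name in names_b}
    return not canonical_a.isdisjoint(canonical_b)
```

Name-based matching of this kind is far more prone to false positives than identifier matching, so it is best restricted to a domain where the names are known to be unambiguous.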

 

Once the topics to be merged have been identified, the merging process replaces those two (or more) topics with a single topic. The single topic that results from the merging process has all of the identifiers, names, and occurrences of the topics that are merged. In addition, the result topic replaces the merged topics wherever they are referenced (that is, in any associations, scopes, or types that they appear in). This process is shown schematically in the diagram below.

[Diagram: Topic Merging]
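
The merging step itself can also be sketched in code. The Python below is a simplified illustration under stated assumptions, not a definitive implementation: the `Topic` and `Association` classes, their fields, and the `merge_topics` function are hypothetical, names and occurrences are reduced to plain strings, and the handling of duplicate names or occurrences that may arise after merging is left out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: topics are compared by object identity
class Topic:
    identifiers: set = field(default_factory=set)
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)


@dataclass(eq=False)
class Association:
    type: Topic
    players: list                               # topics taking part
    scope: list = field(default_factory=list)   # topics forming the scope


def merge_topics(topics, associations, to_merge):
    """Replace every topic in `to_merge` with one result topic, in place."""
    result = Topic()
    for topic in to_merge:
        # The result carries all identifiers, names, and occurrences.
        result.identifiers |= topic.identifiers
        result.names |= topic.names
        result.occurrences |= topic.occurrences

    merged_ids = {id(topic) for topic in to_merge}

    def swap(topic):
        return result if id(topic) in merged_ids else topic

    # Redirect every reference (type, player, scope) to the result topic.
    for association in associations:
        association.type = swap(association.type)
        association.players = [swap(t) for t in association.players]
        association.scope = [swap(t) for t in association.scope]

    topics[:] = [t for t in topics if id(t) not in merged_ids] + [result]
    return result
```

For example, merging a topic named "John Wayne" with a topic named "The Duke" yields one topic carrying both names, and an association that previously referenced "The Duke" afterwards references the merged topic.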