Generating A Schema From A Topic Map

The NPCL API provides a basic schema inference engine that can examine a topic map to determine what types exist and how they are used. The programming interface to this engine is the method InferSchema(ITopicMap, string) in the class NetworkedPlanet.Npcl.SchemaUtils. To use this method, pass in the ITopicMap handle of the topic map you want to evaluate and the base URI to be used for generated subject identifiers. The base URI string can be null, in which case generated identifiers use the special prefix 'urn:x-tmcore:topicid'.

The inference proceeds by querying the topic map to determine what topics are being used as types; which types of occurrences occur on which types of topics; which types of association roles appear in which types of associations; and which types of topics play which types of association role. The inference also attempts to determine what types of topics are being used in scopes and will read and reflect the superclass-subclass hierarchy if you have used the XTM-defined subject identifiers for the superclass-subclass association type and related role types.
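
One part of that survey, finding which occurrence types appear on which topic types, can be sketched over a simplified in-memory model. The SketchTopic class and the UsageSurveySketch helper below are hypothetical and are not part of the NPCL or TMCore API; the real engine queries the topic map itself and may work quite differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a topic; NOT the NPCL/TMCore ITopic interface.
public sealed class SketchTopic
{
    public string Name = "";
    // Names of the types this topic is an instance of.
    public List<string> Types = new List<string>();
    // Type name of each occurrence attached to this topic.
    public List<string> OccurrenceTypes = new List<string>();
}

public static class UsageSurveySketch
{
    // Maps each topic type to the set of occurrence types seen on its instances.
    public static Dictionary<string, HashSet<string>> OccurrenceTypesByTopicType(
        IEnumerable<SketchTopic> topics)
    {
        var result = new Dictionary<string, HashSet<string>>();
        foreach (SketchTopic topic in topics)
        {
            foreach (string topicType in topic.Types)
            {
                HashSet<string> seen;
                if (!result.TryGetValue(topicType, out seen))
                {
                    seen = new HashSet<string>();
                    result[topicType] = seen;
                }
                foreach (string occurrenceType in topic.OccurrenceTypes)
                {
                    seen.Add(occurrenceType);
                }
            }
        }
        return result;
    }
}
```

The same grouping idea would apply to the other pairings listed above (role types per association type, and topic types per role type), each keyed on the containing type.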

The inference has a number of limitations and failsafes that you should be aware of:

  1. NPCL requires every topic that is used as a type to have a subject identifier. If such a topic does not already have at least one subject identifier, one will be generated for it. The generated subject identifier is the base URI prefix specified in the method parameters followed by the name of the topic forced into lower case. So if you specify the base URI prefix as 'http://www.mycompany.com/psi/general/', then the topic with the name "Sales Forecast" will get the generated subject identifier 'http://www.mycompany.com/psi/general/sales forecast'.

    If you pass in null as the base URI prefix, then topics will be assigned a URI of the form 'urn:x-tmcore:topicid:' followed by the database object identifier of the topic. This special form of URI is recognised by the NPCL TopicMapSchemaWriter class, which will find the topic to be updated using its object identifier rather than a subject indicator. This feature is provided to allow you to generate an NPCL schema from a topic map and then import the schema into that same topic map without creating duplicates of topics that did not originally have subject identifiers. You should NOT use this feature to create an NPCL schema that you then intend to import into a different topic map.

  2. Cardinality constraints are generated to be rather lax. Occurrence and role player constraints are generated as either '0 or more' or '1 or more' constraints. A '1 or more' constraint is generated only if every topic of the given type would conform to that constraint. For association role constraints, the minimum cardinality is generated as either 0 or 1 (1 only if all associations of the given type have at least one role that conforms), and the maximum cardinality is generated as either unbounded or 1 (1 only if all associations of the given type have no more than one role that conforms). You may wish to review the generated constraints and make them tighter in certain applications.

  3. The inference process makes no attempt to infer datatype, minimum value, maximum value or value pattern facets for occurrence types. You should add these manually if they are required.

  4. When a topic is used in a scope, the inference engine works as follows:

    1. If the scoping topic is also a role type and the scoped item is a topic name, the topic is ignored (role types are often used to scope association type names to provide context-sensitive association labels - these are one-off usages and would make the NPCL schema grow unmanageably if they were all recorded).

    2. If the scoping topic is typed, then the appropriate scoping facet value is added to the topic type. So if a topic "English" is used to scope a topic name, and the "English" topic has the type "Language", then the scoping facet value for the type "Language" will include NAME as one of its values.

    3. If the scoping topic is untyped, then a new Scoping Topic is generated in the schema to represent it. Note that this only occurs if the scoping topic is not typed.
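
The rules in points 1, 2 and 4 above can be restated as a small, self-contained sketch. Everything in it is hypothetical (the InferenceRulesSketch class, the ScopeHandling enum and all method names) and none of it is part of the NPCL API. It also makes assumptions the text does not confirm: that lower-casing is culture-invariant, that the object identifier can be treated as a string, and that the scoping rules are tested in the order listed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outcome labels for the three scoping rules in point 4.
public enum ScopeHandling { Ignore, AddScopingFacetToType, GenerateScopingTopic }

public static class InferenceRulesSketch
{
    // Point 1: identifier generated for a typing topic that has none.
    // Assumption: culture-invariant lower-casing; object id passed as a string.
    public static string GenerateSubjectIdentifier(string baseUri, string topicName, string objectId)
    {
        if (baseUri == null)
        {
            return "urn:x-tmcore:topicid:" + objectId;
        }
        return baseUri + topicName.ToLowerInvariant();
    }

    // Point 2: minimum is 1 only if every instance has at least one match.
    // Note: an empty sequence yields 1 here; how the real engine treats a
    // type with no instances is not stated in the text.
    public static int MinCardinality(IEnumerable<int> countsPerInstance)
    {
        return countsPerInstance.All(c => c >= 1) ? 1 : 0;
    }

    // Point 2: role maximum is 1 only if no association has more than one
    // matching role; null stands for "unbounded".
    public static int? MaxRoleCardinality(IEnumerable<int> roleCountsPerAssociation)
    {
        return roleCountsPerAssociation.All(c => c <= 1) ? 1 : (int?)null;
    }

    // Point 4: decide how a scoping topic is reflected in the schema.
    // Assumption: the rules are applied in the order they are listed.
    public static ScopeHandling ClassifyScopingTopic(
        bool isRoleType, bool scopedItemIsTopicName, bool isTyped)
    {
        if (isRoleType && scopedItemIsTopicName)
        {
            return ScopeHandling.Ignore;
        }
        return isTyped
            ? ScopeHandling.AddScopingFacetToType
            : ScopeHandling.GenerateScopingTopic;
    }
}
```

For instance, calling GenerateSubjectIdentifier with the prefix 'http://www.mycompany.com/psi/general/' and the name "Sales Forecast" reproduces the identifier given in point 1. Note that the space in the name is carried through unchanged, so the result is not a strictly valid URI.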

Example 12.5. Generating An NPCL Schema File From A Topic Map

The following code snippet shows how the schema inference engine can be used to generate an NPCL file for use in other topic maps.

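// Assume that tmSystem is an ITopicMapSystem object that is already initialized,
// and that the System.Xml and NetworkedPlanet.Npcl namespaces are imported
// with using directives (XmlTextWriter and Formatting live in System.Xml)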

// Get the topic map to process
ITopicMap tm = tmSystem.GetTopicMap("my-topicmap");
// Invoke the SchemaUtils to generate the schema
ISchema schema = SchemaUtils.InferSchema(tm, "http://www.mycompany.com/psi/");
// Write the schema to a file
XmlTextWriter xmlWriter = new XmlTextWriter("my-topicmap.npcl", System.Text.Encoding.UTF8);
xmlWriter.Formatting = Formatting.Indented;
XmlSchemaWriter writer = new XmlSchemaWriter(xmlWriter, false, "npcl");
writer.WriteSchema(schema);
xmlWriter.Close();

Example 12.6. Generating And Adding Schema Information To A Topic Map

The following code snippet shows how the inference engine can be used to populate a topic map with a schema generated from it.

// Assume that tmSystem is an ITopicMapSystem object that is already initialized

// Get the topic map to process
ITopicMap tm = tmSystem.GetTopicMap("my-topicmap");

// Invoke the SchemaUtils to generate the schema
// Using null for the subject identifier base generates identifiers
// that will correctly update topics that do not currently have subject identifiers.
ISchema schema = SchemaUtils.InferSchema(tm, null);

// Write the schema back to the topic map - replacing any previous schema information
TopicMapSchemaWriter writer = new TopicMapSchemaWriter(tm, SchemaWriterMode.ReplaceAll);
writer.WriteSchema(schema);