The Structural skm/molDB XML format is a simple XML grammar for marking up chemical overview data. This document describes v1.0 of this grammar, and should be viewed as an informative specification.
When Structural was designed, we wished to avoid pitfalls we had encountered when working on Beware of Molecules 4. Specifically, we wished to avoid the rigidity of its name retrieval system. To anyone who has studied the internals of a database file from BoM4, this should come as no surprise. The database files for both version 3 and version 4 were organized as a number of key-value pairs, where the name served as the key. However, in the crazy, mixed-up world of chemistry, one-to-one relationships rarely exist in practice. Instead, a chemical compound has a large number of properties:
This specification describes how Structural 5.0 handles a few of these issues, and will hopefully lead to lively and enriching discussion.
moldb
The root element of the skm/molDB format is called <moldb>.
In normal usage, the root element should be used as follows:
<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
<!-- contents here -->
</moldb>
If the moldb element is embedded in another XML format,
we suggest using the prefix skm:, as in <skm:moldb>.
(For more about embedded moldb, see XHTML+molDB.)
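As a minimal sketch of why the prefix is only a suggestion, the following Python snippet locates an embedded moldb element by its namespace URI rather than by its prefix. The host element and the sample title are hypothetical and exist only for illustration; only the namespace URI and the skm: prefix come from this document.

```python
import xml.etree.ElementTree as ET

MOLDB_NS = "http://totlandweb.info/xml/moldb"

# Hypothetical host document: <host> stands in for any outer XML format.
HOST = """<host xmlns:skm="http://totlandweb.info/xml/moldb">
  <skm:moldb xml:lang="en">
    <skm:title>Embedded sample</skm:title>
    <skm:info />
  </skm:moldb>
</host>"""

root = ET.fromstring(HOST)

# ElementTree expands prefixes to "{namespace-uri}localname", so the
# lookup works regardless of which prefix the host document chose.
embedded = root.find(f"{{{MOLDB_NS}}}moldb")
title = embedded.find(f"{{{MOLDB_NS}}}title").text
```

A namespace-aware consumer would therefore treat <skm:moldb> and an unprefixed <moldb> in the default namespace as the same element.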
identifier
For standalone documents, the identifier should be a reverse-DNS string that uniquely identifies the SKM document (e.g. info.totlandweb.moldb.elements). A part named moldb should sit between the company identifier (info.totlandweb) and the name (elements). All identifiers that start with info.totlandweb.moldb are reserved.
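The identifier rules above could be checked roughly as follows. This is a sketch under assumptions: the specification does not define which characters a part may contain, so the pattern below (letters, digits and hyphens) is a guess, and the function names are hypothetical. The reserved prefix is matched on a part boundary, which is one possible reading of "start with".

```python
import re

RESERVED_PREFIX = "info.totlandweb.moldb"

# Assumption: each dot-separated part uses letters, digits or hyphens.
_PART = re.compile(r"^[A-Za-z0-9-]+$")


def has_moldb_part(identifier: str) -> bool:
    """True if the part just before the final name is 'moldb'."""
    parts = identifier.split(".")
    # Need at least: company identifier, "moldb", name.
    if len(parts) < 3 or not all(_PART.match(p) for p in parts):
        return False
    return parts[-2] == "moldb"


def is_reserved(identifier: str) -> bool:
    """True if the identifier falls under the reserved prefix."""
    return (identifier == RESERVED_PREFIX
            or identifier.startswith(RESERVED_PREFIX + "."))
```

For example, has_moldb_part("info.totlandweb.moldb.elements") holds and the identifier is also reserved, while a hypothetical third-party identifier such as org.example.moldb.acids has the moldb part but is not reserved.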
The skm/molDB format currently has two metadata elements: title and info.
title
The title element contains a human-readable name for the data set.
One example is "Structural Main Database". This element is required.
info
The info element contains a human-readable description of the database.
This description should probably not be more than 3 or 4 lines.
This element is required, but may be empty or self-closed (<info />).
license
The info element currently has only one attribute: license.
It describes the license for the database, and must be one of the following.
This license only applies to the database file, and not the appearance of the molecule in the interface of the application that uses it.
mol
The base element for encapsulating molecular information is the mol element.
It groups name and suffix information, as well as other properties, together.
The mol element currently uses 2 attributes:
formula: The formula of the molecule in question. This attribute is required.
smiles: The SMILES representation of the molecule. This attribute is optional.
name
This element goes inside of mol elements.
A mol element may contain an arbitrary number of name elements,
but when searching in the regular interface, only the first name of the first
matching mol group will be used. (This means that while "carbon",
"diamond" and "graphite" all produce 'C', searching with 'C' will only ever yield "carbon".)
suff
This element contains the suffix that this molecule fragment contributes when it forms the trailing end of a larger compound. Each mol needs at least one name or suff element.
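The name and suff lookups described above can be sketched in Python. This is an illustration under assumptions, not the Structural implementation: the function names are hypothetical, the sample data is assembled from the carbon and sulfate examples in this document, and returning nothing when the first matching mol has no name is only one possible reading of the search rule.

```python
import xml.etree.ElementTree as ET

NS = {"skm": "http://totlandweb.info/xml/moldb"}

# Hypothetical sample database built from examples in this document.
SAMPLE = """<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
<title>Lookup sample</title>
<info license="PD" />
<mol formula="C">
<name>carbon</name>
<name>diamond</name>
<name>graphite</name>
</mol>
<mol formula="SO4">
<suff>sulfate</suff>
</mol>
</moldb>"""


def formula_for_name(root, name):
    """Any name inside a mol maps to that mol's formula."""
    for mol in root.findall("skm:mol", NS):
        if any(n.text == name for n in mol.findall("skm:name", NS)):
            return mol.get("formula")
    return None


def first_name_for_formula(root, formula):
    """Only the first name of the first matching mol is used."""
    for mol in root.findall("skm:mol", NS):
        if mol.get("formula") == formula:
            first = mol.find("skm:name", NS)
            return first.text if first is not None else None
    return None


def suffix_for_formula(root, formula):
    """Suffix of the first matching mol that carries a suff element."""
    for mol in root.findall("skm:mol", NS):
        suff = mol.find("skm:suff", NS)
        if mol.get("formula") == formula and suff is not None:
            return suff.text
    return None


root = ET.fromstring(SAMPLE)
```

With this sample, all three carbon names resolve to the formula C, but looking up C only yields "carbon", which mirrors the asymmetry described for the name element.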
<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
<title>Example MolDB</title>
<info license="PD" />
<mol formula="H2O" smiles="O">
<name>water</name>
<suff>hydrate</suff>
</mol>
<mol formula="CH4" smiles="C">
<name>methane</name>
</mol>
<mol formula="SO4">
<suff>sulfate</suff>
</mol>
</moldb>
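A consumer might check the requirements stated in this document (required title and info elements, a required formula attribute, and at least one name or suff per mol) before using a file. The following Python sketch does so for the example above. The function name check_moldb and the wording of the messages are hypothetical, and the set of valid license values is deliberately not checked.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://totlandweb.info/xml/moldb"

EXAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
<title>Example MolDB</title>
<info license="PD" />
<mol formula="H2O" smiles="O">
<name>water</name>
<suff>hydrate</suff>
</mol>
<mol formula="CH4" smiles="C">
<name>methane</name>
</mol>
<mol formula="SO4">
<suff>sulfate</suff>
</mol>
</moldb>"""


def q(tag):
    """Qualify a local tag name with the molDB namespace."""
    return f"{{{NS_URI}}}{tag}"


def check_moldb(root):
    """Return a list of problems found; an empty list means none."""
    problems = []
    if root.tag != q("moldb"):
        problems.append("root element is not moldb in the molDB namespace")
    if root.find(q("title")) is None:
        problems.append("missing required title element")
    if root.find(q("info")) is None:
        problems.append("missing required info element")
    for i, mol in enumerate(root.findall(q("mol")), start=1):
        if not mol.get("formula"):
            problems.append(f"mol #{i}: missing required formula attribute")
        if mol.find(q("name")) is None and mol.find(q("suff")) is None:
            problems.append(f"mol #{i}: needs at least one name or suff")
    return problems


problems = check_moldb(ET.fromstring(EXAMPLE))
```

For the example document the returned list is empty. Returning a list rather than raising on the first problem is a design choice that lets a tool report every issue in one pass.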
This document is a draft of the skm/molDB XML specification, version 1.0. Things that still need to be done include, in no particular order:
license, for one.)