The Structural skm/molDB XML format

The Structural skm/molDB XML format is a simple XML grammar for marking up chemical overview data. This document describes v1.0 of this grammar, and should be viewed as an informative specification.

Structure

The Root Element

moldb

The root element of the skm/molDB format is called <moldb>. In normal usage, the root element should be used as follows:

<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
   <!-- contents here -->
</moldb>

If the moldb element is embedded in another XML format, we suggest using the prefix skm:, as in <skm:moldb>. (For more about embedded moldb, see XHTML+molDB.)

Attributes

identifier

For standalone documents, the identifier should be a reverse-DNS string that uniquely identifies the SKM document (e.g. info.totlandweb.moldb.elements). A part named moldb should sit between the company identifier (info.totlandweb) and the name (elements). All identifiers that start with info.totlandweb.moldb are reserved.
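The identifier rules above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name check_identifier, the allowed characters in each label, and the minimum of four labels (two or more company labels, moldb, and a single-label name) are all assumptions made for illustration.

```python
import re

# Prefix reserved by the specification.
RESERVED_PREFIX = "info.totlandweb.moldb"

# Assumption: each dot-separated label is ASCII letters, digits or hyphens.
_LABEL = re.compile(r"[A-Za-z0-9-]+")


def check_identifier(identifier):
    """Return (well_formed, reserved) for a candidate identifier string."""
    parts = identifier.split(".")
    well_formed = (
        len(parts) >= 4  # assumed: company labels, "moldb", then the name
        and all(_LABEL.fullmatch(part) for part in parts)
        and parts[-2] == "moldb"  # the moldb part sits just before the name
    )
    reserved = (
        identifier == RESERVED_PREFIX
        or identifier.startswith(RESERVED_PREFIX + ".")
    )
    return well_formed, reserved
```

Under these assumptions, a hypothetical identifier such as com.example.moldb.acids is well formed and not reserved, while info.totlandweb.moldb.elements is well formed but reserved.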

Metadata

The skm/molDB format currently has two metadata elements: title and info.

title

The title element contains a human-readable name for the data set, for example "Structural Main Database". This element is required.

info

The info element contains a human-readable description of the database. The description should be short, ideally no more than 3 or 4 lines. This element is required, but may be empty or self-closed (<info />).

Attributes

license

The info element currently has only one attribute: license. It describes the license for the database, and must be one of the following.

The license applies only to the database file, not to the appearance of the molecule in the interface of the application that uses it.
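As an illustration of the metadata rules, a reader might extract title, info and license as follows. This is a sketch using Python's standard xml.etree.ElementTree module. The function name read_metadata and the choice to raise ValueError are assumptions, and the sketch does not validate the license value.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the root element example in this document.
NS = {"skm": "http://totlandweb.info/xml/moldb"}


def read_metadata(xml_text):
    """Return (title, description, license) or raise ValueError."""
    root = ET.fromstring(xml_text)
    title = root.find("skm:title", NS)
    info = root.find("skm:info", NS)
    if title is None:
        raise ValueError("missing required <title> element")
    if info is None:
        raise ValueError("missing required <info> element")
    # <info /> may be empty or self-closed, so its text can be None.
    description = (info.text or "").strip()
    # license is returned as written; None if the attribute is absent.
    return (title.text or "").strip(), description, info.get("license")


METADATA_SAMPLE = (
    '<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">'
    "<title>Example MolDB</title>"
    '<info license="PD" />'
    "</moldb>"
)

print(read_metadata(METADATA_SAMPLE))
```

A self-closed info element yields an empty description string rather than an error, matching the rule that the element is required but may be empty.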

Data

mol

The base element for encapsulating molecular information is the mol element. It groups name and suffix information, as well as other properties, together.

Attributes

The mol element currently uses two attributes:

formula

The formula of the molecule in question. This attribute is required.

smiles

The SMILES representation of the molecule. This attribute is required.

name

This element goes inside mol elements. A mol element may contain an arbitrary number of name elements, but when searching in the regular interface, only the first name of the first matching mol group is used. (This means that while "carbon", "diamond" and "graphite" all produce 'C', searching for 'C' will only ever yield "carbon".)
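The lookup rule described above can be sketched as follows. This is not the application's actual search code. The function name first_name_for_formula, matching on the formula attribute, and returning None when the first matching mol has no name are assumptions, and the smiles value in the sample is only illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the root element example in this document.
NS = {"skm": "http://totlandweb.info/xml/moldb"}


def first_name_for_formula(xml_text, formula):
    """Return the first name of the first mol whose formula matches."""
    root = ET.fromstring(xml_text)
    for mol in root.findall("skm:mol", NS):
        if mol.get("formula") == formula:
            name = mol.find("skm:name", NS)  # first name child only
            # Assumption: a matching mol without a name yields no result.
            return name.text if name is not None else None
    return None


CARBON_SAMPLE = """<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
  <title>Carbon sample</title>
  <info />
  <mol formula="C" smiles="[C]">
    <name>carbon</name>
    <name>diamond</name>
    <name>graphite</name>
  </mol>
</moldb>"""

print(first_name_for_formula(CARBON_SAMPLE, "C"))  # prints "carbon"
```

The later names, diamond and graphite, stay in the file but this lookup never returns them.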

suff

This element contains the suffix that this molecule fragment contributes when it forms the latter end of a larger compound. Each mol needs only one name or suff element; it does not need both.
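The requirements on mol elements (both attributes present, and at least one name or suff child) lend themselves to a simple check. The sketch below is one possible way to do it. The function name find_mol_problems and the wording of the messages are assumptions, not part of the format.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the root element example in this document.
NS = {"skm": "http://totlandweb.info/xml/moldb"}


def find_mol_problems(xml_text):
    """Return a list of human-readable problems, one per violated rule."""
    problems = []
    root = ET.fromstring(xml_text)
    for index, mol in enumerate(root.findall("skm:mol", NS), start=1):
        # Both attributes are described as required.
        for attribute in ("formula", "smiles"):
            if mol.get(attribute) is None:
                problems.append(
                    f"mol #{index}: missing required attribute '{attribute}'"
                )
        # At least one name or suff child is needed.
        has_name = mol.find("skm:name", NS) is not None
        has_suff = mol.find("skm:suff", NS) is not None
        if not (has_name or has_suff):
            problems.append(
                f"mol #{index}: needs at least one name or suff element"
            )
    return problems
```

An empty list means every mol in the document satisfies the rules checked here. The sketch does not check whether the formula or SMILES strings are chemically valid.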

Full example

<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="http://totlandweb.info/xml/moldb" xml:lang="en">
  <title>Example MolDB</title>
  <info license="PD" />
  <mol formula="H2O" smiles="O">
    <name>water</name>
    <suff>hydrate</suff>
  </mol>
  <mol formula="CH4" smiles="C">
    <name>methane</name>
  </mol>
  <mol formula="SO4" smiles="[O-]S(=O)(=O)[O-]">
    <suff>sulfate</suff>
  </mol>
</moldb>

TODO

This document is a draft of the skm/molDB XML specification version 1.0. Things that still need to be done, in no particular order: