The Structural skm/molDB XML format

The Structural skm/molDB XML format is a simple XML grammar for marking up chemical overview data. This document describes v1.0 of this grammar, and should be viewed as an informative specification.


The Root Element


The root element of the skm/molDB format is called <moldb>. In normal usage, a root element should be used as follows:

<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="" xml:lang="en">
   <!-- contents here -->
</moldb>

If the moldb element is embedded in another XML format, we suggest using the prefix skm:, as in <skm:moldb>. (For more about embedded moldb, see XHTML+molDB.)
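As a rough illustration of the standalone case, the following Python sketch parses a minimal document and checks that the root element is moldb. The helper name read_root is hypothetical, and the namespace is left empty exactly as in the example above; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Minimal standalone document. Bytes input lets the parser honour the
# encoding given in the XML declaration.
DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="" xml:lang="en">
   <!-- contents here -->
</moldb>"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_root(data):
    """Return (tag, language) of the root element, or raise ValueError.

    Hypothetical helper: the format does not define a reference parser.
    """
    root = ET.fromstring(data)
    if root.tag != "moldb":
        raise ValueError("root element must be <moldb>, got <%s>" % root.tag)
    return root.tag, root.get(XML_LANG)


print(read_root(DOC))
```

An embedded moldb element with the skm: prefix would additionally need a namespace declaration for that prefix, which this sketch does not cover.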



The identifier should, for standalone documents, be a reverse-DNS string uniquely identifying the SKM document (e.g. info.totlandweb.moldb.elements). A part named moldb should sit between the company identifier (info.totlandweb) and the name (elements). All identifiers that start with info.totlandweb.moldb are reserved.
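The identifier rules can be sketched as a small check. This is only an illustration under stated assumptions: the function name check_identifier is hypothetical, the company identifier is assumed to have at least two parts, the name is assumed to be the single last part, and "reserved" is interpreted on whole dotted parts rather than raw string prefixes.

```python
RESERVED_PREFIX = "info.totlandweb.moldb"


def check_identifier(identifier):
    """Return (well_formed, reserved) for a standalone document identifier.

    Assumptions (not stated by the format itself): the company identifier
    has at least two parts, the name is the last part, and the part named
    "moldb" sits directly before the name.
    """
    parts = identifier.split(".")
    well_formed = len(parts) >= 4 and all(parts) and parts[-2] == "moldb"
    reserved = (
        identifier == RESERVED_PREFIX
        or identifier.startswith(RESERVED_PREFIX + ".")
    )
    return well_formed, reserved


for candidate in (
    "info.totlandweb.moldb.elements",  # the example from the text
    "org.example.moldb.solvents",      # hypothetical third-party identifier
    "org.example.solvents",            # lacks the moldb part
):
    print(candidate, check_identifier(candidate))
```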


The skm/molDB format currently has two metadata elements: title and info.


The title element contains a human-readable name for the data set. Examples include "Structural Main Database". This element is required.


The info element contains a human-readable description of the database. The description should preferably be no longer than 3 or 4 lines. This element is required, but may be empty or self-closed (<info />).



The info element currently has only one attribute: license. It describes the license for the database, and must be one of the following.

This license applies only to the database file, not to the appearance of the molecule in the interface of the application that uses it.
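A reader of the format might collect the metadata roughly as follows. This is a sketch, not a reference implementation: read_metadata is a hypothetical helper, and the license value "PD" is taken from the full example in this document. The sketch does not validate the license value.

```python
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="" xml:lang="en">
  <title>Example MolDB</title>
  <info license="PD" />
</moldb>"""


def read_metadata(data):
    """Return title, description and license of a moldb document.

    Raises ValueError when one of the two required elements is missing.
    An empty or self-closed info element yields an empty description.
    """
    root = ET.fromstring(data)
    title = root.find("title")
    info = root.find("info")
    if title is None or info is None:
        raise ValueError("both <title> and <info> are required")
    return {
        "title": (title.text or "").strip(),
        "description": (info.text or "").strip(),
        "license": info.get("license"),
    }


print(read_metadata(DOC))
```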



The base element for encapsulating molecular information is the mol element. It groups name and suffix information, as well as other properties, together.


The mol element currently uses two attributes:


The formula attribute holds the formula of the molecule in question. This attribute is required.


The smiles attribute holds the SMILES representation of the molecule. This attribute is required.
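Checking the required attributes could look like the sketch below. The attribute names formula and smiles are the ones used in the full example of this document; missing_attributes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Attribute names as they appear in the full example of this document.
REQUIRED = ("formula", "smiles")


def missing_attributes(mol):
    """Return the required attribute names that a mol element lacks."""
    return [name for name in REQUIRED if mol.get(name) is None]


complete = ET.fromstring('<mol formula="CH4" smiles="C" />')
incomplete = ET.fromstring('<mol formula="H2O" />')

print(missing_attributes(complete))
print(missing_attributes(incomplete))
```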


The name element goes inside mol elements. A mol element may contain an arbitrary number of name elements, but when searching in the regular interface, only the first name of the first matching mol group will be used. (This means that while "carbon", "diamond" and "graphite" all produce 'C', searching with 'C' will only ever yield "carbon".)
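The asymmetry between the two search directions can be sketched as follows. The function names formula_for and name_for are hypothetical, the smiles value "[C]" is only illustrative, and returning None when the first matching mol carries no name element is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="" xml:lang="en">
  <title>Lookup sketch</title>
  <info />
  <mol formula="C" smiles="[C]">
    <name>carbon</name>
    <name>diamond</name>
    <name>graphite</name>
  </mol>
</moldb>"""


def formula_for(root, name):
    """Name to formula: any name element of a mol may match."""
    for mol in root.iter("mol"):
        if any(n.text == name for n in mol.findall("name")):
            return mol.get("formula")
    return None


def name_for(root, formula):
    """Formula to name: first name of the first matching mol only."""
    for mol in root.iter("mol"):
        if mol.get("formula") == formula:
            first = mol.find("name")
            # Assumption: no fallback to later mol elements.
            return first.text if first is not None else None
    return None


root = ET.fromstring(DOC)
for name in ("carbon", "diamond", "graphite"):
    print(name, "->", formula_for(root, name))
print("C ->", name_for(root, "C"))
```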


The suff element contains the suffix that this molecule fragment contributes when it forms the latter end of a larger compound. Each mol needs only one name or suff element; either is sufficient.
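The rule that each mol needs a name or a suff element can be checked with a short sketch. The helper has_label is hypothetical, and the suffix text "sulfate" is only an illustrative value.

```python
import xml.etree.ElementTree as ET


def has_label(mol):
    """True when a mol element carries at least one name or suff child."""
    return mol.find("name") is not None or mol.find("suff") is not None


named = ET.fromstring('<mol formula="H2O" smiles="O"><name>water</name></mol>')
fragment = ET.fromstring('<mol formula="SO4"><suff>sulfate</suff></mol>')
unlabeled = ET.fromstring('<mol formula="CH4" smiles="C" />')

for mol in (named, fragment, unlabeled):
    print(mol.get("formula"), has_label(mol))
```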

Full example

<?xml version="1.0" encoding="utf-8"?>
<moldb xmlns="" xml:lang="en">
  <title>Example MolDB</title>
  <info license="PD" />
  <mol formula="H2O" smiles="O"><name>water</name></mol>
  <mol formula="CH4" smiles="C"><name>methane</name></mol>
  <mol formula="SO4"><suff>sulfate</suff></mol>
</moldb>


This document is a draft of the skm/molDB XML specification version 1.0. Among the things that remain to be done are, in no particular order: