Short Documentation of ChemHits

Short Documentation

What is ChemHits for?

ChemHits enables its users to match either a single chemical compound name or a whole list of names against reference databases, even when the notations differ. Matching is based solely on the names (strings) of the compounds; the exact chemical structure of the molecule described is not identified. An input name is normalized to a unique name form by a set of transformation rules. These rules include, among others, reordering of substituent descriptions within the name and replacement of synonymous name constituents (e.g. equivalent trivial names), as well as simpler rules dealing with different spellings, spaces, hyphens, etc. The resulting normalized term is not necessarily a valid or even systematic notation for the compound; it serves only to match two names that have been normalized by the same methods.
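The following minimal Python sketch illustrates rule-based normalization under stated assumptions: a hypothetical synonym table (SYNONYMS), a hypothetical spelling table (SPELLINGS), and alphabetical reordering of leading substituent prefixes drawn from a small, hypothetical SUBSTITUENTS list. It is not the ChemHits rule set; it only shows how differently written names can collapse to the same key.

```python
import re

# Hypothetical rule tables for illustration only; these are not ChemHits' actual rules.
SYNONYMS = {"hydroxybenzene": "phenol"}  # equivalent trivial/systematic constituents
SPELLINGS = {"sulph": "sulf"}            # spelling variants
SUBSTITUENTS = ("amino", "chloro", "hydroxy", "methyl", "nitro")

# A leading substituent prefix such as "2-chloro" or "2,4-dinitro".
PREFIX = re.compile(r"(\d+(?:,\d+)*)-(?:di|tri)?(" + "|".join(SUBSTITUENTS) + ")")


def normalize(name: str) -> str:
    """Map a compound name to a matching key (toy version of rule-based normalization)."""
    s = name.lower().strip()
    # Replace synonymous constituents and spelling variants.
    for table in (SYNONYMS, SPELLINGS):
        for variant, preferred in table.items():
            s = s.replace(variant, preferred)
    # Split off leading substituent prefixes, remembering each substituent's name.
    prefixes = []
    while (m := PREFIX.match(s)) is not None:
        prefixes.append((m.group(2), m.group(0)))
        s = s[m.end():].lstrip("- ")
    # Reorder substituents alphabetically so their order in the input no longer matters.
    ordered = "".join(text for _, text in sorted(prefixes)) + s
    # Drop spaces and hyphens, which often vary between notations.
    return re.sub(r"[\s\-]", "", ordered)
```

With these toy rules, "2-Chloro-4-methylphenol" and "4-methyl-2-chlorophenol" both map to the key "2chloro4methylphenol", which is not itself a valid compound name but is sufficient for matching.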

A synonym relation can be established between any two names that yield the same normalized form. The synonyms found by our normalization methods in the pre-normalized reference databases are displayed, together with their IDs, as the matching results for the input name.
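Matching by shared normalized form can be pictured as a lookup table keyed by the normalized name. In this sketch, normalize is a deliberately trivial placeholder (lower-casing and removing spaces and hyphens), and the reference entries, IDs and the helper names build_index and match are hypothetical; the point is the lookup structure, not the rules.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Placeholder normalization (assumption): lower-case, drop spaces and hyphens."""
    return "".join(ch for ch in name.lower() if ch not in " -")


def build_index(reference):
    """Pre-normalize a reference list of (ID, name) pairs into key -> entries."""
    index = defaultdict(list)
    for ref_id, ref_name in reference:
        index[normalize(ref_name)].append((ref_id, ref_name))
    return index


def match(name, index):
    """Return all reference entries whose normalized name equals that of `name`."""
    return index.get(normalize(name), [])
```

For example, with reference = [("1", "D-Glucose"), ("2", "D glucose"), ("3", "L-Alanine")], match("d-glucose", build_index(reference)) returns both glucose entries, because they share the key "dglucose".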

We currently provide the following reference databases, which contain pre-normalized name forms:

The name of each reference database in ChemHits (e.g. sabio081209) indicates the date on which its reference name list was created.

ChemHits Manual

Name matching against one of the pre-normalized reference databases can be done:

1) with a single name of a chemical compound: Match Single Compound Name
2) with a list of names of chemical compounds: Match List of Compound Names

1) Match Single Compound Name:
The input can be any chemical compound name that the user wants normalized and matched against one of the pre-normalized databases.

2) Match List of Compound Names:
Specify a file, with the path where it can be found, that contains a list of either compound names or compound names with their corresponding IDs. The file must be in one of the allowed formats before normalization and matching can be performed.
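The allowed file formats are not listed here, so the sketch below simply assumes one plausible layout: one entry per line, either a bare compound name or an ID and a name separated by a tab. Both this layout and the function name read_name_list are hypothetical; the actual ChemHits formats may differ.

```python
def read_name_list(lines):
    """Parse lines of either 'name' or 'ID<TAB>name' into (ID or None, name) pairs.

    The tab-separated layout is an assumption for illustration only.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if "\t" in line:
            entry_id, name = line.split("\t", 1)
            entries.append((entry_id.strip(), name.strip()))
        else:
            entries.append((None, line.strip()))  # name without an ID
    return entries
```

Because the function accepts any iterable of lines, it can read a file directly: with open(path, encoding="utf-8") as fh: entries = read_name_list(fh).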

Options:

For both cases, optional matching of conjugate acid-base pairs can be chosen (checkbox Perform Acid-Base Equalisation?). This option allows a conjugate base to be recognized for a given acid and vice versa. Because the protonation state of an acid in solution depends on the pH of the solution, conjugate pairs can in many cases be considered identical (e.g. in many biochemical applications). The option is therefore checked (switched on) by default. However, some chemical purposes require the two forms to be distinguished; in these cases the option can be deselected.
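One simple way to picture acid-base equalisation is to rewrite conjugate-base names into their acid form before matching, so that both forms share a key. The sketch below assumes toy suffix rules ("-ate" to "-ic acid", "-ite" to "-ous acid") plus a small hypothetical EXCEPTIONS table for irregular pairs; it is not the ChemHits implementation. A real rule set would also have to avoid rewriting esters such as "ethyl acetate", which this sketch does not handle.

```python
# Irregular conjugate pairs that the suffix rules below would get wrong (illustrative list).
EXCEPTIONS = {"sulfate": "sulfuric acid", "phosphate": "phosphoric acid"}


def equalise_acid_base(name: str) -> str:
    """Rewrite a conjugate-base name to its acid form so both forms share one key.

    Toy suffix rules (assumption): '-ate' -> '-ic acid', '-ite' -> '-ous acid'.
    Names already in acid form are returned lower-cased but otherwise unchanged.
    """
    s = name.lower().strip()
    for base, acid in EXCEPTIONS.items():
        if s.endswith(base):
            return s[: -len(base)] + acid
    if s.endswith("ate"):
        return s[:-3] + "ic acid"
    if s.endswith("ite"):
        return s[:-3] + "ous acid"
    return s
```

With the option switched on, "Pyruvate" and "pyruvic acid" both become "pyruvic acid" and therefore match; with it switched off, this step would simply be skipped.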

For single compound name matching, a second option performs a substring search over the set of normalized names. If this option is selected (checkbox Search for Normalised Substrings?), all notations from the reference database that contain the given chemical compound name anywhere in their name are shown.
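A substring search over normalized names can be sketched as follows, again with a trivial placeholder normalize (lower-casing and removing spaces and hyphens) and hypothetical reference entries; the function name substring_search is also an assumption. In practice the reference names are already pre-normalized, so their keys would be computed once rather than on every query.

```python
def normalize(name: str) -> str:
    """Placeholder normalization (assumption): lower-case, drop spaces and hyphens."""
    return "".join(ch for ch in name.lower() if ch not in " -")


def substring_search(query, reference):
    """Return (ID, name) entries whose normalized name contains the normalized query."""
    key = normalize(query)
    return [(ref_id, ref_name) for ref_id, ref_name in reference
            if key in normalize(ref_name)]
```

For a reference list containing "Glucose 6-phosphate", "D-Glucose" and "L-Alanine", the query "glucose" returns the first two entries.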

References:

Normalization and Matching of Chemical Compound Names
Martin Golebiewski, Jasmin Saric, Henriette Engelken, Meik Bittkowski, Ulrike Wittig, Wolfgang Müller, and Isabel Rojas (2009).
Available from Nature Precedings.

Flache und semantische Verarbeitung von Namen biochemischer Verbindungen [Shallow and semantic processing of names of biochemical compounds]
Henriette Engelken, Martin Golebiewski, Meik Bittkowski, Fritz Hamm, Jasmin Saric, Ulrike Wittig, Wolfgang Müller, Uwe Reyle, and Isabel Rojas (2009).
In INFORMATIK 2009 - Im Focus das Leben, Beiträge der 39. Jahrestagung der Gesellschaft für Informatik e.V. (GI), Lübeck, Germany, 28 September to 2 October 2009, LNI.

Contact: chemhits (at) h-its.org
