Computer methods of analyzing mass spectral data center on three fundamental methodologies: library search techniques, expert system procedures, and classification methods.

For additional information, see Using the Search Utilities, Previewing Unimolecular Reactions, or Example Workflow.

Three classification methods, each based on a different principle, let you explore complex data from various perspectives. Principal Component Analysis (PCA) uses multivariate statistics. Fuzzy clustering assigns each data point to clusters with varying degrees of membership. Self-Organizing Maps (SOM), a special class of neural networks, are based on competitive learning.

In multivariate statistics, you can consider each spectrum as a single point in an n-dimensional space, with the peak intensities as the coordinates of this point. Each dimension (axis) of that space represents one mass-to-charge ratio (m/z), so the dimensionality is determined by the m/z value of the last peak in the spectrum. For example, the EI spectrum of hydrogen exhibits two peaks: m/z = 1 (intensity 2%) and m/z = 2 (intensity 100%). You can view this spectrum as a point in a two-dimensional space with the coordinates [2, 100].
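The mapping from a peak list to a point in n-dimensional space can be sketched as follows. This is a minimal illustration, not the program's own code. It assumes integer (nominal) m/z values and axes that start at m/z 1. The function name `spectrum_to_vector` and its optional `n_dims` parameter are hypothetical. Passing the same `n_dims` to every spectrum places all of them in one common space.

```python
def spectrum_to_vector(peaks, n_dims=None):
    """Map a peak list {m/z: relative intensity} to a point in n-dimensional space.

    Hypothetical helper: each integer m/z from 1 to n_dims becomes one axis,
    and m/z values without a peak get the coordinate 0.
    """
    if n_dims is None:
        # Default dimensionality: the m/z value of the last (highest) peak
        n_dims = max(peaks)
    # Peaks above n_dims (if n_dims is given explicitly) are ignored
    return [float(peaks.get(mz, 0.0)) for mz in range(1, n_dims + 1)]


# EI spectrum of hydrogen: m/z 1 at 2 %, m/z 2 at 100 %
hydrogen = {1: 2.0, 2: 100.0}
print(spectrum_to_vector(hydrogen))  # [2.0, 100.0]
```

Most coordinates of a real spectrum's vector are zero, which is why dimensionality becomes a concern.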

In reality, spectra have a far higher dimensionality than two. When the dimensionality is too high, or when many coordinates are zero (a mass spectrum usually does not have a peak at every m/z value), the classification methods might not provide the required results. Therefore, the dimensionality is reduced either before a spectrum is placed in the n-dimensional space or during the classification process.
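One common way to reduce dimensionality is sketched below. It is an illustration under stated assumptions and not necessarily the procedure the program uses. Step 1 drops m/z axes that are zero in every spectrum. Step 2 projects the remaining data onto its first `k` principal components, using a plain PCA computed with NumPy's SVD. The function name `reduce_dimensionality` and the sample intensities are hypothetical.

```python
import numpy as np


def reduce_dimensionality(spectra, k):
    """Project spectra (one per row) onto their first k principal components.

    Hypothetical helper: drops all-zero m/z axes, then applies PCA via SVD.
    """
    X = np.asarray(spectra, dtype=float)
    # Step 1: remove axes with no peak in any spectrum (they carry no information)
    X = X[:, np.any(X != 0, axis=0)]
    # Step 2: centre each remaining axis, then take the SVD of the centred data
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of vt are the principal axes, ordered by decreasing variance
    return Xc @ vt[:k].T


# Three hypothetical 6-dimensional spectra (relative intensities)
spectra = [
    [0, 10, 0, 100, 0, 5],
    [0, 12, 0, 90, 0, 8],
    [0, 80, 0, 20, 0, 60],
]
scores = reduce_dimensionality(spectra, k=2)
print(scores.shape)  # (3, 2)
```

Removing all-zero axes and centring do not change the distances between spectra. Projecting onto the leading components keeps as much of the variance as `k` axes allow.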

The basic hypothesis of multivariate statistical methods is that the distance between points (spectra) in an n-dimensional space is related to a relevant property of the compounds these points represent. When points are close enough to form a cluster or a separated region, you can assume that the corresponding compounds exhibit common or similar properties. To ensure that the results of the classification methods are statistically significant, place a large number of spectra (usually one or more groups, each with 10–1000 spectra) in the same n-dimensional space. Then apply multivariate statistical methods, with various parameters, to evaluate these points. The objective of a classification process is to separate the points into two or more classes according to the desired structural or other properties.
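The distance hypothesis can be made concrete with a very simple classifier, sketched below. It is a nearest-centroid rule based on Euclidean distance. This is a deliberately simpler technique than PCA, fuzzy clustering, or SOM, and it is not the program's method. Each class is summarized by the mean point of its spectra, and a new spectrum goes to the class whose centre is closest. The class labels, the helper names (`euclidean`, `centroid`, `classify`), and the three-dimensional toy spectra are hypothetical. Real training groups would be much larger, as the text above recommends.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points (spectra) of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Mean point of a group of spectra, computed axis by axis."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def classify(spectrum, classes):
    """Assign a spectrum to the class whose centroid is nearest."""
    centroids = {label: centroid(pts) for label, pts in classes.items()}
    return min(centroids, key=lambda label: euclidean(spectrum, centroids[label]))


# Hypothetical training groups (toy 3-dimensional spectra)
training = {
    "class A": [[0, 10, 100], [0, 12, 90]],
    "class B": [[0, 80, 20], [5, 70, 30]],
}
print(classify([1, 15, 95], training))  # class A
```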