My start in the computer industry came when I was hired as a programmer by Synthetica Technologies, a company in Richmond, California.

Synthetica Technologies had constructed a reactor to degrade toxic waste.

This reactor was essentially an autoclave. In an autoclave, which is in effect a pressure cooker, water is heated under pressure and becomes superheated steam.

When water is heated in this way, a hydrogen-oxygen bond is broken homolytically to produce a hydrogen radical and a hydroxyl radical.
These radicals are highly reactive; they attack the molecules of the toxic waste and degrade them to innocuous chemicals.

My assignment was to create a human interface on the Macintosh for one component of the process, which I describe next.

When a 55-gallon drum or a tank car of toxic waste arrives, a sample is taken and sent to an analytical lab, which returns a report listing the components of the mixture and the mole fraction of each.

When operating the reactor, one needs to know how much energy to add to the system: enough to generate the hydroxyl radicals needed for complete reaction, but no more, since any excess is simply wasted.

With the components of the mixture and their mole fractions, one can calculate the heat capacity of the mixture and thus determine how much electricity to use to heat it.
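The arithmetic can be sketched as a mole-fraction-weighted average of the component heat capacities. This is a minimal sketch that assumes an ideal mixture, constant heat capacities, and sensible heat only (no phase change or reaction heat); the function names, the units (J/(mol K)), and the sample values are illustrative assumptions, not data from the original system.

```python
def mixture_heat_capacity(components):
    """Return the molar heat capacity of a mixture.

    components: list of (mole_fraction, molar_heat_capacity) pairs.
    Assumes ideal mixing: Cp_mix = sum(x_i * Cp_i).
    """
    total_fraction = sum(x for x, _ in components)
    if abs(total_fraction - 1.0) > 1e-6:
        raise ValueError("mole fractions must sum to 1")
    return sum(x * cp for x, cp in components)


def heating_energy(moles, cp_mix, delta_t):
    """Sensible heat (J) to raise `moles` of mixture by `delta_t` kelvin.

    Assumes cp_mix is constant over the temperature range and ignores
    phase changes and heats of reaction.
    """
    return moles * cp_mix * delta_t


# Hypothetical two-component mixture: 0.9 water, 0.1 of some waste compound.
cp = mixture_heat_capacity([(0.9, 75.3), (0.1, 120.0)])
print(round(cp, 2))  # 79.77
```

A real plant calculation would use temperature-dependent heat capacities and account for vaporization, so this only illustrates why the mole fractions are needed.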

Using the lab report, the operator enters each chemical and its mole fraction into the human interface.

This program interfaced with another program, backed by a database, that calculated the heat capacities of chemicals.

The database, however, stored the chemical formulas in all upper case.

If you look at the periodic table of the elements, you will find that some elements are represented by a single-letter symbol.

Others are represented by two letters, the first upper case and the second lower case.

When you have all the symbols in upper case, information is lost.

For example, if a fragment of a molecular formula reads SIO, does that represent sulfur, iodine, and oxygen, or silicon (Si) and oxygen?
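The ambiguity is easy to demonstrate by listing every way an upper-case string can be split into valid symbols. The abbreviated `ELEMENTS` set and the `all_readings` function below are hypothetical, and this brute-force recursion is only an illustration of the problem, not the graph-based method described next.

```python
# A small subset of element symbols in proper case.
# (Assumption: a real program would load the full periodic table.)
ELEMENTS = {"H", "C", "N", "O", "S", "I", "Si", "Cl", "Co", "B", "F", "P"}


def all_readings(upper):
    """Return every way to split an all-upper-case string into element symbols."""
    if not upper:
        return [[]]  # exactly one reading of the empty string: no symbols
    results = []
    # Try a one-letter symbol, then a two-letter symbol (second letter lower-cased).
    for length in (1, 2):
        if len(upper) < length:
            continue
        symbol = upper[0] + upper[1:length].lower()
        if symbol in ELEMENTS:
            for rest in all_readings(upper[length:]):
                results.append([symbol] + rest)
    return results


print(all_readings("SIO"))  # [['S', 'I', 'O'], ['Si', 'O']]
print(all_readings("CO"))   # [['C', 'O'], ['Co']]
```

The second example shows the same loss of information: CO could be carbon monoxide or cobalt.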

To resolve this ambiguity, I designed and wrote code to determine the correct molecular formula automatically. I originally called the data structure I built a "binary chickenwire," by analogy with a binary tree. I later learned that this kind of data structure was already known and was called a directed graph.

In this type of data structure, as in others, the program creates records, each with several fields. Some fields hold character data and others hold numeric data; a third kind of field holds pointers, which link the record to other records in the data structure.

Each record is called a node. The pointer fields fall into two groups: those set when the graph is created, and one further pointer field whose value is set during the traversal phase of the process. That field identifies the record visited just before the current one, and it is used when backtracking is necessary.
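A record of this kind might look like the following Python dataclass. The field names (`text`, `position`, `single`, `double`, `came_from`) are assumptions chosen to mirror the description: a character field, a numeric field, two pointer fields set when the graph is built, and one back pointer filled in during traversal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare records by identity, since pointers can form cycles
class Node:
    """One record in the graph (field names are illustrative)."""
    text: str                             # character field: one or two letters
    position: int                         # numeric field: index in the formula string
    single: Optional["Node"] = None       # set at build time: next one-letter record
    double: Optional["Node"] = None       # set at build time: next two-letter record
    came_from: Optional["Node"] = None    # set during traversal; used to back track


a = Node("S", 0)
b = Node("I", 1)
a.single = b      # build-time link
b.came_from = a   # recorded when the traversal moves from a to b
print(b.came_from.text)  # S
```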

The molecular formula is a string of characters.

Parsing this string means breaking it into individual characters and pairs of characters. Because some chemical symbols have one character and others have two, a set of records is created on the fly as the string is parsed.

The first character of a molecular formula may be a one-letter symbol or the start of a two-letter symbol, so a single-character record is created and the first character is written into it. Because that character might also begin a two-letter symbol, when the second character of the string is read it is written into the first two-character record, alongside the first character, as well as into the second single-character record.

The entry point of the graph is called the root. The root has two pointer fields.

One pointer from the root points to the first single-character record, and the other points to the first two-character record.
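A minimal sketch of building such a graph for a string of letters follows. The `Node` and `build_graph` names are hypothetical, and sharing records by (position, width) is an assumption about how the structure becomes a graph rather than a tree; the original may have created records incrementally as each character was read.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(eq=False)
class Node:
    text: str
    position: int
    single: Optional["Node"] = None   # left pointer: next one-letter record
    double: Optional["Node"] = None   # right pointer: next two-letter record


def build_graph(formula):
    """Build the graph for an all-upper-case string of letters.

    Records are shared (memoized by position and width), so two paths that
    reach the same position point at the same record: a directed acyclic
    graph rather than a tree.
    """
    cache: Dict[Tuple[int, int], Node] = {}

    def record(pos, width):
        if pos + width > len(formula):
            return None                   # not enough characters left
        key = (pos, width)
        if key not in cache:
            node = Node(formula[pos:pos + width], pos)
            cache[key] = node
            nxt = pos + width
            node.single = record(nxt, 1)
            node.double = record(nxt, 2)
        return cache[key]

    root = Node("", -1)                   # entry point; position -1 is a sentinel
    root.single = record(0, 1)
    root.double = record(0, 2)
    return root, cache


root, records = build_graph("SIO")
print(root.single.text, root.double.text)               # S SI
print(len(records))                                     # 5
print(root.single.single.single is root.double.single)  # True
```

The last line shows why this is a graph: the paths S, I and SI both lead to the same O record.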

In a chemical formula, when a molecule contains more than one atom of the same element, a number follows the symbol; in water, H2O, the 2 indicates two hydrogen atoms. When the parser encounters a digit, it knows that the letters on either side of it (here H and O) cannot go into a two-character record.
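One way to honor this rule is to cut the formula at every digit before building any records, so each run of letters is parsed on its own. The `split_runs` function and its (letters, count) output are hypothetical; the parsing itself uses only Python's standard `re.findall`.

```python
import re


def split_runs(formula):
    """Split an upper-case formula into (letters, count) pieces.

    A digit ends a run of letters, so no two-letter record can straddle it.
    The count applies to the last symbol of the run; a run with no trailing
    digit gets a count of 1.
    """
    return [(letters, int(digits) if digits else 1)
            for letters, digits in re.findall(r"([A-Z]+)(\d*)", formula)]


print(split_runs("H2O"))     # [('H', 2), ('O', 1)]
print(split_runs("C2H5OH"))  # [('C', 2), ('H', 5), ('OH', 1)]
```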

In the animation's representation of the data structure, a molecular formula made up entirely of single-character symbols would fill the boxes running down the left side in sequence. The root's other pointer field points to the first two-character record. Each record, whether single-character or two-character, also has two pointer fields: the left one always points to a single-character record, and the right one always points to a two-character record.

Once the entire formula string has been parsed, it is possible to follow the links, starting at the root, through every possible combination of records; each such path is one way of traversing the graph.

The algorithm then traverses the graph. It first visits the first single-character record and looks up the character it contains in a table of the elements; if the character is found, it is said to map to the periodic table. When the character maps, the atomic weight of that element is added to a running sum, and the algorithm takes the left branch to the next single-character record, continuing in the same way. When a character does not map, the algorithm backtracks to the previous node and follows its link to the two-character record instead.
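A minimal sketch of this traversal, assuming a small hypothetical table of atomic weights, is shown below. It works directly on the string rather than on linked records, and it uses recursion (the call stack) in place of the stored back-pointer field, so it is a simplification of the described design rather than a reproduction of it.

```python
# Hypothetical subset of standard atomic weights (g/mol).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "O": 15.999, "S": 32.06,
                  "I": 126.904, "Si": 28.085, "Co": 58.933, "Cl": 35.45}


def first_reading(upper, pos=0):
    """Depth-first walk preferring the one-letter (left) branch.

    Returns (symbols, weight_sum) for the first complete reading found,
    or None. Returning None from a call acts as the back track to the
    previous node, which then tries its two-letter (right) branch.
    """
    if pos == len(upper):
        return [], 0.0                        # whole string consumed
    for width in (1, 2):                      # left branch first, then right
        if pos + width > len(upper):
            break
        symbol = upper[pos] + upper[pos + 1:pos + width].lower()
        if symbol in ATOMIC_WEIGHTS:          # the record maps to the table
            rest = first_reading(upper, pos + width)
            if rest is not None:
                symbols, total = rest
                return [symbol] + symbols, ATOMIC_WEIGHTS[symbol] + total
    return None                               # dead end: back track


print(first_reading("SIO")[0])  # ['S', 'I', 'O']
print(first_reading("CCL")[0])  # ['C', 'Cl']
```

For CCL the lone L does not map, so the walk backs up and takes the two-letter branch to Cl. For SIO the sulfur-iodine-oxygen reading succeeds first even if silicon was intended, which is why a further check against the molecular weight is needed.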

When the traversal has consumed the entire formula string, the algorithm compares the sum of the atomic weights accumulated along the way with the molecular weight stored for the compound in the original record. If the two agree within a tolerance, the process stops and returns the identity of the compound to the operator. If this checksum does not match, the algorithm backtracks and follows a different path through the graph until it finds the correct interpretation. (Work in progress, to be continued.)
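The checksum step can be sketched by turning the walk into a generator that yields one complete reading at a time, so rejecting a reading simply resumes the search at the most recent branch point. The weight table, the `readings` and `identify` functions, and the default tolerance of 0.05 g/mol are illustrative assumptions; this version also reads the digit counts that follow symbols.

```python
import re

# Hypothetical subset of standard atomic weights (g/mol).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "O": 15.999, "S": 32.06,
                  "I": 126.904, "Si": 28.085, "Co": 58.933, "Cl": 35.45}


def readings(formula, pos=0):
    """Yield (symbols, weight) for every reading, one-letter branch first."""
    if pos == len(formula):
        yield [], 0.0
        return
    for width in (1, 2):
        piece = formula[pos:pos + width]
        if len(piece) < width or not piece.isalpha():
            break  # ran off the end, or a digit blocks a two-letter record
        symbol = piece[0] + piece[1:].lower()
        if symbol not in ATOMIC_WEIGHTS:
            continue  # does not map to the table: try the other branch
        # Read an optional count immediately after the symbol.
        m = re.match(r"\d+", formula[pos + width:])
        count = int(m.group()) if m else 1
        nxt = pos + width + (m.end() if m else 0)
        for symbols, total in readings(formula, nxt):
            yield [(symbol, count)] + symbols, count * ATOMIC_WEIGHTS[symbol] + total


def identify(formula, molecular_weight, tolerance=0.05):
    """Return the first reading whose weight matches, else None."""
    for symbols, total in readings(formula):
        if abs(total - molecular_weight) <= tolerance:
            return symbols  # checksum matched: stop here
        # No match: the generator resumes and back tracks to the next path.
    return None


print(identify("SIO2", 60.08))  # [('Si', 1), ('O', 2)]
print(identify("CO", 28.01))    # [('C', 1), ('O', 1)]
print(identify("CO", 58.93))    # [('Co', 1)]
```

The same upper-case string CO resolves to carbon monoxide or to cobalt depending only on the molecular weight stored with the record.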