Computation 11 00154 v3
Article
Revealing the Genetic Code Symmetries through Computations
Involving Fibonacci-like Sequences and Their Properties
Tidjani Négadi
Physics Department, Faculty of Exact and Applied Science, University Oran1 Ahmed Ben Bella,
Oran 31100, Algeria; negadi.tidjani@univ-oran1.dz
Abstract: In this work, we present a new way of studying the mathematical structure of the genetic
code. This study relies on the use of mathematical computations involving five Fibonacci-like
sequences; a few of their “seeds” or “initial conditions” are chosen according to the chemical and
physical data of the three amino acids serine, arginine and leucine, playing a prominent role in a recent
symmetry classification scheme of the genetic code. It appears that these mathematical sequences, of
the same kind as the famous Fibonacci series, apart from their usual recurrence relations, are highly
intertwined by many useful linear relationships. Using these sequences and also various sums or
linear combinations of them, we derive several physical and chemical quantities of interest, such as
the number of total coding codons, 61, obeying various degeneracy patterns, the detailed number
of H/CNOS atoms and the integer molecular mass (or nucleon number), in the side chains of the
coded amino acids and also in various degeneracy patterns, in agreement with those described in the
literature. We also discover, as a by-product, an accurate description of the very chemical structure of
the four ribonucleotides uridine monophosphate (UMP), cytidine monophosphate (CMP), adenosine
monophosphate (AMP) and guanosine monophosphate (GMP), the building blocks of RNA whose
groupings, in three units, constitute the triplet codons. In summary, we find a full mathematical and
chemical connection with the “ideal sextet’s classification scheme”, which we alluded to above, as
well as with others—notably, the Findley–Findley–McGlynn and Rumer’s symmetrical classifications.
Keywords: genetic code symmetries; Fibonacci-like sequences; amino acids; ribonucleotides; patterns;
hydrogen atoms; atoms; molecular mass
Citation: Négadi, T. Revealing the Genetic Code Symmetries through Computations Involving
Fibonacci-like Sequences and Their Properties. Computation 2023, 11, 154.
https://doi.org/10.3390/computation11080154
Academic Editor: Sergei Abramovich
Received: 8 June 2023
Revised: 29 July 2023
Accepted: 1 August 2023
Published: 7 August 2023
1. Introduction
A novel approach to studying the genetic code’s mathematical and chemical structure
is presented in this paper. More precisely, using a small set of Fibonacci-like sequences
and, occasionally, some (useful) well-known elementary functions from number theory,
the whole and detailed chemical content of the set of amino acids, as structured by several
well-known symmetry patterns, including their degeneracy, is revealed. Also, several other
original applications, using the above sequences, are carried out.
This paper, in addition to presenting new research results, also has an educational
dimension, that of introducing the interested reader to an aspect of the mathematical study
of the genetic code. It could therefore also be read (the computations easily worked out) by
non-experts with mathematical backgrounds.
G for RNA. As for the “alphabet” of the second language, it comprises a set of 20 amino
acids. In the process of translation between these two languages, in the ribosome for short,
there are $64 = 4^3$ “words”, the codons. Each group of three bases in mRNA constitutes
a codon, and each (sense) codon specifies a particular amino acid. Multiple codons can
encode the same amino acid; they are known as “synonymous” codons. This phenomenon
is also called degeneracy. In the standard genetic code, 61 sense codons are translated into
20 amino acids, which are organized into five “multiplets”, and three other (nonsense)
codons serve as termination or stop signals. These “multiplets” are the following:
• Three sextets, each coded by six codons: serine (Ser), arginine (Arg) and leucine (Leu);
• Five quartets, each coded by four codons: proline (Pro), alanine (Ala), threonine (Thr),
valine (Val) and glycine (Gly);
• One triplet, coded by three codons: isoleucine (Ile);
• Nine doublets, each coded by two codons: phenylalanine (Phe), tyrosine (Tyr), cysteine
(Cys), histidine (His), glutamine (Gln), glutamic acid (Glu), aspartic acid (Asp),
asparagine (Asn) and lysine (Lys);
• Two singlets, each coded by one codon: methionine (Met) and tryptophan (Trp).
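The counts in the list above can be checked with a few lines of Python. This is only a bookkeeping sketch; the dictionary name `multiplets` and the derived variable names are illustrative choices, not notation from the paper.

```python
# Multiplet structure of the standard genetic code, as listed above:
# degeneracy (codons per amino acid) -> amino acids in that multiplet.
multiplets = {
    6: ["Ser", "Arg", "Leu"],
    4: ["Pro", "Ala", "Thr", "Val", "Gly"],
    3: ["Ile"],
    2: ["Phe", "Tyr", "Cys", "His", "Gln", "Glu", "Asp", "Asn", "Lys"],
    1: ["Met", "Trp"],
}

n_amino_acids = sum(len(aas) for aas in multiplets.values())
n_sense_codons = sum(deg * len(aas) for deg, aas in multiplets.items())
n_stop_codons = 4**3 - n_sense_codons
# "Degenerate" codons: every sense codon beyond the first one of each amino acid.
n_degenerate_codons = n_sense_codons - n_amino_acids

print(n_amino_acids, n_sense_codons, n_stop_codons, n_degenerate_codons)
```

The last quantity is the 41 that appears later in the “20 + 41” pattern.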
Table 1 shows the relationship between the amino acids, represented in their three-
letter code (see above), and the codons that encode them. For example, the codon UUU
codes for the amino acid phenylalanine (UUU-Phe). The three stop codons are indicated in
black.
Table 1. The genetic code table.
In this work, the “anomalous” three amino acids serine, arginine and leucine, each
coded by six codons, will play a prominent role. Contrary to the 17 other amino acids, the
codons of which share the same first base, the three mentioned amino acids have, each,
their six codons distributed over two separate family boxes. There are 16 such family boxes
in the genetic code table, and each one of them is a set of four codons sharing the same
first and second base (see Table 1). The structure of the three sextets is the following:
serine {UCN, AGY}, arginine {CGN, AGR} and leucine {CUN, UUR} (N for any base, Y for a
pyrimidine, U or C, and R for a purine, A or G).
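As an illustration of the N/Y/R shorthand, the following sketch expands the three sextet patterns into explicit codons and confirms that each sextet occupies two family boxes. The helper name `expand` and the dictionary layout are assumptions made for this example.

```python
from itertools import product

# Ambiguity symbols used in the text: N = any base, Y = pyrimidine, R = purine.
IUPAC = {"U": "U", "C": "C", "A": "A", "G": "G",
         "N": "UCAG", "Y": "UC", "R": "AG"}

def expand(pattern):
    """Expand a pattern such as 'UCN' into the list of codons it denotes."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in pattern))]

sextets = {"Ser": ["UCN", "AGY"], "Arg": ["CGN", "AGR"], "Leu": ["CUN", "UUR"]}

codons = {aa: [c for p in pats for c in expand(p)] for aa, pats in sextets.items()}
# A family box is identified by the first two bases of its codons.
boxes = {aa: sorted({c[:2] for c in cs}) for aa, cs in codons.items()}

for aa in sextets:
    print(aa, codons[aa], boxes[aa])
```

Each amino acid comes out with six codons split 4 + 2 over two boxes, which is the quartet part and doublet part discussed in the text.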
There are more and more voices rising to underline or put emphasis on the singular
nature of the three sextets and also bring experimental data which tend to show it [2,3].
A few years ago, a published work [4] claimed that the number of “codon families”
has to be increased to 23 by considering the quartet part and the doublet part of each
one of the three sextets as distinct. A “codon family”, a term used by the authors of the
above reference, not to be confused with the “family box” alluded to above, is a group of
synonymous codons. In the case of the standard genetic code, each member in the five
multiplets mentioned above, taken individually, constitutes such a “codon family” because
its codons are synonymous and encode the same amino acid. For example, the triplet of
codons AUU, AUC and AUA, in Table 1, encode isoleucine. Also, in the special case of the
five quartets and the three quartet parts of the three sextets, the “codon family” and “family
box” represent the same thing. This identification is no longer valid in the other cases
where each one of the eight remaining “family boxes” contains groups of non-synonymous
codons. For example, in the “family box” AAN, the two synonymous codons AAU and
AAC encode asparagine, and the two synonymous codons AAA and AAG encode lysine.
In their work, the above authors present a new “effective number of codon families”,
called $N_c$, to characterize codon usage bias in the analysis of protein-coding genes, which
improves on existing ones. (An “effective number of codons” is a widely used index in
bioinformatics; see the above-mentioned reference [4].) Specifically, they show that $N_c$ is a
better predictor when its value is increased from 20 to 23; in particular, each sixfold codon
set (each sextet, as it is called in this work) is considered to be composed of separate fourfold
and twofold parts. These six entities are Ser$^{II,IV}$, Arg$^{II,IV}$ and Leu$^{II,IV}$ which, added to the
17 remaining amino acids with no “degeneracy” at the first base position, as mentioned
above, give a total of 23. This number (of codons), together with the remaining degenerate
codons, 38, constitutes what we call the pattern “23 + 38” (see Section 4.1 and elsewhere in
the paper). Among the approaches mentioned above (i.e., Refs. [2–4]), there is one that
is particularly relevant to the present work: the “Ideal” symmetry classification scheme,
introduced a few years ago. It will be summarized in Section 2.3, and we present its
numerous connections with the present work in Section 4.2.2.
well as various other remarkable patterns. We have also included, at the end of Section 4
(in Section 4.2.5), a discussion concerning the choice, and its justification, of the initial
conditions of our Fibonacci-like sequences, defined in Section 3. In Section 5, still using some
elements of our sequences, we make contact with the work by shCherbak [12] concerning the
singular structure of proline and derive a mathematical form of the shCherbak–Makukov
“activation” key [13], which, as is well known, led to many remarkable and beautiful
nucleon number patterns comprising, in particular, those related to Rumer’s symmetry. In
Section 6, using the “seeds” of our Fibonacci-like sequences, that is, their initial conditions,
and only these, we find that they are capable, on their own, of providing the very hydrogen
atom content of the amino acids, derived in the various patterns considered in Section 4.
Finally, in Section 7, we present some (new) results concerning the vertebrate mitochondrial
genetic code, a case that arose while finishing this paper. We strongly recommend that the
reader, at this point, before going on to the next sections, and in order to read them
comfortably, take a careful look at Appendix A, which gives the chemical data of all 20 amino
acids, in Table A1, and also includes some hints for the evaluation of several quantities
when the degeneracy is involved. (Several of these quantities, evaluated from the table,
are to be compared with their equivalents, derived mathematically in this paper, from our
Fibonacci-like sequences and their properties.) In Appendix B, a few other mathematical
tools used in this paper are defined with the presentation of some computation examples.
We have also included a third Appendix C, where we explain how the use of mathematical
software, containing a built-in “Fibonacci” function, could help the reader to carry out the
various computations presented in this paper. We also give several examples.
subsets, where each subset contains only codons having the same third base. Each of
these subsets may be mapped by f into members of the amino acids set A, with the image
being denoted $f(C_k)$; this is shown in Table 3 below. One has, therefore, $f(C_U) = f(C_C)$
and $f(C_A) \neq f(C_G)$. With this f-mapping, the authors also establish relations that define a
one-to-one correspondence between one member of a doubly degenerate codon pair and
the other member (see the reference above for details). These relations could be stated, in
words, as follows: (i) if a codon for an amino acid has the third base U, then there is a codon
for the same amino acid having the third base C and vice versa, OR (ii) if a codon for an
amino acid has the third base A, then there is a codon for the same amino acid having the
third base G and vice versa. For a doubly degenerate codon pair, (i) and (ii) are mutually
exclusive. For order four, or quartets, (i) and (ii) hold simultaneously. For order six, the
sextets, the quartet part obeys (i) AND (ii), and for the doublet part, one has (i) OR (ii). For
the odd-order degenerate codons (Ile, Met and Trp), however, there is a slight deviation
from symmetry. In Table 3, we show this classification.
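The two third-base rules can be tested directly against the standard genetic code. The sketch below does not reproduce Table 1 or Table 3 of the paper; it assumes the usual compact encoding of the standard code (one-letter amino acid symbols, `*` for stop, bases ordered U, C, A, G with the first base varying slowest), and the names `CODE`, `image` and `boxes` are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in the conventional compact form (assumed here):
# codons enumerated UUU, UUC, UUA, UUG, UCU, ... ; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def image(third_base):
    """f(C_k): amino acids encoded by the sense codons whose third base is k."""
    return {aa for codon, aa in CODE.items()
            if codon[2] == third_base and aa != "*"}

boxes = ["".join(p) for p in product(BASES, repeat=2)]

# Rule (i): a codon ending in U and its partner ending in C are synonymous.
rule_i_everywhere = all(CODE[b + "U"] == CODE[b + "C"] for b in boxes)
# Rule (ii): family boxes where the A-ending and G-ending codons differ.
rule_ii_exceptions = [b for b in boxes if CODE[b + "A"] != CODE[b + "G"]]

# Degeneracy of each amino acid and the resulting multiplet structure.
degeneracy = Counter(aa for aa in AMINO if aa != "*")
multiplet_sizes = Counter(degeneracy.values())

print(rule_i_everywhere, rule_ii_exceptions, dict(multiplet_sizes))
```

Under this encoding, rule (i) holds in all 16 family boxes, while rule (ii) fails only in the boxes UG (stop/Trp) and AU (Ile/Met), which is where the odd-order multiplets sit.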
In the above table, which is also a duplicate of Table 1, the “leading” group is indicated
in a light green background. As explained, at length, by the authors in [10], the genetic
code table in this new scheme is created by codon sextets based on exact purine/pyrimidine
symmetries (YR: (U, C, A, G) → (C, U, G, A)), A+U-rich/C+G-rich symmetries, strong/weak,
or complementary, symmetries (SW: (U, C, A, G) → (A, G, U, C)) and keto/amino symmetries
(KM: (U, C, A, G) → (G, A, C, U)). By starting with serine, the initial generator with its six
codons, the whole “leading” group (32 codons) is created using transformations among
those mentioned above and some mapping rules. Analogously, starting from the two
codons of leucine (Leu$^{II}$) as “seeds”, the whole “nonleading” group is constructed. There
is also a simple relation between the “leading” group and the “nonleading” group. We
show, in Table 4, for visualization, these two groups by using our own format of the genetic
code table. We also find it noteworthy to mention that, under Rumer’s transformation
U ↔ G, A ↔ C, the “leading” group remains globally invariant whether the transformation
is applied to the first base only, to the first two bases only or to all three bases, and the same
is true for the “nonleading” group.
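The three base transformations quoted above are simple permutations of the four bases and can be written down as lookup tables. The sketch below only encodes the maps exactly as given in the text and checks some elementary properties; it does not reconstruct the “leading” and “nonleading” groups, which would require Table 4. The helper names `compose` and `transform` are hypothetical.

```python
# The maps (U, C, A, G) -> (...) as quoted in the text.
YR = dict(zip("UCAG", "CUGA"))
SW = dict(zip("UCAG", "AGUC"))
KM = dict(zip("UCAG", "GACU"))
# Rumer's transformation U <-> G, A <-> C.
RUMER = {"U": "G", "G": "U", "A": "C", "C": "A"}

def compose(f, g):
    """Return the map x -> f(g(x))."""
    return {x: f[g[x]] for x in g}

def transform(codon, mapping, positions=(0, 1, 2)):
    """Apply a base map to the selected positions (0-based) of a codon."""
    return "".join(mapping[b] if i in positions else b
                   for i, b in enumerate(codon))

identity = {x: x for x in "UCAG"}
is_involution = {name: compose(m, m) == identity
                 for name, m in {"YR": YR, "SW": SW, "KM": KM}.items()}

print(is_involution)
print(compose(YR, SW) == KM, RUMER == KM)
print(transform("UCU", RUMER), transform("UCU", RUMER, positions=(0,)))
```

Each map is its own inverse, composing any two gives the third, and Rumer's transformation coincides with the KM map as written here. The `positions` argument mirrors the three cases mentioned in the text (first base only, first two bases, all three bases).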
Below, in Section 4.2, we will show that the three amino acids serine, arginine and
leucine will also play a prominent role as mathematical (and chemically inspired) “seeds”
in computing the chemical content of the twenty amino acids, including degeneracy.
where $F_n$ is an ordinary Fibonacci number. These four sequences differ only by the data
of the numbers $p$ and $q$, which play the role of initial conditions or “seeds”, as we will
call them throughout this paper. Below, we shall explain and justify the choice of these
“seeds”, but for the moment, we introduce the four sequences by giving a name to each
one of them while assigning their “seeds”: (i) $a_n$: $p = 1$, $q = 6$; (ii) $a'_n$: $p = 6$, $q = 1$;
(iii) $b_n$: $p = 9$, $q = 13$; (iv) $c_n$: $p = 5$, $q = 30$. In Table 5 below, we give the first few terms.
Table 5. The first few terms of the Fibonacci-like sequences $a_n$, $a'_n$, $b_n$ and $c_n$.
n                       1   2   3   4   5    6    7    8    9   10    11    12    13
a_n   (p = 1, q = 6)    6   1   7   8  15   23   38   61   99  160   259   419   678
a'_n  (p = 6, q = 1)    1   6   7  13  20   33   53   86  139  225   364   589   953
b_n   (p = 9, q = 13)  13   9  22  31  53   84  137  221  358  579   937  1516  2453
c_n   (p = 5, q = 30)  30   5  35  40  75  115  190  305  495  800  1295  2095  3390
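The rows of Table 5 can be regenerated from their first two entries with the ordinary Fibonacci recurrence. The sketch below starts from the first two terms as they appear in the table (so the first term equals q and the second equals p); the function name `fib_like` is a hypothetical helper, and the paper's own closed form in terms of $F_n$ is not used here.

```python
def fib_like(first, second, n_terms):
    """First n_terms of the sequence s_1 = first, s_2 = second, s_n = s_{n-1} + s_{n-2}."""
    terms = [first, second]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

# First two terms read off Table 5.
a = fib_like(6, 1, 13)
a_prime = fib_like(1, 6, 13)
b = fib_like(13, 9, 13)
c = fib_like(30, 5, 13)

for name, seq in [("a_n", a), ("a'_n", a_prime), ("b_n", b), ("c_n", c)]:
    print(f"{name:5s}", seq)
```

Note that Python lists are 0-based, so the table entry with index n is `seq[n - 1]`; for instance `a[7]` is $a_8 = 61$, the number of sense codons.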
These sequences obey several linear relations (or identities), some of which will prove
very useful in view of their applications in this work. They are presented below, in Equation
(2), and could be checked (see Appendix C, where concrete examples are also presented)
$a_n - a'_{n-1}$, (3)
in an unusual but interesting form: its “seeds” here are inverted with respect to the usual
Fibonacci sequence. Also, the sum of its first members up to any given index gives
an exact Fibonacci number, contrary to the usual Fibonacci sequence with the seeds 0 and
1, for which this sum is always one unit less than a Fibonacci number. For example, in our case,
for $n = 9$, we obtain $\sum_{n=1}^{9} F'_n = 34$. (Note that the indexing is shifted here, but the recurrence
relation is still valid.) There is also another relation linking the sequences $a'_n$ and $b_n$. It
reads
$a'_n - b_{n-2} = 2F'_{n-5}$. (5)
For $n = 7$, the sequences $a'$ and $b$ take the same value: $a'_7 - b_5 = 0$. Also, for $n = 8$,
$a'_8 = 86$ and $b_6 = 84$, and their difference is 2. These relations will have applications in the
following sections. Importantly, the sequences in Table 5 together with the one defined in
Equations (26) and (27) below either display several numbers highly relevant in this work,
directly as members in Table 5 (shown in a dark red color), or lead to significant sums to
be evaluated in the following sections. We have also discovered that the above sequences,
including the one defined in Equation (26), can all be shown to exhibit a bilateral symmetry
and other symmetry properties, in the line of thought of those established for the ordinary
Fibonacci sequence by Edge, see [14]. These findings will be reported elsewhere.
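The statements about the Fibonacci sequence with inverted seeds and about Equation (5) can be verified numerically. In the sketch below, the sequence $F'_n$ is assumed to start with $F'_1 = 1$, $F'_2 = 0$ (the reading under which the quoted sum equals 34); the helper names `fib_like` and `term` are illustrative.

```python
def fib_like(first, second, n_terms):
    """First n_terms of a sequence obeying s_n = s_{n-1} + s_{n-2}."""
    terms = [first, second]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

def term(seq, n):
    """1-based access, matching the indexing of Table 5."""
    return seq[n - 1]

a = fib_like(6, 1, 13)
a_prime = fib_like(1, 6, 13)
b = fib_like(13, 9, 13)
f_prime = fib_like(1, 0, 13)   # assumed seeds F'_1 = 1, F'_2 = 0

# Partial sums of F' for k = 1..9; each is itself a Fibonacci number.
partial_sums = [sum(f_prime[:k]) for k in range(1, 10)]

# Relation (5): a'_n - b_{n-2} = 2 F'_{n-5}, checked where all indices are >= 1.
eq5_holds = all(term(a_prime, n) - term(b, n - 2) == 2 * term(f_prime, n - 5)
                for n in range(6, 14))

# The differences a_n - a'_{n-1} for n = 2..13.
diffs = [term(a, n) - term(a_prime, n - 1) for n in range(2, 14)]

print(partial_sums)
print(eq5_holds)
print(diffs == f_prime[1:])
```

With these seeds, the differences $a_n - a'_{n-1}$ coincide with $F'_n$, which appears to be what the expression labelled (3) expresses.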
It is the analog of the one for the ordinary Fibonacci sequence and could be checked
either with a pocket calculator directly from Table 5, for low values of k, or using the same
computations as those performed for the examples in Appendix C. For k = 5, we have
6 + 1 + 7 + 8 + 15 = 37 = 38 − 1. By grouping the first three terms on the one hand and
the remaining two on the other, we have
The unit is transferred to the left. Using the sum mentioned above ($a_4 + a_5 = 8 + 15 = a_6 = 23$)
and adding it to the preceding relation gives (by appropriately arranging the terms)
It appears that there are 15 amino acids and 14 degenerate codons in Rumer’s set $M_2$,
while there are 8 amino acids and 24 degenerate codons in Rumer’s set $M_1$ (see above).
Let us now go into the details by examining, first, the set $M_2$. The number 15 could
be partitioned in two ways. The first consists in using the above sum for $k = 3$ to
obtain 6 + (1 + 7 + 1) = 6 + 9 = 15. Using the second way, we can apply the useful
$A_0$ function and its properties (see below and Appendix B) to the number 15 (= 3 × 5):
$A_0(15) = A_0(3) + A_0(5) = 6 + 9 = 15$, which gives the same result as above, where
we have used the additivity property. Finally, the number 6, a perfect number, could be
written as the sum of its proper divisors {1, 2, 3} so that 15 = 1 + 2 + 3 + 9. We interpret
this relation as one triplet, two singlets, three doublet parts of the three sextets and nine
doublets. On the other hand, for the degeneracy part, 14, which can be written as 6 + 1 + 7 (see
above), we can, again, write 6 as the sum of its divisors, arrange the terms and obtain
14 = 3 + (1 + 1) + (2 + 7) = 3 + 2 + 9. Here, we have three degenerate codons for the three
doublet parts of the three sextets, two degenerate codons for the triplet and nine degenerate
codons for the nine doublets. For the set $M_1$, things are simpler. The degeneracy part from
Equation (8) above can be written as 24 = (8 + 1) + 15 = 9 + 15. As for the number of amino acids,
eight, as a Fibonacci number, it could simply be written as 5 + 3. This is the structure of the
set $M_1$. Table 6, below, summarizes all of these results for the two Rumer’s sets, which are
thus completely described using the Fibonacci-like sequence $a_n$.
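The $A_0$ function is defined in Appendix B, which is not reproduced in this section. The sketch below is therefore only a guess at its definition: it assumes that $A_0(n)$ adds, for every prime factor of $n$ counted with multiplicity, the prime itself, its rank among the primes (2 has rank 1, 3 has rank 2, and so on) and one unit. This assumed form is completely additive and reproduces the values quoted in this paper for 3, 5, 15, 114 and 244, but the appendix remains the authoritative definition. All function names are hypothetical.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def prime_rank(p):
    """Rank of the prime p in the list of primes (2 -> 1, 3 -> 2, 5 -> 3, ...)."""
    return sum(1 for q in range(2, p + 1) if len(prime_factors(q)) == 1)

def A0(n):
    """ASSUMED form of the paper's A_0: sum over prime factors of (p + rank(p) + 1)."""
    return sum(p + prime_rank(p) + 1 for p in prime_factors(n))

proper_divisors_of_6 = [d for d in range(1, 6) if 6 % d == 0]

print(A0(3), A0(5), A0(15))             # additivity: A0(15) = A0(3) + A0(5)
print(proper_divisors_of_6, sum(proper_divisors_of_6))
print(1 + 2 + 3 + 9, 3 + 2 + 9)         # the partitions of 15 and 14 used above
```

The two printed partitions correspond to the amino acid count (15) and the degenerate codon count (14) of the set $M_2$.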
Table 6. The derived multiplet structure of the amino acids in Rumer’s division.
This last equation will be considered in detail below, as it has great importance
concerning the computation of the degeneracy of the genetic code in various formats. By
isolating the last term $a'_9$, we have
This relation is important and will play a prominent role in this section and later
(in Section 6). Equation (10) gives the number of hydrogen atoms in the amino acids’ side
chains, distributed into two parts: 139 hydrogen atoms in 23 amino acids (17 amino acids
with no “degeneracy” at the first base position and the six entities Ser$^{IV–II}$, Arg$^{IV–II}$ and
Leu$^{IV–II}$), on the one hand, and 219 hydrogen atoms in the remaining side chains of the
amino acids encoded by the 38 degenerate codons, on the other (see Appendix A for the
calculations from the table). This is the equivalent “23 + 38” pattern for the hydrogen
content. Next, as we have 139 = 53 + 86 = 22 + 31 + 86 from the recurrence relation of the
sequence $b_n$, we can cast the relation above as follows:
This is the hydrogen atom content in the usual pattern “20 + 41” (117 hydrogen atoms
in the side chains of 20 amino acids and 241 hydrogen atoms in the side chains of the amino
acids coded by the 41 degenerate codons; see Table A1 in Appendix A). Note that 22 is the
number of hydrogen atoms in the side chains of serine, arginine and leucine, corresponding
to one codon for each one of them (see Table A1 in Appendix A). It is also just the right
factor that connects the two patterns “23 + 38” and “20 + 41”.
By restricting the sum in Equation (10), as shown below, we have
First, this hydrogen atom pattern corresponds to 132 hydrogen atoms in all the side
chains of the 3 sextets coded by 18 codons, on the one hand, and 226 hydrogen atoms in
all the side chains of the remaining 17 amino acids coded by 43 codons, on the other (see
below). Here, we see that the three sextets are set apart, and this has, we think, a link with
the subject of Section 4.2.2 below. Second, this pattern also describes the distribution of
hydrogen atoms in the side chains of the amino acids in the two classes of the aminoacyl-tRNA
synthetases: 226 hydrogen atoms in the side chains of all the amino acids coded
by 29 codons in Class-I and 132 hydrogen atoms in the side chains of all the amino acids
coded by 32 codons in Class-II; see [7]. Note the codon pattern “29 + 32”, the same as in
Equation (8) above.
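The hydrogen and codon partitions quoted in this subsection all have to add up to the same totals, 358 hydrogen atoms and 61 sense codons. The sketch below collects the numbers exactly as stated in the text and checks them against the sequence values $b_9$, $a'_9$, $b_3$ and $a_8$; the dictionary name `patterns` is an illustrative choice, and the value 132 is taken from the text as given rather than derived.

```python
def fib_like(first, second, n_terms):
    """First n_terms of a sequence obeying s_n = s_{n-1} + s_{n-2}."""
    terms = [first, second]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

a = fib_like(6, 1, 13)
a_prime = fib_like(1, 6, 13)
b = fib_like(13, 9, 13)

total_codons = a[7]       # a_8 = 61
total_hydrogen = b[8]     # b_9 = 358

# (codons, hydrogen atoms) for each part of each pattern, as quoted in the text.
patterns = {
    "23 + 38": [(23, a_prime[8]), (38, total_hydrogen - a_prime[8])],
    "20 + 41": [(20, a_prime[8] - b[2]), (41, total_hydrogen - a_prime[8] + b[2])],
    "18 + 43": [(18, 132), (43, 226)],   # sextets vs. the other 17 amino acids
    "29 + 32": [(29, 226), (32, 132)],   # Class-I vs. Class-II synthetases
}

for name, parts in patterns.items():
    codons = sum(p[0] for p in parts)
    hydrogens = sum(p[1] for p in parts)
    print(name, parts, codons, hydrogens)
```

The shift between the “23 + 38” and “20 + 41” patterns is exactly $b_3 = 22$, the hydrogen count of one serine, one arginine and one leucine side chain.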
4.2.2. The Hydrogen Atom Content in the “Ideal” Symmetry Classification Scheme
In this section, we consider the hydrogen atom content for the “ideal” symmetry
classification scheme, [10], which occupies an important place in this work, as it has a
tight relation with the choice of the “seeds” of our Fibonacci-like series. As promised at
the beginning of Section 3, this is the right place to explain and justify the choice of the
initial conditions of the sequences $b_n$ and $c_n$, as defined in Section 3, having importance
in this section (more will be said about the “seeds” of the other sequences in Section 4.2.5,
which is devoted to their choice). Concerning $b_n$, the “seeds” are 13 and 9 (see Table 5).
These are chosen, respectively, to be the number of hydrogen atoms in arginine’s and serine’s
side chains (10 + 3) and in leucine’s side chain (9). Their sum, which is the recurrence
relation, $b_1 + b_2 = 13 + 9 = b_3 = 22$, is the number of hydrogen atoms in the side chains
of these three amino acids (see Equation (12)). The “seeds” of $c_n$, 30 and 5, are chosen to
be, respectively, the number of atoms in the side chains of arginine and leucine (17 + 13)
and in the side chain of serine (5). Here, as for hydrogen, we have the recurrence relation
$c_1 + c_2 = 30 + 5 = c_3 = 35$, which is the number of atoms in the side chains of these three
amino acids (see Table A1 in Appendix A).
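The construction of the seeds from the side-chain data can be spelled out in a few lines. The counts below are the ones quoted in this paragraph (hydrogen atoms and total atoms in the side chains of serine, arginine and leucine); the dictionary names are illustrative.

```python
# Side-chain data quoted in the text for the three sextets.
hydrogen_atoms = {"Ser": 3, "Arg": 10, "Leu": 9}
all_atoms = {"Ser": 5, "Arg": 17, "Leu": 13}

# Seeds of b_n: (Arg + Ser) hydrogens, then Leu hydrogens.
b1, b2 = hydrogen_atoms["Arg"] + hydrogen_atoms["Ser"], hydrogen_atoms["Leu"]
# Seeds of c_n: (Arg + Leu) atoms, then Ser atoms.
c1, c2 = all_atoms["Arg"] + all_atoms["Leu"], all_atoms["Ser"]

# The third term of each sequence is the total over the three amino acids.
b3, c3 = b1 + b2, c1 + c2

print((b1, b2, b3), (c1, c2, c3))
```

The third terms, 22 and 35, are thus the hydrogen atom and atom totals for one copy of each of the three side chains.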
We show, in this section and also in the next ones, using all the resources offered by
our Fibonacci-like series and their properties, that these three sextets (more precisely, their
hydrogen atom and atom numbers), as “seeds”, will create the entire hydrogen atom, atom and
even nucleon content of the whole set of amino acids, including the degeneracy, much like
the creation of the 64 codons from the three sextets in the “ideal” symmetry scheme, [10],
mentioned above.
Now, we return to the subject of this section. First, using the relation (v) $c_n + 2b_{n-1} = b_{n+2}$
in Equation (2), we can derive the hydrogen atom content in the two sets: the “leading” group
and the “nonleading” group. We have, for $n = 7$ (see Table 5 and also Appendix C)
It can be seen, from Table 4 and also, in parallel, from an evaluation using the data in
Table A1 in Appendix A, that there are 190 and 168 hydrogen atoms in the side chains of the
amino acids in the “leading” group and in the “nonleading” group, respectively. Moreover,
concerning the latter, there are 84 hydrogen atoms in the side chains of the amino acids, the
codons of which have the same first two bases, UU, CC, AA and GG (in the four corners
of Table 4), and 84 hydrogen atoms in the side chains of the amino acids located in the
four boxes in the center of the table, the codons of which have different first two bases, UG,
GU, AC and CA. Equation (14) above faithfully describes, therefore, this pattern. Now, we
move further to accurately describe the hydrogen atom content involving the amino acids
of the “core” comprising serine, arginine and leucine. To see this, we invoke the following
two relations:
$5a_n + 2b_{n-1} = b_{n+2}$, (15)
This is the correct content of the part of the “core” in the “leading” group: 60 hydrogen
atoms (6 × 10) in arginine’s side chain (Arg$^{IV/II}$), 36 hydrogen atoms (4 × 9) in leucine’s
side chain (Leu$^{IV}$) and 18 hydrogen atoms (6 × 3) in serine’s side chain (Ser$^{IV/II}$). Let us,
alternatively, add the above-mentioned two functions to the number 114. We have
This is the number of hydrogen atoms in the side chains of the amino acids of the
“nonleading” group, where the isolated number 18 is now re-interpreted as the number of
hydrogen atoms in the side chain of leucine (2 × 9), the “seed” of the “nonleading” group,
that is, Leu$^{II}$ (see above). We have thus established the exact hydrogen atom content in the
“ideal” symmetry scheme of the genetic code where the sextets play a prominent role. Note,
finally, that, as $\lambda(114) = 18$ has been used two times, once as the number of hydrogen
atoms in Ser$^{IV/II}$ and once as the number of hydrogen atoms in Leu$^{II}$, we can summarize
all of what has been said above by adding $\lambda(114) = 18$ to Equation (17) and write the
exact hydrogen atom content of the entire “core”, 60 + (36 + 18) + 18 = 132, constituted
by Arg$^{IV/II}$, (Leu$^{IV}$ + Leu$^{II}$) and Ser$^{IV/II}$, respectively. (The 18 codons of the “core” are
underlined in Table 4.) Of course, after subtracting the number 132 from the total sum 358
in Equation (14) above, we are left with 226, the number of hydrogen atoms in the side
chains of the 17 amino acids outside the “core”. We have thus seen that the “seeds” of the
sequences $b_n$ and $c_n$ are capable of creating the hydrogen atom structure in good agreement
with the “ideal” symmetry classification scheme (see also Section 4.2.4 below).
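The two linear relations used in this section, relation (v) and Equation (15), together with the “core” bookkeeping, can be checked numerically. The sketch below regenerates the sequences from the first two terms of Table 5; `fib_like` and `term` are hypothetical helper names, and the per-codon hydrogen counts are the ones quoted in the text.

```python
def fib_like(first, second, n_terms):
    """First n_terms of a sequence obeying s_n = s_{n-1} + s_{n-2}."""
    terms = [first, second]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

def term(seq, n):
    """1-based access, matching the indexing of Table 5."""
    return seq[n - 1]

a = fib_like(6, 1, 13)
b = fib_like(13, 9, 13)
c = fib_like(30, 5, 13)

indices = range(2, 12)   # values of n for which n - 1 and n + 2 stay inside the table
relation_v = all(term(c, n) + 2 * term(b, n - 1) == term(b, n + 2) for n in indices)
relation_15 = all(5 * term(a, n) + 2 * term(b, n - 1) == term(b, n + 2) for n in indices)

# n = 7: "leading" (190) plus "nonleading" (2 x 84) hydrogen atoms.
leading, nonleading = term(c, 7), 2 * term(b, 6)

# "Core" hydrogen content: (codons) x (hydrogen atoms per side chain).
core_leading = 6 * 10 + 4 * 9 + 6 * 3       # Arg, Leu(IV), Ser
core_total = core_leading + 2 * 9           # add Leu(II)
outside_core = term(b, 9) - core_total

print(relation_v, relation_15)
print(leading, nonleading, leading + nonleading)
print(core_leading, core_total, outside_core)
```

Both relations hold together because $c_n = 5a_n$ term by term, which follows from the seeds (30, 5) being five times (6, 1).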
As a by-product of the results obtained in this section, we have found, unexpectedly, a
way to derive from the number of hydrogen atoms in the part of the “core” in the “leading”
group, 114, and in the rest, 244, comprising the part of the “core” in the “nonleading” group
(see above), and only from these, the very chemical structure of the building blocks of
RNA: the four ribonucleotides uridine monophosphate (UMP), cytidine monophosphate (CMP),
adenosine monophosphate (AMP) and guanosine monophosphate (GMP). Using the functions
$A_0$ and $\lambda$ (see Appendix B), we have $A_0(114) = 38$, $A_0(244) = 88 = 61 + 1 + 18 + 4 + 4$
and $\lambda(114) = 18$ (see Appendix B, where the details of the computations are given as
examples). First, we have, from these three quantities, $[A_0(114) + \lambda(114)] + A_0(244) =
56 + 88 = 144$. This is the total number of atoms in the four ribonucleotides: 56 in the
four nucleotides U (12 atoms), C (13 atoms), A (15 atoms) and G (16 atoms) and 88 in the
four identical “backbones”, each with 22 atoms (see [7] for the details of the calculation,
which also includes a mathematical derivation of the number 22 above, which is part
of the “condensation” equation for the assembly of a ribonucleotide from the three units:
a nucleotide, a ribose and a phosphate group with the release of two water molecules,
also derived). Now, as there are 30 codons in the “leading” group (two stop codons not
counted) and 31 codons in the “nonleading” group (one stop codon also not counted) (see
Table 4), we can use this decomposition for the number 61 above and finally write the
relations above in the form (30 + 4) + (31 + 4) + (2 × 18 + 1) + 38 = 34 + 35 + 37 + 38.
Note that the above decomposition of the number 61 could also be obtained in another
way, by directly using the properties of the sequence $a_n$; see Table 5. We have, in this case,
$a_8 = 61 = 23 + 38$, $a_7 = 38 = 23 + 15$ and $a_5 = 15 = 7 + 8$, so by combining them, we
obtain 61 = (23 + 7) + (23 + 8) = 30 + 31. The above-computed quantities 34, 35, 37 and
38 are, respectively, the number of atoms in the four ribonucleotides UMP (C9H13N2O9P),
CMP (C9H14N3O8P), AMP (C10H14N5O7P) and GMP (C10H14N5O8P), where we have
indicated their elemental composition.
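The atom counts of the four ribonucleotides follow directly from the elemental compositions just quoted, and the value $\lambda(114) = 18$ is consistent with $\lambda$ being the Carmichael function. The sketch below counts atoms from the formulas and computes $\lambda$ by brute force; treating $\lambda$ as the Carmichael function is an assumption (the definition is in Appendix B), the $A_0$ values are taken from the text as given, and the function names are illustrative.

```python
import re
from math import gcd

def atom_count(formula):
    """Total number of atoms in a simple formula such as 'C9H13N2O9P'."""
    return sum(int(n) if n else 1
               for _, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

def carmichael(n):
    """Smallest m >= 1 with a**m == 1 (mod n) for every a coprime to n (brute force)."""
    units = [x for x in range(1, n) if gcd(x, n) == 1]
    m = 1
    while any(pow(x, m, n) != 1 for x in units):
        m += 1
    return m

formulas = {"UMP": "C9H13N2O9P", "CMP": "C9H14N3O8P",
            "AMP": "C10H14N5O7P", "GMP": "C10H14N5O8P"}
atoms = {name: atom_count(f) for name, f in formulas.items()}

A0_114, A0_244 = 38, 88          # values quoted in the text
lam_114 = carmichael(114)        # assumed identification of lambda

decomposition = [30 + 4, 31 + 4, 2 * lam_114 + 1, A0_114]

print(atoms, sum(atoms.values()))
print(lam_114, (A0_114 + lam_114) + A0_244)
print(decomposition)
```

The decomposition list reproduces the four atom counts in the order UMP, CMP, AMP, GMP.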
where we have used the recurrence relation of the sequence $a'_n$ to write the number 139 as
86 + 53 (see Table 5). We have already mentioned in the examples following Equation (5)
that, for $n = 8$, one has 86 − 84 = 2 or 86 = 84 + 2. Inserting this quantity in the above
equation results in
186 + (84 + 88) = 358. (20)
This is the hydrogen atom content in Rumer’s division: 186 hydrogen atoms in the side
chains of the amino acids in $M_2$ and 172 hydrogen atoms in the side chains of the amino
acids in $M_1$, where, in this latter, we have the correct partition into 84 hydrogen atoms
(4 × 21) in the side chains of the amino acids constituting the 5 quartets and 88 hydrogen
atoms (4 × 22) in the side chains of the amino acids constituting the 3 sextets. To obtain the
details concerning the number of hydrogen atoms in $M_2$, 186, we first isolate the sum of the
first four numbers in the sum in Equation (19), that is, $1 + 6 + 7 + 13 = 27 = 3^3 = 3 \times 9$.
This is equal to the number of hydrogen atoms in the triplet isoleucine (see below). We
are left, in the sum, with the three terms 3 × 53. By writing the number 53 once as 15 + 38
from the relation (viii) in Equation (2), with $n = 5$, and twice as 22 + 31 from the recurrence
relation of the sequence $b_n$, we obtain
2 × 50 + 2 × 22 + 27 + 7 + 8 = 186. (21)
This relation, as it is, is the pattern shown in Table 3 for the gross third-base division
UC/AG; more exactly, we have from Table 3: 2 × 84 + (92 + 98) = 2 × 84 + 190 = 358.
Here, we note that this relation already describes, nicely, the equality of the number of
hydrogen atoms in the columns third base U and third base C, where the amino acids are
the same (see the penultimate row in Table 3). We can do better by invoking two more
relations. First, we have the relation (x) in Equation (2): $a_n + b_{n+2} = 4a_{n+2}$ which, for
$n = 4$, gives 8 + 84 = 92 (see Appendix C). Second, we have the relation $2b_n + b_{n+1} = c_{n+2}$,
which also holds and gives, for $n = 5$, 2 × 53 + 84 = 190. Inserting the number 84 = 92 − 8,
from the relation just above, in the second one results in 190 = 92 + 98. Collecting these
results in Equation (22) above gives, finally,
2 × 84 + 92 + 98 = 358. (23)
This last relation completely describes, therefore, the hydrogen atom content pattern
of Table 3. The third base classification mentioned above can also be supported by the
following calculation. We know, from Section 2.2, that the doubly degenerate codons
(group-II) obey a fundamental symmetry, so they must play a basic role, including, we will
show, in the hydrogen atom content. We have, using the sequence $a_n$,
By subtracting this sum from the right side of Equation (22) above, which gives the
total number of hydrogen atoms in the side chains of all the amino acids coded by the
61 sense codons, we obtain, by arranging,
These two numbers can be interpreted as follows: 100 hydrogen atoms in the side chains
of the amino acids constituting the 9 doublets and 258 hydrogen atoms in the side chains of the
amino acids constituting the remaining multiplets (5 quartets, 3 sextets, 2 singlets and 1 triplet);
see Equation (21) and below it. This same relation, Equation (25), could also be obtained, in
another way, from the relation mentioned in Section 4.1, $a_9 + a_{11} = 99 + 259 = b_9 = 358$,
noting that the sum in Equation (24) above is also equal to 259 − 1 (recall $\sum_{n=1}^{k} a_n = a_{k+2} - 1$,
with $k = 9$). We then get back to our result as follows: (99 + 1) + 258 = 100 + 258. Note also
that $2 \times \varphi(258) = 2 \times 84$ and $358 - 2 \times \varphi(258) = 190$, that is, 2 × 84 + 190 = 358, which is nothing but
the hydrogen atoms pattern of the present classification (see Equation (22) and Table 3). (The
function $\varphi$ is defined in Appendix B, and the factor two, which has been introduced above, is
for “doubly” degenerate codons.)
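The identities invoked in this subsection can be checked together. The sketch below verifies the sum rule for $a_n$, the relation $a_9 + a_{11} = b_9$, relation (x), the relation $2b_n + b_{n+1} = c_{n+2}$, and the value $\varphi(258) = 84$. Here $\varphi$ is assumed to be Euler's totient function, which is consistent with the quoted value (the paper's definition is in Appendix B); helper names are illustrative.

```python
from math import gcd

def fib_like(first, second, n_terms):
    """First n_terms of a sequence obeying s_n = s_{n-1} + s_{n-2}."""
    terms = [first, second]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

def term(seq, n):
    """1-based access, matching the indexing of Table 5."""
    return seq[n - 1]

def totient(n):
    """Euler's totient by direct counting (assumed meaning of phi)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

a = fib_like(6, 1, 13)
b = fib_like(13, 9, 13)
c = fib_like(30, 5, 13)

# Sum rule: a_1 + ... + a_k = a_{k+2} - 1.
sum_rule = all(sum(a[:k]) == term(a, k + 2) - 1 for k in range(1, 12))
# Relation (x): a_n + b_{n+2} = 4 a_{n+2}.
relation_x = all(term(a, n) + term(b, n + 2) == 4 * term(a, n + 2) for n in range(1, 12))
# Second relation: 2 b_n + b_{n+1} = c_{n+2}.
relation_bc = all(2 * term(b, n) + term(b, n + 1) == term(c, n + 2) for n in range(1, 12))

doublets = term(a, 9) + 1            # 99 + 1 = 100
others = sum(a[:9])                  # a_11 - 1 = 258
phi_258 = totient(258)

print(sum_rule, relation_x, relation_bc)
print(doublets, others, doublets + others, term(a, 9) + term(a, 11) == term(b, 9))
print(phi_258, 358 - 2 * phi_258)
```

The last line recovers the third-base pattern 2 × 84 + 190, with 190 = 92 + 98 as obtained from the two relations for $n = 4$ and $n = 5$.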
where the numbers 23 and −3 are the “seeds”. The first few terms are shown below:
$g_n$: 23, −3, 20, 17, 37, 54, 91, 145, 236, 381, . . . (27)
$b_n + g_n = 6a_n$, (28)
which can be shown to hold (see Appendix C). The case n = 9 is particularly relevant. We
have, from Table 5 and the series in Equation (27) above,
and we see that it gives the total number of atoms in the side chains of all the amino acids
coded by the 61 sense codons, distributed into 358 hydrogen atoms (see Section 4.2.1) and
236 atoms (C/N/O/S); see Table A1 in Appendix A (180 carbon atoms and 56 N/O/S
atoms). Now, we have the relation
which can also be shown to hold for any k, which is the analog of the sum of the first k
Fibonacci numbers. For k = 7, it gives 236 + 3 = 239 or 236 = 239 − 3. By inserting this
latter in the above equation, we obtain
Here, we have the number of atoms, also in the “23 + 38” pattern: 239 atoms in all the
side chains of the amino acids encoded by 23 codons (the sextets with 35 atoms are counted
two times) and 355 atoms in the side chains of the amino acids encoded by the remaining
38 degenerate codons (see Table A1 in Appendix A). Let us, at this stage, remember the
sequence cn , especially its “seeds” a1 = 30 and a2 = 5 with the sum a1 + a2 = 35. They
were chosen, intentionally, as the sum of the number of atoms in arginine and leucine,
equal to 30 (= 17 + 13), on the one hand, and the number of atoms in serine, equal to 5, on
the other (see Section 4.2.2). Their sum is therefore just the right thing to add and subtract
from Equation (31) above to obtain
(239 − 35) + (355 + 35) = 204 + 390 = 594, (32)
which is the correct partition of the number of atoms, this time in the pattern “20 + 41”
(see the comments between Equations (11) and (12) in Section 4.2.1 for hydrogen). We
have 204 atoms in the side chains of 20 amino acids, on the one hand, and 390 atoms
in the side chains of the amino acids encoded by 41 degenerate codons, on the other (see Table A1 in
Appendix A). Now, the use of the above sum in Equation (30), for k = 8, gives $\sum_{n=1}^{8} g_n = 384$,
which appears also doubly significant; see below. By subtracting this latter number from
the total sum, 594, and arranging, we have
(594 − 384) + 384 = 210 + 384 = 594. (33)
This partition of the number of atoms also has an interpretation: there are 210 atoms
in the side chains of the six entities (the sextets) SerIV–II, ArgIV–II and LeuIV–II (35 × 6)
encoded by 18 codons and 384 atoms in the side chains of the remaining 17 amino acids
encoded by 43 codons (taking into account the degeneracy). It is worth noting that the first
two recurrence relations of the sequence gn, 23 − 3 = 20 and 20 − 3 = 17, together lead to
the relation
23 = 17 + (3 + 3), (34)
which is in line with the above result for the atom numbers and also with the “ideal”
symmetry scheme (as depicted below):
(3 + 3) ↔ SerIV , ArgIV , LeuIV + SerII , ArgII , LeuII . (35)
Finally, we could also derive the partition of the number of atoms for Rumer’s sets
M1 and M2. Consider, again, the equation above, 210 + 384 = 594, and more precisely the
number 384, which was calculated from Equation (30), with k = 8. We partition this sum
into two parts: the first, for k = 4, gives 54 − (−3) = 54 + 3, and the second, which is equal
to g5 + g6 + g7 + g8, gives 327. By inserting these two parts in Equation (33) and arranging,
we obtain
(210 + 54) + (327 + 3) = 264 + 330 = 594. (36)
This is the content in atoms in M1 (264) and in M2 (330); see Table A1 in Appendix A.
We can also reveal the details for the multiplets. Considering, first, M1 , let us present the
following (new) relation connecting the sequences bn and cn :
M1 : 4 × 31 + 4 × 35 = 264,
M2 : 192 + 2 × 35 + 3 × 13 + 11 + 18 = 330, (38)
which is the precise and detailed partition. Finally, let us note that the number 384,
mentioned below Equation (32), also has another relevant interpretation. It is equal to the
number of atoms in the 20 amino acids, this time adding to the side chains their 20 identical
backbones with 9 atoms each: 204 + 9 × 20 = 384.
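The atom-number partitions of this section (Equations (28) to (36)) all follow from b_9, g_9 and partial sums of g_n, which the following Python sketch recomputes. The seeds of a_n are an assumption inferred from Equation (A7), and the names `fib_like`, `g_sum` and the `p_*` variables are illustrative only.

```python
def fib_like(s1, s2, count):
    """First `count` terms of x_n = x_(n-1) + x_(n-2) with seeds x_1 = s1, x_2 = s2."""
    terms = [s1, s2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

a = fib_like(6, 1, 10)     # seeds assumed from Equation (A7)
b = fib_like(13, 9, 10)
g = fib_like(23, -3, 10)   # 23, -3, 20, 17, 37, 54, 91, 145, 236, 381

def g_sum(k):
    """g_1 + ... + g_k."""
    return sum(g[:k])

total = b[8] + g[8]        # b_9 + g_9 = 358 + 236
sextets = 35               # atoms in the Ser, Arg and Leu side chains (5 + 17 + 13)

p_23_38 = (g_sum(7), total - g_sum(7))                   # pattern "23 + 38"
p_20_41 = (p_23_38[0] - sextets, p_23_38[1] + sextets)   # pattern "20 + 41"
p_core = (total - g_sum(8), g_sum(8))                    # six sextet entities vs. the rest
p_rumer = (p_core[0] + g[5], sum(g[4:8]) + 3)            # Rumer's M1 and M2

print(total, p_23_38, p_20_41, p_core, p_rumer)
# 594 (239, 355) (204, 390) (210, 384) (264, 330)
```

The last line of the section can be checked the same way: 204 + 9 × 20 gives 384, the value of `g_sum(8)`.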
It appears that this number, 3404, is the number of nucleons in the side chains of all the
amino acids coded by the 61 sense codons (see Table A1 in Appendix A). This is nice, but
we could do more. Consider again the “seeds” of the sequence cn , 30 and 5 with the sum
35, the number of atoms in the side chains of the three sextets serine (5), arginine (17) and
leucine (13). Here, we invoke Zeckendorf’s theorem, which states that every positive integer
can be represented uniquely as the sum of one or more non-consecutive Fibonacci numbers.
It is not difficult, by applying this theorem to the number 30 (= 21 + 8 + 1) and the fact
that 21 = 13 + 8, to show that the sum of the “seeds” takes the form 13 + 17 + 5 = 35, i.e.,
the correct atom numbers in the three sextets, mentioned above. Now, by isolating the sum
of the above “seeds” of cn from the third sum in Equation (39) and including it in the two
other sums, we obtain
2149 + 1255 = 3404. (40)
Here, we have a significant result: there are 1255 nucleons in the side chains of the
20 amino acids (see Table A1 in Appendix A) and 2149 nucleons in the side chains of the
amino acids encoded by the 41 degenerate codons, following, again, the pattern “20 + 41”
(see Equations (11) and (32)). Let us now exploit the relation between the two sequences an
and cn (cn = 5an), mentioned above, and write the sum in Equation (39) in the form
$$\left[4\sum_{n=1}^{9} a_n + \sum_{n=1}^{9} b_n\right] + \left[2\sum_{n=1}^{9} a_n + \sum_{n=1}^{9} b_n\right] = 1960 + 1444 = 3404. \qquad (41)$$
Recall the sum $\sum_{n=1}^{k} a_n = a_{k+2} - 1$, mentioned in Equation (6) of Section 4.1. In the
present case, for its use in Equation (41), we have $\sum_{n=1}^{9} a_n = 259 - 1$ for k = 9. By considering
this latter relation in only one such sum in the first bracket of the above equation and
including the unit “−1” in the second bracket, we obtain
(1960 + 1) + (1444 − 1) = 1961 + 1443 = 3404. (42)
One recognizes here the nucleon number, in the pattern “38 + 23” (see above and
Appendix A): 1443 nucleons in the side chains of the amino acids coded by 23 codons
(the sextets counted two times) and 1961 nucleons in the side chains of the remaining
amino acids encoded by 38 degenerate codons. We can also, from the above relations,
make contact with the “ideal” symmetry scheme of Section 4.2, at the level of the nucleon
numbers. To do this, let us first remark that the number 114 appears twice, once as the
number of hydrogen atoms in the part of the “core” belonging to the “leading” group of
the “ideal” symmetry scheme (see Section 4.2.2) and once as the number of nucleons in
LeuII (2 × 57), the part of the “core” belonging to the “nonleading” group (see Table A1 in
Appendix A). This will prove significant in the following. Consider the sum
The number 2114 by itself is not very interesting, but its ϕ-function is. We have
ϕ(2114) = 900 (see Equation (A3) in Appendix B) and, adding to this two times the
number 114 gives 900 + 2 × 114 = 1128. This is the number of nucleons in the “core”:
31 × 6 + 100 × 6 + 57 × 6 = 1128. Arranging the sum as (900 + 114) + 114 = 1014 + 114 gives
the partition of the nucleon numbers between the two parts of the “core”, 31 × 6 + 100 × 6 +
57 × 4 = 1014 in the “leading” group, on the one hand, and 57 × 2 = 114 in the “nonleading”
group, on the other:
1128 = 1014 + 114. (44)
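The nucleon-number relations above can be reproduced numerically. The sketch below assumes the seeds (6, 1) for a_n (inferred from Equation (A7)), takes the number 2114 as given in the text, and uses illustrative helper names; it is a check of the arithmetic, not the author's own code.

```python
from math import gcd

def fib_like(s1, s2, count):
    """First `count` terms of x_n = x_(n-1) + x_(n-2) with seeds x_1 = s1, x_2 = s2."""
    terms = [s1, s2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

def totient(n):
    """Euler's phi by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

a = fib_like(6, 1, 9)     # seeds assumed from Equation (A7)
b = fib_like(13, 9, 9)
c = fib_like(30, 5, 9)
Sa, Sb, Sc = sum(a), sum(b), sum(c)      # sums over n = 1..9

total = Sa + 2 * Sb + Sc                 # total nucleon number of the side chains
p_20_41 = (Sa + 2 * Sb + 35, Sc - 35)    # Equation (40): move the seeds' sum 35
eq41 = (4 * Sa + Sb, 2 * Sa + Sb)        # Equation (41), using c_n = 5 a_n
p_38_23 = (eq41[0] + 1, eq41[1] - 1)     # shift one unit between the brackets
core = totient(2114) + 2 * 114           # 900 + 228

print(total, p_20_41, eq41, p_38_23, core)
# 3404 (2149, 1255) (1960, 1444) (1961, 1443) 1128
```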
In the following, we can also derive three more results by “watering three plants with
one hose”, so to speak. Consider again the sum in Equation (39), and split it as follows:
$$\left[\sum_{n=1}^{9} a_n + \sum_{n=1}^{9} b_n + \sum_{n=1}^{7} c_n\right] + \left[\sum_{n=1}^{9} b_n + \sum_{n=8}^{9} c_n\right] = 1676 + 1728 = 3404. \qquad (45)$$
We have here the nucleon number pattern of the third base classification of Section 2.2:
1728 nucleons in the U/C third-base division and 1676 nucleons in the A/G third-base
division (see Table 3, last row). By borrowing, from the first bracket above, the sum of
the first three members of the sequence cn, 30 + 5 + 35 = 2 × 35 = 70, the one we used
earlier (see above Equation (38)), to the benefit of the second bracket, we obtain
1606 + 1798 = 3404. (46)
Here, we recognize the number of nucleons in the “leading” group, 1798, and that in the
“nonleading” group, 1606. (As an example of evaluation from the table in Appendix A, one
obtains for the “leading” group:
31 × 6 + 57 × 4 + 100 × 6 + 15 × 4 + 59 × 2 + 73 × 2 + 107 × 2 + 57 × 3 + 75 = 1798.)
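The split of Equation (45) and the borrowing step that leads to Equation (46) can be sketched in Python as follows. As before, the seeds of a_n are an assumption inferred from Equation (A7), and the variable names are illustrative.

```python
def fib_like(s1, s2, count):
    """First `count` terms of x_n = x_(n-1) + x_(n-2) with seeds x_1 = s1, x_2 = s2."""
    terms = [s1, s2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

a = fib_like(6, 1, 9)     # seeds assumed from Equation (A7)
b = fib_like(13, 9, 9)
c = fib_like(30, 5, 9)

first = sum(a) + sum(b) + sum(c[:7])     # first bracket of Equation (45)
second = sum(b) + sum(c[7:9])            # second bracket: adds c_8 + c_9
borrowed = sum(c[:3])                    # 30 + 5 + 35
nonleading, leading = first - borrowed, second + borrowed

# Direct evaluation of the "leading" group from the Appendix A values quoted in the text
leading_table = (31 * 6 + 57 * 4 + 100 * 6 + 15 * 4 + 59 * 2
                 + 73 * 2 + 107 * 2 + 57 * 3 + 75)

print(first, second, nonleading, leading, leading_table)
# 1676 1728 1606 1798 1798
```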
Finally, we could also establish the nucleon number pattern corresponding to Rumer’s
division. Consider again Equation (39). We partition it as follows:
$$\left[\sum_{n=1}^{8} a_n + 2\sum_{n=1}^{8} b_n + \sum_{n=1}^{8} c_n\right] + (a_9 + 2b_9 + c_9) = 2094 + 1310 = 3404. \qquad (47)$$
It suffices now, analogously to what we did in Equation (40) above, to subtract, once,
the sum of the “seeds” of the sequence bn in the bracket, that is, 13 + 9 = 22, and add it to
the three terms in the parenthesis to obtain
It could be shown and verified that the above relation holds for any k (see Appendix C).
For k = 9, it gives 358 = 364 − 6. (This low k case could simply be evaluated from Table 5
using a pocket calculator.) As established and mentioned many times previously, 358
is the number of hydrogen atoms in the side chains of all the amino acids coded by the
61 sense codons, where the special amino acid proline has 5 hydrogen atoms in its side
chain. If, instead, one considers that proline’s side chain now has six hydrogen atoms, at
the cost of its block, i.e., no standardization made, or the “activation key” off (see below),
and taking into account the number of its coding codons, which is four, then we now
have 362 = 358 + 4 hydrogen atoms in the side chains of all the amino acids coded by the
61 sense codons. Let us reconsider Equation (10), the partition of the number of hydrogen
atoms between the amino acids encoded by 38 degenerate codons, 219, and the amino acids
encoded by 23 codons, 139, (the sextets counted twice), but now using the above relation
(358 = 364 − 6):
$$\sum_{n=1}^{8} a'_n + a'_9 = 219 + 139 = 364 - 6. \qquad (50)$$
To obtain a correct partition, let us consider the perfect number 6 which is, as such,
equal to the sum of its proper divisors: 6 = 1 + 2 + 3 (also used in Section 4.2.5). These are
just the right numbers we need. By inserting them in the above equation by selecting the
odd divisors 1 and 3 and shifting them to the left while leaving the even one 2 to the right,
and finally arranging them properly, we obtain
(219 + 3) + (139 + 1) = 222 + 140 = 364 − 2 = 362. (51)
We have here something noteworthy: one more hydrogen atom in the amino acids in
the part encoded by 23 codons and 3 more hydrogen atoms for its 3 degenerate codons,
still in its side chain and located in the degeneracy part.
Taking a look at the sixth term in the sequence cn , 115 = 40 + 75, it appears to be equal
to the number of nucleons in proline’s side chain and backbone; see below about this latter
sum. This number, 115, is “invariant” whether we make shCherbak’s “borrowing” of one
nucleon or not. To obtain more insight, we consider another invariant number, the total
number of hydrogen atoms in all the amino acids coded by the 61 sense codons, including
the backbones (with 4 hydrogen atoms in each), that is, 358 + 244 = 362 + 240 = 602.
Without borrowing one nucleon from the side chain of proline in favor of its block, there are
362 hydrogen atoms in the side chains and 240 hydrogen atoms, 57 × 4 + 4 × 3 = 240, in the
backbones of all the amino acids coded by the 61 sense codons. Applying the “borrowing”,
there are 358 hydrogen atoms in all the side chains and 244 (= 61 × 4) hydrogen atoms in
all the backbones. Note, in passing, the following nice relations seemingly linking the two
views: ϕ(240) + ϕ(362) = 244 and (240 + 362) − [ϕ(240) + ϕ(362)] = 358.
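The two ways of counting hydrogen atoms, with and without the “borrowing”, and the two ϕ relations just quoted can be verified with a short Python sketch. The helper `totient` and the variable names are illustrative; the only inputs are the numbers given in the text.

```python
from math import gcd

def totient(n):
    """Euler's phi by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

codons, pro_codons = 61, 4

side_on = 358                          # proline side chain with 5 H ("activation key" on)
back_on = codons * 4                   # every backbone with 4 H
side_off = side_on + pro_codons        # proline side chain with 6 H ("activation key" off)
back_off = (codons - pro_codons) * 4 + pro_codons * 3   # proline backbone with 3 H

total = side_on + back_on
phi_sum = totient(240) + totient(362)

print(total, side_off + back_off)      # 602 602
print(phi_sum, total - phi_sum)        # 244 358
```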
Now, let us examine the former point, the derivation of the “activation key”. Considering
the above-mentioned invariant numbers, 115 (= 5 × 23) and 602 (= 2 × 7 × 43), we
have, using their A0 function (defined in Appendix B):
Finally, using ϕ(41) = 40 = 41 − 1 and the decomposition of the number 75 as the sum
of three squares, mentioned above, we can write, by allocating the two units in two ways:
41 − 1 + 1 + 74 = 41 + 74 = 42 + 73. This is, again, what we found above from Equations
(52) and (53).
One recognizes here the number of hydrogen atoms in the side chains of the 20 amino
acids, 117, augmented by the number of hydrogen atoms in the three sextets, 22. The total,
139, corresponds to 23 codons (the sextets counted two times). Let us now compute the
following expression, using the sum and product of the “seeds” of the sequence cn and only
the sum of the “seeds” of the other three remaining sequences an , an0 and gn (the latter
defined in Equation (26)). We have
Here, we have the number of hydrogen atoms in the side chains of the amino acids
coded by the 38 degenerate codons. Equations (54) and (55), together, constitute the
“23 + 38” hydrogen atom pattern established in Section 4.1. Furthermore, borrowing the
number 22 from Equation (54) to the benefit of Equation (55) gives 117 + 241 = 358, which
corresponds to the other pattern “20 + 41” (see Equations (10) and (11)). Next, we arrange
Equations (54) and (55) as follows:
Here, we have, again, the hydrogen atom content in Rumer’s division: 172 hydrogen
atoms in M1 and 186 hydrogen atoms in M2 ; see Section 4.2 and Equations (19) and (20).
To obtain the other patterns, we call on the Fibonacci (0, 1, 1, 2, 3, 5, . . .) series and the Lucas
(2, 1, 3, 4, 7, 11, . . .) series, which, as is well known, are linked by the relation $F_n + L_{n+2} = F_{n+4}$.
For n = 5, we have 5 + 29 = 34, so we can replace the term 34 = 14 + 20 in Equation (56) with
the latter. By arranging, we obtain
This is the hydrogen atom pattern for (i) the third base classification of Section 4.2
(Equation (14)) and (ii) the “ideal” symmetry classification scheme in the same section
(Equation (22)).
Finally, we reconsider Zeckendorf’s theorem (see above) and apply it to the number
117, giving 89 + 21 + 5 + 2. Writing 89, a Fibonacci number, as 55 + 34, we can rearrange
the content of the second parenthesis in Equation (57) above as 55 + 29 = 84 and
34 + 21 + 22 + 5 + 2 = 84, so that 168 = 2 × 84, which, again, describes the pattern
190 + 2 × 84 = 358. The fact of having used the Fibonacci and Lucas sequences here is all
the more interesting in that it can also give us another remarkable result. By adding the
two “seeds” of the Fibonacci and Lucas sequences, 0 and 1 and 2 and 1, respectively, to the
above sum of Equations (54) and (55) and arranging, we obtain
which is the hydrogen atom pattern found in Section 5, devoted to the special imino
acid proline and the shCherbak–Makukov “activation” key, when this latter is “off ”; see
Equation (51) in Section 5.
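The two number-theoretic tools used in this section, Zeckendorf's representation and the Fibonacci–Lucas relation, are easy to check by machine. The sketch below uses the standard greedy algorithm for the Zeckendorf representation; the function name `zeckendorf` is illustrative and not taken from the paper.

```python
def zeckendorf(n):
    """Greedy Zeckendorf representation of a positive integer n
    as a sum of non-consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...)."""
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts

# Fibonacci F_0.. and Lucas L_0.. numbers, far enough for n = 5
F = [0, 1]
L = [2, 1]
for _ in range(12):
    F.append(F[-1] + F[-2])
    L.append(L[-1] + L[-2])

print(zeckendorf(117))        # [89, 21, 5, 2]
print(zeckendorf(30))         # [21, 8, 1]
print(F[5], L[7], F[9])       # 5 29 34
```

With 89 written as 55 + 34, the two groups 55 + 29 and 34 + 21 + 22 + 5 + 2 each sum to 84, as stated in the text.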
$$2\sum_{n=1}^{7} a_n + \sum_{n=1}^{6} g_n = 172 + (24 + 148) = 172 + 172 \qquad (59)$$
where we put the quartet part of the two sextets with the quartets and their doublet part
with the other doublets. It is even possible to separate, in M1 , the hydrogen atom count
of the quartets from that for the quartet part of the two sextets by writing the above
term, 2 × 86, as 2 × (84 + 2) = 2 × (2 × 31 + 22 + 2) = 4 × 31 + 4 × 12, where we have
used the identity in Equation (5) for n = 8 (86 = 84 + 2) and also the recurrence relation
of the sequence bn to write 84 as 53 + 31 and next as 31 + 22 + 31 = 2 × 31 + 22. We
have, therefore, a perfect description, via computation, of the highly symmetric vertebrate
mitochondrial genetic code (VMC). The summary is depicted in Table 7 below, where the
hydrogen atom numbers of the two parts of the sextets, the quartet part, 48, in M1 , and the
doublet part, 24, in M2 , are set apart. (Observe that the “symmetry” of the numbers is also
gracefully put on show).
M1 M2
{48, 124} {24, 148}
172 172
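The hydrogen balance of Equation (59) and the entries of Table 7 can be recomputed from the sequences. This is a sketch only: the seeds of a_n are assumed from Equation (A7), and the variable names are illustrative.

```python
def fib_like(s1, s2, count):
    """First `count` terms of x_n = x_(n-1) + x_(n-2) with seeds x_1 = s1, x_2 = s2."""
    terms = [s1, s2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

a = fib_like(6, 1, 7)      # seeds assumed from Equation (A7)
b = fib_like(13, 9, 6)     # 13, 9, 22, 31, 53, 84
g = fib_like(23, -3, 6)

lhs = 2 * sum(a) + sum(g)            # left-hand side of Equation (59)
m1 = (4 * 12, 4 * b[3])              # sextet quartet parts, quartets: {48, 124}
m2 = (2 * sum(a) - sum(m1), sum(g))  # sextet doublet parts, doublets: {24, 148}

print(lhs, m1, sum(m1), m2, sum(m2))
# 344 (48, 124) 172 (24, 148) 172
```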
8. Conclusions
In this work, we have strayed a little off the beaten paths in genetic code mathematical
research. Starting with a handful of Fibonacci-like sequences, in Section 3, we have derived
not only the degeneracy structure of the genetic code, in Section 4.1, but also the hydrogen
atom content, in Sections 4.2.1–4.2.4. We have also included, in Section 4.2.5, a discussion
devoted to the choice of the initial conditions of our Fibonacci-like sequences. Next, we
derived the atom number content, in Section 4.3, and also the integer molecular mass
(nucleon) content of the set of 20 amino acids, as structured in the 64-codon table, in
Section 4.4. As a by-product of our mathematical formalism, we derived the atomic
(elemental) content of the building blocks of RNA, the four ribonucleotides UMP, CMP,
AMP and GMP, in Section 4.2.2.
Still using the above mathematics, we bring, for the first time, in Section 5, an addi-
tional brick to shCherbak’s theory, concerning the role of the special imino acid proline
whose virtual “double” structure renders possible, via the use of the “activation key”, a
large number of remarkable and beautiful arithmetical patterns.
In Section 6, we show that the “seeds” of our Fibonacci-like sequences and only these,
by themselves, are capable of reproducing the main hydrogen number patterns derived in
this paper.
Finally, in Section 7, we have applied, successfully, our Fibonacci-like formalism to the
highly symmetrical vertebrate mitochondrial genetic code as well as a numerical hydrogen
atom balance inherent to Rumer’s division of the genetic code table.
Our main findings, such as the total hydrogen atom content, the total atom content,
the total molecular mass content of the 20 amino acids, including the degeneracy, as well as
other relevant quantities related to the symmetries of the genetic code, are found directly,
either as ostensible members of the Fibonacci-like sequences or from the summation
properties of the latter.
Let us note that the hydrogen atom, atom and nucleon contents of the amino acids
considered in this work are the ones corresponding to their neutral state. This choice has
also been considered in [12]. Now, it is well known that a few amino acids are charged in
their normal (physiological) state. This case can also lead to the existence of remarkable
(nucleon or integer mass) balances; see [13] and also [20]. We have found that this latter
case could also be handled using the mathematical formalism used in the present work.
The corresponding results, which are in progress, will be submitted soon for publication.
Below, we give a brief summary of the paper, in a “one-liner” format, showing only the
main “parent” relations whose numerous “offspring”, which are derived in the different
sections, disclose the symmetries of the genetic code.
1. Hydrogen atoms in all the amino acid side chains coded by 61 sense codons (Section 4.2):
$$\sum_{n=1}^{8} a'_n + a'_9 = 219 + 139 = 358$$
2. Atoms (H/CNOS) in all the amino acid side chains coded by 61 sense codons (Section 4.3):
$$b_9 + g_9 = 6a_9 = 358 + 236 = 594$$
3. Integer molecular mass (nucleon number) in all the amino acid side chains coded by 61 sense codons (Section 4.4):
$$\sum_{n=1}^{9} a_n + 2\sum_{n=1}^{9} b_n + \sum_{n=1}^{9} c_n = 3404$$
4. Hydrogen atoms in all the amino acid side chains coded by 60 sense codons in the vertebrate mitochondrial genetic code (Rumer’s division, Section 7):
$$2\sum_{n=1}^{7} a_n + \sum_{n=1}^{6} g_n = 344 = 172 + 172$$
Appendix A
In the table of this appendix, we give the detailed elemental composition of the side
chains of the 20 amino acids. H stands for hydrogen, C for carbon, N for nitrogen, O for
oxygen and S for sulfur. The calculated values of some important quantities, taking into
account the degeneracies, are indicated in the last five rows; they are useful to know when
reading the main text (those shown in red color are all mathematically derived in this paper using
the present new approach). In the table, the first column, M, gives the number of codons
which code for an amino acid (four for a quartet, six for a sextet, two for a doublet, three
for a triplet and one for a singlet). In column six, we provide the number of atoms in the
side chains, and the number of nucleons (protons and neutrons), which is also the integer
molecular mass of an amino acid, is displayed in column 7. Below the table, we offer
hints for computing some of them. The table is in the “standardized” form, that is, proline
has 5 hydrogen atoms in its side chain, and all 20 amino acids, including proline, have
74 nucleons in each of their backbones; see Section 5. The general chemical (linear) formula
of an amino acid is
R − CH(NH2) − COOH,
where R is the radical, also called the side chain, and the rest of the molecule constitutes
the backbone. Also, the side chain is bound to the α-carbon. In the special case of proline,
its side chain from the α-carbon connects to the nitrogen N, forming a pyrrolidine loop. (It
is the side chain that gives an amino acid its specific functional properties.) To calculate, for
example, the nucleon numbers or the integer molecular mass of an amino acid, the molecu-
lar masses of the chemical elements are those of the most abundant isotopes: hydrogen (1),
carbon (12), nitrogen (14), oxygen (16) and sulfur (32). From the formula above, one easily
computes the integer molecular mass of the backbone: 2 × 12 + 1 × 14 + 2 × 16 + 4 × 1 = 74.
In the (unique) case of proline, as mentioned above, there is one less hydrogen atom in
the backbone, and the nucleon number is 73 = 74 − 1; this is the non-standardized form
(“activation key” off ) (see Section 5).
To obtain the results in the second of the last five rows from the first one, it suffices to
count the values of the sextets two times. For the rest, to ease the calculations, one can use
the following pre-calculated sums for the hydrogen atom content: 5 quartets 21, 3 sextets 22,
9 doublets 50, 1 triplet 9 and 2 singlets 15 = 7 + 8. For the atom number, it is: 5 quartets 31,
3 sextets 35, 9 doublets 96, 1 triplet 13 and 2 singlets 29 = 11 + 18. For the nucleon numbers,
it is: 5 quartets 145, 3 sextets 188, 9 doublets 660, 1 triplet 57 and 2 singlets 205 = 75 + 130.
In the calculations, the reader also needs to know what we mean by degeneracy. This
latter is defined as the number of codons coding for an amino acid minus one. Therefore,
for a quartet, the degeneracy is 3 = 4 − 1; for a doublet, it is 1 = 2 − 1; for a triplet, it
is 2 = 3 − 1 and for a singlet, it is 0 = 1 − 1. For the special case of the sextets, there
are two possibilities related to the two patterns mentioned several times in this paper:
“20 + 41 = 61” and “23 + 38 = 61”. In the first case, the degeneracy is 3 + 2 = 5 (three for the
quartet part and two for the doublet part whose two codons are both considered degenerate).
In the second case, the quartet part and the doublet part of each sextet are considered as separate
entities (e.g., SerIV and SerII), so the degeneracy is equal to 3 + 1 = 4, three for the quartet
part and one for the doublet part, which, here, is considered as a doublet. In this way, for the
number of amino acids and the total number of coding codons, we have 20 = 5 + 3 + 9 + 1 + 2
and 41 = 5 × 3 + 3 × 5 + 9 × 1 + 1 × 2 in the first case and 23 = 5 + (3 + 3) + 9 + 1 + 2 and
38 = 5 × 3 + 3(3 + 1) + 9 × 1 + 1 × 2 in the second one. With these definitions, it is not difficult
to carry out the rest of the computations. Let us give a few examples from the table above for
the number of hydrogen atoms for the pattern “23 + 38”: 139 = 21 + 22 × 2 + 50 + 9 + 7 + 8,
219 = 21 × 3 + 22 × 4 + 50 × 1 + 9 × 2, 358 = 21 × 4 + 22 × 6 + 50 × 2 + 9 × 3 + 7 + 8.
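The counting rules of this appendix can be condensed into a small Python sketch that rebuilds the totals and both patterns from the pre-calculated class sums quoted above. The data layout and the function names `total` and `pattern` are illustrative choices, not part of the paper.

```python
# class: (number of amino acids, codons per amino acid, H atoms, atoms, nucleons);
# the last three entries are the pre-calculated sums over the class, each amino acid once
classes = {
    "quartets": (5, 4, 21, 31, 145),
    "sextets":  (3, 6, 22, 35, 188),
    "doublets": (9, 2, 50, 96, 660),
    "triplet":  (1, 3, 9, 13, 57),
    "singlets": (2, 1, 15, 29, 205),
}
H, ATOMS, NUCLEONS = 2, 3, 4   # column indices

def total(col):
    """Sum over all 61 sense codons (each amino acid weighted by its codon number)."""
    return sum(row[1] * row[col] for row in classes.values())

def pattern(col, sextet_entities):
    """Split a total into (entities counted once, degenerate remainder).
    sextet_entities = 1 gives "20 + 41"; sextet_entities = 2 gives "23 + 38"."""
    once = sum(row[col] * (sextet_entities if name == "sextets" else 1)
               for name, row in classes.items())
    return once, total(col) - once

n_codons = sum(row[0] * row[1] for row in classes.values())
print(n_codons, total(H), total(ATOMS), total(NUCLEONS))   # 61 358 594 3404
print(pattern(H, 2), pattern(H, 1))                        # (139, 219) (117, 241)
print(pattern(ATOMS, 2), pattern(NUCLEONS, 2))             # (239, 355) (1443, 1961)
```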
Appendix B
In this appendix, we mention a few other additional mathematical elements used in
this paper: (i) Euler’s phi totient function, (ii) the Carmichael lambda function and (iii) our
function A0. All these functions rely on the Fundamental Theorem of Arithmetic, which states
that every integer n greater than one can be represented, uniquely, as a product of
prime numbers, irrespective of their order:
$$n = p_1^{n_1} \times p_2^{n_2} \times \cdots \times p_k^{n_k} \qquad \text{(A1)}$$
First, there is Euler’s totient function for an integer n, ϕ(n), which is extensively used
in many scientific areas such as in cryptography and graph theory. It counts the number
of positive integers less than or equal to n which are relatively prime to n (also called
coprimes). For example, 24 has 8 coprimes (1, 5, 7, 11, 13, 17, 19, 23): ϕ(24) = 8. A simple
formula for computing this function is the following (see [21])
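$$\phi(n) = n\prod_{i=1}^{m}\left(1 - \frac{1}{p_i}\right) \qquad \text{(A2)}$$
where m is the number of distinct prime factors in the factorization (A1). Let us take two examples
from the text: ϕ(2114) = 900 (see below Equation (43)) in Section 4.4 and ϕ(114) = 36
(mentioned above Equation (17)) in Section 4.2.2. The prime factorizations of these two
numbers are given by $2114 = 2^1 \times 7^1 \times 151^1$ and $114 = 2^1 \times 3^1 \times 19^1$. From Equation (A2),
we have, respectively,
$$\phi(2114) = 2114 \times \left(1 - \frac{1}{2}\right) \times \left(1 - \frac{1}{7}\right) \times \left(1 - \frac{1}{151}\right) = 1 \times 6 \times 150 = 900 \qquad \text{(A3)}$$
$$\phi(114) = 114 \times \left(1 - \frac{1}{2}\right) \times \left(1 - \frac{1}{3}\right) \times \left(1 - \frac{1}{19}\right) = 1 \times 2 \times 18 = 36 \qquad \text{(A4)}$$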
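Equation (A2) translates directly into code. The following Python sketch factorizes n by trial division and applies the product formula; the function names `prime_factors` and `totient` are illustrative.

```python
def prime_factors(n):
    """Prime factorization of n >= 2 as a dict {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def totient(n):
    """Euler's phi via Equation (A2): n times the product of (1 - 1/p) over distinct primes p."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)   # exact, since p divides the running value
    return result

print(prime_factors(2114), totient(2114))   # {2: 1, 7: 1, 151: 1} 900
print(prime_factors(114), totient(114))     # {2: 1, 3: 1, 19: 1} 36
print(totient(24), totient(258))            # 8 84
```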
Second, there is the Carmichael λ-function, also called the reduced totient function,
which is, in fact, used only once in Section 4.2, where it appears to be useful. It is defined as
the smallest positive divisor of Euler’s totient function that satisfies Euler’s Theorem [22],
which states that if n is a positive integer and a and n are coprime, then $a^{\phi(n)} \equiv 1 \pmod{n}$,
where ϕ(n) is Euler’s totient function. For example, λ(24) = 2. (The reader could easily
find good online calculators for these functions for checking.) Here, there also exists a
simple formula for computing this function, using Equation (A1):
$$\lambda(n) = \operatorname{lcm}_{i}\left[(p_i - 1)\,p_i^{\,n_i - 1}\right] \qquad \text{(A5)}$$
where the $p_i^{n_i}$ are the prime-power factors of n from Equation (A1) and lcm is the least common multiple.
(When n is divisible by 8, the term for the prime 2 must be halved to $2^{n_i - 2}$; this is what yields λ(24) = 2.)
Let us give, as an example, the computation of λ(114), mentioned above in Equation (17) in
Section 4.2.2. From its prime factorization above and Equation (A5), we have
where a0(n) is the sum of the prime factors of the integer n, including the multiplicities,
p1 × n1 + p2 × n2 + . . . + pk × nk; SPI(n) is the Sum of the Prime Indices, PI(p1) × n1 +
PI(p2) × n2 + . . . + PI(pk) × nk, where PI(2) = 1, PI(3) = 2, PI(5) = 3 and so on, also including
the multiplicities; and, finally, Ω(n), the so-called Big Omega function, is the number of
prime factors counted with multiplicity, n1 + n2 + · · · + nk. Consider, as an example, the number
192, whose prime factorization is $2^6 \times 3^1$. We have
$$A_0(192) = a_0(2^6 \times 3^1) + \mathrm{SPI}(2^6 \times 3^1) + \Omega(2^6 \times 3^1) = (6 \times 2 + 1 \times 3) + (6 \times 1 + 1 \times 2) + (6 + 1) = 30.$$
apart the two factors four proved useful in revealing the structure of the four ribonucleotides
(in Section 4.2).
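The Carmichael function and the function A0 of this appendix can be sketched in Python as below. The implementation of `carmichael` follows Equation (A5) with the standard special case for powers of 2 (exponent 3 or more); `a0_function` simply adds the three quantities a0, SPI and Ω as in the worked example for 192. All function names are illustrative assumptions.

```python
from functools import reduce
from math import gcd

def prime_factors(n):
    """Prime factorization of n >= 2 as a dict {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def carmichael(n):
    """Carmichael lambda: lcm of (p - 1) * p**(e - 1), with 2**(e - 2) for p = 2, e >= 3."""
    parts = [2 ** (e - 2) if (p == 2 and e >= 3) else (p - 1) * p ** (e - 1)
             for p, e in prime_factors(n).items()]
    return reduce(lambda x, y: x * y // gcd(x, y), parts, 1)

def prime_index(p):
    """Position of the prime p in the list 2, 3, 5, ... (so PI(2) = 1)."""
    return sum(1 for q in range(2, p + 1)
               if all(q % d for d in range(2, int(q ** 0.5) + 1)))

def a0_function(n):
    """A0(n) = a0(n) + SPI(n) + Omega(n), all counted with multiplicity."""
    f = prime_factors(n)
    a0 = sum(p * e for p, e in f.items())
    spi = sum(prime_index(p) * e for p, e in f.items())
    omega = sum(f.values())
    return a0 + spi + omega

print(carmichael(24), carmichael(114))   # 2 18
print(a0_function(192))                  # 30
```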
Appendix C
In this appendix, we give some hints to the interested reader who wants either to
verify the identities in Equation (2) of Section 3 or to carry out the various computations
presented in the different sections by himself/herself. In the latter case, where only low
values of n are involved, it suffices to use a pocket calculator, along with the data in Table 5
of Section 3. For more complicated cases, like the verification of the identities in Equation
(2), especially for large or even very large values of n, a computer is necessary. In this
vein, mathematical software that contains a built-in “fibonacci” function,
generally written as “fibonacci(i)”, as it exists in Maple, Matlab, Mathematica, etc., could be
used. Those familiar with programming languages such as Python or C++ could
use the source codes for the Fibonacci sequence available at the following links: [23,24],
respectively. Given this function, the reader only needs, for performing the verifications
or the calculations, to write the five functions a_n, a'_n, b_n, c_n and g_n together with their
“seeds” in terms of the fibonacci function, from their definition in Equation (1) of Section 3.
(For n = 1 the formulas call fibonacci(−1), so the function must support the usual extension
fibonacci(0) = 0 and fibonacci(−1) = 1.) The five functions read as follows:
a[n]  := fibonacci(n - 1) + 6 * fibonacci(n - 2)
a'[n] := 6 * fibonacci(n - 1) + fibonacci(n - 2)
b[n]  := 9 * fibonacci(n - 1) + 13 * fibonacci(n - 2)    (A7)
c[n]  := 5 * fibonacci(n - 1) + 30 * fibonacci(n - 2)
g[n]  := -3 * fibonacci(n - 1) + 23 * fibonacci(n - 2)
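For readers working in Python, the definitions in Equation (A7) can be sketched as follows. The `fibonacci` function below is a hypothetical stand-in for a built-in one; it uses the standard extension to negative indices, F(−n) = (−1)^(n+1) F(n), so that F(−1) = 1 and the seeds come out correctly for n = 1. The dictionary `SEQS` and the function `term` are illustrative names.

```python
def fibonacci(n):
    """Fibonacci number F(n), with F(0) = 0, F(1) = 1, extended to negative n."""
    if n < 0:
        m = -n
        return (-1) ** (m + 1) * fibonacci(m)
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x

# name: (coefficient of F(n - 1), coefficient of F(n - 2)), as in Equation (A7)
SEQS = {"a": (1, 6), "a'": (6, 1), "b": (9, 13), "c": (5, 30), "g": (-3, 23)}

def term(name, n):
    """n-th term (n >= 1) of the named Fibonacci-like sequence."""
    p, q = SEQS[name]
    return p * fibonacci(n - 1) + q * fibonacci(n - 2)

table = {name: [term(name, n) for n in range(1, 11)] for name in SEQS}
for name, values in table.items():
    print(name, values)
```

The row printed for g reproduces the series in Equation (27), and the ninth entries of a', b and g are 139, 358 and 236.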
Let us give some examples.
Example A1. The verification of the identity (x) $a_n + b_{n+2} = 4a_{n+2}$ in Equation (2) of
Section 4.2.3. For n = 4, we have a[4] = 8, b[6] = 84, 4 * a[6] = 92 and a[4] + b[6] = 4
* a[6] = 92. (This can be checked simply by hand from Table 5.) For larger values of n, a computer
must be used. For n = 100 (taking a value for n that is not too large, to save space), one obtains
Example A2. The verification of the identity bn + gn = 6an in Equation (28). For n = 9, we have,
from Table 5: b[9] = 358, g[9] = 236, 6a[9] = 594, b[9] + g[9] = 358 + 236 = 6a[9] = 594.
Example A3. The verification of the identity (v) $c_n + 2b_{n-1} = b_{n+2}$ in Equation (2). The
case n = 7, which was involved in Equation (14), gives immediately from Table 5: c[7] = 190,
2b[6] = 2 × 84 = 168, b[9] = 358 and c[7] + 2b[6] = 190 + 168 = b[9] = 358.
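The three identities of Examples A1 to A3 can also be checked for many values of n at once, since Python integers have arbitrary precision. The sketch below builds the sequences by their recurrence rather than through a fibonacci function; the seeds of a_n are those implied by Equation (A7), and the names `fib_like` and `check` are illustrative.

```python
def fib_like(s1, s2, count):
    """First `count` terms of x_n = x_(n-1) + x_(n-2) with seeds x_1 = s1, x_2 = s2."""
    terms = [s1, s2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

N = 102                         # enough terms to reach index n + 2 for n = 100
a = fib_like(6, 1, N)
b = fib_like(13, 9, N)
c = fib_like(30, 5, N)
g = fib_like(23, -3, N)

def check(n):
    """True if the three identities hold at index n (lists are 0-based: x[n - 1] is x_n)."""
    return (a[n - 1] + b[n + 1] == 4 * a[n + 1]          # (x): a_n + b_(n+2) = 4 a_(n+2)
            and b[n - 1] + g[n - 1] == 6 * a[n - 1]      # Equation (28): b_n + g_n = 6 a_n
            and c[n - 1] + 2 * b[n - 2] == b[n + 1])     # (v): c_n + 2 b_(n-1) = b_(n+2)

all_ok = all(check(n) for n in range(2, 101))
print(all_ok)                                  # True
print(a[3], b[5], 4 * a[5])                    # 8 84 92
```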
Once the functions an , an0 , bn , cn and gn are written, one can use a simple built-in
summation function for them to evaluate the various sums in the text, which all involve
only low values of the index n. As an example, let us compute the two parts of Equation
(10) of Section 4.2.1 and their sum. We have
$$\sum_{i=1}^{8} a'[i] = 219, \qquad a'[9] = 139, \qquad \sum_{i=1}^{8} a'[i] + a'[9] = 358 \qquad \text{(A10)}$$
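In Python, the built-in `sum` plays the role of the summation function mentioned above. A minimal sketch of Equation (A10), with the sequence a'_n generated from its seeds 1 and 6 (as implied by Equation (A7)) and with illustrative variable names:

```python
def fib_like(s1, s2, count):
    """First `count` terms of x_n = x_(n-1) + x_(n-2) with seeds x_1 = s1, x_2 = s2."""
    terms = [s1, s2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

a_prime = fib_like(1, 6, 9)          # a'_1 .. a'_9

degenerate = sum(a_prime[:8])        # a'_1 + ... + a'_8
entities = a_prime[8]                # a'_9
print(degenerate, entities, degenerate + entities)   # 219 139 358
```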
References
1. Nirenberg, M.; Leder, P.; Bernfield, M.; Brimacombe, R.; Trupin, J.; Rottman, F.; O’Neal, C. RNA Codewords and Protein Synthesis, VII. On the General Nature of the RNA Code. Proc. Natl. Acad. Sci. USA 1965, 53, 1161–1168.
2. Inouye, M.; Takino, R.; Ishida, Y.; Inouye, K. Evolution of the genetic code; Evidence from codon use disparity in Escherichia coli. Proc. Natl. Acad. Sci. USA 2020, 117, 28572–28575.
3. Zwick, A.; Regier, J.C.; Zwickl, D. Resolving Discrepancy between Nucleotides and Amino Acids in Deep-Level Arthropod Phylogenomics: Differentiating Serine Codons in 21-Amino-Acid Models. PLoS ONE 2012, 7, e47450.
4. Sun, X.; Yang, Q.; Xia, X. An improved implementation of effective number of codons (Nc). Mol. Biol. Evol. 2013, 30, 191–196.
5. Négadi, T. The genetic code multiplet structure, in one number. Symmetry Cult. Sci. 2007, 18, 149–160.
6. Négadi, T. The Genetic Code via Gödel Encoding. Open Phys. Chem. J. 2008, 2, 1–5.
7. Négadi, T. The genetic code invariance: When Euler and Fibonacci meet 2014. Symmetry Cult. Sci. 2014, 25, 261–278.
8. Rumer, Y. About systematization of the genetic code. Dok. Akad. Nauk SSSR 1966, 167, 1393–1394.
9. Findley, G.I.; Findley, A.M.; McGlynn, S.P. Symmetry characteristics of the genetic code. Proc. Natl. Acad. Sci. USA 1982, 79, 7061–7065.
10. Rosandić, M.; Paar, V. Codons sextets with leading role of serine create “ideal” symmetry classification scheme of the genetic code. Gene 2014, 543, 45–52.
11. Rosandić, M.; Paar, V. The novel Ideal Symmetry Genetic Code table-Common purine-pyrimidine symmetry net for all RNA and DNA species. J. Theor. Biol. 2021, 524, 110748.
12. shCherbak, V. The Arithmetical origin of the genetic code. In The Codes of Life: The Rules of Macroevolution; Barbieri, M., Ed.; Springer Publishers: New York, NY, USA, 2008; pp. 153–185.
13. shCherbak, V.; Makukov, M. The “wow! Signal” of the terrestrial genetic code. Icarus 2013, 224, 228–242.
14. Edge, M. Symmetry in Fibonacci numbers. Symmetry Cult. Sci. 2009, 20, 393–408.
15. Rakočević, M.M. Genetic Code: The unity of the stereochemical determinism and pure chance. arXiv 2009, arXiv:0904.1161v1.
16. Shu, J.J. A new integrated symmetrical table for genetic codes. Biosystems 2017, 151, 21–26.
17. Lehmann, J. Physico-chemical constraints connected with the coding properties of the genetic system. J. Theor. Biol. 2000, 202, 129–144.
18. Gonzalez, D.L.; Giannerini, S.; Rosa, R. On the origin of the mitochondrial genetic code. Towards a unified mathematical framework for the management of genetic information. Nat. Prec. 2012, 2012, 1–20.
19. Available online: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2 (accessed on 27 July 2023).
20. Downes, A.M.; Richardson, B.J. Relationships between genomic base content and distribution of mass in coded proteins. J. Mol. Evol. 2002, 55, 476–490.
21. Available online: https://www.dcode.fr/euler-totient (accessed on 27 July 2023).