3.5 Non-Canonical Amino Acids: Occurrence, Biosynthesis, and Synthetic Access

Intermediate 7 min read APS Editorial Article 3.5

The twenty canonical amino acids are the vocabulary of ribosomal protein synthesis, but they are not the full vocabulary of peptide chemistry. Selenocysteine and pyrrolysine extend the genetic code naturally, while genetic code expansion and solid-phase synthesis provide access to an essentially unlimited range of non-canonical building blocks with properties engineered for research and therapeutic applications.

Key Terms

Selenocysteine, Sec, U: The 21st proteinogenic amino acid, encoded by the UGA codon when the mRNA carries a specific secondary structure element, the selenocysteine insertion sequence (SECIS). Selenocysteine contains selenium in place of the sulfur of cysteine and is present in the active sites of selenoproteins including glutathione peroxidase and thioredoxin reductase.
Pyrrolysine, Pyl, O: The 22nd proteinogenic amino acid, encoded by the UAG codon in certain methanogenic archaea and some bacteria. Pyrrolysine consists of a methylpyrroline carboxylate ring joined by an amide bond to the ε-amino group of lysine and is found in the active sites of methylamine methyltransferases.
Genetic code expansion: The engineering of an organism's translational machinery to incorporate non-canonical amino acids in response to a sense or nonsense codon, typically using orthogonal aminoacyl-tRNA synthetase and tRNA pairs that do not cross-react with endogenous components.
Bioorthogonal chemistry: Chemical reactions that proceed selectively in biological environments without interfering with endogenous biochemistry. Non-canonical amino acids carrying bioorthogonal handles, such as azides or alkynes, enable site-specific modification of proteins in living cells.
Amber suppressor: A tRNA engineered to recognize the UAG amber stop codon and to accept a non-canonical amino acid from an orthogonal aminoacyl-tRNA synthetase, allowing site-specific incorporation of the non-canonical amino acid at defined positions in a recombinant protein.

Beyond the Standard Twenty

The twenty canonical amino acids encoded by the standard genetic code define the chemistry of nearly all ribosomally synthesized proteins. They do not, however, define the chemistry of all peptides, nor do they represent the full range of amino acid building blocks accessible to modern peptide science. Non-canonical amino acids enter peptides and proteins through three routes: natural extension of the genetic code to encode additional amino acids, post-translational modification of canonical residues to produce chemically distinct variants, and direct synthetic incorporation during solid-phase synthesis. This article addresses the first and third of these routes; the second is the subject of Article 3.7.

Selenocysteine: The 21st Amino Acid

Selenocysteine is the only amino acid beyond the standard twenty that is directly encoded in the human genome. It is incorporated cotranslationally in response to UGA, normally a stop codon, when a specific mRNA secondary structure element, the selenocysteine insertion sequence (SECIS), is present; in eukaryotic mRNAs the SECIS element lies in the 3' untranslated region. The dedicated cellular machinery includes a UGA-decoding tRNA, tRNA(Sec), a specialized elongation factor, and enzymes that assemble the amino acid on the tRNA itself. No synthetase charges free selenocysteine onto the tRNA: seryl-tRNA synthetase first loads tRNA(Sec) with serine, and the tRNA-bound serine is then converted enzymatically to selenocysteine. ^[9]

The biological rationale for this biochemical complexity is the exceptional reactivity of the selenol functional group of selenocysteine. The pKa of the selenol is approximately 5.2, meaning selenocysteine is almost fully ionized as the selenolate anion at physiological pH. The selenolate is a far better nucleophile and reductant than the cysteine thiolate, and the selenium-containing active sites of glutathione peroxidase and thioredoxin reductase achieve catalytic efficiencies that cysteine analogs cannot match. Selenocysteine is found in twenty-five human selenoproteins, most involved in redox homeostasis. Its presence in the human proteome illustrates that biological chemistry has extended the canonical amino acid set when the chemical properties of the twenty were insufficient for a specific catalytic need.
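The ionization claim above can be checked with the Henderson-Hasselbalch relation. The following is a minimal sketch, not a measurement: the selenol pKa of 5.2 comes from the text, while the physiological pH of 7.4 and the cysteine thiol pKa of 8.3 are assumed typical values for a free amino acid; side-chain pKa values inside a folded protein can differ considerably.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH_PHYSIOLOGICAL = 7.4  # assumed value for "physiological pH"
PKA_SELENOL = 5.2       # selenocysteine selenol, as quoted in the text
PKA_THIOL = 8.3         # assumed typical value for a free cysteine thiol

sec_fraction = fraction_deprotonated(PKA_SELENOL, PH_PHYSIOLOGICAL)
cys_fraction = fraction_deprotonated(PKA_THIOL, PH_PHYSIOLOGICAL)

print(f"selenolate fraction at pH 7.4: {sec_fraction:.3f}")
print(f"thiolate fraction at pH 7.4:   {cys_fraction:.3f}")
```

Under these assumptions roughly 99% of selenocysteine side chains are present as the reactive selenolate, compared with about 11% of cysteine side chains as thiolate.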

Pyrrolysine: The 22nd Amino Acid

Pyrrolysine was discovered in 2002 as a naturally encoded amino acid in the active sites of methylamine methyltransferases in methanogenic archaea and some bacteria. ^[9] Like selenocysteine, it is encoded by a stop codon, in this case UAG, with suppression mediated by a dedicated tRNA and aminoacyl-tRNA synthetase pair found only in the organisms that use it. Unlike selenocysteine, pyrrolysine is charged directly onto its tRNA by that synthetase, and its incorporation does not strictly depend on an mRNA structural element: the pyrrolysine machinery can read UAG without a SECIS-like secondary structure signal. Both features make the system more amenable to genetic code expansion applications.

The discovery of pyrrolysine had immediate practical significance for the genetic code expansion field, because the pyrrolysine aminoacyl-tRNA synthetase accepts structurally diverse substrates and was rapidly engineered to charge a range of non-canonical amino acids in response to UAG codons in heterologous organisms.
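The two natural recoding rules, UGA read as selenocysteine only when a SECIS element is present and UAG read as pyrrolysine wherever the dedicated tRNA and synthetase exist, can be contrasted in a toy translation model. This is an illustrative sketch only: the flags `has_secis` and `has_pyl_machinery` and the example mRNA are hypothetical, and the model recodes every in-frame occurrence of the codon, ignoring competition with release factors and the positional effects that matter in real cells.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, has_secis=False, has_pyl_machinery=False):
    """Toy translation of an in-frame mRNA with optional stop-codon recoding.

    has_secis: hypothetical flag standing in for a SECIS element on the mRNA.
    has_pyl_machinery: hypothetical flag for a host that expresses a
    UAG-decoding tRNA and its cognate synthetase.
    """
    recoded = {}
    if has_secis:
        recoded["UGA"] = "U"  # selenocysteine, one-letter code U
    if has_pyl_machinery:
        recoded["UAG"] = "O"  # pyrrolysine, one-letter code O

    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = recoded.get(codon, CODON_TABLE[codon])
        if residue == "*":
            break  # an unrecoded stop codon terminates translation
        peptide.append(residue)
    return "".join(peptide)


# Made-up sequence: AUG GGC UGA GCU UAG AAA UAA
MRNA = "AUGGGCUGAGCUUAGAAAUAA"
print(translate(MRNA))                                          # MG
print(translate(MRNA, has_secis=True))                          # MGUA
print(translate(MRNA, has_secis=True, has_pyl_machinery=True))  # MGUAOK
```

An engineered amber suppressor fits the same pattern: the UAG entry would map to whichever non-canonical amino acid the orthogonal synthetase charges, rather than to pyrrolysine.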

Genetic Code Expansion: Engineering New Amino Acids into Proteins

Beyond these two natural expansions of the code, the field of genetic code expansion has developed methods to incorporate a broad and growing range of non-canonical amino acids into recombinant proteins at defined positions by engineering orthogonal aminoacyl-tRNA synthetase and tRNA pairs. ^[10] The key requirement for orthogonality is that the engineered synthetase must not aminoacylate any endogenous tRNA in the host organism, and the engineered tRNA must not be aminoacylated by any endogenous synthetase. Pairs taken from organisms phylogenetically distant from the host, such as archaeal pairs expressed in bacterial or mammalian cells, often satisfy this requirement with high selectivity.
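The orthogonality requirement amounts to two cross-reactivity checks plus the obvious need for the pair to work with itself. A minimal sketch of that logic follows, assuming charging activity has already been reduced to a yes/no set of (synthetase, tRNA) combinations; all component names are hypothetical placeholders, and real orthogonality is a matter of degree established experimentally rather than a binary property.

```python
def is_orthogonal(pair, host_synthetases, host_trnas, charges):
    """Return True if an introduced (synthetase, tRNA) pair is orthogonal.

    charges: set of (synthetase, tRNA) tuples for which aminoacylation occurs.
    """
    synthetase, trna = pair
    charges_own_trna = (synthetase, trna) in charges
    charges_host_trna = any((synthetase, t) in charges for t in host_trnas)
    charged_by_host = any((s, trna) in charges for s in host_synthetases)
    return charges_own_trna and not charges_host_trna and not charged_by_host


# Hypothetical host components and observed charging activities.
HOST_SYNTHETASES = {"host_TyrRS", "host_GlnRS"}
HOST_TRNAS = {"host_tRNA_Tyr", "host_tRNA_Gln"}
CHARGES = {
    ("host_TyrRS", "host_tRNA_Tyr"),
    ("host_GlnRS", "host_tRNA_Gln"),
    ("archaeal_RS", "archaeal_tRNA_CUA"),  # introduced pair, no cross-talk
    ("leaky_RS", "leaky_tRNA_CUA"),        # introduced pair ...
    ("host_GlnRS", "leaky_tRNA_CUA"),      # ... whose tRNA the host mischarges
}

good = is_orthogonal(("archaeal_RS", "archaeal_tRNA_CUA"),
                     HOST_SYNTHETASES, HOST_TRNAS, CHARGES)
bad = is_orthogonal(("leaky_RS", "leaky_tRNA_CUA"),
                    HOST_SYNTHETASES, HOST_TRNAS, CHARGES)
print(good, bad)  # True False
```

The second pair fails because a host synthetase charges the introduced tRNA, which would place a canonical amino acid at the position intended for the non-canonical one.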

The non-canonical amino acids incorporated through genetic code expansion cover an enormous chemical range. Amino acids carrying bioorthogonal handles, including azides, alkynes, and tetrazines, enable site-specific conjugation reactions in living cells or on purified proteins. Amino acids carrying photocrosslinking groups including benzophenone and diazirine allow covalent capture of protein interaction partners upon UV irradiation. Amino acids with spectroscopic probes including fluorescent dyes, infrared labels, and NMR-active nuclei enable biophysical studies with precise spatial resolution. Photo-caged amino acids allow light-triggered activation of function. The scope of this chemistry has transformed what is possible in protein engineering and biological research.

Non-Canonical Amino Acids in Chemical Synthesis

Solid-phase peptide synthesis imposes no constraints from the genetic code: any amino acid that can be appropriately protected and coupled can be incorporated into a synthetic peptide. The range of non-canonical amino acids used in SPPS is enormous and encompasses several practically important categories.

Beta-amino acids, in which the amino group is on the β-carbon rather than the α-carbon, produce peptides with altered backbone geometry and substantially enhanced proteolytic stability. When incorporated into alpha-peptide sequences, beta-amino acids disrupt recognition by proteases whose active sites are complementary to the alpha-peptide backbone. Peptides composed entirely of beta-amino acids (beta-peptides) fold into their own secondary structures, including 14-helices and 12-helices, and form a distinct class of foldamers.

N-methylated amino acids, in which the backbone nitrogen carries a methyl group rather than a hydrogen, eliminate the amide NH hydrogen bond donor, restrict backbone conformation, and confer resistance to proteolysis, as exemplified by the N-methylated residues of cyclosporine.

Alpha-methyl amino acids, of which alpha-aminoisobutyric acid is the most commonly used, induce helical structure through steric effects. Because they lack an alpha-hydrogen, they also remove one site of potential racemization.

D-amino acids, discussed in full in Article 3.6, provide a complementary route to proteolytic stability that requires no modification of the backbone nitrogen.

The non-canonical amino acid toolkit also includes building blocks designed for specific chemical purposes: amino acids with allyl, propargyl, or azide side chains for post-synthetic conjugation; amino acids with protected thiols or amines at defined positions for branching or cyclization; and spectroscopic probes pre-installed as amino acid derivatives. The availability of this toolkit, through commercial suppliers and custom synthesis, is one of the defining practical advantages of SPPS over recombinant expression for applications requiring structural diversity beyond the canonical twenty.

References

[9] Ambrogelly, A., Palioura, S., & Söll, D. (2007). Natural expansion of the genetic code. Nature Chemical Biology, 3(1), 29–35.
[10] Liu, C. C., & Schultz, P. G. (2010). Adding new chemistries to the genetic code. Annual Review of Biochemistry, 79, 413–444.
