
Codon Bias and Heterologous Protein Expression

Claes Gustafsson*, Sridhar Govindarajan & Jeremy Minshull

DNA 2.0, Inc., 1455 Adams Drive, Menlo Park, CA 94025. *Corresponding author: cgustafsson@…

The expression of functional proteins in heterologous hosts is a cornerstone of modern biotechnology. Unfortunately, proteins are often difficult to express outside their original context. They may contain codons that are rarely used in the desired host, come from […] expression-limiting […] complete redesign of entire gene sequences to […]. Redesign strategies, including modification of translation elements and use of different codon biases, are discussed.

In 1977, when Genentech scientists and their academic collaborators produced the first human protein (somatostatin) in a bacterium 1, expression of proteins in heterologous hosts played a critical role in the launch of the entire biotechnology industry. At the time, only the amino acid sequence of somatostatin was known, so the Genentech group synthesized the 14-codon somatostatin gene from oligonucleotides instead of cloning it from the human genome. Itakura and co-workers designed these oligonucleotides based on three criteria. First, codons favored by the phage MS2 were used preferentially. Little of the Escherichia coli (E. coli) genome sequence was known at the time, but the MS2 phage had just been sequenced and was assumed to provide a good guide to the codons used in highly expressed E. coli genes. Second, care was taken to eliminate undesirable inter- and intra-molecular pairing of the overlapping oligonucleotides, as this would compromise the gene synthesis process. Third, GC-rich sequences followed by AT-rich sequences were avoided, as such sequences were believed to terminate transcription. The result was the first production of a functional polypeptide from a synthetic gene. Now, a quarter of a century later, most genes are cloned from cDNA libraries or amplified directly by polymerase chain reaction (PCR) from the organism of origin. De novo gene synthesis is largely avoided because of its perceived high costs in time and effort 2.

Despite its prevalence, PCR-based cloning often requires templates that may not be trivial to access (cDNA templates must generally be used for organisms with introns), gene-specific PCR conditions, re-sequencing of the PCR product and site-directed mutagenesis to repair PCR errors. The real fun, though, begins after the amplified gene is cloned into an expression vector: often the protein is not expressed, or is expressed only at very low levels. Much work has been done to improve the expression of cloned genes, including optimization of host growth conditions and the development of new host strains, organisms and cell-free systems 3. Despite the advances these approaches have made, they have skirted a significant underlying problem: the DNA sequence used to encode a protein in one organism is often quite different from the sequence that would be used to encode the same protein in another organism.

Why do different organisms prefer different codons? The genetic code uses 61 nucleotide triplets (codons) to encode 20 amino acids and three to terminate translation. Each amino acid is therefore encoded by between one and six codons, which are read in the ribosome by complementary tRNAs that have been charged with the appropriate amino acid. The degeneracy of the genetic code allows many alternative nucleic acid sequences to encode the same protein. The frequencies with which different codons are used vary significantly between different organisms, between proteins expressed at high or low levels within the same organism, and sometimes even within the same operon 4.

There is continuing speculation regarding the evolutionary forces that have produced these differences in codon preferences 5. Codon distributions respond to genomic GC content, and the changes in codon usage are at least partly explained by a mutation/selection equilibrium between the different synonymous codons in each organism 6. Some researchers have hypothesized that codon biases that reduce the diversity of isoacceptor tRNAs also reduce the metabolic load, and are therefore beneficial to organisms that spend part of their lives under rapid growth conditions 7.

Whatever the reasons for codon bias, it has become increasingly clear that codon biases can have profound impacts on the expression of heterologous proteins 8.

Visualizing codon biases

The correlation that has been observed between the codon bias of a gene and its expression level has been used to define a codon adaptation index 9. This measure uses a reference set of highly expressed genes to score the extent to which the codon usage of an individual gene resembles that of the reference set, and it has been used to predict the expression levels of endogenous genes from genome sequence data 10. However, because the index measures only the degree of preference but not the nature of that preference, it cannot be used to assess the likely compatibility between a gene and a candidate host. A gene may have a strong bias, resulting in a high codon adaptation index, but its preferences may be for quite different codons from those favored by the host.
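To make the index concrete, here is a minimal Python sketch following the usual definition from ref. 9: each codon's relative adaptiveness w is its count in the reference set divided by the count of the most used synonymous codon, and a gene's index is the geometric mean of w over its codons (Met, Trp and stop codons skipped). The function names, the toy reference set and the pseudo-value for codons never seen in the reference set are illustrative assumptions; published tools treat unseen codons differently.

```python
import math
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """w = count(codon) / count(most used synonymous codon) over the reference genes."""
    counts = Counter(c for g in reference_genes for c in split_codons(g) if c in CODE)
    top = {}
    for codon, n in counts.items():
        aa = CODE[codon]
        top[aa] = max(top.get(aa, 0), n)
    return {codon: n / top[CODE[codon]] for codon, n in counts.items()}


def cai(gene, w, missing=0.01):
    """Geometric mean of w over the gene's codons (Met, Trp and stops skipped).

    Codons absent from the reference set get the pseudo-value `missing`
    (an assumption; tools differ on how to handle them).
    """
    logs = [math.log(w.get(c, missing))
            for c in split_codons(gene)
            if c in CODE and CODE[c] not in "MW*"]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    # Toy reference set standing in for a host's highly expressed genes (hypothetical).
    reference = ["CTGCTGCTGCGTCGTAAA", "CTGCGTAAAAAA"]
    w = relative_adaptiveness(reference)
    print(cai("ATGCTGCGTAAA", w))  # only preferred codons -> 1.0
    print(cai("ATGCTACGTAAA", w))  # CTA never seen -> 0.01 ** (1/3)
```

Note that w is computed from whichever reference set is supplied, which is why the same gene can score very differently against different hosts.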

Principal component analysis can be used to compress this high-dimensional information (one usage frequency per codon) into a two-dimensional map, which provides a more convenient way to visualize differences in codon preferences between organisms. Figure 1 shows the average codon preferences of the genomes of eight commonly studied organisms represented on such a map. As can be seen in the figure, Streptomyces coelicolor (S. coelicolor) has the most extreme codon usage profile. In this organism almost every “wobble” position (the third base in each codon, where much of the degeneracy of the genetic code resides) is a G or C, contributing to S. coelicolor’s high GC content (71%). The figure also shows that Saccharomyces cerevisiae (S. cerevisiae), Caenorhabditis elegans (C. elegans) and Arabidopsis thaliana (A. thaliana) cluster together in this map, indicating that they share similar codon preferences and suggesting that S. cerevisiae would be a good candidate for expressing native A. thaliana or C. elegans genes.

Figure 1 also makes immediately obvious the considerable divergence between E. coli and human codon preferences. This confirms what many researchers have learned through extensive experimentation: E. coli is not the optimal host for expressing proteins encoded with a human codon usage profile.

The codon distribution in the map helps to visualize the codons that are used differentially by each of the eight organisms. For example, mammalian genes commonly use the AGG and AGA codons for Arg (each is used for 11.2% of Arg codons in human genes), whereas these are very rarely used in E. coli (2.1% and 2.4%, respectively). Thus, in Figure 1, AGG and AGA both contribute to positive deviations in principal component 2 (PC2), as is seen for the overall human codon preference. In contrast, E. coli prefers the CGT Arg codon (used 16.4% of the time, compared with 4.5% usage in human genes), so CGT contributes to negative deviations in PC2, as is seen for the overall E. coli codon bias. A map of ‘codon usage space’ is therefore useful, as it quickly identifies codons in genes derived from each organism that are used infrequently in a candidate host and will therefore be potentially problematic when attempting heterologous expression.
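The same comparison can be made gene by gene. The sketch below scans a coding sequence against a host codon-usage table and reports codons whose within-amino-acid frequency falls below a cut-off. The table values, the 10% cut-off and the function name are hypothetical placeholders, not data from Figure 1.

```python
# Hypothetical host codon-usage table: each value is the fraction of that amino
# acid's codons using the given codon. Values are illustrative only.
HOST_USAGE = {
    "ATG": 1.00,                                   # Met
    "AAA": 0.76, "AAG": 0.24,                      # Lys
    "CGT": 0.38, "CGC": 0.40, "CGA": 0.06,         # Arg
    "CGG": 0.10, "AGA": 0.04, "AGG": 0.02,
}


def flag_rare_codons(seq, usage, threshold=0.10):
    """Return (codon number, codon, host fraction) for codons below `threshold`.

    Codons missing from the usage table are treated as unused (fraction 0.0).
    """
    seq = seq.upper()
    flagged = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        fraction = usage.get(codon, 0.0)
        if fraction < threshold:
            flagged.append((i // 3 + 1, codon, fraction))
    return flagged


if __name__ == "__main__":
    for number, codon, fraction in flag_rare_codons("ATGAGGAAACGTAGACGA", HOST_USAGE):
        print(f"codon {number}: {codon} ({fraction:.0%} of its amino acid in the host)")
```

With this toy table the AGG, AGA and CGA codons are flagged, mirroring the kind of Arg codons the map highlights as problematic in E. coli.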

How does codon bias affect protein expression?

Codon usage has been identified as the single most important factor in prokaryotic gene expression 11. This is almost certainly because preferred codons correlate with the abundance of cognate tRNAs available within the cell. This relationship serves to optimize the translational system and to balance codon concentration with isoacceptor tRNA concentration 12. In E. coli, for example, tRNA-Arg4, which reads the infrequently used AGG and AGA codons for Arg, is present only at very low levels. It is likely that codon usage and tRNA isoacceptor concentrations have coevolved, and that the selection pressure for this coevolution is more pronounced for highly expressed genes than for genes expressed at low levels 13.

The coevolution of isoacceptor tRNAs with codon frequencies has even led in some cases to departures from the canonical genetic code 14. While comparative genomics studies are shedding new light on the ongoing evolution of the genetic code 15,16, the existence of slightly different codes in different organisms is a very significant barrier to heterologous expression. Indeed, some organisms, notably the ciliates that have played an important role in the elucidation of telomere biology, possess tRNAs that read the canonical stop codons TAA and TAG as Gln, so their genes cannot be expressed unmodified in hosts that use the canonical code.
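Once the alternative code is known, the recoding itself is mechanical. As a minimal sketch, assuming the ciliate nuclear code (NCBI translation table 6), in which TAA and TAG specify Gln and only TGA terminates, the function below rewrites TAA/TAG as the standard Gln codons CAA/CAG and checks that the protein is unchanged. The function names are illustrative.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}

# Ciliate nuclear code (NCBI translation table 6): TAA and TAG read as Gln,
# leaving TGA as the only stop codon.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")

# Standard-code Gln codons that replace the ciliate-specific ones.
RECODE = {"TAA": "CAA", "TAG": "CAG"}


def translate(seq, code):
    seq = seq.upper()
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def recode_for_standard_host(seq):
    """Rewrite a ciliate ORF so that a standard-code host makes the same protein."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    recoded = "".join(RECODE.get(c, c) for c in codons)
    # Safety check: the encoded protein must be identical under the two codes.
    if translate(recoded, STANDARD) != translate(seq, CILIATE):
        raise RuntimeError("recoding changed the encoded protein")
    return recoded


if __name__ == "__main__":
    gene = "ATGTAAAAATAGTGA"
    print(translate(gene, STANDARD))                            # M*K** (truncated)
    print(translate(recode_for_standard_host(gene), STANDARD))  # MQKQ*
```

Other variant codes need a different substitution table; for example, CTG is read as Ser rather than Leu in Candida albicans.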

Improving expression by modifying the host

If the negative effect of different codon biases on heterologous gene expression results from different tRNA levels, one solution appears to be to expand the host’s intracellular tRNA pool. This can be done by over-expressing genes encoding the rare tRNAs. For E. coli, the primary targets to facilitate expression of human genes are the argU gene encoding the minor tRNA-Arg4 that reads AGG and AGA codons, tRNA-Ile2 that reads AUA, tRNA-Leu3 that reads CUA and CUG, and tRNA-Pro2 that reads CCC and CCU 8. E. coli strains over-expressing these tRNA genes are commercially available from companies such as Stratagene (www.stratagene.com) and Novagen (…/html/NVG/home.html). Several laboratories have shown that expression yields of proteins whose genes contain rare codons can be dramatically improved when the cognate tRNA is increased within the host 8.

Even though tRNA over-expression initially appears to be an attractive solution, there are caveats. Different tRNAs may need to be over-expressed for genes from different organisms, and the strategy is less appealing for hosts that are more difficult to manipulate than E. coli. There may also be metabolic effects of changing a cell’s tRNA concentrations. Perhaps most important, though, is the question of how increasing the tRNA concentration will affect aminoacylation and tRNA modification, and thus whether the composition of the over-expressed protein will be consistent.

Transfer RNA molecules are extensively processed prior to aminoacylation and participation in the translational process. More than 30 modified nucleotides have been found in E. coli tRNAs; some are present at the same position in all tRNAs, while others are found in only one or a few different tRNAs 17. Many of the tRNA modifications scattered throughout the tRNA molecule, and especially those located in the anticodon loop, have been shown to improve reading frame maintenance 18. One purpose of these modifications is thought to be to reduce translational frameshifts: the lack of some tRNA modifications has been experimentally linked to missense and nonsense errors during translation 17. For example, tRNAs lacking the N-1 methylation of guanosine at position 37 (m1G37) cause translational frameshifts 19.

A problem with the tRNA over-expression strategy, then, is that producing a fully functional tRNA requires other cellular components that may be in limiting supply when the tRNA alone is over-produced. When tRNA-Leu1 is over-expressed in E. coli, the tRNA is significantly under-modified in at least two ways: m1G at position 37 and pseudouridine (Ψ) at position 32. Only 40% of the tRNA-Leu1 molecules are aminoacylated, the strain grows very slowly and the ribosomal step time is reduced two- to three-fold 20. Similarly, over-expression of tRNA-Tyr results in a decrease of the 2-methylthio-N6-isopentenyladenosine (ms2i6A) modification at position 37 and a tRNA that is less efficient in vitro 21. Loss of the ms2i6A modification following tRNA-Phe over-expression led to decreased fidelity of translation 22.

Translational missense substitution frequencies can increase by more than an order of magnitude as a function of under-acetylated tRNA. One particular concern over such loss of fidelity is the possibility that the resulting heterogeneous mixture of proteins might induce an immune response if introduced into vertebrates 23. In addition to translational fidelity and host metabolic load issues, the tRNA over-expression strategy is not terribly flexible. It is much more difficult to engineer fungal or mammalian host cells than E. coli. In eukaryotic cells, tRNA expression is driven by copy number, not promoter strength, further complicating the issue. For some applications, such as the emerging field of DNA vaccines, host engineering is quite out of the question. The alternative approach is to modify the gene to be expressed.

Results from codon optimization

In general, the more codons that a gene contains that are rarely used in the expression host, the less likely it is that the heterologous protein will be expressed at reasonable levels 8. Low expression levels are exacerbated if the rare codons appear in clusters or in the N-terminal part of the protein. A common strategy to improve expression is therefore to alter the rare codons in the target gene so that they more closely reflect the codon usage of the host, without modifying the amino acid sequence of the encoded protein. Techniques for achieving this range from sequential site-directed mutagenesis steps 24 to resynthesis of the entire gene 25.
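A minimal sketch of this strategy, assuming a simple rule not taken from any cited study: every codon whose host frequency falls below a threshold is swapped for the host's most frequently used synonymous codon, and the amino acid sequence is checked to be unchanged. The usage table, threshold and names are hypothetical, and the sketch does not model the extra penalty of rare-codon clusters or N-terminal placement.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Hypothetical host usage (fraction of each amino acid's codons); illustrative only.
HOST_USAGE = {
    "AAA": 0.76, "AAG": 0.24,
    "CGT": 0.38, "CGC": 0.40, "CGA": 0.06, "CGG": 0.10, "AGA": 0.04, "AGG": 0.02,
}


def translate(seq):
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def replace_rare_codons(seq, usage, threshold=0.10):
    """Swap codons used below `threshold` in the host for the host's top synonym.

    Codons for amino acids absent from the usage table are left untouched.
    """
    seq = seq.upper()
    best = {}  # amino acid -> most frequently used codon in the host table
    for codon, fraction in usage.items():
        aa = CODE[codon]
        if aa not in best or fraction > usage[best[aa]]:
            best[aa] = codon
    out = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODE[codon]
        if aa in best and usage.get(codon, 0.0) < threshold:
            codon = best[aa]
        out.append(codon)
    optimized = "".join(out)
    if translate(optimized) != translate(seq):
        raise RuntimeError("amino acid sequence changed")
    return optimized


if __name__ == "__main__":
    print(replace_rare_codons("ATGAGGAAACGTAGA", HOST_USAGE))  # ATGCGCAAACGTCGC
```

Only rare codons are touched, which keeps the change minimal; replacing every codon with the top synonym would be the ‘one amino acid – one codon’ design discussed under gene design considerations.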

In Table 1, we have attempted to identify all publications in which protein expression levels from natural gene sequences are compared with those from their codon-optimized counterparts in identical systems. The methods for codon optimization differ in each case, but all have replaced one or more codons that are rarely used in the host with codons that are more frequently used.

Many of the published codon optimization reports involve expressing mammalian proteins in E. coli. In several instances, the increases in expression levels achieved are dramatic. Two papers describe proteins that were effectively undetectable when expressed from the native genes; after codon optimization, expression levels of between 10% and 20% of total E. coli soluble protein were obtained 26,27. More typical increases in expression for codon-optimized mammalian proteins in E. coli are between five- and 15-fold, frequently yielding as much as 5% of the E. coli soluble protein.

Another very successful application of codon optimization, generally by complete resynthesis of the gene, is in enhancing the expression of viral genes in mammalian cell lines. Viruses are a particularly interesting example because their codons are often constrained by a completely different pressure: their very dense information load is frequently accommodated using overlapping reading frames. Many viral genes also encode cis-acting negative regulatory sequences within the coding sequence. When expression of only one protein is required, the gene can be resynthesized with a host codon bias that also disrupts the regulatory elements thereby enhancing protein production 28. Viral codon optimization is often performed for DNA vaccine research to increase the immunogenicity of the target. In many published studies the immune response to the injected DNA is measured but not the protein concentration. Some of these examples have been omitted from Table 1, which only lists publications where the protein concentration is measured directly.

Gene resynthesis is also essential for heterologous expression of genes from organisms that use non-canonical codes. These include pathogens such as Candida albicans 29 and ciliate model organisms such as Tetrahymena 30. Elimination of codons that would be read as termination signals or as different amino acids is essential not just to improve expression levels, but to achieve any expression at all of the encoded protein.

Beyond codon bias

Although the codon bias in a gene plays a large role in its expression, it would be misleading to suggest that this is the only factor involved. The choice of expression vector and transcriptional promoter is also important 3. The nucleotide sequences surrounding the N-terminal region of the protein appear particularly sensitive, both to the presence of rare codons 31,32 and to the identities of the codons immediately adjacent to the initiation AUG 33,34. There is also some interplay between translation and mRNA stability that has not been completely deconvoluted 2, although reduced translational efficiency may be accompanied by a lower mRNA level because decreased ribosomal protection of the mRNA increases its exposure to endoribonucleases. The structure of the 5’ end of the mRNA also has a significant effect 35, and strategies using short upstream open reading frames for translational coupling of target genes have proved successful in improving the efficiency of expression of some problem genes 36.

It should also be noted that efficient translation is necessary but not sufficient to produce a functional protein. The polypeptide chain must fold correctly, in some cases form appropriate disulphide bonds and even undergo post-translational modifications such as glycosylation. For these processes the absence of the correct redox environment, chaperonins, normal association partners or modifying enzymes will provide additional challenges. These issues are beyond the scope of this article: we will content ourselves for the time being with efficiently producing the polypeptide.

Gene design considerations

Designing a gene de novo can be both liberating and daunting. At the least constrained end of the choice spectrum, there is an enormous number of DNA sequences that can encode a single amino acid sequence. Each amino acid can be encoded by an average of about three different codons (61 sense codons for 20 amino acids), so there are around 3^100 (~5 x 10^47) nucleotide sequences that would all produce the same 100-amino-acid protein. How many of these possible sequences will result in high levels of heterologous protein expression? At the other end of this spectrum, only a single nucleotide sequence is possible: only one codon (the one used most frequently by the host) is used for each amino acid.
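The estimate above can be checked, and made exact for a particular protein, by multiplying the number of synonymous codons available for each residue. This sketch derives those counts from the standard genetic code; the function name and the example peptide are illustrative.

```python
from collections import Counter
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Number of sense codons for each amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def count_encodings(protein):
    """Number of distinct DNA sequences encoding `protein` (one-letter codes, no stop)."""
    return prod(DEGENERACY[aa] for aa in protein)


if __name__ == "__main__":
    print(sum(DEGENERACY.values()) / len(DEGENERACY))  # 61 / 20 = 3.05
    print(f"{3 ** 100:.2e}")                           # 5.15e+47
    print(count_encodings("MKWR"))                     # 1 * 2 * 1 * 6 = 12
```

For a real 100-residue protein the exact count depends on composition: Met- and Trp-rich stretches contribute a factor of 1 per residue, while Leu, Ser and Arg contribute 6 each.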

The ‘One amino acid – one codon’ approach has several drawbacks. First, a strongly transcribed mRNA from such a gene will place a heavy demand on a small subset of tRNAs, resulting in an imbalanced tRNA pool, a skewed codon usage pattern and the potential for translational error 23. Heterologously expressed proteins may be produced at levels as high as 60% of total cell mass, making reliance on a single tRNA species for each amino acid a significant problem. Introducing silent mutations into a ‘One amino acid – one codon’ optimized gene can increase protein expression four-fold 37. Second, with no flexibility in codon selection, it is impossible to avoid repetitive elements and secondary structures in the gene and mRNA, such as mRNA stem-loops that may inhibit ribosome processivity 35. Repetitive elements may also affect the ease of gene synthesis, making it more troublesome if performed in-house or more expensive and time-consuming if outsourced. Severe repetitive elements may also affect the stability of a gene in its host. Third, it is often desirable to incorporate or exclude sequence elements such as restriction sites from the sequence to facilitate subsequent manipulations. These are impossible to accommodate if the codon usage is rigidly fixed.

Conclusions

The genetic information encoded in an open reading frame goes far beyond simply stating the order of the amino acids in the protein. It is now estimated that alternative splicing affects 40-60% of all human multi-exon genes and that antisense transcription occurs in 10-20% of all genes; mRNA editing is common (at least in neural cells), regulatory elements are abundant, and mRNA degradation signals (acting through RNAi and other mechanisms) have been identified throughout the human genome. As we start to peel back the different layers of complex and integrated information present in the coding regions of DNA, we can start making more informed decisions about how to design genes and genetic networks.

The design and use of synthetic genes offers a mechanism by which researchers can assume much greater control of heterologous protein expression. As well as manipulating codon biases, peptide tags can be added, splice sites removed and restriction sites placed as desired. The cost and fidelity of gene synthesis appear to be following a trajectory similar to that seen for synthetic oligonucleotides over the past two decades, making synthetic genes increasingly cost-effective. This trend will allow scientists to focus more on science and less on obtaining the tools with which to work. The biotechnology industry is thus en route to closing the circle with its distant past: the genetic engineering tools pioneered by the Genentech group and their academic collaborators in 1977 will once again become state-of-the-art.

Acknowledgements

We would like to thank Drs. Jon Ness, Tony Cox, Ramasubbu Venkatesh and Tom Vigdal, all from DNA 2.0 Inc., for discussions on codon optimization and helpful comments on the manuscript.


| Gene origin | Protein name | Host | Improvement | Ref |
|---|---|---|---|---|
| H. sapiens | IL2 | E. coli | 16 fold | 38 |
| C. tetani | Fragment C | E. coli | Four fold | 39 |
| B. thuringiensis | CryIA(b), CryIA(c) | L. esculentum | 100 fold | 40 |
| B. thuringiensis | CryIA(b), CryIA(c) | Nicotiana tabacum | Below detection vs. >0.1% of total protein | 40 |
| M. musculus | IG kappa chain | S. cerevisiae | >50 fold | 41 |
| Bacillus hybrid | (1,3-1,4)-β-glucanase | H. vulgare | Below detection vs. 40 ng per 2 x 10^5 protoplasts | 42 |
| H. sapiens | TnT | E. coli | 10 and 40 fold (two different constructs) | 43 |
| HIV | Gp120 | H. sapiens | >40 fold | 44 |
| A. victoria | GFP | H. sapiens | Below detection vs. substantial signal | 44 |
| A. victoria | GFP | H. sapiens | 22 fold | 45 |
| A. victoria | Mutated GFP | C. albicans | Below detection vs. strong band in western | 29 |
| M. musculus | c-Fos | E. coli | Below detection vs. 20% of soluble protein | 27 |
| S. oleracea | Plastocyanin | E. coli | 1.2 fold | 46 |
| H. sapiens | Neurofibromin | E. coli | Three fold | 47 |
| L. monocytogenes | LLO | M. musculus | 100 fold | 48 |
| H. sapiens | M2-2 | E. coli | 140 fold | 49 |
| R. prowazekii | Tlc | E. coli | No effect | 50 |
| BPV1 | L1 and L2 | Mammalian | >1 x 10^3 fold | 51 |
| H. sapiens | PC-TP | E. coli | Trace levels vs. 10% of cytosolic protein | 26 |
| H. sapiens | hCG-β | Dictyostelium | Four-five fold | 52 |
| T. aestivum | CYP73A17 | S. cerevisiae | Four, seven and 13 fold (three different constructs) | 53 |
| T. aestivum | CYP73A17 | N. tabacum | Five fold | 53 |
| HIV | gag | H. sapiens | >322 fold | 54 |
| Dermatophagoides | ProDer p1 | P. troglodytes | Five-10 fold | 55 |
| HIV | gag | H. sapiens | 1.5-two fold | 28 |
| Plasmodium | EBA-175 region II and MSP-1 | M. musculus | Four fold | 56 |
| Tn10/Herpes simplex virus | rtTA | M. musculus | >20 fold | 57 |
| HPV | L1 | H. sapiens | 1 x 10^4 to 1 x 10^5 fold | 58 |
| C. diphtheriae – mammal hybrid | DT | P. pastoris | 0 vs. 10 mg L^-1 | 59 |
| P1 phage | Cre | Mammalian | 1.6 fold | 60 |
| A. equina | Equistatin | P. pastoris | Two fold | 61 |
| H. sapiens | IL-6 | E. coli | Three fold | 62 |
| H. sapiens | Glucocerebrosidase | Pichia pastoris | Eight and 10 fold (two different constructs) | 63 |
| Schistosoma mansoni | SmGPCR | H. sapiens | Barely detectable vs. strong band in western | 64 |
| C. elegans | GluClα1, GluClβ | R. norvegicus | Six-nine fold | 65 |
| Herpesvirus | U51 | Mammalian | 10-100 fold | 66 |
| HIV | gag, pol, env, nef | H. sapiens | >250x, >250x, >45x, >20x respectively | 67 |
| H. sapiens | IL-18 | E. coli | Five fold | 68 |
| HPV | E5 | Mammalian | Six-nine fold | 69 |
| HPV | E7 | Mammalian | 20-100 fold | 70 |
| Plasmodium | F2 domain of EBA175 | E. coli, Pichia pastoris | Four fold and nine fold | 71 |

Table 1. Compilation of publications in which expression from codon-optimized and wild-type sequences has been compared head-to-head and the yield of the produced protein has been measured.


Figure 1. Graphical representation of “codon usage space”. Principal component analysis (PCA) is a mathematical procedure that transforms a number of correlated variables (here codon frequencies) into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The frequencies with which each codon is used in all proteins of eight commonly studied organisms (www.kazusa.or.jp/codon/) were tabulated in an 8 x 62 matrix (rows: organisms; columns: codons) and subjected to principal component analysis to produce a map of “codon usage space”. The two codons ATG and TGG, which uniquely encode Met and Trp respectively, have been omitted. Two dimensions were identified that accounted for 70% (PC1) and 12% (PC2) of the total codon variability, respectively. The black diamonds represent the loadings, i.e. the contribution of each codon to the two principal component dimensions (for example, codons GAT and CAG contribute nothing to PC2 but have approximately equal negative and positive contributions to PC1). The values of the codon loadings have been normalized to that of the organism distribution. The red squares show the preferences of each organism plotted within this space. The plot was made using MATLAB from MathWorks.
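The PCA step in the caption can be sketched with NumPy's singular value decomposition. This is an illustration under stated assumptions, not the authors' MATLAB script: columns are only mean-centred (the caption does not say whether they were also scaled), the loading normalization described in the caption is not reproduced, and random numbers stand in for the real 8 x 62 frequency table.

```python
import numpy as np


def codon_pca(freqs, n_components=2):
    """PCA of an (organisms x codons) frequency matrix via SVD.

    Returns organism scores, codon loadings and the fraction of variance
    explained by each returned component.
    """
    X = np.asarray(freqs, dtype=float)
    centred = X - X.mean(axis=0)                 # centre each codon column
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)          # variance fraction per component
    scores = U[:, :n_components] * S[:n_components]   # organism positions
    loadings = Vt[:n_components].T                      # codon contributions
    return scores, loadings, explained[:n_components]


if __name__ == "__main__":
    # Random stand-in for the 8 organisms x 62 codons table (not real usage data).
    rng = np.random.default_rng(0)
    table = rng.random((8, 62))
    scores, loadings, explained = codon_pca(table)
    print(scores.shape, loadings.shape, explained.round(2))
```

The signs of principal components from an SVD are arbitrary, so a map built this way may appear mirrored relative to Figure 1 without any change in meaning.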


The gene design process

The procedure developed at DNA 2.0 Inc. for designing a gene sequence to encode a specific protein is shown in the accompanying figure. The process involves using an initial codon usage table to propose candidates, then a successive set of filters to eliminate those sequences that do not also comply with additional design constraints.

1. Constructing and using a codon usage table. The large amount of genomic sequence now available has made it possible to derive the codon usage for any sequenced organism. An excellent compilation can be found at www.kazusa.or.jp/codon/. For expression in E. coli, for example, codon usage for highly expressed (type II) genes is available at …/~mmaduro/codonusage/codontable.htm 72. These tables can be adapted for gene design in two steps. First, a threshold level is set: all frequencies below a certain value (typically between 5% and 10%) are set to zero, so that rare codons are completely eliminated. Second, the remaining frequencies are normalized so that the frequencies of the codons for each amino acid sum to 100%.
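The two adaptation steps can be sketched as follows. The nested {amino acid: {codon: fraction}} layout, the function name and the host values are assumptions made for illustration; dropping a codon is equivalent to setting its frequency to zero.

```python
def adapt_usage_table(usage, threshold=0.10):
    """Drop codons below `threshold` and renormalise each amino acid to 100%.

    `usage` maps amino acid -> {codon: fraction}. If every codon of an amino
    acid falls below the threshold, the most used one is kept (an added
    safeguard, not part of the procedure described in the text).
    """
    adapted = {}
    for aa, codons in usage.items():
        kept = {c: f for c, f in codons.items() if f >= threshold}
        if not kept:
            top = max(codons, key=codons.get)
            kept = {top: codons[top]}
        total = sum(kept.values())
        adapted[aa] = {c: f / total for c, f in kept.items()}
    return adapted


if __name__ == "__main__":
    # Hypothetical host frequencies, for illustration only.
    host = {
        "K": {"AAA": 0.76, "AAG": 0.24},
        "R": {"CGT": 0.38, "CGC": 0.40, "CGA": 0.06,
              "CGG": 0.10, "AGA": 0.04, "AGG": 0.02},
    }
    for aa, codons in adapt_usage_table(host).items():
        print(aa, {c: round(f, 3) for c, f in codons.items()})
```

With a 10% threshold, the Arg entry keeps CGT, CGC and CGG and rescales them so they again sum to 100%.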

Hybrid codon usage tables can be constructed for a protein that is to be expressed in more than one host. Codons that are below the threshold in either host are eliminated. The frequencies for the remaining codons can be calculated by simply using the frequencies for the most restrictive organism, or by calculating a mean value for each codon in all of the desired hosts.
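Both options for a hybrid table can be sketched in one function. The table layout and names are hypothetical; "most restrictive organism" is interpreted here as a host chosen by the caller, whose frequencies are then used for the surviving codons.

```python
def hybrid_usage_table(tables, threshold=0.10, use_host=None):
    """Combine per-host tables ({aa: {codon: fraction}}) for multi-host expression.

    A codon survives only if it reaches `threshold` in every host. Surviving
    codons take either the mean fraction across hosts (use_host=None) or the
    fractions of one chosen host (use_host=index). Each amino acid is then
    renormalised to 100%.
    """
    hybrid = {}
    for aa in tables[0]:
        allowed = set.intersection(*[
            {c for c, f in t.get(aa, {}).items() if f >= threshold} for t in tables
        ])
        if not allowed:
            raise ValueError(f"no codon for {aa} passes the threshold in every host")
        if use_host is None:
            raw = {c: sum(t[aa][c] for t in tables) / len(tables) for c in allowed}
        else:
            raw = {c: tables[use_host][aa][c] for c in allowed}
        total = sum(raw.values())
        hybrid[aa] = {c: f / total for c, f in raw.items()}
    return hybrid


if __name__ == "__main__":
    # Two hypothetical hosts with different Arg preferences.
    host_a = {"R": {"CGT": 0.38, "CGC": 0.40, "CGA": 0.06,
                    "CGG": 0.10, "AGA": 0.04, "AGG": 0.02}}
    host_b = {"R": {"CGT": 0.10, "CGC": 0.20, "CGA": 0.05,
                    "CGG": 0.05, "AGA": 0.30, "AGG": 0.30}}
    print(hybrid_usage_table([host_a, host_b]))
```

Requiring every host to pass the threshold can leave few codons per amino acid, which is why the function raises an error rather than silently returning an empty entry.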

Once the codon usage table has been constructed, candidate sequences are enumerated in silico by selecting codons at random with probabilities obtained from the codon usage table. Each designed sequence is then passed through subsequent filters to ensure a match with additional design criteria.
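Drawing candidates in this way is a weighted random choice per residue. A minimal sketch using Python's standard `random.choices`; the table values, the stop symbol `*` and the function name are hypothetical.

```python
import random


def sample_candidate(protein, table, rng):
    """Draw one codon per residue with probability equal to its table frequency."""
    codons = []
    for aa in protein:
        choices = table[aa]
        codons.append(rng.choices(list(choices), weights=list(choices.values()))[0])
    return "".join(codons)


if __name__ == "__main__":
    # Hypothetical adapted table ({aa: {codon: fraction}}); '*' denotes stop.
    table = {
        "M": {"ATG": 1.0},
        "K": {"AAA": 0.75, "AAG": 0.25},
        "R": {"CGT": 0.5, "CGC": 0.5},
        "*": {"TAA": 1.0},
    }
    rng = random.Random(42)  # seeded so the candidates are reproducible
    for _ in range(3):
        print(sample_candidate("MKRK*", table, rng))
```

Because each codon is drawn independently, the candidates match the table's proportions on average, in contrast to the single fixed sequence of a ‘one amino acid – one codon’ design.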

2. Eliminating unfavorable codon pairs and extreme GC content. The GC content of genes and the frequency with which adjacent codons occur (codon pair frequency) are both correlated with codon usage frequency. The codon pair frequency can deviate significantly from what would be expected from the statistical distribution of each single codon alone. Codon pairs that are avoided in highly expressed E. coli genes are listed on the web (www.bio21.bas.bg/codonpairs) 73, and their presence is used as a criterion to reject candidate designs.
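This filter can be sketched as a simple predicate. The GC window and the avoided-pair set below are hypothetical placeholders; a real design would load the published pair list (ref. 73) and host-appropriate GC limits.

```python
def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def adjacent_codon_pairs(seq):
    """All in-frame pairs of neighbouring codons."""
    seq = seq.upper()
    return [(seq[i:i + 3], seq[i + 3:i + 6]) for i in range(0, len(seq) - 5, 3)]


def passes_pair_and_gc_filter(seq, avoided_pairs, gc_min=0.40, gc_max=0.60):
    """Reject candidates with extreme GC content or any avoided codon pair."""
    if not gc_min <= gc_content(seq) <= gc_max:
        return False
    return not any(pair in avoided_pairs for pair in adjacent_codon_pairs(seq))


if __name__ == "__main__":
    avoided = {("AAA", "CGT")}  # hypothetical avoided pair
    print(passes_pair_and_gc_filter("ATGGCCAAACGTTGA", avoided))  # False: AAA-CGT
    print(passes_pair_and_gc_filter("ATGGCCAAGCGTTGA", avoided))  # True
```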

3. Eliminating repetitive sequences. Direct repeats can be detected by standard methods such as a BLAST comparison 74 of the sequence against itself. Candidate designs that contain significant lengths of direct repeats are eliminated.
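As a lightweight stand-in for a BLAST self-comparison (a swapped-in technique, not the method named above), exact direct repeats can be found by indexing every k-mer. The repeat length k=12 is a hypothetical cut-off.

```python
def direct_repeats(seq, k=12):
    """Report (first position, repeat position, k-mer) for every repeated k-mer.

    An exact k-mer scan, used here as a simple stand-in for a BLAST
    self-comparison; it does not detect imperfect repeats.
    """
    seq = seq.upper()
    first_seen = {}
    repeats = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in first_seen:
            repeats.append((first_seen[kmer], i, kmer))
        else:
            first_seen[kmer] = i
    return repeats


def has_long_repeat(seq, k=12):
    """Rejection test: True if any k-mer occurs more than once."""
    return bool(direct_repeats(seq, k))


if __name__ == "__main__":
    print(direct_repeats("GGATCCAAAGGATCC", k=6))  # GGATCC at 0 and 9
```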

4. Avoiding unfavorable mRNA secondary structures. Stable mRNA structures, particularly at the 5’ end of the transcript, have been implicated in reduced gene expression 2,35. The potential of a transcribed mRNA to adopt such a structure can be identified using free-energy calculations. Software for performing such analyses can be found at …/applications/mfold 75.

5. Avoiding and including restriction sites. The presence or absence of selected restriction sites is often important to facilitate subsequent gene manipulations such as swapping between vectors, exchanging protein domains and adding or removing peptide tags or fusion partners. Candidate sequences can be tested to ensure the correct placement or elimination of restriction sites.
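A restriction-site check reduces to string searching on both strands. In this sketch, required sites must occur exactly once and forbidden sites must be absent; the example uses the standard BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences, while the function names and the candidate sequence are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_positions(seq, site):
    """Start positions of a recognition site on either strand (top-strand coordinates)."""
    seq = seq.upper()
    hits = set()
    for motif in {site.upper(), reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.add(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def restriction_problems(seq, required, forbidden):
    """`required` sites must occur exactly once; `forbidden` sites must be absent."""
    problems = []
    for name, site in required.items():
        hits = site_positions(seq, site)
        if len(hits) != 1:
            problems.append(f"{name}: expected one site, found {len(hits)}")
    for name, site in forbidden.items():
        hits = site_positions(seq, site)
        if hits:
            problems.append(f"{name}: unwanted site at {hits}")
    return problems


if __name__ == "__main__":
    candidate = "GGATCC" + "ATGAAACGT" + "GAATTC"
    print(restriction_problems(candidate,
                               required={"BamHI": "GGATCC"},
                               forbidden={"EcoRI": "GAATTC"}))
```

Searching the reverse complement matters only for non-palindromic recognition sequences; BamHI and EcoRI sites read the same on both strands.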

6. Other constraints. Additional constraints that can be used to sift the gene design solutions include avoiding cryptic splice sites and regulatory elements, immunostimulatory or immunosuppressive elements (for DNA vaccines) 76, RNA methylation signals, selenocysteine incorporation signals and many more, depending on the biological system used and specific concerns. Gene designs can also be chosen to maximize genetic distances from endogenous gene homologs (to minimize the risk of in vivo recombination) or from patented sequences (to avoid patent infringement).

As shown in the figure, each of these filters reduces the number of possible sequences, but many possible sequences generally remain even with five or six constraints in addition to the codon bias.

[Figure: The gene design process. In silico: >10^47 possibilities → adjust codon bias & eliminate unfavorable pairs (10^30 possibilities) → eliminate repeats (10^15 possibilities) → eliminate unfavorable mRNA structures (10^12 possibilities) → add / remove restriction sites (10^8 possibilities) → other constraints (10^4 possibilities) → wet lab.]
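The cascade of filters amounts to a generate-and-filter loop. The sketch below samples distinct candidates from a codon table, applies each filter in turn and records how many survive each stage. The table, protein, thresholds and filter choices are hypothetical, and the mRNA-structure filter is omitted because it requires folding software such as mfold.

```python
import random

# Hypothetical adapted codon table ({aa: {codon: fraction}}); '*' is the stop.
TABLE = {
    "M": {"ATG": 1.0},
    "E": {"GAA": 0.7, "GAG": 0.3},
    "F": {"TTT": 0.5, "TTC": 0.5},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "R": {"CGT": 0.4, "CGC": 0.4, "CGG": 0.2},
    "*": {"TAA": 1.0},
}


def sample(protein, table, rng):
    """One weighted random codon per residue."""
    return "".join(rng.choices(list(table[aa]), weights=list(table[aa].values()))[0]
                   for aa in protein)


def gc_ok(seq, lo=0.30, hi=0.70):
    return lo <= (seq.count("G") + seq.count("C")) / len(seq) <= hi


def no_repeat(seq, k=8):
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(kmers) == len(set(kmers))


def no_ecori(seq):
    return "GAATTC" not in seq  # EcoRI site is palindromic: one strand suffices


FILTERS = [("GC 30-70%", gc_ok), ("no repeated 8-mer", no_repeat), ("no EcoRI site", no_ecori)]


def design(protein, table, filters, n_candidates=1000, seed=0):
    """Sample distinct candidates, then apply each filter, recording survivors."""
    rng = random.Random(seed)
    pool = list(dict.fromkeys(sample(protein, table, rng) for _ in range(n_candidates)))
    counts = [("sampled (distinct)", len(pool))]
    for name, keep in filters:
        pool = [s for s in pool if keep(s)]
        counts.append((name, len(pool)))
    return pool, counts


if __name__ == "__main__":
    survivors, counts = design("MEFKRKEFR*", TABLE, FILTERS)
    for stage, n in counts:
        print(f"{stage:>20}: {n}")
```

Sampling only explores a tiny fraction of the full sequence space, so the survivor counts illustrate the narrowing in the figure rather than reproduce its numbers.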


Refs

1 Itakura, K. et al. (1977) Expression in Escherichia coli

of a chemically synthesized gene for the hormone somatostatin. Science 198 (4321), 1056-1063

2 Wu, X. et al. (2004) Codon optimization reveals critical

factors for high level expression of two rare codon genes in Escherichia coli: RNA stability and secondary structure but not tRNA abundance. Biochem Biophys Res Commun 313 (1), 89-96.

3 Higgins, S.J., Hames, B. D. (1999) Protein

Expression: A Practical Approach, Oxford University Press

4 Gouy, M. and Gautier, C. (1982) Codon usage in

bacteria: correlation with gene expressivity. Nucleic Acids Res 10 (22), 7055-7074.

5 Grosjean, H. and Fiers, W. (1982) Preferential codon

usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene18 (3), 199-209.

6 Knight, R.D. et al. (2001) A simple model based on

mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2 (4), RESEARCH0010. Epub 2001 Mar 0022.

7 Andersson, G.E. and Kurland, C.G. (1991) An extreme codon preference strategy: codon reassignment. Mol Biol Evol 8 (4), 530-544.

8 Kane, J.F. (1995) Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr Opin Biotechnol 6 (5), 494-500.

9 Sharp, P.M. and Li, W.H. (1987) The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15 (3), 1281-1295.

10 Carbone, A. et al. (2003) Codon adaptation index as a measure of dominating codon bias. Bioinformatics 19 (16), 2005-2015.

11 Lithwick, G. and Margalit, H. (2003) Hierarchy of sequence-dependent features associated with prokaryotic translation. Genome Res 13 (12), 2665-2673.

12 Ikemura, T. (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151 (3), 389-409.

13 Bulmer, M. (1987) Coevolution of codon usage and transfer RNA abundance. Nature 325 (6106), 728-730.

14 Massey, S.E. et al. (2003) Comparative evolutionary genomics unveils the molecular mechanism of reassignment of the CTG codon in Candida spp. Genome Res 13 (4), 544-557.

15 Santos, M.A.S. et al. (2004) Driving change: the evolution of alternative genetic codes. Trends Genet 20, 95-102.

16 Knight, R.D. et al. (2001) Rewiring the keyboard: evolvability of the genetic code. Nat Rev Genet 2 (1), 49-58.

17 Björk, G.R. (1996) Stable RNA modification. In Escherichia coli and Salmonella: Cellular and Molecular Biology (Neidhardt, F.C. et al., eds.), pp. 861-886, ASM Press.

18 Urbonavicius, J. et al. (2001) Improvement of reading frame maintenance is a common function for several tRNA modifications. EMBO J 20 (17), 4863-4873.

19 Li, J.N. and Björk, G.R. (1995) 1-Methylguanosine deficiency of tRNA influences cognate codon interaction and metabolism in Salmonella typhimurium. J Bacteriol 177, 6593-6600.

20 Wahab, S.Z. et al. (1993) Effects of tRNA(1Leu) overproduction in Escherichia coli. Mol Microbiol 7 (2), 253-263.

21 Gefter, M.L. and Russell, R.L. (1969) Role modifications in tyrosine transfer RNA: a modified base affecting ribosome binding. J Mol Biol 39 (1), 145-157.

22 Wilson, R.K. and Roe, B.A. (1989) Presence of the hypermodified nucleotide N6-(delta 2-isopentenyl)-2-methylthioadenosine prevents codon misreading by Escherichia coli phenylalanyl-transfer RNA. Proc Natl Acad Sci U S A 86 (2), 409-413.

23 Kurland, C. and Gallant, J. (1996) Errors of heterologous protein expression. Curr Opin Biotechnol 7 (5), 489-493.

24 Kink, J.A. et al. (1991) Efficient expression of the Paramecium calmodulin gene in Escherichia coli after four TAA-to-CAA changes through a series of polymerase chain reactions. J Protozool 38 (5), 441-447.

25 Nambiar, K.P. et al. (1984) Total synthesis and cloning of a gene coding for the ribonuclease S protein. Science 223 (4642), 1299-1301.

26 Feng, L. et al. (2000) High-level expression and mutagenesis of recombinant human phosphatidylcholine transfer protein using a synthetic gene: evidence for a C-terminal membrane binding domain. Biochemistry 39 (50), 15399-15409.

27 Deng, T. (1997) Bacterial expression and purification of biologically active mouse c-Fos proteins by selective codon optimization. FEBS Lett 409 (2), 269-272.

28 Graf, M. et al. (2000) Concerted action of multiple cis-acting sequences is required for Rev dependence of late human immunodeficiency virus type 1 gene expression. J Virol 74 (22), 10822-10826.

29 Cormack, B.P. et al. (1997) Yeast-enhanced green fluorescent protein (yEGFP): a reporter of gene expression in Candida albicans. Microbiology 143 (Pt 2), 303-311.

30 Collins, K. and Gandhi, L. (1998) The reverse transcriptase component of the Tetrahymena telomerase ribonucleoprotein complex. Proc Natl Acad Sci U S A 95 (15), 8485-8490.

31 Hoekema, A. et al. (1987) Codon replacement in the PGK1 gene of Saccharomyces cerevisiae: experimental approach to study the role of biased codon usage in gene expression. Mol Cell Biol 7 (8), 2914-2924.

32 Deana, A. et al. (1998) Silent mutations in the Escherichia coli ompA leader peptide region strongly affect transcription and translation in vivo. Nucleic Acids Res 26 (20), 4778-4782.

33 Sato, T. et al. (2001) Codon and base biases after the initiation codon of the open reading frames in the Escherichia coli genome and their influence on the translation efficiency. J Biochem (Tokyo) 129 (6), 851-860.

34 Stenstrom, C.M. and Isaksson, L.A. (2002) Influences on translation initiation and early elongation by the messenger RNA region flanking the initiation codon at the 3' side. Gene 288 (1-2), 1-8.

35 Griswold, K.E. et al. (2003) Effects of codon usage versus putative 5'-mRNA structure on the expression of Fusarium solani cutinase in the Escherichia coli cytoplasm. Protein Expr Purif 27 (1), 134-142.

36 Ishida, M. et al. (2002) Overexpression in Escherichia coli of the AT-rich trpA and trpB genes from the hyperthermophilic archaeon Pyrococcus furiosus. FEMS Microbiol Lett 216 (2), 179-183.

37 Klasen, M. and Wabl, M. (2004) Silent point mutation in DsRed resulting in enhanced relative fluorescence intensity. BioTechniques 36 (2), 236-237.

38 Williams, D.P. et al. (1988) Design, synthesis and expression of a human interleukin-2 gene incorporating the codon usage bias found in highly expressed Escherichia coli genes. Nucleic Acids Res 16 (22), 10453-10467.

39 Makoff, A.J. et al. (1989) Expression of tetanus toxin fragment C in E. coli: high level expression by removing rare codons. Nucleic Acids Res 17 (24), 10191-10202.

40 Perlak, F.J. et al. (1991) Modification of the coding sequence enhances plant expression of insect control protein genes. Proc Natl Acad Sci U S A 88 (8), 3324-3328.

41 Kotula, L. and Curtis, P.J. (1991) Evaluation of foreign gene codon optimization in yeast: expression of a mouse IG kappa chain. Biotechnology (N Y) 9 (12), 1386-1389.

42 Jensen, L.G. et al. (1996) Transgenic barley expressing a protein-engineered, thermostable (1,3-1,4)-beta-glucanase during germination. Proc Natl Acad Sci U S A 93 (8), 3487-3491.

43 Hu, X. et al. (1996) Specific replacement of consecutive AGG codons results in high-level expression of human cardiac troponin T in Escherichia coli. Protein Expr Purif 7 (3), 289-293.

44 Haas, J. et al. (1996) Codon usage limitation in the expression of HIV-1 envelope glycoprotein. Curr Biol 6 (3), 315-324.

45 Zolotukhin, S. et al. (1996) A "humanized" green fluorescent protein cDNA adapted for high-level expression in mammalian cells. J Virol 70 (7), 4646-4654.

46 Ejdeback, M. et al. (1997) Effects of codon usage and vector-host combinations on the expression of spinach plastocyanin in Escherichia coli. Protein Expr Purif 11 (1), 17-25.

47 Hale, R.S. and Thompson, G. (1998) Codon optimization of the gene encoding a domain from human type 1 neurofibromin protein results in a threefold improvement in expression level in Escherichia coli. Protein Expr Purif 12 (2), 185-188.

48 Uchijima, M. et al. (1998) Optimization of codon usage of plasmid DNA vaccine is required for the effective MHC class I-restricted T cell responses against an intracellular bacterium. J Immunol 161 (10), 5594-5599.

49 Johansson, A.S. et al. (1999) Use of silent mutations in cDNA encoding human glutathione transferase M2-2 for optimized expression in Escherichia coli. Protein Expr Purif 17 (1), 105-112.

50 Alexeyev, M.F. and Winkler, H.H. (1999) Gene synthesis, bacterial expression and purification of the Rickettsia prowazekii ATP/ADP translocase. Biochim Biophys Acta 1419 (2), 299-306.

51 Zhou, J. et al. (1999) Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J Virol 73 (6), 4972-4982.

52 Vervoort, E.B. et al. (2000) Optimizing heterologous expression in dictyostelium: importance of 5' codon adaptation. Nucleic Acids Res 28 (10), 2069-2074.

53 Batard, Y. et al. (2000) Increasing expression of P450 and P450-reductase proteins from monocots in heterologous systems. Arch Biochem Biophys 379 (1), 161-169.

54 zur Megede, J. et al. (2000) Increased expression and immunogenicity of sequence-modified human immunodeficiency virus type 1 gag gene. J Virol 74 (6), 2628-2635.

55 Massaer, M. et al. (2001) High-level expression in mammalian cells of recombinant house dust mite allergen ProDer p 1 with optimized codon usage. Int Arch Allergy Immunol 125 (1), 32-43.

56 Narum, D.L. et al. (2001) Codon optimization of gene fragments encoding Plasmodium falciparum merozoite proteins enhances DNA vaccine protein expression and immunogenicity in mice. Infect Immun 69 (12), 7250-7253.

57 Valencik, M.L. and McDonald, J.A. (2001) Codon optimization markedly improves doxycycline regulated gene expression in the mouse heart. Transgenic Res 10 (3), 269-275.

58 Leder, C. et al. (2001) Enhancement of capsid gene expression: preparing the human papillomavirus type 16 major structural gene L1 for DNA vaccination purposes. J Virol 75 (19), 9201-9209.

59 Woo, J.H. et al. (2002) Gene optimization is necessary to express a bivalent anti-human anti-T cell immunotoxin in Pichia pastoris. Protein Expr Purif 25 (2), 270-282.

60 Shimshek, D.R. et al. (2002) Codon-improved Cre recombinase (iCre) expression in the mouse. Genesis 32 (1), 19-26.

61 Outchkourov, N.S. et al. (2002) Optimization of the expression of equistatin in Pichia pastoris. Protein Expr Purif 24 (1), 18-24.

62 Li, Y. et al. (2002) Cloning and hemolysin-mediated secretory expression of a codon-optimized synthetic human interleukin-6 gene in Escherichia coli. Protein Expr Purif 25 (3), 437-447.

63 Sinclair, G. and Choy, F.Y. (2002) Synonymous codon usage bias and the expression of human glucocerebrosidase in the methylotrophic yeast, Pichia pastoris. Protein Expr Purif 26 (1), 96-105.

64 Hamdan, F.F. et al. (2002) Codon optimization improves heterologous expression of a Schistosoma mansoni cDNA in HEK293 cells. Parasitol Res 88 (6), 583-586.

65 Slimko, E.M. and Lester, H.A. (2003) Codon optimization of Caenorhabditis elegans GluCl ion channel genes for mammalian cells dramatically improves expression levels. J Neurosci Methods 124 (1), 75-81.

66 Bradel-Tretheway, B.G. et al. (2003) Effects of codon-optimization on protein expression by the human herpesvirus 6 and 7 U51 open reading frame. J Virol Methods 111 (2), 145-156.

67 Gao, F. et al. (2003) Codon usage optimization of HIV type 1 subtype C gag, pol, env, and nef genes: in vitro expression and immune responses in DNA-vaccinated mice. AIDS Res Hum Retroviruses 19 (9), 817-823.

68 Li, A. et al. (2003) Optimized gene synthesis and high expression of human interleukin-18. Protein Expr Purif 32 (1), 110-118.

69 Disbrow, G.L. et al. (2003) Codon optimization of the HPV-16 E5 gene enhances protein expression. Virology 311 (1), 105-114.

70 Cid-Arregui, A. et al. (2003) A synthetic E7 gene of human papillomavirus type 16 that yields enhanced expression of the protein in mammalian cells and is useful for DNA immunization studies. J Virol 77 (8), 4928-4937.

71 Yadava, A. and Ockenhouse, C.F. (2003) Effect of codon optimization on expression levels of a functionally folded malaria vaccine candidate in prokaryotic and eukaryotic expression systems. Infect Immun 71 (9), 4961-4969.

72 Henaut, A. and Danchin, A. (1996) Analysis and predictions from Escherichia coli sequences. In Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology (Vol. 2) (Neidhardt, F.C. et al., eds.), pp. 2047-2066, ASM Press.

73 Boycheva, S. et al. (2003) Codon pairs in the genome of Escherichia coli. Bioinformatics 19 (8), 987-998.

74 Altschul, S.F. et al. (1990) Basic local alignment search tool. J Mol Biol 215 (3), 403-410.

75 Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31 (13), 3406-3415.

76 Satya, R.V. et al. (2003) A pattern matching algorithm for codon optimization and CpG motif engineering in DNA expression vectors. In The Second International IEEE Computer Society Computational Systems Bioinformatics Conference (Titsworth, F., ed.), pp. 294-305, The Institute of Electrical and Electronics Engineers, Inc.
