2013 BFng2569 MOESM20 ESM
Zhenhua Peng1,4, Ying Lu2,4, Lubin Li1,4, Qiang Zhao2,4, Qi Feng2,4, Zhimin Gao3,4,
Hengyun Lu2, Tao Hu3, Na Yao1, Kunyan Liu2, Yan Li2, Danlin Fan2, Yunli Guo2,
Wenjun Li2, Yiqi Lu2, Qijun Weng2, Congcong Zhou2, Lei Zhang2, Tao Huang2, Yan
Zhao2, Chuanrang Zhu2, Xinge Liu3, Xuewen Yang3, Tao Wang1, Kun Miao1, Caiyun
Zhuang1, Xiaolu Cao1, Wenli Tang3, Guanshui Liu3, Yingli Liu3, Jie Chen1, Zhenjing
Liu1, Licai Yuan3, Zhenhua Liu1, Xuehui Huang2, Tingting Lu2, Benhua Fei3, Zemin
Ning2, Bin Han2* & Zehui Jiang1,3*
1 Research Institute of Forestry, Chinese Academy of Forestry, Key Laboratory of Tree
Breeding and Cultivation, State Forestry Administration, Beijing 100091, China.
2 National Center for Gene Research, Shanghai Institute of Plant Physiology and
Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences,
500 Caobao Road, Shanghai 200233, China.
3 International Center for Bamboo and Rattan, 8 Fu Tong Dong Da Jie, Chaoyang
District, Beijing 100102, China.
4 These authors contributed equally to this work.
* Correspondence should be addressed to B.H. (bhan@ncgr.ac.cn) or Z.J.
(jiangzehui@icbr.ac.cn).
Repeat annotation.
De novo repeat annotation revealed that approximately 59% of the moso bamboo
genome consists of transposable elements (TEs). Detection of TEs in the
Sanger-BACs showed a TE content of 53%, similar to that of the whole genome.
Compared with other grass species, the moso bamboo genome has a TE content
similar to that of sorghum (62%) (main text ref. 36), higher than that of
rice (40%) (main text ref. 18) and Brachypodium (28%) (ref. 10), but lower than
that of maize (84%) (main text refs. 36, 26). Among the observed TEs,
retrotransposons were the dominant repetitive sequences (39%), followed by DNA
transposons (9.5%). As in the rice, sorghum and maize genomes, the most abundant
repeats in bamboo were long terminal repeat (LTR) elements: 24.6% Gypsy-type
LTRs and 12.3% Copia-type LTRs.
The bamboo genome has the highest copy numbers of TEs, Gypsy/Copia-type LTR
retrotransposons and En/Spm transposons. Rice and sorghum have the highest copy
numbers of MITE transposons (Tourist & Stowaway), and Harbinger transposons
Supplementary Figure 12 Phylogenetic trees of the CesA and Csl gene families among
Arabidopsis, poplar, rice, maize, sorghum, Brachypodium, and bamboo. (a) NJ tree
of CesAs. A, B, C, D, E, F, and G indicate the 7 clades in which the bamboo genes
are located. (b) NJ tree of Csls. Different subfamilies are shown in different
colors. (c) NJ tree of CCR genes. (d) NJ tree of HCT genes. The bamboo genes are
labeled with red points. The numbers beside the branches are bootstrap percentages.
[Figure: RPKM of Dof and MADS14 in samples S20, S50, RH, RT, LF, P1 and P2.]
2,400 - 3,600 13
6,800 - 8,700 12
†
13,500 - 17,500 2
Scaffold 2 328,698 62,052 4,869,017 2,051,719,643
120,000 (10,327 pairs of BAC-ends) 0.66
† Final scaffolds shorter than 500 bp were excluded.
GenBank Accession No. | Length (bp) | Coverage of all matches | Coverage of single best match
GenBank Accession No. | Length (bp) | Coverage of all matches | Coverage of single best match | Identities in aligned region | Description
gi|145845825|gb|EF549577.1| 1,035 0.960 0.884 0.960 cinnamyl alcohol dehydrogenase
gi|145845829|gb|EF549579.1| 801 0.983 0.983 0.997 caffeoyl-CoA O-methyltransferase
gi|162568699|gb|EU295482.1| 1,130 0.983 0.947 0.998 DRE-binding protein DREB2 (DREB2)
gi|169743367|gb|EU366146.1| 1,060 0.983 0.983 0.993 chloroplast chlorophyll a/b binding protein
gi|169743369|gb|EU366147.1| 1,141 0.976 0.976 0.990 chloroplast chlorophyll a/b binding protein
gi|175050407|gb|EF549578.2| 2,302 0.984 0.984 0.894 phenylalanine ammonia-lyase
gi|190694830|gb|EU780143.1| 554 0.978 0.978 0.994 chloroplast chlorophyll a/b binding protein
gi|195546525|gb|EU860441.1| 1,224 0.984 0.984 0.998 DRE-binding protein DREB1 (DREB1)
gi|222154090|gb|FJ594467.1| 2,171 0.985 0.985 0.949 phenylalanine ammonia-lyase (PAL1)
gi|222154092|gb|FJ594468.1| 2,294 0.985 0.985 0.942 phenylalanine ammonia-lyase (PAL1)
gi|237506882|gb|FJ495287.1| 3,293 0.985 0.985 0.998 cellulose synthase (cesA1)
gi|251766020|gb|FJ475350.1| 3,262 0.985 0.985 1.000 cellulose synthase (CesA2)
gi|251766022|gb|FJ475351.1| 3,306 0.978 0.978 0.986 cellulose synthase (CesA4)
gi|255764546|gb|FJ600727.1| 824 0.983 0.983 0.999 PsbS protein
gi|294818264|gb|GU944762.1| 590 0.981 0.981 0.997 putative pathogenesis protein (WRKY10)
gi|301071262|gb|GU434145.1| 1,245 0.979 0.979 0.997 Actin
gi|312232178|gb|HM747940.1| 1,770 0.985 0.985 0.980 MYB protein
gi|312232180|gb|HM747941.1| 739 0.943 0.943 0.978 VAH protein
Sum 28,741 0.981 0.976 0.981
Organism | Sorghum | Maize | Rice | Brachypodium | Poplar | Arabidopsis | Foxtail millet | Bamboo
Assembly size (Mb) | 730 | 2,300 | 382 | 272 | 385 | 125 | 401 | 2,051
Transposable element | 0.62 | 0.842 | 0.4 | 0.281 | 0.374 | 0.14 | 0.4 | 0.60
Gene number | 27,640 | 32,540 | 34,792 | 25,532 | 45,555 | 25,498 | 35,472 | 31,987
Gene density (kb/gene) | 24 | 13.1 (non-repeat region) | 11.0 | 10.7 | 8.5 | 4.5 | 11.3 | 64.1
Average exon # per gene | 4.7 | 5.3 | 3.7 | 5.5 | 4.3 | 5.2 | 4.5 | 5.3
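As a rough cross-check of the gene density row, density can be approximated as assembly size divided by gene number. The sketch below is illustrative only: the function name is hypothetical, and the table does not state how each entry was derived (the maize value, for instance, refers to non-repeat regions and is not reproduced by this simple ratio).

```python
def gene_density_kb(assembly_mb: float, gene_number: int) -> float:
    """Approximate gene density in kb per gene.

    Assumption: density = assembly size / gene number, with 1 Mb = 1,000 kb.
    """
    return assembly_mb * 1000 / gene_number


# Values copied from the comparison table above.
bamboo_density = round(gene_density_kb(2051, 31987), 1)
rice_density = round(gene_density_kb(382, 34792), 1)
print(bamboo_density, rice_density)
```

Under this assumption the bamboo and rice entries of the table are recovered; entries computed on a different basis will not match.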
KO00010 Glycolysis / Gluconeogenesis 123 91 0.929 66 62 0.818 100 85 0.840
KO00020 Citrate cycle (TCA cycle) 48 59 0.895 30 32 0.867 45 34 0.778
KO00030 Pentose phosphate pathway 40 59 0.929 30 27 0.857 42 46 0.929
KO00051 Fructose and mannose metabolism 42 50 0.765 26 16 0.600 38 35 0.714
KO00052 Galactose metabolism 31 31 0.833 24 13 0.667 24 18 0.857
KO00061 Fatty acid biosynthesis 18 20 1.000 24 10 0.400 24 26 0.800
KO00071 Fatty acid metabolism 31 46 0.700 14 13 0.571 21 25 0.667
KO00100 Steroid biosynthesis 18 24 0.769 36 10 0.467 22 21 0.769
KO00130 Ubiquinone and other terpenoid-quinone biosynthesis 25 15 0.357 19 0 0.000 16 3 0.200
KO00190 Oxidative phosphorylation 122 109 0.605 89 65 0.413 99 74 0.525
KO00195 Photosynthesis 71 21 0.259 59 15 0.217 66 18 0.216
KO00230 Purine metabolism 102 96 0.768 96 42 0.406 79 51 0.537
KO00240 Pyrimidine metabolism 83 66 0.656 76 29 0.316 70 33 0.447
KO00250 Alanine, aspartate and glutamate metabolism 40 36 0.652 25 22 0.450 37 27 0.611
KO00260 Glycine, serine and threonine metabolism 36 35 0.714 28 17 0.500 26 23 0.765
KO00270 Cysteine and methionine metabolism 60 72 0.917 33 21 0.500 38 40 0.778
KO00280 Valine, leucine and isoleucine degradation 32 39 0.765 20 14 0.500 27 29 0.563
KO00290 Valine, leucine and isoleucine biosynthesis 29 27 0.600 27 11 0.429 29 16 0.615
KO00300 Lysine biosynthesis 14 10 0.875 12 2 0.250 12 6 0.625
KO00330 Arginine and proline metabolism 44 64 0.750 29 26 0.478 39 48 0.652
KO00340 Histidine metabolism 19 26 0.500 11 0 0.000 15 14 0.222
KO00350 Tyrosine metabolism 25 31 0.583 17 8 0.222 25 18 0.385
KO00360 Phenylalanine metabolism 47 57 0.667 30 11 0.182 28 22 0.417
KO00380 Tryptophan metabolism 29 40 0.688 15 5 0.222 22 17 0.364
KO00400 Phenylalanine, tyrosine and tryptophan biosynthesis 33 34 0.611 24 7 0.188 23 20 0.571
KO00410 beta-Alanine metabolism 20 30 0.800 17 8 0.300 20 22 0.667
MicroRNA family | Bamboo microRNA ID | Target gene | Pfam
PH01000002G1660 PF03110 SBP
PH01000409G0160 Unknown
PH01000210m01
MIR164 PH01000483G1000 PF02365 NAM
PH01000543m01
PH01000501G0450 PF02365 NAM
PH01001318G0370 Unknown
PH01001320G0360 Unknown
PH01002494G0010 Unknown
PH010004131G0150 Unknown
PH01000070G2000 PF05071 NDUFA12
PH01000016G0550 Unknown
PH01000019G2340 Unknown
PH01000024G0010 Unknown
PH01000042G1600 Unknown
PH01000045G0170 Unknown
PH01000069G1200 PF02630 SCO1-SenC
PH01000102G0710 Unknown
PH01000245G0160 Unknown
PH01000349G0440 Unknown
PH01000352G0040 PF00155 Aminotran_1_2
PH01000358G0650 PF00612 IQ
PH01000361G0710 PF02576 DUF150
PH01000367G0560 PF01985 CRS1_YhbY
PH01000367G0850 PF04755 PAP_fibrillin
PH01000590G0720 Unknown
PH01000591G0360 Unknown
PH01000597G0190 Unknown
PH01000753G0540 Unknown
PH01000866G0400 Unknown
PH01000875G0370 PF02836 Glyco_hydro_2_C; PF05282 AAR2
PH01001740G0210 Unknown
PH01001760G0230 PF03690 UPF0160
PH01002124G0020 Unknown
PH01002232G0310 PF02540 NAD_synthase
PH01003342G0140 Unknown
PH01003422G0190 PF01926 MMR_HSR1; PF06071 YchF-GTPase_C
PH01005724G0010 Unknown
PH01007546G0020 PF07719 TPR_2
PH01040671G0010 Unknown
PH01099851G0010 Unknown
PH01000369G0580 Unknown
PH01001716G0120 Unknown
PH01002279G0010 PF03070 TENA_THI-4
PH01002838G0190 Unknown
PH01000037G1490 Unknown
PH01001028G0270 Unknown
PH01002138G0280 PF00067 p450
PH01002503G0090 Unknown
PH01002963G0090 Unknown
PH01003375G0030 PF04859 DUF641
PH01002024G0310 Unknown
PH01000436G0030 Unknown
PH01003440G0030 Unknown
PH01003658G0100 Unknown
PH01004857G0110 PF00566 TBC
Length occupied (bp) | Percentage of sequences
Class I elements (Retroelements) 790,027,115 0.385
LTR Retrotransposons 764,632,374 0.373
LTR/Copia 251,526,442 0.123
LTR/Gypsy 505,181,075 0.246
unclassified LTR 7,924,857 0.004
non-LTR Retrotransposons 24,526,904 0.012
LINE 23,701,208 0.012
SINE 825,696 0.000
unclassified retrotransposons 867,837 0.000
Class II elements (DNA Transposons) 194,238,269 0.095
DNA Transposons 176,524,830 0.086
DNA/En-Spm 74,871,248 0.036
DNA/hAT 26,546,823 0.013
DNA/MuDR 73,460,366 0.036
DNA/Harbinger 1,646,393 0.001
MITEs 7,356,902 0.004
DNA/TcMar-Stowaway 3,209,627 0.002
DNA/Tourist 4,147,275 0.002
RC/Helitrons 3,241,423 0.002
unclassified transposons 7,115,114 0.003
Unknown repeats 226,597,546 0.110
Total transposable elements 1,210,862,930 0.590
Low_complexity 1,130,281 0.001
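The "Percentage of sequences" column appears to be the occupied length divided by the total assembly length. The following sketch checks that reading for the three top-level categories. It assumes the denominator is 2,051,719,643 bp (the scaffold total reported in the assembly statistics); the variable and function names are hypothetical.

```python
# Assumed denominator: total scaffold length from the assembly statistics.
ASSEMBLY_BP = 2_051_719_643

# Occupied lengths (bp) copied from the table above.
te_lengths_bp = {
    "Class I elements (Retroelements)": 790_027_115,
    "Class II elements (DNA Transposons)": 194_238_269,
    "Unknown repeats": 226_597_546,
}


def genome_fraction(length_bp: int, assembly_bp: int = ASSEMBLY_BP) -> float:
    """Fraction of the assembly occupied by a repeat class."""
    return length_bp / assembly_bp


fractions = {name: round(genome_fraction(bp), 3) for name, bp in te_lengths_bp.items()}
total_te_bp = sum(te_lengths_bp.values())
total_fraction = round(genome_fraction(total_te_bp), 3)

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
print(f"Total transposable elements: {total_te_bp:,} bp ({total_fraction:.3f})")
```

With this denominator the three categories sum to the tabulated total of 1,210,862,930 bp, and the rounded fractions agree with the table.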
Repeat ID Class of TE-element Copies Repeat ID Class of TE-element Copies Repeat ID Class of TE-element Copies
PH01R6F001124 LTR/Gypsy 20624 21071416 SPMLIKE DNA/En-Spm 2150 6292696 Gypsy-122_SBi-I Gypsy 4366 27222661
PH01R5F003093 LTR/Gypsy 6366 16535123 RETRO2_I LTR/Gypsy 628 3846962 Gypsy-133_SBi-I Gypsy 4979 17779891
PH01R6F002209 LTR/Gypsy 9446 15516557 RIRE2_I LTR/Gypsy 758 3498738 Gypsy-121_SBi-I Gypsy 3137 14031073
PH01R1F000002 LTR/Gypsy 11637 12906984 RIRE3_LTR LTR/Gypsy 1956 3338048 ATHILA-1_SBi-I Gypsy 4976 11493996
PH01R6F000836 LTR/Copia 7388 12860851 ATLANTYS-I_OS LTR/Gypsy 1086 3215554 Gypsy-136_SBi-I Gypsy 3917 10004929
PH01R5F002786 LTR/Copia 10725 12543585 TRUNCATOR LTR/Gypsy 1755 2952995 ATHILA-1_SBi-LTR Gypsy 6810 8641228
PH01R6F004818 LTR/Gypsy 4337 12059267 RIRE3A_LTR LTR/Gypsy 2301 2796270 Gypsy-125_SBi-I Gypsy 5002 8122387
PH01R6F002298 LTR/Gypsy 6746 11580121 SZ-7_int LTR/Gypsy 693 2394637 Gypsy-128_SBi-LTR Gypsy 2820 7778326
PH01R5F002220 LTR/Gypsy 12658 10774197 TRUNCATOR2_OS LTR/Gypsy 2411 2356738 Gypsy-122B_SBi-I Gypsy 2366 7355980
PH01R1F000015 LTR/Copia 6969 9936341 RIREX_I LTR/Gypsy 967 2179712 ATHILA-3_SBi-I Gypsy 3267 7080252
Species | Mean Ks | Divergence time (mya)
1 The wheat gene models were downloaded at
ftp://ftp.ncbi.nih.gov/repository/UniGene/Triticum_aestivum/.
Family No. | Description of conserved function domains | Phe | Bdi | Osa | Sbi | Zma | Sit | Ath | P-value
Supplementary Table 13b Statistics of syntenic bamboo loci on the aligned sorghum
genome
Sorghum | No. of syntenic gene blocks | No. of syntenic genes | Average gene number per block
Chromosome-01 296 3,319 11.2
Chromosome-02 212 1,893 8.9
Chromosome-03 199 2,364 11.9
Chromosome-04 159 1,667 10.5
Chromosome-05 48 352 7.3
Chromosome-06 134 1,688 12.6
Chromosome-07 102 909 8.9
Chromosome-08 72 708 9.8
Chromosome-09 167 1,507 9.0
Chromosome-10 150 1,339 8.9
Sum 1,539 15,746 10.2
Note: A total of 30,379 (94.8% of 31,987) bamboo loci located on scaffolds
longer than 50 kb were aligned to the rice and sorghum gene models, respectively.
At least 5 genes are required to call synteny. Within a syntenic gene block, the
number of non-syntenic genes between two adjacent syntenic genes must be less
than 5.
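The two thresholds in the note (at least 5 genes per block, fewer than 5 non-syntenic genes between adjacent syntenic genes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name and input layout are hypothetical, and the sketch only groups hits by target chromosome along one scaffold, without checking gene order on the sorghum side.

```python
from typing import List, Optional, Tuple


def call_blocks(
    hits: List[Optional[str]], min_genes: int = 5, max_gap: int = 5
) -> List[Tuple[str, List[int]]]:
    """Group syntenic genes along one scaffold into blocks.

    hits[i] is the sorghum chromosome matched by the i-th gene on the
    scaffold, or None when the gene has no syntenic match.
    """
    blocks: List[Tuple[str, List[int]]] = []
    current: List[int] = []          # scaffold indices of genes in the open block
    current_chrom: Optional[str] = None

    for i, chrom in enumerate(hits):
        if chrom is None:
            continue                 # non-syntenic gene, only widens the gap
        gap = i - current[-1] - 1 if current else 0
        if current and chrom == current_chrom and gap < max_gap:
            current.append(i)
        else:
            # Close the open block, keeping it only if it is large enough.
            if current_chrom is not None and len(current) >= min_genes:
                blocks.append((current_chrom, current))
            current, current_chrom = [i], chrom

    if current_chrom is not None and len(current) >= min_genes:
        blocks.append((current_chrom, current))
    return blocks


example = ["Chr01"] * 3 + [None] * 2 + ["Chr01"] * 3 + [None] * 5 + ["Chr01"] * 4
blocks = call_blocks(example)
print(blocks)

# Average gene number per block for Chromosome-01 in the table above.
avg_chr01 = round(3319 / 296, 1)
```

In the example, the gap of two non-syntenic genes is bridged, whereas the gap of five closes the block, and the trailing run of four genes is too short to be called.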
CesA 19 10 20 9 12 11 18
Csl Total 38 29 33 24 37 34 37
CslA 9 8 10 8 8 10 5
CslB 0 6 0 0 0 0 2
CslC 10 5 8 4 6 6 5
CslD 7 6 5 3 5 5 11
CslE 4 1 2 3 3 3 3
CslF 7 0 7 5 11 8 0
CslG 0 3 0 0 0 0 4
CslH 0 0 0 1 3 2 0
CslJ 1 0 1 0 1 0 2
Abbr. | Floral genes in bamboo | Known homologous genes | Referencing function | Interpro domains
ERF | Interpro domains: IPR001471 Pathogenesis-related transcriptional factor/ERF, DNA-binding; IPR016177 DNA-binding, integrase-type
PH01000032G1740 | OsERF334 | Drought tolerance
PH01000046G1730
PH01000129G0360
PH01000573G0640 | OsAP2-3934 | Drought tolerance
PH01000573G0670
PH01001102G0050
PH01001360G0530
PH01001634G0140
PH01001704G0270
PH01002279G0250
PH01002393G0230
PH01002571G0300
PH01002648G0300
PH01003475G0200
PH01004791G0030
Note: the abbreviations are as follows: ERF, ethylene-responsive transcriptional factor; bZIP, Basic-leucine zipper (bZIP)
transcription factor; CCT/B-box, CCT/B-box zinc finger protein; F-box, F-box domain containing protein; HTH myb-type, Helix-turn-helix
transcriptional regulator, Myb-type; Homeobox, Homeobox domain containing protein; MADS-box, Transcription factor, MADS-box; NAC, NAC
domain transcription factor; WD-40, WD-40 repeat family protein; YABBY, YABBY domain containing protein; zf-Dof, Dof zinc finger domain
containing protein; HSP20, Heat shock protein Hsp20; HSP70, Heat shock protein Hsp70; HSP DnaJ, Heat shock protein DnaJ; HSF, Heat shock
factor (HSF)-type; Peroxidase, Plant peroxidase; Dehydrin, Dehydrin domain containing proteins; Thaumatin, Thaumatin domain containing
proteins; HM, Heavy metal transport/detoxification protein; MT, Plant metallothionein, family 15; BURP, BURP domain containing proteins; MIP,
Major intrinsic protein.
Transcription factor CONSTANS (up-regulation of FPIs); homologs: CO (At5g15840)54 & Hd-1 (LOC_Os06g16370)55
PH01000682 | 157487 - 159283 | LINE/L1 | 3'-UTR, 390 bp from stop-codon | very low¶
PH01005551 | 14180 - 15803 | | | not detected
PH01002508 | 56985 - 62300 | LTR/Gypsy | intron, insertion >4,000 bp | not detected
FPIs (floral pathway integrators); homologs: FT (At1g65480), TSF (At4g20370), TFL1 (At5g03840)56, RCN2 (LOC_Os02g32950)57 & Hd3a (LOC_Os06g06320)
PH01000216 | 339337 - 340520 | LINE/L1 | promoter, 578 bp from start-codon | very low
PH01002091 | 17501 - 18413 | LINE/L1 | 3'-UTR, 456 bp from stop-codon | not detected
PH01001461 | 145851 - 156173 | LTR/Copia | coding region | not detected
PH01000257 | 828385 - 839445 | LTR/Gypsy & DNA/En-Spm | intron, insertion >8,000 bp | not detected
PH01002149 | 213193 - 215000 | DNA/TcMar-Stowaway | 5'-UTR, 187 bp from start-codon | not detected
PH01000020 | 1187666 - 1189438 | Unclassified TE | promoter, 661 bp from start-codon | very low
PH01000089 | 982699 - 984616 | Unclassified TE | coding region | not detected
PH01000128 | 233584 - 235378 | LTR/Gypsy | near 3'-UTR, 920 bp from | not detected
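Homologs in Arabidopsis | Abbreviation† | Function | Identified genes in bamboo | RPKM: S20 | S50 | RH | RT | LF | P1 | P2
AT4G24540, AT2G22540 (ref. 62) | AGL24, SVP | Regulator of FPI/FMI | PH01000038G1550 | 9 | 9 | 4 | 21 | 2 | 3 | 2
AT4G24540, AT2G22540 (ref. 62) | AGL24, SVP | Regulator of FPI/FMI | PH01000437G0930 | 72 | 71 | 52 | 36 | 26 | 11 | 10
AT2G27550 (ref. 63), AT5G62040 (ref. 58), AT5G03840 (ref. 64) | ATC, BFT, TFL1 | FPI | PH01001134G0390 | 0 | 0 | 0 | 0 | 0 | 0 | 0
AT2G27550 (ref. 63), AT5G62040 (ref. 58), AT5G03840 (ref. 64) | ATC, BFT, TFL1 | FPI | PH01002570G0010 | 4 | 2 | 0 | 0 | 0 | 0 | 0
AT2G27550 (ref. 63), AT5G62040 (ref. 58), AT5G03840 (ref. 64) | ATC, BFT, TFL1 | FPI | PH01003363G0220 | 0 | 1 | 0 | 0 | 0 | 0 | 0
AT5G06100 (ref. 65) | ATMYB33 | Regulator of FPI/FMI | PH01000029G1950 | 2 | 1 | 1 | 0 | 1 | 0 | 1
AT5G06100 (ref. 65) | ATMYB33 | Regulator of FPI/FMI | PH01000009G0060 | 2 | 2 | 1 | 1 | 2 | 1 | 7
AT4G08920 (ref. 66), AT1G04400 (ref. 67) | CRY1, CRY2 | Photoperiod pathway | PH01000263G1210 | 6 | 4 | 1 | 2 | 5 | 3 | 1
AT4G08920 (ref. 66), AT1G04400 (ref. 67) | CRY1, CRY2 | Photoperiod pathway | PH01000968G0540 | 8 | 6 | 3 | 2 | 4 | 4 | 4
AT4G08920 (ref. 66), AT1G04400 (ref. 67) | CRY1, CRY2 | Photoperiod pathway | PH01002304G0120 | 4 | 3 | 4 | 4 | 3 | 1 | 1
AT4G08920 (ref. 66), AT1G04400 (ref. 67) | CRY1, CRY2 | Photoperiod pathway | PH01002373G0140 | 3 | 2 | 1 | 1 | 3 | 2 | 1
AT4G22140 (ref. 68) | EBS | Regulator of FPI/FMI | PH01001266G0500 | 20 | 32 | 19 | 7 | 7 | 8 | 6
AT4G22140 (ref. 68) | EBS | Regulator of FPI/FMI | PH01001406G0500 | 12 | 14 | 3 | 1 | 1 | 6 | 5
AT4G22140 (ref. 68) | EBS | Regulator of FPI/FMI | PH01002328G0250 | 9 | 12 | 0 | 0 | 0 | 0 | 0
AT4G15880 (ref. 69) | ESD4 | Autonomous pathway | PH01000364G0790 | 2 | 3 | 2 | 1 | 2 | 1 | 1
AT4G15880 (ref. 69) | ESD4 | Autonomous pathway | PH01000526G0230 | 8 | 12 | 9 | 7 | 7 | 5 | 7
AT4G15880 (ref. 69) | ESD4 | Autonomous pathway | PH01001219G0240 | 6 | 7 | 10 | 5 | 7 | 4 | 6
AT1G68050 (ref. 70) | FKF1 | Photoperiod pathway | PH01002213G0250 | 9 | 9 | 2 | 2 | 3 | 3 | 5
AT1G68050 (ref. 70) | FKF1 | Photoperiod pathway | PH01002958G0010 | 11 | 9 | 6 | 16 | 7 | 10 | 22
AT1G68050 (ref. 70) | FKF1 | Photoperiod pathway | PH01007024G0030 | 15 | 15 | 6 | 6 | 9 | 6 | 7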
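For readers unfamiliar with the expression unit used in this table, RPKM is conventionally defined as reads per kilobase of transcript per million mapped reads. The sketch below shows that standard definition only; the supplement does not state the exact formula or read counts used, so the function name and the example numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Standard definition: reads * 1e9 / (gene length in bp * total mapped reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene in a 25-million-read library.
example_rpkm = rpkm(500, 2_000, 25_000_000)
print(example_rpkm)
```

Because the value is normalised by both gene length and library size, RPKM values for the same gene can be compared across the seven samples.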
Bamboo gene ID | Homologous rice gene | Description of rice gene | Identities of amino acids | Function | Involved pathway†
PH01000015G0220 LOC_Os01g04380 OsHSP17.045 0.90 Stress tolerance ABA
PH01000032G1740 LOC_Os01g58420 OsERF44 0.66 Drought tolerance ETH
PH01000053G1650 LOC_Os11g03300 OsNAC1037 0.64 Drought tolerance ABA
PH01000058G1180 LOC_Os01g72530 OsCML3192 0.75 Signal transduction Ca2+ sensor
PH01000068G0660 LOC_Os07g44330 OsPDK193 0.92 Signal transduction GA
PH01000074G0590 LOC_Os02g44235 OsTPP194 0.51 Stress tolerance ABA
PH01000081G0140 LOC_Os03g06630 OsHSF746 0.82 Stress tolerance
PH01000091G0440 LOC_Os03g26870 SRWD339 0.80 Salinity tolerance
PH01000099G1710 LOC_Os01g42860 OCPI195 0.66 Drought tolerance
PH01000111G0850 LOC_Os01g66120 OsNAC6 ; SNAC296 0.82 Stress tolerance ABA
PH01000113G0300 LOC_Os03g07360 OsDof12 23,24 0.80 Regulator of FMI Flowering
PH01000122G1000 LOC_Os03g60080 SNAC138 0.70 Drought tolerance
PH01000154G1240 LOC_Os03g16040 OsHSP17.741 0.54 Drought and heat stress tolerance
PH01000162G1010 LOC_Os03g44710 OsYABBY2; OsYAB240 0.71 FMI
PH01000173G1010 LOC_Os02g52780 OsbZIP2335 0.59 Drought and salinity tolerance ABA
PH01000174G0590 LOC_Os07g08140 OsHsfA2b97 0.62 Heat stress tolerance
PH01000192G1330 LOC_Os04g41540 OsCML2292 0.54 Signal transduction Ca2+ sensor
PH01000194G0800 LOC_Os10g28340 OsHsfA2c47 0.80 Heat and oxidative stress tolerance
PH01000208G0690 LOC_Os03g06630 OsHSF746 0.78 Heat stress tolerance
PH01000222G1190 LOC_Os03g54160 OsMADS14 0.69 FMI Flowering
PH01000242G0910 LOC_Os05g49420 OsbZIP4536 0.78 Reproductive development and stress ABA
† Abbreviation of the pathways: Floral pathways of FMI or FPI (Flowering), abscisic acid pathway (ABA), Gibberellin pathway (GA),
ethylene-responsive pathway (ETH), jasmonic acid pathway (JA).