Methods in
Molecular Biology 1751

Yejun Wang
Ming-an Sun Editors

Transcriptome
Data Analysis
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes:


http://www.springer.com/series/7651
Transcriptome Data Analysis

Methods and Protocols

Edited by

Yejun Wang
Department of Cell Biology and Genetics, School of Basic Medicine, Shenzhen University
Health Science Center, Shenzhen, China

Ming-an Sun
Epigenomics and Computational Biology Lab, Biocomplexity Institute of Virginia Tech,
Blacksburg, VA, USA

ISSN 1064-3745 ISSN 1940-6029 (electronic)


Methods in Molecular Biology
ISBN 978-1-4939-7709-3 ISBN 978-1-4939-7710-9 (eBook)
https://doi.org/10.1007/978-1-4939-7710-9
Library of Congress Control Number: 2018933577

© Springer Science+Business Media, LLC 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Humana Press imprint is published by Springer Nature


The registered company is Springer Science+Business Media, LLC
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Preface

As sequencing technology improves and costs decrease, more and more laboratories are
performing RNA-Seq to explore the molecular mechanisms of various biological phenotypes.
Due to the increased sequencing depth available, the purposes of transcriptome
studies have also been expanded extensively. In addition to the conventional uses for gene
annotation, profiling, and expression comparison, transcriptome studies have been applied
for multiple other purposes, including but not limited to gene structure analysis,
identification of new genes or regulatory RNAs, RNA editing analysis, co-expression or
regulatory network analysis, biomarker discovery, development-associated imprinting
studies, single-cell RNA sequencing studies, and pathogen–host dual RNA sequencing studies.
The aim of this book is to give comprehensive practical guidance on transcriptome data
analysis with different scientific purposes. It is organized in three parts. In Part I, Chapters 1
and 2 introduce step-by-step protocols for RNA-Seq and microarray data analysis,
respectively. Chapter 3 focuses on downstream pathway and network analysis on the differentially
expressed genes identified from expression profiling data. Unlike most of the other
protocols, which were command line-based, Chapter 4 describes a visualizing method for
transcriptome data analysis. Chapters 5–11 in Part II give practical protocols for gene
characterization analysis with RNA-Seq data, including alternative spliced isoform analysis
(Chapter 5), transcript structure analysis (Chapter 6), RNA editing (Chapter 7), and
identification and downstream data analysis of microRNA (Chapters 8 and 9), lincRNA
(Chapter 10), and transposable elements (Chapter 11). In Part III, protocols on several new
applications of transcriptome studies are described: RNA–protein interactions (Chapter 12),
expression noise analysis (Chapter 13), epigenetic imprinting (Chapter 14), single-cell RNA
sequencing applications (Chapter 15), and deconvolution of heterogeneous cells
(Chapter 16). Some chapters cover more than one application. For example, Chapter 5
also presents the analysis of single molecule sequencing data in addition to alternative
splicing analysis; Chapter 12 also gives solutions for the analysis of small RNAs in bacteria.
Some topics were not included in this volume due to various factors, e.g., analysis on circular
RNAs, metatranscriptomics, biomarker identification, and dual RNA-Seq. For circular
RNAs, there are numerous published papers or books with protocols that can be followed.
Metatranscriptomics is a new technique and data-oriented methods for analysis are still
lacking. For most other applications, the core protocols for data processing and analysis are
the same as presented in the chapters of this volume.

Shenzhen, China Yejun Wang


Blacksburg, VA, USA Ming-an Sun

Contents

Preface
Contributors

PART I GENERAL PROTOCOLS ON TRANSCRIPTOME DATA ANALYSIS

1 Comparison of Gene Expression Profiles in Nonmodel Eukaryotic Organisms with RNA-Seq
Han Cheng, Yejun Wang, and Ming-an Sun
2 Microarray Data Analysis for Transcriptome Profiling
Ming-an Sun, Xiaojian Shao, and Yejun Wang
3 Pathway and Network Analysis of Differentially Expressed Genes in Transcriptomes
Qianli Huang, Ming-an Sun, and Ping Yan
4 QuickRNASeq: Guide for Pipeline Implementation and for Interactive Results Visualization
Wen He, Shanrong Zhao, Chi Zhang, Michael S. Vincent, and Baohong Zhang

PART II OBJECTIVE-SPECIALIZED TRANSCRIPTOME DATA ANALYSIS

5 Tracking Alternatively Spliced Isoforms from Long Reads by SpliceHunter
Zheng Kuang and Stefan Canzar
6 RNA-Seq-Based Transcript Structure Analysis with TrBorderExt
Yejun Wang, Ming-an Sun, and Aaron P. White
7 Analysis of RNA Editing Sites from RNA-Seq Data Using GIREMI
Qing Zhang
8 Bioinformatic Analysis of MicroRNA Sequencing Data
Xiaonan Fu and Daoyuan Dong
9 Microarray-Based MicroRNA Expression Data Analysis with Bioconductor
Emilio Mastriani, Rihong Zhai, and Songling Zhu
10 Identification and Expression Analysis of Long Intergenic Noncoding RNAs
Ming-an Sun, Rihong Zhai, Qing Zhang, and Yejun Wang
11 Analysis of RNA-Seq Data Using TEtranscripts
Ying Jin and Molly Hammell

PART III NEW APPLICATIONS OF TRANSCRIPTOME

12 Computational Analysis of RNA–Protein Interactions via Deep Sequencing
Lei Li, Konrad U. Förstner, and Yanjie Chao
13 Predicting Gene Expression Noise from Gene Expression Variations
Xiaojian Shao and Ming-an Sun
14 A Protocol for Epigenetic Imprinting Analysis with RNA-Seq Data
Jinfeng Zou, Daoquan Xiang, Raju Datla, and Edwin Wang
15 Single-Cell Transcriptome Analysis Using SINCERA Pipeline
Minzhe Guo and Yan Xu
16 Mathematical Modeling and Deconvolution of Molecular Heterogeneity Identifies Novel Subpopulations in Complex Tissues
Niya Wang, Lulu Chen, and Yue Wang

Index
Contributors

STEFAN CANZAR  Gene Center, Ludwig-Maximilians-Universität München,
Munich, Germany
YANJIE CHAO  Institute of Molecular Infection Biology, University of Würzburg,
Würzburg, Germany; Department of Molecular Biology and Microbiology, Howard Hughes
Medical Institute, Tufts University School of Medicine, Boston, MA, USA
LULU CHEN  Department of Electrical and Computer Engineering, Virginia Polytechnic
Institute and State University, Arlington, VA, USA
HAN CHENG  Key Laboratory of Rubber Biology, Ministry of Agriculture, Rubber Research
Institute, Chinese Academy of Tropical Agricultural Sciences, Danzhou, Hainan, P.R.
China
RAJU DATLA  National Research Council Canada, Saskatoon, SK, Canada
DAOYUAN DONG  Department of Chemistry and Biochemistry, University of the Sciences,
Philadelphia, PA, USA
KONRAD U. FÖRSTNER  Institute of Molecular Infection Biology, University of Würzburg,
Würzburg, Germany
XIAONAN FU  Department of Biochemistry, Virginia Tech, Blacksburg, VA, USA
MINZHE GUO  The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary
Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
MOLLY HAMMELL  Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
WEN HE  Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, USA
QIANLI HUANG  School of Biological and Medical Engineering, Hefei University of
Technology, Hefei, China
YING JIN  Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
ZHENG KUANG  Department of Immunology, The University of Texas Southwestern Medical
Center, Dallas, Texas, USA
LEI LI  Institute of Molecular Infection Biology, University of Würzburg,
Würzburg, Germany; Division of Biostatistics, Dan L. Duncan Cancer Center, Baylor
College of Medicine, Houston, TX, USA
EMILIO MASTRIANI  Systemomics Center, College of Pharmacy, Harbin Medical University,
Harbin, China; Genomics Research Center (State-Province Key Laboratories of
Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
XIAOJIAN SHAO  Department of Human Genetics, McGill University, Montréal, Canada;
The McGill University and Génome Québec Innovation Centre, Montréal, QC, Canada
MING-AN SUN  Epigenomics and Computational Biology Lab, Biocomplexity Institute of
Virginia Tech, Blacksburg, VA, USA
MICHAEL S. VINCENT  Inflammation and Immunology Research Unit, Pfizer Worldwide
R&D, Cambridge, MA, USA
EDWIN WANG  Department of Experimental Medicine, McGill University,
Montreal, QC, Canada; Center for Bioinformatics, McGill University,
Montreal, QC, Canada; Center for Health Genomics and Informatics, University of
Calgary Cumming School of Medicine, Calgary, AB, Canada; Department of Biochemistry
and Molecular Biology, University of Calgary Cumming School of Medicine,
Calgary, AB, Canada; Department of Medical Genetics, University of Calgary Cumming
School of Medicine, Calgary, AB, Canada; Department of Oncology, University of Calgary
Cumming School of Medicine, Calgary, AB, Canada; Alberta Children’s Hospital
Research Institute, Calgary, AB, Canada; Arnie Charbonneau Cancer Research Institute,
Calgary, AB, Canada; O’Brien Institute for Public Health, Calgary, AB, Canada; Wang
Lab, Health Science Centre, University of Calgary, Calgary, AB, Canada
NIYA WANG  Department of Electrical and Computer Engineering, Virginia Polytechnic
Institute and State University, Arlington, VA, USA
YEJUN WANG  Department of Cell Biology and Genetics, School of Basic Medicine, Shenzhen
University Health Science Center, Shenzhen, PR China
YUE WANG  Department of Electrical and Computer Engineering, Virginia Polytechnic
Institute and State University, Arlington, VA, USA
AARON P. WHITE  Vaccine and Infectious Disease Organization, University of Saskatchewan,
Saskatoon, SK, Canada
DAOQUAN XIANG  National Research Council Canada, Saskatoon, SK, Canada
YAN XU  The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology,
Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA; Division of
Biomedical Informatics, Cincinnati Children’s Hospital Medical Center,
Cincinnati, OH, USA
PING YAN  School of Biological and Medical Engineering, Hefei University of Technology,
Hefei, China
RIHONG ZHAI  School of Public Health, Shenzhen University Health Science Center,
Shenzhen, China
BAOHONG ZHANG  Early Clinical Development, Pfizer Worldwide R&D,
Cambridge, MA, USA
CHI ZHANG  Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, USA
QING ZHANG  Integrative Biology and Physiology, The University of California, Los Angeles
(UCLA), Los Angeles, CA, USA
SHANRONG ZHAO  Early Clinical Development, Pfizer Worldwide R&D,
Cambridge, MA, USA
SONGLING ZHU  Systemomics Center, College of Pharmacy, Harbin Medical University,
Harbin, China; Genomics Research Center (State-Province Key Laboratories of
Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
JINFENG ZOU  National Research Council Canada, Montreal, QC, Canada
Part I

General Protocols on Transcriptome Data Analysis


Chapter 1

Comparison of Gene Expression Profiles in Nonmodel
Eukaryotic Organisms with RNA-Seq
Han Cheng, Yejun Wang, and Ming-an Sun

Abstract
With recent advances in next-generation sequencing technology, RNA-Sequencing (RNA-Seq) has
emerged as a powerful approach for transcriptomic profiling. RNA-Seq has been used in almost every
field of biological study and has greatly extended our view of transcriptomic complexity in different
species. In particular, for nonmodel organisms, which usually lack high-quality reference genomes,
de novo transcriptome assembly from RNA-Seq data provides a solution for comparative
transcriptomic study. In this chapter, we focus on the comparative transcriptomic analysis of nonmodel
organisms. Two analysis strategies (without or with a reference genome) are described step by step, with
the differentially expressed genes explored.

Key words Nonmodel organism, RNA-Seq, Next-generation sequencing, Differential expression,
Transcriptome, de novo transcriptome assembly

1 Introduction

Recent advances in next-generation sequencing have enabled the
development of RNA-Seq, a powerful approach allowing the
investigation of the transcriptome at unsurpassed resolution [1].
RNA-Seq has the potential to reveal unprecedented complexity of
the transcriptomes, to provide quick insights into gene structure
without requiring a reference genome, to expand the
identification of genes of interest, to develop functional molecular
markers, to quantify gene expression, and to compare gene
expression profiles [2]. These advantages have made RNA-Seq the
most popular method for transcriptome analysis [3]. In particular,
unlike microarrays, another popular method for transcriptome
profiling that must be designed according to a presequenced
reference genome, RNA-Seq can be applied to transcriptomic
studies in nonmodel organisms [4]. Next-generation sequencing
has become more affordable in recent years, making RNA-Seq
increasingly popular in ordinary molecular biology laboratories.

Yejun Wang and Ming-an Sun (eds.), Transcriptome Data Analysis: Methods and Protocols, Methods in Molecular Biology,
vol. 1751, https://doi.org/10.1007/978-1-4939-7710-9_1, © Springer Science+Business Media, LLC 2018


RNA-Seq has already been used in almost every field of biological
study and has greatly extended our view of transcriptomic
complexity in different species. However, the huge number of reads
generated by RNA-Seq poses great challenges to the assembly and
analysis of complete transcriptomes. Fortunately, recent progress
in bioinformatics has provided powerful tools for RNA-Seq analysis of
species lacking a high-quality reference genome.
In nonmodel organisms, de novo transcriptome assembly is the
first step for constructing a reference when the complete genome
sequence is absent. In recent years, several tools have been
developed for de novo transcriptome assembly, such as Trinity,
SOAPdenovo-Trans, and ABYSS [4–6]. These tools each have
their own merits for dealing with different types of genomes. The
short reads are then mapped to the reference transcriptome, and
the read counts of each transcript are normalized and compared
among samples. In this step, we usually use RSEM for
quantifying transcript abundances [7]. The final step is to annotate each
transcript and to visualize the expression results.
The tools mentioned above greatly facilitate transcriptome
assembly and promote RNA-Seq studies in nonmodel organisms.
In recent years, a great number of studies have identified
differentially expressed (DE) genes between specific treatments or
tissues [8–13]. In this chapter, we give a step-by-step protocol to
assemble a reference transcriptome and to explore DE genes from
RNA-Seq data.

2 Materials

2.1 Software Packages

All the software packages need to be installed on your workstation in
advance. Because most bioinformatics tools are designed for Linux
operating systems, each step is demonstrated here on a 64-bit
Ubuntu OS. For convenience when running commands in your working
directory, add the folders containing the executables to your PATH
environment variable so that each program can be called directly by
name. Note that some of the software versions used in this protocol
may not be the latest; in that case, downloading the latest version
is strongly encouraged.
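
The export lines used throughout Subheading 2.1 prepend a folder to PATH each time they are run, so repeated runs accumulate duplicate entries. The following is a minimal sketch of one way to avoid that; the function name path_prepend is hypothetical and is not part of any package in this protocol, and the folder shown is only the example location used in this chapter.

```shell
# Hypothetical helper (not provided by any tool in this protocol):
# prepend a directory to PATH only if it is not already listed.
path_prepend() {
    local dir="$1"
    case ":${PATH}:" in
        *":${dir}:"*) ;;               # already present: leave PATH unchanged
        *) PATH="${dir}:${PATH}" ;;    # otherwise put the directory first
    esac
    export PATH
}

# Example with an assumed install location from this chapter.
path_prepend /home/your_home/soft/FastQC
path_prepend /home/your_home/soft/FastQC   # the second call changes nothing
```

To keep the setting across sessions, the same calls can be placed in a shell startup file such as ~/.bashrc.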

2.1.1 SRA Toolkit

Download the SRA toolkit [14], unpack the tarball to your destination
directory (e.g., /home/your_home/soft/), and add the folder containing
the executables to your PATH. The name of the unpacked folder depends on
the toolkit version and build, so adjust the last command to match it. Type:

wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz
tar xzf sratoolkit.current-centos_linux64.tar.gz -C /home/your_home/soft/
export PATH=/home/your_home/soft/sratoolkit.2.7.0-centos_linux64/bin:$PATH

2.1.2 FastQC

Download the FastQC package [15], unpack it, and add the directory
to your PATH.

wget http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/fastqc_v0.10.1.zip
unzip fastqc_v0.10.1.zip -d /home/your_home/soft/
export PATH=/home/your_home/soft/FastQC:$PATH

2.1.3 Trinity

Download the Trinity package [4], unpack it, and add the directories
to your PATH. wget saves the archive under the name v2.2.0.tar.gz.

wget https://github.com/trinityrnaseq/trinityrnaseq/archive/v2.2.0.tar.gz
tar xzf v2.2.0.tar.gz -C /home/your_home/soft/
export PATH=/home/your_home/soft/trinityrnaseq-2.2.0:$PATH
export PATH=/home/your_home/soft/trinityrnaseq-2.2.0/util:$PATH

2.1.4 RSEM

Download the RSEM package [7], unpack it, and add the RSEM
directory to your PATH. wget saves the archive under the name v1.2.8.tar.gz.

wget https://github.com/deweylab/RSEM/archive/v1.2.8.tar.gz
tar xzf v1.2.8.tar.gz -C /home/your_home/soft/
export PATH=/home/your_home/soft/RSEM-1.2.8:$PATH

2.1.5 R

Download R [16], unpack it, and then install it.

wget https://cran.r-project.org/src/base/R-3/R-3.2.2.tar.gz
tar zxf R-3.2.2.tar.gz -C /home/your_home/soft/
cd /home/your_home/soft/R-3.2.2
./configure --prefix=/home/your_home/bin
make
make check
make install

2.1.6 Bowtie2

Download the Bowtie2 package [17], unpack it, and then add the Bowtie2
directory to your PATH.

wget http://jaist.dl.sourceforge.net/project/bowtie-bio/bowtie2/2.2.6/bowtie2-2.2.6-linux-x86_64.zip
unzip bowtie2-2.2.6-linux-x86_64.zip -d /home/your_home/soft/
export PATH=/home/your_home/soft/bowtie2-2.2.6:$PATH

2.1.7 Tophat (see Note 1)

Download Tophat [18], unpack and install it, and then add the
directory to your PATH.

wget http://ccb.jhu.edu/software/tophat/downloads/tophat-2.0.9.Linux_x86_64.tar.gz
tar zxf tophat-2.0.9.Linux_x86_64.tar.gz
cd tophat-2.0.9.Linux_x86_64
./configure --prefix=/home/your_home/soft/tophat2
make
make install
export PATH=/home/your_home/soft/tophat2:$PATH

2.1.8 Cufflinks

Download Cufflinks [19], unpack it, and then add the directory to
your PATH.

wget http://cole-trapnell-lab.github.io/cufflinks/assets/downloads/cufflinks-2.2.1.Linux_x86_64.tar.gz
tar xzf cufflinks-2.2.1.Linux_x86_64.tar.gz -C /home/your_home/soft/
export PATH=/home/your_home/soft/cufflinks-2.2.1.Linux_x86_64:$PATH

2.1.9 EBSeq

EBSeq [20] is an R Bioconductor package for gene and isoform
differential expression analysis of RNA-Seq data. For installation,
just start R and enter:

source("https://bioconductor.org/biocLite.R")
biocLite("EBSeq")

2.1.10 DESeq

DESeq [21] is an R Bioconductor package for differential expression
analysis with read count data. To install it, start R and enter:

source("https://bioconductor.org/biocLite.R")
biocLite("DESeq")
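
Before starting the analysis, it can help to confirm that the installed programs are actually reachable through PATH. The sketch below uses only the shell built-in command -v; the function name missing_tools is hypothetical, and the list of executable names is taken from the commands invoked later in this protocol and can be extended as needed.

```shell
# Hypothetical helper: print every named executable that is NOT found on PATH.
# Returns 0 when all are found and 1 when at least one is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Executable names as they are called later in this protocol.
missing_tools fastq-dump fastqc rsem-prepare-reference \
    rsem-calculate-expression bowtie2-build tophat2 cufflinks cuffmerge R ||
    echo "The programs listed above were not found; check the installation and PATH." >&2
```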

2.2 Data Samples

Most public RNA-Seq data can be downloaded from the NCBI SRA
database (https://www.ncbi.nlm.nih.gov/sra) (see Note 2). In this
protocol, we use an RNA-Seq data set from the rubber tree. This data set
includes six samples from control and cold-stressed conditions with
three biological replicates each, which are denoted as “control” and “cold.”
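
Recording which sample belongs to which condition in a small table makes the later steps easier to check. The sketch below writes such a table; the names Sample1 to Sample6 follow the file naming used in the commands of this chapter, the assignment of the first three samples to "control" and the last three to "cold" mirrors the condition vectors used in the R code, and the function name make_sample_sheet is hypothetical.

```shell
# Hypothetical helper: write a tab-separated sample sheet.
# Assumption: Sample1-3 are the control replicates, Sample4-6 the cold replicates.
make_sample_sheet() {
    local out="$1" k cond
    printf 'sample\tcondition\n' > "$out"
    for k in 1 2 3 4 5 6; do
        if [ "$k" -le 3 ]; then cond=control; else cond=cold; fi
        printf 'Sample%s\t%s\n' "$k" "$cond" >> "$out"
    done
}

sheet="$(mktemp)"          # temporary file used only for this demonstration
make_sample_sheet "$sheet"
cat "$sheet"
```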

3 Methods

Download the RNA-Seq data from the NCBI SRA database and place
the files in your working directory (e.g., /home/your_name/
NGS/SRA). Run the commands as demonstrated in this protocol
in your working directory (see Notes 3 and 4).

3.1 RNA-Seq Data Quality Control

1. Generate FASTQ files from SRA files. To extract FASTQ files
from the downloaded .sra files and put them in a new folder “fq”,
go to your NGS data directory and type (see Note 5):
fastq-dump -O ./fq --split-files ./SRA/SRR*.sra
2. Run quality control with FastQC (see Note 6). Create the output
folder first, because FastQC does not create it:
mkdir -p ./qc
fastqc -o ./qc -f fastq ./fq/Sample*.fastq
3. Remove reads of low quality (optional). In most cases, the
low-quality reads have already been removed when the sequences
are delivered by the service supplier. In this example, the
FASTQ files had been filtered before submission to the NCBI
SRA database (see Note 7). fastq_quality_filter processes one file
per call and should not overwrite its input, so loop over the files and
write each result to a new file (the ".filtered.fastq" suffix is an
arbitrary choice):
for f in fq/Sample*.fastq; do fastq_quality_filter -Q33 -v -q 30 -p 90 -i "$f" -o "${f%.fastq}.filtered.fastq"; done
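
Filtering the two mate files of a paired-end sample separately can leave them with different numbers of reads, which paired-end tools downstream may reject. A quick check is sketched below; the function names count_reads and pairs_match are hypothetical, the code assumes uncompressed FASTQ with exactly four lines per read, and the two tiny input files are invented solely for the demonstration.

```shell
# Count reads in an uncompressed FASTQ file, assuming four lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Succeed only if both mate files contain the same number of reads.
pairs_match() {
    [ "$(count_reads "$1")" = "$(count_reads "$2")" ]
}

# Toy demonstration with two invented files holding two reads each.
tmp="$(mktemp -d)"
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > "$tmp/Sample1_1.fastq"
printf '@r1/2\nACGA\n+\nIIII\n@r2/2\nCTGA\n+\nIIII\n' > "$tmp/Sample1_2.fastq"

if pairs_match "$tmp/Sample1_1.fastq" "$tmp/Sample1_2.fastq"; then
    echo "Sample1: $(count_reads "$tmp/Sample1_1.fastq") read pairs"
else
    echo "Sample1: mate files differ in read count" >&2
fi
```

Equal counts do not prove that the mates are still in the same order, so this is only a first sanity check.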

3.2 Gene Expression Analysis Without Reference Genome

In most cases, nonmodel organisms do not have a reference genome.
We therefore use a reference-free analysis strategy to compare
gene expression profiles and to find DE genes. This strategy first
assembles a reference transcriptome from the RNA-Seq data, and
then maps the reads to the reference transcriptome and calculates
gene expression. In this protocol, we use Trinity to assemble the
transcriptome, then use RSEM to calculate read counts, and finally use two
popular packages, EBSeq and DESeq, to find DE genes.
1. Reference transcriptome assembly. The Trinity program [4]
assembles the reads from all the sample files into one reference
transcriptome, which is then used for gene expression analysis.
For paired-end RNA-Seq data with
read1 (*_1.fastq) and read2 (*_2.fastq) files, the reference
transcriptome can be assembled by typing:
Trinity.pl --JM 500G --seqType fq --left fq/Sample*_1.fastq --right fq/Sample*_2.fastq --output trinity_out --min_kmer_cov 5 --CPU 32
(see Note 8)
Troubleshooting: In some cases, the Trinity program will
stop because of insufficient memory when executing the
“butterfly_commands”. Go to the results directory trinity_out/
chrysalis/ and check whether the “butterfly_commands” file exists.
If it does, use the following commands to continue the assembly:
cmd_process_forker.pl -c trinity_out/chrysalis/butterfly_commands --CPU 10 --shuffle;
find trinity_out/chrysalis -name "*allProbPaths.fasta" -exec cat {} \; > trinity_out/Trinity.fasta;
You will find a “Trinity.fasta” file in the output directory, which
is the reference transcriptome assembled from all the reads. You
can also check the reference transcriptome statistics by running
the TrinityStats.pl script provided by the Trinity package:
TrinityStats.pl trinity_out/Trinity.fasta
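
One summary statistic commonly quoted for an assembly is the contig N50: with contigs sorted from longest to shortest, it is the length of the contig at which the running total first reaches half of all assembled bases. The sketch below computes it with standard awk and sort as an independent illustration of the definition; it is not how TrinityStats.pl is implemented, the function name fasta_n50 is hypothetical, and the four-sequence FASTA file is invented for the demonstration.

```shell
# Hypothetical helper: print the N50 of the sequences in a FASTA file.
fasta_n50() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: emit previous length
             { len += length($0) }                        # sequence line (may be wrapped)
        END  { if (len > 0) print len }                   # emit the last sequence
    ' "$1" |
    sort -rn |
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { print lens[i]; exit }  # first contig reaching half
            }
        }
    '
}

# Toy FASTA with lengths 8, 6 (wrapped over two lines), 4, and 2.
tmp="$(mktemp -d)"
printf '>t1\nACGTACGT\n>t2\nACG\nTAC\n>t3\nACGT\n>t4\nAC\n' > "$tmp/toy.fasta"
fasta_n50 "$tmp/toy.fasta"
```

For the toy file the total is 20 bases; the longest contig (8) does not reach 10, but adding the second (6) does, so the function prints 6.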
2. Gene expression quantification with RSEM. RSEM is an
accurate and user-friendly tool for quantifying transcript
abundances from RNA-Seq data, and it does not rely on the
existence of a reference genome [7]. Therefore, it is particularly
useful for expression quantification with de novo transcriptome
assemblies. RSEM is run through two main scripts
(rsem-prepare-reference and rsem-calculate-expression), which invoke
Bowtie [22] for read alignment. The first step is to extract and
preprocess the reference sequences and then build the Bowtie
indices.
mkdir rsem
cd rsem
mkdir tmp
extract-transcript-to-gene-map-from-trinity ../trinity_out/Trinity.fasta tmp/unigenes.togenes
rsem-prepare-reference --transcript-to-gene-map tmp/unigenes.togenes ../trinity_out/Trinity.fasta tmp/unigenes

Then the RNA-Seq reads in each sample are aligned to the
Bowtie indices and their relative abundances are calculated. These
tasks are handled by the rsem-calculate-expression script. By default,
RSEM uses the Bowtie alignment program to align reads, with
parameters specifically chosen for RNA-Seq quantification. The
rsem-calculate-expression script processes the reads of one sample
per call, so a short Bash script makes it much easier to handle a large
number of samples in one analysis. The loop below assumes that it is run
inside the rsem directory created above and that the six samples are
named Sample1 to Sample6.
export k
for ((k=1;k<=6;k+=1));do
rsem-calculate-expression -p 24 --bowtie-chunkmbs 512 --paired-end --no-bam-output --forward-prob 0.0 ../fq/Sample${k}_1.fastq ../fq/Sample${k}_2.fastq tmp/unigenes Sample${k};
done

For each sample, the rsem-calculate-expression script produces two files
with the “.results” suffix: the “.genes.results” file lists the TPM and
FPKM for each gene, whereas the “.transcripts.results” file lists the
TPM and FPKM for each transcript. The file structures are as follows:

The “Sample.genes.results” file:

gene_id      transcript_id(s)  length  effective_length  expected_count  TPM   FPKM
c0.graph_c0  c0.graph_c0_seq1  745.00  690.31            14.00           2.43  1.79
c1.graph_c0  c1.graph_c0_seq1  262.00  207.46            1.00            0.58  0.43

The “Sample.transcripts.results” file:

transcript_id     gene_id      length  effective_length  expected_count  TPM   FPKM  IsoPct
c0.graph_c0_seq1  c0.graph_c0  745     690.31            14.00           2.43  1.79  100.00
c1.graph_c0_seq1  c1.graph_c0  262     207.46            1.00            0.58  0.43  100.00
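
To inspect a single measure from one of these results files, a column can be selected by its header name rather than by position. The sketch below assumes that the file is tab-delimited with a header line; the function name results_column is hypothetical, and the demonstration file simply reproduces the two example gene rows shown in this section.

```shell
# Hypothetical helper: print the first column plus one column chosen by header name.
results_column() {
    local file="$1" column="$2"
    awk -F'\t' -v col="$column" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i   # locate the column
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { print $1 "\t" $idx }
    ' "$file"
}

# Demonstration file built from the two example rows of the genes.results layout.
tmp="$(mktemp -d)"
{
    printf 'gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n'
    printf 'c0.graph_c0\tc0.graph_c0_seq1\t745.00\t690.31\t14.00\t2.43\t1.79\n'
    printf 'c1.graph_c0\tc1.graph_c0_seq1\t262.00\t207.46\t1.00\t0.58\t0.43\n'
} > "$tmp/Sample.genes.results"

results_column "$tmp/Sample.genes.results" TPM
```

Looking the column up by name keeps the command valid even if a different RSEM version adds or reorders columns.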


3. Differentially expressed gene identification with EBSeq. EBSeq
is an R package for exploring DE genes and isoforms from
RNA-Seq data. It is based on an empirical Bayesian method
and aims to identify DE isoforms between two or more
biological conditions [20]. EBSeq processes the count matrix files
generated by RSEM and calculates the expression of each gene
in each sample.
RSEM provides several wrappers that invoke EBSeq to
identify differentially expressed genes. This is the easier way to use
EBSeq. Merge the single count files to generate a matrix file with
the following commands:
rsem-generate-ngvector ../trinity_out/Trinity.fasta cov5_trinity
rsem-generate-data-matrix Sample*.genes.results > genes.counts.matrix

Then use the following commands to obtain DE genes:
rsem-run-ebseq --ngvector cov5_trinity.ngvec genes.counts.matrix 3,3 GeneMat.results
rsem-control-fdr GeneMat.results 0.05 GeneMat.de.txt
(see Note 9)
Alternatively, you can use EBSeq natively for DE
gene identification. In the R console, type:
library("EBSeq")
setwd("/path/to/your/directory/rsem/")
GeneMat <- data.matrix(read.table(file="genes.counts.matrix"))
NgVec <- scan(file="cov5_trinity.ngvec", what=0, sep="\n")
Condition = factor(c("Control","Control","Control","Cold","Cold","Cold"))
GeneSizes = MedianNorm(GeneMat)
GeneEBOut = EBTest(Data=GeneMat, Conditions=Condition, sizeFactors=GeneSizes, maxround=10)
GeneEBDERes = GetDEResults(GeneEBOut, FDR=0.05)
(see Note 9)
For a more detailed introduction to these functions, please refer to the EBSeq
vignette [20].
4. Differentially expressed gene identification with DESeq.
Alternatively, you can use DESeq for DE gene identification. DESeq
is an R package to analyze sequence count data from RNA-Seq
and test for differential expression [21]. DESeq accepts RSEM
output files for analysis. The first step is to merge the
count files generated for each sample by the rsem-calculate-expression
script in the RSEM package. The merging step can be performed with the
merge_RSEM_frag_counts_single_table.pl script from the Trinity
package, where TRINITY_HOME is the Trinity installation directory:
$TRINITY_HOME/util/RSEM_util/merge_RSEM_frag_counts_single_table.pl Sample1.genes.results Sample2.genes.results Sample3.genes.results Sample4.genes.results Sample5.genes.results Sample6.genes.results > all.genes.counts

Then in the R console, type:
library("DESeq")
countTable <- read.table("all.genes.counts", header=T, sep="\t", row.names=1)
countTable = round(countTable)
(see Note 10)
conditions <- factor(c("Control","Control","Control","Cold","Cold","Cold"))
cds <- newCountDataSet(countTable, conditions)
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds)
res <- nbinomTest(cds, "Control", "Cold") # call differential expression
write.table(res, 'compare.csv', sep='\t', quote=F, row.names=F)
head(res)
plotMA(res)
res_sig <- subset(res, padj<0.05);
(see Note 11)
dim(res_sig)
res_sig_order <- res_sig[order(res_sig$padj),]
write.table(res_sig_order, 'difference.txt', sep='\t', quote=F, row.names=F)
(see Note 12)
For a detailed introduction, please refer to the DESeq vignette [23].

3.3 Gene Expression Analysis with Reference Genome

Benefiting from genome sequencing projects, reference genomes have recently been published for many nonmodel organisms. For these organisms, a reference-based analysis strategy can be adopted. Typically, we first prepare the reference genome files, then map each reads file to the reference genome, and finally call the DE genes.
1. Prepare reference genome file. Download the genome files (sequence FASTA file and GFF annotation file) from the GenBank database, and then build the Bowtie2 index with the “bowtie2-build” command in the Bowtie2 package:

bowtie2-build /path/to/genome/HbGenome.fas bowtie-ref/HbGenome

(see Note 13)


2. Map reads to reference genome. Map each reads file to the genome index with the tophat2 program, and then assemble transcripts from the aligned reads with the cufflinks program:
tophat2 -o 1th -p 32 -G /path/to/gff/HbGenome.gff3 bowtie-ref/HbGenome /path/to/sample1/Sample1_1.fq /path/to/sample/Sample1_2.fq

cufflinks -p 32 -o 1cl 1th/accepted_hits.bam

You may use a short Bash script to analyze several samples in one command:
export k;

for ((k=1;k<=6;k+=1));do

tophat2 -o ${k}th -p 32 -G /path/to/gff/HbGenome.gff3 bowtie-ref/HbGenome /path/to/sample1/Sample${k}_1.fq /path/to/sample/Sample${k}_2.fq;

cufflinks -p 32 -o ${k}cl ${k}th/accepted_hits.bam;

done
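Before launching a long batch run, it can help to print the commands for every sample and check the paths. The following is a minimal sketch, not part of the original protocol: the function name print_sample_commands and the READS_DIR variable are hypothetical, the sample count of six is assumed from the six BAM files used in the Cuffdiff step, and all paths are placeholders to be edited.

```sh
# Dry run: print the tophat2 and cufflinks command lines for samples 1..N
# without executing them. GFF, INDEX and READS_DIR are placeholder
# locations (assumptions) and must be adapted to your own directories.
GFF=/path/to/gff/HbGenome.gff3
INDEX=bowtie-ref/HbGenome
READS_DIR=/path/to/reads

print_sample_commands() {
  n=$1   # number of samples
  k=1
  while [ "$k" -le "$n" ]; do
    echo "tophat2 -o ${k}th -p 32 -G ${GFF} ${INDEX} ${READS_DIR}/Sample${k}_1.fq ${READS_DIR}/Sample${k}_2.fq"
    echo "cufflinks -p 32 -o ${k}cl ${k}th/accepted_hits.bam"
    k=$((k + 1))
  done
}

# Print the command lines for six samples so they can be reviewed.
print_sample_commands 6
```

Once the printed lines look right, piping them to a shell (print_sample_commands 6 | sh) would execute them in order.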
Then merge all the assembled transcript files:
ls *cl/transcripts.gtf > assemblies.txt

cuffmerge -p 32 -g /path/to/gff/HbGenome.gff3 -s /path/to/genome/HbGenome.fas assemblies.txt
(see Note 14)
3. Call differentially expressed genes with Cuffdiff. Cufflinks includes a program, “Cuffdiff”, which can be used to find significant changes in transcript expression, splicing, and promoter use. Cuffdiff requires two types of files: the SAM (or BAM) files from the TopHat program and the merged transcript annotation GTF file produced from the Cufflinks assemblies:

cuffdiff -o diff_out/ -b /path/to/genome/HbGenome.fas -L Control,Cold -u merged_asm/merged.gtf -p 8 1th/accepted_hits.bam,2th/accepted_hits.bam,3th/accepted_hits.bam 4th/accepted_hits.bam,5th/accepted_hits.bam,6th/accepted_hits.bam

(see Note 15)


The comparison results will be written to the “diff_out” directory. Several sets of comparison results are produced, covering cds, isoform, gene, tss, splicing, and promoter. In most cases, you will be interested in the “gene_exp.diff” file. You can then extract DE genes from this file based on your own criteria and the adjusted “q_value”. The content of the diff file:

test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant

XLOC_000001 XLOC_000001 - scaffold0001:445549-451760 Control Cold OK 4.17386 2.62692 -0.668007 -0.799812 0.1381 0.404678 no
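The extraction step described above can be sketched with awk. This is an illustrative filter rather than the authors' own script: the function name filter_de_genes and the thresholds (q_value below 0.05, absolute log2 fold change of at least 1) are assumptions, and the column positions are taken from the header line shown above. The second demo row is invented purely for illustration.

```sh
# Keep genes whose test status is OK, whose q_value is below a cutoff, and
# whose absolute log2 fold change reaches a threshold.
# Columns used: 7 = status, 10 = log2(fold_change), 13 = q_value.
filter_de_genes() {
  # $1: path to gene_exp.diff, $2: q-value cutoff, $3: minimum |log2FC|
  awk -F '\t' -v q="$2" -v lfc="$3" '
    NR == 1 { print; next }          # keep the header line
    $7 != "OK" { next }              # skip tests that did not complete
    {
      fc = $10 + 0
      if (fc < 0) fc = -fc
      # an infinite fold change (one condition at zero) passes the size filter
      passes_fc = ($10 ~ /inf/) || (fc >= lfc)
      if ($13 + 0 < q && passes_fc) print
    }
  ' "$1"
}

# Demo on a two-gene table: the row shown above plus one hypothetical row.
demo=$(mktemp)
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > "$demo"
printf 'XLOC_000001\tXLOC_000001\t-\tscaffold0001:445549-451760\tControl\tCold\tOK\t4.17386\t2.62692\t-0.668007\t-0.799812\t0.1381\t0.404678\tno\n' >> "$demo"
printf 'XLOC_000002\tXLOC_000002\t-\tscaffold0001:500-900\tControl\tCold\tOK\t1.5\t12.0\t3.0\t4.2\t0.0001\t0.003\tyes\n' >> "$demo"
filter_de_genes "$demo" 0.05 1
rm -f "$demo"
```

On a real run, the call would be filter_de_genes diff_out/gene_exp.diff 0.05 1 > de_genes.txt, with the cutoffs adjusted to your own criteria.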

4 Notes

1. TopHat2 has been superseded by HISAT2. In this protocol, we still use the older TopHat2 for analysis.
2. To simplify the analysis procedure, we use RNA-Seq data from the nonmodel organism Hevea brasiliensis (rubber tree) as the example. This dataset includes two conditions (leaf under control conditions, and leaf cold-treated for 24 h), each with three biological replicates.
3. This protocol only shows how to run each analysis step and gives the frequently used options for each command or script. You may also check the remaining options of each command and optimize your own analysis parameters.
4. Please note the differences in directory structure between this protocol and your own workstation. You should change the file paths and names according to your own directories.
5. The fastq-dump tool extracts reads from an SRA package. The parameter “-O” defines the output directory. The “--split-files” option dumps each read into a separate file. Files will receive a suffix corresponding to the read number.
6. The results are in a subdirectory named after the fastq filename with a “_fastqc” suffix. You may examine the detailed quality check results in the “fastqc_report.html” file.
7. Add the “-Q33” parameter if you encounter the “fastq_quality_filter: Invalid quality score value” error.
8. The “--JM” option defines how many gigabytes of memory are allocated to Jellyfish for k-mer counting. --left and --right define the left and right fastq files for paired-end sequencing results. --min_kmer_cov defines the minimum k-mer coverage considered by Inchworm; a high --min_kmer_cov value reduces noise in the assembly and identifies only transcripts that are relatively highly expressed, but loses some lowly expressed transcripts. Use --CPU to set the number of CPUs for Inchworm when your server has multiple CPUs.
9. This analysis finds DE genes at the target FDR of 0.05.
10. Expected counts from RSEM are floating-point numbers because reads mapped to multiple locations are assigned fractionally to each location, weighted by an EM algorithm. However, DESeq only accepts integer counts. We therefore use the round function to get integer counts.
11. Select DE genes with an adjusted p-value less than 0.05.
12. The script selects DE genes with an adjusted p-value less than 0.05, sorts them by adjusted p-value, and then exports the DE gene list to the “difference.txt” file.
13. The bowtie2-build command builds the “HbGenome” genome index from the genome file “HbGenome.fas”.
14. The program will generate a “merged.gtf” file in the “merged_asm” directory.
15. Supply replicate SAM (or BAM) files as comma-separated lists for each condition: sample1_rep1.sam,sample1_rep2.sam,...sample1_repM.sam. Separate the conditions with a space. -L/--labels takes a comma-separated list of condition labels. Each label indicates one treatment (condition); the number of labels should equal the number of conditions.
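The list format described in Note 15 can be assembled in the shell instead of being typed by hand. A minimal sketch, assuming the TopHat output directories 1th to 6th used in this protocol; join_replicates and the GENOME variable are hypothetical names, and the command is printed rather than executed so that it can be checked first.

```sh
# Join all arguments with commas; call it once per condition.
join_replicates() {
  list=""
  for f in "$@"; do
    if [ -z "$list" ]; then
      list=$f
    else
      list="$list,$f"
    fi
  done
  printf '%s\n' "$list"
}

# Placeholder genome path (assumption); adapt to your own directory.
GENOME=/path/to/genome/HbGenome.fas

control=$(join_replicates 1th/accepted_hits.bam 2th/accepted_hits.bam 3th/accepted_hits.bam)
cold=$(join_replicates 4th/accepted_hits.bam 5th/accepted_hits.bam 6th/accepted_hits.bam)

# Two labels (Control,Cold) for two space-separated replicate lists.
echo "cuffdiff -o diff_out/ -b $GENOME -L Control,Cold -u merged_asm/merged.gtf -p 8 $control $cold"
```

Keeping one join_replicates call per condition makes it easy to see that the number of labels matches the number of lists.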

Acknowledgments

This work is supported by the National Natural Science Foundation of China (grant No. 31301072).

References
1. Hoeijmakers WAM, Bártfai R, Stunnenberg HG (2013) Transcriptome analysis using RNA-Seq. Methods Mol Biol 923:221–239
2. Garg R, Jain M (2013) RNA-Seq for transcriptome analysis in non-model plants. Methods Mol Biol 1069:43–58
3. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63
4. Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
5. Xie Y, Wu G, Tang J et al (2014) SOAPdenovo-trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30:1660–1666
6. Simpson JT, Wong K, Jackman SD et al (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123
7. Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323
8. Chao J, Chen Y, Wu S, Tian W-M (2015) Comparative transcriptome analysis of latex from rubber tree clone CATAS8-79 and PR107 reveals new cues for the regulation of latex regeneration and duration of latex flow. BMC Plant Biol 15:104
9. Fang Y, Mei H, Zhou B et al (2016) De novo Transcriptome analysis reveals distinct Defense mechanisms by young and mature leaves of Hevea Brasiliensis (Para rubber tree). Sci Rep 6:33151
10. Bevilacqua CB, Basu S, Pereira A et al (2015) Analysis of stress-responsive gene expression in cultivated and weedy Rice differing in cold stress tolerance. PLoS One 10:e0132100
11. Fu J, Miao Y, Shao L et al (2016) De novo transcriptome sequencing and gene expression profiling of Elymus Nutans under cold stress. BMC Genomics 17:870
12. Nakashima K, Yamaguchi-Shinozaki K, Shinozaki K (2014) The transcriptional regulatory network in the drought response and its crosstalk in abiotic stress responses including drought, cold, and heat. Front Plant Sci 5:170
13. An D, Yang J, Zhang P (2012) Transcriptome profiling of low temperature-treated cassava apical shoots showed dynamic responses of tropical plant to cold stress. BMC Genomics 13:64
14. SRA Toolkit: https://trace.ncbi.nlm.nih.gov/Traces/sra/
15. FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
16. R: The R Project for Statistical Computing. https://www.r-project.org/
17. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with bowtie 2. Nat Methods 9:357–359
18. Kim D, Pertea G, Trapnell C et al (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36
19. Trapnell C, Roberts A, Goff L et al (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and cufflinks. Nat Protoc 7:562–578
20. Leng N, Dawson JA, Thomson JA et al (2013) EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29:1035–1043
21. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11:R106
22. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
23. Love MI, Anders S, Kim V, Huber W (2015) RNA-Seq workflow: gene-level exploratory analysis and differential expression. F1000Res 4:1070
Chapter 2

Microarray Data Analysis for Transcriptome Profiling


Ming-an Sun, Xiaojian Shao, and Yejun Wang

Abstract
Microarray data have accumulated rapidly over the past two decades. Owing to their high-throughput nature, microarray techniques have shifted biological studies from specific genes to the transcriptome level and have greatly advanced many fields of biology. While microarrays offer great advantages for expression profiling, they also pose many challenges for computational analysis. In this chapter, we demonstrate how to perform a standard analysis, including data preprocessing, quality assessment, differential expression analysis, and general downstream analyses.

Key words Microarray, Normalization, Clustering, Differential expression, Bioconductor, Limma, GeneFilter

1 Introduction

The successful application of microarrays to expression analysis can be traced back two decades [1]. Since then, the microarray technique has been widely used for expression profiling in almost every field of biological research [2]. Beyond transcription analysis, alternative microarray-based techniques have also been designed for other purposes such as genotyping, DNA mapping, protein binding, and epigenetic studies [3]. Owing to their high-throughput nature, microarray techniques have shifted biological studies from specific genes to the transcriptome level and have greatly advanced many fields of biology. Previous studies showed that microarrays are robust for measuring the transcriptome [4]. Even though RNA-Seq has emerged in recent years, microarrays remain popular for measuring gene expression [5, 6]. In particular, since microarrays are cheaper than RNA-Seq, they have advantages for clinical studies, which may involve a large number of samples. For example, microarrays are frequently used in several comprehensive cancer projects, including The Cancer Genome Atlas project [7].

Yejun Wang and Ming-an Sun (eds.), Transcriptome Data Analysis: Methods and Protocols, Methods in Molecular Biology,
vol. 1751, https://doi.org/10.1007/978-1-4939-7710-9_2, © Springer Science+Business Media, LLC 2018


While microarrays offer great advantages for expression profiling, they also pose many challenges for analysis [2]. In particular, technical noise can be introduced into microarray data. Further challenges come from the very large number of probes on a microarray and the small number of replicates used in most microarray studies. A large number of methods have been proposed to address the problems at each analysis step, including quality control [8–10], normalization [11], and differential expression analysis [12–14].
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data arising from genomics and molecular biology [15]. So far, more than 1000 packages have been released in Bioconductor. Importantly, every step of microarray data analysis can be carried out using packages hosted in the Bioconductor project (see Note 1). In this chapter, we show how to implement each step of microarray analysis, including quality control, normalization, differential expression analysis, and some general downstream analyses, using packages mainly from the Bioconductor project. In this protocol, data generated on the Affymetrix Mouse Gene 2.0 ST Array (MoGene-2.0-ST) platform are used for demonstration. However, the analysis procedure described here can easily be adjusted for data from other microarray platforms.

2 Materials

2.1 Microarray Data

This protocol starts with Affymetrix microarray data in CEL format (see Note 2). The CEL files store the calculated intensity values. In addition to CEL files newly generated in the lab, a huge number of published CEL files can be retrieved from public resources, in particular ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). Note that ArrayExpress is specific for microarray data, while GEO also contains other types of omics data.
In this protocol, we use a public dataset (GEO accession: GSE67964) generated on the Affymetrix Mouse Gene 2.0 ST Array (MoGene-2.0-ST) for demonstration.

2.2 R Packages

This protocol involves a number of R packages, so basic knowledge of R and Bioconductor is essential. The basics of R can be learned from resources such as http://tryr.codeschool.com/. R and Bioconductor can be installed by following the instructions at http://www.bioconductor.org/install/. Below we briefly summarize how to install and load R and Bioconductor packages (see Note 3). The installation of each package used in this protocol is described in the corresponding section.
R packages can be installed easily using the install.packages() function. Taking the ggplot2 package as an example, you just need to start the R console and type:
install.packages("ggplot2")

To install core packages from Bioconductor, type:

source("https://bioconductor.org/biocLite.R")
biocLite()

Then, specific Bioconductor packages can be installed. For example, to install the oligo package, type:
biocLite("oligo")

After installation, both R and Bioconductor packages can be loaded with the library() function. Taking the oligo package as an example, to load it, type:
library(oligo)

2.3 Annotation Files

Two types of annotation files are required: (1) the probe set annotation, which summarizes the location of all probes on the array, as well as the probes belonging to each probe set; and (2) the gene annotation, which maps the probe sets to their corresponding genes.
For most microarray platforms, Bioconductor packages providing the annotation information are ready for use (see Note 1). For example, the two annotation packages for the MoGene-2.0-ST microarray are pd.mogene.2.0.st [16] and mogene20sttranscriptcluster.db [17], respectively. Since this protocol relies heavily on Bioconductor packages, these annotation packages can be incorporated into the pipeline seamlessly.

3 Methods

3.1 Data Preprocessing

3.1.1 Prepare Data

We download CEL files from GEO (https://www.ncbi.nlm.nih.gov/geo/) by searching for the GEO accession (e.g., GSE67964). This dataset contains data for wild-type and ROR_alpha_gamma_dKO samples, each with four replicates.
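For scripted retrieval, the supplementary archive of a GEO series can usually be fetched from the GEO FTP site. The sketch below only builds the URL; it assumes GEO's usual directory layout (the last three digits of the accession replaced by nnn) and that the series provides an archive named ACCESSION_RAW.tar, both of which should be verified on the GEO record page. The function name geo_raw_url is hypothetical.

```sh
# Build the assumed download URL of the raw-data archive for a GEO series.
geo_raw_url() {
  acc=$1                 # e.g. GSE67964
  digits=${acc#GSE}      # numeric part of the accession
  if [ "${#digits}" -le 3 ]; then
    stub="GSEnnn"                      # short accessions share one directory
  else
    stub="GSE${digits%???}nnn"         # replace the last three digits by nnn
  fi
  printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/%s_RAW.tar\n' "$stub" "$acc" "$acc"
}

# Show the URL for the dataset used in this protocol.
geo_raw_url GSE67964

# To download and unpack (not executed here; requires network access):
#   curl -L -O "$(geo_raw_url GSE67964)"
#   tar -xf GSE67964_RAW.tar
```

The CEL files inside such archives are often gzip-compressed, so a decompression step may be needed before setting the work directory.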

3.1.2 Set Work Directory

To set the work directory, type:

setwd("directory_with_CEL_files")

3.1.3 Read Data into Memory

The Bioconductor package “oligo” offers a number of tools for preprocessing Affymetrix CEL files, including data import, background correction, normalization, data summarization, and visualization [18]. In addition, you might need to install and load the probe set annotation package (e.g., pd.mogene.2.0.st for the MoGene-2.0-ST platform) if it is not installed automatically together with “oligo”.
1. To install and load the oligo package, type:
biocLite("oligo")
library(oligo)

2. To get the list of all the CEL files in the directory, type:

cel.files <- list.celfiles()

Or if you only want to read specific CEL files (e.g., celfile1 and celfile2), type:

cel.files <- c(celfile1, celfile2)

3. By default, CEL file names are used as sample names. However, we usually want to respecify the sample names, in particular when the CEL file names are lengthy. The sample names must match the CEL files in number and order. To specify sample names manually, type:

sample.names = c("WT1", "WT2", "WT3", "WT4", "KO1", "KO2", "KO3", "KO4")

4. To read CEL files into memory, type:

affy.raw <- read.celfiles(cel.files, sampleNames = sample.names)

3.1.4 Get Normalized Gene Expression

To summarize gene-level expression, the probe set annotation for the specific array is required. Taking microarray data from the MoGene-2.0-ST platform as an example, the Bioconductor package pd.mogene.2.0.st [16] is needed.
1. To install and load the annotation library pd.mogene.2.0.st, type:
biocLite("pd.mogene.2.0.st")
library(pd.mogene.2.0.st)

2. To make reasonable comparisons between samples, normalization must be performed. Robust Multi-array Average (RMA) is the most widely used normalization algorithm. There are several other normalization algorithms, including GCRMA, MAS5, dChip, and so on (see Note 4). The differences between these methods have been discussed in previous studies []. GCRMA takes GC content into account when performing RMA normalization. However, one study argued that a crucial step in GCRMA is responsible for introducing severe artifacts into the data, leading to a systematic overestimate of pairwise correlation []. Here we show the use of RMA, but you can apply another algorithm of your choice. To normalize gene expression using the RMA algorithm and create an ExpressionSet object (see Note 5), type:
eset <- rma(affy.raw)

3. To save the expression data in a local file for later use (note that the expression values in the output are normalized and log2 transformed), type:
write.exprs(eset,file="rma_norm_expr.txt")
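Because the saved values are on the log2 scale, a difference between group means is a log2 fold change. The following sketch illustrates this arithmetic with awk; it is a quick sanity check, not a statistical test, and it is not part of the original protocol. It assumes a tab-delimited file with one header row and, in each following row, a probe set ID followed by the four WT columns and then the four KO columns in the order given by sample.names. The function name log2_group_difference and the demo values are hypothetical.

```sh
# Per probe set: mean(KO) - mean(WT) on the log2 scale, plus the
# corresponding linear fold change (2 raised to that difference).
log2_group_difference() {
  awk -F '\t' '
    NR == 1 { next }                            # skip the header row
    {
      wt = ($2 + $3 + $4 + $5) / 4              # assumed WT1..WT4 columns
      ko = ($6 + $7 + $8 + $9) / 4              # assumed KO1..KO4 columns
      d = ko - wt                               # log2 fold change (KO vs WT)
      printf "%s\t%.3f\t%.3f\n", $1, d, 2 ^ d   # ID, log2FC, linear FC
    }
  ' "$1"
}

# Demo with two hypothetical probe sets.
demo=$(mktemp)
printf '\tWT1\tWT2\tWT3\tWT4\tKO1\tKO2\tKO3\tKO4\n' > "$demo"
printf 'probeA\t5\t5\t5\t5\t6\t6\t6\t6\n' >> "$demo"
printf 'probeB\t8\t8\t8\t8\t7\t7\t7\t7\n' >> "$demo"
log2_group_difference "$demo"
rm -f "$demo"
```

A log2 difference of 1 corresponds to a twofold increase and a difference of -1 to a halving, which is why fold-change thresholds are usually stated on the log2 scale.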

3.1.5 Gene Annotation

Gene annotation is needed for further interpretation of the results. Two Bioconductor packages are required: Biobase [15] and mogene20sttranscriptcluster.db [17].
1. To install and load these two packages, type:
biocLite("Biobase")
biocLite("mogene20sttranscriptcluster.db")
library(Biobase)
library(mogene20sttranscriptcluster.db)

2. The mogene20sttranscriptcluster.db package provides a variety of detailed information for the MoGene-2.0-ST platform, including ACCNUM, ENSEMBL, ENTREZID, ENZYME, GENENAME, GO, PATH, PFAM, PROSITE, REFSEQ, SYMBOL, UNIGENE, and UNIPROT. To get a list of the annotation types available in the package, type:
keytypes(mogene20sttranscriptcluster.db)

3. To retrieve data for selected objects (e.g., ENTREZID and SYMBOL as shown below) as a data frame, type:
gns <- select(mogene20sttranscriptcluster.db, keys(mogene20sttranscriptcluster.db), c("ENTREZID", "SYMBOL"))

4. For certain types of annotation (such as gene symbol), there may be multiple matches for the same gene. In that case, if you only want to keep one match per gene, the most naive way is to keep the first one. Skip this step if you want to use the full annotation information. To keep only the first annotation for each gene, type:
gns <- gns[!duplicated(gns[,1]),]
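The keep-first rule applied by duplicated() can also be reproduced outside R, for example on an annotation table exported as tab-delimited text. A minimal sketch, assuming the key (probe set ID) is in the first column; the function name keep_first_per_key and the demo rows are hypothetical.

```sh
# Print only the first row seen for each value of column 1.
keep_first_per_key() {
  awk -F '\t' '!seen[$1]++' "$1"
}

# Demo: id1 has two annotations; only the first one is kept.
demo=$(mktemp)
printf 'id1\t100\tGeneA\n' > "$demo"
printf 'id1\t101\tGeneB\n' >> "$demo"
printf 'id2\t200\tGeneC\n' >> "$demo"
keep_first_per_key "$demo"
rm -f "$demo"
```

As in the R version, which match survives depends only on row order, so the table should be sorted deliberately if a particular match is preferred.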
flowers, there must still be minds in which language is growth and
beauty; and there must be a Gradus ad Parnassum, a means of
working-up through the machine-made stages, a consciousness
piercing somehow down into the copy-book world, something to
remind the half-lettered of the primitive life they have emerged from
and the completer life to which they would attain. Our English must
keep its natural warmth and concreteness, its gift of free response to
the fresh fact. These things cannot be preserved. Preserves, it is
true, keep indefinitely, but at the sacrifice of freshness; and it is
freshness that we want. What we love most in English is just that
quality of unsugared sweetness, which is the difference between fruit
and jam.

Here we bring new water from the well so clear,
For to worship God with this happy New Year.

The best English always has a bloom upon it. The danger is that, as
vulgarisms increase on one side, proprieties will increase on the
other, and that conversation may begin to burden itself with a sense
of duty. To be correct is already to be mechanical. The defiance of
correctness, even by the vulgar, has in it something of the virtue and
virility, which, in the work of masters, we recognize as the genius of
the language. It is easy enough to avoid saying “like I do”; but it is
difficult to realize that living language overrides grammatical
distinctions and that the test of a phrase is not whether it has been
tabled at Oxford, but whether it has its share of soil and sun and
dew. Here the indolences of our language, its cautiousness, and
even its propensity to wallow in the mire, may have their saving
influence. They are all symptoms of the instinct to get appearances
on the honourable side, the instinct to appear less, not more, than
you are; they are the tacit acknowledgment of a standard of reality,
and count for ballast and steadiness.
Are there then no means of vitalizing our English speech? One
cannot put the question without seeing that it is unreal. “The answer
is in the negative”, as our officials say. Even education itself,
consciously applied, may defeat its object; for if people are to talk
English, they must talk as they wish to talk; they know that the
majority of their would-be masters talk the worse for talking as they
have been taught. As to the meanings of words, the temptation to
suppose that they can be decided from on high must specially be
resisted. We all have our contribution to make to the meaning of the
words we use, and the greatest words—faith, freedom, sport, spirit—
cannot mean more than we do. These cannot be standardized;
standardization, the name without the thought, is their death, simply.
The Trade Unionists of England are disposed to banish ‘competition’
from our dictionary; will nature vanish it from hers? ‘Religion’,
somewhere in America, is the belief that the world was created in six
days; if truth is a fundamentalist, well and good. Obviously there
must be standardization up to a point if people are to stick together,
and we must be prepared to swallow it in considerable doses now
that English is the language of two hemispheres. But the essential is
that the point should be a point of agreement. The kind of feeling, the
kind of habit, that can be imposed on a man are not worth imposing:
the Germans showed that. We, too, have our outbreaks of the
dragooning impulse: the word ‘Empire’ is a notorious rally, with
hyænas always hot upon its trail. But, on the whole, the tendency to
reduce experience to rule and its expression to a formula, the
tendency to regularize men’s minds and drill them into uniformity,
flatly opposed as it is to all our traditions, wins little success amongst
us. True, we have a certain uniformity of drabness (the livery of the
sparrow) which suggests an army inured to all the degradations of
drill and rebellious only against its smartness. But then, it is the
smartness that kills. Drill is machine-made uniformity, a necessary
evil of which the English hate to make a panache. Their uniformities
are morose, because they are uniformities of submission; their pride
goes out to the things they touch directly and can make their own.
This is the attitude to be cherished at all costs, because the future is
open to it, because it opens to the future. By Heaven’s grace, the
English have it deep ingrained. Thus the future of English presents
itself to the mind as depending, above all, on the survival, in its pre-
eminence, of the spirit of freedom, the more so because the scope of
freedom is determined by the capacity for discipline. The question of
the day is how much machinery a man can stand; and the hope for
English is that the average Englishman can stand so much.
Regulations are necessary everywhere. Language itself must have
its dictionary, grammar its rules. The English rob them of their sting
by toleration. Their order even when they speak is spontaneous and
has a taste of liberty.
That an Englishman should regard England as the life-centre of
the English language is, perhaps, inevitable; yet he is foolish if he
assumes her to be so. The life-centre of English is to be found where
the spirit of those who speak it is in closest accord with developing
realities, and these cannot reveal themselves to minds fixed in any
past, however vital that past may have been when it was present.
Are not, then, the Americans living a more contemporary life than we
are?—has not the focus of development passed over to them? This
is a question so searching that I can touch upon it only with the
greatest diffidence. At the conclusion of his first preface to Leaves of
Grass, Whitman, distinguished among great writers for the forward
view, congratulated himself and the Americans on the qualities of the
language they had inherited. “English”, he wrote, “is the chosen
tongue to express growth, faith, self-esteem, freedom, justice,
equality, friendliness, amplitude, prudence, decision, and courage.” It
is a noble list of virtues which no one would wish to disavow; and yet
the Englishman, of whatever station, would still prefer the briefer
catalogue of Chaucer’s knight, who, five hundred years ago,

loved chivalrye
Trouthe and honour, fredom and curteisye.

In such words as courtesy, chivalry, and honour, though doubtless he
does not understand them quite as Chaucer did, he would trace a
fullness of experience, for which self-esteem, friendliness, and their
like, however generously mixed with faith and courage, seem poor
equivalents. Now, Chaucer’s virtues obviously assume inequalities
between men and a sense of the responsibilities of privilege.
Whitman’s assertion is that the English ideal survives when privilege
is discarded. Can it? Is not the bloom, is not the ripeness of our most
comprehensive, most human words, is not the peculiar aroma which
surrounds the English conception of the virtues, traceable to our
candid admission that inequalities, even when traditional, may be
bedded in truth? Honour itself, though not the property of a class,
belongs we feel, to those who, by favour of circumstance in part,
have come to see that circumstance counts for nothing by the side of
truth and loyalty, and who therefore identify these with their very
being. Arising out of advantage, the sense of honour carries with it a
compensating obligation to all from whom such advantage is
withheld. No such associations can attach to the word in America,
because they imply limits which are not recognized, nor is honour
allowed its externalization, its badge. The King is, with us, the
fountain of honour, as he is also its personification at the height; and
to them our toleration of royalty is a mysterious medievalism. Yet the
Englishman who easily sees the absurdity of kings in general finds
his own miraculously contemporaneous. Differences like this affect in
a thousand ways the flavour and idiom of the two languages (for, for
the moment, we must call them two), and even the tone with which
they are spoken. American talk is full of equality; and to the English
ear this equality sounds less like a harmonious prevision of Nature’s
purpose than a grim determination to wrest it into line with human
wishes.
Right and wrong in such a matter can be decided only by the
event. However it be, the United States, obviously, is now the scene
of the severest ordeals, the vividest excitements of our language.
Only when we hear English on the lips of Americans do we fear for
its integrity; others might drag it down; they alone could lift it into
change; they alone speak an actively competitive English. They have
the right. The English of the United States is not merely different
from ours, it has a restless inventiveness which may well be founded
in a sense of racial discomfort, a lack of full accord between the
temperament of the people and the constitution of their speech. The
English are uncommunicative, the Americans are not. In its coolness
and quiet withdrawal, in its prevailing sobriety, our language reflects
the cautious economies and leisurely assurance of the average
speaker. We say so little that we do not need to enliven our
vocabulary and underline our sentences, or cry ‘Wolf!’ when we wish
to be heard. The more stimulating climate of the United States has
produced a more eager, a more expansive, a more decisive people.
The Americans apprehend their world in sharper outlines and aspire
after a more salient rendering of it. No doubt the search for emphasis
in the speech of Americans and of American women particularly
arises, in part, out of the sheer volume of their communication; but it
is also because of their keener interest in things that they have a
greater desire to talk about them.
With this greater vividness goes, inevitably perhaps, a disposition
to anticipate, to define, to ‘fix’. The American nation was born of the
desire for a more perfect freedom than was obtainable in England;
and one of its first actions was to get freedom fixed, to define and
express it in a constitution. It might seem impossible that freedom
should ever be a chain, but stranger things have happened; and a
chain that passes under the name of freedom is peculiarly galling.
The American is threatened by a danger of knowing his freedom
before he gets it; the Englishman at best surmises, out of a mind
stored with immemorial checks and inhibitions. Idealism with the
English is an unacknowledged leaven, permeating action and
language and passing from one to the other in a haze of tolerance
that helps them to surmount the difficult transition from thought to
things. Sleepy blundering protects them against the cruder
certitudes. The American attitude has more of the unmediated clash
of steel on steel, unsurpassable when the fit is perfect and the
speeds accurately timed, but, in the world we know, liable to produce
friction, heat, and jarring. The bright slap-dash of the American
vernacular shows the defect of this quality, and with its insistence on
scoring leaves reality behind. In the ‘he-man’ hero of ‘sob-stuff’
efficiency and sentimentalism meet and marry.
Oppressed by the weight of their traditions, anxious to find a
machinery for maintaining them, the English in England show
symptoms of decline. Societies to study and protect a language
however admirably inspired, have an ominous, classicizing trend.
We are becoming conscious of our language as of our Empire, and
our virtue was our unconsciousness. The fresh outlook, the frank
unconcern, the overflowing youthfulness of the Americans drive us
back upon ourselves, it may be, but they are a reviving challenge,
nevertheless; and though much that is most deeply characteristic of
the language is threatened by Americanism, the conditions under
which English is spoken in the United States (where it is only one
language among many) have a great deal in common with those out
of which it originally grew, and are certain to produce, as indeed they
have produced already, a flow of novel words and novel devices,
some of which will remain to enrich and renovate our speech. The
fact, too, that America and England stand for different impulses, not
easily reconcilable, may enable them to discover and release a
further impulse, deeper than that with which either seeks to be
identified. Above all, the more magnetic, more mercurial, the tauter,
stormier American temperament has, with these gifts of the modern
life of speed and contrast, a quicker sympathy, a warmer and more
inclusive comradeship. Love and freedom are the greatest words of
our speech; and if, in America, ‘freedom’ is losing some of its bloom,
‘love’ has found there a new substance and sweetness.
The contrasting and competitive use of their one language by the
English and the Americans gives it a new occasion for the exercise
of its old and noble faculty of compromise. In a period of promise
and renewal, it was beginning to grow old, the Americans are young;
in a period of urgency, it was lagging, the Americans have made
speed their element. Nothing, we may be sure, will ever make the
English language brisk; but its strong constitution will assimilate
tonics as fast as friends can supply them, and take no serious harm.
Changes are certainly in store for it; but the best and most English
instinct is still that of resistance to change, and above all to any plan
or method of change, any committee or academy or association to
school and enlighten us. Let the future of our language repose in our
own keeping; let us be jealous of our property in it. Take the most
obvious of its faults, its vagaries of spelling and pronunciation. Of
course it would be an advantage if there were less chaos here. But it
is doubtful whether, if a revision was made by the best people that
could be found, our gains would outweigh the loss we should suffer
in having asked for it; and, just because rulings are un-English, they
generally come from the worst people. On pronunciation the B.B.C.
already undertakes to instruct us, and its chief adviser is said to be
an Irishman. O passi graviora...! The Lord will make an end of these
things too. Milton spelt a number of words variably to express
degrees of emphasis; it is pleasant to think that nothing need prevent
a successor of his doing the same to-morrow, if he ever finds a
successor. But, naturally, the position is different now that usage is
settled. Usage is our best law. The Americans have dropped a u out
of humour and other words; possibly we should have done so, if they
had not. An inconspicuous adjustment like this which saves time and
trouble is obviously harmless, and one may even hope that it will be
followed by others. From time to time experiments can be aired in
the press or by some enterprising publisher; if they find favour, they
will be adopted. But conscious spelling leads to conscious
pronunciation; and, again, this kind of consciousness, when English
people get it, always goes wrong. You change ‘humour’ into ‘humor’
and you get people talking as if the last syllable rhymed with ‘or’. You
change the spelling of a word to bring it into line with the
pronunciation and, before you can look round, people have changed
the pronunciation to bring it into line with the spelling. Where are you
then? The truth is, that sensitive pronunciation of English involves
gradations and blends of vowel sound that the alphabet has no
means of recording; and our frank anomalies are really useful if they
help to remind us of this. How am I to pronounce ‘prophecy’ or
‘library’ or ‘worship’? I only know when I hear them on the lips of
some one who can speak English. A further value of our spelling, as
we have it, is its bond with the past. It is a pity that many usages,
when first established, were established amiss; but the errors are of
such ancient date that they have grown into the language. Most of
our spellings, too, have something to tell us of the history and origin
of the words concerned, and, in a mixed language like ours, this is
much more important than that they should attempt to imitate and
perpetuate our way of pronouncing them. It is absurd to spell ‘rough’
and ‘dough’ as we do; but if we substituted ‘ruff’ and ‘doe’, we should
lose interesting information and also fall into a confusion which we
now avoid.
What applies to spelling applies equally to grammar and to the
formation of words. We appreciate it, of course, when people who
have studied language and have leisure to think about such things
tell us how we ought to speak and what kind of improvements we
might introduce into our language if we chose. This is the sort of
topic which serves admirably for the correspondence columns of the
daily press during the month of August, and gives its readers
something to refresh their minds with in intervals of fishing and
shooting. But when enthusiasts run campaigns against ‘cinema’ or
‘aeroplane’, telling us that we must say ‘kineema’ and ‘air-plane’, and
suggesting that English will go to the dogs unless we are more
serious and can consent to be guided by competent authority, the
reply is that seriousness and authority are the dogs, where English is
concerned. So far, it has always kept them running and we hope it
always will.
All the same, it would be the greatest mistake to suppose,
because English refuses to be dictated to and dislikes above all
things the dictation of the specialist, that the destinies of the
language are really in the hands of an unlettered herd. Authority is
always at work; but it emanates from sources wider, fresher, and
saner than any from which it would be possible to obtain it in the
form of rules and laws. If no authority is recognized, it is because we
all aspire to be authorities in our measure, and perceive by instinct
which of our neighbours sees further or knows more than we do.
Instead of a regal fiat, which it would be ignominy to ignore or
disobey, what guides us is an infection of reverence for a mysterious
rightness, the tutelage of which belongs to ourselves just so far as
we are able to penetrate the secret of its being. The final exponents
of this rightness are, of course, the great writers of English when
they are writing as they would like to do—few if any of them have
often done it; and the way of penetration is the knowledge of their
works: not the knowledge which regards them as things done once
and for ever (though, in one aspect, they are inevitably that), but
such as finds in them, rather, the revelation of a spirit capable of
revealing itself anew and of taking forms which, in proportion to their
life and worth, must always be unpredictable.
For, of course, if English is to continue to be the speech of vital,
developing, progressive peoples, nothing is more certain than that
this vitality and progress will be accompanied and sustained by a
literature. We stand together now because of the treasury of wisdom
which our common language enables us to share; but wisdom itself
fades to a dream, unless new expressions of it are continually found,
to illuminate and summarize the swift accumulations of human
experience. Not that books are to be regarded as the greatest thing
in life; or, rather, let us be bold and say that they must be so
regarded; but they are in life, and there are a thousand other things
in it which divide the interest of those who would appreciate books at
their true worth, and which constitute, let us confess it, a very
tolerable education for those (in England they have always been
many) who never open a book at all. The best books are a
concentration of the experience of the best livers, of men who, over
and above their faculty for direct living, have the impulse to live a
second life in which they share with others the discoveries and
delights of the first one. And, just as, among ordinary English people,
action is more than speech and speech shines by its contented
subservience, so, among those who read and write English, the
direct life has always counted for more than the translation. English
literature has been the work of men who lived before they wrote: that
is its greatness. And though this quality is certainly menaced now
that writing tends to become a trade; though the modern audience of
two hundred millions tempts even an Englishman to raise his voice;
yet modern life, we may reflect, has room for many things, and the
worry and self-importance of our literary professionals of all kinds will
somehow get worked into the larger equilibrium required of us, along
with much else that is worrying and imposing. Life is richer now in its
opportunities, more exhilarating in its occupations, more tantalizing
in its questions, more urgent in the close pressure of its reality than it
ever was for our forefathers; and the men who enter into all these
things in flesh and blood will not fail to lift their meaning one day into
the ideal world of books.
Meantime the life itself has to be lived, and the very fact that it will
be inevitably a harassing, distracting life gives the impassive
Englishman his chance. Some one has to take the lead and steer the
steady course—why not he? It is ‘up to him’; for he is not only solid
but sociable; the institutions he devises are the attraction and the
torment of the world. No one else can work them; every one that
sees them has to have a try. Roughly stated, the problem of Western
Civilization is still to abjure slavery, to be rid of the legacies of a
social organization which involved the unconsenting sacrifice of a
class. Every man’s mind is now to be its own master; everything of
value must be open to every one capable of possessing it; the
individual must know his limitations to be his own. And this is no
idealist’s ideal; it is a necessity arising from the diffusion, by
mechanical means in the main, of a knowledge which may easily
wreck us, but of which we cannot get rid. How then is this knowledge
to be formed into an instrument of progress? The condition of
success, clearly, is the presence of a soul-stirring warmth among all
classes, the participation of all in one atmosphere—for every man,
however unawakened, his place in the sun, so that, even if he does
not care to lift his eyes to the light, light may at least reach him
through the pores of his skin. This percolation of light, this
preparatory gestation of embryonic soul, is assured to the English by
the natural mysticism of their intelligence, by the tincture of poetry
that irradiates and solidifies their common sense. The influence
which chiefly sustains them in this firmest and fruitfullest of all their
compromises, is, no doubt, their age-old familiarity with the Bible. All
classes have possessed it, and possessed it so thoroughly as to
insist on a hundred private and personal interpretations of the one
sacred text. Nothing is more English than non-conformity, except the
acceptance of it, and nothing more necessary to the vitality of the
practical English mind. For to conform is to take your truth from
another or to acknowledge that the truth is beyond you. But religion
is practised by the English because its truth is known; personal
discovery has made truth real to them; and the vehicle of the
discovery has been a collection of mysterious poems and
rhapsodies, the words of which there is no holding, for they mean at
the same time everything and nothing. From childhood up poetry has
ruled us all, and our language has been a kind of rainbow-bridge on
which we passed from earth to heaven. The speech which was on
our lips from day to day belonged not only to the day’s events, but
also to a region of heavenly mystery which brooded over them. Our
very faculty of experience has been cradled in the love of
incomprehensible beauties; the ruling virtues of our lives draw
radiance from the words in which they were made known to us.
Out of the merging of the practical and the poetical, the intuitive
acknowledgment of unknown margins as a working factor in
everyday affairs, springs the evolutionary virtue of the English mind,
the hope of its future; and, of course, however broadened by the
Bible, the English instinct for poetry does not stop and did not begin
there. It has expressed itself at large in English literature, the most
companionable literature the world has seen, and it has permeated
the language, a language formed for common uses and stubbornly
matter-of-fact, yet one in which matter-of-factness itself is not hard,
but deep. The English practical man is poetically practical; for, in his
view, the practical lines, in thought and action, are the lines of life;
things that are to succeed, he feels, must hold their place in an
equilibrium, must learn their forms and limits and the economy of
their power as wild things do in the world of natural competition; his
genius is at its best, in work or play, when his occupation is richest in
vital analogies. What is the greatness of cricket—cricket, one of the
great words of the language as it is one of the great facts of English
life—if not that its excellencies can be developed only in a large
frame of human feeling, that it is life in little, as much a poem as a
game? Now the practical life is the life all have to lead; and if the
spirit in which men lead it on the humble level of quiet plodding is the
same as that which in his more radiant element inspires the poet, it
would seem that the condition, essential to progress in this age, of
one light shining for all in varying degrees of brightness, is actually
fulfilled.
What we have abutted on is not, really, a paradox. The nettle, the
sparrow of the world, is its rose, its nightingale. Again, why not?—he
has been, and may be again. The point is that, in life as the English
practise it, one passes into the other imperceptibly. For other
peoples, poetry has been a thing removed from truth and fact,
treating of shadowy or unearthly beauties in an atmosphere no
human being ever breathed. That has never been the prevailing
English view. For them the poet’s task has been the practical one of
making language live, casting on one side the intellectual figments
and abstractions in which speech entangles us and bringing back to
words their primal power and motion. Poetry is often called simple,
but the word needs a gloss. Simple people have poetry because
they are so near nature and speak so little that their speech is like an
animal’s cry, half its own, half an echo of its surroundings. As the
complexities of civilization pass over them, they become complex,
they ‘grow up’, and because they are grown up, we think them more
mature. They are not really more mature: they are more mechanical.
So far as by growth we become complex, we are growing towards a
condition in which growth is stultified. The mature is that of which the
elements are indistinguishably fused together, it is simplicity at a
higher power. This is the simplicity of poetry, which outreaches the
finest minds in their subtlest discriminations and abashes science
with the flames of its enveloping beauty. This, too, is the simplicity of
the English nature, and the English language; neither of them,
obviously, simple things at all, but possessed, it seems, of Nature’s
secret of growth and therefore destined, we may believe, to go on
growing.
It was right that an essay on the future of English should contain
very little about English itself. To test the mirror, watch what it
reflects. The less we think about our language, the likelier we are to
retain the qualities which have made it what it is; the more we study
it, the greater the risk of breaking that continuous impulse with which
the English mind, in high and low alike, feels its way through the
world, watching without defining, absorbing rather than classifying,
identified with the meanings of things, not distinguished from them.
For its loyal use and a true maintenance of the virtue of its tradition
we have only to assume that it was made for our purposes by others
whose purposes were the same as ours, and to see that it lives to-
day on our lips as it lived once on theirs. “Ripeness is all.”

Transcriber’s Notes:
Punctuation and spelling inaccuracies were silently
corrected.
Archaic and variable spelling has been preserved.
*** END OF THE PROJECT GUTENBERG EBOOK POMONA; OR,
THE FUTURE OF ENGLISH ***

Updated editions will replace the previous one—the old editions
will be renamed.

Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States copyright
in these works, so the Foundation (and you!) can copy and
distribute it in the United States without permission and without
paying copyright royalties. Special rules, set forth in the General
Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the
free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and Redistributing Project Gutenberg™ electronic works

1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree to
abide by all the terms of this agreement, you must cease using
and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only
be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.

1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project
Gutenberg™ works in compliance with the terms of this
agreement for keeping the Project Gutenberg™ name
associated with the work. You can easily comply with the terms
of this agreement by keeping this work in the same format with
its attached full Project Gutenberg™ License when you share it
without charge with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project
Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it
away or re-use it under the terms of the Project Gutenberg
License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country where
you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of the
copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must, at
no additional cost, fee or expense to the user, provide a copy, a
means of exporting a copy, or a means of obtaining a copy upon
request, of the work in its original “Plain Vanilla ASCII” or other
form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt that
s/he does not agree to the terms of the full Project Gutenberg™
License. You must require such a user to return or destroy all
copies of the works possessed in a physical medium and
discontinue all use of and all access to other copies of Project
Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite
these efforts, Project Gutenberg™ electronic works, and the
medium on which they may be stored, may contain “Defects,”
such as, but not limited to, incomplete, inaccurate or corrupt
data, transcription errors, a copyright or other intellectual
property infringement, a defective or damaged disk or other
medium, a computer virus, or computer codes that damage or
cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES -
Except for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU
AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE,
STRICT LIABILITY, BREACH OF WARRANTY OR BREACH
OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE
TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER
THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR
ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE
OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF
THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If
you discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person or
entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS
