Huang et al., 2017 - Google Patents

LW-FQZip 2: a parallelized reference-based compression of FASTQ files

Document ID
16325876719449657397
Author
Huang Z
Wen Z
Deng Q
Chu Y
Sun Y
Zhu Z
Publication year
2017
Publication venue
BMC Bioinformatics

Snippet

Background: The rapid progress of high-throughput DNA sequencing techniques has dramatically reduced the cost of whole-genome sequencing, which has led to revolutionary advances in the gene industry. The explosively increasing volume of raw data outpaces the …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067 File systems; File servers
    • G06F17/30129 Details of further file system functionalities
    • G06F17/3015 Redundancy elimination performed by the file system
    • G06F17/30156 De-duplication implemented within the file system, e.g. based on file segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067 File systems; File servers
    • G06F17/30129 Details of further file system functionalities
    • G06F17/3015 Redundancy elimination performed by the file system
    • G06F17/30153 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312 Storage and indexing structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/22 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/28 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
    • G06F3/0601 Dedicated interfaces to storage systems
    • G06F3/0628 Dedicated interfaces to storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks

Similar Documents

Publication Title
Huang et al. LW-FQZip 2: a parallelized reference-based compression of FASTQ files
Gamaarachchi et al. Fast nanopore sequencing data analysis with SLOW5
Numanagić et al. Comparison of high-throughput sequencing data compression tools
Durbin Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT)
Layer et al. Efficient genotype compression and analysis of large genetic-variation data sets
Giancarlo et al. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies
Zhang et al. Light-weight reference-based compression of FASTQ data
Yorukoglu et al. Compressive mapping for next-generation sequencing
Tabari et al. PorthoMCL: parallel orthology prediction using MCL for the realm of massive genome availability
US20110246505A1 (en) File generation and search methods for data search, and database management system for data file search
Li et al. A self-contained and self-explanatory DNA storage system
Bonfield CRAM 3.1: advances in the CRAM file format
Messih et al. Protein domain recurrence and order can enhance prediction of protein functions
Li et al. HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads
Dufresne et al. The K-mer File Format: a standardized and compact disk representation of sets of k-mers
El Allali et al. MZPAQ: a FASTQ data compression tool
Shibuya et al. Space-efficient representation of genomic k-mer count tables
Shibuya et al. Better quality score compression through sequence-based quality smoothing
Tang et al. KCOSS: an ultra-fast k-mer counter for assembled genome analysis
Xu et al. RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches
Déraspe et al. Flexible protein database based on amino acid k-mers
Ogasawara et al. Sam2bam: High-performance framework for NGS data preprocessing tools
Zhong et al. GRASP2: fast and memory-efficient gene-centric assembly and homolog search for metagenomic sequencing data
Mason et al. Standardizing the next generation of bioinformatics software development with BioHDF (HDF5)
Yao et al. Parallel compression for large collections of genomes