LW-FQZip 2: a parallelized reference-based compression of FASTQ files
Huang et al., 2017
- Document ID
- 16325876719449657397
- Author
- Huang Z
- Wen Z
- Deng Q
- Chu Y
- Sun Y
- Zhu Z
- Publication year
- 2017
- Publication venue
- BMC bioinformatics
Snippet
Background: The rapid progress of high-throughput DNA sequencing techniques has dramatically reduced the costs of whole genome sequencing, which leads to revolutionary advances in gene industry. The explosively increasing volume of raw data outpaces the …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30156—De-duplication implemented within the file system, e.g. based on file segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/28—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from or digital output to record carriers, e.g. RAID, emulated record carriers, networked record carriers
- G06F3/0601—Dedicated interfaces to storage systems
- G06F3/0628—Dedicated interfaces to storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Huang et al. | LW-FQZip 2: a parallelized reference-based compression of FASTQ files | |
Gamaarachchi et al. | Fast nanopore sequencing data analysis with SLOW5 | |
Numanagić et al. | Comparison of high-throughput sequencing data compression tools | |
Durbin | Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT) | |
Layer et al. | Efficient genotype compression and analysis of large genetic-variation data sets | |
Giancarlo et al. | Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies | |
Zhang et al. | Light-weight reference-based compression of FASTQ data | |
Yorukoglu et al. | Compressive mapping for next-generation sequencing | |
Tabari et al. | PorthoMCL: parallel orthology prediction using MCL for the realm of massive genome availability | |
US20110246505A1 (en) | File generation and search methods for data search, and database management system for data file search | |
Li et al. | A self-contained and self-explanatory DNA storage system | |
Bonfield | CRAM 3.1: advances in the CRAM file format | |
Messih et al. | Protein domain recurrence and order can enhance prediction of protein functions | |
Li et al. | HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads | |
Dufresne et al. | The K-mer File Format: a standardized and compact disk representation of sets of k-mers | |
El Allali et al. | MZPAQ: a FASTQ data compression tool | |
Shibuya et al. | Space-efficient representation of genomic k-mer count tables | |
Shibuya et al. | Better quality score compression through sequence-based quality smoothing | |
Tang et al. | KCOSS: an ultra-fast k-mer counter for assembled genome analysis | |
Xu et al. | RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches | |
Déraspe et al. | Flexible protein database based on amino acid k-mers | |
Ogasawara et al. | Sam2bam: High-performance framework for NGS data preprocessing tools | |
Zhong et al. | GRASP2: fast and memory-efficient gene-centric assembly and homolog search for metagenomic sequencing data | |
Mason et al. | Standardizing the next generation of bioinformatics software development with BioHDF (HDF5) | |
Yao et al. | Parallel compression for large collections of genomes |