Verma et al., 2024 - Google Patents

Automatic image caption generation using deep learning

Document ID: 2300853860263171345
Author: Verma A; Yadav A; Kumar M; Yadav D
Publication year: 2024
Publication venue: Multimedia Tools and Applications

Snippet

Image captioning is an interesting and challenging task with applications in diverse domains such as image retrieval, organizing and locating images of users' interest, etc. It has huge potential for replacing manual caption generation for images and is especially suitable for …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Li et al.	2019	Visual to text: Survey of image and video captioning
Li et al.	2019	Know more say less: Image captioning based on scene graphs
CN108628828B (en)	2022-04-01	Combined extraction method based on self-attention viewpoint and holder thereof
CN112000818B (en)	2023-05-12	Text and image-oriented cross-media retrieval method and electronic device
Gupta et al.	2020	Integration of textual cues for fine-grained image captioning using deep CNN and LSTM
Xiao et al.	2019	Dense semantic embedding network for image captioning
Verma et al.	2024	Automatic image caption generation using deep learning
Biswas et al.	2020	Towards explanatory interactive image captioning using top-down and bottom-up features, beam search and re-ranking
Li et al.	2018	Bundled object context for referring expressions
Salur et al.	2022	A soft voting ensemble learning-based approach for multimodal sentiment analysis
Su et al.	2020	Hierarchical deep neural network for image captioning
Guo et al.	2019	Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling
CN114818717A (en)	2022-07-29	Chinese named entity recognition method and system fusing vocabulary and syntax information
Huang et al.	2023	An effective multimodal representation and fusion method for multimodal intent recognition
Merkx et al.	2019	Learning semantic sentence representations from visually grounded language without lexical knowledge
Pande et al.	2021	Development and deployment of a generative model-based framework for text to photorealistic image generation
CN115730232A (en)	2023-03-03	Topic-correlation-based heterogeneous graph neural network cross-language text classification method
Dai et al.	2020	Visual relationship detection based on bidirectional recurrent neural network
Cao et al.	2021	Visual question answering research on multi-layer attention mechanism based on image target features
Xie et al.	2023	Extractive text-image summarization with relation-enhanced graph attention network
Sharma et al.	2022	Graph neural network-based visual relationship and multilevel attention for image captioning
Liu et al.	2022	A multimodal approach for multiple-relation extraction in videos
Al-Shamayleh et al.	2024	A comprehensive literature review on image captioning methods and metrics based on deep learning technique
Zhang et al.	2014	Chinese-English mixed text normalization
Zhan et al.	2022	Improving offline handwritten Chinese text recognition with glyph-semanteme fusion embedding