Automatic image caption generation using deep learning
Verma et al., 2024
- Document ID
- 2300853860263171345
- Author
- Verma A
- Yadav A
- Kumar M
- Yadav D
- Publication year
- 2024
- Publication venue
- Multimedia Tools and Applications
Snippet
Image captioning is an interesting and challenging task with applications in diverse domains such as image retrieval, organizing and locating images of users' interest, etc. It has huge potential for replacing manual caption generation for images and is especially suitable for …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Title |
---|---|
Li et al. | Visual to text: Survey of image and video captioning |
Li et al. | Know more say less: Image captioning based on scene graphs |
CN108628828B (en) | Combined extraction method based on self-attention viewpoint and holder thereof |
CN112000818B (en) | Text and image-oriented cross-media retrieval method and electronic device |
Gupta et al. | Integration of textual cues for fine-grained image captioning using deep CNN and LSTM |
Xiao et al. | Dense semantic embedding network for image captioning |
Verma et al. | Automatic image caption generation using deep learning |
Biswas et al. | Towards explanatory interactive image captioning using top-down and bottom-up features, beam search and re-ranking |
Li et al. | Bundled object context for referring expressions |
Salur et al. | A soft voting ensemble learning-based approach for multimodal sentiment analysis |
Su et al. | Hierarchical deep neural network for image captioning |
Guo et al. | Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information |
Huang et al. | An effective multimodal representation and fusion method for multimodal intent recognition |
Merkx et al. | Learning semantic sentence representations from visually grounded language without lexical knowledge |
Pande et al. | Development and deployment of a generative model-based framework for text to photorealistic image generation |
CN115730232A (en) | Topic-correlation-based heterogeneous graph neural network cross-language text classification method |
Dai et al. | Visual relationship detection based on bidirectional recurrent neural network |
Cao et al. | Visual question answering research on multi-layer attention mechanism based on image target features |
Xie et al. | Extractive text-image summarization with relation-enhanced graph attention network |
Sharma et al. | Graph neural network-based visual relationship and multilevel attention for image captioning |
Liu et al. | A multimodal approach for multiple-relation extraction in videos |
Al-Shamayleh et al. | A comprehensive literature review on image captioning methods and metrics based on deep learning technique |
Zhang et al. | Chinese-English mixed text normalization |
Zhan et al. | Improving offline handwritten Chinese text recognition with glyph-semanteme fusion embedding |