It is our great pleasure to welcome you to the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice (McGE 2024).
We believe that this workshop will provide a valuable platform for researchers and practitioners to discuss and exchange ideas on the latest advancements, challenges, and opportunities in the rapidly evolving field of multimedia content generation.
McGE '24: The 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods & Practice
This workshop aims to explore key topics in the multimedia field, focusing on multimedia content generation, quality assessment, and dataset creation and construction. These topics are essential for the growth and advancement of the multimedia domain. ...
Jointly Text Region and Stroke Modeling for Scene Text Removal
Owing to its remarkable progress, scene text removal has been widely used in a variety of applications. Most previous methods use either text regions or text strokes alone to provide text location information, and each exhibits distinct advantages and ...
Text-guided Multi-Task Image Aesthetic Quality Assessment
In the realm of image aesthetic quality assessment, additional tagging information, such as scene classification, photographic style, and aesthetic attributes, embodies a wealth of aesthetic connotations. The textual descriptions and visual features ...
Spatial and Channel Squeeze & Excitation in Adapting Vision Transformers for Temporal Action Localization
Transformer-based methods have achieved impressive performance on temporal action localization (TAL). Although this achievement is attributed to the multi-head self-attention (MSA) mechanism, a systematic understanding of it is still lacking. ...
High Quality Fire Smoke Dataset: A Benchmark for Fire and Smoke Detection
In this paper, we present the High Quality Fire Smoke Dataset (HQFSD), a new comprehensive fire and smoke dataset tailored for training and evaluating fire detection algorithms. It currently comprises 12,166 meticulously selected images sourced from over ...
SAFormer: An Efficient Hierarchical Transformer Network Specialized for Temporal Action Detection
Temporal action detection (TAD) is a critical task in multimedia video understanding, focusing on accurately predicting the start and end times of action instances along with their classes. Most previous works have relied on two-stage ...
Attention Mixture Network for Crowd Counting via Binarization Transfer
Crowd counting aims to estimate the number of people present in an image of a crowd. In recent years, crowd counting has advanced steadily, driven by the integration of ...
RecipeSD: Injecting Recipe into Food Image Synthesis with Stable Diffusion
In this paper, we introduce RecipeSD, a novel approach for food image synthesis using Stable Diffusion, enhanced by integrating recipe text information. RecipeSD leverages a pretrained recipe encoder from a cross-modal retrieval task to extract ...
Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels
For aesthetic attribute evaluation of images (AAEI), the annotation of image aesthetic attribute scores plays an important role. Such annotation requires experienced artists and professional photographers, which hinders the collection of large-scale fully annotated ...
BrandDiffusion: Multimodal Personalized Marketing Visual Content Generation
Creating visual content such as product advertisements for marketing purposes has attracted research attention recently. Traditionally, such visuals showcase the product against a specific backdrop while adhering to a consistent corporate style to ...