DOI: 10.1145/3613904.3642794 · CHI Conference Proceedings
Research Article · Open Access

CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI

Published: 11 May 2024

Abstract

Graphic designers often find inspiration through the recombination of references. Our formative study (N=6) reveals that graphic designers focus on conceptual keywords during this process and want support for discovering keywords, expanding them, and exploring diverse recombination options, while still leaving room for their own creativity. We propose CreativeConnect, a system with generative AI pipelines that helps users discover useful elements from reference images as keywords, recommends relevant keywords, generates diverse recombination options from user-selected keywords, and presents recombinations as sketches with text descriptions. Our user study (N=16) showed that CreativeConnect helped users discover keywords from references and generate multiple ideas based on them, ultimately helping users produce more design ideas with higher self-reported creativity compared to a baseline system without generative pipelines. While CreativeConnect proved effective for ideation, we also discuss how it could be extended to support other types of creativity support tasks.
Figure 1:
Figure 1: Three main features of CreativeConnect that support reference recombination. (Left) Extraction of 4 different types of keywords from the reference image. (Center) The mood board shows the reference images with the user-selected keywords and keyword recommendations. (Right) Merging selected keywords to generate diverse recombination options and showing them as sketches and descriptions.

1 Introduction

References play a crucial role in creative thinking, such as graphic design, serving as valuable sources to both grasp the landscape of the existing ideas and ignite novel ones [31, 52, 62, 73]. They offer diverse visual, conceptual, and functional stimuli, allowing individuals to explore various creative directions and draw lessons from established successful examples [3]. One effective method to generate new ideas with references is making a combination of existing examples, which is often called combinatorial creativity [7, 9, 79, 87]. In practice, this is often done through reference recombination, which is the process of extracting the elements or aspects from multiple references, considering connections between them [31], and blending those to gain novel design ideas [2].
However, each step of recombination requires significant effort from designers. To discover sources for recombination, designers need to dissect the references into individual elements and analyze them to determine which combinations of elements are worth mixing. Additionally, they must engage in exploratory efforts by drawing multiple sketches to find effective methods of blending those elements into a new design idea. This takes a long time and multiple iterations, especially for those less experienced in the design process, as they have difficulty identifying various factors from references and integrating references from disparate domains compared to professionals [4].
Previous research has provided support for these individual steps. Several approaches have been proposed [35, 36, 39] to decompose references or show connections between them, aiding users in identifying sources for recombination. However, these approaches do not guide users on how to incorporate the extracted elements into a design. Other approaches have attempted to help users blend different concepts or images into a novel one [12, 13, 84, 91]. However, they primarily emphasize generating precise combinations that incorporate all elements harmoniously rather than producing diverse combinations for creative exploration. Another thread of research focuses on searching by genetic recombination [14, 45, 82, 88]. Still, these techniques aim to widen the range of design exploration rather than offer inspiration on how to effectively combine specific design elements.
Through a formative study with six early-stage graphic designers and design students, we aimed to understand the process of reference recombination and identify its challenges. There were two distinct stages of ideation: 1) conceptual ideation, which aims to convey the design topic effectively, and 2) visual ideation, which is about deciding style-wise details on top of the selected concept. We decided to focus on the conceptual stage, as the recombination of references tends to be more prevalent there. During conceptual ideation, designers extracted four types of elements from the reference—subject matter, action & pose, theme & mood, and compositional aspects (arrangement). They then tried to brainstorm more elements related to the extracted ones and combine them in several ways. However, due to the high effort required to recombine them manually, they were concerned that they could not try out all possible combinations. They also mentioned that support for ideation should not come in an overly complete form, as it can diminish their own input. With these observations, we propose four design goals for a reference recombination support system: (1) enable users to effortlessly specify the four types of conceptual elements from the reference image, (2) recommend relevant elements, (3) provide as many recombination options as possible, and (4) intentionally keep the generated output partially unfinished to foster users' creativity.
Based on the design goals, we propose CreativeConnect, a system that supports the design ideation process by helping users easily extract elements from the reference images and generate a wide range of recombinations of those elements. Using CreativeConnect, users can easily discover and select elements from the reference image based on the four element types and get recommendations for more relevant keywords. Once the user has chosen the keywords to combine, they get various recombination options presented as pairs of sketch images and one-line descriptions. We introduced novel pipelines with generative models to automate the extraction of keywords from images, generate recombination options, and transform them into descriptions and sketches.
We conducted a within-subjects study with 16 design students to compare CreativeConnect with a baseline system, which consisted of a mood board with manual keyword notes, a layout diffusion model, and ChatGPT. Results showed that CreativeConnect could support both stages of the reference recombination process: discovering elements from references and generating design ideas by recombining them. Participants also produced more design ideas in a given time and perceived that CreativeConnect helped them develop more creative sketches than the baseline. They emphasized that CreativeConnect was especially beneficial for getting inspirational ideas vastly different from their initial concepts. We compare the creativity support of CreativeConnect with the baseline and identify an opportunity to design a comprehensive recombination support tool that could cover a broad spectrum of design needs and situations. We also found that the low fidelity of the sketch-based output led users to imagine more and draw more stimulus for their creativity. Finally, we discuss the generalizability of CreativeConnect in terms of user expertise, collaborative settings, and different domains of design.
This paper presents the following contributions:
(1)
CreativeConnect, a system that supports graphic designers' ideation process by helping users extract elements from reference images and suggesting a wide range of recombinations of those elements.
(2)
Computational pipelines with generative models that extract and suggest keywords from images and generate recombinations of keywords in text descriptions and sketches.
(3)
Findings from a user study (N=16) about how CreativeConnect can aid designers in each step of recombination, leading participants to generate more design ideas and to perceive their ideas as more creative.

2 Related Work

This work aims to support designers in their reference recombination process for creativity. In this section, we review previous literature on (1) how references are used in graphic design ideation, (2) how recombination is employed for creative thinking, and (3) previous generative AI approaches for creativity.

2.1 Reference in Graphic Design Ideation

The creative process begins by collecting relevant inspirational materials from various sources [22, 77]. Designers leverage these collected examples to gain a comprehensive understanding of the problem space. As the process advances into idea generation, these compiled examples play a pivotal role in fostering creativity, igniting new ideas through analogical thinking [29, 34]. Recognized as one of the most challenging phases in the entire design process, previous research on creativity-supporting tools has extensively concentrated on enhancing this ideation step [25]. Previous research demonstrated that designers get valuable insights and inspirations in different ways [31], and many studies have delved into the significance of these references in design thinking, showing their potential to stimulate creativity and innovation [3, 78].
One of the primary approaches to support idea generation with references is to help designers see diverse references. Exploring diverse ideas is crucial in terms of preventing fixation [37], in which a designer becomes overly fixated on a single concept, potentially hindering creativity and innovation. Therefore, Zhang et al. [89] have utilized a Generative Adversarial Network (GAN) for exploring diverse images, while Matejka et al. [60] developed the Dream Lens to assist in exploring generative 3D design solution space.
Another avenue of research is to help designers manage their inspirations drawn from references, particularly through the use of mood boards [22]. Prior research demonstrated that building a mood board can enhance the comprehension and interpretation of ephemeral elements in design [27], which is beneficial for both defining and resolving design challenges [8] and ultimately leading to a boost in creativity [58]. Therefore, many computational systems have been proposed to help designers to build interactive mood boards, such as Funky Wall [59], SemanticCollage [49], and May AI [48].
While this paper primarily focuses on the recombination of references, we have integrated two significant insights from prior research about design references. First, we emphasize the importance of offering users diverse images to support their creative processes. Second, we have incorporated the concept of a mood board as a valuable tool for organizing references within our system.

2.2 Recombination for Creative Thinking

In the creative thinking process, new ideas often come through the combination of the existing examples [7, 79]. It was shown that creativity often arises from forging new associations among previously unrelated frames [50]. This process includes two crucial components: recognizing the differences between existing concepts and blending them [2, 64]. Also, the diversity of the given examples is important for building novel associations between them during this process [63]. Observations of designers’ creative processes showed that designers often maintain multiple small components and keep employing them to generate new variations through a process akin to recombination [24]. Many computational systems were also proposed for building recombinations and verified to be effective in tasks such as chair design [87] or text-based ideation [9].
One practical implementation of this concept in terms of design ideation is genetic exploration. Genetic exploration involves generating novel solutions by merging elements from preexisting designs to widen the range of references. This approach has been applied in diverse domains such as garden design [45], 3D modeling [14, 70], architecture [82], and 2D graphics [88]. However, these approaches primarily aim to enrich the reference in the information-gathering stage by utilizing existing references rather than supporting designers to generate their own ideas from those recombinations in the next stage.
In recombination, it is also critical to decompose the reference and obtain elements that are worth combining. Several tools have been developed to facilitate this process, especially by automatically decomposing the original source and showing its fine-grained aspects. CollageMachine [44] decomposes websites and turns them into an interactive collage. MetaMap [39] provides a decomposed view of the reference image along three dimensions (semantic, color, and shape) and lets users explore more references with it. Hope et al. [35] divide a product's information into fine-grained functional parts, allowing users to combine the inspiring parts. MoodCubes [36] offers a new mood board experience by decomposing multimedia references into constituent elements and using them to suggest new inspirational materials. These tools, however, do not directly discuss strategies for merging the decomposed outputs into a new design idea. On the other hand, VRicolage [80] enables users to decompose objects into different parts, motions, or colors, and mix them. However, this process is more about utilizing collected assets than generating a new idea from recombination.
Additionally, many previous approaches supported the process of mixing reference images or concepts. For example, VisiBlends [12] and VisiFit [13] introduced a novel pipeline to blend two objects to convey an integrated meaning. ICONATE [91] helps users generate a new icon by mixing different icons, and PopBlends [84] automatically suggests conceptual blends of reference images. FashionQ [38] supports this blending in the domain of fashion design, and Artinter [15] supports recombining style elements from references to facilitate communication. Nevertheless, these approaches primarily focus on seamlessly merging entire references rather than breaking them down to the element level. This may not fully align with the creative recombination process, which often begins by identifying specific elements to combine within the provided examples. 3DALL-E [57] presents a recombination workflow for generating new ideas, suggesting diverse low-level keywords and combining them into a prompt for text-to-image models. This approach, however, differs from our definition of reference recombination, as the keywords come from the LLM's understanding of the world rather than from the design references.

2.3 Generative AI Approaches for Creativity

Before looking into AI systems for creativity, it is important to understand how visual designers perceive AI support for their design tasks. Ko et al. [47] investigated how graphic designers use large-scale text-to-image generation models (LTGMs) in their creative work and suggested design guidelines for building creativity support systems with them.
Recently, diffusion-based techniques [65, 72, 74] and CLIP embeddings [71] have enabled people to turn their ideas into visual materials quickly and easily through text prompting. Many approaches also incorporate inputs from additional modalities, such as layout [10, 55, 92] or sound [53, 81]. Techniques that add extra conditions and styles for more granular control have been proposed as well [61, 90]. There is also a thread of research on modifying generated images to better align with user intent, such as adding style [28], latent-space manipulation [42, 43], human-prompt editing [6], and editing specific parts of generated images [26, 75].
With these novel ML techniques, the creative landscape is continuously being reshaped, offering innovative solutions and enriching the artistic experience. Promptify [5] stands out as an iterative prompt refinement tool, letting users get closer to their intended results by pruning unintended outcomes. PromptPaint [16] allows users to go beyond language and mix prompts to express challenging concepts, supporting the iterative shaping of images. The interplay between humans and AI is also evolving quickly. Karimi et al. [40, 41] propose generative AI systems that help designers by collaborating during the design phase instead of taking over the design process. Oh et al. [66] and Framer [51] proposed user-AI collaborative interfaces that enable a co-drawing experience.
While much research has focused on conveying user intention to ML models accurately to obtain a better image or on collaborating with AI during the design execution phase, such work is less relevant to the ideation task of expanding the variety of ideas. Specifically, it remains an open question how to design interactions with generative AI models that inspire graphic designers by recombining references.

3 Formative Study

We conducted a formative study to understand how designers recombine design references for ideation and what challenges they encounter during the process.

3.1 Participants

Prior research [4] suggested that less experienced designers tend to encounter more challenges in getting inspiration from references and combining them. Therefore, we targeted early-stage designers, as they are expected to understand the overall design process but still struggle with many challenges in ideation through recombination compared to professional designers. We defined an early-stage designer as someone who received a design education at university or has less than three years of professional experience as a designer.
Six participants (6 female; age M=25.3 and SD=3.32) were recruited through an online recruitment posting. Two were professional UI/UX designers with 1 year of experience each, and one was a freelance brand designer with 3 years of experience. Three were students majoring in industrial design, with two at the graduate level and one in their fourth year of undergraduate studies. All participants reported that they had experience in at least three different graphic design projects before.

3.2 Study Process

The study included (1) an observation on reference searching and idea sketching and (2) a semi-structured interview. For the first part of the study, participants were asked to draw an illustration for one of three different design topics they chose: "Tourism service for kids," "Pet grooming service," or "Eco-friendly restaurant." They were first given 10 minutes to search for reference images that they wanted to use. For each reference they chose, they were asked to describe what aspects of the references they found appealing. Then, participants sketched their design ideas using their preferred method for 30 minutes. Three participants used pen and paper to sketch their ideas, while the other three used a tablet and digital drawing software. They were asked to generate at least three distinct design ideas and describe how they integrated their references into each sketch. After that, we conducted a semi-structured interview to ask about their challenges in generating multiple ideas using references.
After each study session, two authors independently coded the recombination methods the participants employed in their ideation tasks and the semi-structured interview results. The coded data were then discussed collaboratively. After conducting six studies, codes were saturated, and no further study sessions were conducted.

3.3 Findings

Through the observation of participants’ design processes, we discovered that the recombination of different references primarily occurs during the initial stages of design ideation, with a specific focus on the conceptual aspects of the reference images rather than the visual elements. We identified four distinct categories of elements employed in this process. We also found specific challenges associated with it and observed that the system supporting this process should reserve a degree of incompleteness to encourage creativity.

3.3.1 Early-Stage Design Ideation Focuses on Conceptual Aspects.

All six participants said they refer to the references in two distinct stages: conceptual ideation and visual development. During the conceptual ideation stage, designers focus on elements that could effectively convey the design topic, such as objects or mood. After looking at those elements, they generated multiple drafts by combining them in several ways. On the other hand, the visual development stage revolved around adding visual details like color and texture to complete the sketches derived from the conceptual ideation. During this stage, designers often had a clear direction in their mind and referred to a specific set of references that aligned well with their chosen direction, with less emphasis on exploring different recombinations of diverse references. This aligns with findings from previous research [33], which shows that artists engage in a spectrum of reference usage in their creative process, ranging from detailed recreation (visual development) by tracing images to interpretive inspiration for high-level components (conceptual ideation). In summary, designers recombined references primarily for conceptual ideation, which was usually the first step of the design ideation, suggesting that a system supporting the reference recombination process should focus on how to facilitate this early-stage step.

3.3.2 Types of Elements Used for Recombination.

During the conceptual ideation phase, participants tried to extract specific elements from references and incorporate them into their design concepts. They employed a variety of approaches for this. The simplest approach observed in all participants was utilizing objects in a reference in their sketch. For example, for drawing an illustration for "Tourism service for kids," one participant took an image of a paper plane from a reference to convey the image of playful children and tour service at the same time. Five participants extracted the abstract semantic meaning or overall theme conveyed by references. For example, after looking at an image of a person holding a pamphlet and deep in thought, one participant said that the keyword "imagination" could effectively capture the concept of kids. So, they developed a design concept about children imagining various travel destinations. Another approach observed in three out of six participants was to take the action of a character from a reference. For instance, by looking at a reference illustrating an animal and a person holding hands, a participant got the concept of children holding hands together. Lastly, five participants referred to the composition from the reference images. For example, by looking at a reference where leaf shapes were arranged together to form a shovel, one participant came up with the idea of using multiple tree trunk shapes to represent the structure of a building.

3.3.3 Challenges During Finding Elements.

We identified some opportunities to support the process of extracting elements from the references. There were many cases where the elements designers initially found appealing in the reference search phase differed from those they eventually utilized in their design concepts. In the interview, participants said that upon closer examination of the references, they discovered new elements of interest and incorporated them. This means that designers couldn’t immediately extract elements upon viewing the reference, and it often required several examinations to uncover such elements, which was time-consuming.
Another observation was that participants often came up with new keywords based on what they had already found, for further brainstorming at the element level. For instance, P3 identified "toy blocks" from one reference and "train" from another, then came up with the new keyword "toy train" and incorporated it into their final idea. However, this process was often more challenging than finding elements directly from the reference images. P4 highlighted an opportunity for the system to support this thinking process: “I usually talk with others about my ideas, which leads me to discover new keywords related to the original one. Just like that, I think it would be nice if the system could recommend a new keyword to expand my current design idea.”

3.3.4 Challenges During Recombining Elements.

After finding out the elements from the references they want to utilize in their design ideas, another challenge became apparent. While there can be numerous ways to combine these elements, participants were often frustrated as they couldn’t sketch out all the possibilities to determine if they were viable. Three out of six participants expressed anxiety about not being able to consider all possible combinations. P3 stated, “I always feel anxious that there might be a better way, but I can’t think of it.” P6 also mentioned that “The more options I explore, the more I become confident about my final design idea. I want some faster way to explore alternatives as much as possible.” Four out of six participants said they rely on their imagination to envision numerous recombination possibilities within their minds, as sketching out all is too time-consuming and effortful. However, two participants expressed frustration that, although combinations seemed good in their minds, they might not come together as effectively in actual sketches.

3.3.5 System Support should be Incomplete.

Designers tended to deliberately exclude visual details during conceptual ideation. Participants said that when recombining references for conceptual inspiration, they did not pay much attention to visual details, and several participants noted that they even needed to exclude those details intentionally. P2 stated, “When combining different concepts, colors and textures often become messy, so I deliberately use the same brush for all elements.” P3 agreed from another viewpoint, expressing concern about becoming overly fixated on frequently recurring visual details while exploring conceptual recombinations. We also asked participants which form they would prefer if they could get recommendations for different recombination options. Four participants said they would prefer incomplete outputs, such as a sketch or even a textual description of the idea, so that they could focus on the concept itself. The main reason was the concern that the model would compromise their creativity or lead them into unintentional plagiarism.

3.4 Design Goals

Based on the findings of the formative study, we identified four design goals to build a system to support designers’ reference recombination process during early-stage ideation.
DG 1.
Facilitate Element Extraction from References. To help users efficiently find the elements that would be used for the recombination, the system should help users discover the overlooked elements. Based on our observation, elements that users want to extract from references are (1) subject matters (e.g., objects, characters, landscapes), (2) action & pose, (3) theme & mood, and (4) arrangement.
DG 2.
Suggest Diverse and Relevant Elements. To help users explore more elements on top of what they found from the references, the system should provide some recommendations of relevant elements that users might like.
DG 3.
Generate Diverse Recombination Options. To help users explore diverse recombination possibilities, the system should show users a varied range of recombination options and reduce their anxiety over not considering all feasible combinations. This goal highlights the system’s ability to propose combinations that users might not have considered independently.
DG 4.
Present Recombination in an Incomplete Format. To align with designers’ preference for conceptual sketches over highly detailed artwork during the initial ideation phase, the system-generated outputs should be intentionally incomplete, such as sketches. This emphasizes the importance of allowing users to inject their own creativity into the images.

4 CreativeConnect

With derived design goals, we implemented CreativeConnect (Figure 2), an AI-powered design tool that supports graphic designers in coming up with novel design ideas by recombining reference images in early-stage conceptual ideation. CreativeConnect mainly consists of a mood board where users can import reference images and select what they like about the reference. When the user imports a new image, the system extracts keywords according to the four categories defined in the formative study (Section  3.3.2) so that users can choose among them. This helps users easily discover and select keywords (DG 1). Selected keywords are then displayed on the mood board along with the images. CreativeConnect offers further keyword recommendations based on the keywords users have added to the board or their specific selections (DG 2). Also, when the user chooses a set of keywords to recombine, the system generates multiple drafts with diverse ways of combining them (DG 3). All system-generated recombination outputs are produced in line sketches with one-line descriptions so that users can further reinterpret by themselves (DG 4).
Figure 2:
Figure 2: Screenshot of CreativeConnect. (a) Keyword Extraction Panel: The system automatically extracts keywords in four categories (subject matter, action & pose, theme & mood, and arrangement) from the reference image. Users can select these keywords or add keywords manually. (b) Interactive Mood Board with Keyword Suggestion Panel: Users can organize the reference images along with the selected keywords. Users can import the keywords shown below, which are suggested based on all keywords on the board or the keywords that users selected on this mood board. (c) Keyword Merge Panel: When users select keywords they want to recombine on the mood board, the system generates sketches and their respective descriptions, including all selected keywords. Users can view more generated sketches by clicking the "More Sketches" button.

4.1 User Scenario

To demonstrate our system, we show how Sarah, a junior illustration designer, uses CreativeConnect to generate ideas for her design project. Sarah recently accepted a new commission to draw an illustration for the cover of a children’s book titled, "A Christmas Dinner in the Underwater World." As the given topic is an unusual combination of two themes, she struggled to get inspiration from the references and mix them to come up with ideas, so she decided to explore references with the help of CreativeConnect.

4.1.1 Getting User Inputs on the Design Reference.

Sarah first uploads ten reference images she got from her client into CreativeConnect. Looking through the references, she is intrigued by the one where two scuba divers swim with a turtle. When she chooses the image, CreativeConnect shows some keywords that can be found in the image, divided into four categories – subject matter, action & pose, theme & mood, and arrangement (Figure 2 (a)). As she finds the scuba diver concept interesting, she clicks on the subject matter category. She finds "scuba diver" in the keyword list and clicks it. She also finds "coral reef" on the list, which she didn’t recognize before. She looks at the references again and thinks coral reefs would look great in her illustration, so she clicks "coral reef" as well. Similarly, she looks through the list of the keywords in the "action & pose" and "theme & mood" categories and selects "swimming" and "adventure" from each list. She also likes the overall composition of the image, so she clicks its "arrangement" as well. She also works on selecting keywords that she likes on other references.

4.1.2 Mood Board with the User-selected Keywords & Keyword Recommendation.

As Sarah selects the keywords she finds useful from each image, the canvas of the CreativeConnect offers a dynamic mood board that shows the references with user-selected keywords, capturing her creative goal and preferences (Figure 2 (b)). As she freely moves the images to organize them, the selected keywords move along with the image. By looking at the keywords, Sarah wants to come up with additional ideas for character actions that align with the adventurous theme, similar to swimming or scuba diving. Therefore, she selects "subject matter: scuba diver", "action & pose: swimming", and "theme & mood: adventure" to get system recommendations with these keywords. CreativeConnect shows a set of keywords, such as "action & pose: exploring sunken ship", and "subject matter: anchor". Sarah finds those keywords valuable, so she drags them into the mood board.

4.1.3 Recombining Design References using Keywords.

From the set of keywords on the mood board, Sarah now selects some keywords she wants to include in her design and uses the system to make a first draft. She selects "Christmas tree" and "Santa Claus" for a Christmas dinner theme, and "whale", "swimming", "exploring the sunken ship", and "adventure" for the underwater theme. She also selects the "arrangement" of one of the images with an interesting composition.
After clicking the merge button, CreativeConnect generates three different drafts, each showcasing a unique and different way of incorporating these keywords (Figure 2 (c)). Each draft contains a one-line text description of the image concept and a sketch-style image generated based on the description and the arrangement that Sarah selected. She appreciates the results as the way each draft combined keywords would be difficult to think of by herself and that all three drafts are distinct from each other. Also, the sketch format allows her to imagine further design concepts rather than fixating on the concept and details in the generated results.
Among the drafts, Sarah finds one description interesting: "Santa Claus goes on an underwater adventure on a sled pulled by a whale." However, she feels dissatisfied with the generated sketch and presses the "More Sketches" button. Then, CreativeConnect generates five more sketches with the same description but in a slightly different way. She gets some good design ideas from the new sketches and starts working on her draft.

4.2 Technical Details

CreativeConnect was built as a web-based system with a ReactJS-based front-end client and a Flask-based back-end server. We implemented ML pipelines for extracting the keywords from the references and merging keywords into recombinations. The technical details of these pipelines are discussed in the following sections. Some examples of outputs from the pipeline are presented in Figure 9 in the Appendix.
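For illustration, the following is a minimal sketch of how such a Flask back-end could expose one of these pipelines to the React client. The route name, payload shape, and stubbed pipeline call are hypothetical, not the actual CreativeConnect API.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_keyword_extraction(image_url: str) -> dict:
    # Placeholder for the keyword extraction pipeline described in Section 4.2.1;
    # the four keys mirror the keyword categories used by the system.
    return {"subject_matter": [], "action_pose": [], "theme_mood": [], "arrangement": []}

@app.post("/extract_keywords")  # hypothetical endpoint name
def extract_keywords_route():
    payload = request.get_json()  # hypothetical payload: {"image_url": "..."}
    return jsonify(run_keyword_extraction(payload["image_url"]))
```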
Figure 3:
Figure 3: Technical pipeline of CreativeConnect. (a) Keyword extraction from image: The caption generated from the image captioning model goes into the LLM to extract subject matter, action & pose, and theme & mood. The segmentation model is used to detect the image’s arrangement. (b) Keyword-based image generation: the LLM generates descriptions based on the given keywords, and the layout variator generates similar arrangements. The image generation model generates the image, and the style transfer model converts this into a sketch.

4.2.1 Extracting Keywords from Reference Images (Figure 3. (a)).

Based on the findings from our formative study, our pipeline is designed to extract keywords from a provided reference image in four categories: subject matter, action & pose, theme & mood, and arrangement. To achieve this, we follow a multi-step process.
To identify the subject matter, action & pose, and theme & mood within the image, we employ an image captioning model BLIP-2 [54] to generate textual descriptions of the image contents. For a comprehensive understanding of the entire image, we divide it into 3 × 3 segments and generate captions for each segment as well as the whole image. These captioning results are then processed by GPT-4 [68], a Large Language Model (LLM), to extract lists of subject matter, action & pose, and theme & mood present in the image captions. Prompts used for this are in Appendix A.1.
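As an illustration, a minimal sketch of this captioning-plus-extraction step is shown below. It assumes the Hugging Face BLIP-2 checkpoint "Salesforce/blip2-opt-2.7b" and the OpenAI chat API; the inline prompt string is only a stand-in for the actual prompt in Appendix A.1.

```python
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from openai import OpenAI

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
captioner = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
client = OpenAI()

def caption(image: Image.Image) -> str:
    # Generate a short caption for one image (or image tile).
    inputs = processor(images=image, return_tensors="pt")
    out = captioner.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True).strip()

def extract_keywords(image: Image.Image) -> str:
    # Caption the whole image plus a 3x3 grid of crops for finer coverage.
    w, h = image.size
    captions = [caption(image)]
    for i in range(3):
        for j in range(3):
            tile = image.crop((j * w // 3, i * h // 3, (j + 1) * w // 3, (i + 1) * h // 3))
            captions.append(caption(tile))
    prompt = (  # stand-in for the Appendix A.1 prompt
        "From these captions of one image, list its subject matter, "
        "action & pose, and theme & mood keywords:\n" + "\n".join(captions)
    )
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content  # keyword lists as text
```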
For the arrangement, we utilize the Segment Anything model [46] to generate segments and then identify the top ten prominent segments within the image using the approach from LLM-grounded Diffusion [56]. Bounding boxes around these segments provide information about the image’s overall structure, such as where the items are placed and where large negative spaces are.
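A sketch of this arrangement extraction is shown below, assuming the official segment_anything package and a ViT-H checkpoint path; ranking masks purely by area is a simplification of the segment selection used in LLM-grounded Diffusion [56].

```python
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # assumed checkpoint path
mask_generator = SamAutomaticMaskGenerator(sam)

def extract_arrangement(image_rgb, top_k: int = 10):
    """Return bounding boxes (x, y, w, h) of the most prominent segments."""
    masks = mask_generator.generate(image_rgb)                   # image as HxWx3 uint8 array
    masks = sorted(masks, key=lambda m: m["area"], reverse=True)[:top_k]
    return [tuple(m["bbox"]) for m in masks]                     # SAM returns XYWH boxes
```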
Additionally, for generating the recommendations of the relevant keywords, we use GPT-4, and the prompts used for this are shown in Appendix A.2.

4.2.2 Generating Recombinations (Figure 3. (b)).

When the user selects a set of keywords to generate a new recombination, our system generates a range of options to mix those keywords.
The system first generates three textual descriptions encompassing the selected subject matter, action & pose, and theme & mood keywords. It then extracts the list of objects that must be drawn in the image for each description. We use few-shot prompting with GPT-3.5-turbo [67] for this; the prompt is in Appendix A.3. For the arrangement, we developed a layout variator to create layouts similar to the selected image's arrangement while staying aligned with the generated text description. The layout variator first applies an empirically defined random variation of -50 to 50 pixels to each bounding box component (i.e., x, y, w, h) in the original arrangement. It then randomly selects boxes depending on the number of objects that need to be drawn and ranks the most similar layouts first using a similarity metric. The similarity is calculated by summing the IoU and the complement of the min-max normalized centroid distance between the closest pairs of bounding boxes. The top five arrangements by this similarity are used for recombination generation: the most similar layout is used to generate the image in the initial iteration, and the other layouts are used when the user requests more sketches. Few-shot prompting with GPT-3.5-turbo is used to map the objects onto the arrangement to create the best possible image. The full prompt is shown in Appendix A.4.
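The following is a minimal sketch of the layout variation and similarity scoring described above. The ±50 px jitter and the IoU-plus-centroid-distance metric follow the text, while the helper names and the exact box-pairing details are our assumptions.

```python
import math
import random

def jitter(box):
    """Apply the +/-50 px random variation to an (x, y, w, h) box."""
    return tuple(v + random.randint(-50, 50) for v in box)

def iou(a, b):
    # Intersection-over-union of two (x, y, w, h) boxes.
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def centroid_dist(a, b):
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def layout_similarity(candidate, original):
    """Sum, over candidate boxes, of IoU with the best-matching original box plus
    the complement of the min-max normalized distance to the closest original box."""
    dists = [min(centroid_dist(c, o) for o in original) for c in candidate]
    lo, hi = min(dists), max(dists)
    score = 0.0
    for c, d in zip(candidate, dists):
        best_iou = max(iou(c, o) for o in original)
        norm_d = (d - lo) / (hi - lo) if hi > lo else 0.0
        score += best_iou + (1.0 - norm_d)
    return score
```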
However, when the user does not select any arrangement from the references, the system generates a broader range of diverse layout options. A few-shot prompting pipeline using GPT-3.5-turbo generates the three most appropriate layouts for the given text description and object list. This pipeline is built based on the previous work [56], and the full prompt for this is in Appendix A.5.
Given the textual description and the list of the objects mapped with the generated layout, the system generates images with a layout diffusion model [55]. Following our design goal, the system converts the generated image into a simple line sketch using the U-Net structured style transfer model [20].

4.3 Technical Evaluation

We evaluated ML-based pipelines, especially for keyword extraction, keyword recommendation, and textual description generation by merging keywords.

4.3.1 Keyword Extraction Pipeline.

We built a dataset of 100 images with tags categorized by the subject matter, action & pose, and theme & mood. We asked 20 people with expertise in design or HCI to annotate five images each. On average, 5.03, 1.87, and 2.29 keywords in the category of subject matter, action & pose, and theme & mood, respectively, were collected per image.
Using this dataset as ground truth, we evaluated the predictions of the keyword extraction pipeline. Keywords in the subject matter and action & pose categories were matched manually, one by one, between similar ones. The precision and recall of our pipeline were 94.2% and 58.2% for subject matter, and 35.3% and 51.3% for action & pose. Although some salient keywords in the dataset were missing, the pipeline provided quite accurate keywords for subject matter. The predicted action & pose keywords were not perfectly aligned with the dataset tags, but they were still acceptable on the user side because they were perceived as similar even when not completely accurate (e.g., for an image of a cat standing straight, our pipeline predicted "stretching arms", while the ground truth is "dancing"). For theme & mood keywords, we calculated the cosine similarity of the mean embedding vectors [85] of the ground truth and the predicted result to compare their semantic similarity. This is because, for theme and mood, even when words are not exactly the same, many other words can be accepted as similar. The similarity between the ground truth and the prediction was 0.826, which means the keyword extraction pipeline estimates the theme & mood words quite closely. Examples of the predictions are presented in the Appendix (Figure 10).
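For illustration, the theme & mood comparison can be computed as in the sketch below; the Sentence-Transformers model "all-MiniLM-L6-v2" is only a stand-in for the embedding model cited as [85].

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the embedding model in [85]

def mean_embedding_similarity(ground_truth: list[str], predicted: list[str]) -> float:
    """Cosine similarity between the mean embedding vectors of two keyword lists."""
    gt = embedder.encode(ground_truth).mean(axis=0)
    pr = embedder.encode(predicted).mean(axis=0)
    return float(np.dot(gt, pr) / (np.linalg.norm(gt) * np.linalg.norm(pr)))

# Example: mean_embedding_similarity(["dancing"], ["stretching arms", "playful"])
```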

4.3.2 Keyword Recommendation Pipeline.

We evaluated the keyword recommendation pipeline based on whether there was a proper level of similarity between the original keywords and the recommended keywords. This was because it would only be effective if the recommendations were not too similar or irrelevant to the original keywords.
We randomly sampled three to ten keywords from each image-keyword pair in the dataset and made 100 sets of keywords. Then, from the pipeline, we got the recommendations for each set. To verify whether these recommendations have a proper range of diversity, we generated two comparison groups of keywords: the irrelevant group and the synonym group. The irrelevant group consists of random keywords from the dataset, and the synonym group is generated by paraphrasing the keyword in each set. NLTK [1] and GPT-3.5 were employed to find synonyms. Then, we used the text embedding [85] to calculate the cosine similarity of each group with the original keywords.
The similarity of the irrelevant and synonym groups to the original keywords was 0.624 and 0.774, respectively, and the recommended keywords had a similarity of 0.696, which is in the middle. This shows that our recommendations are less similar to original keywords than the synonym group but more similar than the irrelevant group.

4.3.3 Recombination Generation Pipeline.

The recombination generation pipeline gets a user selection of a set of keywords and generates three different descriptions of the possible image that includes those keywords. As the pipeline aims to provide diverse options, we evaluated the diversity of the description generation model.
Similar to Section 4.3.2, we built 100 sets of keywords randomly extracted from the dataset. We generated three descriptions with our pipeline for each set, calculated the cosine similarities between those three, and averaged them. Here, we calculated diversity as 1 − similarity. To validate our description generator, we prepared two more description sets: one consisting of explicitly unrelated descriptions randomly acquired from the dataset, and the other consisting of descriptions that merely paraphrase one of the generated descriptions using a T5-based paraphraser [83]. The diversity within the random and paraphrased groups was 0.801 and 0.209, respectively, while the descriptions generated by our pipeline showed a diversity of 0.395. This indicates that the generated output is more diverse than mere paraphrases and less diverse than random descriptions, meaning that the pipeline generates descriptions with a reasonable amount of diversity.
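A sketch of this diversity computation is shown below, again assuming a Sentence-Transformers embedding model as a stand-in for [85].

```python
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def description_diversity(descriptions: list[str]) -> float:
    """Diversity = 1 - mean pairwise cosine similarity within a set of descriptions."""
    vecs = embedder.encode(descriptions)
    sims = [float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
            for a, b in combinations(vecs, 2)]
    return 1.0 - float(np.mean(sims))
```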
We didn’t evaluate the later part of this pipeline, which is about generating images and transforming them into sketches, as we used models from previous research [56] without any customization or adaptation.

5 Evaluation

We conducted a within-subjects comparative study with 16 participants. As our design goals encompassed two steps of the reference recombination – (1) Finding elements (DG 1 and DG 2) and (2) Recombining elements (DG 3 and DG 4), we first observed how CreativeConnect supported each of these steps. We also evaluated whether CreativeConnect eventually improves designers’ idea generation results and how it supports the creative process.
RQ1. How does CreativeConnect support the two steps of the recombination process—finding elements from the references and recombining elements?
RQ2. Can CreativeConnect help users generate better quality and quantity of design ideas?
RQ3. How do users utilize the output of CreativeConnect in their ideation process?
The baseline system shared a similar interface with CreativeConnect but lacked its key features: extracting keywords from the reference, suggesting relevant keywords, and generating recombination options. In the baseline system, users could manually leave keyword notes on each reference image, create sketches by specifying layouts and prompts to the image generation model, and use ChatGPT. To assess the efficacy of the design of CreativeConnect's features and pipelines rather than the effect of AI functionalities per se, the same AI functionalities were also included in the baseline system. After observing prevalent use cases of AI in design processes through a recent survey [69] and videos [21, 76, 93], we included both a language model and an image generation model in the baseline system to simulate real-world scenarios of designers using AI tools. The baseline included models closely aligned with the CreativeConnect pipeline to prevent model performance from affecting the study results: instead of the GPT models, we provided GPT-3.5-based ChatGPT, and for image generation, we offered the same layout diffusion model as CreativeConnect. A screenshot of the baseline interface is presented in Figure 11 in the Appendix.

5.1 Participants

We recruited 16 participants (10 females, 6 males; age M=24.81, SD=3.78) through an online recruitment posting. To determine whether CreativeConnect can address the challenges found in the formative study with early-stage designers, we set our participant group to be similar to that of the formative study. We required participants to have a degree in design or art and to have participated in at least three different design projects. 11 participants were students majoring in design: 5 were at the graduate level, and 6 were at or above the third-year undergraduate level. The other 5 participants had graduated: 2 majored in design, 1 minored in design, and the others majored in media arts and painting.
All participants also reported having enough sketching skills since we asked them to draw their design ideas during the task. The study was conducted for 2 hours, and we compensated participants with 70,000 KRW (approximately 53 USD).

5.2 Study Procedure

Figure 4:
Figure 4: User study process. The 2-hour user study consists of two sessions with different tools, each including a 30-minute ideation phase utilizing the given tool. The order of the tool and the design tasks are counterbalanced. After the two sessions, they had 20-minute semi-structured interviews about their experiences.
The whole process of the user study is shown in Figure 4. Participants were asked to perform design ideation tasks twice in two settings: CreativeConnect and baseline. The task was to draw an illustration for the cover of a fictional children’s book, "Starry Safari: Exploring Alien Jungles" or "A Christmas Dinner in the Underwater World". They were also provided with 10 reference images for each topic. The order of topics and tools was counterbalanced for each participant.
For the first five minutes of each round, participants had a tutorial on the given system and tried it out with sample images to get used to it. They were then given the topic and the reference images and started ideation using the tool for 30 minutes. If the participants came up with a design idea they wanted to develop further, they sketched it on the paper using a pen. After each round, they completed the post-task survey. Between the two rounds, they could get a 10-minute break. After both rounds, we conducted a 20-minute semi-structured interview to ask about the difference between the two conditions and the effect of the tools on their ideation process. The interview questions are in Appendix B.2.

5.3 Measures

The survey after each round included questions about the usefulness of the given system for the different steps of ideation: organizing the references, discovering useful elements from the reference, generating multiple ideas, exploring multiple ideas, and discovering new ideas. The survey also included five questions about satisfaction with participants' sketch results regarding overall outcome, quantity, quality, diversity, and creativity. We also used five questions from [86] to assess participants' self-perceived experience of using the AI system. Participants answered these questions for the image generation feature and ChatGPT after the baseline session, and for the keyword extraction, keyword recommendation, and image/description generation features after the CreativeConnect session. The survey also included the Creativity Support Index [11] and the NASA-TLX questionnaire [30].
We also gathered the usage logs (i.e., participant actions with timestamps) to get quantitative metrics for user behaviors. We used this data to calculate the time taken for each sketch, the number of images generated, the number of inputs provided to the image-generating model, etc. Also, every time the participants completed the sketch, the system prompted participants to rate how well the given tool assisted them in producing the idea.
Additionally, we conducted an expert evaluation of the participants' sketches. We recruited two experts with bachelor's degrees in art who had 6 and 1.5 years of experience teaching art, respectively. We asked them to evaluate two factors on a 7-point Likert scale: (1) the creativity of each sketch and (2) the diversity of ideas within a set of sketches. We randomly chose three sketches drawn by each participant on each design topic, so a total of 96 sketches (3 sketches x 16 participants x 2 conditions) were evaluated. The evaluators rated the sketches individually, and for cases with significant score differences (more than 3 points), we asked the evaluators to re-evaluate them. While re-evaluating, they were given each other's comments and scores and could choose to change their original score or leave it, and they also had to leave comments about their decision. Nine sketches required re-evaluation, and all conflicts were resolved after one round of re-evaluation. We then used the average of the two evaluators' scores for the result analysis.

6 Results

Results showed that CreativeConnect helped participants both find and recombine elements for reference recombination. Also, it was shown that users with CreativeConnect could generate more design ideas in a given time and perceived their ideas as more creative compared to the baseline. We also found some differences between CreativeConnect and baseline regarding how users utilize the tool for their creative process.
Figure 5:
Figure 5: Survey results on the user-perceived efficiency during each recombination step with a 95% confidence interval. CreativeConnect was significantly helpful in discovering elements from the reference image and generating multiple ideas.

6.1 Support for Different Recombination Steps

To answer RQ1, we examined survey questions and log analysis results divided into the two steps of reference recombination: (1) discovering keywords from the reference images and (2) recombining the found elements into a new concept. We used a Wilcoxon signed-rank test for all survey questions, as they were ordinal data on a 7-point Likert scale. For the usage log analysis, we conducted a two-sample t-test or a two-sample paired t-test to compare CreativeConnect and the baseline.
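As an illustration of these comparisons, a minimal SciPy sketch is shown below; the function names and the framing of which arrays are paired are our own, not part of the original analysis code.

```python
from scipy import stats

def compare_likert(baseline: list[int], creativeconnect: list[int]):
    """Wilcoxon signed-rank test for paired 7-point Likert ratings."""
    return stats.wilcoxon(baseline, creativeconnect)

def compare_paired_counts(baseline: list[float], creativeconnect: list[float]):
    """Two-sample paired t-test for per-participant log counts (e.g., keyword notes added)."""
    return stats.ttest_rel(baseline, creativeconnect)

def compare_unpaired(baseline: list[float], creativeconnect: list[float]):
    """Two-sample (independent) t-test for unpaired log metrics."""
    return stats.ttest_ind(baseline, creativeconnect)
```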

6.1.1 Finding Keywords from the Reference.

Participants perceived that CreativeConnect helped them discover valuable keywords from the given reference images. As shown in Figure 5, participants found CreativeConnect significantly more helpful for discovering valuable elements from the reference that could be used in their ideation (M=6.13, SD=1.31) compared to the baseline system (M=3.75, SD=1.98 / p=0.001, W=0.0). Regarding how CreativeConnect and the baseline helped with organizing references, the ratings were not significantly different, though CreativeConnect had a slightly higher average (Baseline: M=4.38, SD=1.96 / CreativeConnect: M=5.31, SD=1.58 / p=0.121, W=23.5).
Usage logs also showed that CreativeConnect effectively encouraged participants to explore and extract different keywords. Comparing the number of keyword notes that participants left in the two conditions with a two-sample paired t-test, participants with CreativeConnect added more keyword notes (M=34.69, SD=10.74) than with the baseline system (M=13.19, SD=10.53 / p<0.0001, t=5.52). Also, as shown in Figure 6, participants with the baseline typically extracted keywords only during their initial sketch, thereafter relying solely on the previously extracted keywords without actively discovering additional ones. In contrast, participants using CreativeConnect consistently added more keywords throughout the whole process. While they also extracted the most keywords at the beginning, they continued to extract new keywords from references for every new sketch. One participant (P15) drew all sketches in one go after developing multiple design ideas instead of sketching immediately after formulating each idea. As we could not match keyword notes to specific sketches in this case, their data was omitted from this analysis of actions associated with each sketching instance. All participants' raw usage log data, including P15's, is provided in Appendix B.3.
Figure 6:
Figure 6: Comparison of the count of two different actions (adding keywords, generating image through generation model) taken to generate each sketch (from the first sketch to the fifth sketch) in CreativeConnect and baseline. The results showed that users use the add keyword action more in CreativeConnect compared to the baseline, where users only add keywords for the initial sketches. There was no significant difference in the count of generated image action.

6.1.2 Recombining elements.

The survey findings indicated that CreativeConnect can be useful for recombining different elements into new design ideas. Participants rated CreativeConnect as significantly more helpful (M=5.94, SD=1.34) than the baseline system (M=4.88, SD=1.89 / p=0.023, W=10.5) for generating multiple ideas from the collected elements (Figure 5). However, participants' perception of how much the system helped them explore multiple ideas was not significantly different between conditions, although CreativeConnect had a slightly higher average rating (Baseline: M=4.88, SD=1.93 / CreativeConnect: M=5.69, SD=1.35 / p=0.178, W=22.0). The difference was also not significant for discovering novel ideas, though the average was again slightly higher for CreativeConnect (Baseline: M=5.00, SD=1.75 / CreativeConnect: M=5.75, SD=1.48 / p=0.110, W=19.0).
We also examined how participants used the given image generation model to recombine elements into a design idea. As shown in Table 1, there was no significant difference in the number of generated images between the two conditions. However, CreativeConnect showed particular advantages in exploring diverse recombinations with the model, as evident from the distinct patterns observed when users interacted with the generation model under the two conditions. With the baseline system, users could provide separate inputs for the overall image description and each object; CreativeConnect allowed users to select multiple keywords to merge. In both conditions, participants could input multiple phrases together to combine them, so we analyzed how semantically diverse the phrases given together as a single input to the model were. Out of a total of 347 input sets (202 from the baseline, 145 from CreativeConnect), 14 sets (11 from the baseline, 3 from CreativeConnect) consisted of only one phrase and were excluded from the analysis, since our objective was to compare the semantic similarity between phrases provided to the model together. For the remaining 333 input sets (191 from the baseline, 142 from CreativeConnect), we computed the semantic similarity between all pairs of phrases within each input set and calculated the mean and minimum similarity. The mean similarity represents the overall similarity between phrases provided as input together, while the minimum similarity represents the most diverse pair within the set. Finally, we conducted a two-sample t-test for each metric.
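For illustration, the per-input-set metrics can be computed as in the sketch below; the embedding model is again an assumed stand-in for the one used in the paper.

```python
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def input_set_similarity(phrases: list[str]) -> tuple[float, float]:
    """Return (mean, min) pairwise cosine similarity between phrases in one input set."""
    vecs = embedder.encode(phrases)
    sims = [float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
            for a, b in combinations(vecs, 2)]
    return float(np.mean(sims)), float(min(sims))
```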
As shown in Table 1, the input sets created within CreativeConnect showed significantly lower similarity between keywords (M=0.222, SD=0.094) than the sets made within the baseline system (M=0.263, SD=0.166 / p=0.008, t=2.66) when calculated with the minimum similarity. The difference was similar for the mean similarity but fell just short of significance (Baseline: M=0.356, SD=0.148 / CreativeConnect: M=0.330, SD=0.075 / p=0.051, t=1.95). This indicates that participants with CreativeConnect actively sought to create recombinations with greater semantic diversity, ultimately exploring more distinct recombinations than in the baseline condition.
Table 1:
                                               CreativeConnect       Baseline            Statistics
                                               mean      std         mean      std       p         Sig.
Image Generation Model Usage (per session)
  # of generated images                        57.06     17.91       46.69     23.52     0.119     -
  # of user inputs to the model                9.31      4.57        10.56     4.76      0.468     -
Semantic Similarity within Input Sets
  Semantic similarity (mean)                   0.330     0.075       0.356     0.148     0.051     -
  Semantic similarity (min)                    0.222     0.094       0.263     0.166     0.008     ⁎⁎
Table 1: Number of image generation model usage and the semantic similarity between user inputs in CreativeConnect and baseline. (-: p > .05, ⁎: p < .050, ⁎⁎: p < .010, ⁎⁎⁎: p < .001)

6.2 Ideation Results

To answer RQ2, we analyzed the design idea sketches that participants drew during the study session through expert evaluation, usage logs, and survey results. As in RQ1, the Wilcoxon signed-rank test was used for survey questions, and a two-sample t-test was used for expert evaluation and log analysis results. For pairwise data, such as the number of sketches each participant drew in each condition, we conducted a two-sample paired t-test.
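As a minimal sketch of the test choices just described (the arrays are illustrative placeholders, not the study data), the corresponding SciPy calls would look as follows.

from scipy.stats import wilcoxon, ttest_rel, ttest_ind

survey_cc, survey_bl = [6, 5, 7, 6], [4, 4, 5, 5]      # paired 7-point survey ratings
w_stat, p_survey = wilcoxon(survey_cc, survey_bl)      # Wilcoxon signed-rank test

expert_cc, expert_bl = [4.5, 5.0, 4.0, 5.5], [4.0, 4.5, 3.5, 5.0]
t_stat, p_expert = ttest_ind(expert_cc, expert_bl)     # two-sample t-test for expert/log data

sketch_cc, sketch_bl = [6, 5, 7, 5], [5, 5, 6, 4]      # per-participant sketch counts
tp_stat, p_paired = ttest_rel(sketch_cc, sketch_bl)    # paired t-test for pairwise data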

6.2.1 Creativity & Diversity of the Final Sketches.

Figure 7:
Figure 7: Evaluation results of user-drawn sketches with 95% confidence interval. (a) Expert evaluation on the diversity and creativity for CreativeConnect and baseline condition. (b) Self-reported satisfaction on sketch result in terms of quantity, quality, diversity, creativity, and overall for CreativeConnect and baseline condition.
As shown in Figure 7 (b), the survey results showed that participants perceived their sketches as more creative when using CreativeConnect (M=5.38, SD=1.09) compared to the baseline (M=4.19, SD=1.64 / p=0.004, W=0.0). During the interviews, 12 out of 16 participants said that they felt they could be more creative with the support of CreativeConnect than with the baseline, especially when they had a hard time coming up with a new idea in the early ideation stage. There were no significant differences between the two conditions for the other factors, including overall satisfaction and the quantity, quality, and diversity of the sketches.
However, as shown in Figure 7 (a), the expert evaluation did not show a significant difference between the two conditions. The creativity score was slightly higher for CreativeConnect (M=4.854, SD=1.418) than for the baseline (M=4.344, SD=1.708 / p=0.114, t=1.59), but not significantly so according to the two-sample t-test. There was also no significant difference in diversity (Baseline: M=4.625, SD=1.607 / CreativeConnect: M=4.75, SD=1.418 / p=0.833, t=0.23). There are possible reasons why the expert evaluation diverged from the survey results. First, even though the experts were asked to focus on the ideas as much as possible, participants' sketching skills were inevitably reflected in the evaluation, and some of the evaluators' comments were indeed about sketching skill. Variation across design topics may also have played a role; in fact, sketches on the underwater Christmas topic were rated higher on average.

6.2.2 Efficiency of the Ideation Process.

Table 2:
                              CreativeConnect       Baseline            Statistics
                              mean      std         mean      std       p         Sig.
# of sketches per session     5.56      1.63        5.06      1.73      0.041     ⁎
Time per sketch (min)         5.01      2.87        5.39      3.03      0.403     -
Table 2: Number of sketches drawn by the participants per session and the average time taken for sketches. (-: p > .05, ⁎: p < .050, ⁎⁎: p < .010, ⁎⁎⁎: p < .001)
As shown in Table 2, the two-sample paired t-test showed that participants produced more sketches in the same 30-minute ideation session with the support of CreativeConnect (M=5.56, SD=1.63) than with the baseline (M=5.06, SD=1.73 / p=0.041, t=2.24). This indicates that CreativeConnect can support efficient ideation. The interviews also suggested that CreativeConnect could be useful when designers have to come up with many ideas from a limited set of references in limited time, a common scenario in professional design work where clients provide references and designers must deliver drafts based on them.

6.2.3 Perceived Workload.

As shown in Table 4, there was no difference between the two conditions in perceived workload. Although CreativeConnect adds some complexity, such as requiring users to specify keywords as inputs to the image generation model, this did not cause users to feel overwhelmed while performing the task.

6.3 Impact on User’s Creative Process

6.3.1 Source of the Inspiration.

Table 3:
Source of Inspiration                                        CreativeConnect    Baseline
Within the tool        Generated image/description           9                  5
                       Recommended keywords                  1                  -
                       ChatGPT answers                       -                  2
Outside of the tool    Own creativity                        1                  1
                       Reference images                      5                  8
Avg. tool assistance rating                                  5.625              4.563
Table 3: Number of inspiration sources by category for the most creative sketches chosen by the participants and the average rating of the efficiency of the tool assistance for drawing those sketches. Figure 8 illustrates the example use cases of inspiration within the tool.
To investigate how users used the outputs of CreativeConnect and the baseline system differently when generating new design ideas, we asked participants to pick the sketch they considered most creative from each study session and explain how they got the inspiration for it.
Five different sources of inspiration were found across the two conditions. Many participants got their ideas from the generated images or text descriptions in both conditions. In the CreativeConnect condition, more than half of the participants said that their best ideas were inspired by these generated images or texts (Table 3). As illustrated in Figure 8 (a), participants utilized keywords from both reference images and recommendations, and merged them using the system. Notably, they tended to reinterpret the generated images in their own way rather than accepting what was drawn there. Participants with the baseline system were also influenced by the generated images, but slightly less often (Table 3), and in a different way: they tended to refer to the visual composition or the details of shapes and apply them to their sketch. P7 explained the reason: “While putting prompts into the image generation model (in the baseline), I already had the concept I wanted. Therefore, I refer to the expression method of it, rather than trying to find something new out of it.”
One notable finding is that participants were influenced more by the given reference images when using the baseline. This suggests that CreativeConnect can make users less directly affected by reference images, ultimately preventing them from fixating on them. P16 explicitly pointed this out: “When using CreativeConnect, I gave less focus to given images, and as I can expand to a lot of ideas only with a small number of references, I didn't even use all of them.” P14 mentioned, “This (baseline) tool feels like a notepad that manages references, so I kept referring to the reference images themselves.”
As shown in Figure 8 (d), one participant also got an idea from CreativeConnect's recommended keywords. In the baseline, participants could use ChatGPT instead of this keyword recommendation feature, and two participants said they got their inspiration from it. However, this usage was relatively rare (Table 3), mainly because of the challenges of using ChatGPT for visual tasks: during the interviews, participants mentioned difficulties in formulating prompts and in leveraging the language-based output for their design.
This difference in sources of inspiration was reflected in users' ratings of how effective the tool's assistance was. We conducted a two-sample paired t-test to compare participants' ratings of the tool's usefulness for generating their favorite ideas. The rating was higher for CreativeConnect (M=5.63, SD=1.41) than for the baseline (M=4.56, SD=1.89 / p=0.045, t=2.18) (Table 3), indicating that users perceived the features of CreativeConnect as more helpful for coming up with their best ideas than those of the baseline system.
Figure 8:
Figure 8: Examples of user input, system-generated output, user-drawn sketches inspired by the system output, and corresponding user quotes. In both conditions, users were inspired by the generated images. However, CreativeConnect users were more inspired by the overall concept of the image, while baseline users referred to a specific composition or the detail of the object from the image.
The survey results about the perceived experience of using the AI-based system also point to a more specific reason for this helpfulness. As shown in Table 4, CreativeConnect was rated significantly better for helping users think through what kind of outputs they wanted for the given task (Baseline: M=5.00, SD=1.97 / CreativeConnect: M=6.13, SD=1.02 / p=0.045, W=14.5). This suggests that participants did not treat the results of CreativeConnect's image generation model as final outputs but rather as a guide for figuring out what they wanted, which led them to think in diverse directions. P9 mentioned that “In baseline, the result came out exactly what I thought, so I replicated the output. However, CreativeConnect shows me various high-level ways to combine things so I could explore those methods and expand those processes on my own.”
Table 4:
                                              CreativeConnect       Baseline            Statistics
                                              mean      std         mean      std       p          Sig.
Self-perceived experience on ML model
  Match goal                                  5.00      1.63        4.63      1.96      0.5805     -
  Think through                               6.13      1.02        5.00      1.97      0.0454     ⁎
  Transparent                                 4.81      1.80        4.38      1.67      0.4488     -
  Controllable                                4.75      1.95        4.06      1.84      0.2976     -
  Collaborative                               5.38      1.59        4.94      2.08      0.4809     -
NASA-TLX
  Mental                                      3.69      1.82        4.19      1.94      0.39       -
  Physical                                    1.81      1.22        2.50      2.10      0.10       -
  Temporal                                    2.81      1.83        3.50      2.28      0.23       -
  Effort                                      3.63      1.82        3.94      2.05      0.63       -
  Performance                                 5.31      1.08        5.06      1.39      0.78       -
  Frustration                                 2.63      1.93        3.50      1.75      0.14       -
Creativity Support Index
  Enjoyment                                   5.91      1.00        5.09      1.78      0.077      -
  Exploration                                 5.38      1.54        4.81      1.56      0.211      -
  Expressiveness                              5.44      1.18        4.53      1.75      0.032      ⁎
  Immersion                                   4.69      1.99        4.69      1.82      1          -
  Results Worth Effort                        5.47      1.27        5.25      1.71      0.591      -
  Collaboration                               5.19      1.25        4.41      1.71      0.016      ⁎
Table 4: Survey results of self-perceived experience on ML features, NASA-TLX questionnaire, and Creativity Support Index. (-: p > .05, ⁎: p < .050, ⁎⁎: p < .010, ⁎⁎⁎: p < .001)

6.3.2 Creativity Support Index.

According to Table 4, participants rated CreativeConnect significantly higher than the baseline on expressiveness and collaboration, while the other criteria showed no significant difference between the two systems. Through the post-interviews, we found that participants felt each system offered a different type of creativity support. Participants said that the baseline was helpful when they already had an overall idea in mind and wanted support in expressing it in a sketch. On the other hand, they said that CreativeConnect was helpful for their creativity when they did not yet have an idea. These differences are explained in more detail in Section 7.1.

7 Discussion

We proposed CreativeConnect, a novel AI-infused creativity support tool that assists graphic designers in generating design ideas by recombining reference images. Based on our findings, we suggest design implications for future creativity support tools.

7.1 CreativeConnect vs. baseline - Two Different Types of Creativity Support

The results show that CreativeConnect successfully supports early-stage conceptual ideation through reference recombination, aligning well with the four design goals derived from the formative study. Participants could easily extract keywords (DG 1) and utilize keyword recommendations as a source of new inspiration (DG 2), leading them to make more keyword notes. They also explored diverse keyword recombinations (DG 3), leading them to produce more design ideas in a given time. Additionally, they perceived their ideas as more creative because CreativeConnect provided its output as an incomplete sketch and let participants inject their own creativity into it (DG 4). However, participants did not feel a difference in the overall degree of creativity support between the two tools. The interviews revealed that this was because CreativeConnect and the baseline both provided valid creativity support, but in distinct ways depending on users' current needs.
In the baseline system, users had to specify all the details of the generated image, so they appreciated the transparency and control. The system faithfully reproduced user input, resulting in a final output that closely mirrored the concept in their mind. These generated outputs helped users actualize their existing ideas, which primarily supports implementation [17, 18]. Sketch-Sketch Revolution [23] and Framer [51] take a similar approach to creativity support.
Conversely, CreativeConnect stimulates creativity by providing inspiration [17, 18]. Instead of requiring detailed input, CreativeConnect accepts keywords and deliberately refrains from exact expression, generating a wider range of outcomes, potentially with serendipity. How CreativeConnect provides participants with this creative leap can be explained by Cross' descriptive model of creative design [19]. The keyword extraction feature actively supports emergence, allowing designers to find unrecognized properties of an existing design. The keyword recommendation supports mutation, helping designers generate new ideas by partially modifying existing designs. P15 metaphorically likened this process to having someone nearby constantly offering fresh variations of ideas. Furthermore, the keyword merging feature enhances combination, where new ideas are generated by combining features from existing designs. Therefore, CreativeConnect could potentially help address the common challenge known as "artist's block" or "creative block", similar to the "writer's block" experienced by writers [32]. CreativeConnect could provide proper support when designers find themselves creatively stuck, breaking creative inertia by sparking novel ideas and opening new creative avenues.
These differences can provide valuable design implications for future creativity support tools, as designers require different types of creativity support at different stages of the ideation process. By dynamically adjusting the type of support based on the user's context, such a tool can offer a more personalized and practical creative experience. For instance, when the system detects a user in the exploration phase, it can employ an approach similar to CreativeConnect, encouraging the generation of diverse and abstract ideas. Conversely, when the user wants to refine and develop a particular concept, the tool can provide baseline-like features to ensure greater control and fidelity in the generated output. This adaptable approach acknowledges the multifaceted nature of the creative process and supports users with the right tools at the right moment, ultimately enhancing their creativity. Also, integrating inspiration and implementation support into a single tool can enable a seamless transition between generating diverse ideas and refining specific concepts, fostering a more iterative and efficient creative workflow.

7.2 The Role of Low-fidelity Output for Creativity Support

The post-interviews showed that adopting low-fidelity output can facilitate imagination beyond what the system provides. We deliberately employed low-fidelity sketch output in both CreativeConnect and the baseline. During the interviews, 12 out of 16 participants preferred the sketch output over a complete image because it left room for imagination and interpretation. The image converted into a sketch omits small details and retains only the larger forms, leaving large empty spaces. This emptiness encourages users not just to perceive the generated image but to see it as room for further development, and it makes them more deeply engaged in further ideation. Some participants even opposed completed images for the ideation stage, believing that an abundance of detail in reference images fixates them on that specific design idea and hinders them from using the images for their own ideas. P2 said, “I usually get completed artworks from Pinterest as a reference, and I found myself unavoidably looking at the unique style of that designer, wanting to replicate it. This time, I liked that I could maintain my own style while exploring different references of concepts.” Based on these findings, adopting low-fidelity output could be an option when designing creativity support systems to prevent fixation and facilitate users' creativity in ideation. For example, a design reference tool could dynamically adjust the level of detail of the provided images based on the user's current design stage: when the user wants references for overall concepts, the system can convert reference images to simple black line drawings or even present them solely as textual descriptions; conversely, when the user has settled on a specific concept and is exploring visual details, the system can offer the original images in full detail.
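One plausible way (not CreativeConnect's actual conversion pipeline, which relies on a line-drawing conversion model) to implement such a detail-adjustable line drawing is classical edge extraction, where a lower detail setting blurs more and keeps only coarse outlines; the function and parameter choices below are illustrative.

import cv2

def to_line_drawing(path, detail=0.5):
    """detail in (0, 1]: lower values keep only the larger forms."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    k = max(1, int(9 * (1 - detail)) | 1)            # odd blur kernel; larger when detail is low
    blurred = cv2.GaussianBlur(gray, (k, k), 0)
    low, high = int(50 / detail), int(150 / detail)  # stricter thresholds drop small details
    edges = cv2.Canny(blurred, low, high)
    return 255 - edges                               # black lines on a white background

coarse_sketch = to_line_drawing("reference.png", detail=0.3)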

7.3 Generalizability of CreativeConnect in Different Context

CreativeConnect is designed to support early-stage designers, such as design students, who have a general understanding of the design process but need help with reference recombination. However, our user study revealed insights applicable to other expertise levels. We observed that participants with limited sketching skills were more satisfied with the baseline system, as its output was more aligned with their intentions and better suited to aiding the actual sketching. Therefore, for users less familiar with artistic expression, an AI tool's output should prioritize alignment with the user's original intent over abstraction. Conversely, for experts accustomed to extracting inspiration from references and combining it into their own original ideas [4], CreativeConnect could serve as a tool for serendipity rather than as help with the process of keyword extraction and recombination. For example, P16 said that the suggested keywords and merged images acted as prompts reminding them of aspects they had initially overlooked. For such users, features could be redesigned to encourage reflection and creative exploration, such as highlighting the parts of generated images that were not present in the existing references but emerged through the system's features.
The user study results also suggested that CreativeConnect could be utilized in other design contexts, such as collaborative projects. According to the CSI survey results (Section 6.3.2), participants indicated that CreativeConnect would be significantly helpful for collaborating with other designers. This was because CreativeConnect is designed around the sequential steps of leaving keyword notes and merging them, and it keeps track of these processes on the mood board and the merging panel. Participants therefore said that simply showing the CreativeConnect screen could communicate their creative process to other designers, making it easier to understand each other's thinking and quickly reach agreement on a design direction. One direction for future work is to incorporate the features of CreativeConnect into collaborative mood board tools [15, 48, 49] and study the benefits of keyword-based recombination features.
CreativeConnect could also be used in other design domains. Our design goals and the feature design of CreativeConnect are primarily tailored to illustration design, which is predominantly about conveying design topics through visual subject matter and does not usually involve other modalities such as text (common in poster or publication design) or motion and interaction (common in UI/UX and motion graphic design). However, even in other design domains, the recombination process of extracting elements from references and recombining them is an effective strategy. To apply the recombination approach to another design domain, we would first need to identify which elements designers in that domain focus on when looking at references and use those categories of elements as keywords in the CreativeConnect pipeline.

8 Limitations and Future Work

Our work has several limitations that future work can address. In our user study, the ideation task lasted 30 minutes per condition, which is shorter than an actual design process, so it was difficult to observe how behavior changes over a longer period. Future work could deploy CreativeConnect in real-world design projects and examine how behavior patterns differ from the lab study.
Our pipeline recombines keywords by generating an image description that contains all of the keywords selected by the user. However, there are other possible ways of recombination, such as blending objects or expressing some keywords indirectly through visual details such as color. Further work can explore these recombination methods and how to support them.
As CreativeConnect and the baseline both leverage generative AI, including an LLM and a layout-guided diffusion model, the results may be influenced by users' familiarity with AI. Since this study did not explore those dimensions, future research can examine how creativity support tools with AI features may have varying effects depending on users' knowledge of AI or prior experience using it.

9 Conclusion

This paper proposed CreativeConnect, a system designed to support graphic designers in the reference recombination process, allowing them to generate novel design ideas. Building on our formative study observations, CreativeConnect assists users in identifying key elements within reference images. It also provides diverse recommendations for relevant keywords and recombination options. Notably, the low-fidelity sketch-based output of CreativeConnect was shown to encourage creativity by enabling further imaginative exploration. Our user study demonstrated that CreativeConnect efficiently supported both steps of finding and recombining elements and helped participants come up with more design ideas and perceive their ideas as more creative than the baseline. While CreativeConnect represents a promising step towards comprehensive recombination support tools for designers, we also suggested an opportunity to expand such systems to address a broader spectrum of design needs and situations.

Acknowledgments

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2021-0-01347, Video Interaction Technologies Using Object-Oriented Video Modeling / No.2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)). We thank all of our study participants and the members of KIXLAB for their insightful discussions and constructive feedback.

A Technical Details

A.1 Prompt: Extracting Keywords from Image Captions

System Prompt
You will be provided with multiple sentences to describe an illustration. Your task is to extract a list of Subject matter, Action & pose, and Theme & mood.
Subject matters are one-word, describing the specific physical objects, characters, or landscape that the user wants to include in their illustration. Example subject matters include pencil, children, or wave. For subject matters, no adjectives should be included. They should always be a noun.
Actions & poses are word-level or phrase-level actions that the character or the object in the illustration performs. Example actions & poses include riding a bus, standing still, or traveling.
Themes & moods are words not directly present in the illustration, but those that can potentially convey the overall theme or mood of the illustration. Example themes & moods include imaginative, eco-friendly, or sad. They should be adverbs, preferably one or two words.
If you are provided sentences including some style such as cartoon, illustration, image, or photo, exclude it. For other examples, ’an illustration of a woman sitting at a table’ caption is extracted to ’woman’, ’table’, ’sitting at a table’, ’cozy’. The ’illustration’ is not contained. Eliminate the changed forms of the same word, such as plurals. Only include roots. For example of ’trees’ and ’tree’, only include ’tree’.
Examples
User: a card with chinese writing with colorful objects on it
a red and orange background with a blank paper, chinese, pencils, stationery and more
an image of a classroom scene with various supplies
Assistant: Subject matter: card, Chinese writing, colorful objects, red and orange background, blank paper, Chinese, pencils, stationery, classroom, supplies.
Action & pose:
Theme & mood: education, learning, multiculturalism
User: a man and a woman sitting at a table eating food
a woman sitting in a chair in a room with potted plants
a man sitting at a table eating a hot dog
a illustration of a woman sitting at a table in a kitchen with potted plants
an illustration of a woman sitting at a table
an illustration of a refrigerator next to a chair and a table
an illustration of two people sitting at a table with their feet on a chair
a cartoon of a person sitting in a chair in a shower
Assistant: Subject matter: man, woman, table, food, chair, potted plants, hot dog, kitchen, refrigerator, feet
Action & pose: sitting at a table, eating food
Theme & mood: cozy, heartwarming
User: a painting of a white barn in a field of flowers
a painting of a blue house with a tree next to it
a painting of a blue house with a roof and trees
a painting of a window of a house with flowers
a painting of a field of flowers in front of a house
Assistant: Subject matter: painting, white barn, field, flowers, blue house, tree, roof, window
Action & pose:
Theme & mood: rural, peaceful, nature
User: the album cover of the beatles abbey road
a man and a woman standing in front of a car
a man and a woman walking down a street
a group of people walking across a crosswalk
Assistant: Subject matter: Beatles, Abbey Road, man, woman, car, street, group, people, crosswalk
Action & pose: standing in front of a car, walking down a street, walking across a crosswalk
Theme & mood: urban, nostalgia
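A minimal sketch of how this prompt could be invoked and its reply parsed follows; the model name, API parameters, and parsing heuristics are placeholder assumptions, as the paper excerpt does not specify them.

from openai import OpenAI

client = OpenAI()
SYSTEM_PROMPT = "..."   # the system prompt above
FEW_SHOT = []           # the (user, assistant) example messages above

def extract_keywords(captions):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}, *FEW_SHOT,
                {"role": "user", "content": "\n".join(captions)}]
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    text = reply.choices[0].message.content
    keywords = {}
    for line in text.splitlines():
        for label in ("Subject matter", "Action & pose", "Theme & mood"):
            if line.lower().startswith(label.lower()) and ":" in line:
                items = line.split(":", 1)[1]
                keywords[label] = [k.strip() for k in items.split(",") if k.strip()]
    return keywords

keywords = extract_keywords(["an illustration of a woman sitting at a table"])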

A.2 Prompt: Recommending Relevant Keywords

System Prompt
We are trying to support novice designers’ ideation process by semantically combining different parts of illustration references. You will be provided with the topic of the ideation, and multiple keywords users like in the illustrations they found as references. There are three types of keywords: Subject matter, Action & Pose, and Theme & Mood.
Subject matters are one-word, describing the specific physical objects, characters, or landscape that the user wants to include in their illustration. Example subject matters include pencil, children, or wave. For subject matters, no adjectives should be included. They should always be a noun. Come up with more than four new keywords for Subject matter.
Actions & poses are word-level or phrase-level actions that the character or the object in the illustration performs. Example actions & poses include riding a bus, standing still, or traveling.
Themes & moods are words not directly present in the illustration, but those that can potentially convey the overall theme or mood of the illustration. Example themes & moods include imaginative, eco-friendly, or sad. They should be adverbs, preferably one word.
Your task is to expand on the keywords being given, by combining multiple keywords or looking for synonyms that can inspire new creations or ideas. For example, the subject matter "pencil" can be combined with the action & pose "traveling" to inspire a new action & pose "writing a diary". You can combine as many keywords at once. Another example is to generate "hair salon" from "hair dryer", "comb", and "scissors". For combinations that result in theme & mood, make them as abstract as possible. An example is to make "adventurous", "gusty" from "riding on ship" and "tent".
Come up with new keywords for each keyword type with creative combinations. Only use the original keywords provided for creating new keywords. Do not just paraphrase original keywords. Do not suggest similar keywords to the original ones.
Important: Include at least one subject matter for each combination. Subject matter and theme & mood should be a SINGLE WORD. Combinations among subject matters are highly recommended. New keywords should be śurprisingćompared to original ones. It means the character of your suggested word should have low similarity.’
Examples
User: Subject matter: camping, tent, tree, animals, Eiffel tower, family
Action & pose: riding on a bus, riding on a ship
Theme & mood: playful, imaginative
Assistant: Subject matter: bear, sleeping person, safari, cruise
Action & pose: traveling, setting up camp, dancing jazz
Theme & mood: adventurous, serene, joyful, romantic
User: Subject matter: boy, dinosaur, flower
Action & pose: watching television
Theme & mood: fantasy, playful
Assistant: Subject matter: wind mill, volcano, movie screen
Action & pose: exploding strongly, riding a dinosaur, flying away to the sky
Theme & mood: vast, whimsical, rustic, frenetic
User: Subject matter: dreamy scene, boy
Action & pose: playing with dino toys
Assistant: Subject matter: universe, Saturn, astronauts
Action & pose: imagining adventures, floating on the space, role-playing, daydreaming
Theme & mood: jolly, imaginative, impactful
User: Subject matter: Christmas tree
Action & pose: dancing around the Christmas tree
Theme & mood: family-bonding
Assistant: Subject matter: Fireplace, wooden sled, Snowman, jazz, piano
Action & pose: melting, giving present, body-warming
Theme & mood: jubilant, sparkling, heartwarming
User: Subject matter: sea turtles, Christmas tree, marine life
Action & pose: swimming, dancing around the Christmas tree
Theme & mood: fantasy, underwater, family-bonding
Assistant: Subject matter: Sea horse, Christmas lights, coral, mermaid
Action & pose: floating on the wave, blinking eye, singing under the sea
Theme & mood: ethereal, aquatic, charming, panoramic
User: Subject matter: kid, cat
Action & pose: laying on top of a suitcase, playing hide and seek
Theme & mood: Rustic, vivid, exhilarating
Assistant: Subject matter: Birdcage, attic, trunk, blue bird
Action & pose: jumping on boxes, chasing birds, hiding in a suitcase
Theme & mood: quaint, mischievous, lively, nostalgic

A.3 Prompt: Generating Recombinations in Text Descriptions

System Prompt
The user wants to draw an illustration, with the assistance of you. You will be provided with multiple keywords users want to include in their illustrations. There are three types of keywords: Subject matter, Action & pose, and Theme & mood.
Subject matters are one-word, describing the specific physical objects, characters, or landscape that the user wants to include in their illustration. Example subject matters include pencil, children, or wave. For subject matters, no adjectives should be included. They should always be a noun.
Actions & poses are word-level or phrase-level actions that the character or the object in the illustration performs. Example actions & poses include riding a bus, standing still, or traveling.
Themes & moods are words not directly present in the illustration, but those that can potentially convey the overall theme or mood of the illustration. Example themes & moods include imaginative, eco-friendly, or sad. They should be adverbs, preferably one word.
Your task is to generate three descriptions of the illustration that the user can draw based on the given keywords. The three descriptions should be significantly different from each other. Each description should include three things: "Caption" and "Objects".
"Caption" is a simple description of the overall image of the description. This should include some objects in the "Objects" list. Keep it concise. Do not make it long and do not include unnecessary adjectives.
"Objects" is a list of the objects depicted in the illustrations, and a short description of them. The objects should be one of the given "Subject matters", or something related to them. You don’t have to include all given subject matters. The given action and concept should also be considered for generating the object’s detail.
Examples
User: Subject matter: ball, cat, dog
Action & pose: jumping
Theme & mood: playful, peaceful
Assistant: 1.
Scene: In the backyard, cats and dogs play happily with a ball which is their favorite toy.
Objects: [(ball, a brightly colored ball), (dog, a dog jumping with its tongue sticking out), (cat, an agile cat in mid-jump)]
2.
Scene: A beach where a dog is united in its pursuit of a bouncing ball.
Objects: [(ball, a sandy beach ball), (dog, a spotted dog leaping for the ball)]
3.
Scene: In a residential living room, the cat and dog both defy gravity in their jumps to catch the toy.
Objects: [(toy, a small rubber toy with vibrant stripes), (dog, a bouncy terrier flying mid-air), (cat, a Siamese cat also in mid-jump)]
User: Subject matter: dinosaur, drink, hat, chair
Action & pose: relaxing
Theme & mood: sunny, relaxed
Assistant: 1.
Scene: A dinosaur wearing a hat, lounging with a drink on a chair under the sun.
Objects: [(dinosaur, a large but friendly looking dinosaur), (drink, a fruity cocktail with a tiny umbrella), (hat, a wide-brimmed straw hat), (chair, a comfortable looking lounge chair)]
2.
Scene: Dinosaur relaxes in a sophisticated environment with a drink and plops a hat on the chair next to it.
Objects: [(dinosaur, a dinosaur in a suit), (drink, a fancy drink in a crystal glass), (hat, a stylish trilby), (chair, a plush velvet chair)]
3.
Scene: A kid with a hat is playing on a picnic mat with a toy dinosaur.
Objects: [(kid, a kid with pink shirts), (dinosaur, a small green toy dinosaur)]
User: Subject matter: dog, teeth
Action & pose:
Theme & mood: care
Assistant: 1.
Caption: A caring veterinarian examining a dog’s teeth.
Objects: [(dog, a dog sitting calmly), (teeth, a pair of dental tools), (vet, a veterinarian wearing a lab coat)]
2.
Caption: A child diligently brushing their dog’s teeth.
Objects: [(dog, a dog lying on its back), (teeth, a toothbrush and dog toothpaste), (child, a child brushing the dog’s teeth)]
3.
Caption: A dental hygienist showing a dog owner the correct brushing technique for their pet’s teeth.
Objects: [(dog, a dog standing on a dental examination table), (teeth, a dental mirror and toothbrush), (hygienist, a dental hygienist demonstrating the brushing technique)]
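A minimal sketch of parsing one generated recombination description into its caption and object list; it simply follows the format of the examples above, where the caption line is labeled either "Scene" or "Caption".

import re

def parse_description(block):
    caption = re.search(r"(?:Caption|Scene):\s*(.+)", block)
    objects = re.findall(r"\(([^,()]+),\s*([^()]+)\)", block)
    return {
        "caption": caption.group(1).strip() if caption else "",
        "objects": [(name.strip(), detail.strip()) for name, detail in objects],
    }

example = ("Caption: A caring veterinarian examining a dog's teeth.\n"
           "Objects: [(dog, a dog sitting calmly), (teeth, a pair of dental tools)]")
parsed = parse_description(example)
# {'caption': "A caring veterinarian examining a dog's teeth.",
#  'objects': [('dog', 'a dog sitting calmly'), ('teeth', 'a pair of dental tools')]}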

A.4 Prompt: Matching Layout with Objects

System Prompt
You are an intelligent bounding box matcher. I will provide you with a caption that describes an illustration, a list of the objects that are included in the illustration, and a list of bounding boxes. Your task is to match bounding boxes to each object to make the illustration most balanced and realistic.
Each bounding box is in the format of (object name, [top-left x coordinate, top-left y coordinate, box width, box height]). The bounding boxes are represented as a proportion. The top-left corner has coordinates [0, 0]. The bottom-right corner has coordinates [1, 1]. The bounding boxes should not go beyond the image boundaries.
Examples
User: A realistic image of landscape scene depicting a green car parking on the left of a blue truck, with a red air balloon and a bird in the sky
[air balloon, car, bird, truck]
[0.041, 0.783, 0.442, 0.179], [0.525, 0.699, 0.408, 0.263], [0.261, 0.458, 0.222, 0.264], [0.525, 0.458, 0.279, 0.195]
Assistant: [(’car’, [0.041, 0.783, 0.442, 0.179]), (’truck’, [0.525, 0.699, 0.408, 0.263]), (’balloon’, [0.261, 0.458, 0.222, 0.264]), (’bird’, [0.525, 0.458, 0.279, 0.195])]
User: A realistic top-down view of a wooden table with two apples on it
[apple, apple, wooden table]
[0.219, 0, 0.562, 1], [0.402, 0.138, 0.195, 0.195], [0.402, 0.667, 0.195, 0.195]
Assistant: [(’wooden table’, [0.219, 0, 0.562, 1]), (’apple’, [0.402, 0.138, 0.195, 0.195]), (’apple’, [0.402, 0.667, 0.195, 0.195])]
User: A realistic scene of three skiers standing in a line on the snow near a palm tree
[skier, skier, skier, palm tree]
[0.487, 0.131, 0.142, 0.441], [0.661, 0.131, 0.143, 0.441], [0.836, 0.131, 0.142, 0.441], [0.795, 0.613, 0.183, 0.387]
Assistant: [(’skier’, [0.487, 0.131, 0.142, 0.441]), (’skier’, [0.661, 0.131, 0.143, 0.441]), (’skier’, [0.836, 0.131, 0.142, 0.441]), (’palm tree’, [0.795, 0.613, 0.183, 0.387])]
User: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
[dolphin, steam boat]
[0.273, 0, 0.245, 1], [0.032, 0.455, 0.135, 0.420]
Assistant: [(’steam boat’, [0.273, 0, 0.245, 1]), (’dolphin’, [0.032, 0.455, 0.135, 0.420])]
User: Immersed in his imagination, a boy is indoors enacting a prehistoric tale using four toy dinosaurs.
[dino toys, dino toys, dino toys, boy, dino toys]
[0.250, 0.218, 0.566, 0.563], [0.074, 0.556, 0.137, 0.284], [0.074, 0.76, 0.137, 0.284], [0.659, 0.041, 0.254, 0.134], [0.464, 0.840, 0.195, 0.120]
Assistant: [(’boy’, [0.250, 0.218, 0.566, 0.563]), (’dino toys’, [0.074, 0.556, 0.137, 0.284]), (’dino toys’, [0.074, 0.76, 0.137, 0.284]), (’dino toys’, [0.659, 0.041, 0.254, 0.134]), (’dino toys’, [0.464, 0.840, 0.195, 0.120])]
User: Two pandas in a forest without flowers
[panda, panda]
[0.114, 0.399, 0.183, 0.441], [0.733, 0.106, 0.150, 0.441]
Assistant: [(’panda’, [0.114, 0.399, 0.183, 0.441]), (’panda’, [0.733, 0.106, 0.150, 0.441])]
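A minimal sketch of converting the assistant's matched-box reply (coordinates given as proportions, as in the examples above) into pixel boxes, for instance for a 512x512 canvas as used in A.5; the scaling target is an assumption taken from that prompt.

import ast

def parse_boxes(reply, size=512):
    boxes = ast.literal_eval(reply)  # e.g. [('car', [0.041, 0.783, 0.442, 0.179]), ...]
    return [(name, [round(v * size) for v in box]) for name, box in boxes]

reply = "[('panda', [0.114, 0.399, 0.183, 0.441]), ('panda', [0.733, 0.106, 0.150, 0.441])]"
pixel_boxes = parse_boxes(reply)
# [('panda', [58, 204, 94, 226]), ('panda', [375, 54, 77, 226])]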

A.5 Prompt: Generating Layout based on Image Caption

System Prompt
You are an intelligent bounding box generator. I will provide you with a caption for an illustration and a list of the objects. Your task is to generate the bounding boxes for the objects based on the caption. The images are of size 512x512. The top-left corner has coordinates [0, 0]. The bottom-right corner has coordinates [512, 512]. The bounding boxes should not go beyond the image boundaries. Each bounding box should be in the format of (object name, [top-left x coordinate, top-left y coordinate, box width, box height]). If an object must appear several times by the provided caption, multiple bounding boxes may be added for the object.
Examples
User: A realistic image of landscape scene depicting a green car parking on the left of a blue truck, with a red air balloon and a bird in the sky
[car, truck, air balloon, bird]
Assistant: [(’car’, [0.041, 0.549, 0.412, 0.31]), (’truck’, [0.525, 0.553, 0.408, 0.313]), (’balloon’, [0.129, 0.016, 0.283, 0.264]), (’bird’, [0.578, 0.082, 0.279, 0.195])]
User: A realistic top-down view of a wooden table with two apples on it
[wooden table, apple]
Assistant: [(’wooden table’, [0.039, 0.289, 0.922, 0.422]), (’apple’, [0.293, 0.441, 0.195, 0.195]), (’apple’, [0.547, 0.441, 0.195, 0.195])]
User: A realistic scene of three skiers standing in a line on the snow near a palm tree
[skier, palm tree]
Assistant: [(’skier’, [0.01, 0.297, 0.271, 0.328]), (’skier’, [0.543, 0.375, 0.236, 0.308]), (’skier’, [0.289, 0.338, 0.242, 0.303]), (’palm tree’, [0.789, 0.205, 0.201, 0.49])]
User: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
[steam boat, dolphin]
Assistant: [(’steam boat’, [0.453, 0.439, 0.502, 0.291]), (’dolphin’, [0.041, 0.486, 0.369, 0.24])]
User: A cute cat and an angry dog without birds
[cat, dog]
Assistant: [(’cat’, [0.1, 0.131, 0.529, 0.632]), (’dog’, [0.589, 0.232, 0.412, 0.445])]
User: Two pandas in a forest without flowers
[panda]
Assistant: [(’panda’, [0.059, 0.335, 0.414, 0.441]), (’panda’, [0.516, 0.338, 0.434, 0.432])]
User: Immersed in his imagination, a boy is indoors enacting a prehistoric tale using two toy dinosaurs.
[boy, dino toys]
Assistant: [(’boy’, [0.367, 0.076, 0.301, 0.486]), (’dino toys’, [0.15, 0.469, 0.201, 0.254]), (’dino toys’, [0.543, 0.475, 0.188, 0.248])]
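For quick inspection of a generated layout, the boxes can be drawn on a blank canvas; this is an illustrative visualization sketch, not part of the system's generation pipeline.

from PIL import Image, ImageDraw

def draw_layout(boxes, size=512):
    canvas = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(canvas)
    for name, (x, y, w, h) in boxes:
        x0, y0 = x * size, y * size
        draw.rectangle([x0, y0, x0 + w * size, y0 + h * size], outline="black", width=2)
        draw.text((x0 + 4, y0 + 4), name, fill="black")
    return canvas

draw_layout([("boy", (0.367, 0.076, 0.301, 0.486)),
             ("dino toys", (0.15, 0.469, 0.201, 0.254))]).save("layout.png")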

A.6 Example Outputs from the Technical Pipeline

Figure 9 shows some examples of inputs and outputs for three technical pipelines of CreativeConnect — (1) element extraction pipeline, (2) keyword recommendation pipeline, and (3) recombination generation pipeline. Figure 10 shows more examples from the keyword extraction pipeline.
Figure 9:
Figure 9: Examples of inputs and outputs given to CreativeConnect’s pipelines. From the input image, (a) captions describing a given image and (c) arrangement of an image are acquired. Using captions as input, (b) keywords found in this image are extracted and categorized in subject matter, action & pose, and theme & mood. From the (c) extracted arrangement, a layout variator generates two different recommended layouts similar to the original one, each containing 2 and 3 bounding boxes. The keyword recommendation pipeline is used for (e) recommending keywords relevant to the input keywords. Based on the input keywords, (f) descriptions are generated with image captions and the details of objects. The final (g) generated image is created from the descriptions and converted into a sketch style.
Figure 10:
Figure 10: Examples of the reference image with corresponding human-labeled ground truth labels and predicted keywords from the keyword extraction pipeline of CreativeConnect. Keywords predicted by our system can often be more descriptive and innovative than the ground-truths, which are highlighted in bold.

B User Study

B.1 Baseline System Interface

Figure 11 shows the interface of the baseline system used for the user study. The baseline system looks similar to CreativeConnect. There is no keyword extraction feature in the left panel, but participants can add keyword notes manually. The center contains the same interactive mood board as CreativeConnect, but without the keyword suggestion panel. The right panel lets users manually configure the layout and prompts for image generation instead of selecting keywords to combine. Other features, such as mood board interactions (zoom, add/delete images) and saving favorite sketches, were provided identically to CreativeConnect. Besides this interface, participants could also use ChatGPT for various purposes.
Figure 11:
Figure 11: Screenshot of baseline system. (a) Keyword Note Panel: Users can add keywords manually on each image. (b) Interactive Mood Board: Users can organize the reference images on the mood board, along with the added keyword notes. (c) Sketch Generation Panel: Users can configure the overall layout of the generated image by manipulating boxes in the layout controller (beige) on the top of the panel. Additionally, users can provide prompts for the entire image and specific parts. They can specify more image details by clicking the "Add object" button. Users can click the "Generate sketch" button to get the generated sketches. (d) ChatGPT: Users were also provided with ChatGPT on a separate screen.

B.2 Interview questions

These are the questions used for the semi-structured interview after the two idea generation sessions with baseline and CreativeConnect tools.
(1)
Can you share the idea sketch you think is most creative in each topic, and what was the main source of inspiration for those ideas?
(2)
Comparing the baseline and CreativeConnect, what were the main differences you noticed in the idea generation process?
(3)
In each of the three main stages of idea generation—finding reference elements, exploring ideas, and generating sketches—did you find one tool more helpful than the other, and why?
(4)
Were there any differences in your typical approach to idea generation when using these tools? If so, how was it different from the usual work process?
(5)
Which functionalities were most beneficial in both tools, and in what scenarios were they particularly useful?
(6)
Were there any situations or specific sketches where the tools were especially useful or not useful?
(7)
In terms of image generation methods, what were the main differences between baseline and CreativeConnect, and when did you feel each method was more helpful?
(8)
How did you feel about the output in sketch format, and do you think the tool’s effectiveness would differ if outputs were presented as a completed image rather than a sketch?
(9)
How did you incorporate the generated images into your final idea sketch?

B.3 Additional User Study Results: Raw Usage Log

Figure 12 shows the full usage log for all 16 user study participants, showing the timestamps of 3 types of user actions (adding keyword notes, generating images, and completing a design idea sketch).
Figure 12:
Figure 12: Usage log for all participants in both CreativeConnect (Top) and baseline (Bottom) condition. The red triangle indicates the timestamp when the participants complete each sketch. The pink dot is when the participant added new keywords for the reference image. The sky blue dot is when the participant gave input into the image generation model. As shown in the figure, P15 first conducted multiple keyword-adding and image-generation actions, came up with all the design ideas, and then sketched all of them all at once later in the session. During the interview, P15 explained that they intended to focus exclusively on the sketching process, so they decided to jot down half-baked design ideas as memos in a text and draw them collectively. Unfortunately, the collected usage log only records the point when the overall sketch is completed and does not capture the individual instances of writing each memo. Therefore, we could not analyze which actions affected each design idea, so we excluded the usage data of P15 from the analysis of the relationship between action types and each sketching turn (Figure 6).

Footnotes

3
OpenAI. (2023). ChatGPT (August 3 Version). https://chat.openai.com/

Supplemental Material

MP4 File - Video Preview
MP4 File - Video Presentation (with transcript)
MP4 File - Video Figure: walkthrough of the CreativeConnect system

References

[1]
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.
[2]
Margaret A. Boden. 1998. Creativity and artificial intelligence. Artificial Intelligence 103, 1 (1998), 347–356. https://doi.org/10.1016/S0004-3702(98)00055-1
[3]
Nathalie Bonnardel. 1999. Creativity in design activities: The role of analogies in a constrained cognitive environment. In Proceedings of the 3rd conference on Creativity & cognition. 158–165.
[4]
Nathalie Bonnardel and Evelyne Marmèche. 2005. Towards supporting evocation processes in creative design: A cognitive approach. International Journal of Human-Computer Studies 63, 4 (2005), 422–435. https://doi.org/10.1016/j.ijhcs.2005.04.006 Computer support for creativity.
[5]
Stephen Brade, Bryan Wang, Mauricio Sousa, Sageev Oore, and Tovi Grossman. 2023. Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 96, 14 pages. https://doi.org/10.1145/3586183.3606725
[6]
Tim Brooks, Aleksander Holynski, and Alexei A. Efros. 2023. InstructPix2Pix: Learning to Follow Image Editing Instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18392–18402.
[7]
Donald T Campbell. 1960. Blind variation and selective retentions in creative thought as in other knowledge processes.Psychological review 67, 6 (1960), 380.
[8]
Tracy Cassidy. 2011. The Mood Board Process Modeled and Understood as a Qualitative Design Research Tool. Fashion Practice 3, 2 (2011), 225–251. https://doi.org/10.2752/175693811X13080607764854 arXiv:https://doi.org/10.2752/175693811X13080607764854
[9]
Joel Chan, Steven Dang, and Steven P. Dow. 2016. Comparing Different Sensemaking Approaches for Large-Scale Ideation. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 2717–2728. https://doi.org/10.1145/2858036.2858178
[10]
Minghao Chen, Iro Laina, and Andrea Vedaldi. 2023. Training-Free Layout Control with Cross-Attention Guidance. arxiv:2304.03373 [cs.CV]
[11]
Erin Cherry and Celine Latulipe. 2014. Quantifying the Creativity Support of Digital Tools through the Creativity Support Index. ACM Trans. Comput.-Hum. Interact. 21, 4, Article 21 (jun 2014), 25 pages. https://doi.org/10.1145/2617588
[12]
Lydia B. Chilton, Savvas Petridis, and Maneesh Agrawala. 2019. VisiBlends: A Flexible Workflow for Visual Blends. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3290605.3300402
[13]
Lydia B. Chilton, Ecenaz Jen Ozmen, and Sam Ross. 2019. VisiFit: AI Tools to Iteratively Improve Visual Blends. (2019).
[14]
Orestes Chouchoulas and A.K. Day. 2007. Design Exploration Using A Shape Grammar With A Genetic Algorithm. Open House International 32 (06 2007), 26–35. https://doi.org/10.1108/OHI-02-2007-B0004
[15]
John Joon Young Chung and Eytan Adar. 2023. Artinter: AI-Powered Boundary Objects for Commissioning Visual Arts. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (Pittsburgh, PA, USA) (DIS ’23). Association for Computing Machinery, New York, NY, USA, 1997–2018. https://doi.org/10.1145/3563657.3595961
[16]
John Joon Young Chung and Eytan Adar. 2023. PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (, San Francisco, CA, USA,) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 6, 17 pages. https://doi.org/10.1145/3586183.3606777
[17]
John Joon Young Chung, Shiqing He, and Eytan Adar. 2021. The Intersection of Users, Roles, Interactions, and Technologies in Creativity Support Tools. In Proceedings of the 2021 ACM Designing Interactive Systems Conference (Virtual Event, USA) (DIS ’21). Association for Computing Machinery, New York, NY, USA, 1817–1833. https://doi.org/10.1145/3461778.3462050
[18]
John Joon Young Chung, Shiqing He, and Eytan Adar. 2022. Artist Support Networks: Implications for Future Creativity Support Tools. In Proceedings of the 2022 ACM Designing Interactive Systems Conference (Virtual Event, Australia) (DIS ’22). Association for Computing Machinery, New York, NY, USA, 232–246. https://doi.org/10.1145/3532106.3533505
[19]
Nigel Cross. 1997. Descriptive models of creative design: application to an example. Design Studies 18, 4 (1997), 427–440. https://doi.org/10.1016/S0142-694X(97)00010-0 Descriptive models of design.
[20]
Datasculptor. 2023. Image2LineDrawing. https://huggingface.co/spaces/Datasculptor/Image2LineDrawing Hugging Face Spaces.
[21]
Design with Canva. 2023. How to Use ChatGPT to Design Like a Pro. https://www.youtube.com/watch?v=VmBLuvBf0xE. [Online; accessed 2023-12-04].
[22]
Claudia Eckert and Martin Stacey. 2000. Sources of inspiration: a language of design. Design studies 21, 5 (2000), 523–538.
[23]
Jennifer Fernquist, Tovi Grossman, and George Fitzmaurice. 2011. Sketch-sketch revolution: an engaging tutorial system for guided sketching and application learning. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (Santa Barbara, California, USA) (UIST ’11). Association for Computing Machinery, New York, NY, USA, 373–382. https://doi.org/10.1145/2047196.2047245
[24]
Jonas Frich, Michael Mose Biskjaer, Lindsay MacDonald Vermeulen, Christian Remy, and Peter Dalsgaard. 2019. Strategies in Creative Professionals’ Use of Digital Tools Across Domains. In Proceedings of the 2019 Conference on Creativity and Cognition (San Diego, CA, USA) (C&C ’19). Association for Computing Machinery, New York, NY, USA, 210–221. https://doi.org/10.1145/3325480.3325494
[25]
Jonas Frich, Lindsay MacDonald Vermeulen, Christian Remy, Michael Mose Biskjaer, and Peter Dalsgaard. 2019. Mapping the Landscape of Creativity Support Tools in HCI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–18. https://doi.org/10.1145/3290605.3300619
[26]
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. arxiv:2208.01618 [cs.CV]
[27]
Steve Garner and Deana McDonagh-Philp. 2001. Problem interpretation and resolution via visual stimuli: the use of ‘mood boards’ in design education. Journal of Art & Design Education 20, 1 (2001), 57–64.
[28]
Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. 2016. Image Style Transfer Using Convolutional Neural Networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2414–2423. https://doi.org/10.1109/CVPR.2016.265
[29]
A.K. Goel. 1997. Design, analogy, and creativity. IEEE Expert 12, 3 (1997), 62–70. https://doi.org/10.1109/64.590078
[30]
Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In Human Mental Workload, Peter A. Hancock and Najmedin Meshkati (Eds.). Advances in Psychology, Vol. 52. North-Holland, 139–183. https://doi.org/10.1016/S0166-4115(08)62386-9
[31]
Scarlett R. Herring, Chia-Chen Chang, Jesse Krantzler, and Brian P. Bailey. 2009. Getting Inspired! Understanding How and Why Examples Are Used in Creative Design Practice. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI ’09). Association for Computing Machinery, New York, NY, USA, 87–96. https://doi.org/10.1145/1518701.1518717
[32]
Barbara Hirst. 1992. How artists overcome creative blocks.The Journal of Creative Behavior (1992).
[33]
Josh Holinaty, Alec Jacobson, and Fanny Chevalier. 2021. Supporting Reference Imagery for Digital Drawing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. 2434–2442.
[34]
Keith J Holyoak and Paul Thagard. 1996. Mental leaps: Analogy in creative thought. MIT press.
[35]
Tom Hope, Ronen Tamari, Daniel Hershcovich, Hyeonsu B Kang, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2022. Scaling Creative Inspiration with Fine-Grained Functional Aspects of Ideas. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 12, 15 pages. https://doi.org/10.1145/3491102.3517434
[36]
Alexander Ivanov, David Ledo, Tovi Grossman, George Fitzmaurice, and Fraser Anderson. 2022. MoodCubes: Immersive Spaces for Collecting, Discovering and Envisioning Inspiration Materials. In Proceedings of the 2022 ACM Designing Interactive Systems Conference (Virtual Event, Australia) (DIS ’22). Association for Computing Machinery, New York, NY, USA, 189–203. https://doi.org/10.1145/3532106.3533565
[37]
David G Jansson and Steven M Smith. 1991. Design fixation. Design studies 12, 1 (1991), 3–11.
[38]
Youngseung Jeon, Seungwan Jin, Patrick C. Shih, and Kyungsik Han. 2021. FashionQ: An AI-Driven Creativity Support Tool for Facilitating Ideation in Fashion Design. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 576, 18 pages. https://doi.org/10.1145/3411764.3445093
[39]
Youwen Kang, Zhida Sun, Sitong Wang, Zeyu Huang, Ziming Wu, and Xiaojuan Ma. 2021. MetaMap: Supporting Visual Metaphor Ideation through Multi-Dimensional Example-Based Exploration. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 427, 15 pages. https://doi.org/10.1145/3411764.3445325
[40]
Pegah Karimi, Nicholas Davis, Mary Lou Maher, Kazjon Grace, and Lina Lee. 2019. Relating Cognitive Models of Design Creativity to the Similarity of Sketches Generated by an AI Partner. In Proceedings of the 2019 Conference on Creativity and Cognition (San Diego, CA, USA) (C&C ’19). Association for Computing Machinery, New York, NY, USA, 259–270. https://doi.org/10.1145/3325480.3325488
[41]
Pegah Karimi, Jeba Rezwana, Safat Siddiqui, Mary Lou Maher, and Nasrin Dehbozorgi. 2020. Creative Sketching Partner: An Analysis of Human-AI Co-Creativity. In Proceedings of the 25th International Conference on Intelligent User Interfaces (Cagliari, Italy) (IUI ’20). Association for Computing Machinery, New York, NY, USA, 221–230. https://doi.org/10.1145/3377325.3377522
[42]
Tero Karras, Samuli Laine, and Timo Aila. 2019. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[43]
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and Improving the Image Quality of StyleGAN. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44]
Andruid Kerne. 2000. CollageMachine: An Interactive Agent of Web Recombination. Leonardo 33, 5 (10 2000), 347–350. https://doi.org/10.1162/002409400552801
[45]
Kevin Gonyop Kim, Richard Lee Davis, Alessia Eletta Coppi, Alberto Cattaneo, and Pierre Dillenbourg. 2022. Mixplorer: Scaffolding Design Space Exploration through Genetic Recombination of Multiple Peoples’ Designs to Support Novices’ Creativity. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 308, 13 pages. https://doi.org/10.1145/3491102.3501854
[46]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything. arxiv:2304.02643 [cs.CV]
[47]
Hyung-Kwon Ko, Gwanmo Park, Hyeon Jeon, Jaemin Jo, Juho Kim, and Jinwook Seo. 2023. Large-Scale Text-to-Image Generation Models for Visual Artists’ Creative Works. In Proceedings of the 28th International Conference on Intelligent User Interfaces (Sydney, NSW, Australia) (IUI ’23). Association for Computing Machinery, New York, NY, USA, 919–933. https://doi.org/10.1145/3581641.3584078
[48]
Janin Koch, Andrés Lucero, Lena Hegemann, and Antti Oulasvirta. 2019. May AI? Design Ideation with Cooperative Contextual Bandits. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300863
[49]
Janin Koch, Nicolas Taffin, Andrés Lucero, and Wendy E. Mackay. 2020. SemanticCollage: Enriching Digital Mood Board Design with Semantic Labels. In Proceedings of the 2020 ACM Designing Interactive Systems Conference (Eindhoven, Netherlands) (DIS ’20). Association for Computing Machinery, New York, NY, USA, 407–418. https://doi.org/10.1145/3357236.3395494
[50]
A Koestler. 1964. The Act of Creation: A study of the conscious and unconscious in art.
[51]
Tomas Lawton, Kazjon Grace, and Francisco J Ibarrola. 2023. When is a Tool a Tool? User Perceptions of System Agency in Human–AI Co-Creative Drawing. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (Pittsburgh, PA, USA) (DIS ’23). Association for Computing Machinery, New York, NY, USA, 1978–1996. https://doi.org/10.1145/3563657.3595977
[52]
Brian Lee, Savil Srivastava, Ranjitha Kumar, Ronen Brafman, and Scott R. Klemmer. 2010. Designing with Interactive Example Galleries. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 2257–2266. https://doi.org/10.1145/1753326.1753667
[53]
Seungwoo Lee, Chaerin Kong, Donghyeon Jeon, and Nojun Kwak. 2023. AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion. arxiv:2305.04001 [cs.CV]
[54]
Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arxiv:2301.12597 [cs.CV]
[55]
Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. 2023. GLIGEN: Open-Set Grounded Text-to-Image Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56]
Long Lian, Boyi Li, Adam Yala, and Trevor Darrell. 2023. LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models. arxiv:2305.13655 [cs.CV]
[57]
Vivian Liu, Jo Vermeulen, George Fitzmaurice, and Justin Matejka. 2023. 3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (Pittsburgh, PA, USA) (DIS ’23). Association for Computing Machinery, New York, NY, USA, 1955–1977. https://doi.org/10.1145/3563657.3596098
[58]
Andrés Lucero. 2015. Funky-Design-Spaces: Interactive Environments for Creativity Inspired by Observing Designers Making Mood Boards. In Human-Computer Interaction – INTERACT 2015, Julio Abascal, Simone Barbosa, Mirko Fetter, Tom Gross, Philippe Palanque, and Marco Winckler (Eds.). Springer International Publishing, Cham, 474–492.
[59]
Andrés Lucero, Dzmitry Aliakseyeu, Kees Overbeeke, and Jean-Bernard Martens. 2009. An Interactive Support Tool to Convey the Intended Message in Asynchronous Presentations. In Proceedings of the International Conference on Advances in Computer Entertainment Technology (Athens, Greece) (ACE ’09). Association for Computing Machinery, New York, NY, USA, 11–18. https://doi.org/10.1145/1690388.1690391
[60]
Justin Matejka, Michael Glueck, Erin Bradner, Ali Hashemi, Tovi Grossman, and George Fitzmaurice. 2018. Dream Lens: Exploration and Visualization of Large-Scale Generative Design Datasets. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3173574.3173943
[61]
Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. 2023. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. arxiv:2302.08453 [cs.CV]
[62]
Felix Müller-Wienbergen, Oliver Müller, Stefan Seidel, and Jörg Becker. 2011. Leaving the beaten tracks in creative work–A design theory for systems that support convergent and divergent thinking. Journal of the Association for Information Systems 12, 11 (2011), 2.
[63]
Michael D Mumford, Michele I Mobley, Roni Reiter-Palmon, Charles E Uhlman, and Lesli M Doares. 1991. Process analytic models of creative capacities. Creativity Research Journal 4, 2 (1991), 91–122. https://doi.org/10.1080/10400419209534428
[64]
Yukari Nagai, Toshiharu Taura, and Futoshi Mukai. 2009. Concept blending and dissimilarity: factors for creative concept generation process. Design Studies 30, 6 (2009), 648–675. https://doi.org/10.1016/j.destud.2009.05.004
[65]
Alexander Quinn Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob Mcgrew, Ilya Sutskever, and Mark Chen. 2022. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.). PMLR, 16784–16804. https://proceedings.mlr.press/v162/nichol22a.html
[66]
Changhoon Oh, Jungwoo Song, Jinhan Choi, Seonghyeon Kim, Sungwoo Lee, and Bongwon Suh. 2018. I Lead, You Help but Only with Enough Details: Understanding User Experience of Co-Creation with Artificial Intelligence. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3173574.3174223
[67]
OpenAI. 2020. Language Models are Few-Shot Learners. arxiv:2005.14165 [cs.CL]
[68]
OpenAI. 2023. GPT-4 Technical Report. arxiv:2303.08774 [cs.CL]
[69]
Taylor Palmer, Jordan Bowman, and Tommy Geoco. 2023. 2023 Design Tools Survey - AI. https://uxtools.co/survey/2023/ai
[70]
Marcin L. Pilat and Christian Jacob. 2008. Creature Academy: A system for virtual creature evolution. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence). 3289–3297. https://doi.org/10.1109/CEC.2008.4631243
[71]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748–8763. https://proceedings.mlr.press/v139/radford21a.html
[72]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arxiv:2204.06125 [cs.CV]
[73]
Daniel Ritchie, Ankita Arvind Kejriwal, and Scott R. Klemmer. 2011. D.Tour: Style-Based Exploration of Design Example Galleries. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (Santa Barbara, California, USA) (UIST ’11). Association for Computing Machinery, New York, NY, USA, 165–174. https://doi.org/10.1145/2047196.2047216
[74]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis With Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695.
[75]
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2023. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22500–22510.
[76]
Satori Graphics. 2023. Ai Tools That TOTALLY Innovate A Designer’s Life! https://www.youtube.com/watch?v=eTffPm_e1ko. [Online; accessed 2023-12-04].
[77]
Ben Shneiderman. 2000. Creating Creativity: User Interfaces for Supporting Innovation. ACM Trans. Comput.-Hum. Interact. 7, 1 (mar 2000), 114–138. https://doi.org/10.1145/344949.345077
[78]
Pao Siangliulue, Joel Chan, Krzysztof Z. Gajos, and Steven P. Dow. 2015. Providing Timely Examples Improves the Quantity and Quality of Generated Ideas. In Proceedings of the 2015 ACM SIGCHI Conference on Creativity and Cognition (Glasgow, United Kingdom) (C&C ’15). Association for Computing Machinery, New York, NY, USA, 83–92. https://doi.org/10.1145/2757226.2757230
[79]
Dean Keith Simonton. 2003. Scientific creativity as constrained stochastic behavior: the integration of product, person, and process perspectives. Psychological Bulletin 129, 4 (2003), 475.
[80]
Evgeny Stemasov, David Ledo, George Fitzmaurice, and Fraser Anderson. 2023. Immersive Sampling: Exploring Sampling for Future Creative Practices in Media-Rich, Immersive Spaces. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (Pittsburgh, PA, USA) (DIS ’23). Association for Computing Machinery, New York, NY, USA, 212–229. https://doi.org/10.1145/3563657.3596131
[81]
Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, and Tae-Hyun Oh. 2023. Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6430–6440.
[82]
Michela Turrin, Peter von Buelow, and Rudi Stouffs. 2011. Design explorations of performance driven geometry in architectural design using parametric modeling and genetic algorithms. Advanced Engineering Informatics 25, 4 (2011), 656–675. https://doi.org/10.1016/j.aei.2011.07.009 Special Section: Advances and Challenges in Computing in Civil and Building Engineering.
[83]
Maxim Kuznetsov and Vladimir Vorobev. 2023. A paraphrasing model based on ChatGPT paraphrases.
[84]
Sitong Wang, Savvas Petridis, Taeahn Kwon, Xiaojuan Ma, and Lydia B Chilton. 2023. PopBlends: Strategies for Conceptual Blending with Large Language Models. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 435, 19 pages. https://doi.org/10.1145/3544548.3580948
[85]
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). Vol. 33. Curran Associates, Inc., 5776–5788. https://proceedings.neurips.cc/paper_files/paper/2020/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[86]
Tongshuang Wu, Michael Terry, and Carrie Jun Cai. 2022. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 385, 22 pages. https://doi.org/10.1145/3491102.3517582
[87]
Lixiu Yu and Jeffrey V. Nickerson. 2011. Cooks or Cobblers? Crowd Creativity through Combination. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). Association for Computing Machinery, New York, NY, USA, 1393–1402. https://doi.org/10.1145/1978942.1979147
[88]
Loutfouz Zaman, Wolfgang Stuerzlinger, Christian Neugebauer, Rob Woodbury, Maher Elkhaldi, Naghmi Shireen, and Michael Terry. 2015. GEM-NI: A System for Creating and Managing Alternatives In Generative Design. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (Seoul, Republic of Korea) (CHI ’15). Association for Computing Machinery, New York, NY, USA, 1201–1210. https://doi.org/10.1145/2702123.2702398
[89]
Enhao Zhang and Nikola Banovic. 2021. Method for Exploring Generative Adversarial Networks (GANs) via Automatically Generated Image Galleries. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 76, 15 pages. https://doi.org/10.1145/3411764.3445714
[90]
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. 2023. Adding Conditional Control to Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 3836–3847.
[91]
Nanxuan Zhao, Nam Wook Kim, Laura Mariah Herman, Hanspeter Pfister, Rynson W.H. Lau, Jose Echevarria, and Zoya Bylinskii. 2020. ICONATE: Automatic Compound Icon Generation and Ideation. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376618
[92]
Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, and Xi Li. 2023. LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22490–22499.
[93]
Xinyi Zhu. 2023. Research Guides: Machines and Society: ChatGPT for Visual Design. https://guides.nyu.edu/data/chatgpt-visual-design
