1 Introduction
Managing an ever-increasing accumulation of knowledge has long been a challenge for scholars [
8]. With the recent proliferation of published materials, researchers face an even bigger challenge of keeping up with the literature [
25,
42,
55]. Fundamentally, scholars need to both discover new and relevant papers and contextualize them to their own research interests. One increasingly popular practice is to leverage recommender systems that help researchers retrieve potentially relevant papers, such as arXivist
1, arXiv Sanity
2 or Google Scholar
3. Typically, these systems allow users to create “paper alerts” by providing a short description of a specific research topic and a set of seed papers as examples of papers of interest. For example, Semantic Scholar
4 allows users to save sets of collected papers under named topical folders. Users would then receive periodic paper alerts that contain a list of recently published papers similar to the collected papers. These alerts help researchers to quickly narrow down from all recent publications to a small set of potentially relevant papers, allowing them to more easily stay up to date on research topics that are of interest to them.
However, when receiving a set of potentially relevant papers in an alert, researchers still need to more deeply inspect each paper to understand its relevance. This typically involves making meaningful connections between newly encountered information and their existing knowledge of the literature—a process that can incur high cognitive costs for researchers. To illustrate this process with an example, a researcher working on
Research Support Systems may encounter a relevant new paper titled “SPECTER: Document-level Representation Learning using Citation-informed Transformers” [
15]. However, it can be challenging for the researcher to realize its relevance since the paper title only contains information about how it uses Transformers to generate document embeddings. Only when the researcher carefully examines the full abstract can they learn that the paper further described how the pre-trained model can be “easily applied to develop downstream applications,” and that the paper implemented a “research paper recommender system” for its evaluation. Further, if the researcher spent additional effort to examine the content of the recommended paper, they might discover that the related work section described how the proposed method was built on SciBERT [
6]—a familiar paper they had recently saved and read—but extends its capability from embedding sentences to long documents. This manual process is effortful but important because failure to identify meaningful connections between new and existing knowledge will lead to overlooking new papers relevant to a researcher’s interests.
Further, existing paper recommender systems typically only show a list of titles and abstract summaries for the recommended papers with little information on
how they were relevant to the topic of the folder or the set of seed papers that the recommendations were based on. Findings from our formative study (Section
3) suggest that the title of a paper often lacks enough detail to help researchers understand the paper’s relevance, yet the abstract is often too long to skim through in paper alerts. Prior work has explored an approach that generates one-sentence TL;DR (too long; didn’t read) summaries [
10] that are easier to consume but they lack contextualization to the folder topic and the collected papers in them. Specifically, since the summaries are not tailored to a user’s folder context, parts of the abstracts that showed how the recommended papers were relevant to the folder can sometimes be omitted. While there has been research exploring ways to better contextualize paper recommendations by surfacing personalized social signals (e.g., based on a user’s prior interaction or publication history [
28]), they do not describe how the content of the papers is relevant to the users. As a result, users are left to decide whether to carefully examine the recommended papers to find potential connections, with no guarantee that the effort will pay off. As shown in the above example, this sensemaking process can be effortful for users, and failure to effectively triage paper recommendations could result in overlooking important paper recommendations and reducing the effectiveness of such systems.
In this work, we investigate what types of information in paper alerts help scholars deeply understand the relevance of recommended papers to their topical folders. In a formative study with seven researchers, we investigated challenges in identifying relevance from existing paper alerts and desired alternative descriptions of the recommended papers. Our formative findings suggest that scholars desire paper alerts that are contextualized to their folders rather than alerts that only show titles, abstracts, and uncontextualized summaries. Participants found that descriptions revealing connections between a recommended paper and the user’s folder context helped them understand the paper more effectively, since such descriptions spotlight which of its many aspects to focus on. We also found that presenting descriptions that compare and contrast multiple papers allowed participants to understand how the recommended papers build on prior work they had already collected in their folders. Anchoring unfamiliar papers to familiar collected papers also reduced the cognitive load of processing new information.
Based on our formative findings, we propose
PaperWeaver, a new paper alert system that can enrich existing paper recommender systems by generating descriptions of how each recommended paper relates to a user’s interests and their collection of papers. Our system is built on an existing document recommender system.
PaperWeaver leverages recent advancements in Large Language Models (LLMs) for text generation to support users in several ways. First,
PaperWeaver generates a compact topic description of a set of user-collected papers. This compact topic description provides users with a quick summary of the collected papers. It is user-editable and useful for generating future descriptions for each recommended paper contextualized to a user’s interest. Second, to help users understand how a recommended paper is relevant to their research context,
PaperWeaver generates two types of complementary contextual descriptions:
contextualized aspect-based summaries and
paper-paper descriptions (Fig.
1).
Contextualized aspect-based summaries leverage the generated topic descriptions to extract statements on the problems, methods, and findings of papers that are highly relevant to the topic of interest.
Paper-paper descriptions summarize how a recommended paper relates to collected papers, which are more familiar to the users. If the recommended paper cites collected papers,
PaperWeaver summarizes these citation descriptions and, if not, it synthesizes relationships by comparing and contrasting aspects from the papers. Motivated by how multiple alternative descriptions can improve understanding of complex scientific topics [
1,
2], we design an interactive paper alert interface where users can explore multiple descriptions for recommended papers (Fig.
2).
To evaluate PaperWeaver, we conducted a within-subjects study (N = 15) with researchers who were interested in receiving paper recommendation alerts. To ensure participants were motivated in the study, the paper alerts used in the study were generated based on their actual set of collected papers. We compared PaperWeaver with a strong baseline similar to existing paper alert systems but additionally enriched with uncontextualized summaries and extracted related work sections from the recommended papers. Our user study results showed that participants were able to better understand the nuanced relevance of the recommended papers and were able to triage them more confidently with PaperWeaver. Further, participants were able to capture richer relationships between recommended and collected papers in their notes when using PaperWeaver compared to the baseline.
The contributions of this work are as follows:
•
Qualitative findings from a formative study employing design probes with researchers that identified user challenges around making sense of recommended papers and the need for contextualized summaries.
•
PaperWeaver, a tool that provides additional contextualized descriptions for a set of recommended papers, tailored to the user-collected papers. PaperWeaver uses an LLM-based pipeline that synthesizes content from both recommended and user-collected papers.
•
Findings from a user study (N = 15) that demonstrated how using PaperWeaver facilitates sense-making of paper recommendations and aids in uncovering useful relationships between recommended and collected papers.
4 PaperWeaver
Based on our design goals, we present
PaperWeaver (Fig.
2), a paper alert system that contextualizes recommended papers’ descriptions based on a user’s topical folder information.
PaperWeaver enables users to explore these descriptions while reading paper alerts. With lessons learned from the formative study, our computational method leverages a combination of LLM-generated text and related work sections extracted from the recommended papers.
PaperWeaver extracts folder-specific aspects from a recommended paper to surface how it is relevant to the user’s context (DG1). The system describes how the recommended paper is similar to and different from a topical folder paper by identifying the relationships between papers (DG2). To remind users about papers they have previously collected in the folder,
PaperWeaver includes information about both the recommended and the collected papers in the descriptions (DG3). Given a paper recommendation and a set of collected papers in a named topical folder, the system provides “
paper-paper descriptions” that compare and contrast the recommended paper anchored to a relevant collected paper. To generate this type of description, the system uses an abstract paired with a citance as input when available. For recommended papers that do not cite any of the collected papers,
PaperWeaver uses an LLM to identify latent relationships from their abstracts. Finally,
PaperWeaver also generates “
contextualized aspect-based paper descriptions” that summarize a recommended paper’s abstract in a way that reflects its relevance to the topic of the given folder.
4.1 Example User Scenario
Imagine a computer science researcher who started working on the topic of Reading-supporting interfaces a few weeks ago. She has collected a list of papers related to this topic in a folder. Her familiarity with each paper in the folder varies. She has read some of the papers during her prior project. For some other papers, she only saved them because she saw relevant keywords in the title of the paper but has not had the time to read them yet. Because this is her active project, she is looking out for new papers relevant to the topic. She signed up for a paper alert service that regularly provides her with a list of papers based on papers that she has collected. However, she felt that the information provided was lacking. She often had to wade into a paper’s abstract and its full text to see whether the paper was relevant to her research interests. This activity requires more time and attention than she usually has when reading paper alerts.
Looking for a more effective way to process paper alerts, she gives PaperWeaver a try. She starts using the system by creating a folder titled Reading-supporting interfaces and adding the list of papers she has collected to the folder. PaperWeaver automatically expands on the folder title to provide an overview summary of the source papers she collected. Although she is mostly satisfied with the automated summary, she removes the keyword medical literature in the description and adds another keyword, Large Language Models, to better reflect her research interests. She also notices the keyword multitouch technology, which reveals an aspect she has not thought of before. PaperWeaver saves the updated description and uses it throughout the rest of the process.
After setting up the folder, the researcher receives a paper alert email when there is a new set of recommended papers. By clicking on the link in the email, she is directed to an interactive paper alert interface (Fig.
2). She bookmarks this link for future reference. She quickly goes through the recommended papers and immediately saves papers with obvious relevance (e.g., “The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interface”[
36]) based on the provided title and metadata (authors and TL;DR). For a paper with non-obvious connections, she explores the contextualized descriptions of the paper. She first opens the
Problem, method, and findings tab to see a summary of the paper by the folder context broken down into problem, method, and findings. For example, for the recommended paper “Synergi: A Mixed-Initiative System for Scholarly Synthesis and Sensemaking” [
29], she can see that
the PaperWeaver-provided method of the paper is relevant to her interest in LLMs ("Method: Developed Synergi, a mixed-initiative workflow tying user input, citation graphs, and LLMs") and that the paper’s problem is "Problem: synthesizing research threads from multiple papers is challenging". Seeing such connections with
Reading-supporting interfaces, she saves the paper to her library folder. Feeling curious about the Synergi paper, she decides to explore the paper-paper relationship descriptions under the
Related to Paper tab. She chooses to compare the recommended paper with the paper “CiteRead: Integrating Localized Citation Contexts into Scientific Paper Reading” [
48] that she is already familiar with. She learns that the system in the Synergi paper requires more active engagement from users compared to the system in the CiteRead paper. Now, she notes interactivity as one dimension of the reading interfaces design space. She also sees that there is a description for the Synergi paper and a paper she has collected in passing. By reading the description, she now recalls that “Scim: Intelligent Skimming Support for Scientific Papers” [
17] supports paper reading by automatically highlighting important content in a paper and, unlike Synergi, there is no personalization component in the system. Compared to other traditional paper alert systems that only show a list of titles and, at most, abstracts,
PaperWeaver helps the researcher figure out how each of the recommended papers connects to her topic of interest, as well as their nuanced relationships to papers she had already collected.
4.2 Methods for Generating Contextualized Descriptions
To generate contextualized descriptions like those presented in the scenario, we developed an LLM-based pipeline (Fig.
3) that processes the user’s folder information and the recommended paper’s content to generate descriptions for each recommended paper. LLMs can make meaningful improvements in comprehension and summarization, particularly for long, complex documents that demand a high degree of accuracy [
3]. This capability enables
PaperWeaver to identify aspects of papers that are relevant to the given folder context and to synthesize descriptions from multiple aspects.
PaperWeaver generates three types of descriptions: (1)
contextualized aspect-based paper summaries, (2)
paper-paper descriptions based on citances, and (3)
paper-paper descriptions via generated pseudo-citances. We describe how
PaperWeaver generates a compact summary of collected papers in a topical folder (§
4.2.1) and uses the folder summary to generate contextualized aspect-based summaries (§
4.2.2), contextualized paper-paper relationships when recommended papers cite one or more collected papers (§
4.2.3), and when they do not (§
4.2.4). Full prompts are in Appendix
A.
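To make the overall pipeline concrete, the following is a minimal Python sketch of the data a folder carries and of how the description types are dispatched for one recommended paper. The class and function names (Paper, Folder, descriptions_for) are illustrative assumptions, not the released PaperWeaver code; the actual prompts sent at each step are listed in Appendix A.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    # Illustrative container; PaperWeaver pulls these fields from Semantic Scholar (Sec. 4.4).
    paper_id: str
    title: str
    abstract: str

@dataclass
class Folder:
    title: str
    description: str                                      # editable topic description (Sec. 4.2.1)
    papers: list[Paper] = field(default_factory=list)     # user-collected papers

def descriptions_for(folder: Folder, rec: Paper, cites_collected: bool) -> list[str]:
    """Which description types apply to one recommended paper (Sec. 4.2.2-4.2.4)."""
    kinds = ["contextualized aspect-based summary"]                      # always generated (T2)
    if cites_collected:
        kinds.append("paper-paper description from a citance")          # T3
    else:
        kinds.append("paper-paper description from a pseudo-citance")   # T4-T6
    return kinds
```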
4.2.1 Suggested Topic Description.
To ensure the three types of descriptions (detailed in the following subsections) are relevant to the topic of the folder,
PaperWeaver allows users to provide a compact description for their folders in addition to the folder name. This topic description is used in all subsequent LLM prompts to convey the user’s interests. To lower the effort of writing folder descriptions,
PaperWeaver uses an LLM to generate a default description based on collected papers already saved in the folder. We adapted our prompt design from the prior
LAMP approach [
50], which creates a user profile prompt with the information from the user’s own paper. Our prompt takes a task instruction and a list of titles of papers included in the folder as inputs (T1 in Appendix
A.1). Then, our prompt instructs an LLM to generate a description including the folder title, shared goals of collected papers, and a list of topical keywords (Fig.
2 A; folder title and description). This default folder description is then shown to the user, who can further edit it to reflect interests beyond the papers they have collected.
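As an illustration, a minimal sketch of this step is shown below. It assumes the OpenAI Python client (version 1.x) and paraphrases the T1 prompt from Appendix A.1; the helper name suggest_folder_description is ours, not part of the released system.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_T1 = ("You are an intelligent and precise assistant that can understand "
             "the contents of research papers.")

def suggest_folder_description(folder_title: str, paper_titles: list[str]) -> str:
    """Generate a default topic description from the titles of collected papers (cf. T1)."""
    titles = "\n".join(f"- {t}" for t in paper_titles)
    user = (f"This is my scholarly library, titled {folder_title}. "
            "The following papers are included. Write a two-line description of the "
            "high-level characteristics these works share, starting with 'It encompasses'.\n"
            f"[Library papers]\n{titles}")
    resp = client.chat.completions.create(
        model="gpt-4-0613",  # model used in the paper (Sec. 4.4)
        messages=[{"role": "system", "content": SYSTEM_T1},
                  {"role": "user", "content": user}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```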
4.2.2 Contextualized Aspect-based Paper Summaries.
Among the various aspects of the recommended paper, we extract the aspects (
i.e., rhetorical structure elements indicating problem, method, findings [
40]) that the user who has curated this library folder might find relevant. The problems, methods, and findings are typically the main pillars of most papers [
11]. At the same time, these aspects can describe research in a specific and comprehensive way. To extract a set of problems, methods, and findings in the context of the library folder, our method takes a title and an abstract of the recommended paper and the folder description that represents the user’s research interest as inputs (T2 in Appendix
A.2). We guide an LLM to identify as many relevant problems in the recommended paper as possible, then to describe the specific methods the paper applies to each problem, and finally to elaborate on the specific findings obtained by applying each method. Our method uses these aspects not only as the summary of a single recommended paper; it also builds on the extracted aspects to align two papers and create pseudo-citances as described in §
4.2.4.
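A minimal sketch of this aspect-extraction step is shown below, again assuming the OpenAI Python client and paraphrasing the T2 prompt from Appendix A.2; the function and variable names are illustrative.

```python
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_T2 = ("You are an intelligent and precise assistant that can understand the contents "
             "of research papers and interpret them from the user's perspective.")

def extract_aspects(folder_title: str, folder_desc: str, title: str, abstract: str) -> list[dict]:
    """Return {"Problem", "Method", "Findings"} dicts contextualized to the folder (cf. T2)."""
    user = (
        "Extract the problems, methods, and findings from the paper that I might be "
        f"interested in, given my research interest.\n[My Research Interest]\n{folder_desc}\n"
        f"[Given Paper]\nTitle: {title}\nAbstract: {abstract}\n"
        f"Identify as many aspects as possible relevant to the topic of {folder_title}. "
        'Return a JSON list of objects: [{"Problem": "...", "Method": "...", "Findings": "..."}]. '
        'Use "N/A" when the paper has no specific method for a problem.'
    )
    resp = client.chat.completions.create(
        model="gpt-4-0613",
        messages=[{"role": "system", "content": SYSTEM_T2},
                  {"role": "user", "content": user}],
        temperature=0,
    )
    # Naive parse for illustration; production code would need to handle malformed JSON.
    return json.loads(resp.choices[0].message.content)
```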
4.2.3 Paper-paper Descriptions Based on Citances.
For this description type, we exploit citations between recommended papers and collected papers. Citation sentences (
i.e., citances) are widely used as proxies of the relationship between the citing and the cited papers [
38]. Among the intents of citances (
i.e., background, method, or results), citances with the background intent give more context about a problem, concept, approach, topic, or importance of the problem in the field [
14]. Citances with the background intent often contain information about how a citing paper presents a new approach and how it compares to a cited paper. Since our formative study revealed that participants regarded the “build on” relationship as important, we prioritize background citances, as classified by [
14] when selecting a citance to generate a description.
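For illustration, a small sketch of this selection step follows. It assumes citances linking the recommended paper to collected papers have already been fetched (e.g., via the Semantic Scholar API) as records carrying an intent label and the citation sentence; the record shape and helper name are assumptions for this example.

```python
from typing import Optional

def select_citance(citances: list[dict]) -> Optional[dict]:
    """Pick one citance linking the recommended paper to a collected paper,
    preferring the 'background' intent."""
    # Each record is assumed to look like:
    # {"cited_paper_id": "...", "intent": "background" | "method" | "result", "text": "..."}
    background = [c for c in citances if c.get("intent") == "background"]
    candidates = background or citances   # fall back to any citance if no background ones exist
    return candidates[0] if candidates else None
```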
A citance by itself is hard for users to understand without any context. Once a citance is selected in a recommended paper, we extract the paragraph in which the citance appears (i.e., the citing paragraph) to obtain additional context around it. Rather than showing the citing paragraph, which contains only partial content of the recommended paper, we employ an LLM to generate a compact but detailed description of both the recommended paper and its relationship to the cited collected paper as mentioned in the citing paragraph. Additionally, to support our DG3 of helping users learn new aspects of collected papers, we add a short summary of the cited collected paper. To obtain this description, the inputs of the prompt include the titles and abstracts of both the recommended and the cited paper, as well as the citing paragraph (T3 in Appendix
A.3).
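A sketch of the generation call for this description type is shown below. It paraphrases the T3 prompt from Appendix A.3, assumes the OpenAI Python client, and uses illustrative function and variable names.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_T3 = ("You are an intelligent and precise assistant that can understand research "
             "papers and identify similarities and differences between them.")

def citance_description(rec_title: str, rec_abstract: str,
                        cited_title: str, cited_abstract: str,
                        citing_paragraph: str) -> str:
    """Describe the recommended paper (A) relative to the cited collected paper (B),
    grounded in the paragraph of A that cites B (cf. T3)."""
    user = (
        "Compare two research papers for a researcher.\n"
        f"[Paper A]\nTitle: {rec_title}\nAbstract: {rec_abstract}\n"
        f"[Paper B]\nTitle: {cited_title}\nAbstract: {cited_abstract}\n"
        f"[Paragraph of Paper A citing Paper B]\n{citing_paragraph}\n"
        "Explain Paper A by comparing it to Paper B in four sentences: what is similar, "
        "a one-sentence summary of Paper A, a one-sentence summary of Paper B, and a "
        "comparison and contrast of the two."
    )
    resp = client.chat.completions.create(
        model="gpt-4-0613",
        messages=[{"role": "system", "content": SYSTEM_T3},
                  {"role": "user", "content": user}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()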
4.2.4 Paper-paper Descriptions via Generated Pseudo-citances.
The method described in the above section (§
4.2.3) does not apply to all recommended papers. Authors typically do not comprehensively cite all relevant papers, so a recommended paper may not cite any of the collected papers even when it is relevant to some of them. We build on prior work that used the problem-method-findings schema to show similarities and differences between papers in a search engine setting. Our method generates structured summaries using a similar approach to describe paper recommendations anchored to previously collected papers. In our design, this type of description features two core relational primitives between papers: 1)
comparisons, which provide concise descriptions of the most salient similarities along either the problem or the method aspect and 2)
contrasts, which surface differences along a different aspect (
e.g., two papers that tackled the same research problem but used different methods). This structure has been shown to facilitate scholarly sensemaking and inspiration (§
2.2).
Rather than concatenating each paper’s similar or different aspects with extractive techniques, our method aims to provide well-aligned comparisons and contrasts between two papers. This method is motivated by the formative study, where participants mentioned that simple concatenation was not helpful in making sense of the relationships between two papers. To create this description, our method follows this process: (1) find relevant collected papers, (2) identify shared aspects for each pair of the recommended paper and a collected paper to find similarities and differences between them, (3) verify that the shared aspect is aligned with both papers, and (4) generate a structured summary.
Here, we explain our approach in more detail, using the case where the recommended paper and the collected paper share similar “problems”. Our approach first selects the top-5 most similar collected papers to a recommended paper based on the abstract similarity using Flag embeddings [
56], a state-of-the-art text embedding model. Then, with the aim of finding similarities and differences between papers, we use the method described in §
4.2.2 to extract multiple problem-method aspect pairs from each paper’s abstract. By providing the titles and the aspects of all five relevant collected papers and the recommended paper as inputs, we instruct an LLM to (1) identify the papers in the list whose problems are most similar to the problems in the recommended paper, (2) list all of the identified pairs (one problem from the given paper and the other from a collected paper), and (3) describe one shared problem that could encompass the two identified problems. To confirm whether the shared problem encompasses the problems of both the recommended paper and the chosen collected paper, our approach prompts an LLM to verify whether the shared problem is addressed in each paper, given each paper’s title, abstract, and the generated shared problem. Finally, by inputting the respective contrasting methods employed in the recommended and the collected paper, along with the generated shared problem and the abstracts of both papers, the LLM generates the structured summary. This summary includes comparing and contrasting sentences with short summaries of the two papers. While this process describes how to generate a description for two papers with similar problems, the same process applies to two papers with similar methods.
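The sketch below illustrates the embedding-based candidate selection and the subsequent LLM steps. It assumes the FlagEmbedding package's FlagModel interface for BGE ("Flag") embeddings and the OpenAI client for GPT-4 calls; the checkpoint name, prompt wording, and data shapes (dicts with title, abstract, and the aspects extracted in §4.2.2) are illustrative paraphrases of T4-T6 (Appendix A.4-A.6), not the released implementation.

```python
import json
import numpy as np
from FlagEmbedding import FlagModel
from openai import OpenAI

client = OpenAI()
embedder = FlagModel("BAAI/bge-large-en-v1.5")  # "Flag" (BGE) embeddings; checkpoint name assumed

SYSTEM = ("You are an intelligent and precise assistant that can identify similarities "
          "and differences between research papers.")

def chat(user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-0613", temperature=0,
        messages=[{"role": "system", "content": SYSTEM}, {"role": "user", "content": user}])
    return resp.choices[0].message.content.strip()

def top_k_similar(rec: dict, collected: list[dict], k: int = 5) -> list[dict]:
    """Step (1): rank collected papers by cosine similarity of abstract embeddings."""
    vecs = embedder.encode([rec["abstract"]] + [p["abstract"] for p in collected])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs[1:] @ vecs[0]
    return [collected[i] for i in np.argsort(-sims)[:k]]

def pseudo_citance_description(rec: dict, collected: list[dict]) -> str:
    """Steps (2)-(4): align problem aspects, verify the alignment, and summarize (cf. T4-T6)."""
    candidates = top_k_similar(rec, collected)
    # (2) Alignment (T4): find pairs of similar problems and one shared problem per pair.
    prompt = (
        f"Given paper: {rec['title']}; problems: {rec['aspects']}\n"
        + "\n".join(f"Candidate: {p['title']}; problems: {p['aspects']}" for p in candidates)
        + '\nReturn a JSON list: [{"chosen_paper": ..., "shared_problem": ...}], or "N/A".'
    )
    pairs = json.loads(chat(prompt))  # naive parse; real code must handle malformed JSON / "N/A"
    pair = pairs[0]
    partner = next(p for p in candidates if p["title"] == pair["chosen_paper"])
    # (3) Verification (T5): the shared problem must be addressed in BOTH abstracts.
    for paper in (rec, partner):
        verdict = chat(f"Title: {paper['title']}\nAbstract: {paper['abstract']}\n"
                       f"Did this paper tackle the problem: {pair['shared_problem']}? "
                       "Answer True or False.")
        if not verdict.startswith("True"):
            return ""  # drop misaligned pairs rather than show a shaky description
    # (4) Structured summary (T6): four sentences comparing and contrasting the two papers.
    return chat(f"Shared problem: {pair['shared_problem']}\n"
                f"Paper A: {rec['title']} -- {rec['abstract']}\n"
                f"Paper B: {partner['title']} -- {partner['abstract']}\n"
                "In four sentences: state the similarity, summarize Paper A, summarize "
                "Paper B, then compare and contrast them.")
```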
4.3 Paper Alert Interface
For each folder,
PaperWeaver takes the folder title, the folder description, and a list of papers in the folder as inputs. We use the publicly available Semantic Scholar Paper Recommendation API
8 to retrieve a list of recommended papers for a given set of folder papers (a retrieval sketch appears at the end of this subsection). For each folder, a dedicated web page allows users to view recommendations displayed on detailed cards, featuring information about the paper (title, authors, venue, and year) and a machine-generated TL;DR summary [
10]. The title of a paper links to the corresponding paper details page on Semantic Scholar, where users can access the PDF file of the paper (if available) together with other information such as the paper’s citations and references. Users can explore different descriptions displayed in three tabs:
•
Related to Paper (Fig.
2 B) shows paper-paper descriptions that are focused on showing relationships between the recommended paper and a specific paper previously saved in the folder. The descriptions shown in this tab are
paper-paper descriptions based on citances and
paper-paper descriptions via generated pseudo-citances. The two papers mentioned in the text are highlighted in different colors to indicate which paper the description refers to. Users can use the dropdown control to select descriptions for the relationships between the recommended paper and a specific paper.
•
Problem, method, and findings (Fig.
2 C) shows the
contextualized aspect-based paper summary for the recommended paper broken down into three aspects.
•
Abstract (Fig.
2 D) shows the original full abstract of the paper.
Through the paper alert interface, users can save a recommended paper (Fig.
2 E) to their folder and write a note (Fig.
2 F) about the recommended paper for future reference.
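As context for how recommendations reach this interface, a minimal retrieval sketch is shown below. It uses the public Semantic Scholar Recommendations API; the endpoint path, field names, and response keys follow the public documentation at the time of writing and may differ, and API-key handling and error recovery are omitted.

```python
import requests

RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers"

def recommend_for_folder(collected_paper_ids: list[str], limit: int = 20) -> list[dict]:
    """Fetch papers similar to the folder's collected papers via the public
    Semantic Scholar Recommendations API."""
    resp = requests.post(
        RECS_URL,
        params={"fields": "title,abstract,authors,venue,year", "limit": limit},
        json={"positivePaperIds": collected_paper_ids},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("recommendedPapers", [])
```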
4.4 Implementation Details
The PaperWeaver interface is a standard web application. The back-end is implemented in Python, using Flask for the HTTP server and PostgreSQL for the database. The front-end is written in TypeScript using the React framework.
PaperWeaver retrieves the paper’s metadata (title, authors, abstract, etc.), TL;DR and citances from the public Semantic Scholar API
9. We obtain the extracted plain text for papers using S2ORC, an open-source PDF-to-text extraction pipeline and a corpus of 81.1M processed academic papers across multiple disciplines [
37]. We use GPT-4 (gpt-4-0613) through the OpenAI API
10 for all text generation with an LLM.
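To make these dependencies concrete, a minimal sketch is shown below of how the back-end could fetch a paper's metadata and TL;DR from the Semantic Scholar Graph API and wrap the GPT-4 call used throughout the pipeline. Endpoint and field names follow the public documentation at the time of writing; rate limiting, API keys, and error handling are omitted, and the helper names are ours.

```python
import requests
from openai import OpenAI

S2_GRAPH = "https://api.semanticscholar.org/graph/v1/paper"
client = OpenAI()  # OPENAI_API_KEY read from the environment

def fetch_paper(paper_id: str) -> dict:
    """Metadata and machine-generated TL;DR for one paper from the Semantic Scholar Graph API."""
    fields = "title,abstract,authors,venue,year,tldr"
    resp = requests.get(f"{S2_GRAPH}/{paper_id}", params={"fields": fields}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def generate(system_prompt: str, user_prompt: str) -> str:
    """Thin wrapper around the chat completion call used for all text generation (gpt-4-0613)."""
    resp = client.chat.completions.create(
        model="gpt-4-0613", temperature=0,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}])
    return resp.choices[0].message.content.strip()
```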
6 Findings
Our results showed that PaperWeaver helped participants understand the relevance of recommended papers and make more effective relevance judgments. Additionally, PaperWeaver promoted the discovery of more connections between papers that were relevant to the topic of the folder, and led to writing more detailed notes that contained rich connections between papers.
6.1 General Behavioral Differences
During the study, we asked participants to think aloud and observed how they interacted with the two systems. In both conditions, participants typically did a quick pass over the recommended paper list to filter out a few recommended papers that seemed obviously irrelevant. When interacting with the baseline system, participants relied on the titles and the TL;DR summaries for this quick triaging process, while when interacting with PaperWeaver they additionally considered the contextualized aspect-based descriptions. P2 mentioned that even though the two summaries were similar in length, TL;DR summaries typically focused on research problems that were not always relevant to their folder’s topic. In contrast, the contextualized aspect-based descriptions provided by PaperWeaver often surfaced parts of the abstracts that were relevant. At the same time, participants in both conditions said that after this first pass they still needed to examine most of the papers more carefully to understand how they were relevant and to confidently judge which ones to save.
When interacting with the baseline system, participants continued to read the full abstract or open the papers to see their figures but said this process was effortful. For example, P9 said that “I tried to understand [the recommended] papers but reading all of the abstracts is overwhelming.” Further, even when they tried to carefully examine the recommended papers in the baseline condition, they often failed to identify the connections. For example, P1 mentioned that “I cannot understand why this paper is recommended to me. It seems they are talking about just their own topic” and P15 said that “Abstract alone cannot answer ‘how does this paper relevant to my research context?’”.
In contrast, when interacting with PaperWeaver, participants relied on paper-paper descriptions after this first pass. In this case, many participants mentioned that they could better understand “how this paper is different from the paper that they’ve already known” and “what new contributions are there [in the recommended papers]” that were relevant to the collected paper. Interestingly, after reading the paper-paper description, participants often still continued to examine the abstracts. Participants mentioned that the goal was both to verify the LLM-generated paper-paper description against the original sources and to “gain deeper context” around the paper-paper description once they became interested in a recommended paper (more details around this behavior in §6.4).
6.2 Exploring Papers Broadly by Understanding Relevance
Based on the post-survey, participants felt that they could understand how the recommended papers were relevant to their own research interest significantly better with PaperWeaver (M = 6.07, SD = 0.68) than with the baseline (M = 3.80, SD = 1.64, p = 0.0013, NP). We also found evidence that PaperWeaver supported decision-making. Specifically, participants felt PaperWeaver helped them decide which papers were worth saving (PaperWeaver: M = 5.27, SD = 1.06, baseline: M = 4.00, SD = 1.55, p = 0.0024, P) and were significantly more confident in their decisions (PaperWeaver: M = 6.07, SD = 0.85, baseline: M = 5.33, SD = 1.07, p = 0.0124, NP). Qualitative insights revealed that these perceptions were due to how PaperWeaver contextualized explanations for each user based on their topical folder. According to our participants, both contextualized aspect-based descriptions and paper-paper descriptions effectively highlighted which parts of the abstracts or papers they should focus on (P9, P13). Specifically, contextualized aspect-based descriptions surfaced “explicitly relevant aspects dispersed in multiple sections [in the paper, relevant] to my folder context”. Participants also appreciated how paper-paper descriptions connected recommended and collected papers through detailed and insightful relationships from a “narrower research interest perspective” when compared to uncontextualized TL;DR summaries. For example, P15 mentioned that “I don’t have time to read all the abstracts but TL;DRs are too high-level. So with the [baseline], I do not have evidence to choose what to save. With the descriptions in PaperWeaver, I was more confident in my decision because I can get more understandable evidence from the explanations. For example, how the recommended papers [were relevant but] tackled different problems than the paper I knew already”.
6.3 Deeper Insights on Both Recommended and Previously Collected Papers
To gain a deeper understanding of the knowledge participants had gained from interacting with PaperWeaver and the baseline, we further analyzed the notes participants took during the study in the two conditions (while being blind to which condition each note came from). We found that when using PaperWeaver, participants on average captured significantly more notes that described connections between papers (PaperWeaver: M = 2.21, SD = 1.81, baseline: M = 1.07, SD = 0.96, p = 0.0459, NP). At the same time, when describing the relationships, participants included similar levels of detail after accounting for between- and within-participant variability (β = .113, p = .13). This suggests that the bottleneck in learning connections between papers is a recall problem, and that it is easier to capture these connections with PaperWeaver than by looking across multiple paper abstracts in the baseline condition. These behavioral results also corroborate participants’ perceived understanding of the information presented in the two conditions. Specifically, in the post-survey, participants reported that they could understand how the recommended papers connect to their collected papers significantly better with PaperWeaver (PaperWeaver: M = 5.80, SD = 1.04, baseline: M = 3.00, SD = 1.67, p = 0.0021, NP).
One possible trade-off of this positive learning effect could be significantly higher cognitive demands on users. However, based on our NASA-TLX survey, participants did not report a higher workload when comparing PaperWeaver and the baseline. On the other hand, qualitative data do show anecdotal evidence of higher mental and temporal demand for some participants when using PaperWeaver. For example, P6 and P9 pointed to descriptions from PaperWeaver that piqued their interest in the recommended papers and urged them to explore them in more detail. P4 further commented that “I felt mentally more demanded using PaperWeaver because I actively compare the new papers with other [collected] papers. This might be a positive side effect of showing relationships.” These comments suggest that even participants who felt using PaperWeaver was more cognitively demanding perceived it positively, because it prompted them to become more actively engaged with the paper alert and motivated them to more deeply process the information in front of them.
Finally, in addition to understanding the recommended papers, our analysis of the post-survey data also revealed that participants felt PaperWeaver helped them “learn something new about the papers that they have already saved in the folder” significantly more than the baseline (PaperWeaver: M = 5.33, SD = 1.49, baseline: M = 3.67, SD = 2.18, p = 0.011, NP). In the interviews, participants pointed to different benefits of seeing descriptions that covered relevant collected papers in their folders, including refreshing their memories about papers they had previously read and gaining new knowledge and perspectives about them based on how recommended papers described them in their related work sections. In addition, some participants had saved papers in their folder that they had not yet read; in these cases, they described how the descriptions from PaperWeaver helped them rediscover previously collected papers with renewed interest.
6.4 Accuracy of LLM-Generated Descriptions
One known issue with current LLMs is that they can be prone to hallucinations (
cf. [
5,
7,
53]), and we did find, based on manual evaluation, that some descriptions contained one or two mistakes. In general, contextualized aspect-based descriptions are more extractive and contain fewer mistakes than paper-paper descriptions, which rely on the model’s internal knowledge to align the extracted aspects. Specifically, 8% of contextualized aspect-based description samples had one or two mistakes (HCI: 6%, NLP: 10%, based on 30 samples each), compared to 20% of paper-paper description samples (HCI: 16%, NLP: 23%, based on 30 samples each). A fundamental question is whether this is an acceptable level of accuracy in the context of paper alerts. Results from our user study around participants’ strategies when interacting with
PaperWeaver offered some insights.
Firstly, like most retrieval systems, results from document recommender systems can also contain errors (i.e., documents falsely classified as relevant) [
15]. Because of this, we found that when interacting with paper alerts, participants were already in the mindset of verifying machine-generated document recommendations, in both conditions. We also found that participants went through the list of recommended papers in multiple passes, typically using the titles first to rule out clearly irrelevant recommendations, then reading the generated descriptions in a second pass, and finally verifying them against the abstract or the content of the papers. For example, P6 and P8 mentioned that they would
“validate the connections between them [papers] in the abstract” and P2 provided more details around their strategy:
“[I read abstracts] not only to see deeper levels of information but also to verify the content of the descriptions.”
Secondly, many participants also explicitly mentioned that they saw the PaperWeaver descriptions as “supplementary material” that helped them triage the recommended papers, suggesting that participants could appropriately adjust their level of reliance on the LLM-generated descriptions. P2 said, “I used the descriptions from [PaperWeaver] as supplementary material going from titles and TL;DR to abstract,” and a representative quote from P8 additionally pointed to benefits around awareness and discovery: “PaperWeaver’s descriptions triggered me to become interested in two highly relevant [recommended] papers and have curiosity about them so that I can [decide to] read the abstracts and [then the] whole papers to get more information. These descriptions are supplementary bridges from the titles and TL;DRs to the abstracts rather than replacing them.”
Finally, results from the post-survey and the NASA-TLX questionnaire suggested that
PaperWeaver did not increase perceived workload even though participants were actively verifying LLM-generated content, and that participants were able to judge the relevance of the recommended papers more confidently and captured richer relationships between papers in their notes. We acknowledge that our participants were computer science researchers who might currently be more familiar with recent advancements in LLMs than researchers in other domains, although the increasing popularity of LLM-powered end-user tools also gives researchers in other domains increasing opportunities to interact with LLMs. Nevertheless, our results suggest that interfaces that make use of LLM-generated text should clearly indicate when generated text is presented to users and allow users to freely turn generated content on or off. Moreover, it is essential to design effective mechanisms for users to verify content efficiently. Inspired by AngleKindling’s system design [
45], which shows the connection between LLM-generated angles and source material to support journalists, one approach involves making the sections of the paper that are particularly relevant to the LLM-generated descriptions accessible, for example through highlighting or through clear citations that hyperlink to the source papers. We can also indicate whether there is clear source material that the output is grounded in by marking citations at the end of each sentence.
7 Limitations and Future Work
7.1 Limitations
Our work has some limitations. We conducted our study with computer science graduate students who might not represent the broad spectrum of academic domains. While our approach is motivated by a general use case and designed to accommodate a broad range of academic domains, further evaluation with researchers from other academic domains, especially less technology-oriented ones, can help us understand how the generated descriptions can be applied broadly. Another limitation is our use of the Problem-Method-Findings schema. While this schema fits a large portion of papers, it might not cover some other papers such as survey papers or systematic review papers [
26]. Further, researchers might want to apply their own schema (e.g., medical researchers would be interested in specific aspects of clinical trials). We discuss how to extend the schematic digests beyond Problem-Method-Findings in §
7.4. Finally, the measurements in our user study focus more on subjective measures because it is inherently challenging to obtain robust and valid measures that score participants’ understanding of the relevance between papers and their triaging performance, given that the notion of relevance is personalized to each individual and evolves over time. Standardizing this notion would require careful examination of prolonged interaction scenarios beyond the scope of the laboratory experiment conducted here. While we obtained objective metrics in the experiment that can be evaluated independently of each researcher’s context, a longitudinal deployment study in future work could involve participants assessing their own written outputs over time, allowing for evaluations that consider evolving individual contexts.
7.2 Pairwise vs. Multiple-Paper Descriptions
In this work, we focused on describing pairs of recommended and collected papers to help users contextualize unfamiliar papers with familiar papers. In our current design, each description only covers two papers: a recommended paper and a collected paper. This design decision was based on our formative study where we observed that participants with different familiarity with the topics all found benefits in seeing descriptions that covered two papers, but less experienced participants felt that descriptions covering three or more papers were too complex for paper alerts. Future work could explore ways to allow users to further customize their paper alerts by adjusting the complexity of the descriptions. One opportunity for future work in this direction is to look to research that focuses on automatic related work section generation [
43] or multi-document summarization [
21] that aimed to generate descriptions for many documents. One interesting and relevant use case we observed in the user study was one participant who had saved a survey paper in their folder. In this case, the description
PaperWeaver generated allowed the participant to compare and contrast recommended papers with the different research threads described in the survey paper’s abstract, and it was perceived positively.
7.3 Longer-Term Usage and Evolving Folder Descriptions
In the user study, we observed that participants collected richer notes that captured information connecting multiple papers as opposed to information about a single paper. Since PaperWeaver leverages a user’s folder name and description in its prompts to generate descriptions that better reflect the user’s knowledge about the folder topic, an interesting future direction is to allow users to update their folder name and description based on what they have learned from a paper alert. This information can be used to update the folder description to represent the user’s current knowledge, which can, in turn, improve subsequent paper alert generations. In this sense, the user’s folder can serve as an evolving external representation of the user’s understanding of a specific research topic. However, the longer-term effects of accumulating additional notes need further investigation.
7.4 Extending Schematic Digests beyond Problem-Method-Findings
Participants in our studies also commented on how they would like to further customize the information presented in paper alerts, pointing to avenues for future work. First, some participants commented that they wanted to be able to surface information along certain other aspects of the schema, such as differences or similarities in evaluation regimes and their outcomes (e.g., positive, negative); what types of study designs were run and how they were conducted (e.g., controlled lab studies, field deployment studies, RCTs); approaches to developing AI models in a given problem domain; and the design of interaction features in proposed systems. The desired schema aspects varied across participants, suggesting that while the default information provided in PaperWeaver’s problem-method-findings schema served as a useful entry into recommended papers, a deeper inspection following users’ triage would benefit from further digesting papers along user-defined secondary schemas.
However, a primary-and-secondary division of schemas that adapts to users’ interaction with paper digests over time also suggests an important gradient of specificity that may need to be adaptively adjustable based on user interaction, for example by passively sensing users’ intent from their interactions. Clear examples of this are when participants stated that they wanted to “see more” in the aspect-based paper summaries, demonstrating an intent for lower-level details. Another intent participants may express is wanting to “adjust” an explanation provided for a recommended paper: for example, as pointed out by P4, the problem-method-findings schema in our approach did not provide useful information for survey papers because the problem and method descriptions were presented at too high a level, abstracting away useful details of the individual papers or groups of papers synthesized by the authors of the survey paper.
7.5 Contextualized Paper Descriptions beyond Paper Alerts
Finally, while we focused on the scenario of helping users make sense of paper recommendation alerts, the proposed pipeline can potentially be generalized to other scenarios where users need to make sense of unfamiliar papers. For example, using one’s publications as “collected papers” and generating descriptions for papers from another author to explore common research interests and facilitate collaborations. Another opportunity is to enrich a user’s experience when
reading the related work section of a paper (
cf. CiteSee [
12], Threddy [
30], CiteRead [
48]) by generating alternative descriptions about the cited papers based on papers already familiar to the current user. Beyond comparing with existing papers, users can also compare their own draft of a paper with new papers that might be relevant to them to get insights or new perspectives for framing the contrasts and comparisons between papers when organizing related work sections.
A Prompts
All prompts used in PaperWeaver are listed below. The blue text represents the input content.
A.1 [T1] Generating Folder Description
System Prompt You are an intelligent and precise assistant that can understand the contents of research papers. You are knowledgeable in different fields and domains of science, in particular computer science.
User Prompt This is my scholarly library, titled folder title . The following papers are included. Write down two-line descriptions about this library that deal with high-level characteristics of these works commonly shared. Present the result as "Title: <given title>; Description: <two-line descriptions starting with "It encompasses">.
[Library papers]
A set of titles of library papers
A.2 [T2] Generating Contextualized Aspect-based Paper Summaries
System Prompt You are an intelligent and precise assistant that can understand the contents of research papers. You are knowledgeable in different fields and domains of science, in particular computer science. You are able to interpret research papers based on the user’s perspective.
User Prompt We would like you to extract the dimensions of the paper based on my research interest. You will be given my research interest and a paper and will be asked to extract the problem, method, and findings that I might have interest in from the paper. You will be provided with the title and abstract of the paper and my research interest that describes the topics that I’m currently interested in.
[The Start of My Research Interest]
folder description
[The End of My Research Interest]
[The Start of Given Paper]
Title: title
Abstract: abstract
[The End of Given Paper]
[System]
Please identify as many relevant aspects from the paper with respect to any research problems in the topic of folder title . Once you identified the research problems, describe what specific methods the following paper is applying for each of the problems. Each method from the paper should resolve the matched problem and they should be specific, which means not widely used. Once you identified the methods, describe what specific findings the following paper identified by applying each of the methods.
Finally, return a result as Python dictionary object of the following format: "[{"Problem": <problem composed of 20-word long phrase>, "Method": <method composed of 20-word long phrase>, "Findings": <findings composed of 20-word long phrase>},.]". If there is no specific method to resolve the problem, then write down "N/A".
A.3 [T3] Generating Paper-paper Descriptions Based on Citances
System Prompt You are an intelligent and precise assistant that can understand the contents of research papers. You are knowledgeable in different fields and domains of science, in particular computer science. You are able to interpret research papers to identify similarities and differences between research papers.
User Prompt We would like you to compare two research papers for a researcher. You will be provided with the title and abstract of each paper. To help you when you compare the papers, we provided a subsection of Paper A where Paper B is cited. In the subsection of Paper A, cited Paper B already identified methods that are similar between the papers and what problems are solved in each paper using these shared methods.
[The Start of Paper A]
Title: title
Abstract: abstract
[The End of Paper A]
[The Start of Paper B]
Title: title
Abstract: abstract
[The End of Paper B]
[System]
Please explain the content of Paper A for a researcher. Explain the paper by comparing it to Paper B, and interpreting the relationships between these papers. Your explanation should only be four sentences long and it should follow the following structure: a sentence that states what aspects are similar between Paper A and Paper B, one sentence summary of Paper A, one sentence summary of Paper B, and one sentence comparing and contrasting between Paper A and B.
A.4 [T4] Paper-paper Descriptions via Generated Pseudo-citances - Finding similar problem aspects across the recommended and collected papers using LLM
System Prompt You are an intelligent and precise assistant that can understand the contents of research papers. You are knowledgeable in different fields and domains of science, in particular computer science. You are able to interpret research papers to identify similarities and differences between research papers.
User Prompt We would like you to examine a set of papers. You will be given a paper and will be asked to compare this paper to a list of papers labeled A, B, C, and D. You will be provided with the title of each paper and a set of dimensions that describe the content of the paper. These dimensions describe different problems that were addressed by the paper, the method applied in the paper to address each problem, and findings related to that problem and method. These dimensions are provided in a Python JSON format.
[The Start of Given Paper]
Title: title
Dimensions: dimensions
[The End of Given Paper]
[The Start of Paper A]
Title: title
Dimensions: dimensions
[The End of Paper A]
[The Start of Paper B]
Title: title
Dimensions: dimensions
[The End of Paper B]
[The Start of Paper C]
Title: title
Dimensions: dimensions
[The End of Paper C]
[The Start of Paper D]
Title: title
Dimensions: dimensions
[The End of Paper D]
[System]
Please compare the problems of the given paper with the problems of the other listed papers. Please identify papers in the list that have problems that are the most similar with problems in the given paper. Focus on identifying problems that are similar even though they may be resolved with different types of methods. List all of the identified pairs of similar problems, where one problem is from the given paper and the other is a similar problem from another paper in the list. For each pair, please describe one shared problem that could contain the two problems. You should avoid just simply concatenating two problems when describing a shared problem. Also, you should avoid containing a phrase that is only included in one of the papers even though it is a very small part.
Finally, return the list of pairs of similar problems and the shared problem for each pair as a list in a Python JSON object of the following format: "[{"chosen_paper": <title of the paper that has a problem that is similar to one in the given paper>, "similar_problem": <problem that is similar to a problem in the given paper>, "given_problem": <problem from given paper that is similar to the identified problem>, "shared_problem": <one challenge that can encompass the two similar problems>},.]". You should ensure that you return a valid JSON object by escaping any quote marks in your output. (Example: {"valid_object": "This is a "valid" JSON object that escapes any "characters"."}) If there were no papers that share common problems with the given paper, then only write down "N/A".
A.5 [T5] Paper-paper Descriptions via Generated Pseudo-citances - Verifying whether shared problem is aligned with each paper
System Prompt You are an intelligent and precise assistant that can understand the contents of research papers. You are knowledgeable in different fields and domains of science, in particular computer science. You are able to interpret research papers to identify similarities and differences between research papers.
User Prompt You will be provided with the title and abstract of Paper A and the given problem.
[Title of Paper A]
title
[The End of the title]
[Abstract of Paper A]
abstract
[The End of the title]
[The Start of Given Problem]
shared problems from paper A and B
[The End of Given Problem]
[System]
Please verify whether Paper A tackled the given problem based on the abstract of the paper. Provide the result as True if Paper A tackled the given problem with their own method, else provide False. If the part of the given problem is not aligned with Paper A’s challenges, it should be verified as False.
A.6 [T6] Paper-paper Descriptions via Generated Pseudo-citances - Generating structured summary
System Prompt You are an intelligent and precise assistant that can understand the contents of research papers. You are knowledgeable in different fields and domains of science, in particular computer science. You are able to interpret research papers to identify similarities and differences between research papers.
User Prompt We would like you to compare two research papers for a researcher. You will be provided with the title of each paper and a set of dimensions that describe the content of the paper. These dimensions describe different problems that were addressed by the paper, the method taken by the paper to address each problem, and findings related to that problem and method. These dimensions are provided in a Python dictionary format. To help you when you compare the papers, we have already identified problems that are similar between the papers and what methods are adopted in each paper to solve the shared problem.
[The Start of Paper A]
Title: title
Dimensions: dimensions
[The End of Paper A]
[The Start of Paper B]
Title: title
Dimensions: dimensions
[The End of Paper B]
[The Start of Shared Problems]]
shared problem addressed in Paper A and B
[The End of Shared Problem]
[The Start of Methods]
Paper A: method that is used in Paper A to resolve the aligned problem
Paper B: method that is used in Paper B to resolve the aligned problem
[The End of Methods]
[The Start of Research interest]
folder description
[The End of Research Interest]
[System]
Please explain the content of Paper A for a researcher. Explain the paper by comparing it to Paper B, and interpreting the similarities and differences between these papers. You should consider the researchers’ research interest, which is described above when explaining Paper A. Ensure that your explanation includes information that may be fascinating or engaging for the researcher based on their interests. Your explanation should only be four sentences long and it should follow the following structure: a sentence that states what aspects are similar between Paper A and Paper B, one sentence summary of Paper A, one sentence summary of Paper B, and one sentence comparing and contrasting between Paper A and B.