1 Introduction

Benjamin Franklin Wedekind (1864–1918) was a German playwright and author, probably best known for his play Spring Awakening. During his life, Wedekind maintained a manifold of correspondences via postal mail, i.e. letters and postcards. Wedekind’s correspondences are of great literary value and must be preserved for future generations [4]. Currently, about 3500 correspondence items are known and accessible, but further artefacts are regularly discovered during research on known artefacts. These are to be transcribed, critically annotated, categorised and commented upon, which is the basis of the digital scholarly edition of Frank Wedekind. In order to further clarify the term digital edition, the categorisation of Franzini et al. [7] is quoted as follows:

S-Scholarly: An edition must be critical, must have critical components. A pure facsimile is not an edition, a digital library is not an edition.

D-Digital: A digital edition cannot be converted to a printed edition without substantial loss of content or functionality. Vice versa: a retro-digitised printed edition is not a scholarly digital edition (but it may evolve into a scholarly digital edition through new content or functionalities).

Edition: An edition must represent its material (usually as transcribed/edited text) – a catalogue, an index, a descriptive database is not an edition

Sahle states that digital scholarly editions “offer the opportunity to overcome the limitations of print technology” [13] and argues that progressing from print to digital editions opened new perspectives regarding accessibility, searchability, usability and computability. Digital and print editions can be differentiated as follows: “digital editions follow a digital paradigm”, in comparison to print editions that follow(ed) a paradigm that “was shaped by the technical limitations and cultural practices of typography and book printing” [13].

Digital scholarly editions serve, from a computer science perspective, as a repository for data. This data can be enriched through annotations and comments by editors, which can provide additional information such as interconnections or relations. The data could be used to investigate and analyse relations between correspondences and an author’s literary work, to research connections to other authors or to get a better understanding of the society of the past.

There are two roles that we find important to differentiate here: the author/editor of digital scholarly editions, and its counterpart, the information seeker. Both are well-known roles in the information retrieval (IR) domain. The author creates data that the information seeker is supposed to be able to search, browse, etc. If an author creates an incompletely or incorrectly annotated text, the information seeker might not be able to find it later on. Authors can also be information seekers, e.g., when existing data is used to support the process of creating/annotating new content.

As an editor, one has to conduct manifold and extensive groundwork to create a digital scholarly edition, which requires expert knowledge and a vast amount of time and effort. Usually, editors have to cope with a multitude of tools and applications, such as image processing tools (for facsimiles), text annotation/mining tools and database tools. We have briefly compared tools and applications that were suggested in related work, in online communities (with a focus on digital humanities) and in online courses. However, many tools are tailored for specific domains/purposes and require deep technical knowledge, have technical constraints (i.e. require specific platforms), are outdated or even incompatible with current operating systems, or are proprietary software, which naturally comes with various limitations and constraints.

We share the point of view of Sahle, who stated that “a digital edition is more like a workplace or a laboratory” [13], and thus posed the question of how to create a workplace, also referred to as “virtual research environment”, that is easy and efficient to use. Franzini et al. [8] created a survey to investigate the expectations and requirements of digital editions and found that “digital editions are imperfect tools unable to meet the expectations of every single user”, and suggested to “explore the extent to which creators of digital editions engage with their target users during the preparation and development stages of the project”. Therefore, an understanding of how authors create digital scholarly editions, for whom they are created, and what user expectations and requirements for scholarly digital editions are, remains an important research endeavour.

Our contribution is therefore threefold. First, we provide an insight into the work of editors by analysing their tasks in the context of a digital scholarly edition for correspondences of Wedekind. We show typical challenges in the process of creating a digital scholarly edition. Second, based on our evaluation and in relation to related work, we show that there is a necessity to provide tools and advanced features for editors of digital scholarly editions for correspondences. Third, we describe our design process and outline the methods that worked and did not work, to guide future researchers in the same situation. This includes the challenges of working in an interdisciplinary team of researchers from the humanities and computer science, such as terminology, different knowledge of technology and different experience in using software.

2 Related Work

Baillot et al. [1] state that “Scholarly editions are generally conceived by scholars for scholars”. They differentiate between “Digital scholarly editions” and “scholarly print editions”. According to Baillot et al., digital editions have higher accessibility and provide the potential “to address and actually reach other readers”. However, it appears that the developers of this edition are also regarded as its users: assumptions are made and features implemented, but insights about the actual users are only gained later by evaluating log files.

Pohl et al. [12] describe new approaches towards user research and software architecture and depict a differentiation between maintaining software and adding new features at the same time. Issues about outdated or old systems in this domain are reported and challenges outlined.

We found that most publications neglect the important question of how data is actually prepared and efficiently entered into the system and rather focus on how data is presented, shared or used (e.g., [21]). Thus, we also conducted a competitive analysis focused on usability and data input, based on the catalogue of digital editions provided by Franzini et al. [8]. The results showed a variety of methods, such as manually editing XML files [6] to comply with the TEI correspDesc standard [17], custom plugins for Microsoft Word that transform Word documents into TEI [15], or WYSIWYG editors [14, 18].
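To give an impression of the data format involved, the following is a minimal sketch of a TEI correspDesc fragment and of how it could be read with Python’s standard library. The names, dates and places are invented for illustration and are not taken from the edition’s actual data.

```python
import xml.etree.ElementTree as ET

# A minimal correspDesc fragment modelled on the TEI guidelines; the
# persons, place names and date are invented for illustration.
CORRESP_DESC = """\
<correspDesc xmlns="http://www.tei-c.org/ns/1.0">
  <correspAction type="sent">
    <persName>Frank Wedekind</persName>
    <placeName>Munich</placeName>
    <date when="1905-03-14"/>
  </correspAction>
  <correspAction type="received">
    <persName>Stefan Zweig</persName>
    <placeName>Vienna</placeName>
  </correspAction>
</correspDesc>
"""

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

def correspondents(xml_text: str) -> dict:
    """Extract sender and receiver names from a correspDesc fragment."""
    root = ET.fromstring(xml_text)
    result = {}
    for action in root.findall(f"{TEI_NS}correspAction"):
        name = action.find(f"{TEI_NS}persName")
        if name is not None:
            result[action.get("type")] = name.text
    return result

print(correspondents(CORRESP_DESC))
# {'sent': 'Frank Wedekind', 'received': 'Stefan Zweig'}
```

Such structured metadata is what makes correspondence machine-searchable; hand-editing it, as some of the surveyed projects require, is correspondingly error-prone.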

3 Design and Development Process

In this project, we introduced a user-centred design process based on the double diamond design process [20], which utilises the principle of divergent and convergent thinking and serves as a guideline to find the right problem(s) and the right solution(s) [11]. The overall aim is to improve the computer-aided work [3] of editors of digital scholarly editions. Thus, we utilise our interdisciplinary team of researchers from digital humanities, interaction design and computer science to design, develop and test concepts, prototypes and new releases of the digital scholarly edition.

In parallel, we use a second process from software engineering (SE), which is the basis for maintaining and incrementally extending/improving the existing software. The SE process handles tasks for the supporting platform (e.g., updates of the server operating system, the Java EE stack, etc.) and other tasks, such as automated database backups or security (e.g., SSL certificates, firewall settings, etc.). The software is already used by researchers and is therefore a production system, which requires frequent maintenance to ensure its availability and stability.

For completeness, we also sketch the process used by the researchers in digital humanities. In Fig. 1 we depict the processes as parallel tracks to show the methods and interconnections in one timeline. Our process is similar to the Agile UX approach (e.g., as explained by Schwartz [16]), but due to the small team size and variable/temporary team members (e.g., master students writing their theses), we adapted the process to our needs.

Thus, in the design phases before a software release is created, we collect insights by conducting contextual inquiries, interviews and workshops. Low- to high-fidelity prototypes are built and tested with users, and the knowledge and feedback gained are used to implement features for the target stack. We use a separate test environment (same hardware and stack setup as the production system) for testing new releases before the production system is updated. After releases are rolled out, they are used by real users, and we gather feedback through contextual inquiries, focus groups or usability tests. Users also frequently document and report issues via email, which can be roughly categorised into: software-related bugs (something does not work at all), usability-related issues (something does not work well), and utility-related requests (we need something to ...). All of those are managed in “GitLab”, which serves as a central point for information (e.g., design guidelines, interview data, bug reports, technical documentation, etc.). Thus, the knowledge gained before and after software releases influences and informs future design decisions in this process.

One benefit of utilising two different processes is supporting different types of work within the project, such as technology-focused research (e.g., database query performance, improving the technology stack, etc.) and design-focused research (e.g., optimising the application’s usefulness (utility + usability)). This is also an advantage for side projects, such as a student writing a master thesis. Students have their own time frames and schedules and thus produce artefacts and solutions, but also insights into specific problems. Their solutions can (and most often do) solve problems discovered/documented in either of the two high-level processes.

Fig. 1.

High level view of the Design Process and its intertwining and interconnections with the processes of the digital humanities research group and the software engineering process  

3.1 Design Process

The double diamond design process is split into four phases: discover, decide, develop and deliver. The process is iterative but not necessarily sequential, which means it is possible to jump between phases if necessary. The design process in Fig. 1 shows that certain phases might also overlap, due to parallel work or the fact that in the time frame between two software releases, users will always file error and bug reports for the newest release; thus their feedback for the last release overlaps with the develop phase, in which we build prototypes or implement new releases. Frequent feedback, e.g. via email or conversations, is not depicted in the process in Fig. 1.

In the design process several methods are used, such as contextual inquiries, task analysis, competitive analysis, sketching, building low-fidelity prototypes, and conducting interviews as well as focus groups. High-fidelity prototypes can be time-intensive, and it is sometimes faster to directly implement a solution for the actual target platform using the SE process. This takes considerable effort, but the chance of failure shrinks because of previous iterations and user tests in the design process. A result of the software engineering process is usually a new software release, which is then used in the design process to conduct user studies, etc.

The design process is the central part of this project. It is used to build a bridge between the two domains, i.e. digital humanities and computer science (technology). From a high-level perspective, the design process allows us to stand between human and technology, and thus gives us the opportunity to observe, document, facilitate communication and work in and with both sides. This is also an opportunity to encourage interdisciplinary work between the domains and to break down existing barriers, e.g., by organising workshops with participants from the (digital) humanities and computer science (see Workshops in Fig. 1).

3.2 Software Engineering—Maintenance, Utility and Usability

Since maintenance and software architecture are not the focus of this paper, we will not present the SE process in full detail, but rather outline its use from the perspective of the design process as follows.

The existing system has to be maintained so that the software stack is kept up to date, which means updating the operating system, platform and components, but also fixing security issues and other bugs. For example, a new web browser version might introduce changes in handling web pages that also require an update of the software; HTML/CSS constructs might be rendered slightly differently, which requires an update of the user interface component. General changes might also require updates, e.g., stricter security settings that cause the browser to show warning messages when the server does not support SSL connections.

In fact, there are a lot of “moving” parts with external dependencies, which come with their own development cycles and dependencies; thus, having a separate SE process to tackle those challenges and limitations proved to be useful.

Implementing new features that involve new frameworks, components, libraries, etc. is another reason to commit to a well-defined SE process. For example, using continuous integration and deployment to automate the release of new versions removes a lot of manual workload. The complexity of the software, which increases with every new feature, requires the use of well-established SE methods.

Users are usually not interested in how their solution is built, but rather care about whether the system is up and running. However, as soon as problems appear and the software does not run as expected, they cannot work in their established process or have to skip certain parts of it. This is especially critical in the context of a digital scholarly edition, which “as a publication is a process rather than a product. It grows incrementally not only before its final release, but also during its availability to the public” [13]. Thus, breaking the process in any form means not only stopping the incremental growth but also disrupting its availability. Therefore, it is necessary to apply best practices from the well-established software engineering domain in order to provide state-of-the-art technology and methods for digital scholarly editions and their underlying platforms.

4 Interviews

To understand the process of creating scholarly digital editions, we conducted a qualitative study based on the method of qualitative, semi-structured interviewing and on-site contextual inquiry [2]. The results are insights into the work of editors, i.e. experts of the field, and their tools.

Table 1. Details about interview participants.

4.1 Methodology

We interviewed four experts in the domain of humanities (see Table 1), who currently work with digital scholarly editions. Three of those experts (P1, P2, P3) have studied Wedekind and his correspondence, publications and artefacts in depth. One participant (P2) has studied Wedekind for more than 30 years and published multiple print editions. Due to time constraints, we scheduled and conducted all interviews within three months, from November 2018 to January 2019.

The interviews of P1, P2 and P3 were separated into two parts. The first part was a contextual inquiry with the editors at their workplace. We observed their work with the initial editor system (described in [18, 19]), took notes and documented the workflow by taking photos (see Figs. 2, 4, and 3) and videos of the screen. We recorded a total of about 7 h of audio, which we transcribed. The same version of the editor system was used in all interviews.

The second part of the interview was a semi-structured interview, with questions about digital scholarly editions, the expert’s expectations, research purpose and usage. Due to the remote location of the interviews and time constraints, we conducted the contextual inquiry and the interview on the same day, but with a break of about one hour between both.

We also interviewed one expert (P4), who is not directly involved in the Wedekind research but worked with digital editions, and used the same semi-structured interview guideline.

Terms and expressions were translated from German to English, in the form of an analogous translation, in order to present and describe the tasks and workflow in the following sections.

5 Task Analysis and Work Flow

Based on observations during the contextual inquiries, bug reports, email discussions and workshops, we analysed tasks and workflow. Usually, digital scholarly edition “projects start with digital facsimiles and subsequently create transcriptions and edited versions of the text” [13]. In this project, however, we observed a more complex process. For example, tracking down and acquiring documents or digital facsimiles requires a preliminary effort that should not be underestimated (see Sect. 5.2). Analogue documents have to be digitised (see Sect. 5.3), and the vast number of files has to be managed (see Sect. 5.4). While the following sections seem to be ordered chronologically, please note that tasks are often done in parallel, not necessarily sequentially, and also iteratively.

5.1 Correspondence Item

The digital scholarly edition analysed herein contains correspondence items. These come in many different types, such as letters, postcards, or drafts/notes. The items are either handwritten, typewritten or pre-printed (or a combination thereof). An item can be written and/or signed by multiple authors and can contain multiple languages (e.g., quotes from another language such as French). The common script used around 1900 was “Kurrent”, but Latin script and the use of typewriters became increasingly common. In summary, correspondence items may contain mixtures of writing styles in Kurrent, Latin, or typewritten text, by one or multiple authors and in one or multiple languages. For example, a postcard may contain pre-printed text (e.g., the description of the picture on its back), an addressee’s address written in Latin script, and a German text written in Kurrent quoting French phrases. Also, some authors add drawings, sketches or other markings, for example musical notes in Fig. 4c, as shown by P2 during the interview.

5.2 Acquisition

Correspondence items may be available as a paper copy of an original (see Fig. 2c), as a digital scan, i.e. facsimile (in various file formats, Fig. 4a on the right screen), or as the original (see Fig. 2b). The latter involves a local or remote repository, such as a library, an archive or a private collection, where the correspondence items are stored, and the organisational effort to find, request and gain access to relevant items. Additionally, artefacts and their facsimiles might be subject to copyright, so additional legal paperwork in the form of contracts or agreements has to be considered and managed by the editors. Certain obligations might affect how facsimiles can be presented online, such as the requirement to show copyright information below the image, to add thank-you notes, to restrict the size/resolution in case of download options, or to prevent download of the facsimile altogether. Also, copyright lasts for a certain period, for example, dependent on the date of death of the author. Editors have to take this into consideration, and the digital scholarly edition system must comply with laws and regulations, which leads to additional requirements for the implementation.
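To illustrate the last point, the following is a minimal sketch of such a check, assuming the common European rule that protection lasts 70 years beyond the end of the author’s year of death; the actual term depends on jurisdiction and many edge cases, so a real edition system would need legal review rather than this simple rule.

```python
from datetime import date
from typing import Optional

# Assumed rule: copyright expires 70 years after the end of the
# author's year of death (varies by jurisdiction and edge cases).
TERM_YEARS = 70

def public_domain_from(year_of_death: int) -> int:
    """First year in which the works of an author are in the public domain."""
    # Protection runs to the end of the calendar year death + 70,
    # so the works are free from 1 January of the following year.
    return year_of_death + TERM_YEARS + 1

def is_public_domain(year_of_death: int, today: Optional[date] = None) -> bool:
    """Check whether the author's works are free of copyright today."""
    today = today or date.today()
    return today.year >= public_domain_from(year_of_death)

# Wedekind died in 1918; under this rule his own writings have been
# in the public domain since 1989.
print(public_domain_from(1918))  # 1989
```

A rule like this could drive, for example, whether a facsimile download option is offered, while contractual obligations of the holding archive would still have to be checked separately.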

5.3 Digitisation

After analogue documents have been found, they need to be digitised, i.e. the physical document is scanned/photographed; the resulting artefact is then called a “facsimile”. Some facilities, such as archives and libraries, offer digitisation as a service for a fee or provide facsimiles directly. In some circumstances, those facilities might lack the resources for digitisation, so researchers have to travel to the facility (with equipment) and digitise documents themselves (see Fig. 2a). Thus, researchers have to be familiar with digitisation software, the camera equipment and the best practices for digitising documents, such as using a colour reference chart, which is required for colour calibration later on (see Fig. 2b). The reference chart also allows determining the size of the given object in the photograph, which can be a utility/feature in the digital scholarly edition system.

Fig. 2.

A: Set-up (camera/laptop) by the research team to digitise artefacts; B: camera point of view, showing a correspondence item and a colour reference chart; C: researcher measuring a paper copy of a correspondence item.

5.4 Document Management

Handling several thousand files, such as facsimiles/images, contracts and Word documents, makes rigorous file management necessary. The research group of P1 and P3 uses a shared group network drive for all project-related files. They defined three main folders: (A) “letter lists”, (B) “scans from sites” and (C) “transcriptions”, with the following content:

  1. Folder A contains two Excel files called “Letters to Wedekind” and “Letters from Wedekind”, which record the current state of progress and metadata (e.g., author, date, place of writing, etc.) for each correspondence item.

  2. Folder B contains facsimile files organised in folders named from A to Z with sub-folders in the format “City, Name of Facility”, e.g., “Zurich, Central Library”.

  3. Folder C contains transcriptions, additional material such as biographies, and the necessary facsimiles, which have been copied from folder B and renamed so that the filename identifies the contained letter or part of a letter. The files are organised in folders from A to Z with sub-folders for each correspondent in the format “Lastname, Firstname”, e.g., “Zweig, Stefan”. This means the folder “Zweig, Stefan” contains all material related to the correspondence of “Frank Wedekind” and “Stefan Zweig”.
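The convention for folder C can be sketched as a small helper. The root folder name and function name are our own, used purely for illustration; the research group manages these folders by hand rather than with scripts.

```python
from pathlib import PurePosixPath

# Sketch of the transcription-folder convention described above:
# <root>/<initial of last name>/<Lastname, Firstname>
def transcription_folder(root: str, lastname: str, firstname: str) -> PurePosixPath:
    """Build the path to a correspondent's transcription folder."""
    correspondent = f"{lastname}, {firstname}"
    return PurePosixPath(root) / lastname[0].upper() / correspondent

print(transcription_folder("transcriptions", "Zweig", "Stefan"))
# transcriptions/Z/Zweig, Stefan
```

Encoding such conventions in code (or at least documenting them formally) is one way a future system could validate file locations automatically instead of relying on manual discipline.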

Fig. 3.

A: list of correspondence items in Word (used to sort items by order of correspondence); B: logging the state of the transcription of a correspondence item in a Word document; C: Excel sheet showing a list of correspondence items with additional information about transcription states and other metadata 

5.5 Metadata

Before the actual transcription takes place, the metadata of the document has to be recorded. The editors “set the framework and define the subject” [13] of the digital scholarly edition. Currently, the following metadata and categories are documented (if available): place of writing, date of writing, author, coauthor, addressee, document type (letter, draft, postcard, etc.), materiality, site of the recipient, information on publication/print, information on location (where the originals are), and copyright information. The materiality of the physical document covers information about fonts, writing style, writing tools (e.g., pencil, pen, ink, etc.) and type of paper. Additional notes (free-text fields) can be added, e.g., to describe additional features. Notes are also used to record how metadata was determined, e.g., if the diary of a person was the source of the information.
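The fields listed above can be sketched as a record type. The field names below are our own English rendering, not the edition’s actual database schema, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of the per-item metadata described above; names and types
# are illustrative, not the edition's actual schema.
@dataclass
class CorrespondenceMetadata:
    author: str
    addressee: str
    document_type: str                        # letter, draft, postcard, ...
    place_of_writing: Optional[str] = None
    date_of_writing: Optional[str] = None     # free-form, dates may be uncertain
    coauthors: List[str] = field(default_factory=list)
    materiality: Optional[str] = None         # paper, writing tool, script, ...
    location_of_original: Optional[str] = None
    copyright_info: Optional[str] = None
    notes: List[str] = field(default_factory=list)  # e.g., how a date was determined

item = CorrespondenceMetadata(
    author="Frank Wedekind",
    addressee="Stefan Zweig",
    document_type="postcard",
    notes=["Date inferred from the postmark."],
)
print(item.document_type)  # postcard
```

Making most fields optional mirrors the situation described above: metadata is documented "if available", and free-text notes capture how uncertain values were determined.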

Since adding data sets of correspondence partners (person entities) is time-intensive, the research team of P1 and P3 decided to create place, city and person entities in the database in advance to speed up the annotation process later on. This is possible through information from the Excel files described in Sect. 5.4. Entities can also contain additional contextual information, such as a biography for a person. Creating person entities is not seen as a problem, since databases with the necessary information are already available for many of them (P3). However, completely unknown persons, who appear in this context for the first time and have never been mentioned anywhere else, might require additional research. Another example is the change of city or street names, places and countries within the last century. Researchers use, for example, phone books from the time of the correspondence, which are often available in archives/libraries (see Fig. 4b), to validate street names.

5.6 Transcription

We observed different setups and workflows for transcribing and reviewing correspondence items in the contextual inquiries. P1 and P3 use a desktop PC with two screens, one showing the facsimile and the other the word processing software or web browser (see Fig. 4a). P2 uses a laptop, thus a single screen, but works with copies/printouts of facsimiles (see Fig. 4c). P1, P2 and P3 use keyboard and mouse; P2 did not use the touchpad of the laptop.

Fig. 4.

A: Workplace set-up with two screens; B: using an external archive to research street names in phone books; C: paper copy of a correspondence item showing hand-drawn music notes.

The research group of P1 and P3 decided to create a template in the form of a Word file, which is used to collect the metadata, transcription and other information of a correspondence item in a single document. The Word file is the working basis for all following steps. P2 and P3 explained that one Word file is used to collect a complete correspondence, but P3 also stated that this depends on the number of correspondence items and the workflow of the editor. Some editors in the research group use one Word file per correspondence item.

All editors of this digital scholarly edition adapted the workflow of transcribing to a Word file first. Multiple reasons have been identified. System crashes and software errors in early prototypes/releases of the system could cause yet-unsaved transcriptions to be lost (P2, P3); thus, the Word file also serves as a backup for transcriptions, metadata and other notes. The file can contain a whole correspondence instead of a single correspondence item (P3). This is important for large correspondences, where additional notes are often taken, which the current system lacks functionality for (P3). It becomes easier to find the order of correspondence and to determine a possible time range for unknown dates of writing when an overview of the whole correspondence is accessible (P3). P3 also stated that the digital scholarly edition system provides “very little methodological control”, which can easily lead to mistakes.

An initial transcription is done by one editor and usually reviewed by at least one, most often two or more, other editors. A review in this context means that another editor collates the transcription, metadata, annotations and comments for correctness and completeness. Deciphering handwriting can be difficult because the language, and thus its expressions, might differ from today’s language. Therefore, a discussion about the meaning of certain characters/words might be required before a conclusion can be reached. P2 described the review process as letting the reviewer read the text from the original document out loud so that the editor can check/confirm the transcription. P3 explained a similar review process. Thus, at least two people read the same original document, which allows detecting misreadings, wrong interpretations of handwriting or other errors. P2 also stated that at least two people review the transcription and that this is necessary to ensure its quality. P1 and P3 use a Word file with the transcription, metadata and comments for review. The Word comment function is used to add comments and to mark problematic areas or cases of uncertainty with text markings. Editors can leave notes for the reviewer to point to a specific, already known problem.

5.7 Annotations

Sahle states that textual criticism, comments, annotations and contextual texts “have to substantiate the claim that this is the best possible representation of the editorial subject” [13]. Therefore this digital edition provides a What-You-See-Is-What-You-Get (WYSIWYG) Editor, the key component for annotating, commenting and adding contextual texts to a transcription.

There are different types of annotations available: visual annotations, e.g. font type, underlining, strikethrough, and bold or italic font; annotations of entities, such as a person, city, literary work or event; and damage to the material of the original correspondence item, e.g., when parts of the text were unreadable due to water damage.

Editors apply annotations in different orders and at different stages. P2 prepares a Word file with transcriptions and annotations. Custom markings in the form of sign/letter combinations are added for unsupported or edition-specific annotations. Copy/paste is used to transfer the text to the WYSIWYG Editor, which converts some of the Word formats automatically to corresponding annotations. The remaining custom markings have to be replaced with annotations from the WYSIWYG Editor, which are internally handled as an XML structure.

P3 and P1 use similar custom markings, e.g., to mark that a part of the text was written in Latin script instead of the default script “Kurrent”. The custom marking is manually replaced with an annotation in the WYSIWYG Editor later on. P3 describes the first step as “to get the text into the database” and the second step as formal indexing, which means finding and annotating entities in the transcribed text, such as a city, person or place. While P1 adds contextual comments directly after annotating entities, P3 states that this step is done later.
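As a sketch of this manual replacement step, the following shows how such a custom marking could be converted programmatically. The marking `#lat ... lat#` and the target `<hi>` element are invented for illustration; the group’s actual notation and the editor’s internal XML structure are not specified here.

```python
import re

# Hypothetical custom marking '#lat ... lat#' denoting Latin script,
# replaced by an (equally hypothetical) inline annotation element.
MARKING = re.compile(r"#lat\s(.*?)\slat#")

def convert_latin_markings(text: str) -> str:
    """Replace '#lat ... lat#' markings with <hi script="latin"> annotations."""
    return MARKING.sub(r'<hi script="latin">\1</hi>', text)

sample = "Adresse: #lat Berlin W. lat# , weiter in Kurrent."
print(convert_latin_markings(sample))
# Adresse: <hi script="latin">Berlin W.</hi> , weiter in Kurrent.
```

Automating this substitution, instead of replacing markings by hand in the WYSIWYG Editor, is exactly the kind of small workflow support that could reduce the mistakes P3 attributes to the lack of methodological control.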

5.8 Editorial Guidelines

“The most basic exigency in traditional editing – State your rules and follow them! – is as well the central law and starting point of all digital editing” [13]. Due to software updates and increased possibilities, editorial guidelines might be subject to change. This might be caused by an increased functionality of the digital scholarly edition, such as support of further annotations. Editorial guidelines describe how correspondence items are annotated, transcribed and presented, but also provide information about the scholarly edition itself, its purpose and aim.

6 Interdisciplinary Work

Working in an interdisciplinary team has the advantage (and luxury) of discussing and defining common guidelines to optimise working together. In this section, we describe the methods that were most beneficial for this team.

6.1 Workshops

We regularly organise workshops with the researchers of the digital scholarly edition, in which new concepts, ideas, issues and possible solutions are discussed. It is a critical but constructive process, which also supports the knowledge transfer from computer science to digital humanities and vice versa. Discussions include, for example, explaining the limitations, constraints or feasibility of certain ideas with regard to technology, the advantages and disadvantages of certain design decisions (e.g., non-responsive vs. responsive design), and looking into reported bugs/errors with the team.

For example, an important improvement of the digital edition’s website was the switch to responsive design and accessibility (according to WCAG). Responsive design allows viewing the same content on different devices; the content is adjusted based on the available browser space. This, however, might interfere with the expectation of content that is always aligned/adjusted in the same way, as in a PDF or a printed book. We used a workshop to present test pages that exemplify how responsive design will affect the presentation of the content of the digital scholarly edition. One example is depicted in Fig. 5. Eventually, this led to changes in how content was formatted, i.e. aligned, to avoid issues in the presentation of content on mobile devices.

We recommend interdisciplinary workshops to support the knowledge exchange. It is important that computer scientists understand the requirements of the digital humanities, but it is also important that editors understand the constraints and possibilities of technology.

Fig. 5.

Mock-up for a workshop showing how different screen sizes, orientations and resolutions affect the layout of an example text

6.2 Focus Groups

Focus groups “involving multiple participants can be useful for understanding a range of perspectives, but execution and analysis can be complicated by the dynamics of individuals working in a group” [9]. We can confirm the positive effect of focus groups on understanding a range of different perspectives. Our interdisciplinary team organised a focus group with seven students as part of a seminar lecture about letter culture around 1900. A homework exercise was given in advance, which included 8–10 tasks about finding certain information on the digital scholarly edition's website. This was used as a basis for discussion during the meeting. The students, aged 20 to 25, explained how they used the website and what solutions they found for the given tasks. They also discovered usability issues, suggested improvements and criticised the choice of colours (among other things).

The most important takeaway from this focus group was, however, the reports about how and when students used the website. One student reported that she browsed and read letters published in the digital scholarly edition on her mobile phone while taking the bus to the university. Having had a long discussion about the usefulness of responsive design and other design choices only weeks earlier, the team became aware of different contexts of use and, in consequence, of the importance of responsive design.

The constructive feedback from this focus group led to further improvements of the digital scholarly edition's website.

6.3 Feedback and Error Reports

We defined a very simple rule for handling bug reports in this project: editors should not waste time filing bug reports; instead, reporting should be made as easy and convenient as possible. All error reports are sent to our mailing list with a short error description, the time of the error, a screenshot (if possible) and any other useful information. After an error report is received, the log files are checked for the detailed error descriptions that the system logs automatically. The log file excerpt and the error report are then entered into the ticket system, i.e. the bug tracking system, and categorised. Bugs are then solved incrementally in the software engineering process, and well-written error reports allow this to happen in a timely manner.

6.4 Testing

Testing with users is a critical and important step in every (human-centred) design process. Thus, at the beginning of the project we set up a second server, hereafter called the test server. The only differences for users of the test server, compared to the production system, are a different URL and a username/password prompt. The prompt protects the test website from being crawled by search engines, which could confuse visitors who accidentally end up on the test server instead of the production system.

The test server is used for internal tests, and every pre-release version is tested there before it enters the production system. We have tried remote tests with researchers, but due to the lack of dedicated time for testing, users did not engage with the system as they would with the production system. Instead of strictly following the guidelines for entering metadata, users on the test system skipped steps and entered less metadata, fewer annotations and less contextual information than usual. Thus, many errors were not found and were later rolled out to the production system, even though the system had also been tested by the software engineers. This shows again that “Designers Are Not Users” [10] and that testing with real users is indispensable.

To cope with this, we blocked some time for testing in workshops and meetings, and used the contextual inquiries to gain insight into pre-releases.

6.5 Sketching, Low-Fidelity Prototypes, and Live-Coding

The domain of digital humanities has a strong focus on text, but it is difficult to convey ideas for interaction styles, page layouts and more via text alone. Instead, in workshops and meetings, we use sketches, paper prototypes and sometimes clickable low-fidelity prototypes to show how certain ideas, concepts and thus solutions could be implemented. In workshops, the whiteboard is also often used to discuss identified problems and solutions. Figure 6 shows an example sketch, where the transcription and commenting functions are shown next to a facsimile viewer.

In some workshops, the principles of live-coding were used to test different variants of how content is presented on the digital edition's website. The browser's developer tools were utilised to change style sheets and HTML structures to show and explain features and possibilities. A screenshot from a live-coding session is shown in Fig. 7.

Fig. 6.

Sketches of different states of the WYSIWYA Editor (left & middle) and a facsimile viewer (right)

Fig. 7.

Live-coding: changing the HTML/CSS of the website during a workshop presentation using a browser's developer tools

7 Discussion

There are two perspectives that are important to differentiate, but that are at the same time intertwined with each other. The first is the perspective of the editor, who enters data into the system. The second is the perspective of the users (e.g., students, researchers) who use the data for various purposes, such as literature research.

7.1 Iterative and Incremental Process

Any given system comes with certain constraints. The project has been running since 2012 and can be divided into three very different phases.

In the first phase, from 2012 to 2014, a prototype version was created. No structured design process was used; the requirements were collected iteratively and incrementally in discussions with the users at the time. As a result, the system runs on a relational database that provides a certain structure to store data and relations between data entities. Being developed by computer science students, it was implemented with a graphical user interface with input controls for every property and relation in the database tables. It therefore does not reflect the tasks and workflows of editors of digital editions, but rather provides a graphical user interface that looks (and feels) like working with a database to store and archive correspondence items.

This has positive but also negative aspects. On the positive side, there is a relational model of how data is stored and connected: it provides a fixed, consistent and clear structure with data fields expecting input in certain formats. This allows the data to be checked for consistency using standard queries (e.g., SQL queries).
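Such consistency checks can be illustrated with a small sketch. The schema below is hypothetical and heavily simplified (the real edition database is far richer); it only demonstrates the kind of standard queries meant here: finding dangling references and malformed values.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,   -- references person(id)
        sent_date TEXT       -- expected format: YYYY-MM-DD
    );
    INSERT INTO person VALUES (1, 'Frank Wedekind');
    INSERT INTO item VALUES (10, 1, '1890-05-02');
    INSERT INTO item VALUES (11, 99, 'spring 1890');  -- two inconsistencies
""")

# Check 1: correspondence items whose sender does not exist.
dangling = [r[0] for r in con.execute("""
    SELECT item.id FROM item
    LEFT JOIN person ON item.sender_id = person.id
    WHERE person.id IS NULL
""")]

# Check 2: items whose date does not match the expected format.
bad_dates = [r[0] for r in con.execute("""
    SELECT id FROM item
    WHERE sent_date NOT GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
""")]
```

Both queries flag the second item, showing how a fixed relational structure makes such violations mechanically detectable.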

However, being confined to the given structure, editors can struggle to enter data that was not expected or foreseen when the data model was created. This is a natural problem when we consider the tasks and workflows of editors. We will come back to this point later.

From 2014 to 2018, the system was not further developed in terms of IT, but the editors entered a number of documents and other artefacts during this time. This data is now stored in the selected structure in the database.

In the third phase of the project, since 2018, the project has been worked on intensively by a new and larger group of editors and a professional software developer. The new and more numerous users naturally lead to new requirements. Correspondence items are entered incrementally, sometimes as bulk packages of multiple correspondence items between two authors. In the process of entering, annotating and commenting on those items, discoveries are made, and editors may want to annotate, comment on or store these interesting details, patterns and relations. This leads to discussions among the editors in the research group about how this information might be entered into the system. We have observed multiple cases of storing data in ways that circumvent, i.e. exploit, the given data structure.

Therefore, since 2018, we have been further developing the system based on these new requirements, using the double diamond design process as a guideline for design methods. Ideas, needs and wishes are regularly discussed in joint interdisciplinary workshops. This allows us to conclude that the work process of digital editors is incremental and iterative, and that it connects to the software engineering process on multiple occasions.

There are two main types of new requirements: (D) The nature of the data was not correctly recorded; (A) Annotations are missing in order to record the contents of the letter text with editorial accuracy.

As an example of category (D), only persons were originally intended as correspondence partners. During scientific research and the discovery of further correspondence items, it turned out that journals can also be correspondence partners. Of course, journals have completely different characteristics from persons, and therefore the data model of the database must be adapted. Adapting a relational schema for already existing data is a complex process that has to be carried out very carefully.
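One way such an adaptation could look is sketched below. The schema is hypothetical and simplified: existing person rows are migrated into a generalised partner table with a type column, so that journals can be recorded alongside persons. The journal name is purely illustrative; a real migration would run against a backed-up database and be tested beforehand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Original, simplified schema: only persons as correspondence partners.
con.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER);
    INSERT INTO person VALUES (1, 'Frank Wedekind', 1864);
""")

# Careful adaptation: introduce a generalised partner table, migrate the
# existing rows, then retire the old table.
con.executescript("""
    BEGIN;
    CREATE TABLE partner (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL CHECK (type IN ('person', 'journal')),
        name TEXT NOT NULL,
        birth_year INTEGER,        -- only meaningful for persons
        place_of_publication TEXT  -- only meaningful for journals
    );
    INSERT INTO partner (id, type, name, birth_year)
        SELECT id, 'person', name, birth_year FROM person;
    DROP TABLE person;
    COMMIT;
""")

# Journals can now be stored as correspondence partners as well.
con.execute(
    "INSERT INTO partner (type, name, place_of_publication) VALUES (?, ?, ?)",
    ("journal", "Simplicissimus", "Munich"),
)
```

The CHECK constraint keeps the partner types explicit, so later queries and statistics can distinguish persons from journals without guessing.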

It also happens that editors enter unexpected data into fields outside their intended purpose as a workaround, which can lead to ambiguous data where unambiguous data was expected. This can affect algorithms that compute statistics based on such a field; consequently, the algorithms must be adapted. Generally speaking, due to the newly introduced complexity in the data fields, new evaluations have to be carried out, or the data model has to be changed to take the new requirements into account, which also requires the transformation of existing data.

There are also interesting challenges regarding annotations (A). The number and variety of annotations in the field of edition science is very large. Here it is important to convey that only true TEI annotations can be evaluated later; characters that are used for markup (as in print editions) make later searches more difficult or even impossible. Editors use existing annotations, but combine/cascade annotations or define certain characters in combination with an annotation to create a new annotation that the system did not previously provide. For example, text annotated as “bold” is used to represent printed text (not written by hand or typewriter) on a postcard. Another example is the combination of square brackets with text annotated as italic. Information seekers might expect to find such combinations via the search function, because they can easily be mistaken for real annotations in the presentation of the text. Here it is important to establish a process that allows missing annotation types to be added to the WYSIWYG editor in order to avoid these and other quality problems later on.
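Such workaround patterns can, at least partially, be detected automatically. The sketch below scans a simplified, namespace-free TEI-like fragment for the bracket-plus-italics combination described above; the fragment and the check are illustrative only, not the edition's actual validation code.

```python
import xml.etree.ElementTree as ET

# Illustrative TEI-like fragment (simplified, no namespaces): the first
# <hi> combines square brackets with italics as an ad-hoc workaround.
fragment = """
<p>Am <hi rend="italic">[Montag]</hi> kam die <hi rend="bold">POSTKARTE</hi></p>
"""

def find_workarounds(xml_text):
    """Return (rend, text) pairs of <hi> elements containing brackets.

    Bracket characters inside an annotated span suggest that an editor
    invented a pseudo-annotation the system does not yet support.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for hi in root.iter("hi"):
        text = hi.text or ""
        if "[" in text or "]" in text:
            hits.append((hi.get("rend"), text))
    return hits
```

A report generated from such a check could feed the process of adding the missing annotation types to the editor.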

7.2 From Print Edition Tool and Mindset Towards Digital Editions

“What You See Is What You Get” (WYSIWYG) is no longer a fitting expression for the work in digital editions. Digital editions offer many more possibilities, precisely because they are interactive, interconnected and dynamic. Annotations are one of the reasons, and the more we can use information technology to parse and understand text (through automated text mining, manual annotations or contextual comments), the more opportunities we can create. We suggest calling it “What You See Is What You Annotated” (WYSIWYA) instead, showing that we have moved away from ‘just’ visual formatting as it was standard for print editions.

From the perspective of information retrieval, annotations are a way of indexing the content, allowing information seekers to find information better and in different ways. Thus, a WYSIWYA Editor that allows editors to efficiently add annotations, contextual information and comments to transcribed texts can reduce the time needed to build a digital scholarly edition. Time saved by not having to enter and fill complex XML structures manually can be used for research or for improving the digital scholarly edition. Once WYSIWYA Editors are configurable to dynamically support new annotations as needed by the editors, they will allow new levels of contextual enrichment to be added to transcribed text. This can then be used to improve the annotation task for new transcriptions, by detecting similar text patterns or by suggesting entities that have been used in the remainder of a correspondence. Since it is a scholarly edition, the editor must remain in control of what is annotated, and annotations have to be verified by an editor.
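The suggestion mechanism mentioned above could be sketched as follows. The entity list and names are purely illustrative assumptions, and the matching is deliberately naive (exact substring search); the point is that the system only proposes candidates, while the editor verifies each one.

```python
# Hypothetical sketch: suggest annotation candidates for a new
# transcription based on entities already annotated elsewhere in the
# same correspondence (names below are purely illustrative).
known_entities = {
    "Tilly Newes": "person",
    "Simplicissimus": "journal",
    "München": "place",
}

def suggest_annotations(transcription):
    """Return (entity, type, offset) candidates, ordered by position.

    These are only suggestions: an editor must verify every annotation
    before it becomes part of the scholarly edition.
    """
    suggestions = []
    for entity, entity_type in known_entities.items():
        offset = transcription.find(entity)
        if offset != -1:
            suggestions.append((entity, entity_type, offset))
    return sorted(suggestions, key=lambda s: s[2])

text = "Gestern traf ich Tilly Newes in München."
candidates = suggest_annotations(text)
```

A production version would need fuzzy matching for historical spelling variants and all occurrences per entity, but even this simple form illustrates how prior annotation work can speed up new transcriptions.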

In workshops and discussions, many different ideas came up for improving the work of editors of digital scholarly editions. These ideas range from visual representations and statistical analyses of correspondence items to automatic validation mechanisms. All of them aim towards a more efficient process of creating a digital scholarly edition. The quest to improve the work of digital scholarly edition editors is not yet complete; however, first prototypes look promising.

8 Conclusion

In this paper, insights into the work of editors and researchers of a digital scholarly edition were presented, and the design and development processes were described. The identified tasks and workflows showed limitations and typical challenges in creating a digital scholarly edition. We showed that a solution is an interdisciplinary design process that connects the domains of digital humanities and software engineering. Working in an interdisciplinary team between two domains requires methods for discussion and knowledge exchange, such as regular and frequent workshops. It is beneficial to dive into the respective other domain in order to build up domain knowledge and terminology, which in consequence improves the communication between domains.

Methods from the design process should be adapted as necessary. There is a multitude of methods available, which allows teams to pick the ones that work best, instead of wasting time on methods that are not accepted or simply do not work in the particular case. We used an incremental, iterative, not necessarily sequential design process in parallel to an “agile” software engineering process, which allowed us to tackle the problems of each domain individually. This creates space for creative, exploratory means, but also allows us to be exploitative in solving and optimising deeply technical issues. We suggest making use of both: exploring opportunities and breaking out of established patterns, but also investigating and optimising technology. To create the next generation of digital scholarly editions, one will need both a well-designed, useful and usable interface that helps and supports users in achieving their goals, and a platform that is highly functional, stable, maintainable, extendable and adaptable to fulfil the users' needs. Developers and designers should not forget to answer the questions “who is the user?”, “what are the tasks?” and “what does the user need?” (i.e., finding the right problem(s)) before starting to build the right solution(s).

With this paper, we have described specific tasks and shown different workflows of how editors and creators of digital scholarly editions work. Based on the insights gained, we are now able to continue improving the next generation of a platform for digital scholarly editions.