1 Introduction

Today, humans communicate text to computer systems mostly through manual methods (Footnote 1), using various types of keyboards. Compared with voice and brain-computer interfaces, which have recently been explored as potential alternatives, manual entry remains predominant for good reasons that have much less to do with technology than with how people work. Manual methods are used not only because they are more efficient (accurate and fast), but also because they let people work without disturbing others and without revealing to everyone else what they are doing. As a result, despite the technological leaps in all other aspects of human-computer interaction, and against all predictions, the typical typewriter-like keyboard has remained unbeaten in text entry for decades and still seems far from being surpassed. A significant disadvantage of the typical keyboard, and one that receives serious attention today, is its large size and limited portability. As we rely more and more on mobile devices, and as our need for ubiquitous access to text entry keeps growing, we are often forced to leave our keyboards at home and settle for less powerful text entry solutions, such as smaller keypads, onscreen keyboards, and speech-to-text systems.

We therefore argue that text entry has become a bottleneck in the communication between humans and modern computer systems. In fact, we believe that user performance (speed and accuracy, Footnote 2) degrades significantly in mobile and on-the-move text entry. Furthermore, each time a developer integrates a text input method into a new product (e.g., a smart TV, ATM, car navigation system, or medical machine), the user is often confronted with a whole new input model that suffers from poor performance rates, serious accessibility issues, and uneven user satisfaction levels. Clearly, this limits to a certain degree the overall potential of ICT and the realization of the disappearing computer vision. One could therefore argue that a major leap in text entry that overcomes such barriers could revolutionize the interaction between humans and computers and release a huge technological and human potential, especially in the emerging era of cloud computing and ambient intelligence.

This work aims to contribute to current research on modern text entry solutions that will unblock this human potential, achieve better performance rates, and make the dominant keyboard lose the throne it has held for so many years.

2 Text Entry in Brief

Manual text entry, as the human process of communicating text to computer-based systems, is in essence an alternative way of writing, in other words, an alternative way of storing text in a tangible medium. The main difference is that the storage medium is now computer memory rather than paper. Naturally, the new medium brought its own set of affordances and limitations. For instance, computers require users to select among a set of pre-encoded characters (Footnote 3), whereas paper, as a flat surface, implies the use of 2D symbols. As a whole, the new medium brought along new options that changed writing, among other things, forever. For instance, it enabled authors to modify transcribed texts without restriction and to reproduce parts of a text very easily (e.g., through copy-paste).

Manual text entry to computer systems is essentially a selection process. It can be defined as the human process T of choosing from a set of computer-encoded characters C and modifying options M (Footnote 4) by means of a total number i of discrete, machine-detectable user actions, such as key taps or gestures.

Typing, originally introduced by mechanical typewriters, has been an enduring manual text entry method since the early emergence of computers and command-line interfaces. This method maps characters and modifiers to a set of keys spread across a device. Keyboards with different numbers of keys exist, but the 101/105-key keyboard with a QWERTY-like layout is still predominant. With the emergence of graphical user interfaces, menu-based text entry was introduced as a supplement to typing, allowing users to enter additional characters not found on the keyboard through menus. Such menus are now perceived as the ancestors of the later forms of soft or onscreen keyboards. The latter are features of a program or operating system that render a keyboard graphically on screen, which is then operated through a mouse, stylus, etc. Soft keyboards are particularly useful in two cases: (a) for making text entry accessible to people with limited manual dexterity, e.g., they can be combined with scanning techniques and operated through a few special switches; (b) for removing keypads from mobile devices and reducing their physical size, e.g., characters can be selected by tapping on a touch display. In both cases, soft keyboards are most usually a representation of a full-size QWERTY keyboard. With the emergence of mobile devices, typing and menu-based methods were fused to allow text entry through few keys [1]. Soon, the success of SMS messaging on mobile phones brought attention to 12-key keypads, with the letters A-Z traditionally encoded on eight keys (Footnote 5), and a number of 12-key entry methods emerged for handling keys that carry more than one letter [2], with multi-tap and dictionary-based disambiguation being the most common among them.
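To make dictionary-based disambiguation concrete, the following minimal Python sketch maps words to the key sequence that produces them on a standard phone keypad and groups words that share a sequence. The word list and the function names `key_sequence` and `build_index` are illustrative assumptions, not part of any method cited here.

```python
from collections import defaultdict

# Standard letter assignment on a phone keypad (eight keys carry A-Z).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def key_sequence(word: str) -> str:
    """Return the keys pressed once each to enter `word` (one press per letter)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(words):
    """Group dictionary words by key sequence; ambiguous sequences map to several words."""
    index = defaultdict(list)
    for word in words:
        index[key_sequence(word)].append(word)
    return index


# Hypothetical four-word dictionary, for illustration only.
index = build_index(["home", "good", "gone", "text"])
print(index["4663"])  # candidates the user must choose among
```

With one key press per letter, the sequence 4663 is ambiguous among several words, which is why such methods need a candidate list and why they can clutter the display with guesses.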

3 Background and Previous Related Work

The work presented here was originally motivated by previous research of the lead author and his former colleagues on text entry methods for users with limited motor functionality of the upper limbs. Out of those efforts, the basic concept of the AUK keyboard emerged [3]. AUK is basically a multi-tier 3 × 3 menu system, consisting of a primary menu, where eight cells are used for the alphabet letters and one cell is reserved for entering additional, again 3 × 3, menus that accommodate additional characters C and modifying options M. Through this strict hierarchical structure, AUK supports: (a) minimum keystrokes per character (KSPC) on average, (b) entry through the widest possible range of input devices, including keypads, indication devices, joysticks, etc., and (c) various layout configurations for achieving even lower KSPC values [1].

Back then, during the early development stages of the AUK concept as a text entry solution for disabled users, we also saw in it the possibility of developing an ambient and ubiquitous text entry solution for all, for instance through optical tracking of the user's fingers [3]. A novel virtual keyboard (Footnote 6), not necessarily visible, was conceived for 10-finger text entry as a replacement for traditional keyboards. Each finger is mapped to one of the virtual keys of AUK's 10-key mode (ibid.); if, for instance, the user makes a discrete move of the little finger of the left hand, the character ‘a’ is entered (see Fig. 1).

Fig. 1. Towards ambient ten-thumbs text entry (reprinted from [3]).

Given that a few successful tracking solutions exist today, we now see a great opportunity to proceed in this direction and work on a universal text entry solution. First, however, let us identify the desired quality characteristics of such a solution.

4 Guidelines for Universal Text Entry Solutions

After several years of research in text entry and hands-on experience with hundreds of methods, we have listed a number of features that we would expect to find in an ideal, universal, manual text entry method that could ultimately ‘show the exit’ to the traditional keyboard. More specifically, we argue that an ideal method shall:

  1. Prevent user errors and allow for easy error recovery. An ideal method shall minimise the frequency of user errors, such as typos due to wrong character selection, and shall allow for rapid error identification and recovery.

  2. Support fast typing. The new method shall offer better typing rates than previous methods. Anything below the entry rates of today's solutions on smartphones and tablets would be a complete failure.

  3. Be easy to learn and use. Ideally, a new method would allow for a smooth transition from the current ways and would progressively introduce tips and adjustments to help users master it.

  4. Support WYSIWYG (Footnote 7). An ideal method shall display the exact letters and words that are in the mind of the writer and avoid cluttering the display with rough text and word guesses (as with Dasher or T9, for instance).

  5. Support blind typing. An ideal text entry method needs to be target-less and potentially eyes-free. The user should not have to search for, and achieve physical contact with, a specific area (physical or virtual, e.g., a key). Removing this need eliminates performance delays related to time spent (a) in visual scanning for target identification (Hick-Hyman law), and (b) in selecting the target (Fitts's law). The user is then free to focus on the transcribed text and the task at hand.

  6. Support minimal hand and finger movement and travelling. For instance, during typing on a typical keyboard, finger and hand movements are quite intense, due to the need to travel across the keyboard (and to the mouse) to reach different keys, which is both time and energy consuming.

  7. Accommodate functionality for indication. Ideally, both keyboard and mouse functionality shall be integrated into a single interaction concept, for instance by employing finger tracking both for typing and for controlling the mouse cursor.

  8. Support entry through any desired number of detectable user actions i. An ideal method needs to allow flexible configuration to any number of distinct user actions i, depending on each user's ability and preferences. In other words, it is desired that the method can be easily set or switched to i = 10 mode (two hands), or i = 5 (one hand), or i = 3 (special assistive switches), or i = 1 (one finger), etc.

  9. Support multi-device compatibility (if not full device independence). An ideal method shall work equally well with any existing device type. The user needs to have a single model regardless of whether they are working with finger tracking, a mouse alone, special switches, a joystick, or the keyboard arrow keys.

  10. Be smart and adaptable. An ideal text entry system shall be able to propose specific optimisations for options and functions used very often by the user, and the process of writing shall be easily or even automatically adapted to various usage conditions, such as switching to one-hand mode, typing in the dark, etc.

  11. Support additional special characters. An ideal method would offer easy and rapid access to numbers and special characters.

5 Refining the AUK Concept

Here the reader is briefly introduced to the basics of the AUK concept, according to which everything must be organised in sets of nine (3 × 3) options. As mentioned earlier, in text entry the user, in order to transcribe the text in mind, chooses repeatedly among a finite set of supported characters C and modification/control options M. AUK is nothing more than a multi-tier menu system for organising all these options. The proposed structure (keyboard) consists of multiple levels of sets of strictly nine options (menus), starting with a primary menu (home menu), where eight “keys” are used for entering letters and one is reserved for switching between other similar 9-option basic menus in which extra characters and modification options are accommodated. Each 9-option menu can be visualised as a 3 × 3 grid that can accommodate option keys and at least one menu switch key for entering or exiting alternative menus. Up to nine basic menus are foreseen, and these can be interchanged sequentially through a consistent menu switch key (see Fig. 2). Each individual option key can itself accommodate up to nine options. Typically, only a part of these options is used, leaving the rest empty (see Fig. 3).

Fig. 2. Navigating through consecutive basic menus in the proposed 3 × 3 multi-tier structure.

Fig. 3. Each option key (grey) on a menu is able to accommodate 1 to 9 input options (e.g., characters). For instance, in this example (right), “key 9” hosts the letters “w”, “x”, “y” and “z”; selecting this particular key three times in a row therefore produces the character “y”.
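The selection rule in the Fig. 3 example can be sketched in a few lines of Python. Only the content of “key 9” is taken from the figure caption; the function name `multitap` and the wrap-around behaviour for extra presses are assumptions made for illustration.

```python
# Options hosted on each option key; only "key 9" is taken from Fig. 3.
KEY_OPTIONS = {
    9: ["w", "x", "y", "z"],
}


def multitap(key: int, presses: int, options=KEY_OPTIONS) -> str:
    """Return the option selected by pressing `key` `presses` times in a row."""
    if presses < 1:
        raise ValueError("at least one press is required")
    slots = options[key]
    # Assumption: pressing more times than there are options wraps around.
    return slots[(presses - 1) % len(slots)]


print(multitap(9, 3))  # -> y
```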

The home menu (primary menu) shall typically accommodate the letters A-Z placed on eight option keys, and the menu switch key can be used for switching to the next basic menu and for timeout kill. Letters can be placed alphabetically, similarly to a phone keypad (see Fig. 4, left). The multi-tap technique can then be used for character disambiguation (i.e., for selecting among the up to nine sub-options on each key), and a system timeout for entering consecutive letters from the same key. The main benefit of such an approach is clear: familiarity. However, there are also some disadvantages: (a) timeouts slow users down; (b) with an alphabetic layout, some frequent letters require more keystrokes than some less frequently needed letters; (c) some very common characters or modifiers need to be pushed down to deeper-level menus. Taking into account that several studies (e.g., [4]) have indicated that an alphabetic order is not critical for user experience, more effective layouts can be created by disregarding the alphabetical order altogether. Thus, alternative arrangements of the letters and characters are considered (see [3]) for achieving fewer keystrokes per character (KSPC [5]). For instance, the apostrophe, SPACE (SP), BACK (BK), and SHIFT (SH) (Footnote 8) can be added on the option keys of the home menu, and letters can be re-arranged (see Fig. 4, middle), so that (a) the letters on each key have the lowest possible digraph frequencies [6] in an English text corpus, so that consecutive letters on the same key (and thus timeouts) occur less frequently (Footnote 9), and (b) the letters on each key are arranged in decreasing order of letter frequency in the English language (Footnote 10), thus ensuring fewer ‘taps’ for the most frequent letters.

Fig. 4. The home menu (letters) based on the typical phone keypad layout (left) and on alternative layouts for reaching lower KSPC and fewer timeouts with 9 keys (middle) or joystick (right). The frequency order used for generating this layout is: SP BS e t a SH o i n s h r d l c u m w f g y p b v apostrophe k j x q z.
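The two layout criteria discussed above can be explored with a small Python sketch. It reuses the frequency order quoted in the Fig. 4 caption; the alphabetical phone-style grouping, the function names, and the simplified KSPC (multi-tap presses only, ignoring timeouts and any character not on the layout) are assumptions for illustration, not the authors' evaluation procedure.

```python
# Frequency order as quoted in the Fig. 4 caption (most frequent first).
FREQUENCY_ORDER = (
    "SP BS e t a SH o i n s h r d l c u m w f g y p b v apostrophe k j x q z"
).split()
RANK = {option: position for position, option in enumerate(FREQUENCY_ORDER)}

# Assumed alphabetical grouping, similar to a phone keypad.
PHONE_LAYOUT = [list(group) for group in
                ("abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz")]


def order_key(options):
    """Arrange the options of one key in decreasing frequency (fewest taps first)."""
    return sorted(options, key=RANK.__getitem__)


def taps_per_option(layout):
    """Map each option to the number of multi-tap presses it needs on its key."""
    return {option: presses
            for key in layout
            for presses, option in enumerate(key, start=1)}


def kspc(text, layout):
    """Average multi-tap presses per character, over characters found on the layout."""
    taps = taps_per_option(layout)
    counted = [ch for ch in text if ch in taps]
    if not counted:
        raise ValueError("text contains no character from the layout")
    return sum(taps[ch] for ch in counted) / len(counted)


def same_key_digraphs(text, layout):
    """Count consecutive character pairs that sit on the same key (each needs a timeout)."""
    key_of = {option: index for index, key in enumerate(layout) for option in key}
    return sum(1 for a, b in zip(text, text[1:])
               if a in key_of and b in key_of and key_of[a] == key_of[b])


REORDERED_LAYOUT = [order_key(key) for key in PHONE_LAYOUT]
print(order_key(["w", "x", "y", "z"]))  # -> ['w', 'y', 'x', 'z']
```

Reordering letters within a key lowers the tap count but leaves the number of same-key digraphs unchanged; reducing timeouts requires moving letters between keys, which is the first criterion in the text.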

In addition to the home menu (the “letters menu”), several additional menus (always as sets of nine options) can be introduced as described above, i.e., either as basic menus reached through the main menu switch key or as secondary menus reached through secondary switch keys (a kind of link) that can be placed on any other menu in place of an option key. Such menus can include navigation options, numerals, special characters, brackets, numeric operators, etc. (see Fig. 5).

Fig. 5. Indicative organisation of additional characters and other options into secondary menus.

The class diagrams in Fig. 6 summarise the proposed structure. As discussed in [3], using this menu structure for arranging all text entry options delivers a universal model that works in an optimum way with any value of i between 1 and 9. For instance, with this structure the average KSPC is lower than in any other arrangement (e.g., a QWERTY-like arrangement), even in 1-button mode, 5-key mode, joystick mode, etc.

Fig. 6. Class diagrams for the proposed multi-tier structure.
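As a rough illustration of the structure described in this section, the following Python sketch models menus of exactly nine keys, each key being either an option key (1 to 9 options) or a switch key. It is not a transcription of the class diagrams in Fig. 6; the class names, the `select` method, and the example menu contents are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

SLOTS = 9  # every menu is a 3 x 3 grid


@dataclass
class OptionKey:
    options: List[str]  # 1 to 9 input options (characters or modifiers)

    def __post_init__(self):
        if not 1 <= len(self.options) <= SLOTS:
            raise ValueError("an option key hosts 1 to 9 options")


@dataclass
class SwitchKey:
    target: str  # name of the menu this key opens


@dataclass
class Menu:
    name: str
    keys: List[Union[OptionKey, SwitchKey]]

    def __post_init__(self):
        if len(self.keys) != SLOTS:
            raise ValueError("a menu has exactly nine keys")
        if not any(isinstance(key, SwitchKey) for key in self.keys):
            raise ValueError("a menu needs at least one switch key")


class Keyboard:
    def __init__(self, menus: List[Menu], home: str):
        self.menus: Dict[str, Menu] = {menu.name: menu for menu in menus}
        self.current = home

    def select(self, slot: int, presses: int = 1) -> Optional[str]:
        """Activate key `slot` (0-8): switch menu, or return the multi-tapped option."""
        key = self.menus[self.current].keys[slot]
        if isinstance(key, SwitchKey):
            self.current = key.target
            return None
        return key.options[(presses - 1) % len(key.options)]


# Hypothetical menus, for illustration only.
home = Menu("home", [OptionKey(list(g)) for g in
                     ("abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz")]
            + [SwitchKey("digits")])
digits = Menu("digits", [OptionKey([d]) for d in "1234567"]
              + [OptionKey(["8", "9", "0"]), SwitchKey("home")])
keyboard = Keyboard([home, digits], home="home")
```

Because the allocation of options to keys lives in plain data, the same structure can be rendered as a grid, a list, or not at all, which matches the separation between allocation and visualisation argued for in this section.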

It should be stated here that the proposed structure may or may not be visualised, and that it can be presented in multiple ways. For instance, it can be presented in parts, as consecutive 3 × 3 menus or as simple lists of the nine options presented horizontally or vertically. One could even represent all menu levels simultaneously, in a 2D or 3D representation. What needs to be made clear is that the allocation of input options to keys and menus is one thing, and the visualisation of menus and keys is another. For each unique allocation of options to menus and keys one can produce multiple display versions (Footnote 11). On the other hand, the allocation of options to menus and keys can be altered at any moment upon demand (e.g., for optimising performance rates as mentioned above), which means that for any finite set of options C and M, we can specify multiple versions of allocation. Such optimisations can be identified on a personal basis, for example after calculating the use frequencies of individual options, keys or menus. Further discussion on possible layout rearrangement strategies is provided in [3]. In that same work, we also discussed a potential adaptation for supporting 10-thumbs entry (i = 10) through finger tracking.

The last part of this paper describes how the proposed concept can lead to the development of a touchless text entry solution for writing with one finger up to ten fingers.

6 Towards Touchless Text Entry for All

As mentioned above, a number of successful tracking methods are available today, and the authors have started implementing various proof-of-concept prototypes with one of them: the Leap Motion Controller (Footnote 12), a novel 3D motion controller. The LEAP controller, now available as a USB device, supports a 150-degree field of view and uses infrared (IR) to detect the user's hands and determine whether the user points, waves, reaches, or grabs virtual elements, such as a button on the user interface. The controller is already small enough to be embedded in portable devices, inspiring a huge leap in traditional HCI. Notably, an open programming API is offered, which has attracted the attention of hundreds of researchers from all over the world who turned to it to explore new ways of improving the user experience of their applications and systems. What is interesting, especially for the work presented in this paper, is that among the top 5 most viewed and liked projects in the LEAP community area, one can find two projects that aim at using LEAP for text entry. This supports the argument that text entry is emerging as a major challenge today, and that we are all in search of new solutions and new paradigms of interaction.

6.1 LEAP for Text Entry: Previous Works and Their Limitations

Let us first take a quick look at these text-entry related works from the LEAP community. One of them is Dasher (Footnote 13), an input method and computer accessibility tool that enables users to write without a keyboard, by entering text on a screen using a pointing device. Another method, which shares some characteristics with Dasher, is the Minuum keyboard for LEAP (Footnote 14). Minuum was originally aimed at improving touchscreen typing by replacing onscreen QWERTY-like keyboards with a minimal keyboard and a specialized auto-correction algorithm that allows highly imprecise typing. 8pen (Footnote 15) takes a completely different tack to the keyboard problem, by eliminating the keyboard altogether. 8pen arranges the most used letters and characters in 4 sectors around a central ring. To “type”, the user places the finger on the central ring and draws loops through the sectors. The starting sector and the number of sectors passed determine the character produced. A limitation seen in all these approaches is that only one finger is used, resulting in relatively slow rates. Furthermore, only 8pen supports blind typing (subject to the user being able to memorise the position of each letter). Another key issue is ease of learning, and only Minuum scores well on that.

Our vision is to deliver an advanced solution based on hand tracking (e.g., through LEAP) that will overcome such limitations and conform to most, if not all, of the guidelines set in Sect. 4. In this direction, we have conceived an approach that builds on the AUK concept and makes use of the LEAP functionality in order to implement 10-thumbs writing. As part of this effort, we have started developing various prototypes with LEAP, based on the AUK concept, for writing with one up to ten fingers. These are developed with Unity, a cross-platform game creation system that includes a game engine and an integrated development environment (IDE), since the Leap Motion SDK can be used as a Unity plugin to access Leap Motion tracking data in a Unity application. Here, we introduce two basic models based on the AUK concept: one for one-finger input (like Dasher, 8pen, etc.) and one for 10-finger input as an alternative to typing with a standard desktop keyboard.

6.2 Writing with One Finger

The basic concept behind using AUK with LEAP for writing with one finger involves placing the menus and option keys of the proposed 3 × 3 menu structure within the interaction space (the area in which LEAP can detect the movement of hands and individual fingers) and using finger tracking for interacting with them. A simple way of doing this is to place nine virtual keys in the interaction zone, apply to them any desired arrangement according to the AUK concept, and then allow the user to type letter by letter by pointing in and out of them. This approach shares some similarities with Minuum and Dasher, as it uses LEAP for indication (pointing).

An alternative approach, closer to the gesture-based approach of 8pen, involves splitting the interaction zone into nine vertical zones, as presented in Fig. 7. In such a case, a potential arrangement for the home menu is that in Fig. 4 (right), which is in fact a rearrangement of that shown in the middle of the same figure, so that the user can write by moving a finger in the eight directions N, NE, E, SE, S, SW, W, and NW, and returning to C for switching to other menus. For character disambiguation among letters on the same key, there are at least four possible options: (a) use a kind of multi-tap approach; (b) use gestures; (c) use dictionary-based disambiguation; or (d) split the zones into individual subzones for each letter (e.g., see Fig. 8).

Fig. 7. Splitting the interaction space into 9 areas in order to support one-finger typing.

Fig. 8. Splitting the interaction space into individual subzones for each letter.
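The zone split of Fig. 7 can be sketched as a simple mapping from a fingertip position to one of nine named zones. The sketch below assumes the tracker's coordinates have already been normalised to the unit square, with y growing upward; the function names and the normalisation are assumptions, and no Leap Motion API call is used.

```python
# Zone names laid out as they appear to the user (top row first).
ZONES = [
    ["NW", "N", "NE"],
    ["W", "C", "E"],
    ["SW", "S", "SE"],
]


def zone_of(x: float, y: float):
    """Return the zone containing a normalised fingertip position, or None if outside."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return None
    column = min(int(x * 3), 2)   # clamp x == 1.0 into the last column
    row = 2 - min(int(y * 3), 2)  # y grows upward, rows are listed top-down
    return ZONES[row][column]


def zone_entries(trace):
    """Collapse a fingertip trace into the sequence of distinct zones visited in turn.

    Samples outside the interaction space are ignored, so leaving the space and
    re-entering the same zone does not produce a new entry.
    """
    entries = []
    for x, y in trace:
        zone = zone_of(x, y)
        if zone is not None and (not entries or entries[-1] != zone):
            entries.append(zone)
    return entries


trace = [(0.5, 0.5), (0.5, 0.9), (0.5, 0.95), (0.5, 0.5), (0.9, 0.5)]
print(zone_entries(trace))  # -> ['C', 'N', 'C', 'E']
```

How a sequence of zone entries is turned into characters depends on the disambiguation option chosen, (a) to (d) above, and is deliberately left out of this sketch.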

6.3 Writing with 10 Fingers

In [3], we included an early discussion on a potential adaptation of AUK for supporting 10-thumbs entry (i = 10), even through finger tracking. This is important since it may enable fast writing (at rates close to, if not better than, those achieved with a desktop keyboard) and at the same time detach the writing process from any kind of physical keyboard. With 10 fingers in mind, a 9 + 1 (Footnote 16) version of the multi-tier structure is used, which may be visualised as a 10 × 1 horizontal grid. We can then map each finger to one of these 10 virtual keys and use LEAP to detect the movement of each individual finger and produce characters accordingly. One way, seeking familiarity, is to group and allocate the letters to the 10 keys according to the way the letters are placed on a QWERTY layout. However, we strongly believe that one needs to detach the arrangement of letters from that of the QWERTY keyboard (Footnote 17) and produce a whole new arrangement designed from scratch for the new medium (hand tracking). An alternative arrangement would involve, for instance, assigning the most frequently used keys to the more dominant fingers, ensuring the lowest possible digraph frequencies, etc. (e.g., see Fig. 9).

Fig. 9. Indicative AUK arrangement for 10-fingers typing with LEAP.

Yet, in order to respect the physiology of the human hands and fingers and avoid the “gorilla arm” effect induced by uncomfortable placement of the hands, we wish to support a more comfortable hand posture while writing with LEAP. Thus, we aim at implementing an algorithm that will effectively detect the finger movements (i.e., key taps) while the user's hands are placed and move as if they were holding and playing with a virtual 3D ball, with the elbows resting on the table in front of the user [7].
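One possible shape for such a detection step is sketched below in Python. It is only an assumption-laden illustration, not the algorithm the authors aim to implement: it presumes that the tracker already yields, per finger, a scalar flexion value relative to the resting posture, and it uses a hypothetical threshold with hysteresis to turn that signal into discrete taps. The finger-to-key numbering (key 0 for the left little finger through key 9 for the right little finger) is likewise illustrative.

```python
FINGER_ORDER = ["pinky", "ring", "middle", "index", "thumb"]

# Keys 0-9 from left to right on the 10 x 1 grid (assumed numbering).
FINGER_TO_KEY = {("left", name): index for index, name in enumerate(FINGER_ORDER)}
FINGER_TO_KEY.update({("right", name): 9 - index
                      for index, name in enumerate(FINGER_ORDER)})


def detect_taps(samples, threshold=15.0):
    """Turn (finger, flexion) samples into a list of completed taps.

    A tap starts when a finger's flexion reaches `threshold` and completes when it
    falls back to half the threshold (hysteresis avoids double counting on jitter).
    The threshold value and its unit are hypothetical.
    """
    pressed = set()
    taps = []
    for finger, flexion in samples:
        if finger not in pressed and flexion >= threshold:
            pressed.add(finger)
        elif finger in pressed and flexion <= threshold / 2:
            pressed.discard(finger)
            taps.append(finger)
    return taps


def taps_to_keys(taps):
    """Map detected taps to virtual key indices on the 10 x 1 grid."""
    return [FINGER_TO_KEY[finger] for finger in taps]


samples = [
    (("left", "pinky"), 2.0), (("left", "pinky"), 18.0),
    (("left", "pinky"), 20.0), (("left", "pinky"), 3.0),
    (("right", "index"), 16.0), (("right", "index"), 10.0),
    (("right", "index"), 5.0),
]
print(taps_to_keys(detect_taps(samples)))  # -> [0, 6]
```

Measuring flexion relative to each finger's own resting position, rather than its absolute height above the desk, is what would let the hands stay in a relaxed, ball-holding posture; how to derive that signal robustly from tracking data remains the open part of the problem.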

7 Conclusions & Future Work

The proposed text entry method is touchless, allows for blind typing, and supports various entry modes and multiple adaptations for diverse user skills and conditions of use.

Clearly, one can implement this technique with LEAP without conforming to the presented 3 × 3 multi-tier structure of AUK, but in that case all the benefits of that approach (device independence, easy switching to any i < 10 mode, adaptations for improved performance and accessibility, etc.) would be lost. Overall, the concept of combining AUK with hand tracking shows a number of benefits and the potential to meet the majority, if not all, of the requirements of an ideal text entry method as presented earlier in Sect. 4. Nonetheless, there are still substantial problems with the use of the LEAP system for hand tracking. For example, IR reflections, finger occlusions, and moving outside the detection space are all capable of halting the process. All of these will be tested and validated in pilot tests using the developed prototypes.

Regarding our future plans: (a) the KSPC of each proposed mode and character layout will be calculated using public corpora, and detailed comparison results with other techniques will be collected; (b) user performance will be estimated and predicted using theory-based methods such as the Hick-Hyman law and Fitts's law; (c) user trials will be conducted with the interactive prototypes; (d) a mechanism for employing personalised user corpora will be developed and integrated; (e) the best possible arrangement for 10-finger typing with LEAP will be proposed after studying hand and finger kinematics and dynamics; and (f) research will be conducted to identify the frequency of all input options and the dioption frequencies (a term used in analogy to digraph frequency) for optimising the tree structure of the menus, especially for the 10-key mode.