1 Introduction
Smartwatches are spreading as wearable devices, mainly for health monitoring and receiving notifications. They are expected to be used not only passively but also actively if text entry on them becomes easier. On a small touchscreen, a menu must present visual information concisely, and the “fat finger” and occlusion problems [28] must be considered. Efficient menu interfaces with minimal occlusion [20, 21] and those for eyes-free interaction [5, 24] have recently been proposed. For text entry, these problems have mainly been tackled with three approaches. The first focuses on a part of a keyboard by zooming in [22], splitting [4, 11], or showing a call-out [17]. The second assigns multiple characters to a reduced number of keys and makes them selectable [14, 15, 27]. The third uses statistical decoding with probabilistic touch and language models [8, 31, 32]. Statistical decoding enables precise tap and gestural typing on a full QWERTY keyboard on a smartwatch. The QWERTY layout serves as a standard virtual keyboard on smartwatches in Latin script languages.
However, this is not always the case for non-Latin script languages. Japanese text entry has additional challenges: there are more than fifty syllabic characters, kana, to enter, and a subsequent kana-kanji conversion turns a sequence of kana into standard Japanese text with a mixture of kanji and kana. Figure 2 shows a basic set of the kana syllabary; a kana is composed of a consonant and a vowel (CV) or a single vowel. With additional symbols for voiced consonants, contracted sounds, and double consonants, the kana number more than fifty. Before smartphones prevailed, Japanese users became familiar with Japanese text entry based on the numeric keypad of feature phones. This interface assigns the five kana sharing a consonant (i.e., the characters in a row in Figure 2) to a numeric key and the symbols and marks to the remaining two keys; the five kana are toggled through multiple taps. This interface was inherited by smartphones, as shown in Figure 3. The virtual numeric keypad introduced flicking in addition to multiple tapping for kana selection. Specifically, a simple tap selects the kana with the vowel ‘a’, and flicks in four directions select the kana with the vowels ‘i’, ‘u’, ‘e’, and ‘o’. Most Japanese users use the virtual numeric-keypad-based text entry on smartphones.
Under these circumstances, smartwatches have no standard Japanese text entry interface. Although the keys of the numeric keypad are larger than those of a QWERTY keyboard, simply porting the numeric keypad onto a smartwatch makes the key size and spacing very small. Flick operation generally requires a wider area than tapping or swiping. Tojo et al. proposed a Japanese text entry with flick operation based on kana keys allocated in an annular layout [30]. In light of these examples, we consider the essential factors to be 1) a space-efficient key layout with large keys and spacing, 2) minimal operations to enter a character, and 3) regularity of operations that is simple enough to remember. Previous studies have focused on interface designs that efficiently allocate keys on a small touchscreen and take advantage of the regularity of the kana CV structure in kana selection, but they overlooked commonality of flick directions with familiar smartphone interfaces. Thus, we propose PonDeFlick, an annular-layout-based Japanese kana text entry that provides the entire touchscreen for gestural operation and simplifies that operation by commonalizing the flick directions with familiar smartphone interfaces.
We conducted a ten-day user study comparing PonDeFlick with a miniature numeric-keypad-based flick text entry and a modification of PonDeFlick that keeps the regularity of the kana CV structure but does not commonalize the flick directions. The user study revealed the effectiveness of commonality in flick operation.
2 Related Studies
When implementing a keyboard on a smartwatch, the small form factor makes precise typing difficult. Various solutions to the "fat finger" and occlusion problems have been proposed. The solutions can be roughly classified into three groups.
The first category is zooming a part of the keyboard [3, 4, 11, 17, 22]. ZoomBoard [22] zooms in on a small QWERTY keyboard by taps, and an additional tap specifies the desired key. Swipeboard [4] divides a QWERTY keyboard into nine regions, allowing users to swipe twice to enter a character. The first swipe specifies the area where the character is located, and the second swipe specifies the desired character. SplitBoard [11] displays half of a QWERTY keyboard on a small touchscreen, enabling a user to switch to the other half and an additional page for numbers and special characters by horizontal flicking. Virtual Sliding QWERTY [3] enables a user to move an oversized keyboard to a desired position by tap-and-drag operation. ZShift keyboard [17] displays a call-out showing a zoomed-in image of the touched area in a non-occluded area [33], enabling a user to change the character if needed by shifting the touching finger slightly. A systematic literature review [18] covered early works before 2018. These interfaces require additional taps or swipes to manipulate the display.
The second category is keyboards with a reduced number of keys, which assign multiple characters to each key and make them selectable in some way [10, 14, 15, 27]. SwipeKey [27] is a Latin script text entry that determines a letter based on tapping and flick directions on a reduced number of square keys in a tiled layout. The work tested various key sizes and numbers of flick directions and optimized the configuration in a 25 mm × 15 mm rectangular keyboard on a smartwatch. The user study showed that six keys of 7.5 mm × 7.5 mm squares with five flick directions (tapping and four directions) recorded the fastest text-entry speed and the lowest error rate. This configuration obtained the lowest difficulty score and the highest preference in subjective evaluation. The work also reported that the error rate increased drastically if the key size was smaller than 5.7 mm × 5.7 mm. DualKey [10] assigns two adjacent letters in the QWERTY keyboard to a key and makes the two selectable by finger identification between the index and middle fingers. UOIT keyboard [14] divides the 26 English letters into 13 frequent one-keystroke letters and 13 two-keystroke letters, and defines an easy-to-learn rule that maps them to the 13 frequent letters (‘u’, ‘o’, ‘i’, ‘t’, etc. for the one-keystroke letters) and to pairs of those 13 letters for the two-keystroke letters. Meanwhile, ambiguous keyboards [6, 15] reduce the load of specifying letters. Komninos’s ambiguous keyboard provides context-based word suggestions, word completion, and next-word suggestions on a six-key keypad in an alphabetical layout [15]. WrisText [6] enables one-handed text entry by whirling the wrist of the watch hand toward six directions of an annular ambiguous keyboard.
The third category is statistical decoding [7, 8, 31, 32] using probabilistic touch and language models. The models enable precise detection of key touches and accurate prediction of the next words. Google’s WatchWriter [8] provided precise tap typing and gesture typing on a miniature QWERTY keyboard based on their Smart Touch Keyboard and Smart Gesture Keyboard techniques developed on smartphones [25]. VelociTap [32] achieved a text-entry speed of 41 words per minute on a 40-mm-wide keyboard with a sentence-based decoder incorporating a probabilistic touch model, a 12-gram character language model, and a 4-gram word language model. While statistical decoding enables fast text entry with suggestion, auto-completion, and auto-correction, it sometimes causes errors, especially in entering rare words such as proper nouns. VelociWatch [31] tackled a challenging text input task with error avoidance functions such as letter locking and selection slots. The statistical decoding techniques strongly support efficient text entry regardless of keyboard type.
Various virtual keyboards on smart devices have been proposed for non-Latin script languages. The circumstances of Korean text entry are similar to those of Japanese. The most popular virtual keyboard on smartphones is a QWERTY-like Korean keyboard. Some manufacturers released original text entries on the numeric keypad of feature phones, and smartphones inherited them. Ilinkin et al. [12] implemented four popular types of Korean text entries on smartwatches and conducted a comparative evaluation. Although the QWERTY-like Korean keyboard is preferred for two-thumb typing on smartphones, the three numeric-keypad-based text entry interfaces performed better than the QWERTY-like Korean keyboard on smartwatches. These numeric-keypad-based interfaces were based on tapping. Flick-based interfaces generally require a wider area. As a Japanese text entry on smartwatches, BubbleFlick [30] provided the widest area possible for flick operation while also leaving an area for editing text by rearranging the twelve keys of the numeric keypad in an annular layout. Though it opened up the entire touchscreen for flick operation, it left the issue that the flick directions changed depending on the key, making it hard to learn even after 30 days of use.
In menu interface research, Marking Menu [16], which enables command selection by directional gesture, is an influential design that facilitates users’ smooth transition from a novice mode to an expert mode and has inspired many variations. For instance, FlowMenu [9] enables consecutive command selection by combining the marking menu with Quikwriting [23] based on an octant with a rest area in the center. Text entry and command selection face a similar challenge in that many commands must be grouped clearly and be easily selectable. Zone menu [34], which increases the number of commands by setting multiple zones from which to start directional gestures, forms groups of commands. The hierarchical levels of the Marking Menu are effectively increased by the distance extended marking menu [19]. The hierarchical gestures suggest solutions for two-step text entry [13, 27, 30] to meet the three requirements in the introduction. The marking menu is further extended to those initiating from a bezel, such as Bezel Menus [13], which proposes a Latin alphabet text entry similar to our PonDeFlick on smartphones, Bezel-Tap Gestures [26] on tablets, Bezel-to-bezel interaction for eyes-free interaction on smartwatches [24], and bezel-based selection interfaces for minimal-occlusion interaction [20, 21]. Menu interfaces and text entry also differ. While users usually select commands nonconsecutively and memorize only frequently used ones in menu selection, they have to select a variety of commands in succession in text entry. The gestural operation should therefore impose a light cognitive load. In other words, the operation should be more reflexive for text entry.
3 PonDeFlick
PonDeFlick is an interface that allocates necessary keys and a text-editing area efficiently in a small touchscreen while providing the entire area for flick operation that has commonality with the popular Japanese text entry on smartphones. Figure 1 shows screenshots of PonDeFlick. Panel (a) is the initial screen. Ten keys of representative kana, which are the heading characters of the rows circled in the kana syllabary table (Figure 2), and two keys for symbols and marks, which make up twelve in total, are arranged in an annular layout. The size of a key is 6.82 mm in diameter, which is greater than the 5.7 mm specified in [27], and the spacing between the keys is 0.64 mm. The text editing area in the center shows three lines of entered text with seven kana per line.
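As a rough plausibility check on these dimensions, the sketch below estimates how large a ring twelve such keys would occupy. It is not taken from the original design: it assumes the keys are equal circles evenly spaced on a single ring and that the 0.64 mm spacing is the edge-to-edge gap between neighbouring keys. The function name `min_ring_radius_mm` is hypothetical.

```python
import math


def min_ring_radius_mm(n_keys: int, key_diameter_mm: float, gap_mm: float) -> float:
    """Radius of the circle through the key centres (assumed even spacing).

    Adjacent centres are one key diameter plus one gap apart, and that
    distance is a chord subtending an angle of 2*pi/n_keys.
    """
    pitch = key_diameter_mm + gap_mm
    return pitch / (2.0 * math.sin(math.pi / n_keys))


# Values from the text: 12 keys, 6.82 mm diameter, 0.64 mm spacing.
ring_radius = min_ring_radius_mm(12, 6.82, 0.64)
outer_diameter = 2.0 * (ring_radius + 6.82 / 2.0)

print(f"centre-ring radius: {ring_radius:.2f} mm")
print(f"outer diameter of the key ring: {outer_diameter:.2f} mm")
```

Under these assumptions the key ring needs an outer diameter of roughly 36 mm, which gives a sense of the round display size such a layout would require.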
Forty-six kana are systematically assigned to the ten keys in combination with tapping and four flick directions. The left one of the two bottom keys is for adding a voiced sound mark or modifying a kana to a double consonant or contracted sound. The right one is for adding punctuation, i.e., a period, comma, question mark, or exclamation mark. An additional key inside the ring is a completion key. A leftward flick in the text editing area works as a backspace. Vertical flicking is used for scrolling up and down the entered text. Panel (b) illustrates the entering of ‘na’, which comprises the consonant ‘n’ and vowel ‘a’. The bold yellow shades illustrate trajectories of gestural strokes. A finger touches down on the ‘na’ key and then lifts off on the same key. A flick guide indicating the flick direction is displayed 0.3 seconds after the touch-down. Panel (c) illustrates the entering of ‘ne’, which comprises the consonant ‘n’ and vowel ‘e’. A finger touches down on the ‘na’ key, slides slightly towards the center, and then flicks rightward. One of the four kana with vowels ‘i’, ‘u’, ‘e’, and ‘o’ is determined by the flick direction, i.e., leftward for ‘i’, upward for ‘u’, rightward for ‘e’, and downward for ‘o’. The correspondence between the flicking direction and the vowel is shown in Figure 4. This correspondence is common with that for selecting a kana on the numeric-keypad-based Japanese text entry on smartphones.
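The key-plus-gesture selection described above can be sketched as a simple table lookup. This is an illustrative sketch, not the authors' implementation: the names `KANA_ROWS`, `GESTURE_TO_VOWEL`, and `select_kana` are hypothetical, and only the rows with five regular kana are included because the text does not state how the remaining rows (‘ya’, ‘wa’) map to flick directions.

```python
# Rows of the basic kana syllabary that contain all five vowels,
# keyed by consonant ("" is the vowel-only row).
KANA_ROWS = {
    "": "あいうえお",
    "k": "かきくけこ",
    "s": "さしすせそ",
    "t": "たちつてと",
    "n": "なにぬねの",
    "h": "はひふへほ",
    "m": "まみむめも",
    "r": "らりるれろ",
}

VOWELS = "aiueo"

# Gesture-to-vowel correspondence stated in the text:
# tap -> a, left -> i, up -> u, right -> e, down -> o.
GESTURE_TO_VOWEL = {"tap": "a", "left": "i", "up": "u", "right": "e", "down": "o"}


def select_kana(consonant: str, gesture: str) -> str:
    """Return the kana for the touched key (consonant row) and the gesture."""
    vowel = GESTURE_TO_VOWEL[gesture]
    return KANA_ROWS[consonant][VOWELS.index(vowel)]


print(select_kana("n", "tap"))    # 'na' key, tap
print(select_kana("n", "right"))  # 'na' key, rightward flick
```

Because the gesture-to-vowel table is independent of the key table, the same correspondence can be reused for any key arrangement, which is the commonality with the smartphone keypad that the design relies on.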
In operating PonDeFlick, a finger stroke changes its direction. To detect flicking and recognize its direction anywhere on the surface, we developed an algorithm to search for the final inflection point, which is considered the starting point of the flicking. Figure 5 illustrates the algorithm.
Let the sequence of points in the stroke be denoted as \(P_0 = (x_0, y_0)^T\), \(P_1 = (x_1, y_1)^T\), \(\cdots\), \(P_N = (x_N, y_N)^T\), where \(P_0\) and \(P_N\) are the touch-down and touch-up points, respectively. Let the inflection point be denoted as \(S\), which is initialized with \(P_0\). We set two thresholds: a minimal travel distance \(D\) for flick detection and an angular threshold \(\theta\) for inflection point detection.
After touch-down, touched positions are continuously detected at intervals of a few milliseconds as \(P_0, P_1, \cdots, P_N\). If a touch-up is detected with a travel distance below \(D\) from \(P_0\) on one of the keys, as shown in Panel (b) of Figure 1, a kana with the vowel ‘a’ is selected. If the travel distance exceeds \(D\), a search for a new inflection point runs each time a new touched position \(P_n\) is obtained. The closest point to \(P_n\) with a travel distance over \(D\), \(P_{n-k}\) (\(k \geq 1\)), is a new inflection point candidate. It is \(P_{n-2}\) in Figure 5. The displacement vector from \(P_{n-k}\) to \(P_n\) is \(\overrightarrow{P_{n-k}P_{n}}\), and the vector from the current \(S\) to \(P_{n-k}\) is \(\overrightarrow{SP_{n-k}}\). The angle \(\alpha\) formed between the two vectors is measured. If \(\alpha \geq \theta\), the inflection point is updated to \(P_{n-k}\) (\(S = P_{n-k}\)). Note that \(D\) and \(\theta\) are set to 30 dp (approximately 4.8 mm) and 70 degrees, respectively.
When a touch-up is detected, the final displacement vector \(\overrightarrow{SP_N}\) determines the flick direction. Among the unit vectors \(\vec{d_i}=(-1, 0)^T\), \(\vec{d_u}=(0, -1)^T\), \(\vec{d_e}=(1, 0)^T\), and \(\vec{d_o}=(0, 1)^T\), the one whose inner product with \(\overrightarrow{SP_N}\) is largest determines the kana to enter.
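The procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: "travel distance" is interpreted as straight-line distance, a stroke counts as a tap if it never leaves radius \(D\) around \(P_0\), and a candidate is only considered if it comes after the current \(S\) in the stroke. The names `classify_stroke` and `dp_to_mm` are hypothetical. The helper `dp_to_mm` uses the Android definition of 1 dp as 1/160 inch, which reproduces the stated 30 dp ≈ 4.8 mm.

```python
import math

MM_PER_INCH = 25.4
DP_PER_INCH = 160  # Android defines 1 dp as 1/160 inch


def dp_to_mm(dp: float) -> float:
    """Convert density-independent pixels to millimetres."""
    return dp * MM_PER_INCH / DP_PER_INCH


# Unit vectors d_i, d_u, d_e, d_o in screen coordinates (y grows downward).
DIRECTIONS = {"i": (-1.0, 0.0), "u": (0.0, -1.0), "e": (1.0, 0.0), "o": (0.0, 1.0)}


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def _angle_deg(u, v):
    """Angle between two vectors in degrees (0 if either has zero length)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def classify_stroke(points, d=30.0, theta_deg=70.0):
    """Return the vowel ('a', 'i', 'u', 'e', 'o') selected by a stroke.

    points: list of (x, y) from touch-down P_0 to touch-up P_N,
            in the same unit as d (e.g. dp).
    """
    # Assumed tap criterion: the finger never moved D or more away from P_0.
    if all(math.dist(points[0], p) < d for p in points):
        return "a"

    s_idx = 0  # index of the current inflection point S, initialised to P_0
    for n in range(1, len(points)):
        # Candidate P_{n-k}: nearest earlier sample farther than D from P_n.
        cand = next((j for j in range(n - 1, -1, -1)
                     if math.dist(points[j], points[n]) > d), None)
        if cand is None or cand <= s_idx:
            continue  # assumption: candidates must lie after S
        alpha = _angle_deg(_sub(points[cand], points[s_idx]),
                           _sub(points[n], points[cand]))
        if alpha >= theta_deg:
            s_idx = cand  # the stroke turned sharply: update S

    # The final displacement S -> P_N picks the direction with the
    # largest inner product.
    final = _sub(points[-1], points[s_idx])
    return max(DIRECTIONS,
               key=lambda v: final[0] * DIRECTIONS[v][0] + final[1] * DIRECTIONS[v][1])


# Slide 60 units toward the centre (down), then flick 80 units to the right.
slide = [(100, 20 + 10 * t) for t in range(7)]
flick = [(100 + 10 * t, 80) for t in range(1, 9)]
print(classify_stroke(slide + flick))
print(f"30 dp = {dp_to_mm(30):.2f} mm")
```

Because only the last sharp turn matters, the initial slide toward the center does not affect the recognized direction, so the flick can be made anywhere on the surface.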
6 Discussion
Considering the CPMs, EPCs, SUS scores, and subjective comments, the keys of KeypadFlick on smartwatches were too small to operate comfortably, though its precise touch detection enabled fast typing. PonDeFlick and PonDeSlide had larger keys in a wider area, making their operation easier. The results with the different key sizes of KeypadFlick and our original interfaces match the finding of the SwipeKey work that the error rate increases drastically if the button size is smaller than 5.7 mm × 5.7 mm [27]. Comparing the two variations, we consider that PonDeSlide’s slide gesture, which determines a kana by where the finger touches up, forced the user to operate it more carefully than PonDeFlick. PonDeFlick’s flick gesture alleviated the need for carefulness and enabled faster typing than PonDeSlide.
PonDeFlick can be viewed as an application of the marking menu with four flick directions to an annular key layout, whereas PonDeSlide is a linear menu. Regarding the comparison of marking and linear menus, a previous paper reported experiments on learning with a grid-based marking menu (M3 Gesture Menu), a multi-stroke marking menu, and a linear menu on a smartphone [35]. The experimental results showed that the users of the two marking menus became much faster after three ten-minute practice sessions, whereas those of the linear menu did not. Our experimental results showed the same tendency on a small smartwatch touchscreen. In the experiment, PonDeFlick showed that the speed of flicking made up for the extra time to slide toward the center. This finding might apply to more general menu interfaces on a small surface. Commonalizing flick operations with an interface familiar to users can make operation easier, even if the key layout differs. In other words, the flick operations can be designed separately from the key layout to some extent.
A limitation of this study is that the CPM of PonDeFlick might not have reached stable performance within the 10-day experiment. A longer experimental period is needed to measure stable long-run performance. Another limitation is that we have not proposed a solution for square-face smartwatches. However, even if the key layout differs from PonDeFlick, it is probably effective to commonalize the flick directions with the numeric-keypad-based text entry on smartphones.