1 Introduction

Phishing is used to obtain confidential information, install malware, obtain funds, or steal resources [18]. Targeted phishing is a critical component of that; for example, phishing attacks on Zoom increased by four orders of magnitude between March and April 2020, and COVID-19-related phishing included misinformation as well as attacks on the benefits for the newly unemployed. The most targeted form of phishing attack is spear phishing [1]. As spear phishing is a challenge essentially grounded in human behavior and decision-making [29], solutions should be informed by human subject evaluations as well.

However, studies on phishing show a bias toward machine learning and purely technical solutions, with only \(13.9\%\) of published papers on phishing in the ACM Digital Library utilizing human participants or user-centered methodologies [8]. Even when research does involve human subjects, it often studies convenience samples, specifically university students. Investigating high school students is particularly important, as previous research has shown that age is a critical factor in predicting susceptibility to phishing attacks [22, 23, 26]. Improved understanding of participants’ mindsets when they click on a malicious email link can enable robust defensive and offensive techniques against spear phishing attacks. To contribute to this understanding, we combined phishing detection with signal detection theory (SDT) to explore how spear phishing cues impact this population [2]. SDT is often used to measure how well observers differentiate between patterns that are present and figuratively noisy distractions [24].

Specifically, we conducted a user study focusing on 57 high school students and staff members to explore the less-observed correlation between participant mentalities and email spear phishing attacks. Our goal was to address the following research questions:

  • RQ1: How confident are participants in distinguishing between legitimate and non-legitimate spear phishing content over email?

  • RQ2: How does age affect a user’s ability to distinguish between legitimate and non-legitimate spear phishing content over email?

2 Related Work

The U.S. Department of Homeland Security identified the sequence of actions taken to craft a spear phishing attack: (1) identify the target, (2) meticulously craft the message with the intent of the recipient taking immediate action, and (3) deliver the message from a counterfeit email address [31]. Rajivan et al. found that phishing emails with “specific attack strategies (e.g., sending notifications, use of authoritative tone, or expressing shared interest)” were more successful [32]. The use of social engineering through psychological manipulation can establish trust and, as a result, lure in victims [20].

Previous research on phishing has focused on software- or hardware-based solutions, such as toolbars, machine learning models, and warning indicators  [4]. Although significant advances in technology-based tools have emerged  [30, 34, 35], less research has focused on end users  [8]. Yet, the need for such research has long been recognized; in 2008, Friedrichs et al. argued that humans must be studied to stop web-based identity theft, including phishing attacks [15]. Such insights become even more important in light of Karakasiliotis et al.’s findings that only 36% of their study’s participants could identify legitimate websites. Only 45% of participants could correctly identify malicious websites  [21]. Dhamija et al. found that visual deception can fool even sophisticated users; a good phishing website fooled 90% of the participants in their study  [13]. Fewer studies have focused on more vulnerable populations, such as younger students. In our background research, we did not find any studies focused on high school students or staff. Thus, we specifically selected a high school environment for our study.

In 2016, Canfield et al. performed two experiments comparing detection and performance using SDT. They found that “Greater sensitivity was positively correlated with confidence. Greater willingness to treat emails as legitimate was negatively correlated with perceived consequences from their actions and positively correlated with confidence”  [2]. We implemented SDT in our research by analyzing the ‘stimulus,’ which triggers the decision-making in users. To evaluate the efficacy of the stimulus, we measured hits, misses, false alarms, and correct rejections (i.e., true positive, false negative, false positive, and true negative). We analyzed how users chose to click or not click links sent via electronic mail. The use of SDT enabled us to evaluate which sections of the phishing email arouse suspicion when they are present  [2].
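The four SDT outcomes described above can be sketched as a small classification routine. This is a minimal illustration, not the code used in the study. It assumes the conventional mapping in which the "signal" is a phishing email; the study's own modified mapping may differ. The function names and the sample answers are hypothetical.

```python
from collections import Counter

# Assumption: "signal" = the email is phishing (conventional SDT framing).
# The study's modified outcome table may define the categories differently.
OUTCOMES = ("hit", "miss", "false_alarm", "correct_rejection")


def classify_response(is_phishing: bool, judged_phishing: bool) -> str:
    """Map one (ground truth, participant judgment) pair to an SDT outcome."""
    if is_phishing and judged_phishing:
        return "hit"                # true positive
    if is_phishing and not judged_phishing:
        return "miss"               # false negative
    if not is_phishing and judged_phishing:
        return "false_alarm"        # false positive
    return "correct_rejection"      # true negative


def tally_outcomes(responses):
    """Count SDT outcomes over an iterable of (is_phishing, judged_phishing)."""
    counts = Counter(classify_response(truth, judged) for truth, judged in responses)
    # Report all four outcomes, including any that never occurred.
    return {name: counts.get(name, 0) for name in OUTCOMES}


# Hypothetical answers from one participant on a ten-item test.
example = [(True, True), (True, False), (False, False), (False, True),
           (True, True), (False, False), (True, True), (False, False),
           (True, False), (False, False)]
print(tally_outcomes(example))
```

Per-participant counts of this kind are the sort of quantity that can then be compared across groups.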

3 Methodology

To explore the relationship between the phishing susceptibility of high school students and their educators, we wanted to see what email cues both groups notice when deciding to click (or not click) on a malicious link. We conducted a non-experimental, quantitative correlation analysis by collecting data through a descriptive survey to check phishing susceptibility outcomes, age differences, and confidence levels. We primarily collected data from high school students and staff at a suburban high school in the United States. We obtained approval from the Ethical Review Board before beginning this experiment.

3.1 Recruitment

To begin, we established a collaboration with a suburban high school from the Midwestern part of the United States. As most high school students were under the age of 18, parental permission was required on a paper version of an informed consent document. We only allowed people to participate after their form was signed and approved by the staff and the students’ parents. During the recruitment phase, we engaged with language arts classrooms to find willing research participants. English language arts classes were chosen because all students were required to enroll in these classes to graduate. The study was also advertised to every student in the building during the morning school announcements. We also distributed flyers advertising the study to 200 participants. Students who turned in the paper consent forms then received emails that contained an electronic form of the survey. To recruit teachers and faculty members, we sent out emails containing the link to the consent form and questionnaire. Because the study was announced beforehand, teachers and faculty were expecting this recruitment email. As an incentive, participants could choose to enter a drawing for Starbucks gift cards at the end of the survey. Our power analysis showed that we required a sample size of more than 50 participants. We obtained a complete response set from 57 participants in our final data set.

3.2 Survey Instrument and Study Design

The survey consisted of three parts: the informed consent information, the demographic questionnaire, and the actual phishing susceptibility assessment. We utilized Google Forms as the tool to provide the survey questionnaire because it was easily accessible to both students and teachers. The first author anonymized the data so that personally identifiable information would not be shared with anyone else, including other researchers. Participants began by opening a Google Forms link from their email and confirming their status as a student or a staff member of the high school. The staff needed to confirm their consent to the study, while students would move on to the next step due to their parents having already agreed via the consent form. Next, participants answered a set of demographic questions regarding their age group (and not their specific date of birth to reduce the risk of disclosure of identifiable information). Afterward, the participants were presented with ten questions to assess their spear phishing susceptibility through the use of images of phishing emails. We selected images instead of asking them to go through actual emails to mitigate any concern that they may respond to malicious messages. The participants classified the images as “regular email” or “phishing email”. For each question, the participants rated their confidence in their decision, from least to most confident using a five-point Likert scale.

Spear Phishing Susceptibility: Based on prior phishing research, there are three main factors identified in most phishing emails: anonymous senders, suspicious URLs or installations, and a sense of urgency [14]. Figure 2 is an example that shows the signs present in a harmful phishing email, such as an anonymous sender (e.g., “is outside your organization”), a sense of urgency (e.g., “URGENT! CLICK THE LINK”), a suspicious URL (e.g., “http://baoonhd.vn/api/get.php?...”), and a risky action (e.g., clicking on “Open in Docs”). In contrast, Fig. 1 shows an authentic email from Google, as seen by the trustworthy email address, the accurate website link, and the valid email format. Non-phishing examples were adopted from personal school emails that the high school staff and students had received earlier and that at least one individual had reported as suspicious. This data was obtained from the high school staff and IT support, who anonymized the email samples.

Phishing examples were adopted from the Berkeley Phishing Examples Archive (PEA). The adopted phishing emails were modified to include the name of the school and actual school activities, including grades and exams. The images were edited to address the participants’ real names and roles (teacher or student). Google documents addressed school-specific information to check the participants’ susceptibility to spear phishing emails. The signals that were used in the phishing emails were (a) the greeting, (b) suspicious URLs with a deceptive name or IP address, (c) content that did not match the ostensible sender and subject, (d) requests for urgent action, and (e) grammatical or typographical errors. We selected this set of signals based on a 2016 study by Canfield et al. that similarly focused on detection theory, albeit using an online survey of people aged 19–59 [2].

Fig. 1. Example of an authentic email displayed to the participants in the survey

Fig. 2. Example of a phishing email displayed to the participants in the survey

3.3 Analysis: Method

Once the data collection was complete, we analyzed the data using RStudio and SPSS Statistics. Using SDT, participants’ answers were categorized into four possible outcomes: hit, miss, false alarm, and correct rejection. Table 1 shows the signal detection theory outcomes adapted for this study. The outcomes from the phishing assessment were analyzed in a one-way analysis of variance (ANOVA) to explore the relationship between the independent variable (age group) and the dependent variables (the number of different outcomes and the average confidence levels). The one-way analysis of variance is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups [17]. ANOVA is usually used to compare three or more groups. For this study, we divided the data set into seven groups.
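To make the one-way ANOVA concrete, the following sketch computes the F statistic from between-group and within-group sums of squares using only the Python standard library. It is an illustration under stated assumptions, not the RStudio or SPSS procedure used in the study. The function name and the sample hit counts are hypothetical. The p-value is omitted because it requires the F distribution, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)                          # number of non-empty groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    if k < 2 or n_total <= k:
        raise ValueError("need at least two non-empty groups and n_total > k")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    # Note: raises ZeroDivisionError if every group has zero internal variance.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-participant hit counts for three illustrative age groups.
hits_by_group = [[2, 3, 4], [3, 4, 5], [6, 7, 8]]
f_stat, df_b, df_w = one_way_anova(hits_by_group)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F indicates that the group means differ by more than the within-group spread would suggest; whether that is significant depends on the F distribution with the two degrees of freedom returned.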

Table 1. Modified signal detection theory implemented to evaluate spear phishing susceptibility

4 Findings and Discussions

Our data collection was done over a period of two months. We collected a complete data set from 57 subjects, who provided their consent and participated in the study. Of these 57 participants, 12 were students, and 45 were staff members of the high school. Eight participants were from 12 to 17 years old; four participants were from 18 to 24 years old; 11 participants were from 25 to 34 years old; 15 participants were from 35 to 44 years old; 12 were from 45 to 54 years old; seven were from 55 to 64 years old. Thus, the participants’ ages ranged from 12 to 64 years old. This study aimed to determine whether the email outcomes (hit, miss, correct rejection, false alarm) and the confidence levels (Likert scale ratings of one through five) differed significantly between the age groups (12–17, 18–24, 25–34, 35–44, 45–54, and 55–64 years old) on a ten-item test. Results of the ANOVA test are shown in Table 2. A significant difference was noted for the hit or miss email outcomes (F(5, 51) = 2.614, p < .035). The correct rejection, false alarm, and all the different confidence levels had no significant difference between the groups.
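As a consistency check on the reported degrees of freedom, and assuming the test compared the \(k = 6\) age groups listed above across all \(N = 8 + 4 + 11 + 15 + 12 + 7 = 57\) participants, the standard one-way ANOVA quantities work out as follows:

\[
df_{\text{between}} = k - 1 = 6 - 1 = 5, \qquad
df_{\text{within}} = N - k = 57 - 6 = 51, \qquad
F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_{\text{between}} / 5}{SS_{\text{within}} / 51}.
\]

These values agree with the reported F(5, 51).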

Table 2. ANOVA results of the different signals (hit, miss, correct rejection, and false alarm) between and within groups (divided based on age)

Fig. 3. SDT Mean Outcome shows the mean for the email outcomes in a linear transformation from 100% to a five-point scale. It shows (from top to bottom) correct rejection in yellow, correct acceptance (hit) in blue, incorrect acceptance (miss) in red, and false alarm in green. (Color figure online)

Fig. 4. SDT Mean Outcome for Confidence Levels showing the confidence level. Misclassifying phishing email (red) is associated with the same confidence as correct rejection for 12–17 (yellow), with confidence falling with age. False alarm is shown with least confidence in ages 12–17, and increases with age. (Color figure online)

The results show a significant difference between age groups in the hit or miss category, but not for correct rejections, false alarms, or any of the confidence levels. The ANOVA results of the confidence levels of the participants can be seen in Table 3. Here, we can say that age plays a significant role in responding to a stimulus, as evidenced by the participants either responding with “Authentic Email” or “Phishing Email.” A potential reason for the lack of significance in the confidence levels could be that they were not precisely represented and that participants’ perceived confidence was subjective. One participant’s response of a 5 (most confident) could be the same as another participant’s 3 (average confidence). Their perceived confidence could also shift throughout the survey; a response of 1 (least confident) could be changed to a 2 (lower confidence) or 3 later on, depending on whether or not the participants believed that the questions were more or less difficult at the beginning of the survey.

Figure 3 shows that correct results (yellow, blue) increase with age. Figure 4 shows confidence in false alarms increasing with age (green), with confidence about correct identification (and misidentification) higher for younger age groups. Our data revealed that the highest mean for the hit outcome was from age group six (45–54 years old). The second-highest mean for the hit outcome was from age group five (35–44 years old). Groups five and six also had the lowest mean for the miss outcome. In Fig. 3, we show the mean outcome for hit and correct rejection, which has an increasing slope, with a negative correlation with miss and false alarm. Therefore, there is strong evidence that older groups are less susceptible to spear phishing than the younger groups in a high school setting. Figure 4 shows that the other variables were not significant. This result is quite different from that hypothesized under the ‘digital native’ rubrics that argue for younger cohorts’ lifetime exposure resulting in improved decision-making (e.g., [27]).

Table 3. ANOVA descriptives for SDT confidence levels outcomes

5 Implications

Spear phishing is an effective form of attack because attackers manipulate their targets, either through luring them in with promises of specific benefits or by coercing them with specific threats  [25]. These techniques are designed to lead to impulsive or quick decision-making from the end-users. In our findings (Sect. 4), we leveraged SDT to understand participant decision-making with spear phishing stimuli. When the mean of the outcomes was graphed, the results revealed a positive slope for the hit and correct rejection outcomes, meaning that the older participants tended to be less susceptible to spear phishing. The effects of these relationships can contribute to a better understanding of how people interact with fraudulent acts online. Here we offer recommendations that our findings indicate as ways to increase resilience against spear phishing attacks.

Align Anti-Phishing Training with Self-perceived Expertise: Our work found that older participants were less susceptible to spear phishing than younger participants, as age group six had the highest average number of hits (i.e., correct detection) throughout the experiment. This is aligned with previous research from Sheng et al.  [33]. One reason for this gap may be students’ lack of exposure to training geared towards them. For this reason, we recommend introducing phishing training to students at a younger age and aligning it with their self-perceived expertise. Our results show both a high level of incorrect responses and a high level of confidence. This indicates that younger participants may be unaware that they have been the victim of a successful phishing attack.

Targeted Risk Communication: In addition to providing anti-phishing training, organizations should consider providing clear risk communication, especially for younger adults or children. Students may lack an understanding of the technical threats that may be present in their email inbox  [19], believing that they will not be targeted. Thus, the need for context-aware risk communication  [3] that has been identified as necessary for older adults  [6, 7, 16] is similarly required for high school student populations.

Enable Multi-factor Authentication: To create more robust defensive techniques against spear phishing attacks, we need to reduce the risk of compromised credentials. Such compromised credentials can be used to steal sensitive information. Because of this, schools that provide laptops (or require these for online instruction) should consider adopting multi-factor authentication (MFA) for students and staff [5, 9, 28]. The introduction of MFA (like other training) should be aligned with user risk mental models [10, 11, 12]. The issue of over-confidence noted above also motivates the importance of another factor for authentication (e.g., a hardware token) in addition to the password, which would mitigate the harm of phishing.

6 Limitations and Future Work

This work, with its focus on confidence as well as correctness, opens more questions than it answers. Other factors besides age and confidence levels should be studied to gain a holistic understanding of susceptibility to spear phishing. The suburban high school we engaged with has relatively high socio-economic homogeneity, and the study should be repeated with other high schools. To improve diversity, future work should begin with more diverse schools, and then study specific underrepresented populations, such as students with physical or learning disabilities. Interviewing the participants to collect more qualitative data and better understand user decision-making is a needed expansion of this work.

7 Conclusion

With the current rise in spear phishing, especially among vulnerable populations, it is critical to develop tools and educational approaches to train users to differentiate between authentic and malicious emails. To understand spear phishing attack resilience, we studied a population in a high school environment (\(N=57\)). We found that age and confidence play a critical role in the identification of spear phishing attacks. Our study concludes by providing recommendations for developing anti-phishing training tools and communicating risks and benefits.