6.1 RQ1: Analysis of Workers’ Answers
We begin by analyzing the answers provided by the workers for the P1 part of the survey, from Section 6.1.1 to Section 6.1.14, and those provided for the P2 part, from Section 6.1.15 to Section 6.1.25. We then summarize all our findings in Section 6.1.26.
6.1.1 Previous Experiences.
To begin the investigation, we analyzed the previous experiences with longitudinal studies in which each worker reported having taken part, reported in Table 3. We recall that the charts shown in the following figures (Figures 1–22) should be interpreted as described in Section 5.3.2.
A total of 300 workers were recruited, with each platform contributing 100 workers. They reported 547 previous experiences with longitudinal studies, averaging 1.82 experiences per worker. Prolific workers reported the most experiences (193), followed by Amazon Mechanical Turk (187) and Toloka (167); these correspond to 35.28%, 34.19%, and 30.53% of all reported experiences, respectively. Additionally, 97 workers (32.3%) reported experiences from a crowdsourcing platform different from their recruitment platform (see also Figure 6).
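To make these descriptive statistics easy to reproduce, the following minimal Python sketch (not the authors' analysis code; it simply reuses the counts reported above) computes the average number of experiences per worker and each platform's share of the reported experiences.

```python
# Minimal sketch (not the authors' analysis code): it reproduces the descriptive
# statistics reported above, using the per-platform experience counts from the text.
experiences = {"Amazon Mechanical Turk": 187, "Prolific": 193, "Toloka": 167}
workers_per_platform = 100  # 300 workers overall, 100 recruited per platform

total_experiences = sum(experiences.values())            # 547
total_workers = workers_per_platform * len(experiences)  # 300

print(f"Average experiences per worker: {total_experiences / total_workers:.2f}")  # ~1.82
for platform, count in experiences.items():
    share = 100 * count / total_experiences
    print(f"{platform}: {count} experiences ({share:.2f}% of all reported)")
```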
Figure 1 details workers' previous experiences with longitudinal studies from Table 3. The analysis shows that 45% of workers reported one experience, while 27.67% and 27.33% reported two and three experiences, respectively. These proportions varied across platforms. For Amazon Mechanical Turk, 42% reported one experience, 29% reported two, and 29% reported three. In Prolific, 43% reported one experience, 21% reported two, and 36% reported three. In Toloka, 50% reported one experience, 33% reported two, and 17% reported three. No statistically significant differences were observed across platforms.
The analysis suggests that workers on Prolific are more likely to report multiple previous experiences compared to those on other platforms, validating the recruitment criterion described in Section 5.2. Workers on Amazon Mechanical Turk and Toloka seem accustomed to longitudinal studies, indicating the need for a higher HIT completion threshold to recruit them effectively.
6.1.2 Time Elapsed.
Figure 2 describes the time elapsed, in months, since each reported previous experience, with a particular focus on participation in longitudinal studies that occurred up to 12 months earlier.
The majority of the reported experiences (87%) occurred within the 12 months preceding participation in the survey, while the remaining 13% occurred earlier. The distribution of these older experiences is rather homogeneous, with roughly 13% of the experiences on each crowdsourcing platform occurring more than 12 months earlier. This indicates that on Amazon Mechanical Turk and Prolific, workers were able to commit to longitudinal studies throughout the whole year before participating in this survey, while on Toloka, the reported experiences have been more recent (Amazon Mechanical Turk vs. Toloka statistically significant, adjusted p-value < 0.05).
6.1.3 Number of Sessions.
Figure 3 details, for each reported previous experience with longitudinal studies, how many sessions composed the overall study to which it refers.
The longitudinal studies in which workers participated on Amazon Mechanical Turk and Toloka have an average of about six sessions, while those on Prolific have seven sessions on average. In general, it appears that task requesters tend to publish slightly longer longitudinal studies on Prolific, although we did not obtain statistically significant comparisons across platforms.
6.1.4 Interval Between Sessions.
Figure 4 details the time elapsed, in days, between the sessions of the longitudinal study to which the reported experiences refer, focusing on ranges from 1 day to more than 30 days.
The timespans ranging from 1 day to 9 days encompass the majority of the longitudinal studies referred to by the reported experiences (63.45%). By extending the considered range up to 30 days, the vast majority of previous experiences (90%) is covered. Summarizing, most requesters schedule the next session of a study between the following day and a month later, with ten days being the most common timespan (Amazon Mechanical Turk vs. Toloka statistically significant, adjusted p-value < 0.01).
6.1.5 Session Duration.
Figure 5 details the duration of sessions in the longitudinal study to which the reported experiences refer, measured in minutes or hours.
Almost half of the longitudinal studies had sessions lasting 15 minutes (48.09%), while 22.89% lasted for 30 minutes, 12.72% for 45 minutes, and 12.41% for 60 minutes. The vast majority of sessions, thus, take place within an hour of work (96.11%). There is a small but not negligible number of sessions in longitudinal studies available on Toloka that last for 2 hours (13), along with two sessions on Amazon Mechanical Turk and a single session on Prolific. Furthermore, two workers reported Amazon Mechanical Turk sessions lasting 3 hours or more.
In general, the vast majority of task requesters on Prolific tend to publish longitudinal studies with shorter sessions, primarily 15 minutes (72%) or 30 minutes (20%), compared to other platforms. The answer distribution is more uniform when comparing Amazon Mechanical Turk and Toloka, although requesters on the latter platform tend to publish studies with longer sessions (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Toloka vs. Prolific statistically significant; adjusted p-value < 0.01).
6.1.6 Crowdsourcing Platform.
Figure 6 describes the platforms on which the reported previous experiences with longitudinal studies were conducted, as a worker recruited on a given platform might have also worked elsewhere. Roughly the same number of experiences took place on Amazon Mechanical Turk (38.16%) and Prolific (39.47%), while fewer experiences (22.37%) happened on Toloka.
Breaking down the responses by platform, the majority of experiences reported by Amazon Mechanical Turk and Prolific workers occurred on their respective platforms (around 90%). However, there were instances of cross-platform participation: 9% of the experiences reported by Amazon Mechanical Turk workers occurred on Prolific, while 6% of those reported by Prolific workers occurred on Amazon Mechanical Turk and 4% on Toloka. Additionally, although the experiences reported by Toloka workers primarily occurred on Toloka (63%), a notable portion also occurred on Amazon Mechanical Turk (17%) and Prolific (19%).
Summarizing, the distribution of the collected answers shows that Toloka workers tend to work on other platforms more frequently than those recruited from Amazon Mechanical Turk and Prolific, particularly in the context of longitudinal studies. However, this trend can also be observed on the remaining platforms (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka are statistically significant with an adjusted p-value < 0.01).
6.1.7 Payment Model.
Figure 7 investigates the payment model adopted by the longitudinal studies in which the recruited workers reported participating.
The majority of reported previous experiences (70.31%) involved longitudinal studies where workers were paid after each session, while 21.84% reported experiences with a final reward as the only form of payment. Only 7.84% of the reported experiences described studies relying on a combination of both payment approaches.
The distribution of the answers collected shows that the majority of previous experiences reported were part of longitudinal studies in which the workers were paid after each session, particularly on Amazon Mechanical Turk (75%). Using a final reward is also a viable option, as in 25% of the experiences reported by workers recruited on Prolific and Toloka. Furthermore, 9% of the experiences reported by Amazon Mechanical Turk workers and 7% of those reported on the remaining platforms refer to longitudinal studies that employed a combination of both approaches (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.8 Participation in Same Study.
Figure 8 investigates the workers' satisfaction after having participated in the longitudinal study referred to by each reported experience.
The vast majority of workers (91.59%) express their interest in participating again in the same longitudinal study. When breaking down the data across each platform, such opinion is consistent for both Prolific and Toloka workers, with a percentage of positive answers of 98% and 93%, respectively, while it lowers to 83% for Amazon Mechanical Turk workers (Amazon Mechanical Turk vs. Prolific and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.9 Loyalty and Commitment.
The mandatory open question 1.1.X.7.2 (P1 part) asks workers to specify what drove them to return for a second session after completing the first one in the longitudinal study referred to by the reported experience. It also asks workers to explain why they would refuse to participate in the same study altogether.
The workers provided 485 answers among the 547 previous experiences with longitudinal studies reported (88.66%). The distribution of answers collected across different themes is as follows: 272 out of 485 (56.08%) addressed aspects related to the task performed (task_features), while 101 (20.82%) focused on workers' own beliefs and motivations (worker_features). Additionally, 10 (2.06%) were about the longitudinal study as a whole (ls_features), 9 (1.86%) about the requester (requester_features), and 2 (0.41%) about the platform (platform_features). Lastly, 91 (18.76%) answers were deemed unusable (answer_useless). Table 7 (Appendix B) shows a sample of such answers.
The majority of responses (272 out of 485, 56.08%) highlight how task attributes influence workers' decisions. Some workers find tasks interesting (100 out of 272, 36.76%), easy (54 out of 272, 19.85%), or well-paid (112 out of 272, 41.18%), which motivates their return. Others (15 out of 272, 5.51%) mention the perceived reliability of securing rewards in subsequent sessions as a driver to return. Many workers (58 out of 272, 21.32%) appreciate the agency the task gives them to express their views while getting paid in return. Conversely, issues like low or unfair rewards, worker unavailability during follow-up sessions, or device-specific requirements are common reasons for abandonment or refusal to participate in longitudinal studies after a session. About 20.82% of responses (101 out of 485) come from workers who believe their preferences and attributes influence their decision to return for subsequent sessions in longitudinal studies.
A few workers (4 out of 101, 3.96%) mentioned the sunk costs of completing the first session as a motivating factor to return [5]. Additionally, 45 out of 101 workers (44.55%) expressed satisfaction with completing the initial session, citing the commitment required (12 out of 101, 11.88%); overall involvement; or the chance to gain insights, learn, and develop skills throughout the studies (15 out of 101, 14.85%).
A small number of workers (9 out of 485, 1.86%) discussed aspects and characteristics of the task requester that impact loyalty and commitment to the longitudinal study. They highlighted communication with the requesters and their ability to remind participants of subsequent study sessions as crucial factors. Additionally, 10 workers out of 485 (2%) touched on aspects of the longitudinal study as a whole. They described the type of study they enjoy and explained how longitudinal studies provided guaranteed work without the need to compete for tasks.
6.1.10 Participation Incentives (in Previous Experiences).
Figure 9 addresses the underlying motivations that drive workers' participation in the previous experiences with longitudinal studies reported.
Monetary aspects such as rewards and bonuses are the most important incentives for the participation in the majority of reported experiences (70.42%). Workers’ personal interest in the task proposed by the requester in the longitudinal study is an incentive for roughly 19% of experiences. Roughly 6% of participation in the reported experiences occurred because the worker found the task proposed educative, while the workers’ altruism, in terms of helping the overall research, has a lower but not negligible importance, considered by 4.71% of the respondents.
When considering each platform, it is interesting to note that 17% of Toloka workers found the task proposed in the reported experience with a longitudinal study educative, while this component is almost absent from Amazon Mechanical Turk (1%) and Prolific. Furthermore, Prolific is the platform hosting the majority of experiences that took place due to workers' personal interest (26%) or willingness to help the research (7%). This may be due to the fact that this platform is mostly focused on academic-related research projects, and task requesters are often researchers [58].
Generally, even though monetary aspects are the most popular incentives that drove workers to participate in the previous experiences with longitudinal studies reported, the remaining factors should not be overlooked when designing the overall longitudinal study (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.11 Study Completion.
Figure 10 investigates whether workers completed the overall longitudinal study to which each reported experience refers. Specifically, they claim completion of almost every previous experience (97.65%), with only 13 experiences out of 547 (2.35%) dropped.
When considering each platform, workers claim completion of almost every experience on Prolific and Toloka (99%), while this amount is slightly lower for Amazon Mechanical Turk, at 95% (no statistically significant comparisons across platforms obtained). Even though the crowdsourcing platforms do not provide any means of verifying this data, we recall that we recruited workers with certain task completion rates (i.e., experienced workers), as described in Section 5.2. Thus, we argue that they have little incentive to provide inaccurate information about their previous completions.
6.1.12 Completion Incentives (in Previous Experiences).
Figure 11 addresses the underlying motivations that drive workers to complete the previous experiences with longitudinal studies reported and should be compared with the answers provided for question 1.1.X.8, analyzed in Section 6.1.10, which focuses on the motivations that drive workers to participate. Indeed, while the set of possible answers is the same, this question restricts the focus to completed experiences and attempts to grasp the changes in workers' perception of the overall experience. Thus, the 11 experiences from which workers dropped participation (i.e., those reported in the right half of Figure 10) are marked using a separate label, namely "Participation Dropped", to allow a direct comparison of the bar charts.
Monetary aspects such as rewards and bonuses remain the most important factors for the majority of the previous experiences reported (68.3%), with a slight decrease of 2.12 percentage points. The impact of workers' personal interest in the task proposed by the requester (18.49%) remains almost unchanged, as does their opinion about the task being educative. Most of the answers that shift away from monetary aspects end up describing workers' willingness to help with the overall research, which rises from 4.71% to 6.06%.
When considering each platform, the overall distribution of answers does not change in terms of relative comparisons. The most noticeable difference is found for Prolific, where workers’ personal interest in the proposed task drops from 26% to 19%, becoming comparable with that of other platforms. A similar phenomenon occurs for Amazon Mechanical Turk, where interest in the final reward shifts from 49% to 55% (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.13 Crowdsourcing Platforms Suitability.
The mandatory open-ended question 2 (P1 part) asks workers about the adequacy and suitability of the support that their recruitment platform provides for longitudinal studies.
The majority of workers (273 out of 300, 91%) provided an answer that allows us to draw some kind of consideration. The distribution of the answers collected across different themes is as follows: 244 out of 273 (89.34%) addressed aspects related to the crowdsourcing platform (platform_features), while 11 (4.03%) focused on workers' own beliefs and motivations (worker_features). Lastly, 18 (6.59%) answers were deemed unusable (answer_useless). Table 8 (Appendix B) shows a sample of such answers.
The vast majority of answers directly relate to the crowdsourcing platform of origin (244 out of 273, 89.34%). Breaking down the respondents across each platform reveals 98 workers from Amazon Mechanical Turk, 100 from Prolific, and 76 from Toloka. The majority of Amazon Mechanical Turk workers (70 out of 98, 71.43%) believe the platform is generally adequate, with few providing additional details. Three of them (3.06%) specifically mention the ease of sending reminders for upcoming longitudinal study sessions. Only seven (7.14%) find the platform inadequate in supporting longitudinal studies. One worker suggests that the platform needs design improvements to facilitate scheduling tasks for longitudinal studies, while another highlights the challenge for requesters to ensure worker honesty.
Nearly all Prolific workers (97 out of 100, 97%) consider the platform adequate for supporting longitudinal studies, with many providing detailed responses. Some mention the platform’s detailed task reports, facilitating tracking throughout the study. Others (7 out of 100, 7%) highlight the diverse backgrounds and skills of available individuals. Factors such as ease of contacting or sending reminders to workers using their identifier are noted by 16 out of 100 workers (16%). Additionally, two workers (2 out of 100, 2%) emphasize worker motivation and reliability as important considerations for researchers. Notably, one worker mentions being recruited from the platform via a third-party application that relies on the platform’s API.
The majority of Toloka workers (68 out of 76, 89.47%) consider the platform adequate overall, with few providing specific details. Two workers (2 out of 76, 2.63%) mention worker availability and the ease of contacting them using their identifier. One worker’s response is notable; they believe the platform cannot adequately support a longitudinal study due to residing in a country with poor network infrastructure.
When considering workers who are uncertain or outright deny the adequacy of the platform, several cross-platform factors become apparent. These workers are more likely to drop out of longitudinal studies due to perceived inadequacies. They express difficulties in assessing requester honesty, which can lead to skepticism about participating in such studies. Additionally, respondents believe that workers typically do not actively seek out longitudinal studies, suggesting a need for platforms to better distinguish these studies from standard crowdsourcing tasks.
6.1.14 Reasons that Limit Availability on Platforms.
Figure 12 investigates the reasons that limit the availability of longitudinal studies on crowdsourcing platforms according to workers' opinions. The most prevalent reasons, chosen roughly the same number of times, are that workers dislike the required commitment (32.85%) and that the provided rewards and incentives are insufficient. Several answers indicate that longitudinal studies are not optimally supported by current popular crowdsourcing platforms (24.85%), and 9.07% of answers point out that requesters usually do not need longitudinal participation since most tasks deal with static data to annotate.
The distribution of answers changes when considering each platform. Specifically, 44% of the answers provided by Prolific workers indicate their dislike of the required commitment, while this factor is less important for Amazon Mechanical Turk workers (29%) and Toloka workers (26%). The lack of adequate technical support is prevalent among the answers provided by Toloka workers (35%), while for Prolific, this is reported by only 12% of the answers. The percentage of answers indicating that rewards and incentives are insufficient is slightly higher for Amazon Mechanical Turk (36%) compared to Toloka (33%), which in turn is slightly higher than Prolific (29%). Among the answers describing that often crowdsourcing tasks do not need longitudinal participation, those from Prolific are prevalent (15%).
Summarizing, workers indeed dislike the required commitment and find monetary aspects and related incentives insufficient. Also, they think that longitudinal studies are not adequately supported by crowdsourcing platforms (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.15 Preferred Commitment Duration.
Figure 13 investigates the number of days workers would be happy to commit to a longitudinal study, assuming a single session of 15 minutes per day.
By considering each platform, Amazon Mechanical Turk and Toloka workers show rather similar trends, with mean numbers of days around 19 and 17, respectively. Turning to Prolific, this number increases to an average of almost a month (29 days).
Generally, Prolific is the platform where workers willing to commit to longitudinal studies for longer periods can be found, at least when compared with Toloka (Prolific vs. Toloka statistically significant; adjusted p-value < 0.05).
6.1.16 Reasons for Declining Participation.
Figure 14 investigates the reasons that drive workers to decline participation in longitudinal studies.
The majority of the answers provided by workers indicate that the length of the longitudinal study, in terms of the number of sessions and thus the time elapsed in days or even months since its start, is the most important factor (70.97%). The remaining answers (29.03%) indicate that the frequency of the sessions of the longitudinal study is also a reason that can lead to declining participation and should not be overlooked.
By considering each platform, the vast majority of answers provided by Prolific workers (85%) consider study length as a major concern, and this holds also when considering Toloka, albeit to a lesser extent (71%). As for Amazon Mechanical Turk, the trend is more nuanced, since the gap between answers that consider study length (57%) and study frequency (43%) is smaller (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.17 Preferred Participation Frequency.
Figure 15 investigates the preferred participation frequency in longitudinal studies according to the workers, in terms of time periods.
The vast majority of workers prefer frequent studies, with a daily to weekly participation commitment. Particularly, daily participation is the most popular option overall (42.78%). Only a small minority of 11 workers (6.68%) would prefer longer time periods.
There are some nuances among the preferences of the workers recruited from each platform. Particularly, Toloka workers prefer, for the most part, a daily participation frequency (53%). Prolific workers, on the other hand, have a slightly higher preference for a weekly frequency (40%), followed by a daily frequency (35%). For Amazon Mechanical Turk workers, the trend is the opposite, as they prefer a daily participation frequency (40%), shortly followed by a weekly frequency (38%). Regarding longer frequencies, it is worth noting that six Toloka workers (6%) prefer a biweekly frequency, and five Amazon Mechanical Turk workers (5%) along with three Prolific workers (3%) prefer a monthly frequency.
These findings can be aligned with those described in Figure 14, as indeed the study length is a major concern for workers (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.18 Preferred Session Duration.
Figure 16 investigates the preferred session duration, in hours, for longitudinal studies according to workers.
Prolific workers prefer short sessions of less than 1 hour on average, while Amazon Mechanical Turk and Toloka workers share a more uniform preference, indicating an average of about two hours. Nine outliers who provided unreasonable durations (i.e., between 15 and 50 hours) were removed and are thus not shown in the figure.
Generally speaking, Amazon Mechanical Turk and Toloka workers are thus keen to work for a longer time within a single session when compared with Prolific workers (Amazon Mechanical Turk vs. Prolific and Prolific vs. Toloka statistically significant; adjusted p-value < 0.05).
6.1.19 Acceptable Hourly Payment.
Figure 17 investigates the acceptable hourly payment rate, in USD, for participating in longitudinal studies on the recruitment platform, as reported by the workers.
Amazon Mechanical Turk workers aim to receive the highest hourly payment on average (about $13), while for Prolific workers this amount lowers to about $10.50. On the other hand, Toloka workers indicate the lowest acceptable amount of money (about $8.50). Eight outliers who provided unreasonable amounts (i.e., between $80 and $100) were removed and are thus not shown in the figure.
To interpret the provided answers, one must consider that the payment models of Amazon Mechanical Turk and Toloka differ from that of Prolific. On the first two platforms, a task requester proposes a unitary amount of money for each work unit performed, which can be arbitrarily high. On the other hand, the Prolific platform requires requesters to estimate the task completion time and propose, instead of a unitary amount, a minimum amount of money based on the hourly estimate. Thus, this difference may impact the workers’ perception of the acceptable payment amount (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka statistically significant; adjusted p-value < 0.05).
6.1.20 Preferred Time to Allocate Daily.
Figure 18 investigates the preferred amount of time, in hours, that workers are available to allocate on a daily basis for participating in longitudinal studies.
The workers recruited on Toloka are those keen to work the most per day, being available to allocate almost four hours on average (3.81). Amazon Mechanical Turk workers prefer working up to almost three hours (2.85), while Prolific workers expect to work less, roughly an hour and a half (1.66). Eighteen outliers who provided unreasonable amounts of hours per day (i.e., between 20 and 25) were removed and are thus not shown in the figure.
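As an illustration of the outlier handling mentioned in Sections 6.1.18–6.1.20, the following minimal Python sketch drops implausible answers before averaging; the data and the threshold below are hypothetical, since the exact filtering rule is not detailed here.

```python
# Minimal sketch with hypothetical data: one possible way to drop the implausible
# answers mentioned above before averaging. The threshold below is an assumption,
# not the exact filtering rule used in the paper.
def mean_without_outliers(hours, upper_bound):
    """Drop answers above `upper_bound` and average the remaining ones."""
    kept = [h for h in hours if h <= upper_bound]
    removed = len(hours) - len(kept)
    return sum(kept) / len(kept), removed

# Hypothetical answers (in hours per day) from workers of one platform.
answers = [4, 3, 5, 2, 25, 4]  # 25 hours/day is clearly implausible
mean, removed = mean_without_outliers(answers, upper_bound=12)
print(f"Mean daily hours: {mean:.2f} ({removed} outlier(s) removed)")
```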
In general, Toloka workers are those who are keen to work more within a day and expect to be rewarded less. This is evident not only in the time they allocate daily for participation, as shown in Figure 18, but also when asked about their preferred session duration (Figure 16) or their acceptable hourly payment (Figure 17). As for Amazon Mechanical Turk and Prolific workers, they expect to work less on average, particularly the latter (Amazon Mechanical Turk vs. Prolific and Prolific vs. Toloka statistically significant; adjusted p-value < 0.05).
6.1.21 Participation Incentives (in New Experiences).
Figure 19 investigates the underlying motivations that drive participation in new longitudinal studies.
In general, the type of reward/payment mechanism is the most important incentive, according to the vast majority of answers provided by workers (81.86%). Among them, the preferred alternative is providing payment after each session (32.07%). As for the remaining ones, 24.22% indicate a final bonus to be awarded after the last session, while 20.38% prefer a progressive incremental payment after each session. A progressive decremental payment (2.51%) or a possible penalization for skipping one or more sessions (2.43%) has a small but not negligible influence on participation chances in new studies.
Beyond the reward/payment mechanism, 12.04% of answers indicate working on different task types to increase engagement diversity, while 6.18% suggest experimental variants of the same tasks to reduce repeatability. When considering each crowdsourcing platform, no particular trends emerge (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.22 Task Types.
Figure 20 investigates the tasks that workers would like to perform in a longitudinal study. We acknowledge that the predefined set of answers we provided might not have been perceived as exhaustive; indeed, workers were given the opportunity to provide a free-text response to further elaborate.
By surveys, we refer to surveys about various aspects that are usually crowdsourced, like demographics (22.71%). Verification and validation tasks require workers in the crowd to either verify certain aspects as per the given instructions, or confirm the validity of various kinds of content (17.99%). Interpretation and analysis tasks rely on the wisdom of the crowd to use their interpretation skills during task completion (17.92%). Information finding tasks delegate the process of searching to satisfy one’s information need to the workers in the crowd (16.51%). Content access tasks require the crowd workers to simply access some content (14.59%) and content creation tasks require the workers to generate new content for a document or website (10.28%).
It is worth noting that two workers mentioned other types of tasks in their free-text responses, namely gamified tasks and content editing; the latter is indeed an option that we did not consider alongside content access and content creation.
Summarizing, workers are willing to perform any of the task types proposed across each platform, with a rather homogeneous answer distribution. However, this distribution still accounts for statistical significance (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.23 Involvement Benefits.
Figure 21 investigates the benefits of being involved in longitudinal studies according to workers.
In general, workers think that the most important benefit characterizing longitudinal studies is increased productivity due to their more operational nature (32.1%). They also appreciate the time-saving aspect, as longitudinal studies eliminate the need for regular task searching (26.64%). Furthermore, workers think that receiving intermediate payments, after each session of the longitudinal study, would increase trust in the requester (25.81%). Additionally, some workers find value in avoiding the need to re-learn tasks when participating in longitudinal studies (15.45%).
The trends are largely homogeneous across all platforms, with no single factor standing out as more important than the others. The only exception is increased productivity, which is more prominent for Amazon Mechanical Turk workers (36%) and Toloka workers (37%) compared to Prolific (24%). Nonetheless, this distribution still accounts for statistical significance (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.24 Involvement Downsides.
Figure 22 investigates the downsides of being involved in longitudinal studies according to workers.
The answers provided by workers indicate that a reward provided only at the end of the longitudinal study is the most important downside (30.87%). The lack of flexibility in the study schedule and the required long-term commitment are indicated by roughly the same amount of answers, namely 27.48% and 27.63%, respectively. The lack of diversity in terms of the work to be performed during each session of the overall study plays a minor role (14.02%).
By considering each platform, the trends are rather homogeneous for Amazon Mechanical Turk and Prolific. However, it is interesting to notice how the lack of diversity is a more prominent downside for Toloka workers (20%), while at the same time, the long-term commitment is less of an issue (21%) when compared to the remaining platforms (Amazon Mechanical Turk vs. Prolific, Amazon Mechanical Turk vs. Toloka, and Prolific vs. Toloka statistically significant; adjusted p-value < 0.01).
6.1.25 Suggestions about Longitudinal Study Design.
The last and optional question 11 (P2 part) asked workers to provide any suggestions to requesters that aim to design a longitudinal study.
There are 201 out of 300 (67%) workers who provided some kind of answer. The distribution of answers collected across different themes is as follows: 139 out of 201 (69.15%) addressed aspects related to the task performed (task_features), 9 (4.48%) focused on requesters (requester_features), and 7 (3.48%) focused on workers' own beliefs and motivations (worker_features). Additionally, 5 (2.49%) were about the longitudinal study as a whole (ls_features), and 2 (1%) were about the platform (platform_features). Lastly, 2 (1%) answers were deemed unusable (answer_useless). Furthermore, 37 (18.41%) workers explicitly stated that they did not have any suggestions (no_suggestion). Table 9 (Appendix B) shows a sample of such answers.
The majority of workers (139 out of 201, 69.15%) suggest improvements related to the features of the task to be performed within each session of the longitudinal study, including its design, scheduling, and participant filtering. Six out of 139 workers (4.32%) propose allowing a reasonable window for completion, considering other activities in workers' schedules. One worker suggests the option to skip a session if unable to commit occasionally. Additionally, a few workers (3 out of 139, 2.16%) emphasize the importance of conducting pilot tests for the tasks, which can help both requesters find suitable participants and retain workers throughout the study. A worker suggests offering different systems for participating in the study (e.g., desktop devices, smartphones) and another worker advises against requiring downloads. This resonates with prior work that has revealed the diverse work environments in which workers are embedded [25]. Workers emphasize the need for clear instructions and user interface, an understandable sequence of events, identifying changes over time, and providing insight into cause-and-effect relationships. Some believe variability could help maintain interest in the study.
Regarding the overall structure of a longitudinal study (5 out of 201, 2.49%), workers suggest planning all sessions in advance while remaining flexible with the schedule, especially when involving multiple geographic time zones. They also recommend establishing a sense of progression, such as highlighting differences in previous responses at the end of each session.
A few workers (7 out of 201, 3.48%) provide personal insights. One worker notes that many are self-employed and must pay taxes on their earnings from crowdsourcing platforms, so rewards should reflect this. Another worker prefers small payments with a bonus for completing all sessions.
Considering aspects related to the task requesters (9 out of 201, 4.48%), workers think regular feedback from requesters is important. They suggest that requesters should be communicative and friendly, leave spaces for feedback in each study, send reminders when needed, and provide clear upfront information.
6.1.26 Summary.
The workers' answers for the P1 part of the survey are summarized in Table 4, while those provided for P2 in Table 5. Both tables provide a detailed summary of the answers, along with the code used to classify each question and a breakdown of responses across each crowdsourcing platform considered.
Table 6 shows the outcome of statistical tests performed by comparing the groups of answers provided across each platform. The table includes the name and answer type of each question. A checkmark (\(\checkmark\)) indicates a statistically significant comparison with the adjusted p-value provided, while its absence indicates that a given comparison was not statistically significant.
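For readers who want to reproduce comparisons of this kind, the following Python sketch shows one way pairwise platform comparisons with multiple-testing correction could be run. It is only an illustration: the actual procedure is the one described in Section 5.3.2, and the answer counts, the chi-square test, and the Bonferroni adjustment used here are assumptions rather than the paper's exact choices.

```python
# Illustrative sketch only: the actual testing procedure is the one described in
# Section 5.3.2. Here we assume, purely for illustration, a chi-square test on
# per-platform answer counts with a Bonferroni adjustment over the three pairwise
# comparisons; the counts below are hypothetical.
from itertools import combinations
from scipy.stats import chi2_contingency

counts = {
    "Amazon Mechanical Turk": [42, 29, 29],
    "Prolific": [43, 21, 36],
    "Toloka": [50, 33, 17],
}

pairs = list(combinations(counts, 2))
for platform_a, platform_b in pairs:
    _, p_value, _, _ = chi2_contingency([counts[platform_a], counts[platform_b]])
    p_adjusted = min(1.0, p_value * len(pairs))  # Bonferroni correction
    print(f"{platform_a} vs. {platform_b}: adjusted p-value = {p_adjusted:.3f}")
```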
Finally, we summarize the key findings with a list of take-home messages, starting from the perception of longitudinal studies according to workers' previous experiences (messages 1-9, P1 part questions), then moving to workers' opinions about future longitudinal studies (messages 10-17, P2 part questions). For each message, we report a reference to the corresponding section where the analysis is reported.
(1) Workers with more experience with longitudinal studies can be found more easily on the Prolific platform (Section 6.1.1), and the available studies on this platform tend to have more sessions compared to other platforms (Section 6.1.3).
(2) Most of the experiences reported by the workers took place up to one year before their participation in the survey (Section 6.1.2).
(3) Most of the sessions of the reported longitudinal studies lasted up to 2 hours, with roughly half of them lasting for only 15 minutes (Section 6.1.5).
(4) Most of the time intervals between sessions in the reported longitudinal studies range from 1 to 30 days (Section 6.1.4).
(5) Most of the longitudinal studies reported provide partial rewards after each session (Section 6.1.7).
(6) The main motivation that drove workers to participate in and complete the reported longitudinal studies is the monetary aspect (Sections 6.1.10 and 6.1.12).
(7) Almost every worker claims completion of the reported longitudinal studies (Section 6.1.11).
(8) Most of the workers want to continue participating in the longitudinal studies reported in the future (Section 6.1.8).
(9) The main reasons that limit the availability of longitudinal studies on crowdsourcing platforms are workers' dislike for the required commitment and the insufficiency of the provided rewards (Section 6.1.14).
(10) In a hypothetical longitudinal study where workers are asked to engage in a single session for 15 minutes each day, workers are willing to commit to participating for an average of 21 days (Section 6.1.15). However, when considering session duration, workers are generally willing to work for up to an average of 103 minutes per session (Section 6.1.18).
(11) Most of the workers prefer a daily to weekly participation frequency for longitudinal studies (Section 6.1.17).
(12) The workers prefer to allocate about 2.7 hours per day, on average, for participating in longitudinal studies (Section 6.1.20).
(13) The workers think that the acceptable hourly payment for participating in longitudinal studies is about $10.75 on average (Section 6.1.19). It must be noted that such an amount should be adjusted for inflation.
(14) Workers report that the main incentives driving participation in new longitudinal studies are related to the reward provided (Section 6.1.21).
(15) Most of the workers believe that the length of a longitudinal study is critical in influencing their decision to refuse participation (Section 6.1.16).
(16) Workers report that the main benefits of being involved in longitudinal studies are increased productivity due to their operational nature and the elimination of the need for regular task searching (Section 6.1.23).
(17) Workers report that the main downsides of being involved in longitudinal studies are the long-term commitment required, the lack of flexibility, and the reward provided only at their completion (Section 6.1.24).