Search Results (57)

Search Parameters:
Keywords = human–robot conversation

21 pages, 2476 KiB  
Article
Enhancing Human–Agent Interaction via Artificial Agents That Speculate About the Future
by Casey C. Bennett, Young-Ho Bae, Jun-Hyung Yoon, Say Young Kim and Benjamin Weiss
Future Internet 2025, 17(2), 52; https://doi.org/10.3390/fi17020052 - 21 Jan 2025
Viewed by 581
Abstract
Human communication in daily life entails not only talking about what we are currently doing or will do, but also speculating about future possibilities that may (or may not) occur, i.e., “anticipatory speech”. Such conversations are central to social cooperation and social cohesion in humans. This suggests that such capabilities may also be critical for developing improved speech systems for artificial agents, e.g., human–agent interaction (HAI) and human–robot interaction (HRI). However, to do so successfully, it is imperative that we understand how anticipatory speech may affect the behavior of human users and, subsequently, the behavior of the agent/robot. Moreover, it is possible that such effects may vary across cultures and languages. To that end, we conducted an experiment where a human and autonomous 3D virtual avatar interacted in a cooperative gameplay environment. The experiment included 40 participants, comparing different languages (20 English, 20 Korean), where the artificial agent had anticipatory speech either enabled or disabled. The results showed that anticipatory speech significantly altered the speech patterns and turn-taking behavior of both the human and the agent, but those effects varied depending on the language spoken. We discuss how the use of such novel communication forms holds potential for enhancing HAI/HRI, as well as the development of mixed reality and virtual reality interactive systems for human users. Full article
(This article belongs to the Special Issue Human-Centered Artificial Intelligence)
Figures:
Figure 1: Gameplay example during experiment: human vs. avatar (from [58]).
Figure 2: Artificial agent speech patterns (H1 condition).
Figure 3: Sentiment analysis—H1 condition.
Figure 4: Sentiment analysis—Control condition.
Figure A1: Speech dialogue with anticipatory speech (gathering resources).
Figure A2: Speech dialogue without anticipatory speech.
Figure A3: Speech dialogue with anticipatory speech (fighting monsters).
Figure A4: Speech dialogue without anticipatory speech.
Figure A5: Speech dialogue with anticipatory speech (impending night-time, i.e., substantial danger level increase).
Figure A6: Speech dialogue without anticipatory speech.
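The anticipatory-speech condition described above amounts to an agent that sometimes volunteers talk about possible future game events rather than only the current action. A minimal sketch of such a trigger rule is shown below; it is an assumed illustration only (the game-state fields, thresholds, and utterances are invented, not taken from the paper's system).

```python
import random

# Hypothetical anticipatory utterances keyed by predicted future events.
ANTICIPATORY_LINES = {
    "night": "It will be dark soon -- maybe we should head back before the monsters come out?",
    "low_wood": "We might run out of wood later; should we gather some while it's quiet?",
}

def maybe_anticipate(game_state, enabled=True, prob=0.5):
    """Return an anticipatory utterance about a possible future event, or None.

    game_state: dict of predicted values, e.g. {"seconds_to_night": 40, "wood": 3}.
    enabled mirrors the experiment's anticipatory-speech on/off condition;
    prob keeps the agent from commenting on every single prediction.
    """
    if not enabled:
        return None
    if game_state.get("seconds_to_night", 999) < 60 and random.random() < prob:
        return ANTICIPATORY_LINES["night"]
    if game_state.get("wood", 99) < 5 and random.random() < prob:
        return ANTICIPATORY_LINES["low_wood"]
    return None

if __name__ == "__main__":
    print(maybe_anticipate({"seconds_to_night": 40, "wood": 10}, prob=1.0))
```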
21 pages, 3698 KiB  
Article
Child-Centric Robot Dialogue Systems: Fine-Tuning Large Language Models for Better Utterance Understanding and Interaction
by Da-Young Kim, Hyo Jeong Lym, Hanna Lee, Ye Jun Lee, Juhyun Kim, Min-Gyu Kim and Yunju Baek
Sensors 2024, 24(24), 7939; https://doi.org/10.3390/s24247939 - 12 Dec 2024
Viewed by 720
Abstract
Dialogue systems must understand children’s utterance intentions by considering their unique linguistic characteristics, such as syntactic incompleteness, pronunciation inaccuracies, and creative expressions, to enable natural conversational engagement in child–robot interactions. Even state-of-the-art large language models (LLMs) for language understanding and contextual awareness cannot comprehend children’s intent as accurately as humans because of their distinctive features. An LLM-based dialogue system should acquire the manner by which humans understand children’s speech to enhance its intention reasoning performance in verbal interactions with children. To this end, we propose a fine-tuning methodology that utilizes the LLM–human judgment discrepancy and interactive response data. The former data represent cases in which the LLM and human judgments of the contextual appropriateness of a child’s answer to a robot’s question diverge. The latter data involve robot responses suitable for children’s utterance intentions, generated by the LLM. We developed a fine-tuned dialogue system using these datasets to achieve human-like interpretations of children’s utterances and to respond adaptively. Our system was evaluated through human assessment using the Robotic Social Attributes Scale (RoSAS) and Sensibleness and Specificity Average (SSA) metrics. Consequently, it supports the effective interpretation of children’s utterance intentions and enables natural verbal interactions, even in cases with syntactic incompleteness and mispronunciations. Full article
(This article belongs to the Special Issue Challenges in Human-Robot Interactions for Social Robotics)
Figures:
Figure 1: Overview of AI home robot service and interaction design from our previous study.
Figure 2: Results of Godspeed questionnaire.
Figure 3: Process of fine-tuning dataset construction (Q: robot’s question; A: child’s answer; R: interactive response).
Figure 4: Example of prompts and response judgment data provided to LLM and humans.
Figure 5: Structure of fine-tuning dataset with message roles.
Figure 6: Comparison of dialogue systems for child’s utterance with lack of specificity.
Figure 7: Comparison of dialogue systems for child’s utterance with subtle affirmative expression.
Figure 8: Comparison of dialogue systems for child’s utterance with mispronunciation or misrecognition.
Figure 9: Evaluation results for dialogue system.
Figure A1: Dialogue system prompts.
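Figure 5 above refers to a fine-tuning dataset structured with message roles. As a hedged sketch in the common chat-style JSONL format (the system prompt, roles, and field names are assumptions for illustration, not the paper's actual schema), one (robot question, child answer, interactive response) triple might be serialized like this.

```python
import json

def to_finetune_record(robot_question, child_answer, interactive_response):
    """Serialize one (Q, A, R) triple as a chat-style fine-tuning example.

    The system prompt, role layout, and field names are assumptions for
    illustration, not the schema used in the paper.
    """
    return {
        "messages": [
            {"role": "system",
             "content": "You are a home robot talking with a young child. "
                        "Interpret incomplete or mispronounced utterances charitably."},
            {"role": "user",
             "content": f"Robot asked: {robot_question}\nChild answered: {child_answer}"},
            {"role": "assistant", "content": interactive_response},
        ]
    }

if __name__ == "__main__":
    rec = to_finetune_record(
        "What did you play at kindergarten today?",
        "umm... bwocks! big big tower",
        "Wow, a big block tower! How tall did it get before it fell?",
    )
    print(json.dumps(rec, ensure_ascii=False))
```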
16 pages, 3403 KiB  
Article
Beyond Binary Dialogues: Research and Development of a Linguistically Nuanced Conversation Design for Social Robots in Group–Robot Interactions
by Christoph Bensch, Ana Müller, Oliver Chojnowski and Anja Richert
Appl. Sci. 2024, 14(22), 10316; https://doi.org/10.3390/app142210316 - 9 Nov 2024
Viewed by 966
Abstract
In this paper, we detail the technical development of a conversation design that is sensitive to group dynamics and adaptable, taking into account the subtleties of linguistic variations between dyadic (i.e., one human and one agent) and group interactions in human–robot interaction (HRI) using the German language as a case study. The paper details the implementation of robust person and group detection with YOLOv5m and the expansion of knowledge databases using large language models (LLMs) to create adaptive multi-party interactions (MPIs) (i.e., group–robot interactions (GRIs)). We describe the use of LLMs to generate training data for socially interactive agents including social robots, as well as a self-developed synthesis tool, knowledge expander, to accurately map the diverse needs of different users in public spaces. We also outline the integration of an LLM as a fallback for open-ended questions not covered by our knowledge database, ensuring it can effectively respond to both individuals and groups within the MPI framework. Full article
(This article belongs to the Special Issue Advances in Cognitive Robotics and Control)
Figures:
Figure 1: Intended design for dyadic versus group interactions, illustrating linguistic nuances in conversation design using the German language as a case study. Own illustration with AI-generated background.
Figure 2: This is a conversation between a user (red) and a social robot (orange) in a city administration setting. Each speech bubble is bilingual (German/English). Icons next to the robot indicate a knowledge database response and single user mode. The dialogue continues in Section 3.1.
Figure 3: Architectural sketch of our social robot, with the corresponding subsystems. Own illustration.
Figure 4: Detection of individuals at the city hall through multiple cameras evaluates engagement by examining the area of bounding boxes. Blue boxes represent engaged individuals, while red boxes signify either non-users or background objects. Exocentric camera outputs are shown in the left and right images, and the center video is from Furhat’s egocentric camera. Participant faces are anonymized to safeguard their privacy.
Figure 5: This is the continuation of the dialogue from Section 2.1, where a second user joins the interaction space, prompting the social robot to switch to MPI mode. The second user’s intents are shown in purple. Icons beside the robot indicate MPI mode and a knowledge database response. The conversation will be continued in Section 3.3.
Figure 6: The GPT-3.5 prompt to generate multiple English synonyms for plural sentences, addressing a group.
Figure 7: This is the continuation of the dialogue from Section 3.1. Icons beside the robot indicate MPI mode and knowledge database responses. The conversation will be continued in Section 3.4.
Figure 8: This is the continuation of the dialogue from Section 3.4. Icons beside the robot indicate MPI mode vs. single user mode as well as LLM vs. knowledge database responses.
Figure 9: Final LLM prompt.
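The pipeline described above answers from a curated knowledge database first and falls back to an LLM for open-ended questions, while phrasing replies differently for one user versus a group. A minimal routing sketch follows; the database layout, the llm_fallback callable, and the singular/plural variants are placeholders rather than the authors' implementation.

```python
def answer(query, n_users, knowledge_db, llm_fallback):
    """Route a query: knowledge database first, LLM fallback otherwise.

    knowledge_db: dict mapping normalized queries to {"singular": ..., "plural": ...} replies.
    llm_fallback: callable(query) -> str; stands in for an actual LLM call.
    n_users: number of people currently detected in the interaction space.
    """
    mode = "plural" if n_users > 1 else "singular"
    entry = knowledge_db.get(query.strip().lower())
    if entry is not None:
        return entry[mode], "knowledge_db"
    return llm_fallback(query), "llm"

if __name__ == "__main__":
    db = {
        "where do i get a passport?": {
            "singular": "You can apply for your passport at counter 3.",
            "plural": "You can all apply for your passports at counter 3.",
        }
    }
    print(answer("Where do I get a passport?", n_users=2,
                 knowledge_db=db, llm_fallback=lambda q: "(LLM-generated answer)"))
```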
25 pages, 6212 KiB  
Article
Qualitative Analysis of Responses in Estimating Older Adults Cognitive Functioning in Spontaneous Speech: Comparison of Questions Asked by AI Agents and Humans
by Toshiharu Igarashi, Katsuya Iijima, Kunio Nitta and Yu Chen
Healthcare 2024, 12(21), 2112; https://doi.org/10.3390/healthcare12212112 - 23 Oct 2024
Viewed by 1065
Abstract
Background/Objectives: Artificial Intelligence (AI) technology is gaining attention for its potential in cognitive function assessment and intervention. AI robots and agents can offer continuous dialogue with the elderly, helping to prevent social isolation and support cognitive health. Speech-based evaluation methods are promising as they reduce the burden on elderly participants. AI agents could replace human questioners, offering efficient and consistent assessments. However, existing research lacks sufficient comparisons of elderly speech content when interacting with AI versus human partners, and detailed analyses of factors like cognitive function levels and dialogue partner effects on speech elements such as proper nouns and fillers. Methods: This study investigates how elderly individuals’ cognitive functions influence their communication patterns with both human and AI conversational partners. A total of 34 older people (12 men and 22 women) living in the community were selected from a silver human resource centre and day service centre in Tokyo. Cognitive function was assessed using the Mini-Mental State Examination (MMSE), and participants engaged in semi-structured daily conversations with both human and AI partners. Results: The study examined the frequency of fillers, proper nouns, and “listen back” in conversations with AI and humans. Results showed that participants used more fillers in human conversations, especially those with lower cognitive function. In contrast, proper nouns were used more in AI conversations, particularly by those with higher cognitive function. Participants also asked for explanations more often in AI conversations, especially those with lower cognitive function. These findings highlight differences in conversation patterns based on cognitive function and the conversation partner being either AI or human. Conclusions: These results suggest that there are differences in conversation patterns depending on the cognitive function of the participants and whether the conversation partner is a human or an AI. This study aims to provide new insights into the effective use of AI agents in dialogue with the elderly, contributing to the improvement of elderly welfare. Full article
Figures:
Figure 1: (a) The appearance of the author who conducted the interpersonal conversation and (b) the appearance of the AI agent modelled based on the author’s appearance.
Figure 2: Protocol of daily conversation system for cognitive function estimation by AI agents.
Figure 3: Configuration diagram showing the entire system.
Figure 4: Comparison of the average frequency of fillers based on dialogue partner.
Figure 5: Comparison of the frequency of fillers based on cognitive function levels.
Figure 6: Comparison of the average frequency of fillers based on dialogue partner in the high cognitive function group.
Figure 7: Comparison of the average frequency of fillers based on dialogue partner in the low cognitive function group.
Figure 8: Comparison of the average frequency of fillers based on cognitive function levels in conversations with AI.
Figure 9: Comparison of the average frequency of fillers based on cognitive function levels in conversations with humans.
Figure 10: Comparison of the average frequency of proper nouns based on dialogue partner.
Figure 11: Comparison of the average frequency of proper nouns based on cognitive function levels.
Figure 12: Comparison of the average frequency of proper nouns based on dialogue partner in the high cognitive function group.
Figure 13: Comparison of the average frequency of proper nouns based on dialogue partner in the low cognitive function group.
Figure 14: Comparison of the average frequency of proper nouns based on cognitive function levels in conversations with AI.
Figure 15: Comparison of the average frequency of proper nouns based on cognitive function levels in conversations with humans.
Figure 16: Comparison of the average frequency of listening back based on dialogue partner.
Figure 17: Comparison of the average frequency of listening back based on cognitive function levels.
Figure 18: Comparison of the average frequency of listening back based on cognitive function levels in conversations with AI.
Figure 19: Comparison of the average frequency of listening back based on cognitive function levels in conversations with humans.
Figure 20: Comparison of the average frequency of listening back based on dialogue partner in the low cognitive function group.
Figure 21: Comparison of the average frequency of listening back based on dialogue partner in the high cognitive function group.
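The analysis above counts fillers, proper nouns, and "listen back" requests per conversation and compares them across dialogue partners and cognitive-function groups. The snippet below is only an assumed illustration of that style of counting: the study analysed Japanese speech, whereas the filler inventory and listen-back phrases here are simplified English placeholders, and proper-noun counting (which would normally use a part-of-speech or NER tagger) is omitted.

```python
from collections import Counter

FILLERS = {"um", "uh", "er", "well", "you know"}                 # placeholder filler inventory
LISTEN_BACK = {"pardon?", "what?", "can you say that again?"}    # placeholder listen-back phrases

def score_transcript(utterances):
    """Count fillers and listen-back requests across one speaker's utterances."""
    counts = Counter()
    for utt in utterances:
        text = utt.lower()
        counts["fillers"] += sum(text.count(f) for f in FILLERS)
        counts["listen_back"] += any(p in text for p in LISTEN_BACK)
    counts["utterances"] = len(utterances)
    return counts

if __name__ == "__main__":
    with_ai = ["I went to, um, the park near Ueno.", "Pardon? Can you say that again?"]
    with_human = ["Well, um, you know, we used to live in Shinjuku."]
    print("AI partner:", score_transcript(with_ai))
    print("Human partner:", score_transcript(with_human))
```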
19 pages, 4224 KiB  
Article
A Rigid–Flexible Supernumerary Robotic Arm/Leg: Design, Modeling, and Control
by Jiajun Xu, Mengcheng Zhao, Tianyi Zhang and Aihong Ji
Electronics 2024, 13(20), 4106; https://doi.org/10.3390/electronics13204106 - 18 Oct 2024
Cited by 2 | Viewed by 1321
Abstract
As humans’ additional arms or legs, supernumerary robotic limbs (SRLs) have gained great application prospects in many fields. However, current SRLs lack both rigidity/flexibility adaptability and arm/leg function conversion. Inspired by the muscular hydrostat characteristics of octopus tentacles, fiber-reinforced actuators (FRAs) were employed to develop SRLs simultaneously realizing flexible operation and stable support. In this paper, an SRL with FRAs was designed and implemented. The analytic model of the FRA was established to formulate the movement trajectory and stiffness profile of the SRL. A hierarchical hidden Markov model (HHMM) was proposed to recognize the wearer’s motion intention and control the SRL to complete the specific working mode and motion type. Experiments were conducted to exhibit the feasibility and superiority of the proposed robot. Full article
(This article belongs to the Special Issue Advancements in Robotics: Perception, Manipulation, and Interaction)
Figures:
Figure 1: Schematic view of SRL. (a) The SRL comprises multiple FRAs; (b) the SRL is worn on the human body.
Figure 2: Stress distribution of FRA microelements.
Figure 3: Schematic illustration of hierarchical segmentation of human motion intention and the corresponding SRL state.
Figure 4: Illustration of the HHMM state transition.
Figure 5: Overall diagram of the human–SRL interaction control system.
Figure 6: SRL prototype experiment. (a) The SRL prototype with the control system and different end-effectors; (b) the SRL fetches a toy in the wearer’s hand; (c) the SRL moves a bottle with large workspace and high flexibility; (d) the SRL wraps a box and lifts it up.
Figure 7: Movement trajectory test of the SRL. (a) Pick-and-place task 1; (b) pick-and-place task 2; (c) pick-and-place task 3.
Figure 8: Stiffness test of the SRL. (a) Stiffness regulation task 1; (b) stiffness regulation task 2; (c) stiffness regulation task 3.
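The control scheme above recognizes the wearer's motion intention with a hierarchical hidden Markov model and switches the SRL between working modes accordingly. As a much-simplified, assumed illustration of HMM-based intent recognition (flat discrete HMMs scored with the forward algorithm, not the paper's hierarchical model), one model per intent can be compared by sequence likelihood; the states, observation symbols, and probabilities below are invented.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete observation sequence.

    obs: list of observation symbol indices.
    pi: initial state probabilities, shape (S,).
    A: state transition matrix, shape (S, S).
    B: emission probabilities, shape (S, n_symbols).
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik

if __name__ == "__main__":
    # Hypothetical discretized wearer-motion features: 0 = still, 1 = slow, 2 = fast.
    models = {
        "stable support (leg-like mode)": dict(
            pi=np.array([0.8, 0.2]),
            A=np.array([[0.9, 0.1], [0.3, 0.7]]),
            B=np.array([[0.70, 0.25, 0.05], [0.30, 0.50, 0.20]])),
        "flexible operation (arm-like mode)": dict(
            pi=np.array([0.3, 0.7]),
            A=np.array([[0.6, 0.4], [0.2, 0.8]]),
            B=np.array([[0.10, 0.40, 0.50], [0.05, 0.35, 0.60]])),
    }
    obs = [2, 1, 2, 2, 1, 2]  # a burst of fast wearer movement
    scores = {name: forward_loglik(obs, **m) for name, m in models.items()}
    print(max(scores, key=scores.get), scores)
```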
19 pages, 54237 KiB  
Article
“You Scare Me”: The Effects of Humanoid Robot Appearance, Emotion, and Interaction Skills on Uncanny Valley Phenomenon
by Karsten Berns and Ashita Ashok
Actuators 2024, 13(10), 419; https://doi.org/10.3390/act13100419 - 16 Oct 2024
Viewed by 1878
Abstract
This study investigates the effects of humanoid robot appearance, emotional expression, and interaction skills on the uncanny valley phenomenon among university students using the social humanoid robot (SHR) Ameca. Two fundamental studies were conducted within a university setting: Study 1 assessed student expectations of SHRs in a hallway environment, emphasizing the need for robots to integrate seamlessly and engage effectively in social interactions; Study 2 compared the humanlikeness of three humanoid robots, ROMAN, ROBIN, and EMAH (employing the EMAH robotic system implemented on Ameca). The initial findings from corridor interactions highlighted a diverse range of human responses, from engagement and curiosity to indifference and unease. Additionally, the online survey revealed significant insights into expected non-verbal communication skills, continuous learning, and comfort levels during hallway conversations with robots. Notably, certain humanoid robots evoked stronger emotional reactions, hinting at varying degrees of humanlikeness and the influence of interaction quality. The EMAH system was frequently ranked as most humanlike before the study, while post-study perceptions indicated a shift, with EMAH and ROMAN showing significant changes in perceived humanlikeness, suggesting a re-evaluation by participants influenced by their interactive experiences. This research advances our understanding of the uncanny valley phenomenon and the role of humanoid design in enhancing human–robot interaction, marking the first direct comparison between the most advanced, humanlike research robots. Full article
(This article belongs to the Special Issue Advanced Robots: Design, Control and Application—2nd Edition)
Figures:
Figure 1: EMAH (Empathic Mechanized Anthropomorphic Humanoid) robotic system implemented on Gen 1 Ameca robot from Engineered Arts.
Figure 2: EMAH in the university hallway.
Figure 3: Emotions of EMAH along with the image shown for humanlikeness ranking.
Figure 4: Robot–human interaction (ROBIN) [7] (p. 1).
Figure 5: Emotions of ROBIN along with the image shown for ordering for humanlikeness.
Figure 6: Robot–human interaction machine (ROMAN) [50] (p. 1).
Figure 7: Emotions of ROMAN along with the image shown for ordering for humanlikeness.
Figure 8: Results of human preferences of interaction with EMAH.
Figure 9: Raincloud plot of humanoid robots rated on bi-variate ratings of humanlikeness, familiarity, and eeriness.
27 pages, 2184 KiB  
Review
The “What” and “How” of Pantomime Actions
by Raymond R. MacNeil and James T. Enns
Vision 2024, 8(4), 58; https://doi.org/10.3390/vision8040058 - 26 Sep 2024
Viewed by 1309
Abstract
Pantomimes are human actions that simulate ideas, objects, and events, commonly used in conversation, performance art, and gesture-based interfaces for computing and controlling robots. Yet, their underlying neurocognitive mechanisms are not well understood. In this review, we examine pantomimes through two parallel lines of research: (1) the two visual systems (TVS) framework for visually guided action, and (2) the neuropsychological literature on limb apraxia. Historically, the TVS framework has considered pantomime actions as expressions of conscious perceptual processing in the ventral stream, but an emerging view is that they are jointly influenced by ventral and dorsal stream processing. Within the apraxia literature, pantomimes were historically viewed as learned motor schemas, but there is growing recognition that they include creative and improvised actions. Both literatures now recognize that pantomimes are often created spontaneously, sometimes drawing on memory and always requiring online cognitive control. By highlighting this convergence of ideas, we aim to encourage greater collaboration across these two research areas, in an effort to better understand these uniquely human behaviors. Full article
Figures:
Figure 1: Idealized representation of the kinematic differences in the aperture scaling profiles of real and pantomime grasps. (A) The top image (red border) depicts a power grasp on a standard-sized beaker, while the bottom image (blue border) depicts a precision grasp on a graduated cylinder beaker. (B) This graph represents how aperture size varies over time for real grasps. A pre-contact “grip overshoot” is reliably observed in the case of both power (red line) and precision (blue line) grasps. This functions to create a margin of safety for avoiding target collision and establishing a secure grip upon contact. Peak grip aperture is reached during this overshoot phase of grip aperture scaling. (C) The aperture scaling profile for pantomime grasps. The lines of the precision (blue) and power (red) grasps are overlaid with those of real grasps (transparent). In this example, we see that the grip overshoot is conspicuously absent for pantomimed grasps and that peak grip aperture is attained later in the reach trajectory.
Figure 2: Simplified schematic of the dual-route model of pantomime production proposed by Rothi et al. [146,147]. Different modalities can elicit the pantomime’s performance. The experimenter may make a verbal request—e.g., “show me how you hammer a nail”—or perform the pantomime themselves for imitation by the patient. Alternatively, the patient may have to pantomime an object’s use after it is presented in physical or pictorial form. In the case of the former, processing will usually proceed through the lexical route, allowing the retrieval of the appropriate motor schema. This may need to be mapped to knowledge about what the object is used for (i.e., action semantics). If the input is gestural, a direct route allows for “on the fly” imitation, without any semantic processing. The object recognition system allows structural knowledge to activate the appropriate motor schema in the absence of a patient being able to explicitly report on the object’s conventional function.
Figure 3: The Two Action Systems Plus (2AS+) model of real and pantomimed actions, based on [35,114,150]. The dorso-dorsal channel supports grasp-to-move actions and novel tool use by providing online visual feedback of object affordances. It originates in the primary visual cortex (V1) and projects to the superior parietal lobe, where it continues to the dorsal premotor area. The ventro-dorsal channel (purple) supports real and pantomimed familiar tool use. With input originating in V1, it projects to areas in the inferior parietal lobe, including the supramarginal gyrus (SMG), and terminates in the ventral premotor area. The SMG serves as a critical hub for integrating ventral stream (red arrow) object representations and semantic knowledge with the sensorimotor processing of the ventro-dorsal and dorso-dorsal channels. The “plus” in the model refers to a circuit of reciprocal connections formed between the inferior frontal gyrus and SMG. This circuit is proposed to serve as an action selection module, resolving competing options for motor output corresponding to object transport (move, dorso-dorsal) versus function (use, ventro-dorsal).
Figure 4: Schematic of the working memory model of pantomime production proposed by Bartolo et al. [138]. This model builds on the work of Rothi et al. [146]. The authors view pantomimes as creative gestures that are formed de novo. Working memory, conceptualized as a workspace, is proposed to operate as an obligatory creative mechanism that combines sensory input (i.e., the visual or auditory action prompt), conceptual knowledge (action semantics), and procedural memory (action lexicon) to support pantomime production. The visuomotor conversion module facilitates “on-the-fly” imitation, which may or may not require working memory (reflected by dashed arrows). The model’s formulation was motivated by observations of a patient known by the initials VL. Tests of VL’s gestural ability revealed a near-selective deficit in pantomime production, while cognitive testing revealed a selective impairment in working memory. See text for additional details.
Figure 5: The technical reasoning (neurocognitive) model of pantomime. See text for additional details. Reproduced from Osiurak et al. [169] (Creative Commons).
25 pages, 6749 KiB  
Article
Application of Artificial Neuromolecular System in Robotic Arm Control to Assist Progressive Rehabilitation for Upper Extremity Stroke Patients
by Jong-Chen Chen and Hao-Ming Cheng
Actuators 2024, 13(9), 362; https://doi.org/10.3390/act13090362 - 16 Sep 2024
Viewed by 1259
Abstract
Freedom of movement of the hands is the most desired hope of stroke patients. However, stroke recovery is a long, long road for many patients. If artificial intelligence can assist human arm movement, the possibility of stroke patients returning to normal hand movement might be significantly increased. This study uses the artificial neuromolecular system (ANM system) developed in our laboratory as the core of motion control, in an attempt to learn to control the mechanical arm to produce actions similar to human rehabilitation training and the transition between different activities. This research adopts two methods. The first is hypothetical exploration, the so-called “artificial world” simulation method. The detailed approach uses the V-REP (Virtual Robot Experimentation Platform) to conduct different experimental runs to capture relevant data. Our policy is to establish an action database systematically to a certain extent. From these data, we use the ANM system with self-organization and learning capabilities to develop the relationship between these actions and establish the possibility of conversion between different activities. The second method of this study is to use the data from a hospital in Toronto, Canada. Our experimental results show that the ANM system can continuously learn for problem-solving. In addition, our three experimental results of adaptive learning, transfer learning, and cross-task learning further confirm that the ANM system can use previously learned systems to complete the delivered tasks through autonomous learning (instead of learning from scratch). Full article
Figures:
Figure 1: The structure of the ANM system.
Figure 2: Cytoskeleton elements.
Figure 3: Evolutionary learning of the ANM system.
Figure 4: Research model.
Figure 5: Comparison of muscle joints between robotic arm and human arm.
Figure 6: Artificial World dataset data collection flowchart.
Figure 7: (a) Microsoft Kinect (k4w) v2. (b) A handheld end-effector with two degrees of freedom.
Figure 8: Comparison of learning results at different stages of the ANM system.
Figure 9: The structure of adaptive learning.
Figure 10: The relationship between healthy people and patients in the Toronto Rehabilitation dataset.
Figure 11: The concept of clustering in adaptive learning.
Figure 12: (a) Clustering results of ten healthy people moving forward–backward with left arm. (b) Clustering results of nine healthy people moving forward–backward with right arm. (c) Clustering results of ten healthy people moving side-to-side with left arm. (d) Clustering results of nine healthy people moving side-to-side with right arm.
Figure 13: (a) Similarity clustering of ten healthy people’s compensatory actions (Fwr_Bck_L) for P4. (b) Similarity clustering of nine healthy people’s compensatory actions (Fwr_Bck_R) for P4. (c) Similarity clustering of ten healthy people’s compensatory actions (Sd2Sd_Bck_L) for P4. (d) Similarity clustering of nine healthy people’s compensatory actions (Sd2Sd_Bck_R) for P4. We note that different colors of lines are for better visualization.
Figure 14: Comparative diagram of adaptive and progressive learning in rehabilitation.
Figure 15: The concept of progressive learning.
16 pages, 1760 KiB  
Article
Robot Control Platform for Multimodal Interactions with Humans Based on ChatGPT
by Jingtao Qu, Mateusz Jarosz and Bartlomiej Sniezynski
Appl. Sci. 2024, 14(17), 8011; https://doi.org/10.3390/app14178011 - 7 Sep 2024
Viewed by 1949
Abstract
This paper presents the architecture of a multimodal human–robot interaction control platform that leverages the advanced language capabilities of ChatGPT to facilitate more natural and engaging conversations between humans and robots. Implemented on the Pepper humanoid robot, the platform aims to enhance communication by providing a richer and more intuitive interface. The motivation behind this study is to enhance robot performance in human interaction through cutting-edge natural language processing technology, thereby improving public attitudes toward robots, fostering the development and application of robotic technology, and reducing the negative attitudes often associated with human–robot interactions. To validate the system, we conducted experiments measuring participants’ scores on the Negative Attitude toward Robots Scale (NARS) and the Robot Anxiety Scale (RAS) before and after interacting with the robot. Statistical analysis of the data revealed a significant improvement in the participants’ attitudes and a notable reduction in anxiety following the interaction, indicating that the system holds promise for fostering more positive human–robot relationships. Full article
Figures:
Figure 1: Robot Control Platform for Multimodal Interactions with Humans based on ChatGPT.
Figure 2: Sequence of interactions in the proposed architecture, highlighting envisioned actions.
Figure 3: Application working on Pepper robot, user view of the robot during conversation.
Figure 4: Interaction flow used in experiments.
Figure 5: Results before and after experiment with NARS survey.
Figure 6: Results before and after experiment with NARS survey, grouped into three factors: S1—negative attitude towards interaction with robots, S2—negative attitude towards social influence of robots, and S3—negative attitude toward emotions in interaction with robots.
Figure 7: Results before and after experiment with RAS survey.
Figure 8: Results before and after experiment with RAS survey, grouped into three factors: S1—anxiety towards communication capability of robots, S2—anxiety towards behavioral characteristics of robots, S3—anxiety towards discourse with robots.
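The platform above couples the robot's speech recognition and text-to-speech with ChatGPT. The loop below is a hedged sketch of that interaction cycle only: recognize_speech, chat_llm, and speak are placeholder callables standing in for the robot's ASR, the ChatGPT API call, and Pepper's text-to-speech, so the sketch says nothing about the paper's actual wiring.

```python
def conversation_loop(recognize_speech, chat_llm, speak, max_turns=10):
    """Minimal multimodal interaction cycle: listen -> LLM -> speak.

    recognize_speech: callable() -> str or None (placeholder for the robot's ASR).
    chat_llm: callable(history) -> str (placeholder for a ChatGPT API call).
    speak: callable(text) -> None (placeholder for the robot's text-to-speech).
    """
    history = [{"role": "system",
                "content": "You are a friendly humanoid robot. Keep answers short and spoken-style."}]
    for _ in range(max_turns):
        heard = recognize_speech()
        if not heard:
            continue
        history.append({"role": "user", "content": heard})
        reply = chat_llm(history)
        history.append({"role": "assistant", "content": reply})
        speak(reply)

if __name__ == "__main__":
    # Console stand-ins so the sketch runs without a robot or an API key.
    conversation_loop(
        recognize_speech=lambda: input("You: "),
        chat_llm=lambda history: "That's interesting -- tell me more!",
        speak=lambda text: print("Robot:", text),
        max_turns=2,
    )
```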
29 pages, 6331 KiB  
Article
Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning
by Diego Resende Faria, Abraham Itzhak Weinberg and Pedro Paulo Ayrosa
Appl. Sci. 2024, 14(15), 6631; https://doi.org/10.3390/app14156631 - 29 Jul 2024
Cited by 1 | Viewed by 2257
Abstract
Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human–robot interaction, and cross-cultural communication. Full article
Figures:
Figure 1: Overview of the proposed architecture for affective communication merging speech emotion and sentiment analysis.
Figure 2: Frameworks for categorizing emotions: (a) Russell’s circumplex model of affect (adapted from [34]) and (b) Plutchik’s wheel of emotions (adapted from [35]).
Figure 3: Overview of the proposed 2L-DBMM architecture for multimodality: SER and SA.
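The DBMM-style ensembles described above fuse class posteriors from several base classifiers using per-classifier weights, and the two-layered variant then fuses the speech-emotion and text-sentiment modalities into a combined emotional state. The numpy sketch below shows that general weighted-fusion idea; the accuracy-based weights and the four-class emotion set are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def weighted_fusion(posteriors, weights):
    """Fuse class posteriors from several sources with normalized weights.

    posteriors: list of 1-D arrays, each summing to 1 over the same class set.
    weights: per-source reliability scores (e.g. validation accuracy).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = sum(wi * np.asarray(p) for wi, p in zip(w, posteriors))
    return fused / fused.sum()

if __name__ == "__main__":
    # Layer 1: fuse three speech-emotion classifiers (classes: neutral, happy, sad, angry).
    ser = weighted_fusion(
        [[0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1], [0.1, 0.7, 0.1, 0.1]],
        weights=[0.90, 0.85, 0.96],
    )
    # Layer 1: fuse two text-sentiment models mapped onto the same class set.
    sa = weighted_fusion(
        [[0.2, 0.5, 0.2, 0.1], [0.3, 0.4, 0.2, 0.1]],
        weights=[0.85, 0.95],
    )
    # Layer 2: fuse the two modalities into a combined emotional state.
    combined = weighted_fusion([ser, sa], weights=[0.96, 0.90])
    print("fused emotion posterior:", np.round(combined, 3))
```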
9 pages, 1356 KiB  
Article
Bio-Inspired Double-Layered Hydrogel Robot with Fast Response via Thermo-Responsive Effect
by Yunsong Liu and Xiong Zheng
Materials 2024, 17(15), 3679; https://doi.org/10.3390/ma17153679 - 25 Jul 2024
Viewed by 872
Abstract
Bio-inspired hydrogel robots have become promising due to their advantage of the interaction safety and comfort between robots and humans, while current hydrogel robots mainly focus on underwater movement due to the hydration–dehydration process of thermo-responsive hydrogels, which greatly limits their practical applications. To expand the motion of the thermo-responsive hydrogel robot to the ground, we constructed a hydrogel robot inspired by a caterpillar, which has an anisotropic double-layered structure by the interfacial diffusion polymerization method. Adding PVA and SA to PNIPAm will cause different conformation transitions. Therefore, sticking the two layers of hydrogel together will form a double-layer anisotropic structure. The ultra-high hydrophilicity of PVA and SA significantly reduces the contact angle of the hydrogel from 53.1° to about 10° and reduces its hydration time. The responsive time for bending 30° of the hydrogel robot has been greatly reduced from 1 h to half an hour through the enhancement of photo-thermal conversion and thermal conductivity via the addition of Fe₃O₄ nanoparticles. As a result, the fabricated hydrogel robot can achieve a high moving speed of 54.5 mm·h⁻¹ on the ground. Additionally, the fabricated hydrogel has excellent mechanical strength and can endure significant flexibility tests. This work may pave the road for the development of soft robots and expand their applications in industry. Full article
(This article belongs to the Section Polymeric Materials)
Figures:
Figure 1: Mechanical performance evaluation of the double-layered hydrogel. (a) Stress–strain curve; (b) flexibility test; (c) contact angles of different hydrogels.
Figure 2: The working principle of the double-layer hydrogel robot. Comparison of the top (a) and bottom (b) hydrogels before and after illumination; (c) physical picture of double-layered hydrogel; (d) bending process of double-layer hydrogel under light; (e) masses of hydrogel at different stages.
Figure 3: (a) Light absorptance of hydrogels at different Fe₃O₄ concentrations; (b) response time of hydrogel bending 30 degrees at different concentrations; (c) thermal conductivity and enhancement of hydrogels with different Fe₃O₄ loading; (d) response time of hydrogel with 2.0 wt% Fe₃O₄ loading bending 30 degrees under different light intensity.
Figure 4: Performance of the double-layer hydrogel robot. (a) Moving process of hydrogel robot at different times; (b) conformation of hydrogel robot before and after light irradiance.
27 pages, 8263 KiB  
Article
How the Degree of Anthropomorphism of Human-like Robots Affects Users’ Perceptual and Emotional Processing: Evidence from an EEG Study
by Jinchun Wu, Xiaoxi Du, Yixuan Liu, Wenzhe Tang and Chengqi Xue
Sensors 2024, 24(15), 4809; https://doi.org/10.3390/s24154809 - 24 Jul 2024
Cited by 1 | Viewed by 2018
Abstract
Anthropomorphized robots are increasingly integrated into human social life, playing vital roles across various fields. This study aimed to elucidate the neural dynamics underlying users’ perceptual and emotional responses to robots with varying levels of anthropomorphism. We investigated event-related potentials (ERPs) and event-related spectral perturbations (ERSPs) elicited while participants viewed, perceived, and rated the affection of robots with low (L-AR), medium (M-AR), and high (H-AR) levels of anthropomorphism. EEG data were recorded from 42 participants. Results revealed that H-AR induced a more negative N1 and increased frontal theta power, but decreased P2 in early time windows. Conversely, M-AR and L-AR elicited larger P2 compared to H-AR. In later time windows, M-AR generated greater late positive potential (LPP) and enhanced parietal-occipital theta oscillations than H-AR and L-AR. These findings suggest distinct neural processing phases: early feature detection and selective attention allocation, followed by later affective appraisal. Early detection of facial form and animacy, with P2 reflecting higher-order visual processing, appeared to correlate with anthropomorphism levels. This research advances the understanding of emotional processing in anthropomorphic robot design and provides valuable insights for robot designers and manufacturers regarding emotional and feature design, evaluation, and promotion of anthropomorphic robots. Full article
(This article belongs to the Section Biomedical Sensors)
Figures:
Figure 1: The uncanny valley function, as proposed by Mori (1970) [10].
Figure 2: Schematic time course of the experimental procedure and the stimuli used in the experiment.
Figure 3: Experimental equipment set-up and electrodes. (a) Brain Vision actiCHamp EEG system; (b) 64 electrodes (with names and number labels) used in the experiment.
Figure 4: The grand-averaged ERP waveforms in response to high, middle, and low anthropomorphic robots.
Figure 5: Topographic scalp maps of ERP components during the selected time course for high, middle, and low anthropomorphic robots.
Figure 6: Panels (a) and (b) show the emotional valence and emotional arousal results of the 40 participants toward the three types of anthropomorphic robots, respectively.
Figure 7: Panels (a) and (b) show the subjective likeability and warmth rating results of the 40 participants toward the three types of anthropomorphic robots, respectively.
Figure 8: Spectrograms of theta-band (3–8 Hz) ERS at the frontal cluster and parietal-occipital cluster associated with the H-AR/M-AR/L-AR conditions: (a) the time–frequency representations of ERD/ERS related to the H-AR/M-AR/L-AR conditions at the frontal and parietal-occipital clusters and (b) the channel electrode clusters of interest. Red represents the frontal cluster, while green represents the parietal-occipital cluster.
Figure 9: The left and right plots represent the interaction effect of theta power of the frontal cluster and the parietal-occipital cluster across the early time window (50–380 ms) and the late time window (400–1000 ms), respectively.
Figure 10: (a) represents Spearman’s r of emotional responses, ERPs, and ERSP; (b) represents Spearman’s p of emotional responses, ERPs, and ERSP. The statistical method used the Spearman correlation coefficient; a and p indicate the frontal and posterior regions, respectively; e and l represent the early time window and late time window, respectively.
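ERP components such as N1, P2, and LPP are typically quantified as mean amplitudes within fixed post-stimulus windows on baseline-corrected epochs. The snippet below is a generic, assumed illustration of that measurement (synthetic data, a single channel, and window boundaries chosen for illustration), not the authors' analysis pipeline.

```python
import numpy as np

def mean_amplitude(epochs, times, t_start, t_end):
    """Mean amplitude per epoch within a time window.

    epochs: array (n_epochs, n_samples) for one channel, baseline-corrected.
    times: array (n_samples,) of time points in seconds relative to stimulus onset.
    """
    mask = (times >= t_start) & (times <= t_end)
    return epochs[:, mask].mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sfreq = 250.0
    times = np.arange(-0.2, 1.0, 1.0 / sfreq)
    epochs = rng.normal(0.0, 2.0, size=(42, times.size))  # synthetic data, microvolts
    # Assumed windows for illustration: N1 ~ 80-150 ms, P2 ~ 150-280 ms, LPP ~ 400-800 ms.
    for name, (t0, t1) in {"N1": (0.08, 0.15), "P2": (0.15, 0.28), "LPP": (0.4, 0.8)}.items():
        amp = mean_amplitude(epochs, times, t0, t1)
        print(f"{name}: grand mean = {amp.mean():.2f} uV")
```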
23 pages, 9408 KiB  
Article
Evolution of Industrial Robots from the Perspective of the Metaverse: Integration of Virtual and Physical Realities and Human–Robot Collaboration
by Jing You, Zhiyuan Wu, Wei Wei, Ning Li and Yuhua Yang
Appl. Sci. 2024, 14(14), 6369; https://doi.org/10.3390/app14146369 - 22 Jul 2024
Cited by 1 | Viewed by 1971
Abstract
During the transition from Industry 4.0 to Industry 5.0, industrial robotics technology faces the need for intelligent and highly integrated development. Metaverse technology creates immersive and interactive virtual environments, allowing technicians to perform simulations and experiments in the virtual world, and overcoming the limitations of traditional industrial operations. This paper explores the application and evolution of metaverse technology in the field of industrial robotics, focusing on the realization of virtual–real integration and human–machine collaboration. It proposes a design framework for a virtual–real interaction system based on the ROS and WEB technologies, supporting robot connectivity, posture display, coordinate axis conversion, and cross-platform multi-robot loading. This paper emphasizes the study of two key technologies for the system: virtual–real model communication and virtual–real model transformation. A general communication mechanism is designed and implemented based on the ROS, using the ROS topic subscription to achieve connection and real-time data communication between physical robots and virtual models, and utilizing URDF model transformation technology for model invocation and display. Compared with traditional simulation software, i.e., KUKA Sim PRO (version 1.1) and RobotStudio (version 6.08), the system improves model loading by 45.58% and 24.72%, and the drive response by 41.50% and 28.75%. This system not only supports virtual simulation and training but also enables the operation of physical industrial robots, provides persistent data storage, and supports action reproduction and offline data analysis and decision making. Full article
(This article belongs to the Topic Smart Production in Terms of Industry 4.0 and 5.0)
Figures:
Figure 1: Industrial metaverse evolution: virtual–physical trajectory diagram.
Figure 2: WEB concept industrial metaverse.
Figure 3: Industrial robot connection WEB simulation system framework diagram.
Figure 4: Robotic arm posture display and workflow.
Figure 5: Coordinate axis switching and implementation workflow.
Figure 6: Multi-terminal WEB display and implementation workflow.
Figure 7: Bidirectional communication diagram between physical robot and ROS.
Figure 8: Communication diagram between ROS and WEB platform.
Figure 9: Process flow diagram for URDF model conversion.
Figure 10: Overall performance and motion diagram of industrial robot.
Figure 11: Real-time data reception from robot diagram.
Figure 12: Robot model loading.
Figure 13: Comparison diagram of loading time with KUKA Sim Pro model.
Figure 14: Comparison diagram of model loading time with ABB RobotStudio.
Figure 15: Model driving direction diagram (KUKA, top left; ABB, bottom left; system, right).
Figure 16: Comparison chart of response time with KUKA Sim Pro driver.
Figure 17: Comparison chart of response time with ABB RobotStudio driver.
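The virtual-real communication mechanism above relies on ROS topic subscription to stream joint data from the physical robot to the browser-side model. A minimal rospy sketch of that pattern is given below; the topic name, message fields, and the send_to_web placeholder (standing in for a rosbridge/WebSocket push) are assumptions, not the paper's code.

```python
#!/usr/bin/env python
# Minimal ROS (rospy) sketch: forward physical-robot joint states to a web view.
import json

import rospy
from sensor_msgs.msg import JointState

def send_to_web(payload):
    """Placeholder for the actual web bridge (e.g. a rosbridge/WebSocket push)."""
    rospy.loginfo("to web: %s", payload)

def on_joint_states(msg):
    # Convert the JointState message into a JSON payload that a browser-side
    # URDF/3D model could consume to update its joint angles.
    payload = json.dumps({
        "names": list(msg.name),
        "positions": [round(p, 4) for p in msg.position],
        "stamp": msg.header.stamp.to_sec(),
    })
    send_to_web(payload)

if __name__ == "__main__":
    rospy.init_node("joint_state_web_relay")
    rospy.Subscriber("/joint_states", JointState, on_joint_states, queue_size=10)
    rospy.spin()
```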
13 pages, 5281 KiB  
Article
Design and Implementation of Adam: A Humanoid Robotic Head with Social Interaction Capabilities
by Sherif Said, Karim Youssef, Benrose Prasad, Ghaneemah Alasfour, Samer Alkork and Taha Beyrouthy
Appl. Syst. Innov. 2024, 7(3), 42; https://doi.org/10.3390/asi7030042 - 27 May 2024
Cited by 1 | Viewed by 1973
Abstract
Social robots are being conceived with different characteristics and being used in different applications. The growth of social robotics benefits from advances in fabrication, sensing, and actuation technologies, as well as signal processing and artificial intelligence. This paper presents a design and implementation of the humanoid robotic platform Adam, consisting of a motorized human-like head with precise movements of the eyes, jaw, and neck, together with capabilities of face tracking and vocal conversation using ChatGPT. Adam relies on 3D-printed parts together with a microphone, a camera, and proper servomotors, and it has high structural integrity and flexibility. Adam’s control framework consists of an adequate signal exploitation and motor command strategy that allows efficient social interactions. Adam is an innovative platform that combines manufacturability, user-friendliness, low costs, acceptability, and sustainability, offering advantages compared with other platforms. Indeed, the platform’s hardware and software components are adjustable and allow it to increase its abilities and adapt them to different applications in a variety of roles. Future work will entail the development of a body for Adam and the addition of skin-like materials to enhance its human-like appearance. Full article
(This article belongs to the Section Human-Computer Interaction)
Figure 1: The humanoid robotic head; Adam's structural design.
Figure 2: Dimensions of the Adam robotic head. All measurements are in millimeters.
Figure 3: Eye movement mechanism; 3D design.
Figure 4: Jaw movement mechanism; 3D design.
Figure 5: Neck movement mechanism; 3D design.
Figure 6: (i,ii) Pitch and (iii,iv) yaw movements.
Figure 7: (a) Side view of the neck pitching mechanism with the kinematic chain highlighted in orange; (b) kinematic diagram of the pitching mechanism showing the transmission angle γ.
Figure 8: Transmission angle.
Figure 9: Motor torque for pitching.
Figure 10: FEA von Mises stresses in the important neck components.
Figure 11: Detection of the face of a person interacting with Adam (the face is blurred in the illustration). Demonstration video of Adam in operation: https://youtu.be/6w9tZgyRsAs?si=9OjM9k1w_Xy-wXd_ (accessed on 10 March 2024).
Figure 12: Consecutive steps in the conversational block of Adam.
Figure 13: Number of accurate answers provided by Adam to the five questions asked by each of ten interlocutors.
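Figure 11 shows Adam detecting the face of an interlocutor. A minimal face-tracking sketch in the same spirit is given below; OpenCV's Haar-cascade detector is an assumption (the listing does not state which detector Adam uses), and the pan/tilt neck servo interface is stubbed out.

```python
# Sketch of camera-based face tracking in the spirit of Figure 11:
# detect the largest face, compute its offset from the image centre,
# and nudge hypothetical neck servos to re-centre it.

import cv2

def set_neck_angles(pan_deg: float, tilt_deg: float) -> None:
    """Placeholder for the neck pan/tilt servo interface."""
    print(f"[neck] pan={pan_deg:+.1f} tilt={tilt_deg:+.1f}")

def track_face(gain: float = 0.05) -> None:
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(0)
    pan, tilt = 0.0, 0.0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces):
            x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest face
            err_x = (x + w / 2) - frame.shape[1] / 2             # horizontal offset (px)
            err_y = (y + h / 2) - frame.shape[0] / 2             # vertical offset (px)
            pan -= gain * err_x      # proportional correction; sign depends on mounting
            tilt += gain * err_y
            set_neck_angles(pan, tilt)
        cv2.imshow("Adam's view", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):                    # quit with 'q'
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    track_face()
```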
34 pages, 5234 KiB  
Article
Simulated Dopamine Modulation of a Neurorobotic Model of the Basal Ganglia
by Tony J. Prescott, Fernando M. Montes González, Kevin Gurney, Mark D. Humphries and Peter Redgrave
Biomimetics 2024, 9(3), 139; https://doi.org/10.3390/biomimetics9030139 - 25 Feb 2024
Cited by 1 | Viewed by 1978
Abstract
The vertebrate basal ganglia play an important role in action selection—the resolution of conflicts between alternative motor programs. The effective operation of basal ganglia circuitry is also known to rely on appropriate levels of the neurotransmitter dopamine. We investigated reducing or increasing the tonic level of simulated dopamine in a prior model of the basal ganglia integrated into a robot control architecture engaged in a foraging task inspired by animal behaviour. The main findings were that progressive reductions in the levels of simulated dopamine caused slowed behaviour and, at low levels, an inability to initiate movement. These states were partially relieved by increased salience levels (stronger sensory/motivational input). Conversely, increased simulated dopamine caused distortion of the robot’s motor acts through partially expressed motor activity relating to losing actions. This could also lead to an increased frequency of behaviour switching. Levels of simulated dopamine that were either significantly lower or higher than baseline could cause a loss of behavioural integration, sometimes leaving the robot in a ‘behavioral trap’. That some analogous traits are observed in animals and humans affected by dopamine dysregulation suggests that robotic models could prove useful in understanding the role of dopamine neurotransmission in basal ganglia function and dysfunction. Full article
(This article belongs to the Special Issue Bio-Inspired and Biomimetic Intelligence in Robotics)
Figure 1: (A) A diagram of the connectivity, relative position, and relative size of the nuclei that comprise the vertebrate basal ganglia, showing the separate projection targets of the D1 and D2 receptor striatal neurons as modelled by Gurney et al. [39,40]. (B) The connection scheme of the extended basal ganglia model, as modelled by Humphries and Gurney [41], incorporating a feedback pathway to the cortex via the thalamus. The box labelled 'basal ganglia' contains the functional anatomy shown on the left. Solid lines depict excitatory pathways and dotted lines depict inhibitory pathways in both diagrams. Anatomical labels are for the primate brain. Abbreviations: GPe—globus pallidus external segment; GPi—globus pallidus internal segment (EP—entopeduncular nucleus in rat); STN—subthalamic nucleus; SNc—substantia nigra pars compacta; SNr—substantia nigra pars reticulata; TRN—thalamic reticular nucleus; VL—ventrolateral thalamus. Reprinted with permission from Ref. [41]. 2002, Taylor & Francis Informa UK Ltd—Journals.
Figure 2: The model task. (A) A hungry rat placed in an open arena will initially explore the periphery (frames 1 and 2) before eventually venturing into the centre (frame 3) to retrieve food pellets that are then consumed in a sheltered 'nest' corner (frame 4). (B) In the robot, these behaviours are simulated by seeking (frame 1) and following walls (frame 2) and by searching for and acquiring cylinders (frame 3) that are then deposited in the lit corner of the arena (frame 4) (see Supplementary Video, part 2).
Figure 3: The robot basal ganglia model. The robot (i) interfaces, via the embedding architecture (ii), with the extended basal ganglia model (iii). The embedding architecture is composed of five action sub-systems (one shown), perceptual and motor sub-systems, and an integrator that combines the gated motor output of all five channels. See the text, Section 4.2, the Supplementary Methods, and [36] for further explanation. Abbreviations: VG—(motor) vector generator; SI—shunting inhibition (Equation (1)); e—gating signal; b—busy signal; s—salience signal; f—feedback signal; y^snr—basal ganglia output; v—motor vector; v̂—aggregate motor vector; SSC—somatosensory cortex; MC—motor cortex (other anatomical abbreviations as per Figure 1). Reprinted with permission from [36]. 2006, Elsevier Science and Engineering Journals.
Figure 4: Processing within the i-th basal ganglia channel. The salience of channel i is represented by the variable s_i. Leaky integrator units represent the activity in the input striatal units, with separate units for the D1- and D2-type neuron populations, and the substantia nigra output units. Other units within the model are not shown (see [36] and the Supplementary Methods). Synaptic efficacy is increased by tonic dopamine within the D1 channel (1 + λ) and reduced within the D2 channel (1 − λ). The basal ganglia output for channel i is modelled as affecting target motor systems via shunting inhibition (Equation (1)) and is represented by the gating signal (e_i) for that channel.
Figure 5: (A) The percentage of selection competitions falling into different classes of selection outcome for values of simulated dopamine, λ, ranging from 0 to 0.5 in increments of 0.01. Data were obtained through an exhaustive search of a two-dimensional salience space. Partial selection is predominant for low dopamine values; distortion and multiple selection are evident at high dopamine values. Simulation with levels of λ > 0.5 resulted in continuation of the trends shown in the figure (see Supplementary Materials). (B) Average efficiency (green) and distortion (red) across all runs at each level of λ.
Figure 6: Selection boundaries in two-dimensional salience space for sample levels of simulated dopamine—very low (λ = 0.06), low (0.12), intermediate (0.22), high (0.31), and very high (0.40). For each plot, the salience of channel 1 is shown on the x-axis and that of channel 2 on the y-axis, ranging from 0.0 to 1.0 (shown only for the central plot). Labels indicate the following: N—no selection; P—partial selection; C1—clean selection of channel 1; C2—clean selection of channel 2; D—distortion; M—multiple selection.
Figure 7: (A) Selection outcomes in the disembodied model re-classified as a channel 1 win, a channel 2 win, a stand-off (no selection), or a tie. Channel 1 (c1) wins substantially more competitions than channel 2 (c2) for all but the lowest levels of simulated dopamine. (B) The level of channel 2 salience, s_2, required for channel 2 to prevail (i.e., e_2 > e_1) against a channel 1 salience, s_1, of 0.3, 0.4, or 0.5, for different values of λ. Data are shown only where there is a clear switch from channel 1 to channel 2 with increasing s_2 (i.e., without an intervening interval of no selection or multiple selection). The degree of hysteresis varies depending on λ and s_1, with the value of λ that generates maximum hysteresis decreasing with increasing s_1.
Figure 8: Bout/sequence structure of action selection in the robot model for a 240 s trial (λ = 0.20); the first 100 s is shown in the Supplementary Video, part 3. Each of the first five plots shows the efficiency (e) of selection for a given action sub-system plotted against time. The sixth plot shows the inefficiency of the current winner, the seventh the higher-order structure of the bout sequences (av = avoidance; fo = foraging; n = no behaviour), and the final plot the levels of the two simulated motivations. All measures vary between 0 and 1 on the y-axis. The robot displays appropriate bouts of behaviour organised into integrated, goal-achieving sequences.
Figure 9: (A–E) The percentage of selection competitions falling into different classes of selection outcome for values of simulated dopamine ranging from 0.03 to 0.46. Data were obtained by averaging five 120 s trials of robot behaviour for each of the eighteen λ levels tested. Standard error bars are shown. Plots are coloured as per the colour scheme in Figure 5—clean selection (dark green), no selection (orange), partial selection (purple), distorted selection (pink), multiple selection (light green). Black dotted lines show the equivalent results obtained using the non-embodied model (Figure 5). Comparison of the selection properties of the non-embodied and robot models shows more clean, partial, and distorted selection in the robotic model and fewer selection competitions where the outcomes were either no selection or multiple selection. (F) Average efficiency (green) and distortion (red) across all runs at each level of λ.
Figure 10: Total trials (A) and success rate (B; 0.0–1.0) in achieving avoidance/foraging at different levels of simulated dopamine (λ). (C) Evidence of disintegrated behaviour at different levels of λ. The bubble plot shows the proportion of trials at each value of λ that resulted in the observed failure type. See the text for further details.
Figure 11: Bout/sequence structure of action selection in the robot model for three 120 s trials with low simulated dopamine, (A) λ = 0.06, (B) λ = 0.09, and (C) λ = 0.12, and three 120 s trials with high simulated dopamine, (D) λ = 0.31, (E) λ = 0.31, and (F) λ = 0.40. The graph layout is as described for Figure 8, except that distortion, d_w, of the winning action replaces inefficiency for panels D–F (as inefficiency is always zero in these trials). Labels in the 'sequence' plot show successful avoidance (av), foraging (fo), or different forms of behavioural disintegration as per Table 1. With low simulated dopamine, the robot shows slowed movement (sm) and an absence of movement (am). Inefficient selection can also cause premature deselection, leading to the failures to grasp the cylinder (fgc) or raise the gripper arm (fra) shown in plots B and C. With high values of λ, distortion of the selected behaviour by the motor output of losing competitors becomes a significant issue. Distortion in the run shown in plot D has only benign effects, but in the run shown in plot E it causes behavioural disintegration as the robot fails to grasp a cylinder (fgc) despite multiple attempts. The run shown in plot F demonstrates the high frequency of behaviour switching at high levels of simulated dopamine, in this case because distortion causes the robot to repeatedly lose track of the walls (lw). See the text for further discussion and Appendix A for a detailed commentary.
Figure 12: Comparison of the standard 'soft switching' robot model of the basal ganglia with a winner-takes-all variant in terms of the timing and frequency of behavioural switching for different levels of simulated dopamine. (A) 'Time-to-switch' from avoidance to foraging. The plot demonstrates that persistence (time-to-switch to foraging) varies with simulated dopamine and is affected by motor distortion at higher dopamine levels in the case of the standard model only, leading to earlier switching (less persistence) compared with the winner-takes-all variant. (B) Total number of bouts during the first avoidance and foraging sequences combined. Bout frequency is significantly increased at very high λ levels for the standard model only, indicating that distortion of motor behaviour can cause more frequent switching. Each average is over five runs. Bars show standard errors.
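To make the mechanism described in the Figure 4 caption concrete, here is a toy, static two-channel illustration of how λ boosts the D1 'selection' pathway by (1 + λ), attenuates the D2 'control' pathway by (1 − λ), and how inhibitory output is converted into a gating signal e_i. This is not the authors' leaky-integrator model; Equation (1) is not reproduced, and the weights and thresholds below are invented purely for illustration.

```python
# Toy illustration of dopamine (lambda) scaling in a two-channel selection circuit.
# All weights and thresholds are invented; the real model uses leaky-integrator
# dynamics and the paper's Equation (1) for shunting inhibition.

import numpy as np

def toy_gating(saliences, lam: float) -> np.ndarray:
    """Return an approximate gating signal e_i per channel for dopamine level lam."""
    s = np.asarray(saliences, dtype=float)
    d1 = np.maximum((1.0 + lam) * s - 0.2, 0.0)    # D1 'selection' pathway, boosted by DA
    d2 = np.maximum((1.0 - lam) * s - 0.2, 0.0)    # D2 'control' pathway, damped by DA
    gpe = np.maximum(0.8 * s.sum() - d2, 0.0)      # GPe: diffuse STN drive minus D2 inhibition
    snr = np.maximum(0.3 + 0.8 * s.sum()           # tonic output plus STN excitation
                     - 1.2 * d1 - 0.4 * gpe, 0.0)  # focused D1 and GPe inhibition
    return np.clip(1.0 - 2.0 * snr, 0.0, 1.0)      # shunting-style gate: low SNr opens the gate

if __name__ == "__main__":
    for lam in (0.06, 0.22, 0.40):                 # low, intermediate, high simulated dopamine
        print(f"lambda={lam:.2f}  e={toy_gating([0.6, 0.4], lam).round(2)}")
```

Even with these invented numbers the qualitative trend reported above appears: at low λ the winning channel is only weakly gated, at intermediate λ it is cleanly selected while the loser stays shut, and at high λ the losing channel also leaks through, analogous to the distortion and multiple selection seen in the paper.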