CN114118064A - Display device, text error correction method and server - Google Patents
- Publication number
- CN114118064A (application CN202010879686.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- corrected
- matrix
- display device
- display
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Controls And Circuits For Display Device (AREA)
Abstract
Embodiments of the application provide a display device, a text error correction method and a server. The display device comprises a display and a controller, and the controller is configured to: in response to a voice command input by a user, perform speech conversion on the voice command to obtain a text to be corrected; control the display to display the text to be corrected; correct the text to be corrected based on a phonetic-shape similar confusion set and a graph attention mechanism to obtain an initial corrected text; perform candidate recall on the text to be corrected and the initial corrected text, and obtain a final corrected text according to the ranking result of the recalled texts; and control the display to refresh the text to be corrected into the final corrected text. In the embodiments, a pronunciation-similarity knowledge graph and a shape-similarity knowledge graph are generated from the confusion set corresponding to the text to be corrected, and knowledge of the pinyin and glyphs of Chinese characters is merged into a graph neural network to extract deep semantic information among similar characters. Knowledge of pronunciation and glyph similarity is thereby used effectively, which improves the accuracy and recall of error detection and correction.
Description
Technical Field
The application relates to the technical field of display equipment, in particular to display equipment, a text error correction method and a server.
Background
With the development of computers, big data and machine learning, spelling correction technology has been widely applied in many fields, such as Chinese and English input methods, document editing tools, search tools, OCR and speech recognition. Spelling correction was first proposed for English, the language with the most users in the world. After decades of development, rule-based, statistics-based and feature-based techniques have appeared in turn, and their accuracy is considerable. By contrast, work on Chinese error correction started late, Chinese is more complex than English, and scholars have invested less in Chinese error correction research, so the performance and accuracy of Chinese error correction are lower and few mature, usable tools are currently available.
The accuracy of Chinese input data is a basic premise of common natural language processing tasks and a key to improving the performance of upper-layer applications. In the related art, error detection based on LSTM + CRF is generally difficult to deploy in practice because it depends on a large number of labeled samples, while error detection based on N-grams performs poorly because its discrimination rules are too 'hard', so error detection efficiency is low.
Disclosure of Invention
In order to solve the technical problem, the application provides a display device, a text error correction method and a server.
In a first aspect, the present application provides a display device comprising:
a display;
a controller connected with the display, the controller configured to:
in response to a voice command input by a user, performing speech conversion on the voice command to obtain a text to be corrected;
controlling the display to display the text to be corrected;
correcting the text to be corrected based on a phonetic-shape similar confusion set and a graph attention mechanism to obtain an initial corrected text;
performing candidate recall on the text to be corrected and the initial corrected text, and obtaining a final corrected text according to a ranking result of the recalled texts;
and controlling the display to refresh the text to be corrected into the final corrected text.
In some embodiments, the correcting the text to be corrected based on the phonetic-shape similar confusion set and the graph attention mechanism includes:
extracting features of the text to be corrected to obtain an initial characterization matrix;
creating an adjacency matrix for each character in the text to be corrected according to the phonetic-shape similar confusion set;
inputting the initial characterization matrix and the adjacency matrix into a multilayer graph convolutional neural network to obtain a next-layer characterization matrix;
obtaining a final-layer characterization matrix of the multilayer graph convolutional neural network according to the graph attention mechanism;
and generating characters through a fully connected layer and a probability normalization function.
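The last step above, generating characters through a fully connected layer and a probability normalization function, can be sketched as follows. This is a minimal illustration and not the application's implementation: it assumes the probability normalization function is a softmax, and the names generate_characters, WEIGHTS, BIAS, VOCAB and FINAL_REPR, as well as all numeric values, are hypothetical toy choices.

```python
import math

def softmax(logits):
    """Numerically stable probability normalization (assumed to be softmax)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fully_connected(h, weights, bias):
    """One linear layer: logits[j] = sum_i h[i] * weights[i][j] + bias[j]."""
    return [
        sum(h[i] * weights[i][j] for i in range(len(h))) + bias[j]
        for j in range(len(bias))
    ]

def generate_characters(final_repr, weights, bias, vocab):
    """Pick the most probable vocabulary character for every position."""
    out = []
    for h in final_repr:
        probs = softmax(fully_connected(h, weights, bias))
        best = max(range(len(vocab)), key=lambda j: probs[j])
        out.append(vocab[best])
    return "".join(out)

# Hypothetical toy values: 2-dimensional representations, 3-character vocabulary.
VOCAB = ["a", "b", "c"]
WEIGHTS = [[1.0, 0.0, -1.0],
           [0.0, 1.0, 0.5]]
BIAS = [0.0, 0.0, 0.0]
FINAL_REPR = [[2.0, 0.0], [0.0, 3.0]]  # one row per character position

corrected = generate_characters(FINAL_REPR, WEIGHTS, BIAS, VOCAB)
```

Each row of the final-layer characterization matrix is mapped to one output character, so the corrected text has the same length as the text to be corrected.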
In some embodiments, the creating an adjacency matrix for each character in the text to be corrected according to the phonetic-shape similar confusion set includes:
acquiring, from the phonetic-shape similar confusion set, the pronunciation-similar characters and the shape-similar characters of each character in the text to be corrected;
taking the characters in the text to be corrected, the pronunciation-similar characters and the characters in the vocabulary as nodes, taking the relations between the characters as edges, and establishing a pronunciation-similar adjacency matrix;
and taking the characters in the text to be corrected, the shape-similar characters and the characters in the vocabulary as nodes, taking the relations between the characters as edges, and establishing a shape-similar adjacency matrix.
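The construction of the two adjacency matrices can be sketched as below. This is only an illustration under stated assumptions: the confusion sets are represented as plain dictionaries, edges are undirected and unweighted, and the function build_adjacency together with the toy data VOCAB, PRONUNCIATION_SET and SHAPE_SET are hypothetical and not taken from the application.

```python
def build_adjacency(text, confusion, vocab):
    """Return a symmetric 0/1 adjacency matrix over `vocab`.

    An edge links a character of `text` with each of its confusable
    characters, as listed in `confusion`.
    """
    index = {ch: i for i, ch in enumerate(vocab)}
    n = len(vocab)
    adj = [[0] * n for _ in range(n)]
    for ch in text:
        for similar in confusion.get(ch, ()):
            if ch in index and similar in index:
                i, j = index[ch], index[similar]
                adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical toy vocabulary and confusion sets.
VOCAB = ["天", "田", "夫", "气"]
PRONUNCIATION_SET = {"天": ["田"]}  # similar pronunciation (tian)
SHAPE_SET = {"天": ["夫"]}          # similar glyph

text = "天气"
pron_adj = build_adjacency(text, PRONUNCIATION_SET, VOCAB)
shape_adj = build_adjacency(text, SHAPE_SET, VOCAB)
```

Keeping the pronunciation-similar and shape-similar relations in separate matrices lets each graph be processed on its own before the two kinds of knowledge are combined.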
In some embodiments, the inputting the initial characterization matrix and the adjacency matrix into a multi-layer graph convolutional neural network to obtain a next-layer characterization matrix includes:
adding the adjacency matrix and the identity matrix to obtain an adjacency estimation matrix;
calculating a diagonal matrix corresponding to the adjacency estimation matrix to obtain a diagonal estimation matrix;
and obtaining a next-layer characterization matrix according to the adjacency estimation matrix, the diagonal estimation matrix and the initial characterization matrix.
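One graph convolution step built from these three quantities can be sketched as follows. The text above does not spell out how the matrices are combined, so this sketch assumes the commonly used symmetric normalization D^(-1/2) (A + I) D^(-1/2) H W, where D is the degree matrix of A + I. The weight matrix, the omission of a nonlinearity, and the names matmul and gcn_layer are assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, h, weight):
    """Compute the next-layer characterization matrix from `adj` and `h`."""
    n = len(adj)
    # Adjacency estimation matrix: A + I (adds a self-loop to every node).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Diagonal estimation matrix: degree of each node in A + I.
    deg = [sum(row) for row in a_hat]
    # Assumed symmetric normalization D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    return matmul(matmul(norm, h), weight)

# Hypothetical toy graph: two characters that are confusable with each other.
ADJ = [[0, 1], [1, 0]]
H0 = [[1.0, 0.0], [0.0, 1.0]]  # initial characterization matrix
W = [[1.0, 0.0], [0.0, 1.0]]   # identity weights, for readability only

H1 = gcn_layer(ADJ, H0, W)
```

Because the identity matrix is added before the degrees are computed, every degree is at least 1 and the normalization never divides by zero.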
In some embodiments, the obtaining a final-layer characterization matrix of the multilayer graph convolutional neural network according to the graph attention mechanism comprises:
calculating a knowledge-fused attention characterization matrix by adopting an attention mechanism;
and obtaining the final-layer characterization matrix according to the sum of the attention characterization matrix and the characterization matrices of each layer.
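A possible reading of these two steps is sketched below. The exact attention formula is not given above, so the sketch assumes that the outputs of the pronunciation graph and the shape graph are weighted per character by a softmax over their dot products with a query vector, and that the result is added element-wise to every layer's characterization matrix. The names attention_fuse, final_representation and the query vector are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(pron_vec, shape_vec, query):
    """Weight the two graph outputs by softmax of their dot product with `query`."""
    scores = [sum(q * v for q, v in zip(query, vec))
              for vec in (pron_vec, shape_vec)]
    w_pron, w_shape = softmax(scores)
    return [w_pron * p + w_shape * s for p, s in zip(pron_vec, shape_vec)]

def final_representation(pron, shape, layer_reprs, query):
    """Attention-fused matrix plus the element-wise sum of every layer's matrix."""
    final = []
    for row_idx, (p, s) in enumerate(zip(pron, shape)):
        row = attention_fuse(p, s, query)
        for layer in layer_reprs:
            row = [x + y for x, y in zip(row, layer[row_idx])]
        final.append(row)
    return final

# Hypothetical toy values for a single character with 2-dimensional vectors.
PRON = [[1.0, 0.0]]                     # output of the pronunciation graph
SHAPE = [[0.0, 1.0]]                    # output of the shape graph
LAYERS = [[[1.0, 1.0]], [[2.0, 2.0]]]   # characterization matrices of two layers
QUERY = [0.0, 0.0]                      # zero query gives equal attention weights

FINAL = final_representation(PRON, SHAPE, LAYERS, QUERY)
```

With a zero query both graphs receive weight 0.5, which makes the arithmetic easy to follow; a trained query would shift the weight toward whichever kind of similarity is more informative for the character.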
In a second aspect, an embodiment of the present application provides a text error correction method for a display device, the method including:
correcting the text to be corrected based on the phonetic-shape similar confusion set and the graph attention mechanism to obtain an initial corrected text;
performing candidate recall on the text to be corrected and the initial corrected text to obtain recalled texts;
and ranking the recalled texts, and obtaining a final corrected text corresponding to the text to be corrected according to the ranking result.
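The recall-then-rank stage can be sketched as follows. How candidates are recalled and scored is not specified above, so this sketch assumes that candidates are produced by single-character substitutions from a confusion set and that ranking uses a lookup table of known phrases. The functions recall_candidates, rank and score, and the tables CONFUSION and KNOWN_PHRASES, are hypothetical.

```python
def recall_candidates(original, initial, confusion):
    """Collect candidate texts: both inputs plus single-character substitutions."""
    candidates = {original, initial}
    for text in (original, initial):
        for pos, ch in enumerate(text):
            for alt in confusion.get(ch, ()):
                candidates.add(text[:pos] + alt + text[pos + 1:])
    return candidates

def rank(candidates, score):
    """Highest score first; ties are broken by string order for determinism."""
    return sorted(candidates, key=lambda c: (-score(c), c))

# Hypothetical scoring table standing in for a real ranking model.
KNOWN_PHRASES = {"天气": 10, "田地": 5}

def score(text):
    """Score a candidate by looking it up in the toy phrase table."""
    return KNOWN_PHRASES.get(text, 0)

CONFUSION = {"田": ["天"], "天": ["田", "夫"]}

original = "田气"  # text to be corrected, e.g. from speech recognition
initial = "天气"   # initial corrected text from the error correction model
ranked = rank(recall_candidates(original, initial, CONFUSION), score)
final_text = ranked[0]
```

Recalling candidates from both the text to be corrected and the initial corrected text means that a wrong first-pass correction can still be overridden during ranking.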
In a third aspect, an embodiment of the present application provides a server, where the server is configured to:
receiving a text to be corrected from a display device;
correcting the text to be corrected based on the phonetic-shape similar confusion set and the graph attention mechanism to obtain an initial corrected text;
performing candidate recall on the text to be corrected and the initial corrected text, and obtaining a final corrected text according to a ranking result of the recalled texts;
and sending the final corrected text to the display device.
The display device, the text error correction method and the server provided by the application have the following advantages:
in the embodiments of the application, a pronunciation-similarity knowledge graph and a shape-similarity knowledge graph are generated from the confusion set corresponding to the text to be corrected, and knowledge of the pinyin and glyphs of Chinese characters is merged into a graph neural network to extract deep semantic information among similar characters. Knowledge of pronunciation and glyph similarity is thereby used effectively, which improves the accuracy and recall of error detection and correction.
Drawings
In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic diagram of an operational scenario between a display device and a control apparatus according to some embodiments;
Fig. 2 is a block diagram of a hardware configuration of a display device 200 according to some embodiments;
Fig. 3 is a block diagram of a hardware configuration of the control device 100 according to some embodiments;
Fig. 4 is a schematic diagram of a software configuration in a display device 200 according to some embodiments;
Fig. 5 is a schematic diagram of an icon control interface display of an application in the display device 200 according to some embodiments;
Fig. 6 is an overall flow diagram of text correction according to some embodiments;
Fig. 7 is a flow diagram of a text correction method according to some embodiments;
Fig. 8 is a diagram of an end-to-end error detection and correction model according to some embodiments;
Fig. 9 is a flow diagram of a method of parsing text to be corrected according to some embodiments;
Fig. 10 is a flow diagram of a method of creating an adjacency matrix according to some embodiments;
Fig. 11 is a schematic diagram of a voice interaction interface according to some embodiments;
Fig. 12 is a schematic diagram of a voice interaction interface according to some embodiments;
Fig. 13 is a schematic diagram of a voice interaction interface according to some embodiments.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, the following description of exemplary embodiments of the present application will clearly and completely describe the exemplary embodiments of the present application with reference to the accompanying drawings in the exemplary embodiments of the present application, and it is to be understood that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
All other embodiments that a person skilled in the art can derive from the exemplary embodiments described herein without inventive effort are intended to fall within the scope of the appended claims. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also constitute a complete embodiment on its own.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar objects or entities and are not necessarily intended to describe a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The term "remote control" as used in this application refers to a component of an electronic device (such as the display device disclosed in this application) that can typically control the electronic device wirelessly over a relatively short distance. The remote control typically connects with the electronic device using infrared and/or Radio Frequency (RF) signals and/or Bluetooth, and may also include functional modules such as WiFi, wireless USB and motion sensors. For example, a hand-held touch remote controller replaces most of the physical built-in hard keys of a common remote control device with a user interface on a touch screen.
The term "gesture" as used in this application refers to a user's behavior through a change in hand shape or an action such as hand motion to convey a desired idea, action, purpose, or result.
Fig. 1 is a schematic diagram illustrating an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control apparatus 100.
In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication and other short-distance communication methods, and the display device 200 is controlled wirelessly or by wire. The user may input a user command through keys on the remote controller, voice input, control panel input, etc. to control the display apparatus 200. For example, the user can input a corresponding control command through a volume up/down key, a channel control key, up/down/left/right moving keys, a voice input key, a menu key, a power on/off key, etc. on the remote controller, to control the display device 200.
In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device. The application, through configuration, may provide the user with various controls in an intuitive User Interface (UI) on a screen associated with the smart device.
In some embodiments, the mobile terminal 300 may install a software application with the display device 200 to implement connection communication through a network communication protocol for the purpose of one-to-one control operation and data communication. Such as: the mobile terminal 300 and the display device 200 can establish a control instruction protocol, synchronize a remote control keyboard to the mobile terminal 300, and control the display device 200 by controlling a user interface on the mobile terminal 300. The audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200, so as to realize the synchronous display function.
As also shown in fig. 1, the display apparatus 200 also performs data communication with the server 400 through various communication means. The display device 200 may be communicatively connected through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display apparatus 200. Illustratively, by sending and receiving information, the display device 200 receives software program updates, accesses a remotely stored digital media library, and performs Electronic Program Guide (EPG) interactions. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers. Other web service contents, such as video on demand and advertisement services, are also provided through the server 400.
The display device 200 may be a liquid crystal display, an OLED display, a projection display device. The particular display device type, size, resolution, etc. are not limiting, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired.
In addition to the broadcast receiving television function, the display apparatus 200 may provide computer-supported smart network television functions, including but not limited to network TV, smart TV, and Internet Protocol TV (IPTV).
A hardware configuration block diagram of a display device 200 according to an exemplary embodiment is exemplarily shown in fig. 2.
In some embodiments, at least one of the controller 250, the tuner demodulator 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240 is included in the display apparatus 200.
In some embodiments, a display 275 receives image signals originating from the first processor output and displays video content and images and components of the menu manipulation interface.
In some embodiments, the display 275 includes a display screen assembly for presenting a picture and a driving assembly that drives the display of an image.
In some embodiments, the video content is displayed from broadcast television content, or alternatively, from various broadcast signals that may be received via wired or wireless communication protocols. Alternatively, various image contents received from the network communication protocol and sent from the network server side can be displayed.
In some embodiments, the display 275 is used to present a user-manipulated UI interface generated in the display apparatus 200 and used to control the display apparatus 200.
In some embodiments, a driver assembly for driving the display is also included, depending on the type of display 275.
In some embodiments, display 275 is a projection display and may also include a projection device and a projection screen.
In some embodiments, communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi chip, a bluetooth communication protocol chip, a wired ethernet communication protocol chip, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver.
In some embodiments, the display apparatus 200 may establish control signal and data signal transmission and reception with the external control device 100 or the content providing apparatus through the communicator 220.
In some embodiments, the user interface 265 may be configured to receive infrared control signals from a control device 100 (e.g., an infrared remote control, etc.).
In some embodiments, the detector 230 is a component used by the display device 200 to collect signals from the external environment or to interact with the outside.
In some embodiments, the detector 230 includes a light receiver, that is, a sensor for collecting the intensity of ambient light, so that display parameters can be adaptively changed according to the collected ambient light.
In some embodiments, the detector 230 may further include an image collector, such as a camera, etc., which may be configured to collect external environment scenes, collect attributes of the user or gestures interacted with the user, adaptively change display parameters, and recognize user gestures, so as to implement a function of interaction with the user.
In some embodiments, the detector 230 may also include a temperature sensor or the like for sensing the ambient temperature.
In some embodiments, the display apparatus 200 may adaptively adjust the display color temperature of an image. For example, the display apparatus 200 may be adjusted to display a cool tone in a high-temperature environment, or to display a warm tone in a low-temperature environment.
In some embodiments, the detector 230 may also include a sound collector or the like, such as a microphone, which may be used to receive the user's voice. Illustratively, the sound collector receives a voice signal that includes a control instruction of the user for controlling the display device 200, or collects ambient sound for recognizing the type of ambient scene, so that the display device 200 can adapt to ambient noise.
In some embodiments, as shown in fig. 2, the input/output interface 255 is configured to allow data transfer between the controller 250 and external other devices or other controllers 250. Such as receiving video signal data and audio signal data of an external device, or command instruction data, etc.
In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of the following: a high-definition multimedia interface (HDMI), an analog or data high-definition component input interface, a composite video input interface, a USB input interface, an RGB port and the like. The plurality of interfaces may form a composite input/output interface.
In some embodiments, as shown in fig. 2, the tuner demodulator 210 is configured to receive broadcast television signals in a wired or wireless manner, perform modulation and demodulation processing such as amplification, mixing and resonance, and demodulate audio and video signals from a plurality of wireless or wired broadcast television signals, where the audio and video signals may include the television audio and video signals carried on the television channel frequency selected by the user, as well as EPG data signals.
In some embodiments, the frequency points demodulated by the tuner demodulator 210 are controlled by the controller 250. The controller 250 can send out control signals according to the user's selection, so that the tuner demodulator 210 responds to the television signal frequency selected by the user and modulates and demodulates the television signal carried on that frequency.
In some embodiments, the broadcast television signal may be classified into a terrestrial broadcast signal, a cable broadcast signal, a satellite broadcast signal, an internet broadcast signal, or the like according to the broadcasting system of the television signal. Or may be classified into a digital modulation signal, an analog modulation signal, and the like according to a modulation type. Or the signals are classified into digital signals, analog signals and the like according to the types of the signals.
In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices, that is, the tuner demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box. In this case, the set-top box modulates and demodulates the received broadcast television signals and outputs the resulting television audio and video signals to the main device, and the main device receives the audio and video signals through the first input/output interface.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any selectable object, such as a hyperlink or an icon. Operations related to the selected object include, for example, displaying the page, document or image linked by a hyperlink, or running the program corresponding to an icon. The user command for selecting the UI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the display apparatus 200, or a voice command corresponding to a voice spoken by the user.
As shown in fig. 2, the controller 250 includes at least one of a Random Access Memory 251 (RAM), a Read-Only Memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (e.g., a Graphics Processing Unit (GPU)), a Central Processing Unit 254 (CPU), a communication interface, and a communication bus 256 that connects the respective components.
In some embodiments, RAM 251 is used to store temporary data for the operating system or other programs that are running.
In some embodiments, ROM 252 is used to store instructions for various system boots.
In some embodiments, the ROM 252 is used to store a Basic Input Output System (BIOS). The BIOS completes the power-on self-test of the system, initializes each functional module in the system, provides the drivers for basic system input/output, and boots the operating system.
In some embodiments, when the power-on signal is received, the display device 200 starts to power up, the CPU executes the system boot instruction in the ROM 252, and copies the temporary data of the operating system stored in the memory to the RAM 251 so as to start or run the operating system. After the start of the operating system is completed, the CPU copies the temporary data of the various application programs in the memory to the RAM 251, and then, the various application programs are started or run.
In some embodiments, the CPU 254 executes operating system and application program instructions stored in the memory, and runs various application programs, data and contents according to the interactive instructions received from the outside, so as to finally display and play various audio and video contents.
In some example embodiments, the CPU 254 may comprise a plurality of processors. The plurality of processors may include a main processor and one or more sub-processors. The main processor performs some operations of the display apparatus 200 in a pre-power-up mode and/or displays a screen in a normal mode. The one or more sub-processors perform operations in a standby mode or the like.
In some embodiments, the graphics processor 253 is used to generate various graphics objects, such as icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor includes an arithmetic unit, which performs operations on the interactive instructions input by the user and displays various objects according to their display attributes, and a renderer, which renders the objects obtained by the arithmetic unit so that the rendered objects can be displayed on the display.
In some embodiments, the video processor 270 is configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to the standard codec protocol of the input signal, so as to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module demultiplexes the input audio and video data stream; for example, if an MPEG-2 stream is input, the demultiplexing module separates it into a video signal and an audio signal.
The video decoding module processes the demultiplexed video signal, including decoding, scaling and the like.
The image synthesis module superimposes and mixes the GUI signal, which the graphics generator produces according to user input or generates itself, with the scaled video image to generate an image signal for display.
The frame rate conversion module converts the frame rate of the input video, for example from 60 Hz to 120 Hz or 240 Hz, which is usually implemented by frame interpolation.
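As a rough illustration of frame interpolation, the following sketch raises the frame rate by an integer factor by blending neighbouring frames linearly. The function name `interpolate_frames` and the representation of a frame as a flat list of pixel intensities are assumptions made for this example; an actual frame rate conversion module would typically use motion-compensated interpolation rather than plain blending.

```python
def interpolate_frames(frames, factor=2):
    """Raise the frame rate by an integer factor using linear blending.

    `frames` is a list of frames; each frame is a flat list of pixel
    intensities (an assumed, simplified representation). Between every
    pair of consecutive frames, `factor - 1` blended frames are inserted.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not frames:
        return []
    output = []
    for current, following in zip(frames, frames[1:]):
        output.append(list(current))
        for step in range(1, factor):
            weight = step / factor
            blended = [(1 - weight) * a + weight * b
                       for a, b in zip(current, following)]
            output.append(blended)
    # The last frame has no successor, so it is repeated to keep the
    # output length equal to len(frames) * factor.
    output.extend(list(frames[-1]) for _ in range(factor))
    return output


source = [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0]]   # 3 frames, e.g. at 60 Hz
doubled = interpolate_frames(source, factor=2)      # 6 frames, e.g. at 120 Hz
```

With `factor=2` each inserted frame is the average of its two neighbours, so the same duration of video is covered by twice as many frames.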
The display formatting module converts the frame-rate-converted video into a video output signal that conforms to the display format, for example by outputting an RGB data signal.
In some embodiments, the graphics processor 253 and the video processor may be integrated or separately configured. When integrated, they jointly process the graphics signals output to the display; when separately configured, they perform different functions, for example in a GPU + FRC (Frame Rate Conversion) architecture.
In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain an audio signal that can be played in a speaker.
In some embodiments, video processor 270 may comprise one or more chips. The audio processor may also comprise one or more chips.
In some embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.
In some embodiments, the audio output interface receives, under the control of the controller 250, the sound signal output by the audio processor 280. The audio output interface includes, for example, the speaker 286 and, in addition to the speaker carried by the display device 200 itself, an external sound output terminal that can output to a sound generating device of an external device, such as an external sound interface or an earphone interface. It may also include a near field communication module in the communication interface, for example a Bluetooth module for outputting sound through a Bluetooth speaker.
The power supply 290 supplies power to the display device 200 from the power input from the external power source under the control of the controller 250. The power supply 290 may include a built-in power supply circuit installed inside the display apparatus 200, or may be a power supply interface installed outside the display apparatus 200 to provide an external power supply in the display apparatus 200.
The user interface 265 receives an input signal of a user and then transmits the received user input signal to the controller 250. The user input signal may be a remote controller signal received through an infrared receiver, and various user control signals may also be received through the network communication module.
In some embodiments, the user inputs a user command through the control apparatus 100 or the mobile terminal 300, the user input interface receives the user input, and the display device 200 responds to the user input through the controller 250.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
The memory 260 stores various software modules for driving the display device 200. For example, the software modules stored in the first memory include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.
The basic module is the underlying software module for signal communication between the various hardware components in the display device 200 and for sending processing and control signals to upper-layer modules. The detection module is a management module that collects various information from sensors or user input interfaces and performs digital-to-analog conversion and analysis management.
For example, the voice recognition module comprises a voice analysis module and a voice instruction database module. The display control module controls the display to present image content and can be used to play multimedia image content, UI interfaces, and other information. The communication module performs control and data communication with external devices. The browser module performs data communication with browsing servers. The service module provides various services and includes various application programs. Meanwhile, the memory 260 may also store received external data and user data, images of various items in the user interfaces, visual effect maps of the focus object, and the like.
Fig. 3 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.
The control apparatus 100 is configured to control the display device 200 and may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200. Such as: the user operates the channel up/down key on the control device 100, and the display device 200 responds to the channel up/down operation.
In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications that control the display device 200 according to user demands.
In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similar to the control apparatus 100 after an application for manipulating the display device 200 is installed. Such as: the user may implement the function of controlling the physical keys of the apparatus 100 by installing an application, various function keys or virtual buttons of a graphical user interface available on the mobile terminal 300 or other intelligent electronic device.
The controller 110 includes a processor 112 and RAM 113 and ROM 114, a communication interface 130, and a communication bus. The controller is used for controlling the operation of the control device 100, as well as the communication cooperation among the internal components and the external and internal data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display apparatus 200 under the control of the controller 110. Such as: the received user input signal is transmitted to the display apparatus 200. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.
The user input/output interface 140 includes an input interface comprising at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. For example: the user can input a user instruction through actions such as voice, touch, gesture, or pressing; the input interface converts the received analog signal into a digital signal, converts the digital signal into a corresponding instruction signal, and sends the instruction signal to the display device 200.
The output interface includes an interface that transmits the received user instruction to the display apparatus 200. In some embodiments, the interface may be an infrared interface or a radio frequency interface. For example: when the infrared signal interface is used, the user input instruction needs to be converted into an infrared control signal according to an infrared control protocol, and the infrared control signal is sent to the display device 200 through the infrared sending module. As another example: when the radio frequency signal interface is used, the user input instruction needs to be converted into a digital signal, which is then modulated according to the radio frequency control signal modulation protocol and transmitted to the display device 200 through the radio frequency transmitting terminal.
In some embodiments, the control device 100 includes at least one of a communication interface 130 and an input-output interface 140. When the control device 100 is configured with the communication interface 130, such as a WiFi, Bluetooth, or NFC module, the user input instruction may be encoded according to the WiFi protocol, the Bluetooth protocol, or the NFC protocol and transmitted to the display device 200.
The memory 190 stores, under the control of the controller, various operation programs, data and applications for driving and controlling the control apparatus 100. The memory 190 may also store various control signal commands input by a user.
The power supply 180 provides operating power for each element of the control device 100 under the control of the controller. It may include a battery and associated control circuitry.
In some embodiments, the system may include a Kernel (Kernel), a command parser (shell), a file system, and an application program. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.
Referring to fig. 4, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer from top to bottom.
In some embodiments, at least one application program runs in the application program layer, and the application programs can be Window (Window) programs carried by an operating system, system setting programs, clock programs, camera applications and the like; or may be an application developed by a third party developer such as a hi program, a karaoke program, a magic mirror program, or the like. In specific implementation, the application packages in the application layer are not limited to the above examples, and may actually include other application packages, which is not limited in this embodiment of the present application.
The framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. The application framework layer acts as a processing center that decides to let the applications in the application layer act. The application program can access the resources in the system and obtain the services of the system in execution through the API interface.
As shown in fig. 4, in the embodiment of the present application, the application framework layer includes a manager (Managers), a Content Provider (Content Provider), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage the icons, windows, toolbars, wallpapers, and desktop components on a user interface.
In some embodiments, the activity manager is to: managing the life cycle of each application program and the general navigation backspacing function, such as controlling the exit of the application program (including switching the user interface currently displayed in the display window to the system desktop), opening, backing (including switching the user interface currently displayed in the display window to the previous user interface of the user interface currently displayed), and the like.
In some embodiments, the window manager is configured to manage all window processes, such as obtaining a display size, determining whether a status bar is available, locking a screen, intercepting a screen, controlling a display change (e.g., zooming out, dithering, distorting, etc.) and the like.
In some embodiments, the system runtime library layer provides support for the upper layer, i.e., the framework layer. When the framework layer is used, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions required by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB driver, an HDMI driver, sensor drivers (such as for a fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and so on.
In some embodiments, the kernel layer further comprises a power driver module for power management.
In some embodiments, software programs and/or modules corresponding to the software architecture of fig. 4 are stored in the first memory or the second memory shown in fig. 2 or 3.
In some embodiments, for a display device with a touch function, taking a split screen operation as an example, the display device receives an input operation (such as a split screen operation) that a user performs on the display screen, and the kernel layer generates a corresponding input event according to the input operation and reports the event to the application framework layer. The activity manager of the application framework layer sets the window mode (such as a multi-window mode) corresponding to the input operation, as well as the position and size of the window. The window manager of the application framework layer draws the window according to the settings of the activity manager and sends the drawn window data to the display driver of the kernel layer, and the display driver displays the corresponding application interfaces in different display areas of the display screen.
In some embodiments, as shown in fig. 5, the application layer containing at least one application may display a corresponding icon control in the display, such as: the system comprises a live television application icon control, a video on demand application icon control, a media center application icon control, an application center icon control, a game application icon control and the like.
In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may provide television signals using input from cable television, radio broadcasts, satellite services, or other types of live television services. And, the live television application may display video of the live television signal on the display device 200.
In some embodiments, a video-on-demand application may provide video from different storage sources. Unlike live television applications, video on demand provides a video display from some storage source. For example, the video on demand may come from a server side of the cloud storage, from a local hard disk storage containing stored video programs.
In some embodiments, the media center application may provide various applications for multimedia content playback. For example, the media center application, unlike live television or video on demand, may provide services through which a user accesses various images or audio.
In some embodiments, an application center may provide storage for various applications. The application may be a game, an application, or some other application associated with a computer system or other device that may be run on the smart television. The application center may obtain these applications from different sources, store them in local storage, and then be operable on the display device 200.
The hardware or software architecture in some embodiments may be based on the description in the above embodiments, or may be another hardware or software architecture similar to the above, as long as the technical solution of the present application can be implemented.
In some embodiments, the application center may be configured with a voice assistant application to implement intelligent voice services, such as services of searching media assets, adjusting volume, and the like. The user can wake up the voice assistant application by sending a voice command to the display device, the voice command can be some preset wake-up words, and after the voice assistant application wakes up, the user can interact with the voice assistant application to perform voice control on the display device. After receiving a voice command of a user, the intelligent voice assistant needs to perform voice recognition on the voice command to obtain a recognized text, and since many characters are easy to be confused, the recognized text has a certain error probability.
To solve the above technical problem, an embodiment of the present application provides an overall flow of text error correction. Referring to fig. 6, a natural language text, that is, the text to be corrected, is input into an end-to-end error detection and correction model to obtain a first error correction result. The model processes the text in sequence through Bert vector representation, pronunciation and shape confusion representation of characters, a multilayer graph neural network, hidden vector classification to generate characters, and the like. Then, the Elasticsearch engine performs candidate recall on the first error correction result according to the error correction word bank to obtain a recall result, where the candidate recall includes an Elasticsearch search, an inverted index of the error correction word bank, and the like. Finally, candidate ranking is performed on the recall result to obtain a ranking result, and the final error correction result corresponding to the natural language text is generated according to the ranking result, where the candidate ranking includes edit distance computation, threshold filtering, and the like.
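The recall and ranking stages of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: an in-memory inverted index stands in for the Elasticsearch engine, the function names (`build_inverted_index`, `recall_candidates`, `rank_candidates`, `correct`), the word bank contents, and the threshold of 2 are all hypothetical, and the output of the end-to-end model is simply taken as the input string.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_inverted_index(word_bank):
    """Map each character to the set of word bank entries containing it."""
    index = defaultdict(set)
    for entry in word_bank:
        for char in set(entry):
            index[char].add(entry)
    return index


def recall_candidates(text, index):
    """Recall every entry that shares at least one character with text."""
    candidates = set()
    for char in set(text):
        candidates |= index.get(char, set())
    return candidates


def rank_candidates(text, candidates, max_distance=2):
    """Drop candidates beyond the threshold, then sort by edit distance."""
    scored = [(edit_distance(text, c), c) for c in candidates]
    kept = [(d, c) for d, c in scored if d <= max_distance]
    return [c for d, c in sorted(kept)]


def correct(text, index, max_distance=2):
    """Return the best-ranked candidate, or the input if none survives."""
    ranked = rank_candidates(text, recall_candidates(text, index), max_distance)
    return ranked[0] if ranked else text


word_bank = ["avengers", "avatar", "frozen"]      # hypothetical media titles
index = build_inverted_index(word_bank)
result = correct("avengars", index)               # one substitution away
```

Falling back to the input when no candidate passes the threshold keeps the ranking stage from replacing a text that has no close match in the word bank.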
To further introduce the text error correction method in fig. 6, an embodiment of the present application further provides a flowchart of a text error correction method, and referring to fig. 7, the method may be used in a display device, and includes the following steps:
Step S10: correct the text to be corrected based on the pronunciation-shape similar confusion set and the graph attention mechanism to obtain an initial corrected text.
In some embodiments, the voice assistant application of the display device may receive the user's voice commands after being woken up. The controller of the display device acquires the voice command received by the voice assistant application and converts it into text to obtain the text to be corrected. The text to be corrected may differ from the actual text corresponding to the voice command; the actual text is obtained through error correction and may be referred to as the final corrected text.
In some embodiments, error correction by the display device may take a certain time, for example 1 second. If the final corrected text were displayed only after error correction, the user would perceive the display device as slow to respond. To avoid making the user wait a long time for a response, after the text to be corrected is obtained, the display may be controlled to show the text to be corrected first while error correction is performed in the background.
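The display-first, correct-in-background behaviour can be sketched with a worker thread. The `Display` class, `slow_correct`, and `handle_recognized_text` are hypothetical names introduced only for this example; `slow_correct` is a placeholder for the actual error correction, and its short delay merely mimics model latency.

```python
import threading
import time


class Display:
    """Hypothetical stand-in for the display: it records what is shown."""

    def __init__(self):
        self.shown = []
        self._lock = threading.Lock()

    def show(self, text):
        with self._lock:
            self.shown.append(text)


def slow_correct(text):
    """Placeholder for error correction; the delay mimics model latency."""
    time.sleep(0.05)
    return text.replace("teh", "the")


def handle_recognized_text(text, display, corrector=slow_correct):
    """Show the uncorrected text at once, then correct it in the background."""
    display.show(text)                      # immediate feedback to the user

    def worker():
        display.show(corrector(text))       # shown once correction finishes

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


display = Display()
job = handle_recognized_text("play teh news", display)
job.join()                                  # wait only for this demonstration
```

The caller returns as soon as the uncorrected text is on screen; the corrected text replaces it whenever the background worker completes.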
In some embodiments, the display device may construct an end-to-end error detection and correction model for performing preliminary error correction on the text to be corrected. Fig. 8 is a schematic structural diagram of an end-to-end error detection and correction model according to some embodiments. As shown in fig. 8, a text to be corrected, such as "encounter inverse competition", is input into a Bert Extractor, which outputs an initial characterization matrix H comprising H_0, H_1, ..., H_{t+1}. In fig. 8, Trm denotes the encoded output of a Transformer layer, EMB denotes the embedding of a character, Trm takes EMB as input, and t denotes the character length of the user's request.
The initial characterization matrix is input into a GCN Network (Graph Convolutional Network), and the knowledge graph of the pronunciation similar confusion set and the knowledge graph of the shape similar confusion set of the text to be corrected are each input into the GCN Network to update the initial characterization matrix. The GCN Network has 3 layers in total: layer_1, layer_2 and layer_3.
The GCN Network passes its output to a classifier, which outputs the error correction result of the end-to-end error detection and correction model, such as "encountering adversity". This result can be used as the initial corrected text. The classifier may be a hidden vector classifier that performs hidden vector classification, and the dashed boxes in the classifier represent the predicted probability of each character, such as 80%, 70%, 85%, and so on.
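One common way to realize such a hidden vector classifier is to score each position's hidden vector against a weight vector per candidate character and apply a softmax, which yields a per-character probability like the percentages shown in the dashed boxes. The sketch below assumes this form; the function names, the toy vocabulary, and all numeric values are illustrative and not taken from the application.

```python
import math


def softmax(logits):
    """Convert a list of scores into a probability distribution."""
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_positions(hidden, class_vectors, vocabulary):
    """Pick, for every position, the most probable character.

    hidden:        one hidden vector per character position
    class_vectors: one weight vector per vocabulary character
    Returns a list of (character, probability) pairs.
    """
    results = []
    for vector in hidden:
        logits = [sum(h * w for h, w in zip(vector, weights))
                  for weights in class_vectors]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((vocabulary[best], probs[best]))
    return results


vocabulary = ["a", "b", "c"]                         # toy character set
class_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden = [[2.0, 0.0], [0.0, 3.0]]                    # two character positions
predictions = classify_positions(hidden, class_vectors, vocabulary)
```

Each entry of `predictions` pairs the chosen character with its predicted probability, so a downstream step could, for instance, keep the original character when the probability is low.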
Referring to fig. 9, a flowchart of a parsing method for a text to be corrected according to some embodiments of the present application is shown, and as shown in fig. 9, the parsing method may include steps S101 to S105.
Step S101: extract features from the text to be corrected to obtain an initial characterization matrix.
In some embodiments, feature extraction may be performed on the text to be corrected through a Bert model.
The Bert model uses a bidirectional Transformer as the encoder, and uses a Masked Language Model (Masked LM) task and a Next Sentence Prediction task to capture word-level and sentence-level representations, respectively. The text to be corrected is input into the Bert model, which outputs the initial characterization matrix H.
Step S102: create an adjacency matrix for each character in the text to be corrected according to the pronunciation-shape similar confusion set.
The pronunciation-shape similar confusion set comprises a preset pronunciation similar confusion set and a preset shape similar confusion set. The pronunciation similar confusion set is a preset set of characters that are easily confused because they sound alike; the shape similar confusion set is a preset set of characters that are easily confused because they look alike. In some embodiments, the pronunciation similar confusion set is obtained by data analysis of user data, which may include data input by users on a display device.
In some embodiments, for the character "race" in the text "run against the race" to be corrected, the pronunciation similarity confusion set is obtained as { gold, silence, border, well, host, race }, and the shape similarity confusion set is obtained as { jing, mirror, race, scene, border }.
In some embodiments, the confusion set of the character "race" may include more characters than those listed above. A confusion set entry may be written as the character followed by a colon, where the characters after the colon form the confusion set of that character.
The method for creating the adjacency matrix can be seen in FIG. 10, which includes steps S1021-S1023.
Step S1021: acquire, from the pronunciation-shape similar confusion set, the pronunciation similar characters and the shape similar characters of each character in the text to be corrected.
The pronunciation similar characters of each character are extracted from the pronunciation similar confusion set, and the shape similar characters of each character are extracted from the shape similar confusion set.
Step S1022: take the characters in the text to be corrected, the pronunciation similar characters, and the characters in the character library as nodes, take the relationships between characters as edges, and establish a pronunciation similar adjacency matrix.
In some embodiments, Chinese characters commonly used in life can be selected to form a character library, and alternative characters are provided for the text to be corrected.
A knowledge graph of the pronunciation similar confusion set is established by taking each character in the text to be corrected as a center node, taking its pronunciation similar characters and the remaining characters in the character library as edge nodes, and taking the relationships between characters as edges. Each edge has a value of 0 or 1: 1 indicates that the two nodes of the edge are similar, and 0 indicates that they are not. For example, an edge connecting a character in the text to be corrected with a character in its pronunciation similar confusion set is 1, and an edge connecting it with a character in the character library that does not belong to the pronunciation similar confusion set is 0.
The knowledge graph of the pronunciation similarity confusion set can be represented as an N × N adjacency matrix, wherein N represents the number of the common Chinese characters, namely the number of the characters in the character library, such as 5000.
Step S1023: take the characters in the text to be corrected, the shape similar characters, and the characters in the character library as nodes, take the relationships between characters as edges, and establish a shape similar adjacency matrix.
A knowledge graph of the shape similar confusion set is established by taking each character in the text to be corrected as a center node, taking its shape similar characters and the characters in the character library as edge nodes, and taking the relationships between characters as edges. Each edge has a value of 0 or 1: 1 indicates that the two nodes of the edge are similar, and 0 indicates that they are not. For example, an edge connecting a character in the text to be corrected with a character in its shape similar confusion set is 1, and an edge connecting it with a character in the character library that does not belong to the shape similar confusion set is 0. The knowledge graph of the shape similar confusion set may also be represented as an N × N adjacency matrix.
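Steps S1021 to S1023 can be sketched as building two N × N matrices of 0s and 1s over the character library, one per confusion set. In the sketch below the function name `build_adjacency`, the dictionary representation of a confusion set, and the toy data are assumptions; Latin letters stand in for Chinese characters, and similarity is treated as symmetric, which the application does not state explicitly.

```python
def build_adjacency(characters, confusion_sets):
    """Build an N x N 0/1 adjacency matrix from a confusion-set mapping.

    characters:     the character library, one entry per node
    confusion_sets: dict mapping a character to the characters that are
                    confusable with it
    Entry [i][j] is 1 when characters i and j are confusable, otherwise 0.
    """
    position = {char: i for i, char in enumerate(characters)}
    size = len(characters)
    matrix = [[0] * size for _ in range(size)]
    for char, similar in confusion_sets.items():
        if char not in position:
            continue
        for other in similar:
            if other in position and other != char:
                i, j = position[char], position[other]
                matrix[i][j] = 1
                matrix[j][i] = 1        # assumption: similarity is symmetric
    return matrix


# Toy library; Latin letters stand in for Chinese characters.
library = ["b", "d", "p", "q", "x"]
pronunciation_sets = {"b": ["p", "d"]}               # sound-alike (assumed)
shape_sets = {"b": ["d"], "p": ["q"]}                # look-alike (assumed)

A_pronunciation = build_adjacency(library, pronunciation_sets)
A_shape = build_adjacency(library, shape_sets)
```

With a library of about 5000 common characters the matrices are mostly zeros, so a sparse representation would be a natural choice in practice.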
Step S103: inputting the initial characterization matrix and the adjacency matrix into a multilayer graph convolutional neural network to obtain a next-layer characterization matrix.
A 3-layer graph neural network is constructed, such as layer1 to layer3 in fig. 8, where the input of layer1 is the encoded output of the Bert Extractor and the inputs of layer2 and layer3 are each the output of the previous layer. H and the adjacency matrix A are used as the input of the multilayer graph convolutional neural network to extract deeper semantic information. The adjacency matrix A comprises the adjacency matrix corresponding to the knowledge graph of the pronunciation-similar confusion set and the adjacency matrix corresponding to the knowledge graph of the shape-similar confusion set.
H and the adjacency matrix A are taken as the input of the multilayer graph convolutional neural network to obtain the characterization matrix of the second layer; the characterization matrix of the second layer and the adjacency matrix A are taken as the input to obtain the characterization matrix of the third layer; and so on, to obtain the characterization matrix $H^{l}$ of each graph convolutional layer of the multilayer graph convolutional neural network. The calculation is given by formula (1),
where $l$ denotes the $l$-th layer; $I$ denotes the identity matrix corresponding to $A$; $\hat{A} = A + I$ denotes the adjacency matrix with node self-connections added, which may be referred to as the adjacency estimation matrix; $\hat{D}$ denotes the diagonal matrix corresponding to $\hat{A}$, which may be referred to as the diagonal estimation matrix, whose diagonal entries $\hat{D}_{ii} = \sum_{j} \hat{A}_{ij}$ are the degrees of the corresponding nodes; $i$ and $j$ are both between 0 and $N$; $H^{l-1}$ is the characterization matrix of the layer preceding $H^{l}$; and $W^{l}$ denotes the training parameters of layer $l$.
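A single graph convolution step using the adjacency estimation matrix and the diagonal estimation matrix can be sketched as below. Because formula (1) itself is not reproduced in this text, the sketch assumes the widely used symmetric normalization $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}H^{l-1}W^{l}$ without an activation function; that choice, and the helper names `matmul` and `gcn_layer`, are assumptions rather than the application's stated method.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h_prev: Matrix, w: Matrix) -> Matrix:
    n = len(adj)
    # Adjacency estimation matrix: A_hat = A + I (adds node self-connections).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Diagonal estimation matrix: D_hat[i][i] = sum_j A_hat[i][j] (node degree).
    deg = [sum(row) for row in a_hat]
    # Assumed normalization: D_hat^{-1/2} A_hat D_hat^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return matmul(matmul(norm, h_prev), w)


# Two mutually similar characters, 2-dimensional features, identity weights.
adj = [[0.0, 1.0], [1.0, 0.0]]
h0 = [[1.0, 0.0], [0.0, 1.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]
h1 = gcn_layer(adj, h0, w1)
```

Because the self-connection guarantees every degree is at least 1, the normalization never divides by zero, even for a character with no similar characters.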
Step S104: obtaining a final-layer characterization matrix of the multilayer graph convolutional neural network according to a graph attention mechanism.
In some embodiments, a graph attention mechanism may be introduced to combine the similar-pronunciation knowledge and the similar-shape knowledge to obtain the final-layer characterization matrix $H^{l+1}$.
A knowledge-fused attention characterization matrix $C^{l}$ is calculated using an attention mechanism according to formula (2),
where $C^{l}$ is a matrix of dimensions N × D, D being the vector dimension after Bert encoding; $f_k(A_k, H^{l})_i$ is the $i$-th row of the graph convolution output for graph $k$, whose adjacency matrix may also be represented as $A_k$; $k$ is either $s$, which represents similar shape, or $p$, which represents similar pronunciation; the attention weight is the scalar weight of the $i$-th character for graph $k$; $l$ denotes the layer number of the neural network; $W_a$ is a parameter whose definition is not reproduced in this text; and $\beta$ is a hyperparameter, which may be a constant, e.g., 3.
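The characterization matrix $H^{l+1}$ of the last layer is calculated according to formula (3), that is, from the sum of the attention characterization matrix and the characterization matrices of each layer.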
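The attention step and the final accumulation can be sketched as follows. Since formulas (2) and (3) are not reproduced in this text, the sketch makes two assumptions: the weight of graph $k$ for character $i$ is a softmax over the graphs of the score $W_a \cdot f_k(A_k, H^{l})_i / \beta$, and the last-layer matrix is the attention matrix plus the sum of the per-layer matrices. The function names and the toy inputs are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def attention_combine(graph_outputs: Dict[str, Matrix], w_a: List[float],
                      beta: float = 3.0) -> Matrix:
    """Fuse per-graph convolution outputs row by row with softmax weights."""
    keys = list(graph_outputs)
    n = len(graph_outputs[keys[0]])
    combined = []
    for i in range(n):
        rows = [graph_outputs[k][i] for k in keys]
        # Assumed score: dot(w_a, row) / beta, one score per graph.
        scores = [sum(w * x for w, x in zip(w_a, row)) / beta for row in rows]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # shifted for numerical stability
        total = sum(exps)
        alphas = [e / total for e in exps]
        combined.append([sum(a * row[d] for a, row in zip(alphas, rows))
                         for d in range(len(rows[0]))])
    return combined


def accumulate(c: Matrix, layers: List[Matrix]) -> Matrix:
    """Assumed last-layer matrix: c plus the element-wise sum of all layer matrices."""
    out = [row[:] for row in c]
    for h in layers:
        for i, row in enumerate(h):
            for d, value in enumerate(row):
                out[i][d] += value
    return out


# One character, 2-dimensional features; "s" = shape graph, "p" = pronunciation graph.
outputs = {"s": [[1.0, 0.0]], "p": [[0.0, 1.0]]}
c_equal = attention_combine(outputs, w_a=[0.0, 0.0])
h_last = accumulate(c_equal, [[[1.0, 1.0]]])
c_shape = attention_combine(outputs, w_a=[3.0, 0.0])
```

In this sketch a larger $\beta$ flattens the weights across the two graphs, while a smaller one makes the fusion favor a single graph more strongly.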
Step S105: generating characters through the fully connected layer and the probability normalization function.
In some embodiments, the characters may be generated according to a probability normalization function, formula (4),
where $X$ represents the entire user request, such as "run the reverse"; $\hat{y}_i$ denotes the character at the $i$-th position; $p$ denotes probability, so that $p(\hat{y}_i = y \mid X)$ represents the probability that, for input $X$, the character at the $i$-th position is $y$; and $W$ represents the training weight parameter of the fully connected layer.
According to formula (4), the probability that each character position holds a given character can be obtained. If a character position has several candidate characters, such as "border" and "competition", the character with the maximum probability value is selected as the character of that position, where the candidate characters of each position can be obtained from the last-layer characterization matrix $H^{l+1}$. In some embodiments, after the text to be corrected is input into the end-to-end error detection and correction model, the text "encountered adversity" may be generated and output, and may be referred to as the initial corrected text.
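Selecting the most probable character at each position can be sketched as below. Formula (4) is not reproduced in this text, so the sketch assumes the probability is a softmax over the fully connected layer's output for each position vector; the names `softmax` and `decode`, the two-character vocabulary, and the weights are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Probability normalization over one position's scores."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(positions: List[List[float]], w: List[List[float]], vocab: List[str]) -> str:
    """For each position vector v_i, score every vocabulary character with the
    fully connected layer (W v_i), normalize, and keep the most probable one."""
    chars = []
    for v_i in positions:
        logits = [sum(w_row[d] * v_i[d] for d in range(len(v_i))) for w_row in w]
        probs = softmax(logits)
        chars.append(vocab[probs.index(max(probs))])
    return "".join(chars)


vocab = ["境", "竞"]              # hypothetical candidates for one position
w = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical fully connected weights
positions = [[2.0, 0.5]]          # hypothetical row of the last-layer matrix
result = decode(positions, w, vocab)
```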
Step S20: performing candidate recall on the text to be corrected and the initial corrected text to obtain recall texts.
In some embodiments, the initial error correction text may be subjected to an Elasticsearch query to obtain a first recall text.
Elasticsearch (ES) is a distributed full-text search server that can also be used as a NoSQL database to store documents and data in any format. Its full-text search engine is an open-source search engine built on Lucene (a full-text search framework), and can be used for full-text search and geographic information search.
The initial error correction text is used as the query of an Elasticsearch query. The query modes include match search, prefix search, suffix search, and fuzzy search, and several modes can be combined to obtain the first recall text. For example, a match search may be an exact search that requires the words to be identical: a search for "pig pecks" may return the result "pig pecks".
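A combined query of this kind might be expressed as an Elasticsearch query body like the one built below. This is only a sketch: the field name `name` is hypothetical and is assumed to be a `keyword` field, the mapping of the four query modes onto the `term`, `prefix`, `wildcard`, and `fuzzy` queries is an assumption, and the code only builds the request body without contacting a server.

```python
import json
from typing import Any, Dict


def build_recall_query(text: str, field: str = "name", size: int = 10) -> Dict[str, Any]:
    """Build a bool/should query combining exact, prefix, suffix, and fuzzy search."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {field: {"value": text}}},              # exact match
                    {"prefix": {field: {"value": text}}},            # prefix search
                    {"wildcard": {field: {"value": "*" + text}}},    # suffix search
                    {"fuzzy": {field: {"value": text, "fuzziness": "AUTO"}}},
                ],
                "minimum_should_match": 1,
            }
        },
        "size": size,
    }


body = build_recall_query("encountered adversity")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Leading-wildcard queries can be slow on large indices, so a production setup might index a reversed copy of the field and use a prefix query on it instead.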
In some embodiments, an inverted index data structure may be constructed based on the error correction lexicon, and the initial error correction text and the text to be corrected are each used as a query to obtain a second recall text.
An inverted index, also commonly called a postings file or inverted file, is an indexing method that stores, for full-text search, a mapping from a word to its locations in a document or set of documents. It is the most common data structure in document retrieval systems. With an inverted index, the list of documents containing a given word can be retrieved quickly from that word. An inverted index mainly consists of two parts: a "word dictionary" and an "inverted file". Here, a mapping from Chinese characters to words is established based on the error correction lexicon to construct the inverted index, and the error correction result and the original text are each used as a query to search for similar words in the lexicon, subject to rule conditions on the number of similar characters or the number of similar pinyin.
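The character-to-word inverted index and the recall rule can be sketched as follows. The sketch covers only the "number of similar characters" condition (it counts characters shared exactly and ignores pinyin), and the lexicon entries, the threshold `min_shared`, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(lexicon: List[str]) -> Dict[str, Set[str]]:
    """Map each character to the set of lexicon words that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in lexicon:
        for ch in set(word):
            index[ch].add(word)
    return index


def recall(index: Dict[str, Set[str]], query: str, min_shared: int = 2) -> List[str]:
    """Return lexicon words sharing at least min_shared distinct characters with query."""
    counts: Dict[str, int] = defaultdict(int)
    for ch in set(query):
        for word in index.get(ch, ()):
            counts[word] += 1
    return sorted(word for word, shared in counts.items() if shared >= min_shared)


lexicon = ["遇到逆境", "逆境成长", "小猪佩奇"]   # hypothetical error correction lexicon
index = build_inverted_index(lexicon)
from_original = recall(index, "遇到逆竞")        # hypothetical text to be corrected
from_corrected = recall(index, "遇到逆境")       # hypothetical initial corrected text
second_recall = sorted(set(from_original) | set(from_corrected))
```

Querying with both the original text and the corrected text, as the passage describes, lets the recall survive a wrong correction by the model.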
In some embodiments, the recall text may include a first recall text and a second recall text.
Step S30: performing candidate ranking on the recall texts, and obtaining the final error correction text corresponding to the text to be corrected according to the ranking result.
The Levenshtein edit distance (a string similarity measure) is a metric of the difference between two strings: the minimum number of single-character edits (e.g., substitutions, insertions, deletions) needed to change one string into the other. The larger the Levenshtein edit distance, the weaker the correlation between the two strings.
In some embodiments, the edit distance between each recall text and the initial corrected text may be calculated separately, and the edit distance is divided by the length of the longer of the recall text and the initial corrected text to obtain the difference degree of the recall text, where the longer text is the one with the larger number of characters.
Furthermore, a difference threshold can be set: the recall texts are sorted by difference degree, and recall texts whose difference degree is higher than the threshold are filtered out. The difference threshold may be set to a constant, for example 0.75.
The recall text with the minimum difference degree is taken as the final error correction text of the text to be corrected.
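The ranking step can be sketched as follows: compute the Levenshtein distance, normalize it by the longer length, drop candidates above the threshold, and keep the smallest. The function names are hypothetical, and returning `None` when every candidate is filtered out is an assumption, since the text does not say what happens in that case.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def difference(recalled: str, initial: str) -> float:
    """Edit distance divided by the length of the longer of the two texts."""
    longest = max(len(recalled), len(initial))
    return levenshtein(recalled, initial) / longest if longest else 0.0


def pick_final(recalls: List[str], initial: str, threshold: float = 0.75) -> Optional[str]:
    """Sort by difference degree, filter values above threshold, return the best."""
    scored = sorted((difference(r, initial), r) for r in recalls)
    kept = [(d, r) for d, r in scored if d <= threshold]
    return kept[0][1] if kept else None


best = pick_final(["abcx", "wxyz", "abcd"], "abcd")
```

Dividing by the longer length keeps the difference degree in the range 0 to 1, which is what makes a fixed threshold such as 0.75 comparable across texts of different lengths.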
Embodiments of the present application further provide a server, where the server may be configured to execute the text error correction method shown in fig. 7 to correct Chinese text.
In some embodiments, the server may be in communication connection with the display device, the display device may send the text to be corrected to the server, and the server obtains the final corrected text according to the text correction method shown in fig. 7, and then sends the final corrected text to the display device, so that the display device displays the final corrected text.
Referring to figs. 11-13, which are schematic diagrams of a voice interaction interface according to some embodiments: as shown in fig. 11, the wake-up word of a voice assistant application may be "haixin xiao ji", and after the voice assistant application is woken up, a recording prompt such as "… listening in" may be displayed to prompt the user to issue a voice command. As shown in fig. 12, after the user issues a voice command, the display device may perform voice conversion on the voice command and display the result in real time. The converted text may be the text to be corrected, for example a text containing "reverse competition". After displaying the text to be corrected, the display device may either perform text error correction in a background process according to the method shown in fig. 7 to obtain the final error correction text, or upload the text to be corrected to a server, which performs the text error correction and returns the final error correction text to the display device. As shown in fig. 13, after the final error correction text is obtained, the display device may refresh the text to be corrected into the final error correction text. Furthermore, the display device may also respond according to the final error correction text, for example by controlling the display device or playing corresponding audio and video media assets.
According to the above embodiments, a pronunciation-similar knowledge graph and a shape-similar knowledge graph are generated from the confusion sets corresponding to the text to be corrected, and knowledge about the pinyin and glyphs of Chinese characters is merged into the graph neural network. Deep semantic information among similar characters is thereby extracted, the pronunciation-similarity and glyph-similarity knowledge is used effectively, and the accuracy and recall of error detection and correction are improved.
Since the above embodiments are all described with reference to and in combination with other embodiments, different embodiments share common portions, and the same or similar portions of the various embodiments in this specification may be referred to one another; they are not described in detail again here.
It is noted that, in this specification, relational terms such as "first" and "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a circuit structure, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, the presence of an element identified by the phrase "comprising an … …" does not exclude the presence of other like elements in a circuit structure, article, or device comprising the element.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims. The above embodiments of the present application do not limit the scope of the present application.
Claims (10)
1. A display device, comprising:
a display;
a controller connected with the display, the controller configured to:
responding to a received voice command input by a user, and performing voice conversion on the voice command to obtain a text to be corrected;
controlling a display to display the text to be corrected;
correcting the text to be corrected based on a sound-shape similar confusion set and a graph attention mechanism to obtain an initial corrected text;
performing candidate recall on the text to be corrected and the initial corrected text, and obtaining a final corrected text according to a sorting result of the recalled texts;
and controlling a display to refresh the text to be corrected into a final corrected text.
2. The display device according to claim 1, wherein the error correction of the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism comprises:
extracting features of the text to be corrected to obtain an initial characterization matrix;
creating an adjacent matrix of each character in the text to be corrected according to the sound-shape similar confusion set;
inputting the initial characterization matrix and the adjacency matrix into a multilayer graph convolutional neural network to obtain a next layer characterization matrix;
obtaining a last-layer characterization matrix of the multilayer graph convolutional neural network according to a graph attention mechanism;
and generating characters through the full connection layer and the probability normalization function.
3. The display device according to claim 2, wherein the creating of the adjacency matrix of each character in the text to be corrected according to the phonological similar confusion set comprises:
acquiring pronunciation similar characters and shape similar characters of each character in the text to be corrected in the sound-shape similar confusion set;
taking the characters in the text to be corrected, the pronunciation similar characters and the characters in the word stock as nodes, taking the relation between the characters as edges, and establishing a pronunciation similar adjacency matrix;
and taking the characters in the text to be corrected, the similar-shape characters and the characters in the word stock as nodes, taking the relation between the characters as edges, and establishing a similar-shape adjacency matrix.
4. The display device of claim 2, wherein inputting the initial characterization matrix and adjacency matrix into a multi-layer graph convolutional neural network to obtain a next-layer characterization matrix comprises:
adding the adjacency matrix and the identity matrix to obtain an adjacency estimation matrix;
calculating a diagonal matrix corresponding to the adjacent estimation matrix to obtain a diagonal estimation matrix;
and obtaining a next-layer characterization matrix according to the adjacency estimation matrix, the diagonal estimation matrix and the initial characterization matrix.
5. The display device of claim 2, wherein the obtaining of a last-layer characterization matrix of the multilayer graph convolutional neural network according to a graph attention mechanism comprises:
calculating a knowledge-fused attention characterization matrix by adopting an attention mechanism;
and obtaining a final layer of characterization matrix according to the sum of the attention characterization matrix and each layer of characterization matrix.
6. The display device according to claim 1, wherein the candidate recalling of the text to be corrected and the initial corrected text comprises:
performing an Elasticsearch query on the initial error correction text to obtain a first recall text;
and constructing an inverted index data structure based on the error correction word bank, and using the text to be corrected and the initial error correction text each as a query to obtain a second recall text.
7. The display device according to claim 1, wherein the obtaining of the final error correction text according to the sorting result of the recalled texts comprises:
respectively calculating the editing distance between each recall text and the initial error correction text;
dividing the editing distance by the length of the longest text in the recall text and the initial error correction text to obtain the difference degree of the recall text;
and sequencing the recall texts according to the difference degree, and taking the recall text with the minimum difference degree as a final error correction text of the text to be corrected.
8. The display device of claim 7, wherein the controller is further configured to:
and filtering the recalled texts with the difference degree higher than the difference degree threshold value.
9. A text correction method for a display device, comprising:
correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism to obtain an initial corrected text;
performing candidate recall on the text to be corrected and the initial corrected text to obtain recall texts;
and sequencing the recalling texts, and obtaining a final error correction text corresponding to the text to be error corrected according to a sequencing result.
10. A server, wherein the server is configured to:
receiving a text to be corrected from a display device;
correcting the text to be corrected based on the sound-shape similar confusion set and the graph attention mechanism to obtain an initial corrected text;
performing candidate recall on the text to be corrected and the initial corrected text, and obtaining a final corrected text according to a sorting result of the recalled texts;
and sending the final error correction text to the display device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010879686.1A CN114118064A (en) | 2020-08-27 | 2020-08-27 | Display device, text error correction method and server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114118064A true CN114118064A (en) | 2022-03-01 |
Family
ID=80374665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010879686.1A Pending CN114118064A (en) | 2020-08-27 | 2020-08-27 | Display device, text error correction method and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114118064A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114676684A (en) * | 2022-03-17 | 2022-06-28 | 平安科技(深圳)有限公司 | Text error correction method and device, computer equipment and storage medium |
CN114896965A (en) * | 2022-05-17 | 2022-08-12 | 马上消费金融股份有限公司 | Text correction model training method and device and text correction method and device |
CN115017276A (en) * | 2022-03-28 | 2022-09-06 | 连芷萱 | Multi-turn conversation method and system for government affair consultation by combining fuzzy logic and R-GCN |
CN115293138A (en) * | 2022-08-03 | 2022-11-04 | 北京中科智加科技有限公司 | Text error correction method and computer equipment |
CN117874089A (en) * | 2023-12-05 | 2024-04-12 | 深圳市六度人和科技有限公司 | Automatic correction method, device, terminal and storage medium for search text |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114676684A (en) * | 2022-03-17 | 2022-06-28 | 平安科技(深圳)有限公司 | Text error correction method and device, computer equipment and storage medium |
CN114676684B (en) * | 2022-03-17 | 2024-02-02 | 平安科技(深圳)有限公司 | Text error correction method and device, computer equipment and storage medium |
CN115017276A (en) * | 2022-03-28 | 2022-09-06 | 连芷萱 | Multi-turn conversation method and system for government affair consultation by combining fuzzy logic and R-GCN |
CN115017276B (en) * | 2022-03-28 | 2022-11-29 | 连芷萱 | Multi-turn conversation method and system for government affair consultation, government affair robot and storage medium |
CN114896965A (en) * | 2022-05-17 | 2022-08-12 | 马上消费金融股份有限公司 | Text correction model training method and device and text correction method and device |
CN114896965B (en) * | 2022-05-17 | 2023-09-12 | 马上消费金融股份有限公司 | Text correction model training method and device, text correction method and device |
CN115293138A (en) * | 2022-08-03 | 2022-11-04 | 北京中科智加科技有限公司 | Text error correction method and computer equipment |
CN117874089A (en) * | 2023-12-05 | 2024-04-12 | 深圳市六度人和科技有限公司 | Automatic correction method, device, terminal and storage medium for search text |
CN117874089B (en) * | 2023-12-05 | 2024-08-09 | 深圳市六度人和科技有限公司 | Automatic correction method, device, terminal and storage medium for search text |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |