
CN117710766A - Service plate transportation method, device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN117710766A
CN117710766A (application CN202311684080.2A)
Authority
CN
China
Prior art keywords
dinner plate
initial
mask information
segmentation
tray
Prior art date
Legal status: Pending (the status is an assumption and is not a legal conclusion)
Application number
CN202311684080.2A
Other languages
Chinese (zh)
Inventor
黄龚
徐振博
孟阿瑾
Current Assignee
Hangzhou Shifang Technology Co ltd
Original Assignee
Hangzhou Shifang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Shifang Technology Co., Ltd.
Priority to CN202311684080.2A
Publication of CN117710766A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose methods, apparatus, electronic devices, and computer-readable media for transporting dinner plates. One embodiment of the method comprises the following steps: acquiring a dinner plate image; inputting the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set; fusing the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set; generating a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set; inputting each dinner plate foreground image in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate a dinner plate attribute information group, obtaining a set of dinner plate attribute information groups; and in response to receiving an attribute request sent by a user terminal, controlling an associated transport device to transport the dinner plate corresponding to the dinner plate attribute information group matching the attribute request to the user terminal. This embodiment may reduce waste of transport resources.

Description

Service plate transportation method, device, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method, apparatus, electronic device, and computer readable medium for transporting dinner plates.
Background
By identifying the dinner plate attribute information sets in a dinner plate image, dinner plates meeting the user requirements can be transported to the user terminal. At present, dinner plates are typically transported as follows: first, individual dinner plates are identified by a target detection model; then, the dinner plate attribute information sets of those dinner plates are identified by a dinner plate attribute information model trained with a full-batch gradient descent algorithm; finally, the dinner plates whose attribute information sets meet the user requirements are transported to the user terminal by transport equipment.
However, the following technical problems generally exist in the above manner:
firstly, the dinner plates identified by the target detection model still contain the dishes placed on them; because the influence of the dishes on the dinner plate attribute information is not considered, the accuracy of the dinner plate attribute information set obtained from dish-containing dinner plates is low, so the dinner plates transported to the user terminal by the transport equipment do not meet the user requirements, and the transport resources of the transport equipment are wasted;
secondly, when the parameters of the dinner plate attribute information model are adjusted by a full-batch gradient descent algorithm, the entire training sample set is used for each training step, which consumes a large amount of computing resources and therefore wastes them;
thirdly, when the target detection model is trained, the sample labels included in the training samples must be annotated manually, so the accuracy of the sample labels is low; a target detection model trained on such low-accuracy samples in turn yields a low-accuracy dinner plate attribute information set, so the dinner plates transported to the user terminal by the transport equipment do not meet the user requirements, and the transport resources of the transport equipment are wasted.
The information disclosed in this background section is only intended to enhance understanding of the background of the inventive concept and may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art in this country.
Disclosure of Invention
This part of the disclosure is intended to introduce concepts in a simplified form that are further described in the detailed description below. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose methods, apparatus, electronic devices, and computer-readable media for transporting dinner plates to address one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method of transporting a dinner plate, the method comprising: acquiring a dinner plate image; inputting the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set; fusing the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set; generating a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set; inputting each dinner plate foreground image in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate a dinner plate attribute information group, obtaining a set of dinner plate attribute information groups; and in response to receiving an attribute request sent by a user terminal, controlling an associated transport device to transport the dinner plate corresponding to the dinner plate attribute information group matching the attribute request to the user terminal.
In a second aspect, some embodiments of the present disclosure provide a dinner plate transport device, the device comprising: an acquisition unit configured to acquire a dinner plate image; a first input unit configured to input the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set; a fusion unit configured to fuse the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set; a generation unit configured to generate a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set; a second input unit configured to input each dinner plate foreground image in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate a dinner plate attribute information group, obtaining a set of dinner plate attribute information groups; and a control unit configured to, in response to receiving an attribute request sent by a user terminal, control an associated transport device to transport the dinner plate corresponding to the dinner plate attribute information group matching the attribute request to the user terminal.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following advantageous effects: the dinner plate transportation method of some embodiments of the present disclosure can reduce the waste of transport resources. Specifically, transport resources are wasted because the dinner plates identified by a target detection model still contain dishes, and since the influence of the dishes on the dinner plate attribute information is not considered, the resulting dinner plate attribute information set has low accuracy, so the dinner plates transported to the user terminal do not meet the user requirements. Based on this, the dinner plate transportation method of some embodiments of the present disclosure first acquires a dinner plate image. Second, the dinner plate image is input into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set. Thus, first segmentation mask information characterizing the dinner plates and second segmentation mask information characterizing the dishes can be obtained. Then, the first segmentation mask information set and the second segmentation mask information set are fused to generate a dinner plate foreground mask set. Thus, individual dinner plate foreground masks characterizing the dinner plates with the dishes removed can be obtained. A dinner plate foreground image set is then generated based on the dinner plate image and the dinner plate foreground mask set, yielding an image corresponding to each dinner plate foreground mask. Next, each dinner plate foreground image in the dinner plate foreground image set is input into a pre-trained dinner plate attribute information generation model to generate a dinner plate attribute information group, obtaining a set of dinner plate attribute information groups. Through these groups, dinner plates meeting the user requirements can be found. Finally, in response to receiving an attribute request sent by a user terminal, an associated transport device is controlled to transport the dinner plate corresponding to the dinner plate attribute information group matching the attribute request to the user terminal. In this way, a dinner plate meeting the user's needs is transported to the user terminal, and the waste of transport resources can be reduced.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of some embodiments of a tray transportation method according to the present disclosure;
fig. 2 is a schematic structural view of some embodiments of a tray transport device according to the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "one" and "a plurality" mentioned in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, a flow 100 of some embodiments of a tray transportation method according to the present disclosure is shown. The dinner plate conveying method comprises the following steps:
And step 101, obtaining a dinner plate image.
In some embodiments, the execution body of the dinner plate transportation method (e.g., a computing device) may acquire the dinner plate image from a terminal device through a wired or wireless connection. The dinner plate image may be an image containing dinner plates and dishes, where the dishes are placed on the dinner plates.
And 102, inputting the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set.
In some embodiments, the execution body may input the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set. The pre-trained segmentation mask information generation model may be a neural network model that takes a dinner plate image as input and outputs a first segmentation mask information set and a second segmentation mask information set. Each piece of first segmentation mask information in the first segmentation mask information set may characterize a binary mask map of a dinner plate. Each piece of second segmentation mask information in the second segmentation mask information set may characterize a binary mask map of a dish.
Alternatively, the pre-trained segmentation mask information generation model may be trained by:
first, a training sample set is obtained.
In some embodiments, the execution body may obtain the training sample set from a terminal device through a wired or wireless connection. Each training sample in the training sample set includes a sample dinner plate image.
And secondly, labeling each sample dinner plate image included in the training sample set to generate sample segmentation mask information, and obtaining a sample segmentation mask information set.
In some embodiments, the executing body may perform labeling processing on each sample dinner plate image included in the training sample set to generate sample segmentation mask information, so as to obtain a sample segmentation mask information set. Wherein, the sample segmentation mask information in the sample segmentation mask information set may include, but is not limited to, at least one of the following: a sample first segmentation mask information set and a sample second segmentation mask information set.
Third, an initial segmentation mask information generation model is determined.
In some embodiments, the execution body may determine an initial segmentation mask information generation model. Wherein, the initial segmentation mask information generation model may include, but is not limited to, at least one of the following: an initial first segmentation mask information generation model and an initial second segmentation mask information generation model.
The initial first segmentation mask information generation model may be a first custom model that takes a sample dinner plate image as input and outputs an initial first segmentation mask information set, in which each piece of first segmentation mask information characterizes a dinner plate as generated by this model. The first custom model may include three layers:
a first layer, an input layer, for transferring a sample tray image to a second layer.
A second layer, a processing layer, comprising an initial first dinner plate segmentation model and an initial second dinner plate segmentation model. The initial first dinner plate segmentation model may be an instance segmentation model that takes the sample dinner plate image as input and outputs a first dinner plate mask information set; the initial second dinner plate segmentation model may be an instance segmentation model that takes the sample dinner plate image as input and outputs a second dinner plate mask information set. Each piece of first dinner plate mask information may represent a mask map of a dinner plate output by the initial first dinner plate segmentation model, and each piece of second dinner plate mask information a mask map of a dinner plate output by the initial second dinner plate segmentation model. For example, the initial first dinner plate segmentation model may be a Mask R-CNN (instance segmentation network) model, and the initial second dinner plate segmentation model may be a Mask2Former model.
And a third layer, an output layer, that selects the output of either the initial first dinner plate segmentation model or the initial second dinner plate segmentation model as the output of the whole first custom model. For example, when the number of pieces of first dinner plate mask information in the first dinner plate mask information set is greater than or equal to the number of pieces of second dinner plate mask information in the second dinner plate mask information set, the first dinner plate mask information set is used as the initial first segmentation mask information set; otherwise, the second dinner plate mask information set is used as the initial first segmentation mask information set.
The initial second segmentation mask information generation model may be a second custom model that takes the sample dinner plate image as input and outputs an initial second segmentation mask information set, in which each piece of second segmentation mask information characterizes a dish as generated by this model. The second custom model may include three layers:
a first layer, an input layer, for transferring a sample tray image to a second layer.
A second layer, a processing layer, comprising an initial first dish segmentation model and an initial second dish segmentation model. The initial first dish segmentation model may be a semantic segmentation model that takes the sample dinner plate image as input and outputs a first dish mask information set; the initial second dish segmentation model may be a semantic segmentation model that takes the sample dinner plate image as input and outputs a second dish mask information set. Each piece of first dish mask information may characterize a mask map of a dish output by the initial first dish segmentation model, and each piece of second dish mask information a mask map of a dish output by the initial second dish segmentation model. For example, the initial first dish segmentation model may be a DeepLabV3 (semantic segmentation algorithm) model, and the initial second dish segmentation model may be an FCN (Fully Convolutional Network) model.
And a third layer, an output layer, that selects the output of either the initial first dish segmentation model or the initial second dish segmentation model as the output of the whole second custom model. For example, when the number of pieces of first dish mask information in the first dish mask information set is greater than or equal to the number of pieces of second dish mask information in the second dish mask information set, the first dish mask information set is determined as the initial second segmentation mask information set; otherwise, the second dish mask information set is determined as the initial second segmentation mask information set. A sketch of this selection rule, shared by both custom models, follows.
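As a minimal sketch of this count-based selection (an illustrative reading, assuming each candidate sub-model returns its masks as a list of binary NumPy arrays; the function name is hypothetical):

```python
import numpy as np

def select_mask_set(masks_a: list[np.ndarray], masks_b: list[np.ndarray]) -> list[np.ndarray]:
    """Output-layer rule shared by both custom models: keep whichever
    sub-model produced at least as many masks, preferring the first on ties."""
    return masks_a if len(masks_a) >= len(masks_b) else masks_b

# Usage for the first custom model (names assumed):
# initial_first_set = select_mask_set(mask_rcnn_masks, mask2former_masks)
```

Counting masks acts here as a simple proxy for recall: with several plates (or dishes) per image, the sub-model that finds more instances is preferred.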
And step four, selecting training samples from the training sample set.
In some embodiments, the executing entity may select a training sample from the training sample set. In practice, the executing entity may randomly select training samples from the training sample set.
And fifthly, inputting a sample dinner plate image included in the training sample into the initial first segmentation mask information generation model to obtain an initial first segmentation mask information set.
In some embodiments, the executing body may input a sample dinner plate image included in the training sample into the initial first segmentation mask information generation model to obtain an initial first segmentation mask information set.
And sixthly, inputting a sample dinner plate image included in the training sample into the initial second segmentation mask information generation model to obtain an initial second segmentation mask information set.
In some embodiments, the executing body may input a sample dinner plate image included in the training sample into the initial second segmentation mask information generation model to obtain an initial second segmentation mask information set.
Seventh, determining a first segmentation difference value between the initial first segmentation mask information set and a first segmentation mask information set included in the sample segmentation mask information corresponding to the training sample based on a preset first segmentation loss function.
In some embodiments, the execution body may determine a first segmentation difference value between the initial first segmentation mask information set and the first segmentation mask information set included in the sample segmentation mask information corresponding to the training sample, based on a preset first segmentation loss function. The preset first segmentation loss function may be, but is not limited to: a mean squared error loss function (MSE), a cross-entropy loss function, a 0-1 loss function, an absolute loss function, a log loss function, a squared loss function, an exponential loss function, and the like.
And eighth step, determining a second segmentation difference value between the initial second segmentation mask information set and a second segmentation mask information set included in the sample segmentation mask information corresponding to the training sample based on a preset second segmentation loss function.
In some embodiments, the execution body may determine a second segmentation difference value between the initial second segmentation mask information set and the second segmentation mask information set included in the sample segmentation mask information corresponding to the training sample, based on a preset second segmentation loss function. The preset second segmentation loss function may be, but is not limited to: a mean squared error loss function (MSE), a cross-entropy loss function, a 0-1 loss function, an absolute loss function, a log loss function, a squared loss function, an exponential loss function, and the like.
And a ninth step of adjusting network parameters of the initial segmentation mask information generation model in response to determining that the first segmentation difference value and the second segmentation difference value satisfy a preset segmentation difference condition.
In some embodiments, the execution body may adjust the network parameters of the initial segmentation mask information generation model in response to determining that the first segmentation difference value and the second segmentation difference value satisfy a preset segmentation difference condition. The preset segmentation difference condition may be that both the first segmentation difference value and the second segmentation difference value are greater than a preset segmentation difference value. For example, the first and second segmentation difference values may be differentiated; on that basis, the error value is propagated from the last layer of the model backward through the network using back propagation, stochastic gradient descent, and similar methods, so as to adjust the parameters of each layer. Of course, a network freezing method may be used as needed, keeping the network parameters of some layers unchanged; this is not limited here. The setting of the preset segmentation difference value is likewise not limited; for example, it may be 0.1.
In practice, the executing body may perform labeling processing on each sample dinner plate image included in the training sample set by the following steps to generate sample segmentation mask information:
the first step, in response to receiving first click information sent by the user terminal, performs the following labeling substeps:
a first sub-step of combining the first click information with the sample tray image to generate initial tray click information. The first click information may characterize a case that the user terminal clicks on the sample dinner plate image. The user terminal may be a terminal that clicks on the dinner plate image to annotate the dinner plate image to obtain the segmentation mask map. For example, the first click information may be a two-dimensional sparse matrix.
And a second sub-step of inputting the initial dinner plate click information into the attention model to obtain initial dinner plate attention information. The attention model may be a pre-trained model that takes initial dinner plate click information as input and outputs initial dinner plate attention information. For example, the attention model may be a Transformer model.
And a third sub-step of inputting the initial dinner plate attention information into the segmentation mask prediction model to obtain an initial segmentation mask map. The segmentation mask prediction model may be a pre-trained neural network model that takes initial dinner plate attention information as input and outputs an initial segmentation mask map. For example, the segmentation mask prediction model may be a segmentation head model, which may include a convolution layer, a normalization layer, and an activation layer.
And a fourth sub-step of determining the initial segmentation mask map as sample segmentation mask information in response to receiving the dinner plate completion information transmitted by the user terminal. Wherein the dinner plate completion information may characterize the user terminal to confirm that the initial segmentation mask map has been annotated as complete.
Alternatively, in response to receiving the second click information sent by the user terminal, the executing body may determine the second click information as the first click information, and may determine the initial segmentation mask map as the sample dinner plate image, so as to execute the labeling step again. Wherein the second click information may characterize a case where the user terminal clicks on the initial segmentation mask map. For example, the second click information may be a two-dimensional sparse matrix.
In this way, the user terminal only needs to click on the sample dinner plate image a small number of times, and accurate sample segmentation mask information can be obtained through the attention model and the segmentation mask prediction model, as sketched below.
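A sketch of this labeling loop, assuming hypothetical `attention_model`, `mask_head`, `get_clicks`, and `is_done` callables that stand in for the attention model, the segmentation mask prediction model, and the user terminal's click and completion messages; the way clicks are combined with the image is also an assumption:

```python
import numpy as np

def annotate(sample_image: np.ndarray, attention_model, mask_head, get_clicks, is_done) -> np.ndarray:
    """Interactive labeling: fold user clicks into the current image, predict a
    segmentation mask, and repeat with corrective clicks until the user confirms."""
    image = sample_image
    while True:
        clicks = get_clicks(image)                # sparse HxW click matrix (first/second click information)
        gray = image if image.ndim == 2 else image.mean(axis=-1)
        click_info = np.stack([gray, clicks])     # combine clicks with the image (assumed scheme)
        attention = attention_model(click_info)   # Transformer-style attention features
        mask = mask_head(attention)               # convolution + normalization + activation head
        if is_done():                             # dinner plate completion information received
            return mask                           # accepted as sample segmentation mask information
        image = mask                              # the mask map becomes the image to re-label
```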
The optional technical content in step 102 is an inventive point of the embodiments of the present disclosure and solves the third technical problem mentioned in the background section, namely the waste of the transport resources of transport equipment. The factors that cause this waste are often as follows: when the target detection model is trained, the sample labels included in the training samples must be annotated manually, so the accuracy of the sample labels is low; a target detection model trained on such low-accuracy samples yields a low-accuracy dinner plate attribute information set, and therefore the dinner plates transported to the user terminal do not meet the user requirements. If these factors are addressed, the waste of transport resources can be reduced. To achieve this, in response to receiving first click information sent by the user terminal, the following labeling steps are performed. First, the first click information is combined with the sample dinner plate image to generate initial dinner plate click information, through which the sample dinner plate image can later be labeled by the model. Second, the initial dinner plate click information is input into the attention model, yielding more accurate initial dinner plate attention information. Third, the initial dinner plate attention information is input into the segmentation mask prediction model, yielding a more accurate initial segmentation mask map. Fourth, in response to receiving the dinner plate completion information sent by the user terminal, the initial segmentation mask map is determined as sample segmentation mask information that meets the user's requirements. Accordingly, more accurate sample segmentation mask information, serving as the sample labels of the segmentation mask information generation model, is obtained through the attention model and the segmentation mask prediction model. A more accurate segmentation mask information generation model can therefore be trained, a more accurate dinner plate attribute information set can be obtained through it, and dinner plates meeting the user requirements can be transported to the user terminal by the transport equipment. Thus, the waste of transport resources can be reduced.
Optionally, in response to determining that the first segmentation difference value and the second segmentation difference value do not meet the preset segmentation difference condition, determining the initial segmentation mask information generation model as a trained segmentation mask information generation model.
In some embodiments, the execution body may determine the initial segmentation mask information generation model as the trained segmentation mask information generation model in response to determining that the first segmentation difference value and the second segmentation difference value do not satisfy the preset segmentation difference condition.
Step 103, fusion processing is performed on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set.
In some embodiments, the execution body may perform fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set.
In practice, the executing body may perform fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set through the following steps:
and a first step of removing, for each first division mask information in the first division mask information set, the second division mask information from the first division mask information in response to determining that the first division mask information and the second division mask information satisfy a preset corresponding condition for each second division mask information in the second division mask information set, thereby obtaining a dinner plate foreground mask. The preset corresponding condition may be that the position of the dinner plate corresponding to the first division mask information is the same as the position of the dish corresponding to the second division mask information.
Second, the dinner plate foreground masks so obtained are determined as the dinner plate foreground mask set. A sketch of this fusion follows.
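A sketch of this fusion, assuming the masks are boolean NumPy arrays over the same image grid and that the preset correspondence condition is tested by mask overlap (the patent only says the plate and dish positions coincide, so overlap is an assumption):

```python
import numpy as np

def fuse_masks(plate_masks: list[np.ndarray], dish_masks: list[np.ndarray]) -> list[np.ndarray]:
    """For each dinner plate mask, remove every dish mask that corresponds to
    it, yielding the dinner plate foreground mask set."""
    foreground_masks = []
    for plate in plate_masks:
        foreground = plate.copy()
        for dish in dish_masks:
            if (plate & dish).any():   # preset correspondence condition (assumed: overlap)
                foreground &= ~dish    # remove the dish region from the plate mask
        foreground_masks.append(foreground)
    return foreground_masks
```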
Step 104, generating a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set.
In some embodiments, the execution body may generate the dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set.
In practice, the execution body may generate the dinner plate foreground image set from the dinner plate image and the dinner plate foreground mask set through the following steps:
first, for each tray foreground mask in the tray foreground mask set, a tray foreground image is selected from the tray images based on the tray foreground mask. In practice, the executing body may select a dinner plate foreground image from the dinner plate images based on the dinner plate foreground mask by: first, an image corresponding to the dinner plate foreground mask is selected from the dinner plate images as an initial dinner plate foreground image. Then, taking an image with a preset width at the outermost periphery of the initial dinner plate foreground image as a dinner plate foreground image. The image corresponding to the dinner plate foreground mask may be the same image as the dinner plate foreground mask area in the dinner plate image. For example, the predetermined width may be, but is not limited to, 1 centimeter.
In this way, the initial dinner plate foreground image corresponding to the dinner plate foreground mask is further cropped so that only its outermost band remains. This reduces the influence of dishes on the inner edge of the dinner plate foreground image, so that a more accurate dinner plate attribute information set can be obtained later; a sketch follows the next step.
Second, the selected dinner plate foreground images are determined as the dinner plate foreground image set.
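A sketch of the rim selection, assuming OpenCV is available, a color (H, W, 3) plate image, and a preset width given in pixels (the 1 cm example would be converted through the camera's spatial resolution; `width_px` and the function name are illustrative):

```python
import cv2
import numpy as np

def plate_rim_image(plate_image: np.ndarray, foreground_mask: np.ndarray,
                    width_px: int = 20) -> np.ndarray:
    """Cut out the plate foreground, then keep only an outer band of preset
    width so dishes touching the inner edge cannot influence it."""
    mask = foreground_mask.astype(np.uint8)
    kernel = np.ones((2 * width_px + 1, 2 * width_px + 1), np.uint8)
    inner = cv2.erode(mask, kernel)      # shrink the mask inward by width_px
    rim = cv2.subtract(mask, inner)      # only the outermost band remains
    return plate_image * rim[..., None]  # zero out everything but the rim
```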
Step 105, inputting the dinner plate foreground images in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate a dinner plate attribute information set, and obtaining a dinner plate attribute information set.
In some embodiments, the execution body may input each dinner plate foreground image in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate a dinner plate attribute information group, obtaining a set of dinner plate attribute information groups. The pre-trained dinner plate attribute information generation model may be a neural network model that takes a dinner plate foreground image as input and outputs a dinner plate attribute information group. Here, each dinner plate attribute information group in the set may correspond to one dinner plate. The dinner plate attribute information in a group may be, but is not limited to, at least one of: first dinner plate attribute information, characterizing the dinner plate color, and second dinner plate attribute information, characterizing the dinner plate shape.
Alternatively, the pre-trained tray attribute information generation model may be trained by:
first, a training sample set is obtained.
In some embodiments, the executing entity may obtain the training sample set from the terminal device through a wired connection or a wireless connection. Wherein, the training samples in the training sample set include: a sample tray foreground image and a sample tray attribute information set. The sample tray attribute information in the sample tray attribute information set may be, but is not limited to, at least one of: first sample tray attribute information and second sample tray attribute information.
And secondly, determining an initial dinner plate attribute information generation model.
In some embodiments, the executive may determine an initial tray attribute information generation model. Wherein, the initial dinner plate attribute information generation model comprises: an initial first dinner plate attribute information generation model, an initial second dinner plate attribute information generation model and an initial splicing model.
The initial first dinner plate attribute information generation model may be a neural network model using a sample dinner plate foreground image as input and initial first dinner plate attribute information as output. The initial first tray attribute information may be information representing a tray color generated by the initial first tray attribute information generation model. For example, the initial first panel attribute information generation model may be a ResNet18 (residual neural network) model.
The initial second tray attribute information generation model may be a neural network model using a sample tray foreground image as input and initial second tray attribute information as output. The initial second tray attribute information may be information representing a tray shape generated by the initial second tray attribute information generation model. For example, the initial second tray attribute information generation model may be a MobileNetV2 (computer vision network) model.
The initial stitching model described above may be used to receive the initial first dinner plate attribute information and the initial second dinner plate attribute information and add them to an initial dinner plate attribute information group, which is initially empty. A sketch of the whole initial model follows.
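A sketch of the initial dinner plate attribute information generation model using the torchvision backbones named in the examples above; treating color and shape as classification targets, and their class counts, are assumptions:

```python
import torch
from torchvision import models

class TrayAttributeModel(torch.nn.Module):
    """Two branches plus a stitching step: ResNet18 predicts plate color,
    MobileNetV2 predicts plate shape; their outputs form one attribute group."""
    def __init__(self, num_colors: int = 8, num_shapes: int = 4):
        super().__init__()
        self.color_branch = models.resnet18(num_classes=num_colors)      # initial first attribute model
        self.shape_branch = models.mobilenet_v2(num_classes=num_shapes)  # initial second attribute model

    def forward(self, foreground_image: torch.Tensor) -> list[torch.Tensor]:
        color_logits = self.color_branch(foreground_image)  # initial first dinner plate attribute information
        shape_logits = self.shape_branch(foreground_image)  # initial second dinner plate attribute information
        return [color_logits, shape_logits]                 # stitching: append both to an initially empty group
```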
And thirdly, selecting training samples from the training sample set.
In some embodiments, the executing entity may select a training sample from the training sample set. In practice, the executing entity may randomly select training samples from the training sample set.
And step four, inputting a sample dinner plate foreground image included in the training sample into the initial first dinner plate attribute information generation model to obtain initial first dinner plate attribute information.
In some embodiments, the execution subject may input a sample tray foreground image included in the training sample into the initial first tray attribute information generation model to obtain initial first tray attribute information.
And fifthly, inputting a sample dinner plate foreground image included in the training sample into the initial second dinner plate attribute information generation model to obtain initial second dinner plate attribute information.
In some embodiments, the executing body may input a sample dinner plate foreground image included in the training sample into the initial second dinner plate attribute information generating model to obtain initial second dinner plate attribute information.
And sixthly, inputting the initial first dinner plate attribute information and the initial second dinner plate attribute information into the initial splicing model to obtain an initial dinner plate attribute information set.
In some embodiments, the executing entity may input the initial first tray attribute information and the initial second tray attribute information into the initial stitching model to obtain an initial tray attribute information set.
And seventhly, determining an attribute difference value between the initial dinner plate attribute information set and a sample dinner plate attribute information set included in the training sample based on a preset attribute loss function.
In some embodiments, the executing entity may determine an attribute difference value between the initial tray attribute information set and a sample tray attribute information set included in the training sample based on a predetermined attribute loss function. The preset attribute loss function may be, but is not limited to: mean square error loss function (MSE), cross entropy loss function (cross entropy), 0-1 loss function, absolute loss function, log loss function, square loss function, exponential loss function, and the like. In practice, first, the executing entity may determine, based on the attribute loss function, a first difference value between initial first tray attribute information included in the initial tray attribute information set and first sample tray attribute information included in the sample tray attribute information set included in the training sample. Then, the executing body may determine, based on the attribute loss function, a second difference value between the initial second tray attribute information included in the initial tray attribute information set and the second sample tray attribute information included in the sample tray attribute information set included in the training sample. Finally, the execution body may determine a sum of the first variance value and the second variance value as an attribute variance value.
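A minimal sketch of this computation, assuming cross-entropy is the loss chosen from the list above and the sample attribute labels are class indices (both assumptions):

```python
import torch
import torch.nn.functional as F

def attribute_difference(color_logits: torch.Tensor, shape_logits: torch.Tensor,
                         color_label: torch.Tensor, shape_label: torch.Tensor) -> torch.Tensor:
    """Attribute difference value: sum of the per-attribute differences
    between the initial attribute group and the sample attribute group."""
    first = F.cross_entropy(color_logits, color_label)   # first difference value
    second = F.cross_entropy(shape_logits, shape_label)  # second difference value
    return first + second                                # attribute difference value
```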
And eighth, in response to determining that the attribute difference value is greater than a preset attribute difference value, adjusting network parameters of the initial tray attribute information generation model.
In some embodiments, the execution body may adjust the network parameters of the initial dinner plate attribute information generation model in response to determining that the attribute difference value is greater than a preset attribute difference value. For example, the attribute difference value may be differentiated; on that basis, the parameters of the initial dinner plate attribute information generation model are adjusted using back propagation, gradient descent, and similar methods. It should be noted that the back propagation algorithm and the gradient descent method are well-known techniques that are widely studied and applied at present, and are not described here. The setting of the preset attribute difference value is not limited; for example, it may be 0.1.
In practice, in response to determining that the attribute difference value is greater than a preset attribute difference value, the execution body may perform the following adjustment sub-steps for each network parameter included in the initial tray attribute information generation model:
a first sub-step of generating an attribute gradient based on the attribute loss function. In practice, the execution subject described above may generate an attribute gradient by the following formula:
Wherein g i Representing the attribute gradient. i represents a time step. θ i-1 And the i-1 th adjusted network parameters corresponding to the network parameters are shown. f (f) ii-1 ) Representing the attribute loss function described above.Represents f ii-1 ) For theta i-1 And (5) deriving.
And a second sub-step of generating first attribute information based on the attribute gradient and a preset first attenuation coefficient. In practice, the execution body may generate the first attribute information by the following formula:

$$x_i = \beta_1 \times x_{i-1} + (1-\beta_1) \times g_i$$

where $x_i$ represents the first attribute information, $\beta_1$ represents the preset first attenuation coefficient, and $x_{i-1}$ represents the first attribute information from the $(i-1)$-th adjustment of the network parameter. For example, $\beta_1$ may be 0.8.
And a third sub-step of generating second attribute information based on the attribute gradient and a preset second attenuation coefficient. In practice, the execution body may generate the second attribute information by the following formula:

$$y_i = \beta_2 \times y_{i-1} + (1-\beta_2) \times g_i^2$$

where $y_i$ represents the second attribute information, $\beta_2$ represents the preset second attenuation coefficient, $y_{i-1}$ represents the second attribute information from the $(i-1)$-th adjustment of the network parameter, and $g_i^2$ is the square of $g_i$. For example, $\beta_2$ may be 0.999.
And a fourth sub-step of adjusting the network parameter based on the first attribute information and the second attribute information to generate a target network parameter. In practice, the execution body may adjust the network parameter by the following formulas:

$$\hat{x}_i = \frac{x_i}{1-\beta_1^i}, \qquad \hat{y}_i = \frac{y_i}{1-\beta_2^i}$$

$$\theta_i = \theta_{i-1} - \alpha \times \left( \frac{\hat{x}_i}{\sqrt{\hat{y}_i} + \epsilon} + \lambda \times \theta_{i-1} \right)$$

where $\hat{x}_i$ represents the bias correction of $x_i$, $\hat{y}_i$ represents the bias correction of $y_i$, $\beta_1^i$ and $\beta_2^i$ are the $i$-th powers of $\beta_1$ and $\beta_2$, $\theta_i$ represents the target network parameter, $\alpha$ represents a preset first parameter, $\epsilon$ represents a preset second parameter, and $\lambda$ represents a preset third parameter. For example, $\alpha$ may be 0.001, $\epsilon$ may be $10^{-8}$, and $\lambda$ may be 0.001.
And a fifth sub-step of re-executing the adjustment step by using the target network parameter as the network parameter included in the initial dinner plate attribute information generation model.
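Taken together, the four sub-steps amount to an Adam-style update with decoupled weight decay. A sketch for a single NumPy parameter, with the hyperparameter values copied from the examples in the text (the `state` dictionary carrying x and y across adjustments is an implementation choice):

```python
import numpy as np

def adjust_parameter(theta: float, grad: float, state: dict, i: int,
                     beta1: float = 0.8, beta2: float = 0.999,
                     alpha: float = 0.001, eps: float = 1e-8, lam: float = 0.001) -> float:
    """One adjustment (time step i >= 1) of a network parameter from its attribute gradient."""
    state["x"] = beta1 * state["x"] + (1 - beta1) * grad       # first attribute information
    state["y"] = beta2 * state["y"] + (1 - beta2) * grad ** 2  # second attribute information
    x_hat = state["x"] / (1 - beta1 ** i)                      # bias correction of x
    y_hat = state["y"] / (1 - beta2 ** i)                      # bias correction of y
    return theta - alpha * (x_hat / (np.sqrt(y_hat) + eps) + lam * theta)  # target parameter

# Usage: state = {"x": 0.0, "y": 0.0}; theta = adjust_parameter(theta, g, state, i=1)
```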
Optionally, in response to determining that the attribute difference value is less than or equal to the preset attribute difference value, determining the initial tray attribute information generation model as a trained tray attribute information generation model.
In some embodiments, the executing entity may determine the initial tray attribute information generation model as a trained tray attribute information generation model in response to determining that the attribute difference value is equal to or less than a preset attribute difference value.
The optional technical content in step 105 is an inventive point of the embodiments of the present disclosure and solves the second technical problem mentioned in the background section, namely the waste of computing resources. The factor that causes this waste is often the following: when the parameters of the dinner plate attribute information model are adjusted by a full-batch gradient descent algorithm, the entire training sample set is used for each training step, consuming a large amount of computing resources. If this factor is addressed, the waste of computing resources can be reduced. To achieve this, first, an attribute gradient is generated based on the attribute loss function, for subsequent generation of the first attribute information and the second attribute information. Second, first attribute information is generated based on the attribute gradient and a preset first attenuation coefficient. Then, second attribute information is generated based on the attribute gradient and a preset second attenuation coefficient; the first and second attribute information are then used to adjust the network parameters. Next, the network parameter is adjusted based on the first attribute information and the second attribute information to generate a target network parameter. The target network parameter is then taken as the network parameter of the initial dinner plate attribute information generation model, and the adjustment step is executed again, so that the network parameters are adjusted continuously. In this way, the network parameters of the dinner plate attribute information model are adjusted by the above steps instead of the full-batch gradient descent algorithm, and each training step can use a single training sample instead of the entire training sample set. Thus, the waste of computing resources can be reduced.
And step 106, in response to receiving an attribute request sent by a user terminal, controlling an associated transport device to transport the dinner plate corresponding to the dinner plate attribute information group matching the attribute request to the user terminal.
In some embodiments, in response to receiving an attribute request sent by a user terminal, the execution body may control an associated transport device to transport the dinner plate corresponding to the dinner plate attribute information group matching the attribute request to the user terminal. Here, the attribute request may characterize the first dinner plate attribute information and the second dinner plate attribute information of the dinner plate the user terminal wants. For example, the attribute request may be: the user terminal wants a dinner plate that is red in color and square in shape. The associated transport device may be a device for transporting the dinner plate to the user terminal, for example, but not limited to: a mechanical arm, a conveyor belt, or a lifting bracket. A sketch of this dispatch follows.
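A sketch of this dispatch step, assuming each dinner plate attribute information group has already been decoded into a (color, shape) pair and that `transport_device.deliver` stands in for the control interface of the associated transport equipment (both assumptions):

```python
def handle_attribute_request(request: dict, tray_attribute_groups: dict,
                             transport_device, terminal_id: str):
    """Find a tray whose attribute group matches the requested color and shape,
    then have the associated transport device deliver it to the user terminal."""
    for tray_id, (color, shape) in tray_attribute_groups.items():
        if color == request["color"] and shape == request["shape"]:
            transport_device.deliver(tray_id, terminal_id)  # e.g. arm, conveyor, or lift
            return tray_id
    return None  # no tray satisfies the request; nothing is transported
```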
With further reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides embodiments of a dinner plate transportation apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 1, and the apparatus may be applied to various electronic devices.
As shown in fig. 2, the dinner plate transportation apparatus 200 of some embodiments includes: an acquisition unit 201, a first input unit 202, a fusion unit 203, a generation unit 204, a second input unit 205, and a control unit 206. The acquisition unit 201 is configured to acquire a dinner plate image; the first input unit 202 is configured to input the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set; the fusion unit 203 is configured to perform fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set; the generation unit 204 is configured to generate a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set; the second input unit 205 is configured to input the dinner plate foreground images in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate dinner plate attribute information groups, so as to obtain a dinner plate attribute information set; and the control unit 206 is configured to, in response to receiving an attribute request sent by a user terminal, control an associated transportation device to transport, to the user terminal, the dinner plate corresponding to the dinner plate attribute information group that matches the attribute request.
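Purely for orientation, the units above could be wired together roughly as follows; the class, its constructor arguments, and the reuse of the helper functions from the earlier sketches are all assumptions, not the disclosure's actual implementation.

class DinnerPlateTransportApparatus:
    # Skeleton only; seg_model and attr_model stand in for the pre-trained
    # segmentation mask and dinner plate attribute information models.
    def __init__(self, seg_model, attr_model, transport_device):
        self.seg_model = seg_model
        self.attr_model = attr_model
        self.transport_device = transport_device

    def handle_request(self, plate_image, request, terminal_id):
        first_masks, second_masks = self.seg_model(plate_image)   # first input unit
        masks = fuse_masks(first_masks, second_masks)             # fusion unit
        foregrounds = extract_foregrounds(plate_image, masks)     # generation unit
        plates = [self.attr_model(fg) for fg in foregrounds]      # second input unit
        dispatch_matching_plate(request, plates,                  # control unit
                                self.transport_device, terminal_id)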
It will be appreciated that the units described in the dinner plate transportation apparatus 200 correspond to the respective steps of the method described with reference to fig. 1. Thus, the operations, features, and benefits described above with respect to the method are equally applicable to the dinner plate transportation apparatus 200 and the units contained therein, and are not described in detail here.
Referring now to FIG. 3, a schematic diagram of an electronic device (e.g., computing device) 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 3 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a read-only memory (ROM) 302 or a program loaded from a storage means 308 into a random access memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic device 300 are also stored. The processing means 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical fiber cable, RF (radio frequency), or the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a dinner plate image; input the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set; perform fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set; generate a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set; input the dinner plate foreground images in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate dinner plate attribute information groups, so as to obtain a dinner plate attribute information set; and, in response to receiving an attribute request sent by a user terminal, control an associated transportation device to transport, to the user terminal, the dinner plate corresponding to the dinner plate attribute information group that matches the attribute request.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor including an acquisition unit, a first input unit, a fusion unit, a generation unit, a second input unit, and a control unit. In some cases, the names of these units do not constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring a dinner plate image".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention covered by the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A dinner plate transportation method, comprising:
acquiring a dinner plate image;
inputting the dinner plate image into a pre-trained segmentation mask information generation model to obtain a first segmentation mask information set and a second segmentation mask information set;
performing fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set;
generating a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set;
inputting the dinner plate foreground images in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate dinner plate attribute information groups, so as to obtain a dinner plate attribute information set;
and in response to receiving an attribute request sent by a user terminal, controlling an associated transportation device to transport, to the user terminal, the dinner plate corresponding to the dinner plate attribute information group that matches the attribute request.
2. The method of claim 1, wherein the performing fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set comprises:
for each piece of first segmentation mask information in the first segmentation mask information set and each piece of second segmentation mask information in the second segmentation mask information set, in response to determining that the first segmentation mask information and the second segmentation mask information meet a preset correspondence condition, removing the second segmentation mask information from the first segmentation mask information to obtain a dinner plate foreground mask;
determining the obtained dinner plate foreground masks as the dinner plate foreground mask set.
3. The method of claim 1, wherein the generating a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set comprises:
for each dinner plate foreground mask in the dinner plate foreground mask set, extracting a dinner plate foreground image from the dinner plate image based on the dinner plate foreground mask;
and determining the extracted dinner plate foreground images as the dinner plate foreground image set.
4. The method of claim 1, wherein the pre-trained segmentation mask information generation model is trained by:
obtaining a training sample set, wherein each training sample in the training sample set comprises: a sample dinner plate image;
labeling each sample dinner plate image included in the training sample set to generate sample segmentation mask information, so as to obtain a sample segmentation mask information set, wherein each piece of sample segmentation mask information in the sample segmentation mask information set comprises: a sample first segmentation mask information set and a sample second segmentation mask information set;
determining an initial segmentation mask information generation model, wherein the initial segmentation mask information generation model comprises: an initial first segmentation mask information generation model and an initial second segmentation mask information generation model;
selecting a training sample from the training sample set;
inputting a sample dinner plate image included in the training sample into the initial first segmentation mask information generation model to obtain an initial first segmentation mask information set;
inputting a sample dinner plate image included in the training sample into the initial second segmentation mask information generation model to obtain an initial second segmentation mask information set;
determining, based on a preset first segmentation loss function, a first segmentation difference value between the initial first segmentation mask information set and the sample first segmentation mask information set included in the sample segmentation mask information corresponding to the training sample;
determining, based on a preset second segmentation loss function, a second segmentation difference value between the initial second segmentation mask information set and the sample second segmentation mask information set included in the sample segmentation mask information corresponding to the training sample;
and adjusting network parameters of the initial segmentation mask information generation model in response to determining that the first segmentation difference value and the second segmentation difference value meet a preset segmentation condition.
5. The method of claim 4, wherein the method further comprises:
in response to determining that the first segmentation difference value and the second segmentation difference value do not meet the preset segmentation condition, determining the initial segmentation mask information generation model as the trained segmentation mask information generation model.
6. The method of claim 1, wherein the pre-trained dinner plate attribute information generation model is trained by:
obtaining a training sample set, wherein each training sample in the training sample set comprises: a sample dinner plate foreground image and a sample dinner plate attribute information set;
determining an initial dinner plate attribute information generation model, wherein the initial dinner plate attribute information generation model comprises: an initial first dinner plate attribute information generation model, an initial second dinner plate attribute information generation model, and an initial splicing model;
selecting a training sample from the training sample set;
inputting a sample dinner plate foreground image included in the training sample into the initial first dinner plate attribute information generation model to obtain initial first dinner plate attribute information;
inputting a sample dinner plate foreground image included in the training sample into the initial second dinner plate attribute information generation model to obtain initial second dinner plate attribute information;
inputting the initial first dinner plate attribute information and the initial second dinner plate attribute information into the initial splicing model to obtain an initial dinner plate attribute information set;
determining an attribute difference value between the initial dinner plate attribute information set and a sample dinner plate attribute information set included in the training sample based on a preset attribute loss function;
and in response to determining that the attribute difference value is greater than a preset attribute difference value, adjusting network parameters of the initial dinner plate attribute information generation model.
7. The method of claim 6, wherein the method further comprises:
and determining the initial dinner plate attribute information generation model as a trained dinner plate attribute information generation model in response to determining that the attribute difference value is smaller than or equal to the preset attribute difference value.
8. A dinner plate transportation apparatus, comprising:
an acquisition unit configured to acquire a dinner plate image;
a first input unit configured to input the dinner plate image into a pre-trained segmentation mask information generation model, resulting in a first segmentation mask information set and a second segmentation mask information set;
a fusion unit configured to perform fusion processing on the first segmentation mask information set and the second segmentation mask information set to generate a dinner plate foreground mask set;
a generation unit configured to generate a dinner plate foreground image set based on the dinner plate image and the dinner plate foreground mask set;
a second input unit configured to input the dinner plate foreground images in the dinner plate foreground image set into a pre-trained dinner plate attribute information generation model to generate dinner plate attribute information groups, so as to obtain a dinner plate attribute information set;
and a control unit configured to, in response to receiving an attribute request sent by a user terminal, control an associated transportation device to transport, to the user terminal, the dinner plate corresponding to the dinner plate attribute information group that matches the attribute request.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-7.