CN102708567B

CN102708567B - Visual perception-based three-dimensional image quality objective evaluation method

Info

Publication number: CN102708567B
Application number: CN201210144039.1A
Authority: CN
Inventors: 邵枫; 顾珊波; 郁梅; 蒋刚毅; 李福翠
Original assignee: Ningbo University
Current assignee: Ningbo University
Priority date: 2012-05-11
Filing date: 2012-05-11
Publication date: 2014-12-10
Anticipated expiration: 2032-05-11
Also published as: CN102708567A

Abstract

The invention discloses a visual perception-based three-dimensional image quality objective evaluation method which comprises the steps of: firstly, obtaining an objective evaluation measurement value of each pixel point through calculating a local phase characteristic and a local amplitude characteristic of each pixel point in left and right viewpoint images of a three-dimensional image, dividing the three-dimensional image into a shielding region, a binocular inhibition region and a binocular fusion region by a region detection method; and secondly, evaluating all regions respectively and fusing evaluation results to obtain a final image quality objective evaluation prediction value. The visual perception-based three-dimensional image quality objective evaluation method has the advantages that the obtained information for reflecting local phase and local amplitude characteristics has strong stability and can well reflect quality change condition of the three-dimensional image, and the image quality evaluation method of the shielding region, the binocular inhibition region and the binocular fusion region can well reflect the visual perception characteristic of the human visual system, and is effectively improved in correlation of objective evaluation result and subjective perception.

Description

Stereoscopic image quality objective evaluation method based on visual perception

Technical Field

The invention relates to an image quality evaluation method, in particular to a stereoscopic image quality objective evaluation method based on visual perception.

Background

With the rapid development of image coding technology and stereoscopic display technology, the stereoscopic image technology has received more and more extensive attention and application, and has become a current research hotspot. The stereo image technology utilizes the binocular parallax principle of human eyes, the left and right viewpoint images from the same scene are respectively and independently received by binoculars, and binocular parallax is formed through brain fusion, so that the stereo image with depth feeling and reality feeling is appreciated. Due to the influence of an acquisition system and storage compression and transmission equipment, a series of distortions are inevitably introduced into the stereo image, and compared with a single-channel image, the stereo image needs to ensure the image quality of two channels simultaneously, so that the quality evaluation of the stereo image is of great significance. However, currently, there is no effective objective evaluation method for evaluating the quality of stereoscopic images. Therefore, establishing an effective objective evaluation model of the quality of the stereo image has very important significance.

The existing three-dimensional image quality objective evaluation method directly applies the plane image quality evaluation method to the evaluation of the three-dimensional image quality, however, the process of fusing the left and right viewpoint images of the three-dimensional image to generate the three-dimensional effect is not a simple process of superposing the left and right viewpoint images and is difficult to express by a simple mathematical method, so that how to effectively simulate the binocular three-dimensional fusion in the three-dimensional image quality evaluation process and how to modulate the objective evaluation result according to the visual masking characteristic of human eyes so that the objective evaluation result is more in line with the human visual system is a problem which needs to be researched and solved in the process of evaluating the three-dimensional image quality objectively.

Disclosure of Invention

The invention aims to solve the technical problem of providing a stereoscopic image quality objective evaluation method based on visual perception, which can effectively improve the correlation between objective evaluation results and subjective perception.

The technical scheme adopted by the invention for solving the technical problems is as follows: a stereoscopic image quality objective evaluation method based on visual perception is characterized by comprising the following steps:

making S_orgFor original undistorted stereo image, let S_disFor the distorted stereo image to be evaluated, S_orgIs noted as { L_org(x, y) }, adding S_orgIs noted as { R_org(x, y) }, adding S_disIs noted as { L_dis(x, y) }, adding S_disIs noted as { R_dis(x, y) }, wherein (x, y) denotes a coordinate position of a pixel point in the left viewpoint image and the right viewpoint image, x is 1. ltoreq. x.ltoreq.W, y is 1. ltoreq. y.ltoreq.H, W denotes a width of the left viewpoint image and the right viewpoint image, H denotes a height of the left viewpoint image and the right viewpoint image, L is L_org(x, y) represents { L }_orgThe coordinate position in (x, y) } is the pixel value of the pixel point with (x, y), R_org(x, y) represents { R_orgThe pixel value L of the pixel point with the coordinate position (x, y) in (x, y) } is_dis(x, y) represents { L }_disThe coordinate position in (x, y) } is the pixel value of the pixel point with (x, y), R_dis(x, y) represents { R_disThe coordinate position in (x, y) is the pixel value of the pixel point of (x, y);

secondly, respectively extracting { L ] by using visual masking effect of human stereoscopic vision perception on background illumination and contrast_dis(x, y) } binocular minimum perceivable change image and { R }_dis(x, y) } binocular minimum perceivable change images, respectively noted asAndwherein,represents { L_dis(x, y) } binocular minimum perceivable change imageThe middle coordinate position is the pixel value of the pixel point of (x, y),represents { R_dis(x, y) } binocular minimum perceivable change imageThe middle coordinate position is the pixel value of the pixel point of (x, y);

utilizing area detection algorithm to respectively obtain { L_dis(x, y) } and { R }_disThe region type of each pixel point in (x, y) } is marked as p, wherein p belongs to {1,2,3}, p =1 represents an occlusion region, p =2 represents a binocular suppression region, and p =3 represents a binocular fusion region; then will { L_disThe sheltering area formed by all pixel points with the area type p =1 in (x, y) } is recorded asWill { L_disThe binocular suppression area formed by all pixel points with the area type p =2 in (x, y) } is recorded as a binocular suppression areaWill { L_disThe binocular fusion region formed by all pixel points with the region type p =3 in (x, y) } is recorded as a binocular fusion regionWill { R_disThe sheltering area formed by all pixel points with the area type p =1 in (x, y) } is recorded asWill { R_disThe binocular suppression area formed by all pixel points with the area type p =2 in (x, y) } is recorded as a binocular suppression areaWill { R_disThe binocular fusion region formed by all pixel points with the region type p =3 in (x, y) } is recorded as a binocular fusion region

Separately calculating { L }_org(x,y)}、{R_org(x,y)}、{L_dis(x, y) } and { R }_disThe local phase characteristic and the local amplitude characteristic of each pixel in (x, y) } will be { L_orgThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwill { R_orgThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwill { L_disThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwill { R_disThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwherein,represents { L_orgThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { L_orgThe local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y),represents { R_orgThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { R_orgThe local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y),represents { L_disThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { L_disThe local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y),represents { R_disThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { R_disThe local amplitude characteristic of a pixel point with the coordinate position (x, y) in the (x, y) };

according to { L_org(x, y) } and { L_disCalculating the local phase characteristic and the local amplitude characteristic of each pixel point in (x, y) } according to the local phase characteristic and the local amplitude characteristic_dis(x, y) } the objective evaluation metric for each pixel point, will be { L_disThe objective evaluation metric values of all the pixel points in (x, y) are collectively expressed as { Q }_L(x,y)}，

According to { R_org(x, y) } and { R }_disCalculating the local phase characteristic and the local amplitude characteristic of each pixel point in (x, y) } according to the local phase characteristic and the local amplitude characteristic of each pixel point in the (x, y) } matrix, and calculating the { R_dis(x, y) } the objective evaluation metric for each pixel point, will be { R }^disThe objective evaluation metric values of all the pixel points in (x, y) are collectively expressed as { Q }_R(x,y)}，

Wherein Q is_L(x, y) represents { L }_dis(x, y) the objective evaluation metric value of the pixel point with coordinate position (x, y),

Q_R(x, y) represents { R_dis(x, y) the objective evaluation metric value of the pixel point with coordinate position (x, y),

w_LP、w_LAand b is a training parameter, T₁And T₂Is a control parameter;

according to { L_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristic of the human visual system to the sheltered area_disThe objective evaluation metric of the occlusion region in (1) is marked as Q_nc，

<math> <mrow> <msub> <mi>Q</mi> <mi>nc</mi> </msub> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>L</mi> <mi>nc</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>+</mo> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>nc</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mi>N</mi> <mi>L</mi> <mi>nc</mi> </msubsup> <mo>+</mo> <msubsup> <mi>N</mi> <mi>R</mi> <mi>nc</mi> </msubsup> </mrow> </mfrac> <mo>,</mo> </mrow> </math>

Wherein,represents { L_dis(x, y) } the number of pixel points with the area type of p =1,represents { R_dis(x, y) } the number of pixel points with the area type of p = 1;

is according to { L_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristics of the human visual system to the binocular inhibition area_disThe objective evaluation metric of the binocular suppression area in (1) is marked as Q_bs，

Q_{bs} = \max (Q_{L}^{bs}, Q_{R}^{bs}),

Where max () is a function taking the maximum value,

<math> <mrow> <msubsup> <mi>Q</mi> <mi>L</mi> <mi>bs</mi> </msubsup> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>L</mi> <mi>bs</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>×</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>L</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>L</mi> <mi>bs</mi> </msubsup> </mrow> </munder> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>L</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>,</mo> </mrow> </math>

<math> <mrow> <msubsup> <mi>Q</mi> <mi>R</mi> <mi>bs</mi> </msubsup> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bs</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>×</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bs</mi> </msubsup> </mrow> </munder> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>;</mo> </mrow> </math>

according to { L_dis(x, y) } and { R }_dis(xY) and calculating S by using the visual perception characteristics of the human visual system to the binocular fusion area_disThe objective evaluation metric value of the binocular fusion area in (1) is marked as Q_bf，

Wherein,

<math> <mrow> <msubsup> <mi>Q</mi> <mi>L</mi> <mi>bf</mi> </msubsup> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>L</mi> <mi>bf</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>×</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>L</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>L</mi> <mi>bf</mi> </msubsup> </mrow> </munder> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>L</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>,</mo> </mrow> </math>

<math> <mrow> <msubsup> <mi>Q</mi> <mi>R</mi> <mi>bf</mi> </msubsup> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bf</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>×</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bf</mi> </msubsup> </mrow> </munder> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>;</mo> </mrow> </math>

ninthly to S_disObjective evaluation metric Q of occlusion region in (1)_nc、S_disObjective evaluation metric Q of the binocular suppression area in (1)_bsAnd S_disObjective evaluation metric value Q of binocular fusion area in (1)_bfCarrying out fusion to obtain S_disThe predicted value of the objective evaluation of image quality is marked as Q, Q = w_nc×Q_nc+w_bs×Q_bs+w_bf×Q_bfWherein w is_nc、w_bfAnd w_bsAre weighting parameters.

The concrete process of the second step is as follows:

② 1, calculating { L_disVisualization threshold set of luminance masking effect of (x, y) }, denoted as { T }_l(x,y)}，Wherein, T_l(x, y) represents { L }_disThe coordinate position in (x, y) is the visual threshold value of the brightness masking effect of the pixel point of (x, y), bg_l(x, y) represents { L }_disAll pixel points in a 5 × 5 window centered on a pixel point whose coordinate position is (x, y) in (x, y) } are within a window of 5 × 5Average value of brightness of;

2, calculating { L_disVisualization threshold set of contrast masking effect of (x, y) }, denoted as { T }_c(x,y)}，T_c(x,y)=K(bg_l(x,y))+eh_l(x, y) wherein T_c(x, y) represents { L }_disThe coordinate position in (x, y) is the visual threshold value of the contrast masking effect of the pixel point of (x, y), eh_l(x, y) represents the pair { L }_disThe pixel points with the coordinate positions (x, y) in (x, y) are respectively subjected to edge filtering in the horizontal direction and the vertical direction to obtain an average gradient value K (bg)_l(x,y))=-10^-6×(0.7×bg_l(x,y)²+32×bg_l(x,y))+0.07；

2- (3) pairs of { L_disVisualization threshold set of luminance masking effects of (x, y) } { T_l(x, y) } and a visual threshold set of contrast masking effects { T }_c(x, y) } to obtain { L_dis(x, y) } binocular minimum perceivable change image, noted

J_{L}^{dis} (x, y) = T_{l} (x, y) + T_{c} (x, y);

② 4, calculating { R_disVisualization threshold set of luminance masking effect of (x, y) }, denoted as { T }_r(x,y)}，Wherein, T_r(x, y) represents { R_disThe coordinate position in (x, y) is the visual threshold value of the brightness masking effect of the pixel point of (x, y), bg_r(x, y) represents { R_disAverage brightness values of all pixel points in a 5 multiplied by 5 window with the pixel point with the coordinate position as (x, y) as the center in (x, y) };

② 5, calculating { R_disVisualization threshold set of contrast masking effect of (x, y) }, denoted as { T }_c′(x,y)}，T_c′(x,y)=K(bg_r(x,y))+eh_r(x, y) wherein T_c' (x, y) denotes { R_disThe coordinate position in (x, y) is the visual threshold value of the contrast masking effect of the pixel point of (x, y), eh_r(x, y) represents the pair { R_disThe pixel points with the coordinate positions (x, y) in (x, y) are respectively subjected to edge filtering in the horizontal direction and the vertical direction to obtain an average gradient value K (bg)_r(x,y))=-10^-6×(0.7×bg_r(x,y)²+32×bg_r(x,y))+0.07；

2- (6) pairs of { R_disVisualization threshold set of luminance masking effects of (x, y) } { T_r(x, y) } and a visual threshold set of contrast masking effects { T }_c' (x, y) } to obtain { R_dis(x, y) } binocular minimum perceivable change image, noted

In the step III, the region detection algorithm is utilized to respectively obtain { L }_dis(x, y) } and { R }_disThe specific process of the area type of each pixel point in (x, y) } is as follows:

③ 1, calculating by adopting a block matching method_org(x, y) } and { R }_org(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y);

③ 2, calculating by adopting a block matching method_dis(x, y) } and { R }_dis(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y);

③ 3, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf 255, if so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) Region type of pixel pointMarked as p =1, then step c-6 is performed, otherwise step c-4 is performed, where x is greater than or equal to 1₁≤W,1≤y₁≤H；

③ 4, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not greater thanThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p =2, then step (c-6) is executed, otherwise, step (c-5) is executed;

③ 5, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not it is less than or equal toThe pixel value of the pixel point with the middle coordinate position (x1, y1)If so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The region type of the pixel point of (1) is marked as p = 3;

③ 6, return to the step of ③ 3 to continue to determine L_dis(x, y) } of the residueUntil { L }_disDetermining the region types of all pixel points in (x, y) };

thirdly-7, calculating the R by adopting a block matching method_org(x, y) } and { L_org(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y);

thirdly-8, calculating R by adopting a block matching method_dis(x, y) } and { L_dis(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y);

③ 9, judgmentThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf 255, then { R } will be_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p =1, then step- (c-12) is executed, otherwise, step- (c-10) is executed, wherein x is more than or equal to 1₁≤W,1≤y₁≤H；

③ 10, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not greater thanThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { R }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p =2, then step c-12 is executed, otherwise step c-11 is executed;

③ 11, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not it is less than or equal toThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { R }_disThe coordinate position in (x, y) } is (x)₁,y₁) The region type of the pixel point of (1) is marked as p = 3;

③ 12, return to the step of ③ -9 to continue to determine R_dis(x, y) the area type of the remaining pixel points in the (x, y) } until the { R } is reached_disAnd (x, y) finishing determining the area types of all pixel points in the (x, y) }.

In the step (iv) { L }_org(x,y)}、{R_org(x,y)}、{L_dis(x, y) } and { R }_disThe acquisition process of the local phase characteristic and the local amplitude characteristic of each pixel point in (x, y) } comprises the following steps:

tetra-1, pair { L_orgCarrying out phase consistency transformation on each pixel point in (x, y) to obtain { L }_orgEven symmetric frequency response and odd symmetric frequency response of each pixel point in (x, y) } in different scales and directions are converted into { L }_orgEven symmetric frequency responses of pixel points with coordinate positions (x, y) in different scales and directions are recorded as e_α,θ(x, y) will { L_orgThe odd symmetric frequency response of the pixel point with the coordinate position (x, y) in different scales and directions is marked as o_α，θ(x, y), wherein alpha represents a scale factor of the filter, 1 ≦ alpha ≦ 4, theta represents a direction factor of the filter, 1 ≦ theta ≦ 4;

fourthly-2, calculating { L_orgThe phase consistency characteristics of each pixel point in (x, y) in different directions are shown as L_orgThe phase consistency characteristics of pixel points with coordinate positions (x, y) in different directions in (x, y) are marked as PC_θ(x,y)，

Wherein,

fourthly-3, according to { L_orgCalculating the direction corresponding to the maximum phase consistency characteristic of each pixel point in (x, y) } and calculating the { L_org(x, y) } local phase features and local amplitude features for each pixel point, for { L_orgThe pixel point with (x, y) coordinate position in (x, y) } is firstly found out the phase consistency characteristic PC of the pixel point in different directions_θFinding out the direction corresponding to the maximum phase consistency feature in (x, y), and marking as the directionθ_mAgain according to theta_mCalculation of { L_orgThe local phase characteristic and the local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y) } are respectively and correspondingly marked asAnd

<math> <mrow> <msubsup> <mi>LP</mi> <mi>L</mi> <mi>org</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>arctan</mi> <mrow> <mo>(</mo> <msub> <mi>H</mi> <msub> <mi>θ</mi> <mi>m</mi> </msub> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>,</mo> <msub> <mi>F</mi> <msub> <mi>θ</mi> <mi>m</mi> </msub> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>

wherein,

represents { L_org(x, y) in the (x, y) } the pixel point with the coordinate position of (x, y) is in the direction theta corresponding to different scales and the maximum phase consistency characteristic_mThe even-symmetric frequency response of the frequency domain,represents { L_org(x, y) in the (x, y) } the pixel point with the coordinate position of (x, y) is in the direction theta corresponding to different scales and the maximum phase consistency characteristic_mThe arctan () is an inverted cosine function;

fourthly-4, obtaining { L ] according to the steps from the fourth step-1 to the fourth step-3_orgOperation of local phase feature and local amplitude feature of each pixel in (x, y) } obtains { R } in the same manner_org(x,y)}、{L_dis(x, y) } and { R }_disLocal phase features and local amplitude features of each pixel in (x, y) }.

Compared with the prior art, the invention has the advantages that:

1) the method provided by the invention considers that different areas have different responses to the stereo perception, divides the stereo image into the shielding area, the binocular inhibition area and the binocular fusion area, carries out evaluation respectively, and fuses evaluation results to obtain the final evaluation score, so that the evaluation results are more perceptually in line with the human visual system.

2) The method of the invention obtains the objective evaluation metric value of each pixel point by calculating the local phase characteristic and the local amplitude characteristic of each pixel point in the left and right viewpoint images of the stereo image, and effectively improves the correlation between the objective evaluation result and the subjective perception because the obtained local phase characteristic and the local amplitude characteristic have stronger stability and can better reflect the quality change condition of the stereo image.

3) According to the method, the binocular minimum perceptible change image is obtained according to the stereoscopic vision characteristics of human eyes, and objective evaluation metric values of all pixel points in the occlusion area, the binocular inhibition area and the binocular fusion area are weighted to different degrees, so that the evaluation result is more in line with a human vision system, and the correlation between the objective evaluation result and subjective perception is improved.

Drawings

FIG. 1 is a block diagram of an overall implementation of the method of the present invention;

fig. 2a is a left viewpoint image of Akko (640 × 480 size) stereo image;

fig. 2b is a right viewpoint image of an Akko (size 640 × 480) stereoscopic image;

fig. 3a is a left viewpoint image of an altmobit (size 1024 × 768) stereoscopic image;

fig. 3b is a right view image of an altmobit (size 1024 × 768) stereoscopic image;

fig. 4a is a left viewpoint image of a balloon (size 1024 × 768) stereoscopic image;

fig. 4b is a right viewpoint image of a balloon (size 1024 × 768) stereoscopic image;

fig. 5a is a left viewpoint image of a Doorflower (size 1024 × 768) stereoscopic image;

fig. 5b is a right viewpoint image of a Doorflower (size 1024 × 768) stereoscopic image;

fig. 6a is a left view image of a Kendo (size 1024 × 768) stereoscopic image;

fig. 6b is a right view image of a Kendo (size 1024 × 768) stereoscopic image;

fig. 7a is a left view image of a LeaveLaptop (size 1024 × 768) stereoscopic image;

fig. 7b is a right view image of a LeaveLaptop (size 1024 × 768) stereoscopic image;

fig. 8a is a left viewpoint image of a lovedual 1 (size 1024 × 768) stereoscopic image;

fig. 8b is a right viewpoint image of a lovedual 1 (size 1024 × 768) stereoscopic image;

fig. 9a is a left view image of a newsapper (size 1024 × 768) stereoscopic image;

fig. 9b is a right view image of a newsapper (size 1024 × 768) stereoscopic image;

FIG. 10a is a left viewpoint image of Puppy (size 720 × 480) stereo image;

FIG. 10b is a right viewpoint image of Puppy (size 720 × 480) stereoscopic image;

fig. 11a is a left viewpoint image of a Soccer2 (size 720 × 480) stereoscopic image;

fig. 11b is a right viewpoint image of a Soccer2 (size 720 × 480) stereoscopic image;

fig. 12a is a left viewpoint image of a Horse (size 720 × 480) stereoscopic image;

fig. 12b is a right view image of a Horse (size 720 × 480) stereoscopic image;

fig. 13a is a left viewpoint image of an Xmas (size 640 × 480) stereoscopic image;

fig. 13b is a right view image of an Xmas (size 640 × 480) stereoscopic image;

fig. 14 is a scatter plot of the difference between the objective evaluation prediction value of image quality and the average subjective score of each distorted stereoscopic image in the set of distorted stereoscopic images.

Detailed Description

The invention is described in further detail below with reference to the accompanying examples.

The invention provides a stereoscopic image quality objective evaluation method based on visual perception, the overall implementation block diagram of which is shown in figure 1, and the method comprises the following steps:

making S_orgFor original undistorted stereo image, let S_disFor the distorted stereo image to be evaluated, S_orgIs noted as { L_org(x, y) }, adding S_orgIs noted as { R_org(x, y) }, adding S_disIs noted as { L_dis(x, y) }, adding S_disIs noted as { R_dis(x, y) }, wherein (x, y) denotes a coordinate position of a pixel point in the left viewpoint image and the right viewpoint image, x is 1. ltoreq. x.ltoreq.W, y is 1. ltoreq. y.ltoreq.H, W denotes a width of the left viewpoint image and the right viewpoint image, H denotes a height of the left viewpoint image and the right viewpoint image, L is L_org(x, y) represents { L }_orgThe coordinate position in (x, y) } is the pixel value of the pixel point with (x, y), R_org(x, y) represents { R_orgThe pixel value L of the pixel point with the coordinate position (x, y) in (x, y) } is_dis(x, y) represents { L }_disThe coordinate position in (x, y) } is the pixel value of the pixel point with (x, y), R_dis(x, y) represents { R_disAnd the coordinate position in the (x, y) is the pixel value of the pixel point of (x, y).

The human visual characteristics indicate that the human eye is imperceptible to a property or noise that changes less in the image unless the intensity of the change of the property or noise exceeds a threshold, which is the minimum perceptible distortion (JND). However, the visual masking effect of human eyes is a local effect, which is influenced by background illumination, texture complexity and other factors, and the brighter the background is, the more complex the texture is, and the higher the threshold value is. Therefore, the invention respectively extracts L by using the visual masking effect of human stereoscopic vision perception on background illumination and contrast_dis(x, y) } binocular minimum perceivable change image and { R }_dis(x, y) } binocular minimum perceivable change images, respectively noted asAndwherein,represents { L_dis(x, y) } binocular minimum perceivable change imageThe middle coordinate position is the pixel value of the pixel point of (x, y),represents { R_dis(x, y) } binocular minimum perceivable change imageThe middle coordinate position is the pixel value of the pixel point of (x, y).

In this embodiment, the specific process of step two is:

② 1, calculating { L_disLuminance masking effect of (x, y) } and luminance masking effect of the sameIs marked as { T }_l(x,y)}，Wherein, T_l(x, y) represents { L }_disThe coordinate position in (x, y) is the visual threshold value of the brightness masking effect of the pixel point of (x, y), bg_l(x, y) represents { L }_disAnd (x, y) } average brightness of all pixels in a 5 × 5 window with the pixel with the coordinate position (x, y) as the center.

2, calculating { L_disVisualization threshold set of contrast masking effect of (x, y) }, denoted as { T }_c(x,y)}，T_c(x,y)=K(bg_l(x,y))+eh_l(x, y) wherein T_c(x, y) represents { L }_disThe coordinate position in (x, y) is the visual threshold value of the contrast masking effect of the pixel point of (x, y), eh_l(x, y) represents the pair { L }_disThe pixel points with the coordinate positions (x, y) in (x, y) are respectively subjected to edge filtering in the horizontal direction and the vertical direction to obtain an average gradient value K (bg)_l(x,y))=-10^-6×(0.7×bg_l(x,y)²+32×bg_l(x,y))+0.07。

J_{L}^{dis} (x, y) = T_{l} (x, y) + T_{c} (x, y) .

② 4, calculating { R_disVisualization threshold set of luminance masking effect of (x, y) }, denoted as { T }_r(x,y)}，Wherein, T_r(x, y) represents { R_disThe coordinate position in (x, y) is the visual threshold value of the brightness masking effect of the pixel point of (x, y), bg_r(x, y) represents { R_disAnd (x, y) } average brightness of all pixels in a 5 × 5 window with the pixel with the coordinate position (x, y) as the center.

② 5, calculating { R_disVisualization threshold set of contrast masking effect of (x, y) }, denoted as { T }_c′(x,y)}，T_c′(x,y)=K(bg_r(x,y))+eh_r(x, y) wherein T_c' (x, y) denotes { R_disThe coordinate position in (x, y) is the visual threshold value of the contrast masking effect of the pixel point of (x, y), eh_r(x, y) represents the pair { R_disThe pixel points with the coordinate positions (x, y) in (x, y) are respectively subjected to edge filtering in the horizontal direction and the vertical direction to obtain an average gradient value K (bg)_r(x,y))=-10^-6×(0.7×bg_r(x,y)²+32×bg_r(x,y))+0.07。

In this embodiment, step three uses region detection algorithm to obtain { L }_dis(x, y) } and { R }_disThe specific process of the area type of each pixel point in (x, y) } is as follows:

③ 1, calculating by adopting a block matching method_org(x, y) } and { R }_org(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y).

③ 2, calculating by adopting a block matching method_dis(x, y) } and { R }_dis(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y).

③ 3, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf 255, if so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p =1, then step- (c-6) is executed, otherwise, step- (c-4) is executed, wherein x is more than or equal to 1₁≤W,1≤y₁≤H。

③ 4, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not greater thanThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point of (1) is marked as p =2, then step (c-6) is executed, otherwise, step (c-5) is executed.

③ 5, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not it is less than or equal toThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The region type of the pixel point of (1) is marked as p = 3.

③ 6, return to the step of ③ 3 to continue to determine L_dis(x, y) } the region types of the remaining pixel points, up to { L }_disAnd (x, y) finishing determining the area types of all pixel points in the (x, y) }.

Thirdly-7, calculating the R by adopting a block matching method_org(x, y) } and { L_org(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y).

Thirdly-8, calculating R by adopting a block matching method_dis(x, y) } and { L_dis(x, y) } parallax images, noted asWherein,to representThe middle coordinate position is the pixel value of the pixel point of (x, y).

③ 9, judgmentThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf 255, then { R } will be_disThe coordinate position in (x, y) } is (x)₁,y₁) Is marked as p =1, and then the step is executed③ 12, otherwise, executing the step of ③ 10, wherein, x is less than or equal to 1₁≤W,1≤y₁≤H。

③ 10, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not greater thanThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { R }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point of (1) is marked as p =2, then step (c-12) is performed, otherwise, step (c-11) is performed.

③ 11, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not it is less than or equal toThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { R }_disThe coordinate position in (x, y) } is (x)₁,y₁) The region type of the pixel point of (1) is marked as p = 3.

③ 12, return to the step of ③ -9 to continue to determine R_disThe remaining pixels in (x, y) } areRegion type of point, up to { R_disAnd (x, y) finishing determining the area types of all pixel points in the (x, y) }.

Here, the block matching method is a conventional classical block matching method, and a basic idea thereof is to divide an image into small blocks, and for each small block of a left-viewpoint image (right-viewpoint image), a small block having the largest correlation is searched for in a right-viewpoint image (left-viewpoint image), and a spatial displacement amount between the two small blocks is a parallax.

Separately calculating { L }_org(x,y)}、{R_org(x,y)}、{L_dis(x, y) } and { R }_disThe local phase characteristic and the local amplitude characteristic of each pixel in (x, y) } will be { L_orgThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwill { R_orgThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwill { L_disThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwill { R_disThe local phase characteristics and the local amplitude characteristics of all pixel points in (x, y) } are respectively expressed as setsAndwherein,represents { L_orgThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { L_orgThe local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y),represents { R_orgThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { R_orgThe local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y),represents { L_disThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { L_disThe local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y),represents { R_disThe coordinate position in (x, y) is the local phase characteristic of the pixel point of (x, y),represents { R_disAnd (x, y) the local amplitude characteristic of the pixel point with the coordinate position (x, y).

In this embodiment, { L ] in step (iv)_org(x,y)}、{R_org(x,y)}、{L_dis(x, y) } and { R }_disThe acquisition process of the local phase characteristic and the local amplitude characteristic of each pixel point in (x, y) } comprises the following steps:

tetra-1, pair { L_orgCarrying out phase consistency transformation on each pixel point in (x, y) to obtain { L }_orgEven symmetric frequency response and odd symmetric frequency response of each pixel point in (x, y) } in different scales and directions are converted into { L }_orgEven symmetric frequency responses of pixel points with coordinate positions (x, y) in different scales and directions are recorded as e_α,θ(x, y) will { L_orgThe odd symmetric frequency response of the pixel point with the coordinate position (x, y) in different scales and directions is marked as o_α，θ(x, y), wherein alpha represents the scale factor of the filter, 1 ≦ alpha ≦ 4, theta represents the direction factor of the filter, and 1 ≦ theta ≦ 4.

Wherein,

fourthly-3, according to { L_orgCalculating the direction corresponding to the maximum phase consistency characteristic of each pixel point in (x, y) } and calculating the { L_org(x, y) } local phase features and local amplitude features for each pixel point, for { L_orgThe pixel point with (x, y) coordinate position in (x, y) } is firstly found out the phase consistency characteristic PC of the pixel point in different directions_θFinding out the direction corresponding to the maximum phase consistency characteristic in (x, y), and marking as theta_mAgain according to theta_mCalculation of { L_orgLocal phase of pixel point with coordinate position (x, y) in (x, y) }Features and local amplitude features, respectivelyAnd

wherein,

represents { L_org(x, y) in the (x, y) } the pixel point with the coordinate position of (x, y) is in the direction theta corresponding to different scales and the maximum phase consistency characteristic_mThe even-symmetric frequency response of the frequency domain,represents { L_org(x, y) in the (x, y) } the pixel point with the coordinate position of (x, y) is in the direction theta corresponding to different scales and the maximum phase consistency characteristic_mThe odd symmetric frequency response of (1), arctan () is an inverted cosine function.

According to { R_org(x, y) } and { R }_disCalculating the local phase characteristic and the local amplitude characteristic of each pixel point in (x, y) } according to the local phase characteristic and the local amplitude characteristic of each pixel point in the (x, y) } matrix, and calculating the { R_dis(x, y) } the objective evaluation metric for each pixel point, will be { R }_disThe objective evaluation metric values of all the pixel points in (x, y) are collectively expressed as { Q }_R(x,y)}，

w_LP、w_LAand b is a training parameter, T₁And T₂Are control parameters.

In this embodiment, w is taken according to the different effects of the local phase characteristic and the local amplitude characteristic on the stereo image quality variation_LP=0.9834、w_LA=0.2915, b =0, take T₁=0.85、T₂=160。

And sixthly, the occlusion area is mainly formed by pixels with unsuccessful parallax matching in the distorted stereo image and comprises an occlusion area of the left viewpoint image and an occlusion area of the right viewpoint image, and the perception of a human visual characteristic system to the occlusion areas mainly plays a main role in monocular vision. Thus the present invention is based on { L }_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristic of the human visual system to the sheltered area_disThe objective evaluation metric of the occlusion region in (1) is marked as Q_nc，

Wherein,represents { L_dis(x, y) } the number of pixel points with the area type of p =1,represents { R_dis(x, y) } the number of pixel points with the area type of p = 1.

The characteristics of the human visual system indicate that if the difference between the image contents of the corresponding areas of the retina of the left eye and the retina of the right eye is large or the parallax value between the two areas is large, the human visual system cannot perform binocular fusion operation on the information of conflict between the two eyes, the binocular masking processing is performed, and the high-quality viewpoints inhibit the low-quality viewpoints in the binocular masking processing process. Thus the present invention is based on { L }_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristics of the human visual system to the binocular inhibition area_disThe objective evaluation metric of the binocular suppression area in (1) is marked as Q_bs，Where max () is a function taking the maximum value,

<math> <mrow> <msubsup> <mi>Q</mi> <mi>R</mi> <mi>bs</mi> </msubsup> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bs</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>×</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bs</mi> </msubsup> </mrow> </munder> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>.</mo> </mrow> </math>

the human visual characteristics show that if the difference of the image contents of the corresponding areas of the left and right eye retinas is small, the human visual system carries out binocular superposition (fusion) operation on the areas, so that the visual sensitivity of the two eyes to the areas is 1.4 times that of the eyes. Thus the present invention is based on { L }_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristics of the human visual system to the binocular fusion region_disThe objective evaluation metric value of the binocular fusion area in (1) is marked as Q_bf，

Wherein,

<math> <mrow> <msubsup> <mi>Q</mi> <mi>R</mi> <mi>bf</mi> </msubsup> <mo>=</mo> <mfrac> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bf</mi> </msubsup> </mrow> </munder> <msub> <mi>Q</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>×</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <munder> <mi>Σ</mi> <mrow> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msubsup> <mi>Ω</mi> <mi>R</mi> <mi>bf</mi> </msubsup> </mrow> </munder> <mrow> <mo>(</mo> <mn>1</mn> <mo>/</mo> <msubsup> <mi>J</mi> <mi>R</mi> <mi>dis</mi> </msubsup> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>.</mo> </mrow> </math>

ninthly to S_disObjective evaluation metric Q of occlusion region in (1)_nc、S_disObjective evaluation metric Q of the binocular suppression area in (1)_bsAnd S_disObjective evaluation metric value Q of binocular fusion area in (1)_bfCarrying out fusion to obtain S_disThe predicted value of the objective evaluation of image quality is marked as Q, Q = w_nc×Q_nc+w_bs×Q_bs+w_bf×Q_bfWherein w is_nc、w_bfAnd w_bsAre weighting parameters. In this embodiment, take w_nc=0.1163、w_bf=0.4119 and w_bs=0.4718。

In the present embodiment, the 12 undistorted stereo images shown in fig. 2a and 2b, fig. 3a and 3b, fig. 4a and 4b, fig. 5a and 5b, fig. 6a and 6b, fig. 7a and 7b, fig. 8a and 8b, fig. 9a and 9b, fig. 10a and 10b, fig. 11a and 11b, fig. 12a and 12b, and fig. 13a and 13b are used to create 312 distorted stereo images under different degrees of JPEG compression, JPEG2000 compression, gaussian blur, white noise, and h.264 coding distortion to analyze the distorted stereo image S obtained by the method of the present invention_disThe correlation between the predicted value and the average subjective score difference is objectively evaluated, wherein 60 distorted stereoscopic images are compressed by JPEG, 60 distorted stereoscopic images are compressed by JPEG2000, 60 distorted stereoscopic images are compressed by Gaussian Blur, 60 distorted stereoscopic images are blurred by White Noise (White Noise), and 72 distorted stereoscopic images are encoded by H.264. And the average subjective score difference of each distorted stereo image in the distorted stereo image set is respectively obtained by the existing subjective quality evaluation method and is recorded as DMOS,DMOS =100-MOS, where MOS denotes the mean subjective score, DMOS ∈ [0,100]。

In this embodiment, 4 common objective parameters of the image quality evaluation method are used as evaluation indexes, that is, Pearson correlation coefficient (PLCC), Spearman correlation coefficient (SROCC), Kendall correlation coefficient (KROCC), mean square error (RMSE), stereo image evaluation accuracy of the objective model in which PLCC and rmsc reflect distortion, and SROCC and KROCC reflect monotonicity thereof. The image quality objective evaluation predicted value of the distorted stereo image calculated according to the method is subjected to five-parameter Logistic function nonlinear fitting, and the higher the PLCC, SROCC and KROCC values are, the lower the RMSE value is, the better the correlation between the objective evaluation method and the average subjective score difference is. The PLCC, SROCC, KROCC and RMSE coefficients reflecting the three-dimensional image objective evaluation model performance are shown in Table 1, and the data listed in Table 1 shows that the correlation between the final image quality objective evaluation predicted value of the distorted three-dimensional image obtained by the method and the average subjective score difference value is very high, which indicates that the objective evaluation result is more consistent with the result of human eye subjective perception, and the effectiveness of the method is enough to explain.

Fig. 14 shows a scatter diagram of the difference between the objective evaluation prediction value of the image quality of each distorted stereoscopic image in the distorted stereoscopic image set and the average subjective score, and the more concentrated the scatter is, the better the consistency between the objective model and the subjective perception is. As can be seen from fig. 14, the scatter diagram obtained by the method of the present invention is more concentrated, and the goodness of fit with the subjective evaluation data is higher.

TABLE 1 correlation between objective evaluation prediction value and subjective score of image quality of distorted stereoscopic image obtained by the method of the present invention

Claims

1. A stereoscopic image quality objective evaluation method based on visual perception is characterized by comprising the following steps:

making S_orgFor original undistorted stereo image, let S_disFor the distorted stereo image to be evaluated, S_orgIs noted as { L_org(x, y) }, adding S_orgIs noted as { R_org(x, y) }, adding S_disIs noted as { L_dis(x, y) }, adding S_disIs noted as { R_dis(x, y) }, wherein (x, y) representsX is more than or equal to 1 and less than or equal to W, y is more than or equal to 1 and less than or equal to H, W represents the width of the left viewpoint image and the right viewpoint image, H represents the height of the left viewpoint image and the right viewpoint image, L represents the coordinate position of the pixel points in the left viewpoint image and the right viewpoint image, and L represents the height of the left viewpoint image and the right viewpoint image_org(x, y) represents { L }_orgThe coordinate position in (x, y) } is the pixel value of the pixel point with (x, y), R_org(x, y) represents { R_orgThe pixel value L of the pixel point with the coordinate position (x, y) in (x, y) } is_dis(x, y) represents { L }_disThe coordinate position in (x, y) } is the pixel value of the pixel point with (x, y), R_dis(x, y) represents { R_disThe coordinate position in (x, y) is the pixel value of the pixel point of (x, y);

utilizing area detection algorithm to respectively obtain { L_dis(x, y) } and { R }_disThe region type of each pixel point in (x, y) } is marked as p, wherein p belongs to {1,2,3}, where p ═ 1 represents an occlusion region, p ═ 2 represents a binocular rejection region, and p ═ 3 represents a binocular fusion region; then will { L_disThe sheltering area formed by pixel points with all area types of p ═ 1 in (x, y) } is recorded as the sheltering areaWill { L_disThe binocular suppression area formed by all pixel points with the area type p ═ 2 in (x, y) } is recorded as a binocular suppression areaWill { L_disThe binocular fusion area formed by all pixel points with the area type p ═ 3 in (x, y) } is recorded as a binocular fusion areaWill { R_disThe sheltering area formed by pixel points with all area types of p ═ 1 in (x, y) } is recorded as the sheltering areaWill { R_disThe binocular suppression area formed by all pixel points with the area type p ═ 2 in (x, y) } is recorded as a binocular suppression areaWill { R_disThe binocular fusion area formed by all pixel points with the area type p ═ 3 in (x, y) } is recorded as a binocular fusion area

w_LP、w_LAand b is a training parameter, T₁And T₂Is a control parameter;

according to { L_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristic of the human visual system to the sheltered area_disThe objective evaluation metric of the occlusion region in (1) is marked as Q_nc，Wherein,represents { L_dis(x, y) } the number of pixel points with the area type p being 1,represents { R_dis(x, y) } the number of pixel points with the area type p being 1;

Q_{bs} = \max (Q_{L}^{bs}, Q_{R}^{bs}),

Where max () is a function taking the maximum value,

according to { L_dis(x, y) } and { R }_dis(x, y) and calculating S by using the visual perception characteristics of the human visual system to the binocular fusion region_disThe objective evaluation metric value of the binocular fusion area in (1) is marked as Q_bf，

Wherein,

ninthly to S_disObjective evaluation metric Q of occlusion region in (1)_nc、S_disObjective evaluation metric Q of the binocular suppression area in (1)_bsAnd S_disObjective evaluation metric value Q of binocular fusion area in (1)_bfCarrying out fusion to obtain S_disThe predicted value of the objective evaluation of image quality is recorded as Q, Q is w_nc×Q_nc+w_bs×Q_bs+w_bf×Q_bfWherein w is_nc、w_bfAnd w_bsAre weighting parameters.

2. The objective evaluation method for stereoscopic image quality based on visual perception according to claim 1, wherein the specific process of the step (II) is as follows:

② 1, calculating { L_disVisualization threshold set of luminance masking effect of (x, y) }, denoted as { T }_l(x,y)}，Wherein, T_l(x, y) represents { L }_disThe coordinate position in (x, y) is the visual threshold value of the brightness masking effect of the pixel point of (x, y), bg_l(x, y) represents { L }_disAverage brightness values of all pixel points in a 5 multiplied by 5 window with the pixel point with the coordinate position as (x, y) as the center in (x, y) };

2, calculating { L_disVisualization threshold set of contrast masking effect of (x, y) }, denoted as { T }_c(x,y)}，T_c(x,y)＝K(bg_l(x,y))+eh_l(x, y) wherein T_c(x, y) represents { L }_disThe coordinate position in (x, y) is the visual threshold value of the contrast masking effect of the pixel point of (x, y), eh_l(x, y) represents the pair { L }_disThe pixel points with the coordinate positions (x, y) in (x, y) are respectively subjected to edge filtering in the horizontal direction and the vertical direction to obtain an average gradient value K (bg)_l(x,y))＝-10^-6×(0.7×bg_l(x,y)²+32×bg_l(x,y))+0.07；

J_{L}^{dis} (x, y) = T_{l} (x, y) + T_{c} (x, y);

② 5, calculating { R_disVisualization threshold set of contrast masking effect of (x, y) }, denoted as { T }_c'(x,y)}，T_c'(x,y)＝K(bg_r(x,y))+eh_r(x, y) wherein T_c'(x, y) represents { R_disThe coordinate position in (x, y) is the visual threshold value of the contrast masking effect of the pixel point of (x, y), eh_r(x, y) represents the pair { R_disThe pixel points with the coordinate positions (x, y) in (x, y) are respectively subjected to edge filtering in the horizontal direction and the vertical direction to obtain an average gradient value K (bg)_r(x,y))＝-10^-6×(0.7×bg_r(x,y)²+32×bg_r(x,y))+0.07；

2- (6) pairs of { R_disVisualization threshold set of luminance masking effects of (x, y) } { T_r(x, y) } and a visual threshold set of contrast masking effects { T }_c'(x, y) } to obtain { R }_disBinocular minimum observable of (x, y) }Feeling change image, is recorded as

3. The objective evaluation method for stereo image quality based on visual perception according to claim 1 or 2, characterized in that in the step (c), region detection algorithm is used to obtain { L } respectively_dis(x, y) } and { R }_disThe specific process of the area type of each pixel point in (x, y) } is as follows:

③ 3, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf 255, if so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p 1, then step- (c) -6 is executed, otherwise, step- (c) -4 is executed, wherein x is more than or equal to 1₁≤W,1≤y₁≤H；

③ 4, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not greater thanThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) Of pixel pointsThe area type is marked as p-2, then step- (6) is executed, otherwise, step- (5) is executed;

③ 5, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not it is less than or equal toThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { L }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p being 3;

③ 6, return to the step of ③ 3 to continue to determine L_dis(x, y) } the region types of the remaining pixel points, up to { L }_disDetermining the region types of all pixel points in (x, y) };

③ 9, judgmentThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf 255, then { R } will be_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p 1, then the step- (12) is executed, otherwise, the step- (10) is executed, wherein x is more than or equal to 1₁≤W,1≤y₁≤H；

③ 10, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not greater thanThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { R }_disThe coordinate position in (x, y) } is (x)₁,y₁) The region type of the pixel point is marked as p is 2, and then the step is executedStep three-12, otherwise, executing step three-11;

③ 11, judgeThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointWhether or not it is less than or equal toThe middle coordinate position is (x)₁,y₁) Pixel value of the pixel pointIf so, then { R }_disThe coordinate position in (x, y) } is (x)₁,y₁) The area type of the pixel point is marked as p being 3;

4. The objective evaluation method for stereoscopic image quality based on visual perception according to claim 3, wherein { L ] in the step (c)_org(x,y)}、{R_org(x,y)}、{L_dis(x, y) } and { R }_disThe acquisition process of the local phase characteristic and the local amplitude characteristic of each pixel point in (x, y) } comprises the following steps:

tetra-1, pair { L_orgCarrying out phase consistency transformation on each pixel point in (x, y) to obtain { L }_orgEven symmetric frequency response and odd symmetric frequency response of each pixel point in (x, y) } in different scales and directions are converted into { L }_orgEven symmetric frequency responses of pixel points with coordinate positions (x, y) in different scales and directions are recorded as e_α,θ(x, y) will { L_orgThe sum of pixel points with coordinate positions (x, y) in (x, y) is different in sizeThe odd symmetric frequency response of the direction is noted as o_α,θ(x, y), wherein alpha represents a scale factor of the filter, 1 ≦ alpha ≦ 4, theta represents a direction factor of the filter, 1 ≦ theta ≦ 4;

Wherein,

fourthly-3, according to { L_orgCalculating the direction corresponding to the maximum phase consistency characteristic of each pixel point in (x, y) } and calculating the { L_org(x, y) } local phase features and local amplitude features for each pixel point, for { L_orgThe pixel point with (x, y) coordinate position in (x, y) } is firstly found out the phase consistency characteristic PC of the pixel point in different directions_θFinding out the direction corresponding to the maximum phase consistency characteristic in (x, y), and marking as theta_mAgain according to_θm calculates { L }_orgThe local phase characteristic and the local amplitude characteristic of the pixel point with the coordinate position (x, y) in (x, y) } are respectively and correspondingly marked asAnd

wherein,