JP3907724B2

JP3907724B2 - Image encoding device

Info

Publication number: JP3907724B2
Application number: JP32597195A
Authority: JP
Inventors: 善明鹿喰; 慎一境田; 金子　　豊; 豊田中
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1995-12-14
Filing date: 1995-12-14
Publication date: 2007-04-18
Anticipated expiration: 2015-12-14
Also published as: JPH09163369A

Description

【０００１】
【発明の属する技術分野】
本発明は画像符号化装置に関し、特に分割されたブロック毎に各画像構成要素について処理を行う画像符号化装置に関するものである。
【０００２】
【従来の技術】
従来の画像信号の高能率符号化は、画像信号の水平または垂直方向の相関を利用した直交変換により特定の変換係数に信号電力を集中させて必要ビット数の削減を図るものが多い。この直交変換を用いた高能率符号化では１枚の画像を一定数の画素からなる一定形状の複数の単位ブロック（以下、ブロックと記す）に分割して各ブロック毎に変換を行い、各ブロック内の画素を基本となる基底ベクトルの重み付き線形和として表し、この重みを量子化してこの対応番号を伝送している。
【０００３】
ＭＰＥＧの標準化に代表されるこれまでの画像符号化においては、１枚の画像はその画像内容とは関係なく連続したデータ列としてのみ扱われ、より少ないビット数でより高忠実に画像の波形（パターン）を再現することが目標とされている。このような従来の画像符号化では、通常ＤＣＴ(Discrete Cosine Transform) 等のように基本的に定常性を前提とした処理が用いられている。このため、前景と背景といった画像信号に本質的に内在する非定常性により、符号化効率や復号画質の低下をきたすことになる。また従来、画像の信号処理やハンドリングは、１枚の画像毎に、動画像の場合にはひと続きのフレーム毎に一括して扱われている。すなわち、画像構成要素単位等、より小さな単位で信号処理やハンドリングを行うことは考慮されていない。
【０００４】
そこで、近年、画像信号を各画像構成要素に対応する複数の画像領域に分割した上で処理して符号化を行うための研究が活発になってきている。この符号化では、上述のような非定常性を回避した上で、画像領域毎に最適な処理手法を選択して施すことができるため、より高性能な符号化が可能となる。また、画像構成要素毎に処理が可能となるため、よりハンドリング性のよい画像処理を行うことができる。
【０００５】
従来の画像信号の波形符号化では、ＤＣＴ符号化に代表されるように画像を一定サイズの矩形ブロックに分割して処理を行っていた。しかし、画像領域毎に処理する場合、画像領域形状は一般に矩形にはならないので、従来の手法（画像を一定サイズの矩形ブロックに分割してブロック毎に処理する方式）をそのまま用いることができない。そこで、従来のブロックベースの符号化手法を基本に、任意形状の符号化対象画像領域を構成する画素群に対して処理を行う拡張手法がいくつか提案されている。
【０００６】
任意形状の画像領域の符号化に従来の符号化手法、特に、直交変換を用いる手法は、画像領域の形状からそれに合致した変換基底を導出して用いる手法、および変換対象画像信号を操作して基底自体はＤＣＴなど従来のものをそのまま用いる手法の２つに大別される。
【０００７】
前者の手法として、Gilge による手法、加藤による手法、および松田らによる手法が挙げられる。Gilge は多項式を基本として形状から直交変換の基底を求めている(Gilge M., Engelhardt T. and Mehlan R.：“Coding of Arbitrary Shaped Image Segments Based on a Generalized Orthogonal Transform ”，Signal Processing Image Comm., 1,2, pp.153-180(Oct. 1989))。
【０００８】
また加藤や松田らは、画像の相関モデルおよび画像領域形状からＫＬ（カルーネン・レーベ）変換基底を導出している(Kato Y.，“Segment-based Still Image Coding Arbitrary Shape Orthogonal Transforms”，Proc pcs’91, pp.357-360(Sept. 1991)，松田一朗、伊東晋、宇都宮敏男：“画像の適応的可変ブロック形状ＫＬ変換符号化”，信学論Ｂ−Ｉ，vol.J76-B-I, No.5, pp.399-408(May 1993))。
【０００９】
加藤は同時に、ＫＬ変換基底導出に必要な固有値問題を解く代わりに、ＤＣＴの基底を初期基底として用い、これを画像領域形状内で直交化することにより任意形状ＫＬ変換に類似した基底を持つ任意形状ＤＣＴ（ＡＳ(Arbitrary Shaped)−ＤＣＴ）を提案している。このＡＳ−ＤＣＴでは、符号化対象画像領域の画像形状をもとに導く変換基底の数がその画素数に等しくなるため、変換基底毎に計算される伝送すべき変換係数の数も画素数に等しくなる。
【００１０】
これらの手法により、ほぼ最適な電力集中特性を得ることができる。また、これらの手法では、変換基底は画像領域形状から導出しているので、画像領域形状さえ伝送していれば基底関数を伝送しなくても受信側で再現することができる。
【００１１】
後者の手法として、Sikoraらによる手法、木村による手法、および伊藤らによる手法が挙げられる。SikoraらによるＳＡ(Shape Adaptive)−ＤＣＴ手法（Sikora T. and Makai B.：“Shape-adaptive DCT for generic encoding of video”，ＩＥＥＥ Trans. on Circuits and Systems for Video Technology，5,1(1995))では、まず画像領域の水平方向のライン毎に領域の幅に応じた次数の１次元ＤＣＴを施し、得られた係数を低周波側にシフトして最低周波数係数が垂直に並ぶようにする。次に、同様の操作を垂直方向に対して行う。このＳＡ−ＤＣＴでは、符号化対象画像領域の画素数に一致した次数の１次元ＤＣＴにより各々の変換を行う。したがって、これも最終的には伝送すべき変換係数の数は符号化対象画像領域の画素数に等しくなる。
【００１２】
また、木村、および伊藤らによって、矩形形状の２次元ＤＣＴを用いることにより、矩形ブロック内で符号化対象画像領域に属さない画素については画像領域内画素の平均レベル等で外挿を行う手法（ＲＦ−ＤＣＴ：Region Filling DCT）が提案されている（木村淳一：“ブロック内領域分割による画像符号化の検討”，Proc．PCSJ’95, 3-7,pp.39-40(Oct. 1995) ，伊藤典男、堅田裕之、草尾寛：“任意形状領域に対するＤＣＴ実現手法の検討”，Proc．PCSJ’95, 5-2, pp.77-78(Oct. 1995)) 。
【００１３】
このＲＦ−ＤＣＴでブロック内に２つの画像領域が混在するときには、画像領域毎にブロックＤＣＴを行う。このため変換係数は画像領域毎にブロックの大きさ分だけ（すなわち、２つの画像領域を含んだブロックの画素数だけ）導出される。したがって、両方の画像領域分を併せると、１ブロックに対し２ブロック分の数の変換係数が導出され、これら２ブロック分の変換係数を伝送する必要がある。しかし、この伝送すべき変換係数の数は非圧縮の場合である。
【００１４】
ＲＦ−ＤＣＴでは、見かけ上画素数の倍の変換係数を伝送する必要があり、ＡＳ−ＤＣＴやＳＡ−ＤＣＴ、あるいは画像領域分割を行わず単純にブロックに対しブロックＤＣＴをかける場合に比べて符号化効率上不利であるように見える。しかし、通常ＤＣＴを用いた高能率符号化においては視覚的に重要かつ電力の大半を占める低周波成分に相当する変換係数のみを伝送するため、実効的に伝送する変換係数の数はＳＡ−ＤＣＴなどの場合と殆ど同じである。
【００１５】
【発明が解決しようとする課題】
しかしながら、従来のブロックベースの符号化手法を基本に任意形状の画素群に対して従来のＤＣＴを適用するの拡張手法いくつか提案されているものの、ＡＳ−ＤＣＴにおける基底の導出の際には、固有値問題の解を導出したり、シュミットの直交化法を使用する必要があるため、演算量が膨大なものとなり、あまり大きな形状の画像領域を扱うことは現実的ではないという問題点がある。また、導出された基底は正弦関数ではないので変換は周波数分析としての意味合いは薄く、視覚特性を考慮した量子化を変換係数に施すことが困難であるという問題点がある。
【００１６】
また、ＳＡ−ＤＣＴ手法は１次元ＤＣＴと係数の周波数シフト処理で実現できるため比較的簡単な処理で済むが、周波数シフト処理により画像領域の輪郭形状に応じてライン間の位相がずらされるため、垂直方向の相関が低くなるという問題点がある。また最終的に求められる係数の位置は空間周波数的な意味合いをあまり持たないので、視覚特性の導入が困難になるという問題点がある。さらに、ＲＦ−ＤＣＴ手法は簡易であり、低周波の画像領域内外の連続性はある程度保証されるものの、中・高周波については不連続となり、挿入画素レベルの最適性、ひいては画像領域内画素記述法としての最適性は保証されないという問題点がある。
【００１７】
このように、従来のブロックベースの符号化手法を基本に任意形状の画素群の処理を行う上記した各拡張手法では、符号化効率、計算量、視覚特性の導入を可能とする周波数分析能力の点でいずれも問題点を有している。
【００１８】
そこで本発明は、上記の問題点を解決した画像符号化装置を提供することを目的とする。
【００１９】
【課題を解決するための手段】
上記の課題を解決するために本発明装置では、複数の構成要素からなる画像を所定数の画素で構成される所定寸法の複数のブロックに分割し、該ブロックに含まれる画素信号ベクトルを該ブロックに対応する変換基底ベクトルの重み付け線形和として出力する画像符号化装置において、前記複数の構成要素の境界情報に基づいて、前記複数のブロックのうち前記複数の構成要素の境界を含まないブロックの画素信号ベクトルを予め定められた直交変換基底ベクトルの重み付け線形和として変換出力する第１の変換手段と、前記境界情報に基づいて、前記複数のブロックのうち前記境界を含むブロックにおける前記複数の構成要素に対応する符号化対象の各画像領域に対してそれぞれ、前記直交変換基底ベクトルを該画像領域の形状に合わせて修正した新たな変換基底ベクトルを導出し、前記境界を含むブロックの画素信号ベクトルを当該ブロックにおける画像領域毎に対応する新たな変換基底ベクトルの重み付け線形和として変換出力する第２の変換手段と、前記境界情報に基づいて、前記第１の変換手段からの変換出力と前記第２の変換手段からの変換出力のいずれかを選択的に出力することで符号化された画像信号を直列に出力する選択出力手段とを具備し、前記第２の変換手段は、前記境界を含む各ブロックについて、前記直交変換基底ベクトルの前記各画像領域に対応する成分を保持するとともに、該成分以外の対応しない成分をゼロとして前記直交変換基底ベクトルをマスキングし、マスキングされた各直交変換基底ベクトルを該直交変換基底ベクトルの自乗のノルムで除することによりスカラー倍することにより、前記各画像領域の形状に合わせて修正された前記新たな変換基底ベクトルを導出する構成とした。
【００２１】
また、前記第２の変換手段によって、画像信号ベクトルと前記修正された新たな変換基底ベクトルとの内積を計算することにより前記修正された新たな変換基底ベクトルに対応する係数を求めて各係数の中から絶対値最大の係数を選択し、該絶対値最大の係数に対応する前記直交変換基底ベクトルと該絶対値最大の係数との積を前記画素信号ベクトルから減じることで残差信号ベクトルを導出して該残差信号ベクトルを前記画像信号ベクトルと置き換え、さらに、置き換えられた該画像信号ベクトルと前記修正された新たな変換基底ベクトルとの内積に基づいて新たな絶対値最大の係数の選択および新たな残差信号ベクトルの導出を行うことを前記各画像領域に対応する残差信号電力が予め定められた値以下になるまで繰り返すことにより前記修正された新たな変換基底ベクトルの係数を決定し、前記画像信号ベクトルを前記修正された新たな変換基底ベクトルの重み付け線形和として表現して変換出力する構成とした。
【００２２】
また、前記第２の変換手段によって、画像信号ベクトルと前記修正された新たな変換基底ベクトルとの内積を計算することにより前記修正された新たな変換基底ベクトルに対応する係数を求めて該係数毎に異なる重み付け演算を施した後の絶対値が最大となる前記係数を選択し、該係数に対応する前記直交変換基底ベクトルと該係数との積を前記画素信号ベクトルから減じることで残差信号ベクトルを導出して該残差信号ベクトルを前記画像信号ベクトルと置き換え、さらに、置き換えられた該画像信号ベクトルと前記修正された新たな変換基底ベクトルとの内積に基づいて新たな絶対値最大の係数の選択および新たな残差信号ベクトルの導出を行うことを前記各画像領域に対応する残差信号電力が予め定められた値以下になるまで繰り返すことにより前記修正された新たな変換基底ベクトルの係数を決定し、前記画像信号ベクトルを前記修正された新たな変換基底ベクトルの重み付け線形和として表現して変換出力する構成とした。
【００２３】
また、前記直交変換基底ベクトルは離散コサイン変換基底ベクトルであるように構成した。
【００２４】
【発明の実施の形態】
以下、図面を参照しながら本発明の実施の形態を詳細に説明する。
【００２５】
（第１の実施の形態）
図１は本発明を適用した画像符号化装置の第１の実施の形態の構成を示すブロック図である。
【００２６】
本実施の形態では、処理対象画像は人などの前景と背景とで構成されていて画像構成要素数は２であり、分割された各単位ブロック（以下、ブロックと記す）内の画像領域は最大で２（最小は１）である。また、各画像構成要素が画面内で占める領域を表す領域情報（境界情報）は既知であるものとする。
【００２７】
図１に示す画像符号化装置は、領域情報が入力されるブロック化回路１，処理対象の画像信号が入力されるブロック化回路２，ＤＣＴ(Discrete Cosine Transform；離散コサイン変換）基底供給回路３，サポート領域修正回路４および５，係数導出回路６および７および８，組合せ回路９，判定回路１０，並びに切替え回路１１により構成されている。サポート領域修正回路は、処理対象画像の画像構成要素数だけ必要である。
【００２８】
画像信号はブロック化回路２に、その領域情報はブロック化回路１に供給され、Ｎ×Ｎの一定の寸法の複数のブロックに対するブロック信号およびブロック領域情報に分割される。画素数Ｎの値は一般には８または１６であるが、本実施の形態ではＮ＝８として説明する。
【００２９】
ここで、図２はブロック化された領域情報の一例を示す説明図である。
【００３０】
図２において、画像２０は画像構成要素として人と背景とを含んでおり、水平方向（ｘ方向）に８分割、垂直方向（ｙ方向）に６分割され、４８個の矩形ブロックの領域情報に分割されている。図２中＊を付した１５個のブロック領域情報（（ｘ，ｙ）＝（３，３），（４，１），（４，２），（４，３），（４，４），（４，５），（４，６），（５，１），（５，４），（５，６），（６，１），（６，２），（６，３），（６，４），（６，５））は、各ブロックが人を画像構成要素とする網掛け部分で示す画像領域２１の一部と、白地で示す背景を画像構成要素とする画像領域２２の一部とで構成されることを表している。換言すれば、これらのブロック情報はブロック内に画像構成要素の境界があり、画像が非矩形で構成されることを表している。
【００３１】
その他のブロック領域情報は、各ブロックが人のみ、または背景のみを画像構成要素とすることを表している。換言すれば、これらのブロック情報はブロック内に画像構成要素の境界がなく、画像が矩形で構成されることを表している。図２で示したようにブロック化回路１によってブロック化されたブロック領域情報は、サポート領域修正回路４および５，係数導出回路６，並びに判定回路１０に供給される。サポート領域修正回路４および５では、ＤＣＴ基底供給回路３から供給されたＤＣＴ基底のサポート領域をブロック毎に画像領域形状に応じて修正する。サポート領域修正回路４では２つの画像領域のうち１つの画像領域（たとえば人とする）の領域形状に、サポート領域修正回路５ではもう一方の画像領域（背景とする）の領域形状に対応して修正が行われる。サポート領域修正回路４および５によるサポート領域修正方法については後述する。
【００３２】
また、ブロック化回路２によってブロック化された画像信号は、係数導出回路６および７および８に供給される。人のみで構成されるブロックおよび背景のみで構成されるブロックの画像信号に対しては、係数導出回路６によりＤＣＴ基底供給回路３から供給された２次元ＤＣＴ基底を修正せずに用いて、ＤＣＴ係数の導出およびその符号化を行う。係数導出回路６からの符号化データは、切替え回路１１に供給される。
【００３３】
人の一部と背景の一部とが混在するブロックの画像信号に対しては、係数導出回路７および８により各々サポート領域修正回路４および５から供給される修正されたＤＣＴ基底を用いてＤＣＴ係数の導出およびその符号化を行う。このＤＣＴ係数の導出およびその符号化については後述する。
【００３４】
係数導出回路７および８からの符号化データは、それぞれ組み合わせ回路９に供給される。そして、１つのブロックを構成する２つの画像領域を記述する符号化データが多重された符号化データとされて、組み合わせ回路９から切替え回路１１に供給される。組み合わせ回路９による符号化データの多重は、切替え回路１１によりブロック単位の切替えを行なうために１つのブロック内の２つの画像領域の符号化データをまとめるための処理であり、係数導出回路７および８からの２つの符号化データ列を単に直列に並べるものである。
【００３５】
一方、判定回路１０では、ブロック化回路１から供給された各ブロックのブロック領域情報に基づいて、各ブロック毎に１つの画像領域（人または背景）のみが存在するか２つの画像領域（人および背景）が混在するかを判定する。この判定回路１０からの判定信号は、切替え回路１１に供給される。
【００３６】
切替え回路１１は判定回路１０からの判定信号に基づき、１つの画像領域のみが存在するブロックに対しては係数導出回路６からの符号化データを、２つの画像領域が混在するブロックに対しては組み合わせ回路９からの符号化データを選択的に切替えて圧縮された直列データとして出力する。すなわち、図２において＊を施した２つの画像領域が混在するブロックに対しては組み合わせ回路９からの直列に多重された符号化データを出力し、その他の１つの画像領域から構成されるブロックに対しては係数導出回路６からの重み付け線形和で表される符号化データを出力する。
【００３７】
次に、サポート領域修正回路４および５によるサポート領域修正の方法について説明する。
【００３８】
図３はサポート領域修正回路４および５において処理されるブロック内の符号化対象画像領域の一例を示す説明図である。
【００３９】
本実施の形態では、前述したとおり分割された各ブロックの寸法は８×８（Ｎ＝８）であり、たとえばサポート領域修正回路４により一つの画像構成要素である人を表す網掛け部分の画素群（符号化対象画素数Ｍ＝３１）に対してサポート領域修正を行うものとする。この場合、もう一つのサポート領域修正回路５によって、他の画像構成要素である背景を表す白色で示す画素群に対するサポート領域修正が人を表す網掛け部分の画素群に対するサポート領域修正とは別々に行われる。このため、一つの画像を構成する画像構成要素の数だけサポート領域修正回路が必要になる。
【００４０】
ここでは、説明の便宜上サポート領域修正回路４によるサポート領域修正についてのみ説明し、サポート領域修正回路５によるサポート領域修正についての説明は省略する。ＤＣＴ基底供給回路３からは、予め定められた大きさＮ×Ｎ（Ｎ＝８）の
【００４１】
【外１】

【００４２】
が供給される。
【００４３】
上記ＤＣＴ基底ベクトル（後に説明する図４（ａ）参照、ただし図４では１次元の表現である）のＮ×Ｎ個の各々の成分のうち、符号化対象（サポート）画像領域に対応するＭ個（Ｍ＝３１）の成分を保持し、その他の３３個の成分が０となるように
【００４４】
【外２】

【００４５】
を設定し、符号化対象画像領域を修正する。そして、以下に示す式（１）の演算を行って
【００４６】
【外３】

【００４７】
を求め、マスキングされたベクトルをノルムの自乗で除算することで符号化対象画像構成要素の形状に修正された新たな変換基底ベクトルを算出し、ノルム補正を行う。すなわち、
【００４８】
【数１】

【００４９】
により
【００５０】
【外４】

【００５１】
が求められる。この新たな変換基底ベクトルは、ＤＣＴ基底ベクトルをスカラー倍することで非正規であることを補正されている。
【００５２】
図４は上述したサポート領域修正の例を示す説明図である。図４では、説明の便宜上、２次元のブロックではなく１次元の８画素のブロックに対する例を示している。図４（ａ）は予め定められたＤＣＴ基底ベクトルを、図４（ｂ）はサポート領域修正後のマスキングされたベクトルを、図４（ｃ）はノルム補正後の新たな変換基底ベクトルをそれぞれ表している。
【００５３】
図４中、網掛けされた正方形は符号化対象画素を、白い正方形は符号化対象外の画素を表している。図４（ｂ）に示すサポート領域修正後のマスキングされたベクトルは、ＤＣＴ基底ベクトルの符号化対象外の画素に対応する所定の成分をマスキングされて０とされており、図４（ｃ）に示すノルム補正後の画像構成要素の形状に修正された新たな変換基底ベクトルは、マスキングされたベクトルの符号化対象画素に対応する各成分をスカラー倍したものとなっている。
【００５４】
次に、係数導出回路７および８における係数導出法について説明する。
【００５５】
ここで、
【００５６】
【外５】

【００５７】
を示すブロック化回路２からの画像信号ベクトル（要素数Ｎ×Ｎ）とする。
【００５８】
まず、第１の段階では、上記画像信号ベクトルとノルム補正後の修正された
【００５９】
【外６】

【００６０】
との内積を計算してスカラー関数
【００６１】
【数２】

【００６２】
に変換して修正された
【００６３】
【外７】

【００６４】
を求め、これらの係数の中から次式により絶対値が最大の係数ｃ_k を選択する。
【００６５】
【数３】

【００６６】
次に、第２の段階では、絶対値最大の係数ｃ_k に対応する
【００６７】
【外８】

【００６８】
次式のとおり絶対値が最大の係数ｃ_k と絶対値が最大の係数ｃ_k に対応する上記ＤＣＴ基底ベクトルとの積の成分を上記画像信号ベクトルから減じて除去することで残差信号ベクトルを求め、これを画像信号ベクトルと置き換える。
【００６９】
【数４】

【００７０】
次に、第３の段階では、第２の段階において求めた上記の残差信号ベクトルで置き換えられた画像信号ベクトルと前記したノルム補正後の修正された新たな変換基底ベクトルとの内積を第１の段階と同様に再び計算してスカラー関数に変換して、この関数の係数のうちから第１の段階と同様に新たな絶対値最大の係数を選択する。
【００７１】
次に、第４の段階では、第３の段階において選択された絶対値が最大の係数ｃ_k とこの係数に対応するＤＣＴ基底ベクトルとの積の成分を画像信号ベクトルから減じて除去することで第２の段階と同様に再び新たな残差信号ベクトルを求め、画像信号ベクトルと置き換える。
【００７２】
次に、第５の段階では、第４の段階において求められた残差信号ベクトル、すなわち画像信号ベクトルに基づいて、残差信号ベクトルによる残差信号電力が予め十分小さな値に定められた閾値以下であるかを評価し、閾値以下であればこのときの修正された新たな変換基底ベクトルの係数を採用し、重み付け線形和としてて符号化して変換出力する。ここで残差信号電力の評価は符号化対象画像構成要素分だけとする。すなわち、第２の段階での処理はＮ×Ｎの要素に対して施されるが、目的は符号化対象のＭ個の画素のレベルを表現することにあるので、誤差の評価は残差信号ベクトルの要素のうち、このＭ個に対応する要素の自乗和で行なう。
【００７３】
一方、残差信号電力が十分小さな閾値以下になっていないと判定されたときは再び第３の段階に戻って処理を続行し、第５の段階において残差信号電力が予め十分小さな値に定められた閾値以下であると評価されるまで、第３の段階と第４の段階と第５の段階の処理を繰り返し実行する。
【００７４】
なお、各変換基底ベクトルは直交していないので、一度除かれた成分が他の成分の除去に伴って再び現れることもある。この場合、同一変換基底の係数を累積し、その結果を最終的に採用する。
【００７５】
このように、上記した本実施の形態によれば、比較的小さなハードウェア規模で、高能率かつ画像構成要素単位で画像信号のハンドリングを行うことのできる画像符号化を実現することができる。
【００７６】
図５は近年発表された従来の任意形状符号化と本発明の第１の実施の形態による画像符号化の諸特性を比較して表す説明図である。なお、本発明による符号化手法では、一般的変換基底を領域情報に基づいて画像構成要素の形状に修正した変換基底を導出しているので、本出願人らはこの手法を（ＲＳ(Region Shaped) −ＤＣＴ）と名付けている。
【００７７】
図５において、各特性の良し悪しを表す印は、二重丸は非常に優れていることを、丸は優れていることを、三角は普通であることを、×は劣っていることをそれぞれ表している。
【００７８】
本発明によるＤＣＴを用いたＲＳ−ＤＣＴによる高能率符号化では見かけ上画素数の倍の変換係数が導出されるが、視覚的に重要かつ電力の大半を占める低周波成分に相当する変換係数のみを高い電力集中特性により伝送するため、実効的に伝送する変換係数の数は従来のＳＡ−ＤＣＴなどの場合と同等以下であることが計算機によるシミュレーションにより確認されている。このため、ＲＳ−ＤＣＴはデコーダー演算量が少なくて済み、周波数分析能力が特に優れており、他の特性でも劣る点がないことがわかる。
【００７９】
（他の実施の形態）
本実施の形態は、第１の実施の形態と同様のブロック構成により実現できる。ただし、係数導出回路７および８は、変換係数毎に異なった重み付けを持つ評価関数を用いて係数導出を行うように構成されている点で第１の実施の形態とは異なっている。
【００８０】
すなわち、係数導出回路７および８による係数導出における第１の段階において、たとえば下式のように係数の次数の逆数で重み付けした各係数の絶対値の中から絶対値最大の係数ｃ_k の選択を行ってもよい。
【００８１】
【数５】

【００８２】
重み付けは係数ｃ_k の選択にのみ用い、係数ｃ_k を選択した後の第２の段階以降の処理は第１の実施の形態による処理と同様に行う。この手法により、高周波の係数の出現を抑えることができ、第１の実施の形態と比べてさらに情報量の削減を行なうことができる。
【００８３】
なお、ここでは次数の逆数を重み付けに用いることにより高周波電力を低減したが、どの係数に電力を集中させたいかという目的に応じて任意の重み付けを用いてもよい。
【００８４】
上記した各実施の形態における符号化処理による符号化データは、基本的に従来のブロックベースの復号法によって復号することができる。１ブロックに２つの画像領域があるときは、各々のデータを復号すると各々についてブロック形状の復号画像が得られる。これらから、別途伝送した領域情報に従い、有効な画像信号を画素毎に選択し、両方の画像領域の画像信号が混在した所望の１ブロックの画像信号が得られる。
【００８５】
このように本発明装置では、従来のブロックベースの符号化と同等の復号演算量、周波数分析能力をもたせるために、従来の矩形形状のブロック符号化処理に必要最小限の修正を施している。すなわち、符号化対象画像領域が非矩形のとき、従来の符号化に用いる変換基底のサポート領域をその符号化対象画像領域に合わせるように修正した上で矩形処理することにより、従来符号化との整合性を保ちつつ任意形状画像信号の処理を行なうことができる。本発明装置に係わる変換手法は矩形変換を基本とするが、変換基底として一定の次数のブロックベースの変換基底を修正して用いている点で、符号化対象画像領域毎に変換基底を導出して用いる従来技術の前者の手法と考え方を異にし、基底自体を従来のものを用いる従来技術の後者の手法の２つ目とも考え方を異にし、従来の技術に記載した三つの手法のいずれにも属さない新規な手法である。
【００８６】
【発明の効果】
以上説明したように本発明によれば、従来の符号化に用いている一般的な変換基底をその符号化対象画像領域に合わせるように修正した上で処理することにより従来符号化との整合性を保つことができ、視覚的に重要かつ電力の大半を占める低周波成分に相当する変換係数のみを高い電力集中特性により伝送するため、実効的に伝送する変換係数の数は従来の任意形状ＤＣＴなどの場合と同等以下となり、デコーダー演算量が少なくて済み比較的小さなハードウェアにより実現でき、また周波数分析能力が特に優れているために容易に視覚特性の導入を行うことができるという効果がある。
【図面の簡単な説明】
【図１】本発明を適用した画像符号化装置の第１の実施の形態の構成を示すブロック図である。
【図２】ブロック化された領域情報の一例を示す説明図である。
【図３】サポート領域修正回路４および５において処理されるブロック内の符号化対象画像領域の一例を示す説明図である。
【図４】サポート領域修正の例を示す説明図である。
【図５】近年発表された従来の任意形状符号化と本発明の第１の実施の形態による画像符号化の諸特性を比較して表す説明図である。
【符号の説明】
１，２ブロック化回路
３ＤＣＴ基底供給回路
４，５サポート領域修正回路
６，７，８係数導出回路
９組合わせ回路
１０判定回路
１１切替え回路
２０画像
２１前景
２２背景

[0001]
[Technical field to which the invention belongs]
The present invention relates to an image encoding device, and more particularly to an image encoding device that performs processing on each image constituent element for each divided block.
[0002]
[Prior art]
Conventional high-efficiency coding of image signals often reduces the number of bits required by using an orthogonal transform that exploits the horizontal or vertical correlation of the image signal to concentrate signal power in specific transform coefficients. In high-efficiency coding using such an orthogonal transform, one image is divided into a plurality of unit blocks (hereinafter referred to as blocks) of a fixed shape, each consisting of a fixed number of pixels, and the transform is applied to each block. The pixels in each block are represented as a weighted linear sum of basis vectors, and the weights are quantized and their corresponding index numbers are transmitted.
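The weighted-linear-sum representation described above can be illustrated with a minimal sketch on a one-dimensional block of 8 pixels. The orthonormal DCT-II scaling, the function names (`dct_basis`, `analyze`, `quantize`, `synthesize`), the sample pixel values and the uniform quantizer step are all assumptions made for this example; none of them is taken from the patent.

```python
import math

def dct_basis(n):
    """Return the n orthonormal DCT-II basis vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis

def analyze(block, basis):
    """Each weight is the inner product of the block with one basis vector."""
    return [sum(x * b for x, b in zip(block, vec)) for vec in basis]

def quantize(weights, step):
    """Uniform quantization: each weight is replaced by an integer index."""
    return [round(w / step) for w in weights]

def synthesize(indices, basis, step):
    """Rebuild the block as a weighted linear sum of the basis vectors."""
    n = len(basis[0])
    return [sum(q * step * vec[i] for q, vec in zip(indices, basis))
            for i in range(n)]

block = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical pixel levels
basis = dct_basis(8)
indices = quantize(analyze(block, basis), step=4.0)
decoded = synthesize(indices, basis, step=4.0)
```

Only `indices` would need to be transmitted in this sketch; the decoder is assumed to know the basis and the step size.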
[0003]
In conventional image coding, as typified by the MPEG standards, one image is treated only as a continuous data sequence irrespective of its content, and the goal is to reproduce the waveform (pattern) of the image more faithfully with fewer bits. Such conventional image coding generally uses processing that fundamentally assumes stationarity, such as the DCT (Discrete Cosine Transform). As a result, the non-stationarity inherent in image signals, such as that between foreground and background, degrades coding efficiency and decoded image quality. In addition, image signal processing and handling have conventionally been performed on a whole image at a time or, for moving images, on a whole sequence of frames at a time. That is, no consideration is given to performing signal processing and handling in smaller units, such as individual image components.
[0004]
Therefore, in recent years, research has been actively conducted to divide an image signal into a plurality of image regions corresponding to each image component and to process and encode the image signal. In this encoding, since the above-mentioned non-stationarity can be avoided and an optimum processing method can be selected and applied for each image region, higher-performance encoding is possible. In addition, since processing can be performed for each image component, image processing with better handling can be performed.
[0005]
In conventional waveform coding of image signals, as typified by DCT coding, the image is divided into rectangular blocks of a fixed size for processing. When processing is performed per image region, however, the region shape is generally not rectangular, so the conventional method (dividing the image into rectangular blocks of a fixed size and processing each block) cannot be used as it is. Several extension methods have therefore been proposed that build on conventional block-based coding and process the group of pixels making up an encoding target image region of arbitrary shape.
[0006]
Approaches that apply conventional coding methods, in particular those using an orthogonal transform, to image regions of arbitrary shape fall into two broad classes: those that derive and use a transform basis matched to the shape of the image region, and those that manipulate the image signal to be transformed while using a conventional basis, such as that of the DCT, unchanged.
[0007]
Examples of the former approach include the methods of Gilge, Kato, and Matsuda et al. Gilge derives an orthogonal transform basis from the region shape, starting from polynomials (Gilge M., Engelhardt T. and Mehlan R.: “Coding of Arbitrary Shaped Image Segments Based on a Generalized Orthogonal Transform”, Signal Processing Image Comm., 1,2, pp.153-180 (Oct. 1989)).
[0008]
Kato, and Matsuda et al., derive Karhunen-Loeve (KL) transform bases from an image correlation model and the image region shape (Kato Y., “Segment-based Still Image Coding Arbitrary Shape Orthogonal Transforms”, Proc. PCS'91, pp.357-360 (Sept. 1991); Ichiro Matsuda, Satoshi Ito, Toshio Utsunomiya: “Adaptive Variable Block Shape KL Transform Coding of Images”, Trans. IEICE B-I, vol. J76-B-I, No. 5, pp.399-408 (May 1993)).
[0009]
At the same time, Kato proposed an arbitrary-shape DCT (AS-DCT, Arbitrary Shaped DCT): instead of solving the eigenvalue problem needed to derive the KL transform basis, the DCT basis is used as an initial basis and orthogonalized within the image region shape, which gives a basis similar to that of the arbitrary-shape KL transform. In AS-DCT, the number of transform bases derived from the shape of the encoding target image region equals the number of pixels in that region, so the number of transform coefficients to be transmitted, one per basis, also equals the number of pixels.
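The idea of orthogonalizing a DCT basis inside a region can be sketched as follows for a one-dimensional block. This is only an illustration under stated assumptions: it uses a modified Gram-Schmidt procedure on the DCT vectors restricted to the region, and the function names, the mask and the tolerance `eps` are hypothetical. It is not claimed to reproduce Kato's actual AS-DCT derivation.

```python
import math

def dct_basis(n):
    """Return the n orthonormal DCT-II basis vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis

def orthogonalize_in_region(basis, mask, eps=1e-9):
    """Modified Gram-Schmidt on the basis vectors restricted to the region."""
    region = [i for i, inside in enumerate(mask) if inside]
    ortho = []
    for vec in basis:
        v = [vec[i] for i in region]            # keep only in-region samples
        for u in ortho:                         # remove what earlier vectors cover
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > eps:                          # skip linearly dependent vectors
            ortho.append([a / norm for a in v])
    return ortho

mask = [1, 1, 1, 0, 0, 1, 1, 0]                 # 5 of the 8 pixels are in the region
shape_basis = orthogonalize_in_region(dct_basis(8), mask)
```

In this sketch the number of vectors kept equals the number of in-region pixels, which matches the statement that the number of transform coefficients equals the number of pixels. The nested loops also hint at why the cost grows quickly with region size.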
[0010]
These methods achieve nearly optimal power concentration. Moreover, because the transform basis is derived from the shape of the image region, the receiving side can reproduce the basis functions without their being transmitted, as long as the region shape itself is transmitted.
[0011]
Examples of the latter approach include the methods of Sikora et al., Kimura, and Ito et al. In the SA (Shape Adaptive)-DCT method of Sikora et al. (Sikora T. and Makai B.: “Shape-adaptive DCT for generic encoding of video”, IEEE Trans. on Circuits and Systems for Video Technology, 5, 1 (1995)), a one-dimensional DCT whose order matches the width of the region is first applied to each horizontal line of the image region, and the resulting coefficients are shifted toward the low-frequency side so that the lowest-frequency coefficients line up vertically. The same operation is then performed in the vertical direction. In SA-DCT, each transform is performed by a one-dimensional DCT whose order matches the number of encoding target pixels in the line being transformed. Consequently, here too the number of transform coefficients to be transmitted ultimately equals the number of pixels in the encoding target image region.
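The two passes described above can be sketched as follows. This is a simplified illustration, not Sikora's reference implementation: it assumes an orthonormal DCT-II scaling for every line length, follows the horizontal-then-vertical order given in the text, and uses hypothetical names (`dct_1d`, `sa_dct`) and sample data.

```python
import math

def dct_1d(values):
    """Orthonormal DCT-II whose order equals the number of input samples."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out

def sa_dct(block, mask):
    """Row pass then column pass; None marks positions that hold no coefficient."""
    rows, cols = len(block), len(block[0])
    stage = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        pixels = [block[r][c] for c in range(cols) if mask[r][c]]
        for c, coeff in enumerate(dct_1d(pixels)):   # left-aligned: lowest order first
            stage[r][c] = coeff
    out = [[None] * cols for _ in range(rows)]
    for c in range(cols):
        column = [stage[r][c] for r in range(rows) if stage[r][c] is not None]
        for r, coeff in enumerate(dct_1d(column)):   # top-aligned in the same way
            out[r][c] = coeff
    return out

block = [[10, 12, 0, 0],
         [11, 13, 14, 0],
         [9, 12, 15, 16],
         [0, 0, 13, 15]]
mask = [[1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 1]]
coeffs = sa_dct(block, mask)
```

The number of entries in `coeffs` that are not `None` equals the number of in-region pixels, in line with the coefficient count stated above.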
[0012]
Kimura, and Ito et al., have proposed a method (RF-DCT: Region Filling DCT) that makes it possible to use a rectangular two-dimensional DCT by extrapolating the pixels in the rectangular block that do not belong to the encoding target image region, for example with the average level of the pixels inside the region (Junichi Kimura: “Examination of Image Coding by Region Division in Block”, Proc. PCSJ'95, 3-7, pp.39-40 (Oct. 1995); Norio Ito, Hiroyuki Katata, Hiroshi Kusao: “Examination of DCT Realization Method for Arbitrary Shape Region”, Proc. PCSJ'95, 5-2, pp.77-78 (Oct. 1995)).
[0013]
When two image regions coexist within a block, RF-DCT applies a block DCT to each image region. Transform coefficients are therefore derived for each image region in a number equal to the block size (that is, the number of pixels in the block containing the two image regions). Taking both image regions together, two blocks' worth of transform coefficients are derived for a single block, and these must be transmitted. This count, however, applies to the uncompressed case.
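The filling step and the resulting coefficient count can be sketched as follows. The function name `region_fill`, the 4 x 4 block and the masks are hypothetical, and filling with the mean level is only one of the extrapolation choices the text allows.

```python
def region_fill(block, mask):
    """Replace out-of-region pixels by the mean level of the in-region pixels."""
    inside = [p for row, mrow in zip(block, mask) for p, m in zip(row, mrow) if m]
    mean = sum(inside) / len(inside)
    return [[p if m else mean for p, m in zip(row, mrow)]
            for row, mrow in zip(block, mask)]

block = [[10, 10, 90, 90],
         [10, 10, 90, 90],
         [10, 10, 10, 90],
         [10, 10, 10, 10]]
fg_mask = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 1],
           [0, 0, 0, 0]]
bg_mask = [[1 - m for m in row] for row in fg_mask]

fg_block = region_fill(block, fg_mask)   # foreground levels, rest filled with their mean
bg_block = region_fill(block, bg_mask)   # background levels, rest filled with their mean

# Each filled block would then go through an ordinary block DCT, so the two
# regions together yield two blocks' worth of coefficients for one block.
coefficient_count = 2 * len(block) * len(block[0])
```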
[0014]
RF-DCT thus apparently needs to transmit twice as many transform coefficients as there are pixels, and so appears to be at a disadvantage in coding efficiency compared with AS-DCT, SA-DCT, or simply applying a block DCT to the block without image region division. In practice, however, high-efficiency coding using the DCT normally transmits only the transform coefficients corresponding to the low-frequency components, which are visually important and carry most of the power, so the number of transform coefficients actually transmitted is almost the same as for SA-DCT and similar methods.
[0015]
[Problems to be solved by the invention]
However, although several extension methods have been proposed that apply the conventional DCT to pixel groups of arbitrary shape on the basis of conventional block-based coding, deriving the basis in AS-DCT requires solving an eigenvalue problem or using Schmidt orthogonalization. The amount of computation is therefore enormous, and handling image regions of large size is impractical. In addition, because the derived basis functions are not sinusoidal, the transform has little meaning as a frequency analysis, and it is difficult to apply quantization that takes visual characteristics into account to the transform coefficients.
[0016]
The SA-DCT method can be realized with one-dimensional DCTs and a frequency shift of the coefficients, so the processing is relatively simple. However, the frequency shift displaces the phase between lines according to the contour of the image region, which lowers the correlation in the vertical direction. Furthermore, the positions of the final coefficients carry little meaning in terms of spatial frequency, which makes it difficult to incorporate visual characteristics. The RF-DCT method is simple and guarantees, to some extent, continuity between the inside and outside of the image region at low frequencies, but the signal is discontinuous at middle and high frequencies, so the optimality of the inserted pixel levels, and hence the optimality of the method as a description of the pixels inside the image region, is not guaranteed.
[0017]
Thus, each of the above extension methods, which process pixel groups of arbitrary shape on the basis of conventional block-based coding, has problems in terms of coding efficiency, amount of computation, and the frequency analysis capability that makes it possible to incorporate visual characteristics.
[0018]
Therefore, an object of the present invention is to provide an image encoding device that solves the above-described problems.
[0019]
[Means for Solving the Problems]
To solve the above problems, the device of the present invention is an image encoding device that divides an image composed of a plurality of components into a plurality of blocks of a predetermined size, each composed of a predetermined number of pixels, and outputs the pixel signal vector contained in each block as a weighted linear sum of transform basis vectors corresponding to that block. The device comprises: first transform means for transforming and outputting, based on boundary information of the plurality of components, the pixel signal vector of each block that does not contain a boundary between the components as a weighted linear sum of predetermined orthogonal transform basis vectors; second transform means for deriving, based on the boundary information, for each encoding target image region corresponding to the components in each block that contains the boundary, new transform basis vectors obtained by modifying the orthogonal transform basis vectors to match the shape of that image region, and for transforming and outputting the pixel signal vector of the block containing the boundary as a weighted linear sum of the new transform basis vectors corresponding to each image region in the block; and selective output means for selectively outputting, based on the boundary information, either the transform output of the first transform means or the transform output of the second transform means, thereby outputting the encoded image signal serially. For each block containing the boundary, the second transform means masks the orthogonal transform basis vectors by retaining the components that correspond to each image region and setting the other, non-corresponding components to zero, and scales each masked orthogonal transform basis vector by dividing it by the square of its norm, thereby deriving the new transform basis vectors modified to match the shape of each image region.
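The masking and scaling step can be sketched as follows for a two-dimensional block whose pixels are flattened into one vector. This is a minimal sketch under stated assumptions: the orthonormal separable DCT-II basis, the function names (`dct_basis_2d`, `modify_support`), the guard against an all-zero masked vector and the triangular example region are all choices made for illustration.

```python
import math

def dct_basis_2d(n):
    """n*n separable 2-D DCT-II basis vectors, each flattened to length n*n."""
    def c(k, i):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
    return [[c(u, y) * c(v, x) for y in range(n) for x in range(n)]
            for u in range(n) for v in range(n)]

def modify_support(basis, mask):
    """Mask each basis vector to the region, then divide by its squared norm."""
    modified = []
    for vec in basis:
        masked = [b if inside else 0.0 for b, inside in zip(vec, mask)]
        norm_sq = sum(b * b for b in masked)
        # Assumed guard: a vector that vanishes on the region is left as zeros.
        modified.append([b / norm_sq for b in masked] if norm_sq > 1e-12 else masked)
    return modified

n = 8
basis = dct_basis_2d(n)
# Hypothetical region: the upper-left triangle of the block (36 of 64 pixels).
mask = [1 if x + y < n else 0 for y in range(n) for x in range(n)]
new_basis = modify_support(basis, mask)
```

With this scaling, the inner product of a signal with a modified vector gives the weight by which the original basis vector best matches the signal inside the region, which is what the coefficient derivation relies on.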
[0021]
Further, the second transform means computes the inner product of the image signal vector with each modified new transform basis vector to obtain the coefficient corresponding to that basis vector, selects the coefficient with the largest absolute value, derives a residual signal vector by subtracting from the pixel signal vector the product of that coefficient and the orthogonal transform basis vector corresponding to it, and replaces the image signal vector with the residual signal vector. It then repeats the selection of a new largest-absolute-value coefficient, based on the inner products of the replaced image signal vector with the modified new transform basis vectors, and the derivation of a new residual signal vector, until the residual signal power corresponding to each image region falls to or below a predetermined value. The coefficients of the modified new transform basis vectors are thereby determined, and the image signal vector is expressed and output as a weighted linear sum of the modified new transform basis vectors.
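The iteration described above can be sketched as follows for a one-dimensional block of 8 pixels. This is an illustrative reading of the text, not the patented circuit: the function name `derive_coefficients`, the iteration cap `max_iter`, the sample signal, the mask and the threshold are assumptions. Coefficients selected more than once for the same basis vector are accumulated, and the residual power is measured over the in-region pixels only.

```python
import math

def dct_basis(n):
    """Return the n orthonormal DCT-II basis vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis

def derive_coefficients(signal, mask, basis, threshold, max_iter=10000):
    """Greedy coefficient selection on a masked, norm-corrected basis."""
    modified = []
    for vec in basis:
        masked = [b if inside else 0.0 for b, inside in zip(vec, mask)]
        norm_sq = sum(b * b for b in masked)
        modified.append([b / norm_sq for b in masked])
    coeffs = [0.0] * len(basis)
    residual = list(signal)
    for _ in range(max_iter):
        # Residual power is evaluated over the encoding target pixels only.
        power = sum(r * r for r, inside in zip(residual, mask) if inside)
        if power <= threshold:
            break
        inner = [sum(r * b for r, b in zip(residual, vec)) for vec in modified]
        k = max(range(len(inner)), key=lambda j: abs(inner[j]))
        coeffs[k] += inner[k]                   # accumulate repeated selections
        # Subtract the selected coefficient times the unmodified basis vector.
        residual = [r - inner[k] * b for r, b in zip(residual, basis[k])]
    return coeffs, residual

signal = [52, 55, 61, 66, 70, 61, 64, 73]       # hypothetical pixel levels
mask = [1, 1, 1, 0, 0, 1, 1, 0]                 # hypothetical region (5 pixels)
basis = dct_basis(8)
coeffs, residual = derive_coefficients(signal, mask, basis, threshold=1e-6)
```

A decoder in this sketch would form the ordinary weighted sum of the unmodified basis vectors with `coeffs` and keep only the in-region pixels, since the out-of-region values are not constrained.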
[0022]
Further, the second transform means computes the inner product of the image signal vector with each modified new transform basis vector to obtain the coefficient corresponding to that basis vector, applies a weighting operation that differs from coefficient to coefficient, and selects the coefficient whose weighted absolute value is largest. It derives a residual signal vector by subtracting from the pixel signal vector the product of that coefficient and the orthogonal transform basis vector corresponding to it, and replaces the image signal vector with the residual signal vector. It then repeats the selection of a new largest-absolute-value coefficient, based on the inner products of the replaced image signal vector with the modified new transform basis vectors, and the derivation of a new residual signal vector, until the residual signal power corresponding to each image region falls to or below a predetermined value. The coefficients of the modified new transform basis vectors are thereby determined, and the image signal vector is expressed and output as a weighted linear sum of the modified new transform basis vectors.
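Only the selection rule changes in this variant, which the following sketch isolates. The original Japanese text of the other embodiment mentions the reciprocal of the coefficient order as one example of a weighting; the exact form `1 / (k + 1)` used here is an assumption, since the corresponding equation is not reproduced in this text, and the function name and sample values are hypothetical.

```python
def select_weighted(coefficients, weights):
    """Index of the coefficient whose weighted absolute value is largest."""
    return max(range(len(coefficients)),
               key=lambda k: abs(coefficients[k]) * weights[k])

# Assumed weighting: reciprocal of (order + 1), so that low orders are favoured.
coefficients = [2.0, -3.0, 0.5, 2.9]
weights = [1.0 / (k + 1) for k in range(len(coefficients))]

unweighted_choice = select_weighted(coefficients, [1.0] * len(coefficients))
weighted_choice = select_weighted(coefficients, weights)
```

In this example the unweighted rule picks the order-1 coefficient, while the weighted rule picks the order-0 coefficient. The selected coefficient itself would still be used unweighted when the residual is updated.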
[0023]
The orthogonal transform basis vector is configured to be a discrete cosine transform basis vector.
[0024]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0025]
(First embodiment)
FIG. 1 is a block diagram showing a configuration of a first embodiment of an image encoding apparatus to which the present invention is applied.
[0026]
In this embodiment, the image to be processed consists of a foreground, such as a person, and a background, so the number of image components is 2, and each divided unit block (hereinafter referred to as a block) contains at most 2 image areas (and at least 1). It is also assumed that the area information (boundary information) representing the area that each image component occupies in the picture is known.
[0027]
The image encoding device shown in FIG. 1 comprises a blocking circuit 1 to which the area information is input, a blocking circuit 2 to which the image signal to be processed is input, a DCT (Discrete Cosine Transform) basis supply circuit 3, support area correction circuits 4 and 5, coefficient derivation circuits 6, 7 and 8, a combining circuit 9, a determination circuit 10, and a switching circuit 11. One support area correction circuit is required for each image component of the image to be processed.
[0028]
The image signal is supplied to the blocking circuit 2, and the area information is supplied to the blocking circuit 1, and is divided into block signals and block area information for a plurality of blocks having a fixed size of N × N. The value of the number of pixels N is generally 8 or 16, but in the present embodiment, it is assumed that N = 8.
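The blocking step can be sketched as follows. The same hypothetical function would be applied to both the image signal and the area information. The function name, the 1-based `(x, y)` block keys, the requirement that the picture size be a multiple of N, and the 64 x 48 test picture are assumptions made for this example.

```python
def split_into_blocks(image, n=8):
    """Split a 2-D list into n x n blocks keyed by 1-based (x, y) block index."""
    rows, cols = len(image), len(image[0])
    if rows % n or cols % n:
        raise ValueError("image size must be a multiple of the block size")
    blocks = {}
    for by in range(rows // n):
        for bx in range(cols // n):
            blocks[(bx + 1, by + 1)] = [row[bx * n:(bx + 1) * n]
                                        for row in image[by * n:(by + 1) * n]]
    return blocks

# Hypothetical 64 x 48 picture whose pixel value encodes its position.
image = [[x + 100 * y for x in range(64)] for y in range(48)]
blocks = split_into_blocks(image, n=8)   # 8 blocks across, 6 blocks down
```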
[0029]
Here, FIG. 2 is an explanatory diagram showing an example of the block area information.
[0030]
In FIG. 2, an image 20 contains a person and a background as image components. It is divided into 8 in the horizontal direction (x direction) and 6 in the vertical direction (y direction), giving area information for 48 rectangular blocks. The 15 pieces of block area information marked with * in FIG. 2 ((x, y) = (3, 3), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 4), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)) indicate that each of those blocks consists of part of the image area 21, shown shaded, whose image component is the person, and part of the image area 22, shown in white, whose image component is the background. In other words, this block information indicates that a boundary between image components lies inside the block and that the image areas in the block are non-rectangular.
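Finding the blocks that contain a boundary can be sketched as follows, assuming the area information is available as a per-pixel label map. The function name, the label values (1 for the person, 0 for the background) and the small 16 x 16 map are hypothetical and do not reproduce FIG. 2.

```python
def boundary_blocks(labels, n=8):
    """Return the 1-based (x, y) indices of blocks holding more than one label."""
    marked = []
    for by in range(len(labels) // n):
        for bx in range(len(labels[0]) // n):
            seen = {labels[y][x]
                    for y in range(by * n, (by + 1) * n)
                    for x in range(bx * n, (bx + 1) * n)}
            if len(seen) > 1:                   # two components share this block
                marked.append((bx + 1, by + 1))
    return marked

# Hypothetical 16 x 16 label map: 1 = person, 0 = background.
labels = [[1 if 4 <= x < 8 and y >= 2 else 0 for x in range(16)]
          for y in range(16)]
marked = boundary_blocks(labels, n=8)
```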
[0031]
The remaining block area information indicates that each block contains only the person or only the background as its image component. In other words, it indicates that no boundary between image components lies inside the block and that the image area is rectangular. The block area information produced by the blocking circuit 1, as shown in FIG. 2, is supplied to the support area correction circuits 4 and 5, the coefficient derivation circuit 6, and the determination circuit 10. The support area correction circuits 4 and 5 correct, block by block, the support area of the DCT basis supplied from the DCT basis supply circuit 3 according to the shape of the image area. The support area correction circuit 4 performs the correction for the shape of one of the two image areas (for example, the person), and the support area correction circuit 5 for the shape of the other image area (the background). The support area correction method used by the support area correction circuits 4 and 5 will be described later.
[0032]
The image signal divided into blocks by the blocking circuit 2 is supplied to the coefficient derivation circuits 6, 7 and 8. For the image signal of a block consisting only of the person or only of the background, the coefficient derivation circuit 6 derives the DCT coefficients using the two-dimensional DCT basis supplied from the DCT basis supply circuit 3 without modification, and encodes them. The encoded data from the coefficient derivation circuit 6 is supplied to the switching circuit 11.
[0033]
For the image signal of a block in which a part of the person and a part of the background are mixed, the coefficient derivation circuits 7 and 8 derive and encode the DCT coefficients using the corrected DCT bases supplied from the support area correction circuits 4 and 5, respectively. The derivation of the DCT coefficients and their encoding will be described later.
[0034]
The encoded data from the coefficient derivation circuits 7 and 8 are each supplied to the combinational circuit 9. There, the encoded data describing the two image areas that make up one block are multiplexed into a single piece of encoded data, which is supplied from the combinational circuit 9 to the switching circuit 11. This multiplexing merely gathers the encoded data of the two image areas in one block so that the switching circuit 11 can switch block by block; the outputs of the coefficient derivation circuits 7 and 8 are simply arranged in series.
[0035]
Meanwhile, the determination circuit 10 determines, for each block, based on the block area information supplied from the blocking circuit 1, whether the block contains only one image area (the person or the background) or a mixture of two image areas (the person and the background). The determination signal from the determination circuit 10 is supplied to the switching circuit 11.
[0036]
Based on the determination signal from the determination circuit 10, the switching circuit 11 selects the encoded data from the coefficient derivation circuit 6 for a block containing only one image area and the encoded data from the combinational circuit 9 for a block containing two image areas, and outputs the result as compressed serial data. That is, in FIG. 2, for the blocks marked with *, in which two image areas are mixed, the serially multiplexed encoded data from the combinational circuit 9 is output, and for the other blocks, which consist of a single image area, the encoded data represented as a weighted linear sum from the coefficient derivation circuit 6 is output.
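The interplay of the determination circuit 10 and the switching circuit 11 can be sketched as follows. This is a minimal illustration, not the circuit itself: the per-pixel `label_map`, the dictionaries `data_6` and `data_9` keyed by block position, the raster scan order, and the function names are all assumptions made for the example.

```python
def labels_in_block(label_map, bx, by, n):
    """Set of region labels inside block (bx, by); blocks are 1-indexed as in FIG. 2."""
    return {label_map[y][x]
            for y in range((by - 1) * n, by * n)
            for x in range((bx - 1) * n, bx * n)}

def switch_output(label_map, n, data_6, data_9):
    """Per block, take the data of circuit 6 or circuit 9 and emit one serial stream."""
    out = []
    for by in range(1, len(label_map) // n + 1):
        for bx in range(1, len(label_map[0]) // n + 1):
            # Determination: more than one label means a boundary crosses the block.
            mixed = len(labels_in_block(label_map, bx, by, n)) > 1
            source = data_9 if mixed else data_6
            out.extend(source[(bx, by)])
    return out

# Toy 4 x 4 image split into 2 x 2 blocks: 0 = background, 1 = person.
label_map = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
data_6 = {(1, 1): ["a"], (1, 2): ["c"], (2, 2): ["d"]}  # single-region blocks
data_9 = {(2, 1): ["p", "b"]}  # person data and background data in series
stream = switch_output(label_map, 2, data_6, data_9)
```

Only block (2, 1) contains both labels, so it is the only block whose data comes from the combined person and background streams.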
[0037]
Next, the method of correcting the support area by the support area correction circuits 4 and 5 will be described.
[0038]
FIG. 3 is an explanatory diagram showing an example of an encoding target image area in a block processed by the support area correction circuits 4 and 5.
[0039]
In the present embodiment, the size of each block divided as described above is 8 × 8 (N = 8). Suppose, for example, that the support area correction circuit 4 corrects the support area for the shaded pixel group representing the person, one of the image components (number of encoding target pixels M = 31). In this case, the support area correction for the white pixel group representing the background, the other image component, is performed by the separate support area correction circuit 5, independently of the correction for the shaded pixel group representing the person. For this reason, as many support area correction circuits are required as there are image components constituting one image.
[0040]
Here, for convenience of explanation, only the support area correction by the support area correction circuit 4 is described, and the description of the correction by the support area correction circuit 5 is omitted. The DCT base supply circuit 3 supplies DCT basis vectors [Outside 1] of a predetermined size N × N (N = 8).
[0043]
Of the N × N components of each DCT basis vector (see FIG. 4A, described later; FIG. 4 is a one-dimensional representation), the M components (M = 31) corresponding to the encoding target (support) image region are kept and the other 33 components are set to 0, so that the basis vector is corrected to the masked vector [Outside 2] matching the encoding target image area. Then the following equation (1) is calculated with [Outside 3]: the masked vector is divided by the square of the norm, which yields a new transform basis vector corrected to the shape of the encoding target image component. This is the norm correction. That is,
[Expression 1]
gives [Outside 4]. This new transform basis vector is obtained by multiplying the DCT basis vector by a scalar, and as a result of the correction it is not normalized.
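The equation images for this step are not reproduced in this text, so the following is only a reconstruction from the prose, under stated assumptions. The symbols are chosen here for illustration and are not the patent's own notation: $b_k$ is the k-th DCT basis vector, $S$ the set of the M pixel positions to be encoded, $\tilde{b}_k$ the masked vector, and $b'_k$ the new transform basis vector. The norm is assumed to be that of the masked vector.

```latex
\tilde{b}_k(i) =
\begin{cases}
  b_k(i), & i \in S \\
  0,      & i \notin S
\end{cases}
\qquad
b'_k = \frac{\tilde{b}_k}{\lVert \tilde{b}_k \rVert^{2}},
\qquad
\lVert \tilde{b}_k \rVert^{2} = \sum_{i \in S} b_k(i)^{2}
```

Under this reading, $\langle \tilde{b}_k, b'_k \rangle = 1$, so the inner product of a signal with $b'_k$ directly gives the least-squares weight of $b_k$ over the M pixels of $S$.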
[0052]
FIG. 4 is an explanatory diagram showing an example of the support area correction described above. For convenience of explanation, FIG. 4 shows a one-dimensional 8-pixel block instead of a two-dimensional block. FIG. 4A shows a predetermined DCT basis vector, FIG. 4B shows the masked vector after support area correction, and FIG. 4C shows the new transform basis vector after norm correction.
[0053]
In FIG. 4, shaded squares represent pixels to be encoded, and white squares represent pixels that are not to be encoded. The masked vector after support area correction shown in FIG. 4B is obtained by masking, that is, setting to 0, the components of the DCT basis vector that correspond to pixels not to be encoded. The new transform basis vector after norm correction shown in FIG. 4C, corrected to the shape of the image component, is obtained by multiplying each component of the masked vector corresponding to a pixel to be encoded by a scalar.
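The one-dimensional case of FIG. 4 can be sketched in a few lines. The sketch assumes an orthonormal DCT-II basis and takes the norm to be that of the masked vector. The mask pattern is hypothetical, since the actual pattern of FIG. 4 is not given in the text, and the function names are illustrative only.

```python
import math

def dct_basis_vector(k, n):
    """k-th orthonormal 1-D DCT-II basis vector of length n (assumed DCT variant)."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]

def correct_support(basis_vec, mask):
    """Mask the basis vector to the support, then divide by its squared norm."""
    # Support area correction: keep components on the support, zero the rest.
    masked = [b if m else 0.0 for b, m in zip(basis_vec, mask)]
    norm_sq = sum(v * v for v in masked)
    if norm_sq < 1e-12:
        raise ValueError("basis vector vanishes on the support")
    # Norm correction: scalar multiplication of the masked vector.
    corrected = [v / norm_sq for v in masked]
    return masked, corrected

# Hypothetical support: the first five of eight pixels are to be encoded.
mask = [True, True, True, True, True, False, False, False]
original = dct_basis_vector(1, 8)                     # corresponds to FIG. 4A
masked, corrected = correct_support(original, mask)   # FIG. 4B and FIG. 4C
```

The corrected vector is zero outside the support, and its inner product with the masked vector is 1, which is what lets a plain inner product with the signal serve as the weight in the coefficient derivation.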
[0054]
Next, the coefficient derivation method in the coefficient derivation circuits 7 and 8 will be described.
[0055]
Here, [Outside 5] denotes the image signal vector (N × N elements) from the blocking circuit 2.
[0058]
First, in the first stage, the inner product of the image signal vector and each norm-corrected new transform basis vector [Outside 6] is calculated,
[Expression 2]
and the image signal vector is thereby converted into the scalar coefficients [Outside 7]. From these coefficients, the coefficient c_k having the maximum absolute value is selected.
[Equation 3]
[0066]
Next, in the second stage, with [Outside 8] corresponding to the coefficient c_k having the maximum absolute value, the product of c_k and the DCT basis vector corresponding to c_k is subtracted from the image signal vector to obtain a residual signal vector, which then replaces the image signal vector.
[Expression 4]
[0070]
Next, in the third stage, the inner product of the image signal vector, now replaced with the residual signal vector obtained in the second stage, and each norm-corrected new transform basis vector is calculated again in the same manner as in the first stage to obtain scalar coefficients, and a new coefficient having the maximum absolute value is selected from these coefficients, also as in the first stage.
[0071]
Next, in the fourth stage, the coefficient selected in the third stage is taken as the coefficient c_k having the maximum absolute value, and the product of this coefficient and the DCT basis vector corresponding to it is subtracted from the image signal vector. A new residual signal vector is thus obtained in the same manner as in the second stage and again replaces the image signal vector.
[0072]
Next, in the fifth stage, it is evaluated whether the residual signal power of the residual signal vector obtained in the fourth stage, that is, of the current image signal vector, is equal to or less than a threshold set in advance to a sufficiently small value. If it is, the coefficients of the corrected new transform basis vectors at this point are adopted, and the image signal vector is encoded and output as a weighted linear sum. Here, the residual signal power is evaluated only over the pixels of the encoding target image component. That is, although the processing in the second stage is performed on N × N elements, its purpose is to represent the levels of the M pixels to be encoded, so the residual signal power is the sum of squares of the M elements of the residual signal vector that correspond to those pixels.
[0073]
On the other hand, when the residual signal power is determined not to be below the sufficiently small threshold, the process returns to the third stage, and the third, fourth, and fifth stages are repeated until the fifth stage finds the residual signal power to be equal to or less than the threshold.
[0074]
In addition, since the transform basis vectors are not orthogonal to one another, a component that has been removed once may reappear when another component is removed. In this case, the coefficients for the same transform basis vector are accumulated, and the accumulated result is finally adopted.
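The five stages above amount to a greedy, iterative selection loop, which can be sketched for a one-dimensional 8-pixel block as follows. This is an illustration under assumptions: the orthonormal DCT-II basis, the threshold value, the iteration cap, the sample signal, the mask, and all function names are chosen for the example, and the loop tests the residual power before each selection rather than after each subtraction.

```python
import math

def dct_basis(n):
    """Orthonormal 1-D DCT-II basis (assumed variant); basis[k][i]."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                      for i in range(n)])
    return basis

def norm_correct(vec, mask):
    """Masked vector divided by its squared norm (assumed non-zero on the support)."""
    masked = [v if m else 0.0 for v, m in zip(vec, mask)]
    norm_sq = sum(v * v for v in masked)
    return [v / norm_sq for v in masked]

def derive_coefficients(signal, mask, threshold=1e-6, max_iter=10000):
    n = len(signal)
    basis = dct_basis(n)
    corrected = [norm_correct(b, mask) for b in basis]
    residual = list(signal)
    coeffs = [0.0] * n
    for _ in range(max_iter):
        # Residual power is evaluated over the encoding target pixels only.
        power = sum(r * r for r, m in zip(residual, mask) if m)
        if power <= threshold:
            break
        # Inner products with the corrected basis vectors give the candidate weights.
        inner = [sum(r * v for r, v in zip(residual, c)) for c in corrected]
        k = max(range(n), key=lambda j: abs(inner[j]))
        # Accumulate, because a removed component may reappear later.
        coeffs[k] += inner[k]
        # Subtract the selected weight times the unmodified DCT basis vector.
        residual = [r - inner[k] * b for r, b in zip(residual, basis[k])]
    return coeffs, residual

signal = [52.0, 55.0, 61.0, 66.0, 70.0, 200.0, 198.0, 201.0]
mask = [True, True, True, True, True, False, False, False]
coeffs, residual = derive_coefficients(signal, mask)
```

The weighted sum of the ordinary DCT basis vectors with `coeffs` then reproduces the signal on the five masked pixels to within the threshold, while the values it produces on the other three pixels are left unconstrained.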
[0075]
As described above, the present embodiment realizes, with a relatively small hardware scale, image coding that is highly efficient and can handle an image signal in units of image components.
[0076]
FIG. 5 is an explanatory diagram comparing various characteristics of recently published conventional arbitrary-shape coding methods with those of the image coding according to the first embodiment of the present invention. Because the coding method according to the present invention derives a transform basis by modifying a general transform basis to the shape of the image component based on the region information, the applicants call this method RS (Region Shaped)-DCT.
[0077]
In FIG. 5, the marks indicating the quality of each characteristic are as follows: a double circle means very good, a circle means good, a triangle means average, and a cross (×) means inferior.
[0078]
In the high-efficiency coding by RS-DCT using the DCT according to the present invention, transform coefficients numbering apparently twice the number of pixels are derived. However, because power is highly concentrated and only the transform coefficients corresponding to the visually important low-frequency components, which account for most of the power, are transmitted, computer simulation has confirmed that the number of transform coefficients that actually need to be transmitted is equal to or less than that of the conventional SA-DCT. Accordingly, RS-DCT requires only a small amount of decoder computation, is particularly excellent in frequency analysis capability, and is not inferior in the other characteristics.
[0079]
(Other embodiments)
The present embodiment can be realized with the same block configuration as the first embodiment. It differs from the first embodiment, however, in that the coefficient derivation circuits 7 and 8 are configured to derive coefficients using an evaluation function with a different weight for each transform coefficient.
[0080]
That is, in the first stage of the coefficient derivation by the coefficient derivation circuits 7 and 8, the coefficient c_k may be selected, for example, as the one whose absolute value is the largest after each coefficient is weighted by the reciprocal of its order, as shown in the following equation.
[0081]
[Equation 5]

[0082]
The weighting is used only to select the coefficient c_k; the processing from the second stage onward, after c_k has been selected, is the same as in the first embodiment. This method suppresses the appearance of high-frequency coefficients and reduces the amount of information further than the first embodiment.
[0083]
Although the reciprocal of the order is used here as the weight in order to reduce high-frequency power, any weighting may be used depending on where the power is to be concentrated.
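The weighted selection can be sketched as a small change to the selection step alone. The equation image is not reproduced in this text, so the weight 1/(k + 1), which counts the order from 1 to avoid dividing by zero for the DC term, is an assumption, as are the function names and sample values.

```python
def select_coefficient(coeffs, weights=None):
    """Index of the coefficient with the largest weighted absolute value."""
    if weights is None:
        weights = [1.0] * len(coeffs)  # first embodiment: no weighting
    return max(range(len(coeffs)), key=lambda k: abs(coeffs[k]) * weights[k])

def reciprocal_order_weights(n):
    # Assumption: the order is counted from 1 so that the DC term has weight 1.
    return [1.0 / (k + 1) for k in range(n)]

candidates = [2.0, -3.0, 2.5]
plain_choice = select_coefficient(candidates)
weighted_choice = select_coefficient(candidates, reciprocal_order_weights(3))
```

With these sample values the unweighted rule picks index 1, while the weighted rule picks the lower-order index 0. The weight only affects which index is chosen; the value that is accumulated and subtracted afterwards remains the unweighted coefficient.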
[0084]
The encoded data obtained by the encoding process in each of the above embodiments can basically be decoded by a conventional block-based decoding method. When one block contains two image areas, decoding each set of data yields a block-shaped decoded image for each. From these, the valid image signal is selected pixel by pixel in accordance with the separately transmitted region information, giving the desired one-block image signal in which the image signals of both image areas are mixed.
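The decoding side can be sketched in one dimension as two ordinary inverse transforms followed by a per-pixel selection. The orthonormal DCT-II basis, the coefficient values, the mask, and the function names are assumptions made for illustration, and entropy decoding and dequantization are not shown.

```python
import math

def dct_basis(n):
    """Orthonormal 1-D DCT-II basis (assumed variant); basis[k][i]."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                      for i in range(n)])
    return basis

def reconstruct(coeffs):
    """Conventional block decoding: weighted linear sum of the DCT basis vectors."""
    n = len(coeffs)
    basis = dct_basis(n)
    return [sum(coeffs[k] * basis[k][i] for k in range(n)) for i in range(n)]

def merge_by_region(decoded_a, decoded_b, mask_a):
    """Take each pixel from block A where the region information says so, else from B."""
    return [a if m else b for a, b, m in zip(decoded_a, decoded_b, mask_a)]

# Toy data: a DC-only person region at level 10 and a background at level 50.
person_coeffs = [10.0 * math.sqrt(8.0)] + [0.0] * 7
background_coeffs = [50.0 * math.sqrt(8.0)] + [0.0] * 7
person_mask = [True, True, True, False, False, False, False, False]
block = merge_by_region(reconstruct(person_coeffs),
                        reconstruct(background_coeffs),
                        person_mask)
```

Each decoded block is a full 8-pixel signal. The region information only decides which of the two decoded values is valid at each position.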
[0085]
As described above, in the apparatus according to the present invention, only the minimum necessary modification is applied to the conventional rectangular block encoding process so as to retain the same decoding computation amount and frequency analysis capability as conventional block-based encoding. That is, when the encoding target image area is non-rectangular, the support area of the transform basis used in conventional encoding is modified to match the encoding target image area, so that an image signal of arbitrary shape can be processed while consistency with rectangular processing is maintained. The transform method of the apparatus of the present invention is based on a rectangular transform. In that it uses a block-based transform basis of fixed order as the basis of the transform, it differs from the former prior-art approach, which derives a transform basis for each encoding target image area. It also differs from the second method of the latter prior-art approach, which uses a conventional basis as it is. It is thus a new technique that belongs to none of the three methods described in the prior art.
[0086]
[Effect of the invention]
As described above, according to the present invention, the general transform basis used in conventional encoding is modified to match the encoding target image area and then processed so as to remain consistent with conventional encoding. Because power is highly concentrated and only the transform coefficients corresponding to the visually important low-frequency components, which account for most of the power, are transmitted, the number of transform coefficients that actually need to be transmitted is equal to or less than that of the conventional arbitrary-shape DCT. As a result, the amount of decoder computation is small, the device can be realized with relatively small hardware, and the frequency analysis capability is particularly excellent, so that visual characteristics can easily be incorporated.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a first embodiment of an image encoding device to which the present invention has been applied.
FIG. 2 is an explanatory diagram showing an example of block area information;
FIG. 3 is an explanatory diagram showing an example of an encoding target image area in a block processed in the support area correction circuits 4 and 5;
FIG. 4 is an explanatory diagram showing an example of support area correction;
FIG. 5 is an explanatory diagram for comparing various characteristics of a conventional arbitrary shape coding recently announced and image coding according to the first embodiment of the present invention.
[Explanation of symbols]
1, 2 Blocking circuit
3 DCT base supply circuit
4, 5 Support area correction circuit
6, 7, 8 Coefficient derivation circuit
9 Combinational circuit
10 Determination circuit
11 Switching circuit
20 Image
21 Foreground
22 Background

Claims

1. An image encoding device that divides an image composed of a plurality of components into a plurality of blocks of a predetermined size, each composed of a predetermined number of pixels, and converts and outputs the pixel signal vector contained in each block as a weighted linear sum of transform basis vectors corresponding to the block, the device comprising:
first conversion means for converting and outputting, based on boundary information of the plurality of components, the pixel signal vector of a block among the plurality of blocks that does not include a boundary of the plurality of components as a weighted linear sum of predetermined orthogonal transform basis vectors;
second conversion means for deriving, based on the boundary information, for each encoding target image area corresponding to the plurality of components in a block among the plurality of blocks that includes the boundary, new transform basis vectors obtained by modifying the orthogonal transform basis vectors to match the image area, and for converting and outputting the pixel signal vector of the block including the boundary as a weighted linear sum of the new transform basis vectors corresponding to each image area in the block; and
selection output means for outputting an encoded image signal in series by selectively outputting, based on the boundary information, either the conversion output from the first conversion means or the conversion output from the second conversion means,
wherein the second conversion means derives, for each block including the boundary, the new transform basis vectors modified to the shape of each image area by masking the orthogonal transform basis vectors so that the components corresponding to the image area are retained and the other, non-corresponding components are set to zero, and by dividing each masked orthogonal transform basis vector by the square of the norm of that orthogonal transform basis vector.

2. The image encoding device according to claim 1, wherein the second conversion means calculates the inner product of the image signal vector and each modified new transform basis vector to obtain the coefficient corresponding to that modified new transform basis vector; selects the coefficient having the maximum absolute value after applying a different weighting operation to each coefficient; derives a residual signal vector by subtracting from the pixel signal vector the product of that coefficient and the orthogonal transform basis vector corresponding to it; replaces the image signal vector with the residual signal vector; repeats the selection of a new coefficient having the maximum absolute value, based on the inner product of the replaced image signal vector and the modified new transform basis vectors, and the derivation of a new residual signal vector until the residual signal power corresponding to each image area is equal to or less than a predetermined value, thereby determining the coefficients of the modified new transform basis vectors; and outputs the image signal vector expressed as a weighted linear sum of the modified new transform basis vectors.

3. The image encoding device according to claim 1, wherein the second conversion means calculates the inner product of the image signal vector and each modified new transform basis vector to obtain the coefficient corresponding to that modified new transform basis vector; selects from these coefficients the coefficient having the maximum absolute value; derives a residual signal vector by subtracting from the pixel signal vector the product of the coefficient having the maximum absolute value and the orthogonal transform basis vector corresponding to it; replaces the image signal vector with the residual signal vector; repeats the selection of a new coefficient having the maximum absolute value, based on the inner product of the replaced image signal vector and the modified new transform basis vectors, and the derivation of a new residual signal vector until the residual signal power corresponding to each image area is equal to or less than a predetermined value, thereby determining the coefficients of the modified new transform basis vectors; and outputs the image signal vector expressed as a weighted linear sum of the modified new transform basis vectors.

4. The image encoding device according to claim 1, wherein the orthogonal transform basis vector is a discrete cosine transform basis vector.