KR20000031963A

KR20000031963A - Half-band subband DCT/IDCT circuit using RAC

Info

Publication number: KR20000031963A
Application number: KR1019980048235A
Authority: KR
Inventors: 차진종; 김익균; 김기철
Original assignee: 정선종; 한국전자통신연구원
Priority date: 1998-11-11
Filing date: 1998-11-11
Publication date: 2000-06-05
Also published as: KR100306745B1

Abstract

PURPOSE: A half-band subband DCT/IDCT (discrete cosine transform/inverse discrete cosine transform) circuit using RACs (ROM and accumulator in cascade) is provided that uses DCTs of the same size in both the forward and inverse directions, thereby improving hardware efficiency. CONSTITUTION: The circuit comprises a multiplexer (100), an SPC (serial-to-parallel converter; 200), three RACs (300), a butterfly unit (400), and a transposition memory (500). The multiplexer (100) selects either the external input or the output of the transposition memory (500). The SPC (200) consists of eight registers (201); it receives the image data serially from the multiplexer (100) and outputs it in parallel. The RACs (300) receive the parallel data from the SPC (200) and perform the half-band subband DCT. The butterfly unit (400) consists of eight registers (401) and one adder/subtracter (430) and performs the butterfly network function.

Description

Half-Band Subband DCT/IDCT Circuit Using RAC and Method Thereof

The present invention relates to an 8x8 DCT/IDCT (Discrete Cosine Transform/Inverse Discrete Cosine Transform) circuit whose algorithm and structure can be used efficiently in low-bit-rate video systems.

In applications that require image compression, the signal spectrum of the input image often has nearly all of its energy in a particular frequency band, and the energy elsewhere is negligible. In such cases the amount of computation can be reduced drastically by computing only the subband in which the energy is concentrated and skipping the remaining bands.

A newer DCT algorithm, the subband DCT, enables high-quality transmission in low-bit-rate video systems while requiring less computation.

FIG. 1 is a block diagram of a conventional two-band subband DCT circuit. As shown, an N-point input sequence X(n) is fed to a subband decomposition unit (1), which produces N/2-point low-band samples (X_L) and N/2-point high-band samples (X_H). The low-band samples (X_L) are transformed by an N/2-point DCT (2), and the high-band samples (X_H) by an N/2-point DST (Discrete Sine Transform) (3). The DCT and DST results are fed to a weighting network (4), which outputs the final N-point DCT result C(k).
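The text does not give the formulas behind FIG. 1. The sketch below is one common formulation of the two-band subband DCT and is an assumption, not taken from this document: the decomposition is taken to be pairwise averaging and differencing, the DCT is left unnormalized, and the weighting network multiplies by `2*cos(k*pi/(2N))` and `2*sin(k*pi/(2N))`. Under those assumptions the recombined result equals the direct N-point DCT.

```python
import math

def dct(x):
    """Unnormalized N-point DCT: C(k) = sum_n x[n] * cos((2n+1) k pi / (2N))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                for n in range(n_pts)) for k in range(n_pts)]

def two_band_dct(x):
    """N-point DCT rebuilt from low/high subbands (assumed average/difference split)."""
    n_pts = len(x)
    half = n_pts // 2
    x_low = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(half)]
    x_high = [(x[2 * n] - x[2 * n + 1]) / 2 for n in range(half)]
    out = []
    for k in range(n_pts):
        # N/2-point cosine sum on the low band, sine sum on the high band
        c_low = sum(x_low[n] * math.cos((2 * n + 1) * k * math.pi / n_pts)
                    for n in range(half))
        s_high = sum(x_high[n] * math.sin((2 * n + 1) * k * math.pi / n_pts)
                     for n in range(half))
        theta = k * math.pi / (2 * n_pts)
        # Weighting network: combine the two subband transforms
        out.append(2 * math.cos(theta) * c_low + 2 * math.sin(theta) * s_high)
    return out
```

For example, `two_band_dct([3, 1, 4, 1, 5, 9, 2, 6])` agrees with `dct` of the same list up to floating-point rounding.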

The main advantages of the subband DCT are that an N-point DCT is expressed in terms of a smaller DCT and DST, which reduces the required computation, and that the subband concept makes it possible to apply an approximate DCT computation that is useful for image compression.

When most of the energy of the video signal is concentrated in the low-frequency range and the high-pass band is not computed, the amount of computation is almost the same as that of an N/2-point DCT. This form of DCT computation is called the half-band subband DCT.

FIG. 2 is a block diagram of the conventional half-band subband DCT.

An N-point input sequence X(n) is fed to a subband decomposition unit (21), which produces N/2-point low-band samples (X_L); these are transformed by an N/2-point DCT (22). The DCT result is fed to a weighting network (23), which outputs the final N-point DCT result C(k).
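A minimal sketch of the half-band path of FIG. 2 follows. It is an illustration under assumptions the text does not state: the low band is the pairwise average of the input, the DCT is unnormalized, the weighting factor is `2*cos(k*pi/(2N))`, and the upper N/2 coefficients are simply set to zero. The function names are hypothetical.

```python
import math

def dct(x):
    """Unnormalized DCT: C(k) = sum_n x[n] * cos((2n+1) k pi / (2 len(x)))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                for n in range(n_pts)) for k in range(n_pts)]

def half_band_dct(x):
    """Approximate N-point DCT using only the low band (high band skipped)."""
    n_pts = len(x)
    half = n_pts // 2
    x_low = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(half)]  # decomposition
    c_low = dct(x_low)                                            # N/2-point DCT
    out = [0.0] * n_pts                                           # upper half stays zero
    for k in range(half):
        out[k] = 2 * math.cos(k * math.pi / (2 * n_pts)) * c_low[k]  # weighting
    return out
```

When neighbouring samples are equal in pairs, as in `[10, 10, 12, 12, 14, 14, 16, 16]`, the high band is zero and the first N/2 coefficients match the direct DCT; for general inputs the result is only an approximation.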

When the half-band subband DCT is implemented in hardware, the size of the DCT actually used differs between the forward and inverse directions, and this must be dealt with. For N = 8, the forward direction requires a 4-point DCT, whereas the inverse direction requires an 8-point IDCT. Implementing the forward and inverse DCTs as separate, independent modules avoids the mismatch, but hardware efficiency then drops severely.

The present invention addresses this size mismatch by using DCTs of the same size in both the forward and inverse directions. Specifically, the 8-point IDCT is carried out as two 4-point operations on two RACs, so that the forward 4-point DCT and the 8-point IDCT can be handled by a single device.

Using the same DCT size for both directions makes it possible to reuse the same hardware: the hardware module used for the 4-point DCT can also be used in the inverse direction. In addition, if the 4-point IDCT operations needed for the 8-point IDCT are carried out sequentially rather than simultaneously, the same hardware is reused even more. The structure used for the 4-point DCT should therefore be one that maximizes hardware reuse across different DCT operations.

Distributed arithmetic is one way to maximize hardware reuse while performing different DCT operations. It is widely used in signal processing and very often in DCT implementations. It is briefly described below.

Assume that a variable X is obtained from the variables Y_0, Y_1, Y_2, Y_3 by the following equation.

$$X = C_0 Y_0 + C_1 Y_1 + C_2 Y_2 + C_3 Y_3 \qquad (1)$$

When the variables Y_0, Y_1, Y_2, Y_3 are represented as n-bit two's complement numbers, each Y_i, 0 ≤ i ≤ 3, can be written as follows, where y_i^k is bit k of Y_i.

$$Y_i = -2^{n-1} y_i^{n-1} + 2^{n-2} y_i^{n-2} + \cdots + 2 y_i^{1} + y_i^{0}$$

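X can then be expressed as follows, by substituting the bit expansion of each Y_i into equation (1).

$$X = -2^{n-1} \sum_{i=0}^{3} y_i^{n-1} C_i + \sum_{k=0}^{n-2} 2^{k} \sum_{i=0}^{3} y_i^{k} C_i \qquad (2)$$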

In the expression above, if the values of $\sum_{i=0}^{3} y_i^{k} C_i$ are computed in advance for every possible combination of the bits $y_i^{k}$ and stored in a ROM, the value of $2^{k} \sum_{i=0}^{3} y_i^{k} C_i$ is easily obtained. Accumulating these partial values one after another yields the desired result. FIG. 3 shows a circuit that computes equation (2). A ROM connected to an accumulator in this way is called an RAC (ROM and accumulator in cascade). The output of the ROM (31), which is addressed by the 4-point inputs Y_0, Y_1, Y_2, Y_3, passes through an adder/subtracter (32) with feedback through a shift register (33), so that the input values are multiplied by the cosine matrix on the right-hand side of equations (3) and (4) below and the result is output.

This is a conventional configuration. The present invention is characterized by using three such RACs: one RAC handles the 4-point forward DCT, and the other two 4-point RACs handle the 8-point IDCT.
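The ROM-and-accumulator idea can be sketched in software as follows. This is a simplified behavioural model, not the circuit of FIG. 3: it processes one bit plane per iteration, uses integer coefficients for exact arithmetic, and scales by `2**k` directly where hardware would use a shifter. The function names `build_rom` and `rac` are hypothetical.

```python
def build_rom(coeffs):
    """ROM[addr] = sum of coeffs[i] for every i whose bit is set in addr."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]

def rac(rom, ys, n_bits):
    """Compute sum_i C_i * Y_i for n-bit two's complement inputs, bit-serially."""
    acc = 0
    for k in range(n_bits):
        # Form the ROM address from bit k of every input.
        # Python's >> on negative ints sign-extends, so this yields
        # the two's complement bit as long as Y_i fits in n_bits.
        addr = 0
        for i, y in enumerate(ys):
            addr |= ((y >> k) & 1) << i
        partial = rom[addr] * (1 << k)
        if k == n_bits - 1:
            acc -= partial   # sign bit carries weight -2^(n-1)
        else:
            acc += partial
    return acc
```

With `coeffs = [3, -5, 7, 2]`, `ys = [5, -3, -8, 7]` and `n_bits = 4`, the call `rac(build_rom(coeffs), ys, 4)` returns the same value as the direct sum of products, using only table lookups, shifts, and additions.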

FIG. 1 is a block diagram of a conventional two-band subband DCT algorithm.

FIG. 2 is a block diagram of a conventional half-band subband DCT approximation model.

FIG. 3 is a block diagram of a conventional distributed arithmetic unit.

FIG. 4 is a block diagram of the half-band subband 8x8 DCT/IDCT for low-bit-rate video signals using RACs according to the present invention.

<Description of reference numerals for the main parts of the drawings>

100: multiplexer

200: serial-to-parallel converter (SPC)

201, 401: registers

300: 4-point RAC (ROM and Accumulator in Cascade)

400: butterfly unit

410, 420: register bank_A, register bank_B

430: adder/subtracter

500: transposition memory (TM)

An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 4 shows the structure of the 8x8 DCT/IDCT for low-bit-rate video signals. As shown in FIG. 4, it consists of: a multiplexer (100) that selects either the external input or the output of the transposition memory; a serial-to-parallel converter (SPC) (200), made up of eight registers (201), that receives the image data supplied by the multiplexer (100) sequentially, row by row, and outputs it in parallel; three RACs (300) that receive the parallel data from the SPC (200) and perform the half-band subband DCT; a butterfly unit (400), made up of eight registers (401) and one adder/subtracter (430), that performs the butterfly network function; and a transposition memory (500) that receives the output of the butterfly unit (400) and feeds it back through the multiplexer (100) to the SPC (200).

In the SPC (200), four registers are connected in series, and four further registers each receive the output of one of the serial registers and output the data in parallel, thereby performing serial-to-parallel conversion. The butterfly unit (400) consists of eight registers (401) and an adder/subtracter: bank_A (410), whose four registers each receive the 4-point data output from the three 4-point RACs (300); bank_B (420), of identical construction, whose four registers likewise receive the 4-point data; and the adder/subtracter (430), which adds and subtracts the outputs of bank_A and bank_B to act as a butterfly network.
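The serial-to-parallel converter described above can be modelled roughly as a four-stage shift chain plus four holding registers. The sketch below is a behavioural illustration only; the class name, the `clock` method, and the choice to latch after every fourth input are assumptions, since the text does not describe the control signals.

```python
class SerialToParallel:
    """Four serial shift registers feeding four parallel holding registers."""

    def __init__(self, width=4):
        self.width = width
        self.shift = [0] * width   # serial chain
        self.hold = [0] * width    # parallel output registers
        self.count = 0

    def clock(self, value):
        """Shift in one value; return the latched group once it is full, else None."""
        self.shift = self.shift[1:] + [value]
        self.count += 1
        if self.count == self.width:
            self.hold = list(self.shift)   # latch all four values at once
            self.count = 0
            return list(self.hold)
        return None
```

Feeding the even-indexed values of a row and then the odd-indexed values produces two groups of four, one for each 4-point matrix-vector multiplication.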

FIG. 4 includes three 4-input RACs (300). One RAC performs the 4-point forward DCT for the half-band subband DCT, and the other two perform the 8-point IDCT (inverse DCT).

The DCT/IDCT operation for low-bit-rate video signals according to the present invention is described below.

The IDCT procedure is described first. The 8-point IDCT can be expressed by the two equations below.

… (3)

… (4)
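Equations (3) and (4) are not reproduced in this text. As a hedged illustration of the structure they are described as having (a 4x4 cosine matrix times four inputs on the right, sums and differences of output samples on the left), the sketch below uses the standard even/odd decomposition of the 8-point IDCT. The orthonormal scaling and the function names are assumptions, and the exact form of the patent's equations may differ.

```python
import math

def _c(k):
    """Assumed orthonormal scale factor for coefficient k."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def idct8_direct(Y):
    """Reference 8-point IDCT evaluated straight from its definition."""
    return [0.5 * sum(_c(k) * Y[k] * math.cos((2 * n + 1) * k * math.pi / 16)
                      for k in range(8)) for n in range(8)]

def idct8_split(Y):
    """8-point IDCT as two 4-point matrix-vector products plus a butterfly."""
    even, odd = Y[0::2], Y[1::2]   # (Y0,Y2,Y4,Y6) and (Y1,Y3,Y5,Y7)
    # First 4x4 product: even-indexed coefficients
    e = [0.5 * sum(_c(2 * j) * even[j] * math.cos((2 * n + 1) * (2 * j) * math.pi / 16)
                   for j in range(4)) for n in range(4)]
    # Second 4x4 product: odd-indexed coefficients
    o = [0.5 * sum(odd[j] * math.cos((2 * n + 1) * (2 * j + 1) * math.pi / 16)
                   for j in range(4)) for n in range(4)]
    x = [0.0] * 8
    for n in range(4):
        x[n] = e[n] + o[n]        # butterfly: sum
        x[7 - n] = e[n] - o[n]    # butterfly: difference
    return x
```

The two functions agree to within floating-point rounding, which is why an 8-point IDCT can be served by two 4-point units followed by one adder/subtracter.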

The 8x8 DCT/IDCT module of FIG. 4 performs the IDCT as follows.

Step 1: The multiplexer (100) first selects the external input and supplies it to the SPC (200); that is, the 8x8 image data is input to the SPC sequentially, row by row. Within each row, the four even-indexed values (Y0, Y2, Y4, Y6) are input first, followed by the four odd-indexed values (Y1, Y3, Y5, Y7).

Step 2: For the first four values, one RAC (300) performs the matrix-vector multiplication of equation (3), and the result is stored in register bank_A (410) of the butterfly unit (400).

Step 3: For the next four values, another RAC (300) performs the matrix-vector multiplication of equation (4), and the result is stored in register bank_B (420) of the butterfly unit (400).

Step 4: While the next row is being processed in the RACs (300), the adder/subtracter (430) performs the butterfly computation shown on the left-hand side of equations (3) and (4) on the intermediate results in register bank_A (410) and register bank_B (420), and the results are stored sequentially in the transposition memory (TM) (500).

Steps 1 through 4 are repeated until all 64 values have been stored in the transposition memory (500).

Step 5: The 8x8 data in the transposition memory (500) is input to the SPC (200) sequentially, column by column; that is, the multiplexer (100) now selects the output of the transposition memory (500) instead of the external input and supplies it to the SPC (200).

Here too, within each column the four even-indexed values (Y0, Y2, Y4, Y6) are input first, followed by the four odd-indexed values (Y1, Y3, Y5, Y7).

Step 6: For the first four values, one RAC (300) performs the matrix-vector multiplication of equation (3), and the result is stored in register bank_A (410) of the butterfly unit (400).

Step 7: For the next four values, another RAC (300) performs the matrix-vector multiplication of equation (4), and the result is stored in register bank_B (420) of the butterfly unit (400).

Step 8: While the next column is being processed in the RACs (300), the adder/subtracter (430) performs the butterfly computation shown on the left-hand side of equations (3) and (4) on the intermediate results in register bank_A (410) and register bank_B (420), and the results are output sequentially as the result of the 8x8 2-D IDCT.

Steps 5 through 8 are repeated until all 64 values have been output.
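The two passes above amount to a row-column decomposition of the 2-D IDCT, with the transposition memory holding the intermediate block. A minimal sketch follows; it models only the data flow, not the RAC or butterfly hardware, and the orthonormal 1-D IDCT scaling and the function names are assumptions.

```python
import math

def idct8(Y):
    """Reference 8-point 1-D IDCT (orthonormal scaling assumed)."""
    return [0.5 * sum((1 / math.sqrt(2) if k == 0 else 1.0) * Y[k]
                      * math.cos((2 * n + 1) * k * math.pi / 16)
                      for k in range(8)) for n in range(8)]

def idct8x8_row_column(block):
    """2-D IDCT as a row pass, a transposition buffer, and a column pass."""
    # Pass 1 (steps 1-4): transform each row and store it in the buffer
    tm = [idct8(row) for row in block]
    # Pass 2 (steps 5-8): read the buffer column by column and transform again
    cols = [idct8([tm[r][c] for r in range(8)]) for c in range(8)]
    # Rearrange the column results into a row-major 8x8 block
    return [[cols[c][r] for c in range(8)] for r in range(8)]
```

As a quick check, a block whose only non-zero coefficient is a DC value of 8 yields a flat output block in which every sample is 1.0, up to rounding.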

It is important that no register bank conflict arises between steps 3 and 4, or between steps 7 and 8, of the IDCT procedure above. Because the proposed DCT/IDCT uses distributed arithmetic, the RAC delivers its result to the register bank all at once, only after all four intermediate results have been determined, so no register bank conflict occurs. For example, with an internal precision of 16 bits and each input of the 4-point RAC fed 2 bits at a time, the RAC outputs an intermediate result every 8 clocks. Thus, as long as the register banks and the adder/subtracter (430) output eight results every 8 clocks, no register contention occurs.

Next, the operation of the proposed 8x8 DCT/IDCT module when it performs the half-band subband DCT is described. Here the image compression system supplies, as the input to the DCT/IDCT, 4x4 data on which a 2x2 Hadamard transform has been performed. Each element of the 4x4 matrix is the average of a 2x2 sub-matrix of the 8x8 matrix, so it is simple to compute.

The proposed 8x8 DCT/IDCT performs the DCT only on the Hadamard-transformed 4x4 data; the remaining 48 DCT coefficients of the 8x8 block are set to '0' by the image compression system. The 8x8 DCT/IDCT performs the 4x4 2-D DCT as follows.
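The pre- and post-processing described here can be sketched as two small helpers. Both are illustrations under stated assumptions: `average_2x2` implements the 2x2 averaging the text describes, while `embed_coefficients` assumes the sixteen computed coefficients occupy the top-left (low-frequency) corner of the 8x8 block, which the text implies but does not state. Any weighting applied to the coefficients is omitted.

```python
def average_2x2(block8):
    """Reduce an 8x8 block to 4x4 by averaging each 2x2 sub-matrix."""
    return [[(block8[2 * r][2 * c] + block8[2 * r][2 * c + 1]
              + block8[2 * r + 1][2 * c] + block8[2 * r + 1][2 * c + 1]) / 4
             for c in range(4)] for r in range(4)]

def embed_coefficients(coef4):
    """Place 4x4 coefficients in an 8x8 block; the other 48 entries stay zero."""
    out = [[0.0] * 8 for _ in range(8)]
    for r in range(4):
        for c in range(4):
            out[r][c] = coef4[r][c]   # assumed low-frequency corner
    return out
```

For an 8x8 block filled row by row with 0 to 63, the first averaged element is (0 + 1 + 8 + 9) / 4 = 4.5.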

Step 1: The 4x4 image data selected by the multiplexer (100) is input to the SPC (200) sequentially, row by row.

Step 2: For the four input values, one RAC (300) performs the matrix-vector multiplication required for the 1-D DCT, and the result is stored in register bank_A (410) of the butterfly unit (400).

Step 3: While the next row is being processed in the RAC (300), the intermediate results in register bank_A (410) are stored sequentially in the transposition memory (TM) (500).

Steps 1 through 3 are repeated until all 16 values have been stored in the transposition memory (500).

Step 4: The 4x4 data in the transposition memory (500) is input to the SPC (200) through the multiplexer (100) sequentially, column by column.

Step 5: For the four input values, one RAC performs the matrix-vector multiplication required for the 1-D DCT, and the result is stored in register bank_B (420) of the butterfly unit (400).

Step 6: While the next column is being processed in the RAC, the results in register bank_B (420) are output sequentially as the result of the 4x4 2-D DCT.

Steps 4 through 6 are repeated until all 16 values have been output.

In FIG. 4, one RAC is assumed to take 8 clocks to perform the matrix-vector multiplication on four inputs. This assumes an internal precision of 16 bits and 2 bits per channel. The data must therefore be input to the SPC (200) at a rate of one value every two clocks.
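The arithmetic behind these two figures can be written out as a short sketch. The interpretation that "2 bits per channel" means two bits of each input are consumed per clock is an assumption made here to reproduce the stated 8 clocks; the function names are hypothetical.

```python
def rac_clocks(precision_bits=16, bits_per_clock=2):
    """Clocks for one matrix-vector multiplication in a bit-serial RAC."""
    return precision_bits // bits_per_clock

def clocks_per_input_value(points=4, precision_bits=16, bits_per_clock=2):
    """Input rate that keeps one RAC busy: one group of `points` values per RAC cycle."""
    return rac_clocks(precision_bits, bits_per_clock) // points

print(rac_clocks())              # 8 clocks per 4-point multiplication
print(clocks_per_input_value())  # one input value every 2 clocks
```

If four values must arrive during each 8-clock RAC cycle, one value every two clocks is exactly the rate that neither starves nor overruns the RAC.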

When performing the 8x8 2-D IDCT, once eight values have been loaded into register bank_A (410) and register bank_B (420), the butterfly circuit of FIG. 4 must compute the eight butterfly results and supply them to the transposition memory (500) within 8 clocks. This is because new data is supplied to bank_A (410) 8 clocks after the eight values have been loaded into bank_A (410) and bank_B (420).

When performing the 8x8 2-D IDCT, it takes 40 clocks for one row to go through the 1-D IDCT and be stored in the transposition memory (500). The eight values are actually stored in the transposition memory (500) after only 32 clocks, but because the butterfly circuit cannot operate during the following 8 clocks, the figure is taken to be 40 clocks.

Thus, when eight rows are supplied in pipelined fashion, it takes 152 clocks to perform the 1-D IDCT on each of the eight rows and store the results in the transposition memory. It takes another 152 clocks to perform the 1-D IDCT on each of the eight columns of data stored in the transposition memory (500) and output the results, so one 8x8 2-D IDCT takes 304 clocks.

When performing the 4x4 2-D DCT, it takes 24 clocks for one row to go through the 1-D DCT and be stored in the transposition memory. When four rows are supplied in pipelined fashion, it takes 48 clocks to perform the 1-D DCT on each of the four rows and store the results in the transposition memory. It takes another 48 clocks to perform the 1-D DCT on each of the four columns of data stored in the transposition memory and output the results, so one 4x4 2-D DCT takes 96 clocks.
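The text gives the clock totals without the formula behind them. One reading that reproduces every stated figure is: the first line takes the stated latency, and each further pipelined line adds only its input time of two clocks per value. This model is an inference, not something the document states, and `pass_clocks` is a hypothetical helper.

```python
def pass_clocks(first_line_latency, lines, values_per_line, clocks_per_value=2):
    """Clocks for one 1-D pass over a block when lines are pipelined."""
    return first_line_latency + (lines - 1) * values_per_line * clocks_per_value

# 8x8 case: 40-clock first line, 8 lines of 8 values
pass_8x8 = pass_clocks(40, 8, 8)     # 40 + 7 * 16
total_8x8 = 2 * pass_8x8             # row pass + column pass

# 4x4 case: 24-clock first line, 4 lines of 4 values
pass_4x4 = pass_clocks(24, 4, 4)     # 24 + 3 * 8
total_4x4 = 2 * pass_4x4

print(pass_8x8, total_8x8, pass_4x4, total_4x4)
```

The printed values are 152, 304, 48 and 96, matching the per-pass and per-block totals given for the 8x8 and 4x4 cases.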

The number of clocks required per second to perform two IDCTs and one half-band subband DCT on 30 QCIF frames per second is as follows. It is assumed that no pipelining is used between blocks.

30 x 99 x 6 x (96 + 304 + 304) ≈ 12.5 x 10⁶

The operating clock frequency required of the 8x8 DCT/IDCT is therefore about 12 MHz.
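The clock budget can be checked by evaluating the product directly. The sketch below assumes the usual meaning of the factors, which the text does not spell out: a QCIF frame of 176x144 pixels contains 99 macroblocks, and each macroblock contains six 8x8 blocks (four luminance and two chrominance).

```python
FRAMES_PER_SECOND = 30
MACROBLOCKS_PER_QCIF = (176 // 16) * (144 // 16)   # 11 * 9 = 99
BLOCKS_PER_MACROBLOCK = 6                          # 4 luma + 2 chroma (assumed 4:2:0)

CLOCKS_HALF_BAND_DCT = 96   # one 4x4 2-D DCT
CLOCKS_IDCT = 304           # one 8x8 2-D IDCT

clocks_per_block = CLOCKS_HALF_BAND_DCT + 2 * CLOCKS_IDCT
clocks_per_second = (FRAMES_PER_SECOND * MACROBLOCKS_PER_QCIF
                     * BLOCKS_PER_MACROBLOCK * clocks_per_block)

print(clocks_per_block)     # 704
print(clocks_per_second)    # 12545280
```

Evaluated exactly, the product is 12,545,280 clocks per second, that is, roughly 12.5 x 10⁶, which is in line with an operating frequency of about 12 MHz.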

When two IDCTs and one half-band subband DCT are performed while processing 30 QCIF frames per second, the required operating frequency is only about 12 MHz, which is very low, so power consumption can be reduced. Moreover, because the same hardware is reused in both the forward and inverse directions, the 8x8 DCT/IDCT for low-bit-rate video signals is carried out efficiently at low hardware cost.

Claims

A half-band subband DCT/IDCT circuit using RACs, comprising:

a multiplexer that selects between the input and the output of a transposition memory;

serial-to-parallel conversion means for receiving the image input selected by the multiplexer sequentially, row by row, and outputting it in parallel;

three RACs that receive the data output from the serial-to-parallel conversion means and perform matrix-vector multiplication to carry out the half-band subband DCT/IDCT;

butterfly means for performing the function of a butterfly network on the data output from the RACs; and

transposition memory means for receiving the output of the butterfly means and supplying it through the multiplexer to the serial-to-parallel conversion means.

The circuit of claim 1, wherein the DCT/IDCT circuit

uses distributed arithmetic and is configured so that the results are transferred to the butterfly means all at once, after all four intermediate results in the RAC have been determined.

The circuit of claim 1, wherein the three RACs consist of

one RAC that performs the 4-point forward DCT for the half-band subband DCT, and

two RACs that perform the 8-point IDCT.

The circuit of claim 1, wherein the butterfly means consists of

register bank_A and register bank_B, each made up of four registers, which alternately store the output of the RACs; and

one adder/subtracter that reads the data in register bank_A and register bank_B and performs the butterfly computation.

A half-band subband DCT/IDCT method using RACs, in a DCT/IDCT circuit

having one 4-point RAC for DCT processing and two further 4-point RACs, distinct from that RAC, for IDCT processing, the method comprising:

a first process of receiving the data row by row, even-indexed data first and then odd-indexed data, and converting it from serial to parallel;

a second process of feeding the serial-to-parallel-converted data to the RACs and performing matrix-vector multiplication row by row, using the one RAC for 4-point forward DCT processing and, for 8-point IDCT processing, the two other RACs, each multiplying four points of the data, and storing the results in register bank_A and register bank_B in turn;

a third process of reading the data stored in register bank_A and register bank_B, performing the butterfly computation by addition and subtraction, and storing the result in the transposition memory; and

a fourth process of, after the first through third processes have been repeated until all the data has been processed row by row, repeating the first through third processes column by column on the data stored in the transposition memory and outputting the data for which the butterfly computation has been completed.

The method of claim 5, wherein in the second process,

distributed arithmetic is used so that the result of the RAC is output to the register bank all at once, after all four intermediate results have been determined.