KR100939351B1 - Method and Apparatus for Managing Fault

Info

Publication number: KR100939351B1
Application number: KR1020070104330A
Authority: KR
Inventors: 김범수; 황찬규; 유재형
Original assignee: 주식회사 케이티
Priority date: 2007-10-17
Filing date: 2007-10-17
Publication date: 2010-01-29
Also published as: KR20090038981A

Abstract

The present invention relates to a fault management apparatus and method.

The present invention provides operators with the information essential for fault handling through an effective method of constructing a KEDB (Known Error DB), supports a closely coordinated cooperation scheme across the management stages of fault handling, and supports fast and accurate diagnosis and handling of faults that have occurred.

The present invention describes in detail a KEDB-based fault management method, a method of constructing a KEDB based on error recovery path (Error Recovery Path) information, and a method of assembling error recovery path information.

The present invention can be expected to minimize those work procedures and activities in the IT Infrastructure Library Service Support operating guidelines that are unnecessary in practice, and to enable a clear separation of duties at each management stage, centered on dedicated responsibility and delegation according to role and authority.

ITIL, ITSM, KEDB, Incident Management, Problem Management, Change Management

Description

Fault management apparatus and method {Method and Apparatus for Managing Fault}

The present invention relates to a fault management apparatus and method, and more particularly, to a fault management apparatus and method based on constructing a known error database (Known Error Database, hereinafter referred to as 'KEDB').

The common business goals pursued by IT service providers are improving customer satisfaction by assuring service quality and reducing long-term total cost of ownership (TCO) through efficient management of IT resources.

IT Service Management (ITSM) can be defined as a systematic approach for effectively integrating and managing all the components of IT service delivery (processes, people, information, and technology) in support of the goals of IT service providers.

The IT Infrastructure Library (ITIL) is an IT service management framework that compiles the IT industry's best practices for IT service management, and is defined as a guide optimized for implementing and operating the IT Infrastructure Library efficiently.

Problems that arise in the transition from a traditional IT management scheme to an IT service management / IT Infrastructure Library scheme include increased work complexity as existing unit business processes are subdivided, increased inefficiency as the work of existing unit organizations is converted into a cooperative scheme, and delays in processing time due to insufficiently reliable KEDB information.

A representative example of these problems is the fault management process that puts the IT Infrastructure Library Service Support operating guidelines into practice.

The problems that arise in operating such a fault management process are side effects of subdividing the fault management process and side effects of KEDB-centered fault management operation.

First, the side effects of subdividing the fault management process are caused by the complexity of the procedures and activities for cooperation between management stages, the lack of clear roles and responsibilities for that cooperation, the lack of a clear information delivery scheme for that cooperation, and operators' lack of the skills and knowledge needed to perform the work of each management stage.

In addition, the side effects of KEDB-centered fault management operation are caused by insufficiently formalized and systematized fault handling information for the various types of faults, operators' lack of expertise and experience with those faults, the difficulty of deriving their root causes, recovery methods, and handling commands, and the lack of support tools for diagnosing, analyzing, and handling them.

To solve these problems, the present invention provides a fault management apparatus and method based on constructing a KEDB from error recovery path information.

A fault management method according to a feature of the present invention for achieving this technical object includes the steps of: (a) searching a known error database (Known Error DB) for a plurality of pieces of error recovery path information corresponding to fault information, checking and selecting one of the plurality of pieces of error recovery path information, and generating and transmitting a first change request message for verifying the selected error recovery path information; (b) finding a cause analysis and handling method for the fault information, generating a new error recovery path for resolving the fault information, and generating and transmitting a second change request message for verifying the generated new error recovery path; and (c) identifying, through the known error database, the selected error recovery path information corresponding to the first change request message and the generated new error recovery path corresponding to the second change request message, and performing validity verification and adjustment on the selected error recovery path and the generated new error recovery path.

A fault management apparatus according to a feature of the present invention includes: an incident management unit that searches a known error database (Known Error DB) for a plurality of pieces of error recovery path information corresponding to fault information and performs fault handling by applying one of them; a problem management unit that, when the fault information is not found in the known error database, finds a cause analysis and handling method for the fault information received from the incident management unit and generates new error recovery path information for resolving the fault information; and a change management unit that identifies the error recovery path information and the new error recovery path information through the known error database and performs validity verification and fault impact assessment.

A fault management method according to a feature of the present invention includes the steps of: (a) selecting, from a known error database, a service type, a server type, a fault type, and a fault cause corresponding to fault information; (b) selecting, from the known error database, an error recovery path for handling the fault information; and (c) checking the recovery procedure and operating guidelines defined in the selected error recovery path, executing the recovery commands defined in the selected error recovery path, and checking the result.

With the configuration described above, the present invention can be expected to minimize the side effects that arise in operating existing fault management processes standardized on the IT Infrastructure Library (ITIL).

The present invention can be expected to support systematic management of the service quality agreed with customers and management of IT resources.

The present invention can be expected to strengthen cooperation among the management stages of fault handling and to strengthen thorough verification and refinement of KEDB-based fault handling.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and like reference numerals designate like parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them. In addition, terms such as "…unit", "…er", "module", and "block" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a block diagram schematically illustrating the internal configuration of a fault management apparatus 100 based on a KEDB 200 according to an embodiment of the present invention.

The embodiment of the present invention simplifies the IT service management standard so that the work of each management stage proceeds on the operating principle of dedicated responsibility and delegation.

The fault management apparatus 100 according to the embodiment of the present invention includes an incident management unit 110, a problem management unit 120, a change management unit 130, a distribution management unit 140, and an error recovery path changing unit 150.

The incident management unit 110 handles fault information that has occurred by using the error recovery path (Error Recovery Path: ERP) information registered in the KEDB 200, and transmits fault information that it cannot handle to the problem management unit 120.

The problem management unit 120 finds a cause analysis and handling method for the fault information received from the incident management unit 110 and generates error recovery path information for solving the problem.

The change management unit 130 identifies, through the KEDB 200, the error recovery path information requested by the incident management unit 110 and the problem management unit 120, performs validity verification and impact assessment on that error recovery path information, and registers the verified and refined error recovery path information in the KEDB 200.

The distribution management unit 140 tests the error recovery path information verified and refined by the change management unit 130 in a development environment and in the real environment, and, if the tests succeed, updates a configuration management database (not shown).

When the error recovery path information for given fault information needs to be changed, the error recovery path changing unit 150 changes it to error recovery path information that corresponds to the fault information, drawing on the error recovery path information registered in the KEDB 200.

FIG. 2 is a diagram for explaining the processing flow of a fault management method based on the KEDB 200 according to an embodiment of the present invention.

The incident management unit 110 recognizes and classifies an incident in which a fault has occurred, and analyzes the resulting fault information through a Service Impact Analysis Map (SIAM) (S100, S102).

The incident management unit 110 searches the KEDB 200 for error recovery path information for the fault information (S104).

When the incident management unit 110 cannot find error recovery path information corresponding to the fault information in the KEDB 200, it transmits the fault information to the problem management unit 120 (S200).

When the incident management unit 110 finds error recovery path information corresponding to the fault information in the KEDB 200, it checks and selects one of the plurality of pieces of error recovery path information in order to resolve the fault information (S108).

Subsequently, the incident management unit 110 generates a change request message (Request for Change: RFC) indicating the error recovery path information for the fault information and transmits it to the change management unit 130. Here, the change request message is a command that indicates the error recovery path information so that it can be verified and analyzed.

When the incident management unit 110 determines that the error recovery path information for the fault information needs to be changed, it changes that error recovery path information through the error recovery path changing unit (Incident Resolution Assembler) 150 (S110, S112). Subsequently, the incident management unit 110 generates a change request message for resolving the fault with the changed error recovery path information and transmits it to the change management unit 130 (S114).
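The incident management branch described above (search the KEDB, hand over to problem management when nothing is found, otherwise select a path and raise an RFC) can be sketched as follows. This is a minimal illustration and not the patented implementation: the names RecoveryPath, Decision, handle_incident and fault_key are hypothetical, and choosing the candidate with the highest usage count is an assumption, since the description only says that one path is checked and selected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPath:
    """Hypothetical, simplified stand-in for one piece of ERP information."""
    path_id: str
    fault_key: str  # hypothetical key identifying the kind of fault
    rank: int       # how often the path has been used (cf. ERP Rank)


@dataclass(frozen=True)
class Decision:
    """Where the incident management stage sends the fault next."""
    target: str                    # "problem_management" or "change_management"
    path_id: Optional[str] = None  # set only when an RFC is raised


def handle_incident(fault_key: str, kedb: List[RecoveryPath]) -> Decision:
    # S104: search the KEDB for paths registered for this fault.
    candidates = [p for p in kedb if p.fault_key == fault_key]
    if not candidates:
        # Nothing registered: pass the fault on to problem management (S200).
        return Decision(target="problem_management")
    # S108: select one path. Taking the highest rank is an assumption.
    chosen = max(candidates, key=lambda p: p.rank)
    # Raise a change request (RFC) that names the selected path.
    return Decision(target="change_management", path_id=chosen.path_id)
```

Calling handle_incident with a fault key that has no registered path returns a Decision targeting problem management; otherwise the returned Decision carries the identifier of the selected path for the change management stage.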

The problem management unit 120 recognizes and classifies the fault information received from the incident management unit 110, and analyzes it through the Service Impact Analysis Map (S200, S202). Subsequently, the problem management unit 120 searches the KEDB 200 for error recovery path information corresponding to the fault information (S204).

The problem management unit 120 determines whether error recovery path information for the fault information of the service exists in the KEDB 200 and, if it does not, analyzes the fault information by interworking with an external system.

When error recovery path information for the fault information of the service exists in the KEDB 200, the problem management unit 120 checks and selects one of the plurality of pieces of error recovery path information in order to handle the fault information (S206, S208). Subsequently, the problem management unit 120 generates a change request message indicating the error recovery path information for the fault information and transmits it to the change management unit 130.

When the problem management unit 120 determines that the error recovery path information for the fault information of the service needs to be changed, it changes that error recovery path information through the error recovery path changing unit (Incident Resolution Assembler) 150 (S210, S212). Subsequently, the problem management unit 120 generates a change request message indicating the error recovery path information for the fault information and transmits it to the change management unit 130 (S214).

The change management unit 130 recognizes and classifies the change request messages received from the incident management unit 110 and the problem management unit 120 (S300). The change management unit 130 analyzes the type of each received change request message to determine whether it is 'urgent' or 'normal' (S302).

When the type of the received change request message is 'urgent', the change management unit 130 retrieves the error recovery path information corresponding to the message from the KEDB 200 and performs an emergency test to assess the fault impact of applying that error recovery path information (S304).

When the type of the received change request message is 'normal', the change management unit 130 retrieves the error recovery path information corresponding to the message from the KEDB 200 and performs a precise test to assess the fault impact of applying that error recovery path information (S306).

The change management unit 130 verifies and adjusts the error recovery path information through the emergency and precise tests (S308). When the verification and adjustment succeed, the change management unit 130 registers the error recovery path information in the KEDB 200 (S310, S312).
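The change management steps S302 to S312 amount to a dispatch on the RFC type followed by a conditional registration. The sketch below illustrates that under stated assumptions: the dictionary layout of the RFC and of the KEDB, the "verified" flag used as a stand-in for registration, and the two test callables are all hypothetical, because the description does not specify how the emergency and precise tests are carried out.

```python
from typing import Callable, Dict


def process_change_request(
    rfc: dict,
    kedb: Dict[str, dict],
    emergency_test: Callable[[dict], bool],
    precise_test: Callable[[dict], bool],
) -> bool:
    """Verify the recovery path named by an RFC; mark it registered on success."""
    # S304 / S306: look up the path that the change request refers to.
    path = kedb[rfc["path_id"]]
    # S302: urgent requests get the emergency test, normal ones the precise test.
    if rfc["type"] == "urgent":
        passed = emergency_test(path)
    elif rfc["type"] == "normal":
        passed = precise_test(path)
    else:
        raise ValueError(f"unknown RFC type: {rfc['type']!r}")
    # S310 / S312: register only when verification succeeded.
    # Setting a flag is an assumed stand-in for registration in the KEDB.
    if passed:
        path["verified"] = True
    return passed
```

Passing the tests as callables keeps the sketch independent of any particular test environment; a caller would supply whatever emergency and precise checks its own setup provides.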

The change management unit 130 determines whether distribution management is needed for the error recovery path information (S314) and, if so, transmits the verified and adjusted error recovery path information to the distribution management unit 140 (S400).

The distribution management unit 140 analyzes the error recovery path information received from the change management unit 130 and tests it in the development environment (S402, S404). When the development environment test succeeds (S406), it tests the information in the real environment and, if that test also succeeds, updates the configuration management database (not shown) (S408, S410, S412).
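The two-stage gate in S402 to S412 can be sketched as below. The function name, the status strings, and the dictionary used in place of the configuration management database are illustrative assumptions; the description does not state what happens when a test fails, so the sketch simply reports which stage failed and leaves the database untouched.

```python
from typing import Callable, Dict


def distribute_recovery_path(
    path: dict,
    test_in_development: Callable[[dict], bool],
    test_in_real_environment: Callable[[dict], bool],
    cmdb: Dict[str, dict],
) -> str:
    """Test a verified recovery path in two environments, then record it."""
    # S404 / S406: the development environment test comes first.
    if not test_in_development(path):
        return "development_test_failed"
    # S408 / S410: only then is the real environment tested.
    if not test_in_real_environment(path):
        return "real_environment_test_failed"
    # S412: both tests passed, so update the configuration management database.
    cmdb[path["path_id"]] = path
    return "cmdb_updated"
```

Because the real-environment test is only reached after the development test passes, a path that fails early never touches the real environment.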

FIG. 3 is a diagram for explaining the detailed structure of the KEDB 200 constructed from error recovery path information in the processing flow of the fault management method based on the KEDB 200 according to an embodiment of the present invention.

The method of constructing the KEDB 200 based on error recovery path information according to the embodiment of the present invention provides the information needed for the procedures and tasks of handling fault information, and thereby helps faults to be handled accurately and quickly.

The proposed method of constructing the KEDB 200 based on error recovery path information organizes the work procedures needed for fault handling, such as diagnosis, analysis, and recovery, into path information, and integrates the fault handling information needed for fault handling, such as types, causes, and solutions, into linked information.

In addition, the proposed method accumulates the various technical and knowledge information to be applied to fault handling as experience information, combines the recovery procedures and recovery commands to be applied to fault handling, and structures the fault history information from past fault handling as statistical information.

As shown in FIG. 3, the KEDB 200 according to the embodiment of the present invention includes a plurality of pieces of error recovery path information.

Here, each piece of error recovery path information includes a service type 210, a server type 220, a fault type 230, a fault cause 240, and an error recovery path 250.

The error recovery path 250 includes an ERP Path, which is the serial number of the fault information; an ERP Rank, which indicates how frequently the path has been used for fault handling; Recovery Steps, which represent the series of procedures for recovering from the fault, such as checking, shutting down, and restoring; Recovery Operations, which are the commands that carry out each recovery step; and the corresponding test results.
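One way to picture the record layout of FIG. 3 is as nested data classes. The sketch below is only an illustration: the class and field names are hypothetical, the field types are assumptions (the description does not say how ranks, steps, or test results are stored), and the helper all_operations is added purely to show how the nesting would be traversed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryStep:
    """One Recovery Step together with the Recovery Operations that perform it."""
    name: str
    operations: List[str] = field(default_factory=list)


@dataclass
class ErrorRecoveryPath:
    """Counterpart of the error recovery path 250."""
    erp_path: str   # serial number of the fault information
    erp_rank: int   # how often the path has been used for fault handling
    steps: List[RecoveryStep] = field(default_factory=list)
    test_results: List[str] = field(default_factory=list)


@dataclass
class ErpInfo:
    """One piece of error recovery path information stored in the KEDB."""
    service_type: str        # 210
    server_type: str         # 220
    fault_type: str          # 230
    fault_cause: str         # 240
    path: ErrorRecoveryPath  # 250


def all_operations(info: ErpInfo) -> List[str]:
    """Flatten the recovery operations of every step, in step order."""
    return [op for step in info.path.steps for op in step.operations]
```

Keeping the four classification fields on the outer record and the procedure on the inner one mirrors the split between items 210 to 240 and item 250 in the description.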

The basic flow of using error recovery path information to resolve fault information is as follows.

The fault management apparatus 100 selects the service type 210, server type 220, fault type 230, and fault cause 240 corresponding to the fault information. Subsequently, the fault management apparatus 100 selects the error recovery path 250 for handling the fault information.

The fault management apparatus 100 checks the recovery procedure and operating guidelines defined in the selected error recovery path 250.

The fault management apparatus 100 executes the recovery commands defined in the selected error recovery path 250 and checks the result.
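The basic flow above (select by the four classification fields, pick a path, run its commands, check the results) can be sketched as a lookup followed by a loop. Everything concrete here is an assumption made for illustration: the KEDB is modeled as a dictionary keyed by a four-element tuple, paths are plain dictionaries, the highest-ranked path is picked automatically, and command execution is delegated to a caller-supplied function rather than to any real command runner.

```python
from typing import Callable, Dict, List, Tuple

# (service type, server type, fault type, fault cause)
FaultKey = Tuple[str, str, str, str]


def run_recovery(
    kedb: Dict[FaultKey, List[dict]],
    key: FaultKey,
    execute: Callable[[str], bool],
) -> List[Tuple[str, bool]]:
    """Run the recovery commands of the path chosen for a classified fault."""
    # Select the candidate paths by the four classification fields.
    paths = kedb.get(key, [])
    if not paths:
        return []
    # Choose one path; preferring the highest rank is an assumption.
    path = max(paths, key=lambda p: p["rank"])
    # Execute each command of each step in order and record its result.
    results = []
    for step in path["steps"]:
        for operation in step["operations"]:
            results.append((operation, execute(operation)))
    return results
```

The returned list pairs each command with its outcome, which corresponds to the "check the result" part of the flow; an empty list signals that no path was registered for the given classification.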

The proposed method of constructing the KEDB 200 based on error recovery path information provides comprehensive information on the fault causes, recovery procedures, and recovery commands corresponding to each service, server, and fault event, and makes it easier to derive concrete solutions for a variety of service faults.

This method of constructing the KEDB 200 provides technical information on the various solutions available for the same service fault, improves the accuracy of selecting a fault recovery path by using empirical statistical information, and provides criteria for judging a variety of service faults quickly and accurately.

FIG. 4 is a diagram for explaining a method of assembling error recovery path information for handling fault information in the processing flow of the fault management method based on the KEDB 200 according to an embodiment of the present invention.

The method of assembling error recovery path information operates differently according to the role of each stage (the incident management unit 110, the problem management unit 120, and the change management unit 130).

To handle fault information, the incident management unit 110 selects one of the pieces of error recovery path information already registered in the KEDB 200 and performs fault handling with it.

The problem management unit 120 generates the error recovery path information needed to handle the fault information.

The change management unit 130 verifies, tests, and adjusts the error recovery path information requested by the incident management unit 110 and the problem management unit 120.

To assemble error recovery path information, various error recovery path information pools are provided. Here, the error recovery path information pool refers to the KEDB 200, which stores the plurality of registered pieces of error recovery path information.

The basic flow of generating and assembling error recovery path information to resolve fault information is as follows.

After selecting error recovery path information corresponding to the fault information, the fault management apparatus 100 selects error recovery path information corresponding to the fault information from the error recovery path information pool when it determines that new error recovery path information is needed (S500).

Subsequently, the fault management apparatus 100 selects recovery steps corresponding to the fault information from an error recovery step pool (Error Recovery Step Pool) (S502), and selects recovery commands corresponding to the fault information from an error recovery command pool (Error Recovery Operation Pool) (S504). Here, the error recovery step pool stores the recovery procedures that are components of error recovery path information, denoted Action1, Action2, Action3, and so on. The error recovery command pool stores the actual handling commands and selection information registered for the recovery procedures, denoted Operation01, Operation02, Operation03, and so on.
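Assembling a new path from the step pool and the command pool (S502, S504) can be sketched as a validated composition. The pool entries Action1 and Operation01 follow the naming used in the description; the function name, the plan argument that maps each chosen step to its chosen commands, and the dictionary shape of the result are hypothetical choices made for this illustration.

```python
from typing import Dict, List, Set


def assemble_recovery_path(
    path_id: str,
    plan: Dict[str, List[str]],
    step_pool: Set[str],
    operation_pool: Set[str],
) -> dict:
    """Build a recovery path from pooled steps and pooled commands."""
    steps = []
    for step_name, operations in plan.items():
        # S502: every step must come from the error recovery step pool.
        if step_name not in step_pool:
            raise KeyError(f"step not in pool: {step_name}")
        # S504: every command must come from the error recovery command pool.
        unknown = [op for op in operations if op not in operation_pool]
        if unknown:
            raise KeyError(f"operations not in pool: {unknown}")
        steps.append({"name": step_name, "operations": list(operations)})
    return {"path_id": path_id, "steps": steps}
```

Rejecting names that are not in the pools reflects the idea that new paths are composed only from registered building blocks; whether the actual apparatus enforces this is not stated in the description.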

The embodiments of the present invention described above are not implemented only through an apparatus and/or a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which such a program is recorded, and a person skilled in the art to which the present invention pertains can readily produce such implementations from the description of the embodiments above.

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

FIG. 1 is a block diagram schematically showing the internal configuration of a KEDB-based failure management apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining the processing flow of a KEDB-based failure management method according to an embodiment of the present invention.

FIG. 3 is a diagram for explaining the detailed structure of a KEDB built on error recovery path information in the KEDB-based failure management method according to an embodiment of the present invention.

FIG. 4 is a diagram for explaining a method of assembling error recovery path information for processing generated failure information in the KEDB-based failure management method according to an embodiment of the present invention.

Claims

A failure management method comprising:

(a) searching a known error database, in which a plurality of error recovery path information corresponding to failure information is known and which includes failure history information applied to failure processing, and selecting one of the plurality of error recovery path information to process the failure information;

(b) verifying the validity of the selected error recovery path information and registering the verified error recovery path information in the known error database; and

(c) testing the registered error recovery path information in a development environment and updating the known error database.

The failure management method of claim 1,

wherein step (a) further comprises:

generating new error recovery path information by finding a cause analysis and processing method for the failure information when no error recovery path information corresponding to the failure information is retrieved from the known error database.

The failure management method of claim 2, comprising:

selecting error recovery path information corresponding to the failure information from the known error database when the new error recovery path information is needed;

selecting a recovery step corresponding to the failure information from an error recovery step pool, wherein the error recovery step pool stores a recovery procedure that is a component of the error recovery path information; and

selecting a recovery command corresponding to the failure information from an error recovery command pool, wherein the error recovery command pool stores processing commands and selection information registered in the recovery procedure.

A failure management device comprising:

an event management unit configured to search, based on statistics of failure history information, a known error database in which a plurality of error recovery path information corresponding to failure information is known and which includes the failure history information applied to failure processing, and to perform error processing by applying one of the plurality of error recovery path information;

a problem management unit configured to, when the failure information is not retrieved from the known error database, find a cause analysis and processing method for the failure information received from the event management unit and generate new error recovery path information for resolving the failure information; and

a change management unit configured to identify the error recovery path information and the new error recovery path information through the known error database and to perform validity verification and failure impact evaluation.

The failure management device of claim 4, further comprising:

a distribution management unit configured to test the error recovery path information verified and supplemented by the change management unit in a development environment and a real environment, and to update a configuration management database when the test succeeds.

The failure management device of claim 4 or 5, further comprising:

an error recovery path changing unit configured to change the error recovery path information corresponding to the failure information, from among the plurality of error recovery path information registered in the known error database, when the error recovery path information or the new error recovery path information is changed.

A failure management method comprising:

(a) selecting a failure type corresponding to failure information from a known error database, wherein the known error database includes failure history information applied to failure processing;

(b) selecting one of a plurality of error recovery paths for processing the failure information based on statistics of the failure history information in the known error database; and

(c) confirming a recovery procedure defined in the selected error recovery path and executing a recovery command defined in the selected error recovery path to verify a result.

The failure management method of claim 7, further comprising,

after step (c):

selecting error recovery path information corresponding to the failure information from the known error database when new error recovery path information corresponding to the failure information is needed.