JP5919529B2

JP5919529B2 - Data storage system

Info

Publication number: JP5919529B2
Application number: JP2011203488A
Authority: JP
Inventors: 浩一高岡; 孝栗尾; 弘記西川
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2011-09-16
Filing date: 2011-09-16
Publication date: 2016-05-18
Anticipated expiration: 2031-09-16
Also published as: JP2013065203A

Description

The present invention relates to a data storage system that stores data in a distributed database.

Many databases accessed over telecommunication lines such as the Internet are in operation. Some must repeatedly collect data from a large number of terminals. Examples are a database that aggregates weather observations from many points on the earth and a database that collects meter-reading data from electricity and gas consumers.

Such databases handle enormous amounts of data. It has therefore been proposed to distribute the data across a plurality of computer systems rather than build the database on a single one (see, for example, Patent Document 1). A database management system for such a database needs two functions: storing data across many physically separate computer systems, and integrating those systems so they can be handled as one. In other words, even though the data is stored in a distributed manner, users must not need to be aware of the multiple computers. This type of technology is called a distributed database.

Patent Document 1: Published Japanese Translation of PCT International Application No. 2002-528817

The distributed database described above may combine computer systems with different hardware resources. In a database system, differences in CPU (Central Processing Unit) performance and in storage-device capacity have a particularly strong effect on throughput. Hardware variation among the computer systems that make up the distributed database system can therefore cause variation in two places: the write time when data is stored, and the response time when data is provided.

In addition, when data is stored across the computer systems of a distributed database system, access may concentrate on a single computer system. That system then becomes a bottleneck and lowers the throughput of the distributed database system as a whole.

An object of the present invention is to provide a data storage system that suppresses bottlenecks even when hardware resources vary among the computer systems of a distributed database system. This makes it possible to build a distributed database whose throughput is unlikely to drop even when computer systems with inferior hardware resources are mixed in.

A data storage system according to the present invention stores data in a distributed database built from a plurality of computer systems. It includes a plurality of virtual servers and a selecting means.

The virtual servers are the constituent units of the distributed database, and all have the same storage capacity. Each virtual server includes a data storage unit whose storage area is divided into a plurality of tables of equal unit size. The capacity of the data storage unit is set so that a virtual server's capacity does not exceed the smallest storage capacity among the computers.

The selecting means determines, from among the virtual servers, the virtual server in which data is stored. It uses a numerically expressed data ID that identifies the set to which the data belongs, and chooses both the virtual server and the table so that data of the same set lands in the same virtual server and the same table. Data stored in the distributed database is thereby distributed across the virtual servers and across the tables.

In this data storage system, the data storage unit preferably includes a plurality of type-specific storage units, each storing data of one type.

In this data storage system, the distributed database preferably includes a management function unit that stores data in the data storage unit, provides data from it, and deletes data from it.

In this data storage system, each piece of data preferably carries a data ID identifying the set to which it belongs. The selecting means then identifies the virtual server by the remainder of the data ID divided by the number of virtual servers, and identifies the table by the remainder of the data ID divided by the number of tables. As a result, data with the same data ID is stored in the same table while the data as a whole is distributed.

The configuration of the present invention suppresses bottlenecks even when hardware resources vary among the computer systems of a distributed database system. A distributed database can therefore be built whose throughput is unlikely to drop even when computer systems with inferior hardware resources are mixed in.

FIG. 1 is a block diagram showing an embodiment.
FIG. 2 is a schematic configuration diagram of the same embodiment.
FIG. 3 is a diagram explaining the operation of the same embodiment.
FIG. 4 is a block diagram showing another embodiment.

The embodiment below assumes a database system used in a remote meter-reading system. Such a system collects readings from meters installed at consumers of electricity, gas, water, and so on over a telecommunication line, stores the resulting meter-reading data, and provides the stored data. It must store meter-reading data obtained at fixed intervals (for example, every 10 minutes), and it must sort and manage that data per consumer.

In this embodiment, therefore, the database system has a large number of tables that store the periodically collected meter-reading data as records. All records for one consumer are stored together in a single table, and one table holds the records of several consumers. The storage areas of all tables have the same unit size. In other words, each consumer's meter-reading data is sorted into a table.

The embodiment below does not limit the database system to remote meter reading. The same technique applies to any use in which many records of the same type are stored in one table and the records must be sorted. Examples include:
- weather observation data from many observation points;
- operating-status monitoring data for many facilities;
- monitoring data from many locations for disaster or crime prevention.

Each of these requires data to be sampled and stored at regular or irregular intervals. They are only examples: any application involving fixed-point observation of data can expect the effects described later from storing its data with the technique of this embodiment.

Assume the database system has a large number of tables (for example, 2500), each storing a large number of records (for example, 1 million). In this numerical example, meter-reading data is collected every 10 minutes and each reading is one record. With 1 million consumers, 2500 records can be stored per consumer, which covers about 17 days (2500 × 10 min ≈ 17.4 days). If each table is limited to 2 GB, each record can use 2 kB. The whole database system therefore needs 5 TB (2.5 billion records) to store meter-reading data.
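
The arithmetic of this numerical example can be rechecked with a short sketch. The constant names are illustrative only, and treating 1 GB as 10^9 bytes is an assumption; the text does not say which unit convention it uses.

```python
# Recomputes the sizing example from the text.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 1000 GB).
INTERVAL_MIN = 10              # one meter reading every 10 minutes
CUSTOMERS = 1_000_000          # number of consumers (data IDs)
TABLES = 2_500                 # number of tables
RECORDS_PER_TABLE = 1_000_000  # records held by one table
TABLE_CAPACITY_GB = 2          # upper limit of one table

total_records = TABLES * RECORDS_PER_TABLE                    # records in the whole system
records_per_customer = total_records // CUSTOMERS             # records kept per consumer
retention_days = records_per_customer * INTERVAL_MIN / (60 * 24)
bytes_per_record = TABLE_CAPACITY_GB * 10**9 // RECORDS_PER_TABLE
total_tb = TABLES * TABLE_CAPACITY_GB / 1000                  # GB -> TB

print(f"{records_per_customer} records/consumer, about {retention_days:.1f} days")
print(f"{bytes_per_record} bytes/record, {total_tb} TB in total")
```

Under these assumptions the sketch reproduces the figures in the text: 2500 records per consumer (about 17.4 days), 2000 bytes (2 kB) per record, and 5 TB in total.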

As described above, large amounts of data may arrive at relatively short intervals. A single computer system handling that load faces two problems: input/output for collecting and providing the data becomes a bottleneck, and exclusive control is performed more often. Such a system would need fast input/output and high internal processing capacity, which makes it expensive.

A distributed database is a known way to deal with this problem. Its database management system accesses databases built on several computer systems as if they were one database. Because the data is spread across several computer systems, the bottleneck of a single-system database can be avoided.

A distributed database, however, is built from several physical computer systems (hereinafter "entity systems"), and these may have different hardware resources. In a database system, differences in CPU performance and in storage-device capacity have a particularly strong effect on throughput. Hardware variation among the entity systems of a distributed database system therefore causes variation in the write time when data is stored and in the response time when data is provided.

In addition, when data is stored across the entity systems of a distributed database system, access may concentrate on a single entity system. That system then becomes a bottleneck and lowers the throughput of the distributed database system as a whole.

This embodiment adopts the configuration described below so that bottlenecks are suppressed even when hardware resources vary among the entity systems of the distributed database system. With this configuration, the distributed database's throughput is unlikely to drop even when entity systems with inferior hardware resources are mixed in.

As shown in FIG. 2, this embodiment is described using a plurality of entity systems 21, 22, and 23 (three in the illustrated example). At least one entity system, 21, communicates with the other entity systems, 22 and 23, through a telecommunication line NT1. NT1 is assumed to be a public wide-area network such as the Internet, but it may also be a dedicated network or a private network. Each entity system has three components:
- a large-capacity storage device such as a hard disk drive;
- a CPU that executes programs;
- a communication interface device for communicating with other devices through NT1.

As shown in FIG. 1, the entity systems 21, 22, and 23 together function as a server 20. They form a client-server system whose client is a user terminal 31 connected to the telecommunication line NT1. This client-server system has a three-tier architecture (presentation layer, application layer, and data layer), and the user terminal 31 implements the presentation layer.

The server 20 corresponds to the data layer. In the illustrated example, a server 32 corresponding to the application layer is provided separately from the server 20. The server 32 passes the processing requested by the user terminal 31 to the server 20 as requests, and returns the server 20's responses to the user terminal 31.

As described later, the server 20 is divided into a plurality of virtual servers 10. The server 32 integrates these virtual servers 10 so that they operate as the single server 20. This function is implemented by an API (Application Program Interface) provided in the server 32. The API includes functions or instructions for selecting the virtual server 10 that corresponds to a request from the user terminal 31, and it also handles the entity systems 21, 22, and 23 as the single server 20.

In other words, the server 20 is a distributed database made up of the entity systems 21, 22, and 23, and each virtual server 10 is a constituent unit of the server 20. In this embodiment, all virtual servers 10 have the same storage capacity.

Because this embodiment assumes a remote meter-reading system, a collection server 33 is also connected to the telecommunication line NT1. It collects meter-reading data and stores it in the server 20. The collection server 33 has an API that implements a selection function unit (selecting means) 331, which decides which of the virtual servers 10 stores the meter-reading data. The operation of the selection function unit 331 is described later.

Each virtual server 10 has a data storage unit 101 that stores data and a management function unit 102 that provides the functions of a database management system. The management function unit 102 is implemented by an API provided in the entity systems 21, 22, and 23. It performs data operations, such as storing data in the data storage unit 101, providing data read from the storage device, and deleting data stored in the storage device. It also performs transaction management. These functions match those of a general database management system. The management function unit 102, however, is organized in three layers:
- an interface layer 1021 for transaction management;
- an upper layer 1022 for database control;
- a lower layer 1023 for database operations.

As described above, the server 20 is divided into a plurality of virtual servers 10 (for example, 100). The data storage unit 101 is therefore divided as well, and each virtual server 10 has its own management function unit 102. For simplicity, data storage units 101 and management function units 102 are assumed to correspond one to one, so each virtual server 10 has exactly one of each.

In this embodiment, the data storage units 101 of all virtual servers 10 have the same storage capacity. This holds regardless of the storage capacity of the entity system 21, 22, or 23 on which each virtual server 10 is built. In general, an entity system with higher processing capacity also has more storage. A larger entity system therefore hosts proportionally more virtual servers 10, so equal data storage units should give the virtual servers 10 roughly equal throughput regardless of the processing capacity of the entity systems.

As shown in FIG. 3, the storage area of each virtual server's data storage unit 101 is divided into pieces of equal unit size, and each piece forms a table 1011. The data storage unit 101 therefore contains a plurality of tables 1011 (25 per virtual server in the illustrated example). In the illustrated example, each table 1011 has a four-digit identification number from "0001" to "2500". The last two digits of a table number are the number of the virtual server 10 that holds it, written "01" … "99", "00", where "00" means No. 100.
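
The numbering rule can be illustrated with a short sketch based on the FIG. 3 numbers (2500 tables, 100 virtual servers). It groups table numbers by their last two digits, and shows that each virtual server then holds 25 tables. The function names are hypothetical and are not part of the described system.

```python
NUM_TABLES = 2500    # tables "0001".."2500" in the FIG. 3 example
NUM_VSERVERS = 100   # virtual servers "01".."99", "00" (two-digit labels work because there are 100)

def vserver_of_table(table_no: int) -> int:
    """Virtual-server number given by the table number's last two digits; "00" means No. 100."""
    last_two = table_no % NUM_VSERVERS
    return NUM_VSERVERS if last_two == 0 else last_two

def tables_of_vserver(vs_no: int) -> list[int]:
    """All table numbers whose last two digits name virtual server vs_no."""
    return [t for t in range(1, NUM_TABLES + 1) if vserver_of_table(t) == vs_no]

for vs in (1, 35, 100):
    tables = tables_of_vserver(vs)
    print(f"VS {vs % 100:02d}: {len(tables)} tables, e.g. {tables[:3]}")
```

Virtual server "01" holds tables 0001, 0101, …, 2401, and virtual server "00" holds tables 0100, 0200, …, 2500.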

Each consumer is also assigned identification information (hereinafter "data ID"). The illustrated example assumes 1 million consumers, so the records registered in the tables 1011 carry 1 million distinct data IDs. Records with the same data ID are stored in the same table 1011. Data IDs are numbers, or are converted to numbers as needed, and are assigned sequentially from 1 to 1,000,000. Here the data ID corresponds to a consumer because the data is meter-reading data. More generally, the data ID identifies the set to which the data belongs.

To build the virtual servers 10 described above, several values must be determined, including the number of virtual servers 10, the storage capacity of one virtual server 10, and the storage capacity of one table 1011.

The procedure for building the virtual servers 10 appropriately is described below. First, to set the scale of the database system, the expected number of consumers is needed, that is, the expected maximum number of data IDs.

The virtual servers 10 are built on the entity systems 21, 22, and 23, and one virtual server 10 should not span several entity systems. Building the virtual servers 10 therefore requires knowing the storage capacity of each entity system.

The capacity of a virtual server's data storage unit 101 is capped by the smallest storage device among the entity systems 21, 22, and 23. The minimum expected storage capacity of the entity systems is therefore found first, and that minimum sets the capacity of each data storage unit 101.
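
The rule above can be sketched as follows. This sketch makes two assumptions. First, the virtual-server capacity Q is set exactly to the smallest entity-system capacity, as in the FIG. 3 example. Second, because a virtual server should not span entity systems, each system hosts as many whole virtual servers as fit (floor division). The function and system names are hypothetical.

```python
def plan_vservers(capacities_gb: dict[str, int]) -> tuple[int, dict[str, int]]:
    """Return (Q, number of virtual servers hosted by each entity system)."""
    q_vs = min(capacities_gb.values())          # Q: capped at the smallest entity system
    per_system = {name: cap // q_vs             # whole virtual servers only; none spans systems
                  for name, cap in capacities_gb.items()}
    return q_vs, per_system

# Hypothetical fleet matching the example later in the text: forty 100 GB and twenty 50 GB systems.
fleet = {f"big-{i:02d}": 100 for i in range(1, 41)}
fleet.update({f"small-{i:02d}": 50 for i in range(1, 21)})
q_vs, per_system = plan_vservers(fleet)
print(q_vs, sum(per_system.values()))
```

With that fleet, Q is 50 GB, each 100 GB system hosts two virtual servers, and each 50 GB system hosts one, for 100 virtual servers in total.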

The amount of data (number of records) stored in one table 1011 and the storage capacity of a table 1011 are design values. They are set according to how much data is acquired from consumers and stored. The amount of data in one table 1011 can be at most the capacity set for a virtual server's data storage unit 101. In practice, it should be kept well below that capacity.

In summary, building a distributed database from virtual servers 10 requires the following design information:
- the expected maximum number of data IDs;
- the number of records stored per data ID;
- the storage capacity of the entity systems;
- the storage capacity of a virtual server 10;
- the storage capacity of a table 1011;
- the maximum number of records stored in a table 1011.

From these parameters, three values follow: the number of virtual servers 10, the maximum total number of tables 1011, and the number of tables 1011 per virtual server 10. The virtual servers 10 can then be mapped to the entity systems 21, 22, and 23.

Let Q be the storage capacity of one virtual server 10 and q the storage capacity of a table 1011. One virtual server 10 then holds Q / q tables 1011. Further, let N be the expected maximum number of data IDs, m the number of records stored per data ID, and n the number of records stored in one table 1011. The system then needs N × m / n tables 1011. The number of virtual servers 10 is therefore (N × m / n) / (Q / q) = Nmq / (nQ).
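
The relations above can be written as small functions, checked here against the FIG. 3 values (N = 1,000,000, m = 2500, n = 1,000,000, Q = 50 GB, q = 2 GB). The text's example divides evenly, so it gives no rounding rule. This sketch assumes that Q / q is rounded down (a table cannot be split across virtual servers) and that the table and server counts are rounded up so the capacity always suffices. All names are hypothetical.

```python
from dataclasses import dataclass

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)

@dataclass(frozen=True)
class DesignParams:
    N: int     # expected maximum number of data IDs
    m: int     # records stored per data ID
    n: int     # records stored in one table
    Q_gb: int  # storage capacity of one virtual server
    q_gb: int  # storage capacity of one table

def tables_per_vserver(p: DesignParams) -> int:
    return p.Q_gb // p.q_gb                              # Q / q, rounded down

def tables_needed(p: DesignParams) -> int:
    return ceil_div(p.N * p.m, p.n)                      # N * m / n, rounded up

def vservers_needed(p: DesignParams) -> int:
    return ceil_div(p.N * p.m * p.q_gb, p.n * p.Q_gb)    # N m q / (n Q), rounded up

p = DesignParams(N=1_000_000, m=2500, n=1_000_000, Q_gb=50, q_gb=2)
print(tables_per_vserver(p), tables_needed(p), vservers_needed(p))
```

For the FIG. 3 parameters this gives 25 tables per virtual server, 2500 tables in total, and 100 virtual servers.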

The configuration example of FIG. 3 uses these values:
- expected maximum number of consumers (data IDs) N: 1 million;
- storage capacity q of a table 1011: 2 GB;
- records n stored in a table 1011: 1 million;
- records m stored per consumer: 2500.

The smallest storage capacity among the entity systems is 50 GB, so the storage capacity Q of a virtual server 10 is also 50 GB. The configuration example of FIG. 2 uses three entity systems 21, 22, and 23, whereas that of FIG. 3 uses more.

Substituting these values into the relations above gives 50 GB / 2 GB = 25 tables 1011 per virtual server 10. The number of virtual servers 10 in the database system is (1,000,000 × 2500 records × 2 GB) / (1,000,000 records × 50 GB) = 100.

Now assume the database system is built from entity systems with 100 GB of storage and entity systems with 50 GB of storage. With a virtual-server capacity of 50 GB, some entity systems host one virtual server 10 and others host two.

In the configuration example above, assume the database system uses forty 100 GB entity systems and twenty 50 GB entity systems. Of the 100 virtual servers 10, 20 are then allocated to the 50 GB entity systems and 80 to the 100 GB entity systems. In this configuration, the virtual servers should be numbered so that those on single-server entity systems sit between those on multi-server entity systems.

For example, numbers "01" to "40" and "61" to "99", "00" are used for virtual servers 10 on entity systems that host several virtual servers, and "41" to "60" for virtual servers 10 on entity systems that host only one. The numbers of the virtual servers on one multi-server entity system need not be consecutive, but the following description assumes they are.
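
The numbering example can be turned into the kind of virtual-server-to-entity-system lookup table used later by the selection function unit 331. This is a minimal sketch that assumes consecutive numbers on each 100 GB system, as the text does. The entity-system names "big-NN" and "small-NN" are hypothetical.

```python
def build_lookup() -> dict[int, str]:
    """Map virtual-server number (1..100, where 100 is written "00") to a hypothetical entity system."""
    lookup: dict[int, str] = {}
    # "01"-"40" and "61"-"00": two consecutive numbers per 100 GB entity system
    multi = list(range(1, 41)) + list(range(61, 101))
    for i in range(0, len(multi), 2):
        system = f"big-{i // 2 + 1:02d}"
        lookup[multi[i]] = system
        lookup[multi[i + 1]] = system
    # "41"-"60": one number per 50 GB entity system, placed between the two blocks above
    for j, vs in enumerate(range(41, 61), start=1):
        lookup[vs] = f"small-{j:02d}"
    return lookup

lookup = build_lookup()
for vs in (1, 2, 41, 61, 100):
    print(f"{vs % 100:02d} -> {lookup[vs]}")
```

Servers "01" and "02" share system big-01, "41" is alone on small-01, and "00" (No. 100) is on big-40.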

To store meter-reading data in a database system with this configuration, the table 1011 that will hold the data must be determined. The selection function unit 331 of the collection server 33 does this in three steps, described below:
1. find the virtual server 10 that will store the meter-reading data;
2. determine the entity system corresponding to that virtual server 10;
3. determine the table 1011 that will store the data.

The virtual server 10 that stores a piece of meter reading data can be determined simply from its numerical data ID. If the data ID is X and the number of virtual servers 10 is K, the number of the virtual server 10 that stores the meter reading data of the corresponding customer is the remainder of X divided by K (that is, X mod K). When the remainder is 0, the data are assigned to virtual server "00".

For example, for data ID = 50, (50 mod 100) = 50, so the meter reading data are stored in virtual server 50. For data ID = 36935, (36935 mod 100) = 35, so the data are stored in virtual server 35. Because the selection function unit 331 of the collection server 33 associates each virtual server number with an entity system, for example through a lookup table, the virtual server number also determines which entity system is accessed.
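The virtual server rule X mod K can be sketched in a few lines. This assumes K = 100 as in the example; formatting the remainder as two digits makes a remainder of 0 come out as "00" without a special case. The function name is hypothetical.

```python
K = 100  # number of virtual servers 10 in the example


def virtual_server_number(data_id: int, k: int = K) -> str:
    """Return the two-digit number of the virtual server for a data ID (X mod K).

    With K = 100, a remainder of 0 is printed as "00", which is the label the
    text gives to the 100th virtual server.
    """
    return f"{data_id % k:02d}"


for x in (50, 36935, 2500):
    print(x, "->", virtual_server_number(x))
# 50 -> 50
# 36935 -> 35
# 2500 -> 00
```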

Next, the table 1011 within the virtual server 10 that is to store the meter reading data must be determined. In the configuration described above this is also easy: if the total number of tables 1011 is L (L = 2500 in this example) and the data ID is X, the table number is the remainder of X divided by L (that is, X mod L). When the remainder is 0, table 2500 is assigned. Because L is a multiple of K, a given table number always falls in the same virtual server 10; for example, table 1935 is in virtual server 35.

For example, for data ID = 50, (50 mod 2500) = 50, so the meter reading data are stored in table 0050. For data ID = 36935, (36935 mod 2500) = 1935, so the data are stored in table 1935.
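The table rule X mod L, including the special case that a remainder of 0 selects table 2500, can be sketched as follows. L = 2500 is taken from the example; the function name is hypothetical.

```python
TOTAL_TABLES = 2500  # total number of tables 1011 (L) in the example


def table_number(data_id: int, total_tables: int = TOTAL_TABLES) -> str:
    """Return the four-digit table number for a data ID (X mod L).

    A remainder of 0 is mapped to table L (2500 here), as the text specifies.
    """
    remainder = data_id % total_tables
    return f"{remainder or total_tables:04d}"


for x in (50, 36935, 2500):
    print(x, "->", table_number(x))
# 50 -> 0050
# 36935 -> 1935
# 2500 -> 2500
```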

As described above, determining the virtual server 10 that stores the meter reading data also determines the entity system that stores it, so the storage area corresponding to the virtual server 10 and the table 1011 can be located within that entity system. In other words, the meter reading data can be stored in the correct table 1011.

When the table 1011 is selected by this procedure and the data IDs are consecutive numbers, the meter reading data of successive customers are stored in different virtual servers 10 in turn, as indicated by the arrows in FIG. 3.

That is, the meter reading data of customer 1 are stored in table 0001 in virtual server 1, and those of customer 2 are stored in table 0002 in virtual server 2. Likewise, the data of customer 101 are stored in table 0101 in virtual server 1, and the data of customer 2500 are stored in table 2500 in the 100th virtual server (numbered "00"). The data of customer 2501 are stored in table 0001 in virtual server 1 again. In short, all meter reading data whose data ID leaves a remainder of 1 when divided by 2500 are stored in table 0001. With one million customers, each table 1011 therefore holds the meter reading data of 400 customers.
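The walk-through above can be checked numerically. This sketch combines the two modulo rules (K = 100, L = 2500, one million data IDs, all taken from the example) and counts how many customers land in each (virtual server, table) pair; server 0 stands for "00". The function name is hypothetical.

```python
from collections import Counter

K, L = 100, 2500       # virtual servers and tables in the example
MAX_IDS = 1_000_000    # data IDs (customers) in one group


def place(data_id: int) -> tuple[int, int]:
    """Return (virtual server, table) for a data ID; server 0 stands for "00"."""
    return data_id % K, data_id % L or L


# Consecutive IDs land on consecutive virtual servers (the arrows in FIG. 3).
print([place(x) for x in (1, 2, 101, 2500, 2501)])
# [(1, 1), (2, 2), (1, 101), (0, 2500), (1, 1)]

# Count how many customers end up in each (server, table) pair.
per_table = Counter(place(x) for x in range(1, MAX_IDS + 1))
print(len(per_table), set(per_table.values()))
# 2500 {400}
```

Every table receives exactly 400 customers, and each table number is tied to a single virtual server, which is the striping-like spread the next paragraph describes.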

Through this operation, the meter reading data of successive customers are distributed across different virtual servers 10, much as data are striped across several hard disks. Access is therefore not concentrated on a single virtual server 10, and meter reading data are stored faster. It also becomes possible to build a distributed database in which bottlenecks are unlikely even when each virtual server 10 has relatively low processing capacity. In this embodiment, the virtual servers 10 have equal storage capacity and therefore equalized processing capacity, so the processing speed for storing meter reading data is equalized as well.

The configuration example above describes a database that handles customers' meter reading data, but some applications acquire and store several types of data at a single measurement point. For example, gas and water meter reading data can be stored together with electricity meter reading data, and information related to crime prevention and disaster prevention at the customer's premises can be stored as well. The database is thus not limited to remote meter reading and may store several types of data, such as text, audio, and image data in addition to numerical data. When different types of data are collected, a data type name is attached to each piece of data, in addition to the data ID that relates to the place where the data were obtained, so that the type of data can be identified.

Because the amount of information differs from one type of data to another, handling mixed types of data as a single batch makes the amount of information vary widely depending on which types the batch contains. Consequently, if mixed types of data are stored together in a single table 1011, the access time of the virtual server 10 varies widely.

Therefore, as shown in FIG. 4, the data storage unit 101 of the virtual server 10 is preferably provided with several (three in the illustrated example) type-specific storage units 1012, 1013, and 1014, which store data sorted by type. In this case, the virtual server 10 is provided with a classification function unit 103 that sorts the data by type. Like the management function unit 102, the classification function unit 103 is implemented using an API. Data transferred from the collection server 33 to the virtual server 10 are sorted by type in the classification function unit 103 and stored in the corresponding type-specific storage units 1012, 1013, and 1014.
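A minimal sketch of the routing performed by the classification function unit 103, assuming each record carries its data ID and data type name. The class name and the type names "electricity", "gas", and "water" (standing in for storage units 1012, 1013, 1014) are hypothetical, and in-memory dictionaries replace the real storage units.

```python
from collections import defaultdict


class ClassificationUnit:
    """Hypothetical stand-in for classification function unit 103.

    Each type name gets its own store (playing the role of a type-specific
    storage unit), keyed by data ID.
    """

    def __init__(self, type_names):
        self.stores = {name: defaultdict(list) for name in type_names}

    def store(self, data_id: int, type_name: str, payload) -> None:
        if type_name not in self.stores:
            raise ValueError(f"unknown data type: {type_name}")
        self.stores[type_name][data_id].append(payload)


unit = ClassificationUnit(["electricity", "gas", "water"])
unit.store(50, "electricity", 12.5)
unit.store(50, "gas", 0.8)
print(dict(unit.stores["electricity"]), dict(unit.stores["gas"]))
# {50: [12.5]} {50: [0.8]}
```

Keeping one store per type means records of similar size sit together, which is why the per-unit load evens out.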

In this way, different types of data are sorted into and stored in the type-specific storage units 1012, 1013, and 1014. Therefore, even if the amount of information differs from one data ID to another, the variation in the amount of information (data volume) stored in each type-specific storage unit 1012, 1013, 1014 is suppressed. A given data ID may or may not have data associated with a particular type-specific storage unit 1012, 1013, 1014, but where such data exist, their amount varies little. The processing load of storing data is therefore equalized within each type-specific storage unit 1012, 1013, 1014.

In the configuration example of FIG. 4, one virtual server 10 holds all of the type-specific storage units 1012, 1013, and 1014. Alternatively, each type-specific storage unit 1012, 1013, 1014 may be placed in a different virtual server 10, or only some of them may be distributed to different virtual servers 10.

When the number of data IDs grows beyond the expected maximum, the database system must be expanded. In the example above the maximum number of data IDs is one million; collecting data on a larger scale requires raising this maximum. In that case, the configuration described above is treated as one unit, and as many units as needed are added. For example, each unit is given a group name, using numbers assigned in order from 0. The configuration example above can store a total of 2.5 billion records for one million data IDs, so adding one group makes it possible to store a total of 5 billion records for two million data IDs. That is, data with data IDs from 1 to 1,000,000 are stored in group "0", and data with data IDs from 1,000,001 to 2,000,000 are stored in group "1". In this example, the group name is the integer quotient of (data ID − 1) divided by 1,000,000, which keeps data ID 1,000,000 in group "0".
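The group rule can be sketched as an integer division. This assumes data IDs start at 1 and each group holds 1,000,000 IDs, as in the example; subtracting 1 before dividing keeps the boundary IDs 1,000,000 and 2,000,000 in groups "0" and "1", matching the stated ranges. The function name is hypothetical.

```python
IDS_PER_GROUP = 1_000_000  # data IDs handled by one group (unit) in the example


def group_number(data_id: int) -> int:
    """Group for a 1-based data ID: 1..1,000,000 -> 0, 1,000,001..2,000,000 -> 1."""
    if data_id < 1:
        raise ValueError("data IDs are assumed to start at 1")
    return (data_id - 1) // IDS_PER_GROUP


print([group_number(x) for x in (1, 1_000_000, 1_000_001, 2_000_000)])
# [0, 0, 1, 1]
```

Inside a group, the virtual server and table are still chosen with the same modulo rules, so only this one extra step is added.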

Because the database is expanded one group at a time, the selection function unit 331 of the collection server 33 only needs an added step that identifies the group, and can distribute data without changing any of its other steps. A separate server that sorts data into groups may also be placed upstream of the collection server 33. In other words, the database system may be expanded by providing a dedicated group-sorting function and adding groups that each include their own collection server 33.

Whether the database system needs to be expanded depends on the total number of data IDs, so the collection server 33 or the server 32 may be given a function for acquiring that total (normally entered by operating an input device such as a keyboard). The collection server 33 or the server 32 calculates the number of groups required to store the data from the acquired total, and if that number exceeds the current number of groups, instructs the user terminal 31 to expand the database.
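The expansion check amounts to a ceiling division followed by a comparison. A minimal sketch, assuming 1,000,000 data IDs per group as in the example; the function names are hypothetical, and the returned flag stands in for the instruction sent to the user terminal 31.

```python
IDS_PER_GROUP = 1_000_000  # data IDs per group in the example


def groups_required(total_data_ids: int) -> int:
    """Smallest number of groups that can hold the given number of data IDs."""
    return max(1, (total_data_ids + IDS_PER_GROUP - 1) // IDS_PER_GROUP)


def needs_expansion(total_data_ids: int, current_groups: int) -> tuple[bool, int]:
    """Return (expand?, required groups); expansion is needed if more groups are required."""
    required = groups_required(total_data_ids)
    return required > current_groups, required


print(needs_expansion(1_500_000, 1))  # (True, 2)
print(needs_expansion(900_000, 1))    # (False, 1)
```

Integer arithmetic is used for the ceiling so that large totals are not rounded through floating point.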

In general, when a database is expanded, new data are stored in the newly added entity systems, so access would be expected to concentrate on them. In this embodiment, however, the data are distributed across multiple virtual servers 10 by data ID, so even after the database is expanded, access does not concentrate on the newly added entity systems. Access to the virtual servers 10 is leveled, and as a result both the storage usage and the access load of the virtual servers 10 are leveled.

The example above describes determining the virtual server 10 and the table 1011 when storing data and passing the data to the corresponding entity system, but the same procedure is used when extracting data. For example, when the data for a specific data ID are needed, the virtual server 10 and the table 1011 are determined from that data ID, and the data are extracted from the table 1011 of the corresponding entity system.
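Because storage and extraction use the same procedure, both can call one shared placement function. This is a minimal in-memory sketch, assuming K = 100 and L = 2500 from the example; the function names are hypothetical and nested dictionaries stand in for the entity systems. Since one table holds several data IDs (400 in the example), extraction filters by the requested ID.

```python
from collections import defaultdict

K, L = 100, 2500  # virtual servers and tables in the example


def locate(data_id: int) -> tuple[int, int]:
    """Return (virtual server, table) for a data ID; server 0 stands for "00"."""
    return data_id % K, data_id % L or L


# In-memory stand-in: virtual server -> table -> list of (data ID, record).
servers = defaultdict(lambda: defaultdict(list))


def store(data_id: int, record) -> None:
    server, table = locate(data_id)
    servers[server][table].append((data_id, record))


def fetch(data_id: int) -> list:
    server, table = locate(data_id)
    rows = servers.get(server, {}).get(table, [])
    # A table is shared by several data IDs, so keep only the requested one.
    return [rec for did, rec in rows if did == data_id]


store(36935, "reading A")
store(39435, "reading B")  # same server 35 and table 1935 as ID 36935
print(fetch(36935), fetch(39435))  # ['reading A'] ['reading B']
```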

In the configuration example above, a telecommunication line is used to collect the data to be stored and to use the stored data, and the entity systems cooperate by communicating with one another over that line. Communication over a telecommunication line is not essential, however. Data may be collected or used through input and output devices attached to the entity systems, and multiple entity systems may be placed physically close together and connected through suitable interface devices so that they can cooperate.

The API that makes up the management function unit 102 may also be given a function for accepting various parameters of the database system described above. In that case, the management function unit 102 may use the parameters set in it to determine the storage capacity of the data storage unit 101 in each virtual server 10, the total number of tables 1011, the storage capacity of each table 1011, and so on.
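One way to picture such parameters is a small record from which the table size is derived, with a check taken from claim 1 that a virtual server may not exceed the smallest entity system. The class, field, and function names are hypothetical; the values 50 GB and 25 tables per virtual server are chosen to match the example (a 50 GB smallest entity system, and 2500 tables over 100 virtual servers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualServerParams:
    """Hypothetical parameter set accepted by management function unit 102."""
    storage_capacity_gb: float  # capacity of data storage unit 101
    tables_per_server: int      # number of equally sized tables 1011

    @property
    def table_capacity_gb(self) -> float:
        return self.storage_capacity_gb / self.tables_per_server


def validate(params: VirtualServerParams, entity_capacities_gb: list[float]) -> VirtualServerParams:
    # Claim 1: the virtual server capacity is capped by the smallest computer system.
    limit = min(entity_capacities_gb)
    if params.storage_capacity_gb > limit:
        raise ValueError(f"capacity exceeds smallest entity system ({limit} GB)")
    return params


params = validate(VirtualServerParams(50, 25), [100, 50])
print(params.table_capacity_gb)  # 2.0
```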

In the example above, the selection function unit 331 divides the data ID to determine the virtual server 10 and the table 1011 that store the data, but the data may instead be distributed using another relationship, such as random numbers.
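If random numbers are used, the placement still has to be reproducible, because data are retrieved with the same procedure used to store them. A minimal sketch, assuming Python's random.Random seeded with the data ID (a swapped-in choice; the source does not specify a generator) and 25 tables per virtual server from the example. Unlike the modulo rule, this spreads data only statistically, not exactly evenly.

```python
import random

K = 100                 # virtual servers in the example
TABLES_PER_SERVER = 25  # 2500 tables / 100 virtual servers


def locate_random(data_id: int) -> tuple[int, int]:
    """Pick (virtual server, table index within that server) pseudo-randomly.

    The generator is seeded with the data ID, so the same ID always gets the
    same placement and its data can be found again with the same call.
    """
    rng = random.Random(data_id)
    return rng.randrange(K), rng.randrange(TABLES_PER_SERVER)


print(locate_random(36935) == locate_random(36935))  # True
```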

10 Virtual server
20 Server (distributed database)
21, 22, 23 Entity system (computer system)
101 Data storage unit
102 Management function unit
331 Selection function unit (selecting means)
1011 Table
1012, 1013, 1014 Type-specific storage unit

Claims

1. A data storage system for storing data in a distributed database built from a plurality of computer systems, comprising: a plurality of virtual servers that have equal storage capacity and serve as the constituent units of the distributed database; and selecting means for determining, from among the virtual servers, the virtual server that is to store data; wherein each virtual server comprises a data storage unit whose storage area for data is divided into a plurality of tables of equal unit size; the storage capacity of the data storage unit is defined so that the storage capacity of the virtual server does not exceed the smallest storage capacity among the plurality of computer systems; and the selecting means uses a numerical data ID identifying the set to which the data belong to determine the virtual server and the table that store the data, such that data belonging to the same set are stored in the same virtual server and the same table, thereby distributing the data to be stored in the distributed database among the virtual servers and storing them in the tables.

2. The data storage system according to claim 1, wherein the data storage unit comprises a plurality of type-specific storage units that store data separately by type.

3. The data storage system as described, wherein the distributed database comprises a management function unit that stores data in the data storage unit, provides data from the data storage unit, and deletes data in the data storage unit.

4. The data storage system according to any one of claims 1 to 3, wherein a data ID identifying the set to which the data belong is attached to the data, and the selecting means specifies the virtual server by the remainder of the data ID divided by the number of virtual servers and specifies the table by the remainder of the data ID divided by the number of tables, thereby distributing and storing the data so that data having the same data ID are stored in the same table.