
CN110445662B - Method and device for adaptively switching OpenStack control node into computing node - Google Patents


Info

Publication number
CN110445662B
CN110445662B (application number CN201910810282.4A)
Authority
CN
China
Prior art keywords
node
computing
computing node
nodes
control node
Prior art date
Legal status
Active
Application number
CN201910810282.4A
Other languages
Chinese (zh)
Other versions
CN110445662A (en)
Inventor
刘梦可 (Liu Mengke)
刘超 (Liu Chao)
Current Assignee
Inesa R&d Center
Original Assignee
Inesa R&d Center
Priority date
Filing date
Publication date
Application filed by Inesa R&d Center filed Critical Inesa R&d Center
Priority to CN201910810282.4A priority Critical patent/CN110445662B/en
Publication of CN110445662A publication Critical patent/CN110445662A/en
Application granted granted Critical
Publication of CN110445662B publication Critical patent/CN110445662B/en

Classifications

    • H04L 41/0668: Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L 41/147: Network analysis or design for predicting network behaviour
    • H04L 41/30: Decision processes by autonomous network management units using voting and bidding
    • H04L 43/0817: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability and functioning
    • H04L 43/12: Network monitoring probes
    • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network


Abstract

The invention relates to a method and a device for adaptively switching an OpenStack control node into a computing node, wherein the OpenStack deployment comprises a plurality of control node groups and computing node groups, and the method comprises the following steps. S1: dividing the control node groups into switchable control node groups and non-switchable control node groups, and selecting a control node to be switched from the switchable control node groups through an election algorithm. S2: periodically triggering monitoring; if a node fault or an excessively high total load occurs in a computing node group, triggering the adaptive upgrading process, otherwise ending the process. The adaptive upgrading process specifically comprises: switching the control node to be switched into a computing node through an automated management tool combined with container technology, and adding it to the computing node group of step S2. Compared with the prior art, the invention has advantages such as high efficiency.

Description

Method and device for adaptively switching OpenStack control node into computing node
Technical Field
The invention relates to the technical field of OpenStack cloud platforms, in particular to a method and a device for adaptively switching an OpenStack control node into a computing node.
Background
OpenStack is an open-source cloud computing management platform that manages large pools of distributed compute, storage and network resources and provides a unified management dashboard. Its purpose is to help organizations run clouds offering virtual computing or storage services and to provide scalable, flexible cloud computing for public and private clouds; after years of production practice, OpenStack has become very mature.
In small and medium-scale cloud platforms, the usual deployment architecture is a model of multiple control nodes and multiple computing nodes; the control nodes may be multiplexed as network nodes, and distributed storage services may be deployed on the control nodes, the computing nodes or other independent nodes. As the servers of a cloud platform age, their failure rate rises, and server failures are frequently encountered in production environments. When a computing node suffers a functional failure, particularly one that also hosts reused storage services, the virtual machines on that node cannot provide service for some time; in addition, when the load of the computing node group becomes too high, virtual machine performance degrades and user experience suffers. To improve the operational continuity of a virtualization cluster, services should be restored promptly when a virtual machine running a service fails, so that the service interruption time is minimized. In existing high-availability techniques, each server in the virtualization cluster runs an agent; the agents continuously detect the state of the other servers by sending heartbeat signals at fixed intervals, and if a server fails to respond to three consecutive heartbeats it is considered faulty, the fault is reported, and all virtual machines on the faulty server are restarted on other servers in the cluster so that virtual machine service is restored and business continuity is ensured. Alternatively, a new server is added or the failed server is replaced, and the computing services and network agent services of the cloud platform are deployed directly on the operating system of the new server as RPM packages. However, because racking and unracking servers and deploying the services are complex, this approach recovers slowly and has poor timeliness, and it can hardly meet the high timeliness requirements of production environments.
To address these problems, the prior art also offers a solution: Chinese patent CN108089911A provides a method and a device for controlling computing nodes in an OpenStack environment, wherein the control node periodically monitors status data of the computing nodes, determines from the monitored status data whether each computing node is available, and, if not, evacuates the virtual machines running on that computing node to computing nodes determined to be available in the environment. By determining whether a computing node is available and promptly evacuating the virtual machines running on an unavailable node, the method minimizes the RTO and RPO and resumes virtual machine traffic in the shortest time, maintaining high availability. However, the increased load on the computing nodes that receive the virtual machines also slows the services on those nodes; dispersing the virtual machines of the unavailable node onto other available computing nodes increases their operational burden, so the method cannot effectively solve the problem of excessive computing-node load and it affects the service quality of tenant virtual machines.
Disclosure of Invention
The present invention is directed to providing a method and an apparatus for adaptively switching an OpenStack control node into a computing node, so as to overcome the above-mentioned drawbacks of the prior art.
The purpose of the invention can be realized by the following technical scheme:
a method for adaptively switching OpenStack control nodes into computing nodes is provided, wherein the OpenStack is a topological structure formed by a control node group and a computing node group, and the method comprises the following steps:
S1: dividing the control node groups into switchable control node groups and non-switchable control node groups, and selecting a control node to be switched from the switchable control node groups through an election algorithm;
S2: periodically triggering monitoring; if a node fault or an excessively high total load occurs in the computing node group, triggering the adaptive upgrading process, otherwise ending the process;
the adaptive upgrading process specifically comprises: switching the control node to be switched into a computing node through an automated management tool combined with container technology, and adding the computing node to the computing node group of step S2.
Furthermore, the control node group provides highly available cloud platform management and control services; the switchable control node group serves as a supplement to the control node group and can provide management services and virtual network services, improving the management performance of the cloud platform and increasing the service bandwidth of the centralized virtual network. Because every service of the control node group as a whole is highly available, switching a single control node does not interrupt cloud platform services.
Furthermore, the computing nodes provide services such as computing resource virtualization, computing resource management and the virtual layer-2 network agent, and supply computing resources such as CPU and memory to the virtual machines.
Furthermore, the OpenStack cloud platform adopts a containerized deployment mode: all services are packaged into corresponding Docker images and started as containers, which avoids dependency conflicts between different services, facilitates upgrade and rollback of each service, and effectively solves the problems of difficult deployment and difficult upgrading of the cloud platform;
the container image of every service is kept in a local private Docker registry. Exploiting the layered nature of container images, all customized images of the cloud platform are built in four layers: the operating system base image, the cloud platform base image, the base image of each function module, and the image of each service within the module. Image layering avoids repeated installation of dependency packages, reduces the total storage size of the images and improves deployment efficiency.
Further, the control node groups are grouped by marking custom labels; the number of non-switchable control node groups is at least 3, meeting the cloud platform's minimum-number requirement for high availability.
Furthermore, when the control nodes in the switchable control node group are deployed, the container images required by a computing node are pre-installed, and these images are kept synchronized whenever the cloud platform services are upgraded, so that the cloud platform computing services can be started directly from the local images, avoiding the performance degradation caused by transferring large files over the network;
further, the election algorithm selects the node with the minimum reference index value, the reference index being CPU load, network traffic, or a Cost value. The Cost value is obtained through a weighted summation, calculated as:
Cost = Σ_{i=1}^{N} W_i · X_i
where W_i is a weight value, X_i is any combination of the input parameters, which include CPU usage, memory usage and network traffic, and N is the number of input parameters.
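As an illustration, a minimal sketch of this election step is given below; the node names, metric values and weight values are illustrative assumptions, not values fixed by the patent.

```python
# Minimal sketch of the election step: choose the switchable control node
# with the smallest weighted Cost value. Node names, metric values and
# weights below are illustrative assumptions, not values from the patent.
def cost(metrics, weights):
    """Cost = sum over i of W_i * X_i for the selected input parameters."""
    return sum(weights[name] * metrics[name] for name in weights)

def elect_node_to_switch(switchable_nodes, weights):
    """Return the switchable control node with the minimum Cost value."""
    return min(switchable_nodes, key=lambda node: cost(node["metrics"], weights))

if __name__ == "__main__":
    weights = {"cpu_usage": 0.5, "mem_usage": 0.3, "net_traffic": 0.2}
    nodes = [
        {"name": "control03", "metrics": {"cpu_usage": 0.41, "mem_usage": 0.55, "net_traffic": 0.20}},
        {"name": "control04", "metrics": {"cpu_usage": 0.18, "mem_usage": 0.32, "net_traffic": 0.12}},
    ]
    print(elect_node_to_switch(nodes, weights)["name"])  # prints: control04
```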
Further, switching the control node to be switched into a computing node quickly through the automated management tool combined with container technology specifically comprises:
cleaning all containers on the control node to be switched using the automated deployment tool Ansible, keeping the operating system layer and the Docker service layer consistent with the computing nodes, and quickly starting the services of the resulting computing node, including nova-libvirtd, nova-compute and neutron-openvswitch-agent.
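The patent drives this switch with the automated deployment tool; as a rough illustration of the operations performed on the target host, here is a sketch using the Docker CLI from Python. The container and image names, registry address and tag are assumptions made for the example, not names fixed by the patent.

```python
# Rough sketch of the node switch as performed on the control node to be
# switched: remove every existing container, then start the compute-node
# services from locally cached images. Container/image names, registry
# address and tag are illustrative assumptions.
import subprocess

def run(cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def clean_all_containers():
    container_ids = run(["docker", "ps", "-aq"]).split()
    if container_ids:
        run(["docker", "rm", "-f", *container_ids])

def start_compute_services(registry="registry.local:4000", tag="train"):
    # The images are expected to be pre-pulled on switchable control nodes,
    # so these runs start almost immediately without network transfers.
    for name in ("nova_libvirt", "nova_compute", "neutron_openvswitch_agent"):
        run(["docker", "run", "-d", "--net=host", "--privileged",
             "--name", name, f"{registry}/{name}:{tag}"])

if __name__ == "__main__":
    clean_all_containers()
    start_compute_services()
```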
Further, the method for judging a node fault in the computing node group specifically comprises:
the monitoring system sends heartbeat packets to each computing node in the computing node group; if any computing node fails to respond to the heartbeat packets, a node fault has occurred in the computing node group.
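A minimal illustration of that check follows, assuming the monitoring agent on each node exposes a TCP port that the monitoring system can probe; the port, timeout and retry count are assumptions.

```python
# Minimal heartbeat-style liveness check: a node is treated as failed only
# after several consecutive missed probes. The agent port, timeout and retry
# count are illustrative assumptions.
import socket

def node_alive(host, port=5666, timeout=2.0, retries=3):
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue
    return False

def failed_nodes(compute_nodes):
    """Return the compute nodes that did not answer any heartbeat probe."""
    return [host for host in compute_nodes if not node_alive(host)]
```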
Further, the method for judging that the load of the computing node group is too high comprises total load calculation and total load prediction.
Further, the total load calculation method specifically comprises:
collecting the load of each computing node in the computing node group through the monitoring agent on that node, the load comprising CPU, memory and network traffic; when the total load of the computing node group exceeds a preset threshold, the total load of the computing node group is too high;
further, the total load prediction method specifically comprises:
predicting, based on historical monitoring data of the computing nodes, through a multi-input single-output neural network linear regression model:
Z = WX + B
where Z is the predicted load of a computing node, X = {x1, x2, …, xN} is an input sample comprising time, number of virtual machines and number of tenants, W = {w1, w2, …, wN} is the weight matrix, and B = {b1} is the offset matrix; the total load of the computing node group is obtained from the predicted load Z of each computing node in the group, and if it exceeds a preset threshold, the total load of the computing node group is too high.
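A minimal sketch of this prediction is shown below, assuming the weights and offset have already been fitted offline on the historical monitoring data; the parameter values and feature vectors in the example are illustrative only.

```python
# Minimal sketch of the multi-input single-output linear model Z = W*X + B
# used to predict per-node load, and of the group-level threshold check.
# The fitted weights/offset and the example features are illustrative
# assumptions; in practice they come from historical monitoring data.
import numpy as np

def predict_node_load(x, w, b):
    """x: features such as (hour of day, VM count, tenant count); returns Z."""
    return float(np.dot(w, x) + b)

def group_total_load_too_high(feature_vectors, w, b, threshold):
    total = sum(predict_node_load(x, w, b) for x in feature_vectors)
    return total > threshold

if __name__ == "__main__":
    w = np.array([0.02, 0.015, 0.01])   # hypothetical fitted weights
    b = 0.1                              # hypothetical fitted offset
    features = [np.array([14, 20, 5]), np.array([14, 35, 8]), np.array([14, 12, 3])]
    print(group_total_load_too_high(features, w, b, threshold=2.5))  # False
```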
An apparatus for adaptive switching of an OpenStack control node to a computing node comprises a memory having a computer program stored therein and a processor that invokes the computer program to perform the steps of the method.
Compared with the prior art, the invention has the following beneficial effects:
(1) The state of the computing node group is periodically monitored for node faults and excessively high total load, and the process of switching a control node into a computing node is triggered automatically, achieving self-healing or capacity expansion of the computing node group and maintaining the high availability of the cloud platform. The total load of the computing node group is obtained either by real-time calculation or by prediction; the prediction is performed by a neural network linear regression model based on historical data, so that node switching is carried out before the total load of the computing node group reaches the set threshold, avoiding impact on the cloud platform;
(2) Each service of the cloud platform is deployed in containers, so during the node switching process only the original container services on the control node need to be removed, and the environment can be cleaned quickly;
(3) The services of the cloud platform are stored in the private Docker registry as layered images, the layers being, in order, the operating system base image, the cloud platform base image, the base image of each function module and the image of each service within the module, which reduces the total storage size of the images; because the control node to be switched has already downloaded all images related to the computing node services from the private Docker registry in advance, without starting the corresponding containers, it can quickly start the corresponding container services after the environment is cleaned, giving high deployment efficiency;
(4) The control node groups are divided into switchable control node groups and non-switchable control node groups, where the non-switchable control node groups guarantee the high availability of the cloud platform, so switching a single node does not affect the continuity of cloud platform services.
Drawings
FIG. 1 is a flow chart of an adaptive switching node;
FIG. 2 is a block diagram of an adaptive switching node;
FIG. 3 is a flow diagram of a switching node according to an embodiment;
FIG. 4 is a Docker container deployment diagram of three classes of nodes.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example one
A method for adaptively switching OpenStack control nodes into computing nodes, wherein OpenStack is a topology structure formed by a control node group and a computing node group, as shown in FIG. 1, the method comprises the following steps:
S1: dividing the control node group into a switchable control node group and a non-switchable control node group, and selecting a control node to be switched from the switchable control node group through an election algorithm;
S2: periodically triggering monitoring; if a node fault or an excessively high total load occurs in the computing node group, triggering the adaptive upgrading process, otherwise ending the process;
the adaptive upgrading process specifically comprises: switching the control node to be switched into a computing node through the automated management tool combined with container technology, and adding the computing node to the computing node group of step S2.
The election algorithm selects the node with the minimum reference index value, the reference index being CPU load, network traffic, or a Cost value. The Cost value is obtained through a weighted summation, calculated as:
Cost = Σ_{i=1}^{N} W_i · X_i
where W_i is a weight value, X_i is any combination of the input parameters, which include CPU usage, memory usage and network traffic, and N is the number of input parameters.
Switching the control node to be switched into a computing node quickly through the automated management tool combined with container technology specifically comprises:
cleaning all containers on the control node to be switched using the automated deployment tool Ansible, keeping the operating system layer and the Docker service layer consistent with the computing nodes, and quickly starting the services of the resulting computing node, including nova-libvirtd, nova-compute and neutron-openvswitch-agent.
The method for judging a node fault in the computing node group specifically comprises:
the monitoring system sends heartbeat packets to each computing node in the computing node group; if any computing node fails to respond to the heartbeat packets, a node fault has occurred in the computing node group.
The method for judging that the load of the computing node group is too high comprises:
collecting the load of each computing node in the computing node group through the monitoring agent on that node, the load comprising CPU, memory and network traffic; when the total load of the computing node group exceeds a preset threshold, the total load of the computing node group is too high.
Specifically, in this embodiment, as shown in fig. 3, step S2 comprises the following sub-steps (a simplified sketch of the resulting monitoring loop is given after the list):
101) a timer triggers the monitoring system every five minutes to collect load information from each computing node of the cloud platform and to send heartbeat packets to the computing nodes;
102) judging whether the computing node group has a node fault or an excessively high total load; if so, executing step 103), otherwise ending the process;
103) if the monitoring system is configured in silent mode, directly executing step 104); otherwise notifying an administrator by mail or short message, and executing step 104) if the administrator agrees, otherwise ending the process;
104) selecting the control node to be switched from the switchable control node group through the election algorithm;
105) automatically cleaning the containers on the control node to be switched through Ansible, retaining the operating system layer;
106) automatically starting the compute-node container services on the control node to be switched through Ansible, switching it into a computing node, adding the computing node to the computing node group, and ending the process.
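Putting the sub-steps together, a simplified sketch of the periodic trigger loop is shown below. The five-minute interval follows this embodiment; the `checks` and `actions` objects stand in for the monitoring system, the election algorithm and the Ansible-driven switch described above, and the silent-mode flag models the administrator-approval step.

```python
# Simplified sketch of the step-S2 trigger loop (sub-steps 101-106).
# `checks` must provide failed_nodes() and group_overloaded(); `actions` must
# provide notify_admin(), elect_node() and switch_to_compute(); both stand in
# for the components described earlier and are assumptions of this sketch.
import time

def monitoring_loop(cluster, checks, actions, silent_mode=True, interval=300):
    while True:
        faulty = checks.failed_nodes(cluster.compute_nodes)            # 101/102
        overloaded = checks.group_overloaded(cluster.compute_nodes)    # 101/102
        if faulty or overloaded:                                       # 102
            if silent_mode or actions.notify_admin():                  # 103
                node = actions.elect_node(cluster.switchable_control_nodes)  # 104
                actions.switch_to_compute(node)                        # 105/106
                cluster.compute_nodes.append(node)                     # 106
        time.sleep(interval)  # the embodiment triggers every five minutes
```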
Example two
In this embodiment, the total load of the computing node group is obtained through a prediction algorithm; everything else is the same as in the first embodiment. The prediction algorithm is specifically as follows:
based on historical monitoring data of the computing nodes, the prediction is performed through a multi-input single-output neural network linear regression model:
Z = WX + B
where Z is the predicted load of a computing node, X = {x1, x2, …, xN} is an input sample comprising time, number of virtual machines and number of tenants, W = {w1, w2, …, wN} is the weight matrix, and B = {b1} is the offset matrix; the total load of the computing node group is obtained from the predicted load Z of each computing node in the group, and if it exceeds a preset threshold, the total load of the computing node group is too high.
EXAMPLE III
An apparatus for adaptively switching an OpenStack control node into a computing node, corresponding to the first embodiment, the OpenStack comprising control node groups and computing node groups, the apparatus comprising:
the fault monitoring module is used for detecting each computing node in the computing node group by sending heartbeat packets and judging whether the computing node group has a node fault;
the load detection module is used for collecting the load information of each computing node in the computing node group, calculating the total load of the computing node group from the collected load information or predicting it from historical load information, and judging whether the computing node group is overloaded according to a set load threshold;
the node processing module is used for dividing the control node groups into switchable control node groups and non-switchable control node groups, selecting the control node to be switched from the switchable control node groups through the election algorithm, switching it into a computing node through the automated management tool combined with container technology, and adding the computing node to the computing node group that has a node fault or an excessively high total load;
and the timing trigger module is used for setting a monitoring period and periodically triggering the fault monitoring module and the load detection module according to that period.
The device of this embodiment is used as a peripheral device of the cloud platform; it monitors the load information and fault states of the computing nodes, where the load information comprises CPU, memory and traffic on key networks, and it manages the node switching process.
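A minimal skeleton of these four modules might look as follows; only the responsibilities come from the description above, while the class and method names are assumptions.

```python
# Skeleton of the apparatus of this embodiment. The four responsibilities come
# from the description above; class and method names are illustrative assumptions.
class FaultMonitoringModule:
    def check(self, compute_nodes):
        """Send heartbeat packets and report nodes that fail to respond."""
        raise NotImplementedError

class LoadDetectionModule:
    def overloaded(self, compute_nodes):
        """Calculate or predict the group's total load and compare it to the threshold."""
        raise NotImplementedError

class NodeProcessingModule:
    def switch(self, switchable_control_nodes, target_group):
        """Elect a control node, switch it into a compute node and add it to the group."""
        raise NotImplementedError

class TimingTriggerModule:
    def __init__(self, period_seconds, fault_module, load_module, node_module):
        # Periodically drives the fault-monitoring and load-detection modules.
        self.period_seconds = period_seconds
        self.fault_module = fault_module
        self.load_module = load_module
        self.node_module = node_module
```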
The infrastructure of the cloud platform adopts an M+N node topology, i.e. a model of M control nodes and N computing nodes, where control nodes may be multiplexed as network nodes. The M control nodes are divided into several control node groups, which are separated into switchable control node groups and non-switchable control node groups by marking custom labels; the number of non-switchable control node groups is at least 3, meeting the cloud platform's minimum-number requirement for high availability.
The control node group provides highly available cloud platform management and control services: it provides the API services, including the computing module, the cloud hard disk (volume) management module and the image management module, together with internal working components such as the controller and scheduler components; these services are stateless and achieve load-balanced high availability through HAProxy and Keepalived. It also provides the shared database and message queue services, which are stateful: the database service achieves a multi-master high-availability cluster through MySQL Galera, and the RabbitMQ cluster achieves high availability of the message queue through mirrored queues;
meanwhile, the control node group provides centralized virtual network services through the L3 agent, including the tenant network gateway, external network access, floating IPs and virtual firewalls, with active/standby high availability of the virtual routers achieved through Keepalived. The switchable control node group serves as a supplement to the control node group and can provide the management services and virtual network services, improving the management performance of the cloud platform and increasing the service bandwidth of the centralized virtual network.
The computing node group provides services such as computing resource virtualization, computing resource management and the virtual layer-2 network agent, and supplies computing resources such as CPU and memory to the virtual machines.
The OpenStack cloud platform adopts a containerized deployment mode: all services are packaged into corresponding Docker images and started as containers, and all nodes keep consistent operating system and Docker service versions, which avoids dependency conflicts between different services, facilitates upgrade and rollback of each service, and effectively solves the problems of difficult deployment and difficult upgrading of the cloud platform.
As shown in fig. 2, the container image of each service is stored in the local private Docker registry. Exploiting the layered nature of container images, all customized images of the cloud platform are built in four layers: the operating system base image, the cloud platform base image, the base image of each function module, and the image of each service within the module. Image layering avoids repeated installation of dependency packages, reduces the total storage size of the images and improves deployment efficiency.
All services of the cloud platform are started in Docker containers, and all nodes keep consistent operating system and Docker service versions, ensuring the smoothness and stability of node switching.
As shown in fig. 4, the Docker containers of a control node include the API services and internal components of each cloud platform module; the shared database, the message queue and the load balancing service all implement a high-availability scheme.
The Docker containers of a computing node include the nova-compute computing service, the neutron-openvswitch-agent virtual layer-2 network service and so on; all images of the computing node services are pre-downloaded by the nodes in the switchable control node group, without starting the corresponding containers, so that the corresponding container services can be started quickly when such a node is upgraded to a computing node.
When the control nodes in the switchable control node group are deployed, the container images required by a computing node are pre-installed, and these images are kept synchronized whenever the cloud platform services are upgraded, so that the cloud platform computing services can be started directly from the local images, avoiding the performance degradation caused by transferring large files over the network.
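As a rough illustration of that pre-installation step, assuming the images live in a local private registry, a sketch is given below; the registry address, image names and tag are illustrative assumptions.

```python
# Sketch of pre-pulling the compute-service images onto a switchable control
# node at deployment/upgrade time, so that a later switch only has to start
# containers. Registry address, image names and tag are illustrative assumptions.
import subprocess

COMPUTE_IMAGES = ("nova_libvirt", "nova_compute", "neutron_openvswitch_agent")

def prepull_compute_images(registry="registry.local:4000", tag="train"):
    for name in COMPUTE_IMAGES:
        # Pull only; the containers are not started until the node is switched.
        subprocess.run(["docker", "pull", f"{registry}/{name}:{tag}"], check=True)

if __name__ == "__main__":
    prepull_compute_images()
```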
Embodiments one, two and three trigger the process of switching a control node into a computing node based on the current state, i.e. a node fault or an excessively high load in the computing node group, or based on a prediction from historical data by the multi-input single-output neural network linear regression model. Because the platform services are deployed in containers, node switching only requires deleting the old container services and then quickly starting the new services from the pre-downloaded Docker images, and this fast process guarantees the timeliness and high availability of cloud platform switching.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (8)

1. A method for OpenStack control node to adaptively switch to a computing node, the OpenStack comprising a plurality of groups of control nodes and a plurality of groups of computing nodes, the method comprising:
S1: dividing the plurality of control node groups into switchable control node groups and non-switchable control node groups, and selecting a control node to be switched from the switchable control node groups through an election algorithm;
S2: triggering monitoring periodically; if a node fault or an excessively high total load is found in a computing node group, triggering the adaptive upgrading process, otherwise ending the process;
the adaptive upgrading process specifically comprises: switching the control node to be switched into a computing node through an automated management tool combined with container technology, and adding the computing node to the computing node group of step S2;
the combination container technology for rapidly switching the control nodes to be switched into the computing nodes through an automatic management tool specifically comprises the following steps:
and cleaning all containers on the control node to be switched by utilizing an automatic deployment tool, wherein the automatic deployment tool comprises an anchor, an operating system layer and a Docker service layer which are consistent with the computing node, and rapidly starts the services of the computing node obtained by switching, the services comprise nova-libvirtd, nova-computer and neutron-openvswitch-agent, the container services related to the control node to be switched and the computing node are automatically started through the anchor, the computing node is switched into the computing node, and the computing node is added into the computing node group, wherein all mirror images of the computing node services are pre-downloaded by the nodes in the switchable control node group, but the corresponding containers are not started.
2. The method of claim 1, wherein the grouping of control node groups is performed by tagging custom tags, and the number of the non-switchable control node groups is at least 3.
3. The method for adaptively switching an OpenStack control node into a computing node according to claim 1, wherein the election algorithm selects the node with the minimum reference index value, the reference index being a CPU load, a network traffic, or a Cost value;
the Cost value is obtained through a weighted summation algorithm, with the calculation formula:
Cost = Σ_{i=1}^{N} W_i · X_i
where W_i is a weight value, X_i is one or more of the input parameters, which include CPU usage, memory usage and network traffic, and N is the number of input parameters.
4. The method for adaptively switching an OpenStack control node to a computing node according to claim 1, wherein the method for determining a node failure in a computing node group specifically comprises:
the monitoring system sends heartbeat packets to each computing node in the computing node group; if any computing node fails to respond to the heartbeat packets, a node fault has occurred in the computing node group.
5. The method of claim 1, wherein the method for determining that the load of the computing node group is too high comprises total load calculation and total load prediction.
6. The method for OpenStack control node adaptive switching to a compute node according to claim 5, wherein the method for computing the total load of a compute node group specifically comprises:
the method comprises the steps that loads of current computing nodes are collected through monitoring agents on the computing nodes in a computing node group, the loads comprise a CPU, a memory and network flow, and when the total loads of the computing node group exceed a preset threshold value, the total loads of the computing node group are too high.
7. The method for OpenStack control node adaptive switching to a compute node according to claim 5, wherein the total load prediction method of a compute node group specifically is: based on historical monitoring data of the computing nodes, predicting through a multi-input single-output neural network linear regression model;
wherein the neural network linear regression model is:
Z = WX + B
where Z is the predicted load of a computing node, X = {x1, x2, …, xN} is an input sample comprising time, number of virtual machines and number of tenants, W = {w1, w2, …, wN} is the weight matrix, and B = {b1} is the offset matrix; the total load of the computing node group is obtained from the predicted load Z of each computing node in the group, and if it exceeds a preset threshold, the total load of the computing node group is too high.
8. An apparatus for adaptive switching of an OpenStack control node to a computing node, comprising a memory and a processor, the memory storing a computer program, wherein the processor invokes the computer program to perform the steps of the method according to any of claims 1-7.
CN201910810282.4A 2019-08-29 2019-08-29 Method and device for adaptively switching OpenStack control node into computing node Active CN110445662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910810282.4A CN110445662B (en) 2019-08-29 2019-08-29 Method and device for adaptively switching OpenStack control node into computing node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910810282.4A CN110445662B (en) 2019-08-29 2019-08-29 Method and device for adaptively switching OpenStack control node into computing node

Publications (2)

Publication Number Publication Date
CN110445662A CN110445662A (en) 2019-11-12
CN110445662B true CN110445662B (en) 2022-07-12

Family

ID=68438379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910810282.4A Active CN110445662B (en) 2019-08-29 2019-08-29 Method and device for adaptively switching OpenStack control node into computing node

Country Status (1)

Country Link
CN (1) CN110445662B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111447146B (en) * 2020-03-20 2022-04-29 上海中通吉网络技术有限公司 Method, device, equipment and storage medium for dynamically updating physical routing information
CN112269694B (en) * 2020-10-23 2023-12-22 北京浪潮数据技术有限公司 Management node determining method and device, electronic equipment and readable storage medium
CN112925609B (en) * 2021-03-01 2022-03-15 浪潮云信息技术股份公司 OpenStack cloud platform upgrading method and device
CN114500554B (en) * 2022-02-09 2024-04-26 南京戎光软件科技有限公司 Internet of things system management method
CN115242688B (en) * 2022-07-27 2024-06-14 郑州浪潮数据技术有限公司 Network fault detection method, device and medium
CN116980346B (en) * 2023-09-22 2023-11-28 新华三技术有限公司 Container management method and device based on cloud platform

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5078347B2 (en) * 2006-12-28 2012-11-21 インターナショナル・ビジネス・マシーンズ・コーポレーション Method for failing over (repairing) a failed node of a computer system having a plurality of nodes
CN102035862B (en) * 2009-09-30 2013-11-06 国际商业机器公司 Configuration node fault transfer method and system in SVC cluster
US9106537B1 (en) * 2013-06-05 2015-08-11 Parallels IP Holdings GmbH Method for high availability of services in cloud computing systems
CN105245381B (en) * 2015-10-22 2019-08-16 上海斐讯数据通信技术有限公司 Cloud Server delay machine monitors migratory system and method
CN107544839B (en) * 2016-06-27 2021-05-25 腾讯科技(深圳)有限公司 Virtual machine migration system, method and device
CN106603696B (en) * 2016-12-28 2019-06-25 华南理工大学 A kind of high-availability system based on super fusion basic framework
CN107526626B (en) * 2017-08-24 2020-12-01 武汉大学 Docker container thermal migration method and system based on CRIU
CN109617992B (en) * 2018-12-29 2021-08-03 杭州趣链科技有限公司 Block chain-based dynamic election method for edge computing nodes

Also Published As

Publication number Publication date
CN110445662A (en) 2019-11-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant