
US7886180B2 - Recovery in a distributed stateful publish-subscribe system - Google Patents


Info

Publication number
US7886180B2
Authority
US
United States
Prior art keywords
missing information
message
messages
overlay network
publisher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US10/846,196
Other versions
US20050268146A1 (en)
Inventor
Yuhui Jin
Robert Evan Strom
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US10/846,196
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JIN, YUHUI, STROM, ROBERT EVAR
Publication of US20050268146A1
Application granted
Publication of US7886180B2
Expired - Fee Related
Adjusted expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/142 Managing session states for stateless protocols; Signalling session states; State transitions; Keeping-state mechanisms

Definitions

  • The present invention relates generally to the field of data processing systems and, more particularly, to a method, system and computer program product for fault recovery in a distributed stateful publish-subscribe system.
  • A publish-subscribe system is a system that includes two types of clients: publisher clients and subscriber clients.
  • A publisher client, also referred to herein as a publisher, generates messages, also referred to as events, which contain a topic and some data content.
  • A subscriber client, also referred to herein as a client, provides ahead of time a criterion, also referred to as a subscription, that specifies the information, based on published messages, that the system is required to deliver to that subscriber client in the future.
  • Publishers and subscribers are anonymous in that publishers do not necessarily know the number of subscribers or their locations, and subscribers do not necessarily know the locations of the publishers.
  • Whereas a content-based publish-subscribe system restricts subscriptions to filters on the fields of individual messages, a stateful publish-subscribe system is a system without such restrictions.
  • A stateful publish-subscribe system is required to support subscription criteria that depend upon computations requiring multiple messages from one or more streams, for example, “Give me the highest quote of IBM within each one-minute period”.
  • A stateful system might entail delivering information that is more than simply a copy of published messages, e.g., “Tell me how many stocks fell during each one-minute period”.
  • A stateful publish-subscribe service as used in this invention is implemented on an overlay network that comprises a collection of service machines, also referred to as brokers, which accept messages from publisher clients, deliver subscribed information to subscriber clients, and route information between publishers and subscribers.
  • A stateful publish-subscribe system, as used herein, is a publish-subscribe system in which at least one subscription is stateful.
  • An effective stateful publish-subscribe system should be fault-tolerant; i.e., it should be able to detect and recover from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines, temporary losses of connectivity between broker machines, and network errors causing messages to be lost, duplicated or delivered out of order.
  • The present invention provides a fault-tolerant protocol for a distributed stateful publish-subscribe system.
  • The system is capable of recovering from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines and network errors causing messages to be lost, duplicated or delivered out of order.
  • The system requires stable-storage logging only when a published event enters the system, and requires logged messages to be retrieved from stable storage only in the event that all brokers between a failed link or broker and the publishing sites have failed.
  • The publish-subscribe system of the present invention does not require broker-to-broker connections to use reliable FIFO protocols, such as TCP/IP, but may advantageously use faster, less reliable protocols.
  • FIG. 1 depicts a pictorial representation of a network of data processing systems in which the present invention may be implemented.
  • FIG. 2 is a block diagram of a data processing system that may be implemented as a server computer in accordance with a preferred embodiment of the present invention.
  • FIG. 3 is a block diagram of a data processing system that might act as a client of the service and in which the present invention may be implemented.
  • FIG. 4 is a diagram of a broker network for a stateful publish-subscribe system in accordance with a preferred embodiment of the present invention.
  • FIG. 5 is a diagram that illustrates how a stateful publish-subscribe system appears to publisher and subscriber clients in accordance with a preferred embodiment of the present invention.
  • FIG. 6 is a diagram that illustrates an example of an operator that transforms two input view objects to an output view object in accordance with a preferred embodiment of the present invention.
  • FIG. 7 is a diagram that illustrates a dataflow hypergraph, distributed over multiple brokers, in accordance with a preferred embodiment of the present invention.
  • FIG. 8 is a pictorial representation of a monotonic domain illustrating the manner in which information gaps are detected in accordance with a preferred embodiment of the present invention.
  • FIG. 9 is a flowchart that illustrates a method for responding to state change messages in accordance with a preferred embodiment of the present invention.
  • FIG. 10 is a diagram that illustrates the result of a stateful transformation in accordance with a preferred embodiment of the present invention.
  • FIG. 11 is a flowchart that illustrates a method for processing a curiosity message from a downstream object in accordance with a preferred embodiment of the present invention.
  • FIG. 1 depicts a pictorial representation of a network of data processing systems in which the present invention may be implemented.
  • The network of data processing systems is designated by reference number 100 and contains a network 102, which is the medium used to provide communications links between various devices and computers connected together within network data processing system 100.
  • Network 102 may include connections such as wire, wireless communication links, or fiber optic cables.
  • Servers 112, 114, 116 are connected to network 102 along with storage unit 106.
  • In addition, clients 122, 124, and 126 are connected to network 102.
  • Clients 122, 124, and 126 may, for example, be personal computers or network computers.
  • Servers 112, 114, 116 provide data, such as boot files, operating system images, and applications, to clients 122, 124, 126.
  • Clients 122, 124, and 126 are clients to servers 112, 114, 116.
  • Network data processing system 100 may include additional servers, clients, and other devices not shown.
  • Network data processing system 100 provides a distributed messaging system that supports stateful subscriptions. A subset of clients 122, 124, 126 may be publishing clients, while others of clients 122, 124, 126 may be subscribing clients. Published events may also be generated by one or more of servers 112, 114, 116.
  • A stateful publish-subscribe system is a distributed messaging system in which at least one subscription is stateful. Other subscriptions may be content-based or, in other words, stateless.
  • A stateful publish-subscribe system must compute information that requires multiple messages of one or more streams. For example, a stateful subscription may request, “Give me the highest quote within each one-minute period.”
  • A stateful subscription may entail delivering information other than simply a copy of published messages. For example, a stateful subscription may request, “Tell me how many stocks fell during each one-minute period.”
  • The stateful publish-subscribe system is implemented as an overlay network, which is a collection of service machines, referred to as brokers, that accept messages from publisher clients, deliver subscribed information to subscriber clients, and route information between publishers and subscribers.
  • Servers 112, 114 and 116 may be broker machines.
  • Both content-based and stateful publish-subscribe systems support a message delivery model based on two roles: (1) publishers produce information in the form of structured messages; and (2) subscribers specify in advance the kinds of information in which they are interested. As messages are later published, relevant information is delivered in a timely fashion to subscribers.
  • Content-based subscriptions are restricted to Boolean filter predicates that can only refer to fields in individual messages.
  • For example, a content-based subscription may request, “Deliver message if traded volume > 1000 shares”, where the field “traded volume” appears in each message.
  • Stateful subscriptions are more general state-valued expressions and may refer to one or more message histories. For example, a subscription may request, “Deliver total traded volume by hour for all issues trading a total > 10000 shares in that hour”.
  • In a content-based publish-subscribe system, because subscriptions can only specify filtering, all published messages are either passed through to subscribers or filtered out. Therefore, messages received by subscribers are identically structured copies of messages published by publishers.
  • In a stateful publish-subscribe system, by contrast, subscriptions may include more complex expressions; therefore, subscribers may receive information that is not identical to the published messages and that may be formatted differently. For example, a published message may have only integer prices, while subscriptions to average prices may have non-integer averages.
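The contrast between the two subscription types above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the `Trade` record, its fields, and the functions `content_based` and `stateful` are hypothetical names chosen for the example, and the thresholds are taken from the two example subscriptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Trade:
    issue: str   # hypothetical field names
    hour: int
    volume: int


def content_based(trades: List[Trade]) -> List[Trade]:
    # "Deliver message if traded volume > 1000 shares": each message is tested
    # on its own fields and, if it passes, delivered as an unchanged copy.
    return [t for t in trades if t.volume > 1000]


def stateful(trades: List[Trade]) -> Dict[Tuple[str, int], int]:
    # "Deliver total traded volume by hour for all issues trading a total > 10000
    # shares in that hour": the result depends on many messages, and the delivered
    # totals are new values rather than copies of published messages.
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for t in trades:
        totals[(t.issue, t.hour)] += t.volume
    return {key: total for key, total in totals.items() if total > 10000}


trades = [Trade("IBM", 9, 6000), Trade("IBM", 9, 5000),
          Trade("XYZ", 9, 1200), Trade("IBM", 10, 400)]
print(content_based(trades))  # the first three trades, unchanged
print(stateful(trades))       # {('IBM', 9): 11000}
```

No single IBM trade in hour 9 exceeds 10000 shares, yet the stateful subscription fires for that hour, which no per-message filter could express.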
  • Published event streams are associated with topics. Each topic is associated with a base relation.
  • A base relation is a table of tuples, each tuple corresponding to an event.
  • Subscriptions are expressed as view expressions in a relational algebraic language, although other representations may also be used. The language defines a cascade of views of base relations and derived views computed from either base relations or other views.
  • The set of subscriptions is compiled into a collection of objects that are deployed and integrated into messaging brokers.
  • Publishers and subscribers connect to these brokers.
  • Published events are delivered to objects associated with base relations. The events are then pushed downstream to other objects that compute how each derived view changes based on the change to the base relation. Those derived views associated with subscriptions then deliver events to the subscriber, informing the subscriber of each change in state.
  • In the depicted example, network data processing system 100 is the Internet, with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another.
  • At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational and other computer systems that route data and messages.
  • Network data processing system 100 also may be implemented as a number of different types of networks, such as, for example, an intranet, a local area network (LAN), or a wide area network (WAN).
  • FIG. 1 is intended as an example, and not as an architectural limitation for the present invention.
  • Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors 202 and 204 connected to system bus 206. Alternatively, a single processor system may be employed. Also connected to system bus 206 is memory controller/cache 208, which provides an interface to local memory 209. I/O bus bridge 210 is connected to system bus 206 and provides an interface to I/O bus 212. Memory controller/cache 208 and I/O bus bridge 210 may be integrated as depicted.
  • Peripheral component interconnect (PCI) bus bridge 214, connected to I/O bus 212, provides an interface to PCI local bus 216.
  • A number of modems may be connected to PCI local bus 216.
  • Typical PCI bus implementations will support four PCI expansion slots or add-in connectors.
  • Communications links to clients 122-126 in FIG. 1 may be provided through modem 218 and network adapter 220 connected to PCI local bus 216 through add-in connectors.
  • Additional PCI bus bridges 222 and 224 provide interfaces for additional PCI local buses 226 and 228, from which additional modems or network adapters may be supported. In this manner, data processing system 200 allows connections to multiple network computers.
  • A memory-mapped graphics adapter 230 and hard disk 232 may also be connected to I/O bus 212 as depicted, either directly or indirectly.
  • The hardware depicted in FIG. 2 may vary.
  • For example, other peripheral devices, such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted.
  • The depicted example is not meant to imply architectural limitations with respect to the present invention.
  • The data processing system depicted in FIG. 2 may be, for example, an IBM eServer™ pSeries® system, a product of International Business Machines Corporation in Armonk, N.Y., running the Advanced Interactive Executive (AIX) operating system or the LINUX operating system.
  • Data processing system 300 is an example of a computer, such as client computer 122 in FIG. 1, in which code or instructions implementing the processes of the present invention may be located.
  • Data processing system 300 employs a hub architecture including a north bridge and memory controller hub (MCH) 308 and a south bridge and input/output (I/O) controller hub (ICH) 310.
  • Processor 302, main memory 304, and graphics processor 318 are connected to MCH 308.
  • Graphics processor 318 may be connected to the MCH through an accelerated graphics port (AGP), for example.
  • Local area network (LAN) adapter 312, audio adapter 316, keyboard and mouse adapter 320, modem 322, read only memory (ROM) 324, hard disk drive (HDD) 326, CD-ROM drive 330, universal serial bus (USB) ports and other communications ports 332, and PCI/PCIe devices 334 may be connected to ICH 310.
  • PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, PC cards for notebook computers, etc. PCI uses a cardbus controller, while PCIe does not.
  • ROM 324 may be, for example, a flash binary input/output system (BIOS).
  • Hard disk drive 326 and CD-ROM drive 330 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface.
  • A super I/O (SIO) device 336 may be connected to ICH 310.
  • An operating system runs on processor 302 and is used to coordinate and provide control of various components within data processing system 300 in FIG. 3.
  • The operating system may be a commercially available operating system such as Windows XP™, which is available from Microsoft Corporation. Instructions for the operating system and for applications or programs are located on storage devices, such as hard disk drive 326, and may be loaded into main memory 304 for execution by processor 302.
  • The processes of the present invention are performed by processor 302 using computer-implemented instructions, which may be located in a memory such as, for example, main memory 304, memory 324, or in one or more peripheral devices 326 and 330.
  • A plurality of broker machines are responsible for delivering messages sent by publishing clients towards subscribing clients, based upon the content of the messages and the stateful transformations requested by the subscribing clients. These broker machines form an overlay network.
  • Some broker machines may be specialized for hosting publishing clients, referred to as publisher hosting brokers (PHBs), and others for hosting subscribing clients, referred to as subscriber hosting brokers (SHBs). Between the PHBs and the SHBs, there may be any number of intermediate nodes that include routing and filtering.
  • The brokers at the intermediate nodes are referred to as intermediate brokers or IBs. For expository purposes, this separation is assumed; in actual deployment, however, some or all of the broker machines may combine the functions of a PHB, an SHB and/or an IB.
  • FIG. 4 is a diagram of a broker network for a stateful publish-subscribe system in accordance with a preferred embodiment of the present invention.
  • A publishing client, such as one of publishers 402a-402d, establishes a connection to a PHB, such as PHB 404a or PHB 404b, over a corresponding one of client connections 406a-406d.
  • The client connections may, for example, be first-in/first-out (FIFO) connections, such as Transmission Control Protocol/Internet Protocol (TCP/IP) socket connections, or other suitable connections.
  • Similarly, a subscribing client, such as one of subscribers 412a-412d, establishes a connection to an SHB, such as SHB 410a or SHB 410b, over a corresponding one of client connections 414a-414d, which may be similar to client connections 406a-406d.
  • The PHBs and SHBs are connected, via intermediate brokers 408a-408b, through broker-to-broker links.
  • A publish-subscribe system in accordance with the present invention includes a fault-tolerant protocol that tolerates link failures and message re-orderings.
  • Each broker machine may be a stand-alone computer, a process within a computer, or, to minimize delay due to failures, a cluster of redundant processes within multiple computers.
  • The links may be simple socket connections, or connection bundles that use multiple alternative paths for high availability and load balancing.
  • FIG. 5 is a diagram that illustrates how a stateful publish-subscribe system appears to publisher and subscriber clients in accordance with a preferred embodiment of the present invention.
  • Clients are unaware of the physical broker network or its topology.
  • A client application may connect to any broker in the role of publisher and/or subscriber.
  • Publishing clients are aware only of particular named published event streams (topics), such as publish streams 502 and 504. Multiple clients may publish to the same published event stream (topic).
  • Subscribing clients are aware of derived views, which are based on functions of either published event streams or of other derived views.
  • Published event streams may be represented as relations.
  • Derived views are represented as relations derived from published event streams or from other derived views by means of relational algebraic expressions in a language such as Date and Darwen's Tutorial D, Structured Query Language (SQL), or XQUERY.
  • For example, derived view 510 is defined as a function of stream relations 502 and 504 by means of a JOIN expression with relations 502 and 504 as inputs and relation 510 as an output.
  • Relation 512, indicated as a subscriber view, is derived from relation 510 by client-specified relational expressions.
  • For example, subscriber view 512 may be a request to group the stock trades of relation 510 by issue and hour and to compute the running total volume and the maximum and minimum price for each issue-hour pair.
  • Each subscribing client subscribes to a particular derived view. As published events enter the system from publishing clients, they are saved in their respective streams. The system is then responsible for updating each derived view according to the previously specified relational expressions and then delivering client messages to each subscriber representing the changes to the state of the respective subscribed view.
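As a minimal sketch of the incremental maintenance described above, the subscriber view 512 example (running total volume and maximum and minimum price per issue-hour pair) can be updated one joined trade at a time, emitting only the changed row as a change message. The class `IssueHourView`, its `on_trade` method, and the `outbox` list standing in for delivery to the subscriber are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (issue, hour)


@dataclass(frozen=True)
class Summary:
    total_volume: int
    max_price: float
    min_price: float


class IssueHourView:
    """Hypothetical view object for subscriber view 512."""

    def __init__(self) -> None:
        self.state: Dict[Key, Summary] = {}
        # Change messages that would be delivered to the subscriber.
        self.outbox: List[Tuple[Key, Summary]] = []

    def on_trade(self, issue: str, hour: int, volume: int, price: float) -> None:
        key = (issue, hour)
        old = self.state.get(key)
        if old is None:
            new = Summary(volume, price, price)
        else:
            # Incremental update: only this issue-hour row changes.
            new = Summary(old.total_volume + volume,
                          max(old.max_price, price),
                          min(old.min_price, price))
        self.state[key] = new
        self.outbox.append((key, new))
```

Delivering the changed row rather than the whole view keeps each change message small, which matches the description of subscribers receiving messages that represent changes to the state of the subscribed view.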
  • Subscription specifications are analyzed by a compiler and converted into a collection of transform objects and view objects.
  • The compiler generates JAVA classes for transform and view objects, which are then packaged into a Java archive (JAR) file, uploaded to the appropriate brokers, and instantiated.
  • Each operator that derives a view from one or more inputs corresponds to a transform object.
  • Each view corresponds to a view object.
  • View objects hold the state of a view.
  • Transform objects express the logic for incrementally updating an output view constituting the result of an operator in response to individual changes to input views constituting the arguments to that operator.
  • FIG. 6 illustrates an example of an operator that transforms input view objects to an output view object in accordance with a preferred embodiment of the present invention.
  • Views 610 and 620 are view objects that are inputs to some operator, such as, for example, a JOIN operator.
  • Transform 650 is a transform object for that operator, which produces a derived view shown as view object 670.
  • When views 610 and 620 change, messages reflecting the changes are sent to transform object 650.
  • Transform 650 receives the messages representing changes to its inputs 610, 620, computes how the result of the operator changes given the announced changes it has received, and then delivers the computed results to its output view object 670 in the form of change messages.
  • Output view object 670 then propagates such change messages in its turn, either to further transforms, if view object 670 is an intermediate view, or to subscribers, if view object 670 is a subscriber view.
  • Normally, all message flows are in the “downstream” direction, that is, from input view objects toward transforms, and from transforms toward output view objects.
  • However, view objects such as view object 670 will detect, using protocols discussed hereinafter, that information is missing as a result of failures causing message loss.
  • In that case, messages referred to as “curiosity” messages flow in the upstream direction, that is, from output view objects toward transform objects and from transform objects toward input view objects, in order to request the re-sending of the missing messages.
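A minimal Python sketch of these two message directions follows, under simplifying assumptions: each view's state is a map from integer time ticks to values, the transform has a single input and applies a per-tick function, and callbacks stand in for links. `ViewObject`, `Transform`, `on_change`, `resend` and `ask_upstream` are hypothetical names; the patent's own gap-detection and curiosity-handling procedures are those of FIGS. 9 and 11 and are not reproduced here.

```python
from typing import Callable, Dict, List, Optional


class ViewObject:
    """Hypothetical view object: per-tick state, downstream pushes, upstream curiosity."""

    def __init__(self) -> None:
        self.ticks: Dict[int, int] = {}
        self.downstream: List[Callable[[int, int], None]] = []
        self.ask_upstream: Optional[Callable[[int, int], None]] = None

    def lowest_missing(self) -> int:
        t = 1
        while t in self.ticks:
            t += 1
        return t

    def on_change(self, tick: int, value: int) -> None:
        if tick in self.ticks:
            return  # duplicate delivery: already known, ignore it
        expected = self.lowest_missing()
        self.ticks[tick] = value
        for send in self.downstream:
            send(tick, value)
        if tick > expected and self.ask_upstream is not None:
            # Gap: ticks expected..tick-1 never arrived; send a curiosity message upstream.
            self.ask_upstream(expected, tick)

    def resend(self, lo: int, hi: int) -> None:
        # Answer a curiosity request by re-sending whatever is known in [lo, hi).
        for t in range(lo, hi):
            if t in self.ticks:
                for send in self.downstream:
                    send(t, self.ticks[t])


class Transform:
    """Hypothetical single-input transform applying fn to each changed tick."""

    def __init__(self, src: ViewObject, dst: ViewObject, fn: Callable[[int], int]) -> None:
        self.src, self.dst, self.fn = src, dst, fn
        src.downstream.append(self.on_input)   # downstream flow: input -> transform
        dst.ask_upstream = self.on_curiosity   # upstream flow: output -> transform

    def on_input(self, tick: int, value: int) -> None:
        self.dst.on_change(tick, self.fn(value))

    def on_curiosity(self, lo: int, hi: int) -> None:
        self.src.resend(lo, hi)
```

Because every view ignores ticks it already holds, re-sending in response to a curiosity message is safe even when other downstream objects already have the re-sent ticks; this is what lets the links be lossy and unordered.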
  • FIG. 6 is a diagram that illustrates an example of an operator that transforms two input view objects to an output view object in accordance with a preferred embodiment of the present invention.
  • The mechanism of the present invention builds a structure containing all of the transform objects and view objects needed for all intermediate and subscribed views of all subscriptions.
  • This structure is called a dataflow hypergraph.
  • The dataflow hypergraph has nodes corresponding to each view object, and hyperedges, each of which may have more than one input feeding an output, representing each transform object associated with an operation in the subscription specification.
  • The view objects and transform objects are then allocated to actual brokers in the overlay network, either manually by an administrator or automatically via a service, such as the one described in the co-pending application entitled “CONTINUOUS FEEDBACK-CONTROLLED DEPLOYMENT OF MESSAGE TRANSFORMS IN A DISTRIBUTED MESSAGING SYSTEM”, Ser. No. 10/841,297, filed on May 7, 2004.
  • The published streams and the subscribed views may be constrained to be located on the brokers where the publishers and subscribers actually connect.
  • The placement of the intermediate transform objects and view objects, however, is not constrained. That is, intermediate transform objects and view objects may be placed wherever suitable, taking into consideration the capacities of the broker machines and the links, as well as the desired performance. After such allocation of objects to brokers, the result is a distributed dataflow hypergraph.
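The dataflow hypergraph and its placement can be represented with a few plain data types. The sketch below is illustrative only: `Hyperedge` and `crossing_paths` are hypothetical names, and the example graph only partly follows FIG. 7. That view 726 is computed by transform 725 from stream 722 and view 724 comes from the description of FIG. 7; the edge from stream 712 through transform 714 to view 724 is an assumption consistent with transform 714 feeding broker 720. The function lists the object-to-object paths that would have to cross an inter-broker link after allocation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hyperedge:
    transform: str            # transform object
    inputs: Tuple[str, ...]   # one or more input view objects
    output: str               # the derived (output) view object


def crossing_paths(edges: List[Hyperedge], placement: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (sender, receiver) pairs whose path crosses an inter-broker link."""
    crossings: List[Tuple[str, str]] = []
    for e in edges:
        for v in e.inputs:
            if placement[v] != placement[e.transform]:
                crossings.append((v, e.transform))
        if placement[e.transform] != placement[e.output]:
            crossings.append((e.transform, e.output))
    return crossings


edges = [
    Hyperedge("T714", ("V712",), "V724"),         # assumed edge
    Hyperedge("T725", ("V722", "V724"), "V726"),  # per the FIG. 7 description
]
placement = {"V712": "B710", "T714": "B710", "V724": "B720",
             "V722": "B720", "T725": "B720", "V726": "B720"}
print(crossing_paths(edges, placement))  # [('T714', 'V724')]
```

Only the returned pairs need physical messages; the rest could use in-process parameter passing. Since the recovery protocol tolerates loss, reordering and duplication, correctness does not depend on which pairs end up in this list.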
  • FIG. 7 is a diagram that illustrates a dataflow hypergraph, distributed over multiple brokers in accordance with a preferred embodiment of the present invention.
  • In the depicted example, the physical network consists of brokers 710, 720, 730, and 740.
  • The publishing clients are publishing to three separate published message streams: “buys” 722 on broker 720, “sells” 734 on broker 730, and “matches” 712 on broker 710.
  • Subscribing client 750 subscribes to derived view 748 on broker 740.
  • Broker 710 also includes transforms 714 and 716, which feed change messages to brokers 720 and 730, respectively.
  • Broker 720 includes view objects 724 and 726 and transform objects 725 and 727.
  • View object 726 represents an intermediate derived view or relation, which is based on transform 725, published stream 722, and view 724.
  • Broker 730 includes views 732 and 736, in addition to published stream 734, and also includes transforms 735 and 737.
  • Broker 740 includes views 742, 744 and 748, and transform 746.
  • View 748 is a subscriber view for subscriber 750.
  • In general, multiple publisher clients may provide messages for a single message stream, and multiple subscriber clients may subscribe to and receive updates from the same view.
  • The transform graph consists of multiple transform and view objects distributed over all brokers.
  • The paths between objects will sometimes lie within a broker, as is the case between transform object 725 and intermediate view object 726.
  • In other cases, the path must cross over an inter-broker link.
  • Within-broker communications between objects may use cheaper communication formats, such as parameter passing between objects; inter-broker communications, however, require generating physical messages or packets that will cross the link.
  • The protocols of all view objects and transform objects are able to recover from lost, out-of-order, or duplicate messages and, therefore, work correctly regardless of which paths between objects cross broker boundaries and which do not.
  • For each published stream, a history of states is stored in a data storage device. For example, messages from the “matches” published stream 712 are stored in storage 782, messages from the “buys” published stream 722 are stored in storage 784, and messages from the “sells” published stream 734 are stored in storage 786.
  • Storage 782, 784 and 786 should be persistent storage, such as a hard drive, capable of recovering published messages should broker 710, 720 or 730 crash and be restarted. In a system guaranteeing reliable service, published messages will be logged to persistent storage before being propagated.
  • Other states, such as views 742 and 744, are preferably stored in main memory and are not required to be stored persistently.
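The log-before-propagate rule for published streams can be sketched as follows, assuming a line-per-event JSON file as the persistent store and a callback as the outgoing link; `PublishedStreamLog`, `publish` and `replay` are hypothetical names. The point is ordering: `os.fsync` forces the appended event to disk before the event is forwarded, so any event a downstream broker has seen can be recovered from the log after the PHB restarts.

```python
import json
import os
from typing import Callable, Dict, List


class PublishedStreamLog:
    """Hypothetical PHB-side log for one published stream (e.g. "matches")."""

    def __init__(self, path: str, forward: Callable[[Dict], None]) -> None:
        self.path = path
        self.forward = forward  # stands in for the outgoing broker-to-broker link

    def publish(self, event: Dict) -> None:
        # 1. Append the event to stable storage and force it to disk.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only once the event is durable is it propagated downstream.
        self.forward(event)

    def replay(self) -> List[Dict]:
        # After a crash and restart, the stream's history is rebuilt from the log.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```

Only this entry point pays the cost of a synchronous disk write; downstream views such as 742 and 744 stay in memory and are rebuilt from upstream on demand.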
  • The term “downstream” refers to the direction along the hypergraph of FIG. 7 from input streams towards subscribers, and the term “upstream” refers to the opposite direction, from a subscribed or intermediate view object back towards the input streams or towards the intermediate views used to compute them.
  • The state represented in each view object is a value in a monotonic domain.
  • A monotonic domain is a set of values that can be put into a partial order. In the present invention, if a value B succeeds a value A in the partial order, value B contains more information than value A.
  • For example, value B might be the property that a total stock volume is at least 100, and value A might be the property that the total is at least 50.
  • A critical property of monotonic domains is that their values change in only one direction: from less knowledge to more knowledge.
  • Monotonic domains begin from a bottom state, which is a state of total ignorance. In the example given above of total stock volume, the bottom state is the state that the total stock volume is at least 0.
  • FIG. 8 is a pictorial representation of a monotonic domain illustrating the manner in which information gaps are detected in accordance with a preferred embodiment of the present invention.
  • FIG. 8 illustrates a very simple monotonic domain representing a computed value that can take a range of from 0 to 15. Initially, as shown in state diagram 810 , all cells are blank, representing that nothing is known other than that the value is in the range of 0-15. A later state of the same object occurs if new pieces of knowledge are added. In state diagram 820 , two new pieces of knowledge have been added indicating that the value is at least 2 and no more than 14.
  • the new knowledge is indicated in this illustrative example by the first two cells being marked with “T” (True) and the last cell being marked with “F” (False).
  • Arrow 815 indicates that state 810 has evolved to state 820 as a result of the marking of the appropriate cells.
  • state diagram 830 In a still later state, illustrated in state diagram 830 , more knowledge has been added indicating that the value is exactly 6. Thus, in state 830 , the first 6 cells are marked with T and the last 9 cells are marked with F. Arrow 825 indicates that state 820 has further evolved to state 830 as a result of the marking of additional cells. At this point, no further cells can be filled in and the state can no longer change. State 830 , accordingly, is a “final” state. Every state maintained in accordance with a preferred embodiment of the present invention is from some monotonic domain, the exact domain depending upon the operator that generated the state. It is understood that FIG. 8 represents a mathematical example of a simple domain. An efficient representation will typically represent a range with a pair of integer cells, provided that the processing of updates proceeds in a manner consistent with the mathematical model.
  • the cells can only be filled in a particular order.
  • the cells can only be filled with consecutive T's beginning on the left or with consecutive F's beginning on the right.
  • Adding values in the wrong order is immediately detectable as a “gap” and serves as an indication either that messages have arrived in the wrong order, or that a message has been lost and might not arrive at all.
  • the combination of monotonic domains, incremental filling of cells, and ordering rules for filling the cells that permit gap detection together makes it possible to detect and recover from failures in a stateful publish-subscribe system in accordance with a preferred embodiment of the present invention.
  • the monotonic domain used for published input streams in accordance with a preferred embodiment of the present invention is always the same; namely, a set of cells labeled by “ticks” of time. Ticks in the future are blank, representing the fact that it is unknown whether those ticks will contain events and if so what the values of these events will be. Ticks in the past are filled; either with an event value, or with a silence. State 850 in FIG. 8 represents a particular state after 10 ticks have elapsed.
  • When messages enter the publish-subscribe system as part of some published stream, they are logged to stable storage. For example, in FIG. 7, messages from client 710 to published stream Matches 712 are logged to storage 782.
  • the view objects associated with published streams such as Matches 712 are therefore stable and never have missing messages. These messages remain available even if the PHB fails.
  • Downstream view objects, such as 724 or 742, which depend for their updates on messages that have to cross broker boundaries over links, may be missing messages. The following describes the manner in which view objects other than the published stream view objects detect and recover from failures.
  • the general protocol in accordance with a preferred embodiment of the present invention is the same for all view objects other than the published stream view objects. There exist particular specializations of this general protocol based on the actual operation used.
  • Each view object has an associated transform object that feeds it input.
  • When a transform object, such as transform object 727 in the case of view object 742, receives an update from its input or inputs (726 in this example), it computes the change to the view object and passes the change to that view object, which then performs the method illustrated in FIG. 9.
  • FIG. 9 is a flowchart that illustrates a method for responding to state change messages in accordance with a preferred embodiment of the present invention.
  • the method is generally designated by reference number 900 , and begins when a change message is received indicating which new cells in the monotonic domain are to be filled, and the new cells are filled in accordance with the message (step 901 ).
  • the particular monotonic domain will depend upon the actual operators.
  • the new state is obtained by merging the changed cells with the original cells. Mathematically, this is called taking the greatest upper bound of the old and new values. For example, suppose the domain is the one shown in FIG. 8 as domains 810 - 840 . Suppose also that the current state is the state shown in 840 .
  • After updating the state in step 901, a check is conducted to determine whether any gaps exist in the new state (step 902). One of three results will be detected:
  • state changes, if any, are propagated further to all transform objects that are downstream of this view object (step 903 ).
  • the above-described protocol will result in an alarm being generated indicating that a gap was detected in step 902a and not filled quickly enough.
  • the view object detecting the gap becomes a “curious view”.
  • the view object will notify its associated transform object, and the transform object will decide, based on the transform, exactly what kind of information it is missing and, in some cases, from which of its multiple arguments it is missing this information.
  • the transform object will then send an inquiry message, also referred to as a “curiosity message” upstream toward the view object or objects that might be able to supply the missing information.
  • Each such view object receiving a curiosity message becomes a “satisfying view”.
  • the satisfying view responds to the curiosity message by resending the requested information if it has it; otherwise, the satisfying view itself becomes curious and curiosity messages are propagated further upstream. If the satisfying view is logged to stable storage, it will always be able to satisfy curiosity and will never need to propagate curiosity messages. As mentioned previously, this is guaranteed to be the case for the published stream views. Therefore, missing information is ultimately retransmitted from satisfying relations back toward curious relations.
  • the protocol in accordance with the present invention tolerates duplicate messages from links by simply ignoring them. (This occurs in step 901 in FIG. 9, when the merge operation described above encounters case (b), "Duplicate".) Duplicate messages, accordingly, do not present a problem.
  • FIG. 10 is a diagram that illustrates the results of a stateful transformation in accordance with a preferred embodiment of the present invention.
  • This relation is the result of taking a stock ticker showing tick number, issue, price, and number of shares sold, and applying an operator that groups the ticker by issue, and delivering the total volume (totalvol) of shares sold for each issue.
  • the state of this view object is shown in two components: (1) a relation keyed by issue, mapping to the total volume and the tick number of the latest update, and (2) a gap data structure recording which ticks from the original stock ticker have or have not contributed to the sum being stored. The gap data structure has three components: a past horizon t1, a future horizon t2, and a gap list.
  • object 1001 shows a typical value of the relation
  • objects 1002 and 1003 show possible alternative values of the gap data structure.
  • the change messages have one of the following three forms:
  • the meaning of the relation is as follows: the “issue” and “totalvol” columns represent values that a viewer of the relation would actually see; the “t” column is used to record at which ticks these values reached their current value, and is used to facilitate the response to curiosity messages.
  • the past horizon t1 indicates that the totals in all rows include the contributions of all ticks up to and including t1.
  • the gap list indicates which ranges of ticks between t1 and t2 are included (flagged with 'T') or not included (flagged with '*') in the summations for all rows.
  • object 1002 represents a possible state of the gap data structure in which all the ticks from 0 through 9500 have been counted towards the summations, no ticks from 9501 on have been counted, and there are, therefore, no gaps.
  • Object 1003 represents a possible state of the gap data structure in which ticks from 0 through 9500 and 9551 through 9600 have been counted, no ticks from 9601 on have been counted, but ticks in the range 9501 through 9550 are unaccounted for. Such a state would constitute a gap, and if the gap persists, the relation would become curious.
  • the tick range of the message is examined. If the tick range includes ticks that have already been counted, the already counted ticks are ignored. If the tick range includes ticks that have not previously been counted, the gap list is adjusted by possibly eliminating a gap, possibly creating a gap or possibly just extending the future horizon. In this manner, the data structure of the example is a specialization of the general protocol illustrated in FIG. 9 .
  • FIG. 11 is a flowchart that illustrates a method by which a satisfying relation responds to a curiosity message from a downstream curious view object in accordance with a preferred embodiment of the present invention.
  • the method is generally designated by reference number 1100 .
  • the satisfying relation begins by receiving the curiosity message for t1 . . . t2 and finding the set S of all tuples <i, v, t> of the relation for which the value of column t falls within that range (step 1101). For each such row (step 1102), the satisfying relation sends an Update message to the curious relation with the specified <i, v, t> (step 1104). For each distinct interval ti . . . tj within range t1 . . .
  • the satisfying relation sends a Don't-Care message to the curious relation, specifying the interval ti . . . tj (step 1105).
  • This algorithm provides an advantage over algorithms that are not aware of the mathematical properties of the transforms, and that merely replay all messages. Such algorithms are used, for example, in “guaranteed delivery” systems for stateless publish-subscribe services.
  • the present invention thus provides a fault-tolerant protocol for a distributed stateful publish-subscribe system.
  • the system includes the capability of recovering from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines, and network errors causing messages to possibly be lost, duplicated or delivered out of order.
  • the system requires stable storage logging only when a published event enters the system, and requires that logged messages be retrieved from stable storage only in the event all brokers between a failed link or broker and the publishing sites have failed.
  • the publish-subscribe system of the present invention does not require that broker-to-broker connections use reliable FIFO protocols, such as TCP/IP, but may advantageously use faster, less reliable protocols.
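The FIG. 8 domain and its ordering rule can be illustrated with a small Python sketch. It assumes a 15-cell domain in which cell i (1-based) records whether the value is at least i, and it models a change message as a hypothetical mapping from cell index to "T" or "F". Refilling a cell with the mark it already holds is treated as a duplicate. The class and method names are illustrative and are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple


class MonotonicCells:
    """Hypothetical model of the FIG. 8 domain: cell i (1-based) means 'value >= i'."""

    def __init__(self, size: int = 15) -> None:
        # None = blank (unknown); "T" = known true; "F" = known false.
        self.cells: List[Optional[str]] = [None] * size

    def merge(self, change: Dict[int, str]) -> str:
        """Merge the cells named in a change message into the current state.

        Returns "duplicate" if nothing new was learned, otherwise "new".
        """
        learned = False
        for index, mark in change.items():
            if mark not in ("T", "F"):
                raise ValueError(f"unknown mark {mark!r}")
            current = self.cells[index - 1]
            if current is None:
                self.cells[index - 1] = mark
                learned = True
            elif current != mark:
                raise ValueError(f"cell {index} already holds {current}")
        return "new" if learned else "duplicate"

    def has_gap(self) -> bool:
        """A gap exists unless the cells read T...T, blank...blank, F...F left to right."""
        i, n = 0, len(self.cells)
        while i < n and self.cells[i] == "T":
            i += 1
        while i < n and self.cells[i] is None:
            i += 1
        while i < n and self.cells[i] == "F":
            i += 1
        return i != n

    def bounds(self) -> Tuple[int, int]:
        """(lower, upper) bounds given by the leading T run and the trailing F run."""
        lo = 0
        while lo < len(self.cells) and self.cells[lo] == "T":
            lo += 1
        hi = len(self.cells)
        while hi > 0 and self.cells[hi - 1] == "F":
            hi -= 1
        return lo, hi

    def is_final(self) -> bool:
        """No blank cells and no gap: the state can no longer change."""
        return None not in self.cells and not self.has_gap()
```

Replaying the FIG. 8 sequence, merging {1: "T", 2: "T", 15: "F"} into a blank object gives bounds (2, 14), as in state 820. Marking cells 3 to 6 "T" and cells 7 to 14 "F" then gives bounds (6, 6) and a final state, as in state 830. A message that marks cell 4 before cell 3 is known makes has_gap() return True. This is the out-of-order condition that signals a late or lost message.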
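The FIG. 10 gap data structure can also be sketched in Python. This sketch assumes inclusive integer tick ranges. It keeps the past horizon t1 plus a sorted list of counted ranges above it, and it derives the future horizon t2 from the end of the last counted range. merge() returns only the newly counted sub-ranges, so ticks that were already counted are ignored as duplicates. The class name and the choice to return fresh ranges are illustrative assumptions, not the patent's data layout.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive tick range (lo, hi)


class TickGapTracker:
    """Hypothetical version of the FIG. 10 gap data structure."""

    def __init__(self, t1: int = -1) -> None:
        self.t1 = t1                    # past horizon: every tick <= t1 is counted
        self.counted: List[Range] = []  # disjoint, sorted counted ranges above t1

    @property
    def t2(self) -> int:
        """Future horizon: no tick after t2 has been counted."""
        return self.counted[-1][1] if self.counted else self.t1

    def merge(self, lo: int, hi: int) -> List[Range]:
        """Count ticks lo..hi and return only the sub-ranges not counted before."""
        fresh: List[Range] = []
        cur = max(lo, self.t1 + 1)      # ticks at or below t1 are duplicates
        for a, b in self.counted:
            if cur > hi:
                break
            if b < cur:
                continue
            if a > cur:
                fresh.append((cur, min(hi, a - 1)))
            cur = max(cur, b + 1)
        if cur <= hi:
            fresh.append((cur, hi))

        # Coalesce overlapping or adjacent ranges.
        merged: List[Range] = []
        for a, b in sorted(self.counted + fresh):
            if merged and a <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], b))
            else:
                merged.append((a, b))
        # Advance the past horizon over a range that now touches it.
        if merged and merged[0][0] == self.t1 + 1:
            self.t1 = merged.pop(0)[1]
        self.counted = merged
        return fresh

    def gaps(self) -> List[Range]:
        """Tick ranges between t1 and t2 that have not been counted (the '*' entries)."""
        result: List[Range] = []
        prev = self.t1
        for a, b in self.counted:
            if a > prev + 1:
                result.append((prev + 1, a - 1))
            prev = b
        return result
```

Counting ticks 0 through 9500 reproduces object 1002 (t1 = t2 = 9500, no gaps). Then counting 9551 through 9600 reproduces object 1003, whose gap is (9501, 9550). A later message covering 9490 through 9560 contributes only the missing 9501 through 9550. This closes the gap and moves t1 to 9600. Because only the fresh ranges are passed on, a transform that adds volumes can apply each tick exactly once.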
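The FIG. 11 response of a satisfying relation can be sketched in Python as follows. The text describing when a Don't-Care interval is sent is truncated above. This sketch therefore assumes that Don't-Care intervals are the maximal sub-intervals of the requested range that contain no row's t value. It also assumes a hypothetical parameter `known_through`, the last tick the satisfying relation itself has accounted for. When the request extends past that tick, the function returns None, which stands for the relation becoming curious itself.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple, Union


class Update(NamedTuple):
    issue: str
    totalvol: int
    t: int


class DontCare(NamedTuple):
    lo: int
    hi: int


Reply = Union[Update, DontCare]


def answer_curiosity(
    rows: Iterable[Tuple[str, int, int]], lo: int, hi: int, known_through: int
) -> Optional[List[Reply]]:
    """Answer a curiosity message for ticks lo..hi from rows <issue, totalvol, t>."""
    if hi > known_through:
        return None  # cannot satisfy; curiosity would propagate further upstream
    # Rows whose latest update falls in the requested range are resent as Updates.
    hits = sorted((r for r in rows if lo <= r[2] <= hi), key=lambda r: r[2])
    replies: List[Reply] = [Update(*r) for r in hits]
    # Every other tick in the range is reported as Don't-Care (assumed rule).
    cur = lo
    for t in sorted({r[2] for r in hits}):
        if t > cur:
            replies.append(DontCare(cur, t - 1))
        cur = t + 1
    if cur <= hi:
        replies.append(DontCare(cur, hi))
    return replies
```

For the gap 9501 through 9550, a relation whose IBM row was last updated at tick 9510 and whose HPQ row was last updated at tick 9530 would send two Updates. It would then send Don't-Care messages for (9501, 9509), (9511, 9529) and (9531, 9550). The reply size grows with the number of rows touched in the range rather than with the number of original events. This is the advantage the text claims over replaying every message.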

Abstract

Method, apparatus and computer program product for fault recovery in a distributed stateful publish-subscribe system. The system includes the capability of recovering from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines, and network errors causing messages to possibly be lost, duplicated or delivered out of order. The system requires stable storage logging only when a published event enters the system, and requires that logged messages be retrieved from stable storage only in the event all brokers between a failed link or broker and the publishing sites have failed. The publish-subscribe system of the present invention does not require that broker-to-broker connections use reliable FIFO protocols, such as TCP/IP, but may advantageously use faster, less reliable protocols.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to co-pending applications entitled “CONTINUOUS FEEDBACK-CONTROLLED DEPLOYMENT OF MESSAGE TRANSFORMS IN A DISTRIBUTED MESSAGING SYSTEM”, Ser. No. 10/841,297, filed on May 7, 2004; and “DISTRIBUTED MESSAGING SYSTEM SUPPORTING STATEFUL SUBSCRIPTIONS”, Ser. No. 10/841,916, filed on May 7, 2004, both assigned to the same assignee, and incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to the field of data processing systems and, more particularly, to a method, system and computer program product for fault recovery in a distributed stateful publish-subscribe system.
2. Description of Related Art
A publish-subscribe system is a system that includes two types of clients, publisher clients and subscriber clients. A publisher client, also referred to herein as a publisher, generates messages, also referred to as events, which contain a topic and some data content. A subscriber client, also referred to herein as a client, provides, ahead of time, a criterion, also referred to as a subscription, that specifies the information, based on published messages, that the system is required to deliver to that subscriber client in the future. In a publish-subscribe system, publishers and subscribers are anonymous in that publishers do not necessarily know the number of subscribers or their locations; and subscribers do not necessarily know the locations of the publishers.
A stateless publish-subscribe system, also referred to as a topic-based or content-based publish-subscribe system, is a system in which delivered messages are a possibly filtered subset of published messages, and in which the subscription criterion is a property that can be tested on each message independently of any other message. For example, a stateless subscription criterion might be "topic=stock-ticker" or "volume>10000 & issue=IBM". A stateful publish-subscribe system, on the other hand, is a system without such restrictions. A stateful publish-subscribe system is required to support subscription criteria that depend upon computations that require multiple messages from one or more streams, for example, "Give me the highest quote of IBM within each one-minute period". In addition, a stateful system might entail delivering information that is more than simply a copy of published messages, e.g. "Tell me how many stocks fell during each one-minute period".
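The difference between the two kinds of subscription can be illustrated with a short Python sketch. The `Quote` record, its field names, and the use of a timestamp in seconds to define one-minute periods are assumptions made for the example. The stateless predicate looks at one message at a time. The stateful subscription has to keep a running maximum across messages.

```python
from typing import Dict, Iterable, NamedTuple


class Quote(NamedTuple):
    issue: str
    price: float
    volume: int
    ts: float  # seconds since some epoch (assumed)


def stateless_match(q: Quote) -> bool:
    """'volume>10000 & issue=IBM': decided from this one message alone."""
    return q.volume > 10000 and q.issue == "IBM"


def highest_ibm_per_minute(quotes: Iterable[Quote]) -> Dict[int, float]:
    """'Highest quote of IBM within each one-minute period': needs state across messages."""
    best: Dict[int, float] = {}
    for q in quotes:
        if q.issue == "IBM":
            minute = int(q.ts // 60)
            best[minute] = max(best.get(minute, q.price), q.price)
    return best
```

Only the stateful function's result depends on which other quotes arrived, and in which period. Losing one message can therefore silently corrupt its output rather than merely omit one delivery.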
A stateful publish-subscribe service as used in this invention is implemented on an overlay network that comprises a collection of service machines, also referred to as brokers, that accept messages from publisher clients, deliver subscribed information to subscriber clients, and route information between publishers and subscribers. A stateful publish-subscribe system as used herein is a publish-subscribe system in which at least one subscription of the system is stateful.
An effective stateful publish-subscribe system should be fault-tolerant; i.e., it should have the ability to detect and recover from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines, temporary losses of connectivity between broker machines, and network errors causing messages to possibly be lost, duplicated or delivered out of order.
There are known techniques in database systems in which each new published event, and the subscriber state changes derived from each new event, can be incorporated into a transaction. An implementation based upon transactions in a database system, however, is inefficient and requires an expensive “two-phase commit” protocol for every message. It would, accordingly, be advantageous to provide a fault-tolerant stateful publish-subscribe system that is efficient and that does not require a two-phase commit protocol for every message.
SUMMARY OF THE INVENTION
The present invention provides a fault-tolerant protocol for a distributed stateful publish-subscribe system. The system includes the capability of recovering from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines, and network errors causing messages to possibly be lost, duplicated or delivered out of order. The system requires stable storage logging only when a published event enters the system, and requires that logged messages be retrieved from stable storage only in the event all brokers between a failed link or broker and the publishing sites have failed. The publish-subscribe system of the present invention does not require that broker-to-broker connections use reliable FIFO protocols, such as TCP/IP, but may advantageously use faster, less reliable protocols.
BRIEF DESCRIPTION OF THE DRAWINGS
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a pictorial representation of a network of data processing systems in which the present invention may be implemented;
FIG. 2 is a block diagram of a data processing system that may be implemented as a server computer in accordance with a preferred embodiment of the present invention;
FIG. 3 is a block diagram of a data processing system that might act as a client of the service in which the present invention may be implemented;
FIG. 4 is a diagram of a broker network for a stateful publish-subscribe system in accordance with a preferred embodiment of the present invention;
FIG. 5 is a diagram that illustrates how a stateful publish-subscribe system appears to publisher and subscriber clients in accordance with a preferred embodiment of the present invention;
FIG. 6 is a diagram that illustrates an example of an operator that transforms two input view objects to an output view object in accordance with a preferred embodiment of the present invention;
FIG. 7 is a diagram that illustrates a dataflow hypergraph, distributed over multiple brokers in accordance with a preferred embodiment of the present invention;
FIG. 8 is a pictorial representation of a monotonic domain illustrating the manner in which information gaps are detected in accordance with a preferred embodiment of the present invention;
FIG. 9 is a flowchart that illustrates a method for responding to state change messages in accordance with a preferred embodiment of the present invention;
FIG. 10 is a diagram that illustrates the result of a stateful transformation in accordance with a preferred embodiment of the present invention; and
FIG. 11 is a flowchart that illustrates a method for processing a curiosity message from a downstream object in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
With reference now to the figures, FIG. 1 depicts a pictorial representation of a network of data processing systems in which the present invention may be implemented. The network of data processing systems is designated by reference number 100, and contains a network 102, which is the medium used to provide communications links between various devices and computers connected together within network data processing system 100. Network 102 may include connections, such as wire, wireless communication links, or fiber optic cables. In the depicted example, servers 112, 114, 116 are connected to network 102 along with storage unit 106. In addition, clients 122, 124, and 126 are connected to network 102. Clients 122, 124, and 126 may, for example, be personal computers or network computers. In the depicted example, servers 112, 114, 116 provide data, such as boot files, operating system images, and applications to clients 122, 124, 126. Clients 122, 124, and 126 are clients to servers 112, 114, 116. Network data processing system 100 may include additional servers, clients, and other devices not shown. In accordance with a preferred embodiment of the present invention, network data processing system 100 provides a distributed messaging system that supports stateful subscriptions. A subset of clients 122, 124, 126 may be publishing clients, while others of clients 122, 124, 126 may be subscribing clients. Published events may also be generated by one or more of servers 112, 114, 116.
A stateful publish-subscribe system is a distributed messaging system in which at least one subscription is stateful. Other subscriptions may be content-based or, in other words, stateless. A stateful publish-subscribe system must compute information that requires multiple messages of one or more streams. For example, a stateful subscription may request, “Give me the highest quote within each one-minute period.” A stateful subscription may entail delivering information other than simply a copy of published messages. For example, a stateful subscription may request, “Tell me how many stocks fell during each one-minute period.”
The stateful publish-subscribe system is implemented as an overlay network, which is a collection of service machines, referred to as brokers, that accept messages from publisher clients, deliver subscribed information to subscriber clients, and route information between publishers and subscribers. One or more of servers 112, 114 and 116, for example, may be broker machines. Both content-based and stateful publish-subscribe systems support a message delivery model based on two roles: (1) publishers produce information in the form of structured messages; and (2) subscribers specify in advance the kinds of information in which they are interested. As messages are later published, relevant information is delivered in a timely fashion to subscribers. Content-based subscriptions are restricted to Boolean filter predicates that can only refer to fields in individual messages. For example, a content-based subscription may request, "Deliver message if traded volume>1000 shares" where the field "traded volume" appears in each message. On the other hand, stateful subscriptions are more general state-valued expressions and may refer to one or more message histories. For example, a subscription may request "Deliver total traded volume by hour for all issues trading a total>10000 shares in that hour". In a content-based publish-subscribe system, because subscriptions can only specify filtering, all published messages are either passed through to subscribers or filtered out. Therefore, messages received by subscribers are identically structured copies of messages published by publishers. In contrast, in a stateful publish-subscribe system, subscriptions may include more complex expressions and, therefore, subscribers may receive information that differs from the published messages in both content and format. For example, a published message may have only integer prices, while subscriptions to average prices may have non-integer averages.
Published event streams are associated with topics. Each topic is associated with a base relation. A base relation is a table of tuples, each tuple corresponding to an event. Subscriptions are expressed as view expressions in a relational algebraic language, although other representations may also be used. The language defines a cascade of views of base relations and derived views computed from either base relations or other views. At compile-time, the set of subscriptions is compiled into a collection of objects that are deployed and integrated into messaging brokers. At run-time, publishers and subscribers connect to these brokers. Published events are delivered to objects associated with base relations. The events are then pushed downstream to other objects that compute how each derived view changes based on the change to the base relation. Those derived views associated with subscriptions then deliver events to the subscriber informing the subscriber of each change in state.
In the depicted example, network data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational and other computer systems that route data and messages. Of course, network data processing system 100 also may be implemented as a number of different types of networks, such as for example, an intranet, a local area network (LAN), or a wide area network (WAN). FIG. 1 is intended as an example, and not as an architectural limitation for the present invention.
Referring to FIG. 2, a block diagram of a data processing system that may be implemented as a server, such as server 112 in FIG. 1, is depicted in accordance with a preferred embodiment of the present invention. Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors 202 and 204 connected to system bus 206. Alternatively, a single processor system may be employed. Also connected to system bus 206 is memory controller/cache 208, which provides an interface to local memory 209. I/O bus bridge 210 is connected to system bus 206 and provides an interface to I/O bus 212. Memory controller/cache 208 and I/O bus bridge 210 may be integrated as depicted.
Peripheral component interconnect (PCI) bus bridge 214 connected to I/O bus 212 provides an interface to PCI local bus 216. A number of modems may be connected to PCI local bus 216. Typical PCI bus implementations will support four PCI expansion slots or add-in connectors. Communications links to clients 122-126 in FIG. 1 may be provided through modem 218 and network adapter 220 connected to PCI local bus 216 through add-in connectors. Additional PCI bus bridges 222 and 224 provide interfaces for additional PCI local buses 226 and 228, from which additional modems or network adapters may be supported. In this manner, data processing system 200 allows connections to multiple network computers. A memory-mapped graphics adapter 230 and hard disk 232 may also be connected to I/O bus 212 as depicted, either directly or indirectly.
Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 2 may vary. For example, other peripheral devices, such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted. The depicted example is not meant to imply architectural limitations with respect to the present invention. The data processing system depicted in FIG. 2 may be, for example, an IBM eServer™ pSeries® system, a product of International Business Machines Corporation in Armonk, N.Y., running the Advanced Interactive Executive (AIX) operating system or LINUX operating system.
With reference now to FIG. 3, a block diagram of a data processing system is shown illustrating a machine that may serve as a client of the service in which the present invention may be implemented. Data processing system 300 is an example of a computer, such as client computer 122 in FIG. 1, in which code or instructions implementing the processes of the present invention may be located. In the depicted example, data processing system 300 employs a hub architecture including a north bridge and memory controller hub (MCH) 308 and a south bridge and input/output (I/O) controller hub (ICH) 310. Processor 302, main memory 304, and graphics processor 318 are connected to MCH 308. Graphics processor 318 may be connected to the MCH through an accelerated graphics port (AGP), for example.
In the depicted example, local area network (LAN) adapter 312, audio adapter 316, keyboard and mouse adapter 320, modem 322, read only memory (ROM) 324, hard disk drive (HDD) 326, CD-ROM drive 330, universal serial bus (USB) ports and other communications ports 332, and PCI/PCIe devices 334 may be connected to ICH 310. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, PC cards for notebook computers, etc. PCI uses a cardbus controller, while PCIe does not. ROM 324 may be, for example, a flash binary input/output system (BIOS). Hard disk drive 326 and CD-ROM drive 330 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. A super I/O (SIO) device 336 may be connected to ICH 310.
An operating system runs on processor 302 and is used to coordinate and provide control of various components within data processing system 300 in FIG. 3. The operating system may be a commercially available operating system such as Windows XP™, which is available from Microsoft Corporation. Instructions for the operating system, and applications or programs are located on storage devices, such as hard disk drive 326, and may be loaded into main memory 304 for execution by processor 302. The processes of the present invention are performed by processor 302 using computer implemented instructions, which may be located in a memory such as, for example, main memory 304, read only memory 324, or in one or more peripheral devices 326 and 330. Those of ordinary skill in the art will appreciate that the hardware in FIG. 3 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIG. 3. Also, the processes of the present invention may be applied to a multiprocessor data processing system.
In accordance with a preferred embodiment of the present invention, a plurality of broker machines are responsible for delivery of messages sent by publishing clients towards subscribing clients based upon the content of the messages and the stateful transformations requested by the subscribing clients. These broker machines form an overlay network. Some broker machines may be specialized for hosting publishing clients, referred to as publisher hosting brokers (PHBs), and others for hosting subscribing clients, referred to as subscriber hosting brokers (SHBs). Between the PHBs and the SHBs, there may be any number of intermediate nodes that include routing and filtering. The brokers at the intermediate nodes are referred to as intermediate brokers or IBs. For expository purposes, this separation is assumed; however, in actual deployment, some or all of the broker machines may combine the functions of a PHB, an SHB and/or an IB.
FIG. 4 is a diagram of a broker network for a stateful publish-subscribe system in accordance with a preferred embodiment of the present invention. A publishing client, such as one of publishers 402 a-402 d, establishes a connection to a PHB, such as PHB 404 a or PHB 404 b, over a corresponding one of client connections 406 a-406 d. Each client connection may, for example, be a first-in/first-out (FIFO) connection, such as a Transmission Control Protocol/Internet Protocol (TCP/IP) socket connection, or another suitable connection. Independently, a subscribing client, such as one of subscribers 412 a-412 d, establishes a connection to an SHB, such as SHB 410 a or SHB 410 b, over a corresponding one of client connections 414 a-414 d, which may be similar to client connections 406 a-406 d. The PHBs and SHBs are connected, via intermediate brokers 408 a-408 b, through broker-to-broker links. As will be described more fully hereinafter, a publish-subscribe system in accordance with the present invention includes a fault-tolerant protocol that tolerates link failures and message re-orderings. Accordingly, in the publish-subscribe system of the present invention, the broker-to-broker connections need not use reliable FIFO protocols, such as TCP/IP; when it is advantageous to do so, they may use faster, less reliable protocols. Each broker machine may be a stand-alone computer, a process within a computer, or, to minimize delay due to failures, a cluster of redundant processes within multiple computers. Similarly, the links may be simple socket connections, or connection bundles that use multiple alternative paths for high availability and load balancing.
FIG. 5 is a diagram that illustrates how a stateful publish-subscribe system appears to publisher and subscriber clients in accordance with a preferred embodiment of the present invention. Clients are unaware of the physical broker network or its topology. A client application may connect to any broker in the role of publisher and/or subscriber. Publishing clients are aware only of particular named published event streams (topics), such as publish streams 502 and 504. Multiple clients may publish to the same published event stream (topic).
Administrators and clients may define derived views based on functions of either published event streams or of other derived views. In the depicted example, published event streams may be represented as relations. Derived views are represented as relations derived from published event streams or from other derived views by means of relational algebraic expressions in a language, such as Date and Darwen's Tutorial-D, Structured Query Language (SQL), or XQUERY. For example, derived view 510 is defined as a function of stream relations 502 and 504 by means of a JOIN expression with relations 502 and 504 as inputs and relation 510 as an output. Similarly, relation 512, indicated as a subscriber view, is derived from relation 510 by client-specified relational expressions. For example, subscriber view 512 may be a request to group the stock trades of relation 510 by issue and hour and compute the running total volume and max and min price for each issue-hour pair.
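The subscriber view 512 in this example is a grouped aggregate that can be maintained incrementally, one published trade at a time. The following Python sketch is illustrative only: the `Trade` record and its field names, and the `IssueHourSummary` class, are hypothetical and do not come from the patent. Each trade updates exactly one (issue, hour) row, and the updated row is returned as the change message that would flow toward subscribers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # Hypothetical published-stream record; the field names are illustrative only.
    issue: str
    hour: int
    price: float
    volume: int


@dataclass
class Summary:
    total_volume: int
    max_price: float
    min_price: float


class IssueHourSummary:
    """Hypothetical subscriber-view state: one Summary row per (issue, hour) group."""

    def __init__(self):
        self.rows = {}

    def apply(self, trade):
        """Fold one trade into its group; return the changed row as a change message."""
        key = (trade.issue, trade.hour)
        row = self.rows.get(key)
        if row is None:
            row = Summary(trade.volume, trade.price, trade.price)
            self.rows[key] = row
        else:
            row.total_volume += trade.volume
            row.max_price = max(row.max_price, trade.price)
            row.min_price = min(row.min_price, trade.price)
        # Return a copy so that the receiver cannot mutate the view's own state.
        return key, Summary(row.total_volume, row.max_price, row.min_price)


view = IssueHourSummary()
view.apply(Trade("IBM", 10, 90.0, 100))
key, change = view.apply(Trade("IBM", 10, 92.5, 50))
print(key, change)
# ('IBM', 10) Summary(total_volume=150, max_price=92.5, min_price=90.0)
```

Only the changed row is emitted, which is what lets a downstream transform update its own state without recomputing from the whole stream.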
Each subscribing client subscribes to a particular derived view. As published events enter the system from publishing clients, they are saved in their respective streams. The system is then responsible for updating each derived view according to the previously specified relational expressions and then delivering client messages to each subscriber representing the changes to the state of the respective subscribed view.
In a preferred embodiment of the present invention, subscription specifications are analyzed by a compiler and converted into a collection of transform objects and view objects. In one embodiment, the compiler generates JAVA classes for transform and view objects, which are then packaged into an archive (JAR) file, uploaded to the appropriate brokers, and instantiated. Each operator that derives a view from one or more inputs corresponds to a transform object. Each view corresponds to a view object. View objects hold the state of a view. Transform objects express the logic for incrementally updating an output view constituting the result of an operator in response to individual changes to input views constituting the arguments to that operator.
FIG. 6 illustrates an example of an operator that transforms input view objects to an output view object in accordance with a preferred embodiment of the present invention. In the depicted example, views 610 and 620 are view objects that are inputs to some operator, such as, for example, a JOIN operator. Transform 650 is a transform object for that operator, which produces a derived view shown as view object 670. When one of the input objects 610 or 620 changes, either because it is itself a published input stream or because it is a derived view that has changed as a result of changes to its inputs, messages reflecting the changes are sent to transform object 650. Transform 650 receives the messages representing changes to its inputs 610 and 620, computes how the result of the operator changes given the announced changes, and delivers the computed results to its output view object 670 in the form of change messages. Output view object 670 in turn propagates such change messages, either to further transforms, if view object 670 is an intermediate view, or to subscribers, if view object 670 is a subscriber view. Under failure-free circumstances, all message flows are in the “downstream” direction, that is, from input view objects toward transforms, and from transforms toward output view objects. In exceptional cases, view objects such as view object 670 will detect, using protocols discussed hereinafter, that information is missing because failures have caused message loss. In such cases, messages referred to as “curiosity” messages flow in the upstream direction, that is, from output view objects toward transform objects and from transform objects toward input view objects, to request that the missing messages be re-sent.
FIG. 6 is a diagram that illustrates an example of an operator that transforms two input view objects to an output view object in accordance with a preferred embodiment of the present invention. When subscriptions are entered, the mechanism of the present invention builds a structure containing all of the transform objects and view objects needed for all intermediate and subscribed views of all subscriptions. This structure is called a dataflow hypergraph. The dataflow hypergraph has a node for each view object and a hyperedge for each transform object associated with an operation in the subscription specification; unlike an ordinary edge, a hyperedge may have more than one input feeding a single output.
The view objects and transform objects are then allocated to actual brokers in the overlay network, either manually by an administrator or automatically via a service, such as the one described in co-pending application entitled “CONTINUOUS FEEDBACK-CONTROLLED DEPLOYMENT OF MESSAGE TRANSFORMS IN A DISTRIBUTED MESSAGING SYSTEM”, Ser. No. 10/841,297, filed on May 7, 2004. The published streams and the subscribed views may be constrained to be located on brokers where the publishers and subscribers actually connect. The placement of the intermediate transform objects and view objects is not constrained. That is, intermediate transform objects and view objects may be placed wherever suitable, taking into consideration the capacities of the broker machines and the links, as well as the desired performance. After such allocation of objects to brokers, the result is a distributed dataflow hypergraph.
FIG. 7 is a diagram that illustrates a dataflow hypergraph, distributed over multiple brokers in accordance with a preferred embodiment of the present invention. In the depicted example, the physical network consists of brokers 710, 720, 730, and 740. There are three publishing clients 702, 704, and 706, and one subscribing client 750. The publishing clients are publishing to three separate published message streams: “buys” 722 on broker 720, “sells” 734 on broker 730, and “matches” 712 on broker 710. The subscribing client 750 subscribes to derived view 748 on broker 740.
Broker 710 also includes transforms 714 and 716, which feed change messages to brokers 720 and 730, respectively. Broker 720 includes view objects 724 and 726 and transform objects 725 and 727. As an example, view object 726 represents an intermediate derived view or relation, which is based on transform 725, published stream 722, and view 724. Broker 730 includes views 732 and 736, in addition to published stream 734, and also includes transforms 735 and 737. Broker 740 includes views 742, 744 and 748, and transform 746. View 748 is a subscriber view for subscriber 750. As stated above, multiple publisher clients may provide messages for a single message stream, and multiple subscriber clients may subscribe to, and receive updates from, the same view.
As shown in FIG. 7, the transform graph consists of multiple transform and view objects distributed over all brokers. The paths between objects sometimes lie within a broker, as is the case between transform object 725 and intermediate view object 726. In other cases, such as the path between transform 727 and intermediate view object 742 (shown with a dotted line), the path must cross an inter-broker link. Communications between objects within a broker may use cheaper mechanisms, such as parameter passing between objects; inter-broker communications, however, require generating physical messages or packets that cross the link. In accordance with a preferred embodiment of the present invention, the protocols of all view objects and transform objects are able to recover from lost, out-of-order, or duplicate messages and, therefore, work correctly regardless of which paths between objects cross broker boundaries.
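Whether a path crosses a broker boundary follows directly from the hypergraph and the allocation of objects to brokers. A minimal Python sketch, assuming a hypothetical `Hyperedge` record for each transform object and a plain dictionary for the placement; the object and broker numbers are taken from the portion of FIG. 7 described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical record for a transform object: input views feeding one output view."""
    transform: int
    inputs: tuple
    output: int


def cross_broker_paths(edges, placement):
    """List (source, destination) object pairs whose path crosses an inter-broker link.

    `placement` maps every view object and transform object to its hosting broker.
    """
    crossings = []
    for e in edges:
        for view in e.inputs:
            if placement[view] != placement[e.transform]:
                crossings.append((view, e.transform))
        if placement[e.transform] != placement[e.output]:
            crossings.append((e.transform, e.output))
    return crossings


def feeding_transform(edges, view):
    """Hyperedge whose output is `view`; curiosity from `view` is routed through it."""
    return next((e for e in edges if e.output == view), None)


# Part of FIG. 7: transform 725 derives view 726 from stream 722 and view 724,
# and transform 727 derives view 742 from view 726.
edges = [Hyperedge(725, (722, 724), 726), Hyperedge(727, (726,), 742)]
placement = {722: 720, 724: 720, 725: 720, 726: 720, 727: 720, 742: 740}
print(cross_broker_paths(edges, placement))  # [(727, 742)]
```

Because the recovery protocol works the same whether or not a path crosses a link, a list like this would matter only for choosing between in-process calls and physical messages.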
In order to support stateful subscriptions, a history of states is stored in a data storage device. For example, messages from the “matches” published stream 712 are stored in storage 782, messages from the “buys” published stream 722 are stored in storage 784, and messages from the “sells” published stream 734 are stored in storage 786. Storage 782, 784 and 786 should be persistent storage, such as a hard drive, capable of recovering published messages should broker 710, 720 or 730 crash and be restarted. In a system guaranteeing reliable service, published messages are logged to persistent storage before being propagated. Other states, such as views 742 and 744, are preferably stored in main memory and need not be stored persistently. As described above, the term “downstream” refers to the direction along the hypergraph of FIG. 7 from input streams towards subscribers, and the term “upstream” refers to the opposite direction, from a subscribed or intermediate view object back towards the input streams or towards the intermediate views used to compute it.
In accordance with a preferred embodiment of the present invention, the state represented in each view object is represented by a value in a monotonic domain. A monotonic domain is a set of values that can be put into a partial order. In the present invention, if a value B succeeds a value A in the partial order, this means that value B contains more information than value A. For example, value B might be a property of a total stock volume that says the total is at least 100; and value A might be the property that says the total is at least 50. A critical property of monotonic domains is that they change only in one direction: from less knowledge to more knowledge. The monotonic domains begin from a bottom state, which is a state of total ignorance. In the example given above of total stock volume, the bottom state is the state that the total stock volume is at least 0.
Every monotonic domain can be represented as a collection of cells that are initially empty and that become filled in over time. Once a cell is filled, it cannot change value. FIG. 8 is a pictorial representation of a monotonic domain illustrating the manner in which information gaps are detected in accordance with a preferred embodiment of the present invention. FIG. 8 illustrates a very simple monotonic domain representing a computed value that can take any value from 0 to 15. Initially, as shown in state diagram 810, all cells are blank, representing that nothing is known other than that the value is in the range of 0-15. A later state of the same object arises when new pieces of knowledge are added. In state diagram 820, two new pieces of knowledge have been added indicating that the value is at least 2 and no more than 14. The new knowledge is indicated in this illustrative example by the first two cells being marked with “T” (True) and the last cell being marked with “F” (False). Arrow 815 indicates that state 810 has evolved to state 820 as a result of the marking of the appropriate cells.
In a still later state, illustrated in state diagram 830, more knowledge has been added indicating that the value is exactly 6. Thus, in state 830, the first 6 cells are marked with T and the last 9 cells are marked with F. Arrow 825 indicates that state 820 has further evolved to state 830 as a result of the marking of additional cells. At this point, no further cells can be filled in and the state can no longer change. State 830, accordingly, is a “final” state. Every state maintained in accordance with a preferred embodiment of the present invention is from some monotonic domain, the exact domain depending upon the operator that generated the state. It is understood that FIG. 8 represents a mathematical example of a simple domain. An efficient representation will typically represent a range with a pair of integer cells, provided that the processing of updates proceeds in a manner consistent with the mathematical model.
It is to be noted that the cells can only be filled in a particular order. In the example illustrated by states 810-830, the cells can only be filled with consecutive T's beginning on the left or with consecutive F's beginning on the right. Adding values in the wrong order, for example, as illustrated in state diagram 840, is immediately detectable as a “gap” and indicates either that messages have arrived in the wrong order or that a message has been lost and might not arrive at all. The combination of monotonic domains, incremental filling of cells, and ordering rules for filling the cells that permit gap detection makes it possible to detect and recover from failures in a stateful publish-subscribe system in accordance with a preferred embodiment of the present invention.
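The cell model of FIG. 8 and its gap rule can be sketched in a few lines of Python. The `RangeDomain` class and its convention that cell k marked T means "value >= k" and marked F means "value < k" are assumptions chosen to reproduce states 810, 820 and 830 and a gapped state like 840; as noted above, an efficient implementation would keep only a pair of integer bounds.

```python
BLANK, T, F = None, "T", "F"


class RangeDomain:
    """Monotonic domain for a value in 0..n held as n cells (FIG. 8 uses n = 15).

    Assumed convention: cell k (1-based) marked T means "value >= k",
    marked F means "value < k"; a blank cell means "not yet known".
    """

    def __init__(self, n=15):
        self.cells = [BLANK] * n

    def fill(self, k, mark):
        current = self.cells[k - 1]
        if current is not BLANK and current != mark:
            # Monotonicity: a filled cell can never change value.
            raise ValueError(f"cell {k} already holds {current}")
        self.cells[k - 1] = mark

    def bounds(self):
        """(low, high) implied by the leading run of T's and the trailing run of F's."""
        n = len(self.cells)
        low = 0
        while low < n and self.cells[low] == T:
            low += 1
        high = n
        while high > low and self.cells[high - 1] == F:
            high -= 1
        return low, high

    def has_gap(self):
        """A filled cell between the two runs means cells were filled out of order."""
        low, high = self.bounds()
        return any(c is not BLANK for c in self.cells[low:high])


d = RangeDomain()
print(d.bounds())               # (0, 15)   state 810
for k, mark in [(1, T), (2, T), (15, F)]:
    d.fill(k, mark)
print(d.bounds(), d.has_gap())  # (2, 14) False   state 820
for k in (5, 6):
    d.fill(k, T)
print(d.has_gap())              # True   a gapped state like 840
```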
The monotonic domain used for published input streams in accordance with a preferred embodiment of the present invention is always the same; namely, a set of cells labeled by “ticks” of time. Ticks in the future are blank, representing the fact that it is unknown whether those ticks will contain events and if so what the values of these events will be. Ticks in the past are filled; either with an event value, or with a silence. State 850 in FIG. 8 represents a particular state after 10 ticks have elapsed.
The monotonic domains used herein for derived views are based on analysis of the operators that generate the views. For example, a view that sums a set of K tuples, each of which can have a value from 0 to M, produces a value in the range from 0 to K*M. The 0 to 15 range shown in the monotonic domain of FIG. 8 could result from such a case where K=5 and M=3.
When messages enter the publish-subscribe system as part of some published stream, they are logged to stable storage. For example, in FIG. 7, messages from a publishing client to published stream Matches 712 on broker 710 are logged to storage 782. The view objects associated with published streams such as Matches 712 are therefore stable and never have missing messages; these messages remain available even if the PHB fails. Downstream view objects, such as 724 or 742, which depend for their updates on messages that have to cross broker boundaries over links, may be missing messages. The following describes the manner in which view objects other than the published stream view objects detect and recover from failures.
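The published-stream view combines the tick domain described above with log-before-propagate. In the Python sketch below, a hypothetical `PublishedStream` class uses an in-memory list as a stand-in for stable storage (a real PHB would write to disk). Because every past tick is in the log, this view can answer any request for past ticks without ever becoming curious.

```python
class PublishedStream:
    """Hypothetical published-stream view: a tick-indexed monotonic domain backed by a log.

    A past tick holds an event value or SILENCE; ticks not yet in the log are blank.
    The in-memory `log` list stands in for stable storage in this sketch.
    """

    SILENCE = "<silence>"

    def __init__(self, name, send_downstream):
        self.name = name
        self.log = []                        # index == tick number
        self.send_downstream = send_downstream

    def publish(self, event):
        self._append(event)

    def silence(self):
        self._append(self.SILENCE)

    def _append(self, value):
        tick = len(self.log)
        self.log.append(value)               # 1. log to (simulated) stable storage first
        self.send_downstream(tick, value)    # 2. then propagate; the link may lose it

    def replay(self, t1, t2):
        """Answer curiosity for ticks t1..t2 from the log; future ticks are still blank."""
        last = min(t2, len(self.log) - 1)
        return [(t, self.log[t]) for t in range(t1, last + 1)]


sent = []
matches = PublishedStream("matches", lambda t, v: sent.append((t, v)))
matches.publish("m0")
matches.silence()
matches.publish("m2")
print(matches.replay(1, 5))   # [(1, '<silence>'), (2, 'm2')]
```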
The general protocol in accordance with a preferred embodiment of the present invention is the same for all view objects other than the published stream view objects. There exist particular specializations of this general protocol based on the actual operation used.
Consider a view object that is not a published stream view object, for example, view object 742 in FIG. 7. Each view object has an associated transform object that feeds it input. When the transform object, transform object 727 in the case of view object 742, receives an update from its input or inputs (726 in this example), it computes the change to the view object, and passes the change to that view object, which then performs the method illustrated in FIG. 9.
In particular, FIG. 9 is a flowchart that illustrates a method for responding to state change messages in accordance with a preferred embodiment of the present invention. The method is generally designated by reference number 900, and begins when a change message is received indicating which new cells in the monotonic domain are to be filled; the new cells are then filled in accordance with the message (step 901). The particular monotonic domain depends upon the actual operators. The new state is obtained by merging the changed cells with the original cells. Mathematically, this is called taking the least upper bound (join) of the old and new values. For example, suppose the domain is the one shown in FIG. 8 as states 810-840, and suppose also that the current state is the state shown in 820. The following are possible cases:
    • a. (Usual) The change message requests that cells 3-5 be filled with T and that cells 10-14 be filled with F. This is a typical case where the value moves to a later value, namely transitioning from the range extending from 2 to 14 to the range extending from 5 to 9.
    • b. (Duplicate) The change message requests that cell 1 be filled with T and that cell 15 be filled with F. These cells are already filled. Therefore, the new state is the same as the old state and the message is, in effect, ignored and treated as a duplicate. (The discussion below indicates a circumstance in which duplicates might occur as a result of retransmitting messages which were delayed but not lost.)
    • c. (Gap) The change message requests that cells 5-6 be filled with T. This yields a value such as the value shown in state 840, containing a gap. The gap indicates that information is missing: either some messages have been reordered during transit over the link, or some message has been lost and will need to be recovered.
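Step 901 and the three cases above can be illustrated with a small Python sketch. Representing the state as a dictionary from cell number to 'T' or 'F' is an assumption made for readability; the hypothetical `merge` function computes the least upper bound of the old state and the change message and reports which case applies, starting from the FIG. 8 state in which the value is known to lie between 2 and 14.

```python
N = 15  # cells 1..15, as in FIG. 8


def has_gap(cells):
    """cells maps cell index -> 'T' or 'F'; absent keys are blank."""
    lo = 1
    while lo <= N and cells.get(lo) == "T":
        lo += 1
    hi = N
    while hi >= lo and cells.get(hi) == "F":
        hi -= 1
    # Any filled cell strictly between the leading T's and trailing F's is a gap.
    return any(k in cells for k in range(lo, hi + 1))


def merge(state, change):
    """Least upper bound of `state` and a change message, plus the case it falls under."""
    merged = dict(state)
    for k, mark in change.items():
        if merged.get(k, mark) != mark:
            raise ValueError(f"cell {k} cannot change from {merged[k]} to {mark}")
        merged[k] = mark
    if merged == state:
        return merged, "duplicate"
    return merged, ("gap" if has_gap(merged) else "usual")


def marks(first, last, mark):
    return {k: mark for k in range(first, last + 1)}


state_820 = {**marks(1, 2, "T"), 15: "F"}          # value known to be in 2..14
print(merge(state_820, {**marks(3, 5, "T"), **marks(10, 14, "F")})[1])  # usual
print(merge(state_820, {1: "T", 15: "F"})[1])                           # duplicate
print(merge(state_820, marks(5, 6, "T"))[1])                            # gap
```

Duplicates need no special handling: merging cells that are already filled leaves the state unchanged.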
After updating the state in step 901, a check is conducted to determine if any gaps exist in the new state (step 902). One of three results will be detected:
    • a. A gap has arisen where there previously was no gap. In this case, a timer service is invoked (step 902 a). If some threshold number of milliseconds has elapsed and the timer has not been cancelled, an alarm will be raised.
    • b. A gap has been closed where there was previously a gap. In this case, the timer is cancelled (step 902 b).
    • c. If there has been no change with regard to gaps, no further action is taken (step 902 c).
In all three cases, state changes, if any, are propagated further to all transform objects that are downstream of this view object (step 903).
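The control flow of FIG. 9 can be sketched as follows. This Python version is a hypothetical simplification: it uses a domain whose cells 0, 1, 2, ... must fill from left to right (as with ticks), and it takes the current time as an argument instead of using a real timer service, so that the behavior is deterministic.

```python
class GapTimedView:
    """Hypothetical sketch of the FIG. 9 loop for cells 0, 1, 2, ... filled left to right.

    Time is passed in explicitly (now_ms); a real broker would use a timer service.
    """

    def __init__(self, threshold_ms, downstream):
        self.cells = {}             # cell index -> value
        self.threshold_ms = threshold_ms
        self.timer_started = None   # start time of the gap timer, or None
        self.downstream = downstream

    def has_gap(self):
        # With cells numbered from 0, any missing index below the maximum is a gap.
        return bool(self.cells) and len(self.cells) != max(self.cells) + 1

    def on_change(self, change, now_ms):
        # Step 901: merge; cells that are already filled are duplicates and ignored.
        new = {k: v for k, v in change.items() if k not in self.cells}
        self.cells.update(new)
        # Step 902: compare the gap status with the timer state.
        gap = self.has_gap()
        if gap and self.timer_started is None:
            self.timer_started = now_ms       # 902a: a gap has arisen, start the timer
        elif not gap and self.timer_started is not None:
            self.timer_started = None         # 902b: the gap closed, cancel the timer
        # 902c: otherwise nothing changes.
        # Step 903: propagate any state change downstream.
        if new:
            self.downstream(new)

    def alarm(self, now_ms):
        """True once a gap has outlived the threshold; the view then becomes curious."""
        return (self.timer_started is not None
                and now_ms - self.timer_started >= self.threshold_ms)


out = []
view = GapTimedView(threshold_ms=50, downstream=out.append)
view.on_change({0: "a"}, now_ms=0)
view.on_change({2: "c"}, now_ms=10)    # cell 1 missing: timer starts
print(view.alarm(40), view.alarm(60))  # False True
view.on_change({1: "b"}, now_ms=70)    # gap closed: timer cancelled
view.on_change({1: "b"}, now_ms=80)    # duplicate: nothing propagated
print(out)  # [{0: 'a'}, {2: 'c'}, {1: 'b'}]
```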
From time to time, the above-described protocol will result in an alarm indicating that a gap was detected in step 902 a and not filled quickly enough. When an alarm is generated, the view object detecting the gap becomes a “curious view”. The view object notifies its associated transform object, and the transform object decides, based on the transform, exactly what kind of information is missing and, in some cases, from which of its multiple arguments this information is missing. The transform object then sends an inquiry message, also referred to as a “curiosity message”, upstream toward the view object or objects that might be able to supply the missing information. Each such view object receiving a curiosity message becomes a “satisfying view”. The satisfying view responds to the curiosity message by resending the requested information if it has it; otherwise the satisfying view itself becomes curious and propagates curiosity messages further upstream. If the satisfying view is logged to stable storage, it is always able to satisfy curiosity and never needs to propagate curiosity messages. As mentioned previously, this is guaranteed to be the case for the published stream views. Therefore, missing information is ultimately retransmitted from satisfying views back towards curious views.
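The upstream propagation of curiosity can be sketched with a chain of views. In this Python sketch, which is a deliberate simplification, each view has a single input, the transform between views is assumed to be the identity (so missing ticks map one-to-one upstream), and a curiosity message is modelled as a direct method call rather than a network message. Only the published-stream view holds a stable log, so the recursion ends there at the latest. The view names loosely follow FIG. 7, where view 726 is derived in part from the "buys" stream 722.

```python
class View:
    """Hypothetical view in a single-input chain with an assumed identity transform.

    `stable_log` is set only for a published-stream view, which can always
    satisfy curiosity.
    """

    def __init__(self, name, upstream=None, stable_log=None):
        self.name = name
        self.upstream = upstream
        self.stable_log = stable_log
        self.known = {}   # tick -> value this view currently holds

    def on_curiosity(self, ticks, path):
        """Act as a satisfying view: resend what is held, ask upstream for the rest."""
        path.append(self.name)
        source = self.stable_log if self.stable_log is not None else self.known
        found = {t: source[t] for t in ticks if t in source}
        missing = [t for t in ticks if t not in source]
        if missing:
            if self.upstream is None:
                raise LookupError(f"{self.name} cannot supply ticks {missing}")
            # This satisfying view lacks some ticks, so it becomes curious in turn.
            recovered = self.upstream.on_curiosity(missing, path)
            self.known.update(recovered)
            found.update(recovered)
        return found


buys = View("buys", stable_log={0: "a", 1: "b", 2: "c"})
v726 = View("726", upstream=buys)
v726.known = {0: "a"}
# View 742 is curious about ticks 0 and 1 and asks its input view 726.
path = []
found = v726.on_curiosity([0, 1], path)
print(found, path)  # {0: 'a', 1: 'b'} ['726', 'buys']
```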
It is possible that the messages conveying the missing information were merely delayed and not lost. In such a case, either the original messages or the retransmissions from the satisfying view will turn out to be duplicates. However, the protocol in accordance with the present invention tolerates duplicate messages from links by simply ignoring them; this occurs in the merge operation of step 901 in FIG. 9, described above as case (b), “Duplicate”. Duplicate messages, accordingly, do not present a problem.
It is necessary to set a second timer when a curious view sends a curiosity message. This timer triggers an alarm if the gap is not filled within a designated timeout period; if the timer expires before the gap is filled, the curiosity message is resent. This protocol is necessary to deal with the possibility that the curiosity message itself is lost. The protocol assumes that curiosity messages and their responses will not be lost infinitely often; that is, that if one waits long enough, there will be a stable period of connectivity between a broker and its nearest neighbor.
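The retry rule for curiosity messages amounts to a simple loop. The Python sketch below abstracts away real time: each iteration stands for one expiry of the second timer, and the `max_timeouts` bound exists only to keep the example finite, since the protocol itself relies on messages not being lost infinitely often. The function name and the lossy link in the usage example are hypothetical.

```python
def await_gap_fill(send_curiosity, gap_filled, max_timeouts):
    """Send a curiosity message and resend it each time the second timer expires.

    Each loop iteration stands for one timeout period. `max_timeouts` is an
    assumption of this sketch; the protocol itself simply keeps retrying.
    """
    for attempt in range(1, max_timeouts + 1):
        send_curiosity()
        # ... the second timer runs for the designated timeout period ...
        if gap_filled():
            return attempt          # gap filled before the timer expired
    return None                     # still unfilled; a real view would keep retrying


# Hypothetical lossy link: the first two curiosity messages are dropped.
drops = iter([True, True, False])
retransmitted = []


def send():
    if not next(drops):
        retransmitted.append("missing ticks")


attempts = await_gap_fill(send, lambda: bool(retransmitted), max_timeouts=5)
print(attempts)  # 3
```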
An example of a view relation, the particular monotonic type it uses, and the particular representations used to hold the state, detect gaps, and issue curiosity will now be given with reference to FIG. 10, which is a diagram that illustrates the results of a stateful transformation in accordance with a preferred embodiment of the present invention. This relation is the result of taking a stock ticker showing tick number, issue, price, and number of shares sold, and applying an operator that groups the ticker by issue and delivers the total volume (totalvol) of shares sold for each issue.
The state of this view object is shown in two components: (1) a relation keyed by issue, and mapping to the total volume and the tick number of the latest update, and (2) a gap data structure recording which ticks from the original stock ticker have or have not contributed to the sum being stored—the information having three components: a past horizon t1, a future horizon t2, and a gap list. In FIG. 10, object 1001 shows a typical value of the relation, and objects 1002 and 1003 show possible alternative values of the gap data structure.
The change messages have one of the following three forms:
    • Update (i, v, t) represents the fact that the total volume for issue i has increased to v as of tick t.
    • Silence (t1, t2) represents the fact that silences occurred in the tick range t1 to t2. To distinguish an event stream that has quiesced from a link failure, the implementation requires that, after a certain period with no change, silence messages for that time interval be propagated. In a typical implementation, silence messages are normally “piggybacked” on neighboring update messages, and are sent in isolation only when a long enough interval has elapsed without any update messages having been sent.
    • Don't-Care (t1, t2) represents the fact that the values for all issues in this range can be ignored (because it is known that an update for a time later than t2 exists).
The meaning of the relation is as follows: the “issue” and “totalvol” columns represent values that a viewer of the relation would actually see; the “t” column is used to record at which ticks these values reached their current value, and is used to facilitate the response to curiosity messages.
The meaning of the gap data structure is as follows: horizon t1 represents the fact that the totals in all rows reflect all ticks up to and including t1. Horizon t2 represents the fact that the totals in all rows do not include any ticks from t2 on. In the normal case, where no messages are lost, t2=t1+1. When t2 does not equal t1+1, there are one or more gaps. The gap list indicates which ranges of ticks between t1 and t2 are included (flagged with ‘T’) or not included (flagged with ‘*’) in the summations for all rows.
In FIG. 10, object 1002 represents a possible state of the gap data structure in which all the ticks from 0 through 9500 have been counted towards the summations, no ticks from 9501 on have been counted, and there are, therefore, no gaps. Object 1003 represents a possible state of the gap data structure in which ticks from 0 through 9500 and 9551 through 9600 have been counted, no ticks from 9601 on have been counted, but ticks in the range 9501 through 9550 are unaccounted for. Such a state would constitute a gap, and if the gap persists, the relation would become curious.
When processing a message, the tick range of the message is examined. If the tick range includes ticks that have already been counted, the already counted ticks are ignored. If the tick range includes ticks that have not previously been counted, the gap list is adjusted by possibly eliminating a gap, possibly creating a gap or possibly just extending the future horizon. In this manner, the data structure of the example is a specialization of the general protocol illustrated in FIG. 9.
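The gap data structure of FIG. 10 and the processing rule just described can be sketched in Python as follows. The class and function names, the tuple encoding of the Update, Silence and Don't-Care messages, and the choice to store counted ranges rather than explicit 'T'/'*' flags are all assumptions of this sketch. The usage lines reproduce objects 1002 and 1003.

```python
class TickGapList:
    """Hypothetical FIG. 10 gap structure: past horizon t1, future horizon t2, counted ranges.

    Every tick <= t1 is counted; no tick >= t2 is counted; `counted` holds the
    sorted, disjoint (lo, hi) ranges counted strictly between t1 and t2.
    """

    def __init__(self, t1):
        self.t1 = t1
        self.t2 = t1 + 1
        self.counted = []

    def add(self, lo, hi):
        """Count ticks lo..hi, ignoring any that were already counted."""
        lo = max(lo, self.t1 + 1)
        if lo > hi:
            return                              # entirely at or below t1: duplicate
        merged = []
        for a, b in sorted(self.counted + [(lo, hi)]):
            if merged and a <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], b))
            else:
                merged.append((a, b))
        if merged[0][0] == self.t1 + 1:         # a gap was closed: advance the past horizon
            self.t1 = merged.pop(0)[1]
        self.t2 = max(self.t2, hi + 1)          # possibly extend the future horizon
        self.counted = merged

    def gaps(self):
        """Uncounted tick ranges between t1 and t2 (the '*' entries of the gap list)."""
        result, nxt = [], self.t1 + 1
        for a, b in self.counted:
            result.append((nxt, a - 1))
            nxt = b + 1
        return result


def apply_message(relation, gap_list, msg):
    """Apply ("update", issue, volume, tick), ("silence", t1, t2) or ("dont_care", t1, t2)."""
    if msg[0] == "update":
        _, issue, volume, tick = msg
        # Totals only grow, so an update for a later tick supersedes an earlier one.
        if issue not in relation or relation[issue][1] < tick:
            relation[issue] = (volume, tick)
        gap_list.add(tick, tick)
    else:
        _, first, last = msg
        gap_list.add(first, last)


g = TickGapList(t1=9500)            # object 1002: ticks 0..9500 counted, no gaps
g.add(9551, 9600)                   # ticks 9551..9600 arrive first
print(g.t1, g.t2, g.gaps())         # 9500 9601 [(9501, 9550)]   (object 1003)
g.add(9501, 9550)                   # a retransmission closes the gap
print(g.t1, g.t2, g.gaps())         # 9600 9601 []
```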
If, in the data structure of FIG. 10, a gap persists for longer than the timeout interval, then the view represented by the data structure becomes curious, and a curiosity message is sent upstream. Suppose the view object upstream has a similar structure to FIG. 10. The curiosity message will request messages for a particular range or ranges of ticks, one range of ticks for each persistent gap. FIG. 11 is a flowchart that illustrates a method by which a satisfying relation responds to a curiosity message from a downstream curious view object in accordance with a preferred embodiment of the present invention.
The method is generally designated by reference number 1100. The satisfying relation begins by receiving the curiosity message for t1 . . . t2 and finding the set S of all tuples <i, v, t> of the relation for which the value of column t falls within that range (step 1101). For each such row (step 1102), the satisfying relation sends an Update message to the curious relation with the specified <i, v, t> (step 1104). For each distinct interval ti . . . tj within range t1 . . . t2 that does not include a tick named in a row of S (step 1103), the satisfying relation sends a Don't-Care message to the curious relation, specifying the interval ti . . . tj (step 1105).
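The response of FIG. 11 can be sketched as a single Python function. The relation is modelled, by assumption, as a dictionary from issue to a (totalvol, t) pair, the message tuples are a hypothetical encoding, and the sample rows are made up for illustration rather than taken from object 1001. Because the relation keeps only the latest row per issue, each issue yields at most one Update message.

```python
def answer_curiosity(relation, t1, t2):
    """Hypothetical FIG. 11 response for a curiosity message covering ticks t1..t2.

    `relation` maps issue -> (totalvol, t). Returns Update messages for the rows
    in S and Don't-Care messages for the maximal sub-intervals naming no row of S.
    """
    # Steps 1101, 1102, 1104: one Update per row whose tick falls in t1..t2.
    rows = sorted((t, issue, v) for issue, (v, t) in relation.items() if t1 <= t <= t2)
    messages = [("update", issue, v, t) for t, issue, v in rows]
    # Steps 1103, 1105: Don't-Care for each interval containing no tick named in S.
    nxt = t1
    for t in sorted({t for t, _, _ in rows}):
        if t > nxt:
            messages.append(("dont_care", nxt, t - 1))
        nxt = t + 1
    if nxt <= t2:
        messages.append(("dont_care", nxt, t2))
    return messages


# Hypothetical relation rows; SUNW's latest update lies outside the requested range.
relation = {"IBM": (1500, 9520), "HPQ": (800, 9540), "SUNW": (300, 9100)}
for message in answer_curiosity(relation, 9501, 9550):
    print(message)
```

Only the latest update per issue is resent, which is the advantage over replaying every logged message described next.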
The result of this algorithm is that if there have been multiple updates to some stock issue during the interval t1 . . . t2, only one update is sent for that issue, specifically, the update that led to the highest value. This update supersedes all previous updates, so the previous updates need not be sent. This algorithm provides an advantage over algorithms that are not aware of the mathematical properties of the transforms and that merely replay all messages. Such algorithms are used, for example, in “guaranteed delivery” systems for stateless publish-subscribe services.
The present invention thus provides a fault-tolerant protocol for a distributed stateful publish-subscribe system. The system includes the capability of recovering from failures that may occur when a stateful publish-subscribe service is implemented on an overlay network. Such failures may include, for example, temporary crashes of broker machines, and network errors causing messages to be lost, duplicated or delivered out of order. The system requires stable storage logging only when a published event enters the system, and requires that logged messages be retrieved from stable storage only in the event that all brokers between a failed link or broker and the publishing sites have failed. The publish-subscribe system of the present invention does not require that broker-to-broker connections use reliable FIFO protocols, such as TCP/IP, and may advantageously use faster, less reliable protocols.
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (16)

1. A method for fault recovery in a stateful publish-subscribe system, the method comprising:
providing the stateful publish-subscribe system, the system including an overlay network, wherein the overlay network comprises a publisher that first transmits a plurality of structured messages through the overlay network, an object downstream from the publisher that receives the plurality of structured messages as a plurality of input messages from the publisher and applies at least one transform to the plurality of input messages to form at least one output message that is transmitted to a subscriber downstream from the object, and the subscriber that receives the at least one output message from the object;
storing a history of the plurality of structured messages in a stable storage at the publisher;
detecting missing information with respect to a message transmitted downstream through the overlay network from the plurality of input messages to the at least one output message, wherein view objects using protocols detect the missing information, wherein the view objects are represented by a set of values in a monotonic domain, wherein the monotonic domain is the set of values in a partial order, wherein detecting the missing information with respect to a message includes detecting a gap in the set of values, and wherein the gap is an indication that the message is lost or not arrived at all;
transmitting an inquiry message requesting the missing information upstream through the overlay network to the object;
determining whether the missing information is stored in the object using the inquiry message;
responsive to a determination that the missing information is stored in the object, responding to the inquiry message by reapplying the at least one transform to the missing information and transmitting the missing information downstream through the overlay network;
responsive to a determination that the missing information is not stored in the object, transmitting the inquiry message from the object to the publisher;
responding to the inquiry message by retrieving the missing information from the plurality of structured messages in the stable storage at the publisher and transmitting the missing information from the publisher downstream through the overlay network.
2. The method according to claim 1, wherein the system includes a hypergraph defining at least one transform object and at least one view object deployed over the overlay network, and wherein transmitting an inquiry message upstream through the overlay network comprises transmitting an inquiry message upstream through the overlay network from object to object until the missing information is determined.
3. The method according to claim 2, wherein the plurality of input messages are logged to a storage, and wherein transmitting an inquiry message upstream through the overlay network from object to object comprises transmitting an inquiry message upstream from object to object until the missing information is determined or until the inquiry message is transmitted to the storage.
4. The method according to claim 1, and further including invoking a timer when missing information is detected with respect to a message transmitted downstream through the overlay network, and wherein transmitting an inquiry message upstream through the overlay network is performed when the invoked timer has elapsed.
5. The method according to claim 1, wherein the steps of detecting missing information, transmitting an inquiry message and receiving and responding to the inquiry message are tailored to mathematical properties of individual transforms.
6. The method of claim 1, wherein the publisher is located on a first computer, the object is located on a second computer, and the subscriber is located on a third computer.
7. The method of claim 6, wherein the plurality of structured messages are transmitted from the publisher to the object using a connection bundle with a plurality of alternative paths.
8. The method of claim 1, wherein applying the at least one transform further comprises:
grouping a plurality of stock trades stored in the plurality of structured messages by a stock issue and an hour.
9. The method of claim 8, wherein applying the at least one transform further comprises:
computing a total volume, a maximum price, and a minimum price for the stock issue during the hour.
10. The method of claim 1, further comprising:
distributing, by a service, the at least one transform within the overlay network.
11. The method of claim 1, wherein the step of storing the history of the plurality of structured messages in the stable storage is performed prior to the publisher first transmitting the plurality of structured messages through the overlay network.
12. The method of claim 1, wherein the at least one output message is ignored by the subscriber when the at least one output message is duplicative.
13. The method of claim 1, wherein the step of transmitting the inquiry message from the object to the publisher further comprises:
determining whether the missing information has been received by the object within a time period;
responsive to the missing information not being received by the object within the time period, retransmitting the inquiry message from the object to the publisher.
14. The method of claim 1, wherein the step of retrieving the missing information from the plurality of structured messages in the stable storage at the publisher comprises:
retrieving a structured message with a highest value in the gap in the set of values from the plurality of structured messages in the stable storage at the publisher.
15. A stateful publish-subscribe system comprising:
an overlay network having a plurality of broker machines;
at least one publishing client that publishes a plurality of messages to a plurality of published message streams;
at least one intermediate client that receives the plurality of messages and applies at least one transform to the plurality of messages to form at least one output message that is transmitted to at least one subscribing client downstream from the intermediate client;
at least one subscribing client that requests a view of the at least one output message, wherein at least one of the at least one subscribing client requests a stateful view in which at least one update to the stateful view depends upon more than one of the at least one output message;
a hypergraph defining transform objects and view objects deployed over the overlay network;
a stable storage for storing the plurality of messages at the publishing client;
a first protocol for detecting missing information using the view objects with respect to a message transmitted downstream through the overlay network, wherein the view objects are represented by a set of values in a monotonic domain, wherein the monotonic domain is the set of values in a partial order, wherein detecting the missing information with respect to a message includes detecting a gap in the set of values, and wherein the gap is an indication that the message is lost or has not arrived at all;
a second protocol for transmitting an inquiry message requesting the missing information upstream through the overlay network to the at least one intermediate client;
a third protocol for determining whether the missing information is stored in the at least one intermediate client using the inquiry message;
a fourth protocol for, responsive to the third protocol determining that the missing information is stored in the at least one intermediate client, receiving and responding to the inquiry message by reprocessing the missing information using the transform objects, and transmitting the missing information downstream through the overlay network;
a fifth protocol for, responsive to the third protocol determining that the missing information is not stored in the at least one intermediate client, transmitting the inquiry message from the at least one intermediate client to the at least one publishing client;
a sixth protocol for responding to the inquiry message by retrieving the missing information from the plurality of structured messages in the stable storage at the publishing client and transmitting the missing information from the publishing client downstream through the overlay network.
16. A computer program product comprising:
a computer recordable-type medium including computer usable program code for fault recovery in a stateful publish-subscribe system, the computer program product comprising:
computer usable program code for providing the stateful publish-subscribe system, wherein the overlay network comprises a publisher that first transmits a plurality of structured messages through the overlay network, an object downstream from the publisher that receives the plurality of structured messages as a plurality of input messages from the publisher and applies at least one transform to the plurality of input messages to form at least one output message that is transmitted to a subscriber downstream from the object, and the subscriber that receives the at least one output message from the object;
computer usable program code for storing a history of the plurality of structured messages in a stable storage at the publisher;
computer usable program code for detecting missing information with respect to a message transmitted downstream through the overlay network from the plurality of input messages to the at least one output message, wherein view objects using protocols detect the missing information, wherein the view objects are represented by a set of values in a monotonic domain, wherein the monotonic domain is the set of values in a partial order, wherein detecting the missing information with respect to a message includes detecting a gap in the set of values, and wherein the gap is an indication that the message is lost or has not arrived at all;
computer usable program code for transmitting an inquiry message requesting the missing information upstream through the overlay network to the object;
computer usable program code for determining whether the missing information is stored in the object using the inquiry message;
computer usable program code for responding to the inquiry message by reapplying the at least one transform to the determined missing information and transmitting the missing information downstream through the overlay network in response to a determination that the missing information is stored in the object;
computer usable program code for transmitting the inquiry message from the object to the publisher in response to a determination that the missing information is not stored in the object;
computer usable program code for responding to the inquiry message by retrieving the missing information from the plurality of structured messages in the stable storage at the publisher and transmitting the missing information from the publisher downstream through the overlay network.
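The recovery protocol recited in the claims above (a view detects a gap in a monotonic domain, sends an inquiry upstream, and the intermediate object either reapplies its transform to information it still holds or forwards the inquiry to the publisher, which replays from stable storage) can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the patented implementation. Assumed: the monotonic domain is per-stream integer sequence numbers starting at 1 (a total order, one kind of partial order); the intermediate object retains only a bounded number of recent inputs; the transform is the stock-trade example of claims 8 and 9 (group by issue and hour, keep total volume, maximum price and minimum price). The names `Trade`, `Publisher`, `TransformObject`, `View`, `cache_size` and the `lost` flag are hypothetical; network links are plain method calls, and `lost=True` simulates a dropped downstream message.

```python
# Simulation sketch of hop-by-hop recovery; all names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Trade:
    seq: int      # position in the publisher's stream (assumed monotonic domain)
    issue: str
    hour: int
    volume: int
    price: float

class Publisher:
    def __init__(self):
        self.stable_storage = {}  # seq -> Trade; stands in for a durable log

    def publish(self, trade, obj, lost=False):
        self.stable_storage[trade.seq] = trade  # log before first transmission
        obj.on_input(trade, lost)

    def on_inquiry(self, seqs, obj):
        # Last resort: replay logged trades downstream through the object.
        for seq in seqs:
            obj.on_input(self.stable_storage[seq])

class TransformObject:
    def __init__(self, publisher, cache_size):
        self.publisher = publisher
        self.cache_size = cache_size  # assumed bound on retained inputs
        self.cache = {}               # seq -> Trade, oldest first
        self.subscriber = None

    @staticmethod
    def transform(t):
        # One output message: a trade's contribution to its (issue, hour) group.
        return {"seq": t.seq, "key": (t.issue, t.hour),
                "volume": t.volume, "price": t.price}

    def on_input(self, t, lost=False):
        self.cache.pop(t.seq, None)   # re-insert so the entry counts as newest
        self.cache[t.seq] = t
        if len(self.cache) > self.cache_size:
            self.cache.pop(next(iter(self.cache)))  # evict the oldest input
        if not lost:                  # lost=True simulates a dropped message
            self.subscriber.on_output(self.transform(t))

    def on_inquiry(self, seqs):
        misses = []
        for seq in seqs:
            if seq in self.cache:     # still held here: reapply the transform
                self.subscriber.on_output(self.transform(self.cache[seq]))
            else:
                misses.append(seq)
        if misses:                    # not held here: forward the inquiry upstream
            self.publisher.on_inquiry(misses, self)

class View:
    def __init__(self, upstream):
        self.upstream = upstream
        self.applied = set()  # sequence numbers already reflected in the view
        self.high = 0         # highest sequence number seen so far
        self.groups = {}      # (issue, hour) -> [total volume, max price, min price]

    def on_output(self, msg):
        if msg["seq"] in self.applied:
            return            # duplicate output message: ignore it
        self.applied.add(msg["seq"])
        self.high = max(self.high, msg["seq"])
        agg = self.groups.setdefault(msg["key"], [0, msg["price"], msg["price"]])
        agg[0] += msg["volume"]
        agg[1] = max(agg[1], msg["price"])
        agg[2] = min(agg[2], msg["price"])

    def gaps(self):
        # Any value below the highest one seen that is absent is a gap.
        return [s for s in range(1, self.high + 1) if s not in self.applied]

    def recover(self):
        missing = self.gaps()
        if missing:
            self.upstream.on_inquiry(missing)

pub = Publisher()
obj = TransformObject(pub, cache_size=2)
view = View(obj)
obj.subscriber = view

trades = [Trade(1, "IBM", 10, 100, 90.0), Trade(2, "IBM", 10, 50, 92.5),
          Trade(3, "IBM", 11, 70, 91.0), Trade(4, "IBM", 10, 30, 89.0)]
for t in trades:
    pub.publish(t, obj, lost=t.seq in {1, 3})  # outputs for 1 and 3 are dropped

print(view.gaps())   # [1, 3]
view.recover()       # 3 comes from the object's cache, 1 from stable storage
print(view.gaps())   # []
print(view.groups)   # {('IBM', 10): [180, 92.5, 89.0], ('IBM', 11): [70, 91.0, 91.0]}
```

Maximum and minimum are idempotent but a running sum is not, so the view must discard duplicate output messages (compare claim 12), or a replayed trade would be counted twice. The sketch omits the timer of claims 4 and 13, which would delay and, if necessary, repeat the inquiry instead of sending it at once, and it can only notice a lost message after some higher sequence number has arrived.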
US10/846,196 2004-05-14 2004-05-14 Recovery in a distributed stateful publish-subscribe system Expired - Fee Related US7886180B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/846,196 US7886180B2 (en) 2004-05-14 2004-05-14 Recovery in a distributed stateful publish-subscribe system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/846,196 US7886180B2 (en) 2004-05-14 2004-05-14 Recovery in a distributed stateful publish-subscribe system

Publications (2)

Publication Number Publication Date
US20050268146A1 2005-12-01
US7886180B2 2011-02-08

Family

ID=35426804

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/846,196 Expired - Fee Related US7886180B2 (en) 2004-05-14 2004-05-14 Recovery in a distributed stateful publish-subscribe system

Country Status (1)

Country Link
US (1) US7886180B2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061366A1 (en) * 2005-09-09 2007-03-15 Oden Insurance Services, Inc. Subscription apparatus and method
US20080209440A1 (en) * 2004-05-07 2008-08-28 Roman Ginis Distributed messaging system supporting stateful subscriptions
US20100049821A1 (en) * 2008-08-21 2010-02-25 Tzah Oved Device, system, and method of distributing messages
US9325650B2 (en) 2014-04-02 2016-04-26 Ford Global Technologies, Llc Vehicle telematics data exchange
US9323546B2 (en) 2014-03-31 2016-04-26 Ford Global Technologies, Llc Targeted vehicle remote feature updates
US9524156B2 (en) 2014-01-09 2016-12-20 Ford Global Technologies, Llc Flexible feature deployment strategy
US9654433B2 (en) 2014-09-23 2017-05-16 International Business Machines Corporation Selective message republishing to subscriber subsets in a publish-subscribe model
US9716762B2 (en) 2014-03-31 2017-07-25 Ford Global Technologies Llc Remote vehicle connection status
US9766874B2 (en) 2014-01-09 2017-09-19 Ford Global Technologies, Llc Autonomous global software update
US10140110B2 (en) 2014-04-02 2018-11-27 Ford Global Technologies, Llc Multiple chunk software updates
US10303558B2 (en) * 2015-04-17 2019-05-28 Microsoft Technology Licensing, Llc Checkpointing higher order query operators
US10748158B2 (en) 2004-10-08 2020-08-18 Refinitiv Us Organization Llc Method and system for monitoring an issue

Families Citing this family (19)

Publication number Priority date Publication date Assignee Title
US20050251556A1 (en) * 2004-05-07 2005-11-10 International Business Machines Corporation Continuous feedback-controlled deployment of message transforms in a distributed messaging system
US7404108B2 (en) * 2004-08-06 2008-07-22 International Business Machines Corporation Notification method and apparatus in a data processing system
US7818417B2 (en) * 2006-01-10 2010-10-19 International Business Machines Corporation Method for predicting performance of distributed stream processing systems
US20070299979A1 (en) * 2006-06-27 2007-12-27 Avshalom Houri Stateless publish/subscribe messaging using sip
US20070297327A1 (en) * 2006-06-27 2007-12-27 International Business Machines Corporation Method for applying stochastic control optimization for messaging systems
US7937618B2 (en) * 2007-04-26 2011-05-03 International Business Machines Corporation Distributed, fault-tolerant and highly available computing system
US20080307436A1 (en) * 2007-06-06 2008-12-11 Microsoft Corporation Distributed publish-subscribe event system with routing of published events according to routing tables updated during a subscription process
US7761401B2 (en) * 2007-06-07 2010-07-20 International Business Machines Corporation Stochastic control optimization for sender-based flow control in a distributed stateful messaging system
US7793140B2 (en) * 2007-10-15 2010-09-07 International Business Machines Corporation Method and system for handling failover in a distributed environment that uses session affinity
US9576268B2 (en) * 2009-08-26 2017-02-21 Hewlett Packard Enterprise Development Lp Distributed data analysis
US9537747B2 (en) * 2010-06-11 2017-01-03 International Business Machines Corporation Publish/subscribe overlay network control system
US8589732B2 (en) * 2010-10-25 2013-11-19 Microsoft Corporation Consistent messaging with replication
US20150301875A1 (en) * 2014-04-22 2015-10-22 Andreas Harnesk Persisting and managing application messages
CN106528328A (en) * 2016-10-10 2017-03-22 乐视控股(北京)有限公司 Data recovery method, device and system based on distributed storage system
US11750441B1 (en) * 2018-09-07 2023-09-05 Juniper Networks, Inc. Propagating node failure errors to TCP sockets
CN111541608B (en) * 2020-04-16 2022-07-19 腾讯科技(成都)有限公司 Network communication method, system and related device
US11340828B2 (en) 2020-08-10 2022-05-24 Bank Of America Corporation Restoring messages to a memory utilized by a topic in a publish-subscribe environment
US11354161B2 (en) 2020-08-10 2022-06-07 Bank Of America Corporation Controlling memory utilization by a topic in a publish-subscribe environment
CN113098978B (en) * 2021-04-21 2023-04-07 上海微盟企业发展有限公司 Data transmission method, device and medium

Citations (38)

Publication number Priority date Publication date Assignee Title
US5091918A (en) 1988-03-05 1992-02-25 Plessey Overseas Limited Equalizers
US5870605A (en) 1996-01-18 1999-02-09 Sun Microsystems, Inc. Middleware for enterprise information distribution
US5940372A (en) 1995-07-13 1999-08-17 International Business Machines Corporation Method and system for selecting path according to reserved and not reserved connections in a high speed packet switching network
US5974417A (en) 1996-01-18 1999-10-26 Sun Microsystems, Inc. Database network connectivity product
US5987455A (en) 1997-06-30 1999-11-16 International Business Machines Corporation Intelligent compilation of procedural functions for query processing systems
US6118786A (en) 1996-10-08 2000-09-12 Tiernan Communications, Inc. Apparatus and method for multiplexing with small buffer depth
US20010049743A1 (en) 2000-05-31 2001-12-06 International Business Machines Corporation Message transformation selection tool and method
US20020069244A1 (en) 1999-11-24 2002-06-06 John Blair Message delivery system billing method and apparatus
US6502213B1 (en) * 1999-08-31 2002-12-31 Accenture Llp System, method, and article of manufacture for a polymorphic exception handler in environment services patterns
US6510429B1 (en) 1998-04-29 2003-01-21 International Business Machines Corporation Message broker apparatus, method and computer program product
US20030067874A1 (en) 2001-10-10 2003-04-10 See Michael B. Central policy based traffic management
US6643682B1 (en) 1999-09-22 2003-11-04 International Business Machines Corporation Publish/subscribe data processing with subscription points for customized message processing
US6681220B1 (en) 1999-05-28 2004-01-20 International Business Machines Corporation Reduction and optimization of information processing systems
US20040039786A1 (en) 2000-03-16 2004-02-26 Horvitz Eric J. Use of a bulk-email filter within a system for classifying messages for urgency or importance
US6748583B2 (en) * 2000-12-27 2004-06-08 International Business Machines Corporation Monitoring execution of an hierarchical visual program such as for debugging a message flow
US6748555B1 (en) * 1999-09-09 2004-06-08 Microsoft Corporation Object-based software management
US20040196837A1 (en) 1996-08-26 2004-10-07 Tibor Cinkler Method for optimising a mostly optical network
US20050010765A1 (en) 2003-06-06 2005-01-13 Microsoft Corporation Method and framework for integrating a plurality of network policies
US6859438B2 (en) 1998-02-03 2005-02-22 Extreme Networks, Inc. Policy based quality of service
US6983463B1 (en) 1998-10-02 2006-01-03 Microsoft Corporation Network independent profiling of applications for automatic partitioning and distribution in a distributed computing environment
US7010538B1 (en) 2003-03-15 2006-03-07 Damian Black Method for distributed RDSMS
US20060067231A1 (en) 2004-09-27 2006-03-30 Matsushita Electric Industrial Co., Ltd. Packet reception control device and method
US20060195896A1 (en) 2004-12-22 2006-08-31 Wake Forest University Method, systems, and computer program products for implementing function-parallel network firewall
US20060200333A1 (en) 2003-04-10 2006-09-07 Mukesh Dalal Optimizing active decision making using simulated decision making
US20060294219A1 (en) 2003-10-03 2006-12-28 Kazuki Ogawa Network system based on policy rule
US20070002750A1 (en) 2005-07-01 2007-01-04 Nec Laboratories America, Inc. Generic Real Time Scheduler for Wireless Packet Data Systems
US7162524B2 (en) * 2002-06-21 2007-01-09 International Business Machines Corporation Gapless delivery and durable subscriptions in a content-based publish/subscribe system
US7177859B2 (en) 2002-06-26 2007-02-13 Microsoft Corporation Programming model for subscription services
US20070116822A1 (en) 2005-11-23 2007-05-24 The Coca-Cola Company High-potency sweetener composition with saponin and compositions sweetened therewith
US7349980B1 (en) 2003-01-24 2008-03-25 Blue Titan Software, Inc. Network publish/subscribe system incorporating Web services network routing architecture
US7360202B1 (en) 2002-06-26 2008-04-15 Microsoft Corporation User interface system and methods for providing notification(s)
US7392279B1 (en) 1999-03-26 2008-06-24 Cisco Technology, Inc. Network traffic shaping using time-based queues
US7406537B2 (en) 2002-11-26 2008-07-29 Progress Software Corporation Dynamic subscription and message routing on a topic between publishing nodes and subscribing nodes
US20080209440A1 (en) 2004-05-07 2008-08-28 Roman Ginis Distributed messaging system supporting stateful subscriptions
US20080239951A1 (en) 2006-06-27 2008-10-02 Robert Evan Strom Method for applying stochastic control optimization for messaging systems
US20080244025A1 (en) 2004-05-07 2008-10-02 Roman Ginis Continuous feedback-controlled deployment of message transforms in a distributed messaging system
US20080301053A1 (en) 2007-05-29 2008-12-04 Verizon Services Organization Inc. Service broker
US20090187641A1 (en) 2006-03-29 2009-07-23 Cong Li Optimization of network protocol options by reinforcement learning and propagation

Patent Citations (42)

Publication number Priority date Publication date Assignee Title
US5091918A (en) 1988-03-05 1992-02-25 Plessey Overseas Limited Equalizers
US5940372A (en) 1995-07-13 1999-08-17 International Business Machines Corporation Method and system for selecting path according to reserved and not reserved connections in a high speed packet switching network
US5870605A (en) 1996-01-18 1999-02-09 Sun Microsystems, Inc. Middleware for enterprise information distribution
US5974417A (en) 1996-01-18 1999-10-26 Sun Microsystems, Inc. Database network connectivity product
US6021443A (en) 1996-01-18 2000-02-01 Sun Microsystems, Inc. Systems, software, and methods for routing events among publishers and subscribers on a computer network
US20040196837A1 (en) 1996-08-26 2004-10-07 Tibor Cinkler Method for optimising a mostly optical network
US6421359B1 (en) 1996-10-08 2002-07-16 Tiernan Communications, Inc. Apparatus and method for multi-service transport multiplexing
US6118786A (en) 1996-10-08 2000-09-12 Tiernan Communications, Inc. Apparatus and method for multiplexing with small buffer depth
US5987455A (en) 1997-06-30 1999-11-16 International Business Machines Corporation Intelligent compilation of procedural functions for query processing systems
US6859438B2 (en) 1998-02-03 2005-02-22 Extreme Networks, Inc. Policy based quality of service
US6510429B1 (en) 1998-04-29 2003-01-21 International Business Machines Corporation Message broker apparatus, method and computer program product
US6983463B1 (en) 1998-10-02 2006-01-03 Microsoft Corporation Network independent profiling of applications for automatic partitioning and distribution in a distributed computing environment
US7392279B1 (en) 1999-03-26 2008-06-24 Cisco Technology, Inc. Network traffic shaping using time-based queues
US6681220B1 (en) 1999-05-28 2004-01-20 International Business Machines Corporation Reduction and optimization of information processing systems
US6996625B2 (en) 1999-05-28 2006-02-07 International Business Machines Corporation Reduction and optiminization of operational query expressions applied to information spaces between nodes in a publish/subscribe system
US20040107290A1 (en) 1999-05-28 2004-06-03 International Business Machines Corporation Reduction and optimization of information processing systems
US6502213B1 (en) * 1999-08-31 2002-12-31 Accenture Llp System, method, and article of manufacture for a polymorphic exception handler in environment services patterns
US6748555B1 (en) * 1999-09-09 2004-06-08 Microsoft Corporation Object-based software management
US6643682B1 (en) 1999-09-22 2003-11-04 International Business Machines Corporation Publish/subscribe data processing with subscription points for customized message processing
US20020069244A1 (en) 1999-11-24 2002-06-06 John Blair Message delivery system billing method and apparatus
US20040039786A1 (en) 2000-03-16 2004-02-26 Horvitz Eric J. Use of a bulk-email filter within a system for classifying messages for urgency or importance
US20010049743A1 (en) 2000-05-31 2001-12-06 International Business Machines Corporation Message transformation selection tool and method
US6748583B2 (en) * 2000-12-27 2004-06-08 International Business Machines Corporation Monitoring execution of an hierarchical visual program such as for debugging a message flow
US20030067874A1 (en) 2001-10-10 2003-04-10 See Michael B. Central policy based traffic management
US7162524B2 (en) * 2002-06-21 2007-01-09 International Business Machines Corporation Gapless delivery and durable subscriptions in a content-based publish/subscribe system
US7177859B2 (en) 2002-06-26 2007-02-13 Microsoft Corporation Programming model for subscription services
US7360202B1 (en) 2002-06-26 2008-04-15 Microsoft Corporation User interface system and methods for providing notification(s)
US7406537B2 (en) 2002-11-26 2008-07-29 Progress Software Corporation Dynamic subscription and message routing on a topic between publishing nodes and subscribing nodes
US7349980B1 (en) 2003-01-24 2008-03-25 Blue Titan Software, Inc. Network publish/subscribe system incorporating Web services network routing architecture
US7010538B1 (en) 2003-03-15 2006-03-07 Damian Black Method for distributed RDSMS
US20060200333A1 (en) 2003-04-10 2006-09-07 Mukesh Dalal Optimizing active decision making using simulated decision making
US20050010765A1 (en) 2003-06-06 2005-01-13 Microsoft Corporation Method and framework for integrating a plurality of network policies
US20060294219A1 (en) 2003-10-03 2006-12-28 Kazuki Ogawa Network system based on policy rule
US20080209440A1 (en) 2004-05-07 2008-08-28 Roman Ginis Distributed messaging system supporting stateful subscriptions
US20080244025A1 (en) 2004-05-07 2008-10-02 Roman Ginis Continuous feedback-controlled deployment of message transforms in a distributed messaging system
US20060067231A1 (en) 2004-09-27 2006-03-30 Matsushita Electric Industrial Co., Ltd. Packet reception control device and method
US20060195896A1 (en) 2004-12-22 2006-08-31 Wake Forest University Method, systems, and computer program products for implementing function-parallel network firewall
US20070002750A1 (en) 2005-07-01 2007-01-04 Nec Laboratories America, Inc. Generic Real Time Scheduler for Wireless Packet Data Systems
US20070116822A1 (en) 2005-11-23 2007-05-24 The Coca-Cola Company High-potency sweetener composition with saponin and compositions sweetened therewith
US20090187641A1 (en) 2006-03-29 2009-07-23 Cong Li Optimization of network protocol options by reinforcement learning and propagation
US20080239951A1 (en) 2006-06-27 2008-10-02 Robert Evan Strom Method for applying stochastic control optimization for messaging systems
US20080301053A1 (en) 2007-05-29 2008-12-04 Verizon Services Organization Inc. Service broker

Non-Patent Citations (25)

Title
Abadi et al., "Aurora: a new model and architecture for data stream management", The VLDB Journal-The International Journal on Very Large Data Bases, vol. 12, Issue 2, Aug. 2003, pp. 1-20.
Babcock et al., "Distributed Top-K Monitoring", SIGMOD 2003, Jun. 2003, San Diego, CA, ACM 1-58113-634-X/03/06, pp. 1-12.
Babcock et al., "Models and Issues in Data Stream Systems", Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of database systems, Mar. 9, 2002, pp. 1-30.
Babu et al., "Continuous Queries over Data Streams", Stanford University, vol. 30, Issue 3, Sep. 2001, 12 pages.
Bertsekas, "Dynamic Programming and Optimal Control", vol. II, Athena Scientific, 1995, pp. 184, 186, 203, 204 and 207.
Bhola et al., "Exactly-Once Delivery in a Content-based Publish Subscribe System", 2002, Proceedings of the International Conference on Dependable Systems and Networks (DSN'02). *
Carney et al., "Monitoring Streams-A New Class of Data Management Applications", Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002, 12 pages.
Carney et al., "Monitoring Streams-A New Class of DBMS Applications", Technical Report CS-02-1, Department of Computer Science, Brown University, Feb. 2002, pp. 1-25.
Chandrasekaran et al., "Streaming Queries over Streaming Data", Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002, 12 pages.
Cherniack et al., "Scalable Distributed Stream Processing", Proceedings of the 1st Biennial Conference on Innovative Data Systems Research (CIDR), 2003, 12 pages.
Datar et al., "Maintaining Stream Statistics over Sliding Windows", Stanford University, Proceedings of the thirteenth annual ACM-SIAM Symposium on Discrete algorithms, ISBN: 0-89871-513-X, Jul. 30, 2001, pp. 1-10 and Appendix i-iii.
Gehrke et al., "On Computing Correlated Aggregates Over Continual Data Streams", Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, May 21-24, 2001, 12 pages.
Ginis et al., Continuous Feedback-Controlled Deployment of Message Transforms in a Distributed Messaging System, May 7, 2004.
Ginis et al., Distributed Messaging System Supporting Stateful Subscriptions, May 7, 2004.
Hwang et al., "A Comparison of Stream-Oriented High-Availability Algorithms", Brown University, http://www.cs.brown.edu/publications/techreports/reports/CS-03-17.html, Sep. 2003, pp. 1-13.
Jin et al., "Relational Subscription Middleware for Internet-Scale Publish-Subscribe", DEBS 2003, ACM: 1-58113-843-1, Jun. 8, 2003, pp. 1-8.
Kang et al., "Evaluating Window Joins over Unbounded Streams", Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002, 12 pages.
Madden et al., "Continuously Adaptive Continuous Queries over Streams", Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, ISBN: 1-58113-497-5, Jun. 4-6, 2002, 12 pages.
Madden et al., "Fjording the Stream: An Architecture for Queries over Streaming Sensor Data", Proceedings of the 18th International Conference on Data Engineering, Jun. 26, 2001, pp. 1-25.
Motwani et al., "Query Processing, Approximation, and Resource Management in a Data Stream Management System", Stanford University, Proceedings of the 1st Biennial Conference on Innovative Data Systems Research (CIDR), 2003, pp. 1-12.
Shah et al., "Flux: An Adaptive Partitioning Operator for Continuous Query Systems", Technical Report CSD-02-1205, http://techreports.lib.berkeley.edu/accessPages/CSD-02-1205.html, University of California, Berkeley, California, Nov. 15, 2002, pp. 1-15.
U.S. Appl. No. 11/475,708, filed Jun. 27, 2006, Strom.
USPTO Notice of allowance for U.S. Appl. No. 12/115,742 dated May 17, 2010.
USPTO office action for U.S. Appl. No. 10/841,297 dated Aug. 20, 2008.
USPTO Office action for U.S. Appl. No. 11/475,708 dated Apr. 17, 2009.

Cited By (19)

Publication number Priority date Publication date Assignee Title
US20080209440A1 (en) * 2004-05-07 2008-08-28 Roman Ginis Distributed messaging system supporting stateful subscriptions
US8533742B2 (en) 2004-05-07 2013-09-10 International Business Machines Corporation Distributed messaging system supporting stateful subscriptions
US11037175B2 (en) 2004-10-08 2021-06-15 Refinitiv Us Organization Llc Method and system for monitoring an issue
US10748158B2 (en) 2004-10-08 2020-08-18 Refinitiv Us Organization Llc Method and system for monitoring an issue
US20070061366A1 (en) * 2005-09-09 2007-03-15 Oden Insurance Services, Inc. Subscription apparatus and method
US10825029B2 (en) * 2005-09-09 2020-11-03 Refinitiv Us Organization Llc Subscription apparatus and method
US20100049821A1 (en) * 2008-08-21 2010-02-25 Tzah Oved Device, system, and method of distributing messages
US8108538B2 (en) * 2008-08-21 2012-01-31 Voltaire Ltd. Device, system, and method of distributing messages
US20120096105A1 (en) * 2008-08-21 2012-04-19 Voltaire Ltd. Device, system, and method of distributing messages
US8244902B2 (en) * 2008-08-21 2012-08-14 Voltaire Ltd. Device, system, and method of distributing messages
US9766874B2 (en) 2014-01-09 2017-09-19 Ford Global Technologies, Llc Autonomous global software update
US9524156B2 (en) 2014-01-09 2016-12-20 Ford Global Technologies, Llc Flexible feature deployment strategy
US9323546B2 (en) 2014-03-31 2016-04-26 Ford Global Technologies, Llc Targeted vehicle remote feature updates
US9716762B2 (en) 2014-03-31 2017-07-25 Ford Global Technologies Llc Remote vehicle connection status
US10140110B2 (en) 2014-04-02 2018-11-27 Ford Global Technologies, Llc Multiple chunk software updates
US9325650B2 (en) 2014-04-02 2016-04-26 Ford Global Technologies, Llc Vehicle telematics data exchange
US9674127B2 (en) 2014-09-23 2017-06-06 International Business Machines Corporation Selective message republishing to subscriber subsets in a publish-subscribe model
US9654433B2 (en) 2014-09-23 2017-05-16 International Business Machines Corporation Selective message republishing to subscriber subsets in a publish-subscribe model
US10303558B2 (en) * 2015-04-17 2019-05-28 Microsoft Technology Licensing, Llc Checkpointing higher order query operators

Also Published As

Publication number Publication date
US20050268146A1 (en) 2005-12-01

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JIN, YUHUI;STROM, ROBERT EVAR;REEL/FRAME:014672/0271;SIGNING DATES FROM 20040511 TO 20040513

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20150208