US20130318084A1 - Processing structured and unstructured data using offload processors - Google Patents
- Publication number
- US20130318084A1 (application number US13/900,303)
- Authority
- US
- United States
- Prior art keywords
- memory
- modules
- processing
- data
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06F17/30312—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
- G06F13/1652—Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4282—Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
-
- G06F17/3061—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/70—Virtual switches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0272—Virtual private networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0281—Proxies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/16—Implementing security features at a particular protocol layer
- H04L63/168—Implementing security features at a particular protocol layer above the transport layer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1009—Address translation using page tables, e.g. page table structures
- G06F12/1018—Address translation using page tables, e.g. page table structures involving hashing techniques, e.g. inverted page tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1081—Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/34—Signalling channels for network management communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present disclosure relates generally to servers capable of efficiently processing structured and unstructured data. More particularly, systems supporting offload or auxiliary processing modules that can be physically connected to a system memory bus to process data independent of a host processor of the server are described.
- Enterprises store and process their large amounts of data in a variety of ways.
- One manner in which enterprises store data is by using relational databases and corresponding relational database management systems (RDBMS).
- structured data may be collected, normalized, formatted and stored in an RDBMS.
- Tools based on standardized data languages such as the Structured Query Language (SQL) may be used for accessing and processing structured data.
- it is estimated that such formatted structured data represents only a tiny fraction of an enterprise's stored data.
- Organizations are becoming increasingly aware that substantial information and knowledge reside in unstructured data (i.e., “Big Data”) repositories. Accordingly, simple and effective access to both structured and unstructured data is seen as necessary for maximizing the value of enterprise informational resources.
- In-memory processing and Storage Area Network (SAN)-like architectures are used for traditional SQL queries, while commodity or shared-nothing architectures (each computing node, consisting of a processor, local memory, and disk resources, shares nothing with other nodes in the computing cluster) are usually used for processing unstructured data.
- An architecture that supports both structured and unstructured queries can better handle current and emerging Big Data applications.
- Methods can include processing structured and/or unstructured data.
- a method of processing structured data can include providing an in-memory database with at least one of a plurality of modules connected to a memory bus in a first server; executing database functions with at least one processor on the module; and connecting a central processing unit (CPU) in the first server to the modules by the memory bus, and directing database queries to the at least one module.
- a method of processing unstructured data can include providing a plurality of modules connected to a memory bus that each include at least one processor; connecting a central processing unit (CPU) to the modules by the memory bus; executing data processing tasks with the CPU; and directing parallel computation tasks to a plurality of the modules.
- FIG. 1 illustrates an embodiment suitable to process structured queries.
- FIGS. 2-1 and 2-2 are diagrams showing the workflow and distributed architecture used to implement data processing software.
- FIG. 3-1 is a flow diagram showing a data processing method according to an embodiment.
- FIG. 3-2 shows a data processing architecture according to an embodiment.
- FIG. 4-1 shows a cartoon schematically illustrating a data processing system according to an embodiment, including a removable computation module for offload of data processing.
- FIG. 4-2 shows an example layout of an in-line module (referred to as a “XIMM”) according to an embodiment.
- FIG. 4-3 shows two possible architectures for a XIMM in a simulation (Xockets MAX and MIN).
- FIG. 4-4 shows a representative power budget for a XIMM according to various embodiments.
- FIG. 4-5 illustrates data flow operation of one embodiment using an ARM A9 architecture.
- Data processing and analytics for enterprise server or cloud based data systems can be efficiently implemented on offload processing modules connected to a memory bus, for example, by insertion into a socket for a Dual In-line Memory Module (DIMM).
- Such modules can be referred to as Xocket™ In-line Memory Modules (XIMMs), and can have multiple “wimpy” cores associated with a memory channel.
- Such systems as a whole are able to handle large database searches at a very low power when compared to traditional high power “brawny” server cores.
- a XIMM based architecture capable of partitioning tasks is able to greatly improve data analytic performance.
- FIG. 1 illustrates a system for processing structured queries according to an embodiment.
- FIG. 1 depicts two commodity computers which can be rack servers ( 140 , 140 a ), each of which includes at least one central processing unit ( 108 ) and a set of offload processors ( 112 a to 112 c ).
- the servers ( 140 , 140 a ) are preferably present in the same rack and connected by a top of rack (TOR) switch ( 120 ).
- the servers ( 140 , 140 a ) are suited for hosting a distributed and shared in-memory assembly ( 110 a , 110 b ) that can handle and respond to structured queries as one large in-memory system.
- an in-memory database that handles structured queries looks for the query in its physical memory, and in the absence of the query therein will perform a disk read to a database.
- the disk read might result in the page cache of the kernel being populated with the disk file.
- the process might perform a complete copy of the page(s) (in a paged memory system) into its process buffer or it might perform a mmap operation, wherein a pointer to the entry in the page cache storing the pages is created and stored in the heap of the process. The latter is less time consuming and more efficient than the former.
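- The difference between the two access paths described above can be sketched in user space with Python's standard `mmap` module. This is only an illustration of copy versus map on an ordinary file; the temporary file, the `payload` contents, and the variable names are assumptions made for the example and are not part of the disclosure.

```python
import mmap
import os
import tempfile

# Hypothetical data file standing in for one database page on disk.
PAGE_SIZE = mmap.PAGESIZE
payload = b"row-0001" + b"\x00" * (PAGE_SIZE - 8)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# Option 1: complete copy. read() copies the whole page from the kernel
# page cache into a new bytes object owned by the process.
with open(path, "rb") as f:
    copied = f.read()

# Option 2: mmap. The file's pages are mapped into the process address
# space and read in place, with no up-front copy of the whole file.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        first_key = bytes(mapped[:8])  # only these 8 bytes are copied out

os.remove(path)
```

- In the mapped case only the bytes actually touched are copied into process-owned objects, which is the reason the text treats the mmap path as the cheaper of the two.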
- A XIMM-supported data retrieval framework can effectively extend the overall size of the in-memory space available to the assembly.
- When a structured query is being handled by one of the CPUs ( 108 ) and the CPU is unable to retrieve the requested data from its main memory, it will immediately trap into a memory map (mmap) routine that will handle disk reads and populate the page cache.
- The embodiment described herein can modify the mmap routine of standard operating systems to execute code corresponding to a driver for a removable computation module, in this case a XIMM driver.
- the XIMM driver in turn identifies the query and transfers the query to one or more XIMMs in the form of memory reads/writes.
- a XIMM can house a plurality of offload processors ( 112 a to 112 c ) that can receive the memory read/write commands containing the structured query.
- One or more of the offload processors ( 112 a to 112 c ) can perform a search for the query in its available cache and local memory and return the result.
- an mmap query can further be modified to allow the transference of the query to XIMMs that are not in the same server but in the same rack.
- the mmap abstraction in such a case can perform a remote direct memory access (RDMA) or a similar network memory read for accessing data present in a XIMM that is in the same rack.
- the latency of a system can be the combined latency of the network interface cards (NICs) of the two servers (i.e., 100 in 140 and 100 in 140 a ), the latency of the TOR switch 120 , and a response time of the second XIMM.
- embodiments can provide a system that can have a much lower latency than a read from a hard disk drive.
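- The latency composition described above is a simple sum of four terms. The following sketch writes that sum out; the class name, field names, and every numeric value are placeholders chosen for illustration, not measurements from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteReadPath:
    """Latency terms for reading from a XIMM in another server of the rack."""
    nic_local_us: float      # NIC of the requesting server
    nic_remote_us: float     # NIC of the server hosting the second XIMM
    tor_switch_us: float     # top of rack switch
    ximm_response_us: float  # response time of the second XIMM

    def total_us(self) -> float:
        return (self.nic_local_us + self.nic_remote_us
                + self.tor_switch_us + self.ximm_response_us)


# Placeholder values only (microseconds); real figures depend on hardware.
path = RemoteReadPath(nic_local_us=2.0, nic_remote_us=2.0,
                      tor_switch_us=1.0, ximm_response_us=5.0)
total = path.total_us()

# Placeholder for a hard disk read, which is typically on the order of
# milliseconds because of seek and rotational delay.
HYPOTHETICAL_DISK_READ_US = 5000.0
faster_than_disk = total < HYPOTHETICAL_DISK_READ_US
```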
- the described architecture can provide orders of magnitude improvement over a single structured query in-memory database. By allowing for non-frequently used data of one of the servers to be pushed to another XIMM located in the same rack, the effective in-memory space can be increased beyond conventional limits, and a large assembly having a large physical, low latency memory space can be created.
- This embodiment may further be improved by allowing for transparent sharing of pages across multiple XIMMs.
- Embodiments can further improve the latency between server-to-server connections by mediating them by XIMMs.
- the XIMMs can act as intelligent switches to offload and bypass the TOR switch.
- FIGS. 2-1 and 2-2 show the workflow and distributed architecture used to implement data processing software, such as Hadoop.
- Hadoop MapReduce workloads may be broken into Split, Map, Shuffle, Reduce and Merge phases.
- the input file ( 204 ) is fetched from a file system such as Hadoop Distributed File System (HDFS) ( 202 ) and divided into smaller pieces referred to as Splits.
- Each Split ( 206 ) is a contiguous range of the input.
- Each record in each input split is independently passed to a Map function run by a Mapper ( 208 ) hosted on a processor.
- the Map function accepts a single record as an input and produces zero or more output records, each of which contains a key and a value.
- the results from the Map functions are rearranged such that all values with the same key are grouped together.
- The Reduce function run by a Reducer ( 212 ) takes a key and a list of values as the input and produces another list as the output.
- the output records from Reduce functions are merged to form an Output file ( 216 ) which is then stored in the file system ( 202 ).
- Map and Reduce tasks are computationally intensive and have very tight dependency on each other. While Map tasks are small and independent tasks that can run in parallel, the Reduce tasks include fetching intermediate (key, value) pairs that result from each Map function, sorting and merging intermediate results according to keys and applying Reduce functions to the sorted intermediate results. Reducer ( 212 ) can perform the Reduce function only after it receives the intermediate results from all the Mappers ( 208 ). Thus, the Shuffle step ( 210 ) (communicating the Map results to Reducers) often becomes the bottleneck in Hadoop workloads and introduces latency.
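- The Split, Map, Shuffle, Reduce and Merge phases described above can be sketched as a single-process word count. This is not Hadoop code; the function names and the word-count workload are assumptions made for illustration, and everything runs sequentially in one process.

```python
from collections import defaultdict


def split_input(text, n_splits):
    """Split: divide the input lines into contiguous ranges."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // n_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_record(record):
    """Map: one record in, zero or more (key, value) records out."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(mapped):
    """Shuffle: group together all values that share a key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: a key and its list of values in, a list of records out."""
    return [(key, sum(values))]


def run_job(text, n_splits=2):
    splits = split_input(text, n_splits)
    mapped = [kv for split in splits for rec in split for kv in map_record(rec)]
    groups = shuffle(mapped)  # needs the output of every Mapper first
    merged = []               # Merge: combine the Reducer outputs
    for key in sorted(groups):
        merged.extend(reduce_group(key, groups[key]))
    return merged


result = run_job("big data\nbig memory\ndata bus")
```

- The call to `shuffle` cannot start grouping until all Map output exists, which mirrors the dependency the text identifies as the usual bottleneck.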
- FIG. 2-2 shows a typical computing cluster used to implement Hadoop.
- a Master Node ( 222 ) runs a JobTracker ( 224 ) which organizes the cluster's activities.
- Each of the Worker Nodes ( 326 ) runs a TaskTracker ( 228 ) which organizes the worker node's activities. All input jobs are organized into sequential tiers of tasks ( 230 ). The tasks could be map tasks or reduce tasks.
- the TaskTracker ( 228 ) runs a number of map and reduce tasks concurrently and pulls new tasks from the JobTracker ( 224 ) as soon as the old tasks are completed.
- the Hadoop Map-Reduce layer stores intermediate data ( 232 ) produced by the map and reduce tasks in the Hadoop Distributed File System ( 202 ) discussed in FIG. 2-1 .
- HDFS ( 202 ) is designed to provide high streaming throughput to large, write-once-read-many-times files.
- a XIMM based architecture can improve Hadoop (or similar data processing) performance in two ways. Firstly, intrinsically parallel computational tasks can be allocated to the XIMMs (e.g., modules with offload processors), leaving the number crunching tasks to “brawny” (e.g., x86) cores. This is illustrated in more detail in FIG. 3-1 .
- FIG. 3-1 is a flow diagram showing a data processing method according to an embodiment.
- a method can start ( 302 ) and input data can be fetched from a file system ( 304 ). In particular embodiments this can include a Direct Memory Access (DMA) operation from an HDFS. In some embodiments, all DMA operations are engineered such that all the parallel Map steps are equitably served with data.
- the input data can be partitioned into splits in ( 306 ) and parsed into records that contain initial (key, value) pairs in ( 308 ). Since the tasks involved in steps ( 304 , 306 and 308 ) are computationally light, they can be performed all, or in part, by the offload processors hosted on the XIMMs (e.g., by wimpy cores).
- Map operations can then be performed on the initial (key, value) pairs ( 310 ).
- the intermediate (key, value) pairs that result from the Map operations can then be communicated to the Reducers ( 312 ).
- Reduce operations can be performed ( 314 ).
- The results from the Reducers are merged into a single output file and written back to the file system (e.g., 302 ) by offload processors ( 316 ). Since Map and Reduce functions are computationally intensive, the steps ( 310 and 314 ) can be handled by a CPU (e.g., by brawny cores). Such distribution of workloads to processor cores that are favorably disposed to perform them can reduce latency and/or increase efficiency.
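- The division of labor in FIG. 3-1 can be summarized as a lookup from step number to the class of core that handles it. The table below is a reading of the text, not part of the disclosure: the `Core` enum, the step labels, and `steps_for` are hypothetical names, and placing step 312 on the offload processors follows the later description of FIG. 3-2.

```python
from enum import Enum


class Core(Enum):
    OFFLOAD = "wimpy offload processor (XIMM)"
    HOST = "brawny host CPU"


# Step numbers follow FIG. 3-1; labels are paraphrases for illustration.
STEP_ASSIGNMENT = {
    304: ("fetch input data", Core.OFFLOAD),
    306: ("partition into splits", Core.OFFLOAD),
    308: ("parse into (key, value) records", Core.OFFLOAD),
    310: ("map", Core.HOST),
    312: ("communicate intermediate pairs", Core.OFFLOAD),
    314: ("reduce", Core.HOST),
    316: ("merge and write back", Core.OFFLOAD),
}


def steps_for(core):
    """Return the step numbers assigned to the given class of core."""
    return [step for step, (_, owner) in STEP_ASSIGNMENT.items() if owner is core]


host_steps = steps_for(Core.HOST)
offload_steps = steps_for(Core.OFFLOAD)
```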
- a XIMM based architecture can reduce the intrinsic bottleneck of most Hadoop (or similar data processing) workloads (the Shuffle phase) by driving the I/O backplane to its full capacity.
- the TaskTracker ( 228 ) described in FIG. 2-2 serves hyper-text transport protocol (HTTP) GET requests to communicate Map results to Reduce inputs.
- the Mappers and Reducers hosted by the individual server CPUs have to communicate using their corresponding top of rack (TOR) switches of the corresponding racks. This slows down the process. Further, the Reducers hosted by the CPU stay idle while the results from Map operations are collected and sorted, introducing further computational inefficiency.
- FIG. 3-2 shows a data processing architecture according to an embodiment.
- operational modules are labeled to reflect their primary operations as previously noted with respect to FIG. 3-1 .
- a publish-subscribe model is used to perform the Shuffling phase as shown in FIG. 3-2 .
- Since the results from the Map step ( 310 ) are already stored and available in main memory, they can be collected through DMA operations ( 312 a ) and parsed ( 312 b ) by XIMMs. Since Map results are not required to be written to disk, latency can be reduced. Further, the XIMMs incorporated into the individual servers can define a switch fabric that can be used for Shuffling.
- the mid-plane defined by the XIMM based switch fabric is capable of driving and receiving the full 240 Gbps capacity of the PCI-3.0 bus and thus offers better speed and bandwidth compared to HTTP.
- the key is published through the massively parallel I/O mid-plane defined by the XIMMs ( 312 a ). Subscriptions are identified based on the keys and are directed to the Reducers hosted on the CPU through virtual interrupts ( 312 b ). Thus, rack-level locality and aggregation that are typical of conventional Hadoop systems are no longer required in the XIMM based architecture. Instead, intermediate (key, value) pairs are exchanged by all the computing nodes across several different servers through intelligent virtual switching of the XIMMs resulting in efficient processing of Hadoop workloads.
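- The publish-subscribe exchange described above can be sketched with an in-process broker. This is a toy model only: `ShuffleBroker` stands in for the XIMM switch fabric, a plain callback stands in for the virtual interrupt, and the key-to-Reducer assignment is a hypothetical choice made for the example.

```python
from collections import defaultdict


class ShuffleBroker:
    """Toy in-process stand-in for the XIMM-defined I/O mid-plane."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # key -> list of callbacks

    def subscribe(self, key, callback):
        self._subscribers[key].append(callback)

    def publish(self, key, value):
        # The callback plays the role of the virtual interrupt that
        # directs a published pair to the subscribed Reducer.
        for callback in self._subscribers[key]:
            callback(key, value)


class Reducer:
    def __init__(self):
        self.totals = defaultdict(int)

    def on_pair(self, key, value):
        self.totals[key] += value


broker = ShuffleBroker()
reducer_0, reducer_1 = Reducer(), Reducer()

# Hypothetical subscription: each Reducer registers for the keys it owns.
broker.subscribe("big", reducer_0.on_pair)
broker.subscribe("data", reducer_1.on_pair)

# Mappers publish intermediate (key, value) pairs as they are produced.
for key, value in [("big", 1), ("data", 1), ("big", 1)]:
    broker.publish(key, value)
```

- Because pairs are delivered as they are published, a Reducer in this model accumulates input incrementally instead of waiting for a bulk transfer.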
- FIG. 4-1 is a cartoon schematically illustrating a data processing system 400 including a removable computation module 402 for offload of data processing from x86 or similar main/server processors 403 to modules connected to a memory bus 405 .
- Modules 402 can be XIMM modules, as described herein or equivalents, and can have multiple computation elements that can be referred to as “offload processors” because they offload various “light touch” processing tasks such as HTML, video, packet level services, security, or data analytics. This is of particular advantage for applications that require frequent random access or application context switching, since many server processors incur significant power usage or have data throughput limitations that can be greatly reduced by transfer of the computation to lower power and more memory efficient offload processors.
- the computation elements or offload processors can be accessible through memory bus 405 .
- the module can be inserted into a Dual Inline Memory Module (DIMM) slot on a commodity computer or server using a DIMM connector ( 407 ), providing a significant increase in effective computing power to system 400 .
- The module may communicate with other components in the commodity computer or server via one of a variety of busses including but not limited to any version of existing double data rate standards (e.g., DDR, DDR2, DDR3, etc.).
- This illustrated embodiment of the module 402 contains five offload processors ( 400 a , 400 b , 400 c , 400 d , 400 e ); however, other embodiments containing greater or fewer numbers of processors are contemplated.
- The offload processors ( 400 a to 400 e ) can be custom manufactured or one of a variety of commodity processors including but not limited to field-programmable gate arrays (FPGA), microprocessors, reduced instruction set computers (RISC), microcontrollers or ARM processors.
- The computation elements or offload processors can include combinations of computational FPGAs such as those based on Altera, Xilinx (e.g., Artix™ class or Zynq® architecture, e.g., Zynq® 7020), and/or conventional processors such as those based on Intel Atom or ARM architecture (e.g., ARM A9).
- ARM processors having advanced memory handling features such as a snoop control unit (SCU) are preferred, since this can allow coherent read and write of memory.
- Other preferred advanced memory features can include processors that support an accelerator coherency port (ACP) that can allow for coherent supplementation of the cache through an FPGA fabric or computational element.
- Each offload processor ( 400 a to 400 e ) on the module 402 may run one of a variety of operating systems including but not limited to Apache or Linux.
- the offload processors ( 400 a to 400 e ) may have access to a plurality of dedicated or shared storage methods.
- Each offload processor can connect to one or more storage units (in this embodiment, pairs of storage units 404 a , 404 b , 404 c and 404 d ).
- Storage units ( 404 a to 404 d ) can be of a variety of storage types, including but not limited to random access memory (RAM), dynamic random access memory (DRAM), sequential access memory (SAM), static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), reduced latency dynamic random access memory (RLDRAM), flash memory, or other emerging memory standards such as those based on DDR4 or hybrid memory cubes (HMC).
- FIG. 4-2 shows an example layout of a module (e.g., XIMM) such as that described in FIG. 4-1 , as well as a connectivity diagram between the components of the module.
- In this example, the offload processors are five Xilinx™ Zynq® 7020 programmable systems-on-a-chip (SoC) ( 416 a , 416 b , 416 c , 416 d , 416 e , and 416 in the connectivity diagram).
- These offload processors can communicate with each other using memory-mapped input-output (MMIO) ( 412 ).
- The types of storage units used in this example are SDRAM (SD, one shown as 408 ) and RLDRAM (RLD, three shown as 406 a , 406 b , 406 c ), together with an Inphi™ iMB02 memory buffer 418 .
- Down conversion of 3.3 V to 2.5 V is required to connect the RLDRAM ( 406 a to 406 c ) with the Zynq® components.
- the components are connected to the offload processors and to each other via a DDR3 ( 414 ) memory bus.
- the indicated layout maximizes memory resources availability without requiring a violation of the number of pins available under the DIMM standard.
- One of the Zynq® computational FPGAs ( 416 a to 416 e ) can act as an arbiter providing a memory cache, giving an ability to have peer to peer sharing of data (via memcached or 0MQ memory formalisms) between the other Zynq® computational FPGAs ( 416 a to 416 e ). Traffic departing for the computational FPGAs can be controlled through memory mapped I/O.
- The arbiter queues session data for use, and when a computational FPGA asks for an address outside of the provided session, the arbiter can be the first level of retrieval, external processing determination, and predictors set.
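- The arbiter's role as a first level of retrieval can be sketched as a small cache in front of a slower store. This is a loose software analogy, not the FPGA design: the `Arbiter` class, its method names, the dictionary used as a backing store, and the hit and miss counters are all assumptions introduced for the example.

```python
class Arbiter:
    """Toy model: queued session data first, backing store on a miss."""

    def __init__(self, backing_store):
        self._sessions = {}                  # session id -> {address: value}
        self._backing_store = backing_store  # hypothetical slower memory
        self.hits = 0
        self.misses = 0

    def queue_session(self, session_id, data):
        """Queue session data so it is ready when a peer needs it."""
        self._sessions[session_id] = dict(data)

    def read(self, session_id, address):
        session = self._sessions.setdefault(session_id, {})
        if address in session:
            self.hits += 1
            return session[address]
        # Address lies outside the provided session: the arbiter is the
        # first level of retrieval and fetches it from the backing store.
        self.misses += 1
        value = self._backing_store[address]
        session[address] = value
        return value


arbiter = Arbiter(backing_store={0x10: "a", 0x20: "b"})
arbiter.queue_session("s1", {0x10: "a"})

inside = arbiter.read("s1", 0x10)   # served from queued session data
outside = arbiter.read("s1", 0x20)  # outside the session, fetched once
again = arbiter.read("s1", 0x20)    # now served from the arbiter's cache
```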
- FIG. 4-3 shows two possible architectures for a module (e.g., XIMM) in a simulation (Xockets MAX and MIN).
- Xockets MIN ( 420 a ) can be used in low-end public cloud servers, containing twenty ARM cores ( 420 b ) spread across fourteen DIMM slots in a commodity server which has two Opteron x86 processors and two network interface cards (NICs) ( 420 c ).
- This architecture can provide a minimal benefit per Watt of power used.
- Xockets MAX contains eighty ARM cores ( 422 b ) across eight DIMM slots, in a server with two Opteron x86 processors and four NICs ( 422 c ). This architecture can provide a maximum benefit per Watt of power used.
- FIG. 4-4 shows a representative power budget for an example of a module (e.g., XIMM) according to a particular embodiment.
- Each component is listed ( 424 a , 424 b , 424 c , 424 d ) along with its power profile.
- Average total and total wattages are also listed ( 426 a , 426 b ).
- The module can have a low average power budget that is easily provided by the 22 Vdd pins per DIMM.
- the expected thermal output can be handled by inexpensive conductive heat spreaders, without requiring additional convective, conductive, or thermoelectric cooling.
- digital thermometers can be implemented to dynamically reduce performance (and consequent heat generation) if needed.
- Operation of one embodiment of a module 430 (e.g., XIMM) using an ARM A9 architecture is illustrated with respect to FIG. 4-5 .
- Use of an ARM A9 architecture in conjunction with an FPGA fabric and memory, in this case shown as reduced latency DRAM (RLDRAM) 438 , can simplify or make possible zero-overhead context switching, memory compression and CPI, in part by allowing hardware context switching synchronized with network queuing. In this way, there can be a one-to-one mapping between threads and queues.
- the ARM A9 architecture includes a Snoop Control Unit 432 (SCU). This unit allows one to read out and write in memory coherently.
- SCU Snoop Control Unit
- Accelerator Coherency Port 434 allows for coherent supplementation of the cache throughout the FPGA 436 .
- the RLDRAM 438 provides the auxiliary bandwidth to read and write the ping-pong cache supplement ( 435 ): Block1$ and Block2$ during packet-level meta-data processing.
- Table 1 illustrates potential states that can exist in the scheduling of queues/threads to XIMM processors and memory such as illustrated in FIG. 4-5 .
- a scheduler can require queued data from a network interface card (NIC) to continue scheduling the thread.
- NIC network interface card
- the maximum context size is assumed as data processed. In this way, a queue must be provisioned as the greater of computational resource and network bandwidth resource, for example, each as a ratio of an 800 MHz A9 and 3 Gbps of bandwidth.
- the ARM core is generally indicated to be worthwhile for computation having many parallel sessions (such that the hardware's prefetching of session-specific data and TCP/reassembly offloads a large portion of the CPU load) and those requiring minimal general purpose processing of data.
- Essentially zero-overhead context switching is also possible using modules as disclosed in FIG. 4-5 . Because per packet processing has minimum state associated with it, and represents inherent engineered parallelism, minimal memory access is needed, aside from packet buffering. On the other hand, after packet reconstruction, the entire memory state of the session can be accessed, and so can require maximal memory utility. By using the time of packet-level processing to prefetch the next hardware scheduled application-level service context in two different processing passes, the memory can always be available for prefetching. Additionally, the FPGA 436 can hold a supplemental “ping-pong” cache ( 435 ) that is read and written with every context switch, while the other is in use.
- the SCU 432 which allows one to read out and write in memory coherently, and ACP 434 for coherent supplementation of the cache throughout the FPGA 436 .
- the RLDRAM 438 provides for read and write to the ping-pong cache supplement ( 435 ): (shown as Block1$ and Block2$) during packet-level meta-data processing. In the embodiment shown, only locally terminating queues can prompt context switching.
- metadata transport code can relieve a main or host processor from tasks including fragmentation and reassembly, and checksum and other metadata services (e.g., accounting, IPSec, SSL, Overlay, etc.).
- L1 cache 437 can be filled during packet processing.
- the lock-down portion of a translation lookaside buffer (TLB) of an L1 cache can be rewritten with the addresses corresponding to the new context.
- TLB translation lookaside buffer
- the following four commands can be executed for the current memory space.
- TLB entries can be used by the XIMM stochastically.
- Bandwidths and capacities of the memories can be precisely allocated to support context switching as well as applications such as Openflow processing, billing, accounting, and header filtering programs.
- the ACP 434 can be used not just for cache supplementation, but hardware functionality supplementation, in part by exploitation of the memory space allocation.
- An operand can be written to memory and the new function called, through customizing specific Open Source libraries, so putting the thread to sleep and a hardware scheduler can validate it for scheduling again once the results are ready.
- OpenVPN uses the OpenSSL library, where the encrypt/decrypt functions 439 can be memory mapped. Large blocks are then available to be exported without delay, or consuming the L2 cache 440 , using the ACP 434 . Hence, a minimum number of calls are needed within the processing window of a context switch, improving overall performance.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Multi Processors (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
- Hardware Redundancy (AREA)
Abstract
Methods of processing structured data, unstructured data, or both are disclosed. Processing structured data can include providing an in-memory database with at least one of a plurality of modules connected to a memory bus of a server; executing database functions with at least one processor on the module; and directing database queries to the at least one module with a CPU of the server. Processing unstructured data can include executing data processing tasks with a CPU connected to a memory bus; and directing parallel computation tasks to a plurality of modules connected to the memory bus.
Description
- This application claims the benefit of U.S. Provisional Patent Application 61/650,373 filed May 22, 2012, the contents of which are incorporated by reference herein.
- The present disclosure relates generally to servers capable of efficiently processing structured and unstructured data. More particularly, systems supporting offload or auxiliary processing modules that can be physically connected to a system memory bus to process data independent of a host processor of the server are described.
- Enterprises store and process their large amounts of data in a variety of ways. One manner in which enterprises store data is by using relational databases and corresponding relational database management systems (RDBMS). Such data, usually referred to as structured data, may be collected, normalized, formatted and stored in an RDBMS. Tools based on standardized data languages such as the Structured Query Language (SQL) may be used for accessing and processing structured data. However, it is estimated that such formatted structured data represents only a tiny fraction of an enterprise's stored data. Organizations are becoming increasingly aware that substantial information and knowledge resides in unstructured data (i.e., “Big Data”) repositories. Accordingly, simple and effective access to both structured and unstructured data is seen as necessary for maximizing the value of enterprise informational resources.
- However, conventional platforms that are currently being used to handle structured and unstructured data can substantially differ in their architecture. In-memory processing and Storage Area Network (SAN)-like architectures are used for traditional SQL queries, while commodity or shared nothing architectures (each computing node, consisting of a processor, local memory, and disk resources, shares nothing with other nodes in the computing cluster) are usually used for processing unstructured data. An architecture that supports both structured and unstructured queries can better handle current and emerging Big Data applications.
- Methods can include processing structured and/or unstructured data. A method of processing structured data can include providing an in-memory database with at least one of a plurality of modules connected to a memory bus in a first server; executing database functions with at least one processor on the module; and connecting a central processing unit (CPU) in the first server to the modules by the memory bus, and directing database queries to the at least one module.
- A method of processing unstructured data can include providing a plurality of modules connected to a memory bus that each include at least one processor; connecting a central processing unit (CPU) to the modules by the memory bus; executing data processing tasks with the CPU; and directing parallel computation tasks to a plurality of the modules.
- FIG. 1 illustrates an embodiment suitable to process structured queries.
- FIGS. 2-1 and 2-2 are diagrams showing the workflow and distributed architecture used to implement data processing software.
- FIG. 3-1 is a flow diagram showing a data processing method according to an embodiment.
- FIG. 3-2 shows a data processing architecture according to an embodiment.
- FIG. 4-1 shows a cartoon schematically illustrating a data processing system according to an embodiment, including a removable computation module for offload of data processing.
- FIG. 4-2 shows an example layout of an in-line module (referred to as a “XIMM”) according to an embodiment.
- FIG. 4-3 shows two possible architectures for a XIMM in a simulation (Xockets MAX and MIN).
- FIG. 4-4 shows a representative power budget for a XIMM according to various embodiments.
- FIG. 4-5 illustrates data flow operation of one embodiment using an ARM A9 architecture.
- Data processing and analytics for enterprise server or cloud based data systems, including structured and unstructured data, can be efficiently implemented on offload processing modules connected to a memory bus, for example, by insertion into a socket for a Dual In-line Memory Module (DIMM). Such modules can be referred to as Xocket™ In-line Memory Modules (XIMMs), and can have multiple “wimpy” cores associated with a memory channel. Using one or more XIMMs it is possible to execute lightweight data processing tasks without intervention from a main server processor. As will be discussed, XIMM modules have high efficiency context switching, high parallelism, and can efficiently process large data sets. Such systems as a whole are able to handle large database searches at very low power when compared to traditional high power “brawny” server cores. Advantageously, by accelerating implementation of MapReduce or similar algorithms on unstructured data and by providing high performance virtual shared disk for structured queries, a XIMM based architecture capable of partitioning tasks is able to greatly improve data analytic performance.
- FIG. 1 illustrates an embodiment to process structured queries. FIG. 1 depicts two commodity computers which can be rack servers (140, 140 a), each of which includes at least one central processing unit (108) and a set of offload processors (112 a to 112 c). The servers (140, 140 a) are preferably present in the same rack and connected by a top of rack (TOR) switch (120). The servers (140, 140 a) are suited for hosting a distributed and shared in-memory assembly (110 a, 110 b) that can handle and respond to structured queries as one large in-memory system.
- Traditionally, an in-memory database that handles structured queries looks for the query in its physical memory, and in the absence of the query therein will perform a disk read to a database. The disk read might result in the page cache of the kernel being populated with the disk file. The process might perform a complete copy of the page(s) (in a paged memory system) into its process buffer or it might perform a mmap operation, wherein a pointer to the entry in the page cache storing the pages is created and stored in the heap of the process. The latter is less time consuming and more efficient than the former.
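The difference between the two access paths described above (a full copy into a process buffer versus a mapping of the cached pages) can be illustrated with a minimal sketch. It uses only the Python standard library; the demo file contents and the function names `read_by_copy`, `find_by_mmap` and `make_demo_file` are illustrative assumptions and are not part of the disclosure.

```python
import mmap
import os
import tempfile


def read_by_copy(path: str) -> bytes:
    """Copy the whole file into a buffer owned by this process."""
    with open(path, "rb") as f:
        return f.read()


def find_by_mmap(path: str, needle: bytes) -> int:
    """Map the file and search it in place, without an up-front full copy."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view.find(needle)


def make_demo_file() -> str:
    """Write a small stand-in for a database file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"id=1;name=alice\nid=2;name=bob\n")
    return path


if __name__ == "__main__":
    demo = make_demo_file()
    try:
        # Both paths locate the same record; only the memory traffic differs.
        print(read_by_copy(demo).find(b"name=bob"))
        print(find_by_mmap(demo, b"name=bob"))
    finally:
        os.remove(demo)
```

Both functions return the same offset; the mapped variant simply avoids duplicating the pages before searching them.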
- The inclusion of a XIMM supported data retrieval framework can effectively extend the overall available size of the in-memory space available with the assembly. In case a structured query is being handled by one of the said CPUs (108), and the CPU is unable to retrieve it from its main memory, it will immediately trap into a memory map (mmap routine) that will handle disk reads and populate the page cache.
- The embodiment described herein can modify the mmap routine of standard operating systems to execute code corresponding to a driver for a removable computation module, in this case a XIMM driver. The XIMM driver in turn identifies the query and transfers the query to one or more XIMMs in the form of memory reads/writes. A XIMM can house a plurality of offload processors (112 a to 112 c) that can receive the memory read/write commands containing the structured query. One or more of the offload processors (112 a to 112 c) can perform a search for the query in its available cache and local memory and return the result. According to embodiments, an mmap query can further be modified to allow the transference of the query to XIMMs that are not in the same server but in the same rack. The mmap abstraction in such a case can perform a remote direct memory access (RDMA) or a similar network memory read for accessing data present in a XIMM that is in the same rack. As a second XIMM (i.e., offload processors 112 a to 112 c of server 140 a) is connected to a first XIMM (i.e., offload processors 112 a to 112 c of server 140) through a top of rack switch 120, the latency of a system according to an embodiment can be the combined latency of the network interface cards (NICs) of the two servers (i.e., 100 in 140 and 100 in 140 a), the latency of the TOR switch 120, and a response time of the second XIMM.
- Despite the additional hops, embodiments can provide a system that can have a much lower latency than a read from a hard disk drive. The described architecture can provide orders of magnitude improvement over a single structured query in-memory database. By allowing for non-frequently used data of one of the servers to be pushed to another XIMM located in the same rack, the effective in-memory space can be increased beyond conventional limits, and a large assembly having a large physical, low latency memory space can be created. This embodiment may further be improved by allowing for transparent sharing of pages across multiple XIMMs. Embodiments can further improve the latency between server-to-server connections by mediating them by XIMMs. The XIMMs can act as intelligent switches to offload and bypass the TOR switch.
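The latency composition described above for a read served by a XIMM in another server of the same rack can be written compactly. The symbols below are notation assumed here for illustration only; the disclosure gives no numeric values, and none are implied.

```latex
T_{\text{remote}} = T_{\text{NIC},140} + T_{\text{NIC},140a} + T_{\text{TOR}} + T_{\text{XIMM}},
\qquad \text{with the remote path worthwhile when } T_{\text{remote}} < T_{\text{disk}}
```

Here $T_{\text{NIC},140}$ and $T_{\text{NIC},140a}$ stand for the NIC latencies of the two servers, $T_{\text{TOR}}$ for the latency of the TOR switch 120, $T_{\text{XIMM}}$ for the response time of the second XIMM, and $T_{\text{disk}}$ for a read from a hard disk drive.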
- Conventional data intensive computing platforms for handling large volumes of unstructured data can use a parallel computing approach combining multiple processors and disks in large commodity computing clusters connected with high-speed communications switches and networks. This can allow the data to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data. A variety of distributed architectures have been developed for data-intensive computing and several software frameworks have been proposed to process unstructured data. One such programming model for processing large data sets with a parallel, distributed algorithm on multiple servers or clusters is commonly known as MapReduce. Hadoop is a popular open-source implementation of MapReduce that is widely used by enterprises for search of unstructured data.
- FIGS. 2-1 and 2-2 show the workflow and distributed architecture used to implement data processing software, such as Hadoop. Referring to FIG. 2-1, Hadoop MapReduce workloads may be broken into Split, Map, Shuffle, Reduce and Merge phases. The input file (204) is fetched from a file system such as Hadoop Distributed File System (HDFS) (202) and divided into smaller pieces referred to as Splits. Each Split (206) is a contiguous range of the input. Each record in each input split is independently passed to a Map function run by a Mapper (208) hosted on a processor. The Map function accepts a single record as an input and produces zero or more output records, each of which contains a key and a value. In the Shuffle phase (210), the results from the Map functions, referred to as intermediate (key, value) pairs, are rearranged such that all values with the same key are grouped together. The Reduce function run by a Reducer (212) takes a key and a list of values as the input and produces another list as the output. The output records from Reduce functions are merged to form an Output file (216) which is then stored in the file system (202).
- Map and Reduce tasks are computationally intensive and have very tight dependency on each other. While Map tasks are small and independent tasks that can run in parallel, the Reduce tasks include fetching intermediate (key, value) pairs that result from each Map function, sorting and merging intermediate results according to keys and applying Reduce functions to the sorted intermediate results. Reducer (212) can perform the Reduce function only after it receives the intermediate results from all the Mappers (208). Thus, the Shuffle step (210) (communicating the Map results to Reducers) often becomes the bottleneck in Hadoop workloads and introduces latency.
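The Split, Map, Shuffle, Reduce and Merge phases described above can be sketched in a few lines. This is a single-process word count written for illustration, not Hadoop code; the function names and the word-count task itself are assumptions chosen to keep the example small.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_input(records: List[str], n_splits: int) -> List[List[str]]:
    """Split phase: divide the input into contiguous ranges of records."""
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_record(record: str) -> Iterable[Tuple[str, int]]:
    """Map phase: one record in, zero or more (key, value) pairs out."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all values that share a key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_key(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: a key and its list of values in, one result out."""
    return key, sum(values)


def run_job(records: List[str], n_splits: int = 2) -> Dict[str, int]:
    """Run all phases in order and merge the reducer outputs."""
    intermediate: List[Tuple[str, int]] = []
    for split in split_input(records, n_splits):
        for record in split:
            intermediate.extend(map_record(record))
    # Reduce cannot start until every Map result has been shuffled.
    grouped = shuffle(intermediate)
    return dict(reduce_key(k, vs) for k, vs in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["a b a", "b c", "a"]))
```

The single `shuffle` call sits between all Map output and any Reduce input, which is the dependency that makes the Shuffle step the bottleneck in a distributed setting.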
- FIG. 2-2 shows a typical computing cluster used to implement Hadoop. A Master Node (222) runs a JobTracker (224) which organizes the cluster's activities. Each of the Worker Nodes (326) runs a TaskTracker (228) which organizes the worker node's activities. All input jobs are organized into sequential tiers of tasks (230). The tasks could be map tasks or reduce tasks. The TaskTracker (228) runs a number of map and reduce tasks concurrently and pulls new tasks from the JobTracker (224) as soon as the old tasks are completed. The Hadoop Map-Reduce layer stores intermediate data (232) produced by the map and reduce tasks in the Hadoop Distributed File System (202) discussed in FIG. 2-1. HDFS (202) is designed to provide high streaming throughput to large, write-once-read-many-times files.
- In an embodiment, a XIMM based architecture can improve Hadoop (or similar data processing) performance in two ways. Firstly, intrinsically parallel computational tasks can be allocated to the XIMMs (e.g., modules with offload processors), leaving the number crunching tasks to “brawny” (e.g., x86) cores. This is illustrated in more detail in FIG. 3-1.
- FIG. 3-1 is a flow diagram showing a data processing method according to an embodiment. A method can start (302) and input data can be fetched from a file system (304). In particular embodiments this can include a Direct Memory Access (DMA) operation from an HDFS. In some embodiments, all DMA operations are engineered such that all the parallel Map steps are equitably served with data. The input data can be partitioned into splits in (306) and parsed into records that contain initial (key, value) pairs in (308). Since the tasks involved in steps (304, 306 and 308) are computationally light, they can be performed all, or in part, by the offload processors hosted on the XIMMs (e.g., by wimpy cores). Map operations can then be performed on the initial (key, value) pairs (310). The intermediate (key, value) pairs that result from the Map operations can then be communicated to the Reducers (312). Once the results from all the Map operations are available, Reduce operations can be performed (314). The results from the Reducers are merged into a single output file and written back to the file system (e.g., 302) by offload processors (316). Since Map and Reduce functions are computationally intensive, the steps (310 and 314) can be handled by a CPU (e.g., by brawny cores). Such distribution of workloads to processor cores that are favorably disposed to perform them can reduce latency and/or increase efficiency.
- A XIMM based architecture, according to an embodiment, can reduce the intrinsic bottleneck of most Hadoop (or similar data processing) workloads (the Shuffle phase) by driving the I/O backplane to its full capacity. In a conventional Hadoop system, the TaskTracker (228) described in FIG. 2-2 serves hyper-text transport protocol (HTTP) GET requests to communicate Map results to Reduce inputs. Also, the Mappers and Reducers hosted by the individual server CPUs have to communicate using the top of rack (TOR) switches of their corresponding racks. This slows down the process. Further, the Reducers hosted by the CPU stay idle while the results from Map operations are collected and sorted, introducing further computational inefficiency.
- FIG. 3-2 shows a data processing architecture according to an embodiment. In the architecture illustrated in FIG. 3-2, operational modules are labeled to reflect their primary operations as previously noted with respect to FIG. 3-1. In an embodiment, instead of using HTTP to communicate, a publish-subscribe model is used to perform the Shuffling phase as shown in FIG. 3-2. Since the results from the Map step (310) are already stored and available in main memory, they can be collected through DMA operations (312 a) and parsed (312 b) by XIMMs. Since Map results are not required to be written to disk, latency can be reduced. Further, the XIMMs incorporated into the individual servers can define a switch fabric that can be used for Shuffling. The mid-plane defined by the XIMM based switch fabric is capable of driving and receiving the full 240 Gbps capacity of the PCI-3.0 bus and thus offers better speed and bandwidth compared to HTTP. The key is published through the massively parallel I/O mid-plane defined by the XIMMs (312 a). Subscriptions are identified based on the keys and are directed to the Reducers hosted on the CPU through virtual interrupts (312 b). Thus, rack-level locality and aggregation that are typical of conventional Hadoop systems are no longer required in the XIMM based architecture. Instead, intermediate (key, value) pairs are exchanged by all the computing nodes across several different servers through intelligent virtual switching of the XIMMs, resulting in efficient processing of Hadoop workloads.
- The following example(s) provide illustration and discussion of exemplary hardware and data processing systems suitable for implementation and operation of the foregoing discussed systems and methods. In particular, hardware and operation of wimpy cores or computational elements connected to a memory bus and mounted in a DIMM or other conventional memory socket are discussed.
- FIG. 4-1 is a cartoon schematically illustrating a data processing system 400 including a removable computation module 402 for offload of data processing from x86 or similar main/server processors 403 to modules connected to a memory bus 405. Such modules 402 can be XIMM modules, as described herein or equivalents, and can have multiple computation elements that can be referred to as “offload processors” because they offload various “light touch” processing tasks such as HTML, video, packet level services, security, or data analytics. This is of particular advantage for applications that require frequent random access or application context switching, since many server processors incur significant power usage or have data throughput limitations that can be greatly reduced by transfer of the computation to lower power and more memory efficient offload processors.
- The computation elements or offload processors can be accessible through memory bus 405. In this embodiment, the module can be inserted into a Dual Inline Memory Module (DIMM) slot on a commodity computer or server using a DIMM connector (407), providing a significant increase in effective computing power to system 400. The module (e.g., XIMM) may communicate with other components in the commodity computer or server via one of a variety of busses including but not limited to any version of existing double data rate standards (e.g., DDR, DDR2, DDR3, etc.).
- This illustrated embodiment of the module 402 contains five offload processors (400 a, 400 b, 400 c, 400 d, 400 e); however, other embodiments containing greater or fewer numbers of processors are contemplated. The offload processors (400 a to 400 e) can be custom manufactured or one of a variety of commodity processors including but not limited to field-programmable gate arrays (FPGAs), microprocessors, reduced instruction set computers (RISC), microcontrollers or ARM processors. The computation elements or offload processors can include combinations of computational FPGAs such as those based on Altera, Xilinx (e.g., Artix™ class or Zynq® architecture, e.g., Zynq® 7020), and/or conventional processors such as those based on Intel Atom or ARM architecture (e.g., ARM A9). For many applications, ARM processors having advanced memory handling features such as a snoop control unit (SCU) are preferred, since this can allow coherent read and write of memory. Other preferred advanced memory features can include processors that support an accelerator coherency port (ACP) that can allow for coherent supplementation of the cache through an FPGA fabric or computational element.
- Each offload processor (400 a to 400 e) on the module 402 may run one of a variety of operating systems including but not limited to Apache or Linux. In addition, the offload processors (400 a to 400 e) may have access to a plurality of dedicated or shared storage methods. In this embodiment, each offload processor can connect to one or more storage units (in this embodiment, pairs of storage units
- FIG. 4-2 shows an example layout of a module (e.g., XIMM) such as that described in FIG. 4-1, as well as a connectivity diagram between the components of the module. In this example, five Xilinx™ Zynq® 7020 (416 a, 416 b, 416 c, 416 d, 416 e and 416 in the connectivity diagram) programmable systems-on-a-chip (SoC) are used as computational FPGAs/offload processors. These offload processors can communicate with each other using memory-mapped input-output (MMIO) (412). The types of storage units used in this example are SDRAM (SD, one shown as 408) and RLDRAM (RLD, three shown as 406 a, 406 b, 406 c) and an Inphi™ iMB02 memory buffer 418. Down-conversion from 3.3 V to 2.5 V is required to connect the RLDRAM (406 a to 406 c) with the Zynq® components. The components are connected to the offload processors and to each other via a DDR3 (414) memory bus. Advantageously, the indicated layout maximizes memory resource availability without exceeding the number of pins available under the DIMM standard.
- In this embodiment, one of the Zynq® computational FPGAs (416 a to 416 e) can act as an arbiter providing a memory cache, giving an ability to have peer to peer sharing of data (via memcached or OMQ memory formalisms) between the other Zynq® computational FPGAs (416 a to 416 e). Traffic departing for the computational FPGAs can be controlled through memory mapped I/O. The arbiter queues session data for use, and when a computational FPGA asks for an address outside of the provided session, the arbiter can be the first level of retrieval, external processing determination, and predictors set.
- FIG. 4-3 shows two possible architectures for a module (e.g., XIMM) in a simulation (Xockets MAX and MIN). Xockets MIN (420 a) can be used in low-end public cloud servers, containing twenty ARM cores (420 b) spread across fourteen DIMM slots in a commodity server which has two Opteron x86 processors and two network interface cards (NICs) (420 c). This architecture can provide a minimal benefit per Watt of power used. Xockets MAX (422 a) contains eighty ARM cores (422 b) across eight DIMM slots, in a server with two Opteron x86 processors and four NICs (422 c). This architecture can provide a maximum benefit per Watt of power used.
- FIG. 4-4 shows a representative power budget for an example of a module (e.g., XIMM) according to a particular embodiment. Each component is listed (424 a, 424 b, 424 c, 424 d) along with its power profile. Average total and total wattages are also listed (426 a, 426 b). In total, especially for I/O packet processing with packet sizes on the order of 1 KB, the module can have a low average power budget that can easily be provided by the 22 Vdd pins per DIMM. Additionally, the expected thermal output can be handled by inexpensive conductive heat spreaders, without requiring additional convective, conductive, or thermoelectric cooling. In certain situations, digital thermometers can be implemented to dynamically reduce performance (and consequent heat generation) if needed.
- Operation of one embodiment of a module 430 (e.g., XIMM) using an ARM A9 architecture is illustrated with respect to FIG. 4-5. Use of ARM A9 architecture in conjunction with an FPGA fabric and memory, in this case shown as reduced latency DRAM (RLDRAM) 438, can simplify or make possible zero-overhead context switching, memory compression and CPI, in part by allowing hardware context switching synchronized with network queuing. In this way, there can be a one-to-one mapping between threads and queues. As illustrated, the ARM A9 architecture includes a Snoop Control Unit 432 (SCU). This unit allows one to read out and write in memory coherently. Additionally, the Accelerator Coherency Port 434 (ACP) allows for coherent supplementation of the cache throughout the FPGA 436. The RLDRAM 438 provides the auxiliary bandwidth to read and write the ping-pong cache supplement (435): Block1$ and Block2$ during packet-level meta-data processing.
- The following table (Table 1) illustrates potential states that can exist in the scheduling of queues/threads to XIMM processors and memory such as illustrated in FIG. 4-5.
TABLE 1
Queue/Thread State            HW treatment
Waiting for Ingress Packet    All ingress data has been processed and thread awaits further communication.
Waiting for MMIO              A functional call to MM hardware (such as HW encryption or transcoding) was made.
Waiting for Rate-limit        The thread's resource consumption exceeds limit, due to other connections idling.
Currently being processed     One of the ARM cores is already processing this thread, cannot schedule again.
Ready for Selection           The thread is ready for context selection.
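The states in Table 1 can be modeled as a small enumeration, with a selector that only ever picks a queue in the "Ready for Selection" state. The sketch below is illustrative only: the state names are taken from the table, but the `select_next` function and its lowest-queue-number-first policy are assumptions, since the disclosure does not specify a selection order.

```python
from enum import Enum
from typing import Dict, Optional


class ThreadState(Enum):
    """Queue/thread states, named after the rows of Table 1."""
    WAITING_FOR_INGRESS_PACKET = "Waiting for Ingress Packet"
    WAITING_FOR_MMIO = "Waiting for MMIO"
    WAITING_FOR_RATE_LIMIT = "Waiting for Rate-limit"
    CURRENTLY_BEING_PROCESSED = "Currently being processed"
    READY_FOR_SELECTION = "Ready for Selection"


def select_next(queues: Dict[int, ThreadState]) -> Optional[int]:
    """Pick a ready queue and mark it as being processed.

    Assumed policy: the lowest-numbered ready queue wins. A queue that is
    already being processed is never selected a second time.
    """
    ready = [q for q, s in queues.items() if s is ThreadState.READY_FOR_SELECTION]
    if not ready:
        return None
    chosen = min(ready)
    queues[chosen] = ThreadState.CURRENTLY_BEING_PROCESSED
    return chosen


if __name__ == "__main__":
    queues = {
        1: ThreadState.WAITING_FOR_MMIO,
        2: ThreadState.READY_FOR_SELECTION,
        3: ThreadState.READY_FOR_SELECTION,
    }
    print(select_next(queues), select_next(queues), select_next(queues))
```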
These states can help coordinate the complex synchronization between processes, network traffic, and memory-mapped hardware. When a queue is selected by a traffic manager, a pipeline coordinates swapping in the desired L2 cache (440), transferring the reassembled IO data into the memory space of the executing process. In certain cases, no packets are pending in the queue, but computation is still pending to service previous packets. Once this process makes a memory reference outside of the data swapped, a scheduler can require queued data from a network interface card (NIC) to continue scheduling the thread. To provide fair queuing to a process not having data, the maximum context size is assumed as data processed. In this way, a queue must be provisioned as the greater of computational resource and network bandwidth resource, for example, each as a ratio of an 800 MHz A9 and 3 Gbps of bandwidth. Given the lopsidedness of this ratio, the ARM core is generally indicated to be worthwhile for computation having many parallel sessions (such that the hardware's prefetching of session-specific data and TCP/reassembly offloads a large portion of the CPU load) and those requiring minimal general purpose processing of data.
- Essentially zero-overhead context switching is also possible using modules as disclosed in FIG. 4-5. Because per-packet processing has minimum state associated with it, and represents inherent engineered parallelism, minimal memory access is needed, aside from packet buffering. On the other hand, after packet reconstruction, the entire memory state of the session can be accessed, and so can require maximal memory utility. By using the time of packet-level processing to prefetch the next hardware-scheduled application-level service context in two different processing passes, the memory can always be available for prefetching. Additionally, the FPGA 436 can hold a supplemental “ping-pong” cache (435) that is read and written with every context switch, while the other is in use. As previously noted, this is enabled in part by the SCU 432, which allows one to read out and write in memory coherently, and the ACP 434 for coherent supplementation of the cache throughout the FPGA 436. The RLDRAM 438 provides for read and write to the ping-pong cache supplement (435) (shown as Block1$ and Block2$) during packet-level meta-data processing. In the embodiment shown, only locally terminating queues can prompt context switching.
- In operation, metadata transport code can relieve a main or host processor from tasks including fragmentation and reassembly, and checksum and other metadata services (e.g., accounting, IPSec, SSL, Overlay, etc.). As IO data streams in and out, L1 cache 437 can be filled during packet processing. During a context switch, the lock-down portion of a translation lookaside buffer (TLB) of an L1 cache can be rewritten with the addresses corresponding to the new context. In one very particular implementation, the following four commands can be executed for the current memory space.
- MRC p15, 0, r0, c10, c0, 0; read the lockdown register
- BIC r0, r0, #1; clear preserve bit
- MCR p15, 0, r0, c10, c0, 0; write to the lockdown register
- write the old value to the memory-mapped Block RAM
- This is a small 32-cycle overhead to bear. Other TLB entries can be used by the XIMM stochastically.
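By way of non-limiting illustration, the ping-pong cache supplement (435) described above can be pictured as a double buffer: one block serves the running context while the other is filled with the next context, and the two roles swap at each context switch. The following Python sketch is a software model only. The class name `PingPongCache` and its methods are hypothetical and do not appear in the disclosure, and the model makes no attempt to represent the SCU, ACP, RLDRAM, or TLB lockdown hardware.

```python
class PingPongCache:
    """Toy model of a two-block cache supplement (Block1$ and Block2$).

    Hypothetical illustration: names and behavior are assumptions,
    not the disclosed hardware implementation.
    """

    def __init__(self):
        self.blocks = [None, None]  # contents of the two blocks
        self.active = 0             # index of the block currently in use

    def prefetch(self, context):
        # Fill the idle block while the active block is in use,
        # e.g. during packet-level processing.
        self.blocks[1 - self.active] = context

    def switch(self):
        # Context switch: the prefetched block becomes active and the
        # previously active context is handed back to be written out.
        evicted = self.blocks[self.active]
        self.active = 1 - self.active
        self.blocks[1 - self.active] = None  # old block is free for the next prefetch
        return evicted

    def current(self):
        # Context visible to the running process.
        return self.blocks[self.active]
```

In this model, a caller would invoke `prefetch` with the next scheduled session context during packet processing and then `switch` at the context-switch boundary, so the running process never waits on the fill of its own block.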
- Bandwidths and capacities of the memories can be precisely allocated to support context switching as well as applications such as Openflow processing, billing, accounting, and header filtering programs.
- For additional performance improvements, the ACP 434 can be used not just for cache supplementation, but hardware functionality supplementation, in part by exploitation of the memory space allocation. An operand can be written to memory and the new function called, through customizing specific Open Source libraries, thereby putting the thread to sleep; a hardware scheduler can validate it for scheduling again once the results are ready. For example, OpenVPN uses the OpenSSL library, where the encrypt/decrypt functions 439 can be memory mapped. Large blocks are then available to be exported without delay, and without consuming the L2 cache 440, using the ACP 434. Hence, a minimum number of calls are needed within the processing window of a context switch, improving overall performance.
- It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
- It is also understood that the embodiments of the invention may be practiced in the absence of an element and/or step not specifically disclosed. That is, an inventive feature of the invention may be elimination of an element.
- Accordingly, while the various aspects of the particular embodiments set forth herein have been described in detail, the present invention could be subject to various changes, substitutions, and alterations without departing from the spirit and scope of the invention.
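By way of further non-limiting illustration, the memory-mapped function offload described above for the ACP 434 (an operand written to memory, the calling thread put to sleep, and the thread made schedulable again once results are ready) can be modelled in software. In the Python sketch below, `MappedFunction` and `xor_stub` are hypothetical names introduced only for this example. A worker thread stands in for the hardware block, a `threading.Event` stands in for the hardware scheduler's wake-up, and the XOR transform is a placeholder that is not real encryption and is not the OpenSSL interface.

```python
import threading


class MappedFunction:
    """Hypothetical stand-in for a hardware function exposed through memory."""

    def __init__(self, transform):
        self.transform = transform
        self.operand = None              # models the memory-mapped operand window
        self.result = None               # models the memory-mapped result window
        self.ready = threading.Event()   # models the scheduler's wake-up signal

    def _run(self):
        # Models the hardware block working while the caller sleeps.
        self.result = self.transform(self.operand)
        self.ready.set()

    def call(self, data, timeout=5.0):
        # Write the operand, start the "hardware", then sleep until woken.
        self.ready.clear()
        self.operand = bytes(data)
        worker = threading.Thread(target=self._run)
        worker.start()
        if not self.ready.wait(timeout):
            raise TimeoutError("offloaded function produced no result")
        worker.join()
        return self.result


def xor_stub(block, key=0x5A):
    # Placeholder transform for illustration only; NOT real encryption.
    return bytes(b ^ key for b in block)
```

A caller would construct `MappedFunction(xor_stub)` and pass a block of bytes to `call`. Because the placeholder XOR is its own inverse, applying `xor_stub` to the returned bytes recovers the input, which makes the round trip easy to check.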
Claims (10)
1. A method of processing structured data, comprising:
providing an in-memory database with at least one of a plurality of modules connected to a memory bus in a first server;
executing database functions with at least one processor on the module; and
connecting a central processing unit (CPU) in the first server to the modules by the memory bus, and directing database queries to the at least one module.
2. The method of processing structured data of claim 1, further including communicating between modules without accessing any CPU of the first server.
3. The method of processing structured data of claim 1, further including:
the modules are mounted on different servers in a same rack; and
communicating between modules of different racks with a top of the rack switch.
4. The method of processing structured data of claim 1, further including executing a memory mapping operation to transfer a database query from the CPU to at least one module in the form of a memory read or write.
5. The method of processing structured data of claim 1, wherein providing the in-memory database includes inserting the modules into dual-in-line-memory-module (DIMM) sockets.
6. A method of processing unstructured data, comprising the steps of:
providing a plurality of modules connected to a memory bus that each include at least one processor;
connecting a central processing unit (CPU) to the modules by the memory bus;
executing data processing tasks with the CPU; and
directing parallel computation tasks to a plurality of the modules.
7. The method of processing unstructured data of claim 6, further including:
executing a Map/Reduce algorithm, including collecting data parsed with a plurality of the modules, and storing results of a Map step in a main memory.
8. The method of processing unstructured data of claim 6, further including defining a massively parallel input/output (I/O) mid-plane with a plurality of the modules.
9. The method of processing unstructured data of claim 6, further including:
the modules are mounted on different servers in a same rack;
processing a Map/Reduce algorithm with multiples of the servers; and
exchanging intermediate (key, value) pairs across the multiple servers through switching by the modules.
10. The method of processing unstructured data of claim 6, wherein providing the plurality of modules connected to a memory bus includes inserting the modules into dual-in-line-memory-module (DIMM) sockets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/900,303 US20130318084A1 (en) | 2012-05-22 | 2013-05-22 | Processing structured and unstructured data using offload processors |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261650373P | 2012-05-22 | 2012-05-22 | |
US13/900,303 US20130318084A1 (en) | 2012-05-22 | 2013-05-22 | Processing structured and unstructured data using offload processors |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130318084A1 (en) | 2013-11-28 |
Family
ID=49622398
Family Applications (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/900,222 Expired - Fee Related US9619406B2 (en) | 2012-05-22 | 2013-05-22 | Offloading of computation for rack level servers and corresponding methods and systems |
US13/900,346 Expired - Fee Related US9665503B2 (en) | 2012-05-22 | 2013-05-22 | Efficient packet handling, redirection, and inspection using offload processors |
US13/900,241 Abandoned US20130318276A1 (en) | 2012-05-22 | 2013-05-22 | Offloading of computation for rack level servers and corresponding methods and systems |
US13/900,251 Active - Reinstated 2033-11-29 US9495308B2 (en) | 2012-05-22 | 2013-05-22 | Offloading of computation for rack level servers and corresponding methods and systems |
US13/900,351 Active - Reinstated US9258276B2 (en) | 2012-05-22 | 2013-05-22 | Efficient packet handling, redirection, and inspection using offload processors |
US13/900,303 Abandoned US20130318084A1 (en) | 2012-05-22 | 2013-05-22 | Processing structured and unstructured data using offload processors |
US13/900,295 Abandoned US20130318119A1 (en) | 2012-05-22 | 2013-05-22 | Processing structured and unstructured data using offload processors |
US15/396,334 Active 2033-10-16 US11080209B2 (en) | 2012-05-22 | 2016-12-30 | Server systems and methods for decrypting data packets with computation modules insertable into servers that operate independent of server processors |
US15/396,328 Active - Reinstated 2033-07-12 US10223297B2 (en) | 2012-05-22 | 2016-12-30 | Offloading of computation for servers using switching plane formed by modules inserted within such servers |
Country Status (2)
Country | Link |
---|---|
US (9) | US9619406B2 (en) |
WO (3) | WO2013177313A2 (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104735087A (en) * | 2015-04-16 | 2015-06-24 | 国家电网公司 | Public key algorithm and SSL (security socket layer) protocol based method of optimizing security of multi-cluster Hadoop system |
WO2015153699A1 (en) * | 2014-03-31 | 2015-10-08 | Xockets, Llc | Computing systems, elements and methods for processing unstructured data |
US20160147984A1 (en) * | 2014-11-20 | 2016-05-26 | International Business Machines Corporation | Implementing extent granularity authorization initialization processing in capi adapters |
CN105739965A (en) * | 2016-01-18 | 2016-07-06 | 深圳先进技术研究院 | Method for assembling ARM (Acorn RISC Machine) mobile phone cluster based on RDMA (Remote Direct Memory Access) |
WO2016123042A1 (en) * | 2015-01-26 | 2016-08-04 | Dragonfly Data Factory Llc | Data factory platform and operating system |
CN106022080A (en) * | 2016-06-30 | 2016-10-12 | 北京三未信安科技发展有限公司 | Cipher card based on PCIe (peripheral component interface express) interface and data encryption method of cipher card |
US9582659B2 (en) | 2014-11-20 | 2017-02-28 | International Business Machines Corporation | Implementing extent granularity authorization and deauthorization processing in CAPI adapters |
US9582651B2 (en) | 2014-11-20 | 2017-02-28 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US9594710B2 (en) | 2014-11-20 | 2017-03-14 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US9594696B1 (en) * | 2014-12-09 | 2017-03-14 | Parallel Machines Ltd. | Systems and methods for automatic generation of parallel data processing code |
US9697370B2 (en) | 2014-11-20 | 2017-07-04 | International Business Machines Corporation | Implementing and processing extent granularity authorization mechanism in CAPI adapters |
US9858443B2 (en) | 2014-11-20 | 2018-01-02 | International Business Machines Corporation | Implementing block device extent granularity authorization model processing in CAPI adapters |
US9923726B2 (en) | 2014-12-03 | 2018-03-20 | International Business Machines Corporation | RDMA transfers in mapreduce frameworks |
US10104171B1 (en) * | 2015-11-25 | 2018-10-16 | EMC IP Holding Company LLC | Server architecture having dedicated compute resources for processing infrastructure-related workloads |
CN109213737A (en) * | 2018-09-17 | 2019-01-15 | 郑州云海信息技术有限公司 | A kind of data compression method and apparatus |
CN110061999A (en) * | 2019-04-28 | 2019-07-26 | 华东师范大学 | A kind of network data security analysis ancillary equipment based on ZYNQ |
CN111159088A (en) * | 2019-11-29 | 2020-05-15 | 中国船舶重工集团公司第七0九研究所 | IIC bus communication method and system based on heterogeneous multi-core processor |
US11953997B2 (en) * | 2018-10-23 | 2024-04-09 | Capital One Services, Llc | Systems and methods for cross-regional back up of distributed databases on a cloud service |
Families Citing this family (112)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2559584A1 (en) | 2004-03-13 | 2005-09-29 | Cluster Resources, Inc. | System and method of providing a self-optimizing reservation in space of compute resources |
US20070266388A1 (en) | 2004-06-18 | 2007-11-15 | Cluster Resources, Inc. | System and method for providing advanced reservations in a compute environment |
CA2586763C (en) | 2004-11-08 | 2013-12-17 | Cluster Resources, Inc. | System and method of providing system jobs within a compute environment |
US9231886B2 (en) | 2005-03-16 | 2016-01-05 | Adaptive Computing Enterprises, Inc. | Simple integration of an on-demand compute environment |
EP2360587B1 (en) | 2005-03-16 | 2017-10-04 | III Holdings 12, LLC | Automatic workload transfer to an on-demand center |
WO2006108187A2 (en) | 2005-04-07 | 2006-10-12 | Cluster Resources, Inc. | On-demand access to compute resources |
US11720290B2 (en) | 2009-10-30 | 2023-08-08 | Iii Holdings 2, Llc | Memcached server functionality in a cluster of data processing nodes |
US20130318269A1 (en) | 2012-05-22 | 2013-11-28 | Xockets IP, LLC | Processing structured and unstructured data using offload processors |
US20170109299A1 (en) * | 2014-03-31 | 2017-04-20 | Stephen Belair | Network computing elements, memory interfaces and network connections to such elements, and related systems |
WO2013177313A2 (en) | 2012-05-22 | 2013-11-28 | Xockets IP, LLC | Processing structured and unstructured data using offload processors |
US9201638B2 (en) * | 2012-08-07 | 2015-12-01 | Nec Laboratories America, Inc. | Compiler-guided software accelerator for iterative HADOOP® jobs |
US10270709B2 (en) | 2015-06-26 | 2019-04-23 | Microsoft Technology Licensing, Llc | Allocating acceleration component functionality for supporting services |
US11132277B2 (en) * | 2012-12-28 | 2021-09-28 | Iii Holdings 2, Llc | System and method for continuous low-overhead monitoring of distributed applications running on a cluster of data processing nodes |
KR20160037827A (en) | 2013-01-17 | 2016-04-06 | 엑소케츠 인코포레이티드 | Offload processor modules for connection to system memory |
US9299041B2 (en) | 2013-03-15 | 2016-03-29 | Business Objects Software Ltd. | Obtaining data from unstructured data for a structured data collection |
US9262550B2 (en) | 2013-03-15 | 2016-02-16 | Business Objects Software Ltd. | Processing semi-structured data |
US9218568B2 (en) | 2013-03-15 | 2015-12-22 | Business Objects Software Ltd. | Disambiguating data using contextual and historical information |
US9063710B2 (en) * | 2013-06-21 | 2015-06-23 | Sap Se | Parallel programming of in memory database utilizing extensible skeletons |
WO2015041706A1 (en) * | 2013-09-23 | 2015-03-26 | Mcafee, Inc. | Providing a fast path between two entities |
WO2015105384A1 (en) | 2014-01-09 | 2015-07-16 | 삼성전자 주식회사 | Method and apparatus of transmitting media data related information in multimedia transmission system |
US9547553B1 (en) | 2014-03-10 | 2017-01-17 | Parallel Machines Ltd. | Data resiliency in a shared memory pool |
US9781027B1 (en) | 2014-04-06 | 2017-10-03 | Parallel Machines Ltd. | Systems and methods to communicate with external destinations via a memory network |
WO2015160331A1 (en) * | 2014-04-15 | 2015-10-22 | Hewlett-Packard Development Company, L.P. | Configurable network security |
US9477412B1 (en) | 2014-12-09 | 2016-10-25 | Parallel Machines Ltd. | Systems and methods for automatically aggregating write requests |
US9690713B1 (en) | 2014-04-22 | 2017-06-27 | Parallel Machines Ltd. | Systems and methods for effectively interacting with a flash memory |
KR101535502B1 (en) * | 2014-04-22 | 2015-07-09 | 한국인터넷진흥원 | System and method for controlling virtual network including security function |
US10838865B2 (en) * | 2014-05-08 | 2020-11-17 | Micron Technology, Inc. | Stacked memory device system interconnect directory-based cache coherence methodology |
US9641429B2 (en) | 2014-06-18 | 2017-05-02 | Radware, Ltd. | Predictive traffic steering over software defined networks |
US9723071B2 (en) * | 2014-09-29 | 2017-08-01 | Samsung Electronics Co., Ltd. | High bandwidth peer-to-peer switched key-value caching |
US9639473B1 (en) | 2014-12-09 | 2017-05-02 | Parallel Machines Ltd. | Utilizing a cache mechanism by copying a data set from a cache-disabled memory location to a cache-enabled memory location |
US9690705B1 (en) | 2014-12-09 | 2017-06-27 | Parallel Machines Ltd. | Systems and methods for processing data sets according to an instructed order |
US9753873B1 (en) | 2014-12-09 | 2017-09-05 | Parallel Machines Ltd. | Systems and methods for key-value transactions |
US9781225B1 (en) | 2014-12-09 | 2017-10-03 | Parallel Machines Ltd. | Systems and methods for cache streams |
EP3062142B1 (en) | 2015-02-26 | 2018-10-03 | Nokia Technologies OY | Apparatus for a near-eye display |
CN106155633A (en) * | 2015-03-30 | 2016-11-23 | 上海黄浦船用仪器有限公司 | A kind of parallel computation multitask system |
US10511478B2 (en) | 2015-04-17 | 2019-12-17 | Microsoft Technology Licensing, Llc | Changing between different roles at acceleration components |
US9792154B2 (en) | 2015-04-17 | 2017-10-17 | Microsoft Technology Licensing, Llc | Data processing system having a hardware acceleration plane and a software plane |
US10198294B2 (en) | 2015-04-17 | 2019-02-05 | Microsoft Licensing Technology, LLC | Handling tenant requests in a system that uses hardware acceleration components |
US10296392B2 (en) | 2015-04-17 | 2019-05-21 | Microsoft Technology Licensing, Llc | Implementing a multi-component service using plural hardware acceleration components |
IL238690B (en) | 2015-05-07 | 2019-07-31 | Mellanox Technologies Ltd | Network-based computational accelerator |
US10216555B2 (en) | 2015-06-26 | 2019-02-26 | Microsoft Technology Licensing, Llc | Partially reconfiguring acceleration components |
TWI547822B (en) * | 2015-07-06 | 2016-09-01 | 緯創資通股份有限公司 | Data processing method and system |
US9667657B2 (en) * | 2015-08-04 | 2017-05-30 | AO Kaspersky Lab | System and method of utilizing a dedicated computer security service |
US11082515B2 (en) * | 2015-09-26 | 2021-08-03 | Intel Corporation | Technologies for offloading data object replication and service function chain management |
US10169409B2 (en) | 2015-10-01 | 2019-01-01 | International Business Machines Corporation | System and method for transferring data between RDBMS and big data platform |
CN108353068B (en) * | 2015-10-20 | 2021-05-07 | 慧与发展有限责任合伙企业 | SDN controller assisted intrusion prevention system |
CN105656712B (en) * | 2015-12-22 | 2019-01-29 | 山东大学 | A kind of RFID protocol uniformity test platform and its working method based on ZYNQ |
US9984009B2 (en) * | 2016-01-28 | 2018-05-29 | Silicon Laboratories Inc. | Dynamic containerized system memory protection for low-energy MCUs |
TWI618387B (en) * | 2016-02-24 | 2018-03-11 | Method for improving packet processing of virtual switch | |
CN105631798B (en) * | 2016-03-04 | 2018-11-27 | 北京理工大学 | Low Power Consumption Portable realtime graphic object detecting and tracking system and method |
CN105785348A (en) * | 2016-04-08 | 2016-07-20 | 浙江大学 | Sonar signal processing method based on ZYNQ-7000 platform |
US10212232B2 (en) * | 2016-06-03 | 2019-02-19 | At&T Intellectual Property I, L.P. | Method and apparatus for managing data communications using communication thresholds |
US10956467B1 (en) * | 2016-08-22 | 2021-03-23 | Jpmorgan Chase Bank, N.A. | Method and system for implementing a query tool for unstructured data files |
CN106571847A (en) * | 2016-10-26 | 2017-04-19 | 深圳市极致汇仪科技有限公司 | Test instrument communication device and method based on ZYNQ |
US10650552B2 (en) | 2016-12-29 | 2020-05-12 | Magic Leap, Inc. | Systems and methods for augmented reality |
EP4300160A3 (en) | 2016-12-30 | 2024-05-29 | Magic Leap, Inc. | Polychromatic light out-coupling apparatus, near-eye displays comprising the same, and method of out-coupling polychromatic light |
CN107071324A (en) * | 2017-01-25 | 2017-08-18 | 上海电气集团股份有限公司 | A kind of visual pattern processing system and its design method |
CN107196695A (en) * | 2017-04-07 | 2017-09-22 | 西安电子科技大学 | Inter-satellite Links test system based on Zynq |
CN107329720B (en) * | 2017-06-30 | 2020-07-03 | 中国航空工业集团公司雷华电子技术研究所 | Radar image display acceleration system based on ZYNQ |
US10578870B2 (en) | 2017-07-26 | 2020-03-03 | Magic Leap, Inc. | Exit pupil expander |
TW202301125A (en) | 2017-07-30 | 2023-01-01 | 埃拉德 希提 | Memory chip with a memory-based distributed processor architecture |
CN107479831A (en) * | 2017-08-11 | 2017-12-15 | 浙江工业大学 | A kind of OCT volume data method for carrying based on Zynq platforms |
CN107634826B (en) * | 2017-08-29 | 2020-06-05 | 北京三未信安科技发展有限公司 | Encryption method and system based on ZYNQ device |
US11502948B2 (en) | 2017-10-16 | 2022-11-15 | Mellanox Technologies, Ltd. | Computational accelerator for storage operations |
US11005771B2 (en) | 2017-10-16 | 2021-05-11 | Mellanox Technologies, Ltd. | Computational accelerator for packet payload operations |
US10841243B2 (en) * | 2017-11-08 | 2020-11-17 | Mellanox Technologies, Ltd. | NIC with programmable pipeline |
CN109861925B (en) * | 2017-11-30 | 2021-12-21 | 华为技术有限公司 | Data transmission method, related device and network |
KR102596429B1 (en) | 2017-12-10 | 2023-10-30 | 매직 립, 인코포레이티드 | Anti-reflection coatings on optical waveguides |
US10708240B2 (en) | 2017-12-14 | 2020-07-07 | Mellanox Technologies, Ltd. | Offloading communication security operations to a network interface controller |
AU2018392482A1 (en) | 2017-12-20 | 2020-07-02 | Magic Leap, Inc. | Insert for augmented reality viewing device |
CN108566357B (en) * | 2017-12-21 | 2020-04-03 | 中国科学院西安光学精密机械研究所 | Image transmission and control system and method based on ZYNQ-7000 and FreeRTOS |
CN108055342B (en) * | 2017-12-26 | 2021-05-04 | 北京奇艺世纪科技有限公司 | Data monitoring method and device |
EP3766039B1 (en) | 2018-03-15 | 2024-08-14 | Magic Leap, Inc. | Image correction due to deformation of components of a viewing device |
EP3803450A4 (en) | 2018-05-31 | 2021-08-18 | Magic Leap, Inc. | Radar head pose localization |
CN108881254B (en) * | 2018-06-29 | 2021-08-06 | 中国科学技术大学苏州研究院 | Intrusion detection system based on neural network |
US11579441B2 (en) | 2018-07-02 | 2023-02-14 | Magic Leap, Inc. | Pixel intensity modulation using modifying gain values |
WO2020010226A1 (en) | 2018-07-03 | 2020-01-09 | Magic Leap, Inc. | Systems and methods for virtual and augmented reality |
US11856479B2 (en) | 2018-07-03 | 2023-12-26 | Magic Leap, Inc. | Systems and methods for virtual and augmented reality along a route with markers |
US10956336B2 (en) * | 2018-07-20 | 2021-03-23 | International Business Machines Corporation | Efficient silent data transmission between computer servers |
US11624929B2 (en) | 2018-07-24 | 2023-04-11 | Magic Leap, Inc. | Viewing device with dust seal integration |
JP7426982B2 (en) | 2018-07-24 | 2024-02-02 | マジック リープ, インコーポレイテッド | Temperature-dependent calibration of movement sensing devices |
US11112862B2 (en) | 2018-08-02 | 2021-09-07 | Magic Leap, Inc. | Viewing system with interpupillary distance compensation based on head motion |
WO2020028191A1 (en) | 2018-08-03 | 2020-02-06 | Magic Leap, Inc. | Unfused pose-based drift correction of a fused pose of a totem in a user interaction system |
JP7487176B2 (en) | 2018-08-22 | 2024-05-20 | マジック リープ, インコーポレイテッド | Patient Visibility System |
CN110932922B (en) * | 2018-09-19 | 2022-11-08 | 上海仪电(集团)有限公司中央研究院 | Financial data two-layer network acquisition system based on FPGA and testing method thereof |
US10914949B2 (en) | 2018-11-16 | 2021-02-09 | Magic Leap, Inc. | Image size triggered clarification to maintain image sharpness |
US10824469B2 (en) | 2018-11-28 | 2020-11-03 | Mellanox Technologies, Ltd. | Reordering avoidance for flows during transition between slow-path handling and fast-path handling |
EP3899613A4 (en) | 2018-12-21 | 2022-09-07 | Magic Leap, Inc. | Air pocket structures for promoting total internal reflection in a waveguide |
US11425189B2 (en) * | 2019-02-06 | 2022-08-23 | Magic Leap, Inc. | Target intent-based clock speed determination and adjustment to limit total heat generated by multiple processors |
EP3939030A4 (en) | 2019-03-12 | 2022-11-30 | Magic Leap, Inc. | Registration of local content between first and second augmented reality viewers |
US11184439B2 (en) | 2019-04-01 | 2021-11-23 | Mellanox Technologies, Ltd. | Communication with accelerator via RDMA-based network adapter |
CN110110534A (en) * | 2019-04-18 | 2019-08-09 | 郑州信大捷安信息技术股份有限公司 | A kind of FPGA safe operation system and method |
EP3963565A4 (en) | 2019-05-01 | 2022-10-12 | Magic Leap, Inc. | Content provisioning system and method |
US10749934B1 (en) * | 2019-06-19 | 2020-08-18 | Constanza Terry | Removable hardware for increasing computer download speed |
EP4004630A4 (en) | 2019-07-26 | 2022-09-28 | Magic Leap, Inc. | Systems and methods for augmented reality |
US11593156B2 (en) | 2019-08-16 | 2023-02-28 | Red Hat, Inc. | Instruction offload to processor cores in attached memory |
CN110687843B (en) * | 2019-10-14 | 2021-09-28 | 北京长峰天通科技有限公司 | Multi-shaft multi-motor servo device based on ZYNQ and control method thereof |
JP2023501574A (en) | 2019-11-14 | 2023-01-18 | マジック リープ, インコーポレイテッド | Systems and methods for virtual and augmented reality |
JP2023502927A (en) | 2019-11-15 | 2023-01-26 | マジック リープ, インコーポレイテッド | Visualization system for use in a surgical environment |
US11397519B2 (en) * | 2019-11-27 | 2022-07-26 | Sap Se | Interface controller and overlay |
CN111563059B (en) * | 2019-12-18 | 2022-05-24 | 中国船舶重工集团公司第七0九研究所 | PCIe-based multi-FPGA dynamic configuration device and method |
CN111857902B (en) * | 2019-12-30 | 2023-09-26 | 华人运通(上海)云计算科技有限公司 | Application display method, device, equipment and readable storage medium |
CN111506249B (en) * | 2020-04-23 | 2023-03-24 | 珠海华网科技有限责任公司 | Data interaction system and method based on ZYNQ platform |
US11934330B2 (en) * | 2020-05-08 | 2024-03-19 | Intel Corporation | Memory allocation for distributed processing devices |
US11791571B2 (en) | 2020-06-26 | 2023-10-17 | Ge Aviation Systems Llc | Crimp pin electrical connector |
CN111858436B (en) * | 2020-07-30 | 2021-10-26 | 南京英锐创电子科技有限公司 | Switching circuit for high-speed bus read-write low-speed bus and data read-write equipment |
IL276538B2 (en) | 2020-08-05 | 2023-08-01 | Mellanox Technologies Ltd | Cryptographic data communication apparatus |
CN114095153A (en) | 2020-08-05 | 2022-02-25 | 迈络思科技有限公司 | Cipher data communication device |
CN113176850B (en) * | 2021-03-12 | 2022-07-12 | 湖南艾科诺维科技有限公司 | Shared storage disk based on SRIO interface and access method thereof |
US11934658B2 (en) | 2021-03-25 | 2024-03-19 | Mellanox Technologies, Ltd. | Enhanced storage protocol emulation in a peripheral device |
US12117948B2 (en) | 2022-10-31 | 2024-10-15 | Mellanox Technologies, Ltd. | Data processing unit with transparent root complex |
US12007921B2 (en) | 2022-11-02 | 2024-06-11 | Mellanox Technologies, Ltd. | Programmable user-defined peripheral-bus device implementation using data-plane accelerator (DPA) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030002261A1 (en) * | 2001-06-29 | 2003-01-02 | Intel Corporation | Rack-mounted server and associated methods |
US20030229626A1 (en) * | 2002-06-05 | 2003-12-11 | Microsoft Corporation | Performant and scalable merge strategy for text indexing |
US20050187944A1 (en) * | 2004-02-10 | 2005-08-25 | Microsoft Corporation | Systems and methods for a database engine in-process data provider |
US20060036574A1 (en) * | 2004-08-11 | 2006-02-16 | Rainer Schweigkoffer | System and method for an optimistic database access |
US20070032920A1 (en) * | 2005-07-25 | 2007-02-08 | Lockheed Martin Corporation | System for controlling unmanned vehicles |
US20080027920A1 (en) * | 2006-07-26 | 2008-01-31 | Microsoft Corporation | Data processing over very large databases |
US20090077478A1 (en) * | 2007-09-18 | 2009-03-19 | International Business Machines Corporation | Arrangements for managing processing components using a graphical user interface |
US20090276528A1 (en) * | 2008-05-05 | 2009-11-05 | William Thomas Pienta | Methods to Optimally Allocating the Computer Server Load Based on the Suitability of Environmental Conditions |
US20120259801A1 (en) * | 2011-04-06 | 2012-10-11 | Microsoft Corporation | Transfer of learning for query classification |
US20130179435A1 (en) * | 2012-01-06 | 2013-07-11 | Ralph Stadter | Layout-Driven Data Selection and Reporting |
US20130268354A1 (en) * | 2012-04-09 | 2013-10-10 | Ranjith Jayaram | Selecting Content Items for Display in a Content Stream |
US20130290462A1 (en) * | 2012-04-27 | 2013-10-31 | Kevin T. Lim | Data caching using local and remote memory |
US20130290976A1 (en) * | 2012-04-30 | 2013-10-31 | Ludmila Cherkasova | Scheduling mapreduce job sets |
US20130297624A1 (en) * | 2012-05-07 | 2013-11-07 | Microsoft Corporation | Interoperability between Map-Reduce and Distributed Array Runtimes |
Family Cites Families (156)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62214464A (en) * | 1986-03-17 | 1987-09-21 | Hitachi Ltd | Coprocessor coupling system |
US5446844A (en) * | 1987-10-05 | 1995-08-29 | Unisys Corporation | Peripheral memory interface controller as a cache for a large data processing system |
US5237662A (en) | 1991-06-27 | 1993-08-17 | Digital Equipment Corporation | System and method with a procedure oriented input/output mechanism |
US5247675A (en) | 1991-08-09 | 1993-09-21 | International Business Machines Corporation | Preemptive and non-preemptive scheduling and execution of program threads in a multitasking operating system |
US5577213A (en) | 1994-06-03 | 1996-11-19 | At&T Global Information Solutions Company | Multi-device adapter card for computer |
US5850571A (en) * | 1996-04-22 | 1998-12-15 | National Instruments Corporation | System and method for converting read cycles into write cycles for improved system performance |
US6085307A (en) | 1996-11-27 | 2000-07-04 | Vlsi Technology, Inc. | Multiple native instruction set master/slave processor arrangement and method thereof |
US5870350A (en) | 1997-05-21 | 1999-02-09 | International Business Machines Corporation | High performance, high bandwidth memory bus architecture utilizing SDRAMs |
US6092146A (en) | 1997-07-31 | 2000-07-18 | Ibm | Dynamically configurable memory adapter using electronic presence detects |
US5913058A (en) * | 1997-09-30 | 1999-06-15 | Compaq Computer Corp. | System and method for using a real mode bios interface to read physical disk sectors after the operating system has loaded and before the operating system device drivers have loaded |
US7565461B2 (en) * | 1997-12-17 | 2009-07-21 | Src Computers, Inc. | Switch/network adapter port coupling a reconfigurable processing element to one or more microprocessors for use with interleaved memory controllers |
US6157955A (en) | 1998-06-15 | 2000-12-05 | Intel Corporation | Packet processing system including a policy engine having a classification unit |
US20060117274A1 (en) | 1998-08-31 | 2006-06-01 | Tseng Ping-Sheng | Behavior processor system and method |
US6446163B1 (en) | 1999-01-04 | 2002-09-03 | International Business Machines Corporation | Memory card with signal processing element |
US6622181B1 (en) * | 1999-07-15 | 2003-09-16 | Texas Instruments Incorporated | Timing window elimination in self-modifying direct memory access processors |
US6625685B1 (en) | 2000-09-20 | 2003-09-23 | Broadcom Corporation | Memory controller with programmable configuration |
US7120155B2 (en) | 2000-10-03 | 2006-10-10 | Broadcom Corporation | Switch having virtual shared memory |
TWI240864B (en) | 2001-06-13 | 2005-10-01 | Hitachi Ltd | Memory device |
US6751113B2 (en) | 2002-03-07 | 2004-06-15 | Netlist, Inc. | Arrangement of integrated circuits in a memory module |
US7472205B2 (en) | 2002-04-24 | 2008-12-30 | Nec Corporation | Communication control apparatus which has descriptor cache controller that builds list of descriptors |
US7441262B2 (en) * | 2002-07-11 | 2008-10-21 | Seaway Networks Inc. | Integrated VPN/firewall system |
AU2003273333A1 (en) | 2002-09-18 | 2004-04-08 | Netezza Corporation | Field oriented pipeline architecture for a programmable data streaming processor |
US7454749B2 (en) | 2002-11-12 | 2008-11-18 | Engineered Intelligence Corporation | Scalable parallel processing on shared memory computers |
US20040133720A1 (en) | 2002-12-31 | 2004-07-08 | Steven Slupsky | Embeddable single board computer |
US7089412B2 (en) | 2003-01-17 | 2006-08-08 | Wintec Industries, Inc. | Adaptive memory module |
US7673304B2 (en) * | 2003-02-18 | 2010-03-02 | Microsoft Corporation | Multithreaded kernel for graphics processing unit |
US7421694B2 (en) * | 2003-02-18 | 2008-09-02 | Microsoft Corporation | Systems and methods for enhancing performance of a coprocessor |
US7155379B2 (en) | 2003-02-25 | 2006-12-26 | Microsoft Corporation | Simulation of a PCI device's memory-mapped I/O registers |
US7337314B2 (en) | 2003-04-12 | 2008-02-26 | Cavium Networks, Inc. | Apparatus and method for allocating resources within a security processor |
US6982892B2 (en) | 2003-05-08 | 2006-01-03 | Micron Technology, Inc. | Apparatus and methods for a physical layout of simultaneously sub-accessible memory modules |
US20050038946A1 (en) | 2003-08-12 | 2005-02-17 | Tadpole Computer, Inc. | System and method using a high speed interface in a system having co-processors |
US8776050B2 (en) | 2003-08-20 | 2014-07-08 | Oracle International Corporation | Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes |
US7657706B2 (en) | 2003-12-18 | 2010-02-02 | Cisco Technology, Inc. | High speed memory and input/output processor subsystem for efficiently allocating and using high-speed memory and slower-speed memory |
US20050018495A1 (en) | 2004-01-29 | 2005-01-27 | Netlist, Inc. | Arrangement of integrated circuits in a memory module |
US7289386B2 (en) | 2004-03-05 | 2007-10-30 | Netlist, Inc. | Memory module decoder |
US7916574B1 (en) | 2004-03-05 | 2011-03-29 | Netlist, Inc. | Circuit providing load isolation and memory domain translation for memory module |
US7286436B2 (en) | 2004-03-05 | 2007-10-23 | Netlist, Inc. | High-density memory module utilizing low-density memory components |
US7532537B2 (en) | 2004-03-05 | 2009-05-12 | Netlist, Inc. | Memory module with a circuit providing load isolation and memory domain translation |
US7668165B2 (en) | 2004-03-31 | 2010-02-23 | Intel Corporation | Hardware-based multi-threading for packet processing |
US7254036B2 (en) | 2004-04-09 | 2007-08-07 | Netlist, Inc. | High density memory module using stacked printed circuit boards |
US7480611B2 (en) | 2004-05-13 | 2009-01-20 | International Business Machines Corporation | Method and apparatus to increase the usable memory capacity of a logic simulation hardware emulator/accelerator |
US7436845B1 (en) | 2004-06-08 | 2008-10-14 | Sun Microsystems, Inc. | Input and output buffering |
US20060004965A1 (en) | 2004-06-30 | 2006-01-05 | Tu Steven J | Direct processor cache access within a system having a coherent multi-processor protocol |
US7305574B2 (en) | 2004-10-29 | 2007-12-04 | International Business Machines Corporation | System, method and storage medium for bus calibration in a memory subsystem |
KR100666169B1 (en) | 2004-12-17 | 2007-01-09 | Samsung Electronics Co., Ltd. | Flash memory data storing device |
US8072887B1 (en) | 2005-02-07 | 2011-12-06 | Extreme Networks, Inc. | Methods, systems, and computer program products for controlling enqueuing of packets in an aggregated queue including a plurality of virtual queues using backpressure messages from downstream queues |
EP2383657A1 (en) | 2005-04-21 | 2011-11-02 | Violin Memory, Inc. | Interconnetion system |
US8244971B2 (en) | 2006-07-31 | 2012-08-14 | Google Inc. | Memory circuit system and method |
US8438328B2 (en) | 2008-02-21 | 2013-05-07 | Google Inc. | Emulation of abstracted DIMMs using abstracted DRAMs |
US20080304481A1 (en) * | 2005-07-12 | 2008-12-11 | Paul Thomas Gurney | System and Method of Offloading Protocol Functions |
US20070016906A1 (en) | 2005-07-18 | 2007-01-18 | Mistletoe Technologies, Inc. | Efficient hardware allocation of processes to processors |
US7442050B1 (en) | 2005-08-29 | 2008-10-28 | Netlist, Inc. | Circuit card with flexible connection for memory module with heat spreader |
US7650557B2 (en) * | 2005-09-19 | 2010-01-19 | Network Appliance, Inc. | Memory scrubbing of expanded memory |
US8862783B2 (en) | 2005-10-25 | 2014-10-14 | Broadbus Technologies, Inc. | Methods and system to offload data processing tasks |
US7899864B2 (en) | 2005-11-01 | 2011-03-01 | Microsoft Corporation | Multi-user terminal services accelerator |
US8225297B2 (en) | 2005-12-07 | 2012-07-17 | Microsoft Corporation | Cache metadata identifiers for isolation and sharing |
US7904688B1 (en) * | 2005-12-21 | 2011-03-08 | Trend Micro Inc | Memory management unit for field programmable gate array boards |
US20070150671A1 (en) * | 2005-12-23 | 2007-06-28 | Boston Circuits, Inc. | Supporting macro memory instructions |
WO2007084422A2 (en) | 2006-01-13 | 2007-07-26 | Sun Microsystems, Inc. | Modular blade server |
US7619893B1 (en) | 2006-02-17 | 2009-11-17 | Netlist, Inc. | Heat spreader for electronic modules |
US20070226745A1 (en) | 2006-02-28 | 2007-09-27 | International Business Machines Corporation | Method and system for processing a service request |
US7421552B2 (en) | 2006-03-17 | 2008-09-02 | Emc Corporation | Techniques for managing data within a data storage system utilizing a flash-based memory vault |
US7434002B1 (en) | 2006-04-24 | 2008-10-07 | Vmware, Inc. | Utilizing cache information to manage memory access and cache utilization |
JP2007299279A (en) * | 2006-05-01 | 2007-11-15 | Toshiba Corp | Arithmetic device, processor system, and video processor |
US7716411B2 (en) | 2006-06-07 | 2010-05-11 | Microsoft Corporation | Hybrid memory device with single interface |
US8948166B2 (en) * | 2006-06-14 | 2015-02-03 | Hewlett-Packard Development Company, L.P. | System of implementing switch devices in a server system |
WO2007147170A2 (en) | 2006-06-16 | 2007-12-21 | Bittorrent, Inc. | Classification and verification of static file transfer protocols |
US7636800B2 (en) | 2006-06-27 | 2009-12-22 | International Business Machines Corporation | Method and system for memory address translation and pinning |
US8943245B2 (en) | 2006-09-28 | 2015-01-27 | Virident Systems, Inc. | Non-volatile type memory modules for main memory |
US20080082750A1 (en) | 2006-09-28 | 2008-04-03 | Okin Kenneth A | Methods of communicating to, memory modules in a memory channel |
US8074022B2 (en) | 2006-09-28 | 2011-12-06 | Virident Systems, Inc. | Programmable heterogeneous memory controllers for main memory with different memory modules |
US8189328B2 (en) | 2006-10-23 | 2012-05-29 | Virident Systems, Inc. | Methods and apparatus of dual inline memory modules for flash memory |
US7774556B2 (en) | 2006-11-04 | 2010-08-10 | Virident Systems Inc. | Asymmetric memory migration in hybrid main memory |
US8447957B1 (en) * | 2006-11-14 | 2013-05-21 | Xilinx, Inc. | Coprocessor interface architecture and methods of operating the same |
US8149834B1 (en) | 2007-01-25 | 2012-04-03 | World Wide Packets, Inc. | Forwarding a packet to a port from which the packet is received and transmitting modified, duplicated packets on a single port |
US20080215996A1 (en) * | 2007-02-22 | 2008-09-04 | Chad Farrell Media, Llc | Website/Web Client System for Presenting Multi-Dimensional Content |
US20080229049A1 (en) | 2007-03-16 | 2008-09-18 | Ashwini Kumar Nanda | Processor card for blade server and process. |
US8924680B2 (en) | 2007-04-12 | 2014-12-30 | Rambus Inc. | Memory controllers, systems, and methods supporting multiple request modes |
US8904098B2 (en) | 2007-06-01 | 2014-12-02 | Netlist, Inc. | Redundant backup using non-volatile memory |
US8301833B1 (en) | 2007-06-01 | 2012-10-30 | Netlist, Inc. | Non-volatile memory module |
US8874831B2 (en) | 2007-06-01 | 2014-10-28 | Netlist, Inc. | Flash-DRAM hybrid memory module |
US8347005B2 (en) | 2007-07-31 | 2013-01-01 | Hewlett-Packard Development Company, L.P. | Memory controller with multi-protocol interface |
US7840748B2 (en) | 2007-08-31 | 2010-11-23 | International Business Machines Corporation | Buffered memory module with multiple memory device data interface ports supporting double the memory capacity |
US7949683B2 (en) | 2007-11-27 | 2011-05-24 | Cavium Networks, Inc. | Method and apparatus for traversing a compressed deterministic finite automata (DFA) graph |
US8862706B2 (en) | 2007-12-14 | 2014-10-14 | Nant Holdings Ip, Llc | Hybrid transport—application network fabric apparatus |
US8990799B1 (en) | 2008-01-30 | 2015-03-24 | Emc Corporation | Direct memory access through virtual switch in device driver |
US7965714B2 (en) * | 2008-02-29 | 2011-06-21 | Oracle America, Inc. | Method and system for offloading network processing |
JP5186982B2 (en) * | 2008-04-02 | 2013-04-24 | Fujitsu Limited | Data management method and switch device |
US20110235260A1 (en) | 2008-04-09 | 2011-09-29 | Apacer Technology Inc. | Dram module with solid state disk |
US8154901B1 (en) | 2008-04-14 | 2012-04-10 | Netlist, Inc. | Circuit providing load isolation and noise reduction |
US8516185B2 (en) | 2009-07-16 | 2013-08-20 | Netlist, Inc. | System and method utilizing distributed byte-wise buffers on a memory module |
US8001434B1 (en) | 2008-04-14 | 2011-08-16 | Netlist, Inc. | Memory board with self-testing capability |
US8417870B2 (en) | 2009-07-16 | 2013-04-09 | Netlist, Inc. | System and method of increasing addressable memory space on a memory board |
US8787060B2 (en) | 2010-11-03 | 2014-07-22 | Netlist, Inc. | Method and apparatus for optimizing driver load in a memory package |
CN102037689B (en) | 2008-05-22 | 2014-04-09 | Nokia Siemens Networks | Adaptive scheduler for communication systems apparatus, system and method |
US8190699B2 (en) | 2008-07-28 | 2012-05-29 | Crossfield Technology LLC | System and method of multi-path data communications |
US20100031253A1 (en) * | 2008-07-29 | 2010-02-04 | Electronic Data Systems Corporation | System and method for a virtualization infrastructure management environment |
US20100031235A1 (en) | 2008-08-01 | 2010-02-04 | Modular Mining Systems, Inc. | Resource Double Lookup Framework |
US7886103B2 (en) | 2008-09-08 | 2011-02-08 | Cisco Technology, Inc. | Input-output module, processing platform and method for extending a memory interface for input-output operations |
US9043450B2 (en) * | 2008-10-15 | 2015-05-26 | Broadcom Corporation | Generic offload architecture |
US8054832B1 (en) | 2008-12-30 | 2011-11-08 | Juniper Networks, Inc. | Methods and apparatus for routing between virtual resources based on a routing location policy |
US9104406B2 (en) * | 2009-01-07 | 2015-08-11 | Microsoft Technology Licensing, Llc | Network presence offloads to network interface |
US8352710B2 (en) * | 2009-01-19 | 2013-01-08 | International Business Machines Corporation | Off-loading of processing from a processor blade to storage blades |
US20100183033A1 (en) | 2009-01-20 | 2010-07-22 | Nokia Corporation | Method and apparatus for encapsulation of scalable media |
US8498349B2 (en) | 2009-03-11 | 2013-07-30 | Texas Instruments Incorporated | Demodulation and decoding for frequency modulation (FM) receivers with radio data system (RDS) or radio broadcast data system (RBDS) |
US8200800B2 (en) * | 2009-03-12 | 2012-06-12 | International Business Machines Corporation | Remotely administering a server |
US8264903B1 (en) | 2009-05-05 | 2012-09-11 | Netlist, Inc. | Systems and methods for refreshing a memory module |
US8489837B1 (en) | 2009-06-12 | 2013-07-16 | Netlist, Inc. | Systems and methods for handshaking with a memory module |
US9128632B2 (en) | 2009-07-16 | 2015-09-08 | Netlist, Inc. | Memory module with distributed data buffers and method of operation |
US9535849B2 (en) | 2009-07-24 | 2017-01-03 | Advanced Micro Devices, Inc. | IOMMU using two-level address translation for I/O and computation offload devices on a peripheral interconnect |
US20110035540A1 (en) * | 2009-08-10 | 2011-02-10 | Adtron, Inc. | Flash blade system architecture and method |
US8848513B2 (en) | 2009-09-02 | 2014-09-30 | Qualcomm Incorporated | Seamless overlay connectivity using multi-homed overlay neighborhoods |
US9876735B2 (en) | 2009-10-30 | 2018-01-23 | Iii Holdings 2, Llc | Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect |
US8442048B2 (en) * | 2009-11-04 | 2013-05-14 | Juniper Networks, Inc. | Methods and apparatus for configuring a virtual network switch |
US9110860B2 (en) * | 2009-11-11 | 2015-08-18 | Mellanox Technologies Tlv Ltd. | Topology-aware fabric-based offloading of collective functions |
WO2011068091A1 (en) * | 2009-12-04 | 2011-06-09 | NEC Corporation | Server and flow control program |
US9389895B2 (en) | 2009-12-17 | 2016-07-12 | Microsoft Technology Licensing, Llc | Virtual storage target offload techniques |
US9390035B2 (en) | 2009-12-21 | 2016-07-12 | Sanmina-Sci Corporation | Method and apparatus for supporting storage modules in standard memory and/or hybrid memory bus architectures |
US8473695B2 (en) | 2011-03-31 | 2013-06-25 | Mosys, Inc. | Memory system including variable write command scheduling |
EP2363812B1 (en) | 2010-03-04 | 2018-02-28 | Karlsruher Institut für Technologie | Reconfigurable processor architecture |
JP2013527516A (en) | 2010-03-26 | 2013-06-27 | VirtualMetrix, Inc. | Fine-grained performance resource management for computer systems |
CN101794271B (en) | 2010-03-31 | 2012-05-23 | Huawei Technologies Co., Ltd. | Implementation method and device of consistency of multi-core internal memory |
US8601498B2 (en) * | 2010-05-28 | 2013-12-03 | Security First Corp. | Accelerator system for use with secure data storage |
US20120324068A1 (en) * | 2011-06-17 | 2012-12-20 | Microsoft Corporation | Direct networking for multi-server units |
US8631271B2 (en) | 2010-06-24 | 2014-01-14 | International Business Machines Corporation | Heterogeneous recovery in a redundant memory system |
US8468151B2 (en) | 2010-06-29 | 2013-06-18 | Teradata Us, Inc. | Methods and systems for hardware acceleration of database operations and queries based on multiple hardware accelerators |
US9118591B2 (en) | 2010-07-30 | 2015-08-25 | Broadcom Corporation | Distributed switch domain of heterogeneous components |
US8386887B2 (en) | 2010-09-24 | 2013-02-26 | Texas Memory Systems, Inc. | High-speed memory system |
US8483046B2 (en) * | 2010-09-29 | 2013-07-09 | International Business Machines Corporation | Virtual switch interconnect for hybrid enterprise servers |
US8405668B2 (en) | 2010-11-19 | 2013-03-26 | Apple Inc. | Streaming translation in display pipe |
US8499222B2 (en) * | 2010-12-14 | 2013-07-30 | Microsoft Corporation | Supporting distributed key-based processes |
US20120239874A1 (en) | 2011-03-02 | 2012-09-20 | Netlist, Inc. | Method and system for resolving interoperability of multiple types of dual in-line memory modules |
US8885334B1 (en) * | 2011-03-10 | 2014-11-11 | Xilinx, Inc. | Computing system with network attached processors |
US8774213B2 (en) | 2011-03-30 | 2014-07-08 | Amazon Technologies, Inc. | Frameworks and interfaces for offload device-based packet processing |
US8825900B1 (en) | 2011-04-05 | 2014-09-02 | Nicira, Inc. | Method and apparatus for stateless transport layer tunneling |
US8930647B1 (en) | 2011-04-06 | 2015-01-06 | P4tents1, LLC | Multiple class memory systems |
WO2012141694A1 (en) | 2011-04-13 | 2012-10-18 | Hewlett-Packard Development Company, L.P. | Input/output processing |
US8442056B2 (en) | 2011-06-28 | 2013-05-14 | Marvell International Ltd. | Scheduling packets in a packet-processing pipeline |
US20130019057A1 (en) | 2011-07-15 | 2013-01-17 | Violin Memory, Inc. | Flash disk array and controller |
BR112014006948A2 (en) * | 2011-07-25 | 2017-06-13 | Servergy Inc | low power general purpose computer server system |
US8767463B2 (en) | 2011-08-11 | 2014-07-01 | Smart Modular Technologies, Inc. | Non-volatile dynamic random access memory system with non-delay-lock-loop mechanism and method of operation thereof |
US9424188B2 (en) | 2011-11-23 | 2016-08-23 | Smart Modular Technologies, Inc. | Non-volatile memory packaging system with caching and method of operation thereof |
EP2798804A4 (en) * | 2011-12-26 | 2015-09-23 | Intel Corp | Direct link synchronization communication between co-processors |
US8918634B2 (en) * | 2012-02-21 | 2014-12-23 | International Business Machines Corporation | Network node with network-attached stateless security offload device employing out-of-band processing |
WO2013128494A1 (en) * | 2012-03-02 | 2013-09-06 | Hitachi, Ltd. | Storage system and data transfer control method |
US9513845B2 (en) | 2012-03-30 | 2016-12-06 | Violin Memory Inc. | Memory module virtualization |
US20130318269A1 (en) | 2012-05-22 | 2013-11-28 | Xockets IP, LLC | Processing structured and unstructured data using offload processors |
WO2013177313A2 (en) | 2012-05-22 | 2013-11-28 | Xockets IP, LLC | Processing structured and unstructured data using offload processors |
US9268716B2 (en) | 2012-10-19 | 2016-02-23 | Yahoo! Inc. | Writing data from hadoop to off grid storage |
US20140157287A1 (en) | 2012-11-30 | 2014-06-05 | Advanced Micro Devices, Inc | Optimized Context Switching for Long-Running Processes |
WO2014105650A1 (en) | 2012-12-26 | 2014-07-03 | Cortina Systems, Inc. | Communication traffic processing architectures and methods |
KR20160037827A (en) | 2013-01-17 | 2016-04-06 | Xockets, Inc. | Offload processor modules for connection to system memory |
US10031820B2 (en) | 2013-01-17 | 2018-07-24 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Mirroring high performance and high availablity applications across server computers |
US9378161B1 (en) | 2013-01-17 | 2016-06-28 | Xockets, Inc. | Full bandwidth packet handling with server systems including offload processors |
US10372551B2 (en) | 2013-03-15 | 2019-08-06 | Netlist, Inc. | Hybrid memory system with configurable error thresholds and failure analysis capability |
US9792154B2 (en) | 2015-04-17 | 2017-10-17 | Microsoft Technology Licensing, Llc | Data processing system having a hardware acceleration plane and a software plane |
2013
- 2013-05-22 WO PCT/US2013/042279 (WO2013177313A2): Application Filing
- 2013-05-22 US 13/900,222 (US9619406B2): Expired - Fee Related
- 2013-05-22 US 13/900,346 (US9665503B2): Expired - Fee Related
- 2013-05-22 US 13/900,241 (US20130318276A1): Abandoned
- 2013-05-22 US 13/900,251 (US9495308B2): Active - Reinstated
- 2013-05-22 US 13/900,351 (US9258276B2): Active - Reinstated
- 2013-05-22 US 13/900,303 (US20130318084A1): Abandoned
- 2013-05-22 WO PCT/US2013/042274 (WO2013177310A2): Application Filing
- 2013-05-22 WO PCT/US2013/042284 (WO2013177316A2): Application Filing
- 2013-05-22 US 13/900,295 (US20130318119A1): Abandoned
2016
- 2016-12-30 US 15/396,334 (US11080209B2): Active
- 2016-12-30 US 15/396,328 (US10223297B2): Active - Reinstated
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030002261A1 (en) * | 2001-06-29 | 2003-01-02 | Intel Corporation | Rack-mounted server and associated methods |
US20030229626A1 (en) * | 2002-06-05 | 2003-12-11 | Microsoft Corporation | Performant and scalable merge strategy for text indexing |
US20050187944A1 (en) * | 2004-02-10 | 2005-08-25 | Microsoft Corporation | Systems and methods for a database engine in-process data provider |
US20060036574A1 (en) * | 2004-08-11 | 2006-02-16 | Rainer Schweigkoffer | System and method for an optimistic database access |
US20070032920A1 (en) * | 2005-07-25 | 2007-02-08 | Lockheed Martin Corporation | System for controlling unmanned vehicles |
US20080027920A1 (en) * | 2006-07-26 | 2008-01-31 | Microsoft Corporation | Data processing over very large databases |
US20090077478A1 (en) * | 2007-09-18 | 2009-03-19 | International Business Machines Corporation | Arrangements for managing processing components using a graphical user interface |
US20090276528A1 (en) * | 2008-05-05 | 2009-11-05 | William Thomas Pienta | Methods to Optimally Allocating the Computer Server Load Based on the Suitability of Environmental Conditions |
US20120259801A1 (en) * | 2011-04-06 | 2012-10-11 | Microsoft Corporation | Transfer of learning for query classification |
US20130179435A1 (en) * | 2012-01-06 | 2013-07-11 | Ralph Stadter | Layout-Driven Data Selection and Reporting |
US20130268354A1 (en) * | 2012-04-09 | 2013-10-10 | Ranjith Jayaram | Selecting Content Items for Display in a Content Stream |
US20130290462A1 (en) * | 2012-04-27 | 2013-10-31 | Kevin T. Lim | Data caching using local and remote memory |
US20130290976A1 (en) * | 2012-04-30 | 2013-10-31 | Ludmila Cherkasova | Scheduling mapreduce job sets |
US20130297624A1 (en) * | 2012-05-07 | 2013-11-07 | Microsoft Corporation | Interoperability between Map-Reduce and Distributed Array Runtimes |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015153699A1 (en) * | 2014-03-31 | 2015-10-08 | Xockets, Llc | Computing systems, elements and methods for processing unstructured data |
US10055156B2 (en) | 2014-11-20 | 2018-08-21 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US10055574B2 (en) | 2014-11-20 | 2018-08-21 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US9898599B2 (en) | 2014-11-20 | 2018-02-20 | International Business Machines Corporation | Implementing extent granularity authorization and deauthorization processing in CAPI adapters |
US9891852B2 (en) | 2014-11-20 | 2018-02-13 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US10169605B2 (en) | 2014-11-20 | 2019-01-01 | International Business Machines Corporation | Implementing block device extent granularity authorization model processing in CAPI adapters |
US9582659B2 (en) | 2014-11-20 | 2017-02-28 | International Business Machines Corporation | Implementing extent granularity authorization and deauthorization processing in CAPI adapters |
US9582651B2 (en) | 2014-11-20 | 2017-02-28 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US9594710B2 (en) | 2014-11-20 | 2017-03-14 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US20160147984A1 (en) * | 2014-11-20 | 2016-05-26 | International Business Machines Corporation | Implementing extent granularity authorization initialization processing in CAPI adapters |
US9600642B2 (en) | 2014-11-20 | 2017-03-21 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US9600654B2 (en) | 2014-11-20 | 2017-03-21 | International Business Machines Corporation | Implementing extent granularity authorization and deauthorization processing in CAPI adapters |
US9600428B2 (en) | 2014-11-20 | 2017-03-21 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US9697370B2 (en) | 2014-11-20 | 2017-07-04 | International Business Machines Corporation | Implementing and processing extent granularity authorization mechanism in CAPI adapters |
US9703972B2 (en) | 2014-11-20 | 2017-07-11 | International Business Machines Corporation | Implementing and processing extent granularity authorization mechanism in CAPI adapters |
US9710624B2 (en) | 2014-11-20 | 2017-07-18 | International Business Machines Corporation | Implementing extent granularity authorization initialization processing in CAPI adapters |
US9767261B2 (en) * | 2014-11-20 | 2017-09-19 | International Business Machines Corporation | Implementing extent granularity authorization initialization processing in CAPI adapters |
US9858443B2 (en) | 2014-11-20 | 2018-01-02 | International Business Machines Corporation | Implementing block device extent granularity authorization model processing in CAPI adapters |
US9886575B2 (en) | 2014-11-20 | 2018-02-06 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US10055573B2 (en) | 2014-11-20 | 2018-08-21 | International Business Machines Corporation | Implementing extent granularity authorization and deauthorization processing in CAPI adapters |
US10055606B2 (en) | 2014-11-20 | 2018-08-21 | International Business Machines Corporation | Implementing block device extent granularity authorization model processing in CAPI adapters |
US9911000B2 (en) | 2014-11-20 | 2018-03-06 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US9904795B2 (en) | 2014-11-20 | 2018-02-27 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US10043028B2 (en) | 2014-11-20 | 2018-08-07 | International Business Machines Corporation | Implementing extent granularity authorization processing in CAPI adapters |
US10013572B2 (en) | 2014-11-20 | 2018-07-03 | International Business Machines Corporation | Implementing extent granularity authorization command flow processing in CAPI adapters |
US9923726B2 (en) | 2014-12-03 | 2018-03-20 | International Business Machines Corporation | RDMA transfers in mapreduce frameworks |
US9594696B1 (en) * | 2014-12-09 | 2017-03-14 | Parallel Machines Ltd. | Systems and methods for automatic generation of parallel data processing code |
WO2016123042A1 (en) * | 2015-01-26 | 2016-08-04 | Dragonfly Data Factory Llc | Data factory platform and operating system |
CN104735087A (en) * | 2015-04-16 | 2015-06-24 | State Grid Corporation of China | Public key algorithm and SSL (security socket layer) protocol based method of optimizing security of multi-cluster Hadoop system |
US10104171B1 (en) * | 2015-11-25 | 2018-10-16 | EMC IP Holding Company LLC | Server architecture having dedicated compute resources for processing infrastructure-related workloads |
US10873630B2 (en) * | 2015-11-25 | 2020-12-22 | EMC IP Holding Company LLC | Server architecture having dedicated compute resources for processing infrastructure-related workloads |
US20190007483A1 (en) * | 2015-11-25 | 2019-01-03 | EMC IP Holding Company LLC | Server architecture having dedicated compute resources for processing infrastructure-related workloads |
CN105739965A (en) * | 2016-01-18 | 2016-07-06 | Shenzhen Institutes of Advanced Technology | Method for assembling ARM (Acorn RISC Machine) mobile phone cluster based on RDMA (Remote Direct Memory Access) |
CN106022080A (en) * | 2016-06-30 | 2016-10-12 | Beijing Sansec Technology Development Co., Ltd. | Cipher card based on PCIe (peripheral component interface express) interface and data encryption method of cipher card |
CN109213737A (en) * | 2018-09-17 | 2019-01-15 | Zhengzhou Yunhai Information Technology Co., Ltd. | A kind of data compression method and apparatus |
US11953997B2 (en) * | 2018-10-23 | 2024-04-09 | Capital One Services, Llc | Systems and methods for cross-regional back up of distributed databases on a cloud service |
CN110061999A (en) * | 2019-04-28 | 2019-07-26 | East China Normal University | A kind of network data security analysis ancillary equipment based on ZYNQ |
CN111159088A (en) * | 2019-11-29 | 2020-05-15 | No. 709 Research Institute of China Shipbuilding Industry Corporation | IIC bus communication method and system based on heterogeneous multi-core processor |
Also Published As
Publication number | Publication date |
---|---|
WO2013177310A2 (en) | 2013-11-28 |
US11080209B2 (en) | 2021-08-03 |
US10223297B2 (en) | 2019-03-05 |
WO2013177310A3 (en) | 2014-03-13 |
WO2013177313A2 (en) | 2013-11-28 |
WO2013177316A3 (en) | 2014-01-30 |
US20130318119A1 (en) | 2013-11-28 |
US9665503B2 (en) | 2017-05-30 |
US20170237624A1 (en) | 2017-08-17 |
US20130346469A1 (en) | 2013-12-26 |
WO2013177313A3 (en) | 2014-03-20 |
US9495308B2 (en) | 2016-11-15 |
US20140157396A1 (en) | 2014-06-05 |
US20170237714A1 (en) | 2017-08-17 |
US9619406B2 (en) | 2017-04-11 |
US20130318276A1 (en) | 2013-11-28 |
US9258276B2 (en) | 2016-02-09 |
US20130347110A1 (en) | 2013-12-26 |
WO2013177316A2 (en) | 2013-11-28 |
US20130318275A1 (en) | 2013-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20130318084A1 (en) | | Processing structured and unstructured data using offload processors |
US10212092B2 (en) | | Architectures and methods for processing data in parallel using offload processing modules insertable into servers |
US9250954B2 (en) | | Offload processor modules for connection to system memory, and corresponding methods and systems |
US9256633B2 (en) | | Partitioning data for parallel processing |
Huang et al. | | High-performance design of HBase with RDMA over InfiniBand |
US10248346B2 (en) | | Modular architecture for extreme-scale distributed processing applications |
US20150127691A1 (en) | | Efficient implementations for MapReduce systems |
US10318346B1 (en) | | Prioritized scheduling of data store access requests |
Yu et al. | | Design and evaluation of network-levitated merge for Hadoop acceleration |
US20240259322A1 (en) | | Systems, devices and methods with offload processing devices |
Zhang et al. | | Design and implementation of a real-time interactive analytics system for large spatio-temporal data |
Lee et al. | | Excavating the hidden parallelism inside DRAM architectures with buffered compares |
US20240070107A1 (en) | | Memory device with embedded deep learning accelerator in multi-client environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: XOCKETS IP, LLC, DELAWARE. Free format text: NUNC PRO TUNC ASSIGNMENT; ASSIGNOR: DALAL, PARIN BHADRIK; REEL/FRAME: 036663/0462. Effective date: 20150908. Owner name: XOCKETS, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: XOCKETS IP, LLC; REEL/FRAME: 036663/0483. Effective date: 20150719. |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |