CN118715528A

CN118715528A - Apparatus and method for embedding data in genetic material

Info

Publication number: CN118715528A
Application number: CN202280086516.5A
Authority: CN
Inventors: A·M·安东涅维奇
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-12-31
Filing date: 2022-12-22
Publication date: 2024-09-27
Also published as: EP4457707A1; TW202336774A; WO2023129469A1

Abstract

Methods, systems, and devices for encoding data for storage in genetic material are disclosed. For example, the computing system may segment the user data into a plurality of data blocks and generate seed data characterizing a plurality of fountain code seeds. Additionally, the computing system may implement, for each data block, a set of operations that generate one or more data packets. In some examples, the set of operations may include, for each of the plurality of fountain code seeds, determining a bit value and a corresponding element code value, and determining which of the fountain code seeds has an element code value of the bit value that matches the value of the bit location identified in the metadata. Further, the computing system may, for each data packet, cause a second set of operations to be performed that synthesizes a polynucleotide chain based at least on bit values of the corresponding data packet.

Description

Apparatus and method for embedding data in genetic material

Cross Reference to Related Applications

The present application claims priority from U.S. provisional patent application No. 63/295,756, filed on December 31, 2021. The disclosure of the provisional application is expressly incorporated herein by reference in its entirety.

Technical Field

The present disclosure generally relates to encoding and decoding of data.

Background

In some examples, a computing system may encode data (such as a user's profile) for efficient transmission or storage. In such examples, the computing system may encode the data by changing or altering the data to a format that is different from the original format of the data. Additionally, such computing systems may decode encoded data or convert the encoded data back to the original format of the data.

Disclosure of Invention

According to one aspect, a computing system may include a non-transitory machine-readable storage medium storing instructions, and at least one processor coupled to the non-transitory computer-readable storage medium. The at least one processor may be configured to segment user data into a plurality of data blocks. In some examples, each data block may include metadata. Additionally, the at least one processor may be configured to generate seed data. In some examples, the seed data characterizes a plurality of fountain code seeds. Further, the at least one processor may be configured to perform a first set of operations that generate one or more data packets for each of the plurality of data blocks. In some examples, the set of operations may include determining a bit value that identifies a bit position in the metadata and an element code value that identifies and characterizes information conveyed by the corresponding bit value, and determining which of the plurality of fountain code seeds has an element code value of the bit value that matches the value of the bit position identified in the metadata, each of the one or more data packets being associated with a fountain code seed of the plurality of fountain code seeds that has the element code value of the bit value that matches the value of the bit position identified in the associated metadata. Further, the at least one processor may be configured to, for each data packet, cause a second set of operations to be performed that synthesizes a polynucleotide chain based at least on the bit values of the corresponding data packet.

According to another aspect, a non-transitory machine-readable storage medium may store instructions that, when executed by at least one processor of a server, cause the at least one processor to perform operations comprising segmenting user data into a plurality of data blocks. In some examples, each data block may include metadata. Additionally, the at least one processor may perform operations comprising generating seed data. In some examples, the seed data may characterize a plurality of fountain code seeds. Further, the at least one processor may perform operations comprising performing a first set of operations to generate one or more data packets for each of the plurality of data blocks. In some examples, the first set of operations may include determining a bit value that identifies a bit position in the metadata and an element code value that identifies and characterizes information conveyed by the corresponding bit value, and determining which of the plurality of fountain code seeds has an element code value of the bit value that matches the value of the bit position identified in the metadata, each of the one or more data packets being associated with a fountain code seed of the plurality of fountain code seeds that has the element code value of the bit value that matches the value of the bit position identified in the associated metadata. Further, the at least one processor may perform operations comprising, for each data packet, causing a second set of operations to be performed that synthesizes a polynucleotide chain based at least on bit values of the corresponding data packet.

According to another aspect, a method may include segmenting user data into a plurality of data blocks. In some examples, each data block may include metadata. Additionally, the method may include generating seed data. In some examples, the seed data may characterize a plurality of fountain code seeds. Further, the method may include performing, for each of the plurality of data blocks, a first set of operations that produce one or more data packets. In some examples, the set of operations may include determining a bit value that identifies a bit position in the metadata and an element code value that identifies and characterizes information conveyed by the corresponding bit value, and determining which of the plurality of fountain code seeds has an element code value of the bit value that matches the value of the bit position identified in the metadata, each of the one or more data packets being associated with a fountain code seed of the plurality of fountain code seeds that has the element code value of the bit value that matches the value of the bit position identified in the associated metadata. Furthermore, the method may include, for each data packet, causing a second set of operations to be performed that synthesizes a polynucleotide chain based at least on bit values of the corresponding data packet.

Drawings

FIG. 1 is a block diagram of an exemplary computing environment, according to some example embodiments;

FIGS. 2-6 are block diagrams illustrating a portion of an exemplary computing environment according to some example embodiments;

FIG. 7 is a flow diagram of an exemplary process 700 for monitoring digital assets associated with a distributed ledger.

Like reference symbols and designations in the various drawings indicate like elements.

Detailed Description

While the features, methods, apparatus and systems described herein may be embodied in various forms, some exemplary and non-limiting embodiments are shown in the drawings and described below. Some of the components described in this disclosure are optional, and some implementations may include components other than, or fewer than, the components explicitly described in this disclosure.

Embodiments described herein are directed to a computing environment including a computing system configured to encode data with Fountain Code (FC) programs for storage in genetic material (such as DNA/RNA). Additionally, the computing system may be configured to decode data previously stored in genetic material (such as DNA/RNA) based on the FC programs.

A. Exemplary Computing Environment

FIG. 1 illustrates a block diagram of an example computing environment 100 that includes, among other things, one or more computing systems, such as an encoder-decoder (ED) computing system 110 and a genetic computing system 120, and one or more devices, including one or more client devices 101, such as client device 101A, client device 101B, and client device 101C. Each of the one or more computing systems (such as ED computing system 110 and genetic computing system 120) and the one or more client devices 101 may be operatively connected to, and interconnected across, one or more communication networks (such as communication network 130). Examples of communication network 130 include, but are not limited to, a wireless Local Area Network (LAN) (e.g., a "Wi-Fi" network), a network utilizing Radio Frequency (RF) communication protocols, a Near Field Communication (NFC) network, a wireless Metropolitan Area Network (MAN) connecting a plurality of wireless LANs, and a Wide Area Network (WAN) (e.g., the Internet). In some examples, computing devices and computing systems operating within computing environment 100 may perform operations to establish and maintain one or more secure communication channels across communication network 130, such as, but not limited to, a Transport Layer Security (TLS) channel, a Secure Sockets Layer (SSL) channel, or any other suitable secure communication channel.

As described herein, one or more client devices 101 (such as client device 101A) may each transmit a user profile or user data to ED computing system 110. Furthermore, as described herein, ED computing system 110 may implement operations that encode data for storage in genetic material, such as DNA/RNA, using Fountain Code (FC) programs, and in some examples, may decode data previously stored in such genetic material based on the FC programs. In addition, one or more client devices 101 (such as client device 101A) may include a computing device having one or more tangible, non-transitory memories (such as memory 102) that store software instructions and one or more processors (such as processor 105) configured to execute the software instructions. In some aspects, the one or more tangible, non-transitory memories may store software applications, application modules, and other elements of code executable by the one or more processors, such as, but not limited to, an executable web browser (e.g., Google Chrome™, Apple Safari™, etc.), and additionally or alternatively, an executable application (e.g., application 104) associated with a computing system (such as ED computing system 110). In some examples not shown in FIG. 1, memory 102 may also include one or more structured or unstructured data repositories or databases, and one or more client devices 101 may maintain one or more elements of device data and location data within the one or more structured or unstructured data repositories or databases. For example, elements of the device data may uniquely identify the client device 101 within the computing environment 100 and may include, but are not limited to, an Internet Protocol (IP) address or a Media Access Control (MAC) address assigned to the client device 101.

Further, one or more client devices 101 (such as client device 101A) may also include a display unit 106A configured to present interface elements to a corresponding user and an input unit 106B configured to receive input from the user. For example, the input unit 106B may be configured to receive input from a user in response to interface elements presented through the display unit 106A. By way of example, the display unit 106A may include, but is not limited to, an LCD display unit or other suitable type of display unit, and the input unit 106B may include, but is not limited to, a keypad, keyboard, touch screen, voice-activated control technology, or another suitable type of input unit. Moreover, in additional aspects (not shown in FIG. 1), the functionality of the display unit 106A and the input unit 106B may be combined into a single device, such as a pressure-sensitive touch screen display unit that presents interface elements and receives input from a user of the client device 101 (such as the client device 101A). The one or more client devices 101 may also include a communication interface 106C, such as a wireless transceiver device, coupled to the processor 105 and configured by the processor 105 to establish and maintain communication with the communication network 130 via one or more communication protocols (such as NFC), cellular communication protocols, or any other suitable communication protocol.

Examples of one or more client devices 101 may include, but are not limited to, personal computers, laptop computers, tablet computers, notebook computers, handheld computers, personal digital assistants, portable navigation devices, mobile phones, smart phones, wearable computing devices (e.g., smart watches, wearable activity monitors, wearable smart jewelry, and glasses and other optical devices, including Optical Head Mounted Displays (OHMDs)), embedded computing devices (e.g., in communication with smart textiles or electronic fabrics), and any other type of computing device that may be configured to store data and software instructions, execute software instructions to perform operations, and/or display information on an interface device or unit (such as display unit 106A).

Referring back to FIG. 1, an encoder-decoder (ED) computing system 110 may represent a computing system that includes one or more servers, such as server 110A, and one or more tangible, non-transitory memory devices storing executable code, application engines, or application modules. Each of the one or more servers may include one or more processors that may be configured to execute stored code, application engines, or modules, or portions of application programs to perform operations consistent with the disclosed exemplary embodiments. For example, as shown in FIG. 1, one or more servers of ED computing system 110 may include server 110A having one or more processors configured to execute stored code, application engines, or portions of modules or applications maintained within one or more tangible, non-transitory memories.

In some examples, ED computing system 110 may correspond to a discrete computing system, although in other examples, ED computing system 110 may correspond to a distributed computing system having multiple computing components distributed across an appropriate computing network (such as communication network 130 of FIG. 1), or a computing system established and maintained by one or more cloud-based providers (such as Microsoft Azure™, Amazon Web Services™, or another third-party cloud service provider). In addition, ED computing system 110 may also include one or more communication interfaces (such as one or more wireless transceivers) coupled to the one or more processors to accommodate wired or wireless Internet communications across communication network 130 with other computing systems and devices (not shown in FIG. 1) operating within computing environment 100.

As described herein, the ED computing system 110 may perform any of the example processes described herein to encode data for storage in genetic material, such as DNA/RNA, using Fountain Code (FC) programs. Additionally, in some examples, ED computing system 110 may decode data previously stored in such genetic material based on the FC programs. To facilitate execution of these exemplary processes, ED computing system 110 may maintain, within one or more tangible, non-transitory memories, a data repository 111 including, but not limited to, user data database 112, metadata database 113, Fountain Code (FC) seed database 114, encoded data database 115A, decoded data database 115B, and mapping data database 116. User data database 112 may store user data received from one or more client devices 101. In some examples, user data database 112 may store one or more segments or data blocks of user data received from one or more client devices 101. In such examples, server 110A of ED computing system 110 may perform the processes described herein to segment user data into one or more segments or data blocks. In various examples, the one or more segments or data blocks may be non-overlapping and substantially equal in size (e.g., equal bit lengths).

In addition, metadata database 113 may store metadata generated by ED computing system 110. Each portion of metadata may identify and characterize information about a corresponding segment or data block stored in user data database 112. Examples of information for one or more segments or data blocks include an identifier associated with the corresponding segment or data block (e.g., a block identifier), an identifier associated with each data element included in the corresponding segment or data block (e.g., a bit identifier), and information or a value associated with each data element (such as "isZero", "isOne", or "noInfo"). In one example, data elements having a value representing the "isZero" or "isOne" state may convey a value of 0 or 1, respectively. In another example, data elements having a value representing the "noInfo" state may not convey any particular information. In some examples, the "noInfo" state may be used as a delimiter that separates multiple parameter values and as a filler for any metadata bits beyond those required for transmission.
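The three metadata states described above can be sketched as follows. This is a minimal illustration only: the function name `pack_parameters`, the (value, width) parameter layout, and the most-significant-bit-first ordering are assumptions made for the example and are not taken from this disclosure.

```python
IS_ZERO, IS_ONE, NO_INFO = "isZero", "isOne", "noInfo"

def pack_parameters(params, metadata_size):
    """Lay out parameter values as a fixed-size list of metadata states.

    params is a list of (value, width_in_bits) pairs (hypothetical layout).
    "noInfo" separates consecutive parameters and fills the unused tail.
    """
    states = []
    for index, (value, width) in enumerate(params):
        if value < 0 or value >= (1 << width):
            raise ValueError("value does not fit in the given width")
        if index > 0:
            states.append(NO_INFO)  # delimiter between parameter values
        bits = format(value, f"0{width}b")  # most significant bit first (assumption)
        states.extend(IS_ONE if bit == "1" else IS_ZERO for bit in bits)
    if len(states) > metadata_size:
        raise ValueError("parameters do not fit in the metadata")
    states.extend([NO_INFO] * (metadata_size - len(states)))  # filler
    return states

# Two hypothetical parameters: the value 5 in 3 bits and the value 2 in 2 bits.
layout = pack_parameters([(5, 3), (2, 2)], metadata_size=8)
```

Here `layout` holds three states for the first parameter, one "noInfo" delimiter, two states for the second parameter, and two "noInfo" fillers.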

Further, in some examples, the information of the one or more segments or data blocks may include hash values (e.g., hash values corresponding to data block identifiers) that identify and characterize the information of the corresponding segments or data blocks. Furthermore, the metadata may include encoding-decoding information that characterizes and identifies several encoding-decoding parameters. Each encoding-decoding parameter may characterize the characteristics of the encoding and decoding process for received user data, such as one or more segments or data blocks. In some examples, the encoding-decoding parameters may be based in part on and dependent on the size of the received or obtained user data.

Further, FC seed database 114 may store seed data generated by ED computing system 110. The seed data may identify and characterize a number of fountain code seeds. Additionally, the size of a fountain code seed may be a set or fixed number of values, such as 26 to 32 bits. Further, the size of the fountain code seed may be based on the size of the user data. Further, a particular fountain code seed may correspond to information sufficient to describe the content of the payload of a corresponding data packet, that is, the set of random data elements encoded in that payload, which ED computing system 110 may use when decoding the data packet. Further, ED computing system 110 may embed or include, in one or more or each of the fountain code seeds, information from the metadata of one or more segments or data blocks stored in metadata database 113. In some examples, the FC seed database 114 may store seed metadata or element code. In such examples, ED computing system 110 may perform operations that generate metadata or element code for fountain code seeds using one or more mixing functions, as described herein. The one or more mixing functions may be deterministic, producing the same result for a particular data packet regardless of the order in which data packets are processed. In addition, the one or more mixing functions may be unbiased and may have a very flat distribution over the entire set of results.

Further, encoded data database 115A may store one or more data packets of encoded received user data. In some examples, ED computing system 110 may encode the received data by applying a type of erasure code, such as a fountain code (e.g., a Luby transform code), to the received user data. In some examples, for each data block, ED computing system 110 may apply the fountain code to each data element of the corresponding data block, generate a set of random data elements, and encapsulate the set into one or more portions of a data packet. In these examples, the ED computing system 110 may combine the set of random data elements bitwise over a binary field. The combined set of random data elements may be the payload of the corresponding data packet and may include information necessary to describe the original user data when processed (such as decoded) with a sufficient number of other data packets. In addition, for each data packet, ED computing system 110 may include, within the corresponding data packet, a fountain code seed corresponding to the content of the payload.

As described herein, a fountain code seed may be a set of random values of fixed length. Additionally, the set of fixed-length random values may correspond to information sufficient to describe the content of the payload used by ED computing system 110 in decoding the data packets. Further, the fountain code seed may include information from the metadata of one or more segments or data blocks stored in the metadata database 113. Further, the data packet may be formatted such that the fountain code seed is either before or after the payload.

In some examples, encoded data database 115A may store encoding-decoding parameters that ED computing system 110 may utilize when encoding received data packets, as described herein. In such examples, the encoding-decoding parameters may indicate the size of the fountain code seed and/or the payload within the data packet. In various examples, the encoding-decoding parameters may indicate a format of the data packet (e.g., fountain code seed before or after the payload, a size or length of an Error Correction Code (ECC) in the data packet, etc.).
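A packet format governed by such encoding-decoding parameters might be sketched as below. The class and function names, the field sizes, and the big-endian seed layout are assumptions made for illustration, and the ECC field mentioned above is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PacketFormat:
    """Hypothetical encoding-decoding parameters for one data packet."""
    seed_bytes: int = 4      # e.g., room for a seed of up to 32 bits
    payload_bytes: int = 8   # illustrative payload size
    seed_first: bool = True  # seed before (True) or after (False) the payload

def build_packet(seed: int, payload: bytes, fmt: PacketFormat) -> bytes:
    """Concatenate the seed and the payload in the order the format dictates."""
    if len(payload) != fmt.payload_bytes:
        raise ValueError("payload length does not match the format")
    seed_field = seed.to_bytes(fmt.seed_bytes, "big")
    return seed_field + payload if fmt.seed_first else payload + seed_field

def parse_packet(packet: bytes, fmt: PacketFormat) -> tuple[int, bytes]:
    """Split a packet back into (seed, payload) using the same format."""
    if len(packet) != fmt.seed_bytes + fmt.payload_bytes:
        raise ValueError("packet length does not match the format")
    if fmt.seed_first:
        seed_field, payload = packet[:fmt.seed_bytes], packet[fmt.seed_bytes:]
    else:
        payload, seed_field = packet[:fmt.payload_bytes], packet[fmt.payload_bytes:]
    return int.from_bytes(seed_field, "big"), payload

trailing = PacketFormat(seed_first=False)
packet = build_packet(0x01ABCDEF, b"payload!", trailing)
seed, payload = parse_packet(packet, trailing)
```

Because the encoder and decoder share the same `PacketFormat`, the seed can be recovered from either position without any marker inside the packet.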

In addition, decoded data database 115B may store data corresponding to original user data determined from one or more data packets. In some examples, decoded data database 115B may include decoded block data. In such examples, ED computing system 110 may decode one or more data packets to reconstruct one or more data blocks of user data. In other examples, decoded data database 115B may include reconstructed user data corresponding to original user data received from user's client device 101. In such examples, ED computing system 110 may combine one or more decoded block data to generate or reconstruct original user data corresponding to the original data. In some examples, decoded data database 115B may store encoding-decoding parameters that ED computing system 110 may utilize in decoding encoded data packets, as described herein. In such examples, the encoding-decoding parameters may indicate the size of the fountain code seed and/or the payload within the data packet. In various examples, the encoding-decoding parameters may indicate a format of the data packet (e.g., fountain code seed before or after the payload, a size or length of ECC in the data packet, etc.).

In addition, the mapping data database 116 may store mapping data. The mapping data may identify particular bases and their corresponding bit pairs. The mapping data may be generated and modified by an operator of ED computing system 110. Examples of bit pairs and corresponding bases may include 00 = adenine, 01 = cytosine, 10 = guanine, and 11 = thymine. In some examples, mapping data database 116 may store sequence mapping data generated by ED computing system 110. The sequence mapping data may include data identifying a corresponding sequence of bases of the obtained data packet.
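The example mapping above (00 = adenine, 01 = cytosine, 10 = guanine, 11 = thymine) can be sketched as a pair of lookup tables. The function names `bytes_to_bases` and `bases_to_bytes` are hypothetical, and the sketch assumes the packet is read most significant bit first.

```python
# Bit pair to base, following the example mapping in the text.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def bytes_to_bases(packet: bytes) -> str:
    """Translate packet bytes into a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in packet)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def bases_to_bytes(sequence: str) -> bytes:
    """Invert bytes_to_bases; expects a sequence length that is a multiple of 4."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    bits = "".join(BITS_FOR_BASE[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

sequence = bytes_to_bases(b"\x1b")  # 0x1b is 00 01 10 11 in binary
```

Each byte yields four bases, so a packet of n bytes maps to a sequence of 4n bases under this mapping.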

Further and to facilitate execution of any of the example processes described herein, ED computing system 110 may include a server 110A that may maintain application repository 117 in one or more tangible, non-transitory memories. As shown in FIG. 1, the application repository 117 may include, among other things, a segmentation engine 117A, FC seed engine 117B, an encoding engine 117C, a sequencer engine 117D, and a decoding engine 117E. In some examples, the segmentation engine 117A may be executed by one or more processors of the server 110A to obtain user data from the client device 101 (such as the client device 101A operated by a user) and segment the user data into one or more segments or data blocks. For example, the executed segmentation engine 117A may receive user data from the client device 101A. In addition, the executed segmentation engine 117A may segment the user data into a plurality (e.g., 4-2048) of smaller data blocks that are approximately equal in size and do not overlap. In some examples, the executed segmentation engine 117A may generate a data store that stores one or more segments or data blocks within a corresponding portion of the data repository 111 (such as the user data database 112).
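The segmentation step described above can be sketched as follows. The function name `segment` and the zero-byte padding used to equalize block sizes are assumptions for illustration; the disclosure says only that the blocks are approximately equal in size and non-overlapping.

```python
def segment(user_data: bytes, num_blocks: int) -> list:
    """Split user_data into num_blocks non-overlapping blocks of equal length."""
    if not 4 <= num_blocks <= 2048:
        raise ValueError("num_blocks is outside the 4-2048 range given in the text")
    block_len = -(-len(user_data) // num_blocks)  # ceiling division
    # Zero padding of the tail is an assumption, used so all blocks match in size.
    padded = user_data.ljust(block_len * num_blocks, b"\x00")
    return [padded[i * block_len:(i + 1) * block_len] for i in range(num_blocks)]

blocks = segment(b"example user profile data", 4)
```

A real encoder would also need to record the original length (for example, in the metadata) so that the padding can be removed after decoding.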

In addition, the executed segmentation engine 117A may generate metadata for each data block or segment that identifies and characterizes information about the corresponding data block or segment. As described herein, examples of information for one or more segments or data blocks include an identifier associated with the corresponding segment or data block (e.g., a block identifier), an identifier associated with each data element included in the corresponding segment or data block (e.g., a bit identifier), information or values associated with each data element (e.g., "isZero", "isOne", or "noInfo"), and hash values (e.g., hash values corresponding to data block identifiers) that identify and characterize the information for the corresponding segment or data block. In addition, the metadata may include encoding-decoding information that characterizes and identifies several encoding-decoding parameters. Each encoding-decoding parameter may characterize the characteristics of the encoding and decoding process for received user data, such as one or more segments or data blocks. In some examples, the encoding-decoding parameters may be based in part on and dependent on the size of the received or obtained user data. In some examples, the executed segmentation engine 117A may generate metadata to be stored within a corresponding portion of the data repository 111 (such as the metadata database 113).

As shown in FIG. 1, a Fountain Code (FC) seed engine 117B may be executed by one or more processors of server 110A to generate seed data. As described herein, the seed data may identify and characterize a number of fountain code seeds. The executed FC seed engine 117B may implement a random generator program to generate each of the fountain code seeds included in the seed data. Each of the fountain code seeds may include a set or fixed number of random values, such as 26 to 32 bits. In some examples, the executed FC seed engine 117B may generate fountain code seeds based on the size of the user data. For example, the executed FC seed engine 117B may obtain, from the user data database 112, all segments or data blocks that make up the user data, or may obtain the entire user data prior to segmentation. In addition, the executed FC seed engine 117B may determine the size of the user data based on all of the segments or data blocks or on the user data itself. Based on the determined size of the user data, the executed FC seed engine 117B may generate a fountain code seed whose size corresponds to the size of the user data (e.g., the larger the user data, the larger the fountain code seed). In addition, the fountain code seed generated by the executed FC seed engine 117B may correspond to a particular data block and a set of data elements associated with the particular data block. In some examples, the set of random values included in a fountain code seed may identify the particular data block associated with the fountain code seed, the number of data elements included in the set of data elements associated with the particular data block, and which data elements are included in that set. In such examples, the executed FC seed engine 117B may generate such fountain code seeds based on portions of metadata stored in the metadata database 113 associated with the particular data block. As described herein, the fountain code seed may include information sufficient to describe the content of the corresponding data packet that the ED computing system 110 may use in decoding the corresponding data packet. In some examples, the executed FC seed engine 117B may generate seed data that includes one or more fountain code seeds generated by the executed FC seed engine 117B.

In other examples, the executed FC seed engine 117B may embed or include in the fountain code seed a portion of metadata stored in the metadata database 113 that characterizes and identifies a number of encoding-decoding parameters. In such examples, each encoding-decoding parameter may characterize the characteristics of the encoding and decoding process for the received user data (such as one or more segments or data blocks). As described herein, the encoding-decoding parameters may be based in part on and dependent on the size of the received or obtained user data.

In addition, the executed FC seed engine 117B may generate corresponding seed metadata or element code for each of the one or more fountain code seeds. In such examples, for each of the one or more fountain code seeds, the executed FC seed engine 117B may apply one or more mixing functions to the corresponding fountain code seed to generate the corresponding seed metadata or element code. As described herein, the one or more mixing functions may be deterministic, producing the same result for a particular data packet regardless of the order in which data packets are processed, may be unbiased, and may have a very flat distribution across the set of results. In some examples, each of the one or more mixing functions may include a set of exclusive-or (XOR) and shift functions configured for long-period pseudo-random number generation.
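One well-known family of exclusive-or and shift functions is Marsaglia's xorshift generators. The sketch below uses the 32-bit variant with the shift triple (13, 17, 5), whose period is 2^32 - 1. The disclosure does not name a specific generator or shift constants, so this choice is an assumption made purely for illustration.

```python
MASK32 = 0xFFFFFFFF

def xorshift32(state: int) -> int:
    """One step of Marsaglia's 32-bit xorshift generator (shifts 13, 17, 5).

    The state must be nonzero, because zero maps to itself.
    """
    state &= MASK32
    if state == 0:
        raise ValueError("xorshift state must be nonzero")
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state

first = xorshift32(1)
second = xorshift32(first)
```

Because each step is a pure function of its input, applying it to a given seed always yields the same value, which is the deterministic, order-independent behavior the text asks of a mixing function.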

In some examples, the one or more mixing functions may include a mixing function that, when applied to the fountain code seed, causes the executed FC seed engine 117B to generate a data block or segment identifier associated with a valid data block. The data block or segment identifier may identify the data block or segment associated with the set of random data elements included in the encoded data packet. Further, during the decoding process, the data block or segment identifier may indicate to ED computing system 110 the data block or segment with which the set of random data elements is associated. In some examples, the mixing function may generate the data block or segment identifier based on the value of the fountain code seed and the fountain code seed size. In other examples, the one or more mixing functions may include a mixing function that, when applied to the fountain code seed, causes the executed FC seed engine 117B to generate, for each identified data element in the set of random values included in the fountain code seed, a value representing one of a plurality of possible states (e.g., "isZero", "isOne", "noInfo") that the corresponding data element may take. In various examples, the one or more mixing functions may include a mixing function that, when applied to the fountain code seed, causes the executed FC seed engine 117B to generate a value representing the corresponding metadata bit for each identified data element in the set of random values included in the fountain code seed. In some examples, the value representing the corresponding metadata bit may indicate to ED computing system 110 which data element is associated with each generated state value. In other examples, the value may be between 0 and the metadata size minus one. In various examples, all fountain code seeds and associated data packets may be processed using identically configured mixing functions.

Referring back to FIG. 1, the executed FC seed engine 117B may generate seed metadata for each fountain code seed. The seed metadata may include a corresponding data block or segment identifier, a corresponding value representing one of a plurality of states, and a corresponding value representing a metadata bit. In some examples, the executed FC seed engine 117B may store seed metadata for each fountain code seed within a corresponding portion of the data repository 111 (such as the FC seed database 114).
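The seed metadata described above, together with the matching step summarized earlier in this disclosure (keeping the seeds whose state value matches the value at the identified metadata bit position), might be sketched as follows. The record type `SeedMetadata`, the function `matching_seeds`, and the representation of a block's metadata as a list of state strings are assumptions for illustration.

```python
from typing import NamedTuple

class SeedMetadata(NamedTuple):
    """Hypothetical record of the seed metadata kept for one fountain code seed."""
    seed: int
    block_id: int      # data block or segment identifier
    state: str         # "isZero", "isOne", or "noInfo"
    metadata_bit: int  # index into the block's metadata

def matching_seeds(candidates, block_id, metadata):
    """Return the seeds whose state equals the metadata value at their bit index."""
    return [
        candidate.seed
        for candidate in candidates
        if candidate.block_id == block_id
        and metadata[candidate.metadata_bit] == candidate.state
    ]

# Hand-written values for illustration; real records would come from mixing functions.
block_metadata = ["isOne", "isZero", "noInfo"]
candidates = [
    SeedMetadata(seed=11, block_id=0, state="isOne", metadata_bit=0),
    SeedMetadata(seed=12, block_id=0, state="isZero", metadata_bit=0),
    SeedMetadata(seed=13, block_id=1, state="isOne", metadata_bit=0),
    SeedMetadata(seed=14, block_id=0, state="noInfo", metadata_bit=2),
]
accepted = matching_seeds(candidates, block_id=0, metadata=block_metadata)
```

In this sketch seed 12 is rejected because it asserts "isZero" where the metadata holds "isOne", and seed 13 is rejected because it belongs to a different block.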

For example, a first mixing function may be configured to generate a data block or segment identifier associated with a valid data block, a second mixing function may be configured to generate, for each identified data element in the set of random values included in the fountain code seed, a value representing one of a plurality of possible states (e.g., "isZero", "isOne", "noInfo") that the corresponding data element may take, and a third mixing function may be configured to generate, for each identified data element in the set of random values included in the fountain code seed, a value representing a corresponding metadata bit. In such an example, the executed FC seed engine 117B may obtain seed data and apply the first, second, and third mixing functions to a particular fountain code seed (e.g., 24 to 32 bits) of the seed data. Additionally, based on applying the first mixing function to the particular fountain code seed, the executed FC seed engine 117B may generate a 2- to 11-bit data block or segment identifier that represents the particular data block or segment associated with the particular fountain code seed. Further, based on applying the second mixing function to the particular fountain code seed, the executed FC seed engine 117B may generate a value, such as "isZero", "isOne", or "noInfo", associated with each element identified in the particular fountain code seed. Further, based on applying the third mixing function to the particular fountain code seed, the executed FC seed engine 117B may generate, for each of the elements identified in the particular fountain code seed, a value representing the particular metadata bit.
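By way of non-limiting illustration, the three mixing functions described above may be sketched in Python as follows. The disclosure does not specify how a mixing function is computed; the use of SHA-256, the salt strings, the function names, and the default sizes (an 8-bit identifier, four elements per seed, a 16-bit metadata field) are all assumptions made only for this sketch.

```python
import hashlib

STATES = ("isZero", "isOne", "noInfo")

def _mix(seed: int, salt: bytes, modulus: int) -> int:
    """Hypothetical mixing primitive: hash the seed with a salt and reduce it."""
    digest = hashlib.sha256(salt + seed.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big") % modulus

def block_identifier(seed: int, id_bits: int) -> int:
    """First mixing function: derive a data block identifier of id_bits bits."""
    return _mix(seed, b"block", 1 << id_bits)

def element_states(seed: int, n_elements: int, metadata_size: int):
    """Second and third mixing functions: one (metadata_bit, state) pair per element."""
    pairs = []
    for i in range(n_elements):
        bit = _mix(seed, b"bit%d" % i, metadata_size)      # in 0..metadata_size-1
        state = STATES[_mix(seed, b"state%d" % i, len(STATES))]
        pairs.append((bit, state))
    return pairs

def seed_metadata(seed: int, id_bits: int = 8, n_elements: int = 4,
                  metadata_size: int = 16) -> dict:
    """Collect the outputs of the three mixing functions for one seed."""
    return {
        "block_id": block_identifier(seed, id_bits),
        "elements": element_states(seed, n_elements, metadata_size),
    }
```

Because the sketch is deterministic, applying the same functions to the same seed during decoding reproduces the same seed metadata, which is the property the identically configured mixing functions rely on.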

Referring back to FIG. 1, the encoding engine 117C may be executed by one or more processors of the server 110A to encode user data obtained from one or more client devices 101. In some examples, the executed encoding engine 117C may encode each segment or block of user data. In such examples, the executed encoding engine 117C may encode multiple segments or blocks of user data in parallel or, alternatively, serially. In addition, for each data block or segment, the executed encoding engine 117C may apply a fountain code (e.g., a Luby transform) to each data element of the corresponding data block or segment. Further, the executed encoding engine 117C may generate a set of random data elements and encapsulate the set into one or more portions of a data packet. Further, the executed encoding engine 117C may combine the set of random data elements bitwise over a binary field. The combined set of random data elements may be the payload of the corresponding data packet and may include information necessary to describe the original user data when processed (such as decoded) together with a sufficient number of other data packets.
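As a minimal sketch of the bitwise combination described above, the following Python code selects a pseudo-random subset of data elements from a seed and combines them with exclusive-OR, which is addition over the binary field. The fixed degree, the use of Python's `random.Random` as the seed-driven selector, and the function names are assumptions for illustration; a Luby transform would ordinarily draw the degree from a soliton-type distribution.

```python
import random

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_payload(block_elements, seed: int, degree: int = 3):
    """Pick `degree` elements pseudo-randomly from the seed and XOR them together.

    Returns the chosen indices and the combined payload. The decoder can
    recompute the same indices from the seed alone.
    """
    rng = random.Random(seed)
    k = min(degree, len(block_elements))
    indices = sorted(rng.sample(range(len(block_elements)), k))
    payload = bytes(len(block_elements[0]))  # all-zero starting value
    for i in indices:
        payload = xor_bytes(payload, block_elements[i])
    return indices, payload
```

If all but one of the selected elements are already known, XORing them out of the payload yields the remaining element, which is how a sufficient number of packets allows the original data to be recovered.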

In some examples, the executed encoding engine 117C may utilize the fountain code seed data to generate one or more data packets. In such examples, the executed encoding engine 117C may obtain metadata for each piece or block of user data obtained from one or more client devices 101 (such as client device 101A) to initialize the executed encoding engine 117C. In addition, the executed encoding engine 117C may obtain seed data and corresponding seed metadata from the FC seed database 114. Further, for each data block or segment, the encoding engine 117C that is performed may select a particular potential fountain code seed. Based on the corresponding seed metadata of the potential fountain code seed and metadata associated with the corresponding data block or segment, the executed encoding engine 117C may determine whether the identifier identified in the metadata of the corresponding data block or segment matches the data block or segment identifier of the seed metadata. In examples where the executed encoding engine 117C determines that the identifier identified in the metadata of the corresponding data block or segment does not match the data block or segment identifier of the seed metadata, the executed encoding engine 117C may select another potential fountain code seed. In addition, the performed encoding engine 117C may repeat the process of determining whether the identifier identified in the metadata of the corresponding data block matches the data block identifier of the seed metadata of the second potential fountain code seed. The executed encoding engine 117C may keep repeating the process until the data block identifier of the potential fountain code matches the identifier identified in the metadata of the corresponding data block.

In examples where the executed encoding engine 117C determines that an identifier identified in metadata of a corresponding data block or segment matches a data block or segment identifier of seed metadata, the executed encoding engine 117C may determine whether a potential fountain code seed has been used to generate another data packet of a set of random data elements. In an example where the executed encoding engine 117C determines that a fountain code seed has been used to generate another data packet for a set of random data elements, the executed encoding engine 117C may select another potential fountain code seed. As described herein, the performed encoding engine 117C may repeat the above process to determine a potential fountain code seed that includes a data block identifier that matches an identifier identified in metadata of a corresponding data block and that has not been used to generate another data packet of a set of random data elements.

In examples where the executed encoding engine 117C determines that the potential fountain code seed has not been used to generate another data packet of a set of random data elements, the executed encoding engine 117C may determine whether the one or more values represented in the corresponding metadata bits in the seed metadata, and the corresponding values representing one of a plurality of states (e.g., "isZero", "isOne", "noInfo"), are identified in the metadata of the corresponding data block or segment. In examples where the executed encoding engine 117C determines that these values are not identified in the metadata of the corresponding data block or segment, the executed encoding engine 117C may select another potential fountain code seed. As described herein, the executed encoding engine 117C may repeat the above-described process to determine a potential fountain code seed that (1) includes a data block identifier that matches the identifier identified in the metadata of the corresponding data block, (2) has not been used to generate another data packet of a set of random data elements, and (3) includes one or more values, represented in the corresponding metadata bits in the seed metadata together with the corresponding values representing one of a plurality of states (e.g., "isZero", "isOne", "noInfo"), that are identified in the metadata of the corresponding data block.
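The three-part seed selection described above may be sketched as follows. This is an illustrative reading rather than the disclosed implementation: it assumes the seed metadata is a dictionary with a `block_id` and a list of `(metadata_bit, state)` pairs, that the block metadata is a list of bits, and that a "noInfo" state places no constraint on the corresponding bit. All names are hypothetical.

```python
def matches_block_metadata(seed_meta: dict, block_bits) -> bool:
    """Check every (metadata_bit, state) claim in the seed against the block's bits."""
    for bit, state in seed_meta["elements"]:
        if state == "isZero" and block_bits[bit] != 0:
            return False
        if state == "isOne" and block_bits[bit] != 1:
            return False
        # "noInfo" is assumed to carry no constraint.
    return True

def select_seed(candidates, block_id: int, block_bits, used: set):
    """Return the first candidate seed meeting all three criteria, or None.

    candidates: iterable of (seed, seed_meta) pairs.
    used: set of seeds already consumed; updated in place on success.
    """
    for seed, meta in candidates:
        if meta["block_id"] != block_id:      # (1) identifier must match
            continue
        if seed in used:                      # (2) seed must be unused
            continue
        if not matches_block_metadata(meta, block_bits):  # (3) states must agree
            continue
        used.add(seed)
        return seed
    return None
```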

In examples where the executed encoding engine 117C determines that the one or more values represented in the corresponding metadata bits in the seed metadata, and the corresponding values representing one of a plurality of states, are identified in the metadata of the corresponding data block or segment, the executed encoding engine 117C may utilize the potential fountain code seed and/or the corresponding seed metadata to generate a data packet having a payload corresponding to the potential fountain code seed. For example, the payload may include the set of random data elements identified in the potential fountain code seed and/or the corresponding seed metadata. In addition, each data element in the set of random data elements may be encoded by the executed encoding engine 117C using a fountain code (e.g., a Luby transform), as described herein. In some examples, the executed encoding engine 117C may combine the encoded data elements in the set of random data elements and encapsulate the combined encoded data elements into one or more portions of a data packet. Further, as described herein, the executed encoding engine 117C may encapsulate the corresponding potential fountain code seed into one or more portions of the data packet. As described herein, a fountain code seed may be a set of random values of a fixed length and may correspond to information sufficient to describe the set of random data elements included in the payload of a data packet. ED computing system 110 uses the data packets in decoding. Further, as described herein, the fountain code seed may include information corresponding to metadata of the segment or data block. In some examples, the executed encoding engine 117C may store the generated data packet within a corresponding portion of the data repository 111 (such as the encoded data database 115A).

In other examples, the executed encoding engine 117C may add Error Correction Codes (ECC) to the data packet. The ECC may be used by ED computing system 110 to control errors in the corresponding data packet during the decoding process (e.g., to recover lost bits or correct error bits during the decoding process). In some examples, the encoding-decoding parameters may indicate that the corresponding data packet is formatted such that the ECC is behind the payload. In such examples, based on the encoding-decoding parameters, the encoding engine 117C being executed may generate a data packet having an ECC code following the payload.

As shown in FIG. 1, sequencer engine 117D may be executed by one or more processors of server 110A to generate sequence mapping data for each of one or more data packets stored in encoded data database 115A. In such examples, the executed sequencer engine 117D may obtain the mapping data from the mapping data database 116. As described herein, the mapping data can identify particular bases and corresponding bit pairs. The mapping data may be generated and modified by an operator of ED computing system 110. Examples of bit pairs and corresponding bases may include bit pair 00 corresponding to adenine, bit pair 01 corresponding to cytosine, bit pair 10 corresponding to guanine, and bit pair 11 corresponding to thymine. In addition, the executed sequencer engine 117D may obtain data packets stored in the encoded data database 115A. Further, the executed sequencer engine 117D may identify or determine the bit sequence of the fountain code seed and the payload (e.g., the set of encoded random data elements) included in the data packet. Based on the determined or identified bit sequence of the data packet and the mapping data, the executed sequencer engine 117D may determine a corresponding base sequence. Further, based on the determined corresponding base sequence, the executed sequencer engine 117D may generate sequence mapping data that identifies the corresponding base sequence of the obtained data packet. In some examples, the executed sequencer engine 117D may add one or more primers (such as front-end primers and back-end primers) to the sequence mapping data. For example, the executed sequencer engine 117D may add a front-end primer to the beginning of the base sequence associated with the data packet and a back-end primer to the end of the base sequence. Information related to the sequence of each of the one or more primers may be included in the metadata of each of the data blocks or segments. In various examples, the front-end primers and back-end primers may be of a fixed length or size that is known to or encoded into the executed sequencer engine 117D.
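The bit-pair mapping and primer attachment described above may be illustrated with the following Python sketch. The mapping table reproduces the example pairs given in the text (00 to adenine, 01 to cytosine, 10 to guanine, 11 to thymine); the function names and the representation of bits as a string of "0" and "1" characters are assumptions for illustration.

```python
# Example mapping from the description: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

def bits_to_bases(bits: str) -> str:
    """Translate a bit string into a base sequence, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def build_sequence(seed_bits: str, payload_bits: str,
                   front_primer: str, back_primer: str) -> str:
    """Map seed and payload bits to bases and wrap them in fixed-length primers."""
    return front_primer + bits_to_bases(seed_bits + payload_bits) + back_primer
```

For instance, `bits_to_bases("00011011")` yields `"ACGT"` under this mapping.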

In other examples, the executed sequencer engine 117D may determine whether a polynucleotide strand based on the base sequence identified in the sequence mapping data would be stable enough to synthesize. In such examples, the executed sequencer engine 117D may determine whether the base sequence identified in the sequence mapping data meets one or more sequence criteria. Examples of the one or more sequence criteria include criteria associated with repeating bases (e.g., whether the number of identical bases in a row exceeds a threshold number of bases), criteria associated with base patterns (e.g., the number of patterns in a base sequence should be below a threshold amount), and criteria associated with base ratios (e.g., the ratio of AT to GC should be 50/50). In examples where the executed sequencer engine 117D determines that the base sequence identified in the sequence mapping data meets the one or more sequence criteria, the executed sequencer engine 117D may store the sequence mapping data in the mapping data database 116.
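Two of the sequence criteria described above, the limit on repeated bases and the AT-to-GC ratio, may be sketched as follows. The thresholds (a maximum run of three identical bases and a GC fraction between 0.4 and 0.6) are assumed values chosen only for illustration, and the pattern criterion is omitted because the text does not define the patterns.

```python
from itertools import groupby

def longest_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)

def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.5 corresponds to a 50/50 AT-to-GC ratio)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def meets_criteria(seq: str, max_run: int = 3,
                   gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Apply the repeat and ratio criteria with hypothetical thresholds."""
    return longest_run(seq) <= max_run and gc_low <= gc_fraction(seq) <= gc_high
```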

In various examples, the executed sequencer engine 117D may determine whether each data element of each data block has been included in the sequence mapping data 308 of each data packet 306. In such examples, the executed sequencer engine 117D may utilize the metadata for each data block to determine whether each data element of each data block has been included in the sequence mapping data 308 for each data packet 306 stored in the mapping data database 116. In examples where the executed sequencer engine 117D determines that the sequence mapping data 308 of the data packets 306 stored in the mapping data database 116 is missing one or more data elements of one or more data blocks of user data (such as user data 103), the executed sequencer engine 117D may signal or instruct the encoding engine 117C to continue encoding the missing data elements of the incomplete data block or segment. Otherwise, the executed sequencer engine 117D may transmit the sequence mapping data for each data packet to the server 120A of the genetic computing system 120. In such examples, the executed sequencer engine 117D may generate a message. In addition, the executed sequencer engine 117D may encapsulate the sequence mapping data for each data packet within one or more portions of the message. In addition, the executed sequencer engine 117D may transmit the message including the sequence mapping data for each data packet to the server 120A of the genetic computing system 120. As described herein, the genetic computing system 120 can utilize the sequence mapping data to generate corresponding polynucleotide strands and store the corresponding polynucleotide strands in a pool of polynucleotide strands. The pool of polynucleotide strands may include a plurality of polynucleotide strands that each correspond to a particular block or segment of user data.

In some examples, the executed sequencer engine 117D may perform operations that determine the corresponding bit sequence based on the base sequence of a particular polynucleotide strand. In such examples, the genetic computing system 120 may process one or more polynucleotide strands in the pool of polynucleotide strands, sequence the polynucleotide strands, and generate sequence data identifying the base sequence of each of the polynucleotide strands. In addition, genetic computing system 120 may transmit the sequence data to the executed sequencer engine 117D. Based on the sequence data and the mapping data obtained from the mapping data database 116, the executed sequencer engine 117D may determine a bit sequence corresponding to the base sequence of the polynucleotide strand identified in the sequence data. In addition, the executed sequencer engine 117D may generate sequence bit data that identifies the bit sequence corresponding to the base sequence of the polynucleotide strand identified in the sequence data. In some examples, the polynucleotide strand may include primers (such as front-end primers and/or back-end primers) at both ends of the fountain code seed and payload. In these examples, the base sequences of the front-end primer and the back-end primer are the same for each polynucleotide strand corresponding to a data packet. This information, not shown in FIG. 1, may be obtained by or encoded into the executed sequencer engine 117D and may be used to identify and/or trim primers from the sequence of the polynucleotide strand identified in the sequence data generated by the genetic computing system 120. In other examples, the polynucleotide strand may not include primers, such as front-end primers and/or back-end primers. In such examples, the executed sequencer engine 117D may not need to identify and trim primers from the sequence of the polynucleotide strand identified in the sequence data generated by the genetic computing system 120. In various examples, the executed sequencer engine 117D may store the sequence data and the sequence bit data within one or more portions of the data repository 111.
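The reverse mapping from bases to bits, together with primer trimming, may be sketched as follows. The sketch assumes the example mapping given earlier in the description (adenine to 00, cytosine to 01, guanine to 10, thymine to 11) and assumes that the primer sequences are known exactly; handling of sequencing errors inside the primers is outside its scope. The function names are hypothetical.

```python
# Inverse of the example mapping: A -> 00, C -> 01, G -> 10, T -> 11.
BASE_TO_BIT_PAIR = {"A": "00", "C": "01", "G": "10", "T": "11"}

def trim_primers(seq: str, front_primer: str, back_primer: str) -> str:
    """Remove known front-end and back-end primers from a sequenced strand."""
    if not (seq.startswith(front_primer) and seq.endswith(back_primer)):
        raise ValueError("expected primers not found")
    return seq[len(front_primer):len(seq) - len(back_primer)]

def bases_to_bits(seq: str) -> str:
    """Translate a base sequence back into its bit string, two bits per base."""
    return "".join(BASE_TO_BIT_PAIR[base] for base in seq)
```

For strands that carry no primers, `bases_to_bits` may be applied directly to the sequenced bases.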

In other examples, one or more processors of server 110A may execute a pre-inspection engine to implement a set of pre-inspection or pre-processing operations that determine an estimated distribution of data blocks or segments. In such examples, the executed pre-inspection engine may obtain, from the mapping data database 116, sequence bit data associated with a set of random polynucleotide strands of a pool of polynucleotide strands (e.g., a set of 100,000 to 200,000 polynucleotide strands in a pool of 10,000,000 polynucleotide strands). In some examples, the set of random polynucleotide strands may include primers, such as front-end primers and back-end primers. In such examples, the executed pre-inspection engine may determine the bit sequence from the sequence bit data and identify the portion of the bit sequence corresponding to the primer (described herein as the "primer portion") based on information associated with the length and size of the primer that is known to or encoded into the executed pre-inspection engine. Further, the executed pre-inspection engine may identify the portions of the bit sequence between the primer portions, determine that such portions are the bits corresponding to the fountain code seed and associated payload, and determine the size of such portions. Alternatively, in instances where the set of random polynucleotide strands does not include a primer, the executed pre-inspection engine may not trim the bit sequence corresponding to the set of random polynucleotide strands. In such examples, the executed synthesis engine 121A may implement a biological protocol using custom sequence primers that has the effect of removing the front-end primers and/or back-end primers of the polynucleotide sequence. Thus, the remaining polynucleotide sequences may be sequenced by sequencer engine 121B, and the corresponding bit sequences generated by sequencer engine 117D may correspond to the fountain code seed portion and the associated payload portion.

Further, the executed pre-inspection engine may obtain the encoding-decoding parameters from the decoded data database 115B and determine which portion of the bits corresponding to the fountain code seed and associated payload is the fountain code seed and which portion is the payload. For example, the encoding-decoding parameters may indicate that the corresponding data packet is formatted such that the fountain code seed is in front of the payload. Additionally, the encoding-decoding parameters may indicate a fountain code seed size and/or a payload size. In summary, the executed pre-inspection engine may determine, based on the encoding-decoding parameters, which portion of the bits is the fountain code seed and which portion is the payload.

In examples where ED computing system 110 has encoded and decoded user data of varying sizes, the size of the bit sequence corresponding to the fountain code seed and the payload may vary. In such examples, not shown in FIG. 1, data packet mapping data may be stored in ED computing system 110. For each bit sequence of varying size corresponding to a fountain code seed and a payload, the data packet mapping data may indicate at least a particular format (e.g., whether the fountain code seed is in front of or behind the payload) and the size of the fountain code seed and/or the payload. In addition, when the executed pre-inspection engine performs the set of pre-inspection or pre-processing operations, the executed pre-inspection engine may determine the sizes of all portions of the bit sequence between the primer portions and determine the majority size (i.e., the size that occurs most often). Based on the majority size and the data packet mapping data, the executed pre-inspection engine may determine the estimated fountain code seed size, the estimated payload size, and which part of the portion of the bit sequence between the primer portions corresponds to the payload.
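The majority-size determination and the layout lookup described above may be sketched as follows. The structure of the data packet mapping data is not specified in the text; this sketch assumes a dictionary keyed by total bit length whose values give the seed size and whether the seed precedes the payload. The function names are hypothetical.

```python
from collections import Counter

def majority_size(bit_sequences) -> int:
    """Return the most common length among the trimmed bit sequences."""
    return Counter(len(bits) for bits in bit_sequences).most_common(1)[0][0]

def split_by_layout(bits: str, packet_map: dict):
    """Split one bit sequence into (seed_bits, payload_bits).

    packet_map is hypothetical packet mapping data of the form
    {total_length_in_bits: (seed_size_in_bits, seed_is_first)}.
    """
    seed_size, seed_is_first = packet_map[len(bits)]
    if seed_is_first:
        return bits[:seed_size], bits[seed_size:]
    return bits[-seed_size:], bits[:-seed_size]
```

In practice the lookup would be made once with the majority size, and sequences of other sizes could be set aside as likely sequencing errors; that policy is also an assumption of the sketch.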

Referring back to FIG. 1, based on determining which portions of the bit sequence may correspond to fountain code seeds (described herein as "fountain code seed portions"), the executed pre-inspection engine may determine an identifier of the data block for each fountain code seed portion. Further, the executed pre-inspection engine may determine a distribution of identifiers of the data blocks based on the identifiers of the data blocks determined from each fountain code seed portion. In some examples, the executed pre-inspection engine may generate a histogram that identifies and characterizes the determined breakdown or composition of the distribution of identifiers of the data blocks. Additionally or alternatively, the executed pre-inspection engine may generate data block plan data that identifies and characterizes the determined breakdown or composition of the distribution of identifiers of the data blocks. In some examples, the executed pre-inspection engine may store the generated data block plan data within a corresponding portion of the data repository 111 (such as the decoded data database 115B).

Decoding engine 117E may be executed by one or more processors of server 110A to decode the encoded data packet. In some examples, the executed decoding engine 117E may implement a set of operations to recover or generate seed metadata or element code corresponding to each identified or determined portion of the bit sequences associated with a second set of polynucleotide strands. In some examples, the second set of polynucleotide strands may be all of the polynucleotide strands sequenced. In addition, the executed decoding engine 117E may obtain, from the mapping data database 116, sequence bit data associated with the second set of polynucleotide strands of the pool of polynucleotide strands. Based on the sequence bit data of the second set of polynucleotide strands, the executed decoding engine 117E may determine the bit sequences associated with the second set of polynucleotide strands. In examples where the second set of polynucleotide strands includes primers (such as front-end primers and back-end primers), the executed decoding engine 117E may identify the portions of the bit sequence corresponding to the primer portions, based on information associated with the length and size of the primers that is known to or encoded into the executed decoding engine 117E, and trim the primer portions. Each of the remaining portions may correspond to a fountain code seed portion and an associated payload portion. Alternatively, in instances where the second set of polynucleotide strands does not include primers, the executed decoding engine 117E may not trim the bit sequences corresponding to the second set of polynucleotide strands. In such examples, the executed synthesis engine 121A may implement a biological protocol using custom sequence primers that has the effect of removing the front-end primers and/or back-end primers of the polynucleotide sequence. Thus, the remaining polynucleotide sequences may be sequenced by sequencer engine 121B, and the corresponding bit sequences generated by sequencer engine 117D may correspond to the fountain code seed portion and the associated payload portion. Further, the executed decoding engine 117E may obtain the encoding-decoding parameters from the decoded data database 115B and determine which part of each remaining portion is the fountain code seed portion and which part is the payload portion.

In some examples, the executed decoding engine 117E may determine an identifier of the corresponding data block for each fountain code seed portion. In addition, the executed decoding engine 117E may determine a distribution of identifiers of data blocks associated with the second set of polynucleotide strands based on the determined identifiers of the corresponding data blocks of each fountain code seed portion. In such examples, the executed decoding engine 117E may determine whether the distribution of identifiers of the data blocks associated with the second set of polynucleotide strands matches the distribution of identifiers of the data blocks identified in the data block plan data. In examples where the two distributions do not match, the executed decoding engine 117E may use clustering, multiple read alignment, and majority base calling to implement additional recovery operations. In some examples, based on the mismatch between the two distributions, the executed decoding engine 117E may determine which identifiers of data blocks are missing or erroneous in the second set of polynucleotide strands or in the data block plan data. In such examples, the executed decoding engine 117E may use clustering, multiple read alignment, and majority base calling for such missing data blocks to implement additional recovery operations.
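The distribution of data block identifiers and its comparison against the data block plan data may be sketched as follows. The text does not define what it means for two distributions to match; this sketch assumes the comparison is made on the set of identifiers present, and it takes the identifier-extraction function as a parameter because that function depends on the mixing function in use. All names are hypothetical.

```python
from collections import Counter

def id_distribution(seed_portions, id_of) -> Counter:
    """Histogram of data block identifiers over a collection of seed portions.

    id_of is a caller-supplied function mapping a seed portion to its identifier.
    """
    return Counter(id_of(portion) for portion in seed_portions)

def missing_or_unexpected(observed: Counter, planned: Counter):
    """Identifiers planned but not observed, and observed but not planned."""
    missing = sorted(set(planned) - set(observed))
    unexpected = sorted(set(observed) - set(planned))
    return missing, unexpected
```

A non-empty result from `missing_or_unexpected` would correspond to the mismatch case in which additional recovery operations are attempted.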

In examples where the distribution of identifiers of data blocks associated with the second set of polynucleotide chains matches the distribution of identifiers of data blocks identified in the data block plan data, the performed decoding engine 117E may classify each portion of bits corresponding to the fountain code seed and the associated portion of bits corresponding to the payload according to the corresponding data block identifiers. In addition, the executed decoding engine 117E may generate list data that identifies and characterizes the sequence of bits of each of the fountain code seed portion and the associated payload portion for each identifier of each of the data blocks or fragments. In some examples, the executed decoding engine 117E may store the list data within a portion of the data repository 111 (such as the decoded data database 115B).

Further, the executed decoding engine 117E may generate or recover seed metadata or element code associated with each fountain code seed portion and payload portion of each identifier of a data block. In some examples, an executed decoding engine 117E, similar to the executed FC seed engine 117B, may apply one or more mixing functions to each fountain code seed portion for each identifier of a data block to generate corresponding seed metadata or element code. As described herein, examples of blending functions may include blending functions that, when applied to each fountain code seed portion, cause the executed decoding engine 117E to generate a corresponding data block or segment identifier. The data block or segment identifier may identify a corresponding data block or segment associated with the set of random data elements identified in the corresponding fountain code seed portion. Additionally, another example of a mixing function may include a mixing function that, when applied to each fountain code seed portion, causes the executed decoding engine 117E to generate, for each identified data element in the set of random values identified in the corresponding fountain code seed portion, a value representing one of a plurality of possible states (e.g., "isZero", "isOne", "noInfo") that the corresponding data element may take. Further, yet another example of a mixing function may include a mixing function that, when applied to each fountain code seed portion, causes the executed decoding engine 117E to generate a value representing a corresponding metadata bit for each identified data element in the set of random values identified in the corresponding fountain code seed portion. In some examples, the value representing the corresponding metadata bit may indicate to the decoding engine 117E being executed which resulting value representing one of the plurality of states is associated with which data element. 
In other examples, the value may be between zero and the metadata size minus one. In other examples, the executed decoding engine 117E may generate seed metadata or element code for each fountain code seed portion based on the corresponding data block or segment identifier, one or more values each representing a metadata bit, and an associated value representing one of a plurality of states. The seed metadata or element code for each of the fountain code seed portions may identify and characterize the corresponding data block or segment identifier, the one or more values each representing a metadata bit, and the associated value representing one of a plurality of states. In such examples, the executed decoding engine 117E may store the seed metadata or element code within a portion of the data repository 111 (such as the decoded data database 115B).

In some examples, the executed decoding engine 117E may determine, for each identifier of a data block or segment, whether the corresponding seed metadata or element codes of each corresponding fountain code seed portion are consistent with each other. In such examples, the executed decoding engine 117E may utilize one or more confidence thresholds to make this determination. In some examples, the one or more confidence thresholds may be associated with a number of fountain code seed portions whose seed metadata indicates that a particular metadata bit has a value representing a particular state of the plurality of states. For example, for a first data block, the executed decoding engine 117E may obtain seed metadata for 750 fountain code seed portions associated with an identifier of the first data block. Additionally, based on the seed metadata of the 750 fountain code seed portions, the executed decoding engine 117E may determine that 500 of the fountain code seed portions have corresponding seed metadata indicating that the metadata bit having a particular value of 1 (e.g., metadata bit 1) has a corresponding value representing the particular state "isZero" of the plurality of states. The executed decoding engine 117E may determine whether metadata bit 1 of the first data block has the corresponding value representing state "isZero" based on whether that number of fountain code seed portions is greater than or equal to the confidence threshold. In an example where the confidence threshold is 250 fountain code seed portions (whose seed metadata indicates that metadata bit 1 of the first data block has the corresponding value representing state "isZero"), the executed decoding engine 117E may determine that metadata bit 1 of the first data block has the corresponding value representing state "isZero". Alternatively, in instances where the confidence threshold is 5000 items of seed metadata or element code (indicating that metadata bit 1 of the first data block has the corresponding value representing state "isZero"), the executed decoding engine 117E may determine that metadata bit 1 of the first data block may not have the corresponding value representing state "isZero", or has state "noInfo".

In other examples, for a particular data block, the one or more confidence thresholds may be based on which state is indicated for a particular metadata bit by the largest number of fountain code seed portions. For example, for a second data block, the executed decoding engine 117E may obtain seed metadata for a number of fountain code seed portions. Based on the obtained seed metadata, the executed decoding engine 117E may determine that 100 fountain code seed portions have seed metadata indicating that the metadata bit having a value of 3 (e.g., metadata bit 3) has a corresponding value representing state "isZero" of the plurality of states, and that 350 fountain code seed portions have seed metadata indicating that metadata bit 3 has a corresponding value representing state "isOne" of the plurality of states. Further, the executed decoding engine 117E may determine that metadata bit 3 of the second data block has the corresponding value representing state "isOne" by comparing the number of fountain code seed portions indicating state "isZero" for metadata bit 3 (here, 100) with the number of fountain code seed portions indicating state "isOne" for metadata bit 3 (here, 350).

In other examples, the executed decoding engine 117E may determine, for each identifier of a data block or segment, whether the payload portions are sufficient for the executed decoding engine 117E to recover the complete corresponding data block. In some examples, the executed decoding engine 117E may make such determinations based on the seed metadata or element code of each of the corresponding fountain code seed portions. In such examples, the executed decoding engine 117E may determine which data elements of the corresponding data block are identified in the seed metadata of each corresponding fountain code seed portion. Additionally, based in part on the identified data elements, the executed decoding engine 117E may determine whether any data elements of the corresponding data block are missing or incorrect and, if so, which ones. For example, the executed decoding engine 117E may determine which data elements are missing or incorrect by comparing the data elements identified for each fountain code seed portion. For example, the executed decoding engine 117E may determine that all or most of the identified data elements having a bit value of 1 have a corresponding value associated with state "isOne". In addition, the executed decoding engine 117E may determine that a minority of the identified data elements having a bit value of 1 have a corresponding value associated with state "isZero". Thus, the executed decoding engine 117E may determine that the data element having a bit value of 1 has a corresponding value associated with state "isOne" and that the data elements identified with "isZero" are incorrect. In another example, the executed decoding engine 117E may determine that one or more of the identified data elements having a bit value of 2 carry no information about a corresponding value associated with a state of the plurality of states. In addition, the executed decoding engine 117E may determine that a number of the identified data elements having a bit value of 2 have a corresponding value associated with state "isZero". Thus, the executed decoding engine 117E may determine that the data element having a bit value of 2, for which the value associated with a state is missing, may have a corresponding value associated with state "isZero".
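
As a non-limiting illustration, the comparison of identified data elements across fountain code seed portions may be sketched in Python as a per-bit vote. The function name `consensus_states`, the report counts, and the choice to ignore "noInfo" reports when voting are assumptions made for this sketch only.

```python
from collections import Counter, defaultdict

NO_INFO = "noInfo"


def consensus_states(reports):
    """Resolve each metadata bit to the state reported most often.

    `reports` is an iterable of (metadata_bit, state) pairs gathered from
    many seed portions. Returns {metadata_bit: (winner, dissenting_states)}.
    Reports carrying "noInfo" are skipped, so a bit that only ever reports
    "noInfo" does not appear in the result (it stays unresolved).
    """
    by_bit = defaultdict(Counter)
    for bit, state in reports:
        if state != NO_INFO:
            by_bit[bit][state] += 1
    resolved = {}
    for bit, counts in by_bit.items():
        winner, _ = counts.most_common(1)[0]
        dissent = sorted(s for s in counts if s != winner)
        resolved[bit] = (winner, dissent)
    return resolved


# Hypothetical counts: a minority disagrees on bit 1, and some portions
# carry no state information for bit 2.
reports = (
    [(1, "isOne")] * 9 + [(1, "isZero")] * 2
    + [(2, NO_INFO)] * 3 + [(2, "isZero")] * 4
)
resolved = consensus_states(reports)
```

In this sketch, bit 1 resolves to "isOne" with the "isZero" reports flagged as dissenting, and bit 2 resolves to "isZero" from the portions that do carry a state.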

Otherwise, in examples where the executed decoding engine 117E determines that all data elements are identified in the seed metadata or element codes of the fountain code seed portions of each data block, the executed decoding engine 117E may perform a set of operations to reconstruct the original user data based on the list data and the seed metadata or element code of each portion of bits of the fountain code seed corresponding to the identifier of each data block or segment. In some examples, the executed decoding engine 117E may obtain the list data and the seed metadata for the fountain code seed portions of each data block from the decoded data database 115B. In addition, the executed decoding engine 117E may utilize the list data and the seed metadata or element code to initialize a decoding process, such as a fountain code decoding process. In some examples, the executed decoding engine 117E may implement the decoding process to decode the data blocks serially or in parallel. In either instance, for each data block, the executed decoding engine 117E may apply the decoding process to the seed metadata and the portion of the list data associated with the identifier of the corresponding data block. In addition, for each data block, the executed decoding engine 117E may generate sets of data elements from each payload portion based on the application of the decoding process to the seed metadata and the portion of the list data associated with the identifier of the corresponding data block. As described herein, the list data may identify, for each data block and the identifier of the data block, the payload portions obtained from the bit sequence. Further, for each data block, the executed decoding engine 117E may identify information for each data element within each set of data elements. For example, for each data block, the executed decoding engine 117E may identify the metadata bit value associated with each data element of each set, and the corresponding information to be conveyed, such as a state of a plurality of states (e.g., "isZero", "isOne", "isNoInfo"). Additionally, for each data block, the executed decoding engine 117E may determine an order of the data elements that reflects the order of the data elements of each data block of user data when initially received and segmented by the ED computing system 110 (e.g., by the executed segmentation engine 117A). The order of the data elements is based in part on the seed metadata and the portion of the list data associated with the identifier of the corresponding data block. In some examples, the executed decoding engine 117E may utilize a join graph to determine the connections between the data elements and the order of the data elements for a particular data block.

In some examples, the executed decoding engine 117E may determine whether all data elements of a particular data block have been identified, whether corresponding values representing one of a plurality of states have been determined, and whether the order of the data elements has been determined. As described herein, the executed decoding engine 117E may make this determination for each data block identified in the bit sequence. In an example where the executed decoding engine 117E has identified all data elements of a particular data block, has determined the corresponding values representing one of the plurality of states, and has determined the order of the data elements, the executed decoding engine 117E may reconstruct the particular data block from the portion of bits corresponding to the payload according to the seed metadata, the portion of the list data associated with the identifier of the corresponding data block, and the determined order of the data elements. In such examples, the executed decoding engine 117E may reconstruct the particular data block by identifying and constructing each data element from the portion of bits corresponding to the payload and combining the data elements according to the determined order. After each data block identified in the bit sequence has been constructed, the executed decoding engine 117E may combine the constructed data blocks. The combined data blocks may reflect the original user data received by ED computing system 110. In some examples, the executed decoding engine 117E may store the combined data blocks and each individual constructed data block within a corresponding portion of the data repository 111 (such as the decoded data database 115B). In other examples, the executed decoding engine 117E may generate a message and encapsulate the reconstructed user data within one or more portions of the message. In such examples, the executed decoding engine 117E may transmit the message and the included reconstructed user data back to the client device 101 of the user that originally sent the original user data on which the reconstructed user data is based.
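
As a non-limiting illustration, combining resolved data elements in their determined order, and then combining the constructed data blocks, may be sketched in Python as follows. The sketch assumes that each data element is a single bit indexed by its metadata bit value and that data blocks are combined in ascending block-identifier order; the function names and sample values are hypothetical.

```python
STATE_TO_BIT = {"isZero": "0", "isOne": "1"}


def rebuild_block(states_by_bit, block_len):
    """Turn {bit_index: state} into a bit string, in index order.

    Raises ValueError if any data element is missing or unresolved.
    """
    missing = [i for i in range(block_len)
               if states_by_bit.get(i) not in STATE_TO_BIT]
    if missing:
        raise ValueError(f"unresolved data elements: {missing}")
    return "".join(STATE_TO_BIT[states_by_bit[i]] for i in range(block_len))


def rebuild_user_data(blocks):
    """blocks: {block_id: (states_by_bit, block_len)}; joined by block id."""
    return "".join(rebuild_block(*blocks[b]) for b in sorted(blocks))


# Two hypothetical 4-bit data blocks, supplied out of order.
blocks = {
    1: ({0: "isOne", 1: "isOne", 2: "isZero", 3: "isZero"}, 4),
    0: ({3: "isOne", 0: "isZero", 2: "isZero", 1: "isOne"}, 4),
}
user_bits = rebuild_user_data(blocks)
```

A block with a missing or "noInfo" element raises an error in this sketch rather than being silently combined.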

In various examples, constructing a particular data block may also generate metadata for the particular data block. In such examples, the executed decoding engine 117E may determine the hash value included in the metadata of the particular data block. In addition, the executed decoding engine 117E may determine whether information included in the hash value matches data of the particular data block (e.g., whether a portion of the hash value corresponding to the data block identifier matches the data block identifier included in the particular data block). In an example where the executed decoding engine 117E determines that the information included in the hash value matches the data of the particular data block, the executed decoding engine 117E may determine that the particular data block may be combined with other data blocks whose corresponding hash values have likewise been determined to match the data of the corresponding data blocks. In an instance in which the executed decoding engine 117E determines that the information included in the hash value does not match the data of the particular data block, the executed decoding engine 117E may determine that the particular data block may be corrupted. In such examples, the executed decoding engine 117E may implement operations to identify and replace corrupted data within the particular data block. For example, the executed decoding engine 117E may identify all data elements of the particular data block that may have erroneous or incorrect data (e.g., a state value for a particular bit value in one fountain code portion that is inconsistent with the state value for that bit value in other fountain code portions). In addition, the executed decoding engine 117E may utilize clustering, multiple read alignment, and/or majority base calling procedures to recover correct data for the identified data elements of the particular data block that may have erroneous or incorrect data. As described herein, the executed encoding engine 117C may generate additional data packets having sets of redundant random data elements. Thus, a pool of polynucleotide chains may include polynucleotide chains associated with sets of redundant random data elements that, when sequenced and converted to a sequence of bits, can be used to determine, identify, and replace corrupted data within a particular data block.
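
As a non-limiting illustration, a hash check on a constructed data block and a column-wise majority base call over redundant reads may be sketched in Python as follows. The disclosure does not name a hash function or an alignment method; SHA-256, the two-byte block identifier, the assumption that the reads are already aligned and of equal length, and all function names are assumptions of this sketch.

```python
import hashlib
from collections import Counter


def block_digest(block_id, payload):
    """Hash the block identifier together with the block data (SHA-256 assumed)."""
    return hashlib.sha256(block_id.to_bytes(2, "big") + payload).hexdigest()


def is_intact(block_id, payload, stored_digest):
    """True when the recomputed digest matches the one stored in the metadata."""
    return block_digest(block_id, payload) == stored_digest


def majority_base_call(aligned_reads):
    """Pick the most frequent base in each column of equal-length aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


stored = block_digest(7, b"example block")
intact = is_intact(7, b"example block", stored)
corrupted = is_intact(7, b"exbmple block", stored)

# Four redundant reads of the same strand; two contain a single-base error.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC"]
consensus = majority_base_call(reads)
```

Here the hash check flags the altered block, and the majority call recovers the base sequence shared by most reads.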

Referring back to FIG. 1, genetic computing system 120 may be operated by one or more operators. In addition, genetic computing system 120 may represent a computing system that includes one or more servers (such as server 120A) and one or more tangible, non-transitory memory devices storing executable code, application engines, or application modules. Each of the one or more servers may include one or more processors that may be configured to execute stored code, application engines, or portions of modules or applications to perform operations consistent with the disclosed exemplary embodiments. For example, as shown in FIG. 1, one or more servers of genetic computing system 120 may include server 120A having one or more processors configured to execute stored code, application engines or modules, or portions of an application maintained within one or more tangible, non-transitory memories.

Further, as described herein, genetic computing system 120 may perform the processes described herein to generate or synthesize polynucleotides corresponding to the data packets generated by ED computing system 110. To facilitate generating or synthesizing those polynucleotides, genetic computing system 120 may maintain an application repository (such as application repository 121) within the one or more tangible, non-transitory memories. The application repository 121 may include, among other things, a composition engine 121A and a sequencer engine 121B. The composition engine 121A may be executed by one or more processors of the server 120A to obtain sequence mapping data from the ED computing system 110. As described herein, the sequence mapping data may identify and characterize a base sequence of each data packet, where each data packet corresponds to a set of random data elements of one of the data blocks of user data. In some examples, the sequence mapping data may include one or more primers. In addition, the executed composition engine 121A may generate instructions and encapsulate the sequence mapping data within one or more portions of the instructions. Further, the executed composition engine 121A may transmit the instructions to the electrode unit 122. The electrode unit 122 may include one or more electrodes configured to generate or synthesize a corresponding polynucleotide strand based on the instructions. As described herein, genetic computing system 120 may store the polynucleotide strands in a pool of polynucleotides. The polynucleotide pool and the polynucleotide strands may be associated with the user data from which they were derived.

In addition, sequencer engine 121B may be executed by one or more processors of server 120A to sequence one or more polynucleotide chains from a pool of polynucleotides. In some examples, the executed sequencer engine 121B may communicate with one or more electrodes of the electrode unit 122 to sequence one or more polynucleotides detected and measured by the one or more electrodes. In addition, the executed sequencer engine 121B may generate sequence data that identifies the base sequences of the one or more detected and sequenced polynucleotides. Further, the executed sequencer engine 121B may transmit the sequence data to the ED computing system 110. As described herein, ED computing system 110 may reconstruct user data associated with the sequence data. In some examples, ED computing system 110 and genetic computing system 120 may be combined. In other examples, as shown in FIG. 1, ED computing system 110 and genetic computing system 120 may be discrete computing systems.

B. Computer-implemented techniques for encoding user data into polynucleotide chains

As described herein, the encoder-decoder (ED) computing system 110 may implement operations that encode data for storage in genetic material, such as one or more polynucleotide strands. Additionally, ED computing system 110 may encode this data using a fountain code (FC) procedure. The data that may be encoded by ED computing system 110 may be obtained from one or more client devices 101 (such as client device 101A). In some examples, as shown in FIG. 2, client device 101A or any other client device 101 (such as client device 101B and/or client device 101C) may transmit corresponding user data to ED computing system 110. For example, as shown in FIG. 2, the processor 105 of the client device 101A may obtain the user data 103. Additionally, the processor 105 may generate the message 202 and encapsulate the user data 103 within one or more portions of the message 202. Processor 105 may transmit message 202, including user data 103, to server 110A of ED computing system 110.

As shown in FIG. 3, a programming interface, such as an Application Programming Interface (API) 302, established and maintained by server 110A of ED computing system 110 may receive message 202 including user data 103. As described herein, ED computing system 110 may receive message 202 across communication network 130 via a communication channel programmatically established between API 302 and processor 105 or any processor of a client device 101 (such as client device 101A, client device 101B, and/or client device 101C). In addition, the API 302 may route the message 202 to the executed segmentation engine 117A. The executed segmentation engine 117A may parse the message 202 and obtain the user data 103. Further, the executed segmentation engine 117A may store the user data 103 within a corresponding portion of the data repository 111 (such as the user data database 112).

As described herein, the executed segmentation engine 117A may perform operations to segment the user data 103 into a plurality of segments or data blocks. In some examples, the executed segmentation engine 117A may segment the user data into a plurality (e.g., 4-2048) of smaller, non-overlapping data blocks of substantially equal size. In some examples, the executed segmentation engine 117A may store the one or more segments or data blocks within a corresponding portion of the data repository 111 (such as the user data database 112). In addition, the executed segmentation engine 117A may generate metadata for each of the plurality of data blocks. As described herein, metadata for a particular data block may identify and characterize information about the corresponding data block. Examples of information about a corresponding data block include: an identifier (e.g., a block identifier) associated with the corresponding segment or data block; an identifier or value, such as a metadata bit value (e.g., a bit identifier), associated with each data element included in the corresponding data block; information or a value associated with each data element (e.g., "isZero", "isOne", or "noInfo"); and a hash value (e.g., a hash value corresponding to the data block identifier) that identifies and characterizes the information of the corresponding segment or data block. In some examples, the metadata may include encoding-decoding information that characterizes and identifies a number of encoding-decoding parameters. Each encoding-decoding parameter may characterize a characteristic of the encoding and decoding process for the received user data, such as one or more segments or data blocks. In other examples, the encoding-decoding parameters may be based in part on, and dependent on, the size of the received or obtained user data. In various examples, the executed segmentation engine 117A may store the generated metadata for each of the plurality of data blocks within a corresponding portion of the data repository 111, such as the metadata database 113.
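
As a non-limiting illustration, segmenting user data into non-overlapping data blocks of substantially equal size, and generating per-block metadata, may be sketched in Python as follows. The function name `segment`, the dictionary layout, and the use of SHA-256 over the block identifier and block data are assumptions of this sketch rather than details of the disclosure.

```python
import hashlib


def segment(user_data: bytes, num_blocks: int):
    """Split `user_data` into `num_blocks` contiguous, non-overlapping blocks.

    Block sizes differ by at most one byte. Each entry carries a block
    identifier, the block data, and a hash over both (hash choice assumed).
    """
    base, extra = divmod(len(user_data), num_blocks)
    blocks, start = [], 0
    for block_id in range(num_blocks):
        size = base + (1 if block_id < extra else 0)
        chunk = user_data[start:start + size]
        start += size
        digest = hashlib.sha256(block_id.to_bytes(2, "big") + chunk).hexdigest()
        blocks.append({"block_id": block_id, "data": chunk, "hash": digest})
    return blocks


blocks = segment(b"ABCDEFGHIJ", 4)
sizes = [len(b["data"]) for b in blocks]
```

Concatenating the block data in block-identifier order reproduces the original input, which is the property the decoding side relies on.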

Further, as shown in fig. 3, the executing Fountain Code (FC) seed engine 117B may perform operations that generate seed data. As described herein, the seed data may identify and characterize a number of fountain code seeds. In some examples, the executed FC seed engine 117B may implement a random generator program to generate each of the fountain code seeds included in the seed data. In some examples, each of the fountain code seeds may include a set or fixed number of random values, such as 26 to 32 bits. In other examples, the executed FC seed engine 117B may generate fountain code seeds based on the size of the user data. For example, the executed FC seed engine 117B may obtain a data block of user data 103 from the user data database 112 or obtain user data 103 prior to segmentation. In addition, the executed FC seed engine 117B may determine the size of the user data 103 based on the data block of the user data 103 or the user data 103 itself. Further, the executed FC seed engine 117B may generate a fountain code seed corresponding to the determined size of the user data 103 (e.g., the larger the size of the user data, the larger the size of the fountain code seed). In various examples, the fountain code seed generated by the executed FC seed engine 117B may correspond to a particular data block and a set of data elements associated with the particular data block. In such examples, the set of random values included in the fountain code seed may identify a particular data block, a number of data elements included in the set of data elements associated with the particular data block, and which data elements are to be included in the set of data elements of the particular block. In addition, the executed FC seed engine 117B may generate such fountain code seeds based on portions of metadata stored in the metadata database 113 associated with the particular block. 
As described herein, the fountain code seed may include information sufficient to describe the content of the corresponding data packet that the ED computing system 110 may use in decoding the corresponding data packet. In some examples, the executed FC seed engine 117B may generate seed data that includes one or more fountain code seeds generated by the executing FC seed engine 117B.

Further, the executed FC seed engine 117B may embed or include a portion of the metadata stored in the metadata database 113 that characterizes and identifies several encoding-decoding parameters in the fountain code seed. In some examples, for each data block of user data (such as user data 103), the executed FC seed engine 117B may obtain corresponding metadata from metadata database 113. In addition, the corresponding metadata for each of the data blocks of user data may include encoding-decoding parameters. As described herein, each encoding-decoding parameter may characterize the characteristics of the encoding and decoding process for received user data (such as one or more data blocks). In addition, the encoding-decoding parameters may be based in part on and dependent on the size of the received or obtained user data (such as user data 103).

In addition, the executed FC seed engine 117B may generate corresponding seed metadata or element code for each of the one or more fountain code seeds. In such examples, for each of the one or more fountain code seeds, the executed FC seed engine 117B may apply one or more mixing functions to the corresponding fountain code seed to generate the corresponding seed metadata. As described herein, the one or more mixing functions may be deterministic (producing the same result for a particular data packet regardless of the order in which data packets are processed), unbiased, and may have a very flat distribution across the set of results. In some examples, each of the one or more mixing functions may include a set of exclusive-or/shift (xorshift) functions configured for long-period pseudo-random number generation.
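
As a non-limiting illustration, one well-known exclusive-or/shift function is the 32-bit xorshift step with shift amounts 13, 17, and 5, sketched in Python below. The disclosure does not specify shift amounts or word size, so this particular variant is an assumption chosen only because it is widely documented.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(x: int) -> int:
    """One step of a 32-bit xorshift generator (shifts 13, 17, 5).

    The function is deterministic: the same input always yields the same
    output. A zero input maps to zero, so callers should avoid a zero state.
    """
    x &= MASK32
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


first = xorshift32(1)
second = xorshift32(first)
```

Because the output depends only on the input value, a decoder that holds the same seed can regenerate the same values without any shared state.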

In some examples, the one or more mixing functions may include a mixing function that, when applied to the fountain code seed, causes the executed FC seed engine 117B to generate a data block identifier associated with the valid data block. The data block identifier may identify the data block with which the corresponding fountain code seed is associated. In some examples, the mixing function may generate the data block identifier based on a value of the fountain code seed and a fountain code seed size. In other examples, the mixing function may include a mixing function that, when applied to the fountain code seed, causes the executed FC seed engine 117B to generate, for each identified data element in the set of random values included in the fountain code seed, a value representing one of a plurality of possible states (e.g., "isZero", "isOne", "noInfo") that the corresponding data element may take. In various examples, the one or more mixing functions may include a mixing function that, when applied to the fountain code seed, causes the executed FC seed engine 117B to generate a value representing the corresponding metadata bit for each data element identified in the set of random values included in the fountain code seed.

Referring back to fig. 3, the executed FC seed engine 117B may generate seed metadata for each fountain code seed based in part on the output of each of the one or more mixing functions. For example, the seed metadata may include a corresponding data block identifier, with a corresponding value representing a metadata bit and a corresponding value representing one of a plurality of states for each of the data elements included in the corresponding data block. In some examples, the executed FC seed engine 117B may store seed data and seed metadata for each fountain code seed included in the seed data within a corresponding portion of the data repository 111 (such as the FC seed database 114).

For example, a first mixing function may be configured to generate a data block identifier associated with a valid data block, a second mixing function may be configured to generate, for each identified data element of the set of random values included in the fountain code seed, a value representing one of a plurality of possible states (e.g., "isZero", "isOne", "noInfo") that the corresponding data element may take, and a third mixing function may be configured to generate, for each identified data element of the set of random values included in the fountain code seed, a value representing a corresponding metadata bit. In such an example, the executed FC seed engine 117B may obtain seed data and apply the first, second, and third mixing functions to a particular fountain code seed (e.g., 24-32 bits) of the seed data. Additionally, based on applying the first mixing function to the particular fountain code seed, the executed FC seed engine 117B may generate a 2- to 11-bit data block identifier that represents the particular data block associated with the particular fountain code seed. Further, based on applying the second mixing function to the particular fountain code seed, the executed FC seed engine 117B may generate values associated with the elements identified in the particular fountain code seed, such as "isZero", "isOne", or "noInfo". Further, based on applying the third mixing function to the particular fountain code seed, and for each of the elements identified in the particular fountain code seed, the executed FC seed engine 117B may generate a value representing the particular metadata bit.
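
As a non-limiting illustration, deriving a data block identifier, metadata bit values, and state values from a single fountain code seed may be sketched in Python as follows. The helper `_mix`, the salt constants, the default sizes, and the way the three outputs are drawn from one xorshift stream are all assumptions of this sketch; the disclosure does not specify them. The modulo reduction used here is also slightly biased, which a production mixing function would need to address.

```python
MASK32 = 0xFFFFFFFF
STATES = ("isZero", "isOne", "noInfo")


def _mix(x: int) -> int:
    """32-bit xorshift step; a zero input is nudged to 1 to avoid a stuck state."""
    x = (x & MASK32) or 1
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


def seed_metadata(seed, block_id_bits=4, bits_per_block=16, num_elements=3):
    """Derive hypothetical seed metadata deterministically from `seed`."""
    # The salt constants below are arbitrary illustrative values.
    block_id = _mix(seed ^ 0x9E3779B9) % (1 << block_id_bits)
    elements = []
    x = _mix(seed ^ 0x85EBCA6B)
    for _ in range(num_elements):
        x = _mix(x)
        metadata_bit = x % bits_per_block      # which data element
        x = _mix(x)
        state = STATES[x % len(STATES)]        # state claimed for that element
        elements.append((metadata_bit, state))
    return {"block_id": block_id, "elements": elements}


meta = seed_metadata(0x2ABCDEF)
```

Calling `seed_metadata` again with the same seed yields identical metadata, which is what allows a decoder to recompute it from the seed alone.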

Referring back to FIG. 3, the executed encoding engine 117C may utilize the seed data and seed metadata to encode user data, such as user data 103 obtained from one or more client devices 101. As described herein, the executed encoding engine 117C may encode each data block of the user data. In such examples, the executed encoding engine 117C may encode the data blocks of user data in parallel or, alternatively, serially. In addition, for each data block, the executed encoding engine 117C may apply a fountain code (e.g., a Luby transform) to each data element of the corresponding data block. Further, the executed encoding engine 117C may generate a set of random data elements and encapsulate it into one or more portions of a data packet. Further, the executed encoding engine 117C may combine the set of random data elements together bitwise over a binary field. The set of randomly combined data elements may be the payload of the corresponding data packet and may include information necessary to describe the original user data when processed (such as decoded) with a sufficient number of other data packets.
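
As a non-limiting illustration, selecting a seed-determined set of data elements and combining them bitwise over a binary field (i.e., with exclusive-or) may be sketched in Python as follows. A Luby transform code ordinarily draws the number of combined elements from a degree distribution; this sketch fixes the degree for brevity, and the function name `lt_packet` and the sample values are hypothetical.

```python
import random
from functools import reduce
from operator import xor


def lt_packet(seed, elements, degree):
    """Pick `degree` element indices from `seed` and XOR those elements.

    Returns (chosen_indices, payload). Re-running with the same seed
    reproduces the same indices, so only the seed needs to accompany
    the payload.
    """
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(elements)), degree))
    payload = reduce(xor, (elements[i] for i in chosen))
    return chosen, payload


elements = [0b1010, 0b0110, 0b1111, 0b0001, 0b1000]
chosen, payload = lt_packet(seed=12345, elements=elements, degree=3)

# If all but one of the chosen elements are already known, XOR-ing them
# out of the payload recovers the remaining element.
recovered = reduce(xor, (elements[i] for i in chosen[1:]), payload)
```

The recovery step at the end is the basic move a fountain decoder repeats as more packets arrive.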

For example, the executed encoding engine 117C may obtain metadata for each data block of user data 103 obtained from one or more client devices 101 (such as client device 101A). In addition, the executed encoding engine 117C may obtain seed data and corresponding seed metadata from the FC seed database 114. Further, for each data block, the encoding engine 117C that is performed may select a particular potential fountain code seed from the obtained seed data. Based on the corresponding seed metadata of the potential fountain code seed and metadata associated with the corresponding data block, the executed encoding engine 117C may determine whether the identifier identified in the metadata of the corresponding data block matches the data block identifier of the seed metadata. In examples where the executed encoding engine 117C determines that the identifier identified in the metadata of the corresponding data block does not match the data block identifier of the seed metadata, the executed encoding engine 117C may select another potential fountain code seed. In addition, the performed encoding engine 117C may repeat the process of determining whether the identifier identified in the metadata of the corresponding data block matches the data block identifier of the seed metadata of the second potential fountain code seed. The performed encoding engine 117C may keep repeating the process until the data block identifier of the potential fountain code matches the identifier identified in the metadata of the corresponding data block.

In examples where the executed encoding engine 117C determines that the identifier identified in the metadata of the corresponding data block or fragment matches the data block identifier of the seed metadata, the executed encoding engine 117C may determine whether the potential fountain code seed has been used to generate another data packet of a set of random data elements. In an example where the executed encoding engine 117C determines that a fountain code seed has been used to generate another data packet of a set of random data elements, the executed encoding engine 117C may select another potential fountain code seed from the seed data. As described herein, the performed encoding engine 117C may repeat the above process to determine a potential fountain code seed that includes a data block identifier that matches an identifier identified in metadata of a corresponding data block and that has not been used to generate another data packet of a set of random data elements.

In an example where the executed encoding engine 117C determines that the potential fountain code seed has not been used to generate another data packet of a set of random data elements, the executed encoding engine 117C may determine whether the one or more values represented in the corresponding metadata bits in the seed metadata, and the corresponding values representing one of a plurality of states (e.g., "isZero", "isOne", "noInfo"), are identified in the metadata of the corresponding data block. In examples where the executed encoding engine 117C determines that the one or more values represented in the corresponding metadata bits in the seed metadata, and the corresponding values representing one of the plurality of states, are not identified in the metadata of the corresponding data block, the executed encoding engine 117C may select another potential fountain code seed. As described herein, the executed encoding engine 117C may repeat the above-described process to determine a potential fountain code seed that (1) includes a data block identifier that matches an identifier identified in the metadata of the corresponding data block, (2) has not been used to generate another data packet of a set of random data elements, and (3) includes one or more values represented in corresponding metadata bits in the seed metadata, and corresponding values representing one of a plurality of states (e.g., "isZero", "isOne", "noInfo"), that are identified in the metadata of the corresponding data block.
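
As a non-limiting illustration, the three checks applied to each potential fountain code seed may be sketched in Python as follows. The function name `select_seed`, the data layout, and the requirement that every state in the seed metadata match the block metadata exactly are assumptions of this sketch.

```python
def select_seed(candidates, block_id, block_states, used_seeds):
    """Return the first candidate seed that passes all three checks.

    candidates   : iterable of (seed, seed_metadata) pairs
    block_states : {metadata_bit: state} taken from the block's metadata
    used_seeds   : set of seeds already used for earlier packets (mutated)
    """
    for seed, meta in candidates:
        if meta["block_id"] != block_id:       # (1) block identifier must match
            continue
        if seed in used_seeds:                 # (2) seed must be unused
            continue
        if any(block_states.get(bit) != state  # (3) every bit/state must match
               for bit, state in meta["elements"]):
            continue
        used_seeds.add(seed)
        return seed
    return None


block_states = {0: "isOne", 1: "isZero", 2: "isOne"}
candidates = [
    (101, {"block_id": 5, "elements": [(0, "isOne")]}),                # wrong block
    (102, {"block_id": 3, "elements": [(1, "isZero")]}),               # already used
    (103, {"block_id": 3, "elements": [(2, "isZero")]}),               # state mismatch
    (104, {"block_id": 3, "elements": [(0, "isOne"), (2, "isOne")]}),  # passes
]
used = {102}
chosen_seed = select_seed(candidates, 3, block_states, used)
```

In this sketch the first three candidates are each rejected by a different check, and the fourth is accepted and recorded as used.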

In examples where the executed encoding engine 117C determines that the one or more values represented in the corresponding metadata bits in the seed metadata, and the corresponding values representing one of a plurality of states, are identified in the metadata of the corresponding data block or segment, the executed encoding engine 117C may utilize the potential fountain code seed and/or the corresponding seed metadata to generate a data packet having a payload corresponding to the potential fountain code seed. For example, the payload may include the set of random data elements identified in the potential fountain code seed and/or the corresponding seed metadata. In addition, each data element in the set of random data elements may be encoded by the executed encoding engine 117C using a fountain code (e.g., a Luby transform), as described herein. In some examples, the executed encoding engine 117C may combine the encoded data elements of the set of random data elements and encapsulate the combined encoded data elements into one or more portions of the data packet 306. Further, as described herein, the executed encoding engine 117C may encapsulate the corresponding potential fountain code seed into one or more portions of the data packet 306. As described herein, for each data block identified in the metadata, the executed encoding engine 117C may repeat the process described herein until all data elements identified in the metadata of the corresponding data block are included in a data packet 306. In some examples, for each data block of user data 103, the executed encoding engine 117C may store the resulting data packets 306 within a corresponding portion of data repository 111 (such as encoded data database 115A).

In some examples, the executed encoding engine 117C may add an error correction code (ECC) to the data packet. The ECC may be used by ED computing system 110 to control errors in the corresponding data packet during the decoding process (e.g., to recover lost bits or correct erroneous bits). In some examples, the encoding-decoding parameters may indicate that the corresponding data packet is formatted such that the ECC follows the payload. In such examples, based on the encoding-decoding parameters, the executed encoding engine 117C may generate a data packet in which the ECC follows the payload.

As described herein, the executed sequencer engine 117D may generate sequence mapping data for each of the data packets 306 of the user data 103 stored in the encoded data database 115A. For example, as shown in FIG. 3, the executed sequencer engine 117D may obtain mapping data from the mapping data database 116. As described herein, the mapping data can identify particular bases and corresponding bit pairs. The mapping data may be generated and modified by an operator of ED computing system 110. Examples of bit pairs and corresponding bases may include bit pair 00 corresponding to adenine, bit pair 01 corresponding to cytosine, bit pair 10 corresponding to guanine, and bit pair 11 corresponding to thymine. In addition, the executed sequencer engine 117D may obtain one or more data packets 306 of the user data 103 stored in the encoded data database 115A. Further, for each of the one or more data packets 306, the executed sequencer engine 117D may identify or determine a bit sequence of the fountain code seed and the payload (e.g., the set of encoded random data elements) included within the corresponding data packet 306. Based on the determined or identified bit sequence and the mapping data, the executed sequencer engine 117D may determine a corresponding base sequence for the bit sequence of each of the one or more data packets 306. Further, for each of the one or more data packets 306 and based on the determined corresponding base sequence, the executed sequencer engine 117D may generate sequence mapping data 308. The sequence mapping data 308 of each of the one or more data packets 306 may identify the base sequence of the corresponding data packet 306.
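By way of non-limiting illustration only, the bit-pair-to-base mapping described above may be sketched as follows. The sketch assumes the example mapping given in the text (00 to adenine, 01 to cytosine, 10 to guanine, 11 to thymine) and represents bit sequences as strings of "0" and "1"; the function names `bits_to_bases` and `packet_to_bases` are hypothetical and are not part of the disclosed system.

```python
# Example mapping taken from the text; an operator could supply a different one.
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_bases(bits: str) -> str:
    """Map a bit string of even length to a base sequence, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit sequence must contain an even number of bits")
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def packet_to_bases(seed_bits: str, payload_bits: str) -> str:
    """Map the seed bits followed by the payload bits of one packet to bases."""
    return bits_to_bases(seed_bits + payload_bits)
```

For instance, under this assumed mapping the bit sequence 00011011 corresponds to the base sequence ACGT.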

In some examples, the executed sequencer engine 117D may add one or more primers (such as a front-end primer and a back-end primer), or sequence portions representing the corresponding primers, to the sequence mapping data of each of the one or more data packets 306. For example, for a particular data packet 306, the executed sequencer engine 117D may add a base sequence representing the front-end primer to the beginning of the corresponding base sequence. In addition, for the particular data packet 306, the executed sequencer engine 117D may add a base sequence representing the back-end primer to the end of the corresponding base sequence. Information related to the sequence of each of the one or more primers may be included in the metadata of each of the data blocks or segments. In various examples, the front-end primers and the back-end primers may be of a fixed length or size that is known to or encoded in the executed sequencer engine 117D.
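As a minimal sketch of the primer step described above, a front-end primer may be prepended and a back-end primer appended to a base sequence. The primer sequences `FRONT_PRIMER` and `BACK_PRIMER` below are hypothetical placeholders chosen for illustration and do not appear in the disclosure.

```python
# Hypothetical fixed-length primers; actual primer sequences are not specified here.
FRONT_PRIMER = "ACGTAC"
BACK_PRIMER = "TGCATG"


def add_primers(bases: str, front: str = FRONT_PRIMER, back: str = BACK_PRIMER) -> str:
    """Return the base sequence with the front-end and back-end primers attached."""
    return front + bases + back
```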

In other examples, for the sequence mapping data 308 of each data packet 306, the executed sequencer engine 117D may determine whether a polynucleotide synthesized from the corresponding sequence of bases identified in the associated sequence mapping data would be sufficiently stable. In such examples, for the sequence mapping data 308 of each data packet 306, the executed sequencer engine 117D may determine whether the base sequence in the corresponding sequence mapping data 308 meets one or more sequence criteria. Examples of the one or more sequence criteria include criteria associated with repeating bases (e.g., whether the number of identical bases in a row exceeds a threshold number of bases), criteria associated with patterns of bases (e.g., the base sequence should have a number of patterns below a threshold amount), and criteria associated with ratios of bases (e.g., the ratio of AT to GC should be 50/50). In examples where the executed sequencer engine 117D determines that the base sequence of the sequence mapping data 308 for a particular data packet 306 meets the one or more sequence criteria, the executed sequencer engine 117D may store the sequence mapping data 308 in the mapping data database 116.
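Two of the sequence criteria described above (repeating bases and the AT-to-GC ratio) may be illustrated by the following sketch. The threshold values (`max_run`, `gc_low`, `gc_high`) and the function names are assumptions made for illustration only, and the pattern-based criterion is omitted because the text does not define it in enough detail.

```python
from itertools import groupby


def longest_run(bases: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(bases)), default=0)


def gc_fraction(bases: str) -> float:
    """Fraction of bases that are G or C (0.5 corresponds to a 50/50 AT to GC ratio)."""
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def meets_sequence_criteria(bases: str, max_run: int = 3,
                            gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    """Check the repeat criterion and the ratio criterion with assumed thresholds."""
    return longest_run(bases) <= max_run and gc_low <= gc_fraction(bases) <= gc_high
```

A sequence that fails either check would be rejected, in which case another fountain code seed could be tried for the same data elements.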

In various examples, the executed sequencer engine 117D may determine whether each data element of each data block has been included in the sequence mapping data 308 of the data packets 306. In such examples, the executed sequencer engine 117D may utilize the metadata for each data block to determine whether each data element in each data block has been included in the sequence mapping data 308 of the data packets 306 stored in the mapping data database 116. In examples where the executed sequencer engine 117D determines that the sequence mapping data 308 of the data packets 306 stored in the mapping data database 116 is missing one or more data elements of one or more data blocks of user data (e.g., user data 103), the executed sequencer engine 117D may signal or instruct the encoding engine 117C to continue encoding the missing data elements of the incomplete data block or segment. Otherwise, the executed sequencer engine 117D may transmit the sequence mapping data 308 for each data packet 306 to the server 120A of the genetic computing system 120. In such examples, the executed sequencer engine 117D may generate the message 304 and may encapsulate the sequence mapping data 308 of each data packet 306 within one or more portions of the message 304. In addition, the executed sequencer engine 117D may transmit the message 304 including the sequence mapping data 308 for each data packet 306 to the server 120A of the genetic computing system 120. As described herein, genetic computing system 120 can utilize the sequence mapping data 308 to generate corresponding polynucleotide strands and store the corresponding polynucleotide strands in a pool of polynucleotide strands. The pool of polynucleotide strands may include a plurality of polynucleotide strands that each correspond to a particular block or segment of user data.

As shown in FIG. 4, a programming interface (such as API 402) established and maintained by server 120A of genetic computing system 120 may receive the message 304 including the sequence mapping data 308 for each data packet 306. As described herein, genetic computing system 120 may receive the message 304 across communication network 130 via a programmatically established communication channel between API 402 and the executed sequencer engine 117D. In addition, API 402 may route the message 304 to the executed synthesis engine 121A. The executed synthesis engine 121A may parse the message 304 and obtain the sequence mapping data 308 for each data packet 306. Further, the executed synthesis engine 121A may provide the sequence mapping data 308 to one or more electrodes of the electrode unit 122. The one or more electrodes of electrode unit 122 can generate a corresponding polynucleotide strand 404 for each base sequence identified in the sequence mapping data 308 of each data packet 306. The polynucleotide strands 404 generated from the sequence mapping data 308 may be stored in a polynucleotide strand pool 406. Polynucleotide strand pool 406 may include a plurality of polynucleotide strands 404 that each correspond to a particular block or segment of user data, such as user data 103.

FIG. 7 is a flow chart of an exemplary process 700 for encoding data for storage in genetic material. For example, one or more computing systems (such as ED computing system 110) may perform one or more steps of exemplary process 700, as described below with reference to FIG. 7. Referring to FIG. 7, ED computing system 110 may perform any of the processes described herein to segment user data 103 into a plurality of data blocks (e.g., in step 702 of FIG. 7). In some examples, ED computing system 110 may obtain user data 103 from client device 101 (such as client device 101A). In other examples, the executed segmentation engine 117A may segment the user data into a plurality (e.g., 4-2048) of smaller, non-overlapping data blocks of substantially equal size. In an example, the executed segmentation engine 117A may store the one or more segments or data blocks within a corresponding portion of the data repository 111 (such as the user data database 112).
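The segmentation of step 702 may be sketched, by way of example only, as a split of a byte string into a chosen number of contiguous, non-overlapping blocks whose sizes differ by at most one byte. The function name `segment` and the use of bytes as the unit of user data are assumptions made for illustration.

```python
def segment(data: bytes, num_blocks: int) -> list[bytes]:
    """Split data into num_blocks contiguous, non-overlapping, near-equal blocks."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    base, extra = divmod(len(data), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks carry one additional byte each.
        size = base + (1 if i < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks
```

Concatenating the returned blocks in order reproduces the original data, which is the property the decoder relies on when reassembling the user data.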

Additionally, ED computing system 110 may perform any of the processes described herein to generate seed data (e.g., in step 704 of FIG. 7). In some examples, the executed FC seed engine 117B may implement operations to generate seed data. As described herein, the seed data may identify and characterize a number of fountain code seeds. In some examples, the executed FC seed engine 117B may implement a random generator program to generate each of the fountain code seeds included in the seed data. In some examples, each of the fountain code seeds may include a set or fixed number of random values, such as 26 to 32 bits. In other examples, the executed FC seed engine 117B may generate fountain code seeds based on the size of the user data.

For example, the executed FC seed engine 117B may obtain a data block of user data 103 from the user data database 112 or obtain user data 103 prior to segmentation. In addition, the executed FC seed engine 117B may determine the size of the user data 103 based on the data block of the user data 103 or the user data 103 itself. Further, the executed FC seed engine 117B may generate a fountain code seed corresponding to the determined size of the user data 103 (e.g., the larger the size of the user data, the larger the size of the fountain code seed).

In various examples, the fountain code seed generated by the executed FC seed engine 117B may correspond to a particular data block and a set of data elements associated with the particular data block. In such examples, the set of random values included in the fountain code seed may identify a particular data block, a number of data elements included in the set of data elements associated with the particular data block, and which data elements are to be included in the set of data elements of the particular block. In addition, the executed FC seed engine 117B may generate such fountain code seeds based on portions of metadata stored in the metadata database 113 associated with the particular block. As described herein, the fountain code seed may include information sufficient to describe the content of the corresponding data packet that the ED computing system 110 may use in decoding the corresponding data packet. In some examples, the executed FC seed engine 117B may generate seed data that includes one or more fountain code seeds generated by the executed FC seed engine 117B.

Further, the executed FC seed engine 117B may embed or include a portion of the metadata stored in the metadata database 113 that characterizes and identifies several encoding-decoding parameters in the fountain code seed. In some examples, for each data block of user data (such as user data 103), the executed FC seed engine 117B may obtain corresponding metadata from metadata database 113. In addition, the corresponding metadata of each data block of user data may include encoding-decoding parameters. As described herein, each encoding-decoding parameter may characterize the characteristics of the encoding and decoding process for received user data (e.g., one or more data blocks). In addition, the encoding-decoding parameters may be based in part on and dependent on the size of the received or obtained user data (such as user data 103).

Additionally, for each of the one or more fountain code seeds, the executed FC seed engine 117B may generate corresponding seed metadata or element code. In such examples, for each of the one or more fountain code seeds, the executed FC seed engine 117B may apply one or more mixing functions to the corresponding fountain code seed to generate the corresponding seed metadata. As described herein, the one or more mixing functions may be deterministic (producing the same result for a particular data packet regardless of the order in which data packets are processed), may be unbiased, and may have a very flat distribution across the set of results. In some examples, each of the one or more mixing functions may include a set of exclusive-or shift (xorshift) functions configured for long-period pseudo-random number generation.
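One possible form of such a mixing function is sketched below, assuming a 32-bit xorshift generator with the shift constants 13, 17, and 5. The derivation of a data block identifier, a metadata bit index, and a state from successive outputs is an illustrative assumption, as are the names `xorshift32` and `seed_metadata`; reducing an output with a modulus introduces a small bias that a production mixing function would need to address.

```python
MASK32 = 0xFFFFFFFF
STATES = ("isZero", "isOne", "noInfo")


def xorshift32(x: int) -> int:
    """One step of a 32-bit xorshift generator (shifts 13, 17, 5)."""
    x &= MASK32
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


def seed_metadata(seed: int, num_blocks: int, metadata_size: int) -> dict:
    """Derive illustrative seed metadata deterministically from a seed."""
    # Zero is a fixed point of xorshift, so substitute an arbitrary non-zero constant.
    x = xorshift32(seed or 0x9E3779B9)
    block_id = x % num_blocks
    x = xorshift32(x)
    metadata_bit = x % metadata_size  # between zero and metadata_size - 1
    x = xorshift32(x)
    state = STATES[x % len(STATES)]
    return {"block_id": block_id, "metadata_bit": metadata_bit, "state": state}
```

Because the result depends only on the seed, an encoder and a decoder that apply the same function to the same seed obtain the same seed metadata without exchanging it.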

Further, ED computing system 110 may perform any of the processes described herein to implement, for each of a plurality of data blocks, a first set of operations that generate one or more data packets (e.g., in step 706 of FIG. 7). In some examples, the executed encoding engine 117C may encode user data, such as user data 103 obtained from one or more client devices 101, with the seed data and seed metadata. As described herein, the executed encoding engine 117C may encode each data block of user data. In such examples, the executed encoding engine 117C may encode the data blocks of user data in parallel or, alternatively, serially. In addition, for each data block, the executed encoding engine 117C may apply a fountain code (e.g., a Luby transform) to each data element of the corresponding data block. Further, the executed encoding engine 117C may generate and encapsulate a set of random data elements into one or more portions of a data packet. Further, the executed encoding engine 117C may combine the set of random data elements bitwise over a binary field. The set of randomly combined data elements may be the payload of the corresponding data packet and may include information necessary to describe the original user data when processed (such as decoded) with a sufficient number of other data packets.
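The bitwise combination of a seed-selected set of data elements may be sketched as follows. The sketch assumes equal-length data elements, uses Python's `random.Random` as a stand-in for the seed-driven selection, and draws the number of combined elements uniformly; a Luby transform would ordinarily draw that number from a soliton-type distribution, so this is a simplification rather than the disclosed encoder. The names `xor_bytes`, `chosen_indices`, and `make_packet` are hypothetical.

```python
import random
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings (addition over a binary field)."""
    return bytes(x ^ y for x, y in zip(a, b))


def chosen_indices(num_elements: int, seed: int) -> list[int]:
    """Indices of the data elements selected by a seed (uniform degree, simplified)."""
    rng = random.Random(seed)
    degree = rng.randint(1, num_elements)
    return sorted(rng.sample(range(num_elements), degree))


def make_packet(elements: list[bytes], seed: int) -> tuple[int, bytes]:
    """Return (seed, payload), where payload is the XOR of the selected elements."""
    selected = (elements[i] for i in chosen_indices(len(elements), seed))
    return seed, reduce(xor_bytes, selected)
```

A decoder holding the same seed can recompute `chosen_indices` and therefore knows which data elements were combined into the payload.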

In examples where the executed encoding engine 117C determines that the one or more values represented in corresponding metadata bits of the seed metadata, and a corresponding value representing one of a plurality of states, are identified in the metadata of the corresponding data block or segment, the executed encoding engine 117C may utilize the potential fountain code seed and/or the corresponding seed metadata to generate the data packet 306 having a payload corresponding to the potential fountain code seed. For example, the payload may include the set of random data elements identified in the potential fountain code seed and/or corresponding seed metadata. In addition, each data element in the set of random data elements may be encoded by the executed encoding engine 117C using a fountain code (e.g., a Luby transform), as described herein. In some examples, the executed encoding engine 117C may combine the encoded data elements of the set of random data elements and encapsulate the combined encoded data elements into one or more portions of the data packet 306. Further, as described herein, the executed encoding engine 117C may encapsulate the corresponding potential fountain code seed into one or more portions of the data packet 306. As described herein, for each data block identified in the metadata, the executed encoding engine 117C may repeat the process described herein until all data elements identified in the metadata of the corresponding data block are included in a data packet 306. In some examples, for each data block of user data 103, the executed encoding engine 117C may store the resulting data packet 306 within a corresponding portion of data repository 111 (such as encoded data database 115A).

Further, for each data packet, ED computing system 110 may cause a second set of operations to be performed (e.g., in step 708 of FIG. 7) that synthesizes a polynucleotide chain based at least on the bit values of the corresponding data packet. In some examples, genetic computing system 120 may utilize the sequence mapping data 308 generated from each data packet to generate a corresponding polynucleotide strand and store the corresponding polynucleotide strand in a pool of polynucleotide strands. The pool of polynucleotide strands may include a plurality of polynucleotide strands that each correspond to a particular block or segment of user data.

C. Computer-implemented techniques for decoding user data from a polynucleotide chain

As described herein, encoder-decoder (ED) computing system 110 may implement operations to decode data derived from genetic material, such as one or more polynucleotide chains. In addition, ED computing system 110 may utilize Fountain Code (FC) programs to decode this data. In some examples, genetic computing system 120 may process and sequence one or more polynucleotide strands in a pool of polynucleotide strands and generate sequence data identifying a base sequence for each of the one or more polynucleotide sequences. Furthermore, ED computing system 110 may generate sequence bit data based on the sequence data. The sequence bit data may identify a bit sequence corresponding to each of the base sequences of each of the one or more polynucleotide strands identified in the sequence data.

For example, as shown in FIG. 5, a pool of polynucleotide strands 406 may include one or more polynucleotide strands 404. In addition, one or more electrodes of electrode unit 122 can determine the sequence of bases of each of the one or more polynucleotide strands 404 in polynucleotide strand pool 406. The executed sequencer engine 121B may generate sequence data 504 that identifies the determined base sequence for each of the one or more polynucleotide strands 404. As described herein, each of the one or more polynucleotide strands 404 and the pool of polynucleotide strands 406 can be associated with user data (such as user data 103). In addition, the executed sequencer engine 121B may generate the message 502 and package the sequence data 504 within one or more portions of the message 502. In addition, the executed sequencer engine 121B may transmit the message 502 to the server 110A of the ED computing system 110. As described herein, ED computing system 110 may utilize the sequence data 504 to reconstruct and decode user data 103.

As shown in FIG. 6, a programming interface (such as an API 602) established and maintained by a server 110A of the ED computing system 110 may receive a message 502 including sequence data 504. As described herein, the ED computing system 110 may receive the message 502 across the communication network 130 via a communication channel programmatically established between the API 602 and the executed sequencer engine 121B. In addition, the API 602 may route the message 502 to the executed sequencer engine 117D. The executed sequencer engine 117D may parse the message 502 and obtain the sequence data 504. Further, the executed sequencer engine 117D may store the sequence data 504 within one or more portions of the data repository 111 (such as the sequence data database 604).

In addition, the executed sequencer engine 117D may perform an operation of determining a sequence of bits corresponding to the base sequence identified in the sequence data 504. In some examples, the executed sequencer engine 117D may obtain the sequence data 504 from the sequence data database 604. In addition, the executed sequencer engine 117D may obtain mapping data from the mapping data database 116. Based on the mapping data and the sequence data 504, the executed sequencer engine 117D may determine a sequence of bits corresponding to the base sequence identified in the sequence data 504. Further, the executed sequencer engine 117D may generate sequence-bit data that identifies a sequence of bits corresponding to the base sequence identified in the sequence data 504. In some examples, the executed sequencer engine 117D may store the sequence bit data within one or more portions of the data repository 111 (such as the sequence data database 604).
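The reverse mapping from a base sequence to a sequence of bits may be sketched as follows, again assuming the example mapping given earlier in the description (adenine to 00, cytosine to 01, guanine to 10, thymine to 11). The function name `bases_to_bits` is hypothetical.

```python
# Inverse of the example bit-pair mapping; an operator-supplied mapping could differ.
BASE_TO_BIT_PAIR = {"A": "00", "C": "01", "G": "10", "T": "11"}


def bases_to_bits(bases: str) -> str:
    """Map each base of a sequenced strand back to its two-bit value."""
    return "".join(BASE_TO_BIT_PAIR[base] for base in bases)
```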

As described herein, in some examples, each polynucleotide strand 404 may include primers, such as front-end primers and/or back-end primers, at the ends of the fountain code seed and payload. In addition, the base sequences of the front-end primer and the back-end primer are the same for each polynucleotide strand 404. This information (not shown in FIG. 1) may be obtained by or encoded into the executed sequencer engine 117D, and may be used to identify and/or trim the primers from the sequence of the polynucleotide strand identified in the sequence data 504. In other examples, each polynucleotide strand 404 may not include primers, such as front-end primers and/or back-end primers. In such examples, the executed sequencer engine 117D may not need to identify and trim primers from the sequence of the polynucleotide strand identified in the sequence data generated by the genetic computing system 120.

Referring back to FIG. 6, the executed pre-inspection engine 606 may implement a set of pre-inspection or pre-processing operations to determine an estimated distribution of data blocks based on the sequence bit data for each of the sequences of bases identified in the sequence data 504. For example, the executed pre-inspection engine 606 may obtain a portion of the sequence bit data from the sequence data database 604. As described herein, the portion of the sequence bit data can be associated with a set of random polynucleotide strands 404 in the pool of polynucleotide strands 406 (e.g., a set of 100,000 to 200,000 polynucleotide strands in a pool of 10,000,000 polynucleotide strands). Additionally, in some examples, the set of random polynucleotide strands 404 may include primers, such as front-end primers and back-end primers. In such examples, the executed pre-inspection engine 606 may determine the sequences of bits from the portion of the sequence bit data and identify the portions of each sequence of bits corresponding to the primers (described herein as "primer portions") based on the sequence bit data and on information, known to or encoded in the executed decoding engine 117E, that is associated with the length and size of the primers. In addition, the executed pre-inspection engine 606 may trim the primer portions, leaving the portions of the sequence of bits located between the primer portions (described herein as "middle portions"). As described herein, a middle portion may be the portion of bits corresponding to the fountain code seed and associated payload. Alternatively, in instances where the set of random polynucleotide strands does not include a primer, the executed pre-inspection engine 606 may not trim the sequence of bits corresponding to the set of random polynucleotide strands.
In such examples, the executed synthesis engine 121A may implement a biological protocol using custom sequencing primers that has the effect of removing the front-end primers and/or back-end primers of the polynucleotide sequence. Thus, the remaining polynucleotide sequences, or middle portions, may be sequenced by sequencer engine 121B, and the corresponding sequences of bits generated by sequencer engine 117D may correspond to fountain code seed portions and associated payload portions.
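Where primers are present and their lengths are known, the trimming operation may be sketched as a simple slice of the bit sequence. The sketch assumes primer lengths expressed in bits (two bits per base under the example mapping), and the function name `trim_primers` is hypothetical.

```python
def trim_primers(bits: str, front_len: int, back_len: int) -> str:
    """Remove fixed-length primer portions, leaving the middle portion."""
    if front_len < 0 or back_len < 0 or front_len + back_len > len(bits):
        raise ValueError("primer lengths are inconsistent with the bit sequence")
    # Slicing against len(bits) also handles the case of a zero-length back primer.
    return bits[front_len:len(bits) - back_len]
```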

Further, the executed pre-inspection engine 606 may obtain the encoding-decoding parameters from the decoded data database 115B. As described herein, the encoding-decoding parameters may indicate that the corresponding data packet is formatted such that the fountain code seed precedes the payload, and may indicate a fountain code seed size and/or a payload size. Based on the remaining middle portions and the encoding-decoding parameters, the executed pre-inspection engine 606 may determine which part of each middle portion corresponds to the fountain code seed, which part corresponds to the associated payload, and the size of the middle portions.

In examples where ED computing system 110 has encoded and decoded user data of varying sizes, the size of the bit sequence corresponding to the fountain code seed and the payload may vary. As described herein, data packet mapping data may be stored in ED computing system 110. The data packet mapping data may indicate, for each of the varying sizes of bit sequences corresponding to the fountain code seed and the payload, a particular format (e.g., fountain code seed before or after the payload) and the size of the fountain code seed and/or the payload. Additionally, when the executed pre-inspection engine 606 performs the set of pre-inspection or pre-processing operations, the executed pre-inspection engine 606 may determine the size of each of the middle portions to determine a majority size. Based on the majority size and the data packet mapping data, the executed pre-inspection engine 606 may determine an estimated fountain code seed size, an estimated payload size, which part of each middle portion corresponds to the fountain code seed, and which part of each middle portion corresponds to the payload.
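The majority-size determination and the subsequent split of a middle portion may be sketched as follows. The sketch assumes that the seed size and the seed position (before or after the payload) have already been looked up from the data packet mapping data for the majority size; the function names `majority_size` and `split_middle` are hypothetical.

```python
from collections import Counter


def majority_size(middle_portions: list[str]) -> int:
    """Most frequently occurring length among the middle portions."""
    return Counter(len(m) for m in middle_portions).most_common(1)[0][0]


def split_middle(middle: str, seed_bits: int, seed_first: bool = True) -> tuple[str, str]:
    """Split a middle portion into (seed portion, payload portion)."""
    if not 0 < seed_bits < len(middle):
        raise ValueError("seed size must be positive and smaller than the middle portion")
    if seed_first:
        return middle[:seed_bits], middle[seed_bits:]
    return middle[-seed_bits:], middle[:-seed_bits]
```

Middle portions whose length differs from the majority size (for example, because of insertions or deletions during sequencing) could be set aside for the recovery operations described elsewhere herein.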

Referring back to FIG. 6, the executed pre-inspection engine 606 may determine which part of each portion of the bit sequence corresponding to the fountain code seed (described herein as a "fountain code seed portion") is associated with the identifier of a data block. Additionally, based on the part of the fountain code seed portion of each middle portion that is determined to be associated with the identifier of a data block, the executed pre-inspection engine 606 may determine the identifier of the data block. Further, the executed pre-inspection engine 606 may determine a distribution of identifiers of the data blocks based on the identifiers determined for each of the fountain code seed portions. In some examples, the executed pre-inspection engine 606 may generate a histogram that identifies and characterizes the distribution of identifiers of the data blocks. Additionally or alternatively, the executed pre-inspection engine 606 may generate data block plan data that identifies and characterizes the distribution of the determined identifiers of the data blocks. In some examples, the executed decoding engine 117E may store the generated data block plan data within a corresponding portion of the data repository 111 (such as the decoded data database 115B).
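A histogram of data block identifiers may be sketched as follows. For simplicity, the sketch assumes that the identifier occupies the leading `id_bits` bits of each fountain code seed portion; that layout, and the function names `block_id` and `block_histogram`, are assumptions made for illustration and are not stated in the description.

```python
from collections import Counter


def block_id(seed_portion: str, id_bits: int) -> int:
    """Read the data block identifier from the leading bits of a seed portion."""
    return int(seed_portion[:id_bits], 2)


def block_histogram(seed_portions: list[str], id_bits: int) -> Counter:
    """Count how many sampled seed portions refer to each data block identifier."""
    return Counter(block_id(s, id_bits) for s in seed_portions)
```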

The executed decoding engine 117E may perform a set of operations to recover or generate seed metadata or element code corresponding to each identified or determined portion of the sequence of bits associated with a second set of polynucleotide strands 404. In some examples, the second set of polynucleotide strands 404 may be all of the polynucleotide strands sequenced (e.g., polynucleotide strand pool 406). In such examples, the executed decoding engine 117E may obtain, from the sequence data database 604, sequence bit data associated with each sequence of bases identified in the sequence data 504. Based on the sequence bit data of the second set of polynucleotide strands 404, the executed decoding engine 117E may determine the sequences of bits associated with the second set of polynucleotide strands. In examples where the second set of polynucleotide strands includes primers (such as front-end primers and back-end primers), the executed decoding engine 117E may identify the portions of each sequence of bits corresponding to the primer portions based on information, known to or encoded in the executed decoding engine 117E, that is associated with the length and size of the primers, and may trim the primer portions. Each of the remaining portions, otherwise described as middle portions, may correspond to the bits of a fountain code seed and associated payload. Alternatively, the second set of polynucleotide strands 404 may not include primers. In such examples, the executed synthesis engine 121A may implement a biological protocol using custom sequencing primers that has the effect of removing the front-end primers and/or back-end primers of the polynucleotide sequence. Thus, the remaining polynucleotide sequences may be sequenced by sequencer engine 121B, and the corresponding sequences of bits generated by sequencer engine 117D may correspond to the fountain code seed portion and associated payload portion.
Further, the executed decoding engine 117E may obtain the encoding-decoding parameters from the decoded data database 115B and determine which part of the remaining bits corresponds to the fountain code seed and which part of the remaining bits corresponds to the associated payload.

In some examples, for each portion of bits corresponding to a fountain code seed (i.e., each fountain code seed portion), the executed decoding engine 117E may determine an identifier of the corresponding data block based on that fountain code seed portion. Additionally, the executed decoding engine 117E may determine a distribution of identifiers of data blocks associated with the second set of polynucleotide chains based on the data block identifiers determined from the fountain code seed portions. In some examples, the executed decoding engine 117E may determine whether the distribution of identifiers of the data blocks associated with the second set of polynucleotide chains matches the distribution of identifiers of the data blocks identified in the data block plan data. In examples where the distribution of identifiers of data blocks associated with the second set of polynucleotide chains does not match the distribution of identifiers of data blocks identified in the data block plan data, the executed decoding engine 117E may implement additional recovery operations using clustering, multiple-read alignment, and majority base calling. In some examples, based on the mismatch between the distribution of identifiers of data blocks associated with the second set of polynucleotide chains and the distribution of identifiers of data blocks identified in the data block plan data, the executed decoding engine 117E may determine the identifiers of data blocks that are missing relative to the data block plan data. In such examples, the executed decoding engine 117E may implement the additional recovery operations, using clustering, multiple-read alignment, and majority base calling, for such missing data blocks.
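The majority base calling step of such a recovery operation may be sketched as a column-wise vote over reads that have already been clustered and aligned to equal length. The clustering and alignment steps themselves are not shown, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Call the most common base at each position of equal-length aligned reads."""
    if len({len(read) for read in aligned_reads}) != 1:
        raise ValueError("reads must be non-empty and aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads))
```

With three reads of the same strand, a single substitution error in one read is outvoted by the other two reads at that position.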

In examples where the distribution of identifiers of data blocks associated with the second set of polynucleotide chains matches the distribution of identifiers of data blocks identified in the data block plan data, the performed decoding engine 117E may classify each fountain code seed portion and associated bits corresponding to a payload (described herein as a "payload portion") according to the corresponding data block identifiers. In addition, the executed decoding engine 117E may generate list data that identifies and characterizes one or more intermediate portions (e.g., fountain code portions and payload portions associated with corresponding data block identifiers) for each identifier of each of the data blocks. In some examples, the executed decoding engine 117E may store the list data within a portion of the data repository 111 (such as the decoded data database 115B).

Further, similar to the executed FC seed engine 117B, the executed decoding engine 117E may apply one or more mixing functions to each fountain code seed portion for each identifier of a data block or data block identifier to generate corresponding seed metadata or element code. As described herein, examples of blending functions may include blending functions that, when applied to each fountain code seed portion, cause the executed decoding engine 117E to generate a corresponding data block identifier. The data block identifier may identify a corresponding data block associated with the set of random data elements identified in the corresponding fountain code seed portion. Additionally, another example of a mixing function may include a mixing function that, when applied to each fountain code seed portion, causes the executed decoding engine 117E to generate, for each identified data element in the set of random values identified in the corresponding fountain code seed portion, a value representing one of a plurality of possible states (e.g., "isZero", "isOne", "noInfo") that the corresponding data element may take. Further, yet another example of a mixing function may include a mixing function that, when applied to each fountain code seed portion, causes the executed decoding engine 117E to generate a value representing a corresponding metadata bit for each identified data element in the set of random values identified in the fountain code seed portion. In some examples, the value representing the corresponding metadata bit may indicate to the decoding engine 117E being executed which resulting value representing one of the plurality of states is associated with which data element. In other examples, the value may be between zero and the metadata size minus one. 
In other examples, the executed decoding engine 117E may generate seed metadata for each fountain code seed portion based on the corresponding data block or segment identifier, one or more values each representing a metadata bit, and an associated value representing one of a plurality of states. The seed metadata of each fountain code seed portion may identify and characterize a corresponding data block or segment identifier, one or more values each representing a metadata bit, and an associated value representing one of a plurality of states. In such examples, the executed decoding engine 117E may store the seed metadata or element code within a portion of the data repository 111 (such as the decoded data database 115B).
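As a rough illustration of the mixing functions described above, the following Python sketch derives a data block identifier and a list of (metadata bit, state) pairs from a single fountain code seed portion. The use of SHA-256, the helper `_mix`, the function `seed_metadata`, the `ElementCode` type, and all parameter names are assumptions made purely for illustration; the disclosure does not specify a particular mixing function.

```python
import hashlib
from dataclasses import dataclass

# The three states named in the description.
STATES = ("isZero", "isOne", "noInfo")


@dataclass(frozen=True)
class ElementCode:
    metadata_bit: int  # index in [0, metadata_size - 1]
    state: str         # one of STATES


def _mix(seed: bytes, label: bytes, index: int) -> int:
    """Hypothetical mixing function: hash a label, an index, and the seed."""
    digest = hashlib.sha256(label + index.to_bytes(4, "big") + seed).digest()
    return int.from_bytes(digest[:8], "big")


def seed_metadata(seed: bytes, num_blocks: int, metadata_size: int,
                  elements_per_seed: int):
    """Return (data block identifier, list of ElementCode) for one seed."""
    block_id = _mix(seed, b"block", 0) % num_blocks
    codes = []
    for i in range(elements_per_seed):
        bit = _mix(seed, b"bit", i) % metadata_size
        state = STATES[_mix(seed, b"state", i) % len(STATES)]
        codes.append(ElementCode(bit, state))
    return block_id, codes
```

Because the derivation is deterministic, an encoder and a decoder that share these (assumed) functions and parameters would obtain the same seed metadata from the same seed.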

In some examples, the executed decoding engine 117E may determine, for each identifier of a data block or segment, whether the seed metadata or element codes of the corresponding fountain code seed portions are consistent with each other. In such examples, the executed decoding engine 117E may make this determination using one or more confidence thresholds. In some examples, a confidence threshold may specify a number of fountain code seed portions whose seed metadata indicates that a particular metadata bit has a value representing a particular state of the plurality of states. For example, for a first data block, the executed decoding engine 117E may obtain seed metadata for 750 fountain code seed portions associated with an identifier of the first data block. Additionally, based on the seed metadata of the 750 fountain code seed portions, the executed decoding engine 117E may determine that 500 of the fountain code seed portions have seed metadata indicating that the metadata bit having a value of 1 (e.g., metadata bit 1) has a corresponding value representing the state "isZero". The executed decoding engine 117E may then determine whether metadata bit 1 of the first data block has the corresponding value representing state "isZero" based on whether that number of fountain code seed portions (here, 500) is greater than or equal to the confidence threshold.
In an example where the confidence threshold is 250 fountain code seed portions, the executed decoding engine 117E may determine that metadata bit 1 of the first data block has the corresponding value representing state "isZero". Alternatively, in an example where the confidence threshold is 5000 instances of seed metadata or element code, the executed decoding engine 117E may determine that metadata bit 1 of the first data block may not have the corresponding value representing state "isZero", or that it has a state of "isNoInfo".
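The threshold comparison in this example can be sketched in Python as follows. The function name `meets_confidence_threshold` and the representation of each seed portion's metadata as a dictionary mapping a metadata bit to a state are hypothetical; the assumption that the remaining 250 seed portions report nothing for metadata bit 1 is also made only for illustration.

```python
def meets_confidence_threshold(seed_metadata_list, metadata_bit, state, threshold):
    """Count seed portions reporting `state` for `metadata_bit` and compare
    the count against the confidence threshold (greater than or equal)."""
    votes = sum(1 for meta in seed_metadata_list
                if meta.get(metadata_bit) == state)
    return votes >= threshold


# 750 seed portions for the first data block; 500 report "isZero" for bit 1.
portions = [{1: "isZero"}] * 500 + [{}] * 250

print(meets_confidence_threshold(portions, 1, "isZero", 250))   # True
print(meets_confidence_threshold(portions, 1, "isZero", 5000))  # False
```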

In other examples, for a particular data block, the one or more confidence thresholds may be based on which state of the plurality of states is indicated for a particular metadata bit by the largest number of fountain code seed portions. For example, for a second data block, the executed decoding engine 117E may obtain seed metadata for a number of fountain code seed portions. Based on the obtained seed metadata, the executed decoding engine 117E may determine that 100 fountain code seed portions have seed metadata indicating that the metadata bit having a value of 3 (e.g., metadata bit 3) has a corresponding value representing the state "isZero", and that 350 fountain code seed portions have seed metadata indicating that metadata bit 3 has a corresponding value representing the state "isOne". Further, the executed decoding engine 117E may determine that metadata bit 3 of the second data block has the corresponding value representing state "isOne", because the number of fountain code seed portions indicating "isOne" for metadata bit 3 (350) exceeds the number indicating "isZero" (100).
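A count-based selection of this kind can be sketched in Python with a simple tally. The function name `majority_state`, the treatment of a tie as "noInfo", and the small illustrative tallies in the usage lines are all assumptions, not details taken from the disclosure.

```python
from collections import Counter


def majority_state(reported_states):
    """Return the state reported most often for one metadata bit.

    Only "isZero" and "isOne" reports are tallied; an empty tally or a tie
    is treated as "noInfo" (an assumption made for this sketch).
    """
    tally = Counter(s for s in reported_states if s in ("isZero", "isOne"))
    if not tally:
        return "noInfo"
    ranked = tally.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "noInfo"
    return ranked[0][0]


# Hypothetical reports for a single metadata bit.
print(majority_state(["isOne", "isOne", "isZero", "isOne"]))  # isOne
print(majority_state(["isOne", "isZero"]))                    # noInfo
```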

In other examples, the executed decoding engine 117E may determine, for each identifier of a data block or segment, whether the payload portion is sufficient for the executed decoding engine 117E to recover the complete corresponding data block. In some examples, the executed decoding engine 117E may make such determinations based on the seed metadata or element code of each corresponding fountain code seed portion. In such examples, the executed decoding engine 117E may determine which data elements of the corresponding data block are identified in the seed metadata of each corresponding fountain code seed portion. Additionally, based in part on the identified data elements, the executed decoding engine 117E may determine whether any, and which, data elements of the corresponding data block are missing or incorrect. For example, the executed decoding engine 117E may determine that all or most of the identified data elements having a bit value of 4 have a corresponding value associated with state "isZero", and that a small number of identified data elements having a bit value of 4 have a corresponding value associated with state "isOne". Thus, the executed decoding engine 117E may determine that the data element having a bit value of 4 has a corresponding value associated with state "isZero" and that the data elements identified as "isOne" are incorrect. In another example, the executed decoding engine 117E may determine that, for one or more of the identified data elements having a bit value of 6, no information was obtained about the corresponding value associated with a state of the plurality of states. In addition, the executed decoding engine 117E may determine that a number of identified data elements having a bit value of 6 have a corresponding value associated with state "isOne".
Thus, the executed decoding engine 117E may determine that the data elements having a bit value of 6 for which the value associated with a state is missing may have a corresponding value associated with state "isOne".
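The handling of incorrect and missing data elements described above can be sketched as follows. The function name `resolve_reports` and the use of `None` to mark a data element for which no state information was obtained are hypothetical choices made for this illustration.

```python
from collections import Counter


def resolve_reports(reports):
    """Resolve the state of one bit position from a list of reports.

    Each report is "isZero", "isOne", or None (no information obtained).
    Returns (resolved_state, incorrect_indexes, missing_indexes), where
    incorrect reports disagree with the most common state and missing
    reports are the ones that may be filled in with the resolved state.
    """
    tally = Counter(r for r in reports if r is not None)
    missing = [i for i, r in enumerate(reports) if r is None]
    if not tally:
        return None, [], missing
    resolved = tally.most_common(1)[0][0]
    incorrect = [i for i, r in enumerate(reports)
                 if r is not None and r != resolved]
    return resolved, incorrect, missing


reports = ["isZero", "isZero", "isOne", None, "isZero"]
print(resolve_reports(reports))  # ('isZero', [2], [3])
```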

Otherwise, in examples where the executed decoding engine 117E determines that all data elements are identified in the seed metadata or element code of the fountain code seed portions of each data block, the executed decoding engine 117E may perform a set of operations to reconstruct the original user data based on the list data and the seed metadata or element code of each portion of bits of the fountain code seed corresponding to the identifier of each data block or segment. In some examples, the executed decoding engine 117E may obtain the list data and the seed metadata for the fountain code seed portions of each data block from the decoded data database 115B. In addition, the executed decoding engine 117E may utilize the list data and the seed metadata or element code to initialize a decoding process, such as a fountain code decoding process. In some examples, the executed decoding engine 117E may implement the decoding process to decode the data blocks serially or in parallel. In either instance, for each data block, the executed decoding engine 117E may apply the decoding process to the seed metadata and the portion of the list data associated with the identifier of the corresponding data block. In addition, for each data block, the executed decoding engine 117E may generate sets of data elements from each payload portion based on that application of the decoding process. As described herein, the list data may identify, for each data block and the identifier of the data block, the payload portions obtained from the bit sequence. Further, for each data block, the executed decoding engine 117E may identify information for each data element within each set of data elements.
For example, for each data block, the decoding engine 117E executing may identify metadata bit values associated with each data element of each group, and corresponding information to be communicated, such as states of a plurality of states (e.g., "isZero", "isOne", "isnoInfo"). Additionally, for each block of data, the executed decoding engine 117E may determine an order of data elements that reflect the data elements of each block of user data when initially received and segmented by the ED computing system 110 (e.g., the executed segmentation engine 117A). The order of the data elements is based in part on the seed metadata and the portion of the list data associated with the identifier of the corresponding data block. In some examples, the executed decoding engine 117E may utilize a join graph to determine the join between each data element and the order of the data elements for a particular data block.

In some examples, the executed decoding engine 117E may determine whether all data elements of a particular data block have been identified, whether corresponding values representing one of a plurality of states have been determined, and whether the order of the data elements has been determined. As described herein, the executed decoding engine 117E may make this determination for each data block identified in the sequence of bits. In an example where the executed decoding engine 117E has identified all data elements of a particular data block, has determined a corresponding value representing one of a plurality of states for each data element, and has determined an order of the data elements, the executed decoding engine 117E may reconstruct the particular data block from the seed metadata, the portion of the list data associated with the identifier of the corresponding data block, and the determined order of the data elements obtained from the corresponding payload portions. In such examples, the executed decoding engine 117E may reconstruct the particular data block by identifying each data element in the payload portion and combining the data elements according to the determined order. After each data block identified from the bit sequence has been constructed, the executed decoding engine 117E may combine the constructed data blocks. The combined data block 610 may reflect the original user data received by the ED computing system 110, such as user data 103. In some examples, the executed decoding engine 117E may store the combined data block 610 and each individual constructed data block within a corresponding portion of the data repository 111 (such as the decoded data database 115B).
In other examples, the decoding engine 117E executing may generate the message 608 and encapsulate the reconstructed user data or combined data block 610 within one or more portions of the message 608. In such examples, the executed decoding engine 117E may transmit the message 608 including the combined data block 610 to the client device 101, such as the client device 101A, of the user that originally sent the original user data on which the reconstructed user data was based.
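As a minimal sketch of the reconstruction and combination steps, assuming each data element resolves to a single bit and that data blocks are concatenated in ascending identifier order, the process might look like the following in Python. The names `rebuild_block` and `combine_blocks`, the dictionary representations, and the 8-bit block size are hypothetical.

```python
def rebuild_block(elements, block_size):
    """Rebuild one data block as a list of bits.

    `elements` maps a bit position to its resolved state. Every position
    must be resolved to "isZero" or "isOne" before the block can be built.
    """
    bits = []
    for position in range(block_size):
        state = elements.get(position, "noInfo")
        if state not in ("isZero", "isOne"):
            raise ValueError(f"bit position {position} is unresolved")
        bits.append(1 if state == "isOne" else 0)
    return bits


def combine_blocks(blocks):
    """Concatenate blocks in ascending identifier order and pack into bytes."""
    bits = [b for block_id in sorted(blocks) for b in blocks[block_id]]
    if len(bits) % 8:
        raise ValueError("total bit count is not a multiple of 8")
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


def as_elements(bit_string):
    """Helper for the usage example: turn '0101...' into resolved states."""
    return {i: ("isOne" if b == "1" else "isZero")
            for i, b in enumerate(bit_string)}


blocks = {
    1: rebuild_block(as_elements("01101001"), 8),
    0: rebuild_block(as_elements("01001000"), 8),
}
print(combine_blocks(blocks))  # b'Hi'
```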

FIG. 8 is a flow chart of an exemplary process for decoding data derived from genetic material. For example, one or more computing systems (such as ED computing system 110) may perform one or more steps of exemplary process 800, as described below with reference to FIG. 8. Referring to FIG. 8, ED computing system 110 may perform any of the processes described herein to obtain sequence data 504 (e.g., in step 802 of FIG. 8). In some examples, ED computing system 110 may obtain the sequence data 504 from genetic computing system 120. Additionally, ED computing system 110 may perform any of the processes described herein to generate sequence bit data based on the sequence data 504 (e.g., in step 804 of FIG. 8). In some examples, the executed sequencer engine 117D may perform operations to determine a sequence of bits corresponding to a sequence of bases identified in the sequence data 504.
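For illustration only, the conversion of a sequence of bases into a sequence of bits in step 804 could be sketched as below. The two-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function name `bases_to_bits` are assumptions; this passage does not state which mapping the sequencer engine 117D uses.

```python
# Assumed mapping of two bits per nucleotide; not specified in this passage.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def bases_to_bits(bases):
    """Translate a string of bases into a string of '0'/'1' characters."""
    return "".join(BASE_TO_BITS[base] for base in bases.upper())


print(bases_to_bits("ACGT"))  # 00011011
```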

Further, ED computing system 110 may perform any of the processes described herein to utilize a portion of the sequence bit data to implement a first set of operations (e.g., in step 806 of FIG. 8). In some examples, the executed pre-inspection engine 606 may implement a set of pre-inspection or pre-processing operations to determine an estimated distribution of data blocks based on the sequence bit data for each of the sequences of bases identified in the sequence data 504. For example, the executed pre-inspection engine 606 may obtain a portion of the sequence bit data from the sequence data database 604. As described herein, the portion of the sequence bit data can be associated with a set of random polynucleotide strands 404 of a pool of polynucleotide strands 406 (e.g., a set of 100,000 to 200,000 polynucleotide strands in a pool of 10,000,000 polynucleotide strands).

In addition, the executed pre-inspection engine 606 may determine the sequence of bits from the portion of the sequence bit data and identify portions of the sequence of bits corresponding to the primers (described herein as "primer portions") based on information associated with the length and size of the primers and the sequence bit data that is known to or encoded into the executed decoding engine 117E. In addition, the executed pre-inspection engine 606 may trim the primer portions, leaving the portions of the sequence of bits located between the primer portions (described herein as "middle portions"). As described herein, a middle portion may be a portion of bits corresponding to the fountain code seed and the associated payload. Further, the executed pre-inspection engine 606 may obtain the encoding-decoding parameters from the decoded data database 115B. As described herein, the encoding-decoding parameters may indicate that the corresponding data packet is formatted such that the fountain code seed precedes the payload, and may indicate a fountain code seed size and/or a payload size. Based on the remaining middle portions and the encoding-decoding parameters, the executed pre-inspection engine 606 may determine which part of each middle portion corresponds to the fountain code seed and which part corresponds to the associated payload, as well as the size of the middle portions.
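The trimming and splitting described above can be sketched in Python as follows, assuming both primers have the same known length in bits and that the seed immediately precedes the payload within the middle portion. The function name `split_strand` and its parameters are hypothetical.

```python
def split_strand(bits, primer_len, seed_len, payload_len):
    """Trim both primer portions, then split the middle portion.

    Returns (seed_portion, payload_portion), or None when the middle
    portion does not have the expected seed_len + payload_len size.
    """
    middle = bits[primer_len:len(bits) - primer_len]
    if len(middle) != seed_len + payload_len:
        return None
    return middle[:seed_len], middle[seed_len:]


strand = "1111" + "0101" + "00110011" + "0000"
print(split_strand(strand, primer_len=4, seed_len=4, payload_len=8))
# ('0101', '00110011')
```

Returning `None` for a middle portion of unexpected size is one possible design choice; a strand with an insertion or deletion would then be skipped rather than misread.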

In other examples, the executed pre-inspection engine 606 may determine which portions of the sequence of bits corresponding to the fountain code seed (described herein as "fountain code seed portions") are associated with the identifiers of the data blocks. Additionally, based on the fountain code seed portion determined to be the middle portion associated with the identifier of the data block, the executed pre-inspection engine 606 may determine the identifier of the data block. Further, the executed pre-inspection engine 606 may determine a distribution of identifiers of the data blocks based on the determined identifiers of the data blocks of each of the fountain code seed portions. In some examples, the performed pre-inspection engine 606 may generate a histogram that identifies and characterizes the distribution of identifiers of the data blocks. Additionally or alternatively, the executed pre-inspection engine 606 may generate data block plan data that identifies and characterizes the determined distribution of data block identifiers. In some examples, the executed decoding engine 117E may store the generated data block plan data within a corresponding portion of the data repository 111 (such as the decoded data database 115B). Further, ED computing system 110 may perform any of the processes described herein to implement a second set of operations to reconstruct the original user data based at least on the sequence bit data (e.g., in step 808 of FIG. 8).
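A histogram of data block identifiers of the kind described above can be sketched with a counter. In this sketch, `block_id_histogram` and `as_distribution` are hypothetical names, and the way a data block identifier is derived from a seed portion is supplied by the caller; the lambda in the usage lines is a stand-in, not the derivation used in the disclosure.

```python
from collections import Counter


def block_id_histogram(seed_portions, block_id_of):
    """Count how many seed portions map to each data block identifier."""
    return Counter(block_id_of(seed) for seed in seed_portions)


def as_distribution(histogram):
    """Normalize the histogram into fractions that sum to 1."""
    total = sum(histogram.values())
    return {block_id: count / total for block_id, count in histogram.items()}


# Stand-in derivation: read the seed portion as a binary number.
hist = block_id_histogram(["00", "01", "01", "11"], lambda seed: int(seed, 2))
print(hist[1])                   # 2
print(as_distribution(hist)[1])  # 0.5
```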

D. Exemplary hardware and software implementations

Implementations of the subject matter and the functional operations described in this disclosure may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and structural equivalents thereof, or in combinations of one or more of them. Embodiments of the subject matter described in this disclosure, including the application 104, the segmentation engine 117A, the FC seed engine 117B, the encoding engine 117C, the sequencer engine 117D, the decoding engine 117E, the sequencing program engine 121B, the composition engine 121A, the Application Programming Interface (API) 302, the API 402, the API 602, and the pre-inspection engine 606, may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier, for execution by, or to control the operation of, a data processing device (or computing system). Additionally or alternatively, the program instructions may be encoded on an artificially generated propagated signal (such as a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus). The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.

The terms "device," "apparatus," and "system" refer to data processing hardware and encompass all kinds of devices, apparatuses, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus, device or system may also be or further comprise special purpose logic circuitry, such as an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). In addition to hardware, a device, apparatus, or system may optionally include code that generates an execution environment for a computer program, such as code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (which may be referred to or described as a program, software application, engine, module, software module, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may (but need not) correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, such as one or more scripts stored in a markup language file, in a single file dedicated to the program in question, or in multiple coordinated files, such as files that store portions of one or more modules, subroutines, or code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for executing computer programs include, by way of example, general purpose or special purpose microprocessors or both, or any other type of central processing unit. Typically, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions, and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. In addition, a computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) or Assisted Global Positioning System (AGPS) receiver, or a portable storage device, such as a Universal Serial Bus (USB) flash drive, to name a few.

Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and storage devices, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device (such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (such as a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. In addition, the computer may interact with the user by sending files to and receiving files from a device used by the user; for example, by sending a web page to a web browser on a user device in response to a request received from the web browser.

Embodiments of the subject matter described in this specification may be implemented in a computing system that includes a back-end component (such as a data server), or that includes a middleware component (such as an application server), or that includes a front-end component (such as a client computer having a graphical user interface or a web browser through which a user can interact with an embodiment of the subject matter described in this specification), or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a Local Area Network (LAN) and a Wide Area Network (WAN), such as the Internet.

The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some implementations, a server transmits data (such as HTML pages) to a user device, for example, for purposes of displaying the data to, and receiving user input from, a user interacting with the user device, which acts as a client. Data produced at the user device, such as results of user interactions, may be received from the user device at the server.

While this specification includes many details, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the disclosure. Certain features of the specification that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown in the drawings or in sequential order, or that all operations shown in the drawings be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described process components and systems can generally be integrated together in a single software product or packaged into multiple software products.

In examples where HTML files are mentioned, other file types or formats may be substituted. For example, an HTML file may be replaced by an XML file, a JSON file, a plain text file, or another type of file. In addition, where tables or hash tables are mentioned, other data structures (such as spreadsheets, relational databases, or structured files) may be used.

Various embodiments have been described herein with reference to the accompanying drawings. It will, however, be evident that various modifications and changes may be made thereto, and additional embodiments may be implemented, without departing from the broader scope of the disclosed embodiments as set forth in the claims that follow.

Moreover, unless specifically defined otherwise herein, all terms should be given their broadest possible interpretation, including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc. It should also be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless otherwise indicated, and the term "comprises," when used in this specification, specifies the presence of stated features, aspects, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, aspects, steps, operations, elements, components, and/or groups thereof. Furthermore, the terms "coupled," "operably connected," and the like should be construed broadly to refer to devices or components that are connected together mechanically, electrically, by wire, wirelessly, or otherwise, such that the connection allows the associated devices or components to operate (e.g., communicate) with one another as intended by virtue of that relationship. In this disclosure, the use of "or" means "and/or" unless stated otherwise. In addition, the use of the term "including," as well as other forms such as "includes" and "included," is not limiting. In addition, unless specifically stated otherwise, terms such as "element" or "component" encompass both elements and components comprising one unit, as well as elements and components comprising more than one subunit. In addition, the section headings used herein are for organizational purposes only and should not be construed as limiting the described subject matter.

The foregoing is provided for the purpose of illustrating, explaining and describing embodiments of the present disclosure. Modifications and adaptations to the embodiments will be apparent to those skilled in the art and may be made without departing from the scope or spirit of the disclosure.

Claims

1. A computing system comprising:

A memory storing instructions; and

At least one processor coupled to the communication interface and the memory, the at least one processor configured to execute the instructions to:

segmenting user data into a plurality of data blocks, each data block comprising metadata;

Generating seed data, the seed data characterizing a plurality of fountain code seeds;

For each of the plurality of data blocks, performing a first set of operations that produce one or more data packets, the set of operations comprising:

For each of the plurality of fountain code seeds:

determining a bit value identifying a bit position in the metadata and identifying and characterizing an element code value of information conveyed by the corresponding bit value; and

Determining which of the plurality of fountain code seeds has an element code value of the bit value that matches the value of the bit position identified in the metadata, each of the one or more data packets being associated with one of the plurality of fountain code seeds having the element code value of the bit value that matches the value of the bit position identified in the associated metadata; and

For each data packet, causing a second set of operations to be performed that synthesizes a polynucleotide chain based at least on the bit values of the corresponding data packet.

2. The computing system of claim 1, wherein the first set of operations further comprises:

For each of the plurality of fountain code seeds:

determining a data block identifier; and

Determining which of the plurality of fountain code seeds has a data block identifier that matches a block identifier identified in the associated metadata.

3. The computing system of claim 1, wherein the first set of operations further comprises:

For each of the plurality of fountain code seeds, determining whether a particular fountain code seed has been previously used in the synthesis of the polynucleotide strand.

4. The computing system of claim 1, wherein the second set of operations comprises:

Encoding each of the one or more data packets as a nucleic acid sequence; and

Synthesizing polynucleotide strands based on the nucleic acid sequences.

5. The computing system of claim 4, wherein the second set of operations comprises:

Attaching a first primer and a second primer to each polynucleotide strand.

6. The computing system of claim 5, wherein the at least one processor is further configured to:

For each polynucleotide strand, determine whether the corresponding polynucleotide strand satisfies a set of sequence criteria.

7. The computing system of claim 6, wherein the set of sequence criteria comprises:

at least one of a criterion associated with nucleotide repetition, a criterion associated with a nucleotide pattern, and a criterion associated with a nucleotide contrast ratio.

8. The computing system of claim 1, wherein for each data packet, causing the second set of operations to be performed that synthesizes a polynucleotide chain according to at least a bit value of the corresponding data packet comprises generating an instruction and transmitting the instruction to a device configured to perform the second set of operations.

9. The computing system of claim 1, wherein the second set of operations includes causing one or more electrodes of a set of electrodes to synthesize a polynucleotide according to at least a bit value of the corresponding data packet.

10. The computing system of claim 1, wherein for each of the plurality of data blocks, the one or more data packets are associated with one or more elements of the corresponding data block.

11. The computing system of claim 1, wherein the plurality of data blocks do not overlap.

12. The computing system of claim 1, wherein the metadata includes parameters for the encoding.

13. A computer-implemented method, the method comprising:

Segmenting, by at least a first processor, user data into a plurality of data blocks, each data block comprising metadata;

Generating, by at least the first processor, seed data characterizing a plurality of fountain code seeds;

for each of the plurality of data blocks, performing, by at least the first processor, a first set of operations that produce one or more data packets, the first set of operations comprising:

For each of the plurality of fountain code seeds:

For each data packet, causing, by at least the first processor, a second set of operations to be performed that synthesizes a polynucleotide chain based at least on bit values of the corresponding data packet.

14. The computer-implemented method of claim 13, wherein the first set of operations further comprises:

For each of the plurality of fountain code seeds:

determining a data block identifier; and

15. The computer-implemented method of claim 13, wherein the first set of operations further comprises:

16. The computer-implemented method of claim 13, wherein the first set of operations further comprises:

For each of the plurality of data blocks, determining whether the generated data packets are collectively associated with all data elements of the corresponding data block.

17. The computer-implemented method of claim 13, wherein the second set of operations comprises:

Encoding each of the one or more data packets as a nucleic acid sequence; and

Synthesizing polynucleotide strands based on the nucleic acid sequences.

18. The computer-implemented method of claim 17, wherein the second set of operations further comprises:

Attaching a first primer and a second primer to each polynucleotide strand.

19. The computer-implemented method of claim 18, the method further comprising

20. A non-transitory machine-readable storage medium storing instructions that, when executed by at least one processor of a server, cause the at least one processor to perform operations comprising:

For each of the plurality of fountain code seeds:

For each data packet, causing a second set of operations to be performed that synthesizes a polynucleotide chain based at least on the bit values of the corresponding data packet.
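
By way of illustration only, and without limiting the claims, the packet-to-sequence encoding, primer attachment, and sequence-criteria check recited in claims 4 to 7 and 17 to 18 might be sketched as follows. The two-bit-per-nucleotide mapping, the primer sequences, the thresholds, and all function names are assumptions introduced for this sketch and are not taken from the disclosure. In particular, the ratio criterion is interpreted here as a G/C fraction, which is one possible reading of the claim language.

```python
from typing import Iterable

# Assumed mapping of two bits to one nucleotide; the claims do not fix a mapping.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

# Hypothetical primer sequences, chosen only for illustration.
FIRST_PRIMER = "ACGTAC"
SECOND_PRIMER = "TGCATG"


def packet_to_sequence(packet: bytes) -> str:
    """Encode the bit values of a data packet as a nucleic acid sequence."""
    bits = "".join(f"{byte:08b}" for byte in packet)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def attach_primers(sequence: str) -> str:
    """Attach a first primer and a second primer to the payload sequence."""
    return FIRST_PRIMER + sequence + SECOND_PRIMER


def max_run_length(sequence: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    longest = 0
    run = 0
    previous = ""
    for base in sequence:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C nucleotides (assumed reading of the ratio criterion)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def satisfies_criteria(
    sequence: str,
    max_run: int = 3,                        # assumed repetition limit
    gc_min: float = 0.4,                     # assumed lower ratio bound
    gc_max: float = 0.6,                     # assumed upper ratio bound
    forbidden: Iterable[str] = ("GAATTC",),  # hypothetical disallowed pattern
) -> bool:
    """Check repetition, pattern, and ratio criteria for one sequence."""
    if max_run_length(sequence) > max_run:
        return False
    if any(motif in sequence for motif in forbidden):
        return False
    return gc_min <= gc_fraction(sequence) <= gc_max


if __name__ == "__main__":
    payload = packet_to_sequence(b"\x1b")  # bits 00 01 10 11
    strand = attach_primers(payload)
    print(payload, strand, satisfies_criteria(payload))
```

In such a sketch, a packet whose sequence fails the check would not be passed on for synthesis, and a packet generated from a different fountain code seed could be tried instead. That retry policy is likewise an assumption of the sketch rather than a statement of the claimed method.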