US20180121680A1 - Obfuscating web code - Google Patents

Obfuscating web code

Info

Publication number
US20180121680A1
Authority
US
United States
Prior art keywords
expressions
code
computer
data
replacement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/859,694
Inventor
Xinran Wang
Yao Zhao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shape Security Inc
Original Assignee
Shape Security Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shape Security Inc
Priority to US15/859,694
Publication of US20180121680A1
Assigned to SHAPE SECURITY, INC. Assignment of assignors interest (see document for details). Assignors: WANG, XINRAN; ZHAO, YAO


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/121 Restricting unauthorised execution of programs
    • G06F21/125 Restricting unauthorised execution of programs by manipulating the program code, e.g. source code, compiled code, interpreted code, machine code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/54 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures

Definitions

  • This document relates to computer security and interference with malware.
  • Bot activities include content scraping, reconnaissance, credential stuffing, creating fake accounts, comment spamming, and similar activities. Bots can impose an unnecessary load on any company trying to serve web content efficiently. More importantly, they can attempt to “learn” the operation of a web site so as to exploit it.
  • malicious software may execute a “man in the browser” attack by intercepting communications that a user makes with a web site in a manner that makes the user believe that he or she is actually communicating with the web site. For example, malware may generate a display for a user who is visiting a banking site, where the display requests from the user information such as social security number, credit card number, other account numbers. An organization that operates the malware may then have such data sent to it, and may use the data to steal from the user, the web site operator, or both.
  • This document describes systems and techniques by which web code (e.g., HTML, CSS, and JavaScript) that a server system provides to client devices is modified before it is served over the internet, so as to make more difficult the exploitation of the code and the operator of the server system by clients that receive the code (including clients that are infected without their human users' knowledge).
  • the modifications can be made to encode sensitive data, and may differ for different instances in which a web page and related content are served, whether to the same client computer or to different client computers.
  • a single expression or value in the code may be re-written as multiple expressions that, when executed, produce the initial value or expression. Where different code is served in response to each request, the expressions into which the initial value are resolved may also differ each time.
  • the output of the code, when executed on the client computer, however, is the same for all such different versions of the served code so that a user at a client computer does not perceive a difference in the displayed web page.
  • two different users or a single user in two different web browsing sessions
  • the manner in which an initial value or expression is rewritten into multiple expressions capable of being executed on a client computer may take a variety of forms. For example, the expressions themselves, the number of expressions, and the order in which the expressions execute may all be varied to interfere with malware. Also, these parameters may be varied so as to be different from one serving of the code to the next. Such variation, which may be termed “polymorphism” of the code, may help create a moving target against which malware needs to apply itself.
  • changing the code that is served to client devices in an essentially random manner each time the code is served (i.e., in a manner that effectively keeps malware that has analyzed serving n from inferring something useful about serving n+x) can deter malicious code executing on the client computers (e.g., a Man-in-the-Browser bot) from interacting with the served code in a predictable way so as to trick a user of the client computer into providing confidential financial information and the like.
  • external programs generally cannot drive web application functionality directly, and so preventing predictable interaction with served code can be an effective mechanism for preventing malicious computer activity.
  • the techniques transform values or expressions, such as a cleartext string, a JavaScript object, or a JavaScript code snippet, into another JavaScript snippet that is equivalent to the input after it is executed (i.e., it produces an identical displayed output).
  • the encoding is dynamic and random, which means that the encoding generates different output code each time given the same input (though the outputs may repeat periodically as long as that repetition is not frequent enough to allow malware to predict the output or readily obtain the repeated output). Because the encoded output code is still presented as cleartext, it may not be able to prevent a human from ascertaining sensitive data, but it may make it very difficult for a malicious party to write a computer program to extract the sensitive data automatically.
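  • The dynamic, random encoding described above can be illustrated with a minimal sketch. The helper name encodeString and the fragment-splitting scheme are assumptions chosen for illustration, not the encoding the source describes in detail. The sketch rewrites a cleartext string as a concatenation of randomly sized fragments, so two servings usually differ in form while evaluating to the same string.

```javascript
// Hypothetical helper: encode a cleartext string as a JavaScript expression
// that concatenates randomly sized fragments of it.
function encodeString(text) {
  const parts = [];
  let start = 0;
  while (start < text.length) {
    // Pick a random fragment length of 1 to 3 characters.
    const len = 1 + Math.floor(Math.random() * 3);
    parts.push(JSON.stringify(text.slice(start, start + len)));
    start += len;
  }
  // An empty input still needs a valid expression.
  return parts.length > 0 ? parts.join(" + ") : '""';
}

// Two servings of the same input usually differ in form...
const servingA = encodeString("account-number");
const servingB = encodeString("account-number");

// ...but each evaluates back to the original string on the client.
const decodedA = new Function("return " + servingA + ";")();
const decodedB = new Function("return " + servingB + ";")();
console.log(servingA, "=>", decodedA);
console.log(servingB, "=>", decodedB);
```

  • As the passage notes, the output is still cleartext, so a scheme like this only raises the cost of automated extraction; it does not hide the value from a human reader.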
  • Some of these attacks include: (a) denial of service attacks, and particularly advanced application denial of service attacks, in which a malicious party targets a particular functionality of a website (e.g., a widget or other web application) and floods the server with requests for that functionality until the server can no longer respond to requests from legitimate users; (b) rating manipulation schemes in which fraudulent parties use automated scripts to generate a large number of positive or negative reviews of some entity such as a marketed product or business in order to artificially skew the average rating for the entity up or down; (c) fake account creation in which malicious parties use automated scripts to establish and use fake accounts on one or more web services to engage in attacks ranging from content spam, e-mail spam, identity theft, phishing, ratings manipulation, fraudulent reviews, and countless others; (d) fraudulent reservation of rival goods, by which a malicious party exploits flaws in a merchant's website to engage in a form of online scalp
  • the systems, methods, and techniques for web code modifications described in this paper can, in certain implementations, prevent or deter one or more of these types of attacks. For example, transforming sensitive data by replacing expressions with a set of equivalent expressions and then interleaving the expressions in the set of equivalent expressions can cause the effectiveness of bots and other malicious automated scripts to be substantially diminished.
  • the modification of code may be carried out by a security system that may supplement a web server system, and may intercept requests from client computers to the web server system and intercept responses from web servers of the system when they serve content back to the client computers (including where pieces of the content are served by different server systems).
  • the modification may be of static code (e.g., HTML) and of related executable code (e.g., JavaScript) in combination.
  • An expression may be rewritten as an equivalent expression or multiple expressions.
  • the combination of the three expressions in the set of expressions produces the same result as the original expression—that is, an assignment of the value 2 to the variable y.
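  • The example that this passage refers to is not reproduced in this text. A hypothetical set of three expressions with the described effect might look like the following, where the variable names a and b and their values are assumptions.

```javascript
// Original expression: var y = 2;
// Hypothetical equivalent set of three expressions (a and b are assumed names).
var a = 5;
var b = 3;
var y = a - b; // y is assigned the value 2, as in the original expression
```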
  • Such rewriting, or transforming, of code may occur by first identifying data present in code that is to be served to the client computer (e.g., HTML, CSS, and JavaScript) and grouping such occurrences of sensitive data for further processing (e.g., by generating flags that point to each such element or copying a portion of each such element).
  • the identified data may be identified as sensitive or potentially sensitive or simply data that should be rewritten before being served. Processing of the data may occur by modifying each element throughout different formats of code, such as changing an expression in the manner above each time that name occurs in a parameter, method call, DOM operation, or elsewhere. Next, further processing may occur that comprises interleaving the set of elements throughout the new code. Such a process may be repeated each time a client computer requests code, and the modifications may be different for each serving of the same code.
  • the analysis to identify values or expressions that can be rewritten without affecting the operation of the code may be performed once, and a map to occurrences of such values or expressions in the code may be generated and then used for each serving of the code to locate the occurrences, so that they may be altered throughout the code in a consistent manner that does not break the code.
  • Such analyze-once, transform-many approaches may lessen the computational load for such a system and allow greater scaling of the system to larger web server systems with high volume requirements.
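  • A minimal sketch of the analyze-once, transform-many idea follows. The function names buildLiteralMap and serve are hypothetical, and locating integer literals with a regular expression is a simplifying assumption (it would mishandle decimals and digits inside strings); a production analyzer would more plausibly parse the code.

```javascript
// Analyze once: record where each integer literal occurs in the code.
// (Assumption: a regex scan stands in for real parsing.)
function buildLiteralMap(code) {
  const map = [];
  const pattern = /\b\d+\b/g;
  let match;
  while ((match = pattern.exec(code)) !== null) {
    map.push({
      start: match.index,
      end: match.index + match[0].length,
      value: Number(match[0]),
    });
  }
  return map;
}

// Transform many: reuse the map on every serving, choosing fresh random values.
function serve(code, map) {
  let out = "";
  let cursor = 0;
  for (const { start, end, value } of map) {
    const x = Math.floor(Math.random() * 1000);
    // Rewrite the literal as (x - (x - value)) with the inner part precomputed.
    out += code.slice(cursor, start) + "(" + x + " - (" + (x - value) + "))";
    cursor = end;
  }
  return out + code.slice(cursor);
}

const original = "24 * 7 + 1";
const literalMap = buildLiteralMap(original); // computed once
const first = serve(original, literalMap);    // different text per serving...
const second = serve(original, literalMap);   // ...same value when executed
console.log(first);
console.log(second);
```

  • The map is built a single time and only the cheap substitution step runs per request, which is the load reduction the passage describes.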
  • Such modification of the served code can help to prevent bots or other malicious code from exploiting or even detecting weaknesses in the web server system.
  • the names of functions or variables may be changed in various random manners each time a server system serves the code.
  • such constantly changing modifications may interfere with the ability of malicious parties to identify how the server system operates and web pages are structured, so that the malicious party cannot generate code to automatically exploit that structure in dishonest manners.
  • Such techniques may create a moving target that can prevent malicious organizations from reverse-engineering the operation of a web site so as to build automated bots that can interact with the web site, and potentially carry out Man-in-the-Browser and other Man-in-the-Middle operations and attacks.
  • the techniques discussed here may be carried out by a server subsystem that acts as an adjunct to a web server system that is commonly employed by a provider of web content.
  • an internet retailer may have an existing system by which it presents a web storefront at a web site (e.g., www.examplestore.com), interacts with customers to show them information about items available for purchase through the storefront, and processes order and payment information through that same storefront.
  • the techniques discussed here may be carried out by the retailer adding a separate server subsystem (either physical or virtualized) that stands between the prior system and the internet.
  • the new subsystem may act to receive web code from the web servers (or from a traffic management system that receives the code from the web servers), may translate that code in random manners before serving it to clients, may receive responses from clients and translate them in the opposite direction, and then provide that information to the web servers using the original names and other data.
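  • The two-way translation described above can be sketched as follows. Everything here is an assumption for illustration: the createTranslator name, the random hexadecimal alias scheme, and the restriction to name attributes of HTML form fields. The sketch renames fields on the way out and restores the original names on the way back, so the web servers see only the names they issued.

```javascript
// Hypothetical per-serving name translator.
function createTranslator(fieldNames) {
  const toAlias = new Map();
  const toOriginal = new Map();
  for (const name of fieldNames) {
    let alias;
    do {
      alias = "f_" + Math.random().toString(16).slice(2, 10);
    } while (toOriginal.has(alias)); // retry on the rare collision
    toAlias.set(name, alias);
    toOriginal.set(alias, name);
  }
  return {
    // Outbound: rewrite name="..." attributes in the served HTML.
    encodeHtml(html) {
      return html.replace(/name="([^"]+)"/g, (whole, name) =>
        toAlias.has(name) ? 'name="' + toAlias.get(name) + '"' : whole);
    },
    // Inbound: restore the original names in a submitted form object.
    decodeForm(form) {
      const restored = {};
      for (const [key, value] of Object.entries(form)) {
        restored[toOriginal.has(key) ? toOriginal.get(key) : key] = value;
      }
      return restored;
    },
    aliasOf: (name) => toAlias.get(name),
  };
}

const translator = createTranslator(["username", "password"]);
const served = translator.encodeHtml('<input name="username"><input name="password">');
console.log(served);
```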
  • a system may provide the retailer or a third party with whom the retailer contracts (e.g., a web security company that monitors data from many different clients and helps them identify suspect or malicious activity) with information that identifies suspicious transactions.
  • the security subsystem may keep a log of abnormal interactions, may refer particular interactions to a human administrator for later analysis or for real-time intervention, may cause a financial system to act as if a transaction occurred (so as to fool code operating on a client computer) but to stop such a transaction, or any number of other techniques that may be used to deal with attempted fraudulent transactions.
  • a computer-implemented method includes identifying a piece of data for serving from a server system to a client device that is remote from the server system, the piece of data being part of executable code requested from the server from the client device; creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and providing, to the client device and as part of the executable code, the plurality of expressions along with code for executing the plurality of expressions, so that when the plurality of expressions are executed on the client device, the identified piece of data is returned on the client device without a need to serve the identified piece of data to the client device.
  • the method can include performing a permutation on the plurality of expressions so that the plurality of expressions are ordered in the executable code in an order different than they were created. The order of the expressions can be selected randomly as part of the permutation.
  • the method can include creating one or more additional expressions whose executed results are not used by other code that is part of the executable code served to the client device; and providing to the client device the plurality of expressions with the one or more additional expressions. Also, the method can include identifying, in the piece of data, data that needs to be kept away from malware that may be in the client device, and wherein creating a plurality of expressions comprises creating one or more replacement statements that when executed, provide a result that corresponds to the potentially sensitive data. The replacement statements can comprise one or more expressions that do not execute on the client device when the executable code is executed.
  • the method can further include identifying, in the piece of data, a first expression and a second expression to be replaced, wherein creating a plurality of expressions comprises creating a first set of replacement expressions corresponding to the first expression and a second set of replacement expressions corresponding to the second expression; and interleaving the replacement expressions of the first set of replacement expressions with the replacement expressions of the second set of replacement expressions, wherein the plurality of expressions provided to the client device comprise the interleaved replacement expressions.
  • creating a plurality of expressions comprises creating a first set of replacement expressions; identifying a first replacement expression in the first set of replacement expressions; creating a second set of replacement expressions that, when executed, provide a result that corresponds to the first replacement expression; and replacing the first replacement expression with the second set of replacement expressions.
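  • The nested replacement described above, in which one replacement expression is itself replaced by a further set, can be sketched as follows. The splitAssignment helper, the temporary-variable naming, and the subtraction-based rewriting are assumptions for illustration.

```javascript
let nextId = 0; // hypothetical counter for fresh temporary names

// One level of replacement: two statements that assign `value` to `target`.
function splitAssignment(target, value) {
  const x = Math.floor(Math.random() * 100);
  const tmp = "t" + nextId++;
  return {
    tmp, // name of the temporary variable introduced
    x,   // value the temporary variable holds
    statements: [
      "var " + tmp + " = " + x + ";",
      "var " + target + " = " + tmp + " - (" + (x - value) + ");",
    ],
  };
}

// First set: replacement expressions for `var y = 2;`.
const firstSet = splitAssignment("y", 2);
// Second set: replacement expressions for the first statement of the first set.
const secondSet = splitAssignment(firstSet.tmp, firstSet.x);
// Replace that first statement with the second set.
const nested = [...secondSet.statements, ...firstSet.statements.slice(1)];
console.log(nested.join("\n"));
```

  • Repeating the step on any statement of the result deepens the nesting further, at the cost of larger served code.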
  • the piece of data to be served comprises formats of code in HTML, CSS, and JavaScript, and wherein each of the formats interoperates with the other formats.
  • a computer-implemented method comprises receiving, from a server system, web content comprising original code, wherein the web content is requested by a client device that is remote from the server system; identifying a piece of data in the code; creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; generating modified code comprising the original code with the piece of data replaced with the plurality of expressions; and providing the modified code to the client device, wherein, when executed, the modified code provides a result that corresponds to the original code.
  • generating modified code comprises interleaving the plurality of expressions into the original code with the identified piece of data removed.
  • the plurality of expressions is created in a first ordering, and the plurality of expressions is interleaved into the original code so that the plurality of expressions maintains the first ordering.
  • the plurality of expressions are created in a first ordering, and the plurality of expressions are interleaved into the original code so that the plurality of expressions are in a second ordering that is different than the first ordering.
  • the plurality of expressions includes one or more junk expressions that do not execute.
  • the method further comprises selecting a first expression among the plurality of expressions; and creating a second plurality of expressions that, when executed, provide a result that corresponds to the selected first expression, wherein the generated modified code comprises the original code with the piece of data replaced with the plurality of expressions, with the selected first expression replaced with the second plurality of expressions.
  • a computer system for recoding web content served to client computers comprises an interface for receiving information from a web server system configured to provide computer code in multiple different formats in response to requests from client computing devices; and a security intermediary that is arranged to (i) receive the computer code from the interface before the computer code is provided to the client computing devices, (ii) identify a piece of data in the computer code that is to be replaced; (iii) create a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and (iv) provide the plurality of expressions to the client computing devices with code for executing the plurality of expressions.
  • the piece of data in the computer code that is to be replaced is identified as potentially sensitive data.
  • the security intermediary is further arranged to perform a permutation of the plurality of expressions.
  • the plurality of expressions comprise one or more expressions that do not execute.
  • the security intermediary is further arranged to interleave the plurality of expressions with the code for executing the plurality of expressions.
  • FIG. 1 is a conceptual diagram showing transformation of a value into multiple expressions that execute back to the value.
  • FIG. 1A depicts a general overview of a system for requesting, modifying, and serving web content.
  • FIG. 1B depicts a schematic diagram of an encoding system that modifies requested web content.
  • FIG. 2 depicts an overview of a method for modifying program code.
  • FIGS. 3A-3G depict various examples for modifying code for web content.
  • FIG. 4 is a flow diagram of a process for serving modified, or encoded, web content.
  • FIG. 5 shows a system for serving polymorphic code.
  • FIG. 6 is a schematic diagram of a general computing system.
  • FIG. 1 is a conceptual diagram showing transformation of a value into multiple expressions that execute back to the value.
  • the diagram attempts to show at a high level how initial representations in code can be rewritten as multiple representations that together can be executed on a client device to return the initial representation.
  • the multiple representations can be difficult for automated malware to analyze because they cannot easily be matched to a template, can be scattered throughout the code in appropriate circumstances, and can be constantly changed, both in their values and in their order and locations in the code.
  • the diagram depicts a process, flowing from left-to-right.
  • the process starts with a value 102, which may take a variety of forms.
  • the value may be a simple string or number in plaintext form.
  • Such value may be found by analysis of web code served by a web server system and provided to an intermediate security system that is tasked with recoding portions of the served code where the recoding will not affect the functionality of the code when it is executed on client devices.
  • the intermediate security system identifies a relatively complex expression that will resolve to the value.
  • the expression is shown here in the form of a pseudo-equation.
  • operations are shown as a box surrounding a dot, to represent that any appropriate operation may be used.
  • Parentheses are used to indicate grouping of operations, and the ability to have the relative groups combined with each other out of the order they are shown in the equation.
  • the three main groups are each converted into code snippets to represent the relevant sub-expressions, and then the order in which those sub-expressions are evaluated is changed—where the second grouping from the formula is evaluated first in the code, then the first, and then the third. Additional code may be generated to evaluate the results of the three groupings together with each other.
  • the code generated at 106 may then be inserted into the code received from the web server system and may be served to a client device that requested the code.
  • that code is executed at the client device, such as using a web browser, and such execution generates the initial value 108 or a value that is equivalent to the initial value.
  • the value “T” may be resolved into very different lines of code and expressions.
  • the process shown here is able to replace original code with different code that serves as a proxy for the original code, and that reaches the same result as the original code when it is executed by the standard environment (e.g., standard JavaScript run-time) on a client device.
  • FIG. 1A depicts an overview of a system 100 for encoding web content served from a web server 122 to a polymorphic encoding system 124 (or simply, encoding system) and to a web browser 126.
  • the system 100 represents a high-level depiction of the system in FIG. 1.
  • the polymorphic encoding system 124 receives web content from a web server 122 that is to be served to a web browser 126 at, for example, a client device. Prior to serving the web content to the web browser 126, the polymorphic encoding system 124 identifies and encodes potentially sensitive data. Web content that is handled by the system 100 may include, for example, HTML, CSS, JavaScript, and other program code associated with the content or transmission of web resources such as a web page that may be presented at a client computer (or many different requesting client computers).
  • FIG. 1B depicts various parts of the encoding system 124 of FIG. 1A.
  • these components operate to transform incoming computer code so as to convert values or expressions into multiple additional expressions that resolve to the original values or expressions when they are executed as part of the code.
  • a sensitive data identifier 110 parses code to identify sensitive data or potentially sensitive data, including data that can be recoded without affecting the functionality of the code when it is executed.
  • data identifier 110 may broadly identify data that is to be replaced, regardless of whether the data is identified as sensitive in nature.
  • program code P may comprise statements S1, S2, S3, and S4.
  • the sensitive data identifier 110 identifies statement S1 as potentially sensitive data.
  • Various methods may be used for identifying potentially sensitive data. For example, data associated with a form to be filled out or with particular fields or fieldnames in a form may be identified. Also, an operator of a security system may study the code served by a particular organization and may flag particular fields or other elements that are frequently served by the organization and are of a sensitive nature. The sensitive data identifier 110 may then use a list of fields or other information generated by such an analysis to locate sensitive fields in other pages of web code to be served by the same organization.
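  • The list-driven identification described above might be sketched as follows. The field names in SENSITIVE_FIELD_NAMES, the findSensitiveFields helper, and the regular-expression scan of input elements are all assumptions for illustration; the source does not specify how the identifier 110 locates fields.

```javascript
// Hypothetical list produced by an operator's earlier analysis of served pages.
const SENSITIVE_FIELD_NAMES = new Set(["ssn", "card_number", "password"]);

// Return the name and position of each flagged <input> element in the HTML.
// (Assumption: a regex scan stands in for real HTML parsing.)
function findSensitiveFields(html) {
  const found = [];
  const pattern = /<input\b[^>]*\bname="([^"]+)"[^>]*>/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    if (SENSITIVE_FIELD_NAMES.has(match[1].toLowerCase())) {
      found.push({ name: match[1], index: match.index });
    }
  }
  return found;
}

const page = '<form><input name="email"><input type="password" name="password"></form>';
console.log(findSensitiveFields(page));
```

  • The recorded positions could then serve as the flags that point to each element for later rewriting.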
  • a replacement code generator 112 generates code that replaces such potentially sensitive data.
  • the generated code, when executed, generates the same output as the originally identified potentially sensitive code.
  • replacement code generator 112 generates four statements E1, E2, E3, and E4 that, when executed, produce the same output as statement S1.
  • Interleaver 114 takes the replacement code statements E1, E2, E3, and E4 and interleaves them into other programmatic statements that are already part of program P, or into statements that have been generated as replacement code for other statements in the code.
  • the interleaving process may be random (though avoiding any placement that would break the code) and may result in a different ordering of statements in response to two different requests.
  • the resulting program with the interleaved statements, when executed, produces the same functional output as program P.
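  • One way to interleave without breaking the code is to merge two statement lists at random while keeping each list's internal order. The sketch below is an assumption about how an interleaver such as 114 could behave, not a description of it; it also assumes the two lists share no variable names, so only the order within each list matters.

```javascript
// Randomly merge two statement lists while preserving each list's own order,
// so that dependencies inside a list are never violated.
function interleave(first, second) {
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < first.length || j < second.length) {
    const remainingFirst = first.length - i;
    const remainingSecond = second.length - j;
    // Choose the next source in proportion to how many statements it has left.
    if (Math.random() * (remainingFirst + remainingSecond) < remainingFirst) {
      merged.push(first[i++]);
    } else {
      merged.push(second[j++]);
    }
  }
  return merged;
}

// Hypothetical replacement statements for `var y = 2;` and other program statements.
const replacement = ["var e1 = 10;", "var e2 = e1 - 8;", "var y = e2;"];
const others = ["var s2 = 1;", "var s3 = s2 + 1;"];
const interleaved = interleave(replacement, others);
console.log(interleaved.join("\n"));
```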
  • the data transferred from the encoding system 124 to the web browser 126 may be, for example, in the form of obfuscated JavaScript code with the sensitive data hidden within the code. Specific example methods for encoding the sensitive data are described below with respect to FIGS. 3A-3G.
  • the encoding system 124 identifies and extracts the sensitive data and then modifies the code. The modified code is then incorporated in the original web content, replacing the sensitive data.
  • FIG. 2 depicts an example of how program code P 202 may be modified, or encoded, into program code P′ 206 .
  • the illustrated process involves identifying a number of operations or statements that may be joined together and transformed into code that, when executed under a standard programming environment (e.g., a standard run-time implementation), will produce an original starting value or expression.
  • Program P 202 represents any appropriate web content, such as HTML, CSS, JavaScript, and other program code.
  • Program P 202 comprises a set of n statements, {S1, S2, S3, ..., Sn}.
  • the statement Si in the set of statements may be potentially sensitive data or content that is confirmed to be sensitive in nature.
  • the statement Si in the set of statements may include statements that are identified as needing to be replaced.
  • Each statement, Si may be a line of code or expression in the program.
  • each of the statements Si is rewritten as a set of statements {Si1, Si2, Si3, Si4, ...} that, collectively, executes as the equivalent of the individual statement Si, as described in further detail below with respect to FIGS. 3A-3F.
  • a set of equivalent statements E 204 for Program P 202 is generated. That is, the set of equivalent statements E 204 comprises n sets of equivalent statements, one for each statement Si in Program P 202.
  • the statement Si is replaced with the set of equivalent statements {S11, S12, S13, S14, ...}.
  • the number of statements in each set of equivalent statements need not be the same.
  • the various statements in the set of equivalent statements E 204 are interleaved, as described below with respect to FIG. 3G.
  • FIGS. 3A-3F show examples of equivalent statement replacement.
  • the figures show manners in which a single line of code for expressing a variable-assigning and/or mathematical relationship can be expressed instead by multiple lines of code that can be executed in a particular order to reach the initial result.
  • although numeric functions are shown in the examples here, other data may be similarly treated.
  • an alphanumeric string may be transformed through multiple operations, so that the starting point is a string different than what the web server system provided, but that ends up in generating the same string when the code is executed by a web browser.
  • a constant number in JavaScript can be replaced by an equivalent JavaScript expression.
  • the number 1 can be written as (3 - 2) or as (0 + 1), and the number 24 can be written as the expression (4 * 6) or (30 - 6).
  • the system can first use a random generator to generate a random number, for example, x, and then replace the number y with (x - (x - y)), which appears in the code to be different than y but is the functional equivalent of y when executed.
  • while FIG. 3A shows a sum (or subtraction) operation, any appropriate type of JavaScript operation (e.g., sum, multiply, divide) or function that returns a number (e.g., the length property of a string or an array) may be used.
  • Other types of constants such as Boolean, string, array, and object, can be replaced with equivalent statements using a similar approach.
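  • The random-number rewriting of a constant y as (x - (x - y)) can be sketched as follows. The helper name obfuscateNumber and the range of x are assumptions, and the sketch assumes integer inputs so that no floating-point rounding is involved. The inner subtraction is evaluated on the server, so the served text never contains y itself.

```javascript
// Rewrite the constant y as the text "(x - (x - y))" for a random x,
// with the inner subtraction already evaluated on the server.
function obfuscateNumber(y) {
  const x = Math.floor(Math.random() * 10000);
  return "(" + x + " - (" + (x - y) + "))";
}

const expression = obfuscateNumber(24);
console.log(expression, "=>", new Function("return " + expression + ";")());
```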
  • FIG. 3B shows an example of an equivalent statement replacement of a Boolean operation, as one such example.
  • the set of equivalent statements comprises a set of three statements, which, when executed, equivalently result in the variable y being set as “true.”
  • the first two statements assign the values 3 and 2 to variables a and b, respectively. In doing so, the value of a is assigned to be greater than the value of b.
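  • A Boolean replacement of this kind can be sketched as below. The helper names `boolean_replacement` and `run` are hypothetical, and the statements are emitted without a `var` keyword so that the same text can be checked with Python's `exec`; an actual encoder would emit JavaScript.

```python
import random


def boolean_replacement(target, rng=None):
    """Return three statements that leave y equal to the Boolean `target`.

    Two random integers are chosen so that the comparison a > b has the
    desired truth value (e.g., a = 3, b = 2 yields true).
    """
    rng = rng or random.Random()
    low = rng.randint(0, 100)
    high = low + rng.randint(1, 100)  # strictly greater than low
    a, b = (high, low) if target else (low, high)
    return [f"a = {a};", f"b = {b};", "y = a > b;"]


def run(statements):
    """Execute the statements in an empty scope and return the resulting variables."""
    scope = {}
    exec("\n".join(statements), {}, scope)
    return scope
```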
  • FIG. 3C shows an example of an equivalent statement replacement of a string constant.
  • the variable y is set as “abc”.
  • a string or the characters in a string may be assigned numeric values, and the operations performed on the numeric examples in the figures above and below may be applied, and then the resulting numeric value or values may be converted back into alphanumeric characters for a string.
  • the letter “a” may be assigned a value of 256 in a particular font definition, and the techniques discussed here may be used to break the value of 256 up into a plurality of expressions.
  • the number 256 may be returned, and then may be rendered as a glyph for the character set, as an “a.”
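  • The string case can be sketched by treating each character as a number, obfuscating that number, and converting back. The sketch below assumes Unicode code points as the numeric values (not the font-specific value of 256 mentioned above) and uses a hypothetical helper name; in JavaScript the conversion back would typically use `String.fromCharCode` instead of `chr`.

```python
import random


def obfuscate_string(text, rng=None):
    """Return an expression that rebuilds `text` from obfuscated character codes."""
    rng = rng or random.Random()
    parts = []
    for ch in text:
        code = ord(ch)              # numeric value assigned to the character
        x = rng.randint(1, 1000)    # random offset, as in (x - (x - y))
        parts.append(f"chr({x} - ({x - code}))")
    # An empty input has no characters to encode, so emit an empty literal.
    return " + ".join(parts) or "''"
```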
  • Another method of equivalent statement replacement involves adding junk code or junk branches, and may be applied as an alternative or additionally to the other examples discussed here.
  • the purpose of adding the junk code is to add “noise” to the code so that potential hackers or attackers cannot use the position of expressions in the code (e.g., line number or the nth statement) to locate a key function or variable.
  • variables j and k are not present anywhere else in the program code, so while the new code assigns j the value 4 and k the value 7, those assignments do not otherwise affect the operation or execution of the program code.
  • junk branches can be generated to add a level of obfuscation to the code.
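  • Junk code and junk branches can be sketched as follows. This is an illustration under stated assumptions: `fresh_name` and `add_junk` are hypothetical helpers, the statements are written in Python syntax so the result can be executed for checking, and the junk branch is a comparison between two constants that can never be true.

```python
import random
import string


def fresh_name(used, rng):
    """Pick a variable name that does not appear in `used`."""
    while True:
        name = "_" + "".join(rng.choice(string.ascii_lowercase) for _ in range(6))
        if name not in used:
            return name


def add_junk(statements, used_names, rng=None):
    """Insert two unused assignments and one never-taken branch at random positions."""
    rng = rng or random.Random()
    used = set(used_names)
    out = list(statements)
    junk_names = []
    for _ in range(2):
        name = fresh_name(used, rng)
        used.add(name)
        junk_names.append(name)
        # Junk assignment: the variable is not read anywhere in the original code.
        out.insert(rng.randint(0, len(out)), f"{name} = {rng.randint(0, 9)}")
    low = rng.randint(0, 50)
    high = low + rng.randint(1, 50)
    # Junk branch: high < low is always false, so the body never executes.
    out.insert(rng.randint(0, len(out)), f"if {high} < {low}: {junk_names[0]} = 0")
    return out
```

  • Because the insert positions are random, the original statements keep their relative order but no longer sit at predictable line numbers.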
  • FIG. 3E shows an example of permutations of statements for a scenario in which equivalent statements do not require a strict order.
  • for N such statements for which order does not matter, there are N! possible permutations.
  • Permutation of statements may be used on an array, list, string, or other collection data structure.
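  • As a small illustration of the N! figure, the sketch below shuffles a list of order-independent statements and counts the possible orderings. The function names are hypothetical, and the caller is assumed to have already established that the statements do not depend on one another.

```python
import math
import random


def permute_statements(statements, rng=None):
    """Return the same statements in a random order (order assumed not to matter)."""
    rng = rng or random.Random()
    shuffled = list(statements)
    rng.shuffle(shuffled)
    return shuffled


def count_orderings(n):
    """Number of distinct orderings of n order-independent statements (N!)."""
    return math.factorial(n)
```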
  • FIGS. 3A-3F illustrate various methods of producing equivalent statements 204.
  • an encoded program P′ 206 can be generated by interleaving the equivalent statements E 204 with the rest of the code, as shown, for example, in FIG. 3G (and potentially identifying which of the statements can be included in either order relative to other of the statements).
  • a set of replacement statements 376 are generated for statement 372 and a set of replacement statements 378 are generated for statement 374 .
  • FIG. 3G shows an example where the order of the statements in each of the sets of replacement statements 376 , 378 matter (i.e., affect the outcome of the executed code), but the order of the statements between the two sets of replacement statements 376 , 378 does not matter.
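  • The interleaving constraint described for FIG. 3G (order matters within each set, but not between the sets) can be sketched as a random merge that never reorders statements from the same set. The function name `interleave` is an assumption, and the weighting by remaining length is one possible choice, not something the text specifies.

```python
import random


def interleave(set_a, set_b, rng=None):
    """Randomly merge two statement lists while preserving each list's own order."""
    rng = rng or random.Random()
    a, b = list(set_a), list(set_b)
    merged = []
    while a or b:
        # Take from a with probability proportional to how much of a remains;
        # when a is empty the probability is 0, when b is empty it is 1.
        if rng.random() < len(a) / (len(a) + len(b)):
            merged.append(a.pop(0))
        else:
            merged.append(b.pop(0))
    return merged
```

  • For example, merging `["a1 = 5", "a2 = a1 - 2", "x = a2 * 2"]` with `["b1 = 10", "y = b1 + 1"]` always leaves x equal to 6 and y equal to 11, whatever interleaving is chosen.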
  • FIG. 4 is a flow diagram of a process for serving modified program code.
  • the process involves identifying items in content to be served to a client computer that may potentially include sensitive data, transforming the data dynamically and randomly into a set of other data, and incorporating the set of other data into the content in a manner so as to hide the potentially sensitive data.
  • a request for web content is received, such as from a client computer operated by an individual seeking to perform a banking transaction at a website for the individual's bank.
  • the request may be in the form of an HTTP request and may be received by a load balancer operated by, or for, the bank.
  • the load balancer may recognize the form of the request and understand that it is to be handled by a security system that the bank has installed to operate along with its web server system.
  • the load balancer may thus provide the request to the security system, which may forward it to the web server system after analyzing the request (e.g., to open a tracking session based on the request), or may provide the request to the web server system and also provide information about the request to the security system in parallel.
  • a response to the request is generated by the web server system.
  • the user may have requested to perform a funds transfer between accounts at the bank, where the funds are owned by the individual, and the response by the web server system may include HTML for a webpage on which the user can specify parameters for the transaction, along with JavaScript code and CSS code for carrying out such transactions at a web browser operated by the individual.
  • the web server system sends the response to the request to an encoding system.
  • the response may comprise the web content requested by the client computer. Included in the response may be potentially sensitive data, such as, for example, account numbers, routing numbers, or other data relating to a banking transaction.
  • the encoding system receives the web content from the web server system and identifies potentially sensitive data in the web content.
  • the encoding system generates code to replace the sensitive data.
  • the sensitive data may be written as a set of replacement statements whose output, when executed, is displayed the same as the sensitive data, resulting in no difference in appearance to a user requesting the web content.
  • Various methods for rewriting or replacing the sensitive data are possible, including the methods described above with respect to FIGS. 3A-3F .
  • the replacement of sensitive data may include replacing a single statement or expression in the web content, or it may include replacing numerous statements. At a minimum, however, a single statement, or expression, is replaced with a set of equivalent statements.
  • the set of equivalent statements may comprise one or more statements, which, when collectively executed, output the same result as the initial statement comprising sensitive data.
  • the encoding system may identify a single statement assigning a constant value to contain sensitive data.
  • the encoding system may randomly generate a set of equivalent statements, which, collectively, make the same assignment, as illustrated, for example, in FIG. 3A .
  • the encoding system may identify a single statement containing sensitive data.
  • the encoding system may add one or more lines of junk code or junk branches. The purpose of the junk code is to add a layer of randomness to the code to prevent potential hackers from using the position (e.g., line number) of code to identify potentially sensitive data. When executed, the junk code has no visible effect on the displayed web content. Similarly, the encoding system may generate junk branches that appear to supplement the original statement or expression in the code.
  • the junk branches may comprise conditional statements or expressions that will never execute.
  • An example of generating a junk branch is discussed above with respect to FIG. 3D .
  • Adding junk branches adds an additional level of obfuscation to the code that hampers a potential attacker's ability to target sensitive data.
  • the encoding system may employ recursive coding, generating multiple “layers” of replacement code.
  • An example of recursive coding is shown, for example, in FIG. 3F .
  • an assignment statement is replaced with three separate statements.
  • a junk branch is added to one of the three replacement statements. While the example shown in FIG. 3F shows only two “layers” of replacement code, any number of “layers” of replacement code may be generated.
  • the method moves to box 412 where the various replacement statements are interleaved in the code of the web content.
  • An example of the interleaving process is described above with respect to FIG. 3G .
  • Each of the two sets of replacement code 376 and 378 has a particular ordering of the statements. In some instances, the ordering of the individual statements affects the execution of the code, while in other instances, the ordering does not affect the outcome.
  • the encoding system randomly and dynamically generates code to replace the sensitive data. That is, given the same input code (i.e., web content), the encoding system does not necessarily generate the same replacement code in response to two different requests for the web content. Furthermore, the approach for generating the replacement code may be different in response to two requests for the same web content. For example, in response to one request, the encoder system may replace a first statement with a set of three replacement statements that collectively result in the same result as the first statement, such as the example shown in FIG. 3A . In response to a second request for the same web content, the encoder system may replace the same first statement with a set of three different replacement statements that include a junk branch, such as the example shown in FIG. 3D .
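  • The per-request variation can be sketched as a random choice among several replacement strategies. Everything named below (the three strategy functions, `STRATEGIES`, and `encode_for_request`) is hypothetical, and the statements are emitted in Python syntax so they can be executed for checking; the point is only that the same input assignment can yield different but equivalent code on each request.

```python
import random


def as_difference(name, value, rng):
    """One statement: name = x - (x - value)."""
    x = rng.randint(1, 1000)
    return [f"{name} = {x} - ({x - value})"]


def as_three_statements(name, value, rng):
    """Three statements that together make the same assignment."""
    a = rng.randint(1, 1000)
    return [f"_a = {a}", f"_b = {a - value}", f"{name} = _a - _b"]


def with_junk_branch(name, value, rng):
    """Three statements plus a branch whose condition is never true."""
    statements = as_three_statements(name, value, rng)
    statements.insert(rng.randint(0, len(statements)), "if 2 > 3: _a = 0")
    return statements


STRATEGIES = [as_difference, as_three_statements, with_junk_branch]


def encode_for_request(name, value, rng=None):
    """Pick a strategy at random so repeated requests get different code."""
    rng = rng or random.SystemRandom()
    return rng.choice(STRATEGIES)(name, value, rng)
```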
  • the process then serves the recoded web content at box 414 , in familiar manners.
  • Such a process may be performed repeatedly each time a client computer requests content, with the recoded content being different each time the content is served through the encoding system, including when identical or nearly identical content is requested in separate transactions by two different users or by the same user.
  • the code that is served by the encoding system may be supplemented with instrumentation code that runs on the computer browser and monitors interaction with the web page.
  • the instrumentation code may look for particular method calls or other calls to be made, such as when the calls or actions relate to a field in a form that is deemed to be subject to malicious activity, such as a client ID number field, a transaction account number field, or a transaction amount field.
  • when the instrumentation code observes such activity on the client device, it reports that activity along with metadata that helps to characterize the activity. The process receives such reports from the instrumentation code and processes them, such as by forwarding them to a central security system that may analyze them to determine whether such activity is benign or malicious.
  • FIG. 5 shows a system 500 for serving polymorphic and instrumented code.
  • polymorphic code is code that is changed in different manners for different servings of the code, in manners that do not affect the way in which the executed code is perceived by users. The goal is to create a moving target for malware that tries to determine how the code operates, but without changing the user experience.
  • Instrumented code is code that is served, e.g., to a browser, with the main functional code and monitors how the functional code operates on a client device, and how other code may interact with the functional code and other activities on the client device.
  • the system 500 may identify values or expressions in the code that can be replaced with multiple other expressions that, when executed on a client device, resolve to the initial value or expressions.
  • the system 500 may be adapted to perform deflection and detection of malicious activity with respect to a web server system. Deflection may occur, for example, by the serving of polymorphic code, which interferes with the ability of malware to interact effectively with the code that is served. Detection may occur, for example, by adding instrumentation code (including injected code for a security service provider) that monitors activity of client devices that are served web code.
  • the system 500 in this example is a system that is operated by or for a large number of different businesses that serve web pages and other content over the internet, such as banks and retailers that have on-line presences (e.g., on-line stores, or on-line account management tools).
  • the main server systems operated by those organizations or their agents are designated as web servers 504 a - 504 n , and could include a broad array of web servers, content servers, database servers, financial servers, load balancers, and other necessary components (either as physical or virtual servers).
  • security server systems 502 a to 502 n may cause code from the web server system to be supplemented and altered.
  • code may be provided, either by the web server system itself as part of the originally-served code, or by another mechanism after the code is initially served, such as by the security server systems 502 a to 502 n , where the supplementing code causes client devices to which the code is served to transmit data that characterizes the client devices and the use of the client devices.
  • other actions may be taken by the supplementing code, such as the code reporting actual malware activity or other anomalous activity at the client devices that can then be analyzed to determine whether the activity is malware activity.
  • the set of security server systems 502 a to 502 n is shown connected between the web servers 504 a to 504 n and a network 510 such as the internet. Although both extend to n in number, the actual number of sub-systems could vary. For example, certain of the customers could install two separate security server systems to serve all of their web server systems (which could be one or more), such as for redundancy purposes.
  • the particular security server systems 502 a - 502 n may be matched to particular ones of the web server systems 504 a - 504 n , or they may be at separate sites, and all of the web servers for various different customers may be provided with services by a single common set of security servers 502 a - 502 n (e.g., when all of the server systems are at a single co-location facility so that bandwidth issues are minimized).
  • Each of the security server systems 502 a - 502 n may be arranged and programmed to carry out operations like those discussed above and below and other operations.
  • a policy engine 520 in each such security server system may evaluate HTTP requests from client computers (e.g., desktop, laptop, tablet, and smartphone computers) based on header and network information, and can set and store session information related to a relevant policy.
  • the policy engine may be programmed to classify requests and correlate them to particular actions to be taken to code returned by the web server systems before such code is served back to a client computer.
  • the policy information may be provided to a decode, analysis, and re-encode module, which matches the content to be delivered, across multiple content types (e.g., HTML, JavaScript, and CSS), to actions to be taken on the content (e.g., using XPATH within a DOM), such as substitutions, addition of content, and other actions that may be provided as extensions to the system.
  • the different types of content may be analyzed to determine naming that may extend across such different pieces of content (e.g., the name of a function or parameter), and such names may be changed in a way that differs each time the content is served, e.g., by replacing a named item with randomly-generated characters.
  • Elements within the different types of content may also first be grouped as having a common effect on the operation of the code (e.g., if one element makes a call to another), and then may be re-encoded together in a common manner so that their interoperation with each other will be consistent even after the re-encoding.
  • Both the analysis of content for determining which transformations to apply to the content, and the transformation of the content itself, may occur at the same time (after receiving a request for the content) or at different times.
  • the analysis may be triggered, not by a request for the content, but by a separate determination that the content newly exists or has been changed. Such a determination may be via a “push” from the web server system reporting that it has implemented new or updated content.
  • the determination may also be a “pull” from the security servers 502 a - 502 n , such as by the security servers 502 a - 502 n implementing a web crawler (not shown) to recursively search for new and changed content and to report such occurrences to the security servers 502 a - 502 n , and perhaps return the content itself and perhaps perform some processing on the content (e.g., indexing it or otherwise identifying common terms throughout the content, creating DOMs for it, etc.).
  • the analysis to identify portions of the content that should be subjected to polymorphic modifications each time the content is served may then be performed according to the manner discussed above and below.
  • a rules engine 522 may store analytical rules for performing such analysis and for re-encoding of the content.
  • the rules engine 522 may be populated with rules developed through operator observation of particular content types, such as by operators of a system studying typical web pages that call JavaScript content and recognizing that a particular method is frequently used in a particular manner. Such observation may result in the rules engine 522 being programmed to identify the method and calls to the method so that they can all be grouped and re-encoded in a consistent and coordinated manner.
  • the decode, analysis, and re-encode module 524 encodes content being passed to client computers from a web server according to relevant policies and rules.
  • the module 524 also reverse encodes requests from the client computers to the relevant web server or servers.
  • a web page may be served with a particular parameter, and may refer to JavaScript that references that same parameter.
  • the decode, analysis, and re-encode module 524 may replace the name of that parameter, in each of the different types of content, with a randomly generated name, and each time the web page is served (or at least in varying sessions), the generated name may be different.
  • when the name of the parameter is passed back to the web server, it may be re-encoded back to its original name so that this portion of the security process may occur seamlessly for the web server.
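  • The coordinated renaming and its reversal can be sketched as follows. This is an assumption-laden illustration: `encode_names` and `decode_request` are hypothetical helpers, names are matched with a simple word-boundary pattern rather than a real HTML or JavaScript parser, and the mapping is kept in memory (the stateful variant).

```python
import re
import secrets


def encode_names(contents, names):
    """Replace each name with one random alias across all content types.

    `contents` maps a content type (e.g., "html", "js") to its text.
    Returns the rewritten contents and the name -> alias mapping.
    """
    if not names:
        return dict(contents), {}
    mapping = {name: "_" + secrets.token_hex(6) for name in names}
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in mapping) + r")\b")
    encoded = {
        kind: pattern.sub(lambda m: mapping[m.group(1)], text)
        for kind, text in contents.items()
    }
    return encoded, mapping


def decode_request(params, mapping):
    """Translate aliased parameter names in a client request back to the originals."""
    reverse = {alias: name for name, alias in mapping.items()}
    return {reverse.get(key, key): value for key, value in params.items()}
```

  • Because a new alias is generated on every call, two servings of the same page carry different parameter names, while the web server only ever sees the original name.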
  • a key for the function that encodes and decodes such strings can be maintained by the security server system 502 along with an identifier for the particular client computer so that the system 502 may know which key or function to apply, and may otherwise maintain a state for the client computer and its session.
  • a stateless approach may also be employed, whereby the system 502 encrypts the state and stores it in a cookie that is saved at the relevant client computer. The client computer may then pass that cookie data back when it passes the information that needs to be decoded back to its original status. With the cookie data, the system 502 may use a private key to decrypt the state information and use that state information in real-time to decode the information from the client computer.
  • Such a stateless implementation may create benefits such as less management overhead for the server system 502 (e.g., for tracking state, for storing state, and for performing clean-up of stored state information as sessions time out or otherwise end) and as a result, higher overall throughput.
  • the decode, analysis, and re-encode module 524 and the security server system 502 may be configured to modify web code differently each time it is served in a manner that is generally imperceptible to a user who interacts with such web code.
  • multiple different client computers may request a common web resource such as a web page or web application that a web server provides in response to the multiple requests in substantially the same manner.
  • a common web page may be requested from a web server, and the web server may respond by serving the same or substantially identical HTML, CSS, JavaScript, images, and other web code or files to each of the clients in satisfaction of the requests.
  • particular portions of requested web resources may be common among multiple requests, while other portions may be client or session specific.
  • the decode, analysis, and re-encode module 524 may be adapted to apply different modifications to each instance of a common web resource, or common portion of a web resource, such that the web code that is ultimately delivered to the client computers in response to each request for the common web resource includes different modifications.
  • the analysis can happen a single time for a plurality of servings of the code in different recoded instances. For example, the analysis may identify a particular function name and all of the locations it occurs throughout the relevant code, and may create a map to each such occurrence in the code. Subsequently, when the web content is called to be served, the map can be consulted and random strings may be inserted in a coordinated manner across the code. Generating a new name for the function each time, and substituting that name into the code, requires much less computing cost than a full re-analysis of the content. Also, when a page is to be served, it can be analyzed to determine which portions, if any, have changed since the last analysis, and subsequent analysis may be performed only on the portions of the code that have changed.
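  • The analyze-once, recode-many idea can be sketched as a precomputed list of character offsets that is reused on every serving. The helper names `analyze` and `serve` are hypothetical, and locating occurrences with a regular expression is a simplification of whatever analysis a real system performs.

```python
import re
import secrets


def analyze(code, name):
    """One-time analysis: record the (start, end) offsets of every occurrence of `name`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [match.span() for match in pattern.finditer(code)]


def serve(code, spans):
    """Per-request step: splice a fresh random alias into the precomputed offsets."""
    alias = "_" + secrets.token_hex(6)
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(code[cursor:start])  # unchanged text before the occurrence
        pieces.append(alias)               # replacement for the original name
        cursor = end
    pieces.append(code[cursor:])           # unchanged tail
    return "".join(pieces), alias
```

  • Here `analyze` runs once per version of the content, while `serve` does only string splicing on each request, which is where the saving in computing cost comes from.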
  • the security server system 502 can apply the modifications in a manner that does not substantially affect a way that the user interacts with the resource, regardless of the different transformations applied. For example, when two different client computers request a common web page, the security server system 502 applies different modifications to the web code corresponding to the web page in response to each request for the web page, but the modifications do not substantially affect a presentation of the web page between the two different client computers. The modifications can therefore be made largely transparent to users interacting with a common web resource so that the modifications do not cause a substantial difference in the way the resource is displayed or the way the user interacts with the resource on different client devices or in different sessions in which the resource is requested.
  • An instrumentation module 526 is programmed to add instrumentation code to the content that is served from a web server.
  • the instrumentation code is code that is programmed to monitor the operation of other code that is served. For example, the instrumentation code may be programmed to identify when certain methods are called, when those methods have been identified as likely to be called by malicious software. When such actions are observed to occur by the instrumentation code, the instrumentation code may be programmed to send a communication to the security server reporting on the type of action that occurred and other metadata that is helpful in characterizing the activity. Such information can be used to help determine whether the action was malicious or benign.
  • the instrumentation code may also analyze the DOM on a client computer in predetermined manners that are likely to identify the presence of and operation of malicious software, and to report to the security servers 502 or a related system.
  • the instrumentation code may be programmed to characterize a portion of the DOM when a user takes a particular action, such as clicking on a particular on-page button, so as to identify a change in the DOM before and after the click (where the click is expected to cause a particular change to the DOM if there is benign code operating with respect to the click, as opposed to malicious code operating with respect to the click).
  • Data that characterizes the DOM may also be hashed, either at the client computer or the server system 502 , to produce a representation of the DOM (e.g., in the differences between part of the DOM before and after a defined action occurs) that is easy to compare against corresponding representations of DOMs from other client computers.
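  • Hashing a before/after DOM difference into a compact, comparable value can be sketched as below. The representation is an assumption made for illustration: each DOM snapshot is a list of strings describing nodes (e.g., a path plus attributes), and `dom_change_fingerprint` is a hypothetical name. SHA-256 is used only as an example hash.

```python
import hashlib
import json


def dom_change_fingerprint(before, after):
    """Hash the set of nodes added and removed between two DOM snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    # Sorting and a fixed JSON layout make the digest independent of node order.
    payload = json.dumps({"added": added, "removed": removed}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

  • Two clients whose DOMs change in the same way after a click produce the same digest, so an unexpected digest from one client stands out when compared against the others.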
  • Other techniques may also be used by the instrumentation code to generate a compact representation of the DOM or other structure expected to be affected by malicious code in an identifiable manner.
  • Uninfected client computers 512a-512n represent computers that do not have malicious code programmed to interfere with a particular site a user visits or to otherwise perform malicious activity.
  • Infected client computers 514 a - 514 n represent computers that do have malware or malicious code ( 518 a - 518 n , respectively) programmed to interfere with a particular site a user visits or to otherwise perform malicious activity.
  • the client computers 512a-512n, 514a-514n may also store the encrypted cookies discussed above and pass such cookies back through the network 510.
  • the client computers 512a-512n, 514a-514n will, once they obtain the served content, implement DOMs for managing the displayed web pages, and instrumentation code may monitor the respective DOMs as discussed above. Illogical activity (e.g., software on the client device calling a method that does not exist in the downloaded and rendered content) can then be reported back to the server system.
  • each web site operator may be provided with a single security console 507 that provides analytical tools for a single site or group of sites.
  • the console 507 may include software for showing groups of abnormal activities, or reports that indicate the type of code served by the web site that generates the most abnormal activity.
  • a security officer for a bank may determine that defensive actions are needed if most of the reported abnormal activity for its web site relates to content elements corresponding to money transfer operations, an indication that stale malicious code may be trying to access such elements surreptitiously.
  • Console 507 may also be multiple different consoles used by different employees of an operator of the system 500 , and may be used for pre-analysis of web content before it is served, as part of determining how best to apply polymorphic transformations to the web code.
  • an operator at console 507 may form or apply rules 522 that guide the transformation that is to be performed on the content when it is ultimately served.
  • the rules may be written explicitly by the operator or may be provided by automatic analysis and approved by the operator.
  • the operator may perform actions in a graphical user interface (e.g., by selecting particular elements from the code by highlighting them with a pointer, and then selecting an operation from a menu of operations) and rules may be written consistent with those actions.
  • a central security console 508 may connect to a large number of web content providers, and may be run, for example, by an organization that provides the software for operating the security server systems 502a-502n.
  • Such console 508 may access complex analytical and data analysis tools, such as tools that identify clustering of abnormal activities across thousands of client computers and sessions, so that an operator of the console 508 can focus on those clusters in order to diagnose them as malicious or benign, and then take steps to thwart any malicious activity.
  • the console 508 may have access to software for analyzing telemetry data received from a very large number of client computers that execute instrumentation code provided by the system 500 .
  • Such data may result from forms being re-written across a large number of web pages and web sites to include content that collects system information such as browser version, installed plug-ins, screen resolution, window size and position, operating system, network information, and the like.
  • user interaction with served content may be characterized by such code, such as the speed with which a user interacts with a page, the path of a pointer over the page, and the like.
  • Such collected telemetry data may be used by the console 508 to identify what is “natural” interaction with a particular page that is likely the result of legitimate human actions, and what is “unnatural” interaction that is likely the result of a bot interacting with the content.
  • Statistical and machine learning methods may be used to identify patterns in such telemetry data, and to resolve bot candidates to particular client computers.
  • client computers may then be handled in special manners by the system 500 , may be blocked from interaction, or may have their operators notified that their computer is potentially running malicious software (e.g., by sending an e-mail to an account holder of a computer so that the malicious software cannot intercept it easily).
  • FIG. 6 is a schematic diagram of a general computing system 600 .
  • the system 600 can be used for the operations described in association with any of the computer-implemented methods described previously, according to one implementation.
  • the system 600 is intended to include various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • the system 600 can also include mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices.
  • the system can include portable storage media, such as Universal Serial Bus (USB) flash drives.
  • USB flash drives may store operating systems and other applications.
  • the USB flash drives can include input/output components, such as a wireless transmitter or USB connector that may be inserted into a USB port of another computing device.
  • the system 600 includes a processor 610 , a memory 620 , a storage device 630 , and an input/output device 640 .
  • Each of the components 610, 620, 630, and 640 is interconnected using a system bus 650.
  • the processor 610 is capable of processing instructions for execution within the system 600 .
  • the processor may be designed using any of a number of architectures.
  • the processor 610 may be a CISC (Complex Instruction Set Computer) processor, a RISC (Reduced Instruction Set Computer) processor, or a MISC (Minimal Instruction Set Computer) processor.
  • the processor 610 is a single-threaded processor. In another implementation, the processor 610 is a multi-threaded processor.
  • the processor 610 is capable of processing instructions stored in the memory 620 or on the storage device 630 to display graphical information for a user interface on the input/output device 640 .
  • the memory 620 stores information within the system 600 .
  • the memory 620 is a computer-readable medium.
  • the memory 620 is a volatile memory unit.
  • the memory 620 is a non-volatile memory unit.
  • the storage device 630 is capable of providing mass storage for the system 600 .
  • the storage device 630 is a computer-readable medium.
  • the storage device 630 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.
  • the input/output device 640 provides input/output operations for the system 600 .
  • the input/output device 640 includes a keyboard and/or pointing device.
  • the input/output device 640 includes a display unit for displaying graphical user interfaces.
  • the features described can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them.
  • the apparatus can be implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output.
  • the described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • a computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result.
  • a computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors of any kind of computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data.
  • a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks.
  • Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
  • the features can be implemented on a computer having a display device such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer. Additionally, such activities can be implemented via touchscreen flat-panel displays and other appropriate mechanisms.
  • the features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them.
  • the components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), peer-to-peer networks (having ad-hoc or static members), grid computing infrastructures, and the Internet.
  • the computer system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a network, such as the described one.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • the subject matter may be embodied as methods, systems, devices, and/or as an article or computer program product.
  • the article or computer program product may comprise one or more computer-readable media or computer-readable storage devices, which may be tangible and non-transitory, that include instructions that may be executable by one or more machines such as computer processors.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A computer-implemented method includes identifying a piece of data to be served from a server system to a client device that is remote from the server system; creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and providing the plurality of expressions to the client device with code for executing the plurality of expressions.

Description

  • This application claims the benefit under 35 U.S.C. 120 as a Continuation of U.S. patent application Ser. No. 14/286,324, filed on 2014 May 23, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application(s).
  • TECHNICAL FIELD
  • This document relates to computer security and interference with malware.
  • BACKGROUND
  • Research indicates that a large share of web traffic involves computer bots—many are malware. Bot activities include content scraping, reconnaissance, credential stuffing, creating fake accounts, comment spamming, and similar activities. Bots can impose an unnecessary load on any company trying to serve web content efficiently. More importantly, they can attempt to “learn” the operation of a web site so as to exploit it. As one example, malicious software (malware) may execute a “man in the browser” attack by intercepting communications that a user makes with a web site in a manner that makes the user believe that he or she is actually communicating with the web site. For example, malware may generate a display for a user who is visiting a banking site, where the display requests from the user information such as social security number, credit card number, other account numbers. An organization that operates the malware may then have such data sent to it, and may use the data to steal from the user, the web site operator, or both.
  • Various approaches have been taken to identify and prevent such malicious activity. For example, programs have been developed for operation on client computers or at the servers of the organizations that own and operate the client computer to detect improper activity.
  • SUMMARY
  • This document describes systems and techniques by which web code (e.g., HTML, CSS, and JavaScript) that a server system provides to client devices is modified before it is served over the internet, so as to make more difficult the exploitation of the code and the operator of the server system by clients that receive the code (including clients that are infected without their human users' knowledge). The modifications can be made to encode sensitive data, and may differ for different instances in which a web page and related content are served, whether to the same client computer or to different client computers. For example, a single expression or value in the code may be re-written as multiple expressions that, when executed, produce the initial value or expression. Where different code is served in response to each request, the expressions into which the initial value are resolved may also differ each time. The output of the code, when executed on the client computer, however, is the same for all such different versions of the served code so that a user at a client computer does not perceive a difference in the displayed web page. Specifically, two different users (or a single user in two different web browsing sessions) may be served slightly different code in response to the same requests, where the difference may be in implicit parts of the code that are not displayed so that the differences are not noticeable to the user or users.
  • The manner in which an initial value or expression is rewritten into multiple expressions capable of being executed on a client computer may take a variety of forms. For example, the expressions themselves, the number of expressions, and the order in which the expressions execute may all be varied to interfere with malware. Also, these different parameters may be varied so as to be different from one serving of the code to the next. Such variation, which may be termed “polymorphism” of the code, may help create a moving target against which malware needs to apply itself. In one example, changing the code that is served to client devices in an essentially random manner (i.e., a manner that effectively prevents malware that has analyzed serving n from inferring something useful about serving n+x) each time the code is served can deter malicious code executing on the client computers (e.g., a Man in the Browser bot) from interacting with the served code in a predictable way so as to trick a user of the client computer into providing confidential financial information and the like. Moreover, external programs generally cannot drive web application functionality directly, and so preventing predictable interaction with served code can be an effective mechanism for preventing malicious computer activity.
  • As described here, the techniques transform values or expressions, such as a cleartext string, a Javascript object, or a Javascript code snippet into another Javascript snippet that is the equivalent to the input after it is executed (i.e., it produces an identical displayed output). The encoding is dynamic and random, which means that the encoding generates different output code each time given the same input (though the outputs may repeat periodically as long as that repetition is not frequent enough to allow malware to predict the output or readily obtain the repeated output). Because the encoded output code is still presented as cleartext, it may not be able to prevent a human from ascertaining sensitive data, but it may make it very difficult for a malicious party to write a computer program to extract the sensitive data automatically.
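  • As one illustration of such a dynamic, random encoding, the following is a minimal sketch and not the encoding used by any particular implementation. It assumes a cleartext string as input and uses a random XOR key with random chunk boundaries; the function names `randInt`, `encodeString`, and `run` are hypothetical.

```javascript
// Hypothetical helper: random integer in [min, max].
function randInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Hypothetical encoder: turns a cleartext string into a JavaScript snippet
// that evaluates to the same string. Each call picks a new XOR key and new
// chunk boundaries, so the emitted text usually differs between servings.
function encodeString(text) {
  const key = randInt(1, 255);
  const parts = [];
  let i = 0;
  while (i < text.length) {
    const end = Math.min(text.length, i + randInt(1, 4));
    const codes = [];
    for (let j = i; j < end; j++) {
      // Store the masked code unit; the emitted code unmasks it at run time.
      codes.push(`${text.charCodeAt(j) ^ key}^${key}`);
    }
    parts.push(`String.fromCharCode(${codes.join(",")})`);
    i = end;
  }
  return parts.length > 0 ? parts.join("+") : '""';
}

// Evaluate a generated snippet the way a standard JavaScript run-time would.
function run(snippet) {
  return new Function(`return ${snippet};`)();
}

const first = encodeString("account-12345");
const second = encodeString("account-12345");
console.log(first);
console.log(second);
console.log(run(first) === run(second)); // both resolve to the same string
```

The two printed snippets are still cleartext JavaScript, which matches the point above: a human can read them, but a program that looks for a fixed pattern has a harder time.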
  • Likewise, other forms of computer attacks can also be prevented or deterred by the web code transformations described in this document. Some of these attacks include: (a) denial of service attacks, and particularly advanced application denial of service attacks, in which a malicious party targets a particular functionality of a website (e.g., a widget or other web application) and floods the server with requests for that functionality until the server can no longer respond to requests from legitimate users; (b) rating manipulation schemes in which fraudulent parties use automated scripts to generate a large number of positive or negative reviews of some entity such as a marketed product or business in order to artificially skew the average rating for the entity up or down; (c) fake account creation in which malicious parties use automated scripts to establish and use fake accounts on one or more web services to engage in attacks ranging from content spam, e-mail spam, identity theft, phishing, ratings manipulation, fraudulent reviews, and countless others; (d) fraudulent reservation of rival goods, by which a malicious party exploits flaws in a merchant's website to engage in a form of online scalping by purchasing all or a substantial amount of the merchant's inventory and quickly turning around to sell the inventory at a significant markup; (e) ballot stuffing, in which automated bots are used to register a large number of fraudulent poll responses; (f) website scraping, in which both malicious parties and others (e.g., commercial competitors) use automated programs to obtain and collect data such as user reviews, articles, or technical information published by a website, and where the scraped data is used for commercial purposes that may threaten to undercut the origin website's investment in the scraped content; and (g) web vulnerability assessments, in which malicious parties scan any number of websites for security vulnerabilities by analyzing the web code and structure of each site.
  • The systems, methods, and techniques for web code modifications described in this paper can, in certain implementations, prevent or deter one or more of these types of attacks. For example, transforming sensitive data by replacing expressions with a set of equivalent expressions and then interleaving the expressions in the set of equivalent expressions can cause the effectiveness of bots and other malicious automated scripts to be substantially diminished.
  • The modification of code that is described in more detail below may be carried out by a security system that may supplement a web server system, and may intercept requests from client computers to the web server system and intercept responses from web servers of the system when they serve content back to the client computers (including where pieces of the content are served by different server systems). The modification may be of static code (e.g., HTML) and of related executable code (e.g., JavaScript) in combination. For example, the names of certain elements on a web page defined via HTML may be changed, as may references to items external to the HTML (e.g., CSS and JavaScript code). An expression may be rewritten as an equivalent expression or multiple expressions. For example, the expression “var y=2” may be rewritten as the following set of expressions: “var a=10”; “var b=8”; and “y=a−b”. As shown in this example, the combination of the three expressions in the set of expressions produces the same result as the original expression—that is, an assignment of the value 2 to the variable y. Such rewriting, or transforming, of code may occur by first identifying data present in code that is to be served to the client computer (e.g., HTML, CSS, and JavaScript) and grouping such occurrences of sensitive data for further processing (e.g., by generating flags that point to each such element or copying a portion of each such element). The identified data may be identified as sensitive or potentially sensitive or simply data that should be rewritten before being served. Processing of the data may occur by modifying each element throughout different formats of code, such as changing an expression in the manner above each time that name occurs in a parameter, method call, DOM operation, or elsewhere. Next, further processing may occur that comprises interleaving the set of elements throughout the new code. 
Such a process may be repeated each time a client computer requests code, and the modifications may be different for each serving of the same code.
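  • The “var y=2” example above can be sketched as a small generator. This is an illustrative sketch only: the function name `rewriteNumericAssignment` is hypothetical, the temporary names a and b follow the example in the text, and the sketch assumes an integer value and a target variable that is not itself named a or b.

```javascript
// Hypothetical generator: rewrites "var <name> = <value>" as three statements
// whose combined effect is the same assignment. The offset b is chosen at
// random, so each serving can carry different constants.
function rewriteNumericAssignment(name, value) {
  const b = 1 + Math.floor(Math.random() * 1000);
  const a = value + b;
  return [`var a = ${a};`, `var b = ${b};`, `var ${name} = a - b;`];
}

// Execute the generated statements and read back the target variable.
function run(statements, name) {
  return new Function(statements.join("\n") + `\nreturn ${name};`)();
}

const statements = rewriteNumericAssignment("y", 2);
console.log(statements.join(" "));
console.log(run(statements, "y")); // 2
```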
  • In certain instances, the analysis to identify values or expressions that can be rewritten without affecting the operation of the code may be performed once, and a map to occurrences of such values or expressions in the code may be generated, and then used for each serving of the code to locate the occurrences, so that they may be altered throughout the code in a consistent manner that does not break the code. Such analyze-once, transform-many approaches may lessen the computational load for such a system and allow greater scaling of the system to larger web server systems with high volume requirements.
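  • A minimal sketch of the analyze-once, transform-many idea follows. It assumes that the rewritable values are integer literals and uses a regular expression as a stand-in for a real parser (so it would also match digits inside string literals or decimal numbers); the names `analyze` and `transform` and the shape of the map are hypothetical.

```javascript
// Analysis step (run once per page): record where each integer literal occurs.
// The regular expression is a simplification standing in for real parsing.
function analyze(source) {
  const map = [];
  const literal = /\b\d+\b/g;
  let m;
  while ((m = literal.exec(source)) !== null) {
    map.push({ start: m.index, end: m.index + m[0].length, value: Number(m[0]) });
  }
  return map;
}

// Transform step (run per serving): reuse the stored map and replace each
// occurrence with a freshly randomized, equivalent expression.
function transform(source, map) {
  let out = "";
  let cursor = 0;
  for (const { start, end, value } of map) {
    const offset = 1 + Math.floor(Math.random() * 1000);
    out += source.slice(cursor, start) + `(${value + offset}-${offset})`;
    cursor = end;
  }
  return out + source.slice(cursor);
}

const source = "var total = 12 * 3 + 4; return total;";
const map = analyze(source);            // computed once
const servingA = transform(source, map); // computed per request
const servingB = transform(source, map);
console.log(servingA);
console.log(servingB);
console.log(new Function(servingA)(), new Function(servingB)()); // 40 40
```

Only `transform` runs on each request, which is where the reduction in per-request work described above would come from.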
  • Such modification of the served code can help to prevent bots or other malicious code from exploiting or even detecting weaknesses in the web server system. For example, the names of functions or variables may be changed in various random manners each time a server system serves the code. As noted above, such constantly changing modifications may interfere with the ability of malicious parties to identify how the server system operates and web pages are structured, so that the malicious party cannot generate code to automatically exploit that structure in dishonest manners. Such techniques may create a moving target that can prevent malicious organizations from reverse-engineering the operation of a web site so as to build automated bots that can interact with the web site, and potentially carry out Man-in-the-Browser and other Man-in-the-Middle operations and attacks.
  • The techniques discussed here may be carried out by a server subsystem that acts as an adjunct to a web server system that is commonly employed by a provider of web content. For example, as discussed in more detail below, an internet retailer may have an existing system by which it presents a web storefront at a web site (e.g., www.examplestore.com), interacts with customers to show them information about items available for purchase through the storefront, and processes order and payment information through that same storefront. The techniques discussed here may be carried out by the retailer adding a separate server subsystem (either physical or virtualized) that stands between the prior system and the internet. The new subsystem may act to receive web code from the web servers (or from a traffic management system that receives the code from the web servers), may translate that code in random manners before serving it to clients, may receive responses from clients and translate them in the opposite direction, and then provide that information to the web servers using the original names and other data. In addition, such a system may provide the retailer or a third party with whom the retailer contracts (e.g., a web security company that monitors data from many different clients and helps them identify suspect or malicious activity) with information that identifies suspicious transactions. For example, the security subsystem may keep a log of abnormal interactions, may refer particular interactions to a human administrator for later analysis or for real-time intervention, may cause a financial system to act as if a transaction occurred (so as to fool code operating on a client computer) but to stop such a transaction, or any number of other techniques that may be used to deal with attempted fraudulent transactions.
  • In one implementation, a computer-implemented method is disclosed that includes identifying a piece of data for serving from a server system to a client device that is remote from the server system, the piece of data being part of executable code requested from the server by the client device; creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and providing, to the client device and as part of the executable code, the plurality of expressions along with code for executing the plurality of expressions, so that when the plurality of expressions are executed on the client device, the identified piece of data is returned on the client device without a need to serve the identified piece of data to the client device. The method can include performing a permutation on the plurality of expressions so that the plurality of expressions are ordered in the executable code in an order different than they were created. The order of the expressions can be selected randomly as part of the permutation.
  • In some aspects, the method can include creating one or more additional expressions whose executed results are not used by other code that is part of the executable code served to the client device; and providing to the client device the plurality of expressions with the one or more additional expressions. Also, the method can include identifying, in the piece of data, data that needs to be kept away from malware that may be in the client device, and wherein creating a plurality of expressions comprises creating one or more replacement statements that when executed, provide a result that corresponds to the potentially sensitive data. The replacement statements can comprise one or more expressions that do not execute on the client device when the executable code is executed.
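  • The additional expressions whose results are not used can be pictured with the sketch below. It is an assumption-laden illustration: the function name `addDecoys` and the `_d` prefix for decoy variables are hypothetical, and the sketch assumes that the prefix does not collide with names in the real code.

```javascript
// Hypothetical helper: inserts `count` decoy statements at random positions.
// Decoys assign to variables that no other statement reads, so the result of
// the real statements is unchanged, and their relative order is preserved.
function addDecoys(statements, count) {
  const out = statements.slice();
  for (let i = 0; i < count; i++) {
    const x = Math.floor(Math.random() * 1000);
    const y = Math.floor(Math.random() * 1000);
    const position = Math.floor(Math.random() * (out.length + 1));
    out.splice(position, 0, `var _d${i} = ${x} + ${y};`);
  }
  return out;
}

const real = ["var a = 10;", "var b = 8;", "var y = a - b;"];
const padded = addDecoys(real, 3);
console.log(padded.join("\n"));
console.log(new Function(padded.join("\n") + "\nreturn y;")()); // 2
```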
  • In certain aspects, the method can further include identifying, in the piece of data, a first expression and a second expression to be replaced, wherein creating a plurality of expressions comprises creating a first set of replacement expressions corresponding to the first expression and a second set of replacement expressions corresponding to the second expression; and interleaving the replacement expressions of the first set of replacement expressions with the replacement expressions of the second set of replacement expressions, wherein the plurality of expressions provided to the client device comprise the interleaved replacement expressions.
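  • Interleaving two sets of replacement expressions can be sketched as a random merge that keeps each set's internal order, so that no statement runs before the statements it depends on. The function name `interleave` and the sample statement sets are hypothetical.

```javascript
// Randomly merge two statement lists while preserving the relative order
// inside each list.
function interleave(first, second) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < first.length || j < second.length) {
    const takeFirst =
      j >= second.length || (i < first.length && Math.random() < 0.5);
    out.push(takeFirst ? first[i++] : second[j++]);
  }
  return out;
}

// First set stands in for "var x = 7", second set for "var y = 2".
const firstSet = ["var a1 = 12;", "var b1 = 5;", "var x = a1 - b1;"];
const secondSet = ["var a2 = 10;", "var b2 = 8;", "var y = a2 - b2;"];

const merged = interleave(firstSet, secondSet);
console.log(merged.join("\n"));
console.log(new Function(merged.join("\n") + "\nreturn [x, y];")()); // [ 7, 2 ]
```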
  • In other aspects, creating a plurality of expressions comprises creating a first set of replacement expressions; identifying a first replacement expression in the first set of replacement expressions; creating a second set of replacement expressions that, when executed, provide a result that corresponds to the first replacement expression; and replacing the first replacement expression with the second set of replacement expressions. The piece of data to be served comprises formats of code in HTML, CSS, and JavaScript, and wherein each of the formats interoperates with the other formats.
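  • The nested replacement described above, in which a replacement expression is itself replaced by a second set of replacement expressions, can be sketched recursively. In this sketch every literal assignment is expanded at each level rather than a single selected one, which is a simplification; the names `splitAssignment`, `expand`, and `render` and the `_p`/`_q` temporaries are hypothetical.

```javascript
let counter = 0; // used only to keep temporary names unique

// Replace "name = value" with two literal assignments and one combining
// expression that together produce the same value.
function splitAssignment(name, value) {
  const id = counter++;
  const offset = 1 + Math.floor(Math.random() * 100);
  return [
    { name: `_p${id}`, value: value + offset },
    { name: `_q${id}`, value: offset },
    { name, expr: `_p${id} - _q${id}` },
  ];
}

// Expand literal assignments `depth` times; combining expressions are kept.
function expand(statements, depth) {
  if (depth === 0) return statements;
  return statements.flatMap((s) =>
    "value" in s ? expand(splitAssignment(s.name, s.value), depth - 1) : [s]
  );
}

function render(statements) {
  return statements
    .map((s) => `var ${s.name} = ${"value" in s ? s.value : s.expr};`)
    .join("\n");
}

const code = render(expand([{ name: "y", value: 2 }], 2));
console.log(code);
console.log(new Function(code + "\nreturn y;")()); // 2
```

With a depth of 2, the single assignment becomes seven statements: the original is split into three, and each of the two resulting literals is split again.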
  • In another implementation, a computer-implemented method is disclosed that comprises receiving, from a server system, web content comprising original code, wherein the web content is requested by a client device that is remote from the server system; identifying a piece of data in the code; creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; generating modified code comprising the original code with the piece of data replaced with the plurality of expressions; and providing the modified code to the client device, wherein, when executed, the modified code provides a result that corresponds to the original code. In some aspects, generating modified code comprises interleaving the plurality of expressions into the original code with the identified piece of data removed. Also, in some aspects, the plurality of expressions is created in a first ordering, and the plurality of expressions is interleaved into the original code so that the plurality of expressions maintains the first ordering. In other aspects, the plurality of expressions are created in a first ordering, and the plurality of expressions are interleaved into the original code so that the plurality of expressions are in a second ordering that is different than the first ordering. In yet other aspects, the plurality of expressions includes one or more junk expressions that do not execute. In some aspects, the method further comprises selecting a first expression among the plurality of expressions; and creating a second plurality of expressions that, when executed, provide a result that corresponds to the selected first expression, wherein the generated modified code comprises the original code with the piece of data replaced with the plurality of expressions, with the selected first expression replaced with the second plurality of expressions.
  • In another implementation, a computer system for recoding web content served to client computers is disclosed that comprises an interface for receiving information from a web server system configured to provide computer code in multiple different formats in response to requests from client computing devices; and a security intermediary that is arranged to (i) receive the computer code from the interface before the computer code is provided to the client computing devices; (ii) identify a piece of data in the computer code that is to be replaced; (iii) create a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and (iv) provide the plurality of expressions to the client computing devices with code for executing the plurality of expressions. In some aspects, the piece of data in the computer code that is to be replaced is identified as potentially sensitive data. In other aspects, the security intermediary is further arranged to perform a permutation of the plurality of expressions. In yet other aspects, the plurality of expressions comprise one or more expressions that do not execute. In yet another aspect, the security intermediary is further arranged to interleave the plurality of expressions with the code for executing the plurality of expressions.
  • Other features and advantages will be apparent from the description and drawings, and from the claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a conceptual diagram showing transformation of a value into multiple expressions that execute back to the value.
  • FIG. 1A depicts a general overview of a system for requesting, modifying, and serving web content.
  • FIG. 1B depicts a schematic diagram of an encoding system that modifies requested web content.
  • FIG. 2 depicts an overview of a method for modifying program code.
  • FIGS. 3A-3G depict various examples for modifying code for web content.
  • FIG. 4 is a flow diagram of a process for serving modified, or encoded, web content.
  • FIG. 5 shows a system for serving polymorphic code.
  • FIG. 6 is a schematic diagram of a general computing system.
  • Like reference numbers and designations in the various drawings indicate like elements.
  • DETAILED DESCRIPTION
  • FIG. 1 is a conceptual diagram showing transformation of a value into multiple expressions that execute back to the value. In general, the diagram attempts to show at a high level how initial representations in code can be rewritten as multiple representations that together can be executed on a client device to return the initial representation. The multiple representations, however, can be difficult for automated malware to analyze because they cannot easily be matched to a template, can be scattered throughout the code in appropriate circumstances, and can be constantly changed, both in their values and in their order and locations in the code.
  • The diagram depicts a process, flowing from left-to-right. The process starts with a value 102, which may take a variety of forms. The value may be a simple string or number in plaintext form. Such value may be found by analysis of web code served by a web server system and provided to an intermediate security system that is tasked with recoding portions of the served code where the recoding will not affect the functionality of the code when it is executed on client devices.
  • At 104, the intermediate security system identifies a relatively complex expression that will resolve to the value. For clarity of explanation, the expression is shown here in the form of a pseudo-equation. In the equation, operations are shown as a box surrounding a dot, to represent that any appropriate operation may be used. Parentheses are used to indicate grouping of operations, and the ability to have the relative groups combined with each other out of the order they are shown in the equation. Thus, at 106, the three main groups are each converted into code snippets to represent the relevant sub-expressions, and then the order in which those sub-expressions are evaluated is changed—where the second grouping from the formula is evaluated first in the code, then the first, and then the third. Additional code may be generated to evaluate the results of the three groupings together with each other.
  • The code generated at 106 may then be inserted into the code received from the web server system and may be served to a client device that requested the code. At 108, that code is executed at the client device, such as using a web browser, and such execution generates the initial value 108 or a value that is equivalent to the initial value. In subsequent servings of the code, the value “T” may be resolved into very different lines of code and expressions.
  • In this manner, then, the process shown here is able to replace original code with different code that serves as a proxy for the original code, and that reaches the same result as the original code when it is executed by the standard environment (e.g., standard JavaScript run-time) on a client device.
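  • The flow of FIG. 1 can be approximated by the sketch below. The figure leaves the operations unspecified, so this sketch assumes addition and subtraction on an integer value; the names `decompose`, g1, g2, g3, and T are hypothetical. The three groups are emitted in the order second, first, third, followed by a line that combines them.

```javascript
// Hypothetical decomposition: T = g1 + g2 + g3, where each group is itself a
// small sub-expression. Because the groups do not depend on one another, they
// can be emitted in any order before the combining line.
function decompose(value) {
  const r = () => Math.floor(Math.random() * 100);
  const g1 = r();
  const g2 = r();
  const g3 = value - g1 - g2; // chosen so the three groups sum to the value
  const group = (name, total) => {
    const k = r();
    return `var ${name} = (${total + k} - ${k});`;
  };
  const snippets = [group("g1", g1), group("g2", g2), group("g3", g3)];
  // Evaluation order from the description: second group, first, then third.
  const ordered = [snippets[1], snippets[0], snippets[2]];
  ordered.push("var T = g1 + g2 + g3;");
  return ordered.join("\n");
}

const generated = decompose(42);
console.log(generated);
console.log(new Function(generated + "\nreturn T;")()); // 42
```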
  • FIG. 1A depicts an overview of a system 100 for encoding web content served from a web server 122 to a polymorphic encoding system 124 (or simply, encoding system) and to a web browser 126. In general, the system 100 represents a high-level depiction of the system in FIG. 1.
  • The polymorphic encoding system 124 receives web content from a web server 122 that is to be served to a web browser 126 at, for example, a client device. Prior to serving the web content to the web browser 126, the polymorphic encoding system 124 identifies and encodes potentially sensitive data. Web content that is handled by the system 100 may include, for example, HTML, CSS, JavaScript, and other program code associated with the content or transmission of web resources such as a web page that may be presented at a client computer (or many different requesting client computers).
  • FIG. 1B depicts various parts of the encoding system 124 of FIG. 1A. In general, these components operate to transform incoming computer code so as to convert values or expressions into multiple additional expressions that resolve to the original values or expressions when they are executed as part of the code.
  • In the figure, a sensitive data identifier 110 parses code to identify sensitive data or potentially sensitive data, including data that can be recoded without affecting the functionality of the code when it is executed. In some instances, data identifier 110 may broadly identify data that is to be replaced, regardless of whether the data is identified as sensitive in nature. In this example, program code P may comprise statements S1, S2, S3, and S4. In this example, the sensitive data identifier 110 identifies statement S1 as potentially sensitive data.
  • Various methods may be used for identifying potentially sensitive data. For example, data associated with a form to be filled out or with particular fields or fieldnames in a form may be identified. Also, an operator of a security system may study the code served by a particular organization and may flag particular fields or other elements that are frequently served by the organization and are of a sensitive nature. The sensitive data identifier 110 may then use a list of fields or other information generated by such an analysis to locate sensitive fields in other pages of web code to be served by the same organization.
  • The sensitive data from the web server may be typically presented in cleartext form. A replacement code generator 112 generates code that replaces such potentially sensitive data. The generated code, when executed, generates the same output as the originally-identified potentially sensitive code. In this example, replacement code generator 112 generates four statements E1, E2, E3, and E4 that, when executed, produce the same output as statement S1. Interleaver 114 takes the replacement code statements E1, E2, E3, and E4, and interleaves the replacement code statements into other programmatic statements that are already part of program P, or statements that have been generated as replacement code for other statements in the code. The interleaving process may be random (though avoiding any placement that would break the code) and may result in a different ordering of statements in response to two different requests. The resulting program with the interleaved statements, when executed, produces the same functional output as program P.
  • The data transferred from the encoding system 124 to the web browser 126 may be, for example, in the form of obfuscated JavaScript code with the sensitive data hidden within the code. Specific example methods for encoding the sensitive data are described below with respect to FIGS. 3A-3G. When the sensitive data passes through the encoding system 124, the encoding system 124 identifies and extracts the sensitive data and then modifies the code. The modified code is then incorporated in the original web content, replacing the sensitive data.
  • FIG. 2 depicts an example of how program code P 202 may be modified, or encoded, into program code P′ 206. In general, the illustrated process involves identifying a number of operations or statements that may be joined together and transformed into code that, when executed under a standard programming environment (e.g., a standard run-time implementation), will produce an original starting value or expression.
  • While the code of program P 202 and the code of program P′ 206 are not identical, the output of each of program P 202 and program P′ 206, when executed (e.g., via a web browser), is the same. Program P 202 represents any appropriate web content, such as HTML, CSS, JavaScript, and other program code. Program P 202 comprises a set of n statements, {S1, S2, S3, . . . , Sn}. A statement Si in the set of statements may be potentially sensitive data or content that is confirmed to be sensitive in nature. In some instances, the set of statements may include statements that are identified as needing to be replaced. Each statement, Si, may be a line of code or expression in the program. In Step 1, each of the statements, Si, is rewritten as a set of statements {Si1, Si2, Si3, Si4 . . . } that, collectively, is executed as the equivalent of the individual statement Si, as described in further detail below with respect to FIGS. 3A-3F. In this manner, after Step 1 is complete for each statement, Si, in Program P 202, a set of equivalent statements E 204 for Program P 202 is generated. That is, the set of equivalent statements E 204 comprises n sets of equivalent statements, one for each statement Si in Program P 202. For example, the statement S1 is replaced with the set of equivalent statements {S11, S12, S13, S14 . . . }. The number of statements in each set of equivalent statements need not be the same. Then, in Step 2, the various statements in the set of equivalent statements E 204 are interleaved, as described below with respect to FIG. 3G.
  • FIGS. 3A-3F show examples of equivalent statement replacement. In general, the figures show manners in which a single line of code for expressing a variable-assigning and/or mathematical relationship can be expressed instead by multiple lines of code that can be executed in a particular order to reach the initial result. Although numeric functions are shown in the examples here, other data may be treated similarly. For example, an alphanumeric string may be transformed through multiple operations, so that the starting point is a string different from what the web server system provided, but the code ends up generating the same string when it is executed by a web browser.
  • Referring to FIG. 3A, a constant number in JavaScript can be replaced by an equivalent JavaScript expression. There exist numerous ways to replace a constant number. For example, the number 1 can be written as (3−2) or as (0+1), and the number 24 can be written as the expression (4*6) or (30−6). To generate essentially random expressions for a number y, the system can first use a random generator to generate a random number, for example, x, and then replace the number y with (x−(x−y)), which appears in the code to be different from y but is the functional equivalent of y when executed. FIG. 3A shows an example of this equivalent statement replacement where the statement “var y=2” is replaced with the set of statements {var a=10; var b=8; y=a−b}. While FIG. 3A shows a sum (or subtraction) operation, generally, any appropriate type of JavaScript operation (e.g., sum, multiply, divide) or any function or property that returns a number (e.g., the length property of a string or an array) may be used. Other types of constants, such as Boolean, string, array, and object, can be replaced with equivalent statements using a similar approach.
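  • As a minimal sketch of the (x−(x−y)) replacement described above, the following JavaScript generates three statements for an integer constant. The helper names encodeNumberConstant and runAndRead are hypothetical, and Math.random() merely stands in for whatever random generator an actual encoder would use.

```javascript
// Hypothetical helper (not from the patent): emit three statements that
// together assign the integer `value` to the variable `name`.
function encodeNumberConstant(name, value) {
  // Math.random() is an assumed stand-in for the encoder's random generator.
  const x = Math.floor(Math.random() * 1000);
  const offset = x - value; // chosen so that x - offset === value
  return [`var a = ${x};`, `var b = ${offset};`, `var ${name} = a - b;`];
}

// Execute generated statements in a fresh function scope and read one variable.
function runAndRead(statements, name) {
  return new Function(`${statements.join("\n")}\nreturn ${name};`)();
}

const replacement = encodeNumberConstant("y", 2);
console.log(replacement.join(" ")); // text differs from run to run
console.log(runAndRead(replacement, "y")); // 2
```

  • The sketch is limited to integers, for which the subtraction is exact; floating-point constants could pick up rounding error under the same transformation.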
  • FIG. 3B shows an example of an equivalent statement replacement of a Boolean operation, as one such example. Where the initial statement “var y=true” sets the value of variable y as “true,” the set of equivalent statements comprises a set of three statements, which, when executed, equivalently result in the variable y being set as “true.” In this example, the first two statements assign the values 3 and 2 to variables a and b, respectively. In doing so, the value of a is assigned to be greater than the value of b. Therefore, the statement “y=(a>b)” will always result in the value of y being set to “true.” Notably, there exists a near infinite number of ways to create a set of statements (with any number of statements) that result in the value of y being assigned a value of “true.” For instance, variables a and b can be assigned to any values as long as the value of a is larger than the value of b. Alternatively, other Boolean expressions can be used.
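  • The Boolean case can be sketched in the same way. The function name encodeBooleanConstant and the operand ranges are assumptions for illustration; the handling of “false” by swapping the operands is an extension not shown in the figure.

```javascript
// Hypothetical helper: emit statements that assign a Boolean constant to
// `name` by comparing two randomly chosen numbers.
function encodeBooleanConstant(name, value) {
  const small = Math.floor(Math.random() * 100);
  const large = small + 1 + Math.floor(Math.random() * 100); // always > small
  // For true, a receives the larger operand; for false, the operands are swapped.
  const [a, b] = value ? [large, small] : [small, large];
  return [`var a = ${a};`, `var b = ${b};`, `var ${name} = (a > b);`];
}

const statements = encodeBooleanConstant("y", true);
const result = new Function(`${statements.join("\n")}\nreturn y;`)();
console.log(statements.join(" "));
console.log(result); // true
```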
  • FIG. 3C shows an example of an equivalent statement replacement of a string constant. In the example, the statement “var y=‘abc’” is replaced with an equivalent set of statements comprising three statements. Collectively, when the equivalent set of statements is executed, the variable y is set as “abc”. As with the examples discussed with respect to FIGS. 3A and 3B, a nearly infinite number of combinations exist to replace the initial statement. In other implementations, a string or the characters in a string may be assigned numeric values, and the operations performed on the numeric examples in the figures above and below may be applied, and then the resulting numeric value or values may be converted back into alphanumeric characters for a string. For example, the letter “a” may be assigned a value of 256 in a particular font definition, and the techniques discussed here may be used to break the value of 256 up into a plurality of expressions. When those expressions are executed at the client device, the number 256 may be returned, and then may be rendered as a glyph for the character set, as an “a.”
  • Another method of equivalent statement replacement involves adding junk code or junk branches, and may be applied as an alternative or in addition to the other examples discussed here. The purpose of adding the junk code is to add “noise” to the code so that potential hackers or attackers cannot use the position of expressions in the code (e.g., line number or the nth statement) to locate a key function or variable. Junk code may be one or more statements that execute but have no effect on the execution of the rest of the program or the operation of the program. For instance, two simple assignment statements “j=4” and “k=3+j” may be a part of the set of statements that the replacement code generator creates. However, variables j and k may not be present anywhere else in the program code, so that while the new code causes j to be assigned the value of 4 and k to be assigned the value of 7, variables j and k are not used anywhere else in the code and do not otherwise affect the operation or execution of the program code.
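  • One way to picture the numeric treatment of strings is the sketch below, which assumes UTF-16 character codes (rather than the font-specific values mentioned above) and reuses the (x−(x−y)) form for each code. The name encodeStringConstant and the two-statement layout are illustrative assumptions and are not taken from FIG. 3C.

```javascript
// Hypothetical helper: hide a string constant behind numeric expressions
// that evaluate to its UTF-16 code units.
function encodeStringConstant(name, text) {
  const parts = text.split("").map((ch) => {
    const code = ch.charCodeAt(0);
    const x = code + Math.floor(Math.random() * 500); // random operand >= code
    return `(${x} - ${x - code})`; // evaluates back to `code`
  });
  return [
    `var codes = [${parts.join(", ")}];`,
    `var ${name} = String.fromCharCode.apply(null, codes);`,
  ];
}

const encoded = encodeStringConstant("y", "abc");
console.log(encoded.join("\n"));
console.log(new Function(`${encoded.join("\n")}\nreturn y;`)()); // abc
```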
  • Alternatively, junk branches can be generated to add a level of obfuscation to the code. FIG. 3D shows an example of adding a junk branch to the statement “var y=2”. Because the conditional statement “if (3<2)” is always false, junk branch “y=1” will never execute (e.g., the assignment of the value of 1 to y will never occur). The junk code or junk branches make it difficult for an attacker program to discern sensitive data from junk code or junk branches without more carefully parsing or analyzing the web content.
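  • Junk statements and a junk branch of the kind described above can be sketched as follows. The names junkStatements, withJunkBranch, junkA, and junkB are hypothetical, and the always-false condition is built by comparing a larger literal as “less than” a smaller one, in the spirit of “if (3<2)”.

```javascript
function randomInt(max) {
  return Math.floor(Math.random() * max);
}

// Junk statements: they execute, but nothing else reads junkA or junkB.
function junkStatements() {
  return [`var junkA = ${randomInt(10)};`, `var junkB = 3 + junkA;`];
}

// Junk branch: `high < low` is never true, so the decoy assignment never runs.
function withJunkBranch(name, value) {
  const low = randomInt(10);
  const high = low + 1 + randomInt(10);
  const decoy = value + 1 + randomInt(10);
  return [
    `var ${name} = ${value};`,
    `if (${high} < ${low}) { ${name} = ${decoy}; }`,
  ];
}

const program = [...junkStatements(), ...withJunkBranch("y", 2)];
console.log(program.join("\n"));
console.log(new Function(`${program.join("\n")}\nreturn y;`)()); // 2
```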
  • FIG. 3E shows an example of permutations of statements for a scenario in which equivalent statements do not require a strict order. For example, the statement “var y=[1, 2, 3]” creates an array with three elements. The collectively equivalent four statements “var y=[ ]”, “y[1]=2”, “y[2]=3”, “y[0]=1” create the same array y regardless of the order in which the latter three statements are executed. Where the order of the code does not matter, a larger number of potential random ways exist to rewrite the code. Specifically, for N such statements for which order does not matter, there are N! possible permutations. In the example shown in FIG. 3E, because there are three statements where the order is irrelevant, there are six possible ways to generate equivalent code in this manner. Permutation of statements may be used on an array, list, string, or other collection data structure.
  • The encoding system may further employ recursive encoding. FIG. 3F illustrates an example of random recursive encoding where a constant is replaced with a random expression and a junk branch is then added to one replacement statement. Specifically, first, a single statement “var y=2” is replaced with three separate statements. Then, one of the three statements is replaced with other statements that add a junk branch, similar to the example shown in FIG. 3D. Additional recursive encoding may be employed. After applying random basic polymorphic encoding approaches a random number of times, a simple assignment statement, such as “var y=2”, can be transformed into hundreds of lines of code.
  • FIGS. 3A-3F illustrate various methods of producing equivalent statements 204. After performing such methods, an encoded program P′ 206 can be generated by interleaving the equivalent statements E 204 with the rest of the code, as shown, for example, in FIG. 3G (and potentially identifying which of the statements can be included in either order relative to other of the statements). In FIG. 3G, two separate statements 372 and 374 (“var x=2” and “var y=3”) are part of web content, for example, program code P 202. In a first step, a set of replacement statements 376 is generated for statement 372 and a set of replacement statements 378 is generated for statement 374. Then, in a second step, the individual expressions of the set of replacement statements 376 are interleaved with the individual expressions of the set of replacement statements 378 to form an encoded form of the original web content 380. FIG. 3G shows an example where the order of the statements within each of the sets of replacement statements 376, 378 matters (i.e., affects the outcome of the executed code), but the order of the statements between the two sets of replacement statements 376, 378 does not matter.
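  • The permutation approach can be sketched by emitting one element assignment per array entry and shuffling those assignments. The helper names shuffled and encodeArrayConstant are assumptions, and the Fisher-Yates shuffle is simply one convenient way to pick among the N! orderings.

```javascript
// Fisher-Yates shuffle on a copy of the input.
function shuffled(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Hypothetical helper: the declaration stays first; the element assignments
// are independent of one another, so any ordering yields the same array.
function encodeArrayConstant(name, values) {
  const assignments = values.map(
    (v, i) => `${name}[${i}] = ${JSON.stringify(v)};`
  );
  return [`var ${name} = [];`, ...shuffled(assignments)];
}

const arrayCode = encodeArrayConstant("y", [1, 2, 3]);
console.log(arrayCode.join(" "));
console.log(new Function(`${arrayCode.join("\n")}\nreturn y;`)()); // [ 1, 2, 3 ]
```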
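  • To illustrate how repeated application grows the code, the sketch below applies only the constant-replacement step recursively; it omits the junk-branch layer of FIG. 3F for brevity. The name encodeRecursively, the temporary names t0, t1, and so on, and the depth parameter are assumptions.

```javascript
// Hypothetical helper: each level replaces "var target = v" with two operand
// assignments (themselves expanded one level less) plus a subtraction.
function encodeRecursively(name, value, depth) {
  let counter = 0;
  const fresh = () => `t${counter++}`; // unique temporary names

  function expand(target, v, d) {
    if (d === 0) return [`var ${target} = ${v};`];
    const x = Math.floor(Math.random() * 1000);
    const a = fresh();
    const b = fresh();
    return [
      ...expand(a, x, d - 1),
      ...expand(b, x - v, d - 1),
      `var ${target} = ${a} - ${b};`,
    ];
  }
  return expand(name, value, depth);
}

const layered = encodeRecursively("y", 2, 3);
console.log(layered.length); // 15
console.log(new Function(`${layered.join("\n")}\nreturn y;`)()); // 2
```

  • In this sketch a depth of d yields 2^(d+1) − 1 statements, so a depth of 7 already turns one assignment into 255 lines.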
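  • The interleaving step can be sketched as a random merge that preserves the relative order inside each set. The function name interleave is hypothetical, and the contents of the second set (variables c and d) are assumed for illustration, since only the first set's statements are spelled out in the text.

```javascript
// Randomly merge two statement lists while keeping each list's internal order.
function interleave(first, second) {
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < first.length || j < second.length) {
    const takeFirst =
      j >= second.length || (i < first.length && Math.random() < 0.5);
    merged.push(takeFirst ? first[i++] : second[j++]);
  }
  return merged;
}

// Replacement set for "var x=2" and an assumed replacement set for "var y=3".
const set376 = ["var a = 10;", "var b = 8;", "var x = a - b;"];
const set378 = ["var c = 7;", "var d = 4;", "var y = c - d;"];

const encodedProgram = interleave(set376, set378);
console.log(encodedProgram.join("\n"));
console.log(new Function(`${encodedProgram.join("\n")}\nreturn [x, y];`)()); // [ 2, 3 ]
```

  • The two sets must use disjoint variable names for this merge to be safe; a real encoder would also need to confirm that no statement in one set depends on a statement in the other.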
  • FIG. 4 is a flow diagram of a process for serving modified program code. In general, the process involves identifying items in content to be served to a client computer that may potentially include sensitive data, transforming the data dynamically and randomly into a set of other data, and incorporating the set of other data into the content in a manner so as to hide the potentially sensitive data.
  • The process begins at box 402, where a request for web content is received, such as from a client computer operated by an individual seeking to perform a banking transaction at a website for the individual's bank. The request may be in the form of an HTTP request and may be received by a load balancer operated by, or for, the bank. The load balancer may recognize the form of the request and understand that it is to be handled by a security system that the bank has installed to operate along with its web server system. The load balancer may thus provide the request to the security system, which may forward it to the web server system after analyzing the request (e.g., to open a tracking session based on the request), or may provide the request to the web server system and also provide information about the request to the security system in parallel.
  • At box 404, a response to the request is generated by the web server system. For example, the user may have requested to perform a funds transfer between accounts at the bank, where the funds are owned by the individual, and the response by the web server system may include HTML for a webpage on which the user can specify parameters for the transaction, along with JavaScript code and CSS code for carrying out such transactions at a web browser operated by the individual.
  • At box 406, the web server system sends the response to the request to an encoding system. The response may comprise the web content requested by the client computer. Included in the response may be potentially sensitive data, such as, for example, account numbers, routing numbers, or other data relating to a banking transaction. At box 408, the encoding system receives the web content from the web server system and identifies potentially sensitive data in the web content.
  • At box 410, the encoding system generates code to replace the sensitive data. The sensitive data may be written as a set of replacement statements, which, when executed, produce a display identical to that of the sensitive data, resulting in no difference in appearance to a user requesting the web content. Various methods for rewriting or replacing the sensitive data are possible, including the methods described above with respect to FIGS. 3A-3F. The replacement of sensitive data may include replacing a single statement or expression in the web content, or it may include replacing numerous statements. At a minimum, however, a single statement, or expression, is replaced with a set of equivalent statements. The set of equivalent statements may comprise one or more statements, which, when collectively executed, output the same result as the initial statement comprising sensitive data.
  • In some instances, the encoding system may identify a single statement assigning a constant value to contain sensitive data. In response, the encoding system may randomly generate a set of equivalent statements, which, collectively, make the same assignment, as illustrated, for example, in FIG. 3A. In other instances, the encoding system may identify a single statement containing sensitive data. In response, the encoding system may add one or more lines of junk code or junk branches. The purpose of the junk code is to add a layer of randomness to the code to prevent potential hackers from using the position (e.g., line number) of code to identify potentially sensitive data. When executed, the junk code has no visible effect on the displayed web content. Similarly, the encoding system may generate junk branches that appear to supplement the original statement or expression in the code. In some instances, the junk branches may comprise conditional statements or expressions that will never execute. An example of generating a junk branch is discussed above with respect to FIG. 3D. In that example, the conditional statement “if (3<2)” will never be true, so the assignment statement “y=1” will not occur and instead, the value 2 will always be assigned to y. Adding junk branches adds an additional level of obfuscation to the code that hampers a potential attacker's ability to target sensitive data.
  • In some instances, the encoding system may employ recursive coding, generating multiple “layers” of replacement code. An example of recursive coding is shown, for example, in FIG. 3F. In a first step, an assignment statement is replaced with three separate statements. In a second step, a junk branch is added to one of the three replacement statements. While the example shown in FIG. 3F shows only two “layers” of replacement code, any number of “layers” of replacement code may be generated.
  • After the encoding system generates replacement code, the method moves to box 412 where the various replacement statements are interleaved in the code of the web content. An example of the interleaving process is described above with respect to FIG. 3G. In that example, the encoder system first generates two sets of replacement code 376 and 378 in response to two separate input statements, “var x=2” and “var y=3”, respectively. Then the individual statements of the two sets of replacement code 376 and 378 are interleaved. Each of the two sets of replacement code 376 and 378 has a particular ordering of the statements. In some instances, the ordering of the individual statements affects the execution of the code, while in other instances, the ordering does not affect the outcome. For example, in the set 376, statements “var a=10” and “var b=8” can be executed in any order with respect to one another but must be executed before the statement “x=a−b” is executed. Where the order of the statements is irrelevant, more variations of interleaved code are possible.
  • In some instances, the encoding system randomly and dynamically generates code to replace the sensitive data. That is, given the same input code (i.e., web content), the encoding system does not necessarily generate the same replacement code in response to two different requests for the web content. Furthermore, the approach for generating the replacement code may be different in response to two requests for the same web content. For example, in response to one request, the encoder system may replace a first statement with a set of three replacement statements that collectively produce the same result as the first statement, such as the example shown in FIG. 3A. In response to a second request for the same web content, the encoder system may replace the same first statement with a set of three different replacement statements that include a junk branch, such as the example shown in FIG. 3D. In both examples, the first statement to be replaced (presumably containing potentially sensitive data) is “var y=2”, but the encoder system generates two different sets of replacement statements. In another example, the encoder may generate the same sets of replacement statements in response to both requests (e.g., for a first statement “var y=2”), such as the replacement statements shown in FIG. 3A, but it may then interleave the code in a different order, so that the web code produced in response to the two requests differs.
  • The process then serves the recoded web content at box 414, in familiar manners. Such a process may be performed repeatedly each time a client computer requests content, with the recoded content being different each time the content is served through the encoding system, including when identical or nearly identical content is requested in separate transactions by two different users or by the same user.
  • In addition, the code that is served by the encoding system may be supplemented with instrumentation code that runs in the browser on the client computer and monitors interaction with the web page. For example, the instrumentation code may look for particular method calls or other calls to be made, such as when the calls or actions relate to a field in a form that is deemed to be subject to malicious activity, such as a client ID number field, a transaction account number field, or a transaction amount field. When the instrumentation code observes such activity on the client device, it will report that activity along with metadata that helps to characterize the activity. The process receives such reports from the instrumentation code and processes them, such as by forwarding them to a central security system that may analyze them to determine whether such activity is benign or malicious.
  • FIG. 5 shows a system 500 for serving polymorphic and instrumented code. Generally, polymorphic code is code that is changed in different manners for different servings of the code, in manners that do not affect the way in which the executed code is perceived by users. The goal is to create a moving target for malware that tries to determine how the code operates, but without changing the user experience. Instrumented code is code that is served, e.g., to a browser, with the main functional code and monitors how the functional code operates on a client device, and how other code may interact with the functional code and other activities on the client device. In certain implementations, the system 500 may identify values or expressions in the code that can be replaced with multiple other expressions that, when executed on a client device, resolve to the initial value or expressions.
  • The system 500 may be adapted to perform deflection and detection of malicious activity with respect to a web server system. Deflection may occur, for example, by the serving of polymorphic code, which interferes with the ability of malware to interact effectively with the code that is served. Detection may occur, for example, by adding instrumentation code (including injected code for a security service provider) that monitors activity of client devices that are served web code.
  • The system 500 in this example is a system that is operated by or for a large number of different businesses that serve web pages and other content over the internet, such as banks and retailers that have on-line presences (e.g., on-line stores, or on-line account management tools). The main server systems operated by those organizations or their agents are designated as web servers 504 a-504 n, and could include a broad array of web servers, content servers, database servers, financial servers, load balancers, and other necessary components (either as physical or virtual servers).
  • In this example, security server systems 502 a to 502 n may cause code from the web server system to be supplemented and altered. In one example of the supplementation, code may be provided, either by the web server system itself as part of the originally-served code, or by another mechanism after the code is initially served, such as by the security server systems 502 a to 502 n, where the supplementing code causes client devices to which the code is served to transmit data that characterizes the client devices and the use of the client devices. As also described below, other actions may be taken by the supplementing code, such as the code reporting actual malware activity or other anomalous activity at the client devices that can then be analyzed to determine whether the activity is malware activity.
  • The set of security server systems 502 a to 502 n is shown connected between the web servers 504 a to 504 n and a network 510 such as the internet. Although both extend to n in number, the actual number of sub-systems could vary. For example, certain of the customers could install two separate security server systems to serve all of their web server systems (which could be one or more), such as for redundancy purposes. The particular security server systems 502 a-502 n may be matched to particular ones of the web server systems 504 a-504 n, or they may be at separate sites, and all of the web servers for various different customers may be provided with services by a single common set of security servers 502 a-502 n (e.g., when all of the server systems are at a single co-location facility so that bandwidth issues are minimized).
  • Each of the security server systems 502 a-502 n may be arranged and programmed to carry out operations like those discussed above and below and other operations. For example, a policy engine 520 in each such security server system may evaluate HTTP requests from client computers (e.g., desktop, laptop, tablet, and smartphone computers) based on header and network information, and can set and store session information related to a relevant policy. The policy engine may be programmed to classify requests and correlate them to particular actions to be taken to code returned by the web server systems before such code is served back to a client computer. When such code returns, the policy information may be provided to a decode, analysis, and re-encode module, which matches the content to be delivered, across multiple content types (e.g., HTML, JavaScript, and CSS), to actions to be taken on the content (e.g., using XPATH within a DOM), such as substitutions, addition of content, and other actions that may be provided as extensions to the system. For example, the different types of content may be analyzed to determine naming that may extend across such different pieces of content (e.g., the name of a function or parameter), and such names may be changed in a way that differs each time the content is served, e.g., by replacing a named item with randomly-generated characters. Elements within the different types of content may also first be grouped as having a common effect on the operation of the code (e.g., if one element makes a call to another), and then may be re-encoded together in a common manner so that their interoperation with each other will be consistent even after the re-encoding.
  • Both the analysis of content for determining which transformations to apply to the content, and the transformation of the content itself, may occur at the same time (after receiving a request for the content) or at different times. For example, the analysis may be triggered, not by a request for the content, but by a separate determination that the content newly exists or has been changed. Such a determination may be via a “push” from the web server system reporting that it has implemented new or updated content. The determination may also be a “pull” from the security servers 502 a-502 n, such as by the security servers 502 a-502 n implementing a web crawler (not shown) to recursively search for new and changed content and to report such occurrences to the security servers 502 a-502 n, and perhaps return the content itself and perhaps perform some processing on the content (e.g., indexing it or otherwise identifying common terms throughout the content, creating DOMs for it, etc.). The analysis to identify portions of the content that should be subjected to polymorphic modifications each time the content is served may then be performed according to the manner discussed above and below.
  • A rules engine 522 may store analytical rules for performing such analysis and for re-encoding of the content. The rules engine 522 may be populated with rules developed through operator observation of particular content types, such as by operators of a system studying typical web pages that call JavaScript content and recognizing that a particular method is frequently used in a particular manner. Such observation may result in the rules engine 522 being programmed to identify the method and calls to the method so that they can all be grouped and re-encoded in a consistent and coordinated manner.
  • The decode, analysis, and re-encode module 524 encodes content being passed to client computers from a web server according to relevant policies and rules. The module 524 also reverse encodes requests from the client computers to the relevant web server or servers. For example, a web page may be served with a particular parameter, and may refer to JavaScript that references that same parameter. The decode, analysis, and re-encode module 524 may replace the name of that parameter, in each of the different types of content, with a randomly generated name, and each time the web page is served (or at least in varying sessions), the generated name may be different. When the name of the parameter is passed back to the web server, it may be re-encoded back to its original name so that this portion of the security process may occur seamlessly for the web server.
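  • The rename-and-restore behavior described above can be sketched with a per-session lookup table. The names createSessionCodec and randomAlias, the parameter name acctNumber, and the identifier-matching regular expression are all assumptions; Math.random() is not a secure source and only stands in for a real generator.

```javascript
// Generate an alias such as "_kqzvbnwa"; the leading underscore is an
// arbitrary choice that keeps aliases distinct from ordinary page names.
function randomAlias(length = 8) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let alias = "_";
  for (let i = 0; i < length; i++) {
    alias += letters[Math.floor(Math.random() * letters.length)];
  }
  return alias;
}

// Hypothetical per-session codec: maps each protected name to a random alias
// and back, replacing whole identifiers only.
function createSessionCodec(names) {
  const forward = new Map();
  const reverse = new Map();
  for (const name of names) {
    let alias;
    do {
      alias = randomAlias();
    } while (reverse.has(alias));
    forward.set(name, alias);
    reverse.set(alias, name);
  }
  const swap = (text, table) =>
    text.replace(/[A-Za-z_$][\w$]*/g, (id) => (table.has(id) ? table.get(id) : id));
  return {
    encode: (content) => swap(content, forward), // applied to served content
    decode: (content) => swap(content, reverse), // applied to client requests
  };
}

const codec = createSessionCodec(["acctNumber"]);
const servedHtml = codec.encode('<input name="acctNumber">');
const servedScript = codec.encode("var field = document.forms[0].acctNumber;");
console.log(servedHtml);
console.log(servedScript);

// The browser posts the alias; the codec restores the original name.
const postedBody = codec.encode("acctNumber=12345");
console.log(codec.decode(postedBody)); // acctNumber=12345
```

  • The same alias is applied to the HTML and to the script, which keeps the two pieces of content consistent with each other within one session.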
  • A key for the function that encodes and decodes such strings can be maintained by the security server system 502 along with an identifier for the particular client computer so that the system 502 may know which key or function to apply, and may otherwise maintain a state for the client computer and its session. A stateless approach may also be employed, whereby the system 502 encrypts the state and stores it in a cookie that is saved at the relevant client computer. The client computer may then pass that cookie data back when it passes the information that needs to be decoded back to its original status. With the cookie data, the system 502 may use a private key to decrypt the state information and use that state information in real-time to decode the information from the client computer. Such a stateless implementation may create benefits such as less management overhead for the server system 502 (e.g., for tracking state, for storing state, and for performing clean-up of stored state information as sessions time out or otherwise end) and as a result, higher overall throughput.
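  • A stateless variant might look like the following Node.js sketch, which seals the session state into a cookie value and opens it again when the cookie returns. As a plainly stated substitution, it uses a symmetric AES-256-GCM key held only by the server rather than any particular private-key scheme; the names sealState and openState and the shape of the state object are assumptions.

```javascript
const crypto = require("crypto");

// Server-side secret; in practice this would be loaded from secure storage.
const serverKey = crypto.randomBytes(32);

// Encrypt the state and pack iv + auth tag + ciphertext into a hex string.
function sealState(state) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", serverKey, iv);
  const body = Buffer.concat([
    cipher.update(JSON.stringify(state), "utf8"),
    cipher.final(),
  ]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("hex");
}

// Reverse of sealState; throws if the cookie was altered.
function openState(cookieValue) {
  const raw = Buffer.from(cookieValue, "hex");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const body = raw.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", serverKey, iv);
  decipher.setAuthTag(tag);
  const text = Buffer.concat([decipher.update(body), decipher.final()]);
  return JSON.parse(text.toString("utf8"));
}

// Hypothetical state: the alias table for one session.
const cookie = sealState({ aliases: { acctNumber: "_kqzvbnwa" } });
console.log(openState(cookie).aliases.acctNumber); // _kqzvbnwa
```

  • Because the cookie carries the state, the server needs to keep only the key, which is the reduction in management overhead that the stateless approach is meant to provide.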
  • The decode, analysis, and re-encode module 524 and the security server system 502 may be configured to modify web code differently each time it is served in a manner that is generally imperceptible to a user who interacts with such web code. For example, multiple different client computers may request a common web resource such as a web page or web application that a web server provides in response to the multiple requests in substantially the same manner. Thus, a common web page may be requested from a web server, and the web server may respond by serving the same or substantially identical HTML, CSS, JavaScript, images, and other web code or files to each of the clients in satisfaction of the requests. In some instances, particular portions of requested web resources may be common among multiple requests, while other portions may be client or session specific. The decode, analysis, and re-encode module 524 may be adapted to apply different modifications to each instance of a common web resource, or common portion of a web resource, such that the web code that is ultimately delivered to the client computers in response to each request for the common web resource includes different modifications.
  • In certain implementations, the analysis can happen a single time for a plurality of servings of the code in different recoded instances. For example, the analysis may identify a particular function name and all of the locations it occurs throughout the relevant code, and may create a map to each such occurrence in the code. Subsequently, when the web content is called to be served, the map can be consulted and random strings may be inserted in a coordinated manner across the code. Generating a new name for the function each time and substituting that name into the code requires much less computing cost than a full re-analysis of the content would. Also, when a page is to be served, it can be analyzed to determine which portions, if any, have changed since the last analysis, and subsequent analysis may be performed only on the portions of the code that have changed.
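  • The analyze-once, serve-many idea can be sketched by recording the offsets of a function name during analysis and splicing in a new alias at serve time. The names analyzeOnce and serveInstance are hypothetical, and the sketch assumes the function name is a plain identifier without regular-expression metacharacters.

```javascript
// One-time analysis: record where the whole-word name occurs in the source.
function analyzeOnce(source, name) {
  const pattern = new RegExp(`\\b${name}\\b`, "g");
  const offsets = [];
  let match;
  while ((match = pattern.exec(source)) !== null) {
    offsets.push(match.index);
  }
  return { source, name, offsets };
}

// Per-request serving: consult the stored offsets instead of re-scanning.
function serveInstance(analysis, alias) {
  const { source, name, offsets } = analysis;
  let output = "";
  let cursor = 0;
  for (const at of offsets) {
    output += source.slice(cursor, at) + alias;
    cursor = at + name.length;
  }
  return output + source.slice(cursor);
}

const pageScript = "function transfer(a){return a;} transfer(1); transferAll();";
const analysis = analyzeOnce(pageScript, "transfer");
console.log(analysis.offsets); // [ 9, 32 ]
console.log(serveInstance(analysis, "q1")); // function q1(a){return a;} q1(1); transferAll();
console.log(serveInstance(analysis, "zz9"));
```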
  • Even where different modifications are applied in responding to multiple requests for a common web resource, the security server system 502 can apply the modifications in a manner that does not substantially affect a way that the user interacts with the resource, regardless of the different transformations applied. For example, when two different client computers request a common web page, the security server system 502 applies different modifications to the web code corresponding to the web page in response to each request for the web page, but the modifications do not substantially affect a presentation of the web page between the two different client computers. The modifications can therefore be made largely transparent to users interacting with a common web resource so that the modifications do not cause a substantial difference in the way the resource is displayed or the way the user interacts with the resource on different client devices or in different sessions in which the resource is requested.
  • An instrumentation module 526 is programmed to add instrumentation code to the content that is served from a web server. The instrumentation code is code that is programmed to monitor the operation of other code that is served. For example, the instrumentation code may be programmed to identify when certain methods are called, where those methods have been identified as likely to be called by malicious software. When the instrumentation code observes such actions, it may be programmed to send a communication to the security server reporting on the type of action that occurred and other metadata that is helpful in characterizing the activity. Such information can be used to help determine whether the action was malicious or benign.
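  • A minimal sketch of method-call monitoring is shown below, assuming a plain object in place of a browser API and a callback in place of the network report to the security server. The names instrument, report, and paymentForm are hypothetical.

```javascript
// Wrap target[methodName] so every call is reported before it runs.
// Returns a function that removes the wrapper again.
function instrument(target, methodName, report) {
  const original = target[methodName];
  target[methodName] = function (...args) {
    report({ method: methodName, argCount: args.length, time: Date.now() });
    return original.apply(this, args);
  };
  return () => {
    target[methodName] = original;
  };
}

// Stand-in for a page object whose method malware might call.
const paymentForm = {
  submit(amount) {
    return `sent ${amount}`;
  },
};

const reports = [];
const restore = instrument(paymentForm, "submit", (r) => reports.push(r));

console.log(paymentForm.submit(50)); // sent 50
console.log(reports.length); // 1
restore();
```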
  • The instrumentation code may also analyze the DOM on a client computer in predetermined manners that are likely to identify the presence of and operation of malicious software, and to report to the security servers 502 or a related system. For example, the instrumentation code may be programmed to characterize a portion of the DOM when a user takes a particular action, such as clicking on a particular on-page button, so as to identify a change in the DOM before and after the click (where the click is expected to cause a particular change to the DOM if there is benign code operating with respect to the click, as opposed to malicious code operating with respect to the click). Data that characterizes the DOM may also be hashed, either at the client computer or the server system 502, to produce a representation of the DOM (e.g., in the differences between part of the DOM before and after a defined action occurs) that is easy to compare against corresponding representations of DOMs from other client computers. Other techniques may also be used by the instrumentation code to generate a compact representation of the DOM or other structure expected to be affected by malicious code in an identifiable manner.
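  • A hashed before-and-after representation of part of the DOM might look like the following sketch. The nested-dictionary model of a DOM fragment, the choice of SHA-256, and the names `dom_fingerprint` and `change_fingerprint` are assumptions for illustration; the specification does not prescribe a particular hash or serialization.

```python
import hashlib
import json

def dom_fingerprint(node: dict) -> str:
    """Hash a canonical serialization of a DOM fragment modeled as nested dicts."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def change_fingerprint(before: dict, after: dict) -> str:
    """Compact representation of a DOM change: hash of the before/after fingerprints."""
    combined = dom_fingerprint(before) + dom_fingerprint(after)
    return hashlib.sha256(combined.encode("ascii")).hexdigest()

# Hypothetical DOM fragments around a click on an on-page button.
BEFORE = {"tag": "form", "children": [{"tag": "input", "name": "amount"}]}
AFTER_BENIGN = {"tag": "form", "children": [
    {"tag": "input", "name": "amount"},
    {"tag": "div", "class": "confirmation"},
]}
AFTER_TAMPERED = {"tag": "form", "children": [
    {"tag": "input", "name": "amount"},
    {"tag": "input", "name": "payee", "type": "hidden"},  # injected field
]}
```

  • Two clients that see the same benign change produce identical fingerprints, so a server can compare short hashes across client computers instead of full DOM dumps; a client whose DOM was altered differently yields a different value.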
  • As noted, the content from web servers 504 a-504 n, as encoded by decode, analysis, and re-encode module 524, may be rendered on web browsers of various client computers. Uninfected client computers 512 a-512 n represent computers that do not have malicious code programmed to interfere with a particular site a user visits or to otherwise perform malicious activity. Infected client computers 514 a-514 n represent computers that do have malware or malicious code (518 a-518 n, respectively) programmed to interfere with a particular site a user visits or to otherwise perform malicious activity. In certain implementations, the client computers 512 a-512 n, 514 a-514 n may also store the encrypted cookies discussed above and pass such cookies back through the network 510. The client computers 512 a-512 n, 514 a-514 n will, once they obtain the served content, implement DOMs for managing the displayed web pages, and instrumentation code may monitor the respective DOMs as discussed above. Reports of illogical activity (e.g., software on the client device calling a method that does not exist in the downloaded and rendered content) can then be reported back to the server system.
  • The reports from the instrumentation code may be analyzed and processed in various manners in order to determine how to respond to particular abnormal events, and to track down malicious code via analysis of multiple different similar interactions across different client computers 512 a-512 n, 514 a-514 n. For small-scale analysis, each web site operator may be provided with a single security console 507 that provides analytical tools for a single site or group of sites. For example, the console 507 may include software for showing groups of abnormal activities, or reports that indicate the type of code served by the web site that generates the most abnormal activity. For example, a security officer for a bank may determine that defensive actions are needed if most of the reported abnormal activity for its web site relates to content elements corresponding to money transfer operations, an indication that stale malicious code may be trying to access such elements surreptitiously.
  • Console 507 may also be multiple different consoles used by different employees of an operator of the system 500, and may be used for pre-analysis of web content before it is served, as part of determining how best to apply polymorphic transformations to the web code. For example, in combined manual and automatic analysis like that described above, an operator at console 507 may form or apply rules 522 that guide the transformation that is to be performed on the content when it is ultimately served. The rules may be written explicitly by the operator or may be provided by automatic analysis and approved by the operator. Alternatively, or in addition, the operator may perform actions in a graphical user interface (e.g., by selecting particular elements from the code by highlighting them with a pointer, and then selecting an operation from a menu of operations) and rules may be written consistent with those actions.
  • A central security console 508 may connect to a large number of web content providers, and may be run, for example, by an organization that provides the software for operating the security server systems 502 a-502 n. Such console 508 may access complex analytical and data analysis tools, such as tools that identify clustering of abnormal activities across thousands of client computers and sessions, so that an operator of the console 508 can focus on those clusters in order to diagnose them as malicious or benign, and then take steps to thwart any malicious activity.
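  • As a rough illustration of surfacing clusters of abnormal activity for an operator, the sketch below groups reports by an exact (site, method) signature and counts distinct clients. Exact-match grouping is a deliberately simple stand-in for the clustering tools mentioned above, and the report fields `site`, `method`, and `client_id` are hypothetical.

```python
from collections import defaultdict

def cluster_reports(reports, min_clients=2):
    """Group abnormal-activity reports by signature; rank by number of distinct clients."""
    clients_by_signature = defaultdict(set)
    for report in reports:
        signature = (report["site"], report["method"])
        clients_by_signature[signature].add(report["client_id"])
    clusters = {
        signature: len(clients)
        for signature, clients in clients_by_signature.items()
        if len(clients) >= min_clients  # ignore activity seen on too few clients
    }
    # Largest clusters first; ties broken by signature for a stable order.
    return sorted(clusters.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical reports received from instrumentation code.
SAMPLE_REPORTS = [
    {"site": "bank.example", "method": "transferFunds", "client_id": "c1"},
    {"site": "bank.example", "method": "transferFunds", "client_id": "c2"},
    {"site": "bank.example", "method": "transferFunds", "client_id": "c2"},
    {"site": "shop.example", "method": "checkout", "client_id": "c3"},
]
```

  • With the sample data, only the `transferFunds` signature is returned, because it is the only one observed on at least two distinct clients.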
  • In certain other implementations, the console 508 may have access to software for analyzing telemetry data received from a very large number of client computers that execute instrumentation code provided by the system 500. Such data may result from forms being re-written across a large number of web pages and web sites to include content that collects system information such as browser version, installed plug-ins, screen resolution, window size and position, operating system, network information, and the like. In addition, user interaction with served content may be characterized by such code, such as the speed with which a user interacts with a page, the path of a pointer over the page, and the like.
  • Such collected telemetry data, across many thousands of sessions and client devices, may be used by the console 508 to identify what is “natural” interaction with a particular page that is likely the result of legitimate human actions, and what is “unnatural” interaction that is likely the result of a bot interacting with the content. Statistical and machine learning methods may be used to identify patterns in such telemetry data, and to resolve bot candidates to particular client computers. Such client computers may then be handled in special manners by the system 500, may be blocked from interaction, or may have their operators notified that their computer is potentially running malicious software (e.g., by sending an e-mail to an account holder of a computer so that the malicious software cannot intercept it easily).
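  • One simple statistic that could feed such an analysis is the variability of the time between user events, on the assumption that scripted interaction tends to be more regular than human interaction. The sketch below is only an example of a single feature with an arbitrary threshold; the names `interval_variation` and `looks_unnatural` and the 0.1 cutoff are hypothetical, and a real system would combine many signals using the statistical and machine learning methods mentioned above.

```python
import statistics

def interval_variation(timestamps_ms):
    """Coefficient of variation of the gaps between consecutive event timestamps."""
    intervals = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    mean = statistics.fmean(intervals)
    return statistics.pstdev(intervals) / mean if mean else 0.0

def looks_unnatural(timestamps_ms, min_variation=0.1):
    """Flag a session whose event timing is nearly uniform (arbitrary threshold)."""
    if len(timestamps_ms) < 3:
        return False  # too few events to judge
    return interval_variation(timestamps_ms) < min_variation

# Hypothetical event timestamps in milliseconds.
BOT_LIKE = [0, 100, 200, 300, 400]
HUMAN_LIKE = [0, 180, 950, 1100, 2400]
```

  • Here `looks_unnatural(BOT_LIKE)` is true because every gap is identical, while `looks_unnatural(HUMAN_LIKE)` is false because the gaps vary widely.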
  • FIG. 6 is a schematic diagram of a general computing system 600. The system 600 can be used for the operations described in association with any of the computer-implemented methods described previously, according to one implementation. The system 600 is intended to include various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The system 600 can also include mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. Additionally, the system can include portable storage media, such as Universal Serial Bus (USB) flash drives. For example, the USB flash drives may store operating systems and other applications. The USB flash drives can include input/output components, such as a wireless transmitter or USB connector that may be inserted into a USB port of another computing device.
  • The system 600 includes a processor 610, a memory 620, a storage device 630, and an input/output device 640. Each of the components 610, 620, 630, and 640 is interconnected using a system bus 650. The processor 610 is capable of processing instructions for execution within the system 600. The processor may be designed using any of a number of architectures. For example, the processor 610 may be a CISC (Complex Instruction Set Computer) processor, a RISC (Reduced Instruction Set Computer) processor, or a MISC (Minimal Instruction Set Computer) processor.
  • In one implementation, the processor 610 is a single-threaded processor. In another implementation, the processor 610 is a multi-threaded processor. The processor 610 is capable of processing instructions stored in the memory 620 or on the storage device 630 to display graphical information for a user interface on the input/output device 640.
  • The memory 620 stores information within the system 600. In one implementation, the memory 620 is a computer-readable medium. In one implementation, the memory 620 is a volatile memory unit. In another implementation, the memory 620 is a non-volatile memory unit.
  • The storage device 630 is capable of providing mass storage for the system 600. In one implementation, the storage device 630 is a computer-readable medium. In various different implementations, the storage device 630 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.
  • The input/output device 640 provides input/output operations for the system 600. In one implementation, the input/output device 640 includes a keyboard and/or pointing device. In another implementation, the input/output device 640 includes a display unit for displaying graphical user interfaces.
  • The features described can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The apparatus can be implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output. The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors of any kind of computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
  • To provide for interaction with a user, the features can be implemented on a computer having a display device such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer. Additionally, such activities can be implemented via touchscreen flat-panel displays and other appropriate mechanisms.
  • The features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), peer-to-peer networks (having ad-hoc or static members), grid computing infrastructures, and the Internet.
  • The computer system can include clients and servers. A client and server are generally remote from each other and typically interact through a network, such as the described one. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
  • Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
  • Thus, particular implementations of the subject matter have been described. Other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous. In some implementations, the subject matter may be embodied as methods, systems, devices, and/or as an article or computer program product. The article or computer program product may comprise one or more computer-readable media or computer-readable storage devices, which may be tangible and non-transitory, that include instructions that may be executable by one or more machines such as computer processors.

Claims (20)

What is claimed is:
1. A computer-implemented method, comprising:
identifying a piece of data for serving from a server system to a client device that is remote from the server system, the piece of data being part of executable code requested from the server from the client device;
creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and
providing, to the client device and as part of the executable code, the plurality of expressions along with code for executing the plurality of expressions, so that when the plurality of expressions are executed on the client device, the identified piece of data is returned on the client device without a need to serve the identified piece of data to the client device.
2. The computer-implemented method of claim 1, further comprising performing a permutation on the plurality of expressions so that the plurality of expressions are ordered in the executable code in an order different than they were created.
3. The computer-implemented method of claim 2, wherein the order of the expressions is selected randomly as part of the permutation.
4. The computer-implemented method of claim 1, further comprising:
creating one or more additional expressions whose executed results are not used by other code that is part of the executable code served to the client device; and
providing to the client device the plurality of expressions with the one or more additional expressions.
5. The computer-implemented method of claim 1, further comprising:
identifying, in the piece of data, data that needs to be kept away from malware that may be in the client device, and
wherein creating a plurality of expressions comprises creating one or more replacement statements that when executed, provide a result that corresponds to the potentially sensitive data.
6. The computer-implemented method of claim 5, wherein the one or more replacement statements comprise one or more expressions that do not execute on the client device when the executable code is executed.
7. The computer-implemented method of claim 1, further comprising identifying, in the piece of data, a first expression and a second expression to be replaced, wherein creating a plurality of expressions comprises creating a first set of replacement expressions corresponding to the first expression and a second set of expressions corresponding to the second expressions; and
interleaving the replacement expressions of the first set of replacement expressions with the replacement expressions of the second set of replacement expressions,
wherein the plurality of expressions provided to the client device comprise the interleaved replacement expressions.
8. The computer-implemented method of claim 1, wherein creating a plurality of expressions comprises:
creating a first set of replacement expressions;
identifying a first replacement expression in the first set of replacement expressions;
creating a second set of replacement expressions that, when executed, provide a result that corresponds to the first replacement expression; and
replacing the first replacement expression with the second set of replacement expressions.
9. The computer-implemented method of claim 1, wherein the piece of data to be served comprises formats of code in HTML, CSS, and JavaScript, and wherein each of the formats interoperates with the other formats.
10. A computer-implemented method, the method comprising:
receiving, from a server system, web content comprising original code, wherein the web content is requested by a client device that is remote from the server system;
identifying a piece of data in the code;
creating a plurality of expressions that, when executed, provide a result that corresponds to the piece of data;
generating modified code comprising the original code with the piece of data replaced with the plurality of expressions; and
providing the modified code to the client device, wherein, when executed, the modified code provides a result that corresponds to the original code.
11. The computer-implemented method of claim 10, wherein generating modified code comprises:
interleaving the plurality of expressions into the original code with the identified piece of data removed.
12. The computer-implemented method of claim 11,
wherein the plurality of expressions are created in a first ordering, and
wherein the plurality of expressions are interleaved into the original code so that the plurality of expressions maintain the first ordering.
13. The computer-implemented method of claim 11,
wherein the plurality of expressions are created in a first ordering, and
wherein the plurality of expressions are interleaved into the original code so that the plurality of expressions are in a second ordering that is different than the first ordering.
14. The computer-implemented method of claim 10, wherein the plurality of expressions comprises one or more junk expressions that do not execute.
15. The computer-implemented method of claim 10, further comprising:
selecting a first expression among the plurality of expressions; and
creating a second plurality of expressions that, when executed, provide a result that corresponds to the selected first expression,
wherein the generated modified code comprises the original code with the piece of data replaced with the plurality of expressions, with the selected first expression replaced with the second plurality of expressions.
16. A computer system for recoding web content served to client computers, the system comprising:
an interface for receiving information from a web server system configured to provide computer code in multiple different formats in response to requests from client computing devices; and
a security intermediary that is arranged to (i) receive the computer code from the interface before the computer code is provided to the client computing devices, (ii) identify a piece of data in the computer code that is to be replaced; (iii) create a plurality of expressions that, when executed, provide a result that corresponds to the piece of data; and (iv) provide the plurality of expressions to the client computing devices with code for executing the plurality of expressions.
17. The computer-implemented system of claim 16, wherein the piece of data in the computer code that is to be replaced is identified as potentially sensitive data.
18. The computer-implemented system of claim 16, wherein the security intermediary is further arranged to perform a permutation of the plurality of expressions.
19. The computer-implemented system of claim 16, wherein the plurality of expressions comprise one or more expressions that do not execute.
20. The computer-implemented system of claim 16, wherein the security intermediary is further arranged to interleave the plurality of expressions with the code of executing the plurality of expressions.
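The core steps recited in claims 1 through 4 (creating a plurality of expressions whose execution yields the piece of data, permuting their order, and adding expressions whose results are unused) can be sketched as follows. This is a hedged Python analogue, not the claimed implementation: the web code at issue would typically be JavaScript, and the chunking strategy, the variable-naming scheme, and the names `make_expressions` and `execute` are assumptions made for illustration.

```python
import random

def make_expressions(data, chunk_size=3, junk_count=2, rng=None):
    """Return statements that rebuild `data` at run time without containing it whole."""
    rng = rng or random.Random()
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # The index prefix keeps names unique; the random suffix varies per serving.
    names = [f"v{i}_{rng.randrange(16 ** 4):04x}" for i in range(len(chunks))]
    statements = [f"{name} = {chunk!r}" for name, chunk in zip(names, chunks)]
    # Additional expressions whose results no other code uses.
    statements += [
        f"j{i}_{rng.randrange(16 ** 4):04x} = {rng.randrange(1000)}"
        for i in range(junk_count)
    ]
    rng.shuffle(statements)  # permutation: served order differs from creation order
    combine = " + ".join(names) if names else "''"
    statements.append(f"result = {combine}")  # must run after all assignments
    return statements

def execute(statements):
    """Stand-in for the client executing the served expressions."""
    scope = {}
    exec("\n".join(statements), {}, scope)
    return scope["result"]
```

For example, `execute(make_expressions("password"))` returns `"password"`, although no single served statement contains that string. Because the chunk assignments are independent of one another, they can be shuffled freely; only the final combining expression has to come last.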
US15/859,694 2014-05-23 2018-01-01 Obfuscating web code Abandoned US20180121680A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/859,694 US20180121680A1 (en) 2014-05-23 2018-01-01 Obfuscating web code

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/286,324 US9858440B1 (en) 2014-05-23 2014-05-23 Encoding of sensitive data
US15/859,694 US20180121680A1 (en) 2014-05-23 2018-01-01 Obfuscating web code

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/286,324 Continuation US9858440B1 (en) 2014-05-23 2014-05-23 Encoding of sensitive data

Publications (1)

Publication Number Publication Date
US20180121680A1 true US20180121680A1 (en) 2018-05-03

Family

ID=60971724

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/286,324 Active US9858440B1 (en) 2014-05-23 2014-05-23 Encoding of sensitive data
US15/859,694 Abandoned US20180121680A1 (en) 2014-05-23 2018-01-01 Obfuscating web code

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/286,324 Active US9858440B1 (en) 2014-05-23 2014-05-23 Encoding of sensitive data

Country Status (1)

Country Link
US (2) US9858440B1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10216488B1 (en) 2016-03-14 2019-02-26 Shape Security, Inc. Intercepting and injecting calls into operations and objects
US10230718B2 (en) 2015-07-07 2019-03-12 Shape Security, Inc. Split serving of computer code
CN110263533A (en) * 2019-04-28 2019-09-20 清华大学 Safe web page means of defence
US10834101B2 (en) 2016-03-09 2020-11-10 Shape Security, Inc. Applying bytecode obfuscation techniques to programs written in an interpreted language
US20210334342A1 (en) * 2020-04-27 2021-10-28 Imperva, Inc. Procedural code generation for challenge code
US11349816B2 (en) 2016-12-02 2022-05-31 F5, Inc. Obfuscating source code sent, from a server computer, to a browser on a client computer
EP4209938A1 (en) * 2022-01-05 2023-07-12 Irdeto B.V. Systems, methods, and storage media for creating secured computer code
US11741197B1 (en) 2019-10-15 2023-08-29 Shape Security, Inc. Obfuscating programs using different instruction set architectures

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657262B1 (en) * 2014-09-28 2020-05-19 Red Balloon Security, Inc. Method and apparatus for securing embedded device firmware
US10311229B1 (en) * 2015-05-18 2019-06-04 Amazon Technologies, Inc. Mitigating timing side-channel attacks by obscuring alternatives in code
US10868665B1 (en) * 2015-05-18 2020-12-15 Amazon Technologies, Inc. Mitigating timing side-channel attacks by obscuring accesses to sensitive data
US10380355B2 (en) * 2017-03-23 2019-08-13 Microsoft Technology Licensing, Llc Obfuscation of user content in structured user data files
US10410014B2 (en) 2017-03-23 2019-09-10 Microsoft Technology Licensing, Llc Configurable annotations for privacy-sensitive user content
US11042634B2 (en) * 2018-12-21 2021-06-22 Fujitsu Limited Determining information leakage of computer-readable programs
US11677783B2 (en) * 2019-10-25 2023-06-13 Target Brands, Inc. Analysis of potentially malicious emails
US20210303662A1 (en) * 2020-03-31 2021-09-30 Irdeto B.V. Systems, methods, and storage media for creating secured transformed code from input code using a neural network to obscure a transformation function
US11611629B2 (en) * 2020-05-13 2023-03-21 Microsoft Technology Licensing, Llc Inline frame monitoring

Citations (77)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5003596A (en) * 1989-08-17 1991-03-26 Cryptech, Inc. Method of cryptographically transforming electronic digital data from one form to another
US5315657A (en) * 1990-09-28 1994-05-24 Digital Equipment Corporation Compound principals in access control lists
US5892899A (en) * 1996-06-13 1999-04-06 Intel Corporation Tamper resistant methods and apparatus
US6006328A (en) * 1995-07-14 1999-12-21 Christopher N. Drake Computer software authentication, protection, and security system
US6088452A (en) * 1996-03-07 2000-07-11 Northern Telecom Limited Encoding technique for software and hardware
US6594761B1 (en) * 1999-06-09 2003-07-15 Cloakware Corporation Tamper resistant software encoding
US20030159063A1 (en) * 2002-02-07 2003-08-21 Larry Apfelbaum Automated security threat testing of web pages
US20030163718A1 (en) * 2000-04-12 2003-08-28 Johnson Harold J. Tamper resistant software-mass data encoding
US6668325B1 (en) * 1997-06-09 2003-12-23 Intertrust Technologies Obfuscation techniques for enhancing software security
US20040101142A1 (en) * 2001-07-05 2004-05-27 Nasypny Vladimir Vladimirovich Method and system for an integrated protection system of data distributed processing in computer networks and system for carrying out said method
US20040139340A1 (en) * 2000-12-08 2004-07-15 Johnson Harold J System and method for protecting computer software from a white box attack
US6779114B1 (en) * 1999-08-19 2004-08-17 Cloakware Corporation Tamper resistant software-control flow encoding
US20050002532A1 (en) * 2002-01-30 2005-01-06 Yongxin Zhou System and method of hiding cryptographic private keys
US20050166191A1 (en) * 2004-01-28 2005-07-28 Cloakware Corporation System and method for obscuring bit-wise and two's complement integer computations in software
US20050183072A1 (en) * 1999-07-29 2005-08-18 Intertrust Technologies Corporation Software self-defense systems and methods
US20060031686A1 (en) * 1999-09-03 2006-02-09 Purdue Research Foundation Method and system for tamperproofing software
US20060034455A1 (en) * 2004-08-12 2006-02-16 Damgaard Ivan B Permutation data transform to enhance security
US20060101047A1 (en) * 2004-07-29 2006-05-11 Rice John R Method and system for fortifying software
US20060195703A1 (en) * 2005-02-25 2006-08-31 Microsoft Corporation System and method of iterative code obfuscation
US20060195588A1 (en) * 2005-01-25 2006-08-31 Whitehat Security, Inc. System for detecting vulnerabilities in web applications using client-side application interfaces
US7103180B1 (en) * 2001-10-25 2006-09-05 Hewlett-Packard Development Company, L.P. Method of implementing the data encryption standard with reduced computation
US20060253687A1 (en) * 2005-05-09 2006-11-09 Microsoft Corporation Overlapped code obfuscation
US20070039048A1 (en) * 2005-08-12 2007-02-15 Microsoft Corporation Obfuscating computer code to prevent an attack
US20070064617A1 (en) * 2005-09-15 2007-03-22 Reves Joseph P Traffic anomaly analysis for the detection of aberrant network code
US20080025496A1 (en) * 2005-08-01 2008-01-31 Asier Technology Corporation, A Delaware Corporation Encrypting a plaintext message with authentication
US20080208560A1 (en) * 2007-02-23 2008-08-28 Harold Joseph Johnson System and method of interlocking to protect software - mediated program and device behaviors
US20080222736A1 (en) * 2007-03-07 2008-09-11 Trusteer Ltd. Scrambling HTML to prevent CSRF attacks and transactional crimeware attacks
US20080229394A1 (en) * 2006-07-10 2008-09-18 Sci Group Method and System For Securely Protecting Data During Software Application Usage
US7472413B1 (en) * 2003-08-11 2008-12-30 F5 Networks, Inc. Security for WAP servers
US7506177B2 (en) * 2001-05-24 2009-03-17 Cloakware Corporation Tamper resistant software encoding and analysis
US20090077383A1 (en) * 2007-08-06 2009-03-19 De Monseignat Bernard System and method for authentication, data transfer, and protection against phishing
US20090119515A1 (en) * 2005-10-28 2009-05-07 Matsushita Electric Industrial Co., Ltd. Obfuscation evaluation method and obfuscation method
US20090193513A1 (en) * 2008-01-26 2009-07-30 Puneet Agarwal Policy driven fine grain url encoding mechanism for ssl vpn clientless access
US7580521B1 (en) * 2003-06-25 2009-08-25 Voltage Security, Inc. Identity-based-encryption system with hidden public key attributes
US20090235089A1 (en) * 2008-03-12 2009-09-17 Mathieu Ciet Computer object code obfuscation using boot installation
US20090249492A1 (en) * 2006-09-21 2009-10-01 Hans Martin Boesgaard Sorensen Fabrication of computer executable program files from source code
US20090254572A1 (en) * 2007-01-05 2009-10-08 Redlich Ron M Digital information infrastructure and method
US20090307500A1 (en) * 2006-02-06 2009-12-10 Taichi Sato Program obfuscator
US20100058301A1 (en) * 2008-08-26 2010-03-04 Apple Inc. System and method for branch extraction obfuscation
US20100083072A1 (en) * 2008-09-30 2010-04-01 Freescale Semiconductor, Inc. Data interleaver
US20100107245A1 (en) * 2008-10-29 2010-04-29 Microsoft Corporation Tamper-tolerant programs
US20100186089A1 (en) * 2009-01-22 2010-07-22 International Business Machines Corporation Method and system for protecting cross-domain interaction of a web application on an unmodified browser
US20100257354A1 (en) * 2007-09-07 2010-10-07 Dis-Ent, Llc Software based multi-channel polymorphic data obfuscation
US20100281459A1 (en) * 2009-05-01 2010-11-04 Apple Inc. Systems, methods, and computer-readable media for fertilizing machine-executable code
US20110129089A1 (en) * 2009-11-30 2011-06-02 Electronics And Telecommunications Research Institute Method and apparatus for partially encoding/decoding data for commitment service and method of using encoded data
US20110131416A1 (en) * 2009-11-30 2011-06-02 James Paul Schneider Multifactor validation of requests to thwart dynamic cross-site attacks
US20110167407A1 (en) * 2010-01-06 2011-07-07 Apple Inc. System and method for software data reference obfuscation
US20110302424A1 (en) * 2001-06-13 2011-12-08 Intertrust Technologies Corp. Software Self-Checking Systems and Methods
US20120022942A1 (en) * 2010-04-01 2012-01-26 Lee Hahn Holloway Internet-based proxy service to modify internet responses
US8185749B2 (en) * 2008-09-02 2012-05-22 Apple Inc. System and method for revising boolean and arithmetic operations
US8266243B1 (en) * 2010-03-30 2012-09-11 Amazon Technologies, Inc. Feedback mechanisms providing contextual information
US8347398B1 (en) * 2009-09-23 2013-01-01 Savvystuff Property Trust Selected text obfuscation and encryption in a local, network and cloud computing environment
US20130046995A1 (en) * 2010-02-23 2013-02-21 David Movshovitz Method and computer program product for order preserving symbol based encryption
US8393003B2 (en) * 2006-12-21 2013-03-05 Telefonaktiebolaget L M Ericsson (Publ) Obfuscating computer program code
US8392910B1 (en) * 2007-04-10 2013-03-05 AT & T Intellectual Property II, LLP Stochastic method for program security using deferred linking
US20130061323A1 (en) * 2008-04-23 2013-03-07 Trusted Knight Corporation System and method for protecting against malware utilizing key loggers
US20130067225A1 (en) * 2008-09-08 2013-03-14 Ofer Shochet Appliance, system, method and corresponding software components for encrypting and processing data
US20130179985A1 (en) * 2012-01-05 2013-07-11 Vmware, Inc. Securing user data in cloud computing environments
US20130232578A1 (en) * 2012-03-02 2013-09-05 Apple Inc. Method and apparatus for obfuscating program source codes
US8615804B2 (en) * 2010-02-18 2013-12-24 Polytechnic Institute Of New York University Complementary character encoding for preventing input injection in web applications
US20140013427A1 (en) * 2011-03-24 2014-01-09 Irdeto B.V. System And Method Providing Dependency Networks Throughout Applications For Attack Resistance
US20140165197A1 (en) * 2012-12-06 2014-06-12 Empire Technology Development, Llc Malware attack prevention using block code permutation
US8762705B2 (en) * 2008-07-24 2014-06-24 Alibaba Group Holding Limited System and method for preventing web crawler access
US20140282872A1 (en) * 2013-03-15 2014-09-18 Shape Security Inc. Stateless web content anti-automation
US20140283069A1 (en) * 2013-03-15 2014-09-18 Shape Security Inc. Protecting against the introduction of alien content
US20140281535A1 (en) * 2013-03-15 2014-09-18 Munibonsoftware.com, LLC Apparatus and Method for Preventing Information from Being Extracted from a Webpage
US20150039962A1 (en) * 2010-09-10 2015-02-05 John P. Fonseka Methods, apparatus, and systems for coding with constrained interleaving
US20150180509A9 (en) * 2010-09-10 2015-06-25 John P. Fonseka Methods, apparatus, and systems for coding with constrained interleaving
US20150350243A1 (en) * 2013-03-15 2015-12-03 Shape Security Inc. Safe Intelligent Content Modification
US9241004B1 (en) * 2014-03-11 2016-01-19 Trend Micro Incorporated Alteration of web documents for protection against web-injection attacks
US9270647B2 (en) * 2013-12-06 2016-02-23 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
US20170041341A1 (en) * 2014-05-23 2017-02-09 Shape Security, Inc. Polymorphic Treatment of Data Entered At Clients
US9582666B1 (en) * 2015-05-07 2017-02-28 Shape Security, Inc. Computer system for improved security of server computers interacting with client computers
US9602543B2 (en) * 2014-09-09 2017-03-21 Shape Security, Inc. Client/server polymorphism using polymorphic hooks
US9712561B2 (en) * 2014-01-20 2017-07-18 Shape Security, Inc. Intercepting and supervising, in a runtime environment, calls to one or more objects in a web page
US10122747B2 (en) * 2013-12-06 2018-11-06 Lookout, Inc. Response generation after distributed monitoring and evaluation of multiple devices
US10216488B1 (en) * 2016-03-14 2019-02-26 Shape Security, Inc. Intercepting and injecting calls into operations and objects

Family Cites Families (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2212574C (en) 1995-02-13 2010-02-02 Electronic Publishing Resources, Inc. Systems and methods for secure transaction management and electronic rights protection
US6865735B1 (en) 1997-10-07 2005-03-08 University Of Washington Process for rewriting executable content on a network server or desktop machine in order to enforce site specific properties
SE512672C2 (en) 1998-06-12 2000-04-17 Ericsson Telefon Ab L M Procedure and system for transferring a cookie
US6697948B1 (en) 1999-05-05 2004-02-24 Michael O. Rabin Methods and apparatus for protecting information
CA2447451C (en) 2000-05-12 2013-02-12 Xtreamlok Pty. Ltd. Information security method and system
US6938170B1 (en) 2000-07-17 2005-08-30 International Business Machines Corporation System and method for preventing automated crawler access to web-based data sources using a dynamic data transcoding scheme
US7117239B1 (en) 2000-07-28 2006-10-03 Axeda Corporation Reporting the state of an apparatus to a remote computer
WO2002088951A1 (en) 2001-04-26 2002-11-07 Telefonaktiebolaget Lm Ericsson (Publ) Stateless server
WO2002093393A1 (en) 2001-05-11 2002-11-21 Sap Portals, Inc. Browser with messaging capability and other persistent connections
US7028305B2 (en) 2001-05-16 2006-04-11 Softricity, Inc. Operating system abstraction and protection layer
US7010779B2 (en) 2001-08-16 2006-03-07 Knowledge Dynamics, Inc. Parser, code generator, and data calculation and transformation engine for spreadsheet calculations
US20040162994A1 (en) 2002-05-13 2004-08-19 Sandia National Laboratories Method and apparatus for configurable communication network defenses
US7117429B2 (en) 2002-06-12 2006-10-03 Oracle International Corporation Methods and systems for managing styles electronic documents
US7333072B2 (en) 2003-03-24 2008-02-19 Semiconductor Energy Laboratory Co., Ltd. Thin film integrated circuit device
US8510571B1 (en) 2003-03-24 2013-08-13 Hoi Chang System and method for inserting security mechanisms into a software program
US7500099B1 (en) 2003-05-16 2009-03-03 Microsoft Corporation Method for mitigating web-based “one-click” attacks
US7735144B2 (en) 2003-05-16 2010-06-08 Adobe Systems Incorporated Document modification detection and prevention
WO2004109532A1 (en) 2003-06-05 2004-12-16 Cubicice (Pty) Ltd A method of collecting data regarding a plurality of web pages visited by at least one user
US8806187B1 (en) 2009-12-03 2014-08-12 Google Inc. Protecting browser-viewed content from piracy
US7624449B1 (en) 2004-01-22 2009-11-24 Symantec Corporation Countering polymorphic malicious computer code through code optimization
US7475341B2 (en) 2004-06-15 2009-01-06 At&T Intellectual Property I, L.P. Converting the format of a portion of an electronic document
US7480385B2 (en) 2004-11-05 2009-01-20 Cable Television Laboratories, Inc. Hierarchical encryption key system for securing digital media
US7707223B2 (en) 2005-04-28 2010-04-27 Cisco Technology, Inc. Client-side java content transformation
US7770185B2 (en) 2005-09-26 2010-08-03 Bea Systems, Inc. Interceptor method and system for web services for remote portlets
US8170020B2 (en) 2005-12-08 2012-05-01 Microsoft Corporation Leveraging active firewalls for network intrusion detection and retardation of attack
GB0620855D0 (en) 2006-10-19 2006-11-29 Dovetail Software Corp Ltd Data processing apparatus and method
JP5133973B2 (en) 2007-01-18 2013-01-30 パナソニック株式会社 Obfuscation support device, obfuscation support method, program, and integrated circuit
US8290800B2 (en) 2007-01-30 2012-10-16 Google Inc. Probabilistic inference of site demographics from aggregate user internet usage and source demographic information
WO2008095018A2 (en) 2007-01-31 2008-08-07 Omniture, Inc. Page grouping for site traffic analysis reports
WO2008130946A2 (en) 2007-04-17 2008-10-30 Kenneth Tola Unobtrusive methods and systems for collecting information transmitted over a network
US8527757B2 (en) 2007-06-22 2013-09-03 Gemalto Sa Method of preventing web browser extensions from hijacking user information
US7941382B2 (en) 2007-10-12 2011-05-10 Microsoft Corporation Method of classifying and active learning that ranks entries based on multiple scores, presents entries to human analysts, and detects and/or prevents malicious behavior
US8260845B1 (en) 2007-11-21 2012-09-04 Appcelerator, Inc. System and method for auto-generating JavaScript proxies and meta-proxies
US8347396B2 (en) 2007-11-30 2013-01-01 International Business Machines Corporation Protect sensitive content for human-only consumption
US9317255B2 (en) 2008-03-28 2016-04-19 Microsoft Technology Licensing, LLC Automatic code transformation with state transformer monads
CA2630388A1 (en) 2008-05-05 2009-11-05 Nima Sharifmehr Apparatus and method to prevent man in the middle attack
KR100987354B1 (en) 2008-05-22 2010-10-12 주식회사 이베이지마켓 System for checking false code in website and Method thereof
US9405555B2 (en) 2008-05-23 2016-08-02 Microsoft Technology Licensing, Llc Automated code splitting and pre-fetching for improving responsiveness of browser-based applications
KR101027928B1 (en) 2008-07-23 2011-04-12 한국전자통신연구원 Apparatus and Method for detecting obfuscated web page
CN102217225B (en) 2008-10-03 2014-04-02 杰出网络公司 Content delivery network encryption
US8020193B2 (en) 2008-10-20 2011-09-13 International Business Machines Corporation Systems and methods for protecting web based applications from cross site request forgery attacks
US8434068B2 (en) 2008-10-23 2013-04-30 XMOS Ltd. Development system
US8225401B2 (en) 2008-12-18 2012-07-17 Symantec Corporation Methods and systems for detecting man-in-the-browser attacks
CN101482882A (en) 2009-02-17 2009-07-15 阿里巴巴集团控股有限公司 Method and system for cross-domain treatment of COOKIE
US9311425B2 (en) 2009-03-31 2016-04-12 Qualcomm Incorporated Rendering a page using a previously stored DOM associated with a different page
US8332952B2 (en) 2009-05-22 2012-12-11 Microsoft Corporation Time window based canary solutions for browser security
US8527774B2 (en) 2009-05-28 2013-09-03 Kaazing Corporation System and methods for providing stateless security management for web applications using non-HTTP communications protocols
US8924943B2 (en) 2009-07-17 2014-12-30 Ebay Inc. Browser emulator system
US11102325B2 (en) 2009-10-23 2021-08-24 Moov Corporation Configurable and dynamic transformation of web content
US8539224B2 (en) 2009-11-05 2013-09-17 International Business Machines Corporation Obscuring form data through obfuscation
US8353037B2 (en) 2009-12-03 2013-01-08 International Business Machines Corporation Mitigating malicious file propagation with progressive identifiers
US8660976B2 (en) 2010-01-20 2014-02-25 Microsoft Corporation Web content rewriting, including responses
US20110255689A1 (en) 2010-04-15 2011-10-20 Lsi Corporation Multiple-mode cryptographic module usable with memory controllers
US8739150B2 (en) 2010-05-28 2014-05-27 Smartshift Gmbh Systems and methods for dynamically replacing code objects via conditional pattern templates
US8914879B2 (en) 2010-06-11 2014-12-16 Trustwave Holdings, Inc. System and method for improving coverage for web code
US20120124372A1 (en) 2010-10-13 2012-05-17 Akamai Technologies, Inc. Protecting Websites and Website Users By Obscuring URLs
US8631091B2 (en) 2010-10-15 2014-01-14 Northeastern University Content distribution network using a web browser and locally stored content to directly exchange content between users
US8751822B2 (en) 2010-12-20 2014-06-10 Motorola Mobility Llc Cryptography using quasigroups
AU2011200413B1 (en) 2011-02-01 2011-09-15 Symbiotic Technologies Pty Ltd Methods and Systems to Detect Attacks on Internet Transactions
US8590041B2 (en) 2011-11-28 2013-11-19 Mcafee, Inc. Application sandboxing using a dynamic optimization framework
US8904279B1 (en) 2011-12-07 2014-12-02 Amazon Technologies, Inc. Inhibiting automated extraction of data from network pages
WO2013091709A1 (en) 2011-12-22 2013-06-27 Fundació Privada Barcelona Digital Centre Tecnologic Method and apparatus for real-time dynamic transformation of the code of a web document
US10049168B2 (en) 2012-01-31 2018-08-14 Openwave Mobility, Inc. Systems and methods for modifying webpage data
US9111090B2 (en) 2012-04-02 2015-08-18 Trusteer, Ltd. Detection of phishing attempts
US20140089786A1 (en) 2012-06-01 2014-03-27 Atiq Hashmi Automated Processor For Web Content To Mobile-Optimized Content Transformation
US8595613B1 (en) 2012-07-26 2013-11-26 Viasat Inc. Page element identifier pre-classification for user interface behavior in a communications system
US8806627B1 (en) 2012-12-17 2014-08-12 Emc Corporation Content randomization for thwarting malicious software attacks
US9294502B1 (en) 2013-12-06 2016-03-22 Radware, Ltd. Method and system for detection of malicious bots
GB201415860D0 (en) 2014-09-08 2014-10-22 User Replay Ltd Systems and methods for recording and recreating interactive user-sessions involving an on-line server
WO2017156158A1 (en) 2016-03-09 2017-09-14 Shape Security, Inc. Applying bytecode obfuscation techniques to programs written in an interpreted language

Patent Citations (108)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5003596A (en) * 1989-08-17 1991-03-26 Cryptech, Inc. Method of cryptographically transforming electronic digital data from one form to another
US5315657A (en) * 1990-09-28 1994-05-24 Digital Equipment Corporation Compound principals in access control lists
US6006328A (en) * 1995-07-14 1999-12-21 Christopher N. Drake Computer software authentication, protection, and security system
US6088452A (en) * 1996-03-07 2000-07-11 Northern Telecom Limited Encoding technique for software and hardware
US5892899A (en) * 1996-06-13 1999-04-06 Intel Corporation Tamper resistant methods and apparatus
US6668325B1 (en) * 1997-06-09 2003-12-23 Intertrust Technologies Obfuscation techniques for enhancing software security
US6594761B1 (en) * 1999-06-09 2003-07-15 Cloakware Corporation Tamper resistant software encoding
US6842862B2 (en) * 1999-06-09 2005-01-11 Cloakware Corporation Tamper resistant software encoding
US7779394B2 (en) * 1999-07-29 2010-08-17 Intertrust Technologies Corporation Software self-defense systems and methods
US20150278491A1 (en) * 1999-07-29 2015-10-01 Intertrust Technologies Corporation Software self-defense systems and methods
US20070234070A1 (en) * 1999-07-29 2007-10-04 Intertrust Technologies Corp. Software self-defense systems and methods
US9064099B2 (en) * 1999-07-29 2015-06-23 Intertrust Technologies Corporation Software self-defense systems and methods
US7779270B2 (en) * 1999-07-29 2010-08-17 Intertrust Technologies Corporation Software self-defense systems and methods
US20130232343A1 (en) * 1999-07-29 2013-09-05 Intertrust Technologies Corporation Software self-defense systems and methods
US7430670B1 (en) * 1999-07-29 2008-09-30 Intertrust Technologies Corp. Software self-defense systems and methods
US20050183072A1 (en) * 1999-07-29 2005-08-18 Intertrust Technologies Corporation Software self-defense systems and methods
US20050204348A1 (en) * 1999-07-29 2005-09-15 Inter Trust Technologies Corporation Software self-defense systems and methods
US20050210275A1 (en) * 1999-07-29 2005-09-22 Intertrust Technologies Corporation Software self-defense systems and methods
US7823135B2 (en) * 1999-07-29 2010-10-26 Intertrust Technologies Corporation Software self-defense systems and methods
US20110035733A1 (en) * 1999-07-29 2011-02-10 Intertrust Technologies Corp. Software Self-Defense Systems and Methods
US8387022B2 (en) * 1999-07-29 2013-02-26 Intertrust Technologies Corp. Software self-defense systems and methods
US6779114B1 (en) * 1999-08-19 2004-08-17 Cloakware Corporation Tamper resistant software-control flow encoding
US20060031686A1 (en) * 1999-09-03 2006-02-09 Purdue Research Foundation Method and system for tamperproofing software
US20030163718A1 (en) * 2000-04-12 2003-08-28 Johnson Harold J. Tamper resistant software-mass data encoding
US20040139340A1 (en) * 2000-12-08 2004-07-15 Johnson Harold J System and method for protecting computer software from a white box attack
US7506177B2 (en) * 2001-05-24 2009-03-17 Cloakware Corporation Tamper resistant software encoding and analysis
US20110302424A1 (en) * 2001-06-13 2011-12-08 Intertrust Technologies Corp. Software Self-Checking Systems and Methods
US20040101142A1 (en) * 2001-07-05 2004-05-27 Nasypny Vladimir Vladimirovich Method and system for an integrated protection system of data distributed processing in computer networks and system for carrying out said method
US7103180B1 (en) * 2001-10-25 2006-09-05 Hewlett-Packard Development Company, L.P. Method of implementing the data encryption standard with reduced computation
US20050002532A1 (en) * 2002-01-30 2005-01-06 Yongxin Zhou System and method of hiding cryptographic private keys
US20030159063A1 (en) * 2002-02-07 2003-08-21 Larry Apfelbaum Automated security threat testing of web pages
US7580521B1 (en) * 2003-06-25 2009-08-25 Voltage Security, Inc. Identity-based-encryption system with hidden public key attributes
US7961879B1 (en) * 2003-06-25 2011-06-14 Voltage Security, Inc. Identity-based-encryption system with hidden public key attributes
US7472413B1 (en) * 2003-08-11 2008-12-30 F5 Networks, Inc. Security for WAP servers
US20050166191A1 (en) * 2004-01-28 2005-07-28 Cloakware Corporation System and method for obscuring bit-wise and two's complement integer computations in software
US20060101047A1 (en) * 2004-07-29 2006-05-11 Rice John R Method and system for fortifying software
US20060034455A1 (en) * 2004-08-12 2006-02-16 Damgaard Ivan B Permutation data transform to enhance security
US8077861B2 (en) * 2004-08-12 2011-12-13 Cmla, Llc Permutation data transform to enhance security
US20060195588A1 (en) * 2005-01-25 2006-08-31 Whitehat Security, Inc. System for detecting vulnerabilities in web applications using client-side application interfaces
US7587616B2 (en) * 2005-02-25 2009-09-08 Microsoft Corporation System and method of iterative code obfuscation
US20060195703A1 (en) * 2005-02-25 2006-08-31 Microsoft Corporation System and method of iterative code obfuscation
US20060253687A1 (en) * 2005-05-09 2006-11-09 Microsoft Corporation Overlapped code obfuscation
US20080025496A1 (en) * 2005-08-01 2008-01-31 Asier Technology Corporation, A Delaware Corporation Encrypting a plaintext message with authentication
US20100172494A1 (en) * 2005-08-01 2010-07-08 Kevin Martin Henson Encrypting a plaintext message with authenticaion
US7620987B2 (en) * 2005-08-12 2009-11-17 Microsoft Corporation Obfuscating computer code to prevent an attack
US20070039048A1 (en) * 2005-08-12 2007-02-15 Microsoft Corporation Obfuscating computer code to prevent an attack
US20070064617A1 (en) * 2005-09-15 2007-03-22 Reves Joseph P Traffic anomaly analysis for the detection of aberrant network code
US20090119515A1 (en) * 2005-10-28 2009-05-07 Matsushita Electric Industrial Co., Ltd. Obfuscation evaluation method and obfuscation method
US20090307500A1 (en) * 2006-02-06 2009-12-10 Taichi Sato Program obfuscator
US20080229394A1 (en) * 2006-07-10 2008-09-18 Sci Group Method and System For Securely Protecting Data During Software Application Usage
US20090249492A1 (en) * 2006-09-21 2009-10-01 Hans Martin Boesgaard Sorensen Fabrication of computer executable program files from source code
US8393003B2 (en) * 2006-12-21 2013-03-05 Telefonaktiebolaget L M Ericsson (Publ) Obfuscating computer program code
US20090254572A1 (en) * 2007-01-05 2009-10-08 Redlich Ron M Digital information infrastructure and method
US8752032B2 (en) * 2007-02-23 2014-06-10 Irdeto Canada Corporation System and method of interlocking to protect software-mediated program and device behaviours
US20150213239A1 (en) * 2007-02-23 2015-07-30 Irdeto Canada Corporation System and method of interlocking to protect software-mediated program and device behaviours
US20080208560A1 (en) * 2007-02-23 2008-08-28 Harold Joseph Johnson System and method of interlocking to protect software - mediated program and device behaviors
US20150074803A1 (en) * 2007-02-23 2015-03-12 Irdeto Canada Corporation System and method of interlocking to protect software-mediated program and device behaviours
US8161463B2 (en) * 2007-02-23 2012-04-17 Irdeto Canada Corporation System and method of interlocking to protect software—mediated program and device behaviors
US20080216051A1 (en) * 2007-02-23 2008-09-04 Harold Joseph Johnson System and method of interlocking to protect software-mediated program and device behaviours
US20080222736A1 (en) * 2007-03-07 2008-09-11 Trusteer Ltd. Scrambling HTML to prevent CSRF attacks and transactional crimeware attacks
US8392910B1 (en) * 2007-04-10 2013-03-05 AT & T Intellectual Property II, LLP Stochastic method for program security using deferred linking
US20130152071A1 (en) * 2007-04-10 2013-06-13 At & T Intellectual Property Ii, L.P. Stochastic Method for Program Security Using Deferred Linking
US20090077383A1 (en) * 2007-08-06 2009-03-19 De Monseignat Bernard System and method for authentication, data transfer, and protection against phishing
US20100257354A1 (en) * 2007-09-07 2010-10-07 Dis-Ent, Llc Software based multi-channel polymorphic data obfuscation
US20090193513A1 (en) * 2008-01-26 2009-07-30 Puneet Agarwal Policy driven fine grain url encoding mechanism for ssl vpn clientless access
US20090235089A1 (en) * 2008-03-12 2009-09-17 Mathieu Ciet Computer object code obfuscation using boot installation
US20130061323A1 (en) * 2008-04-23 2013-03-07 Trusted Knight Corporation System and method for protecting against malware utilizing key loggers
US8762705B2 (en) * 2008-07-24 2014-06-24 Alibaba Group Holding Limited System and method for preventing web crawler access
US20150195305A1 (en) * 2008-07-24 2015-07-09 Alibaba Group Holding Limited System and method for preventing web crawler access
US20100058301A1 (en) * 2008-08-26 2010-03-04 Apple Inc. System and method for branch extraction obfuscation
US8185749B2 (en) * 2008-09-02 2012-05-22 Apple Inc. System and method for revising boolean and arithmetic operations
US20130067225A1 (en) * 2008-09-08 2013-03-14 Ofer Shochet Appliance, system, method and corresponding software components for encrypting and processing data
US20100083072A1 (en) * 2008-09-30 2010-04-01 Freescale Semiconductor, Inc. Data interleaver
US20100107245A1 (en) * 2008-10-29 2010-04-29 Microsoft Corporation Tamper-tolerant programs
US20100186089A1 (en) * 2009-01-22 2010-07-22 International Business Machines Corporation Method and system for protecting cross-domain interaction of a web application on an unmodified browser
US20100281459A1 (en) * 2009-05-01 2010-11-04 Apple Inc. Systems, methods, and computer-readable media for fertilizing machine-executable code
US8347398B1 (en) * 2009-09-23 2013-01-01 Savvystuff Property Trust Selected text obfuscation and encryption in a local, network and cloud computing environment
US20110131416A1 (en) * 2009-11-30 2011-06-02 James Paul Schneider Multifactor validation of requests to thwart dynamic cross-site attacks
US20110129089A1 (en) * 2009-11-30 2011-06-02 Electronics And Telecommunications Research Institute Method and apparatus for partially encoding/decoding data for commitment service and method of using encoded data
US20110167407A1 (en) * 2010-01-06 2011-07-07 Apple Inc. System and method for software data reference obfuscation
US8615804B2 (en) * 2010-02-18 2013-12-24 Polytechnic Institute Of New York University Complementary character encoding for preventing input injection in web applications
US20130046995A1 (en) * 2010-02-23 2013-02-21 David Movshovitz Method and computer program product for order preserving symbol based encryption
US8266243B1 (en) * 2010-03-30 2012-09-11 Amazon Technologies, Inc. Feedback mechanisms providing contextual information
US20120022942A1 (en) * 2010-04-01 2012-01-26 Lee Hahn Holloway Internet-based proxy service to modify internet responses
US20150180509A9 (en) * 2010-09-10 2015-06-25 John P. Fonseka Methods, apparatus, and systems for coding with constrained interleaving
US20150039962A1 (en) * 2010-09-10 2015-02-05 John P. Fonseka Methods, apparatus, and systems for coding with constrained interleaving
US20140013427A1 (en) * 2011-03-24 2014-01-09 Irdeto B.V. System And Method Providing Dependency Networks Throughout Applications For Attack Resistance
US20130179985A1 (en) * 2012-01-05 2013-07-11 Vmware, Inc. Securing user data in cloud computing environments
US20130232578A1 (en) * 2012-03-02 2013-09-05 Apple Inc. Method and apparatus for obfuscating program source codes
US8661549B2 (en) * 2012-03-02 2014-02-25 Apple Inc. Method and apparatus for obfuscating program source codes
US20140165197A1 (en) * 2012-12-06 2014-06-12 Empire Technology Development, Llc Malware attack prevention using block code permutation
US20140281535A1 (en) * 2013-03-15 2014-09-18 Munibonsoftware.com, LLC Apparatus and Method for Preventing Information from Being Extracted from a Webpage
US20180041527A1 (en) * 2013-03-15 2018-02-08 Shape Security, Inc. Using instrumentation code to detect bots or malware
US20140282872A1 (en) * 2013-03-15 2014-09-18 Shape Security Inc. Stateless web content anti-automation
US9178908B2 (en) * 2013-03-15 2015-11-03 Shape Security, Inc. Protecting against the introduction of alien content
US20150350243A1 (en) * 2013-03-15 2015-12-03 Shape Security Inc. Safe Intelligent Content Modification
US20140283069A1 (en) * 2013-03-15 2014-09-18 Shape Security Inc. Protecting against the introduction of alien content
US20160197945A1 (en) * 2013-03-15 2016-07-07 Shape Security, Inc. Protecting against the introduction of alien content
US20190243971A1 (en) * 2013-03-15 2019-08-08 Shape Security, Inc. Using instrumentation code to detect bots or malware
US9270647B2 (en) * 2013-12-06 2016-02-23 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
US10122747B2 (en) * 2013-12-06 2018-11-06 Lookout, Inc. Response generation after distributed monitoring and evaluation of multiple devices
US10027628B2 (en) * 2013-12-06 2018-07-17 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
US9712561B2 (en) * 2014-01-20 2017-07-18 Shape Security, Inc. Intercepting and supervising, in a runtime environment, calls to one or more objects in a web page
US9241004B1 (en) * 2014-03-11 2016-01-19 Trend Micro Incorporated Alteration of web documents for protection against web-injection attacks
US20170041341A1 (en) * 2014-05-23 2017-02-09 Shape Security, Inc. Polymorphic Treatment of Data Entered At Clients
US9602543B2 (en) * 2014-09-09 2017-03-21 Shape Security, Inc. Client/server polymorphism using polymorphic hooks
US9582666B1 (en) * 2015-05-07 2017-02-28 Shape Security, Inc. Computer system for improved security of server computers interacting with client computers
US10216488B1 (en) * 2016-03-14 2019-02-26 Shape Security, Inc. Intercepting and injecting calls into operations and objects

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10230718B2 (en) 2015-07-07 2019-03-12 Shape Security, Inc. Split serving of computer code
US10834101B2 (en) 2016-03-09 2020-11-10 Shape Security, Inc. Applying bytecode obfuscation techniques to programs written in an interpreted language
US10216488B1 (en) 2016-03-14 2019-02-26 Shape Security, Inc. Intercepting and injecting calls into operations and objects
US11349816B2 (en) 2016-12-02 2022-05-31 F5, Inc. Obfuscating source code sent, from a server computer, to a browser on a client computer
CN110263533A (en) * 2019-04-28 2019-09-20 清华大学 Safe web page means of defence
US11741197B1 (en) 2019-10-15 2023-08-29 Shape Security, Inc. Obfuscating programs using different instruction set architectures
US20210334342A1 (en) * 2020-04-27 2021-10-28 Imperva, Inc. Procedural code generation for challenge code
US11748460B2 (en) * 2020-04-27 2023-09-05 Imperva, Inc. Procedural code generation for challenge code
EP4209938A1 (en) * 2022-01-05 2023-07-12 Irdeto B.V. Systems, methods, and storage media for creating secured computer code

Also Published As

Publication number Publication date
US9858440B1 (en) 2018-01-02

Similar Documents

Publication Publication Date Title
US20180121680A1 (en) Obfuscating web code
US11297097B2 (en) Code modification for detecting abnormal activity
US9973519B2 (en) Protecting a server computer by detecting the identity of a browser on a client computer
US10382482B2 (en) Polymorphic obfuscation of executable code
US10193909B2 (en) Using instrumentation code to detect bots or malware
US10205742B2 (en) Stateless web content anti-automation
US20190141064A1 (en) Detecting attacks against a server computer based on characterizing user interactions with the client computing device
US9489526B1 (en) Pre-analyzing served content
US9584534B1 (en) Dynamic field re-rendering
US9325734B1 (en) Distributed polymorphic transformation of served content
US9112900B1 (en) Distributed polymorphic transformation of served content
US12058170B2 (en) Code modification for detecting abnormal activity

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

AS Assignment

Owner name: SHAPE SECURITY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, XINRAN;ZHAO, YAO;REEL/FRAME:050910/0270

Effective date: 20140522

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION