
TRANSMUT-Spark

Transformation Mutation for Apache Spark

TRANSMUT-Spark is an sbt plugin for mutation testing of Apache Spark programs written in Scala. It applies transformation mutation to insert faults into the set of transformations called in a Spark program. TRANSMUT-Spark runs the entire mutation testing process: it generates the mutants, runs the tests against each mutant, analyzes the test results to identify killed and survived mutants, and generates reports with the results and metrics. The TRANSMUT-Spark plugin is based on the Stryker4s sbt plugin.

Requirements

  • sbt 1.3+
  • Scala 2.12.x
  • SemanticDB enabled (semanticdbEnabled := true; see the build.sbt sketch below)
  • Spark programs using only the RDD (Core) API
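
A minimal build.sbt sketch satisfying these requirements (the project name, Scala patch version and Spark version are illustrative):

// build.sbt (illustrative)
name := "spark-example"
scalaVersion := "2.12.15"
semanticdbEnabled := true // required by TRANSMUT-Spark
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2"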

Add Plugin

The TRANSMUT-Spark plugin has not yet been published to a remote repository. To use it, clone this project and publish the plugin locally with sbt publishLocal inside the TRANSMUT-Spark project folder. After that, the plugin can be added to local projects.

To install the TRANSMUT-Spark plugin in a project, add the following line to your project/plugins.sbt:

addSbtPlugin("br.ufrn.dimap.forall" % "sbt-transmut" % "0.1-SNAPSHOT")

Usage

To use TRANSMUT-Spark, run transmut in the sbt console, or sbt transmut in a terminal, inside the folder of the project with the programs to be mutated. This triggers the execution of TRANSMUT-Spark, which looks for the transmut.conf configuration file in the root of the project and runs the entire process.

Additionally, there is a transmutAlive command that can be executed after a first execution of transmut. transmutAlive uses the results of the last execution of TRANSMUT-Spark to run again only the mutants that lived (survived) in that execution. This command is useful for analyzing only the lived mutants without having to run all the generated mutants. To run transmutAlive, the options sources, programs, mutation-operators, enable-reduction and reduction-rules must be the same as in the configuration of the last run. The transmutAlive command can also be used to force the execution of mutants that were removed by the reduction module; for this, a list of mutants to force execution must be added (force-execution). The transmutAlive command only executes if there are lived mutants from the last execution or removed mutants to force execution; otherwise it does nothing.
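
For example, to force the execution of a removed mutant with transmutAlive, the configuration from the first run is kept and a force-execution list is added (the mutant ID 7 is purely illustrative):

transmut {
    sources: [ "WordCount.scala" ],
    programs: [ "wordCount" ],
    force-execution: [ 7 ]
}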

Outputs

A successful execution of TRANSMUT-Spark generates a folder transmut-$datetime inside target/ (or the folder defined by transmut-base-dir in the configuration) with the following content:

  • mutants/
    • Folder with the meta-mutant sources generated by TRANSMUT-Spark. A meta-mutant is a source file that aggregates all the mutants generated for that source in a single piece of code. To create the meta-mutant, we apply the mutation switching technique used by Stryker4s (see the sketch after this list).
  • mutated-src/
    • Folder with a copy of the project's Scala source folder (src/main/scala/, or the folder defined by src-dir in the configuration) with the mutated sources. To avoid modifying the original source files, we copy the folder and mutate the sources inside the copy. This folder therefore contains both the modified sources and the other sources that were not modified in the process. To see only the modified sources, look at the source code in the mutants folder.
  • reports/
    • Folder with the reports generated by TRANSMUT-Spark.
  • reports/html/
    • Folder with the HTML reports generated by TRANSMUT-Spark.
  • reports/json/
    • Folder with the JSON reports generated by TRANSMUT-Spark.
  • transmut.conf
    • Copy of the configuration file that was used in this run.
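
An illustrative sketch of the mutation switching idea (not the exact code TRANSMUT-Spark generates): all the mutants of a transformation are kept in a single source, and the mutant to activate is selected at runtime, for example through a system property. The ACTIVE_MUTANT property and the method names below are hypothetical.

import org.apache.spark.rdd.RDD

object MetaMutantSketch {

	// Hypothetical selector: which mutant is active for this run (0 = original).
	def activeMutant: Int = sys.props.getOrElse("ACTIVE_MUTANT", "0").toInt

	// All mutants of the original map transformation live in one method;
	// the active one is chosen by the selector (mutation switching).
	def wordsToPairs(words: RDD[String]): RDD[(String, Int)] = activeMutant match {
		case 1 => words.map( (word: String) => (word, 0) )  // MTR mutant: maps to 0
		case 2 => words.map( (word: String) => (word, -1) ) // MTR mutant: maps to -1
		case _ => words.map( (word: String) => (word, 1) )  // original transformation
	}
}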

Configurations

Configurations of TRANSMUT-Spark are set in the transmut.conf file in the root of the project. This file is mandatory because at least one program source and one program must be specified to run the process. The file is in the HOCON format and all configuration options should be in the "transmut" namespace. For example:

transmut {
    sources: [ "WordCount.scala" ],
    programs: [ "wordCount" ]
}

Configuration Options:

  • sources
    • Description: list of file names of the program source codes (Scala source codes with the programs to be mutated). Only the file name must be entered, not the full path.
    • Mandatory: Yes
    • Example: sources: [ "WordCount.scala" ]
  • programs
    • Description: list of programs (methods) to be mutated. For TRANSMUT-Spark, a Spark program must be encapsulated in a method to be mutated. The programs in the list must exist in one of the sources. Only the methods in the list are mutated, other methods and statements of the sources remain unchanged.
    • Mandatory: Yes
    • Example: programs: [ "wordCount" ]
    • Warning: if a program in the list is not found in any of the sources, the execution will fail.
  • mutation-operators
    • Description: list of mutation operators to be applied in the mutation testing process. Only the mutation operators in the list are applied to generate mutants.
    • Mandatory: No
    • Default: [ "ALL" ]
    • Example: mutation-operators: [ "DTI", "ATR", "JTR" ]
  • equivalent-mutants
    • Description: list of IDs of equivalent mutants. This list should be updated after a first run of the tool, once the survived mutants have been analyzed and identified as equivalent. If the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and the list of reduction rules are not changed, different executions of TRANSMUT-Spark will always generate the same mutants with the same IDs. Mutants in this list are not executed in the tests and are marked as equivalent in the reports.
    • Mandatory: No
    • Default: []
    • Example: equivalent-mutants: [ 3, 5, 6 ]
    • Warning: if the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and/or the list of reduction rules are changed after a run, a new run may generate mutants and IDs that differ from those of the previous run.
  • test-only
    • Description: list of test classes to be executed. If the list is empty, all tests of the project are executed. The full name of the test classes must be entered (package + class name).
    • Mandatory: No
    • Default: []
    • Example: test-only: [ "examples.WordCountTest" ]
    • Warning: running all tests of the project (when test-only is empty) can be slow on big projects or on projects with many tests unrelated to the programs being mutated. In that case, specifying the test classes to be executed is recommended to speed up the process.
  • enable-reduction
    • Description: flag (Boolean value) indicating whether the mutant reduction module will be executed or not.
    • Mandatory: No
    • Default: true
    • Example: enable-reduction: false
    • Warning: the reduction module will only be executed if the flag is true. In addition, if the transmutAlive command is executed, the value of enable-reduction must be the same as in the previous execution.
  • reduction-rules
    • Description: list of reduction rules to be applied by the mutant reduction module. Only the reduction rules in the list are applied by the module.
    • Mandatory: No
    • Default: [ "ALL" ]
    • Example: reduction-rules: [ "UTDE", "DTIE" ]
    • Warning: These reduction rules will be applied only if enable-reduction: true.
  • force-execution
    • Description: list of IDs of mutants that were removed by the reduction module in a previous run and whose execution you want to force. This list should be updated after a first run of the tool, once the removed mutants have been analyzed and identified as relevant. If the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and the list of reduction rules are not changed, different executions of TRANSMUT-Spark will always generate the same mutants with the same IDs (including the removed mutants). Mutants in this list leave the list of removed mutants and are forced to run.
    • Mandatory: No
    • Default: []
    • Example: force-execution: [ 3, 5, 6 ]
    • Warning: if the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and/or the list of reduction rules are changed after a run, a new run may generate mutants and IDs that differ from those of the previous run. The removed mutants in this list are only forced to execute with the transmutAlive command.
  • src-dir
    • Description: The directory containing the sources of the programs to be mutated.
    • Mandatory: No
    • Default: 'src/main/scala/' (scalaSource from sbt)
    • Example: src-dir: 'other-src-folder/main/scala/'
  • semanticdb-dir
    • Description: The directory containing the SemanticDB specifications generated by the SemanticDB compiler plugin. TRANSMUT-Spark depends on the SemanticDB compiler plugin, so SemanticDB must be enabled in the project settings (semanticdbEnabled := true in the build.sbt file).
    • Mandatory: No
    • Default: 'target/scala-2.12/meta/' (semanticdbTargetRoot from sbt)
    • Example: semanticdb-dir: 'target/scala-2.12/other-meta/'
    • Warning: if the SemanticDB compiler plugin is not enabled in the project settings (semanticdbEnabled := true in the build.sbt file), the execution will fail.
  • transmut-base-dir
    • Description: The directory in which the TRANSMUT-Spark folder containing the sources and reports will be created. The generated folder is named "transmut-$datetime", where $datetime is the date and time the process was started, in the format "yyyyMMddHHmmss".
    • Mandatory: No
    • Default: 'target/'
    • Example: transmut-base-dir: 'other-transmut-base-folder/'
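
The options above can be combined in a single transmut.conf. A hypothetical example (the file, program and test names are illustrative):

transmut {
    sources: [ "WordCount.scala" ],
    programs: [ "wordCount" ],
    mutation-operators: [ "ALL" ],
    test-only: [ "examples.WordCountTest" ],
    enable-reduction: true,
    reduction-rules: [ "ALL" ],
    equivalent-mutants: [],
    force-execution: []
}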

Mutation Operators

List of mutation operators supported by TRANSMUT-Spark.

Operator Description
UTS Unary Transformation Swap
BTS Binary Transformation Swap
UTR Unary Transformation Replacement
BTR Binary Transformation Replacement
UTD Unary Transformation Deletion
MTR Mapping Transformation Replacement
FTD Filter Transformation Deletion
NFTP Negation of Filter Transformation Predicate
STR Set Transformation Replacement
DTD Distinct Transformation Deletion
DTI Distinct Transformation Insertion
ATR Aggregation Transformation Replacement
JTR Join Transformation Replacement
OTD Order Transformation Deletion
OTI Order Transformation Inversion

In the list of mutation operators in configurations (mutation-operators), use "ALL" as an alias to apply all mutation operators.

The description and more details about these mutation operators can be found in the paper Mutation Operators for Large Scale Data Processing Programs in Spark.
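
As an illustration, a sketch of the kind of mutant the DTI (Distinct Transformation Insertion) operator could produce for the map transformation of the wordCount example (the object and method names are hypothetical, not generated output):

import org.apache.spark.rdd.RDD

object DtiMutantSketch {

	// Original transformation from the wordCount example.
	def original(words: RDD[String]): RDD[(String, Int)] =
		words.map( (word: String) => (word, 1) )

	// Hypothetical DTI mutant: a distinct transformation is inserted after the map,
	// so duplicate (word, 1) pairs are dropped before any later aggregation.
	def dtiMutant(words: RDD[String]): RDD[(String, Int)] =
		words.map( (word: String) => (word, 1) ).distinct()
}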

Mutants Reduction

TRANSMUT-Spark has a module that applies reduction rules to reduce the number of mutants to be executed in the process. The objective of this module is to reduce the number of equivalent, redundant and inefficient mutants. This does not mean that all equivalent, redundant or inefficient mutants will be removed, only those that fit the reduction rules. The reduction rules were defined based on the results of experiments and on theoretical analysis of some mutation operators. The reduction module is optional; it is only executed if enable-reduction: true. The list of reduction rules follows below; the reduction rules to be applied can be defined in the configuration (reduction-rules).

Rule Description
UTDE Removes mutants generated with the mutation operators FTD, DTD and OTD when the UTD operator has also been applied.
MTRR Removes the following mutants generated with the MTR operator: mutants that map to Max and Min, when the mapping is to a numerical type; mutants that map to " ", when the mapping is to the string type; mutants that map to x.reverse, when the mapping is to a collection type; and mutants that map to null, when the mapping is to any other type.
FTDS Removes mutants generated with the mutation operator NFTP when the FTD or UTD operators have also been applied.
OTDS Removes mutants generated with the mutation operator OTI when the OTD or UTD operators have also been applied.
DTIE Removes mutants generated with the mutation operator DTI when the distinct transformation has been inserted after grouping or aggregation transformations.
ATRC Removes the commutative replacement mutants (f_m(x, y) = f(y, x)) generated with the ATR mutation operator.

In the list of reduction rules in configurations (reduction-rules), use "ALL" as an alias to apply all reduction rules.

The removed mutants can be seen in the reports generated by the tool. They appear in a separate list because they have not been executed. Removed mutants can be forced to execute using the transmutAlive command and the force-execution parameter in the configuration. That list should only contain mutants that have been removed and is independent of the list of equivalent mutants.

Restrictions

  • References (parameters, variables and values) must have unique names;
  • Programs to be mutated must be encapsulated in methods;
  • All RDDs must have their own reference (i.e., they must be declared as a parameter, variable or value);
  • Only one transformation may be called per statement:
    • For example: val rdd2 = rdd.filter( (a: String) => !a.isEmpty )
  • Anonymous (lambda) functions must have their input parameters explicitly typed:
    • Incorrect: rdd.map( a => a * a )
    • Correct: rdd.map( (a: Int) => a * a )

Example of a traditional Spark program that is not in the format supported by TRANSMUT-Spark:

package examples

import org.apache.spark._

object WordCount {

	def main(args: Array[String]): Unit = {
		val conf = new SparkConf()
		conf.setAppName("Word-Count")
		val sc = new SparkContext(conf)
		val textFile = sc.textFile("hdfs://...")
		val counts = textFile.flatMap(line => line.split(" "))
			.map(word => (word, 1))
			.reduceByKey(_ + _)
		counts.saveAsTextFile("hdfs://...")
	}
  
}

Example of the same program in a version supported by TRANSMUT-Spark:

package examples

import org.apache.spark._
import org.apache.spark.rdd.RDD

object WordCount {

	def wordCount(input: RDD[String]) = {
		val words = input.flatMap( (line: String) => line.split(" ") )
		val pairs = words.map( (word: String) => (word, 1) )
		val counts = pairs.reduceByKey( (a: Int, b: Int) => a + b )
		counts
	}
	
	def main(args: Array[String]): Unit = {
		val conf = new SparkConf()
		conf.setAppName("Word-Count")
		val sc = new SparkContext(conf)
		val textFile = sc.textFile("hdfs://...")
		val results = wordCount(textFile)
		results.saveAsTextFile("hdfs://...")
	}
  
}

Considering the example above, only the method wordCount should be included as a program to be mutated in the TRANSMUT-Spark configurations. Open the example project to see more examples of programs in the format supported by TRANSMUT-Spark and to run the plugin.
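
TRANSMUT-Spark runs the project's own tests against each mutant, so the programs to be mutated should be covered by tests. A minimal test sketch for wordCount, assuming ScalaTest and a local SparkContext (the class and package names match the test-only example above, but are otherwise illustrative):

package examples

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.funsuite.AnyFunSuite

class WordCountTest extends AnyFunSuite {

	test("wordCount counts the occurrences of each word") {
		val conf = new SparkConf().setMaster("local[2]").setAppName("Word-Count-Test")
		val sc = new SparkContext(conf)
		try {
			// Small in-memory input instead of HDFS, so the test runs locally.
			val input = sc.parallelize(Seq("to be or not to be"))
			val counts = WordCount.wordCount(input).collect().toMap
			assert(counts("to") == 2)
			assert(counts("be") == 2)
			assert(counts("or") == 1)
			assert(counts("not") == 1)
		} finally {
			sc.stop()
		}
	}
}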

Supported Transformations

List of the RDD transformations that can be mutated by TRANSMUT-Spark. Transformations that are not in this list can appear in the program, but they will not be modified in the process.

Transformation Interface
map map[U](f: (T) ⇒ U): RDD[U]
flatMap flatMap[U](f: (T) ⇒ TraversableOnce[U]): RDD[U]
filter filter(f: (T) ⇒ Boolean): RDD[T]
distinct distinct(): RDD[T]
sortBy sortBy[K](f: (T) ⇒ K, ascending: Boolean = true): RDD[T]
sortByKey sortByKey(ascending: Boolean = true): RDD[(K, V)]
union union(other: RDD[T]): RDD[T]
intersection intersection(other: RDD[T]): RDD[T]
subtract subtract(other: RDD[T]): RDD[T]
join join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
leftOuterJoin leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
rightOuterJoin rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))]
fullOuterJoin fullOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], Option[W]))]
reduceByKey reduceByKey(func: (V, V) ⇒ V): RDD[(K, V)]
combineByKey* (mergeCombiners) combineByKey[C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C): RDD[(K, C)]

* - Only the parameter in parentheses (mergeCombiners) is mutated; the rest of the call remains the same.
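
For example, a combineByKey call in the supported format, with explicitly typed anonymous functions; under the rule above, only the mergeCombiners function (the third argument) would be mutated. The object and method names are illustrative:

import org.apache.spark.rdd.RDD

object CombineByKeySketch {

	// Computes, per key, the sum and the count of the values; only the third
	// anonymous function (mergeCombiners) is subject to mutation.
	def sumAndCountPerKey(pairs: RDD[(String, Double)]): RDD[(String, (Double, Int))] =
		pairs.combineByKey(
			(v: Double) => (v, 1),                                        // createCombiner
			(acc: (Double, Int), v: Double) => (acc._1 + v, acc._2 + 1),  // mergeValue
			(a: (Double, Int), b: (Double, Int)) => (a._1 + b._1, a._2 + b._2) // mergeCombiners
		)
}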
