TRANSMUT-Spark is an sbt plugin for mutation testing of Apache Spark programs written in Scala. It applies transformation mutation, inserting faults into the set of transformations called in a Spark program. TRANSMUT-Spark runs the entire mutation testing process: it generates the mutants, runs the tests for each mutant, analyzes the test results to identify killed and lived (survived) mutants, and generates reports with the results and metrics. The TRANSMUT-Spark plugin is based on the Stryker4s sbt plugin.
- sbt 1.3+
- Scala 2.12.x
- SemanticDB enabled (`semanticdbEnabled := true`)
- Spark programs using only the RDD (Core) API
The TRANSMUT-Spark plugin has not yet been published to a remote repository. To use it, clone this project and publish the plugin locally by running `sbt publishLocal` inside the TRANSMUT-Spark project folder. After that, the plugin can be added to local projects.
To install the TRANSMUT-Spark plugin in a project, add the following line to your `project/plugins.sbt`:

    addSbtPlugin("br.ufrn.dimap.forall" % "sbt-transmut" % "0.1-SNAPSHOT")
To use TRANSMUT-Spark, enter `transmut` in the sbt console, or run `sbt transmut` in the terminal, inside the folder of the project with the programs to be mutated. This triggers the execution of TRANSMUT-Spark, which looks for the `transmut.conf` configuration file in the root of the project and runs the entire process.
Additionally, there is also the `transmutAlive` command, which can be executed after a first execution of `transmut`. The `transmutAlive` command takes the results of the last execution of TRANSMUT-Spark and runs again only the mutants that lived (survived) in that execution. This command is useful to analyze only the lived mutants without having to run all the generated mutants. To run `transmutAlive`, the options `sources`, `programs`, `mutation-operators`, `enable-reduction` and `reduction-rules` must be the same as in the configuration of the last run. In addition, the `transmutAlive` command can also be used to force the execution of mutants that were removed by the reduction module; for this, a list of mutants whose execution should be forced must be added to the configuration (`force-execution`). Thus, the `transmutAlive` command only executes if there are lived mutants from the last execution or removed mutants to force execution; otherwise the command does nothing.
A successful execution of TRANSMUT-Spark generates the folder `transmut-$datetime` inside `target/` (or another folder defined by `transmut-base-dir` in the configuration), containing the following content:
- `mutants/` - Folder with the meta-mutant sources generated by TRANSMUT-Spark. A meta-mutant is a source file that aggregates all the mutants generated for that source in a single piece of code. To create the meta-mutant, we apply the mutation switching technique used by Stryker4s (a sketch of this technique is shown below, after the list of folders).
- `mutated-src/` - Folder with a copy of the project's Scala source folder (`src/main/scala/`, or another folder defined by `src-dir` in the configuration) containing the mutated sources. To avoid modifying the original source code, we copy the folder and mutate the sources inside the copy. This folder therefore contains both the modified sources and the other sources that were not modified in the process. To see only the modified sources, look at the source code in the `mutants/` folder.
- `reports/` - Folder with the reports generated by TRANSMUT-Spark.
- `reports/html/` - Folder with the HTML reports generated by TRANSMUT-Spark.
- `reports/json/` - Folder with the JSON reports generated by TRANSMUT-Spark.
- `transmut.conf` - Copy of the configuration file that was used in this run.
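To give an idea of what mutation switching looks like, here is a schematic sketch (this is not the exact code generated by TRANSMUT-Spark): all mutants of a program coexist in the meta-mutant source, and an identifier chosen at run time (here, hypothetically, a system property) selects which mutant is active.

```scala
// Schematic sketch of mutation switching -- illustrative only,
// not the code that TRANSMUT-Spark actually generates.
import org.apache.spark.rdd.RDD

object MetaMutantWordCount {
  // Hypothetical switch: the id of the active mutant is read from a system property;
  // 0 (or no property) runs the original program.
  private val activeMutant: Int = sys.props.getOrElse("ACTIVE_MUTANT", "0").toInt

  def wordCount(input: RDD[String]): RDD[(String, Int)] = {
    val words = input.flatMap( (line: String) => line.split(" ") )
    val pairs = words.map( (word: String) => (word, 1) )
    activeMutant match {
      case 1 => pairs.reduceByKey( (a: Int, b: Int) => a )                // hypothetical ATR-style mutant
      case 2 => pairs.reduceByKey( (a: Int, b: Int) => b )                // hypothetical ATR-style mutant
      case 3 => pairs.reduceByKey( (a: Int, b: Int) => a + b ).distinct() // hypothetical DTI mutant
      case _ => pairs.reduceByKey( (a: Int, b: Int) => a + b )            // original program
    }
  }
}
```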
The configuration of TRANSMUT-Spark is set in the `transmut.conf` file in the root of the project. This file is mandatory because at least one program source and one program must be specified to run the process. The file is in the HOCON format and all configuration options must be in the "transmut" namespace. For example:

    transmut {
      sources: [ "WordCount.scala" ],
      programs: [ "wordCount" ]
    }
`sources`
- Description: list of the file names of the program source codes (Scala source files containing the programs to be mutated). Only the file name must be entered, not the full path.
- Mandatory: Yes
- Example: `sources: [ "WordCount.scala" ]`
`programs`
- Description: list of the programs (methods) to be mutated. For TRANSMUT-Spark, a Spark program must be encapsulated in a method to be mutated. The programs in the list must exist in one of the sources. Only the methods in the list are mutated; other methods and statements in the sources remain unchanged.
- Mandatory: Yes
- Example: `programs: [ "wordCount" ]`
- Warning: if a program in the list is not found in any source, the execution will fail.
`mutation-operators`
- Description: list of mutation operators to be applied in the mutation testing process. Only the mutation operators in the list are applied to generate mutants.
- Mandatory: No
- Default: `[ "ALL" ]`
- Example: `mutation-operators: [ "DTI", "ATR", "JTR" ]`
`equivalent-mutants`
- Description: list of IDs of equivalent mutants. This list should be updated after a first run of the tool and an analysis of the survived mutants, adding those identified as equivalent. If the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and the list of reduction rules are not changed, different executions of TRANSMUT-Spark will always generate the same mutants with the same IDs. Mutants in this list are not executed in the tests and are marked as equivalent in the reports.
- Mandatory: No
- Default: `[]`
- Example: `equivalent-mutants: [ 3, 5, 6 ]`
- Warning: if the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and/or the list of reduction rules are changed after a run, the mutants and IDs generated in a new run may differ from those of the previous run.
`test-only`
- Description: list of test classes to be executed. If the list is empty, all tests of the project are executed. The fully qualified name of the test classes must be entered (package + class name).
- Mandatory: No
- Default: `[]`
- Example: `test-only: [ "examples.WordCountTest" ]`
- Warning: running all tests of the project (when `test-only` is empty) can be slow on big projects or on projects with many tests that are not related to the programs to be mutated. In this case, specifying the test classes to be executed is recommended to speed up the process.
`enable-reduction`
- Description: flag (Boolean value) indicating whether the mutant reduction module will be executed.
- Mandatory: No
- Default: `true`
- Example: `enable-reduction: false`
- Warning: the reduction module is only executed if the flag is true. In addition, if the `transmutAlive` command is executed, the value of `enable-reduction` must be the same as in the previous execution.
`reduction-rules`
- Description: list of reduction rules to be applied by the mutant reduction module. Only the reduction rules in the list are applied by the module.
- Mandatory: No
- Default: `[ "ALL" ]`
- Example: `reduction-rules: [ "UTDE", "DTIE" ]`
- Warning: these reduction rules are applied only if `enable-reduction: true`.
`force-execution`
- Description: list of IDs of mutants that were removed by the reduction module in a previous run and whose execution you want to force. This list should be updated after a first run of the tool and an analysis of the removed mutants, adding those identified as relevant. If the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and the list of reduction rules are not changed, different executions of TRANSMUT-Spark will always generate the same mutants with the same IDs (including the removed mutants). Mutants in this list leave the list of removed mutants and are forced to run.
- Mandatory: No
- Default: `[]`
- Example: `force-execution: [ 3, 5, 6 ]`
- Warning: if the list of programs, the original program code, the list of mutation operators, the enable-reduction flag and/or the reduction rules are changed after a run, the mutants and IDs generated in a new run may differ from those of the previous run. The removed mutants in this list are only forced to execute with the `transmutAlive` command.
`src-dir`
- Description: the directory containing the sources of the programs to be mutated.
- Mandatory: No
- Default: `'src/main/scala/'` (`scalaSource` from sbt)
- Example: `src-dir: 'other-src-folder/main/scala/'`
`semanticdb-dir`
- Description: the directory containing the SemanticDB specifications generated by the SemanticDB compiler plugin. TRANSMUT-Spark depends on the SemanticDB compiler plugin, so SemanticDB must be enabled in the project settings (`semanticdbEnabled := true` in the `build.sbt` file).
- Mandatory: No
- Default: `'target/scala-2.12/meta/'` (`semanticdbTargetRoot` from sbt)
- Example: `semanticdb-dir: 'target/scala-2.12/other-meta/'`
- Warning: if the SemanticDB compiler plugin is not enabled in the project settings (`semanticdbEnabled := true` in the `build.sbt` file), the execution will fail.
`transmut-base-dir`
- Description: the directory where the TRANSMUT-Spark folder containing the sources and reports will be created. The generated folder is named "transmut-$datetime", where $datetime is the date and time the process started, in the format "yyyyMMddHHmmss".
- Mandatory: No
- Default: `'target/'`
- Example: `transmut-base-dir: 'other-transmut-base-folder/'`
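Putting the options together, a fuller `transmut.conf` might look like the sketch below (all values are illustrative and should be adapted to your project):

```
transmut {
  sources: [ "WordCount.scala" ],
  programs: [ "wordCount" ],
  mutation-operators: [ "ALL" ],
  equivalent-mutants: [],
  test-only: [ "examples.WordCountTest" ],
  enable-reduction: true,
  reduction-rules: [ "ALL" ],
  src-dir: 'src/main/scala/',
  transmut-base-dir: 'target/'
}
```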
List of mutation operators supported by TRANSMUT-Spark.
Operator | Description |
---|---|
UTS | Unary Transformation Swap |
BTS | Binary Transformation Swap |
UTR | Unary Transformation Replacement |
BTR | Binary Transformation Replacement |
UTD | Unary Transformation Deletion |
MTR | Mapping Transformation Replacement |
FTD | Filter Transformation Deletion |
NFTP | Negation of Filter Transformation Predicate |
STR | Set Transformation Replacement |
DTD | Distinct Transformation Deletion |
DTI | Distinct Transformation Insertion |
ATR | Aggregation Transformation Replacement |
JTR | Join Transformation Replacement |
OTD | Order Transformation Deletion |
OTI | Order Transformation Inversion |
In the list of mutation operators in the configuration (`mutation-operators`), use "ALL" as an alias to apply all mutation operators.
The description and more details about these mutation operators can be found in the paper Mutation Operators for Large Scale Data Processing Programs in Spark.
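As a concrete but illustrative example of what individual mutants look like (the code generated by the tool may differ in form), the sketch below shows an original filter statement next to an NFTP mutant, which negates the filter predicate, and a DTI mutant, which inserts a `distinct` transformation:

```scala
import org.apache.spark.rdd.RDD

// Illustrative sketches of two mutants; the method names are hypothetical.
object MutantExamples {
  // Original program fragment: keep only the non-empty lines
  def original(lines: RDD[String]): RDD[String] =
    lines.filter( (a: String) => !a.isEmpty )

  // NFTP mutant: the filter predicate is negated
  def nftpMutant(lines: RDD[String]): RDD[String] =
    lines.filter( (a: String) => !(!a.isEmpty) )

  // DTI mutant: a distinct transformation is inserted after the filter
  def dtiMutant(lines: RDD[String]): RDD[String] =
    lines.filter( (a: String) => !a.isEmpty ).distinct()
}
```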
TRANSMUT-Spark has a module that applies reduction rules to reduce the number of mutants to be executed in the process. The objective of this module is to reduce the number of equivalent, redundant and inefficient mutants. Note that this does not mean that all equivalent, redundant or inefficient mutants will be removed, but only those that match the reduction rules. The reduction rules were defined based on the results of experiments and on a theoretical analysis of some mutation operators. The reduction module is optional; it is only executed if `enable-reduction: true`. The list of reduction rules follows below; the reduction rules to be applied can be defined in the configuration (`reduction-rules`).
Rule | Description |
---|---|
UTDE | Removes mutants generated with the mutation operators FTD, DTD and OTD when the UTD operator has also been applied. |
MTRR | Removes the following mutants generated with the MTR operator: mutants that map to |
FTDS | Removes mutants generated with the mutation operator NFTP when the FTD or UTD operators have also been applied. |
OTDS | Removes mutants generated with the mutation operator OTI when the OTD or UTD operators have also been applied. |
DTIE | Removes mutants generated with the mutation operator DTI when the distinct transformation has been inserted after grouping or aggregation transformations. |
ATRC | Removes the commutative replacement mutants ($f_m(x,y) = f(y, x)$) generated with the ATR mutation operator. |
In the list of reduction rules in the configuration (`reduction-rules`), use "ALL" as an alias to apply all reduction rules.
The removed mutants can be seen in the reports generated by the tool. They appear in a separate list because they have not been executed. The removed mutants can be forced to execute using the `transmutAlive` command and the `force-execution` parameter in the configuration. This list should only contain mutants that have been removed, and it is independent of the list of equivalent mutants.
- References (parameters, variables and values) must have unique names;
- Programs to be mutated must be encapsulated in methods;
- All RDDs must have their own reference (they must be declared as a parameter, variable or value);
- Only one transformation must be called per statement:
  - For example: `val rdd2 = rdd.filter( (a: String) => !a.isEmpty )`
- Anonymous (lambda) functions must have their input parameters explicitly typed:
  - Incorrect: `rdd.map( a => a * a )`
  - Correct: `rdd.map( (a: Int) => a * a )`
Example of a traditional Spark program that is not in the format supported by TRANSMUT-Spark:

    package examples

    import org.apache.spark._

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
        conf.setAppName("Word-Count")
        val sc = new SparkContext(conf)
        val textFile = sc.textFile("hdfs://...")
        val counts = textFile.flatMap(line => line.split(" "))
                             .map(word => (word, 1))
                             .reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs://...")
      }
    }
Example of the same program in a version supported by TRANSMUT-Spark:

    package examples

    import org.apache.spark._
    import org.apache.spark.rdd.RDD

    object WordCount {

      def wordCount(input: RDD[String]) = {
        val words = input.flatMap( (line: String) => line.split(" ") )
        val pairs = words.map( (word: String) => (word, 1) )
        val counts = pairs.reduceByKey( (a: Int, b: Int) => a + b )
        counts
      }

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
        conf.setAppName("Word-Count")
        val sc = new SparkContext(conf)
        val textFile = sc.textFile("hdfs://...")
        val results = wordCount(textFile)
        results.saveAsTextFile("hdfs://...")
      }
    }
Considering the example above, only the method `wordCount` should be included as a program to be mutated in the TRANSMUT-Spark configuration. Open the example project to see more examples of programs in the format supported by TRANSMUT-Spark and to run the plugin.
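For completeness, below is a minimal sketch of a test class for the `wordCount` program above; such a class is what TRANSMUT-Spark runs against each mutant, and it could be listed in the `test-only` option. The sketch assumes ScalaTest 3.1+ and a local SparkContext; the package and class names are only illustrative:

```scala
package examples

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.funsuite.AnyFunSuite

// Illustrative test class; a real suite would also stop the SparkContext after the tests.
class WordCountTest extends AnyFunSuite {

  private val sc = new SparkContext(
    new SparkConf().setAppName("word-count-test").setMaster("local[2]")
  )

  test("wordCount counts the occurrences of each word") {
    val input  = sc.parallelize(Seq("a b", "b c", "c"))
    val result = WordCount.wordCount(input).collect().toMap
    // words: a, b, b, c, c
    assert(result == Map("a" -> 1, "b" -> 2, "c" -> 2))
  }
}
```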
List of RDD transformations that can be mutated by TRANSMUT-Spark. Transformations that are not in this list may appear in the program, but they will not be modified in the process.
Transformation | Interface |
---|---|
map | map[U](f: (T) ⇒ U): RDD[U] |
flatMap | flatMap[U](f: (T) ⇒ TraversableOnce[U]): RDD[U] |
filter | filter(f: (T) ⇒ Boolean): RDD[T] |
distinct | distinct(): RDD[T] |
sortBy | sortBy[K](f: (T) ⇒ K, ascending: Boolean = true): RDD[T] |
sortByKey | sortByKey(ascending: Boolean = true): RDD[(K, V)] |
union | union(other: RDD[T]): RDD[T] |
intersection | intersection(other: RDD[T]): RDD[T] |
subtract | subtract(other: RDD[T]): RDD[T] |
join | join[W](other: RDD[(K, W)]): RDD[(K, (V, W))] |
leftOuterJoin | leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))] |
rightOuterJoin | rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))] |
fullOuterJoin | fullOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], Option[W]))] |
reduceByKey | reduceByKey(func: (V, V) ⇒ V): RDD[(K, V)] |
combineByKey* (mergeCombiners) | combineByKey[C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C): RDD[(K, C)] |
* - Only the parameter indicated in parentheses is mutated; the rest of the transformation remains unchanged.