Open source analytics: an introduction to the special issue
This special issue contains six articles on open source analytics. It includes an article describing the Weka data mining system, two articles on infrastructure to support analytics, an article on the PMML standard for statistical and data mining models,...
What is analytic infrastructure and why should you care?
We define analytic infrastructure to be the services, applications, utilities and systems that are used for preparing data for modeling, estimating models, validating models, scoring data, or related activities. For example, analytic ...
The WEKA data mining software: an update
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread ...
What's PMML and what's new in PMML 4.0?
The Predictive Model Markup Language (PMML) data mining standard has arguably become one of the most widely adopted data mining standards in use today. Two years in the making, the latest release of PMML contains several new features and many ...
KNIME - the Konstanz information miner: version 2.0 and beyond
- Michael R. Berthold
- Nicolas Cebron
- Fabian Dill
- Thomas R. Gabriel
- Tobias Kötter
- Thorsten Meinl
- Peter Ohl
- Kilian Thiel
- Bernd Wiswedel
The Konstanz Information Miner is a modular environment, which enables easy visual assembly and interactive execution of a data pipeline. It is designed as a teaching, research and collaboration platform, which enables simple integration of new ...
Efficient deployment of predictive analytics through open standards and cloud computing
Over the past decade, we have seen tremendous interest in the application of data mining and statistical algorithms, first in research and science and, more recently, across various industries. This has translated into the development of a myriad of ...
Development and user experiences of an open source data cleaning, deduplication and record linkage system
Record linkage, also known as database matching or entity resolution, is now recognised as a core step in the KDD process. Data mining projects increasingly require that information from several sources is combined before the actual mining can be ...
Challenging research issues in data mining, databases and information retrieval
Data mining research, along with related fields such as databases and information retrieval, poses challenging problems, especially for doctoral students. The research spreads over a variety of topics such as text mining, semantic web, multilingual ...
Correlation clustering
This is a short summary of the author's thesis on "Correlation Clustering" (Ludwig-Maximilians-Universität München, Germany, 2008). The complete thesis is available at http://edoc.ub.uni-muenchen.de/8736/.
Adaptive learning and mining for data streams and frequent patterns
This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification ...