1.3 Tasks of Data Mining
Long:
1. 1.6 Classification of Data Mining Systems:
The data mining system can be classified according to the following criteria:
Database Technology
Statistics
Machine Learning
Information Science
Visualization
Other Disciplines
DEPT OF CSE & IT
VSSUT, Burla
Some Other Classification Criteria:
Classification according to kind of databases mined
Classification according to kind of knowledge mined
Classification according to kinds of techniques utilized
Classification according to applications adapted
Classification according to kind of databases mined
A data mining system can be classified according to the kind of databases mined. Database
systems can themselves be classified by criteria such as data model or type of data, and
the data mining system can be classified accordingly. For example, if we classify
databases by data model, we may have a relational, transactional, object-relational, or
data warehouse mining system.
Classification according to kind of knowledge mined
A data mining system can be classified according to the kind of knowledge mined. This
means that data mining systems are classified on the basis of functionalities such as:
Characterization
Discrimination
Association and Correlation Analysis
Classification
Prediction
Clustering
Outlier Analysis
Evolution Analysis
Classification according to kinds of techniques utilized
A data mining system can be classified according to the kinds of techniques used. These
techniques can be described by the degree of user interaction involved or by the
methods of analysis employed.
Classification according to applications adapted
A data mining system can be classified according to the application it is adapted to.
These applications are as follows:
Finance
Telecommunications
DNA
Stock Markets
E-mail
2. 1.7 Major Issues in Data Mining:
Mining different kinds of knowledge in databases. - Different users have different
needs and may be interested in different kinds of knowledge. Therefore data mining
must cover a broad range of knowledge discovery tasks.
Interactive mining of knowledge at multiple levels of abstraction. - The data mining
process needs to be interactive so that users can focus the search for patterns,
providing and refining data mining requests based on the returned results.
Incorporation of background knowledge. - Background knowledge can be used to guide
the discovery process and to express the discovered patterns, not only in concise terms
but also at multiple levels of abstraction.
Data mining query languages and ad hoc data mining. - A data mining query language
that allows the user to describe ad hoc mining tasks should be integrated with a data
warehouse query language and optimized for efficient and flexible data mining.
Presentation and visualization of data mining results. - Once patterns are
discovered, they need to be expressed in high-level languages and visual representations.
These representations should be easily understandable by the users.
Handling noisy or incomplete data. - Data cleaning methods are required that can
handle noise and incomplete objects while mining data regularities. Without such
methods, the accuracy of the discovered patterns will be poor.
Pattern evaluation. - This refers to the interestingness of the discovered patterns. Many
discovered patterns are uninteresting because they either represent common knowledge
or lack novelty, so the patterns that are truly interesting need to be identified.
Efficiency and scalability of data mining algorithms. - To effectively extract
information from the huge amounts of data in databases, data mining algorithms must be
efficient and scalable.
Parallel, distributed, and incremental mining algorithms. - Factors such as the huge
size of databases, the wide distribution of data, and the complexity of data mining methods
motivate the development of parallel and distributed data mining algorithms. These
algorithms divide the data into partitions, which are processed in parallel, and the
results from the partitions are then merged. Incremental algorithms incorporate database
updates without having to mine the data again from scratch.
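The partition, process, and merge steps, together with an incremental update, can be
sketched in Python. This is only an illustration under simple assumptions: the mining
task is reduced to counting item occurrences, the function names (count_items, split,
parallel_counts, incremental_update) and the sample transactions are hypothetical, and
threads are used for brevity where a real system would more likely use separate
processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count how often each item occurs in one partition of transactions."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def split(transactions, n_parts):
    """Divide the transactions into n_parts roughly equal partitions."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def parallel_counts(transactions, n_parts=2):
    """Process the partitions in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, split(transactions, n_parts)))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds the partial counts to the running total
    return merged


def incremental_update(counts, new_transactions):
    """Fold new transactions into existing counts without rescanning old data."""
    updated = Counter(counts)
    updated.update(count_items(new_transactions))
    return updated


# Hypothetical sample data
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
    ["milk"],
]
counts = parallel_counts(transactions, n_parts=2)
counts = incremental_update(counts, [["bread", "jam"]])
```

The merge step works here because item counts from separate partitions can simply be
added. Mining tasks whose partial results cannot be combined this directly need a more
elaborate merge.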
3. What is Data Warehousing?
Data warehousing is defined as a technique for collecting and managing data from
varied sources to provide meaningful business insights. It is a blend of technologies and
components that aids the strategic use of data.
The decision support database (Data Warehouse) is maintained separately from the
organization's operational database. However, the data warehouse is not a product but
an environment. It is an architectural construct of an information system which provides
users with current and historical decision support information which is difficult to access
or present in the traditional operational data store.
The data warehouse is the core of the BI system which is built for data analysis and
reporting.
You may know that a 3NF-designed database for an inventory system may have many
tables related to each other. For example, a report on current inventory information can
include more than 12 join conditions. This can quickly slow down the response time
of the query and report. A data warehouse provides a new design that helps to
reduce the response time and enhance the performance of queries for reports
and analytics.
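The difference between the two designs can be illustrated with a small sketch using
Python's built-in sqlite3 module. The table and column names (product, category, stock,
inventory_flat) and the sample rows are assumptions made up for this example. A real
inventory schema would involve many more tables and joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (3NF-style) operational tables
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE stock    (product_id INTEGER, quantity INTEGER);

    INSERT INTO category VALUES (1, 'Dairy'), (2, 'Bakery');
    INSERT INTO product  VALUES (10, 'Milk', 1), (11, 'Butter', 1), (12, 'Bread', 2);
    INSERT INTO stock    VALUES (10, 40), (11, 15), (12, 25);
""")

# Operational report: the query must join the normalized tables every time.
normalized_report = conn.execute("""
    SELECT c.category, SUM(s.quantity)
    FROM stock s
    JOIN product p  ON p.product_id = s.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()

# Warehouse-style table: the joins are performed once, when the data is loaded.
conn.execute("""
    CREATE TABLE inventory_flat AS
    SELECT p.name AS product, c.category AS category, s.quantity AS quantity
    FROM stock s
    JOIN product p  ON p.product_id = s.product_id
    JOIN category c ON c.category_id = p.category_id
""")

# The same report now reads a single table with no joins.
flat_report = conn.execute("""
    SELECT category, SUM(quantity)
    FROM inventory_flat
    GROUP BY category
    ORDER BY category
""").fetchall()
```

Both queries return the same totals per category. The second one simply moves the join
cost from report time to load time.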
History of the Data Warehouse
A data warehouse helps users understand and enhance their organization's
performance. The need to warehouse data evolved as computer systems became more
complex and needed to handle increasing amounts of information. However, data
warehousing is not a new thing.
1960 - Dartmouth and General Mills, in a joint research project, develop the terms
dimensions and facts.
1970 - ACNielsen and IRI introduce dimensional data marts for retail sales.
1983 - Teradata Corporation introduces a database management system that
is specifically designed for decision support.
Data warehousing started in the late 1980s, when IBM workers Paul Murphy and
Barry Devlin developed the Business Data Warehouse.
However, the real concept was given by Bill Inmon. He is considered the
father of the data warehouse. He has written about a variety of topics on the building,
usage, and maintenance of the warehouse and the Corporate Information Factory.
The data arriving in a data warehouse may be:
1. Structured
2. Semi-structured
3. Unstructured
The data is processed, transformed, and ingested so that users can access the
processed data in the Data Warehouse through Business Intelligence tools, SQL
clients, and spreadsheets. A data warehouse merges information coming from different
sources into one comprehensive database.
By merging all of this information in one place, an organization can analyze its
customers more holistically. This helps to ensure that it has considered all the
information available. Data warehousing makes data mining possible. Data mining
looks for patterns in the data that may lead to higher sales and profits.
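As a rough illustration of merging different sources into one store, the sketch below
transforms a structured CSV source and a semi-structured JSON source into a single
per-customer record. The source contents, the field names, and the load_warehouse
function are hypothetical. A production warehouse would use a dedicated ETL tool and a
database instead of an in-memory dictionary.

```python
import csv
import io
import json

# Hypothetical source 1: structured CSV export from a sales system.
sales_csv = "customer_id,amount\n1,120.50\n2,80.00\n1,30.25\n"

# Hypothetical source 2: semi-structured JSON from a support system.
support_json = (
    '[{"customer": {"id": 1}, "tickets": 2},'
    ' {"customer": {"id": 3}, "tickets": 1}]'
)


def load_warehouse(sales_text, support_text):
    """Transform both sources to one shape and merge them per customer."""
    warehouse = {}

    def row_for(customer_id):
        # Create the customer's record on first sight, then reuse it.
        return warehouse.setdefault(
            customer_id, {"total_sales": 0.0, "tickets": 0}
        )

    for record in csv.DictReader(io.StringIO(sales_text)):
        row_for(int(record["customer_id"]))["total_sales"] += float(record["amount"])

    for record in json.loads(support_text):
        row_for(record["customer"]["id"])["tickets"] += record["tickets"]

    return warehouse


warehouse = load_warehouse(sales_csv, support_json)
```

After loading, each customer has one record that combines sales and support data, which
is the kind of holistic view the text describes.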
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular
line of business, such as sales or finance. In an independent data mart, data
can be collected directly from sources.
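The idea of a data mart as a subset of the warehouse can be sketched as a simple filter.
The rows, the department field, and the build_data_mart function are assumptions for
illustration only. The sketch shows a mart derived from the warehouse, not an
independent data mart loaded directly from sources.

```python
# Hypothetical warehouse rows covering several lines of business
warehouse_rows = [
    {"department": "sales", "region": "east", "amount": 1200},
    {"department": "finance", "region": "east", "amount": 300},
    {"department": "sales", "region": "west", "amount": 950},
    {"department": "hr", "region": "west", "amount": 75},
]


def build_data_mart(rows, department):
    """Return only the warehouse rows that belong to one line of business."""
    return [row for row in rows if row["department"] == department]


sales_mart = build_data_mart(warehouse_rows, "sales")
```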