M.E.-ISE-2023-25-60 PIS E31-RSA-Best Practices in Data Mining
Data mining can be helpful to human resources (HR) departments in identifying the characteristics of
their most successful employees. Information obtained – such as universities attended by highly
successful employees – can help HR focus recruiting efforts accordingly. Additionally, Strategic
Enterprise Management applications help a company translate corporate-level goals, such as profit
and margin share targets, into operational decisions, such as production plans and workforce levels.
This process is essential in transforming large volumes of raw data (structured, unstructured, or
semi-structured) into valuable, actionable knowledge. Steps 1 to 4 fall under the data
preprocessing stage. Here, data mining is represented as a single step, but the term also refers to
the entire knowledge discovery process. While mining a database, we can search for trends and data
patterns. For example, the maximum and minimum marks scored by students, or an analysis of
sales data, can be readily obtained.
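The marks and sales examples above can be sketched in a few lines of Python. This is only a minimal illustration: the student names, marks, and monthly sales figures are made-up values, and the "trend" check is a deliberately simple month-over-month comparison rather than a real mining algorithm.

```python
# Hypothetical student marks; names and values are illustrative only.
marks = {"Asha": 78, "Ravi": 91, "Meena": 64, "John": 85}

# Student with the maximum and the minimum marks.
highest = max(marks, key=marks.get)
lowest = min(marks, key=marks.get)

# Hypothetical monthly sales totals, in chronological order.
monthly_sales = [120, 135, 150, 160]

# Month-over-month change: positive values mean sales grew.
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]

# A very simple trend test: sales rose in every month.
is_upward_trend = all(change > 0 for change in changes)

print(highest, marks[highest])
print(lowest, marks[lowest])
print(changes, is_upward_trend)
```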
#2) Data Warehouse Data: A data warehouse (DW) is a collection of information gathered from multiple
data sources and stored under a unified schema at a single site. A DW is modelled as a multidimensional
data structure called a data cube, whose cells and dimensions support precomputation and faster
access to data.
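To make the idea of a data cube concrete, the sketch below precomputes totals for every combination of dimension values, including rolled-up cells. It assumes three hypothetical dimensions (product, region, quarter) and one measure (sales amount); the `ALL` marker and the `build_cube` helper are names invented for this illustration, not part of any warehouse product.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (product, region, quarter, sales_amount).
facts = [
    ("pen", "north", "Q1", 100),
    ("pen", "south", "Q1", 50),
    ("book", "north", "Q1", 200),
    ("pen", "north", "Q2", 70),
]

ALL = "*"  # marker meaning "aggregated over this dimension"


def build_cube(rows):
    """Precompute the sum of the measure for every cell of the cube."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        # Each dimension is either kept as-is or rolled up to ALL,
        # so one fact row contributes to 2 ** len(dims) cells.
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple(ALL if rolled else value for value, rolled in zip(dims, mask))
            cube[cell] += measure
    return dict(cube)


cube = build_cube(facts)

# Lookups are now plain dictionary reads instead of scans over the facts.
print(cube[("pen", ALL, ALL)])       # total pen sales
print(cube[(ALL, "north", "Q1")])    # total sales in the north in Q1
print(cube[(ALL, ALL, ALL)])         # grand total
```

The trade-off is the usual one for precomputation: building the cube costs time and memory up front, in exchange for fast answers to aggregate queries afterwards.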
#3) Transactional Data: Each record in transactional data captures a transaction. It has a transaction ID
and a list of the items in that transaction.
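A transactional data set can be represented as a mapping from transaction ID to item list. The sketch below uses made-up IDs and items, and counts how often single items and item pairs occur, which is the kind of raw count that frequent-pattern mining starts from. It is an illustration under those assumptions, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: transaction ID -> list of items.
transactions = {
    "T100": ["bread", "milk", "eggs"],
    "T200": ["bread", "milk"],
    "T300": ["milk", "eggs"],
}

# Number of transactions that contain each item (duplicates within a
# transaction are counted once).
item_counts = Counter(
    item for items in transactions.values() for item in set(items)
)

# Number of transactions that contain each unordered pair of items.
pair_counts = Counter(
    pair
    for items in transactions.values()
    for pair in combinations(sorted(set(items)), 2)
)

print(item_counts)
print(pair_counts)
```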
#4) Other kinds of Data: Other data can include: time-related data, spatial data, hypertext data, and
multimedia data.
Data mining is a highly application-driven domain. Techniques from many fields, such as statistics,
machine learning, pattern recognition, information retrieval (IR), and visualization, influence the
development of data analysis methods.
By combining data analysis with IR, we can find the major topics in a collection of
documents and also the major topics covered by each document.
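As a rough illustration of finding the prominent terms in each document, the sketch below scores words with a basic TF-IDF weighting (term frequency times the log of inverse document frequency). This is a crude stand-in for real topic discovery: the three documents are invented, tokenization is a plain whitespace split, no stop words are removed, and `top_terms` is a hypothetical helper name.

```python
import math
from collections import Counter

# Hypothetical mini-collection of documents.
docs = {
    "d1": "data mining finds patterns in data",
    "d2": "data warehouse stores data cubes",
    "d3": "mining patterns from transactions",
}

# Naive tokenization: lower-cased text split on whitespace.
tokenized = {name: text.lower().split() for name, text in docs.items()}

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
n_docs = len(docs)


def top_terms(tokens, k=2):
    """Return the k terms with the highest TF-IDF score in one document."""
    term_freq = Counter(tokens)
    scores = {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


for name, tokens in tokenized.items():
    print(name, top_terms(tokens))
```

Terms that occur in every document get a score of zero, so words shared by the whole collection are pushed down and terms specific to one document rise to the top.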
BEST PRACTICES IN DATA MINING:
Data collection and preprocessing are crucial to data mining. They involve cleaning and organizing
data to prepare it for evaluation so your data mining tools can understand it. These steps help
ensure your efforts produce results that match your objectives.
Gather relevant data from various sources, such as your databases, spreadsheets, logs, etc. Then,
preprocess your raw data:
Clean: Fix any errors, including typos, duplicate entries, and inconsistencies.
Normalize and standardize: Make the variables in your data comparable, which is important
for many data mining techniques, particularly those based on distances. Normalization
(min-max scaling) rescales data to a range between 0 and 1, and standardization transforms
data to have a mean of 0 and a standard deviation of 1.
Transform: Adjust data to meet specific project needs. This could involve combining data,
creating new variables, or encoding (e.g., turning words or categories into numbers so a
computer can understand them better).
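The preprocessing steps above can be sketched with the Python standard library alone. The records, field names, and city values below are hypothetical, and the choices made (exact-duplicate removal, min-max scaling, z-scores using the population standard deviation, one-hot encoding) are just one reasonable reading of each step.

```python
from statistics import mean, pstdev

# Hypothetical raw records with a duplicate row and inconsistent text.
raw = [
    {"name": "Asha", "city": "Pune ", "income": 30000},
    {"name": "Ravi", "city": "mumbai", "income": 50000},
    {"name": "Ravi", "city": "mumbai", "income": 50000},  # duplicate entry
    {"name": "Meena", "city": "Pune", "income": 70000},
]

# Clean: trim whitespace, unify casing, and drop exact duplicates
# while preserving the original order.
cleaned, seen = [], set()
for row in raw:
    fixed = {**row, "city": row["city"].strip().title()}
    key = tuple(sorted(fixed.items()))
    if key not in seen:
        seen.add(key)
        cleaned.append(fixed)

incomes = [row["income"] for row in cleaned]

# Normalize: min-max scaling into [0, 1].
# Assumes the values are not all identical (otherwise hi - lo is zero).
lo, hi = min(incomes), max(incomes)
normalized = [(x - lo) / (hi - lo) for x in incomes]

# Standardize: z-scores with mean 0 and standard deviation 1
# (population standard deviation is used here).
mu, sigma = mean(incomes), pstdev(incomes)
standardized = [(x - mu) / sigma for x in incomes]

# Transform / encode: one-hot encode the categorical "city" field.
cities = sorted({row["city"] for row in cleaned})
one_hot = [[1 if row["city"] == c else 0 for c in cities] for row in cleaned]

print(normalized)
print(standardized)
print(cities, one_hot)
```

In practice a library such as pandas or scikit-learn would usually handle these steps, but the arithmetic is the same as shown here.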
Use a tool like Excel or Google Sheets for basic, manual cleaning to make it easier and quicker for
your team. Or you can use a more advanced platform like Trifacta for complex, automated data
preprocessing.