Unstructured Data Classification
Which preprocessing technique is used to remove the most commonly used words? Stopword removal
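Stopword removal can be sketched in a few lines of plain Python. The stopword set below is a tiny hand-picked assumption for illustration, and `remove_stopwords` is a hypothetical helper name; real projects usually load a longer list from an NLP library.

```python
# A tiny, hand-picked stopword set (an assumption for illustration only).
STOPWORDS = {"the", "it", "is", "a", "an", "and", "of", "to", "in"}

def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

tokens = remove_stopwords("The movie is a triumph and it moved me")
print(tokens)
```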
The cross-validation technique is used to evaluate a classifier by dividing the data set into a training set to train
the classifier and a testing set to test it. True
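One common form, k-fold cross-validation, repeats the train/test split so that every sample is used for testing exactly once. The following is a minimal sketch of how the index sets could be generated; `k_fold_indices` is a hypothetical helper, not a library function, and it does not shuffle the data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample lands in one test fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```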
True Negative is when the predicted instance and the actual instance are both positive. False (a true negative is when both are negative)
True Positive is when the predicted instance and the actual instance are both not negative. True
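The four outcomes of a binary prediction can be counted directly. This is a small sketch on made-up labels; `confusion_counts` is a hypothetical helper and the class `1` is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual = [1, 0, 1, 1, 0, 0]       # made-up ground truth
predicted = [1, 0, 0, 1, 1, 0]    # made-up predictions
counts = confusion_counts(actual, predicted)
print(counts)
```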
A classifier that can compute using numeric as well as categorical values is the Decision Tree Classifier
Output of print(sentiment_analysis_data['label'].unique()): [1 0]
Which of the given hyperparameter(s), when increased, may cause a random forest to overfit the data?
Depth of Tree
Choose the correct sequence for classifier building from the following: Initialize -> Train -> Predict ->
Evaluate
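The four steps can be walked through with a deliberately simple classifier. `WordVoteClassifier` below is a hypothetical toy written for this sketch (each known word votes for the labels it was seen with); it is not a library class, and the texts and labels are made up.

```python
from collections import Counter, defaultdict

class WordVoteClassifier:
    """Toy classifier: every known word votes for the labels it appeared with."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.default = None

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.word_counts[word][label] += 1
        # Fall back to the most frequent training label for unseen words.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        preds = []
        for text in texts:
            votes = Counter()
            for word in text.lower().split():
                votes.update(self.word_counts.get(word, {}))
            preds.append(votes.most_common(1)[0][0] if votes else self.default)
        return preds

train_texts = ["i love this movie", "great film loved it",
               "i hate this movie", "awful film hated it"]
train_labels = [1, 1, 0, 0]
test_texts = ["love this great film", "awful hate"]
test_labels = [1, 0]

clf = WordVoteClassifier()                      # 1. Initialize
clf.fit(train_texts, train_labels)              # 2. Train
predictions = clf.predict(test_texts)           # 3. Predict
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)                    # 4. Evaluate
```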
Classification where each data point is mapped to more than one class is called Multi Label Classification (Multi Class Classification assigns exactly one of several classes to each data point)
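The difference shows up in how the targets are stored: a multi-class target holds exactly one class per sample, while a multi-label target holds a set of classes per sample, often encoded as a binary indicator matrix. A minimal sketch with made-up genre labels follows; `to_indicator_matrix` is a hypothetical helper.

```python
def to_indicator_matrix(label_sets, classes):
    """Encode each sample's label set as a row of 0/1 flags, one per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]

classes = ["action", "comedy", "drama"]

# Multi-class: exactly one class per sample.
multi_class_targets = ["comedy", "drama", "action"]

# Multi-label: each sample may carry several classes at once.
multi_label_targets = [{"action", "comedy"}, {"drama"}, {"action", "comedy", "drama"}]

matrix = to_indicator_matrix(multi_label_targets, classes)
for row in matrix:
    print(row)
```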
To view the first 3 rows of the dataset, which of the following commands is used?
sentiment_analysis_data.head(3)
Imagine you have just finished training a decision tree for spam classification and it is showing abnormally
bad performance on both your training and test sets. Assume that your implementation has no bugs.
What could be the reason for this problem? The decision tree is too shallow, so it underfits. (A plain decision tree has no learning rate to increase.)
Which NLP technique uses a lexical knowledge base to obtain the correct base form of the words?
Lemmatization
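To see why a knowledge base matters, compare a dictionary lookup with naive suffix stripping. The `LEXICON` below is a hypothetical miniature stand-in for a real lexical resource such as WordNet, and both functions are illustrative sketches rather than a production lemmatizer or stemmer.

```python
# Hypothetical miniature lexicon mapping inflected forms to their lemmas.
LEXICON = {"better": "good", "mice": "mouse", "ran": "run",
           "running": "run", "studies": "study", "was": "be"}

def lemmatize(word):
    """Look the word up in the lexicon; fall back to the lowercased word."""
    return LEXICON.get(word.lower(), word.lower())

def crude_stem(word):
    """Naive suffix stripping, shown only for contrast with the lookup."""
    w = word.lower()
    for suffix in ("ing", "ies", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[:-len(suffix)]
    return w

for word in ["mice", "studies", "better"]:
    print(word, "->", lemmatize(word), "vs", crude_stem(word))
```

Suffix stripping leaves irregular forms such as "mice" and "better" unchanged, whereas the lookup returns a valid dictionary word.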
Identify the stop words from the following: Both "the" and "it"
Which of the following commands is used to view the dataset size (rows, columns) and what is the value returned?
sentiment_analysis_data.shape, (7086, 3)
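On a pandas DataFrame, `.shape` returns a `(rows, columns)` tuple while `.size` returns the total number of cells as a single integer. The small frame below is made up for illustration; its column names `id` and `text` are assumptions, not the real dataset's schema.

```python
import pandas as pd

# Made-up stand-in for the real dataset (4 rows, 3 columns).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "label": [1, 0, 1, 0],
    "text": ["loved it", "hated it", "great film", "awful plot"],
})

first_rows = df.head(3)   # first 3 rows only
print(first_rows)
print(df.shape)           # tuple of (rows, columns)
print(df.size)            # rows * columns, a single integer
```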
Which type of cross-validation is used for an imbalanced dataset? Stratified Shuffle Split
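The idea behind stratification is to shuffle and split each class separately so that the class ratio is preserved in both parts. The sketch below produces a single such split in plain Python; `stratified_shuffle_split` is a hypothetical helper and not scikit-learn's `StratifiedShuffleSplit`, which generates several randomized splits.

```python
import random
from collections import defaultdict

def stratified_shuffle_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling the test part per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one test sample per class.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 16 + [1] * 4            # made-up imbalanced labels (20% positive)
train_idx, test_idx = stratified_shuffle_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx) / len(test_idx))   # positive share in test
```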
Which numerical statistic is used to identify the importance of a rare word in a document? TF-IDF
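A minimal sketch of the computation, assuming the plain variant tf * log(N / df) with whitespace tokenization; libraries often use smoothed or normalized variants, so their numbers will differ. `tf_idf` is a hypothetical helper and the documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = ["the film was superb", "the film was dull", "the plot was thin"]
scores = tf_idf(docs)
print(scores[0])
```

A word that appears in every document ("the") scores zero, while a word unique to one document ("superb") scores highest within it.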