case study BDUD
A. Who is right, the marketing director or his boss? If you answered his boss,
what would you do to fix the measure of satisfaction?
The marketing director is wrong and his boss is correct. The raw number of
complaints about a product cannot determine its level of satisfaction
relative to other products. If the best-selling product receives the most
complaints, that does not mean it has the worst customer satisfaction. The
number of items sold largely determines how much customer feedback a
product receives: the more you sell, the bigger the sample from which both
negative and positive feedback is drawn.
B. What can you say about the attribute type of the original product
satisfaction attribute?
The marketing director is correct when he talks about ratio attributes, but he
does not use a percentage that would allow proportional comparison with
other products. A better way to describe the data would be something that
tracks the number of complaints per item sold.
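
A minimal sketch of the complaints-per-item-sold idea follows. The product
names and counts are made up for illustration and do not come from the case
study; the function name complaint_rate is also just a placeholder.

```python
# Hypothetical sales and complaint counts (illustrative numbers only).
products = {
    "A": {"sold": 10000, "complaints": 200},
    "B": {"sold": 500, "complaints": 50},
    "C": {"sold": 2000, "complaints": 20},
}


def complaint_rate(sold, complaints):
    """Return complaints per item sold, a ratio that is comparable across products."""
    if sold <= 0:
        raise ValueError("sold must be positive")
    return complaints / sold


rates = {name: complaint_rate(p["sold"], p["complaints"])
         for name, p in products.items()}

# Order products from the lowest complaint rate (best) to the highest (worst).
ranking = sorted(rates, key=rates.get)

# The product with the most raw complaints, for comparison.
most_complaints = max(products, key=lambda name: products[name]["complaints"])
```

With these made-up numbers, product A has the most complaints in absolute
terms but is not the worst once the counts are divided by the number sold.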
4. (a) Is the marketing director in trouble? Will his approach work for
generating an ordinal ranking of the product variations in terms of
customer preference? Explain.
I don't think that this approach makes the most sense, since the customers
should be able to compare them all. However, sometimes there are too many
to get reasonable ordinal data. This means that you are going to need
conditional questions that only get asked if you need them. This way you
can avoid the contradictions that can arise with the marketing director's
approach.
(c) For the original product evaluation scheme, the overall rankings of
each product variation are found by computing its average over all test
subjects. Comment on whether you think that this is a reasonable
approach. What other approaches might you take?
I think that the original idea would work okay as long as the value we are
averaging is uniform across the entire scope of items. For instance, if every
customer rates their experience on a scale of 1-10 then it would be fine,
but if some items are rated on a different scale, or some customers rate on
a different scale, then it wouldn't work. There would be obvious issues with
this idea as well: there would be people who rate everything a 10 or a 1.
Another approach would be to take all of the complaints made about a
product and divide that by the number of purchases; the products could then
be ordered by this ratio.
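
As a rough sketch of the averaging scheme, assuming every test subject rates
every product on the same 1-10 scale, the overall ranking could be computed
as below. The subjects, products, and ratings are invented for illustration.

```python
from statistics import mean

# Hypothetical ratings on a shared 1-10 scale: ratings[subject][product].
ratings = {
    "s1": {"A": 8, "B": 5, "C": 3},
    "s2": {"A": 7, "B": 9, "C": 4},
    "s3": {"A": 10, "B": 10, "C": 10},  # a subject who rates everything a 10
}


def average_by_product(ratings):
    """Average each product's rating over all subjects who rated it."""
    products = sorted({p for subject in ratings.values() for p in subject})
    return {
        p: mean(subject[p] for subject in ratings.values() if p in subject)
        for p in products
    }


averages = average_by_product(ratings)

# Highest average first.
ranking = sorted(averages, key=averages.get, reverse=True)
```

The subject who rates everything a 10 raises every average by a similar
amount and adds no information about which product is preferred.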
I can think of a couple of ways, if the ID was tied to age or location. For
instance, a phone number is a kind of ID number, and area codes give you a
good prediction of location. Also, check numbers are a form of ID and give a
good prediction of time and possibly the age of the person with the account.
Based on what I know about elephants, as one of these would increase
they all would, so I would use something to see what kind of correlation
they have with one another. A good choice would probably be Pearson's
correlation coefficient. This would enable us to predict some of the data
from the other values. For instance, if we know the weight and height
then we can probably predict the others based on the correlation data we
have.
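
A small sketch of Pearson's correlation coefficient, written out by hand so
that it has no dependencies. The height and weight values are made up for
illustration and are not real elephant measurements.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical measurements (illustrative values only).
height = [2.1, 2.5, 2.8, 3.0, 3.3]
weight = [2100, 3000, 3900, 4500, 5400]

r = pearson(height, weight)
```

A value of r close to 1 would support the idea that one attribute can be
roughly predicted from another.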
15. You are given a set of m objects that is divided into K groups, where
the ith group is of size m_i. If the goal is to obtain a sample of size n < m,
what is the difference between the following two sampling schemes?
(b) We randomly select n elements from the data set, without regard for
the group to which an object belongs.
In the first scheme you get a much better proportional representation of each
group, something that can't be said for the second one, because n < m. It is
possible that the number of groups K is a lot larger than the sample size n.
This means that you may not be able to sample each group proportionately.
Another issue is that as the size of n grows closer to m, you lose any
advantage you once had by sampling.
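
The contrast can be sketched as follows. The text of the first scheme is not
reproduced above, so this sketch assumes it draws about n * m_i / m objects
from each group; that assumption, the function names, and the group sizes are
all illustrative.

```python
import random
from collections import Counter


def simple_random_sample(objects, n, rng):
    """Scheme (b): draw n objects without regard to group membership."""
    return rng.sample(objects, n)


def proportional_group_sample(objects, n, rng):
    """Assumed first scheme: draw about n * m_i / m objects from each group."""
    groups = {}
    for group, item in objects:
        groups.setdefault(group, []).append((group, item))
    m = len(objects)
    sample = []
    for members in groups.values():
        # Rounding means the total can differ slightly from n in general.
        k = round(n * len(members) / m)
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data set: m = 100 objects in K = 3 groups of sizes 60, 30, 10.
objects = ([("g1", i) for i in range(60)]
           + [("g2", i) for i in range(30)]
           + [("g3", i) for i in range(10)])

stratified = proportional_group_sample(objects, 10, rng)
simple = simple_random_sample(objects, 10, rng)

stratified_counts = Counter(group for group, _ in stratified)
simple_counts = Counter(group for group, _ in simple)
```

Under that assumption the per-group counts are fixed at 6, 3, and 1, while the
counts from scheme (b) vary from draw to draw and may miss the smallest group
entirely.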
where df_i is the number of documents in which the ith term appears,
which is known as the document frequency of the term. This
transformation is known as the inverse document frequency
transformation.
In this situation it means that the term will be weighted very heavily.
e.g. Let the frequency be 2, the documents total 6, and the term appears in 1
document.
2*log(6/1) = 1.56 (using log base 10)
In every document?
e.g. Let the frequency be 2, the documents total 6, and the term appears in 6
documents.
2*log(6/6) = 0
A problem for data mining, in particular for search engines and other
practical purposes on the web, is that common terms that are on every page
should not be weighted as much as the more useful terms. For example, if
someone was searching for "the baseball", they might be interested in every
document that has the word "baseball" in it, but they definitely don't care
about every document that has the word "the" in it.
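
The two worked cases above can be checked with a short sketch. The base of
the logarithm is not stated in the exercise text quoted here; base 10 is an
assumption, chosen because it comes closest to the figures worked out above.
The function name idf_weight is a placeholder.

```python
from math import log10


def idf_weight(tf, m, df):
    """Term frequency scaled by log10(m / df).

    tf: frequency of the term in the document
    m:  total number of documents
    df: number of documents in which the term appears
    """
    if not 0 < df <= m:
        raise ValueError("df must be between 1 and m")
    return tf * log10(m / df)


# Term appears in only one of six documents: weighted heavily.
rare = idf_weight(2, 6, 1)

# Term appears in every document: the weight drops to zero.
everywhere = idf_weight(2, 6, 6)
```

A term like "the" falls into the second case, so it contributes nothing to
the weighting, while a term like "baseball" keeps a positive weight as long
as it does not appear in every document.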