
DVX


By visualizing the distribution of the variable lender_count, we can find out which
numerical range has the highest frequency of data occurrence.


Say we are interested in finding the range with the highest frequency of
lender_count for all loans in the United States. The first thing we need to do is
filter the data so that it contains only loans from the United States. Once the
data is ready, you can plot it. Check the distribution of lender_count using a
histogram or density chart.
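
The filter-then-plot step might look roughly like the sketch below. It assumes the
data frame has a column named country next to lender_count. The object names
loans_toy and loans_us, the bin width, and every value in the toy data are
made up for illustration and say nothing about the real loans.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Kiva loan data (values invented)
loans_toy <- data.frame(
  country = c("United States", "United States", "United States",
              "Kenya", "United States"),
  lender_count = c(12, 18, 45, 7, 15)
)

# Keep only the loans from the United States
loans_us <- loans_toy[loans_toy$country == "United States", ]

# Histogram of lender_count; the bin width of 10 is an arbitrary choice
p_hist <- ggplot(loans_us, aes(x = lender_count)) +
  geom_histogram(binwidth = 10) +
  labs(x = "lender_count", y = "Number of loans")
p_hist
```

Replacing geom_histogram() with geom_density() would give the density chart
instead.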

# your code here

In what range does lender_count in the United States have the highest frequency?

Say we are interested in analyzing the loans posted in the Manufacturing sector. We
would like to see the relationship or pattern between the loan amount
(loan_amount) and the number of lenders (lender_count). To do that, we can use a
scatter plot.
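
A minimal sketch of such a scatter plot follows, assuming the sector is stored in
a column named sector. The object names and all values in the toy data are
invented for illustration, so the pattern they show is not the answer to the
question on the real data.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Kiva loan data (values invented)
loans_toy <- data.frame(
  sector = c("Manufacturing", "Manufacturing", "Retail", "Manufacturing"),
  loan_amount = c(500, 2500, 800, 1200),
  lender_count = c(14, 70, 20, 33)
)

# Keep only the loans in the Manufacturing sector
loans_manufacturing <- loans_toy[loans_toy$sector == "Manufacturing", ]

# One point per loan; alpha makes overlapping points easier to see
p_scatter <- ggplot(loans_manufacturing,
                    aes(x = loan_amount, y = lender_count)) +
  geom_point(alpha = 0.5)
p_scatter
```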

# your code here

How would you describe the relationship between the amount of loan and the number
of lenders from all loans within the Manufacturing sector?

[ ] The higher the loan amount, the lower the lender count
[ ] The higher the loan amount, the higher the lender count
[ ] Loan amount and lender count don't have any meaningful relationship

Which statement is true based on the scatterplot you have created?

[ ] Some loans have a large loan amount but a small lender count
[ ] Some loans have a large lender count but a small loan amount
[ ] Most loan requests have a loan amount of more than 7500

Consider the following case: one of the data analysts at Kiva is tasked with
analyzing how long a loan takes to go from first being posted to being fully funded
in the Philippines, for each repayment interval type. The analyst then tried to
visualize the monthly trend of the average funding duration, measured in hours.

Look at the plot the analyst produced, shown in Guatemala.png. Your task is to
recreate that plot for the Philippines using your data.

To analyze the trend, we first need to subset the data for the Philippines. We will
also need to convert any date data into a proper date format.
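
As a sketch of this preparation step, the base R code below subsets a toy data
frame and converts its timestamps. The column names country, posted_time and
funded_time, the "YYYY-MM-DD HH:MM:SS" text format, and the UTC time zone are all
assumptions, and the values are invented.

```r
# Hypothetical toy data; timestamps are stored as text, as often happens on import
loans_toy <- data.frame(
  country = c("Philippines", "Guatemala", "Philippines"),
  posted_time = c("2015-01-05 08:30:00", "2015-02-01 10:00:00",
                  "2015-03-10 14:15:00"),
  funded_time = c("2015-01-06 09:30:00", "2015-02-03 10:00:00",
                  "2015-03-10 20:15:00"),
  stringsAsFactors = FALSE
)

# Keep only the loans from the Philippines
loans_ph <- loans_toy[loans_toy$country == "Philippines", ]

# Convert the text columns into date-time (POSIXct) values
time_format <- "%Y-%m-%d %H:%M:%S"
loans_ph$posted_time <- as.POSIXct(loans_ph$posted_time,
                                   format = time_format, tz = "UTC")
loans_ph$funded_time <- as.POSIXct(loans_ph$funded_time,
                                   format = time_format, tz = "UTC")

# Earliest and latest posted time in the subset
range(loans_ph$posted_time)
```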

What are the earliest and latest posted times of any loan in 2015?

Now we are set to calculate the duration from when a loan is posted until it is
fully funded. We need to create a new column that contains the difference between
the funded time and the posted time. We will call it funding duration. This column
will have a time data type and be presented in units of minutes. We need to convert
it to numeric and divide by 60, so that the time is expressed in hours.
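
The calculation can be sketched in base R as follows. The vector names and the
timestamps are invented for illustration. The sketch asks for minutes explicitly
through the units argument of difftime(), because plain subtraction of two
date-times lets R pick the unit automatically.

```r
# Hypothetical posted and funded times (values invented)
posted_time <- as.POSIXct(c("2015-01-05 08:30:00", "2015-03-10 14:15:00"),
                          tz = "UTC")
funded_time <- as.POSIXct(c("2015-01-06 09:30:00", "2015-03-10 20:15:00"),
                          tz = "UTC")

# Time difference in minutes, requested explicitly
duration_mins <- difftime(funded_time, posted_time, units = "mins")

# Convert to plain numbers and divide by 60 to get hours
funding_duration <- as.numeric(duration_mins) / 60
funding_duration
```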

# your code here


Since we want to visualize the monthly average funding duration, you need to create
a new column which contains the month of the posted time before aggregating the
data.

# your code here

Finally, we will start to aggregate the data based on the month of the posted time
and the repayment interval to get the average funding duration.
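
One possible way to do the month extraction and the aggregation in base R is
sketched below. The column names posted_time, repayment_interval, funding_duration
and month are assumptions, the values are invented, and loan_agg is the object
name used by the plotting template later in this document.

```r
# Hypothetical prepared data for the Philippines (values invented)
loans_ph <- data.frame(
  posted_time = as.POSIXct(c("2015-01-05 08:30:00", "2015-01-20 12:00:00",
                             "2015-02-11 09:00:00", "2015-02-15 16:45:00"),
                           tz = "UTC"),
  repayment_interval = c("monthly", "monthly", "irregular", "irregular"),
  funding_duration = c(25, 35, 10, 20)
)

# Month of the posted time as an integer from 1 to 12
loans_ph$month <- as.integer(format(loans_ph$posted_time, "%m"))

# Average funding duration per month and repayment interval
loan_agg <- aggregate(funding_duration ~ month + repayment_interval,
                      data = loans_ph, FUN = mean)
loan_agg
```

Sorting loan_agg by funding_duration in decreasing order is one way to read off
the month and interval with the longest average duration.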

# your code here


Which repayment interval has the longest funding duration, and in which month did
it occur?

[ ] monthly repayment interval in April
[ ] bullet repayment interval in January
[ ] monthly repayment interval in March

The data has been properly prepared. Now it is your turn to create the line plot to
visualize the trend. Fill in the code below to produce the plot.

# ggplot(loan_agg, aes(x = ........, y = ........., color = ......, group = repayment_interval))+
# geom_line()+
# geom_point()+
# labs(title = "Funding Duration Trend on Philippines, 2015")+
# theme_minimal()+
# theme(legend.position = "top")

Which statement is TRUE based on the line plot?

[ ] Monthly repayment interval has almost the same funding duration as Irregular
repayment interval in August
[ ] Bullet repayment interval has a longer funding duration than Irregular
repayment interval in June
[ ] Monthly repayment interval is never funded faster than Irregular repayment
interval

https://ggplot2-book.org/polishing.html#theme-elements
