Probability theory and statistics


Postby Guest » Mon Apr 19, 2021 11:02 am

A real estate agent is trying to understand the nature of housing stock and home prices in and around a medium-sized town in upstate New York. She has collected data from a random sample of 1047 homes sold in the last 12 months. Data was collected on the following variables and is available in the houseprices.csv file.

• Price – the sale price of the house in $
• Living Area – in Sq. ft.
• Bathrooms – number of bathrooms in the house (powder rooms with no tub or shower area are considered 0.5 baths)
• Bedrooms – the number of bedrooms
• Lot Size – size of the property on which the house sits (in acres).
• Age – of the house in years
• Fireplace – whether or not the house has a fireplace (Yes = 1, No = 0)

Your task in this assignment is to analyze this dataset to gain some understanding of this particular real estate market – the values of homes, their characteristics in terms of size and other features, and relationships between these. This understanding will prove immensely helpful to the real estate agent in advising her clients. Since all of the homes are from the same geographical area, location (which usually has a huge bearing on home values) is not a major concern here.

Most of the analysis will be done in response to the specific questions posed on the homework assignments. But feel free to explore and play around with the data set to enhance your own understanding of how to make sense of data.

1. a) Your friend claims that the average house price in this area is above $150K. Do you agree?

b) He also claims that the average living area is more than 1800 sq. ft. Do you agree with this?

Use a 5% significance level for both tests, and briefly explain what the p-value means in each case.
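Both claims in question 1 are upper-tailed one-sample tests on a mean (H0: mean <= claimed value, H1: mean > claimed value). A minimal sketch of that calculation follows. The function name and the synthetic prices are made up for illustration and are not taken from houseprices.csv. The p-value uses the standard normal in place of Student's t, which is only a large-sample approximation (reasonable for a sample of about a thousand homes).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_upper_test(values, mu0):
    """Test H0: mean <= mu0 against H1: mean > mu0.

    Returns (test statistic, approximate one-sided p-value).
    The p-value comes from the standard normal distribution, a
    large-sample stand-in for Student's t with n - 1 degrees of freedom.
    """
    n = len(values)
    se = stdev(values) / sqrt(n)            # estimated standard error of the mean
    stat = (mean(values) - mu0) / se        # standardized distance from the claim
    p_value = 1.0 - NormalDist().cdf(stat)  # upper-tail probability
    return stat, p_value


# Synthetic prices for illustration only (NOT the real data set).
rng = random.Random(0)
fake_prices = [rng.gauss(160_000, 60_000) for _ in range(1047)]

stat, p = one_sample_upper_test(fake_prices, 150_000)
print(f"statistic = {stat:.3f}, one-sided p-value = {p:.4f}")
print("reject H0 at 5%" if p < 0.05 else "do not reject H0 at 5%")
```

The p-value is the probability of seeing a sample mean at least this far above the claimed value if the true mean were exactly the claimed value. For an exact t-based p-value, `scipy.stats.ttest_1samp` with `alternative="greater"` does the same test.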

2. Are the home prices higher for houses with fireplaces as compared to those without?

a) Create side-by-side box plots of the house prices of the two groups and comment on them.

b) Formulate an appropriate hypothesis and test it in order to check the above claim. Assume that the population standard deviations of house prices in the two groups are equal.
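With equal population standard deviations assumed, question 2b is a pooled-variance two-sample test (H0: mean price with fireplace <= mean price without, H1: mean price with fireplace > mean price without). The sketch below shows how the pooled statistic is built. The function name and the toy numbers are illustrative assumptions, and the p-value again uses a normal approximation that is only appropriate for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_upper_test(group_a, group_b):
    """Test H0: mean_a <= mean_b against H1: mean_a > mean_b.

    Assumes both groups share one population variance.
    Returns (statistic, degrees of freedom, approximate one-sided p-value).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    stat = (mean(group_a) - mean(group_b)) / se
    # Normal approximation to Student's t with df degrees of freedom.
    p_value = 1.0 - NormalDist().cdf(stat)
    return stat, df, p_value


# Toy values only, to show the call; real groups come from splitting
# the Price column on Fireplace = 1 versus Fireplace = 0.
with_fireplace = [2, 4, 6]
without_fireplace = [1, 3, 5]
print(pooled_upper_test(with_fireplace, without_fireplace))
```

`scipy.stats.ttest_ind` with `equal_var=True` and `alternative="greater"` gives the exact t-based version of this test.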

3. Any house aged more than 30 years is considered an "old" house. Your friend claims that old houses have larger lot sizes than new houses. Do you agree? Explain. Use a significance level of 5% for your test. Historical data suggests that old houses include some very large and some very small lot sizes, but new houses are more homogeneous in their lot sizes.
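The hint that old houses vary much more in lot size than new ones suggests the two population variances should not be pooled, so an unequal-variance (Welch) test is one reasonable reading of question 3. A minimal sketch under that assumption follows. The function names and the toy (age, lot size) pairs are invented for illustration, and the p-value uses a large-sample normal approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def split_by_age(records, cutoff=30):
    """Split (age, lot_size) pairs into old (age > cutoff) and new lot sizes."""
    old = [lot for age, lot in records if age > cutoff]
    new = [lot for age, lot in records if age <= cutoff]
    return old, new


def welch_upper_test(group_a, group_b):
    """Test H0: mean_a <= mean_b against H1: mean_a > mean_b.

    Does not assume equal variances.
    Returns (statistic, Welch-Satterthwaite df, approximate one-sided p-value).
    """
    n_a, n_b = len(group_a), len(group_b)
    va = variance(group_a) / n_a   # squared standard error, group a
    vb = variance(group_b) / n_b   # squared standard error, group b
    stat = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    # Normal approximation to Student's t with df degrees of freedom.
    p_value = 1.0 - NormalDist().cdf(stat)
    return stat, df, p_value


# Toy (age, lot size in acres) pairs, not the real data.
toy = [(45, 0.2), (60, 1.5), (80, 0.9), (5, 0.4), (12, 0.5), (20, 0.45)]
old_lots, new_lots = split_by_age(toy)
print(welch_upper_test(old_lots, new_lots))
```

`scipy.stats.ttest_ind` with `equal_var=False` performs the exact Welch test.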

4. Based on the evidence available here, would you be willing to claim that fireplaces have become more fashionable? For simplicity, it is OK to compare only "new" houses and "old" houses. Use a significance level of 5% for your test.
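One way to read "more fashionable" is that the proportion of new houses with a fireplace exceeds the proportion of old houses with one, which leads to an upper-tailed two-proportion z-test. The sketch below assumes that reading. The function name and the counts are hypothetical, not tallied from the data set.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_upper_test(x_new, n_new, x_old, n_old):
    """Test H0: p_new <= p_old against H1: p_new > p_old.

    x_* are counts of houses with a fireplace, n_* are group sizes.
    Returns (z statistic, one-sided p-value).
    """
    p_new = x_new / n_new
    p_old = x_old / n_old
    # Under H0 both groups share one proportion, estimated by pooling.
    pooled = (x_new + x_old) / (n_new + n_old)
    se = sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    return z, 1.0 - NormalDist().cdf(z)


# Hypothetical counts: 60 of 100 new houses and 40 of 100 old houses
# have a fireplace.
z, p = two_proportion_upper_test(60, 100, 40, 100)
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")
```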

5. Suppose that houses with 1-2 bedrooms are considered to be "Small Houses", those with 3-4 are "Medium Houses" and those with 5-6 are "Big Houses". Can we conclude that the prices of Small, Medium and Big houses are not the same, at the 1% level of significance?
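Comparing mean prices across three size classes is a one-way ANOVA (H0: all three mean prices are equal, H1: at least one differs). A minimal sketch follows. The function names and toy prices are illustrative assumptions. Because there are exactly three groups, the numerator has 2 degrees of freedom, and in that special case the upper tail of the F distribution has the closed form (d2 / (d2 + 2F))^(d2/2), so no statistics library is needed.

```python
from statistics import mean


def size_class(bedrooms):
    """Map a bedroom count to the size class defined in the question."""
    if bedrooms <= 2:
        return "Small"
    if bedrooms <= 4:
        return "Medium"
    return "Big"


def anova_three_groups(small, medium, big):
    """One-way ANOVA for exactly three groups.

    Returns (F statistic, within-group df, p-value).
    """
    groups = [small, medium, big]
    means = [mean(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = 2                  # three groups minus one
    df_within = n_total - 3
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Exact upper-tail probability of F(2, df_within); valid only
    # because the numerator df equals 2.
    p_value = (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)
    return f_stat, df_within, p_value


# Toy prices (in $1000s) only, not the real data.
print(anova_three_groups([1, 2, 3], [2, 3, 4], [3, 4, 5]))
```

For any number of groups, `scipy.stats.f_oneway` computes the same F statistic with its p-value. Reject H0 at the 1% level when the p-value is below 0.01.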
houseprices _data.pdf
House Prices Dataset

Re: Statistics

Postby Guest » Mon Apr 19, 2021 11:47 am

Please download this dataset to work on.
