    Date: 2019-06-08 09:09

    PSTAT 126 Final Project Option 1: the CDI Data
    1 Description
    The following description can be found in Appendix C.2 of Applied Linear Regression Models, fourth
    edition, by Kutner, Nachtsheim, and Neter:
    This data set provides selected county demographic information (CDI) for 440 of the most
    populous counties in the United States. Each line of the data set has an identification number with
    a county name and state abbreviation and provides information on 14 variables for a single county.
    Counties with missing data were deleted from the data set. The information generally pertains to
    the years 1990 and 1992. The 16 variables are
    Variable Name   Description
    County          County name
    State           Two-letter state abbreviation
    LandArea        Land area (square miles)
    TotalPop        Estimated 1990 population
    Pop18           Percent of 1990 CDI population aged 18–34
    Pop65           Percent of 1990 CDI population aged 65 years old and older
    Physicians      Number of professionally active nonfederal physicians during 1990
    Beds            Total number of beds, cribs, and bassinets during 1990
    Crimes          Total number of serious crimes in 1990, including murder, rape, robbery,
                    aggravated assault, burglary, larceny-theft, and motor vehicle theft, as
                    reported by law enforcement agencies
    HSGrad          Percent of adult population (persons 25 years old or older) who completed
                    12 or more years of school
    Bachelor        Percent of adult population (persons 25 years old or older) with bachelor’s
                    degree
    Poverty         Percent of 1990 CDI population with income below poverty level
    Unemp           Percent of 1990 CDI labor force that is unemployed
    IncPerCap       Per capita income of 1990 CDI population (dollars)
    PersonalInc     Total personal income of 1990 CDI population (in millions of dollars)
    Region          Geographic region, using the classification of the U.S. Bureau of the
                    Census: 1 = NE, 2 = NC, 3 = S, 4 = W
    The file CDI.rds contains these data and is available on Gauchospace.
    2 Project Components
    The overall project consists of a thorough investigation of two regression models that combine concepts and
    methods of linear regression used throughout the quarter.
    2.1 Part I
    You will investigate the model
    Physicians ~ log(TotalPop) + LandArea + IncPerCap (1)
    by answering the following questions.
    a) What relationships do you expect to see between the response and each of the predictors, and why? What
    kind of associations, if any, do you expect will be present between the three predictors, and why? Do
    some exploratory analysis (e.g. plots and/or numerical summaries) to test your intuition.
    b) Fit the model in (1) and provide interpretations of the estimated coefficients. Report the value of R^2 and
    explain its meaning.
    c) Do diagnostic checks to assess whether or not the linear regression assumptions seem to hold. If the
    model assumptions do not hold in your view, investigate possible transformations for predictors and/or
    response. Once suitable transformations are found, repeat b) for this new model and use this model for
    the remainder of Part I. Otherwise, move on to d).
    d) Using your fitted model, compute 95% confidence intervals for each of the coefficients in the model, and
    provide an interpretation for each. Conduct a test for the existence of a linear relationship between the
    predictors and response at α = 0.01. Give the null and alternative hypotheses (defining any notation
    that you use), value of the test statistic and its null distribution, the p-value or critical value, and your
    decision.
    e) Does the variance increase or decrease with log(TotalPop)? Perform a test to support your conclusion. If
    you conclude that the variance is not constant, refit the model using weighted least squares and comment
    on any differences in the fitted coefficients or their standard errors.
    f) Summarize your analysis and comment on any interesting or unexpected findings.
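As a rough illustration of the quantities requested in b) and d), the sketch below fits a three-predictor linear model by least squares and computes R^2 and the overall F statistic. It uses only the Python standard library and randomly generated numbers, not the CDI data; the helper names `solve` and `ols`, the rescaled predictor units, and the simulated coefficients are all assumptions made for this example.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares fit of y on the columns of rows, with an intercept added.

    Returns (coefficients, R^2, overall F statistic).
    """
    X = [[1.0] + list(r) for r in rows]
    n, p = len(X), len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    r2 = 1.0 - sse / sst
    # Tests H0: all slope coefficients are zero; F(p - 1, n - p) under H0 with normal errors
    f_stat = ((sst - sse) / (p - 1)) / (sse / (n - p))
    return beta, r2, f_stat


# Synthetic stand-ins for the predictors (made-up values, NOT the CDI data)
random.seed(0)
n = 200
log_pop = [random.uniform(11.0, 15.0) for _ in range(n)]
land_area = [random.uniform(0.1, 5.0) for _ in range(n)]      # thousands of square miles
inc_per_cap = [random.uniform(10.0, 35.0) for _ in range(n)]  # thousands of dollars
physicians = [
    -5000.0 + 500.0 * lp + 20.0 * la + 10.0 * inc + random.gauss(0.0, 100.0)
    for lp, la, inc in zip(log_pop, land_area, inc_per_cap)
]

beta, r2, f_stat = ols(list(zip(log_pop, land_area, inc_per_cap)), physicians)
print("coefficients:", [round(b, 2) for b in beta])
print("R^2:", round(r2, 4), " F:", round(f_stat, 1))
```

Solving the normal equations directly keeps the sketch dependency-free, but it can be numerically fragile when predictors are on very different scales, which is why the synthetic predictors are rescaled here. In practice a statistics package's linear-model routine is the more reliable choice.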
    2.2 Part II
    You will investigate the model
    Physicians ~ TotalPop + Region (2)
    a) Fit the model in (2), and check the diagnostics. Find transformations if necessary.
    b) Using your transformations from a), refit the new model. For each region separately, write out an equation
    that expresses the estimated mean of number of physicians as a function of total population and personal
    income. Based on these equations, explain why this model is called a parallel regression model.
    c) Does the geographic region have a significant effect on the number of physicians in a county? Explain
    your answer. If geographic region is not important, remove it from the model from now on.
    d) Using model selection techniques from class, build on your current model by selecting relevant predictors
    from Pop65, Crimes, Bachelor, Poverty, and PersonalInc. Perform a partial F-test to assess whether
    the improvement from adding these predictors compared to the first model is statistically significant at
    α = 0.05.
    e) Using the model chosen in d), identify any influential points. For any data points with large influence,
    use leverages and/or residuals (standardized or studentized) to explain why they are influential.
    f) Summarize your analysis and comment on any interesting or unexpected findings.
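For reference, the general form behind b) and d) can be written as follows. This is a sketch in assumed notation, not the course's official notation: Y and x stand for the (possibly transformed) response and total population, region 1 is taken as the baseline, I_k is an indicator equal to 1 for counties in region k, and the subscripts R and F denote the reduced and full models.

```latex
% Model (2) with indicator variables for regions 2, 3, 4 (region 1 as baseline)
\begin{align*}
E[Y \mid x, \text{Region}] &= \beta_0 + \beta_1 x + \gamma_2 I_2 + \gamma_3 I_3 + \gamma_4 I_4 \\
\text{Region 1:}\quad E[Y \mid x] &= \beta_0 + \beta_1 x \\
\text{Region } k = 2, 3, 4\text{:}\quad E[Y \mid x] &= (\beta_0 + \gamma_k) + \beta_1 x
\end{align*}

% Partial F statistic comparing a reduced model (R) nested in a full model (F)
\begin{equation*}
F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}
\;\sim\; F_{\,df_R - df_F,\; df_F} \quad \text{under } H_0
\end{equation*}
```

Every region shares the slope on x and differs only in its intercept, so the fitted lines are parallel. In the partial F statistic, df_R and df_F are the residual degrees of freedom of the reduced and full models, and H_0 states that the coefficients of the added predictors are all zero.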