
You are given a model with age as an independent variable, and you run the following line of code:

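```
# Likelihood-ratio (deviance) test for the age term: null deviance minus
# residual deviance, compared with a chi-square on 713 - 712 = 1 degree of freedom
1 - pchisq(964.52 - 960.23, 713 - 712)
```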

This results in 0.038.

**Coefficients:**

| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|----------|------------|---------|---------|
| (Intercept) | -0.05622 | 0.17353 | -0.327 | 0.7438 |
| Age | -0.01006 | 0.00533 | -2.057 | 0.0397 |

**Question:**

What is your best guess of the probability of "Survived" being equal to 1 if your alpha value is 0.012?

You are modeling whether an individual is going to survive the Titanic disaster.

The first 6 rows are as follows:

| PassengerId | Survived | Class | Name | Sex | Age | SibSp | Ticket |
|-------------|----------|-------|----------------------------------------------------|--------|-----|-------|----------------|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | A/5 21171 |
| 2 | 1 | 1 | Cummings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | PC 17599 |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | STON/O2. 3101282 |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 113803 |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 373450 |
| 6 | 0 | 3 | Moran, Mr. James | male | 0 | 0 | 330877 |

**Part (i):**

If degrees of freedom were not an issue, give one variable which would allow you to explain all of the variability in "Survived."

**Part (ii):**

Use the output below to predict the probability of survival for the individuals listed under **Predict for**.

**Coefficients:**

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|-------------|------------|---------|--------------|
| (Intercept) | 0.96809 | 0.03919 | 24.700 | < 2e-16 |
| factor(Class)2 | -0.04703 | 0.05862 | -0.802 | 0.42256 |
| factor(Class)3 | -0.46809 | 0.05039 | -9.290 | < 2e-16 |
| Sex: Male | -0.59923 | 0.05215 | -11.400 | < 2e-16 |
| factor(Class)2:Sex: Male | -0.16441 | 0.07718 | -2.130 | 0.03342 |
| factor(Class)3:Sex: Male | 0.23468 | 0.06433 | 3.648 | 0.00028 |

Note that the model above includes interaction effects, which work just like in regular regression. These effects are created by multiplying the values of the independent variables.

**Predict for:**

1) A female individual who is in second class.
2) A female individual who is in third class.
3) A male individual who is in first class.

**Confusion Table:**

| | Predicted 0 | Predicted 1 |
|---------------|-------------|-------------|
| Actual 0 | 40 | 10 |
| Actual 1 | 15 | 25 |

**Additional Information:**

- Table(Survived):
  - Survived 0: 549
  - Survived 1: 342

- GLM Output:
  - Null deviance: 1186.70 on 890 degrees of freedom
  - Residual deviance: 1084.40 on 889 degrees of freedom

**Tasks:**

a) Which of the above independent variables would you use if you wanted to predict the "Survived" variable? Justify your choice.

b) Find Sensitivity (\(P(Y_{\text{Predicted}} = 1 | Y_{\text{Actual}} = 1)\)). Show work.

c) Find Precision (\(P(Y_{\text{Actual}} = 1 | Y_{\text{Predicted}} = 1)\)). Show work.

d) Find Accuracy. Show work.

**Answer:**

The model uses age as an independent variable to predict 'Survived', but at the stated alpha the age effect is not statistically significant, so we fail to reject the null hypothesis. The evaluation metrics (Sensitivity, Precision, Accuracy) are computed from the confusion table.

This question uses the Titanic dataset, a common introductory dataset in statistics and data science. Alpha is the significance level: the probability of rejecting the null hypothesis when it is actually true. Here the null hypothesis is that age has no association with survival. With an alpha of 0.01, for example, you would call a result significant only if its p-value is below 0.01, that is, only if data at least this extreme would occur less than 1% of the time when the null hypothesis is true.

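Both tests of the age effect give p-values above your stated alpha of 0.012: the deviance (likelihood-ratio) test from the `pchisq` call gives p = 0.038, and the Wald test in the coefficient table gives p = 0.0397. Note that these are p-values, not alpha values. Because they exceed 0.012, age is not significantly associated with survival in this model and should be dropped. Your best guess of the probability of 'Survived' being 1 is then the overall proportion of survivors; using the Table(Survived) counts, this is 342/(549 + 342) = 342/891 ≈ 0.384.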
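The significance check and the best guess can be reproduced with a short sketch. It is written in Python rather than R and replaces `pchisq` with the one-degree-of-freedom identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)); the helper name `chisq_sf_df1` is hypothetical, and the best-guess line assumes the Table(Survived) counts from the question are the relevant sample.

```python
import math


def chisq_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    Uses P(chi2_1 > x) = erfc(sqrt(x / 2)); this matches R's
    1 - pchisq(x, 1) but is only valid for one degree of freedom.
    """
    return math.erfc(math.sqrt(x / 2.0))


# Deviances and degrees of freedom from the question's R call
null_dev, resid_dev = 964.52, 960.23
df = 713 - 712
assert df == 1, "the erfc shortcut only covers 1 degree of freedom"

lr_stat = null_dev - resid_dev   # drop in deviance, about 4.29
p_value = chisq_sf_df1(lr_stat)  # about 0.038
alpha = 0.012

significant = p_value < alpha    # False, so age is not significant

# With age dropped, the best guess is the overall survival proportion
# (assumption: the Table(Survived) counts are the sample of interest)
survived, died = 342, 549
best_guess = survived / (survived + died)

print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.3f}, significant: {significant}")
print(f"Best guess of P(Survived = 1) = {best_guess:.3f}")
```

For more than one degree of freedom, a library routine such as SciPy's chi-square survival function would be the usual choice instead of the erfc shortcut.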

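To predict the probability of survival by class and sex, add up the relevant coefficients. The output reports t values rather than z values, which suggests an ordinary linear regression (a linear probability model) rather than a logistic one, so the sum of the coefficients is itself the predicted probability. First class and female are the baseline levels, and the interaction terms apply only to males in second or third class. A female in second class gets 0.96809 - 0.04703 ≈ 0.921; a female in third class gets 0.96809 - 0.46809 = 0.500; and a male in first class gets 0.96809 - 0.59923 ≈ 0.369.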
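The additive calculation can be sketched as below, assuming a linear probability model in which the predicted probability is the plain sum of coefficients (not passed through a logistic function). The dictionary keys and the `predict_survival` helper are hypothetical names; the coefficient values are copied from the output.

```python
# Coefficients copied from the output. Baseline levels are first class
# and female, so they contribute only the intercept.
coef = {
    "intercept": 0.96809,
    "class2": -0.04703,
    "class3": -0.46809,
    "male": -0.59923,
    "class2:male": -0.16441,
    "class3:male": 0.23468,
}


def predict_survival(pclass: int, male: bool) -> float:
    """Predicted P(Survived = 1), assuming a linear probability model."""
    c2 = 1 if pclass == 2 else 0  # dummy for second class
    c3 = 1 if pclass == 3 else 0  # dummy for third class
    m = 1 if male else 0          # dummy for male
    return (coef["intercept"]
            + coef["class2"] * c2
            + coef["class3"] * c3
            + coef["male"] * m
            + coef["class2:male"] * c2 * m  # interaction = product of dummies
            + coef["class3:male"] * c3 * m)


for label, pclass, male in [("female, 2nd class", 2, False),
                            ("female, 3rd class", 3, False),
                            ("male, 1st class", 1, True)]:
    print(f"{label}: {predict_survival(pclass, male):.3f}")
# female, 2nd class: 0.921
# female, 3rd class: 0.500
# male, 1st class: 0.369
```

Because each interaction term is the product of two 0/1 dummies, it is nonzero only for males in second or third class, which is why none of the three requested predictions uses an interaction coefficient.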

Sensitivity, Precision, and Accuracy are evaluation metrics. Sensitivity (true positive rate) is the proportion of actual positives that are correctly identified: 25/(25 + 15) = 25/40 = 0.625. Precision (positive predictive value) is the proportion of predicted positives that are actually positive: 25/(25 + 10) = 25/35 ≈ 0.714. Accuracy is the proportion of all predictions, positive and negative, that are correct: (25 + 40)/(25 + 15 + 40 + 10) = 65/90 ≈ 0.722.
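The three metrics can be checked with a few lines of Python; the variable names `tp`, `fp`, `fn`, and `tn` are labels chosen here for the confusion-table cells.

```python
# Cell counts from the confusion table (rows = actual, columns = predicted)
tn, fp = 40, 10  # actual 0: predicted 0, predicted 1
fn, tp = 15, 25  # actual 1: predicted 0, predicted 1

sensitivity = tp / (tp + fn)                # 25 / 40
precision = tp / (tp + fp)                  # 25 / 35
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 65 / 90

print(f"Sensitivity = {sensitivity:.3f}")  # 0.625
print(f"Precision   = {precision:.3f}")    # 0.714
print(f"Accuracy    = {accuracy:.3f}")     # 0.722
```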
