SciPy is the scientific toolbox for Python. In this lesson we will show how to use the SciPy library to perform statistical analyses: confidence intervals, hypothesis tests, and one-way ANOVA.
Let us first import the libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from numpy import sqrt, arange
from scipy import stats
%matplotlib inline
In this lesson we use a dataset containing information about Portuguese students from two public schools. This is a real-world dataset that was collected to study alcohol consumption in young people and its effects on students' academic performance. The dataset was built from two sources: school reports and questionnaires.
Attributes: among others, the dataset contains the grades G1, G2 and G3, which are related to the course subject.
student = pd.read_csv("/Users/dhafermalouche/Documents/Teaching_2018_2019/Teaching_python/student.csv", sep=";")
A preview of the data
student.head()
G3
is the variable that gives the final grade (numeric: from 0 to 20, output target). We will show how to compute the confidence interval of the mean.
The sample size
sample_size = student.shape[0]
sample_size
Because we have a sample size that is much greater than 30, we can use the Central Limit Theorem to calculate confidence intervals. According to this theorem we can calculate a confidence interval for the mean using the normal distribution.
To get the confidence interval for the mean we need three numbers: the sample mean, the standard error of the mean, and the confidence level (which determines the critical value of the normal distribution).
Formula for the standard error, where $s$ is the sample standard deviation and $n$ the sample size:
$$ SE = \frac{s}{\sqrt n} $$
sample_mean_grade = student['G3'].mean()
sample_mean_grade
std_error_grades = student['G3'].std()/sqrt(sample_size)
std_error_grades
stats.norm.interval(0.95, loc=sample_mean_grade, scale=std_error_grades)
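The interval returned above can be checked by hand: it is just mean ± z·SE, where z is the 0.975 quantile of the normal distribution. A minimal sketch with made-up summary numbers (not the student data):

```python
from math import sqrt
from scipy import stats

# Hypothetical summary statistics, for illustration only
mean, sd, n = 10.4, 4.6, 395
se = sd / sqrt(n)

# 95% interval by hand: mean +/- z * SE, with z the 0.975 normal quantile
z = stats.norm.ppf(0.975)
manual = (mean - z * se, mean + z * se)

# Same interval via stats.norm.interval
direct = stats.norm.interval(0.95, loc=mean, scale=se)
print(manual, direct)
```

Both calls should agree to floating-point precision.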
Let us compare the confidence intervals of G3 between girls and boys.
Girls data
SF = student.loc[(student.sex == 'F')]
SF.head()
Boys data
SM = student.loc[(student.sex == 'M')]
SM.head()
sample_mean_grade_F =SF['G3'].mean()
sample_mean_grade_F
sample_mean_grade_M =SM['G3'].mean()
sample_mean_grade_M
sample_size_F = SF.shape[0]
sample_size_F
sample_size_M = SM.shape[0]
sample_size_M
std_error_grades_F = SF['G3'].std()/sqrt(sample_size_F)
std_error_grades_F
std_error_grades_M = SM['G3'].std()/sqrt(sample_size_M)
std_error_grades_M
Girls Confidence Interval
stats.norm.interval(0.95, loc=sample_mean_grade_F, scale=std_error_grades_F)
Boys Confidence Interval
stats.norm.interval(0.95, loc=sample_mean_grade_M, scale=std_error_grades_M)
Let us show this graphically.
fig, axes = plt.subplots(1,2, figsize = (14,4))
sns.boxplot(x='sex', y='G3', data=student, ax=axes[0])
sns.pointplot(x='sex', y='G3', data=student, ax=axes[1]);
Let us try to answer the question: does alcohol consumption affect academic performance?
We can draw the same graph as above, comparing the final grades across the five categories of workday alcohol consumption, using the Dalc variable.
fig, axes = plt.subplots(1,2, figsize = (14,4))
sns.boxplot(x='Dalc', y='G3', data=student, ax=axes[0])
sns.pointplot(x='Dalc', y='G3', data=student, ax=axes[1]);
Some statistical tests from scipy.stats
Let's perform a normality test based on kurtosis (kurtosistest), whose null hypothesis is that the kurtosis of the data matches that of a normal distribution. We will use a significance level of 5.0%.
stats.kurtosistest(a=student['G3'])
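Note that kurtosistest only looks at kurtosis; scipy.stats also provides broader normality tests such as normaltest (D'Agostino-Pearson, combining skewness and kurtosis) and shapiro. A sketch on synthetic data, not the student grades:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=3, size=400)  # synthetic, genuinely normal sample

# D'Agostino-Pearson omnibus test: combines skewness and kurtosis
stat_nt, p_nt = stats.normaltest(x)

# Shapiro-Wilk test: another common normality test
stat_sw, p_sw = stats.shapiro(x)

print(p_nt, p_sw)
```

With truly normal data we would usually not reject the null hypothesis at the 5% level.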
Let's perform the Correlation test whose Null Hypothesis is that the correlation coefficient is equal to zero. We will use a significance level of 5.0%
stats.pearsonr(student['G1'],student['G2'])
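Pearson's test targets linear association; when the relationship may be monotone but nonlinear, Spearman's rank correlation is a useful companion. A sketch on synthetic data (not G1 and G2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.exp(x) + rng.normal(scale=0.05, size=200)  # monotone but nonlinear link

# Spearman correlates the ranks rather than the raw values
rho, p = stats.spearmanr(x, y)
print(rho, p)
```

Because the link is monotone, the rank correlation is close to 1 even though the relationship is far from linear.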
Let's perform Bartlett's test, whose null hypothesis is that the variances are equal. We will use a significance level of 5.0%.
student.groupby('sex')['G3'].var()
grades_girls = student['G3'][student['sex']=='F']
grades_boys = student['G3'][student['sex']=='M']
stats.bartlett(grades_girls, grades_boys)
According to the test we cannot reject the null hypothesis of equal variances, so we will assume that the two groups are samples from populations with the same variance. This information will be useful in our next test.
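Bartlett's test is known to be sensitive to departures from normality; Levene's test is a more robust alternative with the same null hypothesis. A sketch on synthetic groups with equal spread (not the student data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(loc=10, scale=3, size=200)
g2 = rng.normal(loc=12, scale=3, size=180)  # different means, same variance

# Levene's test: null hypothesis of equal variances, robust to non-normality
stat, p = stats.levene(g1, g2)
print(stat, p)
```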
Let's again make a graph showing the confidence intervals of the mean final grade for boys and girls.
fig, axes = plt.subplots(1,2, figsize = (14,4))
sns.boxplot(x='sex', y='G3', data=student, ax=axes[0])
sns.pointplot(x='sex', y='G3', data=student, ax=axes[1]);
The visualizations suggest a difference between the mean final grades of the two groups. Now we will perform a formal statistical test of whether girls' and boys' final scores differ.
Null Hypothesis: for both groups (Boys and Girls) the population means of the final grade are equal.
Alternative Hypothesis: The population means of the final grades are different.
A common test to apply in these cases is the two-sample t-test, which is used to determine whether two population means are equal.
The assumptions of this test are: the two samples are independent, the data in each group are approximately normally distributed (or the samples are large), and, for the standard version, the two populations have equal variances.
In addition, this test has two versions: one assuming equal variances and the other assuming unequal variances (Welch's t-test).
stats.ttest_ind(grades_girls, grades_boys, equal_var=True)
Since we got such a low p-value we can reject the Null hypothesis of equal means for the two groups at a level of significance of 5%.
There is then a statistically significant difference between the grades in the two analyzed groups. The results suggest that girls have higher grades than boys.
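Had Bartlett's test rejected equal variances, we would pass equal_var=False to obtain Welch's t-test instead. A sketch on synthetic data with unequal spreads (not the student data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(loc=10, scale=2, size=150)
b = rng.normal(loc=10, scale=5, size=120)  # same mean, different variance

# Welch's version does not pool the variances of the two samples
t_stat, p = stats.ttest_ind(a, b, equal_var=False)
print(t_stat, p)
```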
Do male teenagers drink more than female teenagers?
Let's first create a categorical variable Dalcc from the variable Dalc (workday alcohol consumption) by grouping its five levels into three categories of drinker: Low (1-2), Med (3) and High (4-5). The same could be done for weekend consumption (Walc).
Dalc_cat = {
1: 'Low',
2: 'Low',
3: 'Med',
4: 'High',
5: 'High'
}
student['Dalcc'] = student['Dalc'].map(Dalc_cat)
student['Dalcc'].head()
student['Dalcc'].value_counts()
Make it an ordered categorical variable:
from pandas.api.types import CategoricalDtype

student['Dalcc'] = student['Dalcc'].astype(CategoricalDtype(categories=['Low', 'Med', 'High'],
                                                            ordered=True))
student['Dalcc'].value_counts(sort=False)
gender_Dalc_table = pd.crosstab(student['Dalcc'], student['sex'])
gender_Dalc_table
fig, axes = plt.subplots(1,2, figsize = (14,4))
student['sex'].value_counts().plot(kind='bar', ax=axes[0], title='Gender')
student['Dalcc'].value_counts().plot(kind='bar', ax=axes[1], title='Alcohol Consumption Level');
Let us also look at the percentages in the cross-table. First, given the alcohol consumption level (each row sums to 100):
100*(gender_Dalc_table.T/gender_Dalc_table.apply(sum, axis=1)).T
And given the sex variable:
100*(gender_Dalc_table/gender_Dalc_table.apply(sum, axis=0)).T
fig, axes = plt.subplots(1,2, figsize = (14,4))
gender_Dalc_table.plot(kind='bar', stacked=True, ax=axes[0]);
(100*(gender_Dalc_table.T/gender_Dalc_table.apply(sum, axis=1)).T).plot(kind='bar', stacked=True, ax=axes[1]);
fig, axes = plt.subplots(1,2, figsize = (14,4))
(gender_Dalc_table.T).plot(kind='bar', stacked=True, ax=axes[0]);
(100*(gender_Dalc_table/gender_Dalc_table.apply(sum, axis=0)).T).plot(kind='bar', stacked=True, ax=axes[1]);
Chi-square test of independence of variables in a contingency table.
This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.
chi_stat, p_value, dof, expected = stats.chi2_contingency(gender_Dalc_table)
chi_stat
p_value
dof
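The chi-square statistic grows with the sample size; a common effect-size companion is Cramér's V, which rescales it to [0, 1]. A sketch with a hypothetical contingency table (not the table computed above):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table (e.g. gender x consumption level)
table = np.array([[120, 40, 15],
                  [100, 60, 45]])

chi2, p, dof, expected = stats.chi2_contingency(table)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))
print(chi2, p, cramers_v)
```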
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of independent groups.
The Tukey Test (or Tukey procedure), also called Tukey’s Honest Significant Difference test, is a post-hoc test based on the studentized range distribution.
An ANOVA test can tell you if your results are significant overall, but it will not tell you exactly where those differences lie. After you have run an ANOVA and found significant results, you can run Tukey's HSD to find out which specific groups' means (compared with each other) are different. The test compares all possible pairs of means.
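Before applying it to the student data, here is how the one-way ANOVA F statistic is computed, checked against scipy.stats.f_oneway (a sketch on synthetic groups):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.normal(loc=m, scale=2.0, size=50) for m in (10, 10.5, 12)]

# F = (between-group mean square) / (within-group mean square)
all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k, n = len(groups), all_obs.size

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

# scipy computes the same statistic and its p-value
f_scipy, p = stats.f_oneway(*groups)
print(f_manual, f_scipy, p)
```

The manual value and scipy's statistic agree to floating-point precision.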
As an example we will compare the final grades across the alcohol consumption groups on the whole sample, and we will do the same on the girls and boys sub-samples.
Let us first import the libraries
import pandas as pd
import scipy.stats as stats
import researchpy as rp
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison
We will now compute the confidence interval of the variable G3
and the confidence intervals by groups
rp.summary_cont(student['G3'])
rp.summary_cont(student['G3'].groupby(student['Dalcc']))
This is a first method to perform an ANOVA
stats.f_oneway(student['G3'][student['Dalcc'] == 'High'],
student['G3'][student['Dalcc'] == 'Med'],
student['G3'][student['Dalcc'] == 'Low'])
Second method
We need to create a new variable of type object (plain strings), which works smoothly with the statsmodels formula interface.
student['Dalcc'].head()
student['Dalcc2'] = student['Dalcc'].astype(dtype='object')
student['Dalcc2'].head()
results = ols('G3 ~ C(Dalcc2)', data=student).fit()
results.summary()
Let us now compare the one-way ANOVA results between the girls and boys sub-samples. To do so we will consider the two sub-samples and perform a one-way ANOVA on each. We will then compare the coefficients to measure how the impact of alcohol consumption differs between girls and boys.
Girls sub-sample
student_girls=student[student['sex']=='F']
student_girls.head()
Let's then perform the one-way ANOVA
results_girls = ols('G3 ~ C(Dalcc2)', data=student_girls).fit()
results_girls.summary()
Boys sub-sample
student_boys=student[student['sex']=='M']
student_boys.head()
Let's then perform the one-way ANOVA
results_boys = ols('G3 ~ C(Dalcc2)', data=student_boys).fit()
results_boys.summary()
Comparison of the coefficients
We will draw, in the same figure, a plot with the confidence intervals of the coefficients of the boys and girls models. We will first create a dataframe containing the estimates from both models.
x=['Boys','Girls']
coef_df = pd.DataFrame()
for i, mod in enumerate([results_boys, results_girls]):
err_series = mod.params - mod.conf_int()[0]
    coef_df = pd.concat([coef_df,
                         pd.DataFrame({'coef': mod.params.values[1:],
                                       'err': err_series.values[1:],
                                       'varname': err_series.index.values[1:],
                                       'model': x[i]})],
                        ignore_index=True)
coef_df
## markers to use
marker_list = 'so'
width = 0.05
## 2 non-intercept coefficients per model
import numpy as np
base_x = np.arange(2) - 0.2
base_x
from matplotlib import pyplot as plt
from matplotlib.lines import Line2D
%matplotlib inline
fig, ax = plt.subplots(figsize=(8, 5))
for i, mod in enumerate(coef_df.model.unique()):
mod_df = coef_df[coef_df.model == mod]
mod_df = mod_df.set_index('varname').reindex(coef_df['varname'].unique())
## offset x positions
X = base_x + width*i
ax.bar(X, mod_df['coef'],
color='none',yerr=mod_df['err'])
## remove axis labels
ax.set_ylabel('')
ax.set_xlabel('')
ax.scatter(x=X,
marker=marker_list[i], s=120,
y=mod_df['coef'], color='black')
ax.axhline(y=0, linestyle='--', color='black', linewidth=4)
ax.xaxis.set_ticks_position('none')
ax.set_xticks(base_x + width / 2)
_ = ax.set_xticklabels(['Low', 'Med'], rotation=0, fontsize=16)
## finally, build customized legend
legend_elements = [Line2D([0], [0], marker=m,
label=x[i],
color = 'k',
markersize=10)
for i, m in enumerate(marker_list)
]
_ = ax.legend(handles=legend_elements, loc=2,
prop={'size': 15}, labelspacing=1.2)
The overall model was significant; now we test which groups differ. Deciding which groups to compare should be theory driven. There are a few different techniques that can be used, each with its own way of controlling the familywise error rate. We will show here how to perform one of them: Tukey's HSD.
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison
mc_girls = MultiComparison(student_girls['G3'], student_girls['Dalcc2'])
mc_results_girls = mc_girls.tukeyhsd()
print(mc_results_girls)
mc_boys = MultiComparison(student_boys['G3'], student_boys['Dalcc2'])
mc_results_boys = mc_boys.tukeyhsd()
print(mc_results_boys)