Regression model with Python

Simple linear regression model

Method 1

We will use scikit-learn: sklearn.model_selection to split the data and sklearn.linear_model to fit the model. We estimate a regression model on a training set and use it to predict the response (target) variable on a test set.

Importing the libraries

import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

dataset

X

array([[ 1.1],
       [ 1.3],
       [ 1.5],
       [ 2. ],
       [ 2.2],
       [ 2.9],
       [ 3. ],
       [ 3.2],
       [ 3.2],
       [ 3.7],
       [ 3.9],
       [ 4. ],
       [ 4. ],
       [ 4.1],
       [ 4.5],
       [ 4.9],
       [ 5.1],
       [ 5.3],
       [ 5.9],
       [ 6. ],
       [ 6.8],
       [ 7.1],
       [ 7.9],
       [ 8.2],
       [ 8.7],
       [ 9. ],
       [ 9.5],
       [ 9.6],
       [10.3],
       [10.5]])

y

array([ 39343,  46205,  37731,  43525,  39891,  56642,  60150,  54445,
        64445,  57189,  63218,  55794,  56957,  57081,  61111,  67938,
        66029,  83088,  81363,  93940,  91738,  98273, 101302, 113812,
       109431, 105582, 116969, 112635, 122391, 121872])

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 5)

X_train

array([[10.3],
       [ 1.1],
       [ 5.3],
       [ 2.9],
       [ 1.3],
       [ 9.6],
       [ 4. ],
       [ 6.8],
       [ 6. ],
       [ 8.7],
       [ 3.2],
       [ 2.2],
       [ 3.2],
       [ 3.7],
       [ 5.1],
       [ 7.9],
       [ 3. ],
       [ 4.9],
       [ 4.5],
       [ 2. ]])

X_test

array([[ 4. ],
       [10.5],
       [ 8.2],
       [ 9. ],
       [ 5.9],
       [ 3.9],
       [ 1.5],
       [ 4.1],
       [ 9.5],
       [ 7.1]])
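As a rough sketch of what train_test_split does by default (plain shuffling, no stratify argument): the row indices are permuted at random and the first n_test of them form the test set, with n_test = ceil(test_size × n_samples). The helper split_indices below is hypothetical and uses NumPy only, so it will not reproduce scikit-learn's exact rows for random_state = 5; it only shows the mechanics.

```python
import numpy as np

def split_indices(n_samples, test_size, seed):
    """Hypothetical helper: shuffle row indices and split them into train/test."""
    rng = np.random.default_rng(seed)
    n_test = int(np.ceil(test_size * n_samples))  # fraction -> number of test rows
    perm = rng.permutation(n_samples)              # random order of the row indices
    return perm[n_test:], perm[:n_test]            # (train indices, test indices)

# 30 observations and a one-third test set, as in the Salary example
train_idx, test_idx = split_indices(30, 1/3, seed=5)
print(len(train_idx), len(test_idx))
```

With these index arrays, X[train_idx] and y[train_idx] select the training rows, and X[test_idx] and y[test_idx] the test rows.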

Fitting Simple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

help(regressor.fit)

Help on method fit in module sklearn.linear_model.base:

fit(X, y, sample_weight=None) method of sklearn.linear_model.base.LinearRegression instance
    Fit linear model.
    
    Parameters
    ----------
    X : numpy array or sparse matrix of shape [n_samples,n_features]
        Training data
    
    y : numpy array of shape [n_samples, n_targets]
        Target values. Will be cast to X's dtype if necessary
    
    sample_weight : numpy array of shape [n_samples]
        Individual weights for each sample
    
        .. versionadded:: 0.17
           parameter *sample_weight* support to LinearRegression.
    
    Returns
    -------
    self : returns an instance of self.

Predicting the Test set results

y_pred = regressor.predict(X_test)

Visualizing the Training set results

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Visualizing the Test set results

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

The $R^2$ on the training set

regressor.score(X_train,y_train)

0.9494021955344463
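regressor.score returns the coefficient of determination $R^2 = 1 - \sum_i (y_i-\hat y_i)^2 / \sum_i (y_i-\bar y)^2$ of the model's predictions on the data it is given, so regressor.score(X_test, y_test) would report the test-set value. A minimal NumPy sketch of the formula; the function name r_squared and the toy numbers are illustrative only:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    return 1 - ss_res / ss_tot

# Toy example: SS_res = 0.10 and SS_tot = 5.0, so R^2 = 0.98 (up to floating-point rounding)
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```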

Since our model is $y=a+b\times x + \epsilon$, the estimate of the slope $b$ is

regressor.coef_

array([9213.15275885])

and the estimate of the intercept $a$ is

regressor.intercept_

27334.81404888486
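For a single regressor the least-squares estimates have a closed form: $\hat b = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$ and $\hat a = \bar y - \hat b\,\bar x$. The sketch below applies these formulas with NumPy to all 30 observations printed above, not to the 20-row training set, so the values differ from coef_ and intercept_; they should instead match the statsmodels fit with a constant in Method 2 (slope about 9449.96, intercept about 2.579e+04).

```python
import numpy as np

# Years of experience and salary for all 30 rows of Salary_Data.csv (as printed above)
x = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
              3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
              6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
y = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445,
              64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938,
              66029, 83088, 81363, 93940, 91738, 98273, 101302, 113812,
              109431, 105582, 116969, 112635, 122391, 121872], dtype=float)

x_bar, y_bar = x.mean(), y.mean()
b_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
a_hat = y_bar - b_hat * x_bar                                          # intercept
print(b_hat, a_hat)
```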

Method 2

We now use the statsmodels library.

import statsmodels.api as sm

The model without a constant (intercept):

$y=b\times x + \epsilon$

model0 = sm.OLS(y, X).fit()

model0.summary()

The model with a constant (intercept):

$y=a+b\times x + \epsilon$

X = sm.add_constant(X)  # prepend a column of ones for the intercept
X[:5]                   # X is a NumPy array, so use slicing rather than .head()

model = sm.OLS(y, X).fit() ## sm.OLS(output, input)

model.summary()
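The two summaries report $R^2 = 0.973$ for the model without intercept and $R^2 = 0.957$ for the model with intercept, but these are not comparable: when the model has no constant, statsmodels reports the uncentered $R^2 = 1 - \text{SSR}/\sum_i y_i^2$ instead of $1 - \text{SSR}/\sum_i (y_i-\bar y)^2$. A NumPy sketch on the 30 salary observations, fitting $y = b\times x$ by least squares ($\hat b = \sum_i x_i y_i / \sum_i x_i^2$) and computing both versions:

```python
import numpy as np

# Years of experience and salary for all 30 rows of Salary_Data.csv (as printed above)
x = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
              3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
              6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
y = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445,
              64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938,
              66029, 83088, 81363, 93940, 91738, 98273, 101302, 113812,
              109431, 105582, 116969, 112635, 122391, 121872], dtype=float)

# Least-squares fit of y = b * x (no intercept)
b0 = np.sum(x * y) / np.sum(x ** 2)
ssr = np.sum((y - b0 * x) ** 2)

r2_uncentered = 1 - ssr / np.sum(y ** 2)             # what statsmodels reports without a constant
r2_centered = 1 - ssr / np.sum((y - y.mean()) ** 2)  # the usual definition, around the mean
print(round(b0, 1), round(r2_uncentered, 3), round(r2_centered, 3))
```

The centered value for the no-intercept fit is well below 0.957, so the model with a constant is the better fit despite its smaller reported $R^2$.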

Multivariate Regression Model

Method 1

Loading libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing data

dataset = pd.read_csv('startups.csv')
dataset.head()

Independent and dependent variables

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

X

array([[165349.2, 136897.8, 471784.1, 'New York'],
       [162597.7, 151377.59, 443898.53, 'California'],
       [153441.51, 101145.55, 407934.54, 'Florida'],
       [144372.41, 118671.85, 383199.62, 'New York'],
       [142107.34, 91391.77, 366168.42, 'Florida'],
       [131876.9, 99814.71, 362861.36, 'New York'],
       [134615.46, 147198.87, 127716.82, 'California'],
       [130298.13, 145530.06, 323876.68, 'Florida'],
       [120542.52, 148718.95, 311613.29, 'New York'],
       [123334.88, 108679.17, 304981.62, 'California'],
       [101913.08, 110594.11, 229160.95, 'Florida'],
       [100671.96, 91790.61, 249744.55, 'California'],
       [93863.75, 127320.38, 249839.44, 'Florida'],
       [91992.39, 135495.07, 252664.93, 'California'],
       [119943.24, 156547.42, 256512.92, 'Florida'],
       [114523.61, 122616.84, 261776.23, 'New York'],
       [78013.11, 121597.55, 264346.06, 'California'],
       [94657.16, 145077.58, 282574.31, 'New York'],
       [91749.16, 114175.79, 294919.57, 'Florida'],
       [86419.7, 153514.11, 0.0, 'New York'],
       [76253.86, 113867.3, 298664.47, 'California'],
       [78389.47, 153773.43, 299737.29, 'New York'],
       [73994.56, 122782.75, 303319.26, 'Florida'],
       [67532.53, 105751.03, 304768.73, 'Florida'],
       [77044.01, 99281.34, 140574.81, 'New York'],
       [64664.71, 139553.16, 137962.62, 'California'],
       [75328.87, 144135.98, 134050.07, 'Florida'],
       [72107.6, 127864.55, 353183.81, 'New York'],
       [66051.52, 182645.56, 118148.2, 'Florida'],
       [65605.48, 153032.06, 107138.38, 'New York'],
       [61994.48, 115641.28, 91131.24, 'Florida'],
       [61136.38, 152701.92, 88218.23, 'New York'],
       [63408.86, 129219.61, 46085.25, 'California'],
       [55493.95, 103057.49, 214634.81, 'Florida'],
       [46426.07, 157693.92, 210797.67, 'California'],
       [46014.02, 85047.44, 205517.64, 'New York'],
       [28663.76, 127056.21, 201126.82, 'Florida'],
       [44069.95, 51283.14, 197029.42, 'California'],
       [20229.59, 65947.93, 185265.1, 'New York'],
       [38558.51, 82982.09, 174999.3, 'California'],
       [28754.33, 118546.05, 172795.67, 'California'],
       [27892.92, 84710.77, 164470.71, 'Florida'],
       [23640.93, 96189.63, 148001.11, 'California'],
       [15505.73, 127382.3, 35534.17, 'New York'],
       [22177.74, 154806.14, 28334.72, 'California'],
       [1000.23, 124153.04, 1903.93, 'New York'],
       [1315.46, 115816.21, 297114.46, 'Florida'],
       [0.0, 135426.92, 0.0, 'California'],
       [542.05, 51743.15, 0.0, 'New York'],
       [0.0, 116983.8, 45173.06, 'California']], dtype=object)

y

array([192261.83, 191792.06, 191050.39, 182901.99, 166187.94, 156991.12,
       156122.51, 155752.6 , 152211.77, 149759.96, 146121.95, 144259.4 ,
       141585.52, 134307.35, 132602.65, 129917.04, 126992.93, 125370.37,
       124266.9 , 122776.86, 118474.03, 111313.02, 110352.25, 108733.99,
       108552.04, 107404.34, 105733.54, 105008.31, 103282.38, 101004.64,
        99937.59,  97483.56,  97427.84,  96778.92,  96712.8 ,  96479.51,
        90708.19,  89949.14,  81229.06,  81005.76,  78239.91,  77798.83,
        71498.49,  69758.98,  65200.33,  64926.08,  49490.75,  42559.73,
        35673.41,  14681.4 ])

Encoding categorical data

We rebuild a DataFrame from X and create one indicator (dummy) column per state with pandas.get_dummies:

dX = pd.DataFrame(X,columns=dataset.columns[:4])

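# One 0/1 column per state, in alphabetical order: California, Florida, New York
dummies = pd.get_dummies(dX.State, dtype=int)
dummies.head()

# Keep California and Florida only; New York becomes the reference category
dummies1 = pd.DataFrame(dummies.iloc[:, :-1].values, columns=['California', 'Florida'])
dummies1.head()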

X1 = dataset.iloc[:, :-2].values
X1

array([[165349.2 , 136897.8 , 471784.1 ],
       [162597.7 , 151377.59, 443898.53],
       [153441.51, 101145.55, 407934.54],
       [144372.41, 118671.85, 383199.62],
       [142107.34,  91391.77, 366168.42],
       [131876.9 ,  99814.71, 362861.36],
       [134615.46, 147198.87, 127716.82],
       [130298.13, 145530.06, 323876.68],
       [120542.52, 148718.95, 311613.29],
       [123334.88, 108679.17, 304981.62],
       [101913.08, 110594.11, 229160.95],
       [100671.96,  91790.61, 249744.55],
       [ 93863.75, 127320.38, 249839.44],
       [ 91992.39, 135495.07, 252664.93],
       [119943.24, 156547.42, 256512.92],
       [114523.61, 122616.84, 261776.23],
       [ 78013.11, 121597.55, 264346.06],
       [ 94657.16, 145077.58, 282574.31],
       [ 91749.16, 114175.79, 294919.57],
       [ 86419.7 , 153514.11,      0.  ],
       [ 76253.86, 113867.3 , 298664.47],
       [ 78389.47, 153773.43, 299737.29],
       [ 73994.56, 122782.75, 303319.26],
       [ 67532.53, 105751.03, 304768.73],
       [ 77044.01,  99281.34, 140574.81],
       [ 64664.71, 139553.16, 137962.62],
       [ 75328.87, 144135.98, 134050.07],
       [ 72107.6 , 127864.55, 353183.81],
       [ 66051.52, 182645.56, 118148.2 ],
       [ 65605.48, 153032.06, 107138.38],
       [ 61994.48, 115641.28,  91131.24],
       [ 61136.38, 152701.92,  88218.23],
       [ 63408.86, 129219.61,  46085.25],
       [ 55493.95, 103057.49, 214634.81],
       [ 46426.07, 157693.92, 210797.67],
       [ 46014.02,  85047.44, 205517.64],
       [ 28663.76, 127056.21, 201126.82],
       [ 44069.95,  51283.14, 197029.42],
       [ 20229.59,  65947.93, 185265.1 ],
       [ 38558.51,  82982.09, 174999.3 ],
       [ 28754.33, 118546.05, 172795.67],
       [ 27892.92,  84710.77, 164470.71],
       [ 23640.93,  96189.63, 148001.11],
       [ 15505.73, 127382.3 ,  35534.17],
       [ 22177.74, 154806.14,  28334.72],
       [  1000.23, 124153.04,   1903.93],
       [  1315.46, 115816.21, 297114.46],
       [     0.  , 135426.92,      0.  ],
       [   542.05,  51743.15,      0.  ],
       [     0.  , 116983.8 ,  45173.06]])

dX = pd.DataFrame(X1, columns=dataset.columns[:3])

dX1=dX.join(dummies1)

dX1

X2 = dX1.values

X2

array([[1.6534920e+05, 1.3689780e+05, 4.7178410e+05, 0.0000000e+00,
        0.0000000e+00],
       [1.6259770e+05, 1.5137759e+05, 4.4389853e+05, 1.0000000e+00,
        0.0000000e+00],
       [1.5344151e+05, 1.0114555e+05, 4.0793454e+05, 0.0000000e+00,
        1.0000000e+00],
       [1.4437241e+05, 1.1867185e+05, 3.8319962e+05, 0.0000000e+00,
        0.0000000e+00],
       [1.4210734e+05, 9.1391770e+04, 3.6616842e+05, 0.0000000e+00,
        1.0000000e+00],
       [1.3187690e+05, 9.9814710e+04, 3.6286136e+05, 0.0000000e+00,
        0.0000000e+00],
       [1.3461546e+05, 1.4719887e+05, 1.2771682e+05, 1.0000000e+00,
        0.0000000e+00],
       [1.3029813e+05, 1.4553006e+05, 3.2387668e+05, 0.0000000e+00,
        1.0000000e+00],
       [1.2054252e+05, 1.4871895e+05, 3.1161329e+05, 0.0000000e+00,
        0.0000000e+00],
       [1.2333488e+05, 1.0867917e+05, 3.0498162e+05, 1.0000000e+00,
        0.0000000e+00],
       [1.0191308e+05, 1.1059411e+05, 2.2916095e+05, 0.0000000e+00,
        1.0000000e+00],
       [1.0067196e+05, 9.1790610e+04, 2.4974455e+05, 1.0000000e+00,
        0.0000000e+00],
       [9.3863750e+04, 1.2732038e+05, 2.4983944e+05, 0.0000000e+00,
        1.0000000e+00],
       [9.1992390e+04, 1.3549507e+05, 2.5266493e+05, 1.0000000e+00,
        0.0000000e+00],
       [1.1994324e+05, 1.5654742e+05, 2.5651292e+05, 0.0000000e+00,
        1.0000000e+00],
       [1.1452361e+05, 1.2261684e+05, 2.6177623e+05, 0.0000000e+00,
        0.0000000e+00],
       [7.8013110e+04, 1.2159755e+05, 2.6434606e+05, 1.0000000e+00,
        0.0000000e+00],
       [9.4657160e+04, 1.4507758e+05, 2.8257431e+05, 0.0000000e+00,
        0.0000000e+00],
       [9.1749160e+04, 1.1417579e+05, 2.9491957e+05, 0.0000000e+00,
        1.0000000e+00],
       [8.6419700e+04, 1.5351411e+05, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [7.6253860e+04, 1.1386730e+05, 2.9866447e+05, 1.0000000e+00,
        0.0000000e+00],
       [7.8389470e+04, 1.5377343e+05, 2.9973729e+05, 0.0000000e+00,
        0.0000000e+00],
       [7.3994560e+04, 1.2278275e+05, 3.0331926e+05, 0.0000000e+00,
        1.0000000e+00],
       [6.7532530e+04, 1.0575103e+05, 3.0476873e+05, 0.0000000e+00,
        1.0000000e+00],
       [7.7044010e+04, 9.9281340e+04, 1.4057481e+05, 0.0000000e+00,
        0.0000000e+00],
       [6.4664710e+04, 1.3955316e+05, 1.3796262e+05, 1.0000000e+00,
        0.0000000e+00],
       [7.5328870e+04, 1.4413598e+05, 1.3405007e+05, 0.0000000e+00,
        1.0000000e+00],
       [7.2107600e+04, 1.2786455e+05, 3.5318381e+05, 0.0000000e+00,
        0.0000000e+00],
       [6.6051520e+04, 1.8264556e+05, 1.1814820e+05, 0.0000000e+00,
        1.0000000e+00],
       [6.5605480e+04, 1.5303206e+05, 1.0713838e+05, 0.0000000e+00,
        0.0000000e+00],
       [6.1994480e+04, 1.1564128e+05, 9.1131240e+04, 0.0000000e+00,
        1.0000000e+00],
       [6.1136380e+04, 1.5270192e+05, 8.8218230e+04, 0.0000000e+00,
        0.0000000e+00],
       [6.3408860e+04, 1.2921961e+05, 4.6085250e+04, 1.0000000e+00,
        0.0000000e+00],
       [5.5493950e+04, 1.0305749e+05, 2.1463481e+05, 0.0000000e+00,
        1.0000000e+00],
       [4.6426070e+04, 1.5769392e+05, 2.1079767e+05, 1.0000000e+00,
        0.0000000e+00],
       [4.6014020e+04, 8.5047440e+04, 2.0551764e+05, 0.0000000e+00,
        0.0000000e+00],
       [2.8663760e+04, 1.2705621e+05, 2.0112682e+05, 0.0000000e+00,
        1.0000000e+00],
       [4.4069950e+04, 5.1283140e+04, 1.9702942e+05, 1.0000000e+00,
        0.0000000e+00],
       [2.0229590e+04, 6.5947930e+04, 1.8526510e+05, 0.0000000e+00,
        0.0000000e+00],
       [3.8558510e+04, 8.2982090e+04, 1.7499930e+05, 1.0000000e+00,
        0.0000000e+00],
       [2.8754330e+04, 1.1854605e+05, 1.7279567e+05, 1.0000000e+00,
        0.0000000e+00],
       [2.7892920e+04, 8.4710770e+04, 1.6447071e+05, 0.0000000e+00,
        1.0000000e+00],
       [2.3640930e+04, 9.6189630e+04, 1.4800111e+05, 1.0000000e+00,
        0.0000000e+00],
       [1.5505730e+04, 1.2738230e+05, 3.5534170e+04, 0.0000000e+00,
        0.0000000e+00],
       [2.2177740e+04, 1.5480614e+05, 2.8334720e+04, 1.0000000e+00,
        0.0000000e+00],
       [1.0002300e+03, 1.2415304e+05, 1.9039300e+03, 0.0000000e+00,
        0.0000000e+00],
       [1.3154600e+03, 1.1581621e+05, 2.9711446e+05, 0.0000000e+00,
        1.0000000e+00],
       [0.0000000e+00, 1.3542692e+05, 0.0000000e+00, 1.0000000e+00,
        0.0000000e+00],
       [5.4205000e+02, 5.1743150e+04, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 1.1698380e+05, 4.5173060e+04, 1.0000000e+00,
        0.0000000e+00]])

We use 80% of the observations for the training set and 20% for the test set.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X2, y, test_size = 0.2, random_state = 0)

We then create a regressor and fit it on the training set:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

regressor.coef_

array([ 7.73467193e-01,  3.28845975e-02,  3.66100259e-02, -6.99369053e+02,
       -1.65865321e+03])

regressor.intercept_

43253.53667068361
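An ordinary least-squares fit with an intercept amounts to adding a column of ones to the design matrix and solving a least-squares problem; scikit-learn's LinearRegression also solves a least-squares problem, though its internals differ. The sketch below illustrates the idea with np.linalg.lstsq on synthetic, noise-free data (the true intercept 3.0 and coefficients 2.0 and -1.0 are made up for the example), so the fit should recover them exactly up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))                    # two made-up regressors
intercept_true, coef_true = 3.0, np.array([2.0, -1.0])
y = intercept_true + X @ coef_true             # noise-free response

# Prepend a column of ones so that the first parameter plays the role of the intercept
design = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(design, y, rcond=None)

intercept_hat, coef_hat = params[0], params[1:]  # analogues of intercept_ and coef_
print(intercept_hat, coef_hat)
```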

Predicting the Test set results

y_pred = regressor.predict(X_test)
y_pred

array([103015.20159776, 132582.27760831, 132447.73845184,  71976.09851266,
       178537.4822107 , 116161.24230157,  67851.69209689,  98791.73374679,
       113969.43533008, 167921.06569569])
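predict evaluates $\hat y = a + \sum_j b_j x_j$ with the fitted intercept_ and coef_. As an illustration, the sketch below plugs the printed coefficients into this formula for the first row of the data (a New York startup, so both the California and Florida dummies are 0). That row may belong to the training set, so this only checks the formula; it is not a test-set evaluation.

```python
import numpy as np

# Coefficients as printed above, in column order:
# R&D Spend, Administration, Marketing Spend, California, Florida
coef = np.array([7.73467193e-01, 3.28845975e-02, 3.66100259e-02,
                 -6.99369053e+02, -1.65865321e+03])
intercept = 43253.53667068361

# First row of the data: New York, so both dummies are 0
x_row = np.array([165349.2, 136897.8, 471784.1, 0.0, 0.0])

y_hat = intercept + x_row @ coef
print(round(y_hat, 2))  # about 192919.6; the observed profit for this row is 192261.83
```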

Method 2

Let's start by importing the libraries (the dataset loaded in Method 1 is reused)

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

Defining dependent and independent variables

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

Constructing the dummy variables

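from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# OneHotEncoder(categorical_features=...) was removed in scikit-learn 0.22;
# ColumnTransformer now selects the column to encode (State, column 3).
# Encoded columns come first (California, Florida, New York), then the
# three spending columns are passed through unchanged.
ct = ColumnTransformer([('state', OneHotEncoder(), [3])],
                       remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X).astype(float)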

Deleting one dummy variable to avoid the dummy variable trap (the first column, California, becomes the reference category)

X = X[:, 1:]
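Why one dummy has to go: with an intercept, the three state dummies always add up to the column of ones, so the design matrix is rank-deficient and the coefficients are not uniquely determined. A small NumPy check on a made-up five-row example (the states and the numeric column are illustrative only):

```python
import numpy as np

# Hypothetical states for five rows, one-hot encoded as (California, Florida, New York)
dummies = np.array([[0, 0, 1],
                    [1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [0, 1, 0]], dtype=float)
x = np.array([165.3, 162.6, 153.4, 144.4, 142.1])  # one made-up numeric regressor
ones = np.ones(len(x))

full = np.column_stack([ones, dummies, x])             # intercept + all three dummies
reduced = np.column_stack([ones, dummies[:, 1:], x])   # drop California, as X[:, 1:] does

print(np.linalg.matrix_rank(full), full.shape[1])        # 4 5: one column is redundant
print(np.linalg.matrix_rank(reduced), reduced.shape[1])  # 4 4: full column rank
```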

X[:4]

array([[0.0000000e+00, 1.0000000e+00, 1.6534920e+05, 1.3689780e+05,
        4.7178410e+05],
       [0.0000000e+00, 0.0000000e+00, 1.6259770e+05, 1.5137759e+05,
        4.4389853e+05],
       [1.0000000e+00, 0.0000000e+00, 1.5344151e+05, 1.0114555e+05,
        4.0793454e+05],
       [0.0000000e+00, 1.0000000e+00, 1.4437241e+05, 1.1867185e+05,
        3.8319962e+05]])

Putting the independent variables in a DataFrame with named columns

dX2 = pd.DataFrame(X, columns=['Florida','New York','R&D Spend', 'Administration', 'Marketing Spend'])

We add an intercept

dX2 = sm.add_constant(dX2)

Fitting the regression model

model = sm.OLS(y, dX2).fit() ## sm.OLS(output, input)

model.summary()
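The std err and t columns of the summary follow the classical OLS formulas for the default nonrobust covariance: with $n$ observations and $p$ estimated parameters (constant included), $\hat\sigma^2 = \text{SSR}/(n-p)$, $\operatorname{se}(\hat\beta_j) = \sqrt{\hat\sigma^2\,[(X^\top X)^{-1}]_{jj}}$ and $t_j = \hat\beta_j/\operatorname{se}(\hat\beta_j)$; here $n-p = 50-6 = 44$ matches Df Residuals. A NumPy sketch on synthetic data (seed, sample size and true coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # const + 2 made-up regressors
beta_true = np.array([1.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates ("coef" column)
resid = y - X @ beta_hat
df_resid = n - X.shape[1]                         # n - p, as in "Df Residuals"
sigma2 = resid @ resid / df_resid                 # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)             # nonrobust covariance of beta_hat
se = np.sqrt(np.diag(cov))                        # "std err" column
t = beta_hat / se                                 # "t" column

for name, b, s, tt in zip(["const", "x1", "x2"], beta_hat, se, t):
    print(f"{name:6s} coef={b:8.4f} std err={s:7.4f} t={tt:8.3f}")
```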

Output of dataset (Salary_Data.csv):

	Experience	Salary
0	1.1	39343
1	1.3	46205
2	1.5	37731
3	2.0	43525
4	2.2	39891
5	2.9	56642
6	3.0	60150
7	3.2	54445
8	3.2	64445
9	3.7	57189
10	3.9	63218
11	4.0	55794
12	4.0	56957
13	4.1	57081
14	4.5	61111
15	4.9	67938
16	5.1	66029
17	5.3	83088
18	5.9	81363
19	6.0	93940
20	6.8	91738
21	7.1	98273
22	7.9	101302
23	8.2	113812
24	8.7	109431
25	9.0	105582
26	9.5	116969
27	9.6	112635
28	10.3	122391
29	10.5	121872

Output of model0.summary() (simple regression without intercept):

Dep. Variable:	y	R-squared:	0.973
Model:	OLS	Adj. R-squared:	0.972
Method:	Least Squares	F-statistic:	1048.
Date:	Sun, 04 Nov 2018	Prob (F-statistic):	2.56e-24
Time:	20:40:25	Log-Likelihood:	-327.28
No. Observations:	30	AIC:	656.6
Df Residuals:	29	BIC:	658.0
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
x1	1.325e+04	409.401	32.376	0.000	1.24e+04	1.41e+04

Omnibus:	0.610	Durbin-Watson:	0.323
Prob(Omnibus):	0.737	Jarque-Bera (JB):	0.671
Skew:	-0.121	Prob(JB):	0.715
Kurtosis:	2.308	Cond. No.	1.00

Output of model.summary() (simple regression with intercept):

Dep. Variable:	y	R-squared:	0.957
Model:	OLS	Adj. R-squared:	0.955
Method:	Least Squares	F-statistic:	622.5
Date:	Sun, 04 Nov 2018	Prob (F-statistic):	1.14e-20
Time:	20:40:27	Log-Likelihood:	-301.44
No. Observations:	30	AIC:	606.9
Df Residuals:	28	BIC:	609.7
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	2.579e+04	2273.053	11.347	0.000	2.11e+04	3.04e+04
x1	9449.9623	378.755	24.950	0.000	8674.119	1.02e+04

Omnibus:	2.140	Durbin-Watson:	1.648
Prob(Omnibus):	0.343	Jarque-Bera (JB):	1.569
Skew:	0.363	Prob(JB):	0.456
Kurtosis:	2.147	Cond. No.	13.2

Output of dataset.head() (startups.csv):

	R&D Spend	Administration	Marketing Spend	State	Profit
0	165349.20	136897.80	471784.10	New York	192261.83
1	162597.70	151377.59	443898.53	California	191792.06
2	153441.51	101145.55	407934.54	Florida	191050.39
3	144372.41	118671.85	383199.62	New York	182901.99
4	142107.34	91391.77	366168.42	Florida	166187.94

Output of dX1 (rows 0 to 39):

	R&D Spend	Administration	Marketing Spend	California	Florida
0	165349.20	136897.80	471784.10	0	0
1	162597.70	151377.59	443898.53	1	0
2	153441.51	101145.55	407934.54	0	1
3	144372.41	118671.85	383199.62	0	0
4	142107.34	91391.77	366168.42	0	1
5	131876.90	99814.71	362861.36	0	0
6	134615.46	147198.87	127716.82	1	0
7	130298.13	145530.06	323876.68	0	1
8	120542.52	148718.95	311613.29	0	0
9	123334.88	108679.17	304981.62	1	0
10	101913.08	110594.11	229160.95	0	1
11	100671.96	91790.61	249744.55	1	0
12	93863.75	127320.38	249839.44	0	1
13	91992.39	135495.07	252664.93	1	0
14	119943.24	156547.42	256512.92	0	1
15	114523.61	122616.84	261776.23	0	0
16	78013.11	121597.55	264346.06	1	0
17	94657.16	145077.58	282574.31	0	0
18	91749.16	114175.79	294919.57	0	1
19	86419.70	153514.11	0.00	0	0
20	76253.86	113867.30	298664.47	1	0
21	78389.47	153773.43	299737.29	0	0
22	73994.56	122782.75	303319.26	0	1
23	67532.53	105751.03	304768.73	0	1
24	77044.01	99281.34	140574.81	0	0
25	64664.71	139553.16	137962.62	1	0
26	75328.87	144135.98	134050.07	0	1
27	72107.60	127864.55	353183.81	0	0
28	66051.52	182645.56	118148.20	0	1
29	65605.48	153032.06	107138.38	0	0
30	61994.48	115641.28	91131.24	0	1
31	61136.38	152701.92	88218.23	0	0
32	63408.86	129219.61	46085.25	1	0
33	55493.95	103057.49	214634.81	0	1
34	46426.07	157693.92	210797.67	1	0
35	46014.02	85047.44	205517.64	0	0
36	28663.76	127056.21	201126.82	0	1
37	44069.95	51283.14	197029.42	1	0
38	20229.59	65947.93	185265.10	0	0
39	38558.51	82982.09	174999.30	1	0

Output of model.summary() (multiple regression, Method 2):

Dep. Variable:	y	R-squared:	0.951
Model:	OLS	Adj. R-squared:	0.945
Method:	Least Squares	F-statistic:	169.9
Date:	Sun, 04 Nov 2018	Prob (F-statistic):	1.34e-27
Time:	20:40:43	Log-Likelihood:	-525.38
No. Observations:	50	AIC:	1063.
Df Residuals:	44	BIC:	1074.
Df Model:	5
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	5.013e+04	6884.820	7.281	0.000	3.62e+04	6.4e+04
Florida	198.7888	3371.007	0.059	0.953	-6595.030	6992.607
New York	-41.8870	3256.039	-0.013	0.990	-6604.003	6520.229
R&D Spend	0.8060	0.046	17.369	0.000	0.712	0.900
Administration	-0.0270	0.052	-0.517	0.608	-0.132	0.078
Marketing Spend	0.0270	0.017	1.574	0.123	-0.008	0.062

Omnibus:	14.782	Durbin-Watson:	1.283
Prob(Omnibus):	0.001	Jarque-Bera (JB):	21.266
Skew:	-0.948	Prob(JB):	2.41e-05
Kurtosis:	5.572	Cond. No.	1.45e+06