Home Basic Model Fitting in R
Post
Cancel

Basic Model Fitting in R

In previous categories we have looked at cleaning, summarizing, and subsetting data, with some minor calculations, but we haven’t yet looked at analyzing our data.

R is a very powerful tool for data analysis. We can fit linear models and view graphs. First, we will look at some basic data analysis processes in R.

Let’s re-load in our Gapminder data:

1
2
3
link <- 'https://raw.githubusercontent.com/jstaf/gapminder/master/gapminder/gapminder.csv'
df <- read.csv(url(link))
head(df)
1
2
3
4
5
6
7
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134

Linear Modelling

Let’s say we want to fit a linear model to see if there is a relationship between population and time. First, let’s do this with the entire dataset with all countries.

We know that there is Simple Linear Regression (SLR) and Multiple Linear Regression (MLR). See the pseudocode below for a layout on how to form both in R.

1
2
3
4
5
6
# Simple Linear Regression
# model_name <- lm(Y ~ X, data = dataframe)

# Multiple Linear Regression
# model_name <- lm(Y ~ X1 + X2 + ..., data = dataframe)
# model_name <- lm(Y ~ ., data = dataframe) to use all variables except Y

Now that we have the layout of the code down, let’s use Y as population and X as year for an SLR model.

1
2
model1 <- lm(pop ~ year, data = df)
summary(model1)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
## 
## Call:
## lm(formula = pop ~ year, data = df)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
##  -43318856  -27548179  -18558743   -9628265 1275164661 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -972185807  294031308  -3.306 0.000965 ***
## year            506081     148533   3.407 0.000672 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 105800000 on 1702 degrees of freedom
## Multiple R-squared:  0.006775,	Adjusted R-squared:  0.006191 
## F-statistic: 11.61 on 1 and 1702 DF,  p-value: 0.0006716

Now, let’s perform MLR with two variables, year and lifeExp.

1
2
model2 <- lm(pop ~ year + lifeExp, data = df)
summary(model2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
## 
## Call:
## lm(formula = pop ~ year + lifeExp, data = df)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
##  -47148304  -27382074  -18454446   -8444157 1273829226 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -799315581  321078589  -2.489   0.0129 *
## year            409882     164973   2.485   0.0131 *
## lifeExp         295176     220507   1.339   0.1809  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 105800000 on 1701 degrees of freedom
## Multiple R-squared:  0.00782,	Adjusted R-squared:  0.006653 
## F-statistic: 6.703 on 2 and 1701 DF,  p-value: 0.00126

Finally, let’s perform MLR with all variables in the dataset.

1
2
model3 <- lm(pop ~ ., data = df)
summary(model3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
## 
## Call:
## lm(formula = pop ~ ., data = df)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -375691051   -5548380    -604356    4854327  391492520 
## 
## Coefficients: (4 not defined because of singularities)
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -2.211e+08  1.677e+08  -1.319 0.187435    
## countryAlbania                  -5.423e+07  1.431e+07  -3.789 0.000157 ***
## countryAlgeria                  -2.412e+07  1.347e+07  -1.790 0.073575 .  
## countryAngola                   -8.521e+06  1.259e+07  -0.677 0.498532    
## countryArgentina                -2.796e+07  1.449e+07  -1.930 0.053788 .  
## countryAustralia                -4.731e+07  1.550e+07  -3.053 0.002304 ** 
## countryAustria                  -5.220e+07  1.533e+07  -3.405 0.000678 ***
## countryBahrain                  -4.982e+07  1.443e+07  -3.453 0.000568 ***
## countryBangladesh                5.839e+07  1.287e+07   4.538 6.10e-06 ***
## countryBelgium                  -5.088e+07  1.537e+07  -3.310 0.000954 ***
## countryBenin                    -2.687e+07  1.282e+07  -2.096 0.036235 *  
## countryBolivia                  -2.992e+07  1.302e+07  -2.299 0.021637 *  
## countryBosnia and Herzegovina   -5.198e+07  1.424e+07  -3.650 0.000271 ***
## countryBotswana                 -3.697e+07  1.316e+07  -2.808 0.005043 ** 
## countryBrazil                    7.429e+07  1.376e+07   5.400 7.68e-08 ***
## countryBulgaria                 -4.978e+07  1.450e+07  -3.432 0.000614 ***
## countryBurkina Faso             -1.793e+07  1.268e+07  -1.414 0.157468    
## countryBurundi                  -2.106e+07  1.268e+07  -1.661 0.096887 .  
## countryCambodia                 -2.130e+07  1.278e+07  -1.666 0.095911 .  
## countryCameroon                 -2.008e+07  1.280e+07  -1.569 0.116755    
## countryCanada                   -3.735e+07  1.563e+07  -2.389 0.017010 *  
## countryCentral African Republic -2.179e+07  1.266e+07  -1.722 0.085353 .  
## countryChad                     -2.287e+07  1.274e+07  -1.795 0.072860 .  
## countryChile                    -4.360e+07  1.427e+07  -3.056 0.002279 ** 
## countryChina                     9.099e+08  1.366e+07  66.602  < 2e-16 ***
## countryColombia                 -2.330e+07  1.388e+07  -1.679 0.093446 .  
## countryComoros                  -3.532e+07  1.300e+07  -2.718 0.006646 ** 
## countryCongo, Dem. Rep.          7.368e+06  1.267e+07   0.581 0.561055    
## countryCongo, Rep.              -3.354e+07  1.302e+07  -2.576 0.010080 *  
## countryCosta Rica               -5.633e+07  1.453e+07  -3.876 0.000111 ***
## countryCote d'Ivoire            -2.113e+07  1.281e+07  -1.650 0.099208 .  
## countryCroatia                  -5.353e+07  1.460e+07  -3.666 0.000255 ***
## countryCuba                     -5.049e+07  1.464e+07  -3.448 0.000580 ***
## countryCzech Republic           -4.891e+07  1.490e+07  -3.283 0.001050 ** 
## countryDenmark                  -5.625e+07  1.554e+07  -3.620 0.000304 ***
## countryDjibouti                 -2.712e+07  1.274e+07  -2.129 0.033376 *  
## countryDominican Republic       -4.194e+07  1.366e+07  -3.071 0.002169 ** 
## countryEcuador                  -4.060e+07  1.381e+07  -2.941 0.003322 ** 
## countryEgypt                     6.005e+06  1.325e+07   0.453 0.650446    
## countryEl Salvador              -4.046e+07  1.352e+07  -2.993 0.002807 ** 
## countryEquatorial Guinea        -2.252e+07  1.264e+07  -1.781 0.075048 .  
## countryEritrea                  -2.446e+07  1.271e+07  -1.924 0.054538 .  
## countryEthiopia                  1.638e+07  1.267e+07   1.293 0.196194    
## countryFinland                  -5.543e+07  1.520e+07  -3.648 0.000273 ***
## countryFrance                   -8.804e+06  1.541e+07  -0.571 0.567874    
## countryGabon                    -3.138e+07  1.310e+07  -2.396 0.016694 *  
## countryGambia                   -2.432e+07  1.267e+07  -1.920 0.055049 .  
## countryGermany                   1.733e+07  1.538e+07   1.127 0.259883    
## countryGhana                    -2.306e+07  1.299e+07  -1.775 0.076090 .  
## countryGreece                   -5.244e+07  1.515e+07  -3.460 0.000555 ***
## countryGuatemala                -3.413e+07  1.329e+07  -2.567 0.010339 *  
## countryGuinea                   -1.818e+07  1.264e+07  -1.438 0.150520    
## countryGuinea-Bissau            -1.729e+07  1.258e+07  -1.374 0.169663    
## countryHaiti                    -2.725e+07  1.288e+07  -2.115 0.034566 *  
## countryHonduras                 -3.889e+07  1.337e+07  -2.909 0.003672 ** 
## countryHong Kong, China         -5.631e+07  1.521e+07  -3.703 0.000220 ***
## countryHungary                  -4.642e+07  1.457e+07  -3.186 0.001473 ** 
## countryIceland                  -6.410e+07  1.575e+07  -4.071 4.91e-05 ***
## countryIndia                     6.643e+08  1.304e+07  50.949  < 2e-16 ***
## countryIndonesia                 1.101e+08  1.311e+07   8.396  < 2e-16 ***
## countryIran                     -6.002e+05  1.349e+07  -0.045 0.964511    
## countryIraq                     -2.588e+07  1.334e+07  -1.939 0.052645 .  
## countryIreland                  -5.722e+07  1.513e+07  -3.781 0.000162 ***
## countryIsrael                   -5.786e+07  1.515e+07  -3.819 0.000139 ***
## countryItaly                    -7.265e+06  1.527e+07  -0.476 0.634287    
## countryJamaica                  -5.453e+07  1.439e+07  -3.789 0.000157 ***
## countryJapan                     4.916e+07  1.542e+07   3.187 0.001465 ** 
## countryJordan                   -4.258e+07  1.351e+07  -3.151 0.001656 ** 
## countryKenya                    -1.790e+07  1.301e+07  -1.376 0.169170    
## countryKorea, Dem. Rep.         -3.402e+07  1.383e+07  -2.459 0.014037 *  
## countryKorea, Rep.              -1.476e+07  1.406e+07  -1.050 0.293722    
## countryKuwait                   -4.441e+07  1.832e+07  -2.424 0.015450 *  
## countryLebanon                  -4.982e+07  1.412e+07  -3.528 0.000431 ***
## countryLesotho                  -3.122e+07  1.287e+07  -2.425 0.015431 *  
## countryLiberia                  -2.074e+07  1.263e+07  -1.643 0.100642    
## countryLibya                    -3.974e+07  1.365e+07  -2.911 0.003652 ** 
## countryMadagascar               -1.960e+07  1.278e+07  -1.534 0.125294    
## countryMalawi                   -1.672e+07  1.264e+07  -1.322 0.186342    
## countryMalaysia                 -3.626e+07  1.393e+07  -2.602 0.009349 ** 
## countryMali                     -1.668e+07  1.264e+07  -1.319 0.187231    
## countryMauritania               -3.379e+07  1.299e+07  -2.600 0.009400 ** 
## countryMauritius                -5.093e+07  1.399e+07  -3.641 0.000280 ***
## countryMexico                    1.651e+07  1.409e+07   1.172 0.241337    
## countryMongolia                 -3.857e+07  1.321e+07  -2.919 0.003558 ** 
## countryMontenegro               -5.799e+07  1.458e+07  -3.977 7.29e-05 ***
## countryMorocco                  -2.191e+07  1.334e+07  -1.642 0.100692    
## countryMozambique               -7.711e+06  1.259e+07  -0.612 0.540438    
## countryMyanmar                  -3.676e+06  1.305e+07  -0.282 0.778138    
## countryNamibia                  -3.556e+07  1.308e+07  -2.719 0.006626 ** 
## countryNepal                    -1.469e+07  1.283e+07  -1.145 0.252448    
## countryNetherlands              -4.915e+07  1.569e+07  -3.132 0.001769 ** 
## countryNew Zealand              -5.847e+07  1.530e+07  -3.821 0.000138 ***
## countryNicaragua                -4.023e+07  1.340e+07  -3.001 0.002732 ** 
## countryNiger                    -1.845e+07  1.267e+07  -1.456 0.145611    
## countryNigeria                   4.984e+07  1.265e+07   3.940 8.51e-05 ***
## countryNorway                   -5.822e+07  1.597e+07  -3.646 0.000275 ***
## countryOman                     -4.030e+07  1.359e+07  -2.965 0.003069 ** 
## countryPakistan                  5.468e+07  1.315e+07   4.159 3.37e-05 ***
## countryPanama                   -5.350e+07  1.429e+07  -3.745 0.000187 ***
## countryParaguay                 -5.108e+07  1.415e+07  -3.611 0.000315 ***
## countryPeru                     -2.598e+07  1.347e+07  -1.928 0.054039 .  
## countryPhilippines               5.621e+06  1.360e+07   0.413 0.679421    
## countryPoland                   -2.383e+07  1.459e+07  -1.633 0.102628    
## countryPortugal                 -4.834e+07  1.470e+07  -3.289 0.001028 ** 
## countryPuerto Rico              -5.801e+07  1.494e+07  -3.882 0.000108 ***
## countryReunion                  -5.358e+07  1.415e+07  -3.785 0.000159 ***
## countryRomania                  -3.503e+07  1.437e+07  -2.438 0.014885 *  
## countryRwanda                   -1.591e+07  1.261e+07  -1.262 0.207196    
## countrySao Tome and Principe    -4.295e+07  1.335e+07  -3.217 0.001324 ** 
## countrySaudi Arabia             -2.802e+07  1.391e+07  -2.015 0.044058 *  
## countrySenegal                  -2.684e+07  1.291e+07  -2.080 0.037702 *  
## countrySerbia                   -4.703e+07  1.444e+07  -3.257 0.001151 ** 
## countrySierra Leone             -1.122e+07  1.258e+07  -0.892 0.372703    
## countrySingapore                -5.517e+07  1.499e+07  -3.680 0.000241 ***
## countrySlovak Republic          -5.370e+07  1.470e+07  -3.653 0.000268 ***
## countrySlovenia                 -5.719e+07  1.491e+07  -3.835 0.000130 ***
## countrySomalia                  -1.526e+07  1.260e+07  -1.211 0.226045    
## countrySouth Africa             -6.780e+06  1.316e+07  -0.515 0.606543    
## countrySpain                    -2.663e+07  1.521e+07  -1.750 0.080268 .  
## countrySri Lanka                -3.997e+07  1.410e+07  -2.834 0.004649 ** 
## countrySudan                    -8.692e+06  1.281e+07  -0.679 0.497470    
## countrySwaziland                -3.013e+07  1.284e+07  -2.346 0.019114 *  
## countrySweden                   -5.577e+07  1.568e+07  -3.557 0.000386 ***
## countrySwitzerland              -5.543e+07  1.595e+07  -3.475 0.000525 ***
## countrySyria                    -3.750e+07  1.364e+07  -2.749 0.006043 ** 
## countryTaiwan                   -4.115e+07  1.466e+07  -2.808 0.005049 ** 
## countryTanzania                 -9.287e+06  1.278e+07  -0.726 0.467660    
## countryThailand                 -3.539e+06  1.371e+07  -0.258 0.796393    
## countryTogo                     -3.164e+07  1.295e+07  -2.443 0.014670 *  
## countryTrinidad and Tobago      -5.277e+07  1.423e+07  -3.709 0.000216 ***
## countryTunisia                  -3.975e+07  1.359e+07  -2.924 0.003502 ** 
## countryTurkey                    1.033e+06  1.352e+07   0.076 0.939115    
## countryUganda                   -1.504e+07  1.277e+07  -1.178 0.239162    
## countryUnited Kingdom           -4.993e+06  1.538e+07  -0.325 0.745519    
## countryUnited States             1.690e+08  1.566e+07  10.794  < 2e-16 ***
## countryUruguay                  -5.631e+07  1.463e+07  -3.848 0.000124 ***
## countryVenezuela                -3.789e+07  1.426e+07  -2.658 0.007945 ** 
## countryVietnam                   1.200e+07  1.332e+07   0.901 0.367650    
## countryWest Bank and Gaza       -4.401e+07  1.356e+07  -3.245 0.001201 ** 
## countryYemen, Rep.              -1.729e+07  1.274e+07  -1.357 0.175082    
## countryZambia                   -2.077e+07  1.272e+07  -1.633 0.102627    
## countryZimbabwe                 -2.855e+07  1.301e+07  -2.194 0.028351 *  
## continentAmericas                       NA         NA      NA       NA    
## continentAsia                           NA         NA      NA       NA    
## continentEurope                         NA         NA      NA       NA    
## continentOceania                        NA         NA      NA       NA    
## year                             9.442e+04  8.804e+04   1.072 0.283681    
## lifeExp                          1.339e+06  2.189e+05   6.118 1.19e-09 ***
## gdpPercap                       -1.908e+02  1.655e+02  -1.153 0.248975    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30810000 on 1559 degrees of freedom
## Multiple R-squared:  0.9229,	Adjusted R-squared:  0.9158 
## F-statistic: 129.6 on 144 and 1559 DF,  p-value: < 2.2e-16

Note that the outputs give us a lot of information. We see the default hypotheses tested, with p-values for each coefficient and for the model as a whole. We see the RSE and the degrees of freedom, F-test, R-squared values, and much more! There are a lot of insights that we can extract from this simple output.

This post is licensed under CC BY 4.0 by the author.