## Binomial GLM in Splus

### 1999-03-23

Because this example is a designed experiment and both independent variables are factors, there is a binomial n > 1 for each design point. An observational regression experiment with a linear independent variable might have no two observations with the same x-value.

```> remission<-data.frame(rem=c(18,27,14,25),sex=factor(c(1,1,2,2)),treat=factor(c
(1,2,1,2)),n=c(30,30,30,30))
> remission
rem sex treat  n
1  18   1     1 30
2  27   1     2 30
3  14   2     1 30
4  25   2     2 30```

#### Dependent Variable: Bound columns of "Success" and "Failure" counts

```> fit1<-glm(cbind(rem,n-rem)~sex*treat,family = binomial(link = logit),
+ data=remission,subset=n>0)
> summary(fit1)

Call: glm(formula = cbind(rem, n - rem) ~ sex * treat, family = binomial(link =
logit), data = remission, subset = n > 0)

Coefficients:
Value Std. Error     t value
(Intercept)  1.01964905  0.2349443  4.33996088
sex -0.28169579  0.2349443 -1.19898970
treat  0.88368219  0.2349443  3.76124133
sex:treat -0.01219754  0.2349443 -0.05191674

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 18.23268 on 3 degrees of freedom

Residual Deviance:  0 on 0 degrees of freedom

Number of Fisher Scoring Iterations: 3

Correlation of Coefficients:
(Intercept)        sex      treat
sex -0.1532238
treat  0.3821937  -0.1419910
sex:treat -0.1419910   0.3821937 -0.1532238
> anova(fit1,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: cbind(rem, n - rem)

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                     3   18.23268
sex  1  1.43362         2   16.79906 0.2311748
treat  1 16.79637         1    0.00270 0.0000416
sex:treat  1  0.00270         0    0.00000 0.9585671
> fit2<-update(fit1,.~.-sex:treat)
> summary(fit2)

Call: glm(formula = cbind(rem, n - rem) ~ sex + treat, family = binomial(link =
logit), data = remission, subset = n > 0)
Deviance Residuals:

-0.02064035 0.03350603 0.02026137 -0.02719909

Coefficients:
Value Std. Error   t value
(Intercept)  1.0179709  0.2323865  4.380508
sex -0.2770531  0.2169754 -1.276887
treat  0.8818647  0.2320043  3.801070

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 18.23268 on 3 degrees of freedom

Residual Deviance: 0.002699 on 1 degrees of freedom

Number of Fisher Scoring Iterations: 3

Correlation of Coefficients:
(Intercept)        sex
sex -0.1028273
treat  0.3668905  -0.0855042
> anova(fit2,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: cbind(rem, n - rem)

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                     3   18.23268
sex  1  1.43362         2   16.79906 0.2311748
treat  1 16.79637         1    0.00270 0.0000416```

#### Dependent Variable: Proportion of "Success" weighted by binomial n

This analysis corresponds exactly to the derivation in McCullagh & Nelder, where the mean is the binomial proportion.

```> fit3<-glm(rem/n~sex*treat,family=binomial(link=logit), data = remission, subse
t = n > 0, weight=n)
> summary(fit3)

Call: glm(formula = rem/n ~ sex * treat, family = binomial(link = logit), data =

remission, weights = n, subset = n > 0)

Coefficients:
Value Std. Error     t value
(Intercept)  1.01964905  0.2349448  4.33995087
sex -0.28169579  0.2349448 -1.19898694
treat  0.88368219  0.2349448  3.76123265
sex:treat -0.01219754  0.2349448 -0.05191662

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 18.23268 on 3 degrees of freedom

Residual Deviance:  0 on 0 degrees of freedom

Number of Fisher Scoring Iterations: 5

Correlation of Coefficients:
(Intercept)        sex      treat
sex -0.1532273
treat  0.3821965  -0.1419945
sex:treat -0.1419945   0.3821965 -0.1532273
> anova(fit3,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: rem/n

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                     3   18.23268
sex  1  1.43362         2   16.79906 0.2311748
treat  1 16.79637         1    0.00270 0.0000416
sex:treat  1  0.00270         0    0.00000 0.9585671
> fit4<-update(fit3,.~.-sex:treat)
> summary(fit4)

Call: glm(formula = rem/n ~ sex + treat, family = binomial(link = logit), data =
remission, weights = n, subset = n > 0)
Deviance Residuals:
1          2          3           4
-0.02064035 0.03350603 0.02026137 -0.02719909

Coefficients:
Value Std. Error   t value
(Intercept)  1.0179709  0.2323868  4.380503
sex -0.2770531  0.2169756 -1.276886
treat  0.8818647  0.2320046  3.801066

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 18.23268 on 3 degrees of freedom

Residual Deviance: 0.002699 on 1 degrees of freedom

Number of Fisher Scoring Iterations: 5

Correlation of Coefficients:
(Intercept)        sex
sex -0.1028275
treat  0.3668920  -0.0855044
> anova(fit4,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: rem/n

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                     3   18.23268
sex  1  1.43362         2   16.79906 0.2311748
treat  1 16.79637         1    0.00270 0.0000416```

#### Expand the data frame so that there is one row per subject, scoring 1 for "success" and 0 for "failure"

```> remission.expand <- data.frame(rem = c(rep(1, sum(rem)), rep(0, sum(n - rem))),
sex = c(rep(sex, rem), rep(sex, n - rem)), treat = c(rep(treat, rem),
rep(treat, n - rem)))
> remission.expand
rem sex treat
1   1   1     1
2   1   1     1
3   1   1     1
4   1   1     1
5   1   1     1
6   1   1     1
7   1   1     1
8   1   1     1
9   1   1     1
10   1   1     1
11   1   1     1
12   1   1     1
13   1   1     1
14   1   1     1
15   1   1     1
16   1   1     1
17   1   1     1
18   1   1     1
19   1   1     2
20   1   1     2
21   1   1     2
22   1   1     2
23   1   1     2
24   1   1     2
25   1   1     2
26   1   1     2
27   1   1     2
28   1   1     2
29   1   1     2
30   1   1     2
31   1   1     2
32   1   1     2
33   1   1     2
34   1   1     2
35   1   1     2
36   1   1     2
37   1   1     2
38   1   1     2
39   1   1     2
40   1   1     2
41   1   1     2
42   1   1     2
43   1   1     2
44   1   1     2
45   1   1     2
46   1   2     1
47   1   2     1
rem sex treat
48   1   2     1
49   1   2     1
50   1   2     1
51   1   2     1
52   1   2     1
53   1   2     1
54   1   2     1
55   1   2     1
56   1   2     1
57   1   2     1
58   1   2     1
59   1   2     1
60   1   2     2
61   1   2     2
62   1   2     2
63   1   2     2
64   1   2     2
65   1   2     2
66   1   2     2
67   1   2     2
68   1   2     2
69   1   2     2
70   1   2     2
71   1   2     2
72   1   2     2
73   1   2     2
74   1   2     2
75   1   2     2
76   1   2     2
77   1   2     2
78   1   2     2
79   1   2     2
80   1   2     2
81   1   2     2
82   1   2     2
83   1   2     2
84   1   2     2
85   0   1     1
86   0   1     1
87   0   1     1
88   0   1     1
89   0   1     1
90   0   1     1
91   0   1     1
92   0   1     1
93   0   1     1
94   0   1     1
rem sex treat
95   0   1     1
96   0   1     1
97   0   1     2
98   0   1     2
99   0   1     2
100   0   2     1
101   0   2     1
102   0   2     1
103   0   2     1
104   0   2     1
105   0   2     1
106   0   2     1
107   0   2     1
108   0   2     1
109   0   2     1
110   0   2     1
111   0   2     1
112   0   2     1
113   0   2     1
114   0   2     1
115   0   2     1
116   0   2     2
117   0   2     2
118   0   2     2
119   0   2     2
120   0   2     2
> remission.expand\$sex<-factor(remission.expand\$sex)
> remission.expand\$treat<-factor(remission.expand\$treat)```

#### Dependent Variable: Success/Fail binary score unweighted

Note that when we fit the saturated model we get the same estimates and tests as we got from the previous fits, but there are now 116 degress of freedom for the residual deviance. Because the observed scores are either 0 or 1 but, in this example, the fitted binomial probabilities are never 0 or 1, the residual deviance can't be zero.

```> fit5<-glm(rem~sex*treat,data=remission.expand,family=binomial(link=logit))
> summary(fit5)

Call: glm(formula = rem ~ sex * treat, family = binomial(link = logit), data =
remission.expand)
Deviance Residuals:
Min        1Q    Median       3Q      Max
-2.145962 -1.121257 0.4590455 1.010768 1.234617

Coefficients:
Value Std. Error     t value
(Intercept)  1.01964685  0.2347573  4.34340879
sex -0.28169360  0.2347573 -1.19993549
treat  0.88368000  0.2347573  3.76422822
sex:treat -0.01219535  0.2347573 -0.05194875

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 146.6074 on 119 degrees of freedom

Residual Deviance: 128.3747 on 116 degrees of freedom

Number of Fisher Scoring Iterations: 4

Correlation of Coefficients:
(Intercept)        sex      treat
sex -0.1519319
treat  0.3812090  -0.1406812
sex:treat -0.1406812   0.3812090 -0.1519319
> anova(fit5,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: rem

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                   119   146.6074
sex  1  1.43362       118   145.1738 0.2311748
treat  1 16.79637       117   128.3774 0.0000416
sex:treat  1  0.00270       116   128.3747 0.9585671
> fit6<-update(fit5,.~.-sex:treat)
> summary(fit6)

Call: glm(formula = rem ~ sex + treat, family = binomial(link = logit), data =
remission.expand)
Deviance Residuals:
Min        1Q   Median       3Q      Max
-2.137429 -1.118173 0.463493 1.007725 1.237822

Coefficients:
Value Std. Error   t value
(Intercept)  1.0179706  0.2323177  4.381805
sex -0.2770530  0.2169430 -1.277078
treat  0.8818644  0.2319361  3.802188

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 146.6074 on 119 degrees of freedom

Residual Deviance: 128.3774 on 117 degrees of freedom

Number of Fisher Scoring Iterations: 4

Correlation of Coefficients:
(Intercept)        sex
sex -0.1027515
treat  0.3665306  -0.0854362
> anova(fit6,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: rem

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                   119   146.6074
sex  1  1.43362       118   145.1738 0.2311748
treat  1 16.79637       117   128.3774 0.0000416```

#### Dependent Variable: Success/Fail binary score, data grouped by unique rows, weighted by counts

This analysis gives exactly the same residual deviances as the previous analysis, but the residual from the saturated model has only 4 degrees of freedom.

```> remission.group
rem count sex treat
1   1    18   1     1
2   0    12   1     1
3   1    27   1     2
4   0     3   1     2
5   1    14   2     1
6   0    16   2     1
7   1    25   2     2
8   0     5   2     2
+ weight=count)
> summary(fit7)

Call: glm(formula = rem ~ sex * treat, family = binomial(link = logit), data =
remission.group, weights = count)
Deviance Residuals:
1         2       3         4        5         6        7         8
4.288324 -4.689454 2.38527 -3.716916 4.619515 -4.485028 3.019284 -4.232918

Coefficients:
Value Std. Error     t value
(Intercept)  1.01964685  0.2347573  4.34340879
sex -0.28169360  0.2347573 -1.19993549
treat  0.88368000  0.2347573  3.76422822
sex:treat -0.01219535  0.2347573 -0.05194875

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 146.6074 on 7 degrees of freedom

Residual Deviance: 128.3747 on 4 degrees of freedom

Number of Fisher Scoring Iterations: 4

Correlation of Coefficients:
(Intercept)        sex      treat
sex -0.1519319
treat  0.3812090  -0.1406812
sex:treat -0.1406812   0.3812090 -0.1519319
> anova(fit7,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: rem

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                     7   146.6074
sex  1  1.43362         6   145.1738 0.2311748
treat  1 16.79637         5   128.3774 0.0000416
sex:treat  1  0.00270         4   128.3747 0.9585671
> fit8<-update(fit7,.~.-sex:treat)
> summary(fit8)

Call: glm(formula = rem ~ sex + treat, family = binomial(link = logit), data =
remission.group, weights = count)
Deviance Residuals:
1        2       3         4        5        6        7         8
4.275416 -4.70127 2.40838 -3.702135 4.631506 -4.47269 3.000915 -4.246047

Coefficients:
Value Std. Error   t value
(Intercept)  1.0179706  0.2323177  4.381805
sex -0.2770530  0.2169430 -1.277078
treat  0.8818644  0.2319361  3.802188

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 146.6074 on 7 degrees of freedom

Residual Deviance: 128.3774 on 5 degrees of freedom

Number of Fisher Scoring Iterations: 4

Correlation of Coefficients:
(Intercept)        sex
sex -0.1027515
treat  0.3665306  -0.0854362
> anova(fit8,test="Chisq")
Analysis of Deviance Table

Binomial model

Response: rem

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev   Pr(Chi)
NULL                     7   146.6074
sex  1  1.43362         6   145.1738 0.2311748
treat  1 16.79637         5   128.3774 0.0000416
>```