
4.1 Example: Multiple Regression

Table 4.1 shows (artificial) data recording the volume of a toxic chemical that is produced as a by-product in a certain industrial manufacturing process.

Table 4.1: Toxic Chemical Production Data

  Volume        Temperature   Weight of Catalyst   Method
  (in litres)   (in deg. C)   (in kg)
  --------------------------------------------------------
  30            90            1.5                  A
  39            85            1.0                  A
  26            70            1.5                  B
  36            80            2.0                  A
  22            80            1.0                  B
  18            85            2.5                  B
  32            90            1.0                  A
  26            85            2.0                  B
 
Figure 4.1: The Initial glmlab Screen

glmlab is started by typing glmlab at the MATLAB prompt, producing the initial glmlab screen given in Figure 4.1. The data is stored in the file chemical.mat in the data subdirectory of glmlab. This file can be loaded using the LOAD Data File option from the glmlab File menu. By default, the window opens in the glmlab data folder. To load the same file at the MATLAB prompt, type the following:

>> load chemical       %if not using the glmlab menu

After loading this file, check which variables have been loaded using MATLAB's who command. The variable Chemicalhelp is also useful: type Chemicalhelp at the MATLAB prompt.

>> Chemicalhelp
Chemicalhelp =
The file  chemical  contains these variables:                 
 Cat:    The weight of catalyst (in kg)                       
 Method: The method used to produce the chemical (qualitative)
 Temp:   The temperature of the manufacturing process         
 Vol:    The volume of toxic by-product produced.             

Notice that variable names have been used with the first letter capitalised. This is a precaution against using names already used by MATLAB or glmlab. (For example, length is a built-in MATLAB function that returns the length of a vector. If length were used as a variable name, any MATLAB code that called the function length would not work.)


 When defining variables, use names with capitalised first letters since most MATLAB commands contain all lower case letters.

Remember that Method is a qualitative (or categorical) variable. To represent Method, a 1 has been used for Method A and a 2 for Method B.
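
If chemical.mat is not at hand, the same variables can be typed in from Table 4.1. The following is a minimal sketch: the variable names follow Chemicalhelp, the 1/2 coding of Method follows the description above, and the helper names n and nB are introduced here purely for illustration.

```matlab
% Data typed in from Table 4.1 (column vectors, one row per run).
Vol    = [30 39 26 36 22 18 32 26]';          % volume of by-product (litres)
Temp   = [90 85 70 80 80 85 90 85]';          % temperature (deg. C)
Cat    = [1.5 1.0 1.5 2.0 1.0 2.5 1.0 2.0]';  % weight of catalyst (kg)
Method = [1 1 2 1 2 2 1 2]';                  % 1 = Method A, 2 = Method B

% Quick sanity checks before fitting. Note that length here is MATLAB's
% built-in function, which is why it should not be reused as a variable name.
n  = length(Vol);          % number of runs
nB = sum(Method == 2);     % number of runs that used Method B
fprintf('%d runs, %d with Method B\n', n, nB);
```

Storing each variable as a column vector keeps one row per run, which is the layout assumed by the later sketches in this example.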

To begin fitting, start glmlab; the initial screen (Figure 4.1) should again appear. If glmlab is already running, glmlab needs to be told that a new model is to be fitted. To do this, choose Declare New Model from the Options menu.


 If glmlab is running and a new model is to be fitted or new data is to be loaded, choose Declare New Model from the Options menu. This clears all the internal settings and resets the default options.

For demonstration purposes only, one variable at a time will be included in the model. The response variable is Vol and the first covariate is Temp. When these are entered in the appropriate places in the glmlab window and the FIT MODEL button is pressed, the following parameter estimates appear on the screen:
 -----------------------------------------
   Estimate        S.E.      Variable
 -----------------------------------------
  12.000000    36.091550   Constant        
   0.200000     0.433023   Temp
 -----------------------------------------
Deviance:          334.000000            Link: ID
Residual df:                 6   Distribution: NORMAL
Scale parameter (dispersion parameter):        55.666667

Variable names are listed in the right-hand column, labelled Variable, and the corresponding parameter estimates are given on the left, in the Estimate column. The column labelled S.E. contains the standard errors of the parameter estimates. The deviance listed under the parameter estimates is a measure of the goodness-of-fit of the model. For normal regression models only, the deviance is equivalent to the residual sum-of-squares, and the scale parameter is an estimate of the residual variance. The scale parameter is always found by dividing the residual deviance by the residual degrees of freedom; here, 334/6 = 55.666667.

Suppose that another variable is included, say Cat. This variable is added to the covariate list, so that the glmlab screen appears as in the figure Variables Entered for Chemical Data, below. Pressing the FIT MODEL button produces the following parameter estimates:
 -----------------------------------------
   Estimate        S.E.      Variable
 -----------------------------------------
  22.404762    37.179998   Constant       
   0.172619     0.430573   Temp
  -5.202381     4.980572   Cat 
 -----------------------------------------
Deviance:           274.172619    (change:    -59.827381)
Residual df:                 5    (change:            -1)
Scale parameter (dispersion parameter):        54.834524

Figure: Variables Entered for Chemical Data
 

The deviance is given again, together with the change in the deviance. For normal distribution models, this change in the deviance (and the corresponding change in the degrees of freedom) can be recorded in an analysis of variance table for testing. The change in deviance is equivalent to a change in the sum-of-squares for a normal distribution model only. The sign of the change in deviance is important: the negative sign implies that the deviance is smaller than at the last fitting (and therefore that the sum of the squared residuals is smaller).

The last variable to include is Method. Unlike Vol, Temp and Cat, this variable is qualitative, not quantitative. To indicate this to glmlab, include Method in the covariate list using the fac command; that is, enter fac(Method). The fac command marks the variable as qualitative rather than quantitative (the glmlab default). See the figure Entering Data Using the fac Command.

Figure: Entering Data Using the fac Command
 


 When using qualitative variables, remember to use the fac command.

Pressing FIT MODEL again produces these estimates:

 -----------------------------------------
   Estimate        S.E.      Variable
 -----------------------------------------
  66.452830    22.668419   Constant             
  -0.354717     0.264890   Temp       
  -1.169811     2.814611   Cat        
 -13.028302     3.447181   Method(2)
 -----------------------------------------
Deviance:            59.981132    (change:   -214.191487)
Residual df:                 4    (change:            -1)
Scale parameter (dispersion parameter):        14.995283

The output includes a term labelled Method(2). This indicates that the parameter estimate corresponds to the second level of the variable Method (that is, Method B). With the estimates rounded to two decimal places, the model could therefore be written as
V = 66.45 - 0.35 T - 1.17 C - 13.03 M_2,
where

V=the volume of the toxic chemical produced;
T=the temperature of the process;
C=the amount of catalyst used;
M_2=1 if Method B is used (and is 0 otherwise).
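
The role of M_2 can be checked outside glmlab with an ordinary least-squares fit. The sketch below assumes that the Method(2) term corresponds to a 0/1 indicator with Method A as the baseline, as the definition of M_2 suggests; it is not glmlab's own code, and the names M2, X, b, res, dev, df, scale and se are illustrative.

```matlab
% Data from Table 4.1.
Vol    = [30 39 26 36 22 18 32 26]';
Temp   = [90 85 70 80 80 85 90 85]';
Cat    = [1.5 1.0 1.5 2.0 1.0 2.5 1.0 2.0]';
Method = [1 1 2 1 2 2 1 2]';              % 1 = Method A, 2 = Method B

M2 = double(Method == 2);                 % indicator: 1 for Method B, 0 for Method A
X  = [ones(size(Vol)) Temp Cat M2];       % columns: Constant, Temp, Cat, Method(2)

b     = X \ Vol;                          % least-squares parameter estimates
res   = Vol - X*b;                        % residuals
dev   = sum(res.^2);                      % residual sum-of-squares (the deviance here)
df    = length(Vol) - size(X, 2);         % residual degrees of freedom: 8 - 4
scale = dev / df;                         % estimate of the residual variance
se    = sqrt(scale * diag(inv(X'*X)));    % standard errors of the estimates

disp([b se])                              % compare with the Estimate and S.E. columns
fprintf('Deviance %.6f on %d df, scale %.6f\n', dev, df, scale);
```

Under that assumption, the estimates, deviance and scale parameter should agree with the glmlab output above up to rounding. The coefficient of M2 is then the estimated difference in volume between Method B and Method A at the same temperature and catalyst weight.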

All the variables are now fitted. To record all the important information, glmlab produces a file called DETAILS.m. (On some Windows machines, the file name may be all lower case.) The file is stored in the /glmlab/glmlog directory. The DETAILS.m file for this session should contain information similar to what follows:  

>> type DETAILS

(Created at 3:06:44pm on 19-Apr-96.)
  Deviance        Change   df  Change   Variables
 334.000000                 6           [Vol];[Const,Temp]
 274.172619    -59.827381   5    -1     [Vol];[Const,Temp,Cat]
  59.981132   -214.191487   4    -1     [Vol];[Const,Temp,Cat,fac(Method)]
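
The deviances and changes recorded in DETAILS.m can be cross-checked by fitting the three models in sequence with least squares. This is a sketch only: it assumes a 0/1 indicator for fac(Method), and the F statistics use the scale parameter of the larger model in each comparison, which is one common convention rather than something glmlab reports. All names other than Vol, Temp, Cat and Method are illustrative.

```matlab
% Data from Table 4.1.
Vol    = [30 39 26 36 22 18 32 26]';
Temp   = [90 85 70 80 80 85 90 85]';
Cat    = [1.5 1.0 1.5 2.0 1.0 2.5 1.0 2.0]';
Method = [1 1 2 1 2 2 1 2]';              % 1 = Method A, 2 = Method B

c  = ones(size(Vol));                     % constant term
M2 = double(Method == 2);                 % indicator for Method B

% Design matrices for the three models, in the order they were fitted.
designs = {[c Temp], [c Temp Cat], [c Temp Cat M2]};

n   = length(Vol);
dev = zeros(1, 3);
df  = zeros(1, 3);
for k = 1:3
    X      = designs{k};
    r      = Vol - X*(X \ Vol);           % residuals from the least-squares fit
    dev(k) = sum(r.^2);                   % deviance = residual sum-of-squares
    df(k)  = n - size(X, 2);              % residual degrees of freedom
end

change = diff(dev);                       % negative: the deviance decreased
scale  = dev ./ df;                       % scale parameter of each model

% F statistic for each added term:
% (drop in deviance / df used) / scale of the larger model.
F = (-change ./ -diff(df)) ./ scale(2:3);

fprintf('%12.6f %14s %3d\n', dev(1), '', df(1));
for k = 2:3
    fprintf('%12.6f %14.6f %3d   F = %.3f\n', dev(k), change(k-1), df(k), F(k-1));
end
```

The first three columns printed should match the Deviance, Change and df columns of DETAILS.m up to rounding. Each F statistic would be referred to an F distribution on 1 and the corresponding residual degrees of freedom.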

Whenever a new model is declared (in the Options menu) or whenever glmlab is started, glmlab determines if the file called DETAILS.m is in the appropriate directory. If so, it is deleted without warning so that a new DETAILS.m file can be created. This file must be copied to another file if the information needs to be kept.


 Remember to copy the file DETAILS.m to another file if the information is to be kept, as DETAILS.m is overwritten whenever a new model is declared and whenever glmlab is started.
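
One way to keep the log is to copy it from the MATLAB prompt before starting a new session. This is a minimal sketch using MATLAB's copyfile; the paths and the destination file name are hypothetical and need to be adjusted to where glmlab is installed.

```matlab
% Hypothetical paths: adjust to the location of glmlab on your machine.
src = fullfile('glmlab', 'glmlog', 'DETAILS.m');
dst = fullfile('glmlab', 'glmlog', 'DETAILS_chemical.m');   % hypothetical name

if exist(src, 'file')
    copyfile(src, dst);     % keep a copy before DETAILS.m is overwritten
else
    warning('%s not found; check the glmlab log directory.', src);
end
```
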
Completing the analysis would include looking at some of the residual plots.  
Peter Dunn
