## Generalisation of the FP procedure for semi-continuous variables in epidemiology and clinical research: the SP@Z study

**Project team**: Heiko Becher, Eva Lorenz

**External Collaborators**: Willi Sauerbrei, Carolin Jenkner, Freiburg; Patrick Royston, London

**Funding**: DFG (BE 2056/10-1)

**Funding period**: 2011-2014

**Status:** completed

A common task in epidemiology is to estimate the dose–response function for a continuous exposure. Often a proportion of subjects is unexposed. Typical examples are cigarette consumption, alcohol intake, or occupational exposures. The question arises as to how to model such variables statistically.

In this project we contribute to the theory, practical procedures and applications in epidemiology and clinical research for deriving multivariable models, with an emphasis on dose–response relationships in 'spike at zero' situations. This extends modelling with fractional polynomials (FPs), which are used to determine whether non-linear functions improve the fit to the data. In the first part of the project, we develop an extension of the FP method for modelling a continuous exposure: a binary variable for the unexposed fraction is added to the model. In a two-stage procedure, we then assess whether the binary variable, the continuous function for the exposed individuals, or both are required for a good fit to the data. An extension to the multivariable situation is also described.
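The two-stage idea can be illustrated with a minimal Python sketch. It assumes a binary outcome analysed by logistic regression and, for brevity, searches only FP1 functions (a single power). The power set, the convention of setting the FP term to zero for the unexposed, the 5% χ² cut-offs and all helper names (`fp_term`, `logistic_deviance`, `fp_spike_sketch`) are illustrative assumptions. This is not the published procedure, which also considers FP2 functions and a specific test sequence.

```python
import numpy as np

# Conventional set of FP powers; p = 0 denotes log(x).
FP_POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]
CHI2_1DF, CHI2_2DF = 3.84, 5.99  # 5% critical values of the chi-squared distribution


def fp_term(x, p):
    """FP1 transform for the exposed (x > 0); set to 0 for the unexposed (assumption)."""
    out = np.zeros(len(x))
    pos = x > 0
    out[pos] = np.log(x[pos]) if p == 0 else x[pos] ** float(p)
    return out


def _loglik(X, y, beta):
    eta = X @ beta
    # Bernoulli log-likelihood y*eta - log(1 + exp(eta)), computed stably.
    return np.sum(y * eta - np.logaddexp(0.0, eta))


def logistic_deviance(cols, y, n_iter=100, tol=1e-8):
    """Fit intercept + given columns by Newton-Raphson with step halving; return deviance."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta = np.zeros(X.shape[1])
    ll = _loglik(X, y, beta)
    for _ in range(n_iter):
        mu = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # numerically stable logistic function
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        t, new_ll = 1.0, _loglik(X, y, beta + step)
        while new_ll < ll and t > 1e-10:  # halve the step until the likelihood does not drop
            t *= 0.5
            new_ll = _loglik(X, y, beta + t * step)
        if new_ll < ll:  # no improving step found: keep the current estimate
            break
        beta = beta + t * step
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return -2.0 * ll


def fp_spike_sketch(x, y):
    """Simplified two-stage check of the binary indicator and the FP1 function."""
    z = (x > 0).astype(float)  # binary exposure indicator
    # Stage 1: best FP1 power with the binary indicator kept in the model.
    dev_full, p_best = min((logistic_deviance([z, fp_term(x, p)], y), p) for p in FP_POWERS)
    # Stage 2: likelihood-ratio statistics for dropping each component.
    lr_drop_fp = logistic_deviance([z], y) - dev_full  # FP1 term: 2 df (power + coefficient)
    lr_drop_binary = logistic_deviance([fp_term(x, p_best)], y) - dev_full  # 1 df
    return {
        "power": p_best,
        "lr_drop_fp": lr_drop_fp,
        "lr_drop_binary": lr_drop_binary,
        "keep_fp": lr_drop_fp > CHI2_2DF,
        "keep_binary": lr_drop_binary > CHI2_1DF,
    }


if __name__ == "__main__":
    # Hypothetical simulated data: about 40% unexposed, log-normal exposure otherwise.
    rng = np.random.default_rng(1)
    n = 2000
    exposed = rng.random(n) > 0.4
    x = np.where(exposed, rng.lognormal(0.0, 1.0, n), 0.0)
    logit = -1.0 + np.where(exposed, 0.5 + np.log(np.where(exposed, x, 1.0)), 0.0)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    print(fp_spike_sketch(x, y))
```

Keeping the binary indicator in the model while choosing the power lets the FP function describe only the shape among the exposed. The step-halving Newton fit is used only so that the sketch needs nothing beyond NumPy.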

Three data sets with different characteristics are used as illustrations. Applying the proposed procedure to the three studies gives differing results. In one example, only the binary variable seems to be required. In the other two examples, both the binary variable and an FP function of the exposure variable are needed; one function is monotonic, and the other has a minimum. In the third example, adjusting for confounders has almost no effect on the function selected. In conclusion, the new procedure offers a worthwhile extension of dose–response modelling when there is an unexposed fraction.

In the second part of the project, we derive theoretically the correct dose–response curves for a spike at zero situation under several specific distributional assumptions. We show that, under these assumptions, including a binary variable XE for exposure status (yes/no) plus the continuous part of the variable, possibly transformed, yields the correct dose–response curve. For example, if the continuous part is log-normally distributed, the log-transformed variable must be included in the model. This builds directly on our recent results for a selected set of distributions (e.g. normal, log-normal, gamma) in the framework of a logistic regression model.
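The log-normal case can be worked through with Bayes' theorem under illustrative assumptions, which are ours and stated for concreteness. A logistic model for outcome $Y$ is used. In outcome group $y$, a point mass at zero has probability $\pi_y$, and the exposed values are log-normal with log-scale mean $\mu_y$ and a common variance $\sigma^2$ in both groups. The Jacobian factor $1/(x\sigma\sqrt{2\pi})$ of the log-normal density cancels in the density ratio, and $\phi$ is the standard normal density:

$$
\begin{aligned}
\operatorname{logit} P(Y=1 \mid X=0) &= \log\frac{P(Y=1)}{P(Y=0)} + \log\frac{\pi_1}{\pi_0},\\
\operatorname{logit} P(Y=1 \mid X=x>0) &= \log\frac{P(Y=1)}{P(Y=0)} + \log\frac{1-\pi_1}{1-\pi_0} + \log\frac{\phi\big((\log x-\mu_1)/\sigma\big)}{\phi\big((\log x-\mu_0)/\sigma\big)}\\
&= \log\frac{P(Y=1)}{P(Y=0)} + \log\frac{1-\pi_1}{1-\pi_0} - \frac{\mu_1^2-\mu_0^2}{2\sigma^2} + \frac{\mu_1-\mu_0}{\sigma^2}\,\log x .
\end{aligned}
$$

Both cases combine into $\operatorname{logit} P(Y=1 \mid X) = \beta_0 + \beta_E X_E + \beta_L X_E \log X$, with $\beta_0 = \log\frac{P(Y=1)}{P(Y=0)} + \log\frac{\pi_1}{\pi_0}$, $\beta_E = \log\frac{1-\pi_1}{1-\pi_0} - \log\frac{\pi_1}{\pi_0} - \frac{\mu_1^2-\mu_0^2}{2\sigma^2}$ and $\beta_L = \frac{\mu_1-\mu_0}{\sigma^2}$. The binary indicator plus the log-transformed exposure therefore reproduce the true curve exactly. If the two groups had different variances, a quadratic term in $\log x$ would also appear.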

In the third part, we investigate the performance of the proposed FP-spike procedure in a simulation study. We use the data structure of a case-control study and simulate a log-normally distributed variable with different combinations of unexposed fractions and different peaks in cases and controls, representing smaller or larger effects. For stronger effects in the continuous part, the FP procedure almost always identifies a suitable dose–response function. In other cases, the FP part is dropped from the model, giving a simple yes/no exposure step function. The resulting dose–response curve nevertheless generally approximates the true one closely, which supports the procedure.
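One replicate of such a simulation can be sketched in Python as follows. Case and control exposures are drawn with a spike at zero and a log-normal continuous part. A logistic model with the binary indicator and $X_E \log x$ is then fitted and compared with the true coefficients implied by the equal-variance log-normal assumption. The helper names, sample sizes and parameter values are hypothetical; the project's actual simulation grid is not reproduced. For simplicity, the correctly specified model is fitted directly rather than running the FP selection.

```python
import numpy as np


def simulate_case_control(n_cases, n_controls, pi_case, pi_control,
                          mu_case, mu_control, sigma, rng):
    """Hypothetical helper: spike at zero plus log-normal exposure in each group."""
    def draw(n, pi0, mu):
        unexposed = rng.random(n) < pi0  # fraction pi0 sits exactly at zero
        return np.where(unexposed, 0.0, rng.lognormal(mu, sigma, n))

    x = np.concatenate([draw(n_cases, pi_case, mu_case),
                        draw(n_controls, pi_control, mu_control)])
    y = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])
    return x, y


def true_coefficients(n_cases, n_controls, pi_case, pi_control,
                      mu_case, mu_control, sigma):
    """Intercept, binary-exposure and log(x) coefficients implied by the assumptions."""
    c = np.log(n_cases / n_controls)  # case-control sampling ratio enters the intercept
    b0 = c + np.log(pi_case / pi_control)
    b_exposed = (np.log((1 - pi_case) / (1 - pi_control)) - np.log(pi_case / pi_control)
                 - (mu_case**2 - mu_control**2) / (2 * sigma**2))
    b_logx = (mu_case - mu_control) / sigma**2
    return np.array([b0, b_exposed, b_logx])


def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson (IRLS); adequate for this well-behaved design."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # stable logistic function
        w = mu * (1.0 - mu)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    params = (0.3, 0.5, 0.5, 0.0, 1.0)  # pi_case, pi_control, mu_case, mu_control, sigma
    x, y = simulate_case_control(20000, 20000, *params, rng)
    z = (x > 0).astype(float)
    log_x = np.log(np.where(x > 0, x, 1.0))  # equals X_E * log(x): 0 for the unexposed
    beta = fit_logistic(np.column_stack([np.ones_like(x), z, log_x]), y)
    print("estimated:", beta)
    print("true:     ", true_coefficients(20000, 20000, *params))
```

Varying `pi_case`, `pi_control` and the gap `mu_case - mu_control` over a grid, and replacing the fixed model with an FP selection step, would turn this single replicate into a simulation study of the kind described above.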

For specific epidemiological or clinical situations, we will also investigate and compare possible metrics for continuous variables that can be defined in different ways, or that consist of several correlated variables which can be summarised in an index intended to capture a multidimensional phenomenon.

**Publications**:

Royston P, Sauerbrei W, Becher H. Modelling continuous exposures with a 'spike' at zero: a new procedure based on fractional polynomials. Statistics in Medicine. 2010;29(11):1219-27.

Lorenz E, Jenkner C, Sauerbrei W, Becher H. Dose-response modelling for bivariate covariables with and without a 'Spike' at Zero: Theory and applications to binary outcomes. Statistica Neerlandica. 2015;69(4):374-98.