Modelling Discrete Data
To extract knowledge from measurements, we need a mathematical model of the system. For an (apparently) simple measurement task, such as using a steel rule to measure the length L of a wooden rod, there may be a straightforward relationship between the measurements and the quantities of interest: e.g., the reading y from the ruler scale is an estimate of the length L. In more complicated measurement experiments, the relationship between the measurements and the parameters of interest is generally less straightforward.
Introduction
If we are interested in an accurate estimate of the rod length, we must take into account the effect of temperature and bending on the rule and rod, the squareness of the ends of the rod, the effect of humidity on the wood, etc.
In general terms, the mathematical model predicts the response of the system y (e.g. scale reading) as a function of the variables x = (x1,…,xp) (e.g., temperature, humidity) and the unknown model parameters a = (a1,…,an) (e.g., length of the rod): y = f(x, a). The goal of a measurement experiment is to determine estimates of a from measurements of responses yi of the system corresponding to variables xi. This is usually done by solving a set of equations involving the parameters a and data (yi,xi).
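As a minimal sketch of "solving a set of equations involving the parameters a and data", assume a model that is linear in its parameters, f(x, a) = a1 φ1(x) + … + an φn(x), with a single variable x for simplicity. The basis functions φj and the helper names below are illustrative assumptions, not from the source. With more data points than parameters, the least-squares estimate of a solves the normal equations (AᵀA)a = Aᵀy, where row i of A holds φ1(xi), …, φn(xi).

```python
from typing import Callable, List, Sequence

# A model linear in its parameters: f(x, a) = sum_j a_j * phi_j(x).
# The basis functions are supplied by the caller (an illustrative assumption).
Basis = List[Callable[[float], float]]


def solve_linear_system(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ a = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [m | v], copied so the inputs are not modified.
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (aug[r][n] - s) / aug[r][r]
    return a


def least_squares_fit(
    xs: Sequence[float], ys: Sequence[float], basis: Basis
) -> List[float]:
    """Estimate a by minimising sum_i (y_i - f(x_i, a))**2."""
    n = len(basis)
    # Design matrix A: row i holds phi_j(x_i) for each basis function.
    design = [[phi(x) for phi in basis] for x in xs]
    # Normal equations (A^T A) a = A^T y.
    ata = [[sum(row[j] * row[k] for row in design) for k in range(n)] for j in range(n)]
    aty = [sum(row[j] * y for row, y in zip(design, ys)) for j in range(n)]
    return solve_linear_system(ata, aty)


if __name__ == "__main__":
    # Hypothetical quadratic model f(x, a) = a1 + a2*x + a3*x**2.
    basis: Basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
    print(least_squares_fit(xs, ys, basis))
```

Forming AᵀA squares the condition number of A, so practical code would usually prefer a QR- or SVD-based solver; the normal equations are used here only because they are short to write out.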
The equations relating the response of the system to variables and parameters represent one aspect of the model. A second aspect arises from the fact that measurements have uncertainties associated with them which feed through to uncertainties in the parameter estimates. This means that in solving for the parameters a, it is necessary to take into account the uncertainties in measurements in order to determine estimates that are most consistent with and make best use of the data.
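A minimal sketch of how measurement uncertainties enter the estimation, assuming the straight-line model y = a1 + a2x used later in this tutorial, independent errors in the y values only, and known standard uncertainties u_i. Each point gets weight w_i = 1/u_i², so more certain points count for more, and the same weighted sums that give the estimates also give their standard uncertainties and covariance. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_line_fit(
    xs: Sequence[float], ys: Sequence[float], us: Sequence[float]
) -> Tuple[float, float, float, float, float]:
    """Fit y = a1 + a2*x using weights w_i = 1 / u_i**2.

    Returns (a1, a2, u_a1, u_a2, cov_a1_a2). The uncertainty outputs assume
    the u_i are correct standard uncertainties of independent errors in y_i.
    """
    ws = [1.0 / u**2 for u in us]
    s = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = s * sxx - sx * sx  # determinant of the 2x2 normal-equation matrix
    a1 = (sxx * sy - sx * sxy) / delta
    a2 = (s * sxy - sx * sy) / delta
    # The covariance matrix of (a1, a2) is the inverse of [[s, sx], [sx, sxx]].
    var_a1 = sxx / delta
    var_a2 = s / delta
    cov_a1_a2 = -sx / delta
    return a1, a2, math.sqrt(var_a1), math.sqrt(var_a2), cov_a1_a2


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 1 + 2x, each with u_i = 1.
    print(weighted_line_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 1.0, 1.0]))
```

Because every weight scales as 1/u², halving all the u_i halves both parameter uncertainties, which is one concrete sense in which uncertainties in the data feed through to the estimates.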
We can assess the effectiveness of different estimation approaches by using Monte Carlo simulation to examine the variation in parameter estimates. Suppose a system is modelled as a linear response y = a1 + a2x depending on one variable and two parameters: estimates of the parameters a1 and a2 can be found by fitting a straight line to the data (see Figure 1). Figure 2 shows the variation in the estimates of parameter a1 for three estimation algorithms: the first uses an optimal approach (least variation), the second uses inappropriate weights for the data, and the third uses only two of the data points (most variation). Designing a good parameter estimation approach is a crucial step in getting the most from the data.
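The Monte Carlo comparison can be sketched as follows. The source does not give the data, uncertainties or weights behind Figure 2, so every setting here is an assumption: ten points at x = 0, …, 9, true parameters a1 = 1 and a2 = 2, Gaussian noise with standard deviation u_i = 0.1(10 − i), equal weights standing in for the "inappropriate weights", and the first and last points as the two-point estimator. Each trial simulates a fresh data set, fits it three ways, and records the estimate of a1; the spread of those estimates compares the methods.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def weighted_line_fit(
    xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least-squares estimates (a1, a2) for y = a1 + a2*x."""
    s = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = s * sxx - sx * sx
    return (sxx * sy - sx * sxy) / delta, (s * sxy - sx * sy) / delta


def two_point_fit(x0: float, y0: float, x1: float, y1: float) -> Tuple[float, float]:
    """Line through two points; all other data are ignored."""
    a2 = (y1 - y0) / (x1 - x0)
    return y0 - a2 * x0, a2


def simulate(n_trials: int = 2000, seed: int = 1) -> Dict[str, float]:
    """Return the standard deviation of the a1 estimates for each method."""
    a1_true, a2_true = 1.0, 2.0                # assumed true parameters
    xs = [float(i) for i in range(10)]         # assumed design points
    us = [0.1 * (10 - i) for i in range(10)]   # assumed noise std dev per point
    optimal_w = [1.0 / u**2 for u in us]       # weights matched to the noise
    equal_w = [1.0] * len(xs)                  # ignores the differing uncertainties
    rng = random.Random(seed)                  # seeded for repeatability
    estimates: Dict[str, List[float]] = {
        "optimal": [], "equal weights": [], "two points": []
    }
    for _ in range(n_trials):
        ys = [a1_true + a2_true * x + rng.gauss(0.0, u) for x, u in zip(xs, us)]
        estimates["optimal"].append(weighted_line_fit(xs, ys, optimal_w)[0])
        estimates["equal weights"].append(weighted_line_fit(xs, ys, equal_w)[0])
        estimates["two points"].append(two_point_fit(xs[0], ys[0], xs[-1], ys[-1])[0])
    return {name: statistics.stdev(vals) for name, vals in estimates.items()}


if __name__ == "__main__":
    for name, sd in simulate().items():
        print(f"{name:>13}: std(a1) = {sd:.3f}")
```

With these assumed settings the standard deviations should come out in the same order as described above, with the optimally weighted fit smallest and the two-point estimate largest; other choices of data and weights will change the sizes of the gaps.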
Further Reading
- SSfM Best Practice Guide 4: Discrete Modelling
- SSfM Best Practice Guide 5: MetroS, the Software Reuse Library
This tutorial is an abridgement of an article that first appeared in Counting on IT Issue 9.


