Differently exposed images (e.g. individual frames of video) of the same subject matter are denoted as vectors $f_0, f_1, \ldots, f_i, \ldots, f_{I-1}$, for all $i$ with $0 \le i < I$.

Each video frame is some unknown function, $f(\cdot)$, of the actual
quantity of light, $q(\mathbf{x})$, falling on the image sensor:
$$f_i = f\left(k_i\, q\left(\frac{\mathbf{A}_i \mathbf{x} + \mathbf{b}_i}{\mathbf{c}_i \mathbf{x} + d_i}\right)\right),$$
where
$\mathbf{x} = (x, y)$ denotes the spatial coordinates of the image,
$k_i$ is a single unknown scalar exposure constant, and parameters
$\mathbf{A}_i$, $\mathbf{b}_i$, $\mathbf{c}_i$, and $d_i$ denote the projective
coordinate transformation between successive pairs of images:
$\mathbf{A}_i$ is the linear coordinate transformation
(e.g. it accounts for magnification in each of the $x$ and $y$ directions and
shear in each of the $x$ and $y$ directions), $\mathbf{b}_i$ is the translation in each
of these two coordinate directions, and $\mathbf{c}_i$ is the projective chirp rate in
each of these two coordinate directions [3].
The additional constant $d_i$ makes the coordinate transformation into a group.
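To make the roles of the four parameter groups concrete, the following is a minimal sketch of the projective coordinate transformation applied to a few points. The function name `projective_transform` and all numeric values are hypothetical, chosen only for illustration, and the product of the chirp parameters with the coordinates is assumed here to be an inner product yielding a scalar denominator.

```python
import numpy as np

def projective_transform(x, A, b, c, d):
    """Map 2-D points x (shape (N, 2)) through (A x + b) / (c . x + d).

    A: (2, 2) linear part (magnification and shear)
    b: (2,)   translation
    c: (2,)   projective chirp rates
    d: scalar additional constant
    """
    x = np.asarray(x, dtype=float)
    numerator = x @ A.T + b        # shape (N, 2)
    denominator = x @ c + d        # shape (N,)
    return numerator / denominator[:, None]

pts = np.array([[0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]])

# Identity transformation: A = I, b = 0, c = 0, d = 1 leaves points unchanged.
identity_result = projective_transform(pts, np.eye(2), np.zeros(2), np.zeros(2), 1.0)

# A mild zoom, shift, and chirp (arbitrary illustrative values).
A2 = np.array([[1.1, 0.0], [0.0, 1.1]])
b2 = np.array([2.0, -1.0])
c2 = np.array([0.01, 0.0])
warped = projective_transform(pts, A2, b2, c2, 1.0)
```

With `c` set to zero and `d` set to one the map reduces to an affine transformation; a non-zero `c` makes the denominator vary across the image, which is the chirping effect described above.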

For simplicity, this coordinate transformation is assumed to be independently recoverable (e.g. using the methods of [3]). Therefore, without loss of generality, it is taken in this paper to be the identity coordinate transformation, which corresponds to the special case of images differing only in exposure.

Without loss of generality, $k_0$ will be called the reference exposure and set to unity, and frame zero will be called the reference frame, so that $f_0 = f(q)$. Inverting $f_i = f(k_i q)$ and $f_0 = f(q)$ for $q$, we have:
$$\frac{1}{k_i} f^{-1}(f_i) = f^{-1}(f_0), \quad \forall i,\ 0 < i < I.$$

Taking the logarithm of both sides,
$$F^{-1}(f_i) - K_i = F^{-1}(f_0), \quad \forall i,\ 0 < i < I,$$
where $K_i = \log(k_i)$, and $F^{-1} = \log f^{-1}$ is the logarithmic inverse camera response function (e.g. a LookUp Table converting pixel values into exposure values).
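As an illustration only, suppose the response were a pure power law with exponent $\gamma$. This is an assumption made for the sake of a worked example, not a claim about any particular camera. The logarithmic relation then takes a simple closed form:

$$f(q) = q^{\gamma} \;\Rightarrow\; f^{-1}(f) = f^{1/\gamma} \;\Rightarrow\; F^{-1}(f) = \log f^{-1}(f) = \frac{1}{\gamma} \log f,$$

$$\frac{1}{\gamma} \log f_i - K_i = \frac{1}{\gamma} \log f_0 \;\Longleftrightarrow\; f_i = k_i^{\gamma} f_0 .$$

This agrees with direct substitution, since $f_i = (k_i q)^{\gamma} = k_i^{\gamma} q^{\gamma} = k_i^{\gamma} f_0$.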

Re-arranging, we have:
$$F^{-1}(f_i) - F^{-1}(f_0) = K_i, \quad \forall i,\ 0 < i < I.$$
This relation suggests a way to estimate the camera response function, $f$, from a pair of differently exposed images of the same subject matter. Before estimating the camera response function, we consider how the noise will affect the estimation.
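The rearranged relation can be checked numerically. The sketch below is not the estimation procedure of this paper: it assumes a known, hypothetical power-law response (exponent `GAMMA`), builds the corresponding lookup table for $F^{-1}$ over 8-bit pixel values, and confirms that the per-pixel difference of the two lookups is close to $K_i = \log(k_i)$ despite quantization. All names and values are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.45  # hypothetical response exponent, assumed for illustration only

def response(q):
    """Assumed camera response f(q) = q**GAMMA, for q in (0, 1]."""
    return q ** GAMMA

# Lookup table for F^{-1} = log f^{-1} over 8-bit pixel values 1..255
# (index 0 is left at -inf because log(0) is undefined).
levels = np.arange(256)
lut = np.full(256, -np.inf)
lut[1:] = np.log(levels[1:] / 255.0) / GAMMA

rng = np.random.default_rng(0)
q = rng.uniform(0.05, 0.4, size=1000)  # light quantities; k*q stays below 1
k_true = 2.0                           # exposure ratio of frame i to frame 0

# Quantize both exposures to 8-bit pixel values.
f0 = np.round(255 * response(q)).astype(int)
fi = np.round(255 * response(k_true * q)).astype(int)

# F^{-1}(f_i) - F^{-1}(f_0) should be close to K_i = log(k_i) at every pixel.
K_per_pixel = lut[fi] - lut[f0]
K_est = np.median(K_per_pixel)  # median is a simple robust summary
k_est = np.exp(K_est)
```

Here the response is known and the exposure constant is recovered; the problem posed in the text is the converse one, in which the response itself is the unknown. The median is used only as a simple way to summarize per-pixel values that differ slightly because of quantization.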