CHAPTER
6
The topic of the last chapter is about evaluation of a function at any given location, however, one may be only interested in evaluation of a function at its maximum or minimum, and not quite so elsewhere. This problem is called optimization. This chapter discusses several frequently-used optimization techniques in crystallographic computing. When choosing an appropriate means of optimization, object function and its functional complexity in the domain of search, also known as energy landscape, are the primary concerns. The topic of optimization is largely out the scope of this book. CCL and CPL offer only a few basic optimization tools.
6.1 Locate an
Element in a Monotonic Sequence
Given a monotonically increasing or decreasing sequence s of objects and another object x, one may need to find where the object x shall fit in the sequence. For example, it is often desired to find an appropriate resolution bin for any given reflection during a statistical calculation.
6.1.1 Template functions mac::locate and mac::hunt
template <class S, class T>
long locate(const S&, const T&, const
long = -1);
template <class S, class T>
long hunt(const S&, const T&,
const long = -1);
These two functions have the exact same prototype, where S is a sequence type, and must support operator [] and function size(). The element of the sequence must have a comparable type T. These functions return an index j that s[j] <= x < s[j + 1] or s[j] >= x > s[j + 1]. If j = -1 or j = s.size() ¨C 1, the given object x is out of range towards the beginning or end of the sequence, respectively. Both functions may accept an initial guess for where the given object may locate. hunt is a factor of 2 slower than locate at the worst case, but it can be a factor of log2(n) faster than locate, if the desired point is quite close to the input guess. If a valid initial guess given to locate, it calls hunt. If no valid initial guess, hunt is the same as locate.
template <class S, class T>
long locate(const S&, const long,
const T&, const long = -1);
template <class S, class T>
long hunt(const S&, const long,
const T&, const long = -1);
These two functions behave the same as the previous ones, except that the sequence type S does not have to support size() function, but the size of the sequence must be given by the second argument.
#include <assert.h>
#include <math.h>
#include <vector.h>
#include <mac/locate.h>
main()
{
const size_t n = 20;
std::vector<double> s(n);
for (int i = 0; i < n; i++)
s[i] = log(i + 1);
// s is
populated with an ascending sequence.
for (double t = -1; t < 4; t +=
0.1)
{
const long k =
mac::locate(s, t);
assert (0 <=
k && k < n - 1
&& s[k] <= t && t < s[k + 1] ||
0 > k
&& s[k] > t
||
k == n - 1 && s[k] <= t);
assert (k ==
mac::hunt(s, t, long(n / 2)));
}
// Repeat the previous section,
but specify the size of the vector.
const long m = n / 2;
for (double t = -1; t < 4; t +=
0.1)
{
const long k =
mac::locate(s, m, t);
assert (0 <=
k && k < m - 1
&& s[k] <= t && t < s[k + 1] ||
0 > k
&& s[k] > t
||
k == m - 1 && s[k] <= t);
assert (k ==
mac::hunt(s, m, t, m / 2));
}
return 0;
}
Listing 6.1.1.0.1 Testing mac::locate and mac::hunt.
6.1.2 locate and hunt functions wrapped in cpl.mac.op
These functions are available in cpl.mac.op instantiated with several types.
locateVTYP(s,
x, i = -1)
huntVTYP(s, x, i = -1)
The monotonic sequence s must be cpl.std.vectorTYP, where available TYPs are DBL, FLT, INT, SIN, LUI, UIN, and SUI (Table 1.3.2.0.1).
locateVsDBL(s, l, x, i = -1)
huntVsDBL(s, l, x, i = -1)
The monotonic sequence s must be cpl.std.vectorDBL. l is an effective size of the sequence, which must be no greater than the real size of s. If l is small than the real size, only the first l elements will be used.
locateADBL(s,
l, x, i = -1)
huntADBL(s, l, x, i = -1)
The monotonic sequence s must be a C array of doubles, which is the returned type of cpl.ptr.new_ppDBL() or cpl.ptr.copy_ppDBL(). l must be no greater than the real size of the array.
#!/usr/bin/python
import cpl
import math
n = 20
s = cpl.std.vectorDBL()
for i in range(n):
s.append(math.log(i + 1))
# s is populated with an ascending
sequence.
for i in range(-10, 40):
t = 0.1 * i
k = cpl.mac.op.locateVDBL(s, t)
assert 0 <= k and k < n - 1 and s[k] <= t and t < s[k +
1] or\
0 > k
and s[k] > t
or\
k == n - 1 and s[k] <= t
assert k == cpl.mac.op.huntVDBL(s,
t, n / 2)
# Repeat the previous
section, but specify the size of the vector.
m = n / 2
for i in range(-10, 40):
t = 0.1 * i
k = cpl.mac.op.locateVsDBL(s, m,
t)
assert 0 <= k and k < m - 1 and s[k] <= t and t < s[k +
1] or\
0 > k
and s[k] > t
or\
k == m - 1 and s[k] <= t
assert k == cpl.mac.op.huntVsDBL(s,
m, t, m / 2)
# Make a C array that
contains the reverse order of s.
r = []
for i in range(n):
r.append(s[i])
# Copy vectorDBL to a list
r.reverse()
# Reverse the order
a =
cpl.mathematics.sequenceToArray(r) # Make a C array from the list
app =
cpl.ptr.copy_ppDBL(a.cast()) # Need a pointer to
pointer to double
for i in range(-10, 40):
t = 0.1 * i
k = cpl.mac.op.locateADBL(app, n,
t)
assert 0 <= k and k < n - 1 and a[k] >= t and t > a[k +
1] or\
0 > k
and a[k] < t
or\
k == n - 1 and a[k] >= t
assert k == cpl.mac.op.huntADBL(app,
n, t, n / 2)
cpl.ptr.delete_ppDBL(app)
# Clean up
Listing 6.1.2.0.1 Testing functions locate and hunt in Python.
6.2 Golden Section Search
If one has to find a minimum of a smooth, 1-dimentional function within a known range, golden section search is a robust algorithm of choice.
6.2.1 Template function mac::golden
template <class X, class Y>
Y golden(const X& xa, const X& xb,
const X& xc,
Y (*f)(const X&), X *xmin,
const X& t = 0.0001);
Given a 1-dimensioanl function f with domain type X and function type Y, this function returns the minimum of the function and modifies xmin by the abscissa of the minimum. The golden section search is performed between abscissas xa and xc. xb is an initial guess where the functional minimum shall locate, that is, f(xb) is less than both f(xa) and f(xc). The search stops when a fractional precision of about t is reached.
#include <iostream.h>
#include <math.h>
#include <mac/golden.h>
double func(const double&
x)
{
return -cos(x - 0.2);
}
main()
{
double xmin;
const double ymin =
mac::golden(-0.5, 0.1, 0.5, func, &xmin);
cout << "xmin, ymin
shall be close to 0.2, -1, respectively." << endl;
cout << "xmin = "
<< xmin << endl;
cout << "ymin = "
<< ymin << endl;
return 0;
}
Listing 6.2.1.0.1 Testing mac::golden().
6.2.2 cpl.mac.op.goldenDBL
goldenTYP(xa, xb, xc, f, t = 0.0001)
Given a pointer to a C function template <class T> T (*f)(const T&), returns a pair containing the minimum functional value of f and the variable value where the minimum is found between xa and xc. xb is a hint where the minimum can be found. t is a fractional tolerance. The available types are DBL and FLT.
#include
<math.h>
double
y(const double& x)
{
return -cos(x - 0.2);
}
Listing 6.2.2.0.1 Call-back function in C++ for cpl.mac.op.goldenDBL.
%module
cb
%{
#include
"cb.h"
%}
%include
"cb.h"
%constant
double (*goldenDBL_cb)(const double&) = y;
Listing 6.2.2.0.2 SWIG instruction that wraps the call-back function into Python.
#!/usr/bin/python
import
cpl, cb
ymin,
xmin = cpl.mac.op.goldenDBL(-0.5, 0.1, 0.5, cb.goldenDBL_cb, 1e-6)
print "xmin, ymin shall
be close to 0.2, -1, respectively."
print "xmin =",
xmin
print
"ymin =", ymin
Listing 6.2.2.0.3 Testing goldenDBL in Python.
xmin,
ymin shall be close to 0.2, -1, respectively.
xmin
= 0.199689437822
ymin
= -0.999999951776
Listing 6.2.2.0.4 Output result from the testing program above.
This is a trivial example, but illustrates the usage of call-back function. When the template function mac::golden() is wrapped into cpl.mac.io, the call-back function it requires must be wrapped from C++ too. (Using call-back function in Python?) The SWIG instruction defines the wrapped call-back function as goldenDBL_cb.
6.3 Linear Least-squares Minimization
Consider
a set of experimental measurements and a theoretical model intended to describe
the experiment. It is desired to
adjust several parameters in the model in order to fit the noise-containing
experimental data as well as possible.
This process is also known as parameterization. Here ¡®fit¡¯ may be under the
least-squares sense when the noise distributes normally. However, the least-squares may not be
the best in case of other noise distributions, as we will see later. When the theoretical model can be
written as a linear equation
, where ai
are the coefficients to be adjusted and fi(x) are a set of base functions, the
least-squares minimization can be very straightforward.
6.4 Nonlinear
Least-squares Minimization
When the theoretical model cannot be described by a linear equation as it often happens in crystallographic computing, nonlinear minimization becomes useful. This section presents some template classes that perform large-scale non-linear least-squares fitting against various types of experimental measurements using a trust-region implementation of Levenberg-Marquardt method.
The
theoretical model can be written as a nonlinear function Y = f(x, A), where A is a set of adjustable
parameters A = (a1, a2, ¡, an). A set
of experimental measurements (xj,
yj), j = 1, 2, ¡, m, is
available, each is also known as a data point. Theoretical calculation using the model
at the experimental abscissas xj
results in (xj, Yj), j = 1, 2, ¡, m. A best set of parameters ai in terms of least-squares
fitting can be found when
is minimized.
6.4.1 Base class mac::nonlinear<Y, X>
This base class is an abstract template. Use the derived classes for various measurement types as described below. The parameters ai are always of type double. If they are some types other than double, e.g., complex<double>, they must be split into doubles. Y is the type of both the measurements yi and the theoretical model Y. X is the type of the variable x.
6.4.1.1 Constructor
and destructor
nonlinear(Y (*)(const X&,
const
std::vector<X>&,
const
std::vector<Y>&,
const
std::vector<double>&,
const
std::vector<bool>&,
std::vector<Y>*),
const std::vector<X>& x,
const std::vector<Y>& y,
const std::vector<double>& para,
const std::vector<bool>& mask,
const
std::vector<double>& weight,
const int iter,
const double tol,
const double
ratio);
The first argument to the constructor is a function pointer. The user supplied function shall evaluate the theoretical model Y = f(x, A) at given x and parameter set A. See 6.4.1.2 for detail.
The second and third arguments x and y are vector<X> and vector<Y> of a same length, respectively. They contain the entire set of measurement data. weight has the same length, and specifies the weight of all data points.
The forth and fifth arguments para and mask are vectors of a common length. para brings in the initial parameters. mask is a set of Booleans that indicates whether the corresponding parameters are free for fitting.
iter is a maximum iteration allowed in the least-squares minimization. tol is a fractional precision that triggers the end of the minimization. ratio specifies the maximum data-to-parameter ratio.
virtual ~nonlinear();
This a virtual destructor.
6.4.1.2 Evaluation functions of
theoretical model
The user supplied evaluation function is required to calculate the function value and partial derivatives of the function to all adjustable parameters. This function can be specified at the time of construction. It can again be changed later by a set function.
void setTheoretical(
Y (*)(const X&
u,
const
std::vector<X>& x,
const
std::vector<Y>& y,
const
std::vector<double>& para,
const
std::vector<bool>&
mask,
std::vector<Y>* deri));
This function simply takes a function pointer and have the class remember it being the evaluation function for the theoretical model Y = f(x, A) at the given x = u, and a set of specified parameters A = para. This evaluation function calculates one model value at a time. The previous function pointer is forgotten once a new pointer is set. The entire data set of experimental measurements is also given as x and y, since they might be required during the evaluation. If the last pointer deri is valid, the vector pointed by deri will be modified by partial derivatives of the function to all parameters. mask is a screen that indicates the free or fixed parameters. If a parameters is fixed, the partial derivative to this parameter needs not and will not be evaluated.
void testCalculation(const int = 0,
const double = 0.001) const;
This function tests the user-supplied evaluation function. Given a 0-offset index of a data point, and a small value as interval step for all parameters, this function calls the user-supplied evaluation function and print out the function values and partial derivatives in a table. The interval is used when estimating partial derivatives using the function evaluation. One may check whether the function value and partial derivatives are consistent. The default data point is the first one and the default interval is 0.001.
void testCalculation(
const int,
const std::vector<double>&) const;
Given a 0-offset index of a data point, and a vector of small numbers for intervals that correspond to all parameters, this function tests the user-supplied evaluation function.
void setTheoreticalAll(std::vector<Y>
(*)(
const std::vector<X>&
x,
const std::vector<Y>&
y,
const std::vector<double>&
para,
const std::vector<bool>&
mask,
std::vector<std::vector<Y>
>* deri));
There is another type of evaluation function that calculates a set of model values at once. This set function sets a function pointer of such type. This type of evaluation function does not take the variable u, instead it evaluates at all values in the vector x, and returns a vector<Y> of the size of x. The last pointer deri points to a vector of vectors. If it is valid, it will be modified by partial derivatives upon return. The outer vector has the same length of x, and the inner ones have that of para after returning from the evaluation function. However, these dimensions may not be correct when entering the evaluation function. This evaluation function has higher priority than the previous one, that is, if it is set, the other function will not be used.
No testing function is available for this type of evaluation function. It may be added in future release.
6.4.1.3 Experimental measurements
Experimental measurements are the noise-containing data that
the parameters will be fitted against.
They are a sequence of pairs {(x1,
y1), (x2, y2),
¡, (xm, ym)}. Each pair is associated with a weight
that indicates the reliability of the data point. This class does not take standard
deviations since the way a weight is calculated based upon standard deviation
may vary. It is the user¡¯s control on
how the weights are evaluated.
Measurement data are set at the time of construction, but more data can
be added later, the data can be selected, and the entire data set can be
exchanged after construction.
void appendMeasurement(const X&,
const Y&,
const double = 1);
void appendMeasurement(
const std::vector<X>&,
const std::vector<Y>&,
const std::vector<double>& =
std::vector<double>());
These functions append one or more data points to the measurement data, respectively. The last arguments double or vector<double> supply the weights. If they are not given, the weights are set to 1. The additional data points will be used in the next fitting if no screening is done (see below).
void exchangeMeasurement(
const std::vector<X>&,
const std::vector<Y>&,
const std::vector<double>& =
std::vector<double>());
This function replaces the entire measurement data set with the newly specified one. The default weights are all 1. All data points will be used in the next fitting if no data screening is done. The old data set will be lost after calling this function.
void screenMeasurement(const double errmax =
0);
Given a maximum allowed error errmax,
this function screens all data points.
If
errmax, the
data point j is temporarily
rejected. Rejected data points will
not be used in the next fitting, but they are not lost and can be reaccepted by
later screening or by the next function useAllMeasurement(). This function may change the order of
the measurement data.
void useAllMeasurement();
Reaccepts all data points. The original sequence of the data points is not restored.
std::vector<X> getDatapoint() const;
std::vector<Y> getMeasurement() const;
Return currently accepted xj¡¯s and yj¡¯s, respectively.
int getMeasurementSize() const;
int getTotalMeasurement() const;
Return the sizes of currently accepted data points and the total data points, respectively.
void printMeasurement(const bool = 0) const;
Prints a message on the status of the current measurement data, e.g., number of accepted and rejected data points. If the Boolean is set, the accepted xj¡¯s and yj¡¯s will also be printed, however it will be a compile error if the operator << is not defined for type X or Y.
6.4.1.4 Model parameters
It is the goal of this class to find a best set of parameters for the theoretical model. The initial values of the parameters can be set at the time of construction, but they may be reset at a later time too. A vector<bool> is used as a mask. If an element is true, the corresponding parameter will be refined; if false, it will be fixed.
void setParameter(const
std::vector<double>&);
Sets the new parameter set. The old parameters will be lost after calling this function.
std::vector<double> getParameter() const;
int getParameterSize() const;
Return the current parameter set or its size, respectively.
void printParameter() const;
Prints a message on the current parameter set.
void setParameterName(
const std::vector<std::string>&);
Sets the names of the parameters. They will be used in printout.
void setMask(const
std::vector<bool>&);
std::vector<bool> getMask() const;
Sets and gets the mask, respectively.
6.4.1.5 Other control
void setDataToParameterRatio(const double =
0);
double getDataToParameterRatio() const;
Sets and gets the maximum data-to-parameter ratio used in fitting. If 0 is given, there will be no control for maximum data-to-parameter ratio. If 0 is returned from the get function, it indicates no control for the maximum.
int numberOfDataUsed();
Returns the number of data points will be actually used in fitting considering the maximum data-to-parameter ratio and number of free parameters. (This function should be a const, but it is not now. It changes a private data member dataUsed. This may be fixed in future releases.)
void setIterationMax(const int);
int getIterationMax() const;
void printIterationMax() const;
Sets, gets, and prints maximum number of iteration in fitting, respectively.
void setTolerance(const double);
double getTolerance() const;
void printTolerance() const;
Sets, gets, and prints a fractional tolerance that controls ending of the least-squares fitting, respectively.
6.4.1.6
Least-squares fitting
double fit(const std::vector<bool>&
=
std::vector<bool>());
This function is the most important one in the class. It performs the least-squares fitting, alters the parameter set and the statistics of the result. It may take a new mask as an option. This function can be called multiple times with different masks.
double fit(const
std::vector<double>&);
Performs the least-squares fitting starting from the specified new parameter set instead of the one before this fitting. The parameters before the fitting are lost.
void unfit();
Undoes the last fitting. The parameter set and statistics before the last fitting are restored, and the parameter set and statistics after the last fitting are saved. However, if a new mask was used when calling the last fitting, this mask remains effective after calling unfit(). If a new parameter set was given to the last fitting, this set will be restored by unfit(), not the parameters before the last fitting. Do not change data-to-parameter ratio between last fitting and calling this function.
6.4.1.7 Statistics
of result
double calculateChiSquare();
double getChiSquare() const;
Calculates and returns
, respectively, where wj
is the weight of data point j. The first function also returns the updated
c2.
double getRMSD() const;
double RMSDDrop() const;
Return root-mean-square deviation defined as
and the fractional
difference between the RMSD before last fitting and the current one,
respectively.
6.4.2 Derived class mac::nonlinearOperator<Y = double, X = int> for operator *
The base class nonlinear<Y,
X> is a pure virtual class since a required function dotProduct() is not defined. If type Y
is a single value such as double,
float, int, or a vector such as mac::vector3D<T>,
use this derived class. The
operator * of type Y must be defined as follow: if two
vectors u = (u1, u2, ¡, un) and v = (v1, v2,
¡, vn), u
* v =
. This derived
class can be used for all types with operator *
defined as above.
nonlinearOperator(Y (*)(const X&,
const std::vector<X>&,
const
std::vector<Y>&,
const
std::vector<double>&,
const std::vector<bool>&,
std::vector<Y>*) = NULL,
const std::vector<X>& x =
std::vector<X>(),
const std::vector<Y>& y =
std::vector<Y>(),
const std::vector<double>& para =
std::vector<double>(),
const std::vector<bool>& mask =
std::vector<bool>(),
const std::vector<double>& weight =
std::vector<double>(),