CHAPTER
5
This part of data processing is often called scaling in monochromatic methods, where it refers to frame-to-frame scaling and the determination of relative temperature factors. In Laue diffraction, a far more important and complex task, wavelength normalization, plays the major role. Wavelength normalization results in a wavelength-dependent function often called the λ-curve. When converted to an energy-dependent function, this curve looks much like the X-ray spectrum of the incident beam. It is in fact an overall correction for wavelength dependency, including absorption correction (see 5.2.4 for details).
Each Laue spot was usually associated either with one wavelength, in which case it is called a single, or with several discrete wavelengths, in which case it is called a multiple, until Ren et al. (J. Synchrotron Rad. 6, 891-917, 1999) pointed out that this is a gross oversimplification.
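$\frac{\Delta\lambda}{\lambda} \propto \frac{\eta}{\tan\theta} \approx \frac{\eta}{\theta}$,
where Δλ/λ is the relative bandwidth of a Laue reflection, η is the crystal mosaicity, and θ is the Bragg angle.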
This equation, reformulated from Ren et al., essentially states that the relative bandwidth of a Laue reflection is proportional to the crystal mosaicity and inversely proportional to the tangent of the Bragg angle, which is approximately the angle itself in nearly all protein cases. A Laue spot is stimulated by X-rays within an energy band several hundred eV wide and can often become a partial reflection. This is particularly true at low Bragg angles. A partial reflection occurs in monochromatic oscillation because the oscillation range is limited to a small angle of 1° or so; such a partial is called an angular partial. A Laue partial occurs because the available energy range is insufficient, and it is therefore called an energy partial. Taking the energy band of each Laue reflection and the energy partials into account improves the merging of data from different Bragg angles.
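The dependence of bandwidth on Bragg angle described above can be illustrated numerically. The following is only a minimal sketch, assuming a proportionality constant of 1 (as obtained by differentiating Bragg's law at fixed d-spacing) and a mosaicity given in degrees. The function names, the 0.1° mosaicity, and the 12 keV photon energy are hypothetical illustration values and are not taken from Epinorm.

```python
import math

def relative_bandwidth(mosaicity_deg, bragg_angle_deg):
    """Relative bandwidth of a Laue reflection, eta / tan(theta).

    Assumption: proportionality constant of 1, from differentiating
    Bragg's law; the expression used by Epinorm may differ.
    """
    eta = math.radians(mosaicity_deg)
    theta = math.radians(bragg_angle_deg)
    return eta / math.tan(theta)

def energy_band_ev(mosaicity_deg, bragg_angle_deg, energy_ev):
    """Width of the stimulating energy band, using |dE/E| = |d(lambda)/lambda|."""
    return energy_ev * relative_bandwidth(mosaicity_deg, bragg_angle_deg)

# Hypothetical values: 0.1 degree mosaicity, 12 keV photons.
for theta_deg in (2.0, 5.0, 10.0, 20.0):
    band = energy_band_ev(0.1, theta_deg, 12000.0)
    print(f"theta = {theta_deg:5.1f} deg   energy band = {band:7.1f} eV")
```

Under these assumptions the band widens rapidly as the Bragg angle decreases, which is why energy partials are most common at low Bragg angles.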
A separate program, Epinorm (energy partial improved normalization), carries out all the functions described above. In essence, Epinorm performs data reduction from integrated intensities to structure factor amplitudes.
5.1 Input Data and Control Parameters
An example of command script for scaling is shown below.
diagnostic off
busy off
warning off
prompt off
result off
@ m37v3_8us_002.mar3450.inp
@ m37v3_8us_004.mar3450.inp
…
@ m37v3_8us_062.mar3450.inp
prompt on
result on
Input
Image initial.lam
Resolution 2.1 100
Wavelength 1 1.5 1.1
Chebyshev 64 unimodal
Spot 8 6
Quit
Scale 3 2 1 m37v3_8us_002.mar3450.ii
Lambda refined.lam
Apply 1 m37v3_8us.hkl
Stop
Yes
Listing 5.1.0.0.1 Command script to run Epinorm.
The Epinorm command script shown in Listing 5.1.0.0.1 is very similar to that of Precognition. Misusing a script will generate an error message. Like the script for integration, it first loads the set of .inp files for the frames to be scaled together. Bad frames should be excluded from this list. The global printing switches prompt and result help keep the log file concise.
The Input section is optional. An initial λ-curve can be loaded here; see Chapter 4 for the format. If no λ-curve is loaded, a straight line is assumed, and you should specify a wavelength range; otherwise the default of 0.5 to 1 Å is used. If an initial λ-curve is loaded, you may still change the wavelength range with the Wavelength command. Wherever the new wavelength range falls outside the λ-curve, 0 intensity is assumed. The Wavelength command can take a third, optional number as the reference wavelength. Note, however, that an explicit wavelength range and reference take effect only if this command comes after the initial λ-curve is loaded. If no reference is given, it is determined automatically from the initial λ-curve: the shortest wavelength at which the initial λ-curve reaches its greatest intensity is chosen as the reference.
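The two rules above (0 intensity outside the loaded curve, and automatic choice of the reference wavelength) can be sketched as follows. This is only an illustration, assuming the initial λ-curve is available as two parallel lists sorted by wavelength and that values between tabulated points are linearly interpolated. The function names and sample values are hypothetical, and the actual file format described in Chapter 4 is not reproduced here.

```python
def pick_reference(wavelengths, intensities):
    """Shortest wavelength at which the curve reaches its greatest intensity."""
    peak = max(intensities)
    return min(w for w, i in zip(wavelengths, intensities) if i == peak)

def curve_value(wavelengths, intensities, lam):
    """Value of a tabulated curve at lam; 0 outside the tabulated range.

    Assumption: linear interpolation between tabulated points.
    """
    if lam < wavelengths[0] or lam > wavelengths[-1]:
        return 0.0
    for k in range(len(wavelengths) - 1):
        w0, w1 = wavelengths[k], wavelengths[k + 1]
        if w0 <= lam <= w1:
            frac = (lam - w0) / (w1 - w0)
            return intensities[k] + frac * (intensities[k + 1] - intensities[k])
    return intensities[-1]  # curve with a single tabulated point

# Hypothetical curve with a flat top between 1.1 and 1.2 Angstrom.
wl = [1.0, 1.1, 1.2, 1.3]
inten = [0.2, 1.0, 1.0, 0.3]
print(pick_reference(wl, inten))
print(curve_value(wl, inten, 1.4))
```

With a flat-topped curve like this one, the tie between 1.1 and 1.2 Å is resolved in favor of the shorter wavelength, matching the rule stated in the text.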
The command Resolution specifies a resolution range within which data are loaded for scaling. In most cases this command is unnecessary, and data at all resolutions are loaded. If you do specify a resolution range explicitly, use the integration range. This command is useful only in special cases, which will be explained elsewhere.
The command Chebyshev specifies the maximum order of the Chebyshev polynomials. If it is not given, the program may set a default depending on the complexity of the initial λ-curve. If a second integer less than or equal to the first is given, that many of the highest-degree Chebyshev terms are allowed to take frame-specific values, that is, each frame may have its own λ-curve. If the second integer is missing, it defaults to 0, that is, all frames share a single λ-curve. The second integer cannot be greater than the first; if it is, it is ignored and a warning message is issued.
An optional string argument, one of unimodal, bimodal, arbitrary, free, or fix, can also be given. The first three hint to the program to find a unimodal, bimodal, or arbitrary spectrum, respectively. If no string argument is given, the program will try to remove spiky features at both ends of the derived spectrum. An explicit argument arbitrary forces the program to leave all spikes unmodified. See Figure 5.1.0.0.1 for an example of spike removal. The option fix tells the program not to refine the spectrum, and free reverses this.
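How the two integers of the Chebyshev command could partition the terms of a λ-curve is sketched below. This is a hedged illustration rather than Epinorm's implementation: it assumes the first integer counts terms, that the wavelength range is mapped linearly onto [-1, 1], and that the frame-specific terms are simply appended after the shared ones. All function names are hypothetical.

```python
def chebyshev_series(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) for x in [-1, 1] (Clenshaw recurrence)."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2

def split_terms(n_terms, n_frame_specific=0):
    """Return (shared, frame-specific) term counts.

    A second integer greater than the first is ignored, which mirrors
    the warning behavior described in the text.
    """
    if n_frame_specific > n_terms:
        n_frame_specific = 0
    return n_terms - n_frame_specific, n_frame_specific

def lambda_curve(shared, frame_specific, lam, lam_min, lam_max):
    """Lambda-curve of one frame: shared low-degree terms followed by
    that frame's own higher-degree terms."""
    x = (2.0 * lam - lam_min - lam_max) / (lam_max - lam_min)
    return chebyshev_series(list(shared) + list(frame_specific), x)

print(split_terms(64))       # all frames share one curve
print(split_terms(64, 8))    # the 8 highest-degree terms vary per frame
```

With the second integer at 0, every frame evaluates the same coefficient list, so all frames share a single λ-curve.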

Figure 5.1.0.0.1 λ-curves of a 128-term Chebyshev approximation derived by Epinorm. The dotted and solid lines are before and after spike removal at both ends, respectively.
The command Spot, with two numerical arguments, initializes the crystal mosaicity. The streakier the spots, the larger the mosaicity. If no Spot command is given, the default mosaicity is 0. See 5.2.3 for more. This command also prevents the program from restoring the overall mosaicity from a saved parameter file. See 5.3 for details.
If heavy atoms are present in your crystal and you wish to examine their anomalous scattering signal, an additional control command, Anomalous, can be used in the Input section (not shown in Listing 5.1.0.0.1). This command toggles a flag that signals whether anomalous scattering should be considered during data reduction. The default state is off, which fits most cases. This is the very first point in the entire process where an explicit option can be given if anomalous scattering is a concern. Implicitly, however, consistent indexing of all frames in a dataset, established at the very beginning of data processing, is crucial to the extraction of anomalous signal. Great care must be taken to ensure such consistency, using all the means provided in Chapters 2 and 3. If re-indexing cannot be avoided, specify a desired orientation matrix prior to re-indexing, as described in 3.8. Switching on this flag signals the program to separate Friedel pairs, so that each member of a Friedel pair is considered independent of, instead of equivalent to, the other. The Rmerge values calculated later will then not include discrepancies between Friedel pairs (see 5.2.6). It is also possible to delay the switch until after scaling and before the merging of redundant and equivalent data; see 5.4 for details. Note that these two alternatives reflect different strategies for handling anomalous signal. The former preserves the maximum amount of anomalous signal but may misidentify some systematic errors as anomalous signal. The latter guards against possible systematic errors but may unknowingly attenuate some real anomalous signal. The choice is left to the user.
The command Anomalous in Input section may take an optional string argument on or off. If no argument or no recognizable one is given, the command negates the current state.
5.2 Data Selection and Parameter Fitting
The main command is Scale. It may take three numeric arguments and a string argument. All these arguments are optional.
This command in the current release does not enter a submenu, but this will change in future releases if scaling becomes more complex or has more options.
5.2.1 Data selection
The first number specifies a σ-cut; 3 is the default. If I/σ(I) is less than this value, the integrated intensity is not used in scaling. This does not mean that the data point is lost forever; the minimization process simply does not require all available data points. You may watch the data-to-parameter ratio reported during scaling. If this ratio reaches a few hundred, there should be enough data points to over-determine the parameters. The σ-cut must be greater than or equal to 0. A σ-cut of 0 means that all positive, but not 0, integrated intensities take part in scaling. The σ-cut is the only control through which the user can intervene in data rejection; the other rejection criteria are automatic. See Listing 5.2.6.0.1 and the text below it.
Another way to control data selection is to specify the number of data points loaded from each frame. If the first numeric argument is equal to or greater than 100, it is no longer treated as a σ-cut but as the number of data points per frame. If you have tens of frames to scale at once, a few hundred data points per frame should be sufficient. If you scale only a few images, you may need more. Limiting the data points per frame usually makes the program run faster, but it opens the possibility of insufficient data. The data-to-parameter ratio is not the only thing to consider here; how the data populate resolution and wavelength is more important. Using only the strongest, and therefore insufficient, data may result in no representation at high resolution or in the two wings of the spectrum. This could cause arbitrary temperature factors and a noisy λ-curve.
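The two interpretations of the first numeric argument can be summarized in a short sketch. It is an illustration under stated assumptions, not Epinorm's code: spots of one frame are taken as (intensity, sigma) pairs, and when a per-frame count is requested the strongest spots by I/σ(I) are kept, which is an assumption suggested by the remark about using only the strongest data. The function name is hypothetical.

```python
def select_for_scaling(spots, first_argument=3.0):
    """Select data points of one frame for scaling.

    spots: list of (intensity, sigma) tuples.
    first_argument < 100  : treated as a sigma-cut on I/sigma(I).
    first_argument >= 100 : treated as the number of data points per frame.
    """
    if first_argument < 0:
        raise ValueError("the sigma-cut must be greater than or equal to 0")
    # Only positive integrated intensities can take part in scaling.
    positive = [(i, s) for i, s in spots if i > 0 and s > 0]
    if first_argument >= 100:
        # Assumption: the strongest observations by I/sigma(I) are kept.
        ranked = sorted(positive, key=lambda p: p[0] / p[1], reverse=True)
        return ranked[: int(first_argument)]
    return [(i, s) for i, s in positive if i / s >= first_argument]

frame = [(100.0, 10.0), (5.0, 5.0), (0.0, 1.0), (-3.0, 1.0), (40.0, 10.0)]
print(select_for_scaling(frame, 3))   # default sigma-cut
print(select_for_scaling(frame, 0))   # all positive intensities
```

A per-frame count selected this way favors strong, typically low-resolution and mid-spectrum spots, which is the sampling risk the text warns about.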
5.2.2 Data isotropy
The second number can be -1, 0, 1, or 2. 0 is the default, which indicates isotropic scale factors and temperature factors only. -1 indicates isotropic scale factors only; all temperature factors are kept as initialized. 1 indicates that anisotropic but linear scale factors and temperature factors can be used.
, and
,
are scale factor and temperature factor, respectively.
2 indicates nonlinear anisotropic scale factors and temperature factors are allowed.
, and
.
Anisotropic factors in general help minimize local errors, but they may be refined to some unreasonably large values, if there are not enough data to restrain them. Use them judiciously.
The string argument specifies a reference frame. The isotropic scale factor a0 of this frame is fixed at 1. All other factors a’s and b’s of this frame will be fixed as initialized. If no reference is specified, the first frame is assumed to be the reference.
The program initializes all a’s and b’s to 0, except the a0’s, which are initialized to 1. The second numerical argument and the string argument of the command Scale act as selectors on the initialized values; they do not reset these factors. These factors can therefore also be initialized to user-specified values, with the arguments of the command Scale choosing which of them to fix and which to free. See 5.3 on user-initialized factors.
5.2.3 Crystal mosaicity
The third numerical argument can be 0, 1, or 2. 0 is the default, which indicates that crystal mosaicity is fixed as initialized. 1 indicates that an overall mosaicity can be refined, and 2 indicates frame-by-frame mosaicity. Combining this option with the Spot command in the Input section covers all the possibilities.
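The meaning of the arguments to Scale described in 5.2.1 through 5.2.3 can be collected into a small parser. This is a hypothetical sketch for reading scripts, not part of Epinorm: it assumes the arguments are positional, that the numeric arguments precede the string argument, and that text after # is a comment. The class, function, and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ISOTROPY_MODES = {
    -1: "isotropic scale factors only",
    0: "isotropic scale and temperature factors",
    1: "linear anisotropic scale and temperature factors",
    2: "nonlinear anisotropic scale and temperature factors",
}
MOSAICITY_MODES = {
    0: "mosaicity fixed as initialized",
    1: "overall mosaicity refined",
    2: "frame-by-frame mosaicity refined",
}

@dataclass
class ScaleOptions:
    first: float = 3.0            # sigma-cut, or points per frame if >= 100
    isotropy: int = 0
    mosaicity: int = 0
    string_argument: Optional[str] = None  # e.g. a reference frame

def parse_scale(line: str) -> ScaleOptions:
    tokens = line.split("#", 1)[0].split()
    if not tokens or tokens[0] != "Scale":
        raise ValueError("not a Scale command")
    opts = ScaleOptions()
    numbers = []
    for token in tokens[1:]:
        try:
            numbers.append(float(token))
        except ValueError:
            opts.string_argument = token
            break
    if len(numbers) > 3:
        raise ValueError("at most three numeric arguments are expected")
    if len(numbers) > 0:
        if numbers[0] < 0:
            raise ValueError("first argument must be >= 0")
        opts.first = numbers[0]
    if len(numbers) > 1:
        if numbers[1] not in ISOTROPY_MODES:
            raise ValueError("second argument must be -1, 0, 1, or 2")
        opts.isotropy = int(numbers[1])
    if len(numbers) > 2:
        if numbers[2] not in MOSAICITY_MODES:
            raise ValueError("third argument must be 0, 1, or 2")
        opts.mosaicity = int(numbers[2])
    return opts

opts = parse_scale("Scale 3 2 1 m37v3_8us_002.mar3450.ii")
print(ISOTROPY_MODES[opts.isotropy], "/", MOSAICITY_MODES[opts.mosaicity])
```

Applied to the Scale line of Listing 5.1.0.0.1, this reads as a σ-cut of 3, nonlinear anisotropic factors, an overall refined mosaicity, and frame 002 as the reference.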
5.2.4 Absorption correction

It is obvious that a λ-curve obtained from the process of wavelength normalization accounts for the total effect of the source spectrum and of all absorption by optical elements in the incident beam before the sample crystal, including obstacles such as air and the front wall of the sample capillary. It is therefore very appropriate to call this wavelength-dependent correction a λ-curve instead of a spectrum. It is less obvious that a λ-curve also corrects for the overall effect of absorption by elements in the crystal environment, for example the sample crystal itself, surrounding liquid, flow cell, cryoloop, capillary, diamond anvil cell, gasket, air, the front layer of the detector, etc. Why would a λ-curve, without special consideration, already be capable of correcting a large portion of this seemingly complex absorption? For simplicity of the argument, consider absorption by only one element.
Absorption correction factor
$f_A = e^{-\mu(\lambda)\,p(t)}$,
where μ(λ) is the linear absorption coefficient as a function of wavelength, and p(t) is the path length through the absorbing element as a function of the orientation t of a reflected beam. The path length can be rewritten as a constant mean path length plus a deviation from the mean as a function of orientation:
$p(t) = p_0 + \Delta p(t)$.
The absorption correction factor then becomes a product of two parts:
$f_A = e^{-\mu(\lambda)\,p_0}\; e^{-\mu(\lambda)\,\Delta p(t)}$,
where the first part is wavelength-dependent only. This part is corrected automatically by the λ-curve. When the range of Δp(t) is smaller than p_0, which is often the case, most of the absorption effect has already been taken care of by the λ-curve. What is left uncorrected is the second, orientation-dependent part. Therefore, the absorption correction factor can be redefined as
$f_A = e^{-\mu(\lambda)\,\Delta p(t)}$
in the one-element case.
In general, if a reflected beam at orientation t passes through n types of materials, the absorption correction factor can be written as
$f_A = \exp\left[-\sum_{i=1}^{n} \mu_i(\lambda)\,\Delta p_i(t)\right]$.
In the X-ray wavelength range, mass absorption coefficients are roughly proportional to the squared wavelength, so that a generalized path length P(t) can be defined independent of wavelength:
$f_A = \exp[-\lambda^2 P(t)]$.
The generalized path length P(t) is a spherical function, or simply a 2-dimensional function in detector space, that includes the variation of path lengths, densities, and steepness of mass absorption coefficients of all materials involved. In contrast to wavelength normalization, absorption correction focuses on the unevenness across detector space rather than on wavelength dependency.
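The separation of the absorption factor into a mean-path part and an orientation-dependent residual can be checked numerically for the one-element case. The sketch below is illustrative only: the absorption coefficient and path lengths are made-up values, the mean path is taken as a plain average over the sampled orientations, and the function names are hypothetical. As noted above, absorption correction is not part of the released Epinorm.

```python
import math

def absorption_factor(mu, path):
    """Full factor exp(-mu * p) for one absorbing element."""
    return math.exp(-mu * path)

def split_absorption(mu, paths):
    """Split exp(-mu * p(t)) into exp(-mu * p0) and exp(-mu * dp(t)).

    p0 is the mean path over the sampled orientations (an assumption
    about how the mean is taken); dp(t) = p(t) - p0.
    """
    p0 = sum(paths) / len(paths)
    mean_part = math.exp(-mu * p0)        # absorbed into the lambda-curve
    residuals = [math.exp(-mu * (p - p0)) for p in paths]  # left to correct
    return mean_part, residuals

def generalized_factor(lam, big_p):
    """exp(-lambda^2 * P(t)), using the wavelength-squared approximation."""
    return math.exp(-lam ** 2 * big_p)

# Hypothetical values: mu in 1/mm at one wavelength, paths in mm.
mu = 0.8
paths = [0.45, 0.50, 0.55, 0.50]
mean_part, residuals = split_absorption(mu, paths)
for p, r in zip(paths, residuals):
    print(f"path {p:.2f} mm: full {absorption_factor(mu, p):.4f}, residual {r:.4f}")
```

Because the deviations from the mean path are small compared with the mean itself in this example, the residual factors stay close to 1, which is the sense in which the λ-curve already absorbs most of the effect.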
Absorption correction is not yet released in the latest version of Epinorm.
5.2.5 Initial scaling
If a set of integrated intensities has never been scaled, there is an option to initialize the process in a less error-prone way, although this is not always necessary. To use the initial cycle, give the string argument initial to the command Scale and request an abbreviated scaling, to be followed later by normal scaling. See 5.3 for saving and restoring intermediate results.
diagnostic off
busy off
warning off
prompt off
result off
@ m37v3_8us_002.mar3450.inp
@ m37v3_8us_004.mar3450.inp
…
@ m37v3_8us_062.mar3450.inp
prompt on
result on
Input
Image initial.lam
Resolution 2.5 100 # lower resolution
Wavelength 1 1.5 1.1 # NOTE: use same bandwidth as in normal cycles
Chebyshev 16 # lower Chebyshev order, no frame-specific lambda-curve
# Spot 8 6 # comment out for 0 mosaicity
Quit
Scale 3 -1 0 initial # use strong observations, isotropic scaling only
Lambda refined.lam
Stop
Yes
Listing 5.2.5.0.1 Command script for an initial scaling cycle.
5.2.6 Minimization cycle and statistical report
====================================================
Scaling Cycle 4
Isotropic scale factor
Overall spectrum
====================================================
Total measurements: 123021
Accepted          : 115161  93.6108%
Rejected          :   7860  6.38915%
Data-to-parameter : 1251.75
Maximum iteration : 32
Tolerance         : 0.0001
Chi-square        : 2.8666e+07  3.32024e+07  -4.53632e+06  -13.6626%
R.M.S.D.          : 936.884  1008.29  -71.4082  -7.0821%
Quadratic R-factor: 14.2902%
(Current and previous values, absolute and relative changes)
______
| )_
| Report |
| ------ |
| ------ |
| ------ |
| ----   |
|________|
R-model          = 0.125352
Weighted R-model = 0.117885
R-models calculated from 115161 accepted integrated intensities.
These R-factors indicate how well the integrated intensities are modeled by the current parameter set.
R-merge on F^2          = 0.168612
Weighted R-merge on F^2 = 0.127676
R-merge on F            = 0.0993019
Weighted R-merge on F   = 0.0804078
R-merges calculated from 115131 accepted integrated intensities of 33883 unique reflections with redundant measurements.
These R-factors indicate how well the symmetry-related reflections agree with each other.
Mean F^2 / sigma(F^2) = 10.7618
Mean F / sigma(F)     = 21.4314
Signal-to-noise ratio calculated from 9333 unique reflections with highly redundant measurements.
Resolution range (A)   Unique refl.   Mean F^2/sigma(F^2)   Mean F/sigma(F)
____________________   ____________   ___________________   _______________
1000.0000 - 4.7877          235              15.97               31.64
   4.7877 - 3.8000          527              19.34               38.30
   3.8000 - 3.3196          615              17.18               34.07
   3.3196 - 3.0161          601              14.53               28.93
   3.0161 - 2.7999          625              12.49               24.85
   2.7999 - 2.6348          588              11.54               23.11
   2.6348 - 2.5028          590              10.65               21.33
   2.5028 - 2.3938          538               9.58               19.16
   2.3938 - 2.3017          580               9.95               19.88
   2.3017 - 2.2223          569               8.51               17.00
   2.2223 - 2.1528          626               8.93               17.83
   2.1528 - 2.0912          652               8.12               16.15
   2.0912 - 2.0362          657               7.85               15.65
   2.0362 - 1.9865          659               7.75               15.41
   1.9865 - 1.9413          653               7.62               15.16
   1.9413 - 1.9000          618               7.18               14.28
File light.inp is overwritten.
File m37v_1a_004.mar3450.ii.lam is overwritten.
File m37v_1a_006.mar3450.ii.lam is overwritten.
…
File m37v_1b_062.mar3450.ii.lam is overwritten.
Listing 5.2.6.0.1 Statistics report from each cycle of scaling.
The minimization process is scheduled in many cycles. Each cycle generates a report like the one listed above. First, a title tells what parameters are refined in this cycle, followed by a section of basic statistics on the data. Data rejection is done automatically based on several criteria. A non-redundant measurement is rejected, since it cannot contribute to the refinement. Data points with large errors are also rejected automatically, but once again, rejected data during scaling may still be included in the final output.
, and
,
where the summation is over N accepted data points. χ² and R.M.S.D. measure how well the observed integrated intensities are modeled, but they do not give a relative sense. Rquadratic, in a more statistical sense, and Rmodel, in a more crystallographic sense, both defined below, indicate this relative residual of the fit:
, and
.
The Rmerge, also known as Rsymm, measures how well the symmetry-related and redundant measurements agree with each other after the current correction factors are applied. All these statistics should improve cycle by cycle if the refinement is going well. However, you may notice that some R-factors increase slightly. This is because newly applied data rejection may accept more data points into the refinement as the process converges.
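For orientation, the conventional unweighted merging R-factor can be sketched as below. This uses the textbook definition (sum of absolute deviations from the group mean over the sum of intensities), which is an assumption here, since the exact expression used by Epinorm is not reproduced in this text. Space-group symmetry is ignored: the only equivalence handled is the Friedel pair, controlled by a hypothetical anomalous flag that mimics the effect of the Anomalous command.

```python
from collections import defaultdict

def merge_key(hkl, anomalous):
    """Grouping key; with anomalous off, (h,k,l) and (-h,-k,-l) share a key."""
    if anomalous:
        return hkl
    negated = tuple(-x for x in hkl)
    return max(hkl, negated)

def r_merge(measurements, anomalous=False):
    """Conventional R-merge over groups with at least two measurements.

    measurements: iterable of ((h, k, l), intensity) pairs.
    """
    groups = defaultdict(list)
    for hkl, intensity in measurements:
        groups[merge_key(hkl, anomalous)].append(intensity)
    numerator = denominator = 0.0
    for values in groups.values():
        if len(values) < 2:
            continue  # non-redundant measurements cannot contribute
        mean = sum(values) / len(values)
        numerator += sum(abs(v - mean) for v in values)
        denominator += sum(values)
    return numerator / denominator if denominator else None

# Hypothetical measurements of one Friedel pair.
data = [((1, 2, 3), 100.0), ((1, 2, 3), 110.0),
        ((-1, -2, -3), 90.0), ((-1, -2, -3), 80.0)]
print(r_merge(data, anomalous=False))
print(r_merge(data, anomalous=True))
```

With the flag on, the two Friedel mates are merged separately, so the difference between them no longer counts as disagreement and the R-factor drops.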
Mean F²/σ(F²) and mean F/σ(F) are meant to be objective measures of the signal-to-noise ratio. The sample standard deviation σ is calculated from observations with a redundancy of at least 4, so that σ is a lower bound of the real noise content.
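A signal-to-noise estimate of this kind can be sketched as follows. The sketch assumes that, for each unique reflection observed at least 4 times, the mean is divided by the sample standard deviation of its observations and that these ratios are then averaged over reflections. The exact averaging used by Epinorm is not stated here, so this is an assumption, and the function name and data are hypothetical.

```python
import statistics

def mean_signal_to_noise(observations, min_redundancy=4):
    """Average of mean/stdev over sufficiently redundant reflections.

    observations: dict mapping a unique reflection to its list of
    redundant values (F or F^2). Returns (ratio, reflections_used).
    """
    ratios = []
    for values in observations.values():
        if len(values) < min_redundancy:
            continue  # not redundant enough for a sample standard deviation
        sigma = statistics.stdev(values)
        if sigma > 0:
            ratios.append(statistics.mean(values) / sigma)
    if not ratios:
        return None, 0
    return statistics.mean(ratios), len(ratios)

# Hypothetical redundant observations of three unique reflections.
obs = {
    (1, 0, 0): [10.0, 12.0, 8.0, 10.0],
    (0, 1, 0): [5.0, 5.5],                      # too few, skipped
    (0, 0, 2): [100.0, 90.0, 110.0, 100.0, 100.0],
}
ratio, used = mean_signal_to_noise(obs)
print(f"{used} reflections used, mean signal-to-noise = {ratio:.2f}")
```

Only reflections meeting the redundancy threshold enter the average, which is why the report counts far fewer reflections for this statistic than for the R-merges.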
5.2.7 Results
Results of the minimization process are reported after the final cycle, as in the listing below.
______
| )_
| Report |
| ------ |
| ------ |
| ------ |
| ----   |
|________|
Beam polarization: 0.921758
Mean crystal mosaicity (degree): 0
m37v3_8us_002.mar3450.ii 0.00000
m37v3_8us_004.mar3450.ii 0.00000
…
m37v3_8us_062.mar3450.ii 0.00000
Isotropic scale factor:
m37v3_8us_002.mar3450.ii 1.00000
m37v3_8us_004.mar3450.ii 1.15969
…
m37v3_8us_062.mar3450.ii 1.43031
Isotropic temperature factor:
m37v3_8us_002.mar3450.ii 0.00000
m37v3_8us_004.mar3450.ii -4.58414
…
m37v3_8us_062.mar3450.ii -7.24519
Anisotropic scale factor:
m37v3_8us_002.mar3450.ii
0.00000 0.00000 0.00000
0.00000 0.00000 0.00000
0.00000 0.00000 0.00000
m37v3_8us_004.mar3450.ii
0.00042 0.00791 -0.00124