CHAPTER 1

Overview

 

 

1.1     Introduction

 

Crystallographic Concept Library (CCL) and Crystallographic Protocol Library (CPL) are being designed and implemented to achieve industrial strength in crystallographic computing.  One of the goals of current structural genomics efforts is to find a path to the industrialization of macromolecular crystallography.  CCL and CPL are designed to meet the demands of this industrialization process from two distinct but complementary perspectives.

 

CCL contains a collection of computer code that reflects the fundamental concepts in crystallography, such as space group, unit cell, and structure factor.  Its sub-library, Math Applications in Crystallography (MAC), collects the mathematical procedures commonly used in crystallographic computing, e.g., matrix and vector operations, FFT, and least-squares model fitting.  Most of this code is written in C++, with some well-tested FORTRAN code inherited from older programs.  I try to isolate the recurring code from general crystallographic application programs.  The parts that handle crystallographic concepts are summarized and engineered in an object-oriented fashion.  The other parts, which often involve pure mathematical procedures or algorithms, are likewise separated from the crystallographic topics.  Since CCL and MAC are not end-user applications, existing application programs in the field need to be redesigned, or at least modified, to take advantage of these libraries.  Thus the question CCL and MAC face most frequently is 'Why reinvent the wheel?'  Reinventing the wheel is not the goal, but to achieve industrial strength in crystallographic computing, with unprecedented throughput and robustness, 'reengineering the wheel' is expected to be inevitable.

 

Nevertheless, over the decades this field has accumulated an abundance of computational tools that effectively implement working methods.  In contrast to CCL, CPL tries to utilize these existing resources and integrate them behind a set of uniform application programming interfaces.  CPL offers a set of Python modules and classes for writing applications, graphical user interfaces, and databases.  CPL accommodates existing software packages, such as CNS, SOLVE, and CCP4, and makes them truly complementary by providing automatic trial runs, independent error analysis, data rejection/weighting, result comparison, and other intelligence in order to find the optimal protocol for each individual data set and its specific error content.  CPL hides the specific data formats, command scripts, and output logs of the underlying software.  From the application programmer's point of view, CPL is a set of high-level, automated, and intelligent protocols that perform complex crystallographic processes, such as data scaling, MAD phasing, and structure refinement.  The actual working engine and the corresponding logistical details are hidden from users unless requested.  CPL generates reports in several formats, including XML.  Finally, CCL and CPL are designed to contribute strengths at different levels, and they complement each other when solving a specific problem.  State-of-the-art programming techniques make it feasible to integrate CCL and CPL.  Both libraries emphasize extensibility and portability for the constantly evolving fields of structural biology and structural genomics.

 

From an even broader perspective, a variety of new approaches are being proposed and actively practiced in crystallographic computing with the demands of structural genomics in mind.  More sophisticated, robust, and sometimes inevitably complex algorithms are being introduced to users.  On the other hand, straightforward front ends and a high level of automation are expected.  CCL and CPL attempt to bring fresh thinking to this new wave of advancement in crystallographic computing.

 

1.2     A Tour of This Book

 

1.3     Conventions Used

 

1.3.1 Notations

 

s, S      scalars
f, F      scalar functions
c, C      complexes; c = a + ib
|c|       amplitude of a complex
cc, Cc    complex conjugates; if c = a + ib, cc = a - ib
v, V      vectors; v = (a, b, c)
          -v = (-a, -b, -c) if v = (a, b, c)
m, M      matrices
p, P      geometric points
(C, r)    circle with center C and radius r
|OP|      distance from point O to point P
∠ABC      angle

 

Table 1.3.1.0.1 Notations used.
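
The complex-number and vector conventions in the table can be checked numerically.  The following is a minimal sketch in plain Python; the variable names are illustrative only and are not part of CCL or CPL.

```python
# Illustrative only: plain Python, not the CCL/CPL API.
c = complex(3.0, 4.0)            # c = a + ib with a = 3, b = 4

amplitude = abs(c)               # |c| = sqrt(a*a + b*b)
conjugate = c.conjugate()        # a - ib

v = (1.0, -2.0, 0.5)             # vector v = (a, b, c)
negated = tuple(-x for x in v)   # (-a, -b, -c)

print(amplitude)   # 5.0
print(conjugate)   # (3-4j)
print(negated)     # (-1.0, 2.0, -0.5)
```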

 

1.3.2 Type abbreviation in function or class identifiers

 

Type T                             TYP
std::complex<long double>          LDC
std::complex<double>               DCX
std::complex<float>                FCX
long double                        LDB
double                             DBL
float                              FLT
long int                           LIN
int                                INT
short int                          SIN
long unsigned int                  LUI
unsigned int                       UIN
short unsigned int                 SUI
bool                               BOL
char                               CHA
std::string                        STD
mac::vector3D                      V3D
mac::matrix3D                      M3D
mac::polarRotationMatrix           PRM
mac::polarToPolarRotationMatrix    PPM
mac::EulerRotationMatrix           ERM
ccl::gridCoordinates               GDC
ccl::fractionalCoordinates         FRC
ccl::CartesianCoordinates          CTC
ccl::MillerIndices                 HKL

 

Table 1.3.2.0.1 Type T in C++ and TYP in Python function or class identifiers.
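
As an illustration of how the abbreviations in Table 1.3.2.0.1 might be looked up in a script, the sketch below stores a few rows of the table in a Python dictionary.  The dictionary TYPE_ABBREVIATIONS and the helper type_abbreviation are hypothetical and are not part of CCL or CPL; only the type names and three-letter codes are taken from the table.

```python
# Hypothetical helper, not part of CCL or CPL.
# The entries are copied from Table 1.3.2.0.1 (a subset, for brevity).
TYPE_ABBREVIATIONS = {
    "std::complex<double>": "DCX",
    "double": "DBL",
    "float": "FLT",
    "int": "INT",
    "bool": "BOL",
    "mac::vector3D": "V3D",
    "mac::matrix3D": "M3D",
    "ccl::fractionalCoordinates": "FRC",
    "ccl::MillerIndices": "HKL",
}


def type_abbreviation(cpp_type):
    """Return the three-letter TYP code listed for the C++ type name T."""
    try:
        return TYPE_ABBREVIATIONS[cpp_type]
    except KeyError:
        raise KeyError("no abbreviation listed for type %r" % cpp_type)


print(type_abbreviation("double"))              # DBL
print(type_abbreviation("ccl::MillerIndices"))  # HKL
```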

 

1.4     Related Software

 

CCL and CPL rely on a number of other software packages to be built and executed.  This section describes where to obtain these packages and how to install them.

 

1.4.1 GCC

 

CCL and CPL are compiled with GCC, the GNU Compiler Collection (http://gcc.gnu.org).  The current release has not yet been tested with other compilers.

 

GCC can be obtained from ftp://ftp.gnu.org/gnu/gcc.  First, download gcc-3.2.tar.gz to a local hard drive.  Second, uncompress and untar the file with the command

 

tar xzvf [path/]gcc-3.2.tar.gz

 

in the directory where GCC will be installed; /usr/local is usually recommended.  The brackets around [path/] indicate that it is optional.  On some systems, the file must be unpacked in two steps with gunzip and tar.  The GCC top directory, e.g., /usr/local/gcc-3.2, will be created.  In the top directory, type the command

 

./configure [--prefix=`pwd`] [--enable-languages=c,c++,f77]

 

If the --prefix option is used as shown, GCC will be installed in its top directory; otherwise, it will be installed in /usr/local.  In most cases the latter is recommended, but if multiple versions of GCC are needed, each can be installed in its own top directory.  The --enable-languages option limits the build to the listed languages.  If configure finishes without errors, type make in the top directory to build the package.  This step will take a while.  Then type make install to install the files.
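
The configure, make, and make install sequence can also be scripted, which is convenient because several of the packages below follow the same procedure.  The sketch below only assembles the command lines and does not run them; the function name gcc_build_commands and its parameters are hypothetical, and the commented-out lines show one way the commands might be executed with the standard subprocess module.

```python
def gcc_build_commands(prefix=None, languages=None):
    """Return the configure/make/install command lines as argument lists.

    Both arguments are optional, mirroring the bracketed options in the text.
    """
    configure = ["./configure"]
    if prefix is not None:
        configure.append("--prefix=" + prefix)
    if languages:
        configure.append("--enable-languages=" + ",".join(languages))
    return [configure, ["make"], ["make", "install"]]


commands = gcc_build_commands(prefix="/usr/local/gcc-3.2",
                              languages=["c", "c++", "f77"])
for argv in commands:
    print(" ".join(argv))

# To actually build, run each command from the GCC top directory, e.g.:
# import subprocess
# for argv in commands:
#     subprocess.run(argv, cwd="/usr/local/gcc-3.2", check=True)
```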

 

1.4.2 Python

 

Only CPL depends on Python (http://python.org).  The Python distribution Python-2.2.2.tgz can be downloaded from http://python.org/2.2.2.  The installation procedure is exactly the same as that of GCC.  To replace the existing Python version on your system, use /usr in the prefix option.

 

1.4.3 PIL

 

Python Imaging Library (PIL; http://www.pythonware.com/products/pil) adds image processing capability to Python.  The latest PIL distribution, Imaging-1.1.4a2.tar.gz, can be downloaded from http://effbot.org/downloads.  PIL must be installed after GCC and Python.  Change your working directory to where Python is installed, e.g., /usr/local/Python-2.2.2.  Create a directory Extensions if one does not already exist.  In the directory Extensions, unpack the PIL distribution using the command:

 

tar xzvf [path/]Imaging-1.1.4a2.tar.gz

 

A new directory Imaging-1.1.4a2 will be created.  Move into Imaging-1.1.4a2/libImaging, and run the following configuration and make commands:

 

./configure

make

 

After these are done, move back to Imaging-1.1.4a2, and run:

 

python setup.py build

python setup.py install

 

Before running the final install command, make sure that the python command is the interpreter you intend to use.  PIL installed by one python command will not be available to other Python releases on the same machine.
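
Because each Python installation keeps its own set of extension packages, it can help to confirm which interpreter the python command starts, and whether that interpreter can already find a given package.  The sketch below uses standard-library calls from current Python 3 releases (importlib.util does not exist in Python 2.2); the function names and the module name PIL passed in the example are assumptions for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate a top-level module called name."""
    return importlib.util.find_spec(name) is not None


def describe_interpreter():
    """Return the path and version number of the interpreter running this script."""
    return sys.executable, sys.version.split()[0]


if __name__ == "__main__":
    path, version = describe_interpreter()
    print("python command resolves to:", path)
    print("version:", version)
    print("PIL importable:", module_available("PIL"))
```

Running the script with each python command found on the system shows which installation a subsequent python setup.py install would modify.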

 

On systems where the FreeType library is missing, setup may fail with an error.

 

1.4.4 Numeric/Numarray

 

The Numeric package provides multidimensional arrays.  It is in transition to a new-generation package, Numarray.  The latest distribution can be found at http://www.numpy.org.  Download Numeric-23.0.tar.gz and unpack it in a directory, say, /usr/local/Python-2.2.2/Extensions or simply /usr/local.  A new directory Numeric-23.0 will be created.  In this top-level directory, execute:

 

python setup.py install

 

Make sure that the python command is the version into which Numeric is intended to be installed.  A package installed into one Python version will not be available to other versions on the same system.

 

1.4.5 Pmw

 

Pmw, Python Megawidgets (http://pmw.sourceforge.net), is used to create GUI components.  Pmw.1.1.tar.gz can be obtained from SourceForge.  Unpacking this file in a directory, say, /usr/local/Python-2.2.2/Extensions or simply /usr/local, creates a new directory Pmw.  The parent directory of Pmw should be added to the environment variable PYTHONPATH before Pmw is used.  An alternative is to make a symbolic link to Pmw in /usr/local/rri/pub.
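
Entries in PYTHONPATH end up in the interpreter's module search path, sys.path, so a similar effect can be obtained from within a script.  The following minimal sketch adds the parent directory of Pmw to the search path at run time; the helper name add_to_module_path and the example directory /usr/local are assumptions for illustration.

```python
import os
import sys


def add_to_module_path(parent_dir):
    """Put parent_dir at the front of sys.path unless it is already listed."""
    parent_dir = os.path.abspath(parent_dir)
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    return parent_dir


# Example: Pmw was unpacked in /usr/local, so the package directory is /usr/local/Pmw.
added = add_to_module_path("/usr/local")
# After this call, "import Pmw" also searches /usr/local for the Pmw package.
```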

 

1.4.6 SWIG

 

SWIG, Simplified Wrapper and Interface Generator (http://www.swig.org), is used to wrap CCL into several modules of CPL.  Several SWIG shared libraries are required at runtime, even if the user does not rebuild CPL.  swig-1.3.19.tar.gz can be obtained from http://www.swig.org/download.html.  Installation of SWIG is identical to that of GCC.

 

1.4.7 FFTW

 

The Fastest Fourier Transform in the West (FFTW, http://fftw.org) package is used by CCL and, in turn, by CPL.  Its shared libraries are required at runtime.  fftw-2.1.3.tar.gz can be downloaded from http://fftw.org/download.html.  Its installation procedure is as described in 1.4.1 GCC.

 

1.4.8 TNT

 

Template Numerical Toolkit (TNT, http://math.nist.gov/tnt) is a library that contains only header files.  It is required only when CCL and CPL are built.  First, download tnt094.zip from http://math.nist.gov/tnt/download.html, then type the command

 

unzip [path/]tnt094.zip

 

in the directory where TNT will be installed.  /usr/local/include is recommended.  A subdirectory tnt will be created, which contains a set of header files.

 

1.4.9 LAPACK and BLAS

 

Linear Algebra PACKage (LAPACK, http://www.netlib.org/lapack) and Basic Linear Algebra Subprograms (BLAS, http://www.netlib.org/blas) are low-level libraries used by CCL and CPL.  RPM packages are available from http://www.netlib.org/lapack/rpms.  First, download the following RPM files:

 

lapack-3_0-2_src.rpm

lapack-3_0-2_i386.rpm

lapack-man-3_0-2_i386.rpm

blas-3_0-2_i386.rpm

blas-man-3_0-2_i386.rpm

 

then use the command rpm -i [--prefix=path] package.rpm to install each package in the directory path.  /usr/local is recommended.  By default, the packages are installed in /usr.

 

1.4.10 Gnuplot

 

Gnuplot (http://www.gnuplot.info) is currently optional.  Future releases of our GUI programs may use gnuplot.py as a graphics module.  Download and unpack gnuplot-3.7.3.tar.gz and gnuplot-py-1.6.tar.gz in the directory /usr/local.  To install gnuplot, follow the procedure described in 1.4.1.  To install gnuplot.py, move into the new directory gnuplot-py-1.6 and run the command

 

python setup.py install

 

1.4.11 SOLVE

 

 

 

1.4.12 CNS

 

 

 

1.4.13 CCP4

 

 

 

1.4.14 SHELX

 

 

 

1.5 Installation and Execution