Gromacs  2022.2
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Example code for writing trajectory analysis tools

GROMACS installation includes a template for writing trajectory analysis tools using Framework for trajectory analysis. It can be found from share/gromacs/template/ under the installation directory, and from share/template/ in the source distribution.

The full source code for the file is also included in this documentation: template.cpp The rest of this page walks through the code to explain the different parts.

Global definitions

We start by including some generic C++ headers:

#include <string>
#include <vector>
and continue by including the header for the analysis library: This header includes other headers that together define all the basic data types needed for writing trajectory analysis tools. For convenience, we also import all names from the gmx namespace into the global scope to avoid repeating the name everywhere:
using namespace gmx;

Tool module class declaration

We then define a class that implements our analysis tool:

class AnalysisTemplate : public TrajectoryAnalysisModule
{
public:
AnalysisTemplate();
virtual void initOptions(IOptionsContainer *options,
TrajectoryAnalysisSettings *settings);
virtual void initAnalysis(const TrajectoryAnalysisSettings &settings,
const TopologyInformation &top);
virtual void analyzeFrame(int frnr, const t_trxframe &fr, t_pbc *pbc,
TrajectoryAnalysisModuleData *pdata);
virtual void finishAnalysis(int nframes);
virtual void writeOutput();
private:
class ModuleData;
std::string fnDist_;
double cutoff_;
Selection refsel_;
AnalysisNeighborhood nb_;
AnalysisData data_;
};
The analysis tool class inherits from gmx::TrajectoryAnalysisModule, which is an interface with a few convenience functions for easier interfacing with other code. Below, we walk through the different methods as implemented in the template (note that the template does not implement some of the virtual methods because they are less often needed), discussing some issues that can arise in more complex cases. See documentation of gmx::TrajectoryAnalysisModule for a full description of the available virtual methods and convenience functions. The first block of member variables are used to contain values provided to the different options. They will vary depending on the needs of the analysis tool. The AnalysisNeighborhood object provides neighborhood searching that is used in the analysis. The final block of variables are used to process output data. See initAnalysis() for details on how they are used.

For the template, we do not need any custom frame-local data. If you think you need some for more complex analysis needs, see documentation of gmx::TrajectoryAnalysisModuleData for more details. If you do not care about parallelization, you do not need to consider this part. You can simply declare all variables in the module class itself, initialize them in gmx::TrajectoryAnalysisModule::initAnalysis(), and do any postprocessing in gmx::TrajectoryAnalysisModule::finishAnalysis()).

Construction

The constructor (and possible destructor) of the analysis module should be simple: the constructor should just initialize default values, and the destructor should free any memory managed by the module. For the template, we have no attributes in our class that need to be explicitly freed, so we declare only a constructor:

AnalysisTemplate::AnalysisTemplate()
: cutoff_(0.0)
{
registerAnalysisDataset(&data_, "avedist");
}

Input options

Initialization of the module is split into a few methods, two of which are used in the template. gmx::TrajectoryAnalysisModule::initOptions() is used to set up options understood by the module, as well as for setting up different options through gmx::TrajectoryAnalysisSettings (see the documentation of that class for more details):

void
AnalysisTemplate::initOptions(IOptionsContainer *options,
TrajectoryAnalysisSettings *settings)
{
static const char *const desc[] = {
"This is a template for writing your own analysis tools for",
"GROMACS. The advantage of using GROMACS for this is that you",
"have access to all information in the topology, and your",
"program will be able to handle all types of coordinates and",
"trajectory files supported by GROMACS. In addition,",
"you get a lot of functionality for free from the trajectory",
"analysis library, including support for flexible dynamic",
"selections. Go ahead an try it![PAR]",
"To get started with implementing your own analysis program,",
"follow the instructions in the README file provided.",
"This template implements a simple analysis programs that calculates",
"average distances from a reference group to one or more",
"analysis groups."
};
settings->setHelpText(desc);
options->addOption(FileNameOption("o")
.filetype(OptionFileType::Plot).outputFile()
.store(&fnDist_).defaultBasename("avedist")
.description("Average distances from reference group"));
options->addOption(SelectionOption("reference")
.store(&refsel_).required()
.description("Reference group to calculate distances from"));
options->addOption(SelectionOption("select")
.storeVector(&sel_).required().multiValue()
.description("Groups to calculate distances to"));
options->addOption(DoubleOption("cutoff").store(&cutoff_)
.description("Cutoff for distance calculation (0 = no cutoff)"));
settings->setFlag(TrajectoryAnalysisSettings::efRequireTop);
}
For the template, we first set a description text for the tool (used for help text). Then we declare an option to specify the output file name, followed by options that are used to set selections, and finally an option to set a cutoff value. For the cutoff, the default value will be the one that was set in the constructor, but it would also be possible to explicitly set it here. The values provided by the user for the options will be stored in member variables. Finally, we indicate that the tool always requires topology information. This is done for demonstration purposes only; the code in the template works even without a topology.

For additional documentation on how to define different kinds of options, see gmx::IOptionsContainer, basicoptions.h, and gmx::SelectionOption. You only need to define options that are specific to the analysis; common options, e.g., for specifying input topology and trajectories are added by the framework.

To adjust settings or selection options (e.g., the number of accepted selections) based on option values, you need to override gmx::TrajectoryAnalysisModule::optionsFinished(). For simplicity, this is not done in the template.

Analysis initialization

The actual analysis is initialized in gmx::TrajectoryAnalysisModule::initAnalysis():

void
AnalysisTemplate::initAnalysis(const TrajectoryAnalysisSettings &settings,
const TopologyInformation & /*top*/)
{
nb_.setCutoff(static_cast<real>(cutoff_));
data_.setColumnCount(0, sel_.size());
avem_.reset(new AnalysisDataAverageModule());
data_.addModule(avem_);
if (!fnDist_.empty())
{
new AnalysisDataPlotModule(settings.plotSettings()));
plotm->setFileName(fnDist_);
plotm->setTitle("Average distance");
plotm->setXAxisIsTime();
plotm->setYLabel("Distance (nm)");
data_.addModule(plotm);
}
}
Information about the topology is passed as a parameter. The settings object can also be used to access information about user input.

One of the main tasks of this method is to set up appropriate gmx::AnalysisData objects and modules for them (see gmx::TrajectoryAnalysisModule for the general approach). These objects will be used to process output from the tool. Their main purpose is to support parallelization, but even if you don't care about parallelism, they still provide convenient building blocks, e.g., for histogramming and file output.

For the template, we first set the cutoff for the neighborhood search.

Then, we create and register one gmx::AnalysisData object that will contain, for each frame, one column for each input selection. This will contain the main output from the tool: minimum distance between the reference selection and that particular selection. We then create and setup a module that will compute the average distance for each selection (see writeOutput() for how it is used). Finally, if an output file has been provided, we create and setup a module that will plot the per-frame distances to a file.

If the analysis module needs some temporary storage during processing of a frame (i.e., it uses a custom class derived from gmx::TrajectoryAnalysisModuleData), this should be allocated in gmx::TrajectoryAnalysisModule::startFrames() (see below) if parallelization is to be supported.

If you need to do initialization based on data from the first frame (most commonly, based on the box size), you need to override gmx::TrajectoryAnalysisModule::initAfterFirstFrame(), but this is not used in the template.

Analyzing the frames

There is one more initialization method that needs to be overridden to support automatic parallelization: gmx::TrajectoryAnalysisModule::startFrames(). If you do not need custom frame-local data (or parallelization at all), you can skip this method and ignore the last parameter to gmx::TrajectoryAnalysisModule::analyzeFrame() to make things simpler. In the template, this method is not necessary.

The main part of the analysis is (in most analysis codes) done in the gmx::TrajectoryAnalysisModule::analyzeFrame() method, which is called once for each frame:

void
AnalysisTemplate::analyzeFrame(int frnr, const t_trxframe &fr, t_pbc *pbc,
TrajectoryAnalysisModuleData *pdata)
{
The frnr parameter gives a zero-based index of the current frame (mostly for use with gmx::AnalysisData), pbc contains the PBC information for the current frame for distance calculations with, e.g., pbc_dx(), and pdata points to a data structure created in startFrames(). Although usually not necessary (except for the time field), raw frame data can be accessed through fr. In most cases, the analysis should be written such that it gets all position data through selections, and does not assume a constant size for them. This is all that is required to support the full flexibility of the selection engine.

For the template, we first get data from our custom data structure for shorthand access (if you use a custom data object, you need a static_cast here):

AnalysisDataHandle dh = pdata->dataHandle(data_);
const Selection &refsel = pdata->parallelSelection(refsel_);

We then do a simple calculation and use the AnalysisDataHandle class to set the per-frame output for the tool:

AnalysisNeighborhoodSearch nbsearch = nb_.initSearch(pbc, refsel);
dh.startFrame(frnr, fr.time);
for (size_t g = 0; g < sel_.size(); ++g)
{
const Selection &sel = pdata->parallelSelection(sel_[g]);
int nr = sel.posCount();
real frave = 0.0;
for (int i = 0; i < nr; ++i)
{
SelectionPosition p = sel.position(i);
frave += nbsearch.minimumDistance(p.x());
}
frave /= nr;
dh.setPoint(g, frave);
}
dh.finishFrame();

After all the frames have been processed, gmx::TrajectoryAnalysisModule::finishAnalysis() is called once. This is the place to do any custom postprocessing of the data. For the template, we do nothing, because all necessary processing is done in the data modules:

void
AnalysisTemplate::finishAnalysis(int /*nframes*/)
{
}

If the data structure created in gmx::TrajectoryAnalysisModule::startFrames() is used to aggregate data across frames, you need to override gmx::TrajectoryAnalysisModule::finishFrames() to combine the data from the data structures (see documentation of the method for details). This is not necessary for the template, because the ModuleData structure only contains data used during the analysis of a single frame.

Output

Finally, most programs need to write out some values after the analysis is complete. In some cases, this can be achieved with proper chaining of data modules, but often it is necessary to do some custom processing. All such activities should be done in gmx::TrajectoryAnalysisModule::writeOutput(). This makes it easier to reuse analysis modules in, e.g., scripting languages, where output into files might not be desired. The template simply prints out the average distances for each analysis group:

void
AnalysisTemplate::writeOutput()
{
// We print out the average of the mean distances for each group.
for (size_t g = 0; g < sel_.size(); ++g)
{
fprintf(stderr, "Average mean distance for '%s': %.3f nm\n",
sel_[g].name(), avem_->average(0, g));
}
}
Here, we use the avem_ module, which we initialized in initAnalysis() to aggregate the averages of the computed distances.

Definition of main()

Now, the only thing remaining is to define the main() function. To implement a command-line tool, it should create a module and run it using gmx::TrajectoryAnalysisCommandLineRunner using the boilerplate code below:

int
main(int argc, char *argv[])
{
return gmx::TrajectoryAnalysisCommandLineRunner::runAsMain<AnalysisTemplate>(argc, argv);
}

Tools within GROMACS

Analysis tools implemented using the template can also be easily included into the GROMACS library. To do this, follow these steps:

  1. Put your tool source code into src/gromacs/trajectoryanalysis/modules/.
  2. Remove using namespace gmx; and enclose all the code into gmx::analysismodules namespace, and the tool class into an unnamed namespace within this.
  3. Create a header file corresponding to your tool and add the following class into it withing gmx::analysismodules namespace (replace Template with the name of your tool):
    class TemplateInfo
    {
    public:
    static const char name[];
    static const char shortDescription[];
    };
  4. Add definition for these items in the source file, outside the unnamed namespace (replace Template, AnalysisTemplate and the strings with correct values):
    const char TemplateInfo::name[] = "template";
    const char TemplateInfo::shortDescription[] =
    "Compute something";
    TrajectoryAnalysisModulePointer TemplateInfo::create()
    {
    return TrajectoryAnalysisModulePointer(new AnalysisTemplate);
    }
  5. Register your module in src/gromacs/trajectoryanalysis/modules.cpp.
  6. Done. Your tool can now be invoked as gmx template, using the string you specified as the name.

See existing tools within the src/gromacs/trajectoryanalysis/modules/ for concrete examples and preferred layout of the files. Please document yourself as the author of the files, using Doxygen comments like in the existing files.