Gromacs  2024.3
Selection compilation

The compiler takes the selection element tree from the selection parser (see Selection parsing) as input. The selection parser is largely independent of selection evaluation details, and the compiler processes the tree to conform to what the evaluation functions expect. For better control and more optimization opportunities, the compilation is done on all selections simultaneously. Hence, all the selections must be parsed before the compiler is called.

The compiler initializes all fields in gmx::SelectionTreeElement not initialized by the parser: gmx::SelectionTreeElement::v (some fields have already been initialized by the parser), gmx::SelectionTreeElement::evaluate, and gmx::SelectionTreeElement::u (again, some elements have been initialized in the parser). The gmx::SelectionTreeElement::cdata field is used during the compilation to store internal data, but the data is freed when the compiler returns.

In addition to initializing the elements, the compiler reorganizes the tree to simplify and optimize evaluation. The compiler also evaluates the static parts of the selection: at the end of the compilation, static parts have been replaced by the result of the evaluation.

The compiler is invoked using gmx::SelectionCompiler. The gmx::SelectionCompiler::compile() method does the compilation in several passes over the gmx::SelectionTreeElement tree.

  1. Defaults are set for the position type and flags of position calculation methods that were not explicitly specified in the user input.
  2. Subexpressions are extracted: a separate root is created for each subexpression, and placed before the point where the expression is first used. Currently, only variables and expressions used to evaluate parameter values are extracted, but common subexpressions could also be detected here.
  3. A second pass (in fact, multiple passes because of interdependencies) with simple reordering and initialization is done:
    1. Boolean expressions are combined such that one element can evaluate, e.g., "A and B and C". The subexpressions in a boolean expression are reordered such that static expressions come first, without otherwise altering the relative order of the expressions.
    2. The compiler data structure is allocated for each element, and the fields are initialized, with the exception of the contents of gmax and gmin fields. This is the part that needs multiple passes, because some flags are set recursively based on which elements refer to an element, and these flags need to be set to initialize other fields.
    3. The gmx::SelectionTreeElement::evaluate field is set to the correct evaluation function from evaluate.h.
  4. The evaluation function of all elements is replaced with the analyze_static() function to be able to initialize the element before the actual evaluation function is called. The evaluation machinery is then called to initialize the whole tree, while simultaneously evaluating the static expressions. During the evaluation, the compiler keeps track of the smallest and largest possible selections, and stores them in the internal compiler data structure for each element. To be able to do this for all possible values of dynamic expressions, special care needs to be taken with boolean expressions because they are short-circuiting. This is done through the SEL_CDATA_EVALMAX flag, which makes dynamic child expressions of BOOL_OR expressions evaluate to empty groups, while subexpressions of BOOL_AND are evaluated to the largest possible groups. Memory is also allocated to store the results of the evaluation. For each element, analyze_static() calls the actual evaluation function after the element has been properly initialized.
  5. Another evaluation pass is done over subexpressions with more than one reference to them. These cannot be completely processed during the first pass, because it is not known whether later references require additional evaluation of static expressions.
  6. Unused subexpressions are removed. For efficiency reasons (and to avoid some checks), this is actually done several times already earlier in the compilation process.
  7. Most of the processing is now done, and the next pass simply sets the evaluation group of root elements to the largest selection as determined in pass 4. For root elements of subexpressions that should not be evaluated before they are referred to, the evaluation group/function is cleared. At the same time, position calculation data is initialized for selection method elements that require it. Compiler data is also freed as it is no longer needed.
  8. A final pass initializes the total masses and charges in the gmx_ana_selection_t data structures.
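
The idea of tracking the smallest and largest possible selections (the gmin and gmax fields mentioned in pass 3) can be illustrated with a small sketch. This is not the GROMACS implementation and does not reproduce the SEL_CDATA_EVALMAX mechanism: the names Group, Bounds, staticBounds, dynamicBounds, boundsAnd, and boundsOr are hypothetical, atom groups are modelled as sorted index vectors, and the sketch assumes that a static expression has identical bounds while a dynamic one may evaluate to anything between the empty group and its evaluation group.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an atom group: sorted, unique atom indices.
using Group = std::vector<int>;

// Smallest (gmin) and largest (gmax) group an element can evaluate to.
struct Bounds
{
    Group gmin;
    Group gmax;
};

Group intersect(const Group& a, const Group& b)
{
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    Group out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

Group unite(const Group& a, const Group& b)
{
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    Group out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Static expression: the value is known, so both bounds are equal.
Bounds staticBounds(const Group& value)
{
    return { value, value };
}

// Dynamic expression (assumption): anything from nothing to the whole
// evaluation group may be selected.
Bounds dynamicBounds(const Group& evalGroup)
{
    return { Group{}, evalGroup };
}

// "a and b": both bounds are intersections of the operand bounds.
Bounds boundsAnd(const Bounds& a, const Bounds& b)
{
    return { intersect(a.gmin, b.gmin), intersect(a.gmax, b.gmax) };
}

// "a or b": both bounds are unions of the operand bounds.
Bounds boundsOr(const Bounds& a, const Bounds& b)
{
    return { unite(a.gmin, b.gmin), unite(a.gmax, b.gmax) };
}
```

In this model, combining a static group with a dynamic expression through "and" can never select more than the static group, which is the kind of information that later passes can use to limit the evaluation group.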

The actual evaluation of the selection is described in the documentation of the functions in evaluate.h.

Todo:
Some combinations of method parameter flags are not yet properly treated by the compiler or the evaluation functions in evaluate.cpp. All the ones used by currently implemented methods should work, but new combinations might not.

Element tree after compilation

After the compilation, the selection element tree is suitable for gmx_ana_selcollection_evaluate(). Enough memory has been allocated for gmx::SelectionTreeElement::v (and gmx::SelectionTreeElement::cgrp for SEL_SUBEXPR elements) to allow the selection to be evaluated without allocating any memory.
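
The point of allocating all memory during compilation can be sketched as follows. This is only an illustration under stated assumptions: ValueStorage, preallocate, and storeFrameResult are hypothetical names, and a std::vector stands in for the storage behind gmx::SelectionTreeElement::v.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the value storage of one tree element.
struct ValueStorage
{
    std::vector<int> atoms;
};

// Done once, at "compile time": reserve room for the largest possible result.
void preallocate(ValueStorage& v, std::size_t maxAtoms)
{
    v.atoms.reserve(maxAtoms);
}

// Done for every frame: overwrite the value without growing the buffer.
void storeFrameResult(ValueStorage& v, const std::vector<int>& selected)
{
    // The compile-time bound must cover every per-frame result.
    assert(selected.size() <= v.atoms.capacity());
    v.atoms.clear();
    v.atoms.insert(v.atoms.end(), selected.begin(), selected.end());
}
```

Because the per-frame result never exceeds the reserved capacity, the insertions in storeFrameResult() do not reallocate the buffer.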

Root elements

The top level of the tree consists of a chain of SEL_ROOT elements. These are used for two purposes:

  1. A selection that should be evaluated. These elements appear in the same order as the selections in the input. For these elements, gmx::SelectionTreeElement::v has been set to the maximum possible group that the selection can evaluate to (only for dynamic selections), and gmx::SelectionTreeElement::cgrp has been set to use a NULL group for evaluation.
  2. A subexpression that appears in one or more selections. Each selection that gives a value for a method parameter is a potential subexpression, as is any variable value. Only subexpressions that require evaluation for each frame are left after the selection is compiled. Each subexpression appears in the chain before any references to it. For these elements, gmx::SelectionTreeElement::cgrp has been set to the group that should be used to evaluate the subexpression. If gmx::SelectionTreeElement::cgrp is empty, the total evaluation group is not known in advance or it is more efficient to evaluate the subexpression only when it is referenced. If this is the case, gmx::SelectionTreeElement::evaluate is also NULL.
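
The two roles of root elements suggest a simple per-frame walk over the chain, sketched below. The Root type and evaluateChain() are hypothetical and much simpler than gmx::SelectionTreeElement; the sketch only assumes what is stated above, namely that subexpression roots precede their references and that a root without an evaluation function is evaluated only when it is referenced.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified root element.
struct Root
{
    std::string           name;
    std::function<void()> evaluate; // empty: evaluated only when referenced
    std::unique_ptr<Root> next;     // next root in the chain
};

// Walks the chain in order and evaluates every root that has an evaluation
// function. Returns the names of the roots that were evaluated.
std::vector<std::string> evaluateChain(const Root* first)
{
    assert(first != nullptr); // the chain holds at least one selection
    std::vector<std::string> evaluated;
    for (const Root* r = first; r != nullptr; r = r->next.get())
    {
        if (r->evaluate)
        {
            r->evaluate();
            evaluated.push_back(r->name);
        }
    }
    return evaluated;
}
```

Since each subexpression root comes before any reference to it, a single forward walk is enough for the eagerly evaluated roots; the skipped roots are left to whatever element refers to them.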

The children of the SEL_ROOT elements can be used to distinguish the two types of root elements from each other; the rules are the same as for the parsed tree (see Root elements). Subexpressions are treated as if they had been provided through variables.

Selection names are stored in the same way as after parsing (see Root elements).

Constant elements

All (sub)selections that do not require particle positions have been replaced with SEL_CONST elements. Constant elements from the parser are also retained if present in dynamic parts of the selections. Several constant elements with a NULL gmx::SelectionTreeElement::evaluate are left for debugging purposes; of these, only the ones for BOOL_OR expressions are used during evaluation.

The value is stored in gmx::SelectionTreeElement::v, and for group values with an evaluation function set, also in gmx::SelectionTreeElement::cgrp. For GROUP_VALUE elements, unnecessary atoms (i.e., atoms that could never be selected) have been removed from the value.

SEL_CONST elements have no children.

Method evaluation elements

All selection methods that need to be evaluated dynamically are described by a SEL_EXPRESSION element. The gmx::SelectionTreeElement::method and gmx::SelectionTreeElement::mdata fields have already been initialized by the parser, and the compiler only calls the initialization functions in the method data structure to do some additional initialization of these fields at appropriate points. If the gmx::SelectionTreeElement::pc data field has been created by the parser, the compiler initializes the data structure properly once the required positions are known. If the gmx::SelectionTreeElement::pc field is NULL after the parser, but the method provides only sel_updatefunc_pos(), an appropriate position calculation data structure is created. If gmx::SelectionTreeElement::pc is not NULL, gmx::SelectionTreeElement::pos is also initialized to hold the positions calculated.

Children of these elements are of type SEL_SUBEXPRREF, and describe parameter values that need to be evaluated for each frame. See the next section for more details. SEL_CONST children can also appear, and stand for parameters that get their value from a static expression. These elements are present only for debugging purposes: they always have a NULL evaluation function.

Subexpression elements

As described in Root elements, subexpressions are created for each variable and each expression that gives a value to a selection method parameter. As the only child of the SEL_ROOT element, these elements have a SEL_SUBEXPR element. The SEL_SUBEXPR element has a single child, which evaluates the actual expression. After compilation, only subexpressions that require particle positions for evaluation are left. For non-variable subexpressions, automatic names have been generated to help in debugging.

For SEL_SUBEXPR elements, memory has been allocated for gmx::SelectionTreeElement::cgrp to store the group for which the expression has been evaluated during the current frame. This is only done if full subexpression evaluation by _gmx_sel_evaluate_subexpr() is needed; the other evaluation functions do not require this memory.

SEL_SUBEXPRREF elements are used to describe references to subexpressions. They always have a single child, which is the SEL_SUBEXPR element being referenced.

If a subexpression is used only once, the evaluation has been optimized by setting the child of the SEL_SUBEXPR element to evaluate the value of SEL_SUBEXPRREF directly (in the case of memory pooling, this is managed by the evaluation functions). In such cases, the evaluation routines for the SEL_SUBEXPRREF and SEL_SUBEXPR elements only propagate some status information, but do not unnecessarily copy the values.

Boolean elements

SEL_BOOLEAN elements have been merged such that one element may carry out evaluation of more than one operation of the same type. The static parts of the expressions have been evaluated, and are placed in the first child. These are followed by the dynamic expressions, in the order provided by the user.
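
The ordering rule (static expressions first, relative order otherwise unchanged) matches what a stable partition does. The following sketch uses std::stable_partition on a hypothetical Child type; it illustrates the ordering only and is not taken from the GROMACS sources.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a child of a boolean element.
struct Child
{
    std::string name;
    bool        isStatic; // true if the value does not change between frames
};

// Moves static children to the front. std::stable_partition preserves the
// relative order within the static group and within the dynamic group.
void staticFirst(std::vector<Child>& children)
{
    auto isStatic = [](const Child& c) { return c.isStatic; };
    std::stable_partition(children.begin(), children.end(), isStatic);
    assert(std::is_partitioned(children.begin(), children.end(), isStatic));
}
```

For children given as A (dynamic), B (static), C (dynamic), D (static), this yields B, D, A, C: the dynamic expressions keep the order provided by the user.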

Arithmetic elements

Constant and static expressions in SEL_ARITHMETIC elements have been calculated. Currently, no other processing is done.
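
Pre-calculating constant parts of an arithmetic expression is a form of constant folding. The sketch below shows the general technique on a hypothetical miniature tree (Node, makeNode, makeBinary, and fold are illustrative names, not GROMACS types or functions), with only addition and multiplication supported.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical miniature of an arithmetic expression tree.
struct Node
{
    enum class Kind { Constant, Dynamic, Add, Mul };
    Kind                  kind  = Kind::Constant;
    double                value = 0.0; // meaningful only for Kind::Constant
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> makeNode(Node::Kind kind, double value = 0.0)
{
    auto n   = std::make_unique<Node>();
    n->kind  = kind;
    n->value = value;
    return n;
}

std::unique_ptr<Node> makeBinary(Node::Kind kind, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
{
    auto n   = makeNode(kind);
    n->left  = std::move(l);
    n->right = std::move(r);
    return n;
}

// Folds, bottom-up, every operator whose operands are both constants.
// Operators with a dynamic operand are left in place.
void fold(Node& n)
{
    if (n.kind != Node::Kind::Add && n.kind != Node::Kind::Mul)
    {
        return; // leaves have nothing to fold
    }
    assert(n.left && n.right);
    fold(*n.left);
    fold(*n.right);
    if (n.left->kind == Node::Kind::Constant && n.right->kind == Node::Kind::Constant)
    {
        n.value = (n.kind == Node::Kind::Add) ? n.left->value + n.right->value
                                              : n.left->value * n.right->value;
        n.kind = Node::Kind::Constant;
        n.left.reset();
        n.right.reset();
    }
}
```

For an expression such as "x + 2 * 3" with a dynamic x, folding replaces the product with the constant 6 and leaves the addition to be evaluated for each frame.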