Selection syntax and usage
==========================

Selections are used to select atoms/molecules/residues for analysis.
In contrast to traditional index files, selections can be dynamic, i.e.,
select different atoms for different trajectory frames. The GROMACS
manual contains a short introductory section to selections in the
Analysis chapter, including suggestions on how to get familiar with
selections if you are new to the concept. The subtopics listed below
provide more details on the technical and syntactic aspects of
selections.

Each analysis tool requires a different number of selections and the
selections are interpreted differently. The general idea is still the
same: each selection evaluates to a set of positions, where a position
can be an atom position or center-of-mass or center-of-geometry of a set
of atoms. The tool then uses these positions for its analysis to allow
very flexible processing. Some analysis tools may have limitations on
the types of selections allowed.

Specifying selections from command line
---------------------------------------

If no selections are provided on the command line, you are prompted to
type the selections interactively (a pipe can also be used to provide
the selections in this case for most tools). While this works well for
testing, it is easier to provide the selections from the command line if
they are complex or for scripting.

Each tool has different command-line arguments for specifying selections
(see the help for the individual tools). You can either pass a single
string containing all selections (separated by semicolons), or multiple
strings, each containing one selection. Note that you need to quote the
selections to protect them from the shell.

If you set a selection command-line argument, but do not provide any
selections, you are prompted to type the selections for that argument
interactively. This is useful if that selection argument is optional, in
which case it is not normally prompted for.
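
The two ways of passing selections described above (one string with
semicolons, or one string per selection) and the need for shell quoting
can be sketched in Python. This is only an illustration: the tool
``gmx select`` and its ``-select`` option are used as an assumed example,
the helper ``selection_args`` is hypothetical, and nothing is executed.

```python
import shlex


def selection_args(option, selections, single_string=False):
    """Return argv entries that pass `selections` to a selection option.

    `option` is the tool-specific selection argument (assumed here to be
    "-select"); this helper is hypothetical, not part of GROMACS.
    """
    if single_string:
        # One argument holding all selections, separated by semicolons.
        return [option, "; ".join(selections)]
    # One argument per selection.
    return [option, *selections]


selections = ["resname SOL and name OW", 'group "Protein"']

# Assumed tool and option, for illustration only.
argv_multi = ["gmx", "select", *selection_args("-select", selections)]
argv_single = ["gmx", "select",
               *selection_args("-select", selections, single_string=True)]

# shlex.join quotes every argument so a POSIX shell passes it unchanged.
print(shlex.join(argv_multi))
print(shlex.join(argv_single))
```

If the command is started from Python with an argument list (for example
``subprocess.run(argv_multi)``), no shell is involved and the quoting step
is not needed; it only matters when the line is typed or pasted into a
shell.
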

To provide selections from a file, use ``-sf file.dat`` in the place of
the selection for a selection argument (e.g., ``-select -sf file.dat``).
In general, the ``-sf`` argument reads selections from the provided file
and assigns them to selection arguments that have been specified up to
that point, but for which no selections have been provided. As a special
case, ``-sf`` provided on its own, without preceding selection
arguments, assigns the selections to all (yet unset) required selections
(i.e., those that would be prompted interactively if no selections are
provided on the command line).

To use groups from a traditional index file, use argument ``-n`` to
provide a file. See the "syntax" subtopic for how to use them. If this
option is not provided, default groups are generated. The default groups
are generated with the same logic as for non-selection tools.

Depending on the tool, two additional command-line arguments may be
available to control the behavior:

* ``-seltype`` can be used to specify the default type of positions to
  calculate for each selection.
* ``-selrpos`` can be used to specify the default type of positions used
  in selecting atoms by coordinates.

See the "positions" subtopic for more information on these options.

Selection syntax
----------------

A set of selections consists of one or more selections, separated by
semicolons. Each selection defines a set of positions for the analysis.
Each selection can also be preceded by a string that gives a name for
the selection for use in, e.g., graph legends. If no name is provided,
the string used for the selection is used automatically as the name.

For interactive input, the syntax is slightly altered: line breaks can
also be used to separate selections. ``\`` followed by a line break can
be used to continue a line if necessary. Notice that the above only
applies to real interactive input, not if you provide the selections,
e.g., from a pipe.

It is possible to use variables to store selection expressions.
A variable is defined with the following syntax::

    VARNAME = EXPR ;

where ``EXPR`` is any valid selection expression. After this,
``VARNAME`` can be used anywhere where ``EXPR`` would be valid.

Selections are composed of three main types of expressions, those that
define atoms (``ATOM_EXPR``), those that define positions
(``POS_EXPR``), and those that evaluate to numeric values
(``NUM_EXPR``). Each selection should be a ``POS_EXPR`` or an
``ATOM_EXPR`` (the latter is automatically converted to positions). The
basic rules are as follows:

* An expression like ``NUM_EXPR1 < NUM_EXPR2`` evaluates to an
  ``ATOM_EXPR`` that selects all the atoms for which the comparison is
  true.
* Atom expressions can be combined with boolean operations such as
  ``not ATOM_EXPR``, ``ATOM_EXPR and ATOM_EXPR``, or
  ``ATOM_EXPR or ATOM_EXPR``. Parentheses can be used to alter the
  evaluation order.
* ``ATOM_EXPR`` expressions can be converted into ``POS_EXPR``
  expressions in various ways, see the "positions" subtopic for more
  details.
* ``POS_EXPR`` can be converted into ``NUM_EXPR`` using syntax like
  "``x of POS_EXPR``". Currently, this is only supported for single
  positions like in expression "``x of cog of ATOM_EXPR``".

Some keywords select atoms based on string values such as the atom name.
For these keywords, it is possible to use wildcards (``name "C*"``) or
regular expressions (e.g., ``resname "R[AB]"``). The match type is
automatically guessed from the string: if it contains other characters
than letters, numbers, '*', or '?', it is interpreted as a regular
expression. To force the matching to use literal string matching, use
``name = "C*"`` to match a literal C*. To force another type of
matching, use '?' or '~' in place of '=' to force wildcard or regular
expression matching, respectively.

Strings that contain non-alphanumeric characters should be enclosed in
double quotes as in the examples.
For other strings, the quotes are optional, but if the value conflicts
with a reserved keyword, a syntax error will occur. If your strings
contain uppercase letters, this should not happen.

Index groups provided with the ``-n`` command-line option or generated
by default can be accessed with ``group NR`` or ``group NAME``, where
``NR`` is a zero-based index of the group and ``NAME`` is part of the
name of the desired group. The keyword ``group`` is optional if the
whole selection is provided from an index group. To see a list of
available groups in the interactive mode, press enter in the beginning
of a line.

Specifying positions in selections
----------------------------------

Possible ways of specifying positions in selections are:

1. A constant position can be defined as ``[XX, YY, ZZ]``, where ``XX``,
   ``YY`` and ``ZZ`` are real numbers.
2. ``com of ATOM_EXPR [pbc]`` or ``cog of ATOM_EXPR [pbc]`` calculate
   the center of mass/geometry of ``ATOM_EXPR``. If ``pbc`` is
   specified, the center is calculated iteratively to try to deal with
   cases where ``ATOM_EXPR`` wraps around periodic boundary conditions.
3. ``POSTYPE of ATOM_EXPR`` calculates the specified positions for the
   atoms in ``ATOM_EXPR``. ``POSTYPE`` can be ``atom``, ``res_com``,
   ``res_cog``, ``mol_com`` or ``mol_cog``, with an optional prefix
   ``whole_``, ``part_`` or ``dyn_``. ``whole_`` calculates the centers
   for the whole residue/molecule, even if only part of it is selected.
   ``part_`` prefix calculates the centers for the selected atoms, but
   always uses the same atoms for the same residue/molecule. The used
   atoms are determined from the largest group allowed by the selection.
   ``dyn_`` calculates the centers strictly only for the selected atoms.
   If no prefix is specified, whole selections default to ``part_`` and
   other places default to ``whole_``. The latter is often desirable to
   select the same molecules in different tools, while the former is a
   compromise between speed (``dyn_`` positions can be slower to
   evaluate than ``part_``) and intuitive behavior.
4. ``ATOM_EXPR``, when given for whole selections, is handled as 3.
   above, using the position type from the command-line argument
   ``-seltype``.

Selection keywords that select atoms based on their positions, such as
``dist from``, use by default the positions defined by the ``-selrpos``
command-line option. This can be overridden by prepending a ``POSTYPE``
specifier to the keyword. For example, ``res_com dist from POS``
evaluates the residue center of mass distances. In the example, all
atoms of a residue are either selected or not, based on the single
distance calculated.

Arithmetic expressions in selections
------------------------------------

Basic arithmetic evaluation is supported for numeric expressions.
Supported operations are addition, subtraction, negation,
multiplication, division, and exponentiation (using ^). The result of a
division by zero or other illegal operations is undefined.

Selection keywords
------------------

The following selection keywords are currently available. For keywords
marked with a plus, additional help is available through a subtopic
KEYWORD, where KEYWORD is the name of the keyword.

* Keywords that select atoms by an integer property:
  ::

    atomnr
    mol (synonym for molindex)
    molecule (synonym for molindex)
    molindex
    resid (synonym for resnr)
    residue (synonym for resindex)
    resindex
    resnr

  (use in expressions or like "atomnr 1 to 5 7 9")

* Keywords that select atoms by a numeric property:
  ::

    beta (synonym for betafactor)
    betafactor
    charge
    distance from POS [cutoff REAL]
    mass
    mindistance from POS_EXPR [cutoff REAL]
    occupancy
    x
    y
    z

  (use in expressions or like "occupancy 0.5 to 1")

* Keywords that select atoms by a string property:
  ::

    altloc
    atomname
    atomtype
    chain
    insertcode
    name (synonym for atomname)
    pdbatomname
    pdbname (synonym for pdbatomname)
    resname
    type (synonym for atomtype)

  (use like "name PATTERN [PATTERN] ...")

* Additional keywords that directly select atoms:
  ::

    all
    insolidangle center POS span POS_EXPR [cutoff REAL]
    none
    same KEYWORD as ATOM_EXPR
    within REAL of POS_EXPR

* Keywords that directly evaluate to positions:
  ::

    cog of ATOM_EXPR [pbc]
    com of ATOM_EXPR [pbc]

  (see also "positions" subtopic)

* Additional keywords:
  ::

    merge POSEXPR
    POSEXPR permute P1 ... PN
    plus POSEXPR

Selecting atoms by name - atomname, name, pdbatomname, pdbname
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    name
    pdbname
    atomname
    pdbatomname

These keywords select atoms by name. ``name`` selects atoms using the
GROMACS atom naming convention. For input formats other than PDB, the
atom names are matched exactly as they appear in the input file. For PDB
files, 4 character atom names that start with a digit are matched after
moving the digit to the end (e.g., to match 3HG2 from a PDB file, use
``name HG23``). ``pdbname`` can only be used with a PDB input file, and
selects atoms based on the exact name given in the input file, without
the transformation described above.

``atomname`` and ``pdbatomname`` are synonyms for the above two
keywords.
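
The renaming rule for PDB atom names described above can be written out
as a minimal Python sketch. The function name ``pdb_to_gromacs_name`` is
hypothetical and the code only mirrors the rule as stated in the text (a
4 character name that starts with a digit has the digit moved to the
end); it is not the GROMACS implementation.

```python
def pdb_to_gromacs_name(pdb_name: str) -> str:
    """Mirror the documented rule for 4 character PDB atom names.

    Hypothetical helper: names of exactly four characters that start
    with a digit get that digit moved to the end; all other names are
    returned unchanged.
    """
    if len(pdb_name) == 4 and pdb_name[0].isdigit():
        return pdb_name[1:] + pdb_name[0]
    return pdb_name


# The first column is what ``pdbname`` would match, the second what
# ``name`` would match for the same atom.
for pdb_name in ["3HG2", "1HG2", "CA", "OW"]:
    print(pdb_name, "->", pdb_to_gromacs_name(pdb_name))
```
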

Selecting based on distance - dist, distance, mindist, mindistance, within
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    distance from POS [cutoff REAL]
    mindistance from POS_EXPR [cutoff REAL]
    within REAL of POS_EXPR

``distance`` and ``mindistance`` calculate the distance from the given
position(s), the only difference being that ``distance`` only accepts a
single position, while any number of positions can be given for
``mindistance``, which then calculates the distance to the closest
position. ``within`` directly selects atoms that are within ``REAL`` of
``POS_EXPR``.

For the first two keywords, it is possible to specify a cutoff to speed
up the evaluation: all distances above the specified cutoff are returned
as equal to the cutoff.

Selecting atoms in a solid angle - insolidangle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    insolidangle center POS span POS_EXPR [cutoff REAL]

This keyword selects atoms that are within ``REAL`` degrees (default=5)
of any position in ``POS_EXPR`` as seen from ``POS`` (a position
expression that evaluates to a single position), i.e., atoms in the
solid angle spanned by the positions in ``POS_EXPR`` and centered at
``POS``.

Technically, the solid angle is constructed as a union of small cones
whose tip is at ``POS`` and whose axis goes through a point in
``POS_EXPR``. There is such a cone for each position in ``POS_EXPR``,
and a point is in the solid angle if it lies within any of these cones.
The cutoff determines the width of the cones.

Merging selections - merge, plus
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    POSEXPR merge POSEXPR [stride INT]
    POSEXPR merge POSEXPR [merge POSEXPR ...]
    POSEXPR plus POSEXPR [plus POSEXPR ...]

Basic selection keywords can only create selections where each atom
occurs at most once. The ``merge`` and ``plus`` selection keywords can
be used to work around this limitation.
Both create a selection that contains the positions from all the given
position expressions, even if they contain duplicates. The difference
between the two is that ``merge`` expects two or more selections with
the same number of positions, and the output contains the input
positions selected from each expression in turn, i.e., the output is
like A1 B1 A2 B2 and so on. It is also possible to merge selections of
unequal size as long as the size of the first is a multiple of the
second one. The ``stride`` parameter can be used to explicitly provide
this multiplicity. ``plus`` simply concatenates the positions after each
other, and can work also with selections of different sizes. These
keywords are valid only at the selection level, not in any
subexpressions.

Permuting selections - permute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    permute P1 ... PN

By default, all selections are evaluated such that the atom indices are
returned in ascending order. This can be changed by appending
``permute P1 P2 ... PN`` to an expression. The ``Pi`` should form a
permutation of the numbers 1 to N. This keyword permutes each N-position
block in the selection such that the i'th position in the block becomes
Pi'th. Note that it is the positions that are permuted, not individual
atoms. A fatal error occurs if the size of the selection is not a
multiple of N. It is only possible to permute the whole selection
expression, not any subexpressions, i.e., the ``permute`` keyword should
appear last in a selection.

Extending selections - same
^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    same KEYWORD as ATOM_EXPR

The keyword ``same`` can be used to select all atoms for which the given
``KEYWORD`` matches any of the atoms in ``ATOM_EXPR``. Keywords that
evaluate to integer or string values are supported.
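
The ordering produced by ``merge``, ``plus`` and ``permute`` can be
illustrated on plain Python lists of position labels. This is a sketch of
the semantics as worded in the sections above, not GROMACS code: the
function names are hypothetical, and the way ``merge`` interleaves inputs
of unequal size (``stride`` positions from the first input, then one from
the second) is an assumption based on the description of the ``stride``
parameter.

```python
from itertools import chain


def plus(*selections):
    """Concatenate position lists, keeping duplicates."""
    return list(chain.from_iterable(selections))


def merge(first, second, stride=None):
    """Interleave `stride` positions of `first` with one of `second`.

    Assumption: with equal sizes (stride 1) this gives A1 B1 A2 B2 ...
    """
    if stride is None:
        if not second or len(first) % len(second) != 0:
            raise ValueError("size of first must be a multiple of second")
        stride = len(first) // len(second)
    if len(first) != stride * len(second):
        raise ValueError("sizes do not match the given stride")
    merged = []
    for i, position in enumerate(second):
        merged.extend(first[i * stride:(i + 1) * stride])
        merged.append(position)
    return merged


def permute(positions, perm):
    """Move the i'th position of each N-block to slot perm[i] (1-based)."""
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        raise ValueError("perm must be a permutation of 1..N")
    if len(positions) % n != 0:
        raise ValueError("selection size is not a multiple of N")
    result = [None] * len(positions)
    for start in range(0, len(positions), n):
        for i, target in enumerate(perm):
            result[start + target - 1] = positions[start + i]
    return result


print(merge(["C1", "C2", "C3"], ["C2", "C3", "C4"]))
print(plus(["com1"], ["com2"]))
print(permute(["C1", "C2"], [2, 1]))
```
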

Selection evaluation and optimization
-------------------------------------

Boolean evaluation proceeds from left to right and is short-circuiting,
i.e., as soon as it is known whether an atom will be selected, the
remaining expressions are not evaluated at all. This can be used to
optimize the selections: you should write the most restrictive and/or
the most inexpensive expressions first in boolean expressions. The
relative ordering between dynamic and static expressions does not
matter: all static expressions are evaluated only once, before the first
frame, and the result becomes the leftmost expression.

Another point for optimization is in common subexpressions: they are not
automatically recognized, but can be manually optimized by the use of
variables. This can have a big impact on the performance of complex
selections, in particular if you define several index groups like this::

    rdist = distance from com of resnr 1 to 5;
    resname RES and rdist < 2;
    resname RES and rdist < 4;
    resname RES and rdist < 6;

Without the variable assignment, the distances would be evaluated three
times, although they are exactly the same within each selection.
Anything assigned into a variable becomes a common subexpression that is
evaluated only once during a frame. Currently, in some cases the use of
variables can actually lead to a small performance loss because of the
checks necessary to determine for which atoms the expression has already
been evaluated, but this should not be a major problem.

Selection limitations
---------------------

* Some analysis programs may require a special structure for the input
  selections (e.g., some options of ``gmx gangle`` require the index
  group to be made of groups of three or four atoms). For such programs,
  it is up to the user to provide a proper selection expression that
  always returns such positions.

* All selection keywords select atoms in increasing order, i.e., you can
  consider them as set operations that in the end return the atoms in
  sorted numerical order. For example, the following selections select
  the same atoms in the same order::

    resname RA RB RC
    resname RB RC RA

  ::

    atomnr 10 11 12 13
    atomnr 12 13 10 11
    atomnr 10 to 13
    atomnr 13 to 10

  If you need atoms/positions in a different order, you can:

  * use external index groups (for some static selections),
  * use the ``permute`` keyword to change the final order, or
  * use the ``merge`` or ``plus`` keywords to compose the final
    selection from multiple distinct selections.

* Due to technical reasons, having a negative value as the first value
  in expressions like ::

    charge -1 to -0.7

  results in a syntax error. A workaround is to write ::

    charge {-1 to -0.7}

  instead.

* When the ``name`` selection keyword is used together with PDB input
  files, the behavior may be unintuitive. When GROMACS reads in a PDB
  file, 4 character atom names that start with a digit are transformed
  such that, e.g., 1HG2 becomes HG21, and the latter is what is matched
  by the ``name`` keyword. Use ``pdbname`` to match the atom name as it
  appears in the input PDB file.

Selection examples
------------------

Below, examples of different types of selections are given.

* Selection of all water oxygens::

    resname SOL and name OW

* Centers of mass of residues 1 to 5 and 10::

    res_com of resnr 1 to 5 10

* All atoms farther than 1 nm from a fixed position::

    not within 1 of [1.2, 3.1, 2.4]

* All atoms of a residue LIG within 0.5 nm of a protein (with a custom
  name)::

    "Close to protein" resname LIG and within 0.5 of group "Protein"

* All protein residues that have at least one atom within 0.5 nm of a
  residue LIG::

    group "Protein" and same residue as within 0.5 of resname LIG

* All RES residues whose COM is between 2 and 4 nm from the COM of all
  of them::

    rdist = res_com distance from com of resname RES
    resname RES and rdist >= 2 and rdist <= 4

* Selection with duplicate atoms like C1 C2 C2 C3 C3 C4 ... C8 C9::

    name "C[1-8]" merge name "C[2-9]"

  This can be used with ``gmx distance`` to compute C1-C2, C2-C3 etc.
  distances.

* Selection with atoms in order C2 C1::

    name C1 C2 permute 2 1

  This can be used with ``gmx gangle`` to get C2->C1 vectors instead of
  C1->C2.

* Selection with COMs of two index groups::

    com of group 1 plus com of group 2

  This can be used with ``gmx distance`` to compute the distance between
  these two COMs.

* Fixed vector along x (can be used as a reference with
  ``gmx gangle``)::

    [0, 0, 0] plus [1, 0, 0]

* The following examples explain the difference between the various
  position types. This selection selects a position for each residue
  where any of the three atoms C[123] has ``x < 2``. The positions are
  computed as the COM of all three atoms. This is the default behavior
  if you just write ``res_com of``. ::

    part_res_com of name C1 C2 C3 and x < 2

  This selection does the same, but the positions are computed as COM
  positions of whole residues::

    whole_res_com of name C1 C2 C3 and x < 2

  Finally, this selection selects the same residues, but the positions
  are computed as COM of exactly those atoms that match the ``x < 2``
  criterion::

    dyn_res_com of name C1 C2 C3 and x < 2

* Without the ``of`` keyword, the default behavior is different from
  above, but otherwise the rules are the same::

    name C1 C2 C3 and res_com x < 2

  works as if ``whole_res_com`` was specified, and selects the three
  atoms from residues whose COM satisfies ``x < 2``. Using ::

    name C1 C2 C3 and part_res_com x < 2

  instead selects residues based on the COM computed from the C[123]
  atoms.
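
The difference between the ``whole_``, ``part_`` and ``dyn_`` prefixes in
the position-type examples above can be made concrete with a small
numeric sketch in Python. Everything here is illustrative: the single
residue, its masses and its x coordinates are made-up values, only the x
coordinate is tracked, and treating the atoms matched by
``name C1 C2 C3`` as "the largest group allowed by the selection" is an
assumption that follows the wording of the first example.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    x: float     # x coordinate in nm (made-up value)
    mass: float  # mass in amu (made-up value)


def com_x(atoms):
    """Mass-weighted mean of the x coordinates."""
    total_mass = sum(atom.mass for atom in atoms)
    return sum(atom.mass * atom.x for atom in atoms) / total_mass


# One hypothetical residue with three carbons and one oxygen.
residue = [
    Atom("C1", 1.0, 12.0),
    Atom("C2", 1.5, 12.0),
    Atom("C3", 2.5, 12.0),
    Atom("O1", 4.0, 16.0),
]

# Atoms the selection could ever pick: name C1 C2 C3 (frame independent).
allowed = [atom for atom in residue if atom.name in {"C1", "C2", "C3"}]
# Atoms actually selected in this frame: name C1 C2 C3 and x < 2.
selected = [atom for atom in allowed if atom.x < 2]

whole_res_com = com_x(residue)   # all atoms of the residue
part_res_com = com_x(allowed)    # same atoms in every frame
dyn_res_com = com_x(selected)    # only the currently selected atoms

print(f"whole_: {whole_res_com:.3f}")
print(f"part_:  {part_res_com:.3f}")
print(f"dyn_:   {dyn_res_com:.3f}")
```

In this made-up frame C3 fails ``x < 2``, so the ``dyn_`` center moves
with the set of selected atoms, while the ``part_`` and ``whole_``
centers depend only on atom sets that stay fixed between frames.
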