Selections are used to select atoms/molecules/residues for analysis. In contrast to traditional index files, selections can be dynamic, i.e., select different atoms for different trajectory frames. The GROMACS manual contains a short introductory section to selections in the Analysis chapter, including suggestions on how to get familiar with selections if you are new to the concept. The subtopics listed below provide more details on the technical and syntactic aspects of selections.
Each analysis tool requires a different number of selections and the selections are interpreted differently. The general idea is still the same: each selection evaluates to a set of positions, where a position can be an atom position or center-of-mass or center-of-geometry of a set of atoms. The tool then uses these positions for its analysis to allow very flexible processing. Some analysis tools may have limitations on the types of selections allowed.
If no selections are provided on the command line, you are prompted to type the selections interactively (a pipe can also be used to provide the selections in this case for most tools). While this works well for testing, it is easier to provide the selections from the command line if they are complex or for scripting.
Each tool has different command-line arguments for specifying selections (see the help for the individual tools). You can either pass a single string containing all selections (separated by semicolons), or multiple strings, each containing one selection. Note that you need to quote the selections to protect them from the shell.
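As a minimal sketch of the two ways of passing selections, the following Python snippet builds both argument lists and prints them with shell quoting. The tool name gmx select and its -select argument are used only as an illustration; check the help of your tool for its actual selection arguments. The selection strings themselves are arbitrary examples.

```python
import shlex

selections = ["resname SOL and name OW", "resname LIG"]

# Variant 1: a single string with all selections separated by semicolons.
single = ["gmx", "select", "-select", "; ".join(selections)]

# Variant 2: multiple strings, each containing one selection.
multiple = ["gmx", "select", "-select", *selections]

# shlex.join adds the quoting needed when the command is typed in a shell.
print(shlex.join(single))
print(shlex.join(multiple))
```

Passing such a list directly to a process launcher avoids the shell entirely; the quoting only matters when the command line is typed or stored in a shell script.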
If you set a selection command-line argument, but do not provide any selections, you are prompted to type the selections for that argument interactively. This is useful if that selection argument is optional, in which case it is not normally prompted for.
To provide selections from a file, use -sf file.dat in the place of the selection for a selection argument (e.g., -select -sf file.dat). In general, the -sf argument reads selections from the provided file and assigns them to selection arguments that have been specified up to that point, but for which no selections have been provided. As a special case, -sf provided on its own, without preceding selection arguments, assigns the selections to all (yet unset) required selections (i.e., those that would be prompted interactively if no selections are provided on the command line).
To use groups from a traditional index file, use argument -n to provide a file. See the “syntax” subtopic for how to use them. If this option is not provided, default groups are generated. The default groups are generated with the same logic as for non-selection tools.
Depending on the tool, two additional command-line arguments may be available to control the behavior:
-seltype can be used to specify the default type of positions to calculate for each selection.
-selrpos can be used to specify the default type of positions used in selecting atoms by coordinates.
See the “positions” subtopic for more information on these options.
A set of selections consists of one or more selections, separated by semicolons. Each selection defines a set of positions for the analysis. Each selection can also be preceded by a string that gives a name for the selection for use in, e.g., graph legends. If no name is provided, the string used for the selection is used automatically as the name.
For interactive input, the syntax is slightly altered: line breaks can also be used to separate selections. A backslash (\) followed by a line break can be used to continue a line if necessary. Notice that the above only applies to real interactive input, not if you provide the selections, e.g., from a pipe.
It is possible to use variables to store selection expressions. A variable is defined with the following syntax:
VARNAME = EXPR ;
where EXPR is any valid selection expression. After this, VARNAME can be used anywhere where EXPR would be valid.
Selections are composed of three main types of expressions: those that define atoms (ATOM_EXPR), those that define positions (POS_EXPR), and those that evaluate to numeric values (NUM_EXPR). Each selection should be a POS_EXPR or an ATOM_EXPR (the latter is automatically converted to positions). The basic rules are as follows:
Some keywords select atoms based on string values such as the atom name. For these keywords, it is possible to use wildcards (name "C*") or regular expressions (e.g., resname "R[AB]"). The match type is automatically guessed from the string: if it contains characters other than letters, numbers, ‘*’, or ‘?’, it is interpreted as a regular expression. To force the matching to use literal string matching, use name = "C*" to match a literal C*. To force another type of matching, use ‘?’ or ‘~’ in place of ‘=’ to force wildcard or regular expression matching, respectively.
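The guessing rule above can be sketched in Python as follows. This is only an illustration of the described heuristic, not the actual implementation: the function names are hypothetical, and it is an assumption that a regular expression has to match the whole string and that a pattern without ‘*’ or ‘?’ is compared literally.

```python
import fnmatch
import re


def guess_match_type(pattern: str) -> str:
    """Guess the match type from the characters in the pattern."""
    # Anything other than letters, numbers, '*' or '?' means a regular expression.
    if re.search(r"[^A-Za-z0-9*?]", pattern):
        return "regex"
    if "*" in pattern or "?" in pattern:
        return "wildcard"
    return "literal"


def matches(pattern: str, value: str) -> bool:
    """Match value against pattern using the guessed match type."""
    kind = guess_match_type(pattern)
    if kind == "regex":
        # Assumption: the whole string has to match the expression.
        return re.fullmatch(pattern, value) is not None
    if kind == "wildcard":
        return fnmatch.fnmatchcase(value, pattern)
    return value == pattern
```

With this sketch, "C*" is treated as a wildcard that matches CA, while "R[AB]" is treated as a regular expression that matches RA and RB but not RC.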
Strings that contain non-alphanumeric characters should be enclosed in double quotes as in the examples. For other strings, the quotes are optional, but if the value conflicts with a reserved keyword, a syntax error will occur. If your strings contain uppercase letters, this should not happen.
Index groups provided with the -n command-line option or generated by default can be accessed with group NR or group NAME, where NR is a zero-based index of the group and NAME is part of the name of the desired group. The keyword group is optional if the whole selection is provided from an index group. To see a list of available groups in the interactive mode, press enter at the beginning of a line.
Possible ways of specifying positions in selections are:
Selection keywords that select atoms based on their positions, such as dist from, use by default the positions defined by the -selrpos command-line option. This can be overridden by prepending a POSTYPE specifier to the keyword. For example, res_com dist from POS evaluates the residue center of mass distances. In the example, all atoms of a residue are either selected or not, based on the single distance calculated.
Basic arithmetic evaluation is supported for numeric expressions. Supported operations are addition, subtraction, negation, multiplication, division, and exponentiation (using ^). The result of a division by zero or other illegal operations is undefined.
The following selection keywords are currently available. For keywords marked with a plus, additional help is available through a subtopic KEYWORD, where KEYWORD is the name of the keyword.
Keywords that select atoms by an integer property:
atomnr
mol (synonym for molindex)
molecule (synonym for molindex)
molindex
resid (synonym for resnr)
residue (synonym for resindex)
resindex
resnr
(use in expressions or like “atomnr 1 to 5 7 9”)
Keywords that select atoms by a numeric property:
beta (synonym for betafactor)
betafactor
charge
distance from POS [cutoff REAL]
mass
mindistance from POS_EXPR [cutoff REAL]
occupancy
x
y
z
(use in expressions or like “occupancy 0.5 to 1”)
Keywords that select atoms by a string property:
altloc
atomname
atomtype
chain
insertcode
name (synonym for atomname)
pdbatomname
pdbname (synonym for pdbatomname)
resname
type (synonym for atomtype)
(use like “name PATTERN [PATTERN] ...”)
Additional keywords that directly select atoms:
all
insolidangle center POS span POS_EXPR [cutoff REAL]
none
same KEYWORD as ATOM_EXPR
within REAL of POS_EXPR
Keywords that directly evaluate to positions:
cog of ATOM_EXPR [pbc]
com of ATOM_EXPR [pbc]
(see also “positions” subtopic)
Additional keywords:
merge POSEXPR
POSEXPR permute P1 ... PN
plus POSEXPR
name
pdbname
atomname
pdbatomname
These keywords select atoms by name. name selects atoms using the GROMACS atom naming convention. For input formats other than PDB, the atom names are matched exactly as they appear in the input file. For PDB files, 4 character atom names that start with a digit are matched after moving the digit to the end (e.g., to match 3HG2 from a PDB file, use name HG23). pdbname can only be used with a PDB input file, and selects atoms based on the exact name given in the input file, without the transformation described above.
atomname and pdbatomname are synonyms for the above two keywords.
distance from POS [cutoff REAL]
mindistance from POS_EXPR [cutoff REAL]
within REAL of POS_EXPR
distance and mindistance calculate the distance from the given position(s). The only difference is that distance accepts a single position, while any number of positions can be given for mindistance, which then calculates the distance to the closest position. within directly selects atoms that are within REAL of POS_EXPR.
For the first two keywords, it is possible to specify a cutoff to speed up the evaluation: all distances above the specified cutoff are returned as equal to the cutoff.
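The semantics of the three keywords and of the cutoff can be sketched in Python as below. This is a simplified illustration under stated assumptions: periodic boundary conditions are ignored, positions are plain coordinate tuples, and treating the within radius as inclusive is an assumption.

```python
import math


def distance(atom, pos, cutoff=None):
    """Distance from a single position; values above the cutoff are clamped."""
    d = math.dist(atom, pos)
    return d if cutoff is None else min(d, cutoff)


def mindistance(atom, positions, cutoff=None):
    """Distance to the closest of several positions, clamped to the cutoff."""
    d = min(math.dist(atom, p) for p in positions)
    return d if cutoff is None else min(d, cutoff)


def within(atoms, radius, positions):
    """Indices of atoms within radius of any position (inclusive: an assumption)."""
    return [i for i, a in enumerate(atoms) if mindistance(a, positions) <= radius]
```

Because all distances above the cutoff are reported as the cutoff itself, a cutoff is only safe when the expression compares the distance against values below it.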
insolidangle center POS span POS_EXPR [cutoff REAL]
This keyword selects atoms that are within REAL degrees (default=5) of any position in POS_EXPR as seen from POS (a position expression that evaluates to a single position), i.e., atoms in the solid angle spanned by the positions in POS_EXPR and centered at POS.
Technically, the solid angle is constructed as a union of small cones whose tip is at POS and whose axis goes through a point in POS_EXPR. There is such a cone for each position in POS_EXPR, and a point is in the solid angle if it lies within any of these cones. The cutoff determines the width of the cones.
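The union-of-cones construction can be sketched directly in Python. This is a naive illustration of the geometric definition, not the actual algorithm; the function names are hypothetical, periodic boundary conditions are ignored, and all points are assumed to differ from the center.

```python
import math


def angle_deg(u, v):
    """Angle between two non-zero vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def in_solid_angle(point, center, span, cutoff=5.0):
    """True if point lies in any cone with tip at center and axis through a span position."""
    to_point = [p - c for p, c in zip(point, center)]
    for s in span:
        axis = [a - c for a, c in zip(s, center)]
        if angle_deg(to_point, axis) <= cutoff:
            return True
    return False
```

A point exactly on a cone axis is always inside, and adding more positions to the span can only enlarge the selected region.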
POSEXPR merge POSEXPR [stride INT]
POSEXPR merge POSEXPR [merge POSEXPR ...]
POSEXPR plus POSEXPR [plus POSEXPR ...]
Basic selection keywords can only create selections where each atom occurs at most once. The merge and plus selection keywords can be used to work around this limitation. Both create a selection that contains the positions from all the given position expressions, even if they contain duplicates. The difference between the two is that merge expects two or more selections with the same number of positions, and the output contains the input positions selected from each expression in turn, i.e., the output is like A1 B1 A2 B2 and so on. It is also possible to merge selections of unequal size as long as the size of the first is a multiple of the second one. The stride parameter can be used to explicitly provide this multiplicity. plus simply concatenates the positions after each other, and can work also with selections of different sizes. These keywords are valid only at the selection level, not in any subexpressions.
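The difference between the two keywords can be sketched on plain Python lists. The function names mirror the keywords but are only illustrative, and the output order for a stride larger than one (stride positions from the first selection followed by one from the second) is an assumption.

```python
def plus(*selections):
    """Concatenate the positions of all selections, keeping duplicates."""
    out = []
    for sel in selections:
        out.extend(sel)
    return out


def merge(first, second, stride=None):
    """Interleave two selections: stride positions of first, then one of second."""
    if stride is None:
        if not second or len(first) % len(second) != 0:
            raise ValueError("size of first must be a multiple of second")
        stride = len(first) // len(second)
    if len(first) != stride * len(second):
        raise ValueError("sizes do not match the given stride")
    out = []
    for i, pos in enumerate(second):
        out.extend(first[stride * i : stride * (i + 1)])
        out.append(pos)
    return out
```

For two selections of equal size, merge produces A1 B1 A2 B2 and so on, while plus produces A1 A2 ... B1 B2 ....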
permute P1 ... PN
By default, all selections are evaluated such that the atom indices are returned in ascending order. This can be changed by appending permute P1 P2 ... PN to an expression. The Pi should form a permutation of the numbers 1 to N. This keyword permutes each N-position block in the selection such that the i’th position in the block becomes Pi’th. Note that it is the positions that are permuted, not individual atoms. A fatal error occurs if the size of the selection is not a multiple of N. It is only possible to permute the whole selection expression, not any subexpressions, i.e., the permute keyword should appear last in a selection.
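The block-wise permutation can be sketched in Python as follows. The function is a hypothetical illustration that operates on a plain list of positions and raises an exception where the tool would report a fatal error.

```python
def permute(positions, perm):
    """Permute each N-position block so that the i'th position becomes the perm[i]'th."""
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        raise ValueError("perm must be a permutation of 1..N")
    if len(positions) % n != 0:
        raise ValueError("selection size is not a multiple of N")
    out = [None] * len(positions)
    for start in range(0, len(positions), n):
        for i, p in enumerate(perm):
            # The i'th position of the block becomes the p'th (1-based).
            out[start + p - 1] = positions[start + i]
    return out
```

For example, permuting the positions C1 C2 with 2 1 gives C2 C1, and the same swap is applied to every following pair in the selection.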
same KEYWORD as ATOM_EXPR
The keyword same can be used to select all atoms for which the given KEYWORD matches any of the atoms in ATOM_EXPR. Keywords that evaluate to integer or string values are supported.
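As a rough sketch, same can be thought of as collecting the keyword values of the atoms in ATOM_EXPR and then selecting every atom with one of those values. The data layout below (a list of dictionaries) and the function name are assumptions made only for the illustration.

```python
def same(keyword, atoms, subset):
    """Indices of all atoms whose keyword value matches any atom in subset."""
    values = {atoms[i][keyword] for i in subset}
    return [i for i, atom in enumerate(atoms) if atom[keyword] in values]


atoms = [
    {"name": "N", "resnr": 1},
    {"name": "CA", "resnr": 1},
    {"name": "N", "resnr": 2},
    {"name": "CA", "resnr": 2},
]

# Selecting one atom of residue 1 expands to all atoms of residue 1.
expanded = same("resnr", atoms, [1])
```

This is how an expression based on a few atoms can be extended to whole residues or molecules.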
Boolean evaluation proceeds from left to right and is short-circuiting, i.e., as soon as it is known whether an atom will be selected, the remaining expressions are not evaluated at all. This can be used to optimize the selections: you should write the most restrictive and/or the most inexpensive expressions first in boolean expressions. The relative ordering between dynamic and static expressions does not matter: all static expressions are evaluated only once, before the first frame, and the result becomes the leftmost expression.
Another point for optimization is in common subexpressions: they are not automatically recognized, but can be manually optimized by the use of variables. This can have a big impact on the performance of complex selections, in particular if you define several index groups like this:
rdist = distance from com of resnr 1 to 5;
resname RES and rdist < 2;
resname RES and rdist < 4;
resname RES and rdist < 6;
Without the variable assignment, the distances would be evaluated three times, although they are exactly the same within each selection. Anything assigned into a variable becomes a common subexpression that is evaluated only once during a frame. Currently, in some cases the use of variables can actually lead to a small performance loss because of the checks necessary to determine for which atoms the expression has already been evaluated, but this should not be a major problem.
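The effect of the variable can be illustrated with a small Python sketch that counts how often a distance is evaluated for one frame. The class, the coordinates, and the cutoffs are made up for the illustration, and the resname condition is left out for brevity.

```python
import math


class CountingDistance:
    """Distance to a fixed reference that counts how often it is evaluated."""

    def __init__(self, reference):
        self.reference = reference
        self.calls = 0

    def __call__(self, pos):
        self.calls += 1
        return math.dist(pos, self.reference)


frame = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
cutoffs = (2, 4, 6)

# Without a variable: the distance is recomputed in every selection.
direct = CountingDistance((0.0, 0.0, 0.0))
without_variable = [
    [i for i, pos in enumerate(frame) if direct(pos) < cutoff]
    for cutoff in cutoffs
]

# With a variable: evaluate once per frame, reuse in every selection.
cached = CountingDistance((0.0, 0.0, 0.0))
rdist = [cached(pos) for pos in frame]
with_variable = [
    [i for i, d in enumerate(rdist) if d < cutoff] for cutoff in cutoffs
]
```

Both variants select the same atoms, but the first one evaluates the distance nine times and the second one only three times for this frame.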
Some analysis programs may require a special structure for the input selections (e.g., some options of gmx gangle require the index group to be made of groups of three or four atoms). For such programs, it is up to the user to provide a proper selection expression that always returns such positions.
All selection keywords select atoms in increasing order, i.e., you can consider them as set operations that in the end return the atoms in sorted numerical order. For example, the following selections select the same atoms in the same order:
resname RA RB RC
resname RB RC RA
atomnr 10 11 12 13
atomnr 12 13 10 11
atomnr 10 to 13
atomnr 13 to 10
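The set-like behavior can be mimicked in Python as follows. The helper is hypothetical and represents a range such as 10 to 13 as a tuple.

```python
def atomnr(*items):
    """Collect atom numbers as a set and return them in ascending order."""
    selected = set()
    for item in items:
        if isinstance(item, tuple):
            # A range like "13 to 10" covers the same atoms as "10 to 13".
            lo, hi = sorted(item)
            selected.update(range(lo, hi + 1))
        else:
            selected.add(item)
    return sorted(selected)
```

All four atomnr selections above therefore evaluate to the atoms 10 11 12 13 in this order, regardless of how the values were written.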
If you need atoms/positions in a different order, you can:
use external index groups (for some static selections),
use the permute keyword to change the final order, or
use the merge or plus keywords to compose the final selection from multiple distinct selections.
Due to technical reasons, having a negative value as the first value in expressions like
charge -1 to -0.7
results in a syntax error. A workaround is to write
charge {-1 to -0.7}
instead.
When the name selection keyword is used together with PDB input files, the behavior may be unintuitive. When GROMACS reads in a PDB file, 4 character atom names that start with a digit are transformed such that, e.g., 1HG2 becomes HG21, and the latter is what is matched by the name keyword. Use pdbname to match the atom name as it appears in the input PDB file.
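The renaming rule can be written as a short Python helper, which is useful for working out which name to pass to the name keyword. The function name is hypothetical, and only the transformation described above is covered.

```python
def gromacs_atom_name(pdb_name: str) -> str:
    """Move a leading digit of a 4 character PDB atom name to the end."""
    name = pdb_name.strip()
    if len(name) == 4 and name[0].isdigit():
        return name[1:] + name[0]
    return name
```

For example, the PDB name 1HG2 maps to HG21, so name HG21 and pdbname 1HG2 refer to the same atom.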
Below, examples of different types of selections are given.
Selection of all water oxygens:
resname SOL and name OW
Centers of mass of residues 1 to 5 and 10:
res_com of resnr 1 to 5 10
All atoms farther than 1 nm from a fixed position:
not within 1 of [1.2, 3.1, 2.4]
All atoms of a residue LIG within 0.5 nm of a protein (with a custom name):
"Close to protein" resname LIG and within 0.5 of group "Protein"
All protein residues that have at least one atom within 0.5 nm of a residue LIG:
group "Protein" and same residue as within 0.5 of resname LIG
All RES residues whose COM is between 2 and 4 nm from the COM of all of them:
rdist = res_com distance from com of resname RES
resname RES and rdist >= 2 and rdist <= 4
Selection with duplicate atoms like C1 C2 C2 C3 C3 C4 ... C8 C9:
name "C[1-8]" merge name "C[2-9]"
This can be used with gmx distance to compute C1-C2, C2-C3 etc. distances.
Selection with atoms in order C2 C1:
name C1 C2 permute 2 1
This can be used with gmx gangle to get C2->C1 vectors instead of C1->C2.
Selection with COMs of two index groups:
com of group 1 plus com of group 2
This can be used with gmx distance to compute the distance between these two COMs.
Fixed vector along x (can be used as a reference with gmx gangle):
[0, 0, 0] plus [1, 0, 0]
The following examples explain the difference between the various position types. This selection selects a position for each residue where any of the three atoms C[123] has x < 2. The positions are computed as the COM of all three atoms. This is the default behavior if you just write res_com of.
part_res_com of name C1 C2 C3 and x < 2
This selection does the same, but the positions are computed as COM positions of whole residues:
whole_res_com of name C1 C2 C3 and x < 2
Finally, this selection selects the same residues, but the positions are computed as the COM of exactly those atoms that match the x < 2 criterion:
dyn_res_com of name C1 C2 C3 and x < 2
Without the of keyword, the default behavior is different from above, but otherwise the rules are the same:
name C1 C2 C3 and res_com x < 2
works as if whole_res_com was specified, and selects the three atoms from residues whose COM satisfies x < 2. Using
name C1 C2 C3 and part_res_com x < 2
instead selects residues based on the COM computed from the C[123] atoms.
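The three position types used with the of keyword can be compared on a made-up residue with a short Python sketch. The atom names, masses, and coordinates are invented for the illustration, and only the x coordinate is tracked to keep the example small.

```python
def com_x(atoms):
    """x coordinate of the center of mass of the given atoms."""
    total_mass = sum(a["mass"] for a in atoms)
    return sum(a["mass"] * a["x"] for a in atoms) / total_mass


# One hypothetical residue; C4 is never matched by "name C1 C2 C3".
residue = [
    {"name": "C1", "mass": 12.0, "x": 1.0},
    {"name": "C2", "mass": 12.0, "x": 1.5},
    {"name": "C3", "mass": 12.0, "x": 3.0},
    {"name": "C4", "mass": 12.0, "x": 5.0},
]

# Atoms allowed by the static part of the selection (name C1 C2 C3).
static_part = [a for a in residue if a["name"] in ("C1", "C2", "C3")]
# Atoms that also satisfy the dynamic criterion x < 2 in this frame.
dynamic_part = [a for a in static_part if a["x"] < 2]

positions = {}
if dynamic_part:  # the residue contributes a position
    positions["whole_res_com"] = com_x(residue)       # all atoms of the residue
    positions["part_res_com"] = com_x(static_part)    # always C1, C2 and C3
    positions["dyn_res_com"] = com_x(dynamic_part)    # only atoms with x < 2
```

In all three cases the same residue is selected; only the atoms that enter the center-of-mass calculation differ, which gives 2.625, about 1.83, and 1.25 for the three types in this example.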