Searching for sub-sequences in an alignment

CINEMA provides a highly flexible tool for searching for occurrences of a particular sub-sequence, either over the entire alignment, or a specific set of sequences. Searches can be for exact matches to the sub-sequence, or a "fuzzy" search can be made, using a substitution matrix to score the potential matches. Simple regular expressions may be used to construct a search string. An intuitive search-result navigation system provides a simple and fast method for finding the required occurrence of the sub-sequence.

CINEMA also provides an extra facility for working with search results: following a sub-sequence search, a simple form of automatic alignment is possible by choosing pairs of results and having CINEMA align them.


Basic searching

To begin searching for a sub-sequence, open the Find sub-sequence dialog in one of three ways: click on the icon on the main CINEMA toolbar, choose Find sub-sequence from the Edit menu, or right-click over the alignment and select Find sub-sequence from the context menu. To search for a sub-sequence:



  1. Enter the sub-sequence text in the search field and press the Enter key, or click on the Find button.
  2. All search results found, i.e. regions of the alignment that exactly match the search string, will be circled in black. The view will be automatically shifted to display the first search result found; this is the 'active' search result and is circled in red.

If no occurrences of the sub-sequence are found, a message box will be displayed to confirm that the search failed.
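
The exact-match behaviour described above can be sketched in a few lines of Python. This is only an illustration, not CINEMA's own code: the function name find_exact, the dictionary representation of the alignment and the decision to treat gap characters as ordinary characters are all assumptions made for the example.

```python
def find_exact(alignment, query):
    """Return (sequence_name, start_column) for every exact match of query.

    alignment is assumed to be a dict mapping sequence names to aligned
    strings; gap characters are treated like any other character.
    """
    hits = []
    for name, seq in alignment.items():
        start = seq.find(query)
        while start != -1:
            hits.append((name, start))
            # Step forward by one column so overlapping matches are kept.
            start = seq.find(query, start + 1)
    return hits


alignment = {
    "seq1": "MKEGCADLEGCAD",
    "seq2": "--EGCADTT----",
    "seq3": "MKTTTTTTTTTTT",
}
hits = find_exact(alignment, "EGCAD")
print(hits)  # [('seq1', 2), ('seq1', 8), ('seq2', 2)]
```

An empty result list corresponds to the case in which CINEMA reports that the search failed.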


Navigating search results

The results of a sub-sequence search may be navigated using the four arrow buttons in the bottom half of the Find sub-sequence dialog.

Use the left/right arrow buttons to move between results on the current sequence and the up/down arrow buttons to move between sequences. The active search result changes as you move from one result to the next. If any of the buttons becomes inactive, there are no more results in that direction. Navigating between the search results in this way makes it quick and simple to find the required occurrence of the sub-sequence.
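
One way to picture the navigation model is as a lookup over results grouped by sequence. The sketch below is hypothetical: the function move, the layout of the results dictionary and the choice to land on the first result of the neighbouring sequence when moving up or down are assumptions, not documented CINEMA behaviour. A return value of None stands for an inactive button.

```python
def move(results, active, direction):
    """Return the new active result, or None if there is none that way.

    results maps a sequence row index to a sorted list of start columns;
    active is a (row, column) pair that must be present in results.
    """
    row, col = active
    if direction in ("left", "right"):
        cols = results[row]
        i = cols.index(col) + (1 if direction == "right" else -1)
        return (row, cols[i]) if 0 <= i < len(cols) else None
    rows = sorted(results)
    j = rows.index(row) + (1 if direction == "down" else -1)
    if not 0 <= j < len(rows):
        return None
    # Assumption: moving to another sequence selects its first result.
    return (rows[j], results[rows[j]][0])


results = {0: [2, 8], 1: [2]}
print(move(results, (0, 2), "right"))  # (0, 8)
print(move(results, (0, 8), "down"))   # (1, 2)
```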


Restricting the search to specific sequences

By default sub-sequence searching is performed over the entire alignment. However, it is possible to restrict the search to one or more specific sequences:

  1. In the Alignment view, use the left mouse button to select the sequence(s) to search over, using the shift and/or control keys to select multiple sequences.
  2. In the Find sub-sequence dialog, un-check the Search over entire alignment box.
  3. Enter a search string and press Find to perform the search.
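
In code terms, restricting the search amounts to filtering the set of sequences before matching. The following is a minimal sketch under assumed names (find_in_selection, a dictionary-based alignment); passing selected=None plays the role of leaving the Search over entire alignment box checked.

```python
def find_in_selection(alignment, query, selected=None):
    """Exact-match search limited to the named sequences.

    alignment maps sequence names to aligned strings. If selected is None
    the whole alignment is searched.
    """
    names = list(alignment) if selected is None else selected
    hits = []
    for name in names:
        seq = alignment[name]
        for col in range(len(seq) - len(query) + 1):
            if seq.startswith(query, col):
                hits.append((name, col))
    return hits


alignment = {
    "seq1": "MKEGCADLEGCAD",
    "seq2": "--EGCADTT----",
    "seq3": "MKTTTTTTTTTTT",
}
print(find_in_selection(alignment, "EGCAD", selected=["seq2"]))  # [('seq2', 2)]
```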


Searching using regular expressions

CINEMA allows the use of regular expressions to search for sub-sequences. A regular expression is a formula for matching strings that follow some pattern. In order to use regular expressions, you must check the Allow regular expressions box.

Regular expressions in CINEMA are made up of the normal protein & DNA sequence characters (A-Z, -) and meta characters. The following table lists the most common meta characters, describes how they are used and gives some simple examples:

Metachar  Use                                    Example
.         Matches any character                  egc.d would match the sub-sequences egcad, egcbd, ..., egczd and egc-d.
n+        Matches one or more instances of n     egcn+d would match the sub-sequences egcnd, egcnnd, egcnnnd and so on.
n*        Matches zero or more instances of n    egcn*d would match the sub-sequences egcd, egcnd, egcnnd and so on.
[ab]      Matches one of a or b                  egc[ad] would match both the sub-sequences egca and egcd.

Note that CINEMA's regular expression parser is based on Perl's regexp language. Further information may be easily found on the web.
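
The meta characters in the table behave the same way in most Perl-style regular expression engines, so the examples can be tried outside CINEMA. The sketch below uses Python's standard re module as a stand-in; it is not CINEMA's parser, and the helper find_regex is a name invented for this example.

```python
import re


def find_regex(sequence, pattern):
    """Return (start_column, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


# The table rows, checked against whole candidate strings.
assert re.fullmatch(r"egc.d", "egc-d")       # . matches any character
assert re.fullmatch(r"egcn+d", "egcnnd")     # + needs at least one n
assert not re.fullmatch(r"egcn+d", "egcd")
assert re.fullmatch(r"egcn*d", "egcd")       # * also accepts zero n
assert re.fullmatch(r"egc[ad]", "egca")      # [ad] matches a or d

hits = find_regex("aaegcnndxxegcd", r"egcn*d")
print(hits)  # [(2, 'egcnnd'), (10, 'egcd')]
```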


Fuzzy searching

By default sub-sequence searching finds only exact matches to the search string. It is, however, possible to perform a 'fuzzy' search for a sub-sequence, using a substitution matrix to score potential matches and a user-controlled threshold to determine which of these matches are accepted. A number of common substitution matrices are supplied by default: BLOSUM62, BLOSUM80, PAM120 & PAM250.

To perform a fuzzy search:

  1. Check the Use fuzzy search box in the Find sub-sequence dialog.
  2. Use the combo-box to choose an appropriate substitution matrix with which to score the search results.
  3. Enter a search string and press Find to perform the fuzzy search.
  4. The search results may be refined by using the Score threshold slider to alter the threshold used to accept/reject potential matches. After every change to the threshold, press Find to re-run the search.

Note that fuzzy searching is not available while the Allow regular expressions box is checked. This box must be unchecked before the fuzzy searching controls can be used.
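
The idea behind matrix-scored matching can be illustrated with a sliding window. The sketch below rests on several assumptions: the scoring function toy_score is a made-up stand-in and not BLOSUM or PAM, the threshold is interpreted as a fraction of the query's score against itself, and gaps are not handled specially. How CINEMA actually scales its Score threshold slider is not stated here.

```python
# Toy substitution scores (NOT a real matrix): identical residues score 4,
# a few hand-picked "similar" pairs score 1, everything else scores -2.
SIMILAR = {frozenset(p) for p in ("IL", "IV", "LV", "DE", "KR", "ST")}


def toy_score(a, b):
    if a == b:
        return 4
    return 1 if frozenset((a, b)) in SIMILAR else -2


def fuzzy_find(seq, query, threshold):
    """Return (start, window, score) for windows scoring at least
    threshold * (score of the query against itself)."""
    best = sum(toy_score(c, c) for c in query)
    hits = []
    for start in range(len(seq) - len(query) + 1):
        window = seq[start:start + len(query)]
        score = sum(toy_score(a, b) for a, b in zip(query, window))
        if score >= threshold * best:
            hits.append((start, window, score))
    return hits


seq = "MKLVDEAAKIVDE"
print(fuzzy_find(seq, "LVDE", 0.8))  # [(2, 'LVDE', 16), (9, 'IVDE', 13)]
print(fuzzy_find(seq, "LVDE", 0.9))  # [(2, 'LVDE', 16)]
```

Raising the threshold from 0.8 to 0.9 drops the near match IVDE and keeps only the exact match, which mirrors the effect of moving the slider and pressing Find again.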


Search result-guided alignment

Following a sub-sequence search, it is possible to use the search results to guide the sequence alignment process. CINEMA can be instructed to align pairs of sequences, according to the relative positions of specific search results along those sequences.

The basic approach is to anchor one search result so that its sequence remains in a fixed position, and then align other sequences to it, according to the positions of the search results on those sequences.

To perform search result-guided sequence alignment, first nominate one of the search results to be anchored, then align results on other sequences to it:

  1. Navigate to the search result that is to be anchored, either by using the arrow buttons, or by clicking directly onto the result.
  2. Click on the Anchor result button. The outline of the current result changes to indicate that it is now anchored.
  3. Now use the result navigation buttons to locate a search result on another sequence that you wish to align with the anchored result.
    Note that only the search results between the nearest breakpoints on either side of the anchored search result are considered during this process.
  4. To align the current search result with the anchored result, click on the Align result to anchor button. The sequence containing the current result will be slid in the appropriate direction so that the result lines up with the anchored result.

During this procedure, it is possible to re-nominate the anchored search result at any point by navigating to a different result and clicking on the Anchor result button. The previously anchored result will become a standard search result again.
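
The sliding step can be sketched as a shift by the difference between the two result columns. This is an illustrative model only: the function align_to_anchor is hypothetical, breakpoints are not modelled, and the rule that a sequence may only slide left over its own leading gaps is an assumption rather than documented CINEMA behaviour.

```python
def align_to_anchor(seq, result_col, anchor_col, gap="-"):
    """Slide seq so that its result starts in the anchored result's column.

    A positive shift pads the left end with gaps; a negative shift removes
    leading gaps and fails if there are not enough of them.
    """
    shift = anchor_col - result_col
    if shift >= 0:
        return gap * shift + seq
    leading_gaps = len(seq) - len(seq.lstrip(gap))
    if leading_gaps < -shift:
        raise ValueError("not enough leading gaps to slide left")
    return seq[-shift:]


# The anchored result EGCAD starts at column 2 of "MKEGCAD".
print(align_to_anchor("EGCADTT", result_col=0, anchor_col=2))    # --EGCADTT
print(align_to_anchor("----EGCAD", result_col=4, anchor_col=2))  # --EGCAD
```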