multialign2ref

multialign2ref performs multialignment to reference sequences

Syntax

  • Seqsaligned=multialign2ref(refSeqs, Seqs2align)
  • Seqsaligned=multialign2ref(refSeqs, Seqs2align,Name,Value)
  • [Seqsaligned, WrngAlignment]=multialign2ref(___)

Description

This function performs the multialignment of a set of unaligned amino acid or nucleotide sequences (Seqs2align) to a set of reference sequences that are already aligned (refSeqs). To do this, function multialign of the Bioinformatics Toolbox is called several times with all the default options. If aligning a sequence inside Seqs2align changes the reference sequences, multialign is called again with 'GapOpen',100000 in order to allow the possibility of creating gaps or adding additional characters in the reference sequences. If the new call to multialign:

1) creates gap(s) in the reference sequences (refSeqs), we delete the characters corresponding to the gaps in the aligned sequence and store this information inside boolean variable usedGap of output Seqsaligned;

2) creates additional characters at the end of the reference sequences (refSeqs), we delete the additional characters and store this information inside boolean variable usedDeletion of output Seqsaligned.

As a result of this process, the number of sites (characters) in the reference sequences is kept fixed, and all the aligned sequences have the same number of characters as the reference sequences. Information about the sequences that could not be aligned is stored inside the second output argument WrngAlignment. Among the hundreds of millions of sequences we have aligned so far, this case has never occurred.
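The column-deletion step described above can be illustrated with a minimal sketch. It is not the code used inside multialign2ref: the variable names (refOrig, refNew, seqNew, keep), the toy sequences and the greedy column matching are all assumptions made only for illustration.

```matlab
% Toy data (hypothetical): one reference row before and after re-alignment.
refOrig = 'ACG-TAC';      % reference row in the original aligned reference set
refNew  = 'AC-G-TAC--';   % same row after multialign: 1 new gap column, 2 trailing columns
seqNew  = 'ACTGGTACGG';   % the newly aligned sequence (same length as refNew)

keep = false(1, numel(refNew));   % columns of the new alignment to retain
usedGap = false;                  % true if a gap column was inserted in the reference
usedDeletion = false;             % true if columns were appended after the reference end
i = 1;                            % pointer into refOrig

for j = 1:numel(refNew)
    if i > numel(refOrig)
        usedDeletion = true;          % column beyond the end of the original reference
    elseif refNew(j) == refOrig(i)
        keep(j) = true;               % column already present in the original reference
        i = i + 1;
    else
        usedGap = true;               % column newly inserted into the reference
    end
end

% Drop the extra columns so the sequence has the original reference length.
seqTrimmed = seqNew(keep);
disp(seqTrimmed)                      % displays ACGGTAC
```

After trimming, seqTrimmed has the same number of characters as refOrig, which is the invariant that the description above relies on.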

Seqsaligned=multialign2ref(refSeqs, Seqs2align) Example of multialign2ref without optional arguments.

Seqsaligned=multialign2ref(refSeqs, Seqs2align, Name, Value) Example without using the parallel computing toolbox.

[Seqsaligned, WrngAlignment]=multialign2ref(___) Example of two output arguments.

Examples

  • Example of multialign2ref without optional arguments.

    % Load the FASTA file containing the original covid sequence and other sequences
    Seqs2align = fastaread("X01sel.txt");
    % Load the FASTA file containing the 5 covid variants
    % (Alpha, Beta, Delta, Gamma, Omicron)
    variants = fastaread('Variants.txt');
    % Original covid sequence and variants make up the reference sequences
    refSequences=[Seqs2align(1); variants];
    % Perform multialignment on the reference sequences
    refSequences=multialign(refSequences);
    % Remove initial covid sequence from Seqs2align
    Seqs2align=Seqs2align(2:101);
    % Call multialign2ref with all default arguments
    Seqsaligned=multialign2ref(refSequences,Seqs2align);
    Process started
       25-Mar-2025 22:58:26
    
    Total estimated time (seconds)=5.4674
    

  • Example without using the parallel computing toolbox.

    % Load the FASTA file containing the original covid sequence and other sequences
    Seqs2align = fastaread("X01sel.txt");
    % Load the FASTA file containing the 5 covid variants
    % (Alpha, Beta, Delta, Gamma, Omicron)
    variants = fastaread('Variants.txt');
    % Original covid sequence and variants make up the reference sequences
    refSequences=[Seqs2align(1); variants];
    % Perform multialignment on the reference sequences
    refSequences=multialign(refSequences);
    % Remove initial covid sequence from Seqs2align
    Seqs2align=Seqs2align(2:50);
    % Call multialign2ref with option 'UseParallel',false
    Seqsaligned=multialign2ref(refSequences,Seqs2align,'UseParallel',false);

  • Example of two output arguments.

    % Load the FASTA file containing the original covid sequence and other sequences
    Seqs2align = fastaread("X01sel.txt");
    % Load the FASTA file containing the 5 covid variants
    % (Alpha, Beta, Delta, Gamma, Omicron)
    variants = fastaread('Variants.txt');
    % Original covid sequence and variants make up the reference sequences
    refSequences=[Seqs2align(1); variants];
    % Perform multialignment on the reference sequences
    refSequences=multialign(refSequences);
    % Remove initial covid sequence from Seqs2align
    Seqs2align=Seqs2align(2:101);
    % Call multialign2ref with all default arguments
    [Seqsaligned,WrngAlignment]=multialign2ref(refSequences,Seqs2align);
    Process started
       25-Mar-2025 22:58:36
    
    Total estimated time (seconds)=7.107
    

Input Arguments

refSeqs — Reference sequences already aligned. Vector of structures.

Vector of structures of length a with the fields Sequence, for the residues, and Header (or 'refSeqs.Name'), for the labels.

Data Types: struct

Seqs2align — Sequences which have to be aligned. Vector of structures.

Vector of structures of length n with the fields Sequence, for the residues, and Header (or 'Seqs2align.Name'), for the labels.

Data Types: struct
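When the sequences are not read from a file, struct arrays with the same two fields can be built by hand. The following is a minimal sketch with made-up toy sequences and labels (they do not come from any real data set); it only shows the expected shape of the two inputs.

```matlab
% Toy reference sequences, already aligned (equal length, '-' marks a gap).
refSeqs = struct( ...
    'Header',   {'ref1'; 'ref2'}, ...
    'Sequence', {'ACGT-ACGT'; 'ACGTTACGT'});

% Toy sequences still to be aligned (their lengths may differ).
Seqs2align = struct( ...
    'Header',   {'s1'; 's2'; 's3'}, ...
    'Sequence', {'ACGTACGT'; 'ACGTTACG'; 'CGTTACGT'});

% Sanity check: aligned reference sequences all have the same number of characters.
refLengths = cellfun(@numel, {refSeqs.Sequence});
assert(all(refLengths == refLengths(1)), 'Reference sequences are not aligned.');
```

Passing cell arrays to struct creates one array element per cell, so refSeqs is a 2-by-1 and Seqs2align a 3-by-1 vector of structures.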

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'UseParallel',false , 'ScoringMatrix','BLOSUM30' , 'NumberSeqsEachIter', 50 , 'verbose', false

UseParallel — Use or not the Parallel Computing Toolbox. Boolean | positive integer.

If UseParallel is true, the Parallel Computing Toolbox is used.

Example: 'UseParallel',false

Data Types: boolean or positive integer

ScoringMatrix — Scoring method to use for the alignment. Character vector | string.

This option specifies the scoring method to use for the alignment. For further details, see option ScoringMatrix of the multialign function of the Bioinformatics Toolbox.

Example: 'ScoringMatrix','BLOSUM30'

Data Types: character or string

NumberSeqsEachIter — Number of sequences to add for each iteration. Positive integer.

This option controls the number of sequences to add for each iteration. The default is to add 25 additional sequences for each iteration before calling multialign.

Example: 'NumberSeqsEachIter', 50

Data Types: single or double
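To make the role of NumberSeqsEachIter concrete, the sketch below splits n sequence indices into consecutive blocks of at most 25 elements. How multialign2ref forms its blocks internally is not stated on this page, so the loop and the names n, blockSize, starts and blocks are assumptions for illustration only.

```matlab
n = 100;           % number of sequences to align (toy value)
blockSize = 25;    % value that would be passed as 'NumberSeqsEachIter'

starts = 1:blockSize:n;               % first index of each block
blocks = cell(numel(starts), 1);      % indices handled in each iteration
for b = 1:numel(starts)
    last = min(starts(b) + blockSize - 1, n);   % the last block may be shorter
    blocks{b} = starts(b):last;
end

fprintf('%d blocks, last block has %d sequences\n', ...
    numel(blocks), numel(blocks{end}));
```

Under this assumption, a larger block size means fewer iterations, each working on more sequences at once.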

verbose — Show estimated time to complete the process. Boolean.

If verbose is true (default), the estimated time to perform the alignment is shown on the screen.

Example: 'verbose', false

Data Types: logical

Output Arguments

Seqsaligned — Sequences which have been aligned. Vector of structures.

Vector of structures of the same length n as input Seqs2align, with the following 4 fields.

Value          Description
Sequence       The n aligned sequences.
Header         The labels (this field is unchanged from input Seqs2align.Header).
usedGap        Boolean, true for the sequences for which, in order to compute the alignment, it was necessary to set option 'GapOpen' to 100000.
usedDeletion   Boolean, true when the sequence to align contained a number of characters greater than that of the reference sequences.

Data Types: array of struct

WrngAlignment — Sequences which could not be aligned. Vector.

Vector containing the indices of the sequences for which usedGap and usedDeletion were not sufficient to produce the alignment. For example, if WrngAlignment=[400 800], it was not possible to align sequences 400 and 800. If WrngAlignment is empty, all sequences could be aligned.

Data Types: vector of natural numbers
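Once the function has returned, the flags can be summarised with ordinary struct-array indexing. The sketch below uses a small hand-made stand-in for the two outputs (the sequences, labels and flag values are invented) so that it runs without the toolbox or the data files; the names nGap, nDel, sameLength, ok and SeqsOK are hypothetical.

```matlab
% Hand-made stand-in for the outputs of multialign2ref (invented values).
Seqsaligned = struct( ...
    'Sequence',     {'ACGT-A'; 'ACGTTA'; 'AC-TTA'}, ...
    'Header',       {'s1'; 's2'; 's3'}, ...
    'usedGap',      {false; true; false}, ...
    'usedDeletion', {false; false; true});
WrngAlignment = [];   % empty: every sequence could be aligned

% Count how many sequences needed each kind of correction.
nGap = sum([Seqsaligned.usedGap]);
nDel = sum([Seqsaligned.usedDeletion]);

% Check that all aligned sequences have the same number of characters.
lens = cellfun(@numel, {Seqsaligned.Sequence});
sameLength = all(lens == lens(1));

% Keep only the sequences that are not listed in WrngAlignment.
ok = true(numel(Seqsaligned), 1);
ok(WrngAlignment) = false;
SeqsOK = Seqsaligned(ok);

fprintf('usedGap: %d, usedDeletion: %d, kept: %d of %d\n', ...
    nGap, nDel, numel(SeqsOK), numel(Seqsaligned));
```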


    This page has been automatically generated by our routine publishFS

