multialign2ref

multialign2ref performs multialignment to reference sequences

Syntax

Seqsaligned=multialign2ref(refSeqs, Seqs2align)
Seqsaligned=multialign2ref(refSeqs, Seqs2align, Name, Value)
[Seqsaligned, WrngAlignment]=multialign2ref(___)

Description

This function performs the multialignment of a set of amino acid or nucleotide sequences which are not yet aligned (Seqs2align) to a set of reference sequences which are already aligned (refSeqs). To this end, function multialign of the Bioinformatics Toolbox is called several times with all the default options. Note that if the alignment of a sequence inside Seqs2align changes the reference sequences, we call function multialign again using the Name,Value pair 'GapOpen',100000 in order to allow the possibility of creating gaps or adding additional characters in the reference sequence. If the new call to multialign:

1) creates gap(s) in the reference sequences (refSeqs), we delete the characters corresponding to the gaps in the aligned sequence and store this information inside the Boolean field usedGap of output Seqsaligned;

2) creates additional characters at the end of the reference sequences (refSeqs), we delete the additional characters and store this information inside the Boolean field usedDeletion of output Seqsaligned.

As a result of this process, the number of sites (characters) in the reference sequences is kept fixed and all the aligned sequences have the same number of characters as the reference sequences. The information about the sequences which could not be aligned is stored inside the second output argument WrngAlignment. So far, with the hundreds of millions of sequences we have aligned, this case has never taken place.
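The two trimming rules described above can be illustrated with a minimal sketch in base MATLAB. It is not the code used inside multialign2ref: the toy sequences and all variable names (refOriginal, refRealigned, seqRealigned, gapCols and so on) are assumptions introduced only for illustration. The idea is that every column in which the realigned reference shows a gap is removed from the newly aligned sequence, so that the number of sites stays equal to that of the original reference.

```matlab
% Hypothetical toy data (not real sequences).
refOriginal  = 'ACGTACG';      % reference with a fixed number of sites (7)
refRealigned = 'ACG-TACG--';   % reference after a new alignment: one interior
                               % gap (column 4) and two trailing columns (9-10)
seqRealigned = 'ACGTTACGAA';   % sequence aligned against refRealigned

% Columns in which the reference contains a gap
gapCols = (refRealigned == '-');

% Split the gap columns into trailing ones (after the last reference site)
% and interior ones
lastRefSite  = find(~gapCols, 1, 'last');
colIndex     = 1:numel(refRealigned);
trailingCols = gapCols & (colIndex > lastRefSite);
interiorCols = gapCols & ~trailingCols;

% Flags analogous to fields usedGap and usedDeletion
usedGap      = any(interiorCols);
usedDeletion = any(trailingCols);

% Keep only the columns which correspond to sites of the reference
seqTrimmed = seqRealigned(~gapCols);

disp(seqTrimmed)                                  % ACGTACG
disp(numel(seqTrimmed) == numel(refOriginal))     % 1
```

In this toy case both flags are true, because the realigned reference contains an interior gap as well as extra trailing columns.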

Seqsaligned=multialign2ref(refSeqs, Seqs2align) Example of multialign2ref without optional arguments.

Seqsaligned=multialign2ref(refSeqs, Seqs2align, Name, Value) Example without using the parallel computing toolbox.

[Seqsaligned, WrngAlignment]=multialign2ref(___) Example of two output arguments.

Examples

Example of multialign2ref without optional arguments.

Load the FASTA file containing the original covid sequence and other sequences

Seqs2align = fastaread("X01sel.txt");
% Load fasta file containing the 5 covid variants
% (Alpha, Beta, Delta, Gamma, Omicron)
variants = fastaread('Variants.txt');
% Original covid sequence and variants make up the reference sequences
refSequences=[Seqs2align(1); variants];
% Perform multialignment on the reference sequences
refSequences=multialign(refSequences);
% Remove initial covid sequence from Seqs2align
Seqs2align=Seqs2align(2:101);
% Call multialign2ref with all default arguments
Seqsaligned=multialign2ref(refSequences,Seqs2align);

Process started
   25-Mar-2025 22:58:26

Total estimated time (seconds)=5.4674

Example without using the parallel computing toolbox.

Load the FASTA file containing the original covid sequence and other sequences

Seqs2align = fastaread("X01sel.txt");
% Load fasta file containing the 5 covid variants
% (Alpha, Beta, Delta, Gamma, Omicron)
variants = fastaread('Variants.txt');
% Original covid sequence and variants make up the reference sequences
refSequences=[Seqs2align(1); variants];
% Perform multialignment on the reference sequences
refSequences=multialign(refSequences);
% Remove initial covid sequence from Seqs2align
Seqs2align=Seqs2align(2:50);
% Call multialign2ref with option 'UseParallel',false
Seqsaligned=multialign2ref(refSequences,Seqs2align,'UseParallel',false);

Example of two output arguments.

Load the FASTA file containing the original covid sequence and other sequences

Seqs2align = fastaread("X01sel.txt");
% Load fasta file containing the 5 covid variants
% (Alpha, Beta, Delta, Gamma, Omicron)
variants = fastaread('Variants.txt');
% Original covid sequence and variants make up the reference sequences
refSequences=[Seqs2align(1); variants];
% Perform multialignment on the reference sequences
refSequences=multialign(refSequences);
% Remove initial covid sequence from Seqs2align
Seqs2align=Seqs2align(2:101);
% Call multialign2ref with all default arguments and two output arguments
[Seqsaligned,WrngAlignment]=multialign2ref(refSequences,Seqs2align);

Process started
   25-Mar-2025 22:58:36

Total estimated time (seconds)=7.107

Input Arguments

`refSeqs` — Reference sequences already aligned. Vector of structures.

Vector of structures of length a with the fields

Field	Description
`Sequence`	Residues of the sequence.
`Header`	Label of the sequence (or `refSeqs.Name`).

Data Types: struct

`Seqs2align` — Sequences which have to be aligned. Vector of structures.

Vector of structures of length n with the fields

Field	Description
`Sequence`	Residues of the sequence.
`Header`	Label of the sequence (or `Seqs2align.Name`).

Data Types: struct
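Both inputs are usually obtained with fastaread, which returns a column vector of structures with fields Header and Sequence. When no FASTA file is at hand, structures with the same layout can be built manually. The following is a minimal sketch in base MATLAB; the labels and the toy sequences are hypothetical and serve only to show the expected layout.

```matlab
% Hypothetical reference sequences, already aligned (same number of characters)
refSeqs = struct( ...
    'Header',   {'ref1'; 'ref2'}, ...
    'Sequence', {'ACG-TACG'; 'ACGTTACG'});

% Hypothetical sequences still to be aligned (lengths may differ)
Seqs2align = struct( ...
    'Header',   {'s1'; 's2'; 's3'}, ...
    'Sequence', {'ACGTACG'; 'ACGTTAC'; 'CGTTACGA'});

% Sanity check: aligned reference sequences must all have the same length
refLengths    = arrayfun(@(s) numel(s.Sequence), refSeqs);
allSameLength = all(refLengths == refLengths(1));

disp(size(refSeqs))       % 2 1
disp(allSameLength)       % 1
```

Passing cell arrays of equal size to struct produces a structure array with one element per cell, which mimics the output layout of fastaread.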

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example:

 'UseParallel',false, 'ScoringMatrix','BLOSUM30', 'NumberSeqsEachIter',50, 'verbose',false

`UseParallel` — Use or not the Parallel Computing Toolbox. Boolean | positive integer.

If UseParallel is true the Parallel Computing Toolbox is used.

Example: 'UseParallel',false

Data Types: logical or positive integer

`ScoringMatrix` — Scoring method to use for the alignment. Character vector | string.

This option specifies the scoring method to use for the alignment. For further details see option 'ScoringMatrix' of function multialign of the Bioinformatics Toolbox.

Example: 'ScoringMatrix','BLOSUM30'

Data Types: char or string

`NumberSeqsEachIter` — Number of sequences to add for each iteration. Positive integer.

This option controls the number of sequences to add for each iteration. The default is to add 25 additional sequences for each iteration before calling multialign.

Example: 'NumberSeqsEachIter', 50

Data Types: single or double
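To give an idea of what this option controls, the sketch below splits the indexes of the sequences to align into consecutive blocks of NumberSeqsEachIter elements. It is only an illustration in base MATLAB written under the assumption that the sequences are processed in consecutive blocks; the actual iteration scheme of multialign2ref may differ, and the names n, starts and batches are hypothetical.

```matlab
n = 110;                    % hypothetical number of sequences to align
NumberSeqsEachIter = 25;    % default value of the option

% First index of each block
starts = 1:NumberSeqsEachIter:n;
nIter  = numel(starts);

% Indexes of the sequences belonging to each block
batches = cell(nIter, 1);
for i = 1:nIter
    last       = min(starts(i) + NumberSeqsEachIter - 1, n);
    batches{i} = starts(i):last;
end

disp(nIter)                  % 5
disp(numel(batches{end}))    % 10
```

With 110 sequences and blocks of 25, four full blocks are followed by a final block containing the remaining 10 sequences.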

`verbose` — Show estimated time to complete the process. Boolean.

If verbose is true (default) the estimated time to perform the alignment is shown on the screen.

Example: 'verbose', false

Data Types: logical

Output Arguments

`Seqsaligned` — Sequences which have been aligned. Vector of structures.

Vector of structures of the same length n as input Seqs2align with the following 4 fields.

Field	Description
`Sequence`	The n aligned sequences.
`Header`	The labels (this field is unchanged from input Seqs2align.Header).
`usedGap`	Boolean which is true for the sequences for which, in order to compute the alignment, it was necessary to set option 'GapOpen' to 100000.
`usedDeletion`	Boolean which is true when the sequence to align contained a number of characters greater than that of the reference sequences.

Data Types: array of struct

`WrngAlignment` — Sequences which could not be aligned. Vector.

Vector containing the indexes of the sequences for which usedGap and usedDeletion were not sufficient to produce the alignment. For example, if WrngAlignment=[400 800] then it was not possible to align sequences 400 and 800. If WrngAlignment is empty, all sequences could be aligned.

Data Types: vector of natural numbers
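A possible way to use the two outputs together is sketched below in base MATLAB. The structure array is a hypothetical mock with the four documented fields (it is not produced by a real call to multialign2ref), and the value of WrngAlignment is invented for illustration. The sketch counts how many sequences needed the 'GapOpen' fallback or a deletion, and discards the sequences listed in WrngAlignment before stacking the remaining ones into a character matrix.

```matlab
% Hypothetical mock of output Seqsaligned
Seqsaligned = struct( ...
    'Header',       {'s1'; 's2'; 's3'; 's4'}, ...
    'Sequence',     {'ACGTACG'; 'ACG-ACG'; 'ACGTACG'; 'A-GTACG'}, ...
    'usedGap',      {false; true; false; false}, ...
    'usedDeletion', {false; false; true; false});

% Hypothetical second output: sequence 4 could not be aligned
WrngAlignment = 4;

% Number of sequences which required each of the two corrections
nUsedGap      = sum([Seqsaligned.usedGap]);
nUsedDeletion = sum([Seqsaligned.usedDeletion]);

% Discard the sequences which could not be aligned
% (nothing is discarded when WrngAlignment is empty)
keep = true(numel(Seqsaligned), 1);
keep(WrngAlignment) = false;
SeqsOK = Seqsaligned(keep);

% All aligned sequences have the same length, so they can be stacked
% into a character matrix with one row per sequence
alignedMatrix = char({SeqsOK.Sequence});

disp(size(alignedMatrix))    % 3 7
```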
