| File | Description of Data Set |
|---|---|
| Bioinformatics datasets | |
| Variants.txt | COVID-19 variant sequences in FASTA format. When the file is loaded with the fastaread function (Bioinformatics Toolbox), the output is a struct array with fields Header and Sequence. Header contains the name of the virus variant. Sequence contains the amino acid sequence of a variant identified by the World Health Organization. |
| X01sel.txt and X02sel.txt | COVID-19 sequences in FASTA format. When a file is loaded with the fastaread function (Bioinformatics Toolbox), the output is a struct array with fields Header and Sequence. Header contains information about the laboratory where the sequence analysis was performed, the date, the country, and the organism carrying the virus. Note that the format in which the date and the country are stored is very heterogeneous. Sequence contains the amino acid sequence to be aligned with the original strain of COVID-19 and the subsequent variants identified by the World Health Organization. X01sel has length 461, while X02sel has length 501. The data come from the GISAID database (gisaid.org). |
| Geographic datasets | |
| continents-according-to-our-world-in-data.xlsx | The original version of the file comes from https://github.com/Mike-Honey/covid-19-genomes/ and contains the following information for each country of the world: code, continent, latitude, longitude, alpha-2 code, alpha-3 code, and numeric code. The alpha-2 code is a two-letter code that represents a country name and is recommended as the general-purpose code. The alpha-3 code is a three-letter code that represents a country name and is usually more closely related to the country name. For more information on the numeric code, see the Wikipedia article on ISO 3166-1 numeric. |