Inputs
======

Phenotype file
--------------

Save your phenotype data as a tab-separated file at ``data/data.tsv``. The phenotype file should contain at least 3 columns with headers:

* ``strain``: sample names.
* ``fasta``: relative or absolute path to the assemblies (SAMPLE.fasta).
* ``phenotype``: target phenotype(s).

An optional column named ``gff`` can also be provided, giving the absolute or relative path to pre-computed annotations (SAMPLE.gff); this skips the ggCaller gene-calling step entirely.

There can be more than one target phenotype, and the column name is used when populating the output directory. Subsequent columns can contain other target phenotypes and/or covariates. Additional columns are allowed and are simply ignored.

See an example phenotype file from the test data::

   strain   fasta                            gff                    phenotype  covariate1            covariate2
   ECOR-01  test/small_fastas/ECOR-01.fasta  test/gffs/ECOR-01.gff  0          0.20035297602710966   1
   ECOR-02  test/small_fastas/ECOR-02.fasta  test/gffs/ECOR-02.gff  1          0.8798471273587852    1
   ECOR-03  test/small_fastas/ECOR-03.fasta  test/gffs/ECOR-03.gff  0          0.008404161045130532  0
   ECOR-04  test/small_fastas/ECOR-04.fasta  test/gffs/ECOR-04.gff  0          0.04728873355931962   1

.. note::
   Only the target variables/phenotypes indicated in the ``config/config.yaml`` file will be used for the associations. See :doc:`usage` for more information.

Sample's genome sequences
-------------------------

By default, the microGWAS pipeline expects assemblies with the ``.fasta`` extension. Make sure that each sample assembly file follows this naming convention before running the analysis.

.. note::
   The pipeline uses ggCaller to generate GFF annotations automatically only if the ``gff`` column is not present in the phenotype file, so you no longer need to provide GFF files for your samples. However, ggCaller can take a long time on large datasets containing more than ~2k genomes.
   If you are dealing with a large number of samples, providing pre-computed GFF files via the optional ``gff`` column is highly recommended to speed up the analysis.
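
The requirements described on this page can be checked before launching the pipeline. The following is a minimal sketch, not part of microGWAS: the helper name ``check_phenotype_file`` and its ``targets`` and ``base_dir`` parameters are hypothetical, and it assumes that relative paths in the phenotype file are resolved against the directory the pipeline is run from.

.. code-block:: python

   import csv
   from pathlib import Path

   # Columns that must always be present, according to this page.
   REQUIRED_COLUMNS = ("strain", "fasta")


   def check_phenotype_file(path, targets=("phenotype",), base_dir="."):
       """Return a list of problems found in a tab-separated phenotype file.

       Hypothetical helper for illustration only. ``targets`` lists the
       phenotype columns you intend to analyse; ``base_dir`` is the directory
       against which relative paths are resolved (an assumption).
       """
       problems = []
       with open(path, newline="") as handle:
           reader = csv.DictReader(handle, delimiter="\t")
           header = reader.fieldnames or []

           # Required columns plus every target phenotype must be in the header.
           for column in (*REQUIRED_COLUMNS, *targets):
               if column not in header:
                   problems.append(f"missing column: {column}")
           if problems:
               return problems

           # The gff column is optional; its paths are checked only if present.
           path_columns = ["fasta"] + (["gff"] if "gff" in header else [])

           for row in reader:
               strain = row["strain"]
               for column in path_columns:
                   value = row[column] or ""
                   if not value:
                       problems.append(f"{strain}: empty {column} path")
                       continue
                   # Assemblies are expected to use the .fasta extension.
                   if column == "fasta" and Path(value).suffix != ".fasta":
                       problems.append(f"{strain}: not a .fasta file: {value}")
                   file_path = Path(value)
                   if not file_path.is_absolute():
                       file_path = Path(base_dir) / file_path
                   if not file_path.is_file():
                       problems.append(f"{strain}: {column} file not found: {value}")
       return problems

Calling ``check_phenotype_file("data/data.tsv")`` from the directory where the pipeline is run returns an empty list when every required column is present and every listed file exists. Extra columns such as covariates are not inspected, which mirrors the way the pipeline ignores them.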