Build PHYLIP supermatrix and create RAxML partition file using aligned fasta or phylip files.

supermat(infiles, outfile = "supermat.out.phy",
         partition.file = "gene_partition.txt")

Arguments

infiles

a character string vector for phylip or aligned fasta file.

outfile

the name of the PHYLIP supermatrix

partition.file

partition data summary describing the genes.

Details

Supermatrix here means a phylip file with combined aligned sequences. The missing sequences should be replaced with either "?" or "-".

Value

A list containing: (1)supermat.dat:a list containing all the data frames read by read.phylip or read.fasta (2)res.super.dat: a data frame containing the sequences and the names (3)partition.dat: summary for all the fasta or phylip files (4)partition.dat.vector: character string vector for the partition file for RAxML

References

Kress, W. J., Erickson, D. L., Jones, F. A., Swenson, N. G., Perez, R., Sanjur, O., & Bermingham, E. (2009). Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proceedings of the National Academy of Sciences, 106(44), 18621-18626.

de Queiroz, A.and Gatesy, J. (2007). The supermatrix approach to systematics. Trends in Ecology & Evolution, 22(1), 34-41.

https://github.com/stamatak/standard-RAxML

Author

Jinlong Zhang <jinlongzhang01@gmail.com>

Note

Punctuation characters and white space in the names of the sequences will be replaced by "_". More information can be found at regex. Type of the sequence in the RAxML partition file should be changed manually according to the manual of RAxML.

See also

Examples

cat("6 22", "seq_1 --TTACAAATTGACTTATTATA", "seq_2 GATTACAAATTGACTTATTATA", "seq_3 GATTACAAATTGACTTATTATA", "seq_5 GATTACAAATTGACTTATTATA", "seq_8 GATTACAAATTGACTTATTATA", "seq_10 ---TACAAATTGAATTATTATA", file = "matk.phy", sep = "\n") cat("5 15", "seq_1 GATTACAAATTGACT", "seq_3 GATTACAAATTGACT", "seq_4 GATTACAAATTGACT", "seq_5 GATTACAAATTGACT", "seq_8 GATTACAAATTGACT", file = "rbcla.phy", sep = "\n") cat("5 50", "seq_2 GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA", "seq_3 GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG", "seq_5 GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC", "seq_8 ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG", "seq_9 ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA", file = "trn1.phy", sep = "\n") supermat(infiles = c("matk.phy", "rbcla.phy", "trn1.phy"))
#> Supermatrix "supermat.out.phy" and RAxML partition file "gene_partition.txt" have been saved to: #> /Users/jinlong/Documents/github/phylotools/docs/reference
unlink(c("matk.phy", "rbcla.phy", "trn1.phy")) unlink(c("supermat.out.phy","gene_partition.txt")) cat( ">seq_1", "--TTACAAATTGACTTATTATA", ">seq_2", "GATTACAAATTGACTTATTATA", ">seq_3", "GATTACAAATTGACTTATTATA", ">seq_5", "GATTACAAATTGACTTATTATA", ">seq_8", "GATTACAAATTGACTTATTATA", ">seq_10", "---TACAAATTGAATTATTATA", file = "matk.fasta", sep = "\n") cat( ">seq_1", "GATTACAAATTGACT", ">seq_3", "GATTACAAATTGACT", ">seq_4", "GATTACAAATTGACT", ">seq_5", "GATTACAAATTGACT", ">seq_8", "GATTACAAATTGACT", file = "rbcla.fasta", sep = "\n") cat( ">seq_2", "GTCTTATAAGAAAGAATAAGAAAG--AAATACAAA-------AAAAAAGA", ">seq_3", "GTCTTATAAGAAAGAAATAGAAAAGTAAAAAAAAA-------AAAAAAAG", ">seq_5", "GACATAAGACATAAAATAGAATACTCAATCAGAAACCAACCCATAAAAAC", ">seq_8", "ATTCCAAAATAAAATACAAAAAGAAAAAACTAGAAAGTTTTTTTTCTTTG", ">seq_9", "ATTCTTTGTTCTTTTTTTTCTTTAATCTTTAAATAAACCTTTTTTTTTTA", file = "trn1.fasta", sep = "\n") supermat(infiles = c("matk.fasta", "rbcla.fasta", "trn1.fasta"))
#> Supermatrix "supermat.out.phy" and RAxML partition file "gene_partition.txt" have been saved to: #> /Users/jinlong/Documents/github/phylotools/docs/reference
unlink(c("matk.fasta", "rbcla.fasta", "trn1.fasta")) unlink(c("supermat.out.phy","gene_partition.txt"))