Group a data frame of sequences and names, and generate FASTA files

Split the data frame of sequences based on a reference table of groupings.

split_dat(dat, ref_table)

Arguments

dat	data frame generated by `read.phylip` or `read.fasta`
ref_table	data frame whose first column gives the name of each sequence and whose second column gives the group that sequence belongs to.

Details

Each group of sequences is saved to its own FASTA file, named after the group. Sequences not listed in ref_table are saved to "ungrouped.fasta".
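
The grouping described above can be approximated in base R. The following is only an illustrative sketch, not the package's implementation: it assumes `dat` has the columns `seq.name` and `seq.text`, uses made-up sequences and group labels, and writes to a temporary directory (`out_dir` is a hypothetical name) rather than the working directory.

```r
# Illustrative sketch only; not the phylotools implementation.
# Assumption: `dat` has columns seq.name and seq.text.
dat <- data.frame(
  seq.name = c("seq_1", "seq_2", "seq_3", "seq_4"),
  seq.text = c("--TTACAAAT", "GATTACAAAT", "GATTACAAAT", "---TACAAAT"),
  stringsAsFactors = FALSE
)

# Reference table: column 1 = sequence name, column 2 = group.
# seq_4 is deliberately left out so that it ends up ungrouped.
ref_table <- data.frame(
  name  = c("seq_1", "seq_2", "seq_3"),
  group = c("group1", "group1", "group2"),
  stringsAsFactors = FALSE
)

# Look up the group of each sequence; names missing from ref_table give NA.
grp <- ref_table[[2]][match(dat$seq.name, ref_table[[1]])]
grp[is.na(grp)] <- "ungrouped"

# One data frame per group.
pieces <- split(dat, grp)

# Write each piece as a FASTA file into a temporary directory.
out_dir <- tempfile("split_sketch_")
dir.create(out_dir)
for (g in names(pieces)) {
  # Interleave header lines (">name") with the sequence lines.
  fasta_lines <- as.vector(rbind(paste0(">", pieces[[g]]$seq.name),
                                 pieces[[g]]$seq.text))
  writeLines(fasta_lines, file.path(out_dir, paste0(g, ".fasta")))
}

list.files(out_dir)
```

In this sketch, unmatched names receive the label "ungrouped", so they are written to a single file in the same pass as the named groups.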

Value

This is a subroutine; there is no return value.

References

http://www.genomatix.de/online_help/help/sequence_formats.html

Author

Jinlong Zhang <jinlongzhang01@gmail.com>

Examples


  cat(
  ">seq_1",   "--TTACAAATTGACTTATTATA",
  ">seq_2",   "GATTACAAATTGACTTATTATA",
  ">seq_3",   "GATTACAAATTGACTTATTATA",
  ">seq_5",   "GATTACAAATTGACTTATTATA",
  ">seq_8",   "GATTACAAATTGACTTATTATA",
  ">seq_10",  "---TACAAATTGAATTATTATA",
  ">seq_11",  "--TTACAAATTGACTTATTATA",
  ">seq_12",  "GATTACAAATTGACTTATTATA",
  ">seq_13",  "GATTACAAATTGACTTATTATA",
  ">seq_15",  "GATTACAAATTGACTTATTATA",
  ">seq_16",  "GATTACAAATTGACTTATTATA",
  ">seq_17",  "---TACAAATTGAATTATTATA",
  file = "trnh.fasta", sep = "\n")

sequence_name <- get.fasta.name("trnh.fasta")
sequence_group <- c("group1","group1","group1","group1","group1",
"group2","group2","group2","group3","group3","group3","group3")
group <- data.frame(sequence_name, sequence_group)

fasta <- read.fasta("trnh.fasta")
split_dat(fasta, group)
#> ungrouped.fasta has been saved to  /Users/jinlong/Documents/github/phylotools/docs/reference 
#> group1.fasta has been saved to  /Users/jinlong/Documents/github/phylotools/docs/reference 
#> group2.fasta has been saved to  /Users/jinlong/Documents/github/phylotools/docs/reference 
#> group3.fasta has been saved to  /Users/jinlong/Documents/github/phylotools/docs/reference 
#> splitted fasta files have been saved to: 
#>  /Users/jinlong/Documents/github/phylotools/docs/reference 

unlink("trnh.fasta")
unlink("ungrouped.fasta")
unlink("group1.fasta")
unlink("group2.fasta")
unlink("group3.fasta")