Sunday, May 19, 2024
Asked Fri, December 16, 2022, 10:32:17

How can I change all the lines in a fasta file from this:


>vsearch_derep1;size=1 331 95 544  TRINITY_DN40607_c0_g1_i1 len=2000path=[0:0-1097]
ATGGGATTAACTGGTAAGTTAATTGCTGCAATAGAGTTTAAGGCTGGTGGTGATGTTTTC
CATGAGCTGTTCAGGCACAAGCCACAACATTTATCCACAGTAAGCTCTGAGAAAGTACAA

To this:


>TRINITY_DN40607_c0_g1_i1
ATGGGATTAACTGGTAAGTTAATTGCTGCAATAGAGTTTAAGGCTGGTGGTGATGTTTTC
CATGAGCTGTTCAGGCACAAGCCACAACATTTATCCACAGTAAGCTCTGAGAAAGTACAA

That means I would like to remove anything between ">" and "TRINITY_", and anything after "TRINITY_DN40607_c0_g1_i1". Please note that the "1" after "i" varies throughout the fasta file.


I will appreciate your help



 Answers

With awk like so:


awk '{for (i=1; i<=NF; ++i) {if ($i ~ "TRINITY_") {$0=">"$i}}}1' file.fasta

If TRINITY_ is found in a field, the whole line is replaced with that field preceded by >, and then every line is printed. The command above does not edit the original file file.fasta; it only writes the result to standard output. To save the output to a file such as output.fasta instead, redirect it:


awk '{for (i=1; i<=NF; ++i) {if ($i ~ "TRINITY_") {$0=">"$i}}}1' file.fasta > output.fasta

Or edit the original file in place (the original file will be modified) with gawk, which needs version 4.1.0 or later for -i inplace:


gawk -i inplace '{for (i=1; i<=NF; ++i) {if ($i ~ "TRINITY_") {$0=">"$i}}}1' file.fasta
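
The commands above test every line of the file. As a minimal sketch of a slightly stricter variant, the example below only rewrites header lines (those starting with >) and stops at the first field that begins with TRINITY_. The file names sample.fasta and sample_renamed.fasta, and the second record in the sample, are made up for illustration; only the first record comes from the question.

```shell
# Build a small sample file. The first record is from the question;
# the second record (i12) is a hypothetical one added to show a varying suffix.
cat > sample.fasta <<'EOF'
>vsearch_derep1;size=1 331 95 544  TRINITY_DN40607_c0_g1_i1 len=2000path=[0:0-1097]
ATGGGATTAACTGGTAAGTTAATTGCTGCAATAGAGTTTAAGGCTGGTGGTGATGTTTTC
CATGAGCTGTTCAGGCACAAGCCACAACATTTATCCACAGTAAGCTCTGAGAAAGTACAA
>vsearch_derep2;size=1 12 34 56  TRINITY_DN40607_c0_g1_i12 len=500
ATGC
EOF

# Only header lines are examined; the first field starting with TRINITY_
# becomes the new header. The trailing "1" prints every line, changed or not.
awk '/^>/ {
    for (i = 1; i <= NF; ++i)
        if ($i ~ /^TRINITY_/) { $0 = ">" $i; break }
}
1' sample.fasta > sample_renamed.fasta

cat sample_renamed.fasta
```

Restricting the rule to header lines makes no difference for plain nucleotide sequences, which contain no underscore, but it keeps the loop from running on every sequence line in a large file.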

Answered Sunday, December 18, 2022
ainubt
