Asked: Wednesday, June 1, 2022, 6:03:39

I have a huge text file (10 GB) formatted as follows (a multi-model PDB file):


Model 1
... (some text)
ENDMDL
Model 2
... (some text)
ENDMDL
Model 3
... (some text)
ENDMDL
...
Model 9999
... (some text)
ENDMDL
End

I know how to extract each model to a separate file:


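i=1                                   # start numbering at 1 (was unset)
while IFS= read -r line; do           # keep whitespace and backslashes intact
    printf '%s\n' "$line" >> "model_${i}.pdb"
    [[ $line == ENDMDL* ]] && ((i++)) # move to the next file after each ENDMDL
done < "$pdb"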

Now I need to extract only every Nth model. For example, if N=5, extract model 1, then model 6, then model 11, and so on.


Note: the number after the word Model cannot be used as a reference, because it can be duplicated (a common issue with multi-model PDB files). Models have to be counted by their position in the file.



 Answers

You could use awk, using the ENDMDL marker as the record separator and modulo arithmetic to pick the records:


awk -v skip=5 'BEGIN{ORS = RS = "\nENDMDL\n"} !((NR-1)%skip)' file.pdb
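
A multi-character RS is handled as expected by gawk and mawk, but POSIX leaves the result unspecified, so other awk implementations may behave differently. As a hedged alternative, here is a minimal sketch that reads the file line by line and counts ENDMDL lines instead. The function name every_nth_model and the counter name seen are made up for illustration, and the sketch assumes every model ends with a line starting with ENDMDL.

```shell
#!/bin/sh
# every_nth_model N FILE   (hypothetical helper name, not part of any tool)
# Prints the 1st, (N+1)th, (2N+1)th, ... model of FILE to stdout.
every_nth_model() {
    awk -v skip="$1" '
        seen % skip == 0 { print }   # seen = number of models fully read so far
        /^ENDMDL/        { seen++ }  # count by position, not by the Model number
    ' "$2"
}
```

Usage: every_nth_model 5 file.pdb > subset.pdb. The print rule runs before the counter is incremented, so the ENDMDL line of a selected model is kept.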

Or, to write each extracted model to a separate file:


awk -v skip=5 '
BEGIN {ORS = RS = "\nENDMDL\n"}
!((NR-1)%skip) {f = sprintf("model_%d.pdb", NR); print > f; close(f)}
' file.pdb
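
The same per-file extraction can also be sketched without a multi-character record separator, for awk implementations where that is not supported. This is only an illustration under assumptions: the function name split_every_nth, the optional prefix argument, and the counter seen are hypothetical, and each model is assumed to end with a line starting with ENDMDL.

```shell
#!/bin/sh
# split_every_nth N FILE [PREFIX]   (hypothetical helper name)
# Writes the 1st, (N+1)th, (2N+1)th, ... model of FILE to PREFIX<k>.pdb,
# where k is the position of the model in FILE. PREFIX defaults to "model_".
split_every_nth() {
    awk -v skip="$1" -v prefix="${3:-model_}" '
        seen % skip == 0 {
            f = prefix (seen + 1) ".pdb"   # file name from the model position
            print > f                      # truncates on first write, then appends
        }
        /^ENDMDL/ {
            if (seen % skip == 0) close(f) # avoid running out of open files
            seen++
        }
    ' "$2"
}
```

Usage: split_every_nth 5 file.pdb. Like the record-separator version, this treats any trailing text after the last ENDMDL (such as the End line) as one more record, so it gets its own file when its position happens to be selected.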

[#2082] Wednesday, June 1, 2022
tisglitter