Monday, April 29, 2024
Asked Sunday, June 12, 2022, 8:56:28 / answers: 1 / hits: 1467

I finally received my NGS sequencing data and have been using Ubuntu for a few days to analyse it. However, I lack the basics of shell scripting, and I feel overwhelmed by this whole new language.


I have managed to follow pipelines, but I still run into beginner issues.


Specifically, I have a folder with 96 files that I want to rename. They are typically of the form:


AD18_S1_R2_cat_trimmed.fastq.gz
AD19_S26_R2_cat_trimmed.fastq.gz

Basically, I am trying to delete the sample ID, for instance _S1 and _S26.
I recently discovered asterisks and used them successfully in a previous command, but I have trouble seeing how to use them here.
What I think would work is to match the expression between _S and _R and remove it, while keeping the R.


If the sample ID always had the same length, I would have used [5-7] to remove the characters from the name. But its length varies, so that won't work for some samples.


I want to understand how to do this, more than just having the answer. So, if you are willing to share a solution, could you please explain how to make this change and what your code means?



 Answers
2

mmv is a nice tool for this. It is not installed by default, so you can install it using:


sudo apt install mmv

Then just run the following command in the directory where you keep the files:


mmv -n '*_*_R2_cat_trimmed.fastq.gz' '#1_R2_cat_trimmed.fastq.gz'

A simplified explanation:



  • -n (no-execute) is used so that you get a preview of the changes without them getting applied. If you are satisfied with the output, rerun the command without the -n flag.



  • You want to remove anything you have between the first and the second _, so the first argument of mmv ('*_*_R2_cat_trimmed.fastq.gz') is a general expression for your files.


    The star is a wildcard that means "match any string of characters". So we match any string up to the first _, then any string between the first and second _, and we leave the rest of the file name as is.



  • The second argument ('#1_R2_cat_trimmed.fastq.gz') basically says "rename using the first match" (#1) and the rest is just the part of the string that we leave as is. Since we didn't use the second match (#2), we effectively removed it.
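
To make the "keep the first match, drop the second" idea concrete, here is a minimal sketch of the same logic in plain shell parameter expansion. It is not part of mmv and only roughly mirrors what the pattern above does; the function name strip_sample_id is a hypothetical helper made up for this illustration, and it only prints the new name without renaming anything.

```shell
# Hypothetical helper (not an mmv feature): print a file name with its
# second "_"-separated field removed.
strip_sample_id() {
    local name=$1
    local first=${name%%_*}   # text before the first "_"  (what mmv calls #1)
    local rest=${name#*_}     # everything after the first "_"
    local tail=${rest#*_}     # drop the next field as well (what mmv calls #2)
    printf '%s_%s\n' "$first" "$tail"
}

strip_sample_id AD18_S1_R2_cat_trimmed.fastq.gz
strip_sample_id AD19_S26_R2_cat_trimmed.fastq.gz
```

Here %% removes the longest match from the end of the string and # removes the shortest match from the start, so the length of the sample ID does not matter.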




By default, mmv applies the changes silently, without printing anything. If you want to see each rename as it is made, you can use the -v (verbose) flag.


For more info about mmv, you can consult its manpage, by running man mmv in your terminal.


Note: Before running any command, always test it on a subset of your files to make sure that it works as you want and that you don't lose any files. It's also good to always keep a backup of the original files.
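
One way to follow this advice without touching the real data is to rehearse on empty stand-in files in a throwaway directory. The sketch below is only an illustration under assumptions: the variable name scratch is arbitrary, the stand-in files are created with touch, and a plain shell loop is used in place of mmv so that it runs even where mmv is not installed. You could equally cd into the scratch directory and run the mmv -n command from above there.

```shell
# Throwaway directory holding empty stand-in files (names from the question).
scratch=$(mktemp -d)
touch "$scratch/AD18_S1_R2_cat_trimmed.fastq.gz" \
      "$scratch/AD19_S26_R2_cat_trimmed.fastq.gz"

# Same file pattern as the mmv command; already-renamed files do not match it.
for f in "$scratch"/*_*_R2_cat_trimmed.fastq.gz; do
    base=${f##*/}                       # strip the directory part
    new="${base%%_*}_${base#*_*_}"      # keep field 1, skip field 2
    echo "renaming: $base -> $new"
    mv -n -- "$f" "$scratch/$new"       # -n: never overwrite an existing file
done

ls "$scratch"
```

Once the listing looks right, the scratch directory can simply be deleted.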


[#547] Monday, June 13, 2022
iething
