Question

Removing a variable pattern from the title of my files

answers: 1 / hits: 1467 / Sun, June 12, 2022, 8:56:28

I finally received my NGS sequencing data and have been using Ubuntu for a few days to analyse it. However, I lack the basics of shell scripting, and I feel overwhelmed by this whole new language.

I managed to follow pipelines, but there are still beginner issues.

Specifically, I have a folder with 96 files that I want to rename. They are typically of the form:

AD18_S1_R2_cat_trimmed.fastq.gz

AD19_S26_R2_cat_trimmed.fastq.gz

Basically, I am trying to delete the sample ID from each name, for instance _S1 and _S26.
I recently discovered asterisks and used them successfully for a previous task, but I have trouble imagining how to use them here.
What I think would work is to find the part between _S and _R and remove it, while keeping the R.

If the sample ID always had the same length, I would have used [5-7] to remove those characters from the name. But the length varies, so that won't work for some samples.

I want to understand how to do this, more than just having the answer. So, would you kindly explain to me how to make this change, and what your code means if you agree to share a solution?

Answers

iething

answered 2 Years ago chilgirlguid · Accepted Answer

mmv is a nice tool for this. It is not installed by default, so you can install it using:

sudo apt install mmv

Then just run the following command in the directory where you keep the files:

mmv -n '*_*_R2_cat_trimmed.fastq.gz' '#1_R2_cat_trimmed.fastq.gz'

A simplified explanation:

-n (no-execute) is used so that you get a preview of the changes without them getting applied. If you are satisfied with the output, rerun the command without the -n flag.

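You want to remove whatever sits between the first and the second _, so the first argument of mmv ('*_*_R2_cat_trimmed.fastq.gz') is a pattern that matches all of your files. Note that it only covers names ending in _R2_cat_trimmed.fastq.gz, like the ones in your examples.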

The star is a wildcard that means "match any string of characters". So we match any string up to the first _, then any string between the first and second _, and we leave the rest of the file name as is.

The second argument ('#1_R2_cat_trimmed.fastq.gz') basically says "rename using the first match" (#1) and the rest is just the part of the string that we leave as is. Since we didn't use the second match (#2), we effectively removed it.
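To see the same idea without mmv, here is a minimal sketch that uses only the shell's parameter expansion. The function name strip_sample_id is made up for this illustration and is not part of mmv; the sketch assumes every name has the prefix_sampleID_rest shape shown in the question.

```shell
# Hypothetical helper (not part of mmv): rebuilds a file name without its
# second underscore-separated field, which is the sample ID here.
strip_sample_id() {
    local name="$1"
    # %%_*  removes the longest suffix starting at a "_",
    # leaving the text before the first "_" (what mmv calls #1).
    local first="${name%%_*}"
    # #*_*_ removes the shortest prefix that contains two "_",
    # leaving everything after the second "_" (the part kept as is).
    local rest="${name#*_*_}"
    printf '%s_%s\n' "$first" "$rest"
}

strip_sample_id AD18_S1_R2_cat_trimmed.fastq.gz
strip_sample_id AD19_S26_R2_cat_trimmed.fastq.gz
```

This only prints the new names and renames nothing, so it is safe to experiment with. Because the expansions cut at underscores rather than at fixed positions, the length of the sample ID does not matter.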

By default, mmv performs the renames silently. If you want each rename to be reported as it is made, you can use the -v (verbose) flag.

For more info about mmv, you can consult its manpage, by running man mmv in your terminal.

Note: Before running any command, always test it on a subset of your files to make sure that it works as you want and that you don't lose any files. It's also good to always keep a backup of the original files.
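One way to follow that advice is to practise in a throwaway directory filled with empty files that carry the same names. The sketch below does this with a plain loop instead of mmv; the variable names scratch, base and new are chosen for this example only, and it assumes the mv on your system accepts -n (the one shipped with Ubuntu does).

```shell
# Create a scratch directory with empty stand-ins for two of the real files.
scratch=$(mktemp -d)
touch "$scratch/AD18_S1_R2_cat_trimmed.fastq.gz" \
      "$scratch/AD19_S26_R2_cat_trimmed.fastq.gz"

for f in "$scratch"/*_*_R2_cat_trimmed.fastq.gz; do
    base="${f##*/}"                        # file name without the directory
    new="${base%%_*}_${base#*_*_}"         # drop the second "_" field
    mv -n -- "$f" "$scratch/$new"          # -n: never overwrite an existing file
done

# Inspect the result; the real data has not been touched.
ls "$scratch"
```

The same scratch directory can also be used to try the mmv command from above before running it on the real data.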