Asked Sun, May 15, 2022, 8:49:34

I have a directory containing about 8 GB of small individual files. I want to organize them into directories by date. The creation date is part of the filename.



Here is an example of a filename:



4RCPBlockCoverLtrednalaserfalse07-10-2012-11-50-14-656.doc.gz


I want the docs directory setup like this:



docs_by_date
    2013
        01
        02
        03
        04


If the destination directory does not exist it should be created. After verification of a successful copy the source file should be removed.



I'm no super guru with bash, and there are a lot of symbols whose meaning I still don't know, so an explanation of what the script is doing would be great.



 Answers
1

I've made the assumption that for the file 07-10-2012-11-50-14-656.doc.gz, you want it sorted by year (i.e. 2012) and month (i.e. 10).



#!/usr/bin/env bash
# This is the preferred way of invoking a bash script, and is better than #!/bin/bash for reasons of portability.
# To use the script, make it executable with `chmod u+x /path/to/script`
# Then run this script with `/path/to/script /path/to/original/files /path/to/docs_by_date`

# Let's set up some path variables. This script will transfer files from the directory specified by the first argument to the directory specified by the second.
pathtooriginalfiles=$1
pathtotarget=$2

# Let's iterate over the files in the original directory. The glob "${pathtooriginalfiles}"/* expands to every non-hidden entry and, unlike $(ls ...), keeps filenames that contain spaces intact. The block between do and done is repeated with $i set to each path in turn.
for i in "${pathtooriginalfiles}"/*; do
# Strip the leading directory so only the filename is left. ${i##*/} removes everything up to and including the last slash.
name=${i##*/}
# Find the parts of the filename that specify the date by piping it to sed. The regex looks for "everything at the beginning" ^.*, followed by two digits [0-9]{2}, a hyphen, four digits, etc. before ending with .doc.gz. It then replaces the entire string with what matched between () using the \1 backreference, i.e. the year or month. With -n and the p flag, sed prints only when the substitution happened, so a filename that doesn't match gives an empty string.
year=$(printf '%s\n' "${name}" | sed -rn 's/^.*[0-9]{2}-([0-9]{4})-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{3}\.doc\.gz$/\1/p')
month=$(printf '%s\n' "${name}" | sed -rn 's/^.*([0-9]{2})-[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{3}\.doc\.gz$/\1/p')
# Skip anything whose name doesn't contain a date in the expected format.
if [ -z "${year}" ] || [ -z "${month}" ]; then continue; fi

# Create the directory if it doesn't exist already, then copy into it.
mkdir -p "${pathtotarget}/${year}/${month}"
cp "${i}" "${pathtotarget}/${year}/${month}"
done
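
To see what the sed expression does on its own, here is a small sketch that runs it on the example filename from the question. The function name extract_year_month is made up for illustration, and it captures both parts in one pass (\1 is the month, \2 is the year) instead of calling sed twice. It uses -E, which GNU sed treats the same as -r.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "YEAR MONTH" for a filename in the expected
# format, or nothing at all if the name does not match.
extract_year_month() {
    # -n plus the trailing p means "print only if the substitution happened".
    # The dots in .doc.gz are escaped so they match a literal dot.
    printf '%s\n' "$1" | sed -En 's/^.*([0-9]{2})-([0-9]{4})-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{3}\.doc\.gz$/\2 \1/p'
}

extract_year_month "4RCPBlockCoverLtrednalaserfalse07-10-2012-11-50-14-656.doc.gz"
# prints: 2012 10
```

Trying a few of your real filenames this way before running the whole script is a cheap check that the year and month come out the way you expect.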


Also, I haven't coded exactly what you requested. You asked for the script to verify that each copy succeeded and then delete the original automatically. Instead, this script just copies the files and leaves the originals alone. I'd recommend that you "test" the result yourself manually to make sure it does what you think it should, rather than relying on the script to do that itself. (Any bugs in the copying part would probably be replicated in the checking part.) If you really want the script to remove originals, then just change the cp part to mv instead. (I feel that mv is cleaner than copying and deleting anyway. One reason is that cp doesn't checksum, although you could use rsync -a instead.)
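
If you do want the copy, verify, then delete behaviour from the question, one possible sketch is below. It is not part of the script above: the function name copy_verify_remove is made up for illustration, and it assumes that a byte-for-byte comparison with cmp is an acceptable form of verification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one file into a directory, compare the copy
# with the original byte for byte, and remove the original only if the
# two are identical.
copy_verify_remove() {
    local src=$1
    local destdir=$2
    local dest="${destdir}/${src##*/}"   # destination dir + bare filename

    mkdir -p "${destdir}" || return 1
    cp -- "${src}" "${dest}" || return 1

    # cmp -s is silent and exits with status 0 only when the files match.
    if cmp -s -- "${src}" "${dest}"; then
        rm -- "${src}"
    else
        echo "verification failed, keeping ${src}" >&2
        return 1
    fi
}
```

It would be called in place of the cp line, with the path of the source file as the first argument and the destination directory as the second. Comparing reads every file a second time, so expect it to be noticeably slower than a plain mv.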


Answered Tuesday, May 17, 2022
horicgly
