Friday, May 3, 2024
Saturday, February 26, 2022, 10:05:22 / answers: 1

I'm on Ubuntu 16.04.



I have a folder with a lot of text files (almost 12k).
I need to upload them all to a website that accepts .tar.gz uploads and then decompresses them automatically, but has a limit of 10MB (10000KB) per file (so in particular each file has to be decompressed on its own).
If I tar.gz all these files, the resulting file is about 72MB.



What I would like to do is to create eight .tar.gz files, each of size (strictly) smaller than 10000KB.



Alternatively, one can assume that all the files above have approximately the same size, so I would like to create eight .tar.gz files with more or less the same number of files each.



How can I do either of these two tasks?



I am perfectly fine with a solution that involves GUI, CLI or scripting. I am not looking for speed here, I just need it done.



 Answers

This is patchwork and a quick, rough sketch, but tested on a directory with 3000 files, the script below did the job extremely fast:



#!/usr/bin/env python3
import subprocess
import os
import sys

# number of equally sized groups; a remainder goes into one extra archive
splitinto = 2

dr = sys.argv[1]
os.chdir(dr)

# list the current directory (we just cd'ed into it), so a relative path works too
files = os.listdir(".")
n_files = len(files)
size = n_files // splitinto

def compress(tar, files):
    # --null comes before -T so the names read from stdin are taken as NUL-terminated
    command = ["tar", "-zcvf", "tarfile" + str(tar) + ".tar.gz", "--null", "-T", "-"]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    with proc:
        proc.stdin.write(b'\0'.join(map(os.fsencode, files)))
        proc.stdin.write(b'\0')
    if proc.returncode:
        sys.exit(proc.returncode)

sub = []; tar = 1
for f in files:
    sub.append(f)
    if len(sub) == size:
        compress(tar, sub)
        sub = []; tar += 1

if sub:
    # taking care of left
    compress(tar, sub)


How to use




  • Save it into an empty file as compress_split.py

  • In the head section, set splitinto to the number of archives to create. Unless the number of files divides evenly, there will be one more archive to take care of the remaining few "leftovers".

  • Run it with the directory with your files as argument:



    python3 /path/to/compress_split.py /directory/with/files/to/compress



Numbered .tar.gz files (tarfile1.tar.gz, tarfile2.tar.gz, and so on) will be created in the same directory as the files.



Explanation



The script:




  • lists all files in the directory

  • cd's into the directory to prevent adding the path info to the tar file

  • reads through the file list, grouping them by the set division

  • compresses the sub group(s) into numbered files
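
The grouping step described above can be sketched in isolation, without calling tar. This is only an illustration: the function name split_by_count is hypothetical, and the max(1, ...) guard against a zero group size is an addition that the script itself does not have.

```python
def split_by_count(names, splitinto):
    """Group names into fixed-size groups plus a remainder group."""
    # Same division as in the script; the max(1, ...) guard is an
    # assumption added here so that the slicing step is never zero.
    size = max(1, len(names) // splitinto)
    return [names[i:i + size] for i in range(0, len(names), size)]


names = ["file%d.txt" % i for i in range(10)]
groups = split_by_count(names, 3)
print([len(g) for g in groups])  # prints [3, 3, 3, 1]
```

With 10 files and splitinto set to 3, this yields three groups of three files and a fourth group holding the single leftover file, which is why the number of archives can be one higher than the number set.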






EDIT



Automatically create chunks by size in MB



A more sophisticated approach is to pass the maximum size of the chunks (in MB) as a second argument. In the script below, a chunk is written to a compressed file as soon as it reaches (or passes) the threshold.



Because a chunk is written only once its running total reaches or passes the threshold, each chunk can overshoot by up to the size of one file. This therefore only works well if every individual file is substantially smaller than the chunk size.



The script:



#!/usr/bin/env python3
import subprocess
import os
import sys

dr = sys.argv[1]
# maximum size of the uncompressed input per chunk, in MB
chunksize = float(sys.argv[2])
os.chdir(dr)

# list the current directory (we just cd'ed into it), so a relative path works too
files = os.listdir(".")
n_files = len(files)

def compress(tar, files):
    # --null comes before -T so the names read from stdin are taken as NUL-terminated
    command = ["tar", "-zcvf", "tarfile" + str(tar) + ".tar.gz", "--null", "-T", "-"]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    with proc:
        proc.stdin.write(b'\0'.join(map(os.fsencode, files)))
        proc.stdin.write(b'\0')
    if proc.returncode:
        sys.exit(proc.returncode)

sub = []; tar = 1; subsize = 0
for f in files:
    sub.append(f)
    subsize = subsize + (os.path.getsize(f)/1000000)
    if subsize >= chunksize:
        compress(tar, sub)
        sub = []; tar += 1; subsize = 0

if sub:
    # taking care of left
    compress(tar, sub)
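
The script above closes a chunk only after the running total has reached the threshold. As a rough sketch of a stricter variant, the grouping below closes a chunk before adding a file that would push it over the limit. The function name split_by_size and the demo data are hypothetical, and the sizes are uncompressed input sizes, so this says nothing definite about the size of the resulting .tar.gz files.

```python
def split_by_size(sized_names, limit):
    """Group (name, size) pairs so that no group's total exceeds limit.

    A single item larger than limit still ends up in a group of its own.
    """
    groups, current, total = [], [], 0
    for name, size in sized_names:
        # Close the current group first if this item would overflow it.
        if current and total + size > limit:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        # The remaining items form the last group.
        groups.append(current)
    return groups


demo = [("a", 4), ("b", 4), ("c", 4), ("d", 9), ("e", 1)]
print(split_by_size(demo, 10))  # prints [['a', 'b'], ['c'], ['d', 'e']]
```

In the real script, each group returned this way would be passed to compress() in turn, with the sizes taken from os.path.getsize().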


To run:



python3 /path/to/compress_split.py /directory/with/files/to/compress chunksize


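...where chunksize is the size, in MB (1 MB = 1,000,000 bytes here), of the uncompressed input for each tar command, not the size of the resulting .tar.gz file.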
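
Since chunksize refers to the input rather than the output, it may be worth checking the resulting archives against the upload limit afterwards. The following is a minimal sketch of such a check. The function name oversized_archives is hypothetical, and reading the 10000KB limit as 10000 * 1000 bytes is an assumption (the site might count a KB as 1024 bytes).

```python
import glob
import os


def oversized_archives(directory, limit_bytes):
    """Return (path, size) for each tarfile*.tar.gz at or above limit_bytes."""
    pattern = os.path.join(directory, "tarfile*.tar.gz")
    result = []
    for path in sorted(glob.glob(pattern)):
        size = os.path.getsize(path)
        if size >= limit_bytes:
            result.append((path, size))
    return result


# Assumption: 10000KB is read as 10000 * 1000 bytes.
for path, size in oversized_archives(".", 10000 * 1000):
    print(path, size)
```

If any archive is listed, rerunning the script with a smaller chunksize should reduce the archive sizes.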



In this one, the suggested improvements by @DavidFoerster are included. Thanks a lot!


[#13131] Sunday, February 27, 2022
cisccla

Total Points: 66
Total Questions: 134
Total Answers: 115

Location: Croatia
Member since Fri, Sep 11, 2020
4 Years ago
;