Thursday, May 2, 2024
rated 0 times [  11] [ 0]  / answers: 1 / hits: 3725  / 2 Years ago, wed, april 20, 2022, 9:06:07

I would expect two commands that each produce the same output every time they are run on their own to also produce the same output every time they are combined in a pipeline, but apparently this is not the case for tar | gzip:



~/test$ ls
~/test$ dd if=/dev/urandom of=file bs=10000000 count=1
1+0 records in
1+0 records out
10000000 bytes (10 MB) copied, 0,877671 s, 11,4 MB/s // Creating a 10MB random file
~/test$ tar cf file.tar file // Archiving the file in a tarball
~/test$ tar cf file1.tar file // Archiving the file again in another tarball
~/test$ cmp file.tar file1.tar // Comparing the two output files
~/test$ gzip -c file > file.gz // Compressing the file with gzip
~/test$ gzip -c file > file1.gz // Compressing the file again with gzip
~/test$ cmp file.gz file1.gz // Comparing the two output files
~/test$ tar c file | gzip > file.tar.gz // Archiving and compressing the file
~/test$ tar c file | gzip > file1.tar.gz // Archiving and compressing the file again
~/test$ cmp file.tar.gz file1.tar.gz // Comparing the output files
file.tar.gz file1.tar.gz differ: byte 5, line 1 // File differs at byte 5
~/test$ cmp -i 5 file.tar.gz file1.tar.gz // Comparing the output files after byte 5
~/test$


In addition, even tar cfz file.tar file on its own produces a different output every time:



~/test$ tar cfz file2.tar.gz file // Archiving and compressing the file
~/test$ tar cfz file3.tar.gz file // Archiving and compressing the file again
~/test$ cmp file2.tar.gz file3.tar.gz // Comparing the output files
file2.tar.gz file3.tar.gz differ: byte 5, line 1 // File differs at byte 5
~/test$ cmp -i 5 file2.tar.gz file3.tar.gz // Comparing the output files after byte 5
~/test$


Splitting the pipeline into two separate steps, on the other hand, produces output that makes sense:



~/test$ gzip -c file.tar > file4.tar.gz
~/test$ gzip -c file.tar > file5.tar.gz
~/test$ cmp file4.tar.gz file5.tar.gz
~/test$


It looks like whatever happens happens only when tar's output is piped directly into gzip.



What is the explanation of this behavior?



 Answers
7

The header for the resulting gzip file is different depending on how it is called.



Gzip tries to store some origin information in the resulting file header. When called on normal files this includes the origin file name by default and a timestamp, which it gets from the original file.



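When it compresses data piped to it, there is no input file to take a name or modification time from, so gzip stores no name and records the current time as the timestamp instead. That timestamp occupies bytes 5 to 8 of the gzip header, which is why two runs made at different moments differ at byte 5.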
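The header fields described above can be inspected directly. The following is a minimal sketch, assuming GNU gzip and a POSIX od; the helper name gz_header, the temporary directory, and the sample file are made up for illustration. Per the gzip file format (RFC 1952), the byte at offset 3 is the flag byte (bit 0x08 means an original file name follows) and offsets 4 to 7 hold the timestamp.

```shell
#!/bin/sh
# Sketch: print the FLG byte (offset 3) and the 4 MTIME bytes (offsets 4-7)
# of a gzip file as hex. gz_header is a hypothetical helper, not a gzip tool.
gz_header() {
    od -An -tx1 -j3 -N5 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

workdir=$(mktemp -d)
printf 'some sample data\n' > "$workdir/file"

# Same data, three ways of handing it to gzip.
gzip -c "$workdir/file"       > "$workdir/from_file.gz"  # named input file
cat "$workdir/file" | gzip    > "$workdir/from_pipe.gz"  # input from a pipe
cat "$workdir/file" | gzip -n > "$workdir/no_name.gz"    # pipe, with -n

# First column is the flag byte, the remaining four are the timestamp.
for f in from_file from_pipe no_name; do
    printf '%-10s %s\n' "$f" "$(gz_header "$workdir/$f.gz")"
done
```

With GNU gzip one would expect the from_file line to start with 08 (name stored) followed by the file's modification time, the from_pipe line to start with 00 followed by the time of the run, and the no_name line to be all zeros.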



To verify this, try adding the -n option to the offending lines in your example:



~/temp$ tar c file | gzip -n > file1.tar.gz
~/temp$ tar c file | gzip -n > file.tar.gz
~/temp$ cmp file.tar.gz file1.tar.gz


Now the files are identical again...



From man gzip ...




   -n --no-name
          When compressing, do not save the original file name and time
          stamp by default. (The original name is always saved if the name
          had to be truncated.) When decompressing, do not restore the
          original file name if present (remove only the gzip suffix from
          the compressed file name) and do not restore the original time
          stamp if present (copy it from the compressed file). This option
          is the default when decompressing.



So the difference is indeed the original file name and time stamp information that is turned off by the -n param.
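The same idea can be applied to the tar cfz case from the question, where tar starts gzip itself. This is only a sketch under assumptions: the first variant uses the explicit pipeline with gzip -n, and the second assumes a GNU tar recent enough to accept arguments inside -I (--use-compress-program). The file names and temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: build the same .tar.gz twice and check that the results match.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some sample data\n' > file

# Variant 1: explicit pipeline, gzip -n omits name and timestamp.
tar cf - file | gzip -n > a.tar.gz
sleep 1   # let the clock advance so an embedded timestamp would differ
tar cf - file | gzip -n > b.tar.gz

# Variant 2: let tar run the compressor.
# Assumption: GNU tar that accepts options in the -I command string.
tar -I 'gzip -n' -cf c.tar.gz file
sleep 1
tar -I 'gzip -n' -cf d.tar.gz file

cmp a.tar.gz b.tar.gz && echo "pipeline: identical"
cmp c.tar.gz d.tar.gz && echo "tar -I: identical"
```

This only holds while the input file and its metadata stay unchanged between runs, since tar records the file's own modification time inside the archive.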


[#21243] Friday, April 22, 2022
riffnkful
