Wednesday, May 8, 2024
Tuesday, July 27, 2021, 1:51:35

I have a server that calculates a checksum whenever the contents of a specific folder are modified.

The problem is that calculating the checksum takes about 30 minutes, because it rehashes every single file in the folder even if only a simple text file has been modified. While the checksum is running, the files must not be used.

The checksum is calculated with the following command:

find . -type f | xargs md5sum > some_file
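
For comparison, here is a null-delimited variant of the same command, sketched under the assumption of GNU find, xargs and md5sum. xargs normally splits its input on blanks and newlines, so a name such as "my file.txt" would reach md5sum as two arguments; -print0 with xargs -0 avoids that. Also, if some_file is written inside the folder, the shell creates it before find runs, so it ends up hashed into its own list; the sketch keeps the list outside. The temporary directory and file names are hypothetical stand-ins.

```bash
#!/bin/bash
# Sketch: build an md5 list for a folder with null-delimited names, then
# verify it. Assumes GNU find, xargs and md5sum (coreutils).
set -euo pipefail

dir=$(mktemp -d)     # hypothetical stand-in for the real folder
list=$(mktemp)       # checksum list, kept OUTSIDE the folder
trap 'rm -rf "$dir" "$list"' EXIT

printf 'hello\n' > "$dir/a.txt"
printf 'world\n' > "$dir/name with spaces.txt"

# Same idea as: find . -type f | xargs md5sum > some_file
# but -print0 / xargs -0 pass names containing spaces through unchanged.
(cd "$dir" && find . -type f -print0 | xargs -0 md5sum) > "$list"
cat "$list"

# Re-check every listed file against its stored hash.
if (cd "$dir" && md5sum -c --quiet "$list"); then
    echo "all files OK"
fi
```

md5sum -c re-reads such a list and reports every file whose content no longer matches; with --quiet it prints only the failures.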



Every day, new files are added to the folder and others are deleted.

Is there a way to update the checksum file only for the modified, added, or deleted files, without recalculating the md5 of all the other files?

Edit: Clarification

The checksum file needs to contain an md5 for every single file in that folder. What I am trying to achieve is a way to edit/update the checksum file when something changes in the folder:

  1. Remove the md5 of a file when it is deleted
  2. Add an md5 for a file when it is added to the folder
  3. Update the hash when a file is modified

All of this without recalculating the entire folder from scratch.
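
The three cases in the list above can be told apart without hashing anything, by comparing the paths recorded in the existing list with the paths currently on disk. The sketch below uses comm on sorted path lists (a set-difference technique, not something taken from the question) and approximates "modified" as "mtime newer than the list file". It assumes GNU coreutils, file names without newlines, and list lines in md5sum's default "hash, two spaces, path" layout with paths as printed by "find ." inside the folder; classify and all file names are hypothetical. Only the added and modified paths would then need md5sum.

```bash
#!/bin/bash
# Sketch: sort paths into deleted / added / modified relative to an existing
# md5sum list, without hashing anything. Assumptions: GNU coreutils, no
# newlines in file names, list lines in md5sum's default "HASH  PATH" form
# with paths as printed by "find ." inside the folder.
set -euo pipefail

# classify DIR LIST  (hypothetical helper)
classify() {
    local dir=$1 list=$2 old now f
    old=$(mktemp); now=$(mktemp)

    cut -c35- "$list" | LC_ALL=C sort > "$old"          # skip 32 hex chars + 2 spaces
    (cd "$dir" && find . -type f -print) | LC_ALL=C sort > "$now"

    echo "deleted:";  LC_ALL=C comm -23 "$old" "$now"   # only in the list
    echo "added:";    LC_ALL=C comm -13 "$old" "$now"   # only on disk
    echo "modified:"                                     # in both, newer than the list
    LC_ALL=C comm -12 "$old" "$now" | while IFS= read -r f; do
        if [ "$dir/$f" -nt "$list" ]; then echo "$f"; fi
    done
    rm -f "$old" "$now"
}

# Demo with throw-away files.
dir=$(mktemp -d); list=$(mktemp)
trap 'rm -rf "$dir" "$list"' EXIT
echo a > "$dir/keep"; echo b > "$dir/gone"; echo c > "$dir/change"
(cd "$dir" && find . -type f -print0 | xargs -0 md5sum) > "$list"
touch -d '1 hour ago' "$list" "$dir"/*   # same old mtime for list and files

rm "$dir/gone"
echo new > "$dir/new"
echo c2 > "$dir/change"

classify "$dir" "$list"
```

Because "added" comes from the path comparison rather than from timestamps, files moved or copied in with their old mtime are still caught; a content change that leaves the old mtime in place would not be.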


 Answers

This is a very rough script that tries to do what you want. Feel free to copy, modify, and optimize it; it would be nice to hear back whether it works for you. I have tested it on my "Downloads" folder and found only one remaining error (a filename containing [, which grep did not like).
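
The grep problem comes from a pattern like "^$filename$" being treated as a regular expression, so characters such as [ . * in a file name are interpreted instead of matched literally. A small demonstration, assuming GNU grep; the file names are made up:

```bash
#!/bin/bash
# Sketch: a file name containing "[" breaks   grep "^$filename$"
# but not a fixed-string, whole-line match. Assumes GNU grep.
list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf '%s\n' './photo [1].jpg' './notes.txt' > "$list"

filename='./photo [1].jpg'

# As a regex, "[1]" is a bracket expression matching the single character "1",
# so the literal name is not found. (A lone "[" would even make grep fail
# with an "Unmatched [" error.)
if grep -q -e "^$filename$" "$list"; then regex=found; else regex=missing; fi

# -F: treat the pattern as a fixed string; -x: it must match the whole line.
if grep -qFx -e "$filename" "$list"; then fixed=found; else fixed=missing; fi

echo "regex: $regex, fixed: $fixed"   # prints "regex: missing, fixed: found"
```

The same -F -x combination works for the "has the file been modified?" check in the script.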



Edit: I modified the source again. Creating timestamps (as the first version did) is no longer needed, because new and modified files are now found with find -newer. I also added parameters to set the name of the hash file and, optionally, the top folder to start from, so the script does not have to be called from the top directory.
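
"find -newer REF" selects files whose modification time is later than that of REF, which is how new and edited files can be picked out without hashing everything. A sketch assuming GNU find and GNU touch (for -d with relative dates); all names are hypothetical. It also shows a limitation: a file moved with mv keeps its old mtime and is therefore not reported.

```bash
#!/bin/bash
# Sketch: which files "find -newer REF" reports. Assumes GNU find and
# GNU touch (for -d with relative dates); all names are hypothetical.
set -euo pipefail
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

echo same > same.txt
echo changed > changed.txt
echo moved > moved.txt
touch -d '2 hours ago' same.txt changed.txt moved.txt   # existed before...
touch -d '1 hour ago' hashes.md5                        # ...the hash file was written

echo more >> changed.txt                 # edited after the hash file: mtime = now
echo new > new.txt                       # created after the hash file
mkdir sub && mv moved.txt sub/           # rename keeps the old mtime

# Only regular files modified after hashes.md5 are listed.
newer=$(find . -type f -newer hashes.md5 | LC_ALL=C sort)
echo "$newer"                            # ./changed.txt and ./new.txt
```

Files that keep an old mtime (mv, cp -p, tar -x) are neither found by -newer nor present in the old hash file, so a complete solution would also need to add paths that exist on disk but are missing from the list.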



#!/bin/bash
#
# Script to create md5 hashes for files in and below the current directory
# or the directory passed at the command line.
# In the first run, create the sums for all files.
# In later runs,
# - if the files have not changed, keep the entries
# - if the files have been deleted, forget the entry
# - if the files are new or have changed, create a new md5 hash.
#
# Rough version - should be optimized
#

if [ $# -lt 1 ] ; then
    echo "Usage:"
    echo "$0 <hashfile> [<topdir>]"
    echo
    exit 1
fi

HASHFILE=$1
TOPDIR='.'
if [ $# -eq 2 ] ; then TOPDIR=$2; fi

BACKFILE=$HASHFILE.bck
TMPFILE=$HASHFILE.tmp

# Keep the script's own files out of the hash list in case they live inside
# $TOPDIR (this skips every file with one of these names anywhere in the tree).
EXCLUDE=( ! -name "${HASHFILE##*/}" ! -name "${BACKFILE##*/}" ! -name "${TMPFILE##*/}" )

# In the first run, we create $HASHFILE if it does not exist or is empty.
# You have to make sure that $HASHFILE does not contain any garbage for the first run!!

if [ ! -s "$HASHFILE" ]; then
    echo -n "Creating $HASHFILE for the first time..."
    find "$TOPDIR" -type f "${EXCLUDE[@]}" -print0 | xargs -0 -r md5sum > "$HASHFILE"
    echo "done."
    exit
fi

# In later runs, we proceed to find the differences.
# First, find the files modified since $HASHFILE was last written.
# Note: -newer compares modification times, so files moved or copied in with
# their old mtime preserved (mv, cp -p, tar -x) are not picked up here.

find "$TOPDIR" -type f "${EXCLUDE[@]}" -newer "$HASHFILE" -print > "$TMPFILE"

# Now save the old file and start a new one with the new or modified files

mv "$HASHFILE" "$BACKFILE"
: > "$HASHFILE"
echo -n "Processing new or modified files ..."
while IFS= read -r filename ; do
    md5sum "$filename" >> "$HASHFILE"
done < "$TMPFILE"
echo "done."

# Now walk through the old file and copy unchanged entries to the new file

while read -r md5 filename ; do
    # Does the file still exist?
    if [ -f "$filename" ] ; then
        # Has the file been modified? -F -x matches the whole line literally,
        # so names containing [ or other regex characters are safe.
        if grep -qFx -e "$filename" "$TMPFILE" ; then
            echo "$filename has changed!"
        else
            # Same layout as md5sum output: hash, two spaces, file name
            echo "$md5  $filename" >> "$HASHFILE"
            #echo "$filename has not changed."
        fi
    else
        echo "$filename has been removed!"
    fi
done < "$BACKFILE"

# We now may delete temporary files
# rm "$BACKFILE"
# rm "$TMPFILE"

exit
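
The script's own comment says it is a rough version that should be optimized. One possible optimization, sketched here as an assumption rather than a tested replacement, is to load the list of changed files into a bash associative array (bash 4 or later) once, instead of running one grep per hash entry; this also sidesteps regular-expression surprises in file names. merge_unchanged and the file names are hypothetical, and the "hashes" in the demo are placeholders, not real md5 values.

```bash
#!/bin/bash
# Sketch: the "copy unchanged entries" step using a bash associative array
# (bash >= 4) instead of one grep per entry. Names are hypothetical.
set -euo pipefail

# merge_unchanged OLDLIST CHANGEDLIST NEWLIST
# Appends each OLDLIST entry (md5sum format) to NEWLIST if the file still
# exists and its path is not listed in CHANGEDLIST (one path per line).
merge_unchanged() {
    local oldlist=$1 changedlist=$2 newlist=$3 md5 filename
    local -A changed=()
    while IFS= read -r filename; do
        if [ -n "$filename" ]; then changed["$filename"]=1; fi
    done < "$changedlist"

    while read -r md5 filename; do
        if [ -f "$filename" ] && [ -z "${changed[$filename]+x}" ]; then
            printf '%s  %s\n' "$md5" "$filename" >> "$newlist"
        fi
    done < "$oldlist"
}

# Demo: placeholder "hashes" 111/222/333 stand in for real md5 values.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
echo a > keep.txt
echo b > changed.txt                     # deleted.txt is never created
printf '%s  %s\n' 111 ./keep.txt 222 ./changed.txt 333 ./deleted.txt > old.md5
printf '%s\n' ./changed.txt > changed.lst
: > new.md5
merge_unchanged old.md5 changed.lst new.md5
cat new.md5                              # prints: 111  ./keep.txt
```

Each input file is read once, so the cost no longer grows with the number of hash entries times the number of changed files.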

Tuesday, July 27, 2021
torlim
