Question


How can I compare two directories recursively and check if one of the directories contains the other?

Asked Mon, November 28, 2022, 3:10:51

I have two directories that contain some common files. I want to know whether one directory contains the same files as the other. I found a script on the net, but I need to improve it so that it works recursively.

  #!/bin/bash

  # cmp_dir - program to compare two directories

  # Check for required arguments
  if [ $# -ne 2 ]; then
      echo "usage: $0 directory_1 directory_2" 1>&2
      exit 1
  fi

  # Make sure both arguments are directories
  if [ ! -d $1 ]; then
      echo "$1 is not a directory!" 1>&2
      exit 1
  fi

  if [ ! -d $2 ]; then
      echo "$2 is not a directory!" 1>&2
      exit 1
  fi

  # Process each file in directory_1, comparing it to directory_2
  missing=0
  for filename in $1/*; do
      fn=$(basename "$filename")
      if [ -f "$filename" ]; then
          if [ ! -f "$2/$fn" ]; then
              echo "$fn is missing from $2"
              missing=$((missing + 1))
          fi
      fi
  done
  echo "$missing files missing"

Could anybody suggest an algorithm for this?

Asked by donurp

Answers

Accepted answer by pardsea

#!/bin/bash

# cmp_dir - program to compare two directories (recursively)

# Check for required arguments
if [ $# -ne 2 ]; then
  echo "usage: $0 directory_1 directory_2" 1>&2
  exit 1
fi

# Make sure both arguments are directories
if [ ! -d "$1" ]; then
  echo "$1 is not a directory!" 1>&2
  exit 1
fi

if [ ! -d "$2" ]; then
  echo "$2 is not a directory!" 1>&2
  exit 1
fi

# Process each file under directory_1, checking for the same
# relative path under directory_2
missing=0
while IFS= read -r -d $'\0' filename
do
  fn=${filename#"$1"}    # path relative to directory_1
  if [ ! -f "$2/$fn" ]; then
      echo "$fn is missing from $2"
      missing=$((missing + 1))
  fi
done < <(find "$1" -type f -print0)

echo "$missing files missing"
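The question's title also asks whether one directory contains the other. One way to answer that, sketched here under the assumption that "contains" means every regular file exists at the same relative path, is to wrap the same per-file check in a function and run it in both directions. The function names all_files_present and containment are made up for this example.

```bash
#!/bin/bash
# Sketch only: "contains" is assumed to mean that every regular file under one
# directory exists at the same relative path under the other. The function
# names all_files_present and containment are hypothetical.

# Return 0 if every regular file under $1 also exists under $2, else 1.
all_files_present() {
    local filename rel
    while IFS= read -r -d $'\0' filename; do
        rel=${filename#"$1"}              # path relative to $1
        [ -f "$2/$rel" ] || return 1      # stop at the first missing file
    done < <(find "$1" -type f -print0)
    return 0
}

# Print which directory (if either) contains the other.
containment() {
    local a_in_b=no b_in_a=no
    all_files_present "$1" "$2" && a_in_b=yes
    all_files_present "$2" "$1" && b_in_a=yes
    case "$a_in_b $b_in_a" in
        "yes yes") echo "$1 and $2 have the same files" ;;
        "yes no")  echo "$2 contains $1" ;;
        "no yes")  echo "$1 contains $2" ;;
        *)         echo "neither directory contains the other" ;;
    esac
}
```

Usage would be containment dir1 dir2. Like the script above, this compares only file names and paths, not file contents.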

Note that I have added double quotes around $1 and $2 in several places above to protect them from word splitting and filename expansion by the shell. Without the double quotes, directory names containing spaces or other special characters would cause errors.
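To see the problem concretely, here is a small demonstration (the directory name my dir is made up for the example). Unquoted, the path reaches [ as two separate words, so the real directory is never tested; quoted, it arrives as one argument.

```bash
#!/bin/bash
# Demo of why "$1" needs quotes. The directory name "my dir" is invented.

d="$(mktemp -d)/my dir"   # a directory whose name contains a space
mkdir "$d"

# Unquoted (deliberately wrong): $d is split into two words, so [ sees
# "-d .../my dir" and does not test the actual path.
if [ -d $d ] 2>/dev/null; then unquoted=yes; else unquoted=no; fi

# Quoted: [ receives the whole path as a single argument.
if [ -d "$d" ]; then quoted=yes; else quoted=no; fi

echo "unquoted test says directory: $unquoted"   # no
echo "quoted test says directory:   $quoted"     # yes
rm -rf "${d%/*}"                                  # remove the temporary parent
```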

The key loop now reads:

while IFS= read -r -d $'\0' filename
do
  fn=${filename#"$1"}
  if [ ! -f "$2/$fn" ]; then
      echo "$fn is missing from $2"
      missing=$((missing + 1))
  fi
done < <(find "$1" -type f -print0)

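This uses find to descend recursively into directory $1 and list every regular file in it. The construction while IFS= read -r -d $'\0' filename; do .... done < <(find "$1" -type f -print0) is safe for all file names: find -print0 ends each name with a NUL byte, the one character that cannot appear in a file name, and read -d $'\0' (equivalent to read -d '') reads up to each NUL, so names containing spaces, tabs, or even newlines arrive intact. IFS= keeps leading and trailing whitespace, and -r keeps backslashes literal. Because the loop reads from a process substitution rather than a pipe, it also runs in the current shell, so the missing counter keeps its value after the loop ends.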
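A quick way to convince yourself is to create some awkwardly named files and compare the NUL-delimited loop with a newline-based count. The file names here are invented for the demonstration.

```bash
#!/bin/bash
# Demo: names containing a space and a newline survive the
# find -print0 / read -d $'\0' loop. The file names are made up.

dir=$(mktemp -d)
mkdir "$dir/sub dir"
touch "$dir/sub dir/two words.txt" "$dir/line"$'\n'"break.txt"

count=0
while IFS= read -r -d $'\0' filename; do
    printf 'found: %q\n' "$filename"   # %q makes the embedded newline visible
    count=$((count + 1))
done < <(find "$dir" -type f -print0)

# A newline-based count sees one "name" too many, because one name contains
# a newline. $(( )) strips the padding some wc implementations add.
naive=$(( $(find "$dir" -type f | wc -l) ))

echo "NUL-delimited loop: $count files"
echo "line-based count:   $naive lines"
rm -rf "$dir"
```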

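basename is no longer used, because we are now looking at files within subdirectories and need to keep the subdirectory part of the path. So, in place of the call to basename, the line fn=${filename#"$1"} is used. This removes the leading $1 from filename, leaving the path relative to directory $1 (for example, dir1/sub/x.txt becomes /sub/x.txt). Quoting $1 inside the expansion makes the prefix match literally, even if the directory name contains characters such as * or [.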
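The effect of quoting inside ${...#...} is easy to miss, so here is a small illustration using a hypothetical directory name, backup[1], which contains glob characters.

```bash
#!/bin/bash
# Illustration of ${var#prefix} with a hypothetical directory name that
# contains glob characters.

dir1='backup[1]'
filename='backup[1]/photos/cat.jpg'   # as find "$dir1" -type f would print it

# Unquoted: $dir1 is used as a pattern, and [1] matches only the single
# character "1", so the literal text "backup[1]" is not stripped.
unquoted=${filename#$dir1}

# Quoted: the prefix is matched literally and removed.
quoted=${filename#"$dir1"}

echo "$unquoted"   # backup[1]/photos/cat.jpg
echo "$quoted"     # /photos/cat.jpg
```

The leading slash left in the result is harmless: "$2/$fn" becomes something like dir2//photos/cat.jpg, and repeated slashes inside a path are treated as a single one.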

Problem 2

Suppose instead that we match files by name, regardless of directory. In other words, if the first directory contains a file a/b/c/some.txt, we consider it present in the second directory if some.txt exists in any subdirectory of the second directory. To do this, replace the loop above with:

while IFS= read -r -d $'\0' filename
do
  fn=$(basename "$filename")
  # find prints every match under $2; grep -q . succeeds if there was at least one
  if ! find "$2" -name "$fn" | grep -q . ; then
      echo "$fn is missing from $2"
      missing=$((missing + 1))
  fi
done < <(find "$1" -type f -print0)
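The loop above runs a separate find over all of $2 for every file in $1, which can get slow for large trees. A sketch of an alternative, not part of the original answer, is to scan $2 once, remember every file name in a bash associative array, and then check each name from $1 against it. This assumes bash 4 or later (for associative arrays), and the function name missing_by_name is hypothetical. Unlike find -name, it only counts regular files in $2 and compares names literally, so a directory called some.txt, or a name containing *, will not produce a match.

```bash
#!/bin/bash
# Sketch of an alternative for Problem 2 (not from the original answer):
# scan directory_2 once and look names up in an associative array.
# Needs bash 4+ for associative arrays; missing_by_name is a hypothetical name.

missing_by_name() {
    local dir1=$1 dir2=$2 path name missing=0
    local -A seen=()

    # Pass 1: remember the base name of every regular file anywhere under dir2.
    while IFS= read -r -d $'\0' path; do
        seen["${path##*/}"]=1
    done < <(find "$dir2" -type f -print0)

    # Pass 2: report each file under dir1 whose base name was never seen.
    while IFS= read -r -d $'\0' path; do
        name=${path##*/}
        if [[ -z ${seen["$name"]+set} ]]; then
            echo "$name is missing from $dir2"
            missing=$((missing + 1))
        fi
    done < <(find "$dir1" -type f -print0)

    echo "$missing files missing"
}
```

Usage would be missing_by_name dir1 dir2. Each tree is read only once, so the cost grows with the total number of files rather than with their product.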