Asked: Thursday, June 9, 2022

I keep my references in a text file: a long list of entries, each with two (or more) fields.

The first field is the reference's URL; the second is the title, which may vary slightly depending on how the entry was made. The same goes for the third field, which may or may not be present.

I want to identify, but not remove, entries whose first field (the reference URL) is identical. I know about sort -k1,1 -u, but that will automatically (non-interactively) remove all but the first hit. Is there a way to just flag them so I can choose which to retain?
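For instance, on a made-up three-line file (the names here are purely illustrative), sort -k1,1 -u quietly drops all but one line per first field:

```shell
# Sample data (hypothetical): two entries share the first field "url-a".
printf '%s\n' \
  'url-a Title one' \
  'url-a Title one sort, CLI' \
  'url-b Title two' > refs.txt

# -k1,1 compares only the first field; -u then keeps a single line per key,
# without asking which of the url-a lines you would rather retain.
sort -k1,1 -u refs.txt
```

Only one of the two url-a lines survives, and you do not get to pick which one.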

In the extract below, three lines have the same first field. I would like to keep line 2, because it has additional tags (sort, CLI), and delete lines #1 and #3:

unique-lines-based-on-the-first-field Unique lines based on the first field
unique-lines-based-on-the-first-field Unique lines based on the first field sort, CLI
unique-lines-based-on-the-first-field Unique lines based on the first field

Is there a program to help identify such "duplicates"? Then I can clean up manually by deleting lines #1 and #3.

Tagged: command-line


If I understand your question, you need something like:

for dup in $(cut -d ' ' -f1 file.txt | sort | uniq -d); do grep -n -- "^$dup " file.txt; done

where file.txt is the file containing the data you are interested in. Note that uniq -d only reports adjacent duplicates, which is why the list of first fields is piped through sort first; anchoring the grep pattern with ^ and a trailing space keeps one URL from matching inside a longer one.

In the output you will see the line number and the full text of every line whose first field occurs two or more times, so you can decide for yourself which copies to keep.
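If the file is large, a two-pass awk one-liner does the same job without a shell loop. This is a sketch using a hypothetical file.txt: the first pass counts first fields, the second prints only lines whose first field occurs more than once, with their line numbers:

```shell
# Sample data (hypothetical): lines 1 and 3 share the same first field.
printf '%s\n' \
  'url-a Title one' \
  'url-b Title two sort, CLI' \
  'url-a Title one extra' > file.txt

# Pass 1 (NR==FNR): count occurrences of field 1 in an array.
# Pass 2: print line number (FNR) and text for duplicated first fields.
awk 'NR==FNR {c[$1]++; next} c[$1] > 1 {print FNR": "$0}' file.txt file.txt
# → 1: url-a Title one
# → 3: url-a Title one extra
```

Unlike uniq -d, this does not require the file to be sorted, and it preserves the original line order and numbering.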

[#26498] Answered: Saturday, June 11, 2022