
I keep my references in a text file: a long list of entries, each with two (or more) fields.



The first field is the reference's URL; the second is the title, which may vary a bit depending on how the entry was made. The same goes for the third field, which may or may not be present.



I want to identify, but not remove, entries whose first field (the reference URL) is identical. I know about sort -k1,1 -u, but that automatically (non-interactively) removes all but the first hit. Is there a way to just flag them so I can choose which to retain?
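For reference, the non-interactive behaviour I mean looks like this (a sketch, assuming my references live in a space-separated file.txt):

sort -k1,1 -u file.txt

That silently keeps a single line per URL and discards the rest, with no chance to pick which one survives.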



In the extract below, three lines share the same first field (http://unix.stackexchange.com/questions/49569/). I would like to keep line 2, because it has the additional tags (sort, CLI), and delete lines 1 and 3:



http://unix.stackexchange.com/questions/49569/  unique-lines-based-on-the-first-field
http://unix.stackexchange.com/questions/49569/ Unique lines based on the first field sort, CLI
http://unix.stackexchange.com/questions/49569/ Unique lines based on the first field


Is there a program to help identify such "duplicates"? Then I can clean up manually by deleting lines 1 and 3 myself.



Answers

If I understand your question, I think that you need something like:



for dup in $(sort -k1,1 -u file.txt | cut -d' ' -f1); do grep -n -- "$dup" file.txt; done


or:



for dup in $(cut -d " " -f1 file.txt | sort | uniq -d); do grep -n -- "$dup" file.txt; done


where file.txt is the file containing the data you are interested in.



In the output you will see the line numbers and the lines whose first field appears two or more times.
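As an alternative sketch (assuming space-separated fields and the same file.txt), awk can print only the lines whose first field occurs more than once, together with their line numbers, so unique entries are not listed at all:

awk 'NR==FNR { seen[$1]++; next } seen[$1] > 1 { print FNR ": " $0 }' file.txt file.txt

The file is read twice: the first pass counts each first field, the second pass prints the line number and line for every first field seen more than once. On your extract this would list all three lines, leaving the choice of which one to keep to you.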

