My references are stored in a text file as a long list of entries, each with two or more fields.
The first field is the reference's URL; the second is the title, which may vary a bit depending on how the entry was made. The same goes for the third field, which may or may not be present.
I want to identify, but not remove, entries whose first field (the reference URL) is identical. I know about sort -k1,1 -u, but that automatically (non-interactively) removes all but the first hit. Is there a way to just report the duplicates so that I can choose which one to retain?
In the extract below, three lines share the same first field (http://unix.stackexchange.com/questions/49569/). I would like to keep line 2 because it has additional tags (sort, CLI) and delete lines #1 and #3:
http://unix.stackexchange.com/questions/49569/ unique-lines-based-on-the-first-field
http://unix.stackexchange.com/questions/49569/ Unique lines based on the first field sort, CLI
http://unix.stackexchange.com/questions/49569/ Unique lines based on the first field
Is there a program that helps identify such "duplicates", so that I can then clean up manually by deleting lines #1 and #3 myself?
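To make the request concrete, here is a minimal sketch of the kind of report I have in mind. It is only one possible approach, not a definitive solution. It assumes whitespace-separated fields and reads the file twice with awk. The temporary file and the example.com entry are made up for illustration and are not part of my real list.

```shell
# Hypothetical sample file; the example.com line is invented so that one entry is unique.
refs=$(mktemp)
cat > "$refs" <<'EOF'
http://unix.stackexchange.com/questions/49569/ unique-lines-based-on-the-first-field
http://unix.stackexchange.com/questions/49569/ Unique lines based on the first field sort, CLI
http://example.com/unique-entry Some other reference
http://unix.stackexchange.com/questions/49569/ Unique lines based on the first field
EOF

# Pass 1 (NR==FNR): count how often each first field occurs.
# Pass 2: print, prefixed with its line number, every line whose first field occurs more than once.
dupes=$(awk 'NR==FNR { count[$1]++; next } count[$1] > 1 { print FNR ": " $0 }' "$refs" "$refs")
printf '%s\n' "$dupes"
rm -f "$refs"
```

Nothing is deleted here; the line numbers in the output are only meant to make it easy to find the entries in an editor and decide which one to keep.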