Question

1

merging tab delimited txt files based on column (which is header ) in bash?

rated 0 times [ 1] [ 0] / answers: 1 / hits: 869 / 2 Years ago, fri, february 4, 2022, 3:13:59

Im having two text files which contains million records , all the records are tab delimited , how we can merge these two files based on same header(column)?

file:1

    LogEntryTime              nameId       PartnerId        

    2021-06-05T15:00:53 07    5lsddf        qyutxwr

file:2

        nameId  GroupId  compnayId

        5lsddf  l4buafm   0rd33cs

output like this:

    LogEntryTime              nameId       PartnerId    GroupId  compnayId

    2021-06-05T15:00:53 07    5lsddf        qyutxwr     l4buafm   0rd33cs

Tried this but not working:

paste file1.txt file2.txt | nameId -s $'	' -t

and

cat file1.txt file2.txt |  awk -F '	' '{print $ list the all columns name here}'

awk one which is working but need to mention all the column numbers there.

is there any other solution help me out.

thanks in advance

Answers

Only authorized users can answer the question. Please sign in first, or register a free account.

confiorc

Add To Favorites

Follow

Total Points: 42

Total Questions: 121

Total Answers: 123

Location: India

Member since Wed, Aug 26, 2020

4 Years ago

confiorc questions

1 Sudo aliased to please; how do i make please see aliases?

Fri, Jul 23, 21, 18:21, 3 Years ago

1 403 error on sudo apt-get update

Tue, Mar 22, 22, 14:25, 2 Years ago

1 Guake keyboard shortcuts no longer working

Thu, Feb 17, 22, 22:43, 2 Years ago

1 How to uninstall foxit reader from Ubuntu 20.04 using terminal?

Mon, Apr 10, 23, 07:00, 1 Year ago

1 How do I install the RTL8811au drivers?

Wed, Jun 1, 22, 01:16, 2 Years ago

View All

answered 2 Years ago mouedi · Accepted Answer

If your files are properly constructed tab separated (TSV) files, then you can use csvjoin from the Python-based csvkit package.

Ex. given:

$ head file1.tsv file2.tsv | cat -A

==> file1.tsv <==$

LogEntryTime^InameId^IPartnerId$

2021-06-05T15:00:53 07^I5lsddf^Iqyutxwr$

$

==> file2.tsv <==$

nameId^IGroupId^IcompnayId$

5lsddf^Il4buafm^I0rd33cs$

(cat -A to make the tabs visible, as ^I) then

$ csvjoin -I -t -c nameId file1.tsv file2.tsv

LogEntryTime,nameId,PartnerId,GroupId,compnayId

2021-06-05T15:00:53 07,5lsddf,qyutxwr,l4buafm,0rd33cs

To get the output back in TSV format, use csvformat from the same package:

$ csvjoin -I -t -c nameId file1.tsv file2.tsv | csvformat -T

LogEntryTime    nameId  PartnerId       GroupId compnayId

2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

Note that -I disables type inference - which can sometimes behave unexpectedly, especially with datetime fields.

Even simpler, using Miller (available from the universe repository, as package miller):

$ mlr --tsv join -f file1.tsv -j nameId then reorder -f LogEntryTime file2.tsv

LogEntryTime    nameId  PartnerId       GroupId compnayId

2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

The reorder is necessary because by default mlr join outputs the common field first (just like the system join command). Note that for unsorted input, the whole of file1.tsv will be loaded into memory.