Friday, May 17, 2024
1
rated 0 times [  1] [ 0]  / answers: 1 / hits: 869  / 2 Years ago, fri, february 4, 2022, 3:13:59

Im having two text files which contains million records , all the records are tab delimited , how we can merge these two files based on same header(column)?


file:1


    LogEntryTime              nameId       PartnerId        
2021-06-05T15:00:53 07 5lsddf qyutxwr



file:2


        nameId  GroupId  compnayId
5lsddf l4buafm 0rd33cs


output like this:


    LogEntryTime              nameId       PartnerId    GroupId  compnayId
2021-06-05T15:00:53 07 5lsddf qyutxwr l4buafm 0rd33cs

Tried this but not working:


paste file1.txt file2.txt | nameId -s $'	' -t

and


cat file1.txt file2.txt |  awk -F '	' '{print $ list the all columns name here}'

awk one which is working but need to mention all the column numbers there.


is there any other solution help me out.


thanks in advance


More From » command-line

 Answers
3

If your files are properly constructed tab separated (TSV) files, then you can use csvjoin from the Python-based csvkit package.


Ex. given:


$ head file1.tsv file2.tsv | cat -A
==> file1.tsv <==$
LogEntryTime^InameId^IPartnerId$
2021-06-05T15:00:53 07^I5lsddf^Iqyutxwr$
$
==> file2.tsv <==$
nameId^IGroupId^IcompnayId$
5lsddf^Il4buafm^I0rd33cs$

(cat -A to make the tabs visible, as ^I) then


$ csvjoin -I -t -c nameId file1.tsv file2.tsv
LogEntryTime,nameId,PartnerId,GroupId,compnayId
2021-06-05T15:00:53 07,5lsddf,qyutxwr,l4buafm,0rd33cs

To get the output back in TSV format, use csvformat from the same package:


$ csvjoin -I -t -c nameId file1.tsv file2.tsv | csvformat -T
LogEntryTime nameId PartnerId GroupId compnayId
2021-06-05T15:00:53 07 5lsddf qyutxwr l4buafm 0rd33cs

Note that -I disables type inference - which can sometimes behave unexpectedly, especially with datetime fields.




Even simpler, using Miller (available from the universe repository, as package miller):


$ mlr --tsv join -f file1.tsv -j nameId then reorder -f LogEntryTime file2.tsv
LogEntryTime nameId PartnerId GroupId compnayId
2021-06-05T15:00:53 07 5lsddf qyutxwr l4buafm 0rd33cs

The reorder is necessary because by default mlr join outputs the common field first (just like the system join command). Note that for unsorted input, the whole of file1.tsv will be loaded into memory.


[#1467] Sunday, February 6, 2022, 2 Years  [reply] [flag answer]
Only authorized users can answer the question. Please sign in first, or register a free account.
confiorc

Total Points: 42
Total Questions: 121
Total Answers: 123

Location: India
Member since Wed, Aug 26, 2020
4 Years ago
confiorc questions
Fri, Jul 23, 21, 18:21, 3 Years ago
Tue, Mar 22, 22, 14:25, 2 Years ago
Thu, Feb 17, 22, 22:43, 2 Years ago
Wed, Jun 1, 22, 01:16, 2 Years ago
;