Asked Monday, August 29, 2022, 8:25:42

As we process our CSV data, we generate a lot of output files with 30,000 lines each. They all have the same columns/fields, they are all in CSV format, and we put them into the same folder on the Linux server. The files are uniquely named using a combination of date, time and a numeric index. See below.



AB_20151127_120000_0_SEGMENT_FINAL.csv
AB_20151127_120000_1_SEGMENT_FINAL.csv
AB_20151127_120000_2_SEGMENT_FINAL.csv
AB_20151127_120000_3_SEGMENT_FINAL.csv
.
.
.
AB_20151127_120000_599_SEGMENT_FINAL.csv


So now we need to merge/join all of them into one big file called:
AB_20151127_120000_SEGMENT_FINAL.csv (note the missing numeric digits from the merged file)



I tried awk as below but it is not working. Please tell me what I did wrong.



awk '"AB_20151127_120000_" NR-1 "_SEGMENT_FINAL.csv"' > AB_20151127_120000_SEGMENT_FINAL.csv


 Answers

If the order in which the files are concatenated is not important, use:



cat AB_20151127_120000_*_SEGMENT_FINAL.csv > AB_20151127_120000_SEGMENT_FINAL.csv
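To see why the order can matter, here is a small sketch in a scratch directory. The three sample segments and the `demo` variable are made up for illustration. The shell expands `*` in collation order rather than numeric order, so segment 10 ends up ahead of segment 2:

```shell
# Scratch directory with three tiny, made-up segments (hypothetical data).
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 2 9 10; do
    echo "row from segment $i" > "AB_20151127_120000_${i}_SEGMENT_FINAL.csv"
done

# The glob is expanded as text, not as numbers: 10 comes before 2 and 9.
cat AB_20151127_120000_*_SEGMENT_FINAL.csv > AB_20151127_120000_SEGMENT_FINAL.csv
cat AB_20151127_120000_SEGMENT_FINAL.csv
```

The merged file name does not match the glob itself (there is no extra `_` before `SEGMENT`), so re-running the command does not feed the output back into the input.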


If the order is important, you'll have to get creative. If you know the highest segment number, 599 for example, you can use brace expansion (the \ is only there to let me print the command on two lines for readability):



cat AB_20151127_120000_{0..599}_SEGMENT_FINAL.csv > \
AB_20151127_120000_SEGMENT_FINAL.csv
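If you want to check what the braces turn into before running the real command, you can print the expansion for a short range. This is only a sketch: the range 8..11 is an arbitrary example, and the command goes through `bash -c` because brace expansion is a bash feature and may not be available in a plain POSIX `sh`:

```shell
# Brace expansion happens in bash before the command runs; the names are
# generated in numeric order whether or not the files exist.
expanded=$(bash -c 'printf "%s\n" AB_20151127_120000_{8..11}_SEGMENT_FINAL.csv')
echo "$expanded"
```

Unlike the `*` glob, this produces 8, 9, 10, 11 in that order.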


If you don't, you can still use brace expansion. Just choose a large enough number to be sure that all files will be included and ignore error messages about non-existent files:



cat AB_20151127_120000_{0..10000}_SEGMENT_FINAL.csv > \
AB_20151127_120000_SEGMENT_FINAL.csv 2>/dev/null
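A different way to avoid guessing an upper bound is a plain loop that appends segments in numeric order and stops at the first one that is missing. This is a swapped-in technique, not brace expansion, and it assumes the segment numbers are contiguous and start at 0 (the listing in the question suggests this but does not state it). A minimal sketch with made-up sample files; `prefix`, `suffix` and `out` are names introduced here for illustration:

```shell
# Scratch directory with made-up segments 0..11 (hypothetical data).
demo=$(mktemp -d)
cd "$demo" || exit 1
prefix=AB_20151127_120000
suffix=SEGMENT_FINAL.csv
i=0
while [ "$i" -le 11 ]; do
    echo "row from segment $i" > "${prefix}_${i}_${suffix}"
    i=$((i + 1))
done

# Append segments in numeric order until the next one does not exist.
out="${prefix}_${suffix}"
: > "$out"    # start from an empty output file
i=0
while [ -e "${prefix}_${i}_${suffix}" ]; do
    cat "${prefix}_${i}_${suffix}" >> "$out"
    i=$((i + 1))
done
echo "merged $i segments into $out"
```

If there is a gap in the numbering, the loop stops at the gap, so this only fits when every segment is expected to be present.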


Alternatively, you can generate a list of sorted file names and use that:



cat $(printf '%s\n' AB_20151127_120000_*_SEGMENT_FINAL.csv | sort -nt_ -k4) > \
AB_20151127_120000_SEGMENT_FINAL.csv


The printf will print each file name followed by a newline. That list is then passed to sort, which sorts it numerically (-n) on the 4th field (-k4), where fields are delimited by _ (-t_).
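Here is the same pipeline on a few made-up files so you can see the effect of the sort before trusting it with 600 segments. The sample segments and the `demo` and `sorted` variables are only for illustration, and the unquoted `$sorted` assumes the file names contain no whitespace, which holds for the names in the question:

```shell
# Scratch directory with three tiny, made-up segments (hypothetical data).
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 10 2 9; do
    echo "row from segment $i" > "AB_20151127_120000_${i}_SEGMENT_FINAL.csv"
done

# Split each name on "_" and sort numerically from field 4, the segment number.
sorted=$(printf '%s\n' AB_20151127_120000_*_SEGMENT_FINAL.csv | sort -nt_ -k4)
echo "$sorted"

# Unquoted on purpose: the shell splits the list back into separate file names.
cat $sorted > AB_20151127_120000_SEGMENT_FINAL.csv
```

The names come out as segments 2, 9, 10, and the merged file has its rows in that order.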


[#17274] Wednesday, August 31, 2022
memorrappin
