Question


Merging/joining a lot of csv files with numeric digits in the file name

Posted Mon, August 29, 2022, 8:25:42

As we process our CSV data, we generate a lot of output files with 30,000 lines each. They all have the same columns/fields, they are all in CSV format, and we put them into the same folder on the Linux server. Each file is uniquely named with a combination of date, time and a numeric index, as below:

AB_20151127_120000_0_SEGMENT_FINAL.csv
AB_20151127_120000_1_SEGMENT_FINAL.csv
AB_20151127_120000_2_SEGMENT_FINAL.csv
AB_20151127_120000_3_SEGMENT_FINAL.csv
...
AB_20151127_120000_599_SEGMENT_FINAL.csv

So now we need to merge all of them into one big file called AB_20151127_120000_SEGMENT_FINAL.csv (note that the numeric index is dropped from the merged file's name).

I tried the awk command below, but it doesn't work. What did I do wrong?

awk '"AB_20151127_120000_" NR-1 "_SEGMENT_FINAL.csv"' > AB_20151127_120000_SEGMENT_FINAL.csv

Answers


memorrappin


Accepted answer by herfor (answered 2 years ago)

If the order in which the files are concatenated is not important, use:

cat AB_20151127_120000_*_SEGMENT_FINAL.csv > AB_20151127_120000_SEGMENT_FINAL.csv
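Why the order is a concern: the shell sorts glob matches as strings, not as numbers, so segment 10 lands before segment 2. The throwaway demo below (it creates five tiny sample files in a temporary directory; the file contents are made up) pins the collation with `LC_ALL=C` so the result is reproducible.

```shell
#!/bin/sh
# Demo only: build a few fake segments in a temporary directory.
export LC_ALL=C          # byte-wise collation, so the glob order is predictable
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 0 1 2 10 100; do
  echo "segment $i" > "AB_20151127_120000_${i}_SEGMENT_FINAL.csv"
done

# Same command as above: the glob expands in string order.
cat AB_20151127_120000_*_SEGMENT_FINAL.csv > AB_20151127_120000_SEGMENT_FINAL.csv
cat AB_20151127_120000_SEGMENT_FINAL.csv
```

With `LC_ALL=C` the merged file holds segments in the order 0, 100, 10, 1, 2. Other locales may collate the names differently, but glob expansion never sorts them numerically.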

If the order is important, you'll have to get creative. If you know the highest segment number (599 in your example), you can use bash's brace expansion (the \ is only there to let me print the command on two lines for readability):

cat AB_20151127_120000_{0..599}_SEGMENT_FINAL.csv > \
    AB_20151127_120000_SEGMENT_FINAL.csv
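Brace expansion is a bash (also zsh/ksh) feature; a plain POSIX `/bin/sh` such as dash leaves `{0..599}` as a literal word. A minimal sketch for that case, assuming the highest index is known as above: count with arithmetic expansion, which POSIX sh does support. `merge_segments` is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# merge_segments (hypothetical helper): concatenate segments 0..$1 in numeric order.
#   $1 = highest segment number (599 in the question)
#   $2 = output file
merge_segments() {
  last=$1
  out=$2
  i=0
  while [ "$i" -le "$last" ]; do
    cat "AB_20151127_120000_${i}_SEGMENT_FINAL.csv"
    i=$((i + 1))
  done > "$out"          # one redirection for the whole loop, not one per file
}

# Usage: merge_segments 599 AB_20151127_120000_SEGMENT_FINAL.csv
```

This starts one `cat` process per segment, which is slower than a single `cat` with all names but is easily fast enough for 600 files.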

If you don't, you can still use brace expansion. Just choose a number large enough to be sure all files are included, and discard the error messages about non-existent files:

cat AB_20151127_120000_{0..999}_SEGMENT_FINAL.csv > \
    AB_20151127_120000_SEGMENT_FINAL.csv 2>/dev/null
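As for the awk attempt in the question: awk treats the quoted expression as a pattern, so it only builds a string (non-empty, hence always true) and never opens a file with that name. Because no file operands are given, awk reads standard input, so the command just waits for input from the terminal. A sketch that keeps the question's idea of building each name from a counter, but opens each file explicitly with `getline` and stops at the first index that cannot be read, so no upper bound or error suppression is needed. `merge_until_missing` is a hypothetical helper name; note that an empty segment file would also end the loop.

```shell
#!/bin/sh
# merge_until_missing (hypothetical helper): $1 is the output file.
merge_until_missing() {
  awk -v prefix="AB_20151127_120000_" -v suffix="_SEGMENT_FINAL.csv" '
  BEGIN {
      for (i = 0; ; i++) {
          f = prefix i suffix              # e.g. AB_20151127_120000_0_SEGMENT_FINAL.csv
          if ((getline line < f) <= 0)     # -1: cannot open, 0: file is empty
              break                        # stop at the first missing or empty segment
          print line
          while ((getline line < f) > 0)   # copy the rest of this segment
              print line
          close(f)                         # release the file before opening the next
      }
  }' > "$1"
}

# Usage: merge_until_missing AB_20151127_120000_SEGMENT_FINAL.csv
```

Since the program has only a `BEGIN` block, awk does not read standard input at all.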

Alternatively, you can generate a list of sorted file names and use that:

cat $(printf '%s\n' AB_20151127_120000_*_SEGMENT_FINAL.csv | sort -nt_ -k4) > \
    AB_20151127_120000_SEGMENT_FINAL.csv

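The printf prints each file name followed by a newline. That list is then passed to sort, which sorts it numerically (-n) on the 4th field (-k4), where fields are delimited by _ (-t_). This relies on the file names containing no spaces or other whitespace, since the output of $(...) is split into words.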
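To see what that sort does, here it is applied to a handful of sample names (no files are needed). In `AB_20151127_120000_10_SEGMENT_FINAL.csv`, splitting on `_` makes `AB` field 1, `20151127` field 2, `120000` field 3 and `10` the start of field 4.

```shell
#!/bin/sh
# Sample names only: feed them through the same sort used above.
sorted=$(printf '%s\n' \
  AB_20151127_120000_10_SEGMENT_FINAL.csv \
  AB_20151127_120000_2_SEGMENT_FINAL.csv \
  AB_20151127_120000_100_SEGMENT_FINAL.csv \
  AB_20151127_120000_0_SEGMENT_FINAL.csv \
  AB_20151127_120000_1_SEGMENT_FINAL.csv | sort -nt_ -k4)
printf '%s\n' "$sorted"   # segments come out as 0, 1, 2, 10, 100
```

`-k4,4` would limit the key to field 4 alone; plain `-k4` works here because `-n` only reads the leading digits of the key.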