I want to read parts of a big csv file between rows n and m and between columns p and q.
Is there an easy way to do this easily with the shell? (Are there commands I should read the doc's? otherwise, I'll write a python script)
I want to read parts of a big csv file between rows n and m and between columns p and q.
Is there an easy way to do this easily with the shell? (Are there commands I should read the doc's? otherwise, I'll write a python script)
I had a script that I adjusted (good idea the (N+1)q part!) thanks to @chronitis comment and the SO answer:
#! /bin/bash
#
N=10
M=20
P=2
Q=3
sed -n "$N,${M}p; $((M+1))q" $1 | cut -d, -f$P-$Q
Save the file as for example cut_csv
, make it executable and use as
cut_csv file
It can be made fancier by accepting the N,M,P,Q parameters as input etc, but I use it seldomly so I normally simply edit the file.
How it works:
The main command is the following (let's suppose N=10, M=20, P=2, Q=3); the shell substitutes the variables and the last line will become: (1)
sed -n "10,20p ; 21q" file | cut -d, -f2-3
Let's start with the first command:
sed -n "10,20p ; 21q" file
This call sed (stream editor, man sed
) in no-print mode (-n
) and execute the following commands on the file:
p
) the lines between 10 and 20 (this is the 10,20p
part)q
) when reading line 21 (21q
) so that discard the rest of the fileThe output of sed is piped (|
) to cut
:
cut -d, -f2-3
This command (man cut
) selects fields of a line (and repeat for each line). In this case, I am telling it that the separator between fields (columns) is a commad (-d,
), and to print out the columns between 2 and 3.
As another more complex example I often use this one:
sed -n "1p; 10,14p; 21q" data.csv | cut -d, -f1,4-8
This will select row 1 (where I have titles :-)) and rows from 10 to 14 (5 lines); then select columns 1 (time in my data...) and column from 4 to 8. It is really powerful once you get grips with it.
(1) one great way to see what the shell is doing is change the first line (which is called a shebang) like that:
#! /bin/bash -xv
The shell will now print every command it reads and the result of the substitutions:
(0)asus-rmano: part_of_csv.sh p20dedo.csv
#! /bin/bash -xv
#
N=10
+ N=10
M=20
+ M=20
P=2
+ P=2
Q=3
+ Q=3
sed -n "$N,${M}p; $((M+1))q" $1 | cut -d, -f$P-$Q
+ cut -d, -f2-3
+ sed -n '10,20p; 21q' p20dedo.csv
16:05:49,000
16:05:51,000
[...]