Sunday, May 5, 2024
rated 0 times [  2] [ 0]  / answers: 1 / hits: 582  / 1 Year ago, tue, march 21, 2023, 11:47:20

I am working with data that has this format:



1880    20  David   7570    Mabel   13096


I need to pull the year (1880), the rank (20), and the name only (David or Mabel). The end result is:



1880        20      David


or



1880        20      Mabel


I have been successful in pulling the year, rank, and name separately, but I have trouble combining them into a single regular expression. I know the basics of regex, but I cannot currently access my notes.



I am attempting to use egrep.



 Answers

I'd recommend a Python solution (I don't know what you're currently using):



import re

# Six whitespace-separated fields: year, rank, name, count, name, count
re_find_data = re.compile(r'^(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)$')

with open(r'/path/to/file') as f:
    for line in f:
        for match in re_find_data.findall(line):
            print(match)
            # Do something with 'match'

            # You can index the 'match' tuple like so:
            print(match[2])
            # Print 3rd part (name)

Regarding grep



I'm not sure grep can be used in this case. It lacks the ability to print specific numbered groups, which is what you need (I believe). I'd be interested to hear if your professor (or anyone else) has a solution with pure grep.



This Regex should work, but you'd need to extract the fields you want from the numbered groups (demonstrated in Python example):



(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)
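
To make the group extraction concrete, here is a minimal sketch that turns one input line into the "year, rank, name" output from the question. It uses the six-group pattern with the backslashes written out. The helper name `extract` and its `which` parameter are my own, purely for illustration, and I'm assuming the output should be tab-separated.

```python
import re

# Six whitespace-separated fields: year, rank, name, count, name, count.
PATTERN = re.compile(r'^(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)$')

def extract(line, which=0):
    """Return 'year<TAB>rank<TAB>name', or None if the line does not match.

    which=0 selects the first name on the line (group 3),
    which=1 selects the second name (group 5).
    """
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    year, rank = m.group(1), m.group(2)
    name = m.group(3) if which == 0 else m.group(5)
    # Tab-separated output is an assumption; adjust the separator as needed.
    return "\t".join((year, rank, name))

sample = "1880    20  David   7570    Mabel   13096"
print(extract(sample, 0))
print(extract(sample, 1))
```

Lines that don't have exactly six fields in that shape return None, so you can skip them instead of crashing.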


grep is just not the right tool. A Regex is, but under a different implementation (hello, Python!).


[#27045] Thursday, March 23, 2023, 1 Year  [reply] [flag answer]
chilgirlguid
