Saturday, May 18, 2024
Asked Sun, May 16, 2021, 10:18:43

I want to process a body of text and extract an integer from a specific position in it, but I'm not sure how to describe that 'particular position'. Regular expressions really confuse me. I spent (wasted) a couple of hours reading tutorials and I feel no closer to an answer :(



There's a bunch of text which may or may not include integers (that I don't want) and then there's a line that always contains



id_ad=1929170&action


and then followed by a bunch of garbage I don't care about, again it may or may not include one or more integers.



So intuitively I know I just want to ignore everything up to (and including) id_ad= and ignore everything after (and including) &action and I'll be left with the integer I want. And I know I can use regular expressions to achieve this. But I can't seem to figure it out.



I'd like to do this as a one liner from terminal if possible.



 Answers

Not so much a one-liner (although the command to run it is a one-liner :) ), but here is a Python option:



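#!/usr/bin/env python3
import sys
file = sys.argv[1]

with open(file) as src:
    text = src.read()

# Pair the index just past each "id_ad=" with the index of the next "&action".
starters = [
    (i + 6, text.find("&action", i))
    for i in range(len(text))
    if text[i:i + 6] == "id_ad="
]
for start, end in starters:
    # find() returns -1 when no "&action" follows; skip those cases.
    if end != -1:
        print(text[start:end])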


The script first finds the index of every occurrence of the start string "id_ad=" and pairs it with the index of the next end string "&action". It then prints everything between each pair of "markers".



Extracted from a prepared file:



" I want to process the body of text and extract an integer from a specific position in the text, but I'm not sure how to describe that 'particular position'. Regular expressions really confuse me. I spent (wasted) a couple hours reading tutorials and I feel no closer to an answer :(
There's a bunch of text which may or may not include integers (that I don't want) and then there's a line that always contains
id_ad=1929170&action
There's a bunch of text which may or may not include integers (that I don't want) and then there's a line that always contains
id_ad=1889170&action and then followed by a bunch of garbage I don't care about, again it may or may not include one or more integers.
There's a bunch of text which may or may not include integers (that I don't want) and then there's a line that always contains id_ad=1889170&action and then followed by a bunch of garbage I don't care about, again it may or may not include one or more integers.
There's a bunch of text which may or may not include integers (that I don't want) and then there's a line that always contains id_ad=1929990&action"



The result is:



1929170
1889170
1889170
1929990
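
Since the question asks about regular expressions, the same extraction could also be sketched with Python's re module. This is only an illustration and not part of the script above: the name extract_ids and the sample string are made up here, and the pattern assumes that the value between the two markers consists of digits only.

```python
import re

# Hypothetical helper, not part of the original script.
# Assumes the value between "id_ad=" and "&action" is made of digits only.
PATTERN = re.compile(r"id_ad=(\d+)&action")


def extract_ids(text):
    """Return every run of digits found between "id_ad=" and "&action"."""
    # With a single capture group, findall() returns just the captured parts.
    return PATTERN.findall(text)


if __name__ == "__main__":
    sample = "junk 42 id_ad=1929170&action more 7 junk id_ad=1889170&action"
    for match in extract_ids(sample):
        print(match)
```

Other integers in the text are ignored because the pattern only matches digits that sit directly between the two markers.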


How to use



Paste the script into an empty file, save it as extract.py, and run it with the command:



python3 <script> <file>


Note



If there is only one occurrence in the text file, the script can be much shorter:



#!/usr/bin/env python3
import sys
file = sys.argv[1]

with open(file) as src:
    text = src.read()
print(text[text.find("id_ad=")+6:text.find("&action")])
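
One caveat with the short version: both find() calls search from the start of the text, so an "&action" that happens to appear before "id_ad=" gives an empty result, and a missing marker goes unnoticed. A possible variant using str.partition is sketched below; the function name first_id is hypothetical and the sample string is made up.

```python
def first_id(text):
    """Return the text between the first "id_ad=" and the next "&action",
    or None if either marker is missing."""
    # partition() splits around the first occurrence of the separator and
    # returns an empty separator string when it is not found.
    _, start_marker, rest = text.partition("id_ad=")
    if not start_marker:
        return None
    value, end_marker, _ = rest.partition("&action")
    if not end_marker:
        return None
    return value


if __name__ == "__main__":
    print(first_id("&action noise id_ad=1929170&action tail"))
```

Because the end marker is only searched for in the text after "id_ad=", an earlier "&action" no longer interferes.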

[#22507] Monday, May 17, 2021
rontablis