rated 0 times [  2] [ 0]  / answers: 1 / hits: 1255  / 2 Years ago, thu, april 14, 2022, 12:46:40

So I have this pattern; the whole thing is one line:


<img  itemprop="image"  class="hovered__image jsOpenGallery lazyload" data-src="//static.yellowpages.ca/ypca/ypui-6.65.0.0-20220419.0826/resources/images/serp/photo-gallery-icon.svg" alt="Drain King Plumbers - Plumbers & Plumbing Contractors"/><img  itemprop="image"  class="jsMerchantLogo lazyload" data-src="https://ssmscdn.yp.ca/image/resize/8bfbcba8-0a3e-48d3-b64b-16df5995779c/yp-serp-thumbnail/1.jpg" alt="Drain King Plumbers - Plumbers & Plumbing Contractors"/>

Here I am using the expression "alt=" to find the tag, and I need to get the name of the business that follows it, like this from the code above:


alt="Drain King Plumbers - Plumbers & Plumbing Contractors"

The name can be anything, but it is always enclosed in " ". Can I use grep to return something like alt="business name"?



 Answers

htmlq


You can use htmlq (like jq, but for HTML). Install it, for example with Homebrew (brew install htmlq), and pipe your string to:


| htmlq --attribute alt img

See also pup for HTML, and xq for XML.


grep (PCREs)


A less robust option (regular expressions cannot reliably parse [X]HTML) is to use grep with --perl-regexp (-P) and --only-matching (-o), with a lookbehind so that only the attribute value is printed:


| grep -Po '(?<= alt=")[^"]*'

See also ripgrep.
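
As a rough, self-contained sketch of the grep approach: the snippet below feeds a shortened copy of the markup from the question (some attributes dropped for brevity, so the sample string is an assumption) through two grep invocations. The first uses the lookbehind to print only the value; the second uses plain -o to print the attribute together with its value, which is closer to what the question asks for. It assumes GNU grep with -P support; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample input: the one-line markup from the question, trimmed to the relevant attributes.
html='<img itemprop="image" class="hovered__image" alt="Drain King Plumbers - Plumbers & Plumbing Contractors"/><img itemprop="image" class="jsMerchantLogo" alt="Drain King Plumbers - Plumbers & Plumbing Contractors"/>'

# Value only: the PCRE lookbehind matches ' alt="' without including it in the output.
# The pattern is single-quoted so the shell leaves the inner double quotes alone.
values=$(printf '%s\n' "$html" | grep -Po '(?<= alt=")[^"]*')

# Attribute plus value (alt="..."): basic regex with -o, no PCRE needed.
attrs=$(printf '%s\n' "$html" | grep -o 'alt="[^"]*"')

printf '%s\n' "$values"
printf '%s\n' "$attrs"
```

Each command prints one line per img tag, so the sample yields the same name twice; piping the result through sort -u collapses the duplicates.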


[#522] Friday, April 15, 2022
coffekne
