I have large TXT files of Arabic text with Tashkil (diacritics), and I'm trying to find lines that contain a specific vocalized (mashkula) pattern with marks such as َ ً ُ ٌ ّ ْ ٍ. I've tried the following grep syntax:
cat file.txt | grep "اهلا"
This returns nothing until I insert Tashkil marks:
cat file.txt | grep "أهْلاً"
With the marks included, I get the correct output:
أهْلاً
I also tried:
grep -P "[ُ ّ َ ً ِ ٍ ٌ ْ ~]|[اهلا]" file.txt
But this matches any single one of those characters, so it returns them in all sorts of unrelated patterns:
أهْلاً أ ... هْ.. لًا أنْتَ لَيْلاً ..
How can I match Arabic diacritical marks with grep?
Is it possible to remove Tashkil marks from text before using grep?
My OS is Ubuntu 18.04
UPDATE: At the moment, I remove the Tashkil marks from the text with:
sed "s/[ُ ّ َ ً ِ ٍ ٌ ْ]//g"
After that I can grep for what I want. But with this approach, the sed command also removes the spaces from the whole text!