[Geany-Users] Regular expression, for Unicode characters

17 messages

[Geany-Users] Regular expression, for Unicode characters

Vesta
How can I create a regular expression to match all-UPPERCASE text inside paragraph tags, and replace those <p> tags with <p class="bold">?

    <p>                                                   </p>
    <p>                      USU EA EUISMOD HONESTATIS DETERRUISSET.</p>
    <p>Qualisque mnesarchum no nam, usu cu fastidii delicata. Eu mei nonumy libris, quas movet vivendo vim at. Prima epicuri conceptam pro ad, in suas nonumes similique duo. Qui mundi essent complectitur eu. Ei laudem veritus democritum vis, te ferri appareat eos. Ceteros pertinacia ea eum, quo integre theophrastus ex, eum et sint omnes detracto. </p>
    <p>Usu ea euismod honestatis deterruisset. Ne quo malis meliore, duo viris liberavisse no, mea an vide mutat quodsi. Vis an vidit debitis, et noster aliquam pri, case iudicabit te sea. </p>
    <p>                                                                             </p>
    <p>                       CU CONGUE IRIURE SCAEVOLA   --
       UT DOMING IRACUNDIA. </p>
    <p>                                  DICO TEMPOR HABEMUS - PART II, 123 </p>
    <p>Homero everti ei nam. An liber euripidis vis, pericula persecuti deseruisse ad mea. Dicant offendit sea et, per esse timeam deserunt ut. In pri enim sadipscing, ei movet soleat suavitate vim. Mea et omnesque phaedrum, paulo luptatum concludaturque vim ea. -- LIBER. </p>

I want to apply the class to:

<p class="bold">                      USU EA EUISMOD HONESTATIS DETERRUISSET.</p>
<p class="bold">                      CU CONGUE IRIURE SCAEVOLA   --
       UT DOMING IRACUNDIA. </p>
<p class="bold">                                DICO TEMPOR HABEMUS - PART II, 123 </p>

I need a Unicode solution that works for Cyrillic text. This does not work:

Find what: <p(>\W*?[[:upper:]][[:upper:]\W]*?</p>)
Replace with: <p class="bold"\1
_______________________________________________
Users mailing list
[hidden email]
https://lists.geany.org/cgi-bin/mailman/listinfo/users

Re: [Geany-Users] Regular expression, for Unicode characters

Lex Trotman
Geany uses the GLib regex library, whose syntax is described at
https://developer.gnome.org/glib/stable/glib-regex-syntax.html

Cheers
Lex

2016-07-31 22:03 GMT+10:00 Vesta <[hidden email]>:


Re: [Geany-Users] Regular expression, for Unicode characters

Vesta
Can anyone show what the regular expression should look like for this particular case?

This does not work either:

<p(>\W*?[[p{Lu}]][[p{Lu}]\W]*?</p>)

Regards,
Vesta





> Sent: Sunday, July 31, 2016 at 3:32 PM
> From: "Lex Trotman" <[hidden email]>

Re: [Geany-Users] Regular expression, for Unicode characters

James Ginns
Regular expressions are a tad difficult to master.

Basic question: you're using lazy quantifiers on purpose, right? Just
checking.

So, a dissection: the regex engine (I don't know which one you're
using) should hit \W*? and match as few non-word characters as possible
(in some cases zero). Then it will look for ONE character in the
character class [p{Lu}] (Unicode?). Then it will look for zero or more
instances of [p{Lu}] or a non-word character, until it reaches the
closing tag. Since you're only looking for a single capital letter, why
not try:

<p(>.*?[[p{Lu}]].*?</p>)

Or better yet, since you're only replacing the p tag with p class="bold",
why not just capture the initial p tag:

(<p>).*?[[p{Lu}]].*?</p>

Hope that gives you some starting ideas.

On 07/31/2016 08:19 AM, Vesta wrote:


Re: [Geany-Users] Regular expression, for Unicode characters

Vesta
<p(>\W*?[[p{Lu}]][[p{Lu}]\W]*?</p>)
I found this Unicode regex for Perl somewhere and tried to modify it, but it does not work.

I have Geany 1.23.1. I browsed its regex syntax documentation, but there aren't any examples.

The text I want to parse has multiple spaces inside the paragraph tags. Sometimes the uppercase text inside a paragraph is mixed with lowercase characters or words; those paragraphs must be skipped. So we need to match, and apply the bold class to, only the paragraphs whose text is entirely uppercase, as in my examples.

I tried both regexes, but they do not work.
<p(>.*?[[p{Lu}]].*?</p>)

(<p>).*?[[p{Lu}]].*?</p>

Vesta
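The rule stated above (at least one uppercase letter, no lowercase letters, digits, punctuation and spaces allowed) can also be checked outside Geany. What follows is a minimal Python sketch offered as an alternative to a single find-and-replace, not as something Geany itself provides. It assumes plain `<p>...</p>` markup without attributes, and it swaps the character-class trick for `str.isupper()`. That method returns True only when a string has at least one cased character and every cased character is uppercase, and because it uses Unicode case data, Cyrillic and Latin behave the same. The function name and the Cyrillic sample text are made up for illustration.

```python
import re

# One <p>...</p> element; DOTALL lets a paragraph span several lines.
# Assumes plain <p> tags with no attributes (a simplification).
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def bold_uppercase_paragraphs(html: str) -> str:
    """Add class="bold" to every <p> whose text is entirely uppercase.

    str.isupper() ignores digits, punctuation and whitespace, needs at
    least one cased character (so blank paragraphs are skipped), and
    uses Unicode case data (so Cyrillic works as well as Latin).
    """
    def replace(m: re.Match) -> str:
        body = m.group(1)
        if body.isupper():
            return f'<p class="bold">{body}</p>'
        return m.group(0)  # leave mixed-case and blank paragraphs alone

    return PARAGRAPH.sub(replace, html)


if __name__ == "__main__":
    sample = (
        "<p>   </p>\n"
        "<p>  ГЛАВА ПЕРВАЯ - ЧАСТЬ II, 123 </p>\n"
        "<p>Глава первая, часть вторая.</p>"
    )
    print(bold_uppercase_paragraphs(sample))
```

Run as a script, only the middle paragraph should gain the class. The blank one has no cased characters, and the last one contains lowercase letters.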

> Sent: Tuesday, August 02, 2016 at 12:03 PM
> From: "James Ginns" <[hidden email]>

Re: [Geany-Users] Regular expression, for Unicode characters

Colomban Wendling
In reply to this post by Vesta
On 31/07/2016 at 15:19, Vesta wrote:
> Can anyone show how should look regular expression for this particular case?

This will work:

(<p)(>[^[:lower:]]*[[:upper:]][^[:lower:]]*</p>)

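It matches anything *but* lowercase characters, then one uppercase
character, then anything *but* lowercase characters.  Using "not
lowercase" is useful because it also allows punctuation and digits.

If you're interested in supporting uppercase <P> tags as well, you'll
need to make the quantifiers ungreedy too (since "P" is not lowercase,
a greedy [^[:lower:]]* could otherwise run past the first </P>):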

(<[pP])(>[^[:lower:]]*?[[:upper:]][^[:lower:]]*?</[pP]>)

Cheers,
Colomban
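A note on the syntax tried earlier in the thread: in the PCRE-derived syntax that GLib documents, Unicode properties are written with a backslash, `\p{Lu}` and `\p{Ll}`. Without the backslash, `[[p{Lu}]]` is just a class of the literal characters `[`, `p`, `{`, `L`, `u`, `}` followed by a literal `]`. To try Colomban's "not lowercase" pattern outside Geany, here is a rough Python sketch. Python's built-in `re` supports neither POSIX classes such as `[[:lower:]]` nor `\p{...}`, so the sketch assumes the Unicode general categories `Ll` and `Lu` are a fair stand-in for `[[:lower:]]` and `[[:upper:]]` and builds those classes from `unicodedata`. Geany's engine may classify a few characters differently. The helper names are hypothetical.

```python
import re
import sys
import unicodedata


def category_chars(cat: str) -> str:
    """Every code point in one Unicode general category, escaped for use
    inside a [...] class (Python's re has no \\p{...} syntax)."""
    return "".join(
        re.escape(chr(cp))
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == cat
    )


LOWER = category_chars("Ll")  # stand-in for [[:lower:]]
UPPER = category_chars("Lu")  # stand-in for [[:upper:]]

# (<p)(>[^[:lower:]]*[[:upper:]][^[:lower:]]*</p>) with the classes swapped in.
# The greedy [^...]* cannot run past "</p>" because "p" is lowercase.
UPPERCASE_P = re.compile(rf"(<p)(>[^{LOWER}]*[{UPPER}][^{LOWER}]*</p>)")


def uppercase_paragraphs(html: str) -> list:
    """Return each <p>...</p> element the pattern matches."""
    return [m.group(0) for m in UPPERCASE_P.finditer(html)]


if __name__ == "__main__":
    sample = (
        "<p>     </p>\n"
        "<p>   USU EA EUISMOD HONESTATIS DETERRUISSET.</p>\n"
        "<p>Usu ea euismod honestatis deterruisset. </p>\n"
        "<p>   CU CONGUE IRIURE SCAEVOLA   --\n   UT DOMING IRACUNDIA. </p>\n"
        "<p>   DICO TEMPOR HABEMUS - PART II, 123 </p>\n"
        "<p>ГЛАВА 1</p>\n"
        "<p>Глава 2</p>\n"
    )
    for paragraph in uppercase_paragraphs(sample):
        print(paragraph)
```

The blank paragraph is skipped because the pattern requires one uppercase letter. The line ending in "123" is kept because digits are "not lowercase". The negated class also matches newlines, so the two-line paragraph is found as one element.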

Re: [Geany-Users] Regular expression, for Unicode characters

James Ginns
In reply to this post by Vesta
Hmm. Could you be more specific, then? When you say it doesn't work,
what kinds of lines is it missing and what kinds of lines is it
catching? You could use a tool like regexpal to see what is and isn't
matching. Given how little detail your message has, you might have just
forgotten a semicolon, for all anyone knows.


On 08/02/2016 06:59 AM, Vesta wrote:


Re: [Geany-Users] Regular expression, for Unicode characters

Vesta
I don't know why they don't work. Neither regex matches anything. Screenshots are below.

https://s31.postimg.org/myq22vtln/Screenshot_from_2016_08_02_23_17_55.png

https://s32.postimg.org/ktjn7ywp1/Screenshot_from_2016_08_02_23_19_15.png


> Sent: Tuesday, August 02, 2016 at 4:58 PM
> From: "James Ginns" <[hidden email]>

Re: [Geany-Users] Regular expression, for Unicode characters

Vesta
In reply to this post by Colomban Wendling
The regex works fine. Thank you.


B.Regards,
Alex

> Sent: Tuesday, August 02, 2016 at 3:17 PM
> From: "Colomban Wendling" <[hidden email]>

Re: [Geany-Users] Regular expression, for Unicode characters

Vesta
In reply to this post by Colomban Wendling
One more question: how do I replace <p> with <p class="bold"> in all matched lines?



> Sent: Tuesday, August 02, 2016 at 3:17 PM
> From: "Colomban Wendling" <[hidden email]>

Re: [Geany-Users] Regular expression, for Unicode characters

Colomban Wendling
On 03/08/2016 at 00:49, Vesta wrote:
> One more question: how do I replace <p> with <p class="bold"> in all matched lines?

\1 class="bold"\2

or alter the RE to use whichever capture groups you like best.

Cheers,
Colomban
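For anyone who wants to reproduce this search and replace outside Geany, here is a minimal Python sketch. Python's `re` module does not support `[[:upper:]]` or `[[:lower:]]`, so, as an assumption, the "no lowercase letter, at least one uppercase letter" test is done in code with `str.isupper()` and `str.islower()`, which are Unicode-aware and therefore also cover Cyrillic. The replacement template `\1 class="bold"\2` is the same text as above; `Match.expand()` fills in the groups. The helper name `add_bold_class` is hypothetical.

```python
import re

# Group 1 is "<p", group 2 is ">...</p>", mirroring the Geany pattern;
# group 3 (the paragraph text) exists only so the code can inspect it.
PARAGRAPH = re.compile(r"(<p)(>(.*?)</p>)", re.DOTALL)
TEMPLATE = r'\1 class="bold"\2'  # same replacement text as typed in Geany

def add_bold_class(html):
    """Hypothetical helper: add class="bold" to all-uppercase paragraphs."""
    def repl(m):
        text = m.group(3)
        has_upper = any(c.isupper() for c in text)
        has_lower = any(c.islower() for c in text)
        if has_upper and not has_lower:
            return m.expand(TEMPLATE)  # substitutes \1 and \2
        return m.group(0)  # mixed-case or blank paragraphs stay unchanged
    return PARAGRAPH.sub(repl, html)

sample = (
    "<p>   </p>\n"
    "<p>   USU EA EUISMOD HONESTATIS DETERRUISSET.</p>\n"
    "<p>Usu ea euismod honestatis deterruisset.</p>\n"
    "<p>  ГЛАВА II, 123 </p>"
)
print(add_bold_class(sample))
```

`re.DOTALL` lets `.*?` cross line breaks, so a heading split over two lines (like the "CU CONGUE IRIURE SCAEVOLA --" one) is still handled, while the lazy `*?` keeps each match inside a single paragraph.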

Re: [Geany-Users] Regular expression, for Unicode characters

Vesta
\1 class="bold"\2

How do I alter this to use <h2> </h2> tags in place of <p> </p> tags?


Best Regards,
Vesta


Re: [Geany-Users] Regular expression, for Unicode characters

Colomban Wendling
On 03/08/2016 at 02:13, Vesta wrote:
> \1 class="bold"\2
>
> How do I alter this to use <h2> </h2> tags in place of <p> </p> tags?

You should try to understand the regex instead of using it as a mere
magic solution.

But here you go:

Search for:

(<p>)([^[:lower:]]*[[:upper:]][^[:lower:]]*)(</p>)

and replace with:

<h2>\2</h2>
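In this pattern `<p>` and `</p>` each get their own group, so the replacement `<h2>\2</h2>` reuses only the paragraph text (group 2) and drops groups 1 and 3. The short Python sketch below shows that group numbering. Because Python's `re` has no POSIX classes, it assumes ASCII stand-ins (`[^a-z]`, `[A-Z]`), so unlike the Geany version it would not recognise Cyrillic letters.

```python
import re

# Groups: 1 = "<p>", 2 = paragraph text, 3 = "</p>".
# ASCII stand-ins: [^a-z] ~ [^[:lower:]], [A-Z] ~ [[:upper:]].
PATTERN = re.compile(r"(<p>)([^a-z]*[A-Z][^a-z]*)(</p>)")

html = "<p>  PART II, 123 </p>\n<p>Homero everti ei nam.</p>"

# Only group 2 is reused, so the <p> tags are swapped for <h2> tags;
# the mixed-case paragraph does not match and is left alone.
converted = PATTERN.sub(r"<h2>\2</h2>", html)
print(converted)
```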

[Geany-Users] Remove Extra Whitespace from text

Vesta
In reply to this post by James Ginns
The text has multiple whitespace characters between words within <p> </p> and <h2> </h2> tags.

How can I find runs of whitespace and replace each run with a single space?

Regards,
Alex

Re: [Geany-Users] Remove Extra Whitespace from text

Colomban Wendling
On 05/08/2016 at 14:10, Vesta wrote:
> The text has multiple whitespace characters between words within <p> </p> and <h2> </h2> tags.
>
> How can I find runs of whitespace and replace each run with a single space?

Learn regexes? :)  For basic stuff like that they aren't so complex, and
they are very powerful.  Though here you could also just replace two
spaces with one, repeatedly, until there is nothing left to replace.

Regards,
Colomban

PS: search for [[:space:]]+ and replace it with a single space.
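Both suggestions can be sketched in Python: the loop that replaces two spaces with one until none remain, and a single regex substitution. Python's `re` has no `[[:space:]]`; `\s` is assumed here as a rough equivalent (it also matches tabs and newlines). The function names are hypothetical.

```python
import re

def collapse_by_loop(text):
    # Replace "  " with " " repeatedly until no double space is left.
    # Only the space character is affected; tabs and newlines are kept.
    while "  " in text:
        text = text.replace("  ", " ")
    return text

def collapse_by_regex(text):
    # \s+ is roughly [[:space:]]+: any run of whitespace becomes one space.
    return re.sub(r"\s+", " ", text)

line = "<p>Usu  ea    euismod\thonestatis</p>"
print(collapse_by_loop(line))
print(collapse_by_regex(line))
```

Note that the regex version turns line breaks into spaces too, so it joins lines that were separate.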

Re: [Geany-Users] Remove Extra Whitespace from text

Vesta
Thank you for the support.
Regexes are quite tricky, but if there is no other way, a regex is the only solution.

[[:space:]]+

There is one small issue with this: it also replaces the line break between </p> and <p> when each paragraph begins on a new line, e.g.:
<p> first line text </p>
<p> second line text </p>

so the paragraphs merge into one line:
<p> first line text </p> <p> second line text </p>

The same happens with headers and paragraphs:

<h2> text </h2>
<p> text </p>

becomes <h2> text </h2> <p> text </p>

How can I avoid this?

Best Regards,
Alex


Re: [Geany-Users] Remove Extra Whitespace from text

Colomban Wendling
On 05/08/2016 at 22:31, Vesta wrote:
> […]
>
> [[:space:]]+
>
> There is one small issue with this: it also replaces the line break between </p> and <p> when each paragraph begins on a new line, e.g.:
> […]
>
> How can I avoid this?

Don't match newlines.  " +" (a space followed by a plus, without the quotes) is likely enough.
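A quick way to check this advice, sketched with Python's `re` as an assumed stand-in for Geany's regex engine: `[[:space:]]+` (approximated here by `\s+`) swallows the newline and merges the lines, while ` +` only matches runs of the space character and leaves the line break between `</p>` and `<p>` alone.

```python
import re

text = "<p> first   line text </p>\n<p> second  line text </p>"

# Approximation of [[:space:]]+: the newline is matched too, lines merge.
merged = re.sub(r"\s+", " ", text)

# " +" only matches runs of spaces, so the newline survives.
kept = re.sub(r" +", " ", text)

print(merged)
print(kept)
```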