Use Regex

From Cuis CookBook

Latest revision as of 22:41, 4 May 2025

Regex is a large subject. For the moment I am only going to put a list of examples here.

  • First things first: to work with regular expressions you need to load the Regex package.
Feature require: 'Regex'.
  • Important: matchesRegex: tries to match the whole string. If you want to scan for a matching substring, send search: to a matcher instead (see the grouping example below).
  • Does the string match a regex?
'hello123' matchesRegex: 'he'. " => false " 
'hello123' matchesRegex: 'he.*'. " => true "
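A quick contrast between full-string matching and searching. This sketch uses only messages that already appear on this page (matchesRegex:, asRegex, search:).

```smalltalk
"matchesRegex: must consume the whole string, so a pattern covering only part of it fails."
'hello123' matchesRegex: '\d+'.        " => false "
"search: sent to a matcher succeeds if the pattern matches anywhere in the string."
'\d+' asRegex search: 'hello123'.      " => true "
"For a full-string match, the pattern must account for every character."
'hello123' matchesRegex: '[a-z]+\d+'.  " => true "
```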
  • Find all occurrences of a regex
' 2019-08-01T00:00:00+00:00' allRegexMatches: '\d+'. 
" => an OrderedCollection('2019' '08' '01' '00' '00' '00' '00' '00') "
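The matches come back as Strings. Here is a minimal sketch that turns them into numbers, assuming String>>asNumber (a classic Smalltalk message that Cuis provides).

```smalltalk
| parts numbers |
parts := ' 2019-08-01T00:00:00+00:00' allRegexMatches: '\d+'.
parts size.  " => 8 "
"Each match is a String; convert it when a number is needed."
numbers := parts collect: [:each | each asNumber].
numbers first.  " => 2019 "
```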
  • Split a string. Classic applications are simple CSV parsing and getting a list of words from a sentence. Note that the blanks around the commas are kept in the fields.
',' asRegex split: '123,c1,12.4 , Foo bar, baz'. 
" => an OrderedCollection('123' 'c1' '12.4 ' ' Foo bar' ' baz') "
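The separator is itself a regex, so the blanks can be absorbed into it. This is a sketch that assumes the \s escape (any whitespace character) from the VB-Regex syntax that RxMatcher implements.

```smalltalk
"Swallow the blanks around each comma so the fields come out trimmed."
'\s*,\s*' asRegex split: '123,c1,12.4 , Foo bar, baz'.
" => an OrderedCollection('123' 'c1' '12.4' 'Foo bar' 'baz') "
"Split a sentence into words on runs of whitespace."
'\s+' asRegex split: 'the quick  brown fox'.
" => an OrderedCollection('the' 'quick' 'brown' 'fox') "
```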
  • Substitute each regex match with a given string
'ab cd ab' copyWithRegex: '(a|b)+' matchesReplacedWith: 'foo' . 
" => 'foo cd foo' "
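A common use of literal substitution is collapsing runs of whitespace. This sketch uses the same message and assumes the \s escape.

```smalltalk
"Replace every run of whitespace with a single space."
'too   many    spaces' copyWithRegex: '\s+' matchesReplacedWith: ' '.
" => 'too many spaces' "
```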
  • Substitute each regex match with the result of evaluating a block on it
'ab cd ab' copyWithRegex: '(a|b)+' matchesTranslatedUsing: [:each | each asUppercase] . 
" => 'AB cd AB' "
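The block receives each matched substring and answers the String that replaces it. This sketch rewrites numbers and assumes String>>asNumber.

```smalltalk
"Double every integer found in the string."
'price: 10, tax: 2'
	copyWithRegex: '\d+'
	matchesTranslatedUsing: [:each | (each asNumber * 2) printString].
" => 'price: 20, tax: 4' "
```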
  • Grouping in regex
str := 'Today is 05-Aug, it is about 09:34, and we are near Verona. ' . 
rex := '(\d+)\:(\d+)' asRegex . 
rex class. " => RxMatcher " 
rex search: str. " => true (a match has been found) "
rex subexpressionCount. " => 3 (the whole match plus the two groups) "
rex subexpression: 1. " => '09:34' (subexpression 1 is always the whole match) "
rex subexpression: 2. " => '09' (then come the groups, in order) "
rex subexpression: 3. " => '34' "
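Groups are also handy for pulling fields out of a fixed format. This sketch parses an ISO-style date. It assumes RxMatcher>>matches:, the whole-string counterpart of search: that the Rx matcher inherits from VB-Regex.

```smalltalk
| rex |
rex := '(\d+)-(\d+)-(\d+)' asRegex.
"matches: requires the whole string to match; the groups are then available."
(rex matches: '2019-08-01') ifTrue: [
	Array
		with: (rex subexpression: 2)   "year"
		with: (rex subexpression: 3)   "month"
		with: (rex subexpression: 4)]. "day"
" => an Array holding '2019', '08' and '01' "
```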
  • Note that the matched substrings are stored in the matcher object (the RxMatcher), and they reflect only its most recent match.
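Because the groups live in the matcher, a second search on the same matcher replaces them. This small sketch shows the effect, reusing the pattern from the grouping example.

```smalltalk
| rex |
rex := '(\d+)\:(\d+)' asRegex.
rex search: 'start at 09:34'.
rex subexpression: 1.  " => '09:34' "
rex search: 'stop at 18:05'.
rex subexpression: 1.  " => '18:05' (the previous match is gone) "
"Copy out the parts you need before reusing the matcher."
```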

Dr. Nicola Mingotti updated this on 06-Sep-2021. The examples were run in a Cuis image slightly older than Cuis5.0-4834.image.