Oct 26, 2006; 10:52
Fletcher Sandbeck
Re: Another Regex problem
On 2006-10-26 at 15:17 by j.harris@digital-ink.co.uk (Jon Harris):
>We are developing a site for a client that has a glossary page.
>
>Simple stuff like JIT means "just in time" etc.
>
>With their page content we need to replace a glossary code with a link
>to the definition on the glossary page. (Sounds like a simple
>string_replace, so far) only problem is we need to keep the text as it
>was originally entered and build a link around it as in the examples
>below.
>
>'jit' becomes <a href="glossary.lasso?def=jit">jit</a>
> ^^^ ^^^
>and
>
>'Jit' becomes <a href="glossary.lasso?def=jit">Jit</a>
> ^^^ ^^^
>
>and
>
>'JIT' becomes <a href="glossary.lasso?def=jit">JIT</a>
> ^^^ ^^^
>
>etc. So now it looks like something I could do with a regex.
>
>Can anyone give me some ideas about what the regex expression might look
>like?
If you are using the latest version of Lasso 8.5 I would use the new [RegExp] type to do this. You can read about the new type in this tip of the week:
<http://www.omnipilot.com/TotW.1768.9152.lasso>
For example, generate an array containing each word that you want to link.
[var: 'MyGlossary' = (array: 'jit', 'asap', 'wysiwyg')]
[var: 'MyString' = 'This will happen JIT. This will not be modified <a href="">JIT</a>.']
Then you can use the following code to iterate through each instance of these words in your source and decide whether or not it needs to be linked. The conditionals inTag, inAnchor, and inHead attempt to determine if the current match is within the angle brackets of a tag, has already been linked, or is within the head of the document.
<?LassoScript
    // Match every word in the source string.
    Var: 'MyRegExp' = (RegExp: -Find='\\w+', -Input=$MyString, -IgnoreCase);
    While: $MyRegExp->Find;
        Var: 'temp' = $MyRegExp->(MatchString);
        // Strip complete constructs from the output built so far; a leftover
        // opener means the match sits inside a tag, an anchor, or the head.
        Var: 'inTag' = ((string_replaceregexp: $myregexp->output,
            -find='<.*>', -replace='', -ignorecase) >> '<');
        Var: 'inAnchor' = ((string_replaceregexp: $myregexp->output,
            -find='<a .*</a>', -replace='', -ignorecase) >> '<a ');
        Var: 'inHead' = ((string_replaceregexp: $myregexp->output,
            -find='<head .*</head>', -replace='', -ignorecase) >> '<head ');
        If: ($MyGlossary >> $temp) && !($inTag || $inAnchor || $inHead);
            // Glossary word in plain text: wrap it in a link.
            $MyRegExp->(AppendReplacement: '<a href="glossary.lasso?def=' +
                $temp + '">' + $temp + '</a>');
        Else;
            // Everything else is copied through unchanged.
            $MyRegExp->(AppendReplacement: $temp);
        /If;
    /While;
    $MyRegExp->AppendTail;
?>
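When we output $MyRegExp you can see that the bare instance of JIT has been modified, but the instance that was already linked has not. Note that the def parameter carries the matched text exactly as typed (def=JIT here). If the glossary page expects a lowercase key such as def=jit, lowercase $temp before building the href, for example with [String_LowerCase].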
[Encode_HTML: $MyRegExp]
-> This will happen <a href="glossary.lasso?def=JIT">JIT</a>. This will not be modified <a href="">JIT</a>.
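For anyone who wants to experiment with the idea outside Lasso, here is a minimal Python sketch of the same match-by-match approach. It is an illustration, not a translation of the [RegExp] API: the names GLOSSARY and link_glossary are hypothetical, the head check is left out, and the tag and anchor checks are deliberately simple (a stray "<" in ordinary text would confuse them). It also lowercases the def value, as the original question asked, while keeping the link text as typed.

```python
import re

GLOSSARY = {"jit", "asap", "wysiwyg"}  # hypothetical glossary keys, lowercase

WORD = re.compile(r"\w+")
TAG = re.compile(r"<[^>]*>")
ANCHOR = re.compile(r"<a\b.*?</a>", re.IGNORECASE | re.DOTALL)
ANCHOR_OPEN = re.compile(r"<a\b", re.IGNORECASE)


def link_glossary(text):
    """Wrap bare glossary words in links, keeping the text as typed."""
    out = []
    last = 0
    for m in WORD.finditer(text):
        before = text[:m.start()]
        # Remove complete tags and anchors from the text before the match;
        # a leftover opener means the match is inside a tag or an anchor.
        in_tag = "<" in TAG.sub("", before)
        in_anchor = ANCHOR_OPEN.search(ANCHOR.sub("", before)) is not None
        word = m.group(0)
        out.append(text[last:m.start()])
        if word.lower() in GLOSSARY and not (in_tag or in_anchor):
            out.append('<a href="glossary.lasso?def=%s">%s</a>'
                       % (word.lower(), word))
        else:
            out.append(word)
        last = m.end()
    out.append(text[last:])  # the equivalent of AppendTail
    return "".join(out)


print(link_glossary('This will happen JIT. '
                    'This will not be modified <a href="">JIT</a>.'))
```

The sketch rescans the text before every match, so it is only suitable for short page fragments.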
You can do something similar with the [String_ReplaceRegexp] tag in LP8 and earlier, but it is not as efficient as the code shown above.
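The single-replacement idea can be sketched as one case-insensitive substitution that re-inserts the matched text through a back-reference. The example below is in Python rather than Lasso, because the exact [String_ReplaceRegexp] replacement syntax is not shown in this message; GLOSSARY and link_all are hypothetical names. Unlike the loop-based approach, this version cannot skip words that are already inside a tag or an anchor, and the def value keeps the case of the matched text.

```python
import re

GLOSSARY = ["jit", "asap", "wysiwyg"]  # hypothetical glossary keys

# Builds \b(jit|asap|wysiwyg)\b and matches it case-insensitively.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in GLOSSARY) + r")\b",
    re.IGNORECASE,
)


def link_all(text):
    # \1 re-inserts the matched text exactly as typed, so 'Jit' stays 'Jit'.
    return PATTERN.sub(r'<a href="glossary.lasso?def=\1">\1</a>', text)


print(link_all("Jit delivery, ASAP."))
```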
Hope this helps,
[fletcher]
--
Fletcher Sandbeck fletcher@omnipilot.com
Director of Product Development http://www.lassostudio.com
OmniPilot Software, Inc. http://www.omnipilot.com