TTMSFNCMemoCustomLanguage
TTMSFNCMemoCustomLanguage is a component for creating custom language definitions that can be used in a TTMSFNCMemo. It is based on the Monarch library, and this page refers to the sample that is available in the Monarch documentation.
Add a custom language to TTMSFNCMemo
Adding a custom language can be done programmatically or at design time. The general steps are as follows:
- Set up TTMSFNCMemoCustomLanguage with the language definitions.
- Add a new item to the TTMSFNCMemo.CustomLanguages collection and assign the TTMSFNCMemoCustomLanguage instance to the TTMSFNCMemo.CustomLanguages[].Language property.
- Set the TTMSFNCMemo.Language property to mlCustom.
- Set the TTMSFNCMemo.CustomLanguageID property to the custom language ID (the first item added is selected automatically).
Doing the steps above programmatically:
procedure TForm1.Button1Click(Sender: TObject);
begin
//Setup component as desired, e.g:
//TMSFNCMemoCustomLanguage1.Brackets.Add('{', '}', lbtCurly);
TMSFNCMemo1.CustomLanguages.Add.Language := TMSFNCMemoCustomLanguage1;
TMSFNCMemo1.Language := mlCustom;
//Optionally:
//TMSFNCMemo1.CustomLanguageID := 'my-language';
end;
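If the component is not dropped on the form at design time, the same steps can presumably be carried out with an instance created at run time. The following is a minimal sketch, not a definitive implementation. It assumes that TTMSFNCMemoCustomLanguage exposes the standard TComponent constructor and that the LanguageID property can be assigned in code. The FormCreate handler and the 'my-language' ID are placeholders chosen for this example.

```pascal
procedure TForm1.FormCreate(Sender: TObject);
var
  lang: TTMSFNCMemoCustomLanguage;
begin
  // Assumption: standard TComponent constructor. The form owns the
  // instance, so it is freed together with the form.
  lang := TTMSFNCMemoCustomLanguage.Create(Self);

  // 'my-language' is a placeholder ID used only in this sketch.
  lang.LanguageID := 'my-language';

  // Register the definition with the memo and switch to custom mode.
  TMSFNCMemo1.CustomLanguages.Add.Language := lang;
  TMSFNCMemo1.Language := mlCustom;

  // Select the definition explicitly by its ID.
  TMSFNCMemo1.CustomLanguageID := 'my-language';
end;
```

Setting LanguageID explicitly avoids relying on the component name as the fallback ID, which matters here because a component created at run time has no design-time name.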
Defining a language
The component maps onto the language definition object used by the Monarch library, so it helps to familiarize yourself with the Monarch documentation first.
For the most common and simple properties please refer to the properties overview at the end.
ExtraKeys
The Monarch library allows flexible definitions. In the language definition object it is possible to define a list of strings or a regular expression with a custom name, and later these can be referenced as a token or guard with the same name. To provide the same flexibility, we added the ExtraKeys collection.
Each ExtraKey item has 3 properties:
- Name: string: This will be used to reference the key.
- ValueList: TStringList: The values of the key as a list.
- ValueRegex: string: The value of the key as a regular expression string.
In the mylang sample provided by Monarch, you can observe 5 of these extra keys: keywords, typeKeywords, operators, symbols and escapes.
At design-time, you'd want to add keywords, typeKeywords and operators in the ValueList. For symbols and escapes ValueRegex should be used.
You can add these to TTMSFNCMemoCustomLanguage the following way:
//One-by-one, for example:
procedure TForm1.Button1Click(Sender: TObject);
var
key: TTMSFNCMemoCustomLanguageKey;
begin
key := TMSFNCMemoCustomLanguage1.ExtraKeys.Add;
key.Name := 'symbols';
key.ValueRegex := '[=><!~?:&|+\-*\/\^%]+';
end;
//Or using the Add overloads:
procedure TForm1.Button1Click(Sender: TObject);
begin
TMSFNCMemoCustomLanguage1.ExtraKeys.Add('keywords', ['abstract', 'continue',
'for', 'new', 'switch', 'assert', 'goto', 'do', 'if', 'private', 'this',
'break', 'protected', 'throw', 'else', 'public', 'enum', 'return', 'catch',
'try', 'interface', 'static', 'class', 'finally', 'const', 'super', 'while',
'true', 'false']);
TMSFNCMemoCustomLanguage1.ExtraKeys.Add('typeKeywords', ['boolean', 'double',
'byte', 'int', 'short', 'char', 'void', 'long', 'float']);
TMSFNCMemoCustomLanguage1.ExtraKeys.Add('operators', ['=', '>', '<', '!', '~',
'?', ':', '==', '<=', '>=', '!=', '&&', '||', '++', '--', '+', '-', '*',
'/', '&', '|', '^', '%', '<<', '>>', '>>>', '+=', '-=', '*=', '/=', '&=',
'|=', '^=', '%=', '<<=', '>>=', '>>>=']);
TMSFNCMemoCustomLanguage1.ExtraKeys.Add('symbols', '[=><!~?:&|+\-*\/\^%]+');
TMSFNCMemoCustomLanguage1.ExtraKeys.Add('escapes', '\\(?:[abfnrtv\\"'']|x[0-9A-Fa-f]{1,4}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})');
end;
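To illustrate how a key defined this way is referenced later, here is a short sketch that only uses calls shown elsewhere on this page. The key is registered under the name keywords and then used as a guard by prefixing the name with @. The shortened keyword list is an arbitrary choice for this example.

```pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  root: TTMSFNCMemoCustomLanguageTokenizerItem;
  r: TTMSFNCMemoCustomLanguageRule;
begin
  // Define a named list of values (shortened for this example).
  TMSFNCMemoCustomLanguage1.ExtraKeys.Add('keywords', ['if', 'else', 'while']);

  root := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  root.Name := 'root';

  // Reference the key by name, prefixed with @, as a guard in a rule.
  r := root.Rules.Add;
  r.Regex := '[a-z_$][\w$]*';
  r.Action.Cases.Add('@keywords', 'keyword');
  r.Action.Cases.Add('@default', 'identifier');
end;
```

Keeping the word list in an extra key rather than in the regular expression means the list can be edited without touching the tokenizer rules.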
Tokenizer
The Tokenizer property corresponds to the tokenizer used in Monarch. This tokenizer defines how lexical analysis is performed and how the input is split into tokens. While the Monarch documentation utilizes shorthand syntax, the TTMSFNCMemoCustomLanguage requires the full object representation. To simplify programmatic configuration, we’ve provided Add method overloads, which replicate the shorthand syntax commonly used in JavaScript.
Let's see how you can add items to the tokenizer based on the mylang sample.
procedure TForm1.Button1Click(Sender: TObject);
var
root, comment, str, whitespace: TTMSFNCMemoCustomLanguageTokenizerItem;
r: TTMSFNCMemoCustomLanguageRule;
begin
root := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
root.Name := 'root';
// identifiers and keywords
r := root.Rules.Add;
r.Regex := '[a-z_$][\w$]*';
r.Action.Cases.Add('@typeKeywords', 'keyword');
r.Action.Cases.Add('@keywords', 'keyword');
r.Action.Cases.Add('@default', 'identifier');
root.Rules.Add('[A-Z][\w\$]*', 'type.identifier');
//whitespace
root.Rules.Add.Include := '@whitespace';
//delimiters and operators
root.Rules.Add('[{}()\[\]]', '@brackets');
root.Rules.Add('[<>](?!@symbols)', '@brackets');
r := root.Rules.Add;
r.Regex := '@symbols';
r.Action.Cases.Add('@operators', 'operator');
r.Action.Cases.Add('@default', '');
//@annotations
r := root.Rules.Add;
r.Regex := '@\s*[a-zA-Z_\$][\w\$]*';
r.Action.Token := 'annotation';
r.Action.LogMessage := 'annotation token: $0';
//numbers
root.Rules.Add('\d*\.\d+([eE][\-+]?\d+)?', 'number.float');
root.Rules.Add('0[xX][0-9a-fA-F]+', 'number.hex');
root.Rules.Add('\d+', 'number');
//delimiter
root.Rules.Add('[;,.]', 'delimiter');
//strings
root.Rules.Add('"([^"\\]|\\.)*$', 'string.invalid');
r := root.Rules.Add;
r.Regex := '"';
r.Action.Token := 'string.quote';
r.Action.Bracket := lbkOpen;
r.Action.Next := '@string';
//characters
root.Rules.Add('''[^\\'']''', 'string');
r := root.Rules.Add;
r.Regex := '('')(@escapes)('')';
r.Action.Group.Add.Token := 'string';
r.Action.Group.Add.Token := 'string.escape';
r.Action.Group.Add.Token := 'string';
root.Rules.Add('''', 'string.invalid');
comment := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
comment.Name := 'comment';
comment.Rules.Add('[^\/*]+', 'comment');
comment.Rules.Add('\/\*', 'comment', '@push');
comment.Rules.Add('\*\/', 'comment', '@pop');
comment.Rules.Add('[\/*]', 'comment');
str := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
str.Name := 'string';
str.Rules.Add('[^\\"]+', 'string');
str.Rules.Add('@escapes', 'string.escape');
str.Rules.Add('\\.', 'string.invalid');
r := str.Rules.Add;
r.Regex := '"';
r.Action.Token := 'string.quote';
r.Action.Bracket := lbkClose;
r.Action.Next := '@pop';
whitespace := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
whitespace.Name := 'whitespace';
whitespace.Rules.Add('[ \t\r\n]+', 'white');
whitespace.Rules.Add('\/\*', 'comment', '@comment');
whitespace.Rules.Add('\/\/.*$', 'comment');
end;
Note
The mylang sample is available as a demo in the Demo folder.
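The listing above mixes the Add shorthand with the full property-by-property form. The sketch below puts the two side by side for a single rule. It assumes that the three-argument Add maps its arguments onto Regex, Action.Token and Action.Next, which is how the shorthand reads in Monarch but is not spelled out on this page. The state name demo is a placeholder.

```pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  state: TTMSFNCMemoCustomLanguageTokenizerItem;
  r: TTMSFNCMemoCustomLanguageRule;
begin
  state := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  state.Name := 'demo'; // placeholder state name for this sketch

  // Shorthand: regex, token and next state in a single call.
  state.Rules.Add('\*\/', 'comment', '@pop');

  // Full form: presumably the same rule, configured property by property.
  r := state.Rules.Add;
  r.Regex := '\*\/';
  r.Action.Token := 'comment';
  r.Action.Next := '@pop';
end;
```

In a real definition only one of the two would be kept. The full form is needed when a rule uses Cases, Group, Bracket or LogMessage, as in the listing above.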
Save and load a language definition
Thanks to TTMSFNCPersistence, you can save and load a language definition in the component. To save, simply call:
procedure TForm1.Button1Click(Sender: TObject);
begin
TMSFNCMemoCustomLanguage1.SaveSettingsToFile('path/to/my-lang.file');
end;
And to load:
procedure TForm1.Button1Click(Sender: TObject);
begin
TMSFNCMemoCustomLanguage1.LoadSettingsFromFile('path/to/my-lang.file');
end;
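A round trip can be sketched as follows: save the definition from one component and load it into another. This is an illustrative sketch only. TMSFNCMemoCustomLanguage2 is a hypothetical second component on the form, and the file name and its location in the documents folder are arbitrary choices. TPath and TFile come from the System.IOUtils unit of the Delphi RTL.

```pascal
// Add to the uses clause of the form unit's implementation section.
uses
  System.IOUtils;

procedure TForm1.Button1Click(Sender: TObject);
var
  fileName: string;
begin
  // Build a platform-independent path; 'my-lang.file' is a placeholder name.
  fileName := TPath.Combine(TPath.GetDocumentsPath, 'my-lang.file');

  // Write the language definition of the first component to disk.
  TMSFNCMemoCustomLanguage1.SaveSettingsToFile(fileName);

  // Load it into the (hypothetical) second component if the file exists.
  if TFile.Exists(fileName) then
    TMSFNCMemoCustomLanguage2.LoadSettingsFromFile(fileName);
end;
```

Building the path with TPath.Combine avoids hard-coded path separators, which is useful because FNC components target several platforms.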
Properties
| Property name | Description |
|---|---|
| Brackets | This is used by the tokenizer to easily define matching braces. |
| DefaultToken: string | The default token is used if nothing matches in the tokenizer. |
| ExtraKeys | See ExtraKeys. |
| IgnoreCase: Boolean | Maps onto the ignoreCase option. The regular expressions in the tokenizer use this to do case (in)sensitive matching, as do the tests in the cases construct. This value is False by default. |
| IncludeLF: Boolean | Include line feeds (in the form of an \n character) at the end of the lines. Default value is False. |
| LanguageID: string | The ID of the language. If left empty, the component name will be used. |
| Tokenizer | Defines the tokenization rules. See Tokenizer. |
| Unicode: Boolean | Determines if the language is unicode-aware. |
Methods
| Method name | Description |
|---|---|
| LoadSettingsFromFile | Loads component settings (= language definition) from a file |
| SaveSettingsToFile | Saves component settings (= language definition) to a file |