TTMSFNCMemoCustomLanguage

TTMSFNCMemoCustomLanguage is a component for creating custom language definitions to be used in a TTMSFNCMemo. It is based on the Monarch library, and we will refer to the mylang sample that is available in the Monarch documentation.

Add a custom language to TTMSFNCMemo

Adding a custom language can be done programmatically or at design time. The general steps are as follows:

  1. Set up TTMSFNCMemoCustomLanguage with the language definitions
  2. Add a new item in the TTMSFNCMemo.CustomLanguages collection and assign the TTMSFNCMemoCustomLanguage instance to the TTMSFNCMemo.CustomLanguages[].Language property.
  3. Set the TTMSFNCMemo.Language property to mlCustom.
  4. Set the TTMSFNCMemo.CustomLanguageID property to the custom language ID (the first item added to the collection is selected automatically, so this is only needed for later items).

Doing the steps above programmatically:

procedure TForm1.Button1Click(Sender: TObject);
begin
  //Set up the component as desired, e.g.:
  //TMSFNCMemoCustomLanguage1.Brackets.Add('{', '}', lbtCurly);
  TMSFNCMemo1.CustomLanguages.Add.Language := TMSFNCMemoCustomLanguage1;
  TMSFNCMemo1.Language := mlCustom;
  //Optionally:
  //TMSFNCMemo1.CustomLanguageID := 'my-language';
end;
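
When more than one custom language is registered, the ID decides which one is active. The following is a minimal sketch of that case. It assumes a second component named TMSFNCMemoCustomLanguage2 and the IDs 'my-language' and 'my-other-language'; all three are placeholders for this example. It also assumes that CustomLanguageID is matched against each component's LanguageID, as the property descriptions suggest.

```delphi
procedure TForm1.Button2Click(Sender: TObject);
begin
  // Placeholder IDs; if LanguageID is left empty, the component name is used.
  TMSFNCMemoCustomLanguage1.LanguageID := 'my-language';
  TMSFNCMemoCustomLanguage2.LanguageID := 'my-other-language';

  // Register both definitions with the memo.
  TMSFNCMemo1.CustomLanguages.Add.Language := TMSFNCMemoCustomLanguage1;
  TMSFNCMemo1.CustomLanguages.Add.Language := TMSFNCMemoCustomLanguage2;

  // Switch to custom highlighting and pick the second definition by ID
  // (assumed to match the LanguageID set above).
  TMSFNCMemo1.Language := mlCustom;
  TMSFNCMemo1.CustomLanguageID := 'my-other-language';
end;
```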

Defining a language

The component's properties are mapped to the language definition object that the Monarch library uses. It helps to familiarize yourself with the Monarch documentation first.

For the most common and simple properties please refer to the properties overview at the end.

ExtraKeys

The Monarch library allows flexible definitions. In the language definition object it is possible to define a list of strings or a regular expression with a custom name, and later these can be referenced as a token or guard with the same name. To provide the same flexibility, we added the ExtraKeys collection.

Each ExtraKey item has 3 properties:

  • Name: string: This will be used to reference the key
  • ValueList: TStringList: The values of the key as a list
  • ValueRegex: string: The value of the key as a regular expression string

In the mylang sample provided by Monarch, you can observe 5 of these extra keys: keywords, typeKeywords, operators, symbols and escapes.

keywords, typeKeywords and operators are lists of strings, so at design time you'd add them to ValueList. symbols and escapes are regular expressions, so ValueRegex should be used for them. In code, you can add these to TTMSFNCMemoCustomLanguage the following way:

//One-by-one, for example:
procedure TForm1.Button1Click(Sender: TObject);
var
  key: TTMSFNCMemoCustomLanguageKey;
begin
  key := TMSFNCMemoCustomLanguage1.ExtraKeys.Add;
  key.Name := 'symbols';
  key.ValueRegex := '[=><!~?:&|+\-*\/\^%]+';
end;

//Or using the Add overloads:
procedure TForm1.Button1Click(Sender: TObject);
begin
  TMSFNCMemoCustomLanguage1.ExtraKeys.Add('keywords', ['abstract', 'continue',
    'for', 'new', 'switch', 'assert', 'goto', 'do', 'if', 'private', 'this',
    'break', 'protected', 'throw', 'else', 'public', 'enum', 'return', 'catch',
    'try', 'interface', 'static', 'class', 'finally', 'const', 'super', 'while',
    'true', 'false']);

  TMSFNCMemoCustomLanguage1.ExtraKeys.Add('typeKeywords', ['boolean', 'double',
    'byte', 'int', 'short', 'char', 'void', 'long', 'float']);

  TMSFNCMemoCustomLanguage1.ExtraKeys.Add('operators', ['=', '>', '<', '!', '~',
    '?', ':', '==', '<=', '>=', '!=', '&&', '||', '++', '--', '+', '-', '*',
    '/', '&', '|', '^', '%', '<<', '>>', '>>>', '+=', '-=', '*=', '/=', '&=',
    '|=', '^=', '%=', '<<=', '>>=', '>>>=']);

  TMSFNCMemoCustomLanguage1.ExtraKeys.Add('symbols', '[=><!~?:&|+\-*\/\^%]+');

  TMSFNCMemoCustomLanguage1.ExtraKeys.Add('escapes', '\\(?:[abfnrtv\\"'']|x[0-9A-Fa-f]{1,4}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})');
end;
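
For completeness, a list-based key can also be filled one value at a time. This sketch relies only on ValueList being a TStringList, as listed above, so the standard TStringList.Add method is used; the shortened value list is just for illustration.

```delphi
procedure TForm1.Button1Click(Sender: TObject);
var
  key: TTMSFNCMemoCustomLanguageKey;
begin
  key := TMSFNCMemoCustomLanguage1.ExtraKeys.Add;
  key.Name := 'typeKeywords';
  // ValueList is a TStringList, so the usual TStringList methods apply.
  key.ValueList.Add('boolean');
  key.ValueList.Add('double');
  key.ValueList.Add('int');
end;
```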

Tokenizer

The Tokenizer property corresponds to the tokenizer used in Monarch. This tokenizer defines how lexical analysis is performed and how the input is split into tokens. While the Monarch documentation uses shorthand syntax, TTMSFNCMemoCustomLanguage requires the full object representation. To simplify programmatic configuration, we've provided Add method overloads that replicate the shorthand syntax commonly used in JavaScript.
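
As a small illustration of the two styles, the sketch below writes the same rules once through an Add overload and once through the full object form. The state name 'demo' is a placeholder for this comparison only, and the equivalence of each pair is an assumption based on how the overloads are described here. Running it as written would register every rule twice, so it is meant for reading rather than for use.

```delphi
procedure TForm1.Button1Click(Sender: TObject);
var
  demo: TTMSFNCMemoCustomLanguageTokenizerItem;
  r: TTMSFNCMemoCustomLanguageRule;
begin
  // 'demo' is a placeholder state name used only for this comparison.
  demo := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  demo.Name := 'demo';

  // Shorthand: regular expression and token in one call.
  demo.Rules.Add('\d+', 'number');

  // Full form of the same rule.
  r := demo.Rules.Add;
  r.Regex := '\d+';
  r.Action.Token := 'number';

  // Shorthand with the next state as third argument.
  demo.Rules.Add('\*\/', 'comment', '@pop');

  // Full form of the same rule.
  r := demo.Rules.Add;
  r.Regex := '\*\/';
  r.Action.Token := 'comment';
  r.Action.Next := '@pop';
end;
```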

Let's see how you can add items to the tokenizer based on the mylang sample.

procedure TForm1.Button1Click(Sender: TObject);
var
  root, comment, str, whitespace: TTMSFNCMemoCustomLanguageTokenizerItem;
  r: TTMSFNCMemoCustomLanguageRule;
begin
  root := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  root.Name := 'root';

  // identifiers and keywords
  r := root.Rules.Add;
  r.Regex := '[a-z_$][\w$]*';
  r.Action.Cases.Add('@typeKeywords', 'keyword');
  r.Action.Cases.Add('@keywords', 'keyword');
  r.Action.Cases.Add('@default', 'identifier');

  root.Rules.Add('[A-Z][\w\$]*', 'type.identifier');

  //whitespace
  root.Rules.Add.Include := '@whitespace';

  //delimiters and operators
  root.Rules.Add('[{}()\[\]]', '@brackets');
  root.Rules.Add('[<>](?!@symbols)', '@brackets');
  r := root.Rules.Add;
  r.Regex := '@symbols';
  r.Action.Cases.Add('@operators', 'operator');
  r.Action.Cases.Add('@default', '');

  //@annotations
  r := root.Rules.Add;
  r.Regex := '@\s*[a-zA-Z_\$][\w\$]*';
  r.Action.Token := 'annotation';
  r.Action.LogMessage := 'annotation token: $0';

  //numbers
  root.Rules.Add('\d*\.\d+([eE][\-+]?\d+)?', 'number.float');
  root.Rules.Add('0[xX][0-9a-fA-F]+', 'number.hex');
  root.Rules.Add('\d+', 'number');

  //delimiter
  root.Rules.Add('[;,.]', 'delimiter');

  //strings
  root.Rules.Add('"([^"\\]|\\.)*$', 'string.invalid');
  r := root.Rules.Add;
  r.Regex := '"';
  r.Action.Token := 'string.quote';
  r.Action.Bracket := lbkOpen;
  r.Action.Next := '@string';

  //characters
  root.Rules.Add('''[^\\'']''', 'string');
  r := root.Rules.Add;
  r.Regex := '('')(@escapes)('')';
  r.Action.Group.Add.Token := 'string';
  r.Action.Group.Add.Token := 'string.escape';
  r.Action.Group.Add.Token := 'string';
  root.Rules.Add('''', 'string.invalid');

  comment := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  comment.Name := 'comment';
  comment.Rules.Add('[^\/*]+', 'comment');
  comment.Rules.Add('\/\*', 'comment', '@push');
  comment.Rules.Add('\*\/', 'comment', '@pop');
  comment.Rules.Add('[\/*]', 'comment');

  str := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  str.Name := 'string';
  str.Rules.Add('[^\\"]+', 'string');
  str.Rules.Add('@escapes', 'string.escape');
  str.Rules.Add('\\.', 'string.invalid');
  r := str.Rules.Add;
  r.Regex := '"';
  r.Action.Token := 'string.quote';
  r.Action.Bracket := lbkClose;
  r.Action.Next := '@pop';

  whitespace := TMSFNCMemoCustomLanguage1.Tokenizer.Add;
  whitespace.Name := 'whitespace';
  whitespace.Rules.Add('[ \t\r\n]+', 'white');
  whitespace.Rules.Add('\/\*', 'comment', '@comment');
  whitespace.Rules.Add('\/\/.*$', 'comment');
end;

Note

The mylang sample is available as a demo in the Demo folder.

Save and load a language definition

Thanks to TTMSFNCPersistence, you can save and load a language definition in the component. To save, simply call:

procedure TForm1.Button1Click(Sender: TObject);
begin
  TMSFNCMemoCustomLanguage1.SaveSettingsToFile('path/to/my-lang.file');
end;

And to load:

procedure TForm1.Button1Click(Sender: TObject);
begin
  TMSFNCMemoCustomLanguage1.LoadSettingsFromFile('path/to/my-lang.file');
end;
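
The two calls can be combined so that a definition is written once and reused on later runs. This is only a sketch: the documents folder is an arbitrary location chosen for the example, and TPath and TFile come from the Delphi RTL unit System.IOUtils, not from the component.

```delphi
// Requires System.IOUtils in the uses clause.
procedure TForm1.Button1Click(Sender: TObject);
var
  fileName: string;
begin
  // 'my-lang.file' is the placeholder name used above;
  // the documents folder is an arbitrary choice for this example.
  fileName := TPath.Combine(TPath.GetDocumentsPath, 'my-lang.file');

  if TFile.Exists(fileName) then
    // A saved definition exists: load it into the component.
    TMSFNCMemoCustomLanguage1.LoadSettingsFromFile(fileName)
  else
    // Nothing saved yet: store the current definition.
    TMSFNCMemoCustomLanguage1.SaveSettingsToFile(fileName);
end;
```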

Properties

Property name Description
Brackets This is used by the tokenizer to easily define matching braces.
DefaultToken: string The default token is used if nothing matches in the tokenizer.
ExtraKeys See ExtraKeys.
IgnoreCase: Boolean Maps onto the ignoreCase option. The regular expressions in the tokenizer use this to do case (in)sensitive matching, as do the tests in the cases construct. This value is False by default.
IncludeLF: Boolean Include line feeds (in the form of an \n character) at the end of the lines. Default value is False.
LanguageID: string The ID of the language. If left empty, the component name will be used.
Tokenizer Defines the tokenization rules. See Tokenizer.
Unicode: Boolean Determines whether the language is Unicode-aware.

Methods

Method name Description
LoadSettingsFromFile Loads component settings (= language definition) from a file
SaveSettingsToFile Saves component settings (= language definition) to a file