Character Strings

Strings of characters play a central role in input/output, and the operations provided for strings to some extent reflect this. However, a more general set of operations is available if the string is first converted into a sequence. Some examples of this are given below.

Magma provides two kinds of strings: normal character strings, and binary strings. Character strings are an inappropriate choice for manipulating data that includes non-printable characters. If this is required, a better choice is the binary string type. This type is similar semantically to a sequence of integers, in which each character is represented by its ASCII value between 0 and 255. The difference between a binary string and a sequence of integers is that a binary string is stored internally as an array of bytes, which is a more space-efficient representation.

Representation of Strings

Character strings may consist of all ordinary characters appearing on your keyboard, including the blank (space). Two symbols have a special meaning: the double-quote " and the backslash \. The double-quote delimits a character string, and hence cannot be used directly inside a string. The backslash serves as an escape character: it indicates that the next symbol is to be taken literally. By writing \" inside a string, one indicates that the symbol " is to be taken literally and is not to be interpreted as the end-of-string delimiter. For example:

> "\"Print this line in quotes\"";
"Print this line in quotes"
To obtain a literal backslash, one simply types two backslashes; for characters other than double-quotes and backslash it does not make a difference when a backslash precedes them inside a string, with the exception of n, r and t. Any occurrence of \n or \r inside a string is converted into a <new-line> while \t is converted into a <tab>. For example:
> "The first line,\nthe second line, and then\ran\tindented line";
The first line,
the second line, and then
an        indented line
Note that a backslash followed by a return allows one to conveniently continue the current construction on the next line; so \<return> inside a string will be ignored, except that input will continue on a new line on your screen.
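As a brief illustration of the escape rules above (a sketch, not one of the original examples), a doubled backslash yields a single literal backslash, which counts as one character, and \" yields a literal double-quote:
> "a\\b";
a\b
> #"a\\b";
3
> "say \"hi\"";
say "hi"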

Binary strings, on the other hand, can consist of any character, whether printable or non-printable. Binary strings cannot be constructed using literals, but must be constructed either from a character string, or during a read operation from a file.

Creation of Strings

"abc" : -> MonStgElt
Create a string from a succession of keyboard characters (a, b, c) enclosed in double quotes " ".
BinaryString(s) : MonStgElt -> BStgElt
BString(s) : MonStgElt -> BStgElt
Create a binary string from the character string s.
s cat t : MonStgElt, MonStgElt -> MonStgElt
s cat t : BStgElt, BStgElt -> BStgElt
s * t : MonStgElt, MonStgElt -> MonStgElt
Concatenate the strings s and t.
s cat:= t : MonStgElt, MonStgElt -> MonStgElt
s cat:= t : BStgElt, BStgElt -> BStgElt
s *:= t : MonStgElt, MonStgElt -> MonStgElt
Modification-concatenation of the string s with t: concatenate s and t and put the result in s.
&cat s : [ MonStgElt ] -> MonStgElt
&cat s : [ BStgElt ] -> BStgElt
&* s : [ MonStgElt ] -> MonStgElt
Given an enumerated sequence s of strings, return the concatenation of these strings.
s ^ n : MonStgElt, RngIntElt -> MonStgElt
Form the n-fold concatenation of the string s, for n≥0. If n=0 this is the empty string, if n=1 it equals s, etc.
s[i] : MonStgElt, RngIntElt -> MonStgElt
Returns the substring of s consisting of the i-th character.
s[i] : BStgElt, RngIntElt -> RngIntElt
Returns the numeric value representing the i-th character of s.
ElementToSequence(s) : MonStgElt -> [ MonStgElt ]
Eltseq(s) : MonStgElt -> [ MonStgElt ]
Returns the sequence of characters of s (as length 1 strings).
ElementToSequence(s) : BStgElt -> [ RngIntElt ]
Eltseq(s) : BStgElt -> [ RngIntElt ]
Returns the sequence of numeric values representing the characters of s.
Substring(s, n, k) : MonStgElt, RngIntElt, RngIntElt -> MonStgElt
Substring(s, n, k) : BStgElt, RngIntElt, RngIntElt -> BStgElt
Return the substring of s of length k starting at position n.
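The following short session sketches several of the constructions in this section. It is illustrative only; the variable names s and b are arbitrary, and the numeric values shown for the binary string assume the ASCII character set.
> s := "abc";
> s cat "def";
abcdef
> &cat [ "x", "yz", "w" ];
xyzw
> s ^ 3;
abcabcabc
> s[2];
b
> Eltseq(s);
[ a, b, c ]
> Substring("Magma handbook", 7, 4);
hand
> // Indexing a binary string gives numeric values:
> b := BinaryString("abc");
> b[1];
97
> Eltseq(b);
[ 97, 98, 99 ]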

Integer-Valued Functions

# s : MonStgElt -> RngIntElt
# s : BStgElt -> RngIntElt
The length of the string s.
Index(s, t) : MonStgElt, MonStgElt -> RngIntElt
Position(s, t) : MonStgElt, MonStgElt -> RngIntElt
This function returns the position (an integer p with 0 < p≤#s) in the string s where the beginning of a contiguous substring t occurs. It returns 0 if t is not a substring of s. (If t is the empty string, position 1 will always be returned, even if s is empty as well.)
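A small sketch of these functions, including the two boundary cases described above (t not a substring, and t empty):
> s := "abracadabra";
> #s;
11
> Position(s, "cad");
5
> Index(s, "xyz");
0
> Position(s, "");
1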

Character Conversion

To perform more sophisticated operations, one may convert the string into a sequence and use the extensive facilities for sequences described in the next part of this manual; see the examples at the end of this chapter for details.

StringToCode(s) : MonStgElt -> RngIntElt
Returns the code number of the first character of string s. This code depends on the computer system that is used; it is ASCII on most UNIX machines.
CodeToString(n) : RngIntElt -> MonStgElt
Returns a character (string of length 1) corresponding to the code number n, where the code is system dependent (see previous entry).
StringToInteger(s) : MonStgElt -> RngIntElt
Returns the integer corresponding to the string of decimal digits s. All non-space characters in the string s must be digits (0, 1, ..., 9), except the first character, which is also allowed to be + or -. An error results if any other combination of characters occurs. Leading zeros are omitted.
StringToInteger(s, b) : MonStgElt, RngIntElt -> RngIntElt
Returns the integer corresponding to the string of digits s, all assumed to be written in base b. All non-space characters in the string s must be digits less than b (if b is greater than 10, `A' is used for 10, `B' for 11, etc.), except the first character, which is also allowed to be + or -. An error results if any other combination of characters occurs.
StringToIntegerSequence(s) : MonStgElt -> [ RngIntElt ]
Returns the sequence of integers corresponding to the string s of space-separated decimal numbers. All non-space characters in the string s must be digits (0, 1, ..., 9), except the first character after each space, which is also allowed to be + or -. An error results if any other combination of characters occurs. Leading zeros are omitted. Each number can begin with a sign (+ or -) without a space.
IntegerToString(n) : RngIntElt -> MonStgElt
Convert the integer n into a string of decimal digits; if n is negative the first character of the string will be -. (Note that leading zeros and a + sign are ignored when Magma builds an integer, so the resulting string will never begin with + or 0 characters.)
IntegerToString(n, b) : RngIntElt, RngIntElt -> MonStgElt
Convert the integer n into a string of digits with the given base (which must be in the range [2 ... 36]); if n is negative the first character of the string will be -.
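The conversions above can be sketched as follows. This is an illustrative session only; the code numbers shown assume an ASCII system, as noted in the entry for StringToCode.
> StringToCode("A");
65
> CodeToString(97);
a
> // Leading zeros are dropped and the sign is kept:
> StringToInteger("-0042");
-42
> StringToInteger("FF", 16);
255
> StringToIntegerSequence("10 -3 +7");
[ 10, -3, 7 ]
> IntegerToString(255, 2);
11111111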

Boolean Functions

s eq t : MonStgElt, MonStgElt -> BoolElt
s eq t : BStgElt, BStgElt -> BoolElt
Returns true if and only if the strings s and t are identical. Note that blanks are significant.
s ne t : MonStgElt, MonStgElt -> BoolElt
s ne t : BStgElt, BStgElt -> BoolElt
Returns true if and only if the strings s and t are distinct. Note that blanks are significant.
s in t : MonStgElt, MonStgElt -> BoolElt
Returns true if and only if s appears as a contiguous substring of t. Note that the empty string is contained in every string.
s notin t : MonStgElt, MonStgElt -> BoolElt
Returns true if and only if s does not appear as a contiguous substring of t. Note that the empty string is contained in every string.
s lt t : MonStgElt, MonStgElt -> BoolElt
s lt t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically less than t, false otherwise. Here the ordering on characters imposed by their ASCII code number is used.
s le t : MonStgElt, MonStgElt -> BoolElt
s le t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically less than or equal to t, false otherwise. Here the ordering on characters imposed by their ASCII code number is used.
s gt t : MonStgElt, MonStgElt -> BoolElt
s gt t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically greater than t, false otherwise. Here the ordering on characters imposed by their ASCII code number is used.
s ge t : MonStgElt, MonStgElt -> BoolElt
s ge t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically greater than or equal to t, false otherwise. Here the ordering on characters imposed by their ASCII code number is used.
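A few quick checks of these predicates (a sketch; the last comparison relies on the ASCII ordering, in which upper-case letters precede lower-case letters):
> "ab" eq "ab ";
false
> "" in "abc";
true
> "" notin "abc";
false
> "abc" lt "abd";
true
> "Z" lt "a";
true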

Example IO_Strings (H3E1)

> "Mag" cat "ma";
Magma
Omitting double-quotes usually has undesired effects:
> "Mag cat ma";
Mag cat ma
And note that there are two different equalities involved in the following!
> "73" * "9" * "42" eq "7" * "3942";
true
> 73 * 9 * 42 eq 7 * 3942;
true
The next line shows how strings can be concatenated quickly, and also that strings of blanks can be used for formatting:
> s := ("Mag" cat "ma? ")^2;
> s, " "^30, s[4]^12, "!";
Magma? Magma?                            mmmmmmmmmmmm !
Here is a way to list (in a sequence) the first occurrence of each of the ten digits in the decimal expansion of π, using IntegerToString and Position.
> pi := Pi(RealField(1001));
> dec1000 := Round(10^1000*(pi-3));
> I := IntegerToString(dec1000);
> [ Position(I, IntegerToString(i)) : i in [0..9] ];
[ 32, 1, 6, 9, 2, 4, 7, 13, 11, 5 ]
Using the length # and string indexing [ ] it is also easy to count the number of occurrences of each digit in the string containing the first 1000 digits.
> [ #[i : i in [1..#I] | I[i] eq IntegerToString(j)] : j in [0..9] ];
[ 93, 116, 103, 102, 93, 97, 94, 95, 101, 106 ]
We would like to test if the ASCII-encoding of the string `Magma' appears. This could be done as follows, using StringToCode and in, or alternatively, Position. To reduce the typing, we first abbreviate IntegerToString to its and StringToCode to sc.
> sc := StringToCode;
> its := IntegerToString;
> M := its(sc("M")) * its(sc("a")) * its(sc("g")) * its(sc("m")) * its(sc("a"));
> M;
779710310997
> M in I;
false
> Position(I, M);
0
So `Magma' does not appear this way. However, we could be satisfied if the letters appear somewhere in the right order. To do more sophisticated operations (like this) on strings, it is necessary to convert the string into a sequence, because sequences constitute a more versatile data type, allowing many more advanced operations than strings.
> Iseq := [ I[i] : i in [1..#I] ];
> Mseq := [ M[i] : i in [1..#M] ];
> IsSubsequence(Mseq, Iseq);
false
> IsSubsequence(Mseq, Iseq: Kind := "Sequential");
true
Finally, we find that the string `magma' lies in between `Pi' and `pi':
> "Pi" le "magma";
true
> "magma" lt "pi";
true

Parsing Strings

Split(S, D) : MonStgElt, MonStgElt -> [ MonStgElt ]
Split(S) : MonStgElt -> [ MonStgElt ]
    IncludeEmpty: BoolElt               Default: false
Given a string S, together with a string D describing a list of separator characters, return the sequence of strings obtained by splitting S at any of the characters contained in D. That is, S is considered as a sequence of fields, with any character in D taken to be a delimiter separating the fields. If D is omitted, it is taken to be the string consisting of the newline character alone (so S is split into the lines found in it). If S is desired to be split into space-separated words, the argument " \t\n" should be given for D.

By default, empty fields are not returned. This may be changed by setting the parameter IncludeEmpty to true.

Example IO_Split (H3E2)

We demonstrate elementary uses of Split.
> Split("a b c d", " ");
[ a, b, c, d ]
> // Note that adjacent separators do not produce
> // extra fields by default:
> Split("a||b|c", "|");
[ a, b, c ]
> // But they can be made to appear with IncludeEmpty:
> Split("a||b|c", "|" : IncludeEmpty := true);
[ a, , b, c ]
> Split("abxcdyefzab", "xyz");
[ ab, cd, ef, ab ]
> // Note that no splitting happens if the delimiter
> // is empty:
> Split("abcd", "");
[ abcd ]
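Two further cases described in the entry for Split, sketched here for illustration: the one-argument form, which splits at newlines, and splitting into whitespace-separated words with the separator string " \t\n".
> Split("first line\nsecond line");
[ first line, second line ]
> Split("one  two\tthree\nfour", " \t\n");
[ one, two, three, four ]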
Regexp(R, S) : MonStgElt, MonStgElt -> BoolElt, MonStgElt, [ MonStgElt ]
Given a string R specifying a regular expression, together with a string S, return whether S matches R. If so, return also the matched substring of S, together with the sequence of matched substrings of S corresponding to the parenthesized expressions of R. This function is based on the freely distributable reimplementation of the V8 regexp package by Henry Spencer. The syntax and interpretation of the characters |, *, +, ?, ^, $, [], is the same as in the UNIX command egrep. The parenthesized expressions are numbered in left-to-right order of their opening parentheses. Note that the parentheses should not have an initial backslash before them as the UNIX commands grep and ed require.

Example IO_Regexp (H3E3)

We demonstrate some elementary uses of Regexp.
> Regexp("b.*d", "abcde");
true bcd []
> Regexp("b(.*)d", "abcde");
true bcd [ c ]
> Regexp("b.*d", "xyz");
false
> date := "Mon Jun 17 10:27:27 EST 1996";
> _, _, f := Regexp("([0-9][0-9]):([0-9][0-9]):([0-9][0-9])", date);
> f;
[ 10, 27, 27 ]
> h, m, s := Explode(f);
> h, m, s;
10 27 27
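As a further sketch (not part of the original example), the anchors ^ and $ can be used to require that the whole string matches, and parenthesized expressions are returned in left-to-right order:
> Regexp("^[0-9]+$", "12345");
true 12345 []
> Regexp("^[0-9]+$", "12a45");
false
> Regexp("^([a-z]+)=([0-9]+)$", "width=80");
true width=80 [ width, 80 ]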
V2.28, 13 July 2023