

5.2 tr, dc, and sq functions

Built-in Function: string tr (string subj, string set1, string set2)

Translates characters in string subj and returns the resulting string.

Translation rules are defined by two character sets: set1 is a set of characters which, when encountered in subj, must be replaced with the corresponding characters from set2. E.g.:

tr('text', 'tx', 'ni') ⇒ 'nein'
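The positional pairing of set1 and set2 can be modelled in a few lines of Python. This is only a sketch of the rule described above, not the mailfromd implementation; the helper name tr_simple is hypothetical, and it assumes plain sets of equal length with no classes or ranges.

```python
def tr_simple(subj: str, set1: str, set2: str) -> str:
    """Model of the plain case: set1[i] is replaced by set2[i]."""
    # str.maketrans pairs the characters position by position and
    # requires both strings to have the same length.
    table = str.maketrans(set1, set2)
    # Characters that do not occur in set1 pass through unchanged.
    return subj.translate(table)


print(tr_simple("text", "tx", "ni"))  # nein
```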

The source set set1 can contain character classes, i.e. sets of characters enclosed in square brackets. A character class matches the input character if that character is listed in the class. When a match occurs, the character is replaced with the corresponding character from set2:

tr('abacus', '[abc]', '_') ⇒ '____us'

An exclamation sign at the beginning of a character class reverses its meaning, i.e. the class matches any character not listed in it:

tr('abacus', '[!abc]', '_') ⇒ 'abac__'

A character class can contain ranges, each written as the first and last characters of the range separated by a dash. A range ‘x-y’ comprises all characters between x and y inclusive. For example, ‘[a-d]’ is equivalent to ‘[abcd]’. Ranges must be ascending, i.e. ‘[a-d]’ is correct, but ‘[d-a]’ is not. To include a literal ‘-’, make it the first or last character between the brackets: ‘[0-9-]’ matches any digit or dash.
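Python's fnmatch module happens to accept the same ‘[...]’, ‘[!...]’ and ‘x-y’ notation, so it can serve as a rough model of how a single character class in set1 maps every matching character to one replacement. This is an illustration only: the helper name tr_class is hypothetical, and fnmatch does not understand named classes such as ‘[:alpha:]’.

```python
import fnmatch


def tr_class(subj: str, char_class: str, repl: str) -> str:
    """Replace every character matched by one bracket class with repl."""
    # fnmatchcase tests each single character against the class pattern.
    return "".join(
        repl if fnmatch.fnmatchcase(ch, char_class) else ch
        for ch in subj
    )


print(tr_class("abacus", "[abc]", "_"))   # ____us
print(tr_class("abacus", "[!abc]", "_"))  # abac__
print(tr_class("abacus", "[a-d]", "_"))   # ____us
```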

Similarly, to include a closing bracket, make it the first character in the class (after the negation character, if the class is negated). For example, ‘[][!]’ matches the three characters ‘[’, ‘]’ and ‘!’, whereas ‘[!][]’ matches any character except ‘[’ and ‘]’.

Named character classes are special reserved names between ‘[:’ and ‘:]’ delimiters:

[:alnum:]

Matches any alphanumeric character. Equivalent to ‘[:alpha:][:digit:]’.

[:alpha:]

Matches any alphabetic character.

[:blank:]

Matches horizontal space or tab.

[:cntrl:]

Matches a control character, i.e. a character with ASCII code less than 32.

[:digit:]

Matches a decimal digit (0 through 9).

[:graph:]

Matches any printable character except space (horizontal space and tab).

[:lower:]

Matches any lowercase letter.

[:print:]

Matches any printable character including space.

[:punct:]

Matches any printable character which is not a space or an alphanumeric character.

[:space:]

Matches ‘white-space’ characters: horizontal space (ASCII 32), form-feed (ASCII 12, or ‘\f’), newline (ASCII 10, or ‘\n’), carriage return (ASCII 13, or ‘\r’), horizontal tab (ASCII 9, or ‘\t’), and vertical tab (ASCII 11, or ‘\v’).

[:upper:]

Matches any uppercase letter.

[:xdigit:]

Matches any hexadecimal digit: ‘0’ through ‘9’, ‘a’ through ‘f’ and ‘A’ through ‘F’.
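The named classes listed above can be approximated as plain sets of ASCII characters. The table below is a sketch built from the descriptions in this section, assuming ASCII only; the names NAMED_CLASSES and in_class are hypothetical and are not part of mailfromd.

```python
import string

# Each named class as a set of ASCII characters, following the text above.
NAMED_CLASSES = {
    "alnum": set(string.ascii_letters + string.digits),
    "alpha": set(string.ascii_letters),
    "blank": set(" \t"),
    "cntrl": {chr(code) for code in range(32)},
    "digit": set(string.digits),
    "graph": set(string.ascii_letters + string.digits + string.punctuation),
    "lower": set(string.ascii_lowercase),
    "print": set(string.ascii_letters + string.digits + string.punctuation + " "),
    "punct": set(string.punctuation),
    "space": set(" \f\n\r\t\v"),
    "upper": set(string.ascii_uppercase),
    "xdigit": set(string.hexdigits),
}


def in_class(ch: str, name: str) -> bool:
    """Return True if the character ch belongs to the named class."""
    return ch in NAMED_CLASSES[name]


print(in_class("F", "xdigit"))  # True
print(in_class("g", "xdigit"))  # False
```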

Named classes can appear in character classes in set1 anywhere a regular character is allowed. Examples:

[][:alpha:]-]

Matches alphabetic characters (both cases), closing bracket and dash.

[!][:alpha:]-]

A complement of the above: matches any character except the ones listed above.

[[:xdigit:][:blank:]]

Matches any hexadecimal digit or horizontal whitespace character.

The replacement set must not be empty. Its length must be equal to or less than that of set1 (character classes being counted as one character). If set1 contains more characters than set2, the surplus ones will be translated to the last character from set2:

tr('lasted', 'alde', 'iL?') ⇒ 'List??'
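The padding rule can be modelled by extending set2 with copies of its last character before pairing the two sets. This is a sketch under stated assumptions, not the actual implementation: the helper name tr_padded is hypothetical, it handles plain characters only, and it uses Python's ValueError where the text speaks of mailfromd exceptions.

```python
def tr_padded(subj: str, set1: str, set2: str) -> str:
    """Model of tr for plain sets where set2 may be shorter than set1."""
    if not set2:
        raise ValueError("set2 must not be empty")
    if len(set2) > len(set1):
        # Model's own choice: the text only says set2 must not be longer.
        raise ValueError("set2 must not be longer than set1")
    # Surplus characters of set1 map to the last character of set2.
    padded = set2 + set2[-1] * (len(set1) - len(set2))
    return subj.translate(str.maketrans(set1, padded))


print(tr_padded("lasted", "alde", "iL?"))  # List??
```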

Both sets can contain character ranges, represented as ‘c1-c2’. Whenever a range appears in set1, a range must appear in the corresponding position of set2:

tr('gnu', 'a-z', 'A-Z') ⇒ 'GNU'

Character ranges are not to be confused with ranges in character classes: they are similar, but quite distinct. Both match a single character, but while a character range translates it to the corresponding character from the replacement range, a range within a character class translates it to a single character:

tr('gnu', '[a-z]', 'A') ⇒ 'AAA'

Character ranges in set1 must always be in ascending order (i.e. ‘a-z’ is allowed, whereas ‘z-a’ is not). Ranges in set2 can be both ascending and descending, e.g.:

tr('8029', '0-9', '9-0') ⇒ '1970'
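Range-to-range translation, including a descending range in set2, can be sketched as follows. The code models only the case where each set is a single ‘c1-c2’ range; the helper names expand_range and tr_range are hypothetical, and ValueError stands in for the exceptions raised by the real function.

```python
def expand_range(first: str, last: str) -> list[str]:
    """List the characters from first to last, ascending or descending."""
    step = 1 if ord(first) <= ord(last) else -1
    return [chr(code) for code in range(ord(first), ord(last) + step, step)]


def tr_range(subj: str, range1: str, range2: str) -> str:
    """Model of tr where set1 and set2 are each one 'c1-c2' range."""
    first1, last1 = range1[0], range1[2]
    first2, last2 = range2[0], range2[2]
    if ord(first1) > ord(last1):
        raise ValueError("range in set1 must be ascending")
    source = expand_range(first1, last1)
    target = expand_range(first2, last2)
    if len(source) != len(target):
        raise ValueError("ranges differ in length")
    # Pair the n-th character of the source range with the n-th of the target.
    table = dict(zip(source, target))
    return "".join(table.get(ch, ch) for ch in subj)


print(tr_range("gnu", "a-z", "A-Z"))   # GNU
print(tr_range("8029", "0-9", "9-0"))  # 1970
```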

To translate a dash, place it as the first or last character in set1:

tr('in-place', '-ilp', ' Irg') ⇒ 'In grace'

The tr function raises the e_inval exception if set2 is empty or if set1 contains a range without a matching range in set2. It raises the e_range exception if a descending range appears in set1 or if the number of characters in a range from set1 does not match that of the corresponding range in set2.

Built-in Function: string dc (string subj, string set1)

Deletes from subj all characters that appear in set1. The syntax of set1 is as described in tr, except that character ranges are treated as if appearing within a character class (e.g. ‘a-z’ is the same as ‘[a-z]’).

For example, dc(subj, '0-9') removes decimal digits from its first argument.
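The effect of dc can be modelled as filtering out every character that belongs to the set. In this sketch the set is passed as an already expanded collection of characters rather than in the set1 syntax; the helper name dc_model is hypothetical.

```python
import string


def dc_model(subj: str, chars: str) -> str:
    """Drop every character of subj that occurs in chars."""
    return "".join(ch for ch in subj if ch not in chars)


# string.digits plays the role of the '0-9' range.
print(dc_model("tel. 555-0100", string.digits))  # tel. -
```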

Built-in Function: string sq (string subj, string set1)

Squeezes repeats, i.e. replaces each sequence of a repeated character that is listed in set1 with a single occurrence of that character. The syntax of set1 is as described in tr, except that character ranges are treated as if appearing within a character class (e.g. ‘a-z’ is the same as ‘[a-z]’).

For example, sq(subj, '[[:space:]]') replaces each run of a repeated whitespace character with a single occurrence of that character.
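Squeezing can be modelled by grouping consecutive identical characters and collapsing each group whose character belongs to the set. This sketch follows the wording above literally, so a run is a repetition of the same character; whether the real function treats mixed runs (for instance a tab followed by a space) differently is not stated here. The helper name sq_model is hypothetical, and the set is passed as an expanded collection of characters.

```python
import string
from itertools import groupby


def sq_model(subj: str, chars: str) -> str:
    """Collapse each run of a repeated character that occurs in chars."""
    pieces = []
    for ch, run in groupby(subj):
        length = len(list(run))
        # Runs of listed characters shrink to one; other runs are kept whole.
        pieces.append(ch if ch in chars else ch * length)
    return "".join(pieces)


# string.whitespace plays the role of '[[:space:]]'.
print(sq_model("aaa   bbb", string.whitespace))  # aaa bbb
```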

