diff options
Diffstat (limited to 'man-pages-posix-2003/man1p/tr.1p')
-rw-r--r-- | man-pages-posix-2003/man1p/tr.1p | 543 |
1 files changed, 543 insertions, 0 deletions
diff --git a/man-pages-posix-2003/man1p/tr.1p b/man-pages-posix-2003/man1p/tr.1p new file mode 100644 index 0000000..47d4780 --- /dev/null +++ b/man-pages-posix-2003/man1p/tr.1p @@ -0,0 +1,543 @@ +.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved +.TH "TR" 1P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" tr +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.SH NAME +tr \- translate characters +.SH SYNOPSIS +.LP +\fBtr\fP \fB[\fP\fB-c | -C\fP\fB][\fP\fB-s]\fP \fIstring1 string2\fP\fB +.br +.sp +tr -s\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1\fP\fB +.br +.sp +tr -d\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1\fP\fB +.br +.sp +tr -ds\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1 string2\fP\fB +.br +\fP +.SH DESCRIPTION +.LP +The \fItr\fP utility shall copy the standard input to the standard +output with substitution or deletion of selected characters. +The options specified and the \fIstring1\fP and \fIstring2\fP operands +shall control translations that occur while copying +characters and single-character collating elements. +.SH OPTIONS +.LP +The \fItr\fP utility shall conform to the Base Definitions volume +of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines. +.LP +The following options shall be supported: +.TP 7 +\fB-c\fP +Complement the set of values specified by \fIstring1\fP. See the EXTENDED +DESCRIPTION section. +.TP 7 +\fB-C\fP +Complement the set of characters specified by \fIstring1\fP. See the +EXTENDED DESCRIPTION section. +.TP 7 +\fB-d\fP +Delete all occurrences of input characters that are specified by \fIstring1\fP. +.TP 7 +\fB-s\fP +Replace instances of repeated characters with a single character, +as described in the EXTENDED DESCRIPTION section. +.sp +.SH OPERANDS +.LP +The following operands shall be supported: +.TP 7 +\fIstring1\fP,\ \fIstring2\fP +.sp +Translation control strings. Each string shall represent a set of +characters to be converted into an array of characters used for +the translation. For a detailed description of how the strings are +interpreted, see the EXTENDED DESCRIPTION section. +.sp +.SH STDIN +.LP +The standard input can be any type of file. +.SH INPUT FILES +.LP +None. +.SH ENVIRONMENT VARIABLES +.LP +The following environment variables shall affect the execution of +\fItr\fP: +.TP 7 +\fILANG\fP +Provide a default value for the internationalization variables that +are unset or null. (See the Base Definitions volume of +IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables +for +the precedence of internationalization variables used to determine +the values of locale categories.) +.TP 7 +\fILC_ALL\fP +If set to a non-empty string value, override the values of all the +other internationalization variables. +.TP 7 +\fILC_COLLATE\fP +.sp +Determine the locale for the behavior of range expressions and equivalence +classes. +.TP 7 +\fILC_CTYPE\fP +Determine the locale for the interpretation of sequences of bytes +of text data as characters (for example, single-byte as +opposed to multi-byte characters in arguments) and the behavior of +character classes. +.TP 7 +\fILC_MESSAGES\fP +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard +error. +.TP 7 +\fINLSPATH\fP +Determine the location of message catalogs for the processing of \fILC_MESSAGES +\&.\fP +.sp +.SH ASYNCHRONOUS EVENTS +.LP +Default. +.SH STDOUT +.LP +The \fItr\fP output shall be identical to the input, with the exception +of the specified transformations. +.SH STDERR +.LP +The standard error shall be used only for diagnostic messages. +.SH OUTPUT FILES +.LP +None. +.SH EXTENDED DESCRIPTION +.LP +The operands \fIstring1\fP and \fIstring2\fP (if specified) define +two arrays of characters. The constructs in the following +list can be used to specify characters or single-character collating +elements. If any of the constructs result in multi-character +collating elements, \fItr\fP shall exclude, without a diagnostic, +those multi-character elements from the resulting array. +.TP 7 +\fIcharacter\fP +Any character not described by one of the conventions below shall +represent itself. +.TP 7 +\\\fIoctal\fP +Octal sequences can be used to represent characters with specific +coded values. An octal sequence shall consist of a backslash +followed by the longest sequence of one, two, or three-octal-digit +characters (01234567). The sequence shall cause the value whose +encoding is represented by the one, two, or three-digit octal integer +to be placed into the array. If the size of a byte on the +system is greater than nine bits, the valid escape sequence used to +represent a byte is implementation-defined. Multi-byte +characters require multiple, concatenated escape sequences of this +type, including the leading \fB'\\'\fP for each byte. +.TP 7 +\\\fIcharacter\fP +The backslash-escape sequences in the Base Definitions volume of IEEE\ Std\ 1003.1-2001, +Table 5-1, Escape Sequences +and Associated Actions ( \fB'\\\\'\fP, \fB'\\a'\fP, \fB'\\b'\fP, +\fB'\\f'\fP, \fB'\\n'\fP, \fB'\\r'\fP, +\fB'\\t'\fP, \fB'\\v'\fP ) shall be supported. The results of using +any other character, other than an octal digit, following +the backslash are unspecified. +.TP 7 +\fIc\fP-\fIc\fP +In the POSIX locale, this construct shall represent the range of collating +elements between the range endpoints (as long as +neither endpoint is an octal sequence of the form \\\fIoctal\fP), +inclusive, as defined by the collation sequence. The characters +or collating elements in the range shall be placed in the array in +ascending collation sequence. If the second endpoint precedes +the starting endpoint in the collation sequence, it is unspecified +whether the range of collating elements is empty, or this +construct is treated as invalid. In locales other than the POSIX locale, +this construct has unspecified behavior. +.LP +If either or both of the range endpoints are octal sequences of the +form \\\fIoctal\fP, this shall represent the range of +specific coded values between the two range endpoints, inclusive. +.TP 7 +.B :\fIclass\fP: +Represents all characters belonging to the defined character class, +as defined by the current setting of the \fILC_CTYPE\fP +locale category. The following character class names shall be accepted +when specified in \fIstring1\fP: +.TS C +center; l l l l l l. +\fBalnum\fP \fBblank\fP \fBdigit\fP \fBlower\fP \fBpunct\fP \fBupper\fP +\fBalpha\fP \fBcntrl\fP \fBgraph\fP \fBprint\fP \fBspace\fP \fBxdigit\fP +.TE +.LP +In addition, character class expressions of the form [: \fIname\fP:] +shall be recognized in those locales where the \fIname\fP +keyword has been given a \fBcharclass\fP definition in the \fILC_CTYPE\fP +category. +.LP +When both the \fB-d\fP and \fB-s\fP options are specified, any of +the character class names shall be accepted in +\fIstring2\fP. Otherwise, only character class names \fBlower\fP or +\fBupper\fP are valid in \fIstring2\fP and then only if the +corresponding character class ( \fBupper\fP and \fBlower\fP, respectively) +is specified in the same relative position in +\fIstring1\fP. Such a specification shall be interpreted as a request +for case conversion. When [: \fIlower\fP:] appears in +\fIstring1\fP and [: \fIupper\fP:] appears in \fIstring2\fP, the arrays +shall contain the characters from the \fBtoupper\fP +mapping in the \fILC_CTYPE\fP category of the current locale. When +[: \fIupper\fP:] appears in \fIstring1\fP and [: +\fIlower\fP:] appears in \fIstring2\fP, the arrays shall contain the +characters from the \fBtolower\fP mapping in the +\fILC_CTYPE\fP category of the current locale. The first character +from each mapping pair shall be in the array for \fIstring1\fP +and the second character from each mapping pair shall be in the array +for \fIstring2\fP in the same relative position. +.LP +Except for case conversion, the characters specified by a character +class expression shall be placed in the array in an +unspecified order. +.LP +If the name specified for \fIclass\fP does not define a valid character +class in the current locale, the behavior is +undefined. +.TP 7 +.B =\fIequiv\fP= +Represents all characters or collating elements belonging to the same +equivalence class as \fIequiv\fP, as defined by the +current setting of the \fILC_COLLATE\fP locale category. An equivalence +class expression shall be allowed only in \fIstring1\fP, +or in \fIstring2\fP when it is being used by the combined \fB-d\fP +and \fB-s\fP options. The characters belonging to the +equivalence class shall be placed in the array in an unspecified order. +.TP 7 +.B \fIx\fP*\fIn\fP +Represents \fIn\fP repeated occurrences of the character \fIx\fP. +Because this expression is used to map multiple characters +to one, it is only valid when it occurs in \fIstring2\fP. If \fIn\fP +is omitted or is zero, it shall be interpreted as large +enough to extend the \fIstring2\fP-based sequence to the length of +the \fIstring1\fP-based sequence. If \fIn\fP has a leading +zero, it shall be interpreted as an octal value. Otherwise, it shall +be interpreted as a decimal value. +.sp +.LP +When the \fB-d\fP option is not specified: +.IP " *" 3 +Each input character found in the array specified by \fIstring1\fP +shall be replaced by the character in the same relative +position in the array specified by \fIstring2\fP. When the array specified +by \fIstring2\fP is shorter that the one specified by +\fIstring1\fP, the results are unspecified. +.LP +.IP " *" 3 +If the \fB-C\fP option is specified, the complements of the characters +specified by \fIstring1\fP (the set of all characters +in the current character set, as defined by the current setting of +\fILC_CTYPE\fP, except for those actually specified in the +\fIstring1\fP operand) shall be placed in the array in ascending collation +sequence, as defined by the current setting of +\fILC_COLLATE\fP. +.LP +.IP " *" 3 +If the \fB-c\fP option is specified, the complement of the values +specified by \fIstring1\fP shall be placed in the array in +ascending order by binary value. +.LP +.IP " *" 3 +Because the order in which characters specified by character class +expressions or equivalence class expressions is undefined, +such expressions should only be used if the intent is to map several +characters into one. An exception is case conversion, as +described previously. +.LP +.LP +When the \fB-d\fP option is specified: +.IP " *" 3 +Input characters found in the array specified by \fIstring1\fP shall +be deleted. +.LP +.IP " *" 3 +When the \fB-C\fP option is specified with \fB-d\fP, all characters +except those specified by \fIstring1\fP shall be deleted. +The contents of \fIstring2\fP are ignored, unless the \fB-s\fP option +is also specified. +.LP +.IP " *" 3 +When the \fB-c\fP option is specified with \fB-d\fP, all values except +those specified by \fIstring1\fP shall be deleted. The +contents of \fIstring2\fP shall be ignored, unless the \fB-s\fP option +is also specified. +.LP +.IP " *" 3 +The same string cannot be used for both the \fB-d\fP and the \fB-s\fP +option; when both options are specified, both +\fIstring1\fP (used for deletion) and \fIstring2\fP (used for squeezing) +shall be required. +.LP +.LP +When the \fB-s\fP option is specified, after any deletions or translations +have taken place, repeated sequences of the same +character shall be replaced by one occurrence of the same character, +if the character is found in the array specified by the last +operand. If the last operand contains a character class, such as the +following example: +.sp +.RS +.nf + +\fBtr -s '[:space:]' +\fP +.fi +.RE +.LP +the last operand's array shall contain all of the characters in that +character class. However, in a case conversion, as +described previously, such as: +.sp +.RS +.nf + +\fBtr -s '[:upper:]' '[:lower:]' +\fP +.fi +.RE +.LP +the last operand's array shall contain only those characters defined +as the second characters in each of the \fBtoupper\fP or +\fBtolower\fP character pairs, as appropriate. +.LP +An empty string used for \fIstring1\fP or \fIstring2\fP produces undefined +results. +.SH EXIT STATUS +.LP +The following exit values shall be returned: +.TP 7 +\ 0 +All input was processed successfully. +.TP 7 +>0 +An error occurred. +.sp +.SH CONSEQUENCES OF ERRORS +.LP +Default. +.LP +\fIThe following sections are informative.\fP +.SH APPLICATION USAGE +.LP +If necessary, \fIstring1\fP and \fIstring2\fP can be quoted to avoid +pattern matching by the shell. +.LP +If an ordinary digit (representing itself) is to follow an octal sequence, +the octal sequence must use the full three digits to +avoid ambiguity. +.LP +When \fIstring2\fP is shorter than \fIstring1\fP, a difference results +between historical System\ V and BSD systems. A BSD +system pads \fIstring2\fP with the last character found in \fIstring2\fP. +Thus, it is possible to do the following: +.sp +.RS +.nf + +\fBtr 0123456789 d +\fP +.fi +.RE +.LP +which would translate all digits to the letter \fB'd'\fP . Since this +area is specifically unspecified in this volume of +IEEE\ Std\ 1003.1-2001, both the BSD and System\ V behaviors are allowed, +but a conforming application cannot rely on +the BSD behavior. It would have to code the example in the following +way: +.sp +.RS +.nf + +\fBtr 0123456789 '[d*]' +\fP +.fi +.RE +.LP +It should be noted that, despite similarities in appearance, the string +operands used by \fItr\fP are not regular +expressions. +.LP +Unlike some historical implementations, this definition of the \fItr\fP +utility correctly processes NUL characters in its input +stream. NUL characters can be stripped by using: +.sp +.RS +.nf + +\fBtr -d '\\000' +\fP +.fi +.RE +.SH EXAMPLES +.IP " 1." 4 +The following example creates a list of all words in \fBfile1\fP one +per line in \fBfile2\fP, where a word is taken to be a +maximal string of letters. +.sp +.RS +.nf + +\fBtr -cs "[:alpha:]" "[\\n*]" <file1 >file2 +\fP +.fi +.RE +.LP +.IP " 2." 4 +The next example translates all lowercase characters in \fBfile1\fP +to uppercase and writes the results to standard output. +.sp +.RS +.nf + +\fBtr "[:lower:]" "[:upper:]" <file1 +\fP +.fi +.RE +.LP +.IP " 3." 4 +This example uses an equivalence class to identify accented variants +of the base character \fB'e'\fP in \fBfile1\fP, which +are stripped of diacritical marks and written to \fBfile2\fP. +.sp +.RS +.nf + +\fBtr "[=e=]" e <file1 >file2 +\fP +.fi +.RE +.LP +.SH RATIONALE +.LP +In some early proposals, an explicit option \fB-n\fP was added to +disable the historical behavior of stripping NUL characters +from the input. It was considered that automatically stripping NUL +characters from the input was not correct functionality. +However, the removal of \fB-n\fP in a later proposal does not remove +the requirement that \fItr\fP correctly process NUL +characters in its input stream. NUL characters can be stripped by +using \fItr\fP \fB-d\fP '\\000'. +.LP +Historical implementations of \fItr\fP differ widely in syntax and +behavior. For example, the BSD version has not needed the +bracket characters for the repetition sequence. The \fItr\fP utility +syntax is based more closely on the System V and XPG3 model +while attempting to accommodate historical BSD implementations. In +the case of the short \fIstring2\fP padding, the decision was +to unspecify the behavior and preserve System V and XPG3 scripts, +which might find difficulty with the BSD method. The assumption +was made that BSD users of \fItr\fP have to make accommodations to +meet the syntax defined here. Since it is possible to use the +repetition sequence to duplicate the desired behavior, whereas there +is no simple way to achieve the System V method, this was the +correct, if not desirable, approach. +.LP +The use of octal values to specify control characters, while having +historical precedents, is not portable. The introduction of +escape sequences for control characters should provide the necessary +portability. It is recognized that this may cause some +historical scripts to break. +.LP +An early proposal included support for multi-character collating elements. +It was pointed out that, while \fItr\fP does employ +some syntactical elements from REs, the aim of \fItr\fP is quite different; +ranges, for example, do not have a similar meaning +(``any of the chars in the range matches", \fIversus\fP "translate +each character in the range to the output counterpart"). As +a result, the previously included support for multi-character collating +elements has been removed. What remains are ranges in +current collation order (to support, for example, accented characters), +character classes, and equivalence classes. +.LP +In XPG3 the [: \fIclass\fP:] and [= \fIequiv\fP=] conventions are +shown with double brackets, as in RE syntax. However, +\fItr\fP does not implement RE principles; it just borrows part of +the syntax. Consequently, [: \fIclass\fP:] and [= +\fIequiv\fP=] should be regarded as syntactical elements on a par +with [ \fIx\fP* \fIn\fP], which is not an RE bracket +expression. +.LP +The standard developers will consider changes to \fItr\fP that allow +it to translate characters between different character +encodings, or they will consider providing a new utility to accomplish +this. +.LP +On historical System V systems, a range expression requires enclosing +square-brackets, such as: +.sp +.RS +.nf + +\fBtr '[a-z]' '[A-Z]' +\fP +.fi +.RE +.LP +However, BSD-based systems did not require the brackets, and this +convention is used here to avoid breaking large numbers of BSD +scripts: +.sp +.RS +.nf + +\fBtr a-z A-Z +\fP +.fi +.RE +.LP +The preceding System V script will continue to work because the brackets, +treated as regular characters, are translated to +themselves. However, any System V script that relied on \fB"a-z"\fP +representing the three characters \fB'a'\fP, +\fB'-'\fP, and \fB'z'\fP have to be rewritten as \fB"az-"\fP . +.LP +The ISO\ POSIX-2:1993 standard had a \fB-c\fP option that behaved +similarly to the \fB-C\fP option, but did not supply +functionality equivalent to the \fB-c\fP option specified in IEEE\ Std\ 1003.1-2001. +This meant that historical practice +of being able to specify \fItr\fP \fB-d\fP\\200-\\377 (which would +delete all bytes with the top bit set) would have no effect +because, in the C locale, bytes with the values octal 200 to octal +377 are not characters. +.LP +The earlier version also said that octal sequences referred to collating +elements and could be placed adjacent to each other to +specify multi-byte characters. However, it was noted that this caused +ambiguities because \fItr\fP would not be able to tell +whether adjacent octal sequences were intending to specify multi-byte +characters or multiple single byte characters. +IEEE\ Std\ 1003.1-2001 specifies that octal sequences always refer +to single byte binary values. +.SH FUTURE DIRECTIONS +.LP +None. +.SH SEE ALSO +.LP +\fIsed\fP +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. In the +event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . |