Diffstat (limited to 'man-pages-posix-2003/man1p/tr.1p')
-rw-r--r-- man-pages-posix-2003/man1p/tr.1p | 543
1 files changed, 543 insertions, 0 deletions
diff --git a/man-pages-posix-2003/man1p/tr.1p b/man-pages-posix-2003/man1p/tr.1p
new file mode 100644
index 0000000..47d4780
--- /dev/null
+++ b/man-pages-posix-2003/man1p/tr.1p
@@ -0,0 +1,543 @@
+.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
+.TH "TR" 1P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
+.\" tr
+.SH PROLOG
+This manual page is part of the POSIX Programmer's Manual.
+The Linux implementation of this interface may differ (consult
+the corresponding Linux manual page for details of Linux behavior),
+or the interface may not be implemented on Linux.
+.SH NAME
+tr \- translate characters
+.SH SYNOPSIS
+.LP
+\fBtr\fP \fB[\fP\fB-c | -C\fP\fB][\fP\fB-s]\fP \fIstring1 string2\fP\fB
+.br
+.sp
+tr -s\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1\fP\fB
+.br
+.sp
+tr -d\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1\fP\fB
+.br
+.sp
+tr -ds\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1 string2\fP\fB
+.br
+\fP
+.SH DESCRIPTION
+.LP
+The \fItr\fP utility shall copy the standard input to the standard
+output with substitution or deletion of selected characters.
+The options specified and the \fIstring1\fP and \fIstring2\fP operands
+shall control translations that occur while copying
+characters and single-character collating elements.
+.SH OPTIONS
+.LP
+The \fItr\fP utility shall conform to the Base Definitions volume
+of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
+.LP
+The following options shall be supported:
+.TP 7
+\fB-c\fP
+Complement the set of values specified by \fIstring1\fP. See the EXTENDED
+DESCRIPTION section.
+.TP 7
+\fB-C\fP
+Complement the set of characters specified by \fIstring1\fP. See the
+EXTENDED DESCRIPTION section.
+.TP 7
+\fB-d\fP
+Delete all occurrences of input characters that are specified by \fIstring1\fP.
+.TP 7
+\fB-s\fP
+Replace instances of repeated characters with a single character,
+as described in the EXTENDED DESCRIPTION section.
+.sp
+.SH OPERANDS
+.LP
+The following operands shall be supported:
+.TP 7
+\fIstring1\fP,\ \fIstring2\fP
+.sp
+Translation control strings. Each string shall represent a set of
+characters to be converted into an array of characters used for
+the translation. For a detailed description of how the strings are
+interpreted, see the EXTENDED DESCRIPTION section.
+.sp
+.SH STDIN
+.LP
+The standard input can be any type of file.
+.SH INPUT FILES
+.LP
+None.
+.SH ENVIRONMENT VARIABLES
+.LP
+The following environment variables shall affect the execution of
+\fItr\fP:
+.TP 7
+\fILANG\fP
+Provide a default value for the internationalization variables that
+are unset or null. (See the Base Definitions volume of
+IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
+for
+the precedence of internationalization variables used to determine
+the values of locale categories.)
+.TP 7
+\fILC_ALL\fP
+If set to a non-empty string value, override the values of all the
+other internationalization variables.
+.TP 7
+\fILC_COLLATE\fP
+.sp
+Determine the locale for the behavior of range expressions and equivalence
+classes.
+.TP 7
+\fILC_CTYPE\fP
+Determine the locale for the interpretation of sequences of bytes
+of text data as characters (for example, single-byte as
+opposed to multi-byte characters in arguments) and the behavior of
+character classes.
+.TP 7
+\fILC_MESSAGES\fP
+Determine the locale that should be used to affect the format and
+contents of diagnostic messages written to standard
+error.
+.TP 7
+\fINLSPATH\fP
+Determine the location of message catalogs for the processing of
+\fILC_MESSAGES\fP.
+.sp
+.SH ASYNCHRONOUS EVENTS
+.LP
+Default.
+.SH STDOUT
+.LP
+The \fItr\fP output shall be identical to the input, with the exception
+of the specified transformations.
+.SH STDERR
+.LP
+The standard error shall be used only for diagnostic messages.
+.SH OUTPUT FILES
+.LP
+None.
+.SH EXTENDED DESCRIPTION
+.LP
+The operands \fIstring1\fP and \fIstring2\fP (if specified) define
+two arrays of characters. The constructs in the following
+list can be used to specify characters or single-character collating
+elements. If any of the constructs result in multi-character
+collating elements, \fItr\fP shall exclude, without a diagnostic,
+those multi-character elements from the resulting array.
+.TP 7
+\fIcharacter\fP
+Any character not described by one of the conventions below shall
+represent itself.
+.TP 7
+\\\fIoctal\fP
+Octal sequences can be used to represent characters with specific
+coded values. An octal sequence shall consist of a backslash
+followed by the longest sequence of one, two, or three-octal-digit
+characters (01234567). The sequence shall cause the value whose
+encoding is represented by the one, two, or three-digit octal integer
+to be placed into the array. If the size of a byte on the
+system is greater than nine bits, the valid escape sequence used to
+represent a byte is implementation-defined. Multi-byte
+characters require multiple, concatenated escape sequences of this
+type, including the leading \fB'\\'\fP for each byte.
+.TP 7
+\\\fIcharacter\fP
+The backslash-escape sequences in the Base Definitions volume of IEEE\ Std\ 1003.1-2001,
+Table 5-1, Escape Sequences
+and Associated Actions ( \fB'\\\\'\fP, \fB'\\a'\fP, \fB'\\b'\fP,
+\fB'\\f'\fP, \fB'\\n'\fP, \fB'\\r'\fP,
+\fB'\\t'\fP, \fB'\\v'\fP ) shall be supported. The results of using
+any other character, other than an octal digit, following
+the backslash are unspecified.
+.TP 7
+\fIc\fP-\fIc\fP
+In the POSIX locale, this construct shall represent the range of collating
+elements between the range endpoints (as long as
+neither endpoint is an octal sequence of the form \\\fIoctal\fP),
+inclusive, as defined by the collation sequence. The characters
+or collating elements in the range shall be placed in the array in
+ascending collation sequence. If the second endpoint precedes
+the starting endpoint in the collation sequence, it is unspecified
+whether the range of collating elements is empty, or this
+construct is treated as invalid. In locales other than the POSIX locale,
+this construct has unspecified behavior.
+.LP
+If either or both of the range endpoints are octal sequences of the
+form \\\fIoctal\fP, this shall represent the range of
+specific coded values between the two range endpoints, inclusive.
+.TP 7
+.B [:\fIclass\fP:]
+Represents all characters belonging to the defined character class,
+as defined by the current setting of the \fILC_CTYPE\fP
+locale category. The following character class names shall be accepted
+when specified in \fIstring1\fP:
+.TS C
+center; l l l l l l.
+\fBalnum\fP \fBblank\fP \fBdigit\fP \fBlower\fP \fBpunct\fP \fBupper\fP
+\fBalpha\fP \fBcntrl\fP \fBgraph\fP \fBprint\fP \fBspace\fP \fBxdigit\fP
+.TE
+.LP
+In addition, character class expressions of the form [:\fIname\fP:]
+shall be recognized in those locales where the \fIname\fP
+keyword has been given a \fBcharclass\fP definition in the \fILC_CTYPE\fP
+category.
+.LP
+When both the \fB-d\fP and \fB-s\fP options are specified, any of
+the character class names shall be accepted in
+\fIstring2\fP. Otherwise, only character class names \fBlower\fP or
+\fBupper\fP are valid in \fIstring2\fP and then only if the
+corresponding character class ( \fBupper\fP and \fBlower\fP, respectively)
+is specified in the same relative position in
+\fIstring1\fP. Such a specification shall be interpreted as a request
+for case conversion. When [:\fIlower\fP:] appears in
+\fIstring1\fP and [:\fIupper\fP:] appears in \fIstring2\fP, the arrays
+shall contain the characters from the \fBtoupper\fP
+mapping in the \fILC_CTYPE\fP category of the current locale. When
+[:\fIupper\fP:] appears in \fIstring1\fP and
+[:\fIlower\fP:] appears in \fIstring2\fP, the arrays shall contain the
+characters from the \fBtolower\fP mapping in the
+\fILC_CTYPE\fP category of the current locale. The first character
+from each mapping pair shall be in the array for \fIstring1\fP
+and the second character from each mapping pair shall be in the array
+for \fIstring2\fP in the same relative position.
+.LP
+Except for case conversion, the characters specified by a character
+class expression shall be placed in the array in an
+unspecified order.
+.LP
+If the name specified for \fIclass\fP does not define a valid character
+class in the current locale, the behavior is
+undefined.
+.TP 7
+.B [=\fIequiv\fP=]
+Represents all characters or collating elements belonging to the same
+equivalence class as \fIequiv\fP, as defined by the
+current setting of the \fILC_COLLATE\fP locale category. An equivalence
+class expression shall be allowed only in \fIstring1\fP,
+or in \fIstring2\fP when it is being used by the combined \fB-d\fP
+and \fB-s\fP options. The characters belonging to the
+equivalence class shall be placed in the array in an unspecified order.
+.TP 7
+.B [\fIx\fP*\fIn\fP]
+Represents \fIn\fP repeated occurrences of the character \fIx\fP.
+Because this expression is used to map multiple characters
+to one, it is only valid when it occurs in \fIstring2\fP. If \fIn\fP
+is omitted or is zero, it shall be interpreted as large
+enough to extend the \fIstring2\fP-based sequence to the length of
+the \fIstring1\fP-based sequence. If \fIn\fP has a leading
+zero, it shall be interpreted as an octal value. Otherwise, it shall
+be interpreted as a decimal value.
+.sp
+.LP
+When the \fB-d\fP option is not specified:
+.IP " *" 3
+Each input character found in the array specified by \fIstring1\fP
+shall be replaced by the character in the same relative
+position in the array specified by \fIstring2\fP. When the array specified
+by \fIstring2\fP is shorter than the one specified by
+\fIstring1\fP, the results are unspecified.
+.LP
+.IP " *" 3
+If the \fB-C\fP option is specified, the complements of the characters
+specified by \fIstring1\fP (the set of all characters
+in the current character set, as defined by the current setting of
+\fILC_CTYPE\fP, except for those actually specified in the
+\fIstring1\fP operand) shall be placed in the array in ascending collation
+sequence, as defined by the current setting of
+\fILC_COLLATE\fP.
+.LP
+.IP " *" 3
+If the \fB-c\fP option is specified, the complement of the values
+specified by \fIstring1\fP shall be placed in the array in
+ascending order by binary value.
+.LP
+.IP " *" 3
+Because the order of the characters specified by character class
+expressions or equivalence class expressions is undefined,
+such expressions should only be used if the intent is to map several
+characters into one. An exception is case conversion, as
+described previously.
+.LP
+.LP
+When the \fB-d\fP option is specified:
+.IP " *" 3
+Input characters found in the array specified by \fIstring1\fP shall
+be deleted.
+.LP
+.IP " *" 3
+When the \fB-C\fP option is specified with \fB-d\fP, all characters
+except those specified by \fIstring1\fP shall be deleted.
+The contents of \fIstring2\fP are ignored, unless the \fB-s\fP option
+is also specified.
+.LP
+.IP " *" 3
+When the \fB-c\fP option is specified with \fB-d\fP, all values except
+those specified by \fIstring1\fP shall be deleted. The
+contents of \fIstring2\fP shall be ignored, unless the \fB-s\fP option
+is also specified.
+.LP
+.IP " *" 3
+The same string cannot be used for both the \fB-d\fP and the \fB-s\fP
+option; when both options are specified, both
+\fIstring1\fP (used for deletion) and \fIstring2\fP (used for squeezing)
+shall be required.
+.LP
+.LP
+When the \fB-s\fP option is specified, after any deletions or translations
+have taken place, repeated sequences of the same
+character shall be replaced by one occurrence of the same character,
+if the character is found in the array specified by the last
+operand. If the last operand contains a character class, such as the
+following example:
+.sp
+.RS
+.nf
+
+\fBtr -s '[:space:]'
+\fP
+.fi
+.RE
+.LP
+the last operand's array shall contain all of the characters in that
+character class. However, in a case conversion, as
+described previously, such as:
+.sp
+.RS
+.nf
+
+\fBtr -s '[:upper:]' '[:lower:]'
+\fP
+.fi
+.RE
+.LP
+the last operand's array shall contain only those characters defined
+as the second characters in each of the \fBtoupper\fP or
+\fBtolower\fP character pairs, as appropriate.
+.LP
+An empty string used for \fIstring1\fP or \fIstring2\fP produces undefined
+results.
+.SH EXIT STATUS
+.LP
+The following exit values shall be returned:
+.TP 7
+\ 0
+All input was processed successfully.
+.TP 7
+>0
+An error occurred.
+.sp
+.SH CONSEQUENCES OF ERRORS
+.LP
+Default.
+.LP
+\fIThe following sections are informative.\fP
+.SH APPLICATION USAGE
+.LP
+If necessary, \fIstring1\fP and \fIstring2\fP can be quoted to avoid
+pattern matching by the shell.
+.LP
+If an ordinary digit (representing itself) is to follow an octal sequence,
+the octal sequence must use the full three digits to
+avoid ambiguity.
+.LP
+When \fIstring2\fP is shorter than \fIstring1\fP, a difference results
+between historical System\ V and BSD systems. A BSD
+system pads \fIstring2\fP with the last character found in \fIstring2\fP.
+Thus, it is possible to do the following:
+.sp
+.RS
+.nf
+
+\fBtr 0123456789 d
+\fP
+.fi
+.RE
+.LP
+which would translate all digits to the letter \fB'd'\fP. Since this
+area is specifically unspecified in this volume of
+IEEE\ Std\ 1003.1-2001, both the BSD and System\ V behaviors are allowed,
+but a conforming application cannot rely on
+the BSD behavior. It would have to code the example in the following
+way:
+.sp
+.RS
+.nf
+
+\fBtr 0123456789 '[d*]'
+\fP
+.fi
+.RE
+.LP
+It should be noted that, despite similarities in appearance, the string
+operands used by \fItr\fP are not regular
+expressions.
+.LP
+Unlike some historical implementations, this definition of the \fItr\fP
+utility correctly processes NUL characters in its input
+stream. NUL characters can be stripped by using:
+.sp
+.RS
+.nf
+
+\fBtr -d '\\000'
+\fP
+.fi
+.RE
+.SH EXAMPLES
+.IP " 1." 4
+The following example creates a list of all words in \fBfile1\fP one
+per line in \fBfile2\fP, where a word is taken to be a
+maximal string of letters.
+.sp
+.RS
+.nf
+
+\fBtr -cs "[:alpha:]" "[\\n*]" <file1 >file2
+\fP
+.fi
+.RE
+.LP
+.IP " 2." 4
+The next example translates all lowercase characters in \fBfile1\fP
+to uppercase and writes the results to standard output.
+.sp
+.RS
+.nf
+
+\fBtr "[:lower:]" "[:upper:]" <file1
+\fP
+.fi
+.RE
+.LP
+.IP " 3." 4
+This example uses an equivalence class to identify accented variants
+of the base character \fB'e'\fP in \fBfile1\fP, which
+are stripped of diacritical marks and written to \fBfile2\fP.
+.sp
+.RS
+.nf
+
+\fBtr "[=e=]" e <file1 >file2
+\fP
+.fi
+.RE
+.LP
+.SH RATIONALE
+.LP
+In some early proposals, an explicit option \fB-n\fP was added to
+disable the historical behavior of stripping NUL characters
+from the input. It was considered that automatically stripping NUL
+characters from the input was not correct functionality.
+However, the removal of \fB-n\fP in a later proposal does not remove
+the requirement that \fItr\fP correctly process NUL
+characters in its input stream. NUL characters can be stripped by
+using \fItr\fP \fB-d\fP '\\000'.
+.LP
+Historical implementations of \fItr\fP differ widely in syntax and
+behavior. For example, the BSD version has not needed the
+bracket characters for the repetition sequence. The \fItr\fP utility
+syntax is based more closely on the System V and XPG3 model
+while attempting to accommodate historical BSD implementations. In
+the case of the short \fIstring2\fP padding, the decision was
+to unspecify the behavior and preserve System V and XPG3 scripts,
+which might find difficulty with the BSD method. The assumption
+was made that BSD users of \fItr\fP have to make accommodations to
+meet the syntax defined here. Since it is possible to use the
+repetition sequence to duplicate the desired behavior, whereas there
+is no simple way to achieve the System V method, this was the
+correct, if not desirable, approach.
+.LP
+The use of octal values to specify control characters, while having
+historical precedents, is not portable. The introduction of
+escape sequences for control characters should provide the necessary
+portability. It is recognized that this may cause some
+historical scripts to break.
+.LP
+An early proposal included support for multi-character collating elements.
+It was pointed out that, while \fItr\fP does employ
+some syntactical elements from REs, the aim of \fItr\fP is quite different;
+ranges, for example, do not have a similar meaning
+("any of the chars in the range matches", \fIversus\fP "translate
+each character in the range to the output counterpart"). As
+a result, the previously included support for multi-character collating
+elements has been removed. What remains are ranges in
+current collation order (to support, for example, accented characters),
+character classes, and equivalence classes.
+.LP
+In XPG3 the [:\fIclass\fP:] and [=\fIequiv\fP=] conventions are
+shown with double brackets, as in RE syntax. However,
+\fItr\fP does not implement RE principles; it just borrows part of
+the syntax. Consequently, [:\fIclass\fP:] and
+[=\fIequiv\fP=] should be regarded as syntactical elements on a par
+with [\fIx\fP*\fIn\fP], which is not an RE bracket
+expression.
+.LP
+The standard developers will consider changes to \fItr\fP that allow
+it to translate characters between different character
+encodings, or they will consider providing a new utility to accomplish
+this.
+.LP
+On historical System V systems, a range expression requires enclosing
+square-brackets, such as:
+.sp
+.RS
+.nf
+
+\fBtr '[a-z]' '[A-Z]'
+\fP
+.fi
+.RE
+.LP
+However, BSD-based systems did not require the brackets, and this
+convention is used here to avoid breaking large numbers of BSD
+scripts:
+.sp
+.RS
+.nf
+
+\fBtr a-z A-Z
+\fP
+.fi
+.RE
+.LP
+The preceding System V script will continue to work because the brackets,
+treated as regular characters, are translated to
+themselves. However, any System V script that relied on \fB"a-z"\fP
+representing the three characters \fB'a'\fP,
+\fB'-'\fP, and \fB'z'\fP has to be rewritten as \fB"az-"\fP.
+.LP
+The ISO\ POSIX-2:1993 standard had a \fB-c\fP option that behaved
+similarly to the \fB-C\fP option, but did not supply
+functionality equivalent to the \fB-c\fP option specified in IEEE\ Std\ 1003.1-2001.
+This meant that historical practice
+of being able to specify \fItr\fP \fB-d\fP \\200-\\377 (which would
+delete all bytes with the top bit set) would have no effect
+because, in the C locale, bytes with the values octal 200 to octal
+377 are not characters.
+.LP
+The earlier version also said that octal sequences referred to collating
+elements and could be placed adjacent to each other to
+specify multi-byte characters. However, it was noted that this caused
+ambiguities because \fItr\fP would not be able to tell
+whether adjacent octal sequences were intending to specify multi-byte
+characters or multiple single byte characters.
+IEEE\ Std\ 1003.1-2001 specifies that octal sequences always refer
+to single byte binary values.
+.SH FUTURE DIRECTIONS
+.LP
+None.
+.SH SEE ALSO
+.LP
+\fIsed\fP
+.SH COPYRIGHT
+Portions of this text are reprinted and reproduced in electronic form
+from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
+-- Portable Operating System Interface (POSIX), The Open Group Base
+Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
+Electrical and Electronics Engineers, Inc and The Open Group. In the
+event of any discrepancy between this version and the original IEEE and
+The Open Group Standard, the original IEEE and The Open Group Standard
+is the referee document. The original Standard can be obtained online at
+http://www.opengroup.org/unix/online.html .
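
Note on the page above (not part of the patch): the behaviors described in the EXTENDED DESCRIPTION, APPLICATION USAGE, and EXAMPLES sections can be tried with a short shell sketch. It assumes a tr that follows this specification (GNU coreutils tr is one such implementation, as far as I know). The function names are hypothetical wrappers introduced here for illustration; only the tr invocations themselves come from the page.

```shell
#!/bin/sh
# Hypothetical helper names; each wraps one tr invocation taken from the page.
# All helpers read standard input and write standard output.

# -s with a single operand: squeeze runs of the same whitespace character.
squeeze_space() { tr -s '[:space:]'; }

# Case conversion: [:lower:] in string1 paired with [:upper:] in string2.
to_upper() { tr '[:lower:]' '[:upper:]'; }

# Portable padding: [d*] repeats 'd' until string2 is as long as string1,
# instead of relying on the BSD behavior of padding a short string2.
mask_digits() { tr 0123456789 '[d*]'; }

# One word per line: map the complement of the letters to newline (-c),
# then squeeze the repeated newlines (-s).
words() { tr -cs '[:alpha:]' '[\n*]'; }

# Delete NUL bytes from the input.
strip_nul() { tr -d '\000'; }

printf 'a   b\n'           | squeeze_space   # prints: a b
printf 'hello\n'           | to_upper        # prints: HELLO
printf 'tel 555-0199\n'    | mask_digits     # prints: tel ddd-dddd
printf 'two words, here\n' | words           # prints: two, words, here on separate lines
```

Note that -s only collapses repeats of the same character, so a space followed by a tab is left as two characters. The operands are quoted so the shell does not treat the brackets as a pattern, as the APPLICATION USAGE section advises.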