Conversion of Open Group's troff sources to POSIX man pages
===========================================================

This directory contains files and scripts that are used to convert
the POSIX manual pages to 'man' format, suitable for release by the
Linux man-pages project.

1. Necessary data:
==================

* obtainable from The Open Group
  - directory with the troff sources [1]
  - file ,xref.5 containing cross-reference information
  - file _strings.def containing information on references to other
    standards
* obtainable online
  - the HTML version of the standard [2]

[1] The troff sources are not part of this repository, and must be
    obtained by contacting The Open Group.
[2] As at November 2020, the HTML version of the standard can be
    downloaded from https://pubs.opengroup.org/onlinepubs/9699919799/.


The directory of troff sources contains four directories: "Builtins",
"Commands", "Functions", "Headers". (Some of these contain
subdirectories with "LEGACY" interfaces.) The directories contain .mm
and .h files written in groff_mm, with extensions by The Open Group.
On request, The Open Group can also provide a file defining its
custom macros, but the scripts do not need this file.

A relevant line in ,xref.5 could look like

gropdf-info:href workdir page 104 Section 3.441

It contains a label ("workdir"), the page number and the
section number.

A line in _strings.def might look like

.ds Z5 ISO\ POSIX\(hy1 standard

This tells us how to translate the escape sequence \*(Z5.

The HTML version of the standard can be obtained at

http://pubs.opengroup.org/onlinepubs/9699919799/download/index.html

The relevant files for the scripts are basedefs/V1_chap*.html,
functions/V2_chap*.html, utilities/V3_chap*.html and
xrat/V4_*_chap*.html. They cover the parts of the standard for which
we do not have troff sources.

2. Procedure to generate the man pages
======================================

Change to the directory containing the conversion scripts and
run

./,xref.1.awk < ,xref.5 > ,xref.1
./,xref.py /path/to/HTML_version_of_standard > ,xref

to generate ,xref and

sed -f _strings.sed _strings.def > _strings

to generate _strings. With these files in place, you can generate
individual man pages. To generate all pages, run:

./posix.py 0p /path/to/troff_sources/Headers/*.h
./posix.py 1p /path/to/troff_sources/Built-Ins/*.mm
./posix.py 1p /path/to/troff_sources/Commands/*.mm
./posix.py 3p /path/to/troff_sources/Functions/*.mm

You can now find the converted pages in your current working
directory.

To clean up, run:

rm ,xref ,xref.1 _strings

3. Description of the included scripts
======================================

,xref.1.awk takes ,xref.5 from its standard input, strips
irrelevant lines and transforms lines of the form

gropdf-info:href whitespace page 103 Section 3.436

to

whitespace Section 3.436
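For readers who want to see the filtering step spelled out, here is a minimal Python sketch of what ,xref.1.awk does. It is a hypothetical re-implementation, not the awk script itself, and the helper names transform_xref_line and convert_xref5 are made up for illustration. It assumes every relevant line starts with "gropdf-info:href" and has the word "page" as its third field; it drops the page number and keeps the label plus everything after the page number.

```python
from typing import Iterable, List, Optional


def transform_xref_line(line: str) -> Optional[str]:
    """Convert one ,xref.5 line to the ,xref.1 form, or return None to drop it.

    Assumed input shape (hypothetical, inferred from the example above):
        gropdf-info:href <label> page <page-number> <kind> <number>
    """
    fields = line.split()
    if len(fields) < 6 or fields[0] != "gropdf-info:href" or fields[2] != "page":
        return None  # treat anything else as irrelevant
    label = fields[1]
    # fields[3] is the page number, which is meaningless in a man page.
    return " ".join([label] + fields[4:])


def convert_xref5(lines: Iterable[str]) -> List[str]:
    """Apply transform_xref_line to every line and keep only the results."""
    result = []
    for line in lines:
        converted = transform_xref_line(line)
        if converted is not None:
            result.append(converted)
    return result
```

Passing sys.stdin to convert_xref5 and printing each result would mirror `./,xref.1.awk < ,xref.5 > ,xref.1`.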

,xref.1.py expects the file ,xref.1 (generated by ,xref.1.awk) in
the current working directory, and takes the path to the HTML
version of the standard as its first argument. It extracts section,
table and figure names for the parts of the standard we do not have
sources for, adds them to the cross-references, and writes the
result to standard output. Continuing the example above, in

/path/to/HTML_version_of_standard/basedefs/V1_chap03.html

it finds a line

class; see also <a href="#tag_03_436">White Space</a>.</p>

and therefore outputs

whitespace Section 3.436, White Space

to ,xref.
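The title lookup can be sketched with a regular expression. This is an illustration under stated assumptions, not the code of ,xref.1.py. It assumes section anchors have the form `tag_<chapter>_<section>` as in the example, and it ignores deeper anchors such as subsections and any table or figure anchors. The helper names section_titles and add_titles are hypothetical.

```python
import re
from html import unescape
from typing import Dict, List

# Anchors such as <a href="#tag_03_436">White Space</a>; deeper anchors
# like "#tag_03_436_01" do not match because '"' must follow the section number.
ANCHOR_RE = re.compile(r'<a href="#tag_(\d+)_(\d+)">([^<]+)</a>')


def section_titles(page: str) -> Dict[str, str]:
    """Map section numbers such as "3.436" to the titles linked in an HTML page."""
    titles: Dict[str, str] = {}
    for chapter, section, title in ANCHOR_RE.findall(page):
        number = f"{int(chapter)}.{int(section)}"  # "03", "436" -> "3.436"
        titles.setdefault(number, unescape(title).strip())
    return titles


def add_titles(xref_lines: List[str], titles: Dict[str, str]) -> List[str]:
    """Append ", <title>" to "label Section N" lines whose title is known."""
    out = []
    for line in xref_lines:
        fields = line.split()
        if len(fields) == 3 and fields[1] == "Section" and fields[2] in titles:
            line = f"{line.strip()}, {titles[fields[2]]}"
        out.append(line)
    return out
```

In a real run, section_titles would be applied to each of the V1_chap*.html, V2_chap*.html, V3_chap*.html and V4_*_chap*.html files, and the merged dictionary passed to add_titles along with the lines of ,xref.1.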

The sed script _strings.sed does a simple conversion of lines of
the form

.ds Z5 ISO\ POSIX\(hy1 standard

to

\*(Z5	ISO\ POSIX\(hy1 standard
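The same conversion can be sketched in Python to make the mapping explicit. The helper name ds_to_mapping_line is hypothetical. The sketch goes slightly beyond the single example in _strings.def: it also handles one-character names (referenced in troff as \*X) and longer groff names (referenced as \*[name]). The actual _strings.sed may only cover the two-character case.

```python
import re
from typing import Optional

# ".ds NAME VALUE" defines a troff string.
DS_RE = re.compile(r"^\.ds\s+(\S+)\s+(.*)$")


def ds_to_mapping_line(line: str) -> Optional[str]:
    """Turn '.ds NAME VALUE' into '<escape>\\tVALUE', or None for other lines."""
    match = DS_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    name, value = match.groups()
    if len(name) == 1:
        escape = "\\*" + name          # \*X
    elif len(name) == 2:
        escape = "\\*(" + name         # \*(XY, as in the example above
    else:
        escape = "\\*[" + name + "]"   # \*[name], a groff extension
    return escape + "\t" + value
```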

The main script is posix.py. It takes the name of the man section
as its first argument and the names of the pages to be converted
as its other arguments. Furthermore, it expects the data files
,xref and _strings in its current working directory. It outputs
converted man pages to its current working directory.
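This README only says that posix.py expects _strings in its working directory. One plausible way to use that file is to load it into a dictionary and replace each escape sequence in the troff text. The following is a sketch under that assumption, taking the tab-separated format produced by _strings.sed as input. The helper names load_strings and expand_strings are hypothetical and do not come from posix.py.

```python
from typing import Dict, Iterable


def load_strings(lines: Iterable[str]) -> Dict[str, str]:
    """Parse tab-separated _strings lines into {escape sequence: replacement}."""
    table: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        escape, text = line.split("\t", 1)
        table[escape] = text
    return table


def expand_strings(text: str, table: Dict[str, str]) -> str:
    """Replace every known string escape in text with its definition."""
    for escape, replacement in table.items():
        text = text.replace(escape, replacement)
    return text
```

Definitions that themselves contain string escapes would need repeated expansion, which this sketch does not attempt.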

Notes:

A final processing of the cross-references happens in posix.py. On
the one hand, it adds the section names for cross-references
internal to the current page. On the other hand, it formats
references to other man pages correctly. The order of the entries
in ,xref is used to deduce the correct section number. This could
also be achieved by carefully examining the source directory.

The code in posix.py that gets the indentation right by inserting
".RS ..." and ".RE" in the right places is very hacky and might
fail on pages with a slightly more complex structure than the
current ones.