catdvi
a DVI to text/plain translator

Introduction

catdvi is a program that translates TeX Device Independent (DVI) files into readable plain text. The program is under development. It produces satisfactory results in many cases, but still has some issues with complicated input.

Goals

"Translate to plain text" can mean several different things, depending on the intended use:

  1. Output formatted text that resembles the layout of the DVI file as closely as possible, suitable for e.g. preview on a character cell terminal or printing on a teletype style printer.
  2. Output unformatted text in "read order" rather than "print order", which makes quite a difference with e.g. multi-column page layouts. Useful for searching, indexing and other kinds of postprocessing, and maybe also for export to different text processors.
  3. Output (not completely plain) text in read order with the formatting distilled into some kind of markup so that paragraph breaks, subscripts, superscripts, etc. can still be recognized. This functionality is essentially a (La-)TeX decompiler, useful for recovery of lost or otherwise unavailable .tex files.

catdvi's principal target is to create human-readable text files from DVI input, and hence the first kind of translation.

The second kind is supported as well because one of the developers needed it and it could be obtained as an easy by-product (based on the mostly true assumption that read order = order in the source file = order in the DVI file).

The third kind of translation is the most difficult one to achieve since a DVI file does not contain logical markup information. The structure of the text has to be guessed from heuristic principles and an analysis of certain characteristics of TeX's output. No attempt in this direction has been made so far. But knowledge of some aspects of text structure would also help to improve the quality of layout in case 1. If it turns out these can reliably be guessed, an option to show them as markup will probably follow. This feature has low priority at the moment, especially since nobody has expressed a need for it.

Development status

The current version is 0.14, released 24 November 2002. It uses Unicode internally and can output Unicode (UTF-8), 8-bit ISO-8859-1 (aka Latin-1), 8-bit ISO-8859-15 (aka Latin-9) and 7-bit US-ASCII. The current version understands the Cork (T1) font encoding, most of Knuth's original font encodings, the encoding of all AMS fonts except the Cyrillic ones, and some others. Font encodings for languages written in non-Latin alphabets (except perhaps Klingon) may be included if somebody asks for them and is willing to act as consultant and beta tester for the implementation; the catdvi maintainer doesn't speak any such language.
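
To illustrate what the step from internal Unicode to an output encoding involves, here is a minimal sketch in C. It is not taken from the catdvi sources; the function names utf8_encode and latin1_encode are hypothetical, and the '?' substitute for unrepresentable characters is an assumption made for this example only.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes to out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char) (0xC0UL | (cp >> 6));
        out[1] = (unsigned char) (0x80UL | (cp & 0x3FUL));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0; /* surrogates are not valid scalar values */
    if (cp < 0x10000UL) {
        out[0] = (unsigned char) (0xE0UL | (cp >> 12));
        out[1] = (unsigned char) (0x80UL | ((cp >> 6) & 0x3FUL));
        out[2] = (unsigned char) (0x80UL | (cp & 0x3FUL));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char) (0xF0UL | (cp >> 18));
        out[1] = (unsigned char) (0x80UL | ((cp >> 12) & 0x3FUL));
        out[2] = (unsigned char) (0x80UL | ((cp >> 6) & 0x3FUL));
        out[3] = (unsigned char) (0x80UL | (cp & 0x3FUL));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: map a code point to ISO-8859-1.
 * Latin-1 coincides with the first 256 Unicode code points;
 * anything else is replaced by '?' (an assumption of this sketch). */
unsigned char latin1_encode(unsigned long cp)
{
    return (cp < 0x100UL) ? (unsigned char) cp : (unsigned char) '?';
}
```

An 8-bit or 7-bit target cannot represent every character a DVI file may contain, so a real translator needs some fallback policy; the single '?' above only stands in for that decision.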

Planned advances include:

Mailing list

The CatDVI project has a mailing list, catdvi-misc at lists.sourceforge.net, for miscellaneous discussion and announcements about catdvi. It can be used to discuss both users' and developers' problems. Subscribe to catdvi-misc using a Web interface.

Availability

See the Sourceforge project page for catdvi for source downloads, mailing list archives, bug reporting, anonymous CVS access and other stuff. The most recent release is always available there, and the current development source is available in the CVS repository. Additionally, a recent (but not necessarily the most recent) release of catdvi will always be available on CTAN in the directory dviware.

Requirements and portability

You need a hosted ISO C (1990) environment and the Kpathsea library (included with e.g. teTeX) to compile this program. GNU Make makes the compilation pleasant, but is not required. TeX font metric (.tfm) files for the fonts used in your DVI files have to be present at run time.

The program should be very portable. It is expected and intended that it will work on almost every system where an ISO C compiler and a port of the Kpathsea library are available. This includes most UNIX-like systems and many others.

Where possible, the code aims at ISO C compliance and as few assumptions about the working environment as possible are made. Searching for .tfm files in the file system is an inherently system-dependent activity and is currently done with help of the Kpathsea library. Non-kpathsea implementations of that functionality will be accepted if somebody codes them. The most notable known portability problem in other parts of the program is the assumption that CHAR_BIT equals 8; however, this assumption seems safe among contemporary platforms.
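
The CHAR_BIT assumption can be made explicit at compile time. The following is a minimal sketch, not catdvi's actual code: the function name read_u32_be is hypothetical. It checks the assumption with the preprocessor and then assembles a four-byte quantity stored most significant byte first, which is how DVI files store multi-byte integers.

```c
#include <limits.h>

/* Refuse to compile where a char is not exactly 8 bits wide. */
#if CHAR_BIT != 8
#error "this sketch assumes CHAR_BIT == 8"
#endif

/* Hypothetical helper: combine four bytes, most significant first,
 * into an unsigned value. unsigned long is at least 32 bits in ISO C,
 * so the shifts cannot overflow. */
unsigned long read_u32_be(const unsigned char *p)
{
    return ((unsigned long) p[0] << 24)
         | ((unsigned long) p[1] << 16)
         | ((unsigned long) p[2] << 8)
         |  (unsigned long) p[3];
}
```

Failing the build early is one way to turn a silent portability assumption into a visible one; whether that is preferable to supporting other byte widths is a design choice.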

Development is done under GNU/Linux on x86. Additionally, different versions of catdvi have been verified to compile and work under GNU/Linux on Alpha, PPC and UltraSparc architectures, FreeBSD 4.3 on x86, Mac OS X on PPC, and AIX 4.2 on RS6000. If you have catdvi working on another platform, please send a note to the catdvi-misc mailing list (you need not be subscribed to do this). If the program does not work on your system, then please send a note as well so that the problem can be fixed.

Online documentation

The manual page, README, INSTALL and NEWS files for the current catdvi release are available for online reference. See the Links section below for information about some related topics.

Authors and copyright

catdvi was written by Antti-Juhani Kaijanaho <gaia@iki.fi>, based on a skeletal version by J.H.M. Dassen (Ray). Björn Brill <brill@fs.math.uni-frankfurt.de> made further improvements and currently maintains the program. The program is copyrighted by its authors. It contains code copyright Free Software Foundation, Inc.

The catdvi program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

The program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Links


Last modified on $Date: 2002/11/24 23:03:18 $ by $Author: bjoernb $ (at users.sourceforge.net)