SGML Solutions
FrameMaker to SGML converter
 

Drawing on our extensive knowledge of Adobe's FrameMaker MIF (Maker Interchange Format) and our solid background in SGML, we created this conversion tool for the critical task of transforming data from a desktop authoring format, either FrameBuilder MIF or FrameMaker+SGML MIF, to SGML. The program does not depend on any particular structure defined in a FrameMaker document. It outputs SGML elements using the same names and structure as the EDD applied to the input (an EDD, or Element Definition Document, is a FrameMaker document in which a hierarchical element structure is defined and later imported into a FrameMaker template).

The program implements this as an independent, generic first step, which separates the primary conversion task from any particular application. A secondary remapping pass can then be defined to further convert element names, attributes and structure. This is essential whenever the element structure defined in the FrameMaker EDD differs from the target SGML DTD, as is very often the case.
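
To illustrate what a secondary remapping pass involves, here is a minimal Python sketch that renames elements with a lookup table. It is not the remapping program itself: the table ELEMENT_MAP, the function remap_elements and the sample markup are hypothetical, and the sketch assumes case-sensitive element names and handles names only, not attributes or structural changes.

```python
import re

# Hypothetical mapping from EDD element names to target DTD element names.
ELEMENT_MAP = {"Heading": "Title", "Body": "Para"}

# Matches the "<" or "</" plus the element name at the start of a tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.\-]*)")


def remap_elements(sgml: str, mapping: dict) -> str:
    """Rename start and end tags found in the mapping; leave all others as-is."""
    def rename(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + mapping.get(name, name)
    return TAG_RE.sub(rename, sgml)


sample = "<Chapter><Heading>Intro</Heading><Body>Text</Body></Chapter>"
remapped = remap_elements(sample, ELEMENT_MAP)
print(remapped)
```

Because only the element name at the start of each tag is rewritten, any attributes that follow the name pass through unchanged.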

When attached to the FrameMaker GUI by means of an API client, this program becomes a handy part of routine processing tasks.

SGML Converters
 

SGML data is not usually associated with a style definition, and it is generally considered good practice to separate structure from style, that is, to maintain documents in both a structured format and a rendering format. The availability of SGML structured data allows for the easy alteration of style or the adoption of a different styling system at any time. This dual requirement in turn necessitates the creation and maintenance of more conversion filters.

We have developed converters that further transform SGML data to HTML, XML and DynaText®, to be applied as the last step in the production workflow.

SGML linker
 

Used primarily to convert unlinked cross-references into active links, which can eventually be converted to HTML hyperlinks, this tool creates ID-based links by matching text in source and target tags. ID attributes are first assigned at all possible target locations. The SGML file is then parsed and scanned for a set of tags predefined as candidates for conversion to cross-reference tags. When the text inside a source reference tag matches a target, the program assigns the source tag an attribute of type IDREF whose value is the target's ID. The values computed for ID attributes are a 7-byte base-36 hash of the target text string. Because an ID derived this way stays the same for as long as the target text is unchanged, data files which have not changed since previous releases do not need to be reprocessed and republished.
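
The idea of a text-derived ID can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the linker's actual algorithm: the choice of SHA-1, the whitespace and case normalization, and the names hashed_id and link_references are all hypothetical. Note also that an SGML ID must normally begin with a letter, which a real tool would have to account for and which this sketch does not.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # the 36 base-36 digits


def normalize(text: str) -> str:
    """Assumed normalization: collapse whitespace and ignore case."""
    return " ".join(text.split()).lower()


def hashed_id(text: str, width: int = 7) -> str:
    """Derive a fixed-width base-36 string from the target text."""
    digest = hashlib.sha1(normalize(text).encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % (36 ** width)
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def link_references(targets, references):
    """Map each reference text to the ID of the target with matching text."""
    ids = {normalize(t): hashed_id(t) for t in targets}
    return {ref: ids.get(normalize(ref)) for ref in references}


links = link_references(
    ["Installing the Server"],
    ["installing the  server", "Missing Topic"],
)
print(links)
```

A reference whose text matches no target maps to None, which is where a mode for choosing an alternative target by hand would come into play.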

This program includes some very useful features such as the "interactive mode" by which alternative links can be assigned at source tags by scanning a sorted list of targets. A more detailed description of this program is available.

SGML Link Normalizer and Validator
 

This tool validates all active cross-reference links in the SGML format. In addition to verifying the existence and uniqueness of cross-reference target tags bearing matching IDs, it also normalizes the text content between the source and target components of links. This is especially useful as a localization tool. When document sets are sent for translation, the text at a cross-reference and the text at the matching target will be translated at different times and often by different translators, which invariably results in discrepancies. This tool completely recreates the source tag by replacing its #PCDATA content with the text from the target tag and formatting it grammatically and syntactically in accordance with the paragraph design which was used to create the cross-reference initially within the authoring system. The program is also language-sensitive and capable of performing this normalization in multiple languages and in accordance with specific locale preferences.
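
The validation half of this task, checking that every target ID exists and is unique, can be sketched as follows. The sketch assumes DocBook-style attribute names (id on targets, linkend on references) and uses regular expressions rather than a real SGML parser; the function name validate_links and the sample markup are hypothetical. Text normalization is not shown.

```python
import re
from collections import Counter

# Assumed attribute names: "id" on targets, "linkend" on references.
ID_RE = re.compile(r'\bid\s*=\s*"([^"]*)"', re.IGNORECASE)
IDREF_RE = re.compile(r'\blinkend\s*=\s*"([^"]*)"', re.IGNORECASE)


def validate_links(sgml: str):
    """Return (duplicate target IDs, referenced IDs that have no target)."""
    id_counts = Counter(ID_RE.findall(sgml))
    references = IDREF_RE.findall(sgml)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    dangling = sorted({r for r in references if r not in id_counts})
    return duplicates, dangling


doc = (
    '<sect1 id="a1"><title>Setup</title></sect1>'
    '<sect1 id="a1"><title>Use</title></sect1>'
    '<para><link linkend="a1">Setup</link> '
    '<link linkend="zz">Gone</link></para>'
)
duplicates, dangling = validate_links(doc)
print(duplicates, dangling)
```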

This utility has had an important impact on reducing the translation costs of revision material. When documents are processed through a translation memory system, such as Trados®, paragraphs in the document are matched against paragraphs in a data repository where an archive of previously translated material is maintained. When an exact (100%) match is found, the corresponding translation of the paragraph is inserted into the output document, thus reducing the number of words that would otherwise require costly and unnecessary re-translation by a professional. The normalization utility, which runs after translators have completed their work, automatically translates all cross-reference segments by importing and reformatting the now-translated text from the corresponding unique target. All cross-reference segments can therefore be marked prior to the translation hand-off so that translators may bypass them, further reducing the cost of manual translation.

SGML sorting tool
 

This tool sorts entire sections of an SGML file using a prescribed section tag as a sort key. This is essential when documents are translated, because an order that was alphabetical in the source language is rarely alphabetical after translation. A good example would be a glossary file in which terms are ordered alphabetically. Here is a brief description of the procedure which takes place when sorting a glossary file coded in accordance with the DocBook DTD:

  • A list of sort keys is extracted from <Sect1><Title> segments
  • The file is fragmented into multiple smaller files containing data within <Sect1> wrappers
  • The key list is sorted
  • The file is reconstructed in the new order of the sorted keys
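
The four steps above can be sketched in Python as follows. This is a simplified illustration, not the tool's code: it holds the fragments in memory instead of writing them to separate files, uses regular expressions instead of an SGML parser, and the function name sort_sections and the default case-insensitive key are hypothetical.

```python
import re

SECT_RE = re.compile(r"<Sect1\b.*?</Sect1>", re.DOTALL | re.IGNORECASE)
TITLE_RE = re.compile(r"<Title>(.*?)</Title>", re.DOTALL | re.IGNORECASE)


def sort_sections(sgml: str, key=str.casefold) -> str:
    """Reorder <Sect1> blocks by their <Title> text, keeping other text in place."""
    matches = list(SECT_RE.finditer(sgml))
    fragments = [m.group(0) for m in matches]      # fragment the file

    def sort_key(fragment):
        title = TITLE_RE.search(fragment)          # extract the sort key
        return key(title.group(1).strip() if title else "")

    ordered = sorted(fragments, key=sort_key)      # sort by key

    # Reconstruct: text outside the sections stays where it was, and the
    # sorted fragments fill the original section slots in order.
    parts, position = [], 0
    for match, fragment in zip(matches, ordered):
        parts.append(sgml[position:match.start()])
        parts.append(fragment)
        position = match.end()
    parts.append(sgml[position:])
    return "".join(parts)


glossary = (
    "<Chapter><Sect1><Title>Zebra</Title><Para>z</Para></Sect1>\n"
    "<Sect1><Title>apple</Title><Para>a</Para></Sect1></Chapter>"
)
result = sort_sections(glossary)
print(result)
```

The key parameter is where a language-specific ordering would be plugged in, in place of the plain case-insensitive comparison used here.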

In addition, an important feature of this application is its support for collation sequence definitions and ordering on single- and double-byte character sets. Instead of implementing a sorting algorithm based solely on the ordering of ASCII characters, we have created a generic method for the linguistic definition of character ordering, including the weighting of accented Latin characters and special Chinese, Korean and Japanese characters.
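
A user-defined collation sequence with accent weighting can be illustrated with a small Python sketch. The table format below is an assumption made for this example and is not the format of the sort definition files: each entry groups characters that share one primary weight, listed in accent order. The names SPANISH_ORDER, build_weights and collation_key are hypothetical, and the sketch works on Unicode strings rather than on single- or double-byte encodings.

```python
# Hypothetical sort definition: one string per primary weight; characters
# within a string differ only by their secondary (accent) weight.
SPANISH_ORDER = [
    "aá", "b", "c", "d", "eé", "f", "g", "h", "ií", "j", "k", "l", "m",
    "n", "ñ", "oó", "p", "q", "r", "s", "t", "uúü", "v", "w", "x", "y", "z",
]


def build_weights(order):
    """Map each character to a (primary, secondary) weight pair."""
    weights = {}
    for primary, group in enumerate(order):
        for secondary, character in enumerate(group):
            weights[character] = (primary, secondary)
    return weights


def collation_key(text, weights):
    """Compare all primary weights first; accents only break ties."""
    # Characters missing from the definition sort after all defined ones.
    pairs = [weights.get(ch, (len(weights) + ord(ch), 0)) for ch in text.lower()]
    return [p for p, _ in pairs], [s for _, s in pairs]


WEIGHTS = build_weights(SPANISH_ORDER)
words = ["nube", "ñandú", "oso", "nácar", "nada"]
ordered_words = sorted(words, key=lambda w: collation_key(w, WEIGHTS))
print(ordered_words)
```

A plain code-point sort would place "nácar" after "nube" and "ñandú" after "oso"; with the definition above, "ñ" sorts as its own letter after "n", and accented vowels sort with their base letters.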

This is a Data Wizard application in which all parameters are set in a user-defined "sort definition" driver file. The driver files are self-documenting and may provide technical insight to the interested programmer or project manager. Please feel free to examine or save any of the following sort definition files:

Single Byte Character Sets    Double Byte Character Sets

US English                    Chinese Traditional
German                        Chinese Simplified
Spanish                       Korean
French                        Japanese (shift-JIS)
Italian
Portuguese

While presently hard-coded to work on DocBook tags, this program can be very easily re-adapted for other structures.

SGML Reports and Utilities
 

Maintaining an SGML repository makes it possible to write programs for generating statistical reports which can be of great use to the project manager in charge of collecting and organizing the various components that make up an online documentation product. The SGML format also allows for the creation of a miscellany of utilities for performing housekeeping and special conversion tasks on the data.

We have developed a wide variety of report and utility programs including:

  SGML Indexer
This program generates a back-of-book alphabetical index for a book or for a set of books by picking up index markers from within the SGML document body. It works with collation sequence sorting specifications in multiple languages.
  Graphics Reference Normalization
Used for normalizing directory path names inside attributes containing references to graphic files. Graphics on the web or other delivery media will not usually reside under the same local directory path used when references were created at authoring time. This program ensures that all references to graphics files are altered to reflect their final location on the online delivery system.
  Graphics file listings
These are essential for validation purposes and for gathering all referenced graphics files.
  Table of Contents Generator
Similar to the SGML indexer, this program creates an SGML table of contents for a single document or a set of documents by collecting user-defined chapter and section title tags from the SGML document body.
  Localization Prepping
Programs which can segregate specific portions of the document body prior to localization processing, and re-instate them afterwards.
  Entity Conversion
A utility which converts SGML entities to characters and vice versa.
  Attribute Updating
A program which changes the contents of attributes in accordance with user-defined search-and-replace tables. A very useful tool when attribute values depend on document revision cycles.
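
As an illustration of the last item, table-driven attribute updating might be sketched like this in Python. The table layout, the function name update_attributes and the sample attribute names are hypothetical. The sketch matches name="value" pairs with a regular expression, so unlike a parser-based tool it would also rewrite such pairs if they appeared in text content.

```python
import re

# Hypothetical search-and-replace table:
# (attribute name in lower case, old value) -> new value.
UPDATES = {("revision", "1.0"): "1.1", ("status", "draft"): "final"}

ATTR_RE = re.compile(r'([A-Za-z][A-Za-z0-9.\-]*)(\s*=\s*)"([^"]*)"')


def update_attributes(sgml: str, updates: dict) -> str:
    """Replace attribute values listed in the table; leave all others as-is."""
    def replace(match):
        name, equals, value = match.groups()
        new_value = updates.get((name.lower(), value), value)
        return f'{name}{equals}"{new_value}"'
    return ATTR_RE.sub(replace, sgml)


sample = '<Book Revision="1.0" Status="draft"><Para Revision="2.0">x</Para></Book>'
updated = update_attributes(sample, UPDATES)
print(updated)
```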

© 2004 - Pendulum Software Ltd.