ECM Engineering has filters for various file formats: Illustrator
10, Illustrator 11, InDesign, CorelDraw, Photoshop, Visio,
Excel, Syscat. These are NOT free. Prices range from EUR 80 to
EUR 240. General comments are that these filters work without
The best way to deal with PDF files is to avoid them! :-)
Ask your client for the editable file that generated the PDF file.
If the above is not possible:
If text is heavily formatted, one has to scan and OCR
the PDF file. Formatting of the resulting file will not
Remember: LOTS OF DTP/FORMATTING to be
OCR software used by members of the group: Omni
FineReader. Freeware OCR: SimpleOCR
If formatting is not a must, open your document in Acrobat
Reader; go to View and select Continuous; go to Edit and
select Select All, then select Copy. Paste the text in
a Word doc; format as needed (delete paragraph marks,
and import to Déjà Vu normally.
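Text pasted from Acrobat Reader usually carries a hard line break at the end of every visual line, which is what the "delete paragraph marks" cleanup refers to. The following is a minimal sketch of that cleanup, not part of the original tip. It assumes that blank lines separate real paragraphs, which will not hold for every PDF, and the function name `unwrap_pasted_text` is made up for illustration.

```python
import re


def unwrap_pasted_text(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a real paragraph boundary;
    every other line break is treated as a wrap to be removed.
    """
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if lines:
            cleaned.append(" ".join(lines))
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "First line\nsecond line\n\nNew paragraph\r\ncontinues here"
    print(unwrap_pasted_text(sample))
```

Words hyphenated across line ends are not rejoined by this sketch, so the result still needs a manual check before import.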
Consider a PDF file a "hard-copy" translation.
Translate it in a new document with "keyed" text.
Most PDF converters/extractors will place text
inside boxes, giving the document the same look as the original
but leaving it very difficult
to handle. That is, it is not "real formatting." Most of these
tools are not free. For a list of some of them, click
To illustrate the procedure, assume that you want to break a 780-page
PDF into 3 sections, with these page numbers:
1-300, 301-600, 601-780
Click Document / Extract pages, enter the page numbers 1-300 (leave
the box "Delete pages after Extraction" unchecked).
This opens a new PDF document containing only those pages. Do
a "Save As" and give this document an appropriate filename.
Close it, and you're back at the original document.
Continue by extracting and saving pages 301-600, and finally 601-780.
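For documents of other lengths, the page ranges to type into the Extract pages dialog can be worked out in advance. This is only a helper sketch for the arithmetic; the function name `page_ranges` is made up, and it does not touch the PDF itself.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges of at most chunk_size pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


if __name__ == "__main__":
    for first, last in page_ranges(780, 300):
        print(f"{first}-{last}")
```

With 780 pages and sections of 300, this yields the same three ranges as the example above.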
is a program that quickly extracts the text from AutoCAD drawings,
for translation outside the AutoCAD environment. After translation,
the text is merged with the original graphic elements to produce
translator does not need AutoCAD, but should have at least an
AutoCAD viewer, which is available free or at a low cost. AutoCAD
or a compatible program is needed only to perform preprocessing
and post-processing (to resolve interference between text and
other elements such as margins, lines, text boxes), but this job
does not have to be done by the translator. In some cases it might
be best for the client to do this job.
I've had good success processing WordPerfect
files in Déjà Vu X using the following workflow:
1. Open WP file in OpenOffice.org.
2. Import OOo file into Déjà Vu and translate
in the usual way.
3. Export the translated file from Déjà Vu.
4. Open the exported file in OOo and resave in any of the Word
formats. RTF will also work, but generates Para Style codes
in the final WordPerfect file that need to be stripped out.
5. Open the .doc file in WordPerfect and adjust formatting.
This produces relatively idiomatic WP files
with much less formatting garbage than is produced if you use
WP's own RTF filters out and in.
In my testing, all three .doc formats
offered by OOo produced very similar results; the 6.0 and 95
were identical. The smallest WP file was generated from the 97/2000/XP
conversion. RTF produced the biggest file, even after the
Para Style codes mentioned above were removed.
NOTE: If you are still using an OOo version earlier than 2.0,
this requires an open-source filter (import only for now) called
WriterPerfect.
1. Open documents in WordPerfect
2. Save as Word docs (6.0/95 preferably)
3. Translate
4. Open translated "Word" doc in WordPerfect and save as a
Comment by Paul Cowan: It
might be a good idea to reiterate that using the proprietary
in steps 2 and 4 will produce bigger files with lots of garbage.
2) look at what kind of files were extracted: help topics are
in HTML format,
there is most likely a TOC file with extension .hhc (could be
an index file with extension .hhk, possibly a CSS file and images.
files, usually not for translation.
3) make sure your client also provided the *HHP* file for this
file set -- some
may bundle it inside the CHM (this way it's always current and
most won't because of an undesirable side-effect on the full
text search feature (the content of the HHP file would be indexed).
You need the HHP file to
recompile your translated file set into CHM format (using MS
HTML Help Workshop). In the HHP file, you'll typically need to
change the target file name
(for instance, if it contains a language code), the title attribute
displayed in the title bar of the help viewer may contain translatable
note that this title will only be displayed when the help is
appropriate user/system locale settings for the target language),
attribute (for correct index sorting, and possibly the font settings
usually needed for CJK languages).
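The HHP project file is a plain-text, INI-style file, so the changes described above can also be scripted. The sketch below is an illustration only. It assumes the settings live in an `[OPTIONS]` section under the keys `Compiled file`, `Title` and `Language`, as in typical HTML Help Workshop projects; check your client's HHP file for the actual keys. The sample file names and the French values are invented.

```python
def update_hhp_options(hhp_text: str, updates: dict) -> str:
    """Replace selected key=value lines inside the [OPTIONS] section only.

    Key names are matched case-insensitively; all other lines
    (including the [FILES] list) are passed through unchanged.
    """
    out = []
    in_options = False
    for line in hhp_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_options = stripped.upper() == "[OPTIONS]"
        elif in_options and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            for wanted, value in updates.items():
                if key.lower() == wanted.lower():
                    line = f"{key}={value}"
                    break
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical source project file, for illustration only.
SAMPLE_HHP = """[OPTIONS]
Compiled file=manual_en.chm
Contents file=toc.hhc
Language=0x409 English (United States)
Title=User Manual

[FILES]
intro.htm
"""

if __name__ == "__main__":
    print(update_hhp_options(SAMPLE_HHP, {
        "Compiled file": "manual_fr.chm",       # target file name with language code
        "Title": "Manuel de l'utilisateur",     # translated title bar text
        "Language": "0x40c French (France)",    # illustrative value
    }))
```

When reading or writing a real HHP file, keep the file's original encoding, since the help compiler does not necessarily expect UTF-8.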
Normally you just copy the image files and the CSS file over
to the target
directory, and translate all HTML, HHC, HHK files, using DV's
usual HTML import
filter, which you may want to customize (using HTMHide.txt as
indications in the DV help) to avoid importing all HTML file
paths that are part
of the index and TOC files.
Once everything is translated, compile the target helpset. Make
sure your index
is properly sorted (there's a sort tool in MS HHW). If the index
single-target entries, you can set it to binary (it would then
sorted). If there is more than one help topic for some index
entries then don't
use a binary index (because the topic titles won't be listed
where expected in
the list of found topics -- this is a bug we have to live with
since MS HHW is not
maintained any more). The proper user/system locale settings
must be enabled on
your system when the index is to be sorted, in order to achieve
the correct sort
* The reason why some clients will name their TOC file with
extension is to work around another HHW bug: if the extension
starts with "h",
then the TOC is included in the full-text search, and this produces
untitled" hits in the Topics found search results. To avoid
this, replace the "
.hhc" with anything else that does not start with "h".
This workaround is not an option for the HHP file, because MS
recognizes ".hhp" for its HH project files.
I think that translators should always compile their translated
files into the final CHM format themselves. This is the only
way one can make
sure the index is properly sorted and *tweaked*. I don't want
redundant entries to remain in my translated index.
This often happens when the same notion is worded in different
ways so that the
users will have more chances to find what they're looking for
in the index. When
such 'synonym' entries are translated they may end up starting
with the same
word, or with the same letter, thus appearing consecutively in the
index. When this happens I always manually remove the redundant
entries from the
index file -- they don't help the user; they only create unnecessary
I always review the translated index and postedit it (partly
in a text editor,
partly in MS Help Workshop).
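Candidates for this kind of cleanup can be spotted automatically before the manual review. The sketch below assumes you already have the translated index entries as a sorted list of strings (extracting them from the HHK file is not shown). It only flags neighbouring entries that start with the same word; deciding which ones are truly redundant remains a manual job. The function name and sample entries are made up.

```python
from typing import List, Tuple


def flag_adjacent_same_first_word(entries: List[str]) -> List[Tuple[str, str]]:
    """Return pairs of consecutive index entries that begin with the same word."""
    flagged = []
    for prev, cur in zip(entries, entries[1:]):
        prev_words = prev.split()
        cur_words = cur.split()
        if prev_words and cur_words and prev_words[0].casefold() == cur_words[0].casefold():
            flagged.append((prev, cur))
    return flagged


if __name__ == "__main__":
    index = ["Export a file", "Export files", "Import", "Printing"]
    for first, second in flag_adjacent_same_first_word(index):
        print(f"Review: {first!r} / {second!r}")
```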
Another MS HHW bug I forgot to mention in my previous message
(to Herbert) is
that its manual index sort feature (the A/Z button) does not
work well when you
have sets of second-level entries in the index that belong to
first-level entries starting with the same word. You then need
to move the
misplaced entries to their proper location (under their parent
entry) manually. For instance, instead of:
sorting with the A/Z button produces:
DVX works with Adobe InDesign 2 files. Files should be saved
as "tagged text" in InDesign. For more info on how to
work with InDesign files directly in DVX (that is, without using
StoryCollector or other filter), see page 339 of the Workgroup
Manual. If you need to go through Trados, many times a client's
requirement, see below...
Rasmus Carlsson / Tim Wright / Guy Penet
InDesign through Trados' StoryCollector
The StoryCollector that comes with Trados FL 6.5.5 works like
a charm with InDesign CS. It exports ISC files, which need to
be converted to TTX in Trados TagEditor. The TTX files import fine
into DVX, and then you just need to
reverse the process.
The plug-in files should be installed in a folder called Trados
in the InDesign Plug-in folder. When you start InDesign a new
menu option will appear on the menubar called Trados. This contains
the options for importing and exporting.
All this information and how to proceed with the import/export
routines can be found in the file StoryCollectorIND1033.hlp. Double
click on the file and all your frustrations will be relieved.
InDesign CS2 is not compatible with Trados 6.5. Only version
2.0 of InDesign works with Trados 6.5. You will have to upgrade
to Trados 7.0 for the CS version of InDesign.
Copied from the Help file of the InDesign
plug-in section of Trados 7.0:
TRADOS Story Collector is an InDesign Plug-in. The Plug-in is
supported by InDesign 2.0 and InDesign CS. It allows you to gather
up all the stories in an InDesign document so that they can be
presented, in context, within one file for translation. Note that
InDesign CS2 is not currently supported by the Story Collector.
Taken from: Story Collector for InDesign Help. Copyright (c) June
2005 TRADOS Inc.
If you have Publisher, it is possible to export files
to rtf format (Word Art will not be exported).
Once in rtf, it is pretty easy to handle in any CAT tool. The
is that there is no way back. Translation should be DTPed back
into the file, using Publisher.
See better method below.
way to translate .pub files is to save them as Web Pages, translate
in any CAT tool, convert back to html format, and then
open back in Publisher and save as a .pub file. The only problem
is with WordArt, which should be edited in Publisher afterwards.
Compared to the method above (exporting to rtf format and making
DTP work afterwards), this method is better. There is no need
for DTP work afterwards (except
You'd at least
export stuff from Catalyst, e.g. in its glossary format, then use
for mid-processing in Catalyst (thanks Suzanne for off-list input!).
I don't know yet what version of Catalyst is needed for it, the free
or one of these versions:
Translator/Pro Edition: €999
Localizer Edition: €3999
Developer/Pro Edition: €6499
What I do know, is that if you have only the free version, you
client to send you files made with a developer/pro edition in a
Other than that: try and get behind what source files the client
actually wants translated. Chances are, you can do them all in
or without messing about.
...and from a different message sent by Gudmund...
Some hasty reflections:
- If you get the freelance version (be aware that there are two
different versions), you will only be able to process a certain
ttk files - the client has to have the right version for sending
that can be processed in one of the free versions.
- If the client knows her/his way around (not always the case,
it would seem), they should be able to export the stuff as TMX,
which you can process in DV. Do take care to export with the right
charset, and that
the charset indicated in the exported file is the one it's really
Report back if you choose that way, there's a free TMX validator
point you to.
- Depending on how sensible (IMHO) a workflow they've opted for,
may or may not have chosen to block all manner of non-translatable
- Worst case, compatibility-wise, Catalyst can handle plain text,
delimited 2-column files (bilingual) for the Catalyst variety of
- Be aware that there may be problems regarding string length
will only show up in Catalyst if DV is used.
- The modern versions (not sure which editions) can interact/integrate
with Trados. They also handle XLIFF files. Be aware that the exact
meaning of the word "supports" has to be eked out from
edition to edition.
I don't know what the XLIFF support "Visual XLIFF 1.0" implies,
but since DV doesn't handle XLIFF (yet, at least), it would mean
another (potentially lossy) round trip, e.g. to the po format.
If XLIFF and/or
XLIFF conversion is lossy in any one step, I doubt there's anything
be gained that way. If it isn't lossy, you might benefit from having
access to string specific comments and advice via F6 in DV(X).
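On the charset warning for TMX exports in the list above: one quick sanity check is to compare the encoding named in the file's XML declaration with what the bytes actually decode as. This is a rough sketch, not a TMX validator. The function names are made up, and the check cannot catch every mismatch (a file declared as a single-byte charset such as ISO-8859-1 will always decode without error).

```python
import re


def declared_encoding(raw: bytes) -> str:
    """Return the encoding named in the XML declaration (default: utf-8)."""
    # A UTF-16 byte order mark means the declaration itself is not ASCII.
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    head = raw[:200]
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        head = head[3:]
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def decodes_as_declared(raw: bytes) -> bool:
    """True if the whole file decodes cleanly with its declared encoding."""
    try:
        raw.decode(declared_encoding(raw))
        return True
    except (UnicodeDecodeError, LookupError):
        return False


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="UTF-8"?><tmx><body>é</body></tmx>'
    print(decodes_as_declared(doc.encode("utf-8")))    # bytes match the declaration
    print(decodes_as_declared(doc.encode("latin-1")))  # declared UTF-8, saved as Latin-1
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `decodes_as_declared`.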
Will answer any additional questions as best I can if they pop
I got a lot of invaluable and friendly input from Suzanne Bolduc
then, thanks Suzanne! :)
If I understood Gudmund correctly, one should export translatable
text in Catalyst's glossary format, translate it, and then import
the export/import should be done with a paid version of the program.