ECM Engineering has filters for various file formats: Illustrator
10, Illustrator 11, InDesign, CorelDraw, Photoshop, Visio,
Excel, Syscat. These are NOT free; prices range from EUR 80 to
EUR 240. Users generally report that these filters work without
problems.
The best way to deal with PDF files is to avoid them! :-)
Ask your client for the editable file that generated the PDF file.
If the above is not possible:
If the text is heavily formatted, you have to scan and OCR
the PDF file. Formatting the resulting file will not
be straightforward.
Remember: there will be LOTS OF DTP/FORMATTING to do.
OCR software used by members of the group: Omni Pro, ABBYY
FineReader. Freeware OCR: SimpleOCR.
If formatting is not a must, open your document in Acrobat
Reader; go to View and select Continuous; go to Edit and
choose Select All, then Copy. Paste the text into
a Word document, format it as needed (delete extra paragraph marks,
etc.), and import it into Déjà Vu as usual.
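The "delete paragraph marks" step can also be done on the plain text before it goes into Word. The Python sketch below assumes that real paragraph breaks survive the copy as blank lines and that every other line break is a hard wrap from the PDF layout; that will not hold for every file (lists and headings, for example, would be merged as well).

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: real paragraphs are separated by at least one blank line.
    """
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    # Split into paragraphs on blank lines (which may contain spaces).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, turn each line break (plus surrounding spaces) into one space.
    return "\n\n".join(re.sub(r"[ \t]*\n[ \t]*", " ", p) for p in paragraphs)


copied = "The best way to deal\nwith PDF files\n\nAsk your client\nfor the source file."
print(unwrap_paragraphs(copied))
```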
Consider a PDF file a "hard-copy" translation: translate it into
a new document with "keyed" text.
Most PDF converters/extractors place text inside boxes, which gives
the document the same look as the original but still makes it very
difficult to handle; that is, it is not "real formatting." Most of
these tools are not free.
To illustrate how to split a large PDF file, assume that you want to
break a 780-page file into 3 sections, with these page ranges:
1-300, 301-600, 601-780
Click Document / Extract pages and enter the page range 1-300 (leave
the box "Delete pages after Extraction" unchecked).
This opens a new PDF document containing only those pages. Do
a "Save As" and give this document an appropriate filename.
Close it, and you're back at the original document.
Continue by extracting and saving pages 301-600, and finally 601-780.
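For other page counts or section sizes, the ranges to enter under Document / Extract pages are simple arithmetic. This Python helper is only a sketch that computes them; it does not drive Acrobat itself.

```python
def page_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first, last) page ranges covering the document."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


for first, last in page_ranges(780, 300):
    print(f"Extract pages {first}-{last}")
# Extract pages 1-300
# Extract pages 301-600
# Extract pages 601-780
```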
There is a program that quickly extracts the text from AutoCAD drawings
for translation outside the AutoCAD environment. After translation,
the text is merged with the original graphic elements to produce
translated drawings.
The translator does not need AutoCAD, but should have at least an
AutoCAD viewer, which is available free or at low cost. AutoCAD
or a compatible program is needed only for pre-processing
and post-processing (to resolve interference between text and
other elements such as margins, lines and text boxes), but this job
does not have to be done by the translator. In some cases it may
be best for the client to do it.
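The program is not named here, so the following is only a generic Python sketch of the extract/translate/merge idea. It assumes that each text object in the drawing has a unique ID (called `handle` below); `TextEntity`, `extract_strings`, `merge_translations` and `grown_texts` are hypothetical names, not part of any AutoCAD API. The last function flags translations that are much longer than their source, which are the likely candidates for the text/graphics interference mentioned above; the 1.2 ratio is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class TextEntity:
    handle: str  # hypothetical unique ID of a text object in the drawing
    text: str


def extract_strings(entities: list[TextEntity]) -> dict[str, str]:
    """Pull the translatable text out of the drawing, keyed by handle."""
    return {e.handle: e.text for e in entities}


def merge_translations(entities: list[TextEntity], translations: dict[str, str]) -> list[TextEntity]:
    """Put translated text back; entities without a translation keep their text."""
    return [TextEntity(e.handle, translations.get(e.handle, e.text)) for e in entities]


def grown_texts(entities: list[TextEntity], translations: dict[str, str], max_ratio: float = 1.2) -> list[str]:
    """Handles whose translation is much longer than the source (possible interference)."""
    return [
        e.handle
        for e in entities
        if e.handle in translations and len(translations[e.handle]) > max_ratio * len(e.text)
    ]


drawing = [TextEntity("1A", "Front view"), TextEntity("1B", "Section A-A")]
strings = extract_strings(drawing)  # these go out for translation
translated = {"1A": "Vorderansicht", "1B": "Schnitt A-A"}
print(merge_translations(drawing, translated))
print(grown_texts(drawing, translated))  # ['1A']
```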
I've had good success processing WordPerfect
files in Déjà Vu X using the following workflow:
1. Open the WP file in OpenOffice.org.
2. Import the OOo file into Déjà Vu and translate
in the usual way.
3. Export.
4. Open the exported file in OOo and resave it in any of the Word
formats. RTF will also work, but it generates Para Style codes
in the final WordPerfect file that need to be stripped out.
5. Open the .doc file in WordPerfect and adjust the formatting.
This produces relatively idiomatic WP files
with much less formatting garbage than you get when using
WP's own RTF filters for export and import.
In my testing, all three .doc formats offered by OOo produced
very similar results; the 6.0 and 95 ones were identical. The
smallest WP file was generated from the 97/2000/XP conversion.
RTF produced the biggest file, even after the Para Style codes
mentioned above were removed.
NOTE: If you are still using an OOo version earlier than 2.0,
this workflow requires an open-source filter (import only for now)
called WriterPerfect.
Alternatively:
1. Open the documents in WordPerfect.
2. Save them as Word documents (preferably 6.0/95).
3. Translate.
4. Open the translated "Word" document in WordPerfect and save it
as a WordPerfect file.
Comment by Paul Cowan: It might be a good idea to reiterate that
using the proprietary filters in steps 2 and 4 will produce bigger
files with lots of garbage.
2) Look at what kind of files were extracted: help topics are in
HTML format, there is most likely a TOC file with the extension .hhc
(it could be something else*), an index file with the extension .hhk,
and possibly a CSS file and images. There may be other files,
usually not for translation.
3) Make sure your client also provided the *HHP* file for this file
set. Some may bundle it inside the CHM (this way it's always current
and available), but most won't, because of an undesirable side effect
on the full-text search feature (the content of the HHP file would be
indexed). You need the HHP file to recompile your translated file set
into CHM format (using MS HTML Help Workshop). In the HHP file, you'll
typically need to change the target file name (for instance, if it
contains a language code); the title attribute (the title displayed in
the title bar of the help viewer may contain translatable words; note
that this title will only be displayed when the help is viewed with
appropriate user/system locale settings for the target language); the
language attribute (for correct index sorting); and possibly the font
settings (this is usually needed for CJK languages).
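The HHP file is plain text in an INI-like layout, so these changes can be scripted. The Python sketch below rewrites selected `key=value` lines in the `[OPTIONS]` section and appends any keys that are missing. The key names used (`Compiled file`, `Title`, `Language`) are the ones HTML Help Workshop normally writes, but check them against your own project file; the function name and the sample values are made up for illustration.

```python
def update_hhp_options(hhp_text: str, changes: dict[str, str]) -> str:
    """Set key=value pairs in the [OPTIONS] section of an HHP project file.

    Existing keys are rewritten in place; keys not present are appended at
    the end of the [OPTIONS] section.
    """
    out: list[str] = []
    seen: set[str] = set()
    in_options = False

    def flush_missing() -> None:
        for key, value in changes.items():
            if key not in seen:
                out.append(f"{key}={value}")
                seen.add(key)

    for line in hhp_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_options:
                flush_missing()  # leaving [OPTIONS]: add keys not found in it
            in_options = stripped.upper() == "[OPTIONS]"
        elif in_options and "=" in line:
            key = line.split("=", 1)[0].strip()
            if key in changes:
                out.append(f"{key}={changes[key]}")
                seen.add(key)
                continue
        out.append(line)
    if in_options:
        flush_missing()  # [OPTIONS] was the last section
    return "\n".join(out) + "\n"


project = """[OPTIONS]
Compiled file=help_en.chm
Contents file=help.hhc
Index file=help.hhk
Language=0x409 English (United States)
Title=Product Help

[FILES]
intro.htm
"""
print(update_hhp_options(project, {
    "Compiled file": "help_de.chm",
    "Title": "Produkthilfe",
    "Language": "0x407 German (Germany)",
}))
```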
Normally you just copy the image files and the CSS file over to the
target directory, and translate all HTML, HHC and HHK files using DV's
usual HTML import filter, which you may want to customize (using
HTMHide.txt as described in the DV help) to avoid importing the HTML
file paths that are part of the index and TOC files.
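In HHC and HHK files, the user-visible text sits in `<param name="Name" ...>` values, while `<param name="Local" ...>` values are file paths that must not be translated. This Python sketch (standard library only; `SitemapParams` is a hypothetical helper) separates the two, which can help you check that your HTMHide.txt customization hides the right strings. It does not generate an HTMHide.txt file; see the DV help for that file's syntax.

```python
from html.parser import HTMLParser


class SitemapParams(HTMLParser):
    """Collect <param> values from an HHC/HHK sitemap file."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: list[str] = []  # visible titles: translate these
        self.paths: list[str] = []   # topic file paths: leave these alone

    def handle_starttag(self, tag, attrs):
        if tag != "param":  # HTMLParser lower-cases tag and attribute names
            return
        values = dict(attrs)
        kind = (values.get("name") or "").lower()
        if kind == "name":
            self.titles.append(values.get("value") or "")
        elif kind == "local":
            self.paths.append(values.get("value") or "")


toc = """<UL>
<LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Getting started">
    <param name="Local" value="html/intro.htm">
    </OBJECT>
</UL>"""
parser = SitemapParams()
parser.feed(toc)
parser.close()
print(parser.titles)  # ['Getting started']
print(parser.paths)   # ['html/intro.htm']
```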
Once everything is translated, compile the target help set. Make sure
your index is properly sorted (there's a sort tool in MS HHW). If the
index contains only single-target entries, you can set it to binary
(it will then be sorted automatically). If some index entries point to
more than one help topic, don't use a binary index, because the topic
titles won't be listed where expected in the list of found topics.
This is a bug we have to live with, since MS HHW is no longer
maintained. The proper user/system locale settings must be enabled on
your system when the index is sorted in order to achieve the correct
sort order.
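To see why the locale matters, compare a plain code-point sort with a sort that treats accented letters like their base letter. The Python sketch below is only an approximation of locale-aware collation, not what HHW does internally: it happens to suit German-style dictionary order, but it is wrong for Swedish, where Ä sorts after Z.

```python
import unicodedata


def folded_key(entry: str) -> tuple[str, str]:
    """Sort key that ignores case and accents (approximation only)."""
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), entry)  # the original string breaks ties


entries = ["Zebra", "apple", "Äpfel"]
print(sorted(entries))                  # ['Zebra', 'apple', 'Äpfel'] (code-point order)
print(sorted(entries, key=folded_key))  # ['Äpfel', 'apple', 'Zebra']
```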
* The reason some clients give their TOC file a different extension
is to work around another HHW bug: if the extension starts with "h",
the TOC is included in the full-text search, and this produces
incorrect "untitled" hits in the Topics found search results. To
avoid this, replace ".hhc" with any extension that does not start
with "h".
This workaround is not an option for the HHP file, because MS HHW
only recognizes ".hhp" for its HH project files.
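Because the TOC may not end in .hhc, sorting the decompiled files into "translate" and "copy" piles by extension should allow for a custom TOC extension. A minimal Python sketch follows; the extension lists are assumptions to adjust per project, and anything unrecognized is set aside for manual review (the "maybe other files" of step 2).

```python
from pathlib import PurePath

# Assumed extension lists; adjust per project.
TRANSLATE_EXTS = {".htm", ".html", ".hhc", ".hhk"}
COPY_EXTS = {".css", ".gif", ".jpg", ".jpeg", ".png", ".bmp"}


def classify(filenames: list[str], toc_ext: str = ".hhc") -> dict[str, list[str]]:
    """Group decompiled CHM files into 'translate', 'copy' and 'check' piles."""
    translate_exts = TRANSLATE_EXTS | {toc_ext.lower()}
    groups: dict[str, list[str]] = {"translate": [], "copy": [], "check": []}
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in translate_exts:
            groups["translate"].append(name)
        elif ext in COPY_EXTS:
            groups["copy"].append(name)
        else:
            groups["check"].append(name)  # unknown type: review by hand
    return groups


files = ["intro.htm", "help.toc", "help.hhk", "style.css", "logo.gif", "notes.txt"]
print(classify(files, toc_ext=".toc"))
```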
I think translators should always compile their translated HTML Help
files into the final CHM format themselves. This is the only way to
make sure the index is properly sorted and *tweaked*. I don't want
series of redundant entries to remain in my translated index.
This often happens when the same notion is worded in different ways so
that users have a better chance of finding what they're looking for
in the index. When such 'synonym' entries are translated, they may end
up starting with the same word or the same letter, and thus appear
consecutively in the translated index. When this happens, I always
remove the redundant entries from the index file manually; they don't
help the user, they only create unnecessary noise.
I always review the translated index and post-edit it (partly in a
text editor, partly in MS Help Workshop).
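Spotting such candidates in a long index can be partly automated. This Python sketch (with made-up German entries) lists adjacent entries that share a first word after sorting; it does not check entries that merely share a first letter, and deciding which entry is redundant remains a manual job.

```python
def first_word(entry: str) -> str:
    words = entry.casefold().split()
    return words[0] if words else ""


def flag_neighbours(entries: list[str]) -> list[tuple[str, str]]:
    """Adjacent entries (after sorting) that start with the same word."""
    ordered = sorted((e for e in entries if e.strip()), key=str.casefold)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if first_word(a) == first_word(b)]


# Made-up translated entries: two synonyms now start with the same word.
index = ["Drucken", "Besprechung vereinbaren", "Besprechung planen"]
print(flag_neighbours(index))  # [('Besprechung planen', 'Besprechung vereinbaren')]
```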
Another MS HHW bug I forgot to mention in my previous message (to
Herbert) is that its manual index sort feature (the A/Z button) does
not work well when you have sets of second-level entries in the index
that belong to different first-level entries starting with the same
word. You then need to move the misplaced entries manually to their
proper location (under their parent first-level entry). For instance,
instead of:
meeting room
    locating
    scheduling
meeting
    accepting
    details
    scheduling
    viewing
sorting with the A/Z button produces:
meeting
meeting room
Using the binary index options (set in
the HHP file) produces a properly sorted
index, but it has another bug (see my previous message).
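For reference, the intended result keeps every second-level entry under its own parent, with both levels sorted. This Python sketch models the index as a dict of parent to children (real HHK files are nested HTML lists, and parsing them is left out); it illustrates the expected order and is not a fix for HHW.

```python
def sort_index(index: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Sort first-level entries; each keeps its own, sorted, second-level entries."""
    return [
        (parent, sorted(children, key=str.casefold))
        for parent, children in sorted(index.items(), key=lambda item: item[0].casefold())
    ]


index = {
    "meeting room": ["scheduling", "locating"],
    "meeting": ["viewing", "accepting", "scheduling", "details"],
}
for parent, children in sort_index(index):
    print(parent)
    for child in children:
        print("    " + child)
```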
---- ...and an additional note from Suzanne:
BTW, another very powerful set of tools for working with the MS
HTML Help format (and also with the MS Help 2 format for .NET
applications) is Robert Chandler's FAR HTML (shareware).
FAR is the only tool I know of that lets you *print* the TOC or
index of a compiled help set.
The FAR website provides a lot of information on these help
formats and more (including the Longhorn/Vista help format).
DVX works with Adobe InDesign 2 files, which should be saved
as "tagged text" in InDesign. For more information on how to
work with InDesign files directly in DVX (that is, without using
StoryCollector or another filter), see page 339 of the Workgroup
Manual. If you need to go through Trados, which is often a client
requirement, see below.
Rasmus Carlsson / Tim Wright / Guy Penet
InDesign through Trados' StoryCollector
The StoryCollector that comes with Trados FL 6.5.5 works like
a charm with InDesign CS. It exports ISC files, which need to
be converted to TTX in Trados TagEditor. The TTX files import fine
into DVX, and then you just need to reverse the process.
The plug-in files should be installed in a folder called Trados
in the InDesign Plug-in folder. When you start InDesign, a new
menu called Trados will appear on the menu bar. It contains
the options for importing and exporting.
All this information, including how to proceed with the import/export
routines, can be found in the file StoryCollectorIND1033.hlp. Double-click
the file and all your frustrations will be relieved.
InDesign CS2 is not compatible with Trados 6.5. Only version
2.0 of InDesign works with Trados 6.5. You will have to upgrade
to Trados 7.0 for the CS version of InDesign.
Copied from the Help file of the InDesign
plug-in section of Trados 7.0:
TRADOS Story Collector is an InDesign Plug-in. The Plug-in is
supported by InDesign 2.0 and InDesign CS. It allows you to gather
up all the stories in an InDesign document so that they can be
presented, in context, within one file for translation. Note that
InDesign CS2 is not currently supported by the Story Collector.
Taken from: Story Collector for InDesign Help. Copyright (c) June
2005 TRADOS Inc.
If you have Publisher, it is possible to export files
to RTF format (WordArt will not be exported).
Once in RTF, the text is pretty easy to handle in any CAT tool.
The problem is that there is no way back: the translation has to be
DTPed back into the file using Publisher.
See the better method below.
The best way to translate .pub files is to save them as Web Pages,
translate them in any CAT tool, export them back to HTML format, and
then open them in Publisher again and save them as .pub files. The
only problem is WordArt, which has to be edited in Publisher afterwards.
Compared to the method above (exporting to RTF format and doing
DTP work afterwards), this method is better: there is no need
for DTP work afterwards (except for WordArt).
You'd at least need to export material from Catalyst, e.g. in its
glossary format, then use that for mid-processing in Catalyst (thanks,
Suzanne, for the off-list input!).
I don't know yet which version of Catalyst is needed for this: the
free one, or one of these versions:
Translator/Pro Edition: €999
Localizer Edition: €3999
Developer/Pro Edition: €6499
What I do know is that if you have only the free version, you need
the client to send you files made with a Developer/Pro edition in a
processable format.
Other than that, try to find out which source files the client
actually wants translated. Chances are you can do them all in
DV, with or without some messing about.
...and from a different message sent by Gudmund...
Some hasty reflections:
- If you get the freelance version (be aware that there are two
different versions), you will only be able to process a certain kind
of TTK files; the client must have the right version to send TTK files
that can be processed in one of the free versions.
- If the client knows her/his way around (not always the case,
it would seem), they should be able to export the material as TMX,
which you can process in DV. Do take care to export with the right
charset, and make sure that the charset declared in the exported file
is the one it's really in. Report back if you choose that route;
there's a free TMX validator I can point you to.
- Depending on how sensible (IMHO) a workflow they've opted for, they
may or may not have chosen to block all manner of non-translatable
strings...
- Worst case, compatibility-wise, Catalyst can handle plain-text,
tab-delimited, 2-column (bilingual) files for the Catalyst variety of
pretranslation ("leverage").
- Be aware that there may be problems regarding string length etc.
that will only show up in Catalyst if DV is used.
- The modern versions (not sure which editions) can interact/integrate
with Trados. They also handle XLIFF files. Be aware that the exact
meaning of the word "supports" has to be eked out from edition to
edition.
I don't know what the XLIFF support ("Visual XLIFF 1.0") implies here,
but since DV doesn't handle XLIFF (yet, at least), it would mean
another (potentially lossy) round trip, e.g. to the PO format. If XLIFF
and/or XLIFF conversion is lossy at any step, I doubt there's anything
to be gained that way. If it isn't lossy, you might benefit from having
access to string-specific comments and advice via F6 in DV(X).
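For a sense of what such string-specific comments look like, this Python sketch reads units from a generic XLIFF 1.2 file with the standard library. Whether Catalyst's "Visual XLIFF 1.0" output follows this exact structure is an assumption; the sample file, IDs and note text are invented.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_units(xliff_text: str) -> list[dict]:
    """Return id, source, target and notes for each XLIFF 1.2 trans-unit."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        source = tu.find("x:source", NS)
        target = tu.find("x:target", NS)
        units.append({
            "id": tu.get("id"),
            "source": "".join(source.itertext()) if source is not None else "",
            "target": "".join(target.itertext()) if target is not None else "",
            "notes": [n.text or "" for n in tu.findall("x:note", NS)],
        })
    return units


sample = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="sv" datatype="plaintext" original="strings.rc">
    <body>
      <trans-unit id="IDS_SAVE">
        <source>Save</source>
        <target>Spara</target>
        <note>Button label, keep it short</note>
      </trans-unit>
    </body>
  </file>
</xliff>"""
for unit in read_units(sample):
    print(unit["id"], unit["source"], "->", unit["target"], unit["notes"])
```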
I will answer any additional questions as best I can if they pop up.
I got a lot of invaluable and friendly input from Suzanne Bolduc back
then; thanks, Suzanne! :)
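On the charset point in Gudmund's TMX item: a quick sanity check is to read the encoding declared in the XML prolog and try to decode the file with it. This Python sketch is not a TMX validator. It only handles ASCII-compatible encodings (not UTF-16), and it can only catch mismatches that produce invalid bytes, such as a file declared UTF-8 but saved as Windows-1252; a file declared ISO-8859-1 will always "pass".

```python
import re

# Looks for encoding="..." in the XML declaration at the start of the file.
DECLARATION = re.compile(rb"""^\s*<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")
UTF8_BOM = b"\xef\xbb\xbf"


def check_declared_encoding(data: bytes) -> tuple[str, bool]:
    """Return (declared encoding, True if the bytes decode with it)."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = DECLARATION.match(data)
    declared = match.group(1).decode("ascii") if match else "UTF-8"  # XML default
    try:
        data.decode(declared)
    except (UnicodeDecodeError, LookupError):  # wrong bytes or unknown charset name
        return declared, False
    return declared, True


ok = '<?xml version="1.0" encoding="UTF-8"?><tmx version="1.4"><seg>Räksmörgås</seg></tmx>'
print(check_declared_encoding(ok.encode("utf-8")))   # ('UTF-8', True)
print(check_declared_encoding(ok.encode("cp1252")))  # ('UTF-8', False)
```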
------------
If I understood Gudmund correctly, one should export the translatable
text in Catalyst's glossary format, translate it, and then import it
back. It seems the export/import has to be done with a paid version
of the program.
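If it comes to Gudmund's worst-case fallback, the plain-text, tab-delimited, 2-column bilingual file, writing one is straightforward. In this Python sketch, tabs and line breaks inside segments are replaced by spaces so they cannot break the column layout; the CRLF line endings are an assumption, and the encoding Catalyst expects should be checked before saving.

```python
import re


def to_tab_delimited(pairs: list[tuple[str, str]]) -> str:
    """Build a two-column (source<TAB>target) bilingual text, one pair per line."""
    lines = []
    for source, target in pairs:
        # A tab or line break inside a segment would break the column layout.
        cells = [re.sub(r"[\t\r\n]+", " ", text) for text in (source, target)]
        lines.append("\t".join(cells))
    return "\r\n".join(lines) + "\r\n"  # CRLF line endings: an assumption


text = to_tab_delimited([("Save", "Spara"), ("Open\tfile", "Öppna\nfil")])
print(repr(text))  # 'Save\tSpara\r\nOpen file\tÖppna fil\r\n'
```

When saving the result, open the file with `newline=""` so Python does not translate the line endings, and pass the encoding Catalyst expects.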