A bunch of updates for the Greek FreeBSD/doc translations
02 Sep 2008, 15:18 by Giorgos Keramidas
Translating technical documentation from English to Greek is a relatively difficult task. It takes a certain level of attention to detail and a fairly good command of both languages. Then there is the minor issue of keeping the translations up to date with their English counterparts.
Updating translations (the old style)
We have a growing body of translated work at the FreeBSD Greek documentation project team, and it was getting rather unwieldy to go through each file manually, checking whether the English version has updates that we would like to pull out of CVS, translate from scratch or re-translate, and commit to our main translation tree. Back when I started writing the original Greek translation build glue, I copied a tagging scheme used by existing translations that was helpful for this sort of manual check. Each translated file had a comment of the form:
<!-- Original revision: 1.17 -->
When looking for updates, one had to manually perform the following steps for each file in the doc/el_GR.ISO8859-7/ directory:
- Check if the file includes an “Original revision” comment.
- Extract the revision number from that comment, and note it down somewhere.
- Make an educated guess about the pathname of the original English text. Sometimes the path is easy to guess by substituting el_GR.ISO8859-7 with en_US.ISO8859-1 in the file’s path name. Other times it isn’t so easy (especially for files in the el_GR.ISO8859-7/share directory).
- Locate the $FreeBSD: ... $ line in the original English text.
- Compare with the saved revision from the comment of the Greek text, and see if there are updates to translate.
There are just five steps for each file in this checking process. When translated files are just a bunch of articles, and a few makefiles, it’s boring to repeat these steps for each file, but it isn’t so difficult that nobody can do it. Now that we have Greek translations for a large part of the FreeBSD Handbook, and I am a bit more pressed for time, manually performing these steps for each file of the Greek translation tree started becoming very difficult to do in a timely manner.
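To make the manual check concrete, here is a minimal Python sketch of its first three steps. It is only an illustration: the function names are hypothetical, and the path substitution is the same educated guess described above, so it will be wrong for some files (for example under el_GR.ISO8859-7/share).

```python
import re

# Matches the old-style tag, e.g. "<!-- Original revision: 1.17 -->".
ORIGINAL_REVISION = re.compile(r"Original revision:\s*([0-9]+(?:\.[0-9]+)+)")


def original_revision(text):
    """Return the revision recorded in the old-style comment, or None."""
    match = ORIGINAL_REVISION.search(text)
    return match.group(1) if match else None


def guess_english_path(greek_path):
    """Guess the English counterpart by swapping the language directory.

    This is only a guess; it does not hold for every file in the tree.
    """
    return greek_path.replace("el_GR.ISO8859-7", "en_US.ISO8859-1", 1)


print(original_revision("<!-- Original revision: 1.17 -->"))
print(guess_english_path("el_GR.ISO8859-7/articles/laptop/article.sgml"))
```

Even with helpers like these, someone still has to open the English file and compare revisions by hand, which is the tedious part.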
New tools (checkupdate)
This was the main reason for writing the checkupdate script. With a lot of help from Gabor Pali, one of the committers who work on the Hungarian FreeBSD translations, I wrote a Python script called checkupdate and designed a tagging scheme that would make this part of the translator’s work much easier. We started by defining how a translator can “tag” a translated source file with the revision of the last fully translated English version. The idea we came up with was:
Each translated file will contain a pair of tags called “%SOURCE%” and “%SRCID%”. The %SOURCE% tag will point to the relative path of the English text under the doc/ tree. The %SRCID% tag will refer to the last fully translated revision of the %SOURCE% file.
An example for one of the translated Greek articles is:
$ pwd
/ws/bsd/doc
$ head -10 el_GR.ISO8859-7/articles/new-users/article.sgml
<!--
$FreeBSD: doc/el_GR.ISO8859-7/articles/new-users/article.sgml,v 1.4 2008/01/14 14:19:42 keramida Exp $
Για Χρήστες Νέους τόσο στο FreeBSD όσο και στο Unix
The FreeBSD Greek Documentation Project
%SOURCE% en_US.ISO8859-1/articles/new-users/article.sgml
%SRCID% 1.24
$
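As a rough illustration of how such a pair of tags can be read back, here is a short Python sketch. The regular expressions and the parse_tags name are assumptions made for this example; the real checkupdate script may parse the tags differently.

```python
import re

# Hypothetical patterns for the two tags described above.
SOURCE_TAG = re.compile(r"%SOURCE%\s+(\S+)")
SRCID_TAG = re.compile(r"%SRCID%\s+([0-9]+(?:\.[0-9]+)+)")


def parse_tags(text):
    """Return (source_path, srcid), or None when either tag is missing."""
    source = SOURCE_TAG.search(text)
    srcid = SRCID_TAG.search(text)
    if source is None or srcid is None:
        return None
    return source.group(1), srcid.group(1)


# Sample header text written for this example.
header = """<!--
  %SOURCE% en_US.ISO8859-1/articles/new-users/article.sgml
  %SRCID% 1.24
-->"""
print(parse_tags(header))
```

Returning None for untagged files lets a caller skip them quietly, which matters when every file under a translation tree is fed to the parser.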
Then we wrote a Python script that can “parse” the %SOURCE% and %SRCID% tags, look up the CVS (or Subversion) revision number of the original English text, and report any differences. The “interface” of the script was quite simple: a list of filenames is fed to the script through standard input, and it assumes they are relative pathnames under the top of a doc/ checkout. This way, to check all the files of the Greek translation one would run:
$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 | checkupdate
To check multiple translation trees at once it would be possible either to loop through the translations:
$ pwd
/ws/bsd/doc
$ for dname in el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 ; do \
    find "${dname}" | checkupdate ; \
  done
or just pass their names directly to find:
$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 | checkupdate
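The comparison at the heart of such a check can be sketched in a few lines of Python. This is not the checkupdate source: the function names are made up for illustration, and the pattern assumes the expanded keyword has the usual CVS shape shown in the sample header earlier.

```python
import re

# An expanded CVS keyword looks like:
#   $FreeBSD: doc/<path>,v 1.4 2008/01/14 14:19:42 keramida Exp $
FREEBSD_ID = re.compile(r"\$FreeBSD: \S+,v ([0-9]+(?:\.[0-9]+)+) ")


def freebsd_revision(text):
    """Return the revision from an expanded $FreeBSD$ keyword, or None."""
    match = FREEBSD_ID.search(text)
    return match.group(1) if match else None


def revision_key(revision):
    """Turn "1.25" into (1, 25) so that revisions compare numerically."""
    return tuple(int(part) for part in revision.split("."))


def needs_update(srcid, english_revision):
    """True when the English text is newer than the translated revision."""
    return revision_key(srcid) < revision_key(english_revision)


print(needs_update("1.9", "1.25"))
```

Comparing revisions as integer tuples matters here: as plain strings, "1.9" sorts after "1.25", which would hide exactly the updates we are looking for.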
The first version of the script tried to include as much information about each translated file as possible, so it used a relatively verbose output format. This is the default output format even today. For the current version of the el_GR.ISO8859-7 translation tree the checkupdate script output includes the following:
$ find el_GR.ISO8859-7 | checkupdate
el_GR.ISO8859-7/articles/Makefile rev. 1.16
1.39 -> 1.60 en_US.ISO8859-1/articles/Makefile
el_GR.ISO8859-7/articles/laptop/article.sgml rev. 1.4
1.9 -> 1.25 en_US.ISO8859-1/articles/laptop/article.sgml
[...]
Gabor (pgj) later added an option for compact output, because he likes seeing one line of output for each file. The compact mode is enabled with the -c option of the checkupdate script:
$ find el_GR.ISO8859-7 | checkupdate -c
1.39 -> 1.60 el_GR.ISO8859-7/articles/Makefile
1.9 -> 1.25 el_GR.ISO8859-7/articles/laptop/article.sgml
[...]
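The difference between the two formats can be sketched as a single formatting function. This is a hypothetical illustration: the field order follows the sample output above, but the exact whitespace and line breaks of the real script are assumptions.

```python
def format_report(path, revision, srcid, current, source, compact=False):
    """Format one out-of-date file in compact or verbose style.

    path, revision -- translated file and its own revision
    srcid, current -- last translated and latest English revisions
    source         -- path of the English original
    """
    if compact:
        # One line per file: the revision range, then the translated file.
        return "%s -> %s %s" % (srcid, current, path)
    # Verbose: the translated file first, then the English range and path.
    return "%s rev. %s\n%s -> %s %s" % (path, revision, srcid, current, source)


print(format_report(
    "el_GR.ISO8859-7/articles/Makefile", "1.16",
    "1.39", "1.60", "en_US.ISO8859-1/articles/Makefile",
    compact=True,
))
```

The compact form drops the translated file's own revision and the English path, which is usually fine because both can be recovered from the tags in the file.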
The checkupdate script has now been committed to the FreeBSD doc/ tree in CVS, and it includes a short manpage too. The script and manpage sources are browsable online at:
http://cvsweb.freebsd.org/doc/el_GR.ISO8859-7/share/tools/checkupdate/
Updating translations (new style)
Updating translations with the checkupdate script and a CVS checkout of the doc/ tree is much easier now. I usually open two side-by-side terminals, and keep running CVS diff commands in one of them and checkupdate in the other. A typical MFen (merge from English) session for one of the Greek articles includes:
- Picking one of the translated files to update, from the output of checkupdate. For this example, let’s assume I want to update the laptop/article.sgml file.
- Running “cvs log” and “cvs diff” in the second terminal window, to look at each change committed in CVS:
  $ cvs log -r1.9:1.25 en_US.ISO8859-1/articles/laptop/article.sgml | more
  $ cvs diff -r1.9 -r1.25 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
- If the diffs seem too large to translate in one go, I may opt to translate each CVS change as a separate piece. The FreeBSD doc committers try to keep content and indentation changes separate, so it is often the case that translating revision 1.9 (a content change) as a standalone change is a lot easier than trying to decipher what changed between 1.8 and 1.10 (because revision 1.10 rewrapped and reformatted lots of text, which makes looking for the content changes of 1.9 unnecessarily hard).
- Looking at only one revision of a file is slightly tedious in CVS, but not really tough:
  $ cvs diff -r1.8 -r1.9 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
- When the translation of revision 1.9 is done, I commit it to the Mercurial tree I am using for local work, taking care to update the %SRCID% comment in the file to show that it is now synchronized with English revision 1.9.
- Some time later, a bunch of changes are pushed to the main Mercurial tree at http://hg.hellug.gr/freebsd/doc-el/
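Typing one “cvs diff” per revision gets repetitive, so the per-revision commands can be generated. The following Python sketch is a hypothetical helper, not part of checkupdate; it assumes simple two-part trunk revisions such as 1.9 and 1.25 and only builds the command strings, leaving it to the translator to run them.

```python
import shlex


def trunk_steps(srcid, current):
    """Yield (old, new) pairs for each trunk revision after srcid.

    Assumes two-part revisions on the same trunk, e.g. 1.9 and 1.25;
    branch revisions such as 1.9.2.1 raise ValueError.
    """
    major, start = (int(part) for part in srcid.split("."))
    other_major, end = (int(part) for part in current.split("."))
    if major != other_major:
        raise ValueError("revisions are not on the same trunk")
    for minor in range(start, end):
        yield "%d.%d" % (major, minor), "%d.%d" % (major, minor + 1)


def diff_commands(srcid, current, path):
    """Return one 'cvs diff' command line per intermediate revision."""
    return [
        "cvs diff -r%s -r%s %s" % (old, new, shlex.quote(path))
        for old, new in trunk_steps(srcid, current)
    ]


for command in diff_commands("1.9", "1.12",
                             "en_US.ISO8859-1/articles/laptop/article.sgml"):
    print(command)
```

Each generated command shows exactly one English commit, so content changes stay separate from the rewrapping and reformatting commits mentioned above.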
Recent updates
Using the checkupdate script and the CVS diff commands described so far, I have merged a fair number of updates from the English text since last night. The commit emails started trickling in late at night, when I extracted the patches from my personal Mercurial tree and committed them into CVS:
2008-08-31 [ 29: Giorgos Keramidas ] cvs commit: doc/en_US.ISO8859-1/books/developers-handbook/policies chapter.s$
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/share/sgml mailing-lists.ent
2008-09-01 [ 15: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/share/sgml freebsd.ent
2008-09-01 [ 14: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/books Makefile.inc
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/releng extra.css
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [ 15: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 13: Giorgos Keramidas ] cvs commit: doc/en_US.ISO8859-1/articles/dialup-firewall article.sgml
2008-09-01 [ 14: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 15: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 14: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 14: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [ 14: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-02 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/books/handbook colophon.sgml
2008-09-02 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [ 13: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [ 12: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [ 14: Giorgos Keramidas ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
The number of commits looks scary, but in reality this was only because I was experimenting with separate MFen commits of each English revision.
In retrospect, this may not have been a very good idea. We don’t really need every English revision translated in CVS (some may be broken, others may be intermediate commits, or may be missing some bits). It doesn’t make sense to include all the false starts of the English docs in the el_GR.ISO8859-7 tree too. So the last two commits to CVS included a bunch of English revision merges in “collapsed” form:
keramida 2008-09-02 13:56:43 UTC
FreeBSD doc repository
Modified files:
  el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
Log:
  MFen: 1.11 -> 1.13 en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml
Revision Changes Path
1.5 +9 -4 doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

keramida 2008-09-02 13:57:41 UTC
FreeBSD doc repository
Modified files:
  el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
Log:
  MFen: 1.13 -> 1.17 en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml
Revision Changes Path
1.6 +198 -3 doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml
I think I like this commit style a bit better, and after a short discussion on the translators’ mailing list, Manolis seems to like this style too.