 

(gettext.info.gz) PO Files

 
 3 The Format of PO Files
 ************************
 
 The GNU `gettext' toolset helps programmers and translators
 produce, update and use translation files, mainly PO files, which are
 textual, editable files.  This chapter explains the format of PO
 files.
 
    A PO file is made up of many entries, each entry holding the relation
 between an original untranslated string and its corresponding
 translation.  All entries in a given PO file usually pertain to a
 single project, and all translations are expressed in a single target
 language.  One PO file "entry" has the following schematic structure:
 
      WHITE-SPACE
      #  TRANSLATOR-COMMENTS
      #. EXTRACTED-COMMENTS
      #: REFERENCE...
      #, FLAG...
      #| msgid PREVIOUS-UNTRANSLATED-STRING
      msgid UNTRANSLATED-STRING
      msgstr TRANSLATED-STRING
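    As an illustration only, the schematic structure above can be
 modeled as a small record type.  The following Python sketch is not
 part of GNU `gettext'; the names `PoEntry', `quote' and `to_po' are
 hypothetical, entries with a context or plural forms are left out, and
 the escaping covers only backslash, double quote, newline and tab.  It
 emits the lines in the order of the schematic, joining references with
 spaces and flags with commas, and it reproduces the `lib/error.c'
 entry used as an example in this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def quote(s: str) -> str:
    """Render s as a double-quoted string with a minimal set of C escapes."""
    escaped = (
        s.replace("\\", "\\\\")   # backslash first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


@dataclass
class PoEntry:
    """Hypothetical in-memory model of one PO entry (no context, no plurals)."""
    msgid: str
    msgstr: str
    translator_comments: List[str] = field(default_factory=list)
    extracted_comments: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)
    previous_msgid: Optional[str] = None

    def to_po(self) -> str:
        """Emit the entry's lines in the order of the schematic structure."""
        lines = ["# " + c for c in self.translator_comments]
        lines += ["#. " + c for c in self.extracted_comments]
        if self.references:
            lines.append("#: " + " ".join(self.references))
        if self.flags:
            lines.append("#, " + ", ".join(self.flags))
        if self.previous_msgid is not None:
            lines.append("#| msgid " + quote(self.previous_msgid))
        lines.append("msgid " + quote(self.msgid))
        lines.append("msgstr " + quote(self.msgstr))
        return "\n".join(lines) + "\n"


entry = PoEntry(
    msgid="Unknown system error",
    msgstr="Error desconegut del sistema",
    references=["lib/error.c:116"],
)
print(entry.to_po())
```

 A plain dataclass keeps every optional comment kind as an empty list
 by default, mirroring the fact that all comments are optional.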
 
    The general structure of a PO file should be well understood by the
 translator.  When using PO mode, very little has to be known about the
 format details, as PO mode takes care of them for her.
 
    A simple entry can look like this:
 
      #: lib/error.c:116
      msgid "Unknown system error"
      msgstr "Error desconegut del sistema"
 
    Entries begin with some optional white space.  Usually, when
 generated through GNU `gettext' tools, there is exactly one blank line
 between entries.  Then comments follow, on lines all starting with the
 character `#'.  There are two kinds of comments: those which have some
 white space immediately following the `#' (the TRANSLATOR COMMENTS),
 which are created and maintained exclusively by the translator, and
 those which have some non-white character just after the `#' (the
 AUTOMATIC COMMENTS), which are created and maintained automatically by
 GNU `gettext' tools.  Comment lines starting with `#.' contain comments
 given by the programmer, directed at the translator; these comments are
 called EXTRACTED COMMENTS because the `xgettext' program extracts them
 from the program's source code.  Comment lines starting with `#:'
 contain references to the program's source code.  Comment lines
 starting with `#,' contain flags; more about these below.  Comment
 lines starting with `#|' contain the previous untranslated string for
 which the translator gave a translation.
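    The classification just described depends only on the character
 that follows the `#'.  A minimal Python sketch, using the hypothetical
 function name `classify_comment'; treating a bare `#' line as an empty
 translator comment is an assumption of this sketch, not a rule stated
 here.

```python
def classify_comment(line: str) -> str:
    """Classify one PO comment line by the character right after '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line: " + repr(line))
    rest = line[1:]
    # White space after '#' (or, as an assumption here, nothing at all)
    # marks a translator comment.
    if rest == "" or rest[0].isspace():
        return "translator"
    automatic = {
        ".": "extracted",   # from the programmer, extracted by xgettext
        ":": "reference",   # locations in the program's source code
        ",": "flags",       # fuzzy, c-format, ...
        "|": "previous",    # previous untranslated string
    }
    # Any other non-white character still denotes an automatic comment.
    return automatic.get(rest[0], "automatic")


for sample in ["# checked against the style guide",
               "#. TRANSLATORS: keep this short",
               "#: lib/error.c:116",
               "#, fuzzy",
               '#| msgid "Unknown error"']:
    print(classify_comment(sample), "<-", sample)
```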
 
    All comments, of either kind, are optional.
 
    After the white space and comments, an entry contains two strings:
 first the untranslated string, as it appears in the original program
 sources, and then the translation of this string.  The original string
 is introduced by the keyword `msgid', and the translation by `msgstr'.
 Both strings are quoted in the PO file, using `"' delimiters and `\'
 escapes, but the translator does not really have to pay attention to
 the precise quoting format, as PO mode fully takes care of quoting for
 her.
 
    The `msgid' strings, as well as automatic comments, are produced and
 managed by other GNU `gettext' tools, and PO mode does not provide
 means for the translator to alter these.  The most she can do is delete
 them, and only by deleting the whole entry.  On the other hand, the
 `msgstr' string, as well as translator comments, are really meant for
 the translator, and PO mode gives her full control over them.
 
    The comment lines beginning with `#,' are special because they are
 not completely ignored by the programs as comments generally are.  The
 comma-separated list of FLAGs is used by the `msgfmt' program to give
 the user better diagnostic messages.  Currently there are two forms of
 flags defined: the `fuzzy' flag and the language-specific format flags.
 
 `fuzzy'
      This flag can be generated by the `msgmerge' program or it can be
      inserted by the translator herself.  It shows that the `msgstr'
      string might not be a correct translation (anymore).  Only the
      translator can judge if the translation requires further
      modification, or is acceptable as is.  Once satisfied with the
      translation, she then removes this `fuzzy' attribute.  The
      `msgmerge' program inserts this flag when it has combined the
      `msgid' and `msgstr' entries after a fuzzy search only.  See
      Fuzzy Entries.
 
 `c-format'
 `no-c-format'
      These flags should not be added by a human; only the `xgettext'
      program adds them.  In an automated PO file processing system as
      proposed here, the user's changes would be thrown away again as
      soon as the `xgettext' program generates a new template file.

      The `c-format' flag indicates that the untranslated string and the
      translation are supposed to be C format strings.  The `no-c-format'
      flag indicates that they are not C format strings, even though the
      untranslated string happens to look like a C format string (with
      `%' directives).

      When the `c-format' flag is given for a string, `msgfmt' performs
      additional tests to check the validity of the translation.  See
      msgfmt Invocation, c-format Flag and c-format.
 
 `objc-format'
 `no-objc-format'
      Likewise for Objective C, see objc-format.

 `sh-format'
 `no-sh-format'
      Likewise for Shell, see sh-format.

 `python-format'
 `no-python-format'
      Likewise for Python, see python-format.

 `lisp-format'
 `no-lisp-format'
      Likewise for Lisp, see lisp-format.

 `elisp-format'
 `no-elisp-format'
      Likewise for Emacs Lisp, see elisp-format.

 `librep-format'
 `no-librep-format'
      Likewise for librep, see librep-format.

 `scheme-format'
 `no-scheme-format'
      Likewise for Scheme, see scheme-format.

 `smalltalk-format'
 `no-smalltalk-format'
      Likewise for Smalltalk, see smalltalk-format.

 `java-format'
 `no-java-format'
      Likewise for Java, see java-format.

 `csharp-format'
 `no-csharp-format'
      Likewise for C#, see csharp-format.

 `awk-format'
 `no-awk-format'
      Likewise for awk, see awk-format.

 `object-pascal-format'
 `no-object-pascal-format'
      Likewise for Object Pascal, see object-pascal-format.

 `ycp-format'
 `no-ycp-format'
      Likewise for YCP, see ycp-format.

 `tcl-format'
 `no-tcl-format'
      Likewise for Tcl, see tcl-format.

 `perl-format'
 `no-perl-format'
      Likewise for Perl, see perl-format.

 `perl-brace-format'
 `no-perl-brace-format'
      Likewise for Perl brace, see perl-format.

 `php-format'
 `no-php-format'
      Likewise for PHP, see php-format.

 `gcc-internal-format'
 `no-gcc-internal-format'
      Likewise for the GCC sources, see gcc-internal-format.

 `qt-format'
 `no-qt-format'
      Likewise for Qt, see qt-format.

 `boost-format'
 `no-boost-format'
      Likewise for Boost, see boost-format.
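    Because the flags form a comma-separated list on a `#,' line,
 splitting them is straightforward.  The Python sketch below uses the
 hypothetical helper names `parse_flags' and `format_language', and it
 assumes that every format flag follows the LANGUAGE-format /
 no-LANGUAGE-format naming pattern of the list above.

```python
from typing import List, Optional, Tuple

SUFFIX = "-format"


def parse_flags(line: str) -> List[str]:
    """Split a '#,' comment line into its comma-separated flags."""
    if not line.startswith("#,"):
        raise ValueError("not a flag line: " + repr(line))
    return [f.strip() for f in line[2:].split(",") if f.strip()]


def format_language(flags: List[str]) -> Optional[Tuple[str, bool]]:
    """Return (language, is_format) for the first LANGUAGE-format or
    no-LANGUAGE-format flag, or None when no format flag is present."""
    for f in flags:
        if f.endswith(SUFFIX):
            if f.startswith("no-"):
                return f[len("no-"):-len(SUFFIX)], False
            return f[:-len(SUFFIX)], True
    return None


flags = parse_flags("#, fuzzy, c-format")
print(flags, "fuzzy" in flags, format_language(flags))
```

 Checking for `fuzzy' is then a simple membership test, while the
 format flag tells a checker which language's directive rules apply.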
 
 
    It is also possible to have entries with a context specifier. They
 look like this:
 
      WHITE-SPACE
      #  TRANSLATOR-COMMENTS
      #. EXTRACTED-COMMENTS
      #: REFERENCE...
      #, FLAG...
      #| msgctxt PREVIOUS-CONTEXT
      #| msgid PREVIOUS-UNTRANSLATED-STRING
      msgctxt CONTEXT
      msgid UNTRANSLATED-STRING
      msgstr TRANSLATED-STRING
 
    The context serves to disambiguate messages with the same
 UNTRANSLATED-STRING.  It is possible to have several entries with the
 same UNTRANSLATED-STRING in a PO file, provided that they each have a
 different CONTEXT.  Note that an empty CONTEXT string and an absent
 `msgctxt' line do not mean the same thing.
 
    A different kind of entry is used for translations that involve
 plural forms.
 
      WHITE-SPACE
      #  TRANSLATOR-COMMENTS
      #. EXTRACTED-COMMENTS
      #: REFERENCE...
      #, FLAG...
      #| msgid PREVIOUS-UNTRANSLATED-STRING-SINGULAR
      #| msgid_plural PREVIOUS-UNTRANSLATED-STRING-PLURAL
      msgid UNTRANSLATED-STRING-SINGULAR
      msgid_plural UNTRANSLATED-STRING-PLURAL
      msgstr[0] TRANSLATED-STRING-CASE-0
      ...
      msgstr[N] TRANSLATED-STRING-CASE-N
 
    Such an entry can look like this:
 
      #: src/msgcmp.c:338 src/po-lex.c:699
      #, c-format
      msgid "found %d fatal error"
      msgid_plural "found %d fatal errors"
      msgstr[0] "s'ha trobat %d error fatal"
      msgstr[1] "s'han trobat %d errors fatals"
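    At run time, one of the `msgstr[N]' strings is chosen by evaluating
 a plural-form expression on the count.  That expression normally comes
 from the PO file header (its `Plural-Forms' field), which this section
 does not cover; the Python sketch below simply assumes the rule
 `plural=(n != 1)', the usual one for Catalan.  The names
 `catalan_plural' and `select_plural' are hypothetical.

```python
from typing import Callable, List


def catalan_plural(n: int) -> int:
    # Assumed rule, as it would appear in a PO header:
    # Plural-Forms: nplurals=2; plural=(n != 1);
    return 1 if n != 1 else 0


def select_plural(msgstrs: List[str], n: int,
                  plural: Callable[[int], int]) -> str:
    """Pick msgstr[plural(n)], checking that the index is within range."""
    index = plural(n)
    if not 0 <= index < len(msgstrs):
        raise IndexError("plural form %d not present" % index)
    return msgstrs[index]


msgstrs = ["s'ha trobat %d error fatal", "s'han trobat %d errors fatals"]
for n in (0, 1, 3):
    print(select_plural(msgstrs, n, catalan_plural) % n)
```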
 
    Here, too, a `msgctxt' context can be specified before `msgid', as
 above.
 
    The PREVIOUS-UNTRANSLATED-STRING is optionally inserted by the
 `msgmerge' program at the same time as it marks a message fuzzy.  It
 helps the translator see which changes the developers made to the
 UNTRANSLATED-STRING.
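    A PO file editor could present such a change as a word-level
 difference between the two strings.  The following Python sketch uses
 the standard `difflib' module; `word_changes' is a hypothetical helper,
 the sample msgid is made up, and comparing whitespace-separated words
 is just one possible choice of granularity.

```python
import difflib
from typing import List


def word_changes(previous: str, current: str) -> List[str]:
    """Words removed ('- ') and added ('+ ') between two msgid strings."""
    diff = difflib.ndiff(previous.split(), current.split())
    # ndiff also yields unchanged ('  ') and hint ('? ') lines; keep only changes.
    return [line for line in diff if line.startswith(("- ", "+ "))]


previous = "Unknown system error"
current = "Unknown system error code"
print(word_changes(previous, current))
```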
 
    Sometimes a few lines, usually white space or comments, follow the
 very last entry of a PO file.  Such lines are not part of any entry;
 they will be dropped when the PO file is processed by the tools, or
 may disturb some PO file editors.
 
    The remainder of this section may be safely skipped by those using a
 PO file editor, yet it may be interesting for everybody to have a better
 idea of the precise format of a PO file.  On the other hand, those
 wishing to modify PO files by hand should read on carefully.
 
    Each of UNTRANSLATED-STRING and TRANSLATED-STRING respects the C
 syntax for a character string, including the surrounding quotes and
 embedded backslash escape sequences.  When writing multi-line
 strings, one should not use escaped newlines.  Instead, a
 closing quote should follow the last character on the line to be
 continued, and an opening quote should resume the string at the
 beginning of the following PO file line.  For example:
 
      msgid ""
      "Here is an example of how one might continue a very long string\n"
      "for the common case the string represents multi-line output.\n"
 
 In this example, the empty string is used on the first line, to allow
 better alignment of the `H' from the word `Here' over the `f' from the
 word `for'.  The `msgid' keyword is followed by three strings, which
 are meant to be concatenated.  Concatenating the empty string does not
 change the resulting overall string, but it lets us satisfy the
 requirement that `msgid' be followed by a string on the same line,
 while keeping the multi-line presentation left-justified, which we
 find to be a cleaner layout.  The empty string could have been
 omitted, but only if the string starting with `Here' was moved up to
 the first line, right after `msgid'.(1)  Nor was it necessary to
 switch between the last two quoted strings immediately after the
 newline `\n'; the switch could have occurred after _any_ other
 character.  We just did it this way because it is neater.
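    The concatenation rule can be checked with a small reader.  This
 Python sketch is an illustration, not the parser used by GNU
 `gettext': `read_field' and `unescape' are hypothetical names, only
 one quoted segment per PO line is accepted, and only the escapes
 `\n', `\t', `\"' and `\\' are decoded.  Line breaks between PO lines
 contribute nothing to the result; only the `\n' escapes inside the
 quotes do.

```python
import re
from typing import List

# One quoted segment: non-quote/non-backslash characters or backslash escapes.
SEGMENT = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Assumption: only these escapes are decoded; real PO files may contain
# other C escape sequences.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(body: str) -> str:
    """Decode the supported backslash escapes in a segment body."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            code = body[i + 1]  # SEGMENT guarantees a character follows
            if code not in ESCAPES:
                raise ValueError("unsupported escape: \\" + code)
            out.append(ESCAPES[code])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def read_field(lines: List[str], keyword: str) -> str:
    """Concatenate the string after `keyword` with its continuation lines."""
    if not lines[0].startswith(keyword + " "):
        raise ValueError("expected keyword " + keyword)
    parts = [lines[0][len(keyword) + 1:]] + lines[1:]
    pieces = []
    for part in parts:
        m = SEGMENT.fullmatch(part.strip())
        if m is None:
            raise ValueError("expected one quoted string: " + repr(part))
        pieces.append(unescape(m.group(1)))
    return "".join(pieces)


po_lines = [
    'msgid ""',
    r'"Here is an example of how one might continue a very long string\n"',
    r'"for the common case the string represents multi-line output.\n"',
]
print(repr(read_field(po_lines, "msgid")))
```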
 
    One should carefully distinguish between ends of lines marked as
 `\n' _inside_ quotes, which are part of the represented string, and
 ends of lines in the PO file itself, outside string quotes, which have
 no effect on the represented string.
 
    Outside strings, blank lines and comments may be used freely.
 Comments start at the beginning of a line with `#' and extend until the
 end of the PO file line.  Comments written by translators should have
 the initial `#' immediately followed by some white space.  If the `#'
 is not immediately followed by white space, this comment is most likely
 generated and managed by specialized GNU tools, and might disappear or
 be replaced unexpectedly when the PO file is given to `msgmerge'.
 
    ---------- Footnotes ----------
 
    (1) This limitation is not imposed by GNU `gettext', but is for
 compatibility with the `msgfmt' implementation on Solaris.
 