A Translator's Rumblings

Reading diary included

Editing an existing StarDict + recompile for Dictionary.app

I have been using the Pali Text Society's Pali-English Dictionary in the GoldenDict client (StarDict format) and in Dictionary.app, with data obtained from a now-defunct website, but this data has some noticeable issues:

  1. There are a few pages of additions and corrections in the printed edition, which need to be incorporated into the StarDict data.
  2. The initial long 'ā' always appears as short 'a'. As it stands, if you look for āmanteti by typing 'amanteti', you will find the word, but it is listed under the headword 'amanteti'. (Subsequent diacritics are displayed correctly; only the initial long ā seems to be affected.)
  3. There are other minor corrections, e.g. under paccassosi it says 'see under patissuṇāṭi', but it should actually be 'paṭissuṇāti'.

Steps needed to edit the data:
I have performed the following steps using Debian 8.
I am using a set of StarDict-format data files, whose source I cannot now find.
The dictionary consists of the following files, archived in .bz2:
<filename>.dict.dz
<filename>.idx
<filename>.ifo

  1. Install stardict-tools. You will be using two commands from these tools, namely stardict2txt and tabfile. The former converts the binary dictionary data to a human-readable .txt format, which you can edit. After editing the contents of the dictionary, you will need to compile it again using tabfile. However, from version 3.0.x onwards, tabfile seems to have stopped allowing duplicate entries, e.g. a-1, a-2 and a-3, the very first entries in the dictionary, representing the short vowel 'a'. I therefore had to install a much older version of stardict-tools which supported duplicate entries. So although it is tempting to simply type
    sudo apt-get install stardict-tools
    the latest version apt-get fetches is 3.0.2-4 and this doesn't support duplicate entries, so you will have to get an older Debian package. (There are patches you can apply to the source code to fix this but I was unable to compile it - 'make' returned some completely obscure error and I just couldn't go any further. Hence this rather desperate use of an ancient version. See http://www.simidic.org/wiki/index.php/Install_StarDict-Tools for patches.)
    Here is the download page for stardict-tools: http://archive.debian.net/etch/amd64/stardict-tools/download
  2. Now that I have obtained the file stardict-tools_2.4.8-0.1_amd64.deb, let's install it. Installing this file throws up a dependency error which is not resolved automatically, so download another package needed by this older version of stardict-tools: https://packages.debian.org/squeeze/amd64/libmysqlclient16/download
    and install as follows:
    gdebi libmysqlclient16_5.1.73-1_amd64.deb
    then
    gdebi stardict-tools_2.4.8-0.1_amd64.deb
  3. Copy the relevant files to your local path:
    cp /usr/lib/stardict-tools/* /usr/local/bin
  4. Now exit the superuser mode (unless you've been using sudo) and move to your dictionary folder where you have the three files. I'm assuming that you have extracted the tar.bz2 file or some such archived file.
  5. Convert the binary data to human-readable format:
    stardict2txt <filename>.ifo <output_filename>.txt
    Now you have the editable .txt file. 
  6. Make the necessary changes and convert it back into StarDict format:
    tabfile <output_filename>.txt
  7. This will create the three files which make up a StarDict dictionary. Right-click the .idx file, note its size in bytes, and update the idxfilesize= value in the .ifo file to match.
  8. You can now copy the three files to your StarDict data folder and start using them with GoldenDict, etc. However, I prefer to use the PED in Mac OS X's Dictionary.app, so there are a few more steps to convert the StarDict files to the .dictionary format.
  9. Put the three StarDict files in a folder, say 'PTS_PED', and archive it:
    tar cjvf PTS_PED.tar.bz2 PTS_PED
  10. Now, download an app called DictUnifier and install it on your Mac:
    https://github.com/jjgod/mac-dictionary-kit
    or run
    $ brew tap caskroom/cask
    $ brew install brew-cask
    $ brew cask install dictunifier
  11. Open Applications > DictUnifier.app and drag and drop the above .tar.bz2 file onto the app window. Specify the name under which the dictionary should appear in the dictionary client's catalogue, then convert. That's it. When you get the 'Done!' message, the resulting PTS_PED.dictionary is already in the ~/Library/Dictionaries folder. All you need to do now is to tick the box next to the PED dictionary in Dictionary.app > Preferences.
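
The manual size check in step 7 can also be done from the command line. The following is only a minimal sketch and was not part of the procedure I actually ran: update_idxfilesize is a hypothetical helper name, and it assumes that the .idx and .ifo files share the same base name and that the .ifo file already contains an idxfilesize= line.

```shell
# Hypothetical helper (not part of stardict-tools): make the idxfilesize=
# line in <base>.ifo match the real byte size of <base>.idx.
update_idxfilesize() {
    base="$1"
    # wc -c prints the byte count; tr strips the padding some systems add.
    size=$(wc -c < "$base.idx" | tr -d ' ')
    # Rewrite the line in place, keeping the old file as <base>.ifo.bak.
    sed -i.bak "s/^idxfilesize=.*/idxfilesize=$size/" "$base.ifo"
}

# Example call, assuming the files are PTS_PED.idx and PTS_PED.ifo:
# update_idxfilesize PTS_PED
```

The backup copy makes it easy to compare the old and new values before copying the files anywhere.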

Now the initial long ā is displayed correctly:

[Screenshot: Dictionary.app showing an entry with the initial long ā]
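
A headword correction of this kind is made in the .txt file from step 5, before recompiling with tabfile. The sketch below shows one way it could be scripted rather than done by hand. It assumes the .txt file holds one entry per line, with the headword and the definition separated by a tab; rename_headword is a hypothetical helper name, not a stardict-tools command.

```shell
# Hypothetical helper: change one headword in a tab-separated dictionary
# file, leaving the definitions and all other entries untouched.
rename_headword() {
    file="$1"; old="$2"; new="$3"
    # Compare the first tab-separated field with the old headword and
    # swap it for the new one; every line is printed either way.
    awk -F '\t' -v OFS='\t' -v old="$old" -v new="$new" \
        '$1 == old { $1 = new } { print }' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example call, assuming the editable file is called PTS_PED.txt:
# rename_headword PTS_PED.txt amanteti āmanteti
```

Only exact headword matches are changed, so text inside the definitions is not affected.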

This is sweet. I plan to add more formatting to the data as it's quite difficult to read through the text when a dictionary entry has multiple meanings, sub-entries with compounds, etc. (try 'kamma'!)

(You can edit the display name of the dictionary in the tab. Thus I have changed the lengthy name of the PTS dictionary to 'PED'.)