Friday, August 22, 2008

What is the best programming platform for localization?

My job is to design and write a software localization tool. Such a tool helps other programmers localize their applications; it is just another tool in the chain of IDE, compiler, make, profiler and so on. I have been doing this for the past 15 years. During that time I have become familiar with several programming platforms. Too many of them! This is because in order to support localization of a platform I first had to learn how to write code for it. Most of the platforms still exist, such as Visual C++, Visual Basic, Delphi, C++Builder, Java, Symbian, Palm, .NET (Windows Forms) and PO (GetText). Some are brand new, such as WPF and Android. Some are already gone, such as Visual J++. Some of them I like (Delphi, WPF and Android), some are pretty neutral, and some I don’t like at all (Symbian). My purpose here is not to decide which platform is the best overall or to start a platform war. No, my plan is to tell which platform is the best for localization. If you forget about your other needs and working parameters (in most cases you do not have any choice over the platform) and consider only localization, which platform should you use in order to make localization as easy and painless as possible?

My answer is simple. It is Delphi! Let me explain why.

Localizing is mostly about translating resources. The two most important resource types are UI and strings. UI means dialogs, menus, forms, images and everything else you see on the screen. Strings are the text in message boxes, lists and so on. Of course there are many other resource types that sometimes also need to be localized, but most applications only need their UI and string resources localized. How is Delphi superior with these?

Let’s talk about resource strings first. Everybody who has used Visual C++ knows how to use a resource string. You add the string to the RC file, create an ID for it and finally replace the hard coded string in your source code with a call to the LoadString function. This simple step requires you to maintain three different files (.cpp, .rc and .h), each with a different file format. Every time you remove a hard coded string you have to open those files and edit them. As a result the original string moves very far away from the place where it is used. Why is this bad? Well, the programmer mostly edits and views the code. If you have something like this:

#include "resourceid.h"

{
  // LoadString here is my own one-parameter wrapper, not the Win32 function
  str = LoadString(ID_MY_STRING);
}

You just don’t know what the string is without first checking the ID and then opening the RC file that defines the string. This takes a lot of time. Note that the above is already simplified by using my own LoadString instead of the Win32 LoadString, which requires four parameters.

Delphi’s answer to this is the resourcestring clause. You actually define the resource string in your code.

resourcestring
  SMyString = 'Hello World';
begin
  str := SMyString;
end;

The nice part here is that the original string, the ID and the usage are all in the same place, so you can see them at a single glance and editing them is very simple. I have not seen a feature like this on any other platform. It brings one drawback, though. Delphi uses symbolic names as resource IDs instead of integer numbers. When the Delphi compiler compiles the code above, it adds the string to the string resource and assigns a numeric ID to it. Here is the problem. A localization tool typically identifies string resources by numeric ID, and if Delphi changes this ID (which is very likely when you modify the code) you might end up with mixed up or lost translations. Both are very bad. Fortunately some localization tools (Sisulizer, TsiLang) can link Delphi’s symbolic name to the numeric resource ID. As long as the symbolic name does not change (only the ID changes), your translations are safe.
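
The idea of linking symbolic names to numeric IDs can be sketched roughly as follows. This is only an illustration in portable C++, not how Sisulizer or TsiLang actually work; the names SymbolTable, Translations and remapTranslations are hypothetical, and the sketch assumes the tool can obtain a name-to-ID table for both the old and the new build.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical types for illustration only.
using SymbolTable = std::map<std::string, int>;   // symbolic name -> numeric ID in one build
using Translations = std::map<int, std::string>;  // numeric ID -> translated text

// Carries translations from an old build over to a new build by matching
// symbolic names, so a changed numeric ID does not lose or mix up a translation.
Translations remapTranslations(const SymbolTable& oldBuild,
                               const SymbolTable& newBuild,
                               const Translations& oldTranslations) {
    Translations result;
    for (const auto& entry : oldBuild) {
        assert(!entry.first.empty());  // a resource string always has a name
        auto translated = oldTranslations.find(entry.second);
        if (translated == oldTranslations.end()) continue;  // never translated
        auto current = newBuild.find(entry.first);
        if (current == newBuild.end()) continue;  // string was removed from the code
        result[current->second] = translated->second;  // store under the new ID
    }
    return result;
}
```

For example, if SHello had ID 65001 in the old build and 65002 in the new one, a translation stored under 65001 comes back under 65002, while translations of strings that no longer exist are dropped instead of being attached to whatever string now owns their old ID.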

Why is Delphi good with UI? A simple answer: form files that contain all the information needed for localization, inherited forms, and frames. Delphi uses forms as the basic building block of UI. Forms are just like dialogs in Visual C++. Form data is stored in a DFM file, a text file that contains all the components and properties of the form. When Delphi compiles the application it stores the binary DFM data in a custom resource (RT_RCDATA). A localization tool that can handle Delphi’s form resources can localize the form. A Delphi form is much more flexible than a Visual C++ dialog. Delphi is a component oriented language: everything is a component and all components have properties. So when you design UI in Delphi you create a form, place components on it and finally add logic using component methods and events. In Delphi you can inherit a form from any other form. The inherited form contains all the components of the original, and you can add new ones. What is unique in Delphi is that you can also change the property values of inherited components. Delphi also has a smaller UI building block called a frame. This is a group of components that together make up a piece of UI. You can then drop these frames onto your forms. What is very neat is the ability to inherit frames. All this makes UI design very pleasant and productive in Delphi. However, all the inheritance and frames make the life of localization tools very hard. If a localization tool wants to show visually an inherited Delphi form that contains frames, and even inherited frames, the tool needs a lot of code to do that. As far as I know, the localization tool that I wrote is the only one that can properly show these forms visually outside Delphi (for example on your translator’s computer).

Delphi’s DFM contains all the data needed for localization. This includes the parent-child relationships, type information, images, colors, fonts and so on. .NET Windows Forms also has inherited forms and frames. However, the implementation is bad. You can’t override an inherited property value. The worst part is that the resource file, .ResX, does not contain all the information needed for localization. In order to localize a Windows Forms form and provide a full visual editor for it, the localization tool has to either read the source code or decompile the compiled code to get the parent-child relationships and missing properties such as colors (which need not be localized at all). All this hassle is a big mystery to me, because .NET was created many years after VCL, partly by the same people, yet it still fails to provide a productive mechanism for localization. WPF is a bit better, but it also has problems, such as the resource file (.xaml) not containing any type information about properties or components.

When you start your localized application you want to have full control over the language it is going to use. Again Delphi has good points. It has a built-in feature to read resource-only DLLs, and you can force it to use any of them. .NET has the same feature, but all the resource assemblies must be located in a sub directory of the main assembly’s directory. Delphi looks for resource files in the same directory as the EXE or DLL.
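
The difference in file layout can be sketched as two small functions that list the paths each platform would try. This is a simplified illustration, not the actual lookup code of either runtime. It assumes that Delphi replaces the file extension with a three-letter Windows locale abbreviation such as DEU and then falls back to its first two letters, and that .NET looks for AssemblyName.resources.dll in a culture-named sub directory with a fallback from the specific culture to the neutral one. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Delphi style: the resource DLL sits next to the executable and uses a
// locale code as its file extension.
// Assumption: localeCode is a three-letter abbreviation such as "DEU", and the
// language-only fallback is its first two letters.
std::vector<std::string> delphiCandidates(const std::string& exePath,
                                          const std::string& localeCode) {
    assert(localeCode.size() == 3);
    // Strip the extension; assumes the file name itself contains the last dot.
    const std::string base = exePath.substr(0, exePath.find_last_of('.'));
    return { base + "." + localeCode, base + "." + localeCode.substr(0, 2) };
}

// .NET style: the satellite assembly sits in a sub directory named after the
// culture, below the directory of the main assembly.
std::vector<std::string> dotNetCandidates(const std::string& appDir,
                                          const std::string& assemblyName,
                                          const std::string& culture) {
    std::vector<std::string> result;
    const std::string file = assemblyName + ".resources.dll";
    result.push_back(appDir + "/" + culture + "/" + file);
    // Fall back from a specific culture ("de-DE") to the neutral one ("de").
    const std::string::size_type dash = culture.find('-');
    if (dash != std::string::npos)
        result.push_back(appDir + "/" + culture.substr(0, dash) + "/" + file);
    return result;
}
```

Under these assumptions, a German build of C:/app/Project1.exe would be searched as C:/app/Project1.DEU and C:/app/Project1.DE on Delphi, and as C:/app/de-DE/Project1.resources.dll and C:/app/de/Project1.resources.dll on .NET.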

Some programmers want to enable runtime language switching. Traditionally this has been implemented by putting all strings into a file or database and adding code to the application that reads the strings from the file or database at run time. Delphi provides an easy way to make 3rd party components, and some vendors have implemented translation components for Delphi. They work very well. The problem is that they do not use Delphi’s built-in localization architecture: the resource DLLs. Instead they provide their own way to do localization, which means that you have to modify your code in order to use them. However, it is possible to perform a runtime language switch using Delphi’s resource DLLs. Sisulizer uses this approach by providing a small unit that you can add to your project. After that, runtime language switching is enabled without any need to change the code, while still using Delphi’s resource DLLs. The same is possible for .NET. Sisulizer also provides a .NET class that does the same.
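
The traditional file-based approach described at the start of this section can be sketched like this. It is a minimal illustration in portable C++ of the general technique, not the code of any particular translation component; the class name RuntimeStrings and the "key=value" text format are assumptions made for the example, with the text standing in for a language file or database.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical string table that can switch language while the program runs.
class RuntimeStrings {
public:
    // Parses "key=value" lines; the text stands in for a file or database.
    void load(const std::string& language, const std::string& text) {
        std::istringstream in(text);
        std::string line;
        while (std::getline(in, line)) {
            const std::string::size_type eq = line.find('=');
            if (eq == std::string::npos) continue;  // skip malformed lines
            tables_[language][line.substr(0, eq)] = line.substr(eq + 1);
        }
    }

    // Switching language only changes which table later lookups use.
    void setLanguage(const std::string& language) { active_ = language; }

    // Returns the translation, or the original text if none is available.
    std::string get(const std::string& key, const std::string& original) const {
        assert(!key.empty());
        auto table = tables_.find(active_);
        if (table == tables_.end()) return original;
        auto item = table->second.find(key);
        return item == table->second.end() ? original : item->second;
    }

private:
    std::map<std::string, std::map<std::string, std::string>> tables_;
    std::string active_;
};
```

The cost of this design is visible in the call sites: every place that shows text has to go through something like get("hello", "Hello World"), which is exactly the kind of code change that the resource DLL approach avoids.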

The last issue is deployment. One of Delphi’s great features is that you can make single EXE applications. This means that the application does not need any DLL files, COM objects or any other files whatsoever to work. This makes deployment a piece of cake and also improves quality, because it guarantees that your application and its libraries are compatible with each other. What is important is that single EXE applications are also much easier to localize. This is because many libraries contain UI, or at least resource strings. If a library is a standalone file, you have to provide a localized version or a resource DLL for that library too. If you have many libraries and languages, you end up with dozens of files. Delphi’s localization support uses resource DLLs. These are standalone files, so you end up with more files after all. Fortunately some localization vendors such as Sisulizer provide a way to embed the resource DLLs inside the EXE as custom resources. The first time the application starts it creates the resource DLLs from the custom resource data. As a result you can create a single EXE file that supports any number of languages, you have full control over the initial language, and you can change the language at run time. In order to do that you only have to add a few lines of code to your main form. No other platform provides such flexibility.

Are there any bad things in Delphi for localization? Some, but one of them is going to disappear soon. A very annoying thing about Delphi was its lack of Unicode. Up to Delphi 2007 all Delphi applications were ANSI applications. This not only prevented localization to some languages such as Hindi, but also made it hard to run an application in Japanese on an English computer. Fortunately this will change very soon when the Unicode-enabled Delphi 2009 appears. It contains full support for Unicode and some other features that make writing international code even easier. Another bad thing is that Delphi Enterprise contains a localization tool called ITE. It works, but it is very hard to use and lacks most of the features that you expect from a localization tool. There is no translation memory, spell checker or validation. Importing and exporting are very limited. Translators do not have WYSIWYG. ITE fails very often to maintain existing translations after you have modified string IDs or string values. Sometimes it even mixes up translations, which is totally unacceptable. My opinion is that ITE works against Delphi because it makes people believe that localization for Delphi is difficult. ITE’s quality does not reach the excellent internationalization level of Delphi. Everything else in Delphi is very good, but ITE kind of spoils the whole package. Fortunately there are some good 3rd party localization tools for Delphi.

Delphi 2009 will be the best tool available if you are planning to localize your application. Of course it is a very nice and productive programming environment as well.

2 comments:

Unknown wrote...

Great read and interesting perspective.

A quick question: I have a team member who is an avid Python programmer.

What would be your input on that language?

Jaakko Salmenius wrote...

I do not have much experience with Python. It is a great language, but because I have not done any localization with it I cannot tell.