Sunday, June 21, 2020

Delphi is still a valuable programming language

For the past few years, I have been using C# and JavaScript a lot, writing backend code and APIs with ASP.NET Core and client applications with Angular. C#/.NET Core and JavaScript/TypeScript are great languages, but Delphi still has its place.

The Soluling localization tool has four parts: the GUI tool, the command-line tool, the browser app, and the cloud service. Delphi is used in the GUI and command-line tools. The big difference between Soluling and other modern localization tools and services is that in Soluling, file scanning, string extraction, and the creation of localized files happen locally on the developer's machine or on the build server. As we know, three operating systems are used in build processes: Windows, Linux, and macOS. So we needed a programming language that can handle all of these from the same code base. There are several possible choices here: Python, Java, JavaScript, .NET Core, and Delphi! Yes, the Delphi compiler can create native code for Windows, Linux, and macOS from the same source code.

Traditionally, Delphi has been a great tool for creating Windows desktop applications. This is why it was a good choice for Soluling's GUI tool. However, the command-line tool needs the same classes as the GUI, minus the UI part. When we started the Soluling project, Delphi did not support Linux, only Windows and macOS. We hoped that Linux support would come, and soon it did. We also considered Java and Python, but because building a GUI is so much harder with either of them than with Delphi, we chose Delphi.

Now that we are implementing the cloud version, we use C# and Delphi. All the normal backend code, such as the API and DB access, is written in C# with ASP.NET Core, but the file scanning, string extraction, and file building use the same Delphi code as the Soluling GUI and the Soluling command-line tool. The amount of shared code is huge. Because we support more than 100 file and database formats, we have hundreds of classes totaling almost 700,000 lines of code. This is all non-UI code. With the desktop and web UI, it is almost double. The only place where we do not use Delphi is the browser app, where we use Angular.

Using Delphi, we can reuse the same code in three places: the GUI, the command line, and the cloud. Soluling's customers can use the Soluling command-line tool on Windows, Linux, and macOS. The GUI version is still Windows-only and will probably stay that way.

Saturday, June 20, 2020

Soluling - a new localization tool

Last week Soluling was released. It is a new localization tool. You might ask why we need another localization tool when so many already exist. There are several traditional desktop localization tools, such as SDL Passolo, Alchemy Catalyst, and Sisulizer. There are also several web-based tools, such as Memsource, Crowdin, and Phrase. Well, between these two groups is a gap that Soluling fills. A desktop-based tool can read all the files of the development environment. It is easy to create a new project and update it when needed. The drawback is translator collaboration. You need to send the project file to a translator, and they have to use that specific tool to edit it. An alternative approach is to export the data to a format known by the translator's tool (e.g., XLIFF). Web-based tools do not have that drawback. The translator just uses her browser to work with the project. However, web-based tools have very shallow support for file formats. Basically, they can only work with XML- and JSON-based formats. In addition, you have to send your files to the service or let the service access your repo. Personally, I do not like either of these choices.

Soluling brings the best of both worlds. Soluling has all the benefits of the desktop tools: easy project creation, full access to all files, and a rich Windows-based UI. When you want to localize a file, application, or database, you first use the Soluling GUI or the Soluling command-line tool to create a project. That creates a Soluling project file, .ntp. It is an XML-based file that contains information about the files or project you want to localize, plus all the strings extracted from those files. In addition to resource files, Soluling understands project files too. For example, if you want to localize an ASP.NET Core application, you just add your .csproj or .sln file to the Soluling project. Soluling reads the project file, locates the resource files, and then extracts the strings. If you later add a new resource file to your project, you don't have to modify your Soluling project. Once the Soluling project file, .ntp, has been created, you add it to the same repo as your source code. Then, in your build process, you use the command-line version of Soluling to maintain and update the project file. This means that before you build, you tell Soluling to rescan the project. If it finds new or modified strings, it adds them to the project file. This project file can then be sent for translation. You have several choices. You can use one of the supported machine translators (DeepL, Google, Microsoft, etc.) or send the .ntp to your translator. The build process can also upload the project to the Soluling cloud, where the translator can use Soluling's browser application to translate it. It will also be possible to order translations from Soluling's professional translators.

During the build process, the build server or DevOps pipeline uses Soluling's command-line tool to create localized files. This process uses the current translations found in the .ntp file. If the .ntp has been uploaded to the cloud, the process first exports the most recent translations from the cloud and imports them into the local .ntp. This way the local .ntp is a snapshot of the cloud project. Soluling creates the localized files: localized applications or resource files. Everything created by Soluling is ready to be deployed. There is no need to post-process them.

Soluling supports more file, application, and database formats than any other localization tool or service. You can see the full list of the supported formats here.

You can use Soluling without the continuous localization process just by starting Soluling GUI, opening the file and performing the scan, export, import, and build operations.

Sunday, November 12, 2017

Grammatical numbers, grammatical genders and abbreviated numbers

I have been quiet for a very long time. It was not my plan, but raising kids, writing a new localization tool, and having a new job have taken my time.

For the past few years I have been working in the internationalization and localization industry, writing tools and APIs for I18N and L10N. I want to show one of those APIs.

https://github.com/soluling/I18N

It is a set of open source APIs for .NET and Delphi that fill the gap between the native I18N APIs of Windows and CLDR. I am a big fan of CLDR. It has a great deal of cool stuff. Unfortunately, there are some major limitations. The first is that not all operating systems support the full set of CLDR data. This brings the need to use a CLDR library or add-on. There is one, ICU, but I don't like it too much. It is C++/Java only. It is huge. It is old. It is hard to use. It requires CLDR XML data. We need easier APIs for I18N. My plan is to implement some of them. To start, I implemented APIs for grammatical numbers, grammatical genders, and abbreviated numbers. My approach differs from ICU. Instead of writing one cross-platform API that requires CLDR XML data, I wrote each API in its native language. This means the .NET API is 100% C# and the Delphi API is 100% Delphi. The rules are also in C# or Delphi. I wrote a tool that extracts data from CLDR and generates both .cs and .pas files that are compiled into the library. The result is a 100% native API that is ready to be used without any DLLs or XML files.

Last week I gave a session about these APIs at CodeRage XII. It was a pleasant experience. CodeRage is a Delphi-oriented virtual conference that contains tons of good sessions about how to write better Delphi applications.

Unlike Android, .NET and Delphi do not have a resource format suitable for grammatical numbers and genders. This is why my API uses multiple patterns combined and stored in a standard resource string. For example, a .NET string for a grammatical number would be something like this:

"{plural, ski, one {I have {0} ski} other {I have {0} skis}}"

Then, instead of Format, you would use the API's MultiPattern.Format. The function uses language-specific rules to determine which pattern to use and then applies it.
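To make the idea concrete, here is a minimal Python sketch of how such a multi-pattern string could be split and applied. It is not the actual .NET or Delphi API: the function names (parse_multi_pattern, plural_category, multi_format) are hypothetical, the pattern syntax is simplified to a balanced "{plural, one {...} other {...}}" form, and the plural rules are hand-written for three languages, whereas the real library generates its rules from CLDR.

```python
def parse_multi_pattern(pattern):
    """Split '{plural, one {...} other {...}}' into {'one': ..., 'other': ...}."""
    text = pattern.strip()
    if not (text.startswith("{plural,") and text.endswith("}")):
        raise ValueError("not a plural multi-pattern")
    body = text[len("{plural,"):-1]
    forms = {}
    i = 0
    while i < len(body):
        if body[i].isspace():
            i += 1
            continue
        start = body.index("{", i)      # opening brace of this form's pattern
        key = body[i:start].strip()     # plural category name, e.g. 'one'
        depth = 0
        j = start
        while True:                     # find the matching closing brace
            if body[j] == "{":
                depth += 1
            elif body[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        forms[key] = body[start + 1:j]
        i = j + 1
    return forms


def plural_category(lang, count):
    """Hand-written integer plural rules for this sketch only."""
    if lang in ("en", "fi"):
        return "one" if count == 1 else "other"
    if lang == "ja":
        return "other"                  # Japanese has a single plural form
    raise ValueError("no rule for language: " + lang)


def multi_format(pattern, count, lang):
    """Pick the pattern that matches the count and fill in the number."""
    forms = parse_multi_pattern(pattern)
    chosen = forms.get(plural_category(lang, count), forms["other"])
    return chosen.format(count)


PATTERN = "{plural, one {I have {0} ski} other {I have {0} skis}}"
print(multi_format(PATTERN, 1, "en"))   # I have 1 ski
print(multi_format(PATTERN, 2, "en"))   # I have 2 skis
```

Falling back to the "other" form when a category is missing keeps the sketch usable even if a translator leaves a pattern out.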

The translator can add any number of patterns to the string. If you use a good localization tool (as shown on my GitHub pages), the tool makes sure that you enter the right number of patterns.

If we need to show large numbers to the user, we have traditionally converted the numbers into strings and shown them. For example, a number like 143503000000 would be 143,503,000,000 when shown on an English system but 143 503 000 000 when shown on a Finnish system. In most cases this is enough to show the number properly. However, to make the number occupy less space and to make its magnitude easier to grasp, we can use the abbreviation API. The above number becomes 144B or 140B, depending on how many digits we want to use. In Finnish it would be 144 mrd. In Japanese, 1440億. The API uses language-specific rules to round and abbreviate the number.
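The rounding and abbreviation steps can be sketched in a few lines of Python. This is only an illustration, not the library's API: the names RULES, round_significant, and abbreviate are hypothetical, the unit tables are typed in by hand for three languages (the real API generates them from CLDR), and the sketch handles non-negative integers only.

```python
# Hand-written unit tables for this sketch: (threshold, symbol), largest first.
RULES = {
    "en": {"gap": "", "decimal": ".",
           "units": [(10**12, "T"), (10**9, "B"), (10**6, "M"), (10**3, "K")]},
    "fi": {"gap": " ", "decimal": ",",
           "units": [(10**12, "bilj."), (10**9, "mrd."), (10**6, "milj."), (10**3, "t.")]},
    "ja": {"gap": "", "decimal": ".",
           "units": [(10**12, "兆"), (10**8, "億"), (10**4, "万")]},
}


def round_significant(value, digits):
    """Round a non-negative integer to the given number of significant digits."""
    length = len(str(value))
    if length <= digits:
        return value
    factor = 10 ** (length - digits)
    return (value + factor // 2) // factor * factor   # round half up


def abbreviate(value, digits, lang):
    """Round first, then express the result in the largest unit that fits."""
    rules = RULES[lang]
    rounded = round_significant(value, digits)
    for unit, symbol in rules["units"]:
        if rounded >= unit:
            number = f"{rounded / unit:g}".replace(".", rules["decimal"])
            return f"{number}{rules['gap']}{symbol}"
    return str(rounded)


print(abbreviate(143503000000, 3, "en"))   # 144B
print(abbreviate(143503000000, 2, "en"))   # 140B
print(abbreviate(143503000000, 3, "fi"))   # 144 mrd.
print(abbreviate(143503000000, 3, "ja"))   # 1440億
```

Note that Japanese groups by ten thousands rather than thousands, which is why the same rounded value comes out as 1440 of the 億 unit instead of 144 of a billion unit.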

If you are a .NET or Delphi programmer I hope you start using these APIs.

Saturday, March 14, 2009

Double resourcing

This winter I have been implementing localization support for various mobile platforms such as Android, Silverlight, Qt, and Symbian. On all these platforms I have faced a feature that I don't like and that is bad for localization. As far as I know, there is no name for this "feature", so I use the name I invented: double resourcing. What does double resourcing mean? In short, it means a situation where you do not add the actual strings to your user interface (UI) resource files but instead add all strings to flat string resource file(s) and refer to those strings from your UI resource files. Let me show a simple example. Here is a UI resource for a button:

[button id="browseButton" caption="&Browse..." width="100" height="30" color="red" /]

The above line does not represent any actual resource file format; it is used purely for demonstration purposes. The caption attribute contains the string of the button. In most cases you only localize this. Sometimes you also have to localize width and/or color. The good thing about the above format is that all the data for the button is in a single file and even in a single XML element. So the context of the &Browse... string is the caption of browseButton. The button also has a context, which is most likely a form or dialog. Now, if you use double resourcing, you no longer write the string value in the caption attribute; instead you have to maintain two files. First, a flat string file:

browse_button_caption &Browse...

Second, the UI resource file becomes:

[button id="browseButton" caption="link::strings.txt::browse_button_caption" width="100" height="30" color="red" /]

The caption attribute contains a reference to the actual string resource. In this case the string value is looked up in the strings.txt file under the browse_button_caption ID.
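As a rough illustration of the extra indirection, the following Python sketch resolves such a link. It mirrors the demonstration markup only; the data structures and the resolve function are hypothetical and do not correspond to any real platform's resource loader.

```python
# Hypothetical in-memory copies of the two demonstration files.
STRING_FILES = {
    "strings.txt": {"browse_button_caption": "&Browse..."},
}

UI_BUTTON = {
    "id": "browseButton",
    "caption": "link::strings.txt::browse_button_caption",
    "width": "100",
    "height": "30",
    "color": "red",
}

LINK_PREFIX = "link::"


def resolve(value, string_files):
    """Return the value itself, or the string a 'link::file::key' value points to."""
    if not value.startswith(LINK_PREFIX):
        return value
    file_name, key = value[len(LINK_PREFIX):].split("::", 1)
    return string_files[file_name][key.rstrip(".")]   # tolerate a trailing period


# What the runtime shows: the link has to be followed to get the caption.
print(resolve(UI_BUTTON["caption"], STRING_FILES))    # &Browse...

# What a translator working on the flat file sees: a key and a string,
# with no element type, parent form, or size to go with it.
for key, text in STRING_FILES["strings.txt"].items():
    print(key, "=", text)
```

The UI entry still knows it is the caption of browseButton, but the flat file entry carries only a key, so that knowledge never reaches whoever translates strings.txt.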

Why is double resourcing a bad idea? First of all, it makes creating and maintaining resource files a lot harder. You no longer put the string into the resource file; you add the string to a flat string file and put the link or ID of the string into the resource file. When you move the strings from the original user interface resource to a flat string file, you lose the context of the string. Context is a very important part of the localization process. Without it the translator may not figure out the full meaning of the string, and this may lead to bad translations. Also, because you do not know where the string belongs, it is hard to check whether the translation fits the size of the element the string came from.

I am not happy to notice that many platforms, even brand new ones, use double resourcing. Symbian has always used it. That is no wonder because, frankly speaking, Symbian is a nightmare for developers. Why would its resource format be any better? There are more platforms using double resourcing. Even some WPF samples recommend storing strings in flat .resx files and placing only links in the .xaml files. This is odd because XAML is a very good resource format and a single XAML file contains all the elements of a page or window. You only need to translate the .xaml file. The reason for using .resx with WPF might be that there were no good localization tools supporting XAML. Resx is a well-known format that is familiar to most developers and translators, even though very few translators can edit a .resx file in a text editor without breaking it. However, this is not a good reason. Using only XAML files makes creating and maintaining WPF projects much easier. Nowadays there are good localization tools that can localize XAML files. Microsoft should remove the samples that use double resourcing as soon as possible and instruct users to localize XAML files.

Recently I have been implementing localization support for two very promising platforms: Qt and Android. Guess what? Both use double resourcing! I cannot believe this. Both Qt and Android have a good UI resource format (ui and layout files). The vendors should encourage users to localize these files, but it seems that platform developers do not take localization seriously. Somehow they believe that providing a flat string resource file is all they need to do for localization. Hey, this is wrong. Only the UI files contain strings with full context, and this gives translators all the information they need to create high-quality translations. A UI file format is indeed more complex than a string file format, but developers and translators should leave the reading and writing of resource files to localization tools. They can cope with even complex XML or binary resource files without breaking them.

As far as I can see, platform vendors should concentrate on the following:
  • Implement a structured UI resource file that contains all UI information such as strings, layouts, colors, etc. XML is a good format for this.
  • Implement a mechanism that allows localized UI files.
  • Implement a feature that lets the developer choose which language resource to use.
Microsoft has done this. The only problem is that some of its samples do not encourage UI file localization. Android is in almost the same situation, but all its samples use double resourcing. Qt has not implemented a runtime UI resource format at all. Symbian is hopeless. It does not use XML in its resource file format, it has several different incompatible resource formats, and most of its samples use double resourcing. The platform has so many problems that even with the new Symbian Foundation I don't expect the platform to survive more than ten years.

Localization tool vendors should implement:
  • Tool can read the original UI files
  • Tool can visually show the items of the UI files.
  • Tool can create localized UI files.
  • Tool can compile the localized resource files to runtime format that is most often binary format (baml, mo, rss)
Of course every platform needs a flat string resource file but this should be used only for string resources such as error messages and dynamic strings.

Symbian, Google, and QtSoftware should abandon double resourcing and move to UI file localization. Some platform vendors, such as Borland/CodeGear, have always used properly structured UI files and never used double resourcing. In my first blog entry I said that Delphi is the best tool if you consider localization. In this post and those to come, I will every now and then compare bad things to good things, and in most cases that will be the same as comparing other platforms to Delphi.

Friday, August 22, 2008

What is the best programming platform for localization?

My job is to design and write a software localization tool. Such a tool is software that helps other programmers localize their applications. A localization tool is just another tool in the chain of IDE, compiler, make, profiler, etc. I have been doing this for the past 15 years. During that time I have become familiar with several programming platforms. Too many of them! This is because in order to support localization of a platform, I first had to learn how to write code for it. Most of the platforms still exist, such as Visual C++, Visual Basic, Delphi, C++Builder, Java, Symbian, Palm, .NET (Windows Forms), and PO (GetText). Some are brand new, such as WPF and Android. Some are already gone, such as Visual J++. Some of them I like (Delphi, WPF, and Android), some are pretty neutral, and some I don't like at all (Symbian). My purpose here is not to compare which platform is the best overall or to start a war about platforms. No, my plan is to tell which platform is the best for localization. If you forget about other needs and your working constraints (in most cases you do not have any choice of platform) and only consider localization, which platform should you use in order to make localization as easy and painless as possible?

My answer is simple. It is Delphi! Let me explain why.

Localization is mostly about translating resources. The two most important resources are UI and strings. UI is dialogs, menus, forms, images, and everything you see on the screen. Strings are the text in message boxes, lists, and so on. Of course there are many other resource types that sometimes should also be localized, but most applications only need UI and string resources to be localized. How is Delphi superior with these?

Let's talk about resource strings first. Everybody who has used Visual C++ knows how to use a resource string. You add the string to the RC file, create an ID for it, and finally replace the hard-coded string in your source code with a call to the LoadString function. This simple step requires you to maintain three different files (.cpp, .rc, and .h), each with a different file format. Every time you remove a hard-coded string you have to open those files and edit them. As a result, the original string moves very far away from the place where it is used. Why is this bad? Well, the programmer mostly edits and views the code. If you have something like this:

#include "resourceid.h"

{
str = LoadString(ID_MY_STRING);
}

You just don't know what the string is without first checking the ID and then opening the RC file that defines the string. This takes lots of time. Note that the above is already simplified by using my own LoadString instead of the Win32 LoadString, which requires four parameters.

Delphi's answer to this is the resourcestring clause. You actually define the resource string in your code.

resourcestring
  SMyString = 'Hello World';
begin
  str := SMyString;
end;

The nice part here is that the original string, the ID, and the usage are all in the same place, so you can see them all at a single glance, and editing them is very simple. I have not seen a feature like this on any other platform. This brings one drawback. Delphi uses symbolic names as resource IDs instead of integer numbers. When the Delphi compiler compiles the above, it adds the string to the string resources and assigns a numeric ID to it. Here is the problem. When you use a localization tool, it identifies string resources by numeric ID, and if Delphi changes this ID (which is very likely if you modify the code), you might end up with mixed-up or lost translations. Both are very bad. Fortunately, some localization tools (Sisulizer, TsiLang) can make a link between Delphi's symbolic name and the numeric resource ID. As long as the symbolic name does not change (only the ID changes), your translations are safe.
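The difference between the two ways of identifying a string can be shown with a small Python sketch. Everything here is made up for illustration: the numeric IDs, the symbolic names other than SMyString, and the helper functions are assumptions, and the sketch does not describe how Sisulizer or TsiLang actually store their links.

```python
# Each build maps numeric ID -> (symbolic name, original string). Values are invented.
BUILD_1 = {
    65000: ("SMyString", "Hello World"),
    65001: ("SGoodbye", "Goodbye"),
}
# A new resourcestring was added, so the existing strings got new numeric IDs.
BUILD_2 = {
    65000: ("SAdded", "Welcome"),
    65001: ("SMyString", "Hello World"),
    65002: ("SGoodbye", "Goodbye"),
}

# Finnish translations collected against build 1, keyed by numeric ID.
FINNISH_BY_ID = {65000: "Hei maailma", 65001: "Näkemiin"}


def rekey_by_name(build, translations_by_id):
    """Link each translation to the symbolic name it belonged to in that build."""
    return {build[num_id][0]: text
            for num_id, text in translations_by_id.items() if num_id in build}


def apply_by_id(build, translations_by_id):
    """Original string -> translation, matched by numeric ID only."""
    return {original: translations_by_id.get(num_id)
            for num_id, (_name, original) in build.items()}


def apply_by_name(build, translations_by_name):
    """Original string -> translation, matched by symbolic name."""
    return {original: translations_by_name.get(name)
            for _num_id, (name, original) in build.items()}


print(apply_by_id(BUILD_2, FINNISH_BY_ID))
# "Welcome" wrongly gets "Hei maailma" and "Goodbye" loses its translation.

by_name = rekey_by_name(BUILD_1, FINNISH_BY_ID)
print(apply_by_name(BUILD_2, by_name))
# "Hello World" and "Goodbye" keep their translations; "Welcome" is untranslated.
```

Matching by numeric ID produces exactly the mixed-up and lost translations described above, while matching by symbolic name survives the renumbering.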

Why is Delphi good with UI? A simple answer: form files that contain all the information needed for localization, inherited forms, and frames. Delphi uses forms as the basic building block of UI. Forms are just like dialogs in Visual C++. Form data is stored in a DFM file, a text file that contains all the components and properties of the form. When Delphi compiles the application, it stores the binary DFM data in a custom resource (RT_RCDATA). A localization tool that can handle Delphi's form resources can localize the form. A Delphi form is much more flexible than a Visual C++ dialog. Delphi is a component-oriented language. This means that everything is a component and all components have properties. So when you design UI in Delphi, you create a form, then place the components, and finally add logic using component methods and events. In Delphi you can inherit a form from any other form. The inherited form contains all the components of the original, plus you can add new ones. What is unique in Delphi is that you can also change the values of inherited components. Delphi also has a smaller UI building block called a frame. This is a group of components that together build up a piece of UI. You can then drop these frames onto your forms. What is very neat is the ability to inherit frames. All this makes UI design very pleasant and productive in Delphi. However, all the inherited stuff and frames make life very hard for localization tools. If a localization tool wants to visually show an inherited Delphi form that contains frames and even inherited frames, the tool has to have lots of code to do that. As far as I know, the localization tool that I wrote is the only one that can properly show these forms visually outside Delphi (for example, on your translator's computer).

Delphi's DFM contains all the data needed for localization. This includes the parent-child relationships, type information, images, colors, fonts, and so on. .NET Windows Forms also has inherited forms and frames. However, the implementation is bad. You can't override an inherited property value. The worst part is that the resource file, .ResX, does not contain all the information needed for localization. In order to localize a Windows Forms form and provide a full visual editor for it, the localization tool has to either read the source code or decompile the compiled code to get the parent-child relationships and missing properties such as colors (which need not be localized at all). All this hassle is a big mystery to me because .NET was created many years after VCL, partly by the same guys, but it still fails to provide a productive mechanism for localization. WPF is a bit better, but it also has problems, such as the resource file (.xaml) not containing any type information about properties or components.

When you start your localized application, you want to have full control over the language it is going to use. Again Delphi has good points. It has a built-in feature for reading resource-only DLLs, and you can force it to use any of the resources. .NET has the same feature, but all the resource assemblies must be located in subdirectories of the main assembly's directory. Delphi uses resource files in the same directory as the EXE or DLL.

Some programmers want to enable a runtime language switch. Traditionally this has been implemented by adding all strings to a file or database and adding some code to the application that reads the strings from the file or database at run time. Delphi provides an easy way to make third-party components, and some vendors have implemented translation components for Delphi. They work very well. The problem is that they do not use Delphi's built-in localization architecture: the resource DLLs. Instead they provide their own way to do localization, which means that you have to modify your code in order to use them. However, it is possible to perform a runtime language switch using Delphi's resource DLLs. Sisulizer uses this approach by providing a small unit that you can add to your project. After that, the runtime language switch is enabled without any need to change the code, while still using Delphi's resource DLLs. The same is possible for .NET. Sisulizer also provides a .NET class that does the same.

The last issue is deployment. One of Delphi's great features is that you can make single-EXE applications. This means that the application does not need any DLL files, COM objects, or any other files whatsoever to work. This makes deployment a piece of cake and also improves quality because it is guaranteed that your application and libraries are compatible with each other. What is important is that single-EXE applications are also much easier to localize. This is because many libraries contain UI, or at least resource strings. If the library is a standalone file, you have to provide a localized version or a resource DLL for the library. If you have many libraries and languages, you end up with dozens of files. Delphi's localization support uses resource DLLs. These are standalone files, so you end up with more files after all. Fortunately, some localization vendors, such as Sisulizer, provide a way to embed resource DLLs inside the EXE as custom resources. The first time the application starts, it creates the resource DLLs from the custom resource data. As a result, you can create a single EXE file that supports any number of languages, you have full control over the initial language, and you can change the language at run time. In order to do that, you only have to add a few lines of code to your main form. No other platform provides such flexibility.

Are there any bad things in Delphi for localization? Some, but one of them is going to disappear soon. A very annoying thing about Delphi was the lack of Unicode. Up to Delphi 2007, all Delphi applications were ANSI applications. This not only prevented localization into some languages, such as Hindi, but also made it hard to run an application in Japanese on an English computer. Fortunately, this will change very soon when the Unicode-enabled Delphi 2009 appears. It contains full support for Unicode and some other features that make writing international code even easier. Another bad thing is that Delphi Enterprise contains a localization tool called ITE. It works, but it is very hard to use and lacks most of the features that you expect to have in a localization tool. There is no translation memory, spell checker, or validation. Importing and exporting are very limited. Translators do not have WYSIWYG. ITE very often fails to maintain existing translations after you have modified string IDs or string values. Sometimes it even mixes up translations, which is totally unacceptable. In my opinion, ITE works against Delphi because it makes people believe that localization for Delphi is difficult. ITE's quality does not reach the excellent internationalization level of Delphi. Everything else in Delphi is very good, but ITE kind of spoils the whole package. Fortunately, there are some good third-party localization tools for Delphi.

Delphi 2009 will be the best tool available if you are planning to localize your application. Of course, it is a very nice and productive programming environment as well.

Thursday, May 8, 2008

Hello

Hello. I am Jaakko, and I am an architect of the Sisulizer localization tool. My plan is to write localization-related articles here.