Internationalization Frequently Asked Questions

This page answers common questions about internationalization of the Java 2 platform, Standard Edition, version 1.4, and of Sun's Java 2 Runtime Environments, Standard Edition, version 1.4. For more information, see the Internationalization home page.

General Questions
Locales
Resource Bundles
Text Processing
Character Encodings
Text Input
Text Rendering
Component Orientation
Miscellaneous

General Questions

What is internationalization?

Internationalization allows software to be adapted to any language and cultural convention. During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture. For example, the programmer will isolate error messages because they must be translated during localization.

What is localization?

Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

How do I go about internationalizing an existing program?

See the steps outlined in the Checklist section of The Java Tutorial.

Locales

What is a locale?

A locale is a geographic or political region that shares the same language and customs. In the Java programming language, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to locale.

Where can I find some coding examples that use `Locale` objects?

See the Setting the Locale section of The Java Tutorial.

Which locales are supported?

The supported locales vary between different implementations of the Java 2 platform and between areas of functionality. Information about the supported locales in Sun's Java 2 Runtime Environments is provided by the Supported Locales document.

Can a Java application use multiple locales?

Yes. This capability allows you to create multilingual applications.
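
As a minimal sketch of a multilingual operation, the example below formats one number for several locales by passing each Locale explicitly instead of relying on the default locale. The class name MultiLocaleDemo and the sample value are illustrative assumptions, not part of the platform.

```java
import java.text.NumberFormat;
import java.util.Locale;

class MultiLocaleDemo {
    // Formats the value for the given locale, independent of the
    // default locale of the running JVM.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.FRANCE };
        for (int i = 0; i < locales.length; i++) {
            System.out.println(locales[i] + ": " + format(1234.5, locales[i]));
        }
    }
}
```

Because every call names its locale, the same application can serve users in different locales at the same time.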

Can I set the default locale from outside an application?

This depends on the implementation of the Java 2 platform you're using. The initial default locale is normally determined from the host operating system's locale. Version 1.4 of Sun's Java 2 Runtime Environments lets you override this by setting the user.language, user.country, and user.variant system properties from the command line. For example, to select Locale("de", "DE", "EURO") as the initial default locale, you would use:

java -Duser.language=de -Duser.country=DE -Duser.variant=EURO MainClass

Since not all runtime environments provide this feature, it should only be used for testing.
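
To check whether the override took effect, a test program can print the relevant system properties and the resulting default locale. This is only a sketch; the class name DefaultLocaleDemo is hypothetical, and some runtimes may leave one or more of the properties unset.

```java
import java.util.Locale;

class DefaultLocaleDemo {
    // Renders the three locale parts that the system properties control.
    static String describe(Locale locale) {
        return "language=" + locale.getLanguage()
             + ", country=" + locale.getCountry()
             + ", variant=" + locale.getVariant();
    }

    public static void main(String[] args) {
        // A property prints as "null" if the runtime does not set it.
        System.out.println("user.language = " + System.getProperty("user.language"));
        System.out.println("user.country  = " + System.getProperty("user.country"));
        System.out.println("user.variant  = " + System.getProperty("user.variant"));
        System.out.println("default locale: " + describe(Locale.getDefault()));
    }
}
```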

Resource Bundles

What is a resource bundle?

A ResourceBundle object allows you to isolate localizable elements from the rest of the application. With all resources separated into a bundle, the application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.
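
The lookup can be sketched as follows. To keep the example self-contained, it assumes two small ListResourceBundle subclasses nested in a hypothetical ResourceBundleDemo class; a real application would more likely keep its bundles in separate classes or properties files. Generic syntax and Class.getName as a base name are conveniences of this sketch.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class ResourceBundleDemo {
    // Base bundle: used when no more specific bundle matches.
    public static class Labels extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { { "cancel", "Cancel" } };
        }
    }

    // German bundle: found through the "_de" suffix appended to the base name.
    public static class Labels_de extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { { "cancel", "Abbrechen" } };
        }
    }

    // Loads the bundle that best matches the locale and looks up one key.
    static String label(String key, Locale locale) {
        ResourceBundle bundle =
            ResourceBundle.getBundle(Labels.class.getName(), locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label("cancel", Locale.GERMANY));
        System.out.println(label("cancel", Locale.ROOT));
    }
}
```

Note that when no bundle matches the requested locale, getBundle also tries the default locale before falling back to the base bundle.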

Where can I find some coding examples that use `ResourceBundle` objects?

See the Isolating Locale-Specific Data section of the The Java Tutorial.


How do I specify non-ASCII strings in a properties file?

You can specify any Unicode character with the \uXXXX notation, where XXXX stands for the four hexadecimal digits that make up the Unicode value of the character. For example, a properties file might have the following entries:

s1=hello there
s2=\uff2d\uff33\u30b4

If you have edited and saved the file in a non-ASCII encoding, you can convert it to ASCII with the native2ascii tool. For example, you might want to do this when editing a properties file in Shift-JIS, a popular Japanese encoding.
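
The following sketch shows how the \uXXXX entries above are decoded when the file is loaded. It feeds the two sample entries to java.util.Properties from memory instead of from a file; the class name UnicodeEscapeDemo is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class UnicodeEscapeDemo {
    // Parses ASCII properties text; \uXXXX escapes become real characters.
    static Properties load(String asciiText) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                asciiText.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in the Java source.
        String text = "s1=hello there\n" + "s2=\\uff2d\\uff33\\u30b4\n";
        Properties props = load(text);
        System.out.println("s1 has " + props.getProperty("s1").length() + " characters");
        System.out.println("s2 has " + props.getProperty("s2").length() + " characters");
    }
}
```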

How do I compile a non-ASCII `ListResourceBundle`?

If your source file is in a non-ASCII encoding, you can direct the compiler to convert it into Unicode. For example, you would compile a Japanese resource bundle written in the Shift-JIS encoding as follows:

javac -encoding SJIS LabelsResource_ja.java

Text Processing

How do I format a date?

You can use the SimpleDateFormat class to format and parse dates in a locale-sensitive manner. See the Dates and Times section of The Java Tutorial.
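
As a rough illustration, the sketch below formats one fixed instant with the same pattern for several locales. The pattern string, the UTC time zone, and the class name DateFormatDemo are choices made for this example only.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    // Formats the date with day and month names taken from the locale.
    static String format(Date date, Locale locale) {
        SimpleDateFormat formatter =
            new SimpleDateFormat("EEEE, d MMMM yyyy", locale);
        // Fix the time zone so the result does not depend on the host.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1 January 1970, 00:00 UTC
        System.out.println(format(epoch, Locale.US));
        System.out.println(format(epoch, Locale.FRANCE));
        System.out.println(format(epoch, Locale.GERMANY));
    }
}
```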

Are formatters thread-safe?

Instances of java.text.Format and its subclasses are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
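
One possible way to give each thread its own format instance is a ThreadLocal, sketched below. This assumes a runtime newer than version 1.4 (the example uses generics), and the class name PerThreadFormatDemo is hypothetical; synchronizing externally on a shared instance is an equally valid alternative.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PerThreadFormatDemo {
    // Each thread lazily creates, then reuses, its own formatter.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            }
        };

    // Safe to call from any thread: no formatter is ever shared.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() {
                System.out.println(Thread.currentThread().getName()
                    + ": " + format(new Date(0L)));
            }
        };
        Thread first = new Thread(task, "worker-1");
        Thread second = new Thread(task, "worker-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```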

How does setting the default locale affect the results of sorting?

The Collator class and its subclasses are used for building sorting routines. These classes are locale-sensitive: a collator obtained from the no-argument Collator.getInstance method uses the collating sequence of the default locale, so changing the default locale can change the resulting sort order.
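
A small sketch of locale-sensitive sorting follows. It passes the locale explicitly so the result is reproducible; calling Collator.getInstance without arguments would use the default locale instead. The class name CollatorSortDemo and the word list are illustrative.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSortDemo {
    // Returns a copy of the words sorted by the collator of the locale.
    static String[] sorted(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "banana", "Zebra", "apple" };

        // Plain String ordering compares char values, so "Zebra" comes first.
        String[] binary = words.clone();
        Arrays.sort(binary);
        System.out.println("String order:   " + Arrays.toString(binary));

        // Collation order follows the language rules of the locale.
        System.out.println("Collator order: "
            + Arrays.toString(sorted(words, Locale.US)));
    }
}
```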

The Collator object supports different levels of decomposition and strength. How do I choose the right decomposition and strength in a locale?

Since decomposing takes time, turning decomposition off makes comparisons go faster. However, for Latin languages the NO_DECOMPOSITION mode is not useful if the text contains accents. You should use the default decomposition unless you really know what you're doing.

The strength property you choose depends on what your application is trying to accomplish. For example, when performing a text search you may allow a "weak" match, in which accents and differences in case (upper vs. lower) are ignored. This type of search employs the PRIMARY strength. If you are sorting a list of words, you might want to use the TERTIARY strength. In this mode the properties that must match are the base character, accent, and case.
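
The effect of the strength setting can be sketched as follows, using the default decomposition as recommended above. The class name CollatorStrengthDemo and the sample words are assumptions; accented letters are written as \u escapes so that the source file stays ASCII.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {
    // Reports whether two strings are considered equal at the given strength.
    static boolean matches(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "resume";
        String accented = "R\u00e9sum\u00e9"; // capitalized, with e-acute

        // PRIMARY ignores accent and case differences: a "weak" match.
        System.out.println("PRIMARY:  " + matches(plain, accented, Collator.PRIMARY));
        // TERTIARY requires base character, accent, and case to match.
        System.out.println("TERTIARY: " + matches(plain, accented, Collator.TERTIARY));
    }
}
```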

Character Encodings

What is a character encoding?

A character encoding is a mapping between characters and code values.

What is Unicode?

In the Java programming language, char values represent Unicode characters; each char holds a 16-bit Unicode value. Unicode is a character encoding standard that supports the world's major languages. You can learn more about the Unicode standard at the Unicode Consortium web site.

How do I convert data between Unicode and other character encodings?

The Converting Non-Unicode Text section of The Java Tutorial explains how to perform the conversions within an application using high-level APIs. If you need more direct access to character conversion, see the java.nio.charset.Charset class. To convert data files, use the native2ascii tool.
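
As a minimal sketch of in-application conversion, the example below turns a String (Unicode) into bytes in a named encoding and back. The class name ConversionDemo is hypothetical, and the String methods that accept a Charset are an assumption about the runtime, since they were added after version 1.4.

```java
import java.nio.charset.Charset;

class ConversionDemo {
    // Unicode text to bytes in the named encoding.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Bytes in the named encoding back to Unicode text.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with e-acute
        byte[] latin1 = encode(text, "ISO-8859-1");
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes:      " + utf8.length);
        System.out.println("round trip ok:    " + decode(utf8, "UTF-8").equals(text));
    }
}
```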

Which character encodings are supported when converting text to and from Unicode?

See the Supported Encodings web page.

How do I create my own character converters?

The java.nio.charset.spi.CharsetProvider class lets developers create their own character converters.

What is the default encoding?

The default encoding is selected by the Java runtime based on the host operating system and its locale. For example, in the US locale on Windows, Cp1252 is used. In the Simplified Chinese locale on Solaris, either EUC_CN or GBK can be the default encoding, depending on the selection made when logging into Solaris.

The default encoding is significant because the Java programming language uses Unicode to represent characters, but the file system of the host operating system usually uses some other encoding. The default encoding has to match the encoding used by the host operating system to ensure correct interaction.
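
To see which default encoding a given runtime selected, a program can query it as sketched below. Charset.defaultCharset is an assumption about the runtime (it was added after version 1.4), the file.encoding property is implementation dependent, and the class name DefaultEncodingDemo is hypothetical.

```java
import java.nio.charset.Charset;

class DefaultEncodingDemo {
    // Canonical name of the encoding the runtime uses by default.
    static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultEncodingName());
        // May print "null" on runtimes that do not define the property.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```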

What is the UTF-8 encoding?

UTF-8 stands for UCS (Universal Character Set) Transformation Format, 8-bit form. It is a transmission format for Unicode that is suitable for use with many network protocols and UNIX file systems.
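
The sketch below prints the UTF-8 bytes of a few characters to show that the encoding uses a variable number of bytes per character, with ASCII characters taking a single byte. The class name Utf8Demo and the helper utf8Hex are illustrative.

```java
import java.nio.charset.StandardCharsets;

class Utf8Demo {
    // Returns the UTF-8 bytes of the text as space-separated hex pairs.
    static String utf8Hex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", bytes[i] & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println("U+0041: " + utf8Hex("A"));      // one byte
        System.out.println("U+00E9: " + utf8Hex("\u00e9")); // two bytes
        System.out.println("U+30B4: " + utf8Hex("\u30b4")); // three bytes
    }
}
```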

Are the Cp1252 and ISO8859_1 encodings identical?

No. Cp1252 contains some additional characters in the range from 0x80 to 0x9F. See the Microsoft documentation for more information.
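
The difference can be observed by decoding the same byte with both encodings, as in this sketch. It assumes the runtime provides the windows-1252 charset, which is the java.nio name for Cp1252; the class name Cp1252Demo is hypothetical.

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    // Decodes a single byte value with the named charset.
    static char decodeByte(int value, String charsetName) {
        byte[] single = { (byte) value };
        return new String(single, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        // 0x80 is the Euro sign in Cp1252 but a control character in ISO 8859-1.
        char cp1252 = decodeByte(0x80, "windows-1252");
        char latin1 = decodeByte(0x80, "ISO-8859-1");
        System.out.println("Cp1252    0x80 -> U+" + Integer.toHexString(cp1252).toUpperCase());
        System.out.println("ISO8859_1 0x80 -> U+" + Integer.toHexString(latin1).toUpperCase());
    }
}
```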

Text Input

What is the Input Method Framework?

The input method framework enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods. An input method lets users enter thousands of different characters using keyboards with far fewer keys. Typically a sequence of several characters needs to be typed and then converted to create one or more characters. For specifications and examples see the web page, Input Method Framework.

What does it mean to switch input methods?

A user may have multiple input methods available. For example, the user may have input methods for different languages or input methods that accept various types of input. Such a user must be able to select the input method used for a particular language or the input method that provides the fastest input.

Can an input method be selected and activated programmatically?

An application can request an input method that supports a specific locale using the InputContext.selectInputMethod method, but it cannot select a specific input method - that selection is up to the user.

An application can activate an input method using the InputContext.setCompositionEnabled method.

Do the AWT and Swing (JFC) text components work with input methods?

See the Input Methods section of the Java 2 SDK Internationalization Overview.

Text Rendering

What choices does an application have in selecting fonts?

An application using lightweight components can select fonts in four different ways:

Using logical font names: The Java 2 platform defines five logical font names that every implementation must support: Serif, SansSerif, Monospaced, Dialog, and DialogInput. These logical font names are mapped to physical fonts in implementation dependent ways. Typically one logical font name maps to several physical fonts in order to cover a large range of characters.
Using physical font names: The Java 2 platform provides APIs that let an application determine which fonts are available to a given runtime and which characters these fonts can handle, and request these fonts using their real name (for example, "Times Roman" or "Helvetica"). The application can either let the user choose fonts or programmatically determine the fonts to be used.
Using the Lucida fonts: Sun's Java 2 Runtime Environments contain this family of physical fonts, which is also licensed for use in other implementations of the Java 2 platform. These fonts are physical fonts, but don't depend on the host operating system.
Using bundled physical fonts: An application can bundle TrueType fonts and instantiate them using the Font.createFont method.

An application using peered AWT components can only use logical font names.

What are the advantages and disadvantages of these four approaches?

Here's a brief summary:

Using logical font names:
- Advantages: These font names are guaranteed to work anywhere, and they enable text rendering in at least the language that the host operating system is localized for (often a much larger range of languages).
- Disadvantages: The physical fonts used for rendering the text vary between different implementations, host operating systems, and locales, so an application cannot achieve the same look everywhere. Also, the mapping mechanisms often limit the range of characters that can be rendered. For example, in Sun's Java 2 Runtime Environments Japanese text can only be rendered on Japanese localized host operating systems, not on other localized systems even if Japanese fonts have been installed.
Using physical font names:
- Advantages: This approach lets an application take full advantage of all available fonts, to accomplish both different text appearances and maximum language coverage.
- Disadvantages: This approach is substantially harder to program.
Using the Lucida fonts:
- Advantages: Applications using these fonts can achieve the same look wherever these fonts are available. Also, these fonts cover a large range of languages (especially European and Middle Eastern), so you can create fully multilingual applications for the supported languages.
- Disadvantages: These fonts may not be available in all Java 2 runtime environments. Also, they currently do not cover the complete Unicode character set; in particular, Chinese, Japanese, and Korean are not supported.
Using bundled physical fonts:
- Advantages: Applications using these fonts can achieve the same look everywhere, and have full control over which languages they support.
- Disadvantages: The bundled fonts may be quite big, in particular if they support Chinese, Japanese, and Korean. Licensing issues need to be resolved.

Why doesn't my application display any Chinese, Japanese, or Korean characters even though I have fonts for these languages installed?

The answer depends on how your application selects fonts - see above.

Using logical font names: To use a physical font, it must be selected by the mapping mechanism. In Sun's Java 2 Runtime Environments, fonts for Chinese, Japanese, or Korean are only selected when running on host operating systems localized for these specific languages. To change the mapping, you need to modify a font.properties file - see below.
Using physical font names: Your application may not be selecting the fonts correctly, or the font may be using an encoding that's not supported by the Java 2 Runtime Environment.
Using the Lucida fonts: The Lucida fonts included in Sun's Java 2 Runtime Environments do not support Chinese, Japanese, or Korean.
Using bundled physical fonts: The fonts bundled with your application may not support these languages.

What is a font.properties file?

The font.properties files are used in Sun's Java 2 Runtime Environments to map logical font names to physical fonts. There are several files to support different mappings depending on host operating system version and locale. The files are located in the lib directory within the J2RE installation.

Note that font.properties files are implementation dependent. Not all implementations of the Java 2 platform use them, and the format and content vary between different runtime environments as well as between releases.

How do I add a physical font to the mapping of a logical font?

Since the mapping from logical fonts to physical fonts is implementation dependent, the answer varies. For Sun's Java 2 Runtime Environments, you need to create or modify a font.properties file - see the web page The font.properties Files. Note however that this is a modification of the J2RE, and Sun does not support modified J2REs. For other implementations, see their respective documentation.

Why can I see some characters in Swing components, but not in peered AWT components?

Swing user interface components use a different mechanism to render text than peered AWT components. The Swing components use the Graphics.drawString method, typically specifying a logical font name. The logical font name is then mapped to a set of physical fonts to cover a large range of characters. AWT components on the other hand are implemented using host operating system components. These host operating system components often do not support Unicode, so the text gets converted to some other character encoding, depending on the host operating system and locale. These encodings often cover a smaller range of characters than the physical fonts used to implement logical font names. For example, on a Japanese Windows system, many European accented characters are mapped to the Arial font for Swing components, but get lost when converting the text to the Shift-JIS encoding for peered AWT components.

Why can't my application display all Unicode characters even though I have a Unicode font installed?

As in the Chinese/Japanese/Korean case above, this may be because text is not rendered using the Unicode font at all or only for some characters. If your application selects the Unicode font using its physical font name, and it still cannot render all characters, it could be that the Unicode font doesn't in fact cover the entire Unicode character set - sometimes a font is called a Unicode font if it just provides the tables that support the Unicode character encoding.

What font types do Sun's Java 2 Runtime Environments support?

See the Supported Fonts document.

Is it possible to display more than one language in Sun's Java 2 Runtime Environments?

The short answer is yes. The long answer depends on which languages you want to display at the same time and on how your application selects fonts.

It is quite common for a group of languages to share a small common character set - for example, the Western European languages can be written in the ISO 8859-1 character set. If you only need to display languages within such a group, you usually don't need to do anything - it will just work.
If the languages you need to display are all supported by the Lucida font family, and your application only needs to run on Java 2 runtime environments that contain this font family, you can simply use fonts from that family.
If you need to support languages using separate character ranges, and your application selects fonts using logical font names, you need to create a font.properties file that supports all the languages. See the web page, The font.properties Files, for details.
If you need to support languages using separate character ranges, and your application selects fonts using physical names, you need to select the fonts using information about the range of characters that they support.

Can Sun's Java 2 Runtime Environment render text in Thai, Lao, Burmese, or any of the Indic scripts?

Among the South and South-East Asian scripts, version 1.4 of Sun's Java 2 Runtime Environments supports Thai and Devanagari. For a complete list of all supported writing systems, see the Supported Locales document. Support for other writing systems may be added in future releases.

Component Orientation

Which user interface components implement component orientation in Sun's Java 2 Runtime Environments?

See the Supported Locales document.

Miscellaneous

Do Sun's Java 2 Runtime Environments support the Euro currency?

Yes, Sun's Java 2 Runtime Environments let you type the Euro character, render it, convert it to and from numerous character encodings, and use it when formatting numeric values as currency. For text input and rendering, you need the appropriate support in the host operating system - see the documentation for Windows and Solaris (general information and patches). For formatting with the Euro currency symbol before 1/1/2002, you can request a locale with the "EURO" variant or specify the currency using the new class java.util.Currency. Starting from 1/1/2002, version 1.4 of Sun's Java 2 Runtime Environments will use the Euro as the default currency for the member countries of the European Monetary Union.
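
A short sketch of the java.util.Currency approach follows. It looks up the currency associated with a locale and also sets the Euro explicitly on a currency formatter. The class name EuroDemo and the sample amount are assumptions, and the exact formatted text depends on the locale data of the runtime.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

class EuroDemo {
    // ISO 4217 code of the currency the runtime associates with the locale.
    static String currencyCodeFor(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    // Formats the amount with the locale's conventions, but in Euro.
    static String formatInEuro(double amount, Locale locale) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(Currency.getInstance("EUR"));
        return format.format(amount);
    }

    public static void main(String[] args) {
        System.out.println("Germany uses: " + currencyCodeFor(Locale.GERMANY));
        System.out.println("US style, in Euro: " + formatInEuro(1234.5, Locale.US));
    }
}
```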

Please send comments to: java-intl@java.sun.com
