Using Data Files in Any Language or Character Set - MapInfo_Pro - 2023

MapInfo Pro Help

Product type: Software
Portfolio: Locate
Product family: MapInfo
Product: MapInfo > MapInfo Pro
Version: 2023
Language: English
Product name: MapInfo Pro
Title: MapInfo Pro Help
First publish date: 1985
ft:lastEdition: 2023-09-12
ft:lastPublication: 2023-09-12T16:39:16.995549

You can work with characters from any language in your data files, so that multi-language tables display properly in maps, browsers, the Info tool, and other locations. MapInfo Pro can open tables, files, or workspaces with Unicode characters in the file name or path name, regardless of the system locale or which localized version of MapInfo Pro you are running. A system setting called Encode Workspaces and Tab Files, which is off by default, enables this feature.

Note: Disable Encode Workspaces and Tab Files when you share MapInfo tables with versions of MapInfo Pro older than version 15.2, share data with applications that do not support the UTF-8 character set, or use data from only one language. With the setting disabled, workspaces and tables are written with the current system character set (charset).

When enabled, this system setting writes workspaces using the UTF-8 charset. TAB files that are newly created or rewritten (for example, by Save Copy As, Pack Table, or an update to the friendly name or metadata) also use the UTF-8 encoding. The !charset value in the .tab file remains the same; it describes the data in the table, not the encoding of the .tab file itself. MapInfo Pro writes a UTF-8 Byte Order Mark (BOM) at the beginning of the file so that other applications can recognize the encoding.
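For an application outside MapInfo Pro, recognizing the encoding comes down to checking the first three bytes of the file for the UTF-8 BOM (EF BB BF). The following Python sketch shows one way to do that. The function names and the `cp1252` fallback are illustrative assumptions, not part of MapInfo Pro; substitute the system charset that applies to your data.

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def read_text_detecting_bom(path, fallback_encoding="cp1252"):
    """Decode the file as UTF-8 when a BOM is present, otherwise use the fallback.

    `fallback_encoding` is an assumption standing in for the system charset.
    The "utf-8-sig" codec strips the BOM from the decoded text.
    """
    encoding = "utf-8-sig" if starts_with_utf8_bom(path) else fallback_encoding
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()
```

A file written without the setting enabled has no BOM, so this sketch falls back to the legacy charset rather than guessing.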

When Encode Workspaces and Tab Files is enabled (turned on) and you open an Excel or Access file for import into MapInfo native TAB format, the resulting tables (TAB files) are in UTF-8 format. When you open an Excel, ASCII, CSV, or Lotus 1-2-3 file and Create Copy in MapInfo Format is checked on the Open Table dialog, the resulting table is in MapInfo Extended format with a default character set (charset) preference set to NativeX (MapInfo Extended). MapInfo Pro reads and writes .QRY files using the UTF-8 character set.

To enable or disable the Encode Workspaces and Tab Files feature:

  1. On the PRO tab, click Options, and then click System Settings in the System group to open the System Settings Preferences dialog box.
  2. Select the Encode Workspaces and Tab Files check box to enable this feature or clear the check box to disable it.
  3. Click OK.

To specify a character set, such as UTF-8 or UTF-16, to use for your MapInfo tables (*.tab) and MapInfo Interchange files (*.mif, *.mid), see Setting Your Language Preferences.

Note: You can encounter data corruption, due to truncation or conversion, when saving a copy of a database table between Unicode and non-Unicode character sets. When saving from a non-Unicode character set to UTF-8 (Unicode), there is the potential for data truncation. When saving from UTF-8 or UTF-16 (Unicode) to a non-Unicode character set, there is the potential for conversion issues.
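Conversion issues arise when the Unicode text contains characters that the target character set has no code for. As a rough pre-check before saving, the Python sketch below lists those characters. The helper name is hypothetical, and the use of Python's `cp1252` codec as a stand-in for the WindowsLatin1 charset is an assumption.

```python
def unrepresentable_chars(text, target_encoding):
    """Return the distinct characters in `text` that `target_encoding` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(target_encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "Café Москва"
# Assumption: cp1252 approximates WindowsLatin1. The Cyrillic letters are reported.
print(unrepresentable_chars(sample, "cp1252"))
# UTF-8 can encode every character, so nothing is reported.
print(unrepresentable_chars(sample, "utf-8"))
```

An empty result suggests the text can be converted to the target character set without loss; it says nothing about field width, which is a separate concern.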

When saving data to the MapInfo Extended TAB format (NativeX format), MapInfo Pro interprets the width of character fields in tables with a UTF-16 character set (charset) as a number of characters, at two bytes (16 bits) per character. For tables with any other character set (such as WindowsLatin1, Cyrillic, or UTF-8), it interprets the width as a number of bytes. In most non-UTF-8 character sets, each character takes up one byte. In UTF-8, a character takes from one to four bytes, and because UTF-8 is used to store characters from any language, text is more likely to require more than one byte per character. This means that you need to allow for larger field widths to avoid data truncation.
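To make the difference between the two width interpretations concrete, the following Python sketch measures the same strings under several encodings. It is an approximation under stated assumptions: the function name is hypothetical, Python's `cp1252` and `cp1251` codecs stand in for the WindowsLatin1 and Cyrillic charsets, and a UTF-16 width is counted in 16-bit units.

```python
def width_in_field_units(text, encoding):
    """Approximate the field width that `text` occupies.

    UTF-16: number of 16-bit units (two bytes each).
    Any other encoding: number of bytes.
    """
    if encoding.lower().replace("_", "-") in ("utf-16", "utf-16-le", "utf-16-be"):
        # Encode as little-endian so that no BOM is counted.
        return len(text.encode("utf-16-le")) // 2
    return len(text.encode(encoding))


for text, legacy in (("Zürich", "cp1252"), ("Москва", "cp1251")):
    print(
        text,
        width_in_field_units(text, legacy),    # one byte per character
        width_in_field_units(text, "utf-8"),   # more bytes for non-ASCII letters
        width_in_field_units(text, "utf-16"),  # one unit per character here
    )
```

In this sketch, "Zürich" needs 6 bytes in the legacy charset but 7 in UTF-8, and "Москва" grows from 6 bytes to 12, which is why a field sized for the legacy data can be too narrow after conversion.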

Using the UTF-16 character set is the best way to ensure that all data is preserved, but it results in larger file sizes. The UTF-8 character set can also encode all characters faithfully, but truncation can occur if field widths are too small. When you save a copy of a table from a non-UTF-8 character set to UTF-8, increase the field width to avoid truncation.
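One way to choose the increased field width is to scan the existing column values and take the longest UTF-8 byte length. The Python sketch below does that and also illustrates what byte-based truncation can do to a value. Both helper names are hypothetical, and the truncation function only models the general effect; it is not a description of how MapInfo Pro truncates data.

```python
def utf8_width_needed(values):
    """Return the smallest byte width that holds every value when encoded as UTF-8."""
    return max((len(v.encode("utf-8")) for v in values), default=0)


def truncate_utf8(text, width):
    """Cut `text` to at most `width` UTF-8 bytes without leaving a partial character.

    Illustrative model only: an incomplete trailing byte sequence is dropped.
    """
    return text.encode("utf-8")[:width].decode("utf-8", errors="ignore")


cities = ["Zürich", "Köln", "Málaga"]
print(utf8_width_needed(cities))   # width to use for the UTF-8 copy
print(truncate_utf8("Málaga", 6))  # a 6-byte field loses the final letter
```

Here each city name has at most six characters, yet the UTF-8 copy needs a width of 7 because every accented letter takes two bytes.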

See also:

Saving a Table or a Copy of a Table