Lesson 5	National Character Sets
Objective	Choose a National Character Set for a Database

National Character Sets in Oracle 19c

Choosing a national character set (NCHAR character set) for an Oracle 19c database involves a few important considerations. The national character set applies to columns defined with the `NCHAR`, `NVARCHAR2`, and `NCLOB` data types, which store Unicode text independently of the database character set.
Steps to Choose a National Character Set:

Understand the Purpose of NCHAR Data Types: The `NCHAR`, `NVARCHAR2`, and `NCLOB` data types are used to store Unicode data. This is particularly useful for applications that need to support multiple languages or special characters not covered by the database's primary character set.
Identify the Requirements:
- Determine whether your application requires multilingual support or must store characters from different languages that are not supported by the database's main character set.
- Assess the storage requirements, as the choice of the national character set can impact the storage size of NCHAR data types.
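As an illustration of where the national character set comes into play, the following sketch creates a table whose multilingual columns use the NCHAR family. The table and column names (`product_names`, `sku`, `name_local`, `description_local`) are hypothetical, and `UNISTR` is used so that the sample text does not depend on the client's character set.

```sql
-- Hypothetical table: the ASCII-only code lives in a VARCHAR2 column
-- (database character set); translated text lives in NCHAR-family
-- columns (national character set).
CREATE TABLE product_names (
  product_id        NUMBER PRIMARY KEY,
  sku               VARCHAR2(20),
  name_local        NVARCHAR2(100),
  description_local NCLOB
);

-- UNISTR builds an NVARCHAR2 value from Unicode escapes
-- (U+65E5 U+672C U+8A9E), so no non-ASCII text passes through the client.
INSERT INTO product_names (product_id, sku, name_local)
VALUES (1, 'SKU-001', UNISTR('\65E5\672C\8A9E'));

COMMIT;
```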
Available National Character Sets in Oracle 19c: Oracle 19c supports the following national character sets:
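- AL16UTF16: This is the UTF-16 encoding of Unicode. Each character in the Basic Multilingual Plane uses 2 bytes; supplementary characters use 4 bytes (a surrogate pair).
- UTF8: This is an older, Oracle-specific variable-width Unicode encoding that uses 1 to 3 bytes per Basic Multilingual Plane character and 6 bytes per supplementary character. It is not the same as AL32UTF8, which is Oracle's standard UTF-8 implementation: AL32UTF8 is the recommended database character set, but it cannot be used as the national character set.
Note: AL16UTF16 has been the default national character set since Oracle9i and remains the default in 19c.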
Selecting the National Character Set: You choose the national character set during the database creation process. Here’s how you can do it:
- Using DBCA (Database Configuration Assistant):
1. When creating a new database, the DBCA will prompt you to choose the character set and national character set.
2. On the "Character Set" page, you will see options to set the national character set. The default is typically `AL16UTF16`.
3. If you need to change it, you can select `UTF8`, which is the only other supported national character set.
- Manual Database Creation: If you are manually creating the database using SQL scripts, you can specify the national character set in the `CREATE DATABASE` statement:
```
CREATE DATABASE your_database_name
...
NATIONAL CHARACTER SET AL16UTF16;
```
Replace `AL16UTF16` with `UTF8` if you prefer to use that character set instead.
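For context, a fuller statement might look like the sketch below, which sets the database character set and the national character set side by side. It is only an outline under several assumptions: the instance has been started in NOMOUNT mode, Oracle Managed Files is enabled (for example through `DB_CREATE_FILE_DEST`) so that no file names are needed, and the database name, passwords, and tablespace names are placeholders to be replaced.

```sql
-- Sketch only: demo_db, the passwords, and the tablespace names are
-- placeholders. Assumes Oracle Managed Files supplies all file names.
CREATE DATABASE demo_db
  USER SYS IDENTIFIED BY sys_password
  USER SYSTEM IDENTIFIED BY system_password
  CHARACTER SET AL32UTF8            -- database character set
  NATIONAL CHARACTER SET AL16UTF16  -- national character set (the default)
  EXTENT MANAGEMENT LOCAL
  DEFAULT TEMPORARY TABLESPACE temp
  UNDO TABLESPACE undotbs1
  DEFAULT TABLESPACE users;
```

Because both clauses are optional, omitting `NATIONAL CHARACTER SET` leaves the national character set at its default of `AL16UTF16`.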
Considerations for Choosing AL16UTF16 vs. UTF8:
- AL16UTF16: The default, and generally the better choice if your NCHAR columns hold a large amount of multilingual text. Every Basic Multilingual Plane character occupies exactly 2 bytes, so storage is predictable and most Asian characters take 2 bytes rather than the 3 bytes they need in UTF8. The trade-off is that ASCII characters also take 2 bytes each.
- UTF8: Better suited if the data in your NCHAR columns is mostly ASCII, because each ASCII character takes only 1 byte. Other characters take 2 or 3 bytes, and supplementary characters take 6 bytes.
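The storage trade-off can be checked directly against a database. The queries below are a small sketch that compares character counts with byte counts for national character set values; they use only `dual`, `UNISTR`, `TO_NCHAR`, `LENGTH`, and `LENGTHB`, and the sample strings are arbitrary.

```sql
-- Three CJK characters from the Basic Multilingual Plane.
SELECT LENGTH(UNISTR('\65E5\672C\8A9E'))  AS char_count,
       LENGTHB(UNISTR('\65E5\672C\8A9E')) AS byte_count
FROM   dual;

-- Three ASCII characters converted to the national character set.
SELECT LENGTH(TO_NCHAR('abc'))  AS char_count,
       LENGTHB(TO_NCHAR('abc')) AS byte_count
FROM   dual;
```

With `AL16UTF16` as the national character set, both strings should occupy 6 bytes. With `UTF8`, the CJK string should occupy 9 bytes and the ASCII string 3 bytes.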
Testing and Validation:
- After setting the national character set, test your application thoroughly to ensure that it handles all required characters correctly.
- Validate that any data migrations, imports, or exports between databases preserve the integrity of the data when using NCHAR data types.
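One simple way to approach such a check is a round-trip test: store known Unicode code points and compare what comes back in an encoding-neutral form. The sketch below assumes a scratch table named `nchar_roundtrip_test` (a hypothetical name) and uses `ASCIISTR`, which renders every non-ASCII character as a `\XXXX` escape.

```sql
-- Hypothetical scratch table for a round-trip check.
CREATE TABLE nchar_roundtrip_test (
  id    NUMBER PRIMARY KEY,
  label NVARCHAR2(50)
);

-- Two accented Latin characters followed by two CJK characters.
INSERT INTO nchar_roundtrip_test (id, label)
VALUES (1, UNISTR('\00E9\00E8\4E2D\6587'));

COMMIT;

-- If the value was stored intact, this query is expected to return no rows.
SELECT id, ASCIISTR(label) AS escaped_label
FROM   nchar_roundtrip_test
WHERE  ASCIISTR(label) <> '\00E9\00E8\4E2D\6587';

DROP TABLE nchar_roundtrip_test PURGE;
```

The same comparison can be run on the target database after an export and import to confirm that the stored characters survived the move.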

Changing the National Character Set After Database Creation:
Changing the national character set after the database has been created is not straightforward and typically involves exporting and re-importing the database or recreating the database. Therefore, it is crucial to make the right choice during the initial setup.
Summary:

Choose between `AL16UTF16` (default) and `UTF8` based on your application’s needs for multilingual support.
Use DBCA or the `CREATE DATABASE` command to set the national character set during database creation.
Ensure that your choice aligns with the character data requirements and storage considerations of your application.

By carefully selecting the appropriate national character set, you can ensure that your Oracle 19c database is well-suited to handle the specific multilingual and character encoding needs of your application.

Two types of Character Sets

An Oracle database has two character sets: the database character set and the national character set. The database character set determines which characters can be used in identifiers, PL/SQL programs, and the data stored in CHAR, VARCHAR2, CLOB, and LONG columns. The national character set is used to store and interpret the data kept in NCHAR, NVARCHAR2, and NCLOB columns. These data types are specifically designed to accept national language characters. The national character set of the database is fixed when the database is created. On the client side, the NLS_NCHAR environment variable specifies the character set that the client uses for national character data. Like the NLS_LANG parameter, NLS_NCHAR is set in the client environment; when it is not set, the character set from NLS_LANG is used. With a national character set, you can use a single Oracle database to store more than one type of character string. For instance, you may want to store both Chinese and English characters in your Oracle database. Rather than choosing a database character set that can handle Chinese characters, you can keep your existing database character set and store all of the Chinese data in NCHAR or NVARCHAR2 columns. Because both supported national character sets are Unicode encodings, the same method can support a widely mixed user community.
The following series of images illustrates the relationship between the database character set and the national character set and the CHAR/VARCHAR2 data types and the NCHAR/NVARCHAR2 data types:

1) The database character set defines the character set for the entire database.

2) Any data stored in columns with a datatype of CHAR, VARCHAR2, CLOB, or LONG can use any characters in the database character set.

3) The national character set defines the character set used for national language data.

4) Any data stored in columns with a datatype of NCHAR, NVARCHAR2, or NCLOB uses the national character set.
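
The mapping between data types and character sets can also be seen in the data dictionary. The following sketch assumes a throwaway table named `charset_demo` (a hypothetical name) and reads the `CHARACTER_SET_NAME` column of `USER_TAB_COLUMNS`, which reports `CHAR_CS` for columns in the database character set and `NCHAR_CS` for columns in the national character set.

```sql
-- Hypothetical table with one column of each character data type.
CREATE TABLE charset_demo (
  c1 CHAR(10),
  c2 VARCHAR2(10),
  c3 CLOB,
  n1 NCHAR(10),
  n2 NVARCHAR2(10),
  n3 NCLOB
);

-- Show which character set each column is bound to.
SELECT column_name, data_type, character_set_name
FROM   user_tab_columns
WHERE  table_name = 'CHARSET_DEMO'
ORDER  BY column_id;

DROP TABLE charset_demo PURGE;
```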

National Character Sets and Select View

Question: Which view should I use to determine the character set and national character set my server is using?

sys@8i> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               US7ASCII
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-YY
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-YY HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-YY HH.MI.SSXFF AM TZH:T
NLS_DUAL_CURRENCY              $
NLS_COMP
NLS_NCHAR_CHARACTERSET         US7ASCII
NLS_RDBMS_VERSION              8.1.5.0.0

18 rows selected.
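
The listing above comes from an Oracle 8i server, but the same view exists in current releases. When only the two character sets are of interest, the query can be narrowed to the relevant rows, as in this short sketch:

```sql
-- NLS_CHARACTERSET is the database character set;
-- NLS_NCHAR_CHARACTERSET is the national character set.
SELECT parameter, value
FROM   nls_database_parameters
WHERE  parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```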

Limitations
Although Oracle does support multibyte character sets, a few types of data must always be written with single-byte characters. Database names, instance names, filenames, and rollback segment names can be written only with single-byte characters, and keywords must always be in English. There is also a difference between how you specify length for the CHAR/VARCHAR2 data types and for the NCHAR/NVARCHAR2 data types. By default, the length of a CHAR or VARCHAR2 column is specified in bytes, although you can declare it in characters with the CHAR qualifier or by changing the NLS_LENGTH_SEMANTICS parameter. The length of an NCHAR or NVARCHAR2 column always refers to the number of characters in the column.
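The difference in length specification can be sketched with a small table. The name `length_demo` and its columns are hypothetical; the `CHAR_USED` column of `USER_TAB_COLUMNS` shows `B` for byte semantics and `C` for character semantics, while `DATA_LENGTH` gives the column length in bytes and `CHAR_LENGTH` the length in characters.

```sql
-- Hypothetical table contrasting byte and character length semantics.
CREATE TABLE length_demo (
  v_bytes VARCHAR2(10 BYTE),  -- at most 10 bytes
  v_chars VARCHAR2(10 CHAR),  -- at most 10 characters
  n_chars NVARCHAR2(10)       -- always 10 characters
);

-- Compare the declared semantics and the resulting byte lengths.
SELECT column_name, data_length, char_length, char_used
FROM   user_tab_columns
WHERE  table_name = 'LENGTH_DEMO'
ORDER  BY column_id;

DROP TABLE length_demo PURGE;
```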
Creating your own Character Set: Oracle gives you the ability to create customized character sets by extending an existing character set. You may need to do this to include a variety of information, such as vendor-specific codes. In older releases, the NLS Data Installation Utility was used to create these extended character sets; current releases, including Oracle 19c, provide the Oracle Locale Builder for this purpose. The next lesson will explain some of the ways that you can convert data from one language to another.