Language Support   «Prev  Next»

Lesson 4Character Sets
ObjectiveUnderstand how to define a Character Set for a Database

Define Character Sets for an Oracle Database

In Oracle 19c, the character set defines how character data is stored in the database, including encoding and supported languages. It is crucial for handling multilingual data and ensuring compatibility with applications.
  1. Checking the Current Character Set:
    Before changing or defining a character set, you can check the current database character set using:
    SELECT parameter, value 
    FROM nls_database_parameters 
    WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
    

    Example Output:
    PARAMETER             VALUE
    -------------------   -------------------
    NLS_CHARACTERSET      AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    
    • NLS_CHARACTERSET → Controls the character set for CHAR, VARCHAR2, and CLOB.
    • NLS_NCHAR_CHARACTERSET→ Controls the character set for NCHAR, NVARCHAR2, and NCLOB.
  2. Defining a Character Set During Database Creation:
    When creating a new Oracle 19c database, specify the character set in the `CREATE DATABASE` statement.
    Example: Creating a Database with UTF-8
    CREATE DATABASE mydb
    USER SYS IDENTIFIED BY mypassword
    USER SYSTEM IDENTIFIED BY mypassword
    LOGFILE GROUP 1 ('/oracle/oradata/mydb/redo01.log') SIZE 50M,
            GROUP 2 ('/oracle/oradata/mydb/redo02.log') SIZE 50M,
            GROUP 3 ('/oracle/oradata/mydb/redo03.log') SIZE 50M
    MAXLOGFILES 5
    DATAFILE '/oracle/oradata/mydb/system01.dbf' SIZE 700M AUTOEXTEND ON
    CHARACTER SET AL32UTF8
    NATIONAL CHARACTER SET AL16UTF16;
    
    • CHARACTER SET AL32UTF8: Specifies Unicode UTF-8 (recommended for international applications).
    • NATIONAL CHARACTER SET AL16UTF16: Defines NCHAR, NVARCHAR2, and NCLOB storage.

  3. Changing the Character Set of an Existing Database:
    Changing the character set of an existing Oracle database is not recommended unless absolutely necessary because it can lead to data corruption. However, if required, follow these steps:
    1. Step 1: Shut Down the Database
            SHUTDOWN IMMEDIATE;
          
    2. Step 2: Start in Restrict Mode
            STARTUP RESTRICT;
          
    3. Step 3: Change the Character Set
      If changing to a superset (e.g., WE8ISO8859P1AL32UTF8), you can use:
            ALTER DATABASE CHARACTER SET AL32UTF8;
          
      If changing to a non-superset, use CSALTER Script:
            ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
          
      Warning: This method is riskier and should only be used if data integrity is ensured.
    4. Step 4: Restart the Database
            SHUTDOWN IMMEDIATE;
            STARTUP;
          
  4. Alternative: Using `DBCA` (Database Configuration Assistant):
    If you are setting up Oracle 19c via DBCA, you can define the character set in the Database Creation Wizard under the Character Set section.
  5. Best Practices for Choosing a Character Set
    1. Use Unicode (AL32UTF8) for applications that support multiple languages.
    2. Ensure compatibility with existing applications before modifying the character set.
    3. Backup the database before making any changes to the character set.
    4. Check for data conversion issues using:
        SELECT column_name, char_length, char_used
        FROM user_tab_columns
        WHERE data_type IN ('CHAR', 'VARCHAR2', 'CLOB');
      
    5. Validate the character set change using:
      SELECT * FROM v$nls_parameters 
      WHERE parameter LIKE '%CHARACTERSET%';
      
The NLS_LANG parameter is used to establish the language for the language-independent support delivered by Oracle.
The character set is used to establish the language for the language-dependent storage of data.
  • Character Sets:
    A character set is a group of characters that defines how to represent valid values for a particular language in the database. A character set that supports English allows upper- and lowercase representations of each letter of the alphabet, the 10 digits that make up numbers, and all valid punctuation characters. The character set for other languages, such as Chinese, could include many, many more valid characters.
    Oracle can support character sets that can be represented in a single byte of data, such as English, or that require multiple bytes, such as many Asian languages. One-hundred and eighty different character sets come with your Oracle database. Oracle uses an industry standard called Unicode for its character sets, which can store a wide variety of single and multiple-byte languages.

Defining a Character Set

You define a character set for your database when you install the database. You cannot change the character set used by the database once you create the database.
  • Character Set Encoding
    When computer systems process characters, they use numeric codes instead of the graphical representation of the character.
    For example, when the database stores the letter A, it actually stores a numeric code that is interpreted by software as the letter. These numeric codes are especially important in a global environment because of the potential need to convert data between different character sets.

What is an Encoded Character Set?

You specify an encoded character set when you create a database. Choosing a character set determines what languages can be represented in the database. It also affects:
  1. How you create the database schema
  2. How you develop applications that process character data
  3. How the database works with the operating system
  4. Performance

A group of characters (for example, alphabetic characters, ideographs, symbols, punctuation marks, and control characters) can be encoded as a character set. An encoded character set assigns unique numeric codes to each character in the character repertoire. The numeric codes are called code points or encoded values. Table Character Set shows examples of characters that have been assigned a numeric code value in the ASCII character set.
Character Set Encoded Characters in the ASCII Character Set
Character Description Code Value
! Exclamation Mark 21
# Number Sign 23
$ Dollar Sign 24
1 Number 1 31
2 Number 2 32
3 Number 3 33
A Uppercase A 41
B Uppercase B 42
C Uppercase C 43
a Lowercase a 61
b Lowercase b 62
c Lowercase c 63

In the next lesson, you will learn how to use a character set.

SEMrush Software