Lesson 4 | Character Sets |
Objective | Understand how to define a Character Set for a Database |
Define Character Sets
The NLS_LANG parameter establishes the language environment for a session, which controls language-dependent behavior such as the language of messages and of day and month names.
The character set, by contrast, determines which languages' characters can be stored in the database.
Character sets
A character set is a group of characters that defines how to represent valid values for a particular language in the database.
A character set that supports English includes the upper- and lowercase form of each letter of the alphabet, the 10 digits, and the common punctuation characters. The character set for other languages, such as Chinese, can include thousands of additional characters.
Oracle supports character sets that represent each character in a single byte, such as those used for English, and character sets that require multiple bytes per character, such as those used for many Asian languages. One hundred and eighty different character sets come with your Oracle database.
Oracle also supports Unicode, an industry standard that can represent a wide variety of single-byte and multibyte languages in one character set.
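The difference between single-byte and multibyte character sets can be sketched outside the database. The following is a minimal illustration in Python, not Oracle code: the encoding names (`ascii`, `latin-1`, `utf-8`, `gb2312`) are Python codec names chosen for the example, not Oracle character set names, and `byte_length` is a hypothetical helper.

```python
# Compare how many bytes one character occupies in different encodings.
# The encoding names below are Python codec names, used only for illustration.
samples = {
    "A": ["ascii", "latin-1", "utf-8"],
    "é": ["latin-1", "utf-8"],
    "中": ["utf-8", "gb2312"],
}


def byte_length(char: str, encoding: str) -> int:
    """Return the number of bytes `char` occupies in `encoding`."""
    return len(char.encode(encoding))


for char, encodings in samples.items():
    for encoding in encodings:
        print(f"{char!r} in {encoding}: {byte_length(char, encoding)} byte(s)")
```

The English letter fits in one byte in every encoding shown, while the Chinese character needs two or three bytes depending on the encoding.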
Defining a character set
You define the character set for your database when you create the database.
Once the database has been created, you cannot simply switch it to a different character set, so choose the character set carefully.
Character Set Encoding
When computer systems process characters, they use numeric codes instead of the graphical representation of the character.
For example, when the database stores the letter A, it actually stores a numeric code that is interpreted by software as the letter.
These numeric codes are especially important in a global environment because of the potential need to convert data between different character sets.
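To make the idea of numeric codes and conversion concrete, here is a small Python sketch. It runs outside the database and only illustrates the principle; `convert` is a hypothetical helper, and the `latin-1` and `utf-8` codecs stand in for any two character sets.

```python
# The letter A is handled as a numeric code, not as a drawn glyph.
code = ord("A")
print(code, hex(code))  # decimal and hexadecimal form of the same code


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from the `source` character set into `target`."""
    return data.decode(source).encode(target)


# The same character has different byte values in different character sets.
latin1_bytes = "é".encode("latin-1")
utf8_bytes = convert(latin1_bytes, "latin-1", "utf-8")
print(latin1_bytes.hex(), "->", utf8_bytes.hex())
```

A conversion can only succeed when the target character set contains the character. In Python, for example, encoding a Chinese character as `latin-1` raises `UnicodeEncodeError`.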
What is an Encoded Character Set?
You specify an encoded character set when you create a database.
Choosing a character set determines what languages can be represented in the database. It also affects:
- How you create the database schema
- How you develop applications that process character data
- How the database works with the operating system
- Performance
A group of characters (for example, alphabetic characters, ideographs, symbols, punctuation marks, and control characters) can be encoded as a character set. An encoded character set assigns a unique numeric code to each character in the character repertoire. The numeric codes are called code points or encoded values. The following table shows examples of characters that have been assigned a numeric code value in the ASCII character set. The code values are given in hexadecimal.
Table: Encoded Characters in the ASCII Character Set
Character | Description | Code Value (hexadecimal) |
! | Exclamation Mark | 21 |
# | Number Sign | 23 |
$ | Dollar Sign | 24 |
1 | Number 1 | 31 |
2 | Number 2 | 32 |
3 | Number 3 | 33 |
A | Uppercase A | 41 |
B | Uppercase B | 42 |
C | Uppercase C | 43 |
a | Lowercase a | 61 |
b | Lowercase b | 62 |
c | Lowercase c | 63 |
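The code values in the table can be checked with a short Python sketch. This is an illustration only; `ascii_hex` is a hypothetical helper that encodes one character as ASCII and formats its single byte as two hexadecimal digits.

```python
# Characters taken from the ASCII table above.
table_chars = ["!", "#", "$", "1", "2", "3", "A", "B", "C", "a", "b", "c"]


def ascii_hex(char: str) -> str:
    """Return the two-digit hexadecimal ASCII code for one character."""
    # encode() raises UnicodeEncodeError if the character is not in ASCII.
    return format(char.encode("ascii")[0], "02X")


for ch in table_chars:
    print(f"{ch} | {ascii_hex(ch)}")
```

Each printed value matches the Code Value column, which confirms that the table lists hexadecimal codes (for example, 41 hexadecimal is 65 decimal for the letter A).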
In the next lesson, you will learn how to use a character set.