Tuesday, June 20, 2017

Oracle character sets and VARCHAR size

A few weeks ago one of my work mates contacted me with a problem that seemed particularly weird for him. A database column had a limit of 200 characters but updates would fail sometimes with "value too large for column" even though the application was also limiting the input to 200 characters.

The problem is actually quite simple, but it can be confusing if you are not familiar with 2 concepts: database character sets and the way Oracle limits a VARCHAR column.

First, character sets. Oracle has 2 types of character sets based on the type of encoding used:
  • Single-byte character sets (e.g. US7ASCII and WE8ISO8859P1)
  • Multibyte character sets (e.g. AL32UTF8)
A single-byte character set, as the name implies, uses only one byte to represent each of the supported characters. A multibyte character set, on the other hand, can use more than one byte to represent one character.

Now the problem is clear. The application was limiting the input to 200 characters but these characters used up more than 200 bytes in the database, causing the error. But why is Oracle limiting the column to 200 bytes instead of 200 characters as expected?

As a matter of fact Oracle can limit the size of your VARCHAR columns both ways. If you want to limit the size in bytes:

    mycolumn VARCHAR2(10 BYTE)

If you want to limit the size in characters:

    mycolumn VARCHAR2(10 CHAR)

The catch here is that the default behavior is limit in bytes, contrary to what one would expect. So, if you don't specify which way to limit your VARCHAR2 columns Oracle will limit them in bytes. Beware of that!

    mycolumn VARCHAR2(10) is equivalent to mycolumn VARCHAR2(10 BYTE)

Obs: this default behavior can be changed by setting the NLS_LENGTH_SEMANTICS parameter.

I find this behavior a bit weird and completely understands why my work mate was a bit confused. 

If you want more information on this, here's a link to a good Oracle document: