Primitive Data Types: Characters
Can you figure out what is written in the picture shown above? Let me help you out: it is the word "Welcome" written in different languages. Try to see how many languages you can identify. You might be wondering what "Welcome" in all these different languages has to do with characters in Java.
Unicode
Java uses Unicode to represent characters. Unicode can represent nearly all the languages of the world, such as English, Hindi, Bengali, Kannada, Arabic, Hebrew, Chinese, Japanese, Greek, Korean, and the list goes on. Serving content to users in their native language is crucial for the success of the next generation of Internet applications. You may have noticed for yourself that it is becoming easier and easier to create and consume content in your native language on the Internet. WhatsApp, Facebook, Google, and TikTok are all heavily focused on supporting the local languages of their users. All this is made possible by Unicode. It provides a unique number for every character, irrespective of the platform, device, application, or language. It has solved the problem of uniformly representing characters across platforms and supporting almost all languages within a single character set.
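To make the idea of "a unique number for every character" concrete, here is a minimal sketch that prints the Unicode number of three characters from different scripts. The class name UnicodeNumbers and the method codeOf are made up for this illustration, and the char type it uses is introduced in the next section.

```java
class UnicodeNumbers {
    // Returns the Unicode number of a single character.
    // (Hypothetical helper written only for this illustration.)
    static int codeOf(char c) {
        return c; // a char can be stored in an int without any extra work
    }

    public static void main(String[] args) {
        char latinA = 'A';           // Latin capital letter A
        char devanagariA = '\u0905'; // Devanagari letter A, used in Hindi
        char greekAlpha = '\u03B1';  // Greek small letter alpha

        System.out.println(codeOf(latinA));      // prints 65
        System.out.println(codeOf(devanagariA)); // prints 2309
        System.out.println(codeOf(greekAlpha));  // prints 945
    }
}
```

The same numbers are used on every platform and device, which is what lets text in any of these scripts travel between applications unchanged.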
char
The characters group has only one primitive data type, char, which is used to store a single character. The size of char is 16 bits and it has a range of 0 to 65,535. There are no negative chars.
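If you would like to check the size and limits of char yourself, the following sketch reads them from constants in the standard Character class. The class name CharRange and its three small methods are assumptions made for this example. Note that 16 bits can hold 65,536 distinct values, so counting from 0 the largest one is 65,535.

```java
class CharRange {
    // Smallest value a char can hold, as a number.
    static int minChar() {
        return Character.MIN_VALUE;
    }

    // Largest value a char can hold, as a number.
    static int maxChar() {
        return Character.MAX_VALUE;
    }

    // Number of bits used to store a char.
    static int bits() {
        return Character.SIZE;
    }

    public static void main(String[] args) {
        System.out.println("Bits in a char: " + bits());     // prints 16
        System.out.println("Smallest char: " + minChar());   // prints 0
        System.out.println("Largest char: " + maxChar());    // prints 65535
    }
}
```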
One thing to note about char is that it behaves just like the data types of the integer group. In fact, the Java Language Specification classifies char as an integral type. This means that you can assign a number to a variable of the char data type. You can also perform arithmetic operations on it. Sounds a bit confusing? Let's look at a BlueJ program, which should make things clear.
public class IntegralChar
{
    public void demoChar() {
        char ch1;
        ch1 = 65; //65 is the code for capital A
        System.out.println("The value of ch1 is " + ch1);
        System.out.println("Lets print A-Z");
        /*
         * We are running this loop 26 times
         * as there are 26 letters in the English alphabet, A to Z
         */
        for (int i = 0; i < 26; i++)
        {
            System.out.print(ch1);
            System.out.print(' ');
            ch1 = (char)(ch1 + 1);
        }
    }
}
Below is the output of this program. Let's go through the program and understand the reasoning behind the output.
The line char ch1; declares ch1 to be a variable of char type. In the next 2 lines,
ch1 = 65; //65 is the code for capital A
System.out.println("The value of ch1 is " + ch1);
we assign a value of 65 to ch1, and when we print the value of ch1, we see that its value is capital A. We assigned 65 to ch1 but it has a value of capital A. What is happening here?
Remember that ch1 is of the char data type. When we assign an integer value to a variable of type char, Java treats that integer as a Unicode value. It assigns the character corresponding to that Unicode value to the variable. 65 is the Unicode value of A. So when 65 is assigned to ch1, Java knows that the value of ch1 is the character whose Unicode value is 65, which is A. Hence, when you print the value of ch1, A gets printed.
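The relationship works in both directions: a number assigned to a char shows up as a character, and a char copied into an int shows up as its number. The sketch below shows both at once. The class name CharAsNumber and the method describe are hypothetical names chosen for this example.

```java
class CharAsNumber {
    // Builds a string such as "A = 65" for the given character.
    // (Hypothetical helper written only for this illustration.)
    static String describe(char c) {
        int code = c; // copying a char into an int gives its Unicode value
        return c + " = " + code;
    }

    public static void main(String[] args) {
        char ch1 = 65; // stored as the character A
        char ch2 = 97; // stored as the character a

        System.out.println(describe(ch1)); // prints A = 65
        System.out.println(describe(ch2)); // prints a = 97
    }
}
```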
In the next lines of the program,
    System.out.println("Lets print A-Z");
    /*
     * We are running this loop 26 times
     * as there are 26 letters in the English alphabet, A to Z
     */
    for (int i = 0; i < 26; i++)
    {
        System.out.print(ch1);
        System.out.print(' ');
        ch1 = (char)(ch1 + 1);
    }
we are using two facts to print A - Z with a for loop: the Unicode values of A to Z are in sequence, and we can perform arithmetic operations on char types. We have given ch1 an initial value of the Unicode value of A. In each iteration of the for loop, we print the value of ch1 and then increment it by 1. The result is that A - Z gets printed on the screen. For now, just ignore the (char) inside parentheses in the line ch1 = (char)(ch1 + 1);. This is a cast operation, which we will look at soon in the operators section.
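Because char is an integral type, the loop counter itself can also be a char. The following is a sketch of an alternative way to produce the same letters, not a replacement for the program above. The class name AlphabetLoop and the method upperCaseLetters are assumptions made for this example, and StringBuilder is used only to collect the letters so that they can be returned as one string.

```java
class AlphabetLoop {
    // Collects the letters A to Z, each followed by a space.
    // (Hypothetical helper written only for this illustration.)
    static String upperCaseLetters() {
        StringBuilder letters = new StringBuilder();
        // c starts at 'A' and moves to the next Unicode value on each pass;
        // the loop stops after 'Z' because the values of A to Z are in sequence.
        for (char c = 'A'; c <= 'Z'; c++) {
            letters.append(c).append(' ');
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseLetters());
    }
}
```

Here c++ moves c to the next character without writing (char) explicitly, which is one reason this form is common in practice.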