Basic Data Types
Integers
Computers can only perform operations on numbers. Modern CPUs can perform operations on integer and floating point numbers. The most basic and most commonly used are the integers. Integers are even used to represent characters based on the character set mappings such as ASCII or Unicode.
To understand the capabilities of the integers supported by the computer, it is important to note that computers do not store integers (or floating point numbers) using the decimal (base 10) digits that we are accustomed to. Instead, computers store numbers in [[binary]] (base 2). In binary there are only two digits: 0 and 1. A single binary digit, or bit, can therefore hold one of two possible values. If we combine two bits, we can represent four different values. Three bits give us eight different values, and four bits give us 16. Because the number of values doubles with each added bit, we get the following formula for the number of values that can be stored with a given number of bits.
values = $2^n$, where $n$ represents the number of bits.
The smallest group of bits that computers typically work with is a group of eight bits called a byte. A byte allows us to represent 256 different values. Groups of bytes combine to give us larger and larger ranges of values. Because we want to represent negative as well as positive numbers, we use half of the available values to represent negative numbers. The other half covers zero and the positive numbers, which is why the largest positive value is one smaller in magnitude than the smallest negative value. Integers that can store positive or negative numbers are called signed integers.
# bits | # bytes | # values | Signed Range |
---|---|---|---|
8 | 1 | 256 | -128 to 127 |
16 | 2 | 65,536 | -32,768 to 32,767 |
32 | 4 | 4,294,967,296 | -2,147,483,648 to 2,147,483,647 |
64 | 8 | 18,446,744,073,709,551,616 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
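The table rows can be reproduced from the formula above. The following is a minimal sketch, assuming the two's complement representation that Java uses for its signed integer types; the class and method names (`IntegerRanges`, `valueCount`, `signedMin`, `signedMax`) are made up for illustration.

```java
// Sketch: number of values and signed range for an n-bit integer.
// Assumes two's complement, which is what Java's integer types use.
final class IntegerRanges {
    // Number of distinct values in n bits (only valid for 1 <= bits <= 62 here,
    // because the result itself is held in a 64-bit signed long).
    static long valueCount(int bits) {
        return 1L << bits;
    }

    // Smallest value of an n-bit signed integer: -(2^(n-1)).
    static long signedMin(int bits) {
        return -(1L << (bits - 1));
    }

    // Largest value of an n-bit signed integer: 2^(n-1) - 1.
    static long signedMax(int bits) {
        return (1L << (bits - 1)) - 1;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 16, 32}) {
            System.out.println(bits + " bits: " + valueCount(bits) + " values, "
                    + signedMin(bits) + " to " + signedMax(bits));
        }
        // Java's built-in types expose the same limits as constants.
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
    }
}
```

In Java the 8, 16, 32 and 64-bit rows correspond to the `byte`, `short`, `int` and `long` types.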
For many years, computers that operated primarily on 32-bit integers were common. Modern desktop computers typically operate primarily on 64-bit integers, but 32-bit CPUs are still commonly embedded in devices other than desktop computers such as tablets, cell phones, routers, etc.
Integer [[literals]] are written in our programs much as you would expect, in decimal (base 10) notation using the digits 0-9. They can also be written in [[hexadecimal]] notation (base 16), using the 16 digits 0-9 and A-F. Hexadecimal literals must be prefaced by the symbol 0x. Hexadecimal notation is handy for values that need to map easily to their binary representation, because each hexadecimal digit represents exactly four binary digits. The following are examples of valid integer literals in both decimal and hexadecimal notation.
Decimal | Hexadecimal |
---|---|
0 | 0x0 |
8 | 0x8 |
15 | 0xf |
326 | 0x146 |
43981 | 0xabcd |
439041101 | 0x1a2b3c4d |
-326 | -0x146 |
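The equivalences in the table can be checked directly in Java. This is a small sketch rather than anything definitive; the class name `HexLiterals` and the helper `describe` are made up for illustration, and the helper is only meant for non-negative values.

```java
// Sketch: decimal and hexadecimal literals denote the same integer values.
final class HexLiterals {
    // Shows a non-negative value in decimal, hexadecimal and binary.
    static String describe(int value) {
        return value + " = 0x" + Integer.toHexString(value)
                + " = 0b" + Integer.toBinaryString(value);
    }

    public static void main(String[] args) {
        int decimal = 326;
        int hex = 0x146;                        // same value, written in base 16
        System.out.println(decimal == hex);     // true
        System.out.println(-0x146 == -326);     // true

        // Each hexadecimal digit corresponds to four binary digits.
        System.out.println(describe(0xf));      // 15 = 0xf = 0b1111
        System.out.println(describe(0xabcd));   // 43981 = 0xabcd = 0b1010101111001101
    }
}
```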
Questions
- In the table of integer specs above, the number of values is always a number that ends in the digit 6. Why is this?
- Integers with 128 bits would give us $2^{128} \approx 3.4 \times 10^{38}$ different values. Is this a useful data type or is it excessive?
Floating Point
To represent fractional values we need another data type in addition to integers. This is where floating point data types come in. Floating point data types can represent numbers with both a whole part and a fractional part. There are two commonly used floating point representations: 32-bit singles and 64-bit doubles. The bit patterns of these representations are defined in a specification called [[IEEE 754]], first published in 1985.
Name | Bits | Precision (Decimal Digits) | Exponent Min (Power of Ten) | Exponent Max (Power of Ten) |
---|---|---|---|---|
Single | 32 | ~7 | ~-38 | ~+38 |
Double | 64 | ~16 | ~-308 | ~+308 |
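The difference in precision and range can be seen in a short Java sketch, where `float` is the 32-bit single and `double` is the 64-bit double. The class name `FloatingPointLimits` and the helpers `asFloat` and `asDouble` are made up for illustration.

```java
// Sketch: precision and range of 32-bit singles versus 64-bit doubles.
final class FloatingPointLimits {
    // Converts an int to the nearest float (may lose low-order digits).
    static float asFloat(int value) {
        return (float) value;
    }

    // Converts an int to a double (always exact for 32-bit ints).
    static double asDouble(int value) {
        return (double) value;
    }

    public static void main(String[] args) {
        // Largest finite values: roughly 3.4e38 and 1.8e308.
        System.out.println(Float.MAX_VALUE);
        System.out.println(Double.MAX_VALUE);

        // 16,777,217 has eight significant digits, one more than a float
        // can reliably hold, so it is rounded to 16,777,216.
        System.out.println(asFloat(16_777_217) == 16_777_216f);   // true
        System.out.println(asDouble(16_777_217) == 16_777_217.0); // true
    }
}
```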
It would seem that [[IEEE singles]] (4 bytes) would result in faster computations than [[IEEE doubles]] (8 bytes) because they are smaller. In practice, computations with [[IEEE doubles]] are typically at least as fast, because modern processors handle 64-bit floating point natively. For this reason, floating point [[literals]] are represented as doubles by default.
Floating point [[literals]] are written using standard base 10 decimal notation. They can also be written in a form of [[scientific notation]] that uses the letter 'e' to indicate a power of ten multiplier, so that a number like $1.23 \times 10^4$ is written as 1.23e4. When written in scientific notation, we can see how the floating point specs listed above apply. The Precision in Decimal Digits refers to the maximum number of significant digits we can represent in the mantissa (the number before the 'e'). The Exponent Min and Max refer to the smallest and largest values of the power of ten (the number after the 'e').
Fixed Point | Scientific Notation |
---|---|
0 | 0e0 |
3.1415 | 3.1415e0 |
15 | 1.5e1 |
326 | 3.26e2 |
0.123 | 1.23e-1 |
-3.1415 | -3.1415e0 |
-0.001 | -1e-3 |
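The two notations in the table denote the same values, which a short Java sketch can confirm. The class name `FloatLiterals` and its helper methods are made up for illustration. The sketch also shows that a Java floating point literal is a `double` unless it carries an `f` suffix.

```java
// Sketch: fixed point and scientific notation literals are interchangeable.
final class FloatLiterals {
    // 1.23 times ten to the fourth power, written in scientific notation.
    static double scientific() {
        return 1.23e4;
    }

    // The same value written in fixed point notation.
    static double fixedPoint() {
        return 12300.0;
    }

    public static void main(String[] args) {
        System.out.println(scientific() == fixedPoint()); // true
        System.out.println(3.26e2 == 326.0);              // true
        System.out.println(-1e-3 == -0.001);              // true

        double d = 3.1415;   // literals are doubles by default
        float f = 3.1415f;   // the f suffix is needed to make a single
        System.out.println(d);
        System.out.println(f);
    }
}
```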
Questions
- What types of computations would benefit from the added precision of [[IEEE doubles]] over [[IEEE singles]]?
- What types of computations would benefit from the added exponent range of [[IEEE doubles]] over [[IEEE singles]]?
- What types of computations would be best done as integers instead of floating point?
Booleans
The simplest of all basic data types is the [[Boolean]], named after [[George Boole]]. [[Booleans]] can only be one of two possible values: false or true. They are typically stored as the numbers 0 and 1 respectively. While the [[literal]] values false and true don't seem as varied or interesting as the other data types, we will see that they are extremely useful as the results of a [[Boolean expression]].
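As a brief illustration, here is a sketch of Boolean literals and a Boolean expression in Java. The class name `BooleanDemo` and the method `isTeen` are hypothetical examples, not part of any library.

```java
// Sketch: Boolean literals and the result of a Boolean expression.
final class BooleanDemo {
    // A Boolean expression: true when age is between 13 and 19 inclusive.
    static boolean isTeen(int age) {
        return age >= 13 && age <= 19;
    }

    public static void main(String[] args) {
        boolean done = false;               // a Boolean literal
        boolean teen = isTeen(15);          // result of a Boolean expression
        System.out.println(done);           // false
        System.out.println(teen);           // true
        System.out.println(isTeen(42));     // false
    }
}
```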
Questions
- Why are Boolean values named after George Boole?
Characters and Strings
Computers store characters as integer codes based on a particular coding scheme such as [[ASCII]], [[ISO Latin-1]] or [[Unicode]]. Even so, we treat characters as a distinct type separate from integers. This is because integers are meant for arithmetic, whereas characters are meant to represent text.
Java characters are 16 bits (two bytes) wide so they can represent [[Unicode]] characters.
Character constants are represented in our code using a single character enclosed in single quotes. Alternatively, the hexadecimal code for a character can be used by prefacing the code with \u and enclosing it within single quotes. The code must be written with exactly four hexadecimal digits.
Strings are merely a sequence of zero or more characters. An empty string (or null string) has no characters. Java string [[literals]] are enclosed in double quotes.
Type | Literal |
---|---|
Character | 'A' or '\u0041' |
String | "ABCD" |
String | "\u2660\u2665\u2663\u2666" |
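The literals in the table can be tried out in a few lines of Java. This is a minimal sketch; the class name `CharLiterals` and the helper `codeOf` are made up for illustration.

```java
// Sketch: character and string literals, including Unicode escapes.
final class CharLiterals {
    // Returns the integer code that a character is stored as.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char plain = 'A';
        char escaped = '\u0041';                 // hexadecimal 41 is decimal 65
        System.out.println(plain == escaped);    // true
        System.out.println(codeOf('A'));         // 65

        String suits = "\u2660\u2665\u2663\u2666";   // the four card suit symbols
        System.out.println(suits.length());      // 4

        String empty = "";                       // a string with no characters
        System.out.println(empty.length());      // 0
    }
}
```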
A single Java char is only 16 bits wide, so it cannot hold Unicode code points above U+FFFF. Larger code points must be written as a UTF-16 surrogate pair, that is, two char values. See [[Supplementary Characters in the Java Platform]].
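A short sketch shows what this means in practice, using the code point U+1F600 (an emoji, chosen here only as an example of a code point that does not fit in 16 bits). The class name `SupplementaryChars` and the helper `fromCodePoint` are made up for illustration; `Character.toChars` and `String.codePointCount` are standard library methods.

```java
// Sketch: a code point above U+FFFF occupies two Java chars.
final class SupplementaryChars {
    // Builds a string holding the single given Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String face = fromCodePoint(0x1F600);

        // One code point, but two 16-bit chars (a surrogate pair).
        System.out.println(face.length());                           // 2
        System.out.println(face.codePointCount(0, face.length()));   // 1

        // The same string written as a literal with two escapes.
        System.out.println(face.equals("\uD83D\uDE00"));             // true
        System.out.println(Character.isHighSurrogate(face.charAt(0))); // true
    }
}
```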
Questions
- Why is ASCII support still important?
- Why is UTF-8 encoded Unicode backward compatible with ASCII?
- Why is Unicode support so important?
- Why is Unicode probably the last character coding scheme that will ever be developed?
References
- [[Unicode Charts]]
- [[Unicode Lookup]]