5.1 Data Representations
This section under construction.
All data on digital computers is represented as a sequence of 0s
and 1s. This includes numeric data, text, executable files, images,
audio, and video. The ASCII standard associates a seven bit binary
number with each of 128 distinct characters. The MP3 file format
rigidly specifies how to encode each raw audio file as
a sequence of 0s and 1s.
All data are numbers, and all numbers are data.
In this section we describe how to represent integers in binary, decimal, and hexadecimal and how to convert between different representations. We also describe how to represent negative integers.
Number systems.
There are many ways to represent integers: the number of days in the month of October can be represented as 31 in decimal, 11111 in binary, 1F in hexadecimal, or XXXI in Roman Numerals. It is important to remember than an integer is an integer, no matter whether it is represented in decimal or with Roman Numerals.- Decimal numbers. We are most familiar with performing arithmetic with the decimal (base 10) number system. This number system has been widely adopted, in large part because we have 10 fingers. However, other number systems still persist in modern society.
- Sexagecimal numbers. The Sumerians uses a sexagecimal (base 60) number system. We speculate that 60 was chosen since it is divisible by many integers: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, and 30. Most clocks are based on the sexagecimal system. The Babylonians inherited sexagecimal numbers from the Sumerians. They divided a circle into 360 degrees since they believed the Sun rotated around the Earth in about 360 days. Ptolemy tabulated trigonometric tables using base 360, and, even today, we still often use degrees instead of radians when doing geometry.
- Binary numbers. Computers are based on the binary (base 2) number system because each wire can be in one of two states (on or off).
- Hexadecimal numbers. Writing numbers in binary is tedious since this representation uses between 3 and 4 times as many digits as the decimal representation. The hexadecimal (base 16) number system is often used as a shorthand for binary. Base 16 is useful because 16 is a power of 2, and numbers have roughly has many digits as in the corresponding decimal representation.
x = xn bn + xn-1 bn-1 + ... + x1 b1 + x0 b0.
The xi terms are the positional digits, and each digit is required to be an integer between 0 and b - 1. In binary, the two digits (also referred to as bits) are 0 and 1; in decimal, the ten digits are 0 through 9; in hexadecimal, the sixteen digits are 0 through 9 and the letters A through F. Every nonnegative integer can be expressed using positional notation, and the representation is unique (up to an arbitrary number of leading 0s). As an example the number of days in a leap year is 36610 = 1011011102 = 16E16 since:
366 = 3×102 + 6×101 + 6× 100 366 = 1× 28 + 0× 27 + 1× 26 + 1×25 + 0×24 + 1×23 + 1×22 + 1×21 + 0×20 366 = 1×162 + 6×161 + E×160
The following table gives the binary, decimal, and hexadecimal representations of the first 48 integers.
| BIN | DEC | HEX | BIN | DEC | HEX | BIN | DEC | HEX | ||
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 10000 | 16 | 10 | 100000 | 32 | 20 | ||
| 1 | 1 | 1 | 10001 | 17 | 11 | 100001 | 33 | 21 | ||
| 10 | 2 | 2 | 10010 | 18 | 12 | 100010 | 34 | 22 | ||
| 11 | 3 | 3 | 10011 | 19 | 13 | 100011 | 35 | 23 | ||
| 100 | 4 | 4 | 10100 | 20 | 14 | 100100 | 36 | 24 | ||
| 101 | 5 | 5 | 10101 | 21 | 15 | 100101 | 37 | 25 | ||
| 110 | 6 | 6 | 10110 | 22 | 16 | 100110 | 38 | 26 | ||
| 111 | 7 | 7 | 10111 | 23 | 17 | 100111 | 39 | 27 | ||
| 1000 | 8 | 8 | 11000 | 24 | 18 | 101000 | 40 | 28 | ||
| 1001 | 9 | 9 | 11001 | 25 | 19 | 101001 | 41 | 29 | ||
| 1010 | 10 | A | 11010 | 26 | 1A | 101010 | 42 | 2A | ||
| 1011 | 11 | B | 11011 | 27 | 1B | 101011 | 43 | 2B | ||
| 1100 | 12 | C | 11100 | 28 | 1C | 101100 | 44 | 2C | ||
| 1101 | 13 | D | 11101 | 29 | 1D | 101101 | 45 | 2D | ||
| 1110 | 14 | E | 11110 | 30 | 1E | 101110 | 46 | 2E | ||
| 1111 | 15 | F | 11111 | 31 | 1F | 101111 | 47 | 2F |
Number conversion.
You need to know how to convert from a number represented in one system to another.- Converting from base b to decimal.
To convert from an integer represented in base b to decimal,
multiply the ith digit by the ith power of b, and sum up the results.
For example, the binary number 101101110 is 366 in decimal.
The hexadecimal number 16E is 366 in decimal.1 0 1 1 0 1 1 1 0 (binary) 256 128 64 32 16 8 4 2 1 (powers of 2) ------------------------------------------- 256 + 64 + 32 + 8 + 4 + 2 (multiplied by corresponding digit) 366
1 6 E (hex) 256 16 1 (powers of 16) ------------- 256 + 96 + 14 (multiplied by corresponding digit) 366
- Converting from decimal to base b.
It is slightly more difficult to convert an integer
represented in decimal to one in base b because
we are accustomed to performing arithmetic in base 10.
The easiest way to convert from decimal to base b
by hand is to repeatedly divide by the base b,
and read the remainder upwards. For example, the calculations
below convert from the decimal integer 366 to binary (101101110) and
to hexadecimal (16E).
- Converting between base b1 and b2. One way is to convert the base b1 integer to decimal (using the first algorithm described above) and then to convert the resulting decimal integer to base b2 (using the second algorithm described above).
- Converting between binary and hexadecimal.
We describe a fast and elegant way to convert directly from the binary to hexadecimal
representation of an integer:
First, group the digits 4 at a time starting from the right;
then convert each group to a single hexadecimal digit, padding 0s
to the very last group if necessary. For example, the hexadecimal
representation of 111010111001110001 is 3AE71.
0011 1010 1110 0111 0001 3 A E 7 1
To convert from hexadecimal to binary: convert each hexadecimal digit individually into its corresponding 4 digit binary number, removing any leading 0's.
9 F 0 3 1001 1111 0000 0011
This works because one base (16) is a power of the other (2). Likewise, it would be easy to convert between the base 125 and base 5 representations.
- Number conversion in Java.
Converting a string to another type is called parsing, and can be a
significant computational burden, since there are normally numerous cases to consider.
We have already used one such built-in static method extensively:
Integer.parseInt() converts from a
string of decimal digits to an integer.
Program BinaryConverter.java defines a
static method fromBinaryString(s) that converts a string s
of binary digits to an integer, and another
static method toBinaryString(n) that converts an
int n to a binary string. We note that Java's library calls
Integer.parseInt(s, 2) and Integer.toBinaryString(n)
accomplish the same goals as fromBinaryString(s) and toBinaryString(),
respectively.
Program HexInOut.java reads in a hexadecimal integer from standard input and prints it out in hexadecimal to standard output.
Arithmetic in other number systems.
One way to perform arithmetic is to convert all of the numbers to base 10, perform arithmetic as usual, and then convert back. In many cases, it is easier to perform the arithmetic directly in the given number system.- Addition.
In grade school you learned how to add two decimal integers:
add the two least significant digits (rightmost digits);
if the sum is more than 10, then carry a 1 and write down
the sum modulo 10. Repeat with the next digit, but this time
include the carry bit in the addition.
The same procedure generalizes to base b by replacing
the 10 with the base b. For example,
if you are working in base 16 and the two summand digits
are 7 and E, then you should carry a 1 and write down a
5 because 7 + E = 1516. Below, we compute
456710 + 36610 = 493310
in binary (left), decimal (middle) and hexadecimal (right).
1 0 0 0 1 1 1 0 1 0 1 1 1 + 0 0 0 0 1 0 1 1 0 1 1 1 0 ------------------------- 1 0 0 1 1 0 1 0 0 0 1 0 1
4 5 6 7 + 3 6 6 --------- 4 9 3 3
1 1 D 7 + 1 6 E --------- 1 3 4 5
- Multiplication.
One compelling reason to use positional number systems
is to facilitate multiplication. Multiplying two Roman
Numerals is awkward and slow.
In contrast, the grade school algorithm for multiplying two
decimal integers is straightforward and reasonably efficient.
As with addition, it easily generalizes to handle base b integers.
All intermediate single-digit multiplications and additions are
done in base b.
Below, we multiply the decimal integers 4,567 and 366 (left) and
then again in hex (right).
4 5 6 7 * 3 6 6 --------- 2 7 4 0 2 2 7 4 0 2 1 3 7 0 1 ------------- 1 6 7 1 5 2 21 1 D 7 * 1 6 E -------------- F 9 C 2 6 B 0 A 1 1 D 7 -------------- 1 9 8 1 6 2
Negative integers.
On a machine with 16-bit words, there are 216 = 65,536 possible integers that can be stored in one word of memory. By interpreting the 16 bits as a binary number, we obtain an unsigned integer in the range 0 through 65,535. Instead, we can interpret the leading bit as the sign of the number, using two's complement notation. This allows us to interpret the 16 bits as a signed integer in the range -32,768 through +32,767, as described in the table below. As with binary integers, it is often convenient to express 16-bit two's complement integers using hexadecimal notation.
BINARY HEX DECIMAL 0000 0000 0000 0000 0000 0 0000 0000 0000 0001 0001 +1 0000 0000 0000 0010 0002 +2 0000 0000 0000 0011 0003 +3 ... 0111 1111 1111 1110 7FFE +32,766 0111 1111 1111 1111 7FFF +32,767 1000 0000 0000 0000 8000 -32,768 1000 0000 0000 0001 8001 -32,767 1000 0000 0000 0010 8002 -32,766 ... 1111 1111 1111 1101 FFFD -3 1111 1111 1111 1110 FFFE -2 1111 1111 1111 1111 FFFF -1
- Negating a two's complement integer.
To negate (change from positive to negative or negative to positive)
a two's complement integer, first complement
all of the bits, then add 1. By complement, we mean
replace all of the 0's with 1's, and the 1's with 0's.
(One way to think about it is that numbers in the range -231 to 231 - 1
are stored modulo 232.)
The table below illustrates a few examples.
DECIMAL BINARY COMPLEMENT INCREMENT DECIMAL +3 0000 0000 0000 0011 1111 1111 1111 1100 1111 1111 1111 1101 -3 -3 1111 1111 1111 1101 0000 0000 0000 0010 0000 0000 0000 0011 +3 +40 0000 0000 0010 1000 1111 1111 1101 0111 1111 1111 1101 1000 -40 -40 1111 1111 1101 1000 0000 0000 0010 0111 0000 0000 0010 1000 +40 +366 0000 0001 0110 1110 1111 1110 1001 0001 1111 1110 1001 0010 -366 0 0000 0000 0000 0000 1111 1111 1111 1111 0000 0000 0000 0000 0 -32,768 1000 0000 0000 0000 0111 1111 1111 1111 1000 0000 0000 0000 -32,768 - Converting a hexadecimal two's complement integer into decimal. To convert the 16-bit two's complement integer FE92 into decimal, we start by writing down its binary representation: 1111 1110 1001 0010. We recognize it as a negative integer since the most significant bit is 1. Then, we negate it (flip bits and add 1) to obtain: 0000 0001 0110 1110. We convert 101101110 (binary) to 366 (decimal). After putting back in the negative sign, we obtain the final answer of -366 (decimal).
- Converting a negative decimal integer into its hexadecimal two's complement representation. To convert from the decimal integer -366 to its 16-bit two's complement representation, we can apply the above steps in reverse. First we convert 366 (decimal) into 101101110 (binary). Next, we negate it (flip bits and add one) to obtain 1111 1110 1001 0010. It is important that we fill in all 16 bits. If desired, we can convert this to hexadecimal: FE92.
- Adding two's complement integers.
Adding two's complement integers is straightforward:
add the numbers as if they were unsigned integers, ignoring
any overflow.
Below we compute
456710 + -36610 = 420110
in decimal (left) and again in binary using 16-bit two's complement integers
(right).
Note that the second binary integer represents a negative
integer using two's complement notation.
4 5 6 7 + -3 6 6 --------- 4 2 0 1
0 0 0 1 0 0 0 1 1 1 0 1 0 1 1 1 + 1 1 1 1 1 1 1 0 1 0 0 1 0 0 1 0 --------------------------------- 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1
In the example above, we carry a one into the most significant bit (leftmost bit), and we carry a one out, which we subsequently discard. Despite the apparent overflow, we are left with the correct result.
- Overflow.
However, there is one situation where this addition produce an
"incorrect" answer: if we do not carry a one into the most significant
bit, but do carry a one out.
Below we compute -32,76610 + (-36610) = -33,13210
using 16-bit two's complement integers. At first, we might be surprised
to see that the result of adding two negative integers is a positive
integer (to see this quickly, look at the most significant bit of
the result). This occurs because there is carry out of the most
significant digit, but no carry in to it.
In hindsight, we should not be surprised because the true answer
-33,132 cannot be represented as a 16-bit two's complement integer.
-3 2 7 6 6 + -3 6 6 ------------- -3 3 1 3 2
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 + 1 1 1 1 1 1 1 0 1 0 0 1 0 0 1 0 --------------------------------- 0 1 1 1 1 1 1 0 1 0 0 1 0 1 0 0
Bit-whacking operators in Java.
In Java, an int is a 32-bit two's complement integer. Java supports a number of operators to manipulate the bits of integer types, as summarized below. These bit-whacking operators are especially useful when performing low-level data processing such as cryptography, data compression, error correction, and transmitting email.
| NAME | OPERATOR | PURPOSE |
|---|---|---|
| bitwise NOT | ~x |
Flip all the bits of x |
| bitwise AND | x & y |
Take the AND of each pair of bits |
| bitwise OR | x | y |
Take the OR of each pair of bits |
| bitwise XOR | x ^ y |
Take the XOR of each pair of bits |
| left shift | x << y |
Move the bits of x to the left y positions |
| right shift | x >> y |
Move the bits of x to the right y positions |
| unsigned right shift | x >>> y |
Move the bits of x to the right y positions |
Program BitWhacking.java reads in
two integers a and b from the command line, applies the bit-whacking
operations, and prints the results.
- Bitwise not.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 ~ 43 = 1111 1111 1111 1111 1111 1111 1101 0100 = -44 - Bitwise and.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 17 = 0000 0000 0000 0000 0000 0000 0001 0001 43 & 17 = 0000 0000 0000 0000 0000 0000 0000 0001 = 1 - Bitwise or.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 17 = 0000 0000 0000 0000 0000 0000 0001 0001 43 | 17 = 0000 0000 0000 0000 0000 0000 0011 1011 = 59 - Bitwise xor.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 17 = 0000 0000 0000 0000 0000 0000 0001 0001 43 ^ 17 = 0000 0000 0000 0000 0000 0000 0011 1010 = 58 - Left shift.
Left shifting an integer by i bits means that each of the
bits in the integer are moved i positions to the left.
Zeros are padded to the right end.
This is equivalent to multiplying the integer by 2i.
For example 43 << 3 is 344 and -43 << 3 is -344.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 43 << 3 = 0000 0000 0000 0000 0000 0001 0101 1000 = 344 -43 = 1111 1111 1111 1111 1111 1111 1101 0101 -43 << 3 = 1111 1111 1111 1111 1111 1110 1010 1000 = -344
- Right shift.
Right shifting an integer means that you shift
all of the bits i positions to the right, padding 0s to the
beginning if the integer is positive and 1s if the integer is negative.
It is equivalent to dividing the integer
by 2i and rounding the result down (towards -infinity).
For example 43 >> 3 is 5 and -43 >> 3 is -6.
For positive integers, this is equivalent to integer division, but
for negative integers, the rounding works in the opposite direction.
43 = 0000 0000 0000 0000 0000 0000 0010 1011 43 >> 3 = 0000 0000 0000 0000 0000 0000 0000 0101 = 5 43 / 8 = 0000 0000 0000 0000 0000 0000 0000 0101 = 5 -43 = 1111 1111 1111 1111 1111 1111 1101 0101 -43 >> 3 = 1111 1111 1111 1111 1111 1111 1111 1010 = -6 -43 / 8 = 1111 1111 1111 1111 1111 1111 1111 1011 = -5
- Unsigned right shift.
An unsigned right shift is identical to a signed right shift when applied
to positive integers, but differs when applied to negative integers.
An unsigned right shift always pads 0s to the left,
regardless of whether the integer is positive or negative.
For example, 43 >>> 3 is 5 but
-43 >>> 3 is 536870906. To see why, observe that
43 = 0000 0000 0000 0000 0000 0000 0010 1011 43 >>> 3 = 0000 0000 0000 0000 0000 0000 0000 0101 = 5 -43 = 1111 1111 1111 1111 1111 1111 1101 0101 -43 >>> 3 = 0001 1111 1111 1111 1111 1111 1111 1010 = 536870906
Java library functions for bit-whacking.
The library Integer contains a number of useful functions for bit-whacking. The table below refers to the two's complement representation of the int variable x.
| FUNCTION | RETURN VALUE |
|---|---|
Integer.bitCount(x) |
Number of one-bits in x |
Integer.highestOneBit(x) |
Zero out all but the leftmost one bit of x. |
Integer.lowestOneBit(x) |
Zero out all but the rightmost one bit of x. |
Integer.numberOfLeadingZeros(x) |
Number of zero bits preceding highest one bit. |
Integer.numberOfTrailingZeros(x) |
Number of zero bits following lowest one bit. |
Integer.rotateLeft(x, i) |
Rotate of x by circularly shifting i bits to the left. |
Integer.rotateRight(x, i) |
Rotate x by circularly shifting i bits to the right. |
Integer.reverse(x) |
Reverse of the bits of x. |
Big Endian, little endian.
Computers differ in the way in which they store multi-byte chunks of information, e.g., the 16-bit short integer 0111000011110010 = 70F2. This consists of the two bytes 70 and F2, where each byte encodes 8 bits. The two are two primary formats, and they differ only in the order or "endianness" in which they store the bytes.- Big endian systems store the most significant bytes first, e.g., they store the integer above in its natural order 70F2. Java uses this format, as does Apple Mac, IBM PowerPC G5, Cray, and Sun Sparc.
- Little endian systems store the least significant bytes first, e.g., they store the integer above in reverse byte-order F270. This format is more natural when manually performing arithmetic, e.g., since for addition you work from least significant byte to most significant byte. Intel 8086, Intel Pentium, Intel Xeon use this format.
Historical note: big endian and little endian derive from Gulliver's Travels. How to crack an egg? At little end or big end?
Dangers of not recognizing overflow. In September 2004, a computer glitch grounded air traffic in southern California for several hours and left planes without radio communication. The software was running on Windows 2000 Advanced Server. The software relied on a Windows function named GetTickCount(), which returns the number of milliseconds since the system was started. Unfortunately, this value is represented using a 32-bit integer. After approximately 49.7 days, the value overflows. This glitch was known when the system was launched. To avoid the ensuing problems, the documentation recommended manually rebooting the system once a month.
Q + A
Q. How much new information is created each year?
A. Here is a relatively recent study.
Q. Why base 10?
A. Largely because we have 10 fingers. Although 10 fingers enable you to count in base 11 (from 0 to 10), the decimal system was introduced before mathematicians accepted zero as a number.
Q. Do all programming languages use 32 bit two's complement integers?
A. No. Java is unusual in that it completely specifies the representation of an int. In C, there is no requirement that an int be 32 bits or that it uses two's complement notation to represent negative integers. Different C compilers can represent integers in different ways, and this can lead to incompatibilities when trying to use the same program with different compilers.
Q. My program only needs integers between -32,768 and 32,767. Does using a 16-bit short make my program faster? Does it save space?
A. Typically a single short variable is internally stored using 32 bits (especially if the underlying hardware architecture is 32 bit), in which case it does not save any space to declare a single short. Moreover, it can take longer because the Java system must make the 32 bits behave as if it were representing a 16-bit two's complement integer. On the other hand, if you declare a huge array of type short, then the elements will be packed two to an int, and the short array will use approximately half as much memory as an int array of the same length.
Q. Why do I need to cast to add two variables of type short?
A. Java convert the results of most integer operations to be of type int. If a, b, and c are of type short, then a + b is promoted to type int, and assigning this sum to a short requires an explicit cast. One exception to this rule is if you use +=, in which case the cast is performed automatically.
Q. Can I apply bitwise & and bitwise | to boolean values? If so, is there any difference between the corresponding logical operators && and ||?
A. Yes. The difference is that the logical operators are subject to short circuiting: the expression (f(x) && g(x)) will not evaluate g(x) if f(x) is false. This also explains why there is no ^^ operator for logical XOR: it is never possible to short-circuit an XOR, so it would always be identical to bitwise ^.
Q. What happens if I shift an int more than 31 places?
A. You can't. Java only uses the five low-order bits of the second operand. This has the effect of shifting the number of values mod 32. Another consequence of this is that left shifting by a negative integer does not right shift the number. This behavior coincides with the physical hardware on many microprocessors.
Q. I need an unsigned 32-bit integer, but Java only has signed 32-bit integers. What should I do?
A. First, are you sure that you really need an unsigned type. Signed and unsigned integers behave identically on the bitwise operators (except >>), addition, subtraction, and multiplication. In many applications, these are sufficient, assuming you replace >> with >>>. Comparison operators are easy to simulate by checking the sign bit. Division and remainder are the trickiest: the easiest solution is to convert to type long.
Program UnsignedDivision.java uses this trick, and also does it directly using 32-bit operations.
long MASK = (1L << 32) - 1; // 0x00000000FFFFFFFF; int quotient = (int) ((a & MASK) / (b & MASK)); int remainder = (int) ((a & MASK) % (b & MASK));
Q. I need an unsigned 8-bit integer, but Java only has signed 8-bit integers (bytes). What should I do?
A. Same advice as previous question. One place where it's nice to have unsigned integers is for a lookup table, indexed by the byte. With signed integers the index can be negative. Also, if b is a byte, then b << 4 automatically casts b to an int. This could be undesirable since b is signed. In many applications you need to remove the signed extended bits via (b << 4) & 0xff.
Q. Why does (b << i) give weird results when b is a byte?
A. In Java, byte is an 8-bit signed integer. Before the right shift, b is converted to an integer. You may want ((b & 0xff) << i) instead.
Exercises
- Convert the decimal number 92 to binary.
Answer: 1011100.
- Convert the hexadecimal number BB23A
to octal.
Answer: first convert to binary 1011 1011 0010 0011 1010, then consider the bits three at a time 10 111 011 001 000 111 010, and convert to octal 2731072.
- Add the two hexadecimal numbers 23AC and 4B80 and give the result in hexadecimal. Hint: add directly in hex instead of converting to decimal, adding, and converting back.
- Assume m and n are positive integers. How many 1 bits are there in the binary representation of 2^(m+n)?
- What is the only decimal integer that is reverse when
written in hexadecimal?
Answer: 53.
- How many bits are in the binary representation of 2^2^2^2^17?
- IPv4 is the protocol developed in the 1970s that dictates how computers on the Internet communicate. Each computer on the Internet needs it own Internet address. IPv4 uses 32 bit addresses. How many computers can the Internet handle? Is this enough for every human being to have their own? Every cell phone and toaster?
- IPv6 is an Internet protocol in which each computer has a 128 bit address. How many computers would the Internet be able to handle if this standard is adopted? Is this enough? Answer: 2^128. That at least enough for the short term - 5000 addresses per square micrometer of the Earth's surface!
- When you buy a hard drive, 1 GB means 1,000 MB (megabytes) and 1 MB means 1,000 KB (kilobytes) and 1 KB means 1,000 bytes. But when you buy memory, 1 GB means 1,024 MB, 1 MB means 1,024 KB, and 1 KB means 1,024 bytes. What percentage difference is there in the amount of storage in a 100 MB hard drive vs. 100 MB memory? 1GB hard drive vs. 1 GB memory? (1024/1000)2 = 4.9% and (1024/1000)3 = 7.4%.
-
Why does the following code fragment fail?
Answer: Java automatically promotes the sum to be of type int. To assign the result to a short, you need to explicitly cast it back c = (short) (a + b). Yes, this is rather quirky.short a = 4; short b = 5; short c = a + b;
-
What does the following code fragment from
program Overflow.java print out?
int a = 2147483647; // 2^31 - 1 int b = a + 1; System.out.println("a = " + a); System.out.println("b = " + b); -
What does the following code fragment print out?
int a = -5 >> 3; int b = -5 >>> 3; System.out.println(a); System.out.println(b);
- List all values a of type int for which (a == (a >> 1)). Hint: there is more than one.
- Suppose a is a variable of type int. Find two values of a for which (a == -a) is true. Answer: 0 and -2147483648.
- What is the result of a = -1 * -2147483648? Answer: 0.
-
What does the following code fragment print out?
int a = 11 & 17; int b = 11 ^ 17; int c = 11 | 17; int d = ~11; System.out.println(a); System.out.println(b); System.out.println(c); System.out.println(d);
-
Given two positive integers a and b, what result does the
following Java code fragment leave in c?
c = 0; while (b > 0) { if (b & 1 == 1) c = c + a; b = b >> 1; a = a << 1; }Answer: a * b.
-
What does the following code do to the integers stored in two
different variables a and b?
a = a ^ b; b = a ^ b; a = a ^ b;
- Repeat the previous question, but assume a and b are the same variable.
-
What does the following code do to the integers stored in two
different variables a and b? Any problems with overflow?
a = a + b; b = a - b; a = a - b;
-
What do each of the following statements do?
x = - ~x; x = ~ -x;
Increment x, decrement x
- Modify Binary.java so that it converts from base 7 to decimal and vice versa.
-
What does the following do?
Answer: computes the parity of the number of 1 bits set in the binary representation of a using divide-and-conquer.public static boolean parity(int a) { a ^= a >>> 32; a ^= a >>> 16; a ^= a >>> 8; a ^= a >>> 4; a ^= a >>> 2; a ^= a >>> 1; return a & 1; } -
What is the value of cnt after the following loop?
Hint: it's not an infinite loop.int cnt = 0; for (int i = 1; i != 0; i = 2 * i) { cnt++; } -
Explain why the following Java code fragment correctly determines
whether the integer n is a power of 2.
boolean isPowerOfTwo = (n & -n) == n;
Creative Exercises
- Linear feedback shift register. Rewrite LFSR.java to simulate the linear feedback shift register from Chapter 1 using bit-whacking operations.
- Linear feedback shift register cycle length. Modify the program from the previous exercise to compute the cycle length of the LFSR using Floyd's method. What if you change the taps to xyz?
- IP addresses and IP numbers
An IP address (IPV4) is comprised of integers w, x,
y, and z and is typically written
as the string w.x.y.z. The corresponding IP number is given by
IP number = 16777216*w + 65536*x + 256*y + z. Given an IP number, the corresponding IP address is w = (ipnum / 16777216 ) % 256, x = (ipnum / 65536) % 256, y = (ipnum / 256) % 256, z = (ipnum) % 256. [Or use shifting and masking.]
Write a function that takes an IP number and returns a String corresponding to the IP address. Write a function that takes an IP address and returns a int corresponding to the IP number. For example, if the IP number is 3401190660, then the function should return "202.186.13.4".
- IP address. Write a program that takes a 32 bit string as a command line argument, and prints out the corresponding IP address in dotted decimal form. That is, take the bits 8 at a time, convert each group to decimal, and separate each group with a dot. For example, the binary IP address 01010000000100000000000000000001 should be converted to 80.16.0.1.
- Base64 encoding. Base64 encoding is a popular method for sending binary data over the Internet. It converts arbitrary data to ASCII text, which can be emailed back between systems without problems. Write a program to read in a arbitrary binary file and encode it using Base64.
- Counting in base -2.
Use the definition of the positional notation to define the base -2
number system. There are two digits 0 and 1. Count from -7 to 7
in this system.
0 = 0 -1 = 11 (-2 + 1) 1 = 1 -2 = 10 2 = 110 (4 + -2) -3 = 1101 (-8 + 4 + 1) 3 = 111 -4 = 100 4 = 100 -5 = 1111 5 = 101 -6 = 1110 6 = 11010 (16 + -8 + -2) -7 = 1001 7 = 11011
- RGBA color format.
Some of Java's classes (BufferedImage, PixelGrabber) use a special
encoding called RGBA to store the color of each pixel.
The format consists of four integers, representing the red, green, and
blue intensities from 0 (not present) to 255 (fully used),
and also the alpha transparency value from 0 (transparent) to
255 (opaque).
The four 8-bit integers are compacted into a single 32-bit integer.
Write a code fragment to extract the four components from
the RGBA integer, and to go the other way.
// extract int alpha = (rgba >> 24) & 0xff; int red = (rgba >> 16) & 0xff; int green = (rgba >> 8) & 0xff; int blue = (rgba >> 0) & 0xff; // write back rgba = (alpha << 24) | (red << 16) | (green << 8) | (blue << 0); System.out.println(rgba);
- Min and max.
One of the following computes min(a, b), the other
computes max(a, b) without branching. Which is
which? Explain how it works.
f = b + ((a - b) & -(a < b)); // min(a, b) g = a - ((a - b) & -(a < b)); // max(a, b)
- Find the missing value.
Suppose you have an array consisting of 232 - 1
integers of type int such that no two
integer appears more than once. Since there are 232
possible values, exactly one integer is missing.
Write a code fragment to find the missing integer using
as little extra storage as possible.
Hint: this is a popular interview question. It's possible to do it using only one extra int. Use either properties of integer overflow on two's complement integers or use the XOR function.
- Cyclic redundancy check. Write programs CRC16.java and CRC32.java that read in data from standard input and computes its 16 or 32-bit CRC. Write a program CRC16CCITT.java" for 16-bit CRC in CCITT format.
- Number conversion.
Write a program Converter.java that
converts between base b and decimal for any 2 &le b ≤ 36. You should have
a static method toString(int n, int b) that converts n to a base b
string and a static method fromString(String s, int b) that converts
from a base b string to an integer.
Consider defining and using
String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
- Excel column numbering. Write a function that takes a nonnegative integer and converts it into the corresponding Excel column name (0 = A, 1 = B, ..., 25 = Z, 26 = AA, ..., 702 = AAA).
- Elias Gamma coding.
Write a function elias that takes as input an integer N and returns the
Elias Gamma code as a string.
The Elias Gamma code
is a scheme to encode the positive integers. To generate the code for
an integer N, write the integer N in binary, subtract 1 from the number of
bits in the binary encoding, and prepend that many zeros. For example,
the code for the first 10 positive integers is given below.
1 1 6 00110 2 010 7 00111 3 011 8 0001000 4 00100 9 0001001 5 00101 10 0001010
- Bit reversal.
Write a function that takes an integer input, reverse its bits, and
returns that integer.
For example if n = 8, and the input is 13 (00001101), then its
reversal is 176 (10110000).
public static int bitReverse(int input) { int ans = 0; for (int i = 0; i < n; i++) { ans = (ans << 1) + (input & 1); input = input >> 1; } return ans; } - Bit-reversal sorting.
Use the previous algorithm to "sort" an array of N = 2n elements
into their bit-reversed order.
Swap elements i and j if i and j are bit reversal of each other.
Such permutations arise in the Fast Fourier Transform.
0 1 2 3 4 5 6 12 13 14 15 0000 0001 0010 0011 0100 0101 0110 ... 1100 1101 1110 1111 0000 1000 0100 1100 0010 1010 0110 ... 0011 1011 0111 1111 0 8 4 9 2 10 6 3 11 7 15
- Swap without temporary storage.
What do the following two code fragments do given integers
a and b?
Answer: each 3-line fragment swaps a and b. It works provided a and b are not the same variables (in which case both variables are zero'd out).a = a + b; b = a - b; a = a - b; a = a ^ b; b = a ^ b; a = a ^ b;
- Find the unique integer. Suppose you have an array of 2N + 1 integers, and you know that each of N integers appear exactly twice. Describe an elegant and efficient algorithm to identify the integer that appears only once. Hint: xor.
- Bit-whacking version of Gray codes Use bit-whacking operations and iteration instead of recursion to generate a gray code. Name your program BitWhackingGrayCode.java.
- Free the prisoners I. A warden meets with 17 new prisoners when they arrive. The warden tells them that they may meet today and plan a strategy, but after the meeting, each prisoner will be in solitary confinement and will not be able to communicate with one another. The prison has a switch room with 17 switches that can be on or off, although the initial configuration is not revealed. There is one special setting of the 17 switches that if it is ever achieved will enable the prisoners to go free. Each hour the warden escorts one prisoner to the switch room, where the prisoner can flip at most one switch (from on to off or off to on). The warden can choose prisoners in arbitrary order, so one prisoner may be chosen four times in a row, or not at all. Design a strategy for the 17 prisoners so that they are guaranteed to be set free after some finite amount of time.
- Free the prisoners II. Same premise as above, except that the switch room has 2 switches (initially both off), and a prisoner must flip exactly one of the two switches upon entering the switch room. At any time, a prisoner may declare "all 17 of us have visited the control room." If it is true, all prisoners are freed; otherwise they are all executed. The warden can choose prisoners in arbitrary order, so one prisoner may be chosen four times in a row, but each prisoner will be chosen infinitely often (assuming they are never freed). Design a strategy for the 17 prisoners so that they are guaranteed to be set free after some finite amount of time. Extra credit: don't assume the initial configuration is known.
- Count the number of 1 bits.
Write function that takes an integer input and returns the
number of 1's
in its binary representation.
Answer: here are an iterative and a recursive solution.
This is how Integer.bitCount() is implemented by Sun. See if you can figure out how it works.public static int bitCount(int input) { int count = 0; for (int i = 0; i < 32; i++) count = count + (input >>> i & 1); return count; } public static int bitCount(int x) { if (x == 0) return 0; return (x & 1) + bitCount(x >>> 1); }public static int bitCount(int i) { i = i - ((i >>> 1) & 0x55555555); i = (i & 0x33333333) + ((i >>> 2) & 0x33333333); i = (i + (i >>> 4)) & 0x0f0f0f0f; i = i + (i >>> 8); i = i + (i >>> 16); return i & 0x3f; } - Sparse bit-counting.
Explain why the following function (that often appears in job interviews
for programmers)
correctly counts the number of 1 bits in the binary representation of its input.
If the input has k 1's, how many times does the while loop iterate?
public static int bitCount(int input) { int count = 0; while (input != 0) { count++; input = input & (input - 1); } return count; } - Table lookup bit-counting.
Repeat the previous exercise, but pre-compute a table to speed up
the computation.
Answer: this one assumes you have a precomputed table of size 256, with bits[i] storing the number of 1 bits in the binary representation of i. You can use the bit counting function from the previous exercise to initialize it. previous
public static int bitCount(int input) { return bits[(input >> 0) & 0xff] + bits[(input >> 8) & 0xff] + bits[(input >> 16) & 0xff] + bits[(input >> 24) & 0xff]; }Increasing the table size to 216 = 65,536 will make things faster assuming you have sufficient memory. A table of size 232 is likely prohibitive.
- Dictionary attack. One method that sleazy spammers use to auto-generate email addresses is by enumerating all possible email addresses at a give domain, e.g., hotmail.com. This annoying tactic is called a dictionary or Rumpelstiltskin attack and explains why you sometimes receive spam on a new email address to which you haven't given to anybody. Use Converter.java to design such a program. Your program Rumpelstiltskin.java should take a command line parameter N and print out all 36N possible passwords of N or fewer characters involving numbers and uppercase letters.
- Breaking a gold chain. You have a gold chain with 14 links that you are going to use to pay an worker for 15 days at a fee of 1 gold link per day. It's possible to split the chain into 15 pieces by cutting 14 times. Your goal is to pay the worker while only breaking the chain 3 times. The worker must receive exactly the right fraction of total payment after each day of work. Hint: break the chain so there are pieces of 1 section, 2 sections, 4 sections, and 8 sections.
- Hamming encoder. Write a Java program HammingEncoder.java that reads in a sequence of 0s and 1s, 4 bits at a time, and encodes them using Hamming codes.
- Hamming decoder. Write a Java program HammingDecoder.java that reads in a sequence of 0s and 1s encoded using Hamming codes, 7 bits at a time, and decodes and correct them.
- Hamming codes. Modify your solutions to the previous two exercises so that the input bits are packed 8 to the byte.
- Absolute value. The constant Integer.MIN_VALUE is the most negative 32-bit two's complement integer. What is Math.abs(Integer.MIN_VALUE)?
- Prove that a k-digit decimal number can be represented in binary with no more than 4k bits.
- Sum of powers of 2. Compute the sum of powers of 2. What value do you end up with on two's complement machine?
- CD Database.
CDDB and freedb
are databases that allow you to look up
CD information on the Web and display the artist, title, and song name.
Each CD has a (nearly) unique disc ID number which is used to
query the database.
- Write a static method sumDigits() that takes an integer parameter and returns the sum of the decimal digits in the integer. For example, sumDigits(6324) returns 15 since 6 + 3 + 2 + 4 = 15.
-
Write a program CDDB.java that
computes the disc ID from a list of lengths of the track lengths.
The 32-bit (8 hex digit) ID number is computed from the length of
the tracks on the CD and the number of tracks as follows:
XXYYYYZZ XX = checksum of track offsets in seconds, taken mod 255 YYYY = length of the CD in seconds ZZ = number of tracks on the CD
- True or false. If a xor b = c, then c xor a = b and c xor b = a.
-
Explain why the following code fragment does not
leave ABCD in variable a. How would
you fix it?
Answer. In Java, byte is a signed 8-bit integer. The right-shift promotes b0 to a (negative) integer. To fix the problem, use c = ((b0 & 0xff) << 8) | (b1 & 0xff);.byte b0 = 0xAB; byte b1 = 0xCD; int c = (b0 << 8) | b1;
- Poisonous wine.
"You are the ruler of an empire and you are about to have celebration
tomorrow. The celebration is the most important party you have ever
hosted. You've got 1000 bottles of wine you were planning to open
for the celebration, but you find out that one of them is poisoned.
The actual poison exhibits no symptoms until somewhere around the
23rd hour, then results in sudden death. You have thousands of
prisoners at your disposal. What is the smallest number of
prisoners you must have to drink from the bottles to find
the poisoned bottle?"
Hint: you can represent the number 1,000 using 10 bits.
