22 Dec, 2004

# Clock Speed: Tell Me When it Hertz

Jargon explained: clock, megahertz/gigahertz, cycle

Computer performance is a traffic problem, moving data and instructions from memory and around inside the chip. Most people think of "traffic" in terms of cars and highways. However, there is a more relevant traffic analogy that everyone experienced before they learned to drive.

Students have been sitting in class for a long time. Finally the bell rings throughout the school signaling the end of the current period. Everyone gets up and moves through the hall to their next classroom. After a few minutes the bell rings again to signal the start of the next period. The bell has to ring everywhere in the school at the same time to coordinate movement. Without the bell, some classes would be released early and others would be released late.

The various parts of a computer hold instructions and data. Periodically they send this data along wires to the next processing station. To coordinate this activity, the computer provides a clock pulse. The clock is a regular pattern of alternating high and low voltages on a wire. To compare this with a clock in the hall, let's say the high voltage signal is a "tick" and the low voltage signal is a "tock". The clock speed is measured in millions of cycles per second (megahertz, MHz) or billions of cycles per second (gigahertz, GHz). A 100 MHz PC mainboard has a clock which "ticks" and "tocks" 100 million times each second. Each tick-tock sequence is called a cycle. The clock pulse tells some circuits when to start sending data on the wires, while it tells other circuits when the data from the previous pulse should have already arrived.

A small point of notation: The standard clock speeds are some multiple of 33.3333... MHz. Three times this speed is 100 MHz. By convention, the speeds are rounded down to 33 and 66 MHz, but the fraction explains why three times a 33 MHz clock is 100 and not 99.
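The rounding convention can be checked with a short Python sketch. The names `BASE_MHZ`, `exact_clock_mhz`, and `label_mhz` are made up for this illustration; the only thing taken from the text is the 33.3333... MHz base, held here as the exact fraction 100/3.

```python
from fractions import Fraction

# Base clock of 33.3333... MHz, kept exact as the fraction 100/3.
BASE_MHZ = Fraction(100, 3)

def exact_clock_mhz(multiple: int) -> Fraction:
    """Exact clock rate for a whole multiple of the base clock."""
    return BASE_MHZ * multiple

def label_mhz(multiple: int) -> int:
    """Quoted speed: the exact rate with the fraction dropped (rounded down)."""
    return int(exact_clock_mhz(multiple))

for multiple in range(1, 7):
    exact = float(exact_clock_mhz(multiple))
    print(f"{multiple} x base = {exact:.2f} MHz, quoted as {label_mhz(multiple)} MHz")
```

Because the fraction is carried through the multiplication and only dropped at the end, three times the base comes out as exactly 100 rather than 99.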

There are five ways to increase the processing power of a CPU or the teaching power of a High School.

• Raise the clock speed - In the analogy, this corresponds to reducing the time available for each class period. If the teacher can talk faster, and if the students behave and listen more closely, this can work up to a point. Each student gets done with the school day earlier.
• Build a Pipeline - A more complicated solution shortens the class period, but then breaks each subject into a sequence of steps. If it takes 45 minutes to cover Algebra, and that time cannot be reduced, then the subject could be covered in three consecutive 15 minute periods. A simpler subject might be covered in just one period. After all, there is no reason other than the convenience of scheduling why every class for every subject lasts the same period of time. Students get done quicker, but only if some of the subjects are lightweight.
• Parallelism - Add more classrooms and more students. No one student learns anything faster, but at the end of the day the school has taught more people in the same amount of time. Of course, this only works if you have more students in the school district to teach.
• Class Size - Double the number of students in each classroom. High Schools don't like to do this. Computers, however, can easily switch from 32 to 64 bit operations. This will not affect most programs, but the particular applications that need processing power (games, multimedia) can be distributed in a 64 bit form to get more work done per operation.
• Build a Second School - Sometime in '05 or '06 both Intel and AMD will begin to ship "multi-core" processor chips. This creates a system with two separate CPUs. An individual program won't run any faster, and if these chips have a slower clock it may even run more slowly. However, two programs will be able to run at once, and programs that require the most performance (games, multimedia) can be written to use both CPUs at once.

The easiest solution, and the one that benefits everyone without requiring any changes to software, is to speed up the clock. Beyond a point, that also required a longer pipeline. Then sometime in 2004 both CPU vendors ran into a ceiling. Intel had difficulty pushing its clock much beyond 3 GHz, and AMD had trouble pushing past 2 GHz. Because AMD had more parallelism, the AMD chip was just as powerful as the Intel chip despite the lower clock speed.

So both vendors reconsidered their strategy and are now pursuing the other options. AMD was first to offer a 64 bit processor, but because Microsoft was not ready to ship a corresponding 64 bit version of its operating system the AMD advantage has been limited. Intel developed a range of tricks to get more work done at lower clock speeds and lower power in their Centrino laptop processor, and they are now migrating some of this technology to desktop systems.

## Vital Statistics

Warning: This is a confusing collection of numbers. What is important is how the different numbers relate to each other. Specific numbers are plugged in because they are occasionally mentioned in the literature.

Example: There are two versions of the Intel 2.4 GHz Pentium 4. One gets a clock speed from the mainboard of 100 MHz, but since it transfers data 4 times per clock tick its "Front Side Bus" (FSB) to memory and I/O is said to be four times the clock or 400 MHz. Internally the CPU has a "multiplier" of 24, meaning each cycle of the external clock is divided into 24 internal cycles to produce the 2.4 GHz value. A slightly more modern version of P4 gets a 133 MHz clock, has a 533 MHz Front Side Bus and has a multiplier of 18. The equivalent AMD Athlon XP 2400+ gets a clock of 133 MHz, has a Front Side Bus twice that at 266 MHz, and an internal multiplier of 15. That gives it an internal speed of 2.0 GHz, but since it executes more instructions per internal clock tick it is rated to be equivalent to Intel's 2.4 GHz.

| Chip Type | Actual Clock | Bits/Clock | FSB | Multiplier | Speed |
|---|---|---|---|---|---|
| Pentium 4 2.4 | 100 MHz | 4 | 400 MHz | 24 | 2.4 GHz |
| Pentium 4 2.4A | 133 MHz | 4 | 533 MHz | 18 | 2.4 GHz |
| Athlon XP 2400 | 133 MHz | 2 | 266 MHz | 15 | 2.0 GHz |
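The relationships among these numbers are plain multiplication, which the following Python sketch spells out. The `Chip` class and its field names are hypothetical, invented for this example. The "133 MHz" clock is taken as 133.33... MHz (400/3), and the FSB figure is rounded down in the same way the quoted speeds are.

```python
from dataclasses import dataclass

@dataclass
class Chip:
    name: str
    clock_mhz: float          # clock supplied by the mainboard
    transfers_per_clock: int  # data transfers per clock tick
    multiplier: int           # internal clock multiplier

    def fsb_mhz(self) -> float:
        """Effective Front Side Bus rate: clock times transfers per tick."""
        return self.clock_mhz * self.transfers_per_clock

    def core_ghz(self) -> float:
        """Internal CPU speed: mainboard clock times the multiplier."""
        return self.clock_mhz * self.multiplier / 1000

chips = [
    Chip("Pentium 4 2.4", 100.0, 4, 24),
    Chip("Pentium 4 2.4A", 400 / 3, 4, 18),
    Chip("Athlon XP 2400", 400 / 3, 2, 15),
]

for chip in chips:
    # int() drops the fraction, matching the rounded-down quoted figures.
    print(f"{chip.name}: FSB {int(chip.fsb_mhz())} MHz, core {chip.core_ghz():.1f} GHz")
```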

The earliest PC had one clock, and its signal applied to the CPU, memory, and all the I/O devices. A modern PC has many different clock signals for different areas of the machine. Clocks are generated by the mainboard. Their speed is often set in the BIOS setup panels that appear when the user presses DEL or another key during the power up boot.

CPU socket clock
The mainboard generates a clock signal that paces the transfer of data to and from the CPU. Data from the CPU may be going to memory, to the AGP video card, or to an I/O device. The mainboard may sense the CPU chip and set the clock based on the manufacturer's recommendation, or it may provide a BIOS setup panel that lets the user adjust the clock value. The standard values tend to be 100, 133, 166, or 200 MHz.
Front Side Bus (FSB)
The CPU transfers data to the "Northbridge" chip on the mainboard. From there it can go to memory, the video card, or the I/O bus. An Intel CPU transfers data 4 times for every cycle of the CPU socket clock. So while the actual clock speed may be 200 MHz, an Intel CPU chip is typically described as having an 800 MHz Front Side Bus. AMD is more complicated. The old 32 bit Athlon processors transferred data only twice per clock cycle. With a CPU clock of 166 MHz, the FSB is 333 MHz. However, the new Athlon 64 CPU chip has its own integrated memory controller and a high speed HyperTransport integrated I/O bus. FSB numbers would be meaningless. There is no Northbridge chip between the CPU and other devices. The CPU can use its direct connection to memory while at the same time performing high speed I/O to video or other devices.
Multiplier
The CPU generates an internal clock that runs faster than the mainboard clock. If the mainboard clock is 100 MHz and the CPU "multiplier" is 24, then the internal clock cycles 24 times for every tick of the mainboard clock, producing a CPU speed of 2.4 GHz. The same 2.4 GHz can also be produced by applying a multiplier of 18 to a mainboard clock running at 133 MHz. The multiplier is manufactured into the CPU chip and cannot be changed.
Memory
Modern mainboards generate a separate clock to the memory. As it happens, the current memory clock rates are also 100, 133, 166, and 200 MHz. Some motherboards generate this clock as a completely independent number, while others express it as a ratio to the CPU bus clock. DDR (double data rate) memory transfers data twice per cycle (on the tick and again on the tock) and is therefore often quoted as having a speed that is twice the actual clock speed (200, 266, 333, or 400 MHz).
PCI Bus
The PCI standard calls for a 33 MHz clock speed. Some systems generate this independently, but most systems simply divide the 100 MHz CPU bus clock by three or the 133 MHz clock by 4. This is fine as long as you stick to the standard values. If you use the BIOS to nudge the CPU up slightly to a non-standard value like 110 MHz, then the PCI bus will also be running fast. At some point, one of the adapter cards will be far enough out of spec that it will become unreliable.
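To see why nudging the CPU bus also pushes the PCI bus out of spec, here is a rough Python sketch of the divider arithmetic. The function names are illustrative only, and the sketch assumes the simple case described above, where the PCI clock is derived by dividing the CPU bus clock by a fixed whole number.

```python
PCI_SPEC_MHZ = 100 / 3  # the nominal 33 MHz PCI clock (33.33... MHz)

def pci_clock_mhz(cpu_bus_mhz: float, divider: int) -> float:
    """PCI clock when it is derived by dividing the CPU bus clock."""
    return cpu_bus_mhz / divider

def percent_over_spec(pci_mhz: float) -> float:
    """How far, in percent, a PCI clock runs above the nominal value."""
    return (pci_mhz / PCI_SPEC_MHZ - 1) * 100

for bus_mhz, divider in ((100, 3), (400 / 3, 4), (110, 3)):
    pci = pci_clock_mhz(bus_mhz, divider)
    print(f"bus {bus_mhz:.1f} MHz / {divider} = PCI {pci:.2f} MHz "
          f"({percent_over_spec(pci):+.1f}% vs. spec)")
```

With the divider fixed at three, a 110 MHz bus gives the adapter cards a clock about 10% faster than they were designed for.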

The Front Side Bus connects the CPU to memory. If the FSB is running at an effective rate of 800 MHz but the fastest memory is 400 MHz, then the CPU gets no benefit from its data transfer ability. The newest high performance mainboards have two separate memory buses. DDR memory has to be installed in pairs. A memory reference is split between the two 400 MHz buses producing an 800 MHz aggregate transfer rate that matches the speed of the CPU.
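The matching of bus and memory rates can be sketched as follows. This is only the "effective MHz" bookkeeping used in the paragraph above, not a model of real memory timing, and the function names are assumptions made for the example.

```python
def fsb_effective_mhz(socket_clock_mhz: float, transfers_per_clock: int = 4) -> float:
    """Effective FSB rate; 4 transfers per clock is the Intel case in the text."""
    return socket_clock_mhz * transfers_per_clock

def memory_effective_mhz(memory_clock_mhz: float, channels: int = 1) -> float:
    """Effective DDR rate: two transfers per cycle on each memory bus."""
    return memory_clock_mhz * 2 * channels

fsb = fsb_effective_mhz(200)
for channels in (1, 2):
    memory = memory_effective_mhz(200, channels)
    verdict = "keeps up with" if memory >= fsb else "falls short of"
    print(f"{channels} memory bus(es): {memory:.0f} MHz {verdict} the {fsb:.0f} MHz FSB")
```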

## BIOS Setup

Each time the computer powers up, the mainboard senses the type of CPU that is installed. It can sense if the type of CPU has changed. Initially, the CPU clock speed will be set to whatever value is standard for this particular model of processor. Similarly, the mainboard determines the type of memory and sets speeds and timings to match the slowest type of memory installed in the system.

After the mainboard has been shipped to customers, the CPU vendor may add new processor models. Existing mainboards can be updated to correctly handle these new CPU chips by updating a set of programs called the BIOS. Unlike ordinary software, the BIOS is stored in read-only memory on the mainboard and it provides programming for the chipset instead of the CPU. However, if the new CPU chip fits in the same socket and uses the same voltage levels as the older processors, then an alternative to updating the BIOS is to manually enter all the right speeds and timings into the BIOS configuration screens displayed if you press Del or F2 just as the computer begins to power up.

More aggressive computer users may enter values that are faster than the numbers published for their CPU chip. This practice is called "overclocking." Because processors are tested beyond their rated speed, almost any CPU can be overclocked by 5% or 10%. More than that may require special cooling.

## Nanoseconds

All the ads and specifications quote clock speed in Megahertz. However, the more important number is the length of time between clock ticks (the cycle time). Such periods are usually measured in nanoseconds (billionths of a second) abbreviated "nsec."

Electricity travels through a copper wire just a bit slower than the speed of light. Normally, we can just regard the speed of light as "very fast." It becomes important when the distances are very long (astronomy) or when the times are very short (computers). A nanosecond is the amount of time that it takes light (or an electric signal) to travel about one foot.

PC clock speeds appear at first to be a strange collection of numbers. However, the corresponding cycle times display a much more regular pattern:

```
Clock     Cycle      Bus
33 MHz    30 nsec    PCI (general adapter cards)
66 MHz    15 nsec    AGP (video adapter)
100 MHz   10 nsec    mainboard clock to the CPU
2 GHz     0.5 nsec   CPU internal clock after multiplier
```
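The cycle time is simply the reciprocal of the clock rate. A minimal Python sketch, with `cycle_time_ns` as a made-up helper name and the 33 and 66 MHz clocks taken at their exact values of 100/3 and 200/3 MHz:

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds.

    One MHz is a million cycles per second, so a cycle lasts
    1 / (clock_mhz * 1e6) seconds, which is 1000 / clock_mhz nanoseconds.
    """
    return 1000.0 / clock_mhz

for label, mhz in (("33 MHz", 100 / 3), ("66 MHz", 200 / 3),
                   ("100 MHz", 100.0), ("2 GHz", 2000.0)):
    print(f"{label:>8}: {cycle_time_ns(mhz):.1f} nsec")
```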

A processor with a 2 GHz clock must perform operations in less time than it takes for light (or electricity) to travel 6 inches. The chip is very small, but it has millions of circuits. All must be manufactured to a very high level of precision.

However, it is much simpler to apply quality control to a chip the size of a fingernail than to the entire mainboard. This by itself shows the problems of a higher speed main clock, and the benefit of capping the I/O bus design at 33 MHz (30 light-feet of signal distance).
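The "one foot per nanosecond" rule of thumb is easy to check. The sketch below uses the speed of light in a vacuum, so it gives an upper bound; as noted above, a signal in copper travels somewhat slower. The function name is an assumption made for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in a vacuum
METERS_PER_FOOT = 0.3048

def light_feet_per_cycle(clock_hz: float) -> float:
    """Distance light covers during one clock cycle, in feet."""
    meters = SPEED_OF_LIGHT_M_PER_S / clock_hz
    return meters / METERS_PER_FOOT

for label, hz in (("33 MHz PCI", 100e6 / 3), ("100 MHz mainboard", 100e6),
                  ("2 GHz CPU", 2e9)):
    print(f"{label}: about {light_feet_per_cycle(hz):.2f} feet per cycle")
```

The results land just under the rounded figures in the text: a little less than 30 feet at 33 MHz and a little less than half a foot at 2 GHz.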

## Instructions per Cycle: Get in Gear

To add up a column of numbers with a pocket calculator, you simply type each number in and press the "+" key (or the "=" key at the end). Most users probably think that a PC spreadsheet program does the same thing. However, the human brain has actually been doing the hard part of the operation, moving down one row in the column, focusing on the number, and recognizing it. Each PC instruction carries with it a number of additional operations that would not be obvious to the casual user.

First, the computer must locate the next instruction in memory and move it to the CPU. This instruction is coded as a number. The computer must decode the number to determine the operation (say ADD), and the size of the data (say 16-bits). Additional information is then moved and decoded to determine the location in memory (the row and column of the spreadsheet). Finally, the number is added to the running total. Although a human might take some time to add two eight digit numbers together, the addition is the simplest part of the operation for a computer chip. Decoding the instruction and locating the data take the most time.

Each generation of Intel CPU chip has performed this operation in fewer clock cycles than the previous generation.

• A 386 CPU required a minimum of 6 clock ticks to add two numbers.
• A 486 CPU could generally add two numbers in two clock ticks.
• A Pentium CPU could add two numbers in a single clock tick.
• A modern processor can add two to six pairs of numbers in a single clock tick. If it discovers that the next instruction needs data that hasn't arrived from slow memory, it can rearrange things to execute subsequent instructions until the data arrives.
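To see what these cycle counts mean in practice, the sketch below compares the three older generations at the same clock. The 33 MHz clock is a hypothetical figure chosen only to make the comparison even, and the counts are the best-case numbers listed above, so the output is an idealized ceiling and not a benchmark.

```python
# Best-case clock ticks per addition, as listed above.
CYCLES_PER_ADD = {"386": 6, "486": 2, "Pentium": 1}

def adds_per_second(clock_mhz: float, cycles_per_add: float) -> float:
    """Idealized additions per second at a given clock and cycle count."""
    return clock_mhz * 1_000_000 / cycles_per_add

HYPOTHETICAL_CLOCK_MHZ = 33  # same clock for every chip, for comparison only

for cpu, cycles in CYCLES_PER_ADD.items():
    rate = adds_per_second(HYPOTHETICAL_CLOCK_MHZ, cycles)
    print(f"{cpu}: {rate / 1e6:.1f} million additions per second")
```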

To make a car go faster, one steps on the accelerator. Extra gas makes the engine rotate faster. When RPM gets high enough, it is better to shift to a higher gear. The PC system clock (measured in MHz) is like the engine speed (measured in RPM). The CPU model selects the gear. The original 8086 processor was like first gear, and the 486 is like fourth gear. So it is a mistake to compare clock speed across changes in the architecture.

This explains the current difference between Intel and AMD chip speeds. AMD has more internal processing units, so it executes more instructions at the same clock speed. AMD therefore quotes its processor by the equivalent Intel processor speed and not the actual clock.
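Using the Athlon XP 2400+ figures from the Vital Statistics section (a 2.0 GHz actual clock rated as equivalent to 2.4 GHz), the implied difference in work per clock tick can be estimated. This is back-of-the-envelope arithmetic based on the vendor's rating, not a measured result, and the function name is invented for the example.

```python
def relative_work_per_cycle(rated_ghz: float, actual_ghz: float) -> float:
    """Work per clock tick implied by a rating, relative to the reference chip."""
    return rated_ghz / actual_ghz

ratio = relative_work_per_cycle(rated_ghz=2.4, actual_ghz=2.0)
print(f"Implied work per tick: {ratio:.2f} times the reference chip")
```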

Copyright 1998, 2004 PCLT -- Introduction to PC Hardware -- H. Gilbert