Test Flashcards

(439 cards)

1
Q

Personal computer (PC)

A

A computer designed for use by an individual, usually incorporating a graphics display, a keyboard, and a mouse.

2
Q

Server

A

A computer used for running larger programs for multiple users, often simultaneously, and typically accessed only via a network.

3
Q

Supercomputer

A

A class of computers with the highest performance and cost; they are configured as servers and typically cost tens to hundreds of millions of dollars.

4
Q

Embedded computer

A

A computer inside another device used for running one predetermined application or collection of software.
Processor cores: Many embedded processors are designed using processor cores, a version of a processor written in a hardware description language, such as Verilog or VHDL.
Personal mobile devices (PMDs): Personal mobile devices (PMDs) are small wireless devices to connect to the Internet; they rely on batteries for power, and software is installed by downloading apps. Conventional examples are smart phones and tablets.
Warehouse Scale Computers (WSCs): Taking over from the conventional server is Cloud Computing, which relies upon giant datacenters that are now known as Warehouse Scale Computers (WSCs).
Cloud computing: Cloud computing refers to large collections of servers that provide services over the Internet; some providers rent dynamically varying numbers of servers as a utility.
Software as a Service (SaaS): Software as a Service (SaaS) delivers software and data as a service over the Internet, usually via a thin program such as a browser that runs on local client devices, instead of binary code that must be installed, and runs wholly on that device. Examples include web search and social networking.

5
Q

Multicore microprocessor

A

A microprocessor containing multiple processors (“cores”) in a single integrated circuit.

6
Q

Acronym

A

A word constructed by taking the initial letters of a string of words. For example: RAM is an acronym for Random Access Memory, and CPU is an acronym for Central Processing Unit.

7
Q

Terabyte (TB)

A

Originally 1,099,511,627,776 (2^40) bytes, although communications and secondary storage systems developers started using the term to mean 1,000,000,000,000 (10^12) bytes.
Tebibyte (TiB): To reduce confusion, we now use the term tebibyte (TiB) for 2^40 bytes, defining terabyte (TB) to mean 10^12 bytes.

2.2 Eight great ideas in computer architecture
Moore's Law: Moore's Law states that integrated circuit resources double every 18-24 months.
Abstractions: A major productivity technique for hardware and software is to use abstractions to characterize the design at different levels of representation; lower-level details are hidden to offer a simpler model at higher levels.
Common case: Making the common case fast will tend to enhance performance better than optimizing the rare case.
Parallel: Since the dawn of computing, computer architects have offered designs that get more performance by computing operations in parallel.
Pipelining
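The gap between the terabyte and tebibyte definitions on this card can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `binary_vs_decimal_gap` is made up for this example.

```python
# Sizes taken from the terabyte / tebibyte definitions on the card.
TEBIBYTE = 2**40   # binary definition
TERABYTE = 10**12  # decimal definition


def binary_vs_decimal_gap(binary_bytes: int, decimal_bytes: int) -> float:
    """Return how much larger the binary unit is, as a percentage of the decimal unit."""
    return (binary_bytes - decimal_bytes) / decimal_bytes * 100


gap_percent = binary_vs_decimal_gap(TEBIBYTE, TERABYTE)
print(f"1 TiB = {TEBIBYTE:,} bytes")
print(f"1 TB  = {TERABYTE:,} bytes")
print(f"A tebibyte is roughly {gap_percent:.2f}% larger than a terabyte")
```

The difference is a little under ten percent, which is why the two terms are worth keeping apart when comparing storage capacities.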

8
Q

Systems software

A

Software that provides services that are commonly useful, including operating systems, compilers, loaders, and assemblers.

9
Q

Operating system

A

Supervising program that manages the resources of a computer for the benefit of the programs that run on that computer.

10
Q

Compiler

A

A program that translates high-level language statements into assembly language statements.
Binary numbers: The two symbols for these two letters are the numbers 0 and 1, and we commonly think of the computer language as numbers in base 2, or binary numbers.

11
Q

Binary digit

A

Also called a bit. One of the two numbers in base 2 (0 or 1) that are the components of information.

13
Q

Instruction

A

.

14
Q

Assembler

A

A program that translates a symbolic version of instructions into the binary version.

15
Q

Assembly language

A

A symbolic representation of machine instructions.

16
Q

Machine language

A

A binary representation of machine instructions.

17
Q

High-level programming language

A

A portable language such as C, C++, Java, or Visual Basic that is composed of words and algebraic notation that can be translated by a compiler into assembly language.

2.4 Under the covers

18
Q

Input device

A

A mechanism through which the computer is fed information, such as a keyboard.

19
Q

Output device

A

A mechanism that conveys the result of a computation to a user, such as a display, or to another computer.

20
Q

Liquid crystal display

A

A display technology using a thin layer of liquid polymers that can be used to transmit or block light according to whether a charge is applied.

21
Q

Active matrix display

A

A liquid crystal display using a transistor to control the transmission of light at each individual pixel.
Bit map: The image is composed of a matrix of picture elements, or pixels, which can be represented as a matrix of bits, called a bit map.

22
Q

Pixel

A

The smallest individual picture element. Screens are composed of hundreds of thousands to millions of pixels, organized in a matrix.
Raster refresh buffer (frame buffer): The computer hardware support for graphics consists mainly of a raster refresh buffer, or frame buffer, to store the bit map.

23
Q

Integrated circuit

A

Also called a chip. A device combining dozens to millions of transistors.

25
Central processor unit (CPU)
Also called processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
28
Datapath
The component of the processor that performs arithmetic operations.
29
Control
The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program.
30
Memory
The storage area in which programs are kept when they are running and that contains the data needed by the running programs.
31
Dynamic random access memory (DRAM)
Memory built as an integrated circuit; it provides random access to any location. Access times are 50 nanoseconds and cost per gigabyte in 2012 was $5 to $10.
33
Cache memory
A small, fast memory that acts as a buffer for a slower, larger memory.
34
Static random access memory (SRAM)
Also memory built as an integrated circuit, but faster and less dense than DRAM.
35
Instruction set architecture
Also called architecture. An abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.
37
Application binary interface (ABI)
The user portion of the instruction set plus the operating system interfaces used by application programmers. It defines a standard for binary portability across computers.
38
Implementation
Hardware that obeys the architecture abstraction.
39
Volatile memory
Storage, such as DRAM, that retains data only if it is receiving power.
40
Nonvolatile memory
A form of memory that retains data even in the absence of a power source and that is used to store programs between runs. A DVD disk is nonvolatile.
41
Main memory
Also called primary memory. Memory used to hold programs while they are running; typically consists of DRAM in today's computers.
43
Secondary memory
Nonvolatile memory used to store programs and data between runs; typically consists of flash memory in PMDs and magnetic disks in servers.
44
Magnetic disk
Also called hard disk. A form of nonvolatile secondary memory composed of rotating platters coated with a magnetic recording material. Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds and cost per gigabyte in 2012 was $0.05 to $0.10.
45
Flash memory
A nonvolatile semiconductor memory. It is cheaper and slower than DRAM but more expensive per bit and faster than magnetic disks. Access times are about 5 to 50 microseconds and cost per gigabyte in 2012 was $0.75 to $1.00.
Networks: Networks interconnect whole computers, allowing computer users to extend the power of computing by including communication.
46
Local area network (LAN)
A network designed to carry data within a geographically confined area, typically within a single building.
49
Wide area network (WAN)
A network extended over hundreds of kilometers that can span a continent.
2.5 Technologies for building processors and memory
50
Transistor
An on/off switch controlled by an electric signal.
51
Very large-scale integrated (VLSI) circuit
A device containing hundreds of thousands to millions of transistors.
53
Silicon
A natural element that is a semiconductor.
54
Semiconductor
A substance that does not conduct electricity well.
55
Silicon crystal ingot
A rod composed of a silicon crystal that is between 8 and 12 inches in diameter and about 12 to 24 inches long.
56
Wafer
A slice from a silicon ingot no more than 0.1 inches thick, used to create chips.
57
Defect
A microscopic flaw in a wafer or in patterning steps that can result in the failure of the die containing that defect.
58
Die
The individual rectangular sections that are cut from a wafer, more informally known as chips.
60
Yield
The percentage of good dies from the total number of dies on the wafer.
2.6 Performance
61
Response time
Also called execution time. The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.
63
Throughput
Also called bandwidth. Another measure of performance, it is the number of tasks completed per unit time.
65
Time is the measure of computer performance
The computer that performs the same amount of work in the least time is the fastest.
Wall clock time (response time, elapsed time): The most straightforward definition of time is called wall clock time, response time, or elapsed time. These terms mean the total time to complete a task, including disk accesses, memory accesses, input/output (I/O) activities, operating system overhead, and everything else.
66
CPU execution time
Also called CPU time. The actual time the CPU spends computing for a specific task.
68
User CPU time
The CPU time spent in a program itself.
69
System CPU time
The CPU time spent in the operating system performing tasks on behalf of the program.
System performance and CPU performance: We will use the term system performance to refer to elapsed time on an unloaded system and CPU performance to refer to user CPU time.
Clock rate: Clock rate is the inverse of the clock period.
70
Clock cycle
Also called tick, clock tick, clock period, clock, or cycle. The time for one clock period, usually of the processor clock, which runs at a constant rate.
71
Clock period
The length of each clock cycle.
72
Clock cycles per instruction (CPI)
Average number of clock cycles per instruction for a program or program fragment.
73
Instruction count
The number of instructions executed by the program.
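The terms on cards 70 to 73 combine into the usual CPU time relation: instruction count times CPI gives total clock cycles, and multiplying by the clock period gives seconds. The cards do not state this relation, so the sketch below is an assumption based on the standard performance equation. The function name and the workload numbers are hypothetical.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """Estimate CPU execution time from instruction count, CPI, and clock rate."""
    clock_period = 1.0 / clock_rate_hz      # clock period is the inverse of the clock rate
    clock_cycles = instruction_count * cpi  # total cycles = instructions x average cycles each
    return clock_cycles * clock_period


# Hypothetical program: 2 billion instructions, average CPI of 1.5, 2 GHz clock.
elapsed = cpu_time_seconds(2_000_000_000, 1.5, 2e9)
print(f"Estimated CPU time: {elapsed:.2f} s")
```

Halving the CPI or doubling the clock rate would each halve the estimate, which is why all three factors matter when comparing processors.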
74
Instruction mix
A measure of the dynamic frequency of instructions across one or many programs.
IPC (instructions per clock cycle): Some designers invert CPI to talk about IPC, or instructions per clock cycle. If a processor executes on average two instructions per clock cycle, then it has an IPC of 2 and hence a CPI of 0.5.
2.7 The power wall
Fanout: The number of transistors connected to an output (called the fanout).
2.8 The sea change: The switch from uniprocessors to multiprocessors
Redundant Arrays of Inexpensive Disks (RAID): Many disks in conjunction can offer much higher throughput, which was the original inspiration of Redundant Arrays of Inexpensive Disks (RAID).
Graphics processing unit (GPU): The graphics processing unit (GPU) is a hardware component that accelerates graphics.
2.9 Real stuff: Benchmarking the Intel Core i7
75
Workload
A set of programs run on a computer that is either the actual collection of applications run by a user or constructed from real programs to approximate such a mix. A typical workload specifies both the programs and the relative frequencies.
76
Benchmark
A program selected for use in comparing computer performance.
SPEC (System Performance Evaluation Cooperative): SPEC is an effort funded and supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems.
SPECratio: Dividing the execution time of a reference processor by the execution time of the evaluated computer normalizes the execution time measurements; this normalization yields a measure, called the SPECratio, which has the advantage that bigger numeric results indicate faster performance.
2.10 Fallacies and pitfalls
Amdahl's Law: A rule stating that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. It is a quantitative version of the law of diminishing returns.
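Both the SPECratio and Amdahl's Law from this card reduce to short calculations. The SPECratio function follows the definition on the card directly. The Amdahl's Law function uses the commonly quoted speedup form, which the card itself does not spell out, so treat it as an assumption. All timings and fractions are made-up inputs.

```python
def spec_ratio(reference_time: float, measured_time: float) -> float:
    """Reference execution time divided by measured time; bigger means faster."""
    return reference_time / measured_time


def amdahl_speedup(fraction_enhanced: float, enhancement_factor: float) -> float:
    """Overall speedup when only part of the execution time benefits from an improvement."""
    unaffected = 1.0 - fraction_enhanced
    return 1.0 / (unaffected + fraction_enhanced / enhancement_factor)


# Hypothetical benchmark: reference machine takes 1000 s, evaluated machine takes 250 s.
ratio = spec_ratio(1000.0, 250.0)

# Hypothetical improvement: 80% of the run time is made 5x faster.
speedup = amdahl_speedup(0.8, 5.0)

# Even an enormous enhancement cannot beat 1 / (1 - 0.8) = 5x overall.
near_limit = amdahl_speedup(0.8, 1e12)

print(f"SPECratio: {ratio:.1f}")
print(f"Overall speedup: {speedup:.2f}x (limit approaches {near_limit:.2f}x)")
```

The last line shows the diminishing-returns effect: however large the enhancement factor, the unimproved 20% caps the overall gain.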
77
Million instructions per second (MIPS)
A measurement of program execution speed based on the number of millions of instructions. MIPS is computed as the instruction count divided by the product of the execution time and 10^6.
2.12 Historical perspective and reading
ENIAC (J. Presper Eckert, John Mauchly): J. Presper Eckert and John Mauchly at the Moore School of the University of Pennsylvania built what is widely accepted to be the world's first operational electronic, general-purpose computer called the ENIAC (Electronic Numerical Integrator and Calculator).
EDVAC (von Neumann): Von Neumann helped crystallize the ideas and wrote a memo proposing a stored-program computer called EDVAC (Electronic Discrete Variable Automatic Computer).
EDSAC: Wilkes decided to embark on a project to build a stored-program computer named EDSAC (Electronic Delay Storage Automatic Calculator). EDSAC started working in 1949 and was the world's first full-scale, operational, stored-program computer.
Mark-I (Manchester): A small prototype called the Mark-I, built at the University of Manchester in 1948, might be called the first operational stored-program machine.
John Atanasoff: John Atanasoff built a small-scale electronic computer in the early 1940s.
Konrad Zuse: Another pioneering computer that deserves credit was a special-purpose machine built by Konrad Zuse in Germany in the late 1930s and early 1940s.
Colossus (Alan Turing): During World War II special-purpose electronic computers were built to decrypt intercepted German messages. A team at Bletchley Park, including Alan Turing, built the Colossus in 1943.
Mark-I (Howard Aiken): Howard Aiken was building an electro-mechanical computer called the Mark-I at Harvard (a name that Manchester later adopted for its machine).
Harvard architecture: The term Harvard architecture was coined to describe machines with distinct memories.
Whirlwind project: The Whirlwind project was begun at MIT in 1947 and was aimed at applications in real-time radar signal processing. Although it led to several inventions, its most important innovation was magnetic core memory.
BINAC: Eckert and Mauchly formed Eckert-Mauchly Computer Corporation. Their first machine, the BINAC, was built for Northrop and was shown in August 1949.
UNIVAC I: Originally delivered in June 1951, UNIVAC I sold for about $1 million and was the first successful commercial computer.
IBM 701: The first IBM computer, the IBM 701, shipped in 1952, and eventually 19 units were sold.
Digital Equipment Corporation (DEC), PDP-8: Digital Equipment Corporation (DEC) unveiled the PDP-8, the first commercial minicomputer.
Minicomputer: The minicomputer was a small machine that was a breakthrough in low-cost design, allowing DEC to offer a computer for under $20,000.
Intel 4004: Intel invented the first microprocessor in 1971, the Intel 4004.
Supercomputer: An extremely fast computer targeted to perform a large number of computations typically needed by scientific applications.
Seymour Cray: Seymour Cray is often credited as the "father of supercomputing" and regarded as a pioneer of supercomputing.
Cray-1: The Cray-1 was simultaneously the fastest in the world, the most expensive, and the computer with the best cost/performance for scientific programs.
Apple IIe (Steve Jobs, Steve Wozniak): In 1977, the Apple IIe from Steve Jobs and Steve Wozniak set standards for low cost, high volume, and high reliability that defined the personal computer industry.
Xerox Alto: The computer that inspired many of the architectural and software concepts that characterize the modern desktop machines was the Xerox Alto.
ARPAnet: The wide area ARPAnet produced the first versions of Internet-style networking.
Whetstone: The Whetstone synthetic program was created by measuring scientific programs written in Algol-60.
Dhrystone: Dhrystone is another synthetic benchmark that is still used in some embedded computing circles.
Kernels: Kernels are small, time-intensive pieces from real programs that are extracted and then used as benchmarks.
Livermore Loops, Linpack: Livermore Loops and Linpack are the best-known examples of kernel benchmarks.
SPEC: SPEC provided benchmark sets for graphics, high-performance scientific computing, object-oriented computing, file systems, Web servers and clients, Java, engineering CAD applications, and power.
Embedded Microprocessor Benchmark Consortium (EEMBC): The embedded community was inspired by SPEC to create the Embedded Microprocessor Benchmark Consortium (EEMBC). Started in 1997, it consists of a collection of kernels organized into suites that address different portions of the embedded industry.
3. Instructions
3.1 Introduction
78
Instruction set
The vocabulary of commands understood by a given architecture.
79
Stored-program concept
The idea that instructions and data of many types can be stored in memory as numbers and thus be easy to change, leading to the stored-program computer.
3.2 Operations of the computer hardware
Comments: The words to the right of the double slashes (//) on each line are comments for the human reader, so the computer ignores them.
3.3 Operands of the computer hardware
80
Word
A natural unit of access in a computer, usually a group of 32 bits.
81
Doubleword
Another natural unit of access in a computer, usually a group of 64 bits; corresponds to the size of a register in the LEGv8 architecture.
82
Data transfer instruction
A command that moves data between memory and registers.
83
Address
A value used to delineate the location of a specific data element within a memory array.
Load: The data transfer instruction that copies data from memory to a register is traditionally called load.
Base address: A base address is the starting address of an array in memory.
Base register: A base register is a register that holds an array's base address.
Offset: An offset is a constant value added to a base address to locate a particular array element.
Spilling registers: The process of putting less frequently used variables (or those needed later) into memory is called spilling registers.
3.4 Signed and unsigned numbers
84
Binary digit
Also called binary bit. One of the two numbers in base 2, 0 or 1, that are the components of information.
Least significant bit: The rightmost bit in an LEGv8 doubleword.
Most significant bit: The leftmost bit in an LEGv8 doubleword.
Overflow: If the number that is the proper result of such operations cannot be represented by these rightmost hardware bits, overflow is said to have occurred.
Sign and magnitude representation: A signed number representation where a single bit is used to represent the sign, and the remaining bits represent the magnitude.
Two's complement: A signed number representation where a leading 0 indicates a positive number and a leading 1 indicates a negative number. The complement of a value is obtained by complementing each bit (0 → 1 or 1 → 0), and then adding one to the result.
Sign extension: The function of a signed load is to copy the sign repeatedly to fill the rest of the register, known as a sign extension.
One's complement: A notation that represents the most negative value by 10…000_two and the most positive value by 01…11_two, leaving an equal number of negatives and positives but ending up with two zeros, one positive (00…00_two) and one negative (11…11_two). The term is also used to mean the inversion of every bit in a pattern: 0 to 1 and 1 to 0.
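The two's complement rule on this card (complement each bit, then add one) and sign extension can be tried on small bit patterns. Python integers are unbounded, so the sketch below masks results to a fixed width. The function names and the 8-bit and 16-bit widths are choices made for this illustration, not anything the cards specify.

```python
def twos_complement_negate(value: int, bits: int) -> int:
    """Negate a bit pattern: complement each bit, add one, keep only `bits` bits."""
    mask = (1 << bits) - 1
    inverted = ~value & mask      # complement each bit (0 -> 1, 1 -> 0)
    return (inverted + 1) & mask  # add one and discard any carry out


def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a `from_bits` pattern into the upper bits of a wider one."""
    sign_bit = 1 << (from_bits - 1)
    if pattern & sign_bit:
        upper_ones = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        return pattern | upper_ones
    return pattern


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern as a two's complement signed integer."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern


minus_five = twos_complement_negate(5, 8)  # 8-bit pattern for -5
widened = sign_extend(minus_five, 8, 16)   # same value in 16 bits

print(f"{minus_five:08b}")
print(f"{widened:016b}")
print(to_signed(widened, 16))
```

The widened pattern still decodes to the same signed value, which is the point of filling the upper bits with copies of the sign.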
85
Biased notation
A notation that represents the most negative value by 00…000_two and the most positive value by 11…11_two, with 0 typically having the value 10…00_two, thereby biasing the number such that the number plus the bias has a non-negative representation.
3.5 Representing instructions in the computer
Fields: A machine instruction is composed of fields, each field having several bits and representing some part of the instruction.
86
Instruction format
A form of representation of an instruction composed of fields of binary numbers.
87
Machine language
Binary representation used for communication within a computer system.
88
Hexadecimal
Numbers in base 16.
89
Opcode
The field that denotes the operation and format of an instruction.
Destination register: A destination register is a register that receives the result of an operation.
3.6 Logical operations
Shift: A shift moves all the bits in a doubleword to the left or right, filling the emptied bits with 0s.
90
AND
A logical bit-by-bit operation with two operands that calculates a 1 only if there is a 1 in both operands.
Mask: AND can apply a bit pattern to a set of bits to force 0s where there is a 0 in the bit pattern. Such a bit pattern in conjunction with AND is traditionally called a mask, since the mask "conceals" some bits.
91
OR
A logical bit-by-bit operation with two operands that calculates a 1 if there is a 1 in either operand.
92
NOT
A logical bit-by-bit operation with one operand that inverts the bits; that is, it replaces every 1 with a 0, and every 0 with a 1.
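The shift, AND (with a mask), OR, and NOT operations defined on the preceding cards can be tried on small patterns. Python integers have no fixed width, so the sketch below masks results to 64 bits to imitate a doubleword. The helper names and sample values are invented for the example.

```python
DOUBLEWORD_MASK = (1 << 64) - 1  # keeps results within a 64-bit doubleword


def shift_left(value: int, amount: int) -> int:
    """Shift left, filling emptied low bits with 0s and dropping bits past bit 63."""
    return (value << amount) & DOUBLEWORD_MASK


def shift_right(value: int, amount: int) -> int:
    """Logical shift right, filling emptied high bits with 0s."""
    return (value & DOUBLEWORD_MASK) >> amount


def bitwise_not(value: int) -> int:
    """Invert every bit of a 64-bit doubleword."""
    return ~value & DOUBLEWORD_MASK


value = 0b1101_1010

low_nibble = value & 0b0000_1111      # AND with a mask conceals the upper four bits
with_bits_set = value | 0b0000_0101   # OR sets bits that are 1 in either operand
all_ones = bitwise_not(0)             # NOT of zero is 64 one bits
shifted = shift_left(value, 4)        # moves the pattern four places left

print(f"{low_nibble:04b}", f"{with_bits_set:08b}", f"{shifted:012b}")
```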
93
EOR
A logical bit-by-bit operation with two operands that calculates the exclusive OR of the two operands. That is, it calculates a 1 only if the values are different in the two operands.
3.7 Instructions for making decisions
94
Conditional branch
An instruction that tests a value and that allows for a subsequent transfer of control to a new address in the program based on the outcome of the test.
95
Basic block
A sequence of instructions without branches (except possibly at the end) and without branch targets or branch labels (except possibly at the beginning).
97
Branch address table
Also called branch table. A table of addresses of alternative instruction sequences.
3.8 Supporting procedures in computer hardware
98
Procedure
A stored subroutine that performs a specific task based on the parameters with which it is provided.
99
Branch-and-link instruction
An instruction that branches to an address and simultaneously saves the address of the following instruction in a register (LR or X30 in LEGv8).
100
Return address
A link to the calling site that allows a procedure to return to the proper address; in LEGv8 it is stored in register LR (X30).
101
Caller
The program that instigates a procedure and provides the necessary parameter values.
102
Callee
A procedure that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller.
103
Program counter (PC)
The register containing the address of the instruction in the program being executed.
104
Program counter (PC )
The r egister containing the addr ess of the instruction in the pr ogram being executed.
105
Stack
A data structur e for spilling r egisters or ganiz ed as a last-in- rst-out queue.
106
Stack pointer
A v alue denoting the most r ecently allocated addr ess in a stack that shows wher e registers should be spilled or wher e old r egister v alues can be found. In LEGv8, it is r egister SP.
107
Push
Add element to stack.
108
Pop
Remove element from stack.
109
Global pointer
The register that is reserved to point to the static area.
110
Procedure frame
Also called activation record. The segment of the stack containing a procedure's saved registers and local variables.
111
Procedure frame
Also called activation record. The segment of the stack containing a procedure's saved registers and local variables.
112
Frame pointer
A value denoting the location of the saved registers and local variables for a given procedure.
113
Text segment
The segment of a UNIX object file that contains the machine language code for routines in the source file.
3.9 Communicating with people
American Standard Code for Information Interchange (ASCII)
Most computers today offer 8-bit bytes to represent characters, with the American Standard Code for Information Interchange (ASCII) being the representation that nearly everyone follows.
3.10 LEGv8 addressing for wide immediates and addresses
114
PC-relative addressing
An addressing regime in which the address is the sum of the program counter (PC) and a constant in the instruction.
115
Immediate addressing
The operand is a constant within the instruction itself.
116
Register addressing
The operand is a register.
Base addressing / displacement addressing
The operand is at the memory location whose address is the sum of a register and a constant in the instruction.
117
PC-relative addressing
The branch address is the sum of the PC and a constant in the instruction.
118
Addressing mode
One of several addressing regimes delimited by their varied use of operands and/or addresses.
3.11 Parallelism and instructions: synchronization
119
Data race
Two memory accesses form a data race if they are from different threads to the same location, at least one is a write, and they occur one after another.
3.12 Translating and starting a program
120
Assembly language
A symbolic language that can be translated into binary machine language.
121
Pseudoinstruction
A common variation of assembly language instructions often treated as if it were an instruction in its own right.
122
Symbol table
A table that matches names of labels to the addresses of the memory words that instructions occupy.
123
Linker
Also called link editor. A systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file.
124
Linker
Also called link editor. A systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file.
Executable file
A functional program in the format of an object file that contains no unresolved references. It can contain symbol tables and debugging information. A "stripped executable" does not contain that information. Relocation information may be included for the loader.
125
Loader
A systems program that places an object program in main memory so that it is ready to execute.
126
Dynamically linked libraries (DLLs)
Library routines that are linked to a program during execution.
127
Dynamically linked libraries (DLLs)
Library routines that are linked to a program during execution.
128
Java bytecode
Instruction from an instruction set designed to interpret Java programs.
129
Java Virtual Machine (JVM)
The program that interprets Java bytecodes.
130
Java Virtual Machine (JVM)
The program that interprets Java bytecodes.
131
Just In Time compiler (JIT)
The name commonly given to a compiler that operates at runtime, translating the interpreted code segments into the native code of the computer.
132
Just In Time compiler (JIT)
The name commonly given to a compiler that operates at runtime, translating the interpreted code segments into the native code of the computer.
3.15 Advanced material: Compiling C and interpreting Java
133
Loop-unrolling
A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
134
Public
A Java keyword that allows a method to be invoked by any other method.
135
Protected
A Java keyword that restricts invocation of a method to other methods in that package.
136
Package
Basically a directory that contains a group of related classes.
137
Static method
A method that applies to the whole class rather than to an individual object. It is unrelated to static in C.
3.18 Real stuff: x86 instructions
138
General-purpose register (GPR)
A register that can be used for addresses or for data with virtually any instruction.
139
General-purpose register (GPR)
A register that can be used for addresses or for data with virtually any instruction.
3.22 Historical perspective and further reading
140
Accumulator
Archaic term for register. On-line use of it as a synonym for "register" is a fairly reliable indication that the user has been around quite a while.
141
Load-store architecture
Also called register-register architecture. An instruction set architecture in which all operations are between registers and data memory may only be accessed via loads or stores.
142
Load-store architecture
Also called register-register architecture. An instruction set architecture in which all operations are between registers and data memory may only be accessed via loads or stores.
4. Arithmetic for Computers
4.2 Addition and subtraction
143
Arithmetic Logic Unit (ALU)
Hardware that performs addition, subtraction, and usually logical operations such as AND and OR.
144
Arithmetic Logic Unit (ALU)
Hardware that performs addition, subtraction, and usually logical operations such as AND and OR.
4.4 Division
145
Dividend
A number being divided.
146
Divisor
A number that the dividend is divided by.
147
Quotient
The primary result of a division; a number that when multiplied by the divisor and added to the remainder produces the dividend.
148
Remainder
The secondary result of a division; a number that when added to the product of the quotient and the divisor produces the dividend.
4.5 Floating point
Scientific notation
A notation that renders numbers with a single digit to the left of the decimal point.
149
Normalized
A number in floating-point notation that has no leading 0s.
150
Floating point
Computer arithmetic that represents numbers in which the binary point is not fixed.
151
Fraction
The value, generally between 0 and 1, placed in the fraction field. The fraction is also called the mantissa.
152
Exponent
In the numerical representation system of floating-point arithmetic, the value that is placed in the exponent field.
Overflow (floating-point)
A situation in which a positive exponent becomes too large to fit in the exponent field.
Underflow (floating-point)
A situation in which a negative exponent becomes too large to fit in the exponent field.
153
Double precision
A floating-point value represented in a 64-bit doubleword.
154
Single precision
A floating-point value represented in a 32-bit word.
155
Exception
Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.
156
Exception
Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.
157
Interrupt
An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)
NaN (Not a Number)
IEEE 754 even has a symbol for the result of invalid operations, such as 0/0 or subtracting infinity from infinity. This symbol is NaN, for Not a Number.
158
Guard
The first of two extra bits kept on the right during intermediate calculations of floating-point numbers; used to improve rounding accuracy.
159
Round
Method to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format. It is also the name of the second of two extra bits kept on the right during intermediate floating-point calculations, which improves rounding accuracy.
160
Units in the last place (ulp)
The number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented.
161
Units in the last place (ulp)
The number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented.
162
Sticky bit
A bit used in rounding in addition to guard and round that is set whenever there are nonzero bits to the right of the round bit.
163
Fused multiply add
A floating-point instruction that performs both a multiply and an add, but rounds only once after the add.
4.12 Historical perspective and further reading
Coprocessor
A coprocessor is simply an additional chip that accelerates a portion of the work of a processor; in this case, it accelerated floating-point computation.
5. The Processor
5.2 Logic design conventions
164
Combinational element
An operational element, such as an AND gate or an ALU.
165
State element
A memory element, such as a register or a memory.
166
Clocking methodology
The approach used to determine when data is valid and stable relative to the clock.
167
Edge-triggered clocking
A clocking scheme in which all state changes occur on a clock edge.
168
Control signal
A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.
169
Asserted
The signal is logically high or true.
170
Deasserted
The signal is logically low or false.
5.3 Building a datapath
171
Datapath element
A unit used to operate on or hold data within a processor. In the LEGv8 implementation, the datapath elements include the instruction and data memories, the register file, the ALU, and adders.
172
Program counter (PC)
The register containing the address of the instruction in the program being executed.
173
Program counter (PC)
The register containing the address of the instruction in the program being executed.
Register file
A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.
174
Sign-extend
To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
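Sign extension can be sketched on raw bit patterns as follows. The function name `sign_extend` and the 8-bit to 16-bit widths in the example are assumptions chosen for illustration; hardware does this with wiring rather than arithmetic.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a from_bits-wide bit pattern to to_bits by replicating its sign bit."""
    value &= (1 << from_bits) - 1              # keep only the original field
    sign_bit = (value >> (from_bits - 1)) & 1  # high-order bit of the original item
    if sign_bit:
        # Fill every new high-order position with a copy of the sign bit (1).
        upper_ones = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= upper_ones
    return value


print(hex(sign_extend(0xFE, 8, 16)))  # prints 0xfffe (-2 stays -2)
print(hex(sign_extend(0x7F, 8, 16)))  # prints 0x7f (positive values are unchanged)
```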
175
Branch target address
The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the LEGv8 architecture, the branch target is given by the sum of the offset field of the instruction and the address of the branch.
176
Branch taken
A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches.
177
Branch not taken (or untaken branch)
A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.
178
Branch not taken (or untaken branch)
A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.
5.4 A simple implementation scheme
179
Truth table
From logic, a representation of a logical operation by listing all the values of the inputs and then in each case showing what the resulting outputs should be.
Don't-care term
An element of a logical function in which the output does not depend on the values of all the inputs. Don't-care terms may be specified in different ways.
180
Opcode
The field that denotes the operation and format of an instruction.
181
Single-cycle implementation
Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle. While easy to understand, it is too slow to be practical.
182
Single-cycle implementation
Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle. While easy to understand, it is too slow to be practical.
5.5 An overview of pipelining
183
Pipelining
An implementation technique in which multiple instructions are overlapped in execution, much like an assembly line.
184
Structural hazard
When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
185
Data hazard
Also called a pipeline data hazard. When a planned instruction cannot execute in the proper clock cycle because data that are needed to execute the instruction are not yet available.
186
Data hazard
Also called a pipeline data hazard. When a planned instruction cannot execute in the proper clock cycle because data that are needed to execute the instruction are not yet available.
187
Forwarding
Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
188
Forwarding
Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
189
Load-use data hazard
A specific form of data hazard in which the data being loaded by a load instruction has not yet become available when it is needed by another instruction.
190
Pipeline stall
Also called bubble. A stall initiated in order to resolve a hazard.
191
Pipeline stall
Also called bubble. A stall initiated in order to resolve a hazard.
192
Control hazard
Also called branch hazard. When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.
193
Control hazard
Also called branch hazard. When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.
194
Branch prediction
A method of resolving a branch hazard that assumes a given outcome for the conditional branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
195
Latency (pipeline)
The number of stages in a pipeline or the number of stages between two instructions during execution.
196
Latency (pipeline)
The number of stages in a pipeline or the number of stages between two instructions during execution.
5.7 Data hazards: Forwarding versus stalling
197
Nop
An instruction that does no operation to change state.
5.8 Control hazards
198
Flush
To discard instructions in a pipeline, usually due to an unexpected event.
199
Dynamic branch prediction
Prediction of branches at runtime using runtime information.
200
Branch prediction buffer
Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
201
Branch prediction buffer
Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
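One way to picture a branch prediction buffer is the sketch below. It assumes 2-bit saturating counters, 16 entries, and 4-byte instructions (so the two lowest address bits are dropped before indexing); none of these choices comes from the card, which only says "one or more bits". The class name `BranchPredictionBuffer` is hypothetical.

```python
class BranchPredictionBuffer:
    """Toy branch history table indexed by the low bits of the branch address."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        # Assumed 2-bit counters: 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * entries

    def _index(self, branch_address: int) -> int:
        # Drop the byte offset within a 4-byte instruction, then keep the low bits.
        return (branch_address >> 2) % self.entries

    def predict(self, branch_address: int) -> bool:
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bpb = BranchPredictionBuffer()
print(bpb.predict(0x400))      # False: the entry starts as "not taken"
bpb.update(0x400, taken=True)
print(bpb.predict(0x400))      # True after one taken outcome
```

Because there are no tags, two branches whose addresses share the same low bits (here 0x400 and 0x440) read and update the same entry.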
202
Branch target buffer
A structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.
203
Correlating predictor
A branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.
204
Tournament branch predictor
A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.
5.9 Exceptions
205
Exception
Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.
206
Exception
Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.
207
Interrupt
An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)
208
Vectored interrupt
An interrupt for which the address to which control is transferred is determined by the cause of the exception.
209
Imprecise interrupt
Also called imprecise exception. Interrupts or exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception.
210
Imprecise interrupt
Also called imprecise exception. Interrupts or exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception.
211
Precise interrupt
Also called precise exception. An interrupt or exception that is always associated with the correct instruction in pipelined computers.
212
Precise interrupt
Also called precise exception. An interrupt or exception that is always associated with the correct instruction in pipelined computers.
5.10 Parallelism via instructions
213
Instruction-level parallelism
The parallelism among instructions.
214
Multiple issue
A scheme whereby multiple instructions are launched in one clock cycle.
215
Static multiple issue
An approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution.
216
Dynamic multiple issue
An approach to implementing a multiple-issue processor where many decisions are made during execution by the processor.
217
Issue slots
The positions from which instructions could issue in a given clock cycle; by analogy, these correspond to positions at the starting blocks for a sprint.
218
Speculation
An approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions.
219
Issue packet
The set of instructions that issues together in one clock cycle; the packet may be determined statically by the compiler or dynamically by the processor.
220
Very Long Instruction Word (VLIW)
A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.
221
Very Long Instruction Word (VLIW)
A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.
222
Use latency
Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.
223
Loop unrolling
A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
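The shape of the transformation can be sketched in Python, with the caveat that the payoff described on the card comes from a compiler or processor scheduling the independent copies together, which an interpreter does not do. The function names and the unroll factor of 4 are assumptions for illustration.

```python
def sum_rolled(a):
    """Original loop: one array element per iteration."""
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


def sum_unrolled_by_4(a):
    """Same loop with four copies of the body per iteration."""
    n = len(a)
    limit = n - n % 4          # largest multiple of 4 that fits in the array
    t0 = t1 = t2 = t3 = 0      # separate accumulators keep the copies independent
    i = 0
    while i < limit:
        t0 += a[i]
        t1 += a[i + 1]
        t2 += a[i + 2]
        t3 += a[i + 3]
        i += 4
    total = t0 + t1 + t2 + t3
    for j in range(limit, n):  # cleanup loop for the leftover elements
        total += a[j]
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))  # prints 45 45
```

Using a separate accumulator per copy avoids a dependence chain through a single `total`, which is what lets the copies be scheduled side by side on real hardware.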
224
Register renaming
The renaming of registers by the compiler or hardware to remove antidependences.
225
Antidependence
Also called name dependence. An ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions.
226
Antidependence
Also called name dependence. An ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions.
227
Superscalar
An advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution.
228
Dynamic pipeline scheduling
Hardware support for reordering the order of instruction execution to avoid stalls.
229
Commit unit
The unit in a dynamic or out-of-order execution pipeline that decides when it is safe to release the result of an operation to programmer-visible registers and memory.
230
Reservation station
A buffer within a functional unit that holds the operands and the operation.
231
Reorder buffer
The buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.
232
Out-of-order execution
A situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait.
233
In-order commit
A commit in which the results of pipelined execution are written to the programmer-visible state in the same order that instructions are fetched.
5.11 Real stuff: The ARM Cortex-A53 and Intel Core i7 pipelines
234
Microarchitecture
The organization of the processor, including the major functional units, their interconnection, and control.
235
Architectural registers
The instruction set of visible registers of a processor; for example, in LEGv8, these are the 32 integer and 32 floating-point registers.
5.15 Concluding remarks
236
Instruction latency
The inherent execution time for an instruction.
6. Memory Hierarchy
6.1 Introduction
237
Temporal locality
The locality principle stating that if a data location is referenced then it will tend to be referenced again soon.
238
Spatial locality
The locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
239
Memory hierarchy
A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
240
Block (or line)
The minimum unit of information that can be either present or not present in a cache.
241
Hit rate
The fraction of memory accesses found in a level of the memory hierarchy.
242
Miss rate
The fraction of memory accesses not found in a level of the memory hierarchy.
243
Hit time
The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
244
Miss penalty
The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.
6.2 Memory technologies
Flash memory
Flash memory is a type of electrically erasable programmable read-only memory (EEPROM).
245
Track
One of thousands of concentric circles that make up the surface of a magnetic disk.
246
Sector
One of the segments that make up a track on a magnetic disk; a sector is the smallest amount of information that is read or written on a disk.
247
Seek
The process of positioning a read/write head over the proper track on a disk.
248
Rotational latency
Also called rotational delay. The time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time.
249
Rotational latency
Also called rotational delay. The time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time.
6.3 The basics of caches
250
Direct-mapped cache
A cache structure in which each memory location is mapped to exactly one location in the cache.
251
Tag
A field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.
252
Valid bit
A field in the tables of a memory hierarchy that indicates that the associated block in the hierarchy contains valid data.
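The three cards above (direct-mapped cache, tag, valid bit) fit together as in the following sketch. It works on block addresses only, ignores the byte offset within a block, and stores no data; the class name `DirectMappedCache` and the 8-block size are assumptions for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only valid bits and tags."""

    def __init__(self, num_blocks: int = 8):
        self.num_blocks = num_blocks
        self.valid = [False] * num_blocks  # valid bit per cache location
        self.tags = [0] * num_blocks       # tag per cache location

    def access(self, block_address: int) -> bool:
        """Return True on a hit; on a miss, install the block and return False."""
        index = block_address % self.num_blocks  # the one location this block maps to
        tag = block_address // self.num_blocks   # remaining address bits identify the block
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


cache = DirectMappedCache(8)
print(cache.access(22))  # False: the location is not yet valid
print(cache.access(22))  # True: valid bit set and tag matches
print(cache.access(30))  # False: 30 maps to the same location as 22, tag differs
```

Blocks 22 and 30 both map to location 6 of an 8-block cache, so the tag is the only thing that tells them apart.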
253
Cache miss
A request for data from the cache that cannot be filled because the data are not present in the cache.
254
Write-through
A scheme in which writes always update both the cache and the next lower level of the memory hierarchy, ensuring that data are always consistent between the two.
255
Write buffer
A queue that holds data while the data are waiting to be written to memory.
256
Write-back
A scheme that handles writes by updating values only to the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced.
257
Split cache
A scheme in which a level of the memory hierarchy is composed of two independent caches that operate in parallel with each other, with one handling instructions and one handling data.
6.4 Measuring and improving cache performance
258
Fully associative cache
A cache structure in which a block can be placed in any location in the cache.
259
Set-associative cache
A cache that has a fixed number of locations (at least two) where each block can be placed.
260
Least recently used (LRU)
A replacement scheme in which the block replaced is the one that has been unused for the longest time.
261
Least recently used (LRU)
A replacement scheme in which the block replaced is the one that has been unused for the longest time.
262
Multilevel cache
A memory hierarchy with multiple levels of caches, rather than just a cache and main memory.
263
Global miss rate
The fraction of references that miss in all levels of a multilevel cache.
264
Local miss rate
The fraction of references to one level of a cache that miss; used in multilevel hierarchies.
6.5 Dependable memory hierarchy
265
Error detection code
A code that enables the detection of an error in data, but not the precise location and, hence, correction of the error.
6.7 Virtual memory
266
Virtual memory
A technique that uses main memory as a "cache" for secondary storage.
267
Physical address
An address in main memory.
268
Protection
A set of mechanisms for ensuring that multiple processes sharing the processor, memory, or I/O devices cannot interfere, intentionally or unintentionally, with one another by reading or writing each other's data. These mechanisms also isolate the operating system from a user process.
269
Page fault
An event that occurs when an accessed page is not present in main memory.
270
Virtual address
An address that corresponds to a location in virtual space and is translated by address mapping to a physical address when memory is accessed.
271
Address translation
Also called address mapping. The process by which a virtual address is mapped to an address used to access memory.
272
Address translation
Also called address mapping. The process by which a virtual address is mapped to an address used to access memory.
273
Segmentation
A variable-size address mapping scheme in which an address consists of two parts: a segment number, which is mapped to a physical address, and a segment offset.
274
Page table
The table containing the virtual to physical address translations in a virtual memory system. The table, which is stored in memory, is typically indexed by the virtual page number; each entry in the table contains the physical page number for that virtual page if the page is currently in memory.
275
Swap space
The space on the disk reserved for the full virtual memory space of a process.
276
Reference bit
Also called use bit or access bit. A field that is set whenever a page is accessed and that is used to implement LRU or other replacement schemes.
277
Reference bit
Also called use bit or access bit. A field that is set whenever a page is accessed and that is used to implement LRU or other replacement schemes.
278
Reference bit
Also called use bit or access bit. A field that is set whenever a page is accessed and that is used to implement LRU or other replacement schemes.
279
Translation-lookaside buffer (TLB)
A cache that keeps track of recently used address mappings to try to avoid an access to the page table.
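Address translation with a page table and a TLB can be sketched as below. The 4096-byte page size, the use of plain dictionaries for both tables, the unbounded TLB (no replacement policy), and the `translate` helper itself are all simplifying assumptions rather than anything the cards specify.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(virtual_address: int, page_table: dict, tlb: dict):
    """Map a virtual address to (physical_address, where_the_mapping_was_found)."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)  # virtual page number + page offset
    if vpn in tlb:
        ppn, source = tlb[vpn], "tlb"                 # recently used mapping, no table access
    elif vpn in page_table:
        ppn, source = page_table[vpn], "page table"
        tlb[vpn] = ppn                                # remember the mapping for next time
    else:
        # The page is not present in main memory.
        raise LookupError(f"page fault on virtual page {vpn}")
    return ppn * PAGE_SIZE + offset, source


page_table = {2: 7}  # hypothetical mapping: virtual page 2 lives in physical page 7
tlb = {}
print(translate(2 * PAGE_SIZE + 5, page_table, tlb))  # (28677, 'page table')
print(translate(2 * PAGE_SIZE + 9, page_table, tlb))  # (28681, 'tlb')
```

Only the page number is translated; the offset within the page is carried over unchanged.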
280
Virtually addr essed cache
A cache that is accessed with a vir tual addr ess r ather than a physical addr ess.
281
Aliasing
A situation in which two addr esses access the same object; it can occur in vir tual memor y when ther e are two vir tual addr esses for the same physical page.
282
Physically addr essed cache
A cache that is addr essed b y a physical addr ess.
283
Super visor mode
Also called k ernal mode. A mode indicating that a running pr ocess is an oper ating system pr ocess. kernal mode
284
Super visor mode
Also called k ernal mode. A mode indicating that a running pr ocess is an oper ating system pr ocess.
285
System call
A special instruction that tr ansf ers contr ol from user mode t o a dedicated location in super visor code space, inv oking the ex ception mechanism in the pr ocess.
286
Context switch
A changing of the internal state of the pr ocessor t o allow a diff erent pr ocess t o use the pr ocessor that includes sa ving the state needed t o return t o the curr ently ex ecuting process.
287
Exception enable
Also called interrupt enable. A signal or action that contr ols whether the pr ocess responds t o an ex ception or not; necessar y for pr eventing the occurr ence of ex ceptions during inter vals befor e the pr ocessor has saf ely sa ved the state needed t o restar t.
288
Restar table instruction
An instruction that can r esume ex ecution after an ex ception is r esolv ed without the ex ception 's aff ecting the r esult of the instruction. 6.8 A common fr amework for memor y hier archy
289
Three Cs model
A cache model in which all cache misses ar e classied int o one of thr ee categories: compulsor y misses, capacity misses, and conict misses.
290
Compulsory miss
Also called cold-start miss. A cache miss caused by the first access to a block that has never been in the cache.
291
Compulsory miss
Also called cold-start miss. A cache miss caused by the first access to a block that has never been in the cache.
292
Capacity miss
A cache miss that occurs because the cache, even with full associativity, cannot contain all the blocks needed to satisfy the request.
Conflict miss
Also called collision miss. A cache miss that occurs in a set-associative or direct-mapped cache when multiple blocks compete for the same set and that are eliminated in a fully associative cache of the same size.
6.9 Using a finite-state machine to control a simple cache
293
Finite-state machine
A sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the inputs to a set of asserted outputs.
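A minimal sketch of the two functions in this definition, written as Python lookup tables. The machine itself is a made-up example (it asserts `detected` after two consecutive 1 inputs); the state names, `NEXT_STATE`, `OUTPUT`, and `run_fsm` are all hypothetical, and the output function here depends only on the current state.

```python
# Next-state function: (current state, input) -> new state.
NEXT_STATE = {
    ("IDLE", 0): "IDLE",
    ("IDLE", 1): "SEEN_ONE",
    ("SEEN_ONE", 0): "IDLE",
    ("SEEN_ONE", 1): "SEEN_TWO",
    ("SEEN_TWO", 0): "IDLE",
    ("SEEN_TWO", 1): "SEEN_TWO",
}

# Output function: current state -> set of asserted outputs.
OUTPUT = {
    "IDLE": set(),
    "SEEN_ONE": set(),
    "SEEN_TWO": {"detected"},
}


def run_fsm(inputs, state="IDLE"):
    """Apply one input per step; return the (state, outputs) reached each step."""
    trace = []
    for bit in inputs:
        state = NEXT_STATE[(state, bit)]
        trace.append((state, OUTPUT[state]))
    return trace


for state, outputs in run_fsm([1, 1, 0, 1]):
    print(state, sorted(outputs))
```

Keeping the next-state function and the output function as separate tables mirrors the definition: both are combinational, and only the stored state carries information from one step to the next.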
294
Next-state function
A combinational function that, given the inputs and the current state, determines the next state of a finite-state machine.
6.10 Parallelism and memory hierarchies: Cache coherence
295
False sharing
When two unrelated shared variables are located in the same cache block and the full block is exchanged between processors even though the processors are accessing different variables.
6.11 Parallelism and memory hierarchy: Redundant arrays of inexpensive disks
296
Redundant arrays of inexpensive disks (RAID)
An organization of disks that uses an array of small and inexpensive disks so as to increase both performance and reliability.
297
Redundant arrays of inexpensive disks (RAID)
An organization of disks that uses an array of small and inexpensive disks so as to increase both performance and reliability.
298
Striping
Allocation of logically sequential blocks to separate disks to allow higher performance than a single disk can deliver.
299
Mirroring
Writing identical data to multiple disks to increase data availability.
300
Protection group
The group of data disks or blocks that share a common check disk or block.
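Striping and a protection group can be illustrated with a short Python sketch. It assumes round-robin striping and a check block computed as the bytewise XOR of the data blocks (the parity scheme used by several RAID levels); the cards above do not specify either detail, and the function names are hypothetical.

```python
from functools import reduce


def stripe_location(logical_block, num_disks):
    """Striping: map a logical block number to (disk, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def check_block(data_blocks):
    """Check block for a protection group: bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))


def rebuild(surviving_blocks, check):
    """Recover the single missing data block from the survivors and the check block."""
    return check_block(list(surviving_blocks) + [check])


group = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
check = check_block(group)
print(stripe_location(5, num_disks=4))
print(rebuild([group[0], group[2]], check) == group[1])
```

Because XOR is its own inverse, XORing the check block with the surviving data blocks yields the lost block, which is why one check block can protect a whole group against a single failure.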
301
Hot-swapping
Replacing a hardware component while the system is running.
302
Standby spares
Reserve hardware resources that can immediately take the place of a failed component.
6.13 Real stuff: The ARM Cortex-A53 and Intel Core i7 memory hierarchies
303
Nonblocking cache
A cache that allows the processor to make references to the cache while the cache is handling an earlier miss.
6.17 Concluding remarks
304
Prefetching
A technique in which data blocks needed in the future are brought into the cache early by using special instructions that specify the address of the block.
7. Parallel Processors
7.1 Introduction
Mathematical model
A mathematical model is an equation used to represent data.
Variable
A variable is a symbol representing data. Ex: T is a common variable representing temperature.
Independent variable
An independent variable is a symbol representing an input. Ex: x is a common independent variable. A simple way to remember that independent variables are for input: 'I' for independent and input.
Dependent variable
A dependent variable is a symbol representing an output. Ex: y is a common dependent variable.
Parameters
Parameters are added and adjusted. Ex: Data modeled to a line has two parameters, m and b, such that y = m ∙ x + b.
Trendline
A trendline is a model that captures changes in data.
305
Multiprocessor
A computer system with at least two processors. This computer is in contrast to a uniprocessor, which has one, and is increasingly hard to find today.
306
Parallel processing program
A single program that runs on multiple processors simultaneously.
307
Cluster
A set of computers connected over a local area network that function as a single large multiprocessor.
308
Multicore microprocessor
A microprocessor containing multiple processors ("cores") in a single integrated circuit. Virtually all microprocessors today in desktops and servers are multicore.
309
Shared memory multiprocessor (SMP)
A parallel processor with a single physical address space.
310
Shared memory multiprocessor (SMP)
A parallel processor with a single physical address space.
7.2 The difficulty of creating parallel processing programs
311
Strong scaling
Speed-up achieved on a multiprocessor without increasing the size of the problem.
312
Weak scaling
Speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.
7.3 SISD, MIMD, SIMD, SPMD, and vector
SISD
SISD or single instruction stream, single data stream: A uniprocessor.
MIMD
MIMD or multiple instruction streams, multiple data streams: A multiprocessor.
SPMD
SPMD or single program, multiple data streams: The conventional MIMD programming model, where a single program runs across all processors.
SIMD
SIMD or single instruction stream, multiple data streams: The same instruction is applied to many data streams, as in a vector processor.
313
Data-level parallelism
Parallelism achieved by performing the same operation on independent data.
314
Vector lane
One or more vector functional units and a portion of the vector register file. Inspired by lanes on highways that increase traffic speed, multiple lanes execute vector operations simultaneously.
7.4 Hardware multithreading
315
Hardware multithreading
Increasing utilization of a processor by switching to another thread when one thread is stalled.
316
Thread
A thread includes the program counter, the register state, and the stack. It is a lightweight process; whereas threads commonly share a single address space, processes don't.
317
Process
A process includes one or more threads, the address space, and the operating system state. Hence, a process switch usually invokes the operating system, but not a thread switch.
318
Fine-grained multithreading
A version of hardware multithreading that implies switching between threads after every instruction.
319
Coarse-grained multithreading
A version of hardware multithreading that implies switching between threads only after significant events, such as a last-level cache miss.
320
Simultaneous multithreading (SMT)
A version of multithreading that lowers the cost of multithreading by utilizing the resources needed for multiple issue, dynamically scheduled microarchitecture.
321
Simultaneous multithreading (SMT)
A version of multithreading that lowers the cost of multithreading by utilizing the resources needed for multiple issue, dynamically scheduled microarchitecture.
7.5 Multicore and other shared memory multiprocessors
322
Uniform memory access (UMA)
A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.
323
Uniform memory access (UMA)
A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.
324
Nonuniform memory access (NUMA)
A type of single address space multiprocessor in which some memory accesses are much faster than others depending on which processor asks for which word.
325
Nonuniform memory access (NUMA)
A type of single address space multiprocessor in which some memory accesses are much faster than others depending on which processor asks for which word.
326
Synchronization
The process of coordinating the behavior of two or more processes, which may be running on different processors.
327
Lock
A synchronization device that allows access to data to only one processor at a time.
328
Reduction
A function that processes a data structure and returns a single value.
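The lock and reduction cards can be tied together with a small sketch: a sum reduction in which each worker reduces its own chunk privately and then takes a lock to fold its partial result into the shared total. Python threads stand in for processors here, which is an assumption made only for illustration; `parallel_sum` and its parameters are hypothetical names.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum reduction: private partial sums combined under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)   # private partial result, no lock needed
        with lock:             # only one thread updates the shared total at a time
            total += partial

    # Ceiling division, with a minimum of 1 so an empty input is handled.
    chunk_size = max(1, (len(values) + num_threads - 1) // num_threads)
    threads = [
        threading.Thread(target=worker, args=(values[i:i + chunk_size],))
        for i in range(0, len(values), chunk_size)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


print(parallel_sum(list(range(1, 101))))
```

Holding the lock only for the final update keeps the serialized portion short; the bulk of the work, summing each chunk, needs no synchronization.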
329
OpenMP
An API for shared memory multiprocessing in C, C++, or Fortran that runs on UNIX and Microsoft platforms. It includes compiler directives, a library, and runtime directives.
7.7 Clusters, warehouse scale computers, and other message-passing multiprocessors
330
Message passing
Communicating between multiple processors by explicitly sending and receiving information.
331
Send message routine
A routine used by a processor in machines with private memories to pass a message to another processor.
332
Receive message routine
A routine used by a processor in machines with private memories to accept a message from another processor.
333
Clusters
Collections of computers connected via I/O over standard network switches to form a message-passing multiprocessor.
334
Software as a service (SaaS)
Rather than selling software that is installed and run on customers' own computers, software is run at a remote site and made available over the Internet typically via a Web interface to customers. SaaS customers are charged based on use versus on ownership.
335
Software as a service (SaaS)
Rather than selling software that is installed and run on customers' own computers, software is run at a remote site and made available over the Internet typically via a Web interface to customers. SaaS customers are charged based on use versus on ownership.
7.8 Introduction to multiprocessor network topologies
336
Network bandwidth
Informally, the peak transfer rate of a network; can refer to the speed of a single link or the collective transfer rate of all links in the network.
337
Bisection bandwidth
The bandwidth between two equal parts of a multiprocessor. This measure is for a worst case split of the multiprocessor.
338
Fully connected network
A network that connects processor-memory nodes by supplying a dedicated communication link between every node.
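As a rough worked example, the two bandwidth measures can be compared for a fully connected network and a ring, counted in units of one link's bandwidth. The ring is not defined on these cards and is brought in only as a contrasting assumption; both functions are hypothetical helpers and assume an even number of nodes.

```python
def fully_connected(p):
    """(total links, links crossing a worst-case bisection) for p nodes."""
    total = p * (p - 1) // 2      # one dedicated link per pair of nodes
    bisection = (p // 2) ** 2     # every node in one half links to every node in the other
    return total, bisection


def ring(p):
    """Same two measures for a ring of p nodes (p even, p >= 4)."""
    return p, 2                   # any cut into two halves severs exactly two links


for nodes in (8, 64):
    print(nodes, "fully connected:", fully_connected(nodes), "ring:", ring(nodes))
```

The gap between the two grows quickly with the node count, which is the trade-off the topology cards describe: a fully connected network buys bisection bandwidth at the cost of a link count that grows quadratically.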
339
Multistage network
A network that supplies a small switch at each node.
340
Crossbar network
A network that allows any node to communicate with any other node in one pass through the network.
7.9 Communicating to the outside world: Cluster networking
Memory-mapped I/O
An I/O scheme in which portions of the address space are assigned to I/O devices, and reads and writes to those addresses are interpreted as commands to the I/O device.
341
Direct memory access (DMA)
A mechanism that provides a device controller with the ability to transfer data directly to or from the memory without involving the processor.
342
Direct memory access (DMA)
A mechanism that provides a device controller with the ability to transfer data directly to or from the memory without involving the processor.
Interrupt-driven I/O
An I/O scheme that employs interrupts to indicate to the processor that an I/O device needs attention.
343
Device driver
A program that controls an I/O device that is attached to the computer.
344
Polling
The process of periodically checking the status of an I/O device to determine the need to service the device.
7.10 Multiprocessor benchmarks and performance models
345
Pthreads
A UNIX API for creating and manipulating threads. It is structured as a library.
346
Arithmetic intensity
The ratio of floating-point operations in a program to the number of data bytes accessed by a program from main memory.
8. Appendix A
8.2 Gates, truth tables, and logic equations
347
Asserted signal
A signal that is (logically) true, or 1.
348
Deasserted signal
A signal that is (logically) false, or 0.
349
Combinational logic
A logic system whose blocks do not contain memory and hence compute the same output given the same input.
350
Sequential logic
A group of logic elements that contain memory and hence whose value depends on the input as well as the current contents of the memory.
351
Gate
A device that implements basic logic functions, such as AND or OR.
Universal
NOR and NAND gates are called universal, since any logic function can be built using this one gate type.
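The universality claim can be checked directly. The sketch below builds NOT, AND, OR, and NOR out of nothing but a two-input NAND, with signals modeled as the integers 0 and 1; the function names are hypothetical.

```python
def nand(a, b):
    """The only primitive gate: an inverted AND."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)                # NAND with both inputs tied together


def and_(a, b):
    return not_(nand(a, b))          # invert the NAND output


def or_(a, b):
    return nand(not_(a), not_(b))    # De Morgan: a OR b = NOT(NOT a AND NOT b)


def nor(a, b):
    return not_(or_(a, b))


# Exhaustive check against Python's own bit operators.
for a in (0, 1):
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert nor(a, b) == 1 - (a | b)
print("all gates match")
```

The same construction works starting from NOR instead of NAND, by swapping the roles of AND and OR.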
352
NOR gate
An inverted OR gate.
353
NAND gate
An inverted AND gate.
8.3 Combinational logic
Encoder
A logic element that performs the inverse function of a decoder, taking 2^n inputs and producing an n-bit output.
354
Decoder
A logic block that has an n-bit input and 2^n outputs, where only one output is asserted for each input combination.
Multiplexor
A multiplexor might more properly be called a selector, since its output is one of the inputs that is selected by a control.
355
Selector value
Also called control value. The control signal that is used to select one of the input values of a multiplexor as the output of the multiplexor.
356
Selector value
Also called control value. The control signal that is used to select one of the input values of a multiplexor as the output of the multiplexor.
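A small behavioral sketch of the decoder, encoder, and multiplexor described in these cards. Signals are modeled as 0/1 integers and buses as Python lists; the function names are hypothetical, and the multiplexor assumes its number of inputs is a power of two.

```python
def decoder(value, n):
    """n-bit input -> 2**n outputs, exactly one of which is asserted."""
    if not 0 <= value < 2 ** n:
        raise ValueError("value does not fit in n bits")
    return [1 if i == value else 0 for i in range(2 ** n)]


def encoder(lines):
    """Inverse of the decoder: 2**n one-hot inputs -> n-bit value."""
    if sum(lines) != 1:
        raise ValueError("exactly one input must be asserted")
    return lines.index(1)


def multiplexor(inputs, selector):
    """Pass through the input chosen by the selector value."""
    select_bits = (len(inputs) - 1).bit_length()
    enables = decoder(selector, select_bits)   # one enable line per input
    result = 0
    for data, enable in zip(inputs, enables):
        result |= data & enable                # AND each input, OR the results
    return result


print(decoder(2, 2))
print(encoder(decoder(5, 3)))
print(multiplexor([0, 1, 1, 0], selector=2))
```

Building the multiplexor from a decoder plus AND and OR operations reflects one common gate-level structure: the decoder turns the selector value into enable lines, and only the enabled input reaches the output.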
357
Sum of products
A form of logical representation that employs a logical sum (OR) of products (terms joined using the AND operator).
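One way to see the form is to derive it mechanically from a truth table: each row whose output is 1 contributes one product term, and the terms are ORed together. The sketch below does this for a two-input XOR; the notation (`~` for complement, `&` for AND, `|` for OR) and the function name are choices made for this example only.

```python
from itertools import product


def sum_of_products(truth_table, names):
    """Build a sum-of-products expression from the rows whose output is 1."""
    terms = []
    for inputs, output in sorted(truth_table.items()):
        if output:
            # One literal per input: the variable itself or its complement.
            literals = [name if bit else f"~{name}" for name, bit in zip(names, inputs)]
            terms.append(" & ".join(literals))
    return " | ".join(f"({term})" for term in terms) if terms else "0"


xor_table = {(a, b): a ^ b for a, b in product((0, 1), repeat=2)}
print(sum_of_products(xor_table, ["a", "b"]))
```

The result is not minimized; it simply lists one product term per asserted row, which is the canonical form a two-stage AND/OR structure implements directly.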
358
Programmable logic array (PLA)
A structured-logic element composed of a set of inputs and corresponding input complements and two stages of logic: the first generates product terms of the inputs and input complements, and the second generates sum terms of the product terms. Hence, PLAs implement logic functions as a sum of products.
359
Programmable logic array (PLA)
A structured-logic element composed of a set of inputs and corresponding input complements and two stages of logic: the first generates product terms of the inputs and input complements, and the second generates sum terms of the product terms. Hence, PLAs implement logic functions as a sum of products.
360
Minterms
Also called product terms. A set of logic inputs joined by conjunction (AND operations); the product terms form the first logic stage of the programmable logic array (PLA).
361
Minterms
Also called product terms. A set of logic inputs joined by conjunction (AND operations); the product terms form the first logic stage of the programmable logic array (PLA).
362
Read-only memory (ROM)
A memory whose contents are designated at creation time, after which the contents can only be read. ROM is used as structured logic to implement a set of logic functions by using the terms in the logic functions as address inputs and the outputs as bits in each word of the memory.
363
Read-only memory (ROM)
A memory whose contents are designated at creation time, after which the contents can only be read. ROM is used as structured logic to implement a set of logic functions by using the terms in the logic functions as address inputs and the outputs as bits in each word of the memory.
364
Programmable ROM (PROM)
A form of read-only memory that can be programmed when a designer knows its contents.
365
Programmable ROM (PROM)
A form of read-only memory that can be programmed when a designer knows its contents.
366
Bus
In logic design, a collection of data lines that is treated together as a single logical signal; also, a shared collection of lines with multiple sources and uses.
8.4 Using a hardware description language
367
Hardware description language
A programming language for describing hardware, used for generating simulations of a hardware design and also as input to synthesis tools that can generate actual hardware.
368
Verilog
One of the two most common hardware description languages.
369
VHDL
One of the two most common hardware description languages.
Behavioral specification
Describes how a digital system operates functionally.
Structural specification
Describes how a digital system is organized in terms of a hierarchical connection of elements.
370
Hardware synthesis tools
Computer-aided design software that can generate a gate-level design based on behavioral descriptions of a digital system.
371
Wire
In Verilog, specifies a combinational signal.
372
Reg
In Verilog, a register.
373
Sensitivity list
The list of signals that specifies when an always block should be re-evaluated.
374
Blocking assignment
In Verilog, an assignment that completes before the execution of the next statement.
375
Nonblocking assignment
An assignment that continues after evaluating the right-hand side, assigning the left-hand side the value only after all right-hand sides are evaluated.
8.7 Clocks
376
Edge-triggered clocking
A clocking scheme in which all state changes occur on a clock edge.
377
Clocking methodology
The approach used to determine when data are valid and stable relative to the clock.
378
State element
A memory element.
379
Synchronous system
A memory system that employs clocks and where data signals are read only when the clock indicates that the signal values are stable.
Register file
A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.
8.8 Memory elements: Flip-flops, latches, and registers
Flip-flop
A memory element for which output is equal to the value of the stored state inside the element and for which the internal state is changed only on a clock edge.
380
Latch
A memory element in which the output is equal to the value of the stored state inside the element and the state is changed whenever the appropriate inputs change and the clock is asserted.
D flip-flop
A flip-flop with one data input that stores the value of that input signal in the internal memory when the clock edge occurs.
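The difference between a latch and a D flip-flop shows up clearly in a step-by-step behavioral model. The sketch below is a simplification: time advances in discrete steps, the flip-flop is assumed to trigger on the rising clock edge (the cards say only "clock edge"), and the class names are hypothetical.

```python
class DLatch:
    """Level-sensitive: the stored state follows d whenever the clock is asserted."""

    def __init__(self):
        self.q = 0

    def step(self, clock, d):
        if clock:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: the stored state changes only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._previous_clock = 0

    def step(self, clock, d):
        if clock and not self._previous_clock:   # 0 -> 1 transition
            self.q = d
        self._previous_clock = clock
        return self.q


# (clock, d) pairs; d changes while the clock is still high in the third step.
signals = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
latch, flip_flop = DLatch(), DFlipFlop()
print([latch.step(c, d) for c, d in signals])
print([flip_flop.step(c, d) for c, d in signals])
```

In the third step the latch output follows the changed input because the clock is still asserted, while the flip-flop holds its value until the next rising edge.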
381
Setup time
The minimum time that the input to a memory device must be valid before the clock edge.
382
Hold time
The minimum time during which the input must be valid after the clock edge.
8.9 Memory elements: SRAMs and DRAMs
383
Static random access memory (SRAM)
A memory where data is stored statically (as in flip-flops) rather than dynamically (as in DRAM). SRAMs are faster than DRAMs, but less dense and more expensive per bit.
384
Static random access memory (SRAM)
A memory where data is stored statically (as in flip-flops) rather than dynamically (as in DRAM). SRAMs are faster than DRAMs, but less dense and more expensive per bit.
385
Error detection code
A code that enables the detection of an error in data, but not the precise location and, hence, correction of the error.
8.10 Finite-state machines
386
Finite-state machine
A sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the inputs to a set of asserted outputs.
387
Next-state function
A combinational function that, given the inputs and the current state, determines the next state of a finite-state machine.
8.11 Timing methodologies
388
Clock skew
The difference in absolute time between the times when two state elements see a clock edge.
389
Level-sensitive clocking
A timing methodology in which state changes occur at either high or low clock levels but are not instantaneous as such changes are in edge-triggered designs.
390
Metastability
A situation that occurs if a signal is sampled when it is not stable for the required setup and hold times, possibly causing the sampled value to fall in the indeterminate region between a high and low value.
391
Synchronizer failure
A situation in which a flip-flop enters a metastable state and where some logic blocks reading the output of the flip-flop see a 0 while others see a 1.
392
Propagation time
The time required for an input to a flip-flop to propagate to the outputs of the flip-flop.
8.12 Field programmable devices
393
Field programmable devices (FPD)
An integrated circuit containing combinational logic, and possibly memory devices, that are configurable by the end user.
394
Field programmable devices (FPD)
An integrated circuit containing combinational logic, and possibly memory devices, that are configurable by the end user.
395
Programmable logic device (PLD)
An integrated circuit containing combinational logic whose function is configured by the end user.
396
Programmable logic device (PLD)
An integrated circuit containing combinational logic whose function is configured by the end user.
397
Field programmable gate array (FPGA)
A configurable integrated circuit containing both combinational logic blocks and flip-flops.
398
Field programmable gate array (FPGA)
A configurable integrated circuit containing both combinational logic blocks and flip-flops.
399
Simple programmable logic device (SPLD)
Programmable logic device, usually containing either a single PAL or PLA.
400
Simple programmable logic device (SPLD)
Programmable logic device, usually containing either a single PAL or PLA.
401
Programmable array logic (PAL)
Contains a programmable and-plane followed by a fixed or-plane.
402
Programmable array logic (PAL)
Contains a programmable and-plane followed by a fixed or-plane.
403
Antifuse
A structure in an integrated circuit that when programmed makes a permanent connection between two wires.
404
Lookup tables (LUTs)
In a field programmable device, the name given to the cells because they consist of a small amount of logic and RAM.
405
Lookup tables (LUTs)
In a field programmable device, the name given to the cells because they consist of a small amount of logic and RAM.
9. Appendix B
9.1 Introduction
406
Graphics processing unit (GPU)
A processor optimized for 2D and 3D graphics, video, visual computing, and display.
407
Graphics processing unit (GPU)
A processor optimized for 2D and 3D graphics, video, visual computing, and display.
408
Visual computing
A mix of graphics processing and computing that lets you visually interact with computed objects via graphics, images, and video.
409
Heterogeneous system
A system combining different processor types. A PC is a heterogeneous CPU-GPU system.
410
Application programming interface (API)
A set of function and data structure definitions providing an interface to a library of functions.
411
Application programming interface (API)
A set of function and data structure definitions providing an interface to a library of functions.
412
GPU computing
Using a GPU for computing via a parallel programming language and API.
413
GPGPU
Using a GPU for general-purpose computation via a traditional graphics API and graphics pipeline.
414
CUDA
A scalable parallel programming model and language based on C/C++. It is a parallel programming platform for GPUs and multicore CPUs.
9.2 GPU system architectures
415
PCI-Express (PCIe)
A standard system I/O interconnect that uses point-to-point links. Links have a configurable number of lanes and bandwidth.
416
PCI-Express (PCIe)
A standard system I/O interconnect that uses point-to-point links. Links have a configurable number of lanes and bandwidth.
Unified memory architecture (UMA)
A system architecture in which the CPU and GPU share a common system memory.
417
AGP
An extended version of the original PCI I/O bus, which provided up to eight times the bandwidth of the original PCI bus to a single card slot. Its primary purpose was to connect graphics subsystems into PC systems.
9.3 Programming GPUs
418
OpenGL
An open-standard graphics API.
419
Direct3D
A graphics API defined by Microsoft and partners.
420
Texture
A 1D, 2D, or 3D array that supports sampled and filtered lookups with interpolated coordinates.
421
Shader
A program that operates on graphics data such as a vertex or a pixel fragment.
422
Shading language
A graphics rendering language, usually having a dataflow or streaming programming model.
423
Kernel
A program or function for one thread, designed to be executed by many threads.
424
Thread block
A set of concurrent threads that execute the same thread program and may cooperate to compute a result.
425
Grid
A set of thread blocks that execute the same kernel program.
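The relationship between kernel, thread block, and grid can be mimicked in plain Python. This is a sequential emulation for illustration only, not GPU code: `launch` is a hypothetical helper that calls the kernel once per thread, and the index arguments loosely mirror the block index, block size, and thread index that a CUDA kernel reads from built-in variables.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel once per thread: grid_dim blocks of block_dim threads each."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Work for one thread: update a single element of y."""
    i = block_idx * block_dim + thread_idx   # this thread's global index
    if i < n:                                # the grid may be larger than the data
        y[i] = a * x[i] + y[i]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
launch(saxpy_kernel, 2, 4, len(x), 2.0, x, y)   # 2 blocks of 4 threads cover 5 elements
print(y)
```

The kernel is written from the point of view of a single thread, and the bounds check handles the case where the number of launched threads is rounded up past the data size.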
426
Synchronization barrier
Threads wait at a synchronization barrier until all threads in the thread block arrive at the barrier.
427
Atomic memory operation
A memory read, modify, write operation sequence that completes without any intervening access.
428
Local memory
Per-thread local memory private to the thread.
429
Shared memory
Per-block memory shared by all threads of the block.
430
Global memory
Per-application memory shared by all threads.
431
Single-program multiple data (SPMD)
A style of parallel programming model in which all threads execute the same program. SPMD threads typically coordinate with barrier synchronization.
432
Single-program multiple data (SPMD)
A style of parallel programming model in which all threads execute the same program. SPMD threads typically coordinate with barrier synchronization.
9.4 Multithreaded multiprocessor architecture
433
Single-instruction multiple-thread (SIMT)
A processor architecture that applies one instruction to multiple independent threads in parallel.
434
Warp
The set of parallel threads that execute the same instruction together in a SIMT architecture.
435
Cooperative thread array (CTA)
A set of concurrent threads that executes the same thread program and may cooperate to compute a result. A GPU CTA implements a CUDA thread block.
9.6 Floating point arithmetic
436
Half precision
A 16-bit binary floating-point format, with 1 sign bit, 5-bit exponent, 10-bit fraction, and an implied integer bit.
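The field layout can be made concrete by decoding a 16-bit pattern by hand. The sketch assumes the IEEE 754 binary16 conventions (exponent bias of 15, denormals when the exponent field is 0, infinities and NaNs when it is all ones), which the card does not spell out; `decode_half` is a hypothetical name. Python's `struct` module format code `"e"` is used as a cross-check.

```python
import struct


def decode_half(bits):
    """Decode a 16-bit pattern: 1 sign bit, 5-bit exponent, 10-bit fraction."""
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0x1F:                      # all ones: infinity or NaN
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:                         # denormal: no implied integer bit
        return sign * (fraction / 1024) * 2.0 ** -14
    # Normal number: implied integer bit of 1, exponent bias of 15.
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)


for pattern in (0x3C00, 0xC000, 0x7BFF):
    reference = struct.unpack("<e", struct.pack("<H", pattern))[0]
    print(hex(pattern), decode_half(pattern), reference)
```

The implied integer bit is why ten stored fraction bits give eleven bits of precision for normal numbers.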
437
Multiply-add (MAD)
A single floating-point instruction that performs a compound operation: multiplication followed by addition.
438
Special function unit (SFU)
A hardware unit that computes special functions and interpolates planar attributes.
439
MIP-map
From the Latin phrase multum in parvo, or much in a small space. A MIP-map contains precalculated images of different resolutions, used to increase rendering speed and reduce artifacts.