dcpiprofileme(1)

NAME

dcpiprofileme - Using DCPI to collect and view ProfileMe data

COLLECTING PROFILEME SAMPLES

On an Alpha 21264a/EV67 or later processor, tell dcpi to gather ProfileMe data via the command:

  dcpid -slot pm <profile db dir>

This causes dcpi to collect ProfileMe samples. The data for each sample is decomposed into named "bit" and "counter" values. Alternate "counter" statistics can be gathered by specifying pm0, pm2, or pm3 instead of pm. See "COUNTER NAMES AND THEIR MEANINGS" below.

If no -slot option is present, the default on Alpha 21264a/EV67 and later processors is to multiplex between collecting traditional aggregate cycle samples and collecting ProfileMe (type pm) statistics.

BIT NAMES AND THEIR MEANINGS

retired
The instruction retired, i.e., it was not in the shadow of any trap. It may have caused a mispredict trap, though.

taken
The conditional branch was taken. This bit is UNDEFINED for samples of instructions other than conditional branches, and for samples of conditional branches that mispredicted.
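
Because the taken bit is undefined in two cases, analysis code may want to treat it as tri-state. The following is a minimal Python sketch, not part of DCPI. It assumes a sample is held as a dict mapping bit names to booleans, and that the caller already knows whether the sampled instruction is a conditional branch; both the dict layout and the is_cond_branch argument are hypothetical.

```python
def branch_taken(sample, is_cond_branch):
    """Return True or False when the taken bit is meaningful, else None.

    sample: hypothetical dict of ProfileMe bit names to booleans.
    is_cond_branch: True if the sampled instruction is a conditional branch.
    """
    # taken is UNDEFINED for non-branches and for mispredicted
    # conditional branches, so report "unknown" in those cases.
    if not is_cond_branch or sample["cbrmispredict"]:
        return None
    return sample["taken"]
```

Returning None rather than False keeps undefined samples from being counted as not-taken branches.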

cbrmispredict
The conditional branch was mispredicted. This bit is clear for instructions other than conditional branches.

valid
The instruction retired and didn't cause a trap.

nyp
Stands for "Not Yet Prefetched." Indicates that when the fetcher asked for the fetch block containing the instruction, the instruction was not in the icache and the prefetcher had not yet initiated an off-chip request for the instruction.

If nyp is set, the instruction's fetch block definitely caused an icache miss stall.

If nyp is clear, the instruction's fetch block may have still caused an icache miss stall: the prefetcher may have made an off-chip request for the instruction, but the instruction may not have arrived at the time the fetcher needed it.

ldstorder
Supposed to indicate that a replay trap was caused by one of the following:
  • load store order

    a younger load issuing before an older store to the same physical address

  • troll order

    a younger load issuing before an older store where the dcache indexes for the physical addresses match but the higher order address bits are different

  • simultaneous load and store

    a load and a store to the same physical address issuing simultaneously

In all three cases, the younger instruction causes a replay trap.
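
The three cases above differ only in how the two physical addresses compare and whether the instructions issued together. A rough Python sketch of that distinction follows. The offset_bits and index_bits parameters are hypothetical stand-ins for the dcache geometry and are not taken from the processor documentation; the function also assumes the load issued before, or together with, the store.

```python
def classify_ldst_replay(load_pa, store_pa, simultaneous, offset_bits, index_bits):
    """Name the ldstorder case for a load/store pair, or None if none applies.

    load_pa, store_pa: physical addresses of the load and the store.
    simultaneous: True if both issued in the same cycle.
    offset_bits, index_bits: hypothetical dcache geometry (assumptions).
    """
    def split(pa):
        # Split an address into (higher-order bits, dcache index).
        index = (pa >> offset_bits) & ((1 << index_bits) - 1)
        upper = pa >> (offset_bits + index_bits)
        return upper, index

    if load_pa == store_pa:
        if simultaneous:
            return "simultaneous load and store"
        return "load store order"

    load_upper, load_index = split(load_pa)
    store_upper, store_index = split(store_pa)
    # Troll order: same dcache index, different higher-order address bits.
    if load_index == store_index and load_upper != store_upper:
        return "troll order"
    return None
```

For example, with the illustrative values offset_bits=6 and index_bits=9, addresses 0x1000 and 0x9000 share a dcache index but differ in their higher-order bits, so the pair is classified as troll order.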

map_stall
The instruction stalled after it was fetched and before it was mapped. Such stalls are caused by a shortage of physical registers, integer issue queue space, floating-point issue queue space, or inums. There are 80 inums used to track instructions that are in flight.

early_kill
The instruction was killed early in the pipeline -- before it entered an issue queue.

late_kill
The instruction was killed late in the pipeline.

COUNTER NAMES AND THEIR MEANINGS

retdelay
A lower bound on the number of cycles that the instruction's inum delayed the advance of the retire pointer. Large values indicate a probable performance problem. E.g., the first instruction that uses the result of a load that misses out to memory might have a retdelay of 100. This statistic is gathered by default or when the -slot pm option is specified.

inflight
For instructions that retired without trapping (retired^notrap), this is approximately the number of cycles that the instruction was inflight. More precisely, it is -3 plus the number of cycles elapsed from when the instruction exited the fetch stage until the instruction retired. This statistic is gathered by default or when one of the -slot pm0, -slot pm, or -slot pm3 options is specified.
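
The "more precisely" definition above is simple arithmetic. As a small worked sketch in Python (the function and argument names are illustrative only; DCPI reports the counter value directly and does not expose these cycle stamps):

```python
def inflight_cycles(fetch_exit_cycle, retire_cycle):
    """inflight = (cycles from fetch-stage exit to retire) - 3."""
    return (retire_cycle - fetch_exit_cycle) - 3
```

An instruction that left the fetch stage at cycle 100 and retired at cycle 120 would thus have an inflight value of 17.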

retires
For instructions that retired without trapping (retired^notrap), this is approximately the number of instructions that retired while the profiled instruction was inflight. This statistic is gathered when either the -slot pm0 or the -slot pm2 option is present.

bcmisses
For instructions that retired without trapping (retired^notrap), this is approximately the number of bcache misses that occurred while the profiled instruction was inflight. This statistic is gathered when the -slot pm2 option is specified.

replays
For instructions that retired without trapping (retired^notrap), this is approximately the number of replay traps that occurred while the profiled instruction was inflight. This statistic is gathered when the -slot pm3 option is used.
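
The counter descriptions above imply that each -slot value gathers a pair of counters. The Python sketch below records that mapping as read from those descriptions and looks up which slot values gather a given counter. The table and function names are illustrative and not part of DCPI. The default multiplexed mode collects type pm statistics, so it corresponds to the "pm" row.

```python
# Counters gathered for each -slot value, per the descriptions above.
SLOT_COUNTERS = {
    "pm":  ("retdelay", "inflight"),
    "pm0": ("inflight", "retires"),
    "pm2": ("retires", "bcmisses"),
    "pm3": ("inflight", "replays"),
}

def slots_gathering(counter):
    """Return the sorted list of -slot values that gather the given counter."""
    return sorted(slot for slot, counters in SLOT_COUNTERS.items()
                  if counter in counters)
```

For instance, slots_gathering("retires") yields ["pm0", "pm2"], so a profile collected with -slot pm will not contain retires data.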

TRAP BIT NAMES AND THEIR MEANINGS

Exactly one trap bit is set in any given ProfileMe sample.
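
This one-hot property makes a cheap sanity check possible when post-processing samples. A minimal Python sketch, assuming the trap bits of one sample are available as a dict of names to booleans (a hypothetical representation, not a DCPI data structure):

```python
def single_trap_bit(trap_bits):
    """Return the name of the one set trap bit, or raise if not exactly one.

    trap_bits: hypothetical dict mapping trap bit names to booleans.
    """
    set_names = [name for name, is_set in trap_bits.items() if is_set]
    if len(set_names) != 1:
        raise ValueError(
            "expected exactly one trap bit, found %d: %s"
            % (len(set_names), set_names))
    return set_names[0]
```

A sample with only notrap set returns "notrap"; a sample with no trap bit set, or with more than one, raises ValueError.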

notrap
None of the traps listed below occurred.

mispredict
The instruction caused a JSR/RET/JMP/JMP_COROUTINE or conditional branch mispredict.

replays
The instruction caused a replay trap.

unaligntrap
The instruction caused an unaligned load or store.

dtbmiss
The instruction caused a DTB single miss.

dtb2miss3
The instruction caused a DTB double miss (3-level page tables).

dtb2miss4
The instruction caused a DTB double miss (4-level page tables).

itbmiss
The instruction caused an Instruction TLB miss. Most other bit and counter values will be those for the first instruction in the ITB miss handler.

arithtrap
The instruction caused an arithmetic trap.

fpdisabledtrap
The instruction caused a floating point disabled trap.

MT_FPCRtrap

dfaulttrap
The instruction caused a Dstream fault because the virtual page is inaccessible or because the virtual address is malformed, i.e., not properly sign-extended.