dcpiprofileme(1)
NAME
dcpiprofileme - Using DCPI to collect and view ProfileMe data
COLLECTING PROFILEME SAMPLES
On an Alpha 21264a/EV67 or later processor, tell dcpi to gather ProfileMe data
via the command:
dcpid -slot pm <profile db dir>
This causes dcpi to collect ProfileMe samples. The data for
each sample is decomposed into named "bit" and "counter" values. Note that
some alternate "counter" statistics can be gathered by specifying pm0, pm2,
or pm3 instead of pm. See "COUNTER NAMES AND THEIR MEANINGS" below.
If no -slot option is present, the default on Alpha 21264a/EV67
and later processors is to multiplex between collecting traditional aggregate
cycle samples and collecting ProfileMe (type pm) statistics.
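The invocation above can be wrapped in a small helper when the slot is chosen programmatically. This is a minimal sketch, not part of DCPI: the function name `dcpid_command` and the example database path are hypothetical, and only the `-slot` values named in this page (pm, pm0, pm2, pm3) are assumed to exist.

```python
import shlex

# Slot names taken from the text above; nothing else about dcpid is assumed.
PROFILEME_SLOTS = ("pm", "pm0", "pm2", "pm3")


def dcpid_command(db_dir, slot="pm"):
    """Return the argv list for `dcpid -slot <slot> <db_dir>`."""
    if slot not in PROFILEME_SLOTS:
        raise ValueError(f"unknown ProfileMe slot: {slot!r}")
    return ["dcpid", "-slot", slot, db_dir]


if __name__ == "__main__":
    # "/var/dcpi/db" is a made-up path used only for illustration.
    # The command is printed rather than run; pass the list to
    # subprocess.run() on a machine where dcpid is installed.
    print(shlex.join(dcpid_command("/var/dcpi/db", slot="pm2")))
```

Rejecting unknown slot names up front avoids starting a collection run that gathers different counters than intended.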
BIT NAMES AND THEIR MEANINGS
- retired
- The instruction retired, i.e., it was not in the shadow of any trap.
It may have caused a mispredict trap, though.
- taken
- The conditional branch was taken. This bit is UNDEFINED for samples
for instructions other than conditional branches or for a conditional
branch when it mispredicts.
- cbrmispredict
- The conditional branch was mispredicted. This bit is clear for
instructions other than conditional branches.
- valid
- The instruction retired and didn't cause a trap.
- nyp
- Stands for "Not Yet Prefetched." Indicates that when the fetcher asked for
the fetch block containing the instruction, the instruction was not in the
icache and the prefetcher had not yet initiated an off-chip request for the
instruction. If nyp is set, the instruction's fetch block definitely caused
an icache miss stall. If nyp is clear, the instruction's fetch block may have
still caused an icache miss stall: the prefetcher may have made an off-chip
request for the instruction, but the instruction may not have arrived at the
time the fetcher needed it.
- ldstorder
- Supposed to indicate that a replay trap was caused by one of the following:
  - load store order: a younger load issuing before an older store to the
    same physical address
  - troll order: a younger load issuing before an older store where the
    dcache indexes for the physical addresses match but the higher order
    address bits are different
  - simultaneous load and store: a load and a store to the same physical
    address issuing simultaneously
In all three cases, the younger instruction causes a replay trap.
- map_stall
- The instruction stalled after it was fetched and before it was mapped.
Such stalls are caused by a shortage of physical registers, integer issue
queue space, floating-point issue queue space, or inums. There are 80 inums
used to track instructions that are in flight.
- early_kill
- The instruction was killed early in the pipeline, before it entered an
issue queue.
- late_kill
- The instruction was killed late in the pipeline.
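The relationships among the bits above can be restated as a short sketch. It assumes a sample has already been decoded into a dict mapping bit name to a boolean; that dict layout and the function names are hypothetical, since this page does not describe how samples are stored.

```python
def icache_miss_stall(nyp):
    """Interpret nyp: set means a definite stall, clear only a possible one."""
    return "definite" if nyp else "possible"


def describe_sample(bits):
    """Summarise a sample given a dict of bit name -> bool (hypothetical layout)."""
    notes = []
    if bits.get("valid"):
        # valid: the instruction retired and did not cause a trap.
        notes.append("retired without trapping")
    elif bits.get("retired"):
        # retired but not valid: it retired yet caused a trap itself.
        notes.append("retired but caused a trap")
    else:
        notes.append("did not retire")
    if bits.get("cbrmispredict"):
        # The taken bit is undefined when a conditional branch mispredicts.
        notes.append("mispredicted conditional branch; taken bit is undefined")
    notes.append(f"icache miss stall: {icache_miss_stall(bits.get('nyp', False))}")
    for kill in ("early_kill", "late_kill"):
        if bits.get(kill):
            notes.append(f"killed ({kill})")
    return notes
```

The function only reports what the bit definitions imply; it does not try to decide whether a sampled instruction is a conditional branch, which the bits alone do not reveal.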
COUNTER NAMES AND THEIR MEANINGS
- retdelay
- A lower bound on the number of cycles that the instruction's inum delayed
the advance of the retire pointer. Large values indicate a probable performance
problem. E.g., the first instruction that uses the result of a load that
misses out to memory might have a retdelay of 100. This statistic is gathered
by default or when the -slot pm option is specified.
- inflight
- For instructions that retired without trapping (retired^notrap), this
is approximately the number of cycles that the instruction was inflight.
More precisely, it is -3 plus the number of cycles elapsed from when the
instruction exited the fetch stage until the instruction retired. This
statistic is gathered by default or when one of the -slot pm0, -slot
pm, or -slot pm3 options is specified.
- retires
- For instructions that retired without trapping (retired^notrap), this
is approximately the number of instructions that retired while the profiled
instruction was inflight. This statistic is gathered when either the -slot
pm0 or the -slot pm2 option is present.
- bcmisses
- For instructions that retired without trapping (retired^notrap), this
is approximately the number of bcache misses that occurred while the profiled
instruction was inflight. This statistic is gathered when the -slot
pm2 option is specified.
- replays
- For instructions that retired without trapping (retired^notrap), this
is approximately the number of replay traps that occurred while the profiled
instruction was inflight. This statistic is gathered when the -slot
pm3 option is used.
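Since each counter is only gathered under certain -slot options, it can help to tabulate the list above and ask which option covers a given set of counters. The following is a minimal sketch transcribed from the descriptions in this section; the names `COUNTER_SLOTS`, `slots_gathering`, and `inflight_value` are hypothetical, and the cycle arguments are illustrative inputs rather than values DCPI exposes.

```python
# Counter -> slot options that gather it, transcribed from the list above.
COUNTER_SLOTS = {
    "retdelay": {"pm"},
    "inflight": {"pm0", "pm", "pm3"},
    "retires": {"pm0", "pm2"},
    "bcmisses": {"pm2"},
    "replays": {"pm3"},
}


def slots_gathering(*counters):
    """Return the sorted slot names that gather every requested counter."""
    common = {"pm", "pm0", "pm2", "pm3"}
    for name in counters:
        common &= COUNTER_SLOTS[name]
    return sorted(common)


def inflight_value(fetch_exit_cycle, retire_cycle):
    """Restate the inflight definition: elapsed cycles from fetch exit to retire, minus 3."""
    return (retire_cycle - fetch_exit_cycle) - 3
```

For example, `slots_gathering("inflight", "retires")` yields only pm0, while `slots_gathering("retdelay", "bcmisses")` is empty, so those two counters would need separate collection runs.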
TRAP BIT NAMES AND THEIR MEANINGS
Exactly one trap bit is set in any given ProfileMe sample.
- notrap
- None of the below
- mispredict
- The instruction caused a JSR/RET/JMP/JMP_COROUTINE or conditional
branch mispredict.
- replays
- The instruction caused a replay trap.
- unaligntrap
- The instruction caused an unaligned load or store.
- dtbmiss
- The instruction caused a DTB single miss.
- dtb2miss3
- The instruction caused a DTB double miss. (3-level page tables)
- dtb2miss4
- The instruction caused a DTB double miss. (4-level page tables)
- itbmiss
- The instruction caused an Instruction TLB miss. Most other bit and counter
values will be those for the first instruction in the ITB miss handler.
- arithtrap
- The instruction caused an arithmetic trap.
- fpdisabledtrap
- The instruction caused a floating point disabled trap.
- MT_FPCRtrap
-
- dfaulttrap
- The instruction caused a Dstream fault because the virtual page is
inaccessible or because the virtual address is malformed, i.e., not properly
sign-extended.
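The rule that exactly one trap bit is set in any sample lends itself to a simple consistency check. This sketch assumes a sample decoded into a dict of bit name to boolean, which is a hypothetical representation, and it covers only the trap bit names listed in this section.

```python
# Trap bit names as listed in this section.
TRAP_BITS = (
    "notrap", "mispredict", "replays", "unaligntrap", "dtbmiss",
    "dtb2miss3", "dtb2miss4", "itbmiss", "arithtrap",
    "fpdisabledtrap", "MT_FPCRtrap", "dfaulttrap",
)


def trap_of(sample):
    """Return the single trap bit set in a sample, or raise ValueError."""
    set_bits = [name for name in TRAP_BITS if sample.get(name)]
    if len(set_bits) != 1:
        raise ValueError(f"expected exactly one trap bit, found {set_bits}")
    return set_bits[0]
```

A sample with `notrap` set returns "notrap"; a sample with no trap bit, or with two, raises an error, which points to a decoding problem rather than a real hardware state.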