x86 architecture coherency
serializing instructions and events |
doc? #1 |
instruction or event |
486? #2 |
use? #3 |
description and comments |
yes |
IRET(D) |
yes |
yes |
may be privileged under some circumstances |
yes |
RSM |
yes |
yes |
can only be executed from within SMM |
yes |
CPUID |
no |
no |
non-privileged |
yes |
LGDT M[w+y] |
no |
no |
privileged |
yes |
LIDT M[w+y] |
no |
no |
privileged |
yes |
LLDT Mw/Rv |
no |
no |
privileged |
yes |
LTR Mw/Rv |
no |
no |
privileged |
yes |
INVLPG M |
no |
no |
privileged, implemented badly in Intel P5 and P54 |
yes |
INVLPGA |
no |
no |
privileged |
yes |
INVEPT G[wy],Mo |
no |
no |
privileged |
yes |
INVVPID G[wy],Mo |
no |
no |
privileged |
yes |
INVD |
no |
no |
privileged, does not write back cache contents |
yes |
WBINVD |
yes |
yes |
privileged |
no |
LMSW Ew |
yes |
no |
privileged |
yes |
MOV CR0,Ry |
yes |
yes |
privileged |
yes |
MOV CR2,Ry |
no |
no |
privileged |
yes |
MOV CR3,Ry |
no |
no |
privileged |
yes |
MOV CR4,Ry |
no |
no |
privileged |
yes |
MOV CR8,Ry |
n/a |
yes |
privileged |
yes |
MOV DR0...7,Ry |
yes |
yes |
privileged |
yes |
WRMSR #6 |
n/a |
yes |
privileged |
yes |
SWAPGS |
n/a |
yes |
privileged |
no |
exceptions #4 |
yes |
no |
incl. INT Ib, INT1, INT3, INTO (taken), BOUND (taken) |
no |
interrupts #4 |
yes |
no |
INTR, NMI, SMI, INIT |
no |
branches |
yes |
no |
CALL Ap/Mp/Ev/Jz, RET, RET Iw, RETF, RETF Iw
JMP Ap/Mp/Ev/Jz/Jb, Jcc Jb/Jz (taken), JrCXZ
LOOP, LOOPE, LOOPNE
|
no |
segment loads |
no |
no |
LDS/LES Gz,Mp and LFS/LGS/LSS Gv,Mp
POP DS/ES/FS/GS/SS
MOV Sw,Mw/Rv
|
no |
A20M# changes #5 |
yes |
no |
KBC or PS/2 |
notes |
descriptions |
#1 |
Only the documented instructions and events are guaranteed to be serializing on future x86 processors. |
#2 |
Serializing instructions and events were defined and documented starting with Intel's P5-core processors. |
#3 |
To ensure backward compatibility it is not recommended to use these. (This depends on #1 and #2.) |
#4 |
The nature of the x86 architecture implies that these instructions and events are serializing. |
#5 |
In case of an OUTS instruction serialization isn't guaranteed until all iterations have been completed. |
#6 |
A WRMSR to one of the x2APIC MSRs (0000_0800h...0000_0BFFh) is not guaranteed to be serializing. |
TLB invalidation |
- writes to CR3 #1
- changes to CR3 during a task switch #1
- changes to CR0.PE
- changes to CR0.PG #2
- changes to CR0.WP #5
- changes to CR4.PSE (if PSE is supported) #2
- changes to CR4.PGE (if PGE is supported)
- changes to CR4.PAE (if PAE is supported)
- changes to CR4.PCIDE (if PCID is supported)
- changes to CR4.SMEP (if SMEP is supported)
- INVLPG M instruction
- RSM instruction
- writes to MTRRs (if MTRRs are supported)
- writes to PAT MSR (if PAT is supported)
- writes to APIC_BASE MSR (if APIC is supported)
- SMI #3
- A20M# changes #4
|
notes |
descriptions |
#1 |
global entries remain if PGE is supported |
#2 |
not on Intel P5-core processors |
#3 |
if TLB is used to implement SMM remapping |
#4 |
if TLB is used to implement A20M# |
#5 |
if implemented at TLB fill (rather than lookup) |
|
PDPTE-to-PDPTR reloading |
- writes to CR3 #1
- changes to CR3 during a task switch #1, #2
- a 0-to-1 change of CR0.PG while CR4.PAE=1 #3
- a 0-to-1 change of CR4.PAE while CR0.PG=1 #3
- changes to CR4.PSE (if PSE is supported) #4
- changes to CR4.PGE (if PGE is supported) #4
- changes to CR4.SMEP (if SMEP is supported) #4
- RSM instruction #5
|
notes |
descriptions |
#1 |
while CR0.PG=1 and CR4.PAE=1 |
#2 |
Intel P4-core processors always reload |
#3 |
a 1-to-0 change should set the PDPTRs to zero |
#4 |
unnecessary, but done by Intel processors |
#5 |
SMI should save the PDPTRs in the SSM,
and then set them to zero (P6 doesn't, but P4 does)
|
|
store buffer draining |
- processor exceptions and external interrupts
- serializing instructions (see above)
- I/O instructions (IN, (REP) INS, OUT, (REP) OUTS)
- LOCKed operations (explicit and implicit)
- SFENCE instruction (if SSE is supported)
- MFENCE instruction (if SSE2 is supported)
- reads from memory regions that are marked UC
|
MTRR conflicts |
|
UC |
WC |
WT |
WP |
WB |
UC |
UC |
UC |
UC |
UC |
UC |
WC |
UC |
WC |
UC |
WC |
UC |
WT |
UC |
UC |
WT |
WT |
WT |
WP |
UC |
WC |
WT |
WP |
WT |
WB |
UC |
UC |
WT |
WT |
WB |
note |
Because the behavior of the gray cases is
reserved, it should not be relied upon.
In essence the processor computes the
logical AND of all the involved memory
types, as shown in this table.
|
|
MTRR-PAT conflicts |
|
PAT |
UC |
WC |
WT |
WP |
WB |
UC- |
M T R R s |
UC |
UC_M |
#1 |
UC_M |
UC_M |
UC_M |
UC_M |
WC |
UC_P |
WC |
UC |
UC |
WC |
WC |
WT |
UC_P |
WC |
WT |
#2 |
WT |
UC_P |
WP |
UC_P |
WC |
#2 |
WP |
WP |
UC_P |
WB |
UC_P |
WC |
WT |
WP |
WB |
UC_P |
notes |
descriptions |
#1 |
From an architectural standpoint the processor
should honour MTRR_DEF_TYPE.E. While set
to 0 the MTRRs are disabled, memory should
be treated as UC, and PAT=WC should not be
able to take precedence; thus the result should
be UC_M. However, while set to 1 the MTRRs
are enabled, and PAT=WC should be able to
take precedence; thus the result should be WC.
While Intel processors do honour the E bit, AMD
processors do not -- for them PAT=WC always
takes predence; thus their result is always WC.
|
#2 |
Because the behavior of this particular case is
reserved, it shouldn't be relied upon. While Intel
processors compute the logical AND, resulting
in WT, AMD processors treat this combination
as explicitly illegal, resulting in UC.
|
|
|