x86 architecture coherency
serializing instructions and events |
doc? #1 | instruction or event | 486? #2 | use? #3 | description and comments |
|
yes | SERIALIZE | n/a | yes :-) | the modern and (frankly) strongly preferred choice! |
|
yes | CPUID | no | yes :-| | may be privileged, slow, modifies regs, variable latency |
|
Intel: no, AMD: yes | MFENCE | n/a | no | not a serializing instruction (beware: LFENCE + SFENCE != MFENCE)
  instead: a fencing instruction for memory |
|
no | SFENCE | n/a | no | not a serializing instruction
  instead: a fencing instruction for stores |
|
no | LFENCE | n/a | no | not a serializing instruction (beware: very specific verbiage used below)
  instead: a fencing instruction for loads
  does not execute until all prior instructions have completed locally
  and no later instruction begins execution until LFENCE completes
  AMD processors may treat LFENCE as "dispatch serializing" – the
  instruction and all previous instructions are forced to retire before
  the next instruction is executed (cf. CPUID 8000_0021h.EAX.2=1) |
|
yes | IRET | yes | yes | may be privileged |
yes | RSM | yes | yes | may be privileged, and requires execution within SMM |
|
yes | LGDT | no | no | privileged |
yes | LIDT | no | no | privileged |
yes | LLDT | no | no | privileged |
yes | LTR | no | no | privileged |
|
yes | INVLPG | no | no | privileged, and implemented badly in Intel P5 and P54 |
Intel: n/a, AMD: yes | INVLPGA | n/a | no | privileged, and requires AMD SVM support |
Intel: n/a, AMD: no | INVLPGB | n/a | no | privileged, and requires AMD INVLPGB support |
Intel: n/a, AMD: yes | TLBSYNC | n/a | no | privileged, and requires AMD INVLPGB support |
Intel: yes, AMD: n/a | INVEPT | n/a | no | privileged, and requires Intel VMX support |
Intel: yes, AMD: n/a | INVVPID | n/a | no | privileged, and requires Intel VMX support |
Intel: no, AMD: yes | INVPCID | n/a | no | privileged, and requires INVPCID support |
|
yes | INVD | no | no | privileged, and may not write back the cache contents |
yes | WBINVD | yes | yes | privileged, slow |
yes | WBNOINVD | n/a | no | privileged, and requires WBNOINVD support |
|
no | LMSW | yes | no | privileged |
yes | MOV to CR0 | yes | yes | privileged |
yes | MOV to CR2 | no | no | privileged |
yes | MOV to CR3 | no | no | privileged |
yes | MOV to CR4 | no | no | privileged |
Intel: no, AMD: yes | MOV to CR8 | n/a | no | privileged |
yes | MOV to DR0...7 | yes | no | privileged |
yes | WRMSR #4 | n/a | no | privileged |
|
yes | EENTER aka ENCLU[2] | n/a | no | requires Intel SGX |
|
no | exceptions #5 | yes | no | faults, traps, aborts |
no | interrupts #5 | yes | no | INTR, NMI, SMI, INIT |
|
no | branches | yes | no | CALL Ap/Mp/Ev/Jz, RET, RET Iw, RETF, RETF Iw
  JMP Ap/Mp/Ev/Jz/Jb, Jcc Jb/Jz (taken), JrCXZ
  LOOP, LOOPE, LOOPNE
  also: INT Ib, INT1, INT3, INTO (taken), BOUND (taken) |
|
no | segment loads | no | no | LDS/LES/LFS/LGS/LSS Gv,Mp
  POP DS/ES/FS/GS/SS
  MOV Sw,Mw/Rv |
|
no | SWAPGS | n/a | no | privileged, and unavailable if CR4.FRED=1 |
|
no | A20M# changes #6 | yes | no | KBC or PS/2 |
|
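The SERIALIZE and CPUID rows above can be illustrated with a minimal sketch, assuming a GCC- or Clang-compatible compiler. All `sketch_` names are hypothetical helpers, not part of any library. The sketch assumes that SERIALIZE is enumerated by CPUID.(EAX=7,ECX=0):EDX bit 14 and encoded as 0F 01 E8; on targets other than x86 the helpers degrade to no-ops.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

/* Execute CPUID for a leaf/subleaf. CPUID is itself serializing, but it
 * modifies EAX/EBX/ECX/EDX and may be slow or intercepted. */
static void sketch_cpuid(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
#if SKETCH_X86
    uint32_t a = leaf, b = 0, c = subleaf, d = 0;
    __asm__ volatile("cpuid"
                     : "+a"(a), "=b"(b), "+c"(c), "=d"(d)
                     :
                     : "memory");
    regs[0] = a; regs[1] = b; regs[2] = c; regs[3] = d;
#else
    (void)leaf; (void)subleaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
#endif
}

/* Returns 1 if SERIALIZE is enumerated, else 0. Call once and cache the
 * result, because the probe itself executes CPUID. */
static int sketch_has_serialize(void)
{
    uint32_t r[4];
    sketch_cpuid(0, 0, r);
    if (r[0] < 7)               /* leaf 7 not available */
        return 0;
    sketch_cpuid(7, 0, r);
    return (int)((r[3] >> 14) & 1u);
}

/* Serialize the instruction stream. Returns 1 if SERIALIZE was executed,
 * 0 if the CPUID fallback (or the non-x86 no-op) was taken. */
static int sketch_serialize(int use_serialize)
{
    uint32_t r[4];
    if (use_serialize) {
#if SKETCH_X86
        __asm__ volatile(".byte 0x0f, 0x01, 0xe8" : : : "memory");
        return 1;
#endif
    }
    sketch_cpuid(0, 0, r);      /* fallback: CPUID leaf 0 */
    return 0;
}
```

A caller would typically probe once at start-up (`int have = sketch_has_serialize();`) and then call `sketch_serialize(have)` wherever serialization is needed, so that the fast path does not pay for CPUID.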
notes | descriptions |
#1 | Only the documented instructions and events are guaranteed to be serializing on future x86 processors. |
#2 | Serializing instructions and events were defined and documented starting with Intel's P5-core processors. |
#3 | To ensure backward compatibility it is (not) recommended to use these. (This does depend on #1 and #2.) |
#4 | Any WRMSR to one of the following MSRs is not serializing: x2APIC MSRs (0000_0800h...0000_0BFFh),
  TSC_DEADLINE (6E0h), PKRS (6E1h), HWP_REQUEST (774h), SPEC_CTRL (48h), PRED_CMD (49h),
  FLUSH_CMD (10Bh), TSX_CTRL (122h), and UARCH_MISC_CTRL (1B01h). |
#5 | The nature of the x86 architecture implies that these instructions and events are serializing nevertheless. |
#6 | In case of an OUTS instruction serialization is not guaranteed until all its iterations have been completed. |
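The WRMSR exception list in note #4 can be expressed as a small predicate. This is only a sketch: the function name `wrmsr_is_serializing` is hypothetical, and it encodes nothing beyond the MSR indices listed in the note.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns false for the MSRs that note #4 lists as non-serializing
 * WRMSR targets, true for every other MSR index. */
bool wrmsr_is_serializing(uint32_t msr)
{
    /* x2APIC MSRs: 0000_0800h...0000_0BFFh */
    if (msr >= 0x800u && msr <= 0xBFFu)
        return false;

    switch (msr) {
    case 0x048u:  /* SPEC_CTRL */
    case 0x049u:  /* PRED_CMD */
    case 0x10Bu:  /* FLUSH_CMD */
    case 0x122u:  /* TSX_CTRL */
    case 0x6E0u:  /* TSC_DEADLINE */
    case 0x6E1u:  /* PKRS */
    case 0x774u:  /* HWP_REQUEST */
    case 0x1B01u: /* UARCH_MISC_CTRL */
        return false;
    default:
        return true;
    }
}
```

For example, `wrmsr_is_serializing(0x6E0)` yields false (TSC_DEADLINE), so code that relies on a WRMSR for ordering would need a separate serializing instruction after such a write.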
TLB invalidation |
|
- writes to CR3 #1
- changes to CR3 during a task switch #1
- changes to CR0.PE
- changes to CR0.PG #2
- changes to CR0.WP #5
- changes to CR4.PSE #2
- changes to CR4.PGE
- changes to CR4.PAE
- changes to EFER.NXE do not invalidate
- a 1-to-0 change of CR4.PCIDE
- a 0-to-1 change of CR4.SMEP
- changes to CR4.SMAP do not invalidate
- changes to EFLAGS.AC do not invalidate
- CLAC and STAC instructions do not invalidate
note: SMAP must be implemented at TLB lookup (rather than fill), and AC
changes must be memory fencing, to achieve guaranteed SMAP behavior
- a 0-to-1 change of CR4.PKE
- a 1-to-0 change of CR4.PKE must not invalidate
- a 0-to-1 change of CR4.PKS
- a 1-to-0 change of CR4.PKS must not invalidate
- INVLPG instruction
- INVLPGA instruction (with AMD SVM)
- INVLPGB instruction (with AMD INVLPGB)
- INVEPT instruction (with Intel VMX)
- INVVPID instruction (with Intel VMX)
- INVPCID instruction (with INVPCID)
- RSM instruction
- writes to MTRRs
- writes to PAT MSR
- writes to APIC_BASE MSR
- SMI #3
- A20M# changes #4
- Remote Action Requests
|
|
notes | descriptions |
#1 | global entries remain if PGE is supported |
#2 | not on Intel P5-core processors |
#3 | if TLB is used to implement SMM remapping |
#4 | if TLB is used to implement A20M# |
#5 | if implemented at TLB fill (rather than lookup) |
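The CR4-related entries of the list above can be sketched as a predicate over the old and new CR4 values. The function name is hypothetical; the bit positions are the architectural CR4 bit numbers. Two assumptions are made: transitions the list does not mention (for example a 0-to-1 change of PCIDE or a 1-to-0 change of SMEP) are treated as not invalidating, and the P5 exception for CR4.PSE (note #2) is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_PSE   (1u << 4)
#define CR4_PAE   (1u << 5)
#define CR4_PGE   (1u << 7)
#define CR4_PCIDE (1u << 17)
#define CR4_SMEP  (1u << 20)
#define CR4_SMAP  (1u << 21)
#define CR4_PKE   (1u << 22)
#define CR4_PKS   (1u << 24)

/* Returns true if a CR4 write from old_cr4 to new_cr4 falls under one of
 * the invalidating cases in the list; false otherwise. */
bool cr4_write_invalidates_tlb(uint32_t old_cr4, uint32_t new_cr4)
{
    uint32_t changed = old_cr4 ^ new_cr4;
    uint32_t set     = changed & new_cr4;  /* bits going 0-to-1 */
    uint32_t cleared = changed & old_cr4;  /* bits going 1-to-0 */

    /* any change to PSE, PGE, or PAE */
    if (changed & (CR4_PSE | CR4_PGE | CR4_PAE))
        return true;
    /* a 1-to-0 change of PCIDE */
    if (cleared & CR4_PCIDE)
        return true;
    /* a 0-to-1 change of SMEP, PKE, or PKS */
    if (set & (CR4_SMEP | CR4_PKE | CR4_PKS))
        return true;
    /* SMAP changes and 1-to-0 changes of PKE/PKS do not invalidate */
    return false;
}
```

For example, setting CR4.SMAP alone gives false, while setting CR4.SMEP alone gives true.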
|
PDPTE-to-PDPTR reloading |
|
- writes to CR3 #1
- changes to CR3 during a task switch #1, #2
- a 0-to-1 change of CR0.PG while CR4.PAE=1 #3
- a 0-to-1 change of CR4.PAE while CR0.PG=1 #3
- RSM instruction #4
- Intel VMX entry: via incoming CR3, or from VMCS
- Intel VMX exit: via incoming CR3
- AMD SVM entry: via incoming CR3
- AMD SVM exit: via incoming CR3
- changes to CR0.CD #5
- changes to CR0.NW #5
- changes to CR4.PSE #5
- changes to CR4.PGE #5
- changes to CR4.SMEP #5
|
|
notes | descriptions |
#1 | while CR0.PG=1 and CR4.PAE=1 |
#2 | Intel P4-core processors always reload |
#3 | 1-to-0 change should set the PDPTRs to 0 |
#4 | SMI should save the PDPTRs in the SSM and set them to 0 (P6 does not, P4 does) |
#5 | unnecessary, but done by Intel processors |
|
store buffer draining |
|
- processor exceptions and external interrupts
- serializing instructions (see above)
- I/O instructions (IN, (REP) INS, OUT, (REP) OUTS)
- LOCKed operations (explicit and implicit)
- SFENCE instruction (if SSE is supported)
- MFENCE instruction (if SSE2 is supported)
- reads from memory regions that are marked as UC
|
MTRR conflicts |
|
   | UC | WC | WT | WP | WB |
UC | UC | UC | UC | UC | UC |
WC | UC | WC | UC | WC | UC |
WT | UC | UC | WT | WT | WT |
WP | UC | WC | WT | WP | WT |
WB | UC | UC | WT | WT | WB |
|
note | Because the behavior of the gray cases is reserved, it should not be relied upon.
  In essence the processor computes the logical AND of all the involved memory
  types, as shown in this table.
|
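The "logical AND" in the note can be checked against the table with a minimal sketch. It assumes the standard MTRR type encodings (UC=0, WC=1, WT=4, WP=5, WB=6); the names `mtrr_type` and `mtrr_overlap_type` are hypothetical. As the note says, results for reserved combinations should not be relied upon.

```c
#include <stddef.h>
#include <stdint.h>

/* MTRR memory type encodings. */
enum mtrr_type {
    MTRR_UC = 0,
    MTRR_WC = 1,
    MTRR_WT = 4,
    MTRR_WP = 5,
    MTRR_WB = 6
};

/* Bitwise AND of the type encodings of all overlapping ranges.
 * Requires count >= 1; the caller handles the no-match case. */
uint8_t mtrr_overlap_type(const uint8_t *types, size_t count)
{
    uint8_t result = types[0];
    for (size_t i = 1; i < count; i++)
        result &= types[i];
    return result;
}
```

For instance, WP (5) AND WB (6) gives 4, which is WT, and WC (1) AND WB (6) gives 0, which is UC, matching the corresponding table cells.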
|
MTRR-PAT conflicts |
|
MTRRs \ PAT | UC | WC | WT | WP | WB | UC- |
UC | UC_M | #1 | UC_M | UC_M | UC_M | UC_M |
WC | UC_P | WC | UC | UC | WC | WC |
WT | UC_P | WC | WT | #2 | WT | UC_P |
WP | UC_P | WC | #2 | WP | WP | UC_P |
WB | UC_P | WC | WT | WP | WB | UC_P |
|
notes | descriptions |
#1 | From an architectural standpoint the processor
  should honour MTRR_DEF_TYPE.E. While set
  to 0 the MTRRs are disabled, memory should
  be treated as UC, and PAT=WC should not be
  able to take precedence; thus the result should
  be UC_M. However, while set to 1 the MTRRs
  are enabled, and PAT=WC should be able to
  take precedence; thus the result should be WC.
  While Intel processors do honour the E bit, AMD
  processors do not -- for them PAT=WC always
  takes precedence; thus their result is always WC. |
#2 | Because the behavior of this particular case is
  reserved, it shouldn't be relied upon. While Intel
  processors compute the logical AND, resulting
  in WT, AMD processors treat this combination
  as explicitly illegal, resulting in UC. |
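The MTRR-PAT table and its two notes can be sketched as a lookup. All names here (`mem_type`, `effective_type`, the `is_amd` and `mtrrs_enabled` parameters) are hypothetical, and the results are returned as the table's own labels. Cell #1 is resolved per note #1 and cell #2 per note #2; since #2 is reserved behavior, that branch is illustrative only.

```c
#include <stdbool.h>
#include <string.h>

/* Row index: MTRR type (UC..WB). Column index: PAT type (UC..UC-). */
enum mem_type { T_UC, T_WC, T_WT, T_WP, T_WB, T_UC_MINUS };

static const char *const mtrr_pat_table[5][6] = {
    /* PAT:     UC      WC    WT      WP      WB      UC-    */
    /* UC */ { "UC_M", "#1", "UC_M", "UC_M", "UC_M", "UC_M" },
    /* WC */ { "UC_P", "WC", "UC",   "UC",   "WC",   "WC"   },
    /* WT */ { "UC_P", "WC", "WT",   "#2",   "WT",   "UC_P" },
    /* WP */ { "UC_P", "WC", "#2",   "WP",   "WP",   "UC_P" },
    /* WB */ { "UC_P", "WC", "WT",   "WP",   "WB",   "UC_P" },
};

/* mtrr must be T_UC..T_WB, pat must be T_UC..T_UC_MINUS.
 * mtrrs_enabled mirrors MTRR_DEF_TYPE.E. */
const char *effective_type(enum mem_type mtrr, enum mem_type pat,
                           bool is_amd, bool mtrrs_enabled)
{
    const char *cell = mtrr_pat_table[mtrr][pat];

    if (strcmp(cell, "#1") == 0)   /* note #1: AMD ignores the E bit */
        return (is_amd || mtrrs_enabled) ? "WC" : "UC_M";
    if (strcmp(cell, "#2") == 0)   /* note #2: Intel ANDs, AMD yields UC */
        return is_amd ? "UC" : "WT";
    return cell;
}
```

For example, `effective_type(T_UC, T_WC, false, false)` returns "UC_M" (Intel, MTRRs disabled), whereas the same call with `is_amd` set to true returns "WC".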
|
© 1996-2025 by Christian Ludloff. All rights reserved. Use at your own risk.
|