x86 architecture
coherency




 
serializing instructions and events
 
doc? #1 instruction or event 486? #2 use? #3 description and comments
yes SERIALIZE n/a yes :-) the modern and (frankly) strongly preferred choice!
yes CPUID no yes :-| may be privileged, slow, modifies regs, variable latency
Intel: no, AMD: yes MFENCE n/a no not a serializing instruction (beware: LFENCE + SFENCE != MFENCE)
instead: a fencing instruction for memory
no SFENCE n/a no not a serializing instruction
instead: a fencing instruction for stores
no LFENCE n/a no not a serializing instruction (beware: very specific verbiage used below)
instead: a fencing instruction for loads
 
does not execute until all prior instructions have completed locally
and no later instruction begins execution until LFENCE completes
 
AMD processors may treat LFENCE as "dispatch serializing" – the
instruction and all previous instructions are forced to retire before
the next instruction is executed (cf. CPUID 8000_0021h.EAX.2=1)
yes IRET yes yes may be privileged
yes RSM yes yes may be privileged, and requires execution within SMM
yes LGDT no no privileged
yes LIDT no no privileged
yes LLDT no no privileged
yes LTR no no privileged
yes INVLPG no no privileged, and implemented badly in Intel P5 and P54
Intel: n/a, AMD: yes INVLPGA n/a no privileged, and requires AMD SVM support
Intel: n/a, AMD: no INVLPGB n/a no privileged, and requires AMD INVLPGB support
Intel: n/a, AMD: yes TLBSYNC n/a no privileged, and requires AMD INVLPGB support
Intel: yes, AMD: n/a INVEPT n/a no privileged, and requires Intel VMX support
Intel: yes, AMD: n/a INVVPID n/a no privileged, and requires Intel VMX support
Intel: no, AMD: yes INVPCID n/a no privileged, and requires INVPCID support
yes INVD no no privileged, and may not write back the cache contents
yes WBINVD yes yes privileged, slow
yes WBNOINVD n/a no privileged, and requires WBNOINVD support
no LMSW yes no privileged
yes MOV to CR0 yes yes privileged
yes MOV to CR2 no no privileged
yes MOV to CR3 no no privileged
yes MOV to CR4 no no privileged
Intel: no, AMD: yes MOV to CR8 n/a no privileged
yes MOV to DR0...7 yes no privileged
yes WRMSR #4 n/a no privileged
yes EENTER aka ENCLU[2] n/a no requires Intel SGX
no exceptions #5 yes no faults, traps, aborts
no interrupts #5 yes no INTR, NMI, SMI, INIT
no branches yes no CALL Ap/Mp/Ev/Jz, RET, RET Iw, RETF, RETF Iw
JMP Ap/Mp/Ev/Jz/Jb, Jcc Jb/Jz (taken), JrCXZ
LOOP, LOOPE, LOOPNE
also: INT Ib, INT1, INT3, INTO (taken), BOUND (taken)
no segment loads no no LDS/LES/LFS/LGS/LSS Gv,Mp
POP DS/ES/FS/GS/SS
MOV Sw,Mw/Rv
no SWAPGS n/a no privileged, and unavailable if CR4.FRED=1
no A20M# changes #6 yes no KBC or PS/2
notes descriptions
#1 Only the documented instructions and events are guaranteed to be serializing on future x86 processors.
#2 Serializing instructions and events were defined and documented starting with Intel's P5-core processors.
#3 For backward compatibility, using the instructions marked yes is recommended, and using those marked no is not. (This depends on #1 and #2.)
#4 Any WRMSR to one of the following MSRs is not serializing: x2APIC MSRs (0000_0800h...0000_0BFFh),
TSC_DEADLINE (6E0h), PKRS (6E1h), HWP_REQUEST (774h), SPEC_CTRL (48h), PRED_CMD (49h),
FLUSH_CMD (10Bh), TSX_CTRL (122h), and UARCH_MISC_CTRL (1B01h).
#5 The nature of the x86 architecture implies that these instructions and events are serializing nevertheless.
#6 In the case of an OUTS instruction, serialization is not guaranteed until all of its iterations have completed.



 
TLB invalidation
 

  • writes to CR3 #1
  • changes to CR3 during a task switch #1

  • changes to CR0.PE
  • changes to CR0.PG #2
  • changes to CR0.WP #5

  • changes to CR4.PSE #2
  • changes to CR4.PGE
  • changes to CR4.PAE

  • changes to EFER.NXE do not invalidate

  • a 1-to-0 change of CR4.PCIDE

  • a 0-to-1 change of CR4.SMEP

  • changes to CR4.SMAP do not invalidate
  • changes to EFLAGS.AC do not invalidate
  • CLAC and STAC instructions do not invalidate
  • note: SMAP must be implemented at TLB lookup (rather than fill), and AC
    changes must be memory fencing, to achieve guaranteed SMAP behavior

  • a 0-to-1 change of CR4.PKE
  • a 1-to-0 change of CR4.PKE must not invalidate
  • a 0-to-1 change of CR4.PKS
  • a 1-to-0 change of CR4.PKS must not invalidate

  • INVLPG instruction
  • INVLPGA instruction (with AMD SVM)
  • INVLPGB instruction (with AMD INVLPGB)
  • INVEPT instruction (with Intel VMX)
  • INVVPID instruction (with Intel VMX)
  • INVPCID instruction (with INVPCID)

  • RSM instruction

  • writes to MTRRs
  • writes to PAT MSR
  • writes to APIC_BASE MSR

  • SMI #3

  • A20M# changes #4

  • Remote Action Requests
notes descriptions
#1 global entries remain if PGE is supported
#2 not on Intel P5-core processors
#3 if TLB is used to implement SMM remapping
#4 if TLB is used to implement A20M#
#5 if implemented at TLB fill (rather than lookup)
 
PDPTE-to-PDPTR reloading
 

  • writes to CR3 #1
  • changes to CR3 during a task switch #1, #2

  • a 0-to-1 change of CR0.PG while CR4.PAE=1 #3
  • a 0-to-1 change of CR4.PAE while CR0.PG=1 #3

  • RSM instruction #4

  • Intel VMX entry: via incoming CR3, or from VMCS
  • Intel VMX exit: via incoming CR3

  • AMD SVM entry: via incoming CR3
  • AMD SVM exit: via incoming CR3

  • changes to CR0.CD #5
  • changes to CR0.NW #5

  • changes to CR4.PSE #5
  • changes to CR4.PGE #5
  • changes to CR4.SMEP #5
notes descriptions
#1 while CR0.PG=1 and CR4.PAE=1
#2 Intel P4-core processors always reload
#3 1-to-0 change should set the PDPTRs to 0
#4 SMI should save the PDPTRs in the SSM and set them to 0 (P6 does not, P4 does)
#5 unnecessary, but done by Intel processors



 
store buffer draining
 

  • processor exceptions and external interrupts
  • serializing instructions (see above)
  • I/O instructions (IN, (REP) INS, OUT, (REP) OUTS)
  • LOCKed operations (explicit and implicit)
  • SFENCE instruction (if SSE is supported)
  • MFENCE instruction (if SSE2 is supported)
  • reads from memory regions that are marked as UC



 
MTRR conflicts
 
  UC WC WT WP WB
UC UC UC UC UC UC
WC UC WC UC WC UC
WT UC UC WT WT WT
WP UC WC WT WP WT
WB UC UC WT WT WB
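note Only UC combined with any type, two identical types, and WT combined with WB are architecturally defined; the behavior of all other combinations (WC with WT, WP, or WB, and WP with WT or WB) is reserved and should not be relied upon. In essence the processor computes the bitwise AND of the involved memory type encodings (UC=0, WC=1, WT=4, WP=5, WB=6), as shown in this table.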
 
MTRR-PAT conflicts
 
MTRRs (rows) vs. PAT (columns)
  UC WC WT WP WB UC-
UC UC_M #1 UC_M UC_M UC_M UC_M
WC UC_P WC UC UC WC WC
WT UC_P WC WT #2 WT UC_P
WP UC_P WC #2 WP WP UC_P
WB UC_P WC WT WP WB UC_P
notes descriptions
#1 From an architectural standpoint the processor should honour MTRR_DEF_TYPE.E. While set to 0 the MTRRs are disabled, memory should be treated as UC, and PAT=WC should not be able to take precedence; thus the result should be UC_M. However, while set to 1 the MTRRs are enabled, and PAT=WC should be able to take precedence; thus the result should be WC. While Intel processors do honour the E bit, AMD processors do not: for them PAT=WC always takes precedence, so their result is always WC.
#2 Because the behavior of this particular case is reserved, it shouldn't be relied upon. While Intel processors compute the logical AND, resulting in WT, AMD processors treat this combination as explicitly illegal, resulting in UC.




© 1996-2024 by Christian Ludloff. All rights reserved. Use at your own risk.