sandpile.org -- x86 architecture -- coherency

x86 architecture
coherency

serializing instructions and events
doc? ^#1	instruction or event	486? ^#2	use? ^#3	description and comments

yes	SERIALIZE	n/a	yes :-)	the modern and (frankly) strongly preferred choice!

yes	CPUID	no	yes :-|	may be privileged, slow, modifies regs, variable latency

Intel: no AMD: yes	~~MFENCE~~	n/a	no	not a serializing instruction (beware: LFENCE + SFENCE != MFENCE); instead: a fencing instruction for memory
no	~~SFENCE~~	n/a	no	not a serializing instruction; instead: a fencing instruction for stores
no	~~LFENCE~~	n/a	no	not a serializing instruction (beware: very specific verbiage used below); instead: a fencing instruction for loads; does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes; AMD processors may treat LFENCE as "dispatch serializing": the instruction and all previous instructions are forced to retire before the next instruction is executed (cf. CPUID 8000_0021h.EAX.2=1)

yes	IRET	yes	yes	may be privileged
yes	RSM	yes	yes	may be privileged, and requires execution within SMM

yes	LGDT	no	no	privileged
yes	LIDT	no	no	privileged
yes	LLDT	no	no	privileged
yes	LTR	no	no	privileged

yes	INVLPG	no	no	privileged, and implemented badly in Intel P5 and P54
Intel: n/a AMD: yes	INVLPGA	n/a	no	privileged, and requires AMD SVM support
Intel: n/a AMD: no	INVLPGB	n/a	no	privileged, and requires AMD INVLPGB support
Intel: n/a AMD: yes	TLBSYNC	n/a	no	privileged, and requires AMD INVLPGB support
Intel: yes AMD: n/a	INVEPT	n/a	no	privileged, and requires Intel VMX support
Intel: yes AMD: n/a	INVVPID	n/a	no	privileged, and requires Intel VMX support
Intel: no AMD: yes	INVPCID	n/a	no	privileged, and requires INVPCID support

yes	INVD	no	no	privileged, and may not write back the cache contents
yes	WBINVD	yes	yes	privileged, slow
yes	WBNOINVD	n/a	no	privileged, and requires WBNOINVD support

no	LMSW	yes	no	privileged
yes	MOV to CR0	yes	yes	privileged
yes	MOV to CR2	no	no	privileged
yes	MOV to CR3	no	no	privileged
yes	MOV to CR4	no	no	privileged
Intel: no AMD: yes	MOV to CR8	n/a	no	privileged
yes	MOV to DR0...7	yes	no	privileged
yes	WRMSR ^#4	n/a	no	privileged

yes	EENTER aka ENCLU[2]	n/a	no	requires Intel SGX

no	exceptions ^#5	yes	no	faults, traps, aborts
no	interrupts ^#5	yes	no	INTR, NMI, SMI, INIT

no	branches	yes	no	CALL Ap/Mp/Ev/Jz, RET, RET Iw, RETF, RETF Iw; JMP Ap/Mp/Ev/Jz/Jb, Jcc Jb/Jz (taken), JrCXZ; LOOP, LOOPE, LOOPNE; also: INT Ib, INT1, INT3, INTO (taken), BOUND (taken)
no	segment loads	no	no	LDS/LES/LFS/LGS/LSS Gv,Mp; POP DS/ES/FS/GS/SS; MOV Sw,Mw/Rv
no	SWAPGS	n/a	no	privileged, and unavailable if CR4.FRED=1

no	A20M# changes ^#6	yes	no	KBC or PS/2

notes	descriptions

#1	Only the documented instructions and events are guaranteed to be serializing on future x86 processors.
#2	Serializing instructions and events were defined and documented starting with Intel's P5-core processors.
#3	To ensure backward compatibility it is (not) recommended to use these. (This does depend on #1 and #2.)
#4	Any WRMSR to one of the following MSRs is not serializing: x2APIC MSRs (0000_0800h...0000_0BFFh), TSC_DEADLINE (6E0h), PKRS (6E1h), HWP_REQUEST (774h), SPEC_CTRL (48h), PRED_CMD (49h), FLUSH_CMD (10Bh), TSX_CTRL (122h), and UARCH_MISC_CTRL (1B01h).
#5	The nature of the x86 architecture implies that these instructions and events are serializing nevertheless.
#6	In case of an OUTS instruction serialization is not guaranteed until all its iterations have been completed.
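
The exceptions in note #4, and the CPUID bit cited in the LFENCE row, can be restated as two small predicates. This is only a sketch of the table's content, not a way to query real hardware: the function names are made up for illustration, and the caller is assumed to supply the MSR index and the CPUID 8000_0021h EAX value.

```python
# MSRs that note #4 lists as NOT serializing when written with WRMSR.
NON_SERIALIZING_MSRS = {
    0x6E0: "TSC_DEADLINE",
    0x6E1: "PKRS",
    0x774: "HWP_REQUEST",
    0x48: "SPEC_CTRL",
    0x49: "PRED_CMD",
    0x10B: "FLUSH_CMD",
    0x122: "TSX_CTRL",
    0x1B01: "UARCH_MISC_CTRL",
}
# x2APIC MSRs: 0000_0800h...0000_0BFFh inclusive.
X2APIC_MSRS = range(0x800, 0xC00)


def wrmsr_is_serializing(msr: int) -> bool:
    """Hypothetical helper: False for the MSRs exempted by note #4."""
    return msr not in X2APIC_MSRS and msr not in NON_SERIALIZING_MSRS


def lfence_is_dispatch_serializing(cpuid_8000_0021_eax: int) -> bool:
    """Hypothetical helper: test bit 2 of a caller-supplied EAX value."""
    return bool(cpuid_8000_0021_eax & (1 << 2))
```

For example, `wrmsr_is_serializing(0x830)` is False because 830h falls inside the x2APIC range, while `wrmsr_is_serializing(0xC00)` is True because it lies just past it.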

TLB invalidation

writes to CR3 ^#1
changes to CR3 during a task switch ^#1
changes to CR0.PE
changes to CR0.PG ^#2
changes to CR0.WP ^#5
changes to CR4.PSE ^#2
changes to CR4.PGE
changes to CR4.PAE
changes to EFER.NXE do not invalidate
a 1-to-0 change of CR4.PCIDE
a 0-to-1 change of CR4.SMEP
changes to CR4.SMAP do not invalidate
changes to EFLAGS.AC do not invalidate
CLAC and STAC instructions do not invalidate
note: SMAP must be implemented at TLB lookup (rather than fill), and AC changes must be memory fencing, to achieve guaranteed SMAP behavior
a 0-to-1 change of CR4.PKE
a 1-to-0 change of CR4.PKE must not invalidate
a 0-to-1 change of CR4.PKS
a 1-to-0 change of CR4.PKS must not invalidate
INVLPG instruction
INVLPGA instruction (with AMD SVM)
INVLPGB instruction (with AMD INVLPGB)
INVEPT instruction (with Intel VMX)
INVVPID instruction (with Intel VMX)
INVPCID instruction (with INVPCID)
RSM instruction
writes to MTRRs
writes to PAT MSR
writes to APIC_BASE MSR
SMI ^#3
A20M# changes ^#4
Remote Action Requests

notes	descriptions

#1	global entries remain if PGE is supported
#2	not on Intel P5-core processors
#3	if TLB is used to implement SMM remapping
#4	if TLB is used to implement A20M#
#5	if implemented at TLB fill (rather than lookup)
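
The CR4 entries in the list above depend on the direction of the bit change, which is easy to misread. The sketch below restates only those CR4 rules as a lookup. It assumes the usual CR4 bit positions, ignores note #2 (the P5 exception for PSE), and treats any CR4 change the list does not mention as non-invalidating; the function name is hypothetical.

```python
# CR4 bit positions (architectural).
PSE, PAE, PGE, PCIDE, SMEP, SMAP, PKE, PKS = 4, 5, 7, 17, 20, 21, 22, 24

ANY, SET, CLEAR = "any", "0-to-1", "1-to-0"

# Which direction of change invalidates the TLB, per the list above.
# SMAP is deliberately absent: its changes do not invalidate.
INVALIDATING_CR4_CHANGES = {
    PSE: ANY,
    PGE: ANY,
    PAE: ANY,
    PCIDE: CLEAR,
    SMEP: SET,
    PKE: SET,
    PKS: SET,
}


def cr4_write_invalidates_tlb(old_cr4: int, new_cr4: int) -> bool:
    """Hypothetical helper: True if any changed bit matches a listed rule."""
    changed = old_cr4 ^ new_cr4
    for bit, direction in INVALIDATING_CR4_CHANGES.items():
        mask = 1 << bit
        if not changed & mask:
            continue  # this bit did not change
        now_set = bool(new_cr4 & mask)
        if direction == ANY or (direction == SET) == now_set:
            return True
    return False
```

For example, setting CR4.PKE (`cr4_write_invalidates_tlb(0, 1 << 22)`) yields True, while clearing it again yields False.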

PDPTE-to-PDPTR reloading

writes to CR3 ^#1
changes to CR3 during a task switch ^#1 ^#2
a 0-to-1 change of CR0.PG while CR4.PAE=1 ^#3
a 0-to-1 change of CR4.PAE while CR0.PG=1 ^#3
RSM instruction ^#4
Intel VMX entry: via incoming CR3, or from VMCS
Intel VMX exit: via incoming CR3
AMD SVM entry: via incoming CR3
AMD SVM exit: via incoming CR3
changes to CR0.CD ^#5
changes to CR0.NW ^#5
changes to CR4.PSE ^#5
changes to CR4.PGE ^#5
changes to CR4.SMEP ^#5

notes	descriptions

#1	while CR0.PG=1 and CR4.PAE=1
#2	Intel P4-core processors always reload
#3	1-to-0 change should set the PDPTRs to 0
#4	SMI should save the PDPTRs in the SSM and set them to 0 (P6 does not, P4 does)
#5	unnecessary, but done by Intel processors
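
The first four entries and note #1 can be sketched as predicates over CR0 and CR4. This is an illustration only: it assumes legacy PAE paging (not long mode), covers none of the RSM, VMX, SVM, or note #5 cases, and uses hypothetical function names.

```python
CR0_PG = 1 << 31  # paging enable
CR4_PAE = 1 << 5  # physical address extension


def cr3_write_reloads_pdptrs(cr0: int, cr4: int) -> bool:
    """Note #1: a CR3 write reloads only while CR0.PG=1 and CR4.PAE=1."""
    return bool(cr0 & CR0_PG) and bool(cr4 & CR4_PAE)


def cr0_write_reloads_pdptrs(old_cr0: int, new_cr0: int, cr4: int) -> bool:
    """A 0-to-1 change of CR0.PG while CR4.PAE=1."""
    pg_set = not old_cr0 & CR0_PG and bool(new_cr0 & CR0_PG)
    return pg_set and bool(cr4 & CR4_PAE)


def cr4_write_reloads_pdptrs(old_cr4: int, new_cr4: int, cr0: int) -> bool:
    """A 0-to-1 change of CR4.PAE while CR0.PG=1."""
    pae_set = not old_cr4 & CR4_PAE and bool(new_cr4 & CR4_PAE)
    return pae_set and bool(cr0 & CR0_PG)
```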

store buffer draining

processor exceptions and external interrupts
serializing instructions (see above)
I/O instructions (IN, (REP) INS, OUT, (REP) OUTS)
LOCKed operations (explicit and implicit)
SFENCE instruction (if SSE is supported)
MFENCE instruction (if SSE2 is supported)
reads from memory regions that are marked as UC

MTRR conflicts

	UC	WC	WT	WP	WB
UC	UC	UC	UC	UC	UC
WC	UC	WC	UC	WC	UC
WT	UC	UC	WT	WT	WT
WP	UC	WC	WT	WP	WT
WB	UC	UC	WT	WT	WB

note	Because the behavior of the gray cases is reserved, it should not be relied upon. In essence the processor computes the logical AND of all the involved memory types, as shown in this table.
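
The "logical AND" in the note can be checked against the architectural memory type encodings (UC=0, WC=1, WT=4, WP=5, WB=6): the bitwise AND of two encodings reproduces every cell of the table. A minimal sketch, with hypothetical names, and subject to the same caveat that reserved combinations should not be relied upon:

```python
# Architectural MTRR memory type encodings.
MTRR_TYPE = {"UC": 0, "WC": 1, "WT": 4, "WP": 5, "WB": 6}
TYPE_NAME = {value: name for name, value in MTRR_TYPE.items()}


def overlap_type(a: str, b: str) -> str:
    """Hypothetical helper: bitwise AND of the two type encodings."""
    return TYPE_NAME[MTRR_TYPE[a] & MTRR_TYPE[b]]


def overlap_table() -> dict:
    """Build the full 5x5 table as {row: {column: result}}."""
    names = list(MTRR_TYPE)
    return {row: {col: overlap_type(row, col) for col in names} for row in names}
```

For example, `overlap_type("WP", "WB")` is 5 AND 6 = 4, that is, WT, matching the table.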

MTRR-PAT conflicts

MTRR \ PAT	UC	WC	WT	WP	WB	UC-
UC	UC_M	#1	UC_M	UC_M	UC_M	UC_M
WC	UC_P	WC	UC	UC	WC	WC
WT	UC_P	WC	WT	#2	WT	UC_P
WP	UC_P	WC	#2	WP	WP	UC_P
WB	UC_P	WC	WT	WP	WB	UC_P

notes		descriptions

#1		From an architectural standpoint the processor should honour MTRR_DEF_TYPE.E. While set to 0 the MTRRs are disabled, memory should be treated as UC, and PAT=WC should not be able to take precedence; thus the result should be UC_M. However, while set to 1 the MTRRs are enabled, and PAT=WC should be able to take precedence; thus the result should be WC. While Intel processors do honour the E bit, AMD processors do not -- for them PAT=WC always takes precedence; thus their result is always WC.
#2		Because the behavior of this particular case is reserved, it shouldn't be relied upon. While Intel processors compute the logical AND, resulting in WT, AMD processors treat this combination as explicitly illegal, resulting in UC.
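
The table and its two notes can be combined into one lookup. The sketch below copies the cells verbatim (including the UC_M and UC_P labels) and resolves #1 and #2 as the notes describe. The function name and the `vendor` and `mtrrs_enabled` parameters are assumptions made for illustration; `mtrrs_enabled` stands in for MTRR_DEF_TYPE.E.

```python
# Rows: MTRR type. Columns: PAT type. Cells copied from the table above.
MTRR_PAT = {
    "UC": {"UC": "UC_M", "WC": "#1", "WT": "UC_M", "WP": "UC_M", "WB": "UC_M", "UC-": "UC_M"},
    "WC": {"UC": "UC_P", "WC": "WC", "WT": "UC", "WP": "UC", "WB": "WC", "UC-": "WC"},
    "WT": {"UC": "UC_P", "WC": "WC", "WT": "WT", "WP": "#2", "WB": "WT", "UC-": "UC_P"},
    "WP": {"UC": "UC_P", "WC": "WC", "WT": "#2", "WP": "WP", "WB": "WP", "UC-": "UC_P"},
    "WB": {"UC": "UC_P", "WC": "WC", "WT": "WT", "WP": "WP", "WB": "WB", "UC-": "UC_P"},
}


def effective_type(mtrr: str, pat: str, vendor: str = "intel",
                   mtrrs_enabled: bool = True) -> str:
    """Hypothetical helper: resolve a cell, applying notes #1 and #2."""
    cell = MTRR_PAT[mtrr][pat]
    if cell == "#1":
        # Note #1: Intel honours the E bit; on AMD, PAT=WC always wins.
        if vendor == "amd" or mtrrs_enabled:
            return "WC"
        return "UC_M"
    if cell == "#2":
        # Note #2: Intel computes the logical AND (WT); AMD yields UC.
        return "UC" if vendor == "amd" else "WT"
    return cell
```

For example, `effective_type("WT", "WP", vendor="amd")` gives UC, while the same combination with the default `vendor="intel"` gives WT. Since note #2 marks this case as reserved, neither result should be relied upon.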