2 Introduction
ARM IHI 0070 H.a. Copyright © 2016-2026 Arm Limited or its affiliates. All rights reserved. Non-confidential.

A System Memory Management Unit (SMMU) performs a task analogous to that of an MMU in a PE: it translates addresses for DMA requests from system I/O devices before the requests are passed into the system interconnect. It is active for DMA only. Traffic in the other direction, from the system or PE to the device, is managed by other means, for example the PE MMUs.

Figure 2.1: System MMU in DMA traffic (blocks shown: Device, SMMU, PE, MMU, Memory)

Translation of DMA addresses might be performed for reasons of isolation or convenience.

To associate device traffic with translations, and to differentiate the devices behind an SMMU, each request carries an extra property, alongside address, read/write and permissions, that identifies a stream. Different streams are logically associated with different devices, and the SMMU can perform different translations or checks for each stream. In systems with exactly one client device served by an SMMU the concept still stands, but there might be only one stream.

Several SMMUs might exist within a system. An SMMU might translate traffic from just one device or from a set of devices.

The SMMU supports two stages of translation, in a similar way to PEs supporting the Virtualization Extensions [2]. Each stage of translation can be independently enabled. An incoming address is logically translated from VA to IPA in stage 1, then the IPA is input to stage 2, which translates the IPA to the output PA. Stage 1 is intended to be used by a software entity to provide isolation or translation to buffers within the entity, for example DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation configuration is called nested.
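The per-stream, two-stage flow described above can be pictured with a small model. This is only an illustrative sketch: the `StreamConfig` type, the flat page-number dictionaries and the 4KB page size are assumptions made for the example, and real SMMU configuration uses in-memory structures and translation tables rather than Python dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PAGE_SHIFT = 12                      # assumption: 4KB pages, for illustration only
PAGE_MASK = (1 << PAGE_SHIFT) - 1

PageMap = Dict[int, int]             # input page number -> output page number


@dataclass
class StreamConfig:
    """Hypothetical per-stream configuration; None means the stage is disabled."""
    stage1: Optional[PageMap] = None  # VA -> IPA
    stage2: Optional[PageMap] = None  # IPA -> PA


def translate_stage(addr: int, table: Optional[PageMap]) -> int:
    """Translate one stage; a disabled stage passes the address through."""
    if table is None:
        return addr
    page = addr >> PAGE_SHIFT
    if page not in table:
        raise LookupError(f"translation fault at {addr:#x}")
    return (table[page] << PAGE_SHIFT) | (addr & PAGE_MASK)


def smmu_translate(streams: Dict[int, StreamConfig], stream_id: int, addr: int) -> int:
    """Select the configuration by stream, then apply stage 1 followed by stage 2."""
    cfg = streams[stream_id]
    ipa = translate_stage(addr, cfg.stage1)   # VA -> IPA
    return translate_stage(ipa, cfg.stage2)   # IPA -> PA


streams = {
    0: StreamConfig(stage1={0x10: 0x80}, stage2={0x80: 0x400}),  # nested
    1: StreamConfig(stage1={0x10: 0x90}),                        # stage 1 only
}
```

In this model the same input address `0x10123` produces a different output for each stream, which is the point of identifying streams: stream 0 uses nested translation, while stream 1 uses stage 1 only.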
2.1 History

• SMMUv1 supports a modest number of contexts/streams configured using registers, limiting scalability.
• SMMUv2 extends SMMUv1 with Armv8-A translation table formats and large addresses, with the same limited number of contexts and streams.

SMMUv1 and SMMUv2 map an incoming data stream onto one of many register-based context banks, which indicate the translation tables and translation configuration to use. The context bank might also indicate a second context bank for nested translation of a second stage (stage 1 and stage 2). The stream is identified using an externally-generated ID supplied with each transaction. A second ID might be supplied to determine the Security state of a stream or group of streams. The use of register-based configuration limits the number of context banks, so support of thousands of concurrent contexts is not possible.

Because live data streams might present transactions at any time, the available number of contexts limits the number of streams that might be concurrently enabled. For example, a system might have 1000 network interfaces that might all be idle but whose DMA might be triggered by incoming traffic at any time. The streams must be constantly available to function correctly. It is usually not possible to time-division multiplex a context between many devices requiring service.

The SMMU programming interface register SMMU_AIDR indicates which SMMU architecture version the SMMU implements, as follows:

• If SMMU_AIDR[7:0] == 0x00, the SMMU implements SMMUv3.0.
• If SMMU_AIDR[7:0] == 0x01, the SMMU implements SMMUv3.1.
• If SMMU_AIDR[7:0] == 0x02, the SMMU implements SMMUv3.2.
• If SMMU_AIDR[7:0] == 0x03, the SMMU implements SMMUv3.3.
• If SMMU_AIDR[7:0] == 0x04, the SMMU implements SMMUv3.4.
• If SMMU_AIDR[7:0] == 0x05, the SMMU implements SMMUv3.5.

Unless specified otherwise, all architecture behaviors apply equally to all minor revisions of SMMUv3.
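The SMMU_AIDR encoding listed above lends itself to a simple lookup. The following is a minimal sketch, assuming the raw 32-bit register value has already been read by some platform-specific means; the function name `decode_aidr` and the handling of unlisted values are choices made for this example.

```python
# Values of SMMU_AIDR[7:0] taken from the list above.
SMMU_AIDR_VERSIONS = {
    0x00: "SMMUv3.0",
    0x01: "SMMUv3.1",
    0x02: "SMMUv3.2",
    0x03: "SMMUv3.3",
    0x04: "SMMUv3.4",
    0x05: "SMMUv3.5",
}


def decode_aidr(aidr: int) -> str:
    """Map a raw SMMU_AIDR value to an architecture version string."""
    arch = aidr & 0xFF  # only bits [7:0] carry the architecture revision
    # Values not listed above are reported as unknown rather than guessed.
    return SMMU_AIDR_VERSIONS.get(arch, f"unknown (SMMU_AIDR[7:0] = {arch:#04x})")
```

A driver would typically use the decoded version only as a coarse hint and rely on the individual ID register fields for feature discovery.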
2.2 SMMUv3.0 features

SMMUv3 provides features to complement PCI Express [1] Root Complexes and other potentially large I/O systems by supporting large numbers of concurrent translation contexts.

• Memory-based configuration structures to support large numbers of streams.
• Implementations might support only stage 1, only stage 2, or both stages of translation. This capability, and other IMPLEMENTATION SPECIFIC options, can be discovered from the register interface.
• Up to 16-bit ASIDs.
• Up to 16-bit VMIDs [2].
• Address translation and protection according to the Armv8.1 [2] Virtual Memory System Architecture. SMMU translation tables are shareable with PEs, allowing software the choice of sharing an existing table or creating an SMMU-private table.
• 49-bit VA, matching Armv8-A’s 2×48-bit translation table input sizes.

Support for the following is optional in an implementation:

• Either stage 1 or stage 2.
• Stage 1 and 2 support for the VMSAv8-32 LPAE and VMSAv8-64 translation table formats.
• Secure stream support.
• Broadcast TLB invalidation.
• Hardware Translation Table Update (HTTU) of the Access flag and dirty state of a page. An implementation might support update of the Access flag only, update of both the Access flag and the dirty state of the page, or no HTTU.
• PCIe ATS [1] and PRI, when used with a compatible Root Complex.
• 16KB and 64KB page granules. However, the presence of 64KB page granules at both stage 1 and stage 2 is suggested to align with the PE requirements in the Server Base System Architecture.

Because the support of large numbers of streams using in-memory configuration causes the SMMUv3 programming interface to be significantly different from that of SMMUv2 [4], SMMUv3 is not designed to be backward-compatible with SMMUv2.

| SMMU feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.0-ASID16 | Support for 16-bit ASIDs, see SMMU_IDR0.ASID16. | |
| SMMUv3.0-ATS | Support for PCIe ATS, see SMMU_IDR0.ATS and [1]. | |
| SMMUv3.0-BTM | Support for broadcast of TLB maintenance, see SMMU_IDR0.BTM. | |
| SMMUv3.0-HAD | Support for disabling hierarchical attributes in translation tables, see SMMU_IDR3.HAD. | FEAT_HPDS |
| SMMUv3.0-HTTUA, SMMUv3.0-HTTUD | Support for hardware translation table Access and dirty state, see SMMU_IDR0.HTTU. | FEAT_HAFDBS |
| SMMUv3.0-Hyp | Hypervisor stage 1 contexts supported, see SMMU_IDR0.HYP. | FEAT_VHE, EL2 |
| SMMUv3.0-GRAN4K | Support for 4KB translation granule, see SMMU_IDR5.GRAN4K. | |
| SMMUv3.0-GRAN16K | Support for 16KB translation granule, see SMMU_IDR5.GRAN16K. | |
| SMMUv3.0-GRAN64K | Support for 64KB translation granule, see SMMU_IDR5.GRAN64K. | |
| SMMUv3.0-PRI | Support for PCIe Page Request Interface, see SMMU_IDR0.PRI and [1]. | |
| SMMUv3.0-S1P | Support for Stage 1 translations, see SMMU_IDR0.S1P. | |
| SMMUv3.0-S2P | Support for Stage 2 translations, see SMMU_IDR0.S2P. | |
| SMMUv3.0-SECURE_IMPL | Support for Secure and Non-secure streams, see SMMU_S_IDR1.SECURE_IMPL. | |
| SMMUv3.0-TTFAA32 | Support for VMSAv8-32 LPAE format translation tables. | |
| SMMUv3.0-TTFAA64 | Support for VMSAv8-64 format translation tables. | |
| SMMUv3.0-VMID16 | Support for 16-bit VMID, see SMMU_IDR0.VMID16. | FEAT_VMID16 |
| SMMUv3.0-ATOS | Support for address translation operation registers, see SMMU_IDR0.ATOS. | |
| SMMUv3.0-VATOS | Support for stage 1-only address translation operation registers, see SMMU_IDR0.VATOS. | |

SMMUv3.0 also includes a Performance Monitor Counter Group extension, with the following optional features:

| SMMU PMCG feature name | Description |
|---|---|
| SMMU_PMCGv3.0-SID_FILTER_TYPE_ALL | Support for filtering of event counts on a global or per-event basis. See SMMU_PMCG_CFGR.SID_FILTER_TYPE. |
| SMMU_PMCGv3.0-CAPTURE | Support for software-initiated capture of counter values. See SMMU_PMCG_CFGR.CAPTURE. |
| SMMU_PMCGv3.0-MSI | Support for PMCG-originated MSIs. See SMMU_PMCG_CFGR.MSI. |
| SMMU_PMCGv3.0-RELOC_CTRS | Support for exposing PMCG event counts in an independent page of address space. See SMMU_PMCG_CFGR.RELOC_CTRS. |
| SMMU_PMCGv3.0-SECURE_IMPL | Support for counting events from more than one Security state. See SMMU_PMCG_SCR bit [31]. |
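Most SMMUv3.0 features in the table above are discovered from fields of SMMU_IDR0. The sketch below decodes a handful of them from a raw register value. The bit positions and the HTTU encoding are not given in this chapter: they are assumptions for illustration and must be checked against the SMMU_IDR0 register description before use. The names `IDR0_FLAG_BITS` and `decode_idr0` are hypothetical.

```python
from typing import Dict, Union

# ASSUMED bit positions of single-bit SMMU_IDR0 fields (verify against the
# register description; they are not stated in this chapter).
IDR0_FLAG_BITS = {
    "S2P": 0,
    "S1P": 1,
    "BTM": 5,
    "HYP": 9,
    "ATS": 10,
    "ASID16": 12,
    "PRI": 16,
    "VMID16": 18,
}

# ASSUMED position and encoding of the two-bit HTTU field.
IDR0_HTTU_SHIFT = 6
IDR0_HTTU_MASK = 0b11
HTTU_MEANING = {
    0b00: "none",
    0b01: "Access flag",
    0b10: "Access flag and dirty state",
}


def decode_idr0(value: int) -> Dict[str, Union[bool, str]]:
    """Return the decoded feature fields of a raw SMMU_IDR0 value."""
    features: Dict[str, Union[bool, str]] = {
        name: bool((value >> bit) & 1) for name, bit in IDR0_FLAG_BITS.items()
    }
    httu = (value >> IDR0_HTTU_SHIFT) & IDR0_HTTU_MASK
    features["HTTU"] = HTTU_MEANING.get(httu, "reserved")
    return features
```

Keeping the field layout in a table rather than in scattered shifts makes it easy to correct a position once it has been checked against the register description.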
2.3 SMMUv3.1 features

SMMUv3.1 extends the base SMMUv3.0 architecture with the following features:

• Support for PEs implementing Armv8.2-A:
  – Support for 52-bit VA, IPA, and PA.
    * Note: An SMMUv3.1 implementation is not required to support 52-bit addressing, but the SMMUv3.1 architecture extends fields to allow an implementation the option of doing so.
  – Page-Based Hardware Attributes (PBHA).
  – EL0 vs EL1 execute-never controls in stage 2 translation tables.
  – Note: Armv8.2 introduces a Common not Private (CnP) concept to the PE which does not apply to the SMMU architecture, because all SMMU translations are treated as common.
• Support for transactions that perform cache-stash or destructive read side effects.
• Performance Monitor Counter Group (PMCG) error status.

| SMMU feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.1-XNX | Provides support for translation table stage 2 Unprivileged Execute-never, see SMMU_IDR3.XNX. | FEAT_XNX |
| SMMUv3.1-TTPBHA | Provides support for translation table page-based hardware attributes, see SMMU_IDR3.PBHA. | FEAT_HPDS2 |
| SMMUv3.1-VAX | Support for large Virtual Address space, see SMMU_IDR5.VAX. | FEAT_LVA |
| SMMUv3.1-LPA | Support for large Physical Address space, see SMMU_IDR5.OAS. | FEAT_LPA |
2.4 SMMUv3.2 features

SMMUv3.2 extends the SMMUv3.1 architecture with the following features:

• Support for PEs implementing Armv8.4-A [2]:
  – Support for Memory System Resource Partitioning and Monitoring (MPAM) [3].
    * Note: Support for MPAM is optional in SMMUv3.2.
  – Secure EL2 and Secure stage 2 translation.
    * All previous rules about Secure streams being stage 1 only are removed.
  – Stage 2 control of memory types and cacheability.
  – Small translation tables support.
  – Range-based TLB invalidation and Level Hint.
  – Translation table updates without break-before-make.
• Introduction of a Virtual Machine Structure for describing some per-VM configuration.

| SMMU feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.2-BBML1, SMMUv3.2-BBML2 | Support for change in size of translation table mappings, see SMMU_IDR3.BBML. | FEAT_BBML1, FEAT_BBML2 |
| SMMUv3.2-RIL | Support for range-based TLB invalidation and level hint, see SMMU_IDR3.RIL. | FEAT_TTL, FEAT_TLBIRANGE |
| SMMUv3.2-SecEL2 | Support for Secure EL2 and Secure stage 2 translations, see SMMU_S_IDR1.SEL2. | FEAT_SEL2 |
| SMMUv3.2-STT | Support for small translation tables, see SMMU_IDR3.STT. | FEAT_TTST |
| SMMUv3.2-MPAM | Support for Memory System Resource Partitioning and Monitoring, see SMMU_IDR3.MPAM. | FEAT_MPAM |
| SMMUv3.2-S2FWB | Support for stage 2 forced Write-Back, see SMMU_IDR3.FWB. | FEAT_S2FWB |

SMMUv3.2 also introduces the following optional features to the PMCG extension:

| SMMU PMCG feature name | Description |
|---|---|
| SMMU_PMCGv3.2-MPAM | Support for associating PMCG-originated MSIs with specific MPAM PARTID and PMG values. See SMMU_PMCG_CFGR.MPAM. |
2.5 SMMUv3.3 features

SMMUv3.3 extends the SMMUv3.2 architecture with the following features:

• Support for features of PEs implementing Armv8.5 [2]:
  – E0PD feature, equivalent to FEAT_E0PD introduced in Armv8.5.
  – Protected Table Walk (PTW) behavior alignment with Armv8.
  – MPAM_NS mechanism, for alignment with FORCE_NS feature [3].
  – Requirements for interaction with the Memory Tagging Extension [2].
• Enhanced Command queue interface for reducing contention when submitting Commands to the SMMU.
• Support for recording non-Translation-related events for ATS Translation Requests.
• Guidelines for RAS error recording.

| SMMU feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.3-E0PD | Mandatory. Support for preventing EL0 access to halves of address maps. See SMMU_IDR3.E0PD. | FEAT_E0PD |
| SMMUv3.3-PTWNNC | Mandatory. Support for treating table walks to Device memory as Normal Non-cacheable. See SMMU_IDR3.PTWNNC. | |
| SMMUv3.3-MPAM_NS | Optional. Support for Secure transactions using Non-secure PARTID space. See SMMU_S_MPAMIDR.HAS_MPAM_NS. | |
| SMMUv3.3-ECMDQ | Optional. Support for Enhanced Command queue interfaces. See SMMU_IDR1.ECMDQ. | |
| SMMUv3.3-SEC_ECMDQ | Optional. Support for Enhanced Command queue interfaces for Secure state. See SMMU_S_IDR0.ECMDQ. | |
| SMMUv3.3-ATSRECERR | Optional. Support for recording events on configuration errors for ATS translation requests. See SMMU_IDR0.ATSRECERR. | |

SMMUv3.3 also introduces the following optional features to the PMCG extension:

| SMMU PMCG feature name | Description |
|---|---|
| SMMU_PMCGv3.3-FILTER_MPAM | Support for filtering event counts by MPAM attributes. See SMMU_PMCG_CFGR.FILTER_PARTID_PMG. |
| SMMU_PMCGv3.3-MPAM_NS | Support for issuing PMCG MSIs for Secure state, associated with a Non-secure MPAM PARTID. See SMMU_PMCG_S_MPAMIDR.HAS_MPAM_NS. |
2.6 SMMU for RME features

SMMU for RME introduces support for Granule Protection Checks, for interoperability with PEs that implement FEAT_RME [2].

There are two aspects to RME support for SMMU:

• Whether the SMMU has the Root programming interface and can perform Granule Protection Checks. This is advertised with SMMU_ROOT_IDR0.ROOT_IMPL == 1.
• Whether the SMMU has RME-related changes exposed to the Secure and Non-secure programming interfaces. This is advertised with SMMU_IDR0.RME_IMPL == 1.

Any SMMU behaviors specified as applying to an SMMU with RME apply to an SMMU implementation with SMMU_ROOT_IDR0.ROOT_IMPL == 1.

An SMMU with RME must have SMMU_ROOT_IDR0.ROOT_IMPL == 1. It is permitted for an SMMU with RME to have SMMU_IDR0.RME_IMPL == 0.

An SMMU with RME also implements SMMUv3.2 or later.

An SMMU with SMMU_IDR0.RME_IMPL == 1 does not support the EL3 StreamWorld. This means that:

• An STE with STRW configured for EL3 is ILLEGAL and results in C_BAD_STE.
• The commands CMD_TLBI_EL3_ALL and CMD_TLBI_EL3_VA result in CERROR_ILL.
• The SMMU is not required to perform any invalidation on receipt of a broadcast TLBI for EL3.

Note: The value of SMMU_IDR0.RME_IMPL does not affect support for other features associated with Secure state.

See also 3.25 Granule Protection Checks.

| SMMU RME feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.3-RME_ROOT_IMPL | Support for the Root programming interface. See SMMU_ROOT_IDR0.ROOT_IMPL. | FEAT_RME |
| SMMUv3.3-RME_IMPL | Support for visibility of GPC faults to the Non-secure, Secure and Realm programming interfaces, if supported. See SMMU_IDR0.RME_IMPL. | FEAT_RME |
| SMMUv3.3-RME_BGPTM | Support for broadcast TLBI PA operations. See SMMU_ROOT_IDR0.BGPTM. | FEAT_RME |
| SMMUv3.3-RME_RGPTM | Support for register TLBI by PA. See SMMU_ROOT_IDR0.RGPTM. | |

An SMMU with RME implements either SMMUv3.3-RME_ROOT_IMPL or SMMUv3.3-RME_IMPL.
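The consequences of SMMU_IDR0.RME_IMPL == 1 for the EL3 StreamWorld, described above, can be expressed as two small predicates. This is a sketch only: the function names are hypothetical, the ID field values are assumed to have been read already, and a `None` result means only that the RME rule does not apply, not that the STE or command is otherwise legal.

```python
from typing import Optional

# Commands named in the text above as resulting in CERROR_ILL.
EL3_TLBI_COMMANDS = frozenset({"CMD_TLBI_EL3_ALL", "CMD_TLBI_EL3_VA"})


def has_rme(root_impl: int) -> bool:
    """Behaviors for 'an SMMU with RME' apply when ROOT_IMPL == 1."""
    return root_impl == 1


def rme_command_error(command: str, rme_impl: int) -> Optional[str]:
    """Return 'CERROR_ILL' if the command is illegal because RME_IMPL == 1."""
    if rme_impl == 1 and command in EL3_TLBI_COMMANDS:
        return "CERROR_ILL"
    return None


def rme_ste_error(strw_is_el3: bool, rme_impl: int) -> Optional[str]:
    """Return 'C_BAD_STE' if an EL3 StreamWorld STE is ILLEGAL because RME_IMPL == 1."""
    if rme_impl == 1 and strw_is_el3:
        return "C_BAD_STE"
    return None
```

Note that the two ID fields are deliberately kept separate: an SMMU with ROOT_IMPL == 1 is permitted to report RME_IMPL == 0, in which case neither predicate reports an error.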
2.7 SMMU for RME DA features

SMMU for RME DA introduces features that enable the association between devices and software executing in the Realm Security state. See [2].

Any SMMU behavior specified as applying to an SMMU with RME DA applies to an SMMU implementation with SMMU_ROOT_IDR0.REALM_IMPL == 1. This means that such implementations support the Realm programming interface.

| SMMU RME DA feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.3-RME_DA | Support for the Realm programming interface. See SMMU_ROOT_IDR0.REALM_IMPL. | FEAT_RME |
| SMMUv3.3-MEC_R | Support for the RME Memory Encryption Contexts extension. See SMMU_R_IDR3.MEC. | FEAT_MEC |
| SMMUv3.3-DPT_R | Support for Device Permission Table in Realm state. See SMMU_R_IDR3.DPT. | |
| SMMUv3.3-DPT_NS | Support for Device Permission Table in Non-secure state. See SMMU_IDR3.DPT. | |

An SMMU with RME DA implements SMMUv3.3-RME_DA.

2.7.1 Required features

An SMMU with SMMU_ROOT_IDR0.REALM_IMPL == 1 implements all the mandatory features from SMMUv3.3, including the following requirements:

| Register field | Value | Notes |
|---|---|---|
| SMMU_IDR3.PTWNNC | 1 | Mandatory from SMMUv3.3 onwards. |
| SMMU_IDR3.E0PD | 1 | Mandatory from SMMUv3.3 onwards. |
| SMMU_IDR3.STT | 1 | Mandatory because of Secure EL2 requirement. |
| SMMU_IDR3.FWB | 1 | Mandatory from SMMUv3.2. |
| SMMU_IDR3.XNX | 1 | Mandatory from SMMUv3.1. |
| SMMU_IDR3.HAD | 1 | Mandatory from SMMUv3.1. |

An SMMU with SMMU_ROOT_IDR0.REALM_IMPL == 1 additionally has the following features:

| Register field | Value | Notes |
|---|---|---|
| SMMU_IDR0.Hyp | 1 | Required for EL2. |
| SMMU_IDR0.S1P | 1 | Required for stage 1 translation. |
| SMMU_IDR0.S2P | 1 | Required for stage 2 translation. |
| SMMU_IDR0.TTF | 0b10 | VMSAv8-64 only. |
| SMMU_R_IDR3.DPT | - | Support for DPT is strongly recommended. |
| SMMU_IDR0.NS1ATS | - | If ATS is supported and DPT is not supported, then split-stage ATS must be supported. |
| SMMU_IDR0.COHACC | 1 | Required for coherent access to RMM-managed tables. |
| SMMU_IDR0.BTM | - | Support for broadcast TLB maintenance is strongly recommended. |
| SMMU_IDR0.HTTU | - | Support for Hardware update of Access Flag and Dirty state is strongly recommended. |
| SMMU_IDR0.RME_IMPL | 1 | Granule Protection Check faults are visible to Non-secure, Realm and Secure states. |
| SMMU_IDR3.BBML | 0b10 | Level 2 support is required. |
| SMMU_ROOT_IDR0.ROOT_IMPL | 1 | SMMU must be able to perform Granule Protection Checks. |
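The two requirement tables above can be checked mechanically once the ID fields have been decoded. The sketch below assumes a dictionary mapping field names to already-extracted values; the function name, the dictionary layout, and the choice of SMMU_R_IDR3.DPT as the field that indicates DPT support for the split-stage ATS rule are assumptions made for this example. Fields marked "-" in the tables are recommendations and are not enforced here.

```python
from typing import Dict, List

# Fields with a fixed required value, taken from the two tables above.
RME_DA_REQUIRED = {
    "SMMU_IDR3.PTWNNC": 1,
    "SMMU_IDR3.E0PD": 1,
    "SMMU_IDR3.STT": 1,
    "SMMU_IDR3.FWB": 1,
    "SMMU_IDR3.XNX": 1,
    "SMMU_IDR3.HAD": 1,
    "SMMU_IDR0.Hyp": 1,
    "SMMU_IDR0.S1P": 1,
    "SMMU_IDR0.S2P": 1,
    "SMMU_IDR0.TTF": 0b10,
    "SMMU_IDR0.COHACC": 1,
    "SMMU_IDR0.RME_IMPL": 1,
    "SMMU_IDR3.BBML": 0b10,
    "SMMU_ROOT_IDR0.ROOT_IMPL": 1,
}


def rme_da_violations(fields: Dict[str, int]) -> List[str]:
    """Return the sorted names of fields that break an RME DA requirement."""
    bad = [name for name, want in RME_DA_REQUIRED.items()
           if fields.get(name) != want]
    # Conditional rule: ATS without DPT requires split-stage ATS (NS1ATS).
    # Assumption: DPT support is taken from SMMU_R_IDR3.DPT.
    ats = fields.get("SMMU_IDR0.ATS", 0)
    dpt = fields.get("SMMU_R_IDR3.DPT", 0)
    if ats and not dpt and not fields.get("SMMU_IDR0.NS1ATS", 0):
        bad.append("SMMU_IDR0.NS1ATS")
    return sorted(bad)
```

An empty result means that the decoded values satisfy every fixed-value row and the conditional ATS rule; a missing field is treated the same as a wrong value.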
2.8 SMMUv3.4 features

SMMUv3.4 extends the SMMUv3.3 architecture with the following features:

• Support for features of PEs implementing Armv8.7 [2]:
  – 52-bit virtual and physical address spaces when using 4KB and 16KB translation granule sizes.
  – Enhanced PAN mechanism.
  – Requirements for interoperability with PEs that implement FEAT_XS. See 3.17.8 TLBInXS maintenance operations.
• Support for features of PEs implementing Armv8.9 [2]:
  – Stage 1 and Stage 2 permission indirections.
  – Stage 2 permission overlays.
  – Translation hardening.
  – Attribute Index Enhancement.
  – 128-bit descriptors and 56-bit address spaces.
  – Table descriptor Access flag.
  – Stage 2 MemAttr NoTagAccess encodings.
• Support for the PASID TLP prefix for use on ATS Translated transactions.
• Deprecation of stashing translation information in ATS address fields.
• Deprecation of InD and PnU as output attributes.
• Deprecation of the SMMU_PMCG_PMAUTHSTATUS register.

| SMMU feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.4-LPA2 | Optional. Support for 52 bits of virtual and physical address space when using the 4KB and 16KB translation granule sizes. See SMMU_IDR5.DS. | FEAT_LPA2 |
| SMMUv3.4-PAN3 | Optional. Support for the Enhanced PAN mechanism. See SMMU_IDR3.EPAN. | FEAT_PAN3 |
| SMMUv3.4-THE | Optional. Support for translation hardening extension. See SMMU_IDR3.THE. | FEAT_THE |
| SMMUv3.4-S1PIE | Optional. Support for stage 1 permission indirections. See SMMU_IDR3.S1PI. | FEAT_S1PIE |
| SMMUv3.4-S2PIE | Optional. Support for stage 2 permission indirections. See SMMU_IDR3.S2PI. | FEAT_S2PIE |
| SMMUv3.4-S2POE | Optional. Support for stage 2 permission overlays. See SMMU_IDR3.S2PO. | FEAT_S2POE |
| SMMUv3.4-D128 | Optional. Support for 128-bit translation table descriptors. See SMMU_IDR5.D128, and SMMU_IDR5.{OAS, VAX}. | FEAT_D128, FEAT_LVA3, 56-bit physical addresses |
| SMMUv3.4-AIE | Optional. Support for stage 1 Attribute Index Enhancement. See SMMU_IDR3.AIE. | FEAT_AIE |
| SMMUv3.4-HAFT | Optional. Support for Table descriptor Access flags. See SMMU_IDR0.HTTU. | FEAT_HAFT |
| SMMUv3.4-MTE_PERM | Mandatory. Support for stage 2 MemAttr NoTagAccess encodings. See SMMU_IDR3.MTEPERM. | FEAT_MTE_PERM |
| SMMUv3.4-PASIDTT | Optional. Support for use of the PASID TLP prefix on ATS Translated transactions. See SMMU_IDR3.PASIDTT. | |
2.9 SMMUv3.5 features

SMMUv3.5 extends the SMMUv3.4 architecture with the following features:

• Support for features of PEs implementing Armv9.5 [2]:
  – Above PPS All Access.
  – Non-Secure only (NSO) GPI encoding.
  – Interoperability with PEs with FNGx control fields.
  – Hardware dirty state tracking structure (HDBSS).
  – Hardware accelerator for cleaning dirty state (HACDBS).
  – TLBI VMALL for Dirty state.
  – GPT scaling features.
  – Granular Data Isolation.
• Support for Direct-mode Enhanced Command Queues.
• Support for virtual to physical StreamID translation.
• Support for software control of memory type attribute transformation.

| SMMU feature name | Description | A-profile feature name |
|---|---|---|
| SMMUv3.5-RME_APPSAA | Optional. Support for Above PPS All Access. See SMMU_ROOT_IDR0.APPSAA. | FEAT_RME_GPC2 |
| SMMUv3.5-RME_NSO | Optional. Support for the Non-Secure only (NSO) GPI encoding. See SMMU_ROOT_IDR0.NSO. | FEAT_RME_GPC2 |
| SMMUv3.5-FNG | Mandatory. Support for interoperability with a PE with FNGx control fields. See SMMU_IDR3.FNG. | FEAT_ASID2 |
| SMMUv3.5-HDBSS | Optional. Support for hardware dirty state tracking structure. See SMMU_IDR3.HDBSS. | FEAT_HDBSS |
| SMMUv3.5-HACDBS | Optional. Support for hardware accelerator for cleaning dirty state. See SMMU_IDR3.HACDBS. | FEAT_HACDBS |
| SMMUv3.5-TLBIW | Optional. Support for TLBI VMALL for Dirty state. See SMMU_IDR3.TLBIW. | FEAT_TLBIW |
| SMMUv3.5-RME_GPTS | Optional. Support for the GPT scaling features. See SMMU_ROOT_IDR0.GPTS. | FEAT_RME_GPC3 |
| SMMUv3.5-RME_GDI | Optional. Support for Granular Data Isolation. See SMMU_ROOT_IDR0.GDI. | FEAT_RME_GDI |
| SMMUv3.5-DCMDQ | Optional. Support for Direct Enhanced Command Queues. See SMMU_IDR6.DCMDQ. | |
| SMMUv3.5-VSID | Optional. Support for virtual to physical StreamID translation. See SMMU_IDR6.VSID. | |
| SMMUv3.5-MTCOMB | Mandatory. Support for software control of memory type attribute transformation. See SMMU_IDR3.MTCOMB. | |
2.10 Permitted implementation of subsets of SMMUv3.x and SMMUv3.(x+1) architectural features

An SMMUv3.x compliant implementation can include any arbitrary subset of the architectural features of SMMUv3.(x+1), subject only to those constraints that require that certain features be implemented together.

An SMMUv3.x compliant implementation cannot include any features of SMMUv3.(x+2) or later.

Arm strongly recommends that implementations use the latest version available at design time.
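The version rule above can be sketched as a check over feature names of the form used in this chapter's tables, for example `SMMUv3.4-LPA2`. This is a simplified illustration: the function names are hypothetical, and the check covers only the "no features from SMMUv3.(x+2) or later" rule. It does not model the constraints that require certain features to be implemented together, nor the mandatory features of the declared version.

```python
import re
from typing import Iterable

_FEATURE_NAME = re.compile(r"SMMUv3\.(\d+)-\w+")


def feature_minor(name: str) -> int:
    """Extract the minor revision y from a feature name 'SMMUv3.y-NAME'."""
    match = _FEATURE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an SMMUv3 feature name: {name!r}")
    return int(match.group(1))


def subset_permitted(declared_minor: int, features: Iterable[str]) -> bool:
    """True if no feature comes from SMMUv3.(x+2) or later, for SMMUv3.x."""
    return all(feature_minor(f) <= declared_minor + 1 for f in features)
```

For example, under this rule an implementation declaring SMMUv3.3 may include `SMMUv3.4-LPA2` but not `SMMUv3.5-FNG`.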
2.11 System placement

Figure 2.2: SMMU placement in an example system

Two example uses of an SMMU are shown in Figure 2.2. One SMMU interfaces incoming traffic from two client devices to the system interconnect. The devices can perform DMA using virtual, IPA or other bus address schemes, and the SMMU translates these addresses to PAs. The second example SMMU interfaces one-to-one with a PCIe Root Complex (which itself hosts a network of devices). This illustrates an additional interface specified in this specification, an ATS port to support PCIe ATS and PRI (or similar functionality for compatible non-PCIe devices).

Outgoing accesses to the system interconnect and Completer devices do not pass through an additional SMMU. In general, Requesters are behind an SMMU (or, in the case of PEs, have an inbuilt MMU), so outgoing accesses to the system interconnect and Completer devices are mediated by the MMU of the Requester. If a Requester has no MMU, it has full-system access. Therefore, its DMA must be mediated by software, and in this case only the most privileged system software can program it.

In this specification, a Requester associated with an SMMU is referred to as a client device of the SMMU.

The SMMU has a programming interface that receives accesses from system software for setup and maintenance. The SMMU also makes accesses of its own (as a Requester) to configuration structures, for example to perform translation table walks. Whether the traffic originating from the SMMU itself shares the same interconnect resources as traffic passed through from device clients is IMPLEMENTATION SPECIFIC.

Each SMMU is configured separately from any others that might exist in the system.

Note: Arm recommends that SMMUs bridge I/O device DMA addresses onto system or physical addresses.
Arm recommends that SMMUs are placed between a device Requester port (or I/O interconnect) and the system interconnect. Generally, Arm recommends that SMMUs are not placed in series and that the path of an SMMU to memory or other Completer devices does not pass through another SMMU, whether for fetch of SMMU configuration data or client transactions.

Note: Interconnect-specific channels to support cache coherency are not shown in Figure 2.2.

The SMMU interface to the system interconnect is intended to be IO-coherent, and to provide either IO-coherent or fully-coherent access for the client devices of the SMMU.

Note: It is feasible to implement an SMMU as part of a complex device containing fully coherent caches, in the same way that the MMU of a PE is paired to fully coherent PE caches. Practically, this means the caches must be tagged with physical addresses.

Figure 2.3: Example SMMU implementations

Figure 2.3 shows three example implementations of an SMMU.

• SMMU A is implemented as part of a complex device, providing translation for accesses from that device only. Arm expects this implementation to have an SMMU programming interface in addition to device-specific control. This design can provide dedicated contention-free translation and TLBs.
• SMMU B is a monolithic block that combines translation, programming interface and translation table walk facilities. Two client devices use this SMMU as their path for DMA into the system.
• SMMU C is distributed and provides multiple paths into the system for higher bandwidth. It comprises:
  – A central translation table walker, which has its own Requester interface to fetch translation and configuration structures and queues, and a Completer interface to receive programming accesses. This unit might contain a macro-TLB and caches of configuration.
    * The central translation table walker also provides an ATS interface to the Root Complex, so that the PCIe Devices can use ATS to make translation requests through to the central unit.
  – Remote TLB units which, on a miss, make translation requests to the central unit and cache the results locally. Two units are shown, supporting a set of three devices through one port, and a PCIe Root Complex through another.
• Finally, a smart device is shown, which embeds a TLB and makes translation requests to the central unit of SMMU C. To software, this looks identical to a simple device connected behind a discrete TLB unit. This design provides a dedicated TLB for the device, but uses the programming interface and translation facilities of the central unit, reducing the complexity of the device.

In all cases, it appears to software as though a device is connected behind a logically-separate SMMU (similar to Device 0/1 on SMMU B). All implementations give the illusion of simple read/write transactions arriving from a client device at a discrete SMMU, even if physically it is the device performing the read/write transactions directly into the system, using translations provided by an SMMU.

Note: This allows a single SMMU driver to be used for radically different SMMU implementations.

Note: Devices might integrate a TLB, or a whole SMMU, for performance reasons, but a closely-coupled TLB might also be used to provide physical addresses suitable for fully coherent device caches.

Regardless of the implementation style, this specification uses the abstraction of client device transactions arriving at an SMMU. The boundary of an SMMU might contain a single module or several distributed subcomponents, but these must all behave consistently.
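The distributed arrangement described for SMMU C, in which remote TLB units forward misses to a central translation table walker and cache the results, can be modelled in a few lines. This is a conceptual sketch, not a description of any real implementation: the class names, the `(stream_id, page)` key and the dictionary standing in for translation tables are all assumptions for illustration, and invalidation is not modelled.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # (stream ID, input page number)


class CentralWalker:
    """Stands in for the central unit; counts how many walks it performs."""

    def __init__(self, mappings: Dict[Key, int]) -> None:
        self.mappings = mappings
        self.walks = 0

    def translate(self, stream_id: int, page: int) -> int:
        self.walks += 1
        return self.mappings[(stream_id, page)]


class RemoteTlb:
    """Remote TLB unit: serves hits locally, forwards misses to the centre."""

    def __init__(self, central: CentralWalker) -> None:
        self.central = central
        self.cache: Dict[Key, int] = {}

    def translate(self, stream_id: int, page: int) -> int:
        key = (stream_id, page)
        if key not in self.cache:
            # Miss: request the translation from the central unit, then cache it.
            self.cache[key] = self.central.translate(stream_id, page)
        return self.cache[key]


walker = CentralWalker({(0, 0x10): 0x80, (1, 0x10): 0x90})
tlb_a = RemoteTlb(walker)  # for example, the unit in front of the I/O interconnect
tlb_b = RemoteTlb(walker)  # for example, the unit in front of the Root Complex
```

From the point of view of a client, each `RemoteTlb` behaves like a complete translation path, which mirrors the statement that every implementation style presents the same abstraction to software.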