
3 Operation

3.1 Software interface

The SMMU has three interfaces that software uses:

1. Memory-based data structures that map devices to translation tables, which are used to translate client device addresses.
2. Memory-based circular buffer queues. These are a Command queue for commands to the SMMU, an Event queue for event/fault reports from the SMMU, and a PRI queue for receipt of PCIe page requests.
   Note: The PRI queue is only present on SMMUs supporting PRI services. This additional queue allows PRI requests from devices to be processed separately from event or fault reports.
3. A set of registers, for each supported Security state, for discovery and SMMU-global configuration. The registers indicate the base addresses of the structures and queues, provide feature detection and identification, and include a global control register to enable queue processing and translation of traffic.

When Secure state is supported, an additional register set exists to allow Secure software to maintain Secure device structures, issue commands on a second, Secure Command queue, and read Secure events from a Secure Event queue.

In virtualization scenarios allowing stage 1 translation, a guest OS is presented with the same programming interface and therefore believes it is in control of a real SMMU (albeit stage 1-only) with the same format of Command, Event, and optionally PRI, queues, and in-memory data structures.

Certain fields in architected SMMU registers and structures are marked as IMPLEMENTATION DEFINED. The content of these fields is specific to the SMMU implementation, but implementers must not use these fields in such a way that a generic SMMUv3 driver becomes unusable. Unless a driver has extended knowledge of particular IMPLEMENTATION DEFINED fields or features, the driver must treat all such fields as Reserved and set them to 0. An implementation only uses IMPLEMENTATION DEFINED fields to enable extended functionality or features, and remains compatible with generic driver software by maintaining architected behavior when these fields are set to 0.

ARM IHI 0070 H.a Copyright © 2016-2026 Arm Limited or its affiliates. All rights reserved. Non-confidential 43

3.2 Stream numbering

An incoming transaction has an address, size, and attributes such as read/write, Secure/Non-secure, Shareability, and Cacheability. If more than one client device uses the SMMU, traffic must also have a sideband StreamID so that the sources can be differentiated. How a StreamID is constructed and carried through the system is IMPLEMENTATION DEFINED. Logically, a StreamID corresponds to a device that initiated a transaction.

Note: The mapping of a physical device to StreamID must be described to system software.

Arm recommends that StreamID be a dense namespace starting at 0. The StreamID namespace is per-SMMU: devices assigned the same StreamID but behind different SMMUs are seen as different sources. A device might emit traffic with more than one StreamID, representing data streams differentiated by device-specific state.

StreamID is of IMPLEMENTATION DEFINED size, between 0 bits and 32 bits. The StreamID is used to select a Stream Table Entry (STE) in a Stream table, which contains per-device configuration. The maximum size of in-memory configuration structures relates to the maximum StreamID span (see 3.3 Data structures and translation procedure below), with a maximum of 2^StreamIDSize entries in the Stream table.

Another property, SubstreamID, might optionally be provided to an SMMU implementing stage 1 translation. The SubstreamID is of IMPLEMENTATION DEFINED size, between 0 bits and 20 bits, and differentiates streams of traffic originating from the same logical block so that a different application address translation can be associated with each.

Note: An example would be a compute accelerator with 8 contexts that might each map to a different user process, but where the single device has common configuration, meaning it must be assigned to a VM as a whole.

Note: The SubstreamID is equivalent to a PCIe PASID. Because the concept can be applied to non-PCIe systems, it has been given a more generic name in the SMMU. The maximum size of SubstreamID, 20 bits, matches the maximum size of a PCIe PASID.

The incoming transaction flags whether a SubstreamID is supplied, and this might differ on a per-transaction basis. Both of these properties and their sizes are discoverable through the SMMU_IDR1 register. See section 16.4 System integration for recommendations on StreamID and SubstreamID sizing.

The StreamID is the key that identifies all configuration for a transaction. A StreamID is configured either to bypass or to be subject to translation, and that configuration determines which stage 1 or stage 2 translation to apply. The SubstreamID provides a modifier that selects between a set of stage 1 translations indicated by the StreamID, but has no effect on the stage 2 translation, which is selected by the StreamID only.

A stage 2-only implementation does not take a SubstreamID input. An implementation with stage 1 is not required to support substreams, and therefore is not required to take a SubstreamID input.

The SMMU optionally supports Secure state and, if supported, the StreamID input to the SMMU is qualified by a SEC_SID flag that determines whether the input StreamID value refers to the Secure or Non-secure StreamID namespace. A Non-secure StreamID identifies an STE within the Non-secure Stream table and a Secure StreamID identifies an STE within the Secure Stream table. In this specification, the term StreamID implicitly refers to the StreamID disambiguated by SEC_SID (if present) and does not refer solely to a literal StreamID input value (which would be associated with two STEs when Secure state is supported) unless explicitly stated otherwise. See section 3.10.2 Support for Secure state.

Arm expects that, for PCI, StreamID is generated from the PCI RequesterID so that StreamID[15:0] == RequesterID[15:0]. When more than one PCIe hierarchy is hosted by one SMMU, Arm recommends that the 16-bit RequesterID namespaces are arranged into a larger StreamID namespace by using upper bits of StreamID to differentiate the contiguous RequesterID namespaces, so that StreamID[N:16] indicates which Root Complex (PCIe domain/segment) is the source of the stream. Because StreamID[15:0] carries the full 16-bit RequesterID, SMMU implementations intended for use with PCI clients must support a StreamID size of at least 16 bits. In PCIe systems, the SubstreamID is intended to be directly provided from the PASID [1] in a one-to-one fashion.
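The PCIe StreamID recommendation above can be sketched as follows. This is a minimal illustration, not an SMMU or driver API: both helper names are hypothetical, and the bus/device/function packing of the RequesterID (bus in bits [15:8], device in [7:3], function in [2:0]) is the standard PCIe layout rather than something defined in this section.

```python
def requester_id(bus: int, device: int, function: int) -> int:
    """Pack a PCIe bus/device/function triple into a 16-bit RequesterID.

    Standard PCIe layout (an assumption of this sketch, not defined by the SMMU):
    bus in [15:8], device in [7:3], function in [2:0].
    """
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def pci_stream_id(segment: int, rid: int, sid_bits: int) -> int:
    """Hypothetical helper: StreamID[15:0] = RequesterID, StreamID[N:16] = segment."""
    if sid_bits < 16:
        # PCI clients need the whole 16-bit RequesterID in StreamID[15:0].
        raise ValueError("StreamID must be at least 16 bits for PCI clients")
    if not 0 <= rid <= 0xFFFF:
        raise ValueError("RequesterID is 16 bits")
    if segment < 0:
        raise ValueError("segment number must be non-negative")
    sid = (segment << 16) | rid
    if sid >> sid_bits:
        # The segment number does not fit in the implemented StreamID[N:16] bits.
        raise ValueError("segment number too large for this StreamID size")
    return sid


# Function 1 of device 0 on bus 3, in the second Root Complex (segment 1).
print(hex(pci_stream_id(1, requester_id(3, 0, 1), sid_bits=24)))  # 0x10301
```

Keeping the RequesterID in the low 16 bits means each Root Complex occupies a contiguous, 64K-entry slice of the StreamID namespace.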

3.3 Data structures and translation procedure

The SMMU uses a set of data structures in memory to locate translation data. Registers hold the base addresses of the initial root structure, the Stream table. An STE contains stage 2 translation table base pointers, and also locates stage 1 configuration structures, which contain translation table base pointers. A Context Descriptor (CD) represents stage 1 translation, and a Stream Table Entry represents stage 2 translation.

Therefore, there are two distinct groups of structures used by the SMMU:

• Configuration structures, which map from the StreamID of a transaction (a device originator identifier) to the translation table base pointers, configuration, and context under which the translation tables are accessed.
• Translation table structures that are used to perform the VA to IPA and IPA to PA translation of addresses for stage 1 and stage 2, respectively.

The procedure for translating an incoming transaction is first to locate the configuration appropriate for that transaction, identified by its StreamID and, optionally, SubstreamID, and then to use that configuration to locate translations for the address used.

The first step in dealing with an incoming transaction is to locate the STE, which tells the SMMU what other configuration it requires.

Conceptually, an STE describes configuration for a client device in terms of whether it is subject to stage 1 or stage 2 translation or both. Multiple devices can be associated with a single Virtual Machine, so multiple STEs can share common stage 2 translation tables. Similarly, multiple devices (strictly, streams) might share common stage 1 configuration, therefore multiple STEs could share common CDs.

3.3.1 Stream table lookup

The StreamID of an incoming transaction locates an STE. Two formats of Stream table are supported. The format is set by the Stream table base registers.

The incoming StreamID is range-checked against the programmed table size, and a transaction is terminated if its StreamID would otherwise select an entry outside the configured Stream table extent (or outside a level 2 span). See SMMU_STRTAB_BASE_CFG and C_BAD_STREAMID.

The StreamID of an incoming transaction might be qualified by SEC_SID, and this determines which Stream table, or cached copies of that Stream table, is used for lookup. See section 3.10.1 StreamID Security state (SEC_SID).

3.3.1.1 Linear Stream table

[Figure 3.1: Linear Stream table. STRTAB_BASE locates a contiguous array of STEs (STE 0, STE 1, STE 2, STE 3, ...) indexed by StreamID[n:0].]

A linear Stream table is a contiguous array of STEs, indexed from 0 by StreamID. The size is configurable as a 2^n multiple of the STE size, up to the maximum number of StreamID bits supported in hardware by the SMMU. The linear Stream table format is supported by all SMMU implementations.
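A linear Stream table lookup, including the range check that leads to C_BAD_STREAMID, might be modeled like this. The sketch assumes a 64-byte STE (the SMMUv3 STE size, stated here as an assumption) and takes log2size as a stand-in for the table size programmed through SMMU_STRTAB_BASE_CFG; the function and exception names are hypothetical.

```python
STE_SIZE = 64  # bytes per STE (assumed SMMUv3 value)


class BadStreamID(Exception):
    """Hypothetical stand-in for a C_BAD_STREAMID event: the transaction is terminated."""


def linear_ste_addr(strtab_base: int, log2size: int, sid: int) -> int:
    """Address of the STE for `sid` in a linear Stream table of 2**log2size STEs."""
    if not 0 <= sid < (1 << log2size):
        raise BadStreamID(f"StreamID {sid:#x} is outside a table of {1 << log2size} STEs")
    # The table is a contiguous array indexed from 0 by StreamID.
    return strtab_base + sid * STE_SIZE


base = 0x8000_0000
print(hex(linear_ste_addr(base, log2size=6, sid=3)))  # 0x800000c0
```

With log2size=6 the table holds 64 STEs and fits in 4KB; StreamID 64 would raise BadStreamID instead of reading past the end of the table.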

3.3.1.2 2-level Stream table

[Figure 3.2: Example two-level Stream table with SPLIT == 8. Level 1 descriptors (Desc 0, Desc 1, ..., Desc 3), indexed by StreamID[9:8], point to level 2 STE arrays indexed by StreamID[7:0]: at Addr 0x1000 (STE 0x000 to STE 0x0ff), at Addr 0x2f00 (STE 0x100 to STE 0x103, indexed by StreamID[1:0]) and at Addr 0x4000 (STE 0x300).]

A 2-level Stream table is a structure consisting of one top-level table that contains descriptors that point to multiple second-level tables that contain linear arrays of STEs. The span of StreamIDs covered by the entire structure is configurable up to the maximum number supported by the SMMU, but the second-level tables do not have to be fully populated and might vary in size. This saves memory and avoids the requirement for large physically-contiguous allocations for very large StreamID spaces.

The top-level table is indexed by StreamID[n:x], where n is the uppermost StreamID bit covered, and x is a configurable split point given by SMMU_(*_)STRTAB_BASE_CFG.SPLIT. The second-level tables are indexed by up to StreamID[x - 1:0], depending on the span of each table.

Support for the 2-level Stream table format is discoverable using the SMMU_IDR0.ST_LEVEL field. Where 2-level Stream tables are supported, split points of 6 bits, 8 bits and 10 bits can be used. Implementations support either a linear Stream table format, or both linear and 2-level formats.

SMMUs supporting more than 64 StreamIDs (6 bits of StreamID) must also support two-level Stream tables.

Note: Implementations supporting fewer than 64 StreamIDs might support two-level Stream tables, but doing so is not useful as all streams would fit within a single second-level table.

Note: This rule means that an implementation supports two-level tables when the maximum size of a linear Stream table would be too big to fit in a 4KB page.
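The StreamID split and the level 1 descriptor checks can be sketched as below, using the example of Figure 3.2. The sketch assumes, as the example descriptor table for Figure 3.2 in this section shows, that a valid level 2 table with L1STD.Span value s covers 2^(s-1) STEs and that Span == 0 marks an invalid descriptor. It also assumes 8-byte level 1 descriptors and 64-byte STEs; all function names are hypothetical.

```python
from __future__ import annotations

STE_SIZE = 64    # bytes per STE (assumed SMMUv3 value)
L1STD_SIZE = 8   # bytes per level 1 descriptor (assumed SMMUv3 value)


def split_sid(sid: int, split: int) -> tuple[int, int]:
    """Return (level 1 index, level 2 index) for a StreamID and SPLIT value."""
    return sid >> split, sid & ((1 << split) - 1)


def table_sizes(sid_size: int, split: int) -> tuple[int, int]:
    """Bytes for the whole L1 table and for one fully populated L2 table."""
    return (1 << (sid_size - split)) * L1STD_SIZE, (1 << split) * STE_SIZE


def two_level_ste_addr(l1_table: list[tuple[int, int]], split: int, sid: int) -> int:
    """l1_table holds (L2 pointer, Span) pairs.

    A LookupError means the transaction would be terminated.
    """
    l1_idx, l2_idx = split_sid(sid, split)
    if l1_idx >= len(l1_table):
        raise LookupError("StreamID outside the Stream table")
    l2_ptr, span = l1_table[l1_idx]
    if span == 0:
        raise LookupError("invalid level 1 descriptor")
    if l2_idx >= 1 << (span - 1):
        raise LookupError("StreamID outside the level 2 span")
    return l2_ptr + l2_idx * STE_SIZE


# Level 1 table of Figure 3.2 (SPLIT == 8).
FIG_3_2 = [(0x1000, 9), (0x2F00, 3), (0, 0), (0x4000, 1)]
print(hex(two_level_ste_addr(FIG_3_2, 8, 0x101)))  # 0x2f40
print(table_sizes(16, 6))  # (8192, 4096)
```

The same arithmetic reproduces the L1/L2 memory sizes listed for each SIDSIZE and SPLIT combination later in this section.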
The top-level descriptors contain a pointer to the second-level table along with the StreamID span that the table represents. Each descriptor can also be marked as invalid.

This example top-level table is depicted in Figure 3.2, where the split point is set to 8:

Level 1 index   Valid   Level 2 pointer   Span of Level 2   Value of L1STD.Span
0               Y       0x1000            2^8               9
1               Y       0x2F00            2^2               3
2               N       -                 -                 0
3               Y       0x4000            2^0               1

In this example:

• StreamIDs 0-1023 (4 × 8-bit level 2 tables) are represented, though not all are valid.
• StreamIDs 0-255 are configured by the array of STEs at 0x1000 (each of which separately enables the relevant StreamID).
• StreamIDs 256-259 are configured by the array of STEs at 0x2F00.
• StreamIDs 512-767 are all invalid.
• The STE of StreamID 768 is at 0x4000.

A two-level table with a split point of 8 can reduce memory usage compared to a large and sparse linear table used with PCIe. If the full 256 PCIe bus numbers are supported, the RequesterID or StreamID space is 16 bits. However, because there is usually one PCIe bus for each physical link and potentially one device for each bus, in the worst case a valid StreamID might only appear once every 256 StreamIDs.

Alternatively, a split point of 6 provides 64 bottom-level STEs, enabling use of a 4KB page for each bottom-level table.

Note: Depending on the size of the StreamID space, the L1 Stream table might require allocation of a region of physically-contiguous memory greater than a single granule.

This table shows some example sizes for the amount of memory occupied by L1 and L2 Stream tables:

SIDSIZE   SPLIT   L1 table size   L2 table size
16        6       8KB             4KB
16        8       2KB             16KB
16        10      512B            64KB
24        6       2MB             4KB
24        8       512KB           16KB
24        10      128KB           64KB

3.3.2 StreamIDs to Context Descriptors

The STE contains the configuration for each stream, indicating:

• Whether traffic from the device is enabled.
• Whether it is subject to stage 1 translation.
• Whether it is subject to stage 2 translation, and the relevant translation tables.
• Which data structures locate translation tables for stage 1.

If stage 1 is used, the STE indicates the address of one or more CDs in memory using the STE.S1ContextPtr field.
The CD associates the StreamID with stage 1 translation table base pointers (to translate VA into IPA), per-stream configuration, and ASID. If substreams are in use, multiple CDs indicate multiple stage 1 translations, one for each substream. Transactions provided with a SubstreamID are terminated when stage 1 translation is not enabled.

If stage 2 is used, the STE contains the stage 2 translation table base pointer (to translate IPA to PA) and VMID. If multiple devices are associated with a particular virtual machine, meaning they share stage 2 translation tables, then multiple STEs might map to one stage 2 translation table.

Note: Arm expects that, where hypervisor software is present, the Stream table and stage 2 translation tables are managed by the hypervisor, and the CDs and stage 1 translation tables associated with devices under guest control are managed by the guest OS. Additionally, the hypervisor can make use of separate hypervisor stage 1 translations for its own internal purposes. Where a hypervisor is not used, a bare-metal OS manages the Stream table and CDs. For more information, see section 3.6 Structure and queue ownership.

When a SubstreamID is supplied with a transaction and the configuration enables substreams, the SubstreamID indexes the CDs to select a stage 1 translation context. In this configuration, if a SubstreamID is not supplied, behavior depends on the STE.S1DSS field:

• When STE.S1DSS == 0b00, all traffic is expected to have a SubstreamID and the lack of a SubstreamID is an error. A transaction without a SubstreamID is aborted and an event recorded.
• When STE.S1DSS == 0b01, a transaction without a SubstreamID is accepted but is treated exactly as if its configuration were stage 1-bypass. The stage 1 translations are enabled only for transactions with SubstreamIDs.
• When STE.S1DSS == 0b10, a transaction without a SubstreamID is accepted and uses the CD of Substream 0. Under this configuration, transactions that arrive with SubstreamID 0 are aborted and an event recorded.

When stage 1 is used, the STE.S1ContextPtr field gives the address of one of the following, configured by STE.S1Fmt and STE.S1CDMax:

• A single CD.
• The start address of a single-level table of CDs.
  – The table is a contiguous array of CDs indexed by the SubstreamID.
• The start address of a first-level, L1, table of L1CDs.
  – Each L1CD.L2Ptr in the L1 table can be configured with the address of a linear level two, L2, table of CDs.
  – The L1 table is a contiguous array of L1CDs indexed by upper bits of SubstreamID.
  – The L2 table is a contiguous array of CDs indexed by lower bits of SubstreamID.

The ranges of SubstreamID bits that are used for the L1 and L2 indices are configured by STE.S1Fmt.

The S1ContextPtr and L2Ptr addresses are IPAs when both stage 1 and stage 2 are used, and PAs when only stage 1 is used. S1ContextPtr is not used when stage 1 is not used.

The ASID and VMID values provided by the CD and STE structures tag TLB entries created from translation lookups performed through configuration from the CD and STEs. These tags are used on lookup to differentiate translation address spaces between different streams, or to match entries for invalidation on receipt of broadcast TLB maintenance operations. Implementations might also use these tags to efficiently allow sharing of identical translation tables between different streams.
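The STE.S1DSS rules and the two-level CD index split described above can be summarized in a small decision function. This is a sketch only: the return values are hypothetical labels, l2_bits stands in for the SubstreamID split that STE.S1Fmt selects, cd_bits stands in for the span configured by STE.S1CDMax, and treating a SubstreamID at or above 2^cd_bits as an error is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Optional, Union


def select_cd(s1dss: int, ssid: Optional[int], cd_bits: int) -> Union[str, int]:
    """Return 'abort', 'bypass' (stage 1 bypass) or the SubstreamID used to index the CDs.

    ssid is None when the transaction carries no SubstreamID.
    """
    if s1dss not in (0b00, 0b01, 0b10):
        raise ValueError("S1DSS encoding not modeled in this sketch")
    if ssid is None:
        if s1dss == 0b00:
            return "abort"        # SubstreamID required; an event is recorded
        if s1dss == 0b01:
            return "bypass"       # treated exactly as stage 1-bypass
        return 0                  # 0b10: use the CD of Substream 0
    if s1dss == 0b10 and ssid == 0:
        return "abort"            # explicit SubstreamID 0 is aborted under 0b10
    if ssid >= 1 << cd_bits:
        return "abort"            # assumption: out-of-range SubstreamIDs are errors
    return ssid


def cd_indices(ssid: int, l2_bits: int) -> tuple[int, int]:
    """Split a SubstreamID into (L1CD index, index within the L2 CD table)."""
    return ssid >> l2_bits, ssid & ((1 << l2_bits) - 1)


print(select_cd(0b10, None, cd_bits=8))   # 0
print(cd_indices(0x4A7, l2_bits=6))       # (18, 39)
```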

[Figure 3.3: Configuration structure example. SMMU_(*_)STRTAB_BASE locates a Stream Table Entry (STE) selected by StreamID; the STE holds Config, S1ContextPtr, S2TTB, VMID and other attributes/configuration. S1ContextPtr points to a Context Descriptor (CD) holding TTB0, TTB1, ASID and MAIR, which locate the stage 1 translation tables; S2TTB locates the stage 2 translation tables.]

Figure 3.3 shows an example configuration in which a StreamID selects an STE from a linear Stream table, the STE points to a translation table for stage 2 and points to a single CD for stage 1 configuration, and then the CD points to translation tables for stage 1.

[Figure 3.4: Multiple Context Descriptors for Substreams. Through S1ContextPtr, the STE (Config, S1ContextPtr, S2TTB, VMID, other attributes/configuration) points to an array of CDs, one of which is selected by SubstreamID; each CD locates stage 1 translation tables, and S2TTB locates the stage 2 translation tables.]

Figure 3.4 shows a configuration in which an STE points to an array of several CDs. An incoming SubstreamID selects one of the CDs, and therefore the SubstreamID determines which stage 1 translations are used by a transaction.

[Figure 3.5: Multi-level Stream and CD tables. SMMU_(*_)STRTAB_BASE locates a level 1 Stream table whose L1 ST pointers locate arrays of STEs; two STEs point to a single CD or a flat array of CDs, and a third STE points to a level 1 CD table whose L1 CD pointers locate arrays of CDs, which in turn locate stage 1 translation tables.]

Figure 3.5 shows a more complex layout in which a multi-level Stream table is used. Two of the STEs point to a single CD, or a flat array of CDs, whereas the third STE points to a multi-level CD table. With multiple levels, many streams and many substreams might be supported without large physically-contiguous tables.

[Figure 3.6: Translation stages and addresses. An input VA (BA) passes through stage 1 translation (VA to IPA) or stage 1 bypass, producing an IPA, which passes through stage 2 translation (IPA to PA) or stage 2 bypass, producing the PA.]

An incoming transaction is dealt with in the following logical steps:

1. If the SMMU is globally disabled (for example, when it has just come out of reset with SMMU_CR0.SMMUEN == 0), the transaction passes through the SMMU without any address modification. Global attributes, such as memory type or Shareability, might be applied from the SMMU_GBPA register of the SMMU. Alternatively, the SMMU_GBPA register might be configured to abort all transactions.
2. If the global bypass described in (1) does not apply, the configuration is determined:
   a) An STE is located.
   b) If the STE enables stage 2 translation, the STE contains the stage 2 translation table base.
   c) If the STE enables stage 1 translation, a CD is located. If stage 2 translation is also enabled by the STE, the CD is fetched from IPA space, which uses the stage 2 translations. Otherwise, the CD is fetched from PA space.
3. Translations are performed, if the configuration is valid.
   a) If stage 1 is configured to translate, the CD contains a translation table base which is walked. This might require stage 2 translations, if stage 2 is enabled for the STE. Otherwise, stage 1 bypasses translation and the input address is provided directly to stage 2.
   b) If stage 2 is configured to translate, the STE contains a translation table base that performs a nested walk of a stage 1 translation table if enabled, or a normal walk of an incoming IPA. Otherwise, stage 2 bypasses translation and the stage 2 input address is provided as the output address.

4. A transaction with a valid configuration that does not experience a fault on translation has the output address (and memory attributes, as appropriate) applied and is forwarded.

Note: This sequence illustrates the path of a transaction on a Non-secure stream. If Secure state is supported, the path of a transaction on a Secure stream is similar, except that SMMU_S_CR0.SMMUEN and SMMU_S_GBPA control bypass.

An implementation might cache data as required for any of these steps. Section 16.2 Caching describes caching of configuration and translation structures.

Furthermore, events might occur at several stages in the process that prevent the transaction from progressing any further. If a transaction fails to locate valid configuration or is of an unsupported type, it is terminated with an abort, and an event might be recorded. If the transaction progresses as far as translation, faults can arise at either stage of translation. The configuration in the specific CD and STE used determines whether the transaction is terminated or stalled pending software fault resolution; see section 3.12 Fault models, recording and reporting.

The two translation stages are described using the VA to IPA and IPA to PA stages of the Armv8-A Virtualization terminology.

Note: Some systems refer to the SMMU input as a Bus Address (BA). The term VA emphasizes that the input address to the SMMU can potentially be from the same virtual address space as a PE process (using VAs).

Unless otherwise specified, translation tables and their configuration fields act exactly the same way as their equivalents specified in the Armv8-A Translation System for PEs [2]. If an SMMU does not implement one of the two stages of translation, it behaves as though that stage is configured to permanently bypass translation.
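The logical steps for an incoming transaction can be condensed into a toy model. It is a sketch only: each stage is represented as a hypothetical Python callable, faults and events collapse into a single exception, SMMU_GBPA is reduced to an abort flag, and the nesting of stage 1 table walks and CD fetches through stage 2 is not modeled. A stage left as None behaves like the permanent bypass of a non-implemented stage.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Translate = Callable[[int], int]


class Abort(Exception):
    """Transaction terminated (an event might be recorded)."""


@dataclass
class Ste:
    valid: bool
    stage1: Optional[Translate] = None  # None: stage 1 bypass (VA passed on as IPA)
    stage2: Optional[Translate] = None  # None: stage 2 bypass (IPA output as PA)


def handle(addr: int, smmuen: bool, gbpa_abort: bool, ste: Optional[Ste]) -> int:
    # Step 1: SMMU globally disabled, so SMMU_GBPA decides between bypass and abort.
    if not smmuen:
        if gbpa_abort:
            raise Abort("SMMU_GBPA configured to abort")
        return addr  # no address modification
    # Step 2: locate configuration.
    if ste is None or not ste.valid:
        raise Abort("no valid configuration")
    # Step 3a: stage 1, VA -> IPA, or bypass.
    ipa = ste.stage1(addr) if ste.stage1 else addr
    # Step 3b: stage 2, IPA -> PA, or bypass.
    pa = ste.stage2(ipa) if ste.stage2 else ipa
    # Step 4: forward with the output address.
    return pa


nested = Ste(valid=True, stage1=lambda va: va + 0x1_0000, stage2=lambda ipa: ipa | 0x8000_0000)
print(hex(handle(0x1234, True, False, nested)))  # 0x80011234
```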
Other restrictions are also relevant; for example, it is not valid to configure a non-present stage to translate. An SMMU must support at least one stage of translation.

3.3.3 Configuration and Translation lookup

[Figure 3.7: Configuration and translation lookup sequence. Configuration lookup: the StreamID of the input transaction is used to get the Stream Table Entry (giving the VMID, the stage 2 translation table and the StreamWorld, or translation regime), and the SubstreamID is used to get the Context Descriptor (giving the ASID and translation table). Translation lookup: the input address is looked up in a TLB mapping {Security, StreamWorld, VMID, ASID, Address} to {PA, Permissions}; on a miss, the translation tables are walked and the data fills the TLB. The output transaction carries the output address (PA).]

Figure 3.7 illustrates the concepts that are used in this specification when referring to a configuration lookup and a translation lookup.

As described in 3.3.2 StreamIDs to Context Descriptors above, an incoming transaction is first subject to a configuration lookup, and the SMMU determines how to begin to translate the transaction. This involves locating the appropriate STE then, if required, a CD.

The configuration lookup stage does not depend on the input address and is a function of the:

• SMMU global register configuration.
• Incoming transaction StreamID.
• Incoming transaction SubstreamID (if supplied).

The result of the configuration lookup is the stream- or substream-specific configuration that locates the translation, including:

• Stage 1 translation table base pointers, ASID, and properties modifying the interpretation or walk of the translation tables (such as translation granule).
• Stage 2 translation table base pointer, VMID and properties modifying the interpretation or walk of the translation table.
• Stream-specific properties, such as the StreamWorld (the Exception level, or translation regime, in PE terms) to which the stream is assigned.

The translation lookup stage logically works the same way as a PE memory address translation system. The output is the final physical address provided to the system, which is a function of the:

• Input address.
• StreamWorld (Stream Security state and Exception level), ASID and VMID (which are provided from the previous step).

Figure 3.7 shows a PE-style TLB used in the translation lookup step. Arm expects the SMMU to use a TLB to cache translations instead of performing translation table walks for each transaction, but this is not mandatory.

Note: For clarity, Figure 3.7 does not show error reporting paths or CD fetch through stage 2 translation (which would also access the TLB or translation table walk facilities). An implementation might choose to flatten or combine some of the steps shown, while maintaining the same behavior.

A cached translation is associated with a StreamWorld that denotes its translation regime. StreamWorld is directly equivalent to an Exception level on a PE.
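The split between the configuration lookup, which ignores the input address, and a TLB tagged by StreamWorld, VMID and ASID might be modeled as below. The class and field names are hypothetical; a 4KB page size is assumed, the Security state is folded into the StreamWorld string, and the walk is a caller-supplied function rather than a real translation table walk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

PAGE_SHIFT = 12  # assumed 4KB pages for this sketch


@dataclass(frozen=True)
class StreamConfig:
    world: str  # StreamWorld, e.g. "NS-EL1" (Security state folded in)
    vmid: int
    asid: int


@dataclass
class ToySmmu:
    configs: Dict[int, StreamConfig]             # configuration lookup: StreamID -> config
    walk: Callable[[StreamConfig, int], int]     # returns the output page for an input page
    tlb: Dict[Tuple[str, int, int, int], int] = field(default_factory=dict)
    walks: int = 0

    def translate(self, sid: int, addr: int) -> int:
        cfg = self.configs[sid]                  # independent of the input address
        page, offset = addr >> PAGE_SHIFT, addr & ((1 << PAGE_SHIFT) - 1)
        key = (cfg.world, cfg.vmid, cfg.asid, page)
        if key not in self.tlb:                  # miss: walk the tables, then fill
            self.walks += 1
            self.tlb[key] = self.walk(cfg, page)
        return (self.tlb[key] << PAGE_SHIFT) | offset


smmu = ToySmmu(
    configs={0: StreamConfig("NS-EL1", 1, 3), 1: StreamConfig("NS-EL1", 1, 3)},
    walk=lambda cfg, page: page + 0x100,
)
smmu.translate(0, 0x5123)
smmu.translate(1, 0x5FFF)   # different StreamID, same tags: TLB hit
print(smmu.walks)  # 1
```

Because StreamIDs 0 and 1 resolve to the same {StreamWorld, VMID, ASID}, the second access hits the entry filled by the first; the StreamID itself never reaches the TLB key.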
The StreamWorld of a translation is determined by the configuration that inserts that translation. The StreamWorld of a cached translation is determined from the combination of the Security state of an STE, its STE.Config field, its STE.STRW field, and the corresponding SMMU_(*_)CR2.E2H configuration. See the STE.STRW field in section 5.2 Stream Table Entry.

In addition to insertion into a TLB, the StreamWorld affects TLB lookups, and the scope of different types of TLB invalidations. An SMMU implementation is not required to distinguish between cached translations inserted for EL2 versus EL2-E2H. For the behavior of TLB invalidations, see section 3.17 TLB tagging, VMIDs, ASIDs and participation in broadcast TLB maintenance.

A translation is associated with one of the following StreamWorlds:

StreamWorld     Properties equivalent to
NS-EL1          Non-secure EL1&0.
NS-EL2          Non-secure EL2. This is when E2H is not used, and translations do not have an ASID tag.
NS-EL2-E2H      Non-secure EL2&0. This is when E2H is used, and translations have an ASID tag.
S-EL2           Secure EL2. This is when E2H is not used, and translations do not have an ASID tag.
S-EL2-E2H       Secure EL2&0. This is when E2H is used, and translations have an ASID tag.
Secure          Either:
                - The single translation regime that is used by Secure EL1 and Secure EL3 when EL3 is running in AArch32 state.
                - Secure EL1&0 when EL3 is running in AArch64 state. EL3 has a separate translation regime.
                This regime has an ASID and, if Secure stage 2 is supported, a VMID.
EL3             EL3 when EL3 is running in AArch64 state and FEAT_RME is not implemented.
Realm-EL1       Realm EL1&0.
Realm-EL2       Realm EL2. This is when E2H is not used, and translations do not have an ASID tag.
Realm-EL2-E2H   Realm EL2&0. This is when E2H is used, and translations have an ASID tag.

Note: StreamWorld can differentiate multiple translation regimes in the SMMU that are associated with different bodies of software at different Exception levels. For example, a Secure Monitor EL3 translation for address 0x1000 is different from (and unaffected by) a Non-secure hypervisor EL2 translation for address 0x1000, as are NS-EL1 translations for address 0x1000. Arm expects that the StreamWorld configured for a stream in the SMMU will match the Exception level of the software that controls the stream or device.

The term any-EL2 is used to describe behaviors common to the NS-EL2, S-EL2, and Realm-EL2 StreamWorlds. The term any-EL2-E2H is used to describe behaviors common to the NS-EL2-E2H, S-EL2-E2H, and Realm-EL2-E2H StreamWorlds.

In the same way as in an Armv8-A MMU, a translation is architecturally unique if it is identified by a unique set of {StreamWorld, VMID, ASID, Address} input parameters. For example, the following are unique and can all co-exist in a TLB:

• Entries with the same address, but different ASIDs.
• Entries with the same address and ASID, but different VMIDs.
• Entries with the same address and ASID, but a different StreamWorld.

Architecturally, a translation is not uniquely identified by a StreamID and SubstreamID.
This results in two properties:

• A translation is not required to be unique for a set of transaction input parameters (StreamID, SubstreamID).
  – Two streams can be configured to use the same translation configuration, and the resulting ASID/VMID from their configuration lookup will identify a single set of shared TLB entries.
• Multiple StreamID/SubstreamID configurations that result in identical ASID/VMID/StreamWorld configuration must maintain the same configuration where configuration can affect TLB lookup.
  – For example, two streams configured for stage 1, NS-EL1 with ASID == 3 must both use the same translation table base addresses and translation granule.

When translating an address, any-EL2 and EL3 regimes use only one translation table, so CD.TTB1 is unused in these configurations. All other StreamWorlds use both translation tables, and therefore CD.TTB0 and CD.TTB1 are both required.

Only some stage 1 translation table formats are valid in each StreamWorld, consistent with the PE. Valid combinations are described in the CD.AA64 description. Selecting an inconsistent combination of StreamWorld and CD.AA64 (for example, using VMSAv8-32 LPAE translation tables to represent a VMSAv8-64 EL3 translation regime) causes the CD to be ILLEGAL. Secure stage 1 permits VMSAv8-32 LPAE, VMSAv8-64 and VMSAv9-128 translation tables. Secure stage 2 is not supported for VMSAv8-32 LPAE translation tables.

In this specification, the term TLB is used to mean the concept of a translation cache, indexed by StreamWorld/VMID/ASID and VA.
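The second property above, together with the rule that any-EL2 and EL3 leave CD.TTB1 unused, can be expressed as a consistency check that software might run before installing stream configurations. This is a hypothetical sketch, not an architected check: it compares only the translation table bases and a granule value, and it assumes callers pass 0 for any tag (VMID or ASID) that a StreamWorld does not use.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# any-EL2 and EL3 use a single translation table, so CD.TTB1 is unused.
SINGLE_TABLE_WORLDS = {"NS-EL2", "S-EL2", "Realm-EL2", "EL3"}


class S1Config(NamedTuple):
    world: str     # StreamWorld
    vmid: int      # 0 where the StreamWorld has no VMID tag (assumption)
    asid: int      # 0 where the StreamWorld has no ASID tag (assumption)
    ttb0: int
    ttb1: int
    granule: int   # translation granule in bytes


def lookup_relevant(cfg: S1Config) -> Tuple[int, ...]:
    """Fields that must match for streams sharing the same TLB tags."""
    if cfg.world in SINGLE_TABLE_WORLDS:
        return (cfg.ttb0, cfg.granule)
    return (cfg.ttb0, cfg.ttb1, cfg.granule)


def conflicting_streams(streams: Dict[int, S1Config]) -> List[Tuple[int, int]]:
    """Return (StreamID, StreamID) pairs whose tags match but whose tables differ."""
    groups: Dict[Tuple[str, int, int], List[int]] = defaultdict(list)
    for sid, cfg in sorted(streams.items()):
        groups[(cfg.world, cfg.vmid, cfg.asid)].append(sid)
    conflicts = []
    for sids in groups.values():
        first = sids[0]
        for other in sids[1:]:
            if lookup_relevant(streams[first]) != lookup_relevant(streams[other]):
                conflicts.append((first, other))
    return conflicts


streams = {
    0: S1Config("NS-EL1", 1, 3, 0x8000, 0x9000, 4096),
    1: S1Config("NS-EL1", 1, 3, 0x8000, 0x9000, 4096),
    2: S1Config("NS-EL1", 1, 3, 0xA000, 0x9000, 4096),  # same tags, different TTB0
    3: S1Config("NS-EL2", 0, 0, 0x8000, 0x1111, 4096),
    4: S1Config("NS-EL2", 0, 0, 0x8000, 0x2222, 4096),  # differs only in unused TTB1
}
print(conflicting_streams(streams))  # [(0, 2)]
```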

SMMU cache maintenance commands therefore fall into two groups:

• Configuration cache maintenance, acting upon StreamIDs and SubstreamIDs.
• TLB maintenance, acting on addresses, ASIDs, VMIDs and StreamWorld.

The second set of commands directly matches broadcast TLB maintenance operations that might be available from PEs in some systems. The StreamWorld tag determines how TLB entries respond to incoming broadcast TLB invalidations and TLB invalidation SMMU commands; see section 3.17 TLB tagging, VMIDs, ASIDs and participation in broadcast TLB maintenance for details.

3.3.4 Transaction attributes: incoming, two-stage translation and overrides

In addition to an address, size and read/write attributes, an incoming transaction might be presented to the SMMU with other attributes, such as an access type (for example Device, or WB-cached Normal memory), Shareability (for example Outer Shareable), cache allocation hints, and permission-related attributes (instruction/data, privileged/unprivileged, Secure/Non-secure). Some of these attributes are used to check the access against the page permissions that are determined from the translation tables. After passing through the SMMU, a transaction presented to the system might also have a set of attributes, which might have been affected by the SMMU.

How some of these attributes are interpreted depends on the StreamWorld configuration. For example, the format of access permissions in a stage 1 translation table descriptor is affected when located from a configuration with StreamWorld == any-EL2 or StreamWorld == EL3, consistent with the Armv8 Translation System [2]. Specifically, the AP[1] bit (of the AP[2:1] field) is ignored and treated as if it were 1, because privilege checks are ignored for EL2 and EL3 (VMSAv8-64 and VMSAv9-128) translations. However, any-EL2-E2H translations maintain privileged/unprivileged checks in the same manner as EL1.
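The effect of ignoring AP[1] for any-EL2 and EL3 can be shown with a simplified stage 1 permission check. The sketch assumes the usual VMSAv8-64 meaning of AP[2:1] (AP[2] selects read-only, AP[1] grants unprivileged access). It deliberately ignores execute-never controls, PAN, stage 2 permissions and every other factor, and the names are hypothetical.

```python
# any-EL2 and EL3: no privilege distinction, so AP[1] is treated as 1.
PRIV_INSENSITIVE_WORLDS = {"NS-EL2", "S-EL2", "Realm-EL2", "EL3"}


def ap_permits(ap: int, write: bool, privileged: bool, world: str) -> bool:
    """Check a read or write against a 2-bit AP[2:1] value (bit 1 = AP[2], bit 0 = AP[1])."""
    read_only = bool(ap & 0b10)        # AP[2]
    unpriv_ok = bool(ap & 0b01)        # AP[1]
    if world in PRIV_INSENSITIVE_WORLDS:
        unpriv_ok = True               # AP[1] ignored and treated as 1
    if not privileged and not unpriv_ok:
        return False                   # unprivileged access to a privileged-only page
    if write and read_only:
        return False                   # write to a read-only page
    return True


# Unprivileged read of a page with AP[2:1] == 0b00.
for world in ("NS-EL1", "NS-EL2", "NS-EL2-E2H"):
    print(world, ap_permits(0b00, write=False, privileged=False, world=world))
# NS-EL1 False, NS-EL2 True, NS-EL2-E2H False
```

Only NS-EL2 permits the access, because any-EL2-E2H keeps EL1-style privilege checks.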
The details of how input attributes affect the attributes output into the system, in combination with translation table attributes and other configuration, are described in Chapter 13 Attribute Transformation.
The input attributes are conceptually provided from the system, either conveyed from a client device that defines the transaction attributes in a device-specific way, or set in a system-specific way by the interconnect before the transaction is input to the SMMU. As an overview:
• Permission-related attributes (instruction/data, privileged/unprivileged) and read/write properties are used for checking against translation table permissions, which might deny the access. The permission-related attributes input into the SMMU might be overridden on a per-device basis before the permission checks are performed, using the INSTCFG, PRIVCFG, and NSCFG STE fields. See section 13.5 Summary of attribute/permission configuration fields for information about output attributes.
  Note: The overrides might be useful if a device is not able to express a particular kind of traffic.
• Other attributes (memory type, Shareability, cache hints) are intended to affect the memory system rather than the SMMU, for example to control cache lookup for the transaction. The attributes output into the memory system are a function of the attributes specified by the translation table descriptors (at stage 1, stage 2, or both) used to translate the input address. The SMMU might convey attributes input from a device through this process, so that the device can influence the final transaction access, and input attributes might be overridden on a per-device basis using the MTCFG/MemAttr, SHCFG, and ALLOCCFG STE fields.
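The per-field choice between using the incoming value and selecting an override can be sketched in C for a two-valued attribute such as instruction/data (INSTCFG) or privileged/unprivileged (PRIVCFG). The enum and function below are hypothetical and do not reproduce the architected STE field encodings, which are defined in the STE field descriptions.

```c
#include <stdbool.h>

/* Hypothetical override setting for one two-valued input attribute. */
enum attr_override {
    OVR_USE_INCOMING, /* pass the incoming attribute through */
    OVR_FORCE_FALSE,  /* for INSTCFG, for example: always Data */
    OVR_FORCE_TRUE,   /* for INSTCFG, for example: always Instruction */
};

/* Returns the attribute value used for the permission checks. */
static bool apply_override(enum attr_override cfg, bool incoming)
{
    switch (cfg) {
    case OVR_FORCE_FALSE:
        return false;
    case OVR_FORCE_TRUE:
        return true;
    case OVR_USE_INCOMING:
    default:
        return incoming;
    }
}
```

For example, a stream configured as Always Data passes `false` (data) to the permission check whatever the incoming instruction/data property was.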
The input attribute, modified by these fields, is primarily useful for setting the resulting output access attribute when both stage 1 and stage 2 translation are bypassed (there are no translation table descriptors to determine the attribute). It can also be useful for stage 2-only configurations, in which a device stream might have finer knowledge about the required access behavior than the general, virtual machine-global stage 2 translation tables.
The STE attribute and permission override fields, MTCFG/MemAttr, SHCFG, ALLOCCFG, INSTCFG, PRIVCFG, and NSCFG, allow an incoming value to be used or, for each field, a specific override value to be selected. For example, INSTCFG can configure a stream as Always Data, replacing an incoming INST property that might be in either state.
However, in SMMU implementations that are closely coupled to, or embedded in, a device, the incoming attribute can always be considered to be the most appropriate. When an SMMU and device guarantee that the incoming attributes are correct, it is permissible for the SMMU to always use the incoming value for each attribute. See SMMU_IDR1.ATTR_TYPES_OVR and SMMU_IDR1.ATTR_PERMS_OVR for more information. For an SMMU that cannot guarantee that these attributes are always provided correctly from the client device, for example a discrete SMMU design, Arm strongly recommends supporting overrides of incoming attributes.
3.3.5 Translation table descriptors
The A-profile architecture [2] defines bits [63:60] of stage 2 Block and Page descriptors as being Reserved for use by a System MMU. In SMMUv3.1 and later, these bits are Reserved, RES0.
Note: When PBHA is enabled for a bit in this range, bits [62:60] are affected by the PBHA mechanism. When PBHA is not enabled, the previous definition applies.

3.4 Address sizes
There are three address size concepts to consider in the SMMU: the input address size from the system, the Intermediate Address Size (IAS), and the Output Address Size (OAS).
• The SMMU input address size is 64 bits.
  – Note: See section 3.4.1 Input address size and Virtual Address size for recommendations on how a smaller interconnect or device address capability is presented to the SMMU.
• IAS reflects the maximum usable IPA of an implementation that is generated by stage 1 and input to stage 2:
  – This term is defined to illustrate the handling of intermediate addresses in this section and is not a configurable parameter.
  – The maximum usable IPA size of an SMMU is defined in terms of other SMMU implementation choices, as:
      IAS = MAX((SMMU_IDR0.TTF[0]==1 ? 40 : 0), (SMMU_IDR0.TTF[1]==1 ? OAS : 0));
  – VMSAv8-32 LPAE always supports an IPA size of 40 bits, whereas VMSAv8-64 and VMSAv9-128 limit the maximum IPA size to the maximum PA size. When VMSAv8-32 LPAE is not implemented, the IPA size equals the OAS (the PA size) and might be smaller than 40 bits.
  – The IAS term is defined to abstract away from these implementation variables.
• OAS reflects the maximum usable PA output from the last stage of VMSAv8-64 or VMSAv9-128 translations, and must match the system physical address size. The OAS is discoverable from SMMU_IDR5.OAS. Final-stage VMSAv8-32 LPAE translations always output 40 bits, which are zero-extended into a larger OAS or truncated to a smaller OAS.
Note: Except where explicitly noted, all address translation and fault checking behavior is consistent with Armv8-A [2].
If the SMMU is disabled (SMMU_(*_)CR0.SMMUEN == 0) and SMMU_(*_)GBPA.ABORT == 0 allows traffic to bypass, the input address is presented directly as the output PA. If the input address of a transaction exceeds the size of the OAS, the transaction is terminated with an abort and no event is recorded.
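The IAS definition above translates directly into code. The following C sketch is illustrative: the function and parameter names are hypothetical, `ttf_aarch32` and `ttf_aarch64` stand for SMMU_IDR0.TTF[0] and SMMU_IDR0.TTF[1], and `oas_bits` is assumed to be the OAS already decoded from SMMU_IDR5.OAS into a bit count. `addr_fits` is the comparison behind checks such as "the input address exceeds the size of the OAS".

```c
#include <stdbool.h>
#include <stdint.h>

/* IAS = MAX((TTF[0] == 1 ? 40 : 0), (TTF[1] == 1 ? OAS : 0)) */
static unsigned smmu_ias_bits(bool ttf_aarch32, bool ttf_aarch64, unsigned oas_bits)
{
    unsigned from_lpae = ttf_aarch32 ? 40u : 0u; /* VMSAv8-32 LPAE: fixed 40-bit IPA */
    unsigned from_a64 = ttf_aarch64 ? oas_bits : 0u; /* VMSAv8-64: IPA capped to PA size */
    return from_lpae > from_a64 ? from_lpae : from_a64;
}

/* True when addr is representable in 'bits' bits (bits may be up to 64). */
static bool addr_fits(uint64_t addr, unsigned bits)
{
    return bits >= 64 || (addr >> bits) == 0;
}
```

As a usage example, when the SMMU is disabled with bypass allowed, a transaction for which `addr_fits(addr, oas_bits)` is false is terminated with an abort and no event is recorded.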
Otherwise, when SMMU_(*_)CR0.SMMUEN == 1, transactions are treated as described in the rest of this section.
When a stream selects an STE with STE.Config == 0b100, transactions bypass all stages of translation. If the input address of a transaction exceeds the size of the OAS, the transaction is terminated with an abort and a stage 1 Address Size fault (F_ADDR_SIZE) is recorded.
Note: In Armv8-A PEs, when both stages of translation are bypassed, a (stage 1) Address Size fault might be generated where an (input) address is greater than the PA size, depending on whether the PE is in AArch32 or AArch64 state. This behavior does not translate directly to the SMMU, because no configuration is available to select a translation system when translation is bypassed or disabled. Therefore, the SMMU always tests the address size.
When a stream selects an STE with one or more stages of translation present:
For input to stage 1, the input address is treated as a VA (see section 3.4.1 Input address size and Virtual Address size) and, if stage 1 is not bypassed, the following stage 1 address checks are performed:
1. On input, a stage 1 Translation fault (F_TRANSLATION) occurs if the VA is outside the range specified by the relevant CD:
   a. For a CD configured as VMSAv8-32 LPAE, the maximum input range is fixed at 32 bits, and the range of the address input into a given TTB0 or TTB1 translation table is determined by the T0SZ and T1SZ fields.
      Note: The TTB0/TTB1 translation table input spans might be arranged so that a range of 32-bit addresses lies outside both spans and always causes a Translation fault.

   b. For a CD configured as VMSAv8-64, the range is determined by the T0SZ and T1SZ fields.
      i. For SMMUv3.0, up to a maximum of 49 bits (two 48-bit TTB0/TTB1 ranges).
      ii. For SMMUv3.1 and later, the maximum input size for each TTBx is:
         • 48 bits, if any of the following are true:
           – SMMU_IDR5.VAX == 0b00.
           – The TTBx is configured for a 4KB or 16KB granule using CD.TGx, and CD.DS == 0.
         • 52 bits, if SMMU_IDR5.VAX == 0b01 or 0b10 and one of the following is true:
           – The TTBx is configured for a 64KB granule using CD.TGx.
           – CD.DS == 1 and the TTBx is configured for a 4KB or 16KB granule using CD.TGx.
   c. For a CD configured as VMSAv9-128, the range is determined by the T0SZ and T1SZ fields. For each TTBx, the maximum input size is:
      • 48 bits, if SMMU_IDR5.VAX == 0b00.
      • 52 bits, if SMMU_IDR5.VAX == 0b01.
      • 55 bits for EL1 and EL2-E2H, if SMMU_IDR5.VAX == 0b10.
      • 56 bits for EL3, if SMMU_IDR5.VAX == 0b10.
   A VA is inside the range only if it is correctly sign-extended from the top bit of the range size upwards, although an exception is made for Top Byte Ignore (TBI) configurations.
   Note: For example, with a 49-bit VA range and TBI disabled, addresses 0x0000FFFFFFFFFFFF and 0xFFFF000000000000 are within the range but 0x0001000000000000 and 0xFFFE000000000000 are not. See section 3.4.1 Input address size and Virtual Address size for details.
2. The address output from the translation causes a stage 1 Address Size fault if it exceeds the range of the effective IPA size for the given CD:
   a. For VMSAv8-32 LPAE CDs, the IPA size is fixed at 40 bits (the IPS field of the CD is IGNORED).
   b. For VMSAv8-64 and VMSAv9-128 CDs, the IPA size is given by the effective value of the IPS field of the CD, which is capped to the OAS.
If stage 1 is bypassed (because STE.Config == 0b1x0 or STE.S1DSS == 0b01, or because stage 1 is not implemented), the input address is passed directly to stage 2 as the IPA.
If the input address of a transaction exceeds the size of the IAS, a stage 1 Address Size fault occurs, the transaction is terminated with an abort, and F_ADDR_SIZE is recorded. Otherwise, the address might still lie outside the range that stage 2 accepts; in this case, stage 2 check 1 (listed below) causes a stage 2 Translation fault.
Note: TBI can only be enabled when a CD is used (that is, when stage 1 translates) and is always disabled when stage 1 is bypassed or disabled.
Note: The SMMU stage 1 bypass behavior is analogous to a PE with stage 1 disabled but stage 2 translating. The SMMU checks stage 1 bypassed addresses against the IAS, which (when VMSAv8-32 LPAE support is implemented) might be greater than the PA size. This supports stage 2-only assignment of devices to guest VMs that expect to program 40-bit DMA addresses, which are input to stage 2 translation.
Note: This also means that an SMMU implementing only stage 2, or implementing both stages but translating through stage 2 only, can still produce a fault marked as coming from stage 1.
Stage 2 receives an IPA and, if stage 2 is not bypassed, the following stage 2 address size checks are performed:
1. On input, a stage 2 Translation fault occurs if the IPA is outside the range configured by the S2T0SZ field of the relevant STE.
   a. For an STE configured as VMSAv8-32 LPAE (see STE.S2AA64), the input range is capped at 40 bits and cannot exceed 40 bits, regardless of the IAS.
   b. For an STE configured as VMSAv8-64 or VMSAv9-128, the input range is capped to the IAS.
   Note: If the SMMU supports VMSAv8-32, the IAS is at least 40 bits, even if OAS < 40. This ensures, for a system with OAS < 40, that a VMSAv8-64 stage 2 can accept a 40-bit IPA from a VMSAv8-32 LPAE stage 1.

   c. For SMMUv3.1 and later, when the IAS is 52 bits or 56 bits, for an STE configured as VMSAv8-64 the stage 2 input range is limited to 52 bits, and is further limited to 48 bits unless STE.S2TG indicates a 64KB granule or STE.S2DS == 1.
2. The address output from the translation causes a stage 2 Address Size fault if it exceeds the effective PA output range:
   a. For a VMSAv8-64 or VMSAv9-128 STE, this is the effective value configured in the S2PS field of the STE (which is capped to the OAS).
      Note: For information on the permitted effective output address sizes, see STE.S2PS.
   b. For a VMSAv8-32 LPAE STE, this output range is fixed at 40 bits and the STE.S2PS field is IGNORED. If the OAS is less than 40 and the output address is outside the range of the OAS, the address is silently truncated to fit the OAS.
After this check, if the output address of stage 2 is smaller than the OAS, the address is zero-extended to match the OAS.
If stage 2 is bypassed (because STE.Config == 0b10x, or because stage 2 is not implemented), the IPA is presented directly as the output PA. If the IPA is outside the range of the OAS, the address is silently truncated to fit the OAS. If the IPA is smaller than the OAS, it is zero-extended.
Note: Because the SMMU checks configuration structures for validity before beginning translation table walks, certain configuration errors are detected as an invalid structure configuration. These include STE.S2TTB being out of range of the effective stage 2 output address size, or CD.TTBx being out of range of the effective stage 1 output address size. These errors invoke C_BAD_STE or C_BAD_CD configuration faults, respectively, instead of an Address Size fault.
3.4.1 Input address size and Virtual Address size
The architectural input address size of the SMMU is 64 bits.
If a client device outputs an address smaller than 64 bits, or if the interconnect between a client device and the SMMU input supports fewer than 64 bits of address, the smaller address is converted to a 64-bit SMMU input address in a system-specific manner. This conversion is outside the scope of this specification, but must comply with the rules in this section.
The A-profile architecture supports different maximum VA sizes, as follows:
• Armv8.0 and Armv8.1 [2] support a maximum of a 49-bit VA in AArch64 state, meaning there are up to 49 significant lower bits that are sign-extended to a 64-bit address.
• Armv8.2 to Armv8.8 [2] support a maximum of a 53-bit VA or a 49-bit VA in AArch64 state, meaning there are up to 53 or 49 significant lower bits that are sign-extended to a 64-bit address.
• Armv8.9 [2] supports a maximum of a 56-bit VA, 53-bit VA or 49-bit VA in AArch64 state, meaning there are up to 56, 53 or 49 significant lower bits, respectively, that are sign-extended to a 64-bit address. A 56-bit VA is supported for VMSAv9-128 translations.
Stage 1 translation contexts configured as VMSAv8-64 or VMSAv9-128 have an input VA range that is configurable up to the maximum supported size described above (arranged as two halves and translated through TTB0 and TTB1).
The term VAS represents the VA size chosen for a given SMMU implementation, determined as follows:
• When SMMU_IDR5.VAX == 0b00, VAS is 49 bits (2 × 48 bits).
• When SMMU_IDR5.VAX == 0b01, VAS is 53 bits (2 × 52 bits).
• When SMMU_IDR5.VAX == 0b10, VAS is 56 bits (2 × 55 bits for EL1 and EL2-E2H StreamWorlds, or 1 × 56 bits for the EL3 StreamWorld).
Note: In SMMUv3.0, SMMU_IDR5.VAX is RES0 and therefore VAS is always 49 bits.
Stage 1’s high translation table, TTB1, can only be selected if VAS significant bits of address are presented to the SMMU sign-extended.
If applications require use of both TTB0 and TTB1, the system design must transmit addresses of at least VAS bits end-to-end, from device address registers through the interconnect to the SMMU, and must ensure that sign-extension occurs from the input MSB upwards as described in this section.
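The VAS values above, and the per-TTBx limits for a VMSAv8-64 stage 1 listed earlier in this section, can be summarized in a small C sketch. The function names are hypothetical, `vax` is assumed to hold the SMMU_IDR5.VAX field value, `granule_64k` and `ds` stand for CD.TGx selecting a 64KB granule and CD.DS, and reserved VAX encodings are not modeled. `ttb1_reachable` expresses the rule that TTB1 can be selected only when at least VAS address bits reach the SMMU, assuming those bits arrive correctly sign-extended.

```c
#include <stdbool.h>

/* VAS in bits for a given SMMU_IDR5.VAX value; -1 for encodings not covered here. */
static int vas_bits(unsigned vax)
{
    switch (vax) {
    case 0x0: return 49; /* 2 x 48 bits */
    case 0x1: return 53; /* 2 x 52 bits */
    case 0x2: return 56; /* 2 x 55 bits, or 1 x 56 bits for EL3 */
    default:  return -1;
    }
}

/* Maximum input size of one TTBx for a VMSAv8-64 CD, SMMUv3.1 and later.
 * Assumes vax is 0b00, 0b01 or 0b10. */
static int v8_64_ttbx_max_bits(unsigned vax, bool granule_64k, bool ds)
{
    if (vax == 0x0)
        return 48;
    if (granule_64k || ds)
        return 52; /* 64KB granule, or 4KB/16KB granule with CD.DS == 1 */
    return 48;     /* 4KB or 16KB granule with CD.DS == 0 */
}

/* TTB1 is selectable only if at least VAS sign-extended bits reach the SMMU. */
static bool ttb1_reachable(int uas_bits, unsigned vax)
{
    int vas = vas_bits(vax);
    return vas > 0 && uas_bits >= vas;
}
```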

Stage 1 translation contexts configured as VMSAv8-32 LPAE have a 32-bit VA. In this case, bits [31:0] of the input address are used directly as the VA. A Translation fault is raised if the upper 32 bits of the input address are not all zero. The TxSZ fields are used to select TTB0 or TTB1 from the upper bits [31:n] of the input range.
Input range checks made for a stage 1 VMSAv8-64 or VMSAv9-128 translation table configured (with TxSZ) for an input range of N significant bits fail unless bits VA[AddrTop:N-1] are identical.
• When Top Byte Ignore (TBI) is not enabled, AddrTop == 63.
• When TBI is enabled, AddrTop == 55, which means that VA[63:56] are ignored.
When TBI is enabled, only VA[55:N-1] must be identical, and the effective VA[63:56] bits are taken to be a sign-extension of VA[55] for translation purposes.
Note: TBI configuration is part of the CD, so it can only be enabled when stage 1 translates. When stage 1 is bypassed or disabled, no CD is used and TBI is always disabled.
The term Upstream Address Size (UAS) represents the number of effective bits of address presented to the SMMU from a client device:
1. If 57 <= UAS < 64, TBI has meaning, because bits supplied in VA[63:56] might differ from VA[55:(VAS-1)]. TBI determines whether a Translation fault is invoked if they differ.
2. If VAS <= UAS < 57, TBI is meaningless, because the input sign-extension means that VA[63:56] cannot differ from VA[55].
3. If UAS <= VAS, the range checks can only fail if a translation table range is configured, using T0SZ (or T1SZ, if UAS == 49), to be smaller than the presented address. That is, the maximum configuration of stage 1 translation tables covers any presented input address.
For VMSAv8-64 and VMSAv9-128, the stage 1 translation table, TTB0 or TTB1, is selected based on VA[55].
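The input range check and table selection described above can be sketched in C. This is a simplified, hypothetical model: `n` is the N of the text (the number of significant bits, for example 49 for two 48-bit halves), the code assumes 1 <= n <= AddrTop + 1, and it treats the TTB0 and TTB1 ranges symmetrically, whereas real configurations can set T0SZ and T1SZ independently. The test values reproduce the 49-bit example given earlier in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if VA[AddrTop:n-1] are all identical (all zeros or all ones).
 * AddrTop is 55 when TBI is enabled, otherwise 63. */
static bool va_in_range(uint64_t va, unsigned n, bool tbi)
{
    unsigned addr_top = tbi ? 55u : 63u;
    unsigned width = addr_top - (n - 1) + 1; /* number of bits compared */
    uint64_t mask = (width >= 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    uint64_t field = (va >> (n - 1)) & mask;
    return field == 0 || field == mask;
}

/* TTB0 is selected when VA[55] == 0, TTB1 when VA[55] == 1. */
static int select_ttb(uint64_t va)
{
    return (int)((va >> 55) & 1u);
}
```

With TBI enabled, an address such as 0x12FF000000000000 passes the check because VA[63:56] is not compared, and VA[55] == 1 selects TTB1.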
Because a client device address narrower than VAS bits is zero-extended to 64 bits, VA[55] == 0 for such an address and TTB1 is never selected.
If any upper address bits of a 64-bit address programmed into a peripheral are not available to the SMMU sign-checking logic, whether because of truncation in the interconnect or in the peripheral, software must not rely on mis-programmed upper bits to cause a Translation fault in the SMMU. If such checking is required in such a system, software must check the validity of the upper bits of DMA addresses programmed into the device.
All input address bits are recorded unmodified in SMMU fault event records.
3.4.2 Address alignment checks
The SMMU architecture does not check the alignment of incoming transaction addresses.
Note: For a PE, the alignment check is based on the size of an access. This semantic is not directly applicable to client device accesses.
3.4.3 Address sizes of SMMU-originated accesses
Distinct from client device accesses forwarded into the system, the SMMU originates accesses to the system for the purposes of:
• Configuration structure access (STE, CD).
• Queue access (Command, Event, PRI).
• MSI interrupt writes.
• Last-stage translation table walks.
  – Note: Addresses output from stage 1 walks in a nested configuration are input to stage 2 and translated in the expected manner (including causing stage 1 Address Size faults, or stage 2 Translation faults from IPAs outside the stage 2 translation range), rather than being output into the system directly.
An access address can be out of range if it relates to a base address that is already greater than an allowed address size, or if an index is applied to a base address so that the result is greater than an allowed address size. If an access address is calculated to be a PA value greater than the SMMU OAS or physical address size, or an IPA value greater than the IAS or intermediate address size, the behavior is as follows (access type; configuring field; address type: behavior when the address is too large):
• CD fetch or L1CD fetch; configured by STE.S1ContextPtr:
  – If stage 1-only, PA: truncate to OAS, or F_CD_FETCH, or C_BAD_STE (1).
  – If stage 2 is present, IPA: truncate to IAS, or C_BAD_STE, or stage 2 Translation fault (2).
• CD fetch; configured by L1CD.L2Ptr:
  – If stage 1-only, PA: truncate to OAS, or F_CD_FETCH, or C_BAD_SUBSTREAMID (3).
  – If stage 2 is present, IPA: truncate to IAS, or C_BAD_SUBSTREAMID, or stage 2 Translation fault (4).
• STE fetch; configured by SMMU_(*_)STRTAB_BASE or L1STD.L2Ptr; PA: truncate to OAS, or F_STE_FETCH (5).
• VMS fetch; configured by STE.VMSPtr; PA: C_BAD_STE.
• Queue access; configured by SMMU_(*_)*Q_BASE; PA, or IPA where the base address undergoes translation in the SMMU (the base address in SMMU_DCMDQ_BASEn.ADDR, see section 3.5.7.3 DCMDQ qSID and STE Association): truncate to OAS (6).
• MSI write; configured by SMMU_(*_)*_IRQ_CFG{0,1,2} or CMD_SYNC arguments; PA: truncate to OAS (6).
• Last-stage translation table walk:
  – Addresses derived from intermediate translation table descriptors located using STE.S2TTB, STE.S_S2TTB or CD.TTB{0,1}, after the first-level translation table descriptor fetch; PA: stage 1 or stage 2 Address Size fault.
  – Starting-level translation table descriptor address in STE.S2TTB, STE.S_S2TTB or CD.TTB{0,1}; PA: CD or STE ILLEGAL (see the CD.TTB{0,1} and STE.S2TTB descriptions).
In the context of these respective access types:
1. An implementation of SMMUv3.1 or later generates C_BAD_STE and terminates the transaction. It is CONSTRAINED UNPREDICTABLE whether an SMMUv3.0 implementation:
   a. Generates an F_CD_FETCH and terminates the transaction. The event contains the non-truncated fetch address.
   b. Generates a C_BAD_STE and terminates the transaction.
   c. Truncates STE.S1ContextPtr to the OAS and initiates a read of a CD/L1CD from this address (translation continues).
2. It is CONSTRAINED UNPREDICTABLE whether an implementation:
   a. Generates a C_BAD_STE and terminates the transaction.
   b. Inputs the IPA to stage 2 without truncation, generating a stage 2 Translation fault that reports a non-truncated fault address.
   c. SMMUv3.0 only: inputs the IPA to stage 2 with truncation to the IAS. If translation is successful, initiates a read of a CD/L1CD from the result; otherwise, generates a stage 2 fault that reports a truncated fault address.
3. An implementation of SMMUv3.1 or later generates C_BAD_SUBSTREAMID and terminates the transaction. It is CONSTRAINED UNPREDICTABLE whether an SMMUv3.0 implementation:
   a. Generates an F_CD_FETCH and terminates the transaction. The event contains the non-truncated fetch address.
   b. Generates a C_BAD_SUBSTREAMID and terminates the transaction.
   c. Truncates L1CD.L2Ptr to the OAS and initiates a read of a CD from this address (translation continues).
4. It is CONSTRAINED UNPREDICTABLE whether an implementation:
   a. Generates a C_BAD_SUBSTREAMID and terminates the transaction.
   b. Inputs the IPA to stage 2 without truncation, generating a stage 2 Translation fault that reports a non-truncated fault address.
   c. SMMUv3.0 only: inputs the IPA to stage 2 with truncation to the IAS. If translation is successful, initiates a read of a CD from the result; otherwise, generates a stage 2 fault that reports a truncated fault address.
5. It is CONSTRAINED UNPREDICTABLE whether an implementation truncates an STE fetch address (and continues translation) or generates an F_STE_FETCH condition, which terminates the transaction and might deliver an error event.
6. Note: When hypervisor software presents an emulated SMMU interface to a guest, Arm recommends that guest-provided addresses are correctly masked to the IPA size to ensure consistent SMMU behavior from the perspective of the guest driver.
In all cases where a non-truncated address is reported in a fault (for instance, a stage 2 Translation fault), the reported address is the calculated address of the structure being accessed, for example an L1CD address calculated from the base address in STE.S1ContextPtr indexed by the incoming SubstreamID.
The address of an L1CD or CD, given by STE.S1ContextPtr or L1CD.L2Ptr, is not subject to a stage 1 Address Size fault check.
In summary, when configuration registers, command fields, and structure fields are programmed with out-of-range physical addresses, the SMMU might truncate the addresses to the OAS or PA size.
Note: This behavior arises in part because register address fields are not required to provide storage for high-order physical address bits beyond the OAS. See section 6.3 Register formats for details.
Note: Commands, register fields, and structure fields that take IPA addresses store the entire field width so that a potential stage 2 fault can be correctly raised (providing a full non-truncated IPA in a fault record).
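Several of the access types above, and the stage 2 bypass rules earlier in this section, rely on silently truncating an address to the OAS. A minimal C sketch of that operation follows; the function names are hypothetical. Zero-extension needs no code when addresses are held in a 64-bit integer, because the upper bits of a narrower value are already zero. The `pa_fits_oas` helper shows one way a driver might reject an out-of-range base address before programming it, instead of relying on truncation, which would silently alias a different PA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Truncate an address to an OAS of oas_bits bits (1 to 64). */
static uint64_t truncate_to_oas(uint64_t pa, unsigned oas_bits)
{
    if (oas_bits >= 64)
        return pa;
    return pa & ((UINT64_C(1) << oas_bits) - 1);
}

/* Hypothetical driver-side guard: true if pa survives truncation unchanged. */
static bool pa_fits_oas(uint64_t pa, unsigned oas_bits)
{
    return truncate_to_oas(pa, oas_bits) == pa;
}
```

For example, a 40-bit VMSAv8-32 LPAE output of 0xFFFFFFFFFF presented to a 36-bit OAS becomes 0xFFFFFFFFF.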

3.5 Command and Event queues
All SMMU queues for both input to, and output from, the SMMU are arranged as circular buffers in memory. A programming interface has one Command queue for input and one Event queue (and optionally one PRI queue) for output. Each queue is used in a producer-consumer fashion: an output queue contains data produced by the SMMU and consumed by software, and an input queue contains data produced by software and consumed by the SMMU.
3.5.1 SMMU circular queues
A queue is arranged as a circular FIFO of 2^n items, with a base pointer and two index registers, PROD and CONS, indicating the current producer and consumer positions in the queue. In each of the output and input roles, only one index is maintained by the SMMU, and the other is maintained by software.
For an input queue (the Command queue), the PROD index is updated by software after inserting an item into the queue, and is read by the SMMU to determine new items. The CONS index is updated by the SMMU as items are consumed, and is read by software to determine that items are consumed and space is free. An output queue is the exact opposite.
PROD indicates the index of the location that can be written next by the producer, if the queue is not full. CONS indicates the index of the location that can be read next, if the queue is not empty. The indexes must always increment and wrap to the bottom when they pass the top entry of the queue.
The queues use the mirrored circular buffer arrangement that allows all entries to be valid simultaneously (rather than N-1 valid entries in other circular buffer schemes). Each index has a wrap flag, represented by the next higher bit adjacent to the index value contained in PROD and CONS. This bit must toggle each time the index wraps off the high end and back onto the low end of the buffer.
It is the responsibility of the owner of each index, producer or consumer, to toggle this bit when the owner updates the index after wrapping. It is intended that software reads the register, increments or wraps the index (toggling wrap when required), and writes back both wrap and index fields at the same time. This single update prevents inconsistency between index and wrap state.
• If the two indexes are equal and their wrap bits are equal, the queue is empty and nothing can be consumed from it.
• If the two indexes are equal and their wrap bits are different, the queue is full and nothing can be produced to it.
• If the two indexes differ or the wrap bits differ, the consumer consumes entries, incrementing the CONS index until the queue is empty (both indices and wrap bits are equal).
Therefore, the wrap bits differentiate the cases of an empty buffer and a full buffer where otherwise both indexes would indicate the same location in both full and empty cases.
On initialization, the queue indexes are written by the agent controlling the SMMU before enabling the queue. The queue indexes must be initialized into one of the following consistent states:
• PROD.WR == CONS.RD and PROD.WR_WRAP == CONS.RD_WRAP, representing an empty queue.
  – Note: Arm expects this to be the state on normal initialization.
• PROD.WR == CONS.RD and PROD.WR_WRAP != CONS.RD_WRAP, representing a full queue.
• PROD.WR > CONS.RD and PROD.WR_WRAP == CONS.RD_WRAP, representing a partially-full queue.
• PROD.WR < CONS.RD and PROD.WR_WRAP != CONS.RD_WRAP, representing a partially-full queue.
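The wrap-bit rules above can be sketched in C. In this hypothetical model, a PROD or CONS value for a queue of 2^n entries holds the index in bits [n-1:0] and the wrap flag in bit [n]; other fields of the real PROD and CONS registers are ignored, and the struct and function names are invented for the example. `q_inc` relies on the fact that adding 1 to the combined wrap:index value carries into the wrap bit exactly when the index passes the top entry, so index and wrap are updated together in one value, as the text recommends.

```c
#include <stdbool.h>
#include <stdint.h>

struct smmu_queue {
    unsigned log2size; /* n, where the queue holds 2^n entries */
};

static uint32_t q_idx(const struct smmu_queue *q, uint32_t v)
{
    return v & ((UINT32_C(1) << q->log2size) - 1); /* bits [n-1:0] */
}

static uint32_t q_wrap(const struct smmu_queue *q, uint32_t v)
{
    return (v >> q->log2size) & 1u; /* bit [n] */
}

static bool q_empty(const struct smmu_queue *q, uint32_t prod, uint32_t cons)
{
    return q_idx(q, prod) == q_idx(q, cons) && q_wrap(q, prod) == q_wrap(q, cons);
}

static bool q_full(const struct smmu_queue *q, uint32_t prod, uint32_t cons)
{
    return q_idx(q, prod) == q_idx(q, cons) && q_wrap(q, prod) != q_wrap(q, cons);
}

/* False for the two inconsistent states that software must never write. */
static bool q_consistent(const struct smmu_queue *q, uint32_t prod, uint32_t cons)
{
    uint32_t wr = q_idx(q, prod), rd = q_idx(q, cons);
    bool same_wrap = q_wrap(q, prod) == q_wrap(q, cons);
    if (wr > rd)
        return same_wrap;
    if (wr < rd)
        return !same_wrap;
    return true; /* equal indexes: empty or full */
}

/* Advance by one entry, toggling the wrap bit when the index wraps to 0. */
static uint32_t q_inc(const struct smmu_queue *q, uint32_t v)
{
    uint32_t field = (UINT32_C(1) << (q->log2size + 1)) - 1; /* wrap + index */
    return (v + 1) & field;
}
```

With the 128-entry configuration used as an example in this section (n == 7), incrementing 0x7F (index 127, wrap 0) yields 0x80 (index 0, wrap 1).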
The agent controlling the SMMU must not write queue indexes to any of the following inconsistent states, whether at initialization or after the queue is enabled:
• PROD.WR > CONS.RD and PROD.WR_WRAP != CONS.RD_WRAP
• PROD.WR < CONS.RD and PROD.WR_WRAP == CONS.RD_WRAP
If the queue indexes are written to an inconsistent state, one of the following CONSTRAINED UNPREDICTABLE behaviors is permitted:
• The SMMU consumes, or produces as appropriate to the given queue, queue entries at UNKNOWN locations in the queue.

• The SMMU does not consume, or produce as appropriate, queue entries while the queue indexes are in an inconsistent state.
• For a queue where the SMMU is the producer, the SMMU treats the queue as though it is full while the queue indexes are in an inconsistent state.
Each circular buffer is 2^n items in size, where 0 <= n <= 19. An implementation might support fewer than 19 bits of index. Each PROD and CONS register is 20 bits, to accommodate the maximum 19-bit index plus the wrap bit. The actual buffer size is determined by software, up to the discoverable SMMU IMPLEMENTATION DEFINED limit. The position of the wrap bit depends on the configured index size.
Note: For example, when a queue is configured with 128 entries:
• The queue indexes are 7 bits wide.
• The PROD.WR and CONS.RD fields are 7 bits wide, and the queue indexes are bits [6:0] of PROD and CONS.
• The wrap bit is bit [7] of the PROD and CONS registers. Bits [19:8] are ignored.
The lifecycle of a circular buffer is shown in Figure 3.8.
[Figure 3.8: Circular buffer/queue operation. Panels: starts empty; new entries; entries consumed, queue empty; producer wraps; entries added, queue full (wrap bits differ); entries consumed, consumer wraps (wrap bits same).]
When producing or consuming entries, software must only increment an index (except when an increment causes a wrap to the start). The index must never otherwise be moved backwards. The SMMU makes the same guarantee, only incrementing or wrapping its index values.
There is one Command queue per implemented Security state. The SMMU commands are consumed in order from this queue.
The Event queues receive asynchronous events such as faults recorded from device traffic or configuration errors (which might be discovered only when device traffic causes the SMMU to traverse the structures). On the Non-secure side, there is one global Event queue, which receives events from all Non-secure streams and configuration. When SMMU_S_IDR1.SECURE_IMPL == 1, there is also one Secure Event queue, which receives events from all Secure streams and configuration; see section 3.10.2.1 Secure commands, events and configuration. All output queues are appended to sequentially.
3.5.2 Queue entry visibility semantics
Any producer (whether the SMMU or software) must ensure that if an update to the PROD index value is observable by the consumer, all new queue entries are observable by the consumer.
For output queues from the SMMU (Event and PRI queues), the SMMU writes queue data to memory and, when that data becomes visible with respect to the rest of the Shareability domain, the SMMU allows the updated PROD index value to be observed. This is the first point at which a new queue entry is visible to the consumer. A consumer must not assume the presence of a new valid entry in a queue through any mechanism other than having first observed an updated PROD index that covers the entry position. If a consumer reads a queue entry beyond the point indicated by the last read of the PROD index, the entry contains UNKNOWN data.
Note: Interrupt ordering rules also exist; see section 3.18 Interrupts and notifications. The SMMU makes queue updates observable through the PROD index no later than the point at which it asserts the queue interrupt.
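The visibility rule can be illustrated with C11 atomics, using a software producer and consumer that share an output queue in ordinary memory. This is only an analogy under stated assumptions: a real driver reads the SMMU PROD register through MMIO and uses the barriers its platform requires, and the queue size, struct and function names here are hypothetical. The producer writes the entry before publishing PROD with a release store; the consumer reads PROD with an acquire load and then reads only the entries that PROD covers. Publishing CONS back to the producer, and the producer's full check, are omitted.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LOG2SIZE 3u                     /* hypothetical 8-entry queue */
#define NENTRIES (1u << LOG2SIZE)
#define IDX(v) ((v) & (NENTRIES - 1u))  /* index bits */
#define PTR_MASK ((NENTRIES << 1) - 1u) /* index bits plus wrap bit */

struct out_queue {
    _Atomic uint32_t prod;              /* written only by the producer */
    uint32_t cons;                      /* written only by the consumer */
    uint64_t entries[NENTRIES];
};

/* Producer: write the entry, then publish it by advancing PROD with release
 * ordering. The caller is assumed to have checked that the queue is not full. */
static void produce(struct out_queue *q, uint64_t entry)
{
    uint32_t p = atomic_load_explicit(&q->prod, memory_order_relaxed);
    q->entries[IDX(p)] = entry;
    atomic_store_explicit(&q->prod, (p + 1u) & PTR_MASK, memory_order_release);
}

/* Consumer: observe PROD first (acquire), then read only the entries it covers. */
static unsigned consume_available(struct out_queue *q, uint64_t *out, unsigned max)
{
    uint32_t p = atomic_load_explicit(&q->prod, memory_order_acquire);
    unsigned n = 0;
    while (q->cons != p && n < max) {
        out[n++] = q->entries[IDX(q->cons)];
        q->cons = (q->cons + 1u) & PTR_MASK;
    }
    return n;
}
```

Calling `consume_available` when nothing new has been published returns 0, which matches the rule that an interrupt alone does not guarantee new entries.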
Note: Software must not assume that a new queue item is present when an interrupt arrives without first reading the PROD index. For example, if a prior interrupt handler consumed all events, including those of a second batch that raised a second interrupt, the next interrupt handler invocation might find no new queue entries.

3.5.3 Event queue behavior

The SMMU might support configurable behavior for translation-related faults, which allows a faulting transaction either to be stalled, pending later resolution, or to be terminated, which immediately aborts the transaction. See section 3.12 Fault models, recording and reporting for details on fault behavior.

Events are recorded into the Event queue in response to a configuration error or translation-related fault associated with an incoming transaction. A sequence of faults or errors caused by incoming transactions could fill the Event queue and cause it to overflow if the events are not consumed fast enough.

Events resulting from stalled faulting transactions are never discarded if the Event queue is full, but are recorded when entries are consumed from the Event queue and space next becomes available. Other types of events are discarded if the Event queue is full.

Note: Arm expects that the classes of events that might be discarded are generally used for debug.

Section 7.4 Event queue overflow covers the exact queue behavior upon overflow. Arm recommends that system software consumes entries from the Event queue in a timely manner to avoid overflow during normal operation.

In all cases in this specification, when it is stated that an event is recorded, the meaning is that the event is recorded if room is available for a new entry in the Event queue and the queue is writable. A queue is writable if it is enabled, has no global error flagged, and would not otherwise overflow; see section 7.2 Event queue recorded faults and events.
Events that are not reported in response to a stalled transaction (for example, where there is no Stall field, or Stall == 0) are permitted to be discarded if they cannot be recorded. Stall events are generally not discarded and are recorded when the Event queue is next writable; see section 7.2 Event queue recorded faults and events for the exceptions to this rule. Software must consume events from the queue to free up space; otherwise, pending stall events will not be recorded. Stall events are otherwise no different from any other event. The queue is filled in the same circular order, and such events do not overwrite existing unconsumed events.

Where multiple pending events contend for a write to the Event queue, Arm recommends that an implementation does not unfairly prioritize non-stall events above events with Stall == 1, if it is possible to do so. This helps avoid the case where a steady stream of terminated transactions from a misbehaving device holds back the events of stalled transactions for an indeterminate time.

If an event is generated in response to a transaction that is terminated, there is no requirement for the event to be made visible in the Event queue before a transaction response is returned to the client. See CMD_SYNC, section 4.7.3 CMD_SYNC(ComplSignal, MSIAddress, MSIData, MSIWriteAttributes), which enforces visibility of events relating to terminated transactions.

Note: This means that an event generated in response to a terminated transaction might be visible as an SMMU event before the point at which the transaction termination is reported at the client device.

3.5.4 Definition of event record write “Commit”

Generation of an event record can be abstracted into these steps:
1. A situation that triggers an event occurs, for example a translation fault.
2. An event record is assembled internally in the SMMU.
3. It is determined that it is possible to write a new queue entry.
4. The final event record is committed to be written to the Event queue entry.
5. The event record becomes visible in the Event queue:
  a. The update to the record data location is visible to the required Shareability domain.
  b. The PROD.WR index is updated to publish the new record to software.

In terms of queue semantics, the record is not visible (even if it has been written to memory) until the write index is updated to cover the new entry.
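The five steps above, together with the stall and discard rules of section 3.5.3, can be illustrated with a deliberately simplified Python model. It is a sketch, not an implementation: `EventQueueModel` and its methods are hypothetical, steps 4 and 5 are collapsed into one append, and external aborts and the exceptions listed in section 7.2 are ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EventQueueModel:
    """Hypothetical, simplified model of Event queue recording.

    'published' stands for entries already covered by PROD.WR (step 5).
    External aborts and the exceptions listed in section 7.2 are not modelled.
    """
    capacity: int
    enabled: bool = True
    global_error: bool = False
    published: deque = field(default_factory=deque)
    pending_stall: deque = field(default_factory=deque)  # assembled, not committed
    discarded: int = 0

    def writable(self) -> bool:
        # Enabled, no global error flagged, and a write would not overflow.
        return (self.enabled and not self.global_error
                and len(self.published) < self.capacity)

    def _commit(self, record) -> None:
        # Steps 4 and 5 collapsed: commit, write the data, then advance PROD.WR.
        self.published.append(record)

    def raise_event(self, record, stall: bool) -> None:
        # Steps 1 and 2 have happened; step 3 decides whether a write is possible.
        if self.writable():
            self._commit(record)
        elif stall:
            self.pending_stall.append(record)  # buffered until next writable
        else:
            self.discarded += 1                # non-stall events may be dropped

    def consume(self, count: int) -> list:
        # Software frees entries; buffered stall records then commit in order.
        taken = [self.published.popleft()
                 for _ in range(min(count, len(self.published)))]
        while self.pending_stall and self.writable():
            self._commit(self.pending_stall.popleft())
        return taken
```

Flushing buffered stall records as soon as software frees space is one simple way to avoid starving them behind non-stall events, in the spirit of the fairness recommendation above.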
The commit point (step 4) represents the conceptual point after which the event will definitely be written to the queue and eventually become visible. Until commit, the event write might not happen (for example, if the queue is full and software never consumes any entries, the event write will never commit). An event write that has committed is guaranteed to become visible in the Event queue, provided that the subsequent write does not experience an external abort; see section 7.2 Event queue recorded faults and events.

The write of a stall event record must not commit until the queue entry is deemed writable (the queue is enabled and not full). If it is not writable, the stall record is buffered until the queue is next writable, unless one of the exceptions in section 7.2 Event queue recorded faults and events causes the record to be discarded.

3.5.5 Event merging

Implementations are permitted to merge some event records together. This might happen where multiple identical events occur, and can be used to reduce the volume of events recorded into the Event queue where individual events do not supply additional useful information.

Events can be merged where all of the following conditions are upheld:
• The event types and all fields are identical, except fields explicitly indicated in section 7.3 Event records.
• If present, the Stall field is 0. Stall fault records are not merged; see section 3.12.2 Stall model.

An implementation is not required to merge any events, but one that does is required to support the STE.MEV flag to enable or inhibit merging of events relating to a given stream.

Note: For debugging purposes, merging of some events can be disabled on a per-stream basis using the STE.MEV flag.
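The merging conditions amount to a predicate over two records. The Python sketch below is illustrative only: `EventRecord` and `may_merge` are hypothetical, the set of fields allowed to differ is supplied by the caller because section 7.3 defines it per event type, and it assumes that STE.MEV == 1 permits merging while STE.MEV == 0 inhibits it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventRecord:
    """Hypothetical event record: a type, named fields, and an optional Stall field."""
    event_type: str
    fields: dict = field(default_factory=dict)
    stall: Optional[int] = None  # None when the record type has no Stall field


def may_merge(a: EventRecord, b: EventRecord, ste_mev: int,
              exempt: frozenset = frozenset()) -> bool:
    """Return True if an implementation would be permitted to merge a and b.

    ste_mev: STE.MEV of the stream (assumed here: 1 permits, 0 inhibits merging).
    exempt: names of fields that section 7.3 allows to differ (caller-supplied).
    """
    if ste_mev != 1:
        return False
    if a.stall not in (None, 0) or b.stall not in (None, 0):
        return False  # stall fault records are never merged
    if a.event_type != b.event_type or a.stall != b.stall:
        return False
    same_a = {k: v for k, v in a.fields.items() if k not in exempt}
    same_b = {k: v for k, v in b.fields.items() if k not in exempt}
    return same_a == same_b
```

A True result only means merging is permitted; an implementation can still record both events separately, and a software emulation might ignore STE.MEV entirely.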

Software implementations (for example, a virtual emulation of an SMMU) are not required to respect STE.MEV. A hypervisor might cause events to continue to be merged after a guest requests merging to be disabled, for example if it determines that a misbehaving guest is causing too many debug events. See section 7.3.1 Event record merging for details.

3.5.6 Enhanced Command queue interfaces

This section applies only when any of the following are true:
• SMMU_IDR1.ECMDQ is 1.
• SMMU_S_IDR0.ECMDQ is 1.
• SMMU_R_IDR0.ECMDQ is 1.

An SMMU can implement multiple Command queues in the Non-secure, Secure or Realm SMMU programming interfaces. The components of the Enhanced Command queue feature are:
• Up to 256 Command queue control pages.
• Each Command queue control page contains the control interface for up to 256 queues.
• Each Command queue control page is implemented in registers in the SMMU.

The interface for each queue in a control page is similar to the SMMU_(*_)CMDQ_BASE, SMMU_(*_)CMDQ_CONS and SMMU_(*_)CMDQ_PROD registers. The presence of Enhanced Command queue interfaces does not imply removal of the SMMU_(*_)CMDQ interfaces.

For any fields in an ECMDQ interface that are RES0 when SMMU_ECMDQ_BASEn.DM is 1, see section 3.5.7 Direct-mode Enhanced Command Queues.

A Command queue control page contains multiple instances of the base, producer and consumer controls for a Command queue. Each instance is referred to as an Enhanced Command queue (ECMDQ). An implementation might have more than one Command queue control page. The number of Command queue control pages available for a given Security state is advertised in SMMU_(*_)IDR6.CMDQ_CONTROL_PAGE_LOG2NUMP.

The registers for each Command queue control page are:
• SMMU_(*_)CMDQ_CONTROL_PAGE_BASEn.
• SMMU_(*_)CMDQ_CONTROL_PAGE_CFGn.
• SMMU_(*_)CMDQ_CONTROL_PAGE_STATUSn.

Within each Command queue control page are many ECMDQ controls. Each ECMDQ occupies 16 bytes of address space.
Offset  Field                  Size       Description
0x00    SMMU_(*_)ECMDQ_BASEn   64b / 8B   Pointer to queue base address, queue size
0x08    SMMU_(*_)ECMDQ_PRODn   32b / 4B   Queue producer write index
0x0C    SMMU_(*_)ECMDQ_CONSn   32b / 4B   Queue consumer read index, error status

The SMMU reacts to updates of each SMMU_(*_)ECMDQ_PRODn in finite time.

For more information on the layout of the Command queue control pages, see:
• Section 6.1 Memory map.
• Section 6.2.5 Registers in a Command queue control page.
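The table and the 16-byte size per ECMDQ imply simple offset arithmetic, sketched below in Python. This assumes the ECMDQ controls are packed contiguously from a common origin within the page; `ecmdq_reg_offset` is a hypothetical helper, and section 6.2.5 remains the authoritative description of where the controls sit in a Command queue control page.

```python
# Field offsets and sizes taken from the table; each ECMDQ occupies 16 bytes.
ECMDQ_STRIDE = 16
ECMDQ_REGS = {
    "BASE": (0x00, 8),  # SMMU_(*_)ECMDQ_BASEn
    "PROD": (0x08, 4),  # SMMU_(*_)ECMDQ_PRODn
    "CONS": (0x0C, 4),  # SMMU_(*_)ECMDQ_CONSn
}


def ecmdq_reg_offset(n: int, reg: str, queues_per_page: int = 256) -> tuple:
    """Return (byte offset, size) of register `reg` of ECMDQ n, relative to the
    start of the ECMDQ controls in its control page (hypothetical helper)."""
    if not 0 <= n < queues_per_page:
        raise ValueError("ECMDQ index out of range for this control page")
    offset, size = ECMDQ_REGS[reg]
    return n * ECMDQ_STRIDE + offset, size
```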

3.5.6.1 Behavior

The SMMU accesses the Command queues using the attributes configured in SMMU_(*_)CR1.{QUEUE_SH, QUEUE_OC, QUEUE_IC}, and the MPAM attributes configured in SMMU_(*_)GMPAM. If any Enhanced Command queue interface is enabled such that SMMU_(*_)CR1.{QUEUE_SH, QUEUE_OC, QUEUE_IC} could be used when generating an access, then SMMU_(*_)CR1.{QUEUE_SH, QUEUE_OC, QUEUE_IC} are read-only.

The conditions for a queue being empty, full and non-empty are the same as for the SMMU_(*_)CMDQ_CONS and SMMU_(*_)CMDQ_PROD registers, as specified in section 3.5.1 SMMU circular queues. The SMMU consumes commands from a queue if the queue is non-empty.

A CMD_SYNC consumed from an ECMDQ in a Command queue control page guarantees that the effects of commands previously consumed on that queue are complete, including the reporting of events relating to configuration and translation information invalidated by those commands.

The rules for consumption of commands other than CMD_SYNC are the same as for the Command queue controlled by SMMU_(*_)CMDQ_PROD and SMMU_(*_)CMDQ_CONS. The rules for consumption of a CMD_SYNC issued on a particular Command queue through SMMU_(*_)CMDQ_PROD, or on a particular ECMDQ through SMMU_(*_)ECMDQ_PRODn, are independent of the state of the other queues. A CMD_SYNC is not required to synchronize commands and events relating to other queues.

The SMMU is permitted to consume from many queues in parallel. The SMMU does not guarantee serialization or a total order of commands consumed across different queues. For example, an implementation might consume from many queues in a round-robin or weighted round-robin schedule.

If SMMU_IDR0.SEV == 1, the SMMU triggers a WFE wake-up event when any ECMDQ becomes non-full.

3.5.6.2 Enabling and disabling an ECMDQ interface

An ECMDQ interface is enabled when SMMU_(*_)ECMDQ_PRODn.EN == SMMU_(*_)ECMDQ_CONSn.ENACK == 1. An ECMDQ interface is disabled when SMMU_(*_)ECMDQ_PRODn.EN == SMMU_(*_)ECMDQ_CONSn.ENACK == 0.
The same guarantees about being enabled, being disabled, and completing transitions between the two states apply as for SMMU_(*_)CR0.CMDQEN and SMMU_(*_)CR0ACK.CMDQEN.

In the transition from enabled to disabled, once the SMMU has updated SMMU_(*_)ECMDQ_CONSn.ENACK to 0, it is guaranteed that errors have been reported and consumption of commands has stopped, and therefore that SMMU_(*_)ECMDQ_CONSn.{ERR_REASON, ERR, RD_WRAP, RD} are stable. The SMMU updates SMMU_(*_)ECMDQ_CONSn.ENACK even if SMMU_(*_)ECMDQ_PRODn.ERRACK != SMMU_(*_)ECMDQ_CONSn.ERR.

3.5.6.3 Errors relating to an ECMDQ interface

If the SMMU encounters any error while fetching or processing a command, it toggles the value of SMMU_(*_)ECMDQ_CONSn.ERR and records the reason in SMMU_(*_)ECMDQ_CONSn.ERR_REASON. The SMMU updates SMMU_(*_)ECMDQ_CONSn.RD and SMMU_(*_)ECMDQ_CONSn.RD_WRAP to point at the command that produced ERR_REASON, in the same manner as SMMU_(*_)CMDQ_CONS.RD points at a failed command.

If SMMU_(*_)ECMDQ_PRODn.ERRACK != SMMU_(*_)ECMDQ_CONSn.ERR as the result of an error, the SMMU does not consume commands.
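The enable handshake and the toggle-based error protocol can be summarised as a few predicates. The Python below is a hypothetical sketch that works on already-decoded field values: `EcmdqRegs` and the helper names are invented, no register bit positions are assumed, RD_WRAP is omitted, and no MMIO is performed.

```python
from dataclasses import dataclass


@dataclass
class EcmdqRegs:
    """Hypothetical field-level snapshot of one ECMDQ's PROD and CONS registers."""
    prod_en: int = 0
    prod_errack: int = 0
    cons_enack: int = 0
    cons_err: int = 0
    cons_err_reason: int = 0
    cons_rd: int = 0  # read index of the failing command (RD_WRAP omitted)


def ecmdq_state(r: EcmdqRegs) -> str:
    if r.prod_en == r.cons_enack == 1:
        return "enabled"
    if r.prod_en == r.cons_enack == 0:
        return "disabled"
    return "transitioning"  # EN written, ENACK not yet updated by the SMMU


def error_pending(r: EcmdqRegs) -> bool:
    # The SMMU toggles ERR on an error and consumes nothing while ERRACK != ERR.
    return r.prod_errack != r.cons_err


def acknowledge_error(r: EcmdqRegs) -> tuple:
    """Report (RD, ERR_REASON) for the failing command, then write ERRACK := ERR."""
    if not error_pending(r):
        raise RuntimeError("no unacknowledged error on this ECMDQ")
    failing = (r.cons_rd, r.cons_err_reason)
    # A driver would inspect or correct the command at RD before acknowledging.
    r.prod_errack = r.cons_err
    return failing
```

Because ERR toggles rather than being set to 1, it is the comparison with ERRACK, not the value of ERR alone, that identifies an unacknowledged error.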

If software inappropriately configures SMMU_(*_)ECMDQ_PRODn.ERRACK so that it mismatches SMMU_(*_)ECMDQ_CONSn.ERR, it is CONSTRAINED UNPREDICTABLE whether the SMMU consumes commands from that ECMDQ and whether a subsequent error is correctly reported. This CONSTRAINED UNPREDICTABLE behavior also applies when software transitions an ECMDQ from disabled to enabled while SMMU_(*_)ECMDQ_PRODn.ERRACK != SMMU_(*_)ECMDQ_CONSn.ERR. Disabling the ECMDQ, making SMMU_(*_)ECMDQ_PRODn.ERRACK and SMMU_(*_)ECMDQ_CONSn.ERR consistent, and then enabling the ECMDQ is sufficient to restore predictable behavior.

If the update of SMMU_(*_)ECMDQ_CONSn.ERR is visible, then the updates of ERR_REASON, RD and RD_WRAP are visible, and fetches from the corresponding queue have completed.

The SMMU updates SMMU_(*_)ECMDQ_CONSn.ENACK even if SMMU_(*_)ECMDQ_PRODn.ERRACK != SMMU_(*_)ECMDQ_CONSn.ERR.

Errors relating to ECMDQs for a given Security state are additionally reported in SMMU_(*_)GERROR.CMDQP_ERR. ECMDQs operate independently of the error status of SMMU_(*_)GERROR.CMDQ_ERR. The activation, deactivation and state of SMMU_(*_)GERROR.CMDQP_ERR have no effect on the Command queue control page or the ECMDQ interfaces reached through it.

If the activation of SMMU_(*_)GERROR.CMDQP_ERR is observable, then the SMMU_(*_)ECMDQ_CONSn.ERR field indicating the reason for the activation is observable. When the SMMU activates SMMU_(*_)GERROR.CMDQP_ERR, the GERROR interrupt is triggered in the same manner as for other GERROR conditions in SMMUv3.

If the MSI from a CMD_SYNC issued through a Command queue control page experiences an external abort, the abort is reported in SMMU_(*_)GERROR.MSI_CMDQ_ABT_ERR in the same manner as for a CMD_SYNC issued through the SMMU_(*_)CMDQ interface.

3.5.7 Direct-mode Enhanced Command Queues

This section applies only when SMMU_(*_)IDR6.DCMDQ is 0b01.
Note: The use of xCMDQ in this section indicates that a statement applies to registers in either an ECMDQ interface or a DCMDQ interface.

The Direct-mode Enhanced Command Queue feature allows ECMDQs to be assigned to a guest operating system (guest OS). An ECMDQ that is assigned to a guest OS is referred to as a Direct-mode Enhanced Command Queue (DCMDQ). Each DCMDQ has a DCMDQ interface, through which it can be controlled by the guest OS. A DCMDQ interface comprises the following registers:
• SMMU_(*_)DCMDQ_BASEn.
• SMMU_(*_)DCMDQ_CONSn.
• SMMU_(*_)DCMDQ_PRODn.

Each DCMDQ interface corresponds to an ECMDQ interface. See section 3.5.7.1 Configuration of ECMDQ and DCMDQ interfaces. The hypervisor may monitor the behavior of a DCMDQ through its associated ECMDQ interface.

A DCMDQ control page contains one or more DCMDQ interfaces, each corresponding to a DCMDQ.

A DCMDQ is presented to the guest OS as a Restricted ECMDQ (RECMDQ), which is an ECMDQ with restricted capabilities. These restrictions convey the set of commands available to the guest OS. See section 3.5.8 Restricted Command Queues.

The hypervisor can provide a guest OS with access to a DCMDQ control page by setting up an emulated SMMU, in which the RECMDQ control page presented to the guest OS is mapped onto the DCMDQ control page. See section 3.5.7.5 Enabling DCMDQ control pages.

By default, the SMMU consumes commands from all enabled DCMDQs within a finite time, unless this behavior is overridden by an IMPLEMENTATION DEFINED prioritization scheme. Resetting the SMMU re-enables the default behavior.

3.5.7.1 Configuration of ECMDQ and DCMDQ interfaces

The Direct-mode Enhanced Command Queue feature defines the following fields in the existing ECMDQ interface registers:
• SMMU_(*_)ECMDQ_BASEn.{DM, VSID}.
• SMMU_(*_)ECMDQ_PRODn.HS_ERRACK.
• SMMU_(*_)ECMDQ_CONSn.{HS_ERR, HS_ERR_REASON, SYNTH_SYNC_ERR}.

Note: Register fields defined in the ECMDQ interface but not in the DCMDQ interface are not accessible to the guest OS. Accesses to these fields through the DCMDQ interface are RAZ/WI.

The number of implemented:
• DCMDQ control pages is specified, as a base-2 logarithm, in SMMU_(*_)IDR6.DCMDQ_CONTROL_PAGE_LOG2NUMP.
• DCMDQ interfaces per DCMDQ control page is specified, as a base-2 logarithm, in SMMU_(*_)IDR6.DCMDQ_CONTROL_PAGE_LOG2NUMQ.

When the hypervisor reserves ECMDQs for its own usage, the number of reserved ECMDQs must be a multiple of the number of DCMDQ interfaces per DCMDQ control page. The actual number of available DCMDQ control pages is limited by how many ECMDQs remain after reservation.

A DCMDQ is active if all of the following are true:
• SMMU_(*_)ECMDQ_CONSn.ENACK == 1.
• SMMU_(*_)ECMDQ_PRODn.EN == 1.
• SMMU_(*_)ECMDQ_BASEn.DM == 1.

Access to a DCMDQ interface is RAZ/WI if the DCMDQ interface is not active.

A DCMDQ is enabled if all of the following are true:
• It is active.
• SMMU_(*_)DCMDQ_CONSn.ENACK == 1.
• SMMU_(*_)DCMDQ_PRODn.EN == 1.

The total number of DCMDQs implemented for each Security state is equal to the number of ECMDQs implemented for that Security state.
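The active and enabled definitions, and the reservation rule, can be written as small Python predicates. This is a sketch on decoded field values: `DcmdqGate`, `dcmdq_active`, `dcmdq_enabled` and `available_dcmdq_pages` are hypothetical names, and the page calculation is one reading of the text, assuming each DCMDQ control page consumes 2^LOG2NUMQ ECMDQs and no more than 2^LOG2NUMP DCMDQ control pages exist.

```python
from dataclasses import dataclass


@dataclass
class DcmdqGate:
    """Hypothetical snapshot of the fields that gate one DCMDQ."""
    ecmdq_enack: int
    ecmdq_en: int
    ecmdq_dm: int
    dcmdq_enack: int = 0
    dcmdq_en: int = 0


def dcmdq_active(g: DcmdqGate) -> bool:
    # Hypervisor side: ECMDQ enabled and acknowledged, in direct mode.
    return g.ecmdq_enack == 1 and g.ecmdq_en == 1 and g.ecmdq_dm == 1


def dcmdq_enabled(g: DcmdqGate) -> bool:
    # The guest side must also be enabled and acknowledged.
    return dcmdq_active(g) and g.dcmdq_enack == 1 and g.dcmdq_en == 1


def available_dcmdq_pages(total_ecmdqs: int, reserved: int,
                          log2nump: int, log2numq: int) -> int:
    """DCMDQ control pages left after the hypervisor reserves `reserved` ECMDQs.

    Assumption: every DCMDQ control page needs 2**log2numq ECMDQs, and at most
    2**log2nump DCMDQ control pages are implemented.
    """
    per_page = 1 << log2numq
    if reserved % per_page:
        raise ValueError("reserved ECMDQs must be a multiple of DCMDQs per page")
    return min(1 << log2nump, (total_ecmdqs - reserved) // per_page)
```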
The base address for a DCMDQ is configured in SMMU_(*_)DCMDQ_BASEn.ADDR. This address is subject to SMMU translation.

When a DCMDQ is active, some register fields in the ECMDQ and DCMDQ interfaces become architectural aliases. For such fields, all of the following apply:
• A read from either interface returns the same value.
• ECMDQ interface access is RO.
• DCMDQ interface access is either RO or RW, depending on the access criteria of the field.

The following register fields are architectural aliases when a DCMDQ is active:
• SMMU_(*_)xCMDQ_BASEn.{RA, ADDR, LOG2SIZE}.
• SMMU_(*_)xCMDQ_PRODn.{ERRACK, WR}.
• SMMU_(*_)xCMDQ_CONSn.{ERR_REASON, ERR, RD}.

Note: SMMU_(*_)xCMDQ_PRODn.EN and SMMU_(*_)xCMDQ_CONSn.ENACK are not architectural aliases.

Note: Access to a DCMDQ interface is RAZ/WI when the SMMU comes out of reset. Once access is explicitly enabled through the corresponding ECMDQ interface, register fields might have values different from their specified reset values.

Figure 3.9 illustrates the interaction between the ECMDQ and DCMDQ control pages, the ECMDQ and DCMDQ interfaces, and their associated circular-buffer queues.

[Figure 3.9: Example DCMDQ configuration. The figure shows SMMU register page 0, whose SMMU_CMDQ_BASE/PROD/CONS registers point to the ring buffer for the main CMDQ used by the hypervisor (HV); an emulated SMMU register page (vSMMU) presented to the guest, where submitting to the emulated main CMDQ leads to trap and emulate; Command queue control pages holding SMMU_ECMDQ_BASE/PROD/CONS registers, one ECMDQ of which is in use by the HV; and a guest-accessible Direct-mode Command queue control page holding SMMU_DCMDQ_BASE/PROD/CONS registers that point to the DCMDQ ring buffer, whose behavior the HV can monitor via the corresponding ECMDQ. In the legend, ECMDQ<m,n> is ECMDQ n on ECMDQ page m.]

3.5.7.2 DCMDQ Indexing

When referring to command queue interfaces, the following definitions apply:
• Local index is used to enumerate an xCMDQ interface within a given control page.
• Global index is used to enumerate across all xCMDQ interfaces (ECMDQ or DCMDQ).
• Control page index is used to enumerate a control page.

There are two ways to index an ECMDQ or DCMDQ interface:
1. Using a control page index and a local index: xCMDQ<m,n>, where m is a control page index and n is a local index.
2. Using a global index: xCMDQ<o>, where o is a global index.

The global index of a DCMDQ interface is the same as the global index of the ECMDQ interface that controls the behavior of that DCMDQ interface.

The following equations define the relationship between m, n and o in the xCMDQ<m,n> and xCMDQ<o> notations:

$$o = \left(m \times 2^{\mathrm{xCMDQ\_CONTROL\_PAGE\_LOG2NUMQ}}\right) + n$$

$$m = \left\lfloor \frac{o}{2^{\mathrm{xCMDQ\_CONTROL\_PAGE\_LOG2NUMQ}}} \right\rfloor$$
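Because the number of interfaces per page is a power of two, the two equations reduce to shifts and masks. The Python sketch below is illustrative (`global_index` and `split_global_index` are hypothetical names); the remainder n = o mod 2^LOG2NUMQ follows from the first equation, and the range checks mirror the index limits given for m, n and o.

```python
def global_index(m: int, n: int, log2nump: int, log2numq: int) -> int:
    """o = m * 2**LOG2NUMQ + n, with range checks on m and n."""
    if not (0 <= m < (1 << log2nump) and 0 <= n < (1 << log2numq)):
        raise ValueError("control page or local index out of range")
    return (m << log2numq) + n


def split_global_index(o: int, log2nump: int, log2numq: int) -> tuple:
    """Inverse mapping: m = floor(o / 2**LOG2NUMQ), n = o mod 2**LOG2NUMQ."""
    if not 0 <= o < (1 << (log2nump + log2numq)):
        raise ValueError("global index out of range")
    return o >> log2numq, o & ((1 << log2numq) - 1)
```

For example, with four interfaces per page (LOG2NUMQ = 2), xCMDQ<2,3> is xCMDQ<11>.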

The indices must not exceed the number of implemented xCMDQ interfaces:

$$0 \le m < 2^{\mathrm{xCMDQ\_CONTROL\_PAGE\_LOG2NUMP}}$$

$$0 \le n < 2^{\mathrm{xCMDQ\_CONTROL\_PAGE\_LOG2NUMQ}}$$

$$0 \le o < 2^{\mathrm{xCMDQ\_CONTROL\_PAGE\_LOG2NUMQ}} \times 2^{\mathrm{xCMDQ\_CONTROL\_PAGE\_LOG2NUMP}}$$

3.5.7.3 DCMDQ qSID and STE Association

Each DCMDQ control page has both:
• An assigned StreamID, referred to as the queue StreamID (qSID).
• An associated Stream Table Entry (STE). The STE is selected by the qSID.

3.5.7.3.1 Queue StreamID (qSID)

The qSID is used for the purpose of issuing translation requests and interrupts associated with the corresponding DCMDQ control page.

The qSID is the concatenation of the upper bits of SMMU_(*_)IDR7.QSID_BASE, as defined by log2nump, and the DCMDQ control page index, where log2nump is SMMU_(*_)IDR6.DCMDQ_CONTROL_PAGE_LOG2NUMP:

qSID = {SMMU_(*_)IDR7.QSID_BASE[31:log2nump], DCMDQ control page index[log2nump-1:0]}.

All StreamIDs reserved for qSID usage form a contiguous range with a base value specified in SMMU_(*_)IDR7.QSID_BASE. Arm expects that StreamIDs that fall within this range are never assigned to client devices. If the StreamID of a client device transaction falls within this range, it is IMPLEMENTATION DEFINED whether:
• The transaction is silently terminated with an abort.
• The SMMU translates the client transaction.

Note: If the SMMU translates such a client transaction, the outcome is UNPREDICTABLE. For example, a GPCF might be reported as originating from a DCMDQ transaction.

3.5.7.3.2 Stream Table Entry (STE)

For a DCMDQ to function correctly, a Stream Table Entry (STE) associated with a DCMDQ control page is programmed as follows:
• STE.Config and STE.STRW configure which stages of translation are used to translate the address supplied in SMMU_(*_)DCMDQ_BASEn.ADDR and CMD_SYNC.MSIAddress. Software is expected to set STE.Config to 0b110 (stage 1 bypass, stage 2 translate).
  Note: When STE.Config is 0b110, STE.STRW is IGNORED.
• If SMMU_IDR1.ATTR_TYPES_OVR is 1, software is expected to set STE.{MTCFG, SHCFG} to {0b0, 0b01} so that the cacheability and shareability attributes of the incoming transaction are used. See section 3.5.7.4 DCMDQ Attributes.
• TLBI commands submitted over the DCMDQ interface are scoped with STE.S2VMID. Software is expected to configure the STE in such a way that STE.S2VMID is not IGNORED.
• The fault configuration is defined as follows:
  – Software is expected to set STE.S2S to 0b0 (DCMDQ interfaces are not required to support the Stall model).
  – Software is expected to set STE.S2R to 0b1.
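The qSID concatenation and the expected STE programming lend themselves to a short Python check. This is a hypothetical sketch: `qsid`, `DcmdqSte` and `dcmdq_ste_problems` are invented names working on decoded field values, not STE bit layouts, and the sketch assumes that the stage 2 translation selected by Config == 0b110 is what keeps STE.S2VMID from being IGNORED.

```python
from dataclasses import dataclass


def qsid(qsid_base: int, page_index: int, log2nump: int) -> int:
    """{QSID_BASE[31:log2nump], page_index[log2nump-1:0]} as a 32-bit value."""
    low = (1 << log2nump) - 1
    return (qsid_base & 0xFFFFFFFF & ~low) | (page_index & low)


@dataclass
class DcmdqSte:
    """Hypothetical decoded STE fields relevant to a DCMDQ control page."""
    config: int
    s2s: int
    s2r: int
    mtcfg: int
    shcfg: int


def dcmdq_ste_problems(ste: DcmdqSte, attr_types_ovr: int) -> list:
    """List departures from the expected DCMDQ STE programming."""
    problems = []
    if ste.config != 0b110:  # stage 1 bypass, stage 2 translate
        problems.append("Config")
    if ste.s2s != 0:         # DCMDQs are not required to support stalling
        problems.append("S2S")
    if ste.s2r != 1:
        problems.append("S2R")
    if attr_types_ovr == 1 and (ste.mtcfg, ste.shcfg) != (0b0, 0b01):
        problems.append("MTCFG/SHCFG")
    return problems
```

A hypervisor could run such a check before assigning a page, since the outcome of a misprogrammed STE is IMPLEMENTATION DEFINED.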

When an STE associated with a DCMDQ control page is programmed incorrectly, it is IMPLEMENTATION DEFINED whether:
• The SMMU treats the STE as programmed. Transactions related to the DCMDQ interface will be processed according to the programmed STE, which can result in UNPREDICTABLE behavior.
  – When the configuration of the STE implies STE.S2VMID is IGNORED, then for TLBI commands submitted over the DCMDQ interface, the SMMU is permitted to either:
    * Perform the invalidation on an UNKNOWN VMID value.
    * Not perform an invalidation.
  – When STE.S2S is set to 0b1, DCMDQ transactions will stall upon a fault. This might compromise system stability.
• The SMMU treats STE.S2S as 0b0 and generates a C_BAD_STE event when any of the other STE fields are configured incorrectly.
• The SMMU generates a C_BAD_STE event when any of the STE fields are configured incorrectly.

Note: In all cases, if the STE is ILLEGAL, then a C_BAD_STE event is generated. See section 5.2.2 Validity of STE.

The hypervisor is expected to invalidate the STE before assigning a DCMDQ control page to another guest OS. See section 3.5.7.6 Disabling and Reclaiming DCMDQ control pages.

3.5.7.4 DCMDQ Attributes

When commands are fetched from a DCMDQ, these fetches have all the following properties:
• The access is a Privileged Data read.
• The memory attributes are Normal-iWB-oWB-iSH.
• The Input NS attribute of the access is one of the following:
  – For Non-secure DCMDQs, the access is Non-secure.
  – For Secure DCMDQs, the access is Secure.
  – For Realm DCMDQs, the access is Realm.
• The hints are configured as for an ECMDQ:
  – The Read-Allocate hint is configured in SMMU_(*_)DCMDQ_BASEn.RA.
  – The Transient hint is IMPLEMENTATION DEFINED.

The attributes in CMD_SYNC.{MSIAttr, MSH} are the input attributes for the MSI following a CMD_SYNC. The final attributes of the fetch or MSI are determined by the translation process.
For information on the MECID used for DCMDQ-related accesses, see STE.MECID and SMMU_R_GMECID. For information on the MPAM attributes used for DCMDQ fetches, see section 17.4 Assignment of PARTID and PMG for SMMU-originated transactions. For details on the cacheability and shareability of memory accesses due to SID translation, see section 3.5.9.1 CIT and VSTT lookup process.

3.5.7.5 Enabling DCMDQ control pages

This section describes the operations performed to enable a DCMDQ control page for use by a guest OS.

3.5.7.5.1 Effects of Updating SMMU_ECMDQ_PROD.EN on a DCMDQ

When SMMU_(*_)ECMDQ_BASEn.DM is 1, setting SMMU_(*_)ECMDQ_PRODn.EN to 1 has the following effects:
• It triggers an Update of SMMU_(*_)DCMDQ_PRODn.EN to 0.
• The subsequent Update of SMMU_(*_)ECMDQ_CONSn.ENACK to 1 guarantees that the Update of SMMU_(*_)DCMDQ_CONSn.ENACK to 0 has completed.

Note: The Update of SMMU_(*_)DCMDQ_PRODn.EN to 0 guarantees that the DCMDQ will not consume any commands until it is enabled through the DCMDQ interface.
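The ordering guarantee above can be illustrated with a small state model in Python. Everything here is hypothetical: `DirectModeQueue` and its methods are invented names, updates are treated as instantaneous, the synthesized synchronization that accompanies disabling is not modelled, and the non-zero DCMDQ defaults only illustrate that fields need not hold their reset values when access is enabled.

```python
from dataclasses import dataclass


@dataclass
class DirectModeQueue:
    """Hypothetical model of one ECMDQ/DCMDQ pair with DM == 1.

    Updates are treated as instantaneous; synthesized synchronization on
    disable is not modelled.
    """
    dm: int = 1
    ecmdq_en: int = 0
    ecmdq_enack: int = 0
    dcmdq_en: int = 1     # stale state left over from a previous assignment
    dcmdq_enack: int = 1

    def active(self) -> bool:
        return self.dm == 1 and self.ecmdq_en == 1 and self.ecmdq_enack == 1

    def hv_write_ecmdq_en(self, value: int) -> None:
        self.ecmdq_en = value
        if value == 1 and self.dm == 1:
            # The DCMDQ side is forced to disabled before ENACK reports 1.
            self.dcmdq_en = 0
            self.dcmdq_enack = 0
        self.ecmdq_enack = value

    def guest_write_dcmdq_en(self, value: int) -> None:
        if not self.active():
            return  # the DCMDQ interface is RAZ/WI while not active
        self.dcmdq_en = value
        self.dcmdq_enack = value

    def may_consume(self) -> bool:
        return self.active() and self.dcmdq_en == 1 and self.dcmdq_enack == 1
```

In this model, commands can flow only after the guest explicitly enables the queue through its own interface, whatever state the DCMDQ fields held beforehand.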

3.5.7.5.2 Software behavior

To enable guest OS access to a DCMDQ control page, the hypervisor performs the following steps:
1. Disable all ECMDQ interfaces to be assigned to the guest OS by setting SMMU_(*_)ECMDQ_PRODn.EN to 0 and waiting until SMMU_(*_)ECMDQ_CONSn.ENACK is 0.
2. Configure the associated STE and, if SID translation will be enabled for this DCMDQ control page, configure the appropriate memory structures. See section 3.5.9 Virtual to physical SID translation.
3. Mark each ECMDQ to be assigned as direct-mode by setting:
  • SMMU_(*_)ECMDQ_BASEn.DM to 1.
  • If SID translation will be enabled, SMMU_(*_)ECMDQ_BASEn.VSID to 1.
4. Set SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR to 0b00.
5. Ensure that SMMU_(*_)DCMDQP_ERRNn and SMMU_(*_)DCMDQP_ERRn are consistent. See section 3.5.7.7 DCMDQ Errors and Faults.
6. Activate each DCMDQ interface by setting SMMU_(*_)ECMDQ_PRODn.EN to 1.
7. Following the Update of SMMU_(*_)ECMDQ_CONSn.ENACK to 1 for all ECMDQ interfaces, the DCMDQ control page is active.

3.5.7.6 Disabling and Reclaiming DCMDQ control pages

This section describes the operations performed to reclaim a DCMDQ control page from a guest OS. In a typical scenario, the guest OS will have been descheduled, and therefore the hardware can be reassigned to another guest OS. However, the hypervisor can also reclaim DCMDQ control pages assigned to a guest OS while the circular-buffer queue is not empty.

3.5.7.6.1 Synthesized Synchronization

When software sets SMMU_(*_)ECMDQ_PRODn.EN to 0, it triggers a synthesized synchronization operation that has the following effects:
• The SMMU stops fetching new commands and only consumes commands from outstanding fetches.
• This operation is applied only when a DCMDQ is disabled through the ECMDQ interface, that is:
  – SMMU_(*_)ECMDQ_BASEn.DM == 1.
  – SMMU_(*_)ECMDQ_PRODn.EN == 0.
  – This operation is not applied if a guest OS disables a DCMDQ, that is:
    * SMMU_(*_)ECMDQ_BASEn.DM == 1.
    * SMMU_(*_)DCMDQ_PRODn.EN == 0.
  – This operation has equivalent properties to a CMD_SYNC submitted to the queue with CS == 0b00.

When the SMMU Updates SMMU_(*_)ECMDQ_CONSn.ENACK to 0, it guarantees all of the following:
• Errors have been reported and command consumption has stopped; that is, all of the following fields are stable: SMMU_(*_)ECMDQ_CONSn.{ERR, ERR_REASON, HS_ERR, HS_ERR_REASON, RD, RD_WRAP}.
• The synthesized synchronization operation is complete.
• The same synchronization guarantees as for a completed CMD_SYNC apply:
  – The guarantees are made for all successfully consumed commands (those before SMMU_(*_)ECMDQ_CONSn.RD).
  – No guarantees can be made for the command pointed to by SMMU_(*_)ECMDQ_CONSn.RD when either:
    * An error is active on SMMU_(*_)ECMDQ_CONSn.ERR.
    * A hypervisor-serviced error is active on SMMU_(*_)ECMDQ_CONSn.HS_ERR.
  – SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR is set as follows:
    * When one or more CMD_ATC_INV commands since the last synchronization operation timed out and this has not been reported as CERROR_ATC_INV_SYNC, this is reported by setting SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR[0] to 1.
    * When the MSI write of the prior CMD_SYNC consumed on the DCMDQ has aborted and this has not been reported as HERROR_MSI_ABT, this is reported by setting SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR[1] to 1.

    * Setting any bit in SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR to 1 does not affect SMMU_(*_)DCMDQP_ERRn or SMMU_(*_)GERROR.DCMDQP_ERR.
    * Once an error has been reported in SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR, the SMMU does not report it on subsequent synchronization operations on the DCMDQ.

Note: Once the SMMU has set a bit in SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR, it is the responsibility of the hypervisor to report the corresponding error to the guest OS or to take any other corrective action.

If SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR does not equal 0b00 when a DCMDQ is enabled, it is CONSTRAINED UNPREDICTABLE whether SMMU_(*_)ECMDQ_CONSn.SYNTH_SYNC_ERR correctly reports an error condition upon a synthesized synchronization operation.

3.5.7.6.2 Software behavior

To disable or reclaim a DCMDQ control page from a guest OS, software is expected to perform the following steps:
1. Unmap the DCMDQ control page so that the guest OS can no longer access it.
2. Emulate the page and trap any updates to SMMU_(*_)ECMDQ_PRODn pointers.
3. Disable each DCMDQ by setting SMMU_(*_)ECMDQ_PRODn.EN to 0. This has the effect described in section 3.5.7.6.1 Synthesized Synchronization.
4. Once all queues on the page have been disabled, the hypervisor must ensure that the data structures associated with the qSID have been invalidated, both in memory and in any cached copies in the SMMU:
  • The cached copies of the STE are invalidated by issuing CMD_CFGI_STE.
  • Any cached copies of Command Queue Information Table (CIT) and vSID Translation Table (VSTT) entries used during SID translation are invalidated using CMD_CFGI_CIT. See section 3.5.9.3 Caching and invalidation of vSID translation structures.
  • This is followed by a CMD_SYNC to ensure that all previous commands have completed successfully.

Note: All of these commands are issued to a command queue assigned to the hypervisor.

3.5.7.7 DCMDQ Errors and Faults

This section describes the handling of errors and faults for DCMDQs.
An error reported in any of the following registers is referred to as a command queue error:
• SMMU_(*_)CMDQ_CONS.
• SMMU_(*_)ECMDQ_CONSn.
• SMMU_(*_)DCMDQ_CONSn.

Upon a command queue error, command consumption stops and an error is reported.

When SMMU_(*_)IDR6.DCMDQ == 0b01:
• Command queue errors reported through SMMU_(*_)DCMDQ_CONSn.{ERR, ERR_REASON} are referred to as guest-serviced errors and are handled by the guest OS. See section 3.5.7.7.2 Guest-serviced errors.
• Command queue errors reported through SMMU_(*_)ECMDQ_CONSn.{HS_ERR, HS_ERR_REASON} are referred to as hypervisor-serviced errors and are handled by the hypervisor. See section 3.5.7.7.3 Hypervisor-serviced errors.

The SMMU never reports a guest-serviced error and a hypervisor-serviced error concurrently. Specifically, if a command causes a CERROR_ILL, this takes precedence over the reporting of a hypervisor-serviced error.

For a summary of errors that can occur on a DCMDQ, see section 7.1.1 Direct-mode Enhanced Command Queue error summary.

3.5.7.7.1 Error reporting for DCMDQ control pages

The following registers are used for reporting that a DCMDQ control page has one or more DCMDQ interfaces with an active error:
• SMMU_(*_)DCMDQP_ERRn, the error reporting register.

Chapter 3. Operation 3.5. Command and Event queues • SMMU_()DCMDQP_ERRNn, the error acknowledge register. If the number of implemented DCMDQ control pages is m, then ⌈m 64⌉pairs of these registers are implemented, where register pair n reports errors on DCMDQ control pages n × 64 to [(n + 1) × 64] −1. When the value of SMMU()DCMDQP_ERRn.DCMDQP_ERRp is different from the value of SMMU()DCMDQP_ERRNn.DCMDQP_ERRNp, one or more DCMDQ interfaces associated with DCMDQ control page n × 64 + p have encountered a command that cannot be processed. If another DCMDQ interface associated with the same DCMDQ control page reports an error, the SMMU does not toggle SMMU()DCMDQP_ERRn.DCMDQP_ERRp again. If SMMU()DCMDQP_ERRn.DCMDQP_ERRp and SMMU()DCMDQP_ERRNn.DCMDQP_ERRNp differ when enabling a corresponding DCMDQ interface, it is CONSTRAINED UNPREDICTABLE whether the SMMU correctly reports subsequent errors through these registers. The following process will restore regular behavior: • Disable the DCMDQ. • Make both registers consistent. • Re-enable the DCMDQ. Errors on a DCMDQ are always reported and acknowledged through SMMU()GERROR.DCMDQP_ERR and SMMU()GERRORN.DCMDQP_ERR respectively. Note: SMMU()GERROR.CMDQP_ERR does not report errors on an SMMU()DCMDQP_ERRn or SMMU()DCMDQP_ERRNn register. 3.5.7.7.2 Guest-serviced errors When a guest-serviced error occurs during command consumption, all of the following occur: • The SMMU toggles the value of SMMU()DCMDQ_CONSn.ERR and sets SMMU()DCMDQ_CONSn.ERR_REASON to indicate the reason for the error. • The error is additionally reported in SMMU()DCMDQP_ERRm, where m represents the DCMDQ control page, if it is not already active. • If the bit representing the DCMDQ control page was not already active, the error is additionally reported in SMMU_GERROR.DCMDQP_ERR, if SMMU_GERROR.DCMDQP_ERR is not already active. • If SMMU_GERROR.DCMDQP_ERR was not already active, a GERROR interrupt is raised according to the configuration in SMMU_GERROR_IRQ_CFG. 
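The page-level and global reporting chain above can be illustrated with a small model. The following Python sketch is hypothetical: DcmdqpErrorModel and its methods are invented names, each register is a plain integer, and the per-queue SMMU_(*_)DCMDQ_CONSn.ERR bit is not modelled. An error bit is active when it differs from its acknowledge copy, so the model compares the two with XOR rather than testing for 1. The acknowledge() method writes GERRORN before the DCMDQP_ERRNn registers; that ordering is an assumption chosen so that an error arriving while the handler runs raises a fresh GERROR interrupt instead of being missed.

```python
from typing import List, Tuple


def dcmdqp_register_pairs(num_pages: int) -> int:
    """Number of DCMDQP_ERRn/ERRNn pairs for num_pages control pages: ceil(m / 64)."""
    return (num_pages + 63) // 64


def locate_page(page: int) -> Tuple[int, int]:
    """Return (n, p) such that control page == n * 64 + p."""
    return divmod(page, 64)


def active_pages(n: int, err: int, errn: int) -> List[int]:
    """Pages covered by pair n whose ERRp bit differs from ERRNp."""
    diff = err ^ errn
    return [n * 64 + p for p in range(64) if (diff >> p) & 1]


class DcmdqpErrorModel:
    """Hypothetical model of page-level and global DCMDQ error reporting."""

    def __init__(self, num_pages: int) -> None:
        pairs = dcmdqp_register_pairs(num_pages)
        self.err = [0] * pairs     # SMMU_(*_)DCMDQP_ERRn
        self.errn = [0] * pairs    # SMMU_(*_)DCMDQP_ERRNn
        self.gerror = 0            # GERROR.DCMDQP_ERR
        self.gerrorn = 0           # GERRORN.DCMDQP_ERR
        self.irqs = 0              # GERROR interrupts raised

    def report_guest_serviced(self, page: int) -> None:
        """SMMU side: propagate one guest-serviced error on a control page."""
        n, p = locate_page(page)
        if ((self.err[n] ^ self.errn[n]) >> p) & 1:
            return                 # page bit already active: no second toggle
        self.err[n] ^= 1 << p
        if self.gerror != self.gerrorn:
            return                 # GERROR.DCMDQP_ERR already active
        self.gerror ^= 1
        self.irqs += 1             # interrupt as configured by SMMU_GERROR_IRQ_CFG

    def acknowledge(self) -> List[int]:
        """Hypervisor side: acknowledge, then return the pages to forward to guests."""
        self.gerrorn = self.gerror            # acknowledge the global bit first
        pages: List[int] = []
        for n in range(len(self.err)):
            pages += active_pages(n, self.err[n], self.errn[n])
            self.errn[n] = self.err[n]        # acknowledge every active page bit
        return pages
```

Only after acknowledge() returns would the hypervisor update the emulated register pages of each affected guest and inject its GERROR interrupt.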
Because SMMU_(*_)GERROR, SMMU_(*_)DCMDQP_ERRm and the GERROR interrupt are not visible to the guest OS, the hypervisor needs to forward the error to the guest OS through the emulated SMMU register pages and inject a GERROR interrupt into the guest OS.

Note: The hypervisor must acknowledge an error through SMMU_(*_)GERRORN and SMMU_(*_)DCMDQP_ERRNn before injecting a GERROR interrupt into the guest OS, to prevent the loss of an error event.

Note: Resuming command consumption on a DCMDQ is not affected by the value of any of the following registers:
• SMMU_(*_)DCMDQP_ERRn and SMMU_(*_)DCMDQP_ERRNn.
• SMMU_(*_)GERROR.DCMDQP_ERR and SMMU_(*_)GERRORN.DCMDQP_ERR.

Command consumption on the command queue where the error is present restarts once the error has been acknowledged. Depending on the error type, the error is acknowledged either by:
• The guest OS, through SMMU_(*_)DCMDQ_PRODn.ERRACK.
• The hypervisor, through SMMU_(*_)ECMDQ_PRODn.HS_ERRACK.

In any of the following cases, it is CONSTRAINED UNPREDICTABLE whether the SMMU consumes commands from a queue and whether a subsequent error is correctly reported:
• The error status acknowledgment bits on an enabled queue (SMMU_(*_)ECMDQ_PRODn.{ERRACK, HS_ERRACK} or SMMU_(*_)DCMDQ_PRODn.ERRACK) are inappropriately configured such that they mismatch the corresponding error status bit (SMMU_(*_)ECMDQ_CONSn.{ERR, HS_ERR} or SMMU_(*_)DCMDQ_CONSn.ERR).
• A queue is enabled using SMMU_(*_)DCMDQ_PRODn.EN while SMMU_(*_)DCMDQ_PRODn.ERRACK mismatches SMMU_(*_)DCMDQ_CONSn.ERR.
• A queue is enabled using SMMU_(*_)ECMDQ_PRODn.EN while SMMU_(*_)ECMDQ_PRODn.HS_ERRACK mismatches SMMU_(*_)ECMDQ_CONSn.HS_ERR.

If a queue is enabled using SMMU_(*_)DCMDQ_PRODn.EN while a hypervisor-serviced error is active (that is, SMMU_(*_)ECMDQ_PRODn.HS_ERRACK mismatches SMMU_(*_)ECMDQ_CONSn.HS_ERR), the SMMU does not consume commands from the queue until the hypervisor restarts command consumption by acknowledging the error.

3.5.7.7.3 Hypervisor-serviced errors

Hypervisor-serviced errors are handled as follows:
1. When a fault or configuration error is encountered during address translation for a command queue fetch or an MSI, the transaction is terminated with an abort. In addition, if the Event queue is writable, an Event is recorded. This applies to all of the following faults; faults that are not listed do not apply to these types of transaction:
   • Configuration errors (C_BAD_STE, C_BAD_STREAMID).
   • Faults during STE fetch (F_STE_FETCH).
   • Faults due to cache-related conflicts (F_TLB_CONFLICT, F_CFG_CONFLICT).
   • Faults due to external aborts during translation table walks (F_WALK_EABT).
   • Faults during translation (F_TRANSLATION, F_PERMISSION, F_ADDR_SIZE and F_ACCESS).
   The hypervisor is expected to enforce this behavior by setting:
     – STE.S2S to 0.
     – STE.S2R to 1.
2. SMMU_(*_)ECMDQ_CONSn.HS_ERR is toggled to indicate an active error.
3. SMMU_(*_)ECMDQ_CONSn.HS_ERR_REASON is set to HERROR_IPA.
4. When the error is observable through both SMMU_(*_)ECMDQ_CONSn.HS_ERR and SMMU_(*_)ECMDQ_CONSn.HS_ERR_REASON, the completion of a CMD_SYNC on any queue guarantees that the Event record is either:
   • Visible in the Event queue.
   • Discarded, if the Event could not be written to the Event queue because the queue was not writable.
   Note: This is in line with the behavior of an unrecorded fault event record relating to a client transaction terminated by the SMMU whose abort or termination response could have been observed by the client device before the start of the CMD_SYNC.
5. The error is additionally reported using SMMU_(*_)DCMDQP_ERRn and SMMU_(*_)GERROR.DCMDQP_ERR, as described in 3.5.7.7 DCMDQ Errors and Faults.

Once the hypervisor has resolved the cause of the hypervisor-serviced error, SMMU_(*_)ECMDQ_PRODn.HS_ERRACK is used to:
• Acknowledge the error.
• Indicate to the SMMU that the command fetch (and the corresponding address translation) can be retried.

In the case of an Event queue overflow in which the hypervisor has no visibility of the event related to the hypervisor-serviced error, the hypervisor is expected to resolve the overflow and restart the queue, so that the subsequent retry can result in the hypervisor-serviced error being recorded in the Event queue.

Note: For more information on:
• Errors during SID translation, see section 3.5.9.4 vSID Errors and external aborts.
• The reporting of an aborted MSI following a CMD_SYNC, see section 3.5.7.7.4 DCMDQ MSIs.

If SMMU_ROOT_IDR0.ROOT_IMPL == 1 and SMMU_ROOT_CR0.GPCEN == 1, all accesses introduced by the DCMDQ feature are subject to granule protection checks. See 3.25 Granule Protection Checks.
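The toggle-and-acknowledge protocol shared by guest-serviced and hypervisor-serviced errors can be sketched as a per-queue state model. This is a hypothetical Python illustration, not a driver: the class and method names are invented, register bits are plain integers, and the only precedence rule modelled is the one stated in 3.5.7.7, namely that a CERROR_ILL is reported instead of a hypervisor-serviced error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DcmdqErrorState:
    """Hypothetical per-DCMDQ view of the two error/acknowledge bit pairs."""
    err: int = 0          # SMMU_(*_)DCMDQ_CONSn.ERR (guest-serviced)
    err_ack: int = 0      # SMMU_(*_)DCMDQ_PRODn.ERRACK
    hs_err: int = 0       # SMMU_(*_)ECMDQ_CONSn.HS_ERR (hypervisor-serviced)
    hs_err_ack: int = 0   # SMMU_(*_)ECMDQ_PRODn.HS_ERRACK
    err_reason: Optional[str] = None
    hs_err_reason: Optional[str] = None

    def guest_error_active(self) -> bool:
        return self.err != self.err_ack

    def hs_error_active(self) -> bool:
        return self.hs_err != self.hs_err_ack

    def can_consume(self) -> bool:
        # Commands are consumed only while neither error class is active.
        return not (self.guest_error_active() or self.hs_error_active())

    def report(self, cerror_ill: bool, hs_reason: Optional[str]) -> Optional[str]:
        """Report at most one error for the command at the head of the queue."""
        if not self.can_consume():
            return None                    # consumption has already stopped
        if cerror_ill:                     # CERROR_ILL takes precedence
            self.err ^= 1
            self.err_reason = "CERROR_ILL"
            return "guest"
        if hs_reason is not None:          # for example "HERROR_IPA"
            self.hs_err ^= 1
            self.hs_err_reason = hs_reason
            return "hypervisor"
        return None

    def guest_ack(self) -> None:
        self.err_ack = self.err            # guest writes ERRACK to match ERR

    def hypervisor_ack(self) -> None:
        self.hs_err_ack = self.hs_err      # hypervisor writes HS_ERRACK; fetch is retried
```

In this model, an acknowledgment by the guest cannot restart a queue that is stopped by a hypervisor-serviced error, which matches the rule that consumption resumes only once the relevant error has been acknowledged by its owner.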

A GPC fault on any DCMDQ-related access is handled as a GPC fault on an SMMU-originated access and is additionally reported as a hypervisor-serviced error. For information on error codes, see section 3.25.7 DCMDQ-related GPC faults.

An implementation is permitted to:
• Read or prefetch translation entries for any commands ahead of SMMU_(*_)DCMDQ_CONSn.RD and up to SMMU_(*_)DCMDQ_PRODn.WR.
• Request translations ahead of SMMU_(*_)DCMDQ_PRODn.WR, but it must not report any faults or errors for these translations.

When a fault or configuration error is encountered during an address translation for a command between SMMU_(*_)DCMDQ_CONSn.RD and SMMU_(*_)DCMDQ_PRODn.WR, the SMMU is permitted, but not required, to generate Event records or to report a GPC fault.

Note: When a DCMDQ stops consuming commands because of an error or fault, SMMU_(*_)ECMDQ_CONSn.RD points to the oldest command that has an unresolved error. Software might therefore observe more than one Event or GPC fault due to translation requests for commands between SMMU_(*_)DCMDQ_CONSn.RD and SMMU_(*_)DCMDQ_PRODn.WR, of which only one is associated with the error reported in the SMMU_(*_)ECMDQ_CONSn register.

3.5.7.7.4 DCMDQ MSIs

The completion of a CMD_SYNC on a DCMDQ is signalled using an MSI if CMD_SYNC.CS == SIG_IRQ. In this case, completion of the CMD_SYNC leads to an SMMU-generated MSI controlled by a guest OS, for which the SMMU uses the MSIData, MSIAddress and MSIAttr fields provided in the CMD_SYNC. The MSI write is subject to SMMU translation.

Note: If CMD_SYNC.{CS, MSIAddress} == (SIG_IRQ, 0), no interrupt is generated.

The DeviceID associated with an MSI that is triggered by a CMD_SYNC on a DCMDQ is generated from the qSID of the corresponding DCMDQ control page, in an IMPLEMENTATION DEFINED manner.

Note: This produces a unique DeviceID that does not overlap with DeviceIDs produced for client devices or for other SMMU-originated MSIs. It allows the GIC ITS to differentiate between SMMU-generated MSIs that are under hypervisor control and those that are under guest OS control.

The EventID passed to the GIC ITS is generated from CMD_SYNC.MSIData and is therefore under guest OS control.

The SMMU ignores the value of CMD_SYNC.MSI_NS, and the target PA space of the MSI is determined by both of the following:
• STE[qSID].
• The SMMU translation process.

An implementation must not assert wired interrupts upon completion of a CMD_SYNC on a DCMDQ; completion can only be signalled through an MSI. The hypervisor is expected to present to the guest OS an emulated SMMU that only supports interrupts through MSIs. This means that, in the emulated SMMU, the hypervisor:
• Sets SMMU_(*_)IDR0.MSI to 1.
• Does not advertise support for wired interrupts.

When an MSI triggered by a CMD_SYNC consumed on a DCMDQ is aborted:
1. The next CMD_SYNC on this DCMDQ does not complete.
2. Command consumption stops and an error is reported to the hypervisor by toggling SMMU_(*_)ECMDQ_CONSn.HS_ERR and setting SMMU_(*_)ECMDQ_CONSn.HS_ERR_REASON to HERROR_MSI_ABT.
3. The CMD_SYNC completes after the hypervisor acknowledges the error.

SMMU_(*_)GERROR.MSI_CMDQ_ABT_ERR does not report aborted MSI writes for CMD_SYNCs that are consumed on DCMDQs.
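The completion rules for a CMD_SYNC on a DCMDQ can be summarized in a short model. In this hypothetical Python sketch, the CS value is a string, MSI delivery is recorded in a list rather than written to memory, and the IMPLEMENTATION DEFINED derivation of the DeviceID from the qSID is not modelled. The model treats the CMD_SYNC whose MSI was aborted as held until the hypervisor acknowledges HERROR_MSI_ABT; the class and method names are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DcmdqMsiModel:
    """Hypothetical model of CMD_SYNC completion signalling on one DCMDQ."""
    hs_err: int = 0                       # SMMU_(*_)ECMDQ_CONSn.HS_ERR
    hs_err_ack: int = 0                   # SMMU_(*_)ECMDQ_PRODn.HS_ERRACK
    hs_err_reason: Optional[str] = None
    held_sync: bool = False               # CMD_SYNC waiting for the hypervisor
    msis: List[Tuple[int, int]] = field(default_factory=list)

    def complete_sync(self, cs: str, msi_address: int, msi_data: int,
                      aborted: bool = False) -> bool:
        """Return True if the CMD_SYNC completes immediately."""
        if cs != "SIG_IRQ" or msi_address == 0:
            return True                   # no MSI, and never a wired interrupt
        if aborted:                       # the MSI write was aborted
            self.hs_err ^= 1
            self.hs_err_reason = "HERROR_MSI_ABT"
            self.held_sync = True         # command consumption stops
            return False
        # EventID comes from MSIData; the DeviceID (from the qSID) is not modelled.
        self.msis.append((msi_address, msi_data))
        return True

    def hypervisor_ack(self) -> bool:
        """Acknowledge HS_ERR; return True if a held CMD_SYNC now completes."""
        self.hs_err_ack = self.hs_err
        completed, self.held_sync = self.held_sync, False
        return completed
```

In a real hypervisor, the step represented by hypervisor_ack() would only be taken after the guest OS has been informed of the aborted MSI.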

The hypervisor is responsible for informing the guest OS about the aborted MSI through the emulated view of SMMU_(*_)GERROR.MSI_CMDQ_ABT_ERR. To meet the completion guarantees of a CMD_SYNC from the perspective of the guest OS, the hypervisor is expected to acknowledge the error and restart command consumption only after it has informed the guest OS.

3.5.8 Restricted Command Queues

This section applies only when SMMU_(*_)IDR2.RECMDQ is 1.

A RECMDQ is an ECMDQ with restricted capabilities. RECMDQs can be used for presenting a DCMDQ to a guest OS. The set of commands supported by a RECMDQ is explicitly advertised using the SMMU_(*_)IDR2.ECMDQ_CMD fields. Commands that are not supported by a RECMDQ can still be submitted to the main Command queue.

If software submits an unsupported command to a RECMDQ, then:
• The command is not consumed.
• The SMMU reports CERROR_ILL through the RECMDQ interface.

The attributes in SMMU_(*_)CR1.{QUEUE_IC, QUEUE_OC, QUEUE_SH} do not apply to RECMDQs. Fetches through a RECMDQ interface are Inner Write-back Cacheable, Outer Write-back Cacheable, Inner Shareable.

If software acknowledges an error observable in SMMU_(*_)ECMDQ_CONSn.ERR before SMMU_(*_)GERROR.CMDQP_ERR is observable, it is CONSTRAINED UNPREDICTABLE whether the SMMU activates SMMU_(*_)GERROR.CMDQP_ERR.

Prefetch and synchronization commands are always permitted on a RECMDQ.

All other behavior is identical to that of regular ECMDQs, as described in 3.5.6 Enhanced Command queue interfaces.

Note: Software might have to split a single sequence of commands across different command queues, depending on the command types supported on each queue. Software must perform the necessary synchronization operations at per-queue granularity to maintain the ordering properties of the sequence as a whole, for example by waiting for the completion of a CMD_SYNC on one queue before issuing a command on another queue.
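The Note above can be made concrete with a sketch that splits an ordered command sequence between a RECMDQ and the main Command queue. The command names are SMMUv3 command mnemonics, but which of them a RECMDQ supports is hypothetical here and would in practice come from SMMU_(*_)IDR2.ECMDQ_CMD. The plan_submission() function and its batch format are invented for illustration; each batch ends in a CMD_SYNC that software would wait on before issuing the next batch.

```python
from typing import Iterable, List, Set, Tuple

# Prefetch and synchronization commands are always permitted on a RECMDQ.
ALWAYS_PERMITTED = {"CMD_PREFETCH_CONFIG", "CMD_PREFETCH_ADDR", "CMD_SYNC"}


def plan_submission(commands: Iterable[str],
                    recmdq_supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Split an ordered command sequence into per-queue batches.

    Each batch ends in CMD_SYNC; software waits for that CMD_SYNC to complete
    before issuing the next batch, which preserves the order of the sequence.
    """
    batches: List[Tuple[str, List[str]]] = []
    for cmd in commands:
        if cmd in ALWAYS_PERMITTED and batches:
            queue = batches[-1][0]       # valid on either queue: stay on the current one
        elif cmd in recmdq_supported or cmd in ALWAYS_PERMITTED:
            queue = "RECMDQ"
        else:
            queue = "CMDQ"               # unsupported on the RECMDQ: use the main queue
        if batches and batches[-1][0] == queue:
            batches[-1][1].append(cmd)
        else:
            batches.append((queue, [cmd]))
    for _, cmds in batches:
        if cmds[-1] != "CMD_SYNC":
            cmds.append("CMD_SYNC")      # synchronize before switching queues
    return batches
```

For example, a sequence of TLB invalidation, CMD_SYNC, CD invalidation and ATC invalidation, with only the TLB and ATC invalidations supported on the RECMDQ, becomes three batches: RECMDQ, then the main Command queue, then the RECMDQ again.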
3.5.9 Virtual to physical SID translation

This section applies only when SMMU_(R_)IDR6.VSID is 0b01.

If the SMMU supports both DCMDQs and ATS, it can optionally support translation of StreamIDs (SIDs). This allows the hypervisor to expose a virtual StreamID space to a guest OS, therefore allowing the guest OS to issue ATS Invalidations and PRG responses over a DCMDQ. Virtual SIDs (vSIDs) that are supplied by the guest OS in ATS-related commands and prefetch commands are translated by the SMMU into physical SIDs (pSIDs).

When SID translation is enabled, the hypervisor can signal to a guest OS that ATC and PRI commands are supported by presenting an emulated view of the following register fields, configured accordingly:
• SMMU_(*_)IDR2.ECMDQ_CMD_ATC.
• SMMU_(*_)IDR2.ECMDQ_CMD_PRI.

Note: A guest OS is always permitted to submit prefetch commands to the DCMDQs.

SID translation is performed using a vSID Translation Table (VSTT) structure. This is a 2-level table with a setup similar to that of the Stream table. Each guest OS has its own VSTT. Indexing into the VSTT is performed using the vSID of the command being consumed.

The VSTT base address can be found through the Command Queue Information Table (CIT) structure. This table has a setup similar to that of the Stream table and might be a 2-level table. The CIT holds the configuration of each VSTT. Indexing into the CIT is performed using the qSID associated with a command queue control page.

Access to the structures required for SID translation is gated by SMMU_(*_)CR0.VSIDEN.

3.5.9.1 CIT and VSTT lookup process

Figure 3.10 shows the table walks needed for SID translation, in the case of a 2-level CIT. The configuration information for the first fetch is held in registers, and all other configuration information is held in memory structures.

[Figure 3.10: Table walks during SID translation. Panels: CMDQ Info Table (CIT), indexed by qSID; vSID Translation Table (VSTT), indexed by vSID.]

The following process is an illustrative example of all walks required for SID translation when a 2-level CIT is used, SMMU_(R_)CITAB_BASE_CFG.SPLIT < SMMU_(R_)CITAB_BASE_CFG.LOG2SIZE, and CITE.SPLIT < CITE.LOG2SIZE:
1. The index used for the CIT walk is: CIT_INDEX = qSID[SMMU_(R_)IDR6.DCMDQ_CONTROL_PAGE_LOG2NUMP-1:0].
2. The address of the L1 CIT descriptor is calculated: {SMMU_(R_)CITAB_BASE.ADDR, 0b0000} + CIT_INDEX[SMMU_(R_)CITAB_BASE_CFG.LOG2SIZE-1:SMMU_(R_)CITAB_BASE_CFG.SPLIT] * 8.
3. The L1 CIT descriptor (L1CITD) is fetched.
4. The address of the CIT entry (CITE) is calculated: {L1CITD.L2Ptr, 0b0000} + CIT_INDEX[(L1CITD.Span-1)-1:0] * 16.
5. The CIT entry is fetched.
6. The index used for the VSTT walk is: VSTT_INDEX = vSID[SMMU_(R_)IDR6.VSIDSIZE-1:0].
7. The address of the L1 VSTT descriptor (L1VSTTD) is calculated: {CITE.VSTT_BASE, 0b0000} + VSTT_INDEX[CITE.LOG2SIZE-1:CITE.SPLIT] * 8.
8. The L1 VSTT descriptor (L1VSTTD) is fetched.
9. The address of the VSTT entry (VSTTE) is calculated: {L1VSTTD.L2Ptr, 0b0000} + VSTT_INDEX[(L1VSTTD.Span-1)-1:0] * 8.
10. The VSTT entry is fetched.

Note: Certain bits in SMMU_(R_)CITAB_BASE.ADDR, L1CITD.L2Ptr, CITE.VSTT_BASE and L1VSTTD.L2Ptr might be treated as zero due to address size and alignment restrictions, as defined in the register field descriptions.

3.5.9.2 Attributes used during vSID translation

The cacheability and shareability of memory accesses during SID translation are determined by SMMU_(R_)CR1.{QUEUE_IC, QUEUE_OC, QUEUE_SH}. The Read-Allocate hint for an access to the CIT and the VSTTs is provided by SMMU_(R_)CITAB_BASE.RA. If MPAM is supported, the MPAM attributes for CIT and VSTT fetches are determined by SMMU_(R_)GMPAM.{SO_PMG, SO_PARTID}. If MEC is supported, the MECID for CIT and VSTT fetches is determined by SMMU_R_GMECID.GMECID.

3.5.9.3 Caching and invalidation of vSID translation structures

An implementation is permitted, but not required, to cache the information held in the CIT (descriptors and entries) and the VSTT (descriptors and entries). The following commands can be used to invalidate this cached information:
• CMD_CFGI_CIT.
• CMD_CFGI_VSTT.
• CMD_CFGI_VSTT_VSID.
• CMD_CFGI_ALL.
• CMD_CFGI_STE_RANGE (Range == 31).

The Security state of the queue to which these commands are issued determines the Security state of the cached entries that are invalidated.

Completion of a CMD_SYNC guarantees that these commands have completed and that the matching cached configuration entries have been invalidated. However, completion of a CMD_SYNC does not guarantee that DCMDQ commands that required SID translation using any of the targeted cached configuration entries have completed. Completion of those commands can only be guaranteed by disabling the DCMDQ associated with the targeted cached configuration entries (that is, by setting SMMU_(*_)ECMDQ_PRODn.EN to 0), which triggers a synthesized synchronization operation.

Note: Providing an out-of-range StreamID or vSID in these commands has one of the following CONSTRAINED UNPREDICTABLE behaviors:
• The command has no effect.
• The command has an effect, taking an UNPREDICTABLE value for the parameter that is out of range.
A StreamID is out of range if it exceeds the implemented StreamID size reported by SMMU_IDR1.SIDSIZE, that is, if StreamID >= 2^SMMU_IDR1.SIDSIZE. A vSID is out of range if it exceeds the implemented vSID size reported by SMMU_(R_)IDR6.VSIDSIZE, that is, if vSID >= 2^SMMU_(R_)IDR6.VSIDSIZE.
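Because only disabling a DCMDQ guarantees completion of commands that used the invalidated entries, the reclaim steps in 3.5.7.6.2 and this section fit together as one ordered sequence. The sketch below expresses that order as a list of operations. The operation names and tuple format are hypothetical placeholders rather than command encodings, and the assumption that CMD_CFGI_STE and CMD_CFGI_CIT take the qSID as their operand is made only for illustration. The in_range() helper applies the SIDSIZE/VSIDSIZE rule from the Note above.

```python
from typing import List, Tuple


def in_range(value: int, size_bits: int) -> bool:
    """True if value fits in size_bits bits (as for SIDSIZE or VSIDSIZE)."""
    return 0 <= value < (1 << size_bits)


def reclaim_sequence(qsid: int, queue_indices: List[int]) -> List[Tuple]:
    """Hypothetical ordered operations for reclaiming one DCMDQ control page."""
    ops: List[Tuple] = [
        ("unmap_control_page", qsid),   # the guest OS loses access to the page
        ("trap_prod_writes", qsid),     # the hypervisor emulates the page from now on
    ]
    # Clearing EN triggers a synthesized synchronization for each queue; this is
    # the only way to guarantee that commands which needed SID translation completed.
    ops += [("ECMDQ_PRODn.EN", n, 0) for n in queue_indices]
    # Invalidations, issued on a hypervisor-owned command queue.
    ops += [("CMD_CFGI_STE", qsid), ("CMD_CFGI_CIT", qsid), ("CMD_SYNC",)]
    return ops
```

Before building an invalidation that carries a StreamID or vSID, software could call in_range() with SMMU_IDR1.SIDSIZE or SMMU_(R_)IDR6.VSIDSIZE to stay clear of the CONSTRAINED UNPREDICTABLE out-of-range behavior.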

The SMMU_S_INIT.INV_ALL register field invalidates all cache and TLB contents, including any caches used for SID translation.

3.5.9.4 vSID Errors and external aborts

In addition to the behavior described in 3.5.7.7 DCMDQ Errors and Faults, the following scenarios might occur when a command requiring SID translation is submitted to a DCMDQ:
• When SMMU_(*_)IDR6.VSID == 0b00 or SMMU_(*_)ECMDQ_BASEn.VSID == 0, the behavior depends on the command type:
  – CMD_ATC_INV and CMD_PRI_RESP are ILLEGAL, and this is reported through SMMU_(*_)ECMDQ_CONSn.ERR_REASON as CERROR_ILL.
  – CMD_PREFETCH commands that are not otherwise ILLEGAL are IGNORED.
• When an error occurs during the SID translation process, it is reported through SMMU_(R_)ECMDQ_CONSn.ERR_REASON as HERROR_SID_CONFIG in any of the following cases:
  – The CIT or VSTT structures cannot be accessed.
  – The CITE required for this SID translation is invalid.
  – qSID[SMMU_(R_)CITAB_BASE_CFG.SPLIT-1:0] >= 2^(L1CITD.Span-1).
  – L1CITD.Span == 0.
  – L1CITD.Span > (SMMU_(R_)CITAB_BASE_CFG.SPLIT + 1).
  – L1VSTTD.Span > (CITE.SPLIT + 1).
• When the qSID is out of range (qSID >= 2^SMMU_(R_)CITAB_BASE_CFG.LOG2SIZE), it is CONSTRAINED UNPREDICTABLE whether the SMMU uses an UNPREDICTABLE qSID or raises HERROR_SID_CONFIG.
  Note: For example, an implementation might truncate the upper bits of the qSID.
• For all commands that require SID translation, in any of the following cases the command is IGNORED and no error is reported:
  – VSTTE.PSID_VALID == 0.
  – vSID[CITE.SPLIT-1:0] >= 2^(L1VSTTD.Span-1).
  – L1VSTTD.Span == 0.
• When the vSID is out of range:
  – If vSID >= 2^SMMU_(R_)IDR6.VSIDSIZE, the command is IGNORED and no error is reported.
  – If vSID < 2^SMMU_(R_)IDR6.VSIDSIZE and vSID >= 2^CITE.LOG2SIZE, it is CONSTRAINED UNPREDICTABLE whether the SMMU uses an UNPREDICTABLE vSID or the command is IGNORED.
  Note: For example, an implementation might truncate the upper bits of the vSID.
• An external abort encountered during the table walk of the CIT or VSTT is reported through SMMU_(R_)ECMDQ_CONSn.HS_ERR as HERROR_SID_EABT.

The software guidance in 3.5.7.7 DCMDQ Errors and Faults can be followed to resolve any of the error behaviors described in this section.

For a summary of vSID errors, see 7.1.1 Direct-mode Enhanced Command Queue error summary.
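The walk in 3.5.9.1 and the error rules in 3.5.9.4 can be combined into one hypothetical Python sketch. Memory is a dictionary from address to already-decoded descriptor fields; the register keys in regs (CITAB_BASE_ADDR, CIT_LOG2SIZE, CIT_SPLIT, NUMP_LOG2, VSIDSIZE) and the CITE valid-bit name V are assumptions, not architected encodings. A missing descriptor stands in for a structure that cannot be accessed, SidConfigError stands in for HERROR_SID_CONFIG, external aborts (HERROR_SID_EABT) are not modelled, and both tables are assumed to be 2-level as in the illustrated example. Because the bit slicing discards upper index bits, an out-of-range qSID, or a vSID between 2^CITE.LOG2SIZE and 2^VSIDSIZE, is treated as truncated, which is one of the permitted CONSTRAINED UNPREDICTABLE outcomes.

```python
from typing import Dict, Optional


class SidConfigError(Exception):
    """Stands in for a HERROR_SID_CONFIG hypervisor-serviced error."""


def bits(value: int, hi: int, lo: int) -> int:
    """value[hi:lo]; an empty field (hi < lo) is 0."""
    if hi < lo:
        return 0
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def fetch(mem: Dict[int, dict], addr: int) -> dict:
    if addr not in mem:                       # structure cannot be accessed
        raise SidConfigError(f"no descriptor at {addr:#x}")
    return mem[addr]


def translate_vsid(qsid: int, vsid: int, regs: dict, mem: Dict[int, dict]) -> Optional[int]:
    """Return the pSID, or None when the command is IGNORED."""
    split = regs["CIT_SPLIT"]
    cit_index = bits(qsid, regs["NUMP_LOG2"] - 1, 0)                              # step 1
    l1citd = fetch(mem, (regs["CITAB_BASE_ADDR"] << 4)
                   + bits(cit_index, regs["CIT_LOG2SIZE"] - 1, split) * 8)        # steps 2-3
    span = l1citd["Span"]
    if span == 0 or span > split + 1 or bits(qsid, split - 1, 0) >= 1 << (span - 1):
        raise SidConfigError("bad L1CITD.Span")
    cite = fetch(mem, (l1citd["L2Ptr"] << 4) + bits(cit_index, span - 2, 0) * 16)  # steps 4-5
    if not cite["V"]:
        raise SidConfigError("invalid CITE")
    if vsid >= 1 << regs["VSIDSIZE"]:
        return None                                       # out of range: IGNORED
    vstt_index = bits(vsid, regs["VSIDSIZE"] - 1, 0)                             # step 6
    l1vsttd = fetch(mem, (cite["VSTT_BASE"] << 4)
                    + bits(vstt_index, cite["LOG2SIZE"] - 1, cite["SPLIT"]) * 8)  # steps 7-8
    vspan = l1vsttd["Span"]
    if vspan > cite["SPLIT"] + 1:
        raise SidConfigError("bad L1VSTTD.Span")
    if vspan == 0 or bits(vsid, cite["SPLIT"] - 1, 0) >= 1 << (vspan - 1):
        return None                                       # IGNORED, no error
    vstte = fetch(mem, (l1vsttd["L2Ptr"] << 4) + bits(vstt_index, vspan - 2, 0) * 8)  # steps 9-10
    return vstte["PSID"] if vstte["PSID_VALID"] else None
```

For example, with CITAB_BASE_ADDR = 0x100, CIT_SPLIT = 2 and qSID = 6, the L1 CIT descriptor is read from 0x1000 + 1 × 8 = 0x1008.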

3.6 Structure and queue ownership

Arm expects that the Non-secure Stream table, Command queues, Event queue and PRI queue are controlled by the most privileged Non-secure system software.

If present, Arm expects that the Secure Stream table, Secure Command queue and Secure Event queue are controlled by Secure software. For example, these would be controlled by software in EL3 if a separation in control between Secure EL1 and EL3 is required.

Arm expects that the stage 2 translation tables indicated by all STEs are controlled by a hypervisor.

The ownership of stage 1 CDs and translation tables depends on the configuration in use. If pointed to by a Secure STE, they are controlled by Secure software (one of EL3, S-EL2, or S-EL1). If pointed to by a Non-secure STE, they are controlled by Non-secure software (either NS-EL2 or NS-EL1). If pointed to by a Realm STE, they are controlled by Realm software (either Realm-EL2 or Realm-EL1).

Note: For example, the context might be one of the following:
• Used by a bare-metal OS, which controls the descriptor and translation tables, addressed by PA.
• Used internally by a hypervisor, which controls the descriptor and translation tables, addressed by PA.
• Used by a guest, in which case Arm expects that the CD and translation tables are controlled by the guest and addressed by IPA.

Note: When a hypervisor is used in a given Security state, Arm expects that the Event queue for that Security state is managed by the hypervisor, which forwards events into guest VMs as appropriate. StreamIDs might be mapped from physical to virtual equivalents during this process.

In virtualized scenarios, Arm expects a hypervisor to:
• Convert guest STEs into physical SMMU STEs, controlling permissions and features as required.
  Note: The physical StreamIDs might be hidden from the guest, which would be given virtual StreamIDs, so a mapping between virtual and physical StreamIDs must be maintained.
• Read and interpret commands from the guest Command queue. These might result in commands being issued to the SMMU or in invalidation of internal shadowed data structures.
• Consume new entries from the PRI and Event queues, mapping host StreamIDs to guest StreamIDs, and deliver appropriate entries to the guest Event and PRI queues.

See section 3.8 Virtualization.
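The StreamID bookkeeping described above can be sketched as a two-way map. This hypothetical Python example invents the StreamIdMap class, the VM identifiers and the dictionary form of an event record; real Event and PRI queue records are fixed-format binary entries, so a hypervisor would decode and re-encode them around this lookup.

```python
from typing import Dict, Optional, Tuple


class StreamIdMap:
    """Hypothetical hypervisor map between (VM, vSID) pairs and physical StreamIDs."""

    def __init__(self) -> None:
        self._to_phys: Dict[Tuple[str, int], int] = {}
        self._to_virt: Dict[int, Tuple[str, int]] = {}

    def bind(self, vm: str, vsid: int, psid: int) -> None:
        """Record that guest vm sees physical stream psid as vsid."""
        if psid in self._to_virt:
            raise ValueError(f"pSID {psid:#x} is already assigned")
        self._to_phys[(vm, vsid)] = psid
        self._to_virt[psid] = (vm, vsid)

    def physical(self, vm: str, vsid: int) -> int:
        """Translate a StreamID taken from a guest STE or command."""
        return self._to_phys[(vm, vsid)]         # KeyError: unknown vSID for this guest

    def forward_event(self, record: dict) -> Optional[Tuple[str, dict]]:
        """Map a host Event or PRI queue record to (vm, guest record)."""
        owner = self._to_virt.get(record["StreamID"])
        if owner is None:
            return None                           # not assigned to a guest: host handles it
        vm, vsid = owner
        return vm, {**record, "StreamID": vsid}   # hide the physical StreamID
```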

3.7 Programming registers

The SMMU registers occupy a set of contiguous 64KB pages of system address space. These pages contain mechanisms for discovering capabilities and for configuring pointers to in-memory structures and queues. After initialization, runtime access to the registers is generally limited to maintenance of the Command, Event and PRI queue pointers, and interaction with the SMMU is performed using these in-memory queues.

Optional regions of IMPLEMENTATION DEFINED register space are supported in the memory map.

See section 6.1 Memory map for register definitions and layout.
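Most runtime register traffic is therefore producer and consumer pointer updates. As a hedged illustration, the sketch below assumes the SMMUv3 queue pointer layout defined with the queue registers rather than in this section: the low log2size bits hold the entry index and the next bit is a wrap flag that flips each time the index passes the end of the queue. The helper names are hypothetical, and q_advance() assumes the pointer value holds only the wrap and index fields (any other register bits would be masked off first).

```python
def q_mask(log2size: int) -> int:
    return (1 << log2size) - 1


def q_index(ptr: int, log2size: int) -> int:
    """Entry index within the queue."""
    return ptr & q_mask(log2size)


def q_wrap(ptr: int, log2size: int) -> int:
    """Wrap flag, which flips on every pass around the queue."""
    return (ptr >> log2size) & 1


def q_used(prod: int, cons: int, log2size: int) -> int:
    """Entries between CONS and PROD, treating {wrap, index} as one counter."""
    return (prod - cons) & ((1 << (log2size + 1)) - 1)


def q_empty(prod: int, cons: int, log2size: int) -> bool:
    # Same index and same wrap flag.
    return q_used(prod, cons, log2size) == 0


def q_full(prod: int, cons: int, log2size: int) -> bool:
    # Same index but different wrap flag.
    return q_used(prod, cons, log2size) == 1 << log2size


def q_advance(ptr: int, log2size: int, count: int = 1) -> int:
    """Advance a pointer by count entries, flipping the wrap flag on overflow."""
    return (ptr + count) & ((1 << (log2size + 1)) - 1)
```

Assuming DCMDQ pointers use the same layout, q_used(WR, RD, log2size) would give the size of the window between SMMU_(*_)DCMDQ_CONSn.RD and SMMU_(*_)DCMDQ_PRODn.WR in which an implementation is permitted to prefetch translations.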

3.8 Virtualization

Devices can be put under direct guest control with stage 2-only mappings, without requiring guest interaction with the SMMU. To the guest OS, such a device appears to be directly connected and can issue DMA directly to what the guest sees as PAs (that is, IPAs).

The SMMU does not provide programming interfaces for direct use by virtual machines. Where a guest requires stage 1 facilities in virtualization scenarios, Arm expects this to be supported by hypervisor emulation of a virtual SMMU, or by a similar interface for use by a virtual machine.

Implementations might provide an IMPLEMENTATION DEFINED number of extra hardware interfaces that are located in an IMPLEMENTATION DEFINED manner but are otherwise compatible with the SMMUv3 programming interface. Each interface might be mapped through to a guest VM for it to use directly, for example appearing as a stage 1-only interface to the guest while the hypervisor interface appears as stage 2-only. The management of such an implementation is beyond the scope of this specification.

3.9 Support for PCI Express, PASIDs, PRI, and ATS

A PCI RequesterID maps directly to the low-order bits of a StreamID, and therefore maps to one STE; see section 3.2 Stream numbering. A PCIe Function is therefore the minimum granularity that can be assigned to a VM.

The PCIe PASID prefix allows a Function to be subdivided into parts, each of which is intended to be assigned to a different user space process at stage 1. The prefix is optional. Transactions from one StreamID might be supplied with a SubstreamID or not, on a per-transaction basis. Because the prefix only identifies a portion of a Function, the Function otherwise remains indivisible and remains the granularity at which assignments to VMs are made.

Therefore, in PCI terms:
• Stage 2 is associated with a RequesterID (identifying a Function). The Function is assigned to a VM.
• Stage 1 is associated with a (RequesterID, PASID) tuple. That is, the PASID differentiates between different stage 1 translation contexts.
• The PASID identifies which of the parts or contexts of the Function are assigned to which process or driver at stage 1.
• If transactions from a Function are translated using stage 2 but stage 1 is unused and in bypass, there are no stage 1 translation contexts to differentiate with a PASID. Supplying a PASID or SubstreamID to a configuration without stage 1 translation causes the translation to fail. Such transactions are terminated with an abort and C_BAD_SUBSTREAMID is recorded.

PASIDs can be up to 20 bits in size. PASIDs are optional and configurable, and their size is the minimum of the widths supported by the endpoint, system software, the PCIe Root Complex, and the SMMU (its individually supported SubstreamID width). The SMMU is not required to report an error when an endpoint and Root Complex emit a PASID with a value greater than can be expressed in the SubstreamID width supported by the SMMU. In this scenario, the PASID might be truncated to the SubstreamID size on arrival at the SMMU.

To minimize PCIe-specific terms, a RequesterID is referred to using a StreamID (to which the RequesterID maps in a hardware-specific manner). In the SMMU, a PASID is referred to as a SubstreamID. Even when a client device supports SubstreamIDs, it is not mandatory to supply a SubstreamID with all transactions from that device. PCIe permits a PASID to be supplied, or not, on a per-transaction basis. Therefore, where a SubstreamID is input to the SMMU, a validity flag is also provided, and this flag is asserted when a PASID is present.

The PASID tag provides additional permission attributes on top of the standard PCIe read/write attribute. The tag can express Execute and Privileged states, which correspond to the SMMU INST and PRIV attributes. A PCIe transaction without a PASID is considered Data, unprivileged. The mapping between PCIe and SMMU permissions is described in section 13.7 PCIe permission attribute interpretation.

3.9.1 ATS Interface

An optional extra hardware interface might be provided by an SMMU implementation to support PCIe ATS [1] and PRI. This interface conveys translation and paging requests and responses to and from the PCIe Root Complex, which bridges requests and responses into the PCIe domain.

Whether the SMMU implements ATS can be discovered from SMMU_(R_)IDR0.ATS. If ATS is implemented, whether the SMMU also implements PRI can be discovered from SMMU_(R_)IDR0.PRI. This support determines the behavior of SMMU-local configuration and commands, but does not guarantee that the rest of the system, and all clients of an SMMU, also support ATS and PRI. The ATS and PRI capabilities of dependent PCIe Root Complexes and their endpoints are discovered through other means.
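The discovery and sizing rules above can be sketched as follows. The bit positions used for SMMU_IDR0.ATS, SMMU_IDR0.PRI and SMMU_IDR1.SSIDSIZE are assumptions to be checked against the register descriptions, and the function names are hypothetical. substream_id() shows only the truncation case: the text says that an oversized PASID might be truncated, and that the SMMU is not required to report an error.

```python
from typing import Optional, Tuple

# Assumed field positions; check them against the register descriptions.
IDR0_ATS_BIT = 10            # SMMU_IDR0.ATS
IDR0_PRI_BIT = 16            # SMMU_IDR0.PRI
IDR1_SSIDSIZE_SHIFT = 6      # SMMU_IDR1.SSIDSIZE, assumed bits [10:6]
IDR1_SSIDSIZE_MASK = 0x1F
PCIE_MAX_PASID_BITS = 20


def ats_pri_support(idr0: int) -> Tuple[bool, bool]:
    """Return (ATS implemented, PRI implemented)."""
    ats = bool((idr0 >> IDR0_ATS_BIT) & 1)
    # PRI is only discovered when ATS is implemented.
    pri = ats and bool((idr0 >> IDR0_PRI_BIT) & 1)
    return ats, pri


def pasid_width(endpoint_bits: int, software_bits: int,
                root_complex_bits: int, idr1: int) -> int:
    """Usable PASID width: the minimum supported by every participant."""
    smmu_bits = (idr1 >> IDR1_SSIDSIZE_SHIFT) & IDR1_SSIDSIZE_MASK
    return min(PCIE_MAX_PASID_BITS, endpoint_bits, software_bits,
               root_complex_bits, smmu_bits)


def substream_id(pasid: Optional[int], smmu_bits: int) -> Optional[int]:
    """SubstreamID seen by the SMMU, or None when no PASID (validity flag deasserted)."""
    if pasid is None:
        return None
    # An oversized PASID might be truncated without any error being reported.
    return pasid & ((1 << smmu_bits) - 1)
```

Discovery through these fields only describes the SMMU itself; the Root Complex and endpoint capabilities still have to be read from PCIe configuration space.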
PCIe ATS has the following properties: • Note: ATS aims to improve the performance of a system using an SMMU, known as a Translation Agent in PCIe terminology, by caching translations within the endpoint or Requester. This can remove contention on a shared TLB, or reduce latency by helping the device to request translation ahead of time. • The remote endpoint Address Translation Cache (ATC, which is equivalent to a TLB) is filled on-demand by making a Translation Request to the Root Complex which forwards it to the SMMU. If the translation exists ARM IHI 0070 H.a Copyright © 2016-2026 Arm Limited or its affiliates. All rights reserved. Non-confidential 87

Chapter 3. Operation 3.9. Support for PCI Express, PASIDs, PRI, and ATS and permission checks pass, a Translation Completion response is given with a Physical Address and the ATC caches this response. • The return of a translated, physical address to an endpoint grants the endpoint permission to access the physical address. The endpoint can now make direct access to PAs, which it does by tagging outgoing traffic as Translated. The Root Complex is expected to provide this tag to the SMMU. The SMMU might then allow such transactions to bypass translation and access PA space directly. • If a Translation Request would result in a fault or error, a negative response is returned and the endpoint is not able to access the address using ATS. This denial might be fatal to the endpoint, reported in a device-specific manner, or when PRI is used, initiate a page-in request. • Page access permission checking is performed at the time of the Translation Request, and takes the form of a request for permission to read or to read/write (and, optionally, to execute). The response grants the device access to read or to read/write (and, optionally, to execute) as specified in [1]. The response returns the permissions that are available, which might consist of a subset of the requested permissions. Note: For example, a request to read and write a page might succeed, but only permission to read might be granted. The endpoint does not write pages it has not been granted write access to. • ATS translation failures are reported to the endpoint, which might make an error software-visible, but the SMMU does not record fault events for ATS translation failures. • Invalidation of the ATC translations is required whenever a translation changes in the SMMU. This is done in software. Broadcast invalidation operations might affect the internal TLBs of the SMMU, but these operations are not forwarded into the PCIe domain. – Note: An Arm broadcast TLB invalidation provides an address, ASID and VMID. 
The SMMU does not map this information back to the RequesterID of an endpoint in hardware. – Note: When CD.TBI0 or CD.TBI1 are used to enable use of tagged pointers with an endpoint that uses ATS, system software must assume that a given virtual address has been cached by the endpoint’s ATC with any value of address bits [63:56]. This means that invalidation of a given virtual address VA[55:12] requires either 256 ATS Invalidate operations to invalidate all possible aliases that the ATC might have cached, or an ATS Invalidate-all operation. • ATS Invalidation is performed using SMMU commands which the SMMU forwards to the Root Complex. The invalidation responses are collected, and the SMMU maintains the ordering semantics upheld by the Root Complex in which a transaction that might be affected by an ATS Invalidate operation must be visible before the ATS Invalidation completes. • ATS must be disabled at all endpoints before SMMU translation is disabled by clearing SMMU_(R_)CR0.SMMUEN. • An ATS Translation Request might be fulfilled using SMMU TLB entries, or cause SMMU TLB entries to be inserted. Therefore, after a change of translation configuration, an ATS Invalidate Request must be preceded by SMMU TLB invalidation. Software must ensure that the SMMU TLB invalidation is complete before initiating the ATS Invalidation. Note: This order ensures that an ATS Translation Request performed after an ATS Invalidate Request cannot observe stale cached translations. • ATS and PRI are not supported from Secure streams. – In Secure STEs, the EATS field is RES0. – CMD_ATC_INV and CMD_PRI_RESP are not able to target Secure StreamIDs. – The SMMU terminates any incoming traffic marked Translated on a Secure StreamID, aborting the transaction and recording F_TRANSL_FORBIDDEN. – It is IMPLEMENTATION DEFINED whether it is possible for ATS Translation Requests with a Secure StreamID to reach the SMMU. 
  – If it is possible for an implementation to receive an ATS Translation Request from a Secure StreamID, the request is aborted with a UR response and F_BAD_ATS_TREQ is recorded into the Secure Event queue. The check for a Secure ATS Translation Request takes place prior to checking of StreamID or configuration lookup.
• Support for CMD_ATC_INV, CMD_PRI_RESP, CMD_DPTI_ALL and CMD_DPTI_PA on the Secure Command queue is optional and is indicated by SMMU_S_IDR3.SAMS.
• The Smallest Translation Unit (STU, as programmed into the ATS Control Register of the Endpoint) is defined as the smallest granule that the SMMU implements, as discovered from SMMU_IDR5. Software must program the same STU size for all devices serviced by an SMMU, and must not assume that all SMMUs in the system are identical in this respect.

The Page Request Interface (PRI) adds the ability for PCIe functions to target DMA at unpinned, dynamically-paged memory. PRI depends on ATS, but ATS does not require PRI. Like ATS, PRI can be enabled on a per-function basis. When enabled:
• If an ATS request fails with a not-present result, the endpoint issues a PRI page request to ask software to make the requested pages resident.
• Software receives these PRI Page Requests (PPRs) on the PRI queue and issues a positive PRI response command to the SMMU after making pages present. If a requested address is unavailable, a programming failure has caused the device to request an illegal address, and software must issue a negative PRI response command.
• The net effect is that a hypervisor or OS can use unpinned, dynamically-paged memory for DMA.
• The PRI queue is of a fixed size and PPRs must not be lost. To ensure this, page request credits are issued by the most privileged system software (that which controls the PRI queue) to each PCIe endpoint using the PRI capability in its configuration space.
• If a guest is allowed to use PRI, it enables PRI (through the configuration space) and sets up its own PRI queue. The hypervisor needs to proxy PPRs from the host PRI queue to the guest PRI queue.
  However, the total system number of PRI queue entries is limited by the PRI queue size of the hypervisor.
• The PRI queue size is limited to a per-SMMU maximum, indicated by SMMU_IDR1.PRIQS. Where PRI is used with virtualization, Arm expects that each guest discovers how many PRI queue entries its emulated SMMU supports. The host allocates N from its allocation of L, and ensures that the guest gives out a maximum of N credits (using configuration space) to devices controlled by the guest. L is the total number of PRI queue entries in the PRI queue of the host and is the maximum number of credits actually given out to devices.
• If the PRI queue becomes full because of erroneous behavior in a client device, the SMMU and Root Complex will respond to further incoming Page Request messages by returning a successful PRG response. This will not fatally terminate device traffic; the device will simply try ATS, fail, and try PRI again. Arm expects that a system employing this technique would remain functional and free-flowing, provided that requests are consumed from the PRI queue and space for new requests is created. See section 8.1 PRI queue overflow.

Note: ATS operation enables an endpoint to issue Translated transactions that bypass the SMMU in some configurations. In these cases, use of ATS could be a security issue, particularly when considering untrusted, subverted, or non-compliant ATS devices. For example, a custom FPGA-based device might mark requests as Translated despite the ATS protocol and translation tables not having granted access.
SMMU_(R_)CR0.ATSCHK controls whether the SMMU allows Translated traffic to bypass with no further checks, other than an address size check. If configured as requiring further checks, that is SMMU_(R_)CR0.ATSCHK == 1, Translated transactions from an endpoint are controlled by the associated STE.EATS field, which provides a per-device control of whether ATS traffic is allowed.
When allowed, that is when the effective value of STE.EATS is 0b01, 0b10 or 0b11, Translated transactions are accepted. Otherwise, when the effective value of STE.EATS is 0b00, the transaction is terminated with an abort, and an F_TRANSL_FORBIDDEN event is recorded.
Note: An implementation might perform this traffic interception and checking in a manner that is much quicker than performing full translation, thereby retaining a performance advantage of using ATS while achieving greater safety than permitting all ATS traffic.
STE.EATS also allows a mode in which ATS responses are returned with IPAs, the output from the stage 1 of a stage 1 and stage 2 configuration, so that later Translated transactions from the endpoint are considered IPAs and further translated by the SMMU using the stage 2 configuration of the stream. This allows ATS to be used (for example with PRI) while maintaining stage 2 isolation. This mode is optional in an implementation, and support is discovered through SMMU_IDR0.NS1ATS. When implemented, this mode can only be used when SMMU_(R_)CR0.ATSCHK is 1, as stage 2 translation needs to be applied.
Note: When ATS is used with nested stage 1 and stage 2 translation, any modification to stage 1 or stage 2 requires an invalidation of ATC entries, which cache information derived from both stages. This also applies to the EATS == 0b10 Split-stage ATS case.
If SMMU_(R_)CR0.SMMUEN == 1, Translated transactions that bypass the SMMU, either when SMMU_(R_)CR0.ATSCHK is 0, or when SMMU_(R_)CR0.ATSCHK is 1 and the effective value of STE.EATS is not 0b00, are subject to address size checks. See section 3.9.1.1 Handling of addresses in ATS-related transactions for more information on the required behavior.
Note: This situation would not normally arise outside of incorrect ATS Invalidation when transitioning between Split-stage ATS mode and regular ATS mode.
The recommended mapping between StreamID and RequesterID is described in 3.2 Stream numbering.
Arm expects that most highly-integrated non-PCIe devices requiring translation and paging facilities will use IMPLEMENTATION SPECIFIC distributed SMMU TLB facilities, rather than using ATS and PRI. Using SMMU facilities allows such devices to participate in broadcast TLB invalidation and use the Stall fault model.
If the translation requested by an ATS request is valid and HTTU is enabled, the SMMU must update Translation Table Dirty/Access flags on receipt of the ATS Translation Request, see 3.13 Translation tables and Access flag/Dirty state below.
An ATS request is either for a read-only translation (in which case the NW flag of the request is set), so that only the Access flag is updated, or for a read-write translation (in which case the NW flag of the request is clear), for which both the Access flag and the dirty state of the page are updated.
Note: Because the intention is for the actual traffic to bypass the SMMU, the ATS request is the only opportunity the SMMU will have to note the access in the flags.
A PASID tag can also be applied to an ATS Translation Request to select translation under a specific SubstreamID. A PASID-tagged ATS TR requests that the endpoint be granted access to a given address, according to the Execute and Privileged attributes of the PASID in addition to the existing NW write intention.
When the SMMU returns an ATS Translation Completion for a request that had a PASID, the Global bit of the Translation Completion Data Entry must be zero.
Note: The TDISP eXtended TEE (XT) Extensions specification [5] replaces the Global bit with the TE bit. For more information on the TE bit, see 3.9.4.2 TE bit on ATS Translation Completions.
Note: The SMMU differentiates translation contexts intended to be shared with the PE from those not shared, using the CD.ASET mechanism. Whether a global translation matches is also a function of ASET. However, no mechanism exists to indicate that all possible global translations (from all contexts used by an endpoint) share an identical address space layout so that global translations can be used. The ATS Global flag must be cleared because a non-shared context must not match global translations from a shared context (and vice versa).
Note: Arm expects that general-purpose software will require HTTU for use with PRI. See section 3.13.7 ATS, PRI and translation table flag update for more information on flag updates with ATS.
Note: PRI requires ATS to be implemented, but ATS does not require PRI to be implemented.
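The flag-update and Global-bit rules above can be summarized as a small decision function. This is a minimal sketch in Python. The `AtsTrEffects` record and the `ats_tr_effects()` helper are hypothetical names and are not part of any real SMMU driver API. The sketch models only what this section states: HTTU updates happen only for a valid translation, NW selects an Access-only update or an Access plus dirty-state update, and PASID-tagged requests get a zero Global bit.

```python
from typing import NamedTuple, Optional


class AtsTrEffects(NamedTuple):
    """Hypothetical summary of what the SMMU does for one ATS Translation Request."""
    set_access_flag: bool
    set_dirty_state: bool
    global_bit: Optional[bool]  # None: this section does not constrain the Global bit


def ats_tr_effects(translation_valid: bool, httu_enabled: bool,
                   nw: bool, has_pasid: bool) -> AtsTrEffects:
    # Flags are only updated for a valid translation with HTTU enabled,
    # because the later Translated traffic bypasses the SMMU.
    update = translation_valid and httu_enabled
    # NW set: read-only request, so only the Access flag is updated.
    # NW clear: read-write request, so the Access flag and dirty state are updated.
    set_access = update
    set_dirty = update and not nw
    # A Translation Completion for a PASID-tagged request must have Global == 0.
    global_bit = False if has_pasid else None
    return AtsTrEffects(set_access, set_dirty, global_bit)
```

For example, a read-write request (NW clear) on a valid translation with HTTU enabled sets both the Access flag and the dirty state. A read-only request (NW set) sets only the Access flag.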
Note: An SMMU that does not support HTTU can support paged DMA mappings for non-PCI devices using the Stall fault model, see section 3.12 Fault models, recording and reporting. PCIe cannot be used with the Stall fault model, so a requirement for paged DMA with PCIe implies a requirement for PRI, which implies a requirement for HTTU.
Transactions that make use of ATS might differ from ordinary PCIe non-ATS transactions in several ways:
• Translation Requests that do not successfully translate, including those that would ordinarily have CD.A == 0 RAZ/WI behavior, cause an error in the endpoint (recorded in an endpoint-defined manner) or a PRI request, instead of an error or fault being recorded using the SMMU Event queue.
• Changes to translations require use of CMD_ATC_INV in addition to SMMU TLB invalidation.
• ATS Translated transactions might not represent Instruction/Data and/or Privileged/User marking on the interconnect to memory in the same way as Untranslated transactions.
• Pages with execute-only permission, and pages with no read, write or execute permission, cannot be represented, and are inaccessible when using ATS, see section 13.7 PCIe permission attribute interpretation.

3.9.1.1 Handling of addresses in ATS-related transactions
Note: In ATS Translation Completions and ATS Translated transactions, the address field is 64 bits wide regardless of the implemented physical address size of the system.
Note: It is possible that a buggy or malicious device issues an ATS Translated transaction with non-zero most significant address bits. See also 13.6.2 ATS attributes overview.
If an ATS Translated transaction arrives at the SMMU with a physical address where bits above the implemented PA size are non-zero, then one of the following behaviors occurs. It is IMPLEMENTATION DEFINED which behavior occurs:
• The transaction is terminated with an abort, and no event or fault is recorded.
• The address in the transaction is truncated to the supported PA size of the SMMU as advertised in SMMU_IDR5.OAS, and the truncated address is used for DPT checks, Granule Protection Checks, and the address that is propagated into the system if the transaction is permitted to proceed.

3.9.1.2 Responses to ATS Translation Requests
A Translation Request made from a StreamID for which ATS is explicitly or implicitly disabled (because of SMMU_(R_)CR0.SMMUEN == 0, or the effective EATS == 0b00 including where this is because of a Secure STE, or STE.Config == 0b000) results in an ATS Translation Completion with Unsupported Request (UR) status.
Configuration or scenario | For an ATS Translation Request, leads to
SMMUEN == 0 | Terminated with UR status and F_BAD_ATS_TREQ generated
Using a Secure StreamID | Terminated with UR status and F_BAD_ATS_TREQ generated
STE.Config == 0b000 | Terminated with UR status
STE.Config == 0b100 | Terminated with UR status and F_BAD_ATS_TREQ generated
Effective STE.EATS == 0b00 (Note: Includes EATS == 0b1x when ATSCHK == 0) | Terminated with UR status and F_BAD_ATS_TREQ generated

A Translation Request that encounters an Address Size, Access or Translation fault arising from the translation process for a page, at either stage, results in an ATS Translation Completion with Success status and R == W == 0 in the Completion Data Entry for that page, and no fault is recorded in the SMMU. If the R == W == 0 Translation Completion Data Entry is the first or only entry in the Translation Completion, its translation size is equal to the STU size.
A Permission fault can also lead to this response, but other cases that would cause a Permission fault for an ordinary transaction might result in some, but not all, permissions being granted to the endpoint. See 13.7 PCIe permission attribute interpretation for information on permission calculation for ATS.
Consistent with Armv8-A, in a two-stage translation, the IPA to PA translation of the output address of a stage 1 Table, Block, or Page descriptor is not architecturally performed unless the descriptor is valid and no fault would arise from the descriptor. This behavior applies to a two-stage translation that is performed for an ATS Translation Request, which means that translation stops if stage 1 leads to an Address Size, Access, or Translation fault, or evaluates to a Translation Completion that grants no permissions. If the final IPA for stage 1 is valid, but does not provide any access permissions for the Translation Request, the IPA is not translated at stage 2, and no faults from stage 2 are visible. For example, this might happen if the translation tables do not grant any Unprivileged access permissions at stage 1 and the Translation Request has an effective Priv value of 0. See 13.7.1 Permission attributes granted in ATS Translation Completions.
Note: The case of Split-stage ATS is included, because permissions are determined from both stages. For more information, see section 13.6.3 Split-stage (STE.EATS == 0b10) ATS behavior and responses.
If STE.S1DSS causes a stage 1 skip, and STE.Config == 0b101 (stage 1-only), the response is Success, U == 0, R == W == 1, identity-mapping (see 13.6 PCIe and ATS attribute/permissions handling).
A Translation Request that encounters any configuration error (for example ILLEGAL structure contents, or external abort on structure fetch) results in an ATS Translation Completion with Completer Abort (CA) status:

If an ordinary transaction were to trigger a... | ...an ATS Translation Request with the same properties leads to:
C_BAD_STREAMID | Terminate with CA status. If SMMU_CR2.REC_CFG_ATS == 1 and SMMU_CR2.RECINVSID == 1, the event is recorded. Otherwise, no event is recorded.
F_STE_FETCH, C_BAD_STE, F_VMS_FETCH, F_CFG_CONFLICT, F_TLB_CONFLICT, C_BAD_SUBSTREAMID, F_STREAM_DISABLED, F_WALK_EABT, F_CD_FETCH, C_BAD_CD | Terminate with CA status. If SMMU_CR2.REC_CFG_ATS == 1, the event is recorded. Otherwise, no event is recorded.
F_ADDR_SIZE, F_ACCESS, F_TRANSLATION | Success: R == W == 0 (access denied). This includes stage 2 faults for a CD fetch or stage 1 translation table walk.
F_PERMISSION | Success. R, W and Exe permission is granted, where requested, from available translation table permissions. In the extreme case, a translation with no access permission gives R == W == 0. Where F_PERMISSION arises at stage 2 for a CD fetch or stage 1 translation table walk, the response of Success and R == W == 0 is given.
F_PROTECTED | The same behavior that would apply for the original event record. See F_PROTECTED.
GPF on output address | Terminate with CA status. If the condition described in 3.25.2 Interactions with PCIe ATS applies, the response of W == 0 is given.

Note: See 3.25.2 Interactions with PCIe ATS for details of when Granule Protection Checks are performed on the output address for ATS Translation Requests.
For Event records that are recorded for ATS Translation Requests when SMMU_(R_)CR2.REC_CFG_ATS == 1, the RnW field is UNKNOWN.
Note: In an SMMU for RME, F_STE_FETCH, F_CD_FETCH, F_VMS_FETCH and F_WALK_EABT can be generated as the result of a GPC fault. See 3.25.2 Interactions with PCIe ATS.
The effects of STE overrides on ATS Translation Requests are described in 13.1.4 Replace.
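The two tables above can be combined into a single classification. This is a minimal Python sketch and is not defined by this specification. The `ats_tr_response()` function, the `Status` enum and the argument names are hypothetical. The caller is assumed to supply any detected fault as an event name, and the order in which the checks are applied is an illustrative assumption. The sketch does not model permission calculation for the Success case (see 13.7), or the F_PROTECTED and GPF rows.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    UR = "Unsupported Request"
    CA = "Completer Abort"
    SUCCESS_NO_ACCESS = "Success, R == W == 0"
    SUCCESS = "Success"  # permissions computed as described in 13.7 (not modelled)


# Events that, for an ordinary transaction, indicate a configuration error.
CONFIG_ERRORS = frozenset({
    "F_STE_FETCH", "C_BAD_STE", "F_VMS_FETCH", "F_CFG_CONFLICT", "F_TLB_CONFLICT",
    "C_BAD_SUBSTREAMID", "F_STREAM_DISABLED", "F_WALK_EABT", "F_CD_FETCH", "C_BAD_CD",
})
# Translation faults that become a Success completion granting no access.
NO_ACCESS_FAULTS = frozenset({"F_ADDR_SIZE", "F_ACCESS", "F_TRANSLATION"})


def ats_tr_response(*, smmuen: bool, secure_sid: bool, ste_config: int = 0b101,
                    effective_eats: int = 0b01, fault: Optional[str] = None,
                    rec_cfg_ats: bool = False, recinvsid: bool = False
                    ) -> Tuple[Status, Optional[str]]:
    """Return (completion status, event recorded or None) for a Translation Request."""
    # SMMU disabled, or a Secure StreamID: UR plus F_BAD_ATS_TREQ.
    if not smmuen or secure_sid:
        return Status.UR, "F_BAD_ATS_TREQ"
    # Configuration errors: CA, with the event recorded only when enabled.
    if fault == "C_BAD_STREAMID":
        return Status.CA, fault if (rec_cfg_ats and recinvsid) else None
    if fault in CONFIG_ERRORS:
        return Status.CA, fault if rec_cfg_ats else None
    # ATS disabled for this stream by STE configuration.
    if ste_config == 0b000:
        return Status.UR, None
    if ste_config == 0b100 or effective_eats == 0b00:
        return Status.UR, "F_BAD_ATS_TREQ"
    # Address Size, Access and Translation faults: Success, R == W == 0, nothing recorded.
    if fault in NO_ACCESS_FAULTS:
        return Status.SUCCESS_NO_ACCESS, None
    # F_PERMISSION or no fault: Success with whatever permissions are available.
    return Status.SUCCESS, None
```

For example, `ats_tr_response(smmuen=True, secure_sid=False, fault="C_BAD_CD")` returns `(Status.CA, None)`. The event name is returned only when `rec_cfg_ats` is set.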

3.9.1.3 Handling of ATS Translated transactions
Translated transactions generally pass through the SMMU unless one of the following applies:
• SMMUEN is disabled for the Security state of the transaction.
• A Secure stream is used.
• ATSCHK is 1 for the Security state of the transaction, and therefore additional checks are performed.

Configuration or scenario | For a Translated transaction, leads to:
SMMUEN == 0 | F_TRANSL_FORBIDDEN and aborted.
Using a Secure StreamID | F_TRANSL_FORBIDDEN and aborted.
STE.Config == 0b000 | If ATSCHK == 1, aborted.
STE.Config == 0b100 | If ATSCHK == 1, F_TRANSL_FORBIDDEN and aborted.
Effective STE.EATS == 0b00 | If ATSCHK == 1, F_TRANSL_FORBIDDEN and aborted.
GPC fault on output address | Aborted. The GPC fault is reported as described in 3.25.4 Reporting of GPC faults.
STE.EATS == 0b10 if inappropriate for the protocol (1) | IMPLEMENTATION DEFINED whether F_TRANSL_FORBIDDEN and aborted.
(1) Some bus protocols, for example CHI, require a translated address to represent a physical address and therefore must have Full ATS enabled. It is IMPLEMENTATION DEFINED whether the SMMU can detect cases where such protocols are in use for StreamIDs that do not have Full ATS enabled by STE.EATS = 0bx1. If the SMMU can detect this case, then if SMMU_(R_)CR0.ATSCHK = 1, transactions checked by the SMMU on a protocol that can only convey a physical address are terminated with an abort and reported as F_TRANSL_FORBIDDEN if STE.EATS is 0b10.

If an ordinary transaction were to trigger a... | ...a Translated transaction with the same properties leads to:
F_UUT | Aborted. No event is recorded in the Event queue.
C_BAD_STREAMID, C_BAD_SUBSTREAMID, F_STE_FETCH, F_VMS_FETCH, C_BAD_STE, F_CFG_CONFLICT, F_STREAM_DISABLED | If ATSCHK == 1, aborted. If ATSCHK == 1 and SMMU_CR2.REC_CFG_ATS == 1, the event is recorded in the Event queue. Otherwise, no event is recorded.
Note: If ATSCHK == 0, the SMMU does not check configuration for Translated transactions, so does not detect these conditions.
Note: Reporting of C_BAD_STREAMID is not affected by SMMU_(R_)CR2.RECINVSID.
If a Translated transaction experiences a second stage 2 translation because of an STE.EATS == 0b10 configuration, and a fault occurs during that stage 2 translation, then the transaction is terminated with an abort and an event is recorded in the same way as for an ordinary transaction.
If a PASID is supplied on a Translated transaction, it might be used for the purposes of determining MPAM attributes, see 17.3 PCIe ATS transactions.
If SMMU_IDR3.PASIDTT is 0 or an ATS Translated transaction does not contain a PASID TLP prefix, the Translated transaction is treated as though it is presented to the SMMU with PnU == 0, InD == 0, and SSV == 0.
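The Translated-transaction tables and notes above can also be sketched as a decision function. This is a minimal Python sketch under stated assumptions. `translated_txn_outcome()` and its parameters are hypothetical names. The caller is assumed to supply any configuration error that has already been detected, and the check order is illustrative. The sketch leaves out the Split-stage stage 2 translation, DPT checks, the protocol-specific EATS == 0b10 check, address size checks and the GPC.

```python
from typing import Optional, Tuple

# Configuration errors that a Translated transaction can encounter when ATSCHK == 1.
TRANSLATED_CONFIG_ERRORS = frozenset({
    "C_BAD_STREAMID", "C_BAD_SUBSTREAMID", "F_STE_FETCH", "F_VMS_FETCH",
    "C_BAD_STE", "F_CFG_CONFLICT", "F_STREAM_DISABLED",
})


def translated_txn_outcome(*, smmuen: bool, secure_sid: bool, atschk: bool,
                           ste_config: int = 0b101, effective_eats: int = 0b01,
                           config_error: Optional[str] = None, uut: bool = False,
                           rec_cfg_ats: bool = False) -> Tuple[bool, Optional[str]]:
    """Return (aborted, event recorded or None) for an ATS Translated transaction."""
    if uut:
        # F_UUT: aborted, and no event is recorded.
        return True, None
    if not smmuen or secure_sid:
        return True, "F_TRANSL_FORBIDDEN"
    if not atschk:
        # No configuration checks: the transaction bypasses the SMMU.
        return False, None
    if config_error in TRANSLATED_CONFIG_ERRORS:
        # RECINVSID does not affect C_BAD_STREAMID reporting here.
        return True, config_error if rec_cfg_ats else None
    if ste_config == 0b000:
        return True, None
    if ste_config == 0b100 or effective_eats == 0b00:
        return True, "F_TRANSL_FORBIDDEN"
    # Accepted; Split-stage stage 2 translation and DPT checks might still apply.
    return False, None
```

The early return when `atschk` is false reflects the first note above. With ATSCHK == 0, configuration is never consulted, so configuration errors cannot be detected.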

If SMMU_IDR3.PASIDTT is 1 and an ATS Translated transaction contains a PASID TLP prefix, PnU (Priv) and InD (Exe) bits are specified by the Translated transaction. These attributes are then overridden by STE.PRIVCFG and STE.INSTCFG as specified in Table 13.4, and are considered as follows for the purposes of translation and permissions checking:

STE.EATS | Behavior
0b01 (Full ATS) | For Non-secure streams, no effect. For a Realm stream, if the target PA space is Non-secure and the access is Instruction, not Data, then F_TRANSL_FORBIDDEN is generated.
0b10 (Split-stage) | The attributes are input into the stage 2 translation.
0b11 (Use DPT) | Same as for 0b01 (Full ATS).

The effects of other STE overrides on ATS Translated transactions are described in 13.1.4 Replace.
When processing an ATS Translated transaction with SSV = 1:
• If the SMMU encounters a translation-related fault, then the appropriate Event is recorded.
• If SMMU_CR2.REC_CFG_ATS == 1 and the SMMU encounters an error when fetching a configuration structure for the ATS Translated transaction, then the appropriate Event is recorded.
If SMMU_(R_)CR0.SMMUEN == 1 and SMMU_(R_)CR0.ATSCHK == 1, events for an ATS Translated transaction are reported with the following priority:
1. C_BAD_STREAMID.
2. F_STE_FETCH.
3. C_BAD_STE.
4. F_VMS_FETCH. F_VMS_FETCH is only lower priority than C_BAD_STE. F_VMS_FETCH priority relative to other lower priority events in this list is IMPLEMENTATION DEFINED.
5. F_TRANSL_FORBIDDEN arising from any of the following:
   • STE.EATS is 0b00 (ATS disabled).
   • STE.EATS is 0b10 (Split-stage) and use of Split-stage ATS is not appropriate for the bus protocol.
   • STE.NSCFG is 0b01, and an ATS Translated transaction specifies SEC_SID == Realm and XT == 0. For more information, see 3.9.4.3 XT bit on Untranslated transactions, Translation requests and Translated transactions.
6. If SMMU_IDR3.PASIDTT == 1 and SSV=1: C_BAD_SUBSTREAMID.
7. F_STREAM_DISABLED.
8. If SMMU_IDR3.PASIDTT == 1 and SSV=1: Events encountered while trying to fetch any L1CD or CD as a result of:
   • The stage 2 translation for the fetch of the L1CD or CD.
   • F_CD_FETCH.
9. If SMMU_IDR3.PASIDTT == 1 and SSV=1: C_BAD_CD.
10. If STE.EATS is 0b10 (Split-stage), then F_ADDR_SIZE arising from the check of the input address against IAS.
    Note: This is reported as a stage 1 fault.
11. If STE.EATS is 0b10 (Split-stage), translation-related events from stage 2 translation.
12. If STE.EATS is 0b11 (DPT), F_TRANSL_FORBIDDEN from the DPT check.
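Several conditions in the list above can hold at once, so software that models the SMMU (for example, a test harness) needs to pick the single event that is reported. The sketch below is a hypothetical Python helper, not an architected mechanism. It assumes the caller has already applied the PASIDTT, SSV and STE.EATS conditions attached to each list item. Labels in angle brackets are placeholders for items that group several events. The sketch also fixes F_VMS_FETCH at position 4, even though its priority relative to later items is IMPLEMENTATION DEFINED.

```python
from typing import Iterable, Optional

# Reporting priority for ATS Translated transaction events when
# SMMU_(R_)CR0.SMMUEN == 1 and SMMU_(R_)CR0.ATSCHK == 1, highest first.
# Labels in angle brackets are hypothetical placeholders, not event names.
TRANSLATED_EVENT_PRIORITY = (
    "C_BAD_STREAMID",
    "F_STE_FETCH",
    "C_BAD_STE",
    "F_VMS_FETCH",               # position relative to later items is IMPLEMENTATION DEFINED
    "F_TRANSL_FORBIDDEN",        # item 5: EATS, bus protocol or NSCFG/XT checks
    "C_BAD_SUBSTREAMID",         # only if SMMU_IDR3.PASIDTT == 1 and SSV == 1
    "F_STREAM_DISABLED",
    "<CD fetch events>",         # item 8, only if PASIDTT == 1 and SSV == 1
    "C_BAD_CD",                  # only if PASIDTT == 1 and SSV == 1
    "F_ADDR_SIZE",               # item 10, Split-stage IAS check
    "<stage 2 events>",          # item 11, Split-stage only
    "<DPT F_TRANSL_FORBIDDEN>",  # item 12, STE.EATS == 0b11 only
)


def event_to_report(detected: Iterable[str]) -> Optional[str]:
    """Return the highest-priority event among those the caller detected."""
    found = set(detected)
    for event in TRANSLATED_EVENT_PRIORITY:
        if event in found:
            return event
    return None
```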

In an SMMU with SMMU_(R_)IDR3.DPT set to 1, Translated transactions are subject to DPT checks.
When an ATS Translated transaction arrives at the SMMU with SEC_SID = Realm, the following checks are performed:
1. Checks relating to SMMU_R_CR0.{ATSCHK, DPT_WALK_EN} and STE.EATS:
   • SMMU_R_CR0.ATSCHK is RES1, therefore if STE.EATS is 0b00, 0b01, or 0b10, then:
     – For PCIe and CXL.io transactions, EATS behavior is as specified in SMMUv3.
   • If SMMU_R_CR0.DPT_WALK_EN = 0, STE.EATS = 0b11, and the DPT check cannot be resolved by an existing DPT TLB entry, this is a DPT lookup fault. It is reported as DPT_DISABLED at level 0, and an F_TRANSL_FORBIDDEN event is recorded.
   • If SMMU_R_CR0.DPT_WALK_EN = 1 and STE.EATS = 0b11, the transaction is checked against DPT configuration. If the DPT check fails because of a Device Access fault, then the transaction is terminated with an abort, and an F_TRANSL_FORBIDDEN event is recorded. If the DPT check fails because of a DPT lookup fault, then the transaction is terminated with an abort, an F_TRANSL_FORBIDDEN event is recorded, and the DPT lookup fault is reported in the appropriate register.
2. The output PA space of the transaction is determined as follows:
   • If STE.EATS selected Split-stage configuration, the PA space of the transaction is determined from stage 2 translation. If this resolves to a Non-secure PA and the transaction is marked as Instruction, not Data, then the transaction is terminated with an abort and an F_PERMISSION event is recorded.
   • If STE.EATS selected DPT configuration, the PA space of the transaction is determined from the DPT. This is permitted to be a previously-cached value from the result of an earlier ATS Translation Request, or from a fresh walk of the DPT. If this resolves to a Non-secure PA and the transaction is marked as Instruction, not Data, then the transaction is terminated with an abort and an F_TRANSL_FORBIDDEN event is recorded.
   • Otherwise, the output PA space of the transaction is determined from the input NS attribute and STE.NSCFG. If this resolves to a Non-secure PA and the transaction is marked as Instruction, not Data, then the transaction is terminated with an abort and an F_TRANSL_FORBIDDEN event is recorded.
   Note: SMMU support for the PASID TLP prefix on ATS Translated transactions is optional, and therefore the SMMU might not be able to distinguish Instruction versus Data accesses on ATS Translated transactions. The STE.INSTCFG override might force a read transaction to appear as Instruction for the purpose of Permission checks, but Arm does not recommend this configuration.
3. The GPC for the transaction is performed, checking against the PA space determined in step 2.
Note: The TDISP eXtended TEE (XT) Extensions specification [5] introduces the XT bit, which affects how ATS Translated transactions with SEC_SID = Realm are handled. For more information, see 3.9.4.3 XT bit on Untranslated transactions, Translation requests and Translated transactions.
When an ATS Translated transaction arrives at the SMMU with SEC_SID = Non-secure, the following checks are performed:
1. Checks relating to SMMU_CR0.{ATSCHK, DPT_WALK_EN} and STE.EATS:
   • If SMMU_CR0.ATSCHK is 0, no checks are performed in this step.
   • If SMMU_CR0.ATSCHK is 1 and STE.EATS is 0b00, 0b01 or 0b10, then:
     – For PCIe and CXL.io transactions, the EATS behavior is as specified in SMMUv3.
   • If SMMU_CR0.DPT_WALK_EN = 0, STE.EATS = 0b11, and the DPT check cannot be resolved by an existing DPT TLB entry, this is a DPT lookup fault. It is reported as DPT_DISABLED at level 0, and an F_TRANSL_FORBIDDEN event is recorded.
   • If SMMU_CR0.DPT_WALK_EN = 1 and STE.EATS = 0b11, the transaction is checked against DPT configuration. If the DPT check fails, then the transaction is terminated with an abort, and an F_TRANSL_FORBIDDEN event is recorded.
2. The GPC for the transaction is performed, based on the transaction targeting Non-secure PA space.

3.9.1.4 ATS Invalidation timeout
A CMD_ATC_INV causes an ATS Invalidate Request to be sent to an endpoint. If a response is not received within the timeout period specified by ATS, Arm strongly recommends the following behavior:
• The Root Complex isolates the endpoint in a PCI-specific manner, if it is possible to do so.
• A CMD_SYNC that waits for completion of one or more prior CMD_ATC_INV operations causes a CERROR_ATC_INV_SYNC command error if any of the CMD_ATC_INV operations have not successfully completed. See 7.1 Command queue errors. If SMMU_(*)IDR6.DCMDQ is 0b01, this behavior is mandatory. See 3.5.7.6.1 Synthesized Synchronization.
  – Note: Command processing stops and this situation is differentiated from a normal completion of a CMD_SYNC, which avoids the potential re-use and corruption of a page that has been unmapped but whose translation was incorrectly invalidated.
• If it is not possible for an implementation to cause CERROR_ATC_INV_SYNC for a CMD_SYNC that waits for the completion of failed CMD_ATC_INV operations, Arm recommends that the CMD_SYNC either does not complete or raises a command error. An IMPLEMENTATION DEFINED error mechanism asynchronous to the completion of the CMD_SYNC must record information of the failure.
  – Note: This scenario is not recoverable, but it prevents the invalidation from appearing to have completed, which could otherwise lead to data corruption (the error is contained and propagation is avoided).
  – Note: A completion of a CMD_SYNC without completing an invalidation might lead to corruption of a page that is subsequently re-used by different mappings.

3.9.1.5 ATS Invalidation errors
A CMD_ATC_INV that generates an ATS Invalidate Request that causes a UR response from an endpoint completes without error in the SMMU. An invalidation might not have been performed in response to the command.
Note: A UR response to an invalidation can occur in several circumstances as specified by [1], including where an invalidation is sent with an out-of-range PASID value.

3.9.2 Changing ATS configuration
The ATS behavior of an endpoint is dependent on the STE.EATS field that is associated with the endpoint and on SMMU_(R_)CR0.ATSCHK.
In addition to enabling extra checks on Translated transactions, ATSCHK changes the interpretation of the EATS == 0b10 encoding. Because ATSCHK is permitted to be cached in configuration caches, a change to ATSCHK must be followed by invalidation of any STEs that are required to heed the new value.
Note: The EATS encodings of 0bx1 and 0b10 respond to Translation Requests and interpret Translated transactions using different address spaces. A direct transition between these encodings might cause IPAs to be interpreted as PAs or vice versa, which might lead to data corruption.
To enable ATS on an existing valid STE with EATS == 0b00:
1. EATS is set to 0bx1 or 0b10, and caches of the STE are invalidated (including a CMD_SYNC to ensure completion of the invalidation).
2. ATS is enabled at the endpoint.
To disable ATS on an existing STE with EATS != 0b00:
1. ATS must be disabled at the endpoint, the ATCs invalidated, and CMD_SYNC used to ensure visibility of prior transactions using ATS that are in progress.
2. EATS is then set to 0b00.
3. Caches of the STE are then invalidated.
EATS must not be transitioned between 0bx1 and 0b10 (in either direction) without first disabling ATS with the procedure described in this section, transitioning through EATS == 0b00. EATS is permitted to be transitioned between 0b01 and 0b11 (in either direction) without first transitioning through EATS == 0b00.
EATS == 0b10 is valid only when SMMU_(R_)CR0.ATSCHK == 1. ATSCHK must not be cleared while STE configurations (and the possibility of caches thereof) exist with EATS == 0b10. Before clearing ATSCHK, all STE configurations with EATS == 0b10 must be re-configured to use EATS == 0b00 or EATS == 0bx1, using the procedures described in this section.
Note: This ensures that Translated traffic using IPA addressing (originating from Translation Requests handled by a stage 1-only EATS == 0b10 configuration) does not encounter an SMMU with ATSCHK == 0, which would pass the traffic into the system with a PA. Although ATSCHK == 0 causes EATS == 0b10 to be interpreted as 0b00 (ATS disabled), ATSCHK must not be used as a global ATS disable.
To set ATSCHK to 1:
1. Set SMMU_(R_)CR0.ATSCHK == 1 and wait for the Update procedure to complete.
2. STEs (pre)fetched after this point will interpret STE.EATS according to the new ATSCHK value.
3. Unexpected Translated traffic that is associated with an STE with EATS == 0b00 will now be terminated.
4. ATS can be enabled on an STE as described in this section.
   a. Note: The STE update procedure invalidates the STE, which will invalidate any old ATSCHK value cached with it.
To clear ATSCHK to 0:
1. Ensure that ATS is disabled for all STEs that were using EATS == 0b10, flushing ATCs and transitioning through EATS == 0b00.
   a. Note: After this point, there will be no relevant caches of ATSCHK.
2. Set SMMU_(R_)CR0.ATSCHK == 0 and wait for the Update procedure to complete.
3. STEs (pre)fetched after this point will interpret STE.EATS according to the new ATSCHK value.
4. Translated traffic now bypasses the SMMU without additional checks.
5. Split-stage ATS cannot be enabled on an STE, meaning EATS == 0b10 must not be used.
Referring to section 13.6.3 Split-stage (STE.EATS == 0b10) ATS behavior and responses and 13.6.4 Full ATS skipping stage 1, it is possible to configure ATS for a stream where only requests made from substreams (PASIDs) return actual translations, and non-substream Translation Requests return an identity-mapped response that might be cached at the endpoint. Substream configuration (STE.S1DSS and STE.S1CDMax) therefore affects the contents of ATS Translation Completion responses, and any change of this configuration must also invalidate endpoint ATCs.

3.9.3 SMMU interactions with CXL
The Compute Express Link Specification (CXL) [6] introduces some new features to ATS. An SMMU implementation intended to be used with Type 1 or Type 2 CXL devices (those that issue CXL.cache transactions) must support ATS (SMMU_(R_)IDR0.ATS == 1). An SMMU implementation is permitted not to check CXL.cache transactions against STE.EATS, even if SMMU_(R_)CR0.ATSCHK = 1.

It is a software error to configure STE.EATS = 0b10 for a StreamID associated with a CXL device that issues CXL.cache transactions. In this scenario, no event is recorded.

If the SMMU receives an ATS Translation Request that has the Source-CXL bit set, for a StreamID that has STE.EATS = 0b10, the ATS Translation Completion has the CXL.io bit set.

If the translation for an ATS Translation Request with the Source-CXL bit set returns a memory type other than Inner Write-Back Cacheable, Outer Write-Back Cacheable, Shareable, the CXL.io bit is set in the ATS Translation Completion.

If the memory attributes for a translation request cannot be determined, for example if both stages of translation are disabled by setting STE.S1DSS = 0b01 and STE.Config = 0b101, and STE.SHCFG or STE.MTCFG is configured to Use incoming, then the SMMU uses default attributes of Inner Write-Back Cacheable, Outer Write-Back Cacheable, Shareable when issuing an ATS Translation Completion.

3.9.4 SMMU interactions with the PCIe fields T, TE and XT

The following sections describe the SMMU interactions with the T, TE and XT bits. The T bit is defined in the PCI Express specification [1], and the TE and XT bits are defined in the TDISP eXtended TEE (XT) Extensions specification [5].

3.9.4.1 T bit and the PCIe IDE TLP prefix fields

This section applies only when SMMU_R_IDR3.XT is 0. For information on the behavior of the T bit when SMMU_R_IDR3.XT is 1, see:
• 3.9.4.3 XT bit on Untranslated transactions, Translation requests and Translated transactions.
• 3.9.4.4 XT and T fields on PRI messages, ATS Invalidation messages and Translation completions.

The PCI Express specification [1] includes the IDE TLP prefix. This includes a 1-bit field, T, which indicates a TEE (Trusted Execution Environment) request. This means that a TLP can be in one of three states:
1. No IDE TLP prefix.
2. With IDE TLP prefix and T=0: the transaction is encrypted and protected, and is not from a source associated with a TEE.
3. With IDE TLP prefix and T=1: the transaction is encrypted and protected, and is from a source associated with a TEE.

In an Arm system, the absence of the IDE TLP prefix, or the presence of the IDE TLP prefix with T set to 0, means the transaction is associated with Non-secure state. This applies in both directions. For example:
• Device-to-host transactions with T set to 0 are presented to the SMMU with SEC_SID = Non-secure.
• CMD_ATC_INV and CMD_PRI_RESP commands issued to a Non-secure Command queue are forwarded to the PCIe Root Port with T set to 0. This also applies for a Secure Command queue if SMMU_S_IDR3.SAMS is 0.

An SMMU implementation does not distinguish between the absence of the IDE TLP prefix and the presence of the IDE TLP prefix with T set to 0.

A transaction with the IDE TLP prefix and T set to 1 is associated with Realm state and has an implicit input NS attribute of Realm. Transactions arriving from PCIe (including ATS Translation Requests and PRI messages) with the T bit in the IDE TLP prefix set to 1 are presented to the SMMU with SEC_SID = Realm. PRI requests with the T bit in the IDE TLP prefix set to 1 are delivered to the Realm PRI queue.

The SMMU transmits ATS Translation Completions with a T bit value matching the T bit value in the corresponding ATS Translation Request.
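As an illustration of the T bit rules above when SMMU_R_IDR3.XT is 0, the following C sketch derives SEC_SID and the input NS attribute from an incoming TLP, and picks the T value used when forwarding CMD_ATC_INV or CMD_PRI_RESP. The struct tlp type and all function names are hypothetical. The Realm Command queue case reflects the rule that commands from that queue are issued with T set to 1; a Secure Command queue with SMMU_S_IDR3.SAMS == 1 is not covered here and is returned as -1.

```c
#include <stdbool.h>

enum sec_sid { SEC_SID_NONSECURE, SEC_SID_SECURE, SEC_SID_REALM };
enum pas { PAS_NONSECURE, PAS_REALM };
enum cmdq { CMDQ_NONSECURE, CMDQ_SECURE, CMDQ_REALM };

/* Hypothetical view of a device-to-host TLP as seen at the Root Port. */
struct tlp {
    bool has_ide_prefix;  /* IDE TLP prefix present */
    bool t;               /* T bit; only meaningful with the prefix */
};

static bool tee_request(struct tlp p) { return p.has_ide_prefix && p.t; }

/* SEC_SID when SMMU_R_IDR3.XT == 0: an absent prefix and T == 0 are treated alike. */
static enum sec_sid sec_sid_from_tlp(struct tlp p)
{
    return tee_request(p) ? SEC_SID_REALM : SEC_SID_NONSECURE;
}

/* T == 1 implies an input NS attribute of Realm; otherwise Non-secure. */
static enum pas input_ns_from_tlp(struct tlp p)
{
    return tee_request(p) ? PAS_REALM : PAS_NONSECURE;
}

/* T value used when forwarding CMD_ATC_INV / CMD_PRI_RESP; -1 = not covered. */
static int t_bit_for_command(enum cmdq q, bool s_idr3_sams)
{
    switch (q) {
    case CMDQ_NONSECURE: return 0;
    case CMDQ_SECURE:    return s_idr3_sams ? -1 : 0;
    case CMDQ_REALM:     return 1;
    }
    return -1;
}
```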

CMD_ATC_INV and CMD_PRI_RESP commands issued to a Realm Command queue are issued to PCIe with T set to 1. If the StreamID in the command does not match an IDE Selective Stream RID range programmed in the Root Port, the command is not propagated further. For CMD_ATC_INV, this error is reported as CERROR_ATC_INV_SYNC on the next CMD_SYNC for the queue where the command was issued. For CMD_PRI_RESP, this error is not reported.

Note: Before the TDISP eXtended TEE (XT) Extensions [5] were introduced, the PCI Express [1] specification did not provide a mechanism for an ATS Translated transaction to distinguish whether it targets a Non-secure or Realm physical address. The T bit in the IDE TLP prefix only indicated whether the request is from a source associated with a TEE or not. For more information on the XT bit, see 3.9.4.3 XT bit on Untranslated transactions, Translation requests and Translated transactions.

3.9.4.2 TE bit on ATS Translation Completions

This section applies only when SMMU_R_IDR3.XT is 1.

The TDISP eXtended TEE (XT) Extensions specification [5] introduces the TE (TE Memory Attribute) bit for ATS Translation Completion TLPs. The TE bit occupies the same position in the TLP where the Global bit was previously located.

On an ATS Translation Completion for a Non-secure stream, the SMMU sets TE to 0.

Note: This is consistent with the SMMUv3 behavior in which the Global bit is always set to 0 in ATS Translation Completions.

On an ATS Translation Completion for a Realm stream, the SMMU sets TE as follows:
• If the completion status is not Success, TE is 0.
• If the completion status is Success with R == W == 0, TE is 0.
• Otherwise:
  – If the translation resolved to a Non-secure PA, TE is 0.
  – If the translation resolved to a Realm PA, TE is 1.

Note: The determination of the TE bit value on ATS Translation Completions applies regardless of whether STE.EATS is 0b01, 0b10 or 0b11.
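The TE rules above reduce to a short decision procedure. This C sketch assumes a hypothetical completion summary (struct atc_completion) in which the PA space that the translation resolved to has already been determined; it does not model the TLP format itself.

```c
#include <stdbool.h>

enum stream_kind { STREAM_NONSECURE, STREAM_REALM };
enum pas { PAS_NONSECURE, PAS_REALM };

/* Hypothetical summary of a completion before TE is filled in. */
struct atc_completion {
    bool success;     /* completion status is Success */
    bool r, w;        /* read and write permission bits */
    enum pas out_pas; /* PA space the translation resolved to */
};

/* TE value for an ATS Translation Completion when SMMU_R_IDR3.XT == 1. */
static bool te_bit(enum stream_kind s, struct atc_completion c)
{
    if (s == STREAM_NONSECURE)
        return false;               /* Non-secure stream: TE is always 0 */
    if (!c.success)
        return false;               /* status other than Success */
    if (!c.r && !c.w)
        return false;               /* Success with R == W == 0 */
    return c.out_pas == PAS_REALM;  /* 1 only when resolved to a Realm PA */
}
```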
Note: If all stages of translation are bypassed as the result of STE.S1DSS configuration, and the Translation Completion status is therefore Success with identity-mapping, the PA space for TE calculation is derived by applying STE.NSCFG to the input NS attribute.

3.9.4.3 XT bit on Untranslated transactions, Translation requests and Translated transactions

This section applies only when SMMU_R_IDR3.XT is 1.

The TDISP eXtended TEE (XT) Extensions specification [5] introduces the XT bit, which is evaluated together with the existing T bit. The combination of these bits is interpreted as follows:

XT  T  Meaning
0   0  Non-TEE request that must target non-TEE memory.
0   1  TEE request that can target TEE or non-TEE memory.
1   0  TEE request that must target non-TEE memory.
1   1  TEE request that must target TEE memory.

An SMMU client is permitted to directly present the XT and T bits to the SMMU. In this case, the SEC_SID is determined from the bitwise-OR of T and XT:

T | XT  SEC_SID
0       Non-secure.
1       Realm.

Note: An SMMU implementation does not distinguish between the absence of the IDE TLP prefix and the presence of the IDE TLP prefix with {XT, T} set to {0, 0}.

If the SEC_SID is Realm, then the T bit is interpreted as follows:

T  Behavior
0  Input NS attribute is Non-secure.
1  Input NS attribute is Realm.

Alternatively, an SMMU client is permitted to transform the XT and T bits into the following attributes that are presented to the SMMU:
• SEC_SID = Non-secure.
• SEC_SID = Realm, Input PAS is unspecified.
• SEC_SID = Realm, Input PAS is Non-secure.
• SEC_SID = Realm, Input PAS is Realm.

The XT value and Input NS attribute are then derived as follows:

Attribute                   XT  Input NS attribute
Input PAS is unspecified.   0   Realm.
Input PAS is Realm.         1   Realm.
Input PAS is Non-secure.    1   Non-secure.

CMD_ATC_INV and CMD_PRI_RESP commands issued to a Non-secure Command queue are forwarded to the PCIe Root Port with {XT, T} set to {0, 0}. This also applies for a Secure Command queue if SMMU_S_IDR3.SAMS is 0.

Note: All of the behaviors specified in the following table are performed before input into the GPC, and all faults are raised with higher priority than GPC faults on the target address of the transaction.

Note: In the following table, the STE.NSCFG encoding 0b01 gives the specified behavior only if SMMU_R_IDR3.XT is 1. Otherwise this encoding is Reserved and behaves as 0b00.

For an Untranslated transaction, Translation request or Translated transaction, the XT value is considered in conjunction with STE.NSCFG, as follows:

NSCFG != 0b01, XT = x (either value):
• Untranslated transaction: The input NS attribute, STE.NSCFG and all enabled stages of translation behave as specified in SMMUv3.
• Translation request: The input NS attribute, STE.NSCFG and all enabled stages of translation behave as specified in SMMUv3.
• Translated transaction: If STE.EATS selects Split-stage ATS, the output PA space is determined from stage 2 translation. If STE.EATS selects Full ATS with DPT, the output PA space is determined from the DPT configuration. If STE.EATS selects Full ATS, then the output PA space is the input NS attribute overridden by STE.NSCFG.

NSCFG == 0b01, XT == 0:
• Untranslated transaction: The input NS attribute, STE.NSCFG and all enabled stages of translation behave as specified in SMMUv3.
• Translation request: The input NS attribute, STE.NSCFG and all enabled stages of translation behave as specified in SMMUv3.
• Translated transaction: Terminated with an abort, and the SMMU raises F_TRANSL_FORBIDDEN.

NSCFG == 0b01, XT == 1:
• Untranslated transaction: The SMMU computes the expected output PA space according to all enabled stages of translation. If this does not match the input NS attribute, an F_PERMISSION is reported for the final enabled stage of translation.
• Translation request: The SMMU computes the expected output PA space according to all enabled stages of translation. If this does not match the input NS attribute, an ATS Translation Completion with Success status and R == W == 0 is returned. In this case, granule protection checks are not performed.
  Note: If STE.EATS selects Split-stage ATS, then stage 2 translation is included in this computation, in the same way that it is included for the computation of permissions.
• Translated transaction: If STE.EATS selects Split-stage ATS, and the input NS attribute does not match the output PA space returned by the stage 2 translation, then the SMMU raises F_PERMISSION and terminates the transaction with an abort. If STE.EATS selects Full ATS with DPT, and the input NS attribute does not match the output PA space returned by the DPT, then the DPT check fails as a Device Access fault. This means that the SMMU raises F_TRANSL_FORBIDDEN and terminates the transaction with an abort. If STE.EATS selects Full ATS, then the output PA space is determined from the input NS attribute, and the SMMU does not perform any extra checks.

Note: If the XT check fails, but the translation conditions otherwise permit a Dirty state update, the Dirty state update is still permitted to occur.

3.9.4.4 XT and T fields on PRI messages, ATS Invalidation messages and Translation completions

This section applies only when SMMU_R_IDR3.XT is 1.

The SMMU ignores the XT bit on PRI requests and ATS Invalidation completions.

Note: The XT bit is always 0 on PRI requests and ATS Invalidation completions, and the PCIe Root Port sets XT to 0 on PRI responses and ATS Invalidation requests.

The SMMU transmits ATS Translation Completions with both:
• A T bit value matching the T bit value in the corresponding ATS Translation Request.
• An XT bit value matching the XT bit value in the corresponding ATS Translation Request.
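The decoding and checking rules of 3.9.4.3 can be sketched in C as follows, assuming SMMU_R_IDR3.XT is 1 and that the client presents {XT, T} directly. The types and function names are hypothetical. Only the Untranslated case and the Translated case with Full ATS (without DPT) are modelled, the expected output PA space is supplied by the caller rather than computed from translation tables, and NSCFG is passed as its raw encoding, where 0b01 selects the XT-aware behavior.

```c
#include <stdbool.h>

enum sec_sid { SEC_SID_NONSECURE, SEC_SID_REALM };
enum pas { PAS_NONSECURE, PAS_REALM };
enum xt_outcome { XT_OK, XT_F_PERMISSION, XT_F_TRANSL_FORBIDDEN };

struct xt_decode {
    enum sec_sid sec_sid;
    enum pas input_ns;  /* only meaningful when sec_sid == SEC_SID_REALM */
    bool xt;
};

/* {XT, T} presented directly by the client. */
static struct xt_decode decode_xt_t(bool xt, bool t)
{
    struct xt_decode d = { SEC_SID_NONSECURE, PAS_NONSECURE, xt };
    if (t || xt) {                  /* SEC_SID from the bitwise-OR of T and XT */
        d.sec_sid = SEC_SID_REALM;
        d.input_ns = t ? PAS_REALM : PAS_NONSECURE;
    }
    return d;
}

/* Untranslated transaction: expected_out is the PA space computed from all
 * enabled stages of translation (supplied by the caller in this sketch). */
static enum xt_outcome untranslated_xt_check(unsigned nscfg, bool xt,
                                             enum pas input_ns, enum pas expected_out)
{
    if (nscfg != 0x1 || !xt)
        return XT_OK;               /* behaves as specified in SMMUv3 */
    return expected_out == input_ns ? XT_OK : XT_F_PERMISSION;
}

/* Translated transaction when STE.EATS selects Full ATS (no DPT). */
static enum xt_outcome translated_full_ats_xt_check(unsigned nscfg, bool xt)
{
    if (nscfg == 0x1 && !xt)
        return XT_F_TRANSL_FORBIDDEN;  /* terminated with an abort */
    /* Otherwise the output PA space is the input NS attribute, overridden by
     * STE.NSCFG unless NSCFG == 0b01; no extra checks are performed. */
    return XT_OK;
}
```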

Chapter 3. Operation 3.10. Security states support
3.10 Security states support

The Arm architecture provides support for two Security states, each with an associated physical address space (PA space):

Security state     PA space
Secure state       Secure (NS == 0)
Non-secure state   Non-secure (NS == 1)

The SMMU always supports the Non-secure state and programming interface.

The Realm Management Extension, FEAT_RME, introduces two new Security states, each with an associated physical address space:

Security state     PA space
Secure state       Secure
Non-secure state   Non-secure
Realm state        Realm
Root state         Root

3.10.1 StreamID Security state (SEC_SID)

StreamID Security state (SEC_SID) determines the Security state of the programming interface that controls a given transaction. The association between a device and the Security state of the programming interface is a system-defined property.

If SMMU_S_IDR1.SECURE_IMPL == 0, then incoming transactions have a StreamID, and either:
• A SEC_SID identifier with a value of 0.
• No SEC_SID identifier, and SEC_SID is implicitly treated as 0.

If SMMU_S_IDR1.SECURE_IMPL == 1, incoming transactions have a StreamID and a SEC_SID identifier.

SEC_SID  Meaning
0        The StreamID is a Non-secure stream, and indexes into the Non-secure Stream table.
1        The StreamID is a Secure stream, and indexes into the Secure Stream table.

In this specification, the terms Secure StreamID and Secure stream refer to a stream that is associated with the Secure programming interface, as determined by SEC_SID. The terms Non-secure StreamID and Non-secure stream refer to a stream that is associated with the Non-secure programming interface, which might be determined by SEC_SID or the absence of the SEC_SID identifier.

Note: Whether a stream is under Secure control or not is a different property from the target PA space of a transaction. If a stream is Secure, it means that it is controlled by Secure software through the Secure Stream table. Whether a transaction on that stream results in a transaction targeting Secure PA space depends on the translation table attributes of the configured translation, or, for bypass, the incoming NS attribute.
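A minimal C sketch of the SEC_SID rules in 3.10.1, assuming a 1-bit SEC_SID (the 2-bit RME DA encoding is not modelled). It also illustrates the Note above: a Secure stream selects the Secure Stream table, but in bypass its target PA space still follows the incoming NS attribute. Function names are hypothetical, and any NS override applied by configuration is deliberately ignored.

```c
#include <stdbool.h>

enum stream_table { STRTAB_NONSECURE, STRTAB_SECURE };
enum pas { PAS_SECURE, PAS_NONSECURE };

/* secure_impl mirrors SMMU_S_IDR1.SECURE_IMPL; sec_sid is the 1-bit
 * identifier (pass 0 when the interconnect supplies none). */
static enum stream_table select_stream_table(bool secure_impl, unsigned sec_sid)
{
    if (!secure_impl)
        return STRTAB_NONSECURE;  /* SEC_SID is 0, or implicitly 0 */
    return sec_sid == 1 ? STRTAB_SECURE : STRTAB_NONSECURE;
}

/* Bypass target PA space, ignoring configured NS overrides: a Secure stream
 * follows the incoming NS attribute, a Non-secure stream is always Non-secure. */
static enum pas bypass_target_pas(enum stream_table t, bool incoming_ns)
{
    if (t == STRTAB_NONSECURE)
        return PAS_NONSECURE;
    return incoming_ns ? PAS_NONSECURE : PAS_SECURE;
}
```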

For an SMMU with RME DA, the encoding of SEC_SID is extended to 2 bits, and has the following encoding:

SEC_SID  Meaning
0b00     Non-secure
0b01     Secure
0b10     Realm
0b11     Reserved

Transactions with a SEC_SID value of Realm are associated with the Realm programming interface.

3.10.2 Support for Secure state

SMMU_S_IDR1.SECURE_IMPL indicates whether an SMMU implementation supports the Secure state.

When SMMU_S_IDR1.SECURE_IMPL == 0:
• The SMMU does not support the Secure state.
• SMMU_S_ registers are RAZ/WI to all accesses.
• Support for stage 1 translation is OPTIONAL.

When SMMU_S_IDR1.SECURE_IMPL == 1:
• The SMMU supports the Secure state.
• SMMU_S_ registers configure Secure state, including a Secure Command queue, a Secure Event queue and a Secure Stream table.
• The SMMU supports stage 1 translation and might support stage 2 translation.
• The SMMU can generate transactions to the memory system, to Secure PA space (NS == 0) and Non-secure PA space (NS == 1), where permitted by SMMU configuration.

The Non-secure StreamID namespace and the Secure StreamID namespace are separate namespaces. The assignment of a client device to either a Secure StreamID or a Non-secure StreamID, and reassignment between StreamID namespaces, is system-defined.

With the exception of SMMU_S_INIT, SMMU_S_* registers are Secure access only, and RAZ/WI to Non-secure accesses.

Note: Arm does not expect a single software driver to be responsible for programming both the Secure and Non-secure interface. However, the two programming interfaces are intentionally similar.

When a stream is identified as being under Secure control according to SEC_SID (see 3.10.1 StreamID Security state (SEC_SID)), its configuration is taken from the Secure Stream table or from the global bypass attributes that are determined by SMMU_S_GBPA. Otherwise, its configuration is taken from the Non-secure Stream table or from the global bypass attributes that are determined by SMMU_GBPA.

The Secure programming interface and Non-secure programming interface have separate global SMMUEN translation-enable controls that determine whether bypass occurs.

A transaction that belongs to a stream that is under Secure control can generate transactions to the memory system that target Secure (NS == 0) and Non-secure (NS == 1) PA spaces. A transaction that belongs to a stream that is under Non-secure control can only generate transactions to the memory system that target Non-secure (NS == 1) PA space.

Security state  Permitted target PA spaces
Secure          Secure, Non-secure
Non-secure      Non-secure

3.10.2.1 Secure commands, events and configuration

In this specification, the terms Event queue and Command queue refer to the queue that is appropriate to the Security state of the relevant stream. Similarly, the terms Stream table and Stream Table Entry (STE) refer to the table or table entry that is appropriate to the Security state of the stream as indicated by SEC_SID. For instance:
• An event that originates from a Secure StreamID is written to the Secure Event queue.
• An event that originates from a Non-secure StreamID is written to the Non-secure Event queue.
• Commands that are issued on the Non-secure Command queue only affect streams that are configured as Non-secure.
• Some commands that are issued on the Secure Command queue can affect any stream or data in the system.
• The stream configuration for a Non-secure StreamID X is taken from the Xth entry in the Non-secure Stream table.
• The stream configuration for a Secure StreamID Y is taken from the Yth entry in the Secure Stream table.

The Non-secure programming interface of an SMMU with SMMU_S_IDR1.SECURE_IMPL == 1 is identical to the interface of an SMMU with SMMU_S_IDR1.SECURE_IMPL == 0.

Note: To simplify descriptions of commands and programming, this specification refers to the Non-secure programming interface registers, Stream table, Command queue and Event queue even when SMMU_S_IDR1.SECURE_IMPL == 0.

The register names associated with the Non-secure programming interface are of the form SMMU_x. The register names associated with the Secure programming interface are of the form SMMU_S_x. In this specification, where reference is made to a register but the description applies equally to the Secure or Non-secure version, the register name is given as SMMU_(S_)x.
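The table-selection and permitted-PA-space rules in this section can be modelled as below. This is a hypothetical C sketch: it assumes linear (single-level) Stream tables, stores each 64-byte STE as raw words without decoding any fields, and simply returns NULL for an out-of-range StreamID rather than modelling the resulting error reporting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sec_state { STATE_NONSECURE = 0, STATE_SECURE = 1 };
enum pas { PAS_SECURE, PAS_NONSECURE };

/* A 64-byte STE held as raw words; field decoding is out of scope here. */
struct ste { uint64_t words[8]; };

/* Hypothetical model with one linear Stream table per Security state. */
struct smmu_model {
    const struct ste *strtab[2];  /* indexed by enum sec_state */
    size_t nr_entries[2];
};

/* StreamID X of a given Security state selects entry X of that state's table. */
static const struct ste *lookup_ste(const struct smmu_model *m,
                                    enum sec_state s, uint32_t sid)
{
    if (sid >= m->nr_entries[s])
        return NULL;
    return &m->strtab[s][sid];
}

/* Secure streams may target Secure or Non-secure PA space;
 * Non-secure streams may only target Non-secure PA space. */
static bool target_pas_permitted(enum sec_state s, enum pas p)
{
    return s == STATE_SECURE || p == PAS_NONSECURE;
}
```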
Where an association exists between multiple Non-secure, or multiple Secure registers and reference is made using the SMMU_(S_)x syntax, the registers all relate to the same Security state unless otherwise specified.

The two programming interfaces operate independently as though two logical and separate SMMUs are present, with the exception that some commands issued on the Secure Command queue and some Secure registers might affect Non-secure state, as indicated in this specification. This independence means that:
• The Command and Event queues that are associated with a programming interface operate independently of the Command and Event queues that are associated with the other programming interface. The operation of one does not affect the other programming interface, for example when:
  – The queues are full.
  – The queues overflow.
  – The queues experience an error condition, for example a Command queue that stops processing because of a command error, or an abort on queue access.
• Translation through each programming interface can be separately enabled and disabled using the SMMUEN field that is associated with the particular programming interface. This means that one programming interface might bypass transactions, in which case the behavior is governed by the respective SMMU_(S_)GBPA, while the other programming interface might translate transactions.
• Error conditions in SMMU_(S_)GERROR apply only to the programming interface with which the register is associated.
• Each programming interface has its own *ATOS interface, where ATOS is implemented.

• Interrupts are configured and enabled separately for the Secure and Non-secure programming interface interrupt sources.

When SMMU_S_IDR1.SECURE_IMPL == 1, Arm expects that the SMMU will be controlled by a PE that also implements Secure state. The host PE might:
• Implement Armv7-A.
• Implement Armv8-A, with EL3 using AArch64 state.
• Implement Armv8-A, with EL3 using AArch32 state.

StreamWorld differentiates the Secure EL1 translation regime from the EL3 translation regime, allowing TLB entries to be maintained separately for each of these two translation regimes. Secure EL1 TLB entries might be tagged with an ASID, whereas EL3 TLB entries are not. In this case, Arm expects that the Secure SMMU interface is either:
• Managed by Secure EL1, with no SMMU usage by EL3.
• Managed by EL3, with any EL1 usage brokered to EL3 using a software interface, which is outside the scope of this specification.

Arm recommends that Secure EL1 and EL3 do not both attempt to access the Secure Command queue. Arm further recommends that Secure EL1 does not configure streams to cause TLB entries to be marked as EL3.

For a PE that implements Armv8-A and uses AArch32 state in EL3, or a PE that implements Armv7-A, there is only one privileged Secure translation regime. No separation is made between TLB entries inserted for Secure OS and Secure monitor software. When a client device is associated with this type of Secure system, Arm recommends that the StreamWorld is configured as Secure so that resulting TLB entries that are associated with this Secure translation regime are ASID-tagged. In this case, Arm recommends that StreamWorld is not configured to insert EL3 TLB entries, because broadcast TLB invalidation from the PE would not be able to affect these TLB entries. For more information, see section 3.17 TLB tagging, VMIDs, ASIDs and participation in broadcast TLB maintenance.
A client device with a Secure StreamID provides an input attribute called NS that indicates whether an access is intended to be to a Secure or Non-secure address. A Secure STE might override the input NS attribute of a Secure stream. In bypass configurations of a Secure stream, overriding the input NS attribute allows a client device to issue Secure accesses even if the device is not able to control the input NS attribute. If the input NS attribute is not overridden, the client device can control whether it makes accesses to the Secure or Non-secure PA spaces. In the case where Secure stage 1 is disabled and Secure stage 2 translation is enabled, the input NS attribute distinguishes between Secure and Non-secure IPA spaces.

When a Secure STE is configured for stage 1-only translation, the stage 1 translation table descriptor (in conjunction with intermediate NSTable bits) determines the output NS attribute if the translation table descriptor is fetched from Secure memory, in the same way as in a PE and in the SMMUv2 architecture [4]. See Chapter 13 Attribute Transformation. A Secure STE can also be configured for stage 2 translation, if supported. See section 3.10.2.2 Secure EL2 and support for Secure stage 2 translation.

A Non-secure STE does not override the input NS attribute, which is treated as Non-secure for all transactions belonging to a Non-secure stream.

Accesses to the Secure Stream table, the Secure Event queue and the Secure Command queue are always made to the Secure PA space. For accesses to L1CDs and CDs, the use of Secure IPA or PA space applies at the appropriate stage:
• If Secure stage 2 is not in use, L1CD and CD addresses are treated as Secure physical addresses.
• If Secure stage 2 is enabled, L1CD and CD addresses are translated through the Secure IPA space. See section 3.10.2.2 Secure EL2 and support for Secure stage 2 translation.

Some SMMU commands take a StreamID parameter. When issued to the Secure Command queue, an additional parameter, SSec, indicates whether the SMMU interprets the command as applying to a Secure or a Non-secure StreamID.
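The command and CD addressing rules above can be sketched as follows. The function names are hypothetical, and two points are assumptions of this sketch rather than statements from this section: that SSec == 1 selects a Secure StreamID, and that SSec is not considered for commands on the Non-secure Command queue.

```c
#include <stdbool.h>

enum cmdq { CMDQ_NONSECURE, CMDQ_SECURE };
enum strtab { STRTAB_NONSECURE, STRTAB_SECURE };
enum cd_space { CD_ADDR_SECURE_PA, CD_ADDR_SECURE_IPA };

/* Stream table that a StreamID in a command refers to.
 * ASSUMPTION: SSec == 1 selects a Secure StreamID; SSec is ignored
 * for commands on the Non-secure Command queue. */
static enum strtab command_stream_table(enum cmdq q, bool ssec)
{
    if (q == CMDQ_SECURE && ssec)
        return STRTAB_SECURE;
    return STRTAB_NONSECURE;
}

/* Address space used for Secure L1CD and CD addresses. */
static enum cd_space secure_cd_address_space(bool secure_stage2_enabled)
{
    return secure_stage2_enabled ? CD_ADDR_SECURE_IPA : CD_ADDR_SECURE_PA;
}
```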

The SMMU_S_CR0.SIF flag provides a mechanism to terminate instruction fetches from Secure streams that target Non-secure PAs or Non-secure IPAs in some configurations. See section 6.3.76.2 SIF for details.

3.10.2.2 Secure EL2 and support for Secure stage 2 translation

SMMUv3.2 introduces support for a Secure EL2 translation regime, corresponding with that in an Armv8.4 PE. A Secure STE can be configured with Config[1] set to 1 to enable stage 2 translation if Secure stage 2 is implemented.

Support for Secure EL2 and Secure stage 2 is optional for implementations supporting SMMUv3.2 or later. An implementation might support Secure EL2 and Secure stage 2 if the implementation also supports both stage 1 and stage 2. The following implementation options are supported:

SMMU_S_IDR1.SECURE_IMPL  SMMU_S_IDR1.SEL2  Result
0                        X                 Secure programming interface absent. Secure state is not supported.
1                        0                 Secure EL2 is not supported. Secure stage 2 is not supported.
1                        1                 Secure EL2 is supported. Secure stage 2 is supported for use by a Secure STE.

In the same way as described in Armv8.4, the result of a Secure stage 1 translation is an address in one of two IPA spaces: the Secure stream Secure IPA space or the Secure stream Non-secure IPA space. The Secure stream Secure IPA space corresponds to a stage 1 output targeting Secure IPA space. The Secure stream Non-secure IPA space corresponds to a stage 1 output of a Secure stream targeting Non-secure IPA space. A Secure stream Non-secure IPA space is translated differently from a Non-secure stream IPA space. A Secure stage 2 supports two translation tables, corresponding to input from each of the two IPA spaces.
For a Secure stream with stage 2 translation enabled, the final transaction PA space is determined at stage 2 from the S2SW, S2SA, S2NSW and S2NSA configuration of the selected Secure stream IPA space, as follows:
• If the input into stage 2 is a Secure IPA, the Secure stream Secure IPA space is used for translation. The translation table configured in STE.S_S2TTB is used. The translation table accesses are made to the Secure or Non-secure PA space configured in STE.S2SW. If STE.S2SW == 0, then the final output PA space is determined by STE.S2SA; otherwise the final output PA space is Non-secure.
• If the input into stage 2 is a Non-secure IPA, the Secure stream Non-secure IPA space is used for translation. The translation table configured in STE.S2TTB is used. The translation table accesses are made to the Secure or Non-secure PA space configured in STE.S2NSW. If STE.S2SW == 0 and STE.S2SA == 0 and STE.S2NSW == 0, then the final output PA space is determined by STE.S2NSA. Otherwise the final output PA space is Non-secure.

For a Non-secure stream, translation table accesses and the final output PA space are always Non-secure.

A Secure translation regime with stage 1 and stage 2 configured fetches the L1CD and CD using a Secure IPA.

For a Secure stage 2-only translation (resulting from STE.Config == 0b110 or from STE.S1DSS causing stage 1 to be skipped), whether the IPA is in Non-secure or Secure IPA space after stage 1 bypass is determined by the STE.NSCFG field.

For a Secure EL2 translation table walk, the target PA space of the initial level of walk is given by CD.NSCFG{0,1}, depending on the translation table used.

Note: An S-EL2 StreamWorld uses one translation table, CD.TTB0, and an S-EL2-E2H StreamWorld might use two translation tables, CD.TTB0 and CD.TTB1.
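The stage 2 output PA space rules for a Secure stream can be expressed as a small C function. This is a sketch under a stated assumption that this section does not spell out: a value of 0 in STE.S2SW, S2SA, S2NSW or S2NSA is taken to select the Secure PA space, and 1 the Non-secure PA space. The struct and function names are hypothetical.

```c
#include <stdbool.h>

enum pas { PAS_SECURE, PAS_NONSECURE };

/* Subset of Secure STE stage 2 fields, each 0 or 1.
 * ASSUMPTION: 0 selects Secure PA space, 1 selects Non-secure. */
struct s2_cfg { unsigned s2sw, s2sa, s2nsw, s2nsa; };

struct s2_result {
    bool uses_s_s2ttb;  /* true: STE.S_S2TTB, false: STE.S2TTB */
    enum pas walk_pas;  /* PA space of stage 2 translation table accesses */
    enum pas out_pas;   /* final output PA space */
};

static enum pas pas_from_bit(unsigned b)
{
    return b ? PAS_NONSECURE : PAS_SECURE;
}

static struct s2_result secure_stage2(const struct s2_cfg *c, bool input_is_secure_ipa)
{
    struct s2_result r;
    if (input_is_secure_ipa) {
        /* Secure stream Secure IPA space */
        r.uses_s_s2ttb = true;
        r.walk_pas = pas_from_bit(c->s2sw);
        r.out_pas = (c->s2sw == 0) ? pas_from_bit(c->s2sa) : PAS_NONSECURE;
    } else {
        /* Secure stream Non-secure IPA space */
        r.uses_s_s2ttb = false;
        r.walk_pas = pas_from_bit(c->s2nsw);
        r.out_pas = (c->s2sw == 0 && c->s2sa == 0 && c->s2nsw == 0)
                        ? pas_from_bit(c->s2nsa) : PAS_NONSECURE;
    }
    return r;
}
```

Note how S2SW also gates the Non-secure IPA path: once Secure IPA walks go to Non-secure memory, no Secure output is possible from either IPA space.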

The S-EL2 and S-EL2-E2H translation regimes are only used in CD.AA64 == 1 configuration. A Secure STE with stage 2 translation enabled is not permitted to have STE.S2AA64 select VMSAv8-32 LPAE.

When stage 2 translation is disabled, all Secure IPA accesses become Secure PA accesses, and all Secure stream Non-secure IPA accesses become Non-secure PA accesses.

A Secure translation regime that supports Secure stage 2 configuration uses a VMID tag for TLB entries. This is a Secure VMID and is a distinct namespace from the Non-secure VMID namespace.

When Secure stage 2 is implemented, TLB entries inserted from StreamWorld == Secure configurations are:
• Tagged with the VMID from STE.S2VMID when stage 2 is enabled.
• Tagged with VMID 0 when stage 2 is not enabled.
  – Note: These entries are affected by corresponding TLB invalidation operations that target VMID 0. See section 3.17 TLB tagging, VMIDs, ASIDs and participation in broadcast TLB maintenance.
  – Note: This behavior differs from that of the Non-secure S2VMID field because the STE.S2VMID field was IGNORED in Secure STEs before SMMUv3.2.

Consistent with Armv8.4, a translation table entry fetched for a Secure stream is treated as non-global if it is read from the Non-secure IPA space. That is, these entries are treated as if nG == 1, regardless of the value of the nG bit in the descriptor. See section 3.17.1 The Global flag in the translation table descriptor.

3.10.3 Support for Realm state

The Realm translation regimes are:
• The same as the Realm translation regimes in the Armv9-A architecture.
• Supported only when VMSAv8-64 or VMSAv9-128 translation tables are used.

The SMMU also supports Stream disabled and Stream bypass configuration for Realm state.

The size of a StreamID parameter for Realm state is the same as for Non-secure state, as advertised in SMMU_IDR1.SIDSIZE.
Consistent with the Realm translation regimes in FEAT_RME [2], the output physical address space of a transaction on a Realm stream is as follows:

Configuration                                  Output address space determination
Stream bypass, and bypass due to STE.S1DSS     See 3.10.3.3 Realm stream bypass.
EL1 stage 1 only                               Output PA space is always Realm.
EL1 stage 1 and 2                              Output PA space determined by stage 2 translation.
EL1 stage 2 only                               Output PA space determined by stage 2 translation.
EL2 or EL2-E2H stage 1                         Output PA space determined by stage 1 translation.

A Realm L1STD has the same format and meaning as a Non-secure L1STD, except that L1STD.L2Ptr is a Realm physical address.

Unless otherwise specified, a Realm STE has the same format and meaning as a Non-secure STE, except that all pointers from a Realm STE are Realm addresses.

A Realm L1CD has the same format and meaning as a Non-secure L1CD, except that L1CD.L2Ptr for a Realm L1CD is a Realm address.

A Realm CD has the same format and meaning as a Non-secure CD, except that CD.TTB0 and CD.TTB1 for a Realm CD are Realm addresses.

Note: This means CD.NSCFG0 and CD.NSCFG1 are IGNORED for a Realm stream.
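The Realm output address space table maps directly onto a switch statement. In this hypothetical C sketch, the caller supplies the PA space produced by the relevant translation stage and, for bypass, the PA space already derived from STE.NSCFG and the input NS attribute (see 3.10.3.3 Realm stream bypass); translation table walking is not modelled.

```c
enum pas { PAS_NONSECURE, PAS_REALM };
enum realm_cfg { CFG_BYPASS, CFG_EL1_S1, CFG_EL1_S1_S2, CFG_EL1_S2, CFG_EL2_S1 };

/* stage_out: PA space given by the translation that decides the output
 * (stage 2 for EL1 configurations using stage 2, stage 1 for EL2/EL2-E2H).
 * bypass_out: PA space from applying STE.NSCFG to the input NS attribute. */
static enum pas realm_output_pas(enum realm_cfg cfg, enum pas stage_out,
                                 enum pas bypass_out)
{
    switch (cfg) {
    case CFG_BYPASS:    return bypass_out;  /* stream bypass or S1DSS bypass */
    case CFG_EL1_S1:    return PAS_REALM;   /* always Realm */
    case CFG_EL1_S1_S2:
    case CFG_EL1_S2:    return stage_out;   /* determined by stage 2 */
    case CFG_EL2_S1:    return stage_out;   /* determined by stage 1 */
    }
    return PAS_REALM;  /* not reached for valid cfg values */
}
```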

For all commands issued to a Realm Command queue:
• The command applies to Realm SEC_SID only.
• Any command with a StreamID is interpreted as a Realm StreamID.
• SSec == 1 gives CERROR_ILL.

For more details, see 16.4.1 System integration for an SMMU with RME DA.

3.10.3.1 Input NS attribute

For a Realm stream, the input NS attribute distinguishes between Non-secure and Realm. This applies to untranslated transactions, translated transactions, and translation requests.

If the client device does not provide an input NS attribute, the input NS attribute takes a default value of Realm.

For example, in AMBA, the NSE signal allows for distinction between the Realm and Non-secure address spaces.

For PCIe-related behaviors, see 3.9.4 SMMU interactions with the PCIe fields T, TE and XT.

3.10.3.2 Realm stream disabled

Note: If SMMU_R_CR0.SMMUEN is 1 and a Realm STE is configured with STE.Config == 0b000, that stream is disabled.

Realm stream disabled is consistent with the stream disabled behavior of Non-secure state.

Note: This means that if a Realm stream is disabled, transactions are terminated with an abort.

3.10.3.3 Realm stream bypass

Note: If SMMU_R_CR0.SMMUEN is 1 and a Realm STE is configured with STE.Config == 0b100, that stream is in stream bypass mode.

The requirements in this subsection also apply when translation is bypassed as a result of STE.S1DSS configuration.

Realm stream bypass is consistent with the behavior of stream bypass for Non-secure state, except that the output physical address space for a transaction is derived by applying the STE.NSCFG configuration to the input NS attribute.

Note: Transactions for a Realm StreamID that is configured for stream bypass can still result in:
• F_ADDR_SIZE.
• F_PERMISSION, if an instruction access targets Non-secure PA space.
• F_BAD_ATS_TREQ for ATS translation requests.
• F_TRANSL_FORBIDDEN for ATS Translated transactions.
• Granule protection check faults.

Note: In Realm stream bypass mode, client-originated transactions are still associated with the MECID configured in STE.MECID.
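As a rough illustration of 3.10.3.1 to 3.10.3.3 and of the Realm Command queue rule, the sketch below encodes the input NS default, the two STE.Config values named in the text, and the SSec check. The function names are hypothetical, and treating every other STE.Config value as "translate" is a simplification made for this sketch, not architected behavior.

```c
#include <assert.h>
#include <stdbool.h>

enum ns_attr { NS_ATTR_REALM, NS_ATTR_NONSECURE };
enum stream_action { STREAM_ABORT, STREAM_BYPASS, STREAM_TRANSLATE };

/* Input NS attribute of a Realm stream: defaults to Realm when the client
 * device does not provide one. */
enum ns_attr realm_input_ns(bool device_provides_ns, enum ns_attr device_ns) {
    return device_provides_ns ? device_ns : NS_ATTR_REALM;
}

/* Disposition of a Realm stream while SMMU_R_CR0.SMMUEN == 1, keyed on the
 * 3-bit STE.Config field. Only 0b000 and 0b100 come from the text; mapping
 * every other value to STREAM_TRANSLATE is a simplification of this sketch. */
enum stream_action realm_ste_action(unsigned ste_config) {
    assert(ste_config <= 0x7u);
    switch (ste_config) {
    case 0x0u: return STREAM_ABORT;   /* 0b000: stream disabled */
    case 0x4u: return STREAM_BYPASS;  /* 0b100: stream bypass */
    default:   return STREAM_TRANSLATE;
    }
}

/* A command on the Realm Command queue with SSec == 1 is illegal (CERROR_ILL). */
bool realm_cmd_is_illegal(unsigned ssec) {
    return ssec == 1u;
}
```

In bypass, the resulting input NS attribute is then the value to which STE.NSCFG is applied; that mapping is not modelled here.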

3.11 Reset, Enable and initialization

The SMMU can reset to a disabled state in which traffic bypasses the SMMU without translation or checking of any kind. The SMMU appears transparent to transactions from client devices, which are given attributes according to the disabled bypass configuration (see Chapter 13 Attribute Transformation).

The SMMU can also optionally reset to a disabled state that aborts all transactions for a Security state. This behavior is controlled by the reset state of SMMU_GBPA.ABORT or SMMU_S_GBPA.ABORT.

Note: When an SMMU resets to a bypass configuration, client devices that are connected to it can be used by legacy system software that lacks awareness of the SMMU.

Translation of Non-secure streams is enabled using SMMU_CR0.SMMUEN. When SMMU_S_IDR1.SECURE_IMPL == 1, the Secure programming interface also contains an enable flag, SMMU_S_CR0.SMMUEN, which controls translation of Secure streams.

When translation is not enabled for a Security state, an SMMU:
• When SMMU_(*_)GBPA.ABORT == 1, aborts all transactions.
• When SMMU_(S_)GBPA.ABORT == 0, applies attributes to a transaction as determined by SMMU_(A)GBPA or SMMU_S_(A)GBPA. See section 13.2 SMMU disabled global bypass attributes.
• Never accesses the Stream table, so SMMU_(*_)STRTAB_* register content is ignored.
• Denies PRI Page Requests as though SMMU_(R_)CR0.PRIQEN == 0, regardless of the value of SMMU_(R_)CR0.PRIQEN. See Chapter 8 Page request queue.
• Does not perform ATOS operations. See SMMU_GATOS_CTRL.
• Does not perform ATS translations. See section 3.9.1.2 Responses to ATS Translation Requests.
• Allows registers to be accessed and updated in the normal manner.
• Can process commands after the relevant queue pointers are initialized and SMMU_(*_)CR0.CMDQEN is enabled.
• Does not record new translation events.
However, if SMMU_(*_)CR0.EVENTQEN is enabled and the queue pointers are set up, the SMMU might continue to write out buffered events that were generated by earlier translations, from when translation was still enabled.

See section 6.3.9.6 SMMUEN for a full description of the operation of, and the effect of changes to, the SMMUEN flag.

The SMMU_(*_)STRTAB_BASE register and the SMMU_(*_)CR1 table attributes must be configured before enabling an SMMU interface using SMMU_(*_)CR0.SMMUEN.

Note: This avoids the possibility of incoming traffic attempting a lookup through uninitialized configuration structure pointers.

When translation is disabled for a Security state, transactions on streams that are associated with that Security state are not translated, and take attributes from the appropriate Global Bypass Attribute registers, SMMU_(A)GBPA or SMMU_S_(A)GBPA.

When translation is enabled for a Security state, transactions on streams that are associated with that Security state follow the SMMU translation flow determined by the appropriate Stream Table Entry.

SMMU_CR0.SMMUEN | SMMU_S_CR0.SMMUEN | Traffic
0 | Unimplemented | All traffic bypasses the SMMU or aborts, as determined by SMMU_GBPA.ABORT. Always targets Non-secure PA space.
1 | Unimplemented | Traffic follows the SMMU translation flow. Always targets Non-secure PA space.
0 | 0 | Secure and Non-secure streams are controlled by SMMU_(S_)(A)GBPA. SEC_SID determines the Security state of a given stream. Bypass/abort configuration and attributes, including the input NS attribute, are provided by the Global Bypass Attribute register (GBPA) appropriate to the Security state.
0 | 1 | SEC_SID determines the Security state: Secure traffic follows the SMMU translation flow; Non-secure traffic bypasses the SMMU (attributes taken from SMMU_GBPA), or aborts.
1 | 0 | SEC_SID determines the Security state: Secure traffic bypasses the SMMU (attributes taken from SMMU_S_GBPA), or aborts; Non-secure traffic follows the SMMU translation flow.
1 | 1 | SEC_SID determines the Security state, and traffic follows the usual SMMU translation flow.

The state of the caches and TLBs at reset is IMPLEMENTATION SPECIFIC. To avoid UNPREDICTABLE behavior, software must perform the following steps before enabling translation:
• Invalidate all configuration and TLB caches.
• When SMMU_S_IDR1.SECURE_IMPL == 1, ensure that Secure software fully invalidates any Secure cached configuration or TLB entries in the SMMU through the Secure programming interface before handover to Non-secure software.

The SMMU is not required to invalidate cached configuration or TLB entries when a change to SMMU_(*_)CR0.SMMUEN occurs.

Arm recommends that software initializing the SMMU performs the following steps:
1. Allocate and initialize Stream table memory and base pointers.
2. Allocate and initialize Command queue and Event queue memory, base pointers and indexes.
3. Enable command processing through SMMU_(*_)CR0.CMDQEN and, if applicable, the Event queue through the relevant EVENTQEN.
4. Issue commands to invalidate all cached configuration and TLB entries (see sections 4.3 Configuration structure invalidation and 4.4 TLB invalidation).
5. Enable translation by setting SMMU_(*_)CR0.SMMUEN.
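The SMMU_CR0.SMMUEN / SMMU_S_CR0.SMMUEN table earlier in this section reduces to a per-Security-state choice between translation and the GBPA bypass/abort setting. A minimal C sketch, assuming a hypothetical `smmu_enables` snapshot of the relevant bits; it models only the translate/bypass/abort decision, not the attributes applied by SMMU_(S_)(A)GBPA, and it simplifies the Secure-unimplemented case by treating all traffic as Non-secure.

```c
#include <assert.h>
#include <stdbool.h>

enum sec_state { SEC_NONSECURE, SEC_SECURE };
enum traffic { TRAFFIC_TRANSLATE, TRAFFIC_BYPASS, TRAFFIC_ABORT };

/* Hypothetical snapshot of the register bits that drive the decision. */
struct smmu_enables {
    bool secure_impl;   /* SMMU_S_IDR1.SECURE_IMPL */
    bool ns_smmuen;     /* SMMU_CR0.SMMUEN */
    bool s_smmuen;      /* SMMU_S_CR0.SMMUEN (meaningless if !secure_impl) */
    bool ns_gbpa_abort; /* SMMU_GBPA.ABORT */
    bool s_gbpa_abort;  /* SMMU_S_GBPA.ABORT */
};

enum traffic smmu_traffic(const struct smmu_enables *e, enum sec_state sec_sid) {
    assert(e != 0);
    /* Simplification: with no Secure state, all traffic is Non-secure. */
    if (!e->secure_impl)
        sec_sid = SEC_NONSECURE;
    bool enabled = (sec_sid == SEC_SECURE) ? e->s_smmuen : e->ns_smmuen;
    bool abort   = (sec_sid == SEC_SECURE) ? e->s_gbpa_abort : e->ns_gbpa_abort;
    if (enabled)
        return TRAFFIC_TRANSLATE;                  /* STE-driven flow */
    return abort ? TRAFFIC_ABORT : TRAFFIC_BYPASS; /* GBPA decides */
}
```

Each Security state is decided independently, which is why the four "implemented" rows of the table fall out of a single function.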
Note: These steps are a summary, and do not show the required register update procedure or the DSB operations that ensure correct memory and register access ordering.

SMMU_S_INIT invalidates SMMU caches and TLBs without issuing commands using the Command queue. Caches and TLBs are invalidated using this register with the following sequence:
• Perform a write to SMMU_S_INIT, setting INV_ALL.
• Poll SMMU_S_INIT.INV_ALL until it returns to 0, at which point the invalidation is complete.

When SMMU_S_IDR1.SECURE_IMPL == 1, Arm expects Secure software to initialize the SMMU using the steps above. If Secure software is not guaranteed to initialize the SMMU in accordance with the steps above, Arm recommends that the system provides an IMPLEMENTATION DEFINED mechanism to allow Non-secure software to access SMMU_S_INIT. This is an exception to the general rule that only Secure software can access SMMU_S_* registers.

Note: For example, a system might allow Non-secure access to SMMU_S_INIT from reset, but might provide a means for Secure software to disable this access.

Note: Arm expects Non-secure initialization to use SMMU commands to perform configuration cache and TLB invalidation. Non-secure access to SMMU_S_INIT is not guaranteed, so Non-secure software must not rely on the INV_ALL feature.

Note: Invalidation of all Non-secure TLB information can be achieved by issuing the CMD_TLBI_EL2_ALL and CMD_TLBI_NSNH_ALL commands.

If an SMMU implementation creates TLB entries when bypass is selected with SMMUEN == 0, these entries are not visible to software. An implementation does not require TLB entries inserted to support transaction bypass to be explicitly invalidated by software, for example when SMMUEN is transitioned from 0 to 1.
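To make the ordering concrete, the following sketch runs the five recommended initialization steps and the SMMU_S_INIT.INV_ALL write-then-poll sequence against a toy register model. Everything here (`toy_smmu`, the logged step order, the three-read invalidation latency) is hypothetical; a real driver uses MMIO accessors plus the barriers and update procedures that the summary omits and, as noted above, Non-secure software should not rely on SMMU_S_INIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register model; the struct and its simulated behavior are hypothetical. */
struct toy_smmu {
    uint64_t strtab_base, cmdq_base, eventq_base;
    bool cmdqen, eventqen, smmuen;
    bool s_init_inv_all;      /* models SMMU_S_INIT.INV_ALL */
    int  inv_polls_remaining; /* models invalidation latency */
    int  steps[8];            /* order in which bring-up steps ran */
    int  nsteps;
};

enum step { STEP_STRTAB = 1, STEP_QUEUES, STEP_QUEUE_ENABLE, STEP_INVALIDATE, STEP_SMMUEN };

static void log_step(struct toy_smmu *s, enum step st) {
    assert(s->nsteps < 8);
    s->steps[s->nsteps++] = st;
}

/* Stand-in for issuing invalidate-all commands (4.3, 4.4) and waiting. */
static void invalidate_all_via_commands(struct toy_smmu *s) {
    assert(s->cmdqen); /* commands are consumed only once CMDQEN is set */
    log_step(s, STEP_INVALIDATE);
}

/* Recommended order: Stream table, queues, queue enables, invalidate, SMMUEN. */
void smmu_bringup(struct toy_smmu *s, uint64_t strtab, uint64_t cmdq, uint64_t evtq) {
    s->strtab_base = strtab;
    log_step(s, STEP_STRTAB);
    s->cmdq_base = cmdq;
    s->eventq_base = evtq;
    log_step(s, STEP_QUEUES);
    s->cmdqen = true;
    s->eventqen = true;
    log_step(s, STEP_QUEUE_ENABLE);
    invalidate_all_via_commands(s);
    s->smmuen = true; /* set last, after the Stream table base is programmed */
    log_step(s, STEP_SMMUEN);
}

/* Toy read of SMMU_S_INIT.INV_ALL: the model clears it after a few reads. */
static bool read_inv_all(struct toy_smmu *s) {
    if (s->s_init_inv_all && --s->inv_polls_remaining <= 0)
        s->s_init_inv_all = false;
    return s->s_init_inv_all;
}

/* SMMU_S_INIT sequence: write INV_ALL = 1, then poll until it reads 0.
 * Returns the number of reads that still observed INV_ALL == 1. */
int s_init_invalidate_all(struct toy_smmu *s) {
    int busy_reads = 0;
    s->s_init_inv_all = true;   /* "write 1 to INV_ALL" */
    s->inv_polls_remaining = 3; /* toy latency */
    while (read_inv_all(s))
        busy_reads++;
    return busy_reads;
}
```

The assertion in `invalidate_all_via_commands` captures the one ordering dependency that the list makes explicit: invalidation commands cannot be consumed before command processing is enabled.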

3.12 Fault models, recording and reporting

An incoming transaction goes through several logical stages before continuing into the system. If the transaction is of a type, or has a property, that an SMMU cannot support for IMPLEMENTATION DEFINED reasons, an Unsupported Upstream Transaction fault event is recorded and the transaction is terminated with an abort.

Otherwise, configuration is located for the transaction, given its StreamID (and SubstreamID, if supplied). If any of the required STE and CD structures cannot be located or is invalid, a configuration error event is recorded, if there is a free location in the Event queue, and the transaction is terminated with an abort.

If a valid configuration is located so that the translation tables can be accessed, the translation process begins, and other faults can occur during this phase. See section 7.2 Event queue recorded faults and events for more information about the individual events that are recorded for configuration errors and faults.

When a transaction progresses as far as translation, or during the process of fetching a CD from IPA space through stage 2 translation, the behavior on encountering a fault becomes configurable, if this is supported by the implementation. There are four fault types that constitute Translation-related faults when they are generated at either stage 1 or stage 2:
• F_TRANSLATION
• F_ADDR_SIZE
• F_ACCESS
• F_PERMISSION

Behavior for these faults can be switched between the Terminate and Stall models as determined by the CD.{A,R,S} flags for stage 1 and the STE.{S2R,S2S} flags for stage 2. All other faults (including F_WALK_EABT and F_TLB_CONFLICT) and configuration errors terminate the transaction with an abort.

Note: An F_ADDR_SIZE can also arise from a transaction that bypassed stage 1 but that has an out-of-range IPA, see section 3.4 Address sizes. In this case the transaction is always terminated with an abort.
Note: An F_PERMISSION can also arise as a result of an instruction fetch transaction on a Secure stream that bypasses stage 1, is determined to be Non-secure, and is prevented by SMMU_S_CR0.SIF == 1, see section 6.3.76.2 SIF. In this case the transaction is always terminated with an abort.

The fault behavior configuration at stage 1 is at a per-substream granularity when substreams are used, that is, where an STE points to multiple CDs. When substreams are not configured, that is, where an STE points to one CD, the fault behavior configuration at stage 1 is at a per-stream granularity. Use of the Stall model at stage 1 can be disabled by setting STE.S1STALLD == 1.

The stage 2 fault behavior is configured using STE.{S2R,S2S}, that is, at a per-stream granularity.

When a fault occurs at either stage 1 or stage 2, the stage at which it occurred is known when the fault is detected, and the SMMU performs the action configured for that stage. For example:
• A two-stage configuration that encounters a translation fault in the stage 1 translation tables is a stage 1 fault.
• A transaction that progresses through stage 1 to an IPA and then faults when it is translated using the stage 2 translation tables is a stage 2 fault.
• A stage 2 translation fault that occurs during a stage 1 translation table walk counts as a stage 2 fault. The event that is recorded differentiates this access from a transaction that accesses a faulting IPA after the stage 1 translation table walk, so that hypervisor software can inform the VM of the correct event type (a simulated external abort on translation table walk).
• A stage 2 translation fault that occurs while fetching a CD from an IPA is a stage 2 fault. The event that is recorded shows that a CD was being fetched.
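The rules above for which faults are configurable, and which stage's configuration then applies, can be sketched as a lookup. The `fault_site` labels are hypothetical names for the situations that the text describes; they are not fields of the event record.

```c
#include <assert.h>
#include <stdbool.h>

enum smmu_fault {
    F_TRANSLATION, F_ADDR_SIZE, F_ACCESS, F_PERMISSION,
    F_WALK_EABT, F_TLB_CONFLICT /* stand-ins for "all other faults" */
};

/* Hypothetical labels for where a fault was detected. */
enum fault_site {
    SITE_S1_TABLES,            /* fault in the stage 1 translation tables */
    SITE_S2_OUTPUT,            /* stage 2 translation of the post-stage-1 IPA */
    SITE_S2_FOR_S1_WALK,       /* stage 2 translation of a stage 1 table address */
    SITE_S2_FOR_CD_FETCH,      /* stage 2 translation of a CD address */
    SITE_S1_BYPASSED_IPA_SIZE, /* F_ADDR_SIZE: stage 1 bypassed, IPA out of range */
    SITE_SECURE_SIF            /* F_PERMISSION because SMMU_S_CR0.SIF == 1 */
};

bool is_translation_related(enum smmu_fault f) {
    return f == F_TRANSLATION || f == F_ADDR_SIZE ||
           f == F_ACCESS || f == F_PERMISSION;
}

/* Stage (1 or 2) whose Terminate/Stall configuration governs the fault,
 * or 0 when the transaction is always terminated with an abort. */
int governing_stage(enum smmu_fault f, enum fault_site site) {
    if (!is_translation_related(f))
        return 0; /* other faults and configuration errors always abort */
    switch (site) {
    case SITE_S1_TABLES:
        return 1;
    case SITE_S2_OUTPUT:
    case SITE_S2_FOR_S1_WALK:
    case SITE_S2_FOR_CD_FETCH:
        return 2;
    case SITE_S1_BYPASSED_IPA_SIZE:
    case SITE_SECURE_SIF:
        return 0;
    }
    return 0;
}
```

Note that a stage 2 fault taken while walking stage 1 tables, or while fetching a CD, is still governed by STE.{S2R,S2S}, even though the transaction had not yet produced an output IPA.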

Note: The hypervisor might fix the cause of the fault and retry the stalled transaction or, if the transaction is terminated, inform the VM of the correct event type (a simulated external abort on CD fetch).

After a transaction progresses through the SMMU into the system, certain system-specific transaction aborts might occur on the path to the memory system. Whether, and how, these are reported to the client device is interconnect-specific. The SMMU does not record any faults for these events. The SMMU only records fault events that are generated by its own accesses or by client device accesses that encounter an internal translation issue.

When an incoming transaction is immediately terminated, for any reason, no order is enforced between the response to the client device and the event that is recorded into the Event queue. However, if an event is committed to be recorded for the terminated transaction, a CMD_SYNC ensures that the event record is visible in the Event queue before the CMD_SYNC is considered complete. See section 4.7.3 CMD_SYNC(ComplSignal, MSIAddress, MSIData, MSIWriteAttributes).

The SMMU treats a transaction as being independent of all other transactions, regardless of whether the transaction originates from the same traffic stream or from different streams, and the fault behavior of one transaction has no direct effect on any other transaction. Section 3.12.2 Stall model below describes interconnect ordering issues and recommendations for the presentation of grouped fault information to software. Whether an external agent makes an association between different transactions is outside the scope of the SMMU architecture.
When a transaction causes a Translation-related fault at stage 1, the transaction might be:
• Terminated with an abort (CD.S == 0 and CD.A == 1).
• Terminated with RAZ/WI behavior (CD.S == 0 and CD.A == 0).
• Stalled (CD.S == 1 and STE.S1STALLD == 0).

When a transaction causes a Translation-related fault at stage 2, the transaction might be:
• Terminated with an abort (STE.S2S == 0).
• Stalled (STE.S2S == 1).

Support for stalling or terminating a transaction is IMPLEMENTATION DEFINED, and is indicated by SMMU_(*_)IDR0.STALL_MODEL. When SMMU_S_IDR1.SECURE_IMPL == 1:
• SMMU_S_IDR0.STALL_MODEL indicates the physical capabilities of the SMMU implementation.
• SMMU_IDR0.STALL_MODEL indicates the capabilities that Non-secure software is permitted to use.
  – This field is generated from SMMU_S_IDR0.STALL_MODEL and is affected by the SMMU_S_CR0.NSSTALLD flag which, when set on an SMMU implementation supporting both the Stall model and the Terminate model, prevents Non-secure use of stalling faults.
  – Note: This can be used to guarantee that Non-secure software cannot stall transactions where doing so might cause external problems in certain system topologies.

When SMMU_S_IDR1.SECURE_IMPL == 0, SMMU_IDR0.STALL_MODEL reflects the physical capabilities of the SMMU implementation.

SMMU_S_IDR0.STALL_MODEL | SMMU_S_CR0.NSSTALLD | SMMU_IDR0.STALL_MODEL | Notes
0b00 (Stall and Terminate models supported) | 0 (do not filter NS use of stall) | 0b00 (Stall and Terminate models supported for NS) | NS usage reflects physical reality.
0b00 (Stall and Terminate models supported) | 1 (NS cannot use stall) | 0b01 (Terminate model supported for NS) | NS usage is limited to terminate-only, even though physically the SMMU supports stall too.
0b01 (Terminate model supported) | X (no stall to filter) | 0b01 (Terminate model supported) | NSSTALLD is irrelevant; there is no stall to prevent.
0b10 (Stall model supported) | X (no alternative to stall) | 0b10 (Stall model supported) | NSSTALLD is irrelevant; there is no alternative to stall, so stall cannot be disabled.

The SMMU_IDR0.TERM_MODEL field indicates the termination models provided by an implementation, globally. For a stage 1 fault, an implementation might offer the choice of terminate with abort or RAZ/WI behavior, or it might only allow termination by abort, in which case the CD.A bit must be set.

Note: A transaction faulting at stage 2 is, when terminated, always aborted.

It is optional whether an SMMU implementation supports the Stall model, the Terminate model, or both. Where system usage cannot be anticipated, Arm recommends that both fault models (SMMU_IDR0.STALL_MODEL == 0) and both termination models (SMMU_IDR0.TERM_MODEL == 0) are implemented.

If there is a risk that the stability of the system is compromised when the stall configuration is used for a set of client devices, the following countermeasures can be considered:
• An implementation that supports both the Stall and Terminate models is permitted, but not required, to treat a stalling configuration for these devices as a terminating configuration.
  – When stalling is configured for these devices, faulting transactions are terminated instead of stalled.
  – The faults are reported with Stall == 0.
  – The transaction is terminated with Abort.
• These devices are not required to be defined by the SMMU implementation, but are an IMPLEMENTATION DEFINED system property.
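The STALL_MODEL table above amounts to a single filtering rule, sketched below in C. The encodings 0b00, 0b01 and 0b10 are those listed in the table; the function name and the use of a plain `bool` for SMMU_S_CR0.NSSTALLD are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* STALL_MODEL encodings from the table above. */
enum stall_model {
    STALL_AND_TERMINATE = 0x0, /* 0b00 */
    TERMINATE_ONLY      = 0x1, /* 0b01 */
    STALL_ONLY          = 0x2  /* 0b10 */
};

/* SMMU_IDR0.STALL_MODEL as seen by Non-secure software.
 * phys: the physical capability (SMMU_S_IDR0.STALL_MODEL when Secure state
 * is implemented). nsstalld: SMMU_S_CR0.NSSTALLD. */
enum stall_model ns_stall_model(bool secure_impl, enum stall_model phys, bool nsstalld) {
    assert(phys <= STALL_ONLY);
    if (secure_impl && nsstalld && phys == STALL_AND_TERMINATE)
        return TERMINATE_ONLY; /* Non-secure software may not use stalling faults */
    return phys;               /* no filtering requested, or nothing to filter */
}
```

Only the first row pair of the table is affected by NSSTALLD; for Terminate-only or Stall-only implementations the flag has no effect, which is why the function falls through to the physical value.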
Note: For faulting transactions that are associated with client devices that have been configured to stall, but where the system has not explicitly advertised the client devices as usable with the Stall model, Arm recommends that software expects that events might be recorded with Stall == 0.

3.12.1 Terminate model

When stage 1 is configured to terminate faults, a transaction that faults at stage 1 is either terminated with an abort reported to the client device that is making the access, or completes successfully with reads returning 0 and writes being ignored (RAZ/WI), depending on the setting of CD.A and SMMU_IDR0.TERM_MODEL. See section 5.5 Fault configuration (A, R, S bits).

When stage 2 is configured to terminate faults, a transaction that faults at stage 2 is terminated with an abort.

The behavior of the client device after termination is specific to the device.

If a stage that is configured to terminate faults is also configured with CD.R == 1 or STE.S2R == 1, as appropriate to the stage of the fault, the SMMU records the details of the access into one Event record in the Event queue, supplying information including:
• Address
• Syndrome
• Attributes (Read/Write, Inst/Data, Privileged/Unprivileged, NS)
• Type (S1/S2 Translation, Permission, Address Size, Access flag fault)
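Combining the stage 1 and stage 2 rules with the recording behavior gives the following sketch. It assumes an implementation that supports both fault models and both termination models (SMMU_IDR0.STALL_MODEL == 0 and SMMU_IDR0.TERM_MODEL == 0); the types and function names are hypothetical. The always-recorded behavior for stalls is taken from 3.12.2 Stall model.

```c
#include <assert.h>
#include <stdbool.h>

enum outcome { OUT_ABORT, OUT_RAZ_WI, OUT_STALL };

struct fault_decision {
    enum outcome outcome;
    bool recorded; /* whether an event record is generated */
};

/* Stage 1 Translation-related fault, from CD.{A,R,S} and STE.S1STALLD. */
struct fault_decision stage1_fault(bool cd_a, bool cd_r, bool cd_s, bool s1stalld) {
    struct fault_decision d;
    /* The text only describes CD.S == 1 together with STE.S1STALLD == 0. */
    assert(!(cd_s && s1stalld));
    if (cd_s) {
        d.outcome = OUT_STALL;
        d.recorded = true;  /* stall faults are always recorded */
    } else {
        d.outcome = cd_a ? OUT_ABORT : OUT_RAZ_WI;
        d.recorded = cd_r;  /* terminate faults only when CD.R == 1 */
    }
    return d;
}

/* Stage 2 Translation-related fault, from STE.{S2R,S2S}; never RAZ/WI. */
struct fault_decision stage2_fault(bool s2r, bool s2s) {
    struct fault_decision d;
    d.outcome = s2s ? OUT_STALL : OUT_ABORT;
    d.recorded = s2s || s2r;
    return d;
}
```

On an implementation that only offers termination by abort, a caller would additionally have to guarantee `cd_a == true`, as the TERM_MODEL paragraph requires.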

If the Event queue is full, the event record is lost.

Note: In some interconnects, stalling the transaction until its fault can be recorded might trigger interconnect timeouts or deadlocks from which it might be more difficult to recover than from a lost fault record. Arm expects that such fault records arise from programming errors and that software will not implement any mechanism that depends on the delivery of terminate fault records.

Streams that originate from PCIe subsystems must not stall and must be configured to use the Terminate model at all enabled stages of translation. This is enforced at stage 1 through the STE.S1STALLD flag, see section 16.4 System integration.

3.12.2 Stall model

When a stage is configured to stall transactions on a fault, and a transaction experiences a Translation-related fault as described in 3.12 Fault models, recording and reporting, the faulting transaction does not progress and no response is reported to the client device. The transaction is buffered in a stalled state until subsequent resolution. The SMMU always records the details of the access into the Event queue.

A stalled transaction is retried or terminated by issuing a CMD_RESUME or CMD_STALL_TERM command. If retry is chosen, the transaction is handled as though it had just arrived at the SMMU. This means that the transaction is affected by any configuration or translation changes that occurred since it originally stalled.

Note: This means that a transaction can stall and, when it is later retried, use a configuration that causes it to terminate immediately, for example because the stall configuration was changed in the meantime. This property can be used to safely clean up stalled transactions on a stream, by ensuring that a new configuration causes retried transactions to be terminated.
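The difference between losing a terminate fault record and never losing a stall fault record can be illustrated with a tiny ring buffer. This is a hypothetical model: the `evtq` structure, its size and its free-running counters are assumptions of the sketch and do not reproduce the architected Event queue index encoding. For a stall, a `false` return means that the transaction stays stalled and is retried as though newly arrived once the queue is writable, as 3.12.2 describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVTQ_ENTRIES 4u /* hypothetical, tiny for illustration */

struct evt { uint32_t stream_id; uint64_t addr; bool stall; };

struct evtq {
    struct evt slot[EVTQ_ENTRIES];
    unsigned prod, cons; /* free-running counters (not the SMMU index format) */
    unsigned lost;       /* terminate records dropped because the queue was full */
};

static bool evtq_push(struct evtq *q, struct evt e) {
    assert(q->prod - q->cons <= EVTQ_ENTRIES);
    if (q->prod - q->cons == EVTQ_ENTRIES)
        return false; /* full */
    q->slot[q->prod % EVTQ_ENTRIES] = e;
    q->prod++;
    return true;
}

/* Terminate model: the transaction is already terminated; if the queue is
 * full, the record is simply lost. */
void record_terminate_fault(struct evtq *q, struct evt e) {
    e.stall = false;
    if (!evtq_push(q, e))
        q->lost++;
}

/* Stall model: the record is never dropped. false means "keep the
 * transaction stalled and retry it when the queue is next writable". */
bool record_stall_fault(struct evtq *q, struct evt e) {
    e.stall = true;
    return evtq_push(q, e);
}
```

Because the stall path re-evaluates the transaction on retry, the record eventually written might describe a different outcome than the original fault, which matches the Note on retrying when the queue becomes writable.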
If a stalled transaction is terminated by a CMD_RESUME command, a command parameter determines whether an abort is reported or, if SMMU_IDR0.TERM_MODEL == 0, whether the transaction completes with RAZ/WI behavior.

To ensure that no transaction is stalled indefinitely, software must ensure that every stall event either has a corresponding CMD_RESUME command or is subject to a CMD_STALL_TERM command, or that stalled transactions are terminated because translation is disabled by clearing SMMU_(*_)CR0.SMMUEN to 0.

When an event record is generated for a stalled transaction, the SMMU supplies a Stall Tag (STAG) as part of the record to uniquely identify the transaction. The SMMU uses the combination of the StreamID and STAG parameters to CMD_RESUME to identify the stalled transaction. A CMD_RESUME command has no effect on any stalled transaction other than the transaction that is uniquely identified by the combination of STAG and StreamID.

Note: This identification is required for virtualization correctness, where a CMD_RESUME from a guest VM is trapped and reinterpreted by a hypervisor, which then issues a CMD_RESUME to the SMMU. The hypervisor validates the StreamID parameter, but the STAG parameter is passed directly from the guest; it cannot be trusted to be correct, and so cannot be the sole selector of a stalled transaction.

The format of the STAG field is IMPLEMENTATION SPECIFIC, with the restriction that a value cannot be reused until the transaction it was last associated with has been acknowledged through a CMD_RESUME or a CMD_STALL_TERM command, or translation is disabled by clearing SMMU_(*_)CR0.SMMUEN to 0.

If the Event queue is not writable when the fault record of a stall is to be written, the stalled transaction is retried, as though it had just arrived, when the queue is next writable, and a new fault record is generated. For more information about recording faults and events, see section 7.2 Event queue recorded faults and events.
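The StreamID plus STAG matching rule can be illustrated with a small table of stalled transactions. In this hypothetical sketch the STAG is just a slot index (the real STAG format is IMPLEMENTATION SPECIFIC), and the Retry/Terminate choice carried by CMD_RESUME is not modelled. The point is that a STAG alone never selects a transaction, and that a STAG becomes reusable only once its transaction is acknowledged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STALLED 8 /* hypothetical number of trackable stalls */

struct stalled_txn { bool valid; uint32_t stream_id; uint16_t stag; };
struct stall_table { struct stalled_txn slot[MAX_STALLED]; };

/* Stall a transaction and hand out a STAG. Here the STAG is the slot index,
 * which is unique among outstanding stalls and is not reused until the slot
 * is released by an acknowledgment. */
bool stall_alloc(struct stall_table *t, uint32_t stream_id, uint16_t *stag_out) {
    assert(t != NULL && stag_out != NULL);
    for (int i = 0; i < MAX_STALLED; i++) {
        if (!t->slot[i].valid) {
            t->slot[i] = (struct stalled_txn){ true, stream_id, (uint16_t)i };
            *stag_out = (uint16_t)i;
            return true;
        }
    }
    return false; /* no free tracking slot */
}

/* Handle CMD_RESUME(StreamID, STAG): only a transaction matching BOTH fields
 * is affected; any other stalled transaction is left untouched. */
bool resume_match(struct stall_table *t, uint32_t stream_id, uint16_t stag) {
    assert(t != NULL);
    for (int i = 0; i < MAX_STALLED; i++) {
        struct stalled_txn *s = &t->slot[i];
        if (s->valid && s->stream_id == stream_id && s->stag == stag) {
            s->valid = false; /* acknowledged: the STAG may now be reused */
            return true;
        }
    }
    return false;
}
```

In the virtualization case described in the Note, the hypervisor would substitute a validated StreamID before calling something like `resume_match`, so a guest-supplied STAG that names another stream's transaction simply fails to match.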
Note: For software to be notified of stalled transactions, it must enable the Event queue using SMMU_(*_)CR0.EVENTQEN. Software can depend on the delivery of fault records from stalled transactions, see section 3.12.4 Virtual Memory paging with SMMU.

Note: Retrying the stalled transaction when the queue becomes writable might lead to the transaction succeeding or experiencing a different type of fault, if the configuration or translations were changed before the queue became writable. Therefore, an event can be written that is different from the originally-attempted event.

If the client device and interconnect rules allow it, a later transaction might pass through the SMMU and complete before an earlier stalled transaction that is associated with the same stream. The SMMU does not require any additional ordering between transactions from different streams beyond that required by the interconnect rules.

Note: The following cases are all considered to be from different streams:
• Transactions with different StreamIDs.
• Transactions with the same StreamID but different SubstreamIDs.
• Two transactions with the same StreamID but where only one transaction has a SubstreamID.

Note: Accesses with PM = 1 must not stall, as software has no mechanism to resolve this condition. For more information, see:
• STE.S2S.
• CD.S.

3.12.2.1 Suppression of duplicate Stall event records

If a transaction faults and then stalls, and a subsequent transaction belonging to the same stream also faults and then stalls, the SMMU is permitted, but not required, to suppress the generation of a new stall fault record for the new transaction if all of the following apply:
• The transactions require access to the same page.
• The transactions have the same privilege.
• The transactions have the same data/instruction attribute.
• The transactions have the same type, that is, they are both reads or both writes.
• The transactions are associated with the same SubstreamID, if present.
• The first stalled transaction is still stalled when a subsequent transaction stalls.

Arm recommends that an implementation suppresses additional fault records where possible.

Note: It is not guaranteed that event records are suppressed in all possible scenarios.
Software must ensure correctness where a transaction records a fault that duplicates a previous fault that was recorded for an earlier transaction.

When a stall fault record is acknowledged by a CMD_RESUME command, any related suppressed stalled transactions are retried by the SMMU as though they had just arrived.

Note: A series of faults for one page might result in a single stall fault record, with a single CMD_RESUME command enabling all stalled transactions for that page to progress. If the CMD_RESUME command terminates the stalled transaction that is specified by the stall fault record, retrying the other stalled transactions might cause new fault records to be recorded.

Note: For example, transactions A, B, C, D from the same stream that fault for the same reason might cause a single stall record for A to be recorded, and those for B, C, D to be suppressed. If software decides that the address was an error and terminates A, transactions B, C, D retry and fault again. A stall for B is recorded (and those for C, D might be suppressed). Software terminates B and the process repeats. Ultimately, A, B, C, D are all visible to software (rather than some being silently terminated), which can aid debug.

Stall fault records are not merged, see section 7.3.1 Event record merging.

Note: The suppression of identical stall fault records as described in this section is not the same as the merging of non-stall events. When a stall record is suppressed, a stalled transaction might still exist and can affect future behavior, whereas merging non-stall events completes the delivery of those events.

If a new transaction stalls for a reason that is unrelated to that of an existing stalled transaction, a new fault is recorded; that is, it is not suppressed by dissimilar prior stalls, even for the same StreamID and SubstreamID. Arm recommends that, where possible, the new fault is recorded without being delayed by prior unrelated faults or CMD_RESUME activities.
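The suppression conditions of 3.12.2.1 can be expressed as a predicate. This sketch is hypothetical: `stall_fault` and its `reason` code are illustrative, the page shift used to form `page` is an assumption (the text does not fix a page size), and the `reason` comparison reflects the rule that stalls for unrelated reasons are not suppressed. A true result only means that suppression is permitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a stalled, faulting transaction. */
struct stall_fault {
    uint32_t stream_id;
    bool     has_ssid;      /* a SubstreamID was supplied */
    uint32_t ssid;
    uint64_t page;          /* faulting address >> page shift (shift assumed) */
    bool     privileged;
    bool     instruction;
    bool     write;
    unsigned reason;        /* hypothetical fault-cause code */
    bool     still_stalled; /* is this transaction still stalled? */
};

/* True if the SMMU MAY suppress the stall record for `next` because of
 * `first`. Suppression is permitted, never required. */
bool may_suppress(const struct stall_fault *first, const struct stall_fault *next) {
    assert(first != NULL && next != NULL);
    return first->still_stalled &&
           first->stream_id == next->stream_id &&
           first->has_ssid == next->has_ssid &&   /* SSID vs none: different streams */
           (!first->has_ssid || first->ssid == next->ssid) &&
           first->page == next->page &&
           first->privileged == next->privileged &&
           first->instruction == next->instruction &&
           first->write == next->write &&
           first->reason == next->reason;         /* unrelated stalls are recorded */
}
```

Because a `true` result is only a permission, software must still cope with a second record arriving for the same page, as the Note above requires.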

The SMMU does not record more than one fault for each incoming transaction, with the exception of the scenario in which a transaction stalls and is explicitly retried with CMD_RESUME(Retry). After this command, it is considered to be a new transaction and might again encounter a fault.

3.12.2.2 Early retry of Stalled transactions

The SMMU is permitted to speculatively retry a stalled transaction without first receiving a CMD_RESUME(Retry) command that matches the stalled transaction; this is referred to as early retry. If this occurs:
• An early retry is similar in function to the retry caused by an explicit CMD_RESUME(Retry). The transaction undergoes the full translation procedure and does not use any stale cached configuration or translation data that was invalidated since the time of the stall.
• A recorded stalled transaction causes a single fault record. An early retry of the stalled transaction does not cause additional faults to be recorded. When a retry is directly caused by a matching CMD_RESUME that indicates a retry, it is not considered to be an early retry, and this rule does not apply. This rule is in keeping with the behavior that an explicit retry command causes the transaction to be retried as though it had just arrived at the SMMU.
  Note: This rule includes the case where configuration has been changed to terminate faults after the transaction stalls and then, under the new configuration, the transaction retries successfully by being terminated. Alternatively, if a retry occurs under stall fault configuration, the transaction remains stalled. Neither of these cases results in the reporting of a new event record.
• The progress of a transaction and the device-specific behavior are the only indications visible to software that an early retry has occurred.
• A successful early retry does not remove the requirement for software to acknowledge the stall fault record, see section 3.12.2 Stall model. A successful early retry does not remove the restriction on reusing STAG values, see section 3.12.2 Stall model.

If targeted by CMD_RESUME(Terminate) or CMD_STALL_TERM, a stalled transaction is eventually terminated, unless the transaction early-retries and successfully progresses into the system before the termination can take place. If the transaction early-retries and fails to translate successfully, it remains stalled until the termination action takes effect, or until a subsequent early retry enables the transaction to progress successfully.

A CMD_RESUME(Retry) guarantees that the stalled transaction will be retried at a future point, unless it is terminated by a CMD_STALL_TERM command or an SMMUEN transition before the retry. A stalled transaction is only guaranteed to be retried by the use of a CMD_RESUME(Retry) command.

A CMD_RESUME(Terminate) does not prevent a stalled transaction from being retried after the CMD_RESUME is consumed by the SMMU, but guarantees that the transaction will be terminated if the transaction cannot successfully early-retry.

Note: For example, if translations have not changed since the fault was generated, a transaction cannot successfully early-retry.

Note: Arm does not expect software to modify a translation table descriptor from a faulting or invalid state into a valid state, and then terminate a transaction that previously stalled because of the initial state of the descriptor. The transaction could early-retry, observe the valid state and then progress into the system.

If the SMMU is able to successfully early-retry a stalled faulting transaction before the original stall event is committed to be written to the Event queue, the SMMU is permitted either to discard the fault event or to continue and commit to the event write.
Note: To software, this race condition is indistinguishable from a temporally-later transaction that translates successfully the first time, so a stall event record is not required. If an implementation records the event, the behavior described in this section applies.

If a stalled faulting transaction is retried before the original stall event is committed to be written to the Event queue, and experiences a fault that is different from the previous fault, the most recent fault is recorded, provided that it is possible to do so and that the previous fault is invisible.

Note: This scenario is permitted to occur when the Event queue is writable.
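The hazard described in the Note about making a descriptor valid and then terminating can be shown with a two-state sketch. The `stalled` structure and the function names are hypothetical, and the moment at which termination takes effect is modelled as an explicit call because the architecture does not specify when it happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum txn_state { TXN_STALLED, TXN_PROGRESSED, TXN_TERMINATED };

struct stalled {
    enum txn_state state;
    bool marked_terminate; /* CMD_RESUME(Terminate) or CMD_STALL_TERM consumed */
};

/* One speculative early retry; `translates_ok` says whether translation now
 * succeeds. Early retries never produce additional fault records. */
void early_retry(struct stalled *t, bool translates_ok) {
    assert(t != NULL);
    if (t->state == TXN_STALLED && translates_ok)
        t->state = TXN_PROGRESSED; /* enters the system, even if marked */
}

/* The (unspecified) moment at which a pending termination takes effect. */
void termination_takes_effect(struct stalled *t) {
    assert(t != NULL);
    if (t->state == TXN_STALLED && t->marked_terminate)
        t->state = TXN_TERMINATED;
}
```

If translations are unchanged, an early retry fails and the marked transaction is terminated; if software has made the descriptor valid first, the same sequence lets the transaction progress despite the terminate request.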

See section 7.2.1 Recording of events and conditions for writing to the Event queue for retry behavior requirements when the Event queue is not writable.

3.12.2.3 Miscellaneous Stall considerations

The number of transactions that can be stalled before the ingress port cannot accept any more transactions, from the same stream or from other streams, is IMPLEMENTATION SPECIFIC. Stalling traffic can therefore cause backpressure that affects the flow of traffic for other devices behind the SMMU. If a stall blocks other traffic and resolving the fault condition that caused the stall involves transferring data using another device, the system architecture must ensure that fetching the data will not itself stall behind the original transaction.

Note: STE.S1STALLD == 1 prevents a guest VM from using the Stall model. This guarantees that stalled transactions cannot affect other parts of the system, such as a different guest VM, where stalls could cause deadlocks. Arm expects hypervisor software to use the virtualization of SMMU_IDR0.STALL_MODEL to report to the guest VM that the Stall model is not supported.

If a transaction experiences a fault during an IPA to PA translation of a stage 1 translation table walk or CD fetch, it is not required to be terminated and might stall, depending on the stage 2 fault configuration provided by STE.S2S. Software might address the cause of the stage 2 fault and retry the transaction, which re-fetches the configuration and translation structures as necessary.

3.12.3 Considerations for client devices using the Stall fault model

If a transaction from a client device experiences a fault that stalls and is terminated by software issuing CMD_RESUME(Terminate), the transaction is marked and guaranteed to terminate at some point in the future, provided that translations do not change in the meantime in a way that allows an early retry to succeed.
The SMMU does not guarantee when a stalled transaction is terminated.

Note: A situation might arise in which software is required to reconfigure translations so that a previously-marked stalled transaction might now succeed if it were to retry. For example, a transaction that is made to an unmapped address causes an initial fault, and then a terminate operation is performed. Later, software creates a legitimate mapping at that address and, if the original transaction was a write that retries and now succeeds, data corruption might result. Software might need a mechanism to ensure that previous transactions have all completed, both terminated stalls and transactions that are in progress but that the SMMU is not yet aware of. The system, or client devices, must provide a mechanism that enables software to wait for these previous transactions to complete before changing configuration to a state that might let them proceed. This might be an explicit indication from the client device that its outstanding transactions have all been terminated or completed, an interconnect ordering guarantee that prior transactions are all visible, or another mechanism.

3.12.4 Virtual Memory paging with SMMU

The SMMU architecture supports three models of usage with respect to translation-related faults that occur during translation of client device accesses:

1. A fault that occurs due to a device access might always be considered to be an error by the system and is terminated.
Note: This might be the result of a programming error.
2. A fault that occurs due to a device access might be considered permanent due to a programming error, or temporary due to particular page state resulting from use of virtual memory with the address space, and one of the following is configured to occur:
a. The device transaction is stalled, the fault is reported to software, and then the transaction is resumed after the virtual memory system resolves the cause of the fault.
Or, if the virtual memory system determines that the access was invalid, the transaction is terminated. This model can only be used with a device and interconnect that can support stalls.

b. For PCIe devices, for which transactions cannot safely be stalled, the PCIe specification provides ATS and PRI. ATS enables an endpoint to ascertain whether a page can be accessed without causing a fault in the SMMU before accessing it. PRI provides a mechanism for a page fault to be resolved if the prior ATS step indicates that a fault would otherwise occur.

3.12.4.1 Page-in request event

When non-PCIe devices are used with the Stall fault model to access paged virtual memory spaces, the Stall fault record itself is the notification to software that a page miss occurred and that software intervention is required.

Note: Devices used with, or integrating, an SMMU will generally emit transactions when the access is required. Although read speculation is permitted, writes cannot be emitted speculatively to trigger a page fault early; see section 3.14 Speculative accesses. In particular, stall fault records do not arise from accesses marked speculative.

An implementation can provide an optional hint event record, E_PAGE_REQUEST, to request that software initiates any costly page-in operations early. An implementation might provide an IMPLEMENTATION DEFINED mechanism to convey this message from client devices. This message:
• Is a hint, and can be ignored or dropped by the SMMU or software.
• Can be issued speculatively.
• Requires no response.

Note: A stall fault record is generated in response to a non-speculative transaction. A speculative transaction generates no software-visible record. E_PAGE_REQUEST provides a software-visible record that allows an early start on fetching pages from secondary storage, and can be used to hide latency.

3.12.5 Combinations of fault configuration with two stages

When the Stall model and the Terminate model are used differently at different stages of translation, the resulting behavior depends on the stage at which the transaction faulted and the type of fault.
For Translation-related faults that can stall, the following scenarios arise:

| Stage 1 config | Stage 2 config | Fault at | Transaction result | Event parameters | Hypervisor behavior |
|---|---|---|---|---|---|
| Terminate | Terminate | Stage 1 | Terminated | VA | Event passed to guest as stage 1-only event. |
| Terminate | Terminate | Stage 2 | Terminated | VA, IPA | Might log IPA of fault for debug purposes. (1) Might pass event to guest if terminated. |
| Terminate | Stall | Stage 1 | Terminated | VA | Event passed to guest as S1-only event. |
| Terminate | Stall | Stage 2 | Stalled | VA, IPA | May terminate with CMD_RESUME(Terminate) and log IPA of fault for debug purposes, or correct the translation for IPA then CMD_RESUME(Retry). (1) Might pass event to guest if terminated. |
| Stall | Terminate | Stage 1 | Stalled | VA | Event passed to guest as S1-only event with stall. Guest must CMD_RESUME(Retry/Terminate). |
| Stall | Terminate | Stage 2 | Terminated | VA, IPA | Might log IPA of fault for debug purposes. (1) Might pass event to guest if terminated. |
| Stall | Stall | Stage 1 | Stalled | VA | Event passed to guest as S1-only event with stall. Guest must CMD_RESUME(Retry/Terminate). |
| Stall | Stall | Stage 2 | Stalled | VA, IPA | Might terminate with CMD_RESUME(Terminate) and log IPA for fault or debug purposes, or correct the translation for IPA then CMD_RESUME(Retry). (1) Might pass event to guest if terminated. |

(1) Might pass event to guest: Anything that is terminated at stage 2 is equivalent to a stage 1 external abort. A successful stage 1 translation that outputs an incorrect IPA that leads to a stage 2 fault would not ordinarily be reported to the guest through its SMMU interface, because its stage 1 translation succeeded and the error arises outside of the (stage 1) domain of the SMMU interface. Arm expects the hypervisor to report a stage 1 translation table walk that faults at stage 2 to the guest as F_WALK_EABT.

All other fault types cause the transaction to be aborted. For example, a failure to locate a valid STE (F_BAD_STE) or CD (F_BAD_CD) terminates the transaction with an abort.

Note: When both stage 1 and stage 2 are enabled, a CD or stage 1 translation table descriptor fetch might cause a stage 2 Translation-related fault, and might therefore stall the transaction. Regardless of the reason for making the IPA access, the fault can be resolved at stage 2 and the transaction restarted. This is the same behavior as with a faulting IPA access for the transaction address after stage 1 translation.
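Read as a rule, the table says that the transaction result follows the fault configuration of the stage at which the fault occurred, and that a stage 2 fault reports the IPA as well as the VA. The following C sketch encodes that reading for Translation-related faults that can stall. The enum, struct and function names are hypothetical and are not part of any SMMU programming interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Fault model configured for one stage of translation (hypothetical names). */
typedef enum { CFG_TERMINATE, CFG_STALL } fault_cfg;
typedef enum { AT_STAGE1, AT_STAGE2 } fault_stage;
typedef enum { RESULT_TERMINATED, RESULT_STALLED } txn_result;

typedef struct {
    txn_result result;   /* what happens to the transaction */
    bool has_ipa;        /* event carries the IPA as well as the VA */
    bool guest_resumes;  /* guest must issue CMD_RESUME(Retry/Terminate) */
} fault_outcome;

/* Outcome of a Translation-related fault that can stall, per the table. */
fault_outcome classify_fault(fault_cfg s1, fault_cfg s2, fault_stage at)
{
    assert(at == AT_STAGE1 || at == AT_STAGE2);

    /* Only the configuration of the faulting stage decides stall vs terminate. */
    fault_cfg cfg = (at == AT_STAGE1) ? s1 : s2;
    fault_outcome out;

    out.result = (cfg == CFG_STALL) ? RESULT_STALLED : RESULT_TERMINATED;
    out.has_ipa = (at == AT_STAGE2);
    /* A stalled stage 1 fault is passed to the guest, which must resume it;
     * a stalled stage 2 fault is handled by the hypervisor instead. */
    out.guest_resumes = (at == AT_STAGE1) && (out.result == RESULT_STALLED);
    return out;
}
```

The hypervisor policy in the last column (logging the IPA, or forwarding a terminated stage 2 fault to the guest) is a software choice and is deliberately not modeled.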

3.13 Translation tables and Access flag/Dirty state

The SMMU might support Hardware Translation Table Update (HTTU) of the Access flag and the dirty state of the page for AArch64 translation tables. Some Armv8-A PEs might support Hardware Update to Access flag and dirty state [2]. SMMU support of HTTU can coexist with both hardware and software flag update from the PEs. The SMMU update of descriptors behaves in an identical manner to the updates described in [2], with the additional SMMU-specific behavior in section 3.13.4 HTTU behavior summary, although its configuration method differs.

HTTU increases the efficiency of maintaining the Access flag and dirty state in translation tables. A single translation table can be shared between any combination of agents that perform software updates of flags and agents that perform HTTU. Agents supporting HTTU update the flags atomically. Software must also use atomic primitives to perform its own updates on translation tables when they are shared with another agent that performs HTTU.

Note: In general, an update of the Access flag and the dirty state of the page in a system is associated with the use of dynamic paging and, in the context of the SMMU, associated with DMA targeting paged memory. Arm does not expect applications that constrain DMA access to static, pinned or non-paged mappings to perform or require dynamic update of the Access flag or of the dirty state of the page.

Support for HTTU is indicated by SMMU_IDR0.HTTU, and can be one of the following:
• No flag updates are supported.
• Only Access flag updates are supported.
• Update of both the Access flag and the dirty state of the page is supported.

If HTTU is supported, separate enable bits in the CDs and STEs determine whether a particular stage 1 or stage 2 translation table (referenced from the CD or STE) is updated in this manner, and the scope of the updates.
Note: It is possible for several CDs to reference the same translation table, or for several STEs to reference the same CD. Where translation tables are shared between CDs that contain the same ASID (within a translation regime), the CD HA and HD fields must be identical. See section 5.4.1 CD notes.

Note: Accessed means a translation to which an access has been made. Software might attempt to detect a working set by clearing the Access flags and observing which flags are set again. Dirty state of the page means a writable translation to which a write access or other modification has been made. When reclaiming or repurposing a Dirty page, software might preserve the modifications to storage. Clean means a writable translation to which a modification has not been made. When reclaiming or repurposing a Clean page, software might simply discard the page contents (as another up-to-date copy might be available in storage elsewhere).

HTTU for Access flag and Dirty state updates is not performed for accesses tagged with PM = 1. This includes hardware update of the Access flag in Table descriptors. See section 3.25.10.1.1 Protected Mode.

3.13.1 Software update of flags

Note: In the context of a PE that does not support HTTU, software is generally expected to maintain the Access flag and the dirty state of a page, where required, as follows:
• A read or write that fetches a translation table descriptor with AF == 0 causes an Access fault, if AFFD == 0. An exception handler marks the descriptor as AF == 1 and retries the instruction that caused the access. Agents are not permitted to cache such entries in TLBs, so no TLB invalidation is required when setting AF to 1.
• A Dirty state is usually implemented in software by write-protecting a translation. A write access to such a translation generates a Permission fault, at which point the exception handler might rewrite the descriptor to mark it writable.
The exception handler might use additional software structures or a software-defined descriptor flag to differentiate a genuinely non-writable page from a page that is only temporarily non-writable in order to generate the Permission fault. In this arrangement, a descriptor that has write permission is considered to be writable-dirty and a descriptor that has no write permission but is marked as temporarily unwritable is considered to be writable-clean.

• A write to a genuinely non-writable page is an error. A write to a writable-clean (temporarily non-writable) page causes the page to become writable-dirty. A write to a writable-dirty page causes no additional state change.
• An Access fault takes priority over a Permission fault. That is, a write to a writable-clean page with AF == 0 and AFFD == 0 causes an Access fault, and only after AF == 1 does a Permission fault occur.

When pinned DMA translations are used with an SMMU, software can update the translation flags as appropriate to the expected access. Arm does not expect faults to be generated when pinned translations are used, and such faults represent a programming error. Arm expects software to use the Terminate model for such scenarios, so that faulting transactions are terminated.

An SMMU can operate in a similar manner to the PE example when using unpinned DMA translations, so that transactions that are translated by the SMMU cause faults to be recorded and the SMMU driver software sets descriptor state in response to these records. For more information about faults, see section 3.12 Fault models, recording and reporting. Where this is the case, Arm recommends that this is implemented using the Stall model. Arm expects that the SMMU driver maintains software Access or dirty state by doing the following, as applicable:
• Responding to an F_ACCESS fault by setting AF to 1 in the relevant translation table descriptor.
• Responding to an F_PERMISSION fault for a write to a writable-clean page by marking the page as writable-dirty.
• Finally, issuing a CMD_RESUME to the SMMU to retry the transactions held up due to the fault.

Note: The AFFD field in a stage 1 and stage 2 configuration modifies the behavior of the AF flag of a descriptor. A descriptor at a translation stage with AFFD == 1 and AF == 0 does not cause an F_ACCESS to be generated. Instead, the translation is used as though AF == 1.
This configuration is only relevant where HTTU is not used.

A translation table can be shared between multiple agents if all agents that update the Access flag and the dirty state of the page use the same semantics to differentiate a descriptor marked non-writable from one marked temporarily non-writable. Usually, this is a software-defined bit that flags a page as potentially writable, as opposed to a page that is intended to always be non-writable.

HTTU removes the fault record and software handling from the path of updating translation table flags. An agent is permitted to perform HTTU on a translation table that might be shared with an agent performing software update. The Dirty Bit Modifier (DBM) field was added in Armv8.1-A to differentiate the non-writable and writable-clean states. For more information about the Dirty Bit Modifier, see section 3.13.3 Dirty state hardware update. Software intending to provide software-updated translation table descriptor flags from one agent (for example, a PE without HTTU) while sharing translations with another agent that uses HTTU must use the DBM flag convention and perform atomic updates.

For Protected Mode accesses, to prevent faulting, software is expected to perform the following sequence:
1. AF is set to 1 in any page or block descriptor that might be accessed using a PM = 1 access.
2. If HAFT is enabled, AF is set to 1 in any Table descriptor that points to a page or block descriptor that might be accessed using a PM = 1 access.
3. If a page or block descriptor might be written by a PM = 1 access, it is marked as writable-dirty.

Note: HTTU must be suppressed for PM = 1 accesses to prevent data leakage through Access flag and Dirty state updates. See section 3.25.10.1.1 Protected Mode.
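The driver responses listed in 3.13.1, together with the DBM convention and steps 1 and 3 of the Protected Mode sequence, can be sketched in C as follows. This is a minimal sketch assuming stage 1 Page or Block descriptors that use the Direct Permission Scheme. DBM at bit[51] is stated in 3.13.3.1; the positions of AF (bit[10]) and AP[2] (bit[7]) are assumed from the Armv8-A VMSAv8-64 descriptor format and should be checked against [2]. The function names are hypothetical, TLB invalidation and the actual CMD_RESUME are left to the caller, and step 2 (Table descriptors with HAFT) is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stage 1 Page/Block descriptor bits (AF and AP[2] positions assumed). */
#define DESC_AP2 (UINT64_C(1) << 7)   /* AP[2] == 1: no write permission */
#define DESC_AF  (UINT64_C(1) << 10)  /* Access flag */
#define DESC_DBM (UINT64_C(1) << 51)  /* Dirty Bit Modifier: ultimately writable */

typedef enum { EVT_F_ACCESS, EVT_F_PERMISSION } fault_event;
typedef enum { RESUME_RETRY, RESUME_TERMINATE } resume_action;

/* Hypothetical driver handler for a stalled fault on a shared descriptor.
 * Updates are atomic because an HTTU agent might update the same descriptor.
 * Returns the CMD_RESUME action that the caller should issue. */
resume_action handle_stall_fault(_Atomic uint64_t *desc, fault_event evt, bool is_write)
{
    switch (evt) {
    case EVT_F_ACCESS:
        /* AF == 0 entries are never cached, so no TLB invalidation is needed. */
        atomic_fetch_or(desc, DESC_AF);
        return RESUME_RETRY;
    case EVT_F_PERMISSION:
        if (is_write && (atomic_load(desc) & DESC_DBM)) {
            /* Writable-clean: clear AP[2] to mark the page writable-dirty.
             * The caller must also perform the appropriate TLB invalidation. */
            atomic_fetch_and(desc, ~DESC_AP2);
            return RESUME_RETRY;
        }
        return RESUME_TERMINATE;      /* genuinely non-writable: an error */
    }
    return RESUME_TERMINATE;
}

/* PM sequence steps 1 and 3 for one Page/Block descriptor value. */
uint64_t prepare_for_pm(uint64_t desc, bool may_be_written)
{
    /* A page written by PM = 1 must be intended writable (DBM) or already writable. */
    assert(!may_be_written || (desc & DESC_DBM) || !(desc & DESC_AP2));
    desc |= DESC_AF;                  /* step 1 */
    if (may_be_written)
        desc &= ~DESC_AP2;            /* step 3: writable-clean -> writable-dirty */
    return desc;
}
```

The DBM bit is what lets the handler tell a writable-clean page from a genuinely read-only one without a separate software structure, which is why the same descriptor can also be shared with an agent that performs HTTU.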
3.13.2 Access flag hardware update

When HTTU is supported and enabled for a stream, a translation that causes an SMMU fetch of a descriptor with AF == 0, which would have caused an Access fault without HTTU and with AFFD == 0, performs an atomic update to set AF == 1 in the descriptor.

Note: This includes stage 2 translation for the fetch of an L1CD or CD.

The SMMU never clears AF.
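The Access flag update can be modeled as an atomic compare-and-swap on the descriptor, which preserves any concurrent update made by another agent sharing the table. The sketch below is a model under stated assumptions rather than a full description of SMMU behavior: AF is assumed to be at bit[10] as in the VMSAv8-64 descriptor format, the names are hypothetical, and PM = 1 accesses, Permission faults and Table descriptors (HAFT) are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_AF (UINT64_C(1) << 10)   /* Access flag (assumed bit[10]) */

typedef enum { WALK_OK, WALK_F_ACCESS } walk_status;

/* Access flag step for one fetched Page/Block descriptor.
 * ha: HTTU of the Access flag enabled; affd: Access flag fault disable. */
walk_status access_flag_step(_Atomic uint64_t *desc, bool ha, bool affd)
{
    uint64_t old = atomic_load(desc);

    while (!(old & DESC_AF)) {
        if (!ha)
            /* Without HTTU, AFFD == 1 means "use as though AF == 1". */
            return affd ? WALK_OK : WALK_F_ACCESS;
        /* Set AF only if the descriptor is unchanged since it was read;
         * on failure, old is refreshed and AF is checked again. */
        if (atomic_compare_exchange_weak(desc, &old, old | DESC_AF))
            return WALK_OK;
    }
    return WALK_OK;                   /* AF already set; AF is never cleared */
}
```

A compare-and-swap loop, rather than a plain store, is used here so that an agent setting AF cannot overwrite a permission or DBM change made concurrently by software or another HTTU agent.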

If access to a descriptor causes a Permission fault, it is UNKNOWN whether the AF flag of the descriptor is updated to 1. When HTTU is disabled, or not supported by the SMMU, a transaction that leads to access of translations with AF == 0 and AFFD == 0 causes F_ACCESS. If the update of the dirty state of the page takes place, the final translation table descriptor will also have AF == 1.

For an access with PM = 1, all of the following apply:
• If the access results in a stage 1 or stage 2 walk, the SMMU does not set the Access flag and instead generates F_ACCESS if it encounters either of the following:
– A page or block descriptor that has AF = 0, when HTTU for Access flag update is enabled.
– A Table descriptor that has AF = 0, when HAFT is enabled.
• If the access requires write permission, HTTU for Dirty state update is enabled, and a stage 1 or stage 2 walk encounters a page or block descriptor that is writable-clean, the SMMU does not mark the page as writable-dirty. Instead, it generates F_PERMISSION.

Note: In both cases, the event record that is written to the Event queue is F_PROTECTED. See section 7.3.21 F_PROTECTED.

3.13.3 Dirty state hardware update

3.13.3.1 Direct Permission Scheme

In order to coexist with an agent that is not using hardware update, HTTU defines a new flag, at bit[51] of Block and Page descriptors, called the Dirty Bit Modifier (DBM). The DBM bit marks the overall intention of a translation as ultimately writable, to differentiate a non-writable page from a writable-clean page in the same way as a software-maintained mechanism would. The DBM bit only applies to a stage of translation that uses the Direct Permission Scheme.

Note: The Armv8-A stage 1 descriptor field AP[2:1] has no bit[0]. The stage 2 equivalent is named S2AP[1:0].
When HTTU of the dirty state of a page is supported and enabled for the stream, a non-writable descriptor with DBM == 1 is automatically marked as writable by the SMMU when a translation for a write occurs. A Permission fault is not generated, and the translation continues. Specifically, if a descriptor is read-only only as a result of AP[2:1] == 0b1x at stage 1, or non-writable only as a result of S2AP[1:0] == 0b0x at stage 2, then if DBM == 1 and a translation for a write occurs, the SMMU atomically sets AP[2] to 0 (or S2AP[1] to 1 at stage 2) in the descriptor held in memory, in a coherent manner if appropriate. If DBM == 0, the page has no write permission and a write translation results in a Permission fault.

Note: HTTU of the dirty state of a page is not applicable to a descriptor that is made effectively read-only because of the hierarchical control of access permissions using APTable. All references to page or block permissions in this section assume that the page or block is otherwise accessible and will not generate a Permission fault for other reasons, such as PAN or the APTable having removed access, and that a page is only read-only (or otherwise non-writable) because of the page or block AP/S2AP permissions.

When the APTable does not remove write access:
• A read-only stage 1 descriptor has DBM == 0 and AP[2:1] == 0b1x. A non-writable stage 2 descriptor has DBM == 0 and S2AP[1:0] == 0b0x. A write causes a Permission fault because the page has no write permission. The software fault handler invokes an error-handling routine. Because DBM == 0, the software handler can determine that the page is not allowed to be written. In the case of an SMMU stalled fault, software can use CMD_RESUME to terminate an erroneous transaction.
• A writable-clean translation table descriptor has DBM == 1 and AP[2:1] == 0b1x/S2AP[1:0] == 0b0x. Without HTTU, this descriptor is non-writable and a write causes a Permission fault. Because DBM == 1, the page is intended to be writable, and the software fault handler can mark the page as dirty by setting AP[2] == 0/S2AP[1] == 1 and performing the appropriate TLB invalidation. The page is now marked writable-dirty. In the case of an SMMU stalled fault, software can use CMD_RESUME to retry the transaction, which might then continue without fault. When HTTU is enabled, a write transaction causes the SMMU to atomically set AP[2] == 0 or S2AP[1] == 1 as appropriate, and then allows the write to proceed.
• A writable-dirty descriptor has AP[2] == 0/S2AP[1:0] == 0b1x. With or without HTTU, this page is writable and will not generate a Permission fault on a write.

Note: Although DBM is ignored by hardware in this state, it might be useful for software to use the convention of leaving DBM == 1 when a page AP[2] transitions from non-writable to writable. This allows the DBM bit to be used as a one-bit flag to indicate that, overall, a page is intended to be written, regardless of its current Clean or Dirty state.

• The SMMU never sets or clears DBM.
• The SMMU never clears S2AP[1].
• The SMMU never sets AP[2].

A descriptor is never made writable by the SMMU unless DBM == 1. The SMMU never sets S2AP[1] == 1 for the stage 2 translation for the fetch of an L1CD or CD.

3.13.3.2 Indirect Permission Scheme

When the Indirect Permission Scheme is used for stage 1 Base permissions, CD.HD exclusively defines whether the dirty state is managed by hardware or software. When the Indirect Permission Scheme is used for stage 2 Base permissions, STE.S2HD exclusively defines whether the dirty state is managed by hardware or software.

Note: If the Indirect Permission Scheme is in use for a stage of translation, there is no DBM field in either the Block or Page descriptor.

3.13.4 HTTU behavior summary

SMMU HTTU operation has the same behavior as described in Hardware Updates to Access Flag and dirty state in the Armv8.9-A architecture [2].
The following HTTU behavior is specific to the SMMU:
• A descriptor update that occurred because of a completed ATOS translation is made visible to the required Shareability domain, as specified by the translation table walk attributes, by completion of a CMD_SYNC that was submitted after the ATOS translation began.
• A descriptor update that occurred because of a completed incoming transaction is made visible to the required Shareability domain (as specified by the translation table walk attributes) by completion of a CMD_SYNC that was submitted after the completion of the incoming transaction.
– In addition, completion of a TLB invalidation operation makes visible the descriptor updates caused by transactions that are themselves completed by the completion of that TLB invalidation. Both broadcast and explicit CMD_TLBI_* invalidations have this property.

The SMMU HTTU behavior follows the same rules as the A-profile architecture [2], including all TLB invalidation completion requirements on HTTU visibility, with the following exception:
• If stage 2 hardware update of Dirty state is enabled, the SMMU is permitted to speculatively update the Dirty state of a stage 2 descriptor used for a stage 1 translation table walk, even if stage 1 hardware updates of the Access flag or Dirty state are disabled.

Note: In the A-profile architecture [2], this is only permitted if stage 1 hardware updates of the Access flag or Dirty state are enabled.
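The Direct Permission Scheme rules in 3.13.3.1, together with the PM = 1 restriction in 3.13.2, can be modeled as an atomic read-modify-write of the write-permission bit. This is a minimal sketch assuming that AP[2] (stage 1) and S2AP[1] (stage 2) are both at bit[7], as in the VMSAv8-64 descriptor format; DBM at bit[51] is stated in 3.13.3.1. The names are hypothetical, the Access flag is assumed to be already set, and APTable or PAN effects are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_WBIT (UINT64_C(1) << 7)   /* AP[2] at stage 1, S2AP[1] at stage 2 (assumed) */
#define DESC_DBM  (UINT64_C(1) << 51)  /* Dirty Bit Modifier */

typedef enum { STAGE1 = 1, STAGE2 = 2 } tt_stage;
typedef enum { WRITE_OK, WRITE_F_PERMISSION } write_status;

/* Stage 1 is writable when AP[2] == 0; stage 2 when S2AP[1] == 1. */
static bool is_writable(uint64_t d, tt_stage stage)
{
    return (stage == STAGE1) ? !(d & DESC_WBIT) : (d & DESC_WBIT) != 0;
}

/* Write-permission step for one Page/Block descriptor.
 * hd: HTTU of dirty state enabled; pm: access tagged PM = 1. */
write_status write_permission_step(_Atomic uint64_t *desc, tt_stage stage, bool hd, bool pm)
{
    uint64_t old = atomic_load(desc);

    while (!is_writable(old, stage)) {
        /* DBM == 0: genuinely non-writable. Without HTTU, or for PM = 1
         * (recorded as F_PROTECTED), a writable-clean page also faults. */
        if (!(old & DESC_DBM) || !hd || pm)
            return WRITE_F_PERMISSION;
        /* Mark writable-dirty: clear AP[2] or set S2AP[1]; DBM is untouched. */
        uint64_t dirty = (stage == STAGE1) ? (old & ~DESC_WBIT) : (old | DESC_WBIT);
        if (atomic_compare_exchange_weak(desc, &old, dirty))
            return WRITE_OK;
    }
    return WRITE_OK;                   /* already writable-dirty */
}
```

Note that the model only ever clears AP[2], only ever sets S2AP[1], and never writes DBM, matching the three "never" rules listed in 3.13.3.1.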

3.13.5 HTTU with two stages of translation

When two stages of translation exist, multiple translation table descriptors determine the translation, that is, the stage 1 descriptor, the stage 2 descriptors mapping all steps of the stage 1 walk, and finally the stage 2 descriptor mapping the IPA output of stage 1. Therefore, one access might result in several descriptor updates. Figure 3.11 shows an example:

[Figure 3.11: Example Hardware flag update with nested translation]

Note: Figure 3.11 is an example procedure and does not depict all permitted ways of performing a nested translation walk with HTTU enabled.

Because a stage 1 descriptor hardware update is a write, the stage 2 mapping for its IPA must allow writes for the update to succeed.

3.13.6 Access flag in Table descriptors

An SMMU that supports HTTU might also support hardware update of the Access flag in Table descriptors. This functionality is indicated by SMMU_IDR0.HTTU and can be enabled in stage 1 and stage 2 translation independently using the following controls:

| Stage of translation | Control field |
|---|---|
| Stage 1 | CD.HAFT |
| Stage 2 | STE.S2HAFT |

If hardware update of the Access flag is disabled for a translation stage, then hardware update of the Access flag in Table descriptors is also disabled.

The SMMU behaviors for hardware update of the Access flag in Table descriptors are the same as in the PE architecture, including the following requirements:
• When the Access flag in a Table descriptor must be updated.
• When the Access flag in a Table descriptor is permitted to be updated.
• If HAFT is enabled, then any Table entry with the Access flag clear is not permitted to be cached in a TLB.
• The ordering requirements for hardware update of the Access flag in a Table descriptor relative to hardware updates of other descriptors updated in the translation table walk, and the final access.

Hardware updates of the Access flag in Table descriptors are made observable by completion of a CMD_SYNC in the same manner as hardware updates of the Access flag in page or block descriptors. See section 3.13.4 HTTU behavior summary.

Hardware updates of the Access flag in Table descriptors are caused by ATS Translation Requests in the same manner as hardware updates of the Access flag in page or block descriptors. See section 3.13.7 ATS, PRI and translation table flag update.

3.13.7 ATS, PRI and translation table flag update

When ATS and PRI are used to support device access to dynamically paged memory, the Access flag and the dirty state of the page need to be maintained. This section describes the SMMU page flag maintenance behavior in a system using ATS with PRI targeting dynamically paged memory.

Note: Maintenance of the Access flag and the dirty state of the page is primarily of importance to DMA to unpinned or paged memory, because use-cases with DMA to pinned memory would normally statically initialize page state.
3.13.7.1 Hardware flag update for ATS & PRI

Because the purpose of ATS is to cache translations outside the SMMU and to avoid subsequent translation interaction with the SMMU, if HTTU is enabled it is performed at the time of the ATS Translation Request (TR). When an ATS TR is made, it must be assumed that a device will subsequently access the page. If the page is otherwise valid and an ATS response will be returned, AF is set to 1 in the descriptor in the same way as for a direct transaction access through the SMMU.

In addition to the behavior that is described earlier in this section, if hardware management of dirty state is enabled and an ATS request for write access (with NW == 0) is made to a page that is marked writable-clean, the SMMU assumes that a write will be made to that page and marks the page as writable-dirty before returning the ATS response that grants write access. When this happens, the modification to the page data by a device is not visible before the page state is visible as writable-dirty.

If HTTU is only enabled for the Access flag, an ATS request for a write to a writable-clean or read-only page results in an ATS Translation Completion with W == 0, and write access is denied.

If an ATS Translation Request is made for a write (NW == 0) to a nested translation configuration and the associated stage 1 translation is read-only (not writable), the dirty state is updated neither in the stage 1 descriptor nor in the stage 2 descriptor that is used to translate the output address from the stage 1 descriptor.

Note: This also applies if the stage 2 translation of the stage 1 output address is writable-clean.

When HTTU is enabled for stage 1 and stage 2 and Split-stage ATS is used (STE.EATS == 0b10), the ATS TR performs HTTU at stage 1, and updates are made to the stage 2 descriptors that are used to fetch and update the stage 1 descriptors.
The following applies to the stage 2 descriptor for the final IPA:
• The AF in the stage 2 descriptor for the final IPA is permitted to be speculatively set to 1 by the ATS TR.
• If write permission is granted in the ATS Translation Completion, a writable-clean stage 2 descriptor for the final IPA is permitted to be marked as writable-dirty by the ATS TR.

– When a subsequent Translated access to the IPA is translated, this choice does not affect the HTTU behavior of stage 2.
– This choice does not affect the permissions that are returned in the Translation Completion, which reflect whether a write transaction is permitted by the combined permissions of both stages and treat a writable-clean stage 2 descriptor as writable. See section 13.6.3 Split-stage (STE.EATS == 0b10) ATS behavior and responses.
• The stage 2 translation of a subsequent Translated access marks the stage 2 descriptor as Accessed and might mark the descriptor as writable-dirty, provided that the ATS TR has not already performed this action.

3.13.7.2 Behavior with respect to flag maintenance for ATS & PRI without HTTU

If HTTU is not enabled for the Access flag, an ATS request to a page with AF == 0 and AFFD == 0 is denied. For this address, a response granting R == W == 0, that is, no access, is returned. The client device might then raise an error in a device-specific manner, or might issue a PRI page request, if supported and configured, to request that software makes the page available. Software can manually set AF == 1 on receipt of the PRI page request in anticipation of the device access.

An ATS request to any read-only page does not grant write access, that is, it returns W == 0, if hardware update of the dirty state of the page is not enabled. Read access might be granted in the response if the conditions for the Access flag set out in this section are satisfied. The client device might raise an error in a device-specific manner, or might issue a PRI page request to request write access to the address. On receipt of a PRI request, software could assume that a request issued for write was initiated because data will shortly be written, and mark the page writable-dirty before responding to the PRI request.
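The single-stage ATS Translation Request rules in 3.13.7.1 and 3.13.7.2 can be summarized in a small C model. This is a hedged sketch, not the SMMU's response logic: it assumes a single stage of translation, AFFD == 0, a page that is otherwise valid and readable, and the DBM convention for page state. The types and the function are hypothetical. As a simplifying assumption, write permission is only reported for requests with NW == 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Page state under the DBM convention (hypothetical names). */
typedef enum { PAGE_READ_ONLY, PAGE_WRITABLE_CLEAN, PAGE_WRITABLE_DIRTY } page_state;
/* HTTU enabled for the stream: none, Access flag only, or Access flag and dirty state. */
typedef enum { HTTU_NONE, HTTU_AF, HTTU_AF_DIRTY } httu_mode;

typedef struct { bool af; page_state state; } page_desc;
typedef struct { bool r, w; } ats_completion;

/* Handle one ATS Translation Request. nw is the request's NW bit (false: write). */
ats_completion ats_translate(page_desc *p, httu_mode mode, bool nw)
{
    ats_completion c = { false, false };

    if (!p->af) {
        if (mode == HTTU_NONE)
            return c;                       /* AF == 0, AFFD == 0: R == W == 0 */
        p->af = true;                       /* device assumed to access the page */
    }
    c.r = true;
    if (nw)
        return c;                           /* no write requested */
    if (p->state == PAGE_WRITABLE_CLEAN && mode == HTTU_AF_DIRTY)
        p->state = PAGE_WRITABLE_DIRTY;     /* marked before the response returns */
    c.w = (p->state == PAGE_WRITABLE_DIRTY);
    return c;
}
```

Without HTTU, a device that receives R == W == 0 or W == 0 might issue a PRI page request; software can then set AF or mark the page writable-dirty, after which a repeated ATS request in this model grants the access.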
An ATS request for write to a page marked writable might grant write access, that is, it returns W == 1 in the response. Software must consider writable pages as potentially dirty.

Note: PCIe PRI requests can be issued speculatively by an Endpoint. This implies speculatively marking the page as Dirty. This is not permitted by the Armv8-A architecture [4] and might be problematic for some software systems. Because pages cannot speculatively be marked as Dirty, Arm recommends that a system designed for general-purpose software supports HTTU when PRI is used, so that the state of the page is marked Dirty only when a request for write access is made using ATS.

3.13.8 Hardware flag update for Cache Maintenance Operations and Destructive Reads

HTTU for Dirty state update is not performed for the following operations:
• Invalidate Cache Maintenance Operations.
• Destructive Reads.
• Destructive Hints.

See also:
• 16.7.2.2 Permissions model for Cache Maintenance Operations
• 3.22.2 Permissions model

When these operations are performed to a writable-clean translation table descriptor, the descriptor is not updated to be writable-dirty. If the required Read or Execute permissions are available, but the descriptor is not writable-dirty, the operations are downgraded as described in the corresponding section. This rule does not affect HTTU of the Access flag, which occurs if required. In this case, an update to the Dirty state of the stage 2 descriptor that translates the stage 1 table is performed.

Note: For the purposes of determining execute permission, a writable-clean descriptor is considered to be writable when HTTU is enabled, which is consistent with the Armv8-A architecture. As described in this section, this principle applies even when the descriptor is not updated to writable-dirty for an Invalidate, DR or DH.
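The rule in 3.13.8 separates the dirty-state decision from the Access flag. The C sketch below is a hypothetical model of that decision for a single descriptor. It assumes that such an operation proceeds as requested on a page that is already writable-dirty; the downgrade itself, and the case where Read or Execute permission is missing, are defined in the sections referenced above and are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_WRITE, OP_INVALIDATE_CMO, OP_DESTRUCTIVE_READ, OP_DESTRUCTIVE_HINT } op_kind;
typedef enum { PAGE_READ_ONLY, PAGE_WRITABLE_CLEAN, PAGE_WRITABLE_DIRTY } page_state;
typedef enum { OUT_AS_REQUESTED, OUT_DOWNGRADED, OUT_SEE_PERMISSION_RULES } op_outcome;

/* Does HTTU (hd enabled) mark a writable-clean page writable-dirty for this op? */
bool httu_marks_dirty(op_kind op, page_state s, bool hd)
{
    /* Invalidate CMOs, Destructive Reads and Destructive Hints never do. */
    return hd && s == PAGE_WRITABLE_CLEAN && op == OP_WRITE;
}

/* Outcome of an Invalidate CMO, DR or DH against one descriptor. */
op_outcome destructive_op_outcome(op_kind op, page_state s, bool read_or_exec_ok)
{
    assert(op != OP_WRITE);
    if (s == PAGE_WRITABLE_DIRTY)
        return OUT_AS_REQUESTED;
    if (read_or_exec_ok)
        return OUT_DOWNGRADED;            /* not writable-dirty, and never made so */
    return OUT_SEE_PERMISSION_RULES;      /* 16.7.2.2 / 3.22.2, not modeled here */
}
```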

3.13.9 Hardware Dirty state tracking Structure

This section applies only when SMMU_IDR3.HDBSS is 1.

The Hardware Dirty state tracking structure (HDBSS) provides an enhanced method for tracking the Dirty state of translation table descriptors in a compacted format. An HDBSS comprises two HDBSS tables and is defined by the following registers:
• SMMU_(*_)HDBSS_BASE0.
• SMMU_(*_)HDBSS_PROD0.
• SMMU_(*_)HDBSS_BASE1.
• SMMU_(*_)HDBSS_PROD1.

Figure 3.12 shows a visual representation of the HDBSS tables.

Figure 3.12: HDBSS tables

If STE.S2HDBSS is 1 and an HDBSS table is valid, the SMMU appends an entry to the HDBSS when the Dirty state of a stage 2 descriptor transitions from writable-clean to writable-dirty.

The base address of an HDBSS table is a physical address configured in SMMU_(*_)HDBSS_BASEn.BADDR. The PA space for an HDBSS table is determined as follows:
• For a Non-secure HDBSS, the PA space is Non-secure.
• For a Secure HDBSS, the PA space is Secure.
• For a Realm HDBSS, the PA space is Realm.

SMMU accesses to an HDBSS table have the following memory attributes:
• The Endianness is little-endian.
• Cacheability and Shareability attributes are configured in SMMU_(*_)CR1.{QUEUE_SH, QUEUE_OC, QUEUE_IC}.
• The memory type is Normal memory.
• All accesses are Tag Unchecked.
• If SMMU_IDR3.MPAM is 1, the MPAM attributes are determined from the values configured in SMMU_(*_)HDBSS_MPAM.
• For a Realm HDBSS, if SMMU_R_IDR3.MEC is 1, HDBSS updates for a Realm STE are performed using the MECID value configured in SMMU_R_HDBSS_MECID.

The size of an HDBSS table is configured in SMMU_(*_)HDBSS_BASEn.SZ. The format for HDBSS entries is as described in FEAT_HDBSS in the Armv9.5-A architecture [2].

3.13.9.1 HDBSS Table updates

While performing Dirty state tracking, the SMMU appends entries to one of the two HDBSS tables as follows:
• When a transaction triggers an HDBSS update and the HDBSS is not halted, the SMMU appends entries to HDBSS table 1 if either of the following is true:
  – HDBSS table 0 is invalid.
  – HDBSS table 0 is full.
• Otherwise, the SMMU appends entries to HDBSS table 0.

Note: For more information on halting the HDBSS, see 3.13.9.5 HDBSS Errors.

The SMMU can only append entries to a single HDBSS table at a time. The rules for appending entries to an HDBSS table are the same as those described in FEAT_HDBSS in the Armv9.5-A architecture [2], using SMMU_(*_)HDBSS_PRODn.INDEX. The single-copy atomicity size of HDBSS updates is at least 64 bits.

An HDBSS table is full when SMMU_(*_)HDBSS_PRODn.INDEX is greater than or equal to 2^(SZ + 12) / 8, where SZ is the value of SMMU_(*_)HDBSS_BASEn.SZ; that is, the number of 8-byte entries in a table of 2^(SZ + 12) bytes. For normal hardware behavior, this is indicated by SMMU_(*_)HDBSS_PRODn.INDEX[SZ + 9] being set to 1. When an HDBSS table becomes full, an interrupt is generated. See section 3.18.2 Interrupt sources.

If an update to SMMU_(*_)HDBSS_PRODn.INDEX becomes observable, then all updates to the HDBSS table up to that index value are also observable. This means that the SMMU permits an update to SMMU_(*_)HDBSS_PRODn.INDEX to be observed only when the corresponding HDBSS writes have become visible to the rest of the Shareability domain. If an HDBSS table update is observable, then the corresponding HTTU update is guaranteed to be observable by the completion of a CMD_SYNC.

The SMMU is permitted to set any HDBSS entries that are ahead of SMMU_(*_)HDBSS_PRODn.INDEX to zero.

If SMMU_ROOT_IDR0.ROOT_IMPL is 1 and SMMU_ROOT_CR0.GPCEN is 1, then accesses to an HDBSS are subject to Granule Protection Checks. See 3.25 Granule Protection Checks.
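The table-selection, fullness and halt rules from 3.13.9.1 and 3.13.9.5 can be expressed in a few lines of C. This is a sketch that assumes an illustrative `hdbss_table_t` snapshot of the registers (not an architected layout); it uses an `assert` to check the INDEX[SZ + 9] shortcut against the full comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software view of one HDBSS table; the field names mirror
   the registers, but the struct itself is illustrative. */
typedef struct {
    bool     v;      /* SMMU_(*_)HDBSS_BASEn.V     */
    bool     vack;   /* SMMU_(*_)HDBSS_PRODn.VACK  */
    unsigned sz;     /* SMMU_(*_)HDBSS_BASEn.SZ    */
    uint64_t index;  /* SMMU_(*_)HDBSS_PRODn.INDEX */
} hdbss_table_t;

/* Number of 8-byte entries that fit in a table of 2^(SZ + 12) bytes. */
static uint64_t hdbss_capacity(unsigned sz)
{
    assert(sz < 52);                        /* keep the shift in range */
    return (UINT64_C(1) << (sz + 12)) / 8;
}

static bool hdbss_full(const hdbss_table_t *t)
{
    uint64_t cap = hdbss_capacity(t->sz);
    bool full = t->index >= cap;
    /* With normal hardware behavior INDEX never exceeds the capacity,
       so "full" coincides with INDEX[SZ + 9] == 1. */
    if (t->index <= cap)
        assert(full == (((t->index >> (t->sz + 9)) & 1u) != 0));
    return full;
}

/* The four halt conditions of 3.13.9.5, expressed on BASEn.V. */
static bool hdbss_halted(const hdbss_table_t t[2])
{
    bool f0 = hdbss_full(&t[0]);
    bool f1 = hdbss_full(&t[1]);
    return (f0 && f1) || (!t[1].v && f0) || (!t[0].v && f1) ||
           (!t[0].v && !t[1].v);
}

/* Table that receives the next entry, or -1 if the HDBSS is halted.
   Effective validity is taken from VACK, which also covers the
   transitional states described in 3.13.9.2. */
static int hdbss_target_table(const hdbss_table_t t[2])
{
    if (hdbss_halted(t))
        return -1;
    if (!t[0].vack || hdbss_full(&t[0]))
        return 1;
    return 0;
}
```

For SZ == 0 (a 4KB table), `hdbss_capacity()` returns 512, so table 0 becomes full when INDEX reaches 512, which is also the first value with bit 9 set.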
There is no ordering guarantee between two updates, and atomicity might be lost, in the following cases:
• The HDBSS and the translation table entry to be updated reside at the same location.
• The write to an HDBSS and the write access being translated target the same location.

No architected PMCG events count accesses to an HDBSS.

Note: HTTU updates are captured in PMCG events in the same manner as described in 10.3 Monitor events.

3.13.9.2 HDBSS Table Validity

An HDBSS table is valid if both of the following are true:
• SMMU_(*_)HDBSS_BASEn.V == 1.
• SMMU_(*_)HDBSS_PRODn.VACK == 1.

If software programs SMMU_(*_)HDBSS_BASEn.V to 0, the HDBSS table is considered valid until the Update completes, that is until the SMMU sets SMMU_(*_)HDBSS_PRODn.VACK to 0.

An HDBSS table is invalid if both of the following are true:
• SMMU_(*_)HDBSS_BASEn.V == 0.
• SMMU_(*_)HDBSS_PRODn.VACK == 0.
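Because each transitional state in 3.13.9.2 keeps the previous validity until the SMMU updates VACK, the effective validity of an HDBSS table always equals SMMU_(*_)HDBSS_PRODn.VACK. A minimal sketch of that observation, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* Effective validity from BASEn.V (written by software) and PRODn.VACK
   (written by the SMMU):
     V == 1, VACK == 1  valid
     V == 0, VACK == 1  still valid (disable pending)
     V == 1, VACK == 0  still invalid (enable pending)
     V == 0, VACK == 0  invalid                                     */
static bool hdbss_is_valid(bool v, bool vack)
{
    (void)v;          /* V only requests a change; VACK reports it */
    return vack;
}

/* An Update is in progress while V and VACK disagree. */
static bool hdbss_update_pending(bool v, bool vack)
{
    return v != vack;
}
```

In a driver, this is why software polls VACK after writing V rather than treating the write to V as immediately effective.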

If software programs SMMU_(*_)HDBSS_BASEn.V to 1, the HDBSS table is considered invalid until the Update completes, that is until the SMMU sets SMMU_(*_)HDBSS_PRODn.VACK to 1.

It is a programming error to update the STE.S2HDBSS field from 0 to 1 when both SMMU_(*_)HDBSS_BASEn.V fields are 0. However, software is permitted to update one of the SMMU_(*_)HDBSS_BASEn.V fields from 1 to 0 during Dirty state tracking.

If an HDBSS table is invalid, the SMMU does not perform any accesses, including speculative read accesses, to the memory region configured by the following fields:
• SMMU_(*_)HDBSS_BASEn.SZ.
• SMMU_(*_)HDBSS_BASEn.BADDR.

3.13.9.3 HDBSS Initialization Procedure

To initialize an HDBSS, software is required to perform the following sequence:
1. Program both of the SMMU_(*_)HDBSS_PRODn fields to 0. This ensures that the index of each HDBSS table points to the beginning of the table, and that the error status field is cleared.
2. Program both of the SMMU_(*_)HDBSS_BASEn fields with a base address and size.
3. Program both of the SMMU_(*_)HDBSS_BASEn.V fields to 1 to indicate to the SMMU that the HDBSS tables are available for use. This step completes when the SMMU sets both of the SMMU_(*_)HDBSS_PRODn.VACK fields to 1.
4. Program the STE.S2HDBSS field to 1 for the STEs of the devices associated with the VM being migrated, and issue appropriate CMD_CFGI commands to ensure that the new configuration is in use.

3.13.9.4 HDBSS Processing Procedure

To process an HDBSS, software is expected to perform the following sequence:
1. Program the SMMU_(*_)HDBSS_BASE0.V field to 0. This step completes when the SMMU sets the SMMU_(*_)HDBSS_PROD0.VACK field to 0, which guarantees that all outstanding updates have completed. An interrupt resulting from the HDBSS table becoming full must be ignored.
2. Share the location of the contents of the HDBSS table with the agent responsible for cleaning the stage 2 descriptors.
3. Either allocate a new, empty HDBSS table, or copy the contents of the current HDBSS table and then zero it.
4. Program the SMMU_(*_)HDBSS_PROD0.INDEX field to zero.
5. Program the SMMU_(*_)HDBSS_BASE0.V field to 1, advertising that HDBSS table 0 is available for use.
6. Wait until the SMMU_(*_)HDBSS_PROD0.VACK field is 1.
7. At this point, the SMMU has switched to using HDBSS table 0.
8. Repeat steps 1-6 for HDBSS table 1.

3.13.9.5 HDBSS Errors

If the SMMU encounters an error while attempting to append an entry to an HDBSS table, it updates both of the following fields:
• SMMU_(*_)HDBSS_PRODn.ERR.
• SMMU_(*_)HDBSS_PRODn.ERR_REASON.

Dirty state tracking stops when the value of SMMU_(*_)HDBSS_PRODn.ERR is not the same as the value of SMMU_(*_)HDBSS_BASEn.ERRACK.

The HDBSS is considered to be halted if any of the following conditions is true:
• Both HDBSS tables are full.
• SMMU_(*_)HDBSS_BASE1.V is 0 and HDBSS table 0 is full.
• SMMU_(*_)HDBSS_BASE0.V is 0 and HDBSS table 1 is full.
• Both of the SMMU_(*_)HDBSS_BASEn.V fields are 0.

If the HDBSS is halted and a transaction triggers an HDBSS update, all of the following apply:

• The value of SMMU_(*_)HDBSS_PROD1.ERR is toggled.
• The SMMU_(*_)HDBSS_PROD1.ERR_REASON field is set to 0b11.
• No entry is appended to the HDBSS table.

If SMMU_ROOT_IDR0.ROOT_IMPL is 1, SMMU_ROOT_CR0.GPCEN is 1, and an access to the HDBSS generates a GPC fault, all of the following apply:
• The value of SMMU_(*_)HDBSS_PRODn.ERR is toggled.
• SMMU_(*_)HDBSS_PRODn.ERR_REASON is set to 0b10.
• SMMU_(*_)HDBSS_PRODn.INDEX points to the entry that generated the fault.
• The SMMU handles the fault as a GPC fault on an SMMU-originated access. See 3.25.3 SMMU-originated accesses.

When an external abort occurs on an HDBSS write, all of the following apply:
• The value of SMMU_(*_)HDBSS_PRODn.ERR is toggled.
• SMMU_(*_)HDBSS_PRODn.ERR_REASON is set to 0b01.
• SMMU_(*_)HDBSS_PRODn.INDEX points to the entry that generated the fault.

Note: n is the index of the HDBSS table that would have been updated if the error condition had not occurred. If the error condition is due to a halted HDBSS, n is always 1.

When the value of SMMU_(*_)HDBSS_PRODn.ERR is toggled, if SMMU_(*_)GERRORN.HDBSS_ERR does not already indicate an active error, the SMMU activates SMMU_(*_)GERROR.HDBSS_ERR. If an error is observable in SMMU_(*_)GERROR.HDBSS_ERR, then at least one of the conditions described above has been made observable.

Note: An HDBSS error does not cause a transaction to be aborted.

Note: The error behavior specified by this feature diverges from the Arm A-profile architecture in that stage 2 dirty state hardware update is not prevented when an HDBSS error has been encountered.

If software inappropriately configures the value of SMMU_(*_)HDBSS_BASEn.ERRACK to mismatch the value of SMMU_(*_)HDBSS_PRODn.ERR, it is CONSTRAINED UNPREDICTABLE whether the SMMU records a dirty state update to the HDBSS table or whether a subsequent error is correctly reported. This CONSTRAINED UNPREDICTABLE behavior also applies when software transitions an HDBSS table from invalid to valid while the value of SMMU_(*_)HDBSS_BASEn.ERRACK is not the same as the value of SMMU_(*_)HDBSS_PRODn.ERR.

3.13.9.6 HDBSS Restoration Procedure

To restore operation of the HDBSS following any error, EL2 software must perform the following steps. Steps 2 to 5 can be performed in any order.
1. Program both of the SMMU_(*_)HDBSS_BASEn.V fields to 0.
2. Acknowledge the error by updating the value of SMMU_(*_)GERRORN.HDBSS_ERR.
3. Program the values of SMMU_(*_)HDBSS_PRODn.ERR and SMMU_(*_)HDBSS_BASEn.ERRACK so that they are consistent.
4. Either allocate a new, empty HDBSS table or zero the current table. This applies to the table affected by the error (in the case of a GPC fault or external abort), or to both tables in the HDBSS-halted scenario.
5. Program both of the SMMU_(*_)HDBSS_PRODn fields to 0.
6. Program both of the SMMU_(*_)HDBSS_BASEn.V fields to 1.

3.13.10 Hardware Accelerator for Cleaning Dirty State

This section applies only when SMMU_IDR3.HACDBS is 1.

The Hardware Accelerator for Cleaning Dirty State (HACDBS) provides an enhanced method for updating page descriptors from writable-dirty to writable-clean. The HACDBS is a structure, similar to an HDBSS, that is generated by either the SMMU or a PE during live migration.

Figure 3.13 shows a visual representation of the HACDBS.

Figure 3.13: HACDBS components

The base address of the HACDBS is a physical address configured in SMMU_(*_)HACDBS_BASE.BADDR. The PA space for the HACDBS is determined as follows:
• For a Non-secure HACDBS, the PA space is Non-secure.
• For a Secure HACDBS, the PA space is Secure.
• For a Realm HACDBS, the PA space is Realm.

SMMU accesses to the HACDBS have the following memory attributes:
• The Endianness is little-endian.
• Cacheability and Shareability attributes are configured in SMMU_(*_)CR1.{QUEUE_SH, QUEUE_OC, QUEUE_IC}.
• The memory type is Normal memory.
• All accesses are Tag Unchecked.
• If SMMU_IDR3.MPAM is 1, the MPAM attributes are determined from the values configured in SMMU_(*_)HACDBS_MPAM.
• For a Realm HACDBS, if SMMU_R_IDR3.MEC is 1, HACDBS accesses are performed using the MECID value configured in SMMU_R_HACDBS_MECID.

The size of the HACDBS is configured in SMMU_(*_)HACDBS_BASE.SZ. The format for HACDBS entries is as described in FEAT_HACDBS in the Armv9.5-A architecture [2].

3.13.10.1 HACDBS Processing

When processing an HACDBS entry, the SMMU uses the same cleaning process as described in FEAT_HACDBS in the Armv9.5-A architecture [2], with the following exceptions:
• The consumer index is stored in SMMU_(*_)HACDBS_CONS.INDEX.
• The stage 2 translation table walk uses the configuration of the STE for the StreamID stored in SMMU_(*_)HACDBS_CONS.STREAMID.

The SMMU has reached the end of the HACDBS structure when the value of SMMU_(*_)HACDBS_CONS.INDEX is greater than or equal to 2^(SZ + 12) / 8, where SZ is the value of SMMU_(*_)HACDBS_BASE.SZ. When the SMMU has reached the end of the HACDBS structure, it stops processing entries, and an interrupt is generated. See section 3.18.2 Interrupt sources.

The SMMU is permitted to process entries ahead of the current value of SMMU_(*_)HACDBS_CONS.INDEX.
If an error is present, the SMMU is permitted to have successfully cleaned entries after the one indicated by SMMU_(*_)HACDBS_CONS.INDEX.

When SMMU_(*_)CR0.SMMUEN == 0, processing a valid HACDBS entry causes the HACDBS to report an error. This is reported by setting SMMU_(*_)HACDBS_CONS.ERR_REASON to 0b010 (IPAF).
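A sequential model of the HACDBS consumer index from 3.13.10.1 is sketched below. It assumes 8-byte entries, consistent with the 2^(SZ + 12) / 8 bound, and stands in for the cleaning of each entry with a hypothetical `entry_ok[]` array. A real SMMU is permitted to process ahead of INDEX and to have cleaned entries after a failing one; this simplified model does not show that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the HACDBS consumer side. */
typedef struct {
    unsigned sz;     /* SMMU_(*_)HACDBS_BASE.SZ */
    uint64_t index;  /* SMMU_(*_)HACDBS_CONS.INDEX */
    bool     error;  /* processing stopped on an error */
} hacdbs_cons_t;

/* Number of 8-byte entries in a 2^(SZ + 12)-byte HACDBS. */
static uint64_t hacdbs_num_entries(unsigned sz)
{
    assert(sz < 52);
    return (UINT64_C(1) << (sz + 12)) / 8;
}

/* Processes entries strictly in order. entry_ok[i] stands in for the
   outcome of cleaning entry i. Returns true when the end of the structure
   is reached (where the SMMU would raise its interrupt); on an error,
   INDEX is left pointing at the failing entry. */
static bool hacdbs_run(hacdbs_cons_t *c, const bool *entry_ok)
{
    const uint64_t end = hacdbs_num_entries(c->sz);
    while (c->index < end) {
        if (!entry_ok[c->index]) {
            c->error = true;
            return false;
        }
        c->index++;
    }
    return true;
}
```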

The effect of stage 2 permissions on how entries are processed by the SMMU is the same as described in FEAT_HACDBS in the Armv9.5-A architecture [2], with the following exceptions:
• The index that is incremented is the consumer index stored in SMMU_(*_)HACDBS_CONS.INDEX.
• Faults caused by the stage 2 descriptor having permissions other than writable-clean or writable-dirty are reported by setting SMMU_(*_)HACDBS_CONS.ERR_REASON to 0b011 (IPAHACF).

For the purpose of cleaning descriptors using this feature, STE.S2HD is treated as 1. As described in FEAT_HACDBS in the Armv9.5-A architecture [2], this also means that a descriptor can qualify as writable-clean or writable-dirty in this context even if dirty state hardware update is not enabled at stage 2.

If SMMU_ROOT_IDR0.ROOT_IMPL is 1 and SMMU_ROOT_CR0.GPCEN is 1, then accesses to the HACDBS are subject to Granule Protection Checks. See 3.25 Granule Protection Checks.

Translations and accesses required to clean stage 2 descriptors are captured in PMCG events in the same manner as described in 10.3 Monitor events. This means that Event IDs 1 to 5 can be counted as part of HACDBS processing.

Note: Consistent with the A-profile architecture [2], accesses to the HACDBS are not counted.

3.13.10.2 HACDBS States

The HACDBS is enabled if both of the following are true:
• SMMU_(*_)HACDBS_BASE.EN == 1.
• SMMU_(*_)HACDBS_CONS.ENACK == 1.

If software programs SMMU_(*_)HACDBS_BASE.EN to 1, the HACDBS is considered to be disabled until the Update completes, that is until the SMMU sets SMMU_(*_)HACDBS_CONS.ENACK to 1.

The HACDBS is disabled if both of the following are true:
• SMMU_(*_)HACDBS_BASE.EN == 0.
• SMMU_(*_)HACDBS_CONS.ENACK == 0.

If software programs SMMU_(*_)HACDBS_BASE.EN to 0, the HACDBS is considered to be enabled until the Update completes, that is until the SMMU sets SMMU_(*_)HACDBS_CONS.ENACK to 0.
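The stage 2 permission handling described in 3.13.10.1 can be summarized by the following sketch. The three-way descriptor classification and the names are assumptions for illustration; the bit-level encoding of writable-clean and writable-dirty descriptors is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { S2_WRITABLE_DIRTY, S2_WRITABLE_CLEAN, S2_OTHER } s2_perm_t;

typedef enum { CLEAN_DONE, CLEAN_NOP, CLEAN_FAULT_IPAHACF } clean_result_t;

/* Cleans one stage 2 descriptor. STE.S2HD is treated as 1 here, so the
   classification does not depend on whether dirty-state HTTU is enabled
   at stage 2. */
static clean_result_t hacdbs_clean(s2_perm_t *desc)
{
    switch (*desc) {
    case S2_WRITABLE_DIRTY:
        *desc = S2_WRITABLE_CLEAN;   /* writable-dirty -> writable-clean */
        return CLEAN_DONE;
    case S2_WRITABLE_CLEAN:
        return CLEAN_NOP;            /* nothing to clean */
    default:
        return CLEAN_FAULT_IPAHACF;  /* ERR_REASON = 0b011 */
    }
}
```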
When software updates SMMU_(*_)HACDBS_BASE.EN from 1 to 0 and the SMMU acknowledges the Update by setting SMMU_(*_)HACDBS_CONS.ENACK to 0, the SMMU guarantees all of the following:
• All outstanding walks, including updates of descriptors from writable-dirty to writable-clean, have completed.
• All entries up to and including entry SMMU_(*_)HACDBS_CONS.INDEX - 1 have been successfully processed.
• In the case of an error, SMMU_(*_)HACDBS_CONS.INDEX points to the earliest entry that generated an error, and the values of the SMMU_(*_)HACDBS_CONS.{ERR_REASON, ERR} fields are updated.
• No errors will be reported later.

Completion of an Update of SMMU_(*_)CR0.SMMUEN from 1 to 0 guarantees that HACDBS translation requests and updates of descriptors from writable-dirty to writable-clean have completed. Any resulting errors are not guaranteed to be observed until completion of an Update of SMMU_(*_)HACDBS_BASE.EN from 1 to 0.

3.13.10.3 HACDBS Errors

All errors described in FEAT_HACDBS in the Armv9.5-A architecture [2] apply to the HACDBS in the SMMU.

Note: The bit encodings in SMMU_(*_)HACDBS_CONS.ERR_REASON do not match the ones defined by FEAT_HACDBS in the Armv9.5-A architecture [2]. However, the order and descriptions are the same.

If the SMMU encounters an error while processing the HACDBS, it updates both of the following fields:
• SMMU_(*_)HACDBS_CONS.ERR.
• SMMU_(*_)HACDBS_CONS.ERR_REASON.

When an error occurs, SMMU_(*_)HACDBS_CONS.INDEX has the same behavior as HACDBSCONS_EL2.INDEX as described by FEAT_HACDBS in the Armv9.5-A architecture [2].

The SMMU handles a GPC fault on an access to the HACDBS as a GPC fault on an SMMU-originated access. See 3.25.3 SMMU-originated accesses.

When there is an error related to the STE pointed to by SMMU_(*_)HACDBS_CONS.STREAMID, or when STE.Config != 0b11x, this is reported by setting SMMU_(*_)HACDBS_CONS.ERR_REASON to 0b100 (STEF).

The following table provides an overview of how HACDBS errors are reported:

| Condition encountered by the HACDBS                             | Error reported in SMMU_(*_)HACDBS_CONS.ERR_REASON |
|-----------------------------------------------------------------|---------------------------------------------------|
| Abort on HACDBS read                                            | STRUCTF                                           |
| Fetch of HACDBS from unsupported memory type                    | STRUCTF                                           |
| GPF or GPT lookup error during access to the HACDBS table       | STRUCTF                                           |
| F_STE_FETCH, C_BAD_STREAMID, C_BAD_STE                          | STEF                                              |
| F_VMS_FETCH, F_CD_FETCH, C_BAD_CD                               | STEF                                              |
| STE with stage 2 not configured                                 | STEF                                              |
| GPF or GPT lookup error during access to the STE                | STEF                                              |
| F_ADDR_SIZE, F_TRANSLATION, or F_WALK_EABT                      | IPAF                                              |
| External abort on the update of a descriptor                    | IPAF                                              |
| GPF or GPT lookup error during access to the translation tables | IPAF                                              |
| SMMU_(*_)CR0.SMMUEN == 0                                        | IPAF                                              |
| Errors that would report IPAHACF in FEAT_HACDBS                 | IPAHACF                                           |

The SMMU does not record any event in the Event queue for HACDBS errors.

When the value of SMMU_(*_)HACDBS_CONS.ERR is toggled, if SMMU_(*_)GERRORN.HACDBS_ERR does not already indicate an active error, the SMMU activates SMMU_(*_)GERROR.HACDBS_ERR. If an error is observable in SMMU_(*_)GERROR.HACDBS_ERR, then it has already been made observable in SMMU_(*_)HACDBS_CONS.{ERR, ERR_REASON}.

If the value of SMMU_(*_)HACDBS_CONS.ERR does not match the value of SMMU_(*_)HACDBS_BASE.ERRACK, then all of the following apply:
• The SMMU stops processing entries from the HACDBS.
• The error is reported in SMMU_(*_)GERROR.HACDBS_ERR. This raises an interrupt to the PE.
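The HACDBS error table maps naturally onto a switch statement. In this sketch the condition enum is hypothetical and ERR_REASON values are returned as mnemonics; only the IPAF (0b010), IPAHACF (0b011) and STEF (0b100) encodings are given in this section, so no number is attached to STRUCTF.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enumeration of the conditions listed in the table. */
typedef enum {
    HACDBS_ABORT_ON_READ,
    HACDBS_UNSUPPORTED_MEMTYPE,
    HACDBS_GPC_ON_HACDBS,
    HACDBS_STE_FETCH_OR_BAD_STE,    /* F_STE_FETCH, C_BAD_STREAMID, C_BAD_STE */
    HACDBS_VMS_CD_FETCH_OR_BAD_CD,  /* F_VMS_FETCH, F_CD_FETCH, C_BAD_CD */
    HACDBS_STE_NO_STAGE2,
    HACDBS_GPC_ON_STE,
    HACDBS_WALK_FAULT,              /* F_ADDR_SIZE, F_TRANSLATION, F_WALK_EABT */
    HACDBS_EABT_ON_DESC_UPDATE,
    HACDBS_GPC_ON_TABLES,
    HACDBS_SMMU_DISABLED,           /* SMMU_(*_)CR0.SMMUEN == 0 */
    HACDBS_PERMISSION               /* would report IPAHACF in FEAT_HACDBS */
} hacdbs_cond_t;

static const char *hacdbs_err_reason(hacdbs_cond_t c)
{
    switch (c) {
    case HACDBS_ABORT_ON_READ:
    case HACDBS_UNSUPPORTED_MEMTYPE:
    case HACDBS_GPC_ON_HACDBS:
        return "STRUCTF";
    case HACDBS_STE_FETCH_OR_BAD_STE:
    case HACDBS_VMS_CD_FETCH_OR_BAD_CD:
    case HACDBS_STE_NO_STAGE2:
    case HACDBS_GPC_ON_STE:
        return "STEF";      /* 0b100 */
    case HACDBS_WALK_FAULT:
    case HACDBS_EABT_ON_DESC_UPDATE:
    case HACDBS_GPC_ON_TABLES:
    case HACDBS_SMMU_DISABLED:
        return "IPAF";      /* 0b010 */
    case HACDBS_PERMISSION:
        return "IPAHACF";   /* 0b011 */
    }
    return "?";
}
```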
If there is an error during processing of an HACDBS entry, software is expected to program SMMU_(*_)HACDBS_BASE.EN to 0. When the SMMU sets SMMU_(*_)HACDBS_CONS.ENACK to 0, it is guaranteed that all outstanding fetches of HACDBS entries, stage 2 walks, and updates of descriptors from writable-dirty to writable-clean have completed.

If software inappropriately configures the value of SMMU_(*_)HACDBS_BASE.ERRACK to mismatch the value of SMMU_(*_)HACDBS_CONS.ERR, it is CONSTRAINED UNPREDICTABLE whether the SMMU continues processing the HACDBS or whether a subsequent error is correctly reported. This CONSTRAINED UNPREDICTABLE behavior also applies when software transitions the HACDBS from disabled to enabled while the value of SMMU_(*_)HACDBS_BASE.ERRACK is not the same as the value of SMMU_(*_)HACDBS_CONS.ERR.
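Both the HDBSS and the HACDBS use the same toggle-and-acknowledge pattern: the SMMU toggles ERR, processing stops while ERR differs from ERRACK, and GERROR is activated only if GERRORN does not already show an active error. A single-bit C model of that handshake follows, with a hypothetical struct standing in for the registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-bit model of the error fields. */
typedef struct {
    bool err;      /* PRODn.ERR or CONS.ERR, written by the SMMU */
    bool errack;   /* BASEn.ERRACK or BASE.ERRACK, written by software */
    bool gerror;   /* GERROR.<x>_ERR, written by the SMMU */
    bool gerrorn;  /* GERRORN.<x>_ERR, written by software */
} err_regs_t;

static bool structure_stopped(const err_regs_t *r) { return r->err != r->errack; }
static bool gerror_active(const err_regs_t *r)     { return r->gerror != r->gerrorn; }

/* SMMU side: record an error. */
static void smmu_report_error(err_regs_t *r)
{
    r->err = !r->err;               /* toggle ERR */
    if (!gerror_active(r))
        r->gerror = !r->gerror;     /* activate GERROR, raising an interrupt */
}

/* Software side: acknowledge, making both pairs consistent again. */
static void sw_acknowledge(err_regs_t *r)
{
    r->gerrorn = r->gerror;
    r->errack = r->err;
    assert(!structure_stopped(r) && !gerror_active(r));
}
```

The model deliberately omits the surrounding disable and re-enable steps (BASEn.V for the HDBSS, BASE.EN for the HACDBS) that the procedures in this section require around the acknowledgment.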

3.14 Speculative accesses

An implementation might allow incoming transactions to be marked as speculative in an IMPLEMENTATION DEFINED manner. Only read transactions are allowed to be marked as speculative. A write transaction that is marked as speculative is always terminated with an abort, and no event is recorded to software.

The behavior of a read transaction that is marked as speculative depends on two things:
1. Whether translation succeeds. If the translation occurs successfully without faulting, the read transaction continues into the system and returns data. Otherwise, if any kind of fault or configuration error occurs, the transaction is terminated with an abort; no event is recorded to software for any speculative transaction. The determination of a fault is no different from that for non-speculative read transactions, including Access flag faults.
2. Whether HTTU is enabled. If HTTU is enabled and translation succeeds without fault, the read transaction updates the Access flags of relevant translation table descriptors.

The SMMU HTTU rules match those set out for Armv8.1-A [2] with respect to hardware update of the Access flag and dirty state, including update of stage 2 translation table flags both for speculative accesses made at stage 1 and for writes of stage 1 descriptors due to the setting of Access flags.

An implementation might provide translation services to a client device, and might support speculatively-issued Translation Requests. An IMPLEMENTATION DEFINED mechanism must be used to differentiate speculative Translation Requests from non-speculative Translation Requests.

Note: This mechanism might arise as an IMPLEMENTATION SPECIFIC service provided to another device. PCIe ATS Translation Requests are always non-speculative.
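The handling of an incoming transaction that is marked as speculative can be summarized as follows. This is a sketch with illustrative names; how a transaction is marked as speculative is IMPLEMENTATION DEFINED and is reduced here to the caller only invoking the function for speculative transactions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TXN_PROCEED, TXN_ABORT } txn_outcome_t;

typedef struct {
    txn_outcome_t outcome;
    bool record_event;  /* event recorded to software */
    bool update_af;     /* HTTU sets AF on the relevant descriptors */
} spec_result_t;

/* Outcome for a transaction already known to be marked speculative. */
static spec_result_t handle_speculative(bool is_write, bool translation_ok, bool httu_af)
{
    spec_result_t r = { TXN_ABORT, false, false };  /* never an event */
    if (is_write)
        return r;          /* speculative writes always abort */
    if (!translation_ok)
        return r;          /* any fault or configuration error: abort */
    r.outcome = TXN_PROCEED;
    r.update_af = httu_af; /* AF updated only when HTTU is enabled */
    return r;
}
```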
If a received Translation Request is marked as speculative, behavior depends on the read/write property of the request:
• Translation Requests for an address to be written grant write in the response only if the translation table descriptors that translate the address are all marked writable-dirty. In this case, if hardware management of the Access flag is enabled, the request updates AF. If hardware management of dirty state is enabled, speculative write Translation Requests do not mark any writable-clean descriptor in the first or only stage of translation as writable-dirty. If the descriptor is marked writable-clean, the response does not grant write access.
• Translation Requests for an address to be read return a successful response, if appropriate, and update AF if hardware management of the Access flag is enabled.
• In both cases, if hardware management of the Access flag and dirty state is enabled in a nested translation, an update of a stage 1 descriptor to set AF or the Dirty state of the page might cause the stage 2 descriptors related to the updated stage 1 descriptor to be marked as Dirty as required.

The response to a Translation Request indicates whether the request was denied because of a page fault or otherwise missing translation, or whether a valid translation existed but the request failed because the translation was writable-clean.

Note: A device might use this information to determine whether to stop making requests or whether to subsequently try again with a non-speculative write.

For speculative accesses of SMMU structures and translations, see sections 3.21.1 Translation tables and TLB invalidation completion behavior and 3.21.3 Configuration structures and configuration invalidation completion.
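For a speculative write Translation Request, the write grant and the reason for a denial can be sketched as below. The descriptor states and status codes are hypothetical, and reporting a read-only descriptor as a fault is an assumption of this sketch rather than something this section states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { D_READ_ONLY, D_WRITABLE_CLEAN, D_WRITABLE_DIRTY } dstate_t;

typedef enum {
    TR_GRANTED,
    TR_DENIED_FAULT,          /* page fault or otherwise missing translation */
    TR_DENIED_WRITABLE_CLEAN  /* valid translation, but writable-clean */
} tr_status_t;

/* Write grant for a speculative write Translation Request. descs[] holds
   the state of every descriptor that translates the address. No
   descriptor is changed to writable-dirty by this request. */
static tr_status_t spec_write_tr(const dstate_t *descs, size_t n, bool translation_ok)
{
    if (!translation_ok)
        return TR_DENIED_FAULT;
    for (size_t i = 0; i < n; i++) {
        if (descs[i] == D_WRITABLE_CLEAN)
            return TR_DENIED_WRITABLE_CLEAN;  /* device may retry non-speculatively */
        if (descs[i] != D_WRITABLE_DIRTY)
            return TR_DENIED_FAULT;
    }
    return TR_GRANTED;  /* every descriptor is writable-dirty */
}
```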

3.15 Coherency considerations and memory access types

Arm anticipates that the SMMU will access all in-memory structures and queues in a manner that does not require software cache maintenance of the PE caches. Arm expects this to be IO-coherent access to normal shared memory, but in implementations that cannot support cache coherency, this might be non-cached access. Some Embedded Implementations might require the PEs to map these addresses as non-cached; see section 3.16 Embedded Implementations below.

The degree to which the different memory access types and attributes are supported is IMPLEMENTATION DEFINED. All in-memory structures and queues are accessed using Normal memory types. Configuration fields exist for Stream table, Context Descriptor and translation table fetches to govern the Cacheability and Shareability of such accesses. MSI writes can be configured to make Device-type accesses.

If hardware update of the Access flag or the dirty state of a page is supported, atomic access is required to update translation tables that are shared between the PE and the SMMU. Support for atomic access using local monitors requires a fully-coherent interconnect port. If the memory system supports Armv8.1 [2] atomic operations, the SMMU might support atomic updates without local monitors, and not require a fully-coherent port. Because different SMMU implementations might use different mechanisms for atomic update of the flags, and because local monitors require coherent cacheable access, behavior is IMPLEMENTATION DEFINED if hardware flag updates are enabled on translation tables configured to be accessed as Non-cacheable.

To limit complexity, the SMMU might respond to snoops from the system only as much as required for atomic updates to translation tables with local monitors, if required. This means that all other memory access by the SMMU might be IO-coherent.
That is, SMMU configuration caches are not required to be snooped by PE accesses. When configuration data structures are changed, software is required to issue invalidation commands to the SMMU.

The SMMU respects the same single-copy atomicity rules as PEs regarding 64-bit translation table descriptor accesses, or 128-bit translation table descriptor accesses if VMSAv9-128 is in use.

When configured by software, that is when not fixed in Embedded Implementations, Arm recommends that the in-memory data structures and queues are treated as Normal memory cached by the PE when the SMMU implementation is able to access them IO-coherently.

Note: This might be useful to avoid explicit cache maintenance on the PE side. When an SMMU is not able to make IO-coherent accesses, a similar programming model might be achieved using normal non-cached mappings from the PE.

Note: The configuration structure invalidation commands might be used by a hypervisor to maintain coherency between guest structures and any shadow structures that it uses.

When a system supports IO-coherent accesses from the SMMU for configuration structures, translation tables, queues, and CMD_SYNC, GERROR, Event queue and PRI queue MSIs, this is presented to software using SMMU_IDR0.COHACC == 1. If a system does not support IO-coherent access from the SMMU, SMMU_IDR0.COHACC must be 0.

3.15.1 Client devices

SMMU translation is supported for Cache maintenance operations sent from client devices into the system. TLB maintenance operations sent from client devices into the system are not permitted and are never propagated by the SMMU.

SMMU clients might contain caches that are fully-coherent with the rest of the system. Fully-coherent traffic uses physical addresses for both memory and snoop requests in either direction. For example, client access to the cache and writebacks from the cache might use addresses that were obtained from the SMMU using ATS, or obtained from a TLB in the client that is filled from the SMMU.
The latter case might arise where an SMMU is implemented as part of a complex device.

Devices might contain caches that do not support hardware coherency and that might be filled using non-physical addresses through an SMMU.

In distributed systems, different client devices might have different paths through the SMMU into the system, and these can differ in their ability to perform IO-coherent access. These paths might also differ from those used by the SMMU for its own configuration access, hence SMMU_IDR0.COHACC does not indicate whether client devices can also make IO-coherent accesses. Arm recommends that whether a given client device is capable of performing IO-coherent access is described to system software in a system-specific manner.

3.15.1.1 Fully-coherent client devices

For a fully-coherent SMMU client, the following behaviors apply:
• Granule protection checks (GPC) apply to fully-coherent requests.
• DPT checks apply to fully-coherent requests, with the following exception:
  – The DPT W bit is permitted to be treated as 1 for a fully-coherent client, where this is required by the coherency protocol.
• Client-originated snoop requests bypass the SMMU and are not subject to DPT checks or GPC. Such requests might be terminated by a system-specific policy.
  – Note: The forwarding of snoop requests from the system to an SMMU fully-coherent client is IMPLEMENTATION DEFINED and must meet the security requirements of the system.
• A fully-coherent SMMU client is permitted to be either a StreamID client or a NoStreamID client.

Related definitions in this context:
• A client-originated fully-coherent request allows a cache line to be allocated into a coherent cache in an SMMU client, or relates to a previously allocated cache line.
• A client-originated snoop request allows retrieving cache line data and/or modifying cache line state at a coherent cache, for cache lines that are managed by a home agent in an SMMU client.
  – Note: An SMMU client device might include a home agent in cases where the device is the backing store for memory that is coherently accessible to observers in the system.
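The applicability rules in 3.15.1.1 reduce to a small lookup. The request kinds and flag names below are illustrative; in particular, whether the DPT W bit is treated as 1 is reduced to a caller-supplied flag standing for the needs of the coherency protocol.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    REQ_FULLY_COHERENT,  /* client-originated fully-coherent request */
    REQ_CLIENT_SNOOP     /* client-originated snoop request */
} fc_req_t;

typedef struct {
    bool via_smmu;        /* request is handled by the SMMU */
    bool gpc;             /* Granule Protection Checks apply */
    bool dpt;             /* DPT checks apply */
    bool dpt_w_may_be_1;  /* DPT W bit may be treated as 1 */
} fc_checks_t;

static fc_checks_t fc_checks(fc_req_t req, bool protocol_needs_w)
{
    fc_checks_t c = { false, false, false, false };
    if (req == REQ_CLIENT_SNOOP)
        return c;  /* bypasses the SMMU; a system policy might terminate it */
    c.via_smmu = true;
    c.gpc = true;
    c.dpt = true;
    c.dpt_w_may_be_1 = protocol_needs_w;
    return c;
}
```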

3.16 Embedded Implementations

Some implementations might support the use of on-chip or internal storage for one or more of the Stream table structures, the Command queue, or the Event queue. This manifests itself as register base pointers and properties that are hard-wired to point to the on-chip storage. Such queues and structures are of a fixed size and configuration and in all cases are discoverable by system software. Software must not assume that it is necessary to allocate tables in RAM and set up pointers; it must first probe for an existing configuration.

SMMU_IDR1.TABLES_PRESET indicates that the Stream table base address is hardwired to pre-existing storage, and SMMU_IDR1.QUEUES_PRESET indicates the same for the queue base addresses; either or both can be set. When SMMU_IDR1.REL is set, the base addresses are given relative to the start of the SMMU register memory map, rather than as absolute addresses.

An implementation using internal storage for configuration and queues is not required to access this storage through the coherency domain of the PEs. Data accesses from the PE require manual cache maintenance or use of a non-cached memory type for these addresses.

For an implementation using internal storage for configuration and queues, the address regions for those configuration structures and queues are required not to overlap. This requirement applies both within the same physical address space and across the Non-secure and Secure PA spaces.

3.16.1 Changes to structure and queue storage behavior when fixed/preset

Non-preset tables and queues are stored in normal memory. When an Embedded Implementation (EI) contains a preset structure or queue in internal storage, it is not required that all bits of all structure or queue entries are accessible exactly as they would be in normal memory.
For example, an implementation might not provide storage for fields in structures and queues that would not be used by architected behavior.

3.16.1.1 Event Queue and PRI Queue

All entries in an embedded Event queue or PRI queue, that is where SMMU_IDR1.QUEUES_PRESET == 1, are permitted to have read-only/write-ignored behavior with respect to software accesses.

3.16.1.2 Command Queue

Entries in an embedded Command queue, that is where SMMU_IDR1.QUEUES_PRESET == 1, are readable and writable, but are not required to provide storage outside of the union of all defined fields for all implemented commands. In addition, referring to the Command encodings in Chapter 4 Commands, storage is not required to be provided for:
• Reserved and undefined fields.
• High-order bits of StreamID fields beyond the implemented range of StreamIDs.
• High-order bits of SubstreamID fields beyond the implemented range of SubstreamIDs (including the entire field if SubstreamIDs are not implemented).
• SSV fields, if SubstreamIDs are not implemented.
• STAG bits that are always generated as ‘0’ to software.
  Note: An implementation might choose to use fewer than 16 bits of STAG when communicating stalled faults to software.
• SSec, if only Non-secure state is supported.
• CMD_SYNC MSIData, MSIAddress and MSIWriteAttributes, if MSIs are not supported by the Security state of the Command queue.
• ASID[15:8] if SMMU_IDR0.ASID16 == 0, or VMID[15:8] if SMMU_IDR0.VMID16 == 0.
• The CMD_CFGI_STE Leaf parameter (embedded Stream tables are single-level).
• Fields in any command type that gives rise to CERROR_ILL.

A bit that is not stored due to these rules has RAZ/WI behavior.

Note: An implementation determines the set of required storage bits from IMPLEMENTATION SPECIFIC configuration options and values.

Software must not assume that it can write an arbitrary 16-byte sequence to a Command queue entry and read back the sequence unmodified. However, functional fields that form valid command parameters must be readable by software for debug and read-modify-write construction of commands (the queue is not considered write-only).

3.16.1.3 Stream Table Entry

Entries in an embedded Stream table are freely read/write accessible, but storage is not required to be provided for:
• Undefined fields.
• Reserved/RES0 fields.
• Fields that are IGNORED in all possible configurations that an implementation supports.
• Fields permitted to have RAZ/WI behavior.

As an example, storage is not required for STE.S1ContextPtr on an SMMU that has an embedded Stream table but does not support stage 1.

Note: Fields Reserved for software use do not alter SMMU function but must be stored in their entirety.
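The Command queue storage rules in 3.16.1.2 can be pictured with a small software model. The C sketch below assumes a minimal implementation that stores only the implemented StreamID bits and, when SMMU_IDR0.ASID16 == 0, only ASID[7:0]. The function names and the `sid_bits` parameter are hypothetical, and a real implementation is permitted to store more bits than this model does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: value read back from a StreamID field of an embedded
 * Command queue entry after software wrote `written`. Only the low
 * `sid_bits` bits are assumed to be stored; higher bits are RAZ/WI. */
static uint32_t stored_streamid(uint32_t written, unsigned sid_bits)
{
    assert(sid_bits >= 1 && sid_bits <= 32);
    uint32_t mask = (sid_bits == 32) ? UINT32_MAX
                                     : ((UINT32_C(1) << sid_bits) - 1u);
    return written & mask;
}

/* Hypothetical model of an ASID field: ASID[15:8] is assumed not to be
 * stored when asid16 (mirroring SMMU_IDR0.ASID16) is false. */
static uint16_t stored_asid(uint16_t written, bool asid16)
{
    return asid16 ? written : (uint16_t)(written & 0x00FFu);
}
```

With 8 implemented StreamID bits, a write of 0x12345 reads back as 0x45. This is one reason why software builds commands from in-range field values instead of expecting an arbitrary 16-byte pattern to read back unmodified.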

3.17 TLB tagging, VMIDs, ASIDs and participation in broadcast TLB maintenance

Cached translations within the SMMU are tagged with:
• A translation regime, given by the STE’s StreamWorld and derived in part from STE.STRW. This applies to all cached translations.
• An ASID, if the translation regime supports ASIDs.
• A VMID, if stage 2 is implemented by the SMMU and if the translation regime supports VMIDs.

This is summarized in the table below:

StreamWorld      Address type  ASID tag              ASET tag  VMID tag
NS-EL1           VA            Yes, if TTD.nG == 1   Yes (2)   Yes (1)
NS-EL1           IPA (1)       No                    No (3)    Yes (1)
Realm-EL1        VA            Yes, if TTD.nG == 1   Yes (2)   Yes (1)
Realm-EL1        IPA (1)       No                    No (3)    Yes (1)
any-EL2 (4)      VA            No                    No (3)    No
any-EL2-E2H (4)  VA            Yes, if TTD.nG == 1   Yes (2)   No
Secure           VA            Yes, if TTD.nG == 1   Yes (2)   Yes (5)
EL3              VA            No                    No (3)    No

(1) If SMMU_IDR0.S2P == 1.
(2) ASET is required to be included in TLB records when the nG bit in the descriptor is 0. Arm expects, but does not require, ASET to be included in TLB records when the nG bit in the descriptor is 1, to support the feature of ASET that limits broadcast invalidation.
(3) Arm permits but does not require the inclusion of ASET in TLB records for EL3 and EL2.
(4) Applies to each of NS-EL2, S-EL2, and Realm-EL2, if supported.
(5) When Secure stage 2 is supported.

This is consistent with TLB tagging in PEs.

Note: In this specification, the term cached translations refers to the contents of a PE-style ASID/VMID/Address TLB. Any cached configuration structures are considered architecturally separate from the translations that are located from the configuration. Configuration caches are not required to be tagged as described in this section.
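The tagging table can be encoded as a lookup function, which can help when reasoning about which tags an implementation must record. The C sketch below is a hypothetical model: the enum and function names are illustrative, `ng` is the descriptor nG bit, `s2p` mirrors SMMU_IDR0.S2P, and `secure_s2` stands for Secure stage 2 support. It reports only what the table requires, so it returns false for ASET wherever the table permits but does not require ASET (footnotes (2) and (3)), and it ignores the rule in 3.17.1 that can force a Secure translation to be treated as non-global.

```c
#include <assert.h>
#include <stdbool.h>

enum stream_world { NS_EL1, REALM_EL1, ANY_EL2, ANY_EL2_E2H, SECURE, EL3 };
enum addr_type { ADDR_VA, ADDR_IPA };

struct tlb_tags {
    bool asid;          /* entry is tagged with an ASID */
    bool aset_required; /* ASET must be recorded in the entry */
    bool vmid;          /* entry is tagged with a VMID */
};

/* Hypothetical encoding of the TLB tagging table in section 3.17. */
static struct tlb_tags tags_for(enum stream_world w, enum addr_type t,
                                bool ng, bool s2p, bool secure_s2)
{
    struct tlb_tags r = { false, false, false };

    if (t == ADDR_IPA) {
        /* IPA rows exist only for NS-EL1 and Realm-EL1, and only if S2P == 1. */
        assert((w == NS_EL1 || w == REALM_EL1) && s2p);
        r.vmid = true;
        return r;
    }

    bool has_asids = (w == NS_EL1 || w == REALM_EL1 ||
                      w == ANY_EL2_E2H || w == SECURE);
    r.asid = has_asids && ng;           /* "Yes, if TTD.nG == 1" */
    r.aset_required = has_asids && !ng; /* footnote (2) */

    if (w == NS_EL1 || w == REALM_EL1)
        r.vmid = s2p;                   /* footnote (1) */
    else if (w == SECURE)
        r.vmid = secure_s2;             /* footnote (5) */
    return r;
}
```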
Use of these tags ensures that no aliasing occurs between different translations for the same address within different ASIDs, or between the same ASID under different VMIDs, or between the same ASID within different translation regimes, or between different translation regimes without ASIDs (for example any-EL2 and EL3). For both lookup and invalidation purposes, ASID values can be considered to be separate namespaces within each VMID and translation regime.
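A hit test over these tags might look like the following C sketch. It is a hypothetical software model, not a description of any implementation: address comparison is omitted, the structure and function names are illustrative, and the global-entry rule (a global entry matches any ASID, but only from the same StreamWorld and ASET) is taken from 3.17.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum stream_world { NS_EL1, REALM_EL1, ANY_EL2, ANY_EL2_E2H, SECURE, EL3 };

struct tlb_entry {
    enum stream_world world; /* translation regime tag */
    bool has_vmid;           /* entry carries a VMID tag */
    uint16_t vmid;
    bool global;             /* nG == 0: no ASID recorded */
    uint16_t asid;
    bool aset;
};

struct tlb_lookup {
    enum stream_world world;
    uint16_t vmid;
    bool regime_has_asids;   /* false for any-EL2 and EL3 */
    uint16_t asid;
    bool aset;
};

/* Hypothetical tag comparison; the address is assumed to match already. */
static bool tlb_hit(const struct tlb_entry *e, const struct tlb_lookup *l)
{
    assert(e != NULL && l != NULL);
    if (e->world != l->world)
        return false;              /* no aliasing across regimes */
    if (e->has_vmid && e->vmid != l->vmid)
        return false;              /* ASIDs are scoped per VMID */
    if (!l->regime_has_asids)
        return true;               /* any-EL2 / EL3: no ASID compare */
    if (e->global)
        return e->aset == l->aset; /* global: any ASID, same ASET */
    return e->asid == l->asid;     /* non-global: ASET not compared */
}
```

ASET is compared only for global entries, because non-global lookups treat ASID values with either ASET setting as equivalent.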

Note: For example, TLB entries tagged as ASID 3 in a Secure stage 1 cannot be matched by lookups for ASID 3 in an NS-EL1 stage 1 configuration. Similarly, a TLB entry that is tagged as either of S-EL2 or NS-EL2 can never be matched by a lookup from an EL3 context, even if the address matches.

In a regime that lacks ASIDs to differentiate address spaces, all CDs are considered equivalent (similar to two CDs with the same ASID value) even if referenced using different STEs. This implies that SubstreamIDs cannot differentiate address spaces in any-EL2/EL3 StreamWorlds. See section 5.4.1 CD notes for restrictions on permitted differences between CDs in such StreamWorlds.

EL3 tags TLB entries as EL3 without an ASID, and Arm expects this StreamWorld to be selected for Secure streams used by EL3 software on an Armv8-A host PE whose EL3 runs in AArch64 state. There are no ASIDs in an AArch64 EL3 translation regime.

When SMMU_IDR0.S1P == 1, the SMMU supports 16-bit ASIDs if SMMU_IDR0.ASID16 == 1. When SMMU_IDR0.S2P == 1, the SMMU supports 16-bit VMIDs if SMMU_IDR0.VMID16 == 1.

Consistent with PEs, all TLB entries inserted using NS-EL1 configurations are tagged with VMIDs when stage 2 is implemented, regardless of whether the configuration is stage 1-only, stage 2-only or stage 1+stage 2 translation. When stage 2 is configured, with or without stage 1, or if stage 1-only translation is configured, the VMID is taken from the STE.S2VMID field. When stage 2 is not implemented, no VMID tag is required on TLB entries.

Note: Some implementations and interconnects might support the transmission of the VMID value onwards into the system, so that Completer devices might further arbitrate access on a per-transaction basis. Where SMMU bypass is enabled (SMMU_(S_)CR0.SMMUEN == 0) so that STE structures are unused, the SMMU_(S_)GBPA.IMPDEF field might be used.
Otherwise, the STE.S2VMID field might be used. Details of these use cases are outside of the scope of this specification.

SMMU support for TLB maintenance messages that are broadcast from the PE is optional, but Arm recommends that support is implemented. These messages convey TLB invalidations from certain TLBI instructions on PEs. The Armv8.0 architecture [2] requires invalidation broadcasts to affect other PEs or agents in the same Inner Shareable domain. Armv8.4 [2] adds support for a PE to issue broadcast TLB invalidation operations to the Outer Shareable domain, in addition to the existing behavior of issuing them to the Inner Shareable domain. Broadcast TLB invalidate messages convey one or more of an address, an ASID or a VMID as required for a given invalidation operation, the scope defined by a translation stage to be affected, and the translation regime in which the TLB entries are tagged. Support for broadcast invalidation is indicated by SMMU_IDR0.BTM.

Note: The Shareability domain of the SMMU is a property of the system. When broadcast TLB invalidation is implemented, an implementation of any SMMUv3 version responds to the broadcast invalidation scope corresponding to its assigned Shareability domain.

If SMMU_IDR0.BTM == 1 and SMMU_(*_)CR2.PTM == 1, the SMMU is permitted but not required to ignore broadcast TLB invalidation operations for the corresponding Security state. Arm strongly recommends that SMMU implementations ignore broadcast TLB invalidation operations if SMMU_(*_)CR2.PTM is 1, for performance and reliability reasons.

Broadcast TLB invalidation messages that would invoke an illegal operation, such as an invalidation that applies to a stage or Security state that is not implemented in the SMMU, are silently ignored, with the exception that messages that have a combined effect must affect the implemented stages and ignore any unimplemented stage.
When SMMU_IDR0.S2P == 0, the SMMU matches VMID 0 for incoming broadcast TLB invalidation messages for a regime that uses VMIDs. When SMMU_IDR0.S2P == 1, a broadcast TLB invalidation message for a regime that uses VMIDs is treated as having VMID 0 if it is sent from a PE that does not implement EL2.

Note: On PEs that implement EL2 but have stage 2 disabled, Arm expects software to configure VTTBR_EL2.VMID to 0. This ensures that for broadcast TLBI operations that include a VMID, the VMID is set to 0.

Note: If, in a translation regime that uses VMIDs, stage 1-only translations coexist with stage 1 and stage 2 translations, then different VMID values must be used in each configuration to avoid the stage 1-only translations matching lookups that use stage 1 and stage 2 configurations.
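The VMID rules for incoming broadcast invalidations can be summarized as a predicate. This hypothetical C sketch reflects one reading of the text: `s2p` mirrors SMMU_IDR0.S2P, `sender_has_el2` says whether the originating PE implements EL2, and TLB entries of an SMMU without stage 2 are modeled as carrying VMID 0. VMID width differences (3.17.4) and VMID wildcards (3.17.6) are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: can a broadcast TLBI carrying msg_vmid, for a
 * regime that uses VMIDs, affect an SMMU TLB entry tagged entry_vmid? */
static bool broadcast_vmid_matches(bool s2p, bool sender_has_el2,
                                   uint16_t msg_vmid, uint16_t entry_vmid)
{
    /* Without stage 2, entries carry no VMID tag; model them as VMID 0. */
    assert(s2p || entry_vmid == 0);

    if (!s2p)
        return msg_vmid == 0;     /* S2P == 0: the SMMU matches VMID 0 */

    /* S2P == 1: a message from a PE without EL2 is treated as VMID 0. */
    uint16_t effective = sender_has_el2 ? msg_vmid : 0;
    return effective == entry_vmid;
}
```

This is also why PEs that implement EL2 but leave stage 2 disabled are expected to program VTTBR_EL2.VMID to 0: their broadcasts then behave like those from a PE without EL2.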

Note: Arm expects broadcast invalidation from PEs to be used where address spaces are shared with the SMMU and common translations are maintained, such as Shared Virtual Memory applications, or stage 2 translations that share a common stage 2 translation table between VMs and the SMMU. When an address space is not shared with PE processes, broadcast TLB invalidations from the PEs to the SMMU have no useful effect and might over-invalidate unrelated TLB entries. Non-shared address spaces arise when a custom address space is set up for a particular device, for example in scatter-gather or DMA isolation use cases.

Arm expects that both shared and non-shared stage 1 translations might be in use simultaneously. Stage 1 CDs contain an ASET flag that represents the shared or non-shared nature of the ASID and address space. The set of ASIDs for non-shared address spaces might opt out of broadcast invalidation.

Note: Arm expects that SMMU stage 2 address spaces are generally shared with their respective PE virtual machine stage 2 configuration. If broadcast invalidation is required to be avoided for a particular SMMU stage 2 address space, Arm recommends that a hypervisor configures the STE with a VMID that is not allocated for virtual machine use on the PEs.

For a stage 1 configuration with a StreamWorld that has ASIDs, when CD.ASET == 1, the address space and ASID are non-shared. TLB entries that belong to that StreamWorld are not required to be invalidated by the broadcast invalidation operations that match with an ASID. These operations are VA{L}ExIS and ASIDExIS in the appropriate translation regime. All other matching broadcast invalidations are required to affect these entries.

Where CD.ASET == 0, the ASID is considered shared with PE processes. TLB entries that belong to that StreamWorld are required to be affected by all matching broadcast invalidations.
The definition of matching is identical to that of Armv8-A PE TLBs [2]. CD.ASET does not affect the invalidation of stage 2 translation information.

For translation lookup of non-global TLB entries and command-based invalidation purposes, ASID values with CD.ASET == 0 are considered equivalent to ASID values with CD.ASET == 1. CMD_TLBI_* commands invalidate all matching TLB entries regardless of their ASET value. CD.ASET affects translation lookup of global TLB entries. For information about global TLB entry matching, see section 3.17.1 The Global flag in the translation table descriptor.

A stage 1 configuration in a translation regime that does not have ASIDs, that is where StreamWorld == any-EL2 or StreamWorld == EL3, ignores the ASID field and is permitted but not required to tag TLB entries using ASETs. An equivalent semantic applies, in that ASET == 0 entries are affected by broadcast invalidation and ASET == 1 entries are not required to be invalidated by certain operations:
• EL2 TLB entries with ASET == 1 are not required to be invalidated by VA{L}ExIS or VAA{L}ExIS but must be invalidated by ALLE2IS.
• EL3 TLB entries with ASET == 1 are not required to be invalidated by VA{L}E3IS but must be invalidated by ALLE3IS.

Note: Arm does not anticipate that ASET == 1 has an effect on EL2 and EL3 contexts; however, the behavior described here is consistent with other StreamWorld configurations.
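The ASET rules can be condensed into one predicate that says whether a broadcast operation is required to affect an entry that already matches it on address, ASID, VMID and StreamWorld. The C sketch is hypothetical: the operation classes are a simplified grouping of the TLBI scopes named in the text, and a false result means "permitted but not required", not "must not invalidate".

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified classes of broadcast TLBI scope (hypothetical grouping). */
enum bcast_op {
    OP_VA,   /* VA{L}ExIS, VA{L}E3IS */
    OP_VAA,  /* VAA{L}ExIS */
    OP_ASID, /* ASIDExIS (regimes with ASIDs only) */
    OP_ALL   /* all-entries scopes, for example ALLE2IS or ALLE3IS */
};

/* True if the operation is required to invalidate an already-matching entry. */
static bool invalidation_required(bool entry_aset, bool regime_has_asids,
                                  enum bcast_op op)
{
    assert(regime_has_asids || op != OP_ASID);

    if (!entry_aset)
        return true;                         /* ASET == 0: shared, always affected */
    if (regime_has_asids)
        return op != OP_VA && op != OP_ASID; /* ASID-matching ops may skip it */
    return op == OP_ALL;                     /* any-EL2 / EL3: only ALL* is required */
}
```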
Note: A broadcast invalidation operation that originates from a PE in any-EL2-E2H mode is not required to invalidate SMMU TLB entries that were inserted with StreamWorld == any-EL2; see section 3.17.5 EL2 ASIDs and TLB maintenance in EL2 Host (E2H) mode.

Note: See section 16.7.7 AMBA DVM messages with respect to CD.ASET == 1 TLB entries for AMBA interconnect DVM behavior with respect to ASET == 1.

Note: The ASID namespace might be affected by ASID rollover on the PE. These situations might be handled by:
• Refreshing the ASID namespace on the PE side and reallocating free ASIDs to new processes, but leaving ASIDs that are shared with SMMU contexts untouched.
• Swapping the ASID that is used in a CD, so that the old ASID is removed from the SMMU and future traffic uses a freshly-allocated ASID. This can be achieved with an overlap in which both the old ASID and the new ASID are active while the old ASID is updated in the CDs. This is followed by invalidation commands to the affected CDs (causing the SMMU to use the new ASID), and a CMD_SYNC. These steps are followed by TLB invalidation commands and a CMD_SYNC to remove all usage of the old ASID. When the final CMD_SYNC has ensured that these commands are complete, the old ASID can be considered free and the system can reuse it for a different address space.

3.17.1 The Global flag in the translation table descriptor

For translation regimes that have an ASID, Armv8-A [2] defines an nG bit in the Block and Page descriptors that allows them to be marked as either global or non-global. A translation that is performed for a Secure stream is treated as non-global, regardless of the value of the nG bit in the descriptor, if the descriptor is fetched from Non-secure memory. The any-EL2 and EL3 StreamWorlds do not support ASIDs, and the nG bit in the descriptor has no effect on these regimes.

Note: A translation table descriptor fetch from Non-secure memory might happen for several different reasons, for example because of the effective NS bit at that level of the translation table walk, or because the fetch uses the Non-secure IPA space, which is implicitly Non-secure. See section 3.10.2.2 Secure EL2 and support for Secure stage 2 translation.

When entries are global, an ASID is not required to be recorded in any resulting TLB entry. During lookup, a TLB entry marked as global can match regardless of the ASID that is provided with the lookup. The nG bit in the descriptor does not allow a TLB entry to match a lookup from a StreamWorld that is not the same as the one from which the TLB entry was created.

Note: Global translations are used across address spaces with identical layout conventions. For example, OS kernel addresses might be common across all process address spaces, and so they might be marked global.
However, an SMMU might be used with many custom address spaces that are laid out in a manner convenient to the client device they serve, without common mappings.

The SMMU only matches global TLB entries, that is where the nG bit in the descriptor is zero, against lookups from the same StreamWorld and ASID set (ASET). Global TLB entries with ASET == 0 do not match lookups through configurations with ASET == 1. Global TLB entries with ASET == 1 do not match lookups through configurations with ASET == 0. Invalidation rules for non-locked global mappings are identical to those in Armv8-A, where the ASIDE1 and ASIDE2 scopes are not required to invalidate global mappings.

3.17.2 Broadcast TLB maintenance from Armv8-A PEs with EL3 in AArch64

When the Secure Stream table is controlled by an Armv8-A PE where EL3 is using AArch64 state, software has the option of marking an STE as Secure, S-EL2, S-EL2-E2H or EL3 using the StreamWorld field, STE.STRW.

When the StreamWorld is Secure, the stream is configured on behalf of Secure EL1 software and the resultant TLB entries are tagged as Secure, including an ASID if non-global. Such entries must be invalidated by:
• PE broadcast TLB invalidations (where supported and if CD.ASET allows) from Secure EL1 instructions with the following scope:
  – VA{L}E1
  – VAA{L}E1
  – ASIDE1, for non-global entries
  – ALLE1
• SMMU invalidation commands on the Secure Command queue. These commands are:
  – CMD_TLBI_NH_ALL
  – CMD_TLBI_NH_ASID, for non-global entries
  – CMD_TLBI_NH_VAA
  – CMD_TLBI_NH_VA

When the StreamWorld is EL3, the stream is configured on behalf of EL3 software and the resultant TLB entries are tagged as EL3, without an ASID, which differentiates them from the Secure case above. Such entries are invalidated by:
• PE broadcast TLB invalidations (where supported and if CD.ASET allows) from EL3 instructions with scope of VA{L}E3 and ALLE3.
• SMMU invalidation commands on the Secure Command queue:
  – CMD_TLBI_EL3_VA
  – CMD_TLBI_EL3_ALL

An SMMU with SMMU_IDR0.RME_IMPL == 1 is not required to perform any invalidation on receipt of a broadcast TLBI for EL3, as the EL3 StreamWorld is not supported.

3.17.2.1 Broadcast TLB maintenance when Secure EL2 is implemented

The Secure EL2 and Secure stage 2 facilities introduced in SMMUv3.2 interoperate with the corresponding facilities on a PE. Broadcast TLB maintenance from a PE affects SMMU TLB entries of the same scope where supported and if CD.ASET allows.

For broadcast TLB maintenance with Secure EL1 scope from a PE that does not implement Secure EL2, or a PE that implements Secure EL2 and has SCR_EL3.EEL2 == 0, the following rules apply:
• When SMMU_S_IDR1.SEL2 == 0, invalidation scope is not determined by VMID and invalidation occurs as described in 3.17.2 Broadcast TLB maintenance from Armv8-A PEs with EL3 in AArch64 and in [2].
• When SMMU_S_IDR1.SEL2 == 1, the invalidation operations are interpreted as though they have VMID 0.
  – Note: When Secure EL2 is implemented, StreamWorld == Secure TLB entries that are inserted from configuration with stage 2 disabled are tagged with VMID 0. See section 3.10.2.2 Secure EL2 and support for Secure stage 2 translation.

Note: When a PE executes a VMALLS12 that targets Secure state, and Secure EL2 is either disabled or not implemented, it is only guaranteed to emit a broadcast TLBI with scope of stage 1 and VMID == 0.

For broadcast TLB maintenance with Secure EL1 scope from a PE that has SCR_EL3.EEL2 == 1, the following rules apply:
• When SMMU_S_IDR1.SEL2 == 0, SMMU TLB entries are not required to be affected.
• When SMMU_S_IDR1.SEL2 == 1, TLB entries that are in scope of both the invalidation operation and the supplied VMID are invalidated.

Note: Arm recommends that care is taken when integrating an SMMU implementation of SMMUv3.1 or earlier into a system that supports broadcast TLB maintenance from PEs implementing Secure EL2. Arm recommends that broadcast TLB maintenance from a PE that has SCR_EL3.EEL2 == 1 does not affect Secure SMMU TLB entries, and that steps are taken to ensure that malfunction is avoided if the SMMU receives new Secure broadcast TLB maintenance operations that contain a VMID.

3.17.3 Broadcast TLB maintenance from ARMv7-A PEs or Armv8-A PEs with EL3 using AArch32

When the Secure Stream table is controlled by an ARMv7-A PE or an Armv8-A PE where EL3 is using AArch32 state, Arm expects software to mark the StreamWorld of an STE as Secure. The resultant TLB entries are tagged as Secure, including an ASID if non-global. Such entries are invalidated by:
• PE broadcast TLB invalidations (where supported and if CD.ASET allows) from Secure instructions with the following scope:
  – MVA{L}
  – MVAA{L}
  – ASID, for non-global entries
  – ALL
  – Note: VA{L}E3 and ALLE3 are AArch64-only and unavailable in this scenario.
• SMMU invalidation commands on the Secure Command queue. These commands are:
  – CMD_TLBI_NH_ALL
  – CMD_TLBI_NH_ASID, for non-global entries
  – CMD_TLBI_NH_VAA
  – CMD_TLBI_NH_VA

Note: Broadcast invalidations from ARMv7-A PEs or Armv8-A PEs where EL3 is using AArch32 might fail to invalidate SMMU TLB entries that are tagged with StreamWorld == EL3.

3.17.4 Broadcast TLB maintenance in mixed AArch32 and AArch64 systems and with mixed ASID or VMID sizes

Broadcast TLB maintenance instructions that are executed from Armv7-A and Armv8-A PEs using AArch32 and AArch64 Exception levels affect SMMU TLB entries (if broadcast TLB invalidation is supported and participation enabled) when the addresses, ASIDs, VMIDs and StreamWorlds match as appropriate. The exceptions are:
• If a PE where EL3 uses AArch32 state issues an AArch32 TLBI that affects Secure entries, the TLBI is not required to affect SMMU TLB entries that were created with StreamWorld == EL3.
• If a PE where EL3 uses AArch64 state issues an AArch64 or AArch32 TLBI that affects Secure EL1 entries, the TLBI is not required to affect SMMU TLB entries created with StreamWorld == EL3.
• If a PE where EL3 uses AArch64 state issues an AArch64 TLBI that affects EL3 entries, the TLBI is not required to affect SMMU TLB entries created with StreamWorld == Secure.

ARMv7-A PEs have 8-bit ASIDs and VMIDs. Armv8-A PEs might have 16-bit ASIDs or VMIDs or both. An SMMU implementation supports 8-bit or 16-bit ASIDs and VMIDs, as indicated by SMMU_IDR0.{ASID16,VMID16}. A difference in ASID or VMID size between the originator of the broadcast TLB maintenance instruction and the SMMU is resolved as follows. For each of ASID and VMID:
• An SMMU that supports a 16-bit ASID or VMID compares the incoming 16-bit broadcast value to its TLB tags directly and matches if the values are equal.
  – The incoming 16-bit value is constructed by the system from an originator with an 8-bit ASID or VMID by zero-extending the value to 16 bits.
• An SMMU that supports an 8-bit ASID or VMID compares the bottom 8 bits of the incoming broadcast values to its TLB tags:
  – The comparison is required to match if the bottom 8 bits are equal and the top 8 bits are zero.
  – The comparison is not required to, but might, match if the bottom 8 bits are equal but the top 8 bits are non-zero.

When the SMMU supports 16-bit ASIDs, that is when SMMU_IDR0.ASID16 == 1, it does so for all StreamWorlds that use ASIDs (NS-EL1, Secure, any-EL2-E2H). Unlike an Armv8-A PE, the SMMU does not differentiate ASID size for AArch32 state contexts, and, if supported by an implementation, 16-bit ASIDs can be used in CDs where CD.AA64 == 0. Arm expects that legacy software will continue to write zero-extended 8-bit values in the ASID field in this case. The same behavior applies for 16-bit VMIDs when SMMU_IDR0.VMID16 == 1; their behavior is not modified by STE.S2AA64 == 0.

3.17.5 EL2 ASIDs and TLB maintenance in EL2 Host (E2H) mode

The Non-secure programming interface supports StreamWorlds NS-EL2 and NS-EL2-E2H, which correspond to PE Exception level EL2 in Non-secure state without and with E2H mode, respectively. When Secure EL2 is supported, the Secure programming interface supports StreamWorlds S-EL2 and S-EL2-E2H, which correspond to PE Exception level EL2 in Secure state without and with E2H mode, respectively.

The EL2 translation regime consists of:
• A stage 1 translation with one translation table without user permission checking.
• TLB entries that are not tagged with an ASID.

In E2H mode, the EL2 translation regime remains stage 1-only, but consists of two stage 1 translation tables that function the same way as those for an EL1 stage 1 translation. The resulting TLB entries are tagged as EL2, and might include an ASID.

Non-secure EL2-E2H mode can be configured for Non-secure STEs using STE.STRW when SMMU_IDR0.Hyp == 1 and SMMU_CR2.E2H == 1. Secure EL2-E2H mode can be configured for Secure STEs using STE.STRW when SMMU_S_IDR1.SEL2 == 1 and SMMU_S_CR2.E2H == 1.

Note: In the Armv8.1-A architecture [2], EL2-E2H mode is referred to as the Virtualization Host Extensions.

Broadcast TLB maintenance from a PE affects SMMU TLB entries when the whole system uses EL2 or EL2-E2H mode. Broadcast messages from a PE running in EL2-E2H supply an EL2 scope, and also supply an ASID to match. In addition, EL2-E2H mode extends the set of EL2 broadcast invalidations to the following operations:
• Invalidate all, for EL2-tagged entries.
• Invalidate by VA and ASID, for EL2.
• Invalidate by VA in all ASIDs, for EL2.
• Invalidate all by ASID, for EL2.

If a PE running at EL2 uses E2H mode, but an SMMU contains TLB entries that were inserted with StreamWorld == any-EL2 configuration, the EL2-E2H broadcast invalidations from the PE are not required to invalidate these TLB entries. If a PE running at EL2 does not use E2H mode, but an SMMU contains TLB entries that were inserted with StreamWorld == any-EL2-E2H configurations, EL2 broadcast invalidations from the PE are not required to invalidate these TLB entries.
Note: For each Security state, if broadcast invalidation is required for translations controlled by EL2 software, Arm recommends that StreamWorld == any-EL2 is used when the corresponding Security state in host PEs does not use E2H mode, and that StreamWorld == any-EL2-E2H is used when the corresponding Security state in host PEs uses E2H mode.

An implementation is not required to differentiate TLB entries with StreamWorld == any-EL2 from those with StreamWorld == any-EL2-E2H. Arm expects that, for a given Security state, the SMMU is programmed in a way that ensures that TLB entries of both of these StreamWorlds never coexist in a TLB. Therefore:
• A change to SMMU_CR2.E2H must be accompanied by an invalidation of all TLB entries that could have been created from a Non-secure STE with StreamWorld == NS-EL2 or StreamWorld == NS-EL2-E2H. See 6.3.12.3 E2H for details.
• A change to SMMU_S_CR2.E2H must be accompanied by an invalidation of all TLB entries that could have been created from a Secure STE with StreamWorld == S-EL2 or StreamWorld == S-EL2-E2H.
• The behavior of TLB invalidation commands CMD_TLBI_EL2_VAA and CMD_TLBI_EL2_VA might change depending on SMMU_CR2.E2H; see the individual commands for details.
• The behavior of TLB invalidation commands CMD_TLBI_S_EL2_VAA and CMD_TLBI_S_EL2_VA might change depending on SMMU_S_CR2.E2H; see the individual commands for details.

Note: A TLB lookup through a configuration with StreamWorld == any-EL2-E2H matches the ASID of the configuration with the ASID tag of the EL2 TLB entry, unless the entry is marked Global. A TLB insertion through the same configuration inserts a TLB entry tagged with the ASID of the configuration, unless the translation is Global. When StreamWorld == any-EL2, a TLB lookup does not match the ASID tag of EL2 TLB entries, nor does a TLB insertion tag the entry with a known ASID value. A change to SMMU_CR2.E2H or SMMU_S_CR2.E2H can therefore cause unexpected TLB entries to match.
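The following hypothetical C sketch models an implementation that does not differentiate any-EL2 from any-EL2-E2H entries, to show why a change to SMMU_CR2.E2H or SMMU_S_CR2.E2H needs an accompanying invalidation. The `e2h` parameter stands for the StreamWorld of the lookup configuration (any-EL2-E2H when true); the structure and function names are illustrative, and address comparison is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An EL2 TLB entry in an implementation that does not record whether it was
 * inserted through any-EL2 or any-EL2-E2H. When inserted through any-EL2,
 * `asid` holds no known value. */
struct el2_entry {
    bool global;
    uint16_t asid;
};

/* Hypothetical ASID check for an EL2 lookup. */
static bool el2_hit(const struct el2_entry *e, bool e2h, uint16_t cfg_asid)
{
    assert(e != NULL);
    if (!e2h)
        return true;                         /* any-EL2: ASID tag not compared */
    return e->global || e->asid == cfg_asid; /* any-EL2-E2H: ASID compared */
}
```

An entry inserted under E2H with ASID 5 no longer isolates by ASID once lookups switch to any-EL2. The reverse is similar: an entry inserted through any-EL2 carries an unknown ASID that might equal the ASID of a later any-EL2-E2H configuration.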
Note: SMMU_CR2.E2H and SMMU_S_CR2.E2H might also be cached in a Configuration Cache.

3.17.6 VMID Wildcards

Some virtualization use cases involve the presentation of different views of page permissions in the same address space to different device streams. The mechanism by which one address space is split into more than one view is outside the scope of this specification.

Note: STEs in the same Security state and with different translation table pointers must have different VMIDs. TTBs in the same VMID are considered equivalent within a Security state.

In the SMMU, SMMU_CR0.VMW controls a VMID wildcard function that enables groups of Non-secure VMIDs to be associated with each other for the purposes of invalidation. An invalidation operation that matches on Non-secure VMID either matches one exact VMID or ignores a configured number of VMID LSBs, as configured by SMMU_CR0.VMW.

Note: For example, SMMU_CR0.VMW might configure 1 LSB of VMID to be ignored, so that an incoming broadcast invalidation for VMID 0x0020 matches TLB entries tagged with VMID 0x0020 or 0x0021.

This configuration allows VMIDs to be allocated in groups of adjacent or contiguous values, using one VMID in the PEs for the VM and one or more of the others to support different IPA address space views in different device stage 2 configurations. Both broadcast TLB invalidation and explicit SMMU TLB invalidation commands, whether for stage 1 within the guest or at stage 2, affect all Non-secure VMIDs that match the group wildcard when SMMU_CR0.VMW != 0.

Note: The broadcast TLB invalidation mechanisms that might exist on a PE and interconnect are not modified by this feature. The VMW field modifies the internal behavior of the SMMU on receipt of such a broadcast invalidation.

The VMID wildcard controlled by SMMU_CR0.VMW only affects how a VMID is matched on invalidation. The SMMU continues to store all bits of VMID in the TLB entries that require them, and does not allow dissimilar VMID values to alias on lookup.

The SMMU_S_CR0.VMW field provides a second, Secure VMID wildcard feature that works as described in this section, except that it affects Secure VMIDs only.
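The wildcard comparison can be written as a masked equality. In the C sketch below, `ignored_lsbs` is the number of VMID LSBs ignored on invalidation; how SMMU_CR0.VMW encodes that number is not shown here and is left to the register description. The function name is hypothetical, and it models invalidation matching only, because lookups always compare the full VMID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical invalidation-time VMID compare with a wildcard of
 * `ignored_lsbs` low-order bits (0 means an exact match). */
static bool vmid_inv_matches(uint16_t inv_vmid, uint16_t entry_vmid,
                             unsigned ignored_lsbs)
{
    assert(ignored_lsbs < 16);
    uint16_t mask = (uint16_t)~((1u << ignored_lsbs) - 1u);
    return (inv_vmid & mask) == (entry_vmid & mask);
}
```

With `ignored_lsbs == 1`, an invalidation for VMID 0x0020 matches entries tagged 0x0020 and 0x0021 but not 0x0022, as in the example in the Note.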
3.17.7 Broadcast TLB maintenance for GPT information

An SMMU with RME and SMMU_ROOT_IDR0.BGPTM == 1 participates in broadcast TLBI PA instructions from PEs that are executing in EL3. Consistent with the definition for FEAT_RME [2], a TLBI PA to the Outer Shareable shareability domain affects the SMMU. This applies to all SMMUs with RME and SMMU_ROOT_IDR0.BGPTM == 1, regardless of the values of SMMU_IDR0.BTM and SMMU_(*_)CR2.PTM.

Note: The behavior of SMMU_IDR0.BTM and SMMU_(*_)CR2.PTM applies only to broadcast invalidations relating to stage 1 and stage 2 translation.

Note: An SMMU with RME does not have to receive the other broadcast TLBI operations. Support for broadcast operations is indicated in SMMU_IDR0.BTM and configured in SMMU_(*_)CR2.PTM.

Note: The SMMU guarantees the same rules around observability and completion of TLBI PA* and DSB instructions as defined in the RME specification.

3.17.8 TLBInXS maintenance operations

This section applies only when SMMU_IDR0.BTM == 1.

An SMMU that participates in broadcast TLB maintenance operations must correctly interoperate with PEs that support the XS attribute and nXS variants of the TLBI and DSB instructions that were introduced in the Armv8.7-A architecture [2] under the feature name FEAT_XS.

Note: The Armv8.7-A architecture [2] includes the following rules regarding the effects of the nXS qualifier on the TLBI and DSB instructions:
• TLBI instructions might be executed with or without the nXS qualifier.
• DSB instructions might be executed with or without the nXS qualifier.
• The HCRX_EL2.FnXS control bit might override the effect of the nXS qualifier for TLBI instructions issued from EL1.

The requirements for an SMMU to interoperate with PEs that support the XS attribute and nXS variants of the TLBI and DSB instructions are as follows:
• The MAIR encodings 0b0000dd01, 0b01000000, and 0b10100000 remain Reserved, and the XS attribute is taken to be 0 for all MAIR encodings.
• Bit [11] of stage 2 block and page descriptors remains RES0, and the XS attribute is taken to be 0 for all stage 2 translations.
• The SMMU behaves as though the XS attribute for cached translations is 0 when determining the effect of a TLBI or TLBInXS operation.
• For each Security state, if the corresponding SMMU_(*_)CR2.PTM == 0:
  – The SMMU does not complete DSB instructions with the nXS qualifier until the requirements for previously-received TLBInXS operations are complete. This includes the case of TLBI instructions without the nXS qualifier issued from EL1 when HCRX_EL2.FnXS == 1.
  – The SMMU does not complete DSB instructions without the nXS qualifier until the requirements for previously-received TLBI and TLBInXS operations are complete.
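The DSB completion rules for the case where SMMU_(*_)CR2.PTM == 0 can be modeled as a check over the broadcast TLBIs that the SMMU has received but not yet finished. The C sketch is hypothetical: the structure and function names are illustrative, it answers only whether the stated rules permit a DSB to complete, and it does not model the XS attribute of translations, which the SMMU treats as 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pending_tlbi {
    bool nxs;       /* issued as TLBInXS */
    bool from_el1;  /* issued from EL1 */
    bool hcrx_fnxs; /* HCRX_EL2.FnXS was 1 when it was issued */
    bool done;      /* SMMU has completed its requirements */
};

/* A TLBI without nXS from EL1 behaves as TLBInXS when HCRX_EL2.FnXS == 1. */
static bool behaves_as_nxs(const struct pending_tlbi *op)
{
    return op->nxs || (op->from_el1 && op->hcrx_fnxs);
}

/* May the SMMU complete a DSB (with nXS if dsb_nxs) given these operations? */
static bool dsb_may_complete(bool dsb_nxs, const struct pending_tlbi *ops,
                             size_t n)
{
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ops[i].done)
            continue;
        /* DSB without nXS waits for all operations; DSB nXS waits only
         * for operations that behave as TLBInXS. */
        if (!dsb_nxs || behaves_as_nxs(&ops[i]))
            return false;
    }
    return true;
}
```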

3.18 Interrupts and notifications

Events that are recorded to the Event queues, PRI requests and Global errors have associated interrupts to allow asynchronous notification to a PE.

An implementation might support Message Signaled Interrupts (MSIs). An MSI takes the form of a 32-bit data write of a configurable value to a configurable location. In GICv3 systems, this location is typically the GITS_TRANSLATER or GICD_SETSPI_NSR register. For more information, see [7]. When SMMU_S_IDR1.SECURE_IMPL == 1, Arm expects notifications that were generated by Secure events to set Secure SPIs using the GICD_SETSPI_SR register in systems that implement the GICv3 architecture.

An implementation must support wired interrupts, MSIs, or both. Whether an implementation supports MSIs is discoverable from SMMU_IDR0.MSI and SMMU_S_IDR0.MSI. An implementation might support wired interrupt outputs that are edge-triggered. The discovery of support for wired interrupts is IMPLEMENTATION DEFINED.

Arm recommends that an implementation does not support Secure MSIs without also supporting Non-secure MSIs.

An interrupt notification of the presence of new information must not be observable before the new information is also observable. This applies to MSIs and wired interrupts, when:
• A Global error condition arises. The change to the Global error register, GERROR, must be observable if the interrupt is observable.
• New entries are written to an output queue. The presence of the new entries must be observable to reads of the queue index registers if the interrupt is observable. See section 3.5.2 Queue entry visibility semantics.
• A CMD_SYNC completes. The consumption of the CMD_SYNC must be observable to reads of the queue index registers if the interrupt is observable.

Each MSI can be independently configured with a memory type and Shareability. This makes it possible to target a Device MSI target register or a location in Normal memory (that might be cached and shareable). See the SMMU_IDR0.COHACC field, which indicates whether the SMMU and the system support coherent accesses, including MSI writes.

Note: A PE might poll this location or might, for example in Armv8-A PEs, wait for loss of an exclusive reservation that covers an address targeted by the notification. In this example, an Armv8-A PE with an SMMU that is capable of making shared cacheable accesses can achieve the same behavior as a WFE wake-up event notification (see CMD_SYNC) without wired event signals, using MSIs that are directed at a shared memory location.

Note: If the destination of an MSI write is a register in another device, Arm recommends that it is configured with Device-nGnRnE or Device-nGnRE attributes.

The SMMU does not output inconsistent attributes as a result of misconfiguration. Outer Shareable is used as the effective Shareability when Device or Normal Inner Non-cacheable Outer Non-cacheable types are configured.

MSIs that are generated by Secure sources are performed with Secure accesses and target the Secure PA space. MSIs from Non-secure sources are performed with Non-secure accesses and target the Non-secure PA space. Apart from the memory type, Shareability and NS attributes of MSIs, all other attributes of the MSI write are IMPLEMENTATION DEFINED.

A GICv3 Interrupt Translation Service (ITS) differentiates interrupt sources using a DeviceID. To support this, the SMMU does the following:
a) Passes StreamIDs of incoming client device transactions. These generate DeviceIDs in a system-specific manner.
b) Produces a unique DeviceID of its own for outgoing MSIs that originate from the SMMU. This DeviceID does not overlap with those produced for client devices. As with any other MSI-producing Requester, this is set statically in a system-defined manner.

SMMU MSIs are configured with several separate pieces of register state. The MSI destination address, data payload, Shareability, memory type and enables in combination construct the MSI write, onto which the unique DeviceID of the SMMU is attached.
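The Shareability rule for MSIs can be sketched as follows. The enumerations and struct msi_cfg are hypothetical stand-ins, not the architected encodings of the MSI configuration registers. Memory types are simplified to three cases, so mixed inner and outer cacheability is not modeled. The function only shows that Device and Normal Inner Non-cacheable Outer Non-cacheable MSIs are treated as Outer Shareable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encodings for illustration; they are not the architected
 * field values of the SMMU MSI configuration registers. */
enum msi_memtype { MSI_DEVICE, MSI_NORMAL_NC, MSI_NORMAL_WB };
enum msi_sh { MSI_SH_NON, MSI_SH_OUTER, MSI_SH_INNER };

struct msi_cfg {
    uint64_t addr;            /* destination address                */
    uint32_t data;            /* 32-bit payload written to addr      */
    enum msi_memtype memtype; /* Device or Normal memory type       */
    enum msi_sh sh;           /* configured Shareability             */
};

/* Device and Normal Inner/Outer Non-cacheable MSIs use Outer Shareable
 * as the effective Shareability, whatever sh is set to. */
static enum msi_sh msi_effective_sh(const struct msi_cfg *cfg)
{
    assert(cfg->memtype <= MSI_NORMAL_WB);
    if (cfg->memtype == MSI_DEVICE || cfg->memtype == MSI_NORMAL_NC)
        return MSI_SH_OUTER;
    return cfg->sh; /* cacheable Normal memory: configured value is used */
}
```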

Edge-triggered interrupts can be coalesced within the system interrupt controller. The SMMU can internally coalesce events and identical interrupts so that only the latest interrupt is sent, but any coalescence must not significantly delay the notification. This applies to both MSIs and edge-triggered wired interrupts.

When MSIs are not supported, the interrupt configuration register fields that would configure MSI address and data are unused. Only the interrupt Enable field is used.

3.18.1 MSI synchronization

The SMMU ensures that previously-issued MSI writes are completed at the following synchronization points:
• For register-based MSI configuration, the act of disabling an MSI through SMMU_(*_)IRQ_CTRL. See Chapter 6 Memory map and registers.
• A CMD_SYNC ensures completion of MSIs that originate from the completion of prior CMD_SYNC commands that were consumed from the same Command queue.

Completion of an MSI guarantees that the MSI write is visible to its Shareability domain. If an abort response was returned, completion instead ensures that the abort is visible in GERROR with the appropriate SMMU_(*_)GERROR.MSI_*_ABT_ERR flag.

Note: Completion of an MSI terminated with abort sets a GERROR flag but does not guarantee completion of a subsequent GERROR interrupt that might be raised to signal the setting of the flag.

The two synchronization points define a point in time, t, for the respective interrupt sources. MSIs that relate to occurrences before point t do not become visible after point t. In the case of register-based MSI configurations, the additional guarantee is made that MSIs triggered after the MSI is re-enabled use the new configuration. For more information on interrupt enable and synchronization, see SMMU_IRQ_CTRL.

3.18.2 Interrupt sources

The SMMU has the following interrupt sources.
Depending on the implementation, each interrupt source asserts a wired interrupt output that is unique to the source, sends an MSI, or does both.

Source | Trigger reason | Notes
--- | --- | ---
Event queue | Event queue transitions from empty to non-empty | -
Secure Event queue | Event queue transitions from empty to non-empty | -
PRI queue | PRI queue interrupt condition, see SMMU_PRIQ_IRQ_CFG2 | -
Command queues CMD_SYNC | Sync complete, with option of generated interrupt | MSI configuration (destination & data) present in command. See also 3.5.7.7.4 DCMDQ MSIs.
Secure Command queues CMD_SYNC | Sync complete, with option of generated interrupt | MSI configuration (destination & data) present in command. See also 3.5.7.7.4 DCMDQ MSIs.
GERROR | Global error activated in SMMU_GERROR registers | -
S_GERROR | Secure Global error activated in SMMU_S_GERROR registers | -
HDBSS table full | An HDBSS table is full and requires software servicing | This interrupt is triggered when an HDBSS table becomes full. When this interrupt becomes observable, SMMU_HDBSS_PRODn.INDEX reflects that table n is full, where n is the index of the table that became full. Software should then follow the procedure described in FEAT_HDBSS [2] and update the relevant SMMU registers.
Secure HDBSS table full | An HDBSS table is full and requires software servicing | This interrupt is triggered when a Secure HDBSS table becomes full. When this interrupt becomes observable, SMMU_S_HDBSS_PRODn.INDEX reflects that table n is full, where n is the index of the table that became full. Software should then follow the procedure described in FEAT_HDBSS [2] and update the relevant SMMU registers.
HACDBS processing complete | The SMMU has reached the end of the HACDBS structure | This interrupt is triggered when the SMMU has reached the end of the HACDBS structure, and as a result has stopped processing entries.
Secure HACDBS processing complete | The SMMU has reached the end of the HACDBS structure | This interrupt is triggered when the SMMU has reached the end of the Secure HACDBS structure, and as a result has stopped processing entries.
Realm Event queue | Event queue transitions from empty to non-empty | -
Realm PRI queue | Value of SMMU_R_PRIQ_IRQ_CFG2.LO bit | -
Realm Command queues CMD_SYNC | Sync complete, with option of generated interrupt | MSI configuration (destination & data) present in command. See also 3.5.7.7.4 DCMDQ MSIs.
Realm GERROR | Activation of a Realm GERROR | -
Realm HDBSS table full | An HDBSS table is full and requires software servicing | This interrupt is triggered when a Realm HDBSS table becomes full. When this interrupt becomes observable, SMMU_R_HDBSS_PRODn.INDEX reflects that table n is full, where n is the index of the table that became full. Software should then follow the procedure described in FEAT_HDBSS [2] and update the relevant SMMU registers.
Realm HACDBS processing complete | The SMMU has reached the end of the HACDBS structure | This interrupt is triggered when the SMMU has reached the end of the Realm HACDBS structure, and as a result has stopped processing entries.
GPF_FAR | An error becomes active in SMMU_ROOT_GPF_FAR | -
GPT_CFG_FAR | An error becomes active in SMMU_ROOT_GPT_CFG_FAR | -

Each interrupt source can be enabled individually through SMMU_(*_)IRQ_CTRL, if present. If a source is enabled:
• A pulse is asserted on a unique wired interrupt output, if this is implemented.
• An MSI is sent, if MSIs are supported and the MSI configuration of the source enables the sending of an MSI by using an ADDR value that is not zero.

This allows an implementation that supports both MSIs and wired interrupts to use both types concurrently. For example, the Secure programming interface might use wired interrupts (whose source would be enabled, but with the MSI ADDR == 0 to disable MSIs) and the Non-secure programming interface might use MSIs (whose source would be enabled and have MSI address and data configured).

The conditions that cause an interrupt to be triggered are all transient events, and interrupt outputs are effectively edge-triggered. There is no facility to reset the pending state of the interrupt sources.

Where an implementation supports RAS features, additional interrupts might be present. The operation, configuration and assertion of these interrupts have no effect on any of the interrupts listed in this section for normal SMMU usage. See Chapter 12 Reliability, Availability and Serviceability (RAS) for more information on RAS features.

An SMMU with RME has two additional wired interrupts. See section 3.25.5 SMMU behavior if a GPC fault is active for details.
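The per-source enable rules can be sketched as a small decision function. Every struct and function name here is hypothetical. The capability flags stand in for whatever the implementation reports, and the function shows only the rule that an enabled source pulses its wired output, if one exists, and sends an MSI only when MSIs are supported and ADDR is not zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source state; names are illustrative only. */
struct irq_source {
    bool enabled;      /* this source's enable in SMMU_(*_)IRQ_CTRL */
    uint64_t msi_addr; /* this source's MSI ADDR; 0 disables its MSI  */
};

/* Hypothetical implementation capabilities. */
struct irq_caps {
    bool wired_implemented; /* a unique wired output exists for the source */
    bool msi_supported;     /* MSIs are supported                          */
};

struct irq_action {
    bool pulse_wired; /* assert a pulse on the wired output */
    bool send_msi;    /* perform the MSI write              */
};

/* Decide how a triggered source is signaled. Both outputs can fire for
 * the same event when both are implemented and configured. */
static struct irq_action irq_on_trigger(const struct irq_source *src,
                                        const struct irq_caps *caps)
{
    struct irq_action act = { false, false };
    assert(src != NULL && caps != NULL);
    if (!src->enabled)
        return act;
    act.pulse_wired = caps->wired_implemented;
    act.send_msi = caps->msi_supported && src->msi_addr != 0;
    return act;
}
```

With both output types implemented, a source configured like the Secure example (enabled, ADDR == 0) produces only a wired pulse. A source configured like the Non-secure example also sends an MSI.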

3.19 Power control

An implementation might support IMPLEMENTATION SPECIFIC automatic power saving techniques, for example power and clock gating, or retention states during idle periods of normal operation. All use of these techniques is functionally invisible to devices and software. If implemented, these automatic powerdown or retention states:
• Might retain all valid cache contents or might cause loss of cached information.
• Do not allow undefined cache contents to become valid, valid cache contents to change, or otherwise corrupt any SMMU state.
• Seamlessly operate with wake on demand behavior in the event of incoming device transactions.

Alternatively, a system might allow an SMMU to be powered off at the request of system software, in an IMPLEMENTATION DEFINED manner. This state is requested when no further SMMU operation is required by the system. Software must not make accesses to the SMMU programming interface in this state. If more than one Security state is supported, this power off state is not entered unless privileged software from every Security state that configures or uses the SMMU requests or agrees to a powerdown.

Such a power off represents a complete loss of state and functionality, so this state must only be used when all client devices and interconnect are quiescent. Software must disable client device DMA and ensure that any SMMU commands, invalidations and transactions from client devices that are in progress are complete before requesting powerdown. If any existing transactions are in a stalled state at the time of the powerdown, they must be terminated with an abort. The behavior when a transaction arrives at the SMMU after the powerdown state is entered is UNPREDICTABLE.

On an IMPLEMENTATION DEFINED wakeup event, the SMMU must be reset and the return of the SMMU to software control is signaled through an IMPLEMENTATION DEFINED mechanism. The SMMU is then in a state consistent with a full reset, and the SMMU registers must be re-initialized before client devices can be enabled.

3.19.1 Dormant state

Implementations might provide automatic powerdown modes during idle periods in which SMMU registers are accessible but internal structures might be powered down. An implementation might provide a hint to software, through the SMMU_STATUSR.DORMANT flag, that it contains no cached configuration or translation information, possibly because of cache powerdown. Software can use this flag to determine that no structure or TLB invalidation is required and avoid issuing maintenance commands.

When SMMU_STATUSR.DORMANT == 1, the SMMU guarantees that:
• No caches of any structures or translations are present.
• Any lookup that requires configuration or translation information accesses the configuration structures or translation tables in memory.
• No prefetch of any configuration or translation data is in progress.
• If any structures or translations were altered in memory, no stale version is used by the SMMU.

Software can make use of this flag by:
1. Altering translations or configuration structure data.
2. Testing the flag:
   • If the flag is 0, issuing invalidation commands or broadcast invalidation messages to invalidate any potentially-cached copies.
   • If the flag is 1, avoiding invalidation of the altered structure.

An implementation is not required to support this hint, and software is not required to take note of this hint.
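The two-step sequence above can be sketched as follows. The bit position of DORMANT is an assumption made for illustration and must be checked against the SMMU_STATUSR description in Chapter 6. The struct dormant_hooks and its callbacks are hypothetical stand-ins for a driver's register read and invalidation path. Any barrier needed between the memory update and the register read is also assumed to be handled by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: DORMANT is bit 0 of SMMU_STATUSR; confirm against the
 * register description in Chapter 6. */
#define SMMU_STATUSR_DORMANT (1u << 0)

/* Step 2: decide from a value read from SMMU_STATUSR whether
 * invalidation is needed. */
static bool needs_invalidation(uint32_t statusr)
{
    return (statusr & SMMU_STATUSR_DORMANT) == 0;
}

/* Hypothetical driver hooks: a read of SMMU_STATUSR and the driver's
 * invalidation path (CMD_CFGI_* or CMD_TLBI_* commands followed by
 * CMD_SYNC, or broadcast invalidation). */
struct dormant_hooks {
    uint32_t (*read_statusr)(void);
    void (*invalidate)(void);
};

/* Call after step 1 (altering structures or translations in memory).
 * Returns true if invalidation was issued. */
static bool after_structure_update(const struct dormant_hooks *h)
{
    assert(h->read_statusr && h->invalidate);
    if (!needs_invalidation(h->read_statusr()))
        return false; /* DORMANT == 1: no stale copy can be used */
    h->invalidate();  /* DORMANT == 0: invalidate cached copies  */
    return true;
}
```

Because software is not required to use the hint, always taking the invalidation path is also a valid, if slower, choice.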

3.20 TLB and configuration cache conflict

3.20.1 TLB conflict

A programming error might cause overlapping or otherwise conflicting TLB entries to be generated in the SMMU. When an incoming transaction matches more than one TLB entry, this is an error.

An implementation is not required to detect any or all TLB conflict conditions, but Arm recommends that an implementation detects TLB conflict conditions wherever possible. If an implementation detects a TLB conflict, all of the following apply:
• It aborts the transaction that caused the lookup that resulted in conflict.
• It attempts to record an F_TLB_CONFLICT event. The F_TLB_CONFLICT event contains IMPLEMENTATION DEFINED fields that might include diagnostic information that exposes IMPLEMENTATION SPECIFIC TLB layout.

If an implementation does not detect a TLB conflict experienced by a transaction, behavior is UNPREDICTABLE, with the restriction that a transaction cannot access a physical address to which the configuration of a stream does not explicitly grant access. A TLB conflict never enables transactions to do any of the following:
• Match a TLB entry tagged with a different VMID to that under which the lookup is performed.
• Match a TLB entry tagged with a different Security state to that under which the lookup is performed.
• Match a TLB entry tagged with a different StreamWorld to that under which the lookup is performed.
• When stage 2 is enabled, access any physical address outside of the set of PAs configured in the stage 2 translation tables that a given transaction is configured to use.

A failure to invalidate the TLB by code running at a particular level of privilege never allows a device under control of that level of privilege to access regions of memory with permissions or attributes that could not be achieved at that same level of privilege.

Note: For example, a stream configured with StreamWorld == NS-EL1 must never be able to access addresses using TLB entries tagged with a different VMID, or tagged as Non-secure EL2, Secure EL2, EL3, or Secure.

A TLB conflict caused by a transaction from one stream must not cause traffic for different streams with other VMID, StreamWorld, or Security configurations to be terminated. Arm recommends that an implementation does not cause a TLB conflict to affect traffic for other ASIDs within the same VMID configuration.

3.20.2 Configuration cache conflicts

All configuration structures match a fixed-size lookup span of one entry, with the exception of the STE. The STE contains a CONT field that allows a contiguous span of STEs to be represented by one cache entry. A programming error might cause an STE to be cached with a span that covers an existing cached STE, which results in an STE lookup matching more than one STE.

An implementation is not required to detect any or all configuration cache conflict conditions, but Arm recommends that an implementation detects conflict conditions wherever possible. If an implementation detects a configuration cache conflict, all of the following apply:
• The transaction that caused the lookup that resulted in conflict is aborted.
• The SMMU attempts to record an F_CFG_CONFLICT event. The F_CFG_CONFLICT event contains IMPLEMENTATION DEFINED fields that might include diagnostic information that exposes IMPLEMENTATION SPECIFIC cache layout.

If an implementation does not detect a conflict experienced by a transaction, behavior is UNPREDICTABLE. A configuration cache conflict cannot cause an STE to be treated as though it is associated with a different Security state.
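TLB organization is IMPLEMENTATION SPECIFIC and conflict detection is not mandatory, so the following is only a software model of the rules above. The simplified struct tlb_entry, tags_match and tlb_lookup are hypothetical. The sketch shows that an entry is eligible only when its VMID, Security state and StreamWorld tags equal those of the lookup, and that two or more eligible entries covering the address count as a conflict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry. A real TLB also tags by ASID and
 * other state; the layout here is illustrative only. */
struct tlb_entry {
    bool valid;
    uint16_t vmid;
    uint8_t sec;         /* Security state tag         */
    uint8_t streamworld; /* StreamWorld tag            */
    uint64_t va_base;    /* first address covered       */
    uint64_t size;       /* bytes covered by the entry  */
};

enum lookup_result { TLB_MISS, TLB_HIT, TLB_CONFLICT };

/* An entry may only match when every tag equals the lookup's tags, so a
 * conflict never widens matching across VMID, Security state or
 * StreamWorld. */
static bool tags_match(const struct tlb_entry *e, uint16_t vmid,
                       uint8_t sec, uint8_t sw)
{
    return e->valid && e->vmid == vmid && e->sec == sec &&
           e->streamworld == sw;
}

static enum lookup_result tlb_lookup(const struct tlb_entry *tlb, size_t n,
                                     uint16_t vmid, uint8_t sec, uint8_t sw,
                                     uint64_t va, size_t *hit_index)
{
    size_t matches = 0;
    assert(tlb != NULL && hit_index != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct tlb_entry *e = &tlb[i];
        if (tags_match(e, vmid, sec, sw) && va >= e->va_base &&
            va - e->va_base < e->size) {
            if (matches++ == 0)
                *hit_index = i; /* remember the first match */
        }
    }
    if (matches == 0)
        return TLB_MISS;
    /* On TLB_CONFLICT the transaction would be aborted and an
     * F_TLB_CONFLICT event recorded; neither step is modeled here. */
    return matches == 1 ? TLB_HIT : TLB_CONFLICT;
}
```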

3.21 Structure access rules and update procedures

3.21.1 Translation tables and TLB invalidation completion behavior

Translation table walks, caching, TLB invalidation, and invalidation completion semantics match those of Armv8-A [2], including rules on prefetch and caching of valid translations only. If intermediate translation table data is cached in the SMMU (a walk cache), this data is invalidated during appropriate TLB maintenance operations in the same way as it would be on a PE.

Explicit TLB invalidation maintains TLB entries when the translation configuration has changed, to ensure visibility of the new configuration. Translation configuration is the collective term for translation table descriptors and the set of SMMU configuration information that is permitted to be cached in the TLB. This maintenance is performed from a PE using TLBI broadcast invalidation or using explicit CMD_TLBI_* commands.

A broadcast TLB invalidation operation becomes visible to the SMMU after a Shareable TLBI instruction is executed by a PE in a common Shareability domain. A command TLB invalidation operation becomes visible to the SMMU after it is consumed from the Command queue.

A TLB invalidation operation is complete after all of the following become true:
• All TLB entries targeted by the scope of the invalidation have been invalidated.
• Any relevant HTTUs are globally visible to their Shareability domain as set out in section 3.13.4 HTTU behavior summary.
• No accesses can become visible to their Shareability domain using addresses or attributes that are not described by the translation configuration, as observed after the invalidation operation became visible. This means that invalidation completes after:
  – All translation table walks that could, prior to the start of the invalidation, have formed TLB entries that were targeted by the invalidation are complete, so that all accesses to any fetched levels of the translation table are globally visible to their Shareability domain. This applies to a translation table walk performed for any reason, including:
    * A translation table walk that makes use of walk caches that are targeted by the invalidation.
    * A stage 2 translation table walk that is performed because of a stage 1 descriptor fetch, CD fetch or L1CD fetch.
    Note: To achieve this, a translation table walk might be stopped early and the partial result discarded.
  – SMMU-originated accesses that were translated using TLB entries that were targeted by the invalidation are globally visible to their Shareability domain. These accesses are stage 1 descriptor accesses, CD fetches or L1CD fetches.
  – Where a stage 2 invalidation targets TLB entries that might have translated a stage 1 descriptor access, the stage 1 descriptor access is required to be globally visible by the time of the invalidation completion. However, neither the overall stage 1 translation table walk nor the operation that caused the stage 1 translation table walk is required to be globally visible. Otherwise, for invalidations with a stage 1 scope or a combined stage 1 and stage 2 scope, all client device transactions that were translated using any of the TLB entries that were targeted by the invalidation are globally visible to their Shareability domain.
  – The result of an ATOS operation cannot be based on addresses or attributes that are not described by translation configuration that could have been observed after the invalidation operation became visible.

Note: An in-progress translation table walk (performed for any reason, including prefetch) can be affected by a TLB invalidation if the TLB invalidation could have invalidated a cached intermediate descriptor that was previously referenced as part of the walk. The completion of a TLB invalidation ensures that a translation table walk that could have been affected by the TLB invalidation is either:
• Fully complete by the time the TLB invalidation completes.
• Stopped and restarted from the beginning.

Note: This ensures that old or invalid pointers to translation sub-tables are never followed after a TLB invalidation (whether broadcast or CMD_TLBI_*) is complete.

Completion of a TLB invalidation means the point at which a broadcast invalidation sync completion is returned to the system (for example, on AMBA interconnect, completion of a DVM Sync). For CMD_TLBI_* invalidations, it means the completion of a later CMD_SYNC command.

Note: The architecture states that where a translation table walk is affected by a TLB invalidation, one option is that the walk is stopped by the completion of the invalidation. An implementation must give this appearance, which means that no side effects of doing otherwise can ever be observed. However, after an invalidation completion that affects prior translation table reads made before the invalidation, an implementation is permitted to make further fetches of a translation table walk if, and only if, it is guaranteed that these reads:
• Have no effect on the SMMU or the rest of the system.
• Are not made to addresses with read side effects.
• Will not affect the architectural behavior of the system.

TLB entries (pertinent to a Security state) are not inserted when SMMU_(*_)CR0.SMMUEN == 0.

3.21.1.1 Translation tables update procedure

When altering a translation table descriptor, the A-profile architecture [2] before v8.4-A, and SMMUv3 architectures before SMMUv3.2, require a break-before-make procedure for:
• Changes to memory type.
• Changes to Cacheability attributes.
• Changes to output address.
• Changes to block or page size.
• Creating a global entry where there might be non-global entries in a TLB that overlap the global entry.
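A break-before-make update can be sketched as three ordered steps: make the descriptor invalid, complete a TLB invalidation, then write the new descriptor. The sketch below records those steps in a trace so that the ordering is visible. The names bbm_update, struct bbm_trace and the op values are hypothetical, and the invalidation step stands for whatever mechanism the system uses. The sketch assumes VMSAv8-64 descriptors, in which bit [0] == 0 marks a descriptor invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace used to show ordering only. A real driver would issue
 * CMD_TLBI_* followed by CMD_SYNC (or a broadcast TLBI followed by DSB)
 * where this sketch records OP_TLBI_SYNC. */
enum bbm_op { OP_WRITE_INVALID, OP_TLBI_SYNC, OP_WRITE_NEW };

struct bbm_trace {
    enum bbm_op ops[8];
    size_t n;
};

static void record(struct bbm_trace *t, enum bbm_op op)
{
    assert(t->n < sizeof t->ops / sizeof t->ops[0]);
    t->ops[t->n++] = op;
}

/* Break-before-make: the old descriptor is made invalid and the TLB
 * invalidation completes before the new descriptor is written, so no TLB
 * can hold the old and the new translation at the same time. */
static void bbm_update(volatile uint64_t *desc, uint64_t new_desc,
                       struct bbm_trace *t)
{
    *desc = 0; /* bit [0] == 0 marks the descriptor invalid */
    record(t, OP_WRITE_INVALID);
    /* Real code also needs a barrier here so that the SMMU observes the
     * invalid descriptor before the invalidation is issued. */
    record(t, OP_TLBI_SYNC);
    *desc = new_desc;
    record(t, OP_WRITE_NEW);
}
```

Between the first and last steps, accesses through this descriptor can take translation faults. This is the disruption that the BBML levels are designed to avoid.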
Note: For example, to split a block into constituent granules (or to merge a span of granules into an equivalent block), the A-profile architecture [2] requires three steps. The region is made invalid, a TLB invalidation is performed, and then the region is given the new configuration.

Note: The requirement for a break-before-make sequence can cause problems for unrelated I/O streams that might use addresses overlapping a region of interest. This is because the I/O streams cannot always be conveniently stopped and might not tolerate translation faults. It is therefore advantageous to perform live update of a block into smaller translations, or of a set of translations into a larger block size.

The Armv8.4 [2] architecture offers three levels of support when changing block size without changing any other parameters that are listed as requiring use of break-before-make. These are described here as Level 0, 1 and 2, where:
• Level 0 is equivalent to the requirements when the PE feature FEAT_BBML1 is not implemented.
• Level 1 is equivalent to the requirements introduced by the PE feature FEAT_BBML1.
• Level 2 is equivalent to the requirements introduced by the PE feature FEAT_BBML2.

Implementations of SMMUv3.2 or later are required to support Level 1 or Level 2 behavior, as indicated by SMMU_IDR3.BBML.

Note: Arm recommends that an implementation supports Level 2 behavior for performance reasons.

Note: The features FEAT_BBML1 and FEAT_BBML2 permit the PE to report a TLB conflict abort in a wider range of scenarios than are permitted by SMMUv3.2.

Note: An implementation of SMMUv3.2 or later must support Level 1 or 2 and is subject to stricter requirements regarding TLB conflict abort. As a result, it guarantees that a mechanism is available to change block or page size without interrupting I/O streams with a fault.

The Armv8.4 feature FEAT_BBML1 adds a new bit, ‘nT’, at bit [16] in Block translation descriptors. An implementation of SMMUv3.2 or later supports this bit in the same way as the PE, depending on the BBML level as described below. The nT bit allows a valid Block descriptor to be used for translation but prevents it from being cached in a way that can cause a TLB conflict with existing TLB entries.

For VMSAv9-128, an ‘nT’ bit is added to stage 1 and stage 2 Table descriptors that have SKL != 0b00, in order to support changing between larger and smaller table sizes.

In an SMMU with SMMU_IDR5.D128 == 1, the BBM level support indicated in SMMU_IDR3.BBML also applies to the SMMU’s support for changing the size of translation tables.

The SMMUv3 architecture requirements in a TLB conflict scenario are not affected by BBML level.

Note: When multiple system components, whether SMMU, PE or other, share one translation table, behavior according to the lowest common break-before-make Level must be used when updating the table.

3.21.1.2 When SMMU_IDR3.BBML == 1 (Level 1)

The implementation requires software to use the nT bit when changing translation size without using a break-before-make procedure. An F_TLB_CONFLICT fault might occur if a translation table update is made without using either break-before-make or the nT bit. Setting nT == 1 does not cause a fault.

Note: A Translation Fault is a permitted behavior for a PE implementing FEAT_BBML1 with nT == 1, but this is prohibited in a Level 1 SMMU.

A Block descriptor having nT == 1 is not cached in a way that will cause a TLB conflict.

Note: One interpretation of the nT bit is to prevent the caching of a translation when nT == 1. This might significantly impact translation performance for the lifetime of the translation table entry.

In a Level 1 SMMU, a change to only the Contiguous bit, bit [52] in the descriptor, at either block or page level and with other properties unchanged, does not lead to a TLB conflict fault. When the Armv8-A requirements for use of the Contiguous bit are followed, a change to the Contiguous bit can be performed without using a break-before-make procedure. For a Block descriptor, such a change also does not require the nT bit.

Note: Example implementation styles include ignoring the Contiguous bit at all levels, or reconciling the output of any overlapping TLB entries that might result.
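Software that wants to use the Contiguous-bit exception can first confirm that an update touches nothing else. The helper below is a hypothetical sketch of that check. It compares only descriptor bits and does not verify the separate Armv8-A rules for setting the Contiguous bit across an aligned group of entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_CONTIG (1ull << 52) /* Contiguous bit, bit [52] */

/* True if old_desc and new_desc differ only in the Contiguous bit. In a
 * Level 1 SMMU such a change needs neither break-before-make nor nT,
 * provided the Armv8-A Contiguous-bit rules are followed (not checked). */
static bool only_contig_changed(uint64_t old_desc, uint64_t new_desc)
{
    return (old_desc ^ new_desc) == DESC_CONTIG;
}
```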
Note: A change from a block translation to an equivalent span of page translations can be performed in four steps:
1. Change the nT bit of the Block descriptor from 0 to 1.
2. Perform TLB invalidation of the block.
3. Replace the Block descriptor with a Table descriptor to a next-level table containing equivalent page translations.
4. Perform TLB invalidation of the affected range.

Note: A change from a span of page translations to an equivalent block translation can be performed in four steps:
1. Change a mid-level Table descriptor to a Block descriptor having nT == 1.
2. Perform TLB invalidation of the affected range.
3. Update the nT bit of the Block descriptor from 1 to 0.
4. Perform TLB invalidation of the block.

Note: The use of the nT bit in these procedures ensures that a TLB multi-match scenario cannot arise.

3.21.1.3 When SMMU_IDR3.BBML == 2 (Level 2)

The implementation ignores the nT bit in the Block descriptor, and a change to a translation size can be performed without using break-before-make and without using the nT bit. The implementation automatically resolves any TLB multi-hit scenarios, and an F_TLB_CONFLICT fault does not occur.

If a change is made to the size of a valid translation without first making the translation invalid, then:
• A TLB conflict does not occur and F_TLB_CONFLICT is never reported.
• All of the following apply for translations that might discover multiple matching TLB entries for an address:
  – They are translated using information from at most one of the matching entries.
  – They do not experience a fault that would not otherwise be possible using the translation table descriptor state from either before or after the update.
• The result of a translation:
  – Does not combine information from multiple matching TLB entries.
  – Does not combine information from the state of a descriptor both before and after the update.
  – Does not contain information that was not present in a valid descriptor.
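The Level 1 procedure and the Level 2 rules lead to different step sequences for splitting a block into equivalent pages. Level 1 uses nT, while Level 2 allows the block to be replaced directly and the range invalidated. The sketch below only lists those steps. The names enum split_step, split_block_plan and desc_set_nt are hypothetical. Each invalidation step is assumed to include its completion, for example a CMD_SYNC after CMD_TLBI_*. The nT position used is bit [16] of a VMSAv8-64 Block descriptor, and VMSAv9-128 descriptors are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_NT (1ull << 16) /* nT bit in a VMSAv8-64 Block descriptor */

enum split_step {
    STEP_SET_NT,      /* Block descriptor nT: 0 -> 1            */
    STEP_TLBI_BLOCK,  /* invalidate the block translation        */
    STEP_WRITE_TABLE, /* replace Block with a Table descriptor   */
    STEP_TLBI_RANGE   /* invalidate the affected address range   */
};

/* Fills steps[] with the sequence for splitting a block into equivalent
 * page translations without break-before-make and returns the number of
 * steps. bbml is the value of SMMU_IDR3.BBML (1 or 2). */
static size_t split_block_plan(unsigned bbml, enum split_step steps[4])
{
    size_t n = 0;
    assert(bbml == 1 || bbml == 2);
    if (bbml == 1) { /* Level 1: nT must be set and the block invalidated */
        steps[n++] = STEP_SET_NT;
        steps[n++] = STEP_TLBI_BLOCK;
    }
    steps[n++] = STEP_WRITE_TABLE;
    steps[n++] = STEP_TLBI_RANGE;
    return n;
}

/* Returns block_desc with nT set, as written in STEP_SET_NT. */
static uint64_t desc_set_nt(uint64_t block_desc)
{
    return block_desc | DESC_NT;
}
```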

Chapter 3. Operation 3.21. Structure access rules and update procedures Note: An implementation might achieve this behavior by resolving a TLB lookup that has multiple matches by choosing zero or one of the results, or by causing an invalidation of all of the matching entries followed by a re-fetch of the translation. A TLB invalidation operation removes all matching TLB entries even if overlapping entries exist for a given address. The rules on TLB invalidation or the atomicity of a descriptor are not affected by BBML level. The behavior and usage requirements of the Contiguous bit in a Level 2 implementation is the same as in a Level 1 implementation. A change to the Contiguous bit can be performed without using break-before-make and without using the nT bit, and does not lead to a TLB conflict fault. Note: Arm expects that a Level 2 implementation will also automatically resolve TLB multi-hit scenarios that might arise from a change to a Contiguous bit and recommends that the Contiguous bit is not ignored. Note: Arm expects that the only change that is made to a valid translation table descriptor is that of changing the page or block size of a range of addresses. A change to memory type, Cacheability, nG or output address might impair coherency or ordering semantics of accesses using the translation. Note: A change from one set of translations to another equivalent set by changing the translation size can be performed by replacing a block or range of pages with equivalent translations of a different size, followed by TLB invalidation of the affected range. 3.21.2 Queues Note: In this section, the term Command queue refers to any type of command queue. The SMMU does not write to the Command queue. The SMMU writes to the PRI and Event queues. Arm expects the PRI queue and Event queue to be read but not modified by the agent controlling the SMMU. 
Writes to the Command queue do not require any SMMU action to ensure that the SMMU observes the values written, other than a write of the PROD register of the Command queue that causes the written command entries to be considered valid for SMMU consumption. If the SMMU internally caches Command queue entries, no other explicit maintenance of this cache is required.

Arm expects that the SMMU is configured to read the queue from the required Shareability domain in at least an IO-coherent manner, or that both the SMMU and other entities make non-cached accesses to the queue, so that Cache Maintenance Operations are not required.

To issue commands to the SMMU, the agent submitting the commands:
1. Determines (using PROD/CONS indexes) that there is space to insert commands.
2. Writes one or more commands to the appropriate location in the queue.
3. Performs a DSB operation to ensure observability of data written in step (2) before the update in step (4).
4. Updates the Command queue’s PROD index register to publish the new commands to the SMMU.

Software is permitted to write any entry of the Command queue that is in an empty location between CONS and PROD indexes. The SMMU might read and internally cache any command that is in a full location between PROD and CONS indexes. If a command is cached, the cache is not required to be coherent with PE caches, but if it is not coherent the following rules apply:
• When the SMMU stops processing commands because of a Command queue error, or when the queue is disabled, the SMMU invalidates all commands that it might have cached.
• A cached command must only be consumed one time, and no stale cached value can be used instead of a new value when the queue location is later reused for a new command.

Note: The first rule means software can fix up or replace commands in the queue after an error, or while the queue is disabled, without performing any synchronization other than restarting command processing.
Software must not alter memory locations representing commands previously submitted to the queue until those commands have been consumed, as indicated by the CONS index, and must not assume that any alteration to a command in a full location will be observed by the SMMU.
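The four submission steps can be modelled in software. The following Python sketch is a simplified model, not driver code. It assumes that PROD and CONS each hold a log2size-bit index with a single wrap bit directly above it, and it ignores every other register field. A counter stands in for the DSB. The class and method names are hypothetical.

```python
# Simplified model of Command queue submission (not real driver code).
# Assumption: PROD and CONS each hold a log2size-bit index with one wrap bit
# directly above it; all other register fields are ignored.
from dataclasses import dataclass, field


@dataclass
class CmdQueueModel:
    log2size: int
    prod: int = 0        # modelled PROD value (wrap bit | index)
    cons: int = 0        # modelled CONS value, advanced by the SMMU
    barriers: int = 0    # stands in for the DSB operations issued
    entries: list = field(init=False)

    def __post_init__(self):
        self.entries = [None] * (1 << self.log2size)

    def advance(self, value, n=1):
        # Consistent increment: passing the last slot resets the index and
        # toggles the wrap bit, so the combined value counts mod 2 * size.
        return (value + n) % (2 << self.log2size)

    def used_slots(self):
        return (self.prod - self.cons) % (2 << self.log2size)

    def free_slots(self):
        return (1 << self.log2size) - self.used_slots()

    def submit(self, cmds):
        # Step 1: use PROD/CONS to check that there is space.
        if len(cmds) > self.free_slots():
            return False
        # Step 2: write the commands into empty slots starting at PROD.
        p = self.prod
        for cmd in cmds:
            self.entries[p & ((1 << self.log2size) - 1)] = cmd
            p = self.advance(p)
        # Step 3: barrier so the writes are observable before PROD moves.
        self.barriers += 1
        # Step 4: publish the new commands by updating PROD.
        self.prod = p
        return True
```

PROD is only ever moved through `advance`, so every update is a consistent increment with wrapping. Writing CONS on an output queue would follow the same rule.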

Software must only write the CONS index of an output queue (Event queue or PRI queue) in a consistent manner, with the appropriate incrementing and wrapping, unless the queue is disabled. If this rule is broken, for example by writing the CONS index with a smaller value or an incorrectly-wrapped index, the queue contents are UNKNOWN.

Software must only write the PROD index of the Command queue in a consistent manner, with the appropriate incrementing and wrapping, unless the queue is disabled or a command error is active, see section 7.1 Command queue errors. If this rule is broken, one of the following CONSTRAINED UNPREDICTABLE behaviors occurs:
• The SMMU executes one or more UNPREDICTABLE commands.
• The SMMU stops consuming commands from the Command queue until the queue is disabled and re-enabled.

3.21.3 Configuration structures and configuration invalidation completion

The entries of the configuration structures, the Stream table and Context Descriptors, all contain fields that determine their validity. An SMMU might read any entry at any time, for any reason.

STEs and CDs contain a valid flag, V. A structure is considered valid only when the SMMU observes that it contains V == 1 and no configuration inconsistency in its fields causes it to be considered ILLEGAL.

Some structures contain pointers to subsequent tables of structures (STE.S1ContextPtr, L1STD.L2Ptr and L1CD.L2Ptr). If a structure is invalid, the pointers within it are invalid. The SMMU does not follow invalid pointers, whether speculatively or in response to an incoming transaction.

STEs in a linear Stream table and L1ST descriptors in a multi-level Stream table are located through the SMMU_(*_)STRTAB_BASE address. Entries in these tables are not fetched if SMMU_(*_)CR0.SMMUEN == 0, because the base pointer is not guaranteed to be valid. These base pointers must be valid when the corresponding SMMUEN == 1.
Configuration cache entries associated with a Security state are not inserted when, for that Security state, SMMU_(*_)CR0.SMMUEN == 0.

Similarly, CDs or L1CDs are located through the STE.S1ContextPtr and L1CD.L2Ptr pointers. A CD must never be fetched or prefetched unless indicated from a valid STE, meaning that STE.S1ContextPtr is valid and therefore the STE enables stage 1.

Note: A particular area of memory is only considered to be an STE or CD because a valid pointer of a certain type points to it (the SMMU_(*_)STRTAB_BASE or L1STD.L2Ptr or STE.S1ContextPtr or L1CD.L2Ptr pointers respectively). A CD cannot be prefetched from an address that is not derived directly from the CD table configuration in a valid STE, as an area of memory is not a CD unless a valid STE or L1CD.L2Ptr points to it. Similarly, an L1CD is not actually an L1CD structure unless a valid STE points to it.

A structure is said to be reachable if a valid pointer is available to locate the structure. Depending on the structure type, the pointer might be a register base address or a pointer within a precursor structure (either in memory or cached). When SMMUEN == 0, no configuration structures are reachable. Otherwise:
• An STE is reachable if it is within the table given by the base and size indicated in the SMMU_(*_)STRTAB_BASE registers for a linear Stream table, or if it is within the 2nd-level table indicated by a valid L1STD base and span for a two-level Stream table.
• An L1ST descriptor in a two-level Stream table is reachable if it is within the first-level table indicated by the base and size set in the SMMU_(*_)STRTAB_BASE registers.
• A CD is reachable if it is within the table given by the base and size indicated by a valid stage 1 S1ContextPtr and S1CDMax of a valid STE for a linear CD table, or if it is within the 2nd-level table indicated by a valid L1CD base and span for a two-level CD table.
• A VMS is reachable if the VMS is enabled for an STE.
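The STE reachability rules can be expressed as an address calculation that refuses to produce an address for an unreachable entry. The following is a minimal Python sketch. It assumes 64-byte STEs, and it assumes that a two-level table splits the StreamID at a `split` bit position, with the upper bits selecting the L1 descriptor. The L1STD span is abstracted as a plain count of STEs and is not the architected encoding. `L1STD` and `ste_address` are hypothetical names.

```python
# Hypothetical model of STE reachability for linear and two-level Stream
# tables. Assumes 64-byte STEs; the L1STD span is modelled as a plain count
# of STEs in its second-level table, not the architected encoding.
from dataclasses import dataclass
from typing import List, Optional

STE_BYTES = 64


@dataclass
class L1STD:
    valid: bool
    l2_base: int = 0
    num_stes: int = 0  # STEs covered by the second-level table


def ste_address(smmuen: bool, sid: int, *, linear_base: int = 0,
                linear_entries: int = 0, split: Optional[int] = None,
                l1: Optional[List[L1STD]] = None) -> Optional[int]:
    """Return the STE address for `sid`, or None if it is unreachable."""
    if not smmuen:
        return None  # SMMUEN == 0: no configuration structure is reachable
    if split is None:  # linear Stream table
        if sid >= linear_entries:
            return None  # outside the configured table
        return linear_base + sid * STE_BYTES
    # Two-level table: upper StreamID bits select an L1 descriptor.
    l1_index, l2_index = sid >> split, sid & ((1 << split) - 1)
    if l1 is None or l1_index >= len(l1):
        return None  # outside the first-level table
    desc = l1[l1_index]
    if not desc.valid or l2_index >= desc.num_stes:
        return None  # the walk does not progress past an invalid L1STD
    return desc.l2_base + l2_index * STE_BYTES
```

Because the function returns `None` instead of an out-of-range address, it also reflects the rule that no fetch is generated outside the configured bounds of a table.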
An implementation does not fetch an unreachable structure. A walk of the tree of configuration tables does not progress beyond any invalid structure. An implementation is permitted to fetch or prefetch any reachable structure at any time, as long as the generated address lies within the bounds of the table containing the structure.

An implementation is permitted to cache any successfully fetched or prefetched configuration structure, whether marked as valid or not, in its entirety or partially. That is:

• Any STE or L1STD within the STE table (given by the base, size and intermediate table spans if appropriate) can be fetched and cached.
• Any CD or L1CD within a CD table can be fetched and cached.

When fetching a structure in response to a transaction, an implementation might read and cache more data than the required structures, as long as the limits of the tables are respected.

Note: Any change to a structure must be followed by the appropriate structure invalidation command to the SMMU, even if the structure was initially marked invalid.

Note: An unreachable structure cannot be fetched, because there is no valid pointer to it. However, a structure might be cached if it was fetched while the structure was reachable, even if it is subsequently made unreachable. For example, a valid STE could remain cached and later be used after the SMMU_(*_)STRTAB_BASE registers are altered. Software must perform configuration cache maintenance upon changing configuration that might make structures unreachable.

A structure that is not actually fetched, such as a CD/L1CD, STE/L1STD or VMS that experiences an external abort (F_CD_FETCH, F_STE_FETCH or F_VMS_FETCH) or a CD/L1CD that fails stage 2 translation, does not cause knowledge of the failure to be cached. A future access of the structure must attempt to re-fetch the structure without requiring an explicit configuration structure invalidation command before retrying the operation that caused the initial structure fetch.

Note: For example, where stage 2 is configured to stall, to progress a transaction that causes a CD fetch that in turn causes a stage 2 Translation-related fault (an event with Stall == 1), it is sufficient to:
1. Resolve the cause of the translation fault, for example by writing a Translation table entry.
2. Issue a TLB invalidation operation, if required by the translation table alteration.
3. Issue a CMD_RESUME, giving the StreamID and STAG appropriate to the event record.

An implementation must not:
• Read any address outside of the configured range of any table.
– Speculative access of reachable structures is permitted, but address speculation outside of configured structures is not permitted.
• Cache any structure under a different type from the table from which it was read. For example, it must not follow the pointer of an STE to a CD and cache that CD (or any adjacent CD in the CD table) as anything other than a CD, for example as a translation table entry.

Software must ensure it only configures tables that are wholly contained in Normal memory.

A configuration invalidation operation completes after all of the following become true:
• All configuration cache entries targeted by the invalidation have been invalidated.
• No accesses can become visible to their Shareability domain using addresses or attributes that could not result from the configuration structures as observed after the invalidation operation became visible. This means that invalidation completes after:
– Any client device transactions that used configuration cache entries that were targeted by the invalidation are globally visible to their Shareability domain.
– Any configuration structure walks that used configuration cache entries that were targeted by the invalidation are complete, so that all accesses to any fetched levels of the structures are globally visible to their Shareability domain. This applies to a configuration structure walk performed for any reason, including a configuration structure walk performed because of a prefetch, command, incoming transaction, ATOS request or Translation Request.

An in-progress configuration structure walk (performed for any reason, including prefetch) can be affected by a configuration invalidation command (CMD_CFGI_*) if a cached intermediate structure that was previously referenced as part of the walk could have been invalidated.
The completion of a configuration invalidation command (as determined by the completion of a subsequent CMD_SYNC) ensures that any configuration structure walk that could be affected by the invalidation is either:

• Fully completed by the time the CMD_SYNC completes.
• Stopped and restarted from the beginning after the CMD_SYNC completes.

Note: This ensures that old or invalid pointers to subsequent configuration structures are never followed after an invalidation is complete. For example, suppose an SMMU has an STE pointing to a two-level CD table and is prefetching a CD, and, on reading the L1CD pointer, a CMD_CFGI_STE is processed that invalidates the STE that located the L1CD table. If the STE is made invalid, the pointer to the CD table is no longer valid, and the SMMU must not continue to fetch the second-level CD after acknowledging to software that it considers the STE invalid. Software is free to re-use the memory used for the CD tables after receiving this acknowledgment, so continuing the prefetch after this point risks loading now-unrelated data. The SMMU must abort the fetch and not read the second-level CD, or must read the second-level CD before signaling the CMD_CFGI_STE/CMD_SYNC as complete to software.

Note: Refer to the note in section 3.21.1 Translation tables and TLB invalidation completion behavior regarding observability of whether a translation table walk is stopped.

For a configuration table walk that is stopped by an affected invalidation completion, an implementation is permitted to perform further fetches of a configuration structure walk after the completion, based on affected prior configuration structure reads that were made before the invalidation, if and only if it is guaranteed that these reads have no effect on the SMMU or the rest of the system, are not made to addresses with read side effects, and will thus not affect the architectural behavior of the system.
The size of single-copy atomic reads made by the SMMU (single-copy atomicity size) is IMPLEMENTATION DEFINED but must comply with the following:
• If an SMMU is integrated in a system with PEs that implement FEAT_LSE2, then the single-copy atomicity size for fetches of configuration structures issued by the SMMU is 128-bit.
– Note: The SMMU is still permitted to issue smaller transactions where required. For example, the single-copy atomicity size of fetches and updates of VMSAv8-64 translation table descriptors is 64-bit.
– Note: The single-copy atomicity size for fetches and updates of VMSAv9-128 translation table descriptors is 128-bit.
• Otherwise, the single-copy atomicity size must be at least 64-bit.

For a given structure, any single field that lies within one aligned span of the single-copy atomicity size that applies to that structure can be altered without first making the structure invalid. For example, to change the ASID in a CD, the ASID field can be written directly, followed by CMD_CFGI_CD and CMD_SYNC. However, if two fields to be changed are separated so that one single-copy atomic write cannot alter both at the same time, the structure cannot be modified in this way. Writes that are not single-copy atomic might be visible to the SMMU separately, and an inconsistent state might be cached (in which one field update has been read but another missed). In this case, the structure must be made invalid, modified, then made valid, using the procedures described in section 3.21.3.1 Configuration structure update procedure.

Note: In some systems, 64-bit single-copy atomicity is only guaranteed for addresses backed by certain memories. If software requires such atomicity, it must locate SMMU configuration structures in these memories. For example, in LPAE ARMv7 systems, main memory is expected to be used to contain translation tables, and is therefore required to support 64-bit single-copy atomicity.
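Whether an in-place update is allowed depends only on whether the fields being changed lie in one aligned span of the atomicity size. The following Python sketch checks this condition. The field positions passed to it are illustrative bit offsets and are not real STE or CD layouts. `fits_one_atomic_write` is a hypothetical helper.

```python
# Hypothetical check: can a set of fields be changed by one single-copy
# atomic write of `atomic_bits`? Field positions are illustrative bit
# offsets from the start of a structure, not real STE/CD layouts.

def fits_one_atomic_write(fields, atomic_bits):
    """fields: list of (lsb, width) pairs. True if all fields lie in the
    same naturally aligned span of `atomic_bits` bits."""
    spans = set()
    for lsb, width in fields:
        msb = lsb + width - 1
        first, last = lsb // atomic_bits, msb // atomic_bits
        if first != last:
            return False  # the field itself straddles an aligned span
        spans.add(first)
    return len(spans) == 1
```

In this sketch, two fields in different 64-bit double-words fail the check at 64 bits and pass at 128 bits. That only helps if software can also issue a single-copy atomic 128-bit write. Otherwise, the invalidate, modify, revalidate procedure is still required.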
When a structure is fetched, the constituent 64-bit double-words of a structure are permitted to be accessed by the SMMU non-atomically with respect to the structure as a whole and in any temporal sequence (maintaining the relative address sequence of the read portions).

3.21.3.1 Configuration structure update procedure

Note: The SMMU is not required to observe the structure word that contains the V flag in a particular order with respect to the other data in the structure. This gives rise to a requirement for an additional invalidation when transitioning a structure from V == 0 to V == 1.

Because the SMMU can read any reachable structure at any time, and is not required to read the double-words of the structure in order, Arm recommends that the following procedure is used to initialize structures:
1. Structure starts invalid, having V == 0.
2. Fill in all fields, leaving V == 0, then perform a DSB operation to ensure written data is observable by the SMMU.
3. Issue a CMD_CFGI_*, as appropriate.
4. Issue a CMD_SYNC, and wait for completion.
5. Set V to 1, then perform a DSB operation to ensure the write is observable by the SMMU.
6. Issue a CMD_CFGI_*, as appropriate.
7. Optionally issue a CMD_SYNC, and wait for completion. This must be done if a subsequent software operation, such as enabling device DMA, depends on the SMMU using the new structure.

To make a structure invalid, Arm recommends that this procedure is used:
1. Structure starts valid, having V == 1.
2. Set V to 0, then perform a DSB operation to ensure the write is observable by the SMMU.
3. Issue a CMD_CFGI_*, as appropriate.
4. Issue a CMD_SYNC, and wait for completion.

If software modifies the structure while it is valid, it must not allow the structure to enter an invalid intermediate state.

Note: Because the rules in section 3.21.3 Configuration structures and configuration invalidation completion disallow prefetch of a structure that is not directly reachable using a valid pointer, structures might be fully initialized (including with V == 1) before a pointer to the structure becomes observable by the SMMU. For example, a stage 1 translation can be set up with this procedure:
1. Allocate memory for a CD, and initialize all fields, including setting CD.V to 1.
2. Select an STE, initialize all fields and point to the CD, but leave STE.V == 0.
3. Perform a DSB operation to ensure writes are observable by the SMMU.
4. Issue a CMD_CFGI_STE and a CMD_SYNC, and wait for completion.
5. Set STE.V to 1, then perform a DSB operation to ensure the write is observable by the SMMU.
6. Issue a CMD_CFGI_STE and a CMD_SYNC, and wait for completion.

Note: No CMD_CFGI_CD is required because it is impossible for the CD to have been prefetched in an invalid state.
However, a CMD_CFGI_CD must be issued as part of a procedure that subsequently makes the CD invalid.
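The recommended procedures are about ordering, so a model that only records the operations in order makes them easy to check. The following Python sketch is an assumption-laden model and not a driver API. `write`, `dsb`, `cfgi` and `sync` are hypothetical stand-ins for a memory write, a DSB, issuing a CMD_CFGI_* command, and issuing a CMD_SYNC and waiting for it to complete.

```python
# Hypothetical model that records, in order, the operations software issues
# while following the recommended structure update procedures.

class OpRecorder:
    def __init__(self):
        self.ops = []

    def write(self, what):
        self.ops.append(("write", what))

    def dsb(self):
        self.ops.append(("dsb",))

    def cfgi(self, kind):
        self.ops.append(("cfgi", kind))  # stands in for CMD_CFGI_<kind>

    def sync(self):
        self.ops.append(("sync",))       # CMD_SYNC issued and completed


def make_valid(r, kind, wait=True):
    r.write(f"{kind} fields, V=0")  # steps 1-2: fill in, leave V == 0
    r.dsb()
    r.cfgi(kind)                    # step 3
    r.sync()                        # step 4
    r.write(f"{kind}.V=1")          # step 5
    r.dsb()
    r.cfgi(kind)                    # step 6
    if wait:                        # step 7: needed before e.g. enabling DMA
        r.sync()


def make_invalid(r, kind):
    r.write(f"{kind}.V=0")
    r.dsb()
    r.cfgi(kind)
    r.sync()


def setup_stage1(r):
    # The CD is unreachable until the STE is valid, so it can be written
    # with V=1 up front and needs no CMD_CFGI_CD.
    r.write("CD fields, V=1")
    make_valid(r, "STE")
```

In `setup_stage1`, a single DSB after the STE fields are written also covers the earlier CD write, which matches step 3 of the stage 1 example.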

3.22 Destructive reads and directed cache prefetch transactions

Some interconnect architectures might support the following types of transaction input to the SMMU:
1. RCI: Read with clean and invalidate:
• A read transaction containing a hint side effect of clean and invalidate.
• Note: The AMBA AXI5 interface [8] ReadOnceCleanInvalid transaction is an example of this class of transaction.
2. DR: Destructive read:
• A read transaction with a data-destructive side effect that intentionally causes addressed cache lines to be invalidated, without writeback, even if they are dirty.
• Note: The AMBA AXI5 interface [8] ReadOnceMakeInvalid transaction is an example of this class of transaction.
3. W-DCP: Write with directed cache prefetch:
• A write transaction containing a hint that changes the cache allocation in a part of the cache hierarchy that is not on the direct path to memory. This class of operation does not include those with data-destructive side effects.
• Note: The following AMBA AXI5 interface [8] transactions are examples of this class of transaction: WriteUniquePtlStash, WriteUniqueFullStash.
4. NW-DCP: A directed cache prefetch without write data:
• A transaction that is neither a read nor a write, but performs a cache prefetch in a similar way to a write with directed cache prefetch, without the written data.
• Note: The following AMBA AXI5 interface [8] transactions are examples of this class of transaction: StashOnceShared, StashOnceUnique.

The side effects of these transactions are hints and are therefore distinct from, and treated differently from, Cache Maintenance Operations. See section 16.7.2 Non-data transfer transactions.

In SMMUv3.0, the architecture does not support these transactions, which are unconditionally converted on output as specified by the interconnect architecture.
In SMMUv3.1 and later, these transactions are permitted to pass into the system unmodified when the transaction bypasses all implemented stages of translation (see section 3.22.3 Memory types and Shareability for permitted memory types). This happens when:
• SMMU_(*_)CR0.SMMUEN == 0 for the Security state of the stream:
– These transactions are affected by SMMU_(*_)GBPA overrides in the same way as the implementation treats ordinary transactions.
• SMMU_(*_)CR0.SMMUEN == 1 for the Security state of the stream, but the valid STE of the stream has STE.Config == 0b100.
• The valid STE for the transaction has STE.S1DSS == 0b01 and STE.Config == 0b101, and the transaction is supplied without a SubstreamID.

When the output interconnect does not support these types of transaction, or when the conditions described in sections 3.22.1 Control of transaction downgrade, 3.22.2 Permissions model and 3.22.3 Memory types and Shareability apply, these classes of transaction are downgraded with the following transformations (input transaction class: output/downgraded transaction class):
• Read with clean and invalidate (RCI): No downgrade, or downgrade to an ordinary read transaction. (1)
• Destructive read (DR): Non-destructive read, either an ordinary transaction or a read with a clean and invalidate side effect. (2)
• Write with directed cache prefetch (W-DCP): Ordinary write transaction.
• Directed cache prefetch without write data (NW-DCP): No-op. The transaction completes successfully with no effect on the memory system.

(1) It is IMPLEMENTATION DEFINED whether an implementation downgrades an RCI into a read, or whether this transaction remains unchanged. An implementation might only downgrade the RCI into a read if the output interconnect supports read but not RCI transactions.
(2) It is IMPLEMENTATION DEFINED whether a downgrade of a destructive read to a non-destructive read chooses to downgrade to an ordinary read or an RCI.

The RCI and DR transactions are read transactions with an additional hint. These transactions are not permitted to be issued speculatively. The W-DCP transaction is a write transaction with an additional hint. The NW-DCP transaction is a hint without data transfer and is issued speculatively.

An implementation is permitted to downgrade the transaction as described in this section, for any reason. The data transfer portion of these transactions, if present, is not a hint, and is treated in the same way as an ordinary read or write.

Note: Unless the SMMU has been explicitly configured to do so, Arm recommends that the common behavior of an implementation is to avoid downgrading these transactions.

3.22.1 Control of transaction downgrade

An implementation of SMMUv3.1 or later supporting these classes of transactions provides STE.{DRE,DCP} controls to permit these classes of transaction to pass into the system without transformation when one or more stages of translation are applied. This does not include the case where the only stage of translation is skipped because of the value of STE.S1DSS.
When these controls are disabled, the respective class of transactions is downgraded as described in the previous section. The requirement for each input transaction class to be eligible to pass into the system without class downgrade is:
• Read with clean and invalidate (RCI): No additional requirements.
• Destructive read (DR): STE.DRE == 1. If STE.DRE == 0, the transaction is downgraded into a non-destructive read (read, or read with clean and invalidate).
– If SMMU_IDR3.MTCOMB is 1, then for a Forced-WB transaction, the value of STE.DRE is treated as 0.
• Write with directed cache prefetch (W-DCP): STE.DCP == 1. If STE.DCP == 0, the transaction is downgraded into an ordinary write.
• Directed cache prefetch without write data (NW-DCP): STE.DCP == 1. If STE.DCP == 0, the transaction is downgraded into a no-op.

A read with clean and invalidate is non-destructive and is not required to be transformed into a different class of transaction by the SMMU. The SMMU evaluates permissions for this type of transaction the same way it does for an ordinary read, see section 3.22.3 Memory types and Shareability. A read with clean and invalidate might be transformed as required by the final memory type or Shareability.

If a transaction is enabled to progress without downgrade, it can only progress if the required translation table permissions are present, as described in the next section. If the required permissions are not present, but sufficient permissions exist to downgrade the transaction, then it is downgraded. Otherwise, it causes a fault.

3.22.2 Permissions model

When one or more stages of translation are applied to these transactions, the interaction with the permissions determined from translations is shown below. The behaviors listed here assume that translation for the given address has progressed up to permission checking, and that no higher-priority fault, for example a Translation fault or an Access flag fault, has occurred.

For each transaction type, the required permissions and the behavior if the permissions are not met are:
• Read with clean and invalidate (RCI) (1):
– Required permissions: Identical to ordinary read. Requires Read or Execute permission (depending on input InD and STE.INSTCFG) at a privilege appropriate to PnU input and STE.PRIVCFG.
– If permissions are not met: Identical to ordinary read.
• Destructive read (DR):
– Required permissions: Requires Read or Execute permission (depending on input InD and STE.INSTCFG), and Write permission that does not result in HTTU update of Dirty state, at a privilege appropriate to PnU input and STE.PRIVCFG.
– If permissions are not met: If write access is not granted, then downgraded as above into a read or read with clean and invalidate. (2) (3) If no Read/Execute permission (as appropriate), identical to ordinary read.
• Write with directed cache prefetch (W-DCP):
– Required permissions: Identical to ordinary write. Requires Write permission at a privilege appropriate to PnU input and STE.PRIVCFG. Always Data. (4)
– If permissions are not met: Identical to ordinary write.
• Directed cache prefetch without write data (NW-DCP):
– Required permissions: Requires either Read permission, or Write permission that does not result in HTTU update of Dirty state, or Execute permission, at a privilege appropriate to PnU input and STE.PRIVCFG, at each enabled stage of translation. It is IMPLEMENTATION SPECIFIC whether this permission is evaluated as the effective combination of permissions at all stages, or evaluated at each stage separately. (6)
– If permissions are not met: Prefetch does not occur. (5)

(1) This includes the case where a destructive read is downgraded to a read with clean and invalidate because STE.DRE == 0.
(2) Though a DR requires write permission to progress into the system as a DR, it does not cause a Permission fault for write.
(3) This includes the case where a GPC does not grant write permission when SMMU_ROOT_IDR0.GDI == 1. See 3.25.10 Granular Data Isolation.
(4) The SMMU treats all writes as Data regardless of InD input and STE.INSTCFG.
(5) An NW-DCP is not a write and, if HTTU of Dirty state is enabled, does not mark a page Dirty. If an NW-DCP has the required permissions at a given stage of translation and HTTU of Access flag is enabled for that stage, AF is updated. If required permissions are not met for an NW-DCP at a given stage of translation, the transaction does not progress into the system. However, if the translation conditions permit an AF update, a coincidental speculative update of AF might occur.
(6) This applies to the case where a non-overlapping set of permissions is available at stage 1 versus stage 2. For example, a stage 1 read-only translation with a stage 2 write-only translation.
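The STE.{DRE,DCP} controls and the permission rules combine into one decision per transaction. The following Python sketch models that decision for a single stage of translation. It is a simplification. Privilege (PnU, STE.PRIVCFG), the STE.INSTCFG override of InD, MTCOMB and multi-stage combination are not modelled. Where the architecture leaves a choice IMPLEMENTATION DEFINED, a parameter selects it. The function name, result strings and parameters are hypothetical.

```python
# Hypothetical single-stage model of STE.{DRE,DCP} controls plus the
# permission rules for RCI, DR, W-DCP and NW-DCP. Not modelled: privilege,
# STE.INSTCFG overriding InD, MTCOMB, and combining multiple stages.

def resolve(cls, *, dre, dcp, r, w, x=False, instr=False,
            w_needs_dirty=False, dr_downgrade="READ"):
    """Return the class that proceeds, "NO-OP", or "FAULT_READ"/"FAULT_WRITE"."""
    read_ok = x if instr else r          # Read or Execute, chosen by InD
    clean_w = w and not w_needs_dirty    # Write without an HTTU Dirty update
    if cls == "RCI":
        # Permissions as for an ordinary read; this model never downgrades RCI.
        return "RCI" if read_ok else "FAULT_READ"
    if cls == "DR":
        if not read_ok:
            return "FAULT_READ"          # identical to an ordinary read
        if dre and clean_w:
            return "DR"
        return dr_downgrade              # "READ" or "RCI": IMPLEMENTATION DEFINED
    if cls == "W-DCP":
        if not w:
            return "FAULT_WRITE"         # identical to an ordinary write
        return "W-DCP" if dcp else "WRITE"
    if cls == "NW-DCP":
        # Never faults: without DCP or permission the prefetch does not occur.
        return "NW-DCP" if dcp and (r or x or clean_w) else "NO-OP"
    raise ValueError(f"unknown transaction class: {cls}")
```

For example, a DR to a page whose Write permission would require an HTTU Dirty update comes back as a non-destructive read, not as a write Permission fault, matching footnote (2).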

See section 3.13.8 Hardware flag update for Cache Maintenance Operations and Destructive Reads for information on the behavior of HTTU for Destructive Reads. HTTU for RCI and W-DCP behaves the same as for an ordinary read or write, respectively.

A directed cache prefetch without write data (NW-DCP) does not cause faults in the SMMU. Where the interconnect architecture requires a response to an NW-DCP, and where the SMMU terminates an NW-DCP, the SMMU does not cause abort responses to be returned.

If RCI or DR ultimately leads to a fault, it is recorded as a read (data or instruction, as appropriate to input InD/INSTCFG). If W-DCP ultimately leads to a fault, it is recorded as a write. If the RCI, DR and W-DCP transactions lead to a fault, they stall in the same way as an ordinary read or write transaction if the SMMU is configured for stalling fault behavior. Retry and termination behave the same as for an ordinary read or write transaction. If these transactions are stalled and retried, they are retried as the same transaction type.

3.22.3 Memory types and Shareability

The interconnect architecture of an implementation might impose constraints on the memory type or Shareability that output DR, RCI, W-DCP and NW-DCP operations can take. At the point of final output, the SMMU downgrades these operations, as described in 3.22 Destructive reads and directed cache prefetch transactions, if the operations are not valid for output with the determined output attribute. This rule applies to all such operations in all translation and bypass configurations, including:
• Global bypass (attribute set from GBPA).
• STE bypass (the only stage of translation is skipped because of STE.Config == 0b100, or STE.S1DSS == 0b01 and STE.Config == 0b101).
• Translation.
Note: On AMBA AXI5 interfaces [8], the W-DCP operations (WriteUniquePtlStash, WriteUniqueFullStash) are not permitted to be emitted with Non-shareable or Sys Shareability. The NW-DCP operations (StashOnceShared, StashOnceUnique) are not permitted to be emitted with Sys Shareability. RCI and DR operations (ReadOnceCleanInvalid and ReadOnceMakeInvalid) are not permitted to be emitted with NSH or Sys Shareability.
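The final-output check can be sketched as a lookup against the restrictions in the note, followed by a walk down the downgrade options from the transformation table in 3.22. This is a Python sketch under stated assumptions. Only the AXI5 Shareability restrictions from the note are modelled, and memory-type constraints are ignored. The labels "NSH", "ISH", "OSH" and "SYS" are illustrative, not encodings. Trying RCI before an ordinary read for a DR is an arbitrary model choice, because that choice is IMPLEMENTATION DEFINED.

```python
# Hypothetical final-output downgrade check using only the AXI5 Shareability
# restrictions listed in the note. Labels are illustrative, not encodings.

NOT_PERMITTED = {
    "W-DCP": {"NSH", "SYS"},
    "NW-DCP": {"SYS"},
    "RCI": {"NSH", "SYS"},
    "DR": {"NSH", "SYS"},
}

# Downgrade options from the transformation table, tried in order.
DOWNGRADES = {
    "RCI": ["READ"],
    "DR": ["RCI", "READ"],
    "W-DCP": ["WRITE"],
    "NW-DCP": ["NO-OP"],
}


def output_class(cls, shareability):
    if shareability not in NOT_PERMITTED[cls]:
        return cls  # valid for output as-is
    for option in DOWNGRADES[cls]:
        # Plain READ/WRITE/NO-OP carry no restriction in this model.
        if shareability not in NOT_PERMITTED.get(option, set()):
            return option
    return DOWNGRADES[cls][-1]  # not reached with the tables above
```

The DR case shows why the option order matters. At Non-shareable, an RCI is not permitted either, so under these restrictions the DR can only leave the SMMU as an ordinary read.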

3.23 Memory Tagging Extension

MTE introduces a new MAIR field encoding, 0xf0. This encoding is Reserved in SMMUv3, in the CD.MAIR0 and CD.MAIR1 fields.

All SMMU-originated accesses are Tag Unchecked accesses. The SMMU does not write Allocation Tags. The terms Tag Unchecked and Allocation Tag are defined in [2].

3.23.1 SMMU support for FEAT_MTE_PERM

If SMMU_IDR3.MTEPERM is 1, then stage 2 MemAttr encodings that have the NoTagAccess attribute introduced by FEAT_MTE_PERM in [2] are treated as both:
• Not having the NoTagAccess attribute in the SMMU.
• Having the same memory type and Cacheability attributes as in FEAT_MTE_PERM in [2].

The resulting encodings (STE.S2FWB, stage 2 MemAttr[3:0]: SMMU interpretation) are:
• STE.S2FWB == 0, MemAttr[3:0] == 0b0100: Normal Inner Write-Back Cacheable, Outer Write-Back Cacheable.
• STE.S2FWB == 1, MemAttr[3:0] == 0b1111: Same as 0b0111.
• STE.S2FWB == 1, MemAttr[3:0] == 0b1110: Same as 0b0110.
• STE.S2FWB == 1, MemAttr[3:0] == 0b1101: Reserved.
• STE.S2FWB == 1, MemAttr[3:0] == 0b1100: Reserved.
• STE.S2FWB == 1, MemAttr[3:0] == 0b10xx: Reserved.

Note: The stage 2 MemAttr behavior for the SMMU specified here is consistent with all accesses not having the Tagged attribute at stage 2.
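The encoding list can be expressed as a small lookup. Its only non-obvious part is the 0b10xx wildcard. The following Python sketch covers only the encodings listed above for SMMU_IDR3.MTEPERM == 1 and returns `None` for anything else. `None` here means only that the sketch does not cover that encoding, not that it is Reserved. The function name is hypothetical.

```python
# Hypothetical lookup of the stage 2 MemAttr[3:0] encodings listed above,
# for SMMU_IDR3.MTEPERM == 1. Unlisted encodings return None.

def s2_memattr_with_mteperm(s2fwb, memattr):
    if s2fwb == 0:
        if memattr == 0b0100:
            return "Normal Inner Write-Back Cacheable, Outer Write-Back Cacheable"
        return None
    if memattr == 0b1111:
        return "same as 0b0111"
    if memattr == 0b1110:
        return "same as 0b0110"
    # 0b1101, 0b1100 and the 0b10xx group (0b1000-0b1011) are Reserved.
    if memattr in (0b1101, 0b1100) or memattr >> 2 == 0b10:
        return "Reserved"
    return None
```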

3.24 Device Permission Table

The Device Permission Table and associated behavior provide a mechanism to enforce the association between granules of physical address space and the memory footprint of virtual machines. Use of the DPT is only supported for StreamIDs that are configured to use StreamWorld EL1, and otherwise results in C_BAD_STE.

The architecture supports an independent DPT for each of Non-secure and Realm states. A DPT is an in-memory structure that describes the accessibility of each granule of configured physical address space as one of the following:
• No access is permitted to the granule.
• The granule is accessible for accesses associated with some specific VMIDs.
• The granule is accessible regardless of VMID association.

DPT configuration is not required for StreamIDs where ATS is disabled or configured for Split-stage operation. See 3.9.1 ATS Interface and STE.EATS.

The Device Permission Table, DPT, is independently optional for each of the Non-secure and Realm programming interfaces:
• Support for DPT in the Non-secure state is indicated in SMMU_IDR3.DPT.
• Support for DPT in the Realm state is indicated in SMMU_R_IDR3.DPT.

3.24.1 DPT check

A successful DPT lookup resolves to the following information for a granule or contiguous region:
• A No Access indication. This indicates that an access to the region is not permitted and that the SMMU is not permitted to create a DPT TLB entry for the lookup. See 3.24.2 DPT caching behavior.
• Whether an access is permitted according to the Access Control (AC), W and VMID fields returned by the lookup.

The DPT check includes all of the following:
• If the address input into the DPT check is outside the address range configured in SMMU_(R_)DPT_BASE_CFG.DPTPS, this indicates No Access, and the DPT check fails as a Device Access fault.
• If a DPT descriptor indicates No Access, the DPT check fails as a Device Access fault.
There are two ways for a descriptor to indicate No Access:
– A Level 0 No Access entry.
– The encoding of the A[1:0] field in a Level 1 descriptor indicates No Access.
• If the region is marked as W = 0 and the incoming transaction is a write access, the DPT check fails as a Device Access fault.
Note: In certain coherency protocol implementations, if the DPT grants a fully-coherent client access to a page, it is not possible to enforce separate read and write permissions. In this case, the DPT W bit is ignored (that is, treated as 1) when processing fully-coherent translated transactions. The method by which system software discovers whether write permission can be enforced for each fully-coherent client is system-specific.
• The AC and VMID fields from the DPT are checked against the STE.{DPT_VMATCH, S2VMID} fields for the StreamID of the access being checked. The following table indicates whether the VMID in the DPT entry is required to match STE.S2VMID for the access to be permitted:

• STE.DPT_VMATCH = 0b00: AC = 0b00, Yes; AC = 0b01, Yes; AC = 0b10, No.
• STE.DPT_VMATCH = 0b01: AC = 0b00, Yes; AC = 0b01, No; AC = 0b10, No.
• STE.DPT_VMATCH = 0b10: AC = 0b00, No; AC = 0b01, No; AC = 0b10, No.

Note: For Realm STEs, DPT_VMATCH is always 0b00.

If the VMID is required to match and does not match, the DPT check fails as a Device Access fault.

Note: The DPT check might generate a Device Access fault only if DPT configuration did not lead to a DPT lookup fault. See 3.24.4 DPT lookup errors.

Note: In the absence of correct DPT TLB maintenance, the DPT check might be made using stale information cached in the DPT TLB.

The output PA space of the region is determined as follows:
• For the Non-secure DPT, the output PA space is Non-secure.
• For the Realm DPT, the output PA space is determined from AC:
– If AC is 0b01 or 0b10, the output PA space is Non-secure.
– Otherwise, the output PA space is Realm.
Note: The STE.DPT_VMATCH field has no effect on the output PA space for the access.

The PM attribute is ignored by the SMMU for the purpose of DPT checks. See 3.25.10.1.1 Protected Mode.

3.24.2 DPT caching behavior

For a StreamID with STE.EATS == 0b11, there are two situations in which the result of a lookup is permitted to be cached in a DPT TLB:
• When an ATS Translation Request results in a successful ATS Translation Completion with any permissions other than R == W == 0, the SMMU is permitted to create a DPT TLB entry based on either of:
– The final enabled stage of translation that the request was subject to.
– All enabled stages of translation that the request was subject to.
In both cases, this is only permitted within the bounds specified later in this subsection. If the only reason that the ATS Translation Request did not result in a successful ATS Translation Completion was a GPC fault on the output address, the SMMU is still permitted to create a DPT TLB entry for it in a DPT TLB that does not cache GPT information.
Note: If hardware update of the Access flag or dirty state is enabled, the SMMU still follows the existing rules for performing the updates, both speculatively and in response to an ATS Translation Request. See 3.13.7 ATS, PRI and translation table flag update.
DPT TLB entries are never created from the result of ATS Translation Requests that bypass all stages of translation. This includes the case where STE.S1DSS configuration means that Translation Requests with SSV=0 effectively bypass stage 1, on a stream where stage 2 is configured for bypass.
• When a walk of the DPT does not result in a DPT lookup fault, and the DPT information returned by the walk does not indicate No Access, the SMMU is permitted to create a DPT TLB entry based on the result of the DPT lookup.

This property applies both to DPT walks caused by an incoming transaction and to DPT walks that are performed speculatively by the SMMU.
Note: This means software must ensure consistent configuration of the DPT and the final enabled stage of translation.

The properties of a DPT TLB are:
• It is indexed by SEC_SID and the final Output Address of the translation.
• Entries are permitted to cover an address region up to the effective size of the translation. For a translation table walk, this size is determined by the level of the translation result and the Contiguous bit. For a DPT walk, it is determined by the level of the DPT lookup and the Contig field.
• If the result of an ATS Translation Request covers an address region smaller than the region size configured in SMMU_(R_)DPT_BASE_CFG.DPTGS, the SMMU is permitted to use either region size.
• The entries distinguish whether access to the region is read-only, or has read and write permissions.
• The entries determine the output physical address space of the translation. For Realm state, this is either Realm or Non-secure PA space. For Non-secure state, this is always Non-secure PA space.
• The entries determine whether accesses to a region must be associated with a specific VMID. If the region is associated with a specific VMID, the VMID is also part of the entry.
• Entries inserted based on the result of an ATS Translation Completion might contain less information than can be determined from the result of a DPT walk.
Note: The DPT format cannot express permission for a write-only region of memory. Write-only permission is therefore treated as read and write permission by the DPT and DPT checks.

Implementations are permitted to combine DPT and GPT information in TLBs.

In an implementation that combines DPT and GPT information in a TLB, all of the following apply:
• The SMMU removes entries as part of both CMD_DPTI_ operations and TLBI PA* operations that match the properties of the entries.
• Any use of a DPT TLB entry cannot allow an access to bypass the requirements of Granule Protection Checks.

When a DPT TLB entry is created because of a successful ATS Translation Completion, it is permitted to be cached according to the final enabled stage of translation, or all enabled stages of translation, used to service the ATS Translation Request. Caching must be consistent with observing the following values from a DPT walk for the PA returned in the ATS Translation Completion (DPT field: value; notes):
• A[x]: Implicitly grants access to the appropriate granule or contiguous region if the ATS Translation Completion indicated any access permission.
• W: The write permission returned by the combined or final stage of translation that was used to generate the ATS Translation Completion. For translation stages using hardware update of dirty state, a writable-clean translation does not count as granting write access, unless the ATS Translation Request also caused the translation to transition to writable-dirty.
• FWB: If the transaction is Forced-WB, FWB is cached as 0b1. Otherwise, it is cached as 0b0. Only supported when SMMU_IDR3.MTCOMB is 1.

• AC: For a Non-secure stream, 0b00. For a Realm stream, 0b01 if the output PA space was Non-secure, or 0b00 if the output PA space was Realm.
• VMID: The STE.S2VMID value for the StreamID for which the ATS Translation Request was performed.
• Contig: The region size returned by the combined or final stage of translation that was used to generate the ATS Translation Completion. The SMMU limits this size so that it never exceeds the level 0 DPT size configured in SMMU_(R_)DPT_BASE_CFG.L0DPTSZ, or 1GB if L0DPTSZ is programmed to a Reserved value.

Note: If SMMU_IDR3.MTCOMB is 1, software is expected to configure FWB to a value consistent with stage 2 translation. Inconsistent configuration of FWB within a DPT entry might lead to mismatched memory attributes being used when accessing the memory location, which might lead to loss of coherency.

The configuration of STE.DPT_VMATCH does not affect the attributes of any DPT TLB entry. See also 4.6 DPT maintenance.

3.24.2.1 DPT TLB caching and Device Access faults

The SMMU is only permitted to generate a Device Access fault directly from information cached in a DPT TLB entry in the following situation:
• The DPT TLB entry was created from the result of a DPT walk that did not result in a DPT lookup fault, and the information returned by the DPT lookup did not indicate No Access.

The SMMU is not permitted to generate a Device Access fault directly from information cached in a DPT TLB entry in the following situation:
• The DPT TLB entry was created from the result of the translation table walk performed as part of responding to an ATS Translation Request.

Where the SMMU is not permitted to generate a Device Access fault based on an existing DPT TLB entry, the SMMU is required to perform a DPT walk in order to perform the DPT check correctly.

The SMMU is permitted not to generate a Device Access fault in any situation where an existing DPT TLB entry grants access for the ATS Translated transaction being checked.

If a DPT TLB entry does not grant sufficient permissions for any of the following transaction types:
• A cache maintenance operation (CMO).
• A destructive read (DR) transaction.
• A directed cache prefetch without write data (NW-DCP) transaction.
then the following behaviors apply:
• The transaction is permitted to be downgraded, if this is allowed by the TLB-cached permissions, as specified in:
– 16.7.2 Non-data transfer transactions.
– 3.22 Destructive reads and directed cache prefetch transactions.
• If the DPT TLB entry was created from the result of a VMSA translation performed as part of responding to an ATS Translation Request, then even if the transaction can be downgraded, the SMMU is permitted to initiate a DPT walk in order to perform the DPT check precisely.
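The DPT check in 3.24.1 and the DPT TLB rules in 3.24.2.1 can be summarized in a short C sketch. It makes the following assumptions:
• All type and function names are hypothetical.
• The attributes come from a lookup that did not take a DPT lookup fault.
• PA bits [63:oas] have already been handled.
• The fully-coherent case, in which W is treated as 1, is not modelled.

Where the text only permits a Device Access fault from a walk-derived DPT TLB entry, the sketch chooses to take the fault. An implementation could equally perform a DPT walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded attributes for one granule, from a DPT lookup. */
struct dpt_attrs {
    bool     no_access;  /* Level 0 No Access, or A[x] == 0 in a Level 1 entry */
    bool     w;          /* W: write access permitted */
    uint8_t  ac;         /* AC[1:0]; 0b11 is Reserved and rejected at lookup */
    uint16_t vmid;
};

/* STE.DPT_VMATCH x AC table: must the DPT VMID equal STE.S2VMID? */
static bool dpt_vmid_must_match(uint8_t dpt_vmatch, uint8_t ac)
{
    switch (dpt_vmatch) {
    case 0x0: return ac == 0x0 || ac == 0x1;
    case 0x1: return ac == 0x0;
    default:  return false;               /* 0b10: never required */
    }
}

/* Returns true if the access is permitted, false for a Device Access fault. */
static bool dpt_check(const struct dpt_attrs *a, uint64_t pa, unsigned dptps,
                      bool is_write, uint8_t dpt_vmatch, uint16_t s2vmid)
{
    if ((pa >> dptps) != 0)
        return false;                     /* outside DPTPS: No Access */
    if (a->no_access)
        return false;
    if (is_write && !a->w)
        return false;
    if (dpt_vmid_must_match(dpt_vmatch, a->ac) && a->vmid != s2vmid)
        return false;
    return true;
}

enum dpt_pas { DPT_PAS_NONSECURE, DPT_PAS_REALM };

/* Output PA space: always Non-secure for the Non-secure DPT. */
static enum dpt_pas dpt_output_pas(bool realm_dpt, uint8_t ac)
{
    if (realm_dpt && ac != 0x1 && ac != 0x2)
        return DPT_PAS_REALM;
    return DPT_PAS_NONSECURE;
}

enum dpt_tlb_origin { DPT_TLB_FROM_WALK, DPT_TLB_FROM_ATS };
enum dpt_tlb_action { DPT_TLB_ALLOW, DPT_TLB_FAULT, DPT_TLB_WALK };

/* Action on a DPT TLB hit, following 3.24.2.1. */
static enum dpt_tlb_action dpt_tlb_hit(enum dpt_tlb_origin origin, bool grants_access)
{
    if (grants_access)
        return DPT_TLB_ALLOW;
    /* Only walk-derived entries may raise a fault directly. */
    return origin == DPT_TLB_FROM_WALK ? DPT_TLB_FAULT : DPT_TLB_WALK;
}
```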

When the SMMU creates a DPT TLB entry based on the results of an ATS Translation Request with PM = 1, the following behaviors apply:
• The entry is permitted to be cached in a TLB that combines DPT and GPT information, and is then tagged with the PM attribute.
• The entry is permitted to be cached in a TLB that does not cache GPT information, and is therefore not tagged with the PM attribute. In this case, the entry is still permitted to cache the absence of write permission as the result of the GPT check, because the SMMU will perform a DPT walk if the required permissions are not met.
See 3.25.10.1.1 Protected Mode.

3.24.2.2 Changing the DPT structure for a region of memory

Suppose the DPT configuration expresses a consistent set of attributes for a region of memory, and is then changed such that the attributes for that region remain the same and consistent. In that case, DPT checks continue to be performed correctly even in the absence of DPT TLB maintenance operations. This includes the following example cases:
• Replacing a set of Level 1 entries that have the Contig field set to zero with a set of Level 1 entries that have the Contig field configured to the size of the region.
• Replacing a single Level 0 Block entry with a Level 0 Table entry that points to a level 1 table containing Level 1 entries.
• Replacing a Level 0 Table entry that points to a level 1 table containing Level 1 entries with a single Level 0 Block entry. In this case, the SMMU might still fetch from the level 1 table until completion of a DPT TLB maintenance operation that removes the Level 0 Table entry from the DPT TLB.

3.24.3 DPT format and lookup process

The DPT has two levels, level 0 and level 1. All descriptors are 8 bytes in length and are little-endian. All tables are aligned to their size in memory.

DPT lookups are made with the memory attributes configured in SMMU_(R_)CR1.{TABLE_IC, TABLE_OC, TABLE_SH}, and with the Read-allocate hint in SMMU_(R_)DPT_BASE.RA.

DPT lookups are made with the MPAM STE.{PARTID, PMG} values configured in the STE for the StreamID for which the lookup is performed. The MPAM PARTID space is the same as the space that would be used to fetch stage 2 translation tables for that StreamID.

DPT lookups are made with behavior consistent with PBHA being disabled or not implemented.

Note: If SMMU_R_IDR3.MEC is 1, DPT lookups for Realm state are performed with the MECID value configured in SMMU_R_GMECID.

The L0DPT base address, and the next-level table base addresses for L1DPT tables, are aligned by the hardware.

The input PA is interpreted as described in Table 3.36. In Table 3.36, the placeholder values are:
• oas is the decoded value of SMMU_IDR5.OAS as a bit width.
• dptps is the decoded value of SMMU_(R_)DPT_BASE_CFG.DPTPS as a bit width.
• l0dptsz is the decoded value of SMMU_(R_)DPT_BASE_CFG.L0DPTSZ as a bit width.
• dptgs is the decoded value of SMMU_(R_)DPT_BASE_CFG.DPTGS as a bit width.

Table 3.36: Input PA interpretation for DPT lookups
• PA[oas-1:dptps]: Device Access fault if non-zero.
• PA[dptps-1:l0dptsz]: L0DPT index.
• PA[l0dptsz-1:dptgs+1]: L1DPT index.
• PA[dptgs]: Level 1 page descriptor index.
• PA[dptgs-1:0]: IGNORED.

Note: Bits [63:oas] of the input PA are processed according to the general requirements for handling of addresses in ATS-related transactions.

The algorithm for the DPT walk process is as follows:
1. The SMMU_(R_)DPT_BASE register points at the base of the level 0 table.
2. The SMMU uses bits [dptps-1:l0dptsz] of the input PA as the index into the level 0 table.
Note: Each level 0 entry is 8 bytes, so the offset into the level 0 table is (level 0 index) * (8 bytes).
Note: There is one level 0 table. Every entry in the level 0 table is one of the three level 0 descriptors, or Invalid. The size of the level 0 table is (number of entries) * (8 bytes). The number of entries is DPTPS / L0DPTSZ, taking both as region sizes, that is, 2^(dptps - l0dptsz).
3. If the level 0 entry is a No Access, Block or Invalid entry, then the DPT walk is complete. Otherwise, the level 0 entry is a level 0 Table entry, and it contains a pointer to the base of a level 1 table.
Note: There are many level 1 tables. All level 1 tables are the same size, determined by (number of entries) * (8 bytes). The number of entries is (L0DPTSZ / DPTGS) / 2, that is, 2^(l0dptsz - dptgs - 1).
4. The SMMU uses bits [l0dptsz-1:dptgs+1] of the input PA as the index into the level 1 table determined in the previous step.
Note: Every entry in a level 1 table is a level 1 descriptor.
Note: Every entry in a level 1 table is 8 bytes, so the offset into a level 1 table is (level 1 index) * (8 bytes).
5. The SMMU decodes the level 1 entry according to the rules for a level 1 descriptor. Depending on the value of A[1:0] in the descriptor, bit [dptgs] of the input PA might be used to select between the upper and lower half of the descriptor. The DPT walk is now complete.
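The index arithmetic of Table 3.36 and the table-size notes in the walk algorithm can be expressed as a C sketch. The structure and function names are hypothetical. The inputs are the decoded bit widths oas, dptps, l0dptsz and dptgs; decoding the register encodings is not shown, and a valid register configuration is assumed.

```c
#include <stdint.h>

/* Decoded bit widths. Assumes a valid configuration:
 * dptgs + 1 < l0dptsz < dptps <= oas <= 56. */
struct dpt_geom {
    unsigned oas, dptps, l0dptsz, dptgs;
};

struct dpt_walk_index {
    int      out_of_range;  /* PA[oas-1:dptps] != 0: Device Access fault */
    uint64_t l0_offset;     /* byte offset of the level 0 descriptor */
    uint64_t l1_offset;     /* byte offset within the selected level 1 table */
    unsigned upper;         /* PA[dptgs]: upper (1) or lower (0) granule */
};

/* Extract PA[hi:lo], inclusive, for a field narrower than 64 bits. */
static uint64_t pa_bits(uint64_t pa, unsigned hi, unsigned lo)
{
    return (pa >> lo) & ((1ULL << (hi - lo + 1)) - 1);
}

static struct dpt_walk_index dpt_compute_walk_index(const struct dpt_geom *g,
                                                    uint64_t pa)
{
    struct dpt_walk_index ix;
    /* PA[63:oas] is handled by the general ATS address rules, not here. */
    uint64_t below_oas = pa & ((1ULL << g->oas) - 1);

    ix.out_of_range = (below_oas >> g->dptps) != 0;
    ix.l0_offset = pa_bits(pa, g->dptps - 1, g->l0dptsz) * 8;
    ix.l1_offset = pa_bits(pa, g->l0dptsz - 1, g->dptgs + 1) * 8;
    ix.upper = (unsigned)(pa >> g->dptgs) & 1;
    return ix;
}

/* Level 0 table size: 2^(dptps - l0dptsz) entries of 8 bytes. */
static uint64_t dpt_l0_table_bytes(const struct dpt_geom *g)
{
    return (1ULL << (g->dptps - g->l0dptsz)) * 8;
}

/* Level 1 table size: 2^(l0dptsz - dptgs - 1) entries of 8 bytes. */
static uint64_t dpt_l1_table_bytes(const struct dpt_geom *g)
{
    return (1ULL << (g->l0dptsz - g->dptgs - 1)) * 8;
}
```

Take the illustrative values oas = 48, dptps = 40, l0dptsz = 30 (1GB level 0 regions) and dptgs = 12 (4KB granules). With these values:
• PA 0x140003000 selects level 0 entry 5 (byte offset 40), level 1 entry 1 (byte offset 8) and the upper granule.
• The level 0 table is 8KB.
• Each level 1 table is 1MB.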

Note: Depending on PMCG configuration, DPT lookups are counted in PMCG events.

3.24.3.1 DPT descriptor formats

There are four DPT descriptor formats:
1. Level 0 No Access entry: Indicates that a region of size L0DPTSZ is not accessible.
2. Level 0 Block entry: Indicates that a region of size L0DPTSZ is accessible with some constraints on Access Control and VMID.
3. Level 0 Table entry: Includes a pointer to a level 1 table.
4. Level 1 entry: Indicates either No Access, or Access Control and VMID information, for either two granules or a contiguous region.

An entry that does not match any of the formats at the appropriate level is invalid.

3.24.3.1.1 Level 0 No Access entry

[63:32] MBZ
[31:2] MBZ | [1:0] 0b00
Figure 3.14: Level 0 No Access entry

Note: In all cases, if a bit described as RES0 is non-zero, or a field is configured to a Reserved value, the descriptor is Invalid. See 3.24.4 DPT lookup errors.

At level 0, a descriptor with bits [1:0] set to 0b00 indicates No Access. A level 0 No Access entry is not permitted to be cached in a DPT TLB.

3.24.3.1.2 Level 0 Block entry

[63:32] MBZ
[31:16] VMID | [15:6] MBZ | [5] FWB | [4] W | [3:2] AC | [1:0] 0b01
Figure 3.15: Level 0 Block entry

Note: In all cases, if a bit described as RES0 is non-zero, or a field is configured to a Reserved value, the descriptor is Invalid. See 3.24.4 DPT lookup errors.

At level 0, a descriptor with bits [1:0] set to 0b01 indicates a Block descriptor. The AC, W and FWB fields are valid.

The encoding of the W field is:
• 0b0: Write access not permitted.
• 0b1: Write access permitted.

The encoding of the AC field is:
• 0b00: VMID field is valid and is checked unless STE.DPT_VMATCH is 0b10.
• 0b01: VMID field is valid and is checked unless STE.DPT_VMATCH is 0b01 or 0b10.
• 0b10: VMID field is RES0.
• 0b11: Reserved, invalid.

Note: For a Realm STE, DPT_VMATCH is always 0b00, and therefore VMID is always checked when AC is 0b00 or 0b01.

Note: For the Realm DPT, the value of AC determines the output PA space.

If SMMU_IDR0.VMID16 is 0, then VMID[15:8] are RES0 regardless of the value of AC.

If SMMU_IDR3.MTCOMB is 1, the encoding of the FWB field is:
• 0b0: Incoming memory type passes through unchanged.
• 0b1: Incoming memory type is replaced with Normal-iWB-oWB.

If FWB is 1, then the transaction is Forced-WB. This further means that the following classes of Translated transactions are transformed as follows:
• Destructive read: transformed to a non-destructive read (read, or read with clean and invalidate).
• Invalidate: transformed to CleanInvalidate.
• Destructive hint: transformed to a No-op.

If SMMU_IDR3.MTCOMB is 1 and the final memory type is any Normal cacheable type, then all of the following apply:
• If the incoming transaction specifies a Normal cacheable memory type, then the final cache allocation hints are the incoming cache allocation hints after applying STE.ALLOCCFG.
• If the incoming transaction does not specify a Normal cacheable memory type, then the final cache allocation hints are Read No-Allocate, Write No-Allocate.

A level 0 DPT Block entry that does not generate a DPT lookup fault is permitted to be cached in a TLB as though the Block is a contiguous region of granules, each of the size configured in SMMU_(R_)DPT_BASE_CFG.DPTGS.

3.24.3.1.3 Level 0 Table entry

[63:56] MBZ | [55:32] Address[55:12] (upper part)
[31:12] Address[55:12] (lower part) | [11:2] MBZ | [1:0] 0b11
Figure 3.16: Level 0 Table entry

Note: In all cases, if a bit described as RES0 is non-zero, or a field is configured to a Reserved value, the descriptor is Invalid. See 3.24.4 DPT lookup errors.

At level 0, a descriptor with bits [1:0] set to 0b11 indicates a Table descriptor. The Address field indicates the next-level base address of a level 1 table. The address is aligned by the hardware to the size of the level 1 table. Bits above the implemented physical address size, indicated in SMMU_IDR5.OAS, are RES0.

A level 0 DPT Table entry that does not generate a DPT lookup fault is permitted to be cached in a TLB.

3.24.3.1.4 Level 1 entry

[63:48] VMID1 | [47:38] MBZ | [37] FWB1 | [36] W1 | [35:34] AC1 | [33:32] MBZ
[31:16] VMID0 | [15:12] MBZ | [11:8] Contig | [7:6] MBZ | [5] FWB0 | [4] W0 | [3:2] AC0 | [1:0] A[1:0]
Figure 3.17: Level 1 entry

Note: In all cases, if a bit described as RES0 is non-zero, or a field is configured to a Reserved value, the descriptor is Invalid. See 3.24.4 DPT lookup errors.

A level 1 entry represents either:
• The attributes for two adjacent granules, which might have different values.
• The attributes for a naturally-aligned contiguous region, as indicated in the Contig field.

The encoding of the A[1:0] field is as follows, and affects the interpretation of the Contig field:
• A[1] = 0b0, A[0] = 0b0, Contig RES0: No Access to upper or lower granule.
• A[1] = 0b0, A[0] = 0b1, Contig RES0: No Access to upper granule. Lower granule controlled by AC0, W0, and VMID0.
• A[1] = 0b1, A[0] = 0b0, Contig RES0: Upper granule controlled by AC1, W1, and VMID1. No Access to lower granule.
• A[1] = 0b1, A[0] = 0b1, Contig zero: Upper granule controlled by AC1, W1, and VMID1. Lower granule controlled by AC0, W0, and VMID0.
• A[1] = 0b1, A[0] = 0b1, Contig non-zero: Contiguous region controlled by AC0, W0, and VMID0.

The encoding of the Contig field is as follows:
• 0b0000: No contiguity.
• 0b0001: 64KB. Reserved if DPTGS is 64KB.
• 0b0010: 2MB.
• 0b0011: 32MB.
• 0b0100: 512MB.
• 0b0101: 1GB.
• 0b0110: 16GB.
• 0b0111: 64GB.
• Otherwise: Reserved.

If A is 0b00, all other fields in the descriptor are RES0.

If A is 0b01 or 0b10, then Contig is RES0 and the descriptor describes the properties for two granules.

If A is 0b11 and Contig is zero, then the descriptor describes the properties for two granules.

If the descriptor describes the properties for two granules, then:
• Access to the upper granule is governed by A[1], AC1, W1, FWB1 and VMID1. If A[1] is 0, then the AC1, W1, FWB1 and VMID1 fields are RES0.
• Access to the lower granule is governed by A[0], AC0, W0, FWB0 and VMID0. If A[0] is 0, then the AC0, W0, FWB0 and VMID0 fields are RES0.

If A is 0b11 and Contig is non-zero, the AC1, W1, FWB1 and VMID1 fields are RES0, and only the AC0, W0, FWB0 and VMID0 fields are considered. The value of Contig indicates the size of the contiguous region.

If Contig is not RES0, then encodings of Contig that select a region size greater than L0DPTSZ are Reserved. If Contig selects a Reserved encoding, the descriptor is invalid.

If SMMU_IDR0.VMID16 is 0, then VMID0[15:8] and VMID1[15:8] are RES0 regardless of the values of ACx.

The encoding and meaning of the ACx, Wx and FWBx fields are the same as for the AC, W and FWB fields in a Level 0 Block entry.

It is possible that contiguous regions are inconsistently configured in the DPT:
• Where a region is composed of descriptors that provide the same attributes, and differ only by the value of the Contig field, the DPT check is correctly applied.
• Where a region is composed of descriptors that provide different attributes, and the values of the Contig fields produce overlapping regions, the SMMU might use any of the configured attributes in the overlapping regions.

If the lookup of a level 1 DPT entry does not generate a DPT lookup fault, then:
• If Contig is 0, each half of the entry that does not indicate No Access is permitted to be cached in a TLB.
• If Contig is not RES0 and is non-zero, the entry is permitted to be cached in a TLB as a naturally-aligned contiguous region of granules, each of the size configured in SMMU_(R_)DPT_BASE_CFG.DPTGS.
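The descriptor formats in 3.24.3.1 can be decoded with a C sketch such as the following. It classifies a level 0 descriptor, and selects the attributes for one granule from a level 1 entry using PA[dptgs]. The names are hypothetical.

For level 0, the sketch checks only:
• The MBZ bits.
• The Reserved AC value.
• The VMID RES0 rules.
• The Table address limit.

FWB handling when SMMU_IDR3.MTCOMB is 0, Reserved Contig encodings, and level 1 RES0 checks are assumed to be handled elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

enum dpt_l0_kind { DPT_L0_NO_ACCESS, DPT_L0_BLOCK, DPT_L0_TABLE, DPT_L0_INVALID };

/* Classify a level 0 descriptor; oas is the SMMU_IDR5.OAS width (<= 56). */
static enum dpt_l0_kind dpt_l0_classify(uint64_t d, unsigned oas, bool vmid16)
{
    switch (d & 0x3) {
    case 0x0:                                   /* No Access: [63:2] MBZ */
        return (d >> 2) == 0 ? DPT_L0_NO_ACCESS : DPT_L0_INVALID;
    case 0x1: {                                 /* Block */
        unsigned ac = (unsigned)(d >> 2) & 0x3;
        uint16_t vmid = (uint16_t)(d >> 16);
        if (d & 0xFFFFFFFF0000FFC0ULL)          /* [63:32] and [15:6] MBZ */
            return DPT_L0_INVALID;
        if (ac == 0x3)                          /* Reserved AC encoding */
            return DPT_L0_INVALID;
        if (ac == 0x2 && vmid != 0)             /* VMID RES0 when AC == 0b10 */
            return DPT_L0_INVALID;
        if (!vmid16 && (vmid >> 8) != 0)        /* VMID[15:8] RES0 without VMID16 */
            return DPT_L0_INVALID;
        return DPT_L0_BLOCK;
    }
    case 0x3: {                                 /* Table */
        uint64_t addr = d & 0x00FFFFFFFFFFF000ULL; /* Address[55:12] */
        if (d & 0xFF00000000000FFCULL)          /* [63:56] and [11:2] MBZ */
            return DPT_L0_INVALID;
        if (addr >> oas)                        /* bits above OAS are RES0 */
            return DPT_L0_INVALID;
        return DPT_L0_TABLE;
    }
    default:                                    /* 0b10 matches no level 0 format */
        return DPT_L0_INVALID;
    }
}

/* Attributes selected from a level 1 entry for one granule. */
struct dpt_l1_attrs {
    bool     access;   /* false: No Access */
    unsigned ac, w, fwb;
    uint16_t vmid;
    unsigned contig;   /* Contig encoding; 0 when the entry covers two granules */
};

/* upper is PA[dptgs]; validity of the descriptor is assumed checked. */
static struct dpt_l1_attrs dpt_l1_select(uint64_t d, unsigned upper)
{
    struct dpt_l1_attrs r = { false, 0, 0, 0, 0, 0 };
    unsigned a = (unsigned)d & 0x3;
    unsigned contig = (unsigned)(d >> 8) & 0xF;
    unsigned half;

    if (a == 0x3 && contig != 0) {
        half = 0;                    /* contiguous region: AC0, W0, FWB0, VMID0 */
        r.contig = contig;
    } else {
        half = upper & 1;            /* two granules: A[half] gates access */
        if (((a >> half) & 1) == 0)
            return r;                /* No Access */
    }

    uint32_t h = (uint32_t)(d >> (32 * half)); /* same field layout per half */
    r.access = true;
    r.ac = (h >> 2) & 0x3;
    r.w = (h >> 4) & 0x1;
    r.fwb = (h >> 5) & 0x1;
    r.vmid = (uint16_t)(h >> 16);
    return r;
}
```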

3.24.4 DPT lookup errors

In the DPT check, there are two classes of fault that can occur:
• The DPT lookup process succeeds, but does not grant access for the transaction being checked. This is referred to as a Device Access fault, and is reported as F_TRANSL_FORBIDDEN. The criteria for generating a Device Access fault are specified in 3.24.1 DPT check and 3.24.2.1 DPT TLB caching and Device Access faults.
• The DPT lookup process fails. This is referred to as a DPT lookup fault, and is reported both in SMMU_(R_)DPT_CFG_FAR and as F_TRANSL_FORBIDDEN. See 7.3.8 F_TRANSL_FORBIDDEN.

The fault reporting priority for DPT lookup faults is as follows (reason: reported as, level):
1. DPT_WALK_EN = 0: DPT_DISABLED, level 0.
2. Invalid DPT register configuration: DPT_WALK_FAULT, level 0.
3. GPC fault on level 0 fetch: DPT_GPC_FAULT, level 0.
4. External abort on level 0 fetch: DPT_EABT, level 0.
5. Invalid level 0 descriptor: DPT_WALK_FAULT, level 0.
6. GPC fault on level 1 fetch: DPT_GPC_FAULT, level 1.
7. External abort on level 1 fetch: DPT_EABT, level 1.
8. Invalid level 1 descriptor: DPT_WALK_FAULT, level 1.

If the SMMU encounters any of these faults while performing a DPT lookup, and SMMU_(R_)DPT_CFG_FAR.FAULT = 0, the SMMU reports the information in SMMU_(R_)DPT_CFG_FAR and sets the FAULT bit to 1. If the FAULT bit is already 1, the fault is not reported in SMMU_(R_)DPT_CFG_FAR.

When the SMMU makes a fault active in SMMU_(R_)DPT_CFG_FAR, it additionally makes the corresponding SMMU_(R_)GERROR.DPT_ERR active, if the value of SMMU_(R_)GERROR(N).DPT_ERR does not already indicate an active fault.

If a fault is observable in SMMU_(R_)GERROR.DPT_ERR, then it has already been made observable in SMMU_(R_)DPT_CFG_FAR.

If a client-originated access generates a DPT lookup fault, and the resulting abort response is visible to the device, then the corresponding update of SMMU_(R_)GERROR.DPT_ERR is also observable.
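The following C sketch shows the recording rules above. It uses a hypothetical in-memory model of the register fields rather than real MMIO accesses. It relies on the SMMUv3 convention that a GERROR bit indicates an active error while it differs from the corresponding GERRORN bit. Two things are not modelled: the F_TRANSL_FORBIDDEN event that accompanies each DPT lookup fault, and the selection of the highest-priority fault.

```c
#include <stdbool.h>

enum dpt_fault_reason { DPT_DISABLED, DPT_WALK_FAULT, DPT_GPC_FAULT, DPT_EABT };

/* Hypothetical model; field names follow the text, not the register layout. */
struct dpt_fault_state {
    bool                  far_fault;       /* SMMU_(R_)DPT_CFG_FAR.FAULT */
    enum dpt_fault_reason far_reason;
    unsigned              far_level;
    bool                  gerror_dpt_err;  /* SMMU_(R_)GERROR.DPT_ERR  */
    bool                  gerrorn_dpt_err; /* SMMU_(R_)GERRORN.DPT_ERR */
};

/* Record a DPT lookup fault: FAR first, then GERROR if not already active. */
static void dpt_record_lookup_fault(struct dpt_fault_state *s,
                                    enum dpt_fault_reason reason, unsigned level)
{
    if (s->far_fault)
        return;                      /* FAR already holds an active fault */
    s->far_reason = reason;
    s->far_level = level;
    s->far_fault = true;
    /* GERROR.DPT_ERR is active while it differs from GERRORN.DPT_ERR. */
    if (s->gerror_dpt_err == s->gerrorn_dpt_err)
        s->gerror_dpt_err = !s->gerror_dpt_err;
}
```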

Suppose a DPT lookup fault is observable in SMMU_(R_)GERROR.DPT_ERR, or the abort arising from that fault is visible to the client device. Then completion of a CMD_SYNC on an appropriate Command queue for the Security state guarantees one of the following, consistent with the existing rules for Event queue observability:
• The corresponding F_TRANSL_FORBIDDEN is observable in the Event queue.
• The F_TRANSL_FORBIDDEN has been discarded because the Event queue is unwritable.

If the F_TRANSL_FORBIDDEN event arising from a DPT lookup fault is observable, then the corresponding update of SMMU_(R_)GERROR.DPT_ERR is also observable.

The SMMU is not required to report any of these faults if a DPT check can be resolved by a successful DPT TLB lookup.

The following configurations are treated as Invalid DPT register configuration:
• The Reserved value 0b111, or a value exceeding SMMU_IDR5.OAS, for DPTPS.
• An Invalid or Reserved encoding of DPTGS.
• An Invalid or Reserved encoding of L0DPTSZ.

• Configuration of L0DPTSZ to an address size that exceeds the address sizes in SMMU_IDR5.OAS or DPTPS.

A DPT entry is treated as invalid if any of the following apply:
• The entry does not match one of the descriptor formats for the given level of lookup.
• The entry matches one of the formats for the given level of lookup, but has any RES0 bit set to 1 in the descriptor.
• The entry contains a field that is not RES0 and is configured to a Reserved encoding.

If a DPT_GPC_FAULT is reported, then the corresponding GPC fault information is reported in the appropriate SMMU_ROOT_GPT_CFG_FAR or SMMU_ROOT_GPF_FAR register, with REASON = TRANSLATION and FAULTCODE = GPF_WALK_EABT.

If a DPT_EABT is reported as the result of a RAS error, then the corresponding RAS information is reported in RAS syndrome registers, if implemented. See 12.2 Error consumption visible through the SMMU programming interface.

3.24.5 DPT maintenance operations

The DPT maintenance commands remove cached DPT information from DPT TLBs, regardless of whether the entry was inserted as the result of a successful ATS Translation Request or of a walk of the DPT.

The CMD_DPTI_ALL and CMD_DPTI_PA commands have the same consumption and completion behavior as CMD_TLBI_ commands, in that:
• Consumption of the command does not provide any guarantees.
• Consumption of a subsequent CMD_SYNC on the same Command queue on which the CMD_DPTI_ command was issued guarantees all of the following:
– The appropriate invalidation has been performed.
– All Events and faults relating to the invalidated entries have been reported.
– All client transactions using the invalidated entries have completed.

Note: Within a given Security state, DPT maintenance behaves as follows:
• CMD_DPTI_ALL is always sufficient to invalidate all cached DPT information.
• To invalidate cached copies of a Level 0 Table descriptor, a CMD_DPTI_PA with Leaf as 0 and an address anywhere in the region covered by that descriptor is sufficient.
• To invalidate cached copies of a Level 0 Table descriptor and also any Level 1 entries in the table it points to, a CMD_DPTI_PA is sufficient if all of the following apply:
– Leaf is 0.
– SIZE matches the size of the region pointed to by the Level 0 Table descriptor.
– The address is aligned to the size of the region.
• To invalidate cached copies of a Level 0 Block descriptor or of Level 1 entries for a contiguous region, a CMD_DPTI_PA is sufficient if Leaf is 1, SIZE matches the size of the region, and the address is aligned to the size of the region.
• To invalidate cached copies of the DPT information in one granule of a non-contiguous Level 1 entry, a CMD_DPTI_PA with Leaf as 1 and the address of the granule is sufficient.

Note: CMD_TLBI_* commands and broadcast TLBI operations for stage 1 and stage 2 are not required to invalidate any DPT TLB entries.

If an implementation combines GPT and DPT information in DPT TLB entries, TLBI by PA operations remove DPT TLB entries according to the requirements for TLBI by PA operations. Otherwise, TLBI by PA operations are not required to invalidate any DPT TLB entries.

The DPT maintenance operations are described in 4.6 DPT maintenance.

3.24.6 Software guidance

The DPT is expected, in the general case, to be used to partition physical address space between different EL1 contexts. The statements in this section therefore generally apply to stage 2 translation configuration.

However, future use cases might include use of the EL2-E2H StreamWorld. In that case, the statements in this section would apply to stage 1 translation configuration.

3.24.6.1 Access permissions considerations

The DPT configuration for a memory region should represent the most-permissive access permissions for that memory region, for the final enabled stage of translation.

This means that the DPT should grant access for a given granule if any of the following are true for the final enabled stage of translation:
• The descriptor has AF = 1.
• The descriptor has AF = 0 and hardware update of the Access Flag is enabled.

This also means that the DPT should grant write access for a given granule if any of the following are true for the final enabled stage of translation:
• The descriptor grants write access. This includes the case where the descriptor is writable-dirty.
• The descriptor is writable-clean and HTTU is enabled.

3.24.6.2 Invalid to valid transition

Assuming that the translation is initially Invalid, software must:
1. Configure the DPT to grant access for the granule.
2. Perform appropriate cache maintenance and barriers.
3. Configure the final enabled stage of translation to grant access for the granule.
Note: TLB maintenance is not required for Invalid to Valid transitions.

3.24.6.3 Valid to invalid transition

When revoking access to a granule, software must:
1. Mark the descriptor in the final enabled stage of translation as Invalid.
2. Perform the appropriate TLBI and sync operation.
Note: This means that new ATS Translation Requests will fail, but Translated transactions might still succeed.
3. Perform the appropriate CMD_ATC_INV and sync operation.
Note: This means the device should not issue ATS Translated transactions, other than write-backs in the case of a fully-coherent device.
4. If the StreamID is associated with a fully-coherent device, issue the appropriate CMOs for the granule. This might result in device write-backs.
5. Mark the DPT configuration as invalid.
Note: This means that rogue ATS Translated transactions might succeed or fail.
6. Perform the appropriate CMD_DPTI_* and sync operation.
Note: This means that rogue ATS Translated transactions will fail.

3.24.6.4 Clearing DPT lookup errors

DPT lookup errors for a Security state are reported in both SMMU_(R_)DPT_CFG_FAR and SMMU_(R_)GERROR.DPT_ERR, if they are not already active. This means that the algorithm for clearing a DPT lookup error, once the error is resolved, is as follows:
1. Write 0 to SMMU_(R_)DPT_CFG_FAR.FAULT.
Note: This clears the whole register to zero.
2. Acknowledge SMMU_(R_)GERROR.DPT_ERR by writing SMMU_(R_)GERRORN.DPT_ERR to the same value as SMMU_(R_)GERROR.DPT_ERR.
3. Read SMMU_(R_)DPT_CFG_FAR.FAULT again to see whether a new fault occurred between steps 1 and 2.

See also:
• 3.24.4 DPT lookup errors.
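The three-step procedure for clearing DPT lookup errors can be sketched in C against a hypothetical register model. All names are hypothetical; a driver would use MMIO accessors and barriers instead of plain structure accesses.

The optional callback stands in for the SMMU recording a new DPT lookup fault between steps 1 and 2. This shows why step 3 is needed: the new fault is held in SMMU_(R_)DPT_CFG_FAR, but after the acknowledgment in step 2 no DPT_ERR remains active.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields involved. */
struct dpt_err_regs {
    bool     far_fault;        /* SMMU_(R_)DPT_CFG_FAR.FAULT */
    uint64_t far_info;         /* remaining SMMU_(R_)DPT_CFG_FAR contents */
    bool     gerror_dpt_err;   /* SMMU_(R_)GERROR.DPT_ERR (SMMU-owned) */
    bool     gerrorn_dpt_err;  /* SMMU_(R_)GERRORN.DPT_ERR (software-owned) */
};

typedef void (*dpt_hw_event_fn)(struct dpt_err_regs *r);

/* Simulated SMMU recording of a new fault, following 3.24.4. */
static void sim_new_dpt_fault(struct dpt_err_regs *r)
{
    if (r->far_fault)
        return;
    r->far_fault = true;
    r->far_info = 0x1;                 /* placeholder syndrome */
    if (r->gerror_dpt_err == r->gerrorn_dpt_err)
        r->gerror_dpt_err = !r->gerror_dpt_err;
}

/* Returns true if a new fault was found in step 3. */
static bool dpt_clear_lookup_error(struct dpt_err_regs *r,
                                   dpt_hw_event_fn between_1_and_2)
{
    /* Step 1: writing 0 to FAULT clears the whole register. */
    r->far_fault = false;
    r->far_info = 0;

    if (between_1_and_2 != NULL)
        between_1_and_2(r);            /* SMMU activity racing with software */

    /* Step 2: acknowledge by matching GERRORN.DPT_ERR to GERROR.DPT_ERR. */
    r->gerrorn_dpt_err = r->gerror_dpt_err;

    /* Step 3: re-read FAULT. */
    return r->far_fault;
}
```

If the function returns true, software handles the newly recorded fault and then repeats the procedure.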

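The clearing algorithm in 3.24.6.4 can be sketched against a simulated register file. This sketch makes three assumptions:
• The bit positions `DPT_CFG_FAR_FAULT` and `GERROR_DPT_ERR` are hypothetical placeholders.
• `struct dpt_err_regs` stands in for real MMIO accessors.
• It follows the usual SMMUv3 GERROR/GERRORN convention that an error is active while the two bits differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DPT_CFG_FAR_FAULT (1ULL << 0)
#define GERROR_DPT_ERR    (1U << 12)

/* Simulated SMMU_(R_)DPT_CFG_FAR, SMMU_(R_)GERROR and SMMU_(R_)GERRORN. */
struct dpt_err_regs {
    uint64_t dpt_cfg_far;
    uint32_t gerror;
    uint32_t gerrorn;
};

/* An error is active while GERROR and GERRORN differ for that bit. */
static bool dpt_err_active(const struct dpt_err_regs *r) {
    return ((r->gerror ^ r->gerrorn) & GERROR_DPT_ERR) != 0;
}

/* Returns true if FAULT is set again after the sequence, meaning that a new
 * fault was recorded between steps 1 and 2 and must also be handled. */
static bool dpt_clear_lookup_error(struct dpt_err_regs *r) {
    /* Step 1: writing 0 to FAULT clears the whole register. */
    r->dpt_cfg_far = 0;
    /* Step 2: acknowledge by making GERRORN.DPT_ERR equal to GERROR.DPT_ERR. */
    r->gerrorn = (r->gerrorn & ~GERROR_DPT_ERR) | (r->gerror & GERROR_DPT_ERR);
    /* Step 3: re-read FAULT. */
    return (r->dpt_cfg_far & DPT_CFG_FAR_FAULT) != 0;
}
```

A caller that sees `true` would handle the newly recorded fault and then repeat the sequence.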
3.24.7 Considerations for configuring Split-stage ATS versus Full ATS with DPT checking

This section is informative.

If a use case relies on direct device access to physical address space, Full ATS with DPT checks must be configured. This is for functional reasons, not security reasons. If direct device access to physical address space is not required, Split-stage ATS is preferable to Full ATS with DPT checks. The two approaches have comparable protection and performance characteristics, but Split-stage ATS is simpler to configure.

Functional considerations:
• If the device interface requires direct access to physical address space, Full ATS is required for address-based routing to function correctly. For example, this applies if the device has a fully coherent cache, or if PCIe peer-to-peer routing is required without routing transactions via the host Root Port.

Security considerations:
• Both configurations still enforce granule protection checks and limit access from a device to the physical address footprint of the guest in question.
• Split-stage ATS permits the SMMU to enforce stage 2 permission checks on Translated transactions, at a granularity of read/write/execute. Full ATS with DPT cannot distinguish these levels of permission.

Performance considerations:
• If Split-stage ATS is used, the EL2 software managing the SMMU does not need to configure the DPT for granules associated with that StreamID. It can therefore avoid the overhead of updating the table and issuing CMD_DPTI_* operations.
• SMMU lookup performance and cacheability are expected to be comparable between the two configurations. On a TLB miss, a DPT lookup is likely to require fewer levels of walk.
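The guidance above amounts to a simple decision. The following minimal sketch assumes a hypothetical `struct dev_reqs` that records the two functional requirements named in this section.

```c
#include <assert.h>
#include <stdbool.h>

enum ats_mode { ATS_SPLIT_STAGE, ATS_FULL_WITH_DPT };

/* Hypothetical record of the device's functional requirements. */
struct dev_reqs {
    bool fully_coherent_cache;    /* device has a fully coherent cache */
    bool p2p_bypassing_root_port; /* PCIe peer-to-peer without routing via the host Root Port */
};

static enum ats_mode choose_ats_mode(const struct dev_reqs *d) {
    /* Direct access to physical address space is a functional requirement
     * that only Full ATS (with DPT checks) satisfies. */
    if (d->fully_coherent_cache || d->p2p_bypassing_root_port)
        return ATS_FULL_WITH_DPT;
    /* Otherwise prefer Split-stage ATS: no DPT updates or CMD_DPTI_* needed,
     * and stage 2 read/write/execute checks still apply to Translated transactions. */
    return ATS_SPLIT_STAGE;
}
```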

3.25 Granule Protection Checks

The Realm Management Extension (FEAT_RME) [2] specifies the behavior of granule protection checks. An SMMU with RME performs the same checks for non-PE Requesters. The SMMU behavior is similar to that defined in FEAT_RME, FEAT_RME_GPC2, FEAT_RME_GPC3, and FEAT_RME_GDI in the following areas:
• GPT format. The GPT encodings are identical, but their interpretation might differ between a PE and the SMMU. For example, this applies to the GDI encodings.
• Invalidation and synchronization mechanisms for GPT updates. These are the same in a PE and the SMMU.

The SMMU configuration registers for the GPT base address and format are equivalent to those in FEAT_RME. See also 17.4 Assignment of PARTID and PMG for SMMU-originated transactions.

Granule protection checks are enabled only when SMMU_ROOT_CR0.GPCEN is 1. All statements in this section that describe how granule protection checks are performed apply only when granule protection checks are enabled.

3.25.1 Client-originated accesses

Accesses to all physical addresses, except for fetches of GPT information, are subject to granule protection checks. A client-originated access that experiences a GPC fault is signaled to the client device in the same manner as an External abort. A client-originated access that experiences a GPC fault on the output address of the access is not reported in the Event queue.

3.25.1.1 GPC for client devices without a StreamID

Granule protection checks also apply to accesses from client devices that are not associated with a StreamID. These devices are referred to as NoStreamID devices. NoStreamID devices only access PA space, and are not associated with any stage 1 or stage 2 translation configuration. The GPC fault reporting behavior for accesses from NoStreamID devices is the same as for regular client-originated accesses. NoStreamID devices are not associated with a SEC_SID value.
Transactions issued by a NoStreamID device include both a physical address and a PA space. An access from a NoStreamID device is terminated with an abort if its physical address exceeds the implemented output address size, advertised in SMMU_IDR5.OAS. No Event record or fault is recorded for such an access. The SMMU does not perform any architectural transformations or overrides on NoStreamID accesses, but the SMMU might apply protocol-specific normalization to transaction attributes.

Note: Accesses from System Agent (SA) clients are NoStreamID accesses and are therefore subject to granule protection checks. See 3.25.10.2 System Agent (SA).

3.25.1.2 Speculative and hint accesses

Note: The SMMU does not report faults encountered in any of the following cases:
• During a speculative translation request.
• During translation of transactions marked as speculative.
• For prefetch commands.
• For NW-DCP or DH transactions.
See also 3.14 Speculative accesses.

For an SMMU with RME, GPC faults encountered in those same cases are reported as follows:
• No event record is generated.
• If SMMU_IDR0.RME_IMPL = 0, it is CONSTRAINED UNPREDICTABLE whether the GPC fault is reported. If it is reported, it is reported in the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register, if that register does not already contain an active fault.
• If SMMU_IDR0.RME_IMPL = 1, the granule protection check fault is not reported.

For speculative translation requests:
• If SMMU_IDR0.RME_IMPL = 0, it is CONSTRAINED UNPREDICTABLE whether the GPC on the output address of a translation request is applied at the time of the translation, or only when a transaction using the translation is issued.
• If SMMU_IDR0.RME_IMPL = 1, the GPC on the output address of a translation request is applied at the time of the translation.

3.25.2 Interactions with PCIe ATS

Consistent with the rules for all accesses, all PCIe client transactions are subject to granule protection checks. The SMMU_CR0.ATSCHK bit has no effect on granule protection checks.

If an SMMU-originated access experiences a GPC fault while servicing an ATS Translation Request, the SMMU responds to the ATS Translation Request with Completer Abort.

If an ATS Translation Request is completed with Success and R == W == 0, the address in the Translation Completion is not valid and it is not subject to granule protection checks.

If SMMU_IDR0.RME_IMPL == 1, RME features are supported. The SMMU performs the GPC on the output address for the result of an ATS Translation Request before sending the completion. The SMMU returns a translation region size in the ATS Translation Completion such that the GPC passes for accesses to anywhere in the region.

If SMMU_IDR0.RME_IMPL == 0, the SMMU is permitted but not required to perform a GPC on the output address for the result of an ATS Translation Request.

If the output address for an ATS Translation Request fails a GPC, the SMMU responds to the ATS Translation Request with Completer Abort.
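For SMMU_IDR0.RME_IMPL == 1, the completion rules in this section can be sketched as one function. This is an illustrative model only:
• `enum gpc_result` is a hypothetical summary of the GPC outcome for the output address.
• Region-size selection is not modeled.
• The function also covers the PM = 1 case in this section, where a GPC that fails only for lack of write permission yields a successful completion with W == 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the GPC outcome for the translated output address. */
enum gpc_result { GPC_PASS, GPC_FAIL_WRITE_ONLY, GPC_FAIL_OTHER };
enum ats_status { ATS_SUCCESS, ATS_CA }; /* CA = Completer Abort */

struct ats_cpl {
    enum ats_status status;
    bool r, w, exe;
};

/* r/w/exe are the permissions resolved by all enabled stages of translation. */
static struct ats_cpl ats_cpl_after_gpc(bool r, bool w, bool exe, bool pm, enum gpc_result g) {
    struct ats_cpl c = { ATS_SUCCESS, r, w, exe };
    if (!r && !w)
        return c;            /* R == W == 0: address not valid, no GPC applied */
    if (g == GPC_PASS)
        return c;
    if (pm && g == GPC_FAIL_WRITE_ONLY) {
        c.w = false;         /* PM = 1, forbidden only for write: Success with W == 0 */
        return c;
    }
    /* Any other GPC failure (a write-only failure is not expected for PM = 0). */
    c.status = ATS_CA;
    c.r = c.w = c.exe = false;
    return c;
}
```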
ATS Translated transactions are subject to granule protection checks. An ATS Translated transaction that fails the GPC is terminated with an abort.

Granule protection checks for an ATS Translation Request with PM = 1 are the same as for the equivalent Untranslated transaction. This means that NSP enforcement rules apply to that transaction. See 3.25.10.1.2 Granule Protection Checks for NSP.

For an ATS Translation Request that has PM = 1, all of the following are true:
• If the GPC forbids the access only because it requires write permission, the result is a successful ATS Translation Completion with W == 0, and with R and Exe resolved by all enabled stages of translation.
• If the GPC forbids the access for any other reason, the result is an ATS Translation Completion with Completer Abort (CA) status.

Note: A device must distinguish between the permissions granted for a PM = 0 Translation Request and those granted for a PM = 1 Translation Request. For example, this can be achieved by tagging translations in an ATC with the PM attribute and considering it during cache lookups. An ATC must correctly process an ATS Invalidation request regardless of the PM value associated with a translation. See 3.25.10.1.1 Protected Mode.

3.25.3 SMMU-originated accesses

An SMMU-originated access that experiences a GPC fault is reported as though it had experienced an External abort.

Consistent with the behavior of F_STE_FETCH, F_CD_FETCH, F_VMS_FETCH and F_WALK_EABT, signaling and reporting of these failures is not affected by the value of the CD.{A, R, S} bits or the STE.{S2S, S2R} bits.

For an SMMU with SMMU_IDR0.RME_IMPL == 1:
• Each of the F_STE_FETCH, F_CD_FETCH, F_VMS_FETCH and F_WALK_EABT event records has a new field at bit 80, named GPCF.
• An F_STE_FETCH, F_CD_FETCH, F_VMS_FETCH or F_WALK_EABT arising from a GPC fault is reported with GPCF=1.
• An F_STE_FETCH, F_CD_FETCH, F_VMS_FETCH or F_WALK_EABT arising for a reason other than a GPC fault is reported with GPCF=0. This is unchanged from an SMMU without SMMU_IDR0.RME_IMPL == 1.

For example, if the SMMU experiences a GPC fault:
• On an access to an STE, it is reported as F_STE_FETCH, with GPCF=1. This is signaled to the client as an External abort.
• On an access to a Non-secure Event queue, it is reported through SMMU_GERROR.EVENTQ_ABT_ERR.

3.25.4 Reporting of GPC faults

The reasons for a GPC fault are categorized into three groups:
• Faults arising from an access to a location forbidden by the GPT configuration, referred to as a Granule Protection Fault (GPF).
• Faults arising from misconfiguration of SMMU_ROOT_GPT_BASE, SMMU_ROOT_GPT_BASE_CFG or the GPT, referred to as a GPT lookup error.
• Faults arising from RAS errors.

The following error conditions represent a GPF, and are reported in SMMU_ROOT_GPF_FAR:
• SMMU_ROOT_GPT_BASE_CFG.APPSAA is 0, and an access to a PA space other than Non-secure has a physical address that exceeds the range configured in SMMU_ROOT_GPT_BASE_CFG.PPS. See 3.25.9 Accesses above the protected address space.
• An access was attempted to a location that is forbidden by the GPT configuration.

Note: The accesses that are forbidden by the GPT configuration are extended by Granular Data Isolation (GDI). See 3.25.10 Granular Data Isolation.
The following error conditions represent a GPT lookup error, and are reported in SMMU_ROOT_GPT_CFG_FAR:
• Fields in SMMU_ROOT_GPT_BASE_CFG are configured to reserved values.
• SMMU_ROOT_GPT_BASE_CFG.PPS is configured to exceed SMMU_IDR5.OAS.
• SMMU_ROOT_GPT_BASE_CFG.{SH, IRGN, ORGN} are configured to an invalid combination.
• SMMU_ROOT_GPT_BASE.ADDR is configured to exceed the size configured in SMMU_ROOT_GPT_BASE_CFG.PPS.
• The output address of a GPT Table Entry exceeds the size configured in SMMU_ROOT_GPT_BASE_CFG.PPS.
• The SMMU depends on using values in an invalid GPT Entry.
• The SMMU experienced an External abort while fetching a GPT Entry.
• The SMMU experienced a RAS error while fetching a GPT Entry. This is reported in the same manner as if the SMMU experienced an External abort while fetching a GPT Entry.

3.25.5 SMMU behavior if a GPC fault is active

If a client-originated or SMMU-originated access experiences a GPF reported in SMMU_ROOT_GPF_FAR, then:
• If there is no prior GPF in SMMU_ROOT_GPF_FAR, the appropriate syndrome information is recorded in SMMU_ROOT_GPF_FAR.
• Other accesses that do not experience a GPF or GPT lookup error continue as specified.
• The GPF remains active until software writes 0 to SMMU_ROOT_GPF_FAR.FAULT.

If a client-originated or SMMU-originated access experiences a GPT lookup error reported in SMMU_ROOT_GPT_CFG_FAR, then:
• If there is no prior GPT lookup error in SMMU_ROOT_GPT_CFG_FAR, the appropriate syndrome information is recorded in SMMU_ROOT_GPT_CFG_FAR.
• Other accesses that do not experience a GPF or GPT lookup error continue as specified.
• The GPT lookup error remains active until software writes 0 to SMMU_ROOT_GPT_CFG_FAR.FAULT.

An SMMU with RME has two additional edge-triggered wired interrupts:

Source       Trigger reason
GPF_FAR      An error becomes active in SMMU_ROOT_GPF_FAR.
GPT_CFG_FAR  An error becomes active in SMMU_ROOT_GPT_CFG_FAR.

3.25.6 Observability of GPC faults

If the termination of a client transaction as a result of a GPC fault is observable to the client device, then:
• If the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register already contained an active fault, it is not updated in this case.
• If the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register did not already contain an active fault, the related syndrome information is observable in the appropriate register.

If an interrupt indicating the presence of a GPC fault is observable, the syndrome information is observable in the SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register, as appropriate.

Suppose that the termination of a client transaction as a result of a GPC fault has been observable to the client device. Completion of a subsequent CMD_SYNC then guarantees observability of any related events in the Event queue. If the Event queue is unwritable, it instead guarantees that the Event will not become observable.
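Software that consumes these reports might decode the GPCF field described in 3.25.3 and track the first-fault latching of the FAR registers. The sketch below makes two assumptions:
• The 32-byte event record has been read into four 64-bit words in CPU byte order.
• `struct far_latch` is a hypothetical software model of SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR, not the real register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVT_GPCF_BIT 80u /* GPCF position given in 3.25.3 */

/* evt: word 0 holds record bits [63:0]. Only meaningful for F_STE_FETCH,
 * F_CD_FETCH, F_VMS_FETCH and F_WALK_EABT when SMMU_IDR0.RME_IMPL == 1. */
static bool evt_gpcf(const uint64_t evt[4]) {
    return ((evt[EVT_GPCF_BIT / 64u] >> (EVT_GPCF_BIT % 64u)) & 1u) != 0;
}

/* Hypothetical model of one FAR register and its edge-triggered interrupt. */
struct far_latch {
    bool fault;        /* FAULT: an error is active */
    uint64_t syndrome; /* stands in for the recorded syndrome fields */
    unsigned irqs;     /* interrupts raised so far */
};

/* Records a fault only if none is active; returns true if it was recorded. */
static bool far_report(struct far_latch *f, uint64_t syndrome) {
    if (f->fault)
        return false;  /* an active fault is not overwritten */
    f->fault = true;
    f->syndrome = syndrome;
    f->irqs++;         /* edge: the error has just become active */
    return true;
}

/* Software writes 0 to FAULT; other fields are left as-is in this model. */
static void far_clear(struct far_latch *f) {
    f->fault = false;
}
```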
For an SMMU with SMMU_ROOT_IDR0.BGPTM == 1:
• After completion of a broadcast TLBI PA and DSB instruction on the PE, completion of a subsequent CMD_SYNC guarantees that no Events relating to GPT configuration invalidated by that TLBI and DSB will later be made observable in the Event queue.
• After a broadcast TLBI PA instruction on the PE, completion of a subsequent DSB instruction guarantees that any errors reported in the SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR registers relating to GPT configuration invalidated by that TLBI are already observable.

For an SMMU with SMMU_ROOT_IDR0.RGPTM == 1:
• After completion of a register-based TLBI by PA, indicated by SMMU_ROOT_TLBI_CTRL.RUN, completion of a subsequent CMD_SYNC guarantees that no Events relating to GPT configuration invalidated by that TLBI by PA will later be made observable in the Event queue.
• Completion of a register-based TLBI by PA, indicated by SMMU_ROOT_TLBI_CTRL.RUN, guarantees that any errors reported in the SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR registers relating to GPT configuration invalidated by that TLBI by PA are already observable.

If an F_STE_FETCH, F_CD_FETCH, F_VMS_FETCH or F_WALK_EABT with GPCF == 1 is observable in the Event queue, one of the following applies:
• If the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register already contained an active fault, it is not updated in this case.
• If the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register did not already contain an active fault, the related syndrome information is observable in the appropriate register.

If an update to SMMU_(*_)GERROR resulting from a GPC fault is observable, one of the following applies:
• If the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register already contained an active fault, it is not updated in this case.
• If the appropriate SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR register did not already contain an active fault, the related syndrome information is observable in the appropriate register.

If a fault to be reported in SMMU_(*_)GERROR is observable in SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR, the fault will also be reported in SMMU_(*_)GERROR in finite time. The existing mechanisms for ensuring the visibility of errors in SMMU_(*_)GERROR also apply in this case. For example, completion of an update of SMMU_CR0.EVENTQEN to 0 guarantees observability of any errors to be reported in SMMU_GERROR.EVENTQ_ABT_ERR.

3.25.7 DCMDQ-related GPC faults

If SMMU_(*_)IDR6.DCMDQ is 0b01, a GPF or GPT lookup error generated by any of the DCMDQ-related operations listed below is reported in both:
• SMMU_(*_)ECMDQ_CONSn.{HS_ERR, HS_ERR_REASON}.
• SMMU_ROOT_GPF_FAR.{REASON, FAULTCODE} or SMMU_ROOT_GPT_CFG_FAR.{REASON, FAULTCODE}.

Reported values for each access type:
• STE fetch (for DCMDQ command fetch/CMD_SYNC MSI translation): REASON = TRANSLATION, FAULTCODE = GPF_STE_FETCH, HS_ERR_REASON = HERROR_IPA (1).
• Stage 2 walk (for DCMDQ command fetch/CMD_SYNC MSI translation): REASON = TRANSLATION, FAULTCODE = GPF_WALK_EABT, HS_ERR_REASON = HERROR_IPA (1).
• DCMDQ fetch: REASON = GERROR, FAULTCODE = DCMDQ_GPF, HS_ERR_REASON = HERROR_IPA (1).
• CIT/VSTT fetch: REASON = TRANSLATION, FAULTCODE = GPF_CIT_FETCH or GPF_VSTT_FETCH, HS_ERR_REASON = HERROR_SID_EABT.
• MSI following CMD_SYNC: REASON = GERROR, FAULTCODE = MSI_CMDQ_GPF, HS_ERR_REASON = HERROR_MSI_ABT (2).

(1) GPC faults on these accesses are not reported as though they have experienced an External abort. This behavior differs from other SMMU-originated accesses experiencing a GPC fault.
(2) In line with the reporting of HERROR_MSI_ABT as described in 3.5.7.7 DCMDQ Errors and Faults, the External abort is observable on the subsequent CMD_SYNC.
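The reporting values above can be held as a lookup table, for example in a test harness that checks what an implementation reports. The sketch makes two assumptions:
• The architectural names are stored as strings, because their numeric encodings are not given here.
• The CIT/VSTT row is split into one entry per fetch type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Access types from the 3.25.7 table (CIT/VSTT split into two entries). */
enum dcmdq_access {
    DCMDQ_ACC_STE_FETCH,      /* STE fetch for DCMDQ command fetch/CMD_SYNC MSI translation */
    DCMDQ_ACC_S2_WALK,        /* stage 2 walk for the same translations */
    DCMDQ_ACC_DCMDQ_FETCH,
    DCMDQ_ACC_CIT_FETCH,
    DCMDQ_ACC_VSTT_FETCH,
    DCMDQ_ACC_MSI_AFTER_SYNC, /* MSI following CMD_SYNC */
    DCMDQ_ACC_COUNT
};

struct dcmdq_gpc_report {
    const char *reason;        /* REASON in SMMU_ROOT_GPF_FAR or SMMU_ROOT_GPT_CFG_FAR */
    const char *faultcode;     /* FAULTCODE in the same register */
    const char *hs_err_reason; /* HS_ERR_REASON in SMMU_(*_)ECMDQ_CONSn */
};

static const struct dcmdq_gpc_report dcmdq_gpc_reports[DCMDQ_ACC_COUNT] = {
    [DCMDQ_ACC_STE_FETCH]      = { "TRANSLATION", "GPF_STE_FETCH",  "HERROR_IPA" },
    [DCMDQ_ACC_S2_WALK]        = { "TRANSLATION", "GPF_WALK_EABT",  "HERROR_IPA" },
    [DCMDQ_ACC_DCMDQ_FETCH]    = { "GERROR",      "DCMDQ_GPF",      "HERROR_IPA" },
    [DCMDQ_ACC_CIT_FETCH]      = { "TRANSLATION", "GPF_CIT_FETCH",  "HERROR_SID_EABT" },
    [DCMDQ_ACC_VSTT_FETCH]     = { "TRANSLATION", "GPF_VSTT_FETCH", "HERROR_SID_EABT" },
    [DCMDQ_ACC_MSI_AFTER_SYNC] = { "GERROR",      "MSI_CMDQ_GPF",   "HERROR_MSI_ABT" },
};

/* Returns the expected reporting values, or NULL for an unknown access type. */
static const struct dcmdq_gpc_report *dcmdq_gpc_lookup(enum dcmdq_access a) {
    if ((unsigned)a >= DCMDQ_ACC_COUNT)
        return NULL;
    return &dcmdq_gpc_reports[a];
}
```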
3.25.8 Non-secure Only (NSO)

This section applies only when SMMU_ROOT_IDR0.NSO is 1.

If SMMU_ROOT_GPT_BASE_CFG.NSO is 1, Granule Protection Checks are extended to support the Non-secure Only (NSO) GPI field encoding introduced in the Armv9.5 architecture [2]. The NSO encoding specifies that accesses to the Non-secure PA space are permitted only from the Non-secure or Root Security states.

If SMMU_ROOT_GPT_BASE_CFG.NSO is 1, an access to a location that is marked as NSO by a GPT descriptor is forbidden when any of the following are true:
• The access originates from a Realm StreamID. This includes accesses to the Non-secure PA space.
• The access originates from a Secure StreamID. This includes accesses to the Non-secure PA space.
• The access is a NoStreamID access to any PA space other than the Non-secure PA space.

Note: A forbidden access, as described above, causes a GPF to be reported. For more information, see 3.25.4 Reporting of GPC faults.

3.25.9 Accesses above the protected address space

This section applies only when SMMU_ROOT_IDR0.APPSAA is 1.

The SMMU_ROOT_GPT_BASE_CFG.APPSAA field controls the behavior of accesses that have physical addresses above the protected address space, as configured in SMMU_ROOT_GPT_BASE_CFG.PPS.

If SMMU_ROOT_GPT_BASE_CFG.APPSAA is 0, all of the following apply to an access with a physical address above the protected address space:
• The access must be to the Non-secure PA space.
• If the access is to any PA space other than the Non-secure PA space, it causes a level 0 Granule Protection Fault (GPF).

Note: For more information, see 3.25.4 Reporting of GPC faults.

Note: If SMMU_ROOT_GPT_BASE_CFG.APPSAA is 0, locations that are above the protected address space are effectively marked by the GPT as NS.

If SMMU_ROOT_GPT_BASE_CFG.APPSAA is 1, an access with a physical address that is above the protected address space is permitted to be to any PA space.

Note: If SMMU_ROOT_GPT_BASE_CFG.APPSAA is 1, locations that are above the protected address space are effectively marked by the GPT as All Accesses Permitted.

3.25.10 Granular Data Isolation

Granular Data Isolation (GDI) provides support for two additional PA spaces:
• Non-secure Protected (NSP). See 3.25.10.1 Non-secure Protected (NSP).
• System Agent (SA). See 3.25.10.2 System Agent (SA).

The GPT is extended to support new GPI encodings, as described in the following table:

Value   Meaning
0b0100  Accesses permitted to System Agent (SA) PA space only.
0b0101  Accesses permitted to Non-secure Protected (NSP) PA space in compliance with NSP enforcement rules. See 3.25.10.1.2 Granule Protection Checks for NSP.
0b0110  Reserved.
0b0111  Reserved.
Note: The interpretation of these GPI encodings differs between the PE and the SMMU.

These GPI encodings are enabled using the following controls:
• SMMU_ROOT_GPT_BASE_CFG.SA for SA.
• SMMU_ROOT_GPT_BASE_CFG.NSP for NSP.

Otherwise, these encodings are considered to be reserved for the purpose of GPT descriptor validity checks.

If SMMU_ROOT_IDR0.GDI is 1, L1 GPT descriptor validity checks must be performed on a pair of L1 GPT descriptors within a naturally aligned 16-byte region of memory. An individual L1 GPT descriptor is valid only when both descriptors within the pair are valid.
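Two parts of this section lend themselves to code: classifying the GDI GPI encodings under the SA and NSP enables, and the paired L1 descriptor validity check. In the sketch below:
• L1 GPT descriptors are assumed to be 8 bytes, so a naturally aligned 16-byte region holds exactly one pair.
• `placeholder_desc_valid` is a hypothetical stand-in for the real per-descriptor validity rules.
• GPI values outside the GDI range are not decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum gdi_gpi { GPI_GDI_SA, GPI_GDI_NSP, GPI_GDI_RESERVED, GPI_NOT_GDI };

/* Classifies a 4-bit GPI against the GDI encodings only. */
static enum gdi_gpi decode_gdi_gpi(uint8_t gpi, bool sa_en, bool nsp_en) {
    switch (gpi & 0xFu) {
    case 0x4: return sa_en ? GPI_GDI_SA : GPI_GDI_RESERVED;   /* 0b0100 */
    case 0x5: return nsp_en ? GPI_GDI_NSP : GPI_GDI_RESERVED; /* 0b0101 */
    case 0x6:                                                 /* 0b0110 */
    case 0x7: return GPI_GDI_RESERVED;                        /* 0b0111 */
    default:  return GPI_NOT_GDI; /* handled by the base FEAT_RME rules */
    }
}

typedef bool (*l1_desc_valid_fn)(uint64_t desc);

/* Placeholder check for demonstration only; not the real validity rules. */
static bool placeholder_desc_valid(uint64_t desc) {
    return desc != 0;
}

/* With SMMU_ROOT_IDR0.GDI == 1, a descriptor is valid only if both
 * descriptors in its naturally aligned pair are valid. */
static bool l1_desc_valid_gdi(const uint64_t *l1, size_t idx, l1_desc_valid_fn valid_one) {
    size_t base = idx & ~(size_t)1; /* first descriptor of the pair */
    return valid_one(l1[base]) && valid_one(l1[base + 1]);
}
```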

For information on the effect of GDI on PMCG, MPAM and MEC, see:
• 10.4 StreamIDs and filtering.
• 17.7 Determination of PARTID space values.
• Chapter 18 Support for Memory Encryption Contexts.

3.25.10.1 Non-secure Protected (NSP)

The following sections apply only when SMMU_ROOT_IDR0.GDI is 1.

3.25.10.1.1 Protected Mode

The NSP PA space is accessible only by Non-secure SMMU client devices when they are in Protected Mode (PM). If SMMU_ROOT_IDR0.GDI is 1, the SMMU architecture is extended to support the PM attribute, which is associated with Non-secure streams.

The presence of the PM attribute in accesses is determined as follows:
• Accesses associated with the Non-secure state, for example accesses from Non-secure StreamIDs, always specify a PM attribute.
• The following accesses do not specify a PM attribute:
  – Accesses associated with the Realm state.
  – Accesses associated with the Secure state.
  – Accesses from NoStreamID clients.

The PM attribute propagates with a request and is presented to the Granule Protection Check along with the resolved PA and PA space. The SMMU does not support overrides for the PM attribute. If a client device does not provide a PM attribute, it takes a default value of 0. If SMMU_ROOT_CR0.GPCEN is 0 or SMMU_ROOT_GPT_BASE_CFG.NSP is 0, the PM attribute is treated as 0 for the purpose of resolving the output PA space. For SMMU-originated accesses, the PM attribute is always 0.

Note: An SMMU client sets the PM attribute on a transaction if the transaction contains, or is derived from, data associated with the Protected Mode. The rules for client entry to and exit from Protected Mode are outside the scope of this specification.

Note: If SMMU_ROOT_IDR0.GDI is 0, PM inputs are ignored by the SMMU.

If any of the following event records are generated as a result of a transaction with PM = 1, they are converted to an F_PROTECTED event record before being written to the Event queue:
• 7.3.2 F_UUT.
• 7.3.6 F_BAD_ATS_TREQ.
• 7.3.8 F_TRANSL_FORBIDDEN.
• 7.3.12 F_WALK_EABT.
• 7.3.13 F_TRANSLATION.
• 7.3.14 F_ADDR_SIZE.
• 7.3.15 F_ACCESS.
• 7.3.16 F_PERMISSION.
• 7.3.17 F_TLB_CONFLICT.
• 7.3.19 E_PAGE_REQUEST.

Note: See also 7.3.21 F_PROTECTED.

For all other event records, the SMMU guarantees that IMPLEMENTATION DEFINED fields do not include information that is derived from the client transaction attributes, other than the StreamID or the SubstreamID.

Note: This guarantee prevents malicious disclosure of protected information through input addresses.

An MMU fault generated by an access with PM = 1 might be fatal to the stream, because the event record does not provide
software with sufficient information to handle it. Software should take this restriction into account when configuring translation tables for devices with PM = 1. For example, NSP pages must be statically mapped to avoid Translation faults.

Consider an ordinary transaction that generates an event that is converted to an F_PROTECTED event. A corresponding ATS Translation Request with the same properties is handled as specified in 3.9.1.2 Responses to ATS Translation Requests for the original event.

Additional restrictions apply for Protected Mode transactions. For more information, see:
• 7.5 Global error recording.
• 3.13.1 Software update of flags.
• 3.13.2 Access flag hardware update.
• 3.12.2 Stall model.

For information on the effect of the PM attribute on PCIe and ATS, see:
• 13.7 PCIe permission attribute interpretation.
• 3.25.2 Interactions with PCIe ATS.

For information on the effect of the PM attribute on DPT checks and DPT caching, see:
• 3.24.1 DPT check.
• 3.24.2 DPT caching behavior.

3.25.10.1.2 Granule Protection Checks for NSP

If SMMU_ROOT_GPT_BASE_CFG.NSP is 1, NSP enforcement rules apply. This means that the accesses forbidden by the GPT configuration are extended to include the following cases:
• An access to a location marked by the GPT as NSP, where any of the following apply:
  – The access is associated with the Non-secure state and has PM = 0.
  – The access is associated with the Realm state.
  – The access is associated with the Secure state.
  – The access is a NoStreamID access in a PA space other than NSP.
• An access to a location marked by the GPT as NS, NSO or All Accesses Permitted, where all of the following apply:
  – The access is associated with the Non-secure state.
  – The access requires write permission.
  – The access has PM = 1.
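The NSP enforcement rules, together with the PM resolution rules from 3.25.10.1.1, can be sketched as predicates. The enums and `struct access` are hypothetical simplifications. The base FEAT_RME checks are assumed to be applied separately, so `nsp_rules_forbid` returns only the additional cases listed above.

```c
#include <assert.h>
#include <stdbool.h>

enum gpt_mark { MARK_NS, MARK_NSO, MARK_ALL, MARK_NSP, MARK_OTHER };
enum req_state { ST_NONSECURE, ST_REALM, ST_SECURE, ST_NOSTREAMID };

/* Hypothetical summary of one client access. */
struct access {
    enum req_state state;
    bool pm;          /* PM attribute; only Non-secure accesses carry one */
    bool pas_is_nsp;  /* NoStreamID only: the input PA space is NSP */
    bool needs_write; /* the access requires write permission */
};

/* PM value used when resolving the output PA space. */
static bool pm_for_output_pas(const struct access *a, bool gdi, bool gpcen, bool nsp_en) {
    if (!gdi || a->state != ST_NONSECURE)
        return false; /* PM ignored without GDI; Realm, Secure, NoStreamID carry no PM */
    if (!gpcen || !nsp_en)
        return false; /* treated as 0 for output PA space resolution */
    return a->pm;
}

/* Additional cases forbidden when SMMU_ROOT_GPT_BASE_CFG.NSP == 1. */
static bool nsp_rules_forbid(enum gpt_mark m, const struct access *a) {
    switch (m) {
    case MARK_NSP:
        switch (a->state) {
        case ST_NONSECURE:  return !a->pm;
        case ST_NOSTREAMID: return !a->pas_is_nsp;
        default:            return true; /* Realm or Secure */
        }
    case MARK_NS:
    case MARK_NSO:
    case MARK_ALL:
        return a->state == ST_NONSECURE && a->needs_write && a->pm;
    default:
        return false;
    }
}
```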
The SMMU overrides the output PA space of an access to NSP if both of the following apply:
• The access originates from a Non-secure StreamID with PM = 1.
• The access is to a location that is marked as NSP by the GPT.

Certain transactions requiring write permission are downgraded by the SMMU if write permission is not granted. If the GPC forbids such a transaction only because write permission is missing, the GPC behavior is as described in the following sections:
• 3.22 Destructive reads and directed cache prefetch transactions.
• 16.7.2 Non-data transfer transactions.

This means that a GPF is not reported, and that the GPC behavior is as follows:

Operation type    GPC behavior if no write permission
Destructive read  Transformed into a read or read with clean and invalidate.
Invalidate        Transformed into a CleanInvalidate operation.
Destructive hint  Invalidate does not occur.
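The override and downgrade rules above map onto two small helpers. This is a sketch with hypothetical enum and function names. The downgrade applies only when the GPC fails solely because write permission is missing, in which case no GPF is reported.

```c
#include <assert.h>
#include <stdbool.h>

enum op_type { OP_DESTRUCTIVE_READ, OP_INVALIDATE, OP_DESTRUCTIVE_HINT };

enum op_result {
    RES_READ_OR_READ_CLEAN_INVALIDATE, /* destructive read downgraded */
    RES_CLEAN_INVALIDATE,              /* Invalidate becomes CleanInvalidate */
    RES_NO_INVALIDATE                  /* destructive hint: no invalidation */
};

/* Downgrade applied when the GPC forbids the operation only for lack of write permission. */
static enum op_result gpc_write_downgrade(enum op_type t) {
    switch (t) {
    case OP_DESTRUCTIVE_READ: return RES_READ_OR_READ_CLEAN_INVALIDATE;
    case OP_INVALIDATE:       return RES_CLEAN_INVALIDATE;
    case OP_DESTRUCTIVE_HINT:
    default:                  return RES_NO_INVALIDATE;
    }
}

/* The output PA space is overridden to NSP only for a Non-secure StreamID
 * access with PM == 1 to a location marked NSP by the GPT. */
static bool override_to_nsp(bool ns_streamid, bool pm, bool loc_marked_nsp) {
    return ns_streamid && pm && loc_marked_nsp;
}
```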

Note: The SMMU might observe accesses from NoStreamID devices that specify NSP as an input PA space. For example, an access might be issued by a debug port with privileges to access the NSP PA space. The NSP enforcement rules allow such accesses if the location is marked by the GPT as NSP or as All Accesses Permitted, because these accesses do not specify a PM attribute.

This means that there are two scenarios in which the output PA space of the access is NSP:
• A Non-secure StreamID access has PM = 1 and the location is marked as NSP in the GPT.
• A NoStreamID access has PAS = NSP and the location is marked as NSP or All Accesses Permitted in the GPT.

Note: Locations that are above the PPS are effectively marked by the GPT as either NS or All Accesses Permitted, depending on the value of SMMU_ROOT_GPT_BASE_CFG.APPSAA. Locations within a GPC bypass window are effectively marked by the GPT as All Accesses Permitted. For more information, see 3.25.9 Accesses above the protected address space.

3.25.10.2 System Agent (SA)

This section applies only when SMMU_ROOT_IDR0.GDI is 1.

All System Agent (SA) clients are NoStreamID devices.

Note: Only NoStreamID devices are capable of generating accesses in the SA PA space.

Note: The SMMU does not limit the set of PA spaces that a specific NoStreamID device can express. System integration must guarantee that each NoStreamID device can access only the set of PA spaces permitted by its Security state and by the system’s External Debug State. Accordingly, system integration must guarantee that an SA client can issue accesses only to the SA PA space and the Non-secure PA space.

If SMMU_ROOT_GPT_BASE_CFG.SA is 1, the accesses forbidden by the GPT configuration are extended to include any access where both of the following apply:
• The access is in a PA space that is not SA.
• The access is to a location marked by the GPT as SA.
Note: An access in the SA PA space is permitted only if the location is marked by the GPT as either of the following:
• SA.
• All Accesses Permitted.

Note: If SMMU_ROOT_GPT_BASE_CFG.SA is 1, the SMMU Root programming interface is exclusively controlled by an agent trusted by all System Agent clients.
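The NSO, above-PPS and SA rules (3.25.8, 3.25.9 and 3.25.10.2) can each be expressed as a small predicate. In this sketch:
• All types are hypothetical simplifications.
• `pps_bits` is assumed to be the PA width already decoded from SMMU_ROOT_GPT_BASE_CFG.PPS.
• The base FEAT_RME checks for the remaining markings are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pas { PAS_NS, PAS_SECURE, PAS_REALM, PAS_ROOT, PAS_NSP, PAS_SA };
enum origin { ORIG_NS_STREAM, ORIG_SECURE_STREAM, ORIG_REALM_STREAM, ORIG_NOSTREAMID };
enum gpt_mark { MARK_NS, MARK_NSO, MARK_NSP, MARK_SA, MARK_ALL, MARK_OTHER };

/* 3.25.8: extra rule for locations marked NSO (SMMU_ROOT_GPT_BASE_CFG.NSO == 1). */
static bool nso_rule_forbids(enum origin o, enum pas p) {
    switch (o) {
    case ORIG_REALM_STREAM:
    case ORIG_SECURE_STREAM:
        return true;          /* forbidden even to the Non-secure PA space */
    case ORIG_NOSTREAMID:
        return p != PAS_NS;
    default:
        return false;         /* Non-secure StreamIDs are not restricted by this rule */
    }
}

/* 3.25.9: is the PA above the protected address space? */
static bool above_pps(uint64_t pa, unsigned pps_bits) {
    assert(pps_bits < 64);
    return (pa >> pps_bits) != 0;
}

/* 3.25.9: effective GPT marking of such a location. */
static enum gpt_mark above_pps_mark(bool appsaa) {
    return appsaa ? MARK_ALL : MARK_NS;
}

/* 3.25.10.2 with SMMU_ROOT_GPT_BASE_CFG.SA == 1: false if the SA rules forbid the access. */
static bool sa_checks_pass(enum pas p, enum gpt_mark m) {
    if (p == PAS_SA)
        return m == MARK_SA || m == MARK_ALL;
    return m != MARK_SA;
}
```

With APPSAA = 0, the NS marking returned by `above_pps_mark` is what turns a non-Non-secure access above the PPS into the level 0 GPF described in 3.25.9.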

3.26 Permission Indirections

The Armv8.9 architecture [2] introduces a permission indirection scheme, for stage 1 and stage 2 independently. This section describes the corresponding SMMU architecture changes for this feature.

3.26.1 Stage 1 permission indirections

Note: PIIndex is a translation table descriptor field, introduced by FEAT_S1PIE in the A-profile architecture [2].

Stage 1 permissions are determined according to feature support and control bits as follows:

SMMU_IDR3.S1PI  STE.S1PIE  CD.PIE  Result
0               RES0       RES0    Stage 1 permission indirections are not supported, and stage 1 permissions are determined directly from stage 1 translation tables.
1               0          RES0    Stage 1 permissions are determined directly from stage 1 translation tables.
1               1          0       Stage 1 permissions are determined directly from stage 1 translation tables.
1               1          1       Stage 1 permissions are determined from CD.PIIP, and CD.PIIU if appropriate, using PIIndex from the stage 1 descriptors.

Note: The STE.S1PIE control allows a hypervisor to prevent stage 1 configuration of permission indirections for StreamIDs where Context Descriptors are controlled directly by a guest. In this case, the hypervisor is expected to present to the guest an emulated SMMU without support for permission indirections.

Note: The SMMU does not support the stage 1 permission overlay feature present in the PE architecture.

If the stage 1 Indirect Permission Scheme is enabled, CD.WXN is RES0 and has no effect.

If the stage 1 Indirect Permission Scheme is enabled, the stage 1 permissions are computed as follows:
1. The permissions are decoded from CD.PIIU[PIIndex] and CD.PIIP[PIIndex].
2. For the NS-EL1, Secure, Realm-EL1 and any-EL2-E2H StreamWorlds, CD.PAN is applied as follows:
   • If CD.PAN is 1 and any Unprivileged access is granted, Privileged read and write permissions are removed. This applies regardless of the value of CD.EPAN.
   • It is IMPLEMENTATION DEFINED whether this step is performed here or after step 4.
3. For a translation for Secure state, if SMMU_S_CR0.SIF is 1 and the stage 1 output address is Non-secure, both Privileged execute and Unprivileged execute permissions are removed.
4. For a translation for Realm state, if the stage 1 output address is Non-secure, both Privileged execute and Unprivileged execute permissions are removed.

This permissions computation is not affected by CD.{EPD0, EPD1, E0PD0, E0PD1}.

3.26.2 Stage 2 permission indirections

Note: POIndex is a translation table descriptor field, introduced by FEAT_S2POE in the A-profile architecture [2].

Stage 2 permissions are determined according to feature support and control bits as follows:

SMMU_IDR3.S2PI  STE.S2PIE  STE.S2POE  Result
0               RES0       RES0       Stage 2 permission indirections are not supported, and stage 2 permissions are determined directly from stage 2 translation tables.
1               0          0          Stage 2 permissions are determined directly from stage 2 translation tables.
1               0          1          ILLEGAL, generates C_BAD_STE.
1               1          0          Stage 2 permissions are determined from SMMU_S2PII using PIIndex from the stage 2 descriptors.
1               1          1          Stage 2 permissions are determined from STE.S2POI using POIndex from the stage 2 descriptors, combined with SMMU_S2PII using PIIndex from the stage 2 descriptors.

Stage 2 permissions are computed in the following order, and F_PERMISSION events are reported with this priority, from highest to lowest:
1. For the stage 2 translation of a stage 1 output address, including next-level table addresses of a stage 1 translation table walk, the AssuredOnly permission check is applied.
2. The Base and Overlay permissions are applied as follows:
   • If permission overlays are enabled, the Overlay permissions are looked up from STE.S2POI indexed by POIndex.
   • If permission indirection is enabled, the Base permissions are looked up from SMMU_S2PII indexed by PIIndex. Otherwise, the Base permissions are taken directly from the translation table descriptor.
   • The Base and Overlay permissions are combined as described in the A-profile architecture [2], and the resulting permissions are applied.
3. For the translation of a stage 1 translation table walk or CD fetch, the effect of STE.S2PTW is applied.
4. For any access that requires write permission, if permission indirection is enabled, the Dirty state permission check is applied.
5. For directed prefetch operations and cache maintenance operations, the effects of STE.DRE and STE.DCP are applied.
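The stage 1 and stage 2 selection tables can be encoded directly. This minimal sketch assumes that RES0 fields are ignored when the corresponding feature is not implemented or not enabled upstream. The enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum s1_perm_src { S1_FROM_DESCRIPTORS, S1_FROM_CD_PIIP_PIIU };
enum s2_perm_src { S2_FROM_DESCRIPTORS, S2_FROM_S2PII, S2_FROM_S2POI_AND_S2PII, S2_C_BAD_STE };

/* 3.26.1 table: indirection only when all three controls are 1. */
static enum s1_perm_src s1_perm_source(bool idr3_s1pi, bool ste_s1pie, bool cd_pie) {
    return (idr3_s1pi && ste_s1pie && cd_pie) ? S1_FROM_CD_PIIP_PIIU : S1_FROM_DESCRIPTORS;
}

/* 3.26.2 table. */
static enum s2_perm_src s2_perm_source(bool idr3_s2pi, bool ste_s2pie, bool ste_s2poe) {
    if (!idr3_s2pi)
        return S2_FROM_DESCRIPTORS; /* STE.S2PIE and STE.S2POE are RES0 */
    if (!ste_s2pie)                 /* S2POE without S2PIE is ILLEGAL */
        return ste_s2poe ? S2_C_BAD_STE : S2_FROM_DESCRIPTORS;
    return ste_s2poe ? S2_FROM_S2POI_AND_S2PII : S2_FROM_S2PII;
}
```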

3.27 Translation Hardening

3.27.1 Protected attribute

If SMMU_IDR3.THE == 1, SMMU behavior for the stage 1 Protected attribute is the same as PE behavior for the Protected attribute, including:
• The Protected attribute is always present in VMSAv9-128 descriptors.
• The Protected attribute is only present in VMSAv8-64 descriptors if CD.PnCH is 1.
• The Contiguous bit is not present in VMSAv8-64 descriptors if CD.PnCH is 1.

Note: The SMMU does not support Read-Check-Write (RCW) operations, and therefore none of the RCW checks are applied by the SMMU.

Note: Unless the STE.AssuredOnly check is enabled, there is no benefit to enabling CD.PnCH in the SMMU. The exception is when VMSAv8-64 stage 1 translation tables for a PE context that uses the Protected attribute are shared with SMMU contexts, in which case the SMMU must not interpret bit 52 of Block and Page descriptors as the Contiguous bit.

3.27.2 AssuredOnly permission checks

If SMMU_IDR3.THE == 1, SMMU behavior for the stage 2 AssuredOnly attribute is the same as PE behavior for the AssuredOnly attribute, except that:
• Use of the AssuredOnly attribute is configured through STE.AssuredOnly instead of VTCR_EL2.AssuredOnly.
• A stage 2 Permission fault that would be reported with ESR_EL2.AssuredOnly set to 1 is instead reported as a stage 2 F_PERMISSION with AssuredOnly set to 1.
• If a CD, or the L1CD that points to it, is fetched from memory that is not marked AssuredOnly at stage 2, then any access translated from the TTB0 or TTB1 field in that CD does not have the Assured Translation property. This overrides the Assured Translation property as defined in the A-profile architecture[2].
• If a CD, and the L1CD that points to it, are fetched from memory that is marked AssuredOnly at stage 2, then any access translated from the TTB0 or TTB1 field in that CD gets the Assured Translation property according to the definition in the A-profile architecture[2].
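The CD.PnCH rule in 3.27.1 and the CD-fetch rule for the Assured Translation property in 3.27.2 can each be summarized as a small predicate. The C sketch below is illustrative only: the function and enum names are hypothetical; it assumes, as the note on shared PE translation tables implies, that bit 52 holds the Protected attribute when SMMU_IDR3.THE and CD.PnCH are both 1; it treats a linear CD table as having no L1CD; and it takes the A-profile Assured Translation result as an input rather than modelling it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 3.27.1: meaning of bit 52 in a VMSAv8-64 stage 1 Block or Page descriptor.
 * Assumption: Protected attribute when THE and PnCH are both 1, else Contiguous. */
typedef enum { BIT52_CONTIGUOUS, BIT52_PROTECTED } bit52_meaning;

static bit52_meaning v8_bit52_meaning(bool idr3_the, bool cd_pnch)
{
    return (idr3_the && cd_pnch) ? BIT52_PROTECTED : BIT52_CONTIGUOUS;
}

/* Extract bit 52 from a 64-bit descriptor value. */
static bool v8_desc_bit52(uint64_t desc)
{
    return ((desc >> 52) & 1u) != 0;
}

/* 3.27.2: whether accesses translated from a CD's TTB0/TTB1 can have the
 * Assured Translation property.
 * has_l1cd: false for a linear CD table (assumption: no L1CD involved).
 * aprofile_assured: hypothetical input, the A-profile result computed elsewhere. */
static bool cd_ttb_walk_assured(bool has_l1cd, bool l1cd_in_ao_memory,
                                bool cd_in_ao_memory, bool aprofile_assured)
{
    /* A fetch of the CD, or of its L1CD, from non-AssuredOnly memory removes the property. */
    bool fetches_assured = cd_in_ao_memory && (!has_l1cd || l1cd_in_ao_memory);
    /* Otherwise the A-profile definition decides. */
    return fetches_assured && aprofile_assured;
}
```

The logical AND in cd_ttb_walk_assured mirrors the "or"/"and" wording of the two bullets: a single non-AssuredOnly fetch in the CD chain is enough to withhold the property.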
Note: The AssuredOnly check does not apply to Context Descriptors. For example, if an L1CD is fetched from memory that is not marked AssuredOnly at stage 2, and the subsequent CD is fetched from memory that is marked AssuredOnly at stage 2, this does not result in an AssuredOnly permission fault.

Note: In the PE architecture, if stage 1 translation is disabled, then any access to a region marked AssuredOnly at stage 2 generates a Permission fault. This also applies in the SMMU architecture to any case where stage 1 translation is bypassed, whether because of the STE.Config or the STE.S1DSS configuration.

For an ATS Translation Request, the AssuredOnly check is performed in the same manner as for a regular transaction. If the AssuredOnly check fails, the ATS Translation Completion is sent with Success and R == W == 0.

For an ATS Translated Transaction when STE.EATS is 0b10 (Split-stage ATS), AssuredOnly is ignored for the purposes of determining whether the transaction is permitted.
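The effect of a failing AssuredOnly check differs by transaction type. The following C sketch summarizes the cases described above; the request-kind and outcome names are hypothetical, the A-profile AssuredOnly result for stage-1-translated accesses is taken as an input rather than computed, and cases this section does not describe (for example, ATS Translated Transactions with other STE.EATS values) are reported as not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request kinds; names are not from the specification. */
typedef enum {
    REQ_UNTRANSLATED,
    REQ_ATS_TRANSLATION_REQUEST,
    REQ_ATS_TRANSLATED
} req_kind;

typedef enum {
    AO_PASS,              /* AssuredOnly does not block the access (other checks still apply) */
    AO_F_PERMISSION,      /* stage 2 F_PERMISSION with AssuredOnly set to 1 */
    AO_ATS_SUCCESS_NO_RW, /* ATS Translation Completion: Success, R == W == 0 */
    AO_NOT_MODELED        /* not described in this section */
} ao_outcome;

#define STE_EATS_SPLIT_STAGE 0x2u /* STE.EATS == 0b10 */

/* s1_check_fails: hypothetical input, the A-profile AssuredOnly result
 * when stage 1 translation is in use. */
static bool ao_check_fails(bool s2_region_ao, bool s1_bypassed, bool s1_check_fails)
{
    if (!s2_region_ao)
        return false;
    if (s1_bypassed) /* stage 1 bypassed via STE.Config or STE.S1DSS */
        return true;
    return s1_check_fails;
}

static ao_outcome ao_apply(req_kind kind, unsigned ste_eats, bool fails)
{
    switch (kind) {
    case REQ_UNTRANSLATED:
        return fails ? AO_F_PERMISSION : AO_PASS;
    case REQ_ATS_TRANSLATION_REQUEST:
        return fails ? AO_ATS_SUCCESS_NO_RW : AO_PASS;
    case REQ_ATS_TRANSLATED:
        /* Split-stage ATS: AssuredOnly is ignored when deciding permission. */
        return ste_eats == STE_EATS_SPLIT_STAGE ? AO_PASS : AO_NOT_MODELED;
    }
    return AO_NOT_MODELED;
}
```

Note that an ATS Translation Request that fails the check is not reported as a fault in this model; the device instead receives a successful completion that grants neither read nor write access.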