Interactivity in RealityKit

May 26, 2022

For mobile AR you need to translate interactions on a 2D screen into the 3D scene. This is done with raycasting, which involves shooting a ray from the camera and checking which objects it hits. ARView provides two main raycasting methods: one hits entities in your scene, the other hits detected AR planes.
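
As a quick preview, these are the two calls covered below (a sketch, assuming the code runs inside an ARView subclass and screenLocation is a point on the screen):

// Hits entities in the scene that have a CollisionComponent
let entityHits = hitTest(screenLocation, query: .nearest, mask: .all)

// Hits planes detected by ARKit
let planeResults = raycast(from: screenLocation, allowing: .existingPlaneInfinite, alignment: .horizontal)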

Raycasting Entities

To raycast against entities you would use the following ARView method:

let hits = hitTest(screenLocation, query: .all, mask: .all)

It's a good idea to use collision groups so you only hit the entities you are interested in. CollisionGroup is an OptionSet, so it supports set methods (for example, you can target multiple groups with union) and you give each group its own bit using a bitwise shift.

let redCollisionGroup = CollisionGroup(rawValue: 1 << 0)
let blueCollisionGroup = CollisionGroup(rawValue: 1 << 1)

The << (left shift) operator moves the value on the left up by the number of bits given on the right, so redCollisionGroup has a binary value of 01 while blueCollisionGroup has a binary value of 10. For an entity to work with raycasting, and to take part in physics, you also need to give it a collision shape:

let collisionFilter = CollisionFilter(group: redCollisionGroup, mask: .default)
entity.collision = .init(shapes: [.generateBox(size: [1, 1, 1])], mode: .default, filter: collisionFilter)

When you create a collision filter you specify which group the entity belongs to and which groups it interacts with during a physics simulation. Since we are not working with physics here, we just use the default mask. For the collision shape you can use simple shapes like a box, sphere or capsule, or have ShapeResource generate one from your mesh with the generateConvex(from:) method. Be careful with that method: a convex hull around a non-convex mesh can be larger than the mesh itself. Generally you should go for the simplest collision shape you can get away with, to reduce computation.
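
Here is a minimal sketch of the convex option, assuming entity is a ModelEntity whose mesh is already loaded, and reusing collisionFilter from above:

if let mesh = entity.model?.mesh {
    // Build a convex hull around the model's mesh
    let convexShape = ShapeResource.generateConvex(from: mesh)
    entity.collision = CollisionComponent(shapes: [convexShape], filter: collisionFilter)
}

So, to raycast when the user taps on the screen and hit only the nearest entity in the red collision group: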

@IBAction func handleTap(_ gestureRecognizer: UITapGestureRecognizer) {
    guard gestureRecognizer.view != nil else { return }

    let screenLocation = gestureRecognizer.location(in: self)
    let hits = hitTest(screenLocation, query: .nearest, mask: redCollisionGroup)
    if hits.count > 0 {
        // Do something with the entity (hits[0].entity)
    }
}

To hit both red and blue collision groups you would do:

let hits = hitTest(screenLocation, query: .nearest, mask: redCollisionGroup.union(blueCollisionGroup))

Raycasting AR Planes

To raycast against an AR plane you would use the following ARView method:

let results = raycast(from: screenLocation, allowing: .existingPlaneInfinite, alignment: .horizontal)

For the allowing parameter I tend to use .existingPlaneInfinite in case the user taps just outside the detected plane geometry.
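
For comparison, here is a sketch of the other common targets (screenLocation is again a tap location):

// Restrict hits to the detected plane's actual extent
let geometryResults = raycast(from: screenLocation, allowing: .existingPlaneGeometry, alignment: .horizontal)

// Allow planes ARKit is still estimating, at any alignment
let estimatedResults = raycast(from: screenLocation, allowing: .estimatedPlane, alignment: .any)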

Example App

Before we can use these methods we need to visualize the AR planes so the user can tap on them to place an object. Open Xcode and go to File > New > Project. Select Augmented Reality App under the iOS tab. On the next screen enter a Product Name. The ARViewContainer struct in ContentView.swift should look like the code below.

Alternatively, you can download the project from GitHub by clicking the green Code button and selecting Download ZIP, then open InteractivityInRealityKit.xcodeproj in Xcode.

ContentView.swift:

struct ARViewContainer: UIViewRepresentable {
    
    func makeUIView(context: Context) -> ARView {
        return CustomARView(frame: .zero)
    }
    
    func updateUIView(_ uiView: ARView, context: Context) {}
}

Create a new Swift file called CustomARView.swift and paste in the following code:

import RealityKit
import ARKit

class CustomARView: ARView, ARCoachingOverlayViewDelegate, ARSessionDelegate {
    private var showARPlanes = true
    private let arPlaneMaterial = SimpleMaterial(color: .init(white: 1.0, alpha: 0.5), isMetallic: false)
    // Maps each ARKit anchor to the RealityKit anchor entity that visualizes it
    private var anchorEntitiesByAnchor: [ARAnchor: AnchorEntity] = [:]
    
    required init(frame: CGRect) {
        super.init(frame: frame)
        
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal]
        session.delegate = self
        session.run(config, options: [])
        
        addCoaching()
    }
    
    @objc required dynamic init?(coder decoder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
    
    // MARK: - ARSessionDelegate
    
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        guard showARPlanes else { return }
        for anchor in anchors {
            if let planeAnchor = anchor as? ARPlaneAnchor {
                let anchorEntity = AnchorEntity(anchor: planeAnchor)
                let planeEntity = buildPlaneEntity(planeAnchor: planeAnchor)
                anchorEntity.addChild(planeEntity)
                scene.addAnchor(anchorEntity)
                anchorEntitiesByAnchor[planeAnchor] = anchorEntity
            }
        }
    }
    
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        guard showARPlanes else { return }
        for anchor in anchors {
            if let planeAnchor = anchor as? ARPlaneAnchor, let anchorEntity = anchorEntitiesByAnchor[planeAnchor] {
                // Swap the old plane mesh for one built from the updated geometry
                anchorEntity.children.remove(at: 0)
                anchorEntity.addChild(buildPlaneEntity(planeAnchor: planeAnchor))
            }
        }
    }
    
    func session(_ session: ARSession, didRemove anchors: [ARAnchor]) {
        guard showARPlanes else { return }
        for anchor in anchors {
            if let planeAnchor = anchor as? ARPlaneAnchor, let anchorEntity = anchorEntitiesByAnchor[planeAnchor] {
                scene.removeAnchor(anchorEntity)
                anchorEntitiesByAnchor.removeValue(forKey: planeAnchor)
            }
        }
    }
    
    private func addCoaching() {
        let coachingOverlay = ARCoachingOverlayView()
        coachingOverlay.session = session
        coachingOverlay.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        coachingOverlay.goal = .horizontalPlane
        self.addSubview(coachingOverlay)
    }
    
    private func buildPlaneEntity(planeAnchor: ARPlaneAnchor) -> ModelEntity {
        let geometry = planeAnchor.geometry
        var descriptor = MeshDescriptor(name: "ARPlaneVisualized")
        descriptor.positions = MeshBuffer(geometry.vertices)
        descriptor.primitives = .triangles(geometry.triangleIndices.map { UInt32($0) })
        descriptor.textureCoordinates = MeshBuffer(geometry.textureCoordinates)
        return ModelEntity(mesh: try! .generate(from: [descriptor]), materials: [arPlaneMaterial])
    }
}

The above code configures an AR session to detect horizontal planes. When an AR plane is detected, an anchor entity is created for it and a mesh is generated from the plane's geometry. In the next piece of code, when the user taps on a detected AR plane, a box is placed on it:

CustomARView.swift:

private var placedBox = false

required init(frame: CGRect) {
    ...
    session.run(config, options: [])

    // Added
    addGestureRecognizer(UITapGestureRecognizer(target: self, action: #selector(handleTap)))
    
    addCoaching()
}

@IBAction func handleTap(_ gestureRecognizer: UITapGestureRecognizer) {
    guard gestureRecognizer.view != nil else { return }
    
    // Carry out the action when the user lifts their finger
    if gestureRecognizer.state == .ended {
        let screenLocation = gestureRecognizer.location(in: self)
        if !placedBox {
            let results = raycast(from: screenLocation, allowing: .existingPlaneInfinite, alignment: .horizontal)
            if results.count > 0, let planeAnchor = results[0].anchor as? ARPlaneAnchor {
                showARPlanes = false
                removePlaneEntities(planeAnchor: planeAnchor)
                addBox(raycastResult: results[0])
            }
        }
    }
}

private func addBox(raycastResult: ARRaycastResult) {
    if let planeAnchor = raycastResult.anchor as? ARPlaneAnchor, let anchorEntity = anchorEntitiesByAnchor[planeAnchor] {
        let box = ModelEntity(mesh: .generateBox(size: 0.1), materials: [SimpleMaterial(color: .red, isMetallic: false)])
        box.setPosition(raycastResult.worldTransform.position, relativeTo: nil) // relativeTo: nil means world space
        anchorEntity.addChild(box)
        placedBox = true
    }
}

private func removePlaneEntities(planeAnchor: ARPlaneAnchor) {
    if let currentFrame = session.currentFrame {
        // Remove anchor entities except the one that is passed
        for itPlaneAnchor in currentFrame.anchors {
            if itPlaneAnchor != planeAnchor {
                if let anchorEntity = anchorEntitiesByAnchor[itPlaneAnchor] {
                    scene.removeAnchor(anchorEntity)
                    anchorEntitiesByAnchor.removeValue(forKey: itPlaneAnchor)
                }
            }
        }
        // Remove the visualized plane of the anchor entity that is passed
        if let anchorEntity = anchorEntitiesByAnchor[planeAnchor] {
            anchorEntity.children.remove(at: 0)
        }
    }
}

You can view the diff of the above changes here.

In the addBox() method above, the box is placed where the ray intersects the AR plane. Because the box is a child of the anchor entity, its position is set with setPosition(_:relativeTo:), where relativeTo: nil means the value is interpreted in world space. Unfortunately ARRaycastResult doesn't have a position property, so you have to derive it from the worldTransform property, which is a 4 x 4 matrix. Translation in a 4 x 4 matrix is the first 3 numbers of the fourth column. You can create an extension to make this easier to access.

Extensions.swift:

import simd

extension simd_float4x4 {
    public var position: simd_float3 {
        return [columns.3.x, columns.3.y, columns.3.z]
    }
}

The next task is to change the color of the box when a user taps on it. We need to add a collision shape to the box and then use ARView's hitTest method to do the raycast.

CustomARView.swift:

@IBAction func handleTap(_ gestureRecognizer: UITapGestureRecognizer) {
    guard gestureRecognizer.view != nil else { return }
    
    // Carry out the action when the user lifts their finger
    if gestureRecognizer.state == .ended {
        let screenLocation = gestureRecognizer.location(in: self)
        if !placedBox {
            ...
        } else {
            let hits = hitTest(screenLocation, query: .nearest, mask: .all)
            if hits.count > 0 {
                changeBoxColor(hits[0].entity, color: .blue)
            }
        }
    }
}

private func addBox(raycastResult: ARRaycastResult) {
    if let planeAnchor = raycastResult.anchor as? ARPlaneAnchor, let anchorEntity = anchorEntitiesByAnchor[planeAnchor] {
        let size: Float = 0.1
        let box = ModelEntity(mesh: .generateBox(size: size, cornerRadius: 0), materials: [SimpleMaterial(color: .red, isMetallic: false)])
        box.collision = .init(shapes: [.generateBox(size: [size, size, size])])
        ...
    }
}

private func changeBoxColor(_ entity: Entity, color: UIColor) {
    if let modelEntity = entity as? ModelEntity {
        modelEntity.model?.materials = [SimpleMaterial(color: color, isMetallic: false)]
    }
}

You can view the diff of the above changes here.

Scale, Rotate and Move Entities

Scaling, rotating and moving an entity requires just one line:

installGestures(for: box)

This creates three entity gesture recognizers: EntityRotationGestureRecognizer, EntityScaleGestureRecognizer and EntityTranslationGestureRecognizer. Rotation and scale require two-finger gestures, while translation works with either one or two fingers. You can limit the gestures by passing an array of ARView.EntityGestures:

installGestures([.rotation, .scale], for: box)
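
Note that installGestures requires the entity to have a collision shape (our box already does) and returns the recognizers it installs. A small sketch, in case you want to toggle them later:

let recognizers = installGestures([.rotation, .scale], for: box)
for recognizer in recognizers {
    recognizer.isEnabled = false // e.g. while the coaching overlay is showing
}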

You can view the diff of the above changes here and download the completed example app here.