Simulating 1000 flocking fish in Unreal Engine


Introduction #

Simulating the flocking behavior of birds or the schooling of fish is a classic problem in computer graphics. The Boids algorithm, created by Craig Reynolds in 1986, offers a powerful solution. It demonstrates how complex, life-like group motion can emerge from individuals following a few simple rules, without any central leader or complex choreography.

This principle offers a way to add life to game worlds. A common application is simulating a school of fish, but a significant technical challenge arises when scaling up to large numbers. The target here was to effectively simulate a school of 1000 fish, moving as one cohesive unit while still allowing for individual interaction.

The initial implementation used a standard Unreal Engine approach: an AActor_Fish class with the basic boids logic in its Tick function. When spawning 1000 instances, the result was a slideshow. Performance cratered to around 6 FPS.

This post outlines my step-by-step journey, taking that simulation from a crawl to a fluid 70+ FPS. We’ll cover profiling, shifting from an object-oriented setup to a data-oriented design, and leveraging Unreal Engine’s features for parallel processing and GPU acceleration, all while maintaining the crucial gameplay feature of selecting and catching a single fish from the crowd.

For context, the starting point is a common actor-based approach where each fish is an AActor_Fish instance managing its own state. This caused performance issues due to actor ticking overhead and inefficient neighbor searches. Through successive passes, these were addressed, keeping the ability to interact with individual fish intact.

All tests were conducted on a machine with a Ryzen 5950X 16-core CPU, 128GB DDR4-3200 memory, and an RTX 3070 Ti GPU. To establish a fair baseline, measurements were taken in standalone mode in DebugGame configuration with the Rider debugger attached. This provides a conservative estimate: if the simulation performs well under these conditions, it should run at least as smoothly in optimized shipping builds.

Preview

The Gameplay Loop: Catching a Fish in a Sea of Data #

Before covering the optimization phases, it’s crucial to understand the core gameplay loop handled by UActorComponent_FishingComponent. This loop coordinates player input, animations, UI, and the fish simulation itself. The main challenge at scale is to perform steps like “Find the nearest fish” and “Control a single fish” without stalling the entire system.

sequenceDiagram
    participant Player
    participant FishingComponent
    participant AnimBP as Animation Blueprint
    participant FishSim as Fish Simulation
    participant TargetFish as Targeted Fish
    Player->>FishingComponent: Hold LMB (OnCastAction)
    loop Charging Cast
        FishingComponent->>FishingComponent: Update Decal & Cast Power
        FishingComponent->>Player: Broadcast UI Progress
    end
    Player->>FishingComponent: Release LMB (OnCastActionEnded)
    FishingComponent->>AnimBP: Play "Throw" Animation
    Note right of AnimBP: Animation plays...
    AnimBP-->>FishingComponent: AnimNotify: LaunchBobber
    Note left of FishingComponent: Bobber flies and lands...
    FishingComponent->>FishingComponent: OnBobberLandsOnWater()
    FishingComponent->>FishSim: FindNearestFish(Location)
    FishSim-->>FishingComponent: Return Fish Index
    FishingComponent->>FishSim: PromoteBoidToActor(Index)
    FishSim-->>FishingComponent: Return Spawned Actor (TargetFish)
    Note over FishingComponent, TargetFish: Bite timer starts...
    alt Player Reels In Successfully
        Player->>FishingComponent: Click LMB (After Bite)
        FishingComponent->>TargetFish: ReeledIn()
        TargetFish->>TargetFish: Move towards player
        Player->>FishingComponent: Click LMB again
        FishingComponent->>TargetFish: Catch()
        TargetFish->>TargetFish: Attach to rod
    else Player Reels In Too Early
        Player->>FishingComponent: Click LMB (Before Bite)
        FishingComponent->>TargetFish: Escape()
        TargetFish->>TargetFish: Flee and despawn
    end

Here’s a breakdown of the steps shown in the diagram:

  1. Charging The Cast: The player holds LMB. The OnCastAction delegate in the FishingComponent calculates the cast distance and updates a target decal on the water.
  2. Casting: Releasing LMB triggers OnCastActionEnded, which tells the animation blueprint to play a “throw” animation.
  3. The Bobber Flies: An anim notify within the throw animation sends a message back to the FishingComponent to launch the bobber projectile.
  4. Finding The Fish: When the bobber lands, OnBobberLandsOnWater is called. This is a critical step: it queries the FishManager (our simulation) to find the nearest fish data point. The manager then “promotes” this data point into a full AActor_Fish for interaction.
  5. The Bite: A timer starts. When it ends, the newly spawned fish actor is “hooked.” Its simulation logic is overridden, and it begins moving toward the bobber.
  6. Reeling In: Clicking LMB after the bite successfully catches the fish. Clicking before the bite causes it to escape, and the actor is “demoted” back into a data point in the simulation.

In code, this uses interfaces like ICatchableInterface, implemented by AActor_Fish. When selected, the fish’s bBeingTargeted flag disables flocking and enables timeline-based movement.

The UActorComponent_FishingComponent manages states, timers, and interactions between the player, rod (ICatcherInterface), and fish.
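The state handling described here can be pictured as a small state machine. The following is a minimal, engine-independent sketch in standard C++, not the component's actual code: the enum names are hypothetical stand-ins for the gameplay tags, and the transition back to idling after an escape is an assumption, since the post does not show that path.

```cpp
// Hypothetical stand-ins for the gameplay-tag states used by the fishing component.
enum class EFishingState { Idling, Casting, WaitingForFish, ReelingIn };

// Hypothetical events mirroring the callbacks described in the post.
enum class EFishingEvent { CastReleased, BobberLanded, ReelIn, FishEscaped };

// Returns the next state, or the current state when the event is not valid for it.
// This mirrors the early-return guards at the top of each handler.
inline EFishingState NextState(EFishingState Current, EFishingEvent Event)
{
    switch (Current)
    {
    case EFishingState::Idling:
        if (Event == EFishingEvent::CastReleased) return EFishingState::Casting;
        break;
    case EFishingState::Casting:
        if (Event == EFishingEvent::BobberLanded) return EFishingState::WaitingForFish;
        break;
    case EFishingState::WaitingForFish:
        if (Event == EFishingEvent::ReelIn) return EFishingState::ReelingIn;
        // Assumption: an escaped fish returns the component to idling.
        if (Event == EFishingEvent::FishEscaped) return EFishingState::Idling;
        break;
    case EFishingState::ReelingIn:
        break;
    }
    return Current;
}
```

Keeping the transitions in one place makes it easy to see that out-of-order input, such as a bobber landing while idling, is simply ignored.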

Here’s the casting handling in UActorComponent_FishingComponent:

void UActorComponent_FishingComponent::OnCastAction(const float& InElapsedTime)
{
    if (CurrentFishingState != FFishingTags::Get().FishingComponent_State_Idling) return;

    DetermineCastLocation(InElapsedTime);
    AttemptToCast(InitialActorLocation + InitialActorForwardVector * 100.f);

    const float Progress = GetMappedElapsedTimeToMaximumCastTime(InElapsedTime);
    BroadcastUIMessage(Progress);
}

void UActorComponent_FishingComponent::OnCastActionEnded(const float& InElapsedTime)
{
    ToggleDecalVisibility(false);

    if (CurrentFishingState != FFishingTags::Get().FishingComponent_State_Idling) return;
    CurrentFishingState = FFishingTags::Get().FishingComponent_State_Casting;

    UVAGameplayMessagingSubsystem::Get(this).BroadcastMessage(this, FFishingTags::Get().Messaging_Fishing_AnimInstance_StateChange, FFishingTags::Get().AnimInstance_Fishing_State_Casting);
}

When the bobber lands, it searches for the nearest fish:

void UActorComponent_FishingComponent::OnBobberLandsOnWater(const FVector& InBobberLandsOnWaterLocation)
{
    if (CurrentFishingState != FFishingTags::Get().FishingComponent_State_Casting) return;
    CurrentFishingState = FFishingTags::Get().FishingComponent_State_WaitingForFish;

    MockBobberLandsOnWaterDelegate.Broadcast(0.f);

    AttemptGetNearestCatchable();
}

The fish interaction via ICatchableInterface in AActor_Fish:

void AActor_Fish::ReeledIn(const FVector& RodLocation)
{
    bBeingTargeted = true;
    Velocity = FVector::ZeroVector;
    ReelInLocation = RodLocation;
    LookAtReelInRotation = UKismetMathLibrary::FindLookAtRotation(GetActorLocation(), ReelInLocation);

    ReeledInTimeline.PlayFromStart();
}

void AActor_Fish::Escape()
{
    ReeledInTimeline.Stop();

    Velocity = FVector::ZeroVector;
    EscapeRotation = UKismetMathLibrary::FindLookAtRotation(GetActorLocation(), InitialActorLocation);

    EscapeTimeline.PlayFromStart();
}

void AActor_Fish::Catch(USceneComponent* InCatchingRod)
{
    if (!InCatchingRod) return;
    bBeingTargeted = true;

    Velocity = FVector::ZeroVector;

    AttachToComponent(InCatchingRod, FAttachmentTransformRules::SnapToTargetIncludingScale, NAME_None);
}

The component mediates the reel-in:

void UActorComponent_FishingComponent::ReelInCurrentCatchable()
{
    if (!CurrentCatchable) return;
    if (!CurrentCatcher) return;
    CurrentCatchable->Catch(CurrentCatcher->GetCatcherAttachComponent());

    CurrentFishingState = FFishingTags::Get().FishingComponent_State_ReelingIn;
    UVAGameplayMessagingSubsystem::Get(this).BroadcastMessage(this, FFishingTags::Get().Messaging_Fishing_AnimInstance_StateChange, FFishingTags::Get().AnimInstance_Fishing_State_ReelingIn);
}

This setup supports integration between player input, animations, and fish behavior.

What Are Boids, Anyway? #

The Boids algorithm is a classic in the world of artificial life, first presented by Craig Reynolds in 1987 at the SIGGRAPH conference. His paper, “Flocks, Herds, and Schools: A Distributed Behavioral Model,” was revolutionary. Instead of programming a complex central brain to command the flock, Reynolds proposed that intricate, life-like motion could emerge from each individual “boid” (a portmanteau of “bird-oid object”) following three simple, local rules.

These rules only require a boid to consider its immediate neighbors, not the entire flock. This decentralized approach is what makes the simulation so powerful and, as we’ll see, presents its primary performance challenge. The three original rules are:

  1. Separation: Steer to avoid crowding local flockmates.
  2. Alignment: Steer towards the average heading of local flockmates.
  3. Cohesion: Steer to move toward the average position (center of mass) of local flockmates.

When translated into code, these three rules, along with a fourth one for boundary containment, form the core of our simulation logic. In the initial AActor_Fish class, these were applied every frame in the Tick function.

The Core Tick Logic (Baseline) #

Each fish processed this logic independently.

void AActor_Fish::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    Flock(DeltaSeconds); // Contains all boids logic
    TickTimelines(DeltaSeconds); // For player interaction
}

void AActor_Fish::Flock(float DeltaSeconds)
{
    if (bBeingTargeted || DeltaSeconds == 0.f) return;

    // 1. Find Neighbors (The bottleneck!)
    TArray<AActor_Fish*> Neighbors;
    GetNeighbors(Neighbors);

    // 2. Calculate the total steering force from all rules
    const FVector SteeringForce = CalculateFlockForce(Neighbors);

    // 3. Apply force to velocity (simple Euler integration)
    Velocity += SteeringForce * DeltaSeconds;
    Velocity = Velocity.GetClampedToMaxSize(MaxSpeed);

    // 4. Update the actor's transform
    const FVector NewLocation = GetActorLocation() + Velocity * DeltaSeconds;
    const FRotator TargetRotation = Velocity.ToOrientationRotator();
    const FRotator InterpolatedRotation = FMath::RInterpTo(GetActorRotation(), TargetRotation, DeltaSeconds, 10.0f);
    SetActorLocationAndRotation(NewLocation, InterpolatedRotation);
}

FVector AActor_Fish::CalculateFlockForce(const TArray<AActor_Fish*>& Neighbors) const
{
    if (bBeingTargeted) return FVector::ZeroVector;

    const FVector FCohesion = Cohesion(Neighbors) * CohesionWeight;
    const FVector FSeparation = Separation(Neighbors) * SeparationWeight;
    const FVector FAlignment = Alignment(Neighbors) * AlignmentWeight;
    const FVector FBoundary = BoundaryContainment() * ContainmentWeight;

    FVector TotalForce = FCohesion + FSeparation + FAlignment + FBoundary;
    return TotalForce.GetClampedToMaxSize(MaxForce);
}

The three core boids rules were implemented as follows:

FVector AActor_Fish::Separation(const TArray<AActor_Fish*>& Neighbors) const
{
    FVector RepulsionAccumulator = FVector::ZeroVector;

    int32 Count = 0;
    const float DistSqThreshold = FMath::Square(SeparationDistance);
    for (const AActor_Fish* Other : Neighbors)
    {
        FVector Difference = GetActorLocation() - Other->GetActorLocation();
        float DistSq = Difference.SizeSquared();
        if (DistSq > 0 && DistSq < DistSqThreshold)
        {
            // Weight the push by inverse distance: closer neighbors repel more strongly
            RepulsionAccumulator += Difference.GetSafeNormal() / FMath::Sqrt(DistSq);
            Count++;
        }
    }

    if (Count > 0)
    {
        RepulsionAccumulator /= Count;
        if (!RepulsionAccumulator.IsNearlyZero())
        {
            FVector TargetVel = RepulsionAccumulator.GetSafeNormal() * MaxSpeed;
            return Steer(TargetVel);
        }
    }

    return FVector::ZeroVector;
}

FVector AActor_Fish::Alignment(const TArray<AActor_Fish*>& Neighbors) const
{
    FVector AvgVelocity = FVector::ZeroVector;
    int32 AlignCount = 0;

    for (const AActor_Fish* Fish : Neighbors)
    {
        if (Fish->GetFlockGroupID() == GetFlockGroupID())
        {
            AvgVelocity += Fish->Velocity;
            AlignCount++;
        }
    }

    if (AlignCount > 0)
    {
        AvgVelocity /= AlignCount;
        return Steer(AvgVelocity);
    }

    return FVector::ZeroVector;
}

FVector AActor_Fish::Cohesion(const TArray<AActor_Fish*>& Neighbors) const
{
    FVector CenterOfMass = FVector::ZeroVector;
    int32 CohesionCount = 0;

    for (const AActor_Fish* Fish : Neighbors)
    {
        if (Fish->GetFlockGroupID() == GetFlockGroupID())
        {
            CenterOfMass += Fish->GetActorLocation();
            CohesionCount++;
        }
    }

    if (CohesionCount > 0)
    {
        CenterOfMass /= CohesionCount;
        return Steer(CenterOfMass - GetActorLocation());
    }

    return FVector::ZeroVector;
}

The Steer function, common to all rules, computes the steering force toward a desired velocity:

FVector AActor_Fish::Steer(const FVector& Target) const
{
    if (MaxSpeed <= 0.0f || MaxForce <= 0.0f) return FVector::ZeroVector;

    FVector DesiredVelocity = Target;
    if (!DesiredVelocity.IsNearlyZero())
    {
        DesiredVelocity = DesiredVelocity.GetClampedToSize(0.0f, MaxSpeed);
    }

    FVector Steering = DesiredVelocity - Velocity;
    if (Steering.SizeSquared() <= FMath::Square(MaxForce)) return Steering;

    Steering = Steering.GetClampedToMaxSize(MaxForce);
    return Steering;
}

Additionally, a boundary containment rule keeps fish within their spawn area:

FVector AActor_Fish::BoundaryContainment() const
{
    const FVector Location = GetActorLocation();
    const FVector BoxMin = ContainingSpawnAreaCenter - ContainingSpawnAreaBoxExtent;
    const FVector BoxMax = ContainingSpawnAreaCenter + ContainingSpawnAreaBoxExtent;
    FVector Force = FVector::ZeroVector;

    auto CalculateAxisRepulsion = [&](float CurrentPos, float MinPos, float MaxPos, float CheckDist, float& OutForceComponent)
    {
        const float DistToMin = CurrentPos - MinPos;
        const float DistToMax = MaxPos - CurrentPos;
        float Strength = 0.0f;
        if (DistToMin < CheckDist)
        {
            // Near the minimum face: push in the positive direction
            Strength = 1.0f - (DistToMin / CheckDist);
            OutForceComponent += FMath::Lerp(0.0f, MaxForce, Strength);
        }
        else if (DistToMax < CheckDist)
        {
            // Near the maximum face: push in the negative direction
            Strength = 1.0f - (DistToMax / CheckDist);
            OutForceComponent -= FMath::Lerp(0.0f, MaxForce, Strength);
        }
        if (CurrentPos < MinPos || CurrentPos > MaxPos)
        {
            // Already outside the box: apply the full force back toward it
            OutForceComponent = (CurrentPos < MinPos) ? MaxForce : -MaxForce;
        }
    };

    CalculateAxisRepulsion(Location.X, BoxMin.X, BoxMax.X, ContainmentCheckDistance, Force.X);
    CalculateAxisRepulsion(Location.Y, BoxMin.Y, BoxMax.Y, ContainmentCheckDistance, Force.Y);
    CalculateAxisRepulsion(Location.Z, BoxMin.Z, BoxMax.Z, ContainmentCheckDistance, Force.Z);
    return Force;
}

The logic was functional, but performance needed improvement. The main issue was in GetNeighbors.

The Starting Point: The Actor-Based O(N²) Approach #

The GetNeighbors function was the primary bottleneck.

void AActor_Fish::GetNeighbors(TArray<AActor_Fish*>& OutNeighbors) const
{
    OutNeighbors.Empty();

    const UWorld* World = GetWorld();
    if (!World) return;

    TArray<AActor*> FoundActors;
    // The main offender: iterating all actors for every single fish.
    UGameplayStatics::GetAllActorsOfClass(World, StaticClass(), FoundActors);

    const FVector CurrentLocation = GetActorLocation();
    const float RadiusSquared = FMath::Square(NeighborRadius);
    for (AActor* Actor : FoundActors)
    {
        if (Actor == this) continue;
        AActor_Fish* Fish = Cast<AActor_Fish>(Actor);

        if (!Fish || Fish->bBeingTargeted) continue;
        if (FVector::DistSquared(CurrentLocation, Fish->GetActorLocation()) > RadiusSquared) continue;

        OutNeighbors.Add(Fish);
    }
}

For 1000 fish, each of the 1000 Tick calls walked a list of roughly 1000 actors, which adds up to approximately 1,000,000 loop iterations per frame, on top of 1000 separate GetAllActorsOfClass calls.

Profiling showed GetAllActorsOfClass taking most CPU time, as it scans the full actor list each time. Distance checks contributed to the quadratic O(N²) complexity for neighbor queries.

Baseline Profile

The Result: Straightforward to implement, but led to significant performance issues. 6 FPS Max.

Baseline Result

Step 1: Implementing a Spatial Grid #

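A spatial hash grid divides space into uniform cells, so a fish only has to check the cells around it instead of every other fish. The grid was added in UTickableWorldSubsystem_FishManager, which tracks all fish instances: each fish registers itself in BeginPlay and unregisters in EndPlay.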

// In AActor_Fish.h
virtual void BeginPlay() override;
virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;

// In AActor_Fish.cpp
void AActor_Fish::BeginPlay()
{
    Super::BeginPlay();

    if (const UWorld* World = GetWorld())
    {
        if (UTickableWorldSubsystem_FishManager* FishManager = World->GetSubsystem<UTickableWorldSubsystem_FishManager>())
        {
            FishManager->RegisterFish(this);
        }
    }
}

void AActor_Fish::EndPlay(const EEndPlayReason::Type EndPlayReason)
{
    if (const UWorld* World = GetWorld())
    {
        if (UTickableWorldSubsystem_FishManager* FishManager = World->GetSubsystem<UTickableWorldSubsystem_FishManager>())
        {
            FishManager->UnregisterFish(this);
        }
    }

    Super::EndPlay(EndPlayReason);
}

The FishManager stores actors and updates the grid in its Tick.

// In the manager header
UCLASS()
class FISHINGFEATURE_API UTickableWorldSubsystem_FishManager : public UTickableWorldSubsystem
{
    GENERATED_BODY()

public:
    virtual void Tick(float DeltaTime) override;
    // ...
    void RegisterFish(AActor_Fish* InFishActor);
    void UnregisterFish(AActor_Fish* InFishActor);
    // InRadius is a float so that a fractional NeighborRadius is not truncated
    void FindNeighborsInRadius(const AActor_Fish* InFishActor, float InRadius, TArray<AActor_Fish*>& OutNeighbors) const;

private:
    void UpdateSpatialGrid();

    UPROPERTY(Transient)
    TArray<AActor_Fish*> AllFish;

    TMap<FIntVector, TArray<AActor_Fish*>> SpatialGrid;
    float CellSize = 1000.0f;
};

// In the manager .cpp
void UTickableWorldSubsystem_FishManager::Tick(float DeltaTime)
{
    Super::Tick(DeltaTime);

    UpdateSpatialGrid(); // Rebuild the grid every frame with the latest fish positions
}

void UTickableWorldSubsystem_FishManager::RegisterFish(AActor_Fish* InFishActor)
{
    if (IsValid(InFishActor)) AllFish.Add(InFishActor);
}

void UTickableWorldSubsystem_FishManager::UnregisterFish(AActor_Fish* InFishActor)
{
    if (IsValid(InFishActor)) AllFish.Remove(InFishActor);
}

void UTickableWorldSubsystem_FishManager::UpdateSpatialGrid()
{
    SpatialGrid.Reset();

    for (AActor_Fish* Fish : AllFish)
    {
        if (!IsValid(Fish)) continue;

        const FVector& Position = Fish->GetActorLocation();
        const FIntVector CellCoord(
            FMath::FloorToInt(Position.X / CellSize),
            FMath::FloorToInt(Position.Y / CellSize),
            FMath::FloorToInt(Position.Z / CellSize)
        );
        SpatialGrid.FindOrAdd(CellCoord).Add(Fish);
    }
}

void UTickableWorldSubsystem_FishManager::FindNeighborsInRadius(const AActor_Fish* InFishActor, float InRadius, TArray<AActor_Fish*>& OutNeighbors) const
{
    OutNeighbors.Reset();

    if (!IsValid(InFishActor)) return;

    const FVector& FishLocation = InFishActor->GetActorLocation();
    const float RadiusSquared = FMath::Square(InRadius);
    const FIntVector OriginCellCoord(
        FMath::FloorToInt(FishLocation.X / CellSize),
        FMath::FloorToInt(FishLocation.Y / CellSize),
        FMath::FloorToInt(FishLocation.Z / CellSize)
    );

    // Iterate through the 3x3x3 cube of cells around the origin cell
    for (int Z = -1; Z <= 1; ++Z)
    {
        for (int Y = -1; Y <= 1; ++Y)
        {
            for (int X = -1; X <= 1; ++X)
            {
                const FIntVector CellToCheck(OriginCellCoord + FIntVector(X, Y, Z));
                const TArray<AActor_Fish*>* FishesInCell = SpatialGrid.Find(CellToCheck);
                if (!FishesInCell) continue;

                for (AActor_Fish* PotentialNeighbor : *FishesInCell)
                {
                    if (PotentialNeighbor == InFishActor) continue;
                    if (FVector::DistSquared(FishLocation, PotentialNeighbor->GetActorLocation()) > RadiusSquared) continue;

                    OutNeighbors.Add(PotentialNeighbor);
                }
            }
        }
    }
}

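The cell size (1000 units) approximated the neighbor radius, so each query only had to visit the 3x3x3 block of 27 cells around the fish. This is only correct while the neighbor radius does not exceed the cell size; a larger radius would need a wider block. As long as the number of fish per cell stays bounded, each fish checks a roughly constant number of candidates instead of all N, making the simulation approximately O(N) overall.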
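The relationship between radius, cell size, and the number of cells visited can be written down directly. This is a small sketch in standard C++; CellReach and CellsVisited are hypothetical helper names, not functions from the project, and both assume a positive radius and cell size.

```cpp
#include <cmath>

// Number of cells to step out from the origin cell on each axis so that a sphere of
// `Radius` around any point inside the origin cell is fully covered.
inline int CellReach(float Radius, float CellSize)
{
    return static_cast<int>(std::ceil(Radius / CellSize));
}

// Total number of cells visited by a cubic query with the given reach.
inline int CellsVisited(int Reach)
{
    const int PerAxis = 2 * Reach + 1;
    return PerAxis * PerAxis * PerAxis;
}
```

With a radius of 1000 and a cell size of 1000 the reach is 1 and the query visits 27 cells, which matches the hard-coded loops above. A radius of 1500 with the same cell size would need a reach of 2, or 125 cells, so the fixed 3x3x3 loop would silently miss some neighbors in that case.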

In AActor_Fish::GetNeighbors, the GetAllActorsOfClass call was replaced with a manager query:

void AActor_Fish::GetNeighbors(TArray<AActor_Fish*>& OutNeighbors) const
{
    OutNeighbors.Empty();

    const UWorld* World = GetWorld();
    if (!World) return;

    UTickableWorldSubsystem_FishManager* FishManagerSubsystem = World->GetSubsystem<UTickableWorldSubsystem_FishManager>();
    if (!FishManagerSubsystem) return;

    FishManagerSubsystem->FindNeighborsInRadius(this, NeighborRadius, OutNeighbors);
}

Profiling indicated reduced time in neighbor searches, with grid rebuild being O(N) and queries localized.

Spatial Grid Profile

The Result: Addressed the O(N²) issue. Performance improved to around 14 FPS max, though actor overhead remained.

Spatial Grid Result

Step 2: Centralizing Logic & Array of Structs (AoS) #

Next, actor overhead was reduced by making AActor_Fish handle only targeted states, moving simulation to the FishManager with an Array of Structs (AoS).

USTRUCT()
struct FFishData
{
    GENERATED_BODY()

    UPROPERTY(Transient) TObjectPtr<AActor_Fish> Fish = nullptr;
    UPROPERTY(Transient) FVector Position = FVector::ZeroVector;
    UPROPERTY(Transient) FVector Velocity = FVector::ZeroVector;
    UPROPERTY(Transient) int32 FlockGroupID = 0;
    UPROPERTY(Transient) float MaxSpeed = 0.f;
    UPROPERTY(Transient) float MaxForce = 0.f;
    UPROPERTY(Transient) float NeighborRadius = 0.f;
    UPROPERTY(Transient) float SeparationDistance = 0.f;
    UPROPERTY(Transient) float CohesionWeight = 0.f;
    UPROPERTY(Transient) float SeparationWeight = 0.f;
    UPROPERTY(Transient) float AlignmentWeight = 0.f;
    UPROPERTY(Transient) float ContainmentWeight = 0.f;
    UPROPERTY(Transient) float ContainmentCheckDistance = 0.f;
    UPROPERTY(Transient) FVector SpawnAreaCenter = FVector::ZeroVector;
    UPROPERTY(Transient) FVector SpawnAreaBoxExtent = FVector::ZeroVector;
    UPROPERTY(Transient) int32 ID = INDEX_NONE;
};

// In the manager class:
UPROPERTY(Transient) TArray<FFishData> AllFish;

AActor_Fish disables ticking by default, enabling it only when targeted:

AActor_Fish::AActor_Fish()
{
    PrimaryActorTick.bCanEverTick = true;
    PrimaryActorTick.bStartWithTickEnabled = false;
}

void AActor_Fish::ReeledIn(const FVector& RodLocation)
{
    SetActorTickEnabled(true);

    bBeingTargeted = true;
    Velocity = FVector::ZeroVector;
    ReelInLocation = RodLocation;
    LookAtReelInRotation = (RodLocation - GetActorLocation()).Rotation();
    ReeledInTimeline.PlayFromStart();
}

void AActor_Fish::Escape()
{
    SetActorTickEnabled(true);

    ReeledInTimeline.Stop();
    Velocity = FVector::ZeroVector;
    EscapeRotation = (InitialActorLocation - GetActorLocation()).Rotation();
    EscapeTimeline.PlayFromStart();
}

The Tick in AActor_Fish handles only timelines:

void AActor_Fish::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    TickTimelines(DeltaSeconds);
}

The manager’s Tick simulates data first, then applies to actors.

void UTickableWorldSubsystem_FishManager::Tick(float DeltaTime)
{
    if (AllFish.Num() == 0) return;

    UpdateSpatialGrid(); // Grid now maps to TArray<int32> (indices into AllFish)

    TArray<FVector> NextVelocities, NextPositions;
    NextVelocities.SetNumUninitialized(AllFish.Num());
    NextPositions.SetNumUninitialized(AllFish.Num());

    // Phase 1: SIMULATE (Single-threaded)
    for (int32 i = 0; i < AllFish.Num(); ++i)
    {
        const FFishData& CurrentFish = AllFish[i];
        if (!IsValid(CurrentFish.Fish) || CurrentFish.Fish->IsBeingTargeted())
        {
            // Targeted or invalid fish keep their current state
            NextVelocities[i] = CurrentFish.Velocity;
            NextPositions[i] = CurrentFish.Position;
            continue;
        }

        TArray<int32> NeighborIndices; // Indices into the AllFish array
        FindNeighborsInRadius(i, CurrentFish.NeighborRadius, NeighborIndices);

        const FVector FCohesion = Cohesion(i, NeighborIndices) * CurrentFish.CohesionWeight;
        const FVector FSeparation = Separation(i, NeighborIndices) * CurrentFish.SeparationWeight;
        const FVector FAlignment = Alignment(i, NeighborIndices) * CurrentFish.AlignmentWeight;
        const FVector FBoundary = BoundaryContainment(CurrentFish) * CurrentFish.ContainmentWeight;
        const FVector TotalForce = FCohesion + FSeparation + FAlignment + FBoundary;

        FVector NewVelocity = CurrentFish.Velocity + TotalForce * DeltaTime;
        NewVelocity = NewVelocity.GetClampedToMaxSize(CurrentFish.MaxSpeed);

        NextVelocities[i] = NewVelocity;
        NextPositions[i] = CurrentFish.Position + NewVelocity * DeltaTime;
    }

    // Phase 2: COMMIT & APPLY
    for (int32 i = 0; i < AllFish.Num(); ++i)
    {
        AllFish[i].Position = NextPositions[i];
        AllFish[i].Velocity = NextVelocities[i];

        if (AActor_Fish* FishActor = AllFish[i].Fish)
        {
            if (!FishActor->IsBeingTargeted())
            {
                const FRotator NewRotation = AllFish[i].Velocity.ToOrientationRotator();
                FishActor->SetActorLocationAndRotation(AllFish[i].Position, NewRotation, false, nullptr, ETeleportType::TeleportPhysics);
            }
        }
    }
}

The boids rules were adapted to use indices:

FVector UTickableWorldSubsystem_FishManager::Cohesion(const int32 InFishIndex, const TArray<int32>& InNeighborIndices) const
{
    FVector CenterOfMass = FVector::ZeroVector;
    int32 CohesionCount = 0;
    const int32 CurrentGroupID = AllFish[InFishIndex].FlockGroupID;

    for (const int32 NeighborIndex : InNeighborIndices)
    {
        if (AllFish[NeighborIndex].FlockGroupID != CurrentGroupID) continue;

        CenterOfMass += AllFish[NeighborIndex].Position;
        CohesionCount++;
    }

    if (CohesionCount == 0) return FVector::ZeroVector;

    CenterOfMass /= CohesionCount;
    FVector Desired = CenterOfMass - AllFish[InFishIndex].Position;
    return Steer(AllFish[InFishIndex].Velocity, Desired, AllFish[InFishIndex].MaxForce);
}

Separation and Alignment were adapted in the same way, taking a fish index and a list of neighbor indices instead of actor pointers.

Centralizing logic reduced per-actor overhead, as actors avoided flocking calculations unless targeted. AoS kept data contiguous for better cache access.

Profiling showed less time in actor ticking, with the simulation loop as the main focus.

Centralizing Logic & AoS Profile

The Result: Improved architecture and reduced overhead, reaching around 22 FPS max.

Centralizing Logic & AoS Result

Step 3: Going Parallel with Structure of Arrays (SoA) #

For parallelism, data was restructured to Structure of Arrays (SoA) for cache efficiency.

// FROM: TArray<FFishData> AllFish;
// TO: Individual TArrays for each property
UPROPERTY(Transient) TArray<AActor_Fish*> FishActors;
UPROPERTY(Transient) TMap<AActor_Fish*, int32> FishActorToIndexMap;
TMap<FIntVector, TArray<int32>> SpatialGrid;
int32 FishCount = 0;
TArray<FVector> Positions;
TArray<FVector> Velocities;
TArray<int32> FlockGroupIDs;
TArray<float> MaxSpeeds;
TArray<float> MaxForces;
TArray<float> NeighborRadii;
TArray<float> SeparationDistances;
TArray<float> CohesionWeights;
TArray<float> SeparationWeights;
TArray<float> AlignmentWeights;
TArray<float> ContainmentWeights;
TArray<float> ContainmentCheckDistances;
TArray<FVector> SpawnAreaCenters;
TArray<FVector> SpawnAreaBoxExtents;

SoA enhances locality, as accessing one property loads contiguous data.
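The locality difference can be illustrated with a stripped-down comparison in standard C++. This is only a sketch: Vec3, FishRecord and FishArrays are hypothetical, much smaller than the real FFishData, and use float components, so the sizes differ from the engine's types.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float X, Y, Z; };

// AoS: one record per fish, so positions are interleaved with every other field.
struct FishRecord
{
    Vec3 Position;
    Vec3 Velocity;
    int FlockGroupID;
    float MaxSpeed;
    float MaxForce;
    float NeighborRadius;
};

// SoA: one array per field, so a pass over positions touches only positions.
struct FishArrays
{
    std::vector<Vec3> Positions;
    std::vector<Vec3> Velocities;
    std::vector<int> FlockGroupIDs;
};

// Bytes between two consecutive positions in each layout.
constexpr std::size_t AoSStride = sizeof(FishRecord);
constexpr std::size_t SoAStride = sizeof(Vec3);

// Average position over an AoS container: every record is pulled through the cache
// even though only the Position field is used.
inline Vec3 CenterAoS(const std::vector<FishRecord>& Fish)
{
    Vec3 Sum{0.0f, 0.0f, 0.0f};
    if (Fish.empty()) return Sum;
    for (const FishRecord& F : Fish) { Sum.X += F.Position.X; Sum.Y += F.Position.Y; Sum.Z += F.Position.Z; }
    const float N = static_cast<float>(Fish.size());
    return Vec3{Sum.X / N, Sum.Y / N, Sum.Z / N};
}

// Same computation over SoA: the loop streams through one tightly packed array.
inline Vec3 CenterSoA(const FishArrays& Fish)
{
    Vec3 Sum{0.0f, 0.0f, 0.0f};
    if (Fish.Positions.empty()) return Sum;
    for (const Vec3& P : Fish.Positions) { Sum.X += P.X; Sum.Y += P.Y; Sum.Z += P.Z; }
    const float N = static_cast<float>(Fish.Positions.size());
    return Vec3{Sum.X / N, Sum.Y / N, Sum.Z / N};
}
```

Both functions compute the same result. On a typical platform with 4-byte floats and ints, the AoS stride in this sketch is 40 bytes against 12 bytes for SoA, so a positions-only pass reads roughly a third of the memory. The real FFishData is considerably larger, which widens that gap.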

Registration maps actors to array indices:

void UTickableWorldSubsystem_FishManager::RegisterFish(AActor_Fish* InFishActor)
{
    if (!IsValid(InFishActor) || FishActorToIndexMap.Contains(InFishActor)) return;
    const int32 NewIndex = FishCount++;

    // Every array gets exactly one new element, so index NewIndex is valid in all of them
    FishActors.Add(InFishActor);
    Positions.Add(InFishActor->GetActorLocation());
    Velocities.Add(InFishActor->GetActorForwardVector() * InFishActor->GetMaxSpeed() * 0.5f);
    FlockGroupIDs.Add(InFishActor->GetFlockGroupID());
    MaxSpeeds.Add(InFishActor->GetMaxSpeed());
    MaxForces.Add(InFishActor->GetMaxForce());
    NeighborRadii.Add(InFishActor->GetNeighborRadius());
    SeparationDistances.Add(InFishActor->GetSeparationDistance());
    CohesionWeights.Add(InFishActor->GetCohesionWeight());
    SeparationWeights.Add(InFishActor->GetSeparationWeight());
    AlignmentWeights.Add(InFishActor->GetAlignmentWeight());
    ContainmentWeights.Add(InFishActor->GetContainmentWeight());
    ContainmentCheckDistances.Add(InFishActor->GetContainmentCheckDistance());
    SpawnAreaCenters.Add(InFishActor->GetContainingSpawnAreaCenter());
    SpawnAreaBoxExtents.Add(InFishActor->GetContainingSpawnAreaBoxExtent());
    FishActorToIndexMap.Add(InFishActor, NewIndex);
}
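The post does not show the matching unregister path for the SoA layout. One common way to remove an element while keeping parallel arrays dense is swap-and-pop, sketched below in standard C++. This is an assumption about how it could be done, not the project's implementation: FishRegistry is hypothetical, only three arrays are shown, and an integer ID stands in for the actor pointer.

```cpp
#include <unordered_map>
#include <vector>

struct Vec3 { float X, Y, Z; };

// Hypothetical, trimmed-down registry with parallel arrays and an ID-to-index map.
struct FishRegistry
{
    std::vector<int> Ids;
    std::vector<Vec3> Positions;
    std::vector<Vec3> Velocities;
    std::unordered_map<int, int> IdToIndex;

    void Register(int Id, const Vec3& Position, const Vec3& Velocity)
    {
        if (IdToIndex.count(Id) != 0) return; // already registered
        IdToIndex[Id] = static_cast<int>(Ids.size());
        Ids.push_back(Id);
        Positions.push_back(Position);
        Velocities.push_back(Velocity);
    }

    // Swap-and-pop: copy the last fish into the vacated slot, then shrink every array.
    // Removal is O(1) and the arrays stay dense, but the moved fish changes index.
    void Unregister(int Id)
    {
        const auto It = IdToIndex.find(Id);
        if (It == IdToIndex.end()) return;
        const int Index = It->second;
        const int Last = static_cast<int>(Ids.size()) - 1;

        Ids[Index] = Ids[Last];
        Positions[Index] = Positions[Last];
        Velocities[Index] = Velocities[Last];
        IdToIndex[Ids[Index]] = Index; // re-point the moved fish at its new slot

        Ids.pop_back();
        Positions.pop_back();
        Velocities.pop_back();
        IdToIndex.erase(Id);
    }
};
```

Because the spatial grid stores indices, a removal like this would have to happen outside the simulation step, with the grid rebuilt before the next query.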

Thread Safety with ParallelFor #

ParallelFor requires separating reads and writes to avoid races. The loop reads from current arrays and writes to separate next arrays. Updates are committed serially.

void UTickableWorldSubsystem_FishManager::Tick(float DeltaTime)
{
    if (FishCount == 0) return;

    UpdateSpatialGrid();

    TArray<FVector> NextVelocities, NextPositions;
    NextVelocities.SetNumUninitialized(FishCount);
    NextPositions.SetNumUninitialized(FishCount);

    // Phase 1: PARALLEL SIMULATION
    // Each iteration reads the current arrays and writes only to its own slot i
    ParallelFor(FishCount, [&](int32 i)
    {
        if (!IsValid(FishActors[i]) || FishActors[i]->IsBeingTargeted())
        {
            NextVelocities[i] = Velocities[i];
            NextPositions[i] = Positions[i];
            return;
        }

        TArray<int32> NeighborIndices;
        FindNeighborsInRadius(i, NeighborRadii[i], NeighborIndices);
        const FVector FCohesion = Cohesion(i, NeighborIndices) * CohesionWeights[i];
        const FVector FSeparation = Separation(i, NeighborIndices) * SeparationWeights[i];
        const FVector FAlignment = Alignment(i, NeighborIndices) * AlignmentWeights[i];
        const FVector FBoundary = BoundaryContainment(i) * ContainmentWeights[i];
        const FVector TotalForce = FCohesion + FSeparation + FAlignment + FBoundary;

        FVector NewVelocity = Velocities[i] + TotalForce * DeltaTime;
        NewVelocity = NewVelocity.GetClampedToMaxSize(MaxSpeeds[i]);

        NextVelocities[i] = NewVelocity;
        NextPositions[i] = Positions[i] + NewVelocity * DeltaTime;
    });

    // Phase 2: SERIAL COMMIT (main thread only)
    Positions = MoveTemp(NextPositions);
    Velocities = MoveTemp(NextVelocities);

    // Phase 3: SERIAL ACTOR UPDATE (main thread only)
    for (int32 i = 0; i < FishCount; ++i)
    {
        AActor_Fish* FishActor = FishActors[i];
        if (FishActor && !FishActor->IsBeingTargeted())
        {
            const FRotator NewRotation = Velocities[i].ToOrientationRotator();
            FishActor->SetActorLocationAndRotation(Positions[i], NewRotation, false, nullptr, ETeleportType::TeleportPhysics);
        }
    }
}

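The rule functions (Cohesion, Separation, Alignment, BoundaryContainment) index straight into the SoA arrays, so each pass walks tightly packed data instead of chasing actor pointers, which is friendlier to the CPU cache.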
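
To make the read/write split concrete outside of Unreal, here is a minimal standalone sketch. `ParallelForSketch` and `StepAverages` are hypothetical names invented for this illustration; the helper is a simple `std::thread` stand-in and not how the engine's ParallelFor is implemented. What it shows is the pattern itself: every iteration may read any element of the current array, but writes only its own slot in the next array.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for ParallelFor: splits [0, Count) into contiguous
// chunks and runs Body(i) for every index on a set of std::threads.
template <typename BodyT>
void ParallelForSketch(std::size_t Count, BodyT Body)
{
    const std::size_t NumWorkers = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t ChunkSize = (Count + NumWorkers - 1) / NumWorkers;

    std::vector<std::thread> Workers;
    for (std::size_t Begin = 0; Begin < Count; Begin += ChunkSize)
    {
        const std::size_t End = std::min(Count, Begin + ChunkSize);
        Workers.emplace_back([Begin, End, &Body]
        {
            for (std::size_t i = Begin; i < End; ++i) Body(i);
        });
    }
    for (std::thread& Worker : Workers) Worker.join();
}

// One simulation step. Each iteration reads its neighbors from Current
// but writes only Next[i], so no two threads ever write the same memory.
std::vector<float> StepAverages(const std::vector<float>& Current)
{
    std::vector<float> Next(Current.size());
    ParallelForSketch(Current.size(), [&](std::size_t i)
    {
        const float Left = Current[i == 0 ? Current.size() - 1 : i - 1];
        const float Right = Current[(i + 1) % Current.size()];
        Next[i] = (Left + Current[i] + Right) / 3.0f;
    });
    return Next; // Serial commit: the caller replaces Current with this
}
```

Had the loop written its result back into `Current[i]`, a neighboring iteration on another thread could have read a half-updated value, which is exactly the race the separate next-frame arrays avoid.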

Profiling confirmed better multi-core use, reducing frame time.

SoA & Parallelization Profile

The Result: Spreading the simulation across cores raised performance to a peak of around 25 FPS.

SoA & Parallelization Result

A jump from 22 to 25 FPS (roughly 45 ms down to 40 ms per frame) might seem modest for a 16-core CPU. This is likely a sign of bottleneck shifting: the CPU work was now efficient enough to reveal that the true bottleneck lay elsewhere.

The overall frame rate was being limited by the GPU, which still had to process 1,000 individual draw calls. This confirmed that the next and most critical optimization had to be on the rendering side.

Step 4: ISM Rendering #

The CPU side was now efficient, but the draw calls for 1,000 separate actors were the bottleneck. UInstancedStaticMeshComponent (ISM) addresses this by rendering every fish as an instance of a single mesh. With this change, no AActor_Fish is spawned by default.

AActor_FishSpawnArea provides the assets and per-fish data to the FishManager without spawning any actors.

void AActor_FishSpawnArea::OnFishSpawnAssetLoaded()
{
    UObject* LoadedAsset = FishSpawnAssetHandle.Get()->GetLoadedAsset();
    UClass* LoadedFishClass = Cast<UClass>(LoadedAsset);
    if (!LoadedFishClass) return;

    if (!SpawnAreaBox)
    {
        UE_LOG(LogFishingFeature, Error, TEXT("Spawn Area Box is not valid, this should not happen. Won't continue spawning fish..."));
        return;
    }

    if (!FishSpawnAreaConfigData)
    {
        UE_LOG(LogFishingFeature, Error, TEXT("Fish Spawn Area Config Data is not set, are you sure you have a valid data asset set? Won't continue spawning fish..."));
        return;
    }

    // Assumption: SpawnAreaBox is a UBoxComponent that defines the spawn volume
    const FVector CenterLocation = SpawnAreaBox->GetComponentLocation();
    const FVector BoxExtent = SpawnAreaBox->GetScaledBoxExtent();

    // We get the visual assets from the default object (CDO) of the Fish class
    AActor_Fish* FishCDO = LoadedFishClass->GetDefaultObject<AActor_Fish>();
    if (!FishCDO) return;

    UStaticMeshComponent* MeshComponentCDO = FishCDO->FindComponentByClass<UStaticMeshComponent>();
    if (!MeshComponentCDO || !MeshComponentCDO->GetStaticMesh()) return;

    UStaticMesh* FishMesh = MeshComponentCDO->GetStaticMesh();
    UMaterialInterface* FishMaterial = MeshComponentCDO->GetMaterial(0);

    // Give the manager the assets it needs to set up its ISM component
    FishManager->SetFishAssets(LoadedFishClass, FishMesh, FishMaterial);

    // Get spawn parameters from our config data asset
    const FFishSpawnAreaConfig FishSpawnAreaConfig = FishSpawnAreaConfigData->GetFishSpawnAreaConfig();
    const int32 FishSpawnAmount = FishSpawnAreaConfig.FishSpawnAmount;

    // Split the fish into flock groups whose sizes differ by at most one
    const int32 NumFlockGroups = FMath::Max(1, FishSpawnAreaConfig.NumberOfFlockGroups);
    TArray<int32> GroupIDsToAssign;
    GroupIDsToAssign.Reserve(FishSpawnAmount);

    const int32 BaseSize = FishSpawnAmount / NumFlockGroups;
    int32 Remainder = FishSpawnAmount % NumFlockGroups;

    for (int32 GroupIndex = 0; GroupIndex < NumFlockGroups; ++GroupIndex)
    {
        // The first Remainder groups each take one extra fish
        const int32 CurrentGroupSize = BaseSize + (Remainder > 0 ? 1 : 0);

        for (int32 i = 0; i < CurrentGroupSize; ++i)
        {
            GroupIDsToAssign.Add(GroupIndex);
        }
        if (Remainder > 0) --Remainder;
    }

    // Fisher-Yates shuffle so group membership is not tied to spawn order
    if (GroupIDsToAssign.Num() == FishSpawnAmount)
    {
        const int32 LastIndex = GroupIDsToAssign.Num() - 1;
        for (int32 i = 0; i <= LastIndex; ++i)
        {
            const int32 RandIndex = FMath::RandRange(i, LastIndex);
            GroupIDsToAssign.Swap(i, RandIndex);
        }
    }

    // Now, instead of spawning actors, we just add pure data to the manager
    const UDataAsset_ActorFishConfig* ActorFishConfigData = FishCDO->GetActorFishConfigData();
    if (ActorFishConfigData)
    {
        const FActorFishConfig FishConfig = ActorFishConfigData->GetActorFishConfig();

        FVector Min, Max;
        MeshComponentCDO->GetLocalBounds(Min, Max);
        const float FishLength = (Max - Min).GetMax();

        for (int32 i = 0; i < FishSpawnAmount; ++i)
        {
            const FVector RandomLocation = UKismetMathLibrary::RandomPointInBoundingBox(CenterLocation, BoxExtent);
            const int32 GroupID = (i < GroupIDsToAssign.Num()) ? GroupIDsToAssign[i] : 0;

            FishManager->AddFishData(RandomLocation, GroupID, FishConfig, CenterLocation, BoxExtent, FishSpawnAmount, FishLength);
        }
    }
}
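
The group assignment in the middle of that function is easy to get subtly wrong, so here is the same idea as a standalone sketch. `BuildShuffledGroupIDs` is a hypothetical name, and it uses `std::mt19937` where the Unreal code uses `FMath::RandRange`; the logic is otherwise meant to mirror the listing: near-equal group sizes, followed by a Fisher-Yates shuffle.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Builds one group ID per fish. Group sizes differ by at most one, and the
// IDs are shuffled so that group membership is not tied to spawn order.
std::vector<int32_t> BuildShuffledGroupIDs(int32_t FishCount, int32_t NumGroups, std::mt19937& Rng)
{
    std::vector<int32_t> GroupIDs;
    if (FishCount <= 0) return GroupIDs;

    NumGroups = std::max(1, NumGroups);
    GroupIDs.reserve(static_cast<std::size_t>(FishCount));

    const int32_t BaseSize = FishCount / NumGroups;
    const int32_t Remainder = FishCount % NumGroups;
    for (int32_t Group = 0; Group < NumGroups; ++Group)
    {
        // The first Remainder groups each take one extra fish.
        const int32_t GroupSize = BaseSize + (Group < Remainder ? 1 : 0);
        GroupIDs.insert(GroupIDs.end(), static_cast<std::size_t>(GroupSize), Group);
    }

    // Fisher-Yates shuffle: swap slot i with a random slot in [i, last].
    for (std::size_t i = 0; i + 1 < GroupIDs.size(); ++i)
    {
        std::uniform_int_distribution<std::size_t> Pick(i, GroupIDs.size() - 1);
        std::swap(GroupIDs[i], GroupIDs[Pick(Rng)]);
    }
    return GroupIDs;
}
```

With 10 fish and 3 groups this yields group sizes of 4, 3 and 3, in a random order across the spawn indices.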

In the FishManager, SetFishAssets creates and configures the ISM component, and AddFishData appends one entry to each SoA array and adds a matching ISM instance.

void UTickableWorldSubsystem_FishManager::OnWorldBeginPlay(UWorld& InWorld)
{
    Super::OnWorldBeginPlay(InWorld);

    // We need an actor in the world to own our component
    ISMOwner = InWorld.SpawnActor<AActor>();
    bIsInitialized = true;
}

void UTickableWorldSubsystem_FishManager::SetFishAssets(TSubclassOf<AActor_Fish> InFishClass, UStaticMesh* InMesh, UMaterialInterface* InMaterial)
{
    // Only set up once: bail out if the ISM component already exists
    if (!bIsInitialized || !ISMOwner || FishISMComponent) return;

    FishActorClass = InFishClass; // Store the class we'll need to spawn for promotion
    FishISMComponent = NewObject<UInstancedStaticMeshComponent>(ISMOwner);
    FishISMComponent->RegisterComponent();
    FishISMComponent->SetStaticMesh(InMesh);

    if (InMaterial) FishISMComponent->SetMaterial(0, InMaterial);
    FishISMComponent->SetCollisionEnabled(ECollisionEnabled::NoCollision);

    ISMOwner->AddInstanceComponent(FishISMComponent);
}

void UTickableWorldSubsystem_FishManager::AddFishData(const FVector& InPosition, int32 InFlockGroupID, const FActorFishConfig& FishConfig, const FVector& InSpawnCenter, const FVector& InSpawnExtent, int32 TotalFishInSim, float FishLength)
{
    if (!FishISMComponent) return;

    // Note: this listing is the final version of the function. The [0] buffer index
    // and the wander arrays belong to the double-buffered layout introduced in Step 5

    // OPTIMIZATION: Add to the initial buffer [0] for our double-buffered arrays
    Positions[0].Add(InPosition);
    Velocities[0].Add(FMath::VRand().GetSafeNormal() * FishConfig.MaxSpeed * 0.5f);

    FlockGroupIDs.Add(InFlockGroupID);
    MaxSpeeds.Add(FishConfig.MaxSpeed);
    MaxForces.Add(FishConfig.MaxForce);

    NeighborRadii.Add(FishConfig.NeighborRadius);
    SeparationDistances.Add(FishConfig.SeparationDistance);

    CohesionWeights.Add(FishConfig.CohesionWeight);
    SeparationWeights.Add(FishConfig.SeparationWeight);
    AlignmentWeights.Add(FishConfig.AlignmentWeight);
    ContainmentWeights.Add(FishConfig.ContainmentWeight);

    ContainmentCheckDistances.Add(FishConfig.ContainmentCheckDistance);

    SpawnAreaCenters.Add(InSpawnCenter);
    SpawnAreaBoxExtents.Add(InSpawnExtent);

    // BEHAVIOR: Add data for wandering
    WanderVectors.Add(FMath::VRand());

    WanderStrengths.Add(1.5f);
    WanderJitters.Add(10.0f);

    // OPTIMIZATION: Keep track of the largest neighbor radius to set an optimal grid cell size
    MaxNeighborRadius = FMath::Max(MaxNeighborRadius, FishConfig.NeighborRadius);
    SpatialGridCellSize = MaxNeighborRadius > 0.f ? MaxNeighborRadius : 1000.f;
    IsBoidPromoted.Add(false);

    // Add a visual instance for this new fish at its starting location
    const FTransform InitialTransform(InPosition);
    FishISMComponent->AddInstance(InitialTransform);
    FishCount++;
}

Tick then pushes every instance transform to the ISM in a single batched call:

void UTickableWorldSubsystem_FishManager::Tick(float DeltaTime)
{
    // ... simulation code ...
    TArray<FTransform> CurrentTransforms;
    CurrentTransforms.Reserve(FishCount);
    for (int32 i = 0; i < FishCount; ++i)
    {
        if (IsBoidPromoted[i])
        {
            // Promoted fish are rendered by their actor. Keep the slot, scaled to zero,
            // so that array index i still lines up with ISM instance i
            CurrentTransforms.Add(FTransform(FQuat::Identity, Positions[i], FVector::ZeroVector));
            continue;
        }
        CurrentTransforms.Add(FTransform(Velocities[i].ToOrientationRotator(), Positions[i]));
    }

    if (CurrentTransforms.Num() > 0)
    {
        FishISMComponent->BatchUpdateInstancesTransforms(0, CurrentTransforms, true, true);
    }
}

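The promote/demote pattern handles interaction: when a fish is targeted, its ISM instance is hidden and a full AActor_Fish is spawned in its place, and the swap is reversed afterwards.

FindNearestBoid returns the index of the closest unpromoted fish, which is then handed to PromoteBoidToActor: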

bool UTickableWorldSubsystem_FishManager::FindNearestBoid(const FVector& Location, float Radius, int32& OutBoidIndex)
{
    OutBoidIndex = INDEX_NONE;
    float MinDistSquared = FMath::Square(Radius);
    for (int32 i = 0; i < FishCount; ++i)
    {
        if (IsBoidPromoted[i]) continue;

        float DistSquared = FVector::DistSquared(Location, Positions[i]);
        if (DistSquared < MinDistSquared)
        {
            MinDistSquared = DistSquared;
            OutBoidIndex = i;
        }
    }

    return OutBoidIndex != INDEX_NONE;
}

AActor_Fish* UTickableWorldSubsystem_FishManager::PromoteBoidToActor(int32 BoidIndex)
{
    if (BoidIndex < 0 || BoidIndex >= FishCount || IsBoidPromoted[BoidIndex] || !FishActorClass) return nullptr;

    UWorld* World = GetWorld();
    if (!World) return nullptr;

    FActorSpawnParameters SpawnParams;
    SpawnParams.SpawnCollisionHandlingOverride = ESpawnActorCollisionHandlingMethod::AlwaysSpawn;
    AActor_Fish* NewFishActor = World->SpawnActor<AActor_Fish>(FishActorClass, Positions[BoidIndex], Velocities[BoidIndex].ToOrientationRotator(), SpawnParams);

    if (!NewFishActor) return nullptr;

    // Sync state from boid to actor
    NewFishActor->SetSpawnAreaCenterAndExtent(SpawnAreaCenters[BoidIndex], SpawnAreaBoxExtents[BoidIndex]);
    NewFishActor->SetFlockGroupID(FlockGroupIDs[BoidIndex]);
    NewFishActor->SetTotalFishInSimulation(FishCount);
    NewFishActor->SetBoidIndex(BoidIndex, this); // So actor knows its origin

    // Hide the instance
    FTransform HiddenTransform = FTransform::Identity;
    HiddenTransform.SetScale3D(FVector::ZeroVector);
    FishISMComponent->UpdateInstanceTransform(BoidIndex, HiddenTransform, false, true);
    IsBoidPromoted[BoidIndex] = true;
    PromotedFishActors.Add(BoidIndex, NewFishActor);
    ActorToBoidIndexMap.Add(NewFishActor, BoidIndex);

    return NewFishActor;
}

Demotion copies the actor's state back into the arrays, restores the instance, and destroys the actor.

void UTickableWorldSubsystem_FishManager::DemoteActorToBoid(int32 BoidIndex)
{
    TObjectPtr<AActor_Fish>* ActorPtr = PromotedFishActors.Find(BoidIndex);
    if (!ActorPtr || !*ActorPtr) return;

    AActor_Fish* FishActor = *ActorPtr;

    // Sync state back if needed
    Positions[BoidIndex] = FishActor->GetActorLocation();
    Velocities[BoidIndex] = FishActor->GetVelocity();

    // Restore the instance
    const FTransform RestoreTransform(Velocities[BoidIndex].ToOrientationRotator(), Positions[BoidIndex]);
    FishISMComponent->UpdateInstanceTransform(BoidIndex, RestoreTransform, false, true);

    // Clear the bookkeeping before destroying the actor: Destroy() triggers EndPlay,
    // which calls back into this function and should then find nothing left to do
    IsBoidPromoted[BoidIndex] = false;
    PromotedFishActors.Remove(BoidIndex);
    ActorToBoidIndexMap.Remove(FishActor);

    // Destroy the actor, unless we were called while it is already being destroyed
    if (!FishActor->IsActorBeingDestroyed())
    {
        FishActor->Destroy();
    }
}

AActor_Fish stores its boid index and a pointer to the manager, so it can demote itself when it is removed from play:

void AActor_Fish::SetBoidIndex(int32 InBoidIndex, UTickableWorldSubsystem_FishManager* InManager)
{
    BoidIndex = InBoidIndex;
    ManagerPtr = InManager;
}

void AActor_Fish::EndPlay(const EEndPlayReason::Type EndPlayReason)
{
    if (ManagerPtr.IsValid() && BoidIndex != INDEX_NONE)
    {
        ManagerPtr->DemoteActorToBoid(BoidIndex);
    }

    Super::EndPlay(EndPlayReason);
}

ISM batches the rendering work: instead of roughly 1,000 separate draw calls, the whole school is submitted as instances of one mesh. Only promoted fish exist as actors.
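
Stripped of the engine types, the promote/demote bookkeeping can be sketched as follows. `SchoolSketch` and `FishActorSketch` are hypothetical names for this illustration, and a plain float scale stands in for the ISM instance transform. The point it tries to show is that a promoted fish keeps its slot (hidden by a zero scale) so every other index stays valid.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a heavyweight, fully interactive actor.
struct FishActorSketch
{
    std::size_t BoidIndex;
};

// Hypothetical model of the manager's promote/demote bookkeeping.
class SchoolSketch
{
public:
    explicit SchoolSketch(std::size_t Count)
        : InstanceScale(Count, 1.0f), bPromoted(Count, false) {}

    // Swap the lightweight instance at Index for a full actor.
    FishActorSketch* Promote(std::size_t Index)
    {
        if (Index >= bPromoted.size() || bPromoted[Index]) return nullptr;

        // Hide the instance by zeroing its scale instead of removing it,
        // so every other fish keeps its index.
        InstanceScale[Index] = 0.0f;
        bPromoted[Index] = true;

        auto Actor = std::make_unique<FishActorSketch>(FishActorSketch{Index});
        FishActorSketch* Raw = Actor.get();
        PromotedActors.emplace(Index, std::move(Actor));
        return Raw;
    }

    // Destroy the actor and show the instance again.
    bool Demote(std::size_t Index)
    {
        const auto Found = PromotedActors.find(Index);
        if (Found == PromotedActors.end()) return false;

        PromotedActors.erase(Found); // Frees the actor
        InstanceScale[Index] = 1.0f;
        bPromoted[Index] = false;
        return true;
    }

    bool IsPromoted(std::size_t Index) const { return bPromoted.at(Index); }
    float GetInstanceScale(std::size_t Index) const { return InstanceScale.at(Index); }
    std::size_t NumActors() const { return PromotedActors.size(); }

private:
    std::vector<float> InstanceScale;
    std::vector<bool> bPromoted;
    std::unordered_map<std::size_t, std::unique_ptr<FishActorSketch>> PromotedActors;
};
```

Removing the instance outright would shift every later instance index down by one, which is why hiding in place is the simpler choice here.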

Profiling showed significant GPU time reduction via instancing.

ISM Rendering Profile

The Result: With the rendering bottleneck resolved, performance peaked at around 71 FPS.

ISM Rendering Result

Step 5: Buffering and Tweaks for Efficiency #

The final refinements focused on tightening the parallel simulation loop.

  1. Implementing Double Buffering for Parallel Safety

The previous step used temporary arrays for the next frame’s data, which were then moved back into the main arrays using MoveTemp. While MoveTemp is extremely efficient, a more robust pattern for parallel processing is double buffering.

Instead of creating temporary arrays each frame, two persistent sets of arrays are maintained for positions and velocities. In any given frame, one set acts as the immutable read buffer, while the other serves as the write buffer.

// Double-buffering for data that changes each frame.
TArray<FVector> Positions[2];
TArray<FVector> Velocities[2];
int32 CurrentBufferIndex = 0; // Index of the buffer that is read this frame; the other one is written

// Data for wandering behavior
TArray<FVector> WanderVectors;
TArray<float> WanderStrengths;
TArray<float> WanderJitters;

// Dynamically sized spatial grid
float SpatialGridCellSize = 1000.f;
float MaxNeighborRadius = 0.f;

// State tracking for Promote/Demote
TArray<bool> IsBoidPromoted;
UPROPERTY() TMap<int32, TObjectPtr<AActor_Fish>> PromotedFishActors;
UPROPERTY() TMap<TObjectPtr<AActor_Fish>, int32> ActorToBoidIndexMap;

// ... other SoA data remains the same ...

This approach has two key advantages:

  • Guaranteed Thread Safety: It provides a clean, formal separation of read and write data. The ParallelFor loop can safely read from the ReadIndex buffer, knowing it will not be modified, while concurrently writing results to the WriteIndex buffer. This prevents data races on the position and velocity arrays.
  • No Per-Frame Buffer Allocations: Both buffers persist between frames, so once they have grown to FishCount elements, the position and velocity arrays are no longer reallocated during Tick. This removes one potential source of hitches and makes frame times more stable and predictable.

At the end of the tick, “flipping” the buffers is a virtually zero-cost operation. It’s just a single integer assignment.
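
As a standalone illustration of the flip, here is a minimal sketch. `DoubleBufferSketch` and `Step` are hypothetical names, and a vector of floats stands in for the position and velocity arrays; it is not the manager's actual code.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Two persistent buffers: one is read this frame, the other is written.
struct DoubleBufferSketch
{
    std::array<std::vector<float>, 2> Buffers;
    int ReadIndex = 0;

    explicit DoubleBufferSketch(std::size_t Count, float Initial = 0.0f)
    {
        Buffers[0].assign(Count, Initial);
        Buffers[1].assign(Count, Initial);
    }

    const std::vector<float>& Read() const { return Buffers[ReadIndex]; }
    std::vector<float>& Write() { return Buffers[1 - ReadIndex]; }

    // Flipping is a single integer assignment; no element is copied.
    void Flip() { ReadIndex = 1 - ReadIndex; }
};

// One simulation step: read from one buffer, write to the other, then flip.
void Step(DoubleBufferSketch& State, float Delta)
{
    const std::vector<float>& In = State.Read();
    std::vector<float>& Out = State.Write();
    for (std::size_t i = 0; i < In.size(); ++i)
    {
        Out[i] = In[i] + Delta;
    }
    State.Flip();
}
```

Because both vectors are sized once in the constructor, stepping never reallocates them, which is the property the second bullet above relies on.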

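The spatial grid's cell size is also set dynamically in AddFishData:

MaxNeighborRadius = FMath::Max(MaxNeighborRadius, FishConfig.NeighborRadius);
SpatialGridCellSize = MaxNeighborRadius > 0.f ? MaxNeighborRadius : 1000.f;

Because a cell is at least as wide as the largest neighbor radius, every neighbor of a fish lies either in its own cell or in a directly adjacent one, regardless of which radius each fish type uses.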
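
The sketch below shows why that cell size is convenient. `CellCoord`, `ToCell` and `CellsToSearch` are hypothetical helpers and not the manager's actual grid code; the assumption is a uniform grid addressed by integer cell coordinates.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct CellCoord
{
    int32_t X, Y, Z;
};

// Map a world-space position to the integer coordinates of its grid cell.
CellCoord ToCell(float X, float Y, float Z, float CellSize)
{
    return {
        static_cast<int32_t>(std::floor(X / CellSize)),
        static_cast<int32_t>(std::floor(Y / CellSize)),
        static_cast<int32_t>(std::floor(Z / CellSize))
    };
}

// When CellSize >= the query radius, two points within that radius differ by
// at most one cell per axis, so the 3x3x3 block around the query cell is enough.
std::vector<CellCoord> CellsToSearch(const CellCoord& Center)
{
    std::vector<CellCoord> Cells;
    Cells.reserve(27);
    for (int32_t DX = -1; DX <= 1; ++DX)
        for (int32_t DY = -1; DY <= 1; ++DY)
            for (int32_t DZ = -1; DZ <= 1; ++DZ)
                Cells.push_back({Center.X + DX, Center.Y + DY, Center.Z + DZ});
    return Cells;
}
```

A smaller cell size would force a wider block of cells per query, while a much larger one would put more fish than necessary into each cell.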

  2. Accumulating Desired Velocities

Each rule now returns a desired velocity instead of a force, and steering is applied once to the combined result.

FVector UTickableWorldSubsystem_FishManager::Cohesion(int32 FishIndex, const TArray<int32>& NeighborIndices) const
{
    // ... calculate CenterOfMass ...
    if (CohesionCount == 0) return FVector::ZeroVector;

    CenterOfMass /= CohesionCount;
    return (CenterOfMass - Positions[CurrentBufferIndex][FishIndex]).GetSafeNormal() * MaxSpeeds[FishIndex];
}

The weighted desired velocities are summed, and a single steering force is computed from the result:

const FVector WeightedTotal =
    (DesiredCohesion * CohesionWeights[i]) +
    (DesiredSeparation * SeparationWeights[i]) +
    (DesiredAlignment * AlignmentWeights[i]) +
    (DesiredBoundary * ContainmentWeights[i]) +
    (DesiredWander * WanderStrengths[i]);
FVector TotalForce = FVector::ZeroVector;
if (!WeightedTotal.IsNearlyZero())
{
    FVector DesiredVelocity = WeightedTotal.GetSafeNormal() * MaxSpeeds[i];
    TotalForce = Steer(CurrentVelocity, DesiredVelocity, MaxForces[i]);
}

This way the steering calculation runs once per fish rather than once per rule.
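
The Steer function itself is not shown in this section. As an assumption, the sketch below uses the textbook Reynolds formulation (desired velocity minus current velocity, clamped to a maximum force); the real implementation may differ. `Vec3`, `ClampToMaxSize` and `SteerSketch` are hypothetical names for this illustration.

```cpp
#include <cmath>

struct Vec3
{
    float X, Y, Z;
};

Vec3 operator-(const Vec3& A, const Vec3& B) { return {A.X - B.X, A.Y - B.Y, A.Z - B.Z}; }
Vec3 operator*(const Vec3& V, float S) { return {V.X * S, V.Y * S, V.Z * S}; }

float Length(const Vec3& V)
{
    return std::sqrt(V.X * V.X + V.Y * V.Y + V.Z * V.Z);
}

// Scale V down so its length does not exceed MaxSize; shorter vectors pass through.
Vec3 ClampToMaxSize(const Vec3& V, float MaxSize)
{
    const float Len = Length(V);
    return (Len > MaxSize && Len > 0.0f) ? V * (MaxSize / Len) : V;
}

// Assumed Reynolds-style steering: the force that would turn the current
// velocity into the desired one, limited to MaxForce.
Vec3 SteerSketch(const Vec3& Current, const Vec3& Desired, float MaxForce)
{
    return ClampToMaxSize(Desired - Current, MaxForce);
}
```

For example, steering from (1, 0, 0) toward (4, 4, 0) with a maximum force of 2.5 gives a velocity difference of (3, 4, 0), whose length is 5, so the result is scaled by half to (1.5, 2, 0).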

A wander term adds a small random drift so the motion looks less mechanical:

WanderVectors[i] += FMath::VRand() * WanderJitters[i] * DeltaTime;
WanderVectors[i].Normalize();
const FVector DesiredWander = WanderVectors[i] * MaxSpeeds[i];
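
One thing to keep in mind is that this update runs inside the parallel loop. As a hedged variation, the standalone sketch below gives every fish its own random generator, so no random state is shared between threads and results are reproducible for a given seed. `WanderStateSketch` and `UpdateWander` are hypothetical names, and the jitter is drawn from a cube rather than from a random unit vector as FMath::VRand does.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

struct Vec3
{
    float X, Y, Z;
};

// Per-fish wander state with its own generator (an assumption of this sketch).
struct WanderStateSketch
{
    Vec3 Direction{1.0f, 0.0f, 0.0f};
    std::mt19937 Rng;

    explicit WanderStateSketch(uint32_t Seed) : Rng(Seed) {}
};

// Nudge the wander direction by a small random offset, renormalize it,
// and return the desired wander velocity at MaxSpeed.
Vec3 UpdateWander(WanderStateSketch& State, float Jitter, float DeltaTime, float MaxSpeed)
{
    std::uniform_real_distribution<float> Offset(-1.0f, 1.0f);
    State.Direction.X += Offset(State.Rng) * Jitter * DeltaTime;
    State.Direction.Y += Offset(State.Rng) * Jitter * DeltaTime;
    State.Direction.Z += Offset(State.Rng) * Jitter * DeltaTime;

    const Vec3& D = State.Direction;
    const float Len = std::sqrt(D.X * D.X + D.Y * D.Y + D.Z * D.Z);
    if (Len > 1e-6f)
    {
        State.Direction = {D.X / Len, D.Y / Len, D.Z / Len};
    }
    else
    {
        State.Direction = {1.0f, 0.0f, 0.0f}; // Degenerate case: fall back to a fixed direction
    }

    return {State.Direction.X * MaxSpeed, State.Direction.Y * MaxSpeed, State.Direction.Z * MaxSpeed};
}
```

Because the direction is only nudged a little each frame, the heading drifts smoothly instead of jumping to a new random direction every tick.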

The final Tick:

void UTickableWorldSubsystem_FishManager::Tick(float DeltaTime)
{
    if (FishCount == 0 || !FishISMComponent) return;

    const int32 ReadIndex = CurrentBufferIndex;
    const int32 WriteIndex = (CurrentBufferIndex + 1) % 2;

    UpdateSpatialGrid(); // Reads from [ReadIndex]

    Positions[WriteIndex].SetNumUninitialized(FishCount);
    Velocities[WriteIndex].SetNumUninitialized(FishCount);

    ParallelFor(FishCount, [&](int32 i)
    {
        if (IsBoidPromoted[i])
        {
            // Promoted fish are driven by their actor; carry their state over unchanged
            Velocities[WriteIndex][i] = Velocities[ReadIndex][i];
            Positions[WriteIndex][i] = Positions[ReadIndex][i];
            return;
        }

        const FVector CurrentPosition = Positions[ReadIndex][i];
        const FVector CurrentVelocity = Velocities[ReadIndex][i];

        TArray<int32> NeighborIndices;
        FindNeighborsInRadius(i, NeighborRadii[i], NeighborIndices);

        const FVector DesiredCohesion = Cohesion(i, NeighborIndices);
        const FVector DesiredSeparation = Separation(i, NeighborIndices);
        const FVector DesiredAlignment = Alignment(i, NeighborIndices);
        const FVector DesiredBoundary = BoundaryContainment(i);

        // Note: FMath::VRand() draws from the global C rand() state, which all worker threads share.
        // A per-fish FRandomStream is one option if reproducible results matter
        WanderVectors[i] += FMath::VRand() * WanderJitters[i] * DeltaTime;
        WanderVectors[i].Normalize();

        const FVector DesiredWander = WanderVectors[i] * MaxSpeeds[i];
        const FVector WeightedTotal =
            (DesiredCohesion * CohesionWeights[i]) + (DesiredSeparation * SeparationWeights[i]) +
            (DesiredAlignment * AlignmentWeights[i]) + (DesiredBoundary * ContainmentWeights[i]) +
            (DesiredWander * WanderStrengths[i]);
        FVector TotalForce = FVector::ZeroVector;

        if (!WeightedTotal.IsNearlyZero())
        {
            FVector DesiredVelocity = WeightedTotal.GetSafeNormal() * MaxSpeeds[i];
            TotalForce = Steer(CurrentVelocity, DesiredVelocity, MaxForces[i]);
        }

        FVector NewVelocity = CurrentVelocity + TotalForce * DeltaTime;
        Velocities[WriteIndex][i] = NewVelocity.GetClampedToMaxSize(MaxSpeeds[i]);
        Positions[WriteIndex][i] = CurrentPosition + Velocities[WriteIndex][i] * DeltaTime;
    });

    CurrentBufferIndex = WriteIndex; // Flip buffers - zero cost!

    TArray<FTransform> CurrentTransforms;
    CurrentTransforms.Reserve(FishCount);
    for (int32 i = 0; i < FishCount; ++i)
    {
        if (IsBoidPromoted[i])
        {
            // Zero scale hides the instance while keeping index i aligned with ISM instance i
            CurrentTransforms.Add(FTransform(FQuat::Identity, Positions[CurrentBufferIndex][i], FVector::ZeroVector));
        }
        else
        {
            CurrentTransforms.Add(FTransform(Velocities[CurrentBufferIndex][i].ToOrientationRotator(), Positions[CurrentBufferIndex][i]));
        }
    }

    if (CurrentTransforms.Num() > 0)
    {
        FishISMComponent->BatchUpdateInstancesTransforms(0, CurrentTransforms, true, true);
    }
}

Profiling showed low overhead and stable performance.

Double-buffer & accumulating desired movement profile

The Result: These refinements brought performance to a peak of around 74 FPS.

Double-buffer & accumulating desired movement result

What’s Next? Part 2: Graduating to Mass? #

The structure of this system maps closely onto Unreal Engine’s Mass Entity Component System. Refactoring to Mass could provide further gains and tighter engine integration. More on that in a future update, coming soon™.

Conclusion #

Going from 6 FPS to over 70 FPS demonstrates a repeatable optimization approach: start with the simplest implementation, profile, and address bottlenecks one at a time, moving from an object-oriented design toward data-oriented design principles.

| Step | Technique | Key Idea |
| --- | --- | --- |
| Start | Naive Actors | Simple, but O(N²) neighbor search impacted performance. |
| 1 | Spatial Grid | Replaced full searches with localized lookups. |
| 2 | AoS & Centralization | Moved logic to manager, reduced actor ticking. |
| 3 | SoA & Parallelization | Restructured data for multi-core use. |
| 4 | ISM Rendering | Major gain. Used instances for GPU efficiency. |
| 5 | Final Polish | Buffering and tweaks for efficiency. |

For large-scale simulations like this one, combining a data-oriented layout with engine features such as ParallelFor and instanced rendering proved to be a highly effective strategy.

This has been a deep dive into a complex optimization process. If this kind of detailed, code-heavy breakdown is valuable to you, feedback is always welcome.

Thank you for reading. Hopefully, this detailed writeup proves useful in your own projects. See you next time!