We propose A-to-X, a comprehensive trajectory prediction dataset that consists of a representative set of trajectories. It is intended to enable better generalization under realistic circumstances that are complex or unsafe and that are out-of-distribution (OOD) with respect to current datasets.

To understand the shortcomings of current datasets, we first taxonomize the characteristics of human trajectories. The TrajNet++ benchmark proposed an initial taxonomy that considers only short-term characteristics, e.g., standing still, moving linearly, or avoiding collisions (Fig. 1). This original taxonomy is sufficient for describing the trajectories in many real datasets and their agent-to-agent (A-to-A or A2A) interactions. However, models that learn exclusively from these types are insufficient for most applications, which involve environments with non-navigable regions and time frames longer than 5 seconds, the practical limit for most models before their prediction error grows exponentially. We extend the taxonomy with long-term characteristics (Fig. 1), i.e., pathfinding alone and navigating through crowded bottlenecks.

These types of trajectories emerge from agent-to-environment (A-to-E or A2E) interactions, which unfold over a longer time frame than A2A interactions and are essential for navigation within any environment.

Fig. 1: Agent-to-Agent and Agent-to-Environment Taxonomy

Agent-to-Agent Interactions

To represent A2A interactions, we use each prior dataset described in Section 2, including ETH, UCY, SDD, CFF, LCAS, WT, and TrajNet++. These datasets feature transient interactions between agents and little interaction with the environment, and the latter is difficult to measure because environment information is frequently unavailable. We therefore approximate environment information based on the principle of stigmergy, which observes the self-organization of human navigation along trails. For each position that agents have traveled through in either the training or testing sets of the ground truth, a 1-meter radius around the position is considered navigable. This guarantees that predictions that stay within 1 meter of the ground truth at all times will never intersect with the environment.

To compensate for the imbalance between A2A and A2E interactions in prior datasets, we propose generating synthetic data in addition to that of TrajNet++. While real datasets are valuable for their veridicality, logistical limitations prevent the acquisition of real data in OOD scenarios that are unsafe for human participants or prohibitively expensive to organize.

1. CFF: Contains data collected from a train station.
2. LCAS: Contains data collected from a university building.
3. ORCA: Contains simulated data of agents moving to the opposite side of a circle from their initial positions around the circumference.
4. TrajNet: Contains data from the ETH, UCY, and SDD datasets, containing trajectories from outdoor scenes.
5. WILDTRACK: Contains data collected from a public square.
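
The stigmergy-based approximation described above can be sketched as a point query: a location counts as navigable if it lies within 1 meter of any position observed in the ground truth. This is a minimal sketch under that reading, not the authors' implementation; the function name `is_navigable` and the array layout are assumptions made for illustration.

```python
import numpy as np


def is_navigable(points, observed, radius=1.0):
    """Return a boolean array: True where a query point is navigable.

    points:   (Q, 2) query positions in meters (e.g. a predicted trajectory).
    observed: (N, 2) ground-truth positions from the training and testing sets.
    radius:   navigable radius around each observed position (1 m in the text).

    Hypothetical helper: the name and signature are illustrative only.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    observed = np.atleast_2d(np.asarray(observed, dtype=float))
    # Pairwise Euclidean distances between queries and observations: (Q, N).
    distances = np.linalg.norm(points[:, None, :] - observed[None, :, :], axis=-1)
    # A query is navigable if its nearest observed position is within the radius.
    return distances.min(axis=1) <= radius


# Toy ground truth: three positions along a straight trail.
ground_truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
queries = np.array([[0.5, 0.5], [1.0, 1.5], [2.9, 0.0]])
flags = is_navigable(queries, ground_truth)
```

Because every predicted point that stays within 1 meter of its ground-truth counterpart lies inside one of these disks, such a prediction passes the check at every observed timestep.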

Agent-to-Environment Interactions

Two such scenarios are used to sample trajectories exhibiting A2E interactions: (1) pathfinding alone in a large, complex environment, which has a prohibitive logistical cost, and (2) navigating through bottlenecks of varied width with a dense crowd. Although simulation models are normally less accurate than predictive models at predicting human trajectories, they currently outperform predictive models and have ecological validity in these A2E scenarios, for which there has not been sufficient real data to train predictive models until A-to-X.

We leverage the prevalent Social Force model to simulate num scenarios of a single agent navigating between random points in complex 112 × 112 m² environments from. This produces long-term isolated interactions between single agents and the environment. We then use the same model to simulate bottleneck scenarios in a 25 × 7 m² room that vary in terms of (a) the density of agents (Level of Service) from {0.2, 0.4, 0.6, 0.8, 1.0} agents/m² and (b) the ratio between the width of the bottleneck and the width of the room (Exit-Entrance Ratio) from {0.2, 0.3, 0.4, 0.6, 0.7}. Several scenarios have been generated for each combination of Level of Service and Exit-Entrance Ratio. This produces long-term interactions between agents as a result of the constricting environment. Exact environment information is provided for both types of scenarios. We later show that current models trained on existing A2A datasets are unable to generalize to these critical scenarios at test time, but that adding training data from these scenarios significantly improves prediction accuracy.
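
For readers unfamiliar with the Social Force model, the following is a minimal sketch of one integration step in a common simplified form: each agent relaxes toward a desired velocity pointing at its goal and is repelled by other agents with a strength that decays exponentially with distance. Obstacle (wall) forces, which drive the A2E interactions, are omitted for brevity. The function name and all parameter values are illustrative assumptions and are not taken from the A-to-X simulations.

```python
import numpy as np


def social_force_step(pos, vel, goals, dt=0.1, desired_speed=1.3,
                      tau=0.5, strength=2.0, falloff=0.3, agent_radius=0.3):
    """Advance all agents by one time step (hypothetical helper).

    pos, vel, goals: (N, 2) arrays of positions (m), velocities (m/s), goals (m).
    All default parameter values are assumptions chosen for illustration.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    goals = np.asarray(goals, dtype=float)

    # Driving term: relax toward the desired velocity within time tau.
    to_goal = goals - pos
    goal_dist = np.linalg.norm(to_goal, axis=1, keepdims=True)
    direction = to_goal / np.maximum(goal_dist, 1e-9)
    acc = (desired_speed * direction - vel) / tau

    # Agent-agent repulsion: exponential decay with separation distance.
    diff = pos[:, None, :] - pos[None, :, :]      # vector from agent j to agent i
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                # no self-interaction
    normal = diff / np.maximum(dist, 1e-9)[..., None]
    magnitude = strength * np.exp((2 * agent_radius - dist) / falloff)
    acc = acc + (magnitude[..., None] * normal).sum(axis=1)

    # Semi-implicit Euler: update velocity first, then position.
    new_vel = vel + dt * acc
    new_pos = pos + dt * new_vel
    return new_pos, new_vel


# A single agent heading toward a goal 10 m away, simulated for 5 seconds.
p = np.array([[0.0, 0.0]])
v = np.zeros((1, 2))
g = np.array([[10.0, 0.0]])
for _ in range(50):
    p, v = social_force_step(p, v, g)
```

A full simulation would add a repulsive term for walls and obstacles and would stop or re-target an agent once it reaches its goal.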

1. Bottleneck: Contains simulated data of agents of varied initial density moving through a bottleneck of varied size.
2. Pathfinding: Contains simulated data of a single agent navigating through a complex environment.
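
The bottleneck scenario grid described above can be enumerated as the Cartesian product of the two parameter sets. The sketch below rests on two assumptions not stated in the text: that the 7 m side of the 25 × 7 m² room is its width, and that the density applies to the whole room area. The function and field names are hypothetical.

```python
from itertools import product

ROOM_LENGTH_M = 25.0
ROOM_WIDTH_M = 7.0  # assumption: the 7 m side is the "width of the room"

LEVELS_OF_SERVICE = (0.2, 0.4, 0.6, 0.8, 1.0)     # agents per square meter
EXIT_ENTRANCE_RATIOS = (0.2, 0.3, 0.4, 0.6, 0.7)  # bottleneck width / room width


def bottleneck_scenarios():
    """List one configuration per (Level of Service, Exit-Entrance Ratio) pair."""
    area = ROOM_LENGTH_M * ROOM_WIDTH_M
    scenarios = []
    for density, ratio in product(LEVELS_OF_SERVICE, EXIT_ENTRANCE_RATIOS):
        scenarios.append({
            "level_of_service": density,
            "exit_entrance_ratio": ratio,
            # assumption: density is measured over the full room area
            "num_agents": round(density * area),
            "bottleneck_width_m": ratio * ROOM_WIDTH_M,
        })
    return scenarios


scenarios = bottleneck_scenarios()
```

Under these assumptions the grid contains 25 parameter combinations; the text states that several scenarios were generated for each of them.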