Proximity Attention Reveals the Hidden Path of Parcels

Every morning in modern cities, a quiet battle unfolds on asphalt and in dashboards. Packages queued in warehouses become delicate tradeoffs of timing, routes, and human judgement as couriers weave through traffic, weather, and roadwork. The math behind that everyday drama is invisible to most of us, but it is the heartbeat of same-day delivery, which once would have been impossible to scale.

Researchers at IDLab at Universiteit Antwerpen in Belgium, led by Hansi Denis, Siegfried Mercelis, and Ngoc-Quang Luong, built PAPN to forecast the path a courier is most likely to take when picking up parcels within a first-mile window. They combined a local, proximity-based attention mechanism with a global, transformer-style view of the whole network and wrapped it inside a decoder that points to the next stop, much like a GPS that learns from human habits rather than just minimizing distance.

They tested the approach on LaDE, a large industry dataset from Cainiao with more than ten million packages and around twenty-one thousand couriers over six months in Hangzhou and other cities. The key claim is not that the model finds the mathematically optimal route but that it predicts the route a human courier is likely to choose given real-world constraints. This subtle shift, predicting intent rather than ideal routing, makes a big practical difference for dispatch, traffic planning, and training new couriers.

What PAPN sees and how it walks a route

PAPN treats every potential pickup as a node in a dynamic graph. Each node carries features such as its location, its time window, and how close it is to the starting point.
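To make the idea of a pickup node concrete, here is a minimal sketch in Python. The class name `PickupNode`, its field names, the use of flat kilometre coordinates, and the helper `build_nodes` are all assumptions made for illustration; the article does not describe the exact feature set or encoding the authors use.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class PickupNode:
    """One candidate pickup (hypothetical field names, not the authors' schema)."""
    node_id: int
    x_km: float               # location on a flat plane, in kilometres (assumption)
    y_km: float
    window_start_min: float   # start of the pickup time window, in minutes
    window_end_min: float     # end of the pickup time window, in minutes
    dist_to_start_km: float   # straight-line distance from the courier's start


def build_nodes(start, pickups):
    """Turn raw (x, y, window_start, window_end) tuples into PickupNode objects."""
    nodes = []
    for i, (x, y, w_start, w_end) in enumerate(pickups):
        dist = math.hypot(x - start[0], y - start[1])
        nodes.append(PickupNode(i, x, y, w_start, w_end, dist))
    return nodes


# Two pickups seen from a courier starting at the origin.
nodes = build_nodes((0.0, 0.0), [(3.0, 4.0, 0, 60), (1.0, 0.0, 30, 90)])
```

A real pipeline would likely use geographic coordinates and road distances rather than a flat plane, but the shape of the record stays the same: one entry per pickup, refreshed as the courier moves.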

Then a proximity attention layer looks at the reachability mask, the set of nodes currently visitable at each step, and computes how strongly each neighbor should influence the current decision. The trick is to let local context, the nearby pickups and their connections, drive the moment-to-moment choice while a Transformer encoder builds a global sense of the whole city's rhythm.
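The masking idea can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' layer: the real model learns its attention scores, whereas this sketch assumes the score is simply the negative straight-line distance, and the function name `masked_proximity_weights` and the `temperature` parameter are hypothetical.

```python
import math


def masked_proximity_weights(current, nodes, reachable, temperature=1.0):
    """Softmax over negative distances, restricted to reachable nodes.

    current:   (x, y) position of the courier
    nodes:     list of (x, y) pickup positions
    reachable: list of bool, True where the node is currently visitable
    """
    if not any(reachable):
        raise ValueError("at least one node must be reachable")

    scores = []
    for (x, y), ok in zip(nodes, reachable):
        if ok:
            dist = math.hypot(x - current[0], y - current[1])
            scores.append(-dist / temperature)   # closer nodes score higher
        else:
            scores.append(-math.inf)             # masked nodes get zero weight

    top = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# The third node is the closest, but it is masked out as unreachable.
weights = masked_proximity_weights(
    current=(0.0, 0.0),
    nodes=[(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)],
    reachable=[True, True, False],
)
```

Setting the masked scores to negative infinity before the softmax is the usual way to guarantee that unavailable nodes receive exactly zero weight while the remaining weights still sum to one.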

Next, the model fuses local embeddings with a global embedding, creating a richer representation that knows both the local neighborhood and the big picture. The decoder is a Pointer Network that, at each step, picks the next node by attending to the encoded representations, but it also respects which nodes are currently available. The result is a predicted sequence of pickups that reflects typical human routing decisions under time constraints.
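A toy version of the fuse-then-point loop might look like the sketch below. Several things here are assumptions rather than details from the paper: fusion is done by plain concatenation, the pointer score is a dot product with no learned weights, decoding is greedy, and the only availability rule is "not yet visited". The names `fuse` and `pointer_decode` are hypothetical.

```python
def fuse(local_embeddings, global_embedding):
    """Concatenate each node's local embedding with the shared global embedding."""
    return [local + global_embedding for local in local_embeddings]


def pointer_decode(fused, start_query):
    """Greedy pointer-style decoding: repeatedly point at the best available node."""
    visited = [False] * len(fused)
    query = list(start_query)
    route = []
    for _ in range(len(fused)):
        best_index, best_score = None, float("-inf")
        for i, embedding in enumerate(fused):
            if visited[i]:
                continue                      # respect the availability mask
            score = sum(q * e for q, e in zip(query, embedding))
            if score > best_score:
                best_index, best_score = i, score
        visited[best_index] = True
        route.append(best_index)
        query = fused[best_index]             # the chosen node becomes the next query
    return route


local = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # toy per-node local embeddings
fused = fuse(local, [0.5])                     # toy one-dimensional global embedding
route = pointer_decode(fused, start_query=[0.0, 1.0, 0.0])
```

With these toy numbers the decoder visits node 2 first, then node 1, then node 0. The point of the sketch is the control flow: every step scores only the nodes that are still available, so the output is always a valid ordering of the pickups.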

Why this could reshape cities, couriers, and shoppers

Why does this matter? Because last-mile logistics are the bottleneck of modern e-commerce. A small improvement in predicting routes can ripple into less idle time, shorter trips, and fewer emissions.

With better predictions, dispatchers can assign a parcel to the courier whose predicted route naturally passes by it, reducing driving miles and traffic churn. Businesses can train new couriers by showing them examples of routes that people actually follow, not just the shortest path. And city planners can simulate how changes in street layouts or bike lanes might ripple through the delivery network.
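As a rough illustration of the dispatch idea, the sketch below assigns a new parcel to the courier whose predicted route comes closest to it. This is an assumed heuristic, not something the article attributes to the authors: it measures straight-line distance to the nearest predicted stop rather than the true detour, and the names `nearest_stop_distance`, `assign_parcel`, and the courier labels are hypothetical.

```python
import math


def nearest_stop_distance(route, parcel):
    """Straight-line distance from the parcel to the closest stop on a route."""
    return min(math.hypot(x - parcel[0], y - parcel[1]) for x, y in route)


def assign_parcel(predicted_routes, parcel):
    """Pick the courier whose predicted route passes nearest to the parcel."""
    return min(
        predicted_routes,
        key=lambda courier: nearest_stop_distance(predicted_routes[courier], parcel),
    )


# Predicted stop sequences for two couriers, as (x, y) positions in kilometres.
predicted_routes = {
    "courier_a": [(0.0, 0.0), (1.0, 1.0)],
    "courier_b": [(5.0, 5.0), (6.0, 5.0)],
}
chosen = assign_parcel(predicted_routes, parcel=(5.5, 5.0))
```

Here the parcel sits between courier_b's two predicted stops, so courier_b is chosen. A production dispatcher would also weigh time windows and vehicle capacity, which this sketch leaves out.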

On the technical side, PAPN’s results place it among the best supervised methods for this problem on the public LaDE dataset, and it even holds its own against some reinforcement learning approaches that require heavier training. That is meaningful: you can get near state-of-the-art results without the complexities of reinforcement learning, which often demands more compute and more data.

What surprised researchers and where the path goes next

A few surprises emerged. The researchers found that mixing two levels of context, local proximity-driven embeddings and the global transformer context, improved predictions across multiple cities and time windows. Relying on local context or global context alone didn't cut it.

They also ran an ablation study showing that removing either the transformer encoder or the proximity layer degrades performance, underscoring that the combination pays off. The learning rate mattered too: a smaller setting, around three times ten to the minus five, yielded better results than the default values used for other models, highlighting that training dynamics can be as important as architecture.

Looking ahead, PAPN could be extended to joint predictions of route and time, delivering a more complete planning tool. The dataset is public, enabling more researchers to benchmark new ideas and push the field forward. The authors even released the data generation code, encouraging a community approach to what has long been a murky area of evaluation.