This problem occurs whenever there is an indeterminacy over how to handle an observation that has reached a categorical split which was determined when the observation in question’s level was absent during training.

TL;DR No feature engineering heuristics seem to really help mitigate this kind problem.