* By clinical, we mean that we have to look at the magnitude of the stratum-specific differences. Differences so small as to be of very little relevance from a clinical or biologic perspective are not worth reporting. In contrast, very large differences tell us something clinically meaningful, and we should want to report them.
* By statistical, we mean that we need to look at the p value and confidence intervals, but what p value should we use? Tests of homogeneity have inherently limited statistical power: only a relatively large difference between stratum-specific estimates, or a large sample size, will yield a p value of less than 0.05. Hence, it may be worthwhile to use a higher threshold - not for declaring statistical significance of interaction, but for deciding when to report stratum-specific estimates as opposed to pooling them. It should be emphasized that we are not condoning a different cutoff of statistical significance for tests of interaction, as if they were fundamentally different from any other hypothesis test. Their p values are interpreted just like any other p value.
* Finally, from a practical perspective, the question is how complicated it is to report stratum-specific estimates individually instead of a single number that applies to all strata. If there are 10 different strata to report on, this could make for a complicated message. On the other hand, if there are just two strata, then it is probably worthwhile to report the stratum-specific estimates rather than ignore the difference.
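
To make the three considerations concrete, here is a minimal sketch in Python. It assumes the stratum-specific estimates are odds ratios from 2x2 tables and uses a Woolf-type (inverse-variance) test of homogeneity, which is only one of several possible tests. The counts in `STRATA`, the function names, and the defaults `p_cutoff=0.20`, `min_or_ratio=1.5`, and `max_strata=4` are all hypothetical and chosen for illustration only; the text above does not prescribe specific values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    extra = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra


def woolf_homogeneity(strata):
    """Woolf-type test of homogeneity of odds ratios across strata.

    Each stratum is (a, b, c, d): exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases. All cells must be non-zero.
    """
    log_ors = [math.log((a * d) / (b * c)) for a, b, c, d in strata]
    # Inverse-variance weights, using var(log OR) = 1/a + 1/b + 1/c + 1/d.
    weights = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in strata]
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    df = len(strata) - 1
    return {
        "odds_ratios": [math.exp(l) for l in log_ors],
        "pooled_or": math.exp(pooled),
        "q": q,
        "df": df,
        "p": chi2_sf(q, df),
    }


def report_separately(result, p_cutoff=0.20, min_or_ratio=1.5, max_strata=4):
    """Illustrative rule of thumb; every default here is an assumption."""
    ors = result["odds_ratios"]
    clinical = max(ors) / min(ors) >= min_or_ratio   # magnitude of difference
    statistical = result["p"] < p_cutoff             # reporting threshold, not significance
    practical = len(ors) <= max_strata               # message stays simple enough
    return clinical and statistical and practical


# Hypothetical counts for two strata (not taken from any real study).
STRATA = [(40, 60, 20, 80), (30, 70, 25, 75)]

if __name__ == "__main__":
    result = woolf_homogeneity(STRATA)
    print([round(o, 2) for o in result["odds_ratios"]], round(result["p"], 3))
    print(report_separately(result))                  # higher reporting threshold
    print(report_separately(result, p_cutoff=0.05))   # conventional 0.05
```

With these hypothetical counts the two odds ratios differ roughly twofold, yet the homogeneity p value falls between 0.05 and 0.20. A reporting threshold above 0.05 therefore favors showing the stratum-specific estimates, even though the test would not be declared statistically significant at the conventional level. In practice the decision remains a judgment call that weighs all three considerations rather than a mechanical rule.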