As I understand, we feed the neural network with a short history of the latest positions, and it outputs both a value estimate for the position and a tensor of probabilities for the possible moves.
Is it true to say that the current position's value should be the expected value of the possible positions after one move, weighted by the probabilities of the possible moves given by the probability tensor?
Title: APPROXIMATING CNNS WITH BAG-OF-LOCAL-FEATURES MODELS WORKS SURPRISINGLY WELL ON IMAGENET
Description (in my own words): Deep convolutional neural networks can easily classify partly scrambled images, which leads to conclude that they rely in large part on texture detection. This is in opposition to the common interpretation that they build a hierarchical representation of shapes.
I agree with your interpretation. I think this is the heart of the paper and you are right in stating that the actual "inference" stage is not described clearly enough.
I find this technique reminiscent of transfer learning. In this case we transfer the learning that we obtained in building semantic vectors (the paragraph vectors) that augment a short sequence of words in the task of predicting the next word.
Count me in (not sure if I can keep the pace, but I'll try!)