I think so. My guess is that during inference, we run a sliding window over the unseen paragraph to generate a set of context windows. Then we learn the paragraph vector by maximizing the average log likelihood over all those context windows. Of course, we need to freeze the weights of the word embedding matrix (and the softmax output weights); otherwise, those shared parameters would drift away from what we learned on the training data.
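That guess can be sketched in a few lines of numpy. This is a minimal, hypothetical PV-DM-style inference step, not the actual doc2vec implementation: `W` (word embeddings), `U`, and `b` (softmax weights/biases) are assumed to come from training and stay frozen, while only the paragraph vector `d` is updated by gradient ascent on the average log likelihood of each center word given its context window.

```python
import numpy as np

def infer_paragraph_vector(paragraph, W, U, b, window=2, lr=0.05, epochs=50, seed=0):
    """Infer a vector for an unseen paragraph (sketch of PV-DM-style inference).

    paragraph : list of word ids
    W : (vocab, dim) frozen word-embedding matrix
    U : (vocab, dim) frozen softmax output weights
    b : (vocab,)     frozen softmax biases
    Only the paragraph vector d is trained.
    """
    rng = np.random.default_rng(seed)
    d = rng.normal(scale=0.01, size=W.shape[1])  # the only trainable parameter

    # slide a window over the paragraph to build (context, target) pairs
    pairs = []
    for i, target in enumerate(paragraph):
        ctx = paragraph[max(0, i - window):i] + paragraph[i + 1:i + 1 + window]
        if ctx:
            pairs.append((ctx, target))

    for _ in range(epochs):
        for ctx, target in pairs:
            # hidden state: average of frozen context vectors and the paragraph vector
            h = (W[ctx].mean(axis=0) + d) / 2.0
            logits = U @ h + b
            logits -= logits.max()                # numerical stability
            p = np.exp(logits)
            p /= p.sum()                          # softmax over the vocabulary
            # gradient of log p(target | ctx, d) w.r.t. h, chained through to d
            grad_h = U[target] - p @ U
            d += lr * grad_h / 2.0                # W, U, b stay frozen
    return d
```

A full softmax over the vocabulary is used here for clarity; in practice one would use the same speedup as in training (hierarchical softmax or negative sampling), but the key point survives either way: inference is just a few extra gradient steps on one new vector.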
I am in.