d, thank you for your valuable information.
I'll check the paper.
In formula (1) on page 3, an L2 regularization term is added.
As I understand it, a regularization term is added mainly to prevent overfitting.
I don't think overfitting occurs in reinforcement learning, because unlabeled data are used for training.
I would appreciate it if anyone could explain why the term is needed here.
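To make clear what I mean by an L2 regularization term, here is a minimal sketch of the general idea. It is not the paper's formula (1); the names `data_loss`, `theta`, and `c` are my own assumptions for illustration.

```python
def l2_penalty(theta, c):
    """Return c times the sum of squared parameters (the L2 term)."""
    return c * sum(w * w for w in theta)


def regularized_loss(data_loss, theta, c=1e-4):
    """Total loss = task loss + L2 penalty on the parameters.

    data_loss: scalar loss from the task itself (hypothetical value here)
    theta:     flat list of model parameters (hypothetical)
    c:         regularization strength (hypothetical default)
    """
    return data_loss + l2_penalty(theta, c)


# Example with made-up numbers: larger weights raise the total loss.
theta = [0.5, -1.0, 2.0]
total = regularized_loss(data_loss=0.3, theta=theta, c=0.01)
```

The penalty grows with the size of the weights regardless of where the training targets come from, which is why I am unsure whether the "no labels" argument applies.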
Great! I'm in.