Oct 3, 2021

I've updated the article to indicate that the main network weights are updated using the Bellman equation. The target network is used to calculate the temporal difference (TD) target.
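As a rough illustration of how the target network enters the update, here is a minimal NumPy sketch of the TD target, r + gamma * max over a' of Q_target(s', a'). The function name `td_targets`, the `dones` flag for terminal transitions, and the default `gamma` are assumptions made for this example and are not taken from the article.

```python
import numpy as np

def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """Sketch of the TD target for a batch of transitions.

    rewards:        shape (batch,), reward for each transition
    next_q_target:  shape (batch, n_actions), Q-values for the next states
                    as predicted by the *target* network (assumed input)
    dones:          shape (batch,), 1 if the transition ended the episode
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_target = np.asarray(next_q_target, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # Best next-state value according to the target network.
    best_next = next_q_target.max(axis=1)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * best_next
```

The main network would then be trained so that its prediction for the action actually taken moves toward these targets.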

In addition, the main network trains on a sampled batch of past experiences every 4 steps. Its weights are then copied to the target network every 100 steps.
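The schedule described above could be sketched as follows. The intervals 4 and 100 come from the comment. The helper name `schedule`, counting steps from 1, and the absence of any warm-up period are assumptions for illustration only.

```python
TRAIN_EVERY = 4    # train the main network on a batch every 4 steps
SYNC_EVERY = 100   # copy main-network weights to the target every 100 steps

def schedule(total_steps, train_every=TRAIN_EVERY, sync_every=SYNC_EVERY):
    """Return the step numbers at which training and weight syncs occur."""
    train_steps, sync_steps = [], []
    for step in range(1, total_steps + 1):  # assumption: steps start at 1
        if step % train_every == 0:
            # Here the agent would sample a batch and fit the main network.
            train_steps.append(step)
        if step % sync_every == 0:
            # Here the main weights would be copied into the target network.
            sync_steps.append(step)
    return train_steps, sync_steps
```

With these intervals the target network stays fixed for 25 consecutive training updates between syncs, which keeps the TD target from shifting on every update.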

Written by Mike Wang
