Mike Wang
Oct 3, 2021

I've updated the article to indicate that the main network weights are updated using the Bellman Equation. The target network is used to calculate the Temporal Difference Target.
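To make the split between the two networks concrete, here is a minimal sketch of one update for a single transition. It uses plain Python lists in place of real networks, and the function names, `gamma`, and `lr` are illustrative assumptions, not taken from the article's code.

```python
def td_target(reward, next_q_target, done, gamma=0.99):
    """Temporal Difference target built from the *target* network's
    Q-values for the next state: r + gamma * max_a' Q_target(s', a')."""
    if done:
        # No future reward after a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def bellman_update(q_main, action, target, lr=0.1):
    """Move the *main* network's estimate Q_main(s, a) toward the target.
    A tabular stand-in for the gradient step a real network would take."""
    updated = list(q_main)
    updated[action] += lr * (target - updated[action])
    return updated


# Example transition (values are made up for illustration).
target = td_target(reward=1.0, next_q_target=[0.5, 2.0], done=False, gamma=0.9)
new_q = bellman_update(q_main=[0.0, 1.0], action=1, target=target, lr=0.5)
```

Only the entry for the action that was taken changes. The target itself comes from the target network's values, which is why that network is held fixed between copies.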

In addition, the main network samples and trains on a batch of past experiences every 4 steps. The main network weights are then copied to the target network weights every 100 steps.
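The timing described above can be sketched as a simple step counter. This is only an illustration of the schedule: the constant and function names are assumptions, and so is counting steps from 1 and treating both intervals as environment steps.

```python
TRAIN_EVERY = 4    # main network trains on a sampled batch every 4 steps
SYNC_EVERY = 100   # main weights are copied to the target network every 100 steps


def run_schedule(total_steps, train_every=TRAIN_EVERY, sync_every=SYNC_EVERY):
    """Return the steps at which training and weight copies would happen."""
    train_steps, sync_steps = [], []
    for step in range(1, total_steps + 1):
        if step % train_every == 0:
            # Here the main network would sample a batch from the replay
            # memory and take one training step.
            train_steps.append(step)
        if step % sync_every == 0:
            # Here the main network's weights would be copied to the target.
            sync_steps.append(step)
    return train_steps, sync_steps


train_steps, sync_steps = run_schedule(200)
```

With these intervals the main network trains 25 times between consecutive copies, so the target network changes far less often than the network it is used to train.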
