When training in a classic-like OpenAI Gym environment, the model yields good results and completes the task. Validating on unseen data yields considerably lower results. Tried:
No matter which of the above I try, I get the same results. Any idea what the cause could be, or what else I can try?