I wanted to ask you for some clarification on the results. I ran the code for ~72 hours on a Core i7 with an Nvidia GeForce 1070 Ti, and here is what I got (I pressed Ctrl+C after 72 hours). I have three specific questions:
- What's the difference between Policy and Policy+MCTS? According to Table 2 in the paper, I guess Policy+MCTS is AlphaTSP and Greedy is Nearest Neighbour. Am I right?
- Why does the exact solution differ in each iteration? It starts at 4.43 for the first testing phase, then goes up to 4.61, and then back down to 4.57. What's the difference between the testing phases shown below?
- I checked the GPU usage with nvtop on Ubuntu, and the code barely used the GPU (<10% on average), even though PyTorch had been set up correctly (see the sanity-check snippet below). I was expecting much higher GPU usage while the code was running. What do you think?
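
For reference, a minimal sanity check along these lines (assuming a single CUDA device at index 0) is how one can confirm that PyTorch actually sees and uses the GPU:

```python
import torch

print(torch.__version__)
print(torch.cuda.is_available())           # should print True if CUDA is reachable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # device name should mention the 1070 Ti
    x = torch.randn(1000, 1000, device="cuda")
    print((x @ x).sum().item())            # forces an actual kernel launch on the GPU
```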
Thanks again for your interesting and useful contribution.
$ python3 main.py --experiment selfplay
Generating examples and training...
Testing...
Results:
Policy: 14.852219307851573
Policy+MCTS: 6.079631066890421
MCTS: 5.835310067268816
Greedy: 5.188520984973079
Exact: 4.436138053603336
Testing...
Results:
Policy: 15.904635022762267
Policy+MCTS: 6.510548698490385
MCTS: 6.243669944175507
Greedy: 5.4000203183930315
Exact: 4.616644095916955
Testing...
Results:
Policy: 15.679962777211916
Policy+MCTS: 6.082748741008646
MCTS: 6.199633294276332
Greedy: 5.606375414084934
Exact: 4.571680987924188