I'm uploading my assignment results. (For a fair comparison of the two loss functions, the architecture of the neural network was left unchanged.)
Compared with using KL divergence as the loss function, CrossEntropyLoss produced a smaller loss value from the very first epoch. Training for the same number of epochs, the loss at the last epoch was therefore also smaller with CrossEntropyLoss.
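A side note on comparing the two loss magnitudes directly: with one-hot targets, cross-entropy and KL divergence are mathematically identical up to the reduction (the entropy of a one-hot distribution is zero), so differences in the reported numbers can come from how each loss is averaged rather than from the model. A minimal sketch with toy data (not the assignment's actual network):

```python
import torch
import torch.nn.functional as F

# Hypothetical toy batch: 4 samples, 3 classes (not from the actual assignment).
torch.manual_seed(0)
logits = torch.randn(4, 3)           # raw network outputs
labels = torch.tensor([0, 2, 1, 0])  # class indices

# CrossEntropyLoss applies log_softmax internally and takes class indices.
ce = F.cross_entropy(logits, labels)

# KLDivLoss expects log-probabilities as input and probabilities as target.
log_probs = F.log_softmax(logits, dim=1)
one_hot = F.one_hot(labels, num_classes=3).float()
kl = F.kl_div(log_probs, one_hot, reduction="batchmean")

# With one-hot targets and reduction="batchmean" the two values agree.
# Note that KLDivLoss's default reduction is "mean", which divides by
# batch_size * num_classes and therefore reports a smaller number.
print(ce.item(), kl.item())
```

So before concluding one loss is "smaller", it is worth checking which `reduction` each loss was constructed with.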
After evaluating on the test set, the loss is slightly larger than on the training set (by about 0.003) and the accuracy is about 2% lower (compared with KL-div), so it seems the model is slightly (?) overfit to the training set (compared with KL-div). (Is this the right way to interpret the results, or should I consider it properly trained?) I'm going to try increasing the number of epochs a bit more, or changing the model architecture...
And I have a question. I'd like to inspect the weight matrix at the end of the optimization process for each batch, but I'm not sure how to do that. When I ran print(model.lin3), I couldn't see the actual matrix values.
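For reference, printing a layer only shows its configuration; the weight matrix itself lives in the layer's `.weight` parameter. A small sketch, assuming `model.lin3` is an `nn.Linear` (the sizes below are made up):

```python
import torch.nn as nn

# Hypothetical stand-in for the post's model; only `lin3` matters here.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin3 = nn.Linear(64, 10)

model = Net()

print(model.lin3)               # prints only the layer config, e.g. Linear(in_features=64, ...)
print(model.lin3.weight.shape)  # torch.Size([10, 64]) -- the actual weight matrix
print(model.lin3.weight.data)   # current values (these update after each optimizer step)
print(model.lin3.bias.data)

# Or list every parameter in the model at once:
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```

Printing `model.lin3.weight.data` right after `optimizer.step()` inside the training loop would show the matrix at the end of each batch's update.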
+
Comparing my code with the one above, everything was identical through the training stage. Since the model was not re-initialized after running the KL-div case, I cautiously suspect that the previously optimized weight matrix was carried over into the second run of the assignment. (The code is almost the same as mine, but the results differ, which is what made me think about this.)
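If that suspicion is right, rebuilding the model before each run avoids the carry-over. A minimal sketch (the `make_model` constructor and layer sizes are placeholders, not the assignment's actual network):

```python
import torch
import torch.nn as nn

def make_model():
    # Hypothetical constructor: rebuilding gives fresh, randomly initialized weights.
    return nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

torch.manual_seed(0)      # optional: fixing the seed makes both runs start identically
model_kl = make_model()   # ... train this one with KLDivLoss ...

torch.manual_seed(0)
model_ce = make_model()   # ... train this one with CrossEntropyLoss ...

# Freshly built models with the same seed start from identical weights,
# so neither run inherits the other's optimized parameters.
same_init = all(torch.equal(a, b)
                for a, b in zip(model_kl.parameters(), model_ce.parameters()))
print(same_init)
```

Reusing a single `model` object for both runs would instead continue optimizing from the KL-div result, which could explain the differing outcomes.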
Thank you for reading this long post!