The conventional answer #1 is correct here; the arguments in the contradicting answer #2 do not actually hold.
When having such doubts, it is useful to imagine that you simply do not have any access in any test set during the model fitting process (which includes feature importance); you should treat the test set as literally unseen data (and, since unseen, they could not have been used for feature importance scores).
Hastie & Tibshirani have clearly argued long ago about the correct & wrong way to perform such processes; I have summarized the issue in a blog post, How NOT to perform feature selection! - and although the discussion is about cross-validation, it can be easily seen that the arguments hold for the case of train/test split, too.
The only argument that actually holds in your contradicting answer #2 is that
the overall historical data is not analyzed
Nevertheless, this is the necessary price to pay in order to have an independent test set for performance assessment, otherwise, with the same logic, we should use the test set for training, too, shouldn't we?
Wrap up: the test set is there solely for performance assessment of your model, and it should not be used in any stage of model building, including feature selection.
UPDATE (after comments):
the trends in the Test Set may be different
A standard (but often implicit) assumption here is that the training & test sets are qualitatively similar; it is exactly due to this assumption that we feel OK to just use simple random splits to get them. If we have reasons to believe that our data change in significant ways (not only between train & test, but during model deployment, too), the whole rationale breaks down, and completely different approaches are required.
Also, on doing so, there can be a high probability of Over-fitting
The only certain way of overfitting is to use the test set in any way during the pipeline (including for feature selection, as you suggest). Arguably, the linked blog post has enough arguments (including quotes & links) to be convincing. Classic example, the testimony in The Dangers of Overfitting or How to Drop 50 spots in 1 minute:
as the competition went on, I began to use much more feature selection and preprocessing. However, I made the classic mistake in my cross-validation method by not including this in the cross-validation folds (for more on this mistake, see this short description or section 7.10.2 in The Elements of Statistical Learning). This lead to increasingly optimistic cross-validation estimates.
As I have already said, although the discussion here is about cross-validation, it should not be difficult to convince yourself that it perfectly applies to the train/test case, too.
feature selection should be done in such a way that Model Performance is enhanced
Well, nobody can argue with this, of course! The catch is - which exact performance are we talking about? Because the Kaggler quoted above was indeed getting better "performance" as he was going along (applying a mistaken procedure), until his model was faced with real unseen data (the moment of truth!), and it unsurprisingly flopped.
Admittedly, this is not trivial stuff, and it may take some time until you internalize them (it's no coincidence that, as Hastie & Tibshirani demonstrate, there are even research papers where the procedure is performed wrongly). Until then, my advice to keep you safe, is: during all stages of model building (including feature selection), pretend that you don't have access to the test set at all, and that it becomes available only when you need to assess the performance of your final model.