Where to Submit? Helping Researchers to Choose the Right Venue

Whenever researchers write a paper, the same question arises: “Where to submit?” In this work, we introduce WTS, an open and interpretable NLP system that recommends conferences and journals to researchers based on the title, abstract, and/or keywords of a given paper. We adapt the TextCNN architecture and automatically analyze its predictions using the Integrated Gradients method to highlight words and phrases that led to the recommendation of a scientific venue. We train and test our method on publications from the fields of artificial intelligence (AI) and medicine, both derived from the Semantic Scholar dataset. WTS achieves an Accuracy@5 of approximately 83% for AI papers and 95% in the field of medicine. It is open source and available for testing at https://wheretosubmit.ml.


Introduction
When choosing a scientific conference or journal (in the following called venue) to submit a manuscript to, researchers consider several factors. While factors such as the venue's impact, timing, or location are important, the main factor is the manuscript's thematic fit to the venue. This fit can be assessed by inspecting the Call for Papers (e.g., this paper fits EMNLP as it is an "interesting application nugget") or by analyzing papers previously published at the given venue. Given the growing number of conferences (e.g., the exponential growth of computer science publications indicates more and/or larger conferences, as visualized at https://dblp.uni-trier.de/statistics/recordsindblp), the second approach has become harder than ever, especially for novice researchers, or even for senior researchers wanting to publish in a new domain. Finding a thematically fitting venue for a manuscript is therefore a time-consuming task. Our source code is available at https://github.com/konstantinkobs/wts.

Figure 1: Overview of WTS. Title, abstract, and keywords are processed separately by a convolution layer and max-over-time pooling. The output vectors are concatenated and fed through two fully connected layers that predict fitting venues. For the top five entries, we calculate the important words and phrases using the Integrated Gradients method (Sundararajan et al., 2017).
In this work, we simplify this process by introducing "Where to Submit" (WTS), an NLP system based on a convolutional neural network that recommends academic venues given the title, abstract, and/or keywords of a planned publication. The system is trained on previously published manuscripts. To make the system's choice of a specific venue understandable, WTS identifies the words and phrases that had the highest impact on a recommendation. A researcher can then use the list of recommended venues as a starting point to find the best-fitting venue based on other factors such as the Call for Papers, rank, or deadline. Our main contributions are:
(1) We describe an effective method that recommends scientific venues based on a paper's title, abstract, and keywords.
(2) We incorporate an Explainable AI method into our system to give feedback on why a certain conference or journal was recommended.
(3) We evaluate our approach on two datasets from two different research areas to show its general applicability.
(4) We make WTS available as a web service for everyone to use.

Related Work
Several online services recommend venues based on the contents of a publication, but all of them fall short in some way:
1. Most of them only recommend journals, not conferences (e.g., Elsevier; Journal Guide; Springer; Wiley; Enago; Edanz; Manuscript Matcher; Journal/Author Name Estimator; SJFinder). Especially in the fields of Computer Science and AI, most work is published at conferences (Vrettas and Sanderson, 2015), making this a severe drawback for AI researchers.
2. Most of the services are commercially motivated (Elsevier; IEEE; Springer; Wiley; Enago; Edanz; Manuscript Matcher). Publishers and companies provide them to promote their own portfolio or other services. Thus, they diminish the variety of the recommendations by only considering their own journals.
3. Many of the services are black boxes that provide no information on how they compute their recommendations. There are a few exceptions: Journal/Author Name Estimator uses the open-source search engine Lucene to find the 50 most similar papers according to the Lucene index and recommends the journals that occur most often in this set (Schuemie and Kors, 2008). Elsevier extracts noun phrases from the paper and matches them against a database using the Okapi BM25 algorithm (Kang et al., 2015; Robertson, 1990).
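As a rough illustration of the kind of matching such services perform (not Elsevier's actual implementation), the core of Okapi BM25 scoring over a small document collection can be sketched as follows; the toy corpus, query, and parameter values (k1 = 1.5, b = 0.75) are illustrative assumptions:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "neural venue recommendation with convolutional networks".split(),
    "clinical trials in cardiology".split(),
]
print(bm25_scores("venue recommendation".split(), docs))  # first doc scores higher
```

Documents sharing query terms receive higher scores, so the first (venue-related) document outranks the second for this query.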
4. None of the provided services explain why a specific venue was chosen.
Only very recently, recommending conferences based on authors, abstracts, and keywords has become a research area of its own (Iana et al., 2019). However, the authors address a more general setting that includes conferences from a wide variety of fields; they also incorporate author information into the recommendation and do not explain to the user why a given conference was recommended. With WTS, we introduce an open and explainable system that recommends both journals and conferences.

Task and Methodology
Now we describe the task, our proposed method WTS, and the baselines we compare it to.

Task Definition
Given a title, abstract, and keywords of a publication, we aim to predict the venue where the paper was published. We interpret the classification task as a ranking task by ordering the potential venues according to their score in the model output and use metrics that assess the ranking performance.

Approaches
In the following, we describe the applied baseline methods as well as our own approach.
Random Baseline The simplest baseline is to always predict venues in a uniformly random order. As this yields variance in prediction quality, we report the expected value of each metric.

Majority Baseline
The majority baseline orders venues by the number of publications in the training set in descending order. Assuming stratified sampling, common venues are ranked higher.
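In code, this baseline reduces to sorting venues by their training-set frequency; the venue names below are illustrative:

```python
from collections import Counter

# venue label of each training publication (illustrative data)
train_venues = ["NeurIPS", "ACL", "NeurIPS", "EMNLP", "NeurIPS", "ACL"]

counts = Counter(train_venues)
# every test paper receives the same ranking: most common venues first
majority_ranking = [venue for venue, _ in counts.most_common()]
print(majority_ranking)  # NeurIPS first, then ACL, then EMNLP
```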
Logistic Regression Baseline For this baseline, we tokenize the title, abstract, and keywords. From all tokens, we create a term frequency vector and train a multi-class logistic regression. The venues are then sorted in decreasing order based on the model output. For implementation, we use sklearn's methods for vectorization and logistic regression (Pedregosa et al., 2011).
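A minimal sketch of this baseline using scikit-learn, as named above; the toy papers, venue labels, and query are illustrative assumptions, not the paper's data:

```python
# Term-frequency vectors over title + abstract + keywords,
# then a multi-class logistic regression ranking venues by probability.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

papers = [
    "attention mechanisms for neural machine translation",
    "convolutional networks for image classification",
    "randomized trials of a new cardiology drug",
]
venues = ["ACL", "CVPR", "NEJM"]

vectorizer = CountVectorizer()  # raw term frequencies
X = vectorizer.fit_transform(papers)
clf = LogisticRegression(max_iter=1000).fit(X, venues)

# rank venues in decreasing order of predicted probability for a new manuscript
x_new = vectorizer.transform(["neural machine translation with attention"])
probs = clf.predict_proba(x_new)[0]
ranking = [v for _, v in sorted(zip(probs, clf.classes_), reverse=True)]
print(ranking)  # the NLP venue should be ranked first
```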

Iana et al.
We also compare our method to one of the approaches presented by Iana et al. (2019). For a fair comparison, we use their best-performing approach (according to Recall@10) that does not incorporate any third-party information (called "Ensemble TF-IDF & word2vec plus CNN (10)"). A logistic regression combines two classifiers: (1) concatenating all abstracts of a venue, creating one TF-IDF representation per venue, and ranking venues by the distance to the provided abstract's representation, and (2) classifying abstracts using TextCNN, a convolutional neural network (CNN) for text classification (Kim, 2014).

WTS
Our method adapts the TextCNN architecture and is implemented in PyTorch (Paszke et al., 2019). Our network's structure is shown in Figure 1. In contrast to Iana et al., we also provide the network with title and keyword information. We lowercase and embed each word in the title, abstract, and keywords using Word2Vec (Mikolov et al., 2013), trained on the abstracts and titles of the respective dataset. This creates three two-dimensional inputs for the model. Each input is then processed by a convolution layer with potentially multiple filter sizes and max-over-time pooling, which maps the processed inputs to a fixed size. The resulting vectors are concatenated and fed through two feed-forward layers that map to a vector representing the venues. Training with categorical cross entropy leads to higher outputs for more likely venues. Dropout (Srivastava et al., 2014) and batch normalization (Ioffe and Szegedy, 2015) are used for regularization.
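To illustrate the key property of the convolution plus max-over-time pooling step, namely that variable-length text is mapped to a fixed-size vector, here is a minimal NumPy sketch; the random filters and dimensions are illustrative, not the trained model:

```python
import numpy as np

def conv_max_pool(embedded, filters):
    """embedded: (seq_len, emb_dim) word embeddings; filters: list of
    (width, emb_dim) kernels. Returns one max-pooled activation per filter."""
    pooled = []
    for f in filters:
        width = f.shape[0]
        # valid 1D convolution over the time (token) axis
        acts = [np.sum(embedded[t:t + width] * f)
                for t in range(embedded.shape[0] - width + 1)]
        pooled.append(max(acts))  # max-over-time pooling
    return np.array(pooled)

rng = np.random.default_rng(0)
emb_dim = 8
filters = [rng.standard_normal((w, emb_dim)) for w in (2, 3, 4)]

short_text = rng.standard_normal((5, emb_dim))   # 5 tokens
long_text = rng.standard_normal((40, emb_dim))   # 40 tokens

v1 = conv_max_pool(short_text, filters)
v2 = conv_max_pool(long_text, filters)
print(v1.shape, v2.shape)  # both (3,): fixed size regardless of input length
```

Because each filter contributes exactly one pooled value, titles, abstracts, and keyword lists of any length produce vectors of the same size, which can then be concatenated and classified.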

Datasets
For training and testing, we extract all publications from the Semantic Scholar dataset (Ammar et al., 2018) 3 that were published in the research fields artificial intelligence (AI) and medicine. 4 A publication is considered to be an AI paper if it was published in one of the scientific venues given in (Kersting et al., 2019). We manually match them as closely as possible to the Semantic Scholar venues. This procedure leads to 77 distinct venues. We also add a class called "non-AI" consisting of 20 000 publications from other fields, to let the model learn the difference between AI and non-AI venues, resulting in 78 classes for this dataset.
In the field of medicine, we only use publications from Semantic Scholar that originate from Medline, a medical publication database. Due to a high number of venues with few publications, we only consider the top 78 venues (the same number as for the AI dataset, making the performance metrics comparable), which account for about 10 % of the publications.
In general, we only keep publications where no input information is missing. Table 1 gives an overview of both datasets. From both datasets we randomly sample 80 % as training, 10 % as validation, and 10 % as test sets in a stratified manner. While this sampling strategy might favor larger conferences, we argue that it better reflects the venue landscape: larger conferences usually cover a larger thematic scope and accept more manuscripts.
3 Release from 2019-01-31.
4 Code to reproduce the data will be published.
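A pure-Python sketch of such a stratified 80/10/10 split (labels and sizes are illustrative; in practice library routines such as scikit-learn's `train_test_split` with `stratify` do the same):

```python
import random
from collections import defaultdict

def stratified_split(items, labels, seed=0):
    """Split into 80/10/10 train/val/test, preserving label proportions."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n = len(group)
        n_train, n_val = int(0.8 * n), int(0.1 * n)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

items = list(range(100))
labels = ["A"] * 60 + ["B"] * 40  # imbalanced venues
train, val, test = stratified_split(items, labels)
print(len(train), len(val), len(test))  # 80 10 10
```

Each venue contributes to every split in proportion to its size, so common venues remain common in the test set.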

Experimental Setup
Given the training, validation, and test splits of both datasets described in Section 4, we train all models on the training dataset and perform hyperparameter optimization on the validation data. We then report several metrics, detailed below, on the test dataset.
Hyperparameters For WTS, we perform random search (Bergstra and Bengio, 2012) on the validation data to select hyperparameters.
Metrics We report Accuracy@5, i.e., the fraction of test publications for which the correct venue appears among the top five predictions, and the mean reciprocal rank MRR = (1/N) Σ_i 1/rank_i, where rank_i is the position at which the target item is ranked by the model. Always predicting the correct item at the first position leads to an MRR of one, while poor models achieve an MRR closer to zero.

Results
Table 3 shows the results of all tested methods on the test sets of both datasets. On both datasets, WTS outperforms all baseline methods in all metrics. A Wilcoxon signed-rank test (Wilcoxon, 1992) at the 1 % significance level shows a significant difference between the rankings of correct venues produced by WTS and the logistic regression baseline on both datasets. Together with the better MRR, this shows the superiority of our method over the baselines. In approximately 83 % (AI) and 95 % (Medicine) of all cases, the correct venue is among the top five.
Interestingly, compared to the Medicine dataset, the method by Iana et al. performs poorly on AI publications. We suspect this is due to the smaller size of the AI dataset and the higher skew in publication counts per venue (cf. Table 1).
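The two ranking metrics can be sketched in a few lines; the example rankings and targets are illustrative:

```python
def accuracy_at_k(rankings, targets, k=5):
    """Fraction of cases where the target venue is among the top k."""
    hits = sum(t in r[:k] for r, t in zip(rankings, targets))
    return hits / len(targets)

def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the target venue (ranks start at 1)."""
    return sum(1 / (r.index(t) + 1)
               for r, t in zip(rankings, targets)) / len(targets)

# two toy predictions: target ranked 1st, then ranked 3rd
rankings = [["ACL", "EMNLP", "NAACL"], ["CVPR", "ICCV", "ACL"]]
targets = ["ACL", "ACL"]
print(accuracy_at_k(rankings, targets, k=5))    # 1.0
print(mean_reciprocal_rank(rankings, targets))  # (1 + 1/3) / 2 ≈ 0.667
```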

Explainability
As a key part of WTS is explainability, we do not only want to recommend venues to the user but also to explain why a certain venue was recommended. We use the Integrated Gradients method introduced by Sundararajan et al. (2017), as implemented in the PyTorch Captum library, to find the most influential words and phrases for the top five recommendations of the network. The method varies the input by linearly transitioning in 50 steps from the actual embedding inputs to matrices filled with zeros. Then, the gradients of the desired venue output with respect to the inputs are calculated at each of these steps. The gradients are averaged and multiplied by the initial input, assigning positive or negative values to words and phrases that had a positive or negative impact on the score of the desired venue.

Figure 2: Excerpt from a publication. WTS highlights the words leading to its prediction "NAACL".

Figure 2 visualizes an excerpt from WTS's output for the well-known BERT paper, which received the best paper award at NAACL 2019 (Devlin et al., 2019). WTS correctly ranks "NAACL" first. Integrated Gradients correctly identifies "Transformers" and "Language Understanding" as words that qualify this publication as an NLP paper.
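As a self-contained illustration of the step-wise approximation described above (applied to a toy function rather than the WTS model), the following computes Integrated Gradients for f(x, y) = x * y with an all-zeros baseline. The attributions sum to f(input) - f(baseline), the completeness property the method guarantees:

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.
    grad_fn(point) returns the gradient of the model output at `point`."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each interpolation step
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_fn(point)
        avg_grad = [avg_grad[i] + g[i] / steps for i in range(n)]
    # average gradient times (input - baseline) gives per-feature attributions
    return [avg_grad[i] * (x[i] - baseline[i]) for i in range(n)]

# toy "model": f(x, y) = x * y, with gradient (y, x)
grad_fn = lambda p: [p[1], p[0]]
attr = integrated_gradients(grad_fn, x=[3.0, 2.0], baseline=[0.0, 0.0])
print(attr)  # each feature receives half of f(3, 2) = 6
```

In WTS, the same procedure runs over the embedded title, abstract, and keywords, so each word receives an attribution for the recommended venue's score.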

Website
In order to make our system available to the public, we release WTS as a web service at https://wheretosubmit.ml, where researchers can input their AI paper's information and receive venue recommendations. The web service applies the trained CNN and the explainability method and shows the top five predicted venues for the given paper along with a color-coded explanation (cf. Section 7) and venue-related links. The Accuracy@5 results described in Section 6 indicate that most of the time, a fitting venue is displayed to the user.

Conclusion
We have presented WTS, an NLP system that recommends scientific venues based on the title, abstract, and/or keywords of a publication. WTS is designed to explain why a certain venue was recommended, making it the first interpretable and open recommendation service for both conferences and journals. We have shown that WTS provides strong recommendations for publications from the areas of AI and medicine.
Future work may address the evaluation: while each publication was published at only one specific venue, it might also be suitable for multiple other venues, implying that our current scores are merely lower bounds on the actual performance. The web service could further be improved by making the list of venues sortable by deadline, impact, or other configurable factors.