Fig. 2

From: Informing the development of an outcome set and banks of items to measure mobility among individuals with acquired brain injury using natural language processing

The iterative improvement process for the preliminary item bank. The process began with an initial Sentence-BERT model and relied heavily on the ICF ontology to produce a good-enough first clustering. At each step, a grid search was run over a wide range of hyperparameter values, and the best clustering was retained according to automatic heuristics and human evaluation. After each clustering, expert annotations were collected to improve the Sentence-BERT model and yield better clusterings. We report the F1 score of each clustering with respect to the first and second expert annotations, named E_1 and E_2 respectively. Here, E_2 is the more reliable metric, as it associates items with adequate labels, while E_1 associates item pairs with whether or not they belong together. By nature, E_1 penalizes a large number of clusters, as can be seen in the third clustering's score. Note also that neither E_1 nor E_2 is an exact metric: for instance, the third clustering still required heavy fine-tuning by experts to yield a satisfying Core Outcome Set despite its near-perfect E_2 score.
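
To make the E_1 behaviour concrete, the following is a minimal sketch of one plausible way to score a clustering against pairwise annotations. The caption does not specify how the F1 score was computed, so the function `pairwise_f1`, the item names, and the annotation format are all illustrative assumptions rather than the authors' method. It treats "belong together" as the positive class and shows why splitting items into more clusters lowers the score.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format: (item_a, item_b, experts_say_they_belong_together)
PairAnnotation = Tuple[str, str, bool]


def pairwise_f1(clusters: Dict[str, int], pairs: List[PairAnnotation]) -> float:
    """F1 of a clustering against pairwise annotations (assumed E_1-style).

    A pair is predicted positive when both items share a cluster id.
    """
    tp = fp = fn = 0
    for a, b, together in pairs:
        predicted = clusters[a] == clusters[b]
        if predicted and together:
            tp += 1
        elif predicted and not together:
            fp += 1
        elif together:
            fn += 1
    denominator = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Made-up items and annotations, for illustration only.
pairs = [
    ("walk", "run", True),
    ("walk", "climb", True),
    ("run", "climb", True),
    ("walk", "sit", False),
]

coarse = {"walk": 0, "run": 0, "climb": 0, "sit": 1}  # two clusters
fine = {"walk": 0, "run": 0, "climb": 1, "sit": 2}    # three clusters

print(pairwise_f1(coarse, pairs))  # 1.0
print(pairwise_f1(fine, pairs))    # 0.5
```

In this toy case the finer clustering separates "climb" from items the annotations pair it with, so two positive pairs become false negatives and recall drops. This is the same mechanism by which a pairwise metric penalizes a large number of clusters, even when the finer clusters may be individually sensible.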