Fig. 2 | Environmental Evidence

Fig. 2

From: Testing the utility of GPT for title and abstract screening in environmental systematic evidence synthesis

Performance of GPT screening as a function of the relevance probability: the number of records screened out by GPT, and the false positive (Type I) and false negative (Type II) errors made. Rows show results for the three versions of the GPT model API used: top row, GPT3.5 as of 1st March 2023; second row, the same model as of 13th June; third row, GPT4 as of 6th Nov 2023. The left-hand panels (a, c, e) show results benchmarked against the title and abstract screening conducted by humans; the right-hand panels (b, d, f) show results compared with the final set of records included in the review after full-text screening by humans. Numbers are shown above the dark blue Type II error lines for clarity
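
To make the quantities plotted in the figure concrete, the following is a minimal sketch of how they could be tallied at one relevance-probability threshold. It is not the authors' code. It assumes that a record is screened out when its GPT relevance probability falls below the threshold, that a Type I error is a record GPT keeps but humans excluded, and that a Type II error is a record GPT screens out but humans included. The function name `screening_errors` and the example values are hypothetical.

```python
from typing import List, Tuple


def screening_errors(
    probs: List[float],
    human_include: List[bool],
    threshold: float,
) -> Tuple[int, int, int]:
    """Return (screened_out, type_i, type_ii) at one threshold.

    Assumptions (not taken from the article):
    - a record is screened out when its probability is below the threshold;
    - Type I  = kept by GPT, excluded by humans (false positive);
    - Type II = screened out by GPT, included by humans (false negative).
    """
    screened_out = 0
    type_i = 0
    type_ii = 0
    for prob, included in zip(probs, human_include):
        kept_by_gpt = prob >= threshold
        if not kept_by_gpt:
            screened_out += 1
            if included:
                type_ii += 1
        elif not included:
            type_i += 1
    return screened_out, type_i, type_ii


# Made-up example values, for illustration only.
example_probs = [0.1, 0.4, 0.6, 0.9]
example_human = [False, True, False, True]

for t in (0.0, 0.5, 1.0):
    print(t, screening_errors(example_probs, example_human, t))
```

Sweeping the threshold over a grid of values and plotting the three counts gives curves of the kind described in the caption. Swapping the human benchmark between the title and abstract decisions and the final full-text inclusions corresponds to the left-hand and right-hand panels.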
