gptzeror: Identify Text Written by Large Language Models using GPTZero

gptzeror provides an R interface to the GPTZero API. GPTZero predicts whether text was generated by "AI" such as ChatGPT. It splits documents into paragraphs and sentences, which lets it detect text that is partially written by "AI" and partially by humans.

Installation

You can install the development version of gptzeror from GitHub with:

# install.packages('remotes')
remotes::install_github('christopherkenny/gptzeror')

Example

Below is an example using the abstract of Kenny, McCartan, Simko, Kuriwaki, and Imai (2023).

abstr <- 'Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the U.S. House under the enacted plan to those under a set of alternative simulated plans that serve as a non-partisan baseline. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the electoral bias it creates cancels at the national level, giving Republicans two additional seats on average. Geography and redistricting rules separately contribute a moderate pro-Republican bias. Finally, we find that partisan gerrymandering reduces electoral competition and makes the partisan composition of the U.S. House less responsive to shifts in the national vote.'

We can pass text directly via gptzero_predict_text().

library(gptzeror)
gptzero_predict_text(abstr)
#> # A tibble: 5 × 10
#>   doc_average_generated_prob doc_completely_generated_p…¹ doc_overall_burstiness
#>                        <dbl>                        <dbl>                  <dbl>
#> 1                        0.2                      0.00228                   101.
#> 2                        0.2                      0.00228                   101.
#> 3                        0.2                      0.00228                   101.
#> 4                        0.2                      0.00228                   101.
#> 5                        0.2                      0.00228                   101.
#> # ℹ abbreviated name: ¹doc_completely_generated_prob
#> # ℹ 7 more variables: par_completely_generated_prob <dbl>,
#> #   par_num_sentences <int>, par_start_sentence_index <int>,
#> #   sentence_index <int>, generated_prob <int>, perplexity <int>,
#> #   sentence <chr>
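The result has one row per sentence, with document-level and paragraph-level columns repeated on every row. To focus on the sentence-level scores, a small post-processing step can help. The sketch below assumes the returned object is a data frame (or tibble) with the column names printed above. `flag_generated_sentences()` is a hypothetical helper, not part of gptzeror, and the default `threshold` of 0.5 is an arbitrary illustrative cutoff rather than a GPTZero recommendation.

```r
# Hypothetical helper (not part of gptzeror): keep the sentence-level
# columns from a gptzero_predict_*() result and flag sentences whose
# generated_prob is at or above a user-chosen threshold.
flag_generated_sentences <- function(res, threshold = 0.5) {
  # Sentence-level columns, as named in the printed output above
  out <- res[, c("sentence_index", "sentence", "generated_prob", "perplexity")]
  # Logical flag; 0.5 is an arbitrary illustrative default
  out$flagged <- out$generated_prob >= threshold
  # Return rows in their original sentence order
  out[order(out$sentence_index), , drop = FALSE]
}
```

For example, `flag_generated_sentences(gptzero_predict_text(abstr))` should return one row per sentence of the abstract. Note that `generated_prob` prints as an integer column above, so in practice the threshold may behave like a 0/1 check.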

The API also accepts common file types as uploads, including .txt, .docx, and .pdf. To access this endpoint, use gptzero_predict_file().

temp_file <- tempfile(fileext = '.txt')
cat(abstr, file = temp_file)

gptzero_predict_file(temp_file)
#> # A tibble: 5 × 10
#>   doc_average_generated_prob doc_completely_generated_p…¹ doc_overall_burstiness
#>                        <dbl>                        <dbl>                  <dbl>
#> 1                        0.2                      0.00228                   101.
#> 2                        0.2                      0.00228                   101.
#> 3                        0.2                      0.00228                   101.
#> 4                        0.2                      0.00228                   101.
#> 5                        0.2                      0.00228                   101.
#> # ℹ abbreviated name: ¹doc_completely_generated_prob
#> # ℹ 7 more variables: par_completely_generated_prob <dbl>,
#> #   par_num_sentences <int>, par_start_sentence_index <int>,
#> #   sentence_index <int>, generated_prob <int>, perplexity <int>,
#> #   sentence <chr>
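When scoring several files, it can be convenient to stack the results and keep track of which file each row came from. The following is a minimal sketch, assuming `gptzero_predict_file()` returns a data frame like the one printed above. `predict_files()` is a hypothetical helper, not part of gptzeror. Each file is sent as a separate API request, and the scoring function is a parameter so it can be swapped out (for example, for testing without network access).

```r
# Hypothetical helper (not part of gptzeror): score several files one at a
# time and row-bind the results, adding a `file` column so each row can be
# traced back to its source document.
predict_files <- function(paths, predict_fun = gptzeror::gptzero_predict_file) {
  results <- lapply(paths, function(path) {
    res <- predict_fun(path)  # one API call per file
    # Record the source file on every row of this result
    res$file <- rep(path, nrow(res))
    res
  })
  # Stack the per-file data frames into a single one
  do.call(rbind, results)
}
```

For example, `predict_files(c(temp_file, another_file))` would combine the results for both uploads, where `another_file` stands for any other supported file.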

Additional Information

Documentation for the GPTZero API is available from GPTZero.