Importing the data.table Development Version Hosted on GitHub into an R Package for Seamless Function Loading

Introduction

The data.table package is a popular and powerful data manipulation library for R. Its development version, hosted on GitHub, can be awkward to use as a dependency of your own R package. This article walks through the steps needed to make a package depend on the latest development version of data.table.

The Problem

The user in question has updated their data.table package using data.table::update_dev_pkg(), and now wants to use its new env argument within a function defined in their R-package. However, when they try to load the function using devtools::load_all(), they receive an error message.
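Before changing the package itself, it can help to confirm that the installed data.table actually exposes the env argument. The following is a minimal sketch using base R only; the helper name has_env_argument is hypothetical and not part of data.table.

```r
# Hypothetical helper: report whether the installed data.table exposes
# an `env` argument in its `[` method.
# Returns FALSE when data.table is not installed at all.
has_env_argument <- function() {
  if (!requireNamespace("data.table", quietly = TRUE)) {
    return(FALSE)
  }
  # Look up the method directly in the data.table namespace.
  subset_method <- get("[.data.table", envir = asNamespace("data.table"))
  "env" %in% names(formals(subset_method))
}

has_env_argument()
```

If this returns FALSE, the installed version predates the env argument, and no change to the package's own files will make the function work until data.table is updated.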

The user has already taken several steps to address this issue, including:

  • Updating data.table with the latest development version
  • Adding Remotes: github::Rdatatable/data.table to their DESCRIPTION file
  • Including additional repositories in their DESCRIPTION file

Despite these efforts, the function still fails to load. The user provides the following minimal example:

# libraries (for interactive use only, do not deploy inside package)
data.table::update_dev_pkg()
library(purrr)
library(data.table)

# function (use this inside the package)
make_labels <-
    function(.dtt,.lbl){
        f <- 
            function(.dtt,clm,val,lbl){
                .dtt[
                    ,clm := as.character(clm)
                    ,env = list(clm=clm)
                ][
                     clm == val
                    ,clm := lbl
                    ,env = list(clm=clm,val=val,lbl=I(lbl))
                ]
            }
        purrr::pwalk(.lbl,f,.dtt)
    }

#sample data
dtt <-
    data.table::data.table(
         v1 = rep(1:2,5)
        ,v2 = rep(1:5,2)
    )        
lbl <-
    data.table::data.table(
         clm = c(rep("v1",2),rep("v2",5))
        ,val = c(1:2,1:5)
        ,lbl = letters[1:7]
    )

#deploy function
make_labels(dtt,lbl)

The snippet above is written for interactive use. The error appears once make_labels() is placed inside the package and loaded with devtools::load_all().

The Solution

After further investigation and reading related vignettes and Stack Overflow posts, we can identify two key issues that led to the problem:

  • Missing data.table import in NAMESPACE (generated via roxygen2)
  • Incorrect Remotes field in the DESCRIPTION file

To resolve these issues, we need to make the following adjustments:

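1. Add Remotes: Rdatatable/data.table to the DESCRIPTION file, and make sure data.table is also listed under Imports. Remotes only records where a dependency is installed from; it does not declare the dependency itself.

Imports:
    data.table
Remotes: Rdatatable/data.table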

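With this field in place, devtools and remotes install data.table from its GitHub repository rather than from CRAN whenever your package's dependencies are installed or updated, for example via devtools::install_deps().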
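Because Remotes is only honored for packages that are also declared as dependencies, it can be worth reading both fields back. The sketch below uses base R's read.dcf() on a throwaway DESCRIPTION written to a temporary file; the package name mypkg and the version are placeholders, not taken from the original question.

```r
# Hypothetical DESCRIPTION contents; "mypkg" and the version are placeholders.
description_lines <- c(
  "Package: mypkg",
  "Version: 0.0.1",
  "Imports: data.table",
  "Remotes: Rdatatable/data.table"
)

# Write them to a temporary file so the check does not touch a real package.
description_path <- tempfile("DESCRIPTION")
writeLines(description_lines, description_path)

# read.dcf() returns a character matrix with one column per requested field.
fields <- read.dcf(description_path, fields = c("Imports", "Remotes"))

# data.table must be a declared dependency AND have a remote location.
imports_data_table <- grepl("data.table", fields[1, "Imports"], fixed = TRUE)
remote_declared <- grepl("Rdatatable/data.table", fields[1, "Remotes"], fixed = TRUE)
```

For a real package, pass the path of its own DESCRIPTION file instead, then run devtools::install_deps() from the package root so that data.table is installed from the repository named in Remotes.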

2. Add #' @import data.table to the roxygen block above your function and run devtools::document().

#' Make Labels (Function)
#'
#' Replaces numeric codes in a data table (.dtt) with character labels taken from a definition table (.lbl).
#'
#' @param .dtt Data table object.
#' @param .lbl Definition file object.
#'
#' @return Modified data table object
#' @import data.table
make_labels <-
    function(.dtt,.lbl){
        # Function implementation...
    }

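Running devtools::document() then writes import(data.table) to your package's NAMESPACE file. This matters because data.table only enables its own syntax, such as := and the env argument, inside packages that import it. Without the import, [ falls back to data.frame behavior and the function fails.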
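One way to confirm that devtools::document() produced the expected entry is to look for import(data.table) in NAMESPACE. The sketch below is an assumption-laden illustration: it writes hypothetical NAMESPACE contents to a temporary file instead of reading a real package, and relies only on base R.

```r
# Hypothetical NAMESPACE contents, similar to what roxygen2 might generate.
namespace_lines <- c(
  "# Generated by roxygen2: do not edit by hand",
  "",
  "import(data.table)"
)
namespace_path <- tempfile("NAMESPACE")
writeLines(namespace_lines, namespace_path)

# NAMESPACE directives use R syntax, so parse() can read them as calls.
directives <- parse(file = namespace_path, keep.source = FALSE)

# TRUE when one of the directives is exactly import(data.table).
has_import_directive <- any(vapply(
  directives,
  function(directive) identical(directive, quote(import(data.table))),
  logical(1)
))
```

In a real package, replace namespace_path with the path to the package's own NAMESPACE file. If has_import_directive is FALSE after running devtools::document(), the roxygen tag was probably not picked up.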

Conclusion

In this article, we explored the challenges of importing the latest data.table development version into an R package. By declaring the GitHub source under Remotes in the DESCRIPTION file and importing data.table into NAMESPACE via roxygen, functions that rely on development-version features can be loaded successfully with devtools::load_all().


Last modified on 2023-07-11