Excluding Folders from Downloaded R Packages on GitHub
As an R developer, you’re likely familiar with hosting your packages on GitHub and using devtools::install_github to install them. However, sometimes you may need to exclude certain folders from being downloaded as part of the package. In this article, we’ll explore how to achieve this using various methods.
Background
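When you use devtools::install_github, it downloads an archive of the requested branch (by default the repository’s default branch) from GitHub as a zipball or tarball, depending on the version you have installed. That archive includes every file and subfolder in your repository, not just the ones R needs to build the package. If the repository holds many files or large subfolders that are not part of the package itself, this can lead to slow download times.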
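Before excluding anything, it can help to see which top-level folders actually account for the size of the repository. The following is a minimal sketch, not part of devtools: the helper name top_level_sizes and the throwaway demo repository are made up for illustration, and it sums uncompressed file sizes, so it only approximates what ends up in the compressed archive.

```shell
#!/bin/sh
set -e

# Hypothetical helper: sum file sizes (bytes) per top-level entry for HEAD
# of the repository in $1, largest first. Uses `git ls-tree -r -l`, whose
# lines look like "<mode> <type> <object> <size><TAB><path>".
top_level_sizes() {
  git -C "$1" ls-tree -r -l HEAD |
    awk -F '\t' '{ split($1, meta, " "); split($2, parts, "/")
                   total[parts[1]] += meta[4] }
                 END { for (name in total) printf "%d\t%s\n", total[name], name }' |
    sort -rn
}

# Throwaway repository with made-up contents, only to exercise the helper.
size_demo_repo=$(mktemp -d)
git -C "$size_demo_repo" init -q
mkdir -p "$size_demo_repo/R" "$size_demo_repo/data"
printf 'f <- 1\n' > "$size_demo_repo/R/f.R"              # 7 bytes
printf '0123456789' > "$size_demo_repo/data/a.txt"       # 10 bytes
printf '0123456789' > "$size_demo_repo/data/b.txt"       # 10 bytes
git -C "$size_demo_repo" add .
git -C "$size_demo_repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"

sizes=$(top_level_sizes "$size_demo_repo")
printf '%s\n' "$sizes"
```

In a real package you would call top_level_sizes on your own checkout; folders near the top of the list that are not needed to install the package are the natural candidates for exclusion.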
One way to exclude folders from being downloaded is by using .gitattributes files in your repository. This method allows you to specify which folders should be excluded from being included in the tarball when someone downloads a copy of your repository.
Using .gitattributes
The simplest way to exclude folders from being downloaded is by adding a .gitattributes file to your repository, listing the folders that you don’t want to be downloaded. Each folder should have the attribute “export-ignore” specified, as shown in the following example:
foldertoexclude export-ignore
Put this line in the .gitattributes file at the root of your repository; the path is interpreted relative to the location of that file. For example, if you have a folder called data that contains scripts and objects that aren’t meant to be part of the downloadable package, you would add the line data export-ignore to that file.
Example .gitattributes File
Here’s an example of what your .gitattributes file might look like:
foldertoexclude export-ignore
data export-ignore
docsite export-ignore
src export-ignore
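In this example, the foldertoexclude/ directory and its contents are left out of the archive, as are the data/, docsite/, and src/ directories. Keep in mind that in an R package src/ normally holds the compiled code, so list it only if the package does not need it to install. The .gitattributes file must be committed and pushed for the rule to take effect, because the attributes are read from the tree being archived.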
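You can check the effect locally before pushing, because git archive applies the same export-ignore attribute. The following is a minimal sketch under stated assumptions: the helper name list_archive_paths and the demo repository layout are made up for illustration, and only a single excluded folder is shown.

```shell
#!/bin/sh
set -e

# Hypothetical helper: print the paths that `git archive` would export
# for HEAD of the repository in $1.
list_archive_paths() {
  git -C "$1" archive --format=tar HEAD | tar -tf -
}

# Throwaway repository with one package file and one folder to exclude.
demo_repo=$(mktemp -d)
git -C "$demo_repo" init -q
mkdir -p "$demo_repo/R" "$demo_repo/foldertoexclude"
echo 'f <- function() 1' > "$demo_repo/R/f.R"
echo 'scratch' > "$demo_repo/foldertoexclude/notes.txt"
echo 'foldertoexclude export-ignore' > "$demo_repo/.gitattributes"

# Attributes are read from the committed tree, so commit before archiving.
git -C "$demo_repo" add .
git -C "$demo_repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"

archive_paths=$(list_archive_paths "$demo_repo")
printf '%s\n' "$archive_paths"
```

The listing should contain R/f.R but nothing under foldertoexclude, while the folder itself is still present in the working tree and in the commit.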
Warning
Using .gitattributes to exclude folders affects every archive generated from your repository, not only the one used for installation. If someone downloads a tarball or zip of your repository in order to modify the project rather than install the package, their copy won’t include the excluded folders. A git clone is not affected, since export-ignore applies only to archives, so contributors who need the full project should clone the repository instead.
Alternative Method: Using pkgdown
If the folder you want to keep out of the download is the documentation site generated by pkgdown, another option is to build the site outside the package repository, for example in a separate repository that exists solely for the website. pkgdown does not create that repository for you; you simply tell it to write its output somewhere other than the package’s own docs/ folder.
For example, older versions of pkgdown let you pass the output location directly:
build_site(path = "../docsite/docs")
This writes the generated site to ../docsite/docs, a directory next to the package rather than inside it. Because the site files are never committed to the package repository, they are not part of the archive that install_github downloads.
Editing Site Configuration YAML
In more recent versions of pkgdown, build_site no longer has a path parameter. Instead, the output location is set in the site configuration file, _pkgdown.yml, using the destination field.
For example, to keep writing the site to a directory outside the package, you would add the following line to your _pkgdown.yml file:
destination: ../docsite/docs
In this case, the site is built in ../docsite/docs instead of the package’s docs/ folder, so it is never included in the package tarball.
Conclusion
You can keep folders out of the archive that install_github downloads in two ways. Adding a .gitattributes file with export-ignore entries removes specific folders from every tarball or zip generated from the repository, and pointing pkgdown at a destination outside the package keeps the documentation site out of the repository altogether.
The .gitattributes approach is simple and works for any folder, with the limitation that the excluded folders are also missing for anyone who downloads an archive instead of cloning. The pkgdown approach only addresses the generated website, but it leaves archives of the package repository complete.
Last modified on 2024-01-27