dplyr-cli
uses the Rscript
executable to run dplyr commands on CSV
files in the terminal.
dplyr-cli
makes use of the terminal pipe |
instead of the magrittr
pipe (%>%
) to run sequences of commands.
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
#> | cyl| mpg|
#> |---:|--------:|
#> | 4| 26.66364|
#> | 6| 19.74286|
#> | 8| 15.10000|
I wanted to be able to do quick hacks on CSV files on the command line using dplyr syntax, but without actually starting a proper R session.
Any command of the form:
dplyr::verb(.data, code)
dplyr::*_join(.data, .rhs)
Currently two extra commands are supported which are not part of
dplyr
.
csv
performs no dplyr command, but only outputs the input data as CSV to stdoutkable
performs no dplyr command, but only outputs the input data as aknitr::kable()
formatted string to stdout
- Only tested under ‘bash’ on OSX. YMMV.
- Every command runs in a separate R session.
- When using special shell characters such as
()
, you’ll have to quote your code arguments. Some shells will require more quoting than others. - “joins” (such as
left_join
) do not currently let you specify theby
argument, so there must be columns in common to both dataset
dplyr --help
# dplyr-cli
#
# Usage:
# dplyr <command> [--file=fn] [--csv | -c] [--verbose | -v] [<code>...]
# dplyr -h | --help
#
# Options:
# -h --help show this help text
# -f FILE --file=FILE input CSV or RDS filename. If reading from stdin, assumes CSV [default: stdin]
# -c --csv write output to stdout in CSV format (instead of default RDS file)
# -v --verbose be verbose
- Initial release
- Switch to ‘Rscript’ for easier install for users
- rename ‘dplyr.sh’ to just ‘dplyr’
- Support for joins e.g.
left_join
- More robust tmpdir handling
- Fix handling for latest
read_csv()
. Fixes #9
- aborusso - documentation
Because this script straddles a great divide between R and the shell, you need to ensure both are set up correctly for this to work.
- Install R packages
- Clone this repo and put
dplyr
in your path
dplyr-cli
is run from the shell but at every invocation is starting a
new rsession where the following packages are expected to be installed:
install.packages('readr') # read in CSV data
install.packages('dplyr') # data manipulation
install.packages('docopt') # CLI description language
Click to reveal instructions for installing packages on the command line
To do it from the cli on a linux-ish system, install r-base
(sudo apt -y install r-base
) and then run
sudo su - -c "R -e \"install.packages('readr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('dplyr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('docopt', repos='http://cran.rstudio.com/')\""
You’ll then need to download the shell script from this repository and
put dplyr
somewhere in your path.
git clone https://github.com/coolbutuseless/dplyr-cli
cp dplyr-cli/dplyr ./somewhere/in/your/search/path
Put an example CSV file on the filesystem. Note: This CSV file is now
included as mtcars.csv
as part of this git repository, as is a second
CSV file for demonstrating joins - cyl.csv
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
# cat contents of input CSV into dplyr-cli.
# Use '-c' to output CSV if this is the final step
cat mtcars.csv | dplyr filter -c "mpg == 21"
# "mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
# 21,6,160,110,3.9,2.62,16.46,0,1,4,4
# 21,6,160,110,3.9,2.875,17.02,0,1,4,4
# Put quotes around any commands which contain special characters like <>()
cat mtcars.csv | dplyr filter -c "mpg < 11"
# "mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
# 10.4,8,472,205,2.93,5.25,17.98,0,0,3,4
# 10.4,8,460,215,3,5.424,17.82,0,0,3,4
# Combine dplyr commands with shell 'head' command
dplyr select --file mtcars.csv -c cyl | head -n 6
# "cyl"
# 6
# 6
# 4
# 6
# 8
cat mtcars.csv | \
dplyr mutate "cyl2 = 2 * cyl" | \
dplyr filter "cyl == 8" | \
dplyr kable
# | mpg| cyl| disp| hp| drat| wt| qsec| vs| am| gear| carb| cyl2|
# |----:|---:|-----:|---:|----:|-----:|-----:|--:|--:|----:|----:|----:|
# | 18.7| 8| 360.0| 175| 3.15| 3.440| 17.02| 0| 0| 3| 2| 16|
# | 14.3| 8| 360.0| 245| 3.21| 3.570| 15.84| 0| 0| 3| 4| 16|
# | 16.4| 8| 275.8| 180| 3.07| 4.070| 17.40| 0| 0| 3| 3| 16|
# | 17.3| 8| 275.8| 180| 3.07| 3.730| 17.60| 0| 0| 3| 3| 16|
# | 15.2| 8| 275.8| 180| 3.07| 3.780| 18.00| 0| 0| 3| 3| 16|
# | 10.4| 8| 472.0| 205| 2.93| 5.250| 17.98| 0| 0| 3| 4| 16|
# | 10.4| 8| 460.0| 215| 3.00| 5.424| 17.82| 0| 0| 3| 4| 16|
# | 14.7| 8| 440.0| 230| 3.23| 5.345| 17.42| 0| 0| 3| 4| 16|
# | 15.5| 8| 318.0| 150| 2.76| 3.520| 16.87| 0| 0| 3| 2| 16|
# | 15.2| 8| 304.0| 150| 3.15| 3.435| 17.30| 0| 0| 3| 2| 16|
# | 13.3| 8| 350.0| 245| 3.73| 3.840| 15.41| 0| 0| 3| 4| 16|
# | 19.2| 8| 400.0| 175| 3.08| 3.845| 17.05| 0| 0| 3| 2| 16|
# | 15.8| 8| 351.0| 264| 4.22| 3.170| 14.50| 0| 1| 5| 4| 16|
# | 15.0| 8| 301.0| 335| 3.54| 3.570| 14.60| 0| 1| 5| 8| 16|
alias mutate="dplyr mutate"
alias filter="dplyr filter"
alias select="dplyr select"
alias summarise="dplyr summarise"
alias group_by="dplyr group_by"
alias ungroup="dplyr ungroup"
alias count="dplyr count"
alias arrange="dplyr arrange"
alias kable="dplyr kable"
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
# | cyl| mpg|
# |---:|--------:|
# | 4| 26.66364|
# | 6| 19.74286|
# | 8| 15.10000|
Limitations:
- first argument after a join command must be an existing file (either CSV or RDS)
- You can’t yet specify a
by
argument for a join, so there must be a column in common to join by
cat cyl.csv
# cyl,description
# 4,four
# 6,six
cat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable
# | mpg| cyl| disp| hp| drat| wt| qsec| vs| am| gear| carb|description |
# |----:|---:|-----:|---:|----:|-----:|-----:|--:|--:|----:|----:|:-----------|
# | 21.0| 6| 160.0| 110| 3.90| 2.620| 16.46| 0| 1| 4| 4|six |
# | 21.0| 6| 160.0| 110| 3.90| 2.875| 17.02| 0| 1| 4| 4|six |
# | 22.8| 4| 108.0| 93| 3.85| 2.320| 18.61| 1| 1| 4| 1|four |
# | 21.4| 6| 258.0| 110| 3.08| 3.215| 19.44| 1| 0| 3| 1|six |
# | 18.1| 6| 225.0| 105| 2.76| 3.460| 20.22| 1| 0| 3| 1|six |
# | 24.4| 4| 146.7| 62| 3.69| 3.190| 20.00| 1| 0| 4| 2|four |
# | 22.8| 4| 140.8| 95| 3.92| 3.150| 22.90| 1| 0| 4| 2|four |
# | 19.2| 6| 167.6| 123| 3.92| 3.440| 18.30| 1| 0| 4| 4|six |
# | 17.8| 6| 167.6| 123| 3.92| 3.440| 18.90| 1| 0| 4| 4|six |
# | 32.4| 4| 78.7| 66| 4.08| 2.200| 19.47| 1| 1| 4| 1|four |
# | 30.4| 4| 75.7| 52| 4.93| 1.615| 18.52| 1| 1| 4| 2|four |
# | 33.9| 4| 71.1| 65| 4.22| 1.835| 19.90| 1| 1| 4| 1|four |
# | 21.5| 4| 120.1| 97| 3.70| 2.465| 20.01| 1| 0| 3| 1|four |
# | 27.3| 4| 79.0| 66| 4.08| 1.935| 18.90| 1| 1| 4| 1|four |
# | 26.0| 4| 120.3| 91| 4.43| 2.140| 16.70| 0| 1| 5| 2|four |
# | 30.4| 4| 95.1| 113| 3.77| 1.513| 16.90| 1| 1| 5| 2|four |
# | 19.7| 6| 145.0| 175| 3.62| 2.770| 15.50| 0| 1| 5| 6|six |
# | 21.4| 4| 121.0| 109| 4.11| 2.780| 18.60| 1| 1| 4| 2|four |
dplyr-cli
uses eval(parse(text = ...))
on user input. Do not expose
this program to the internet or random users under any circumstances.