Datamash
GNU datamash[1] is a command-line program that performs basic numeric, textual and statistical operations on tabular text read from standard input.
Install with Homebrew:
brew install datamash
Usage: datamash [OPTION] op [fld] [op fld ...]

Performs numeric/string operations on input from stdin.

'op' is the operation to perform. If a primary operation is used,
it must be listed first, optionally followed by other operations.
'fld' is the input field to use. 'fld' can be a number (1=first field),
or a field name when using the -H or --header-in options.
Multiple fields can be listed with a comma (e.g. 1,6,8).
A range of fields can be listed with a dash (e.g. 2-8).
Use colons for operations which require a pair of fields (e.g. 'pcov 2:6').

Primary operations:
  groupby, crosstab, transpose, reverse, check

Line-Filtering operations:
  rmdup

Per-Line operations:
  base64, debase64, md5, sha1, sha224, sha256, sha384, sha512,
  bin, strbin, round, floor, ceil, trunc, frac, dirname, basename,
  barename, extname, getnum, cut

Numeric Grouping operations:
  sum, min, max, absmin, absmax, range

Textual/Numeric Grouping operations:
  count, first, last, rand, unique, collapse, countunique

Statistical Grouping operations:
  mean, geomean, harmmean, trimmean, median, q1, q3, iqr, perc, mode, antimode,
  pstdev, sstdev, pvar, svar, ms, rms, mad, madraw, pskew, sskew, pkurt, skurt,
  dpo, jarque, scov, pcov, spearson, ppearson

Options:

Grouping Options:
  -C, --skip-comments       skip comment lines (starting with '#' or ';'
                              and optional whitespace)
  -f, --full                print entire input line before op results
                              (default: print only the grouped keys)
                              This option is only sensible for linewise
                              operations. Other uses are deprecated and
                              will be removed in a future version of GNU
                              Datamash.
  -g, --group=X[,Y,Z]       group via fields X,[Y,Z];
                              equivalent to primary operation 'groupby'
      --header-in           first input line is column headers
      --header-out          print column headers as first line
  -H, --headers             same as '--header-in --header-out'
  -i, --ignore-case         ignore upper/lower case when comparing text;
                              this affects grouping, and string operations
  -s, --sort                sort the input before grouping;
                              this removes the need to manually pipe the
                              input through 'sort'
  -c, --collapse-delimiter=X  use X to separate elements in collapse and
                              unique lists (default: comma)

File Operation Options:
      --no-strict           allow lines with varying number of fields
      --filler=X            fill missing values with X (default N/A)

General Options:
  -t, --field-separator=X   use X instead of TAB as field delimiter
      --format=FORMAT       print numeric values with printf style
                              floating-point FORMAT.
      --output-delimiter=X  use X instead as output field delimiter
                              (default: use same delimiter as -t/-W)
      --narm                skip NA/NaN values
  -R, --round=N             round numeric output to N decimal places
  -W, --whitespace          use whitespace (one or more spaces and/or tabs)
                              for field delimiters
  -z, --zero-terminated     end lines with 0 byte, not newline
      --sort-cmd=/path/to/sort   Alternative sort(1) to use.
      --help                display this help and exit
      --version             output version information and exit

Environment:
  LC_NUMERIC  decimal-point character and thousands separator

Examples:

Print the sum and the mean of values from column 1:

  $ seq 10 | datamash sum 1 mean 1
  55  5.5

Transpose input:

  $ seq 10 | paste - - | datamash transpose
  1    3    5    7    9
  2    4    6    8    10

For detailed usage information and examples, see
  man datamash
The manual and more examples are available at
  https://www.gnu.org/software/datamash
Basic Usage
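- Print the minimum, mean, median, maximum, sample standard deviation and count of the numbers 1 to 10, preceded by a header line:
seq 10 | datamash --header-out min 1 mean 1 median 1 max 1 sstdev 1 count 1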
- Add two numbers:
printf '100\n50\n' | datamash sum 1
See also