Datamash
[[datamash]]<ref>https://www.gnu.org/software/datamash/</ref> (GNU Datamash) is a command-line program that performs basic numeric, textual and statistical operations on tabular text read from standard input.

Install with Homebrew: <code>brew install datamash</code>

<pre>
Usage: datamash [OPTION] op [fld] [op fld ...]

Performs numeric/string operations on input from stdin.

'op' is the operation to perform. If a primary operation is used,
it must be listed first, optionally followed by other operations.
'fld' is the input field to use. 'fld' can be a number (1=first field),
or a field name when using the -H or --header-in options.
Multiple fields can be listed with a comma (e.g. 1,6,8). A range of
fields can be listed with a dash (e.g. 2-8). Use colons for operations
which require a pair of fields (e.g. 'pcov 2:6').


Primary operations:
  groupby, crosstab, transpose, reverse, check
Line-Filtering operations:
  rmdup
Per-Line operations:
  base64, debase64, md5, sha1, sha224, sha256, sha384, sha512,
  bin, strbin, round, floor, ceil, trunc, frac,
  dirname, basename, barename, extname, getnum, cut
Numeric Grouping operations:
  sum, min, max, absmin, absmax, range
Textual/Numeric Grouping operations:
  count, first, last, rand, unique, collapse, countunique
Statistical Grouping operations:
  mean, geomean, harmmean, trimmean, median, q1, q3, iqr, perc,
  mode, antimode, pstdev, sstdev, pvar, svar, ms, rms, mad, madraw,
  pskew, sskew, pkurt, skurt, dpo, jarque,
  scov, pcov, spearson, ppearson


Options:

Grouping Options:
  -C, --skip-comments         skip comment lines (starting with '#' or ';'
                                and optional whitespace)
  -f, --full                  print entire input line before op results
                                (default: print only the grouped keys)
                                This option is only sensible for linewise
                                operations. Other uses are deprecated and
                                will be removed in a future version of GNU
                                Datamash.
  -g, --group=X[,Y,Z]         group via fields X,[Y,Z];
                                equivalent to primary operation 'groupby'
      --header-in             first input line is column headers
      --header-out            print column headers as first line
  -H, --headers               same as '--header-in --header-out'
  -i, --ignore-case           ignore upper/lower case when comparing text;
                                this affects grouping, and string operations
  -s, --sort                  sort the input before grouping; this removes the
                                need to manually pipe the input through 'sort'
  -c, --collapse-delimiter=X  use X to separate elements in collapse and
                                unique lists (default: comma)
File Operation Options:
      --no-strict             allow lines with varying number of fields
      --filler=X              fill missing values with X (default N/A)

General Options:
  -t, --field-separator=X     use X instead of TAB as field delimiter
      --format=FORMAT         print numeric values with printf style
                                floating-point FORMAT.
      --output-delimiter=X    use X instead as output field delimiter
                                (default: use same delimiter as -t/-W)
      --narm                  skip NA/NaN values
  -R, --round=N               round numeric output to N decimal places
  -W, --whitespace            use whitespace (one or more spaces and/or tabs)
                                for field delimiters
  -z, --zero-terminated       end lines with 0 byte, not newline
      --sort-cmd=/path/to/sort  Alternative sort(1) to use.
      --help                  display this help and exit
      --version               output version information and exit


Environment:
  LC_NUMERIC    decimal-point character and thousands separator


Examples:

Print the sum and the mean of values from column 1:
  $ seq 10 | datamash sum 1 mean 1
  55    5.5

Transpose input:
  $ seq 10 | paste - - | datamash transpose
  1    3    5    7    9
  2    4    6    8    10

For detailed usage information and examples, see
  man datamash
The manual and more examples are available at
  https://www.gnu.org/software/datamash
</pre>
== Basic Usage ==
* Minimum, mean, median, maximum, sample standard deviation and count of the numbers 1 to 10, with a header line: <code>seq 10 | datamash --header-out min 1 mean 1 median 1 max 1 sstdev 1 count 1</code>
* Add two numbers: <code>[[echo]] -e "100\n50" | datamash sum 1</code>
== See also ==
* [[Octave]]
* {{bc}}

[[Category:Computing]]
Latest revision as of 14:46, 4 May 2023