Datamash

From wikieduonline
Revision as of 14:46, 4 May 2023 by Welcome

GNU datamash[1] is a command-line program that performs basic numeric, textual, and statistical operations on tabular text read from standard input.

Install with Homebrew:

brew install datamash


Output of datamash --help:

Usage: datamash [OPTION] op [fld] [op fld ...]

Performs numeric/string operations on input from stdin.

'op' is the operation to perform.  If a primary operation is used,
it must be listed first, optionally followed by other operations.
'fld' is the input field to use.  'fld' can be a number (1=first field),
or a field name when using the -H or --header-in options.
Multiple fields can be listed with a comma (e.g. 1,6,8).  A range of
fields can be listed with a dash (e.g. 2-8).  Use colons for operations
which require a pair of fields (e.g. 'pcov 2:6').


Primary operations:
  groupby, crosstab, transpose, reverse, check
Line-Filtering operations:
  rmdup
Per-Line operations:
  base64, debase64, md5, sha1, sha224, sha256, sha384, sha512,
  bin, strbin, round, floor, ceil, trunc, frac,
  dirname, basename, barename, extname, getnum, cut
Numeric Grouping operations:
  sum, min, max, absmin, absmax, range
Textual/Numeric Grouping operations:
  count, first, last, rand, unique, collapse, countunique
Statistical Grouping operations:
  mean, geomean, harmmean, trimmean, median, q1, q3, iqr, perc,
  mode, antimode, pstdev, sstdev, pvar, svar, ms, rms, mad, madraw,
  pskew, sskew, pkurt, skurt, dpo, jarque,
  scov, pcov, spearson, ppearson


Options:

Grouping Options:
  -C, --skip-comments       skip comment lines (starting with '#' or ';'
                              and optional whitespace)
  -f, --full                print entire input line before op results
                              (default: print only the grouped keys)
                            This option is only sensible for linewise
                            operations. Other uses are deprecated and
                            will be removed in a future version of GNU
                            Datamash.
  -g, --group=X[,Y,Z]       group via fields X,[Y,Z];
                              equivalent to primary operation 'groupby'
      --header-in           first input line is column headers
      --header-out          print column headers as first line
  -H, --headers             same as '--header-in --header-out'
  -i, --ignore-case         ignore upper/lower case when comparing text;
                              this affects grouping, and string operations
  -s, --sort                sort the input before grouping; this removes the
                              need to manually pipe the input through 'sort'
  -c, --collapse-delimiter=X  use X to separate elements in collapse and
                              unique lists (default: comma)
File Operation Options:
      --no-strict           allow lines with varying number of fields
      --filler=X            fill missing values with X (default N/A)

General Options:
  -t, --field-separator=X   use X instead of TAB as field delimiter
      --format=FORMAT       print numeric values with printf style
                            floating-point FORMAT.
      --output-delimiter=X  use X instead as output field delimiter
                            (default: use same delimiter as -t/-W)
      --narm                skip NA/NaN values
  -R, --round=N             round numeric output to N decimal places
  -W, --whitespace          use whitespace (one or more spaces and/or tabs)
                              for field delimiters
  -z, --zero-terminated     end lines with 0 byte, not newline
      --sort-cmd=/path/to/sort   Alternative sort(1) to use.
      --help     display this help and exit
      --version  output version information and exit


Environment:
  LC_NUMERIC        decimal-point character and thousands separator


Examples:

Print the sum and the mean of values from column 1:
  $ seq 10 | datamash sum 1 mean 1
  55  5.5

Transpose input:
  $ seq 10 | paste - - | datamash transpose
  1    3    5    7    9
  2    4    6    8    10

For detailed usage information and examples, see
  man datamash
The manual and more examples are available at
  https://www.gnu.org/software/datamash
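
The grouping options and named fields described in the help text can be tried on a small table. The following is a minimal sketch, not taken from the datamash manual: the sample data (columns dept, name, score) is made up for illustration, and the values in the comments assume exactly that input.

```shell
#!/bin/sh
# Hypothetical sample data: tab-separated, with a header line.
data=$(printf 'dept\tname\tscore\nops\tann\t10\ndev\tbob\t30\nops\tcid\t20\ndev\tdee\t50\n')

# --header-in: the first line holds column names, so 'score' can be used
#              instead of the field number 3.
# -s:          sort the input so that equal keys are adjacent.
# -g 1:        group by the first field (dept).
by_dept=$(printf '%s\n' "$data" | datamash -s --header-in -g 1 sum score mean score)
printf '%s\n' "$by_dept"
# dev	80	40
# ops	30	15

# -H (same as --header-in --header-out) additionally prints a header line
# describing each output column.
printf '%s\n' "$data" | datamash -s -H -g 1 sum score mean score
```

Without -s, datamash only groups consecutive lines that share a key, so unsorted input would give more than one output line per dept.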


Basic Usage

  • Summary statistics for the numbers 1 to 10, with a header line: seq 10 | datamash --header-out min 1 mean 1 median 1 max 1 sstdev 1 count 1
  • Add two numbers: printf '100\n50\n' | datamash sum 1
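
Two further options from the help text that come up often are -t (field separator) and --narm (skip NA/NaN values). The sketch below uses made-up inline data; the values in the comments assume exactly that input.

```shell
#!/bin/sh
# Comma-separated input: -t, sets the field delimiter, and the output
# uses the same delimiter unless --output-delimiter is given.
csv_totals=$(printf 'a,1\nb,2\na,3\n' | datamash -t, -s -g 1 sum 2 count 2)
printf '%s\n' "$csv_totals"
# a,4,2
# b,2,1

# --narm skips NA/NaN values instead of failing on them.
narm_sum=$(printf '1\nNA\n3\n' | datamash --narm sum 1)
printf '%s\n' "$narm_sum"
# 4
```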

See also

  • https://www.gnu.org/software/datamash/