Datamash

From wikieduonline
Latest revision as of 14:46, 4 May 2023

GNU datamash[1] is a command-line program that performs basic numeric, textual and statistical operations on tabular text read from standard input.

Install with Homebrew: brew install datamash


Usage: datamash [OPTION] op [fld] [op fld ...]

Performs numeric/string operations on input from stdin.

'op' is the operation to perform.  If a primary operation is used,
it must be listed first, optionally followed by other operations.
'fld' is the input field to use.  'fld' can be a number (1=first field),
or a field name when using the -H or --header-in options.
Multiple fields can be listed with a comma (e.g. 1,6,8).  A range of
fields can be listed with a dash (e.g. 2-8).  Use colons for operations
which require a pair of fields (e.g. 'pcov 2:6').


Primary operations:
  groupby, crosstab, transpose, reverse, check
Line-Filtering operations:
  rmdup
Per-Line operations:
  base64, debase64, md5, sha1, sha224, sha256, sha384, sha512,
  bin, strbin, round, floor, ceil, trunc, frac,
  dirname, basename, barename, extname, getnum, cut
Numeric Grouping operations:
  sum, min, max, absmin, absmax, range
Textual/Numeric Grouping operations:
  count, first, last, rand, unique, collapse, countunique
Statistical Grouping operations:
  mean, geomean, harmmean, trimmean, median, q1, q3, iqr, perc,
  mode, antimode, pstdev, sstdev, pvar, svar, ms, rms, mad, madraw,
  pskew, sskew, pkurt, skurt, dpo, jarque,
  scov, pcov, spearson, ppearson


Options:

Grouping Options:
  -C, --skip-comments       skip comment lines (starting with '#' or ';'
                              and optional whitespace)
  -f, --full                print entire input line before op results
                              (default: print only the grouped keys)
                            This option is only sensible for linewise
                            operations. Other uses are deprecated and
                            will be removed in a future version of GNU
                            Datamash.
  -g, --group=X[,Y,Z]       group via fields X,[Y,Z];
                              equivalent to primary operation 'groupby'
      --header-in           first input line is column headers
      --header-out          print column headers as first line
  -H, --headers             same as '--header-in --header-out'
  -i, --ignore-case         ignore upper/lower case when comparing text;
                              this affects grouping, and string operations
  -s, --sort                sort the input before grouping; this removes the
                              need to manually pipe the input through 'sort'
  -c, --collapse-delimiter=X  use X to separate elements in collapse and
                              unique lists (default: comma)
File Operation Options:
      --no-strict           allow lines with varying number of fields
      --filler=X            fill missing values with X (default N/A)

General Options:
  -t, --field-separator=X   use X instead of TAB as field delimiter
      --format=FORMAT       print numeric values with printf style
                            floating-point FORMAT.
      --output-delimiter=X  use X instead as output field delimiter
                            (default: use same delimiter as -t/-W)
      --narm                skip NA/NaN values
  -R, --round=N             round numeric output to N decimal places
  -W, --whitespace          use whitespace (one or more spaces and/or tabs)
                              for field delimiters
  -z, --zero-terminated     end lines with 0 byte, not newline
      --sort-cmd=/path/to/sort   Alternative sort(1) to use.
      --help     display this help and exit
      --version  output version information and exit


Environment:
  LC_NUMERIC        decimal-point character and thousands separator


Examples:

Print the sum and the mean of values from column 1:
  $ seq 10 | datamash sum 1 mean 1
  55  5.5

Transpose input:
  $ seq 10 | paste - - | datamash transpose
  1    3    5    7    9
  2    4    6    8    10

For detailed usage information and examples, see
  man datamash
The manual and more examples are available at
  https://www.gnu.org/software/datamash


Basic Usage

  • Summary statistics of the numbers 1 to 10, with a header line: seq 10 | datamash --header-out min 1 mean 1 median 1 max 1 sstdev 1 count 1
  • Add two numbers (prints 150): printf '100\n50\n' | datamash sum 1

See also

  • https://www.gnu.org/software/datamash/