tsv_utils

Modules

common
module tsv_utils.common

Utility functions used by tsv-utils programs.

csv2tsv
module tsv_utils.csv2tsv

Convert CSV formatted data to TSV format.
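
The conversion can be sketched in a few lines of Python. This is an illustration of the general idea, not the module's implementation: the function name `csv_to_tsv` and the `replacement` parameter are hypothetical, and replacing embedded tabs and newlines with a space is an assumption about how field content is made safe for TSV.

```python
import csv
import io


def csv_to_tsv(csv_text, replacement=" "):
    """Convert CSV text to a list of TSV lines (hypothetical helper).

    Tabs and newlines inside a field would break the TSV layout, so they
    are replaced with `replacement` (an assumed policy for this sketch).
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    tsv_lines = []
    for row in reader:
        cleaned = [
            field.replace("\r\n", replacement)
            .replace("\n", replacement)
            .replace("\r", replacement)
            .replace("\t", replacement)
            for field in row
        ]
        tsv_lines.append("\t".join(cleaned))
    return tsv_lines


# A quoted comma stays inside its field; a quoted newline becomes a space.
converted = csv_to_tsv('a,b\n"x,1","y\nz"\n')
```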

keep_header
module tsv_utils.keep_header

Command line tool that executes a command while preserving header lines.

number_lines
module tsv_utils.number_lines

A simple version of the unix 'nl' program.

tsv_append
module tsv_utils.tsv_append

Command line tool that appends multiple TSV files. It is header aware and supports tracking the original source file of each row.
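
A minimal Python sketch of header-aware appending with source tracking follows. The function `append_tsv`, its parameters, and the name of the added column are hypothetical and chosen for illustration; the sketch assumes every input begins with a header row.

```python
def append_tsv(sources, source_header="file"):
    """Concatenate TSV inputs, keeping one header and tagging each row.

    `sources` is a list of (label, lines) pairs. Only the header of the
    first non-empty input is kept; each data row gets its label prepended
    so the original source remains visible.
    """
    out = []
    header_written = False
    for label, lines in sources:
        if not lines:
            continue  # an empty input contributes nothing
        header, *rows = lines
        if not header_written:
            out.append(f"{source_header}\t{header}")
            header_written = True
        out.extend(f"{label}\t{row}" for row in rows)
    return out


jan = ["id\tvalue", "1\ta", "2\tb"]
feb = ["id\tvalue", "3\tc"]
combined = append_tsv([("jan", jan), ("feb", feb)])
```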

tsv_filter
module tsv_utils.tsv_filter

Command line tool that filters TSV files.

tsv_join
module tsv_utils.tsv_join

Command line tool that joins tab-separated value files based on a common key.
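
Joining on a common key is commonly done with a hash lookup: one input is loaded into a table keyed by the join field, and the other is streamed against it. The sketch below assumes that approach; `join_tsv` and its parameters are hypothetical names, and appending the non-key fields of the lookup input is one possible output choice.

```python
def join_tsv(filter_lines, data_lines, key_index=0):
    """Keep data lines whose key appears in filter_lines (hypothetical helper).

    The non-key fields of the matching filter line are appended to each
    matching data line. The first filter line seen for a key wins.
    """
    lookup = {}
    for line in filter_lines:
        fields = line.split("\t")
        key = fields[key_index]
        lookup.setdefault(key, fields[:key_index] + fields[key_index + 1:])

    joined = []
    for line in data_lines:
        key = line.split("\t")[key_index]
        if key in lookup:
            joined.append("\t".join([line] + lookup[key]))
    return joined


colors = ["1\tred", "2\tblue"]
data = ["2\tx", "3\ty", "1\tz"]
matched = join_tsv(colors, data)
```

Output order follows the data input, so the lookup input can be small while the data input is arbitrarily large.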

tsv_pretty
module tsv_utils.tsv_pretty

Command line tool that prints TSV data aligned for easier reading on consoles and traditional command-line environments.

tsv_sample
module tsv_utils.tsv_sample

Command line tool for shuffling or sampling lines from input streams. Several methods are available, including weighted and unweighted shuffling, simple and weighted random sampling, sampling with replacement, Bernoulli sampling, and distinct sampling.
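
Two of these methods are easy to sketch in Python. Bernoulli sampling makes an independent keep-or-drop decision per line; distinct sampling makes the decision per key, so all lines sharing a key are kept or dropped together. The function names and parameters are hypothetical, and using an unsalted CRC32 hash for the key decision is a simplifying assumption that makes the selection deterministic.

```python
import random
import zlib


def bernoulli_sample(lines, prob, seed=None):
    """Keep each line independently with probability `prob`."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < prob]


def distinct_sample(lines, prob, key_index=0):
    """Keep all lines for roughly a `prob` fraction of the distinct keys.

    The decision depends only on a hash of the key, so every line with
    the same key receives the same decision.
    """
    threshold = int(prob * 2**32)
    kept = []
    for line in lines:
        key = line.split("\t")[key_index]
        if zlib.crc32(key.encode("utf-8")) < threshold:
            kept.append(line)
    return kept


rows = [f"user{i % 5}\t{i}" for i in range(20)]
sampled_lines = bernoulli_sample(rows, 0.25, seed=42)
sampled_keys = distinct_sample(rows, 0.5)
```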

tsv_select
module tsv_utils.tsv_select

A variant of the unix 'cut' program, with the ability to reorder fields.

tsv_split
module tsv_utils.tsv_split

Command line tool for splitting a file (or files) into multiple output files. Several methods for splitting are available, including splitting by line count, splitting by random assignment, and splitting by random assignment based on key fields.
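
The difference between splitting by line count and splitting by key can be sketched as follows. Both functions are hypothetical illustrations that return lists of lines rather than writing files, and hashing the key with CRC32 to pick a bucket is an assumption of this sketch.

```python
import zlib


def split_by_line_count(lines, lines_per_file):
    """Cut the input into consecutive chunks of at most `lines_per_file` lines."""
    return [
        lines[i:i + lines_per_file]
        for i in range(0, len(lines), lines_per_file)
    ]


def split_by_key(lines, num_files, key_index=0):
    """Assign each line to a bucket chosen by hashing its key field.

    Lines with the same key always land in the same bucket.
    """
    buckets = [[] for _ in range(num_files)]
    for line in lines:
        key = line.split("\t")[key_index]
        bucket = zlib.crc32(key.encode("utf-8")) % num_files
        buckets[bucket].append(line)
    return buckets


chunks = split_by_line_count(["a", "b", "c", "d", "e"], 2)
events = ["u1\tlogin", "u2\tlogin", "u1\tclick", "u3\tlogin", "u2\tlogout"]
buckets = split_by_key(events, 3)
```

Key-based assignment is useful when downstream processing needs all records for a key in one file, for example per-user aggregation.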

tsv_summarize
module tsv_utils.tsv_summarize

Command line tool that reads TSV files and summarizes field values associated with equivalent keys.
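
Summarizing by key amounts to a group-by followed by an aggregation. A minimal Python sketch, assuming a numeric value field and using count and sum as the example aggregates (the function `summarize` and its parameters are hypothetical):

```python
def summarize(rows, key_index, value_index):
    """Return {key: (count, total)} for the value field, grouped by key.

    Keys appear in the result in the order they are first seen.
    """
    totals = {}
    for row in rows:
        fields = row.split("\t")
        key = fields[key_index]
        count, total = totals.get(key, (0, 0.0))
        totals[key] = (count + 1, total + float(fields[value_index]))
    return totals


summary = summarize(["a\t1", "b\t2", "a\t3"], key_index=0, value_index=1)
```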

tsv_uniq
module tsv_utils.tsv_uniq

Command line tool that identifies equivalent lines in an input stream. Equivalent lines are identified using either the full line or a set of fields as the key. By default, input is written to standard output, retaining only the first occurrence of equivalent lines. There are also options for marking and numbering equivalent lines rather than filtering out duplicates.
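
Both behaviors can be sketched in Python. The functions `uniq` and `number_equivalents` are hypothetical illustrations: the first keeps only the first occurrence per key, the second keeps every line and appends its occurrence number within its key group.

```python
def line_key(line, key_fields=None):
    """Use the whole line as the key, or a tuple of the selected fields."""
    if key_fields is None:
        return line
    fields = line.split("\t")
    return tuple(fields[i] for i in key_fields)


def uniq(lines, key_fields=None):
    """Keep only the first line seen for each key, preserving input order."""
    seen = set()
    out = []
    for line in lines:
        key = line_key(line, key_fields)
        if key not in seen:
            seen.add(key)
            out.append(line)
    return out


def number_equivalents(lines, key_fields=None):
    """Keep every line and append its occurrence number within its key group."""
    counts = {}
    out = []
    for line in lines:
        key = line_key(line, key_fields)
        counts[key] = counts.get(key, 0) + 1
        out.append(f"{line}\t{counts[key]}")
    return out


sample = ["a\t1", "b\t2", "a\t1", "a\t3"]
full_line_unique = uniq(sample)
first_field_unique = uniq(sample, key_fields=[0])
numbered = number_equivalents(sample, key_fields=[0])
```

Unlike the unix 'uniq' program, this approach does not require sorted input, at the cost of holding every distinct key in memory.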