k_seq.utility: miscellaneous utility functions

k_seq.utility.mp_job(fn, itr, n_proc=12, chunksize=None, use_fork=False, **kwargs)

Run jobs in the iterator in parallel

k_seq.utility.file_tools module

k_seq.utility.file_tools.check_dir(path)

Check if a path exists, create if not

k_seq.utility.file_tools.dump_json(obj, path=None, indent=2)

Convert object to a JSON file or JSON string
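The file-or-string behavior described above can be illustrated with the standard `json` module. This is a minimal sketch, not the k_seq implementation: `dump_json_sketch` is a hypothetical name, and it assumes the function returns the JSON string when `path` is None and writes to disk otherwise.

```python
import json
import os
import tempfile


def dump_json_sketch(obj, path=None, indent=2):
    """Hypothetical stand-in: return JSON text, or write it to `path` if given."""
    if path is None:
        return json.dumps(obj, indent=indent)
    with open(path, "w") as handle:
        json.dump(obj, handle, indent=indent)
    return None


# Without a path, the JSON text is returned.
text = dump_json_sketch({"sample": "A", "replicate": 1})

# With a path, the JSON is written to disk and can be read back.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "meta.json")
    dump_json_sketch({"sample": "A", "replicate": 1}, path=json_path)
    with open(json_path) as handle:
        round_trip = json.load(handle)
```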

k_seq.utility.file_tools.dump_pickle(obj, path)

Save object as pickled file

k_seq.utility.file_tools.extract_metadata(name, template)

Extract metadata from a string (name, e.g. a file name) given a template indicating the position of each metadata domain.

Parameters
  • name (str) – string to extract info from, e.g. a sample file name

  • template (str) – naming convention used to extract metadata. Use [...] to mark the region to use as the sample name, and {domain_name[, int/float]} to mark a domain to extract as metadata. Adding int or float converts the domain value to int/float if applicable; otherwise the value is kept as a string

Returns

dictionary of all metadata extracted from the domains indicated in the template

Return type

dict

Example

Example of metadata extraction from a template:

>>> metadata = extract_metadata(
...     name="R4B-1250A_S16_counts.txt",
...     template="R4[{exp_rep}-{concentration, float}{seq_rep}_S{id, int}]_counts.txt"
... )

>>> metadata
{
    'name': 'B-1250A_S16',
    'exp_rep': 'B',
    'concentration': 1250.0,
    'seq_rep': 'A',
    'id': 16
}
Notice: two back-to-back domains can only be parsed if one of them is numeric and the other is alphabetic; a missing value will raise an error.

Valid: matching '-A1-' to '-{sample}{replicate, int}-' gives {'sample': 'A', 'replicate': 1}

Not valid: matching '-A-' to '-{sample}{replicate, int}-' will cause an error; matching '-AA-' to '-{sample}{replicate}-' will cause an error
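The template syntax and the back-to-back rule can be made concrete with a regular-expression sketch. This is not the k_seq implementation: `extract_metadata_sketch` is a hypothetical name, and the character classes assumed for each domain type (letters for strings, digits for int, digits with an optional decimal point for float) are illustrative guesses.

```python
import re

# Assumed character classes per domain type (not taken from k_seq).
_DOMAIN_REGEX = {"str": r"[A-Za-z]+", "int": r"\d+", "float": r"\d+\.?\d*"}
_DOMAIN_CAST = {"str": str, "int": int, "float": float}


def extract_metadata_sketch(name, template):
    """Hypothetical stand-in: parse `name` against `template` via a regex."""
    # Split the template into '[', ']', '{...}' domains and literal text.
    tokens = [t for t in re.split(r"(\[|\]|\{[^}]*\})", template) if t]
    regex_parts, casts, prev_class = [], {}, None
    for token in tokens:
        if token == "[":
            regex_parts.append("(?P<name>")  # region kept as the sample name
        elif token == "]":
            regex_parts.append(")")
        elif token.startswith("{"):
            fields = [f.strip() for f in token[1:-1].split(",")]
            kind = fields[1] if len(fields) > 1 else "str"
            char_class = "alpha" if kind == "str" else "digit"
            # Adjacent domains of the same class cannot be told apart.
            if prev_class == char_class:
                raise ValueError(f"ambiguous back-to-back domains at {token}")
            casts[fields[0]] = _DOMAIN_CAST[kind]
            regex_parts.append(f"(?P<{fields[0]}>{_DOMAIN_REGEX[kind]})")
            prev_class = char_class
        else:
            regex_parts.append(re.escape(token))  # literal text
            prev_class = None
    match = re.fullmatch("".join(regex_parts), name)
    if match is None:
        raise ValueError(f"{name!r} does not match template {template!r}")
    return {key: casts.get(key, str)(value)
            for key, value in match.groupdict().items()}


metadata = extract_metadata_sketch(
    "R4B-1250A_S16_counts.txt",
    "R4[{exp_rep}-{concentration, float}{seq_rep}_S{id, int}]_counts.txt",
)
pair = extract_metadata_sketch("-A1-", "-{sample}{replicate, int}-")
```

In this sketch both invalid cases raise `ValueError`: a missing value fails the full match, and two adjacent string domains are rejected before matching.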

k_seq.utility.file_tools.get_file_list(file_root, pattern=None, file_list=None, black_list=None, full_path=True)

Return files under the given file root that match the pattern, if one is given; folders are not included

Parameters
  • file_root (str or list of str) – root directory/directories to search

  • pattern (str) – optional, all files under the directories are included if None

  • file_list (list of str) – optional, if given, only files whose names are in file_list are included

  • black_list (list of str) – optional, file names included in black_list will be excluded

  • full_path (bool) – whether to return the full path or only the name of the file; by default, if file_root is one string, only the file name will be returned; if file_root contains multiple strings, the full path will be returned

Returns

list of str (file names) or path.Path (full directory)

k_seq.utility.file_tools.load_tar_gz(tarfile, file_to_extract, load_fn, gzip=True)
k_seq.utility.file_tools.read_json(path)

Read json file

k_seq.utility.file_tools.read_pickle(path)

Read pickled object from path

k_seq.utility.file_tools.read_table_files(file_path, col_name=None, header=1)

Read common seq_table files:
  • .xls or .xlsx: the first sheet is read, with the first row as header
  • .csv: read with the first row as header; separator is ','
  • .tsv: read with the first row as header; separator is '\t'

k_seq.utility.file_tools.table_object_to_dataframe(obj, table_name=None)

Convert object (file path, SeqData) to pd.DataFrame

k_seq.utility.func_tools module

class k_seq.utility.func_tools.AttrScope(attr_dict=None, keys=None, **attr_kwargs)

Bases: object

A name scope for a group of attributes

__init__(attr_dict=None, keys=None, **attr_kwargs)

Create a name scope for a group of attributes

Parameters
  • attr_dict (dict) – a dictionary with values to pass

  • keys (list of str) – a list of attributes to initialize with None

  • attr_kwargs – or directly pass some keyword arguments

add(attr_dict=None, **kwargs)
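
The three ways of supplying attributes (a dictionary, a list of names initialized to None, and keyword arguments) can be sketched as follows. `AttrScopeSketch` is a hypothetical stand-in written only from the parameter descriptions above; it assumes that `add` merges its inputs into attributes the same way the constructor does.

```python
class AttrScopeSketch:
    """Hypothetical stand-in illustrating the documented constructor arguments."""

    def __init__(self, attr_dict=None, keys=None, **attr_kwargs):
        # Names listed in `keys` start out as None.
        for key in keys or []:
            setattr(self, key, None)
        self.add(attr_dict, **attr_kwargs)

    def add(self, attr_dict=None, **kwargs):
        # Merge the dictionary and the keyword arguments into attributes.
        for key, value in {**(attr_dict or {}), **kwargs}.items():
            setattr(self, key, value)


scope = AttrScopeSketch({"n_proc": 4}, keys=["seed"], label="run-1")
seed_before = scope.seed  # initialized to None via `keys`
scope.add(seed=23)
```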
class k_seq.utility.func_tools.FuncToMethod(functions, *args, **kwargs)

Bases: object

Convert a set of functions to a collection of methods on the object

k_seq.utility.func_tools.check_attr_value(obj, **attr)
k_seq.utility.func_tools.dict_flatten(d, parent_key='', sep='_')
k_seq.utility.func_tools.get_func_params(func, required_only=True)

Get the names of the arguments of a function (callable), or of the arguments of __init__ for a class (self not included)

Parameters
  • func (callable) – the function

  • required_only (bool) – whether to exclude arguments with default values

Returns: a list of argument names, in order
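
A minimal sketch of this kind of introspection, using the standard `inspect` module. `get_func_params_sketch` is a hypothetical name; skipping `*args` and `**kwargs` is an assumption of this sketch, not documented behavior of k_seq.

```python
import inspect


def get_func_params_sketch(func, required_only=True):
    """Hypothetical stand-in: list argument names of a callable, in order."""
    names = []
    # For a class, inspect.signature reports __init__ without `self`.
    for param in inspect.signature(func).parameters.values():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # assumption: *args / **kwargs are not reported
        if required_only and param.default is not param.empty:
            continue  # skip arguments that have default values
        names.append(param.name)
    return names


def model(x, k, a=1.0):
    return a * x * k


class Fitter:
    def __init__(self, data, n_iter=10):
        self.data = data
        self.n_iter = n_iter


required = get_func_params_sketch(model)
everything = get_func_params_sketch(model, required_only=False)
init_args = get_func_params_sketch(Fitter)
```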

k_seq.utility.func_tools.is_int(x)
k_seq.utility.func_tools.is_numeric(x)
k_seq.utility.func_tools.is_sparse(obj)
k_seq.utility.func_tools.param_to_dict(key_list, **kwargs)

Assign kwargs to a dictionary with keys from key_list:
  • if the arg is a single value, it will be assigned to all keys
  • if the arg is a list, it should have the same length as key_list
  • if the arg is a dict, it should contain all members of key_list
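
The three cases above (single value, list, dict) can be sketched as below. `param_to_dict_sketch` is a hypothetical name, and the nested return layout `{key: {arg_name: value}}` is an assumption made for illustration; the layout returned by k_seq is not stated here.

```python
def param_to_dict_sketch(key_list, **kwargs):
    """Hypothetical stand-in: spread each keyword argument over key_list."""
    result = {key: {} for key in key_list}
    for arg_name, value in kwargs.items():
        if isinstance(value, dict):
            # A dict must provide a value for every key.
            missing = [key for key in key_list if key not in value]
            if missing:
                raise KeyError(f"{arg_name} is missing keys: {missing}")
            per_key = [value[key] for key in key_list]
        elif isinstance(value, (list, tuple)):
            # A list is matched to the keys by position.
            if len(value) != len(key_list):
                raise ValueError(f"{arg_name} must have {len(key_list)} items")
            per_key = list(value)
        else:
            # A single value is shared by all keys.
            per_key = [value] * len(key_list)
        for key, item in zip(key_list, per_key):
            result[key][arg_name] = item
    return result


styles = param_to_dict_sketch(
    ["k", "A"],
    log=True,
    limits=[(0, 1), (0, 10)],
    label={"k": "rate", "A": "amplitude"},
)
```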

k_seq.utility.func_tools.run_subprocess(cmd, name=None, **kwargs)
k_seq.utility.func_tools.update_none(arg, update_by)

Update arguments with some default value

Parameters
  • arg – variable object

  • update_by – variable object

k_seq.utility.plot_tools module

This module contains project level utility functions

class k_seq.utility.plot_tools.Presets

Bases: object

Collection of preset colors/markers

static color_cat10(num=5)
static color_pastel1(num=5)
static color_tab10(num=5)
static from_list(prop_list)
static markers(num=5, with_line=False)
k_seq.utility.plot_tools.ax_none(ax, figsize=None)
k_seq.utility.plot_tools.barplot(series, ax, label=None, yticklabels=None, barplot_kwargs=None)

General barplot for single series

k_seq.utility.plot_tools.blue_header(header)
class k_seq.utility.plot_tools.color

Bases: object

BLUE = '\x1b[94m'
BOLD = '\x1b[1m'
CYAN = '\x1b[96m'
DARKCYAN = '\x1b[36m'
END = '\x1b[0m'
GREEN = '\x1b[92m'
PURPLE = '\x1b[95m'
RED = '\x1b[91m'
UNDERLINE = '\x1b[4m'
YELLOW = '\x1b[93m'
k_seq.utility.plot_tools.format_ticks(ax, axis, tick_num=3, log=False, int_tick_only=False, tick_formatter=None)

Manual formatting for figure ticks

k_seq.utility.plot_tools.pairplot(data, vars_name=None, vars_lim=None, vars_log=None, figsize=(2, 2), **kwargs)

Wrapper over seaborn.pairplot to visualize pairwise correlations, with a log option

k_seq.utility.plot_tools.plot_curve(model, x=None, y=None, param=None, major_param=None, subsample=20, x_label=None, y_label=None, x_lim=None, y_lim=None, major_curve_kwargs=None, curve_kwargs=None, datapoint_kwargs=None, major_curve_label='major curve', curve_label='curves', datapoint_label='data', legend=False, legend_loc='upper right', fontsize=12, ax=None, x_tick_formatter=None, y_tick_formatter=None, **kwargs)
Plot fitting results with i) data points (scatter), ii) major curve, iii) a set of curves (e.g. from bootstrap or convergence)

Parameters
  • model (callable) – kinetic model that returns y, with x as its first argument

  • x (list-like) – x values of data

  • y (list-like) – y values of corresponding x data, same length

  • param (dict or pd.DataFrame) – estimated parameters for model from fitting(s)

  • major_param (dict) – a major parameter estimated

  • subsample (int) – maximum number of fitted curves to show

  • x_label (str) – x axis label name

  • y_label (str) – y axis label name

  • x_lim (2-tuple) – lower and upper limit of x axis

  • y_lim (2-tuple) – lower and upper limit of y axis

  • major_curve_kwargs (dict) – plot arguments for major curve

  • curve_kwargs (dict) – plot arguments for other curves

  • datapoint_kwargs (dict) – scatter plot arguments for data points

  • major_curve_label (str) – label on legend for major curve

  • curve_label (str) – label on legend for other curves

  • datapoint_label (str) – label on legend for data points

  • legend (bool) – whether to show the legend

  • legend_loc (str or 4-tuple) – specify the location of legend, default is upper right

  • ax (plt.Axes) – Axes to plot on

k_seq.utility.plot_tools.plot_loss_heatmap(model, x, y, param, param_name, param1_range, param2_range, param_log=False, resolution=100, subsample=20, fixed_params=None, cost_fn=None, z_log=True, datapoint_color='#E45756', datapoint_label='data', datapoint_kwargs=None, colorbar=True, legend=False, legend_loc='upper left', fontsize=12, ax=None, tick_num=3, x_tick_formatter=None, y_tick_formatter=None)

Plot a heatmap showing the energy landscape of the cost function over two parameters

Parameters
  • model (callable) – kinetic model with x as its first argument. Broadcasting should be implemented with x as the innermost dimension

  • x (list-like) – x values of data

  • y (list-like) – y values of corresponding x data, same length

  • param (dict or pd.DataFrame) – estimated parameters for model from fitting(s)

  • param_name (2-tuple of str) – name for two params to scan

  • scan_range (dict of two tuples) – scan range of the two parameters: {param1: (low, high), param2: (low, high)}. Note: in the model output, the dimension of param1 should always be outside the dimension of param2

  • fixed_params (dict) – optional, any fixed parameters other than the two to scan

  • param_log (bool or dict of bool) – whether the scan is spaced on a log scale

  • resolution (int or dict of int) – resolution of the two scans, default 100

  • cost_fn (callable) – cost function for calculating the cost between y_ and y; takes (y_, y). Default is mean squared error

  • ax (plt.Axes) – Axes to plot on
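
The quantity behind such a heatmap is a grid of cost values over two scanned parameters. The following pure-Python sketch computes such a grid with mean squared error as the cost, keeping param1 as the outer dimension. All names here (`scan_values`, `loss_grid`, `toy_model`) are hypothetical, and the saturating model is only an illustration, not the kinetic model used by k_seq.

```python
import math


def mean_squared_error(y_pred, y):
    return sum((p - t) ** 2 for p, t in zip(y_pred, y)) / len(y)


def scan_values(low, high, resolution, log=False):
    """Evenly spaced values from low to high, optionally on a log10 scale."""
    if log:
        low, high = math.log10(low), math.log10(high)
    step = (high - low) / (resolution - 1)
    values = [low + i * step for i in range(resolution)]
    return [10 ** v for v in values] if log else values


def loss_grid(model, x, y, param1_values, param2_values,
              cost_fn=mean_squared_error):
    """Rows follow param1 (outer dimension), columns follow param2."""
    return [
        [cost_fn([model(xi, p1, p2) for xi in x], y) for p2 in param2_values]
        for p1 in param1_values
    ]


def toy_model(x, k, a):
    # Hypothetical saturating model used only for this illustration.
    return a * (1 - math.exp(-k * x))


x = [0.5, 1.0, 2.0, 4.0]
y = [toy_model(xi, 1.0, 0.5) for xi in x]      # noise-free data at k=1, a=0.5
k_values = scan_values(0.1, 10.0, 5, log=True)  # log-spaced scan of k
a_values = scan_values(0.0, 1.0, 5)             # linear scan of a
grid = loss_grid(toy_model, x, y, k_values, a_values)
```

With noise-free data the lowest cost in `grid` sits at the cell whose parameters generated the data, which is the basin a heatmap of this grid would show.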

k_seq.utility.plot_tools.regplot(x, y, ax=None, xlabel=None, ylabel=None, digit=4, equation_loc='best', xlog=False, ylog=False, kwargs_scatter=None, kwargs_line=None)
k_seq.utility.plot_tools.savefig(save_fig_to, dpi=300, alpha=0)
k_seq.utility.plot_tools.value_to_loc(value, range, resolution, log)

Convert actual value to location on heatmap
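
One plausible way to map a value to a heatmap position is linear interpolation over the scan range, done in log10 space when the axis is log-scaled. The convention actually used by value_to_loc is not documented here, so this is only a sketch under that assumption: `value_to_loc_sketch` is a hypothetical name, and it assumes the grid covers the range with `resolution` points indexed from 0 to `resolution - 1`.

```python
import math


def value_to_loc_sketch(value, value_range, resolution, log=False):
    """Hypothetical stand-in: fractional grid index of `value` within the range."""
    low, high = value_range
    if log:
        # Interpolate in log10 space for log-spaced scans.
        value, low, high = math.log10(value), math.log10(low), math.log10(high)
    return (value - low) / (high - low) * (resolution - 1)


linear_loc = value_to_loc_sketch(0.5, (0.0, 1.0), 101)
log_loc = value_to_loc_sketch(1.0, (0.1, 10.0), 5, log=True)
```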