`k_seq.utility`: miscellaneous utility functions¶

k_seq.utility.mp_job(fn, itr, n_proc=12, chunksize=None, use_fork=False, **kwargs)¶: Run jobs in the iterator in parallel

k_seq.utility.file_tools module¶

k_seq.utility.file_tools.check_dir(path)¶: Check if a path exists, create if not

k_seq.utility.file_tools.dump_json(obj, path=None, indent=2)¶: Convert object to a JSON file or JSON string
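The "JSON file or JSON string" behaviour can be pictured with the standard library alone. The sketch below is an assumption about how `path=None` is handled, not the k_seq source; `sketch_dump_json` is a hypothetical name.

```python
import json


def sketch_dump_json(obj, path=None, indent=2):
    """Hypothetical stand-in for dump_json; not the k_seq code."""
    if path is None:
        # Assumption: without a path, the JSON text is returned as a string.
        return json.dumps(obj, indent=indent)
    # Otherwise the JSON text is written to the given file.
    with open(path, "w") as handle:
        json.dump(obj, handle, indent=indent)
    return None
```

Under that assumption, calling the function without `path` returns text that `json.loads` can read back, while passing a path writes the same text to disk.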

k_seq.utility.file_tools.dump_pickle(obj, path)¶: Save object as a pickled file

k_seq.utility.file_tools.extract_metadata(name, template)¶

Extract metadata from a string (name, e.g. a file name) given a template that indicates the position of each metadata domain.

Parameters

name (str) – string to extract info from, e.g. a sample file name
template (str) – naming convention used to extract metadata. Use [...] to mark the region taken as the sample name, and {domain_name[, int/float]} to mark a domain to extract as metadata. Including int or float converts the domain value to int/float if applicable; otherwise the value is kept as a string

Returns

dictionary of all metadata extracted from the domains indicated in the template

Return type

dict

Example

Example of metadata extraction from a template:

>>> metadata = extract_metadata(
...     name="R4B-1250A_S16_counts.txt",
...     template="R4[{exp_rep}-{concentration, float}{seq_rep}_S{id, int}]_counts.txt"
... )

>>> metadata
{
    'name': 'B-1250A_S16',
    'exp_rep': 'B',
    'concentration': 1250.0,
    'seq_rep': 'A',
    'id': 16
}

Notice: two back-to-back domains can only be parsed if one of them is numeric and the other is alphabetic; a missing value will raise an error.

Valid: matching '-A1-' to '-{sample}{replicate, int}-' gives {'sample': 'A', 'replicate': 1}

Not valid: matching '-A-' to '-{sample}{replicate, int}-' will cause an error; matching '-AA-' to '-{sample}{replicate}-' will cause an error
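One way to picture the template rules above is to translate a template into a regular expression with named groups. The sketch below is an illustration only, not the k_seq implementation: `sketch_extract_metadata` and `_DOMAIN_TYPES` are hypothetical names. It assumes that string domains are purely alphabetic, that numeric domains are unsigned, and that domain names are valid Python identifiers.

```python
import re

# Assumed domain types: (kind, regex for the raw text, converter).
_DOMAIN_TYPES = {
    "str": ("alphabetic", r"[A-Za-z]+", str),
    "int": ("numeric", r"[0-9]+", int),
    "float": ("numeric", r"[0-9]+(?:\.[0-9]+)?", float),
}


def sketch_extract_metadata(name, template):
    """Hypothetical stand-in for extract_metadata; not the k_seq code."""
    pattern = ""
    converters = {}
    prev_kind = None  # kind of the domain directly before the current token
    # Split the template into literals, {domain} tokens, and [ ] markers.
    for token in re.split(r"(\{[^{}]*\}|\[|\])", template):
        if token == "":
            continue
        if token == "[":
            pattern += "(?P<_region>"  # opens the sample-name region
        elif token == "]":
            pattern += ")"
        elif token.startswith("{"):
            parts = [part.strip() for part in token[1:-1].split(",")]
            key = parts[0]
            dtype = parts[1] if len(parts) > 1 else "str"
            kind, regex, converters[key] = _DOMAIN_TYPES[dtype]
            if kind == prev_kind:
                # Two adjacent domains of the same kind cannot be split.
                raise ValueError(f"cannot split adjacent {kind} domains at {key!r}")
            pattern += f"(?P<{key}>{regex})"
            prev_kind = kind
        else:
            pattern += re.escape(token)  # literal text between domains
            prev_kind = None
    match = re.fullmatch(pattern, name)
    if match is None:
        raise ValueError(f"{name!r} does not match template {template!r}")
    metadata = {}
    if "_region" in match.groupdict():
        metadata["name"] = match.group("_region")
    for key, convert in converters.items():
        metadata[key] = convert(match.group(key))
    return metadata
```

With these assumptions the sketch reproduces the documented example and both error cases: a missing numeric value fails to match, and two adjacent alphabetic domains are rejected before matching.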

k_seq.utility.file_tools.get_file_list(file_root, pattern=None, file_list=None, black_list=None, full_path=True)¶

Return the files under the given file root that match the pattern, if one is given; folders are not included

Parameters

file_root (str or list of str) – root directory/directories to search
pattern (str) – optional, include all the files under the directories if None
file_list (list of str) – optional, only include files whose names are in file_list, if given
black_list (list of str) – optional, file names included in black_list will be excluded
full_path (bool) – whether to return the full path or only the name of the file. By default, if file_root is one string, only the file name will be returned; if file_root contains multiple strings, the full path will be returned

Returns

list of str (file names) or path.Path (full directory)
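The filtering described by these parameters can be sketched with `pathlib`. This is not the k_seq implementation: `sketch_get_file_list` is a hypothetical name, and the sketch assumes that `pattern` is a glob expression and that `file_list` and `black_list` are compared against bare file names.

```python
import tempfile
from pathlib import Path


def sketch_get_file_list(file_root, pattern=None, file_list=None,
                         black_list=None, full_path=True):
    """Hypothetical stand-in for get_file_list; not the k_seq code."""
    roots = [file_root] if isinstance(file_root, (str, Path)) else list(file_root)
    found = []
    for root in roots:
        # Assumption: `pattern` is a glob expression; None means every entry.
        for path in sorted(Path(root).glob(pattern or "*")):
            if not path.is_file():
                continue  # folders are not included
            if file_list is not None and path.name not in file_list:
                continue
            if black_list is not None and path.name in black_list:
                continue
            found.append(path if full_path else path.name)
    return found


# Usage on a throwaway directory with three files and one folder.
with tempfile.TemporaryDirectory() as root:
    for file_name in ("a_counts.txt", "b_counts.txt", "notes.md"):
        (Path(root) / file_name).write_text("")
    (Path(root) / "subfolder").mkdir()
    count_files = sketch_get_file_list(root, pattern="*_counts.txt", full_path=False)
    kept_files = sketch_get_file_list(root, black_list=["notes.md"], full_path=False)
```

Both calls in the usage part return only the two count files: the first through the pattern, the second by excluding `notes.md` and skipping the folder.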

k_seq.utility.file_tools.load_tar_gz(tarfile, file_to_extract, load_fn, gzip=True)¶

k_seq.utility.file_tools.read_json(path)¶: Read json file

k_seq.utility.file_tools.read_pickle(path)¶: Read pickled object from path

k_seq.utility.file_tools.read_table_files(file_path, col_name=None, header=1)¶: Read common seq_table files

- .xls or .xlsx: first sheet will be read with first row as header
- .csv: read the csv files with first row as header, separator is ','
- .tsv: read the tsv files with first row as header, separator is '\t'

k_seq.utility.file_tools.table_object_to_dataframe(obj, table_name=None)¶: Convert object (file path, SeqData) to pd.DataFrame

k_seq.utility.func_tools module¶

class k_seq.utility.func_tools.AttrScope(attr_dict=None, keys=None, **attr_kwargs)¶

Bases: object

A name scope for a group of attributes

__init__(attr_dict=None, keys=None, **attr_kwargs)¶

Create a name scope for a group of attributes

Parameters

attr_dict (dict) – a dictionary with values to pass
keys (list of str) – a list of attributes to initialize with None
attr_kwargs – or directly pass some keyword arguments

add(attr_dict=None, **kwargs)¶

class k_seq.utility.func_tools.FuncToMethod(functions, *args, **kwargs)¶

Bases: object

Convert a set of functions to a collection of methods on the object

k_seq.utility.func_tools.check_attr_value(obj, **attr)¶

k_seq.utility.func_tools.dict_flatten(d, parent_key='', sep='_')¶
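The signature of `dict_flatten` suggests the common recipe for flattening nested dictionaries by joining keys with a separator. The sketch below shows that recipe under this assumption; `sketch_dict_flatten` is a hypothetical name and the actual k_seq behaviour may differ.

```python
def sketch_dict_flatten(d, parent_key="", sep="_"):
    """Hypothetical stand-in for dict_flatten; not the k_seq code."""
    items = {}
    for key, value in d.items():
        # Prefix the key with the path of its parents, joined by `sep`.
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the joined key down.
            items.update(sketch_dict_flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items
```

For example, `{"a": 1, "b": {"c": 2}}` would become `{"a": 1, "b_c": 2}` in this sketch.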

k_seq.utility.func_tools.get_func_params(func, required_only=True)¶

Get the names of the arguments of a function (callable), or of the arguments of __init__ for a class (self not included)

Parameters

func (callable) – the function
required_only (bool) – whether to exclude arguments with default values

Returns: a list of argument names, in order
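The behaviour described here maps closely onto `inspect.signature` from the standard library. The sketch below is one possible way to get the same result, not the k_seq source; `sketch_get_func_params` is a hypothetical name, and skipping `*args`/`**kwargs` is an assumption.

```python
import inspect


def sketch_get_func_params(func, required_only=True):
    """Hypothetical stand-in for get_func_params; not the k_seq code."""
    names = []
    # For a class, inspect.signature reports __init__ without `self`.
    for param in inspect.signature(func).parameters.values():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # assumption: *args and **kwargs are not reported
        if required_only and param.default is not param.empty:
            continue  # skip arguments that have a default value
        names.append(param.name)
    return names
```

Passing a class instead of a function works the same way, because `inspect.signature` resolves the constructor signature and drops `self`.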

k_seq.utility.func_tools.is_int(x)¶

k_seq.utility.func_tools.is_numeric(x)¶

k_seq.utility.func_tools.is_sparse(obj)¶

k_seq.utility.func_tools.param_to_dict(key_list, **kwargs)¶: Assign kwargs to the dictionary with keys from key_list

- if the arg is a single value, it will be assigned to all keys
- if the arg is a list, it should have the same length as key_list
- if the arg is a dict, it should contain all members in key_list
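The three cases for `param_to_dict` can be sketched as follows. The return layout (one inner dictionary per key) is an assumption made for illustration, and `sketch_param_to_dict` is a hypothetical name rather than the k_seq code.

```python
def sketch_param_to_dict(key_list, **kwargs):
    """Hypothetical stand-in for param_to_dict; not the k_seq code."""
    # Assumed layout: {key: {arg_name: value}} for every key in key_list.
    result = {key: {} for key in key_list}
    for arg_name, arg in kwargs.items():
        if isinstance(arg, dict):
            missing = [key for key in key_list if key not in arg]
            if missing:
                raise KeyError(f"{arg_name} is missing keys: {missing}")
            values = [arg[key] for key in key_list]
        elif isinstance(arg, (list, tuple)):
            if len(arg) != len(key_list):
                raise ValueError(f"{arg_name} must have {len(key_list)} items")
            values = list(arg)
        else:
            values = [arg] * len(key_list)  # a single value goes to all keys
        for key, value in zip(key_list, values):
            result[key][arg_name] = value
    return result
```

A call such as `sketch_param_to_dict(["a", "b"], color="red", size=[1, 2])` would give every key the same color and each key its own size.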

k_seq.utility.func_tools.run_subprocess(cmd, name=None, **kwargs)¶

k_seq.utility.func_tools.update_none(arg, update_by)¶: Update arguments with some default value

Parameters

arg – variable object
update_by – variable object

k_seq.utility.plot_tools module¶

This module contains project level utility functions

class k_seq.utility.plot_tools.Presets¶

Bases: object

Collection of preset colors/markers

static color_cat10(num=5)¶

static color_pastel1(num=5)¶

static color_tab10(num=5)¶

static from_list(prop_list)¶

static markers(num=5, with_line=False)¶

k_seq.utility.plot_tools.ax_none(ax, figsize=None)¶

k_seq.utility.plot_tools.barplot(series, ax, label=None, yticklabels=None, barplot_kwargs=None)¶: General barplot for single series

k_seq.utility.plot_tools.blue_header(header)¶

class k_seq.utility.plot_tools.color¶

Bases: object

BLUE = '\x1b[94m'¶

BOLD = '\x1b[1m'¶

CYAN = '\x1b[96m'¶

DARKCYAN = '\x1b[36m'¶

END = '\x1b[0m'¶

GREEN = '\x1b[92m'¶

PURPLE = '\x1b[95m'¶

RED = '\x1b[91m'¶

UNDERLINE = '\x1b[4m'¶

YELLOW = '\x1b[93m'¶
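The constants above are ANSI escape sequences: writing one to a terminal switches a text style on, and END resets it. The sketch below shows how such constants are typically combined. `sketch_colorize` is a hypothetical helper, not part of k_seq, and only three of the values listed above are repeated.

```python
# Values copied from the attribute list above.
BLUE = "\x1b[94m"
BOLD = "\x1b[1m"
END = "\x1b[0m"


def sketch_colorize(text, *styles):
    """Hypothetical helper: wrap text in the given styles, then reset."""
    return "".join(styles) + text + END


# On an ANSI-capable terminal this prints "k-seq" in bold blue.
print(sketch_colorize("k-seq", BOLD, BLUE))
```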

k_seq.utility.plot_tools.format_ticks(ax, axis, tick_num=3, log=False, int_tick_only=False, tick_formatter=None)¶: Manual formatting for figure ticks

k_seq.utility.plot_tools.pairplot(data, vars_name=None, vars_lim=None, vars_log=None, figsize=(2, 2), **kwargs)¶: Wrapper over seaborn.pairplot to visualize pairwise correlation, with a log option

k_seq.utility.plot_tools.plot_curve(model, x=None, y=None, param=None, major_param=None, subsample=20, x_label=None, y_label=None, x_lim=None, y_lim=None, major_curve_kwargs=None, curve_kwargs=None, datapoint_kwargs=None, major_curve_label='major curve', curve_label='curves', datapoint_label='data', legend=False, legend_loc='upper right', fontsize=12, ax=None, x_tick_formatter=None, y_tick_formatter=None, **kwargs)¶

Plot fitting results with i) data points (scatter), ii) major curve, iii) a set of curves (e.g. from bootstrap or convergence)

Parameters

model (callable) – kinetic model returns y with first argument as x
x (list-like) – x values of data
y (list-like) – y values of corresponding x data, same length
param (dict or pd.DataFrame) – estimated parameters for model from fitting(s)
major_param (dict) – a major parameter estimated
subsample (int) – maximal num of fitting curves to show
x_label (str) – x axis label name
y_label (str) – y axis label name
x_lim (2-tuple) – lower and upper limit of x axis
y_lim (2-tuple) – lower and upper limit of y axis
major_curve_kwargs (dict) – plot arguments for major curve
curve_kwargs (dict) – plot arguments for other curves
datapoint_kwargs (dict) – scatter plot arguments for data points
major_curve_label (str) – label on legend for major curve
curve_label (str) – label on legend for other curves
datapoint_label (str) – label on legend for data points
legend (bool) – whether to show the legend
legend_loc (str or 4-tuple) – specify the location of legend, default is upper right
ax (plt.Axes) – Axes to plot on

k_seq.utility.plot_tools.plot_loss_heatmap(model, x, y, param, param_name, param1_range, param2_range, param_log=False, resolution=100, subsample=20, fixed_params=None, cost_fn=None, z_log=True, datapoint_color='#E45756', datapoint_label='data', datapoint_kwargs=None, colorbar=True, legend=False, legend_loc='upper left', fontsize=12, ax=None, tick_num=3, x_tick_formatter=None, y_tick_formatter=None)¶

Plot a heatmap to show the energy landscape for cost function, on two params

Parameters

model (callable) – kinetic model with first argument as x. Broadcast should be implemented with x as the innermost dimension
x (list-like) – x values of data
y (list-like) – y values of corresponding x data, same length
param (dict or pd.DataFrame) – estimated parameters for model from fitting(s)
param_name (2-tuple of str) – name for two params to scan
scan_range (dict of two tuple) – scan range of two parameters: {param1: (low, high), param2: (low, high)}. Note: in the model output, the dimension of param1 should always be outside the dimension of param2
fixed_params (dict) – optional. If there is any fixed params, except for the two to scan
param_log (bool or dict of bool) – whether the scan is spaced on a log scale
resolution (int or dict of int) – resolution for the two scans, default 100
cost_fn (callable) – cost function in calculating cost between y_ and y, take (y_, y). Default is mean squared error
ax (plt.Axes) – Axes to plot on
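The quantity such a heatmap displays is the cost evaluated on a grid over two parameters. The sketch below computes that grid without any plotting, using plain Python lists. It is an illustration under assumptions, not the k_seq code: `grid_values`, `sketch_loss_grid`, and `toy_model` are hypothetical names, and the toy kinetic model is chosen only for the demonstration.

```python
import math


def grid_values(low, high, resolution, log=False):
    """Evenly spaced values between low and high, optionally on a log scale."""
    if log:
        low, high = math.log10(low), math.log10(high)
    step = (high - low) / (resolution - 1)
    values = [low + step * i for i in range(resolution)]
    return [10 ** v for v in values] if log else values


def mean_squared_error(y_pred, y):
    return sum((p - t) ** 2 for p, t in zip(y_pred, y)) / len(y)


def sketch_loss_grid(model, x, y, param1_range, param2_range,
                     resolution=100, param_log=False, cost_fn=mean_squared_error):
    """Cost on a grid; param1 is the outer dimension, param2 the inner one."""
    p1_values = grid_values(*param1_range, resolution, param_log)
    p2_values = grid_values(*param2_range, resolution, param_log)
    return [[cost_fn([model(xi, p1, p2) for xi in x], y) for p2 in p2_values]
            for p1 in p1_values]


def toy_model(x, k, a):
    # Hypothetical model used only for this illustration.
    return a * (1 - math.exp(-k * x))


x = [0.5, 1.0, 2.0]
y = [toy_model(xi, 1.0, 0.5) for xi in x]  # noise-free data at k=1.0, a=0.5
loss = sketch_loss_grid(toy_model, x, y, (0.0, 2.0), (0.0, 1.0), resolution=5)
```

In this toy setup the generating parameters lie on the grid, so the lowest cost sits at `loss[2][2]`, the cell for k=1.0 and a=0.5.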

k_seq.utility.plot_tools.regplot(x, y, ax=None, xlabel=None, ylabel=None, digit=4, equation_loc='best', xlog=False, ylog=False, kwargs_scatter=None, kwargs_line=None)¶

k_seq.utility.plot_tools.savefig(save_fig_to, dpi=300, alpha=0)¶

k_seq.utility.plot_tools.value_to_loc(value, range, resolution, log)¶: Convert actual value to location on heatmap
