k_seq.utility
: miscellaneous utility functions¶
-
k_seq.utility.
mp_job
(fn, itr, n_proc=12, chunksize=None, use_fork=False, **kwargs)¶ Run jobs in the iterator in parallel
k_seq.utility.file_tools module¶
-
k_seq.utility.file_tools.
check_dir
(path)¶ Check if a path exists, create if not
-
k_seq.utility.file_tools.
dump_json
(obj, path=None, indent=2)¶ Convert object to a JSON file or JSON string
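The string-or-file behavior described above can be sketched with the standard `json` module. This is a minimal illustration, not the k_seq implementation: the name `to_json` is hypothetical, and returning `None` after writing a file is an assumption.

```python
import json


def to_json(obj, path=None, indent=2):
    """Return a JSON string when no path is given, otherwise write a file."""
    if path is None:
        return json.dumps(obj, indent=indent)
    with open(path, "w") as handle:
        json.dump(obj, handle, indent=indent)
    return None  # assumption: nothing is returned after writing to disk


# No path given, so the object is serialized to a string.
text = to_json({"id": 16, "seq_rep": "A"})
print(text)
```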
-
k_seq.utility.file_tools.
dump_pickle
(obj, path)¶ Save object as pickled file
-
k_seq.utility.file_tools.
extract_metadata
(name, template)¶ - Function to extract metadata info from a string (name, e.g. file name) given a template
indicating the position of each metadata domain.
- Parameters
name (str) – string to extract info, e.g. sample file name
template (str) – naming convention to extract metadata. Use [...] to include the region of sample_name, and use {domain_name[, int/float]} to indicate the region of a domain to extract as metadata. Including int or float converts the domain value to int/float if applicable; otherwise the value is kept as a string
- Returns
dictionary of all metadata extracted from domains indicated in
pattern
- Return type
dict
Example
Example on metadata extraction from pattern:
>>> metadata = extract_metadata(
...     name="R4B-1250A_S16_counts.txt",
...     template="R4[{exp_rep}-{concentration, float}{seq_rep}_S{id, int}]_counts.txt"
... )
>>> metadata
{'name': 'B-1250A_S16', 'exp_rep': 'B', 'concentration': 1250.0, 'seq_rep': 'A', 'id': 16}
- Notice: two back-to-back domains can only be parsed if one of them is numeric and the other is alphabetic; a missing value will raise an error
Valid: matching '-A1-' to '-{sample}{replicate, int}-' gives {'sample': 'A', 'replicate': 1}
Not valid: matching '-A-' to '-{sample}{replicate, int}-' will cause an error
matching '-AA-' to '-{sample}{replicate}-' will cause an error
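One way to picture the template syntax is to translate it into a regular expression with named groups. The sketch below is an independent illustration, not the k_seq implementation: `template_to_regex` and `parse_name` are hypothetical names, and treating untyped domains as purely alphabetic is an assumption. It does not reproduce the library's check for two adjacent untyped domains.

```python
import re

# One token is "[", "]", or "{domain}" / "{domain, int}" / "{domain, float}".
TOKEN = re.compile(r"\[|\]|\{\s*(\w+)\s*(?:,\s*(int|float))?\s*\}")


def template_to_regex(template):
    """Translate a naming template into a compiled regex plus a type map."""
    pieces, types, pos = [], {}, 0
    for match in TOKEN.finditer(template):
        # Text between tokens is literal and must match exactly.
        pieces.append(re.escape(template[pos:match.start()]))
        token = match.group(0)
        if token == "[":
            pieces.append("(?P<name>")  # start of the sample-name region
        elif token == "]":
            pieces.append(")")
        else:
            domain, kind = match.group(1), match.group(2)
            types[domain] = kind
            if kind == "int":
                body = r"\d+"
            elif kind == "float":
                body = r"\d+(?:\.\d+)?"
            else:
                body = r"[A-Za-z]+"  # assumption: untyped domains are alphabetic
            pieces.append(f"(?P<{domain}>{body})")
        pos = match.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces)), types


def parse_name(name, template):
    """Extract typed metadata from name, raising ValueError on a mismatch."""
    pattern, types = template_to_regex(template)
    match = pattern.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not match template {template!r}")
    converters = {"int": int, "float": float, None: str}
    result = match.groupdict()
    for domain, kind in types.items():
        result[domain] = converters[kind](result[domain])
    return result
```

Under these assumptions, `parse_name("-A1-", "-{sample}{replicate, int}-")` yields the sample and replicate fields, while `parse_name("-A-", "-{sample}{replicate, int}-")` raises `ValueError` because the numeric domain is missing.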
-
k_seq.utility.file_tools.
get_file_list
(file_root, pattern=None, file_list=None, black_list=None, full_path=True)¶ Return the files under the given file root that match the pattern, if one is given; folders are not included
- Parameters
file_root (str or list of str) – root directory/directories to search
pattern (str) – optional, include all the files under directories if None
file_list (list of str) – optional, only includes the files with names in the file_list if exists
black_list (list of str) – optional, file names included in black_list will be excluded
full_path (bool) – whether to return the full path or only the name of the file. By default, if file_root is one string, only the file name is returned; if file_root contains multiple strings, the full path is returned
- Returns
list of str (file names) or path.Path (full directory)
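The filtering rules above can be sketched with `pathlib`. This is an illustration only: `list_files` is a hypothetical name, interpreting `pattern` as a glob expression is an assumption, and the sketch does not reproduce the documented switch between names and full paths for single versus multiple roots.

```python
import tempfile
from pathlib import Path


def list_files(file_root, pattern=None, file_list=None, black_list=None, full_path=True):
    """List files (never folders) under one or more root directories."""
    roots = [file_root] if isinstance(file_root, (str, Path)) else list(file_root)
    found = []
    for root in roots:
        # Assumption: pattern is a glob expression; None means every entry.
        for entry in sorted(Path(root).glob(pattern or "*")):
            if not entry.is_file():
                continue  # folders are not included
            if file_list is not None and entry.name not in file_list:
                continue
            if black_list is not None and entry.name in black_list:
                continue
            found.append(entry if full_path else entry.name)
    return found


# Build a throwaway directory to show the filters in action.
with tempfile.TemporaryDirectory() as root:
    for file_name in ("a_counts.txt", "b_counts.txt", "notes.md"):
        (Path(root) / file_name).touch()
    (Path(root) / "subfolder").mkdir()
    names = list_files(root, pattern="*_counts.txt",
                       black_list=["b_counts.txt"], full_path=False)
print(names)
```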
-
k_seq.utility.file_tools.
load_tar_gz
(tarfile, file_to_extract, load_fn, gzip=True)¶
-
k_seq.utility.file_tools.
read_json
(path)¶ Read json file
-
k_seq.utility.file_tools.
read_pickle
(path)¶ Read pickled object from path
-
k_seq.utility.file_tools.
read_table_files
(file_path, col_name=None, header=1)¶ Read common seq_table files
- .xls or .xlsx: first sheet will be read with first row as header
- .csv: read the csv files with first row as header, separator is ','
- .tsv: read the tsv files with first row as header, separator is '\t'
-
k_seq.utility.file_tools.
table_object_to_dataframe
(obj, table_name=None)¶ Convert object (file path, SeqData) to pd.DataFrame
k_seq.utility.func_tools module¶
-
class
k_seq.utility.func_tools.
AttrScope
(attr_dict=None, keys=None, **attr_kwargs)¶ Bases:
object
A name scope for a group of attributes
-
__init__
(attr_dict=None, keys=None, **attr_kwargs)¶ Create a name scope for a group of attributes
- Parameters
attr_dict (dict) – a dictionary with values to pass
keys (list of str) – a list of attributes to initialize with None
attr_kwargs – or directly pass some keyword arguments
-
add
(attr_dict=None, **kwargs)¶
-
-
class
k_seq.utility.func_tools.
FuncToMethod
(functions, *args, **kwargs)¶ Bases:
object
Convert a set of functions to a collection of methods on the object
-
k_seq.utility.func_tools.
check_attr_value
(obj, **attr)¶
-
k_seq.utility.func_tools.
dict_flatten
(d, parent_key='', sep='_')¶
-
k_seq.utility.func_tools.
get_func_params
(func, required_only=True)¶ Get the names of the arguments of a function (callable), or of the arguments of __init__ for a class (self not included)
- Parameters
func (callable) – the function
required_only (bool) – whether to exclude arguments with default values
Returns: a list of argument names, in order
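A comparable lookup can be written with `inspect.signature` from the standard library. The sketch below is not the k_seq source: `func_params` is a hypothetical name, and skipping `*args`/`**kwargs` is a choice made here for illustration.

```python
import inspect


def func_params(func, required_only=True):
    """Return argument names in order; for a class, inspect its __init__."""
    is_class = inspect.isclass(func)
    target = func.__init__ if is_class else func
    params = list(inspect.signature(target).parameters.values())
    if is_class:
        params = params[1:]  # drop self
    names = []
    for param in params:
        # Assumption: variadic *args / **kwargs are not reported.
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if required_only and param.default is not param.empty:
            continue
        names.append(param.name)
    return names


def fit(x, y, sigma=None):
    """Hypothetical function used only to demonstrate the lookup."""


class Model:
    """Hypothetical class used only to demonstrate the lookup."""

    def __init__(self, name, scale=1.0):
        self.name = name
        self.scale = scale
```

With these definitions, `func_params(fit)` lists only `x` and `y`, and `func_params(Model)` lists only `name`.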
-
k_seq.utility.func_tools.
is_int
(x)¶
-
k_seq.utility.func_tools.
is_numeric
(x)¶
-
k_seq.utility.func_tools.
is_sparse
(obj)¶
-
k_seq.utility.func_tools.
param_to_dict
(key_list, **kwargs)¶ Assign kwargs to a dictionary with keys from key_list
- if the arg is a single value, it will be assigned to all keys
- if the arg is a list, it should have the same length as key_list
- if the arg is a dict, it should contain all members of key_list
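The three cases above can be sketched as follows. This is an illustration under assumptions: `broadcast_to_keys` is a hypothetical name, and the returned layout (one inner dictionary of arguments per key) is a guess, since the return structure is not documented here.

```python
def broadcast_to_keys(key_list, **kwargs):
    """Spread each keyword argument over key_list (assumed output layout)."""
    result = {key: {} for key in key_list}
    for arg_name, value in kwargs.items():
        if isinstance(value, dict):
            missing = [key for key in key_list if key not in value]
            if missing:
                raise ValueError(f"{arg_name} is missing keys: {missing}")
            per_key = [value[key] for key in key_list]
        elif isinstance(value, (list, tuple)):
            if len(value) != len(key_list):
                raise ValueError(f"{arg_name} must have {len(key_list)} items")
            per_key = list(value)
        else:
            per_key = [value] * len(key_list)  # single value goes to every key
        for key, item in zip(key_list, per_key):
            result[key][arg_name] = item
    return result


styles = broadcast_to_keys(
    ["a", "b"],
    color="red",                  # single value
    size=[1, 2],                  # list, same length as key_list
    marker={"a": "o", "b": "x"},  # dict, covers every key
)
print(styles)
```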
-
k_seq.utility.func_tools.
run_subprocess
(cmd, name=None, **kwargs)¶
-
k_seq.utility.func_tools.
update_none
(arg, update_by)¶ Update arguments with some default value
- Parameters
arg – variable object
update_by – variable object
k_seq.utility.plot_tools module¶
This module contains project level utility functions
-
class
k_seq.utility.plot_tools.
Presets
¶ Bases:
object
Collection of preset colors/markers
-
static
color_cat10
(num=5)¶
-
static
color_pastel1
(num=5)¶
-
static
color_tab10
(num=5)¶
-
static
from_list
(prop_list)¶
-
static
markers
(num=5, with_line=False)¶
-
static
-
k_seq.utility.plot_tools.
ax_none
(ax, figsize=None)¶
-
k_seq.utility.plot_tools.
barplot
(series, ax, label=None, yticklabels=None, barplot_kwargs=None)¶ General barplot for single series
-
k_seq.utility.plot_tools.
blue_header
(header)¶
-
class
k_seq.utility.plot_tools.
color
¶ Bases:
object
-
BLUE
= '\x1b[94m'¶
-
BOLD
= '\x1b[1m'¶
-
CYAN
= '\x1b[96m'¶
-
DARKCYAN
= '\x1b[36m'¶
-
END
= '\x1b[0m'¶
-
GREEN
= '\x1b[92m'¶
-
PURPLE
= '\x1b[95m'¶
-
RED
= '\x1b[91m'¶
-
UNDERLINE
= '\x1b[4m'¶
-
YELLOW
= '\x1b[93m'¶
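The attributes above are ANSI escape sequences for terminal output. A minimal sketch of how such codes are typically combined is shown below; `colorize` is a hypothetical helper, and only the three constant values are taken from the listing.

```python
# Values copied from the listing above.
BLUE = "\x1b[94m"
BOLD = "\x1b[1m"
END = "\x1b[0m"


def colorize(text, *styles):
    """Prefix text with the given escape codes and reset the style afterwards."""
    return "".join(styles) + text + END


header = colorize("Fitting results", BOLD, BLUE)
print(header)  # rendered bold and blue on terminals that support ANSI codes
```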
-
-
k_seq.utility.plot_tools.
format_ticks
(ax, axis, tick_num=3, log=False, int_tick_only=False, tick_formatter=None)¶ Manual formatting for figure ticks
-
k_seq.utility.plot_tools.
pairplot
(data, vars_name=None, vars_lim=None, vars_log=None, figsize=(2, 2), **kwargs)¶ Wrapper over seaborn.pairplot to visualize pairwise correlation with log option
-
k_seq.utility.plot_tools.
plot_curve
(model, x=None, y=None, param=None, major_param=None, subsample=20, x_label=None, y_label=None, x_lim=None, y_lim=None, major_curve_kwargs=None, curve_kwargs=None, datapoint_kwargs=None, major_curve_label='major curve', curve_label='curves', datapoint_label='data', legend=False, legend_loc='upper right', fontsize=12, ax=None, x_tick_formatter=None, y_tick_formatter=None, **kwargs)¶ - Plot fitting results with
i) data points (scatter), ii) major curve, iii) a set of curves (e.g. from bootstrap or convergence)
- Parameters
model (callable) – kinetic model returns y with first argument as x
x (list-like) – x values of data
y (list-like) – y values of corresponding x data, same length
param (dict or pd.DataFrame) – estimated parameters for model from fitting(s)
major_param (dict) – a major parameter estimated
subsample (int) – maximal num of fitting curves to show
x_label (str) – x axis label name
y_label (str) – y axis label name
x_lim (2-tuple) – lower and upper limit of x axis
y_lim (2-tuple) – lower and upper limit of y axis
major_curve_kwargs (dict) – plot arguments for major curve
curve_kwargs (dict) – plot arguments for other curves
datapoint_kwargs (dict) – scatter plot arguments for data points
major_curve_label (str) – label on legend for major curve
curve_label (str) – label on legend for other curves
datapoint_label (str) – label on legend for data points
legend (bool) – whether to show the legend
legend_loc (str or 4-tuple) – specify the location of legend, default is upper right
ax (plt.Axes) – Axes to plot on
-
k_seq.utility.plot_tools.
plot_loss_heatmap
(model, x, y, param, param_name, param1_range, param2_range, param_log=False, resolution=100, subsample=20, fixed_params=None, cost_fn=None, z_log=True, datapoint_color='#E45756', datapoint_label='data', datapoint_kwargs=None, colorbar=True, legend=False, legend_loc='upper left', fontsize=12, ax=None, tick_num=3, x_tick_formatter=None, y_tick_formatter=None)¶ Plot a heatmap to show the energy landscape for cost function, on two params
- Parameters
model (callable) – kinetic model with first argument as x. Broadcast should be implemented with x as the innermost dimension
x (list-like) – x values of data
y (list-like) – y values of corresponding x data, same length
param (dict or pd.DataFrame) – estimated parameters for model from fitting(s)
param_name (2-tuple of str) – name for two params to scan
scan_range (dict of two tuple) –
scan range of two parameters: {param1:(low, high),
param2:(low, high)}
Note: in the model output, the dimension of param1 should always be outside the dimension of param2
fixed_params (dict) – optional, any fixed params other than the two to scan
param_log (bool or dict of bool) – whether the scan is spaced on a log scale
resolution (int or dict of int) – resolution for the two scans, default 100
cost_fn (callable) – cost function for calculating the cost between y_ and y, takes (y_, y). Default is mean squared error
ax (plt.Axes) – Axes to plot on
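The quantity behind such a heatmap is a grid of cost values over two scanned parameters. The pure-Python sketch below shows that computation only, without plotting or broadcasting. It is not the k_seq code: `loss_grid`, `linspace`, `mse` and `toy_model` are hypothetical names, and the model form is an assumption chosen for the example.

```python
import math


def mse(y_pred, y):
    """Mean squared error between predictions and data."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y)) / len(y)


def linspace(low, high, num):
    """Evenly spaced values from low to high, both ends included."""
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]


def loss_grid(model, x, y, param1_range, param2_range, resolution=5, cost_fn=mse):
    """Evaluate cost_fn on a resolution x resolution grid of two parameters."""
    p1_values = linspace(*param1_range, resolution)
    p2_values = linspace(*param2_range, resolution)
    # param1 indexes the outer (row) dimension, param2 the inner (column) one.
    grid = [[cost_fn([model(xi, p1, p2) for xi in x], y) for p2 in p2_values]
            for p1 in p1_values]
    return p1_values, p2_values, grid


def toy_model(x, k, a):
    """Hypothetical saturating model, used only for this example."""
    return a * (1 - math.exp(-k * x))


x = [0.5, 1.0, 2.0, 4.0]
y = [toy_model(xi, 1.0, 0.5) for xi in x]  # noise-free data at k=1.0, a=0.5
k_values, a_values, grid = loss_grid(toy_model, x, y, (0.0, 2.0), (0.0, 1.0))
```

Because the data are generated without noise, the grid cell at the generating parameters has zero cost, which is where a heatmap of this grid would show its minimum.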
-
k_seq.utility.plot_tools.
regplot
(x, y, ax=None, xlabel=None, ylabel=None, digit=4, equation_loc='best', xlog=False, ylog=False, kwargs_scatter=None, kwargs_line=None)¶
-
k_seq.utility.plot_tools.
savefig
(save_fig_to, dpi=300, alpha=0)¶
-
k_seq.utility.plot_tools.
value_to_loc
(value, range, resolution, log)¶ Convert actual value to location on heatmap
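A value-to-position mapping of this kind can be sketched as a linear interpolation, optionally in log space. The sketch is an assumption about the intended behavior, not the k_seq source: `value_to_index` is a hypothetical name, and mapping the range ends to indices 0 and resolution - 1 is a choice made here.

```python
import math


def value_to_index(value, value_range, resolution, log=False):
    """Map a value inside value_range to a fractional grid index."""
    low, high = value_range
    if log:
        # Interpolate in log10 space when the axis is log-scaled.
        value, low, high = math.log10(value), math.log10(low), math.log10(high)
    return (value - low) / (high - low) * (resolution - 1)


linear_loc = value_to_index(5, (0, 10), resolution=101)
log_loc = value_to_index(10, (1, 100), resolution=101, log=True)
```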