nuad documentation

nuad.constraints

This module defines types for helping to define DNA sequence design constraints.

The key classes are Design, Strand, Domain to define a DNA design, and various subclasses of Constraint, such as StrandConstraint or StrandPairConstraint, to define constraints on the sequences assigned to each Domain when calling search.search_for_dna_sequences().

Also important are two other types of constraints (not subclasses of Constraint), which are used prior to the search to determine if it is even legal to use a DNA sequence: subclasses of the abstract base class NumpyFilter, and SequenceFilter, an alias for a function taking a string as input and returning a bool.

See the README on the GitHub page for a more detailed explanation of these classes: https://github.com/UC-Davis-molecular-computing/dsd#data-model

constraints.all_dna_bases = {'A', 'C', 'G', 'T'}

Set of all DNA bases.

class constraints.M13Variant(value)[source]

Variants of M13mp18 viral genome. “Standard” variant is p7249. Other variants are longer.

p7249 = 'p7249'

“Standard” variant of M13mp18; 7249 bases long, available from, for example

https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p7249

https://www.neb.com/products/n4040-m13mp18-single-stranded-dna

http://www.bayoubiolabs.com/biochemicat/vectors/pUCM13/

p7560 = 'p7560'

Variant of M13mp18 that is 7560 bases long. Available from, for example

https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p7560

p8064 = 'p8064'

Variant of M13mp18 that is 8064 bases long. Available from, for example

https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p8064

length()[source]
Returns

length of this variant of M13 (e.g., 7249 for variant M13Variant.p7249)

Return type

int

constraints.m13(rotation=5587, variant=M13Variant.p7249)[source]

The M13mp18 DNA sequence (commonly called simply M13).

By default, starts from cyclic rotation 5587 (with 0-based indexing; commonly this is called rotation 5588, which assumes that indexing begins at 1), as defined in GenBank.

By default, returns the “standard” variant consisting of 7249 bases, sold by companies such as Tilibit and New England Biolabs.

For a more detailed discussion of why the default rotation 5587 of M13 is used, see Supplementary Note S8 in [Folding DNA to create nanoscale shapes and patterns. Paul W. K. Rothemund, Nature 440:297-302 (2006)].

Parameters
  • rotation (int) – rotation of circular strand. Valid values are 0 through length-1.

  • variant (M13Variant) – variant of M13 strand to use

Returns

M13 strand sequence

Return type

str
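The effect of the rotation parameter can be sketched on a toy circular sequence. This is a minimal illustration, assuming that rotation r means the returned string begins at 0-based index r of the reference sequence and wraps around; the helper rotate is hypothetical and not part of the library.

```python
def rotate(circular_seq: str, rotation: int) -> str:
    """Re-linearize a circular sequence so it starts at 0-based index `rotation`."""
    if not 0 <= rotation < len(circular_seq):
        raise ValueError('rotation must be between 0 and length-1')
    # everything from `rotation` to the end, followed by the wrapped-around prefix
    return circular_seq[rotation:] + circular_seq[:rotation]


toy = 'AACCGGTT'          # stand-in for the much longer M13 sequence
rotated = rotate(toy, 3)  # begins with toy[3]
```

A rotation of 0 leaves the sequence unchanged, and every rotation has the same length as the original.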

constraints.m13_substrings_of_length(length, except_indices=(5514, 5515, 5516, 5517, 5518, 5519, 5520, 5521, 5522, 5523, 5524, 5525, 5526, 5527, 5528, 5529, 5530, 5531, 5532, 5533, 5534, 5535, 5536, 5537, 5538, 5539, 5540, 5541, 5542, 5543, 5544, 5545, 5546, 5547, 5548, 5549, 5550, 5551, 5552, 5553, 5554, 5555, 5556), variant=M13Variant.p7249)[source]

WARNING: This function was previously recommended to use with DomainPool.possible_sequences to specify possible rotations of M13 to use. However, it creates a large file size to write all those sequences to disk on every update in the search. A better method now exists to specify this, which is to specify a SubstringSampler object as the value for DomainPool.possible_sequences instead of calling this function.

Return all substrings of the M13mp18 DNA sequence of length length, except those overlapping indices in except_indices.

This is useful with the field DomainPool.possible_sequences, when one strand in the Design represents a small portion of the full M13 sequence, and part of the sequence design process is to choose a rotation of M13 to use. One can set that strand to have a single Domain, which contains dependent subdomains (those with Domain.dependent set to True). These subdomains are the smaller domains where M13 attaches to other Strand’s in the Design. Then, give the parent Domain a DomainPool with DomainPool.possible_sequences set to the return value of this function, to allow the search to explore different rotations of M13.

For example, suppose m13_subdomains is a list containing Domain’s from the Design, which are consecutive subdomains of M13 from 5’ to 3’ (all with Domain.dependent set to True), and m13_length is the sum of their lengths (note this needs to be calculated manually since the following code assumes no Domain in m13_subdomains has a DomainPool yet, thus none yet have a length). Then the following code creates a Strand representing the M13 portion that binds to other Strand’s in the Design.

import nuad.constraints as dc
m13_subdomains = ...  # list of subdomains of M13 used in the design
m13_length = ...  # sum of lengths of domains in m13_subdomains
m13_substrings = dc.m13_substrings_of_length(m13_length)
m13_domain_pool = dc.DomainPool(name='m13 domain pool', possible_sequences=m13_substrings)
m13_domain = dc.Domain(name='m13', subdomains=m13_subdomains, pool=m13_domain_pool)
m13_strand = dc.Strand(name='m13', domains=[m13_domain])
Parameters
  • length (int) – length of substrings to return

  • except_indices (Iterable[int]) – Indices of M13 to avoid in any part of the substring. If not specified, based on length, indices 5514-5556 are avoided, which are known to contain a long hairpin. (When using 1-based indexing, these are indices 5515-5557.) For example, if length = 10, then the starting indices of substrings will avoid the list [5505, 5506, …, 5556]

  • variant (M13Variant) – M13Variant to use

Returns

All substrings of the M13mp18 DNA sequence, except those that overlap any index in except_indices.

Return type

List[str]
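The relationship between except_indices and the excluded starting positions can be sketched as follows. This is an illustration only, treating the sequence as linear; the function start_indices_to_avoid is a hypothetical helper, not a library function.

```python
from typing import Iterable, List


def start_indices_to_avoid(length: int, except_indices: Iterable[int]) -> List[int]:
    """Start indices s whose window [s, s + length) touches any excluded index."""
    avoid = set()
    for idx in except_indices:
        # a window of this length covers idx exactly when it starts
        # somewhere in [idx - length + 1, idx]
        for start in range(max(0, idx - length + 1), idx + 1):
            avoid.add(start)
    return sorted(avoid)


hairpin = range(5514, 5557)  # 0-based indices 5514-5556
avoided = start_indices_to_avoid(10, hairpin)
```

With length 10, the avoided starting indices run from 5505 through 5556, matching the example in the description of except_indices.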

constraints.default_score_transfer_function(x)[source]

A quadratic transfer function.

Returns

max(0.0, x^2)

Parameters

x (float) –

Return type

float

constraints.logger = <Logger dsd (DEBUG)>

Global logger instance used throughout dsd.

Call logger.removeHandler(logger.handlers[0]) to stop screen output (assuming that you haven’t added or removed any handlers to the dsd logger instance already; by default there is one StreamHandler, and removing it will stop screen output).

Call logger.addHandler(logging.FileHandler(filename)) to direct to a file.
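The two calls above can be combined to redirect output from the screen to a file. The sketch below uses only the standard logging module with a stand-in logger (the name dsd_example and the file name search.log are placeholders), so that it runs without the library installed; with the library, the same two calls would be made on constraints.logger.

```python
import logging
import os
import tempfile

# stand-in for the library's logger: one StreamHandler attached by default
logger = logging.getLogger('dsd_example')
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.StreamHandler())

# stop screen output by removing the single default handler
logger.removeHandler(logger.handlers[0])

# direct subsequent messages to a file instead
log_path = os.path.join(tempfile.mkdtemp(), 'search.log')
logger.addHandler(logging.FileHandler(log_path))
logger.info('search started')
```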

constraints.all_pairs(values, with_replacement=True, where=<function <lambda>>)[source]

Strongly typed function to get list of all pairs from iterable. (for using with mypy)

Parameters
  • values (Iterable[T]) – Iterable of values.

  • with_replacement (bool) – Whether to include self pairs, i.e., pairs (a,a)

  • where (Callable[[T, T], bool]) – Predicate indicating whether to include a specific pair. Must take two parameters, each of type T, and return a bool.

Returns

List of all pairs of values from iterable.

Return type

List[Tuple[T, T]]
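The documented behavior can be approximated with itertools. This is a sketch of the semantics described above (unordered pairs, optional self pairs, a two-argument predicate), not the library's implementation; all_pairs_sketch is a hypothetical name.

```python
from itertools import combinations, combinations_with_replacement
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar('T')


def all_pairs_sketch(values: Iterable[T],
                     with_replacement: bool = True,
                     where: Callable[[T, T], bool] = lambda a, b: True) -> List[Tuple[T, T]]:
    """List of unordered pairs; self pairs (a, a) included only if with_replacement."""
    make_pairs = combinations_with_replacement if with_replacement else combinations
    return [(a, b) for a, b in make_pairs(list(values), 2) if where(a, b)]


with_self = all_pairs_sketch(['a', 'b', 'c'])
without_self = all_pairs_sketch(['a', 'b', 'c'], with_replacement=False)
same_length = all_pairs_sketch(['AC', 'GT', 'ACG'], where=lambda s, t: len(s) == len(t))
```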

constraints.all_pairs_iterator(values, with_replacement=True, where=<function <lambda>>)[source]

Strongly typed function to get iterator of all pairs from iterable. (for using with mypy)

This is WITH replacement; to specify without replacement, set with_replacement = False

Parameters
  • values (Iterable[T]) – Iterable of values.

  • with_replacement (bool) – Whether to include self pairs, i.e., pairs (a,a)

  • where (Callable[[Tuple[T, T]], bool]) – Predicate indicating whether to include a specific pair.

Returns

Iterator of all pairs of values from iterable. Unlike all_pairs(), which returns a list, the iterator returned may be iterated over only ONCE.

Return type

Iterator[Tuple[T, T]]
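The single-pass nature of an iterator can be seen with a small stand-in built on itertools. The function pairs_iterator_sketch is hypothetical and only mimics the described return type.

```python
from itertools import combinations_with_replacement
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar('T')


def pairs_iterator_sketch(values: Iterable[T]) -> Iterator[Tuple[T, T]]:
    """Iterator over unordered pairs, including self pairs."""
    return combinations_with_replacement(list(values), 2)


pairs = pairs_iterator_sketch(['x', 'y'])
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing is left to iterate over
```

If the pairs are needed more than once, store them in a list (or call all_pairs() instead).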

constraints.SequenceFilter = typing.Callable[[str], bool]

Filter (see description of NumpyFilter for explanation of the term “filter”) that applies to a DNA sequence; the difference between this and a DomainConstraint is that these are applied before a sequence is assigned to a Domain, so the constraint can only be based on the DNA sequence, and not, for instance, on the Domain’s DomainPool.

Consequently SequenceFilter’s, like NumpyFilter’s, are treated differently than subtypes of Constraint, since a DNA sequence failing any SequenceFilter’s or NumpyFilter’s is never allowed to be assigned into any Domain.

The difference with NumpyFilter is that a NumpyFilter requires one to express the constraint in a way that is efficient for the linear algebra operations of numpy. If you cannot figure out how to do this, a SequenceFilter can be expressed in pure Python, but typically will be much slower to apply than a NumpyFilter.

alias of Callable[[str], bool]
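Since a SequenceFilter is just a function from str to bool, writing one requires no library machinery. The sketch below re-declares the alias locally so that it is self-contained; the filter names and the GC threshold are illustrative choices, not part of the library.

```python
from typing import Callable, List

SequenceFilter = Callable[[str], bool]  # same shape as the alias documented above


def no_four_gs(sequence: str) -> bool:
    """Reject sequences containing GGGG."""
    return 'GGGG' not in sequence


def gc_fraction_at_most(limit: float) -> SequenceFilter:
    """Build a filter that accepts sequences whose C/G fraction is at most `limit`."""
    def check(sequence: str) -> bool:
        gc = sum(1 for base in sequence if base in 'CG')
        return gc / len(sequence) <= limit
    return check


filters: List[SequenceFilter] = [no_four_gs, gc_fraction_at_most(0.6)]


def passes_all(sequence: str) -> bool:
    """A sequence is usable only if every filter accepts it."""
    return all(f(sequence) for f in filters)
```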

class constraints.NumpyFilter[source]

Abstract base class for numpy filters. A “filter” is a hard constraint applied to sequences for a Domain; a sequence not passing the filter is never allowed to be assigned to a Domain. This contrasts with the various subclasses of Constraint, which differ in two ways: 1) they can apply to larger parts of the design than just a domain, e.g., a Strand or a pair of Domain’s, and 2) they are “soft” constraints that are allowed to be violated during the course of the search.

A NumpyFilter is one that can be efficiently encoded as numpy operations on 2D arrays of bytes representing DNA sequences, through the class np.DNASeqList (which uses such a 2D array as the field np.DNASeqList.seqarr).

Subclasses should set the value NumpyFilter.name, inherited from this class.

Pre-made subclasses of NumpyFilter provided in this library, such as RestrictBasesFilter or NearestNeighborEnergyFilter, are dataclasses (https://docs.python.org/3/library/dataclasses.html). There is no requirement that custom subclasses be dataclasses, but since the subclasses will inherit the field NumpyFilter.name, you can easily make them dataclasses to get, for example, free repr and str implementations. See the source code for examples.

The related type SequenceFilter (which is just an alias for a Python function with a certain signature) has a similar purpose, but is used for filters that cannot be encoded as numpy operations. Since they are applied by running a Python loop, they are much slower to evaluate than a NumpyFilter.

name: str = 'TODO: give a concrete name to this NumpyFilter'

Name of this NumpyFilter.

abstract remove_violating_sequences(seqs)[source]

Subclasses should override this method.

Since these are filters that use numpy, generally they will access the numpy ndarray instance seqs.seqarr, operate on it, and then create a new np.DNASeqList instance via the constructor np.DNASeqList taking a numpy ndarray as input.

See the source code of included constraints for examples, such as NearestNeighborEnergyFilter.remove_violating_sequences() or BaseCountFilter.remove_violating_sequences(). These are usually quite tricky to write, requiring one to think in terms of linear algebra operations. The code tends not to be easy to read. But when a constraint can be expressed in this way, it is typically very fast to apply; many millions of sequences can be processed in a few seconds.

Parameters

seqs (DNASeqList) – np.DNASeqList object representing DNA sequences

Returns

a new np.DNASeqList object representing the DNA sequences in seqs that satisfy the constraint

Return type

DNASeqList
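The style of computation involved can be illustrated with plain numpy, without np.DNASeqList. The sketch below assumes sequences are stored one per row as ASCII byte values, which may differ from the encoding the library uses internally; all function names are hypothetical.

```python
import numpy as np


def to_array(sequences):
    """Pack equal-length DNA strings into a 2D uint8 array, one row per sequence."""
    return np.array([list(seq.encode('ascii')) for seq in sequences], dtype=np.uint8)


def keep_rows_with_base_count(seqarr, base, low, high):
    """Keep rows whose number of occurrences of `base` lies in [low, high]."""
    counts = (seqarr == ord(base)).sum(axis=1)  # one count per row, no Python loop
    mask = (low <= counts) & (counts <= high)
    return seqarr[mask]


def to_strings(seqarr):
    """Convert rows of byte values back to Python strings."""
    return [bytes(row.tolist()).decode('ascii') for row in seqarr]


arr = to_array(['AAAA', 'ACGT', 'GGGT', 'GGAA'])
kept = to_strings(keep_rows_with_base_count(arr, 'G', 1, 2))
```

The comparison and the sum operate on the whole array at once, which is what makes this kind of filter fast on large batches of sequences.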

__init__()
Return type

None

class constraints.RestrictBasesFilter(bases)[source]

Restricts the sequence to use only a subset of bases. This can be used to implement a so-called “three-letter code”, for instance, in which a certain subset of Strand’s use only the bases A, T, C (and Strand’s with complementary Domain’s use only A, T, G), to help reduce secondary structure of those Strand’s. See for example Supplementary Section S1.1 of “Scaling Up Digital Circuit Computation with DNA Strand Displacement Cascades”, Qian and Winfree, Science 332:1196–1201, 2011. DOI: 10.1126/science.1200520, https://science.sciencemag.org/content/332/6034/1196, http://www.qianlab.caltech.edu/seesaw_digital_circuits2011_SI.pdf

Note, however, that this is a filter for Domain’s, not whole Strand’s, so for a three-letter code to work, you must take care not to mix Domain’s on a Strand that will use different alphabets.

Parameters

bases (Collection[str]) –

bases: Collection[str]

Bases to use. Must be a strict subset of {‘A’, ‘C’, ‘G’, ‘T’} with at least two bases.

remove_violating_sequences(seqs)[source]

Should never be called directly; it is handled specially by the library when initially generating sequences.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList

__init__(bases)
Parameters

bases (Collection[str]) –

Return type

None
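The three-letter-code idea can be checked on a small scale in pure Python: sequences over {A, T, C} have Watson-Crick complements over {A, T, G}. This sketch does not use RestrictBasesFilter itself; the helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def wc_complement(sequence: str) -> str:
    """Watson-Crick complement, read 5' to 3' (i.e., the reverse complement)."""
    return ''.join(COMPLEMENT[base] for base in reversed(sequence))


def sequences_over(bases, length):
    """All sequences of the given length using only the given bases."""
    return [''.join(p) for p in product(sorted(bases), repeat=length)]


three_letter = sequences_over({'A', 'T', 'C'}, 3)
complement_alphabet = {base for seq in three_letter for base in wc_complement(seq)}
```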

class constraints.NearestNeighborEnergyFilter(low_energy, high_energy, temperature=37.0)[source]

This constraint calculates the nearest-neighbor binding energy of a domain with its perfect complement (summing over all length-2 substrings of the domain’s sequence), using parameters from the 2004 SantaLucia and Hicks paper (https://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.32.110601.141800, see Table 1, and example on page 419). It rejects any sequences whose energy according to this sum is outside the range [NearestNeighborEnergyFilter.low_energy, NearestNeighborEnergyFilter.high_energy].

Parameters
  • low_energy (float) –

  • high_energy (float) –

  • temperature (float) –

low_energy: float

Low threshold for nearest-neighbor energy.

high_energy: float

High threshold for nearest-neighbor energy.

temperature: float = 37.0

Temperature in Celsius at which to calculate nearest-neighbor energy.

remove_violating_sequences(seqs)[source]

Remove sequences with nearest-neighbor energies outside of an interval.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList

__init__(low_energy, high_energy, temperature=37.0)
Parameters
  • low_energy (float) –

  • high_energy (float) –

  • temperature (float) –

Return type

None
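The structure of the calculation (a sum over all length-2 substrings, followed by a range check) can be sketched as below. The per-pair values here are made-up placeholders chosen only to make the arithmetic easy to follow; they are NOT the published SantaLucia and Hicks parameters, and the sketch ignores temperature and any other terms the library may include.

```python
from typing import List


def placeholder_pair_energy(pair: str) -> float:
    """Illustrative value only (not a published parameter): more C/G, more negative."""
    strong = sum(1 for base in pair if base in 'CG')
    return -1.0 - 0.5 * strong


def nearest_neighbor_sum(sequence: str) -> float:
    """Sum the pair value over every length-2 substring of the sequence."""
    return sum(placeholder_pair_energy(sequence[i:i + 2])
               for i in range(len(sequence) - 1))


def within_range(sequences: List[str], low_energy: float, high_energy: float) -> List[str]:
    """Keep sequences whose summed value lies in [low_energy, high_energy]."""
    return [s for s in sequences
            if low_energy <= nearest_neighbor_sum(s) <= high_energy]


kept = within_range(['AAAA', 'ACGT', 'GGGG'], -5.5, -4.0)
```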

class constraints.BaseCountFilter(base, high_count=None, low_count=None)[source]

Restricts the sequence to contain a certain number of occurrences of a given base.

Parameters
  • base (str) –

  • high_count (int | None) –

  • low_count (int | None) –

base: str

Base to count.

high_count: int | None = None

Count of BaseCountFilter.base must be at most BaseCountFilter.high_count.

low_count: int | None = None

Count of BaseCountFilter.base must be at least BaseCountFilter.low_count.

remove_violating_sequences(seqs)[source]

Remove sequences whose counts of a certain base are outside of an interval.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList

__init__(base, high_count=None, low_count=None)
Parameters
  • base (str) –

  • high_count (int | None) –

  • low_count (int | None) –

Return type

None

class constraints.BaseEndFilter(bases, distance_from_end=0, five_prime=True, three_prime=True)[source]

Restricts the sequence to contain only certain bases on (or near, if BaseEndFilter.distance_from_end > 0) each end.

Parameters
  • bases (Collection[str]) –

  • distance_from_end (int) –

  • five_prime (bool) –

  • three_prime (bool) –

bases: Collection[str]

Bases to require on ends.

distance_from_end: int = 0

Distance from end.

five_prime: bool = True

Whether to apply to 5’ end of sequence (left end of DNA sequence, lowest index).

three_prime: bool = True

Whether to apply to 3’ end of sequence (right end of DNA sequence, highest index).

remove_violating_sequences(seqs)[source]

Keeps sequences with the given bases at given distance from the 5’ or 3’ end.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList

__init__(bases, distance_from_end=0, five_prime=True, three_prime=True)
Parameters
  • bases (Collection[str]) –

  • distance_from_end (int) –

  • five_prime (bool) –

  • three_prime (bool) –

Return type

None
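One reading of these fields, written as a pure-Python check on a single sequence, is sketched below. It assumes that only the base exactly distance_from_end positions in from each selected end is examined; ends_ok is a hypothetical helper, not the library's numpy implementation.

```python
from typing import Collection


def ends_ok(sequence: str, bases: Collection[str], distance_from_end: int = 0,
            five_prime: bool = True, three_prime: bool = True) -> bool:
    """Check the base at the given distance from the 5' (left) and/or 3' (right) end."""
    if five_prime and sequence[distance_from_end] not in bases:
        return False
    if three_prime and sequence[-1 - distance_from_end] not in bases:
        return False
    return True
```

For instance, ends_ok('ACTTG', {'C', 'G'}) is False because the 5’ base is A, whereas passing five_prime=False checks only the 3’ end and accepts the sequence.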

class constraints.BaseAtPositionFilter(bases, position)[source]

Restricts the sequence to contain only certain base(s) at a particular position.

One use case is that many internal modifications (e.g., biotin or fluorophore) can only be placed on a T.

Parameters
  • bases (str | Collection[str]) –

  • position (int) –

bases: str | Collection[str]

Base(s) to require at position BaseAtPositionFilter.position.

Can either be a single base, or a collection (e.g., list, tuple, set). If several bases are specified, the base at BaseAtPositionFilter.position must be one of the bases in BaseAtPositionFilter.bases.

position: int

Position of base to check.

remove_violating_sequences(seqs)[source]

Remove sequences that don’t have one of the given bases at the given position.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList

__init__(bases, position)
Parameters
  • bases (str | Collection[str]) –

  • position (int) –

Return type

None

class constraints.ForbiddenSubstringFilter(substrings, indices=None)[source]

Restricts the sequence not to contain a certain substring(s), e.g., GGGG.

Parameters
  • substrings (str | Collection[str]) –

  • indices (Sequence[int] | None) –

substrings: str | Collection[str]

Substring(s) to forbid.

Can either be a single substring, or a collection (e.g., list, tuple, set). If a collection, all substrings must have the same length.

indices: Sequence[int] | None = None

Indices at which to check for each substring in ForbiddenSubstringFilter.substrings. If not specified, all appropriate indices are checked.

length()[source]
Returns

length of substring(s) to check

Return type

int

remove_violating_sequences(seqs)[source]

Remove sequences that have a string in ForbiddenSubstringFilter.substrings as a substring.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList

__init__(substrings, indices=None)
Parameters
  • substrings (str | Collection[str]) –

  • indices (Sequence[int] | None) –

Return type

None
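The check described by these fields can be sketched on a single Python string. This assumes that indices refers to the starting positions at which a forbidden substring is looked for; has_forbidden is a hypothetical helper rather than the library's vectorized implementation.

```python
from typing import Collection, Optional, Sequence, Union


def has_forbidden(sequence: str,
                  substrings: Union[str, Collection[str]],
                  indices: Optional[Sequence[int]] = None) -> bool:
    """True if any forbidden substring starts at one of the checked indices."""
    subs = [substrings] if isinstance(substrings, str) else list(substrings)
    sub_len = len(subs[0])
    if any(len(s) != sub_len for s in subs):
        raise ValueError('all substrings must have the same length')
    if indices is None:
        # by default check every position where a substring of this length fits
        indices = range(len(sequence) - sub_len + 1)
    return any(sequence[i:i + sub_len] in subs for i in indices)
```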

class constraints.RunsOfBasesFilter(bases, length)[source]

Restricts the sequence not to contain runs of a certain length from a certain subset of bases (e.g., forbidding any substring in {C,G}^4: no four bases can appear in a row that are each either C or G).

This works by simply generating all strings representing the runs of bases, and then using a ForbiddenSubstringFilter with those strings. So this will not be efficient for forbidding, for example, {A,C,T}^20 (i.e., all runs of A’s, C’s, or T’s of length 20), which would generate all 3^20 = 3,486,784,401 strings of length 20 from the alphabet {A,C,T}. Hopefully such a constraint would not be used in practice.

Parameters
  • bases (Collection[str]) –

  • length (int) –

__init__(bases, length)[source]
Parameters
  • bases (str | Collection[str]) – Can either be a single base, or a collection (e.g., list, tuple, set).

  • length (int) – length of run to forbid

Return type

None

bases: Collection[str]

Bases to forbid in runs of length RunsOfBasesFilter.length.

length: int

Length of run to forbid.

remove_violating_sequences(seqs)[source]

Remove sequences that have a run of given length of bases from given bases.

Parameters

seqs (DNASeqList) –

Return type

DNASeqList
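The enumeration of runs described above can be reproduced with itertools.product, which also makes the exponential growth in the number of forbidden strings concrete. The helper runs_to_forbid is hypothetical.

```python
from itertools import product


def runs_to_forbid(bases, length):
    """All strings of the given length drawn from the given bases."""
    return [''.join(run) for run in product(sorted(bases), repeat=length)]


cg_runs = runs_to_forbid({'C', 'G'}, 4)  # every length-4 run of C's and G's
count_for_act_20 = 3 ** 20               # number of strings for {A,C,T} at length 20
```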

class constraints.SubstringSampler(supersequence, substring_length, except_start_indices=None, except_overlapping_indices=None, circular=False)[source]

A SubstringSampler is an object for specifying a common case for the field DomainPool.possible_sequences, namely where we want the set of possible sequences to be all (or many) substrings of a single longer sequence.

For example, this can be used to choose a rotation of the M13mp18 strand in sequence design. If for example 300 consecutive bases of M13 will be used in the design, and we want to choose the rotation, but disallow the substring of length 300 to overlap the hairpin at indices 5514-5556, then one would do the following

possible_sequences = SubstringSampler(
    supersequence=m13(), substring_length=300,
    except_overlapping_indices=range(5514, 5557), circular=True)
pool = DomainPool('M13 rotations', possible_sequences=possible_sequences)
Parameters
  • supersequence (str) –

  • substring_length (int) –

  • except_start_indices (Tuple[int]) –

  • except_overlapping_indices (Iterable[int] | None) –

  • circular (bool) –

__init__(supersequence, substring_length, except_start_indices=None, except_overlapping_indices=None, circular=False)[source]
Parameters
  • supersequence (str) –

  • substring_length (int) –

  • except_start_indices (Iterable[int] | None) –

  • except_overlapping_indices (Iterable[int] | None) –

  • circular (bool) –

Return type

None

supersequence: str

The longer sequence from which to sample substrings.

substring_length: int

Length of substrings to sample.

circular: bool

Whether SubstringSampler.supersequence is circular. If so, then we can sample indices near the end and the substrings will start at the end and wrap around to the start.

except_start_indices: Tuple[int]

Start indices in SubstringSampler.supersequence to avoid. In the constructor this can be specified directly. Another option (mutually exclusive with the parameter except_start_indices) is to specify the parameter except_overlapping_indices, which sets SubstringSampler.except_start_indices so that substrings will not intersect any indices in except_overlapping_indices.

extended_supersequence: str

If SubstringSampler.circular is True, then this is SubstringSampler.supersequence extended by its own prefix of length SubstringSampler.substring_length - 1, to make sampling easier. Otherwise it is simply identical to SubstringSampler.supersequence. Computed in constructor from other arguments.

start_indices: Tuple[int]

List of start indices from which to sample when calling SubstringSampler.sample_substring(). Computed in constructor from other arguments.

sample_substring(rng)[source]
Returns

a random substring of SubstringSampler.supersequence of length SubstringSampler.substring_length.

Parameters

rng (Generator) –

Return type

str
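How the fields above fit together can be sketched in pure Python. This is a simplified illustration: it uses random.Random from the standard library in place of a numpy Generator, and the function names are hypothetical rather than the class's actual methods.

```python
import random
from typing import Iterable, List


def extended(supersequence: str, substring_length: int, circular: bool) -> str:
    """For circular sequences, append the prefix needed for wrap-around substrings."""
    if circular:
        return supersequence + supersequence[:substring_length - 1]
    return supersequence


def allowed_start_indices(supersequence: str, substring_length: int,
                          except_overlapping_indices: Iterable[int] = (),
                          circular: bool = False) -> List[int]:
    """Start indices whose substring touches none of the excluded indices."""
    n = len(supersequence)
    num_starts = n if circular else n - substring_length + 1
    excluded = set(except_overlapping_indices)
    starts = []
    for start in range(num_starts):
        covered = {(start + k) % n for k in range(substring_length)}
        if not covered & excluded:
            starts.append(start)
    return starts


def sample(supersequence: str, substring_length: int, starts: List[int],
           circular: bool, rng: random.Random) -> str:
    """Pick a start index uniformly at random and return the substring there."""
    start = rng.choice(starts)
    return extended(supersequence, substring_length, circular)[start:start + substring_length]


toy = 'ACGTACGTAA'
starts = allowed_start_indices(toy, 3, except_overlapping_indices=[0], circular=True)
substring = sample(toy, 3, starts, True, random.Random(0))
```

In the toy example, start indices 0, 8, and 9 are excluded because the corresponding length-3 substrings cover index 0 (the last two by wrapping around).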

class constraints.DomainPool(name, length=None, possible_sequences=None, replace_with_close_sequences=True, hamming_probability=<factory>, numpy_filters=<factory>, sequence_filters=<factory>)[source]

Represents a group of related Domain’s that share common properties in their sequence design, such as length of DNA sequence, or bounds on nearest-neighbor duplex energy.

Also serves as a “source” of DNA sequences for Domain’s in this DomainPool. By calling DomainPool.generate_sequence() repeatedly, we can produce DNA sequences satisfying the constraints defining this DomainPool.

Parameters
  • name (str) –

  • length (int | None) –

  • possible_sequences (List[str] | SubstringSampler | None) –

  • replace_with_close_sequences (bool) –

  • hamming_probability (Dict[int, float]) –

  • numpy_filters (List[NumpyFilter]) –

  • sequence_filters (List[SequenceFilter]) –

name: str

Name of this DomainPool. Must be unique.

length: int | None = None

Length of DNA sequences generated by this DomainPool.

Should be None if DomainPool.possible_sequences is specified.

possible_sequences: List[str] | SubstringSampler | None = None

If specified, all other fields except DomainPool.name and DomainPool.length are ignored. This is an explicit list of sequences to consider for Domain’s using this DomainPool. During the search, if a domain with this DomainPool is picked to have its sequence changed, then a sequence will be picked uniformly at random from this list. Note that no NumpyFilter’s or SequenceFilter’s will be applied.

Alternatively, the field can be an instance of SubstringSampler for the common case that the set of possible sequences is simply all (or many) substrings of a single longer sequence. For example, this can be used to choose a rotation of the M13mp18 strand in sequence design.

Should be None if DomainPool.length is specified.

replace_with_close_sequences: bool = True

If True, instead of picking a sequence uniformly at random from all those satisfying the constraints when returning a sequence from DomainPool.generate_sequence(), one is picked “close” in Hamming distance to the previous sequence of the Domain. The field DomainPool.hamming_probability is used to pick a distance at random, after which a sequence that distance from the previous sequence is selected to return.

hamming_probability: Dict[int, float]

Dictionary that specifies probability of taking a new sequence from the pool that is some integer number of bases different from the previous sequence (Hamming distance).

numpy_filters: List[NumpyFilter]

NumpyFilter’s shared by all Domain’s in this DomainPool. This is used to choose potential sequences to assign to the Domain’s in this DomainPool in the method DomainPool.generate_sequence().

The difference with DomainPool.sequence_filters is that these constraints can be applied efficiently to many sequences at once, represented as a numpy 2D array of bytes (via the class np.DNASeqList), so they are done in large batches in advance. In contrast, the constraints in DomainPool.sequence_filters are done on Python strings representing DNA sequences, and they are called one at a time when a new sequence is requested in DomainPool.generate_sequence().

Optional; default is empty.

sequence_filters: List[SequenceFilter]

SequenceFilter’s shared by all Domain’s in this DomainPool. This is used to choose potential sequences to assign to the Domain’s in this DomainPool in the method DomainPool.generate_sequence().

See DomainPool.numpy_filters for an explanation of the difference between them.

See DomainPool.domain_constraints for an explanation of the difference between them.

Optional; default is empty.

satisfies_sequence_constraints(sequence)[source]
Parameters

sequence (str) – DNA sequence to check

Returns

whether sequence satisfies all constraints in DomainPool.sequence_filters

Return type

bool

generate_sequence(rng, previous_sequence=None)[source]

Returns a DNA sequence of given length satisfying DomainPool.numpy_filters and DomainPool.sequence_filters

Note: By default, there is no check that the sequence returned is unequal to one already assigned somewhere in the design, since both DomainPool.numpy_filters and DomainPool.sequence_filters do not have access to the whole Design. But the DomainPairConstraint returned by domains_not_substrings_of_each_other_constraint() can be used to specify this Design-wide constraint.

Note that if DomainPool.possible_sequences is specified, then all constraints are ignored, and instead a sequence is chosen randomly to be returned from that list.

Parameters
  • rng (np.random.Generator) – numpy random number generator to use. To use a default, pass np.default_rng.

  • previous_sequence (str | None) – previously generated sequence to be replaced by a new sequence; None if no previous sequence exists. Used to choose a new sequence “close” to itself in Hamming distance, if the field DomainPool.replace_with_close_sequences is True and previous_sequence is not None. The number of differences between previous_sequence and its neighbors is determined by randomly picking a Hamming distance from DomainPool.hamming_probability with weighted probabilities of choosing each distance.

Returns

DNA sequence of given length satisfying DomainPool.numpy_filters and DomainPool.sequence_filters

Return type

str
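The role of hamming_probability when replacing a previous sequence can be sketched as follows. This simplified version ignores all filters and uses random.Random instead of a numpy Generator; the function names and the example probabilities are illustrative assumptions, not the library's code or defaults.

```python
import random
from typing import Dict


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_distance(hamming_probability: Dict[int, float], rng: random.Random) -> int:
    """Choose a Hamming distance with probability given by the dictionary."""
    distances = list(hamming_probability)
    weights = [hamming_probability[d] for d in distances]
    return rng.choices(distances, weights=weights, k=1)[0]


def mutate_at_distance(previous: str, distance: int, rng: random.Random) -> str:
    """Return a sequence differing from `previous` at exactly `distance` positions."""
    positions = rng.sample(range(len(previous)), distance)
    new = list(previous)
    for pos in positions:
        new[pos] = rng.choice([b for b in 'ACGT' if b != previous[pos]])
    return ''.join(new)


rng = random.Random(1)
probabilities = {1: 0.5, 2: 0.25, 3: 0.25}  # example values only
previous_sequence = 'ACGTACGTAC'
distance = pick_distance(probabilities, rng)
new_sequence = mutate_at_distance(previous_sequence, distance, rng)
```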

__init__(name, length=None, possible_sequences=None, replace_with_close_sequences=True, hamming_probability=<factory>, numpy_filters=<factory>, sequence_filters=<factory>)
Parameters
  • name (str) –

  • length (int | None) –

  • possible_sequences (List[str] | SubstringSampler | None) –

  • replace_with_close_sequences (bool) –

  • hamming_probability (Dict[int, float]) –

  • numpy_filters (List[NumpyFilter]) –

  • sequence_filters (List[SequenceFilter]) –

Return type

None

class constraints.Part[source]
class constraints.Domain(name, pool=None, sequence=None, fixed=False, label=None, dependent=False, subdomains=None, weight=None)[source]

Represents a contiguous substring of the DNA sequence of a Strand, which is intended to be either single-stranded, or to bind fully to the Watson-Crick complement of the Domain.

If two domains are complementary, they are represented by the same Domain object. They are distinguished only by whether the Strand object containing them has the Domain in its set Strand.starred_domains or not.

A Domain uses only its name to compute hash and equality checks, not its sequence. This allows a Domain to be used in sets and dicts while modifying the sequence assigned to it, and also modifying the pool (letting the pool be assigned after it is created).

Parameters
  • name (str) –

  • pool (DomainPool | None) –

  • sequence (str | None) –

  • fixed (bool) –

  • label (DomainLabel | None) –

  • dependent (bool) –

  • subdomains (List[Domain] | None) –

  • weight (float | None) –
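The consequence of hashing and comparing by name only can be seen with a small stand-in class. NamedDomain below is hypothetical and much simpler than Domain; it only illustrates why the sequence can be modified while the object is used as a dict key or set member.

```python
class NamedDomain:
    """Stand-in for Domain: hash and equality depend only on the name."""

    def __init__(self, name, sequence=None):
        self.name = name
        self.sequence = sequence

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, NamedDomain) and self.name == other.name


dom = NamedDomain('a')
score_of = {dom: 0.0}   # used as a dict key before any sequence is assigned
dom.sequence = 'ACGT'   # changing the sequence does not change the hash
still_found = dom in score_of
```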

length: int | None = None

Length of this domain. If None, then the method Domain.get_length() asks Domain.pool for the length. However, a Domain with Domain.dependent set to True has no Domain.pool. For such domains, it is necessary to set a Domain.length field directly.

parent: Domain | None = None

Domain of which this is a subdomain. Note, this is not set manually, this is set by the library based on the Domain.subdomains of other domains in the same tree.

__init__(name, pool=None, sequence=None, fixed=False, label=None, dependent=False, subdomains=None, weight=None)[source]
Parameters
  • name (str) –

  • pool (DomainPool | None) –

  • sequence (str | None) –

  • fixed (bool) –

  • label (DomainLabel | None) –

  • dependent (bool) –

  • subdomains (List[Domain] | None) –

  • weight (float | None) –

Return type

None

fixed: bool = False

Whether this Domain’s DNA sequence is fixed, i.e., cannot be changed by the search algorithm search.search_for_dna_sequences().

Note: If a domain is fixed then all of its subdomains must also be fixed.

label: DomainLabel | None = None

Optional generic “label” object to associate to this Domain.

Useful for associating extra information with the Domain that will be serialized, for example, for DNA sequence design. It must be an object (e.g., a dict or primitive type such as str or int) that is naturally JSON serializable. (Calling json.dumps on the object should succeed without having to specify a custom encoder.)

dependent: bool = False

Whether this Domain’s DNA sequence is dependent on others. Usually this is not the case. However, domains can be subdivided hierarchically into a tree of domains by setting Domain.subdomains to describe the tree. In this case exactly one domain along every path from the root to any leaf must be independent, and the rest dependent: the dependent domains will have their sequences calculated from the independent ones.

A possible use case is that one strand represents a subsequence of M13 of length 300, of which there are 7249 possible DNA sequences to assign based on the different rotations of M13. If this strand is bound to several other strands, it will have several domains, but they cannot be set independently of each other. This can be done by creating a strand with a single long domain, which is subdivided into many dependent child domains. Only the entire strand, the root domain, can be assigned at once, changing every domain at once, so the domains are dependent on the root domain’s assigned sequence.
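The way dependent child domains take their sequences from an independent parent can be sketched as slicing the parent's sequence by the children's lengths, in 5’ to 3’ order. This is a simplified illustration with a single level of nesting; subdomain_sequences is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def subdomain_sequences(parent_sequence: str,
                        subdomains: List[Tuple[str, int]]) -> Dict[str, str]:
    """Split the parent's sequence among (name, length) subdomains, in order."""
    total = sum(length for _, length in subdomains)
    if total != len(parent_sequence):
        raise ValueError('subdomain lengths must sum to the parent length')
    result = {}
    offset = 0
    for name, length in subdomains:
        result[name] = parent_sequence[offset:offset + length]
        offset += length
    return result


children = subdomain_sequences('ACGTACGTAC', [('s1', 4), ('s2', 6)])
```

Assigning a new sequence to the parent changes every child at once, which is why the children cannot be set independently.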

weight: float = 1.0

Weight to apply before picking domain at random to change when re-assigning DNA sequences during search. Should only be changed for independent domains (those with Domain.dependent set to False).

Normally a domain’s probability of being changed is proportional to the total score of violations it causes, but that total score is first multiplied by Domain.weight. This is useful, for instance, to weight a domain lower when it has many subdomains that intersect many strands, for example if a domain represents an M13 strand. It may be more efficient to pick such a domain less often since changing it will change many strands in the design and, when the design gets close to optimized, this will likely cause the score to go up.
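The effect of the weight on selection probability can be worked through numerically. The sketch below assumes the probability of picking a domain is its violation score times its weight, normalized over all domains; the domain names and scores are invented for illustration.

```python
import random
from typing import Dict


def change_probabilities(scores: Dict[str, float],
                         weights: Dict[str, float]) -> Dict[str, float]:
    """Probability of picking each domain: score * weight, normalized to sum to 1."""
    weighted = {name: scores[name] * weights.get(name, 1.0) for name in scores}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


scores = {'m13': 6.0, 'toehold': 2.0}               # example violation scores
probs = change_probabilities(scores, {'m13': 0.5})  # halve the weight of 'm13'

rng = random.Random(0)
names = list(probs)
picked = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

Without the weight, 'm13' would be picked with probability 6/8 = 0.75; with weight 0.5 its weighted score is 3, so the probability drops to 3/5 = 0.6.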

to_json_serializable(suppress_indent=True)[source]
Returns

Dictionary d representing this Domain that is “naturally” JSON serializable, by calling json.dumps(d).

Parameters

suppress_indent (bool) –

Return type

NoIndent | Dict[str, Any]

static from_json_serializable(json_map, pool_with_name, label_decoder=<function Domain.<lambda>>)[source]
Parameters
  • json_map (Dict[str, Any]) – JSON serializable object encoding this Domain, as returned by Domain.to_json_serializable().

  • pool_with_name (Dict[str, DomainPool] | None) – dict mapping name to DomainPool with that name; required to rehydrate Domain’s. If None, then a DomainPool with no constraints is created with the name and domain length found in the JSON.

  • label_decoder (Callable[[Any], DomainLabel]) – Function transforming object deserialized from JSON (e.g, dict, list, string) into an object of type DomainLabel.

Returns

Domain represented by dict json_map, assuming it was created by Domain.to_json_serializable().

Return type

Domain[DomainLabel]

property name: str

Name of this Domain.

property pool: DomainPool

DomainPool of this Domain.

property subdomains: List[Domain]

Subdomains of this Domain.

Used in connection with Domain.dependent to declare that some Domain’s are contained within other domains (forming a tree in general), and domains with Domain.dependent set to True automatically take their sequences from independent domains.

WARNING: the order in which to list these can be tricky to determine. The subdomains should be listed in 5’ to 3’ order for UNSTARRED domains. If there is a starred domain with starred subdomains, they are listed in REVERSE order.

For example, if there is a domain dom* [---------> of length 11 with two subdomains sub1* [-----> of length 7 and sub2* [--> of length 4 (put together they look like [----->[-->) that appear in that order left to right (5’ to 3’), then one would assign the domain dom to have subdomains [sub2, sub1], since the UNSTARRED domains appear <-----]<--], i.e., in 5’ to 3’ order for the unstarred domains, first the length 4 domain sub2 appears, then the length 7 domain sub1.
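The reversal follows directly from how Watson-Crick complementation works on strings. The sketch below uses made-up sequences of length 7 and 4 and a standalone wc helper (not nuad's own function) to check the ordering.

```python
def wc(seq: str) -> str:
    """Watson-Crick complement: reverse the sequence and complement each base."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[base] for base in reversed(seq))


# Starred subdomains as read 5' to 3': sub1* (7 bases), then sub2* (4 bases).
sub1_star = 'ACCGTTA'
sub2_star = 'GGCA'
dom_star = sub1_star + sub2_star

# The unstarred sequences are the Watson-Crick complements.
sub1 = wc(sub1_star)
sub2 = wc(sub2_star)
dom = wc(dom_star)

# Reading dom 5' to 3', sub2 comes first, so dom's subdomains are [sub2, sub1].
print(dom == sub2 + sub1)
print(dom == sub1 + sub2)
```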

has_length()[source]
Returns

True if this Domain has a length, which means either a sequence has been assigned to it, or it has a DomainPool.

Return type

bool

get_length()[source]
Returns

Length of this domain (delegates to pool)

Raises

ValueError – if no DomainPool has been set for this Domain

Return type

int

sequence()[source]
Returns

DNA sequence of this domain (unstarred version)

Raises

ValueError – If no sequence has been assigned.

Return type

str

set_sequence(new_sequence)[source]
Parameters

new_sequence (str) – new DNA sequence to set

Return type

None

set_fixed_sequence(fixed_sequence)[source]

Set DNA sequence and fix it so it is not changed by the nuad sequence designer.

Since it is being fixed, there is no Domain pool, so we don’t check the pool or whether it has a length. We also bypass the check that it is not fixed.

Parameters

fixed_sequence (str) – new fixed DNA sequence to set

Return type

None

property starred_name: str

The value Domain.name with * appended to it.

property starred_sequence: str

Watson-Crick complement of DNA sequence assigned to this Domain.

get_name(starred)[source]
Parameters

starred (bool) – whether to return the starred or unstarred version of the name

Returns

The value Domain.name or Domain.starred_name, depending on the value of parameter starred.

Return type

str

concrete_sequence(starred)[source]
Parameters

starred (bool) – whether to return the starred or unstarred version of the sequence

Returns

The value Domain.sequence or Domain.starred_sequence, depending on the value of parameter starred.

Raises

ValueError – if this Domain does not have a sequence assigned

Return type

str

has_sequence()[source]
Returns

Whether a complete DNA sequence has been assigned to this Domain. If this domain has subdomains, False if any subdomain has not been assigned a sequence.

Return type

bool

static complementary_domain_name(domain_name)[source]

Returns the name of the domain complementary to domain_name. In other words, a * is either removed from the end of domain_name, or appended to it if not already there.

Parameters

domain_name (str) – name of domain

Returns

name of complementary domain

Return type

str
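The described behavior amounts to toggling a trailing *. The following is a standalone re-statement of that rule for illustration, not the library's source code.

```python
def complementary_domain_name(domain_name: str) -> str:
    """Toggle the trailing '*' that marks a starred (complementary) domain."""
    if domain_name.endswith('*'):
        return domain_name[:-1]  # starred -> unstarred
    return domain_name + '*'     # unstarred -> starred


print(complementary_domain_name('a'))
print(complementary_domain_name('a*'))
```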

all_domains_in_tree()[source]
Returns

list of all domains in the same subdomain tree as this domain (including itself)

Return type

List[Domain]

all_domains_intersecting()[source]
Returns

list of all domains intersecting this one, meaning those domains in the subtree rooted at this domain (including itself), plus any ancestors of this domain.

Return type

List[Domain]

ancestors()[source]
Returns

list of all domains that are ancestors of this one, NOT including this domain

Return type

List[Domain]
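The relationship between ancestors() and all_domains_intersecting() can be sketched on a toy tree. The Node class below is a hypothetical stand-in for a domain and does not use nuad; the order of the returned lists is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Node:
    """Toy stand-in for a domain in a subdomain tree (not nuad's Domain class)."""
    name: str
    children: List['Node'] = field(default_factory=list)
    parent: Optional['Node'] = None

    def add(self, child: 'Node') -> 'Node':
        child.parent = self
        self.children.append(child)
        return child

    def ancestors(self) -> List['Node']:
        """Proper ancestors, nearest first; does not include self."""
        result, node = [], self.parent
        while node is not None:
            result.append(node)
            node = node.parent
        return result

    def subtree(self) -> List['Node']:
        """Self plus all descendants, in preorder."""
        result = [self]
        for child in self.children:
            result.extend(child.subtree())
        return result

    def intersecting(self) -> List['Node']:
        """Subtree rooted here (including self) plus all ancestors."""
        return self.subtree() + self.ancestors()


root = Node('root')
left = root.add(Node('left'))
right = root.add(Node('right'))
leaf = left.add(Node('leaf'))

print([n.name for n in left.ancestors()])
print([n.name for n in left.intersecting()])
```

The sibling right is absent from left's intersecting list: it shares the root as an ancestor but overlaps no bases with left.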

has_pool()[source]
Returns

whether a DomainPool has been assigned to this Domain

Return type

bool

contains_in_subtree(other)[source]
Parameters

other (Domain) – another Domain

Returns

True if self contains other in its subtree of subdomains

Return type

bool

independent_source()[source]

Like independent_ancestor_or_descendent(), but returns this Domain if it is already independent.

Returns

the independent Domain that this domain depends on, which is itself if it is already independent

Return type

Domain

independent_ancestor_or_descendent()[source]

Find the independent ancestor or descendent of this dependent Domain. Raises exception if this is not a dependent Domain.

Returns

The independent ancestor or descendent of this Domain.

Return type

Domain

constraints.domains_not_substrings_of_each_other_constraint(check_complements=True, short_description='dom neq', weight=1.0, min_length=0, pairs=None)[source]

Returns constraint ensuring no two domains are substrings of each other. Note that this ensures that no two Domain’s are equal if they are the same length.

Parameters
  • check_complements (bool) – whether to also apply the check to the Watson-Crick complements of the sequences

  • short_description (str) – short description of constraint suitable for logging to stdout

  • weight (float) – weight to assign to constraint

  • min_length (int) – minimum length substring to check. For instance if min_length is 4, then having two domains with sequences AAAA and CAAAAC would violate this constraint, but domains with sequences AAA and CAAAC would not.

  • pairs (Iterable[Tuple[Domain, Domain]] | None) – pairs of domains to check. By default all pairs of unequal domains are compared unless both are fixed.

Returns

a DomainPairConstraint ensuring no two domain sequences contain each other as a substring (in particular, if they are equal length, then their sequences are not identical)

Return type

DomainPairConstraint
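The check applied to each pair can be sketched as a plain function on two sequences. This is an illustration rather than nuad's implementation; in particular it assumes that min_length means a pair is skipped when its shorter sequence has fewer than min_length bases, which is consistent with the example given for that parameter.

```python
def wc(seq: str) -> str:
    """Watson-Crick complement: reverse the sequence and complement each base."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[base] for base in reversed(seq))


def violates_substring_constraint(seq1: str, seq2: str,
                                  check_complements: bool = True,
                                  min_length: int = 0) -> bool:
    """Return True if the shorter sequence (or its complement) occurs in the longer."""
    shorter, longer = sorted((seq1, seq2), key=len)
    if len(shorter) < min_length:
        return False  # assumption: pairs with a too-short sequence are not checked
    if shorter in longer:
        return True
    return check_complements and wc(shorter) in longer


print(violates_substring_constraint('AAAA', 'CAAAAC', min_length=4))
print(violates_substring_constraint('AAA', 'CAAAC', min_length=4))
print(violates_substring_constraint('AAAA', 'CTTTTC', check_complements=True))
```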

class constraints.IDTFields(scale='25nm', purification='STD', plate=None, well=None)[source]

Data required when ordering DNA strands from the synthesis company IDT (Integrated DNA Technologies). This data is used when automatically generating files used to order DNA from IDT.

When exporting to IDT files via Design.write_idt_plate_excel_file() or Design.write_idt_bulk_input_file(), the field Strand.name is used for the name if it exists, otherwise a reasonable default is chosen.

Parameters
  • scale (str) –

  • purification (str) –

  • plate (str | None) –

  • well (str | None) –

scale: str = '25nm'

Synthesis scale at which to synthesize the strand (third field in IDT bulk input: https://www.idtdna.com/site/order/oligoentry). Choices supplied by IDT at the time this was written: "25nm", "100nm", "250nm", "1um", "5um", "10um", "4nmU", "20nmU", "PU", "25nmS".

purification: str = 'STD'

Purification options (fourth field in IDT bulk input: https://www.idtdna.com/site/order/oligoentry). Choices supplied by IDT at the time this was written: "STD", "PAGE", "HPLC", "IEHPLC", "RNASE", "DUALHPLC", "PAGEHPLC".

plate: str | None = None

Name of plate in case this strand will be ordered on a 96-well or 384-well plate.

Optional field, but non-optional if IDTFields.well is not None.

well: str | None = None

Well position on plate in case this strand will be ordered on a 96-well or 384-well plate.

Optional field, but non-optional if IDTFields.plate is not None.

__init__(scale='25nm', purification='STD', plate=None, well=None)
Parameters
  • scale (str) –

  • purification (str) –

  • plate (str | None) –

  • well (str | None) –

Return type

None

class constraints.Strand(domains=None, starred_domain_indices=None, group='default_strand_group', name=None, label=None, idt=None)[source]

Represents a DNA strand, made of several Domain’s.

Parameters
  • domains (Iterable[Domain[DomainLabel]] | None) –

  • starred_domain_indices (Iterable[int] | None) –

  • group (str) –

  • name (str | None) –

  • label (StrandLabel | None) –

  • idt (IDTFields | None) –

modification_5p: nm.Modification5Prime | None = None

5’ modification; None if there is no 5’ modification.

modification_3p: nm.Modification3Prime | None = None

3’ modification; None if there is no 3’ modification.

__init__(domains=None, starred_domain_indices=None, group='default_strand_group', name=None, label=None, idt=None)[source]

A Strand can be created only by listing explicit Domain objects via parameter domains. To specify a Strand by giving domain names, see the method Design.add_strand().

Parameters
  • domains (Iterable[Domain[DomainLabel]] | None) – list of Domain’s on this Strand

  • starred_domain_indices (Iterable[int] | None) – Indices of Domain’s in domains that are starred.

  • group (str) – name of group of this Strand.

  • name (str | None) – Name of this Strand.

  • label (StrandLabel | None) – Label to associate with this Strand.

  • idt (IDTFields | None) – IDTFields object to associate with this Strand; needed to call methods for exporting to IDT formats (e.g., Design.write_idt_bulk_input_file())

Return type

None

group: str

Optional “group” field to describe strands that share similar properties.

domains: List[Domain[DomainLabel]]

The Domain’s on this Strand, in order from 5’ end to 3’ end.

starred_domain_indices: FrozenSet[int]

Set of positions of Domain’s in Strand.domains on this Strand that are starred.

label: StrandLabel | None = None

Optional generic “label” object to associate to this Strand.

Useful for associating extra information with the Strand that will be serialized, for example, for DNA sequence design. It must be an object (e.g., a dict or primitive type such as str or int) that is naturally JSON serializable. (Calling json.dumps on the object should succeed without having to specify a custom encoder.)

idt: IDTFields | None = None

Fields used when ordering strands from the synthesis company IDT (Integrated DNA Technologies, Coralville, IA). If present (i.e., not equal to None) then the method Design.write_idt_bulk_input_file() can be called to automatically generate a text file for ordering strands in test tubes: https://www.idtdna.com/site/order/oligoentry, as can the method Design.write_idt_plate_excel_file() for writing a Microsoft Excel file that can be uploaded to IDT’s website for describing DNA sequences to be ordered in 96-well or 384-well plates.

modifications_int: Dict[int, nm.ModificationInternal]

modifications.Modification’s to the DNA sequence (e.g., biotin, Cy3/Cy5 fluorophores).

Maps index within DNA sequence to modification. If the internal modification is attached to a base (e.g., internal biotin, /iBiodT/ from IDT), then the index is that of the base. If it goes between two bases (e.g., internal Cy3, /iCy3/ from IDT), then the index is that of the previous base, e.g., to put a Cy3 between bases at indices 3 and 4, the index should be 3. So for an internal modified base on a sequence of length n, the allowed indices are 0,…,n-1, and for an internal modification that goes between bases, the allowed indices are 0,…,n-2.
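The index rule can be written down as a small helper. The function below is hypothetical (it is not part of nuad) and only encodes the two ranges stated above.

```python
def allowed_internal_modification_indices(seq_length: int, attached_to_base: bool) -> range:
    """Allowed indices for an internal modification on a sequence of seq_length bases."""
    if attached_to_base:
        return range(seq_length)   # 0, ..., n-1: any base can carry the modification
    return range(seq_length - 1)   # 0, ..., n-2: index of the base before the gap


n = 5
print(list(allowed_internal_modification_indices(n, attached_to_base=True)))
print(list(allowed_internal_modification_indices(n, attached_to_base=False)))
```

For a modification placed between the bases at indices 3 and 4 of a 5-base sequence, the index to use is 3, the largest value in the second range.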

clone(name)[source]

Returns a copy of this Strand. The copy is “shallow” in that the Domain’s are shared. This is useful for creating multiple versions of each Strand, e.g., for having a variant with an extension.

WARNING: the Strand.label will be shared between them. If it should be copied, this must be done manually. A shallow copy of it can be made by setting

Parameters

name (str | None) – new name to give this Strand

Returns

A copy of this Strand.

Return type

Strand

compute_derived_fields()[source]

Re-computes derived fields of this Strand. Should be called after modifications to the Strand. (Done automatically at the start of search.search_for_dna_sequences().)

intersects_domain(domain)[source]
Parameters

domain (Domain) – domain to test for intersection

Returns

whether this strand intersects domain, which is true if either domain is in the list Strand.domains, or if any of those domains have domain in their hierarchical tree as a subdomain or an ancestor

Return type

bool

length()[source]
Returns

Sum of lengths of Domain’s in this Strand. Each Domain must have a DomainPool assigned so that the length is defined.

Return type

int

domain_names_concatenated(delim='-')[source]
Parameters

delim (str) – Delimiter to put between domain names.

Returns

names of Domain’s in this Strand, concatenated with delim in between.

Return type

str

domain_names_tuple()[source]
Returns

tuple of names of Domain’s in this Strand.

Return type

Tuple[str, …]

idt_dna_sequence()[source]
Returns

DNA sequence as it needs to be typed to order from IDT, with Modification5Prime’s, Modification3Prime’s, and ModificationInternal’s represented with text codes, e.g., “/5Biosg/ACGT” for sequence ACGT with a 5’ biotin modification.

Return type

str

to_json_serializable(suppress_indent=True)[source]
Returns

Dictionary d representing this Strand that is “naturally” JSON serializable, by calling json.dumps(d).

Parameters

suppress_indent (bool) –

Return type

NoIndent | Dict[str, Any]

static from_json_serializable(json_map, domain_with_name, label_decoder=<function Strand.<lambda>>)[source]
Returns

Strand represented by dict json_map, assuming it was created by Strand.to_json_serializable().

Parameters
  • json_map (Dict[str, Any]) –

  • domain_with_name (Dict[str, Domain[DomainLabel]]) –

  • label_decoder (Callable[[Any], StrandLabel]) –

Return type

Strand[StrandLabel, DomainLabel]

unstarred_domains()[source]
Returns

list of unstarred Domain’s in this Strand, in order they appear in Strand.domains

Return type

List[Domain[DomainLabel]]

starred_domains()[source]
Returns

list of starred Domain’s in this Strand, in order they appear in Strand.domains

Return type

List[Domain[DomainLabel]]

unstarred_domains_set()[source]
Returns

set of unstarred Domain’s in this Strand

Return type

OrderedSet[Domain[DomainLabel]]

starred_domains_set()[source]
Returns

set of starred Domain’s in this Strand

Return type

OrderedSet[Domain[DomainLabel]]

sequence(delimiter='')[source]
Parameters

delimiter (str) – Delimiter string to place between sequences of each Domain in this Strand. For instance, if delimiter = '--', then it will return a string such as ACGTAGCTGA--CGCTAGCTGA--CGATCGATC--GCGATCGAT

Returns

DNA sequence assigned to this Strand, calculated by concatenating all sequences assigned to its Domain’s.

Raises

ValueError – if any Domain of this Strand does not have a sequence assigned

Return type

str
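A sketch of the concatenation, using plain strings instead of Domain objects. It assumes, as the starred/unstarred convention elsewhere in this module suggests, that a starred position contributes the Watson-Crick complement of the domain's sequence. The function strand_sequence is hypothetical.

```python
from typing import Iterable, List, Optional


def wc(seq: str) -> str:
    """Watson-Crick complement: reverse the sequence and complement each base."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[base] for base in reversed(seq))


def strand_sequence(domain_seqs: List[Optional[str]], starred_indices: Iterable[int],
                    delimiter: str = '') -> str:
    """Join per-domain sequences 5' to 3', complementing the starred positions."""
    starred = set(starred_indices)
    parts = []
    for idx, seq in enumerate(domain_seqs):
        if seq is None:
            raise ValueError(f'domain at index {idx} has no sequence assigned')
        parts.append(wc(seq) if idx in starred else seq)
    return delimiter.join(parts)


print(strand_sequence(['AACC', 'GGT'], starred_indices=[1], delimiter='--'))
print(strand_sequence(['AACC', 'GGT'], starred_indices=[1]))
```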

assign_dna(sequence)[source]
Parameters

sequence (str) – DNA sequence to assign to this Strand. Must have length = Strand.length().

Return type

None

property fixed: bool

True if every Domain on this Strand has a fixed DNA sequence.

unfixed_domains()[source]
Returns

all Domain’s in this Strand where Domain.fixed is False

Return type

Tuple[Domain[DomainLabel]]

property name: str

name of this Strand if it was assigned one, otherwise Domain names are concatenated with ‘-’ joining them

Type

return

address_of_domain(domain_idx)[source]

Returns StrandDomainAddress of the domain located at domain_idx.

Parameters

domain_idx (int) – Index of domain

Return type

StrandDomainAddress

address_of_nth_domain_occurence(domain_name, n, forward=True)[source]

Returns StrandDomainAddress of the n’th occurrence of domain named domain_name.

Parameters
  • domain_name (str) – name of Domain to find address of

  • n (int) – which occurrence (in order on the Strand) of Domain with name domain_name to find address of.

  • forward – if True, starts searching from 5’ end, otherwise starts searching from 3’ end.

Returns

StrandDomainAddress of the n’th occurrence of domain named domain_name.

Return type

StrandDomainAddress

address_of_first_domain_occurence(domain_name)[source]

Returns StrandDomainAddress of the first occurrence of domain named domain_name starting from the 5’ end.

Parameters

domain_name (str) –

Return type

StrandDomainAddress

address_of_last_domain_occurence(domain_name)[source]

Returns StrandDomainAddress of the last occurrence of domain named domain_name, i.e., the first occurrence when searching from the 3’ end.

Parameters

domain_name (str) –

Return type

StrandDomainAddress

append_domain(domain, starred=False)[source]

Appends domain to 3’ end of this Strand.

Parameters
  • domain (Domain) – Domain to append

  • starred (bool) – whether domain is starred

Return type

None

prepend_domain(domain, starred=False)[source]

Prepends domain to 5’ end of this Strand (i.e., the beginning of the Strand).

Parameters
  • domain (Domain) – Domain to prepend

  • starred (bool) – whether domain is starred

Return type

None

insert_domain(idx, domain, starred=False)[source]

Inserts domain at index idx of this Strand, with same semantics as Python’s list.insert. For example, strand.insert_domain(0, domain) is equivalent to strand.prepend_domain(domain) and strand.insert_domain(len(strand.domains), domain) is equivalent to strand.append_domain(domain).

Parameters
  • idx (int) – index at which to insert domain into this Strand

  • domain (Domain) – Domain to insert

  • starred (bool) – whether domain is starred

Return type

None
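Because starred domains are tracked by position, an insertion has to shift the starred indices at or after the insertion point. The sketch below shows that bookkeeping on plain lists and sets. It is an assumption about what such an update needs to do rather than a copy of nuad's code, and it only handles 0 <= idx <= len(domains).

```python
from typing import List, Set, Tuple


def insert_domain(domains: List[str], starred: Set[int], idx: int,
                  domain: str, is_starred: bool = False) -> Tuple[List[str], Set[int]]:
    """Return new (domains, starred indices) after inserting domain at idx."""
    if not 0 <= idx <= len(domains):
        raise ValueError('this sketch only supports 0 <= idx <= len(domains)')
    new_domains = list(domains)
    new_domains.insert(idx, domain)
    # Every position at or after idx moves one step toward the 3' end.
    new_starred = {i + 1 if i >= idx else i for i in starred}
    if is_starred:
        new_starred.add(idx)
    return new_domains, new_starred


print(insert_domain(['a', 'b'], {1}, 0, 'x'))                   # prepend
print(insert_domain(['a', 'b'], {1}, 2, 'y', is_starred=True))  # append
```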

set_fixed_sequence(seq)[source]

Sets each domain of this Strand to have a substring of seq, such that the entire strand has the sequence seq. All Domain’s in this strand will be fixed after doing this. (And if any of them are already fixed it will raise an error.)

Parameters

seq (str) – sequence to assign to this Strand

Return type

None

class constraints.DomainPair(domain1: 'Domain', domain2: 'Domain')[source]
Parameters
  • domain1 (Domain) –

  • domain2 (Domain) –

__init__(domain1, domain2)
Parameters
  • domain1 (Domain) –

  • domain2 (Domain) –

Return type

None

class constraints.StrandPair(strand1: 'Strand', strand2: 'Strand')[source]
Parameters
  • strand1 (Strand) –

  • strand2 (Strand) –

__init__(strand1, strand2)
Parameters
  • strand1 (Strand) –

  • strand2 (Strand) –

Return type

None

class constraints.Complex(*args: 'Strand')[source]
Parameters

args (Strand) –

__init__(*args)[source]

Creates a complex of strands given as arguments, e.g., Complex(strand1, strand2) creates a 2-strand complex.

Parameters

args (Strand) –

Return type

None

strands: Tuple[Strand, ...]

The strands in this complex.

constraints.remove_duplicates(lst)[source]
Parameters

lst (Iterable[T]) – an Iterable of objects

Returns

a List consisting of elements of lst with duplicates removed, while preserving iteration order of lst (naive approach using Python set would not preserve order, since iteration order of Python sets is not specified)

Return type

List[T]
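An order-preserving deduplication can be written with a set used only for membership tests. This is a standalone sketch of the described behavior, assuming hashable elements; it is not necessarily how nuad implements it.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar('T', bound=Hashable)


def remove_duplicates(lst: Iterable[T]) -> List[T]:
    """Return the elements of lst in first-seen order, each appearing once."""
    seen = set()
    result = []
    for item in lst:
        if item not in seen:  # the set is used only for fast membership tests
            seen.add(item)
            result.append(item)
    return result


print(remove_duplicates([3, 1, 3, 2, 1]))
```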

class constraints.PlateType(value)[source]

Represents two different types of plates in which DNA sequences can be ordered.

wells96 = 96

96-well plate.

wells384 = 384

384-well plate.

num_wells_per_plate()[source]
Returns

number of wells in this plate type

Return type

int

min_wells_per_plate()[source]
Returns

minimum number of wells in this plate type to avoid extra charge by IDT

Return type

int

class constraints.Design(strands=())[source]

Represents a complete design, i.e., a set of DNA Strand’s with domains, and Constraint’s on the sequences to assign to them via search.search_for_dna_sequences().

Parameters

strands (List[Strand[StrandLabel, DomainLabel]]) –

domains: List[Domain[DomainLabel]]

List of all Domain’s in this Design (without repetitions).

Computed from Design.strands, so not specified in constructor.

strands_by_group_name: Dict[str, List[Strand[StrandLabel, DomainLabel]]]

Dict mapping each group name to a list of the Strand’s in this Design in the group.

Computed from Design.strands, so not specified in constructor.

domain_pools_to_domain_map: Dict[DomainPool, List[Domain]]

Dict mapping each DomainPool to a list of the Domain’s in this Design in the pool.

Computed from Design.strands, so not specified in constructor.

domains_by_name: Dict[str, Domain]

Dict mapping each name of a Domain to the Domain in this Design with that name.

Computed from Design.strands, so not specified in constructor.

__init__(strands=())[source]
Parameters

strands (Iterable[Strand]) – the Strand’s in this Design

Return type

None

strands: List[Strand[StrandLabel, DomainLabel]]

List of all Strand’s in this Design.

compute_derived_fields()[source]

Computes derived fields of this Design. Used to ensure that all fields are valid in case the Design was manually modified after being created, before running search.search_for_dna_sequences().

Return type

None

to_json()[source]
Returns

JSON string representing this Design.

Return type

str

to_json_serializable(suppress_indent=True)[source]
Parameters

suppress_indent (bool) – Whether to suppress indentation of some objects using the NoIndent object.

Returns

Dictionary d representing this Design that is “naturally” JSON serializable, by calling json.dumps(d).

Return type

Dict[str, Any]

write_design_file(directory='.', filename=None, extension='json')[source]

Write JSON file representing this Design, which can be imported via the method Design.from_design_file(), with the output file having the same name as the running script but with .py changed to .json, unless filename is explicitly specified. For instance, if the script is named my_design.py, then the design will be written to my_design.json. If extension is specified (but filename is not), then the design will be written to my_design.<extension>

The string written is that returned by Design.to_json().

Parameters
  • directory (str) – directory in which to put file (default: current working directory)

  • filename (str | None) – filename (default: name of script with .py replaced by .json). Mutually exclusive with extension.

  • extension (str) – extension for filename (default: json). Mutually exclusive with filename.

Return type

None
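The default naming rule can be sketched as follows. The helper default_design_filename and its script_path parameter are hypothetical; the real method determines the running script's name itself.

```python
import os
from typing import Optional


def default_design_filename(script_path: str, filename: Optional[str] = None,
                            extension: str = 'json', directory: str = '.') -> str:
    """Path the design would be written to under the naming rule described above."""
    if filename is None:
        # Replace the script's .py suffix with the requested extension.
        stem = os.path.splitext(os.path.basename(script_path))[0]
        filename = f'{stem}.{extension}'
    return os.path.join(directory, filename)


print(default_design_filename('my_design.py'))
print(default_design_filename('my_design.py', extension='txt'))
print(default_design_filename('my_design.py', filename='custom.json', directory='out'))
```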

static from_design_file(filename, strand_label_decoder=<function Design.<lambda>>, domain_label_decoder=<function Design.<lambda>>)[source]
Parameters
  • filename (str) – name of JSON file describing the Design

  • domain_label_decoder (Callable[[Any], DomainLabel]) – Function that transforms JSON representation of Domain.label into the proper type.

  • strand_label_decoder (Callable[[Any], StrandLabel]) – Function that transforms JSON representation of Strand.label into the proper type.

Returns

Design described by the JSON file with name filename, assuming it was created using Design.to_json().

Return type

Design[StrandLabel, DomainLabel]

static from_json(json_str, strand_label_decoder=<function Design.<lambda>>, domain_label_decoder=<function Design.<lambda>>)[source]
Parameters
  • json_str (str) – The string representing the Design as a JSON object.

  • domain_label_decoder (Callable[[Any], DomainLabel]) – Function that transforms JSON representation of Domain.label into the proper type.

  • strand_label_decoder (Callable[[Any], StrandLabel]) – Function that transforms JSON representation of Strand.label into the proper type.

Returns

Design described by this JSON string, assuming it was created using Design.to_json().

Return type

Design[StrandLabel, DomainLabel]

static from_json_serializable(json_map, domain_label_decoder=<function Design.<lambda>>, strand_label_decoder=<function Design.<lambda>>)[source]
Parameters
  • json_map (Dict[str, Any]) – JSON serializable object encoding this Design, as returned by Design.to_json_serializable().

  • domain_label_decoder (Callable[[Any], DomainLabel]) – Function that transforms JSON representation of Domain.label into the proper type.

  • strand_label_decoder (Callable[[Any], StrandLabel]) – Function that transforms JSON representation of Strand.label into the proper type.

Returns

Design represented by dict json_map, assuming it was created by Design.to_json_serializable(). No constraints are populated.

Return type

Design[StrandLabel, DomainLabel]

add_strand(domain_names=None, domains=None, starred_domain_indices=None, group='default_strand_group', name=None, label=None, idt=None)[source]

This is an alternative way to create strands instead of calling the Strand constructor explicitly. It behaves similarly to the Strand constructor, but it has an option to specify Domain’s simply by giving a name.

A Strand can be created either by listing explicit Domain objects via parameter domains (as in the Strand constructor), or by giving names via parameter domain_names. If domain_names is specified, then by convention those that end with a * are assumed to be starred. Also, Domain’s created in this way are “interned” as variables in a cache stored in the Design object; no two Domain’s with the same name in this design will be created, and subsequent uses of the same name will refer to the same Domain object.

Parameters
  • domain_names (List[str] | None) – Names of the Domain’s on this Strand. Mutually exclusive with Strand.domains and Strand.starred_domain_indices.

  • domains (List[Domain[DomainLabel]] | None) – list of Domain’s on this Strand. Mutually exclusive with Strand.domain_names, and must be specified jointly with Strand.starred_domain_indices.

  • starred_domain_indices (Iterable[int] | None) – Indices of Domain’s in domains that are starred. Mutually exclusive with Strand.domain_names, and must be specified jointly with Strand.domains.

  • group (str) – name of group of this Strand.

  • name (str | None) – Name of this Strand.

  • label (StrandLabel | None) – Label to associate with this Strand.

  • idt (IDTFields | None) – IDTFields object to associate with this Strand; needed to call methods for exporting to IDT formats (e.g., Design.write_idt_bulk_input_file())

Returns

the Strand that is created

Return type

Strand
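The name-based path can be pictured as a lookup in a per-design cache keyed by unstarred name. The sketch below uses a hypothetical ToyDomain class and a plain dict as the cache; it illustrates the interning and the trailing-* convention, not nuad's actual data structures.

```python
from typing import Dict, List, Tuple


class ToyDomain:
    """Hypothetical stand-in for a domain; only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


def domains_from_names(domain_names: List[str],
                       cache: Dict[str, ToyDomain]) -> Tuple[List[ToyDomain], List[int]]:
    """Resolve names to interned ToyDomain objects and collect the starred positions."""
    domains, starred_indices = [], []
    for idx, name in enumerate(domain_names):
        if name.endswith('*'):
            starred_indices.append(idx)
            name = name[:-1]  # the cache is keyed by the unstarred name
        if name not in cache:
            cache[name] = ToyDomain(name)
        domains.append(cache[name])
    return domains, starred_indices


cache: Dict[str, ToyDomain] = {}
domains1, starred1 = domains_from_names(['a', 'b*'], cache)
domains2, starred2 = domains_from_names(['b', 'c'], cache)

print(starred1, starred2)
print(domains1[1] is domains2[0])  # both strands share the same object for b
```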

modifications(mod_type=None)[source]

Returns either set of all modifications.Modification’s in this Design, or set of all modifications of a given type (5’, 3’, or internal).

Parameters

mod_type (nm.ModificationType | None) – type of modifications (5’, 3’, or internal); if not specified, all three types are returned

Returns

Set of all modifications in this Design (possibly of a given type).

Return type

Set[nm.Modification]

to_idt_bulk_input_format(delimiter=',', key=None, warn_duplicate_name=False, only_strands_with_idt=False, strands=None)[source]

Called by Design.write_idt_bulk_input_file() to determine what string to write to the file. This function can be used to get the string directly without creating a file.

Parameters have the same meaning as in Design.write_idt_bulk_input_file().

Returns

string that is written to the file in the method Design.write_idt_bulk_input_file().

Parameters
  • delimiter (str) –

  • key (KeyFunction[Strand] | None) –

  • warn_duplicate_name (bool) –

  • only_strands_with_idt (bool) –

  • strands (Iterable[Strand] | None) –

Return type

str
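As a rough picture of the output, each strand becomes one line holding the four IDT fields name, sequence, scale, and purification, separated by the delimiter. The sketch below takes plain tuples; the function name and the example rows are made up, and nuad's real output may differ in details such as ordering or trailing newlines.

```python
from typing import Iterable, Tuple

IdtRow = Tuple[str, str, str, str]  # (name, sequence, scale, purification)


def to_bulk_input_lines(rows: Iterable[IdtRow], delimiter: str = ',') -> str:
    """Join each row's four fields with delimiter, one strand per line."""
    return '\n'.join(delimiter.join(row) for row in rows)


rows = [
    ('s1', 'ACGT', '25nm', 'STD'),
    ('s2', '/5Biosg/TTGA', '100nm', 'HPLC'),
]
print(to_bulk_input_lines(rows))
print(to_bulk_input_lines(rows, delimiter=';'))
```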

write_idt_bulk_input_file(*, filename=None, directory='.', key=None, extension=None, delimiter=',', warn_duplicate_name=True, only_strands_with_idt=False, strands=None)[source]

Write .idt text file encoding the strands of this Design with the field Strand.idt, suitable for pasting into the “Bulk Input” field of IDT (Integrated DNA Technologies, Coralville, IA, https://www.idtdna.com/), with the output file having the same name as the running script but with .py changed to .idt, unless filename is explicitly specified. For instance, if the script is named my_origami.py, then the sequences will be written to my_origami.idt. If filename is not specified but extension is, then that extension is used instead of idt. At least one of filename or extension must be None.

The string written is that returned by Design.to_idt_bulk_input_format().

Parameters
  • filename (str) – optional custom filename to use (instead of currently running script)

  • directory (str) – specifies a directory in which to place the file, either absolute or relative to the current working directory. Default is the current working directory.

  • key (KeyFunction[Strand] | None) – key function used to determine order in which to output strand sequences. Some useful defaults are provided by strand_order_key_function()

  • extension (str | None) – alternate filename extension to use (instead of idt)

  • delimiter (str) – symbol used to delimit the four IDT fields name, sequence, scale, purification.

  • warn_duplicate_name (bool) – if True prints a warning when two different Strand’s have the same Strand.name and the same Strand.sequence(). A ValueError is raised (regardless of the value of this parameter) if two different Strand’s have the same name but different sequences, IDT scales, or IDT purifications.

  • only_strands_with_idt (bool) – If False (the default), all non-scaffold sequences are output, with reasonable default values chosen if the field Strand.idt is missing. If True, then strands lacking the field Strand.idt will not be exported.

  • strands (Iterable[Strand] | None) – strands to export; if not specified, all strands in design are exported. NOTE: it is not checked that each Strand in strands is actually contained in this Design

Return type

None

write_idt_plate_excel_file(*, filename=None, directory='.', key=None, warn_duplicate_name=False, only_strands_with_idt=False, use_default_plates=True, warn_using_default_plates=True, plate_type=PlateType.wells96, strands=None)[source]

Write .xls (Microsoft Excel) file encoding the strands of this Design with the field Strand.idt, suitable for uploading to IDT (Integrated DNA Technologies, Coralville, IA, https://www.idtdna.com/) to describe a 96-well or 384-well plate (https://www.idtdna.com/site/order/plate/index/dna/), with the output file having the same name as the running script but with .py changed to .xls, unless filename is explicitly specified. For instance, if the script is named my_origami.py, then the sequences will be written to my_origami.xls.

If the last plate has fewer than 24 strands for a 96-well plate, or fewer than 96 strands for a 384-well plate, then the last two plates are rebalanced to ensure that each plate has at least that number of strands, because IDT charges extra for a plate with too few strands: https://www.idtdna.com/pages/products/custom-dna-rna/dna-oligos/custom-dna-oligos

Parameters
  • filename (str) – custom filename if default (explained above) is not desired

  • directory (str) – specifies a directory in which to place the file, either absolute or relative to the current working directory. Default is the current working directory.

  • key (KeyFunction[Strand] | None) – key function used to determine order in which to output strand sequences. Some useful defaults are provided by strand_order_key_function()

  • warn_duplicate_name (bool) – if True prints a warning when two different Strand’s have the same Strand.name and the same Strand.sequence(). A ValueError is raised (regardless of the value of this parameter) if two different Strand’s have the same name but different sequences, IDT scales, or IDT purifications.

  • only_strands_with_idt (bool) – If False (the default), all non-scaffold sequences are output, with reasonable default values chosen if the field Strand.idt is missing (though scaffold is included if export_scaffold is True). If True, then strands lacking the field Strand.idt will not be exported. If False, then use_default_plates must be True.

  • use_default_plates (bool) – Use default values for plate and well (ignoring those in idt fields, which may be None). If False, each Strand to export must have the field Strand.idt, so in particular the parameter only_strands_with_idt must be True.

  • warn_using_default_plates (bool) – specifies whether, if use_default_plates is True, to print a warning for strands whose Strand.idt has the fields IDTFields.plate and IDTFields.well, since use_default_plates directs these fields to be ignored.

  • plate_type (PlateType) – a PlateType specifying whether to use a 96-well plate or a 384-well plate if the use_default_plates parameter is True. Ignored if use_default_plates is False, because in that case the wells are explicitly set by the user, who is free to use coordinates for either plate type.

  • strands (Iterable[Strand] | None) – strands to export; if not specified, all strands in design are exported. NOTE: it is not checked that each Strand in strands is actually contained in this Design.

Return type

None
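
The rebalancing rule described above can be illustrated with a small sketch. It is only one possible way to rebalance, written under the assumption that strands are moved from the second-to-last plate to the last plate until the last plate reaches the minimum; the function name plate_sizes and this exact strategy are illustrative and are not taken from the nuad source.

```python
from typing import List


def plate_sizes(num_strands: int, wells_per_plate: int, min_per_plate: int) -> List[int]:
    """Number of strands on each plate, rebalancing the last two plates if needed.

    Hypothetical helper: assumes strands are moved from the second-to-last
    plate to the last plate until the last plate holds min_per_plate strands.
    """
    full_plates, remainder = divmod(num_strands, wells_per_plate)
    sizes = [wells_per_plate] * full_plates
    if remainder > 0:
        sizes.append(remainder)
    # Rebalance only if there is a previous plate to borrow strands from
    # and the last plate is below the minimum.
    if len(sizes) >= 2 and sizes[-1] < min_per_plate:
        shortfall = min_per_plate - sizes[-1]
        sizes[-2] -= shortfall
        sizes[-1] += shortfall
    return sizes


# 100 strands on 96-well plates: the last plate would hold only 4 strands,
# so 20 strands are moved over from the first plate.
print(plate_sizes(100, 96, 24))
# 120 strands: the last plate already holds 24 strands, so nothing changes.
print(plate_sizes(120, 96, 24))
```

With a single plate there is nothing to borrow from, so in this sketch a lone plate below the minimum is left unchanged.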

domain_pools()[source]
Returns

list of all DomainPool’s in this Design

Return type

List[DomainPool]

domains_by_pool_name(domain_pool_name)[source]
Parameters

domain_pool_name (str) – name of a DomainPool

Returns

the Domain’s in domain_pool

Return type

List[Domain[DomainLabel]]

static from_scadnano_file(sc_filename, fix_assigned_sequences=True, ignored_strands=None)[source]

Converts a scadnano Design stored in the file named sc_filename to a Design for doing DNA sequence design. Each Strand name and Domain name from the scadnano Design are assigned as Strand.name and Domain.name in the obvious way. Assumes each Strand label is a string describing the strand group.

The scadnano package must be importable.

Also assigns sequences from domains in sc_design to those of the returned Design. If fix_assigned_sequences is true, then these DNA sequences are fixed; otherwise not.

Parameters
  • sc_filename (str) – Name of file containing scadnano Design.

  • fix_assigned_sequences (bool) – Whether to fix the sequences that are assigned from those found in sc_design.

  • ignored_strands (Iterable[Strand] | None) – Strands to ignore

Returns

An equivalent Design, ready to be given constraints for DNA sequence design.

Raises

TypeError – If any scadnano strand label is not a string.

Return type

Design[StrandLabel, DomainLabel]

static from_scadnano_design(sc_design, fix_assigned_sequences=True, ignored_strands=None, warn_existing_domain_labels=True)[source]

Converts a scadnano Design sc_design to a Design for doing DNA sequence design. Each Strand name and Domain name from the scadnano Design are assigned as Strand.name and Domain.name in the obvious way. Assumes each Strand label is a string describing the strand group.

The scadnano package must be importable.

Also assigns sequences from domains in sc_design to those of the returned Design. If fix_assigned_sequences is true, then these DNA sequences are fixed; otherwise not.

Parameters
  • sc_design (sc.Design[StrandLabel, DomainLabel]) – Instance of scadnano.Design from the scadnano Python scripting library.

  • fix_assigned_sequences (bool) – Whether to fix the sequences that are assigned from those found in sc_design.

  • ignored_strands (Iterable[Strand] | None) – Strands to ignore; none are ignored if not specified.

  • warn_existing_domain_labels (bool) – If True, logs a warning when a nuad Domain already has a label and so does the corresponding scadnano domain, since the scadnano label will not be assigned to the nuad Domain.

Returns

An equivalent Design, ready to be given constraints for DNA sequence design.

Raises

TypeError – If any scadnano strand label is not a string.

Return type

Design[StrandLabel, DomainLabel]

assign_fields_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]

Assigns DNA sequences, IDTFields, and StrandGroups (the latter stored in the scadnano Strand.label dict under the key “group”). TODO: document more

Parameters
  • sc_design (Design[StrandLabel, DomainLabel]) –

  • ignored_strands (Iterable[Strand]) –

  • overwrite (bool) –

assign_sequences_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]

Assigns sequences from this Design into sc_design.

Also writes a label to each scadnano strand. If the label is None a new one is created as a dict with a key group. The name of the StrandGroup of the nuad design is the value to assign to this key. If the scadnano strand label is already a dict, it adds this key. If the strand label is not None or a dict, an exception is raised.

Assumes that each domain name in domains in sc_design is a Domain.name of a Domain in this Design.

If multiple strands in sc_design share the same name, then all of them are assigned the DNA sequence of the nuad Strand with that name.

Parameters
  • sc_design (Design[StrandLabel, DomainLabel]) – a scadnano design.

  • ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be ignored by the sequence designer.

  • overwrite (bool) – if True, overwrites existing sequences; otherwise gives an error if an existing sequence disagrees with the newly assigned sequence

Return type

None
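
The label handling described above (create a dict if the label is None, add the key if it is already a dict, otherwise raise) can be sketched as follows. This is a minimal illustration, not the nuad implementation: the helper name label_with_group is hypothetical, and the choice of TypeError is an assumption, since the text only says that an exception is raised.

```python
from typing import Any, Dict, Optional


def label_with_group(label: Optional[Any], group_name: str) -> Dict[str, Any]:
    """Return a strand label dict carrying the strand group name under key 'group'.

    Hypothetical helper mirroring the three cases described in the text.
    """
    if label is None:
        # No existing label: create a new dict with the single key 'group'.
        return {"group": group_name}
    if isinstance(label, dict):
        # Existing dict label: add (or overwrite) the key 'group' in place.
        label["group"] = group_name
        return label
    # Any other label type cannot hold the key; the exception type is an assumption.
    raise TypeError(f"label must be None or a dict, but is {type(label).__name__}")


print(label_with_group(None, "staples"))
print(label_with_group({"color": "red"}, "staples"))
```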

shared_strands_with_scadnano_design(sc_design, ignored_strands=())[source]

Returns a list of pairs (nuad_strand, sc_strands), where nuad_strand has the same name as all scadnano Strands in sc_strands. Scadnano strands that appear in ignored_strands are not included in the list.

Parameters
  • sc_design (Design) –

  • ignored_strands (Iterable[Strand]) –

Return type

List[Tuple[Strand, List[Strand]]]

assign_strand_groups_to_labels(sc_design, ignored_strands=(), overwrite=False)[source]

TODO: document this

Parameters
  • sc_design (Design) –

  • ignored_strands (Iterable[Strand]) –

  • overwrite (bool) –

Return type

None

assign_idt_fields_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]

Assigns IDTFields from this Design into sc_design.

If multiple strands in sc_design share the same name, then all of them are assigned the IDT fields of the nuad Strand with that name.

Parameters
  • sc_design (Design[StrandLabel, DomainLabel]) – a scadnano design.

  • ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be not assigned.

  • overwrite (bool) – whether to overwrite existing fields.

Raises

ValueError – if scadnano strand already has any modifications assigned

Return type

None

assign_modifications_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]

Assigns modifications.Modification’s from this Design into sc_design.

If multiple strands in sc_design share the same name, then all of them are assigned the modifications of the nuad Strand with that name.

Parameters
  • sc_design (Design[StrandLabel, DomainLabel]) – a scadnano design.

  • ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be not assigned.

  • overwrite (bool) – whether to overwrite existing fields in scadnano design

Raises

ValueError – if scadnano strand already has any modifications assigned

Return type

None

copy_sequences_from(other)[source]

Assuming every Domain in this Design has a matching (same name) Domain in other, copies sequences from other into this Design.

Parameters

other (Design) – other Design from which to copy sequences

Return type

None

check_all_subdomain_graphs_acyclic()[source]

Check that all domain graphs (if subdomains are used) are acyclic.

Return type

None

check_all_subdomain_graphs_uniquely_assignable()[source]

Check that subdomain graphs are consistent and raise error if not.

Return type

None

class constraints.Constraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>)[source]

Abstract base class of all “soft” constraints to apply when running search.search_for_dna_sequences(). Unlike a NumpyFilter or a SequenceFilter, which disallow certain DNA sequences from ever being assigned to a Domain, a Constraint can be violated during the search. The goal of the search is to reduce the number of violated Constraint’s. See search.search_for_dna_sequences() for a more detailed description of how the search algorithm interacts with the constraints.

You will not use this class directly, but instead one of its concrete subclasses: DomainConstraint, StrandConstraint, DomainPairConstraint, StrandPairConstraint, and ComplexConstraint, which are subclasses of SingularConstraint; DomainsConstraint, StrandsConstraint, DomainPairsConstraint, and StrandPairsConstraint, which are subclasses of BulkConstraint; or DesignConstraint.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

short_description: str = ''

Very short description of the constraint suitable for compactly logging to the screen, e.g., ‘strand_ss’

weight: float = 1.0

Constant multiplier weight of this constraint; the higher the total weight of all the violated Constraint’s that a Domain has caused, the greater the likelihood that its sequence is changed when stochastically searching for sequences to satisfy all constraints.

score_transfer_function()

Score transfer function to use. When a constraint is violated, the constraint returns a nonnegative float (the score) indicating the “severity” of the violation. For example, if a Strand has secondary structure energy exceeding a threshold, the score returned is the difference between the energy and the threshold.

The score is then passed through the score_transfer_function. The default is the squared ReLU function: f(x) = max(0, x)^2. This “punishes” more severe violations more. For example, reducing a violation from 3 kcal/mol in excess of its threshold to 2 kcal/mol in excess lowers the total score more (from 9 to 4) than reducing a violation that is only 1 kcal/mol in excess of its threshold down to 0 (from 1 to 0).

Parameters

x (float) –

Return type

float
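
The arithmetic behind the kcal/mol example above can be checked with a short sketch. It takes the squared ReLU to mean max(0, x)^2; the function name squared_relu is illustrative and is not the name used in nuad (the signatures above refer to the default as default_score_transfer_function).

```python
def squared_relu(x: float) -> float:
    """Squared ReLU: 0 for x <= 0, and x squared for x > 0."""
    return max(0.0, x) ** 2


# Reducing a severe violation from 3 to 2 kcal/mol in excess of its threshold.
drop_severe = squared_relu(3.0) - squared_relu(2.0)
# Reducing a mild violation from 1 kcal/mol in excess down to 0.
drop_mild = squared_relu(1.0) - squared_relu(0.0)

print(drop_severe, drop_mild)
```

The first reduction lowers the score by 5 and the second by only 1, so under this transfer function the search is rewarded more for easing the worse violation.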

abstract static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

Return type

None

class constraints.Result(excess, summary=None, value=None)[source]

A Result is returned from the function SingularConstraint.evaluate, and a list of Result’s is returned from the function BulkConstraint.evaluate_bulk, describing the result of evaluating the constraint on the design “part”.

A Result must have an “excess” and “summary” specified.

Optionally one may also specify a “value”, which helps in graphically displaying the results of evaluating constraints using the function display_report().

For example, if the constraint checks that the NUPACK complex free energy of a strand is at least -2.5 kcal/mol, and a strand has energy -3.4 kcal/mol, then the following are sensible values for these fields:

  • value = -3.4 or "-3.4 kcal/mol" or pint.Quantity(Decimal("-3.4"), "kcal/mol")

  • excess = 0.9

  • summary = "-3.4 kcal/mol"

Parameters
  • excess (float) –

  • summary (str | None) –

  • value (float | str | pint.Quantity[Decimal] | None) –
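
As a rough illustration of how these fields relate for a "free energy at least a threshold" constraint, the following sketch computes them from an energy and a threshold. It follows the sign convention described for Result.excess (positive means violated). ExampleResult is a hypothetical stand-in used so that the sketch runs without nuad installed; it is not the real Result class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleResult:
    """Hypothetical stand-in mirroring the three constructor fields of Result."""
    excess: float
    summary: Optional[str] = None
    value: Optional[str] = None


def energy_result(energy: float, threshold: float) -> ExampleResult:
    # The constraint requires energy >= threshold, so the excess is how far
    # the energy falls below the threshold (0 if it does not fall below).
    excess = max(0.0, threshold - energy)
    text = f"{energy:.1f} kcal/mol"
    return ExampleResult(excess=excess, summary=text, value=text)


violated = energy_result(energy=-3.4, threshold=-2.5)
satisfied = energy_result(energy=-1.0, threshold=-2.5)
print(violated)
print(satisfied)
```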

__init__(excess, summary=None, value=None)[source]
Parameters
  • excess (float) –

  • summary (str | None) –

  • value (float | str | pint.Quantity[Decimal] | None) –

Return type

None

excess: float

The excess is a value that is turned into a score, and the search minimizes the total score of all constraint evaluations. Setting this to 0 (or a negative value) means the constraint is satisfied, and setting it to a positive value means the constraint is violated. The interpretation is that the larger excess is, the more the constraint is violated.

For example, a common value for excess is the amount by which the NUPACK complex free energy exceeds a threshold.

summary: str = ''

This string is displayed in the text report on constraints, after the name of the “part” (e.g., strand, pair of domains, pair of strands).

value: pint.Quantity[Decimal] | None = None

If this is a “numeric” constraint, i.e., checking some number such as the complex free energy of a strand and comparing it to a threshold, this is the “raw” value. It is optional, but if specified, then the raw values can be plotted in a Jupyter notebook by the function display_report().

If a float, then no units are assumed. If it is a str, then it is assumed that it can be passed to the constructor pint.Quantity and interpreted as a value with units, e.g., the string “-3.4 kcal/mol”.

score: float

Set by the search algorithm based on Result.excess as well as other data such as the constraint’s weight and the SearchParameters.score_transfer_function.

part: DesignPart

Set by the search algorithm based on the part that was evaluated.

constraints.normalize_quantity(quantity, compact=False)[source]

Normalizes quantity so that it has a Decimal magnitude, is “compact” if specified (uses units within the correct “3 orders of magnitude”: https://pint.readthedocs.io/en/0.18/tutorial.html#simplifying-units), and has no trailing zeros.

Parameters
  • quantity (pint.Quantity) – a pint Quantity[Decimal]

  • compact (bool) – whether to change units to make compact (within correct 3 orders of magnitude, e.g., 30 kg instead of 30,000 g)

Returns

quantity normalized to be compact and without trailing zeros.

Return type

pint.Quantity[Decimal]
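
The "eliminate trailing zeros" part of this normalization can be illustrated on a bare Decimal magnitude using only the standard library. This is a sketch of the general technique, not the nuad implementation; the helper name strip_trailing_zeros is hypothetical, and unit compaction is not covered here.

```python
from decimal import Decimal


def strip_trailing_zeros(d: Decimal) -> Decimal:
    """Drop trailing zeros from d without switching to exponent notation."""
    # Whole numbers are quantized to exponent 0 so that 30.000 becomes 30
    # rather than 3E+1; other values are normalized, which removes the
    # trailing zeros after the decimal point.
    return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()


print(strip_trailing_zeros(Decimal("-3.400")))
print(strip_trailing_zeros(Decimal("30.000")))
```

Calling Decimal.normalize() alone would turn 30.000 into 3E+1, which is why whole numbers are handled separately.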

class constraints.SingularConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False)[source]
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

evaluate()

Essentially a wrapper for a function that evaluates the Constraint. It takes as input a tuple of DNA sequences (Python strings) and an optional Part, where Part is one of Domain, Strand, DomainPair, StrandPair, or Complex (the latter being an alias for arbitrary-length tuple of Strand’s).

The second argument will be None if SingularConstraint.parallel is True (since it’s more expensive to serialize the Domain and Strand objects than strings for passing data to processes executing in parallel).

Thus, if the Constraint needs to use more data about the Part than just its DNA sequence, by accessing the second argument, SingularConstraint.parallel should be set to False.

It should return a Result object.
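
A function of the shape described above might look like the following sketch, which penalizes GC content above a fixed fraction. It uses only the sequence, so it would still work when the second argument is None. ExampleResult is a hypothetical stand-in for Result so that the sketch runs without nuad installed, and the function name and the 0.5 threshold are illustrative assumptions.

```python
from typing import NamedTuple, Optional, Tuple


class ExampleResult(NamedTuple):
    """Hypothetical stand-in for Result, holding only excess and summary."""
    excess: float
    summary: str


def evaluate_gc_content(seqs: Tuple[str, ...], part: Optional[object]) -> ExampleResult:
    # Only the sequence is used, so `part` may be None (as when parallel is True).
    (seq,) = seqs  # a single-part constraint receives a tuple with one sequence
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    max_fraction = 0.5  # illustrative threshold, not a nuad default
    excess = max(0.0, gc_fraction - max_fraction)
    return ExampleResult(excess=excess, summary=f"GC fraction {gc_fraction:.2f}")


print(evaluate_gc_content(("GGCA",), None))
print(evaluate_gc_content(("ACGT",), None))
```

In real use, a function like this (returning an actual Result) would be passed as the evaluate parameter of a concrete subclass such as StrandConstraint.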

parallel: bool = False

Whether or not to use parallelization across multiple processes to take advantage of multiple processors/cores, by calling SingularConstraint.evaluate on different DesignParts in separate processes.

call_evaluate(seqs, part)[source]

Evaluates this Constraint using function SingularConstraint.evaluate supplied in constructor.

Parameters
  • seqs (Tuple[str, ...]) – sequence(s) of relevant Part, e.g., if part is a pair of Strand’s, then seqs is a pair of strings

  • part (DesignPart | None) – the Part to be evaluated. Might be None if parallelization is being used, since it is cheaper to serialize only the sequence(s) than the entire Part for passing to other processes to evaluate in parallel.

Returns

a Result whose excess is a float indicating how much the constraint was violated (0.0 if satisfied) and whose summary is a string summarizing the violation (or lack thereof), suitable for printing into a line of a report. For example, if measuring complex free energy -2.5 kcal/mol of a strand and comparing against a threshold -1.0 kcal/mol, excess might be the difference 1.5 between the energy and the threshold, and summary might be the string “-2.5 kcal/mol”.

Return type

Result[DesignPart]

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

Return type

None

class constraints.BulkConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>)[source]
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

Return type

None

class constraints.ConstraintWithDomains(domains=None)[source]
Parameters

domains (Tuple[Domain, ...] | None) –

domains: Tuple[Domain, ...] | None = None

Tuple of Domain’s to check; if not specified, all Domain’s in Design are checked.

__init__(domains=None)
Parameters

domains (Tuple[Domain, ...] | None) –

Return type

None

class constraints.ConstraintWithStrands(strands=None)[source]
Parameters

strands (Tuple[Strand, ...] | None) –

strands: Tuple[Strand, ...] | None = None

Tuple of Strand’s to check; if not specified, all Strand’s in Design are checked.

__init__(strands=None)
Parameters

strands (Tuple[Strand, ...] | None) –

Return type

None

class constraints.DomainConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domains=None)[source]

Constraint that applies to a single Domain.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • domains (Tuple[Domain, ...] | None) –

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domains=None)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • domains (Tuple[Domain, ...] | None) –

Return type

None

class constraints.StrandConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strands=None)[source]

Constraint that applies to a single Strand.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • strands (Tuple[Strand, ...] | None) –

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strands=None)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • strands (Tuple[Strand, ...] | None) –

Return type

None

class constraints.ConstraintWithDomainPairs(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • domain_pairs (Tuple[DomainPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Domain, Domain], ...] | None]) –

  • check_domain_against_itself (bool) –

domain_pairs: Tuple[DomainPair, ...] | None = None

Tuple of DomainPair’s to check; if not specified, all pairs in Design are checked.

This is set internally in the constructor based on the optional __init__ parameter pairs.

pairs: InitVar[Iterable[Tuple[Domain, Domain], ...] | None] = None

Init-only variable (specified in constructor, but is not a field in the class) for specifying pairs of domains to check; if not specified, all pairs in Design are checked.

check_domain_against_itself: bool = True

Whether to check a domain against itself when checking all pairs of Domain’s in the Design. Only used if ConstraintWithDomainPairs.pairs is not specified, otherwise it is ignored.
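
The effect of this flag on which pairs get checked can be sketched with itertools. This assumes that "all pairs" means each unordered pair is listed once, which fits constraints that are symmetric in their two arguments but is not confirmed by the text; the helper name all_pairs is hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from typing import List, Sequence, Tuple


def all_pairs(names: Sequence[str], check_against_itself: bool) -> List[Tuple[str, str]]:
    """Enumerate unordered pairs of names, optionally including (x, x)."""
    if check_against_itself:
        # Includes pairs such as ('a', 'a') in addition to the distinct pairs.
        return list(combinations_with_replacement(names, 2))
    # Only pairs of two different elements.
    return list(combinations(names, 2))


domain_names = ["a", "b", "c"]
print(all_pairs(domain_names, check_against_itself=True))
print(all_pairs(domain_names, check_against_itself=False))
```

For n domains this gives n(n+1)/2 pairs when the flag is True and n(n-1)/2 pairs when it is False.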

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, domain_pairs=None, pairs=None, check_domain_against_itself=True)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • domain_pairs (Tuple[DomainPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Domain, Domain], ...] | None]) –

  • check_domain_against_itself (bool) –

Return type

None

class constraints.ConstraintWithStrandPairs(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • strand_pairs (Tuple[StrandPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Strand, Strand], ...] | None]) –

  • check_strand_against_itself (bool) –

strand_pairs: Tuple[StrandPair, ...] | None = None

Tuple of StrandPair’s to check; if not specified, all pairs in Design are checked.

This is set internally in the constructor based on the optional __init__ parameter pairs.

pairs: InitVar[Iterable[Tuple[Strand, Strand], ...] | None] = None

Init-only variable (specified in constructor, but is not a field in the class) for specifying pairs of strands; if not specified, all pairs in Design are checked.

check_strand_against_itself: bool = True

Whether to check a strand against itself when checking all pairs of Strand’s in the Design. Only used if ConstraintWithStrandPairs.pairs is not specified, otherwise it is ignored.

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, strand_pairs=None, pairs=None, check_strand_against_itself=True)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • strand_pairs (Tuple[StrandPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Strand, Strand], ...] | None]) –

  • check_strand_against_itself (bool) –

Return type

None

class constraints.DomainPairConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]

Constraint that applies to a pair of Domain’s.

These should be symmetric, meaning that the constraint will give the same evaluation whether its evaluate method is given the pair (domain1, domain2), or the pair (domain2, domain1).

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • domain_pairs (Tuple[DomainPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Domain, Domain], ...] | None]) –

  • check_domain_against_itself (bool) –
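
The symmetry requirement can be illustrated with a simple measure on a pair of sequences that gives the same answer in either order. The sketch below counts positions at which two sequences carry the same base; the measure and the function name are illustrative assumptions, not constraints shipped with nuad.

```python
from typing import Tuple


def matching_positions(seqs: Tuple[str, str]) -> int:
    """Count aligned positions at which the two sequences have the same base."""
    seq1, seq2 = seqs
    # zip stops at the shorter sequence, and equality is symmetric, so
    # swapping seq1 and seq2 cannot change the count.
    return sum(base1 == base2 for base1, base2 in zip(seq1, seq2))


pair = ("ACGT", "ACCT")
swapped = (pair[1], pair[0])
print(matching_positions(pair), matching_positions(swapped))
```

By contrast, a test such as "the first sequence is a prefix of the second" depends on the order of the pair and so would not be a suitable basis for a pair constraint.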

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domain_pairs=None, pairs=None, check_domain_against_itself=True)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • domain_pairs (Tuple[DomainPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Domain, Domain], ...] | None]) –

  • check_domain_against_itself (bool) –

Return type

None

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

class constraints.StrandPairConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]

Constraint that applies to a pair of Strand’s.

These should be symmetric, meaning that the constraint will give the same evaluation whether its evaluate method is given the pair (strand1, strand2), or the pair (strand2, strand1).

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • strand_pairs (Tuple[StrandPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Strand, Strand], ...] | None]) –

  • check_strand_against_itself (bool) –

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strand_pairs=None, pairs=None, check_strand_against_itself=True)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • strand_pairs (Tuple[StrandPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Strand, Strand], ...] | None]) –

  • check_strand_against_itself (bool) –

Return type

None

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

class constraints.DomainsConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, domains=None)[source]

Constraint that applies to several Domain’s.

The difference with DomainConstraint is that the caller may want to process all Domain’s at once, e.g., by giving many of them to a third-party program such as ViennaRNA, which may be more efficient than repeatedly calling a Python function.

It is assumed that the constraint works by checking one Domain at a time. After computing initial violations of constraints, subsequent calls to this constraint only give the domain that was mutated, not the entire set of Domain’s in the whole Design. Use DesignConstraint for constraints that require every Domain in the Design.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • domains (Tuple[Domain, ...] | None) –

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, domains=None)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • domains (Tuple[Domain, ...] | None) –

Return type

None

class constraints.StrandsConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, strands=None)[source]

Constraint that applies to several Strand’s.

The difference with StrandConstraint is that the caller may want to process all Strand’s at once, e.g., by giving many of them to a third-party program such as ViennaRNA.

It is assumed that the constraint works by checking one Strand at a time. After computing initial violations of constraints, subsequent calls to this constraint only give strands containing the domain that was mutated, not the entire set of Strand’s in the whole Design. Use DesignConstraint for constraints that require every Strand in the Design.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • strands (Tuple[Strand, ...] | None) –
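
The shape of an evaluate_bulk function, which receives a sequence of parts and returns one result per part, might be sketched as follows. ExampleStrand and ExampleResult are hypothetical stand-ins (exposing only sequence(), excess, and summary) so that the sketch runs without nuad installed, and the GGGG check is an illustrative constraint, not one taken from nuad.

```python
from typing import List, NamedTuple, Sequence


class ExampleResult(NamedTuple):
    """Hypothetical stand-in for Result, holding only excess and summary."""
    excess: float
    summary: str


class ExampleStrand:
    """Hypothetical stand-in for Strand, exposing only sequence()."""

    def __init__(self, seq: str) -> None:
        self._seq = seq

    def sequence(self) -> str:
        return self._seq


def evaluate_bulk_poly_g(strands: Sequence[ExampleStrand]) -> List[ExampleResult]:
    # Gather all sequences first; a real constraint might hand this whole
    # batch to an external program in a single call.
    seqs = [strand.sequence() for strand in strands]
    results = []
    for seq in seqs:
        count = seq.count("GGGG")  # non-overlapping occurrences of GGGG
        results.append(ExampleResult(excess=float(count), summary=f"{count} GGGG run(s)"))
    # One result per input strand, in the same order as the input.
    return results


batch = [ExampleStrand("ACGT"), ExampleStrand("GGGGA")]
print(evaluate_bulk_poly_g(batch))
```

Because later calls may pass only the strands affected by a mutation, a function like this should not assume it always sees every Strand in the Design.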

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, strands=None)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • strands (Tuple[Strand, ...] | None) –

Return type

None

class constraints.DomainPairsConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]

Similar to DomainsConstraint but operates on a specified list of pairs of Domain’s.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • domain_pairs (Tuple[DomainPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Domain, Domain], ...] | None]) –

  • check_domain_against_itself (bool) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, domain_pairs=None, pairs=None, check_domain_against_itself=True)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • domain_pairs (Tuple[DomainPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Domain, Domain], ...] | None]) –

  • check_domain_against_itself (bool) –

Return type

None

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

class constraints.StrandPairsConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]

Similar to StrandsConstraint but operates on a specified list of pairs of Strand’s.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • strand_pairs (Tuple[StrandPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Strand, Strand], ...] | None]) –

  • check_strand_against_itself (bool) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, strand_pairs=None, pairs=None, check_strand_against_itself=True)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • strand_pairs (Tuple[StrandPair, ...] | None) –

  • pairs (InitVar[Iterable[Tuple[Strand, Strand], ...] | None]) –

  • check_strand_against_itself (bool) –

Return type

None

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

class constraints.DesignConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_design=<function DesignConstraint.<lambda>>)[source]

Constraint that applies to the entire Design. This is used for any Constraint that does not naturally fit the structure of the other types of constraints.

Unlike other constraints, which specify either Constraint._evaluate or Constraint._evaluate_bulk, a DesignConstraint leaves both of these unspecified and specifies DesignConstraint._evaluate_design instead.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_design (Callable[[Design, Iterable[Domain]], List[Tuple[DesignPart, float, str]]]) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_design=<function DesignConstraint.<lambda>>)
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_design (Callable[[Design, Iterable[Domain]], List[Tuple[DesignPart, float, str]]]) –

Return type

None

evaluate_design()

Evaluates the Design (first argument), possibly taking into account which Domain’s have changed in the last iteration (second argument).

Returns a list of tuples (part, score, summary), one tuple per violation of the DesignConstraint.

part is the part of the Design that caused the violation. It must be one of Domain, Strand, pair of Domain’s, or tuple of Strand’s.

score is the score of the violation.

summary is a 1-line summary of the violation to put into the generated reports.
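The shape of the return value described above can be illustrated with a minimal standalone sketch. It does not use nuad types: the "design" is a hypothetical dict mapping strand names to sequences, the "part" is just the strand name, and the function name evaluate_design_sketch and the GGGG rule are invented for illustration. The second argument is accepted but ignored here.

```python
from typing import Dict, Iterable, List, Tuple


def evaluate_design_sketch(design: Dict[str, str],
                           changed: Iterable[str]) -> List[Tuple[str, float, str]]:
    """Report one (part, score, summary) tuple per strand containing a GGGG run.

    Stand-in types (assumptions of this sketch): `design` maps strand names to
    sequences, and each reported part is a strand name rather than a Strand.
    `changed` is ignored, so every strand is re-checked on each call.
    """
    violations: List[Tuple[str, float, str]] = []
    for name, sequence in design.items():
        num_runs = sequence.count('GGGG')  # non-overlapping occurrences
        if num_runs > 0:
            summary = f'{name} has {num_runs} GGGG run(s)'
            violations.append((name, float(num_runs), summary))
    return violations


design = {'s0': 'ACGGGGTA', 's1': 'ACGTACGT'}
violations = evaluate_design_sketch(design, changed=[])
```

Only s0 violates the rule in this example, so the list holds a single tuple; a design with no violations yields an empty list.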

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

constraints.verify_designs_match(design1, design2, check_fixed=True)[source]

Verifies that two designs match, other than their constraints. This is useful when loading a design that has been saved in the middle of searching for DNA sequences, to verify that it matches a design created before the DNA sequence search started.

Parameters
  • design1 (Design) – A Design.

  • design2 (Design) – Another Design.

  • check_fixed (bool) – Whether to check that fixed sequences are equal between the two designs (one may want to skip this check if the fixed sequences are set later).

Raises

ValueError – If the designs do not match. Here is what is checked:

  • strand names and group names appear in the same order

  • domain names and pool names appear in the same order in strands with the same name

  • Domain.fixed matches between Domain’s

Return type

None

constraints.convert_threshold(threshold, key)[source]
Parameters
  • threshold (float | Dict[T, float]) – either a single float, or a dictionary mapping instances of T to floats

  • key (T) – instance of T

Returns

threshold for key

Return type

float
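A minimal standalone sketch of the lookup described by these parameters, assuming a missing key simply raises KeyError (the source does not say what happens in that case). The name convert_threshold_sketch and the strand names used as keys are hypothetical; this is not nuad's implementation.

```python
from typing import Dict, TypeVar, Union

T = TypeVar('T')


def convert_threshold_sketch(threshold: Union[float, Dict[T, float]], key: T) -> float:
    """Return `threshold` itself if it is a single number, else look up `key` in it."""
    if isinstance(threshold, dict):
        # Per-key thresholds; a missing key raises KeyError (assumption of this sketch).
        return threshold[key]
    return float(threshold)


# One threshold shared by every key ...
uniform = convert_threshold_sketch(-2.0, 'strand_a')
# ... or a separate threshold per key (keys are hypothetical strand names).
per_key = convert_threshold_sketch({'strand_a': -1.5, 'strand_b': -3.0}, 'strand_b')
```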

constraints.nupack_domain_free_energy_constraint(threshold, temperature=37.0, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=<function default_score_transfer_function>, parallel=False, description=None, short_description='strand_ss_nupack', domains=None)[source]

Returns constraint that checks individual Domain’s for excessive interaction using NUPACK’s pfunc.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • threshold (float) – energy threshold in kcal/mol

  • temperature (float) – temperature in Celsius

  • sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter

  • magnesium (float) – molarity of magnesium (Mg++) in moles per liter

  • weight (float) – how much to weigh this Constraint

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.

  • domains (Iterable[Domain] | None) – Domain’s to check; if not specified, all domains are checked.

  • description (str | None) – detailed description of constraint suitable for putting in report; if not specified a reasonable default is chosen

  • short_description (str) – short description of constraint suitable for logging to stdout

Returns

the constraint

Return type

DomainConstraint

constraints.nupack_strand_free_energy_constraint(threshold, temperature=37.0, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=<function default_score_transfer_function>, parallel=False, description=None, short_description='strand_ss_nupack', strands=None)[source]

Returns constraint that checks individual Strand’s for excessive interaction using NUPACK’s pfunc. This is the so-called “complex free energy”: https://docs.nupack.org/definitions/#complex-free-energy

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • threshold (float) – energy threshold in kcal/mol

  • temperature (float) – temperature in Celsius

  • sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter

  • magnesium (float) – molarity of magnesium (Mg++) in moles per liter

  • weight (float) – how much to weigh this Constraint

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.

  • strands (Iterable[Strand] | None) – Strands to check; if not specified, all strands are checked.

  • description (str | None) – detailed description of constraint suitable for putting in report; if not specified a reasonable default is chosen

  • short_description (str) – short description of constraint suitable for logging to stdout

Returns

the constraint

Return type

StrandConstraint

constraints.nupack_domain_pair_constraint(threshold, temperature=37.0, sodium=0.05, magnesium=0.0125, parallel=False, weight=1.0, score_transfer_function=<function default_score_transfer_function>, description=None, short_description='dom_pair_nupack', pairs=None)[source]

Returns constraint that checks given pairs of Domain’s for excessive interaction using NUPACK’s pfunc executable. Each of the four combinations of seq1, seq2 and their Watson-Crick complements is compared.

Parameters
  • threshold (float) – Energy threshold in kcal/mol.

  • temperature (float) – Temperature in Celsius

  • sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter

  • magnesium (float) – molarity of magnesium (Mg++) in moles per liter

  • parallel (bool) – Whether to test each pair of Domain’s in parallel (i.e., sets field Constraint.parallel)

  • weight (float) – How much to weigh this Constraint.

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • description (str | None) – Detailed description of constraint suitable for summary report.

  • short_description (str) – Short description of constraint suitable for logging to stdout.

  • pairs (Iterable[Tuple[Domain, Domain]] | None) – Pairs of Domain’s to compare; if not specified, checks all pairs (including a Domain against itself).

Returns

The DomainPairConstraint.

Return type

DomainPairConstraint
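The four sequence combinations mentioned in the description of nupack_domain_pair_constraint() can be enumerated as in the following standalone sketch. It assumes that "Watson-Crick complement" means the reverse complement; the helper names wc and four_combinations are chosen for illustration and are not taken from nuad.

```python
from typing import List, Tuple

_COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}


def wc(sequence: str) -> str:
    """Reverse complement of a DNA sequence (assumed meaning of Watson-Crick complement)."""
    return ''.join(_COMPLEMENT[base] for base in reversed(sequence))


def four_combinations(seq1: str, seq2: str) -> List[Tuple[str, str]]:
    """All pairings of {seq1, wc(seq1)} with {seq2, wc(seq2)}."""
    return [(s1, s2) for s1 in (seq1, wc(seq1)) for s2 in (seq2, wc(seq2))]


combos = four_combinations('AACG', 'TTTC')
```

Each of the four resulting pairs would then be scored separately against the energy threshold.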

constraints.nupack_strand_pair_constraints_by_number_matching_domains(thresholds, temperature=37.0, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=<function default_score_transfer_function>, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None)[source]

Convenience function for creating many constraints as returned by nupack_strand_pair_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.

Optional parameters descriptions and short_descriptions are also dicts keyed by the same keys.

Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.

It is also common to set different thresholds according to the lengths of the strands. This can be done by calling strand_pairs_by_lengths() to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.

Parameters
  • thresholds (Dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].

  • temperature (float) – Temperature in Celsius.

  • sodium (float) – concentration of Na+ in molar

  • magnesium (float) – concentration of Mg++ in molar

  • weight (float) – How much to weigh this Constraint.

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • descriptions (Dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.

  • short_descriptions (Dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.

  • parallel (bool) – Whether to test each pair of Strand’s in parallel.

  • strands (Iterable[Strand] | None) – Strand’s to compare; all pairs of these are checked, including each strand with itself. If not specified, the pairs in pairs are checked instead. Mutually exclusive with pairs.

  • pairs (Iterable[Tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.

Returns

list of constraints, one per threshold in thresholds

Return type

List[StrandPairConstraint]

constraints.nupack_strand_pair_constraint(threshold, temperature=37.0, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=<function default_score_transfer_function>, description=None, short_description='strand_pair_nupack', parallel=False, pairs=None)[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using NUPACK’s pfunc function.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • threshold (float) – Energy threshold in kcal/mol

  • temperature (float) – Temperature in Celsius

  • sodium (float) – concentration of Na+ in molar

  • magnesium (float) – concentration of Mg++ in molar

  • weight (float) – How much to weigh this Constraint.

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.

  • description (str | None) – Detailed description of constraint suitable for report.

  • short_description (str) – Short description of constraint suitable for logging to stdout.

  • pairs (Iterable[Tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs (including a Strand against itself).

Returns

The StrandPairConstraint.

Return type

StrandPairConstraint

constraints.chunker(sequence, chunk_length=None, num_chunks=None)[source]

Collect data into fixed-length chunks or blocks, e.g., chunker(‘ABCDEFG’, 3) –> ABC DEF G

Parameters
  • sequence (Sequence[T]) – Sequence (list or tuple) of items.

  • chunk_length (int | None) – Length of each chunk. Mutually exclusive with num_chunks.

  • num_chunks (int | None) – Number of chunks. Mutually exclusive with chunk_length.

Returns

List of num_chunks lists, each list of length chunk_length (one of num_chunks or chunk_length will be calculated from the other).

Return type

List[List[T]]
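The chunk_length case from the example above can be sketched in a few standalone lines. This is only an illustration of fixed-length chunking, not nuad's implementation; the name chunk_by_length is hypothetical, and the num_chunks case is left out because the source does not specify how the chunk length is derived from it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')


def chunk_by_length(sequence: Sequence[T], chunk_length: int) -> List[List[T]]:
    """Split `sequence` into consecutive chunks of `chunk_length` items.

    The final chunk is shorter when len(sequence) is not a multiple of chunk_length.
    """
    if chunk_length < 1:
        raise ValueError('chunk_length must be positive')
    return [list(sequence[i:i + chunk_length])
            for i in range(0, len(sequence), chunk_length)]


chunks = chunk_by_length('ABCDEFG', 3)
joined = [''.join(chunk) for chunk in chunks]
```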

constraints.cpu_count(logical=False)[source]

Counts the number of physical CPUs (cores). For greatest accuracy, requires the 3rd party psutil package to be installed.

Parameters

logical (bool) – Whether to count number of logical processors or physical CPU cores.

Returns

Number of physical CPU cores if logical is False and package psutil is installed; otherwise, the number of logical processors.

Return type

int
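The fallback behavior described above might look like the following sketch, which tries psutil for the physical core count and otherwise uses the standard library. The name cpu_count_sketch and the final fallback to 1 are assumptions of this example, not details confirmed by the source.

```python
import os


def cpu_count_sketch(logical: bool = False) -> int:
    """Count physical cores when possible, else fall back to logical processors."""
    if not logical:
        try:
            import psutil  # third-party and optional
            physical = psutil.cpu_count(logical=False)
            if physical is not None:
                return physical
        except ImportError:
            pass  # psutil not installed: fall through to the logical count
    # os.cpu_count() may return None if the count cannot be determined;
    # falling back to 1 is an assumption of this sketch.
    return os.cpu_count() or 1


num_workers = cpu_count_sketch()
```

A count obtained this way is a natural choice for the number of worker processes when a constraint is evaluated with parallel=True.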

constraints.rna_duplex_domain_pairs_constraint(threshold, temperature=37.0, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_dup_dom_pairs', pairs=None, parameters_filename='dna_mathews1999.par')[source]

Returns constraint that checks given pairs of Domain’s for excessive interaction using Vienna RNA’s RNAduplex executable.

Parameters
  • threshold (float) – energy threshold

  • temperature (float) – temperature in Celsius

  • weight (float) – how much to weigh this Constraint

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • description (str | None) – long description of constraint suitable for printing in report file

  • short_description (str) – short description of constraint suitable for logging to stdout

  • pairs (Iterable[Tuple[Domain, Domain]] | None) – pairs of Domain’s to compare; if not specified, checks all pairs

  • parameters_filename (str) – name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns

constraint

Return type

DomainPairsConstraint

constraints.strand_pairs_by_lengths(strands)[source]

Separates pairs of strands in strands by lengths. If there are n different strand lengths in strands, then there are ((n+1) choose 2) keys in the returned dict, one for each pair of lengths (len1, len2), including pairs where len1 == len2. This key maps to a list of all pairs of strands in strands where the first strand has length len1 and the second has length len2.

Parameters

strands (Iterable[Strand]) – strands to check

Returns

dict mapping pairs of lengths to pairs of strands from strands having those respective lengths

Return type

Dict[Tuple[int, int], List[Tuple[Strand, Strand]]]
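The grouping can be sketched without nuad by letting plain strings stand in for Strand’s. Two assumptions are made to match the stated count of ((n+1) choose 2) keys: each strand is also paired with itself, and the shorter strand is placed first so that (len1, len2) and (len2, len1) share one key. The function name pairs_by_lengths_sketch is hypothetical.

```python
import itertools
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pairs_by_lengths_sketch(
        strands: Iterable[str],
) -> Dict[Tuple[int, int], List[Tuple[str, str]]]:
    """Group all pairs of sequences (self-pairs included) by their pair of lengths."""
    grouped: Dict[Tuple[int, int], List[Tuple[str, str]]] = defaultdict(list)
    for s1, s2 in itertools.combinations_with_replacement(list(strands), 2):
        # Put the shorter sequence first so each unordered length pair has one key.
        if len(s1) > len(s2):
            s1, s2 = s2, s1
        grouped[(len(s1), len(s2))].append((s1, s2))
    return dict(grouped)


grouped = pairs_by_lengths_sketch(['AAAA', 'CCCC', 'GGGGGG'])
```

With two distinct lengths (4 and 6) the result has three keys: (4, 4), (4, 6), and (6, 6). Each value could then be passed as the pairs argument of a function that takes a length-specific threshold.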

constraints.strand_pairs_by_number_matching_domains(*, strands=None, pairs=None)[source]

Utility function for calculating number of complementary domains between several pairs of strands.

Parameters
  • strands (Iterable[Strand] | None) – list of Strand’s in which to find pairs. Mutually exclusive with pairs.

  • pairs (Iterable[Tuple[Strand, Strand]] | None) – list of pairs of strands. Mutually exclusive with strands.

Returns

dict mapping integer (number of complementary Domain’s) to the list of pairs of strands in strands with that number of complementary domains

Return type

Dict[int, List[Tuple[Strand, Strand]]]
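A rough standalone sketch of this grouping follows. It models a strand as a tuple of domain names, treats a trailing * as marking the complement, and counts the domains of the first strand whose complement appears in the second; all of these are simplifying assumptions of the example, and nuad's exact counting rule may differ. The thresholds at the end are hypothetical values showing how a dict keyed by the number of matching domains can be applied.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

StrandSketch = Tuple[str, ...]  # stand-in: a strand is a tuple of domain names


def complement(name: str) -> str:
    """Map 'a' to 'a*' and 'a*' to 'a' (starred names denote complements here)."""
    return name[:-1] if name.endswith('*') else name + '*'


def pairs_by_matching_domains_sketch(
        pairs: Sequence[Tuple[StrandSketch, StrandSketch]],
) -> Dict[int, List[Tuple[StrandSketch, StrandSketch]]]:
    grouped: Dict[int, List[Tuple[StrandSketch, StrandSketch]]] = defaultdict(list)
    for s1, s2 in pairs:
        # Count domains of s1 whose complement appears somewhere in s2.
        num_matching = sum(1 for d in s1 if complement(d) in s2)
        grouped[num_matching].append((s1, s2))
    return dict(grouped)


s0 = ('a', 'b', 'c')
s1 = ('c*', 'b*', 'x')
s2 = ('y', 'z')
by_count = pairs_by_matching_domains_sketch([(s0, s1), (s0, s2), (s1, s2)])

# Hypothetical thresholds in kcal/mol, keyed by number of matching domains.
thresholds = {0: -2.0, 2: -12.0}
threshold_per_pair = {pair: thresholds[k]
                      for k, group in by_count.items() for pair in group}
```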

constraints.rna_cofold_strand_pairs_constraints_by_number_matching_domains(*, thresholds, temperature=37.0, weight=1.0, score_transfer_function=<function default_score_transfer_function>, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, parameters_filename='dna_mathews1999.par')[source]

Similar to rna_duplex_strand_pairs_constraints_by_number_matching_domains() but creates constraints as returned by rna_cofold_strand_pairs_constraint().

Parameters
  • thresholds (Dict[int, float]) –

  • temperature (float) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • descriptions (Dict[int, str] | None) –

  • short_descriptions (Dict[int, str] | None) –

  • parallel (bool) –

  • strands (Iterable[Strand] | None) –

  • pairs (Iterable[Tuple[Strand, Strand]] | None) –

  • parameters_filename (str) –

Return type

List[StrandPairsConstraint]

constraints.rna_duplex_strand_pairs_constraints_by_number_matching_domains(*, thresholds, temperature=37.0, weight=1.0, score_transfer_function=<function default_score_transfer_function>, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, parameters_filename='dna_mathews1999.par')[source]

Convenience function for creating many constraints as returned by rna_duplex_strand_pairs_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.

Optional parameters descriptions and short_descriptions are also dicts keyed by the same keys.

Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.

It is also common to set different thresholds according to the lengths of the strands. This can be done by calling strand_pairs_by_lengths() to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.

Parameters
  • thresholds (Dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].

  • temperature (float) – Temperature in Celsius.

  • weight (float) – How much to weigh this Constraint.

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • descriptions (Dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.

  • short_descriptions (Dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.

  • parallel (bool) – Whether to test each pair of Strand’s in parallel.

  • strands (Iterable[Strand] | None) – Strand’s to compare; all pairs of these are checked, including each strand with itself. If not specified, the pairs in pairs are checked instead. Mutually exclusive with pairs.

  • pairs (Iterable[Tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.

  • parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns

list of constraints, one per threshold in thresholds

Return type

List[StrandPairsConstraint]

constraints.rna_duplex_strand_pairs_constraint(*, threshold, temperature=37.0, weight=1.0, score_transfer_function=<function default_score_transfer_function>, description=None, short_description='rna_dup_strand_pairs', parallel=False, pairs=None, parameters_filename='dna_mathews1999.par')[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using Vienna RNA’s RNAduplex executable.

Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function rna_duplex_strand_pairs_constraints_by_number_matching_domains() is useful for this purpose, returning a list of StrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.

Parameters
  • threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that

  • temperature (float) – Temperature in Celsius.

  • weight (float) – How much to weigh this Constraint.

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • description (str | None) – Long description of constraint suitable for putting into constraint report.

  • short_description (str) – Short description of constraint suitable for logging to stdout.

  • parallel (bool) – Whether to test each pair of Strand’s in parallel.

  • pairs (Iterable[Tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in design.

  • parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns

The StrandPairsConstraint.

Return type

StrandPairsConstraint

constraints.rna_cofold_strand_pairs_constraint(*, threshold, temperature=37.0, weight=1.0, score_transfer_function=<function default_score_transfer_function>, description=None, short_description='rna_dup_strand_pairs', parallel=False, pairs=None, parameters_filename='dna_mathews1999.par')[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using Vienna RNA’s RNAcofold executable.

Parameters
  • threshold (float) – Energy threshold in kcal/mol

  • temperature (float) – Temperature in Celsius.

  • weight (float) – How much to weigh this Constraint.

  • score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.

  • description (str | None) – Long description of constraint suitable for putting into constraint report.

  • short_description (str) – Short description of constraint suitable for logging to stdout.

  • parallel (bool) – Whether to test each pair of Strand’s in parallel.

  • pairs (Iterable[Tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs.

  • parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns

The StrandPairsConstraint.

Return type

StrandPairsConstraint

class constraints.ConstraintWithComplexes(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, complexes=())[source]
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • complexes (Tuple[Complex, ...]) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, complexes=())
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • complexes (Tuple[Complex, ...]) –

Return type

None

complexes: Tuple[Complex, ...] = ()

List of complexes (tuples of Strand’s) to check.

class constraints.ComplexConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, complexes=())[source]

Constraint that applies to a complex (tuple of Strand’s).

Specify Constraint._evaluate in the constructor.

Unlike other types of Constraint’s such as StrandConstraint or StrandPairConstraint, there is no default list of Complex’s that a ComplexConstraint is applied to. The list of Complex’s must be specified manually in the constructor.

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • complexes (Tuple[Complex, ...]) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate=<function SingularConstraint.<lambda>>, parallel=False, complexes=())
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate (Callable[[Tuple[str, ...], DesignPart | None], Result[DesignPart]]) –

  • parallel (bool) –

  • complexes (Tuple[Complex, ...]) –

Return type

None

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

class constraints.ComplexesConstraint(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, complexes=())[source]

Similar to ComplexConstraint but operates on a specified list of complexes (tuples of Strand’s).

Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • complexes (Tuple[Complex, ...]) –

__init__(description, short_description='', weight=1.0, score_transfer_function=<function default_score_transfer_function>, evaluate_bulk=<function BulkConstraint.<lambda>>, complexes=())
Parameters
  • description (str) –

  • short_description (str) –

  • weight (float) –

  • score_transfer_function (Callable[[float], float]) –

  • evaluate_bulk (Callable[[Sequence[DesignPart]], List[Result]]) –

  • complexes (Tuple[Complex, ...]) –

Return type

None

description: str

Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’.

part_name()[source]

Returns name of the Part that this Constraint tests.

Returns

name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)

Return type

str

constraints.default_interior_to_strand_probability = 0.98

Default probability threshold for BasePairType.INTERIOR_TO_STRAND

constraints.default_adjacent_to_exterior_base_pair = 0.95

Default probability threshold for BasePairType.ADJACENT_TO_EXTERIOR_BASE_PAIR

constraints.default_blunt_end_probability = 0.33

Default probability threshold for BasePairType.BLUNT_END

constraints.default_nick_3p_probability = 0.79

Default probability threshold for BasePairType.NICK_3P

constraints.default_nick_5p_probability = 0.73

Default probability threshold for BasePairType.NICK_5P

constraints.default_dangle_3p_probability = 0.51

Default probability threshold for BasePairType.DANGLE_3P

constraints.default_dangle_5p_probability = 0.57

Default probability threshold for BasePairType.DANGLE_5P

constraints.default_dangle_5p_3p_probability = 0.73

Default probability threshold for BasePairType.DANGLE_5P_3P

constraints.default_overhang_on_this_strand_3p_probability = 0.82

Default probability threshold for BasePairType.OVERHANG_ON_THIS_STRAND_3P

constraints.default_overhang_on_this_strand_5p_probability = 0.79

Default probability threshold for BasePairType.OVERHANG_ON_THIS_STRAND_5P

constraints.default_overhang_on_adjacent_strand_3p_probability = 0.55

Default probability threshold for BasePairType.OVERHANG_ON_ADJACENT_STRAND_3P

constraints.default_overhang_on_adjacent_strand_5p_probability = 0.49

Default probability threshold for BasePairType.OVERHANG_ON_ADJACENT_STRAND_5P

constraints.default_overhang_on_both_strand_3p_probability = 0.61

Default probability threshold for BasePairType.OVERHANG_ON_BOTH_STRANDS_3P

constraints.default_overhang_on_both_strand_5p_probability = 0.55

Default probability threshold for BasePairType.OVERHANG_ON_BOTH_STRANDS_5P

constraints.default_three_arm_junction_probability = 0.69

Default probability threshold for BasePairType.THREE_ARM_JUNCTION

constraints.default_four_arm_junction_probability = 0.84

Default probability threshold for BasePairType.FOUR_ARM_JUNCTION

constraints.default_five_arm_junction_probability = 0.77

Default probability threshold for BasePairType.FIVE_ARM_JUNCTION

constraints.default_mismatch_probability = 0.76

Default probability threshold for BasePairType.MISMATCH

constraints.default_bulge_loop_3p_probability = 0.69

Default probability threshold for BasePairType.BULGE_LOOP_3P

constraints.default_bulge_loop_5p_probability = 0.65

Default probability threshold for BasePairType.BULGE_LOOP_5P

constraints.default_unpaired_probability = 0.95

Default probability threshold for BasePairType.UNPAIRED

constraints.default_other_probability = 0.7

Default probability threshold for BasePairType.OTHER

class constraints.BasePairType(value)[source]

Represents different configurations for a base pair and its immediate neighboring base pairs (or lack thereof).

Notation:

  • “#” denotes the ends of a domain. Each end can either be the end of a strand or be connected to another domain.

  • “]” and “[” indicate the 5’ end of a strand

  • “>” and “<” indicate the 3’ end of a strand

  • “-” indicates a base (the number of these is not important).

  • “|” indicates that bases are bound (forming a base pair). Any “-” not connected by “|” is unbound

Domain Example:

The following represents an unbound domain of length 5

#-----#

The following represents bound domains of length 5

#-----#
 |||||
#-----#

Occasionally, domains are drawn vertically, as in the case of overhangs. In this case, the meanings of “-” and “|” are swapped.

Vertical Domain Example:

# #
|-|
|-|
|-|
|-|
|-|
# #

Formatting:

  • Top strands have the 5’ end on the left side and the 3’ end on the right side

  • Bottom strands have the 3’ end on the left side and the 5’ end on the right side

Strand Example:

strand0: a-b-c-d
strand1: d*-b*-c*-a*

            a      b      c      d
strand0  [-----##-----##-----##----->
          |||||  |||||  |||||  |||||
strand1  <-----##-----##-----##-----]
            a*     b*     c*     d*

Consecutive “#”:

In some cases, extra “#” are needed to make space for the ASCII art. We consider any run of consecutive “#”s to be equivalent to “##”. The following is considered equivalent to the example above

            a       b        c      d
strand0  [-----###-----####-----##----->
          |||||   |||||    |||||  |||||
strand1  <-----###-----####-----##-----]
            a*      b*       c*     d*

Note that only consecutive “#”s are considered equivalent to “##”. The following example is not equivalent to the strands above because the “#”s between b and c are separated by spaces, so they are not equivalent to “##”, meaning that b and c need not be adjacent. Note that while b and c need not be adjacent, b* and c* are still adjacent because they are separated by consecutive “#”s with no spaces in between.

            a       b        c      d
strand0  [-----###-----#  #-----##----->
          |||||   |||||    |||||  |||||
strand1  <-----###-----####-----##-----]
            a*      b*       c*     d*
INTERIOR_TO_STRAND = 1

Base pair is located inside of a strand but not next to a base pair that resides on the end of a strand.

Similar base-pairing probability compared to ADJACENT_TO_EXTERIOR_BASE_PAIR but usually breathes less.

#-----##-----#
 |||||  |||||
#-----##-----#
     ^
     |
 base pair
ADJACENT_TO_EXTERIOR_BASE_PAIR = 2

Base pair is located inside of a strand and next to a base pair that resides on the end of a strand.

Similar base-pairing probability compared to INTERIOR_TO_STRAND but usually breathes more.

#-----#
 |||||
#-----]
    ^
    |
base pair

or

#----->
 |||||
#-----#
    ^
    |
base pair
BLUNT_END = 3

Base pair is located at the end of both strands.

#----->
 |||||
#-----]
     ^
     |
 base pair
NICK_3P = 4

Base pair is located at a nick involving the 3’ end of the strand.

#----->[-----#
 |||||  |||||
#-----##-----#
     ^
     |
 base pair
NICK_5P = 5

Base pair is located at a nick involving the 5’ end of the strand.

#-----##-----#
 |||||  |||||
#-----]<-----#
     ^
     |
 base pair
DANGLE_3P = 6

Base pair is located at the end of a strand with a dangle on the 3’ end.

#-----##----#
 |||||
#-----]
     ^
     |
 base pair
DANGLE_5P = 7

Base pair is located at the end of a strand with a dangle on the 5’ end.

#----->
 |||||
#-----##----#
     ^
     |
 base pair
DANGLE_5P_3P = 8

Base pair is located with dangle at both the 3’ and 5’ end.

#-----##----#
 |||||
#-----##----#
     ^
     |
 base pair
OVERHANG_ON_THIS_STRAND_3P = 9

Base pair is next to an overhang on the 3’ end.

      #
      |
      |
      |
      #
#-----# #-----#
 |||||   |||||
#-----###-----#
     ^
     |
 base pair
OVERHANG_ON_THIS_STRAND_5P = 10

Base pair is next to an overhang on the 5’ end.

 base pair
     |
     v
#-----###-----#
 |||||   |||||
#-----# #-----#
      #
      |
      |
      |
      #
OVERHANG_ON_ADJACENT_STRAND_3P = 11

Base pair 3’ end interfaces with an overhang.

The adjacent base pair type is OVERHANG_ON_THIS_STRAND_5P

        #
        |
        |
        |
        #
#-----# #---#
 |||||   |||
#-----###---#
     ^
     |
 base pair
OVERHANG_ON_ADJACENT_STRAND_5P = 12

Base pair 5’ end interfaces with an overhang.

The adjacent base pair type is OVERHANG_ON_THIS_STRAND_3P

 base pair
     |
     v
#-----###-----#
 |||||   |||||
#-----# #-----#
        #
        |
        |
        |
        #
OVERHANG_ON_BOTH_STRANDS_3P = 13

Base pair’s 3’ end is an overhang and adjacent strand also has an overhang.

      # #
      | |
      | |
      | |
      # #
#-----# #---#
 |||||  ||||
#-----###---#
     ^
     |
 base pair
OVERHANG_ON_BOTH_STRANDS_5P = 14

Base pair’s 5’ end is an overhang and adjacent strand also has an overhang.

 base pair
     |
     v
#-----###-----#
 |||||   |||||
#-----# #-----#
      # #
      | |
      | |
      | |
      # #
THREE_ARM_JUNCTION = 15

Base pair is located next to a three-arm-junction.

      # #
      |-|
      |-|
      |-|
      # #
#-----# #---#
 |||||  ||||
#-----###---#
     ^
     |
 base pair
FOUR_ARM_JUNCTION = 16

Currently, this case isn’t actually detected (considered as OTHER).

Base pair is located next to a four-arm-junction (e.g. Holliday junction).

      # #
      |-|
      |-|
      |-|
      # #
#-----# #-----#
 |||||   |||||
#-----# #-----#
      # #
      |-|
      |-|
      |-|
      # #
Type

TODO

FIVE_ARM_JUNCTION = 17

Currently, this case isn’t actually detected (considered as OTHER).

Base pair is located next to a five-arm-junction.

Type

TODO

MISMATCH = 18

Currently, this case isn’t actually detected (considered as DANGLE_5P_3P).

Base pair is located next to a mismatch.

#-----##-##-----#
 |||||     |||||
#-----##-##-----#
     ^
     |
 base pair
Type

TODO

BULGE_LOOP_3P = 19

Currently, this case isn’t actually detected (considered as OVERHANG_ON_BOTH_STRANDS_3P).

Base pair is located next to a bulge loop on the 3’ side.

#-----##-##-----#
 |||||     |||||
#-----#####-----#
     ^
     |
 base pair
Type

TODO

BULGE_LOOP_5P = 20

Currently, this case isn’t actually detected (considered as OVERHANG_ON_BOTH_STRANDS_5P).

Base pair is located next to a bulge loop on the 5’ side.

#-----#####-----#
 |||||     |||||
#-----##-##-----#
     ^
     |
 base pair
Type

TODO

UNPAIRED = 21

Base is unpaired.

Probabilities specify how unlikely a base is to be paired with another base.

OTHER = 22

Other base pair types.

class constraints.StrandDomainAddress(strand, domain_idx)[source]

An addressing scheme for specifying a domain on a strand.

Parameters
  • strand (Strand) –

  • domain_idx (int) –

__init__(strand, domain_idx)
Parameters
  • strand (Strand) –

  • domain_idx (int) –

Return type

None

strand: Strand

strand to index

domain_idx: int

order in which domain appears in StrandDomainAddress.strand

neighbor_5p()[source]

Returns 5’ domain neighbor. If domain is 5’ end of strand, returns None

Returns

StrandDomainAddress of 5’ neighbor or None if no 5’ neighbor

Return type

StrandDomainAddress | None

neighbor_3p()[source]

Returns 3’ domain neighbor. If domain is 3’ end of strand, returns None

Returns

StrandDomainAddress of 3’ neighbor or None if no 3’ neighbor

Return type

StrandDomainAddress | None

domain()[source]

Returns domain referenced by this address.

Returns

domain

Return type

Domain
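The neighbor methods can be pictured with a small stand-in that does not import nuad. In this sketch, ToyStrand and ToyAddress are hypothetical names, and it assumes that domains are listed from 5’ to 3’, so the 5’ neighbor of a domain has index domain_idx - 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyStrand:
    # Hypothetical stand-in for Strand: domain names listed 5' to 3'.
    domain_names: Tuple[str, ...]


@dataclass(frozen=True)
class ToyAddress:
    # Hypothetical stand-in for StrandDomainAddress.
    strand: ToyStrand
    domain_idx: int

    def neighbor_5p(self) -> Optional["ToyAddress"]:
        # Assumption: index 0 is the 5' end, which has no 5' neighbor.
        if self.domain_idx == 0:
            return None
        return ToyAddress(self.strand, self.domain_idx - 1)

    def neighbor_3p(self) -> Optional["ToyAddress"]:
        # The last domain is the 3' end, which has no 3' neighbor.
        if self.domain_idx == len(self.strand.domain_names) - 1:
            return None
        return ToyAddress(self.strand, self.domain_idx + 1)

    def domain(self) -> str:
        return self.strand.domain_names[self.domain_idx]


strand = ToyStrand(("a", "b", "c"))
middle = ToyAddress(strand, 1)
```

Here middle.neighbor_5p() addresses domain a, middle.neighbor_3p() addresses domain c, and the addresses at either end return None in the corresponding direction.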

constraints.BaseAddress = typing.Union[int, typing.Tuple[constraints.StrandD...

Represents a reference to a base. Can be specified either as a NUPACK base index (an int) or as a (StrandDomainAddress, int) tuple.

alias of Union[int, Tuple[StrandDomainAddress, int]]

constraints.BasePairAddress = typing.Tuple[typing.Union[int, typing.Tuple[constr...

Represents a reference to a base pair

alias of Tuple[Union[int, Tuple[StrandDomainAddress, int]], Union[int, Tuple[StrandDomainAddress, int]]]

constraints.BoundDomains = typing.Tuple[constraints.StrandDomainAddress, cons...

Represents bound domains

alias of Tuple[StrandDomainAddress, StrandDomainAddress]

constraints.nupack_complex_base_pair_probability_constraint(strand_complexes, nonimplicit_base_pairs=None, all_base_pairs=None, base_pair_prob_by_type=None, base_pair_prob_by_type_upper_bound=None, base_pair_prob=None, base_unpaired_prob=None, base_pair_prob_upper_bound=None, base_unpaired_prob_upper_bound=None, temperature=37.0, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=<function default_score_transfer_function>, description=None, short_description='ComplexBPProbs', parallel=False)[source]

Returns a constraint that checks the probabilities of the given base pairs in complexes (tuples of Strands).

Parameters
  • strand_complexes (List[Complex]) – Iterable of Strand tuples

  • nonimplicit_base_pairs (Optional[Iterable[BoundDomains]]) –

    List of nonimplicit base pairs that cannot be inferred because multiple instances of the same Domain exist in complex.

    The StrandDomainAddress.strand field of each address should reference a strand in the first complex in strand_complexes.

    For example, if one Strand has one T Domain and another strand in the complex has two T* Domain s, then the intended binding graph cannot be inferred and must be stated explicitly in this field.

  • all_base_pairs (Optional[Iterable[BoundDomains]]) –

    List of all base pairs in complex. If not provided, then base pairs are inferred based on the names of Domain s in the complex as well as base pairs specified in nonimplicit_base_pairs.

    TODO: This has not been implemented yet, and the behavior is as if this parameter is always None (binding graph is always inferred).

  • base_pair_prob_by_type (Optional[Dict[BasePairType, float]]) –

    Probability lower bounds for each BasePairType. Every BasePairType comes with a default, such as default_interior_to_strand_probability, which is used if a lower bound is not specified for that type.

    Note: Despite the name of this parameter, set thresholds for unpaired bases by specifying a threshold for BasePairType.UNPAIRED.

  • base_pair_prob_by_type_upper_bound (Dict[BasePairType, float], optional) –

    Probability upper bounds for each BasePairType. By default, no upper bound is set.

    Note: Despite the name of this parameter, set thresholds for unpaired bases by specifying a threshold for BasePairType.UNPAIRED.

    TODO: This has not been implemented yet.

  • base_pair_prob (Optional[Dict[BasePairAddress, float]]) –

    Probability lower bounds for each BasePairAddress which takes precedence over probabilities specified by base_pair_prob_by_type.

    TODO: This has not been implemented yet.

  • base_unpaired_prob (Optional[Dict[BaseAddress, float]]) – Probability lower bounds for each BaseAddress representing unpaired bases. These lower bounds take precedence over the probability specified by base_pair_prob_by_type[BasePairType.UNPAIRED].

  • base_pair_prob_upper_bound (Optional[Dict[BasePairAddress, float]]) – Probability upper bounds for each BasePairAddress, which take precedence over probabilities specified by base_pair_prob_by_type_upper_bound.

  • base_unpaired_prob_upper_bound (Optional[Dict[BaseAddress, float]]) – Probability upper bounds for each BaseAddress representing unpaired bases. These upper bounds take precedence over the probability specified by base_pair_prob_by_type_upper_bound[BasePairType.UNPAIRED].

  • temperature (float, optional) – Temperature specified in °C, defaults to vienna_nupack.default_temperature.

  • sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter

  • magnesium (float) – molarity of magnesium (Mg++) in moles per liter

  • weight (float, optional) – See Constraint.weight, defaults to 1.0

  • score_transfer_function (Callable[[float], float], optional) – Score transfer function to use. By default, f(x) = x**2 is used, where x is the sum of the squared errors of each base pair that violates the threshold.

  • description (str | None, optional) – See Constraint.description, defaults to None

  • short_description (str, optional) – See Constraint.short_description, defaults to ‘ComplexBPProbs’

  • parallel (bool, optional) – TODO: Implement this

Raises
  • ImportError – If NUPACK 4 is not installed.

  • ValueError

    If strand_complexes is not valid. In order for strand_complexes to be valid, strand_complexes must:

    • Consist of complexes (tuples of Strand objects)

    • Each complex must be of the same motif

Returns

ComplexConstraint

Return type

ComplexConstraint
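The precedence described for unpaired bases (a per-base bound in base_unpaired_prob, then base_pair_prob_by_type[BasePairType.UNPAIRED], then the module default) can be sketched as follows. This is an illustration only and does not call nuad: ToyBasePairType and unpaired_lower_bound are hypothetical names, plain int indices stand in for BaseAddress, and the default values are copied from the module-level constants documented above.

```python
from enum import Enum
from typing import Dict, Optional


class ToyBasePairType(Enum):
    # Hypothetical stand-in for BasePairType; only two members are shown.
    UNPAIRED = 21
    OTHER = 22


# Defaults copied from the documented module-level constants.
DEFAULTS = {ToyBasePairType.UNPAIRED: 0.95, ToyBasePairType.OTHER: 0.7}


def unpaired_lower_bound(
    base_index: int,
    by_type: Optional[Dict[ToyBasePairType, float]] = None,
    by_address: Optional[Dict[int, float]] = None,
) -> float:
    """Resolve the lower bound for one unpaired base, most specific source first."""
    # 1. A bound given for this specific base wins.
    if by_address is not None and base_index in by_address:
        return by_address[base_index]
    # 2. Otherwise use the per-type bound for UNPAIRED, if one was given.
    if by_type is not None and ToyBasePairType.UNPAIRED in by_type:
        return by_type[ToyBasePairType.UNPAIRED]
    # 3. Otherwise fall back to the module default.
    return DEFAULTS[ToyBasePairType.UNPAIRED]
```

With by_type={ToyBasePairType.UNPAIRED: 0.9} and by_address={3: 0.5}, base 3 resolves to 0.5 while every other unpaired base resolves to 0.9.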

nuad.modifications

class modifications.ModificationType(value)[source]

Type of modification (5’, 3’, or internal).

five_prime = "5'"

5’ modification type

three_prime = "5'"

3’ modification type

internal = 'internal'

internal modification type

class modifications.Modification(idt_text, id='WARNING: no id assigned to modification')[source]

Abstract base class of modifications (to DNA sequences, e.g., biotin or Cy3). Use concrete subclasses Modification3Prime, Modification5Prime, or ModificationInternal to instantiate.

If Modification.id is not specified, then Modification.idt_text is used as the unique ID. Each Modification.id must be unique. For example if you create a 5’ “modification” to represent 6 T bases: t6_5p = Modification5Prime(display_text='6T', idt_text='TTTTTT') (this is a useful hack for putting single-stranded extensions on strands until loopouts on the end of a strand are supported; see https://github.com/UC-Davis-molecular-computing/scadnano-python-package/issues/2), then this would clash with a similar 3’ modification without specifying unique IDs for them: t6_3p = Modification3Prime(display_text='6T', idt_text='TTTTTT') # ERROR.

In general it is recommended to create a single Modification object for each type of modification in the design. For example, if many strands have a 5’ biotin, then it is recommended to create a single Modification object and re-use it on each strand with a 5’ biotin:

biotin_5p = Modification5Prime(display_text='B', idt_text='/5Biosg/')
design.strand(0, 0).move(8).with_modification_5p(biotin_5p)
design.strand(1, 0).move(8).with_modification_5p(biotin_5p)
Parameters
  • idt_text (str) –

  • id (str) –

idt_text: str

IDT text string specifying this modification (e.g., ‘/5Biosg/’ for 5’ biotin). optional

id: str = 'WARNING: no id assigned to modification'

Representation as a string; used to write in Strand json representation, while the full description of the modification is written under a global key in the Design. If not specified, but Modification.idt_text is specified, then it will be set equal to that.

__init__(idt_text, id='WARNING: no id assigned to modification')
Parameters
  • idt_text (str) –

  • id (str) –

Return type

None

class modifications.Modification5Prime(idt_text, id='WARNING: no id assigned to modification')[source]

5’ modification of DNA sequence, e.g., biotin or Cy3.

Parameters
  • idt_text (str) –

  • id (str) –

__init__(idt_text, id='WARNING: no id assigned to modification')
Parameters
  • idt_text (str) –

  • id (str) –

Return type

None

idt_text: str

IDT text string specifying this modification (e.g., ‘/5Biosg/’ for 5’ biotin). optional

class modifications.Modification3Prime(idt_text, id='WARNING: no id assigned to modification')[source]

3’ modification of DNA sequence, e.g., biotin or Cy3.

Parameters
  • idt_text (str) –

  • id (str) –

__init__(idt_text, id='WARNING: no id assigned to modification')
Parameters
  • idt_text (str) –

  • id (str) –

Return type

None

idt_text: str

IDT text string specifying this modification (e.g., ‘/5Biosg/’ for 5’ biotin). optional

class modifications.ModificationInternal(idt_text, id='WARNING: no id assigned to modification', allowed_bases=None)[source]

Internal modification of DNA sequence, e.g., biotin or Cy3.

Parameters
  • idt_text (str) –

  • id (str) –

  • allowed_bases (AbstractSet[str] | None) –

__init__(idt_text, id='WARNING: no id assigned to modification', allowed_bases=None)
Parameters
  • idt_text (str) –

  • id (str) –

  • allowed_bases (AbstractSet[str] | None) –

Return type

None

allowed_bases: AbstractSet[str] | None = None

If None, then this is an internal modification that goes between bases. If instead it is a list of bases, then this is an internal modification that attaches to a base, and this lists the allowed bases for this internal modification to be placed at. For example, internal biotins for IDT must be at a T. If any base is allowed, it should be ['A','C','G','T'].

nuad.vienna_nupack

Contains utility functions for accessing NUPACK 4 and ViennaRNA energy calculation algorithms.

The main functions are pfunc() (for calculating complex free energy with NUPACK, along with its helper functions secondary_structure_single_strand() and binding()), nupack_complex_base_pair_probabilities() (for calculating base pair probabilities with NUPACK), and rna_duplex_multiple() (for calculating an approximation to two-strand complex free energy that is much faster than calling pfunc() on the same pair of strands).

vienna_nupack.calculate_strand_association_penalty(temperature, num_seqs)[source]

Additive adjustment factor to convert NUPACK’s mole fraction units to molarity.

For details on why this is needed for multi-stranded complexes, see Section S1.1 of http://www.nupack.org/downloads/serve_public_file/fornace20_supp.pdf?type=pdf and Figure 2 of http://www.nupack.org/downloads/serve_public_file/nupack_user_guide_3.2.2.pdf?type=pdf

Parameters
  • temperature (float) – temperature in Celsius

  • num_seqs (int) – number of sequences

Returns

Additive adjustment factor to convert NUPACK’s mole fraction units to molar.

Return type

float
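A common form of this kind of adjustment adds RT·ln(ρ_water) for each strand association, that is, (num_seqs - 1) times that quantity for a complex of num_seqs strands. The sketch below follows that form. Whether nuad uses exactly this expression and these constants is an assumption, and strand_association_penalty_sketch is a hypothetical name.

```python
import math

# Assumed constants; neither value is taken from the documentation above.
GAS_CONSTANT_KCAL = 0.0019872041  # kcal / (mol K)
WATER_MOLARITY = 55.14            # mol / L, approximate value near 37 C


def strand_association_penalty_sketch(temperature: float, num_seqs: int) -> float:
    """Additive adjustment in kcal/mol for a complex of num_seqs strands."""
    kelvin = temperature + 273.15
    # Adjustment per strand association (joining one more strand to the complex).
    per_association = GAS_CONSTANT_KCAL * kelvin * math.log(WATER_MOLARITY)
    # A complex of n strands involves n - 1 associations.
    return per_association * (num_seqs - 1)
```

Under these assumptions a single strand gets no adjustment, and the adjustment grows linearly with the number of strands, independent of their sequences.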

vienna_nupack.pfunc(seqs, temperature=37.0, sodium=0.05, magnesium=0.0125, strand_association_penalty=True)[source]

Calls pfunc from NUPACK 4 (http://www.nupack.org/) on a complex consisting of the unique strands in seqs, returns energy (“delta G”), i.e., generally a negative number.

By default, a strand association penalty is applied that is not applied by NUPACK’s pfunc. See strand_association_penalty parameter documentation for details.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • seqs (str | Tuple[str, ...]) – DNA sequences (tuple, or a single DNA sequence), whose order indicates a cyclic permutation of the complex. For one or two sequences, there is only one cyclic permutation, so the order doesn’t matter in such cases.

  • temperature (float) – temperature in Celsius

  • sodium (float) – molarity of sodium in moles per liter

  • magnesium (float) – molarity of magnesium in moles per liter

  • strand_association_penalty (bool) – Add strand association penalty for a complex, related to converting NUPACK’s mole fraction units to molarity. The quantity added is that returned by calculate_strand_association_penalty() with parameters temperature and len(seqs). For most constraints, which involve only one size of complex, this factor won’t matter other than to adjust the energy threshold by the same factor. The factor depends only on the number of strands in seqs, but not on their sequences. However, this factor is needed for a meaningful comparison of energies between complexes of different sizes, e.g., to calculate equilibrium concentrations of complexes of various sizes. For details on why this is needed for multi-stranded complexes, see Section S1.1 of http://www.nupack.org/downloads/serve_public_file/fornace20_supp.pdf?type=pdf and Figure 2 of http://www.nupack.org/downloads/serve_public_file/nupack_user_guide_3.2.2.pdf?type=pdf

Returns

complex free energy (“delta G”) of ordered complex with strands in given cyclic permutation

Return type

float

vienna_nupack.nupack_complex_base_pair_probabilities(strand_complex, temperature=37.0, sodium=0.05, magnesium=0.0125)[source]

Calculates base-pair probabilities according to NUPACK 4.

Parameters
  • strand_complex (Complex) – Ordered tuple of strands in complex (specifying a particular circular ordering, which is imposed on all considered secondary structures)

  • temperature (float) – temperature in Celsius

  • sodium (float) – molarity of sodium in moles per liter

  • magnesium (float) – molarity of magnesium in moles per liter

Returns

2D Numpy array of floats, with result[i1][i2] giving the base-pair probability of base at position i1 with base at position i2 (if i1 != i2), where i1 and i2 are the absolute positions of the bases in the entire ordered list of strands. For example, with strands AAAA and TTTTT, there are nine indices 0,1,2,3,4,5,6,7,8, with positions 0,1,2,3 on the first strand AAAA, and positions 4,5,6,7,8 on the second strand TTTTT. If i1 == i2, then result[i1][i1] is the probability that the base at position i1 is unpaired.

Return type

ndarray
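The absolute indexing used by the returned array can be sketched with two small helpers. They are hypothetical (not part of nuad) and only convert between a (strand index, offset within strand) pair and an absolute position in the concatenated strands.

```python
from typing import Sequence, Tuple


def absolute_index(strand_lengths: Sequence[int], strand_idx: int, offset: int) -> int:
    """Absolute position of the base at `offset` within strand `strand_idx`."""
    if not 0 <= offset < strand_lengths[strand_idx]:
        raise IndexError("offset out of range for this strand")
    # Skip over all bases on the strands that come earlier in the complex.
    return sum(strand_lengths[:strand_idx]) + offset


def strand_and_offset(strand_lengths: Sequence[int], absolute: int) -> Tuple[int, int]:
    """Inverse of absolute_index: which strand, and where on it."""
    for strand_idx, length in enumerate(strand_lengths):
        if absolute < length:
            return strand_idx, absolute
        absolute -= length
    raise IndexError("absolute index beyond the end of the complex")


# Strands AAAA and TTTTT, as in the example above.
lengths = [len("AAAA"), len("TTTTT")]
```

For these two strands, the first base of TTTTT has absolute index 4, so result[0][4] would be the probability that the first A pairs with the first T, and result[4][4] the probability that this T is unpaired.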

vienna_nupack.call_subprocess(command_strs, user_input)[source]

Calls system command through a subprocess. Assumes running on a POSIX operating system.

If running on Windows, automatically prepends “wsl -e” to the command so that it is called through Windows Subsystem for Linux; wsl must therefore be installed for this to work: https://docs.microsoft.com/en-us/windows/wsl/install-win10

Parameters
  • command_strs (List[str]) – List of command and command line arguments, e.g., to call ls -l -a, command_strs should be the list [‘ls’, ‘-l’, ‘-a’].

  • user_input (str) – Input to give once program is running (i.e., what would user type).

Returns

pair of strings (output, error), giving the strings written to stdout and stderr, respectively.

Return type

Tuple[str, str]
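The behavior described above can be approximated with the standard library. This is a sketch under assumptions, not the module's implementation: call_subprocess_sketch is a hypothetical name, and testing os.name == "nt" is an assumed way to detect Windows.

```python
import os
import subprocess
import sys
from typing import List, Tuple


def call_subprocess_sketch(command_strs: List[str], user_input: str) -> Tuple[str, str]:
    """Run a command, feed it user_input on stdin, return (stdout, stderr)."""
    if os.name == "nt":
        # Assumption: on Windows, route the command through WSL.
        command_strs = ["wsl", "-e"] + command_strs
    completed = subprocess.run(
        command_strs,
        input=user_input,     # what the user would type
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str rather than bytes
    )
    return completed.stdout, completed.stderr


# Usage: run a tiny Python program that lowercases one line of input.
output, error = call_subprocess_sketch(
    [sys.executable, "-c", "print(input().lower())"], "ACGT\n"
)
```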

vienna_nupack.rna_duplex_multiple(pairs, logger=<RootLogger root (WARNING)>, temperature=37.0, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Calls RNAduplex (from ViennaRNA package: https://www.tbi.univie.ac.at/RNA/) on a list of pairs, specifically: [ (seq1, seq2), (seq2, seq3), (seq4, seq5), … ] where each seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns the free energies in the same order as pairs.

Parameters
  • pairs (Sequence[Tuple[str, str]]) – sequence (list or tuple) of pairs of DNA sequences

  • logger (Logger) – logger to use for printing error messages

  • temperature (float) – temperature in Celsius

  • parameters_filename (str) – name of the energy parameters file passed to RNAduplex

  • max_energy (float) – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.

Returns

list of free energies, in the same order as pairs

Return type

Tuple[float]
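The effect of max_energy can be illustrated without running RNAduplex. In this sketch, clamp_energies is a hypothetical helper applied to made-up energies; it only shows the capping step described above.

```python
from typing import Iterable, Tuple


def clamp_energies(energies: Iterable[float], max_energy: float = 0.0) -> Tuple[float, ...]:
    """Replace any energy larger than max_energy with max_energy."""
    return tuple(min(energy, max_energy) for energy in energies)


# Made-up values: the middle entry mimics the 100000 reported for a pair
# with no possible base pairs.
raw = [-7.2, 100000.0, -0.5]
capped = clamp_energies(raw)
```

After capping, the list can be plotted or compared without one value dwarfing the rest.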

vienna_nupack.rna_duplex_multiple_parallel(thread_pool, pairs, logger=<RootLogger root (WARNING)>, temperature=37.0, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Parallel version of rna_duplex_multiple(). TODO document this

Parameters
  • thread_pool (ThreadPool) –

  • pairs (Sequence[Tuple[str, str]]) –

  • logger (Logger) –

  • temperature (float) –

  • parameters_filename (str) –

  • max_energy (float) –

Return type

Tuple[float]

vienna_nupack.rna_cofold_multiple(seq_pairs, logger=<RootLogger root (WARNING)>, temperature=37.0, parameters_filename='dna_mathews1999.par')[source]

Calls RNAcofold (from ViennaRNA package: https://www.tbi.univie.ac.at/RNA/) on a list of pairs, specifically: [ (seq1, seq2), (seq2, seq3), (seq4, seq5), … ] where each seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns the free energies in the same order as seq_pairs.

Parameters
  • seq_pairs (Sequence[Tuple[str, str]]) – sequence (list or tuple) of pairs of DNA sequences

  • logger (Logger) – logger to use for printing error messages

  • temperature (float) – temperature in Celsius

  • parameters_filename (str) – name of the energy parameters file passed to RNAcofold

Returns

list of free energies, in the same order as seq_pairs

Return type

List[float]

vienna_nupack.wc(seq)[source]

Return reverse Watson-Crick complement of seq.

Parameters

seq (str) –

Return type

str
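A reverse Watson-Crick complement can be computed in a few lines. The sketch below is a minimal illustration for uppercase sequences over A, C, G, T; wc_sketch is a hypothetical name and is not claimed to match the library's implementation.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wc_sketch(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence.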

vienna_nupack.free_energy_single_strand(seq, temperature=37.0, sodium=0.05, magnesium=0.0125)[source]

Computes the “complex free energy” (https://docs.nupack.org/definitions/#complex-free-energy) of a single strand according to NUPACK.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • seq (str) –

  • temperature (float) –

  • sodium (float) –

  • magnesium (float) –

Return type

float

vienna_nupack.binding_complement(seq, temperature=37.0, sodium=0.05, magnesium=0.0125, subtract_indv=True)[source]

Computes the complex free energy of a strand with its perfect Watson-Crick complement.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • seq (str) –

  • temperature (float) –

  • sodium (float) –

  • magnesium (float) –

  • subtract_indv (bool) –

Return type

float

vienna_nupack.binding(seq1, seq2, *, temperature=37.0, sodium=0.05, magnesium=0.0125)[source]

Computes the complex free energy of association between two strands.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • seq1 (str) –

  • seq2 (str) –

  • temperature (float) –

  • sodium (float) –

  • magnesium (float) –

Return type

float

vienna_nupack.random_dna_seq(length, bases='ACTG')[source]

Chooses a random DNA sequence.

Parameters
  • length (int) –

  • bases (Sequence) –

Return type

str
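One way to choose such a sequence is to draw each base independently and uniformly from bases. That sampling scheme is an assumption about the behavior, and random_dna_seq_sketch and its rng parameter are hypothetical.

```python
import random
from typing import Optional, Sequence


def random_dna_seq_sketch(
    length: int, bases: Sequence[str] = "ACTG", rng: Optional[random.Random] = None
) -> str:
    """Draw `length` bases independently and uniformly from `bases`."""
    rng = rng if rng is not None else random.Random()
    return "".join(rng.choice(bases) for _ in range(length))
```

Passing a seeded random.Random makes the result reproducible, and restricting bases (for example to "AT") restricts the alphabet of the output.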

vienna_nupack.domain_orthogonal(seq, seqs, temperature, sodium, magnesium, orthogonality, orthogonality_ave=-1, parallel=False)[source]

Tests orthogonality of domain seq with all other domains in seqs and their Watson-Crick complements.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • seq (str) –

  • seqs (Sequence[str]) –

  • temperature (float) –

  • sodium (float) –

  • magnesium (float) –

  • orthogonality (float) –

  • orthogonality_ave (float) –

  • parallel (bool) –

Return type

bool

vienna_nupack.domain_pairwise_concatenated_no_sec_struct(seq, seqs, temperature, sodium, magnesium, concat, concat_ave=-1, parallel=False)[source]

Tests for lack of secondary structure in concatenated domains.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters
  • seq (str) –

  • seqs (Sequence[str]) –

  • temperature (float) –

  • sodium (float) –

  • magnesium (float) –

  • concat (float) –

  • concat_ave (float) –

  • parallel (bool) –

Return type

bool

vienna_nupack.domain_concatenated_no_4gc(seq, seqs)[source]

prevent {G,C}^4 under concatenation

Parameters
  • seq (str) –

  • seqs (Sequence[str]) –

Return type

bool

vienna_nupack.domain_no_4gc(seq)[source]

prevent {G,C}^4

Parameters

seq (str) –

Return type

bool

vienna_nupack.domain_concatenated_no_4g_or_4c(seq, seqs)[source]

prevent G^4 and C^4 under concatenation

Parameters
  • seq (str) –

  • seqs (Sequence[str]) –

Return type

bool
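The {G,C}^4 and G^4/C^4 filters above can be sketched with regular expressions. Several things here are assumptions: that True means the sequence passes the filter, that “under concatenation” means checking seq joined with each other sequence in both orders, and all function names, which are hypothetical.

```python
import re
from typing import Sequence

_FOUR_GC = re.compile(r"[GC]{4}")           # four consecutive bases, each G or C
_FOUR_G_OR_FOUR_C = re.compile(r"G{4}|C{4}")  # GGGG or CCCC


def has_no_4gc(seq: str) -> bool:
    """True if seq has no run of four consecutive G/C bases."""
    return _FOUR_GC.search(seq) is None


def has_no_4g_or_4c(seq: str) -> bool:
    """True if seq contains neither GGGG nor CCCC."""
    return _FOUR_G_OR_FOUR_C.search(seq) is None


def concatenated_has_no_4gc(seq: str, seqs: Sequence[str]) -> bool:
    """Assumed meaning of the concatenated variant: check both junction orders."""
    return all(
        has_no_4gc(seq + other) and has_no_4gc(other + seq) for other in seqs
    )
```

Note that the first filter is stricter than the second: CGCG fails has_no_4gc but passes has_no_4g_or_4c.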

nuad.np

Library for doing sequence design that can be expressed as linear algebra operations for rapid processing by numpy (e.g., generating all DNA sequences of a certain length and calculating all their full duplex binding energies in the nearest neighbor model and filtering those outside a given range).

Based on the DNA single-stranded tile (SST) sequence designer used in the following publication.

“Diverse and robust molecular algorithms using reprogrammable DNA self-assembly” Woods*, Doty*, Myhrvold, Hui, Zhou, Yin, Winfree. (*Joint first co-authors)

np.idx2seq(idx, length)[source]

Return the lexicographic idx’th DNA sequence of given length.

Parameters
  • idx (int) –

  • length (int) –

Return type

str
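Lexicographic indexing of DNA sequences amounts to writing idx in base 4. The sketch below assumes the alphabet order A < C < G < T with the most significant digit first; idx2seq_sketch is a hypothetical name.

```python
def idx2seq_sketch(idx: int, length: int, bases: str = "ACGT") -> str:
    """Return the idx'th sequence of the given length in lexicographic order."""
    if not 0 <= idx < len(bases) ** length:
        raise ValueError("idx out of range for this length")
    chars = []
    for _ in range(length):
        # Peel off the least significant base-4 digit.
        idx, digit = divmod(idx, len(bases))
        chars.append(bases[digit])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(chars))
```

Under this assumption index 0 is the all-A sequence and index 4**length - 1 is the all-T sequence.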

np.seq2arr(seq, base2bits_local=None)[source]

Convert seq (string with DNA alphabet) to numpy array with integers 0,1,2,3.

Parameters
  • seq (str) –

  • base2bits_local (Dict[str, int] | None) –

Return type

np.ndarray

np.seqs2arr(seqs)[source]

Return numpy 2D array converting the given DNA sequences to integers.

Parameters

seqs (Sequence[str]) –

Return type

ndarray

np.make_array_with_all_dna_seqs(length, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with all DNA sequences of given length in lexicographic order. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Parameters
  • length (int) –

  • bases (Collection[str]) –

Return type

ndarray
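The enumeration order can be illustrated with itertools. This sketch returns a list of strings rather than the 2D byte array described above, and all_dna_seqs_sketch is a hypothetical name.

```python
from itertools import product
from typing import Collection, List


def all_dna_seqs_sketch(
    length: int, bases: Collection[str] = ("A", "C", "G", "T")
) -> List[str]:
    """All sequences of the given length over `bases`, in lexicographic order."""
    # Sorting the alphabet first makes product() yield lexicographic order.
    return ["".join(chars) for chars in product(sorted(bases), repeat=length)]
```

The result has len(bases) ** length entries, so the full alphabet at length 10 already yields over a million sequences.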

np.make_array_with_random_subset_of_dna_seqs(length, num_random_seqs, rng=Generator(PCG64) at 0x7F048D1FD9E0, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with random subset of size num_random_seqs of DNA sequences of given length. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Sequences returned will be unique (i.e., sampled without replacement) and in a random order

Parameters
  • length (int) – length of each row

  • num_random_seqs (int) – number of rows

  • bases (Collection[str]) – DNA bases to use

  • rng (Generator) – numpy random number generator (type returned by numpy.random.default_rng())

Returns

2D numpy array with random subset of size num_random_seqs of DNA sequences of given length

Return type

ndarray
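Sampling without replacement can be done by drawing distinct indices and decoding each into a sequence. The sketch below shows that idea with the standard library; it is not claimed to be the approach the function takes, returns strings rather than a 2D array, and uses the hypothetical name random_unique_dna_seqs.

```python
import random
from typing import List, Optional


def random_unique_dna_seqs(
    length: int,
    num_random_seqs: int,
    rng: Optional[random.Random] = None,
    bases: str = "ACGT",
) -> List[str]:
    """Return num_random_seqs distinct sequences of the given length, in random order."""
    rng = rng if rng is not None else random.Random()
    total = len(bases) ** length
    if num_random_seqs > total:
        raise ValueError("cannot draw more unique sequences than exist")
    seqs = []
    # Distinct indices guarantee distinct sequences.
    for idx in rng.sample(range(total), num_random_seqs):
        chars = []
        for _ in range(length):
            idx, digit = divmod(idx, len(bases))
            chars.append(bases[digit])
        seqs.append("".join(reversed(chars)))
    return seqs
```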

np.make_array_with_all_dna_seqs_hamming_distance(dist, seq, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with all DNA sequences at Hamming distance dist from seq. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Parameters
  • dist (int) –

  • seq (str) –

  • bases (Collection[str]) –

Return type

ndarray
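Enumerating the sequences at a given Hamming distance can be sketched by choosing which positions to change and then which base to substitute at each. The sketch assumes the distance is exactly dist (not at most dist); seqs_at_hamming_distance is a hypothetical name, and it returns strings rather than a 2D array.

```python
from itertools import combinations, product
from typing import List


def seqs_at_hamming_distance(seq: str, dist: int, bases: str = "ACGT") -> List[str]:
    """All sequences differing from seq in exactly dist positions, sorted."""
    results = []
    # Choose which positions to change.
    for positions in combinations(range(len(seq)), dist):
        # At each chosen position, any base other than the original is allowed.
        alternatives = [[b for b in bases if b != seq[pos]] for pos in positions]
        for replacement in product(*alternatives):
            chars = list(seq)
            for pos, base in zip(positions, replacement):
                chars[pos] = base
            results.append("".join(chars))
    return sorted(results)
```

For a sequence of length n over four bases, this yields C(n, dist) * 3**dist sequences.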

np.make_array_with_random_subset_of_dna_seqs_hamming_distance(num_seqs, dist, seq, rng=Generator(PCG64) at 0x7F048D1FD9E0, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with num_seqs random DNA sequences at Hamming distance dist from seq. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Sampled with replacement, so the same row may appear twice in the returned array

Parameters
  • num_seqs (int) – number of sequences to generate

  • dist (int) – Hamming distance to be from seq

  • seq (str) – sequence to generate other sequences close to

  • bases (Collection[str]) – DNA bases to use

  • rng (Generator) – numpy random number generator (type returned by numpy.random.default_rng())

Returns

2D numpy array with num_seqs random DNA sequences at Hamming distance dist from seq

Return type

ndarray

np.longest_common_substring(a1, a2, vectorized=True)[source]

Return start indices and length (a1start, a2start, length) of the longest common substring (subarray) of 1D arrays a1 and a2.

Parameters
  • a1 (ndarray) –

  • a2 (ndarray) –

  • vectorized (bool) –

Return type

Tuple[int, int, int]
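The returned triple can be illustrated with the textbook dynamic-programming algorithm. This sketch is not vectorized and is not claimed to be the library's method; longest_common_substring_sketch is a hypothetical name, and it works on lists or strings instead of numpy arrays.

```python
from typing import Sequence, Tuple


def longest_common_substring_sketch(a1: Sequence, a2: Sequence) -> Tuple[int, int, int]:
    """Return (a1start, a2start, length) of a longest common contiguous run."""
    best = (0, 0, 0)
    # prev[j] = length of the common run ending at a1[i-2] and a2[j-1].
    prev = [0] * (len(a2) + 1)
    for i in range(1, len(a1) + 1):
        curr = [0] * (len(a2) + 1)
        for j in range(1, len(a2) + 1):
            if a1[i - 1] == a2[j - 1]:
                # Extend the run that ended just before both positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best
```

When nothing matches, the sketch returns (0, 0, 0), so the length must be checked before the start indices are used.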

np.longest_common_substrings_singlea1(a1, a2s)[source]

Return start indices and lengths (a1starts, a2starts, lengths) of longest common substring (subarray) of 1D array a1 and rows of 2D array a2s.

If lengths[i]=0, then a1starts[i]=a2starts[i]=0 (not -1), so be sure to check lengths[i] to see if any substrings actually matched.

Parameters
  • a1 (ndarray) –

  • a2s (ndarray) –

Return type

Tuple[ndarray, ndarray, ndarray]

np.longest_common_substrings_product(a1s, a2s)[source]

Return start indices and lengths (a1starts, a2starts, lengths) of longest common substring (subarray) of each pair in the cross product of rows of a1s and a2s.

If lengths[i]=0, then a1starts[i]=a2starts[i]=0 (not -1), so be sure to check lengths[i] to see if any substrings actually matched.

Parameters
  • a1s (ndarray) –

  • a2s (ndarray) –

Return type

Tuple[ndarray, ndarray, ndarray]

np.longest_common_substrings_all_pairs_strings(seqs1, seqs2)[source]

Variant of the longest-common-substring functions above that takes Python strings instead of numpy arrays.

Parameters
  • seqs1 (Sequence[str]) –

  • seqs2 (Sequence[str]) –

Return type

Tuple[ndarray, ndarray, ndarray]

np.strongest_common_substrings_all_pairs_string(seqs1, seqs2, temperature)[source]

For Python strings representing DNA; checks for reverse complement matches rather than direct matches, and evaluates nearest-neighbor energy, returning the indices, lengths, and energies of the strongest complementary substrings.

Parameters
  • seqs1 (Sequence[str]) –

  • seqs2 (Sequence[str]) –

  • temperature (float) –

Return type

Tuple[List[float], List[float], List[float], List[float]]

class np.DNASeqList(length=None, num_random_seqs=None, shuffle=False, alphabet=('A', 'C', 'G', 'T'), seqs=None, seqarr=None, filename=None, rng=Generator(PCG64), hamming_distance_from_sequence=None)[source]

Represents a list of DNA sequences of identical length. The sequences are stored as a 2D numpy array of bytes DNASeqList.seqarr. Each byte represents a single DNA base (so it is not a compact representation; the most significant 6 bits of the byte will always be 0).

Parameters
  • length (int | None) –

  • num_random_seqs (int | None) –

  • shuffle (bool) –

  • alphabet (Collection[str]) –

  • seqs (Sequence[str] | None) –

  • seqarr (ndarray) –

  • filename (str | None) –

  • rng (Generator) –

  • hamming_distance_from_sequence (Tuple[int, str] | None) –

__init__(length=None, num_random_seqs=None, shuffle=False, alphabet=('A', 'C', 'G', 'T'), seqs=None, seqarr=None, filename=None, rng=Generator(PCG64), hamming_distance_from_sequence=None)[source]

Creates a set of DNA sequences, all of the same length.

Create either all sequences of a given length if seqs is not specified, or all sequences in seqs if seqs is specified. If neither is specified then all sequences of length 3 are created.

Exactly one of the following should be specified:

  • length (possibly along with alphabet and num_random_seqs)

  • seqs

  • seqarr

  • filename

  • hamming_distance_from_sequence (possibly along with alphabet and num_random_seqs)

Parameters
  • length (int | None) – length of sequences; num_random_seqs and alphabet can also be specified along with it

  • hamming_distance_from_sequence (Tuple[int, str] | None) – if specified and equal to (dist, seq) of type (int, str), then only sequences at Hamming distance dist from seq will be generated. Raises error if length, seqs, seqarr, or filename is specified.

  • num_random_seqs (int | None) – number of sequences to generate; if not specified, then all sequences of length length using bases from alphabet are generated. Sequences are sampled with replacement, so the same sequence may appear twice.

  • shuffle (bool) – whether to shuffle sequences

  • alphabet (Collection[str]) – a subset of {‘A’, ‘C’, ‘G’, ‘T’}

  • seqs (Sequence[str] | None) – sequence (e.g., list or tuple) of strings, all of the same length

  • seqarr (np.ndarray) – 2D NumPy array, with axis 0 moving between sequences, and axis 1 moving between consecutive DNA bases in a sequence

  • filename (str | None) – name of file containing a DNASeqList as written by DNASeqList.write_to_file()

  • rng (np.random.Generator) – numpy random number generator (type returned by numpy.random.default_rng())
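When only length is given, every sequence of that length over alphabet is generated. A pure-Python sketch of that enumeration is shown below; all_seqs is a hypothetical helper that works on strings, whereas the class stores byte-encoded rows, and the row order produced here is not claimed to match the class.

```python
import itertools


def all_seqs(length, alphabet=("A", "C", "G", "T")):
    """Hypothetical helper: every string of the given length whose
    characters are drawn from `alphabet`."""
    return ["".join(bases) for bases in itertools.product(alphabet, repeat=length)]


seqs = all_seqs(3)
print(len(seqs))  # 4**3 = 64
```

The count is len(alphabet) ** length, so restricting alphabet (for example to three bases) or passing num_random_seqs keeps the list manageable for longer sequences.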

rng: Generator

Random number generator to use.

seqarr: ndarray

Uses a (noncompact) internal representation using 8 bits (1 byte, dtype = np.ubyte) per base, stored in a numpy 2D array of bytes. Each row (axis 0) is a DNA sequence, and each column (axis 1) is a base in a sequence.

The code used is \(A \to 0, C \to 1, G \to 2, T \to 3\).
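The encoding can be made concrete with a small sketch. The names encode and decode are hypothetical, and plain Python bytes objects stand in here for the rows of the 2D numpy array of dtype np.ubyte.

```python
# Mapping stated above: A -> 0, C -> 1, G -> 2, T -> 3.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode(seqs):
    """Hypothetical helper: one bytes object (one byte per base) per sequence."""
    return [bytes(BASE_TO_CODE[base] for base in seq) for seq in seqs]


def decode(rows):
    """Hypothetical helper: inverse of encode."""
    return ["".join(CODE_TO_BASE[code] for code in row) for row in rows]


rows = encode(["ACG", "TAA"])
print([list(row) for row in rows])  # [[0, 1, 2], [3, 0, 0]]
print(decode(rows))                 # ['ACG', 'TAA']
```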

seqlen: int

Length of each DNA sequence (number of columns, axis 1, in DNASeqList.seqarr)

numseqs: int

Number of DNA sequences (number of rows, axis 0, in DNASeqList.seqarr)

random_choice(num, rng=Generator(PCG64), replace=False)[source]

Returns random choice of num DNA sequence(s) (represented as list of Python strings).

Parameters
  • num (int) – number of sequences to sample

  • rng (Generator) – random number generator to use

  • replace (bool) – whether to sample with replacement

Returns

sampled sequences

Return type

List[str]

random_sequence(rng=Generator(PCG64))[source]

Returns random DNA sequence (represented as Python string).

Returns

sampled sequence

Parameters

rng (Generator) –

Return type

str

write_to_file(filename)[source]

Writes text file describing DNA sequence list, in format

numseqs seqlen seq1 seq2 seq3 …

where numseqs, seqlen are integers, and seq1, … are strings from {A,C,G,T}

Parameters

filename (str) –

Return type

None

wcenergy(idx, temperature)[source]

Return energy of idx’th sequence binding to its complement.

Parameters
  • idx (int) –

  • temperature (float) –

Return type

float

to_list()[source]

Return list of strings representing the sequences, e.g. ['ACG', 'TAA']

Return type

List[str]

get_seq_str(idx)[source]

Return idx’th DNA sequence as a string.

Parameters

idx (int) –

Return type

str

get_seqs_str_list(slice_)[source]

Return a list of strings specified by slice.

Parameters

slice_ (slice) –

Return type

List[str]

keep_seqs_at_indices(indices)[source]

Keeps only sequences at the given indices.

Parameters

indices (Iterable[int]) –

Return type

None

pop()[source]

Remove and return last seq, as a string.

Return type

str

pop_array()[source]

Remove and return last seq, as a numpy array.

Return type

ndarray

hamming_map(sequence)[source]

Return dict mapping each distance d to a DNASeqList of sequences that are Hamming distance d from sequence.

Parameters

sequence (str) –

Return type

Dict[int, DNASeqList]
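The shape of the returned mapping can be sketched on plain strings. Both helper names below are hypothetical, and the values are ordinary lists rather than DNASeqList objects, so this only illustrates the grouping.

```python
from collections import defaultdict


def hamming_distance(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))


def hamming_map_sketch(seqs, sequence):
    """Hypothetical helper: group `seqs` by Hamming distance from `sequence`."""
    groups = defaultdict(list)
    for seq in seqs:
        groups[hamming_distance(seq, sequence)].append(seq)
    return dict(groups)


print(hamming_map_sketch(["ACG", "ACT", "TTT"], "ACG"))
# {0: ['ACG'], 1: ['ACT'], 3: ['TTT']}
```

Distances with no matching sequences are simply absent from the dict in this sketch; whether the library method includes empty entries is not stated here.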

sublist(start, end=None)[source]

Return sublist of DNASeqList from start, inclusive, to end, exclusive.

If end is not specified, goes until the end of the list.

Parameters
  • start (int) –

  • end (int | None) –

Return type

DNASeqList

filter_energy(low, high, temperature)[source]

Return new DNASeqList with seqs whose energy of binding to their Watson-Crick complement is within the given range.

Parameters
  • low (float) –

  • high (float) –

  • temperature (float) –

Return type

DNASeqList

energies(temperature)[source]
Parameters

temperature (float) – temperature in Celsius

Returns

nearest-neighbor energies of each sequence with its perfect Watson-Crick complement

Return type

ndarray

filter_end_gc()[source]

Remove any sequence with A or T on the end. Also remove domains that do not have an A or T either next to that base, or one away. Otherwise we could get a domain ending in {C,G}^3, which, placed next to any domain ending in C or G, will create a substring in {C,G}^4 and be rejected if we are filtering those.

Return type

DNASeqList

filter_end_at(gc_near_end=False)[source]

Remove any sequence with C or G on the end. Also, if gc_near_end is True, remove domains that do not have a C or G either next to that base, or one away, to prevent breathing.

Parameters

gc_near_end (bool) –

Return type

DNASeqList

filter_base_nowhere(base)[source]

Remove any sequence that has given base anywhere.

Parameters

base (str) –

Return type

DNASeqList

filter_base_count(base, low, high)[source]

Remove any sequence not satisfying low <= #base <= high.

Parameters
  • base (str) –

  • low (int) –

  • high (int) –

Return type

DNASeqList
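The condition low <= #base <= high can be sketched on plain strings as follows. The helper filter_base_count_sketch is hypothetical and returns a list of strings instead of a DNASeqList.

```python
def filter_base_count_sketch(seqs, base, low, high):
    """Hypothetical helper: keep only sequences in which `base` occurs
    between `low` and `high` times, inclusive."""
    return [seq for seq in seqs if low <= seq.count(base) <= high]


kept = filter_base_count_sketch(["AAAA", "ACGT", "CCGG", "AACC"], "A", 1, 2)
print(kept)  # ['ACGT', 'AACC']
```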

filter_base_at_pos(pos, base)[source]

Remove any sequence that does not have given base at position pos.

Parameters
  • pos (int) –

  • base (str) –

Return type

DNASeqList

filter_substring(subs)[source]

Remove any sequence with any elements from subs as a substring.

Parameters

subs (Sequence[str]) –

Return type

DNASeqList
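A substring filter of this kind can be sketched on plain strings. The helper filter_substring_sketch is hypothetical and is not the library's array-based implementation.

```python
def filter_substring_sketch(seqs, subs):
    """Hypothetical helper: keep only sequences that contain none of
    the strings in `subs` as a substring."""
    return [seq for seq in seqs if not any(sub in seq for sub in subs)]


kept = filter_substring_sketch(["ACGGGGT", "ACGTACG", "TCCCCAA"], ["GGGG", "CCCC"])
print(kept)  # ['ACGTACG']
```

Passing ["GGGG", "CCCC"] as subs, as in the usage above, expresses the same kind of rule as rejecting runs of four G's or four C's.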

filter_seqs_by_g_quad()[source]

Removes any sticky ends with 4 G’s in a row (a G-quadruplex).

Return type

DNASeqList

filter_seqs_by_g_quad_c_quad()[source]

Removes any sticky ends with 4 G’s or C’s in a row (a quadruplex).

Return type

DNASeqList

np.create_toeplitz(seqlen, sublen, indices=None)[source]

Creates a Toeplitz matrix, useful for finding subsequences.

seqlen is length of larger sequence; sublen is length of substring we’re checking for. If indices is None, then all rows are created, otherwise only rows for checking those indices are created.

Parameters
  • seqlen (int) –

  • sublen (int) –

  • indices (Sequence[int] | None) –

Return type

np.ndarray

np.calculate_loop_energies(temperature, negate=False)[source]

Get SantaLucia and Hicks nearest-neighbor loop energies for given temperature, 1 M Na+.

Parameters
  • temperature (float) –

  • negate (bool) –

Return type

ndarray

np.wcenergy(seq, temperature, negate=False)[source]

Return the wc energy of seq binding to its complement.

Parameters
  • seq (str) –

  • temperature (float) –

  • negate (bool) –

Return type

float

np.calculate_wc_energies(seqarr, temperature, negate=False)[source]

Calculate and store in an array all energies of all sequences in seqarr with their Watson-Crick complements.

Parameters
  • seqarr (ndarray) –

  • temperature (float) –

  • negate (bool) –

Return type

ndarray

np.wc_arr(seqarr)[source]

Return numpy array of complements of sequences in seqarr.

Parameters

seqarr (ndarray) –

Return type

ndarray
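Under the encoding A=0, C=1, G=2, T=3, the complementary base of code x is 3 - x. The sketch below assumes that "complement" means the Watson-Crick (reverse) complement, which is an assumption not confirmed by the text above; wc_rows_sketch is a hypothetical helper operating on nested lists rather than a numpy array.

```python
def wc_rows_sketch(rows):
    """Hypothetical helper: reverse each row and complement each base
    code (A=0 <-> T=3, C=1 <-> G=2) by computing 3 - code."""
    return [[3 - code for code in reversed(row)] for row in rows]


# ACGG is encoded as [0, 1, 2, 2]; its reverse complement is CCGT.
print(wc_rows_sketch([[0, 1, 2, 2]]))  # [[1, 1, 2, 3]]
```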

np.prefilter_length_10_11(low_dg, high_dg, temperature, end_gc, convert_to_list=True)[source]

Return sequences of length 10 and 11 with wc energies between given values.

Parameters
  • low_dg (float) –

  • high_dg (float) –

  • temperature (float) –

  • end_gc (bool) –

  • convert_to_list (bool) –

Return type

Tuple[List[str], List[str]] | Tuple[DNASeqList, DNASeqList]

np.all_cats(seq, seqs)[source]

Return all sequences obtained by concatenating seq to either end of a sequence in seqs.

For example,

all_cats([0,1,2,3], [[3,3,3], [0,0,0]])

returns the numpy array

[[0,1,2,3,3,3,3],
 [3,3,3,0,1,2,3],
 [0,1,2,3,0,0,0],
 [0,0,0,0,1,2,3]]
Parameters
  • seq (Sequence[int]) –

  • seqs (Sequence[int]) –

Return type

ndarray
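The documented example can be reproduced with a short pure-Python sketch. The helper all_cats_sketch is hypothetical and returns nested lists rather than a numpy array; the row order follows the example output shown above.

```python
def all_cats_sketch(seq, seqs):
    """Hypothetical helper: for each row in `seqs`, emit seq + row
    followed by row + seq."""
    result = []
    for other in seqs:
        result.append(list(seq) + list(other))
        result.append(list(other) + list(seq))
    return result


cats = all_cats_sketch([0, 1, 2, 3], [[3, 3, 3], [0, 0, 0]])
for row in cats:
    print(row)
```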

Indices and tables