nuad documentation

nuad.search

nuad.constraints

This module defines types for helping to define DNA sequence design constraints.

The key classes are Design, Strand, Domain to define a DNA design, and various subclasses of Constraint, such as StrandConstraint or StrandPairConstraint, to define constraints on the sequences assigned to each Domain when calling search.search_for_dna_sequences().

Also important are two other types of constraints (not subclasses of Constraint), which are used prior to the search to determine if it is even legal to use a DNA sequence: subclasses of the abstract base class NumpyFilter, and SequenceFilter, an alias for a function taking a string as input and returning a bool.

See the README on the GitHub page for more detailed explaination of these classes: https://github.com/UC-Davis-molecular-computing/dsd#data-model

constraints.all_dna_bases = {'A', 'C', 'G', 'T'}: set of all DNA bases.

class constraints.M13Variant(value)[source]

Variants of M13mp18 viral genome. “Standard” variant is p7249. Other variants are longer.

p7249 = 'p7249'

“Standard” variant of M13mp18; 7249 bases long, available from, for example

https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p7249

https://www.neb.com/products/n4040-m13mp18-single-stranded-dna

http://www.bayoubiolabs.com/biochemicat/vectors/pUCM13/

p7560 = 'p7560'

Variant of M13mp18 that is 7560 bases long. Available from, for example

https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p7560

p8064 = 'p8064'

Variant of M13mp18 that is 8064 bases long. Available from, for example

https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p8064

p8634 = 'p8634': Variant of M13mp18 that is 8634 bases long. At the time of this writing, not listed as available from any biotech vender, but Tilibit will make it for you if you ask. (https://www.tilibit.com/pages/contact-us)

length()[source]

Returns:: length of this variant of M13 (e.g., 7249 for variant M13Variant.p7249)
Return type:: int

constraints.m13(rotation=5587, variant=M13Variant.p7249)[source]

The M13mp18 DNA sequence (commonly called simply M13).

By default, starts from cyclic rotation 5587 (with 0-based indexing; commonly this is called rotation 5588, which assumes that indexing begins at 1), as defined in GenBank.

By default, returns the “standard” variant of consisting of 7249 bases, sold by companies such as Tilibit and New England Biolabs.

For a more detailed discussion of why the default rotation 5587 of M13 is used, see Supplementary Note S8 in [Folding DNA to create nanoscale shapes and patterns. Paul W. K. Rothemund, Nature 440:297-302 (2006)].

Parameters:

rotation (int) – rotation of circular strand. Valid values are 0 through length-1.
variant (M13Variant) – variant of M13 strand to use

Returns:

M13 strand sequence

Return type:

str

constraints.m13_substrings_of_length(length, except_indices=(5514, 5515, 5516, 5517, 5518, 5519, 5520, 5521, 5522, 5523, 5524, 5525, 5526, 5527, 5528, 5529, 5530, 5531, 5532, 5533, 5534, 5535, 5536, 5537, 5538, 5539, 5540, 5541, 5542, 5543, 5544, 5545, 5546, 5547, 5548, 5549, 5550, 5551, 5552, 5553, 5554, 5555, 5556), variant=M13Variant.p7249)[source]

WARNING: This function was previously recommended to use with DomainPool.possible_sequences to specify possible rotations of M13 to use. However, it creates a large file size to write all those sequences to disk on every update in the search. A better method now exists to specify this, which is to specify a SubstringSampler object as the value for DomainPool.possible_sequences instead of calling this function.

Return all substrings of the M13mp18 DNA sequence of length length, except those overlapping indices in except_start_indices.

This is useful with the field DomainPool.possible_sequences, when one strand in the Design represents a small portion of the full M13 sequence, and part of the sequence design process is to choose a rotation of M13 to use. One can set that strand to have a single Domain, which contains dependent subdomains (those with Domain.dependent set to True). These subdomains are the smaller domains where M13 attaches to other Strand’s in the Design. Then, give the parent Domain a DomainPool with DomainPool.possible_sequences set to the return value of this function, to allow the search to explore different rotations of M13.

For example, suppose m13_subdomains is a list containing Domain’s from the Design, which are consecutive subdomains of M13 from 5’ to 3’ (all with Domain.dependent set to True), and m13_length is the sum of their lengths (note this needs to be calculated manually since the following code assumes no Domain in m13_subdomains has a DomainPool yet, thus none yet have a length). Then the following code creates a Strand representing the M13 portion that binds to other Strand’s in the Design.

m13_subdomains = # subdomains of M13 used in the design
m13_length = # sum of lengths of domains in m13_subdomains
m13_substrings = dc.m13_substrings_of_length(m13_length)
m13_domain_pool = dc.DomainPool(name='m13 domain pool', possible_sequences=m13_substrings)
m13_domain = dc.Domain(name='m13', subdomains=m13_subdomains, pool=m13_domain_pool)
m13_strand = dc.Strand(name='m13', domains=[m13_domain])

Parameters:

length (int) – length of substrings to return
except_indices (Iterable[int]) – Indices of M13 to avoid in any part of the substring. If not specified, based on length, indices 5514-5556 are avoided, which are known to contain a long hairpin. (When using 1-based indexing, these are indices 5515-5557.) For example, if length = 10, then the starting indices of substrings will avoid the list [5505, 5506, …, 5556]
variant (M13Variant) – M13Variant to use

Returns:

All substrings of the M13mp18 DNA sequence, except those that overlap any index in except_start_indices.

Return type:

list[str]

constraints.default_score_transfer_function(x)[source]

A cubic transfer function.

Returns:: max(0.0, x^3)
Parameters:: x (float)
Return type:: float

constraints.logger = <Logger dsd (DEBUG)>

Global logger instance used throughout dsd.

Call logger.removeHandler(logger.handlers[0]) to stop screen output (assuming that you haven’t added or removed any handlers to the dsd logger instance already; by default there is one StreamHandler, and removing it will stop screen output).

Call logger.addHandler(logging.FileHandler(filename)) to direct to a file.

constraints.all_pairs(values, with_replacement=True, where=<function <lambda>>)[source]

Strongly typed function to get list of all pairs from iterable. (for using with mypy)

Parameters:

values (Iterable[T]) – Iterable of values.
with_replacement (bool) – Whether to include self pairs, i.e., pairs (a,a)
where (Callable[[T, T], bool]) – Predicate indicating whether to include a specific pair. Must take two parameters, each of type T, and return a bool.

Returns:

list of all pairs of values from iterable.

Return type:

list[tuple[T, T]]

constraints.all_pairs_iterator(values, with_replacement=True, where=<function <lambda>>)[source]

Strongly typed function to get iterator of all pairs from iterable. (for using with mypy)

This is WITH replacement; to specify without replacement, set with_replacement = False

Parameters:

values (Iterable[T]) – Iterable of values.
with_replacement (bool) – Whether to include self pairs, i.e., pairs (a,a)
where (Callable[[tuple[T, T]], bool]) – Predicate indicating whether to include a specific pair.

Returns:

Iterator of all pairs of values from iterable. Unlike all_pairs(), which returns a list, the iterator returned may be iterated over only ONCE.

Return type:

Iterator[tuple[T, T]]

constraints.SequenceFilter = typing.Callable[[str], bool]

Filter (see description of NumpyFilter for explanation of the term “filter”) that applies to a DNA sequence; the difference between this an a DomainConstraint is that these are applied before a sequence is assigned to a Domain, so the constraint can only be based on the DNA sequence, and not, for instance, on the Domain’s DomainPool.

Consequently SequenceFilter’s, like NumpyFilter’s, are treated differently than subtypes of Constraint, since a DNA sequence failing any SequenceFilter’s or NumpyFilter’s is never allowed to be assigned into any Domain.

The difference with NumpyFilter is that a NumpyFilter requires one to express the constraint in a way that is efficient for the linear algebra operations of numpy. If you cannot figure out how to do this, a SequenceFilter can be expressed in pure Python, but typically will be much slower to apply than a NumpyFilter.

alias of Callable[[str], bool]

class constraints.NumpyFilter[source]

Abstract base class for numpy filters. A “filter” is a hard constraint applied to sequences for a Domain; a sequence not passing the filter is never allowed to be assigned to a Domain. This constrasts with the various subclasses of Constraint, which are different in two ways: 1) they can apply to large parts of the design than just a domain, e.g., a Strand or a pair of Domain’s, and 2) they are “soft” constraints that are allowed to be violated during the course of the search.

A NumpyFilter is one that can be efficiently encoded as numpy operations on 2D arrays of bytes representing DNA sequences, through the class np.DNASeqList (which uses such a 2D array as the field np.DNASeqList.seqarr).

Subclasses should set the value NumpyFilter.name, inherited from this class.

Pre-made subclasses of NumpyFilter provided in this library, such as RestrictBasesFilter or NearestNeighborEnergyFilter, are dataclasses (https://docs.python.org/3/library/dataclasses.html). There is no requirement that custom subclasses be dataclasses, but since the subclasses will inherit the field NumpyFilter.name, you can easily make them dataclasses to get, for example, free repr and str implementations. See the source code for examples.

The related type SequenceFilter (which is just an alias for a Python function with a certain signature) has a similar purpose, but is used for filters that cannot be encoded as numpy operations. Since they are applied by running a Python loop, they are much slower to evaluate than a NumpyFilter.

name: str = 'TODO: give a concrete name to this NumpyFilter': Name of this NumpyFilter.

abstract remove_violating_sequences(seqs)[source]

Subclasses should override this method.

Since these are filters that use numpy, generally they will access the numpy ndarray instance seqs.seqarr, operate on it, and then create a new np.DNASeqList instance via the constructor np.DNASeqList taking an numpy ndarray as input.

See the source code of included constraints for examples, such as NearestNeighborEnergyFilter.remove_violating_sequences() or BaseCountFilter.remove_violating_sequences(). These are usually quite tricky to write, requiring one to think in terms of linear algebra operations. The code tends not to be easy to read. But when a constraint can be expressed in this way, it is typically very fast to apply; many millions of sequences can be processed in a few seconds.

Parameters:: seqs (DNASeqList) – np.DNASeqList object representing DNA sequences
Returns:: a new np.DNASeqList object representing the DNA sequences in seqs that satisfy the constraint
Return type:: DNASeqList

__init__()

Return type:: None

class constraints.RestrictBasesFilter(bases)[source]

Restricts the sequence to use only a subset of bases. This can be used to implement a so-called “three-letter code”, for instance, in which a certain subset of Strand uses only the bases A, T, C (and Strand’s with complementary Domain use only A, T, G), to help reduce secondary structure of those Strand’s. See for example Supplementary Section S1.1 of “Scaling Up Digital Circuit Computation with DNA Strand Displacement Cascades”, Qian and Winfree, Science 332:1196–1201, 2011. DOI: 10.1126/science.1200520, https://science.sciencemag.org/content/332/6034/1196, http://www.qianlab.caltech.edu/seesaw_digital_circuits2011_SI.pdf

Note, however, that this is a filter for Domain’s, not whole Strand’s, so for a three-letter code to work, you must take care not to mixed Domain’s on a Strand that will use different alphabets.

Parameters:: bases (Collection[str])

bases: Collection[str]: Bases to use. Must be a strict subset of {‘A’, ‘C’, ‘G’, ‘T’} with at least two bases.

remove_violating_sequences(seqs)[source]

Should never be called directly; it is handled specially by the library when initially generating sequences.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

__init__(bases)

Parameters:: bases (Collection[str])
Return type:: None

class constraints.NearestNeighborEnergyFilter(low_energy=None, high_energy=None, temperature=37.0)[source]

This constraint calculates the nearest-neighbor binding energy of a domain with its perfect complement (summing over all length-2 substrings of the domain’s sequence), using parameters from the 2004 Santa-Lucia and Hicks paper (https://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.32.110601.141800, see Table 1, and example on page 419). It rejects any sequences whose energy according to this sum is outside the range [NearestNeighborEnergyFilter.low_energy, NearestNeighborEnergyFilter.high_energy].

If NearestNeighborEnergyFilter.low_energy or NearestNeighborEnergyFilter.high_energy is not specified, only the other is used, e.g., NearestNeighborEnergyFilter(high_energy=-7.5) filters out all sequences with nearest neighbor binding energy above -7.5. At least one must be specified.

Parameters:

low_energy (float | None)
high_energy (float | None)
temperature (float)

low_energy: float | None = None: Low threshold for nearest-neighbor energy.

high_energy: float | None = None: High threshold for nearest-neighbor energy.

temperature: float = 37.0: Temperature in Celsius at which to calculate nearest-neighbor energy.

remove_violating_sequences(seqs)[source]

Remove sequences with nearest-neighbor energies outside of an interval.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

__init__(low_energy=None, high_energy=None, temperature=37.0)

Parameters:

low_energy (float | None)
high_energy (float | None)
temperature (float)

Return type:

None

class constraints.BaseCountFilter(bases, high=None, low=None)[source]

Restricts the sequence to contain a certain number of occurences of given bases.

This is useful to implement, for instance, a “pseudo three letter code”, in which the alphabet is mostly ATC, but with the exception up to one G.

Parameters:

bases (str)
high (int | None)
low (int | None)

bases: str: Bases to count. e.g., ‘A’ or ‘AT’. If multiple bases are included, it ensures the sum of their counts is between BaseCountFilter.high and BaseCountFilter.low.

high: int | None = None: Count of bases in BaseCountFilter.bases must be at most this.

low: int | None = None: Count of bases in BaseCountFilter.bases must be at least this.

remove_violating_sequences(seqs)[source]

Remove sequences whose counts of a certain bases are outside of an interval.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

__init__(bases, high=None, low=None)

Parameters:

bases (str)
high (int | None)
low (int | None)

Return type:

None

class constraints.BaseEndFilter(bases, distance_from_end=0, five_prime=True, three_prime=True)[source]

Restricts the sequence to contain only certain bases on (or near, if BaseEndFilter.distance_from_end > 0) each end.

Parameters:

bases (Collection[str])
distance_from_end (int)
five_prime (bool)
three_prime (bool)

bases: Collection[str]: Bases to require on ends.

distance_from_end: int = 0: Distance from end.

five_prime: bool = True: Whether to apply to 5’ end of sequence (left end of DNA sequence, lowest index).

three_prime: bool = True: Whether to apply to 3’ end of sequence (right end of DNA sequence, highest index).

remove_violating_sequences(seqs)[source]

Keeps sequences with the given bases at given distance from the 5’ or 3’ end.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

__init__(bases, distance_from_end=0, five_prime=True, three_prime=True)

Parameters:

bases (Collection[str])
distance_from_end (int)
five_prime (bool)
three_prime (bool)

Return type:

None

class constraints.BaseAtPositionFilter(bases, position)[source]

Restricts the sequence to contain only certain base(s) on at a particular position.

One use case is that many internal modifications (e.g., biotin or fluorophore) can only be placed on an T.

Parameters:

bases (str | Collection[str])
position (int)

bases: str | Collection[str]

Base(s) to require at position BasePositionConstraint.position.

Can either be a single base, or a collection (e.g., list, tuple, set). If several bases are specified, the base at BasePositionConstraint.position must be one of the bases in BasePositionConstraint.bases.

position: int: Position of base to check.

remove_violating_sequences(seqs)[source]

Remove sequences that don’t have one of the given bases at the given position.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

__init__(bases, position)

Parameters:

bases (str | Collection[str])
position (int)

Return type:

None

class constraints.ForbiddenSubstringFilter(substrings, indices=None)[source]

Restricts the sequence not to contain a certain substring(s), e.g., GGGG.

Parameters:

substrings (str | Collection[str])
indices (Sequence[int] | None)

substrings: str | Collection[str]

Substring(s) to forbid.

Can either be a single substring, or a collection (e.g., list, tuple, set). If a collection, all substrings must have the same length.

indices: Sequence[int] | None = None: Indices at which to check for each substring in ForbiddenSubstringFilter.substrings. If not specified, all appropriate indices are checked.

length()[source]

Returns:: length of substring(s) to check
Return type:: int

remove_violating_sequences(seqs)[source]

Remove sequences that have a string in ForbiddenSubstringFilter.substrings as a substring.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

__init__(substrings, indices=None)

Parameters:

substrings (str | Collection[str])
indices (Sequence[int] | None)

Return type:

None

class constraints.RunsOfBasesFilter(bases, length)[source]

Restricts the sequence not to contain runs of a certain length from a certain subset of bases, (e.g., forbidding any substring in {C,G}^3; no four bases can appear in a row that are either C or G)

This works by simply generating all strings representing the runs of bases, and then using a ForbiddenSubstringFilter with those strings. So this will not be efficient for forbidding, for example {A,C,T}^20 (i.e., all runs of A’s, C’s, or T’s of length 20), which would generate all 3^20 = 3,486,784,401 strings of length 20 from the alphabet {A,C,T}^20. Hopefully such a constraint would not be used in practice.

Parameters:

bases (Collection[str])
length (int)

__init__(bases, length)[source]

Parameters:

bases (str | Collection[str]) – Can either be a single base, or a collection (e.g., list, tuple, set).
length (int) – length of run to forbid

Return type:

None

bases: Collection[str]: Bases to forbid in runs of length RunsOfBasesFilter.length.

length: int: Length of run to forbid.

remove_violating_sequences(seqs)[source]

Remove sequences that have a run of given length of bases from given bases.

Parameters:: seqs (DNASeqList)
Return type:: DNASeqList

class constraints.SubstringSampler(supersequence, substring_length, except_start_indices=None, except_overlapping_indices=None, circular=False)[source]

A SubstringSampler is an object for specifying a common case for the field DomainPool.possible_sequences, namely where we want the set of possible sequences to be all (or many) substrings of a single longer sequence.

For example, this can be used to choose a rotation of the M13mp18 strand in sequence design. If for example 300 consecutive bases of M13 will be used in the design, and we want to choose the rotation, but disallow the substring of length 300 to overlap the hairpin at indices 5514-5556, then one would do the following

possible_sequences = SubstringSampler(
    supersequence=m13(), substring_length=300,
    except_overlapping_indices=range(5514, 5557), circular=True)
pool = DomainPool('M13 rotations', possible_sequences=possible_sequences)

For this example, using a SubstringSampler is much more efficient than explicitly listing all length-300 substrings of M13 in the parameter DomainPool.possible_sequences. This is because the latter approach, whenever the Design improves and is written to a file, will write out all the length-300 substrings of M13, taking much more space than just writing the full M13 sequence once.

Parameters:

supersequence (str)
substring_length (int)
except_start_indices (tuple[int, ...])
except_overlapping_indices (Iterable[int] | None)
circular (bool)

__init__(supersequence, substring_length, except_start_indices=None, except_overlapping_indices=None, circular=False)[source]

Parameters:

supersequence (str)
substring_length (int)
except_start_indices (Iterable[int] | None)
except_overlapping_indices (Iterable[int] | None)
circular (bool)

Return type:

None

supersequence: str: The longer sequence from which to sample substrings.

substring_length: int: Length of substrings to sample.

circular: bool: Whether SubstringSampler.supersequence is circular. If so, then we can sample indices near the end and the substrings will start at the end and wrap around to the start.

except_start_indices: tuple[int, ...]: Start indices in SubstringSampler.supersequence to avoid. In the constructor this can be specified directly. Another option (mutually exclusive with the parameter except_start_indices) is to specify the parameter except_overlapping_indices, which sets SubstringSampler.except_start_indices so that substrings will not intersect any indices in except_overlapping_indices.

extended_supersequence: str: If SubstringSampler.circular is True, then this is SubstringSampler.supersequence extended by its own prefix of length SubstringSampler.substring_length - 1, to make sampling easier. Otherwise it is simply identical to SubstringSampler.supersequence. Computed in constructor from other arguments.

start_indices: tuple[int, ...]: List of start indices from which to sample when calling SubstringSampler.sample_substring(). Computed in constructor from other arguments.

all_possible_substrings()[source]

Returns:: list of all possible substrings that can be sampled by this SubstringSampler.
Return type:: list[str]

sample_substring(rng)[source]

Returns:: a random substring of SubstringSampler.supersequence of length SubstringSampler.substring_length.
Parameters:: rng (Generator)
Return type:: str

class constraints.DomainPool(name, length=None, possible_sequences=None, replace_with_close_sequences=True, hamming_probability=<factory>, numpy_filters=<factory>, sequence_filters=<factory>)[source]

Represents a group of related Domain’s that share common properties in their sequence design, such as length of DNA sequence, or bounds on nearest-neighbor duplex energy.

Also serves as a “source” of DNA sequences for Domain’s in this DomainPool. By calling DomainPool.generate_sequence() repeatedly, we can produce DNA sequences satisfying the constraints defining this DomainPool.

Parameters:

name (str)
length (int | None)
possible_sequences (list[str] | SubstringSampler | None)
replace_with_close_sequences (bool)
hamming_probability (dict[int, float])
numpy_filters (list[NumpyFilter])
sequence_filters (list[SequenceFilter])

name: str: Name of this DomainPool. Must be unique.

length: int | None = None

Length of DNA sequences generated by this DomainPool.

Should be None if DomainPool.possible_sequences is specified.

possible_sequences: list[str] | SubstringSampler | None = None

If specified, all other fields except DomainPool.name and DomainPool.length are ignored. This is an explicit list of sequences to consider for Domain’s using this DomainPool. During the search, if a domain with this DomainPool is picked to have its sequence changed, then a sequence will be picked uniformly at random from this list. Note that no NumpyFilter’s or SequenceFilter’s will be applied.

Alternatively, the field can be an instance of SubstringSampler for the common case that the set of possible sequences is the set of substrings of some length of a single longer sequence. For example, this can be used to choose a rotation of the M13mp18 strand in sequence design. (This is advantageous because the files saving the Design each time the design improves will be much shorter, since it takes much less space to write the M13 sequence than to write all of its length-300 substrings.)

Should be None if DomainPool.length is specified.

replace_with_close_sequences: bool = True: If True, instead of picking a sequence uniformly at random from all those satisfying the filters when returning a sequence from DomainPool.generate_sequence(), one is picked “close” in Hamming distance to the previous sequence of the Domain. The field DomainPool.hamming_probability is used to pick a distance at random, after which a sequence that distance from the previous sequence is selected to return.

hamming_probability: dict[int, float]: Dictionary that specifies probability of taking a new sequence from the pool that is some integer number of bases different from the previous sequence (Hamming distance).

numpy_filters: list[NumpyFilter]

NumpyFilter’s shared by all Domain’s in this DomainPool. This is used to choose potential sequences to assign to the Domain’s in this DomainPool in the method DomainPool.generate_sequence().

The difference with DomainPool.sequence_filters is that these constraints can be applied efficiently to many sequences at once, represented as a numpy 2D array of bytes (via the class np.DNASeqList), so they are done in large batches in advance. In contrast, the constraints in DomainPool.sequence_filters are done on Python strings representing DNA sequences, and they are called one at a time when a new sequence is requested in DomainPool.generate_sequence().

Optional; default is empty.

sequence_filters: list[SequenceFilter]

SequenceFilter’s shared by all Domain’s in this DomainPool. This is used to choose potential sequences to assign to the Domain’s in this DomainPool in the method DomainPool.generate().

See DomainPool.numpy_filters for an explanation of the difference between them.

See DomainPool.domain_constraints for an explanation of the difference between them.

Optional; default is empty.

satisfies_sequence_constraints(sequence)[source]

Parameters:: sequence (str) – DNA sequence to check
Returns:: whether sequence satisfies all constraints in DomainPool.sequence_filters
Return type:: bool

generate_all_sequences()[source]

Returns all DNA sequences of given length satisfying DomainPool.numpy_filters and DomainPool.sequence_filters.

CAUTION: This is prohibitively memory and time intensive if length is larger than 16, because it enumerates all DNA sequences of a given length, then filters out those not passing the filters.

Returns:: all DNA sequences of given length satisfying DomainPool.numpy_filters and DomainPool.sequence_filters
Return type:: list[str]

generate_sequence(rng, previous_sequence=None, warn_no_seqs_found=False)[source]

Returns a DNA sequence of given length satisfying DomainPool.numpy_filters and DomainPool.sequence_filters

Note: By default, there is no check that the sequence returned is unequal to one already assigned somewhere in the design, since both DomainPool.numpy_filters and DomainPool.sequence_filters do not have access to the whole Design. But the DomainPairConstraint returned by domains_not_substrings_of_each_other_constraint() can be used to specify this Design-wide constraint.

Note that if DomainPool.possible_sequences is specified, then all constraints are ignored, and instead a sequence is chosen randomly to be returned from that list.

Parameters:

rng (np.random.Generator) – numpy random number generator to use. To use a default, pass np.default_rng.
previous_sequence (str | None) – previously generated sequence to be replaced by a new sequence; None if no previous sequence exists. Used to choose a new sequence “close” to itself in Hamming distance, if the field DomainPool.replace_with_close_sequences is True and previous_sequence is not None. The number of differences between previous_sequence and its neighbors is determined by randomly picking a Hamming distance from DomainPool.hamming_probability with weighted probabilities of choosing each distance.
warn_no_seqs_found (bool) – whether to warn if no sequences satisfying constraints are found when generating a large number; this is on by default in case the user made the NumpyFilters or SequenceFilters too restrictive, but it can be annoying to see it printed when it is not hurting the search (frequently it appears when filters are restrictive and Hamming distance 1 is picked, but it’s okay because it simply samples a larger Hamming distance the next time)

Returns:

DNA sequence of given length satisfying DomainPool.numpy_filters and DomainPool.sequence_filters

Return type:

str

__init__(name, length=None, possible_sequences=None, replace_with_close_sequences=True, hamming_probability=<factory>, numpy_filters=<factory>, sequence_filters=<factory>)

Parameters:

name (str)
length (int | None)
possible_sequences (list[str] | SubstringSampler | None)
replace_with_close_sequences (bool)
hamming_probability (dict[int, float])
numpy_filters (list[NumpyFilter])
sequence_filters (list[SequenceFilter])

Return type:

None

class constraints.Part[source]

__init__()

Return type:: None

class constraints.Domain(name, pool=None, *, sequence=None, fixed=False, label=None, dependent=False, subdomains=None, weight=None)[source]

Represents a contiguous substring of the DNA sequence of a Strand, which is intended to be either single-stranded, or to bind fully to the Watson-Crick complement of the Domain.

If two domains are complementary, they are represented by the same Domain object. They are distinguished only by whether the Strand object containing them has the Domain in its set Strand.starred_domains or not.

A Domain uses only its name to compute hash and equality checks, not its sequence. This allows a Domain to be used in sets and dicts while modifying the sequence assigned to it, and also modifying the pool (letting the pool be assigned after it is created).

Parameters:

name (str)
pool (DomainPool | None)
sequence (str | None)
fixed (bool)
label (str | None)
dependent (bool)
subdomains (list[Domain] | None)
weight (float | None)

parent: Domain | None = None: Domain of which this is a subdomain. Note, this is not set manually, this is set by the library based on the Domain.subdomains of other domains in the same tree.

__init__(name, pool=None, *, sequence=None, fixed=False, label=None, dependent=False, subdomains=None, weight=None)[source]

Parameters:

name (str)
pool (DomainPool | None)
sequence (str | None)
fixed (bool)
label (str | None)
dependent (bool)
subdomains (list[Domain] | None)
weight (float | None)

Return type:

None

fixed: bool = False

Whether this Domain’s DNA sequence is fixed, i.e., cannot be changed by the search algorithm search.search_for_dna_sequences().

Note: If a domain is fixed then all of its subdomains must also be fixed.

label: str | None = None

Optional “label” string to associate to this Domain.

Useful for associating extra information with the Domain that will be serialized, for example, for DNA sequence design.

dependent: bool = False

Whether this Domain’s DNA sequence is dependent on others. Usually this is not the case. However, domains can be subdivided hierarchically into a tree of domains by setting Domain.subdomains to describe the tree. In this case exactly one domain along every path from the root to any leaf must be independent, and the rest dependent: the dependent domains will have their sequences calculated from the indepenedent ones.

A possible use case is that one strand represents a subsequence of M13 of length 300, of which there are 7249 possible DNA sequences to assign based on the different rotations of M13. If this strand is bound to several other strands, it will have several domains, but they cannot be set independently of each other. This can be done by creating a strand with a single long domain, which is subdivided into many dependent child domains. Only the entire strand, the root domain, can be assigned at once, changing every domain at once, so the domains are dependent on the root domain’s assigned sequence.

length: int | None = None: Length of this domain. If None, then the method Domain.get_length() asks Domain.pool for the length. However, a Domain with Domain.dependent set to True has no Domain.pool. For such domains, it is necessary to set a Domain.length field directly.

weight: float = 1.0

Weight to apply before picking domain at random to change when re-assigning DNA sequences during search. Should only be changed for independent domains. (those with Domain.dependent set to False)

Normally a domain’s probability of being changed is proportional to the total score of violations it causes, but that total score is first multiplied by Domain.weight. This is useful, for instance, to weight a domain lower when it has many subdomains that intersect many strands, for example if a domain represents an M13 strand. It may be more efficient to pick such a domain less often since changing it will change many strands in the design and, when the design gets close to optimized, this will likely cause the score to go up.

to_json_serializable(suppress_indent=True)[source]

Returns:: Dictionary d representing this Domain that is “naturally” JSON serializable, by calling json.dumps(d).
Parameters:: suppress_indent (bool)
Return type:: NoIndent | dict[str, Any]

static from_json_serializable(json_map, pool_with_name)[source]

Parameters:

json_map (dict[str, Any]) – JSON serializable object encoding this Domain, as returned by Domain.to_json_serializable().
pool_with_name (dict[str, DomainPool] | None) – dict mapping name to DomainPool with that name; required to rehydrate Domain’s. If None, then a DomainPool with no constraints is created with the name and domain length found in the JSON.

Returns:

Domain represented by dict json_map, assuming it was created by Domain.to_json_serializable().

Return type:

Domain

property name: str

name of this Domain

Type:: return

property pool: DomainPool

DomainPool of this Domain

Type:: return

property subdomains: list[Domain]

Subdomains of this Domain.

Used in connection with Domain.dependent to declare that some Domain’s are contained within other domains (forming a tree in general), and domains with Domain.dependent set to True automatically take their sequences from independent domains.

WARNING: this can be a bit tricky to determine the order when setting these. The subdomains should be listed in 5’ to 3’ order for UNSTARRED domains. If there is a starred domain with starred subdomains, they would be listed in REVERSE order.

For example, if there is a domain dom* [---------> of length 11 with two subdomains sub1* [-----> of length 7 and sub2* [--> of length 4 (put together they look like [----->[-->) that appear in that order left to right (5’ to 3’), then one would assign the domain dom to have subdomains [sub2, sub1], since the UNSTARRED domains appear <-----]<--], i.e., in 5’ to 3’ order for the unstarred domains, first the length 4 domain dom2 appears, then the length 7 domain dom1.

has_length()[source]

Returns:: True if this Domain has a length, which means either a sequence has been assigned to it, or it has a DomainPool.
Return type:: bool

get_length()[source]

Returns:: Length of this domain (delegates to pool)
Raises:: ValueError – if no DomainPool has been set for this Domain
Return type:: int

sequence()[source]

Returns:: DNA sequence of this domain (unstarred version)
Raises:: ValueError – If no sequence has been assigned.
Return type:: str

set_sequence(new_sequence)[source]

Parameters:: new_sequence (str) – new DNA sequence to set
Return type:: None

set_fixed_sequence(fixed_sequence)[source]

Set DNA sequence and fix it so it is not changed by the nuad sequence designer.

Since it is being fixed, there is no Domain pool, so we don’t check the pool or whether it has a length. We also bypass the check that it is not fixed.

Parameters:: fixed_sequence (str) – new fixed DNA sequence to set
Return type:: None

property starred_name: str

The value Domain.name with * appended to it.

Type:: return

property starred_sequence: str

Watson-Crick complement of DNA sequence assigned to this Domain.

Type:: return

get_name(starred)[source]

Parameters:: starred (bool) – whether to return the starred or unstarred version of the name
Returns:: The value Domain.name or Domain.starred_name, depending on the value of parameter starred.
Return type:: str

concrete_sequence(starred)[source]

Parameters:: starred (bool) – whether to return the starred or unstarred version of the sequence
Returns:: The value Domain.sequence or Domain.starred_sequence, depending on the value of parameter starred.
Raises:: ValueError – if this Domain does not have a sequence assigned
Return type:: str

has_sequence()[source]

Returns:: Whether a complete DNA sequence has been assigned to this Domain. If this domain has subdomains, False if any subdomain has not been assigned a sequence.
Return type:: bool

static complementary_domain_name(domain_name)[source]

Returns the name of the domain complementary to domain_name. In other words, a * is either removed from the end of domain_name, or appended to it if not already there.

Parameters:: domain_name (str) – name of domain
Returns:: name of complementary domain
Return type:: str

all_domains_in_tree(allow_fixed=False)[source]

Returns:: list of all domains in the same subdomain tree as this domain (including itself)
Parameters:: allow_fixed (bool)
Return type:: list[Domain]

all_domains_intersecting()[source]

Returns:: list of all domains intersecting this one, meaning those domains in the subtree rooted at this domain (including itself), plus any ancestors of this domain.
Return type:: list[Domain]

ancestors()[source]

Returns:: list of all domains that are ancestors of this one, NOT including this domain
Return type:: list[Domain]

has_pool()[source]

Returns:: whether a DomainPool has been assigned to this Domain
Return type:: bool

contains_in_subtree(other)[source]

Parameters:: other (Domain) – another Domain
Returns:: True if self contains other in its subtree of subdomains
Return type:: bool

independent_source()[source]

Like independent_ancestor_or_descendent(), but returns this Domain if it is already independent.

Returns:: the independent Domain that this domain depends on, which is itself if it is already independent
Return type:: Domain

independent_ancestor_or_descendent()[source]

Find the independent ancestor or descendent of this dependent Domain. Raises exception if this is not a dependent Domain.

Returns:: The independent ancestor or descendent of this Domain.
Return type:: Domain

constraints.domains_neq_constraint(check_complements=True, short_description='dom neq', weight=1.0, min_length=4, pairs=None, domains=None)[source]

More generally, returns constraint ensuring no two domains are substrings of each other. Note that this ensures that no two Domain’s are equal if they are the same length.

Parameters:

check_complements (bool) – whether to also ensure the check for Watson-Crick complements of the sequences
short_description (str) – short description of constraint suitable for logging to stdout
weight (float) – weight to assign to constraint
min_length (int) – minimum length substring to check. For instance if min_length is 4, then having two domains with sequences AAAA and CAAAAC would violate this constraint, but domains with sequences AAA and CAAAC would not.
pairs (Iterable[tuple[Domain, Domain]] | None) – pairs of domains to check. By default all pairs of unequal domains are compared unless both are fixed. Mutually exclusive with domains.
domains (Iterable[Domain] | None) – domains to check. All pairs among domains are checked. Mutually exclusive with pairs.

Returns:

a DomainPairConstraint ensuring no two domain sequences contain each other as a substring (in particular, if they are equal length, then they are not the same domain)

Return type:

DomainPairConstraint

class constraints.VendorFields(scale='25nm', purification='STD', plate=None, well=None)[source]

Data required when ordering DNA strands from a synthesis company such as IDT (Integrated DNA Technologies). This data is used when automatically generating files used to order DNA from IDT.

When exporting to IDT files via Design.write_idt_plate_excel_file() or Design.write_idt_bulk_input_file(), the field Strand.name is used for the name if it exists, otherwise a reasonable default is chosen.

Parameters:

scale (str)
purification (str)
plate (str | None)
well (str | None)

scale: str = '25nm': Synthesis scale at which to synthesize the strand (third field in IDT bulk input: https://www.idtdna.com/site/order/oligoentry). Choices supplied by IDT at the time this was written: "25nm", "100nm", "250nm", "1um", "5um", "10um", "4nmU", "20nmU", "PU", "25nmS".

purification: str = 'STD': Purification options (fourth field in IDT bulk input: https://www.idtdna.com/site/order/oligoentry). Choices supplied by IDT at the time this was written: "STD", "PAGE", "HPLC", "IEHPLC", "RNASE", "DUALHPLC", "PAGEHPLC".

plate: str | None = None

Name of plate in case this strand will be ordered on a 96-well or 384-well plate.

Optional field, but non-optional if book = openpyxl.load_workbook(filename=filename).well is not None.

well: str | None = None

Well position on plate in case this strand will be ordered on a 96-well or 384-well plate.

Optional field, but non-optional if VendorFields.plate is not None.

__init__(scale='25nm', purification='STD', plate=None, well=None)

Parameters:

scale (str)
purification (str)
plate (str | None)
well (str | None)

Return type:

None

class constraints.Strand(domains=None, starred_domain_indices=(), group='default_strand_group', name=None, label=None, vendor_fields=None)[source]

Represents a DNA strand, made of several Domain’s.

Parameters:

domains (Iterable[Domain] | None)
starred_domain_indices (Iterable[int])
group (str)
name (str | None)
label (str | None)
vendor_fields (VendorFields | None)

modification_5p: nm.Modification5Prime | None = None: 5’ modification; None if there is no 5’ modification.

modification_3p: nm.Modification3Prime | None = None: 3’ modification; None if there is no 3’ modification.

__init__(domains=None, starred_domain_indices=(), group='default_strand_group', name=None, label=None, vendor_fields=None)[source]

A Strand can be created only by listing explicit Domain objects via parameter domains. To specify a Strand by giving domain names, see the method Design.add_strand().

Parameters:

domains (Iterable[Domain] | None) – list of Domain’s on this Strand
starred_domain_indices (Iterable[int]) – Indices of Domain’s in domains that are starred.
group (str) – name of group of this Strand.
name (str | None) – Name of this Strand.
label (str | None) – Label to associate with this Strand.
vendor_fields (VendorFields | None) – VendorFields object to associate with this Strand; needed to call methods for exporting to IDT formats (e.g., Strand.write_idt_bulk_input_file())

Return type:

None

group: str: Optional “group” field to describe strands that share similar properties.

domains: list[Domain]: The Domain’s on this Strand, in order from 5’ end to 3’ end.

starred_domain_indices: frozenset[int]: Set of positions of Domain’s in Strand.domains on this Strand that are starred.

label: str | None = None

Optional generic “label” string to associate to this Strand.

Useful for associating extra information with the Strand that will be serialized, for example, for DNA sequence design.

vendor_fields: VendorFields | None = None: Fields used when ordering strands from a synthesis company such as IDT (Integrated DNA Technologies, Coralville, IA). If present (i.e., not equal to None) then the method Design.write_idt_bulk_input_file() can be called to automatically generate an text file for ordering strands in test tubes: https://www.idtdna.com/site/order/oligoentry, as can the method Design.write_idt_plate_excel_file() for writing a Microsoft Excel file that can be uploaded to IDT’s website for describing DNA sequences to be ordered in 96-well or 384-well plates.

modifications_int: dict[int, nm.ModificationInternal]

modifications.Modification’s to the DNA sequence (e.g., biotin, Cy3/Cy5 fluorphores).

Maps index within DNA sequence to modification. If the internal modification is attached to a base (e.g., internal biotin, /iBiodT/ from IDT), then the index is that of the base. If it goes between two bases (e.g., internal Cy3, /iCy3/ from IDT), then the index is that of the previous base, e.g., to put a Cy3 between bases at indices 3 and 4, the index should be 3. So for an internal modified base on a sequence of length n, the allowed indices are 0,…,n-1, and for an internal modification that goes between bases, the allowed indices are 0,…,n-2.

clone(name)[source]

Returns a copy of this Strand. The copy is “shallow” in that the Domain’s are shared. This is useful for creating multiple versions of each Strand, e.g., for having a variant with an extension.

WARNING: the Strand.label will be shared between them. If it should be copied, this must be done manually. A shallow copy of it can be made by setting

Parameters:: name (str | None) – new name to give this Strand
Returns:: A copy of this Strand.
Return type:: Strand

compute_derived_fields()[source]: Re-computes derived fields of this Strand. Should be called after modifications to the Strand. (Done automatically at the start of search.search_for_dna_sequences().)

intersects_domain(domain)[source]

Parameters:: domain (Domain) – domain to test for intersection
Returns:: whether this strand intersects domain, which is true if either domain is in the list Strand.domains, or if any of those domains have domain in their hierarchical tree as a subdomain or an ancestor
Return type:: bool

length()[source]

Returns:: Sum of lengths of Domain’s in this Strand. Each Domain must have a DomainPool assigned so that the length is defined.
Return type:: int

domain_names_concatenated(delim='-')[source]

Parameters:: delim (str) – Delimiter to put between domain names.
Returns:: names of Domain’s in this Strand, concatenated with delim in between.
Return type:: str

domain_names_tuple()[source]

Returns:: tuple of names of Domain’s in this Strand.
Return type:: tuple[str, …]

vendor_dna_sequence()[source]

Returns:: DNA sequence as it needs to be typed to order from a synthesis company, with Modification5Prime’s, Modification3Prime’s, and ModificationInternal’s represented with text codes, e.g., “/5Biosg/ACGT” for sequence ACGT with a 5’ biotin modification to order from IDT.
Return type:: str

to_json_serializable(suppress_indent=True)[source]

Returns:: Dictionary d representing this Strand that is “naturally” JSON serializable, by calling json.dumps(d).
Parameters:: suppress_indent (bool)
Return type:: NoIndent | dict[str, Any]

static from_json_serializable(json_map, domain_with_name)[source]

Returns:

Strand represented by dict json_map, assuming it was created by Strand.to_json_serializable().

Parameters:

json_map (dict[str, Any])
domain_with_name (dict[str, Domain])

Return type:

Strand

unstarred_domains()[source]

Returns:: list of unstarred Domain’s in this Strand, in order they appear in Strand.domains
Return type:: list[Domain]

starred_domains()[source]

Returns:: list of starred Domain’s in this Strand, in order they appear in Strand.domains
Return type:: list[Domain]

unstarred_domains_set()[source]

Returns:: set of unstarred Domain’s in this Strand
Return type:: OrderedSet[Domain]

starred_domains_set()[source]

Returns:: set of starred Domain’s in this Strand
Return type:: OrderedSet[Domain]

sequence(delimiter='')[source]

Parameters:: delimiter (str) – Delimiter string to place between sequences of each Domain in this Strand. For instance, if delimiter = '--', then it will return a string such as ACGTAGCTGA--CGCTAGCTGA--CGATCGATC--GCGATCGAT
Returns:: DNA sequence assigned to this Strand, calculated by concatenating all sequences assigned to its Domain’s.
Raises:: ValueError – if any Domain of this Strand does not have a sequence assigned
Return type:: str

assign_dna(sequence)[source]

Parameters:: sequence (str) – DNA sequence to assign to this Strand. Must have length = Strand.length().
Return type:: None

property fixed: bool: True if every Domain on this Strand has a fixed DNA sequence.

unfixed_domains()[source]

Returns:: all Domain’s in this Strand where Domain.fixed is False
Return type:: tuple[Domain, …]

property name: str

name of this Strand if it was assigned one, otherwise Domain names are concatenated with ‘-’ joining them

Type:: return

address_of_domain(domain_idx)[source]

Returns StrandDomainAddress of the domain located at domain_idx

Rparam domain_idx:: Index of domain
Parameters:: domain_idx (int)
Return type:: StrandDomainAddress

address_of_nth_domain_occurence(domain_name, n, forward=True)[source]

Returns StrandDomainAddress of the n’th occurence of domain named domain_name.

Parameters:

domain_name (str) – name of Domain to find address of
n (int) – which occurrence (in order on the Strand) of Domain with name domain_name to find address of.
forward – if True, starts searching from 5’ end, otherwise starts searching from 3’ end.

Returns:

StrandDomainAddress of the n’th occurence of domain named domain_name.

Return type:

StrandDomainAddress

address_of_first_domain_occurence(domain_name)[source]

Returns StrandDomainAddress of the first occurrence of domain named domain_name starting from the 5’ end.

Parameters:: domain_name (str)
Return type:: StrandDomainAddress

address_of_last_domain_occurence(domain_name)[source]

Returns StrandDomainAddress of the nth occurrence of domain named domain_name starting from the 3’ end.

Parameters:: domain_name (str)
Return type:: StrandDomainAddress

append_domain(domain, starred=False)[source]

Appends domain to 3’ end of this Strand.

Parameters:

domain (Domain) – Domain to append
starred (bool) – whether domain is starred

Return type:

None

prepend_domain(domain, starred=False)[source]

Prepends domain to 5’ end of this Strand (i.e., the beginning of the Strand).

Parameters:

domain (Domain) – Domain to prepend
starred (bool) – whether domain is starred

Return type:

None

insert_domain(idx, domain, starred=False)[source]

Inserts domain at index idx of this Strand, with same semantics as Python’s list.insert. For example, strand.insert(0, domain) is equivalent to strand.prepend_domain(domain) and strand.insert(len(strand.domains), domain) is equivalent to strand.append_domain(domain).

Parameters:

idx (int) – index at which to insert domain into this Strand
domain (Domain) – Domain to append
starred (bool) – whether domain is starred

Return type:

None

set_fixed_sequence(seq)[source]

Sets each domain of this Strand to have a substring of seq, such that the entire strand has the sequence seq. All Domain’s in this strand will be fixed after doing this. (And if any of them are already fixed it will raise an error.)

Parameters:: seq (str) – sequence to assign to this Strand
Return type:: None

class constraints.DomainPair(domain1: 'Domain', domain2: 'Domain', starred1: 'bool' = False, starred2: 'bool' = False)[source]

Parameters:

domain1 (Domain)
domain2 (Domain)
starred1 (bool)
starred2 (bool)

domain1: Domain: First domain

domain2: Domain: Second domain

starred1: bool = False: Whether first domain is starred (not used in most constraints)

starred2: bool = False: Whether second domain is starred (not used in most constraints)

__init__(domain1, domain2, starred1=False, starred2=False)

Parameters:

domain1 (Domain)
domain2 (Domain)
starred1 (bool)
starred2 (bool)

Return type:

None

class constraints.StrandPair(strand1: 'Strand', strand2: 'Strand')[source]

Parameters:

strand1 (Strand)
strand2 (Strand)

__init__(strand1, strand2)

Parameters:

strand1 (Strand)
strand2 (Strand)

Return type:

None

class constraints.Complex(*args: 'Strand')[source]

Parameters:: args (Strand)

__init__(*args)[source]

Creates a complex of strands given as arguments, e.g., Complex(strand1, strand2) creates a 2-strand complex.

Parameters:: args (Strand)
Return type:: None

strands: tuple[Strand, ...]: The strands in this complex.

constraints.remove_duplicates(lst)[source]

Parameters:: lst (Iterable[T]) – an Iterable of objects
Returns:: a list consisting of elements of lst with duplicates removed, while preserving iteration order of lst (naive approach using Python set would not preserve order, since iteration order of Python sets is not specified)
Return type:: list[T]

class constraints.PlateType(value)[source]

Represents two different types of plates in which DNA sequences can be ordered.

wells96 = 96: 96-well plate.

wells384 = 384: 384-well plate.

num_wells_per_plate()[source]

Returns:: number of wells in this plate type
Return type:: int

min_wells_per_plate()[source]

Returns:: minimum number of wells in this plate type to avoid extra charge by IDT
Return type:: int

class constraints.Design(strands=())[source]

Represents a complete design, i.e., a set of DNA Strand’s with domains, and Constraint’s on the sequences to assign to them via search.search_for_dna_sequences().

Parameters:: strands (list[Strand])

strands_by_group_name: dict[str, list[Strand]]

dict mapping each group name to a list of the Strand’s in this Design in the group.

Computed from Design.strands, so not specified in constructor.

domain_pools_to_domain_map: dict[DomainPool, list[Domain]]

dict mapping each DomainPool to a list of the Domain’s in this Design in the pool.

Computed from Design.strands, so not specified in constructor.

strands_by_name: dict[str, Strand]

dict mapping each name of a Strand to the Strand in this Design with that name.

Computed from Design.strands, so not specified in constructor.

property domains: list[Domain]

List of all Domain’s in this Design. (without repetitions)

Computed from Design.strands, so not specified in constructor.

__init__(strands=())[source]

Parameters:: strands (Iterable[Strand]) – the Strand’s in this Design
Return type:: None

strands: list[Strand]: List of all Strand’s in this Design.

domains_by_name: dict[str, Domain]

dict mapping each name of a Domain to the Domain’s in this Design.

Computed from Design.strands, so not specified in constructor.

compute_derived_fields()[source]

Computes derived fields of this Design. Used to ensure that all fields are valid in case the Design was manually modified after being created, before running search.search_for_dna_sequences().

Return type:: None

to_json()[source]

Returns:: JSON string representing this Design.
Return type:: str

to_json_serializable(suppress_indent=True)[source]

Parameters:: suppress_indent (bool) – Whether to suppress indentation of some objects using the NoIndent object.
Returns:: Dictionary d representing this Design that is “naturally” JSON serializable, by calling json.dumps(d).
Return type:: dict[str, Any]

write_design_file(directory='.', filename=None, extension='json')[source]

Write JSON file representing this Design, which can be imported via the method Design.from_design_file(), with the output file having the same name as the running script but with .py changed to .json, unless filename is explicitly specified. For instance, if the script is named my_design.py, then the design will be written to my_design.json. If extension is specified (but filename is not), then the design will be written to my_design.<extension>

The string written is that returned by Design.to_json().

Parameters:

directory (str) – directory in which to put file (default: current working directory)
filename (str | None) – filename (default: name of script with .py replaced by .sc). Mutually exclusive with extension
extension (str) – extension for filename (default: .sc) Mutually exclusive with filename

Return type:

None

static from_design_file(filename)[source]

Parameters:: filename (str) – name of JSON file describing the Design
Returns:: Design described by the JSON file with name filename, assuming it was created using Design.write_design_file().
Return type:: Design

static from_json(json_str)[source]

Parameters:: json_str (str) – The string representing the Design as a JSON object.
Returns:: Design described by this JSON string, assuming it was created using :py:meth`Design.to_json`.
Return type:: Design

static from_json_serializable(json_map)[source]

Parameters:: json_map (dict[str, Any]) – JSON serializable object encoding this Design, as returned by Design.to_json_serializable().
Returns:: Design represented by dict json_map, assuming it was created by Design.to_json_serializable(). No constraints are populated.
Return type:: Design

add_domain(name, pool=None, fixed_sequence=None, allow_if_exists=False)[source]

Adds a Domain with name name to this Design, and returns it.

This is useful in combination with Design.add_strand() for creating strands by giving domain names instead of explicit Domain objects. This way, the domains can be associated to a DomainPool or have a fixed sequence assigned at the time of creation, without needing to modify the Domain objects after they are created by Design.add_strand().

Parameters:

name (str) – name of the Domain to add. Raises exception if this domain has already been added to this Design.
pool (DomainPool | None) – DomainPool to associate with this Domain. Exactly one of pool and fixed_sequence can be specified.
fixed_sequence (str | None) – fixed sequence to assign to this Domain. Exactly one of pool and fixed_sequence can be specified.
allow_if_exists (bool) – if False (default), raises exception if a Domain with name name already exists in this Design (or its complement).

Return type:

Domain

add_strand(domain_names=None, domains=None, starred_domain_indices=None, group='default_strand_group', name=None, label=None, vendor_fields=None)[source]

This is an alternative way to create strands instead of calling the Strand constructor explicitly. It behaves similarly to the Strand constructor, but it has an option to specify Domain’s simply by giving a name.

A Strand can be created either by listing explicit Domain objects via parameter domains (as in the Strand constructor), or by giving names via parameter domain_names. If domain_names is specified, then by convention those that end with a * are assumed to be starred.

In particular, Domain objects are created as needed, whenever the Design sees a new domain name that has not been encountered. Also, Domain’s created in this way are “interned” as variables in a cache stored in the Design object; no two Domain’s with the same name in this design will be created, and subsequent uses of the same name will refer to the same Domain object.

Parameters:

domain_names (list[str] | None) – Names of the Domain’s on this Strand. Domain objects are created by the Design as needed whenever a new domain name is specified; if the domain name has already been used (or its complement via the convention that names ending in a * are the complement of the domain whose name is equal but without ending in a *), then the same Domain object is reused. Mutually exclusive with Strand.domains and Strand.starred_domain_indices.
domains (list[Domain] | None) – list of Domain’s on this Strand. Mutually exclusive with Strand.domain_names, and must be specified jointly with Strand.starred_domain_indices.
starred_domain_indices (Iterable[int] | None) – Indices of Domain’s in domains that are starred. Mutually exclusive with Strand.domain_names, and must be specified jointly with Strand.domains.
group (str) – name of group of this Strand.
name (str | None) – Name of this Strand.
label (str | None) – Label to associate with this Strand.
vendor_fields (VendorFields | None) – VendorFields object to associate with this Strand; needed to call methods for exporting to IDT formats (e.g., Strand.write_idt_bulk_input_file())

Returns:

the Strand that is created

Return type:

Strand

modifications(mod_type=None)[source]

Returns either set of all modifications.Modification’s in this Design, or set of all modifications of a given type (5’, 3’, or internal).

Parameters:: mod_type (nm.ModificationType | None) – type of modifications (5’, 3’, or internal); if not specified, all three types are returned
Returns:: set of all modifications in this Design (possibly of a given type).
Return type:: set[nm.Modification]

to_idt_bulk_input_format(delimiter=',', domain_delimiter='', key=None, warn_duplicate_name=False, only_strands_with_vendor_fields=False, strands=None)[source]

Called by Design.write_idt_bulk_input_file() to determine what string to write to the file. This function can be used to get the string directly without creating a file.

Parameters have the same meaning as in Design.write_idt_bulk_input_file().

Returns:

string that is written to the file in the method Design.write_idt_bulk_input_file().

Parameters:

delimiter (str)
domain_delimiter (str)
key (KeyFunction[Strand] | None)
warn_duplicate_name (bool)
only_strands_with_vendor_fields (bool)
strands (Iterable[Strand] | None)

Return type:

str

write_idt_bulk_input_file(*, filename=None, directory='.', key=None, extension=None, delimiter=',', domain_delimiter='', warn_duplicate_name=True, only_strands_with_vendor_fields=False, strands=None)[source]

Write .idt text file encoding the strands of this Design with the field Strand.vendor_fields, suitable for pasting into the “Bulk Input” field of IDT (Integrated DNA Technologies, Coralville, IA, https://www.idtdna.com/), with the output file having the same name as the running script but with .py changed to .idt, unless filename is explicitly specified. For instance, if the script is named my_origami.py, then the sequences will be written to my_origami.idt. If filename is not specified but extension is, then that extension is used instead of idt. At least one of filename or extension must be None.

The string written is that returned by Design.to_idt_bulk_input_format().

Parameters:

filename (str) – optional custom filename to use (instead of currently running script)
directory (str) – specifies a directory in which to place the file, either absolute or relative to the current working directory. Default is the current working directory.
key (KeyFunction[Strand] | None) – key function used to determine order in which to output strand sequences. Some useful defaults are provided by strand_order_key_function()
extension (str | None) – alternate filename extension to use (instead of idt)
delimiter (str) – is the symbol to delimit the four IDT fields name,sequence,scale,purification.
domain_delimiter (str) – symbol(s) to put in between DNA sequences of different domains in a strand.
warn_duplicate_name (bool) – if True prints a warning when two different Strand’s have the same VendorFields.name and the same Strand.sequence(). A ValueError is raised (regardless of the value of this parameter) if two different Strand’s have the same name but different sequences, IDT scales, or IDT purifications.
only_strands_with_vendor_fields (bool) – If False (the default), all non-scaffold sequences are output, with reasonable default values chosen if the field Strand.vendor_fields is missing. If True, then strands lacking the field Strand.vendor_fields will not be exported.
strands (Iterable[Strand] | None) – strands to export; if not specified, all strands in design are exported. NOTE: it is not checked that each Strand in strands is actually contained in this any:Design

Return type:

None

write_idt_plate_excel_file(*, filename=None, directory='.', key=None, warn_duplicate_name=False, only_strands_with_vendor_fields=False, use_default_plates=True, warn_using_default_plates=True, plate_type=PlateType.wells96, strands=None)[source]

Write .xls (Microsoft Excel) file encoding the strands of this Design with the field Strand.vendor_fields, suitable for uploading to IDT (Integrated DNA Technologies, Coralville, IA, https://www.idtdna.com/) to describe a 96-well or 384-well plate (https://www.idtdna.com/site/order/plate/index/dna/), with the output file having the same name as the running script but with .py changed to .xls, unless filename is explicitly specified. For instance, if the script is named my_origami.py, then the sequences will be written to my_origami.xls.

If the last plate has fewer than 24 strands for a 96-well plate, or fewer than 96 strands for a 384-well plate, then the last two plates are rebalanced to ensure that each plate has at least that number of strands, because IDT charges extra for a plate with too few strands: https://www.idtdna.com/pages/products/custom-dna-rna/dna-oligos/custom-dna-oligos

Parameters:

filename (str) – custom filename if default (explained above) is not desired
directory (str) – specifies a directory in which to place the file, either absolute or relative to the current working directory. Default is the current working directory.
key (KeyFunction[Strand] | None) –
key function used to determine order in which to output strand sequences. Some useful defaults are provided by strand_order_key_function()
warn_duplicate_name (bool) – if True prints a warning when two different Strand’s have the same VendorFields.name and the same Strand.sequence(). A ValueError is raised (regardless of the value of this parameter) if two different Strand’s have the same name but different sequences, IDT scales, or IDT purifications.
only_strands_with_vendor_fields (bool) – If False (the default), all non-scaffold sequences are output, with reasonable default values chosen if the field Strand.vendor_fields is missing. (though scaffold is included if export_scaffold is True). If True, then strands lacking the field Strand.vendor_fields will not be exported. If False, then use_default_plates must be True.
use_default_plates (bool) – Use default values for plate and well (ignoring those in Strand.vendor_fields, which may be None). If False, each Strand to export must have the field Strand.vendor_fields, so in particular the parameter only_strands_with_idt must be True.
warn_using_default_plates (bool) – specifies whether, if use_default_plates is True, to print a warning for strands whose Strand.vendor_fields has the fields VendorFields.plate and VendorFields.well, since use_default_plates directs these fields to be ignored.
plate_type (PlateType) – a PlateType specifying whether to use a 96-well plate or a 384-well plate if the use_default_plates parameter is True. Ignored if use_default_plates is False, because in that case the wells are explicitly set by the user, who is free to use coordinates for either plate type.
strands (Iterable[Strand] | None) – strands to export; if not specified, all strands in design are exported. NOTE: it is not checked that each Strand in strands is actually contained in this any:Design

Return type:

None

domain_pools()[source]

Returns:: list of all DomainPool’s in this Design
Return type:: list[DomainPool]

domains_by_pool_name(domain_pool_name)[source]

Parameters:: domain_pool_name (str) – name of a DomainPool
Returns:: the Domain’s in domain_pool
Return type:: list[Domain]

static from_scadnano_file(sc_filename, fix_assigned_sequences=True, ignored_strands=None)[source]

Converts a scadnano Design stored in file named sc_filename to a a Design for doing DNA sequence design. Each Strand name and Domain name from the scadnano Design are assigned as Strand.name and Domain.name in the obvious way. Assumes each Strand label is a string describing the strand group.

The scadnano package must be importable.

Also assigns sequences from domains in sc_design to those of the returned Design. If fix_assigned_sequences is true, then these DNA sequences are fixed; otherwise not.

Parameters:

sc_filename (str) – Name of file containing scadnano Design.
fix_assigned_sequences (bool) – Whether to fix the sequences that are assigned from those found in sc_design.
ignored_strands (Iterable[Strand] | None) – Strands to ignore

Returns:

An equivalent Design, ready to be given constraints for DNA sequence design.

Raises:

TypeError – If any scadnano strand label is not a string.

Return type:

Design

static from_scadnano_design(sc_design, fix_assigned_sequences=True, ignored_strands=None, warn_existing_domain_labels=True)[source]

Converts a scadnano Design sc_design to a a Design for doing DNA sequence design. Each Strand name and Domain name from the scadnano Design are assigned as Strand.name and Domain.name in the obvious way. Assumes each Strand label is a string describing the strand group.

The scadnano package must be importable.

Also assigns sequences from domains in sc_design to those of the returned Design. If fix_assigned_sequences is true, then these DNA sequences are fixed; otherwise not.

Parameters:

sc_design (sc.Design) – Instance of scadnano.Design from the scadnano Python scripting library.
fix_assigned_sequences (bool) – Whether to fix the sequences that are assigned from those found in sc_design.
ignored_strands (Iterable[Strand] | None) – Strands to ignore; none are ignore if not specified.
warn_existing_domain_labels (bool) – If True, logs warning when dsd Domain already has a label and so does scadnano domain, since scadnano label will not be assigned to the dsd Domain.

Returns:

An equivalent Design, ready to be given constraints for DNA sequence design.

Raises:

TypeError – If any scadnano strand label is not a string.

Return type:

Design

assign_fields_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False, ignore_domains=False)[source]

Assigns DNA sequence, VendorFields, and StrandGroups (as a key in a scadnano String.label dict under key “group”). TODO: document more

Parameters:

sc_design (Design)
ignored_strands (Collection[Strand])
overwrite (bool)
ignore_domains (bool)

assign_sequences_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False, ignore_domains=False, _skip_compute_derived_fields=False)[source]

Assigns sequences from this Design into sc_design.

Also writes a label to each scadnano strand. If the label is None a new one is created as a dict with a key group. The name of the StrandGroup of the nuad design is the value to assign to this key. If the scadnano strand label is already a dict, it adds this key. If the strand label is not None or a dict, an exception is raised.

Assumes that each domain name in domains in sc_design is a Domain.name of a Domain in this Design.

If multiple strands in sc_design share the same name, then all of them are assigned the DNA sequence of the nuad Strand with that name.

Parameters:

sc_design (Design) – a scadnano design.
ignored_strands (Collection[Strand]) – strands in the scadnano design that are to be ignored by the sequence designer.
overwrite (bool) – if True, overwrites existing sequences; otherwise gives an error if an existing sequence disagrees with the newly assigned sequence
ignore_domains (bool) – if True, does not check that domain names match between the designs. It is still required that each strand name appearing in sc_design is the name of a strand in this design. Cannot be specified if overwrite is False.
_skip_compute_derived_fields (bool)

Return type:

None

shared_strands_with_scadnano_design(sc_design, ignored_strands=())[source]

Returns a list of pairs (nuad_strand, sc_strands), where nuad_strand has the same name as all scadnano Strands in sc_strands, but only scadnano strands are included in the list that do not appear in ignored_strands.

Parameters:

sc_design (Design)
ignored_strands (Collection[Strand])

Return type:

list[tuple[Strand, list[Strand]]]

assign_strand_groups_to_labels(sc_design, ignored_strands=(), overwrite=False)[source]

TODO: document this

Parameters:

sc_design (Design)
ignored_strands (Iterable[Strand])
overwrite (bool)

Return type:

None

assign_idt_fields_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]

Assigns VendorFields from this Design into sc_design.

If multiple strands in sc_design share the same name, then all of them are assigned the IDT fields of the dsd Strand with that name.

Parameters:

sc_design (Design) – a scadnano design.
ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be not assigned.
overwrite (bool) – whether to overwrite existing fields.

Raises:

ValueError – if scadnano strand already has any modifications assigned

Return type:

None

assign_modifications_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]

Assigns modifications.Modification’s from this Design into sc_design.

If multiple strands in sc_design share the same name, then all of them are assigned the modifications of the dsd Strand with that name.

Parameters:

sc_design (Design) – a scadnano design.
ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be not assigned.
overwrite (bool) – whether to overwrite existing fields in scadnano design

Raises:

ValueError – if scadnano strand already has any modifications assigned

Return type:

None

copy_sequences_from(other)[source]

Assuming every Domain in this Design is has a matching (same name) Domain in other, copies sequences from other into this Design.

Parameters:: other (Design) – other Design from which to copy sequences
Return type:: None

check_all_subdomain_graphs_acyclic()[source]

Check that all domain graphs (if subdomains are used) are acyclic.

Return type:: None

check_all_subdomain_graphs_uniquely_assignable()[source]

Check that subdomain graphs are consistent and raise error if not.

Return type:: None

class constraints.Constraint(description, short_description='', weight=1.0, score_transfer_function=None)[source]

Abstract base class of all “soft” constraints to apply when running search.search_for_dna_sequences(). Unlike a NumpyFilter or a SequenceFilter, which disallow certain DNA sequences from ever being assigned to a Domain, a Constraint can be violated during the search. The goal of the search is to reduce the number of violated Constraint’s. See search.search_for_dna_sequences() for a more detailed description of how the search algorithm interacts with the constraints.

You will not use this class directly, but instead its concrete subclasses DomainConstraint, StrandConstraint, DomainPairConstraint, StrandPairConstraint, ComplexConstraint, which are subclasses of SingularConstraint, DomainsConstraint, StrandsConstraint, DomainPairsConstraint, StrandPairsConstraint, which are subclasses of BulkConstraint.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)

description: str: Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’ suitable for printing in a long text report.

short_description: str = '': Very short description of the constraint suitable for compactly logging to the screen, where many of these short descriptions must fit onto one line, e.g., ‘strand ss’ or ‘dom pair nupack’

weight: float = 1.0: Constant multiplier Weight of the problem; the higher the total weight of all the Constraint’s a Domain has caused, the greater likelihood its sequence is changed when stochastically searching for sequences to satisfy all constraints.

score_transfer_function: Callable[[float], float] | None = None

See nuad.search.SearchParameters.score_transfer_function.

If specified, this will override the one specified in nuad.search.SearchParameters.score_transfer_function.

abstract static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)

Return type:

None

class constraints.Result(excess, summary=None, value=None, unit=None)[source]

A Result is returned from the function SingularConstraint.evaluate, and a list of Result’s is returned from the function BulkConstraint.evaluate_bulk, describing the result of evaluating the constraint on the design “part”.

A Result must have an “excess” and “summary” specified.

Optionally one may also specify a “value”, which helps in graphically displaying the results of evaluating constraints using the function display_report().

For example, if the constraint checks that the NUPACK complex free energy of a strand is at least -2.5 kcal/mol, and a strand has energy -3.4 kcal/mol, then the following are sensible values for these fields:

value = -3.4
unit = "kcal/mol"
excess = -0.9
summary = "-3.4 kcal/mol"

Parameters:

excess (float)
summary (str | None)
value (float | None)
unit (str | None)

__init__(excess, summary=None, value=None, unit=None)[source]

Parameters:

excess (float)
summary (str | None)
value (float | None)
unit (str | None)

Return type:

None

excess: float

The excess is a nonnegative value that is turned into a score, and the search minimizes the total score of all constraint evaluations. Setting this to 0 (or a negative value) means the constraint is satisfied, and setting it to a positive value means the constraint is violated. The interpretation is that the larger excess is, the more the constraint is violated.

For example, a common value for excess is the amount by which the NUPACK complex free energy exceeds a threshold.

value: float | None = None

If this is a “numeric” constraint, i.e., checking some number such as the complex free energy of a strand and comparing it to a threshold, this is the “raw” value. It is optional, but if specified, then the raw values can be plotted in a Jupyter notebook by the function display_report().

Optional units (e.g., ‘kcal/mol’) can be specified in the field Result.units.

unit: str | None = None

Optional units for Result.value, e.g., 'kcal/mol'.

If specified, then the units are used in text reports and to label the y-axis in plots created by search.display_report().

score: float = 0.0: Set by the search algorithm based on Result.excess as well as other data such as the constraint’s weight and the SearchParameters.score_transfer_function.

part: DesignPart: Set by the search algorithm based on the part that was evaluated.

property summary: str

This string is displayed in the text report on constraints, after the name of the “part” (e.g., strand, pair of domains, pair of strands).

It can be set explicitly, or calculated from Result.value if not set explicitly.

class constraints.SingularConstraint(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, evaluate: 'Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]]' = <function SingularConstraint.<lambda> at 0x7dc03455fb80>, parallel: 'bool' = False)[source]

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)

evaluate()

Essentially a wrapper for a function that evaluates the Constraint. It takes as input a tuple of DNA sequences (Python strings) and an optional Part, where Part is one of Domain, Strand, DomainPair, StrandPair, or Complex (the latter being an alias for arbitrary-length tuple of Strand’s).

The second argument will be None if SingularConstraint.parallel is True (since it’s more expensive to serialize the Domain and Strand objects than strings for passing data to processes executing in parallel).

Thus, if the Constraint needs to use more data about the Part than just its DNA sequence, by accessing the second argument, Constraint.parallel should be set to False.

It should return a Result object.

parallel: bool = False: Whether or not to use parallelization across multiple processes to take advantage of multiple processors/cores, by calling SingularConstraint.evaluate on different DesignParts in separate processes.

call_evaluate(seqs, part, score_transfer_function)[source]

Evaluates this Constraint using function SingularConstraint.evaluate supplied in constructor.

Parameters:

seqs (tuple[str, ...]) – sequence(s) of relevant Part, e.g., if part is a pair of Strand’s, then seqs is a pair of strings
part (DesignPart | None) – the Part to be evaluated. Might be None if parallelization is being used, since it is cheaper to serialize only the sequence(s) than the entire Part for passing to other processes to evaluate in parallel.
score_transfer_function (Callable[[float], float]) – function to apply to the excess value of the Result returned by the evaluate function, to compute the score.

Returns:

a Result object

Return type:

Result[DesignPart]

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)

Return type:

None

class constraints.BulkConstraint(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, evaluate_bulk: 'Callable[[Sequence[DesignPart]], list[Result]]' = <function BulkConstraint.<lambda> at 0x7dc03455fee0>)[source]

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])

Return type:

None

class constraints.ConstraintWithDomains(domains: 'tuple[Domain, ...] | None' = None)[source]

Parameters:: domains (tuple[Domain, ...] | None)

domains: tuple[Domain, ...] | None = None: tuple of Domain’s to check; if not specified, all Domain’s in Design are checked.

__init__(domains=None)

Parameters:: domains (tuple[Domain, ...] | None)
Return type:: None

class constraints.ConstraintWithStrands(strands: 'tuple[Strand, ...] | None' = None)[source]

Parameters:: strands (tuple[Strand, ...] | None)

strands: tuple[Strand, ...] | None = None: tuple of Strand’s to check; if not specified, all Strand’s in Design are checked.

__init__(strands=None)

Parameters:: strands (tuple[Strand, ...] | None)
Return type:: None

class constraints.DomainConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domains=None)[source]

Constraint that applies to a single Domain.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
domains (tuple[Domain, ...] | None)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domains=None)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
domains (tuple[Domain, ...] | None)

Return type:

None

class constraints.StrandConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strands=None)[source]

Constraint that applies to a single Strand.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
strands (tuple[Strand, ...] | None)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strands=None)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
strands (tuple[Strand, ...] | None)

Return type:

None

class constraints.ConstraintWithDomainPairs(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, domain_pairs: 'tuple[DomainPair, ...] | None' = None, pairs: 'InitVar[Iterable[tuple[Domain, Domain], ...] | None]' = None, check_domain_against_itself: 'bool' = True)[source]

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)

domain_pairs: tuple[DomainPair, ...] | None = None

List of DomainPair’s to check; if not specified, all pairs in Design are checked.

This can be specified manmually, or alternately is set internally in the constructor based on the optional __init__ parameter pairs.

pairs: InitVar[Iterable[tuple[Domain, Domain], ...] | None] = None: Init-only variable (specified in constructor, but is not a field in the class) for specifying pairs of domains to check; if not specified, all pairs in Design are checked, unless ConstraintWithDomainPairs.domain_pairs is specified.

check_domain_against_itself: bool = True: Whether to check a domain against itself when checking all pairs of Domain’s in the Design. Only used if ConstraintWithDomainPairs.pairs is not specified, otherwise it is ignored.

__init__(description, short_description='', weight=1.0, score_transfer_function=None, domain_pairs=None, pairs=None, check_domain_against_itself=True)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)

Return type:

None

class constraints.ConstraintWithStrandPairs(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, strand_pairs: 'tuple[StrandPair, ...] | None' = None, pairs: 'InitVar[Iterable[tuple[Strand, Strand], ...] | None]' = None, check_strand_against_itself: 'bool' = True)[source]

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)

strand_pairs: tuple[StrandPair, ...] | None = None

List of StrandPair’s to check; if not specified, all pairs in Design are checked.

This can be specified manmually, or alternately is set internally in the constructor based on the optional __init__ parameter pairs.

pairs: InitVar[Iterable[tuple[Strand, Strand], ...] | None] = None: Init-only variable (specified in constructor, but is not a field in the class) for specifying pairs of strands; if not specified, all pairs in Design are checked, unless ConstraintWithStrandPairs.strand_pairs is specified.

check_strand_against_itself: bool = True: Whether to check a strand against itself when checking all pairs of Strand’s in the Design. Only used if ConstraintWithStrandPairs.pairs is not specified, otherwise it is ignored.

__init__(description, short_description='', weight=1.0, score_transfer_function=None, strand_pairs=None, pairs=None, check_strand_against_itself=True)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)

Return type:

None

class constraints.DomainPairConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]

Constraint that applies to a pair of Domain’s.

These should be symmetric, meaning that the constraint will give the same evaluation whether its evaluate method is given the pair (domain1, domain2), or the pair (domain2, domain1).

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domain_pairs=None, pairs=None, check_domain_against_itself=True)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)

Return type:

None

class constraints.StrandPairConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]

Constraint that applies to a pair of Strand’s.

These should be symmetric, meaning that the constraint will give the same evaluation whether its evaluate method is given the pair (strand1, strand2), or the pair (strand2, strand1).

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strand_pairs=None, pairs=None, check_strand_against_itself=True)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)

Return type:

None

class constraints.DomainsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domains=None)[source]

Constraint that applies to a several Domain’s.

The difference with DomainConstraint is that the caller may want to process all Domain’s at once, e.g., by giving many of them to a third-party program such as ViennaRNA, which may be more efficient than repeatedly calling a Python function.

It is assumed that the constraint works by checking one Domain at a time. After computing initial violations of constraints, subsequent calls to this constraint only give the domain that was mutated, not the entire of Domain’s in the whole Design.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
domains (tuple[Domain, ...] | None)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domains=None)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
domains (tuple[Domain, ...] | None)

Return type:

None

class constraints.StrandsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strands=None)[source]

Constraint that applies to a several Strand’s.

The difference with StrandConstraint is that the caller may want to process all Strand’s at once, e.g., by giving many of them to a third-party program such as ViennaRNA.

It is assumed that the constraint works by checking one Strand at a time. After computing initial violations of constraints, subsequent calls to this constraint only give strands containing the domain that was mutated, not the entire of Strand’s in the whole Design.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
strands (tuple[Strand, ...] | None)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strands=None)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
strands (tuple[Strand, ...] | None)

Return type:

None

class constraints.DomainPairsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]

Similar to DomainsConstraint but operates on a specified list of pairs of Domain’s.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domain_pairs=None, pairs=None, check_domain_against_itself=True)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)

Return type:

None

class constraints.StrandPairsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]

Similar to StrandsConstraint but operates on a specified list of pairs of Strand’s.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)

static part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strand_pairs=None, pairs=None, check_strand_against_itself=True)

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)

Return type:

None

constraints.verify_designs_match(design1, design2, check_fixed=True)[source]

Verifies that two designs match, other than their constraints. This is useful when loading a design that has been saved in the middle of searching for DNA sequences, to verify that it matches a design created before the DNA sequence search started.

Parameters:

design1 (Design) – A Design.
design2 (Design) – Another Design.
check_fixed (bool) – Whether to check for fixed sequences equal between the two (may want to not check in case these are set later).

Raises:

ValueError – If the designs do not match. Here is what is checked: - strand names and group names appear in the same order - domain names and pool names appear in the same order in strands with the same name - Domain.fixed matches between Domain’s

Return type:

None

constraints.convert_threshold(threshold, key)[source]

Parameters:

threshold (float | dict[T, float]) – either a single float, or a dictionary mapping instances of T to floats
key (T) – instance of T

Returns:

threshold for key

Return type:

float

constraints.nupack_domain_free_energy_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, parallel=False, description=None, short_description='strand_ss_nupack', domains=None)[source]

Returns constraint that checks individual Domain’s for excessive interaction using NUPACK’s pfunc.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

threshold (float) – energy threshold in kcal/mol
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
weight (float) – how much to weigh this Constraint
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.
domains (Iterable[Domain] | None) – Domain’s to check; if not specified, all domains are checked.
description (str | None) – detailed description of constraint suitable for putting in report; if not specified a reasonable default is chosen
short_description (str) – short description of constraint suitable for logging to stdout

Returns:

the constraint

Return type:

DomainConstraint

constraints.forbidden_substring_strand_constraint(forbidden_substring, weight=1.0, score_transfer_function=None, parallel=False, description=None, short_description=None, strands=None)[source]

Returns constraint that checks that a given substring does not appear in the sequence of any of the given Strand’s. For individual domains, this is better implemented using ForbiddenSubstringFilter, but that will not catch substrings such as GGGG that cross domain boundaries.

Parameters:

forbidden_substring (str | Iterable[str])
weight (float)
score_transfer_function (Callable[[float], float] | None)
parallel (bool)
description (str | None)
short_description (str | None)
strands (Iterable[Strand] | None)

Return type:

StrandConstraint

constraints.nupack_strand_free_energy_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, parallel=False, description=None, short_description='strand_ss_nupack', strands=None)[source]

Returns constraint that checks individual Strand’s for excessive interaction using NUPACK’s pfunc. This is the so-called “complex free energy”: https://docs.nupack.org/definitions/#complex-free-energy

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

threshold (float) – energy threshold in kcal/mol
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
weight (float) – how much to weigh this Constraint
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.
strands (Iterable[Strand] | None) – Strands to check; if not specified, all strands are checked.
description (str | None) – detailed description of constraint suitable for putting in report; if not specified a reasonable default is chosen
short_description (str) – short description of constraint suitable for logging to stdout

Returns:

the constraint

Return type:

StrandConstraint

constraints.nupack_domain_pair_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, parallel=False, weight=1.0, score_transfer_function=None, description=None, short_description='dom_pair_nupack', pairs=None)[source]

Returns constraint that checks given pairs of Domain’s for excessive interaction using NUPACK’s pfunc executable. Each of the four combinations of seq1, seq2 and their Watson-Crick complements are compared.

Parameters:

threshold (float) – Energy threshold in kcal/mol.
temperature (float) – Temperature in Celsius
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
parallel (bool) – Whether to test each pair of Domain’s in parallel (i.e., sets field Constraint.parallel)
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
description (str | None) – Detailed description of constraint suitable for summary report.
short_description (str) – See Constraint.short_description
pairs (Iterable[tuple[Domain, Domain]] | None) – Pairs of Domain’s to compare; if not specified, checks all pairs (including a Domain against itself).

Returns:

The DomainPairConstraint.

Return type:

DomainPairConstraint

constraints.nupack_strand_pair_constraints_by_number_matching_domains(thresholds, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, ignore_missing_thresholds=False)[source]

Convenience function for creating many constraints as returned by nupack_strand_pair_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.

Optional parameters descriptions and short_descriptions are also dicts keyed by the same keys.

Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.

It is also common to set different thresholds according to the lengths of the strands. This can be done by calling strand_pairs_by_lengths() to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.

Parameters:

thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
sodium (float) – concentration of Na+ in molar
magnesium (float) – concentration of Mg++ in molar
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of Strand’s in parallel.
strands (Iterable[Strand] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in pairs. Mutually exclusive with pairs.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.
ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.

Returns:

list of constraints, one per threshold in thresholds

Return type:

list[StrandPairConstraint]

constraints.rna_duplex_strand_pair_constraints_by_number_matching_domains(thresholds, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, ignore_missing_thresholds=False)[source]

Convenience function for creating many constraints as returned by rna_duplex_strand_pair_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.

Optional parameters descriptions and short_descriptions are also dicts keyed by the same keys.

Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.

It is also common to set different thresholds according to the lengths of the strands. This can be done by calling strand_pairs_by_lengths() to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.

Parameters:

thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
sodium (float) – concentration of Na+ in molar
magnesium (float) – concentration of Mg++ in molar
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of Strand’s in parallel.
strands (Iterable[Strand] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in pairs. Mutually exclusive with pairs.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.
ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.

Returns:

list of constraints, one per threshold in thresholds

Return type:

list[StrandPairConstraint]

constraints.nupack_strand_pair_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, description=None, short_description='strand_pair_nupack', parallel=False, pairs=None)[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using NUPACK’s pfunc function.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

threshold (float) – Energy threshold in kcal/mol
temperature (float) – Temperature in Celsius
sodium (float) – concentration of Na+ in molar
magnesium (float) – concentration of Mg++ in molar
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.
description (str | None) – Detailed description of constraint suitable for report.
short_description (str) – See Constraint.short_description
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs (including a Strand against itself).

Returns:

The StrandPairConstraint.

Return type:

StrandPairConstraint

constraints.rna_duplex_strand_pair_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_dup_strand_pairs', parallel=False, pairs=None, strands=None)[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using Vienna RNA’s RNAduplex executable.

Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function rna_duplex_strand_pairs_constraints_by_number_matching_domains() is useful for this purpose, returning a list of StrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.

Parameters:

threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that
temperature (float) – Temperature in Celsius.
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
description (str | None) – See Constraint.description
short_description (str) – See Constraint.short_description
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in design.
max_energy – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.
gu_wobble – Whether to allow GU wobble pairs; I honestly don’t know what this means for DNA, but it’s an option in RNAduplex, and it makes a difference in calculating the energy. NOTE: this is a global property of the ViennaRNA package, so this must be set the same for every constraint that uses ViennaRNA.
parallel (bool)
strands (Iterable[Strand] | None)

Returns:

The StrandPairConstraint.

Return type:

StrandPairConstraint

constraints.chunker(sequence, chunk_length=None, num_chunks=None)[source]

Collect data into fixed-length chunks or blocks, e.g., chunker(‘ABCDEFG’, 3) –> ABC DEF G

Parameters:

sequence (Sequence[T]) – Sequence (list or tuple) of items.
chunk_length (int | None) – Length of each chunk. Mutually exclusive with num_chunks.
num_chunks (int | None) – Number of chunks. Mutually exclusive with chunk_length.

Returns:

list of num_chunks lists, each list of length chunk_length (one of num_chunks or chunk_length will be calculated from the other).

Return type:

list[list[T]]

constraints.cpu_count(logical=False)[source]

Counts the number of physical CPUs (cores). For greatest accuracy, requires the 3rd party psutil package to be installed.

Parameters:: logical (bool) – Whether to count number of logical processors or physical CPU cores.
Returns:: Number of physical CPU cores if logical is False and package psutils is installed; otherwise, the number of logical processors.
Return type:: int

constraints.rna_duplex_domain_pairs_constraint(threshold, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_dup_dom_pairs', pairs=None, domains=None)[source]

Returns constraint that checks given pairs of Domain’s for excessive interaction using Vienna RNA’s duplexfold function (what RNADuplex command-line tool does).

Parameters:

threshold (float) – energy threshold
temperature (float) – temperature in Celsius
weight (float) – how much to weigh this Constraint
score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.
description (str | None) – long description of constraint suitable for printing in report file
short_description (str) – short description of constraint suitable for logging to stdout
pairs (Iterable[tuple[Domain, Domain]] | None) – pairs of Domain’s to compare; if neither this nor domains specified, checks all pairs
domains (Iterable[Domain] | None) – Domain’s to compare (all pairs with replacement); if neither this nor pairs specified, checks all pairs
parameters_filename – name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns:

constraint

Return type:

DomainPairsConstraint

constraints.rna_plex_domain_pairs_constraint(threshold, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_plex_dom_pairs', pairs=None, parameters_filename='dna_mathews1999.par')[source]

Returns constraint that checks given pairs of Domain’s for excessive interaction using Vienna RNA’s RNAplex executable.

Parameters:

threshold (float) – energy threshold
temperature (float) – temperature in Celsius
weight (float) – how much to weigh this Constraint
score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.
description (str | None) – long description of constraint suitable for printing in report file
short_description (str) – short description of constraint suitable for logging to stdout
pairs (Iterable[tuple[Domain, Domain]] | None) – pairs of Domain’s to compare; if not specified, checks all pairs
parameters_filename (str) – name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns:

constraint

Return type:

DomainPairsConstraint

constraints.nupack_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, description=None, short_description='dom_pair_nupack_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Similar to rna_plex_domain_pairs_nonorthogonal_constraint(), but uses NUPACK instead of RNAplex. Only two parameters sodium and magnesium are different; documented here.

Parameters:

sodium (float) – concentration of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – concentration of magnesium (Mg++) in moles per liter
thresholds (dict[tuple[Domain, bool, Domain, bool] | tuple[Domain, Domain], tuple[float, float]])
temperature (float)
weight (float)
score_transfer_function (Callable[[float], float] | None)
description (str | None)
short_description (str)
parameters_filename (str)
max_energy (float)

Return type:

DomainPairsConstraint

constraints.rna_plex_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_plex_dom_pairs_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Returns constraint that checks given pairs of Domain’s for interaction energy in a given interval, using ViennaRNA’s RNAplex executable.

This can be used to implement “nonorthogonal” domains as described in these papers:

The “binding affinity table” as described in the first paper is implemented as the thresholds parameter of this function.

Parameters:

thresholds (dict[tuple[Domain, bool, Domain, bool] | tuple[Domain, Domain], tuple[float, float]]) –
dict mapping pairs of Domain’s, along with Booleans to indicate whether either Domain is starred, to pairs of energy thresholds. Alternately, the key can be a pair of Domain’s (a,b) without any Booleans; in this case only the unstarred versions of the domains are checked; this is equivalent to the key (a, False, b, False).

For example, if the dict is
```
{
  (a, False, b, False): (-9.0, -8.0),
  (a, False, b, True):  (-5.0, -3.0),
  (a, True,  c, False): (-1.0,  0.0),
  (a,        d):        (-7.0, -6.0),
}
```
then the constraint ensures that RNAplex energy for
- (a,b) is between -9.0 and -8.0 kcal/mol,
- (a,b*) is between -5.0 and -3.0 kcal/mol,
- (a*,c) is between -1.0 and 0.0 kcal/mol, and
- (a,d) is between -7.0 and -6.0 kcal/mol.
For all other pairs of domains not listed as keys in the dict (such as (b,c) or (a*,b*) above), the constraint is not checked.
temperature (float) – temperature in Celsius
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float]) – See Constraint.score_transfer_function.
description (str | None) – See Constraint.description.
short_description (str) – See Constraint.short_description.
parameters_filename (str) – name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()
max_energy (float) – maximum energy to return; if the RNAplex returns a value larger than this, then this value is used instead

Returns:

constraint

Return type:

DomainPairsConstraint

constraints.rna_duplex_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_plex_dom_pairs_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Similar to rna_plex_domain_pairs_nonorthogonal_constraint(), but uses RNAduplex instead of RNAplex.

Parameters:

thresholds (dict[tuple[Domain, bool, Domain, bool] | tuple[Domain, Domain], tuple[float, float]])
temperature (float)
weight (float)
score_transfer_function (Callable[[float], float])
description (str | None)
short_description (str)
parameters_filename (str)
max_energy (float)

Return type:

DomainPairsConstraint

constraints.rna_multifold_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_multifold_dom_pairs_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Similar to rna_plex_domain_pairs_nonorthogonal_constraint(), but uses ViennaRNA’s multi-strand partition function (RNA.fold_compound(...).pf(), the Python analog of NUPACK’s pfunc()) instead of RNAplex.

Parameters:

thresholds (dict[tuple[Domain, bool, Domain, bool] | tuple[Domain, Domain], tuple[float, float]])
temperature (float)
weight (float)
score_transfer_function (Callable[[float], float])
description (str | None)
short_description (str)
parameters_filename (str)
max_energy (float)

Return type:

DomainPairsConstraint

constraints.strand_pairs_by_lengths(strands)[source]

Separates pairs of strands in strands by lengths. If there are n different strand lengths in strands, then there are ((n+1) choose 2) keys in the returned dict, one for each pair of lengths (len1, len2), including pairs where len1 == len2. This key maps to a list of all pairs of strands in strands where the first strand has length len1 and the second has length len2.

Parameters:: strands (Iterable[Strand]) – strands to check
Returns:: dict mapping pairs of lengths to pairs of strands from strands having those respective lengths
Return type:: dict[tuple[int, int], list[tuple[Strand, Strand]]]

constraints.strand_pairs_by_number_matching_domains(*, strands=None, pairs=None)[source]

Utility function for calculating number of complementary domains betweeen several pairs of strands.

Note that for the common use case that you want to create several constraints, each with their own threshold depending on the number of complementary domains, you should use functions such as rna_duplex_strand_pairs_constraints_by_number_matching_domains() or nupack_strand_pair_constraints_by_number_matching_domains(), which in turn calls this function and then creates as many constraints as their are different numbers of complementary domains.

Parameters:

strands (Iterable[Strand] | None) – list of Strand’s in which to find pairs. Mutually exclusive with pairs.
pairs (Iterable[tuple[Strand, Strand]] | None) – list of pairs of strands. Mutually exclusive with strands.

Returns:

dict mapping integer (number of complementary Domain’s) to the list of pairs of strands in strands with that number of complementary domains

Return type:

dict[int, list[tuple[Strand, Strand]]]

constraints.rna_multifold_strand_pair_constraints_by_number_matching_domains(*, thresholds, temperature=37, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, ignore_missing_thresholds=False)[source]

Similar to rna_duplex_strand_pair_constraints_by_number_matching_domains() but creates constraints as returned by rna_multifold_strand_pair_constraint().

Parameters:

thresholds (dict[int, float])
temperature (float)
weight (float)
score_transfer_function (Callable[[float], float] | None)
descriptions (dict[int, str] | None)
short_descriptions (dict[int, str] | None)
parallel (bool)
strands (Iterable[Strand] | None)
pairs (Iterable[tuple[Strand, Strand]] | None)
ignore_missing_thresholds (bool)

Return type:

list[StrandPairConstraint]

constraints.rna_duplex_strand_pairs_constraints_by_number_matching_domains(*, thresholds, temperature=37, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, parameters_filename='dna_mathews1999.par', ignore_missing_thresholds=False)[source]

Convenience function for creating many constraints as returned by rna_duplex_strand_pairs_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.

Optional parameters description and short_description are also dicts keyed by the same keys.

Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.

It is also common to set different thresholds according to the lengths of the strands. This can be done by calling strand_pairs_by_lengths() to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.

Parameters:

thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of Strand’s in parallel.
strands (Iterable[Strand] | None) – Strand’s to compare; if not specified, checks all in design. Mutually exclusive with pairs.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.
parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()
ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.

Returns:

list of constraints, one per threshold in thresholds

Return type:

list[StrandPairsConstraint]

constraints.rna_plex_strand_pairs_constraints_by_number_matching_domains(*, thresholds, temperature=37, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, parameters_filename='dna_mathews1999.par', ignore_missing_thresholds=False)[source]

Convenience function for creating many constraints as returned by rna_plex_strand_pairs_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.

Optional parameters description and short_description are also dicts keyed by the same keys.

Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.

It is also common to set different thresholds according to the lengths of the strands. This can be done by calling strand_pairs_by_lengths() to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.

Parameters:

thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of Strand’s in parallel.
strands (Iterable[Strand] | None) – Strand’s to compare; if not specified, checks all in design. Mutually exclusive with pairs.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.
parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_plex_multiple()
ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.

Returns:

list of constraints, one per threshold in thresholds

Return type:

list[StrandPairsConstraint]

constraints.longest_complementary_subsequences_python_loop(arr1, arr2, gc_double)[source]

Like longest_complementary_subsequences(), but uses a Python loop instead of numpy operations. This is slower, but is easier to understand and useful for testing.

Parameters:

arr1 (ndarray)
arr2 (ndarray)
gc_double (bool)

Return type:

list[int]

constraints.longest_complementary_subsequences_two_loops(arr1, arr2, gc_double)[source]

Calculate length of longest common subsequences between arr1[i] and arr2[i] for each i, storing in returned list result[i].

This uses two nested Python loops to calculate the whole dynamic programming table. longest_complementary_subsequences() is slightly faster because it maintains only the diagonal of the DP table, and uses numpy vectorized operations to calculate the next diagonal of the table.

When used for DNA sequences, this assumes arr2 has been reversed along axis 1, i.e., the sequences in arr1 are assumed to be oriented 5’ –> 3’, and the sequences in arr2 are assumed to be oriented 3’ –> 5’.

Parameters:

arr1 (ndarray) – 2D array of DNA sequences, with each sequence represented as a 1D array of 0, 1, 2, 3 corresponding to A, C, G, T, respectively, with each row being a single DNA sequence oriented 5’ –> 3’.
arr2 (ndarray) – 2D array of DNA sequences, with each row being a single DNA sequence oriented 3’ –> 5’.
gc_double (bool) – Whether to double the score for G-C base pairs.

Returns:

list ret of ints, where ret[i] is the length of the longest complementary subsequence between arr1[i] and arr2[i].

Return type:

list[int]

constraints.longest_complementary_subsequences(arr1, arr2, gc_double)[source]

Calculate length of longest common subsequences between arr1[i] and arr2[i] for each i, storing in returned list result[i].

Unlike longest_complementary_subsequences_two_loops(), this uses only one Python loop, by using the “anti-diagonal” method for evaluating the dynamic programming table, calculating a whole anti-diagonal from the previous two in O(1) numpy commands.

When used for DNA sequences, this assumes arr2 has been reversed along axis 1, i.e., the sequences in arr1 are assumed to be oriented 5’ –> 3’, and the sequences in arr2 are assumed to be oriented 3’ –> 5’.

Parameters:

arr1 (ndarray) – 2D array of DNA sequences, with each sequence represented as a 1D array of 0, 1, 2, 3 corresponding to A, C, G, T, respectively, with each row being a single DNA sequence oriented 5’ –> 3’.
arr2 (ndarray) – 2D array of DNA sequences, with each row being a single DNA sequence oriented 3’ –> 5’.
gc_double (bool) – Whether to double the score for G-C base pairs. (assumes that integers 1,2 represent C,G respectively)

Returns:

list ret of ints, where ret[i] is the length of the longest complementary subsequence between arr1[i] and arr2[i].

Return type:

list[int]

constraints.lcs_domain_pairs_constraint(*, threshold, weight=1.0, score_transfer_function=None, description=None, short_description='lcs domain pairs', domains=None, pairs=None, check_domain_against_itself=True, gc_double=True)[source]

Checks pairs of domain sequences for longest complementary subsequences. This can be thought of as a very rough heuristic for “binding energy” that is much less accurate than NUPACK or ViennaRNA, but much faster to evaluate.

Args

threshold: Max length of complementary subsequence allowed.

weight: See Constraint.weight.

score_transfer_function: See Constraint.score_transfer_function.

description: See Constraint.description

short_description: See Constraint.short_description

pairs: Pairs of Strand’s to compare; if not specified, checks all pairs in design.

check_domain_against_itself: Whether to check domain x against x*. (Obviously x has a maximal: common subsequence with x, so we don’t check that.)

gc_double: Whether to weigh G-C base pairs as double (i.e., they count for 2 instead of 1).

Returns

A DomainPairsConstraint that checks given pairs of Domain’s for excessive interaction due to having long complementary subsequences.

Parameters:

threshold (int)
weight (float)
score_transfer_function (Callable[[float], float] | None)
description (str | None)
short_description (str)
domains (Iterable[Domain] | None)
pairs (Iterable[tuple[Domain, Domain]] | None)
check_domain_against_itself (bool)
gc_double (bool)

Return type:

DomainPairsConstraint

constraints.lcs_strand_pairs_constraint(*, threshold, weight=1.0, score_transfer_function=None, description=None, short_description='lcs strand pairs', pairs=None, check_strand_against_itself=True, gc_double=True)[source]

Checks pairs of strand sequences for longest complementary subsequences. This can be thought of as a very rough heuristic for “binding energy” that is much less accurate than NUPACK or ViennaRNA, but much faster to evaluate.

Args

threshold: Max length of complementary subsequence allowed.

weight: See Constraint.weight.

score_transfer_function: See Constraint.score_transfer_function.

description: See Constraint.description.

short_description: See Constraint.short_description.

pairs: Pairs of Strand’s to compare; if not specified, checks all pairs in design.

check_strand_against_itself: Whether to check a strand against itself.

gc_double: Whether to weigh G-C base pairs as double (i.e., they count for 2 instead of 1).

Returns

A StrandPairsConstraint that checks given pairs of Strand’s for excessive interaction due to having long complementary subsequences.

Parameters:

threshold (int)
weight (float)
score_transfer_function (Callable[[float], float] | None)
description (str | None)
short_description (str)
pairs (Iterable[tuple[Strand, Strand]] | None)
check_strand_against_itself (bool)
gc_double (bool)

Return type:

StrandPairsConstraint

constraints.lcs_strand_pairs_constraints_by_number_matching_domains(*, thresholds, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, gc_double=True, parameters_filename='', ignore_missing_thresholds=False)[source]

TODO

Parameters:

thresholds (dict[int, int])
weight (float)
score_transfer_function (Callable[[float], float] | None)
descriptions (dict[int, str] | None)
short_descriptions (dict[int, str] | None)
parallel (bool)
strands (Iterable[Strand] | None)
pairs (Iterable[tuple[Strand, Strand]] | None)
gc_double (bool)
parameters_filename (str)
ignore_missing_thresholds (bool)

Return type:

list[StrandPairsConstraint]

constraints.rna_duplex_strand_pairs_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_dup_strand_pairs', parallel=False, pairs=None, parameters_filename='dna_mathews1999.par')[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using Vienna RNA’s RNAduplex executable.

Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function rna_duplex_strand_pairs_constraints_by_number_matching_domains() is useful for this purpose, returning a list of StrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.

TODO: explain that this should be many pairs of strands to be fast

Parameters:

threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that
temperature (float) – Temperature in Celsius.
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
description (str | None) – See Constraint.description
short_description (str) – See Constraint.short_description
parallel (bool) – Whether to test each pair of Strand’s in parallel.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in design.
parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_duplex_multiple()

Returns:

The StrandPairsConstraint.

Return type:

StrandPairsConstraint

constraints.rna_plex_strand_pairs_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_plex_strand_pairs', parallel=False, pairs=None, parameters_filename='dna_mathews1999.par')[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using Vienna RNA’s RNAplex executable.

Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function rna_plex_strand_pairs_constraints_by_number_matching_domains() is useful for this purpose, returning a list of StrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.

Parameters:

threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that
temperature (float) – Temperature in Celsius.
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
description (str | None) – See Constraint.description
short_description (str) – See Constraint.short_description
parallel (bool) – Whether to test each pair of Strand’s in parallel.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare; if not specified, checks all pairs in design.
parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as vienna_nupack.rna_plex_multiple()

Returns:

The StrandPairsConstraint.

Return type:

StrandPairsConstraint

constraints.rna_multifold_strand_pair_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_multifold_strand_pair', parallel=False, pairs=None, strands=None)[source]

Returns constraint that checks given pairs of Strand’s for excessive interaction using ViennaRNA’s multi-strand partition function (the analog of NUPACK’s pfunc()), via the Python RNA.fold_compound(...).pf() API. This is equivalent in spirit to the command-line RNAmultifold tool (the N-strand successor to RNAcofold).

Multi-strand support requires ViennaRNA >= 2.5.0.

Parameters:

threshold (float) – Energy threshold in kcal/mol.
temperature (float) – Temperature in Celsius.
weight (float) – See Constraint.weight.
score_transfer_function (Callable[[float], float] | None) – See Constraint.score_transfer_function.
description (str | None) – See Constraint.description
short_description (str) – See Constraint.short_description
parallel (bool) – Whether to test each pair of Strand’s in parallel.
pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of Strand’s to compare. Mutually exclusive with strands.
strands (Iterable[Strand] | None) – Strand’s to compare; all pairs (including each strand with itself) are checked. Mutually exclusive with pairs.

Returns:

The StrandPairConstraint.

Return type:

StrandPairConstraint

class constraints.ConstraintWithComplexes(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, complexes: 'tuple[Complex, ...]' = ())[source]

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
complexes (tuple[Complex, ...])

__init__(description, short_description='', weight=1.0, score_transfer_function=None, complexes=())

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
complexes (tuple[Complex, ...])

Return type:

None

complexes: tuple[Complex, ...] = (): List of complexes (tuples of Strand’s) to check.

class constraints.ComplexConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, complexes=())[source]

Constraint that applies to a complex (tuple of Strand’s).

Specify Constraint._evaluate in the constructor.

Unlike other types of Constraint’s such as StrandConstraint or StrandPairConstraint, there is no default list of Complex’s that a ComplexConstraint is applied to. The list of Complex’s must be specified manually in the constructor.

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
complexes (tuple[Complex, ...])

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, complexes=())

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
complexes (tuple[Complex, ...])

Return type:

None

part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

class constraints.ComplexesConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, complexes=())[source]

Similar to ComplexConstraint but operates on a specified list of complexes (tuples of Strand’s).

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
complexes (tuple[Complex, ...])

__init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, complexes=())

Parameters:

description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
complexes (tuple[Complex, ...])

Return type:

None

part_name()[source]

Returns name of the Part that this Constraint tests.

Returns:: name of the Part that this Constraint tests (e.g., “domain”, “strand pair”)
Return type:: str

constraints.default_interior_to_strand_probability = 0.98: Default probability threshold for BasePairType.INTERIOR_TO_STRAND

constraints.default_adjacent_to_exterior_base_pair = 0.95: Default probability threshold for BasePairType.ADJACENT_TO_EXTERIOR_BASE_PAIR

constraints.default_blunt_end_probability = 0.33: Default probability threshold for BasePairType.BLUNT_END

constraints.default_nick_3p_probability = 0.79: Default probability threshold for BasePairType.NICK_3P

constraints.default_nick_5p_probability = 0.73: Default probability threshold for BasePairType.NICK_5P

constraints.default_dangle_3p_probability = 0.51: Default probability threshold for BasePairType.DANGLE_3P

constraints.default_dangle_5p_probability = 0.57: Default probability threshold for BasePairType.DANGLE_5P

constraints.default_dangle_5p_3p_probability = 0.73: Default probability threshold for BasePairType.DANGLE_5P_3P

constraints.default_overhang_on_this_strand_3p_probability = 0.82: Default probability threshold for BasePairType.OVERHANG_ON_THIS_STRAND_3P

constraints.default_overhang_on_this_strand_5p_probability = 0.79: Default probability threshold for BasePairType.OVERHANG_ON_THIS_STRAND_5P

constraints.default_overhang_on_adjacent_strand_3p_probability = 0.55: Default probability threshold for BasePairType.OVERHANG_ON_ADJACENT_STRAND_3P

constraints.default_overhang_on_adjacent_strand_5p_probability = 0.49: Default probability threshold for BasePairType.OVERHANG_ON_ADJACENT_STRAND_5P

constraints.default_overhang_on_both_strand_3p_probability = 0.61: Default probability threshold for BasePairType.OVERHANG_ON_BOTH_STRANDS_3P

constraints.default_overhang_on_both_strand_5p_probability = 0.55: Default probability threshold for BasePairType.OVERHANG_ON_BOTH_STRANDS_5P

constraints.default_three_arm_junction_probability = 0.69: Default probability threshold for BasePairType.THREE_ARM_JUNCTION

constraints.default_four_arm_junction_probability = 0.84: Default probability threshold for BasePairType.FOUR_ARM_JUNCTION

constraints.default_five_arm_junction_probability = 0.77: Default probability threshold for BasePairType.FIVE_ARM_JUNCTION

constraints.default_mismatch_probability = 0.76: Default probability threshold for BasePairType.MISMATCH

constraints.default_bulge_loop_3p_probability = 0.69: Default probability threshold for BasePairType.BULGE_LOOP_3P

constraints.default_bulge_loop_5p_probability = 0.65: Default probability threshold for BasePairType.BULGE_LOOP_5P

constraints.default_unpaired_probability = 0.95: Default probability threshold for BasePairType.UNPAIRED

constraints.default_other_probability = 0.7: Default probability threshold for BasePairType.OTHER

class constraints.BasePairType(value)[source]

Represents different configurations for a base pair, and its immediate neighboring base pairs (or lack thereof).

Notation:

“#” indicates denotes the ends of a domain. They can either be the end of a strand or they could be connected to another domain.
“]” and “[” indicates 5’ ends of strand
“>” and “<” indicates 3’ ends of a strand
“-” indicates a base (number of these are not important).
“|” indicates a bases are bound (forming a base pair). Any “-” not connected by “|” is unbound

Domain Example:

The following represents an unbound domain of length 5

#-----#

The following represents bound domains of length 5

#-----#
 |||||
#-----#

Ocassionally, domains will be vertical in the case of overhangs. In this case, “-” and “|” have opposite meanings

Vertical Domain Example:

# #
|-|
|-|
|-|
|-|
|-|
# #

Formatting:

Top strands have 5’ end on left side and 3’ end on right side
Bottom strand have 3’ end on left side and 5’ end on right side

Strand Example:

strand0: a-b-c-d
strand1: d*-b*-c*-a*

            a      b      c      d
strand0  [-----##-----##-----##----->
          |||||  |||||  |||||  |||||
strand1  <-----##-----##-----##-----]
            a*     b*     c*     d*

Consecutive “#”:

In some cases, extra “#” are needed to make space for ascii art. We consider any consecutive “#”s to be equivalent “##”. The following is considered equivalent to the example above

            a       b        c      d
strand0  [-----###-----####-----##----->
          |||||   |||||    |||||  |||||
strand1  <-----###-----####-----##-----]
            a*      b*       c*     d*

Note that only consecutive “#”s is considered equivalent to “##”. The following example is not equivalent to the strands above because the “# #” between b and c are seperated by spaces, so they are not equivalent to “##”, meaning that b and c neednot be adjacent. Note that while b and c need not be adjacent, b* and c* are still adjacent because they are seperated by consecutive “#”s with no spaces in between.

            a       b        c      d
strand0  [-----###-----#  #-----##----->
          |||||   |||||    |||||  |||||
strand1  <-----###-----####-----##-----]
            a*      b*       c*     d*

INTERIOR_TO_STRAND = 1

Base pair is located inside of a strand but not next to a base pair that resides on the end of a strand.

Similar base-pairing probability compared to ADJACENT_TO_EXTERIOR_BASE_PAIR but usually breathes less.

#-----##-----#
 |||||  |||||
#-----##-----#
     ^
     |
 base pair

ADJACENT_TO_EXTERIOR_BASE_PAIR = 2

Base pair is located inside of a strand and next to a base pair that resides on the end of a strand.

Similar base-pairing probability compared to INTERIOR_TO_STRAND but usually breathes more.

#-----#
 |||||
#-----]
    ^
    |
base pair

or

#----->
 |||||
#-----#
    ^
    |
base pair

BLUNT_END = 3

Base pair is located at the end of both strands.

#----->
 |||||
#-----]
     ^
     |
 base pair

NICK_3P = 4

Base pair is located at a nick involving the 3’ end of the strand.

#----->[-----#
 |||||  |||||
#-----##-----#
     ^
     |
 base pair

NICK_5P = 5

Base pair is located at a nick involving the 3’ end of the strand.

#-----##-----#
 |||||  |||||
#-----]<-----#
     ^
     |
 base pair

DANGLE_3P = 6

Base pair is located at the end of a strand with a dangle on the 3’ end.

#-----##----#
 |||||
#-----]
     ^
     |
 base pair

DANGLE_5P = 7

Base pair is located at the end of a strand with a dangle on the 5’ end.

#----->
 |||||
#-----##----#
     ^
     |
 base pair

DANGLE_5P_3P = 8

Base pair is located with dangle at both the 3’ and 5’ end.

#-----##----#
 |||||
#-----##----#
     ^
     |
 base pair

OVERHANG_ON_THIS_STRAND_3P = 9

Base pair is next to a overhang on the 3’ end.

      #
      |
      |
      |
      #
#-----# #-----#
 |||||   |||||
#-----###-----#
     ^
     |
 base pair

OVERHANG_ON_THIS_STRAND_5P = 10

Base pair is next to a overhang on the 5’ end.

 base pair
     |
     v
#-----###-----#
 |||||   |||||
#-----# #-----#
      #
      |
      |
      |
      #

OVERHANG_ON_ADJACENT_STRAND_3P = 11

Base pair 3’ end interfaces with an overhang.

The adjacent base pair type is OVERHANG_ON_THIS_STRAND_5P

        #
        |
        |
        |
        #
#-----# #---#
 |||||   |||
#-----###---#
     ^
     |
 base pair

OVERHANG_ON_ADJACENT_STRAND_5P = 12

Base pair 5’ end interfaces with an overhang.

The adjacent base pair type is OVERHANG_ON_THIS_STRAND_3P

 base pair
     |
     v
#-----###-----#
 |||||   |||||
#-----# #-----#
        #
        |
        |
        |
        #

OVERHANG_ON_BOTH_STRANDS_3P = 13

Base pair’s 3’ end is an overhang and adjacent strand also has an overhang.

      # #
      | |
      | |
      | |
      # #
#-----# #---#
 |||||  ||||
#-----###---#
     ^
     |
 base pair

OVERHANG_ON_BOTH_STRANDS_5P = 14

Base pair’s 5’ end is an overhang and adjacent strand also has an overhang.

 base pair
     |
     v
#-----###-----#
 |||||   |||||
#-----# #-----#
      # #
      | |
      | |
      | |
      # #

THREE_ARM_JUNCTION = 15

Base pair is located next to a three-arm-junction.

      # #
      |-|
      |-|
      |-|
      # #
#-----# #---#
 |||||  ||||
#-----###---#
     ^
     |
 base pair

FOUR_ARM_JUNCTION = 16

Currently, this case isn’t actually detected (considered as OTHER).

Base pair is located next to a four-arm-junction (e.g. Holliday junction).

      # #
      |-|
      |-|
      |-|
      # #
#-----# #-----#
 |||||   |||||
#-----# #-----#
      # #
      |-|
      |-|
      |-|
      # #

Type:: TODO

FIVE_ARM_JUNCTION = 17

Currently, this case isn’t actually detected (considered as OTHER).

Base pair is located next to a five-arm-junction.

Type:: TODO

MISMATCH = 18

Currently, this case isn’t actually detected (considered as DANGLE_5P_3P).

Base pair is located next to a mismatch.

#-----##-##-----#
 |||||     |||||
#-----##-##-----#
     ^
     |
 base pair

Type:: TODO

BULGE_LOOP_3P = 19

Currently, this case isn’t actually detected (considered as OVERHANG_ON_BOTH_STRANDS_3P).

Base pair is located next to a mismatch.

#-----##-##-----#
 |||||     |||||
#-----#####-----#
     ^
     |
 base pair

Type:: TODO

BULGE_LOOP_5P = 20

Currently, this case isn’t actually detected (considered as OVERHANG_ON_BOTH_STRANDS_5P).

Base pair is located next to a mismatch.

#-----#####-----#
 |||||     |||||
#-----##-##-----#
     ^
     |
 base pair

Type:: TODO

UNPAIRED = 21

Base is unpaired.

Probabilities specify how unlikely a base is to be paired with another base.

OTHER = 22: Other base pair types.

class constraints.StrandDomainAddress(strand, domain_idx)[source]

An addressing scheme for specifying a domain on a strand.

Parameters:

strand (Strand)
domain_idx (int)

__init__(strand, domain_idx)

Parameters:

strand (Strand)
domain_idx (int)

Return type:

None

strand: Strand: strand to index

domain_idx: int: order in which domain appears in StrandDomainAddress.strand

neighbor_5p()[source]

Returns 5’ domain neighbor. If domain is 5’ end of strand, returns None

Returns:: StrandDomainAddress of 5’ neighbor or None if no 5’ neighbor
Return type:: StrandDomainAddress | None

neighbor_3p()[source]

Returns 3’ domain neighbor. If domain is 3’ end of strand, returns None

Returns:: StrandDomainAddress of 3’ neighbor or None if no 3’ neighbor
Return type:: StrandDomainAddress | None

domain()[source]

Returns domain referenced by this address.

Returns:: domain
Return type:: Domain

constraints.BaseAddress = typing.Union[int, tuple[constraints.StrandDomainAd...

Represents a reference to a base. Can be either specified as a NUPACK base index or an index of a nuad StrandDomainAddress:

alias of int | tuple[StrandDomainAddress, int]

constraints.BasePairAddress: Represents a reference to a base pair

constraints.BoundDomains: Represents bound domains

constraints.nupack_complex_base_pair_probability_constraint(strand_complexes, nonimplicit_base_pairs=None, all_base_pairs=None, base_pair_prob_by_type=None, base_pair_prob_by_type_upper_bound=None, base_pair_prob=None, base_unpaired_prob=None, base_pair_prob_upper_bound=None, base_unpaired_prob_upper_bound=None, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, description=None, short_description='ComplexBPProbs', parallel=False)[source]

Returns constraint that checks given base pair probabilities in tuples of Strand’s

Parameters:

strand_complexes (list[Complex]) – Iterable of Strand tuples
nonimplicit_base_pairs (Optional[Iterable[BoundDomains]]) –
list of nonimplicit base pairs that cannot be inferred because multiple instances of the same Domain exist in complex.

The StrandDomainAddress.strand field of each address should reference a strand in the first complex in strand_complexes.

For example, if one Strand has one T Domain and another strand in the complex has two T* Domain s, then the intended binding graph cannot be inferred and must be stated explicitly in this field.
all_base_pairs (Optional[Iterable[BoundDomains]]) –
list of all base pairs in complex. If not provided, then base pairs are infered based on the name of Domain s in the complex as well as base pairs specified in nonimplicit_base_pairs.

TODO: This has not been implemented yet, and the behavior is as if this parameter is always None (binding graph is always inferred).
base_pair_prob_by_type (Optional[dict[BasePairType, float]]) –
Probability lower bounds for each BasePairType. All BasePairType comes with a default such as default_interior_to_strand_probability which will be used if a lower bound is not specified for a particular type.

Note: Despite the name of this parameter, set thresholds for unpaired bases by specifying a threshold for BasePairType.UNPAIRED.
base_pair_prob_by_type_upper_bound (dict[BasePairType, float], optional) –
Probability upper bounds for each BasePairType. By default, no upper bound is set.

Note: Despite the name of this parameter, set thresholds for unpaired bases by specifying a threshold for BasePairType.UNPAIRED.

TODO: This has not been implemented yet.
base_pair_prob (Optional[dict[BasePairAddress, float]]) –
Probability lower bounds for each BasePairAddress which takes precedence over probabilities specified by base_pair_prob_by_type.

TODO: This has not been implemented yet.
base_unpaired_prob (Optional[dict[BaseAddress, float]]) – Probability lower bounds for each BaseAddress representing unpaired bases. These lower bounds take precedence over the probability specified by base_pair_prob_by_type[BasePairType.UNPAIRED].
base_pair_prob_upper_bound (Optional[dict[BasePairAddress, float]]) – Probability upper bounds for each :py:class`BasePairAddress` which takes precedence over probabilties specified by base_pair_prob_by_type_upper_bound.
base_unpaired_prob_upper_bound (Optional[dict[BaseAddress, float]]) – Probability upper bounds for each BaseAddress representing unpaired bases. These lower bounds take precedence over the probability specified by base_pair_prob_by_type_upper_bound[BasePairType.UNPAIRED].
temperature (float, optional) – Temperature specified in °C, defaults to vienna_nupack.default_temperature.
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
weight (float, optional) – See Constraint.weight, defaults to 1.0
score_transfer_function (Callable[[float], float], optional) – Score transfer function to use. By default, f(x) = x**2 is used, where x is the sum of the squared errors of each base pair that violates the threshold.
description (str | None, optional) – See Constraint.description, defaults to None
short_description (str, optional) – See Constraint.short_description defaults to ‘complex_secondary_structure_nupack’
parallel (bool, optional) – TODO: Implement this

Raises:

ImportError – If NUPACK 4 is not installed.
ValueError –
If strand_complexes is not valid. In order for strand_complexes to be valid, strand_complexes must:
- Consist of complexes (tuples of Strand objects)
- Each complex must be of the same motif
  - Same number of Strand s in each complex
  - Same number of Domain s in each Strand
  - Same number of bases in each Domain

Returns:

ComplexConstraint

Return type:

ComplexConstraint

nuad.modifications

class modifications.ModificationType(value)[source]

Type of modification (5’, 3’, or internal).

five_prime = "5'": 5’ modification type

three_prime = "5'": 3’ modification type

internal = 'internal': internal modification type

class modifications.Modification(vendor_code, id='WARNING: no id assigned to modification')[source]

Abstract case class of modifications (to DNA sequences, e.g., biotin or Cy3). Use concrete subclasses Modification3Prime, Modification5Prime, or ModificationInternal to instantiate.

If Modification.id is not specified, then Modification.vendor_code is used as the unique ID. Each Modification.id must be unique. For example if you create a 5’ “modification” to represent 6 T bases: t6_5p = Modification5Prime(display_text='6T', idt_text='TTTTTT') (this is a useful hack for putting single-stranded extensions on strands until loopouts on the end of a strand are supported; see https://github.com/UC-Davis-molecular-computing/scadnano-python-package/issues/2), then this would clash with a similar 3’ modification without specifying unique IDs for them: t6_3p = Modification3Prime(display_text='6T', vendor_code='TTTTTT') # ERROR.

In general it is recommended to create a single Modification object for each type of modification in the design. For example, if many strands have a 5’ biotin, then it is recommended to create a single Modification object and re-use it on each strand with a 5’ biotin:

biotin_5p = Modification5Prime(display_text='B', vendor_code='/5Biosg/')
design.strand(0, 0).move(8).with_modification_5p(biotin_5p)
design.strand(1, 0).move(8).with_modification_5p(biotin_5p)

Parameters:

vendor_code (str)
id (str)

vendor_code: str: Text string specifying this modification (e.g., ‘/5Biosg/’ for 5’ biotin). optional

id: str = 'WARNING: no id assigned to modification': Representation as a string; used to write in Strand json representation, while the full description of the modification is written under a global key in the Design. If not specified, but Modification.idt_text is specified, then it will be set equal to that.

__init__(vendor_code, id='WARNING: no id assigned to modification')

Parameters:

vendor_code (str)
id (str)

Return type:

None

class modifications.Modification5Prime(vendor_code, id='WARNING: no id assigned to modification')[source]

5’ modification of DNA sequence, e.g., biotin or Cy3.

Parameters:

vendor_code (str)
id (str)

__init__(vendor_code, id='WARNING: no id assigned to modification')

Parameters:

vendor_code (str)
id (str)

Return type:

None

class modifications.Modification3Prime(vendor_code, id='WARNING: no id assigned to modification')[source]

3’ modification of DNA sequence, e.g., biotin or Cy3.

Parameters:

vendor_code (str)
id (str)

__init__(vendor_code, id='WARNING: no id assigned to modification')

Parameters:

vendor_code (str)
id (str)

Return type:

None

class modifications.ModificationInternal(vendor_code, id='WARNING: no id assigned to modification', allowed_bases=None)[source]

Internal modification of DNA sequence, e.g., biotin or Cy3.

Parameters:

vendor_code (str)
id (str)
allowed_bases (AbstractSet[str] | None)

__init__(vendor_code, id='WARNING: no id assigned to modification', allowed_bases=None)

Parameters:

vendor_code (str)
id (str)
allowed_bases (AbstractSet[str] | None)

Return type:

None

allowed_bases: AbstractSet[str] | None = None: If None, then this is an internal modification that goes between bases. If instead it is a list of bases, then this is an internal modification that attaches to a base, and this lists the allowed bases for this internal modification to be placed at. For example, internal biotins for IDT must be at a T. If any base is allowed, it should be ['A','C','G','T'].

nuad.vienna_nupack

Contains utility functions for accessing NUPACK 4 and ViennaRNA energy calculation algorithms.

The main functions are pfunc() (for calculating complex free energy with NUPACK, along with its helper functions secondary_structure_single_strand() and binding()), nupack_complex_base_pair_probabilities() (for calculating base pair probabilities with NUPACK), rna_duplex_multiple() (for calculating an approximation to two-strand complex free energy that is much faster than calling pfunc() on the same pair of strands), and rna_plex_multiple() (which is even faster than rna_duplex_multiple()).

vienna_nupack.calculate_strand_association_penalty(temperature, num_seqs)[source]

Additive adjustment factor to convert NUPACK’s mole fraction units to molarity.

For details on why this is needed for multi-stranded complexes, see Section S1.1 of http://www.nupack.org/downloads/serve_public_file/fornace20_supp.pdf?type=pdf and Figure 2 of http://www.nupack.org/downloads/serve_public_file/nupack_user_guide_3.2.2.pdf?type=pdf

Parameters:

temperature (float) – temperature in Celsius
num_seqs (int) – number of sequences

Returns:

Additive adjustment factor to convert NUPACK’s mole fraction units to molar.

Return type:

float

vienna_nupack.pfunc(seqs, temperature=37, sodium=0.05, magnesium=0.0125, strand_association_penalty=True)[source]

Calls pfunc from NUPACK 4 (http://www.nupack.org/) on a complex consisting of the unique strands in seqs, returns energy (“delta G”), i.e., generally a negative number.

By default, a strand association penalty is applied that is not applied by NUPACK’s pfunc. See strand_association_penalty parameter documentation for details.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

seqs (S | Iterable[S]) – DNA sequences (tuple, or a single DNA sequence), whose order indicates a cyclic permutation of the complex. For one or two sequences, there is only one cyclic permutation, so the order doesn’t matter in such cases.
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium in moles per liter
magnesium (float) – molarity of magnesium in moles per liter
strand_association_penalty (bool) – Add strand association penalty for a complex, related to converting NUPACK’s mole fraction units to molarity. The quantity added is that returned by calculate_strand_association_penalty() with parameters temperature and len(seqs). For most constraints, which involve only one size of complex, this factor won’t matter other than to adjust the energy threshold by the same factor. The factor depends only on the number of strands in seqs, but not on their sequences. However, this factor is needed for a meaningful comparison of energies between complexes of different sizes, e.g., to calculate equilibrium concentrations of complexes of various sizes. For details on why this is needed for multi-stranded complexes, see Section S1.1 of http://www.nupack.org/downloads/serve_public_file/fornace20_supp.pdf?type=pdf and Figure 2 of http://www.nupack.org/downloads/serve_public_file/nupack_user_guide_3.2.2.pdf?type=pdf

Returns:

complex free energy (“delta G”) of ordered complex with strands in given cyclic permutation

Return type:

float

vienna_nupack.nupack_complex_base_pair_probabilities(strand_complex, temperature=37, sodium=0.05, magnesium=0.0125)[source]

Calculates base-pair probabilities according to NUPACK 4.

Parameters:

strand_complex (Complex) – Ordered tuple of strands in complex (specifying a particular circular ordering, which is imposed on all considered secondary structures)
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium in moles per liter
magnesium (float) – molarity of magnesium in moles per liter

Returns:

2D Numpy array of floats, with result[i1][i2] giving the base-pair probability of base at position i1 with base at position i2 (if i1 != i2), where i1 and i2 are the absolute positions of the bases in the entire ordered list of strands. For example, with strands AAAA and TTTTT, there are nine indices 0,1,2,3,4,5,6,7,8, with positions 0,1,2,3 on the first strand AAAA, and positions 4,5,6,7,8 on the second strand TTTTT. If i1 == i2, then result[i1][i1] is the probability that the base at position i1 is unpaired.

Return type:

ndarray

vienna_nupack.call_subprocess(command_strs, user_input)[source]

Calls system command through a subprocess. Assumes running on a POSIX operating system.

If running on Windows, automatically appends “wsl -e” to start of command to call command through Windows subsystem for Linux, so wsl must be installed for this to work: https://docs.microsoft.com/en-us/windows/wsl/install-win10

Parameters:

command_strs (List[str]) – List of command and command line arguments, i.e., to call ls -l -a, command_strs should be the list [‘ls’, ‘-l’, ‘-a’].
user_input (str) – Input to give once program is running (i.e., what would user type).

Returns:

pair of strings (output, error), giving the strings written to stdout and stderr, respectively.

Return type:

tuple[str, str]

vienna_nupack.rna_duplex(seq1, seq2, temperature=37, max_energy=0.0, gu_wobble=False)[source]

TODO

Parameters:

seq1 (S)
seq2 (S)
temperature (float)
max_energy (float)
gu_wobble (bool)

Return type:

float

vienna_nupack.rna_duplex_multiple(pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0, gu_wobble=False, progress_bar=False)[source]

Calls RNA.duplexfold (from ViennaRNA Python package: https://www.tbi.univie.ac.at/RNA/ViennaRNA/refman/api_python.html) on a list of pairs, specifically: [ (seq1, seq2), (seq3, seq4), (seq5, seq6), … ] where seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns a list (in the same order as seqpairs) of free energies.

Parameters:

pairs (Sequence[tuple[S, S]]) – sequence (list or tuple) of pairs of DNA sequences
temperature (float) – temperature in Celsius
max_energy (float) – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.
gu_wobble (bool) – Whether to allow GU wobble pairs; I honestly don’t know what this means for DNA, but it’s an option in RNAduplex, and it makes a difference in calculating the energy.
logger (Logger)
parameters_filename (str)
progress_bar (bool)

Returns:

list of free energies, in the same order as pairs

Return type:

tuple[float, …]

vienna_nupack.rna_duplex_multiple_deprecated(pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Calls RNAduplex (from ViennaRNA package: https://www.tbi.univie.ac.at/RNA/) on a list of pairs, specifically: [ (seq1, seq2), (seq3, seq4), (seq5, seq6), … ] where seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns a list (in the same order as seqpairs) of free energies.

Parameters:

pairs (Sequence[tuple[S, S]]) – sequence (list or tuple) of pairs of DNA sequences
logger (Logger) – logger to use for printing error messages
temperature (float) – temperature in Celsius
parameters_filename (str) – name of parameters file for ViennaRNA
max_energy (float) – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.

Returns:

list of free energies, in the same order as pairs

Return type:

tuple[float, …]

vienna_nupack.rna_duplex_multiple_parallel(thread_pool, pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Parallel version of rna_duplex_multiple(). TODO document this

Parameters:

thread_pool (ThreadPool)
pairs (Sequence[tuple[S, S]])
logger (Logger)
temperature (float)
parameters_filename (str)
max_energy (float)

Return type:

tuple[float, …]

vienna_nupack.rna_plex_multiple(pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Calls RNAplex (from ViennaRNA package: https://www.tbi.univie.ac.at/RNA/) on a list of pairs, specifically: [ (seq1, seq2), (seq2, seq3), (seq4, seq5), … ] where seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns a list (in the same order as seqpairs) of free energies.

RNAplex is supposedly faster than RNAduplex, but less accurate since, compared to RNAduplex, “ loop energy is an affine function of the loop size instead of a logarithmic function” (https://doi.org/10.1093/bioinformatics/btn193). However, in testing with random sequences, RNAplex can actually be SLOWER, for instance with 1000 pairs of DNA sequences each of length 64, RNAduplex takes 1.93 s on average, compared to 2.28 s for RNAplex. (tested using %timeit in Jupyter lab) This is with default parameters; using “-f 1” speeds up the computing time by about factor 5; in this case RNAplex takes only 0.41 s on average instead of 2.28 s.

Parameters:

pairs (Sequence[tuple[S, S]]) – sequence (list or tuple) of pairs of DNA sequences
logger (Logger) – logger to use for printing error messages
temperature (float) – temperature in Celsius
parameters_filename (str) – name of parameters file for NUPACK
max_energy (float) – This is the maximum energy possible to assign. If RNAplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.

Returns:

list of free energies, in the same order as pairs

Return type:

tuple[float, …]

vienna_nupack.nupack_multiple_with_sodium_magnesium(sodium=0.05, magnesium=0.0125)[source]

Used when we want a BulkConstraint using NUPACK (even though most of them are SingularConstraint’s).

Calls NUPACK (specifically, the function binding()) on a list of pairs: [ (seq1, seq2), (seq2, seq3), (seq4, seq5), … ] where seqi is a string over {A,C,T,G}.

Parameters:

sodium (float) – molarity of sodium in moles per liter
magnesium (float) – molarity of magnesium in moles per liter

Returns:

list of free energies, in the same order as pairs

Return type:

Callable[[Sequence[tuple[S, S]], Logger, float, str, float], tuple[float, …]]

vienna_nupack.rna_plex_multiple_parallel(thread_pool, pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]

Parallel version of rna_plex_multiple(). TODO document this

Parameters:

thread_pool (ThreadPool)
pairs (Sequence[tuple[S, S]])
logger (Logger)
temperature (float)
parameters_filename (str)
max_energy (float)

Return type:

tuple[float, …]

vienna_nupack.rna_multifold(seqs, temperature=37, max_energy=0.0, gu_wobble=False)[source]

Computes the complex free energy of an arbitrary tuple of DNA sequences using ViennaRNA’s multi-strand partition function (the analog of NUPACK’s pfunc()). Internally calls RNA.fold_compound("&".join(seqs)).pf().

Multi-strand (>2) support requires ViennaRNA >= 2.5.0. For two strands this is equivalent to RNAcofold’s partition-function output; for one strand it is equivalent to RNAfold -p.

Parameters:

seqs (Sequence[S]) – tuple/list of DNA sequences forming a complex (any length >= 1)
temperature (float) – temperature in Celsius
max_energy (float) – Maximum energy to return. If ViennaRNA reports an energy larger than this (e.g., 100000 when no base pairs are possible between strands like CCCC and TTTT), it is clamped to max_energy.
gu_wobble (bool) – Whether to allow GU wobble pairs.

Returns:

complex free energy in kcal/mol

Return type:

float

vienna_nupack.rna_multifold_multiple(seq_tuples, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0, gu_wobble=False)[source]

Bulk wrapper around rna_multifold(): computes the complex free energy of each tuple of sequences in seq_tuples. Each element of seq_tuples may have any length >= 1.

Provided so callers expecting the same signature as rna_duplex_multiple() / rna_plex_multiple() can use multi-strand free energies. The parameters_filename argument is accepted for signature parity but ignored; the Mathews 2004 DNA parameters are loaded once via load_params_viennarna().

Parameters:

seq_tuples (Sequence[Sequence[S]])
logger (Logger)
temperature (float)
parameters_filename (str)
max_energy (float)
gu_wobble (bool)

Return type:

tuple[float, …]

vienna_nupack.wc(seq)[source]

Return reverse complement of seq. Alias for rc().

Deprecated: use rc() instead.

Parameters:: seq (str)
Return type:: str

vienna_nupack.rc(seq)[source]

Return reverse complement of seq.

Parameters:: seq (str)
Return type:: str

vienna_nupack.free_energy_single_strand(seq, temperature=37, sodium=0.05, magnesium=0.0125)[source]

Computes the “complex free energy” (https://docs.nupack.org/definitions/#complex-free-energy) of a single strand according to NUPACK.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

seq (str)
temperature (float)
sodium (float)
magnesium (float)

Return type:

float

vienna_nupack.binding_complement(seq, temperature=37, sodium=0.05, magnesium=0.0125, subtract_indv=True)[source]

Computes the complex free energy of a strand with its perfect Watson-Crick complement.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

seq (str)
temperature (float)
sodium (float)
magnesium (float)
subtract_indv (bool)

Return type:

float

vienna_nupack.binding(seq1, seq2, *, temperature=37, sodium=0.05, magnesium=0.0125)[source]

Computes the complex free energy of association between two strands.

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

seq1 (str)
seq2 (str)
temperature (float)
sodium (float)
magnesium (float)

Return type:

float

vienna_nupack.random_dna_seq(length, bases='ACTG')[source]

Chooses a random DNA sequence.

Parameters:

length (int)
bases (Sequence)

Return type:

str

vienna_nupack.domain_orthogonal(seq, seqs, temperature, sodium, magnesium, orthogonality, orthogonality_ave=-1, parallel=False)[source]

test orthogonality of domain with all others and their wc complements

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

seq (str)
seqs (Sequence[str])
temperature (float)
sodium (float)
magnesium (float)
orthogonality (float)
orthogonality_ave (float)
parallel (bool)

Return type:

bool

vienna_nupack.domain_pairwise_concatenated_no_sec_struct(seq, seqs, temperature, sodium, magnesium, concat, concat_ave=-1, parallel=False)[source]

test lack of secondary structure in concatenated domains

NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.

Parameters:

seq (str)
seqs (Sequence[str])
temperature (float)
sodium (float)
magnesium (float)
concat (float)
concat_ave (float)
parallel (bool)

Return type:

bool

vienna_nupack.domain_concatenated_no_4gc(seq, seqs)[source]

prevent {G,C}^4 under concatenation

Parameters:

seq (str)
seqs (Sequence[str])

Return type:

bool

vienna_nupack.domain_no_4gc(seq)[source]

prevent {G,C}^4

Parameters:: seq (str)
Return type:: bool

vienna_nupack.domain_concatenated_no_4g_or_4c(seq, seqs)[source]

prevent G^4 and C^4 under concatenation

Parameters:

seq (str)
seqs (Sequence[str])

Return type:

bool

nuad.np

Library for doing sequence design that can be expressed as linear algebra operations for rapid processing by numpy (e.g., generating all DNA sequences of a certain length and calculating all their full duplex binding energies in the nearest neighbor model and filtering those outside a given range).

Based on the DNA single-stranded tile (SST) sequence designer used in the following publication.

“Diverse and robust molecular algorithms using reprogrammable DNA self-assembly” Woods*, Doty*, Myhrvold, Hui, Zhou, Yin, Winfree. (*Joint first co-authors)

np.idx2seq(idx, length)[source]

Return the lexicographic idx’th DNA sequence of given length.

Parameters:

idx (int)
length (int)

Return type:

str

np.seq2arr(seq, base2bits_local=None)[source]

Convert seq (string with DNA alphabet) to numpy array with integers 0,1,2,3.

Parameters:

seq (str)
base2bits_local (dict[str, int] | None)

Return type:

np.ndarray

np.seqs2arr(seqs)[source]

Return numpy 2D array converting the given DNA sequences to integers.

Parameters:: seqs (Sequence[str])
Return type:: ndarray

np.arr2seqs(arr)[source]

Return list of strings converting the given numpy array of integers to DNA sequences.

Parameters:: arr (ndarray)
Return type:: list[str]

np.arr2seq(arr)[source]

Return string converting the given numpy array of integers to DNA sequence.

Parameters:: arr (ndarray)
Return type:: str

np.make_array_with_all_dna_seqs(length, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with all DNA sequences of given length in lexicographic order. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Parameters:

length (int)
bases (Collection[str])

Return type:

ndarray

np.make_array_with_random_subset_of_dna_seqs(length, num_random_seqs, rng=Generator(PCG64) at 0x7DC02407E4A0, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with random subset of size num_seqs of DNA sequences of given length. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Sequences returned will be unique (i.e., sampled without replacement) and in a random order

Parameters:

length (int) – length of each row
num_random_seqs (int) – number of rows
bases (Collection[str]) – DNA bases to use
rng (Generator) – numpy random number generator (type returned by numpy.random.default_rng())

Returns:

2D numpy array with random subset of size num_seqs of DNA sequences of given length

Return type:

ndarray

np.make_array_with_all_dna_seqs_hamming_distance(dist, seq, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with all DNA sequences of given length in lexicographic order. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Parameters:

dist (int)
seq (str)
bases (Collection[str])

Return type:

ndarray

np.make_array_with_random_subset_of_dna_seqs_hamming_distance(num_seqs, dist, seq, rng=Generator(PCG64) at 0x7DC02407E4A0, bases=('A', 'C', 'G', 'T'))[source]

Return 2D numpy array with random subset of size num_seqs of DNA sequences of given length. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.

Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.

Sampled with replacement, so the same row may appear twice in the returned array

Parameters:

num_seqs (int) – number of sequences to generate
dist (int) – Hamming distance to be from seq
seq (str) – sequence to generate other sequences close to
bases (Collection[str]) – DNA bases to use
rng (Generator) – numpy random number generator (type returned by numpy.random.default_rng())

Returns:

2D numpy array with random subset of size num_seqs of DNA sequences of given length

Return type:

ndarray

np.longest_common_substring(a1, a2, vectorized=True)[source]

Return start and end indices (a1start, a2start, length) of longest common substring (subarray) of 1D arrays a1 and a2.

Parameters:

a1 (ndarray)
a2 (ndarray)
vectorized (bool)

Return type:

tuple[int, int, int]

np.longest_common_substrings_singlea1(a1, a2s)[source]

Return start and end indices (a1starts, a2starts, lengths) of longest common substring (subarray) of 1D array a1 and rows of 2D array a2s.

If length[i]=0, then a1starts[i]=a2starts[i]=0 (not -1), so be sure to check length[i] to see if any substrings actually matched.

Parameters:

a1 (ndarray)
a2s (ndarray)

Return type:

tuple[ndarray, ndarray, ndarray]

np.longest_common_substrings_product(a1s, a2s)[source]

Return start and end indices (a1starts, a2starts, lengths) of longest common substring (subarray) of each pair in the cross product of rows of a1s and a2s.

If length[i]=0, then a1starts[i]=a2starts[i]=0 (not -1), so be sure to check length[i] to see if any substrings actually matched.

Parameters:

a1s (ndarray)
a2s (ndarray)

Return type:

tuple[ndarray, ndarray, ndarray]

np.longest_common_substrings_all_pairs_strings(seqs1, seqs2)[source]

For Python strings

Parameters:

seqs1 (Sequence[str])
seqs2 (Sequence[str])

Return type:

tuple[ndarray, ndarray, ndarray]

np.strongest_common_substrings_all_pairs_string(seqs1, seqs2, temperature)[source]

For Python strings representing DNA; checks for reverse complement matches rather than direct matches, and evaluates nearest neighbor energy, returning indices lengths, and energies of strongest complementary substrings.

Parameters:

seqs1 (Sequence[str])
seqs2 (Sequence[str])
temperature (float)

Return type:

tuple[list[float], list[float], list[float], list[float]]

class np.DNASeqList(length=None, num_random_seqs=None, shuffle=False, alphabet=('A', 'C', 'G', 'T'), seqs=None, seqarr=None, filename=None, rng=Generator(PCG64) at 0x7DC02407E4A0, hamming_distance_from_sequence=None)[source]

Represents a list of DNA sequences of identical length. The sequences are stored as a 2D numpy array of bytes DNASeqList.seqarr. Each byte represents a single DNA base (so it is not a compact representation; the most significant 6 bits of the byte will always be 0).

Parameters:

length (int | None)
num_random_seqs (int | None)
shuffle (bool)
alphabet (Collection[str])
seqs (Sequence[str] | None)
seqarr (ndarray)
filename (str | None)
rng (Generator)
hamming_distance_from_sequence (tuple[int, str] | None)

__init__(length=None, num_random_seqs=None, shuffle=False, alphabet=('A', 'C', 'G', 'T'), seqs=None, seqarr=None, filename=None, rng=Generator(PCG64) at 0x7DC02407E4A0, hamming_distance_from_sequence=None)[source]

Creates a set of DNA sequences, all of the same length.

Create either all sequences of a given length if seqs is not specified, or all sequences in seqs if seqs is specified. If neither is specified then all sequences of length 3 are created.

Exactly one of the following should be specified:

length (possibly along with alphabet and num_random_seqs)
seqs
seqarr
filename
hamming_distance_from_sequence (possibly along with alphabet and num_random_seqs)

Parameters:

length (int | None) – length of sequences; num_seqs and alphabet can also be specified along with it
hamming_distance_from_sequence (tuple[int, str] | None) – if specified and equal to (dist, seq) of type (int, str), then only sequences at Hamming distance dist from seq will be generated. Raises error if length, seqs, seqarr, or filename is specified.
num_random_seqs (int | None) – number of sequences to generate; if not specified, then all sequences of length length using bases from alphabet are generated. Sequences are sampled with replacement, so the same sequence may appear twice.
shuffle (bool) – whether to shuffle sequences
alphabet (Collection[str]) – a subset of {‘A’, ‘C’, ‘G’, ‘T’}
seqs (Sequence[str] | None) – sequence (e.g., list or tuple) of strings, all of the same length
seqarr (np.ndarray | None) – 2D NumPy array, with axis 0 moving between sequences, and axis 1 moving between consecutive DNA bases in a sequence
filename (str | None) – name of file containing a DNASeqList as written by DNASeqList.write_to_file()
rng (np.random.Generator) – numpy random number generator (type returned by numpy.random.default_rng())

rng: Generator: Random number generator to use.

seqarr: ndarray

Uses a (noncompact) internal representation using 8 bits (1 byte, dtype = np.ubyte) per base, stored in a numpy 2D array of bytes. Each row (axis 0) is a DNA sequence, and each column (axis 1) is a base in a sequence.

The code used is \(A \to 0, C \to 1, G \to 2, T \to 3\).

seqlen: int: Length of each DNA sequence (number of columns, axis 1, in DNASeqList.seqarr)

numseqs: int: Number of DNA sequences (number of rows, axis 0, in DNASeqList.seqarr)

random_choice(num, rng=Generator(PCG64) at 0x7DC02407E4A0, replace=False)[source]

Returns random choice of num DNA sequence(s) (represented as list of Python strings).

Parameters:

num (int) – number of sequences to sample
rng (Generator) – random number generator to use
replace (bool) – whether to sample with replacement

Returns:

sampled sequences

Return type:

list[str]

random_sequence(rng=Generator(PCG64) at 0x7DC02407E4A0)[source]

Returns random DNA sequence (represented as Python string).

Returns:: sampled sequence
Parameters:: rng (Generator)
Return type:: str

write_to_file(filename)[source]

Writes text file describing DNA sequence list, in format

numseqs seqlen seq1 seq2 seq3 …

where numseqs, seqlen are integers, and seq1, … are strings from {A,C,G,T}

Parameters:: filename (str)
Return type:: None

concatenate(other)[source]

Creates a new DNASeqList by concatenating this with another.

They must have the same sequence DNASeqList.numseqs.

Parameters:: other (DNASeqList)
Return type:: DNASeqList

to_list()[source]

Return list of strings representing the sequences, e.g. [‘ACG’,’TAA’]

Return type:: list[str]

get_seq_str(idx)[source]

Return idx’th DNA sequence as a string.

Parameters:: idx (int)
Return type:: str

get_seqs_str_list(slice_)[source]

Return a list of strings specified by slice.

Parameters:: slice_ (slice)
Return type:: list[str]

keep_seqs_at_indices(indices)[source]

Keeps only sequences at the given indices.

Parameters:: indices (Iterable[int])
Return type:: None

pop()[source]

Remove and return last seq, as a string.

Return type:: str

pop_array()[source]

Remove and return last seq, as a string.

Return type:: ndarray

hamming_map(sequence)[source]

Return dict mapping each length d to a DNASeqList of sequences that are Hamming distance d from seq.

Parameters:: sequence (str)
Return type:: dict[int, DNASeqList]

sublist(start, end=None)[source]

Return sublist of DNASeqList from start, inclusive, to end, exclusive.

If end is not specified, goes until the end of the list.

Parameters:

start (int)
end (int | None)

Return type:

DNASeqList

filter_energy(low, high, temperature)[source]

Return new DNASeqList with seqs whose wc complement energy is within the given range.

Parameters:

low (float)
high (float)
temperature (float)

Return type:

DNASeqList

energies(temperature)[source]

Calculates the nearest-neighbor binding energy of each sequence with its perfect complement (summing over all length-2 substrings of the domain’s sequence), using parameters from the 2004 Santa-Lucia and Hicks paper (https://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.32.110601.141800, see Table 1, and example on page 419).

This is used by NearestNeighborEnergyFilter to calculate the energy of domains when filtering.

Parameters:: temperature (float) – temperature in Celsius
Returns:: array of nearest-neighbor energies of each sequence with its perfect Watson-Crick complement
Return type:: ndarray

filter_end_gc()[source]

Remove any sequence with A or T on the end. Also remove domains that do not have an A or T either next to that base, or one away. Otherwise we could get a domain ending in {C,G}^3, which, placed next to any domain ending in C or G, will create a substring in {C,G}^4 and be rejected if we are filtering those.

Return type:: DNASeqList

filter_end_at(gc_near_end=False)[source]

Remove any sequence with C or G on the end. Also, if gc_near_end is True, remove domains that do not have an C or G either next to that base, or one away, to prevent breathing.

Parameters:: gc_near_end (bool)
Return type:: DNASeqList

filter_base_nowhere(base)[source]

Remove any sequence that has given base anywhere.

Parameters:: base (str)
Return type:: DNASeqList

filter_base_count(base, low, high)[source]

Remove any sequence not satisfying low <= #base <= high.

Parameters:

base (str)
low (int)
high (int)

Return type:

DNASeqList

filter_base_at_pos(pos, base)[source]

Remove any sequence that does not have given base at position pos.

Parameters:

pos (int)
base (str)

Return type:

DNASeqList

np.create_toeplitz(seqlen, sublen, indices=None)[source]

Creates a toeplitz matrix, useful for finding subsequences.

seqlen is length of larger sequence; sublen is length of substring we’re checking for. If indices is None, then all rows are created, otherwise only rows for checking those indices are created.

Parameters:

seqlen (int)
sublen (int)
indices (Sequence[int] | None)

Return type:

np.ndarray

np.calculate_loop_energies(temperature, negate=False)[source]

Get SantaLucia and Hicks nearest-neighbor loop energies for given temperature, 1 M Na+.

Parameters:

temperature (float)
negate (bool)

Return type:

ndarray

np.wcenergy(seq, temperature, negate=False)[source]

Return the wc energy of seq binding to its complement.

Parameters:

seq (str)
temperature (float)
negate (bool)

Return type:

float

np.calculate_wc_energies(seqarr, temperature, negate=False)[source]

Calculate and store in an array all energies of all sequences in seqarr with their Watson-Crick complements.

Parameters:

seqarr (ndarray)
temperature (float)
negate (bool)

Return type:

ndarray

np.wc_arr(seqarr)[source]

Return numpy array of reverse complements of sequences in seqarr.

Parameters:: seqarr (ndarray)
Return type:: ndarray

np.energy_hist(length, temperature=37, combine_lengths=False, num_random_sequences=100000, figsize=(15, 6), **kwargs)[source]

Make a matplotlib histogram of the nearest-neighbor energies (as defined by DNASeqList.energies()) of all DNA sequences of the given length(s), or a randomly selected subset if length(s) is too large to enumerate all DNA sequences of that length.

This is useful, for example, to choose low and high energy values to pass to NearestNeighborEnergyFilter.

Parameters:

length (int | Iterable[int]) – length of DNA sequences to consider, or an iterable (e.g., list) of lengths
temperature (float) – temperature in Celsius
combine_lengths (bool) – If True, then length should be an iterable, and the histogram will combine all calculated energies from all lengths into one histogram to plot. If False (the default), then different lengths are plotted in different colors in the histogram.
num_random_sequences (int) – If the length is too large to enumerate all DNA sequences of that length, then this many random sequences are used to estimate the histogram.
figsize (tuple[int, int]) – Size of the figure in inches.
kwargs – Any keyword arguments given are passed along as keyword arguments to matplotlib.pyplot.hist: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

Return type:

None

nuad documentation

nuad.search

nuad.constraints

nuad.modifications

nuad.vienna_nupack

nuad.np

Indices and tables