nuad documentation
nuad.search
nuad.constraints
This module defines types for helping to define DNA sequence design constraints.
The key classes are Design, Strand, Domain to define a DNA design,
and various subclasses of Constraint, such as StrandConstraint or StrandPairConstraint,
to define constraints on the sequences assigned to each Domain when calling
search.search_for_dna_sequences().
Also important are two other types of constraints
(not subclasses of Constraint), which are used prior to the search to determine if it is even
legal to use a DNA sequence: subclasses of the abstract base class NumpyFilter,
and SequenceFilter, an alias for a function taking a string as input and returning a bool.
See the README on the GitHub page for more detailed explaination of these classes: https://github.com/UC-Davis-molecular-computing/dsd#data-model
- constraints.all_dna_bases = {'A', 'C', 'G', 'T'}
set of all DNA bases.
- class constraints.M13Variant(value)[source]
Variants of M13mp18 viral genome. “Standard” variant is p7249. Other variants are longer.
- p7249 = 'p7249'
“Standard” variant of M13mp18; 7249 bases long, available from, for example
https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p7249
https://www.neb.com/products/n4040-m13mp18-single-stranded-dna
- p7560 = 'p7560'
Variant of M13mp18 that is 7560 bases long. Available from, for example
https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p7560
- p8064 = 'p8064'
Variant of M13mp18 that is 8064 bases long. Available from, for example
https://www.tilibit.com/collections/scaffold-dna/products/single-stranded-scaffold-dna-type-p8064
- p8634 = 'p8634'
Variant of M13mp18 that is 8634 bases long. At the time of this writing, not listed as available from any biotech vender, but Tilibit will make it for you if you ask. (https://www.tilibit.com/pages/contact-us)
- length()[source]
- Returns:
length of this variant of M13 (e.g., 7249 for variant
M13Variant.p7249)- Return type:
int
- constraints.m13(rotation=5587, variant=M13Variant.p7249)[source]
The M13mp18 DNA sequence (commonly called simply M13).
By default, starts from cyclic rotation 5587 (with 0-based indexing; commonly this is called rotation 5588, which assumes that indexing begins at 1), as defined in GenBank.
By default, returns the “standard” variant of consisting of 7249 bases, sold by companies such as Tilibit and New England Biolabs.
For a more detailed discussion of why the default rotation 5587 of M13 is used, see Supplementary Note S8 in [Folding DNA to create nanoscale shapes and patterns. Paul W. K. Rothemund, Nature 440:297-302 (2006)].
- Parameters:
rotation (int) – rotation of circular strand. Valid values are 0 through length-1.
variant (M13Variant) – variant of M13 strand to use
- Returns:
M13 strand sequence
- Return type:
str
- constraints.m13_substrings_of_length(length, except_indices=(5514, 5515, 5516, 5517, 5518, 5519, 5520, 5521, 5522, 5523, 5524, 5525, 5526, 5527, 5528, 5529, 5530, 5531, 5532, 5533, 5534, 5535, 5536, 5537, 5538, 5539, 5540, 5541, 5542, 5543, 5544, 5545, 5546, 5547, 5548, 5549, 5550, 5551, 5552, 5553, 5554, 5555, 5556), variant=M13Variant.p7249)[source]
WARNING: This function was previously recommended to use with
DomainPool.possible_sequencesto specify possible rotations of M13 to use. However, it creates a large file size to write all those sequences to disk on every update in the search. A better method now exists to specify this, which is to specify aSubstringSamplerobject as the value forDomainPool.possible_sequencesinstead of calling this function.Return all substrings of the M13mp18 DNA sequence of length length, except those overlapping indices in except_start_indices.
This is useful with the field
DomainPool.possible_sequences, when one strand in theDesignrepresents a small portion of the full M13 sequence, and part of the sequence design process is to choose a rotation of M13 to use. One can set that strand to have a singleDomain, which contains dependent subdomains (those withDomain.dependentset to True). These subdomains are the smaller domains where M13 attaches to otherStrand’s in theDesign. Then, give the parentDomainaDomainPoolwithDomainPool.possible_sequencesset to the return value of this function, to allow the search to explore different rotations of M13.For example, suppose m13_subdomains is a list containing
Domain’s from theDesign, which are consecutive subdomains of M13 from 5’ to 3’ (all withDomain.dependentset to True), and m13_length is the sum of their lengths (note this needs to be calculated manually since the following code assumes noDomainin m13_subdomains has aDomainPoolyet, thus none yet have a length). Then the following code creates aStrandrepresenting the M13 portion that binds to otherStrand’s in theDesign.m13_subdomains = # subdomains of M13 used in the design m13_length = # sum of lengths of domains in m13_subdomains m13_substrings = dc.m13_substrings_of_length(m13_length) m13_domain_pool = dc.DomainPool(name='m13 domain pool', possible_sequences=m13_substrings) m13_domain = dc.Domain(name='m13', subdomains=m13_subdomains, pool=m13_domain_pool) m13_strand = dc.Strand(name='m13', domains=[m13_domain])
- Parameters:
length (int) – length of substrings to return
except_indices (Iterable[int]) – Indices of M13 to avoid in any part of the substring. If not specified, based on length, indices 5514-5556 are avoided, which are known to contain a long hairpin. (When using 1-based indexing, these are indices 5515-5557.) For example, if length = 10, then the starting indices of substrings will avoid the list [5505, 5506, …, 5556]
variant (M13Variant) –
M13Variantto use
- Returns:
All substrings of the M13mp18 DNA sequence, except those that overlap any index in except_start_indices.
- Return type:
list[str]
- constraints.default_score_transfer_function(x)[source]
A cubic transfer function.
- Returns:
max(0.0, x^3)
- Parameters:
x (float)
- Return type:
float
- constraints.logger = <Logger dsd (DEBUG)>
Global logger instance used throughout dsd.
Call
logger.removeHandler(logger.handlers[0])to stop screen output (assuming that you haven’t added or removed any handlers to the dsd logger instance already; by default there is one StreamHandler, and removing it will stop screen output).Call
logger.addHandler(logging.FileHandler(filename))to direct to a file.
- constraints.all_pairs(values, with_replacement=True, where=<function <lambda>>)[source]
Strongly typed function to get list of all pairs from iterable. (for using with mypy)
- Parameters:
values (Iterable[T]) – Iterable of values.
with_replacement (bool) – Whether to include self pairs, i.e., pairs (a,a)
where (Callable[[T, T], bool]) – Predicate indicating whether to include a specific pair. Must take two parameters, each of type T, and return a bool.
- Returns:
list of all pairs of values from iterable.
- Return type:
list[tuple[T, T]]
- constraints.all_pairs_iterator(values, with_replacement=True, where=<function <lambda>>)[source]
Strongly typed function to get iterator of all pairs from iterable. (for using with mypy)
This is WITH replacement; to specify without replacement, set with_replacement = False
- Parameters:
values (Iterable[T]) – Iterable of values.
with_replacement (bool) – Whether to include self pairs, i.e., pairs (a,a)
where (Callable[[tuple[T, T]], bool]) – Predicate indicating whether to include a specific pair.
- Returns:
Iterator of all pairs of values from iterable. Unlike
all_pairs(), which returns a list, the iterator returned may be iterated over only ONCE.- Return type:
Iterator[tuple[T, T]]
- constraints.SequenceFilter = typing.Callable[[str], bool]
Filter (see description of
NumpyFilterfor explanation of the term “filter”) that applies to a DNA sequence; the difference between this an aDomainConstraintis that these are applied before a sequence is assigned to aDomain, so the constraint can only be based on the DNA sequence, and not, for instance, on theDomain’sDomainPool.Consequently
SequenceFilter’s, likeNumpyFilter’s, are treated differently than subtypes ofConstraint, since a DNA sequence failing anySequenceFilter’s orNumpyFilter’s is never allowed to be assigned into anyDomain.The difference with
NumpyFilteris that aNumpyFilterrequires one to express the constraint in a way that is efficient for the linear algebra operations of numpy. If you cannot figure out how to do this, aSequenceFiltercan be expressed in pure Python, but typically will be much slower to apply than aNumpyFilter.alias of
Callable[[str],bool]
- class constraints.NumpyFilter[source]
Abstract base class for numpy filters. A “filter” is a hard constraint applied to sequences for a
Domain; a sequence not passing the filter is never allowed to be assigned to aDomain. This constrasts with the various subclasses ofConstraint, which are different in two ways: 1) they can apply to large parts of the design than just a domain, e.g., aStrandor a pair ofDomain’s, and 2) they are “soft” constraints that are allowed to be violated during the course of the search.A
NumpyFilteris one that can be efficiently encoded as numpy operations on 2D arrays of bytes representing DNA sequences, through the classnp.DNASeqList(which uses such a 2D array as the fieldnp.DNASeqList.seqarr).Subclasses should set the value
NumpyFilter.name, inherited from this class.Pre-made subclasses of
NumpyFilterprovided in this library, such asRestrictBasesFilterorNearestNeighborEnergyFilter, are dataclasses (https://docs.python.org/3/library/dataclasses.html). There is no requirement that custom subclasses be dataclasses, but since the subclasses will inherit the fieldNumpyFilter.name, you can easily make them dataclasses to get, for example, freereprandstrimplementations. See the source code for examples.The related type
SequenceFilter(which is just an alias for a Python function with a certain signature) has a similar purpose, but is used for filters that cannot be encoded as numpy operations. Since they are applied by running a Python loop, they are much slower to evaluate than aNumpyFilter.- name: str = 'TODO: give a concrete name to this NumpyFilter'
Name of this
NumpyFilter.
- abstract remove_violating_sequences(seqs)[source]
Subclasses should override this method.
Since these are filters that use numpy, generally they will access the numpy ndarray instance seqs.seqarr, operate on it, and then create a new
np.DNASeqListinstance via the constructornp.DNASeqListtaking an numpy ndarray as input.See the source code of included constraints for examples, such as
NearestNeighborEnergyFilter.remove_violating_sequences()orBaseCountFilter.remove_violating_sequences(). These are usually quite tricky to write, requiring one to think in terms of linear algebra operations. The code tends not to be easy to read. But when a constraint can be expressed in this way, it is typically very fast to apply; many millions of sequences can be processed in a few seconds.- Parameters:
seqs (DNASeqList) –
np.DNASeqListobject representing DNA sequences- Returns:
a new
np.DNASeqListobject representing the DNA sequences in seqs that satisfy the constraint- Return type:
DNASeqList
- __init__()
- Return type:
None
- class constraints.RestrictBasesFilter(bases)[source]
Restricts the sequence to use only a subset of bases. This can be used to implement a so-called “three-letter code”, for instance, in which a certain subset of
Stranduses only the bases A, T, C (andStrand’s with complementaryDomainuse only A, T, G), to help reduce secondary structure of thoseStrand’s. See for example Supplementary Section S1.1 of “Scaling Up Digital Circuit Computation with DNA Strand Displacement Cascades”, Qian and Winfree, Science 332:1196–1201, 2011. DOI: 10.1126/science.1200520, https://science.sciencemag.org/content/332/6034/1196, http://www.qianlab.caltech.edu/seesaw_digital_circuits2011_SI.pdfNote, however, that this is a filter for
Domain’s, not wholeStrand’s, so for a three-letter code to work, you must take care not to mixedDomain’s on aStrandthat will use different alphabets.- Parameters:
bases (Collection[str])
- bases: Collection[str]
Bases to use. Must be a strict subset of {‘A’, ‘C’, ‘G’, ‘T’} with at least two bases.
- remove_violating_sequences(seqs)[source]
Should never be called directly; it is handled specially by the library when initially generating sequences.
- Parameters:
seqs (DNASeqList)
- Return type:
DNASeqList
- __init__(bases)
- Parameters:
bases (Collection[str])
- Return type:
None
- class constraints.NearestNeighborEnergyFilter(low_energy=None, high_energy=None, temperature=37.0)[source]
This constraint calculates the nearest-neighbor binding energy of a domain with its perfect complement (summing over all length-2 substrings of the domain’s sequence), using parameters from the 2004 Santa-Lucia and Hicks paper (https://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.32.110601.141800, see Table 1, and example on page 419). It rejects any sequences whose energy according to this sum is outside the range [
NearestNeighborEnergyFilter.low_energy,NearestNeighborEnergyFilter.high_energy].If
NearestNeighborEnergyFilter.low_energyorNearestNeighborEnergyFilter.high_energyis not specified, only the other is used, e.g.,NearestNeighborEnergyFilter(high_energy=-7.5)filters out all sequences with nearest neighbor binding energy above -7.5. At least one must be specified.- Parameters:
low_energy (float | None)
high_energy (float | None)
temperature (float)
- low_energy: float | None = None
Low threshold for nearest-neighbor energy.
- high_energy: float | None = None
High threshold for nearest-neighbor energy.
- temperature: float = 37.0
Temperature in Celsius at which to calculate nearest-neighbor energy.
- remove_violating_sequences(seqs)[source]
Remove sequences with nearest-neighbor energies outside of an interval.
- Parameters:
seqs (DNASeqList)
- Return type:
DNASeqList
- __init__(low_energy=None, high_energy=None, temperature=37.0)
- Parameters:
low_energy (float | None)
high_energy (float | None)
temperature (float)
- Return type:
None
- class constraints.BaseCountFilter(bases, high=None, low=None)[source]
Restricts the sequence to contain a certain number of occurences of given bases.
This is useful to implement, for instance, a “pseudo three letter code”, in which the alphabet is mostly ATC, but with the exception up to one G.
- Parameters:
bases (str)
high (int | None)
low (int | None)
- bases: str
Bases to count. e.g., ‘A’ or ‘AT’. If multiple bases are included, it ensures the sum of their counts is between
BaseCountFilter.highandBaseCountFilter.low.
- high: int | None = None
Count of bases in
BaseCountFilter.basesmust be at most this.
- low: int | None = None
Count of bases in
BaseCountFilter.basesmust be at least this.
- remove_violating_sequences(seqs)[source]
Remove sequences whose counts of a certain bases are outside of an interval.
- Parameters:
seqs (DNASeqList)
- Return type:
DNASeqList
- __init__(bases, high=None, low=None)
- Parameters:
bases (str)
high (int | None)
low (int | None)
- Return type:
None
- class constraints.BaseEndFilter(bases, distance_from_end=0, five_prime=True, three_prime=True)[source]
Restricts the sequence to contain only certain bases on (or near, if
BaseEndFilter.distance_from_end> 0) each end.- Parameters:
bases (Collection[str])
distance_from_end (int)
five_prime (bool)
three_prime (bool)
- bases: Collection[str]
Bases to require on ends.
- distance_from_end: int = 0
Distance from end.
- five_prime: bool = True
Whether to apply to 5’ end of sequence (left end of DNA sequence, lowest index).
- three_prime: bool = True
Whether to apply to 3’ end of sequence (right end of DNA sequence, highest index).
- remove_violating_sequences(seqs)[source]
Keeps sequences with the given bases at given distance from the 5’ or 3’ end.
- Parameters:
seqs (DNASeqList)
- Return type:
DNASeqList
- __init__(bases, distance_from_end=0, five_prime=True, three_prime=True)
- Parameters:
bases (Collection[str])
distance_from_end (int)
five_prime (bool)
three_prime (bool)
- Return type:
None
- class constraints.BaseAtPositionFilter(bases, position)[source]
Restricts the sequence to contain only certain base(s) on at a particular position.
One use case is that many internal modifications (e.g., biotin or fluorophore) can only be placed on an T.
- Parameters:
bases (str | Collection[str])
position (int)
- bases: str | Collection[str]
Base(s) to require at position
BasePositionConstraint.position.Can either be a single base, or a collection (e.g., list, tuple, set). If several bases are specified, the base at
BasePositionConstraint.positionmust be one of the bases inBasePositionConstraint.bases.
- position: int
Position of base to check.
- remove_violating_sequences(seqs)[source]
Remove sequences that don’t have one of the given bases at the given position.
- Parameters:
seqs (DNASeqList)
- Return type:
DNASeqList
- __init__(bases, position)
- Parameters:
bases (str | Collection[str])
position (int)
- Return type:
None
- class constraints.ForbiddenSubstringFilter(substrings, indices=None)[source]
Restricts the sequence not to contain a certain substring(s), e.g., GGGG.
- Parameters:
substrings (str | Collection[str])
indices (Sequence[int] | None)
- substrings: str | Collection[str]
Substring(s) to forbid.
Can either be a single substring, or a collection (e.g., list, tuple, set). If a collection, all substrings must have the same length.
- indices: Sequence[int] | None = None
Indices at which to check for each substring in
ForbiddenSubstringFilter.substrings. If not specified, all appropriate indices are checked.
- remove_violating_sequences(seqs)[source]
Remove sequences that have a string in
ForbiddenSubstringFilter.substringsas a substring.- Parameters:
seqs (DNASeqList)
- Return type:
DNASeqList
- __init__(substrings, indices=None)
- Parameters:
substrings (str | Collection[str])
indices (Sequence[int] | None)
- Return type:
None
- class constraints.RunsOfBasesFilter(bases, length)[source]
Restricts the sequence not to contain runs of a certain length from a certain subset of bases, (e.g., forbidding any substring in {C,G}^3; no four bases can appear in a row that are either C or G)
This works by simply generating all strings representing the runs of bases, and then using a
ForbiddenSubstringFilterwith those strings. So this will not be efficient for forbidding, for example {A,C,T}^20 (i.e., all runs of A’s, C’s, or T’s of length 20), which would generate all 3^20 = 3,486,784,401 strings of length 20 from the alphabet {A,C,T}^20. Hopefully such a constraint would not be used in practice.- Parameters:
bases (Collection[str])
length (int)
- __init__(bases, length)[source]
- Parameters:
bases (str | Collection[str]) – Can either be a single base, or a collection (e.g., list, tuple, set).
length (int) – length of run to forbid
- Return type:
None
- bases: Collection[str]
Bases to forbid in runs of length
RunsOfBasesFilter.length.
- length: int
Length of run to forbid.
- class constraints.SubstringSampler(supersequence, substring_length, except_start_indices=None, except_overlapping_indices=None, circular=False)[source]
A
SubstringSampleris an object for specifying a common case for the fieldDomainPool.possible_sequences, namely where we want the set of possible sequences to be all (or many) substrings of a single longer sequence.For example, this can be used to choose a rotation of the M13mp18 strand in sequence design. If for example 300 consecutive bases of M13 will be used in the design, and we want to choose the rotation, but disallow the substring of length 300 to overlap the hairpin at indices 5514-5556, then one would do the following
possible_sequences = SubstringSampler( supersequence=m13(), substring_length=300, except_overlapping_indices=range(5514, 5557), circular=True) pool = DomainPool('M13 rotations', possible_sequences=possible_sequences)
For this example, using a
SubstringSampleris much more efficient than explicitly listing all length-300 substrings of M13 in the parameterDomainPool.possible_sequences. This is because the latter approach, whenever theDesignimproves and is written to a file, will write out all the length-300 substrings of M13, taking much more space than just writing the full M13 sequence once.- Parameters:
supersequence (str)
substring_length (int)
except_start_indices (tuple[int, ...])
except_overlapping_indices (Iterable[int] | None)
circular (bool)
- __init__(supersequence, substring_length, except_start_indices=None, except_overlapping_indices=None, circular=False)[source]
- Parameters:
supersequence (str)
substring_length (int)
except_start_indices (Iterable[int] | None)
except_overlapping_indices (Iterable[int] | None)
circular (bool)
- Return type:
None
- supersequence: str
The longer sequence from which to sample substrings.
- substring_length: int
Length of substrings to sample.
- circular: bool
Whether
SubstringSampler.supersequenceis circular. If so, then we can sample indices near the end and the substrings will start at the end and wrap around to the start.
- except_start_indices: tuple[int, ...]
Start indices in
SubstringSampler.supersequenceto avoid. In the constructor this can be specified directly. Another option (mutually exclusive with the parameter except_start_indices) is to specify the parameter except_overlapping_indices, which setsSubstringSampler.except_start_indicesso that substrings will not intersect any indices in except_overlapping_indices.
- extended_supersequence: str
If
SubstringSampler.circularis True, then this isSubstringSampler.supersequenceextended by its own prefix of lengthSubstringSampler.substring_length - 1, to make sampling easier. Otherwise it is simply identical toSubstringSampler.supersequence. Computed in constructor from other arguments.
- start_indices: tuple[int, ...]
List of start indices from which to sample when calling
SubstringSampler.sample_substring(). Computed in constructor from other arguments.
- all_possible_substrings()[source]
- Returns:
list of all possible substrings that can be sampled by this
SubstringSampler.- Return type:
list[str]
- sample_substring(rng)[source]
- Returns:
a random substring of
SubstringSampler.supersequenceof lengthSubstringSampler.substring_length.- Parameters:
rng (Generator)
- Return type:
str
- class constraints.DomainPool(name, length=None, possible_sequences=None, replace_with_close_sequences=True, hamming_probability=<factory>, numpy_filters=<factory>, sequence_filters=<factory>)[source]
Represents a group of related
Domain’s that share common properties in their sequence design, such as length of DNA sequence, or bounds on nearest-neighbor duplex energy.Also serves as a “source” of DNA sequences for
Domain’s in thisDomainPool. By callingDomainPool.generate_sequence()repeatedly, we can produce DNA sequences satisfying the constraints defining thisDomainPool.- Parameters:
name (str)
length (int | None)
possible_sequences (list[str] | SubstringSampler | None)
replace_with_close_sequences (bool)
hamming_probability (dict[int, float])
numpy_filters (list[NumpyFilter])
sequence_filters (list[SequenceFilter])
- name: str
Name of this
DomainPool. Must be unique.
- length: int | None = None
Length of DNA sequences generated by this
DomainPool.Should be None if
DomainPool.possible_sequencesis specified.
- possible_sequences: list[str] | SubstringSampler | None = None
If specified, all other fields except
DomainPool.nameandDomainPool.lengthare ignored. This is an explicit list of sequences to consider forDomain’s using thisDomainPool. During the search, if a domain with thisDomainPoolis picked to have its sequence changed, then a sequence will be picked uniformly at random from this list. Note that noNumpyFilter’s orSequenceFilter’s will be applied.Alternatively, the field can be an instance of
SubstringSamplerfor the common case that the set of possible sequences is the set of substrings of some length of a single longer sequence. For example, this can be used to choose a rotation of the M13mp18 strand in sequence design. (This is advantageous because the files saving theDesigneach time the design improves will be much shorter, since it takes much less space to write the M13 sequence than to write all of its length-300 substrings.)Should be None if
DomainPool.lengthis specified.
- replace_with_close_sequences: bool = True
If True, instead of picking a sequence uniformly at random from all those satisfying the filters when returning a sequence from
DomainPool.generate_sequence(), one is picked “close” in Hamming distance to the previous sequence of theDomain. The fieldDomainPool.hamming_probabilityis used to pick a distance at random, after which a sequence that distance from the previous sequence is selected to return.
- hamming_probability: dict[int, float]
Dictionary that specifies probability of taking a new sequence from the pool that is some integer number of bases different from the previous sequence (Hamming distance).
- numpy_filters: list[NumpyFilter]
NumpyFilter’s shared by allDomain’s in thisDomainPool. This is used to choose potential sequences to assign to theDomain’s in thisDomainPoolin the methodDomainPool.generate_sequence().The difference with
DomainPool.sequence_filtersis that these constraints can be applied efficiently to many sequences at once, represented as a numpy 2D array of bytes (via the classnp.DNASeqList), so they are done in large batches in advance. In contrast, the constraints inDomainPool.sequence_filtersare done on Python strings representing DNA sequences, and they are called one at a time when a new sequence is requested inDomainPool.generate_sequence().Optional; default is empty.
- sequence_filters: list[SequenceFilter]
SequenceFilter’s shared by allDomain’s in thisDomainPool. This is used to choose potential sequences to assign to theDomain’s in thisDomainPoolin the methodDomainPool.generate().See
DomainPool.numpy_filtersfor an explanation of the difference between them.See
DomainPool.domain_constraintsfor an explanation of the difference between them.Optional; default is empty.
- satisfies_sequence_constraints(sequence)[source]
- Parameters:
sequence (str) – DNA sequence to check
- Returns:
whether sequence satisfies all constraints in
DomainPool.sequence_filters- Return type:
bool
- generate_all_sequences()[source]
Returns all DNA sequences of given length satisfying
DomainPool.numpy_filtersandDomainPool.sequence_filters.CAUTION: This is prohibitively memory and time intensive if length is larger than 16, because it enumerates all DNA sequences of a given length, then filters out those not passing the filters.
- Returns:
all DNA sequences of given length satisfying
DomainPool.numpy_filtersandDomainPool.sequence_filters- Return type:
list[str]
- generate_sequence(rng, previous_sequence=None, warn_no_seqs_found=False)[source]
Returns a DNA sequence of given length satisfying
DomainPool.numpy_filtersandDomainPool.sequence_filtersNote: By default, there is no check that the sequence returned is unequal to one already assigned somewhere in the design, since both
DomainPool.numpy_filtersandDomainPool.sequence_filtersdo not have access to the wholeDesign. But theDomainPairConstraintreturned bydomains_not_substrings_of_each_other_constraint()can be used to specify thisDesign-wide constraint.Note that if
DomainPool.possible_sequencesis specified, then all constraints are ignored, and instead a sequence is chosen randomly to be returned from that list.- Parameters:
rng (np.random.Generator) – numpy random number generator to use. To use a default, pass
np.default_rng.previous_sequence (str | None) – previously generated sequence to be replaced by a new sequence; None if no previous sequence exists. Used to choose a new sequence “close” to itself in Hamming distance, if the field
DomainPool.replace_with_close_sequencesis True and previous_sequence is not None. The number of differences between previous_sequence and its neighbors is determined by randomly picking a Hamming distance fromDomainPool.hamming_probabilitywith weighted probabilities of choosing each distance.warn_no_seqs_found (bool) – whether to warn if no sequences satisfying constraints are found when generating a large number; this is on by default in case the user made the NumpyFilters or SequenceFilters too restrictive, but it can be annoying to see it printed when it is not hurting the search (frequently it appears when filters are restrictive and Hamming distance 1 is picked, but it’s okay because it simply samples a larger Hamming distance the next time)
- Returns:
DNA sequence of given length satisfying
DomainPool.numpy_filtersandDomainPool.sequence_filters- Return type:
str
- __init__(name, length=None, possible_sequences=None, replace_with_close_sequences=True, hamming_probability=<factory>, numpy_filters=<factory>, sequence_filters=<factory>)
- Parameters:
name (str)
length (int | None)
possible_sequences (list[str] | SubstringSampler | None)
replace_with_close_sequences (bool)
hamming_probability (dict[int, float])
numpy_filters (list[NumpyFilter])
sequence_filters (list[SequenceFilter])
- Return type:
None
- class constraints.Domain(name, pool=None, *, sequence=None, fixed=False, label=None, dependent=False, subdomains=None, weight=None)[source]
Represents a contiguous substring of the DNA sequence of a
Strand, which is intended to be either single-stranded, or to bind fully to the Watson-Crick complement of theDomain.If two domains are complementary, they are represented by the same
Domainobject. They are distinguished only by whether theStrandobject containing them has theDomainin its setStrand.starred_domainsor not.A
Domainuses only its name to compute hash and equality checks, not its sequence. This allows aDomainto be used in sets and dicts while modifying the sequence assigned to it, and also modifying the pool (letting the pool be assigned after it is created).- Parameters:
name (str)
pool (DomainPool | None)
sequence (str | None)
fixed (bool)
label (str | None)
dependent (bool)
subdomains (list[Domain] | None)
weight (float | None)
- parent: Domain | None = None
Domain of which this is a subdomain. Note, this is not set manually, this is set by the library based on the
Domain.subdomainsof other domains in the same tree.
- __init__(name, pool=None, *, sequence=None, fixed=False, label=None, dependent=False, subdomains=None, weight=None)[source]
- Parameters:
name (str)
pool (DomainPool | None)
sequence (str | None)
fixed (bool)
label (str | None)
dependent (bool)
subdomains (list[Domain] | None)
weight (float | None)
- Return type:
None
- fixed: bool = False
Whether this
Domain’s DNA sequence is fixed, i.e., cannot be changed by the search algorithmsearch.search_for_dna_sequences().Note: If a domain is fixed then all of its subdomains must also be fixed.
- label: str | None = None
Optional “label” string to associate to this
Domain.Useful for associating extra information with the
Domainthat will be serialized, for example, for DNA sequence design.
- dependent: bool = False
Whether this
Domain’s DNA sequence is dependent on others. Usually this is not the case. However, domains can be subdivided hierarchically into a tree of domains by settingDomain.subdomainsto describe the tree. In this case exactly one domain along every path from the root to any leaf must be independent, and the rest dependent: the dependent domains will have their sequences calculated from the indepenedent ones.A possible use case is that one strand represents a subsequence of M13 of length 300, of which there are 7249 possible DNA sequences to assign based on the different rotations of M13. If this strand is bound to several other strands, it will have several domains, but they cannot be set independently of each other. This can be done by creating a strand with a single long domain, which is subdivided into many dependent child domains. Only the entire strand, the root domain, can be assigned at once, changing every domain at once, so the domains are dependent on the root domain’s assigned sequence.
- length: int | None = None
Length of this domain. If None, then the method
Domain.get_length()asksDomain.poolfor the length. However, aDomainwithDomain.dependentset to True has noDomain.pool. For such domains, it is necessary to set aDomain.lengthfield directly.
- weight: float = 1.0
Weight to apply before picking domain at random to change when re-assigning DNA sequences during search. Should only be changed for independent domains. (those with
Domain.dependentset to False)Normally a domain’s probability of being changed is proportional to the total score of violations it causes, but that total score is first multiplied by
Domain.weight. This is useful, for instance, to weight a domain lower when it has many subdomains that intersect many strands, for example if a domain represents an M13 strand. It may be more efficient to pick such a domain less often since changing it will change many strands in the design and, when the design gets close to optimized, this will likely cause the score to go up.
- to_json_serializable(suppress_indent=True)[source]
- Returns:
Dictionary
drepresenting thisDomainthat is “naturally” JSON serializable, by callingjson.dumps(d).- Parameters:
suppress_indent (bool)
- Return type:
NoIndent | dict[str, Any]
- static from_json_serializable(json_map, pool_with_name)[source]
- Parameters:
json_map (dict[str, Any]) – JSON serializable object encoding this
Domain, as returned byDomain.to_json_serializable().pool_with_name (dict[str, DomainPool] | None) – dict mapping name to
DomainPoolwith that name; required to rehydrateDomain’s. If None, then a DomainPool with no constraints is created with the name and domain length found in the JSON.
- Returns:
Domainrepresented by dict json_map, assuming it was created byDomain.to_json_serializable().- Return type:
- property pool: DomainPool
DomainPoolof thisDomain- Type:
return
- property subdomains: list[Domain]
Subdomains of this
Domain.Used in connection with
Domain.dependentto declare that someDomain’s are contained within other domains (forming a tree in general), and domains withDomain.dependentset to True automatically take their sequences from independent domains.WARNING: this can be a bit tricky to determine the order when setting these. The subdomains should be listed in 5’ to 3’ order for UNSTARRED domains. If there is a starred domain with starred subdomains, they would be listed in REVERSE order.
For example, if there is a domain dom*
[--------->of length 11 with two subdomains sub1*[----->of length 7 and sub2*[-->of length 4 (put together they look like[----->[-->) that appear in that order left to right (5’ to 3’), then one would assign the domain dom to have subdomains[sub2, sub1], since the UNSTARRED domains appear<-----]<--], i.e., in 5’ to 3’ order for the unstarred domains, first the length 4 domain dom2 appears, then the length 7 domain dom1.
- has_length()[source]
- Returns:
True if this
Domainhas a length, which means either a sequence has been assigned to it, or it has aDomainPool.- Return type:
bool
- get_length()[source]
- Returns:
Length of this domain (delegates to pool)
- Raises:
ValueError – if no
DomainPoolhas been set for thisDomain- Return type:
int
- sequence()[source]
- Returns:
DNA sequence of this domain (unstarred version)
- Raises:
ValueError – If no sequence has been assigned.
- Return type:
str
- set_sequence(new_sequence)[source]
- Parameters:
new_sequence (str) – new DNA sequence to set
- Return type:
None
- set_fixed_sequence(fixed_sequence)[source]
Set DNA sequence and fix it so it is not changed by the nuad sequence designer.
Since it is being fixed, there is no Domain pool, so we don’t check the pool or whether it has a length. We also bypass the check that it is not fixed.
- Parameters:
fixed_sequence (str) – new fixed DNA sequence to set
- Return type:
None
- property starred_name: str
The value
Domain.namewith * appended to it.- Type:
return
- property starred_sequence: str
Watson-Crick complement of DNA sequence assigned to this
Domain.- Type:
return
- get_name(starred)[source]
- Parameters:
starred (bool) – whether to return the starred or unstarred version of the name
- Returns:
The value
Domain.nameorDomain.starred_name, depending on the value of parameter starred.- Return type:
str
- concrete_sequence(starred)[source]
- Parameters:
starred (bool) – whether to return the starred or unstarred version of the sequence
- Returns:
The value
Domain.sequenceorDomain.starred_sequence, depending on the value of parameter starred.- Raises:
ValueError – if this
Domaindoes not have a sequence assigned- Return type:
str
- has_sequence()[source]
- Returns:
Whether a complete DNA sequence has been assigned to this
Domain. If this domain has subdomains, False if any subdomain has not been assigned a sequence.- Return type:
bool
- static complementary_domain_name(domain_name)[source]
Returns the name of the domain complementary to domain_name. In other words, a
*is either removed from the end of domain_name, or appended to it if not already there.- Parameters:
domain_name (str) – name of domain
- Returns:
name of complementary domain
- Return type:
str
- all_domains_in_tree(allow_fixed=False)[source]
- Returns:
list of all domains in the same subdomain tree as this domain (including itself)
- Parameters:
allow_fixed (bool)
- Return type:
list[Domain]
- all_domains_intersecting()[source]
- Returns:
list of all domains intersecting this one, meaning those domains in the subtree rooted at this domain (including itself), plus any ancestors of this domain.
- Return type:
list[Domain]
- ancestors()[source]
- Returns:
list of all domains that are ancestors of this one, NOT including this domain
- Return type:
list[Domain]
- has_pool()[source]
- Returns:
whether a
DomainPoolhas been assigned to thisDomain- Return type:
bool
- independent_source()[source]
Like
independent_ancestor_or_descendent(), but returns this Domain if it is already independent.
- constraints.domains_neq_constraint(check_complements=True, short_description='dom neq', weight=1.0, min_length=4, pairs=None, domains=None)[source]
More generally, returns constraint ensuring no two domains are substrings of each other. Note that this ensures that no two
Domain’s are equal if they are the same length.- Parameters:
check_complements (bool) – whether to also ensure the check for Watson-Crick complements of the sequences
short_description (str) – short description of constraint suitable for logging to stdout
weight (float) – weight to assign to constraint
min_length (int) – minimum length substring to check. For instance if min_length is 4, then having two domains with sequences AAAA and CAAAAC would violate this constraint, but domains with sequences AAA and CAAAC would not.
pairs (Iterable[tuple[Domain, Domain]] | None) – pairs of domains to check. By default all pairs of unequal domains are compared unless both are fixed. Mutually exclusive with domains.
domains (Iterable[Domain] | None) – domains to check. All pairs among domains are checked. Mutually exclusive with pairs.
- Returns:
a
DomainPairConstraintensuring no two domain sequences contain each other as a substring (in particular, if they are equal length, then they are not the same domain)- Return type:
- class constraints.VendorFields(scale='25nm', purification='STD', plate=None, well=None)[source]
Data required when ordering DNA strands from a synthesis company such as IDT (Integrated DNA Technologies). This data is used when automatically generating files used to order DNA from IDT.
When exporting to IDT files via
Design.write_idt_plate_excel_file()orDesign.write_idt_bulk_input_file(), the fieldStrand.nameis used for the name if it exists, otherwise a reasonable default is chosen.- Parameters:
scale (str)
purification (str)
plate (str | None)
well (str | None)
- scale: str = '25nm'
Synthesis scale at which to synthesize the strand (third field in IDT bulk input: https://www.idtdna.com/site/order/oligoentry). Choices supplied by IDT at the time this was written:
"25nm","100nm","250nm","1um","5um","10um","4nmU","20nmU","PU","25nmS".
- purification: str = 'STD'
Purification options (fourth field in IDT bulk input: https://www.idtdna.com/site/order/oligoentry). Choices supplied by IDT at the time this was written:
"STD","PAGE","HPLC","IEHPLC","RNASE","DUALHPLC","PAGEHPLC".
- plate: str | None = None
Name of plate in case this strand will be ordered on a 96-well or 384-well plate.
Optional field, but non-optional if
book = openpyxl.load_workbook(filename=filename).wellis notNone.
- well: str | None = None
Well position on plate in case this strand will be ordered on a 96-well or 384-well plate.
Optional field, but non-optional if
VendorFields.plateis notNone.
- __init__(scale='25nm', purification='STD', plate=None, well=None)
- Parameters:
scale (str)
purification (str)
plate (str | None)
well (str | None)
- Return type:
None
- class constraints.Strand(domains=None, starred_domain_indices=(), group='default_strand_group', name=None, label=None, vendor_fields=None)[source]
Represents a DNA strand, made of several
Domain’s.- Parameters:
domains (Iterable[Domain] | None)
starred_domain_indices (Iterable[int])
group (str)
name (str | None)
label (str | None)
vendor_fields (VendorFields | None)
- modification_5p: nm.Modification5Prime | None = None
5’ modification; None if there is no 5’ modification.
- modification_3p: nm.Modification3Prime | None = None
3’ modification; None if there is no 3’ modification.
- __init__(domains=None, starred_domain_indices=(), group='default_strand_group', name=None, label=None, vendor_fields=None)[source]
A
Strandcan be created only by listing explicitDomainobjects via parameter domains. To specify aStrandby giving domain names, see the methodDesign.add_strand().- Parameters:
domains (Iterable[Domain] | None) – list of
Domain’s on thisStrandstarred_domain_indices (Iterable[int]) – Indices of
Domain’s in domains that are starred.group (str) – name of group of this
Strand.name (str | None) – Name of this
Strand.label (str | None) – Label to associate with this
Strand.vendor_fields (VendorFields | None) –
VendorFieldsobject to associate with thisStrand; needed to call methods for exporting to IDT formats (e.g.,Strand.write_idt_bulk_input_file())
- Return type:
None
- group: str
Optional “group” field to describe strands that share similar properties.
- starred_domain_indices: frozenset[int]
Set of positions of
Domain’s inStrand.domainson thisStrandthat are starred.
- label: str | None = None
Optional generic “label” string to associate to this
Strand.Useful for associating extra information with the
Strandthat will be serialized, for example, for DNA sequence design.
- vendor_fields: VendorFields | None = None
Fields used when ordering strands from a synthesis company such as IDT (Integrated DNA Technologies, Coralville, IA). If present (i.e., not equal to
None) then the methodDesign.write_idt_bulk_input_file()can be called to automatically generate an text file for ordering strands in test tubes: https://www.idtdna.com/site/order/oligoentry, as can the methodDesign.write_idt_plate_excel_file()for writing a Microsoft Excel file that can be uploaded to IDT’s website for describing DNA sequences to be ordered in 96-well or 384-well plates.
- modifications_int: dict[int, nm.ModificationInternal]
modifications.Modification’s to the DNA sequence (e.g., biotin, Cy3/Cy5 fluorphores).Maps index within DNA sequence to modification. If the internal modification is attached to a base (e.g., internal biotin, /iBiodT/ from IDT), then the index is that of the base. If it goes between two bases (e.g., internal Cy3, /iCy3/ from IDT), then the index is that of the previous base, e.g., to put a Cy3 between bases at indices 3 and 4, the index should be 3. So for an internal modified base on a sequence of length n, the allowed indices are 0,…,n-1, and for an internal modification that goes between bases, the allowed indices are 0,…,n-2.
- clone(name)[source]
Returns a copy of this
Strand. The copy is “shallow” in that theDomain’s are shared. This is useful for creating multiple versions of eachStrand, e.g., for having a variant with an extension.WARNING: the
Strand.labelwill be shared between them. If it should be copied, this must be done manually. A shallow copy of it can be made by setting
- compute_derived_fields()[source]
Re-computes derived fields of this
Strand. Should be called after modifications to the Strand. (Done automatically at the start ofsearch.search_for_dna_sequences().)
- intersects_domain(domain)[source]
- Parameters:
domain (Domain) – domain to test for intersection
- Returns:
whether this strand intersects domain, which is true if either domain is in the list
Strand.domains, or if any of those domains have domain in their hierarchical tree as a subdomain or an ancestor- Return type:
bool
- length()[source]
- Returns:
Sum of lengths of
Domain’s in thisStrand. EachDomainmust have aDomainPoolassigned so that the length is defined.- Return type:
int
- vendor_dna_sequence()[source]
- Returns:
DNA sequence as it needs to be typed to order from a synthesis company, with
Modification5Prime’s,Modification3Prime’s, andModificationInternal’s represented with text codes, e.g., “/5Biosg/ACGT” for sequence ACGT with a 5’ biotin modification to order from IDT.- Return type:
str
- to_json_serializable(suppress_indent=True)[source]
- Returns:
Dictionary
drepresenting thisStrandthat is “naturally” JSON serializable, by callingjson.dumps(d).- Parameters:
suppress_indent (bool)
- Return type:
NoIndent | dict[str, Any]
- static from_json_serializable(json_map, domain_with_name)[source]
- Returns:
Strandrepresented by dict json_map, assuming it was created byStrand.to_json_serializable().- Parameters:
json_map (dict[str, Any])
domain_with_name (dict[str, Domain])
- Return type:
- unstarred_domains()[source]
- Returns:
list of unstarred
Domain’s in thisStrand, in order they appear inStrand.domains- Return type:
list[Domain]
- starred_domains()[source]
- Returns:
list of starred
Domain’s in thisStrand, in order they appear inStrand.domains- Return type:
list[Domain]
- sequence(delimiter='')[source]
- Parameters:
delimiter (str) – Delimiter string to place between sequences of each
Domainin thisStrand. For instance, if delimiter ='--', then it will return a string such asACGTAGCTGA--CGCTAGCTGA--CGATCGATC--GCGATCGAT- Returns:
DNA sequence assigned to this
Strand, calculated by concatenating all sequences assigned to itsDomain’s.- Raises:
ValueError – if any
Domainof thisStranddoes not have a sequence assigned- Return type:
str
- assign_dna(sequence)[source]
- Parameters:
sequence (str) – DNA sequence to assign to this
Strand. Must have length =Strand.length().- Return type:
None
- unfixed_domains()[source]
- Returns:
all
Domain’s in thisStrandwhereDomain.fixedis False- Return type:
tuple[Domain, …]
- property name: str
name of this
Strandif it was assigned one, otherwiseDomainnames are concatenated with ‘-’ joining them- Type:
return
- address_of_domain(domain_idx)[source]
Returns
StrandDomainAddressof the domain located at domain_idx- Rparam domain_idx:
Index of domain
- Parameters:
domain_idx (int)
- Return type:
- address_of_nth_domain_occurence(domain_name, n, forward=True)[source]
Returns
StrandDomainAddressof the n’th occurence of domain named domain_name.- Parameters:
- Returns:
StrandDomainAddressof the n’th occurence of domain named domain_name.- Return type:
- address_of_first_domain_occurence(domain_name)[source]
Returns
StrandDomainAddressof the first occurrence of domain named domain_name starting from the 5’ end.- Parameters:
domain_name (str)
- Return type:
- address_of_last_domain_occurence(domain_name)[source]
Returns
StrandDomainAddressof the nth occurrence of domain named domain_name starting from the 3’ end.- Parameters:
domain_name (str)
- Return type:
- prepend_domain(domain, starred=False)[source]
Prepends domain to 5’ end of this
Strand(i.e., the beginning of theStrand).
- insert_domain(idx, domain, starred=False)[source]
Inserts domain at index idx of this
Strand, with same semantics as Python’s list.insert. For example,strand.insert(0, domain)is equivalent tostrand.prepend_domain(domain)andstrand.insert(len(strand.domains), domain)is equivalent tostrand.append_domain(domain).
- set_fixed_sequence(seq)[source]
Sets each domain of this
Strandto have a substring of seq, such that the entire strand has the sequence seq. AllDomain’s in this strand will be fixed after doing this. (And if any of them are already fixed it will raise an error.)- Parameters:
seq (str) – sequence to assign to this
Strand- Return type:
None
- class constraints.DomainPair(domain1: 'Domain', domain2: 'Domain', starred1: 'bool' = False, starred2: 'bool' = False)[source]
-
- starred1: bool = False
Whether first domain is starred (not used in most constraints)
- starred2: bool = False
Whether second domain is starred (not used in most constraints)
- class constraints.Complex(*args: 'Strand')[source]
- Parameters:
args (Strand)
- constraints.remove_duplicates(lst)[source]
- Parameters:
lst (Iterable[T]) – an Iterable of objects
- Returns:
a list consisting of elements of lst with duplicates removed, while preserving iteration order of lst (naive approach using Python set would not preserve order, since iteration order of Python sets is not specified)
- Return type:
list[T]
- class constraints.PlateType(value)[source]
Represents two different types of plates in which DNA sequences can be ordered.
- wells96 = 96
96-well plate.
- wells384 = 384
384-well plate.
- class constraints.Design(strands=())[source]
Represents a complete design, i.e., a set of DNA
Strand’s with domains, andConstraint’s on the sequences to assign to them viasearch.search_for_dna_sequences().- Parameters:
strands (list[Strand])
- strands_by_group_name: dict[str, list[Strand]]
dict mapping each group name to a list of the
Strand’s in thisDesignin the group.Computed from
Design.strands, so not specified in constructor.
- domain_pools_to_domain_map: dict[DomainPool, list[Domain]]
dict mapping each
DomainPoolto a list of theDomain’s in thisDesignin the pool.Computed from
Design.strands, so not specified in constructor.
- strands_by_name: dict[str, Strand]
dict mapping each name of a
Strandto theStrandin thisDesignwith that name.Computed from
Design.strands, so not specified in constructor.
- property domains: list[Domain]
List of all
Domain’s in thisDesign. (without repetitions)Computed from
Design.strands, so not specified in constructor.
- domains_by_name: dict[str, Domain]
dict mapping each name of a
Domainto theDomain’s in thisDesign.Computed from
Design.strands, so not specified in constructor.
- compute_derived_fields()[source]
Computes derived fields of this
Design. Used to ensure that all fields are valid in case theDesignwas manually modified after being created, before runningsearch.search_for_dna_sequences().- Return type:
None
- to_json_serializable(suppress_indent=True)[source]
- Parameters:
suppress_indent (bool) – Whether to suppress indentation of some objects using the NoIndent object.
- Returns:
Dictionary
drepresenting thisDesignthat is “naturally” JSON serializable, by callingjson.dumps(d).- Return type:
dict[str, Any]
- write_design_file(directory='.', filename=None, extension='json')[source]
Write JSON file representing this
Design, which can be imported via the methodDesign.from_design_file(), with the output file having the same name as the running script but with.pychanged to.json, unless filename is explicitly specified. For instance, if the script is namedmy_design.py, then the design will be written tomy_design.json. If extension is specified (but filename is not), then the design will be written tomy_design.<extension>The string written is that returned by
Design.to_json().- Parameters:
directory (str) – directory in which to put file (default: current working directory)
filename (str | None) – filename (default: name of script with
.pyreplaced by.sc). Mutually exclusive with extensionextension (str) – extension for filename (default:
.sc) Mutually exclusive with filename
- Return type:
None
- static from_design_file(filename)[source]
- Parameters:
filename (str) – name of JSON file describing the
Design- Returns:
Designdescribed by the JSON file with name filename, assuming it was created usingDesign.write_design_file().- Return type:
- static from_json_serializable(json_map)[source]
- Parameters:
json_map (dict[str, Any]) – JSON serializable object encoding this
Design, as returned byDesign.to_json_serializable().- Returns:
Designrepresented by dict json_map, assuming it was created byDesign.to_json_serializable(). No constraints are populated.- Return type:
- add_domain(name, pool=None, fixed_sequence=None, allow_if_exists=False)[source]
Adds a
Domainwith name name to thisDesign, and returns it.This is useful in combination with
Design.add_strand()for creating strands by giving domain names instead of explicitDomainobjects. This way, the domains can be associated to aDomainPoolor have a fixed sequence assigned at the time of creation, without needing to modify theDomainobjects after they are created byDesign.add_strand().- Parameters:
name (str) – name of the
Domainto add. Raises exception if this domain has already been added to thisDesign.pool (DomainPool | None) –
DomainPoolto associate with thisDomain. Exactly one of pool and fixed_sequence can be specified.fixed_sequence (str | None) – fixed sequence to assign to this
Domain. Exactly one of pool and fixed_sequence can be specified.allow_if_exists (bool) – if False (default), raises exception if a
Domainwith name name already exists in thisDesign(or its complement).
- Return type:
- add_strand(domain_names=None, domains=None, starred_domain_indices=None, group='default_strand_group', name=None, label=None, vendor_fields=None)[source]
This is an alternative way to create strands instead of calling the
Strandconstructor explicitly. It behaves similarly to theStrandconstructor, but it has an option to specifyDomain’s simply by giving a name.A
Strandcan be created either by listing explicitDomainobjects via parameter domains (as in theStrandconstructor), or by giving names via parameter domain_names. If domain_names is specified, then by convention those that end with a*are assumed to be starred.In particular,
Domainobjects are created as needed, whenever theDesignsees a new domain name that has not been encountered. Also,Domain’s created in this way are “interned” as variables in a cache stored in theDesignobject; no twoDomain’s with the same name in this design will be created, and subsequent uses of the same name will refer to the sameDomainobject.- Parameters:
domain_names (list[str] | None) – Names of the
Domain’s on thisStrand.Domainobjects are created by theDesignas needed whenever a new domain name is specified; if the domain name has already been used (or its complement via the convention that names ending in a * are the complement of the domain whose name is equal but without ending in a *), then the sameDomainobject is reused. Mutually exclusive withStrand.domainsandStrand.starred_domain_indices.domains (list[Domain] | None) – list of
Domain’s on thisStrand. Mutually exclusive withStrand.domain_names, and must be specified jointly withStrand.starred_domain_indices.starred_domain_indices (Iterable[int] | None) – Indices of
Domain’s in domains that are starred. Mutually exclusive withStrand.domain_names, and must be specified jointly withStrand.domains.group (str) – name of group of this
Strand.name (str | None) – Name of this
Strand.label (str | None) – Label to associate with this
Strand.vendor_fields (VendorFields | None) –
VendorFieldsobject to associate with thisStrand; needed to call methods for exporting to IDT formats (e.g.,Strand.write_idt_bulk_input_file())
- Returns:
the
Strandthat is created- Return type:
- modifications(mod_type=None)[source]
Returns either set of all
modifications.Modification’s in thisDesign, or set of all modifications of a given type (5’, 3’, or internal).- Parameters:
mod_type (nm.ModificationType | None) – type of modifications (5’, 3’, or internal); if not specified, all three types are returned
- Returns:
set of all modifications in this
Design(possibly of a given type).- Return type:
set[nm.Modification]
- to_idt_bulk_input_format(delimiter=',', domain_delimiter='', key=None, warn_duplicate_name=False, only_strands_with_vendor_fields=False, strands=None)[source]
Called by
Design.write_idt_bulk_input_file()to determine what string to write to the file. This function can be used to get the string directly without creating a file.Parameters have the same meaning as in
Design.write_idt_bulk_input_file().- Returns:
string that is written to the file in the method
Design.write_idt_bulk_input_file().- Parameters:
- Return type:
str
- write_idt_bulk_input_file(*, filename=None, directory='.', key=None, extension=None, delimiter=',', domain_delimiter='', warn_duplicate_name=True, only_strands_with_vendor_fields=False, strands=None)[source]
Write
.idttext file encoding the strands of thisDesignwith the fieldStrand.vendor_fields, suitable for pasting into the “Bulk Input” field of IDT (Integrated DNA Technologies, Coralville, IA, https://www.idtdna.com/), with the output file having the same name as the running script but with.pychanged to.idt, unless filename is explicitly specified. For instance, if the script is namedmy_origami.py, then the sequences will be written tomy_origami.idt. If filename is not specified but extension is, then that extension is used instead ofidt. At least one of filename or extension must beNone.The string written is that returned by
Design.to_idt_bulk_input_format().- Parameters:
filename (str) – optional custom filename to use (instead of currently running script)
directory (str) – specifies a directory in which to place the file, either absolute or relative to the current working directory. Default is the current working directory.
key (KeyFunction[Strand] | None) – key function used to determine order in which to output strand sequences. Some useful defaults are provided by
strand_order_key_function()extension (str | None) – alternate filename extension to use (instead of idt)
delimiter (str) – is the symbol to delimit the four IDT fields name,sequence,scale,purification.
domain_delimiter (str) – symbol(s) to put in between DNA sequences of different domains in a strand.
warn_duplicate_name (bool) – if
Trueprints a warning when two differentStrand’s have the sameVendorFields.nameand the sameStrand.sequence(). A ValueError is raised (regardless of the value of this parameter) if two differentStrand’s have the same name but different sequences, IDT scales, or IDT purifications.only_strands_with_vendor_fields (bool) – If False (the default), all non-scaffold sequences are output, with reasonable default values chosen if the field
Strand.vendor_fieldsis missing. If True, then strands lacking the fieldStrand.vendor_fieldswill not be exported.strands (Iterable[Strand] | None) – strands to export; if not specified, all strands in design are exported. NOTE: it is not checked that each
Strandin strands is actually contained in this any:Design
- Return type:
None
- write_idt_plate_excel_file(*, filename=None, directory='.', key=None, warn_duplicate_name=False, only_strands_with_vendor_fields=False, use_default_plates=True, warn_using_default_plates=True, plate_type=PlateType.wells96, strands=None)[source]
Write
.xls(Microsoft Excel) file encoding the strands of thisDesignwith the fieldStrand.vendor_fields, suitable for uploading to IDT (Integrated DNA Technologies, Coralville, IA, https://www.idtdna.com/) to describe a 96-well or 384-well plate (https://www.idtdna.com/site/order/plate/index/dna/), with the output file having the same name as the running script but with.pychanged to.xls, unless filename is explicitly specified. For instance, if the script is namedmy_origami.py, then the sequences will be written tomy_origami.xls.If the last plate has fewer than 24 strands for a 96-well plate, or fewer than 96 strands for a 384-well plate, then the last two plates are rebalanced to ensure that each plate has at least that number of strands, because IDT charges extra for a plate with too few strands: https://www.idtdna.com/pages/products/custom-dna-rna/dna-oligos/custom-dna-oligos
- Parameters:
filename (str) – custom filename if default (explained above) is not desired
directory (str) – specifies a directory in which to place the file, either absolute or relative to the current working directory. Default is the current working directory.
key (KeyFunction[Strand] | None) –
key function used to determine order in which to output strand sequences. Some useful defaults are provided by
strand_order_key_function()warn_duplicate_name (bool) – if
Trueprints a warning when two differentStrand’s have the sameVendorFields.nameand the sameStrand.sequence(). A ValueError is raised (regardless of the value of this parameter) if two differentStrand’s have the same name but different sequences, IDT scales, or IDT purifications.only_strands_with_vendor_fields (bool) – If False (the default), all non-scaffold sequences are output, with reasonable default values chosen if the field
Strand.vendor_fieldsis missing. (though scaffold is included if export_scaffold is True). If True, then strands lacking the fieldStrand.vendor_fieldswill not be exported. If False, then use_default_plates must be True.use_default_plates (bool) – Use default values for plate and well (ignoring those in
Strand.vendor_fields, which may be None). If False, each Strand to export must have the fieldStrand.vendor_fields, so in particular the parameter only_strands_with_idt must be True.warn_using_default_plates (bool) – specifies whether, if use_default_plates is True, to print a warning for strands whose
Strand.vendor_fieldshas the fieldsVendorFields.plateandVendorFields.well, since use_default_plates directs these fields to be ignored.plate_type (PlateType) – a
PlateTypespecifying whether to use a 96-well plate or a 384-well plate if the use_default_plates parameter isTrue. Ignored if use_default_plates isFalse, because in that case the wells are explicitly set by the user, who is free to use coordinates for either plate type.strands (Iterable[Strand] | None) – strands to export; if not specified, all strands in design are exported. NOTE: it is not checked that each
Strandin strands is actually contained in this any:Design
- Return type:
None
- domain_pools()[source]
- Returns:
list of all
DomainPool’s in thisDesign- Return type:
list[DomainPool]
- domains_by_pool_name(domain_pool_name)[source]
- Parameters:
domain_pool_name (str) – name of a
DomainPool- Returns:
the
Domain’s in domain_pool- Return type:
list[Domain]
- static from_scadnano_file(sc_filename, fix_assigned_sequences=True, ignored_strands=None)[source]
Converts a scadnano Design stored in file named sc_filename to a a
Designfor doing DNA sequence design. Each Strand name and Domain name from the scadnano Design are assigned asStrand.nameandDomain.namein the obvious way. Assumes each Strand label is a string describing the strand group.The scadnano package must be importable.
Also assigns sequences from domains in sc_design to those of the returned
Design. If fix_assigned_sequences is true, then these DNA sequences are fixed; otherwise not.- Parameters:
sc_filename (str) – Name of file containing scadnano Design.
fix_assigned_sequences (bool) – Whether to fix the sequences that are assigned from those found in sc_design.
ignored_strands (Iterable[Strand] | None) – Strands to ignore
- Returns:
An equivalent
Design, ready to be given constraints for DNA sequence design.- Raises:
TypeError – If any scadnano strand label is not a string.
- Return type:
- static from_scadnano_design(sc_design, fix_assigned_sequences=True, ignored_strands=None, warn_existing_domain_labels=True)[source]
Converts a scadnano Design sc_design to a a
Designfor doing DNA sequence design. Each Strand name and Domain name from the scadnano Design are assigned asStrand.nameandDomain.namein the obvious way. Assumes each Strand label is a string describing the strand group.The scadnano package must be importable.
Also assigns sequences from domains in sc_design to those of the returned
Design. If fix_assigned_sequences is true, then these DNA sequences are fixed; otherwise not.- Parameters:
sc_design (sc.Design) – Instance of scadnano.Design from the scadnano Python scripting library.
fix_assigned_sequences (bool) – Whether to fix the sequences that are assigned from those found in sc_design.
ignored_strands (Iterable[Strand] | None) – Strands to ignore; none are ignore if not specified.
warn_existing_domain_labels (bool) – If True, logs warning when dsd
Domainalready has a label and so does scadnano domain, since scadnano label will not be assigned to the dsdDomain.
- Returns:
An equivalent
Design, ready to be given constraints for DNA sequence design.- Raises:
TypeError – If any scadnano strand label is not a string.
- Return type:
- assign_fields_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False, ignore_domains=False)[source]
Assigns DNA sequence, VendorFields, and StrandGroups (as a key in a scadnano String.label dict under key “group”). TODO: document more
- Parameters:
sc_design (Design)
ignored_strands (Collection[Strand])
overwrite (bool)
ignore_domains (bool)
- assign_sequences_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False, ignore_domains=False, _skip_compute_derived_fields=False)[source]
Assigns sequences from this
Designinto sc_design.Also writes a label to each scadnano strand. If the label is None a new one is created as a dict with a key group. The name of the StrandGroup of the nuad design is the value to assign to this key. If the scadnano strand label is already a dict, it adds this key. If the strand label is not None or a dict, an exception is raised.
Assumes that each domain name in domains in sc_design is a
Domain.nameof aDomainin thisDesign.If multiple strands in sc_design share the same name, then all of them are assigned the DNA sequence of the nuad
Strandwith that name.- Parameters:
sc_design (Design) – a scadnano design.
ignored_strands (Collection[Strand]) – strands in the scadnano design that are to be ignored by the sequence designer.
overwrite (bool) – if True, overwrites existing sequences; otherwise gives an error if an existing sequence disagrees with the newly assigned sequence
ignore_domains (bool) – if True, does not check that domain names match between the designs. It is still required that each strand name appearing in sc_design is the name of a strand in this design. Cannot be specified if overwrite is False.
_skip_compute_derived_fields (bool)
- Return type:
None
Returns a list of pairs (nuad_strand, sc_strands), where nuad_strand has the same name as all scadnano Strands in sc_strands, but only scadnano strands are included in the list that do not appear in ignored_strands.
- assign_strand_groups_to_labels(sc_design, ignored_strands=(), overwrite=False)[source]
TODO: document this
- Parameters:
sc_design (Design)
ignored_strands (Iterable[Strand])
overwrite (bool)
- Return type:
None
- assign_idt_fields_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]
Assigns
VendorFieldsfrom thisDesigninto sc_design.If multiple strands in sc_design share the same name, then all of them are assigned the IDT fields of the dsd
Strandwith that name.- Parameters:
sc_design (Design) – a scadnano design.
ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be not assigned.
overwrite (bool) – whether to overwrite existing fields.
- Raises:
ValueError – if scadnano strand already has any modifications assigned
- Return type:
None
- assign_modifications_to_scadnano_design(sc_design, ignored_strands=(), overwrite=False)[source]
Assigns
modifications.Modification’s from thisDesigninto sc_design.If multiple strands in sc_design share the same name, then all of them are assigned the modifications of the dsd
Strandwith that name.- Parameters:
sc_design (Design) – a scadnano design.
ignored_strands (Iterable[Strand]) – strands in the scadnano design that are to be not assigned.
overwrite (bool) – whether to overwrite existing fields in scadnano design
- Raises:
ValueError – if scadnano strand already has any modifications assigned
- Return type:
None
- copy_sequences_from(other)[source]
Assuming every
Domainin thisDesignis has a matching (same name)Domainin other, copies sequences from other into thisDesign.
- class constraints.Constraint(description, short_description='', weight=1.0, score_transfer_function=None)[source]
Abstract base class of all “soft” constraints to apply when running
search.search_for_dna_sequences(). Unlike aNumpyFilteror aSequenceFilter, which disallow certain DNA sequences from ever being assigned to aDomain, aConstraintcan be violated during the search. The goal of the search is to reduce the number of violatedConstraint’s. Seesearch.search_for_dna_sequences()for a more detailed description of how the search algorithm interacts with the constraints.You will not use this class directly, but instead its concrete subclasses
DomainConstraint,StrandConstraint,DomainPairConstraint,StrandPairConstraint,ComplexConstraint, which are subclasses ofSingularConstraint,DomainsConstraint,StrandsConstraint,DomainPairsConstraint,StrandPairsConstraint, which are subclasses ofBulkConstraint.- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
- description: str
Description of the constraint, e.g., ‘strand has secondary structure exceeding -2.0 kcal/mol’ suitable for printing in a long text report.
- short_description: str = ''
Very short description of the constraint suitable for compactly logging to the screen, where many of these short descriptions must fit onto one line, e.g., ‘strand ss’ or ‘dom pair nupack’
- weight: float = 1.0
Constant multiplier Weight of the problem; the higher the total weight of all the
Constraint’s aDomainhas caused, the greater likelihood its sequence is changed when stochastically searching for sequences to satisfy all constraints.
- score_transfer_function: Callable[[float], float] | None = None
See
nuad.search.SearchParameters.score_transfer_function.If specified, this will override the one specified in
nuad.search.SearchParameters.score_transfer_function.
- abstract static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
- Return type:
None
- class constraints.Result(excess, summary=None, value=None, unit=None)[source]
A
Resultis returned from the functionSingularConstraint.evaluate, and a list ofResult’s is returned from the functionBulkConstraint.evaluate_bulk, describing the result of evaluating the constraint on the design “part”.A
Resultmust have an “excess” and “summary” specified.Optionally one may also specify a “value”, which helps in graphically displaying the results of evaluating constraints using the function
display_report().For example, if the constraint checks that the NUPACK complex free energy of a strand is at least -2.5 kcal/mol, and a strand has energy -3.4 kcal/mol, then the following are sensible values for these fields:
value=-3.4unit="kcal/mol"excess=-0.9summary="-3.4 kcal/mol"
- Parameters:
excess (float)
summary (str | None)
value (float | None)
unit (str | None)
- __init__(excess, summary=None, value=None, unit=None)[source]
- Parameters:
excess (float)
summary (str | None)
value (float | None)
unit (str | None)
- Return type:
None
- excess: float
The excess is a nonnegative value that is turned into a score, and the search minimizes the total score of all constraint evaluations. Setting this to 0 (or a negative value) means the constraint is satisfied, and setting it to a positive value means the constraint is violated. The interpretation is that the larger excess is, the more the constraint is violated.
For example, a common value for excess is the amount by which the NUPACK complex free energy exceeds a threshold.
- value: float | None = None
If this is a “numeric” constraint, i.e., checking some number such as the complex free energy of a strand and comparing it to a threshold, this is the “raw” value. It is optional, but if specified, then the raw values can be plotted in a Jupyter notebook by the function
display_report().Optional units (e.g., ‘kcal/mol’) can be specified in the field
Result.units.
- unit: str | None = None
Optional units for
Result.value, e.g.,'kcal/mol'.If specified, then the units are used in text reports and to label the y-axis in plots created by
search.display_report().
- score: float = 0.0
Set by the search algorithm based on
Result.excessas well as other data such as the constraint’s weight and theSearchParameters.score_transfer_function.
- part: DesignPart
Set by the search algorithm based on the part that was evaluated.
- property summary: str
This string is displayed in the text report on constraints, after the name of the “part” (e.g., strand, pair of domains, pair of strands).
It can be set explicitly, or calculated from
Result.valueif not set explicitly.
- class constraints.SingularConstraint(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, evaluate: 'Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]]' = <function SingularConstraint.<lambda> at 0x7dc03455fb80>, parallel: 'bool' = False)[source]
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
- evaluate()
Essentially a wrapper for a function that evaluates the
Constraint. It takes as input a tuple of DNA sequences (Python strings) and an optionalPart, wherePartis one ofDomain,Strand,DomainPair,StrandPair, orComplex(the latter being an alias for arbitrary-length tuple ofStrand’s).The second argument will be None if
SingularConstraint.parallelis True (since it’s more expensive to serialize theDomainandStrandobjects than strings for passing data to processes executing in parallel).Thus, if the
Constraintneeds to use more data about thePartthan just its DNA sequence, by accessing the second argument,Constraint.parallelshould be set to False.It should return a
Resultobject.
- parallel: bool = False
Whether or not to use parallelization across multiple processes to take advantage of multiple processors/cores, by calling
SingularConstraint.evaluateon different DesignParts in separate processes.
- call_evaluate(seqs, part, score_transfer_function)[source]
Evaluates this
Constraintusing functionSingularConstraint.evaluatesupplied in constructor.- Parameters:
seqs (tuple[str, ...]) – sequence(s) of relevant
Part, e.g., if part is a pair ofStrand’s, then seqs is a pair of stringspart (DesignPart | None) – the
Partto be evaluated. Might be None if parallelization is being used, since it is cheaper to serialize only the sequence(s) than the entirePartfor passing to other processes to evaluate in parallel.score_transfer_function (Callable[[float], float]) – function to apply to the excess value of the
Resultreturned by the evaluate function, to compute the score.
- Returns:
a
Resultobject- Return type:
Result[DesignPart]
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
- Return type:
None
- class constraints.BulkConstraint(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, evaluate_bulk: 'Callable[[Sequence[DesignPart]], list[Result]]' = <function BulkConstraint.<lambda> at 0x7dc03455fee0>)[source]
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
- Return type:
None
- class constraints.ConstraintWithDomains(domains: 'tuple[Domain, ...] | None' = None)[source]
- Parameters:
domains (tuple[Domain, ...] | None)
- class constraints.ConstraintWithStrands(strands: 'tuple[Strand, ...] | None' = None)[source]
- Parameters:
strands (tuple[Strand, ...] | None)
- class constraints.DomainConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domains=None)[source]
Constraint that applies to a single
Domain.- Parameters:
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domains=None)
- class constraints.StrandConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strands=None)[source]
Constraint that applies to a single
Strand.- Parameters:
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strands=None)
- class constraints.ConstraintWithDomainPairs(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, domain_pairs: 'tuple[DomainPair, ...] | None' = None, pairs: 'InitVar[Iterable[tuple[Domain, Domain], ...] | None]' = None, check_domain_against_itself: 'bool' = True)[source]
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)
- domain_pairs: tuple[DomainPair, ...] | None = None
List of
DomainPair’s to check; if not specified, all pairs inDesignare checked.This can be specified manmually, or alternately is set internally in the constructor based on the optional
__init__parameter pairs.
- pairs: InitVar[Iterable[tuple[Domain, Domain], ...] | None] = None
Init-only variable (specified in constructor, but is not a field in the class) for specifying pairs of domains to check; if not specified, all pairs in
Designare checked, unlessConstraintWithDomainPairs.domain_pairsis specified.
- check_domain_against_itself: bool = True
Whether to check a domain against itself when checking all pairs of
Domain’s in theDesign. Only used ifConstraintWithDomainPairs.pairsis not specified, otherwise it is ignored.
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, domain_pairs=None, pairs=None, check_domain_against_itself=True)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)
- Return type:
None
- class constraints.ConstraintWithStrandPairs(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, strand_pairs: 'tuple[StrandPair, ...] | None' = None, pairs: 'InitVar[Iterable[tuple[Strand, Strand], ...] | None]' = None, check_strand_against_itself: 'bool' = True)[source]
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)
- strand_pairs: tuple[StrandPair, ...] | None = None
List of
StrandPair’s to check; if not specified, all pairs inDesignare checked.This can be specified manmually, or alternately is set internally in the constructor based on the optional
__init__parameter pairs.
- pairs: InitVar[Iterable[tuple[Strand, Strand], ...] | None] = None
Init-only variable (specified in constructor, but is not a field in the class) for specifying pairs of strands; if not specified, all pairs in
Designare checked, unlessConstraintWithStrandPairs.strand_pairsis specified.
- check_strand_against_itself: bool = True
Whether to check a strand against itself when checking all pairs of
Strand’s in theDesign. Only used ifConstraintWithStrandPairs.pairsis not specified, otherwise it is ignored.
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, strand_pairs=None, pairs=None, check_strand_against_itself=True)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)
- Return type:
None
- class constraints.DomainPairConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]
Constraint that applies to a pair of
Domain’s.These should be symmetric, meaning that the constraint will give the same evaluation whether its evaluate method is given the pair (domain1, domain2), or the pair (domain2, domain1).
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, domain_pairs=None, pairs=None, check_domain_against_itself=True)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)
- Return type:
None
- class constraints.StrandPairConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]
Constraint that applies to a pair of
Strand’s.These should be symmetric, meaning that the constraint will give the same evaluation whether its evaluate method is given the pair (strand1, strand2), or the pair (strand2, strand1).
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, strand_pairs=None, pairs=None, check_strand_against_itself=True)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate (Callable[[tuple[str, ...], DesignPart | None], Result[DesignPart]])
parallel (bool)
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)
- Return type:
None
- class constraints.DomainsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domains=None)[source]
Constraint that applies to a several
Domain’s.The difference with
DomainConstraintis that the caller may want to process allDomain’s at once, e.g., by giving many of them to a third-party program such as ViennaRNA, which may be more efficient than repeatedly calling a Python function.It is assumed that the constraint works by checking one
Domainat a time. After computing initial violations of constraints, subsequent calls to this constraint only give the domain that was mutated, not the entire ofDomain’s in the wholeDesign.- Parameters:
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domains=None)
- class constraints.StrandsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strands=None)[source]
Constraint that applies to a several
Strand’s.The difference with
StrandConstraintis that the caller may want to process allStrand’s at once, e.g., by giving many of them to a third-party program such as ViennaRNA.It is assumed that the constraint works by checking one
Strandat a time. After computing initial violations of constraints, subsequent calls to this constraint only give strands containing the domain that was mutated, not the entire ofStrand’s in the wholeDesign.- Parameters:
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strands=None)
- class constraints.DomainPairsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domain_pairs=None, pairs=None, check_domain_against_itself=True)[source]
Similar to
DomainsConstraintbut operates on a specified list of pairs ofDomain’s.- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, domain_pairs=None, pairs=None, check_domain_against_itself=True)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
domain_pairs (tuple[DomainPair, ...] | None)
pairs (InitVar[Iterable[tuple[Domain, Domain], ...] | None])
check_domain_against_itself (bool)
- Return type:
None
- class constraints.StrandPairsConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strand_pairs=None, pairs=None, check_strand_against_itself=True)[source]
Similar to
StrandsConstraintbut operates on a specified list of pairs ofStrand’s.- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)
- static part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, strand_pairs=None, pairs=None, check_strand_against_itself=True)
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
evaluate_bulk (Callable[[Sequence[DesignPart]], list[Result]])
strand_pairs (tuple[StrandPair, ...] | None)
pairs (InitVar[Iterable[tuple[Strand, Strand], ...] | None])
check_strand_against_itself (bool)
- Return type:
None
- constraints.verify_designs_match(design1, design2, check_fixed=True)[source]
Verifies that two designs match, other than their constraints. This is useful when loading a design that has been saved in the middle of searching for DNA sequences, to verify that it matches a design created before the DNA sequence search started.
- Parameters:
- Raises:
ValueError – If the designs do not match. Here is what is checked: - strand names and group names appear in the same order - domain names and pool names appear in the same order in strands with the same name -
Domain.fixedmatches betweenDomain’s- Return type:
None
- constraints.convert_threshold(threshold, key)[source]
- Parameters:
threshold (float | dict[T, float]) – either a single float, or a dictionary mapping instances of T to floats
key (T) – instance of T
- Returns:
threshold for key
- Return type:
float
- constraints.nupack_domain_free_energy_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, parallel=False, description=None, short_description='strand_ss_nupack', domains=None)[source]
Returns constraint that checks individual
Domain’s for excessive interaction using NUPACK’s pfunc.NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
threshold (float) – energy threshold in kcal/mol
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
weight (float) – how much to weigh this
Constraintscore_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.
domains (Iterable[Domain] | None) –
Domain’s to check; if not specified, all domains are checked.description (str | None) – detailed description of constraint suitable for putting in report; if not specified a reasonable default is chosen
short_description (str) – short description of constraint suitable for logging to stdout
- Returns:
the constraint
- Return type:
- constraints.forbidden_substring_strand_constraint(forbidden_substring, weight=1.0, score_transfer_function=None, parallel=False, description=None, short_description=None, strands=None)[source]
Returns constraint that checks that a given substring does not appear in the sequence of any of the given
Strand’s. For individual domains, this is better implemented usingForbiddenSubstringFilter, but that will not catch substrings such as GGGG that cross domain boundaries.- Parameters:
forbidden_substring (str | Iterable[str])
weight (float)
score_transfer_function (Callable[[float], float] | None)
parallel (bool)
description (str | None)
short_description (str | None)
strands (Iterable[Strand] | None)
- Return type:
- constraints.nupack_strand_free_energy_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, parallel=False, description=None, short_description='strand_ss_nupack', strands=None)[source]
Returns constraint that checks individual
Strand’s for excessive interaction using NUPACK’s pfunc. This is the so-called “complex free energy”: https://docs.nupack.org/definitions/#complex-free-energyNUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
threshold (float) – energy threshold in kcal/mol
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
weight (float) – how much to weigh this
Constraintscore_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.
strands (Iterable[Strand] | None) – Strands to check; if not specified, all strands are checked.
description (str | None) – detailed description of constraint suitable for putting in report; if not specified a reasonable default is chosen
short_description (str) – short description of constraint suitable for logging to stdout
- Returns:
the constraint
- Return type:
- constraints.nupack_domain_pair_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, parallel=False, weight=1.0, score_transfer_function=None, description=None, short_description='dom_pair_nupack', pairs=None)[source]
Returns constraint that checks given pairs of
Domain’s for excessive interaction using NUPACK’s pfunc executable. Each of the four combinations of seq1, seq2 and their Watson-Crick complements are compared.- Parameters:
threshold (float) – Energy threshold in kcal/mol.
temperature (float) – Temperature in Celsius
sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
parallel (bool) – Whether to test each pair of
Domain’s in parallel (i.e., sets fieldConstraint.parallel)weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.description (str | None) – Detailed description of constraint suitable for summary report.
short_description (str) – See
Constraint.short_descriptionpairs (Iterable[tuple[Domain, Domain]] | None) – Pairs of
Domain’s to compare; if not specified, checks all pairs (including aDomainagainst itself).
- Returns:
The
DomainPairConstraint.- Return type:
- constraints.nupack_strand_pair_constraints_by_number_matching_domains(thresholds, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, ignore_missing_thresholds=False)[source]
Convenience function for creating many constraints as returned by
nupack_strand_pair_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.Optional parameters descriptions and short_descriptions are also dicts keyed by the same keys.
Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.
It is also common to set different thresholds according to the lengths of the strands. This can be done by calling
strand_pairs_by_lengths()to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.- Parameters:
thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
sodium (float) – concentration of Na+ in molar
magnesium (float) – concentration of Mg++ in molar
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of
Strand’s in parallel.strands (Iterable[Strand] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in pairs. Mutually exclusive with pairs.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.
- Returns:
list of constraints, one per threshold in thresholds
- Return type:
list[StrandPairConstraint]
- constraints.rna_duplex_strand_pair_constraints_by_number_matching_domains(thresholds, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, ignore_missing_thresholds=False)[source]
Convenience function for creating many constraints as returned by
rna_duplex_strand_pair_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.Optional parameters descriptions and short_descriptions are also dicts keyed by the same keys.
Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.
It is also common to set different thresholds according to the lengths of the strands. This can be done by calling
strand_pairs_by_lengths()to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.- Parameters:
thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
sodium (float) – concentration of Na+ in molar
magnesium (float) – concentration of Mg++ in molar
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of
Strand’s in parallel.strands (Iterable[Strand] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in pairs. Mutually exclusive with pairs.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.
- Returns:
list of constraints, one per threshold in thresholds
- Return type:
list[StrandPairConstraint]
- constraints.nupack_strand_pair_constraint(threshold, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, description=None, short_description='strand_pair_nupack', parallel=False, pairs=None)[source]
Returns constraint that checks given pairs of
Strand’s for excessive interaction using NUPACK’s pfunc function.NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
threshold (float) – Energy threshold in kcal/mol
temperature (float) – Temperature in Celsius
sodium (float) – concentration of Na+ in molar
magnesium (float) – concentration of Mg++ in molar
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.parallel (bool) – Whether to use parallelization by running constraint evaluation in separate processes to take advantage of multiple cores.
description (str | None) – Detailed description of constraint suitable for report.
short_description (str) – See
Constraint.short_descriptionpairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs (including aStrandagainst itself).
- Returns:
The
StrandPairConstraint.- Return type:
- constraints.rna_duplex_strand_pair_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_dup_strand_pairs', parallel=False, pairs=None, strands=None)[source]
Returns constraint that checks given pairs of
Strand’s for excessive interaction using Vienna RNA’s RNAduplex executable.Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function
rna_duplex_strand_pairs_constraints_by_number_matching_domains()is useful for this purpose, returning a list ofStrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.- Parameters:
threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that
temperature (float) – Temperature in Celsius.
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.description (str | None) – See
Constraint.descriptionshort_description (str) – See
Constraint.short_descriptionpairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in design.max_energy – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.
gu_wobble – Whether to allow GU wobble pairs; I honestly don’t know what this means for DNA, but it’s an option in RNAduplex, and it makes a difference in calculating the energy. NOTE: this is a global property of the ViennaRNA package, so this must be set the same for every constraint that uses ViennaRNA.
parallel (bool)
strands (Iterable[Strand] | None)
- Returns:
The
StrandPairConstraint.- Return type:
- constraints.chunker(sequence, chunk_length=None, num_chunks=None)[source]
Collect data into fixed-length chunks or blocks, e.g., chunker(‘ABCDEFG’, 3) –> ABC DEF G
- Parameters:
sequence (Sequence[T]) – Sequence (list or tuple) of items.
chunk_length (int | None) – Length of each chunk. Mutually exclusive with num_chunks.
num_chunks (int | None) – Number of chunks. Mutually exclusive with chunk_length.
- Returns:
list of num_chunks lists, each list of length chunk_length (one of num_chunks or chunk_length will be calculated from the other).
- Return type:
list[list[T]]
- constraints.cpu_count(logical=False)[source]
Counts the number of physical CPUs (cores). For greatest accuracy, requires the 3rd party psutil package to be installed.
- Parameters:
logical (bool) – Whether to count number of logical processors or physical CPU cores.
- Returns:
Number of physical CPU cores if logical is False and package psutils is installed; otherwise, the number of logical processors.
- Return type:
int
- constraints.rna_duplex_domain_pairs_constraint(threshold, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_dup_dom_pairs', pairs=None, domains=None)[source]
Returns constraint that checks given pairs of
Domain’s for excessive interaction using Vienna RNA’s duplexfold function (what RNADuplex command-line tool does).- Parameters:
threshold (float) – energy threshold
temperature (float) – temperature in Celsius
weight (float) – how much to weigh this
Constraintscore_transfer_function (Callable[[float], float]) – See
Constraint.score_transfer_function.description (str | None) – long description of constraint suitable for printing in report file
short_description (str) – short description of constraint suitable for logging to stdout
pairs (Iterable[tuple[Domain, Domain]] | None) – pairs of
Domain’s to compare; if neither this nor domains specified, checks all pairsdomains (Iterable[Domain] | None) –
Domain’s to compare (all pairs with replacement); if neither this nor pairs specified, checks all pairsparameters_filename – name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_duplex_multiple()
- Returns:
constraint
- Return type:
- constraints.rna_plex_domain_pairs_constraint(threshold, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_plex_dom_pairs', pairs=None, parameters_filename='dna_mathews1999.par')[source]
Returns constraint that checks given pairs of
Domain’s for excessive interaction using Vienna RNA’s RNAplex executable.- Parameters:
threshold (float) – energy threshold
temperature (float) – temperature in Celsius
weight (float) – how much to weigh this
Constraintscore_transfer_function (Callable[[float], float]) – See
Constraint.score_transfer_function.description (str | None) – long description of constraint suitable for printing in report file
short_description (str) – short description of constraint suitable for logging to stdout
pairs (Iterable[tuple[Domain, Domain]] | None) – pairs of
Domain’s to compare; if not specified, checks all pairsparameters_filename (str) – name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_duplex_multiple()
- Returns:
constraint
- Return type:
- constraints.nupack_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, description=None, short_description='dom_pair_nupack_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Similar to
rna_plex_domain_pairs_nonorthogonal_constraint(), but uses NUPACK instead of RNAplex. Only two parameters sodium and magnesium are different; documented here.- Parameters:
sodium (float) – concentration of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – concentration of magnesium (Mg++) in moles per liter
thresholds (dict[tuple[Domain, bool, Domain, bool] | tuple[Domain, Domain], tuple[float, float]])
temperature (float)
weight (float)
score_transfer_function (Callable[[float], float] | None)
description (str | None)
short_description (str)
parameters_filename (str)
max_energy (float)
- Return type:
- constraints.rna_plex_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_plex_dom_pairs_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Returns constraint that checks given pairs of
Domain’s for interaction energy in a given interval, using ViennaRNA’s RNAplex executable.This can be used to implement “nonorthogonal” domains as described in these papers:
The “binding affinity table” as described in the first paper is implemented as the thresholds parameter of this function.
- Parameters:
thresholds (dict[tuple[Domain, bool, Domain, bool] | tuple[Domain, Domain], tuple[float, float]]) –
dict mapping pairs of
Domain’s, along with Booleans to indicate whether eitherDomainis starred, to pairs of energy thresholds. Alternately, the key can be a pair ofDomain’s(a,b)without any Booleans; in this case only the unstarred versions of the domains are checked; this is equivalent to the key(a, False, b, False).For example, if the dict is
{ (a, False, b, False): (-9.0, -8.0), (a, False, b, True): (-5.0, -3.0), (a, True, c, False): (-1.0, 0.0), (a, d): (-7.0, -6.0), }
then the constraint ensures that RNAplex energy for
(a,b)is between -9.0 and -8.0 kcal/mol,(a,b*)is between -5.0 and -3.0 kcal/mol,(a*,c)is between -1.0 and 0.0 kcal/mol, and(a,d)is between -7.0 and -6.0 kcal/mol.
For all other pairs of domains not listed as keys in the dict (such as
(b,c)or(a*,b*)above), the constraint is not checked.temperature (float) – temperature in Celsius
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float]) – See
Constraint.score_transfer_function.description (str | None) – See
Constraint.description.short_description (str) – See
Constraint.short_description.parameters_filename (str) – name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_duplex_multiple()max_energy (float) – maximum energy to return; if the RNAplex returns a value larger than this, then this value is used instead
- Returns:
constraint
- Return type:
- constraints.rna_duplex_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_plex_dom_pairs_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Similar to
rna_plex_domain_pairs_nonorthogonal_constraint(), but uses RNAduplex instead of RNAplex.- Parameters:
- Return type:
- constraints.rna_multifold_domain_pairs_nonorthogonal_constraint(thresholds, temperature=37, weight=1.0, score_transfer_function=<function <lambda>>, description=None, short_description='rna_multifold_dom_pairs_nonorth', parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Similar to
rna_plex_domain_pairs_nonorthogonal_constraint(), but uses ViennaRNA’s multi-strand partition function (RNA.fold_compound(...).pf(), the Python analog of NUPACK’spfunc()) instead of RNAplex.- Parameters:
- Return type:
- constraints.strand_pairs_by_lengths(strands)[source]
Separates pairs of strands in strands by lengths. If there are n different strand lengths in strands, then there are ((n+1) choose 2) keys in the returned dict, one for each pair of lengths
(len1, len2), including pairs wherelen1 == len2. This key maps to a list of all pairs of strands in strands where the first strand has lengthlen1and the second has lengthlen2.
- constraints.strand_pairs_by_number_matching_domains(*, strands=None, pairs=None)[source]
Utility function for calculating number of complementary domains betweeen several pairs of strands.
Note that for the common use case that you want to create several constraints, each with their own threshold depending on the number of complementary domains, you should use functions such as
rna_duplex_strand_pairs_constraints_by_number_matching_domains()ornupack_strand_pair_constraints_by_number_matching_domains(), which in turn calls this function and then creates as many constraints as their are different numbers of complementary domains.- Parameters:
- Returns:
dict mapping integer (number of complementary
Domain’s) to the list of pairs of strands in strands with that number of complementary domains- Return type:
- constraints.rna_multifold_strand_pair_constraints_by_number_matching_domains(*, thresholds, temperature=37, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, ignore_missing_thresholds=False)[source]
Similar to
rna_duplex_strand_pair_constraints_by_number_matching_domains()but creates constraints as returned byrna_multifold_strand_pair_constraint().- Parameters:
thresholds (dict[int, float])
temperature (float)
weight (float)
score_transfer_function (Callable[[float], float] | None)
descriptions (dict[int, str] | None)
short_descriptions (dict[int, str] | None)
parallel (bool)
strands (Iterable[Strand] | None)
ignore_missing_thresholds (bool)
- Return type:
list[StrandPairConstraint]
- constraints.rna_duplex_strand_pairs_constraints_by_number_matching_domains(*, thresholds, temperature=37, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, parameters_filename='dna_mathews1999.par', ignore_missing_thresholds=False)[source]
Convenience function for creating many constraints as returned by
rna_duplex_strand_pairs_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.Optional parameters description and short_description are also dicts keyed by the same keys.
Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.
It is also common to set different thresholds according to the lengths of the strands. This can be done by calling
strand_pairs_by_lengths()to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.- Parameters:
thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of
Strand’s in parallel.strands (Iterable[Strand] | None) –
Strand’s to compare; if not specified, checks all in design. Mutually exclusive with pairs.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_duplex_multiple()ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.
- Returns:
list of constraints, one per threshold in thresholds
- Return type:
list[StrandPairsConstraint]
- constraints.rna_plex_strand_pairs_constraints_by_number_matching_domains(*, thresholds, temperature=37, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, parameters_filename='dna_mathews1999.par', ignore_missing_thresholds=False)[source]
Convenience function for creating many constraints as returned by
rna_plex_strand_pairs_constraint(), one for each threshold specified in parameter thresholds, based on number of matching (complementary) domains between pairs of strands.Optional parameters description and short_description are also dicts keyed by the same keys.
Exactly one of strands or pairs must be specified. If strands, then all pairs of strands (including a strand with itself) will be checked; otherwise only those pairs in pairs will be checked.
It is also common to set different thresholds according to the lengths of the strands. This can be done by calling
strand_pairs_by_lengths()to separate first by lengths in a dict mapping length pairs to strand pairs, then calling this function once for each (key, value) in that dict, giving the value (which is a list of pairs of strands) as the pairs parameter to this function.- Parameters:
thresholds (dict[int, float]) – Energy thresholds in kcal/mol. If k domains are complementary between the strands, then use threshold thresholds[k].
temperature (float) – Temperature in Celsius.
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.descriptions (dict[int, str] | None) – Long descriptions of constraint suitable for putting into constraint report.
short_descriptions (dict[int, str] | None) – Short descriptions of constraint suitable for logging to stdout.
parallel (bool) – Whether to test each pair of
Strand’s in parallel.strands (Iterable[Strand] | None) –
Strand’s to compare; if not specified, checks all in design. Mutually exclusive with pairs.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in strands, including each strand with itself. Mutually exclusive with strands.parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_plex_multiple()ignore_missing_thresholds (bool) – If True, then a key num left out of thresholds dict will cause no constraint to be returned for pairs of strands with num complementary domains. If False, then a ValueError is raised.
- Returns:
list of constraints, one per threshold in thresholds
- Return type:
list[StrandPairsConstraint]
- constraints.longest_complementary_subsequences_python_loop(arr1, arr2, gc_double)[source]
Like
longest_complementary_subsequences(), but uses a Python loop instead of numpy operations. This is slower, but is easier to understand and useful for testing.- Parameters:
arr1 (ndarray)
arr2 (ndarray)
gc_double (bool)
- Return type:
list[int]
- constraints.longest_complementary_subsequences_two_loops(arr1, arr2, gc_double)[source]
Calculate length of longest common subsequences between arr1[i] and arr2[i] for each i, storing in returned list result[i].
This uses two nested Python loops to calculate the whole dynamic programming table.
longest_complementary_subsequences()is slightly faster because it maintains only the diagonal of the DP table, and uses numpy vectorized operations to calculate the next diagonal of the table.When used for DNA sequences, this assumes arr2 has been reversed along axis 1, i.e., the sequences in arr1 are assumed to be oriented 5’ –> 3’, and the sequences in arr2 are assumed to be oriented 3’ –> 5’.
- Parameters:
arr1 (ndarray) – 2D array of DNA sequences, with each sequence represented as a 1D array of 0, 1, 2, 3 corresponding to A, C, G, T, respectively, with each row being a single DNA sequence oriented 5’ –> 3’.
arr2 (ndarray) – 2D array of DNA sequences, with each row being a single DNA sequence oriented 3’ –> 5’.
gc_double (bool) – Whether to double the score for G-C base pairs.
- Returns:
list ret of ints, where ret[i] is the length of the longest complementary subsequence between arr1[i] and arr2[i].
- Return type:
list[int]
- constraints.longest_complementary_subsequences(arr1, arr2, gc_double)[source]
Calculate length of longest common subsequences between arr1[i] and arr2[i] for each i, storing in returned list result[i].
Unlike
longest_complementary_subsequences_two_loops(), this uses only one Python loop, by using the “anti-diagonal” method for evaluating the dynamic programming table, calculating a whole anti-diagonal from the previous two in O(1) numpy commands.When used for DNA sequences, this assumes arr2 has been reversed along axis 1, i.e., the sequences in arr1 are assumed to be oriented 5’ –> 3’, and the sequences in arr2 are assumed to be oriented 3’ –> 5’.
- Parameters:
arr1 (ndarray) – 2D array of DNA sequences, with each sequence represented as a 1D array of 0, 1, 2, 3 corresponding to A, C, G, T, respectively, with each row being a single DNA sequence oriented 5’ –> 3’.
arr2 (ndarray) – 2D array of DNA sequences, with each row being a single DNA sequence oriented 3’ –> 5’.
gc_double (bool) – Whether to double the score for G-C base pairs. (assumes that integers 1,2 represent C,G respectively)
- Returns:
list ret of ints, where ret[i] is the length of the longest complementary subsequence between arr1[i] and arr2[i].
- Return type:
list[int]
- constraints.lcs_domain_pairs_constraint(*, threshold, weight=1.0, score_transfer_function=None, description=None, short_description='lcs domain pairs', domains=None, pairs=None, check_domain_against_itself=True, gc_double=True)[source]
Checks pairs of domain sequences for longest complementary subsequences. This can be thought of as a very rough heuristic for “binding energy” that is much less accurate than NUPACK or ViennaRNA, but much faster to evaluate.
- Args
threshold: Max length of complementary subsequence allowed.
weight: See
Constraint.weight.score_transfer_function: See
Constraint.score_transfer_function.description: See
Constraint.descriptionshort_description: See
Constraint.short_descriptionpairs: Pairs of
Strand’s to compare; if not specified, checks all pairs in design.- check_domain_against_itself: Whether to check domain x against x*. (Obviously x has a maximal
common subsequence with x, so we don’t check that.)
gc_double: Whether to weigh G-C base pairs as double (i.e., they count for 2 instead of 1).
- Returns
A
DomainPairsConstraintthat checks given pairs ofDomain’s for excessive interaction due to having long complementary subsequences.
- Parameters:
- Return type:
- constraints.lcs_strand_pairs_constraint(*, threshold, weight=1.0, score_transfer_function=None, description=None, short_description='lcs strand pairs', pairs=None, check_strand_against_itself=True, gc_double=True)[source]
Checks pairs of strand sequences for longest complementary subsequences. This can be thought of as a very rough heuristic for “binding energy” that is much less accurate than NUPACK or ViennaRNA, but much faster to evaluate.
- Args
threshold: Max length of complementary subsequence allowed.
weight: See
Constraint.weight.score_transfer_function: See
Constraint.score_transfer_function.description: See
Constraint.description.short_description: See
Constraint.short_description.pairs: Pairs of
Strand’s to compare; if not specified, checks all pairs in design.check_strand_against_itself: Whether to check a strand against itself.
gc_double: Whether to weigh G-C base pairs as double (i.e., they count for 2 instead of 1).
- Returns
A
StrandPairsConstraintthat checks given pairs ofStrand’s for excessive interaction due to having long complementary subsequences.
- Parameters:
- Return type:
- constraints.lcs_strand_pairs_constraints_by_number_matching_domains(*, thresholds, weight=1.0, score_transfer_function=None, descriptions=None, short_descriptions=None, parallel=False, strands=None, pairs=None, gc_double=True, parameters_filename='', ignore_missing_thresholds=False)[source]
TODO
- Parameters:
thresholds (dict[int, int])
weight (float)
score_transfer_function (Callable[[float], float] | None)
descriptions (dict[int, str] | None)
short_descriptions (dict[int, str] | None)
parallel (bool)
strands (Iterable[Strand] | None)
gc_double (bool)
parameters_filename (str)
ignore_missing_thresholds (bool)
- Return type:
list[StrandPairsConstraint]
- constraints.rna_duplex_strand_pairs_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_dup_strand_pairs', parallel=False, pairs=None, parameters_filename='dna_mathews1999.par')[source]
Returns constraint that checks given pairs of
Strand’s for excessive interaction using Vienna RNA’s RNAduplex executable.Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function
rna_duplex_strand_pairs_constraints_by_number_matching_domains()is useful for this purpose, returning a list ofStrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.TODO: explain that this should be many pairs of strands to be fast
- Parameters:
threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that
temperature (float) – Temperature in Celsius.
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.description (str | None) – See
Constraint.descriptionshort_description (str) – See
Constraint.short_descriptionparallel (bool) – Whether to test each pair of
Strand’s in parallel.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in design.parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_duplex_multiple()
- Returns:
- Return type:
- constraints.rna_plex_strand_pairs_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_plex_strand_pairs', parallel=False, pairs=None, parameters_filename='dna_mathews1999.par')[source]
Returns constraint that checks given pairs of
Strand’s for excessive interaction using Vienna RNA’s RNAplex executable.Often one wishes to let the threshold depend on how many domains match between a pair of strands. The function
rna_plex_strand_pairs_constraints_by_number_matching_domains()is useful for this purpose, returning a list ofStrandPairsConstraint’s such as those returned by this function, one for each possible number of matching domains.- Parameters:
threshold (float) – Energy threshold in kcal/mol. If a float, this is used for all pairs of strands. If a dict[int, float], interpreted to mean that
temperature (float) – Temperature in Celsius.
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.description (str | None) – See
Constraint.descriptionshort_description (str) – See
Constraint.short_descriptionparallel (bool) – Whether to test each pair of
Strand’s in parallel.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare; if not specified, checks all pairs in design.parameters_filename (str) – Name of parameters file for ViennaRNA; default is same as
vienna_nupack.rna_plex_multiple()
- Returns:
- Return type:
- constraints.rna_multifold_strand_pair_constraint(*, threshold, temperature=37, weight=1.0, score_transfer_function=None, description=None, short_description='rna_multifold_strand_pair', parallel=False, pairs=None, strands=None)[source]
Returns constraint that checks given pairs of
Strand’s for excessive interaction using ViennaRNA’s multi-strand partition function (the analog of NUPACK’spfunc()), via the PythonRNA.fold_compound(...).pf()API. This is equivalent in spirit to the command-lineRNAmultifoldtool (the N-strand successor toRNAcofold).Multi-strand support requires ViennaRNA >= 2.5.0.
- Parameters:
threshold (float) – Energy threshold in kcal/mol.
temperature (float) – Temperature in Celsius.
weight (float) – See
Constraint.weight.score_transfer_function (Callable[[float], float] | None) – See
Constraint.score_transfer_function.description (str | None) – See
Constraint.descriptionshort_description (str) – See
Constraint.short_descriptionparallel (bool) – Whether to test each pair of
Strand’s in parallel.pairs (Iterable[tuple[Strand, Strand]] | None) – Pairs of
Strand’s to compare. Mutually exclusive with strands.strands (Iterable[Strand] | None) –
Strand’s to compare; all pairs (including each strand with itself) are checked. Mutually exclusive with pairs.
- Returns:
The
StrandPairConstraint.- Return type:
- class constraints.ConstraintWithComplexes(description: 'str', short_description: 'str' = '', weight: 'float' = 1.0, score_transfer_function: 'Callable[[float], float] | None' = None, complexes: 'tuple[Complex, ...]' = ())[source]
- Parameters:
description (str)
short_description (str)
weight (float)
score_transfer_function (Callable[[float], float] | None)
complexes (tuple[Complex, ...])
- class constraints.ComplexConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, complexes=())[source]
Constraint that applies to a complex (tuple of
Strand’s).Specify
Constraint._evaluatein the constructor.Unlike other types of
Constraint’s such asStrandConstraintorStrandPairConstraint, there is no default list ofComplex’s that aComplexConstraintis applied to. The list ofComplex’s must be specified manually in the constructor.- Parameters:
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate=<function SingularConstraint.<lambda>>, parallel=False, complexes=())
- part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- class constraints.ComplexesConstraint(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, complexes=())[source]
Similar to
ComplexConstraintbut operates on a specified list of complexes (tuples ofStrand’s).- Parameters:
- __init__(description, short_description='', weight=1.0, score_transfer_function=None, evaluate_bulk=<function BulkConstraint.<lambda>>, complexes=())
- part_name()[source]
Returns name of the
Partthat thisConstrainttests.- Returns:
name of the
Partthat thisConstrainttests (e.g., “domain”, “strand pair”)- Return type:
str
- constraints.default_interior_to_strand_probability = 0.98
Default probability threshold for
BasePairType.INTERIOR_TO_STRAND
- constraints.default_adjacent_to_exterior_base_pair = 0.95
Default probability threshold for
BasePairType.ADJACENT_TO_EXTERIOR_BASE_PAIR
- constraints.default_blunt_end_probability = 0.33
Default probability threshold for
BasePairType.BLUNT_END
- constraints.default_nick_3p_probability = 0.79
Default probability threshold for
BasePairType.NICK_3P
- constraints.default_nick_5p_probability = 0.73
Default probability threshold for
BasePairType.NICK_5P
- constraints.default_dangle_3p_probability = 0.51
Default probability threshold for
BasePairType.DANGLE_3P
- constraints.default_dangle_5p_probability = 0.57
Default probability threshold for
BasePairType.DANGLE_5P
- constraints.default_dangle_5p_3p_probability = 0.73
Default probability threshold for
BasePairType.DANGLE_5P_3P
- constraints.default_overhang_on_this_strand_3p_probability = 0.82
Default probability threshold for
BasePairType.OVERHANG_ON_THIS_STRAND_3P
- constraints.default_overhang_on_this_strand_5p_probability = 0.79
Default probability threshold for
BasePairType.OVERHANG_ON_THIS_STRAND_5P
- constraints.default_overhang_on_adjacent_strand_3p_probability = 0.55
Default probability threshold for
BasePairType.OVERHANG_ON_ADJACENT_STRAND_3P
- constraints.default_overhang_on_adjacent_strand_5p_probability = 0.49
Default probability threshold for
BasePairType.OVERHANG_ON_ADJACENT_STRAND_5P
- constraints.default_overhang_on_both_strand_3p_probability = 0.61
Default probability threshold for
BasePairType.OVERHANG_ON_BOTH_STRANDS_3P
- constraints.default_overhang_on_both_strand_5p_probability = 0.55
Default probability threshold for
BasePairType.OVERHANG_ON_BOTH_STRANDS_5P
- constraints.default_three_arm_junction_probability = 0.69
Default probability threshold for
BasePairType.THREE_ARM_JUNCTION
- constraints.default_four_arm_junction_probability = 0.84
Default probability threshold for
BasePairType.FOUR_ARM_JUNCTION
- constraints.default_five_arm_junction_probability = 0.77
Default probability threshold for
BasePairType.FIVE_ARM_JUNCTION
- constraints.default_mismatch_probability = 0.76
Default probability threshold for
BasePairType.MISMATCH
- constraints.default_bulge_loop_3p_probability = 0.69
Default probability threshold for
BasePairType.BULGE_LOOP_3P
- constraints.default_bulge_loop_5p_probability = 0.65
Default probability threshold for
BasePairType.BULGE_LOOP_5P
- constraints.default_unpaired_probability = 0.95
Default probability threshold for
BasePairType.UNPAIRED
- constraints.default_other_probability = 0.7
Default probability threshold for
BasePairType.OTHER
- class constraints.BasePairType(value)[source]
Represents different configurations for a base pair, and its immediate neighboring base pairs (or lack thereof).
Notation:
“#” indicates denotes the ends of a domain. They can either be the end of a strand or they could be connected to another domain.
“]” and “[” indicates 5’ ends of strand
“>” and “<” indicates 3’ ends of a strand
“-” indicates a base (number of these are not important).
“|” indicates a bases are bound (forming a base pair). Any “-” not connected by “|” is unbound
Domain Example:
The following represents an unbound domain of length 5
#-----#
The following represents bound domains of length 5
#-----# ||||| #-----#
Ocassionally, domains will be vertical in the case of overhangs. In this case, “-” and “|” have opposite meanings
Vertical Domain Example:
# # |-| |-| |-| |-| |-| # #
Formatting:
Top strands have 5’ end on left side and 3’ end on right side
Bottom strand have 3’ end on left side and 5’ end on right side
Strand Example:
strand0: a-b-c-d strand1: d*-b*-c*-a* a b c d strand0 [-----##-----##-----##-----> ||||| ||||| ||||| ||||| strand1 <-----##-----##-----##-----] a* b* c* d*Consecutive “#”:
In some cases, extra “#” are needed to make space for ascii art. We consider any consecutive “#”s to be equivalent “##”. The following is considered equivalent to the example above
a b c d strand0 [-----###-----####-----##-----> ||||| ||||| ||||| ||||| strand1 <-----###-----####-----##-----] a* b* c* d*Note that only consecutive “#”s is considered equivalent to “##”. The following example is not equivalent to the strands above because the “# #” between b and c are seperated by spaces, so they are not equivalent to “##”, meaning that b and c neednot be adjacent. Note that while b and c need not be adjacent, b* and c* are still adjacent because they are seperated by consecutive “#”s with no spaces in between.
a b c d strand0 [-----###-----# #-----##-----> ||||| ||||| ||||| ||||| strand1 <-----###-----####-----##-----] a* b* c* d*- INTERIOR_TO_STRAND = 1
Base pair is located inside of a strand but not next to a base pair that resides on the end of a strand.
Similar base-pairing probability compared to
ADJACENT_TO_EXTERIOR_BASE_PAIRbut usually breathes less.#-----##-----# ||||| ||||| #-----##-----# ^ | base pair
- ADJACENT_TO_EXTERIOR_BASE_PAIR = 2
Base pair is located inside of a strand and next to a base pair that resides on the end of a strand.
Similar base-pairing probability compared to
INTERIOR_TO_STRANDbut usually breathes more.#-----# ||||| #-----] ^ | base pairor
#-----> ||||| #-----# ^ | base pair
- BLUNT_END = 3
Base pair is located at the end of both strands.
#-----> ||||| #-----] ^ | base pair
- NICK_3P = 4
Base pair is located at a nick involving the 3’ end of the strand.
#----->[-----# ||||| ||||| #-----##-----# ^ | base pair
- NICK_5P = 5
Base pair is located at a nick involving the 3’ end of the strand.
#-----##-----# ||||| ||||| #-----]<-----# ^ | base pair
- DANGLE_3P = 6
Base pair is located at the end of a strand with a dangle on the 3’ end.
#-----##----# ||||| #-----] ^ | base pair
- DANGLE_5P = 7
Base pair is located at the end of a strand with a dangle on the 5’ end.
#-----> ||||| #-----##----# ^ | base pair
- DANGLE_5P_3P = 8
Base pair is located with dangle at both the 3’ and 5’ end.
#-----##----# ||||| #-----##----# ^ | base pair
- OVERHANG_ON_THIS_STRAND_3P = 9
Base pair is next to a overhang on the 3’ end.
# | | | # #-----# #-----# ||||| ||||| #-----###-----# ^ | base pair
- OVERHANG_ON_THIS_STRAND_5P = 10
Base pair is next to a overhang on the 5’ end.
base pair | v #-----###-----# ||||| ||||| #-----# #-----# # | | | #
- OVERHANG_ON_ADJACENT_STRAND_3P = 11
Base pair 3’ end interfaces with an overhang.
The adjacent base pair type is
OVERHANG_ON_THIS_STRAND_5P# | | | # #-----# #---# ||||| ||| #-----###---# ^ | base pair
- OVERHANG_ON_ADJACENT_STRAND_5P = 12
Base pair 5’ end interfaces with an overhang.
The adjacent base pair type is
OVERHANG_ON_THIS_STRAND_3Pbase pair | v #-----###-----# ||||| ||||| #-----# #-----# # | | | #
- OVERHANG_ON_BOTH_STRANDS_3P = 13
Base pair’s 3’ end is an overhang and adjacent strand also has an overhang.
# # | | | | | | # # #-----# #---# ||||| |||| #-----###---# ^ | base pair
- OVERHANG_ON_BOTH_STRANDS_5P = 14
Base pair’s 5’ end is an overhang and adjacent strand also has an overhang.
base pair | v #-----###-----# ||||| ||||| #-----# #-----# # # | | | | | | # #
- THREE_ARM_JUNCTION = 15
Base pair is located next to a three-arm-junction.
# # |-| |-| |-| # # #-----# #---# ||||| |||| #-----###---# ^ | base pair
- FOUR_ARM_JUNCTION = 16
Currently, this case isn’t actually detected (considered as
OTHER).Base pair is located next to a four-arm-junction (e.g. Holliday junction).
# # |-| |-| |-| # # #-----# #-----# ||||| ||||| #-----# #-----# # # |-| |-| |-| # #- Type:
TODO
- FIVE_ARM_JUNCTION = 17
Currently, this case isn’t actually detected (considered as
OTHER).Base pair is located next to a five-arm-junction.
- Type:
TODO
- MISMATCH = 18
Currently, this case isn’t actually detected (considered as
DANGLE_5P_3P).Base pair is located next to a mismatch.
#-----##-##-----# ||||| ||||| #-----##-##-----# ^ | base pair- Type:
TODO
- BULGE_LOOP_3P = 19
Currently, this case isn’t actually detected (considered as
OVERHANG_ON_BOTH_STRANDS_3P).Base pair is located next to a mismatch.
#-----##-##-----# ||||| ||||| #-----#####-----# ^ | base pair- Type:
TODO
- BULGE_LOOP_5P = 20
Currently, this case isn’t actually detected (considered as
OVERHANG_ON_BOTH_STRANDS_5P).Base pair is located next to a mismatch.
#-----#####-----# ||||| ||||| #-----##-##-----# ^ | base pair- Type:
TODO
- UNPAIRED = 21
Base is unpaired.
Probabilities specify how unlikely a base is to be paired with another base.
- OTHER = 22
Other base pair types.
- class constraints.StrandDomainAddress(strand, domain_idx)[source]
An addressing scheme for specifying a domain on a strand.
- Parameters:
strand (Strand)
domain_idx (int)
- domain_idx: int
order in which domain appears in
StrandDomainAddress.strand
- neighbor_5p()[source]
Returns 5’ domain neighbor. If domain is 5’ end of strand, returns None
- Returns:
StrandDomainAddress of 5’ neighbor or None if no 5’ neighbor
- Return type:
StrandDomainAddress | None
- neighbor_3p()[source]
Returns 3’ domain neighbor. If domain is 3’ end of strand, returns None
- Returns:
StrandDomainAddress of 3’ neighbor or None if no 3’ neighbor
- Return type:
StrandDomainAddress | None
- constraints.BaseAddress = typing.Union[int, tuple[constraints.StrandDomainAd...
Represents a reference to a base. Can be either specified as a NUPACK base index or an index of a nuad
StrandDomainAddress:alias of
int|tuple[StrandDomainAddress,int]
- constraints.BasePairAddress
Represents a reference to a base pair
- constraints.BoundDomains
Represents bound domains
- constraints.nupack_complex_base_pair_probability_constraint(strand_complexes, nonimplicit_base_pairs=None, all_base_pairs=None, base_pair_prob_by_type=None, base_pair_prob_by_type_upper_bound=None, base_pair_prob=None, base_unpaired_prob=None, base_pair_prob_upper_bound=None, base_unpaired_prob_upper_bound=None, temperature=37, sodium=0.05, magnesium=0.0125, weight=1.0, score_transfer_function=None, description=None, short_description='ComplexBPProbs', parallel=False)[source]
Returns constraint that checks given base pair probabilities in tuples of
Strand’s- Parameters:
strand_complexes (list[Complex]) – Iterable of
Strandtuplesnonimplicit_base_pairs (Optional[Iterable[BoundDomains]]) –
list of nonimplicit base pairs that cannot be inferred because multiple instances of the same
Domainexist in complex.The
StrandDomainAddress.strandfield of each address should reference a strand in the first complex instrand_complexes.For example, if one
Strandhas one TDomainand another strand in the complex has two T*Domains, then the intended binding graph cannot be inferred and must be stated explicitly in this field.all_base_pairs (Optional[Iterable[BoundDomains]]) –
list of all base pairs in complex. If not provided, then base pairs are infered based on the name of
Domains in the complex as well as base pairs specified innonimplicit_base_pairs.TODO: This has not been implemented yet, and the behavior is as if this parameter is always
None(binding graph is always inferred).base_pair_prob_by_type (Optional[dict[BasePairType, float]]) –
Probability lower bounds for each
BasePairType. AllBasePairTypecomes with a default such asdefault_interior_to_strand_probabilitywhich will be used if a lower bound is not specified for a particular type.Note: Despite the name of this parameter, set thresholds for unpaired bases by specifying a threshold for
BasePairType.UNPAIRED.base_pair_prob_by_type_upper_bound (dict[BasePairType, float], optional) –
Probability upper bounds for each
BasePairType. By default, no upper bound is set.Note: Despite the name of this parameter, set thresholds for unpaired bases by specifying a threshold for
BasePairType.UNPAIRED.TODO: This has not been implemented yet.
base_pair_prob (Optional[dict[BasePairAddress, float]]) –
Probability lower bounds for each
BasePairAddresswhich takes precedence over probabilities specified bybase_pair_prob_by_type.TODO: This has not been implemented yet.
base_unpaired_prob (Optional[dict[BaseAddress, float]]) – Probability lower bounds for each
BaseAddressrepresenting unpaired bases. These lower bounds take precedence over the probability specified bybase_pair_prob_by_type[BasePairType.UNPAIRED].base_pair_prob_upper_bound (Optional[dict[BasePairAddress, float]]) – Probability upper bounds for each :py:class`BasePairAddress` which takes precedence over probabilties specified by
base_pair_prob_by_type_upper_bound.base_unpaired_prob_upper_bound (Optional[dict[BaseAddress, float]]) – Probability upper bounds for each
BaseAddressrepresenting unpaired bases. These lower bounds take precedence over the probability specified bybase_pair_prob_by_type_upper_bound[BasePairType.UNPAIRED].temperature (float, optional) – Temperature specified in °C, defaults to
vienna_nupack.default_temperature.sodium (float) – molarity of sodium (more generally, monovalent ions such as Na+, K+, NH4+) in moles per liter
magnesium (float) – molarity of magnesium (Mg++) in moles per liter
weight (float, optional) – See
Constraint.weight, defaults to 1.0score_transfer_function (Callable[[float], float], optional) – Score transfer function to use. By default, f(x) = x**2 is used, where x is the sum of the squared errors of each base pair that violates the threshold.
description (str | None, optional) – See
Constraint.description, defaults to Noneshort_description (str, optional) – See
Constraint.short_descriptiondefaults to ‘complex_secondary_structure_nupack’parallel (bool, optional) – TODO: Implement this
- Raises:
ImportError – If NUPACK 4 is not installed.
ValueError –
If
strand_complexesis not valid. In order forstrand_complexesto be valid,strand_complexesmust:Consist of complexes (tuples of
Strandobjects)Each complex must be of the same motif
- Returns:
ComplexConstraint
- Return type:
nuad.modifications
- class modifications.ModificationType(value)[source]
Type of modification (5’, 3’, or internal).
- five_prime = "5'"
5’ modification type
- three_prime = "5'"
3’ modification type
- internal = 'internal'
internal modification type
- class modifications.Modification(vendor_code, id='WARNING: no id assigned to modification')[source]
Abstract case class of modifications (to DNA sequences, e.g., biotin or Cy3). Use concrete subclasses
Modification3Prime,Modification5Prime, orModificationInternalto instantiate.If
Modification.idis not specified, thenModification.vendor_codeis used as the unique ID. EachModification.idmust be unique. For example if you create a 5’ “modification” to represent 6 T bases:t6_5p = Modification5Prime(display_text='6T', idt_text='TTTTTT')(this is a useful hack for putting single-stranded extensions on strands until loopouts on the end of a strand are supported; see https://github.com/UC-Davis-molecular-computing/scadnano-python-package/issues/2), then this would clash with a similar 3’ modification without specifying unique IDs for them:t6_3p = Modification3Prime(display_text='6T', vendor_code='TTTTTT') # ERROR.In general it is recommended to create a single
Modificationobject for each type of modification in the design. For example, if many strands have a 5’ biotin, then it is recommended to create a singleModificationobject and re-use it on each strand with a 5’ biotin:biotin_5p = Modification5Prime(display_text='B', vendor_code='/5Biosg/') design.strand(0, 0).move(8).with_modification_5p(biotin_5p) design.strand(1, 0).move(8).with_modification_5p(biotin_5p)
- Parameters:
vendor_code (str)
id (str)
- vendor_code: str
Text string specifying this modification (e.g., ‘/5Biosg/’ for 5’ biotin). optional
- id: str = 'WARNING: no id assigned to modification'
Representation as a string; used to write in
Strandjson representation, while the full description of the modification is written under a global key in theDesign. If not specified, butModification.idt_textis specified, then it will be set equal to that.
- __init__(vendor_code, id='WARNING: no id assigned to modification')
- Parameters:
vendor_code (str)
id (str)
- Return type:
None
- class modifications.Modification5Prime(vendor_code, id='WARNING: no id assigned to modification')[source]
5’ modification of DNA sequence, e.g., biotin or Cy3.
- Parameters:
vendor_code (str)
id (str)
- __init__(vendor_code, id='WARNING: no id assigned to modification')
- Parameters:
vendor_code (str)
id (str)
- Return type:
None
- class modifications.Modification3Prime(vendor_code, id='WARNING: no id assigned to modification')[source]
3’ modification of DNA sequence, e.g., biotin or Cy3.
- Parameters:
vendor_code (str)
id (str)
- __init__(vendor_code, id='WARNING: no id assigned to modification')
- Parameters:
vendor_code (str)
id (str)
- Return type:
None
- class modifications.ModificationInternal(vendor_code, id='WARNING: no id assigned to modification', allowed_bases=None)[source]
Internal modification of DNA sequence, e.g., biotin or Cy3.
- Parameters:
vendor_code (str)
id (str)
allowed_bases (AbstractSet[str] | None)
- __init__(vendor_code, id='WARNING: no id assigned to modification', allowed_bases=None)
- Parameters:
vendor_code (str)
id (str)
allowed_bases (AbstractSet[str] | None)
- Return type:
None
- allowed_bases: AbstractSet[str] | None = None
If None, then this is an internal modification that goes between bases. If instead it is a list of bases, then this is an internal modification that attaches to a base, and this lists the allowed bases for this internal modification to be placed at. For example, internal biotins for IDT must be at a T. If any base is allowed, it should be
['A','C','G','T'].
nuad.vienna_nupack
Contains utility functions for accessing NUPACK 4 and ViennaRNA energy calculation algorithms.
The main functions are
pfunc() (for calculating complex free energy with NUPACK, along with its helper functions
secondary_structure_single_strand() and binding()),
nupack_complex_base_pair_probabilities() (for calculating base pair probabilities with NUPACK),
rna_duplex_multiple() (for calculating an approximation to two-strand complex free energy
that is much faster than calling pfunc() on the same pair of strands),
and
rna_plex_multiple() (which is even faster than rna_duplex_multiple()).
- vienna_nupack.calculate_strand_association_penalty(temperature, num_seqs)[source]
Additive adjustment factor to convert NUPACK’s mole fraction units to molarity.
For details on why this is needed for multi-stranded complexes, see Section S1.1 of http://www.nupack.org/downloads/serve_public_file/fornace20_supp.pdf?type=pdf and Figure 2 of http://www.nupack.org/downloads/serve_public_file/nupack_user_guide_3.2.2.pdf?type=pdf
- Parameters:
temperature (float) – temperature in Celsius
num_seqs (int) – number of sequences
- Returns:
Additive adjustment factor to convert NUPACK’s mole fraction units to molar.
- Return type:
float
- vienna_nupack.pfunc(seqs, temperature=37, sodium=0.05, magnesium=0.0125, strand_association_penalty=True)[source]
Calls pfunc from NUPACK 4 (http://www.nupack.org/) on a complex consisting of the unique strands in seqs, returns energy (“delta G”), i.e., generally a negative number.
By default, a strand association penalty is applied that is not applied by NUPACK’s pfunc. See strand_association_penalty parameter documentation for details.
NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
seqs (S | Iterable[S]) – DNA sequences (tuple, or a single DNA sequence), whose order indicates a cyclic permutation of the complex. For one or two sequences, there is only one cyclic permutation, so the order doesn’t matter in such cases.
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium in moles per liter
magnesium (float) – molarity of magnesium in moles per liter
strand_association_penalty (bool) – Add strand association penalty for a complex, related to converting NUPACK’s mole fraction units to molarity. The quantity added is that returned by
calculate_strand_association_penalty()with parameters temperature and len(seqs). For most constraints, which involve only one size of complex, this factor won’t matter other than to adjust the energy threshold by the same factor. The factor depends only on the number of strands in seqs, but not on their sequences. However, this factor is needed for a meaningful comparison of energies between complexes of different sizes, e.g., to calculate equilibrium concentrations of complexes of various sizes. For details on why this is needed for multi-stranded complexes, see Section S1.1 of http://www.nupack.org/downloads/serve_public_file/fornace20_supp.pdf?type=pdf and Figure 2 of http://www.nupack.org/downloads/serve_public_file/nupack_user_guide_3.2.2.pdf?type=pdf
- Returns:
complex free energy (“delta G”) of ordered complex with strands in given cyclic permutation
- Return type:
float
- vienna_nupack.nupack_complex_base_pair_probabilities(strand_complex, temperature=37, sodium=0.05, magnesium=0.0125)[source]
Calculates base-pair probabilities according to NUPACK 4.
- Parameters:
strand_complex (Complex) – Ordered tuple of strands in complex (specifying a particular circular ordering, which is imposed on all considered secondary structures)
temperature (float) – temperature in Celsius
sodium (float) – molarity of sodium in moles per liter
magnesium (float) – molarity of magnesium in moles per liter
- Returns:
2D Numpy array of floats, with result[i1][i2] giving the base-pair probability of base at position i1 with base at position i2 (if i1 != i2), where i1 and i2 are the absolute positions of the bases in the entire ordered list of strands. For example, with strands AAAA and TTTTT, there are nine indices 0,1,2,3,4,5,6,7,8, with positions 0,1,2,3 on the first strand AAAA, and positions 4,5,6,7,8 on the second strand TTTTT. If i1 == i2, then result[i1][i1] is the probability that the base at position i1 is unpaired.
- Return type:
ndarray
- vienna_nupack.call_subprocess(command_strs, user_input)[source]
Calls system command through a subprocess. Assumes running on a POSIX operating system.
If running on Windows, automatically appends “wsl -e” to start of command to call command through Windows subsystem for Linux, so wsl must be installed for this to work: https://docs.microsoft.com/en-us/windows/wsl/install-win10
- Parameters:
command_strs (List[str]) – List of command and command line arguments, i.e., to call
ls -l -a, command_strs should be the list [‘ls’, ‘-l’, ‘-a’].user_input (str) – Input to give once program is running (i.e., what would user type).
- Returns:
pair of strings (output, error), giving the strings written to stdout and stderr, respectively.
- Return type:
tuple[str, str]
- vienna_nupack.rna_duplex(seq1, seq2, temperature=37, max_energy=0.0, gu_wobble=False)[source]
TODO
- Parameters:
seq1 (S)
seq2 (S)
temperature (float)
max_energy (float)
gu_wobble (bool)
- Return type:
float
- vienna_nupack.rna_duplex_multiple(pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0, gu_wobble=False, progress_bar=False)[source]
Calls RNA.duplexfold (from ViennaRNA Python package: https://www.tbi.univie.ac.at/RNA/ViennaRNA/refman/api_python.html) on a list of pairs, specifically: [ (seq1, seq2), (seq3, seq4), (seq5, seq6), … ] where seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns a list (in the same order as seqpairs) of free energies.
- Parameters:
pairs (Sequence[tuple[S, S]]) – sequence (list or tuple) of pairs of DNA sequences
temperature (float) – temperature in Celsius
max_energy (float) – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.
gu_wobble (bool) – Whether to allow GU wobble pairs; I honestly don’t know what this means for DNA, but it’s an option in RNAduplex, and it makes a difference in calculating the energy.
logger (Logger)
parameters_filename (str)
progress_bar (bool)
- Returns:
list of free energies, in the same order as pairs
- Return type:
tuple[float, …]
- vienna_nupack.rna_duplex_multiple_deprecated(pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Calls RNAduplex (from ViennaRNA package: https://www.tbi.univie.ac.at/RNA/) on a list of pairs, specifically: [ (seq1, seq2), (seq3, seq4), (seq5, seq6), … ] where seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns a list (in the same order as seqpairs) of free energies.
- Parameters:
pairs (Sequence[tuple[S, S]]) – sequence (list or tuple) of pairs of DNA sequences
logger (Logger) – logger to use for printing error messages
temperature (float) – temperature in Celsius
parameters_filename (str) – name of parameters file for ViennaRNA
max_energy (float) – This is the maximum energy possible to assign. If RNAduplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAduplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.
- Returns:
list of free energies, in the same order as pairs
- Return type:
tuple[float, …]
- vienna_nupack.rna_duplex_multiple_parallel(thread_pool, pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Parallel version of
rna_duplex_multiple(). TODO document this- Parameters:
thread_pool (ThreadPool)
pairs (Sequence[tuple[S, S]])
logger (Logger)
temperature (float)
parameters_filename (str)
max_energy (float)
- Return type:
tuple[float, …]
- vienna_nupack.rna_plex_multiple(pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Calls RNAplex (from ViennaRNA package: https://www.tbi.univie.ac.at/RNA/) on a list of pairs, specifically: [ (seq1, seq2), (seq2, seq3), (seq4, seq5), … ] where seqi is a string over {A,C,T,G}. Temperature is in Celsius. Returns a list (in the same order as seqpairs) of free energies.
RNAplex is supposedly faster than RNAduplex, but less accurate since, compared to RNAduplex, “ loop energy is an affine function of the loop size instead of a logarithmic function” (https://doi.org/10.1093/bioinformatics/btn193). However, in testing with random sequences, RNAplex can actually be SLOWER, for instance with 1000 pairs of DNA sequences each of length 64, RNAduplex takes 1.93 s on average, compared to 2.28 s for RNAplex. (tested using %timeit in Jupyter lab) This is with default parameters; using “-f 1” speeds up the computing time by about factor 5; in this case RNAplex takes only 0.41 s on average instead of 2.28 s.
- Parameters:
pairs (Sequence[tuple[S, S]]) – sequence (list or tuple) of pairs of DNA sequences
logger (Logger) – logger to use for printing error messages
temperature (float) – temperature in Celsius
parameters_filename (str) – name of parameters file for NUPACK
max_energy (float) – This is the maximum energy possible to assign. If RNAplex reports any energies larger than this, they will be changed to max_energy. This is useful in case two sequences have no possible base pairs between them (e.g., CCCC and TTTT), in which case RNAplex assigns a free energy of 100000 (perhaps its approximation of infinity). But for meaningful comparison and particularly for graphing energies, it’s nice if there’s not some value several orders of magnitude larger than all the rest.
- Returns:
list of free energies, in the same order as pairs
- Return type:
tuple[float, …]
- vienna_nupack.nupack_multiple_with_sodium_magnesium(sodium=0.05, magnesium=0.0125)[source]
Used when we want a
BulkConstraintusing NUPACK (even though most of them areSingularConstraint’s).Calls NUPACK (specifically, the function
binding()) on a list of pairs: [ (seq1, seq2), (seq2, seq3), (seq4, seq5), … ] where seqi is a string over {A,C,T,G}.- Parameters:
sodium (float) – molarity of sodium in moles per liter
magnesium (float) – molarity of magnesium in moles per liter
- Returns:
list of free energies, in the same order as pairs
- Return type:
Callable[[Sequence[tuple[S, S]], Logger, float, str, float], tuple[float, …]]
- vienna_nupack.rna_plex_multiple_parallel(thread_pool, pairs, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0)[source]
Parallel version of
rna_plex_multiple(). TODO document this- Parameters:
thread_pool (ThreadPool)
pairs (Sequence[tuple[S, S]])
logger (Logger)
temperature (float)
parameters_filename (str)
max_energy (float)
- Return type:
tuple[float, …]
- vienna_nupack.rna_multifold(seqs, temperature=37, max_energy=0.0, gu_wobble=False)[source]
Computes the complex free energy of an arbitrary tuple of DNA sequences using ViennaRNA’s multi-strand partition function (the analog of NUPACK’s
pfunc()). Internally callsRNA.fold_compound("&".join(seqs)).pf().Multi-strand (>2) support requires ViennaRNA >= 2.5.0. For two strands this is equivalent to RNAcofold’s partition-function output; for one strand it is equivalent to RNAfold -p.
- Parameters:
seqs (Sequence[S]) – tuple/list of DNA sequences forming a complex (any length >= 1)
temperature (float) – temperature in Celsius
max_energy (float) – Maximum energy to return. If ViennaRNA reports an energy larger than this (e.g., 100000 when no base pairs are possible between strands like CCCC and TTTT), it is clamped to max_energy.
gu_wobble (bool) – Whether to allow GU wobble pairs.
- Returns:
complex free energy in kcal/mol
- Return type:
float
- vienna_nupack.rna_multifold_multiple(seq_tuples, logger=<RootLogger root (WARNING)>, temperature=37, parameters_filename='dna_mathews1999.par', max_energy=0.0, gu_wobble=False)[source]
Bulk wrapper around
rna_multifold(): computes the complex free energy of each tuple of sequences in seq_tuples. Each element of seq_tuples may have any length >= 1.Provided so callers expecting the same signature as
rna_duplex_multiple()/rna_plex_multiple()can use multi-strand free energies. The parameters_filename argument is accepted for signature parity but ignored; the Mathews 2004 DNA parameters are loaded once viaload_params_viennarna().- Parameters:
seq_tuples (Sequence[Sequence[S]])
logger (Logger)
temperature (float)
parameters_filename (str)
max_energy (float)
gu_wobble (bool)
- Return type:
tuple[float, …]
- vienna_nupack.wc(seq)[source]
Return reverse complement of seq. Alias for
rc().Deprecated: use
rc()instead.- Parameters:
seq (str)
- Return type:
str
- vienna_nupack.rc(seq)[source]
Return reverse complement of seq.
- Parameters:
seq (str)
- Return type:
str
- vienna_nupack.free_energy_single_strand(seq, temperature=37, sodium=0.05, magnesium=0.0125)[source]
Computes the “complex free energy” (https://docs.nupack.org/definitions/#complex-free-energy) of a single strand according to NUPACK.
NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
seq (str)
temperature (float)
sodium (float)
magnesium (float)
- Return type:
float
- vienna_nupack.binding_complement(seq, temperature=37, sodium=0.05, magnesium=0.0125, subtract_indv=True)[source]
Computes the complex free energy of a strand with its perfect Watson-Crick complement.
NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
seq (str)
temperature (float)
sodium (float)
magnesium (float)
subtract_indv (bool)
- Return type:
float
- vienna_nupack.binding(seq1, seq2, *, temperature=37, sodium=0.05, magnesium=0.0125)[source]
Computes the complex free energy of association between two strands.
NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
seq1 (str)
seq2 (str)
temperature (float)
sodium (float)
magnesium (float)
- Return type:
float
- vienna_nupack.random_dna_seq(length, bases='ACTG')[source]
Chooses a random DNA sequence.
- Parameters:
length (int)
bases (Sequence)
- Return type:
str
- vienna_nupack.domain_orthogonal(seq, seqs, temperature, sodium, magnesium, orthogonality, orthogonality_ave=-1, parallel=False)[source]
test orthogonality of domain with all others and their wc complements
NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
seq (str)
seqs (Sequence[str])
temperature (float)
sodium (float)
magnesium (float)
orthogonality (float)
orthogonality_ave (float)
parallel (bool)
- Return type:
bool
- vienna_nupack.domain_pairwise_concatenated_no_sec_struct(seq, seqs, temperature, sodium, magnesium, concat, concat_ave=-1, parallel=False)[source]
test lack of secondary structure in concatenated domains
NUPACK 4 must be installed. Installation instructions can be found at https://piercelab-caltech.github.io/nupack-docs/start/.
- Parameters:
seq (str)
seqs (Sequence[str])
temperature (float)
sodium (float)
magnesium (float)
concat (float)
concat_ave (float)
parallel (bool)
- Return type:
bool
nuad.np
Library for doing sequence design that can be expressed as linear algebra operations for rapid processing by numpy (e.g., generating all DNA sequences of a certain length and calculating all their full duplex binding energies in the nearest neighbor model and filtering those outside a given range).
Based on the DNA single-stranded tile (SST) sequence designer used in the following publication.
“Diverse and robust molecular algorithms using reprogrammable DNA self-assembly” Woods*, Doty*, Myhrvold, Hui, Zhou, Yin, Winfree. (*Joint first co-authors)
- np.idx2seq(idx, length)[source]
Return the lexicographic idx’th DNA sequence of given length.
- Parameters:
idx (int)
length (int)
- Return type:
str
- np.seq2arr(seq, base2bits_local=None)[source]
Convert seq (string with DNA alphabet) to numpy array with integers 0,1,2,3.
- Parameters:
seq (str)
base2bits_local (dict[str, int] | None)
- Return type:
np.ndarray
- np.seqs2arr(seqs)[source]
Return numpy 2D array converting the given DNA sequences to integers.
- Parameters:
seqs (Sequence[str])
- Return type:
ndarray
- np.arr2seqs(arr)[source]
Return list of strings converting the given numpy array of integers to DNA sequences.
- Parameters:
arr (ndarray)
- Return type:
list[str]
- np.arr2seq(arr)[source]
Return string converting the given numpy array of integers to DNA sequence.
- Parameters:
arr (ndarray)
- Return type:
str
- np.make_array_with_all_dna_seqs(length, bases=('A', 'C', 'G', 'T'))[source]
Return 2D numpy array with all DNA sequences of given length in lexicographic order. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.
Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.
- Parameters:
length (int)
bases (Collection[str])
- Return type:
ndarray
- np.make_array_with_random_subset_of_dna_seqs(length, num_random_seqs, rng=Generator(PCG64) at 0x7DC02407E4A0, bases=('A', 'C', 'G', 'T'))[source]
Return 2D numpy array with random subset of size num_seqs of DNA sequences of given length. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.
Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.
Sequences returned will be unique (i.e., sampled without replacement) and in a random order
- Parameters:
length (int) – length of each row
num_random_seqs (int) – number of rows
bases (Collection[str]) – DNA bases to use
rng (Generator) – numpy random number generator (type returned by numpy.random.default_rng())
- Returns:
2D numpy array with random subset of size num_seqs of DNA sequences of given length
- Return type:
ndarray
- np.make_array_with_all_dna_seqs_hamming_distance(dist, seq, bases=('A', 'C', 'G', 'T'))[source]
Return 2D numpy array with all DNA sequences of given length in lexicographic order. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.
Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.
- Parameters:
dist (int)
seq (str)
bases (Collection[str])
- Return type:
ndarray
- np.make_array_with_random_subset_of_dna_seqs_hamming_distance(num_seqs, dist, seq, rng=Generator(PCG64) at 0x7DC02407E4A0, bases=('A', 'C', 'G', 'T'))[source]
Return 2D numpy array with random subset of size num_seqs of DNA sequences of given length. Bases contains bases to be used: (‘A’,’C’,’G’,’T’) by default, but can be set to a subset of these.
Uses the encoding described in the documentation for DNASeqList. The result is a 2D array, where each row represents a DNA sequence, and that row has one byte per base.
Sampled with replacement, so the same row may appear twice in the returned array
- Parameters:
num_seqs (int) – number of sequences to generate
dist (int) – Hamming distance to be from seq
seq (str) – sequence to generate other sequences close to
bases (Collection[str]) – DNA bases to use
rng (Generator) – numpy random number generator (type returned by numpy.random.default_rng())
- Returns:
2D numpy array with random subset of size num_seqs of DNA sequences of given length
- Return type:
ndarray
- np.longest_common_substring(a1, a2, vectorized=True)[source]
Return start and end indices (a1start, a2start, length) of longest common substring (subarray) of 1D arrays a1 and a2.
- Parameters:
a1 (ndarray)
a2 (ndarray)
vectorized (bool)
- Return type:
tuple[int, int, int]
- np.longest_common_substrings_singlea1(a1, a2s)[source]
Return start and end indices (a1starts, a2starts, lengths) of longest common substring (subarray) of 1D array a1 and rows of 2D array a2s.
If length[i]=0, then a1starts[i]=a2starts[i]=0 (not -1), so be sure to check length[i] to see if any substrings actually matched.
- Parameters:
a1 (ndarray)
a2s (ndarray)
- Return type:
tuple[ndarray, ndarray, ndarray]
- np.longest_common_substrings_product(a1s, a2s)[source]
Return start and end indices (a1starts, a2starts, lengths) of longest common substring (subarray) of each pair in the cross product of rows of a1s and a2s.
If length[i]=0, then a1starts[i]=a2starts[i]=0 (not -1), so be sure to check length[i] to see if any substrings actually matched.
- Parameters:
a1s (ndarray)
a2s (ndarray)
- Return type:
tuple[ndarray, ndarray, ndarray]
- np.longest_common_substrings_all_pairs_strings(seqs1, seqs2)[source]
For Python strings
- Parameters:
seqs1 (Sequence[str])
seqs2 (Sequence[str])
- Return type:
tuple[ndarray, ndarray, ndarray]
- np.strongest_common_substrings_all_pairs_string(seqs1, seqs2, temperature)[source]
For Python strings representing DNA; checks for reverse complement matches rather than direct matches, and evaluates nearest neighbor energy, returning indices lengths, and energies of strongest complementary substrings.
- Parameters:
seqs1 (Sequence[str])
seqs2 (Sequence[str])
temperature (float)
- Return type:
tuple[list[float], list[float], list[float], list[float]]
- class np.DNASeqList(length=None, num_random_seqs=None, shuffle=False, alphabet=('A', 'C', 'G', 'T'), seqs=None, seqarr=None, filename=None, rng=Generator(PCG64) at 0x7DC02407E4A0, hamming_distance_from_sequence=None)[source]
Represents a list of DNA sequences of identical length. The sequences are stored as a 2D numpy array of bytes
DNASeqList.seqarr. Each byte represents a single DNA base (so it is not a compact representation; the most significant 6 bits of the byte will always be 0).- Parameters:
length (int | None)
num_random_seqs (int | None)
shuffle (bool)
alphabet (Collection[str])
seqs (Sequence[str] | None)
seqarr (ndarray)
filename (str | None)
rng (Generator)
hamming_distance_from_sequence (tuple[int, str] | None)
- __init__(length=None, num_random_seqs=None, shuffle=False, alphabet=('A', 'C', 'G', 'T'), seqs=None, seqarr=None, filename=None, rng=Generator(PCG64) at 0x7DC02407E4A0, hamming_distance_from_sequence=None)[source]
Creates a set of DNA sequences, all of the same length.
Create either all sequences of a given length if seqs is not specified, or all sequences in seqs if seqs is specified. If neither is specified then all sequences of length 3 are created.
Exactly one of the following should be specified:
length (possibly along with alphabet and num_random_seqs)
seqs
seqarr
filename
hamming_distance_from_sequence (possibly along with alphabet and num_random_seqs)
- Parameters:
length (int | None) – length of sequences; num_seqs and alphabet can also be specified along with it
hamming_distance_from_sequence (tuple[int, str] | None) – if specified and equal to (dist, seq) of type (int, str), then only sequences at Hamming distance dist from seq will be generated. Raises error if length, seqs, seqarr, or filename is specified.
num_random_seqs (int | None) – number of sequences to generate; if not specified, then all sequences of length length using bases from alphabet are generated. Sequences are sampled with replacement, so the same sequence may appear twice.
shuffle (bool) – whether to shuffle sequences
alphabet (Collection[str]) – a subset of {‘A’, ‘C’, ‘G’, ‘T’}
seqs (Sequence[str] | None) – sequence (e.g., list or tuple) of strings, all of the same length
seqarr (np.ndarray | None) – 2D NumPy array, with axis 0 moving between sequences, and axis 1 moving between consecutive DNA bases in a sequence
filename (str | None) – name of file containing a
DNASeqListas written byDNASeqList.write_to_file()rng (np.random.Generator) – numpy random number generator (type returned by numpy.random.default_rng())
- rng: Generator
Random number generator to use.
- seqarr: ndarray
Uses a (noncompact) internal representation using 8 bits (1 byte, dtype = np.ubyte) per base, stored in a numpy 2D array of bytes. Each row (axis 0) is a DNA sequence, and each column (axis 1) is a base in a sequence.
The code used is \(A \to 0, C \to 1, G \to 2, T \to 3\).
- seqlen: int
Length of each DNA sequence (number of columns, axis 1, in
DNASeqList.seqarr)
- numseqs: int
Number of DNA sequences (number of rows, axis 0, in
DNASeqList.seqarr)
- random_choice(num, rng=Generator(PCG64) at 0x7DC02407E4A0, replace=False)[source]
Returns random choice of num DNA sequence(s) (represented as list of Python strings).
- Parameters:
num (int) – number of sequences to sample
rng (Generator) – random number generator to use
replace (bool) – whether to sample with replacement
- Returns:
sampled sequences
- Return type:
list[str]
- random_sequence(rng=Generator(PCG64) at 0x7DC02407E4A0)[source]
Returns random DNA sequence (represented as Python string).
- Returns:
sampled sequence
- Parameters:
rng (Generator)
- Return type:
str
- write_to_file(filename)[source]
Writes text file describing DNA sequence list, in format
numseqs seqlen seq1 seq2 seq3 …
where numseqs, seqlen are integers, and seq1, … are strings from {A,C,G,T}
- Parameters:
filename (str)
- Return type:
None
- concatenate(other)[source]
Creates a new
DNASeqListby concatenating this with another.They must have the same sequence
DNASeqList.numseqs.- Parameters:
other (DNASeqList)
- Return type:
- to_list()[source]
Return list of strings representing the sequences, e.g. [‘ACG’,’TAA’]
- Return type:
list[str]
- get_seq_str(idx)[source]
Return idx’th DNA sequence as a string.
- Parameters:
idx (int)
- Return type:
str
- get_seqs_str_list(slice_)[source]
Return a list of strings specified by slice.
- Parameters:
slice_ (slice)
- Return type:
list[str]
- keep_seqs_at_indices(indices)[source]
Keeps only sequences at the given indices.
- Parameters:
indices (Iterable[int])
- Return type:
None
- hamming_map(sequence)[source]
Return dict mapping each length d to a
DNASeqListof sequences that are Hamming distance d from seq.- Parameters:
sequence (str)
- Return type:
dict[int, DNASeqList]
- sublist(start, end=None)[source]
Return sublist of DNASeqList from start, inclusive, to end, exclusive.
If end is not specified, goes until the end of the list.
- Parameters:
start (int)
end (int | None)
- Return type:
- filter_energy(low, high, temperature)[source]
Return new DNASeqList with seqs whose wc complement energy is within the given range.
- Parameters:
low (float)
high (float)
temperature (float)
- Return type:
- energies(temperature)[source]
Calculates the nearest-neighbor binding energy of each sequence with its perfect complement (summing over all length-2 substrings of the domain’s sequence), using parameters from the 2004 Santa-Lucia and Hicks paper (https://www.annualreviews.org/doi/abs/10.1146/annurev.biophys.32.110601.141800, see Table 1, and example on page 419).
This is used by
NearestNeighborEnergyFilterto calculate the energy of domains when filtering.- Parameters:
temperature (float) – temperature in Celsius
- Returns:
array of nearest-neighbor energies of each sequence with its perfect Watson-Crick complement
- Return type:
ndarray
- filter_end_gc()[source]
Remove any sequence with A or T on the end. Also remove domains that do not have an A or T either next to that base, or one away. Otherwise we could get a domain ending in {C,G}^3, which, placed next to any domain ending in C or G, will create a substring in {C,G}^4 and be rejected if we are filtering those.
- Return type:
- filter_end_at(gc_near_end=False)[source]
Remove any sequence with C or G on the end. Also, if gc_near_end is True, remove domains that do not have an C or G either next to that base, or one away, to prevent breathing.
- Parameters:
gc_near_end (bool)
- Return type:
- filter_base_nowhere(base)[source]
Remove any sequence that has given base anywhere.
- Parameters:
base (str)
- Return type:
- filter_base_count(base, low, high)[source]
Remove any sequence not satisfying low <= #base <= high.
- Parameters:
base (str)
low (int)
high (int)
- Return type:
- np.create_toeplitz(seqlen, sublen, indices=None)[source]
Creates a toeplitz matrix, useful for finding subsequences.
seqlen is length of larger sequence; sublen is length of substring we’re checking for. If indices is None, then all rows are created, otherwise only rows for checking those indices are created.
- Parameters:
seqlen (int)
sublen (int)
indices (Sequence[int] | None)
- Return type:
np.ndarray
- np.calculate_loop_energies(temperature, negate=False)[source]
Get SantaLucia and Hicks nearest-neighbor loop energies for given temperature, 1 M Na+.
- Parameters:
temperature (float)
negate (bool)
- Return type:
ndarray
- np.wcenergy(seq, temperature, negate=False)[source]
Return the wc energy of seq binding to its complement.
- Parameters:
seq (str)
temperature (float)
negate (bool)
- Return type:
float
- np.calculate_wc_energies(seqarr, temperature, negate=False)[source]
Calculate and store in an array all energies of all sequences in seqarr with their Watson-Crick complements.
- Parameters:
seqarr (ndarray)
temperature (float)
negate (bool)
- Return type:
ndarray
- np.wc_arr(seqarr)[source]
Return numpy array of reverse complements of sequences in seqarr.
- Parameters:
seqarr (ndarray)
- Return type:
ndarray
- np.energy_hist(length, temperature=37, combine_lengths=False, num_random_sequences=100000, figsize=(15, 6), **kwargs)[source]
Make a matplotlib histogram of the nearest-neighbor energies (as defined by
DNASeqList.energies()) of all DNA sequences of the given length(s), or a randomly selected subset if length(s) is too large to enumerate all DNA sequences of that length.This is useful, for example, to choose low and high energy values to pass to
NearestNeighborEnergyFilter.- Parameters:
length (int | Iterable[int]) – length of DNA sequences to consider, or an iterable (e.g., list) of lengths
temperature (float) – temperature in Celsius
combine_lengths (bool) – If True, then length should be an iterable, and the histogram will combine all calculated energies from all lengths into one histogram to plot. If False (the default), then different lengths are plotted in different colors in the histogram.
num_random_sequences (int) – If the length is too large to enumerate all DNA sequences of that length, then this many random sequences are used to estimate the histogram.
figsize (tuple[int, int]) – Size of the figure in inches.
kwargs – Any keyword arguments given are passed along as keyword arguments to matplotlib.pyplot.hist: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html
- Return type:
None