Production Strategies For Molecular Building Blocks Suitable For DGAP

Introduction, goals, and standard techniques

Scope of document
Design criteria for MBBs
Core proteins and attachment chemistries discussed here
Genetic engineering of proteins to modify surface amino acids
General strategies for site-specific attachment of DNA sequences

Attachment Methods

Method 1. DNA-conjugation to each monomer in a multi-subunit complex.
Method 2. Use different attachment chemistry at each of three sites.
Method 3. Separation of DNA-conjugated streptavidin.

References

Introduction, goals, and standard techniques

Scope of document

This is a description of several proposed strategies for producing molecular building blocks (MBBs) consisting of DNA-protein conjugates with a different specific DNA sequence attached to each of several specific attachment sites on a protein, for use in the proposed process, DNA-Guided Assembly of Proteins (DGAP). (That process is described in a separate document, which should be read before this one.)

Although we (my partners and I) prefer some methods over others, I describe a variety of methods of different kinds for making MBBs. This is in part because the methods differ in which proteins can be used, or in the number of DNA attachment sites per protein that they allow, and in part because we may encounter unexpected difficulties with our preferred methods, and thus would like to have fallback methods available. (We have other methods or variations not included here for reasons of brevity.)

The methods described cover a variety of levels of difficulty or risk to develop, cost to practice, and level of generality of MBBs that can be made, as well as of implementation techniques. Many applications of DGAP will be made possible even if only one of these methods is implemented. We expect that most of them could eventually be implemented, and we might implement more than one if the new applications made possible by the newly accessible kinds of MBBs justified it.

To develop any of these methods, we expect that we will need to work in an existing lab with the help of researchers and lab technicians experienced in the specific kinds of techniques we will use. Furthermore, many of the specific protocols, described here in general terms, will need to be chosen and developed with expert advice, including protocols for separation (discussed here very generally) and other kinds of characterization and verification (not discussed at all), as well as the synthesis of linker molecules and/or the genetic engineering of core proteins for MBBs, which would be needed for some of the methods.

(Other issues not discussed in this document include the details of the P-sites and the covalent crosslinking between P-sites on different assembled MBBs, the geometry of attachments between MBBs, and any of the specific possible applications of these MBBs assembled using the DGAP process.)

Design criteria for MBBs

This document does not discuss the specific applications of DGAP in which the MBBs described here are meant to be used. However, all of these applications have the common feature that the specific 3-dimensional arrangement of proteins is important, whereas the specific identities of the proteins used are of much less importance (except possibly for a few MBBs per design), since most MBBs are used purely as mechanical elements in the designs, i.e. as scaffolding for other molecules. Similarly, the specific DNA sequences attached to each MBB are important only for guiding the proper assembly of several MBBs into a larger structure. Accordingly, the proteins used in most MBBs can be chosen for ease of production of the MBBs, and so can some portions of the attached DNA sequences, provided some other portions of the DNA sequences are able to be different for each attachment site of each specific kind of MBB.

In order to uniquely orient each MBB using the DNA (attached to the C-sites), at least three C-sites are required. Many structures will be easier to build from MBBs with at least 4 C-sites, distributed (very roughly) tetrahedrally, so that the protein can be pulled stably in any direction using the closest three of the four sites. Four C-sites should be sufficient for most structures, but up to eight or so sites could be useful in some cases.

Core proteins and attachment chemistries mentioned in this document

In the methods described in this document, the protein used as the core of an MBB will be either a streptavidin tetramer, with its biotin-binding sites used as DNA-attachment sites, or an unspecified asymmetric protein with several cysteine residues on the outer surface, probably introduced by genetic engineering at positions chosen for this application, used as DNA-attachment sites. In the terminology of the document describing DGAP in general, these DNA-attachment sites are the C-sites; the P-sites will either be special groups included between the DNA and the group that attaches it to the C-site, or will be other surface amino acids of the MBB protein.

(Attachment of suitably modified DNA to the sulfhydryl groups of cysteines and to the biotin-binding sites of streptavidin are both standard techniques [Hermanson 1996].)

It is also possible that specific functionalization of surface lysine residues could be used to form an additional C-site on certain core proteins, after genetic replacement of endogenous lysines and introduction of new lysines at desired positions for C-sites. Similar genetic modifications have been done for other reasons [Gaertner et al. 1992]. The discussion of lysine functionalization by anhydrides in Hermanson [1996, p. 145] implies that specificity for this residue is possible, though we have not yet investigated this sufficiently.

Genetic engineering of proteins to modify surface amino acids

There are standard techniques for generating substantial quantities of proteins with several specifically-designed amino-acid replacements, involving site-directed mutagenesis of cloned genes and insertion into bacterial plasmids for expression. For example, Saraswat et al. [1992] used this technique to replace the two endogenous cysteines of a natural protein with alanine, and to add new cysteines at each of 5 different positions chosen for purposes related to their application (one in each of 5 new protein species), obtaining yields of 70-80 mg of protein per liter of bacterial culture. Other examples include Kanaya et al. [1992] and Gaertner et al. [1992].

There is no inherent limit to the number of amino acids that can be modified with this technique, since modified genes can be amplified between sequential replacements, if necessary. Provided that the replacements are isolated surface amino acids, it is likely that the modified protein will fold in the same way as the native one [Handel 1995, personal communication].

General strategies for site-specific attachment of DNA sequences

For the asymmetric molecules, attachment sites can be chosen at geometrically distinguishable positions on the protein, and the methods described must produce MBBs in which the DNA sequence attached at each site is not only different from the other sites, but has a specific correspondence to the site, which we determine in advance.

For a streptavidin tetramer whose biotin-binding sites are used, the molecule's symmetry (three 2-fold rotation axes, which makes it less symmetrical than a regular tetrahedron) renders the four sites indistinguishable from one another. If one site is chosen arbitrarily, however, the other three are distinguishable from each other (they are all at different distances from the chosen site). This means that to have only one species of MBB, it is still necessary to produce only one geometrical arrangement of attached DNA sequences, out of the 6 arrangements that would be possible given only that exactly one copy of each DNA sequence is attached to each tetramer.
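The count of 6 arrangements can be checked with a short Python sketch. It assumes (as an idealization, not a statement about the real protein geometry) that the four sites can be numbered 0-3 and that each of the three 2-fold rotations swaps the sites in two disjoint pairs. The names `ROTATIONS`, `rotate`, and `distinct_arrangements` are hypothetical and used only for illustration.

```python
from itertools import permutations

# Site permutations induced by the identity and the three 2-fold rotations.
# Assumption: each rotation swaps the four sites in two disjoint pairs.
ROTATIONS = [
    (0, 1, 2, 3),  # identity
    (1, 0, 3, 2),  # swaps 0<->1 and 2<->3
    (2, 3, 0, 1),  # swaps 0<->2 and 1<->3
    (3, 2, 1, 0),  # swaps 0<->3 and 1<->2
]

def rotate(arrangement, rotation):
    """Return the arrangement after the rotation moves site i to rotation[i]."""
    rotated = [None] * 4
    for site, seq in enumerate(arrangement):
        rotated[rotation[site]] = seq
    return tuple(rotated)

def distinct_arrangements(sequences="ABCD"):
    """Group all assignments of four distinct sequences to four sites
    into classes whose members are related by a rotation."""
    seen = set()
    classes = []
    for arrangement in permutations(sequences):
        if arrangement in seen:
            continue
        orbit = {rotate(arrangement, r) for r in ROTATIONS}
        seen |= orbit
        classes.append(sorted(orbit))
    return classes

classes = distinct_arrangements()
print(len(classes))  # 6
```

Each class contains 4 of the 24 possible assignments, so 24 / 4 = 6 geometrically distinct arrangements remain, in agreement with the figure quoted above.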

Some of the methods described here achieve the necessary specificity of DNA attachment by taking advantage of the different distances between different pairs of attachment sites, either during the construction of MBBs, or by separating the correct ones from the others after they have been constructed nonspecifically. One method makes use of site-specific blocking groups, one of site-specific attachment chemistries, and one of the ability to assemble certain multi-protein complexes from distinct subunits.

Attachment Methods

The methods are given in order of ease of description. Method 3 is the most complex to describe, but it is the one we will try first since it is likely to be both the easiest method to develop, and the one for which new MBBs differing only in attached DNA sequences can be made most quickly. Accordingly, we have investigated it more thoroughly than the other possible methods given here.

Method 1. DNA-conjugation to each monomer in a multi-subunit complex.

Core protein: some multi-subunit protein (to be chosen), in which several subunits occur only once in the complex, and in which subunits are available separately and can be mixed to reconstitute the complex in a unique arrangement.

Other requirements: ability to attach DNA to one site on each of several subunit proteins (using any of the methods mentioned previously, i.e. to an introduced surface cysteine, or to the amino or carboxyl terminus); some knowledge of structure of subunits and complex.

Outline of method:

Attach a different DNA sequence to each subunit in isolation, then mix the subunits so as to reconstitute the complex. It will probably be necessary to covalently crosslink the subunits to stabilize the complex.

Details:

Attachment methods are as described in Method 2.

Possible problems:

We have not yet searched for a suitable protein complex (for which there is enough structural knowledge), so we cannot be sure one exists.
Unless DNA attachment was entirely at chain termini, genetic engineering of subunits would be necessary (e.g. to introduce cysteines).
If the complex has to be stabilized by crosslinking, whether it would be acceptable for this crosslinking to be non-site-specific is application dependent (and not yet known). If site-specific crosslinking is necessary, this might require genetic engineering to remove or introduce crosslinking sites. If these were cysteines, the C-sites could not be cysteines.

Possible variations:

This method might also be applied to proteolytic fragments of some monomeric protein which can reassociate and be covalently linked into the native form [e.g. Gaertner et al. 1992].
If some other method described here can be applied to each subunit individually, so that each subunit carries several attached DNA strands, combining the subunits as in this method might be used to generate MBBs with many more attached DNA strands than would be possible with other methods used alone.

Method 2. Use different attachment chemistry at each of three sites.

Core protein: a monomer with an N-terminal serine or threonine whose amino and carboxyl termini are exposed on its surface, provided they are sufficiently separated (more precisely, when the last few amino acids are flexible: provided that the points at which each chain-terminus becomes flexible are sufficiently separated), and with a single surface cysteine (sufficiently separated from the termini).

C-sites: the amino terminus, the carboxyl terminus, and the single surface cysteine

Outline of method:

Attach a different DNA sequence to each C-site by using an attachment protocol which affects only that kind of site. (Sequences with no significant complementarity should be used.)

Details:

For all attachments of DNA described in this method, the general strategy will be to modify each C-site (in a way specific for that site) to introduce a functional group not otherwise found on the protein, and then to conjugate the resulting modified protein to DNA (with a suitable functional group attached in a separate prior step).

Attachment of functional groups to cysteines (and the introduction of cysteines by genetic engineering) was discussed in the Introduction. Attachment of functional groups specifically to the amino terminus can be done by mild oxidation of an N-terminal serine or threonine [Fields & Dixon 1968; Geoghegan & Stroh 1992; Gaertner et al. 1992]. Attachment of functional groups specifically to the carboxyl terminus can be done by reverse proteolysis followed by hydrazone bond formation, under mild conditions [Rose et al. 1991; King et al. 1986].

The specific choice of functional groups and final conjugation chemistries have not yet been made, but several alternatives appear to be available [Hermanson 1996]. If each attached DNA and its linker is stable under the procedures for attachment of subsequent DNAs (as is reasonable to expect given the mild conditions of the functionalization procedures referred to), it is likely that the same final conjugation chemistry can be used in each case, with each DNA added before subsequent C-sites are functionalized. If not, it will be necessary to attach all the DNAs at the end and thus to use three different final conjugation chemistries.

Since these methods have less than 100% yield, the correct MBBs should be purified at the end, for example by affinity separation using each required DNA sequence in turn, or gel-retardation (possibly using all complementary sequences at once), or by overall charge or molecular weight (e.g. by DNA-denaturing electrophoresis). (It will probably be desirable to purify the MBBs at various intermediate stages as well, especially during development of the protocols.)

The bases of the flexible portions of the termini (and also the sulfhydryl group of the cysteine) need to have some minimum separation, which I estimate to be 6 to 8 Angstroms, in order to permit use of the resulting MBBs in the DGAP process.

Possible Problems:

Only three attachment sites can be functionalized. For a few applications this will be sufficient (since 3 fixed points are enough to hold the protein in a unique orientation, as mentioned in the Introduction), but for most applications we would prefer to have at least four distinct C-sites.

Possible Variations:

If two surface cysteines were present, this method could be used to produce MBBs with four C-sites but with only three different kinds of DNA sequences attached to them (with two copies of one sequence on the cysteine C-sites). For some applications this would be preferable to having only 3 sites in spite of the nonuniqueness of two of the sequences.
It may be possible to use a single lysine as a fourth C-site with its own specific attachment chemistry.
This method might be combined with the hybridization detection used in Method 3 to allow two (or possibly more) cysteines to receive different sequences in a distinguishable way (assuming the appropriate C-site-pair distances were sufficiently different, as described in Method 3), making possible MBBs with 4 (or possibly more) distinct C-sites.

Method 3. Separation of various DNA-conjugated species of streptavidin, based on distances between DNA attachment sites.

Introduction
Details and discussion
Possible problems and variations

This is the method we will try first, since it is likely to be the easiest one for making many MBBs differing only in DNA sequences, which is desirable for constructing large assemblies of MBBs. (Unfortunately it is not the simplest method to describe.)

Core protein: streptavidin tetramer (could also use avidin or deglycosylated avidin [Green 1990])

C-sites: four biotin-binding sites

Outline of method (see Figs. 1-5)

Mix streptavidin tetramers in solution with two species of doubly-biotinylated ssDNA (described below; Fig. 5). The two biotins on each ssDNA will be designed to be close enough that they must bind to a pair of binding sites on a single side of a streptavidin tetramer (or to sites on two different tetramers) [Green et al. 1971]. The desired MBB (Fig. 2) consists of a single streptavidin tetramer conjugated to one ssDNA of each species, in a specific one of the two possible geometrical arrangements (given that each ssDNA has bound to two sites on one side). The desired arrangement will allow hybridization between parts of the two ssDNAs on a single MBB. This will be impossible in the other products, either because of the different distances between different pairs of biotin-binding sites, or because two copies of the same species of ssDNA are conjugated to one tetramer. Thus under non-denaturing conditions, only the desired end product particles will consist of just one streptavidin tetramer, conjugated to the right amount of DNA, and with the desired hybridization of some of that DNA; this will allow the correct MBBs to be separated from the others.

Details and discussion

This method takes advantage of the ease of obtaining biotinylated DNA and attaching it to this protein. (For various reasons (mentioned below) we may prefer to use either streptavidin, avidin, or deglycosylated avidin; the following discussion applies in any of these cases.)

This method depends on the symmetric structure of streptavidin (or avidin, which has a very similar structure) and the specific arrangement of its biotin-binding sites (or more precisely, the points at which bound biotin protrudes from the protein) [PDB files 1SLF and 1STP, RCSB Protein Data Bank; Green 1990; Green et al. 1971; Livnah et al. 1993; Hendrickson et al. 1989].

Figures 1a-d depict the structure of streptavidin in a schematic form.

Fig. 1a and 1b

Fig. 1a (corresponding top view shown in Fig. 1c) shows the approximate locations of the bound-biotin carboxyl groups ("biotin binding sites" B1-B4), which are on alternate vertices of an imaginary rectangular solid, embedded in the streptavidin tetramer, with the dimensions shown. Part of this solid, showing site B2, is also visible in Figs. 1b and 1d. (Fig. 1d also shows site B1.) (These dimensions were computed from measured inter-atomic distances in the PDB files referred to in the main text, but the atoms used to represent the binding sites were not parts of the carboxyl groups themselves, but were sulfur atoms within sulfate ions bound in approximately the same locations. The resulting error in binding site locations is estimated to be less than 1 A (Angstrom) in any direction, based on comparisons between PDB files containing either bound sulfate or bound biotin.)

Fig. 1b (corresponding top view shown in Fig. 1d) shows a highly schematic view of a streptavidin tetramer, along with the estimated length of each segment of the shortest paths (over the protein surface) which connect various pairs of binding sites. All path segments not shown are related by symmetry to one of the ones shown. (The paths themselves can be best seen in Fig. 2, although only the shortest path, from B1 to B3, is represented there.)

Fig. 1c and 1d : Top Views

The actual shape of the tetramer looks quite different from the shape shown, but the locations and lengths of the shortest over-surface paths (as visually inferred from the PDB file) are approximately correct. The figure has the same 2-fold rotational symmetries as the tetramer, as well as, for simplicity of presentation, additional mirror symmetries (of the overall shape only, not of the binding site locations) which the tetramer does not have.
Sites B1 and B3 are connected by three segments in succession of lengths 13 A, 22 A, and 13 A (the last segment is on the bottom and thus not visible in the figure). An alternative path from B1 to B3, going behind the protein (not shown), is much longer, with segment lengths of 24 A, 22 A, and 24 A.
Sites B1 and B4 are connected by segments of lengths 13 A, 22 A, and 24 A (as well as by another path of the same length behind the protein, not shown, with segment lengths 24 A, 22 A, 13 A).
All other pairs of sites are related by symmetry to one of these pairs (or to the B1-B2 pair, with a much shorter single-segment path of 22 A, not shown).
The procedure described in the main text depends on the difference between the shortest-path length connecting sites B1 and B3, and that connecting sites B1 and B4. This difference is estimated as 11 A (24 A minus 13 A). The accuracy of this estimate depends only on the accuracy of the 24 A and 13 A segment length estimates (since the 22 A segment and one of the 13 A segments are shared by both paths being compared).
The actual path taken by the ssDNA backbones and dsDNA helix (as shown in Fig. 2) would of course be longer due to the DNA's necessary separation from the protein surface, adding perhaps 4 A per corner turned, but this effect is approximately the same for both paths. The effective path lengths will have to be determined by experiment, as discussed in the main text, but it is likely that the actual path-length difference will be almost as great as that for the idealized paths shown in this figure.
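The path-length arithmetic in this caption can be reproduced with a few lines of Python. The segment lengths are the estimates quoted above; the dictionary keys and variable names are hypothetical labels introduced only for this sketch.

```python
# Segment lengths in Angstroms, as estimated in the caption above.
SHORT, MIDDLE, LONG = 13, 22, 24

# Over-surface paths between pairs of biotin binding sites.
paths = {
    "B1-B3 (front)": [SHORT, MIDDLE, SHORT],
    "B1-B3 (behind)": [LONG, MIDDLE, LONG],
    "B1-B4": [SHORT, MIDDLE, LONG],
    "B1-B2": [MIDDLE],
}

totals = {name: sum(segments) for name, segments in paths.items()}
for name, total in totals.items():
    print(f"{name}: {total} A")

# The separation procedure relies on this difference.
difference = totals["B1-B4"] - totals["B1-B3 (front)"]
print(f"B1-B4 minus shortest B1-B3: {difference} A")  # 11 A
```

Because the 22 A segment and one 13 A segment appear in both of the compared paths, they cancel in the subtraction, which is why only the 24 A and 13 A estimates affect the 11 A difference.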

We use the following facts about the structure of the streptavidin tetramer (Fig. 1) (a modified PDB file showing the features discussed is available on request):

Each monomer provides one biotin binding site, and all monomers (and thus all binding sites) are equivalent to each other (i.e. are related by some symmetry transformation of the tetramer as a whole).
In spite of the equivalence of individual sites, different pairs of sites can occur in three different relative positions (i.e., given any one site, each of the other three sites is distinguishable relative to the chosen site). We use the different over-surface (geodesic) distances between different pairs of sites to distinguish (by presence or absence of hybridization) between the conjugates in which the biotinylated ssDNA has been attached in the desired or undesired arrangement. (More precisely, we use differences between the lengths of specific combinations of ssDNA/dsDNA which would be sufficiently long (taking the lack of flexibility of dsDNA into account) to join different pairs of sites.)
The tetramer has 222 point-group symmetry, i.e. three mutually-perpendicular 2-fold axes of rotational symmetry. It is therefore chiral, even ignoring the protein itself and considering only the spatial locations of the biotin binding sites. These sites (i.e. the biotin carboxyl groups protruding from the protein surface) are located at 4 of the 8 corners of an imaginary rectangular solid (embedded within the protein) of dimensions 9 by 20 by 28 Angstroms (Fig. 1a, and calculations from PDB coordinates). The pairs of sites on the same face of the protein are at corners diagonally across from each other on the 9 by 20 Angstrom faces of this imaginary solid. (Thus the sites in each pair are about 22 Angstroms apart.)
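As a quick check of the 22 Angstrom figure, the straight-line distances between alternate corners of a 9 by 20 by 28 Angstrom box can be computed directly. This is only a sketch of the idealized box geometry: the corner coordinates are an assumed placement, no particular corner is identified with B1-B4, and these are through-space distances, not the over-surface path lengths used elsewhere in this document.

```python
from itertools import combinations
from math import dist

# Box dimensions in Angstroms, taken from the text above.
A, B, C = 9.0, 20.0, 28.0

# Four alternate corners of the box (no two of them share an edge).
# Which corner carries which of the labels B1-B4 is not assumed here.
corners = [(0, 0, 0), (A, B, 0), (A, 0, C), (0, B, C)]

# Every pair of alternate corners is joined by a face diagonal.
distances = sorted(round(dist(p, q), 1) for p, q in combinations(corners, 2))
print(distances)  # [21.9, 21.9, 29.4, 29.4, 34.4, 34.4]
```

The six pairs fall into three classes of two, one class per face type, and the shortest class (the diagonal of the 9 by 20 face) is the roughly 22 Angstrom separation quoted for sites on the same face.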

Figures 2a-d depict the structure of desired end product MBB in a schematic form.

Fig. 2a and 2b

Fig. 2c and 2d : Top Views

Fig. 2 uses the same views as Fig. 1, but shows the locations of ssDNA and dsDNA in the desired product of the initial mixing step (with the hybridization which will only be possible for this product). ssDNA12 (see Fig. 5 in text for nomenclature) is shown in blue, and ssDNA34 in red. The ssDNA ends are labelled with 3' and 5' and with e1-e4 (for end 1 through end 4) as in Fig. 5. The backbones of the hybridized dsDNA region are shown as small rectangles forming a double helix (with the major groove facing the protein surface). The segments which remain unhybridized are shown as straight or wavy lines depending on whether they will be stretched to almost their maximal lengths (true for the ssDNA bases shown as dots in Fig. 5) or will remain free to move (true for the ssDNA bases shown as X's in Fig. 5). The wavy lines (ssDNA X bases) are the ones intended to be left free for further hybridization when the product MBB is assembled with other MBBs in the DGAP process.

The two species of doubly-biotinylated ssDNA to be attached have structures as follows:

Figure 5:

name                    structure (with end labels)
-------   --------------------------------------------------------------
ssDNA12:  end 1 ->  3'XXXXXXXXXXxxxxxxx....b.....bXXXXXXXXXX5'  <- end 2

ssDNA34:  end 3 ->  3'XXXXXXXXXXxxxxxxx....b.....bXXXXXXXXXX5'  <- end 4

Key to symbols:

3' and 5' indicate the orientation of each ssDNA backbone
b: biotin (assumed attached directly to backbone; see below)
X (upper-case): any DNA base which will be left unhybridized in the desired MBB structure (and is used only when the MBB is later assembled by the DGAP process)
x (lower-case): any DNA base which will be hybridized in the desired MBB structure (choice of specific sequences will be discussed later)
. (period): a DNA base used solely as a spacer (i.e. it may be replaced by some other flexible linker molecule if necessary)

(The precise number of bases to be used in specific segments will have to be determined by experiment, though the lengths shown above are meant to be approximately correct.)

(Biotin is often attached to DNA with a long linker arm; in the structures above I am assuming it can be attached directly into the ssDNA backbone with no additional linker arm. This is a reasonable assumption given the wide range of biotin attachment configurations in use. If this is not true we will need to use linkers other than ssDNA bases in place of some of the DNA bases shown by periods above.)

Figure 3 shows some of the species of conjugates that can be obtained as products of the initial mixing step. (As mentioned previously, the separation between the two biotins on each ssDNA molecule is kept short enough that, if both biotins bind to one streptavidin tetramer, they must bind at either sites 1 and 2, or at sites 3 and 4, since all other pairs of sites are separated by a greater distance than the biotins are.)

Figures 3a-i depict products of the initial mixing step (assuming no dimerization).

Fig. 3a and 3b : Desired product (assuming no dimerization).

Fig. 3c through 3g : Undesired products with right amount of DNA (hybridization not possible without forming aggregates).

Fig. 3h and 3i : Undesired products with wrong amount of DNA (examples).

Figure 4. Dimer of undesired products (one example).
Figs. 3 and 4 show various products of the initial mixing step (and subsequent hybridization). The individual subfigures are discussed in the main text.
The ssDNAs are colored and labelled as in Fig. 2. Hybridized dsDNA regions are shown as colored ssDNA backbones on the surface of an imaginary cylinder representing the shape of the helix. Biotin is shown as a small T-shape.
Streptavidin tetramers are shown even more schematically than in the previous figures, as squares with four T-shaped holes representing biotin-binding sites. These squares are oriented as if the rectangular solid in Fig. 1a was seen from the left side (not from the front or back). The biotin sites in the tetramers shown in Figs. 3a and 3b are labelled B1-B4 accordingly. Note that four different labellings would be valid, due to the symmetry of the tetramer; in particular, Fig. 3b is identical to Fig. 3a if rotated 180 degrees around a vertical axis. (The other axes of 2-fold rotational symmetry are the horizontal axis and the axis perpendicular to the plane of the figures.)

It will be desirable to do the mixing at sufficiently low concentrations of all ingredients that each ssDNA molecule or streptavidin tetramer usually encounters only one other molecule at a time, to maximize the chance that both biotins of one ssDNA molecule bind to the same streptavidin tetramer. (A low concentration is also necessary to avoid aggregation of streptavidin due to low solubility.) The initial mixing should probably be done under denaturing conditions for DNA, so that individual ssDNA molecules are usually encountered separately, but we will determine by experiment whether this is actually better, and which ingredients should be in excess, for maximizing the yield of the desired product and for ease of the final separation steps.

For ease of discussion, I will describe the separation in two steps even though one combined step may suffice. The first step will remove all structures other than the ones with exactly one protein and two ssDNAs per particle. (Two such undesired particles are shown in Figs. 3h and 3i.) (We have not yet determined which separation technique to use. Niemeyer et al. [1994] have demonstrated separation of streptavidin-DNA conjugates carrying varying numbers of DNA molecules by both ion-exchange chromatography and non-denaturing PAGE. Isoelectric focusing might also be expected to be useful. Niemeyer et al. [1994] have also demonstrated gel-retardation of streptavidin-DNA conjugates by complementary DNA, which might be necessary for separation of streptavidin-PNA conjugates if we use PNA for other reasons.)

The remaining particles include some with two copies of the same ssDNA (Figs. 3d-3g), and some with one ssDNA of each kind (Figs. 3a-3c); of the latter, some have the desired geometrical arrangement of ssDNA (Figs. 3a and 3b) and some have the other arrangement (Fig. 3c). (All arrangements not shown are equivalent by symmetry to some arrangement which is shown.)
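The classification above can be reproduced by brute-force enumeration. The following Python sketch assumes the idealized model used in this document: sites 1-4 are indexed 0-3, each ssDNA occupies either sites 1 and 2 or sites 3 and 4 in one of two orientations, and each 2-fold rotation swaps the sites in two disjoint pairs. The biotin labels `e1`-`e4` (named after the nearest strand end) and all function names are hypothetical.

```python
from itertools import product

# Identity plus the three 2-fold rotations, as permutations of site indices.
ROTATIONS = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]

# Each strand carries two biotins, labelled by the strand end nearest to them.
STRANDS = {"ssDNA12": ("e1", "e2"), "ssDNA34": ("e3", "e4")}

def rotate(config, rotation):
    """Move the biotin on site i to site rotation[i]."""
    rotated = [None] * 4
    for site, label in enumerate(config):
        rotated[rotation[site]] = label
    return tuple(rotated)

def canonical(config):
    """Pick one representative among all rotated copies of a configuration."""
    return min(rotate(config, r) for r in ROTATIONS)

def is_desired(config):
    """True if the biotins nearest ends 1 and 3 occupy sites 1 and 3
    (indices 0 and 2) or the rotated equivalent, sites 2 and 4."""
    if "e1" not in config or "e3" not in config:
        return False
    return {config.index("e1"), config.index("e3")} in ({0, 2}, {1, 3})

def orientations(strand):
    a, b = STRANDS[strand]
    return [(a, b), (b, a)]

species = {}  # canonical configuration -> (one strand of each kind?, desired?)
for side_a, side_b in product(STRANDS, repeat=2):
    for biotins_a, biotins_b in product(orientations(side_a), orientations(side_b)):
        config = biotins_a + biotins_b  # sites 1,2 followed by sites 3,4
        species[canonical(config)] = (side_a != side_b, is_desired(config))

mixed = sum(1 for is_mixed, _ in species.values() if is_mixed)
desired = sum(1 for _, wanted in species.values() if wanted)
print(len(species), mixed, desired)  # 6 2 1
```

Under these assumptions there are six distinct species with one protein and two ssDNAs: four with two copies of the same ssDNA, and two with one ssDNA of each kind, of which exactly one is the desired arrangement.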

In order to distinguish between these species, we will design the ssDNA sequences so that sequences 1 and 3 (where sequence n means the DNA between end n and the nearest biotin) can hybridize as shown in Figs. 3a and 3b (and Fig. 2), but only if the biotins nearest to ends 1 and 3 are attached to sites 1 and 3 (or the symmetrically equivalent pairs of sites, 3 and 1, 2 and 4, or 4 and 2). This is possible because these pairs of sites are significantly closer (along a path over the protein surface) than the other pairs to which these biotins could be attached (see Fig. 1). (The sensitivity of using the presence or absence of hybridization to distinguish these inter-site distances is discussed below.)

Nondenaturing electrophoresis can be sensitive to differences in hybridization, so we should be able to detect this difference directly, perhaps in the same separation step in which we remove particles with the wrong amount of DNA or protein. Also, particles whose 1 and 3 strands (or two 1 strands or two 3 strands) are prevented from hybridizing to each other are likely to form dimers (or larger polymeric aggregates) in which strands attached to different proteins hybridize (Fig. 4); such particles would certainly be separable in the initial step.

The end result will be the separation of the desired product from all other products of the initial mixing step. In some applications the product can be used directly (in spite of its attached double helix). For other applications we may want to alter the ssDNA using further routine methods, such as ligation to dsDNA with a long overhang (though whether the ligase will be sterically hindered is unknown). Other possible modifications to the final MBBs are discussed below.

Possible problems and variations

Undesired hybridization between otherwise-correct MBBs:
We are assuming that preparation conditions can be found such that hybridization will usually occur between DNA sequences attached to a single particle whenever that is possible, and occur between two particles, if at all, only when neither single particle has the right arrangement of sequences to permit hybridization. Although entropy of particle motion would disfavor dimers, other factors might invalidate this assumption, which will thus have to be tested. It may be possible to initiate rapid hybridization while keeping the particles at sufficiently low concentration (or in a low-mobility environment such as a gel) that most MBBs whose strands can hybridize intra-molecularly do so, and then to "lock in" these hybridizations even after the particles are more concentrated (either by maintaining strong enough hybridization conditions, or by stabilizing the hybridized regions with intra-base-pair disulfide bonds [Goodwin et al. 1994]).

Sensitivity of hybridization to inter-site distance:
The path between the desired pair of sites is about 11 Angstroms shorter than the path between the undesired pair of sites. Since the length of the desired ssDNA/dsDNA configuration (as in Fig. 2) can be adjusted in steps of about 5.9 Angstroms [Smith 1996, Saenger 1984] by adding extra bases into single arms of the ssDNA region, or in even finer increments by incorporating non-DNA linkers into the backbone of the ssDNA part, it should be possible to find lengths of DNA which can fit, in hybridized form, in only one of these two cases. (The distance estimates are not precise, but the difference between the two paths is better known than the path lengths themselves, because the paths are composed of identical segments for most of their lengths (Figs. 1 and 2).)

The dehybridization of one base pair of the dsDNA region would allow the total length of the ssDNA/dsDNA combination to increase by only about 2.5 A (the difference between the length of ssDNA per base, 5.9 A, and the rise of one base pair in dsDNA, 3.4 A). Thus, to make up for the 11 A difference, at least 4 of the base pairs would have to separate (11 A divided by 2.5 A is about 4.4), so the two ssDNA strands could not hybridize when their ends were separated by the longer path length. (Both ssDNA and dsDNA can be stretched to 7 A per base under sufficient tension [Smith 1996, Saenger 1984], but this seems unlikely to be preferred over a lack of hybridization.)
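The length argument above can be checked numerically. The following is a minimal sketch that uses only the per-base lengths and the path difference quoted in the text; the function name and the decision to round up to whole base pairs are illustrative assumptions, and the geometry of Fig. 2 is not modeled.

```python
import math

SSDNA_PER_BASE_A = 5.9    # length of ssDNA per base, in Angstroms (from the text)
DSDNA_RISE_A = 3.4        # rise per base pair in dsDNA, in Angstroms (from the text)
PATH_DIFFERENCE_A = 11.0  # estimated extra length of the undesired path (from the text)

def base_pairs_to_span(extra_length_a):
    """Whole number of base pairs that must dehybridize to gain extra_length_a."""
    gain_per_pair = SSDNA_PER_BASE_A - DSDNA_RISE_A  # about 2.5 A per melted pair
    return math.ceil(extra_length_a / gain_per_pair)

gain = SSDNA_PER_BASE_A - DSDNA_RISE_A
print(f"gain per melted base pair: {gain:.1f} A")
print(f"pairs needed (fractional): {PATH_DIFFERENCE_A / gain:.1f}")
print(f"pairs needed (whole):      {base_pairs_to_span(PATH_DIFFERENCE_A)}")
```

The fractional result is consistent with the figure of 4 base pairs used in the text; counting whole base pairs gives a slightly larger margin.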

Possible errors in estimates of inter-site distances:

I estimated over-surface distances on the protein by visual inspection of 3D protein models based on PDB files (to guess the shortest paths between attachment sites; Fig. 1) and calculation of line-of-sight distances between specific atoms appearing to lie on those paths. (I added sufficient length to account for the actual path of a chain of bonded atoms being separated from the surface by 4 A due to steric hindrance.) (Some parts of the actual paths for the ssDNA segments must be slightly bent outwards compared to the calculated paths, but this effect appears to be about the same for each path.)
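The kind of calculation described above can be sketched as a sum of straight-line segments between chosen atoms. This is only an illustration: the coordinates below are hypothetical and are not taken from any PDB entry, the function name is my own, and the 4 A offset from the protein surface is not modeled.

```python
import math

def path_length(waypoints):
    """Sum of straight-line distances (Angstroms) between consecutive (x, y, z) points."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

# Hypothetical atom coordinates in Angstroms (illustrative only).
desired_path = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
# The undesired path shares every segment of the desired one and adds one more.
undesired_path = desired_path + [(3.0, 15.0, 12.0)]

difference = path_length(undesired_path) - path_length(desired_path)
print(f"desired:    {path_length(desired_path):.1f} A")
print(f"undesired:  {path_length(undesired_path):.1f} A")
print(f"difference: {difference:.1f} A")
```

Because the shared segments cancel in the subtraction, an error in their estimated length affects both totals equally and leaves the difference unchanged, which is why the difference is expected to be more reliable than either path length.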

There are numerous sources of possible error in this estimation. Furthermore, I have neglected consideration of possible interactions between the DNA and protein other than steric hindrance, notably electrostatic forces, which might strongly favor some paths and oppose others. Thus the actual threshold lengths for hybridization will have to be experimentally determined. (I have developed outlines (not included here) of preliminary experimental protocols to determine the path lengths necessary for hybridization, in which each path's required length of DNA can be determined independently. Therefore it should not be necessary to try all pairs of path lengths in combination. Given the range of likely path lengths, a few trials in succession should suffice to measure them, once the experimental technique itself is debugged.)

Since the estimate of path-length-difference is more reliable than the estimates for the path lengths themselves (as discussed above), and since this difference is sufficient to prevent hybridization of 4 base pairs, the margin of error suggests that this method is likely to be workable.

If it appears that electrostatic effects are causing problems, we have the option of using NeutrAvidin (a form of deglycosylated avidin available from Pierce Chemical Co., with an isoelectric point much closer to pH 7 than that of streptavidin), and/or PNA (which is uncharged), as well as increasing the ionic strength.

Stability of biotin-streptavidin attachment:

The biotin-streptavidin interaction has a half-life for exchange of biotin of only a few days at 25 degrees C (the precise value depends on pH) [Green 1990, Jones & Kurzban 1995]. At 4 degrees C the half-life is much longer (exchange was undetectable, according to Jones & Kurzban [1995]). The biotin-avidin interaction is much more stable, with a half-life of 200 days at pH 7 and 25 degrees C [Green 1990]. It may be desirable to stabilize streptavidin-based MBBs with additional covalent crosslinks between the protein and the biotin-DNA conjugate, or to stabilize assemblies of MBBs by covalent crosslinks between the proteins (which we assume will be desirable in general when using the DGAP process). Details of possible covalent crosslinks have not been developed. Genes of both avidin and streptavidin are available for genetic engineering if necessary for surface residue replacements [Green 1990, Chandra & Gray 1990]. (Having to genetically engineer the protein would remove some of the advantage of this method in ease of development, compared to some of the other ones discussed here, which can furthermore be applied to a wider variety of proteins. However, once the modified protein was developed, this method would still be easier to practice than the others when many MBBs differing only in DNA sequences were desired.)
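To get a feel for what these half-lives imply for storage and handling times, the sketch below converts a half-life into the fraction of conjugates that have not yet exchanged their biotin. It assumes simple first-order (exponential) kinetics, which the text does not state. The avidin value is the one quoted above; the streptavidin value of 3 days is only a placeholder for "a few days" and is not a measured number.

```python
def fraction_unexchanged(days, half_life_days):
    """Fraction of biotin not yet exchanged after `days`, assuming first-order kinetics."""
    return 0.5 ** (days / half_life_days)

AVIDIN_HALF_LIFE_DAYS = 200.0      # pH 7, 25 degrees C, as quoted from [Green 1990]
STREPTAVIDIN_HALF_LIFE_DAYS = 3.0  # ASSUMED placeholder for "a few days"

for days in (1, 7, 30):
    strep = fraction_unexchanged(days, STREPTAVIDIN_HALF_LIFE_DAYS)
    avidin = fraction_unexchanged(days, AVIDIN_HALF_LIFE_DAYS)
    print(f"day {days:2d}: streptavidin {strep:.3f}, avidin {avidin:.3f}")
```

Under these assumptions, a streptavidin-based MBB kept at 25 degrees C would lose a substantial fraction of its original DNA within a week, while an avidin-based one would lose only a few percent, which illustrates why crosslinking or cold storage may be needed for streptavidin.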

Stability of tetramers:

We cannot exclude the possibility that the individual monomers in streptavidin or avidin tetramers might rearrange [Jones & Kurzban 1995], or even exchange between proteins, at some slow rate, rendering our "building blocks" unstable. If so, we will have to stabilize the tetramers with covalent crosslinks of some kind, such as disulfide bonds between genetically-introduced cysteines.

Oligo-tetramers:

Aggregation of streptavidin tetramers into higher-order forms, perhaps covalently crosslinked, has been reported [Bayer et al. 1989]. If necessary, we can remove these from our starting materials by gel filtration [Bayer et al. 1990]. However, since the final separation must in any case remove tetramers linked intermolecularly by doubly-biotinylated ssDNAs, it will probably not be necessary to remove these aggregates from the starting material in a separate step.

Alternative separation methods:

The two ssDNA sequences shown are of the same length, but it would be possible to use sequences of two different lengths, which would aid in the separation of the particles with two copies of one sequence, and would provide more information about the yield of each form (shown in Fig. 3) after the initial mixing.

Application to other proteins:

An analogous method should be possible for other tetrameric proteins with the same 222-point symmetry as streptavidin, using a DNA-attachment method other than biotin binding. Low protein concentration will favor attachment to one protein of both conjugation groups on each ssDNA during the initial mixing/binding step, even if this conjugation is slow and possibly reversible. Subsequent steps will be precisely analogous except for the required DNA lengths being different.

Stabilization of hybridized region of MBB:

It may be possible to stabilize the hybridized dsDNA region in the final MBB with intra-base-pair disulfide bonds [Goodwin et al. 1994] in case this would help with assembly of several MBBs.

Digestion of hybridized region with restriction enzymes:

Alternatively (or in addition) we may want to digest the hybridized dsDNA region with a restriction enzyme, in which case that region will have to be made at least 8 base pairs long [New England Biolabs 1996, p. 238], and we will have to test for steric hindrance of the enzyme by the core protein. (If steric hindrance occurs, this technique could probably still be used if restriction itself were used as the test for correctness of an MBB.) The hybridized regions left after restriction would be sufficiently short (4 base pairs minus half of the sticky-end length made by the enzyme) not to interfere with hybridization of the resulting ssDNAs to ssDNAs introduced later, e.g. from other MBBs in a DGAP assembly. Although the restriction-shortened ssDNAs would have the same sequences for short lengths at their ends (due to the restriction site being palindromic), they could still be different farther from the end, and thus be specific for hybridization to different external sequences.
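The "4 base pairs minus half of the sticky-end length" figure can be made concrete with a small calculation. This is a sketch under two assumptions of mine: the recognition site is centered in the hybridized region, and the enzyme makes a symmetric staggered cut. The function name is illustrative, and the enzymes mentioned in the comments are examples of overhang lengths only, not enzymes selected in this document.

```python
def remaining_duplex_bp(region_bp, overhang_nt):
    """Base pairs still paired on each fragment after a centered, symmetric cut."""
    if overhang_nt < 0 or overhang_nt > region_bp:
        raise ValueError("overhang must be between 0 and the region length")
    if (region_bp - overhang_nt) % 2:
        raise ValueError("a centered cut needs region and overhang of equal parity")
    # Half the region minus half the single-stranded overhang.
    return (region_bp - overhang_nt) // 2

REGION_BP = 8  # minimum hybridized length quoted in the text

# A 4-nt overhang (as made by, e.g., EcoRI) versus a blunt cut (0-nt overhang).
for overhang in (4, 2, 0):
    left = remaining_duplex_bp(REGION_BP, overhang)
    print(f"{overhang}-nt overhang: {left} bp remain paired on each fragment")
```

For an 8 base pair region this gives 2 paired base pairs per fragment with a 4-nucleotide overhang and 4 with a blunt cut, so in every case the residual duplex is short.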

References

Bayer, E. A., Ben-Hur, H., Hiller, Y., and Wilchek, M. (1989). Postsecretory modifications of streptavidin. Biochem. J. 259, 369-376.
Bayer, E. A., Ben-Hur, H., and Wilchek, M. (1990). Isolation and properties of streptavidin. Methods Enzymol. 184, 80-89.
Chandra, G., and Gray, J. G. (1990). Cloning and expression of avidin in Escherichia coli. Methods Enzymol. 184, 70-79.
Fields, R., and Dixon, H. B. F. (1968). A spectrophotometric method for the microdetermination of periodate. Biochem. J. 108, 883-887.
Gaertner, H. F., Rose, K., Cotton, R., Timms, D., Camble, R., and Offord, R. E. (1992). Construction of protein analogues by site-specific condensation of unprotected fragments. Bioconjugate Chem. 3, 262-268.
Geoghegan, K. F., and Stroh, J. G. (1992). Site-directed conjugation of nonpeptide groups to peptides and proteins via periodate oxidation of a 2-amino alcohol. Application to modification at N-terminal serine. Bioconjugate Chem. 3, 138-146.
Goodwin, J. T., Osborne, S. E., Swanson, P. C., and Glick, G. D. (1994). Synthesis of a disulfide cross-linked DNA triple helix. Tetrahedron Letters 35, 4527-4530.
Green, N. M., Konieczny, L., Toms, E. J., and Valentine, R. C. (1971). The use of bifunctional biotinyl compounds to determine the arrangement of subunits in avidin. Biochem. J. 125, 781-791.
Green, N. M. (1990). Avidin and streptavidin. Methods Enzymol. 184, 51-67.
Handel, T. (1995). Personal communication.
Hendrickson, W. A., Pahler, A., Smith, J. L., Satow, Y., Merritt, E. A., and Phizackerley, R. P. (1989). Crystal structure of core streptavidin determined from multiwavelength anomalous diffraction of synchrotron radiation. Proc. Natl. Acad. Sci. USA 86, 2190-2194.
Hermanson, G. T. (1996). Bioconjugate Techniques. Academic Press, San Diego.
Jones, M. L., and Kurzban, G. P. (1995). Noncooperativity of biotin binding to tetrameric streptavidin. Biochem. 34, 11750-11756.
Kanaya, S., Nakai, C., Konishi, A., Inoue, H., Ohtsuka, E., and Ikehara, M. (1992). A hybrid Ribonuclease H: A novel RNA cleaving enzyme with sequence-specific recognition. J. Biol. Chem. 267, 8492-8498.
King, T. P., Zhao, S. W., and Lam, T. (1986). Preparation of protein conjugates via intermolecular hydrazone linkage. Biochem. 25, 5774-5779.
Livnah, O., Bayer, E. A., Wilchek, M., and Sussman, J. L. (1993). Three-dimensional structures of avidin and the avidin-biotin complex. Proc. Natl. Acad. Sci. USA 90, 5076-5080.
New England Biolabs (1996). 1996/97 Catalog.
Niemeyer, C. M., Sano, T., Smith, C. L., and Cantor, C. R. (1994). Oligonucleotide-directed self-assembly of proteins: semisynthetic DNA-streptavidin hybrid molecules as connectors for the generation of macroscopic arrays and the construction of supramolecular bioconjugates. Nucleic Acids Res. 22, 5530-5539.
Rose, K., Vilaseca, A., Werlen, R., Meunier, A., Fisch, I., Jones, R. M. L., and Offord, R. E. (1991). Preparation of well-defined protein conjugates using enzyme-assisted reverse proteolysis. Bioconjugate Chem. 2, 154-159.
Saenger, W. (1984). Principles of Nucleic Acid Structure. Springer-Verlag, New York.
Saraswat, L. D., Pastra-Landis, S. C., and Lowey, S. (1992). Mapping single cysteine mutants of light chain 2 in chicken skeletal myosin. J. Biol. Chem. 267, 21112-21118.
Smith, S. B., Cui, Y., and Bustamante, C. (1996). Overstretching B-DNA: The elastic response of individual double-stranded and single-stranded DNA molecules. Science 271, 795-799.