Date: 2017-05-23

There are a lot of interesting and useful experiments you can do to test interactions with DNA in the living cell. Chromatin Immunoprecipitation Sequencing (ChIP-Seq) is a well-known one to spot protein-DNA interactions, and the graphic below (from the Swiss Institute of Bioinformatics) will show you broadly how it works. Proteins that are interacting with DNA at your experimental time point are cross-linked onto it, and the DNA itself is then sheared into smaller pieces. The linked DNA-protein pieces are then immunoprecipitated with the relevant antibody (or antibodies), then the DNA involved is liberated again and sequenced. The antibody you use lets you dial in on a particular protein, and you then get the sequence it’s interacting with. There are, of course, a lot of subtleties in the way the experiment is run and analyzed (for example, optimizing the time of the crosslinking step, which can indeed be too short or too long).

A similar sort of experiment can be done to determine DNA-DNA interactions, which gives you information about the three-dimensional chromatin structure (chromosome conformation capture). The first of these was called 3C, from that phrase, and it tests one particular interaction at a time. (A combination of the 3C experiment and the ChIP one is the gloriously named ChIA-PET protocol.) Chromosome conformation capture-on-chip (4C) lets you test one locus against all the others, and 5C (chromosome conformation capture carbon copy) is a way to test all sorts of loci and their interactions within a certain region, all at the same time. The next wrinkle on all this, Hi-C (and we can be grateful that the inventors resisted the temptation to come up with a 6C acronym), is the “all against all” experiment, testing (in theory) all the chromatin interactions at once, and is getting a lot of use these days.

A Hi-C schematic (from that last reference) is shown above. Broadly, DNA is crosslinked to whatever histone protein it’s in proximity with, and these links remain after the DNA is broken into convenient pieces with restriction enzymes. Then the 5′ overhangs are filled in with nucleotides, one of which carries a biotin label (a key step), followed by “blunt-end ligation”, under conditions that almost entirely favor ligation between fragments that were already in proximity to each other (as opposed to between different sets of pre-crosslinked partners). This also means that the junction points are marked with biotin residues, which comes in handy when you shear things into pieces and can pull those pieces down with streptavidin-coated beads. You wash everything else off and sequence the purified junction pieces, giving you a whopping library of interacting fragments, which can be mapped back to show you what the three-dimensional structures were that were likely to produce them. Since the original reports, the technique has been extended in all sorts of ways, including down to single-cell levels and combinatorial single-cell studies as well. As I understand it, Hi-C is the way to go to get a general look at chromosomal organization, while the resolution of the 4C experiments is higher if you want to take a closer look at a specific interaction.
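
To make the "mapped back" step a bit more concrete, here is a minimal sketch of how sequenced junction pieces are commonly summarized: each read pair gives two genomic positions, and counting pairs per pair of bins gives a contact matrix. This is only an illustration, not the pipeline from any of the papers above. The function name `contact_matrix`, the toy coordinates, and the bin size are all hypothetical, and real analyses also need read alignment, filtering, and normalization that are left out here.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Count read pairs per (bin, bin) cell for a single chromosome.

    pairs: iterable of (pos_a, pos_b) mapped positions of the two ends
           of each ligation junction (hypothetical toy input).
    Returns a symmetric list-of-lists matrix of raw contact counts.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Keep the matrix symmetric: a contact has no direction.
            matrix[j][i] += 1
    return matrix


# Hypothetical junction read pairs on a 1000-bp toy "chromosome".
pairs = [(120, 880), (150, 910), (130, 140), (500, 530), (860, 105)]
m = contact_matrix(pairs, chrom_length=1000, bin_size=250)

for row in m:
    print(row)
```

In this toy data, the first and last bins show repeated contacts even though they are far apart in sequence, which is the sort of signal that gets read as a loop or fold in the three-dimensional structure.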

These experiments are all very nice, and they give you a lot of information that’s otherwise not easy to come by (especially in such quantities). Both the ChIP-Seq and the conformation-capture approaches clearly depend on the speed and low cost of modern sequencing techniques – you wouldn’t have dreamed about trying such experiments in the old days, sonny (just get my wife started sometime on her days of dealing with those big DNA sequencing gels back in the early 1990s). They also assume modern software and processing power to assemble all these little chunks of data back into a coherent whole (and that is still no stroll through the peonies, for sure). But the chemists in the audience – and I hope some of you are still with me here – may have noticed that I’m glossing over still more details of interest.

Specifically, these techniques also depend on a crosslinking step between DNA and protein, and it’s worth stopping to ask just how that’s done. The answer is formaldehyde, and plenty of it. From an organic chemistry standpoint, we’re looking at a lot of (thio)aminals, gem-diamines, etc. Interestingly, the structures of the most likely adducts were not reported until 2010. Another thing to remember is that these reactions are reversible – which, to be sure, is part of their appeal in the protocols above – but it’s important to have some idea of the relevant rate constants. (This paper addresses this directly, and good for them.) There’s been a recent report using isotopically labeled formaldehyde along with mass spec to get more details of the process.
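
For a feel of why the rate constants matter, here is a small sketch that treats crosslink reversal as a simple first-order process, so the fraction of crosslinks still intact after time t is exp(-k·t) and the half-life is ln(2)/k. The first-order assumption and the rate constants below are mine, chosen purely for illustration; they are not values from the papers mentioned above.

```python
import math


def fraction_remaining(k_rev, t):
    """Fraction of crosslinks still intact after time t, assuming
    first-order reversal with rate constant k_rev (same time units as t)."""
    return math.exp(-k_rev * t)


def half_life(k_rev):
    """Time for half of the crosslinks to reverse under first-order decay."""
    return math.log(2) / k_rev


# Hypothetical rate constants (per hour), standing in for "cold" and
# "heated" conditions. Placeholders only, not measured values.
for label, k in [("slow", 0.01), ("fast", 1.0)]:
    print(label, "half-life (h):", round(half_life(k), 2),
          "| intact after 4 h:", round(fraction_remaining(k, 4.0), 3))
```

The practical point is that the same adducts can look effectively permanent or quite fleeting depending on the conditions, which is why it helps to know the numbers before choosing incubation times.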

And beyond these issues, you have to wonder about the degree to which all these various crosslinks form. The assumption that formaldehyde is a Universal Crosslinker does not seem to be justified:

The most problematic step is formaldehyde fixation. It is commonly believed that formaldehyde can fix any DNA–protein complex. However, this assumption is far from being universally verified. For example, the lac repressor cannot be fixed to DNA by formaldehyde, even though its DNA-binding domain contains a number of basic amino acid residues [2]. The same was reported for NF-κB [3]. To specialists in the field, there are chromatin components that are proverbially difficult to cross-link, and specific protocols have been elaborated to solve this problem in some individual cases, mainly in an empiric way (see for instance [4]). It was established that there is a temporal threshold for cross-linking reactions such that once the residence time of a protein drops to <5 s, it becomes ‘invisible’ to formaldehyde cross-linking [5]. The formaldehyde fixation procedure remains in fact empirical, and little is known about the specificity of in vivo cross-linking, its efficiency and the chemical adducts induced by this procedure. Therefore, scientists performing cross-linking experiments are actually flying blind, and this can cause major problems in data interpretation [6, 7].

As those references show, there are certainly research groups that are looking into these questions, but at the same time, I think it’s safe to say that there are a lot more papers published that just take the data and run with them. There are surely some cases where this has messed things up, and probably a lot more instances where it’s led to incomplete analyses. As the quote above details, there are a lot of reasons for crosslinks not to form, or for some to be formed much more readily than others, and these are the (sometimes unacknowledged) backdrop for all the data handling that follows. These problems have led to efforts to find new crosslinking agents, and it will be interesting to see how these get picked up by other researchers.
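
One way to picture the residence-time problem from the quote is as a race between two events: the protein falling off the DNA and the crosslink forming while it is still bound. Here is a toy sketch of that race, assuming both are simple exponential (memoryless) processes, so that the chance of capture in a single binding event is k_xl / (k_xl + k_off). This is my own back-of-the-envelope model, not the analysis from the cited work, and the crosslinking rate `k_xl_per_s` is a made-up placeholder.

```python
def capture_probability(residence_time_s, k_xl_per_s):
    """Chance that a crosslink forms before the protein dissociates,
    for one binding event, in a toy competing-rates model.

    residence_time_s: mean time the protein stays bound (seconds).
    k_xl_per_s: hypothetical crosslinking rate while bound (per second).
    """
    k_off = 1.0 / residence_time_s
    return k_xl_per_s / (k_xl_per_s + k_off)


# Hypothetical crosslinking rate; not a measured value.
K_XL = 0.05

for tau in (0.5, 5.0, 50.0, 500.0):
    print(f"residence time {tau:>6} s -> capture probability "
          f"{capture_probability(tau, K_XL):.3f}")
```

Even in this crude picture, short-lived complexes are captured far less often than long-lived ones, so the final data are weighted toward stable interactions in a way that the sequencing reads alone will not reveal.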

And all this, to me, is an example of where organic chemistry eyes can help out with chemical biology and the molecular biology beyond it. I’ve long thought that the key to doing good chemical biology is to recognize that you’re making and breaking real bonds between real molecular species – abstraction is necessary to put these things up on a screen during a seminar, but abstraction is not what’s happening in your vials. In these experiments, for example, there are protein-DNA interactions that you’re just not going to capture with the standard formaldehyde protocols, and you have little or no way of knowing a priori what you might be missing, or how important the missing things might be. “Flying blind”, as that reference above puts it, can only get us all so far, and I think that synthetic chemists should be helping to light up the instrument panel.

