Date: 2018-08-07

Here's a look (open access) at Eli Lilly's (NYSE:LLY) screening collection in terms of PAINS filters, and there are things for everyone to argue about in it. The entire concept of these filters has been occasion for argument, of course. Allow me to caricature some of the opinions that you hear in these arguments: at one extreme, it would go something like, "Damn right. The screening collections have compounds in them that never should have been put in there in the first place and just waste everyone's time. Filtering those out at the start is a public service." The other end of the spectrum would be, "Barbaric. We're supposed to be scientists and make our judgments based on evidence. Crossing off compounds just because you don't like their structures (or even worse, just because somebody else didn't) will reduce your chances of ever finding chemical matter to work with. Get your noses out of the air."

There's something in both of these exaggerations. My own opinion is that it's a sliding scale. As you progress along it, the chances that a given compound (or whole structural class) is going to waste your time increase. If you want to work on them past a certain point (a point of your own choosing, mind you), then you should do so in full knowledge that there are people who would not, and know why not. I'm sliding over a number of important details, of course - how that scale is constructed, who says what compounds are on it, the issues with them varying according to the assay, their intended use, etc. But in the abstract, that's my own take.

The Lilly group took a look at its compounds via the original PAINS filters and evaluated things based on their behavior in six assay types (including the AlphaScreen format that started the whole thing), compound stability, cytotoxicity, and Hill slope. A high value in that last one generally indicates promiscuous/multivalent binding as opposed to a 1:1 complex - details here and here. This represents around 14 million data points (often dose-response) from over 3,000 assays, so it's a pretty good-sized selection. Compounds that had been flagged as impure were excluded (but see below). Calculating the hit rate of the PAINS-flagged compounds versus a random set showed that AlphaScreen, FRET, and fluorescence polarization assays (in that order) showed the most enrichment in promiscuous compounds (recall that the original PAINS paper came out of an AlphaScreen campaign). Looking at the specific structural alerts, it also appeared that these are more AlphaScreen-focused. The only two that really seem to cause pan-assay trouble are 1,4-diamino aryls, with one nitrogen substituted by dialkyl, and our good old friends the rhodanines (with the exo alkene bond).
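To make the "hit rate versus a random set" comparison concrete, here is a minimal sketch of an enrichment ratio. The function names, the assay labels, and every count are made up for illustration; none of them come from the Lilly paper, and the paper's own statistics may well be computed differently.

```python
def hit_rate(n_hits: int, n_tested: int) -> float:
    """Fraction of tested compounds scored as hits."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_hits / n_tested


def enrichment(flagged_hits: int, flagged_tested: int,
               random_hits: int, random_tested: int) -> float:
    """Ratio of the flagged-set hit rate to the random-set hit rate."""
    baseline = hit_rate(random_hits, random_tested)
    if baseline == 0:
        raise ValueError("random-set hit rate is zero; enrichment is undefined")
    return hit_rate(flagged_hits, flagged_tested) / baseline


# Hypothetical counts, for illustration only:
# (flagged hits, flagged tested, random hits, random tested)
example_counts = {
    "assay_A": (60, 1000, 20, 1000),
    "assay_B": (25, 1000, 20, 1000),
}
example_enrichment = {
    name: enrichment(*counts) for name, counts in example_counts.items()
}
```

A ratio well above 1 for a given assay format says the flagged compounds hit there more often than a random selection does. A ratio near 1 says the alert is not telling you much in that format.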

There's another effect at work with the structural filters, though. This paper suggests that some of them may not be flagging promiscuity across assays as much as they are chemical instability, with the parent structures falling apart to the (reactive) species that are really causing trouble. That's a worthwhile distinction, especially since the numbers of these bad actors don't need to be very large to blow an assay. Many readers here will have had experiences with these things, and a couple of my own are here. You cannot assume that you are always screening a pure set of compounds, because there will always have been some interval since these things were last checked (and that's assuming everything was clean to start with, which is a mighty generous assumption).

Once you move on to cell assays, you have cytotoxicity and other such off-target garbage to worry about. There are far, far too many papers in the literature that report Compound X as hitting in a primary assay of some sort, then show that Compound X is active in some sort of cell assay, and conclude (or let the reader conclude) that these two things are necessarily connected. When it ain't necessarily so. That is especially true for things like growth assays in tumor cell lines and the like. Just assuming that such activity is due to your primary assay readout is... ambitious. The Lilly group found that several PAINS filters seem to be associated with general cytotoxic effects, with the anilines mentioned above and some quinone structures as particular offenders.

As for the Hill slope data mentioned above, a number of PAINS substructures seem to be associated with high values, which is generally not a good sign. The FP and FRET assays seem to be especially sensitive to this sort of thing, but overall, I'd say that this is the category where the filters deliver pretty good value for the effort. To use that sliding scale view that I was talking about before, this would mean that if you find an interesting compound in one of these classes, an early action item should be to check the Hill slope that you got in the assay and keep an eye on it in follow-up assays.
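For anyone who wants to see what "check the Hill slope" can look like in practice, here is a rough sketch using the classic Hill plot: regress log10(f / (1 - f)) against log10(concentration), where f is fractional inhibition, and read the slope off the line. The function names, the synthetic curves, and the 1.5 cutoff are all assumptions for illustration. The paper does not prescribe this procedure or this threshold, and real data would normally go through a proper four-parameter curve fit.

```python
import math


def hill_inhibition(conc: float, ic50: float, slope: float) -> float:
    """Fractional inhibition (0 to 1) from the Hill equation."""
    ratio = (conc / ic50) ** slope
    return ratio / (1.0 + ratio)


def estimate_hill_slope(concs, inhibitions) -> float:
    """Least-squares slope of log10(f / (1 - f)) against log10(conc).

    Points with f <= 0 or f >= 1 are skipped because the logit is undefined.
    """
    xs, ys = [], []
    for c, f in zip(concs, inhibitions):
        if c > 0 and 0.0 < f < 1.0:
            xs.append(math.log10(c))
            ys.append(math.log10(f / (1.0 - f)))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def is_steep(slope: float, threshold: float = 1.5) -> bool:
    """Flag a curve whose slope exceeds an arbitrary, user-chosen cutoff."""
    return slope > threshold


# Synthetic, noise-free curves (IC50 = 1 in arbitrary concentration units).
concs = [0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
well_behaved = [hill_inhibition(c, 1.0, 1.0) for c in concs]
suspicious = [hill_inhibition(c, 1.0, 3.0) for c in concs]
```

On these idealized curves the estimate recovers the slope that generated them, so the second curve gets flagged and the first does not. Noisy plate data will be less tidy, particularly near 0% and 100% inhibition where the logit transform blows up.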

And that leads into the overall lesson, too: (1) There are active compounds from assays whose structures lower their chances of success. (2) Some of these are true hits that will nonetheless be difficult to progress, and others are flat-out false positives. But (3) these bad-actor structural classes can all be context-dependent, and some of them are intrinsically more worrisome than others. So, (4) use the filters and literature reports on them as guidelines for how to deal with them. You don't kill a compound in the absence of data, but the filter tells you what data you may need to pay attention to immediately. (5) Check the purity again on both the solid and DMSO solution samples (and do a quick silica gel plug or HPLC for good measure). Check the Hill slope. Check for aggregation in your assay buffer. Check for redox cycling. Check for activity in other assays, if you have such reference data. You actually should be doing these things for all your hits of interest, but setting off one of the various PAINS alerts is a reminder that you ignore them at your peril.
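If it helps to see that checklist as a bit of bookkeeping, here is one hypothetical way to track it per hit. The class, field names, check labels, and priority strings are all invented for this sketch. The only things taken from the text are the checks themselves and the idea that a PAINS alert moves a compound up the queue for follow-up rather than killing it.

```python
from dataclasses import dataclass, field

# Follow-up checks named above, in the order they are mentioned.
FOLLOW_UP_CHECKS = (
    "purity_solid",
    "purity_dmso_solution",
    "hill_slope",
    "aggregation_in_assay_buffer",
    "redox_cycling",
    "activity_in_other_assays",
)


@dataclass
class Hit:
    """A screening hit plus whatever follow-up work has been done on it."""
    compound_id: str
    pains_alerts: list = field(default_factory=list)
    completed_checks: set = field(default_factory=set)


def outstanding_checks(hit: Hit) -> list:
    """Return the follow-up checks not yet done, in checklist order."""
    return [c for c in FOLLOW_UP_CHECKS if c not in hit.completed_checks]


def triage_priority(hit: Hit) -> str:
    """An alert never rejects a hit; it only raises the urgency of the checks."""
    if not outstanding_checks(hit):
        return "checks complete"
    return "check first" if hit.pains_alerts else "check as usual"


# Hypothetical usage: a flagged hit with only the Hill slope looked at so far.
example_hit = Hit("CMPD-001", pains_alerts=["rhodanine"],
                  completed_checks={"hill_slope"})
remaining = outstanding_checks(example_hit)
```

Note that nothing here returns "discard". That is deliberate, since the point is that the filter decides which data you go and get first, and not whether the compound lives.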

