

Kluster: Variation and Similarity 


In 2017, the Edge Foundation posed as its Annual Question: “What scientific term or concept ought to be more widely known?” Among the many responses was one from John Naughton, a senior research fellow at the University of Cambridge. Naughton’s suggestion was the long-forgotten theory of Ross Ashby, a British medical doctor and cybernetician active in the 1950s. Cyberneticians are interested in control mechanisms, and Ashby was particularly interested in homeostasis: how complex systems operating in changing environments succeed in maintaining critical functions within tightly defined limits.

To Ashby, for a system to be stable, the number of states that its control mechanism is capable of attaining must be greater than or equal to the number of states in the system being controlled. In other words, the control mechanism must be capable of at least as much variation as whatever it is meant to control. From this he derived his ‘Law’ of Requisite Variety.

As Naughton writes in his Edge response:

In colloquial terms Ashby’s Law has come to be understood as a simple proposition: if a system is to be able to deal successfully with the diversity of challenges that its environment produces, then it needs to have a repertoire of responses which is (at least) as nuanced as the problems thrown up by the environment. So, a viable system is one that can handle the variability of its environment. Or, as Ashby put it, only variety can absorb variety.

It’s not hard to migrate this concept from the control of a complex system to the control of a complex pathology; and indeed, perhaps even without knowing of Ashby’s work, modern-day polypharmacy has grasped its significance. For example, many older readers will remember the dire days of the early AIDS epidemic, when powerful reverse transcriptase inhibitors were inflicted on the patient one after another, each being replaced once the virus had mutated enough to render the current drug ineffectual. Eventually, it was concluded that this sequential approach just gave the virus time to ‘go to school’ on each drug.

Why not give several drugs simultaneously, while ensuring that each drug works by a mechanism not shared by any of the others? This was the beginning of ‘combination therapy,’ which essentially points a gun to the head of the virus and forces it to mutate against three or four drugs simultaneously, something even HIV cannot do. This approach not only revolutionized HIV treatment but also retrieved the concept of polypharmacy from the waste bin of despised medical theories.

Polypharmacy is, of course, nothing new to practitioners of traditional medicines. For example, Traditional Chinese Medicine (TCM) has a long and distinguished history of developing and using classic combination formulas, many of which were developed hundreds of years ago and are still in use today.


There’s an App for This

Developing an effective algorithm for the determination of requisite variety has been an on-and-off dream of mine, and some free time provided by the pandemic gave me just the opportunity.

You can run this app (Kluster) from any laptop, tablet, or desktop computer (sorry, not great on smartphones) by pointing your browser (extensively tested on Chrome) here: https://www.datapunk.net/tlfd/.

To protect against tiresome robots, you may have to prove you are human by simply moving the slider’s blue dot to the middle of the range. If you’ve been here recently, you’ll already have the access cookie and will just be directed to the main splash page. From there click on the link that reads Kluster (Similarity) under the category Networks. That will take you to the front page of the app.

The Kluster Pathology Selection Screen

I decided to base the structure of the algorithm on specific pathologies that have been categorized by their genomic/molecular expression. For this I used the Diseasome 2.0 dataset (1), composed of an updated version of the OMIM data that replicated and expanded upon the initial disease network data of Goh and Barabasi (2). Both investigator teams used a variety of methods to identify gene/disease links, and the resultant data provides us with many thousands of individual records, each of which includes the MeSH disease name, the associated gene symbols, and an ‘association weight’ reflecting the strength of the association.

Thus, the app begins with a simple screen that allows the user to specify one or two diseases that are the subject of their inquiry. The available pathologies are those about which the Diseasome datasets contain information.

At this point, we’ll just plunk in two pathologies and see what happens. In this example we’ll select ‘Hypertension’ as pathology #1 and ‘Dermatitis, Atopic’ as pathology #2. Now we just press the ‘View Possible Agents’ button and move on to the next screen.

Once the smoke clears, we are presented with a scrollable table listing a variety of natural agents sorted by ‘coverage,’ which is simply how closely the pattern of an agent’s recorded effects on gene expression matches the gene expression profile of the two selected pathologies. As it turns out, finding requisite variety begins by identifying similarities. The agent expression data was extracted from the extensive dataset developed by the editors of Opus23 (https://www.datapunk.net/opus23), a genomic software platform I’ve developed.
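For readers who like to see the moving parts, the ranking step can be sketched roughly as follows. This is a minimal illustration in Python, not Kluster’s actual code: it assumes that ‘coverage’ is the fraction of the combined pathology gene set that an agent also touches, and the function names, agent names, and gene labels (G1, G2, and so on) are made-up placeholders, not Diseasome or Opus23 data.

```python
def coverage(agent_genes, pathology_genes):
    """Assumed definition: share of the pathology genes the agent also affects."""
    if not pathology_genes:
        return 0.0
    return len(agent_genes & pathology_genes) / len(pathology_genes)


def rank_agents(agents, pathology_genes):
    """Return (agent name, coverage) pairs, highest coverage first."""
    scored = [(name, coverage(genes, pathology_genes))
              for name, genes in agents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical gene sets for two selected pathologies (placeholder labels).
pathology_1 = {"G1", "G2", "G3", "G4"}
pathology_2 = {"G3", "G5", "G6", "G7"}
target = pathology_1 | pathology_2  # union of both pathologies: 7 genes

# Hypothetical agents and the genes whose expression they are recorded to affect.
agents = {
    "agent_x": {"G1", "G3", "G5"},
    "agent_y": {"G6"},
    "agent_z": {"G2", "G4", "G5", "G7", "G9"},  # G9 lies outside the target
}

ranking = rank_agents(agents, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the agent with the most hits inside the target set floats to the top of the table, which is all the sorting on this screen is meant to convey.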

The Kluster Agent Selection Screen

Next to the coverage value are four radio buttons: three assignment slots (A, B, C) and an ‘ignore this’ button (X). You can select up to three agents for evaluation. To do this, just tick either A, B or C next to the agent. If you change your mind about an agent and want to select another, just tick the X button and it will be ignored. The ABC slots have no difference in value, i.e., an agent assigned to A is not valued any differently than an agent assigned to C; they’re just placeholders. We’ll put Pycnogenol in slot A, Ginkgo biloba in slot B and Olive Oil in slot C. Now we press the ‘Analyze Agent Cluster’ button and move on to the final screen.


Game, Set, Match

Before we move on to the punch line, let’s spend a minute or two talking about sets, which are simply collections of things. If theory is not your thing, just move on to the next section.

In data science, the simplest container for a set is known as an array, which can hold anything from no elements at all to as many as your computer’s memory allows. What makes arrays so useful is that they are indexed, in a manner not dissimilar to your basic egg carton. We don’t normally do this, but you could (if you’re that type of person) ask someone to give you the third egg from the bottom in the leftmost column of the carton. Evaluating the difference or sameness of things relies on this very basic principle.

To do this we use a variety of coefficients (here, single numbers that summarize how much two sets overlap) to get the similarity of two compared sets (arrays) of data, in this case the array of genes linked to the selected pathologies (A) and the array of genes linked to a chosen agent (B).

Kluster uses three different coefficients to gauge set similarity (Dice, Jaccard and Cosine), and all are variations on the following formula, shown here in its Dice form, where |A| means the number of elements in set A:

|A ∩ B| / (0.5 × (|A| + |B|))

We can describe this as the number of elements common to both sets relative to the average size of the two sets. The 0.5 is a weighting factor (a value between 0 and 1) that here turns the combined size of the sets into an average. Once we calculate similarity between pathology and agent sets, variance between agents is simple: we get a number for the similarity between two agents and invert it, so that a low similarity becomes a high variety score.
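To make the arithmetic concrete, here is a small Python sketch of the three coefficients in their standard set-based forms. It is an illustration under assumptions, not Kluster’s source: the Cosine version shown is the set form (sometimes called the Ochiai coefficient), ‘inverting’ a similarity is assumed to mean subtracting it from 1, and the gene labels are placeholders.

```python
import math


def dice(a, b):
    """Overlap divided by the average size of the two sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(a, b):
    """Overlap divided by the size of the union of the two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    """Overlap divided by the geometric mean of the two set sizes."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def variety(similarity):
    """Assumed inversion: 1 minus similarity, so less alike means more variety."""
    return 1.0 - similarity


# Placeholder gene sets: 4 genes each, 2 of them shared.
pathology_genes = {"G1", "G2", "G3", "G4"}
agent_genes = {"G3", "G4", "G5", "G6"}

for fn in (dice, jaccard, cosine):
    s = fn(pathology_genes, agent_genes)
    print(f"{fn.__name__}: similarity={s:.3f} variety={variety(s):.3f}")
```

All three share the same numerator (the overlap) and differ only in what they divide by, which is why they tend to rank candidates similarly while producing different absolute numbers.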


Payoff

The Kluster Gene Intersection Screen

Okay, so let’s look at the third screen and see what I mean. The top part is a basic recapitulation of what Kluster has discovered and is using to do its magic. Thus, we see that for the union of gene sets belonging to either ‘Hypertension’ or ‘Dermatitis, Atopic,’ we show 116 listed gene associations.  Next, we see the set intersects between the gene profile of the pathologies and each selected agent. For each agent we also see the specific gene intersects, and a value termed ‘Gene/Pathology Association.’

This value represents the aggregate of the ‘strength’ (‘association weight’) assigned to each gene for that pathology. This is exemplified by looking at the scores for Pycnogenol and Ginkgo biloba. Ginkgo actually has more gene hits (9) than Pycnogenol (6) but has a lower association value (54) than Pycnogenol (61). This is likely because Pycnogenol (like olive oil) has an intersect with the ACE gene, a very high value association in hypertension, while Ginkgo does not.
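The way fewer hits can still yield a higher score is easy to show in a few lines of Python. The sketch below assumes the aggregate is a plain sum of the association weights of the intersecting genes; the weights, gene labels, and agent names are invented for illustration and are not the values behind the numbers quoted above.

```python
def association_score(agent_genes, gene_weights):
    """Sum the pathology association weights of the genes the agent hits."""
    return sum(weight for gene, weight in gene_weights.items()
               if gene in agent_genes)


# Hypothetical association weights for a pathology; G1 stands in for a
# single very strongly associated gene.
gene_weights = {"G1": 30, "G2": 5, "G3": 5, "G4": 5, "G5": 10}

agent_few_hits = {"G1", "G5"}          # 2 hits, one of them heavily weighted
agent_many_hits = {"G2", "G3", "G4"}   # 3 hits, all lightly weighted

few_score = association_score(agent_few_hits, gene_weights)
many_score = association_score(agent_many_hits, gene_weights)
print(few_score, many_score)
```

One heavily weighted gene is enough to outscore several lightly weighted ones, so hit count and association value can point in different directions.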


Ah, But Not So Fast

Scrolling down a bit we encounter the next table, and like the dutiful program that it is, Kluster identifies that, via brute force similarity, Ginkgo is indeed the winner in our contest, simply because its gene set has the most coverage of the gene set associated with the union of Hypertension and Atopic Dermatitis.

However, things change completely when we examine not the extent of a single agent’s coverage, but rather the degree of variety of the coverage when each agent is paired with the others. In this case the winners of the Congeniality and Talent Awards may in fact be the overall winners, because in our little example Ginkgo is not even in the winning pair, which is instead Pycnogenol and Olive Oil. This combination can be expected to bring more variety to bear on the genetic expression of the pathology combination and thus may well control it better.
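The pairing logic can be sketched in Python as well. This is a toy reconstruction under stated assumptions, not Kluster’s implementation: variety between two agents is taken to be 1 minus the Jaccard similarity of their pathology gene hits, and the agent names and gene labels are placeholders chosen so that the agent with the most coverage is not part of the most varied pair.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap divided by the size of the union of the two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pair_variety(hits):
    """Map each pair of agents to 1 - similarity of their gene hits."""
    return {
        (first, second): 1.0 - jaccard(hits[first], hits[second])
        for first, second in combinations(sorted(hits), 2)
    }


# Hypothetical pathology gene hits per agent (placeholder labels).
hits = {
    "agent_a": {"G1", "G2", "G3", "G4", "G5"},  # broadest single coverage
    "agent_b": {"G1", "G6", "G7"},
    "agent_c": {"G5", "G8", "G9"},
}

varieties = pair_variety(hits)
best_pair = max(varieties, key=varieties.get)
for pair, score in varieties.items():
    print(pair, round(score, 3))
print("most varied pair:", best_pair)
```

Here agent_b and agent_c share no gene hits at all, so together they reach the pathology by entirely different routes, even though agent_a wins on individual coverage.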


Closing Thoughts

I’ve used Kluster in a few clinical situations and enjoyed exploiting the benefits of the additional insights it provides. Play around with it. Choose different pathology and agent combinations. The joys are endless.

The underpinnings of our simple app drive a lot of modern empirical drug discovery, the so-called in silico methodology. But its basis is as old as strategy itself. I’m reminded of the words with which the Duke of Wellington recounted the Battle of Waterloo: “They came on in the same old way, and we defeated them in the same old way.” Don’t be that field marshal!

Think about infusing a little bit of Requisite Variety into your next protocol. You may well be surprised by a major uptick in efficacy.


References

  1. Baumbach J, et al. An in silico-based approach to improve the efficacy and precision of drug REPurpOsing TRIALs for a mechanism-based patient cohort with predominant cerebro-cardiovascular phenotypes. Ref. Ares (2018)3530711 – 03/07/2018.
  2. Goh K-I, et al. The human disease network. Proc Natl Acad Sci USA. 2007 May 22; 104(21): 8685–8690.


Peter D’Adamo is a retired distinguished professor of clinical medicine at the University of Bridgeport School of Naturopathic Medicine. His New York Times bestselling books have sold over 8 million copies and have been translated into over 75 languages. He is the developer of the acclaimed Opus23 genomic software suite and a variety of other generative apps that can be explored at www.datapunk.net.  In his spare time, he brings old VW Beetles back to life at his garage on www.kdf20.com.