Friday, 30 August 2013

New Drug Approvals 2013 - Pt. XII - Technetium Tc 99m Tilmanocept (LymphoSeek™)

ATC code: V09IA09

On March 13th 2013, the FDA approved Technetium Tc 99m Tilmanocept (LymphoSeek™), a radioactive diagnostic agent indicated for lymphatic mapping with a hand-held gamma counter to assist in the localisation of lymph nodes draining a primary tumour site in patients with breast cancer or melanoma.

Melanoma is a malignant skin tumour which, although rather uncommon, causes 75% of skin cancer-related deaths. Breast cancer accounts for almost 23% of all cancers in women and in 2008 caused 13.7% of cancer-related deaths in women. Lymph nodes drain lymphatic fluid coming from tissues; if the tissues contain a tumour, the node will retain cancer cells coming from it. By removing and analysing the lymph node, valuable information can be obtained regarding the spread of the tumour.

Technetium Tc 99m Tilmanocept (ChEMBL: CHEMBL2108726) acts by accumulating in lymphatic tissue and selectively binding to the mannose binding receptor (CD206, ChEMBL: CHEMBL2176854, UniProt: P22897) found on the membranes of macrophages and dendritic cells. In vitro studies show that Technetium Tc 99m Tilmanocept binds to the human mannose binding receptor with an affinity of Kd = 2.76 × 10⁻¹¹ M. Clinical studies show that it accumulates in lymph nodes within 10 min and remains there for up to 30 hours after the injection.

The PDBe entry (PDBe: 1egg) for a crystal structure of the macrophage mannose receptor is shown below.

Tilmanocept is a macromolecule composed of multiple units of diethylenetriaminepentaacetic acid (DTPA) and mannose, each covalently bonded to a 10 kDa backbone of dextran. The DTPA acts as a chelating agent for labelling with Technetium 99m (Tc 99m), while mannose, a naturally occurring sugar, acts as a target ligand. The active component is radioactive Tc 99m, a synthetic element widely used in nuclear medicine that decays with a half-life of 6 hours emitting Gamma-2 photons. The molecular formula of Technetium Tc 99m Tilmanocept is [C6H10O5]n.(C19H28N4O9S99mTc)b.(C13H24N2O5S2)c.(C5H11NS)a. It contains 3-8 conjugated DTPA molecules (b); 12-20 conjugated mannose molecules (c) with 0-17 amine side chains (a) remaining free. The calculated average molecular weight of Tilmanocept ranges from 15,281 to 23,454 g/mol.

The recommended dose of Technetium Tc 99m Tilmanocept is 18.5 MBq (0.5 mCi) as a radioactivity dose and 50 mcg as a mass dose, administered via injection at least 15 minutes prior to the lymph node mapping. Technetium Tc 99m Tilmanocept has a half-life at the injection site of 1.75 to 3.05 hours.
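To put the dose and half-life figures in context, here is a minimal Python sketch of the radioactive decay arithmetic. It uses only the 6 hour Tc 99m half-life and the 18.5 MBq dose quoted above, plus the standard conversion 1 mCi = 37 MBq. It models physical decay only and says nothing about clearance from the injection site; the function name is just for illustration.

```python
# Physical decay of a Tc 99m dose: A(t) = A0 * (1/2) ** (t / half_life).
# This ignores biological clearance, so it is not a model of the
# 1.75 to 3.05 hour half-life at the injection site.

TC99M_HALF_LIFE_H = 6.0   # hours, as quoted in the post
MBQ_PER_MCI = 37.0        # 1 mCi = 37 MBq


def remaining_activity(a0_mbq, hours, half_life_h=TC99M_HALF_LIFE_H):
    """Return the activity (MBq) left after `hours` of physical decay."""
    return a0_mbq * 0.5 ** (hours / half_life_h)


dose_mbq = 18.5
print(f"{dose_mbq} MBq = {dose_mbq / MBQ_PER_MCI} mCi")
for t in (0.25, 6.0, 12.0, 30.0):
    print(f"after {t:5.2f} h: {remaining_activity(dose_mbq, t):.2f} MBq")
```

After one half-life (6 hours) half of the injected activity is left, and after 30 hours (five half-lives) about 3% remains.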

LymphoSeek is produced by Navidea Biopharmaceuticals, Inc.
Full prescribing information can be found here.

Monday, 19 August 2013

Removal of Metal-Containing Compounds

Further to my post a few months ago (To Remove or Not to Remove) about removing certain problem metal-containing compounds, we have now come up with a plan of what to do.
Instead of labeling this curation as ‘removal of inorganics’, or ‘removal of organometallics’, we simply want this to be known as ‘removal of some metal-containing compounds’.

The criterion that we used was to exclude a large proportion of compounds that contained a metal, apart from cases where a metal was commonly found as part of a pharmaceutical preparation (e.g. Ranitidine Bismuth Citrate CHEMBL2111286, Silver Sulfadiazine CHEMBL1382627, Bacitracin Zinc CHEMBL2096639). The reasoning behind the removal of such compounds was that most of these metals are bonded to the rest of the compound components via coordinate bonds. However, due to InChI limitations, there is no way of creating a Standard InChI that retains coordinate bond information. As we use Standard InChI as the main compound identifier of uniqueness in ChEMBL, it was decided to exclude the structures altogether.

This change will come into effect with the release of ChEMBL_17, and only affects ~3,200 compounds. The compound image on the interface will be replaced with an icon that shows it’s a metal-containing compound (see picture, above). The structures will not be part of the download set on the FTP site, but we will retain the molecular formula in both the downloads and on the ChEMBL interface, so that you can still see the elemental make up of the compound. We will, of course, retain all of the bioactivity data on these compounds.
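Since the molecular formula is retained, one rough way to spot metal-containing records in a download is to parse the formula and look for metal symbols. The sketch below is only an illustration: the METALS set is a small hypothetical subset, not the list used for the ChEMBL curation, and the parser handles simple formulas without brackets or charges. Finding a metal is only a flag, since compounds such as Silver Sulfadiazine were deliberately kept.

```python
import re

# Hypothetical, illustrative subset of metal symbols.
# This is NOT the list used for the ChEMBL_17 curation.
METALS = {"Ag", "Al", "Au", "Bi", "Co", "Cu", "Fe", "Hg",
          "Ni", "Pd", "Pt", "Rh", "Ru", "Sn", "Tc", "Zn"}

# One element symbol followed by an optional count, e.g. "C10", "Ag", "N4".
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Count atoms per element in a simple formula such as 'C10H9AgN4O2S'."""
    counts = {}
    for symbol, number in ELEMENT.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def metals_in(formula):
    """Return the sorted metal symbols (from METALS) present in the formula."""
    return sorted(set(element_counts(formula)) & METALS)


for formula in ("C10H9AgN4O2S", "Cl2H6N2Pt", "C13H22N4O3S"):
    print(formula, metals_in(formula))
```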

Any questions, please feel free to contact

Tuesday, 13 August 2013

2nd RDKit UGM - A reminder

For those who forgot to register, this is a gentle reminder for the 2nd RDKit User Group Meeting. The meeting will take place October 2nd-4th here at the Genome Campus in Hinxton, UK. We're using a different format for the meeting this year:

Days 1 and 2: Talks, lightning talks, roundtable(s), discussion, and something new: talktorials! Talktorials are somewhere between a talk and a tutorial: they cover something interesting done with the RDKit and include the code used to do the work. During the presentation you'll give an overview of what you did and also show the pieces of the code that are central to the work. The idea is to mix the science up with the tutorial aspects.

Day 3 will be the first ever RDKit sprint: those who choose to stay will spend an intense day working in small groups to produce useful artifacts: new bits of code, KNIME nodes, KNIME workflows, tutorials, documentation, IPython notebooks, etc. We'll see who's there and what folks are interested in contributing and go from there.

There will also be, of course, social and networking activities!

Registration is free at the following link:

We are also looking for people who are willing to do presentations or talktorials on the first two days. If you're interested in contributing, please send us an email

We are really looking forward to seeing a bunch of you again, to meet some new people from the ever growing RDKit developer and user community, and to hear some more cool stories about what people do with the RDKit.

Greg and George

Monday, 12 August 2013

What is the R&D Cost of a New Medicine?

Here's a recent (2012), and excellent, analysis and estimate of the development costs of a new medicine (specifically an NME, a chemically distinct, novel molecule). There is a good overview of the historical trends in costs and attrition, and a collection of all significant previous estimates of the R&D costs of a new drug. There's some nice exploration of the sensitivity of the costs to various factors, and differential success and costs across various therapeutic areas.

In case you wanted to jump to the punchline, the cost estimated in this study is $1,506,000,000 (i.e. $1.5bn) at 2011 USD prices.

The report is free; only registration at the OHE website is required to download it. Great value!

%T The R&D Cost of a New Medicine
%A J. Mestre-Ferrandiz
%A J. Sussex
%A A. Towse
%I Office of Health Economics
%D 2012
%O ISBN 978-1-899040-19-3


Friday, 9 August 2013

USAN Watch: August 2013

The USANs for August 2013 have recently been published.

| USAN | Research Code | InChIKey (Parent) | Drug Class | Therapeutic class | Target |
|  |  |  | therapeutic | synthetic small molecule prodrug | CMV DNA polymerase |
|  |  |  | therapeutic | natural product derived small molecule prodrug | HIV RT |
|  | HuMax-CD38, 3003-005 | n/a | therapeutic | monoclonal antibody | CD38 |
| diclofenac |  | DCOPUUMXTXDBNB-UHFFFAOYSA-N | therapeutic | synthetic small molecule | COX |
|  | IPI-145; INK-1197 |  | therapeutic | synthetic small molecule | PI3K-delta, PI3K-gamma |
|  |  |  | therapeutic | synthetic small molecule | HCV-NS5A |
|  | RQ-7, RQ-00000007, MR10A7, AAT-007, CJ-023, 423 |  | therapeutic | synthetic small molecule | EP4 |
|  | IDX-18719, IDX-719 |  | therapeutic | synthetic small molecule | HCV-NS5A |
|  | LP-802034, LX-4211 |  | therapeutic | synthetic small molecule | SGLT1, SGLT2 |
|  |  | SZBGQDXLNMELTB-UHFFFAOYSA-N | therapeutic | synthetic small molecule | SMO-1 |
|  |  |  | therapeutic | synthetic small molecule |  |
|  | Adenoviral Vector Ad-RTS-IL-12 |  |  |  |  |

Friday, 2 August 2013

Open PHACTS KNIME and Pipeline Pilot Components

Open PHACTS has released a collection of Pipeline Pilot and KNIME workflow components which integrate with the Open PHACTS API. Integration with these well-established graphical workflow tools allows the pharmacological and physicochemical data within the Open PHACTS Discovery Platform to be easily accessed and consumed.

Open PHACTS (Open PHArmacological Concepts Triple Store) is a project of the Innovative Medicines Initiative (IMI) and has seen SMEs, academia and the pharmaceutical industry work together to create a freely-available online platform providing access to multiple, integrated sources of publicly available pharmacological data. The project ends in 2014 and the project’s not-for-profit successor organisation, the Open PHACTS Foundation, will continue to support and develop the infrastructure created.

The Open PHACTS Discovery Platform has been designed to answer various critical pharmacology questions, many of which can be addressed using the newly released Pipeline Pilot and KNIME nodes. The portal to the workflow integration collection can be found at

Compound Sets and Availability

Chemical databases come in many different types and flavours, and given that we now have UniChem up and running, and it is being actively used by at least some of you, our minds have turned a little to describing these ‘flavours’ and ‘resolutions’. One of the key things a user is interested in is how easy it is to get hold of a compound, since this is usually a key filter applied before actually doing anything with the results of a database search. The cost implications of needing to commission synthesis, or potentially to develop new synthetic methodology for a compound, are substantial, and there is a large literature on the computational assessment of synthetic accessibility (q.v.).

So, here is a simple five-state classification that reflects the typical availability of a compound in a chemical collection.
  1. A compound has previously been synthesized and is readily available from chemical vendors.
  2. A compound has been previously synthesized but would require resynthesis.
  3. A compound has not been previously synthesized, but close analogues have and the compound is likely to be readily synthesizable. This class of molecule is often associated with the phrase ‘virtual library’.
  4. A compound has not been previously synthesized, and effort would be required to think about synthetic access to the compound.
  5. A compound is theoretically possible with respect to valence rules, but is so unstable that it is unlikely that it ever can be isolated in pure form and then experiments in a biofluid carried out.
Of course, all these classifications are interesting, but you can do a lot more, a lot quicker and cheaper if a compound is in set 1.

As an estimate of the likely difference in cost between these different classes, I personally would rate the cost differences, relative to set 1, as twenty-fold for set 2, forty-fold for set 3, and two hundred-fold for set 4 - but these are just my estimates, and there will be a big variance in these costs dependent on the exact compound, its class, etc. Others will have better or different estimates of the average cost differences between the sets (comments welcome!).
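As a rough illustration of how these multipliers add up over a hit list, here is a small Python sketch. The multipliers are the personal estimates quoted above (set 5 has no estimate, so it is left out), and the hit list itself is made up for the example.

```python
# Relative cost per availability set, taken from the estimates in the text.
# Set 5 has no estimate there, so it is deliberately absent.
RELATIVE_COST = {1: 1, 2: 20, 3: 40, 4: 200}


def relative_cost(availability_sets):
    """Sum the relative cost of acquiring one compound from each listed set.

    Raises KeyError for a set without an estimate (e.g. set 5).
    """
    return sum(RELATIVE_COST[s] for s in availability_sets)


# Hypothetical hit list: two purchasable compounds, one resynthesis,
# one virtual-library compound and one needing new synthetic work.
hits = [1, 1, 2, 3, 4]
print(relative_cost(hits))          # total, in units of one set-1 compound
print(relative_cost([1] * len(hits)))  # the same number of set-1 compounds
```

In this made-up case the five hits cost roughly 262 units, against 5 units if all of them had been in set 1.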

Because of the way that people have assembled chemical databases, entire primary databases tend to cluster in a similar way - for example ChEMBL is mostly 2), DrugBank is mostly 1) and GDB-17 is mostly 4). Directly from the above definition, every compound with a known bioactivity has to have been synthesized, and so ChEMBL will always be a 2) in this classification. Of course, some compounds in ChEMBL are readily available, but they are a clear minority.

When people build federated chemical databases (those with little unique primary content, which primarily add value by bringing lots of feeder databases together; for example PubChem and ChemSpider) the picture gets a little more complicated at a database level, since they are often blends of synthesized and ‘virtual’ compound sets. But it is just as useful to indicate the availability/provenance of a structure, and federated databases need to store the original primary database (which may or may not itself be available outside of the federated database).

So, a couple of thoughts:

  • Is this classification useful to apply to the contents of UniChem? 
  • Is the following classification of the UniChem component databases useful and valid?
  1. DrugBank, PDBe, IUPHAR, KEGG, ChEBI, Array_Express, NIH_NCC
  2. ChEMBL, ZINC, eMolecules
  3. IBM, Patents, SureChem (we don’t currently have GDB in UniChem, but if it were, it would be in this set).

See UniChem itself for more details of what is behind these set names.
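If the classification were applied to UniChem, one simple use would be to look up the sources a structure appears in and report the most available set. The Python sketch below assumes the list numbers above are the availability set numbers, and assumes that a compound found in several sources takes the lowest (most available) set; both are assumptions made for the example, and the function name is hypothetical.

```python
# Proposed set for each UniChem source, copied from the list above.
PROPOSED_SET = {
    "DrugBank": 1, "PDBe": 1, "IUPHAR": 1, "KEGG": 1,
    "ChEBI": 1, "Array_Express": 1, "NIH_NCC": 1,
    "ChEMBL": 2, "ZINC": 2, "eMolecules": 2,
    "IBM": 3, "Patents": 3, "SureChem": 3,
}


def best_availability(sources):
    """Return the lowest set number among the known sources, or None.

    Assumption: appearing in any more-available source is enough to
    place the compound in that set.
    """
    known = [PROPOSED_SET[s] for s in sources if s in PROPOSED_SET]
    return min(known) if known else None


print(best_availability(["ChEMBL", "DrugBank"]))
print(best_availability(["SureChem", "IBM"]))
print(best_availability(["SomeOtherSource"]))
```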