We also find that profile-based ISS (PBISS) and profile-based ISC (PBISC) significantly decrease ISS search time without sacrificing search performance.

On the basis of our large-scale investigation across a wide spectrum of pharmaceutical targets, we conclude that ISC and ISS searches perform better than 2D fingerprint similarity searching and that the profile-based versions of these algorithms do nearly as well in less time.

The enriched biological activity information of compounds in large, freely accessible chemical databases such as the PubChem BioAssay database has become a powerful resource for the scientific research community.

Currently, 2D fingerprint-based conventional similarity search (CSS) is the most widely used approach for database screening, but it does not typically incorporate the relative importance of fingerprint bits to biological activity.

In part, this is due to its relative computational efficiency, which allows large online chemical databases such as PubChem to answer user queries in a reasonable amount of time.
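To make the CSS baseline concrete, the following is a minimal sketch of a single-query fingerprint search. It assumes the Tanimoto coefficient as the similarity measure (a common choice for 2D fingerprints; the text does not name the measure used) and represents each fingerprint as the set of its on-bit indices. The names `tanimoto` and `css_search` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is stored as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def css_search(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank database compounds by similarity to a single query compound."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in database.items()]
    # Highest similarity first; compound id breaks ties deterministically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Every bit contributes equally to the score here, which is the limitation noted above: the search has no notion of which bits matter for the activity of interest.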

ISS is an iterative similarity search approach in which the similarity of a database compound is determined by comparing the query compound to multiple references with the same biological activity.

The basic theory behind ISS is that the neighbor list of references maps out a hypervolume in the multidimensional sampling space for the bioactivity of interest, and consequently the top-ranked structures in the search result are more likely to be compounds with similar biological activity. A previous study compared ISS with CSS and bit-weighting approaches and found an overwhelming advantage of ISS in retrieving active hits [].
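The iterative idea can be sketched as follows. This is one possible reading, not the published procedure: it assumes Tanimoto similarity, scores each database compound by its maximum similarity to any current reference, and after each pass promotes the top-ranked compounds that share the query's annotated activity into the reference set. The function names and the `iterations` and `neighbors` parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Fingerprint = FrozenSet[int]  # assumption: set of on-bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_references(
    references: List[Fingerprint], database: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Score each compound by its best similarity to any reference."""
    scored = [
        (cid, max(tanimoto(ref, fp) for ref in references))
        for cid, fp in database.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def iss_search(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    known_actives: Set[str],
    iterations: int = 2,
    neighbors: int = 2,
) -> List[Tuple[str, float]]:
    """Iteratively grow the reference set with top-ranked known actives."""
    references = [query]
    promoted: Set[str] = set()
    for _ in range(iterations):
        ranked = rank_by_references(references, database)
        # Assumed rule: only top neighbors annotated as active become references.
        new_refs = [
            cid for cid, _ in ranked[:neighbors]
            if cid in known_actives and cid not in promoted
        ]
        if not new_refs:
            break
        for cid in new_refs:
            promoted.add(cid)
            references.append(database[cid])
    return rank_by_references(references, database)
```

With `iterations=0` the function reduces to a single-query search, so the same code can serve as its own baseline when comparing rankings.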

In total, 6 search approaches were systematically tested on 208 activity classes: 2 non-iterative approaches (2D fingerprint-based conventional similarity search, or CSS, and average profile search, or PBSS), 2 iterative ISS approaches with multiple active references (fingerprint-based ISS and average profile-based ISS, or PBISS), and 2 iterative searches with classification (fingerprint-based ISC and average profile-based ISC, or PBISC).
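The text does not spell out how an average profile is constructed, so the following is only a sketch under stated assumptions: the profile is taken to be the per-bit frequency of on-bits across the active references, and a database fingerprint is scored against it with the continuous (real-valued) form of the Tanimoto coefficient. Under that reading, each database compound is compared once against the profile rather than once per reference, which is one plausible source of the time savings. `average_profile` and `profile_similarity` are hypothetical names.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]   # assumption: set of on-bit indices
Profile = Dict[int, float]     # bit index -> fraction of references with bit on


def average_profile(references: List[Fingerprint]) -> Profile:
    """Per-bit on-frequency across a set of reference fingerprints."""
    if not references:
        raise ValueError("at least one reference fingerprint is required")
    counts: Dict[int, int] = {}
    for fp in references:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    return {bit: count / len(references) for bit, count in counts.items()}


def profile_similarity(profile: Profile, fp: Fingerprint) -> float:
    """Continuous Tanimoto: dot / (|p|^2 + |f|^2 - dot) for a binary fp."""
    dot = sum(profile.get(bit, 0.0) for bit in fp)
    norm_profile = sum(weight * weight for weight in profile.values())
    norm_fp = float(len(fp))  # squared norm of a 0/1 vector is its on-bit count
    denom = norm_profile + norm_fp - dot
    return dot / denom if denom else 0.0
```

When the profile is built from a single reference, this score coincides with the ordinary binary Tanimoto coefficient against that reference.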

The arithmetic mean of recall rates tested on the selected activity class (ARR), the arithmetic precision rate (APR), and the area under the ROC curve (AUC) for each of the 208 activity classes were compared to comprehensively evaluate the search performance of all 6 search approaches.
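As an illustration of these measures, the sketch below computes recall and precision over the top `k` entries of a ranked hit list, and the ROC AUC as the fraction of (active, inactive) pairs in which the active is ranked higher. The cutoff `k` and the exact averaging scheme behind ARR and APR are not specified in the text, so they are assumptions here; the sketch also assumes a strict ranking with no tied scores.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], actives: Set[str], k: int) -> float:
    """Fraction of all actives that appear in the top k of the ranking."""
    if not actives:
        return 0.0
    return sum(1 for cid in ranked[:k] if cid in actives) / len(actives)


def precision_at_k(ranked: Sequence[str], actives: Set[str], k: int) -> float:
    """Fraction of the top k hits that are active."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for cid in top if cid in actives) / len(top)


def roc_auc(ranked: Sequence[str], actives: Set[str]) -> float:
    """Probability that a random active outranks a random inactive."""
    n_active = sum(1 for cid in ranked if cid in actives)
    n_inactive = len(ranked) - n_active
    if n_active == 0 or n_inactive == 0:
        raise ValueError("ranking must contain both actives and inactives")
    wins = 0
    inactives_above = 0
    for cid in ranked:  # walk from best-ranked to worst-ranked
        if cid in actives:
            wins += n_inactive - inactives_above
        else:
            inactives_above += 1
    return wins / (n_active * n_inactive)


def arithmetic_mean(values: Sequence[float]) -> float:
    """Plain average, e.g. of per-query recall rates within one activity class."""
    return sum(values) / len(values)
```

For example, averaging `recall_at_k` over every query compound in an activity class with `arithmetic_mean` gives one ARR-style value for that class under these assumptions.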


