
Training strategy for unbalanced small datasets in deep learning

Research 21, August, 2020

Author(s):

Hsin Liu, Cheng-Wei Lee, Bo-Han Su, and Yufeng J. Tseng

 

Abstract:

Large datasets have been key to the success of deep learning and neural network approaches over the past few years. Medicinal chemistry, however, rarely has this luxury, unlike image processing, where large, accessible datasets are readily available. Datasets are often small because few experimental results are published, which may reflect complex experimental designs, expensive experiments, or simply limitations in technique. In addition, because medicinal chemistry pursues ever more active compounds, almost all published data are unbalanced; that is, they contain few positive data points and mostly negative ones. It would be invaluable to be able to train a model on an unbalanced small dataset in medicinal chemistry, particularly for drug development. In this work, we propose a training strategy for unbalanced small datasets. The strategy includes selecting the sampling ratio, the core deep learning methods, and the fingerprints, and merging fingerprint descriptors with features automatically extracted by deep learning. We chose the Ames test for mutagenicity as the example in this study because information is available for validation, and because the entire dataset can be divided into segments to simulate unbalanced small datasets for training and discussion. Overall, the up-sampling method rebalances the data distribution across categories and demonstrates better performance in both convergence speed and balanced accuracy.
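To illustrate the two ideas the abstract closes on, the following is a minimal sketch of random up-sampling and of balanced accuracy. It is not the authors' implementation: the abstract does not specify the sampling scheme, so duplication with replacement up to the size of the largest class is an assumption here, and the function names `upsample` and `balanced_accuracy` are hypothetical.

```python
import random


def upsample(features, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    has as many samples as the largest class (assumed scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally
    regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy unbalanced set: 2 positives (label 1), 8 negatives (label 0).
    X = list(range(10))
    y = [1, 1] + [0] * 8
    X_up, y_up = upsample(X, y)
    print(y_up.count(1), y_up.count(0))

    # A model that always predicts "negative" scores 0.8 plain accuracy
    # on this set, while balanced accuracy exposes the missed positives.
    print(balanced_accuracy(y, [0] * 10))
```

In a sketch like this, up-sampling would normally be applied to the training split only, so that duplicated samples never leak into the data used for evaluation.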

 


 
