Jurs Research Group


Last Update November 2005



154. G.P. Sutton, L.S. Anker, P.C. Jurs, Evaluation of Automated Methods for the Selection of Models for the Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Keto Steroids, Anal. Chem. 63:443-449 (1991).

155. P.C. Jurs and R.G. Lawson, Analysis of Chemical Structure-Biological Activity Relationships Using Clustering Methods, Chemo. & Intell. Lab. Syst. 10:81-83 (1991).

156. M.L. Ranc and P.C. Jurs, Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Quinolines and Isoquinolines, Anal. Chim. Acta 248:183-193 (1991).

157. D.T. Stanton, P.C. Jurs, M.G. Hicks, Computer-Assisted Prediction of Normal Boiling Points of Furans, Tetrahydrofurans, and Thiophenes, J. Chem. Inf. Comput. Sci. 31:301-310 (1991).

158. P.A. Edwards, L.S. Anker, P.C. Jurs, Quantitative Structure-Property Relationship Studies of the Odor Threshold of Odor Active Compounds, Chemical Senses 16:447-465 (1991).

159. C.G. Georgakopoulos, J.C. Kiburis, P.C. Jurs, Prediction of Gas Chromatographic Relative Retention Times of Stimulants and Narcotics, Anal. Chem. 63:2012-2024 (1991).

160. C.G. Georgakopoulos, O.G. Tsika, J.C. Kiburis, P.C. Jurs, Prediction of Gas Chromatographic Relative Retention Times of Anabolic Steroids, Anal. Chem. 63:2025-2028 (1991).

161. J.W. Ball, L.S. Anker, P.C. Jurs, Automated Model Selection for the Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Cyclopentanones and Cycloheptanones, Anal. Chem. 63:2435-2442 (1991).

162. P.C. Jurs, T.L. Isenhour, C.L. Wilkins, Basic Pro Chemiky, SNTL Nakladatelství technické literatury, Praha, Czechoslovakia, 1991.

1992

163. M.D. Needham and P.C. Jurs, Quantitative Structure-Retention Relationship Studies of Polychlorinated Dibenzodioxins on Gas Chromatographic Stationary Phases of Varying Polarity, Anal. Chim. Acta 258:183-198 (1992).

164. M.D. Needham, K.C. Adams, P.C. Jurs, Quantitative Structure-Retention Relationship Studies of Polychlorinated Dibenzofurans on Gas Chromatographic Stationary Phases of Varying Polarity, Anal. Chim. Acta 258:199-218 (1992).

165. D.T. Stanton and P.C. Jurs, Computer-Assisted Study of the Relationship Between Molecular Structure and Surface Tension of Organic Compounds, J. Chem. Inf. Comp. Sci. 32:109-115 (1992).

166. S.L. Dixon and P.C. Jurs, Atomic Charge Calculations for Quantitative Structure-Property Relationships, J. Comp. Chem. 13:492-504 (1992).

167. L.S. Anker and P.C. Jurs, Prediction of Carbon-13 Nuclear Magnetic Resonance Chemical Shifts by Artificial Neural Networks, Analytical Chemistry 64:1157-1164 (1992).

168. C.J. Russell, S.L. Dixon, P.C. Jurs, Computer-Assisted Study of the Relationship Between Molecular Structure and Henry's Law Constant, Analytical Chemistry 64:1350-1355 (1992).

169. F.M. Dunnivant, A.W. Elzerman, P.C. Jurs, M.N. Hasan, Quantitative Structure-Property Relationships for Aqueous Solubilities and Henry's Law Constants of Polychlorinated Biphenyls, Environ. Sci. Tech. 26:1567-1573 (1992).

170. P.C. Jurs, J.W. Ball, L.S. Anker, T.L. Friedman, Carbon-13 Nuclear Magnetic Resonance Spectrum Simulation, J. Chem. Inf. Comp. Sci. 32:272-278 (1992).

171. D.T. Stanton, L.M. Egolf, P.C. Jurs, M.G. Hicks, Computer-Assisted Prediction of Normal Boiling Points of Pyrans and Pyrroles, J. Chem. Inf. Comp. Sci. 32:306-316 (1992).

172. L.M. Egolf and P.C. Jurs, Estimation of Autoignition Temperatures of Hydrocarbons, Alcohols and Esters from Molecular Structure, Ind. Eng. Chem. Res. 31:1798-1807 (1992).

173. T.F. Woloszyn and P.C. Jurs, Quantitative Structure-Retention Relationship Studies of Sulfur Vesicants, Anal. Chem. 64:3059-3063 (1992).

1993

174. T.F. Woloszyn and P.C. Jurs, Prediction of Gas Chromatographic Retention Behavior of Hydrocarbons from Naphthas, Anal. Chem. 65:582-587 (1993).

175. P.C. Jurs, L.S. Anker, J.W. Ball, Carbon-13 Nuclear Magnetic Resonance Spectrum Simulation, in Computer-Enhanced Analytical Spectroscopy, Vol. 4, C.L. Wilkins (Ed.), Plenum Press, 1993.

176. E.P. Jaeger, T.R. Stouch, P.C. Jurs, Structure-Activity Relationship Studies of Retinoid Cancer Inhibition, Eur. J. Med. Chem. 28:275-290 (1993).

177. J.W. Ball and P.C. Jurs, Automated Selection of Regression Models using Neural Networks for 13C NMR Spectral Predictions, Anal. Chem. 65:505-512 (1993).

178. M.L. Ranc and P.C. Jurs, Simulation of 13C Nuclear Magnetic Resonance Spectra of Indoles, Anal. Chim. Acta 280:145-155 (1993).

179. D.T. Stanton, W.J. Murray, P.C. Jurs, Comparison of QSAR and Molecular Similarity Approaches for a Structure-Activity Relationship of DHFR Inhibitors, Quant. Str.-Act. Relat. 12:239-245 (1993).

180. L.M. Egolf and P.C. Jurs, Prediction of Boiling Points of Organic Heterocyclic Compounds Using Regression and Neural Network Techniques, J. Chem. Inf. Comp. Sci. 33:616-625 (1993).

181. S.L. Dixon and P.C. Jurs, Estimation of pKa for Organic Oxyacids using Calculated Atomic Charges, J. Comp. Chem. 14:1460-1467 (1993).

182. P.C. Jurs, Applications of Computational Neural Networks in Chemistry, CICSJ Bulletin 11:2-10 (1993).

183. L.M. Egolf and P.C. Jurs, Quantitative Structure-Retention Relationship and Structure-Odor Intensity Relationships for a Diverse Group of Odor-Active Compounds, Anal. Chem. 65:3119-3126 (1993).

184. J.W. Ball and P.C. Jurs, Simulation of Polysaccharide 13C Nuclear Magnetic Resonance Spectra Using Regression Analysis and Neural Networks, Anal. Chem. 65:3615-3621 (1993).

1994

185. T.M. Nelson and P.C. Jurs, Prediction of Aqueous Solubility of Organic Compounds, J. Chem. Inf. Comput. Sci. 34:601-609 (1994).

186. Lu Xu, J.W. Ball, S.L. Dixon, P.C. Jurs, Quantitative Structure-Activity Relationships for Toxicity of Phenols Using Regression Analysis and Computational Neural Networks, Environmental Toxicology and Chemistry 13:841-851 (1994).

187. S.L. Dixon and P.C. Jurs, Fast Geometry Optimization Using a Modified Extended Huckel Method: Results for Molecules Containing H, C, N, O, and F, J. Comput. Chem. 15:733-746 (1994).

188. L.M. Egolf, M.D. Wessel, P.C. Jurs, Prediction of Boiling Points and Critical Temperatures of Industrially Important Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci. 34:947-956 (1994).

189. L. Xu, Y.Q. Yang, C.Y. Hu, P.C. Jurs, J.W. Ball, S.L. Dixon, Studies of the Transition Emission of Eu+2 Ion in Complex Fluorides Using Neural Network, J. Rare Earths 12:168-178 (1994).

190. S.A. Cai, L. Xu, Y.Q. Yang, C.Y. Hu, P.C. Jurs, J.W. Ball, S.L. Dixon, Classification of Human Senile Cataract Lenses Based on Metal Contents Using Neural Networks, Chem. J. Chinese Universities 15:982-985 (1994).

191. D.L. Clouser and P.C. Jurs, Simulation of 13C Nuclear Magnetic Resonance Spectra of Tetrahydropyrans Using Regression Analysis and Neural Networks, Anal. Chim. Acta 295:221-231 (1994).

192. M.D. Wessel and P.C. Jurs, Prediction of Reduced Ion Mobility Constants from Structural Information using Multiple Linear Regression Analysis and Computational Neural Networks, Anal. Chem. 66:2480-2487 (1994).

193. P.C. Jurs, S.L. Dixon, L.M. Egolf, Representations of Molecules, in Quantitative Structure Activity Relationships. Part 2. Chemometric Methods in Molecular Design, H. van de Waterbeemd (Editor), VCH Verlagsgesellschaft, Weinheim, Germany.

1995

194. M.D. Wessel and P.C. Jurs, Prediction of Normal Boiling Points of Hydrocarbons from Molecular Structure, J. Chem. Inf. Comput. Sci. 35:68-76 (1995).

195. J.M. Sutter, S.L. Dixon, and P.C. Jurs, Automated Descriptor Selection for Quantitative Structure-Activity Relationships Using Generalized Simulated Annealing, J. Chem. Inf. Comput. Sci. 35:77-84 (1995).

196. J.M. Sutter and P.C. Jurs, Selection of Molecular Descriptors for Quantitative Structure-Activity Relationships, in Data Handling in Science and Technology. Adaption of Simulated Annealing to Chemical Optimization Problems, Vol. 15, John H. Kalivas (Ed.), Elsevier, 1995.

197. D.L. Clouser and P.C. Jurs, Simulation of the 13C Nuclear Magnetic Resonance Spectra of Trisaccharides Using Multiple Linear Regression Analysis and Neural Networks, Carbohydrate Research 271:65 (1995).

198. M.D. Wessel and P.C. Jurs, Prediction of Normal Boiling Points for a Diverse Set of Industrially Important Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci. 35:841-850 (1995).

1996

199. B.E. Mitchell and P.C. Jurs, Computer Assisted Simulation of 13C Nuclear Magnetic Resonance Spectra of Monosaccharides, J. Chem. Inf. Comput. Sci. 36:58-64 (1996).

200. J.M. Sutter and P.C. Jurs, Prediction of Aqueous Solubility for a Diverse Set of Heteroatom-Containing Organic Compounds Using a Quantitative Structure-Property Relationship, J. Chem. Inf. Comput. Sci. 36:100-107 (1996).

201. D.L. Clouser and P.C. Jurs, Simulation of the 13C Nuclear Magnetic Resonance Spectra of Ribonucleosides Using Multiple Linear Regression Analysis and Neural Networks, J. Chem. Inf. Comput. Sci. in review.

202. D.L. Clouser and P.C. Jurs, The Simulation of 13C Nuclear Magnetic Resonance Spectra of Dibenzofurans using Multiple Linear Regression Analysis and Neural Networks, Anal. Chim. Acta 321:127-135 (1996).

203. M.D. Wessel, J.M. Sutter and P.C. Jurs, Prediction of Reduced Ion Mobility Constants of Organic Compounds from Molecular Structure, Anal. Chem. 68:4237-4243 (1996).

1997

204. S.R. Johnson and P.C. Jurs, Prediction of Acute Mammalian Toxicity from Molecular Structure for a Diverse Set of Substituted Anilines Using Regression Analysis and Computational Neural Networks, in Computer-Assisted Lead Finding and Optimization, H. van de Waterbeemd, B. Testa, G. Folkers (Eds.), Verlag Helvetica Chimica Acta, Basel, 1997.

205. J.M. Sutter and P.C. Jurs, Neural Network Classification and Quantification of Organic Vapors Based on Fluorescence Data from a Fiber-Optic Sensor Array, Anal. Chem. 69:856 (1997).

206. J.M. Sutter, T.A. Peterson and P.C. Jurs, Prediction of Gas Chromatographic Relative Retention Times of Alkylbenzenes, Anal. Chim. Acta 342:113 (1997).

207. H.L. Engelhardt, P.C. Jurs, Prediction of Supercritical Carbon Dioxide Solubility of Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci. 37:478-484 (1997).

208. B.E. Mitchell, P.C. Jurs, Prediction of Autoignition Temperatures of Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci. 37:538-547 (1997).

209. S.R. Johnson, J.M. Sutter, H.L. Engelhardt, P.C. Jurs, J. White, J.S. Kauer, T.A. Dickinson, and D.R. Walt, Identification of Multiple Analytes Using an Optical Sensor Array and Pattern Recognition Neural Networks, Anal. Chem. 69:4641 (1997).

1998

210. M. D. Wessel, P. C. Jurs, J. W. Tolan and S. M. Muskal. Prediction of Human Intestinal Absorption of Drug Compounds from Molecular Structure. J. Chem. Inf. Comput. Sci. 38:726-735 (1998).

211. B. E. Mitchell, P. C. Jurs, Prediction of Infinite Dilution Activity Coefficients of Organic Compounds in Aqueous Solution from Molecular Structure, J. Chem. Inf. Comput. Sci. 38:200-209 (1998).

212. B. E. Mitchell, P. C. Jurs, Prediction of Aqueous Solubility of Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci. 38:489-496 (1998).

213. David R. Walt, Todd Dickinson, Joel White, John Kauer, Stephen Johnson, Heidi Engelhardt, Jon Sutter, Peter Jurs, Optical sensor arrays for odor recognition. Biosensors & Bioelectronics 13:697-699 (1998).

214. Brian E. Turner, Chandra L. Costello, Peter C. Jurs, Prediction of Critical Temperatures and Pressures of Industrially Important Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci. 38:639-645 (1998).

1999

215. Stephen R. Johnson, Peter C. Jurs, Prediction of the Clearing Temperatures of a Series of Liquid Crystals from Molecular Structure, Chemistry of Materials 11:1007-1023 (1999).

216. Gregory A. Bakken, Peter C. Jurs, Prediction of Methyl Radical Addition Rate Constants from Molecular Structure, J. Chem. Inf. Comput. Sci. 39:508-514 (1999).

217. M.D. Wessel, P.C. Jurs, J.W. Tolan, S.M. Muskal, Prediction of Human Intestinal Absorption of Drug Compounds from Molecular Structure, in Molecular Modeling and Prediction of Bioavailability, K. Gundertofte and F.S. Jorgensen (Eds.), Kluwer Academic/Plenum Publishers, New York, 2000, pp. 249-255.

218. D.V. Eldred and P.C. Jurs, Prediction of Acute Mammalian Toxicity of Organophosphorus Pesticide Compounds from Molecular Structure, SAR & QSAR in Environmental Research 10:75-99 (1999).

219. D.V. Eldred, C.L. Weikel, P.C. Jurs, K.L.E. Kaiser, Prediction of Fathead Minnow Acute Toxicity of Organic Compounds from Molecular Structure, Chem. Res. in Toxicology 12:670-678 (1999).

220. G.A. Bakken and P.C. Jurs, Prediction of Hydroxyl Radical Rate Constants from Molecular Structure, J. Chem. Inf. Comput. Sci. 39:1064-1075 (1999).

221. E.S. Goll and P.C. Jurs, Prediction of Normal Boiling Points of Organic Compounds from Molecular Structure with a Computational Neural Network Model, J. Chem. Inf. Comput. Sci. 39:974-983 (1999).

222. E.S. Goll and P.C. Jurs, Prediction of Vapor Pressures of Hydrocarbons and Halohydrocarbons from Molecular Structure, J. Chem. Inf. Comput. Sci. 39:1081-1089 (1999).

2000

223. S.J. Patankar and P.C. Jurs, Prediction of IC50 Values for Inhibitors of ACAT from Molecular Structure, J. Chem. Inf. Comput. Sci., 40:706-723 (2000).

224. G.W. Kauffman and P.C. Jurs, Prediction of Inhibition of the Sodium Ion-Proton Antiporter by Benzoylguanidine Derivatives from Molecular Structure, J. Chem. Inf. Comput. Sci., 40:753-761 (2000).

225. H. Engelhardt McClelland and P.C. Jurs, Quantitative Structure-Property Relationships for the Prediction of Vapor Pressures of Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci., 40:967-975 (2000).

226. P.C. Jurs, G.A. Bakken, H.E. McClelland, Computational Methods for the Analysis of Chemical Sensor Array Data from Volatile Analytes, Chem. Rev., 100:2649-2678 (2000).

227. G.A. Bakken, P.C. Jurs, Classification of Multidrug-Resistance Reversal Agents Using Structure-Based Descriptors and Linear Discriminant Analysis, J. Med. Chem., 43:4534-4541 (2000).

2001

228. S.M. Danauskas, P.C. Jurs, Prediction of C60 Solubilities From Solvent Molecular Structures, J. Chem. Inf. Comput. Sci., 41:419-424 (2001). 

229. G.W. Kauffman, P.C. Jurs, Prediction of Surface Tension, Viscosity, and Thermal Conductivity for Common Organic Solvents Using Quantitative Structure-Activity Relationships, J. Chem. Inf. Comput. Sci., 41:408-418 (2001). 

230. N.R. McElroy, P.C. Jurs, Prediction of Aqueous Solubility of Heteroatom-Containing Organic Compounds from Molecular Structure, J. Chem. Inf. Comput. Sci., 41:1237-1247 (2001). 

231. G.A. Bakken, P.C. Jurs, QSARs for 6-Azasteroids as Inhibitors of Human Type 1 5α-Reductase: Prediction of Binding Affinity and Selectivity Relative to 3-BHSD, J. Chem. Inf. Comput. Sci., 41:1255-1265 (2001).

232. G.A. Bakken, G.W. Kauffman, P.C. Jurs, K.J. Albert, S.S. Stitzel,  Pattern Recognition Analysis of Optical Sensor Array Data to Detect Nitroaromatic Compound Vapors, Sens. Actuators B 79:1-10 (2001). 

233. J.R. Serra, P.C. Jurs, Linear Regression and Computational Neural Network Prediction of Tetrahymena Acute Toxicity for Aromatic Compounds from Molecular Structure, Chem. Res. Toxicol., 14:1535-1545 (2001). 

234. G.W. Kauffman, P.C. Jurs, QSAR and k-Nearest Neighbor Classification Analysis of Selective Cyclooxygenase-2 Inhibitors Using Topologically-Based Numerical Descriptors, J. Chem. Inf. Comput. Sci., 41:1553-1560 (2001).

2002

235. B.E. Mattioni, P.C. Jurs, Development of Quantitative Structure-Activity Relationships and Classification Models for a Set of Carbonic Anhydrase Inhibitors, J. Chem. Inf. Comput. Sci., 42:94-102 (2002).

236. P.D. Mosier, A.E. Counterman, P.C. Jurs, D.E. Clemmer, Prediction of Peptide Ion Collision Cross Sections from Topological Molecular Structure and Amino Acid Parameters, Anal. Chem., 74:1360-1370 (2002).

237. B.E. Mattioni, P.C. Jurs, Prediction of Glass Transition Temperatures from Monomer and Repeat Unit Structure Using Computational Neural Networks, J. Chem. Inf. Comput. Sci., 42:232-240 (2002).

238. S.J. Patankar and P.C. Jurs, Prediction of Glycine/NMDA Receptor Antagonist Inhibition from Molecular Structure, J. Chem. Inf. Comput. Sci., 42:1053-1068  (2002).

239. P.D. Mosier and P.C. Jurs, QSAR/QSPR Studies Using Probabilistic Neural Networks and Generalized Regression Neural Networks, J. Chem. Inf. Comput. Sci., 42:1460-1470 (2002).

2003

240. B.E. Mattioni and P.C. Jurs, Prediction of Dihydrofolate Reductase Inhibition and Selectivity Using Computational Neural Networks and Linear Discriminant Analysis, J. Molec. Graph. Model., 21:391-419 (2003).

241. J.R. Serra, E.D. Thompson, and P.C. Jurs, Development of Binary Classification of Structural Chromosome Aberrations for a Diverse Set of Organic Compounds from Molecular Structure, Chem. Res. Tox., 16:153-163 (2003).

242. N.R. McElroy, P.C. Jurs, C. Morisseau, and B.D. Hammock, QSAR and Classification of Murine and Human Soluble Epoxide Hydrolase Inhibition by Urea-like Compounds, J. Med. Chem., 46:1066-1080 (2003).

243. S.J. Patankar and P.C. Jurs, Classification of Inhibitors of Protein Tyrosine Phosphatase 1B Using Molecular Structure Based Descriptors, J. Chem. Inf. Comput. Sci. 43:885-899 (2003).

244. B.E. Mattioni, G.W. Kauffman, P.C. Jurs, L.L. Custer, S.K. Durham, G.M. Pearl, Predicting the Genotoxicity of Secondary and Aromatic Amines Using Data Subsetting To Generate a Model Ensemble, J. Chem. Inf. Comput. Sci. 43:949-963 (2003).

245. Philip D. Mosier, Peter C. Jurs, Laura L. Custer, Stephen K. Durham and Greg M. Pearl, Predicting the Genotoxicity of Thiophene Derivatives from Molecular Structure, Chem. Res. Tox. 16:721-732 (2003).

246. P.C. Jurs, Quantitative Structure-Property Relationships, in Handbook of Cheminformatics – From Data to Knowledge, J. Gasteiger and T. Engel (Eds.), Wiley-VCH, 2003.

247. N.R. McElroy, E.D. Thompson, P.C. Jurs, Classification of Diverse Organic Compounds That Induce Chromosomal Aberrations in Chinese Hamster Cells, J. Chem. Inf. Comput. Sci. 43:2111-2119 (2003).

247. L. He, P.C. Jurs, L.L. Custer, S.K. Durham, G.M. Pearl, Predicting the Genotoxicity of Polycyclic Aromatic Compounds from Molecular Structure with Different Classifiers, Chem. Res. Tox., in press, ASAP.

2004

248. R. Guha, J.R. Serra, P.C. Jurs, Generation of QSAR Sets with a Self-Organizing Map, J. Mol. Graph. Model., 23:1-14 (2004).

249. R. Guha, P.C. Jurs, Development of QSAR Models To Predict and Interpret the Biological Activity of Artemisinin Analogues, J. Chem. Inf. Comput. Sci., 44:1440-1449 (2004).

250. R. Guha, P.C. Jurs, The Development of Linear, Ensemble and Non-linear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors, J. Chem. Inf. Comput. Sci., 44:2179-2189 (2004).

2005

251. R. Guha and P.C. Jurs, Determining the Validity of a QSAR Model - A Classification Approach, J. Chem. Inf. Comput. Sci., 45:65-73 (2005).

252. L. He, Peter C. Jurs, C. Kreatsoulas, L.L. Custer, S.K. Durham, G.M. Pearl, A PNN Multiple Classifier System for Predicting the Genotoxicity of Quinolone and Quinoline Derivatives, Chem. Res. Tox., 18:428-440 (2005).

253. L. He and P.C. Jurs, Assessing the Reliability of a QSAR Model's Predictions, J. Mol. Graph. Model. 23:503-523 (2005).

254. R. Guha and P.C. Jurs, Interpreting Computational Neural Network QSAR Models: A Measure of Descriptor Importance, J. Chem. Inf. Model. 45:800-806 (2005).

255. R. Guha, D.T. Stanton and P.C. Jurs, Interpreting Computational Neural Network QSAR Models: A Detailed Interpretation of the Weights and Biases, J. Chem. Inf. Model. 45:1109-1121 (2005).



Abstracts for references: 1994-present

QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS FOR TOXICITY OF PHENOLS USING REGRESSION ANALYSIS AND COMPUTATIONAL NEURAL NETWORKS

Lu Xu, J.W. Ball, S.L. Dixon, and P.C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Quantitative structure-toxicity models were developed that directly link the molecular structures of a set of 50 alkylated and/or halogenated phenols with their polar narcosis toxicity, expressed as the negative logarithm of the IGC50 (50% growth inhibitory concentration) value in millimoles per liter. Regression analysis and fully connected, feed-forward neural networks were used to develop the models. Two neural network training algorithms (back-propagation and a quasi-Newton method) were employed. The best model was a quasi-Newton neural network that had a root-mean-square error of 0.070 log units for the 45 training set phenols and 0.069 log units for the five cross-validation set phenols.
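As an illustration of the two quantities named in this abstract, the following minimal Python sketch converts an IGC50 value in millimoles per liter to its negative logarithm and computes a root-mean-square error. It assumes the logarithm is base 10, which the abstract does not state, and the function names and all numeric values are hypothetical.

```python
import math


def p_igc50(igc50_mmol_per_l):
    """Negative base-10 logarithm of an IGC50 value given in mmol/L.

    Base 10 is an assumption; the abstract only says "negative logarithm".
    """
    if igc50_mmol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(igc50_mmol_per_l)


def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical concentrations and predictions, for illustration only.
observed = [p_igc50(c) for c in (1.0, 0.1, 0.01)]
predicted = [0.1, 0.9, 2.1]
print(round(rms_error(observed, predicted), 3))
```

Computing the error separately for the training compounds and for the held-out compounds gives the pair of figures that an abstract of this kind reports.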


FAST GEOMETRY OPTIMIZATION USING A MODIFIED EXTENDED HUCKEL METHOD: RESULTS FOR MOLECULES CONTAINING H, C, N, O, AND F

Steven L. Dixon and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

A semiempirically parameterized version of the extended Huckel molecular orbital method has been combined with an efficient quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm to obtain accurate geometries for compounds containing H, C, N, O, and F. The requirement of only one matrix diagonalization per energy evaluation makes the EHNDO (Extended Huckel Neglect of Differential Overlap) method faster than semiempirical Hartree-Fock NDDO methods such as MNDO, AM1, and PM3. Geometrical results for EHNDO appear to be as good as or better than results for the widely used AM1 technique, and geometry optimization for EHNDO also requires only a fraction of the time.


PREDICTION OF BOILING POINTS AND CRITICAL TEMPERATURES OF INDUSTRIALLY IMPORTANT ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

Leanne M. Egolf, Matthew D. Wessel and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Numeric representations of molecular structure are used to predict the normal boiling points and critical temperatures for compounds drawn from the Design Institute for Physical Property Data (DIPPR) database. Multiple linear regression analysis and computational neural networks (i.e., using back-propagation and quasi-Newton training) are employed to develop models which can accurately predict the boiling points of 298 organic compounds. This approach is assessed by comparing its results against results obtained using the Joback group contribution approach. Finally, the same methodology is used to develop two separate critical temperature models, one based on the methods of corresponding states and the second based on structurally derived parameters alone.


SIMULATION OF 13C NUCLEAR MAGNETIC RESONANCE SPECTRA OF TETRAHYDROPYRANS USING REGRESSION ANALYSIS AND NEURAL NETWORKS

Deborah L. Clouser and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

The 13C NMR spectra of tetrahydropyrans are simulated directly from their molecular structures. A set of 29 tetrahydropyrans is used as a training set to generate regression equations and to train neural networks, and three additional compounds are used as an external prediction set. The results of simulations done by regression analysis are found to be extremely sensitive to molecular geometries. To account for this, two different methods of descriptor manipulation, an averaging method and a Boltzmann-weighted averaging method, are introduced, and the models generated from the descriptor sets are compared. The results for the Boltzmann-weighted averaging method are found to be better than those based on descriptors derived from only the lowest energy conformation.
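The Boltzmann-weighted averaging mentioned above can be sketched as follows. This is a generic illustration, not the procedure from the paper: it assumes conformer energies in kcal/mol, a temperature of 298.15 K, and one descriptor value per conformer, and the function names and numbers are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Normalized Boltzmann weights for a set of conformer energies."""
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature_k))
               for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(descriptor_values, energies_kcal, temperature_k=298.15):
    """Boltzmann-weighted mean of one descriptor over several conformers."""
    weights = boltzmann_weights(energies_kcal, temperature_k)
    return sum(w * d for w, d in zip(weights, descriptor_values))


# Hypothetical descriptor values and relative energies for three conformers.
values = [2.10, 2.45, 2.80]
energies = [0.0, 0.6, 1.5]
print(boltzmann_average(values, energies))
print(sum(values) / len(values))  # plain average, for comparison
```

The weighted mean stays close to the value for the lowest-energy conformer while still reflecting the higher-energy ones, whereas the plain average treats all conformers equally.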


PREDICTION OF REDUCED ION MOBILITY CONSTANTS FROM STRUCTURAL INFORMATION USING MULTIPLE LINEAR REGRESSION ANALYSIS AND COMPUTATIONAL NEURAL NETWORKS

Matthew D. Wessel and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Multiple linear regression analysis and computational neural networks are used to develop models that predict reduced ion mobility constants (K0) from quantitative structural information encoded as descriptors. The errors associated with the models are similar to the calculated experimental error of ~0.040 K0 units. The best regression model contains five descriptors, has a multiple correlation coefficient (R) value of 0.991 and a standard deviation of 0.0469 K0 units. The neural network model utilizes the same five descriptors and has a root mean square (RMS) error of 0.0393 K0 units. The descriptors encode molecular size, weight, functional group, and structural classifications.
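The three summary statistics quoted in this abstract can be computed from observed and predicted values as in the sketch below. It assumes a least-squares fit with an intercept, so that R is the square root of 1 - SSE/SST, and it reads "standard deviation" as the standard error of the regression with n - p - 1 degrees of freedom. Both readings are assumptions, and the data are hypothetical.

```python
import math


def regression_stats(observed, predicted, n_descriptors):
    """Return (R, standard error, RMS error) for a fitted model.

    R = sqrt(1 - SSE/SST) is valid for a least-squares fit with an intercept.
    The standard error uses n - n_descriptors - 1 degrees of freedom.
    """
    n = len(observed)
    if n != len(predicted) or n <= n_descriptors + 1:
        raise ValueError("need more observations than fitted parameters")
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r = math.sqrt(max(0.0, 1.0 - sse / sst))
    std_error = math.sqrt(sse / (n - n_descriptors - 1))
    rms = math.sqrt(sse / n)
    return r, std_error, rms


# Hypothetical observed and predicted values for a one-descriptor model.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.1, 3.9]
print(regression_stats(obs, pred, n_descriptors=1))
```

Because the standard error divides by the degrees of freedom rather than by n, it is always somewhat larger than the RMS error for the same residuals, which is worth keeping in mind when a regression model and a neural network are compared on these two measures.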


PREDICTION OF NORMAL BOILING POINTS OF HYDROCARBONS FROM MOLECULAR STRUCTURE

Matthew D. Wessel and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Computer assisted methods are used to investigate the relationship between normal boiling point and molecular structure for a set of hydrocarbons. Multiple linear regression methods are used to develop a six-variable linear model with a low root mean square (RMS) error. The six descriptors in the linear model are also used to develop a computational neural network model with a significantly lower RMS error. The methodology used in this study is also compared to Joback's group contribution method to estimate physical properties. The methods used here are found to be superior to Joback's method. However, when one additional variable encoding the square root of the molecular weight is added to Joback's groups, an excellent model is developed.


AUTOMATED DESCRIPTOR SELECTION FOR QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS USING GENERALIZED SIMULATED ANNEALING

Jon M. Sutter, Steve L. Dixon, and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

The central steps in developing QSARs are generation and selection of molecular structure descriptors and development of the model. Recently, computational neural networks have been employed as nonlinear models for QSARs. Neural networks can be trained efficiently with a quasi-Newton method, but the results are dependent on the descriptors used and the initial parameters of the network. Thus, two potential opportunities for optimization arise. The first optimization problem is the selection of the descriptors for use by the neural network. In this study, generalized simulated annealing (GSA) is employed to select an optimal set of descriptors. The cost function used to evaluate the effectiveness of the descriptors is based on the performance of the neural network. The second optimization problem is selecting the starting weights and biases for the network. GSA is also used for this optimization. The result is an automated descriptor selection algorithm that is an optimization inside of an optimization. Application of the method to a QSAR problem shows that effective descriptor subsets are found, and they support models that are as good as or better than those obtained using traditional linear regression methods.
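The outer descriptor-selection loop can be sketched roughly as below. This is not the GSA algorithm of the paper: it uses an ordinary Metropolis acceptance rule with geometric cooling, and the cost function is a toy stand-in for the neural-network performance measure the abstract describes. All names, parameters, and penalty values are hypothetical.

```python
import math
import random


def anneal_descriptor_subset(n_descriptors, subset_size, cost,
                             n_steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a low-cost descriptor subset by simulated annealing.

    cost: callable taking a list of descriptor indices and returning a float.
    Uses plain Metropolis acceptance, not the GSA acceptance rule.
    """
    rng = random.Random(seed)
    current = list(range(subset_size))  # deterministic starting subset
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        unselected = [d for d in range(n_descriptors) if d not in current]
        if not unselected:
            break
        # Move: swap one selected descriptor for one unselected descriptor.
        candidate = list(current)
        candidate[rng.randrange(subset_size)] = rng.choice(unselected)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return sorted(best), best_cost


# Toy cost: each descriptor carries a fixed penalty (hypothetical values).
# In the study the cost comes from neural-network performance instead.
PENALTIES = [5.0, 4.0, 0.5, 3.0, 0.2, 2.5, 0.1, 4.5]


def toy_cost(subset):
    return sum(PENALTIES[d] for d in subset)


subset, value = anneal_descriptor_subset(len(PENALTIES), 3, toy_cost)
print(subset, value)
```

In the nested scheme the abstract describes, each call to the cost function would itself involve a second optimization over the network's starting weights and biases, which is what makes the procedure an optimization inside an optimization.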


SIMULATION OF THE 13C NUCLEAR MAGNETIC RESONANCE SPECTRA OF TRISACCHARIDES USING MULTIPLE LINEAR REGRESSION ANALYSIS AND NEURAL NETWORKS

Deborah L. Clouser and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Predictive models are developed for the 13C NMR chemical shifts of the carbon atoms comprising the central rings of 46 trisaccharide compounds. Thirty-nine trisaccharides are used as a training set for development of models using regression analysis and computational neural networks, and seven compounds are used as an external prediction set. The descriptors used in the models are developed directly from the molecular structures of the trisaccharides. Three different methods of descriptor selection are compared. The dependence of the models on the geometries of the trisaccharides is explored. The models developed with geometric descriptors are better than those developed without geometric descriptors, although the latter models are still of a comparable quality. Overall, the best model found is a neural network based on descriptors selected by multiple linear regression.


PREDICTION OF NORMAL BOILING POINTS FOR A DIVERSE SET OF INDUSTRIALLY IMPORTANT ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

Matthew D. Wessel and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Models that accurately predict normal boiling points for organic compounds containing heteroatoms have been developed with regression and computational neural network methods. The structures of the compounds are represented by calculated structural descriptors. Two models are presented -- one for a set of 277 compounds containing only O, S, and halogens, and a second for a set of 104 compounds all containing N. Root-mean-square errors of about 9 K result. The accuracy of prediction of these models is compared to a widely used group contribution method for boiling point estimation.


COMPUTER ASSISTED SIMULATION OF 13C NUCLEAR MAGNETIC RESONANCE SPECTRA OF MONOSACCHARIDES

Brooke E. Mitchell and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Mathematical models are developed that relate the structures of monosaccharides to their 13C nuclear magnetic resonance spectra. The data set of monosaccharides consists of 55 monosaccharides in the six-membered ring configuration and 56 monosaccharides in the five-membered ring configuration. The structural environment of each carbon atom in the data set is encoded using numerical atom-based descriptors which are then used to develop linear regression models relating the 13C chemical shift to the structural features. The atom-based descriptors used in this study encode topological, geometric, and electronic information about the carbon atoms in monosaccharides. Multiple linear regression analysis is used to develop an eleven-descriptor model to predict the chemical shifts of pyranoses and pyranosides and an eight-descriptor model to predict the chemical shifts of furanoses and furanosides. The models are then submitted to computational neural networks, giving improved results with final training set rms errors of 1.03 ppm for pyranoses and pyranosides and 1.58 ppm for furanoses and furanosides.


PREDICTION OF AQUEOUS SOLUBILITY FOR A DIVERSE SET OF HETEROATOM-CONTAINING ORGANIC COMPOUNDS USING A QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIP

Jon M. Sutter and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

The primary goal of a quantitative structure-property relationship (QSPR) is to identify a set of structurally based numerical descriptors that can be mathematically linked to a property of interest. The types of descriptors fall into three categories: topological, electronic, and geometric. In this study, 140 organic compounds with diverse structures were split into a training set, a cross-validation set, and a prediction set. The training set was used to build multiple linear regression and computational neural network models, the cross-validation set was used to prevent overtraining of the neural network, and the prediction set was used to validate the mathematical models. A set of nine descriptors was found that effectively linked the aqueous solubility to each structure. However, the polychlorinated biphenyls (PCBs) had a large root-mean-square (rms) error associated with them. Therefore models were also built using a training set that contained no PCBs. A set of nine descriptors was found with a significant improvement of the rms error of the training set as well as the prediction set.
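The three-way division of the data described above can be sketched as follows. The abstract gives only the total of 140 compounds, so the subset sizes used here are hypothetical, as are the function names and the cross-validation error values. The second function shows one simple way a cross-validation set can guard against overtraining: keep the network from the epoch where the cross-validation error is lowest.

```python
import random


def three_way_split(items, n_cv, n_pred, seed=0):
    """Randomly divide items into training, cross-validation and prediction sets."""
    items = list(items)
    if n_cv + n_pred >= len(items):
        raise ValueError("no compounds would be left for training")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    prediction = shuffled[:n_pred]
    cross_validation = shuffled[n_pred:n_pred + n_cv]
    training = shuffled[n_pred + n_cv:]
    return training, cross_validation, prediction


def best_epoch(cv_errors):
    """Index of the training epoch with the lowest cross-validation error."""
    return min(range(len(cv_errors)), key=cv_errors.__getitem__)


# 140 compounds as in the abstract; the 114/13/13 division is hypothetical.
train, cv, pred = three_way_split(range(140), n_cv=13, n_pred=13)
print(len(train), len(cv), len(pred))

# Hypothetical cross-validation errors recorded after each training epoch.
cv_rms = [0.90, 0.50, 0.30, 0.35, 0.40]
print(best_epoch(cv_rms))
```

The prediction set plays no part in fitting or in choosing the stopping point, so its error is the only one of the three that estimates performance on unseen compounds.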


THE SIMULATION OF THE 13C NUCLEAR MAGNETIC RESONANCE SPECTRA OF RIBONUCLEOSIDES USING MULTIPLE LINEAR REGRESSION ANALYSIS AND NEURAL NETWORKS

Deborah L. Clouser and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Regression equations have been developed to predict the 13C NMR spectra of 17 ribonucleosides through the use of atomic environmental descriptors. These descriptors were calculated directly from the structure of the compounds. 15 compounds are used as a training set for linear regression analysis, and 2 compounds are used as an external prediction set. Due to the diverse nature of the atoms within the data set, the chemical shifts were divided into subsets. The results for each subset are reported. Computational neural networks are also used to predict the chemical shifts of the atoms in the subsets.


THE SIMULATION OF 13C NUCLEAR MAGNETIC RESONANCE SPECTRA OF DIBENZOFURANS USING MULTIPLE LINEAR REGRESSION ANALYSIS AND NEURAL NETWORKS

Deborah L. Clouser and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Regression equations have been developed to predict the 13C NMR chemical shifts of the carbon atoms for a set of 20 dibenzofurans. Seventeen compounds were used as a training set and three compounds were used as an external prediction set. Using a generalized simulated annealing algorithm for descriptor selection, a linear regression model with eight descriptors was found with acceptable errors. Computational neural networks using a Quasi-Newton training algorithm were also used to predict the chemical shifts. The neural network models produced errors in the range of 0.66 - 1.18 ppm. The data were divided into subsets for regression analysis and neural networks because the data formed two distinct groups, and the results are compared to those obtained with the data of the complete set.


PREDICTION OF REDUCED ION MOBILITY CONSTANTS OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

Matthew D. Wessel, Jon M. Sutter and Peter C. Jurs

Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

ABSTRACT

Quantitative structure-property relationships (QSPRs) are used to develop mathematical models that accurately predict the reduced ion mobility constants (K0) for a set of 168 organic compounds directly from molecular structure. The K0 values are taken from an unpublished database collected by G. A. Eiceman, Chemistry Department, New Mexico State University. The data were collected using a Graseby Ionics Environmental Vapour Monitor (EVM) gas chromatography/ion mobility spectrometer. Standardized conditions with controlled temperature, pressure, and humidity were used, and 2,4-lutidine was used as an internal standard. K0 values were measured for all monomer peaks. The best model was found with a feature selection routine which couples the genetic algorithm with multiple linear regression analysis. The set of six descriptors was also analyzed with a fully-connected, feed-forward neural network. The model contains six molecular structure descriptors and has a root-mean-square error of about 0.04 K0 units. The descriptors in the model lend insight into some of the important molecular features that influence ion mobility. The model can be utilized for prediction of K0 values of compounds for which there is no empirical K0 data.


PREDICTION OF ACUTE MAMMALIAN TOXICITY FROM MOLECULAR STRUCTURE FOR A DIVERSE SET OF SUBSTITUTED ANILINES USING REGRESSION ANALYSIS AND COMPUTATIONAL NEURAL NETWORKS

Stephen Roger Johnson and Peter C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Pennsylvania State University,
University Park, PA 16802, USA.

ABSTRACT

The acute oral mammalian toxicity (LD50) of a diverse set of substituted anilines was studied using a quantitative structure-activity relationship (QSAR). Feature selection was performed using least median squares to evaluate the fitness of subsets of descriptors chosen by an evolutionary optimization routine. Using this method, a five-descriptor model was found with reasonable training set and prediction set root mean square (rms) errors. Computational neural networks further improved the model, yielding a training set rms error of 0.238 log units and a prediction set rms error of 0.254 log units. Additionally, a feature selection routine using computational neural networks to evaluate the fitness of subsets of descriptors chosen by the genetic algorithm was employed. This routine was able to exploit the non-linear nature of a CNN, resulting in a model with a training set rms error of 0.233 log units and a prediction set rms error of 0.238 log units. The molecular structure descriptors contained in these models encode information regarding functional groups, molecular size, and intermolecular interactions.


NEURAL NETWORK CLASSIFICATION AND QUANTIFICATION OF ORGANIC VAPORS BASED ON FLUORESCENCE DATA FROM A FIBER-OPTIC SENSOR ARRAY

Jon M. Sutter and Peter C. Jurs*

Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802

ABSTRACT

Computational neural networks have been developed to classify and quantify nine organic vapors. The neural network analyses used data that consisted of the change in fluorescence of a sensor array that was patterned after the mammalian olfactory system. The sensor array consisted of 19 fiber optics that contained a polymer and dye mixture on one end. Plots of change in fluorescence intensity versus time were measured as pulses of analyte were presented to the sensor array. Important features derived from the intensity versus time plots were used to build neural network models that accurately classified and quantified each analyte. Most of the data were used to train the neural networks (training set members), some were used to assist termination of training the neural networks (cross-validation set members), and some were used to validate the models (prediction set members). Classification rates approaching 100% were achieved for the training set data, and 90% of the members in the prediction set were correctly classified. In addition, 97% of the prediction set observations were assigned a correct relative concentration.
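The role of the cross-validation set in terminating training can be sketched as a simple stopping rule. This is an assumed, generic patience-based rule for illustration; the abstract does not specify the criterion actually used, and the function name, the `patience` parameter, and the error values are hypothetical.

```python
def early_stopping_epoch(cv_errors, patience=3):
    """Return the index of the epoch with the lowest cross-validation
    error, stopping the scan once the error has failed to improve for
    `patience` consecutive epochs."""
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(cv_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # CV error has stopped improving: likely overtraining
    return best_epoch


# Made-up cross-validation errors recorded after each training epoch.
cv_errors = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.5]
stop_at = early_stopping_epoch(cv_errors, patience=3)
```

The network weights saved at the returned epoch would then be evaluated on the prediction set, which plays no part in training or in the stopping decision.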


PREDICTION OF GAS CHROMATOGRAPHIC RETENTION INDICES OF ALKYLBENZENES

J. M. Sutter, T. A. Peterson, and P. C. Jurs*

Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802

ABSTRACT

The retention indices of a set of alkylbenzenes on a polar gas chromatographic column are predicted directly from their molecular structures. Numerical descriptors are calculated based on the structure of a group of 150 alkylbenzenes. The descriptors are of three types: topological, geometric, and electronic. Statistical methods are employed to find an informative subset of these descriptors that can accurately predict the gas chromatographic retention indices. The ADAPT software system is used to construct a large pool of structurally derived numerical descriptors which are used to build quantitative structure-retention relationships (QSRRs). Multiple linear regression analysis and computational neural networks are used to map the descriptors to the retention indices.


PREDICTION OF SUPERCRITICAL CARBON DIOXIDE SOLUBILITY OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

H. L. Engelhardt and P.C. Jurs*

152 Davey Laboratory, Chemistry Department, The Pennsylvania State University,
University Park, Pennsylvania 16802

ABSTRACT

A diverse data set of 58 compounds taken from the literature was used to create models for the prediction of the solubility of organic compounds in supercritical carbon dioxide. Descriptors encoding information about the topological, geometric, and electronic properties of each compound in the data set were calculated from the molecular structures. A multiple linear regression model containing seven descriptors was generated. Several new descriptors, which were not present in the original pool, were calculated. One of the new descriptors was used to create the final seven descriptor linear model, which had a better root mean square (rms) error than the original model. The seven descriptors that appeared in the final model were used to make a neural network model which had a significantly better rms error than the linear model.


PREDICTION OF AUTOIGNITION TEMPERATURES OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

B. E. Mitchell and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University,
University Park, Pennsylvania 16802

ABSTRACT

A quantitative structure-property relationship study is performed to develop mathematical models that relate the structures of a heterogeneous group of organic compounds to their autoignition temperature values. The molecular structures of the compounds are represented by calculated numerical descriptors which encode their topological, electronic, and geometric features. These descriptors are used to develop several multiple linear regression and computational neural network models to predict the autoignition temperatures of a data set consisting of hydrocarbons, halohydrocarbons, and compounds containing oxygen, sulfur, and nitrogen. Both genetic algorithm and simulated annealing routines are used to select subsets of descriptors based on multiple linear regression and computational neural networks. The models that are developed have predictive ability in the range of the experimental error of autoignition temperature measurements.


IDENTIFICATION OF MULTIPLE ANALYTES USING AN OPTICAL SENSOR ARRAY AND PATTERN RECOGNITION NEURAL NETWORKS

Stephen R. Johnson, Jon M. Sutter, Heidi L. Engelhardt, Peter C. Jurs*

152 Davey Laboratory, Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802

Joel White and John S. Kauer

Department of Neuroscience, Tufts University School of Medicine, Boston, Massachusetts 02111

Todd A. Dickinson and David R. Walt

Department of Chemistry, Tufts University, Medford, Massachusetts 02155

ABSTRACT

The further development of a vapor-sensing device utilizing an array of broadly-distributed optical sensors is detailed. Data from these optical sensors provided input to pattern recognizing neural networks which successfully identified and quantified a collection of 20 analyte vapors. The optical sensor array consisted of nineteen optical fibers whose tops were coated with Nile Red immobilized in various polymer matrices. Responses consisted of the change in fluorescence with time resulting from the presentation of a vapor to the sensor array. Numerical descriptors calculated from these responses were then used to highlight important temporal and spatial features. Learning vector quantization neural network models were constructed using subsets of these descriptors, and they accurately identified and quantified each of the presented analytes. Successful classification was achieved for both the training set data (89%) and for the external prediction set data (90%). Relative concentrations were correctly assigned for 90% of the prediction set data.


PREDICTION OF HUMAN INTESTINAL ABSORPTION OF DRUG COMPOUNDS FROM MOLECULAR STRUCTURE

Matthew D. Wessel1, Peter C. Jurs1*, John W. Tolan2 and Steven M. Muskal2*

1Department of Chemistry
The Pennsylvania State University
152 Davey Laboratory, University Park, Pennsylvania, 16802

2Affymax Research Institute
3410 Central Expressway Santa Clara, California 95051

ABSTRACT

The absorption of a drug compound through the human intestinal cell lining is an important property for potential drug candidates. Measuring this property, however, can be costly and time-consuming. The use of quantitative structure-property relationships (QSPRs) to estimate percent human intestinal absorption (%HIA) is an attractive alternative to experimental measurements. A data set of 86 drug and drug-like compounds with measured values of %HIA taken from the literature was used to develop and test a QSPR model. The compounds were encoded with calculated molecular structure descriptors. A non-linear neural network model was developed by using the genetic algorithm with a neural network fitness evaluator. The calculated %HIA (cHIA) model performs well, with root-mean-square (rms) errors of 9.4 %HIA units for the training set, 19.7 %HIA units for the cross-validation (CV) set, and 16.0 %HIA units for the external prediction set.


PREDICTION OF INFINITE DILUTION ACTIVITY COEFFICIENTS OF ORGANIC COMPOUNDS IN AQUEOUS SOLUTION FROM MOLECULAR STRUCTURE

B. E. Mitchell and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University,
University Park, Pennsylvania 16802

ABSTRACT

A quantitative structure-property relationship study is performed to develop models that relate the structures of a heterogeneous group of organic compounds to their infinite dilution activity coefficients, IDAC. The molecular structures are represented by calculated descriptors that encode their topological, electronic, and geometric features. The descriptors are used to develop multiple linear regression and computational neural network models to predict the IDAC. Genetic algorithm and simulated annealing routines are used to select subsets of descriptors that form the best models. The models that are developed have predictive ability in the range of the experimental error of infinite dilution activity coefficient measurements.


PREDICTION OF AQUEOUS SOLUBILITY OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

B. E. Mitchell and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University,
University Park, Pennsylvania 16802

ABSTRACT

Multiple linear regression (MLR) and computational neural networks are utilized to develop mathematical models to relate the structures of a diverse set of 332 organic compounds to their aqueous solubilities. Topological, geometric, and electronic descriptors are used to numerically represent structural features of the data set compounds. Genetic algorithm and simulated annealing routines, in conjunction with MLR and CNN, are used to select subsets of descriptors that accurately relate to aqueous solubility. Nonlinear models with nine calculated structural descriptors are developed that have a training set root-mean-square error of 0.394 log units for compounds which span a -log(molarity) range from -2 to +12 log units.


PREDICTION OF CRITICAL TEMPERATURES AND PRESSURES OF INDUSTRIALLY IMPORTANT ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

Brian E. Turner, Chandra L. Costello, and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University,
University Park, Pennsylvania 16802

ABSTRACT

Quantitative structure-property relationship methods are used to develop mathematical models to predict critical temperatures and pressures of a diverse set of organic compounds taken from the Design Institute for Physical Property Data (DIPPR) database. Each compound is represented with calculated molecular structure descriptors that encode its topological, electronic, geometrical, and other features. Subsets of descriptors are selected with simulated annealing and genetic algorithms. Models to predict the critical properties are constructed using multiple linear regression analysis and computational neural networks with errors comparable to the experimental errors of the critical property data.


PREDICTION OF THE CLEARING TEMPERATURES OF A SERIES OF LIQUID CRYSTALS FROM MOLECULAR STRUCTURE

Stephen R. Johnson and Peter C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University,
University Park, Pennsylvania 16802

ABSTRACT

A quantitative structure-property relationship (QSPR) investigation was performed to develop a mathematical link between molecular structure and the clearing temperature of a series of structurally related liquid crystals. Molecular structures were encoded by a series of numerical descriptors encoding information regarding size, shape, and the ability to participate in intermolecular interactions. A genetic algorithm feature selection routine was utilized to select high-quality subsets of these descriptors for use in computational neural network models. A successful 10-descriptor model was developed using 318 compounds with a root-mean-square error of 5.4 K for the clearing temperature for the compounds in an external prediction set not used in model development.


PREDICTION OF METHYL RADICAL ADDITION RATE CONSTANTS FROM MOLECULAR STRUCTURE

Gregory A. Bakken and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University,
University Park, Pennsylvania 16802

ABSTRACT

Multiple linear regression and computational neural networks (CNNs) are used to develop quantitative structure-property relationships for methyl radical addition rate constants. Structure based descriptors are used to numerically encode substrate information for 191 compounds. Descriptors can be classified as topological, geometric, electronic, or combination. A six-descriptor CNN was developed that produced training set rms error = 0.381 log units and rms error = 0.496 log units for an external prediction set. A seven-descriptor CNN was used to build a model for a subset of 172 of the compounds. Training set rms error was 0.424 log units and prediction set rms error reduced to 0.409 log units. Model predictions were on the order of experimental error.


PREDICTION OF ACUTE MAMMALIAN TOXICITY OF ORGANOPHOSPHORUS PESTICIDE COMPOUNDS FROM MOLECULAR STRUCTURE

D. V. Eldred and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A quantitative structure-activity relationship (QSAR) investigation was done for the acute oral mammalian toxicity (LD50) of a set of 54 organophosphorus pesticide compounds. The compounds were represented with calculated molecular structure descriptors, which encoded their topological, electronic, and geometrical features. Feature selection was done with a genetic algorithm to find subsets of descriptors that would support a high quality computational neural network (CNN) model to link the structural descriptors to the -log(mmol/kg) values for the compounds. The best seven-descriptor non-linear CNN model found had an rms error of 0.22 log units for the training set compounds and 0.25 log units for the prediction set compounds.


PREDICTION OF FATHEAD MINNOW ACUTE TOXICITY OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

D. V. Eldred, C.L. Weikel, P.C. Jurs, K.L.E. Kaiser

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

and

National Water Research Institute
Burlington, Ontario L7R 4A6, Canada

ABSTRACT

Interest in the prediction of toxicity without the use of experimental data is growing, and quantitative structure-activity relationship (QSAR) methods are valuable for such predictions. A QSAR study of acute aqueous toxicity of 375 diverse organic compounds is developed using only calculated structural features as independent variables. Toxicity is expressed as -log(LD50) with the units -log(mmol/L) and ranges from -3 to 6. Multiple linear regression and computational neural networks (CNNs) are utilized for model building. The best model is a non-linear CNN model based on eight calculated molecular structure descriptors. The root-mean-square log(LD50) errors for the training, cross-validation and prediction sets of this CNN model are 0.71, 0.77, and 0.74 -log(mmol/L), respectively. These results are compared to a previous study with the same data set which included many more descriptors and used experimental data in the descriptor pool.


PREDICTION OF HYDROXYL RADICAL RATE CONSTANTS FROM MOLECULAR STRUCTURE

G. A. Bakken and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Quantitative structure-property relationships are developed using multiple linear regression and computational neural networks (CNNs). Structure-based descriptors are used to numerically encode molecular features that can be used to form models describing reaction rates with hydroxyl radicals. For a set of 57 unsaturated hydrocarbons, a 5-2-1 CNN was developed that produced a root-mean-square (rms) error of 0.0638 log units for the training set and 0.0657 log units for an external prediction set. The residual sum of squares for all 57 compounds was 0.234 log units, which compares favorably with existing methodologies. Additionally, a 10-7-1 CNN was built to predict hydroxyl radical rate constants for a diverse set of 312 compounds. The training set rms error was 0.229 log units, and the rms error for the external prediction set was 0.254 log units. This model demonstrates the ability to provide accurate predictions over a wide range of functionalities.


PREDICTION OF THE NORMAL BOILING POINTS OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURES WITH A COMPUTATIONAL NEURAL NETWORK MODEL

E. S. Goll and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Computational methods were used to link the molecular structures of diverse, industrially important organic compounds from three different data sets to their normal boiling points. The data were provided by the Design Institute for Physical Property Data (DIPPR) Project 801 database. These data sets were composed of 298 hydrocarbons and heteroatom-containing structures including N compounds (data set I), 277 heteroatom-containing compounds excluding N compounds (data set II), and 104 halogen- and heteroatom-containing compounds, all of which contained at least 1 type of N-functional group (data set III). Each compound was represented by a set of calculated molecular structure descriptors. Genetic algorithms were used to select the best subsets of descriptors. Multiple linear regression and computational neural networks were employed to create the models best suited for the prediction of normal boiling points. This study used a nonlinear genetic algorithm program, for the first time on these data sets, to obtain the final models.


PREDICTION OF VAPOR PRESSURES OF HYDROCARBONS AND HALOHYDROCARBONS FROM MOLECULAR STRUCTURE WITH A COMPUTATIONAL NEURAL NETWORK MODEL

E. S. Goll and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Computational methods are used to link the molecular structures of 352 hydrocarbons and halohydrocarbons to their vapor pressures at 25 °C. The data are from the Design Institute for Physical Property Data (DIPPR) database. Vapor pressures of the compounds range from -1.016 log(VP) to +6.65 log(VP) with VP in pascals. Multiple linear regression was used to create a nonlinear model best suited for prediction of vapor pressure. The root-mean-square errors associated with the training, cross-validation, and prediction set compounds used for this CNN model were 0.163, 0.163, and 0.209 log units.


PREDICTION OF IC50 VALUES FOR ACAT INHIBITORS FROM MOLECULAR STRUCTURE

S. J. Patankar and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A quantitative structure-activity study is performed on several series of compounds derived from N-chlorosulfonyl isocyanate to develop models that relate structures to IC50 activity for inhibition of acyl-CoA:cholesterol O-acyltransferase (ACAT). Numerical descriptors are used to encode topological, electronic, and geometric information from the molecular structures of the inhibitors. A data set of 157 compounds showing triglyceride- and cholesterol-lowering activity is used to develop successful linear regression models and nonlinear computational neural network models. The models are validated using an external prediction set.


PREDICTION OF INHIBITION OF THE SODIUM ION-PROTON ANTIPORTER BY BENZOYLGUANIDINE DERIVATIVES FROM MOLECULAR STRUCTURE

G. W. Kauffman and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

The use of quantitative structure-activity relationships to predict IC50 values of 113 potential Na+/H+ antiporter inhibitors is reported. Multiple linear regression and computational neural networks (CNNs) are used to develop models using a set of information-rich descriptors. The descriptors encode information about topology, geometry, electronics, and combination hybrids. A five-descriptor CNN model with root-mean-square (rms) errors of 0.278 log units for the training set and 0.377 log units for the prediction set was developed. Examination of data set subclasses showed that systematic structural variations were also well encoded, resulting in 100% accuracy of prediction trends. An experiment involving a committee of five CNNs was also performed to examine the effect of network output averaging. This showed improved results, decreasing the training and cross-validation set rms error to 0.228 log units and the prediction set rms error to 0.296 log units.
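Network output averaging by a committee can be sketched as below. This is a minimal illustration under stated assumptions: the member predictions and observed values are made-up numbers, not data from the study, and the function names are hypothetical. Each member is taken to be an already-trained model whose predictions are simply averaged compound by compound.

```python
import math


def committee_predict(member_predictions):
    """Average the predictions of several models, compound by compound.
    `member_predictions` is a list of equal-length prediction lists."""
    n_members = len(member_predictions)
    return [sum(column) / n_members for column in zip(*member_predictions)]


def rms_error(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(total / len(observed))


# Illustrative values only (log units); three hypothetical members.
observed = [5.0, 6.0, 7.0]
members = [
    [5.4, 5.8, 7.2],
    [4.8, 6.4, 6.6],
    [5.1, 5.9, 7.3],
]
committee = committee_predict(members)
member_rms = [rms_error(observed, m) for m in members]
committee_rms = rms_error(observed, committee)
```

Because the rms error is a convex function of the predictions, the averaged output can never be worse than the worst member, and it is often better than a typical member when the members' errors partly cancel.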


QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIPS FOR THE PREDICTION OF VAPOR PRESSURES OF ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

H.E. McClelland and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A quantitative structure-property relationship (QSPR) is developed to relate the molecular structures of 420 diverse organic compounds to their vapor pressures at 25 °C expressed as log(vp), where vp is in pascals. The log(vp) values range over 8 orders of magnitude from -1.34 to 6.68 log units. The compounds are encoded with topological, electronic, geometrical, and hybrid descriptors. Statistical and computational neural network (CNN) models are built using subsets of the descriptors chosen by simulated annealing and genetic algorithm feature selection routines. An 8-descriptor CNN model, which contains only topological descriptors, is presented which has a root-mean-square (rms) error of 0.37 log unit for a 65-member external prediction set. A 10-descriptor CNN model containing a larger selection of descriptor types gives an improved rms error of 0.33 log unit for the external prediction set.


CLASSIFICATION OF MULTIDRUG-RESISTANCE REVERSAL AGENTS USING STRUCTURE-BASED DESCRIPTORS AND LINEAR DISCRIMINANT ANALYSIS

G.A. Bakken and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Linear discriminant analysis is used to generate models to classify multidrug-resistance reversal agents based on activity. Models are generated and evaluated using multidrug-resistance reversal activity values for 609 compounds measured using adriamycin-resistant P388 murine leukemia cells. Structure-based descriptors numerically encode molecular features which are used in model formation. Two types of models are generated: one type to classify compounds as inactive, moderately active, and active (three-class problem) and one type to classify compounds as inactive or active without considering the moderately active class (two-class problem). Two activity distributions are considered, where the separation between inactive and active compounds is different. When the separation between inactive and active classes is small, a model based on nine topological descriptors is developed that produces a classification rate of 83.1% correct for an external prediction set. Larger separation between active and inactive classes raises the prediction set classification rate to 92.0% correct using a model with six topological descriptors. Models are further validated through Monte Carlo experiments in which models are generated after class labels have been scrambled. The classification rates achieved demonstrate that the models developed could serve as a screening mechanism to identify potentially useful MDRR agents from large libraries of compounds.
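The label-scrambling check mentioned above can be sketched as follows. This is an illustration under loud assumptions: a nearest-centroid classifier stands in for linear discriminant analysis, the single-descriptor data are synthetic, and all function names are hypothetical. The idea is only that a model refit on scrambled class labels should classify an external set much worse than the model fit on the true labels.

```python
import random


def fit_centroids(X, y):
    """Compute the mean descriptor vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign a row to the class with the nearest centroid."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=dist2)


def accuracy(X_train, y_train, X_test, y_test):
    """Fit on the training data and return the fraction correct on the test data."""
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)


def scrambled_accuracies(X_train, y_train, X_test, y_test, n_trials=100, seed=0):
    """Refit after shuffling the training labels; collect test accuracies."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        shuffled = list(y_train)
        rng.shuffle(shuffled)
        results.append(accuracy(X_train, shuffled, X_test, y_test))
    return results


# Synthetic two-class data with one descriptor (not from the study).
X_train = [[0.1 * i] for i in range(20)] + [[10.0 + 0.1 * i] for i in range(20)]
y_train = ["inactive"] * 20 + ["active"] * 20
X_test = [[0.5], [1.0], [10.5], [11.0]]
y_test = ["inactive", "inactive", "active", "active"]

real = accuracy(X_train, y_train, X_test, y_test)
scrambled = scrambled_accuracies(X_train, y_train, X_test, y_test)
mean_scrambled = sum(scrambled) / len(scrambled)
```

A large gap between `real` and `mean_scrambled` suggests that the true-label model reflects structure in the data rather than a chance correlation.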


PREDICTION OF C60 SOLUBILITIES FROM SOLVENT MOLECULAR STRUCTURES

S.M. Danauskas and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Models predicting fullerene solubility in 96 solvents at 298 K were developed using multiple linear regression and feed-forward computational neural networks (CNN). The data set consisted of a diverse set of solvents with solubilities ranging from -3.00 to 2.12 log (solubility) where solubility = (1 × 10^4)(mole fraction of C60 in saturated solution). Each solvent was represented by calculated molecular structure descriptors. A pool of the best linear models, as determined by rms error, was developed, and a CNN model was developed for each of the linear models. The best CNN model was chosen based on the lowest value of a specified cost function and had an architecture of 9-3-1. The 76-compound training set for this model had a root-mean-square error of 0.255 log solubility units, while the 10-compound cross-validation set had an rms error of 0.253. The 10-compound external prediction set had an rms error of 0.346 log solubility units.


PREDICTION OF SURFACE TENSION, VISCOSITY AND THERMAL CONDUCTIVITY FOR COMMON ORGANIC SOLVENTS USING QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS

G.W. Kauffman and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Predictive models for the surface tension, viscosity, and thermal conductivity of 213 common organic solvents are reported. The models are derived from numerical descriptors which encode information about the topology, geometry, and electronics of each compound in the data set. Multiple linear regression and computational neural networks are used to train and evaluate models based on statistical indices and overall root-mean-square error. Eight-descriptor models were developed for both surface tension and viscosity while a 9-descriptor model was developed for thermal conductivity. In addition, a single 9-descriptor model was developed for prediction of all three properties. The results of this study compare favorably to previously reported prediction methods for these three properties.


PREDICTION OF AQUEOUS SOLUBILITY OF HETEROATOM-CONTAINING ORGANIC COMPOUNDS FROM MOLECULAR STRUCTURE

N.R. McElroy and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

The use of quantitative structure-property relationships (QSPRs) to predict aqueous solubilities (log S) of heteroatom-containing organic compounds from their molecular structure is presented. Three data sets are examined. Data set 1 contains 176 compounds having one or more nitrogen atoms with some oxygen (log S[mol/L] range is -7.41 to 0.96). Data set 2 contains 223 compounds having one or more oxygen atoms, with no nitrogen (log S[mol/L] range is -8.77 to 1.57). Data set 3 contains all 399 compounds from sets 1 and 2 (log S[mol/L] range is -8.77 to 1.57). After descriptor generation and feature selection, multiple linear regression (MLR) and computational neural network (CNN) models are developed for aqueous solubility prediction. The best results were obtained with nonlinear CNN models. Root-mean-square (rms) errors for training with the three data sets ranged from 0.3 to 0.6 log units. All models were validated with external prediction sets, with the rms errors ranging from 0.6 log units to 1.5 log units.


QSARs FOR 6-AZASTEROIDS AS INHIBITORS OF HUMAN TYPE 1 5α-REDUCTASE: PREDICTION OF BINDING AFFINITY AND SELECTIVITY RELATIVE TO 3-BHSD

G.A. Bakken and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Quantitative structure-activity relationships (QSARs) are developed to describe the ability of 6-azasteroids to inhibit human type 1 5α-reductase. Models are generated using a set of 93 compounds with known binding affinities (Ki) to 5α-reductase and 3β-hydroxy-Δ5-steroid dehydrogenase/3-keto-Δ5-steroid isomerase (3-BHSD). QSARs are generated to predict Ki values for inhibitors of 5α-reductase and to predict selectivity (Si) of compound binding to 3-BHSD relative to 5α-reductase. Log(Ki) values range from -0.70 log units to 4.69 log units, and log(Si) values range from -3.00 log units to 3.84 log units. Topological, geometric, electronic, and polar surface descriptors are used to encode molecular structure. Information-rich subsets of descriptors are identified using evolutionary optimization procedures. Predictive models are generated using linear regression, computational neural networks (CNNs), principal components regression, and partial least squares. Compounds in an external prediction set are used for model validation. A 10-3-1 CNN is developed for prediction of binding affinity to 5α-reductase that produces root-mean-square error (RMSE) of 0.293 log units (R2 = 0.97) for compounds in the external prediction set. Additionally, an 8-3-1 CNN is generated for prediction of inhibitor selectivity that produces RMSE = 0.513 log units (R2 = 0.89) for the external prediction set. Models are further validated through Monte Carlo experiments in which models are generated after dependent variable values have been scrambled.
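
The scrambling check described in the last sentence can be illustrated with a minimal sketch. It assumes a one-descriptor least-squares line in place of the neural networks used in the study, and the toy data and the names r_squared and scrambled_r_squared are hypothetical, not taken from the paper. If a model captures a real structure-activity trend, refitting after shuffling the dependent variable should give much poorer fits.

```python
import random

def r_squared(x, y):
    """R^2 of an ordinary least-squares line y = a + b*x (one descriptor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def scrambled_r_squared(x, y, n_trials=200, seed=0):
    """Refit the model n_trials times, each time on a shuffled copy of y."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-activity pairing
        scores.append(r_squared(x, y_perm))
    return scores

# Hypothetical toy data: one descriptor with a linear trend plus noise.
noise = random.Random(1)
descriptor = [i / 10 for i in range(40)]
activity = [0.5 + 1.2 * d + noise.gauss(0, 0.3) for d in descriptor]

real_r2 = r_squared(descriptor, activity)
scrambled = scrambled_r_squared(descriptor, activity)
mean_scrambled_r2 = sum(scrambled) / len(scrambled)
print(f"real R2 = {real_r2:.2f}, mean scrambled R2 = {mean_scrambled_r2:.2f}")
```

A real model whose fit is no better than the scrambled runs would suggest a chance correlation.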


PATTERN RECOGNITION ANALYSIS OF OPTICAL SENSOR ARRAY DATA TO DETECT NITROAROMATIC COMPOUND VAPORS

G.A. Bakken,1 G.W. Kauffman,1 P.C. Jurs,1* K.J. Albert,2 and S.S. Stitzel2

1Department of Chemistry, 152 Davey Laboratory, Penn State University, University Park, PA 16802

2Department of Chemistry, Tufts University, Medford, MA 02155

ABSTRACT

A fiber optic based sensor array has been employed to determine the presence or absence of nitroaromatic compound (NAC) vapors in variable backgrounds of volatile organic compound (VOC) vapors. The system is based on previously developed cross-reactive array technology and employs a sensor array attached to the distal tips of an optical fiber bundle. Four different sensors, with fifty replicates of each type, were used to computationally train the system to detect and recognize the presence of explosives-like NAC vapors. Two of the NACs were employed because they are commonly detected on the soil surface above buried 2,4,6-trinitrotoluene plastic land mines. Based on fluorescent responses, samples in an external prediction set were classified with 100% accuracy using models trained to determine if NAC vapors were present. Additionally, models were developed with one of the three NAC vapors held out of the training process but included in the prediction set. In all three models, over 92% of samples in an external prediction set were classified correctly.


Linear Regression and Computational Neural Network Prediction of Tetrahymena Acute Toxicity for Aromatic Compounds from Molecular Structure

J.R. Serra and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A quantitative structure-toxicity relationship (QSTR) has been derived for a diverse set of 448 industrially important aromatic solvents. Toxicity was expressed as the 50% growth impairment concentration (ICG50) for the ciliated protozoa Tetrahymena and spans the range -1.46 to 3.36 log units. Molecular descriptors that encode topological, geometrical, electronic, and hybrid geometrical-electronic structural features were calculated for each compound. Subsets of molecular descriptors were selected via a simulated annealing technique and a genetic algorithm. From this reduced pool of descriptors, multiple linear regression models and nonlinear models using computational neural networks (CNNs) were derived and then used to predict the ICG50 values for an external set of representative compounds. An average of 10 nonlinear CNN models with 11-5-1 architecture was found to best describe the system with root-mean-square errors of 0.28, 0.29, and 0.34 log units for the training, cross-validation, and prediction sets, respectively.


QSAR and k-Nearest Neighbor Classification Analysis of Selective Cyclooxygenase-2 Inhibitors Using Topologically-Based Numerical Descriptors

G.W. Kauffman and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Experimental IC50 data for 314 selective cyclooxygenase-2 (COX-2) inhibitors are used to develop quantitation and classification models as a potential screening mechanism for larger libraries of target compounds. Experimental log(IC50) values ranged from 0.23 to < 5.00. Numerical descriptors encoding solely topological information are calculated for all structures and are used as inputs for linear regression, computational neural network, and classification analysis routines. Evolutionary optimization algorithms are then used to search the descriptor space for information-rich subsets which minimize the rms error of a diverse training set of compounds. An eight-descriptor model was identified as a robust predictor of experimental log(IC50) values, producing a root-mean-square error of 0.625 log units for an external prediction set of inhibitors which took no part in model development. A k-nearest neighbor classification study of the data set discriminating between active and inactive members produced a nine-descriptor model able to classify 83.3% of the prediction set compounds correctly.
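
For readers unfamiliar with k-nearest neighbor classification, the following is a minimal sketch of the general technique, not the implementation used in the study. It assumes Euclidean distance and a simple majority vote; the two-descriptor toy data and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    # Euclidean distance in descriptor space is an assumption of this sketch.
    ranked = sorted((math.dist(row, query), label)
                    for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def classification_rate(train_X, train_y, test_X, test_y, k=3):
    """Percent of external prediction set compounds classified correctly."""
    hits = sum(knn_classify(train_X, train_y, q, k) == t
               for q, t in zip(test_X, test_y))
    return 100.0 * hits / len(test_y)

# Hypothetical two-descriptor training set.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
           (4.0, 4.0), (4.2, 3.7), (3.8, 4.1)]
train_y = ["active", "active", "active",
           "inactive", "inactive", "inactive"]

# Hypothetical external prediction set.
test_X = [(1.1, 1.1), (3.9, 3.9)]
test_y = ["active", "inactive"]

rate = classification_rate(train_X, train_y, test_X, test_y)
print(f"prediction set classification rate: {rate:.1f}%")
```

In practice descriptors are usually scaled before distances are computed, so that no single descriptor dominates the vote.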


Development of Quantitative Structure-Activity Relationship and Classification Models for a Set of Carbonic Anhydrase Inhibitors

B.E. Mattioni and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Mathematical models are developed to find quantitative structure-activity relationships that correlate chemical structure and inhibition toward three carbonic anhydrase (CA) isozymes: CA I, II, and IV. Numerical descriptors are generated to encode important topological, geometric, and electronic features of molecular structure. After descriptor generation, multiple linear regression and computational neural network (CNN) analyses are performed on various descriptor subsets to find superior models for prediction. Committees of five CNNs were utilized to average final predicted values for the 142-compound data set. For inhibitors of CA I, an 8-5-1 CNN committee produced a training set rms error of 0.105 log Ki (r2 = 0.994) and prediction set rms error of 0.208 log Ki (r2 = 0.980). Training and prediction set rms errors of 0.140 log Ki (r2 = 0.992) and 0.231 log Ki (r2 = 0.971), respectively, were produced by a 9-5-1 CNN committee for inhibitors of CA II. For prediction of CA IV inhibitors, an 8-5-1 CNN committee produced training and prediction set rms errors of 0.147 log Ki (r2 = 0.992) and 0.211 log Ki (r2 = 0.991), respectively. In addition, classification models were built using k-nearest neighbor (kNN) analysis to solve two- and three-class problems for inhibitors of CA IV. A three-descriptor classification model proved superior in labeling compounds as active or inactive inhibitors for the two-class problem. Training and prediction set percent classification rates of 100% and 87.1%, respectively, were obtained. For the three-class (active/moderate/inactive) problem, a five-descriptor model was deemed optimal producing a training set percent classification rate of 98.8% and prediction set rate of 79.0%.
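
The committee averaging and rms error mentioned above can be sketched as follows. This is an illustration only: the member predictions are invented numbers standing in for the outputs of separately trained networks, and the function names are hypothetical.

```python
import math

def committee_average(member_predictions):
    """Average the predictions of several models, compound by compound."""
    n_members = len(member_predictions)
    return [sum(column) / n_members for column in zip(*member_predictions)]

def rms_error(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))

# Hypothetical outputs of a three-member committee for two compounds.
member_predictions = [
    [1.0, 2.0],
    [1.2, 1.8],
    [0.8, 2.2],
]
observed = [1.1, 2.0]

averaged = committee_average(member_predictions)
print(averaged, rms_error(averaged, observed))
```

Averaging tends to cancel the idiosyncratic errors of individual networks, which is one common motivation for using a committee.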


Prediction of Peptide Ion Collision Cross Sections from 
Topological Molecular Structure and Amino Acid Parameters

P.D. Mosier*, A.E. Counterman$*, P.C. Jurs*, and D.E. Clemmer$

*Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802
$Department of Chemistry, Indiana University, Bloomington, IN 47405

ABSTRACT

Quantitative structure-property relationships (QSPRs) have been developed to predict the ion mobility spectrometry (IMS) collision cross sections of singly protonated lysine-terminated peptides using information derived from topological molecular structure and various amino acid parameters. The primary amino acid sequence alone is sufficient to accurately predict the collision cross section. The models were built using multiple linear regression (MLR) and computational neural networks (CNNs). The best MLR model found contains six descriptors and predicts 94 of 113 peptides (83%) to within 2% of their experimentally determined values. The best CNN model using the same six descriptors predicts 105 of the 113 peptides (93%) to within 2% of their experimentally determined values. The best overall CNN model, using a different set of six descriptors, predicts 109 of the 113 peptides (96%) to within 2% of their experimentally determined values. In addition, this model can discriminate among peptides having identical amino acid composition, but differing in primary amino acid sequence. This represents a capability not found in previously described models. The descriptors used in the models presented may provide some insight into the nature of peptide ion folding in the gas phase.
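
The "within 2%" figure of merit used above can be computed as in this short sketch. The cross-section values are hypothetical and the function name is an assumption; the final lines simply reproduce the rounded percentages quoted in the abstract from its own counts.

```python
def percent_within_tolerance(predicted, observed, tolerance=0.02):
    """Percent of predictions within a relative tolerance of experiment."""
    hits = sum(abs(p - o) <= tolerance * abs(o)
               for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)

# Hypothetical predicted and experimental cross sections (arbitrary units).
predicted = [101.0, 105.0, 98.5, 200.0]
observed = [100.0, 100.0, 100.0, 195.0]
print(percent_within_tolerance(predicted, observed))

# Counts reported in the abstract, out of 113 peptides.
for count in (94, 105, 109):
    print(count, round(100 * count / 113))
```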


Prediction of Glass Transition Temperatures from Monomer and Repeat Unit Structure 
Using Computational Neural Networks

B.E. Mattioni and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Quantitative structure-property relationships (QSPR) are developed to correlate glass transition temperatures and chemical structure. Both monomer and repeat unit structures are used to build several QSPR models for Parts 1 and 2 of this study, respectively. Models are developed using numerical descriptors, which encode important information about chemical structure (topological, electronic, and geometric). Multiple linear regression analysis (MLRA) and computational neural networks (CNNs) are used to generate the models after descriptor generation. Optimization routines (simulated annealing and genetic algorithm) are utilized to find information-rich subsets of descriptors for prediction. A 10-descriptor CNN model was found to be optimal in predicting Tg values using the monomer structure (Part 1) for 165 polymers. A committee of 10 CNNs produced a training set rms error of 10.1K (r2 = 0.98) and a prediction set rms error of 21.7K (r2 = 0.92). An 11-descriptor CNN model was developed for 251 polymers using the repeat unit structure (Part 2). A committee of CNNs produced a training set rms error of 21.1K (r2 = 0.96) and a prediction set rms error of 21.9K (r2 = 0.96).


Prediction of Glycine/NMDA Receptor Antagonist Inhibition from Molecular Structure

S.J. Patankar and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

The design and blood-brain barrier crossing of glycine/NMDA receptor antagonists are of significant interest in pharmaceutical research. The use of these antagonists in stroke or seizure reduction has been considered. Measuring the inhibitory concentrations, however, can be time-consuming and costly. The use of quantitative structure-activity relationships to estimate IC50 values for these receptor antagonists is an attractive alternative to experimental measurement. A data set of 109 compounds with measured log(IC50) values ranging from -0.57 to 4.5 is used. Structural information is encoded with numerical descriptors for topological, electronic, geometric, and polar surface properties. A genetic algorithm with a computational neural network fitness evaluator is used to select the best descriptor subsets. Multiple linear regression and computational neural network models are developed. Additionally, a quantitative radial basis function neural network (QRBFNN) was developed with the intent of introducing nonlinearity at a faster speed. A genetic algorithm using the radial basis function network as a fitness evaluator was also developed to search descriptor space for optimum subsets. All models are tested using an external prediction set. The nonlinear computational neural network model has root-mean-square errors of approximately half a log unit.


QSAR/QSPR Studies Using Probabilistic Neural Networks and
Generalized Regression Neural Networks

P.D. Mosier and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

The Probabilistic Neural Network (PNN) and its close relative, the Generalized Regression Neural Network (GRNN), are presented as simple yet powerful neural network techniques for use in Quantitative Structure-Activity Relationship (QSAR) and Quantitative Structure-Property Relationship (QSPR) studies. The PNN methodology is applicable to classification problems, and the GRNN is applicable to continuous function mapping problems. The basic underlying theory behind these probability-based methods is presented along with two applications of the PNN/GRNN methodology. The PNN model presented identifies molecules as potential soluble epoxide hydrolase inhibitors using a binary classification scheme. The GRNN model presented predicts the aqueous solubility of nitrogen- and oxygen-containing small organic molecules. For each application, the network inputs consist of a small set of descriptors that encode structural features at the molecular level. Each of these studies has also been previously addressed in this research group using more traditional techniques such as k-nearest neighbor classification, multiple linear regression, and multilayer feed-forward neural networks. In each case, the predictive power of the PNN and GRNN models was found to be comparable to that of the more traditional techniques, while requiring significantly fewer input descriptors.
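
As a rough illustration of the two techniques, the sketch below uses the standard Gaussian-kernel formulation: a GRNN prediction is a kernel-weighted average of the training responses, and a PNN assigns the class with the largest average kernel density. It assumes a single smoothing parameter sigma and equal class priors; the data and function names are hypothetical, and this is not the code used in the study.

```python
import math

def gaussian_kernel(row, query, sigma):
    """Gaussian weight that decays with squared Euclidean distance."""
    return math.exp(-math.dist(row, query) ** 2 / (2.0 * sigma ** 2))

def grnn_predict(train_X, train_y, query, sigma=0.5):
    """GRNN output: kernel-weighted average of training responses."""
    weights = [gaussian_kernel(row, query, sigma) for row in train_X]
    total = sum(weights)
    if total == 0.0:  # query too far from all training points for this sigma
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def pnn_classify(train_X, train_labels, query, sigma=0.5):
    """PNN output: class with the largest mean kernel density at the query."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_labels):
        sums[label] = sums.get(label, 0.0) + gaussian_kernel(row, query, sigma)
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])

# Hypothetical one-descriptor data.
reg_X = [(0.0,), (1.0,), (2.0,)]
reg_y = [0.0, 1.0, 4.0]
print(grnn_predict(reg_X, reg_y, (1.0,), sigma=0.1))

cls_X = [(0.0,), (0.2,), (2.0,), (2.2,)]
cls_labels = ["inactive", "inactive", "active", "active"]
print(pnn_classify(cls_X, cls_labels, (1.9,)))
```

Both methods need no iterative weight training; the only adjustable quantity in this sketch is sigma, which controls how local the averaging is.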


Prediction of Dihydrofolate Reductase Inhibition and Selectivity Using
Computational Neural Networks and Linear Discriminant Analysis

B.E. Mattioni and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A data set of 345 dihydrofolate reductase (DHFR) inhibitors was used to build QSAR models that correlate chemical structure and inhibition potency for three types of DHFR: rat liver (rl), Pneumocystis carinii (pc), and Toxoplasma gondii (tg). Quantitative models were built using subsets of molecular structure descriptors analyzed by computational neural networks. Neural network models were able to accurately predict log IC50 values for the three types of DHFR to within +/-0.65 log units (data sets ranged ~5.5 log units) of the experimentally determined values. Classification models were also constructed using linear discriminant analysis (LDA) to identify compounds as selective or nonselective inhibitors of bacterial DHFR (pcDHFR and tgDHFR) relative to mammalian DHFR (rlDHFR). A leave-N-out training procedure was used to add robustness to the models and to prove that consistent results could be obtained using different training and prediction set splits. The best LDA models were able to correctly predict DHFR selectivity for ~70% of the external prediction set compounds. A set of new nitrogen- and oxygen-specific descriptors was developed especially for this data set to better encode structural features, which are believed to directly influence DHFR inhibition and selectivity.


Development of Binary Classification of Structural Chromosome Aberrations
for a Diverse Set of Organic Compounds

J.R. Serra, E.D. Thompson, and P.C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

Classification models are generated to predict in vitro cytogenetic results for a diverse set of 383 organic compounds. Both k-nearest neighbor and support vector machine models are developed. They are based on calculated molecular structure descriptors. Endpoints used are the labels clastogenic or nonclastogenic according to an in vitro chromosomal aberration assay with Chinese hamster lung cells. Compounds that were tested with both a 24 and 48 h exposure are included. Each compound is represented by calculated molecular structure descriptors encoding the topological, electronic, geometrical, or polar surface area aspects of the structure. Subsets of informative descriptors are identified with genetic algorithm feature selection coupled to the appropriate classification algorithm. The overall classification success rate for a k-nearest neighbor classifier built with just six topological descriptors is 81.2% for the training set and 86.5% for an external prediction set. The overall classification success rate for a three-descriptor support vector machine model is 99.7% for the training set, 92.1% for the cross-validation set, and 83.8% for an external prediction set.


QSAR and Classification of Murine and Human Soluble Epoxide Hydrolase
Inhibition by Urea-like Compounds

N.R. McElroy, P.C. Jurs*, C. Morisseau, and B.D. Hammock

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A data set of 348 urea-like compounds that inhibit the soluble epoxide hydrolase enzyme in mice and humans is examined. Compounds having IC50 values ranging from 0.06 to >500 µM (murine) and 0.10 to >500 µM (human) are categorized as active or inactive for classification, while quantitation is performed on smaller compound subsets ranging from 0.07 to 431 µM (murine) and 0.11 to 490 µM (human). Each compound is represented by calculated structural descriptors that encode topological, geometrical, electronic, and polar surface features. Multiple linear regression (MLR) and computational neural networks (CNNs) are employed for quantitative models. Three classification algorithms, k-nearest neighbor (kNN), linear discriminant analysis (LDA), and radial basis function neural networks (RBFNN), are used to categorize compounds as active or inactive based on selected data split points. Quantitative modeling of human enzyme inhibition results in a nonlinear, five-descriptor model with root-mean-square errors (log units of IC50 [µM]) of 0.616 (r2 = 0.66), 0.674 (r2 = 0.61), and 0.914 (r2 = 0.33) for training, cross-validation, and prediction sets, respectively. The best classification results for human and murine enzyme inhibition are found using kNN. Human classification rates using a seven-descriptor model for training and prediction sets are 89.1% and 91.4%, respectively. Murine classification rates using a five-descriptor model for training and prediction sets are 91.5% and 88.6%, respectively.


Generation of QSAR sets with a self-organizing map

R. Guha, J.R. Serra and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

A Kohonen self-organizing map (SOM) is used to classify a data set consisting of dihydrofolate reductase inhibitors with the help of an external set of Dragon descriptors. The resultant classification is used to generate training, cross-validation (CV) and prediction sets for QSAR modeling using the ADAPT methodology. The results are compared to those of QSAR models generated using sets created by activity binning and a sphere exclusion method. The results indicate that the SOM is able to generate QSAR sets that are representative of the composition of the overall data set in terms of similarity. The resulting QSAR models are half the size of those published and have comparable RMS errors. Furthermore, the RMS errors of the QSAR sets are consistent, indicating good predictive capabilities as well as generalizability.


Development of QSAR Models To Predict and Interpret the Biological Activity of Artemisinin Analogues

R. Guha and P. C. Jurs*

Department of Chemistry, 152 Davey Laboratory, Penn State University
University Park, PA 16802

ABSTRACT

This work presents the development of Quantitative Structure-Activity Relationship (QSAR) models to predict the biological activity of 179 artemisinin analogues. The structures of the molecules are represented by chemical descriptors that encode topological, geometric, and electronic structure features. Both linear (multiple linear regression) and nonlinear (computational neural network) models are developed to link the structures to their reported biological activity. The best linear model was subjected to a PLS analysis to provide model interpretability. While the best linear model does not perform as well as the nonlinear model in terms of predictive ability, the application of PLS analysis allows for a sound physical interpretation of the structure-activity trend captured by the model. On the other hand, the best nonlinear model is superior in terms of pure predictive ability, having a training error of 0.47 log RA units (R2 = 0.96) and a prediction error of 0.76 log RA units (R2 = 0.88).


The Development of Linear, Ensemble and Non-linear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors

R. Guha and P. C. Jurs*

Department of Chemistry, 104 Chemistry Research Building, Penn State University
University Park, PA 16802

ABSTRACT

A QSAR modeling study has been done with a set of 79 piperazinylquinazoline analogues which exhibit PDGFR inhibition. Linear regression and nonlinear computational neural network models were developed. The regression model was developed with a focus on interpretative ability using a PLS technique. However, it also exhibits a good predictive ability after outlier removal. The nonlinear CNN model had superior predictive ability compared to the linear model with a training set error of 0.22 log(IC50) units (R2 = 0.93) and a prediction set error of 0.32 log(IC50) units (R2 = 0.61). A random forest model was also developed to provide an alternate measure of descriptor importance. This approach ranks descriptors, and its results confirm the importance of specific descriptors as characterized by the PLS technique. In addition, the neural network model contains the two most important descriptors indicated by the random forest model.