Machine learning-assisted molecular design for high-performance organic photovoltaic materials

Using machine learning to assist molecular design. Credit: Wenbo Sun, Science Advances, doi: 10.1126/sciadv.aay4275

To synthesize high-performance materials for organic photovoltaics (OPVs), which convert solar radiation into direct current, materials scientists must establish a meaningful relationship between chemical structures and photovoltaic properties. In a new study published in Science Advances, Wenbo Sun and a team including researchers from the School of Energy and Power Engineering, School of Automation, Computer Science, Electrical Engineering and Green and Intelligent Technology built a new database of more than 1,700 donor materials from existing literature reports. They used supervised machine learning (ML) to build structure-property relationships and rapidly screen OPV materials, testing a variety of inputs for different ML algorithms.

Sun et al. obtained high ML prediction accuracy using molecular fingerprints longer than 1,000 bits. A molecular fingerprint encodes the structure of a molecule in binary bits. The team verified the reliability of the approach by screening 10 newly designed donor materials and checking whether the predictions matched the experimental outcomes. The results show that ML is a powerful tool to prescreen new OPV materials and accelerate the development of OPVs in materials engineering.

Organic photovoltaic (OPV) cells can convert solar energy into electricity directly and cost-effectively, and their power conversion efficiency (PCE) has grown rapidly in recent years. Mainstream OPV research has focused on establishing the relationship between new OPV molecular structures and their photovoltaic properties. The traditional process typically involves designing and synthesizing photovoltaic materials and then assembling and optimizing photovoltaic cells. These approaches result in time-consuming research cycles that require delicate control of chemical synthesis, purification and device fabrication. The existing OPV development process is therefore slow and inefficient: fewer than 2,000 OPV donor materials have been synthesized and tested so far. However, the data gathered from decades of research are priceless, and their potential to guide the design of high-performance OPV materials remains to be fully explored.

Information about the database of OPV donor materials. (A) Distribution of PCE values of the 1719 molecules in the database. (B) Schematics of expressions of a molecule, including image, simplified molecular-input line-entry system (SMILES), and fingerprints. Credit: Science Advances, doi: 10.1126/sciadv.aay4275

To extract useful information from these data, Sun et al. needed a sophisticated program that could scan a large dataset and extract relationships among its features. Machine learning (ML) provides computational tools that learn and recognize patterns and relationships from a training dataset, so the team used this data-driven approach to predict material properties. The ML algorithm did not have to understand the chemistry or physics behind the materials' properties to accomplish the task. Similar methods have recently predicted the activity and properties of materials successfully in materials discovery, drug development and materials design. Even before these ML applications, cheminformatics had provided scientists with a useful toolbox.

Materials scientists have only recently explored applications of ML in the OPV field. In the present work, Sun et al. built a database of 1,719 experimentally tested OPV donor materials gathered from the literature. They first studied how the machine-readable expression of the molecules affected ML performance. To do this, they tested several types of expressions: images, ASCII strings, two types of descriptors and seven types of molecular fingerprints. The model predictions agreed well with the experimental results. The scientists expect the new approach to greatly accelerate the development of new, highly efficient organic semiconducting materials for OPV research.

The research team first transformed the raw data into a machine-readable representation. The same molecule can be expressed in many ways, and these expressions carry very different chemical information at different levels of abstraction. Sun et al. compared the expressions by how accurately a set of ML models predicted the power conversion efficiency (PCE) class from each one. Using images as input, the deep-learning model reached an accuracy of 69.41 percent. This relatively unsatisfactory performance was due to the small size of the database. For instance, when the same group previously used a larger set of up to 50,000 molecules, the accuracy of the deep-learning model exceeded 90 percent. Fully training a deep-learning model requires a much larger database, containing millions of samples.

Testing results of ML models. (A) Testing of the deep learning model using images as input. (B to D) Testing results of different ML models using (B) SMILES, (C) PaDEL, and (D) RDKit descriptors as input. Credit: Science Advances, doi: 10.1126/sciadv.aay4275

Sun et al. currently had only hundreds of molecules in each category, which made it difficult for the model to extract enough information to reach higher accuracy. Fine-tuning a pre-trained model can reduce the amount of data required, but thousands of samples are still needed to learn a sufficient number of features. When images are used to express molecules, enlarging the database is therefore the main route to better accuracy.

The scientists used five types of supervised ML algorithms in the study:

(1) the back propagation (BP) neural network (BPNN)
(2) the deep neural network (DNN)
(3) deep learning
(4) the support vector machine (SVM)
(5) random forest (RF)

BPNN, DNN and deep learning are all based on artificial neural networks (ANNs). The SMILES (simplified molecular-input line-entry system) code provided another original expression of a molecule, and Sun et al. used it as input for four of the models. The RF model gave the highest accuracy, approximately 67.84 percent. Unlike deep learning, these four classical methods could not extract hidden features. Overall, SMILES performed worse than images at expressing molecules for predicting the PCE class.

The researchers then used molecular descriptors, which describe the properties of a molecule as an array of numbers instead of expressing its chemical structure directly. The team used two types of descriptors, PaDEL and RDKit. Analyses across all ML models showed that a large descriptor set included more descriptors irrelevant to PCE, which degraded ANN performance. A small descriptor set, by contrast, carried too little chemical information to train the ML models effectively. When molecular descriptors are used as ML input, the key is therefore to find descriptors that relate directly to the target property.
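One simple way to act on this observation is to keep only the descriptors that track the target most closely. The sketch below ranks descriptors by the absolute Pearson correlation between each descriptor column and the measured PCE, then keeps the top k. It is a minimal illustration that assumes a small in-memory table. The descriptor names and values are hypothetical, and the study does not state that it used this particular filter.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no information about the target
    return cov / (spread_x * spread_y)


def select_descriptors(table: Dict[str, List[float]], pce: List[float], k: int) -> List[str]:
    """Return the k descriptor names with the largest |r| against PCE."""
    ranked = sorted(table, key=lambda name: abs(pearson(table[name], pce)), reverse=True)
    return ranked[:k]


# Hypothetical descriptor values for four molecules and their measured PCE (%).
pce = [2.0, 4.0, 6.0, 8.0]
descriptors = {
    "desc_a": [1.0, 2.0, 3.0, 4.0],  # rises with PCE
    "desc_b": [4.0, 3.0, 2.0, 1.0],  # falls with PCE
    "desc_c": [1.0, 1.0, 1.0, 1.0],  # constant, irrelevant
    "desc_d": [1.0, 3.0, 2.0, 4.0],  # partly related
}
print(select_descriptors(descriptors, pce, 3))
```

A correlation filter only catches linear relationships, one descriptor at a time. In practice, any selection would be checked with cross-validation so that it does not overfit a small database.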

Performance of ML models. (A to D) The testing results of (A) BPNN, (B) DNN, (C) RF, and (D) SVM using different types of fingerprints as input. Credit: Science Advances, doi: 10.1126/sciadv.aay4275.

The team next used molecular fingerprints. These are designed to represent molecules as mathematical objects and were originally created to identify isomers. In large-scale database screening, a fingerprint is an array of bits containing 1s and 0s that record the presence or absence of specific substructures or patterns within a molecule. Sun et al. used seven types of fingerprints as inputs to train the ML models. They also examined how fingerprint length influenced the prediction performance of the different models. For instance, Molecular ACCess System (MACCS) fingerprints contain 166 bits, making them the shortest input, and their results were unsatisfactory because they carry limited information.
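To make the bit-array idea concrete, here is a toy key-based fingerprint in Python in which each bit answers one yes/no question about the molecule. For illustration only, the "keys" below are assumed to be plain substrings of the SMILES string. Real MACCS keys are 166 substructure patterns matched against the molecular graph with a cheminformatics toolkit such as RDKit, so this sketch is not a substitute for them.

```python
from typing import List

# Hypothetical keys, chosen for illustration. Real key-based fingerprints
# (e.g. MACCS) match substructure patterns on the molecular graph instead
# of searching for substrings in one particular SMILES spelling.
TOY_KEYS = ["c1ccsc1", "C#N", "C(=O)", "F", "N", "O"]


def key_fingerprint(smiles: str, keys: List[str] = TOY_KEYS) -> List[int]:
    """Return one bit per key: 1 if the key text occurs in the SMILES, else 0."""
    return [1 if key in smiles else 0 for key in keys]


for smiles in ["c1ccsc1C#N", "CC(=O)Oc1ccccc1"]:
    print(smiles, key_fingerprint(smiles))
```

String matching is fragile. The same cyanothiophene written as `N#Cc1cccs1` would miss the thiophene key, which is why real fingerprints are computed on the parsed structure.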

Sun et al. found that the best combination of molecular expression and ML algorithm was 1024-bit Hybridization fingerprints with RF, which achieved a prediction accuracy of 81.76 percent. Hybridization fingerprints take into account the sp2 hybridization states in the molecules. As the fingerprint length increased from 166 to 1024 bits, the performance of all ML models improved, because longer fingerprints include more chemical information.
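The benefit of longer fingerprints is easiest to see for hashed fingerprints, in which each structural feature is hashed to one of n bit positions. When n is small, different features collide on the same bit and become indistinguishable. The sketch below is a minimal illustration. It assumes features are simply labelled strings and uses CRC-32 as the hash. It does not reproduce the Hybridization fingerprint or any toolkit's exact algorithm, and 166 is used only as a comparison length (MACCS itself is key-based, not hashed).

```python
import zlib
from typing import Iterable, List


def hashed_fingerprint(features: Iterable[str], n_bits: int) -> List[int]:
    """Fold an arbitrary set of feature strings into a fixed-length bit vector."""
    bits = [0] * n_bits
    for feature in features:
        # CRC-32 is deterministic across runs, unlike Python's salted hash().
        bits[zlib.crc32(feature.encode("utf-8")) % n_bits] = 1
    return bits


def collisions(features: Iterable[str], n_bits: int) -> int:
    """Number of distinct features lost because they landed on an occupied bit."""
    unique = set(features)
    return len(unique) - sum(hashed_fingerprint(unique, n_bits))


# Hypothetical molecule described by 150 distinct substructure features.
features = [f"substructure_{i}" for i in range(150)]
for n_bits in (166, 1024):
    print(n_bits, "bits:", collisions(features, n_bits), "features lost to collisions")
```

Fewer collisions at 1024 bits means more of the molecule's features remain distinguishable to the ML model. That is one concrete sense in which a longer fingerprint "includes more chemical information".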

Verification of ML models with experiment. (A) Comparison of the results from four different models. (B) Schematic diagram of the cell architecture used in this study. (C) J-V curve of the solar cell with the active layer using the predicted donor material. (D) Prediction results versus experimental data for the predicted donor materials with the RF algorithm and Daylight fingerprints. Credit: Science Advances, doi: 10.1126/sciadv.aay4275.

To test the reliability of the ML models, Sun et al. synthesized 10 new OPV donor molecules. They then used three representative fingerprints to express the chemical structures of the new molecules and compared the RF model's predictions with the experimental PCE values. The model correctly classified eight of the 10 molecules. The results indicated that the newly synthesized materials have potential for OPV applications, with additional experimental optimization for two of them. A minor change in structure can cause a large difference in PCE. Encouragingly, the ML models recognized such minor modifications and still produced favorable predictions.
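The verification step comes down to predicting a PCE class for each new molecule and counting agreements with experiment. The sketch below shows that bookkeeping with a deliberately simple stand-in model: a 1-nearest-neighbour classifier using Tanimoto similarity between bit-vector fingerprints. This is not the random forest the study used, and the fingerprints and class labels are hypothetical.

```python
from typing import List, Sequence, Tuple

Fingerprint = Sequence[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def predict_class(query: Fingerprint, training: List[Tuple[Fingerprint, str]]) -> str:
    """Return the PCE class of the most similar training molecule."""
    _, best_class = max(training, key=lambda item: tanimoto(query, item[0]))
    return best_class


def agreement(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of molecules whose predicted class matches the experimental class."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


training = [([1, 1, 0, 0], "high"), ([0, 0, 1, 1], "low")]
print(predict_class([1, 1, 1, 0], training))
# Eight matches out of ten, the outcome reported for the new donor molecules.
print(agreement(["high"] * 10, ["high"] * 8 + ["low"] * 2))
```

Swapping in a random forest would change only `predict_class`. The agreement calculation against experiment stays the same.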

In this way, Wenbo Sun and colleagues used a literature database of OPV donor materials and a variety of molecular expressions (images, ASCII strings, descriptors and molecular fingerprints) to build ML models that predict the corresponding PCE class. The team demonstrated a scheme for designing OPV donor materials that combines ML with experimental analysis. In this scheme, the ML model prescreens a large number of donor materials to identify leading candidates for synthesis and further experiments. The work can speed up donor material design and accelerate the development of high-PCE OPVs. Using ML in conjunction with experiments should also advance materials discovery.

More information: Yann LeCun et al. Deep learning, Nature (2015). DOI: 10.1038/nature14539

Lingxian Meng et al. Organic and solution-processed tandem solar cells with 17.3% efficiency, Science (2018). DOI: 10.1126/science.aat2612

Wenbo Sun et al. Machine learning–assisted molecular design and efficiency prediction for high-performance organic photovoltaic materials, Science Advances (2019). DOI: 10.1126/sciadv.aay4275

Journal information: Science Advances, Nature, Science

Researchers clear the path for ‘designer’ plants

Credit: CC0 Public Domain

A team of researchers at the University of Georgia has found a way to identify gene regulatory elements that could help produce “designer” plants and lead to improvements in food crops at a critical time. They published their findings in two separate papers in Nature Plants.

With the world population projected to reach 9.1 billion by 2050, world food production will need to rise by 70% and food production in the developing world will need to double, according to estimates from the Food and Agriculture Organization of the United Nations. Improvements in crops could play a key role in that effort.

The team, led by Bob Schmitz, demonstrated an ability to identify cis-regulatory elements, or CREs, in 13 plant species, including maize, rice, green beans and barley.

Cis-regulatory elements are regions of noncoding DNA that regulate neighboring genes. If a gene and its CRE can be identified, they can be treated as a modular unit, sometimes called a biobrick. Targeting CREs for editing offers a more refined tool than editing genes, according to Schmitz, associate professor of genetics in the Franklin College of Arts and Sciences.

“Gene editing can be like a hammer. If you target the gene, you pretty much break it,” he said. “Targeting CREs, which are involved in controlling phenotype (how a particular characteristic appears), allows you to turn gene expression up or down, similar to a dial. It gives us a tool to create a whole range of variation in expression of a gene.”

Controlling a gene for leaf architecture, for example, might allow a plant breeder to choose the angle at which a leaf grows from a plant, which can play a significant role in the plant’s light absorption and growth. Targeting the gene itself would provide two options: “on,” where the leaf might grow at a 90-degree angle, and “off,” where the leaf might grow straight down. But targeting the CRE instead of the gene would allow the grower to target a range of possibilities in between—a 10-degree angle, a 25-degree angle, a 45-degree angle, etc.

Once biobricks have been created and screened for the desired output, they could be used to produce “designer” plants that possess desirable characteristics, for example salt-tolerant plants that can grow in a landscape with high salinity. The ability to design plants to grow in less-than-ideal landscapes will become more and more important as food growers strive to produce more in an environment facing increasing challenges, like drought and flooding.

Based on their success, the research team recently received a $3.5 million grant from the National Science Foundation to investigate the role of CREs in legumes, including peanuts and soybeans.

Underlying the grant proposal and the papers are technological breakthroughs developed by Zefu Lu, Bill Ricci and Lexiang Ji.

“Zefu took a high-throughput method for identifying specific elements that was developed for animal cells and found a way to apply it to plant cells. It took a long time to address the significant barrier of plant organellar genomes, but now we’re able to do what the animal field has been doing for a few years,” Schmitz said.

“When people try to find trait/disease associations, they look for mutations in genes, but the work in animals has shown that these non-gene regions also possess mutations that affect the way in which a gene is expressed. The regions we’re identifying with this method are revealing regulatory information for gene expression control, which traditionally has been challenging to detect compared to genes.”

One of Ricci’s contributions was developing a technique that shows the link between CREs and the gene they control.

“Typically CREs are located right next to the gene they control, but in plants with larger genomes—soybeans, maize—it’s become clear that these controlling elements can appear very far away,” Schmitz said. “In two-dimensional space something may appear far away, over many thousands of base pairs, but Bill’s method shows that in three dimensions, it’s actually positioned right next to the gene.”

This work—the first time it has been applied to plants—provided the foundation for the two papers published in Nature Plants, and Schmitz paid tribute to his team members’ contributions.

“This is a group effort,” he said. “Zefu, Bill and Lexiang were major drivers of this research.”

“Widespread Long-range Cis-Regulatory Elements in the Maize Genome” provides genetic, epigenomic and functional molecular evidence supporting the widespread existence of long-distance loci that act as long-range CREs influencing if and how a gene in the maize genome is expressed.

In “The prevalence, evolution and chromatin signatures of plant regulatory elements,” the researchers identified thousands of CREs and revealed that long-distance CREs are prevalent in plants, especially in species with large and complex genomes. Additional results suggest that CREs function with distinct chromatin pathways to regulate gene expression.

The team’s work will be shared via publicly available epigenome browsers that were developed by Brigitte Hofmeister, a recent Ph.D. graduate from the Schmitz Lab.

“Our studies are genome wide, and we do a lot of technique and technology development, but it’s not useful if people can’t access it,” Schmitz said. “We provide epigenome browsers that allow people studying leaf architecture, for example, to access information on the specific genes or traits they’re interested in.”

Industry is also interested in CREs, according to Schmitz. Industry's editing pipeline is well established for genes, and CREs are the next obvious editing target once they are located.

“It’s not just academia using this for basic science,” he said. “The applications of this approach to identify CREs will become commonplace in industry to improve crop performance.”

More information: William A. Ricci et al, Widespread long-range cis-regulatory elements in the maize genome, Nature Plants (2019). DOI: 10.1038/s41477-019-0547-0

Journal information: Nature Plants

Scientists develop a new method to detect light in the brain

Researchers from Istituto Italiano di Tecnologia (IIT) and University of Salento, both in Lecce, Italy, and Harvard Medical School in Boston have developed a new light-based method to capture and pinpoint the epicenter of neural activity in the brain.

The approach, described Oct. 7 in Nature Methods, lays the foundation for novel ways to map connections across different brain regions. This ability can enable the design of devices to image various areas of the brain and even to treat conditions that arise from malfunctions in cells inhabiting these regions, the researchers said.

The work was led by Ferruccio Pisanello at IIT, Massimo De Vittorio at IIT and University of Salento, and Bernardo Sabatini, the Alice and Rodman W. Moorhead III Professor of Neurobiology in the Blavatnik Institute at Harvard Medical School, and funded by the European Research Council and by the National Institutes of Health in the United States.

One of the central challenges in modern neuroscience is recording the exchange of information between different regions of the brain, as well as between different cell types. The new method overcomes this challenge by allowing the simultaneous collection of signals from various brain regions through the use of a tapered optical probe.

The study marks the first time that a single probe has been used with light both to decode the activity of specific neuronal populations and to manipulate different brain regions. The approach relies on bringing fluorescent molecules into specific nerve cells in order to track their electrical activity and to measure the level of neurotransmitters, the molecules that act as chemical messengers between neurons.

To achieve this, the team used an optical fibre shaped like a narrow cone, with a tip so thin and precise that it can capture light from single neurons along regions as long as 2 millimetres (0.08 inches).

The researchers inserted the light-sensing probe inside the striatum, a region of the brain involved in planning movements, and used it to track the release of dopamine, a critical neurotransmitter involved in motor control which also plays a key role in the development of disorders like Parkinson’s disease, schizophrenia and depression.

The device successfully captured neural activity in specific sub-regions of the striatum involved in the release of dopamine during specific behaviours.

The approach has effectively allowed scientists to capture how nerve signals travel in time and space and to gauge the concentration of specific neurotransmitters during specific actions. The method enriches researchers’ methodological repertoire and augments their ability to study the central nervous system and probe the molecular causes of neurological disorders.

7-Eleven now offers voice ordering through Alexa and Google Home

‘Hey Alexa, I need a slurpee.’
SOPA Images via Getty Images

For the days that leaving your house to get snacks and supplies is just too unfathomable, there’s 7-Eleven delivery. Now, for the days when looking at a screen is equally as taxing, there’s 7-Eleven voice ordering.

The company has launched 7Voice for Amazon Alexa and Google Home, which means that as long as you have the 7Now app and an account login, you can order all the stuff you want simply by asking, and it’ll be delivered to you in less than 30 minutes. Of course, this is a helpful feature for customers who have sight impairment or literacy issues, but it will no doubt also appeal to anyone who’s found themselves hungover on the couch in dire need of a Slurpee. Just speak it into existence, and it shall be so.

The app will automatically identify a customer’s location and place their order with the nearest participating store. There’s no minimum order, but delivery costs $3.99, although the first order you place through 7Now (voice or otherwise) is free. You’ll be able to keep tabs on the status of your order using real-time tracking, too.

If you’re an Alexa user, simply say “Alexa, enable 7NOW” then go to Skills > 7Now skill > Settings > Link Account. Say “Alexa, open 7NOW” and start ordering. If you’re on Google Home, say “OK Google, talk to 7NOW” and follow the steps on your phone to finish linking your account, and you’re ready to start ordering whatever your heart desires.

New Data From First Human Crispr Trials Shows Promising Results

Results from clinical trials released Tuesday indicate that two patients, one with beta thalassemia and one with sickle cell disease, have potentially been cured of their diseases. The two trials, which involved using Crispr to edit the genes of the patients in question, were jointly conducted by Vertex Pharmaceuticals and CRISPR Therapeutics.

“This is the first clinical evidence to demonstrate that Crispr/Cas9 can be used to cure or potentially cure serious genetic illnesses,” Jeffrey Leiden, CEO of Vertex, told Forbes. “It’s a remarkable scientific and medical milestone.”

Crispr/Cas9 is a gene-editing system popular for its ability to snip, repair or insert genes into DNA. The therapies tested in the clinical trials work by extracting bone marrow stem cells from the patients, editing these stem cells to fix the genetic mutations that cause the diseases, and then infusing the cells back into the patients. The patient’s body then takes over and is able to produce new, healthy cells. Engineering of the cells is done ex vivo (outside of the patient’s body). This allows the researchers to make sure the correct changes are made and there are no improper edits to the genome.

CTX001, the gene-editing therapy used in these trials, is “very surgical in how it makes the change,” says David Altshuler, Vertex’s chief scientific officer.

It has been nine months since the patient with beta thalassemia received the one-time-only treatment, and more than four months since the patient with sickle cell disease received it. In that time, both of their conditions have improved tremendously, Leiden says. The patient with beta thalassemia, who used to undergo more than 16 blood transfusions each year, hasn’t needed a transfusion since the treatment. The patient with sickle cell disease experienced an average of seven excruciating health crises per year before the treatment and hasn’t experienced any since.

Despite the fact that these results have only been seen in two patients, says Samarth Kulkarni, CEO of CRISPR Therapeutics, “the effect is so dramatic in these patients that we can’t help but think this brings a lot of promise.”

Both patients suffered side effects during the treatment, but doctors concluded they were caused by the bone marrow preparation, not the Crispr treatment itself. In order to infuse healthy stem cells, both patients had to undergo intensive chemotherapy to destroy their old bone marrow cells. This treatment, also common for bone cancer patients, can cause nausea, hair loss and organ damage.

Precision medicine is known for its hefty price tag, and this treatment is “the zenith of precision medicine,” Kulkarni says. Yet when asked about the potential cost of the treatment, Kulkarni says that they are still focusing on clinical development and it is “too early to contemplate any sort of pricing discussions.” Zolgensma, a gene therapy approved by the FDA last May, was priced at $2.1 million.

The applications of Crispr seem limitless, but the field has encountered several ethical controversies. Last year, Chinese scientist He Jiankui shocked the medical community by announcing that he had altered the genes of two human children. One of the main worries that researchers have about Crispr is that scientists might alter genes to be inherited, a practice called germline engineering. In a recent article on the anniversary of He’s revelation, Crispr pioneer Jennifer Doudna called for stricter regulations for using Crispr in heritable human genome editing.

But germline editing isn’t a concern in these trials, where only somatic, or non-reproductive cells, were altered. People are “much more concerned about intentional changes to a person’s DNA that could be passed down to their descendants,” says Henry Greely, a Stanford law professor and chairman of the California Advisory Committee on Human Stem Cell Research. When it comes to somatic cells, “they die with the person,” he says.

In addition to following these initial patients for the next two years to see if their diseases recur, Leiden says they’re enrolling multiple patients with both diseases for the next phase of the clinical trial and will be starting treatments for those patients in the near future. While they don’t yet have a timeline on when the treatment will be commercially available, “we want to get this to patients as soon as possible,” he says.

Quantum light improves sensitivity of biological measurements

A multidisciplinary group of researchers has demonstrated that quantum light can be used to make accurate measurements in real time without disrupting enzymatic activity. Credit: Simonetta Pieroni

In a new study, researchers showed that quantum light can be used to track enzyme reactions in real time. The work brings together quantum physics and biology in an important step toward the development of quantum sensors for biomedical applications.

The proteins known as enzymes are responsible for many processes inside our bodies. However, they can be difficult to study with optical approaches because too much light will reduce their activity or even stop it altogether.

In The Optical Society (OSA) journal Optics Express, a multidisciplinary group of researchers showed that light controlled at the single-photon, or quantum, level can allow accurate real-time measurements without disrupting enzymatic activity.

“Although it might be a few years before practical quantum sensors are achieved, this type of proof-of-principle experiment is important,” said research team leader Ilaria Gianani from Università degli Studi Roma Tre in Italy. “It helps pinpoint the areas where we can start building shared knowledge with other fields and reveals where  is needed to make progress.”

Single-photon control

When investigating biomolecules it is important to avoid using levels of light that might alter their properties or behavior. Achieving this can be challenging because low levels of light may not provide very much information and noise can easily overcome the faint signal. Today, enzymes are studied with measurements performed on assays collected from the main sample to avoid damaging the sample with light. This procedure not only takes time but also prevents direct observation of the enzymes in real time.

The researchers overcame this problem by developing a setup that allowed them to control the light extremely precisely—at the level of a single photon. This made it possible to use low illumination without disrupting the enzymes, with the potential to achieve a better sensitivity. The capability to address the sample directly also allowed dynamic tracking with higher resolution.

“Key to our success was a collaboration between quantum physicists, who know how to deal with photons, and biologists, who know how to deal with biological systems,” said Gianani. “Although it was difficult to exchange ideas at first, the team eventually grew together and developed a shared language that helped the work progress smoothly. This collaboration wouldn’t have been possible without the supervision of Prof. M. Barbieri, principal investigator of the Quantum Optics Group.”

Tracking enzyme activity

The researchers used their new technique to track changes in the chirality of a sucrose solution caused by the activity of an enzyme known as invertase. Here, chirality shows up as the ability of a given molecule to rotate the polarization of light. Tracking it provides information that can be used to determine how many sucrose molecules the enzymes have processed. The experiments showed that quantum light can be used to probe enzymatic activities in real time without perturbing the sample.
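The link between rotation and reaction progress can be made explicit with Biot's law, α = [α]·l·c, summed over the sugars in solution. Invertase splits each sucrose molecule into one glucose and one fructose molecule. Because fructose is strongly levorotatory, the solution's rotation drifts from positive to negative as the reaction proceeds. The sketch below assumes approximate textbook specific rotations at the sodium D line and a known starting sucrose concentration, and the concentration used is hypothetical. It covers only the conversion from a measured angle to the fraction of sucrose processed, not the single-photon estimation scheme the researchers developed.

```python
# Approximate textbook specific rotations at the sodium D line (589 nm),
# in degrees per decimetre per g/mL. Real values shift with temperature,
# wavelength and concentration, so treat these as assumptions.
SPECIFIC_ROTATION = {"sucrose": 66.5, "glucose": 52.7, "fructose": -92.4}
MOLAR_MASS = {"sucrose": 342.30, "glucose": 180.16, "fructose": 180.16}  # g/mol


def rotation(fraction_hydrolysed: float, sucrose0_g_per_ml: float, path_dm: float = 1.0) -> float:
    """Optical rotation in degrees after a given fraction of sucrose is split."""
    moles_split = sucrose0_g_per_ml * fraction_hydrolysed / MOLAR_MASS["sucrose"]  # mol/mL
    concentrations = {
        "sucrose": sucrose0_g_per_ml * (1.0 - fraction_hydrolysed),
        # Each split sucrose yields one glucose and one fructose molecule.
        "glucose": moles_split * MOLAR_MASS["glucose"],
        "fructose": moles_split * MOLAR_MASS["fructose"],
    }
    return path_dm * sum(SPECIFIC_ROTATION[s] * c for s, c in concentrations.items())


def fraction_from_rotation(alpha_deg: float, sucrose0_g_per_ml: float, path_dm: float = 1.0) -> float:
    """Invert the linear model: 0 at the start, 1 when all sucrose is split."""
    start = rotation(0.0, sucrose0_g_per_ml, path_dm)
    end = rotation(1.0, sucrose0_g_per_ml, path_dm)
    return (start - alpha_deg) / (start - end)


c0 = 0.10  # hypothetical starting sucrose concentration, g/mL
print(f"start {rotation(0.0, c0):+.2f} deg, end {rotation(1.0, c0):+.2f} deg")
print(f"fraction split at -0.50 deg: {fraction_from_rotation(-0.50, c0):.3f}")
```

Multiplying the recovered fraction by c0 / 342.30 g/mol and Avogadro's number gives the number of sucrose molecules processed per millilitre.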

“This work is just one example of what quantum sensors could do,” said Gianani. “Quantum sensors could be used to optimally use light for countless applications, including biological imaging, magnetic field sensing and even detection of gravitational waves.”

The researchers say that there are some technological aspects to address before their approach could become a go-to method for tracking enzymatic reactions. For example,  losses are a strong limiting factor, but they hope their work will help spur technology development that could address this problem.

More information: Valeria Cimini et al, Adaptive tracking of enzymatic reactions with quantum light, Optics Express (2019). DOI: 10.1364/OE.27.035245

Journal information: Optics Express
Provided by The Optical Society


The media landscape in the home has changed precipitously over the years. Back in the days when torrents were king, DVD players and TVs started to sprout USB ports and various methods of playing digital videos, while hackers repurposed office machines and consoles into dedicated media boxes. [Roiy Zysman] is a fan of a clean, no-fuss approach, so he built his PiVidBox along those lines.

The build, unsurprisingly, starts with a Raspberry Pi. Cheap, capable of playing most common codecs, and fitted with an HDMI port as standard, it’s a perfect platform for the job. Rather than fiddling with complex interfaces or media apps, the PiVidBox uses a simple script. The Pi is configured to continually scan the /media folder for mounted devices and play any videos it comes across. Simply pop in an SD card or USB drive, and the content starts rolling. No buttons, remotes, or keyboards needed!
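A scan-and-play loop of this kind fits in a few lines of Python. This is a minimal sketch, not [Roiy Zysman]'s actual script. The list of video extensions, the polling interval and the choice of VLC's command-line player (`cvlc`) are all assumptions, and the loop simply replays whatever it finds until the drive is removed.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Sequence

# Assumed extensions; adjust to whatever codecs the player handles.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".m4v"}
# Assumed player: VLC without its GUI, exiting when playback ends.
PLAYER = ("cvlc", "--fullscreen", "--play-and-exit")


def find_videos(root: str = "/media") -> List[Path]:
    """Return every video file under root, in a stable (sorted) order."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    try:
        return sorted(
            path for path in root_path.rglob("*")
            if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
        )
    except OSError:
        # A drive pulled out mid-scan can make the directory walk fail.
        return []


def watch_and_play(root: str = "/media", player: Sequence[str] = PLAYER,
                   poll_seconds: float = 5.0) -> None:
    """Poll root forever, playing each video found; sleep when nothing is mounted."""
    while True:
        videos = find_videos(root)
        for video in videos:
            # Blocks until the player exits, then moves on to the next file.
            subprocess.run([*player, str(video)], check=False)
        if not videos:
            time.sleep(poll_seconds)
```

Calling `watch_and_play()` from a startup service, such as a systemd unit, would give the no-buttons behaviour described above.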

It’s an interface without much flexibility, but it makes up for that in barebones simplicity. We can imagine it would come in handy for a conference room or other situation where users grow tired of messing around with configurations to get screens to work. The Raspberry Pi makes a rather excellent basis for a media player build, and we’ve seen some stunning examples in the past!