Accelerating Materials Discovery with Data Science

From the ancient Stone and Bronze Ages to the present-day silicon age, materials have defined eras of human civilization. Breakthroughs in materials can completely disrupt current technology and accelerate economic growth. For example, the development of silicon, an abundant resource, has accelerated communications, energy storage, and computation speeds. In today’s rapidly changing world, more advanced materials are needed to keep up with the demand for more efficient devices and to help solve problems like climate change. (Melia, 2021) It’s no wonder that scientists are racing to synthesize the next superstar material.

One big inhibitor of progress in creating new materials is that the process has always been extremely slow. Lab work is tedious and based on trial and error, and the number of possible combinations of chemical elements is immense. With such an abundance of material options, there is a great opportunity to apply data science. This is where the growing field of materials informatics comes in. Over the last decade, there has been a targeted effort to incorporate data science and machine learning practices into the materials research process. This has benefited the materials research community by accelerating experiments and promoting data sharing to create a more collaborative field.

Data science was adopted early in fields that rely on high volumes of complex data, such as astronomy, biology, and medicine, where it has led to discoveries of new galaxies and important drugs. Adoption in materials discovery has been slower, but there is hope of accelerating the process. It is not only researchers who hope to propel materials discovery, but entire countries. In 2011, the US government launched the Materials Genome Initiative (MGI). The initiative states that its purpose is “discovering, manufacturing, and deploying advanced materials twice as fast and at a fraction of the cost compared to traditional methods.” The program hopes to develop the infrastructure needed for this progress using data collection, simulations, and computational tools. Throughout, it emphasizes collaboration and the education of researchers as ways to accelerate advancements. (2021 MGI Strategic Plan)

The Complexity of Materials Informatics

Materials informatics is not a linear process. As in many applied data science fields, there is an iterative cycle of using existing knowledge to make informed decisions, confirming them through experiments, and then repeating with the added data. One of the main challenges comes from the fundamental puzzle of the structure-property relationship. (Krishna, 2021) This refers to the problem of predicting the properties and behavior of a material from its molecular structure or microstructure. Efforts to build predictive models must take into account descriptors such as chemical bonding, electronic properties, and a diverse array of other properties.
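The iterative cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_experiment` is a hypothetical stand-in for a lab measurement (a made-up property curve), the "model" is a simple nearest-neighbor lookup, and the greedy selection rule is a simplification. Real workflows would use trained models and actual experimental results.

```python
def run_experiment(x):
    """Hypothetical stand-in for a lab measurement of composition x.

    The curve is invented for illustration and peaks at x = 0.6.
    """
    return 1.0 - (x - 0.6) ** 2


def predict(known, x):
    """Predict the property of x from the nearest already-measured composition."""
    nearest = min(known, key=lambda k: abs(k - x))
    return known[nearest]


def discovery_loop(candidates, initial, rounds):
    """Repeat: decide with existing data, confirm by experiment, add the result."""
    known = {x: run_experiment(x) for x in initial}
    for _ in range(rounds):
        untested = [x for x in candidates if x not in known]
        if not untested:
            break
        # Decide: choose the untested candidate the current model rates highest.
        choice = max(untested, key=lambda x: predict(known, x))
        # Confirm through experiment, then fold the new data point back in.
        known[choice] = run_experiment(choice)
    return known


# Hypothetical composition fractions from 0.0 to 1.0 in steps of 0.1.
candidates = [i / 10 for i in range(11)]
known = discovery_loop(candidates, initial=[0.0, 1.0], rounds=3)
best = max(known, key=known.get)
print("measured:", sorted(known))
print("best so far:", best)
```

Each pass through the loop adds one measurement, so the model that guides the next decision is always built on slightly more data than the last.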

When it comes to data analysis, the choice of algorithm matters, and an algorithm is most effective when combined with the domain knowledge of materials scientists. (Mewburn, 2021) A simple comparison of two material properties may need nothing more than linear regression. For projects that include many variables, more complex machine learning algorithms such as random forests can be used. Supervised machine learning is valuable when the model must be trained to accomplish a certain task, such as predicting a specific property from labeled examples. Some researchers are even using unsupervised learning to look at large datasets with millions of descriptors. This type of machine learning is useful when there is no predefined target and the goal is to find trends, for example, whether complex multi-component materials share similarities in their chemical components or properties. One implementation of unsupervised learning is self-organized maps (SOM), which place vast numbers of materials onto a 2-dimensional map even when hundreds of properties are being compared. Even though the dataset begins with very high dimensionality, scientists can use this map in many ways to reinforce existing understanding or to find unexpected trends and outliers. (Qian, 2019)
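To make the idea of a self-organized map more concrete, here is a minimal sketch in plain Python. It is not the implementation used in the cited work. The material names, the four descriptor values, the 3x3 grid size, and the learning-rate and neighborhood schedules are all assumptions chosen for illustration. The sketch shows the core mechanism: each sample pulls its closest map cell, and that cell's grid neighbors, toward itself, so similar materials end up in nearby cells.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to sample x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialized weight vector per cell of the 2-D map.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        # Learning rate and neighborhood radius both shrink as training proceeds.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.1
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Cells near the winning cell on the grid are pulled harder.
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical materials: four descriptors each, already scaled to [0, 1].
materials = {
    "oxide_1": [0.10, 0.15, 0.90, 0.85],
    "oxide_2": [0.15, 0.10, 0.85, 0.90],
    "oxide_3": [0.12, 0.18, 0.88, 0.80],
    "alloy_1": [0.90, 0.85, 0.10, 0.15],
    "alloy_2": [0.85, 0.90, 0.15, 0.10],
    "alloy_3": [0.88, 0.80, 0.12, 0.18],
}

som = train_som(list(materials.values()))
placement = {name: best_matching_unit(som, vec) for name, vec in materials.items()}
for name, cell in placement.items():
    print(name, "->", cell)
```

The printed grid coordinates are the 2-dimensional "map positions" of each material. With hundreds of descriptors instead of four, the same procedure applies, only with longer input vectors.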

Examples of Success

Some groups have already had success using data science for materials discovery. One company, Citrine Informatics, identified two new thermoelectric materials, a class of materials that can convert between thermal and electrical energy. Citrine used a model that compiled many relevant thermal and electrical properties, such as thermal conductivity and band gap. (Mewburn, 2021) Citrine has since begun multiple projects that use artificial intelligence (AI) to optimize materials development processes such as nanoparticle synthesis and metal additive manufacturing.

In one case study, Panasonic used Citrine’s platform to identify a molecule for organic semiconductors used in flexible electronics. Organic semiconductors are costly and time-consuming to synthesize and test, and Panasonic wanted to expedite this process. The company wanted a method for searching a particular molecular class called heteroacenes, which are commonly used in spin coating, an essential technique for fabricating semiconductors. However, there are far too many types of heteroacenes to comb through and test individually. Citrine’s model analyzed a large number of molecular combinations and drew on the knowledge of Panasonic’s expert scientists. In a short amount of time, the model identified at least 4 new compounds and gave the scientists a greater understanding of how molecular structure relates to electronic performance.

Applications for Sustainability 

There is an urgent need to tackle challenges relating to climate change and its effects on the environment and our communities. Materials informatics can speed up the engineering of devices such as more efficient batteries while taking into account the complexities of resource costs. Designing low-impact power sources requires weighing the tradeoff between energy performance and the carbon footprint of extracting raw materials. This has been an issue with lithium-ion batteries, as lithium mining takes a severe toll on the environment. Peerless et al. created an AI model that predicts the energy performance of many different material combinations and takes into account how abundant each material is as a raw resource. This provides a way to find materials that meet performance requirements while also accounting for their scarcity. (Melia, 2021)
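One common way to reason about a tradeoff like performance versus abundance is to keep only the candidates that no other candidate beats on both criteria at once (a Pareto front). The sketch below illustrates that idea. It is not the method used by Peerless et al., and the candidate names and scores are invented purely for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    performance: float  # hypothetical predicted performance, higher is better
    abundance: float    # hypothetical raw-material abundance, higher is better


def dominates(a, b):
    """True if a is at least as good as b on both criteria and better on one."""
    return (
        a.performance >= b.performance
        and a.abundance >= b.abundance
        and (a.performance > b.performance or a.abundance > b.abundance)
    )


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Invented example data, not real materials.
candidates = [
    Candidate("A", 0.90, 0.20),  # best performance, scarce
    Candidate("B", 0.75, 0.60),  # balanced
    Candidate("C", 0.70, 0.50),  # worse than B on both criteria
    Candidate("D", 0.40, 0.95),  # most abundant, modest performance
    Candidate("E", 0.35, 0.90),  # worse than D on both criteria
]

front = pareto_front(candidates)
print([c.name for c in front])  # ['A', 'B', 'D']
```

The remaining candidates represent genuinely different compromises, which is where the judgment of materials scientists comes back in.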

In addition to finding alternative materials, data science methods can also help screen for hazardous byproducts released during the manufacturing process. As for the carbon footprint of materials manufacturing, Hertwich states that materials production alone accounted for 23% of global CO2 emissions. (Hertwich, 2021) There is still a long way to go before emissions are at a safe level, and manufacturing is a key area to tackle. With products and problems growing more complex by the day, industries are relying more on the power of data science to get ahead of the curve.

Materials Informatics Resources

There are many efforts to improve online tools and make them more accessible to users. The Materials Project provides web-based tools for searching materials, designing new ones, and predicting their properties. Quantum ESPRESSO is an open-source suite of codes for modeling materials at the nanoscale. Databases such as Granta and the Crystallography Open Database list the properties of hundreds of thousands of materials.

Final Thoughts

Materials informatics is a promising application that can pave the way for more efficient materials discovery and collaborative research. The new field has allowed researchers and businesses to streamline the process of identifying promising candidates for a wide variety of applications from thermoelectrics to more sustainable devices. As materials informatics gains traction, it will be exciting to see what the next breakthrough materials will be. 


Sources:

 

Melia, H. R., Muckley, E. S., & Saal, J. E. (2021). Materials informatics and sustainability—The case for urgency. Data-Centric Engineering, 2

Materials Genome Initiative. (2021). Strategic Plan. https://www.mgi.gov/

Research Outreach. (2021). Pioneering materials informatics: Clean technologies from molecular design. https://researchoutreach.org/articles/pioneering-materials-informatics-clean-technologies-molecular-design/ 

Su, A., & Rajan, K. (2021). A database framework for rapid screening of structure-function relationships in PFAS chemistry. Scientific Data, 8(1), 1-10. 

Zhang, N. (2021). Materials informatics: a data-based approach for materials discovery. Mewburn Ellis. https://www.mewburn.com/news-insights/materials-informatics-a-data-based-approach-to-materials-discovery

Qian, J., Nguyen, N. P., Oya, Y., Kikugawa, G., Okabe, T., Huang, Y., & Ohuchi, F. S. (2019). Introducing self-organized maps (SOM) as a visualization tool for materials research and education. Results in Materials, 4, 100020. 

Citrine Informatics. (2022). Case Studies. https://citrine.io/success/case-studies/ 

Hertwich, E. G. (2021). Increased carbon footprint of materials production driven by rise in investments. Nature Geoscience, 14(3), 151-155.
