Ongoing Projects

This NAI team will explore the catalysis of electron transfer reactions, from prebiotic peptides to microbial ancestral enzymes to modern nanomachines, integrated over four and a half billion years of Earth’s changing geosphere. Theme 1 focuses on the synthesis and function of the earliest peptides capable of moving electrons on Earth and other planetary bodies. Theme 2 focuses on the evolutionary history of “motifs” in extant protein structures. Theme 3 focuses on how proteins and the geosphere co-evolved through geologic time.

Earth's living and non-living components have co-evolved for 4 billion years through numerous positive and negative feedbacks. Earth and life scientists have amassed vast amounts of data in diverse fields related to planetary evolution through deep time: mineralogy and petrology, paleobiology and paleontology, paleotectonics and paleomagnetism, geochemistry and geochronology, genomics and proteomics, and more. Yet our ability to document, model, and explore these complex, intertwined changes has been hampered by a lack of data integration across these complementary disciplines. We propose a new program of data-driven discovery in the Earth and life sciences. We will develop, curate, and integrate diverse data resources to focus on our planet's changing near-surface oxidation state and the rise of oxygen through deep time, a critical problem that exemplifies this co-evolution and underscores the opportunities and challenges of deciphering transient characteristics of Earth's history. Applying abductive reasoning to our newly developed "Deep-Time Data Infrastructure" to discover patterns in the evolution of our planet's environment, we will create and merge the integrated data sets, statistical methods, and visualization tools that inspire and test hypotheses applicable to modeling Earth's past and today's changing environment.

Deep Carbon Observatory (2017 - Present)

Recent advances in data generation techniques, whether by experiment, measurement, or computer simulation, quickly produce complex data characterized by source heterogeneity, multiple modalities, often high volume, high dimensionality, and multiple scales (temporal, spatial, and functional). In turn, science and engineering disciplines are rapidly becoming more data-driven, motivated by a variety of goals (the Deep Carbon Observatory is an exemplar): higher sample throughput, higher resolution, additional physics, chemistry, and biology, new instrumentation, and new integrated databases, all with the ultimate aim of better understanding and modeling the complex systems and dynamics that underlie the processes being studied. However, analyzing libraries of complex data requires managing the inherent complexity to allow integration of information and knowledge across multiple scales and across traditional disciplinary boundaries. Significant advances in methods, tools, and applications for data science and informatics over the last five years can now be applied to multi- and inter-disciplinary problem areas. Virtual Observatories, Virtual Organizations, complex networks, linked data across systems, full life cycle data management, data integration, and citation and attribution are increasingly becoming an integral part of projects, whether small (few people, one organization, modest data needs) or very large (many investigators, many organizations, diverse data needs).

Given this increasing data deluge, it is clear that each of the Directorates in the Deep Carbon Observatory (DCO) faces diverse data science and data management needs in fulfilling both its decadal strategic objectives and its day-to-day tasks. This project will assess in detail the data science and data management needs of each DCO Directorate and of the DCO as a whole, using a combination of informatics methods: use case development, requirements analysis, inventories, and interviews.