Projects

A simple, no-code web app that lets academics and researchers without a computer science background apply Bayesian optimization to their own real-world experimental datasets. The code can be found here.

I am honoured to give an hour-long demo at the AAAI2023 conference on using the new mlr3 tools for human-in-the-loop Bayesian approaches, including single- and multi-point proposals and interpretability. Below are some examples of what the surrogate model has learned with respect to real-world parameters in fabricating high-quality laser-induced graphene.

Quality control in the manufacturing process of PCBs is usually challenging because a variety of defects occur inevitably due to mishandling or technical faults. Fig. 1 shows common defects in bare PCBs, such as open circuit, mouse bite, spur and missing hole. All these defects could cause the instability of the board or even damage the entire board. Therefore, an efficient, highly accurate automatic detection module needs to be implemented to inspect diverse defects during the PCB manufacturing process.

I conducted a study to evaluate the performance of Bayesian Optimization (BO) algorithms for general optimization across a wide range of experimental materials science domains. I used six different materials systems, including laser-induced graphene, carbon nanotube polymer blends, silver nanoparticles, lead-halide perovskites, as well as additively manufactured polymer structures and shapes. I defined acceleration and enhancement metrics for general materials optimization objectives and found that Gaussian Processes (GP) with anisotropic kernels and Random Forests (RF) performed comparably as BO surrogate models, both outperforming GP with isotropic kernels. GP with anisotropic kernels was more robust across most design spaces, while RF was a close alternative with the benefits of being free of distribution assumptions, having lower time complexity, and requiring less effort in initial hyperparameter selection. The study raises awareness of the benefits of using GP with anisotropic kernels over GP with isotropic kernels in future materials optimization campaigns.
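To illustrate the difference between the two GP kernel families compared above, here is a minimal sketch in plain Python. It assumes a squared-exponential (RBF) kernel, which the study may or may not have used; the function names and the example points are hypothetical. An isotropic kernel shares one length scale across all input dimensions, while an anisotropic kernel learns one length scale per dimension.

```python
import math

def rbf_kernel(x, y, length_scales):
    """Squared-exponential kernel with one length scale per input dimension."""
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales))
    return math.exp(-0.5 * sq_dist)

def isotropic_rbf(x, y, length_scale):
    """Isotropic special case: the same length scale in every dimension."""
    return rbf_kernel(x, y, [length_scale] * len(x))

# Two hypothetical experiments that differ only in the second parameter.
x1 = (1.0, 0.0)
x2 = (1.0, 2.0)

iso = isotropic_rbf(x1, x2, length_scale=1.0)
# A long length scale in dimension 2 encodes that this parameter matters little,
# so the two experiments stay strongly correlated.
aniso = rbf_kernel(x1, x2, length_scales=[1.0, 10.0])
print(f"isotropic: {iso:.3f}, anisotropic: {aniso:.3f}")
```

With per-dimension length scales the surrogate can treat an insensitive process parameter as nearly irrelevant, which is one plausible reason an anisotropic kernel copes better with design spaces whose parameters differ in importance.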

I developed a universal synthetic dataset to aid in the development of machine learning methods for automated classification of spectroscopic data. The dataset includes artificial spectra representing various experimental techniques, with customizable parameters such as scan length and peak count. I simulated a dataset of 35,000 spectra from 500 unique classes and evaluated eight different machine learning architectures for automated classification. My findings shed light on the critical factors required for optimal classification performance, and I made the scripts, dataset, and evaluation routines publicly available to help improve machine learning models for spectroscopic analysis.
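As a rough idea of how such artificial spectra can be generated, the sketch below builds each class from a random set of peaks and renders noisy samples of it. This is not the published generator: the Gaussian peak shape, the position jitter, the noise level, and all function and parameter names are assumptions made for illustration; only scan length and peak count come from the description above.

```python
import math
import random

def make_class(peak_count, scan_length, rng):
    """Draw peak positions, heights and widths that define one class."""
    return [
        (rng.uniform(0, scan_length), rng.uniform(0.2, 1.0), rng.uniform(2.0, 10.0))
        for _ in range(peak_count)
    ]

def simulate_spectrum(peaks, scan_length, rng, jitter=1.0, noise=0.01):
    """Render one noisy spectrum of a class as a sum of Gaussian peaks."""
    # Shift each peak slightly so samples of the same class are not identical.
    shifted = [(pos + rng.gauss(0, jitter), height, width) for pos, height, width in peaks]
    spectrum = []
    for x in range(scan_length):
        y = sum(h * math.exp(-0.5 * ((x - p) / w) ** 2) for p, h, w in shifted)
        spectrum.append(y + rng.gauss(0, noise))
    return spectrum

rng = random.Random(0)  # fixed seed so the toy dataset is reproducible
scan_length = 500
classes = [make_class(peak_count=5, scan_length=scan_length, rng=rng) for _ in range(3)]
# Several samples per class: same underlying peaks, different jitter and noise.
dataset = [(label, simulate_spectrum(peaks, scan_length, rng))
           for label, peaks in enumerate(classes) for _ in range(4)]
print(len(dataset), len(dataset[0][1]))
```

Each entry pairs a class label with one spectrum, which is the shape a classifier would train on.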

Following the success of raising funds for Genius Yield through an Initial Stake Pool Offering (ISPO, similar to the traditional IPO), I created a streaming data ingestion pipeline from the Cardano blockchain to the Genius X database. This is the world's first multi-token ISPO, and it helps other web3 startups raise funds. The data are visualized below:

Onchain Data with Looker

Real-time blockchain data shared internally at Genius Yield with Looker

Genius Yield (GY) launched its first Initial Stake Pool Offering (ISPO, the web3 equivalent of a traditional IPO) in Dec 2021. During the ISPO, users stake their cryptocurrency to the GY stake pool to earn GENS, the GY-native token. I helped create the API schema that lets users look up their reward tokens during this period.

Data curation, writing, proofreading, and conceptualization of the Genius Yield white paper

An NFT allows a buyer to own an original item. It also contains built-in authentication, which serves as proof of ownership. Every consumer product that cannot be eaten will become an NFT. In the Cardano pre-Alonzo era, NFTs can be minted in a semi-elegant way. While waiting for Plutus smart contracts, I wanted to build a Telegram bot to help my family get rid of old stamp collections by turning them into NFTs from their phones. See the demo below.

The core of the app is a Python wrapper around the cardano-cli that uses the os and subprocess modules to get Python to "talk" to it, e.g. to check wallet UTxOs (the output contains the NFT ticker UWYO shown in the video):

>>> check_wallet_utxo(wallet)
['4a03c0d27287d70672046d0960ab8ea2c3f7cf7a6f06e46dba43c20d888d1435', '0', '4815699', 'lovelace', '+', '1', 'c470b5d803851809901fa2cc0f0f25cab2c6f2b359d88f6d404740d9.UWYO']

or to build the transaction:

import logging
import subprocess

# Build the raw minting transaction. available_lovelace, tx_fee, tx_hash, tx_ix,
# token_data, metadata_file, invalid_after_slot and matx_raw are set earlier in the app.
ada_return = available_lovelace - tx_fee
logging.info(f"Return this much plus token back to the "
    f"original funder: {ada_return} lovelace")
cmd = f'{config.CARDANO_CLI} transaction build-raw ' \
    f'--fee {tx_fee} ' \
    f'--tx-in {tx_hash}#{tx_ix} ' \
    f'--tx-out {token_data.creator_pay_addr}+{ada_return}+"{token_data.token_amount} {token_data.policy_id.strip()}.{token_data.token_ticker}" ' \
    f'--mint="{token_data.token_amount} {token_data.policy_id.strip()}.{token_data.token_ticker}" ' \
    f'--metadata-json-file {metadata_file} ' \
    f'--invalid-hereafter={invalid_after_slot} ' \
    f'--out-file {matx_raw}'

proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
# build-raw writes the transaction to matx_raw and prints nothing to stdout
out = proc.communicate()

This minting approach will be deprecated once Plutus smart contracts arrive in September 2021.
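The body of check_wallet_utxo is not shown here, but the token list it returns looks like a whitespace split of one row of the cardano-cli UTxO table. A minimal sketch of that parsing step, assuming the table has a header line starting with TxHash followed by a dashed separator (the function name parse_utxo_table and the sample text are hypothetical, reconstructed from the list above):

```python
def parse_utxo_table(text):
    """Split UTxO table text into one list of whitespace-separated tokens per row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the column header and the dashed separator.
        if not fields or fields[0] == "TxHash" or set(line.strip()) == {"-"}:
            continue
        rows.append(fields)
    return rows

# Assumed layout of the CLI output, rebuilt from the token list shown earlier.
sample = """\
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
4a03c0d27287d70672046d0960ab8ea2c3f7cf7a6f06e46dba43c20d888d1435     0        4815699 lovelace + 1 c470b5d803851809901fa2cc0f0f25cab2c6f2b359d88f6d404740d9.UWYO
"""

for tx_hash, tx_ix, lovelace, *assets in parse_utxo_table(sample):
    # assets holds the remaining tokens; the last one is "<policy_id>.<ticker>".
    print(tx_hash[:8], tx_ix, int(lovelace), assets[-1])
```

Keeping the parsing separate from the subprocess call makes it testable without a running node or a funded wallet.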

+ Minting through a mobile app and a Daedalus Testnet wallet is possible

Tags: cardano-cli, ipfs, blockfrost, SQLite, python, nix, homebrew

Advanced materials and manufacturing are important pillars and drivers of today's economy. NASA funded us to merge synergistic expertise in materials development and computer science for the development of powerful methods to design and model the behavior of advanced materials and manufacture of advanced devices. One case study is to integrate active learning experiments using a laser patterning system as part of the. Here, we created an autonomous system to pattern and characterize laser-induced graphene for supercapacitor and flexible sensor applications. The results are published in both materials science journals and IJCAI conference workshops.

+ Human experimentation cost of about a month is cut down to a week.
+ Patterned LIG quality improved four-fold against state-of-the-art (2020)

Tags: python, R, mlrMBO, pymeasure, DAQ

To address the lack of data in academia, once we have run our simulations, I make sure to publish the dataset online following FAIR data sharing guidelines. This includes results and metadata to allow reproducibility.

Rising energy demands require us to focus on more efficient energy sources, such as electrochemical energy conversion and storage. Recently, single-atom catalysts that contain isolated active metal sites have drawn much research interest due to their maximum atomic efficiency and exceptional properties. The aim of this project is to optimize the crystal structure to improve the adsorption and desorption of hydrogen. I use pymatgen tools to write functions that accelerate the structural design:

from pymatgen.transformations.advanced_transformations import CubicSupercellTransformation

def create_small_(element, adsorbate=True):
    # struct (the parent structure) and the *_idx / find_* helpers are defined elsewhere
    sup = CubicSupercellTransformation().apply_transformation(struct)

    # Trim sites whose Cartesian x or y coordinate lies outside the 3-14 window
    idx = [i for i, coords in enumerate(sup.cart_coords)
           if coords[0] < 3 or coords[0] > 14 or coords[1] < 3 or coords[1] > 14]
    sup.remove_sites(idx)

    sup.remove_sites(find_corner_idx(sup))
    sup.remove_sites(mid_idx(sup))

    # Substitute N and H at the sites selected by the helpers
    for i in around_idx(sup):
        sup[i] = 'N'
    for i in h_idx(sup):
        sup[i] = 'H'

    # Optionally add an H adsorbate at z = 9.0, then the chosen element at z = 7.5
    if adsorbate:
        sup.append('H', [find_x_center(sup), find_y_center(sup), 9.0], coords_are_cartesian=True)
    sup.append(element, [find_x_center(sup), find_y_center(sup), 7.5], coords_are_cartesian=True)

    return sup

Instead of submitting jobs to the HPC cluster manually, I pass the structures through the AiiDA workflow manager, which runs on PostgreSQL, to handle errors automatically and restart calculations with complete provenance, following the FAIR guiding principles.

from aiida import orm
from aiida import plugins
from aiida.plugins import DataFactory
from aiida.engine import submit
from aiida.orm.nodes.data.upf import get_pseudos_from_structure

PwBaseWorkChain = plugins.WorkflowFactory('quantumespresso.pw.base')

code = orm.load_code('qe-6.6-pw@arcc-msi')

structures = small

StructureData = DataFactory("structure")
KpointsData = DataFactory('array.kpoints')
kpoints = KpointsData()
kpoints.set_kpoints_mesh([1,1,1])

inputs = {
    'pw': {
        'code': code,
        'parameters': orm.Dict(dict={
            'CONTROL': {
                'calculation':'scf',
            },
            'SYSTEM':{
                'ecutwfc':150.,
                'occupations':'smearing',
                'degauss':0.02
            },
            'ELECTRONS':{
                'conv_thr':1.e-6,
            }
        }),
        'metadata':{
            'label':'LF-smallH',
            'options':{
                'account':'rd-hea',
                'resources':{
                    'num_machines':1,
                    'num_cores_per_mpiproc':32
                },
                'max_wallclock_seconds':1*24*60*60,
                'max_memory_kb':int(128e6)
            }
        }
    },
    'kpoints': kpoints,
}

for structure in structures:
    # Build the AiiDA structure node once and reuse it for the pseudopotential lookup
    structure_node = StructureData(pymatgen=structure)
    inputs['pw']['structure'] = structure_node
    inputs['pw']['pseudos'] = get_pseudos_from_structure(structure_node, 'SSSP')
    submit(PwBaseWorkChain, **inputs)

The crystal structure and binding energy of hydrogen were used to train and benchmark several machine learning models. The results were presented at the MRS conference (2021).
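The exact definition of the binding energy is not stated here. A common convention, assumed in this sketch, references the adsorbed hydrogen atom to half the total energy of an isolated H2 molecule, using the total energies of the structure with and without the adsorbate. The function name and the numbers below are placeholders for illustration, not results from this project.

```python
def hydrogen_binding_energy(e_with_h, e_without_h, e_h2):
    """Binding energy of one H atom, referenced to half an H2 molecule.

    All three inputs are total energies in the same unit (e.g. eV);
    the result is in that unit. Negative values mean favourable binding.
    """
    return e_with_h - e_without_h - 0.5 * e_h2

# Placeholder total energies in eV, chosen only to exercise the function.
e_bind = hydrogen_binding_energy(e_with_h=-103.5, e_without_h=-100.0, e_h2=-6.8)
print(round(e_bind, 3))
```

Under this convention each training label needs one calculation for the structure with the adsorbate and one without, plus a single H2 reference calculation.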

+ Machine learning approaches using mlr3 accelerate predictions by four orders of magnitude compared to traditional methods (DFT)

Tags: python, R, mlr3, postgresql, aiida, pymatgen
