I work in genomics on resequencing systems (reads -> alignments, alignments -> genotypes). Right now I'm focusing on doing this with pan-genomic references, so I have a lot of experience with compute-heavy inference problems. I'm also invested in the problem because I see how much of a blocking issue this is for people doing genomics and biology. We have these great ways of observing evolution, which leaves artifacts in DNA, but almost no way to understand the molecular dynamics and interactions which manifest and mediate this process. Do you work on folding?
Hmm, I can think of a few projects building protein structures through inference / homology models / multiple sequence alignments, but most of this I haven't given any serious thought to in years. This might be of interest to you (there's a bunch of papers from the same group along the same lines), but it's more from the perspective of looking at structural evolution through physical models (modeling the energetic changes in "evolutionary space") rather than any sort of machine-learning approach. As for whether I work on folding: X-ray crystallography in undergrad.
I'm curious if there are ways of building folding programs that accelerate MD approaches using LSTM/RNN-type neural network models. A literature review hasn't turned up anything related, although maybe in this case people have tried and failed. The closest thing I found was this paper on employing constraints in MD simulations. The idea would be to train models that behave like human players of Foldit. A friend of mine regards this as extremely uninteresting because it seems to him to be a case of incremental improvement. He hasn't been frustrated repeatedly by our lack of capability in this space. It would be a dream to do constructive biology on computers.
I'm pretty skeptical of the methods underlying those simulations, given that most fit their parameters to either fully folded structures or molecules much smaller than any protein. Protein folding itself is difficult to measure experimentally, and you usually don't end up with a lot of data to base your fitted model on. But then again, having never run an MD simulation before, I'm probably not the right guy to ask on that front... Personally, having done completely computational analysis of other people's data sets during undergrad, I'm much happier putting myself in a place where I'm on the data-generating side of things. In the case of protein folding / binding, there's still plenty of space for experimentalists to apply high-throughput techniques to screen new / larger libraries of proteins for various properties. Going for the gold of de novo prediction of structures is just so hard to get right when you're basing it on, say, the structures in the PDB, where quality control is often low and the conditions vary so much between structures.