On the $100 million U.S. project to determine the DNA changes that drive nine forms of cancer: It is “not likely to produce the truly breakthrough drugs that we now so desperately need,” Watson argued. On the idea that antioxidants such as those in colorful berries fight cancer: “The time has come to seriously ask whether antioxidant use much more likely causes than prevents cancer.”
Below is a list of articles that discuss how biologists view equations in papers:
- T. Fawcett & A. Higginson. Heavy use of equations impedes communication among biologists, PNAS (2012).
- A. Fernandez. No evidence that equations cause impeded communication among biologists, PNAS (2012).
- N. Chitnis & T. Smith. Mathematical illiteracy impedes progress in biology, PNAS (2012).
- J. Gibbons. Do not throw equations out with the theory bathwater, PNAS (2012).
- A. Kane. A suggestion on improving mathematically heavy papers, PNAS (2012).
Visionary entrepreneurs generally “have mental health profiles that are associated with higher levels of creativity, higher levels of energy, higher levels of risk tolerance and higher levels of impulsivity. Another way to look at impulsivity is a need for speed, a sense of urgency, higher motivation, and greater restlessness.” Elon Musk
Please find here a list of important papers compiled by the graduate students in this lab. These papers range from introductory to technical, general to project-specific, so you should be able to get a good idea of the types of research we are conducting.
Thinking of learning R or Python? Please see the following compilation of resources.
A little food for thought: Dr. Richard Feynman on the difference between Mathematics and Physics; followed by Dr. Murray Gell-Mann on truth, beauty, and physics; Dr. Steven Weinberg on whether mathematics is invented or discovered; and finishing with Jonathan Pillow on statistical modelling of neural data.
The following is a list of software used in our lab to analyze networks and data. Students and postdocs should be (or become) familiar with these tools.
Low Expertise Required:
MATLAB — generally used for data cleaning, data analysis, calculations, plot generation, etc.; you can get as simple or complicated as you want with it
Complex Networks toolbox — MATLAB toolbox by Lev Muchnik for analysis of complex networks; includes a k-shell decomposition algorithm
Machine learning toolbox — MATLAB toolbox for (basic) machine learning
Community detection/modularity algorithm — MATLAB code used to find community structure and modularity of a network
Network attributes algorithm — MATLAB code used to find network components, sizes, and lists of member nodes
Python — another general-use platform; again, uses can range in complexity
NetworkX — Python library used to find basic attributes of a network, such as the degree distribution
graph-tool — Python library for fast component decomposition, finding modularity, large network visualization
pandas — Python library used for data management
NumPy — Python library used for vector and matrix operations
SciPy — Python library for statistics, hypothesis testing, regression, and numerical computation
Beautiful Soup — Python library used for website scraping
Scikit-learn — Python library used for basic machine learning methods, including GLasso and stochastic gradient descent
ImageJ — Java image processing program used for optical CT imaging analysis
Gephi — visualization and analysis software for networks ***CAN BE BUGGY — SAVE WORK OFTEN***
Pajek — general network visualization software
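As a rough illustration of the basic network attributes mentioned in this list (degree distribution, k-shell decomposition, components), here is a minimal sketch using NetworkX. The choice of Zachary's karate club graph, which ships with NetworkX, is only an assumption for demonstration; it is not one of the lab's datasets.

```python
import collections

import networkx as nx

# Example network bundled with NetworkX (illustrative choice, not lab data).
G = nx.karate_club_graph()

# Degree distribution: number of nodes observed at each degree.
degree_counts = collections.Counter(degree for _, degree in G.degree())

# k-shell (k-core) decomposition: core number of every node.
core_number = nx.core_number(G)
max_core = max(core_number.values())

# Connected components, largest first, as sets of member nodes.
components = sorted(nx.connected_components(G), key=len, reverse=True)

print("degree distribution:", sorted(degree_counts.items()))
print("largest k-shell index:", max_core)
print("component sizes:", [len(c) for c in components])
```

The same calls work on any undirected `networkx.Graph`, so a graph built from your own edge list can be swapped in for `G`.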
Low/medium Expertise Required:
SQLite — used for Twitter data management and analysis
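To give a sense of what managing tweet data in SQLite can look like, here is a small sketch using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are all hypothetical and chosen for illustration only; the lab's actual schema may differ.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per tweet, with the retweeted user if any.
conn.execute(
    """
    CREATE TABLE tweets (
        tweet_id       INTEGER PRIMARY KEY,
        user           TEXT NOT NULL,
        retweeted_user TEXT,
        created_at     TEXT NOT NULL
    )
    """
)

# Made-up sample rows.
rows = [
    (1, "alice", None, "2016-11-01T10:00:00"),
    (2, "bob", "alice", "2016-11-01T10:05:00"),
    (3, "carol", "alice", "2016-11-01T10:07:00"),
    (4, "alice", "bob", "2016-11-01T11:00:00"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Count how often each user is retweeted (in-degree in the retweet network).
retweet_counts = conn.execute(
    """
    SELECT retweeted_user, COUNT(*) AS n
    FROM tweets
    WHERE retweeted_user IS NOT NULL
    GROUP BY retweeted_user
    ORDER BY n DESC, retweeted_user
    """
).fetchall()

print(retweet_counts)
conn.close()
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` stores the database on disk.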
Medium Expertise Required:
Graphical Lasso (GLasso) algorithm — MATLAB code used to find a sparse inverse correlation matrix
Collective Influence algorithm — C code implementation of Collective Influence algorithm; can be downloaded on the Software page
Monte Carlo for Maximum Entropy XY model — C code to find interaction matrix for network which can be modelled via a Maximum Entropy XY model ***BEST FOR VERY SMALL NETWORKS***
FMRIB Software Library (FSL) — used for model-based fMRI analysis (FEAT) and brain extraction (BET)
BrainNet Viewer — brain network visualization software
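For readers who want to see what the Graphical Lasso does before working with the lab's MATLAB code, here is a minimal sketch that instead uses scikit-learn's `GraphicalLasso` estimator. The synthetic data, the variable names, and the penalty `alpha=0.1` are assumptions made for illustration; this is not the lab's implementation.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Synthetic data (illustrative): a chain x0 -> x1 -> x2 plus an unrelated x3.
rng = np.random.default_rng(0)
n_samples = 2000
x0 = rng.normal(size=n_samples)
x1 = x0 + rng.normal(size=n_samples)
x2 = x1 + rng.normal(size=n_samples)
x3 = rng.normal(size=n_samples)
X = np.column_stack([x0, x1, x2, x3])

# Standardize so the empirical covariance is the correlation matrix.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# alpha sets the L1 penalty: larger values give a sparser inverse matrix.
model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_  # estimated sparse inverse correlation matrix

# Nonzero off-diagonal entries are read as edges of the inferred network.
edges = (np.abs(precision) > 1e-8) & ~np.eye(X.shape[1], dtype=bool)

print(np.round(precision, 2))
print(edges.astype(int))
```

In practice the penalty is usually tuned rather than fixed, for example by cross-validation with `sklearn.covariance.GraphicalLassoCV`.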
Medium/high Expertise Required:
High Expertise Required:
TensorFlow — used for developing deep learning models
For computer analysis you will need:
For an introduction to Twitter network analysis, please see the following tutorial by postdoc Alexandre Bovet.
You can find further videos and tutorials pertinent to our research here, courtesy of the NIPS conference.
The courses below will allow you to analyze Big Data in a variety of circumstances ranging from systems biology, to ecology, to social networks and finance:
Complex Networks at the Graduate Center – Physics – PHYS85200 – CRN 23395 – Professor H. Makse
This is my course on Network Theory; please see the syllabus.
Machine Learning at the Graduate Center – Computer Science – CSC74020 – Professor R. Haralick or Professor C. Yuan
Professor Haralick focuses more on the theoretical aspect while Professor Yuan focuses more on Natural Language Processing.
Big Data Analysis: Principles and Methods at the Graduate Center – Physics – PHYS85200 – CRN 32250 – Professor G. Patz
More application than theory, this course is a good introduction to the topic.
Finance for Scientists at the Graduate Center – Physics – PHYS85200 – CRN 30235 – Professor T. Schäfer
This course provides a good mathematical background on stochastic processes.
Computational Methods in Physics at the Graduate Center – Physics – PHYS85200 – CRN 23394 – Professor A. Poje
Ideal for those who have some experience in programming but want to become more comfortable with applications such as Monte Carlo methods.
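As a taste of the Monte Carlo methods mentioned in the course list above, here is a very small sketch that estimates π by random sampling. It is a textbook illustration only, not course material; the function name `estimate_pi` and the sample size are arbitrary choices.

```python
import math
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of unit square = pi / 4.
    return 4.0 * inside / n_samples


pi_estimate = estimate_pi(100_000)
print(pi_estimate, abs(pi_estimate - math.pi))
```

The statistical error of such an estimate shrinks like one over the square root of the number of samples, which is the generic behavior of Monte Carlo integration.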
The following courses cover theoretical principles important to the core of our research program, and in fact, the first two are mandatory for first-year Ph.D. students at the Graduate Center:
Statistical Mechanics at the Graduate Center – Physics – PHYS74100
Mathematical Methods in Physics at the Graduate Center – Physics – PHYS70100
Quantum Information Theory at the Graduate Center – Physics – PHYS85200
Quantum Theory of Fields I & II at the Graduate Center – Physics – PHYS82500 and PHYS82600, respectively
There are also courses outside the CUNY system, which I suggest you look into if you have time. New York University has a Center for Data Science, as does Columbia University. Some examples of courses offered at New York University are:
Computational Physics – PHYS-GA-2000
Non-equilibrium Statistical Physics – PHYS-GA-2061
Online courses are also important to our field of study:
Deep Learning is an important subject for any data scientist to know, although there is no course currently offered in the CUNY system. My students are self-taught or take online courses.
If you are learning the Python programming language (the language for Data Science), the Python Data Science Handbook is a very useful resource, as are Python courses that can be found at Coursera or edX.
For Data Science, Machine Learning, and Big Data Analysis, most of my students use Python, MATLAB, C, C++, Mathematica, and other languages. Please see “For prospective students and postdocs: Software” for further details.
There are also a great many online courses on applications of Data Science that can be found here. They are mostly (if not all) free, and range in difficulty level from introductory, like “Introduction to Python for Data Science,” to advanced, like “Case Studies in Functional Genomics.” There is even, at the time of this writing, an introductory course in the application of Data Analysis to biological systems, called “Introduction to Bio: Annotation and Analysis of Genomes and Genomic Assays.”
The above is a sampling of what my students have found online; you are encouraged to look further on your own.