CV
Efstratios Rappos is a postdoctoral researcher at HEIG-VD, Switzerland, currently working on algorithm design and optimization for data science and machine learning.
He studied Mathematics at the University of Cambridge (BA/MA) and obtained a Master's in Mathematics (MMath/CASM/Part III) from the same university.
He obtained his Ph.D. in Combinatorial Optimization (Operations Research) at the Business School of Imperial College London. He also holds an MSc in Finance from Birkbeck College, University of London.
He has worked as a principal analyst in the UK Department for Work and Pensions (DWP), leading a team of data scientists developing mathematical models for large-scale data mining, automated data cleaning, forecasting and economic modelling using social security and demographic data. He has also worked as a postdoctoral researcher at Imperial College London and taught statistics at LSE.
Competencies by category
- Data analytics
- Data cleaning and preprocessing of datasets produced by many commercial tools (major database exports, system logs, website logs, social network data)
- Java data structures for mapping and sorting, and Apache Commons tools
- Predictive analytics
- Forecasting, prediction
- Combinatorial optimization algorithms based on IBM CPLEX
- Real-time analytics
- Big Data
- Hadoop, Hive, HBase, Presto, Storm, Spark
- Extensive use of these tools, including optimizing their configuration and driving them programmatically from Java, Python and Linux shell scripts
- Machine Learning
- Research on decision trees and feature generation (for time series data)
- Extensive experience with online methods for very large datasets (Vowpal Wabbit)
- Scikit-learn toolkit and similar libraries
- Integration scripts (connection to remote machines for data retrieval and analysis via APIs, MySQL, PostgreSQL, HBase, Redis, MongoDB, Elasticsearch)
- Outlier detection methods (for risk scoring and fraud detection)
- Regression-based methods
- Handling of imbalanced data
- Part-of-speech tagging and information extraction (via Hidden Markov Chains)
- Textual analysis of social network data (e.g., ~100 million Twitter messages)
- Technical skills
- Expert knowledge of Java, Python and C++, very good knowledge of R
- Practical knowledge of efficient data structures and memory-reducing designs
- Research work on the parallelization of algorithms (e.g., CUDA C++) and/or multithreading approaches in Java and C++
I am a modeller: