CV

Efstratios Rappos is a post-doc researcher at HEIG-VD, Switzerland, currently working in the area of algorithm design and optimization for data science and machine learning.

He studied Mathematics at the University of Cambridge (BA/MA) and obtained a Master's in Mathematics (MMath/CASM/Part III) from the same university.

He obtained his Ph.D. in Combinatorial Optimization (Operations Research) at the Business School of Imperial College London. He also holds an MSc in Finance from Birkbeck College, University of London.

He has worked as a principal analyst at the UK Department for Work and Pensions (DWP), where he led a team of data scientists developing mathematical models for large-scale data mining, automated data cleaning, forecasting and economic modelling using social security and demographic data. He has also worked as a post-doc at Imperial College London and taught statistics at LSE.

 LinkedIn          Scholar

 

Competencies by category

  • Data analytics
    • Data cleaning and preprocessing of datasets produced by many commercial tools (major database exports, system logs, website logs, social network data)
    • Java data structures for mapping and sorting, and Apache Commons tools
  • Predictive analytics
    • Forecasting, prediction
    • Combinatorial optimization algorithms based on IBM CPLEX
    • Real-time analytics
  • Big Data
    • Hadoop, Hive, HBase, Presto, Storm, Spark
    • Extensive use of the above tools, including optimizing their configuration and manipulating them programmatically from Java, Python and Linux shell scripts
  • Machine Learning
    • Research on decision trees and feature generation (for time series data)
    • Extensive experience with online methods for very large datasets (Vowpal Wabbit)
    • scikit-learn toolkit and similar libraries
    • Integration scripts (connection to remote machines for data retrieval and analysis via APIs, MySQL, PostgreSQL, HBase, Redis, MongoDB, Elasticsearch)
    • Outlier detection methods (for risk scoring and fraud detection)
    • Regression-based methods
    • Handling of imbalanced data
    • Part-of-speech tagging and information extraction (via Hidden Markov Chains)
    • Textual analysis of social network data (e.g., ~100 million Twitter messages)
  • Technical skills
    • Expert knowledge of Java, Python and C++, very good knowledge of R
    • Practical knowledge of efficient data structures and memory-reducing designs
    • Research work on the parallelization of algorithms (e.g., CUDA C++) and on multithreading approaches in Java and C++

 I am a modeller: