HI I’M DANIEL MARCOUS
MY STORY
Hi! I’m Daniel, a Data Wizard - doing magic with data 🧙
I love bleeding edge tech & science, especially if it has to do with data.
I'm passionate about using innovative tech for solving real world existing problems.
I'm currently Co-Founder | CTO @April, using tech to solve tax and save people time & money.
Previously spent 7 years @Google where I did everything from IC to TL, manager, AI strategy for Google Cloud, data science lead & CTO @Waze.
Most recently:
- Founding and acting member of Waze's CTO office
- Founding and leading Waze's data guild (data engineering, science, product analytics and business analytics)
- Rendering Waze intelligent using ML
Before that, I spent a few years in the Israel Defense Forces, founding and leading its first big data team (now an entire division), doing cyber intelligence using ML and founding a center of excellence for tech innovation.
As a personal goal, I'm highly committed to advancing the Israeli data science and big data communities.
- I'm a leading staff member of the DataHack team
- Founder of DataLearn
- Founder & leader of the Kaggle IL community.
- I dedicate time to the community by coding, teaching and speaking at multiple events.
Other than that I spend time perfecting my mixology skills.
FEATURED
What I'm working on and excited about at the moment
WORK EXPERIENCE
What I’ve Done
August 2021 - Present
CO-FOUNDER | CTO
@APRIL
Solving Tax with Tech.
Backed by Team8 Fintech.
February 2020 - September 2021
CTO & DATA SCIENCE LEAD
@GOOGLE, WAZE
Lead data science and big data engineering as area tech lead for Waze (ATL, partially hands on).
Co-founder & member of Waze's office of the CTO.
Lead tech vision and empower technological excellence across the company.
Review and approve system design.
Lead AI & data strategy in partnership with Google Cloud
September 2018 - September 2021
DATA SCIENCE MANAGER @GOOGLE, WAZE
Data Science Manager & Data Wizard - Practitioner of Data science, Analytics & (Big) Data Engineering.
Leading a team of data scientists in the field of Advertising & Monetization (team management & hands-on tech leading).
November 2014 - February 2020
TECH LEAD & DATA WIZARD
@GOOGLE, WAZE
Leading technical aspects of company data science, big data engineering & analytics.
January 2013 - October 2014
BIG DATA LEADER & CTO | FOUNDER & LEADER OF CENTER OF EXCELLENCE
@IDF, MAMRAM
Founded and led the first team of big data engineers in the IDF.
Data CTO - assessing and studying new technologies in the field.
Providing big data and analytics solutions for top IDF projects.
Cross IDF expert on: big data engineering (Hadoop), NoSQL, big data visualization etc.
May 2009 - March 2013
DBA, DATA ENGINEER & TEAM LEAD
@IDF, MAMRAM
Senior member and later leader (manager) of a data engineering team specializing in ETL technologies and database administration.
ACADEMIC EXPERIENCE
What I've [Formally] Learned
2016 - 2020
MSC, BEN-GURION UNIVERSITY OF THE NEGEV
Master of Science in Information Systems Engineering (Data Science).
Master's dissertation in machine learning - Clustering of Big Geospatial Data.
2011 - 2015
BA, THE OPEN UNIVERSITY OF ISRAEL
Double degree in management and computer science.
First year - president's honors.
Second and third years and overall - dean's honors.
Graduated cum laude.
September 2008 - April 2009
SOFTWARE ENGINEERING, SCHOOL FOR COMPUTER PROFESSIONS, IDF
Intensive & prestigious IDF course for software engineering.
Various
ONLINE
Stanford University - Statistical Learning
Coursera - Bayesian Statistics
Coursera - Machine Learning
Johns Hopkins (Coursera) - Data Science Specialization
PUBLISHED WORK
PROJECTS
I Contribute
Matching the best rider-driver couples for Waze Carpool using machine learning
Waze's motorcycle navigation mode - creating a new route recommendation mechanism and ETA prediction models specialised for motorcycle drives using machine learning.
Kaggle Israel community group.
Running regular monthly meetups at https://www.meetup.com/DataHack/events/
Kaggle IL Offering :
Come and work together (or alone) on a live, on-going Kaggle competition.
Receive (if you want it) mentoring and tips & tricks from Kaggle masters, Kaggle Days (Paris) competition winners and top Israeli competitive ML dogs and experts.
Be a part of the first Israeli community in the field of competitive machine learning.
Share knowledge with other experts around competitive ML and learn bleeding edge ML techniques by studying Kaggle Kernels and hearing talks from Kaggle Masters.
The Association for The Advancement of Data Science In Israel.
DataHack is an Israeli non-profit dedicated to the advancement of data science and machine learning in Israel. We focus on strengthening academia-industry ties, data science literacy and education, intra-community cohesion and knowledge sharing, and empowerment of underrepresented populations.
CODE
I create
Provide a set of tools to make working with geospatial objects quick and painless. These tools were designed with S2 objects (Google's "geometry on a sphere" abstractions) in mind as the leading data structure to be used when working with geospatial data.
A novel distributed implementation for: k Betweenness Centrality (kBC) algorithm for Spark using GraphX.
Fun facts :
Used in production at several companies
Taught at several university courses
39 GitHub stars
How it works : shorturl.at/efBCP
Provides deepboost models training, evaluation, predicting and hyperparameter tuning using grid search and cross validation.
Based on Google's Deep Boosting algorithm by Cortes et al.
Over 19k package downloads !
An implementation of the GloVe model for learning word representations from big text corpora, distributed with Apache Spark.
Based on the original implementation : https://github.com/stanfordnlp/GloVe
Distributed Density based Geospatial Clustering of Applications with Noise over large datasets, using Apache Spark
First place solution for the Armis DataHack 2019 Challenge - Devices Gone Rogue.
Anomaly detection in network data using self supervised learning
Utilities for data science work, including templates for EDA, cleaning, modeling and pipelining.
Including notebooks and importable Python utilities with advanced scikit-learn compatible transformers I commonly use.
Music recommender system based solely on song audio using Wavenet embeddings.
Mac workstation config from days to minutes
COURSES
I Teach
5 hours
During this course you will take your first steps as a data scientist. By the end of the training you will have trained your very own model on a real world dataset, and be able to use it for predictions.
This covers both basic theoretical background and practical skills required to successfully tackle your first data science project!
The content in this workshop covers the preliminary concepts & skills necessary for data science work. Although considered “preliminary” (a must have), we will cover them in the depth necessary to fully understand and use them later.
These include the data science workflow, ML task types (or - what can / should I do with ML), popular ML algorithms (models), gradient descent (what “training” a model means), avoiding overfitting (generalisation, complexity issues, validation etc.) and hyperparameter tuning.
We will go over simple python code for all of the above and later in the workshop use these to compete (together or alone) in a Kaggle competition to practice this for real.
Presentation : https://goo.gl/i8ttz9
Code : https://goo.gl/9jTDb1
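As a rough illustration of what “training” a model means in the gradient descent sense mentioned above, here is a minimal sketch that fits a one-feature linear model by gradient descent. It is not taken from the workshop material: the toy data, the learning rate and the function name `fit_line` are all assumptions made for this example.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w * x + b by minimising mean squared error with gradient descent.

    Hypothetical helper for illustration only; lr and steps are arbitrary choices.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

On this noise-free toy data the learned `w` and `b` should land very close to the slope and intercept used to generate it; on real data the learning rate and number of steps usually need tuning.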
5 hours
During this course you will gain a basic understanding of TensorFlow and practice basic coding with TensorFlow 2.0.
We will cover :
1. What is TensorFlow (Why TF 2.0 >> TF 1.0)
2. High Level APIs (Keras & Estimators)
3. TensorFlow Components (tf.data, Checkpoints, Accelerators)
4. TensorFlow Hub & Transfer Learning basics
5. TensorBoard Overview
6. TFX useful components overview (tfdv, tfma)
We will have many code examples.
You'll get an understanding of what they do, how to change them and how to use them for your data.
Presentation : tiny.cc/tf2-101-preso
Code : tiny.cc/tf2-101-lab
2 hours
A practical introduction to preprocessing for machine learning, with code and applications.
We will use scikit-learn transformers in this course.
Presentation : https://goo.gl/q6a376
Code : https://goo.gl/XNmkBW
2 hours
A practical introduction to machine learning model tuning, with code and applications.
We will go over hyperparameter tuning, evaluation schemes, regularization and more.
Presentation : https://goo.gl/6nnVpy
Code : https://goo.gl/UXWSWh
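To give a flavour of the tuning topics listed above (hyperparameter tuning, evaluation schemes, regularization), here is a small dependency-free sketch of a grid search over a regularization strength, scored with k-fold cross-validation. It is an assumption-laden illustration and not the course code: the one-feature ridge model, the helper names (`ridge_fit`, `k_fold_indices`, `grid_search_cv`), the toy data and the candidate grid are all invented for this example.

```python
def ridge_fit(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def grid_search_cv(xs, ys, lambdas, k=3):
    """Return the lambda with the lowest mean validation MSE, plus all scores."""
    scores = {}
    for lam in lambdas:
        fold_errors = []
        for train, val in k_fold_indices(len(xs), k):
            # Fit on the training folds only, then score on the held-out fold.
            w = ridge_fit([xs[i] for i in train], [ys[i] for i in train], lam)
            fold_errors.append(mse(w, [xs[i] for i in val], [ys[i] for i in val]))
        scores[lam] = sum(fold_errors) / len(fold_errors)
    best = min(scores, key=scores.get)
    return best, scores


# Toy, roughly linear data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
best_lambda, cv_scores = grid_search_cv(xs, ys, lambdas=[0.0, 0.1, 1.0, 10.0])
print(best_lambda, cv_scores)
```

The key design point is that each candidate value is judged only on data the model did not see while fitting, which is what makes the comparison between hyperparameter values meaningful.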
2 hours
Practical applications of advanced ML models and methods.
We will go over ensembling, boosting, transfer learning and autoML.
Presentation : https://goo.gl/XDHCiV
Code : https://goo.gl/u1zTf3
PRESENTATIONS
I Speak
Overview of current technologies for deepfake detection.
Tips & tricks in relation to this Kaggle competition
Latest / Buzziest / Most Innovative / Most Promising in software engineering of 2019.
Based on:
- Work of leading companies (Netflix, Google, Spotify…)
- Trends in the tech world
- Opinions (of the people who set the tone, e.g. Martin Fowler)
The world of transportation is radically changing.
It is an industry with immense technological challenges, most of which are AI related.
At the current pace, and with the major industry players now active, it will become unrecognisable in the coming years.
In this talk I aim to cover the different fields that it includes, the data science related problems that it poses, and current state of the art solutions.
The focus of this talk will be smart cities, which multiple teams @Google work on, including my own.
I will present my own work and other smart city research and solutions by my counterparts at Google and Uber.
HACKATHON WINS
2019 - @Armis - Devices Gone Rogue Anomaly Detection - code
2018 - @Microsoft - AI Based Math Problem Solver
2016 - @Final - NYC Taxi ETA Prediction Challenge
Overview of Google's S2 library for two-dimensional projections on a three-dimensional sphere (similar to a globe).
How to create and maintain a production-ready BIG ML Workflow - From Zero to Hero.
Intro to cloud AI platform notebooks, creating a Google Cloud Platform account and redeeming credit.
Big data real time architectures -
How do we do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages / pitfalls do they have?
An overview of the distributed database landscape - what it is, how it works, who uses it and for what purpose.
Analyzing and explaining the intersection between the fields of big data and data visualization including domain theory and practical examples.