HI I’M DANIEL MARCOUS
Hi! I’m Daniel, a Data Wizard - doing magic with data 🧙
I love bleeding edge tech & science, especially if it has to do with data.
I'm passionate about innovation, using the next technology before everyone else has even heard of it, and making crazy ideas turn into the best IT systems.
I’m currently leading data science @Google, Waze, where in the last 5 years I’ve had a bunch of different titles including data scientist, data engineer, tech lead for data processing and architecture, and more.
During that time I got to lead cool projects like Carpool matching (ML model), ETA & routing for motorcycles (ML model), and a funnel / behavioural self-service analytics platform (not an ML model but still fun).
Before that I spent a few years at the Israel Defense Forces, founding and leading the first big data team (now an entire division), cyber intelligence using ML and a center of excellence for tech innovation.
As a personal goal, I'm highly committed to advancing the Israeli data science and big data communities.
Other than that I spend time perfecting my mixology skills.
What I'm working on and excited about at the moment
What I’ve Done
September 2018 - Present
DATA SCIENCE MANAGER, GOOGLE, WAZE
Data Science Manager & Data Wizard - Practitioner of Data science, Analytics & (Big) Data Engineering.
Leading a team of data scientists in the field of Advertising & Monetization (team management & hands-on tech leading).
November 2014 - Present
TECH LEAD & DATA WIZARD, GOOGLE, WAZE
Leading technical aspects of company data science, big data engineering & analytics.
June 2014 - October 2014
FOUNDER & LEADER OF CENTER OF EXCELLENCE, IDF, MAMRAM
Innovative R&D team, bringing crazy ideas to life in order to magnify business value.
January 2013 - October 2014
BIG DATA LEADER & CTO, IDF, MAMRAM
Founded and led the first team of big data engineers in the IDF.
Data CTO - assessing and studying new technologies in the field.
Providing big data and analytics solutions for top IDF projects.
Cross IDF expert on: big data engineering (Hadoop), NoSQL, big data visualization etc.
May 2009 - March 2013
DBA, DATA ENGINEER & TEAM LEAD, IDF, MAMRAM
Senior member and later leader (manager) of a data engineering team specializing in ETL technologies and database administration.
What I've [Formally] Learned
2016 - 2020
MSC, BEN-GURION UNIVERSITY OF THE NEGEV
Master of Science in Information Systems Engineering (Data Science).
Master's dissertation in machine learning - Clustering of Big Geospatial Data.
2011 - 2015
BA, THE OPEN UNIVERSITY OF ISRAEL
Double degree in management and computer science.
First year - president's honors.
Second, third year and overall - dean's honors.
Graduated cum laude.
September 2008 - April 2009
SOFTWARE ENGINEERING, SCHOOL FOR COMPUTER PROFESSIONS, IDF
Intensive & prestigious IDF course for software engineering.
Stanford University - Statistical Learning
Coursera - Bayesian Statistics
Coursera - Machine Learning
Johns Hopkins (Coursera) - Data Science Specialization
Kaggle Israel community group.
Running regular monthly meetups at https://www.meetup.com/DataHack/events/
Kaggle IL Offering :
Come and work together (or alone) on a live, on-going Kaggle competition.
Receive (if you want it) mentoring and tips & tricks from Kaggle masters, Kaggle Days (Paris) competition winners and top Israeli competitive ML dogs and experts.
Be a part of the first Israeli community in the field of competitive machine learning.
Share knowledge with other experts around competitive ML and learn bleeding edge ML techniques by studying Kaggle Kernels and hearing talks from Kaggle Masters.
The Association for The Advancement of Data Science In Israel.
Datahack is an Israeli non-profit dedicated to the advancement of data science and machine learning in Israel. We focus on strengthening academia-industry ties, data science literacy and education, intra-community cohesion and knowledge sharing and empowerment of underrepresented populations.
Provides a set of tools to make working with geospatial objects quick and painless. These tools were designed with S2 objects (Google's "geometry on a sphere" abstractions) in mind as the leading data structure to be used when working with geospatial data.
Provides deepboost model training, evaluation, prediction and hyperparameter tuning using grid search and cross validation.
Based on Google's Deep Boosting algorithm by Cortes et al.
Over 19k package downloads !
Distributed Density based Geospatial Clustering of Applications with Noise over large datasets, using Apache Spark
Utilities for data science work, including templates for EDA, cleaning, modeling, pipelining.
Including notebooks and importable Python utilities with advanced scikit-learn compatible transformers I commonly use.
During this course you will take your first steps as a data scientist. By the end of the training you will have trained your very own model on a real world dataset, and be able to use it for predictions.
This covers both basic theoretical background and practical skills required to successfully tackle your first data science project!
The content in this workshop covers the preliminary concepts & skills necessary for data science work. Although considered “preliminary” (a must have), we will cover them in the depth necessary to fully understand and utilize them later.
These include the data science workflow, ML task types (or - what can / should I do with ML), popular ML algorithms (models), gradient descent (what “training” a model means), avoiding overfitting (generalisation, complexity issues, validation etc.) and hyper parameter tuning.
We will go over simple Python code for all of the above, and later in the workshop use it to compete (together or alone) in a Kaggle competition to practice this for real.
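To give a flavour of what “training” a model with gradient descent means, here is a minimal sketch in plain Python. It is not taken from the workshop materials: the toy data, the function name `train` and the chosen learning rate are all made up for illustration.

```python
# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

On this noise-free toy data the learned parameters should end up very close to the slope and intercept used to generate it. With real data, the learning rate and number of steps usually need tuning.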
During this course you will gain a basic understanding of TensorFlow and practice basic coding with TensorFlow 2.0
We will cover :
1. What is TensorFlow (Why TF 2.0 >> TF 1.0)
2. High Level APIs (Keras & Estimators)
3. TensorFlow Components (tf.data, Checkpoints, Accelerators)
4. TensorFlow Hub & Transfer Learning basics
5. TensorBoard Overview
6. TFX useful components overview (tfdv, tfma)
We will have many code examples.
You'll get an understanding of what they do, how to change them and how to use them for your data.
Presentation : tiny.cc/tf2-101-preso
Code : tiny.cc/tf2-101-lab
Practical introduction and applications with code to preprocessing for machine learning.
We will use scikit transformers in this course.
Practical introduction and applications with code to machine learning model tuning.
We will go over hyperparameter tuning, evaluation schemes, regularization and more.
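As a rough idea of how hyperparameter tuning, evaluation schemes and regularization fit together, here is a minimal sketch of grid search with k-fold cross validation in plain Python. It is an illustration only, not the course code: the one-dimensional ridge model, the helper names (`fit_ridge`, `cv_mse`, `grid_search`) and the toy data are all assumptions.

```python
def fit_ridge(xs, ys, alpha):
    """Closed-form 1-D ridge regression without intercept.

    w = sum(x*y) / (sum(x*x) + alpha); a larger alpha shrinks w towards 0.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cv_mse(xs, ys, alpha, k=3):
    """Average validation MSE over k folds (fold i holds out every k-th point)."""
    scores = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        valid = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        train_x, train_y = zip(*train)
        valid_x, valid_y = zip(*valid)
        w = fit_ridge(train_x, train_y, alpha)
        scores.append(mse(valid_x, valid_y, w))
    return sum(scores) / k


def grid_search(xs, ys, alphas, k=3):
    """Score every candidate alpha by cross validation and pick the lowest MSE."""
    results = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best = min(results, key=results.get)
    return best, results


# Toy data, roughly y = 3x with a little noise (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 5.9, 9.2, 11.8, 15.1, 18.0]

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_alpha, scores = grid_search(xs, ys, alphas)
print("best alpha:", best_alpha)
```

The key point is that each candidate is scored only on data it was not fitted on, so the chosen regularization strength reflects generalisation rather than training fit.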
Latest / Buzziest / Most Innovative / Most Promising in software engineering of 2019.
Based on :
work of leading companies (Netflix, Google, Spotify…)
Trends in tech world
Opinions (of the people that set the tone - e.g. Martin Fowler)
The world of transportation is radically changing.
It is an industry with immense technological challenges, most of which are AI related.
At the current pace, and with major industry players active, it will become unrecognisable in the coming years.
In this talk I aim to cover the different fields that it includes, data science related problems that it poses, and current state of the art solutions.
The focus of this talk will be smart cities, which multiple teams @Google work on, including my own.
I will present my own work, as well as other smart city research and solutions by my counterparts at Google and Uber.
2019 - @Armis - Devices Gone Rogue Anomaly Detection - shorturl.at/aenVZ
2018 - @Microsoft - AI Based Math Problem Solver - shorturl.at/cqBU8
2016 - @Final - NYC Taxi ETA Prediction Challenge - shorturl.at/hjmJP
Overview of Google's S2 library for two-dimensional projections on a three-dimensional sphere (similar to a globe).
How to create and maintain a production-ready BIG ML Workflow - From Zero to Hero.
Intro to cloud AI platform notebooks, creating a Google Cloud Platform account and redeeming credit.
Big data real time architectures -
How to do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages / pitfalls does each one have?
An overview of the distributed database landscape - what it is, how it works, who uses it and for what purpose.
Analyzing and explaining the intersection between the fields of big data and data visualization including domain theory and practical examples.