HI I’M DANIEL MARCOUS

 

MY STORY

Hi! I’m Daniel, a Data Wizard - doing magic with data 🧙

I love bleeding edge tech & science, especially if it has to do with data.

I'm passionate about innovation, using the next technology before everyone else has even heard of it, and making crazy ideas turn into the best IT systems.

I’m currently leading data science @Google, Waze, where over the last 5 years I’ve held a bunch of different titles, including data scientist, data engineer, and tech lead (TL) for data processing and architecture.
During that time I got to lead cool projects like Carpool matching (ML model), ETA & routing for motorcycles (ML model), and a funnel / behavioural self-service analytics platform (not an ML model but still fun).

Before that I spent a few years in the Israel Defense Forces (IDF), founding and leading the first big data team (now an entire division), cyber intelligence using ML, and a center of excellence for tech innovation.

As a personal goal, I'm highly committed to advancing the Israeli data science and big data communities.

I'm a leading staff member of the DataHack team, founder of DataLearn, and founder & leader of the Kaggle IL community.

I dedicate time to the community by coding, teaching and speaking at multiple events.

Other than that I spend time perfecting my mixology skills.

 

FEATURED

What I'm working on and excited about at the moment

 
 

WORK EXPERIENCE

What I’ve Done

 

September 2018 - Present

DATA SCIENCE MANAGER, GOOGLE, WAZE

Data Science Manager & Data Wizard - Practitioner of Data science, Analytics & (Big) Data Engineering.
Leading a team of data scientists in the field of Advertising & Monetization (team management & hands-on tech leading).

November 2014 - Present

TECH LEAD & DATA WIZARD, GOOGLE, WAZE

Leading technical aspects of company data science, big data engineering & analytics.

June 2014 - October 2014

FOUNDER & LEADER OF CENTER OF EXCELLENCE, IDF, MAMRAM

Innovative R&D team, bringing crazy ideas to life in order to magnify business value.

January 2013 - October 2014

BIG DATA LEADER & CTO, IDF, MAMRAM

Founded and led the first team of big data engineers in the IDF.
Data CTO - assessing and studying new technologies in the field.

Providing big data and analytics solutions for top IDF projects.
Cross IDF expert on: big data engineering (Hadoop), NoSQL, big data visualization etc.

May 2009 - March 2013

DBA, DATA ENGINEER & TEAM LEAD, IDF, MAMRAM

Senior member and later leader (manager) of a data engineering team specializing in ETL technologies and database administration.

ACADEMIC EXPERIENCE

What I've [Formally] Learned

 

2016 - 2020

MSC, BEN-GURION UNIVERSITY OF THE NEGEV

Master of Science in Information Systems Engineering (Data Science).
Master's dissertation in machine learning - Clustering of Big Geospatial Data.

2011 - 2015

BA, THE OPEN UNIVERSITY OF ISRAEL

Double degree in management and computer science.
First year - president's honors.
Second and third years, and overall - dean's honors.

Graduated cum laude.

September 2008 - April 2009

SOFTWARE ENGINEERING, SCHOOL FOR COMPUTER PROFESSIONS, IDF

Intensive & prestigious IDF course for software engineering.

Various

ONLINE

Stanford University - Statistical Learning
Coursera - Bayesian Statistics
Coursera - Machine Learning
Johns Hopkins (Coursera) - Data Science Specialization

PUBLISHED WORK

 

PROJECTS

I Contribute

Matching the best rider-driver couples for Waze Carpool using machine learning

Waze's motorcycle navigation mode - creating a new route recommendation mechanism and ETA prediction models specialised for motorcycle drives using machine learning.

Kaggle Israel community group.

Running regular monthly meetups at https://www.meetup.com/DataHack/events/

Kaggle IL Offering :

  • Come and work together (or alone) on a live, on-going Kaggle competition.

  • Receive (if you want it) mentoring and tips & tricks from Kaggle masters, Kaggle Days (Paris) competition winners and top Israeli competitive ML dogs and experts.

  • Be a part of the first Israeli community in the field of competitive machine learning.

  • Share knowledge with other experts around competitive ML and learn bleeding edge ML techniques by studying Kaggle Kernels and hearing talks from Kaggle Masters.

The Association for The Advancement of Data Science In Israel.
DataHack is an Israeli non-profit dedicated to the advancement of data science and machine learning in Israel. We focus on strengthening academia-industry ties, data science literacy and education, intra-community cohesion and knowledge sharing, and empowerment of underrepresented populations.

 

CODE

I create

Provides a set of tools to make working with geospatial objects quick and painless. These tools were designed with S2 objects (Google's "geometry on a sphere" abstractions) in mind as the leading data structure to be used when working with geospatial data.

A novel distributed implementation of the k Betweenness Centrality (kBC) algorithm for Spark, using GraphX.

Fun facts :

  1. Used in production at several companies

  2. Taught in several university courses

  3. 39 GitHub stars

How it works : shorturl.at/efBCP

Provides deepboost model training, evaluation, prediction and hyperparameter tuning using grid search and cross validation.
Based on Google's Deep Boosting algorithm by Cortes et al.

Over 19k package downloads !

An implementation of the GloVe model for learning word representations from large text corpora, distributed with Apache Spark.

Based on the original implementation : https://github.com/stanfordnlp/GloVe

Distributed Density based Geospatial Clustering of Applications with Noise over large datasets, using Apache Spark

First place solution for the Armis DataHack 2019 Challenge - Devices Gone Rogue.
Anomaly detection in network data using self-supervised learning.

Utilities for data science work, including templates for EDA, cleaning, modeling and pipelining.
Includes notebooks and importable Python utilities with advanced scikit-learn compatible transformers I commonly use.

Music recommender system based solely on song audio, using WaveNet embeddings.

 

COURSES

I Teach

5 hours

During this course you will take your first steps as a data scientist. By the end of the training you will have trained your very own model on a real-world dataset, and be able to use it for predictions.


This covers both basic theoretical background and practical skills required to successfully tackle your first data science project!

The content in this workshop covers the preliminary concepts & skills necessary for data science work. Although considered “preliminary” (a must-have), we will cover them in the depth necessary to fully understand them and use them later.

These include the data science workflow, ML task types (or - what can / should I do with ML), popular ML algorithms (models), gradient descent (what “training” a model means), avoiding overfitting (generalisation, complexity issues, validation etc.) and hyperparameter tuning.

We will go over simple Python code for all of the above, and later in the workshop use it to compete (together or alone) in a Kaggle competition to practice this for real.
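As a taste of the gradient descent topic listed above (what “training” a model means), here is a minimal sketch in plain Python. It is not taken from the workshop material: the toy data, the learning rate, the step count and the name fit_line are all assumptions made for illustration.

```python
# Minimal gradient descent sketch: fit y = w*x + b by minimising mean squared error.
# The data, learning rate and step count below are illustrative assumptions.

def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # "Training" is just repeating this small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the fitted w and b should land very close to 2 and 1. A larger learning rate can make the updates diverge, which is one reason it is treated as a hyperparameter.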

Presentation : https://goo.gl/i8ttz9
Code : https://goo.gl/9jTDb1

5 hours

During this course you will gain a basic understanding of TensorFlow and practice basic coding with TensorFlow 2.0.

We will cover :

1. What is TensorFlow (Why TF 2.0 >> TF 1.0)

2. High Level APIs (Keras & Estimators)

3. TensorFlow Components (tf.data, Checkpoints, Accelerators)

4. TensorFlow Hub & Transfer Learning basics 

5. TensorBoard Overview

6. TFX useful components overview (tfdv, tfma)

We will have many code examples.

You'll get an understanding of what they do, how to change them and how to use them for your data.

Presentation : tiny.cc/tf2-101-preso

Code : tiny.cc/tf2-101-lab



2 hours

Practical introduction to preprocessing for machine learning, with applications and code.

We will use scikit-learn transformers in this course.

Presentation : https://goo.gl/q6a376
Code : https://goo.gl/XNmkBW

2 hours

Practical introduction to machine learning model tuning, with applications and code.

We will go over hyperparameter tuning, evaluation schemes, regularization and more.
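To make the topics above concrete, here is a minimal sketch of hyperparameter tuning by grid search with k-fold cross validation, in plain Python. It is not taken from the course material: the 1-D ridge model without intercept, the toy data, the alpha grid and all function names are assumptions chosen to keep the example self-contained.

```python
# Grid search with k-fold cross validation, sketched in plain Python.
# Model (an assumption for illustration): 1-D ridge regression without intercept,
# with closed-form solution w = sum(x*y) / (sum(x*x) + alpha).

def fit_ridge(xs, ys, alpha):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_mse(xs, ys, alpha, k=3):
    scores = []
    for fold in range(k):
        # Every k-th point (offset by the fold index) is held out for validation.
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        valid = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        w = fit_ridge([x for x, _ in train], [y for _, y in train], alpha)
        scores.append(mse(w, [x for x, _ in valid], [y for _, y in valid]))
    return sum(scores) / k

def grid_search(xs, ys, alphas, k=3):
    # Score every candidate regularization strength and keep the lowest validation error.
    results = {alpha: cross_val_mse(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = min(results, key=results.get)
    return best_alpha, results

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # roughly y = 2x with small noise
best_alpha, results = grid_search(xs, ys, alphas=[0.0, 0.1, 1.0, 10.0, 100.0])
print(best_alpha, results[best_alpha])
```

The key design point is that each alpha is scored only on data the model did not see during fitting, so a heavily over-regularized setting (such as alpha = 100 here) shows up as a clearly worse validation error.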

Presentation : https://goo.gl/6nnVpy
Code : https://goo.gl/UXWSWh

2 hours

Practical applications of advanced ML models and methods.

We will go over ensembling, boosting, transfer learning and autoML.

Presentation : https://goo.gl/XDHCiV
Code : https://goo.gl/u1zTf3

 

PRESENTATIONS

I Speak

Latest / Buzziest / Most Innovative / Most Promising in software engineering of 2019.

Based on :

  • Work of leading companies (Netflix, Google, Spotify…)

  • Trends in the tech world

  • Opinions (of the people that set the tone - e.g. Martin Fowler)

shorturl.at/ilFU1

The world of transportation is radically changing.

It is an industry with immense technological challenges, most of which are AI related.

At the current pace, and with major industry players active, it will become unrecognisable in the coming years.
In this talk I aim to cover the different fields that it includes, the data science problems that it poses, and current state-of-the-art solutions.

The focus of this talk will be smart cities, which multiple teams @Google work on, including my own.

I will present my own work, along with smart city research and solutions by my counterparts at Google and Uber.

shorturl.at/qANPV

youtube.com/watch?v=en5mzFEdwdI

HACKATHON WINS

2019 - @Armis - Devices Gone Rogue Anomaly Detection - shorturl.at/aenVZ

2018 - @Microsoft - AI Based Math Problem Solver - shorturl.at/cqBU8

2016 - @Final - NYC Taxi ETA Prediction Challenge - shorturl.at/hjmJP

Overview of Google's S2 library for two-dimensional projections on a three-dimensional sphere (similar to a globe).

shorturl.at/DIUV4

How to create and maintain a production-ready BIG ML Workflow - From Zero to Hero.

shorturl.at/nwxHU

Intro to cloud AI platform notebooks, creating a Google Cloud Platform account and redeeming credit.

shorturl.at/bfmqX

Big data real-time architectures -
How to do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages / pitfalls do they have?

shorturl.at/EOV58

An overview of the distributed database landscape - what it is, how it works, who uses it and for what purpose.

shorturl.at/joz26

Analyzing and explaining the intersection between the fields of big data and data visualization, including domain theory and practical examples.

shorturl.at/bQR23

 

MIXOLOGY

 
 

CONTACT ME

Daniel Marcous

+972-547-229760
