Machine Learning Free Book

Some free ebooks to learn Machine Learning and related fields

Statistics

Machine Learning

Data Engineering

Neural Network

Deep Learning

 Sources

Google Code-In 2017: My Story

Weeks before GCI (Google Code-In) even started, I kept debating with myself whether to join GCI 2017 or not. I was a GCI 2016 participant and my experience with it was not so good. It was kind of a traumatic experience for me.

Long story short, I decided to join. The first thing I had to do was choose an organization I was interested in. I already knew which organization I’d contribute to, even before I joined: Zulip.

But joining GCI more than a week late (I had some internet problems) ruined my plan. Zulip is a huge community, and there sure were a lot of participants. That meant I would have to do a lot of tasks in order to, well, win? I never expected to be a finalist, let alone a winner, but I wanted to push myself to the limit. The competition would have been too tough for me, so I preferred to choose another organization.

I scrolled through the available organizations and looked them over. Surprisingly, a few organizations caught my eye: OpenWISP, LiquidGalaxy, and CloudCV, to name a few. I felt like I was sort of qualified for them. Not only that, they were all new organizations! A good way to forget my past, GCI 2016.

I chose CloudCV as the organization I wanted to work with. I chose it because it’s related to Machine Learning, something I’d been interested in for the past several months. Perfect.

CloudCV is a young open source organization that builds platforms for AI and Machine Learning. The goal of CloudCV is to make AI research more reproducible. CloudCV has 3 main projects: EvalAI, Origami, and Fabrik.

Fabrik’s page

CloudCV’s task choices, however, were quite limited. At one point, it had only 7 tasks to choose from (not counting the beginner tasks)! I mostly contributed to Fabrik, for example by adding neural network models to its model zoo. Adding a model to Fabrik’s model zoo was like a gambling game for me. When you were lucky, it was so easy you felt like you’d done nothing. Other times it was so hard I felt like giving up.

The first thing I had to do to add a new model to Fabrik was find a neural network model. At the time of writing, Fabrik only supports 3 frameworks: Caffe, Keras, and Tensorflow. However, Fabrik still has some problems with Tensorflow models. I don’t have any experience with Caffe, so I preferred to go with Keras.

After cloning a model I wanted to add, I had to make sure that the model worked. Some models work in Keras 2, while others don’t. Some work in Tensorflow 1.4.1, some don’t, et cetera. Once the model ran smoothly, I had to generate a JSON file from it. Then, I had to make sure that Fabrik supported the layers in the model.
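
The layer check in that last step can be sketched in Python. This is only a minimal sketch, assuming the JSON file follows the usual Keras `model.to_json()` layout (a `config` entry holding the layers, each with a `class_name`). The `SUPPORTED_LAYERS` set and the `unsupported_layers` helper are hypothetical names made up for illustration; they are not Fabrik’s actual list or API.

```python
import json

# Hypothetical subset of layer types; Fabrik's real list of supported layers may differ.
SUPPORTED_LAYERS = {"InputLayer", "Conv2D", "MaxPooling2D", "Dense", "Flatten", "Concatenate"}


def unsupported_layers(model_json, supported=SUPPORTED_LAYERS):
    """Return the layer class names in a Keras-style JSON string that are not in `supported`."""
    config = json.loads(model_json)["config"]
    # Some Keras versions store a Sequential config as a bare list of layers,
    # others as a dict with a "layers" key, so handle both.
    layers = config if isinstance(config, list) else config["layers"]
    return sorted({layer["class_name"] for layer in layers} - set(supported))


# A tiny hand-written stand-in for a real model JSON file.
example = json.dumps({
    "class_name": "Model",
    "config": {"layers": [
        {"class_name": "InputLayer", "config": {}},
        {"class_name": "Conv2D", "config": {}},
        {"class_name": "MaxPoolingWithArgmax2D", "config": {}},
    ]},
})

print(unsupported_layers(example))  # ['MaxPoolingWithArgmax2D']
```

Running a check like this before importing would at least tell me up front which custom layers are likely to cause an "Unknown layer" error.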

Sometimes Fabrik threw an error and I had to find another model. If Fabrik kept throwing errors, I had to change the model I wanted to import and start again from zero. Repeat.

In this blog post, I’ve listed some of the models I tried to add to Fabrik. There’s more to it though. Right now I have a collection of more than 20 different neural network models, only because I kept getting errors on most of the models I tried! Almost all of them use Keras as their framework.

Another thing I did was find AI challenges on the internet. I already knew one website: Kaggle! But this task pushed me to be more creative, and I scoured the internet for every AI challenge I could find. Some of them can be found here.

I also made some graphics for CloudCV:

A logo for Origami
An illustration for Fabrik

I enjoyed working with CloudCV. I liked the atmosphere, the community, the nice and helpful people, and pretty much everything, even the timezones. Students in other organizations often have a huge time zone difference with their mentors and end up staying awake all night. In CloudCV, I was thankful to have mentors whose timezones were close to mine.

One thing that bugged me a little is that CloudCV only had a few mentors. I counted all the mentors whose names appeared on the task pages, and there were only 9!

A random screenshot of my terminal

Working with CloudCV gave me experience of programming in the real world. Programming isn’t all about coding. Sometimes when you hit a problem, you have to solve it yourself because StackOverflow doesn’t have all the answers. Setting up a development environment was the hardest part of all. Package versions aren’t just numbers; they play an important role in a project.

In the future, I hope to contribute more to CloudCV whenever I have enough time.

I got onto the leaderboard and I’m pretty happy with that. Thank you to everyone who helped me while I was contributing to CloudCV, including my family, other students, and of course, my mentors. Thanks for dealing with my dumb questions and with me in general.


ps: if you want to ask me questions about GCI, feel free to; I’d be happy to answer.

Keras Neural Networks and Fabrik

A screenshot of Fabrik


I tried to import several Keras neural networks into Fabrik, and this is the result.
These are the models I successfully imported:

| Model Link | Fabrik Link |
|---|---|
| https://github.com/anantzoid/VQA-Keras-Visual-Question-Answering | http://fabrik.cloudcv.org/caffe/load?id=20180105045732jmyeu |
| https://github.com/LemonATsu/Keras-Image-Caption | |
| http://kodu.ut.ee/~leopoldp/2016_DeepYeast/code/caffe_model/ | http://fabrik.cloudcv.org/caffe/load?id=20180102135425bzkzy |

And these are some models I had trouble with:

| Model Link | Successfully Generated the JSON Model? | Problem | Error Message |
|---|---|---|---|
| https://github.com/ykamikawa/SegNet | Yes | Error when importing | ValueError: Unknown layer: MaxPoolingWithArgmax2D |
| https://github.com/zhixuhao/unet | Yes | Error when exporting | ValueError: `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 64, 64, 512), (None, 63, 63, 512)] |
| https://github.com/yihui-he/u-net | Yes | Error when exporting | ValueError: `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, -1, 19, 256), (None, 0, 20, 256)] |
| https://github.com/aitorzip/Keras-ICNet | Yes | Error when importing | ValueError: bad marshal data (unknown type code) |
| https://github.com/preddy5/segnet | Yes | Error when importing | Cannot import layer of Layer type |
| https://github.com/k3nt0w/FCN_via_keras/ | No | | ValueError: The input must have 3 channels; got `input_shape=(3, 224, 224)` |
| https://github.com/0bserver07/Keras-SegNet-Basic | No | | ValueError: total size of new array must be unchanged |
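
The two `Concatenate` errors look like the usual U-Net problem where the input size is not divisible by the total downsampling factor: pooling rounds an odd size, upsampling doubles it, and the skip connection no longer matches. That is only my guess at the cause, not something I confirmed for these repositories. Here is a minimal sketch in plain Python, assuming 2x2 pooling that rounds up (like "same" padding) and 2x upsampling; the helper name and the input sizes 512 and 500 are made up for illustration.

```python
import math


def unet_skip_sizes(size, depth, round_up=True):
    """Return (encoder_size, decoder_size) pairs per skip connection, deepest first."""
    encoder = []
    for _ in range(depth):
        encoder.append(size)
        # 2x2 pooling: "same" padding rounds up, "valid" padding rounds down.
        size = math.ceil(size / 2) if round_up else size // 2
    pairs = []
    for skip in reversed(encoder):
        size *= 2  # 2x upsampling on the way back up
        pairs.append((skip, size))
    return pairs


print(unet_skip_sizes(512, 4))  # every pair matches
print(unet_skip_sizes(500, 4))  # the deepest pair is (63, 64): a mismatch
```

A real model stops at the first mismatched pair, so the later pairs for 500 are only hypothetical. If this is the cause, resizing or padding the input to a multiple of 2^depth should avoid the error.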

Image Captioning Models and Fabrik

| No | Name | URL | Framework | Note | Fabrik |
|---|---|---|---|---|---|
| 1 | NeuralTalk | https://github.com/karpathy/neuraltalk | Python + numpy | obsoleted by NeuralTalk2 | |
| 2 | NeuralTalk2 | https://github.com/karpathy/neuraltalk2 | Torch | Torch incompatible with Fabrik | |
| 3 | Show and Tell | https://github.com/tensorflow/models/tree/master/research/im2txt | Tensorflow | Tensorflow parser incompatible with Fabrik | |
| 4 | Keras Image Caption | https://github.com/LemonATsu/Keras-Image-Caption | Keras | requires python 3.4+; fail with python 3.6 | |
| 5 | | https://github.com/amaiasalvador/imcap_keras | Keras | need MSCOCO | |
| 6 | Neural Image Captioning | https://github.com/oarriaga/neural_image_captioning | Keras | TimeDistributed Layer incompatible with | |

I wanted to add an image captioning model to Fabrik. It seemed pretty easy: all you have to do is get the model you want in JSON format, and there you go! All done. But in reality, it wasn’t that easy.
First, I had to find an image captioning model. My first choice was NeuralTalk since it’s pretty popular. On its GitHub page, I found that NeuralTalk has been obsoleted by NeuralTalk2, so I went with NeuralTalk2. After a hard time trying to install it and trying to make the JSON file, I realized something: NeuralTalk2 uses Torch as its framework, and Fabrik doesn’t support Torch. Fabrik only supports Caffe, Keras, and Tensorflow. (I never made the JSON file, by the way.)
I had to try another model. I ended up trying Neural Image Captioning by oarriaga on GitHub. Unlike NeuralTalk2, making the JSON file was pretty smooth.
So I tried to find another model. Show and Tell looked good. It uses Tensorflow as its framework, but unfortunately Fabrik <><><>< so I had to find another model.

Machine Learning And Artificial Intelligence Challenges in 2018

Host Challenge Name URL Prize Deadline
Visual Geometry Group Visual Domain Decathlon Challenge http://www.robots.ox.ac.uk/~vgg/decathlon/
Driven Data Concept to Clinic https://concepttoclinic.drivendata.org/ 100000 Early 2018
Driven Data Predicting Poverty https://www.drivendata.org/competitions/50/worldbank-poverty-prediction/ 15000 28 February 2018
Kaggle Mercari Price Suggestion Challenge https://www.kaggle.com/c/mercari-price-suggestion-challenge 100000
Kaggle Toxic Comment Classification Challenge https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge 35000 20 February 2018
Kaggle Nomad2018 Predicting Transparent Conductors https://www.kaggle.com/c/nomad2018-predict-transparent-conductors EUR 5000
Kaggle Statoil/C-CORE Iceberg Classifier Challenge https://www.kaggle.com/c/statoil-iceberg-classifier-challenge 50000 23 January 2018
Kaggle TensorFlow Speech Recognition Challenge https://www.kaggle.com/c/tensorflow-speech-recognition-challenge 25000 16 January 2018
Crowd AI WWW 2018 Challenge: Learning to Recognize Musical Genre https://www.crowdai.org/challenges/www-2018-challenge-learning-to-recognize-musical-genre
Crowd AI AI-generated music challenge https://www.crowdai.org/challenges/ai-generated-music-challenge
Crowdanalytix Business Analytics for Beginners Using R – Part I https://www.crowdanalytix.com/contests/business-analytics-for-beginners-using-r—part-i
Innocentive Machine Tagging Challenge https://www.innocentive.com/ar/challenge/9934063
Innocentive PET Imaging Probes for Visualization and Quantification of Oligonucleotide Exposure https://www.innocentive.com/ar/challenge/9934086
Topcoder Computer Vision – Duplicated Receipts Detector – Improvement of The Initial PoC https://www.topcoder.com/challenges/30061112/?type=develop
Topcoder Road Detector https://community.topcoder.com/longcontest/?module=ViewProblemStatement&rd=17036&pm=14735
https://community.topcoder.com/longcontest/?module=ViewStandings&rd=17036
Hackerrank Correlation and Regression Lines – A Quick Recap #1 https://www.hackerrank.com/challenges/correlation-and-regression-lines-6/problem
DataScience Power Consumption Forecasts https://www.datascience.net/fr/challenge/32/details
Analytics Vidhya Data Science Interview Preparation Test https://datahack.analyticsvidhya.com/contest/data-science-interview-preparation-test/
Quora Quora Challenges https://www.quora.com/challenges
Coda Lab LiTS – Liver Tumor Segmentation Challenge https://competitions.codalab.org/competitions/17094
Microsoft et al MS COCO http://cocodataset.org/
Kaggle & IEEE IEEE Signal Processing Cup https://signalprocessingsociety.org/get-involved/signal-processing-cup
Kaggle & Google Google Landmark Retrieval Challenge https://www.kaggle.com/c/landmark-retrieval-challenge
Kaggle & Google Google Landmark Recognition Challenge https://www.kaggle.com/c/landmark-recognition-challenge
Kaggle Humpback Whale Identification Challenge https://www.kaggle.com/c/whale-categorization-playground
Kaggle Digit Recognizer https://www.kaggle.com/c/digit-recognizer
Crowd AI Mapping Challenge https://www.crowdai.org/challenges/mapping-challenge
RTE Winter electricity demand forecast – a deterministic approach [Part 1] https://www.datascience.net/fr/challenge/33/details
RTE Winter electricity demand forecast – a probabilistic approach [Part 2] https://www.datascience.net/fr/challenge/34/details
Coda Lab Shallow Globe https://competitions.codalab.org/competitions/18113
Coda Lab AutoML2018 challenge PAKDD2018 https://competitions.codalab.org/competitions/17767
Driven Data Power Laws: Forecasting Energy Consumption https://www.drivendata.org/competitions/51/electricity-prediction-machine-learning/
Driven Data Power Laws: Detecting Anomalies in Usage https://www.drivendata.org/competitions/52/anomaly-detection-electricity/
Driven Data Power Laws: Optimizing Demand-side Strategies https://www.drivendata.org/competitions/53/optimize-photovoltaic-battery/
Kaggle Dog Breed Identification https://www.kaggle.com/c/dog-breed-identification
Robust Vision Challenge http://www.robustvision.net/
KITTI Vision Benchmark Suite http://www.cvlibs.net/datasets/kitti/
Clickbait Challenge http://www.clickbait-challenge.org/
Coda Lab Example-based Single-Image Super-Resolution Challenge https://competitions.codalab.org/competitions/18025
Hackerearth various challenges https://www.hackerearth.com/challenges/
Challenge Data https://challengedata.ens.fr/en/season/4/challenge_data_2018.html
Crowdanalytix Identifying Superheroes from Product Images https://www.crowdanalytix.com/contests/identifying-superheroes-from-product-images
Kaggle iMaterialist Challenge (Furniture) at FGVC5 https://www.kaggle.com/c/imaterialist-challenge-furniture-2018
Kaggle TalkingData AdTracking Fraud Detection Challenge https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection
Kaggle DonorsChoose.org Application Screening https://www.kaggle.com/c/donorschoose-application-screening
Kaggle Plant Seedlings Classification https://www.kaggle.com/c/plant-seedlings-classification
Kaggle iNaturalist Challenge at FGVC5 https://www.kaggle.com/c/inaturalist-2018
CrowdAI / EPFL AI-generated music challenge https://www.crowdai.org/challenges/ai-generated-music-challenge
CrowdAI Mapping Challenge https://www.crowdai.org/challenges/mapping-challenge
CrowdAI – CLEF LifeCLEF 2018 Expert https://www.crowdai.org/challenges/lifeclef-2018-expert
CrowdAI / CLEF LifeCLEF 2018 Geo – Location Based Species Recommendation https://www.crowdai.org/challenges/lifeclef-2018-geo
CrowdAI / CLEF ImageCLEF 2018 Tuberculosis – Severity scoring https://www.crowdai.org/challenges/imageclef-2018-tuberculosis-severity-scoring
https://www.crowdai.org/challenges/imageclef-2018-tuberculosis-tbt-classification
https://www.crowdai.org/challenges/imageclef-2018-caption-concept-detection
https://www.crowdai.org/challenges/imageclef-2018-vqa-med
https://www.crowdai.org/challenges/imageclef-2018-caption-caption-prediction
Alibaba Cloud & Met Office Future Challenge: Helping Balloons Navigate the Weather https://tianchi.aliyun.com/competition/introduction.htm?raceId=231622&_lang=en_US
ICPR2018 ICPR MTWI 2018 CHALLENGE 3: End to End Text Detection and Recognition of Web Images https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.5624d780ALjPWJ&raceId=231652
ICPR2018 ICPR MTWI 2018 CHALLENGE 2: Text Detection of Web Images https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.5624d780ALjPWJ&raceId=231651
ICPR2018 ICPR MTWI 2018 CHALLENGE 1: Text Recognition of Web Images https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.5624d780ALjPWJ&raceId=231650
Alibaba FashionAI Global Challenge 2018—Attributes Recognition of Apparel https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.5624d780ALjPWJ&raceId=231649
CAINIAO CAINIAO MSOM data-driven research competition https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.5624d780ALjPWJ&raceId=231623
Tianchi Sina Weibo Interaction-prediction-Challenge the Baseline https://tianchi.aliyun.com/getStart/introduction.htm?spm=5176.100066.0.0.56ecd780V1l8Q4&raceId=231574
IJCAI-18 IJCAI-18 Alimama Sponsored Search Conversion Rate(CVR) Prediction Contest https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.56ecd780V1l8Q4&raceId=231647
Tian Chi FashionAI Global Challenge—Key Points Detection of Apparel https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.56ecd780V1l8Q4&raceId=231648
Tianchi Clothes Matching Challenge on Taobao.com-Challenge the Baseline https://tianchi.aliyun.com/getStart/introduction.htm?spm=5176.100066.0.0.56ecd780V1l8Q4&raceId=231575
DCASE Acoustic scene classification http://dcase.community/challenge2018/task-acoustic-scene-classification
DCASE General Purpose audio tagging of Freesound content with AudioSet labels http://dcase.community/challenge2018/task-general-purpose-audio-tagging and https://www.kaggle.com/c/freesound-audio-tagging
DCASE Bird audio detection http://dcase.community/challenge2018/task-bird-audio-detection
DCASE Large-scale weakly labeled semi-supervised sound event detection in domestic environments http://dcase.community/challenge2018/task-large-scale-weakly-labeled-semi-supervised-sound-event-detection
DCASE Monitoring of domestic activities on multi-channel acoustics http://dcase.community/challenge2018/task-monitoring-domestic-activities
Agorize Smart City Innovation Award https://www.agorize.com/en/challenges/le-monde-smart-cities-2018-world
Open AI Open AI Retro Contest https://contest.openai.com/
IEEE Rebooting Computing Low-Power Image Recognition Challenge (LPIRC 2018) https://rebootingcomputing.ieee.org/lpirc
VIST-NAACL-2018 Visual Storytelling Challenge (NAACL 2018) https://evalai.cloudcv.org/web/challenges/challenge-page/76/overview
Automatic Visual Advertisements VQA – CVPR2018 Automatic Understanding of Visual Advertisements https://evalai.cloudcv.org/web/challenges/challenge-page/86/overview
VQA Challenge 2018 https://evalai.cloudcv.org/web/challenges/challenge-page/80/overview
Leaf Segmentation Challenge https://competitions.codalab.org/competitions/18405
DeepGlobe Land Cover Classification Challenge https://competitions.codalab.org/competitions/18468
Chalearn LAP Inpainting Competition Track 2 – Video decaptioning https://competitions.codalab.org/competitions/18421
Chalearn LAP Inpainting Competition Track 3 – Fingerprint Denoising and Inpainting https://competitions.codalab.org/competitions/18426
Chalearn LAP Inpainting Competition Track 1 – Inpainting of still images https://competitions.codalab.org/competitions/18423
Kaggle / Google Open Images Challenge 2018 https://storage.googleapis.com/openimages/web/challenge.html
Principal Financial Group IEEE Investment Ranking Challenge https://www.crowdai.org/challenges/ieee-investment-ranking-challenge
Stanford ML Group Bone X-Ray Deep Learning Competition https://stanfordmlgroup.github.io/competitions/mura/
Berkeley WAD 2018 Challenges http://bdd-data.berkeley.edu/wad-2018.html
Hackerearth Deep Learning Beginner Challenge https://www.hackerearth.com/challenge/competitive/deep-learning-beginner-challenge/
Alibaba Alibaba Global Scheduling Algorithm Competition https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.5f1cd780HajDHE&raceId=231663
IEEE ICDM 2018 IEEE ICDM 2018 Global A.I. Challenge on Meteorology: Catch Rain If You Can https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100066.0.0.47cbd780fgnIJX&raceId=231662

 

Installing EvalAI on Ubuntu 17.10

This article is adapted from the installation procedure at https://github.com/Cloud-CV/EvalAI

 

apt-get install openssh-server
apt-get install net-tools

Install the software dependencies:

apt-get install python2.7
apt-get install git
apt-get install postgresql

# Success. You can now start the database server using:
# /usr/lib/postgresql/9.6/bin/pg_ctl -D /var/lib/postgresql/9.6/main -l logfile start

apt-get install rabbitmq-server
apt-get install virtualenv
apt-get install python-psycopg2 # thanks to https://stackoverflow.com/questions/28253681/you-need-to-install-postgresql-server-dev-x-y-for-building-a-server-side-extensi
apt-get install libpq-dev
apt-get install python-dev
apt-get install build-essential

# clone the EvalAI code

git clone https://github.com/Cloud-CV/EvalAI.git evalai

# Create a python virtual environment and install python dependencies.

cd evalai
virtualenv venv
source venv/bin/activate # run this command every time before working on the project
pip install -r requirements/dev.txt

Everything up to this step worked; the steps that follow still need to be tested:

cp settings/dev.sample.py settings/dev.py

Use your postgres username and password for the USER and PASSWORD fields in the dev.py file.
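
For orientation, the database block in a Django settings file usually looks something like the sketch below. This is an assumption based on the standard Django `DATABASES` layout, not a copy of EvalAI’s actual settings/dev.py. The user name, password, host, and port shown here are placeholders to replace with your own values.

```python
# Sketch of a Django database block, assuming the standard DATABASES layout.
# Every value below except NAME is a placeholder, not an EvalAI default.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "evalai",                    # same name as in `createdb evalai`
        "USER": "my_postgres_user",          # placeholder: your postgres username
        "PASSWORD": "my_postgres_password",  # placeholder: your postgres password
        "HOST": "localhost",                 # assumption: database on the same machine
        "PORT": "5432",                      # assumption: default PostgreSQL port
    }
}
```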

Create an empty postgres database and run database migration.

sudo -i -u (username)
createdb evalai
python manage.py migrate --settings=settings.dev

Seed the database with some fake data to work with.

python manage.py seed --settings=settings.dev

This command also creates a superuser (admin), a host user, and a participant user with the following credentials:

SUPERUSER- username: admin password: password
HOST USER- username: host password: password
PARTICIPANT USER- username: participant password: password

That’s it. Now you can run the development server at http://127.0.0.1:8000 (for serving the backend):

python manage.py runserver --settings=settings.dev

Open a new terminal window with node (6.9.2) and ruby (gem) installed on your machine and type:

npm install

Install bower (1.8.0) globally by running:

npm install -g bower

Now install the bower dependencies by running:

bower install

If you are running npm install behind a proxy server, use:

npm config set proxy http://proxy:port

Now, to connect to the dev server at http://127.0.0.1:8888 (for serving the frontend), run:

gulp dev:runserver

That’s it. Open a web browser and go to http://127.0.0.1:8888.

(Optional) If you want to see the whole thing in action, start the RabbitMQ worker in a new terminal window with the following command. The worker consumes the submissions made for every challenge:

python scripts/workers/submission_worker.py

 

iNaturalist Experiment

Calculating Bandwidth under the 2016 Indihome Fair Use Policy

This post works out how much data a 10 Mbps Indihome plan can transfer if it is kept downloading continuously under ideal conditions.

Starting in February 2016, PT Telkom Indonesia applies the following Fair Use Policy (FUP) to Indihome.

Table: Indihome Fair Use Policy 2016

Assumptions:

  • 10 Mbps plan
  • 1 month is 30 days
  • The connection downloads continuously, limited only by the Indihome speed
  • B = byte, b = bit, 1 byte = 8 bits
  • 1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second

Calculation:

The plan goes through 3 speeds:

  • 10 Mbps while usage is below 300 GB
  • 75% x 10 Mbps = 7.5 Mbps while usage is between 300 GB and 400 GB
  • 40% x 10 Mbps = 4 Mbps once usage exceeds 400 GB

Duration at 10 Mbps:

300 GB x 8 b/B / 10 Mbps = 240000 seconds

Duration at 7.5 Mbps:

100 GB x 8 b/B / 7.5 Mbps = 106666 seconds

Duration at 4 Mbps:

30 days - 240000 seconds - 106666 seconds = 30x24x60x60 - 240000 - 106666 = 2245333 seconds

Here is the summary:

| | Speed | Duration | Total bytes |
|---|---|---|---|
| segment 1 | 10 Mbps | 240000 seconds | 300 GB |
| segment 2 | 7.5 Mbps | 106666 seconds | 100 GB |
| segment 3 | 4 Mbps | 2245333 seconds | 1122.7 GB |
| Total | | 2592000 seconds | 1522.7 GB |

So the total download over 30 days is 1522.7 GB. The average rate is 1522.7 GB / 2592000 seconds, which is about 587,000 bytes/s or roughly 4.7 Mbps.
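
The same arithmetic can be checked with a few lines of Python. This is a minimal sketch under the assumptions listed above (30-day month, decimal units with 1 GB = 10^9 bytes); the `TIERS` list and the `monthly_download` function are names made up for this example.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2592000 seconds
BASE_MBPS = 10

# (quota in GB at this speed, fraction of the base speed); None means no further cap.
TIERS = [(300, 1.0), (100, 0.75), (None, 0.40)]


def monthly_download(base_mbps=BASE_MBPS, tiers=TIERS, seconds=SECONDS_PER_MONTH):
    """Return the total GB downloaded in `seconds` when downloading non-stop."""
    remaining = seconds
    total_gb = 0.0
    for quota_gb, fraction in tiers:
        bytes_per_s = base_mbps * fraction * 1e6 / 8  # Mbps -> bytes per second
        if quota_gb is None:
            duration = remaining  # last tier lasts until the month ends
        else:
            duration = min(remaining, quota_gb * 1e9 / bytes_per_s)
        total_gb += duration * bytes_per_s / 1e9
        remaining -= duration
    return total_gb


total = monthly_download()
avg_mbps = total * 1e9 * 8 / SECONDS_PER_MONTH / 1e6
print(round(total, 1), "GB,", round(avg_mbps, 2), "Mbps on average")
```

Changing `BASE_MBPS` or the quotas in `TIERS` gives the figure for other plans, as long as the real FUP table uses the same three-step structure.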
