Science & Technology - Page 23 of 40

Lyft Self Driving Dataset and Competition

Lyft has just released its Level 5 Dataset and plans to release a corresponding challenge. Stay tuned.

Here is the dataset: https://level5.lyft.com/dataset/ . Its size is around 41 GB.

LVIS Challenge 2019

Facebook has just introduced a new dataset and a corresponding machine learning challenge at the same time. The dataset is named LVIS (Large Vocabulary Instance Segmentation).

It is a large-vocabulary dataset for visual instance segmentation.

The challenge: https://www.lvisdataset.org/challenge

Today, rigorous evaluation of general purpose object detectors is mostly performed in the few category regime (e.g. 80) or when there are a large number of training examples per category (e.g. 100 to 1000+). LVIS provides an opportunity to enable research in the setting where there are a large number of categories and where per-category data is sometimes scarce.
Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge.

The dataset: https://www.lvisdataset.org/dataset

Source: https://twitter.com/facebookai/status/1159548405867139074

Visualization in Scikit-learn

Scikit-learn has just introduced a simple API for machine learning visualizations. Here are a few visualization samples:

Plotting multiple ROC curves in a single graph

Making Indonesia 4.0

A collection of articles about “Making Indonesia 4.0”.

Official Making Indonesia 4.0 Documents

Making Indonesia 4.0 (Official Launch Document, 87 pages) (https://www.scribd.com/document/397793605/Making-Indonesia-President-Sign-Document-Bahasa-Final)
Making Indonesia 4.0 (Brief, 8 pages) (http://www.kemenperin.go.id/download/18384)
Making Indonesia 4.0 (Minister's Presentation 2018, 21 pages) (http://kemenperin.go.id/download/18427)
Making Indonesia 4.0 (National Standardization Seminar, Badan Standardisasi Nasional, BSN) (http://bsn.go.id/uploads/download/making_indonesia_4.0_-_kementerian_perindustrian.pdf)
Making Indonesia 4.0 (Ditjen ILMATE Seminar, Indonesia IoT Forum, 42 pages) (http://iotforum.id/wp-content/uploads/2018/11/01.-Paparan-BPPI-TIKI-Seminar-IoT-Batam-7-November-2018_shr.pdf)
Making Indonesia 4.0 (English, 55 pages) (http://mufidnilmada.staff.gunadarma.ac.id/Downloads/files/60245/Making_Indonesia_Socialization_Pack_vEnglish+%28002%29.pdf)

News Articles about Making Indonesia 4.0

Machine Learning and Data Science Competition 2019

Kaggle: https://www.kaggle.com/
DrivenData: https://www.drivendata.org/competitions/
CrowdANALYTIX: https://www.crowdanalytix.com/community
CodaLab: https://competitions.codalab.org/
DataScienceChallenge: https://www.datasciencechallenge.org/ (it has shown no activity for a long time)
CrowdAI: https://www.crowdai.org/challenges
Analytics Vidhya: https://datahack.analyticsvidhya.com/
TopCoder: https://www.topcoder.com/challenges (mostly competitive programming challenges, but several data science challenges are available)

Object Detection State of The Art Progress

A list of object detection methods that mark the progress of the field:

R-CNN
OverFeat
MultiBox
SPP-Net
MR-CNN
DeepBox
AttentionNet
Fast R-CNN
DeepProposal
Faster R-CNN
OHEM
YOLO v1
G-CNN
AZNet
Inside-OutsideNet (ION)
Hypernet
CRAFT
MultiPathNet (MPN)
SSD
GBDNet
CPF
MS-CNN
R-FCN
PVANET
DeepID-Net
NoC
DSSD
TDM
YOLO v2
Feature Pyramid Network (FPN)
RON
DCN
DeNet
CoupleNet
RetinaNet
DSOD
Mask R-CNN
SMN
YOLO v3
SIN
STDN
RefineDet
MLKP
Relation-Net
Cascade R-CNN
RFBNet
CornerNet
Pelee
MetaAnchor
SNIPER
M2Det

Reference: https://deeplearning.mit.edu/

Joint Press Release on the Sunda Strait Disaster

Below is the joint press release, which appears to be the most complete official announcement so far about the tsunami disaster in the Sunda Strait.

The joint press release was issued by the following agencies:

Badan Informasi Geospasial (BIG), the Geospatial Information Agency (https://twitter.com/InfoGeospasial)
Kementerian Koordinator Bidang Kemaritiman, the Coordinating Ministry for Maritime Affairs (https://twitter.com/kemaritiman)
Badan Pengkajian & Penerapan Teknologi (BPPT), the Agency for the Assessment and Application of Technology (https://twitter.com/BPPT_RI)
BMKG, the Meteorology, Climatology, and Geophysics Agency (https://twitter.com/infoBMKG)
Lembaga Ilmu Pengetahuan Indonesia (LIPI), the Indonesian Institute of Sciences (https://twitter.com/lipiindonesia)
Badan Geologi, the Geological Agency

The source of this data is the Twitter account of BIG (Badan Informasi Geospasial), in the form of JPG images. So far, no PDF or web version has been found.

Sources:

Badan Informasi Geospasial on Twitter (https://twitter.com/InfoGeospasial/status/1077338995137236992)
https://darilaut.id/berita/ini-hasil-rakor-tsunami-selat-sunda

Damage and Casualties from the December 2018 Sunda Strait Tsunami

The casualty and damage data were obtained from the Twitter account of Sutopo Purwo Nugroho (BNPB public relations) at https://twitter.com/Sutopo_PN/status/1077092955389812736

So far there is no official image on the website of Badan Nasional Penanggulangan Bencana (BNPB), the National Disaster Management Agency.

Simple Image Classification with Keras

There are several kinds of image classification:

Binary classification
Multiclass classification
Multi-label classification
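
To make the difference between the three kinds concrete, here is a small NumPy sketch of how the labels are usually encoded in each case. The class names and label values are invented for the example. The pairings mentioned in the comments (sigmoid or softmax output, binary or categorical cross-entropy) are the typical choices, not the only possible ones.

```python
import numpy as np

classes = ["cat", "dog", "bird"]  # hypothetical class names

# Binary classification: one 0/1 label per image.
# Typical pairing: a single sigmoid output with binary cross-entropy.
binary_labels = np.array([0, 1, 1])

# Multiclass classification: exactly one class per image, stored as one-hot rows.
# Typical pairing: softmax output with categorical cross-entropy.
multiclass_ids = np.array([2, 0, 1])
multiclass_labels = np.eye(len(classes))[multiclass_ids]

# Multi-label classification: any number of classes per image (multi-hot rows).
# Typical pairing: one sigmoid output per class with binary cross-entropy.
multilabel_labels = np.array([
    [1, 1, 0],   # cat and dog in the same image
    [0, 0, 1],   # bird only
    [0, 0, 0],   # none of the classes
])

print(multiclass_labels)
print(multilabel_labels.sum(axis=1))  # number of labels per image
```

The one-hot rows always sum to 1, while the multi-hot rows can sum to any value between 0 and the number of classes.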

Image generator methods for training:

keras.preprocessing.image.ImageDataGenerator.flow_from_directory()
keras.preprocessing.image.ImageDataGenerator.flow()

Various built-in models for training:

Xception
VGG16
VGG19
ResNet50
InceptionV3
InceptionResNetV2
MobileNet
DenseNet
NASNet
MobileNetV2

Keras built-in models usually come with weights pre-trained on ImageNet, which significantly speeds up training, but those weights are only available for some input image sizes.

There are two techniques to feed image files for prediction in Keras, both of them methods of keras.preprocessing.image.ImageDataGenerator:

ImageDataGenerator.flow_from_directory()
ImageDataGenerator.flow()

Simple Tutorials

Simple Binary Image Classifier with Keras with flow_from_directory()
Simple Binary Image Classifier with Keras with flow()
Simple Multiclass Image Classification with Keras and flow_from_directory()
Simple Multiclass Image Classification with Keras and flow()
Simple Multi Label Image Classification with Keras and flow()

Reference

Simple Multiclass Image Classification with Keras

This tutorial shows how to do multiclass image classification with Keras, using ImageDataGenerator.flow_from_directory() from keras.preprocessing.image to feed the image files for training and prediction.


Plant Seedlings Classification dataset

Prepare Directory Structure

Download the dataset from https://www.kaggle.com/c/plant-seedlings-classification/data
Put the original training files in <root>/data/train
Put the original test files in <root>/data/test/0 . Caution: test files must be placed in a subdirectory of <root>/data/test. For simplicity we create <root>/data/test/0, but any directory name will do, as long as it is under <root>/data/test
Create <root>/plant-src to store the source code

Here is the directory structure after the previous steps:
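
The layout described in the steps above can be sketched with pathlib. This is only an illustration: it builds the empty directories inside a temporary folder that stands in for <root>, and the helper name make_layout is made up for the example.

```python
import tempfile
from pathlib import Path

def make_layout(root):
    """Create the directories from the steps above and return them as relative paths."""
    root = Path(root)
    for sub in ("data/train", "data/test/0", "plant-src"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # List every directory below root, relative to root, in sorted order
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_dir())

# A temporary directory plays the role of <root> here
with tempfile.TemporaryDirectory() as tmp:
    layout = make_layout(tmp)

for path in layout:
    print(path)
```

After unpacking the Kaggle files, data/train additionally contains one subdirectory per plant species, and data/test/0 holds all the test images.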

Import Libraries

 import os
 import pathlib
 import shutil
 import numpy as np
 import tensorflow as tf
 import keras
 from keras.callbacks import CSVLogger, EarlyStopping, ModelCheckpoint
 from keras.layers import AveragePooling2D, BatchNormalization, Dense, Flatten, GlobalAveragePooling2D
 from keras.models import Model, load_model
 from keras.optimizers import RMSprop, SGD
 from keras.preprocessing.image import ImageDataGenerator

Make Training & Validation Directories

Create directories for the training and validation sets. This is slightly complicated, since flow_from_directory() requires that each class has its own directory.

 # make training & validation directories, one subdirectory per class
 import pathlib
 session = 'simpleNASNet'
 classnames = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']
 train_dir = "../" + session + "/train"
 valid_dir = "../" + session + "/valid"
 for dirname in classnames:
     for parent in (train_dir, valid_dir):
         fulldirname = parent + '/' + dirname
         print(fulldirname)
         pathlib.Path(fulldirname).mkdir(parents=True, exist_ok=True)

Split the original training data into a training set and a validation set. The usual 80%/20% split is used: every fifth file goes to the validation set.

 # copy image files, split 80% training / 20% validation
 # assumed location of the original training files, relative to <root>/plant-src
 original_data_dir = "../data/train"
 counter = 0
 for root, dirs, files in os.walk(original_data_dir):
     for file in files:
         fullfilename = os.path.join(root, file)
         # the class name is the name of the directory that holds the file
         classname = os.path.basename(root)
         # every fifth file goes to validation, the rest to training
         target_dir = valid_dir if counter % 5 == 0 else train_dir
         shutil.copyfile(fullfilename, os.path.join(target_dir, classname, file))
         counter += 1

Model

Prepare the model. We use NASNetLarge with 331×331 input and weights pre-trained on ImageNet. The top layers are omitted and replaced with global average pooling, a Dense layer of 1024 units followed by batch normalization, and a 12-unit output layer, one unit per class. The output activation is softmax, which is usual for multiclass classification.

 #prepare model  
 img_width=331  
 img_height=331  
 network_notop = keras.applications.nasnet.NASNetLarge(input_shape=(img_width, img_height, 3),  
                                  include_top=False,  
                                  weights='imagenet', input_tensor=None,  
                                  pooling=None)      
 x = network_notop.output  
 x = GlobalAveragePooling2D()(x)      
 x = Dense(1024, activation='relu')(x)      
 x = BatchNormalization()(x)  
 predictions = Dense(12, activation='softmax')(x)  
 the_model = Model(network_notop.input, predictions)

Training

Standard training. Parameters specific to multiclass classification:

loss='categorical_crossentropy' in model.compile()
class_mode='categorical' in flow_from_directory()

 #training  
 learning_rate = 0.0001   
 logfile = session + '-train' + '.log'   
 batch_size=4  
 nbr_epochs=10  
 print("training  directory: "+train_dir)  
 print("validation directory: "+valid_dir)  
 optimizer = SGD(lr=learning_rate, momentum=0.9, decay=0.0, nesterov=True)  
 the_model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])  
 csv_logger = CSVLogger(logfile, append=True)  
 early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='auto')  
 best_model_filename=session+'-weights.{epoch:02d}-{val_loss:.2f}.h5'   
 best_model = ModelCheckpoint(best_model_filename, monitor='val_acc', verbose=1, save_best_only=True)  
 # augmentation configuration used for training  
 train_datagen = ImageDataGenerator(  
     rescale=1. / 255,  
     shear_range=0.2,  
     zoom_range=0.2,  
     rotation_range=90,  
     width_shift_range=0.2,  
     height_shift_range=0.2,  
     horizontal_flip=True,  
     vertical_flip=True)  
 val_datagen = ImageDataGenerator(rescale=1. / 255)  
 print('prepare train generator')   
 train_generator = train_datagen.flow_from_directory(  
     train_dir,  
     target_size=(img_width, img_height),  
     batch_size=batch_size,  
     shuffle=True,  
     classes=classnames,  
     class_mode='categorical')  
 print('prepare validation generator')   
 validation_generator = val_datagen.flow_from_directory(  
     valid_dir,  
     target_size=(img_width, img_height),  
     batch_size=batch_size,  
     shuffle=True,  
     classes=classnames,  
     class_mode='categorical')  
 print('fit generator')   
 the_model.fit_generator(  
     generator=train_generator,  
     epochs=nbr_epochs,  
     verbose=1,  
     validation_data=validation_generator,  
     callbacks=[best_model, csv_logger, early_stopping])
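
The softmax output and the categorical cross-entropy loss used above can be worked through by hand. The following NumPy sketch is separate from the Keras code and uses made-up logits for three classes instead of twelve. The function names are chosen for the example and are not Keras APIs.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the log of the probability given to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -(y_true * np.log(y_pred)).sum(axis=-1)

# Made-up raw scores for two images and three classes
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])
probs = softmax(logits)

# One-hot targets, the format produced by class_mode='categorical'
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
losses = categorical_crossentropy(y_true, probs)

print(probs.round(3))
print(losses.round(3))
```

The first image puts most of its probability on the true class and gets a small loss. The second puts most of its probability on a wrong class and gets a much larger one.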

Training progress

 training  directory: ../simpleNASNet/train  
 validation directory: ../simpleNASNet/valid  
 prepare train generator  
 Found 3800 images belonging to 12 classes.  
 prepare validation generator  
 Found 950 images belonging to 12 classes.  
 fit generator  
 Epoch 1/10  
 950/950 [==============================] - 635s 669ms/step - loss: 1.2295 - acc: 0.6039 - val_loss: 0.6469 - val_acc: 0.7979  
 Epoch 00001: val_acc improved from -inf to 0.79789, saving model to simpleNASNet-weights.01-0.65.h5  
 Epoch 2/10  
 950/950 [==============================] - 557s 586ms/step - loss: 0.6281 - acc: 0.7929 - val_loss: 0.3840 - val_acc: 0.8674  
 Epoch 00002: val_acc improved from 0.79789 to 0.86737, saving model to simpleNASNet-weights.02-0.38.h5  
 Epoch 3/10  
 950/950 [==============================] - 557s 586ms/step - loss: 0.5220 - acc: 0.8345 - val_loss: 0.3026 - val_acc: 0.9000  
 Epoch 00003: val_acc improved from 0.86737 to 0.90000, saving model to simpleNASNet-weights.03-0.30.h5  
 Epoch 4/10  
 950/950 [==============================] - 558s 587ms/step - loss: 0.4369 - acc: 0.8566 - val_loss: 0.2830 - val_acc: 0.9105  
 Epoch 00004: val_acc improved from 0.90000 to 0.91053, saving model to simpleNASNet-weights.04-0.28.h5  
 Epoch 5/10  
 950/950 [==============================] - 558s 588ms/step - loss: 0.3722 - acc: 0.8842 - val_loss: 0.2310 - val_acc: 0.9253  
 Epoch 00005: val_acc improved from 0.91053 to 0.92526, saving model to simpleNASNet-weights.05-0.23.h5  
 Epoch 6/10  
 950/950 [==============================] - 559s 588ms/step - loss: 0.3213 - acc: 0.8966 - val_loss: 0.2210 - val_acc: 0.9232  
 Epoch 00006: val_acc did not improve from 0.92526  
 Epoch 7/10  
 950/950 [==============================] - 556s 585ms/step - loss: 0.3202 - acc: 0.8939 - val_loss: 0.2190 - val_acc: 0.9263  
 Epoch 00007: val_acc improved from 0.92526 to 0.92632, saving model to simpleNASNet-weights.07-0.22.h5  
 Epoch 8/10  
 950/950 [==============================] - 559s 589ms/step - loss: 0.2997 - acc: 0.9063 - val_loss: 0.1861 - val_acc: 0.9389  
 Epoch 00008: val_acc improved from 0.92632 to 0.93895, saving model to simpleNASNet-weights.08-0.19.h5  
 Epoch 9/10  
 950/950 [==============================] - 554s 584ms/step - loss: 0.2469 - acc: 0.9203 - val_loss: 0.1942 - val_acc: 0.9379  
 Epoch 00009: val_acc did not improve from 0.93895  
 Epoch 10/10  
 950/950 [==============================] - 557s 587ms/step - loss: 0.2619 - acc: 0.9147 - val_loss: 0.1695 - val_acc: 0.9421  
 Epoch 00010: val_acc improved from 0.93895 to 0.94211, saving model to simpleNASNet-weights.10-0.17.h5
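
Since the weights file is chosen manually in the next step, it can help to pick the best epoch from the log programmatically. The sketch below is one way to do that with the standard library. It parses three epochs copied from the output above, finds the epoch with the highest val_acc, and rebuilds the checkpoint filename from the same pattern given to ModelCheckpoint. The regular expressions and the parse_log helper are written for this example only.

```python
import re

# Three epochs copied from the training output above
log_lines = [
    "Epoch 8/10",
    "950/950 [==============================] - 559s 589ms/step - loss: 0.2997 - acc: 0.9063 - val_loss: 0.1861 - val_acc: 0.9389",
    "Epoch 9/10",
    "950/950 [==============================] - 554s 584ms/step - loss: 0.2469 - acc: 0.9203 - val_loss: 0.1942 - val_acc: 0.9379",
    "Epoch 10/10",
    "950/950 [==============================] - 557s 587ms/step - loss: 0.2619 - acc: 0.9147 - val_loss: 0.1695 - val_acc: 0.9421",
]

epoch_re = re.compile(r"^Epoch (\d+)/\d+$")
metric_re = re.compile(r"val_loss: ([\d.]+) - val_acc: ([\d.]+)")

def parse_log(lines):
    """Return a list of (epoch, val_loss, val_acc) tuples found in the log lines."""
    results = []
    epoch = None
    for line in lines:
        line = line.strip()
        match = epoch_re.match(line)
        if match:
            epoch = int(match.group(1))
            continue
        match = metric_re.search(line)
        if match and epoch is not None:
            results.append((epoch, float(match.group(1)), float(match.group(2))))
    return results

results = parse_log(log_lines)
# Pick the epoch with the highest validation accuracy
best_epoch, best_val_loss, best_val_acc = max(results, key=lambda r: r[2])

# Same filename pattern as the ModelCheckpoint callback in the training code
pattern = "simpleNASNet-weights.{epoch:02d}-{val_loss:.2f}.h5"
best_file = pattern.format(epoch=best_epoch, val_loss=best_val_loss)
print(best_file)
```

For these lines the result is the checkpoint written after epoch 10, the same file that is used for prediction below.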

Prediction & Submission

Caution: test files must be placed in a subdirectory of <root>/data/test. For simplicity we create <root>/data/test/0, but any directory name will do, as long as it is under <root>/data/test. This looks odd, but flow_from_directory() always expects the images to sit in subdirectories, so it works the same way in the training phase and in the prediction/inference phase.

 # prediction
 batch_size = 4
 nbr_test_samples = 794
 img_width = 331
 img_height = 331
 # choose the weights file manually; the filename may be different
 weights_path = 'simpleNASNet-weights.10-0.17.h5'
 test_data_dir = '../data/test/'
 test_datagen = ImageDataGenerator(rescale=1. / 255)
 test_generator = test_datagen.flow_from_directory(
     test_data_dir,
     target_size=(img_width, img_height),
     batch_size=batch_size,
     shuffle=False,  # no shuffling: the order of filenames must match the order of predictions
     classes=None,
     class_mode=None)  # test data has no labels
 test_image_list = test_generator.filenames
 print('Loading model and weights')
 predict_model = load_model(weights_path)
 print('Begin to predict for testing data ...')
 # the second argument is the number of batches (steps), not the number of samples
 nbr_steps = int(np.ceil(nbr_test_samples / batch_size))
 predictions = predict_model.predict_generator(test_generator, nbr_steps)
 # store the prediction matrix, for later analysis if necessary
 np.savetxt(session + '-predictions.txt', predictions)

Constructing submission file

 # submission
 submission_file = session + '-submit.csv'
 print('Begin to write submission file: ' + submission_file)
 with open(submission_file, 'w') as f_submit:
     f_submit.write('file,species\n')
     for i, image_name in enumerate(test_image_list):
         # find the class with the maximum predicted probability among the 12 classes
         max_index = 0
         max_value = 0
         for x in range(0, 12):
             if predictions[i][x] > max_value:
                 max_value = predictions[i][x]
                 max_index = x
         basename = os.path.basename(image_name)
         prediction_class = classnames[max_index]  # map the index back to the species name
         f_submit.write('%s,%s\n' % (basename, prediction_class))
 print('Finished writing submission file.')
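
The manual search for the largest of the 12 probabilities is what np.argmax does in one call. Here is a minimal alternative sketch that uses np.argmax and the csv module. The prediction values and image names are invented so that the example runs without a trained model, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io
import os

import numpy as np

classnames = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed',
              'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize',
              'Scentless Mayweed', 'Shepherds Purse',
              'Small-flowered Cranesbill', 'Sugar beet']

# Made-up prediction matrix: 3 images x 12 class probabilities
predictions = np.full((3, 12), 0.01)
predictions[0, 7] = 0.89
predictions[1, 0] = 0.89
predictions[2, 11] = 0.89

# Hypothetical filenames in the form returned by test_generator.filenames
image_names = ["0/img_a.png", "0/img_b.png", "0/img_c.png"]

# Index of the most probable class for every row at once
best = predictions.argmax(axis=1)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["file", "species"])
for name, idx in zip(image_names, best):
    writer.writerow([os.path.basename(name), classnames[idx]])

print(buffer.getvalue())
```

Replacing the buffer with an opened file gives the same submission format as the loop above.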

To check the final score, go to the Late Submission page of the Kaggle Plant Seedlings Classification competition. The score is 0.96095, which ranks around 400th on the leaderboard.

Late submission score