DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD) Challenge.

Aim

The aim of this challenge is to evaluate algorithms for automated fundus image quality estimation and grading of diabetic retinopathy.

Abstract

Diabetic retinopathy (DR) is the most prevalent cause of avoidable vision impairment, mainly affecting the working-age population worldwide. Early diagnosis and timely treatment of DR help prevent blindness. This can be achieved through a coordinated effort to organize large, regular screening programs, particularly computer-assisted ones. Well-acquired, large, and diverse retinal image datasets are essential for developing and testing digital screening programs and the automated algorithms at their core. We therefore provide a large retinal image dataset, DeepDR (Deep Diabetic Retinopathy), to facilitate the following investigations in the community.

First, unlike previous studies, and to improve the precision and robustness of early diagnosis in practice, we provide dual-view fundus images of the same eye, one centered on the optic disc and one centered on the fovea, for classifying and grading DR lesions. Models built on these images are expected to outperform state-of-the-art models built on single-view fundus images. Second, the DeepDR dataset includes fundus images of varying quality to reflect real clinical conditions. We expect participants to build a model that estimates the image quality level and thereby provides supportive guidance for fundus image-based diagnosis. Lastly, to explore the limits of generalizability of a DR grading system, we aim for a model that transfers the DR diagnosis capability learned from a large number of regular fundus images to ultra-widefield retinal images. Regular fundus images are usually used for initial screening; ultra-widefield scanning serves as a further screening method because it provides a more complete view of the eye.

To the best of our knowledge, DeepDR is the largest database of the DR patient population, with data from more than 1,000 patients. In addition, it is the only dataset containing dual-view fundus images of the same eyes together with images of various distinguishable quality levels. The dataset provides the DR severity and the image quality level for each image.

Moreover, we provide the first ultra-widefield retinal image dataset to facilitate the study of model generalizability and to extend DR diagnosis from traditional fundus imaging to wide-field retinal photography. This makes the dataset well suited to the development and evaluation of image analysis algorithms for early detection of diabetic retinopathy.

Challenge

The challenge is divided into three tasks as follows (participants may submit results for one or more of them):

● Disease Grading: Classification of fundus images according to the severity level of diabetic retinopathy using dual-view retinal fundus images. For more details please refer to Sub-challenge 1.

● Image Quality Estimation: Fundus quality assessment for overall image quality, artifacts, clarity, and field definition. For more details please refer to Sub-challenge 2.

● Transfer Learning: Explore the generalizability of a Diabetic Retinopathy (DR) grading system. For more details please refer to Sub-challenge 3.

WeatherBench: A benchmark dataset for data-driven weather forecasting

GitHub: https://github.com/pangeo-data/WeatherBench

Paper: https://arxiv.org/abs/2002.00469

Announcement: https://twitter.com/raspstephan/status/1229272564729614336

Installing Apereo CAS for Development

The following procedure installs CAS (https://www.apereo.org/projects/cas) for development purposes only, for example to test Single Sign-On (SSO) functionality in an application. It is not intended for production.

Limitations:

The web server runs as a standalone WAR; for production, use a Java servlet container such as Tomcat.
User credentials are stored in cleartext; for production, use a proper user store such as LDAP.
The SSL certificate is self-signed; for production, use a CA-signed certificate.

Procedure:
Create a VM, for example in VirtualBox. A 10 GB disk is sufficient; after CAS is installed, about 4.05 GB of space is in use.
Install Ubuntu 19.10, preferably the server edition (http://releases.ubuntu.com/19.10/ubuntu-19.10-live-server-amd64.iso) because it is smaller.

CAS runs on Java, so install the Java Development Kit (a 288 MB download that uses about 800 MB of disk space):

apt install default-jdk

Install git:

apt install git

Clone the CAS overlay template:

cd /opt
git clone https://github.com/apereo/cas-overlay-template
cd cas-overlay-template

Select CAS version 6.1, then run the build:

git checkout 6.1
./gradlew clean build

Create the keystore:

./gradlew createKeystore

Copy the CAS configuration from /opt/cas-overlay-template/etc/cas to /etc/cas:

./gradlew copyCasConfiguration

Run CAS as an executable WAR:

./gradlew run

Open the site (for example https://192.168.0.202:8443). The browser shows a warning because the certificate is self-signed; click “Accept the Risk and Continue”.
Then browse to https://192.168.0.202:8443/cas

Next, try logging in to CAS with username casuser and password Mellon.

If the login succeeds, a success screen appears.

If the password is wrong, an error screen appears.

Adding new users and passwords:

Edit the file /etc/cas/config/cas.properties. Users and passwords can be added with a line like the following:

cas.authn.accept.users=casuserz::Mellon,abcd::efgh, user1::123456, user2::abcdefg

The username and password are separated by ‘::’, and users are separated by commas.
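
As a small illustration of that format, the following Python sketch splits such a property line into a username-to-password mapping. It is only a reading aid for the format described above and does not reproduce how CAS itself parses the property; the function name parse_accept_users is hypothetical, and stripping the spaces around each entry is an assumption of this sketch.

```python
def parse_accept_users(value):
    """Split a 'user::password,user::password' string into a dict."""
    users = {}
    for entry in value.split(","):
        entry = entry.strip()  # assumption: spaces around an entry are not significant
        if not entry:
            continue
        username, sep, password = entry.partition("::")
        if not sep:
            raise ValueError(f"missing '::' separator in {entry!r}")
        users[username] = password
    return users


line = "cas.authn.accept.users=casuserz::Mellon,abcd::efgh, user1::123456, user2::abcdefg"
key, _, value = line.partition("=")
users = parse_accept_users(value)
print(key, users)
```

A helper like this can be handy for checking a long user list for typos before restarting CAS.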

Making CAS usable by CAS clients requires the following steps:

Enable the JSON service registry in build.gradle, then rebuild CAS.
Create the directory /etc/cas/services to hold the JSON service registry files.
Add the location of the JSON service registry files to /etc/cas/config/cas.properties.

nano /opt/cas-overlay-template/build.gradle

Edit /opt/cas-overlay-template/build.gradle so that it contains this section:

dependencies {
compile "org.apereo.cas:cas-server-support-json-service-registry:${casServerVersion}"
}

Then rebuild CAS:

./gradlew clean build

Create the file /etc/cas/services/wildcard-1000.json with the following contents:

{
"@class" : "org.apereo.cas.services.RegexRegisteredService",
"serviceId" : "^(https|imaps)://.*",
"name" : "wildcard",
"id" : 1000,
"evaluationOrder" : 99999
}
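
To see which client URLs this wildcard definition would accept, the serviceId pattern can be tried out in a few lines of Python. This is a sketch only: it uses Python's re module rather than the Java regex engine that CAS runs on (for a pattern this simple the two should agree), and the example.org URLs and the is_allowed helper are hypothetical.

```python
import json
import re

# The same service definition as above, as a Python dict.
service = {
    "@class": "org.apereo.cas.services.RegexRegisteredService",
    "serviceId": "^(https|imaps)://.*",
    "name": "wildcard",
    "id": 1000,
    "evaluationOrder": 99999,
}


def is_allowed(service_url):
    """Return True if the URL matches the serviceId pattern."""
    return re.match(service["serviceId"], service_url) is not None


# Hypothetical client URLs, used only to exercise the pattern.
for url in ("https://app.example.org/login",
            "imaps://mail.example.org",
            "http://app.example.org/login"):
    print(url, is_allowed(url))

# Serialising with json.dumps yields straight double quotes, as JSON requires.
service_json = json.dumps(service, indent=2)
```

Note that plain http:// URLs do not match, so a client application served without TLS would not be covered by this definition.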

Add the following lines to /etc/cas/config/cas.properties:

cas.serviceRegistry.initFromJson=false
cas.serviceRegistry.json.location=file:/etc/cas/services

Testing with a CAS client

Example CAS client configuration in Drupal 8.8.1

AMLD 2020 – Transfer Learning for International Crisis Response

URL: https://www.aicrowd.com/challenges/amld-2020-transfer-learning-for-international-crisis-response

What’s the Challenge?

Background

Over the past 3 years, humanitarian information analysts have been using an open source platform called DEEP to facilitate collaborative, joint analysis of unstructured data. The aim of the platform is to provide insights from years of historical and in-crisis humanitarian text data. The platform allows users to upload documents and classify text snippets according to predefined humanitarian target labels, which are grouped into so-called analytical frameworks. DEEP is now in use at several international humanitarian organizations and at the United Nations across the globe.

While DEEP comes with a generic analytical framework, each organization may also create its own custom framework based on the specific needs of its domain. Although humanitarian organizations share a large conceptual overlap, different domains define slightly different analytical frameworks to describe their specific concepts. Despite these differences, the frameworks of different domains still share various degrees of conceptual (semantic) linkage, for instance in sectors such as Food Security and Livelihoods, Health, Nutrition, and Protection.

Challenge

Currently, the ML/NLP elements of DEEP are trained separately for each organization, using the annotated data provided by that organization. For organizations that are just starting to work with DEEP, especially those with their own custom frameworks, the text classifier performs poorly because there is not enough tagged data. For these organizations, DEEP faces a cold-start challenge.

This challenge is a unique opportunity to address this issue with a wide impact. It not only enables better text classification, but also showcases the conceptual semantic linkages between the sectors of various organizations, ultimately resulting in improved analysis of the humanitarian situation across domains. You will be provided with the data of four organizations, consisting of text snippets and their corresponding target sectors. Three of the organizations have the same analytical framework (target labels), and one has a slightly different one.

The aim is to learn novel text classification models that can transfer knowledge across organizations, and specifically to improve classification effectiveness for the organizations with smaller amounts of available training data. Ideally, transfer and joint learning methods provide a robust solution for the lack of data in data-sparse scenarios.

Societal Impact

The DEEP project provides effective solutions to analyze and harvest data from secondary sources such as news articles, social media, and reports that are used by responders and analysts in humanitarian crises. During crises, rapidly identifying important information in the constantly increasing data is crucial to understand the needs of affected populations and to improve evidence-based decision making. Despite the general effectiveness of DEEP, its ML-based features (in particular the text classifier) lack sufficient accuracy, especially in domains with little or no training data.

The benefits of the challenge would be seen immediately in the improved quality of the humanitarian community’s secondary data analysis. Humanitarian analysts would be able to spend their time on what the human mind does best: subjective analysis of information. The legwork of tasks that are easier to automate, such as initial sourcing of data and extraction of potentially relevant information, can be left to their android counterparts. With these improvements, the time required to gain key insights in humanitarian situations will be greatly reduced, and valuable aid and assistance can be distributed in a more efficient and targeted manner, while bringing together in-crisis information and crucial contextual information on socio-economic issues, human rights, peace missions, etc. that are currently disjointed.

What should I know to get involved?

The challenge is to classify multilingual text snippets from 4 organizations into 12 sectors (labels). The data is provided in 4 sets, each belonging to one humanitarian organization. The amount of available data differs greatly across the organizations. The first 3 organizations use the same set of sectors; the 4th is tagged with a different set of sectors, which nevertheless overlap semantically with those of the first three. The success of the final classifiers is measured based on the average of the prediction accuracies over the organizations.

Resources

The data consists of 4 sets, belonging to 4 organizations (org1 to org4), and each comes with a development set (orgX_dev), and a test set (orgX_test).

The development sets contain the following fields:

id: the unique identifier of text snippet; a string value, created by concatenating the name of the organization with a distinct number, for example org1_13005.
entry_original: the original text of the snippet, provided in languages such as English and Spanish.
language: the language of the text snippet.
entry_translated: the translation of the text snippet to English, done using Google Translator.
labels: the label identifiers of the sectors. Each entry can have several labels. These labels are separated with semicolons (;).
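
As an illustration of these fields, the sketch below splits a semicolon-separated labels value into individual identifiers and recovers the organization from an id. It makes no assumption about the file format of the development sets and works on a single record held in a dict; the helper names and the sample label values are hypothetical, and treating label identifiers as integers is an assumption based on the submission example.

```python
def parse_labels(labels_field):
    """Turn a value such as '4;9' into a list of integer label identifiers."""
    return [int(part) for part in labels_field.split(";") if part.strip()]


def organization_of(snippet_id):
    """Return the organization prefix of an id such as 'org1_13005'."""
    return snippet_id.split("_", 1)[0]


# Hypothetical record: the id follows the documented pattern, the labels are made up.
row = {"id": "org1_13005", "labels": "4;9"}

labels = parse_labels(row["labels"])
org = organization_of(row["id"])
print(org, labels)
```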

The test sets contain the following fields:

id: the unique identifier of text snippet.
entry_original: the original text of the snippet.
language: the language of the text snippet.
entry_translated: the translation of the text snippet to English, done using Google Translator.

Important: As mentioned before, the first three organizations have the same labels, but the fourth has a different set. The sector corresponding to each label identifier is given in the label_captions file. Later in this section, you can find a detailed explanation of the meaning of these sectors and their potential semantic relations.

Submissions

As mentioned above, each entry in the training data can have one or more labels (sectors). For submission, however, you should provide a prediction of only one label, namely the most probable one.

Given the test sets of the 4 organizations, the submissions should be provided in comma-separated (,) CSV format, containing the following two fields:

id: the unique identifier of text snippets in the test sets
predicted_label: the unique identifier of ONE predicted label

The submission file contains the predictions for all 4 organizations together. Here is an example of a submission file:

id,predicted_label
org1_8186,1
org1_11018,10
…
org2_3828,5
org2_5340,9
…
org3_2206,8
org3_1875,4
…
org4_75,107
org4_158,104
…
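
A file in this layout can be produced with Python's standard csv module. The sketch below is one possible way to do it, not an official tool of the challenge: the function name write_submission is hypothetical, and the predictions are simply a few rows taken from the example above.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (id, predicted_label) pairs as a comma-separated file with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "predicted_label"])
    for snippet_id, label in predictions:
        writer.writerow([snippet_id, label])


# One predicted label per test snippet, for all organizations together.
predictions = [("org1_8186", 1), ("org1_11018", 10), ("org4_75", 107)]

buffer = io.StringIO()
write_submission(predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.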

Evaluation

The evaluation is done based on the mean of accuracy values over the organizations: we first calculate the accuracy of the predictions of the test data of each organization, and then report the average of these 4 accuracy values. This measure is referred to as Mean of Accuracies.

Since the reference data, like the training data, can assign one or more labels to each entry, a prediction is counted as correct when it matches at least one of the reference labels.

This evaluation measure gives the same weight to each organization, although the organizations have different numbers of test entries. It incentivizes good performance on the organizations with less available training (and test) data, as they have the same importance as the others.
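
The measure described above can be sketched in a few lines of Python. This is an illustration written from the description only and is not the official deep_evaluator.py; the function name, the data structures, and the sample ids and labels are hypothetical.

```python
from collections import defaultdict


def mean_of_accuracies(predictions, references):
    """predictions: id -> one predicted label; references: id -> set of reference labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for snippet_id, gold_labels in references.items():
        org = snippet_id.split("_", 1)[0]  # 'org1_13005' -> 'org1'
        total[org] += 1
        # Correct if the single prediction matches at least one reference label.
        if predictions.get(snippet_id) in gold_labels:
            correct[org] += 1
    per_org = {org: correct[org] / total[org] for org in total}
    return sum(per_org.values()) / len(per_org), per_org


# Hypothetical toy data: two entries for org1, one for org4.
references = {"org1_1": {1, 2}, "org1_2": {5}, "org4_1": {104}}
predictions = {"org1_1": 2, "org1_2": 4, "org4_1": 104}

score, per_org = mean_of_accuracies(predictions, references)
print(score, per_org)
```

In this toy case org1 scores 0.5 and org4 scores 1.0, so the Mean of Accuracies is 0.75, whereas pooling all three entries would give 2/3. The difference shows how the measure lifts the weight of organizations with few entries.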

To facilitate the development and test of the systems, we provide the evaluation script (deep_evaluator.py), available in Resources.

A Guide to the Sectors

Humanitarian response is organised in thematic clusters. Clusters are groups of humanitarian organizations, both UN and non-UN, in each of the main sectors of humanitarian action, e.g. water, health and logistics. They serve as a global organizing principle for coordinating humanitarian response.

Sectors for the first, second, and third organization:

(1) Agriculture
(2) Cross: short form of Cross-sectoral; areas of humanitarian response that require action in more than one sector. For example malnutrition requires humanitarian interventions in health, access to food, access to basic hygiene items and clean water, and access to non-food items such as bottles to feed infants.
(3) Education
(4) Food
(5) Health
(6) Livelihood: Access to employment and income
(7) Logistics: Any logistical support needed to carry out humanitarian activities e.g. air transport, satellite phone connection etc.
(8) NFI: Non-food items needed in daily life, such as bedding, mattresses, jerrycans, and coal or oil for heating
(9) Nutrition
(10) Protection
(11) Shelter
(12) WASH (Water, Sanitation and Hygiene)

Sectors for the fourth organization:

(101) Child Protection
(102) Early Recovery and Livelihoods
(103) Education
(104) Food
(105) GBV: Gender Based Violence
(106) Health
(107) Logistics
(108) Mine Action
(109) Nutrition
(110) Protection
(111) Shelter and NFIs
(112) WASH

Snake Species Identification Challenge

URL: https://www.aicrowd.com/challenges/snake-species-identification-challenge

Snakebite is the most deadly neglected tropical disease (NTD), being responsible for a dramatic humanitarian crisis in global health.

Snakebite causes over 100,000 human deaths and 400,000 victims of disability and disfigurement globally every year. It affects poor and rural communities in developing countries, which host the highest venomous snake diversity and bear the highest burden of snakebite due to limited medical expertise and access to antivenoms.

Antivenoms can be life‐saving when correctly administered but this depends first on the correct taxonomic identification (i.e. family, genus, species) of the biting snake. Snake identification is challenging due to:

their high diversity
the incomplete or misleading information provided by snakebite victims
the lack of knowledge or resources in herpetology that healthcare professionals have

In this challenge we want to explore how Machine Learning can help with snake identification, in order to potentially reduce erroneous and delayed healthcare actions.

Dive into Deep Learning

An interactive deep learning book with code, math, and discussions, based on the NumPy interface. https://d2l.ai/

IDAO: International Data Analysis Olympiad

URL: https://idao.world/

Higher School of Economics and Yandex are proud to announce the 3rd International Data Analysis Olympiad.

The event is open to all teams and individuals, be they undergraduate, postgraduate or PhD students, company employees, researchers or new data scientists.

The event aims to bridge the gap between the ever-increasing complexity of Machine Learning models and the performance bottlenecks of the industry. The participants will strive not only to maximize the quality of their predictions, but also to devise resource-efficient algorithms.

This will be a team machine learning competition, divided into two stages. The first stage will be online, open to all participants. The second stage will be the offline on-site finals, in which the top 30 performing teams from the online round will compete at the Yandex office in Moscow.

Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience

LINK: https://www.drivendata.org/competitions/60/building-segmentation-disaster-resilience/

DrivenData: Hakuna Ma-data: Identify Wildlife on the Serengeti with AI for Earth

Link: https://www.drivendata.org/competitions/59/camera-trap-serengeti/

Overview

Leverage millions of images of animals on the Serengeti to build a classifier that distinguishes between gazelles, lions, and more!

In this competition, participants will predict the presence and species of wildlife in new camera trap data from the Snapshot Serengeti project, which boasts over 6 million images.

Camera traps are motion-triggered systems for passively collecting animal behavior data with minimal disturbance to their natural tendencies. Camera traps are an invaluable tool in conservation research, but the sheer amount of data they generate presents a huge barrier to using them effectively. This is where AI can help!

There are two immediate challenges where efforts like this competition are needed:

Camera traps can’t automatically label the animals they observe, creating an immense (and sometimes prohibitive) burden on humans to determine where and what wildlife are present.
Even when automated animal tagging models are available, the models that do exist don’t generalize well across time and locations, severely limiting their usefulness with new data.

To address these challenges, we’re inviting data scientists, researchers, and developers from around the world to build the best algorithms for wildlife detection.

The competition is designed with a few objectives in mind:

Innovation: Participants use state-of-the-art approaches in computer vision and AI and get live feedback on how well their solutions perform
Generalization: This competition is designed to reward the best generalizable solutions. The private test data used to determine the winners will come entirely from the latest, unreleased season of data from the Snapshot Serengeti project (season 11). For more information on the competition timeline and evaluation, see the problem description.
Execution: Models are trained locally and submitted to execute inference in the cloud – read on!
Openness: All prize-winning models are released under an open source license for anyone to use and learn from

This is a brand new kind of DrivenData challenge! Previous models trained for camera trap images have often failed to generalize well. In this competition, we want to reward the models that generalize best to new images, so you won’t interact directly with the test set. Rather than submitting your predicted labels for a test set you have, you’ll package up everything needed to do inference and send it to us. We’ll execute that code on Azure in a Docker container that has access to the test set images. By leveraging Microsoft Azure’s cloud computing platform and Docker containers, we’re moving our competition infrastructure one step closer to translating participants’ innovation into impact.

We can’t wait to run what you come up with!

National Data Science Challenge

National Data Science Challenge: https://careers.shopee.co.id/ndsc/

The objectives of the National Data Science Challenge (NDSC) are to:

Bring the tech community closer through working together and knowledge sharing
Provide an environment for the development of creative new ideas in Data Science
Equip students and professionals with essential technical skills and expertise to prepare them for Industry 4.0

Through this competition, we hope to showcase the ubiquity and usefulness of data in getting insights, thus reinforcing Indonesia’s emphasis on driving the digital economy and the use of big data.