Round 1: Completed

PlantVillage Disease Classification Challenge

PlantVillage is built on the premise that all knowledge that helps people grow food should be openly accessible to anyone on the planet.


We depend on edible plants just as we depend on oxygen. Without crops, there is no food, and without food, there is no life. It’s no accident that human civilization began to thrive with the invention of agriculture.

Today, modern technology allows us to grow crops in quantities necessary for a steady food supply for billions of people. But diseases remain a major threat to this supply, and a large fraction of crops are lost each year to diseases. The situation is particularly dire for the 500 million smallholder farmers around the globe, whose livelihoods depend on their crops doing well. In Africa alone, 80% of the agricultural output comes from smallholder farmers.

With billions of smartphones around the globe, wouldn’t it be great if the smartphone could be turned into a disease diagnostics tool, recognizing diseases from images it captures with its camera? This challenge is the first of many steps toward turning this vision into a reality. PlantVillage is a not-for-profit project by Penn State University in the US and EPFL in Switzerland. We have collected, and continue to collect, tens of thousands of images of diseased and healthy crops. The goal of this challenge is to develop algorithms that can accurately diagnose a disease based on an image.

The dataset offers 38 classes of crop-disease pairs.

To learn more about the background of the dataset, please refer to the following paper: http://arxiv.org/abs/1511.08060. You must cite this paper if you use the dataset.

Evaluation criteria

Submissions will be evaluated using the Mean F1 score and the multi-class Mean Log Loss, which are defined as follows:

Mean F1 score

The F1 score is computed separately for each class using:

$$F_1 = \frac{2 p r}{p + r}, \qquad p = \frac{tp}{tp + fp}, \qquad r = \frac{tp}{tp + fn}$$

where:

  • p refers to the precision
  • r refers to the recall
  • tp refers to the number of True Positives,
  • fp refers to the number of False Positives
  • fn refers to the number of False Negatives

Finally, the mean of the F1 scores across all classes gives the combined Mean F1 score.
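As a rough illustration of the Mean F1 computation described above, here is a minimal Python sketch. The function name `mean_f1` and its arguments are hypothetical, and the convention of scoring a class as 0 when its precision or recall is undefined is an assumption, not something the challenge specifies. It is not the official evaluation code.

```python
from collections import Counter

def mean_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores over `labels` (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[guess] += 1      # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative
    scores = []
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        # Assumption: undefined precision/recall/F1 is treated as 0.
        p = tp[c] / p_den if p_den else 0.0
        r = tp[c] / r_den if r_den else 0.0
        scores.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return sum(scores) / len(scores)
```

For example, `mean_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], labels=[0, 1, 2])` averages the per-class scores 0.5, 0.8 and 2/3.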

Mean Log Loss

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \ln(p_{ij})$$

where:

  • N is the total number of examples in the test set
  • M is the total number of class labels (38 for this challenge)
  • y_ij is a boolean value indicating whether the i-th instance in the test set belongs to the j-th label
  • p_ij is the probability, according to your submission, that the i-th instance belongs to the j-th label
  • ln is the natural logarithm
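The log loss definition above can be sketched in a few lines of Python. Because y_ij is 1 only for the true label of each instance, the inner sum reduces to the logarithm of the probability assigned to the true class. The function name `mean_log_loss` is hypothetical, and clipping probabilities to `[eps, 1 - eps]` to avoid `ln(0)` is an assumption; the challenge text does not say how zero probabilities are handled.

```python
import math

def mean_log_loss(y_true, probs, eps=1e-15):
    """Multi-class log loss (hypothetical helper).

    y_true: list of true class indices, one per test instance.
    probs:  list of per-instance probability lists (length M each).
    eps:    clipping bound; an assumption, not part of the challenge text.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Only the true class contributes, since y_ij is 0 elsewhere.
        p = min(max(row[label], eps), 1.0 - eps)
        total += math.log(p)
    return -total / len(y_true)
```

A submission that assigns a uniform probability of 1/38 to every class would score ln(38) under this definition, which gives a useful baseline to compare against.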

All submissions will be evaluated on the test dataset in the Docker containers referenced in the Resources section. The code archive will be uncompressed into the /plantvillage path, and every code archive is expected to contain a main.sh script that takes the path to a folder containing images as its first parameter. So, to test your code submission, we will execute:

/plantvillage/main.sh pathToFolderContainingTestImages

This is expected to output a CSV file containing the name of each file and the associated probabilities for all the classes at the location:



References to the Docker containers where the submissions will be tested:

  • Caffe: https://hub.docker.com/r/tleyden5iwx/caffe-gpu-master/
  • Tensorflow: https://hub.docker.com/r/tensorflow/tensorflow/
  • Torch7: https://hub.docker.com/r/kaixhin/cuda-torch/
  • Scikit-Learn (Python-2): https://github.com/dataquestio/ds-containers/tree/master/python2
  • Scikit-Learn (Python-3): https://github.com/dataquestio/ds-containers/tree/master/python3
  • Octave: https://hub.docker.com/r/schickling/octave/
  • Keras: https://hub.docker.com/r/patdiscvrd/keras/~/dockerfile/

Feel free to shoot us an email if you want to be able to submit code in your favourite language or framework :D We would be happy to help :)


The author of the most highly ranked submission will be invited to the crowdAI winner’s symposium at EPFL in Switzerland on January 30/31, 2017. The educational award is given to the participant with either the most insightful submission posts or the best tutorial. The recipient of this award, who will be picked by the crowdAI team, will also be invited to the symposium. Expenses for travel and accommodation are covered by crowdAI.

Datasets License

All images are released under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), with the clarification that algorithms trained on the data fall under the same license.