The challenge consists of developing the entire life cycle of a Big Data application including data acquisition and storage, data preprocessing and indexing, and data queries and visualization.
The solution must use the Lithops Toolkit and run in the IBM Cloud.
This challenge is organized by the European Research project CloudButton. The project is developing novel Cloud technologies aiming to democratize Big Data applications in the Cloud. The CloudButton project has created the Lithops Toolkit that will be used to implement the Big Data challenge.
The challenge is organized by Universitat Rovira i Virgili, with the collaboration of IBM, RedHat and ATOS. The evaluation committee will include representatives from the four institutions.
Computer Engineering students in their final years and Master's students can participate in groups of up to three members.
Applications are open until April 15th.
The deadline to submit your solutions is June 12th, and we will announce the winners in a public event by June 18th.
In the context of the distributed system course, but open to external participants, we will provide training in:
Students are free to choose the topic and data for their challenge, but we encourage them to select data in Catalan and/or Spanish.
In groups of two or three people, you will create a distributed system using Cloud technologies with three main functionalities: (i) create a new text dataset by extracting information from the Web (Web crawler, Twitter APIs, …) and store it in Cloud Object Storage; (ii) preprocess the text dataset to build structured data (CSV) that can be queried and analyzed later on; and (iii) create Python notebooks to demonstrate date-related queries, basic visualization, and sentiment analysis techniques over the data.
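As an illustration of steps (ii) and (iii), the following minimal sketch turns raw text records into CSV and runs a simple date-related query over the result. It uses only the Python standard library. The column names (date, source, text), the sample records, and the helper names records_to_csv and count_by_day are assumptions made for this example, not part of the challenge specification.

```python
import csv
import io
from collections import Counter

# Hypothetical schema for illustration; a real dataset defines its own columns.
FIELDS = ["date", "source", "text"]


def records_to_csv(records):
    """Serialize raw text records (dicts) into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Collapse runs of whitespace (including newlines) into single spaces.
        cleaned = " ".join(record["text"].split())
        writer.writerow(
            {"date": record["date"], "source": record["source"], "text": cleaned}
        )
    return buffer.getvalue()


def count_by_day(csv_text):
    """Date-related query: number of documents per day (YYYY-MM-DD)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The first 10 characters of an ISO 8601 timestamp are the calendar date.
    return Counter(row["date"][:10] for row in reader)


# Made-up sample records standing in for crawled documents.
raw = [
    {"date": "2021-03-01T10:00:00", "source": "web", "text": "Primera   notícia\nsobre Tarragona"},
    {"date": "2021-03-01T18:30:00", "source": "web", "text": "Segona notícia"},
    {"date": "2021-03-02T09:15:00", "source": "web", "text": "Tercera notícia"},
]

csv_text = records_to_csv(raw)
per_day = count_by_day(csv_text)
```

In a notebook, per_day could be passed directly to a plotting library to show the number of documents per day. The same CSV text could be uploaded to Cloud Object Storage instead of being kept in memory.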
To build the system, you will leverage the Lithops.cloud toolkit developed in CloudButton. This toolkit lets you launch processes in the Cloud over Cloud Functions and store data in Cloud Object Storage. We will provide training and examples of how to use this toolkit.
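A minimal sketch of how a per-document task might be fanned out over Cloud Functions with Lithops is shown below. It assumes that Lithops is installed and already configured with IBM Cloud credentials. The task count_words and the helpers run_with_lithops and run_locally are hypothetical names chosen for this example. The Lithops import is placed inside the function so that the task itself can be checked locally without any cloud setup.

```python
def count_words(text):
    """Task executed once per input element: count whitespace-separated words."""
    return len(text.split())


def run_with_lithops(texts):
    """Run count_words over all texts as parallel cloud function invocations.

    Assumption: Lithops is installed and its configuration points to a
    working IBM Cloud Functions / Cloud Object Storage setup.
    """
    import lithops  # imported lazily so the local check below needs no Lithops

    fexec = lithops.FunctionExecutor()
    fexec.map(count_words, texts)   # one function invocation per element
    return fexec.get_result()       # blocks until all results are available


def run_locally(texts):
    """Sequential equivalent, useful for testing the task before deploying."""
    return [count_words(t) for t in texts]


sample = ["hola món", "bon dia a tothom"]
local_counts = run_locally(sample)
```

Testing the task function locally first and only then swapping in run_with_lithops keeps debugging cheap, since errors in the task logic show up before any cloud invocation is made.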
Students can propose their own ideas for data acquisition and analysis, but they may align them with Open Data initiatives like Open Data Lab and Tarragona Open Data Lab. These initiatives are currently interested in datasets that help to understand the social and economic impact of the COVID pandemic in Tarragona/Catalonia.
Some potential ideas could be:
You can also include in your challenge available open data sets with different data formats:
The winning team will receive 1500 euros and a diploma signed and stamped by the organizers (URV, IBM, RedHat, ATOS). The second and third teams will also receive a diploma.