Challenges Entered
- Generate Synchronised & Contextually Accurate Videos
- Improve RAG with Real-World Benchmarks
- Revolutionise E-Commerce with LLM!
- Revolutionising Interior Design with AI
- Multi-Agent Dynamics & Mixed-Motive Cooperation
- Advanced Building Control & Grid-Resilience
- Specialize and Bargain in Brave New Worlds
- Trick Large Language Models
- Shopping Session Dataset
- Understand semantic segmentation and monocular depth estimation from downward-facing drone images
- Audio Source Separation using AI
- Identify user photos in the marketplace
- A benchmark for image-based food recognition
- Using AI For Building's Energy Management
- Learning From Human-Feedback
- What data should you label to get the most value for your money?
- Interactive embodied agents for Human-AI collaboration
- Specialize and Bargain in Brave New Worlds
- Amazon KDD Cup 2022
- Behavioral Representation Learning from Animal Poses
- Airborne Object Tracking Challenge
- ASCII-rendered single-player dungeon crawl game
- 5 Puzzles 21 Days. Can you solve it all?
- Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments
- 5 Puzzles 21 Days. Can you solve it all?
- Self-driving RL on DeepRacer cars - From simulation to real world
- 3D Seismic Image Interpretation by Machine Learning
- 5 Puzzles 21 Days. Can you solve it all?
- 5 Puzzles 21 Days. Can you solve it all?
- 5 Puzzles 21 Days. Can you solve it all?
- Multi-Agent Reinforcement Learning on Trains
- A dataset and open-ended challenge for music recommendation research
- A benchmark for image-based food recognition
- Sample-efficient reinforcement learning in Minecraft
- 5 Puzzles, 3 Weeks. Can you solve them all?
- Multi-agent RL in game environment. Train your Derklings, creatures with a neural network brain, to fight for you!
- Predicting smell of molecular compounds
- Find all the aircraft!
- 5 Problems 21 Days. Can you solve it all?
- 5 Puzzles 21 Days. Can you solve it all?
- 5 Puzzles, 3 Weeks | Can you solve them all?
- Grouping/Sorting players into their respective teams
- 5 Problems 15 Days. Can you solve it all?
- 5 Problems 15 Days. Can you solve it all?
- Predict Heart Disease
- 5 PROBLEMS 3 WEEKS. CAN YOU SOLVE THEM ALL?
- Remove Smoke from Image
- Classify Rotation of F1 Cars
- Can you classify Research Papers into different categories?
- Can you dock a spacecraft to ISS?
- Multi-Agent Reinforcement Learning on Trains
- Multi-Class Object Detection on Road Scene Images
- Localization, SLAM, Place Recognition, Visual Navigation, Loop Closure Detection
- Localization, SLAM, Place Recognition
- Detect Mask From Faces
- Identify Words from silent video inputs.
- A Challenge on Continual Learning using Real-World Imagery
- Music source separation of an audio signal into separate tracks for vocals, bass, drums, and other
- Amazon KDD Cup 2023
- Amazon KDD Cup 2023
- Make Informed Decisions with Shopping Knowledge
Participant | Rating |
---|---|
vrv | 0 |
cadabullos | 0 |
cavalier_anonyme | 0 |
- powerpuff: AI Blitz X
- teamux: NeurIPS 2021 - The NetHack Challenge
- tempteam: NeurIPS 2022 IGLU Challenge
- testing: Sound Demixing Challenge 2023
- grogu: HackAPrompt 2023
- apollo11: MosquitoAlert Challenge 2023
- testteam: Commonsense Persona-Grounded Dialogue Challenge 2023
- temp-team: Generative Interior Design Challenge 2024
Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs
Meta Comprehensive RAG Benchmark: KDD Cup 2024
Winner's Solution Overview: Meta KDD Cup 2024 - Team db3
5 days ago
Team db3 consists of third-year PhD students from Peking University, mentored by Professor Gao Jun. Their research focuses on data mining for structured data, including community search, graph alignment, and table data integration. With a strong background in leveraging data mining for extracting insights in fields such as social networks and bioinformatics, the team's expertise is especially pertinent to their work with large language models (LLMs) and Retrieval-Augmented Generation (RAG) systems.
Winning Strategy:
Team db3 excelled in all three tasks of the Meta KDD Cup 2024, securing first place with scores of 28.4%, 42.7%, and 47.8%, respectively. Their approach to creating a state-of-the-art RAG system involved several sophisticated techniques:
- Task 1 - Web Retrieval and Answering:
The team developed a framework combining retrievers and rerankers to process and rank text chunks extracted from web pages. They employed BeautifulSoup for HTML parsing and LangChain for text splitting, alongside the bge-base-en-v1.5 retriever and a complementary reranker model to refine the selection of relevant text chunks (a minimal sketch of this retrieve-and-rerank flow appears after this list).
- Tasks 2 and 3 - Integration of Structured Data:
For the subsequent tasks, the team focused on integrating data from both web sources and mock Knowledge Graphs. They implemented a regularized API set and an API generation method using a tuned LLM. A Parent-Child Chunk Retriever system was crucial in managing the retrieval process, with the reranker further refining data selection to enhance accuracy and relevance.
- Addressing Hallucination in LLMs:
A significant aspect of their strategy was tuning the models to reduce inaccuracies and improve groundedness in responses, thereby addressing the issue of hallucination commonly associated with LLMs.
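Team db3's code is not reproduced in this post; the snippet below is only a rough sketch of the retrieve-and-rerank flow described for Task 1. It assumes BeautifulSoup, LangChain's text splitter, and the sentence-transformers library; the chunk sizes, the reranker checkpoint (BAAI/bge-reranker-base), and the helper names are illustrative placeholders rather than the team's actual settings.

```python
# Minimal sketch of a retrieve-then-rerank pipeline over web pages.
# Assumptions: BeautifulSoup for HTML parsing, LangChain's text splitter,
# BAAI/bge-base-en-v1.5 as the dense retriever, and a cross-encoder reranker.
# Chunk sizes and model choices are illustrative, not db3's exact configuration.
from bs4 import BeautifulSoup
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np

def html_to_chunks(html: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Strip markup with BeautifulSoup, then split the plain text into chunks."""
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    return splitter.split_text(text)

def retrieve_and_rerank(question: str, pages: list[str], top_k: int = 50, final_k: int = 5) -> list[str]:
    """Dense retrieval with bge-base-en-v1.5, refined by a cross-encoder reranker."""
    chunks = [c for page in pages for c in html_to_chunks(page)]

    retriever = SentenceTransformer("BAAI/bge-base-en-v1.5")
    q_emb = retriever.encode([question], normalize_embeddings=True)
    c_emb = retriever.encode(chunks, normalize_embeddings=True)
    scores = (q_emb @ c_emb.T)[0]                      # cosine similarity on normalized embeddings
    candidates = [chunks[i] for i in np.argsort(-scores)[:top_k]]

    reranker = CrossEncoder("BAAI/bge-reranker-base")  # placeholder for a complementary reranker
    rerank_scores = reranker.predict([(question, c) for c in candidates])
    order = np.argsort(-rerank_scores)[:final_k]
    return [candidates[i] for i in order]
```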
Impact and Research Alignment:
Team db3's method demonstrates the potential of RAG systems in providing accurate and reliable answers by effectively integrating and processing external information. Their strategy aligns seamlessly with their research interests in data mining and structured data analysis, particularly their focus on knowledge graphs, which represent information in a structured format essential for various applications like semantic search and intelligent personal assistants.
[Issue resolved] We were asked to submit the revision but revision functionality is disabled
3 months ago
The deadline is now updated on OpenReview too. Please try to make a submission again.
Meta CRAG Challenge 2024 Winners Announcement
4 months ago
August 02, 2024 11:59PM UTC-0, updated in the post.
Meta CRAG Challenge 2024 Winners Announcement
4 months ago
Hello Participants,
We are excited to announce the winners of the Meta CRAG Challenge 2024. We appreciate your patience as we completed the due diligence process for the annotations.
We want to thank all participants for their efforts and contributions to improving RAG solutions. Over the last four months, the Meta CRAG Challenge has seen over 2000 participants from various countries, making over 5500 submissions. Below are the winners for each task. The final evaluation process can be found here: Final Evaluation Process & Team Scores
Task 1: Retrieval Summarization
- Team db3
- Team md_dh
- Team ElectricSheep
- Simple_w_condition Question: Team dummy_model
- Set Question: Team dummy_model
- Comparison Question: Team dRAGonRAnGers
- Aggregation Question: Team dummy_model
- Multi-hop Question: Team bumblebee7
- Post-processing Question: Team dRAGonRAnGers
- False Premise Question: Team ETSLab
Task 2: Knowledge Graph and Web Retrieval
- Team db3
- Team APEX
- Team md_dh
- Simple_w_condition Question: Team ElectricSheep
- Set Question: Team ElectricSheep
- Comparison Question: Team dRAGonRAnGers
- Aggregation Question: Team ElectricSheep
- Multi-hop Question: Team ElectricSheep
- Post-processing Question: Team ElectricSheep
- False Premise Question: Team Future
Task 3: End-to-End Retrieval-Augmented Generation
- Team db3
- Team APEX
- Team vslyu-team
- Simple_w_condition Question: Team StarTeam
- Set Question: Team md_dh
- Comparison Question: Team dRAGonRAnGers
- Aggregation Question: Team md_dh
- Multi-hop Question: Team ETSLab
- Post-processing Question: Team md_dh
- False Premise Question: Team Riviera4
We welcome all teams to submit a technical report to the 2024 KDD Cup RAG Workshop. The report should follow the KDD submission format.
- Submission link: OpenReview
- Submission deadline: 11:59 UTC, Aug 2, 2024
- Submission format: Submissions are limited to 8 pages (excluding references), must be in PDF, and use the ACM Conference Proceeding template (two-column format). For more details, please refer to: KDD 2024 ADS Track Call for Papers
We extend our sincere gratitude to all participants who contributed to this event!
All the best,
Meta & AIcrowd
Final Evaluation Process & Team Scores
4 months ago
Hello Participants,
We want to thank all participants for their efforts and contributions to improving RAG systems. This post provides information about the final evaluation process and shares the final scores.
Manual annotation:
For Phase 2 of this challenge, we first conducted auto-evaluation for all the teams that provided submission numbers. We then selected the top 15 teams' submissions according to the auto-eval scores for manual evaluation. The final scores are determined by the scores calculated from the manual grading labels. Automatic and manual evaluation details can be found in the paper.
Weighting:
We applied traffic weights to the questions to understand how the solutions would perform in real-world use cases. The traffic weights come from a real QA use case and were generated as follows. Within each domain, we first clustered the questions into question types using the same definitions as the CRAG question types. Then, we derived a weight for each type from aggregated data reflecting user interactions. We applied these weights to each CRAG question so that the results better reflect user experience, and reported the macro-average scores across all domains (i.e., giving the same weight to every domain).
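As a concrete illustration of this weighting scheme, the short sketch below applies per-type weights within each domain and then macro-averages across domains. The weights, domains, and scores shown are made-up placeholders; the real traffic weights are not published in this post.

```python
# Illustration of traffic-weighted scoring followed by a macro average over domains.
# The weights and scores below are hypothetical placeholders, not the actual CRAG data.
from collections import defaultdict

def weighted_domain_scores(results, type_weights):
    """results: list of (domain, question_type, per_question_score) triples."""
    per_domain = defaultdict(lambda: [0.0, 0.0])  # domain -> [weighted score sum, weight sum]
    for domain, qtype, score in results:
        w = type_weights[qtype]
        per_domain[domain][0] += w * score
        per_domain[domain][1] += w
    return {d: s / w for d, (s, w) in per_domain.items()}

def macro_average(domain_scores):
    """Give every domain the same weight, as in the final CRAG reporting."""
    return sum(domain_scores.values()) / len(domain_scores)

# Hypothetical example
type_weights = {"simple_w_condition": 0.3, "comparison": 0.2, "multi_hop": 0.5}
results = [
    ("movies", "simple_w_condition", 0.6),
    ("movies", "multi_hop", 0.2),
    ("finance", "comparison", 0.4),
]
print(macro_average(weighted_domain_scores(results, type_weights)))
```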
Code validation:
We conducted a code review of the winning solutions to ensure the validity of the code. For example, we checked whether there was any prompt attack to mislead the auto-evaluation and whether the solutions used too many hard-coded answers.
The scores from the winning teams are listed below.
Task | Team | Score
---|---|---
Task 1 | db3 | 28.40%
Task 1 | md_dh | 24%
Task 1 | ElectricSheep | 21.8%
Task 2 | db3 | 42.7%
Task 2 | APEX | 41.0%
Task 2 | md_dh | 31.0%
Task 3 | db3 | 47.8%
Task 3 | APEX | 44.9%
Task 3 | vslyu-team | 25.6%
The per-question-type scores from the winning teams are listed below.
Task | Question Type | Team | Score
---|---|---|---
Task 1 | simple_w_condition | dummy_model | 17.9
Task 1 | set | dummy_model | 21.25
Task 1 | comparison | dRAGonRAnGers | 37
Task 1 | aggregation | dummy_model | 21.5
Task 1 | multi_hop | bumblebee7 | 16.8
Task 1 | post_processing | dRAGonRAnGers | 8.6
Task 1 | false_premise | ETSLab | 65.2
Task 2 | simple_w_condition | ElectricSheep | 23.9
Task 2 | set | ElectricSheep | 36.65
Task 2 | comparison | dRAGonRAnGers | 38
Task 2 | aggregation | ElectricSheep | 18.75
Task 2 | multi_hop | ElectricSheep | 23.2
Task 2 | post_processing | ElectricSheep | 11.75
Task 2 | false_premise | Future | 64.6
Task 3 | simple_w_condition | StarTeam | 42.2
Task 3 | set | md_dh | 31.7
Task 3 | comparison | dRAGonRAnGers | 37.25
Task 3 | aggregation | md_dh | 26.6
Task 3 | multi_hop | ETSLab | 25.7
Task 3 | post_processing | md_dh | 8.3
Task 3 | false_premise | Riviera4 | 72.2
We extend our sincere gratitude to all participants who contributed to this event!
All the best,
Meta & AIcrowd
Commonsense Persona-Grounded Dialogue Challenge 2024
Winner's Solution Overview: Commonsense Persona-Grounded Dialogue Challenge 2024
5 days ago
Task 1: Commonsense Dialogue Response Generation | Kaihua Ni
Kaihua Ni, an alumnus of the University of Leeds with a major in Artificial Intelligence, has extensive experience in AI and deep learning algorithms, having worked at major companies such as Augmentum and CareerBuilder. His expertise lies in natural language processing and the nuances of human conversation.
Winning Strategy:
Kaihua's strategy for the Commonsense Persona-Grounded Dialogue Challenge 2024 was rooted in a two-pronged approach: fine-tuning a large language model (LLM) and expert prompt engineering:
- Fine-Tuning the LLM:
Utilizing transfer learning techniques, Kaihua adapted the pre-existing parameters of the LLM to the specific conversational style and knowledge domain of the persona being emulated. The model was trained on a curated dataset consisting of dialogues, written works, and other textual representations of the persona, significantly enhancing its ability to mimic the persona's syntactic and semantic patterns.
- Prompt Engineering:
Kaihua crafted optimized prompts that encapsulated the context of the conversation while embedding subtle cues aligned with the personal characteristics of the persona. This steered the model to generate responses that were contextually relevant and infused with the persona's idiosyncratic communication style (a generic sketch of such a persona-grounded prompt appears after this list).
- Advanced NLP Techniques:
Attention mechanisms and context window adjustments were employed to maintain coherence and context retention across multi-turn dialogues. Kaihua also developed a custom evaluation metric aligned with the challenge's criteria to iteratively assess and refine the model's performance.
- Ethical Considerations:
Ethical aspects were critically addressed, ensuring that the AI's mimicry respected the privacy and dignity of the individual. Kaihua implemented strict boundaries on the use of personal information and incorporated safeguards against generating inappropriate or harmful content.
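Kaihua's actual prompts are not included in this overview. The snippet below is only a generic, hypothetical illustration of the kind of persona-grounded prompt construction described above; the field names, persona facts, and wording are invented for the example.

```python
# Generic illustration of building a persona-grounded prompt for an instruction-tuned
# chat model. The persona fields, dialogue format, and wording are hypothetical and
# not taken from the winning solution.
def build_persona_prompt(persona: dict, dialogue_history: list[tuple[str, str]]) -> str:
    persona_lines = "\n".join(f"- {fact}" for fact in persona["facts"])
    history = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in dialogue_history)
    return (
        f"You are {persona['name']}. Stay in character and answer consistently with "
        f"the persona facts below. Keep replies conversational and grounded in commonsense.\n\n"
        f"Persona facts:\n{persona_lines}\n\n"
        f"Conversation so far:\n{history}\n"
        f"{persona['name']}:"
    )

# Usage with made-up data
prompt = build_persona_prompt(
    persona={"name": "Alex", "facts": ["works as a park ranger", "enjoys astronomy"]},
    dialogue_history=[("User", "Any plans for the weekend?")],
)
print(prompt)
```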
Task 2: Commonsense Persona Knowledge Linking | Iris Lin
Iris Lin, an experienced Machine Learning Engineer, specializes in developing and optimizing large-scale machine learning models. With a robust background in deep learning, natural language processing, and recommendation systems, Iris is proficient in Scala, Python, Java, Spark, PyTorch, TensorFlow, and various machine learning libraries.
Winning Strategy:
Team biu_biu, managed by Iris Lin, employed a comprehensive approach to developing a commonsense persona knowledge linker for CPDC Task 2, leveraging cutting-edge techniques and tools:
1. Baseline Evaluation with ComFact Model:
The team initiated the project by testing the provided ComFact baseline model on the hidden test set to pinpoint its limitations and identify areas for improvement in linking persona commonsense facts to dialogue contexts.
2. Dataset Curation and Enhancement:
The team merged the Conv2 and Peacock datasets to refine the training process, focusing on conversations that incorporated persona knowledge. This ensured the model was trained on highly relevant data. They also employed GPT-3.5-Turbo to create a synthetic dataset by labelling persona facts in 20,000 conversations, providing a diverse and extensive training foundation.
3. Model Fine-Tuning with DeBERTa-V3:
The team fine-tuned the DeBERTa-V3 model, renowned for its effectiveness in various NLP tasks. A rigorous hyperparameter search was conducted to optimize performance, particularly in capturing the nuances of persona knowledge linking (a rough fine-tuning sketch appears after this list).
4. Comprehensive Model Evaluation:
The model was rigorously evaluated under two settings: predicting head and tail facts separately and simultaneously. This dual-testing strategy allowed the team to thoroughly assess the model's versatility and pinpoint any potential enhancements.
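Team biu_biu's training code is not part of this post. The sketch below shows, under assumptions, what fine-tuning DeBERTa-V3 as a binary relevance classifier for persona-fact linking could look like with Hugging Face Transformers; the dataset fields, hyperparameters, and output directory are placeholders rather than the team's actual configuration.

```python
# Rough sketch of fine-tuning DeBERTa-V3 to score whether a candidate persona fact
# is relevant to a dialogue context (binary classification). Dataset fields and
# hyperparameters are placeholders, not team biu_biu's actual setup.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy examples: each pairs a dialogue context with a candidate persona fact.
raw = {
    "context": ["A: I hiked all weekend. B: Sounds exhausting!"],
    "fact": ["PersonA enjoys outdoor activities"],
    "label": [1],
}

def tokenize(batch):
    # Encode the (context, fact) pair as a single sequence-pair input.
    return tokenizer(batch["context"], batch["fact"], truncation=True, max_length=512)

dataset = Dataset.from_dict(raw).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="persona-linker-deberta",
                           per_device_train_batch_size=8,
                           num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=dataset,
    tokenizer=tokenizer,  # enables padded batching via the default data collator
)
trainer.train()
```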
Sounding Video Generation (SVG) Challenge 2024
Welcome to Sounding Video Generation Challenge 2024
2 months ago
The Sounding Video Generation (SVG) Challenge 2024 is a competition to create AI models that make videos where the visuals match perfectly with the sounds, like a dog barking in sync with the video. Participants will work to improve how well sounds and scenes align, with prizes for the best results.
ADVANCING AUDIO-VISUAL SYNCHRONISATION
Join the Sounding Video Generation (SVG) Challenge 2024, a groundbreaking competition at the intersection of video generation and audio-visual synchronisation. This innovative challenge invites participants to build state-of-the-art AI models that generate perfectly aligned and contextually accurate videos guided by audio.
Use your machine learning expertise to advance the frontier of audio-visual synchronisation, transforming datasets into dynamic, synchronised videos. With the right code and creativity, you'll contribute to a rapidly evolving field that's set to redefine how we generate and experience multi-modal content.
The Task
Develop models that generate videos with synchronised and contextually relevant audio in two specialised tracks:
- Temporal Alignment Track: Create videos where the audio is perfectly synchronised with the video content in time (e.g., a dog barking exactly when seen in the video). Find the starter kit here.
- Spatial Alignment Track: Develop models that produce videos with spatially aligned audio, creating a real sense of direction and space. Find the starter kit here.
Both tracks aim to push the boundaries of multi-modal AI, offering a unique platform to benchmark cutting-edge solutions in this underexplored domain.
Check out the starter-kits for Temporal Alignment Track and Spatial Alignment Track.
Download the resources for the challenge: Temporal Alignment Track and Spatial Alignment Track.
Timeline
- Warmup Round: 29th Oct 2024
- Phase I: 2nd Dec 2024
- Phase II: 3rd Jan 2025
- Challenge End: 25th Mar 2025
Prizes
The challenge boasts a prize pool of USD 35,000 split across both tracks. The top three teams or participants in each track will be rewarded as follows:
- Track 1: Temporal Alignment ($17,500)
  - First place: USD 10,000
  - Second place: USD 5,000
  - Third place: USD 2,500
- Track 2: Spatial Alignment ($17,500)
  - First place: USD 10,000
  - Second place: USD 5,000
  - Third place: USD 2,500
Join a Thriving Community
Collaborate with like-minded researchers, practitioners, and AI enthusiasts eager to share ideas, team up, and drive innovation in this captivating challenge.
Sign up now for the SVG Challenge 2024 and start building your models in the warm-up round!
Feedback & Suggestions
2 months ago
We are constantly trying to improve this challenge for you and would appreciate any feedback you might have!
Please reply to this thread with your suggestions and feedback on making the challenge better for you!
- What have been your major pain points so far?
- What would you like to see improved?
All The Best!
Looking for teammates?
2 months ago
Competing is more fun with a team!
Introduce yourself here, and find others who are looking to team up!
Format:
- A short introduction about you and your background.
- What brings you to this challenge?
- Some ideas you wish to explore as a part of this challenge?
All The Best!
Winner's Solution Overview: KDD Cup 2024 - Team NVIDIA
5 days ago
Team NVIDIA, a group of data scientists and technologists, brought diverse skills to the KDD Cup 2024. Key members include Gilberto, a former #1 ranked Kaggle competitor with a background in Electrical Engineering; Chris, a Ph.D. holder in computational science and mathematics with experience across various professions; Benedikt Schifferer, a manager of Applied Research with expertise in recommender systems and LLMs; Ivan Sorokin, a Senior LLM Technologist; Ahmet Erdem, a Kaggle Grandmaster and open-source contributor; and Simon, a senior LLM technologist specializing in deep learning applications in computer vision and NLP.
Winning Strategy:
Team NVIDIA's strategy for the KDD Cup 2024 involved the deployment of five fine-tuned Qwen2-72B LLM models, one for each of the competition's tracks, leveraging cutting-edge techniques and substantial computational resources:
- The team transformed data from six public datasets, including Amazon-M2 and MMLU, into 500k question-answer pairs across 40 tasks and 5 task types.
- They fine-tuned multiple Qwen2-72B models using QLoRA on NVIDIA's 8xA100 80GB GPUs, employing tools like DeepSpeed and Axolotl for efficiency.
- Fine-tuning involved adjusting LoRA parameters and experimenting with different weights for model adapters to optimize performance across various tasks.
- The models were trained with specific prompts tailored to simulate an online shopping assistant, enhancing task-specific performance.
- To meet the competition's stringent hardware limitations and inference-time constraints, Team NVIDIA employed 4-bit AWQ quantization and batch inference with the vLLM inference library, significantly reducing the model's memory footprint (a minimal sketch of this quantized inference setup follows this list).
- During inference, logits processors were added to the model's predictions to ensure output accuracy, particularly in handling structured responses like numbers and commas.
- The final submissions for each track involved sophisticated ensembles of base models and multiple LoRA adapters, fine-tuned to enhance the accuracy and robustness of the solutions.
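Team NVIDIA's inference pipeline is not reproduced here. The fragment below only sketches the kind of quantized batch inference described above, i.e. serving a 4-bit AWQ checkpoint with vLLM; the model path, prompt, tensor-parallel setting, and sampling parameters are hypothetical placeholders.

```python
# Sketch of quantized batch inference with vLLM over an AWQ checkpoint.
# The model path, prompt template, and sampling settings are placeholders;
# the winning solution additionally constrained outputs with logits processors.
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/to/qwen2-72b-shopping-awq",  # hypothetical 4-bit AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=4,                  # split the model across several GPUs
    max_model_len=4096,
)

prompts = [
    "You are a helpful online shopping assistant.\n"
    "Question: Which of the following is an accessory for a DSLR camera?\n"
    "Options: 0. lens hood 1. blender 2. yoga mat 3. desk lamp\nAnswer:",
]

params = SamplingParams(temperature=0.0, max_tokens=8)  # greedy decoding, short structured answers
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```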
Impact and Contributions:
Team NVIDIA's comprehensive approach showcased their technical prowess and ability to innovate within constraints, leading to their first-place victory in all five competition tracks. Their work demonstrates the powerful capabilities of LLMs in handling diverse and complex real-world NLP tasks, particularly in a competitive setting with limited hardware resources.