Spatial Alignment Track
Challenge Rules
SOUNDING VIDEO GENERATION CHALLENGE OFFICIAL RULES
PLEASE READ THESE OFFICIAL RULES CAREFULLY. ENTRY INTO THIS CHALLENGE CONSTITUTES YOUR ACCEPTANCE OF THESE OFFICIAL RULES. IF YOU DO NOT AGREE TO ANY PART OF THESE OFFICIAL RULES, PLEASE DO NOT ENTER THIS CHALLENGE.
NO PURCHASE IS NECESSARY TO ENTER OR WIN. A PURCHASE OF ANY KIND WILL NOT INCREASE YOUR CHANCES OF WINNING. VOID WHERE PROHIBITED.
1. CHALLENGE DESCRIPTION
Sounding Video Generation Challenge is an opportunity for researchers and machine learning enthusiasts to test their skills on the tasks of Temporal Alignment (Track 1) and Spatial Alignment (Track 2) for sounding video generation.
2. SPONSORS
The Challenge is sponsored by the following organizations: Sony Group Corporation, with its principal place of business at 1-7-1 Konan, Minato-ku, Tokyo 108-0075, Japan.
These organizations will be referred to collectively as "Organizers" from here on.
3. ORGANIZERS ADMINS
"Organizers Admins" are any companies or organizations authorized by Organizers to aid them with the administration or execution of this Challenge including but not limited to AIcrowd SA.
4. CHALLENGE START AND END DATES
The challenge will take place across two tracks and in three rounds, which differ in the evaluation dataset used to rank the systems. The tentative launch dates for each round are as follows:
- Warmup Round: 29th Oct 2024
- Phase I: 2nd Dec 2024
- Phase II: 3rd Jan 2025
- Challenge End: 25th Mar 2025
The datasets used for evaluation in Round 1 and Round 2 will be split into three parts. During Round 1, participants will see their scores only on the first split; during Round 2, they will see their scores only on the second split. The final leaderboard of each track will be based on the scores on the full hidden test set for that track.
5. AM I ELIGIBLE TO ENTER THE CHALLENGE?
You are eligible to enter this Challenge if you (and each member of your Team) meet all of the following requirements as of the time and date of entry:
- You are an individual;
- You are 18 years of age or older but in no event less than the age of majority in your place of residence;
- You have Internet Access, an Email Account, and access to a personal computer;
The residents of the following countries or regions are not eligible for cash prizes of the competition:
- The Crimea region of Ukraine
- Cuba
- Iran
- North Korea
- Sudan
- Syria
- Quebec, Canada
- Brazil
- Italy
- Russian Federation
Please note that residents of these countries or regions are still allowed to participate in the challenge and retain their final rank on the leaderboard of the competition. Any cash prize associated with a leaderboard rank held by a non-eligible team will be passed on to the next eligible team on the leaderboard.
Please Note: it is entirely your responsibility to review and understand your employer's and country's policies about your eligibility to participate in this Challenge. If you participate in violation of your employer's or country's policies, you and your Entry may be disqualified from the Challenge. Organizers disclaim any and all liability or responsibility with respect to disputes arising between an employer and such employer's employee or between a country and its resident in relation to this matter.
6. IS THE ENTRY AN ELIGIBLE ENTRY?
To be eligible to be considered for a prize, as solely determined by the Organizers:
The Entry MUST:
- be compatible with the official submission format;
- be in English;
- be the Team's own original work;
- not have been submitted previously in any promotion of any kind;
- not contain material or content that: is inappropriate, indecent, obscene, offensive, sexually explicit, pornographic, hateful, tortious, defamatory, or slanderous or libelous; or promotes bigotry, racism, hatred or harm against any group or individual or promotes discrimination based on race, gender, ethnicity, religion, nationality, disability, sexual orientation, or age; or promotes alcohol, illegal drugs, or tobacco; or violates or infringes another's rights, including but not limited to rights of privacy, publicity, or their intellectual property rights; or is inconsistent with the message, brand, or image of Organizers; or is unlawful; or is in violation of or contrary to the laws or regulations of any jurisdiction in which the Entry is created; and
The Team members MUST:
- ensure the Team has obtained any and all consents, approvals, or licenses required for submission of the Entry;
- obtain any consents necessary from all members of the Team with respect to the sharing of such member's personal information as outlined herein;
- obtain the agreement of all members of the Team to these Rules;
- not generate the Entry by any means which violate these Rules, the Organizers' Terms of Service or the Organizers' Privacy Policy;
- not engage in false, fraudulent, or deceptive acts at any phase during participation in the Challenge; and
- not tamper with or abuse any aspect of this Challenge.
7. DISQUALIFICATION
If you, any Team member, or the Entry is found to be ineligible for any reason, including but not limited to conflicts within Teams and noncompliance with Sections 5 and 6 of these Rules, Organizers and Organizers Affiliates reserve the right to disqualify the Entry and/or you and/or your Team members from this Challenge and any other contest or promotional activity sponsored or administered in any way by the Organizers.
A participant is not allowed to create more than one account to participate in the challenge. Violating this will result in disqualification from the challenge.
8. HOW MAY THE ENTRY POTENTIALLY BE USED?
The Entry may be used in a few different ways. Organizers do not claim to own your Team's Entry; however, by submitting the Entry, you and each member of your Team:
- hereby grant to Organizers Admins a non-exclusive, irrevocable, royalty-free, world-wide right and license to review and analyze the Entry in relation to this Challenge;
- hereby grant to Organizers and Organizers Admins a non-exclusive, irrevocable, royalty-free, world-wide right and license to use the Entry or parts of your Entry in any media for any non-commercial or commercial purpose in connection with the marketing, sale, or promotion of Organizers, Organizers Admins and their respective products and services;
- agree that each member will execute any necessary paperwork for Organizers and Organizers Admins to use the rights and licenses granted hereunder;
- acknowledge and agree that the Team will not be compensated and may not be credited (at Organizers' sole discretion) for the use of the Entry as described in these Rules;
- acknowledge that the Organizers or Organizers Admins may have developed or commissioned materials similar to the Entry and waive any claims resulting from any similarities to the Entry;
- understand that the Entry may be posted on a public website or social media channel and that Organizers are not responsible for any unauthorized use of the Entry by visitors to such site; and
- understand and acknowledge that, subject to provision of Prizes, Organizers are not obligated to use the Entry in any way, even if the Entry is selected as a winning Entry.
Personal data you submit in relation to this Challenge will be used by Organizers and Organizers Admins in accordance with Section 14 of these Rules.
The outputs and analytical findings of each model may be disclosed in scholarly publications. Such disclosures shall include:
- The outputs generated by each model.
- The outcomes of evaluations, including both automated and manual assessments, irrespective of their current adoption.
- The submission IDs for each model, along with associated data made available on the AIcrowd website, including, without limitation, the presence or absence of GPU utilisation.
- The findings from statistical analyses conducted on the responses generated by each model.
9. HOW WILL WINNERS BE SELECTED AND NOTIFIED?
Entries will be judged by an algorithm that generates a score for each Entry. Entries will be ranked by this score, and the ranking will be displayed on the AIcrowd Site's Challenge- and Track-specific leaderboard ("Leaderboard").
For all the leaderboards, the algorithm will rank your Entry using a hidden test set.
Temporal Alignment Track
We use the following six metrics for evaluation: Fréchet Video Distance (FVD), Fréchet Audio Distance (FAD), LanguageBind scores for text-audio and text-video pairs, CAVP score, and AV-Align score.
FAD and FVD are used to assess the quality of the generated audio and video, respectively.
LanguageBind scores are used to assess how faithfully the generated audio and video match the conditioning text input. A score is computed for each text-audio and text-video pair in the generated sounding videos, and each final score is the average of these individual scores.
AV-Align score and CAVP score are used to assess how well the generated audio and video are temporally aligned with each other. The AV-Align score was proposed by Guy Yariv et al. for evaluating temporal alignment in sounding videos. We slightly modified the computation of the AV-Align score relative to the official implementation. Specifically, we tuned the hyper-parameters of the optical flow estimation and of the onset detection so that they accurately estimate the timing of hits, using the annotated timestamps in the Greatest Hits dataset. In addition, we compute IoU after rewriting it in terms of precision and recall, to mitigate an issue caused by the difference in temporal resolution between video and audio. The CAVP score is the cosine similarity between CAVP features extracted from the generated audio and video. For both metrics, a score is computed for each sounding video, and the final score is the average of these individual scores.
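As a rough illustration of the two temporal-alignment scores, the sketch below computes an IoU-style AV-Align score from precision and recall, and a CAVP-style score as a mean cosine similarity. It is not the official implementation. The onset and peak times are assumed to come from upstream onset detection and optical-flow peak detection (not shown); the matching tolerance `tol`, the choice of measuring precision over video peaks and recall over audio onsets, and all function names are our own assumptions; and the CAVP features are assumed to be one pooled vector per modality per video.

```python
import numpy as np


def matched_fraction(src_times, ref_times, tol):
    """Fraction of events in src_times with at least one ref event within +/- tol seconds."""
    src = np.asarray(src_times, dtype=float)
    ref = np.asarray(ref_times, dtype=float)
    if src.size == 0 or ref.size == 0:
        return 0.0
    # Distance from every src event to its nearest ref event.
    dists = np.abs(src[:, None] - ref[None, :]).min(axis=1)
    return float(np.mean(dists <= tol))


def av_align_iou(audio_onsets, video_peaks, tol=0.1):
    """Hypothetical IoU-style AV-Align for one video, rewritten via precision and recall."""
    # Assumption: precision is measured over video peaks, recall over audio onsets.
    precision = matched_fraction(video_peaks, audio_onsets, tol)
    recall = matched_fraction(audio_onsets, video_peaks, tol)
    if precision == 0.0 or recall == 0.0:
        return 0.0
    # With a shared TP count, TP / (TP + FP + FN) == P * R / (P + R - P * R).
    return precision * recall / (precision + recall - precision * recall)


def cavp_score(audio_feats, video_feats):
    """Mean cosine similarity between paired audio/video feature rows (one row per video)."""
    a = np.asarray(audio_feats, dtype=float)
    v = np.asarray(video_feats, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    # Per-video cosine similarity, then the average over all videos.
    return float(np.mean(np.sum(a * v, axis=1)))
```

For a whole test set, the per-video AV-Align values would then be averaged, e.g. `np.mean([av_align_iou(a, v) for a, v in pairs])`. When audio and video events match one-to-one, P * R / (P + R - P * R) equals TP / (TP + FP + FN); computing precision and recall separately means the two modalities need not share a single TP count, which is one plausible reading of how the rewrite eases the temporal-resolution mismatch.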
We use AV-Align as the main metric for ranking and the CAVP score as the secondary metric to break ties. The other four metrics are used to exclude entries that produce low-quality outputs from the ranking: if the submitted model fails to meet the threshold on any one of these four metrics, the model is excluded from the ranking. The thresholds are 2.0 for FAD, 900 for FVD, 0.25 for the LanguageBind text-audio score, and 0.12 for the LanguageBind text-video score. Because lower is better for FAD and FVD, an entry must stay below those two thresholds, whereas it must exceed the two LanguageBind thresholds.
The top entries in the final leaderboard will be assessed by human evaluation, and the award-winning teams will be selected based only on the results of this subjective evaluation.
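The ranking rule for the Temporal Alignment Track can be sketched as a quality gate followed by a two-key sort. This is a minimal illustration, not the official leaderboard code: the `Entry` record and function names are hypothetical, and treating a score exactly equal to a threshold as failing is our assumption, since the rules do not say how ties with a threshold are handled.

```python
from dataclasses import dataclass
from typing import List

# Thresholds from the rules. FAD and FVD are lower-is-better;
# the LanguageBind scores are higher-is-better.
MAX_FAD = 2.0
MAX_FVD = 900.0
MIN_LB_TEXT_AUDIO = 0.25
MIN_LB_TEXT_VIDEO = 0.12


@dataclass
class Entry:  # hypothetical record of one submission's scores
    name: str
    fad: float
    fvd: float
    lb_text_audio: float
    lb_text_video: float
    av_align: float
    cavp: float


def passes_quality_gate(e: Entry) -> bool:
    # Assumption: a score exactly equal to a threshold counts as failing.
    return (e.fad < MAX_FAD and e.fvd < MAX_FVD
            and e.lb_text_audio > MIN_LB_TEXT_AUDIO
            and e.lb_text_video > MIN_LB_TEXT_VIDEO)


def rank_entries(entries: List[Entry]) -> List[Entry]:
    eligible = [e for e in entries if passes_quality_gate(e)]
    # AV-Align is the primary key and CAVP breaks ties; higher is better for both.
    return sorted(eligible, key=lambda e: (e.av_align, e.cavp), reverse=True)


entries = [
    Entry("a", 1.5, 800.0, 0.30, 0.20, 0.40, 0.50),
    Entry("b", 1.0, 500.0, 0.30, 0.20, 0.40, 0.60),
    Entry("c", 2.5, 500.0, 0.30, 0.20, 0.90, 0.90),  # excluded: FAD above 2.0
]
ranked_names = [e.name for e in rank_entries(entries)]
```

Here entry "c" has the best AV-Align but is dropped by the FAD gate, and "b" ranks above "a" because the two tie on AV-Align and "b" has the higher CAVP score.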
Spatial Alignment Track
We use Fréchet Video Distance (FVD), Fréchet Audio Distance (FAD), and Spatial AV-Align as evaluation metrics, in addition to the metrics used in the baseline.
We compute FVD and FAD to assess the quality of the generated videos and audio, respectively.
We introduce a new metric, Spatial AV-Align, which quantifies spatial audio-visual alignment using pretrained object detection and sound event localization and detection (SELD) models. It is computed as follows:
- We first detect candidate positions of sounding objects per frame in each modality separately.
- Then, for each position detected in the audio, we check whether it is also detected in the video.
- Specifically, we check whether the SELD result overlaps an object detection result. If it does, it counts as a true positive (TP); if not, it counts as a false negative (FN).
- We do not check whether each position detected in the video is also detected in the audio, because the dataset includes people who neither talk nor play instruments.
- Finally, we calculate recall as the alignment score, which ranges from zero to one. Given TP and FN, the alignment score is defined as TP / (TP + FN).
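The recall computation above can be sketched as follows. This is a hypothetical illustration, not the official evaluation code: it assumes each SELD detection has already been projected to a bounding box in image coordinates for a given frame, that "overlap" means an intersection of positive area, and that a video with no SELD detections at all scores 0.0 (our own convention). The detectors themselves are not shown.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates


def boxes_overlap(a: Box, b: Box) -> bool:
    # True only if the intersection of the two boxes has positive area.
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def spatial_av_align(seld: Dict[int, List[Box]],
                     detections: Dict[int, List[Box]]) -> float:
    """Recall of audio (SELD) positions confirmed by a video detection in the same frame."""
    tp = fn = 0
    for frame, audio_boxes in seld.items():
        video_boxes = detections.get(frame, [])
        for a in audio_boxes:
            if any(boxes_overlap(a, v) for v in video_boxes):
                tp += 1  # audio position also found in the video
            else:
                fn += 1  # audio position with no overlapping video detection
    # Video detections without a matching audio position are ignored (no FP term).
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0
```

Because only audio-side positions are counted, a visible person who is silent never lowers the score, which matches the reason the rules give for skipping the video-to-audio check.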
For this challenge's real-time leaderboard, the three metrics above are combined into a single error metric called the Spatial SVGC Error. We first normalize each metric using its baseline-system and ground-truth values, e.g., (FVD - FVD_gt) / (FVD_baseline - FVD_gt), so that each normalized term is 0 at the ground-truth value and 1 at the baseline value. We then take a weighted sum of the normalized metrics, with a weight of 1 for FVD and FAD and 2 for Spatial AV-Align, which emphasizes spatial alignment. Because a higher Spatial AV-Align is better, it enters the sum as the error 1 - SpatialAVAlign, so lower Spatial SVGC Error is better overall. The Spatial SVGC Error is computed as:
Spatial SVGC Error = (FVD - FVD_gt) / (FVD_baseline - FVD_gt) + (FAD - FAD_gt) / (FAD_baseline - FAD_gt) + 2 * {(1 - SpatialAVAlign) - (1 - SpatialAVAlign_gt)} / {(1 - SpatialAVAlign_baseline) - (1 - SpatialAVAlign_gt)}
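A minimal sketch of the Spatial SVGC Error in Python follows; the dictionary layout (keys `fvd`, `fad`, `saa` for Spatial AV-Align) and the function names are hypothetical. Since each normalized term is 0 at the ground-truth value and 1 at the baseline value, the baseline itself scores 1 + 1 + 2 = 4 and the ground truth scores 0, which gives a quick sanity check for any implementation.

```python
def normalized(value: float, baseline: float, gt: float) -> float:
    # 0 at the ground-truth value, 1 at the baseline value.
    return (value - gt) / (baseline - gt)


def spatial_svgc_error(fvd: float, fad: float, saa: float,
                       base: dict, gt: dict) -> float:
    """base and gt map 'fvd', 'fad', 'saa' (Spatial AV-Align) to reference scores."""
    fvd_term = normalized(fvd, base["fvd"], gt["fvd"])
    fad_term = normalized(fad, base["fad"], gt["fad"])
    # Spatial AV-Align is higher-is-better, so it is turned into an error (1 - score) first.
    saa_term = normalized(1 - saa, 1 - base["saa"], 1 - gt["saa"])
    # Weights: 1 for FVD and FAD, 2 for Spatial AV-Align.
    return fvd_term + fad_term + 2 * saa_term


# Illustrative reference values only; these are not real challenge numbers.
base = {"fvd": 500.0, "fad": 4.0, "saa": 0.5}
gt = {"fvd": 100.0, "fad": 1.0, "saa": 0.9}
```

With these made-up reference values, a submission with FVD 300, FAD 2.5 and Spatial AV-Align 0.7 sits halfway between ground truth and baseline on every metric and therefore scores about 0.5 + 0.5 + 2 * 0.5 = 2.0.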
The top entries in the final leaderboard will be assessed by human evaluation, and the award-winning teams will be selected based only on the results of this subjective evaluation.
TIED ENTRIES
If two or more participating Teams have the same score, a secondary algorithmic metric will be taken into account to break the tie. If all scores are identical and prizes are awarded to the Teams, the prizes will be shared evenly among them.
Potential winners will be contacted via the email associated with the AIcrowd.com account through which the Entry was submitted. If a potential winner cannot be contacted, does not respond as directed, refuses the prize, or is found to be ineligible for any reason, such prize may be forfeited and awarded to an alternate winner. Only one alternate winner will be selected per prize package, after which prizes will remain unawarded.
To be eligible for the prizes, participants will have to release the inference code (and associated weights) for their solutions under an open-source license of their choice, with proper documentation. The submitted code is expected to be reproducible and should produce a score similar to the one on the leaderboard.
To the extent that there is any dispute as to the identity of the potential winner, the official account holder of the email address associated with the AIcrowd account through which the Entry was first submitted will be deemed the official potential winner by Organizers. Prizes will be distributed within six months.
10. YOUR ODDS OF WINNING
ODDS OF WINNING A PRIZE ARE SUBJECT TO THE TOTAL NUMBER OF ELIGIBLE ENTRIES RECEIVED AND HOW YOUR ENTRY SCORES IN ACCORDANCE WITH THE JUDGING CRITERIA.
11. PRIZES
The total prize pool is 35,000 USD, which will be divided as follows.
Track 1: Temporal Alignment Track
- 🥇 First place: 10,000 USD
- 🥈 Second place: 5,000 USD
- 🥉 Third place: 2,500 USD
Track 2: Spatial Alignment Track
- 🥇 First place: 10,000 USD
- 🥈 Second place: 5,000 USD
- 🥉 Third place: 2,500 USD
12. WHEN WILL PRIZES BE AWARDED?
The prizes will be awarded within a commercially reasonable time frame and may take up to six months. All members of a Team may be required to complete and sign additional documentation, such as non-disclosures, representations and warranties, liability and publicity releases (unless prohibited by applicable law), and tax documents, or other similar documentation in order for the potentially winning team to claim the prize. Organizers will in no way be involved in any dispute with respect to receipt of a prize by any other members of a Team.
Only prizes claimed in accordance with these Rules will be awarded.
13. WINNER LIST
A list of all winners of this Challenge will be posted on AIcrowd Site and may be announced at Organizers' discretion via Organizers' Twitter, Facebook, Blog, or Website, or at an Organizer or Organizer Admins sponsored or hosted event.
14. YOUR PERSONAL DATA AND PRIVACY
Organizers may use cookies and/or collect IP addresses for the purpose of implementing or exercising its rights or obligations under the Rules, for information purposes, identifying your location, including without limitation for the purpose of redirecting you to the appropriate geographic website, if applicable, or for any other lawful purpose in accordance with the Privacy Policy.
Organizers may use the personal data you provide via your participation in this Challenge:
- to contact you in relation to the Challenge;
- to confirm the details of your Entry;
- to administer and execute this Challenge, including sharing it with Organizer Admins;
- at Organizers' discretion, to credit you and/or your Team for the Entry, identify you and/or your Team as a Winner, or other similar notice; and
- as otherwise noted in these Rules or as necessary for Organizers to meet their obligations under these Rules or applicable law.
Organizers only require your name and email address to be submitted for you to participate in this Challenge, for the uses outlined in this Section 14. Please read the terms and conditions of the AIcrowd Site carefully to understand how your data may be used by AIcrowd SA.
15. DATA USE AGREEMENT
Transparency in Data Use: Participants agree to uphold complete transparency in the use of additional data from external sources. This includes clear documentation of all methods and adherence to ethical standards.
External Dataset Usage: Participants using external datasets must ensure their use is permissible for non-commercial or academic research purposes. Compliance with licensing terms is mandatory. The use of any publicly available datasets, other than those provided as part of the SVG Challenge resources, must be clearly declared and justified.
Use Justification: Participants must provide a rationale for using any declared datasets, ensuring that their use aligns with the objectives of the SVG Challenge and does not violate any dataset-specific terms of use.
Penalties for Violation: Any violation of these terms will result in immediate disqualification and potential further actions as determined by the challenge organizers.
By participating in the SVG Challenge, you, the Participant, acknowledge and agree to these terms, confirming your understanding and commitment to maintaining the integrity and fairness of the competition.
16. ADDITIONAL TERMS AND CONDITIONS
If Organizers determine, in their sole discretion, that any portion of this Challenge is compromised by viruses, bugs, unauthorized human intervention, or any other causes beyond their control that, in the sole opinion of Organizers, corrupt or impair the administration, security, fairness or proper participation in/of the Challenge, Organizers reserve the right to (a) cancel the Challenge; (b) pause the Challenge until such time as the aforementioned issues may be resolved; or (c) consider for the prizes only those Entries submitted prior to when the Challenge was so compromised.
To the fullest extent permitted by applicable law, you agree that Organizers, Organizer Affiliates, and Organizer Admins, and each of their directors, officers, employees, agents and assigns, will not be liable for personal injuries, death, damages, expenses or costs or losses of any kind resulting from participation or inability to participate in this Challenge or acceptance of or use or inability to use a prize or parts thereof including, without limitation, claims, suits, injuries, losses and damages related to personal injuries, death, damage to or destruction of property, rights of publicity or privacy, defamation or portrayal in a false light (whether intentional or unintentional), whether under a theory of contract, tort (including negligence), warranty or other theory.
Your use of any other products and services, whether or not required by these Rules, is subject to the terms and conditions associated with such products or services, including the AIcrowd site and services.
In the event any clause or provision of these Rules proves unenforceable, void or incomplete, the validity of the other provisions will remain unaffected.