Frequently Asked Questions

General Guidelines

How are Round One and Round Two different in terms of the analysis for the participants?

The Round 1 submission deadline is October 6, 2023.

The Round 2 submission deadline is October 15, 2023.

Round 1: This round is all about prediction accuracy and ‘fairness’ (i.e., equal opportunity) in your solution. Using the HOLDOUT file, an ROC/AUC metric will be calculated for each team to measure the accuracy of the prediction. Similarly, a DISPARITY SCORE will be calculated to measure ‘fairness’ in the modeling solution. These two measures will be combined for each submission and compared across all participants. The top 50 submissions from Round 1 will move on to Round 2.

Round 2: 

Each team will submit a Word document that establishes key performance indicators aligned to business needs, describes the depth of the quantitative analysis that produced actionable business insights, and provides meaningful implications and recommendations based on those results/insights.

  • Multiple judges will review each Round Two submission based on the entirety of the solution: approach, analytics, insights, recommendations, and actionability. The judging panel is made up of subject matter experts, including Data Science professionals from Humana and PhD candidates from Texas A&M.
  • Each of the Top 50 Round Two submissions will be read and evaluated by a panel of five judges.
  • The scores of judging panels will be analyzed and combined to create a composite score for each submission.

Additional details will be provided in the Informational Meeting. Please refer here or visit the official rules page for more details.

Also, refer to the “Fairness in AI Guide” to see more details on the judging criteria.

Once questions pertaining to the case are submitted, when can we expect to receive answers?

Questions will be answered via email within two business days and commonly asked questions will be posted in the FAQ section as they are answered.

Can we get school credit for working on this case?

Competitors may use their submission for class work after the final rounds are complete.

 

Fairness

What are the judging criteria for accuracy and fairness?

Round 1 evaluates modeling accuracy and fairness using objective metrics computed on the HOLDOUT file returned by the participants.

Accuracy: ROC/AUC measure will be calculated

Fairness: Disparity score calculated using RACE & SEX

Additional details will be provided in the Informational Call. Also, refer to the “Fairness in AI Guide” to see more details on the judging criteria.
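For intuition only, here is a minimal sketch of how an ROC/AUC and a simple group-level disparity measure could be computed on a scored holdout. The column names (y_true, SCORE, race, sex) and the max-minus-min disparity formula are illustrative assumptions, not the official metric, which is defined in the Fairness in AI Guide.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Illustration only: column names and the disparity formula below are assumptions;
# the official fairness metric is defined in the Fairness in AI Guide.
def evaluate(df: pd.DataFrame) -> dict:
    out = {"auc": roc_auc_score(df["y_true"], df["SCORE"])}  # Round 1 accuracy metric
    for attr in ("race", "sex"):
        group_means = df.groupby(attr)["SCORE"].mean()       # mean predicted score per group
        out[f"disparity_{attr}"] = group_means.max() - group_means.min()
    return out

# Toy scored-holdout rows for a quick check of the function.
toy = pd.DataFrame({
    "y_true": [0, 1, 1, 0, 1, 0],
    "SCORE":  [0.2, 0.9, 0.6, 0.3, 0.8, 0.4],
    "race":   ["A", "A", "B", "B", "A", "B"],
    "sex":    ["F", "M", "F", "M", "F", "M"],
})
print(evaluate(toy))
```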

Where can I find the Humana-Mays Healthcare Analytics Case Competition Fairness in AI Guide?

You can find the Humana-Mays Healthcare Analytics Case Competition Fairness in AI Guide here.

Can teams have a faculty advisor, or receive support from outside sources?

As stated in the Official Rules, coaching and mentoring from outside sources, other than your registered teammates, is not allowed. These outside sources include, but are not limited to, university faculty, university teaching assistants, university staff, and other professional consultants in related fields.

 

Registration

Will we receive a confirmation email once our team has been registered successfully?

Yes, each team member will receive a confirmation that their team has been registered successfully.

Is there a number for us to dial into the Informational Call?

An email will be sent with a link inviting students to join the Informational Call.

Will the Informational Call recording and PowerPoint be available?

Yes, the recording and PowerPoint will be posted on the website after the Informational call.

Is someone who formerly worked for Humana eligible to compete?

As long as the team member is no longer considered a Humana employee and is a current student in one of the recognized master’s programs, they may compete.

How can we form a team?

Teams are to be formed on your own. As long as you are a full-time and/or part-time master’s student enrolled in the same university, you can be on a team together. Teams can be interdisciplinary.

Can I form a team with an undergraduate student?

No. This competition is open to master’s level students from within the same university.

What is the minimum/maximum number of students allowed on a team?

The team minimum is 2 students and maximum is 4 students.

Can I compete with more than one team?

No, students are only allowed to register for one team.

Have my Round One or Round Two submissions been received?

Each team member will receive a confirmation email that the submission has been received successfully. If you have any issues, please contact humanacasecomp@tamu.edu.

 

Deliverables

Where can I find additional information about the Leaderboard?

You can find the 2023 Leaderboard Guide here.

And you can find additional information here.

Is there a page limit for Round 2 submission?

There is no page limit, but a concise presentation of findings will be noted during judging. Based on previous competitions, finalists typically submit between 15 and 25 pages. Please see previous finalist submissions here.

Do we submit different documentation for rounds 1 and 2?

The Round 1 deliverable deadline is October 6, 2023.

The Round 2 deliverable deadline is October 15, 2023.

Deliverable 1: A scored CSV version of the holdout file that contains three fields: ID, SCORE, and RANK

Deliverable 2: A written summary of your work including key findings, implications, and recommendations.

Refer to previous finalist submissions for several examples of successful submissions.
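For Deliverable 1 specifically, here is a minimal sketch of the expected file shape, using toy IDs and scores. The ID, SCORE, and RANK column names come from the deliverable description above; the ranking convention (1 = highest score) is an assumption to verify against the official rules.

```python
import pandas as pd

# Sketch of the Deliverable 1 file shape only; in practice SCORE would be your
# model's predicted probability for each holdout ID. The toy values are made up.
submission = pd.DataFrame({
    "ID":    [101, 102, 103],
    "SCORE": [0.82, 0.10, 0.47],
})
# RANK here assumes 1 = highest score, with ties broken by row order;
# confirm the expected convention against the official rules.
submission["RANK"] = submission["SCORE"].rank(method="first", ascending=False).astype(int)
submission.to_csv("scored_holdout.csv", index=False)
print(submission)
```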

Do we need an executive summary?

That is up to the participants. There are no exact guidelines for final submissions. They should be professional and concise.

Can we request feedback from non-participants?

Discussion of the case with external parties is not allowed, per the signed NDA. Discussion of programming is allowed, though guidance on the analytic approach is forbidden.

In the final round, can we include visualization tools?

There are no restrictions as long as the final deliverable is in a PowerPoint presentation.

If we are removing observations or making assumptions, do we have to validate them first through email or can we just move forward as long as we have good reasons to back them up?

As long as you have good reason to back it up, it is fine.

 

Data

We are having problems reading the contents of the “Readme.txt” file. What does it say?

Here is all of the information in that file:

  • 2023_Competition_Training.csv       = Data to be used for analysis & model development
  • 2023_Competition_Holdout.csv       = Holdout data to be scored with final model and results returned for mid-cycle leaderboard and/or Oct 16 submission
  • Humana_Mays_2023_DataDictionary.xls       = File Statistics, File Layout, descriptions of attributes for each event type
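A minimal loading sketch for these files, assuming they sit in the current working directory (reading the .xls data dictionary also assumes the xlrd dependency is installed):

```python
import pandas as pd

# Paths assume the three files are in the current working directory.
train = pd.read_csv("2023_Competition_Training.csv")
holdout = pd.read_csv("2023_Competition_Holdout.csv")
data_dictionary = pd.read_excel("Humana_Mays_2023_DataDictionary.xls")  # requires xlrd for .xls

print(train.shape, holdout.shape, data_dictionary.shape)
```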

When will the dataset be available?

Competition data will be distributed to registered and verified teams starting after the September 13, 2023, Informational Call and ending after the registration deadline of September 22, 2023. (Typically, data will be available no more than 48 hours after registration and verification.)

What format will the dataset be in?

The dataset will be available in a CSV file, along with a data dictionary.

Are we allowed to use publicly available data to help us in this case competition?

Yes. Students are encouraged to use open-source data when creating a solution.

Why are there instances in the claims data where the process date is earlier than the service date?

You are correct that the process date should not occur prior to the service/visit date. Two things may be happening: (a) there is a problem with your data (i.e., a read-in or join error), or (b) the data itself is erroneous. Data is messy; you must decide which it is and how to handle it.
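As a concrete illustration of that check, here is a short sketch that flags claim lines whose process date precedes the service date, using toy rows and assumed column names (verify the real names in the data dictionary):

```python
import pandas as pd

# Toy claim lines; the date column names are assumptions — check the data dictionary.
claims = pd.DataFrame({
    "clm_unique_key": ["A1", "A2", "A3"],
    "service_date":   pd.to_datetime(["2023-01-10", "2023-02-03", "2023-02-20"]),
    "process_date":   pd.to_datetime(["2023-01-15", "2023-01-28", "2023-03-01"]),
})

# Flag lines where processing appears to precede the visit: either a read-in/join
# problem on your side, or genuinely messy source data.
suspect = claims[claims["process_date"] < claims["service_date"]]
print(suspect)
```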

What is the distinction between “medclm_key” and “clm_unique_key” in the context of the “medclms_train” dataset, and why does “medclm_key” seem to have a unique value for every row while “clm_unique_key” has duplicate values?

You should think of medclm_key as the primary key for the medical claims table; it is unique for every claim line. The clm_unique_key groups together a single “claim,” which can consist of multiple “claim lines.” These unique claims can, in turn, be combined to form a logical claim that groups together claims from the same provider/member combination with overlapping service dates. We typically use the logical claim, rather than clm_unique_key, to count utilization/visits.
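To illustrate the distinction, here is a toy sketch that rolls claim lines (medclm_key) up into claims (clm_unique_key) and then into a rough provider/member grouping for counting visits. Column names other than the two keys are assumptions, and the final grouping is only a crude proxy for the overlapping-date logic described above.

```python
import pandas as pd

# Toy claim lines: medclm_key is unique per line, clm_unique_key groups lines into
# a claim. The remaining column names are assumptions for illustration.
lines = pd.DataFrame({
    "medclm_key":     [1, 2, 3, 4, 5],
    "clm_unique_key": ["C1", "C1", "C2", "C3", "C3"],
    "member_id":      ["M1", "M1", "M1", "M2", "M2"],
    "provider_id":    ["P1", "P1", "P1", "P9", "P9"],
    "service_date":   pd.to_datetime(["2023-03-01", "2023-03-01", "2023-03-02",
                                      "2023-05-10", "2023-05-10"]),
})

# Roll claim lines up to claims.
claims = (lines.groupby("clm_unique_key")
               .agg(member_id=("member_id", "first"),
                    provider_id=("provider_id", "first"),
                    first_service=("service_date", "min"),
                    last_service=("service_date", "max"))
               .reset_index())

# Crude proxy for a "logical claim": same member/provider combination; the real
# logic would also merge only claims with overlapping service-date windows.
claims["logical_claim"] = claims.groupby(["member_id", "provider_id"]).ngroup()
print(claims)
print("approx. visits counted:", claims["logical_claim"].nunique())
```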

Are we allowed to utilize a Private Github repository to share access to data between team members?

Yes. However, make sure the data is not public, as that would be a violation of the NDA.

Are the two claims datasets claims filed by healthcare providers or by patients?

The way this usually works for medical claims is this: Someone goes to the doctor, they show their insurance card to the doctor, and after the appointment, a billing person submits a claim to the insurance company.

For prescription claims, the process is faster but generally the same. When a patient fills a prescription at a pharmacy, the pharmacy submits the claim to the insurance company.

Since Humana is the insurance company, this is the data we have. Generally, it comes from the provider or pharmacy directly to Humana.

How much on average does the insurance cover for patients in both medical claims and prescription drugs? Do participants have access to that data? 

Coverage amounts vary based on a myriad of factors, including, but not limited to, the plan an individual member has chosen. However, for the purposes of this case competition, those details are not included in the data provided to the participants.

 

Why might one prioritize AUC over recall in this context? To elaborate: when we focus on maximizing recall, there is a slight decrease in AUC, but we end up capturing a larger portion of the positive class. Despite the model’s lower precision, the resulting false positives we investigate could still be advantageous, even if we acknowledge that they might not influence the treatment decision.

It’s not that we believe AUC is more important; it is simply the measure of accuracy chosen for Round 1 evaluation, where every team’s model will be evaluated using the same measuring stick. However, feel free, in subsequent rounds, to make a case for a model that maximizes recall, explain why that makes sense, and note what implications (if any) it carries.
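If a team wants to make that case in later rounds, a small sketch with toy data shows the contrast: AUC is threshold-free and stays fixed, while recall and precision move with whatever operating threshold you choose to justify.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Toy labels and predicted probabilities for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.55, 0.60, 0.70])

auc = roc_auc_score(y_true, y_score)  # threshold-free ranking quality (the Round 1 metric)

# Recall and precision depend on the operating threshold; AUC does not change.
for threshold in (0.5, 0.3):
    y_pred = (y_score >= threshold).astype(int)
    print(f"threshold={threshold}: recall={recall_score(y_true, y_pred):.2f}, "
          f"precision={precision_score(y_true, y_pred):.2f}, auc={auc:.2f}")
```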