Making Sense of Big Data: Five Things to Know About DataFest

Published March 23, 2015

Smithies are gearing up to compete in the second annual American Statistical Association Five College DataFest being held March 27-29 at the University of Massachusetts Amherst.

Three five-member teams from Smith will take up the DataFest challenge to see who can best analyze large data sets and present their findings to judges from industry and academia. Smith teams include first-year students and experienced DataFest competitors, including members of the college’s “Best in Show” team from 2014.

The 48-hour contest uses real-world data and real-world judges and mentors, said Benjamin Baumer, visiting assistant professor in statistical and data sciences at Smith and co-organizer of the ASA Five College DataFest.

“While working on their data sets, the students are also trying to catch the attention of VIP consultants from companies who come to offer advice,” Baumer said. “The competition becomes a capstone experience students can talk about in job interviews.”

The first DataFest was held in 2011 at the University of California, Los Angeles, and quickly spread to other colleges, Baumer said. This year, 75 students have signed up for the ASA Five College DataFest, which is being sponsored for the first time by the American Statistical Association.

Smith’s involvement in the competition is part of the college’s ongoing efforts to encourage student interest in the emerging field of data science. Through a new Women in Data Science collaboration with Mount Holyoke and MassMutual, Smith will offer new data science courses this fall in areas such as machine learning, applied statistics and computer science specialties. MassMutual is also a sponsor of the ASA Five College DataFest.

To find out more about preparations for the event, we spoke with Sara Stoudt ’15 and Elizabeth Atkins ’15, who were also part of last year’s competition. Atkins is the student organizer for this year’s Smith teams. Here are five things they shared about DataFest:

1. Smithies Are Bringing Their Game Faces
Stoudt said the college is poised to do well in this year’s information analysis marathon. The Smith teams represent diverse disciplines, Stoudt said, noting that “on my team we have computer science, math, psychology and engineering majors, and there are a number of first-year students.” Stoudt, who is a math and statistics major, said courses offered as part of Smith’s new Statistical and Data Sciences program have helped give team members an edge. “It’s given us a foundation in working with data,” she said. “That’s one of the things that really sets us apart.”

2. Big Data Means Really Big Data
“It’s not just how many rows but how many observations are included,” said Atkins, a psychology major and a member of Smith’s “Unpredictables” DataFest 2014 team. “The topics covered are also big,” she added. “Our problem last year was about energy consumption, and it included hundreds of industries.” While it may seem that the key challenge of DataFest is finding answers based on the data sets, Stoudt said the truly hard part is identifying the right questions. “You only have a short period of time, and the data comes without guiding questions,” she said. “You have to distill a problem and then communicate what you’ve found in plain language the judges can understand.”

3. It’s a Data Marathon With Snacks
How do the Smith teams maintain their stamina over the two-day competition? “That’s what the Snapple they give you is for,” said Atkins, with a smile. In addition to snacks, “they force you to go home at midnight that first night,” she added. “You’re not allowed to look at the data when you get home. So it forces you to sleep.” Stoudt said her team discovered that keeping the big picture in mind was also important in handling the pressures of the competition. “You learn that done is better than perfect,” she said. “You have to be flexible and not get hung up on any one problem.”

4. The Rewards of DataFest Go Beyond the Competition
While Stoudt enjoyed the “glory” of having her “P Valuables” team win “Best in Show” last year, she said DataFest has other benefits. The competition helped sharpen her teamwork and communications skills. “And it’s a great way to meet new people in the field,” Stoudt said. Atkins, who plans to pursue a Ph.D. in clinical psychology, said she has cited her work on DataFest in her graduate school applications and interviews.

5. It’s Fun to Make Sense of Big Data
One of the best things about DataFest is that the information sets students analyze are about real-world issues, the Smithies said. “You’re also allowed to keep your (computer) code after the judging,” Stoudt pointed out. “I’ve looked back at my code from a year ago, and it’s taught me a lot.” Atkins emphasized that studying data science can be helpful to students across academic disciplines. “Even with English, you can analyze data to figure out the patterns of how words are used in Shakespeare,” she said. “Everyone needs to know how to use data.”

To follow the progress of Smith’s DataFest teams, go to #DataFest on Twitter.

Members of Smith's “P Valuables” team that won “Best in Show/Best Visualization” at DataFest 2014 are (front row from left) Dana Udwin ’14, Michele Handy ’15, Deirdre Fitzpatrick ’14, Maja Milosavijevic ’14 and Sara Stoudt ’15.