
Refining data mining

October 31, 2022 (updated September 16, 2023)

By Allison Campbell-Jensen

Teasing out trends in city parks, uncovering evidence of small hospital closures, tracing changes in how political figures are treated: as they sleuth, faculty and graduate student researchers may find the U of M Libraries' substantial holdings of scholarly journals, news magazines, and newspapers a promising place to dig.

But where to dig first? And with what tool?

Until recently, they might have asked an undergraduate or graduate student to do the grunt work of reviewing huge amounts of content to excavate relevant articles. But the information mountain can be enormous: the New York Times runs about 150 stories a day, according to one estimate. Daily and weekly newspapers in smaller cities, towns, and rural areas run fewer stories, but over one year or five, all those words add up to astronomical totals and stacks of articles that would be nearly impossible to scale.

Now, with a proprietary program, ProQuest TDM (text and data mining), and the help of University of Minnesota Librarian Cody Hennesy and Michael Beckstrand of LATIS (Liberal Arts Technologies and Information Services), scholars can dig more efficiently to unearth the info lode that will serve their needs.

Sophisticated searching


From left: Michael Beckstrand of LATIS; Prof. Yuan Cheng of the Humphrey School; researcher Shuping Wang; and Cody Hennesy of the Libraries

Text and data mining makes it possible to sift through hundreds or thousands of information sources for relevant articles, but identifying good sources often takes far more sophistication than writing a good Google search. That is where Hennesy and Beckstrand can help.

School of Public Health economics professor Sayeh Nikpay consulted with Hennesy and Beckstrand when she wanted to find out how many rural hospitals have closed. The question matters for patient care: when hospitals merge into other systems or are purchased, it is not always clear whether they have switched to entirely outpatient services or still provide care for acute medical issues.

“They were really good at translating my objective” into the TDM program, Nikpay says. “I’m not a computer scientist — I’m literate in coding but I’m not a programmer. If I don’t have that background, I do not know fully what is possible.”

Collaborating with Nikpay's research assistant, Solvejg Wastvedt, Hennesy and Beckstrand came up with search terms that excavated an impressive corpus (body) of articles. Additional work in the Python programming language within Jupyter Notebooks, a web-based computing platform, she says, allowed the team "to whittle down to the articles that were most relevant."
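The article does not describe the team's filtering code. As a hypothetical illustration of what "whittling down" a corpus can mean, the Python sketch below scores each article by how many of a few relevance patterns it matches and keeps only those at or above a threshold. The patterns, the function names, and the threshold of 3 are invented for this example, and the articles are assumed to be plain-text strings already exported from a search.

```python
import re

# Hypothetical relevance patterns for illustration only; these are NOT the
# search terms the team actually used.
RELEVANT_PATTERNS = {
    "closure": re.compile(r"\bclos(?:e|ed|es|ing|ure|ures)\b", re.IGNORECASE),
    "hospital": re.compile(r"\bhospitals?\b", re.IGNORECASE),
    "inpatient": re.compile(r"\binpatient\b", re.IGNORECASE),
    "emergency": re.compile(r"\bemergency (?:room|department)s?\b", re.IGNORECASE),
    "rural": re.compile(r"\brural\b", re.IGNORECASE),
}


def relevance_score(text: str) -> int:
    """Count how many distinct patterns appear at least once in the text."""
    return sum(1 for pattern in RELEVANT_PATTERNS.values() if pattern.search(text))


def whittle(articles: list[str], min_score: int = 3) -> list[str]:
    """Keep articles whose score meets the (assumed) threshold."""
    return [a for a in articles if relevance_score(a) >= min_score]
```

For example, given a story about a rural hospital closing its inpatient unit, a story about a hospital parking ramp, and a story about road closures, only the first matches enough patterns to survive. A real project would tune the terms and threshold by reading samples of what gets kept and dropped.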

Closing hospitals may appear efficient, but, Nikpay says, "Where you enter the health care system really matters. There's a lot that we don't understand yet about choosing a hospital."

Parks and rec

Sussing out how parks have been talked about in the Twin Cities since the late 19th century is a project of Humphrey School Professor Yuan Cheng and a group of postdoctoral and doctoral students. Are public parks beneficial for people, developers, or the economy? To find out, researchers can draw on the Libraries' holdings of local newspaper archives going back more than 120 years. They just need a bit of help, Beckstrand says.

“They are at the stage where they’re interested in that, but they don’t necessarily have the specific skills or knowledge required to get those answers out of the system. So that’s what Cody and I are working on,” Beckstrand says. “How to build a corpus that will have articles that are really talking about the park department or parks in general — and not about parking or people whose last name is Park.”
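The "park versus parking versus Park" problem Beckstrand describes is a classic disambiguation task. The Python sketch below is a hypothetical heuristic, not the team's method: it matches "park" or "parks" only as whole words (so "parking" and "parkway" are skipped) and treats "Park" directly after an honorific such as "Mr." as a likely surname. The function names and the honorific list are assumptions for illustration, and the heuristic will still miss cases such as "Jane Park."

```python
import re

# Whole-word "park" or "parks"; the \b boundary keeps "parking" and "parkway" out.
PARK_WORD = re.compile(r"\bparks?\b", re.IGNORECASE)
# Hypothetical heuristic: an honorific immediately before "Park" suggests a surname.
HONORIFIC_BEFORE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s*$")


def park_mentions(text: str) -> int:
    """Count likely mentions of parks, skipping surnames such as 'Mr. Park'."""
    count = 0
    for match in PARK_WORD.finditer(text):
        if HONORIFIC_BEFORE.search(text[: match.start()]):
            continue  # probably a person named Park
        count += 1
    return count


def build_corpus(articles: list[str]) -> list[str]:
    """Keep only articles with at least one likely mention of a park."""
    return [a for a in articles if park_mentions(a) > 0]
```

Run over a story about the park board expanding Loring Park, a story about downtown parking rates, and a story quoting "Mr. Park," only the first would be kept. Keyword rules like these are a cheap first pass; ambiguous leftovers usually still need a human read.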

Another project examined the circumstances in which political figures made the news, for instance whether their coverage was positive or negative. That project served as a test case for the team, and it showed the great potential of ProQuest TDM.

“From my perspective,” Hennesy says, “this is a very fun space to work in because it’s a little bit new.”

How the Libraries support researchers is also new, and arrangements are still being worked out project by project. (Because of licensing restrictions, the team can take on only one project at a time.)

Academic libraries often bring different experts together, “which I also really like,” Hennesy says. “It’s really interdisciplinary: We’ve gone from political science to public health to the Carlson School of Management.”

Each researcher benefited from the experience, with help from the Libraries and LATIS.


Author markenge


© 2024 Regents of the University of Minnesota. All rights reserved. The University of Minnesota is an equal opportunity educator and employer.