Talent Shortage, Open Source Security Worries Hinder Data Science Developments

Open source security, a lack of talent, and AI bias are all major concerns in the data science field, a new report finds.

Nathan Eddy

October 3, 2022


The prevalence of open source software (OSS) offers organizations the ability to develop data science projects quickly and affordably, although a lack of talent and concerns over open source security are challenges.

These were among the findings from Anaconda's 2022 State of Data Science Report, which found that the most valued benefits of open source are speed of innovation and affordability.

However, open source security concerns resulted in 40% of respondents pulling back on usage, with 31% citing it as the top concern.

Another key survey finding was that a lack of skilled professionals is one of the biggest barriers to the successful enterprise adoption of data science, cited by more than half (56%) of respondents.


Anaconda CEO and co-founder Peter Wang said that, overall, the findings in this year's report match the broader conversations he's seeing in the data science community.

"It was interesting to learn that 65% of respondents cited insufficient investment in data engineering and tooling to enable the production of good models as the biggest barrier to successful enterprise adoption of data science," he said.

Although poor models and inputs are known to cause friction in data science and machine learning (ML), it was unexpected that this response ranked highest, given that obstacles such as insufficient data science skills are also at play, Wang said.

OSS, and particularly the Python programming language, plays an enormous role in data science today.

For decades, developers and data scientists were building and open sourcing the tools they used to analyze large sets of data — often in Python.

Once organizations discovered that they were sitting on a mountain of data that could trigger a new wave of growth, OSS became a mainstay thanks to an unmatched community of innovators and a lower lifetime cost to the business, Wang explained.

"Just look at Python," he said. "Over the last decade, Python has grown to become the most popular programming language used by data scientists, coders, and hobby developers alike and continues to translate into new use cases."

Wang added that he expects open source and data science to continue to push each practice further.

"I'm incredibly hopeful to see more open source involvement as it relates to bias in the data science field, as well as AI and ML," he said.

The survey also found that 32% of students have rarely or never been taught about bias in their AI, ML, or data science classes.


"As we move forward, this should be a major focus for those shaping the future of data science," Wang said. "We'll begin to see priorities shift toward reinvesting in the open source community and its infrastructure, and I'm optimistic we'll see this from education institutions."

Open Source Security Concerns Grow

Security concerns are growing because incidents like the Log4j exploit have served as a major wake-up call across the board, according to Wang.

"We live in a world where open source is now embedded in nearly every piece of software and technology, and up until recently, there were those at a management level who didn't even realize they were using open source," he said.

Now that things have shifted to place more emphasis on securing software supply chains, open source security has become a top priority.


From Wang's perspective, the main challenge is that these conversations around open source security are still somewhat new within the data science community.

"Vetting secure code is something developers and IT are familiar with, but it's a newer territory for data science," Wang explained. "We're starting to see some IT teams building out strict open source posture, leaving data scientists, who aren't expert developers, to do their analysis with whatever they can download."

Organizations Can Help Build Data Science Skills

Wang pointed out that data science is a form of literacy, and it's something that can be taught to any employee.

While some professionals will specialize in becoming data scientists, which requires strong skills and an understanding of math and statistics, others need to grasp only a few specific concepts to see how data can improve their work.

"Realizing that not everyone is a data scientist, but everyone can use data science is an important step for building data literacy skills," Wang said. "Today, non-programmers are increasingly picking up Python not just to analyze data, but also to build applications, games, and other projects."

The best way for organizations to start building these skills among their employees is to teach them the fundamentals of data science, he added.

About the Author(s)

Nathan Eddy

Nathan Eddy is a freelance writer for ITPro Today and covers various IT trends and topics across a wide variety of industries. A graduate of Northwestern University’s Medill School of Journalism, he is also a documentary filmmaker specializing in architecture and urban planning. He currently lives in Berlin, Germany.
