Wednesday, April 11, 2012

Big data in education

The US Department of Education just released a draft version of a brief titled Enhancing Teaching and Learning through Educational Data Mining and Learning Analytics.

One of the old promises of e-learning has been to use data generated in online learning systems to guide student learning, and to help instructors, designers, and managers continually improve the online learning system. In practice, most institutions do very little with the data they have, either because they lack the skills to handle it or because they simply don't know what to do with it. After all, you can only make data work for you when you know which questions you'd like to see answered.

Technology for dealing with big data has developed rapidly in recent years, and the DoE thinks we may be at a tipping point; hence this timely brief.

The report starts off with some familiar scenarios (think Netflix applied to education) before making an interesting and useful distinction between educational data mining and learning analytics:

Educational data mining (EDM) develops methods and applies techniques from statistics, machine learning, and data mining to analyze data collected during teaching and learning. EDM tests learning theories and informs educational practice.
Learning analytics applies techniques from information science, sociology, psychology, statistics, machine learning, and data mining to analyze data collected during education administration and services, teaching and learning. Learning analytics creates applications that directly influence educational practice.

The Journal of Educational Data Mining started in 2009. In 2011 both the International Educational Data Mining Society and the Society for Learning Analytics Research were founded. New societies and journals usually mark the birth of new academic fields.

There is much more good information in the 57-page report, and it will be interesting to see how the response to the brief develops.

Thursday, April 5, 2012

Recommended reading: Planning for Big Data

Big data is one of the buzz phrases this year. If you'd like a quick and well-written introduction to the subject, you're in luck. O’Reilly recently released a free e-book on the subject: Planning for Big Data: A CIO’s Handbook to the Changing Data Landscape.

In 10 chapters and under 80 pages, the book introduces the concept of big data and explains its importance for today’s businesses, a category that would include the world of education, research and libraries.

Chapter 3 introduces Hadoop, the open source software at the core of many big data applications. Chapter 4 offers a survey of the market and its main players: EMC Greenplum, IBM, Microsoft and Oracle. Chapter 5 takes a closer look at Microsoft’s strategy in the area of big data.

Chapter 6 discusses the close relation between the cloud and big data and looks at platform solutions by Amazon, Google and, again, Microsoft. Chapter 7 discusses the rapidly developing market for data. Chapter 8 discusses NoSQL, the family of mostly open source databases for storing and analyzing large amounts of unstructured, heterogeneous data.

Chapter 9 discusses visualization as a way of extracting meaning from data, though only superficially. Chapter 10 closes the book with an outlook on the near future.

If you’d like to learn more about big data and have 90 minutes to spare, I can recommend this book.

Wednesday, April 4, 2012

The state of Open Access

One has to admire people like Richard Poynder. Since 2001 he has been publishing a series of Open Access interviews. And the series itself is, of course, Open Access (OA), under a Creative Commons BY-NC-ND license.

The latest installment, from February 2012, is an interview with Michael Eisen, one of the founders of the Public Library of Science (PLoS).

The interviews tend towards tl;dr: the one with Eisen is a 19-page PDF that opens with a 6-page introduction, followed by 13 pages of interview. However (and I have read most of the interviews in the series), the reader is always rewarded with a fresh view on the ongoing OA debate. My interest in the subject dates way back to 1994, when I gave a talk about electronic journals at that year’s INET conference in Prague. This resulted in an article the following year in the Journal of Information Networking, which these days would probably also count as tl;dr (13 pages if you printed the HTML file).

As in many interviews in the series, the central questions are: the apparent contradiction between green OA (authors self-archiving their research papers) and gold OA (OA journals); business models (yes, there ain’t no such thing as a free lunch, although Eisen thinks the costs, for PLoS that is, could approach zero if faculty did all the work); and how OA will develop in the future, given the recent commotion in the USA about the Research Works Act (RWA), which led to harsh but justified criticism of Elsevier.

My main takeaway from the interview is that Elsevier’s support for the RWA is a (not so) covert attack on green OA. After all these years they are obviously really worried about Stevan Harnad’s 1994 “subversive proposal”. If you were looking for a strong argument for green OA, here you have one. Whether most of the scientists who signed the pledge to boycott Elsevier will indeed refrain from submitting articles to, or reviewing for, Elsevier journals remains to be seen. The Poynder interview mentions a similar initiative in the past that made no difference at all.

What worries me more, though, is that I don’t see much real progress in the OA field, and I think this has to do with the deeply conservative attitudes of both faculty and libraries. Earlier this month I came across a dissertation titled "The Influence of the National Institutes of Health Public-Access Policy on the Publishing Habits of Principal Investigators". Its abstract said, in effect: there’s no influence. I didn’t (couldn’t) bother to read on.

My point is: why do we need journals anymore? And yes, I know all the answers as to why we still need them, but somehow I find these answers less and less convincing as the years in this debate pass. Physics has had arXiv since the early nineties, so why would funding bodies and libraries want to take over particle physics journals (SCOAP3)? Surely, in 2012, it should be possible to move beyond our outdated tenure procedures based on bad metrics like impact factors. That’s the conservatism on the faculty side. But there is also conservatism on the library side, where collection size and collection budget are still regarded as the main ranking parameters.

Faculty and libraries are deeply committed, in ways they often don’t realize, to keeping the current system alive. It reminds me of the famous quote by Daniel C. Dennett: “A scholar is just a library's way of making another library.”

There’s much more interesting material in the latest Poynder interview with Eisen, so go and read it; I’m worried this blog post is already tl;dr ;-).

Update:

A couple of days after I originally wrote this post in February, Elsevier withdrew its support for the RWA; see the press release from February 27: http://www.elsevier.com/wps/find/intro.cws_home/newmessagerwa

However, they also noted the following: “[W]hile withdrawing support for the Research Works Act, we will continue to join with those many other nonprofit and commercial publishers and scholarly societies that oppose repeated efforts to extend mandates through legislation.” That would be the Federal Research Public Access Act (FRPAA).

To be continued …

Can a private investment fund make a difference in higher education?

Just today, via TechCrunch, I came across the announcement of a $100 million University Ventures (UV) fund. German media giant Bertelsmann is the lead investor, together with the University of Texas Investment Management Company (which manages UT’s endowment).

This is how they see their mission and strategy:
UV is an investment fund with over $100M in committed capital focused exclusively on the global higher education sector. UV pursues a differentiated strategy of innovation from within – partnering with (rather than competing against) traditional institutions.
By partnering with top-tier universities and colleges, and then strategically directing private capital to develop programs of exceptional quality that address major economic and social needs, UV expects to set new standards for student outcomes. Specifically, UV is committed to establishing data-driven programs that ensure superior student outcomes, as well as to leveraging technology to lower cost while improving access. All UV programs are student-centric, focused on student retention and completion.

I encourage you to read their full announcement; it’s quite interesting. But it raises some questions. UV estimates the global higher education market at over $1 trillion (a 1 followed by 12 zeros, or a million times a million), so UV’s $100 million fund amounts to at most 0.01 percent of that market. They seem to expect quite some leverage.

My biggest issue is with their strategy of working together with existing institutions. If there’s one thing I’ve learned about universities, it is that they are about the worst learning organizations I know (probably the only institution that is worse is the Catholic Church). All the investments in e-learning over the past 20 years or so have not substantially changed existing HE institutions. Technology has been added to otherwise unchanged curricula and educational practices, with no substantial gains in outcomes whatsoever.

In fact, I can think of only one example of a mildly successful innovation in HE educational practice in the Netherlands, and it predates the era in which technology started (not) to disrupt HE. In 1976, Maastricht University opened its doors. This university designed its curricula around the philosophy of problem-based learning and hired only staff who would adhere to its principles.

My key takeaway from this is that if you want something really new in HE, you must at least build a completely new program, designed from the ground up. I wonder whether this can succeed when your strategy is to work together with existing institutions.

Nevertheless, it will be interesting to see how this develops.

E-learning, who’s in control?

There is an interesting story in yesterday’s New York Times about teachers (as well as parents and students) in Idaho resisting the introduction of computers and online courses in high schools. In a sense it rehashes the debate about e-learning that has been going on since the late 1990s.

One quote stands out for me: “Teachers are resisting, saying that they prefer to employ technology as it suits their own teaching methods and styles”. This perfectly illustrates Larry Cuban’s famous observation: "When teachers adopt technological innovations, these changes typically maintain rather than alter existing classroom practices." (Cuban 2001, p. 71). Time and again we see that, without redesigning courses, introducing e-learning makes little sense: it just adds costs (hardware, software licenses and teacher time) without producing better results. Why is it so hard for teachers to change their dominant practice of lecturing?

Also yesterday, I came across this story about physics teachers who gave up on traditional lecturing because they found that students were not grasping fundamental concepts. Instead, they ask students to go over the material before class and to post questions in a learning management system, which the teacher uses to prepare for class. In the classroom, clickers are used to probe students’ understanding, and students discuss with each other and, more importantly, learn from each other. The simple observation, made in a related article, is that a student who has just learned something may be better than an expert at explaining a new concept to a fellow student. As one of these physics teachers, Eric Mazur, puts it: “That's the irony of becoming an expert in your field. It becomes not easier to teach, it becomes harder to teach because you're unaware of the conceptual difficulties of a beginning learner.”

It’s important for teachers to understand their own limitations as experts, but the hard part, I think, is giving up control. I had an interesting experience in my own teaching career in the 1980s. One day I came to class unprepared and felt both bad and nervous about it. So I started asking students questions and letting them discuss solutions among themselves. It went wonderfully; for the first time I had the feeling that students were really engaged and actively learning. Giving up control turned out to be fun as well.

Larry Cuban (2001). Oversold and Underused: Computers in the Classroom. Cambridge, MA: Harvard University Press.