MSc in Data Science
There has been a recent upsurge in commercial interest in the new role of “data scientist”. A data scientist is a person who excels at manipulating and analysing data, particularly large data sets that don’t fit easily into tabular structures (so-called “Big Data”).
The School of Computing has been working on this kind of data and these forms of analysis for at least five years, not only working with data but also developing new algorithms and techniques for data scientists. The School also runs the most successful Business Intelligence Masters course in the UK, so it has decided to offer a new Masters course in Data Science, which will start in January 2013.
This will be led by Prof. Mark Whitehorn and Andy Cobley. Mark not only holds the Chair of Analytics at the University of Dundee but also runs a successful consultancy company that specialises in BI, Data Sciences and analytics. Andy is the course organiser for both the existing BI course and the new Data Science course. The two have been working together for more years than they care to remember…
In order to teach Data Science it is necessary to consider the skills that a Data Scientist displays.
General skills include:
• excellent analytical capabilities
• machine learning
• data mining
• algorithm development
• writing code
• data visualisation
• understanding multi-dimensional database design and implementation
Specific skills include:
Technologies to handle big data:
• Hadoop and related technologies
• MapReduce and its implementation on differing software platforms
• NoSQL databases
Knowledge of languages:
• Functional and object-oriented languages such as Erlang and Java
General characteristics include:
• Insatiable curiosity
• Interdisciplinary interests
• Excellent communication skills
Duncan Ross (Director of Data Sciences at Teradata) has said that:
The first and most important trait is curiosity. Insane curiosity. In many walks of life evolution selects against the kind of person who decides to find out what happens “if I push that button”. Data Science selects for it.
In addition, communication skills seem to be of paramount importance. Data scientists have to be able to explain what the information means and how it was derived from the underlying data.
However we ultimately define data science, there is clearly a huge overlap with BI. A BI specialist will need to understand data and data analytics; however, there may be a bias towards the implementation details of dealing with data and understanding how data is stored within the system. A data scientist may be less concerned with the physical implementation and more interested in the message the data can deliver. However, without an understanding of the implementation, the data scientist will find it difficult to interrogate the data for its secrets. For this reason there will be both significant overlap and significant differences between the BI and DS courses.
For more information please contact Mark (MarkWhitehorn@computing.dundee.ac.uk) or Andy (firstname.lastname@example.org).
We have been asked for a course outline and are happy to provide the following. However, for several reasons, it is very important that this is not regarded as a complete, definitive, static syllabus.
This is a Masters course and is significantly different from an undergraduate course. A Masters is, in some ways, the bridge between undergraduate study and PhD work. Undergraduates have a syllabus and are examined on it. PhD students start with a research topic and discover things that have never been known before. They are given no syllabus at all. It may sound odd and entirely counter-intuitive but, over three years of study, a PhD student effectively sets the syllabus for the viva they finally take (the viva is the PhD equivalent of an examination). In practice, it can be no other way: when they start work, no one knows what they will find, so how could there be a syllabus?
Hopefully, in that context, the incomplete syllabus of a Masters makes more sense. We expect the students to read around the subject. The lectures are there to set the landscape, to give a feel for the topics that should be studied. They are not definitive. In addition, students complete a research project which will have an outline at the start but, just like a PhD student, the student will ultimately help to set the syllabus for their own project.
Secondly, we expect the students on the course to ask many questions during the lectures. We try to answer them, and often this prompts a relatively long discussion. This is not a mistake or a failing; it is the reason we are all there. The material covered during those discussions may well be entirely appropriate to a Masters in DS, but it might not have appeared on a syllabus. Nevertheless, once it has been covered, it may be appropriate for us to examine the students on it.
In like vein, we have excellent external speakers. We can’t (and certainly do not wish to) control what they say. So we can’t put the points they will cover into a syllabus. But again, the material may well be appropriate for examination.
Finally, our part-time students receive a course outline but will not finish the course until two years have passed. A week may be a long time in politics and two years is certainly a very long time in DS. So it is entirely reasonable for new topics to appear in those two years and for us to both teach them and then examine the students on them.
There is a danger that all of this may make the process sound anarchic, with no control and no framework, as if the students could be examined on any subject at the whim of the lecturers. This is not the case. The outline below spells out the main areas of study, and by far the majority of the examination of the students will focus on the topics given. If a topic comes up that we feel is important, we will of course flag this to the students as we go along. All we are trying to make clear in this document is that, at this level of study and with a topic as dynamic as DS, it would be to the detriment of the students if we set a fixed, completely defined syllabus for a two-year period of study. Our experience so far suggests that the students much prefer the fact that the course can respond to change. After all, they are there not just to gain the qualification but also to actually learn about DS and to keep up to date.
To apply for the course, see the UK Pass Data Science application page.
Course Outline for Masters in Data Science
The module descriptions below may appear to suggest that some modules have more content than others. This is not the case; what currently varies is the level of detail shown.
Module 1 – Analytical systems - introduction and overview
Introduction to the Masters programmes
What’s an MSc?
How is the course delivered?
What is science?
Why do this degree?
Data, information and knowledge
Assignments and writing
Induction and deduction
00 What is analytics?
Data, information and knowledge
On the singular question of plurality
How is analytics used?
01 Intro to data warehousing and BI
Problem – data is scattered and inconsistent
Solution - DW
Origins of the term ‘BI’
How do we define a BI professional?
What is a BI architect?
02 Introduction to Data Science
Origins of the term ‘DS’
How does a Data Scientist differ from a BI person?
How much overlap is there between the two jobs?
03 Introduction to the process of building a BI system
Data warehouse development process
Data warehouse operational process
04 Overview of conceptual modelling of analytical systems
Measures and dimensions
Hierarchies and levels
Alternative modelling methodologies
05 Walk through the creation of a sun model
As the title suggests
06 Requirement analysis
Aims of requirement analysis
Integrating multiple requirements into one warehouse model
Choosing initial projects
07 Tabular vs. Big data
Atomicity is a very important distinguisher
Big data isn’t new
Rows, columns, etc.
Big data statistics
Why is big data interesting?