There has been a recent upsurge in commercial interest in the new role of "data scientist". This MSc (which can be studied part time or full time) will prepare you to become a data scientist: a person who excels at manipulating and analysing data, particularly large data sets that don't fit easily into tabular structures (so-called "Big Data").
The programme is available either full time (1 year) or part time (2 years). The main teaching takes place in intensive weeks: for part time study, one in January and one in April in year 1, and one in February and one in May in year 2. Full time students do all 4 in one year. The intensive weeks are taught in Dundee and run from 9am to 5pm Monday to Thursday. Friday's teaching finishes at 3:30pm to allow students to travel home. Outside these intensive weeks there will be self study, video and audio lectures, some guest speakers and of course assignments. Full time students will have additional seminars in Dundee.
For January 2015 entry, we have a number of fully funded places due to an initiative from the Scottish Funding Council (SFC) designed to support key sectors in the Scottish economy to develop a high-level skills base. Citizens of the EU may be eligible for one of these places. See SFC Funding Programmes for more details.
The School of Computing has been working on this kind of data and these forms of analysis for at least five years, not only working with data but also developing new algorithms and techniques for data scientists. The School also runs the most successful Business Intelligence Masters course in the UK, so it has decided to offer a new Masters course in Data Science, which will start in Jan. 2013.
This will be led by Prof. Mark Whitehorn and Andy Cobley. Mark not only holds the Chair of Analytics at the University of Dundee but also runs a successful consultancy company that specialises in BI, Data Sciences and analytics. Andy is the course organiser for both the existing BI course and the new Data Science course. The two have been working together for more years than they care to remember…
In order to teach Data Science it is necessary to consider the skills that a Data Scientist displays.
General skills include:
- data wrangling
- excellent analytical capabilities
- machine learning
- data mining
- algorithm development
- writing code
- data visualisation
- understanding multi-dimensional database design and implementation
Specific skills include:
- Technologies to handle big data (Storage and Processing)
- Hadoop and related technologies such as Spark
- MapReduce and its implementation on differing software platforms
- NoSQL databases (including an in-depth look at Cassandra)
Knowledge of languages such as:
- Functional and OOP languages such as Erlang and Java
General characteristics include:
- Insatiable curiosity
- Interdisciplinary interests
- Excellent communication skills
Duncan Ross (Director of Data Sciences at Teradata) has said that:
The first and most important trait is curiosity. Insane curiosity. In many walks of life evolution selects against the kind of person who decides to find out what happens “if I push that button”. Data Science selects for it.
In addition, communication skills seem to be of paramount importance. Data scientists have to be able to explain what the information means and how it was derived from the underlying data.
However we ultimately define data science, there is clearly a huge overlap with BI. A BI specialist will need to understand data and data analytics, but there may be a bias towards the implementation details of dealing with data and towards understanding how data is stored within the system. A data scientist may be less concerned with the physical implementation and more interested in the message the data can deliver. However, without an understanding of the implementation, the data scientist will find it difficult to interrogate the data for its secrets.
For more information please contact Mark (MarkWhitehorn@computing.dundee.ac.uk) or Andy (email@example.com).
We have been asked for a course outline and are happy to provide the following. However, for several reasons, it is very important that this is not regarded as a complete, definitive, static syllabus.
Firstly, this is a Masters course and is significantly different from an undergraduate course. A Masters is, in some ways, the bridge between undergraduate study and PhD work. Undergraduates have a syllabus and are examined on that. PhD students start with a research topic and find out information that has never been known before. They are given no syllabus at all. It may sound odd and entirely counter-intuitive but, in their three years of study, a PhD student effectively sets the syllabus for the viva they finally take (the viva is the PhD equivalent of an examination). But in practice, it can be no other way. When they start work, no one knows what they will find, so how could there be a syllabus?
Hopefully, in that context, the incomplete syllabus of a Masters makes more sense. We expect the students to read around the subject. The lectures are there to set the landscape, to give a feel for the topics that should be studied. They are not definitive. In addition, students complete a research project which will have an outline at the start but, just like a PhD student, the student will ultimately help to set the syllabus for their own project.
Secondly, we expect the students on the course to ask many questions during the lectures. We try to answer them and often this prompts a relatively long discussion. This is not a mistake or a failing; it is the reason we are all there. The material covered during those discussions may well be entirely appropriate to a Masters in DS but it might not have appeared on a syllabus. Nevertheless, once covered, it may be appropriate for us to examine the students on it.
In a similar vein, we have excellent external speakers. We can’t (and certainly do not wish to) control what they say. So we can’t put the points they will cover into a syllabus. But again, the material may well be appropriate for examination.
Finally, our part-time students receive a course outline but will not finish the course until two years have passed. A week may be a long time in politics and two years is certainly a very long time in DS. So it is entirely reasonable for new topics to appear in those two years and for us to both teach them and then examine the students on them.
There is a danger that all of this may make the process sound anarchistic with no control and no framework, as if the students can be examined on any subject at the whim of the lecturers. This is not the case. The outline below spells out the main areas of study and by far the majority of the examination of the students will be focused on the topics given. And if we feel that a topic that comes up is important, we will of course flag this to the students as we go along. All we are trying to make clear in this document is that, at this level of study, with a topic as dynamic as DS, it is to the detriment of the students if we set a fixed, completely defined syllabus for a two-year period of study. And our experience so far suggests that the students much prefer the fact that the course has the capacity to be responsive to change. After all, they are there, not just to gain the qualification, but also to actually learn about DS and to keep up to date.
In order to apply for the course, see the UKPASS Data Science application page.
Part time year 1 (Full time students do all 6 modules in one year)
- AC52038 Introduction to Business Intelligent Systems
- AC52043 Big Data
- AC52040 Analytical Database Design and Modelling
Part time year 2
- AC53009 Data Analysis and Visualisation
- AC53005 Analytical Languages
- AC53010 Advanced Statistics and Data Mining
In addition, in 2015 we will be offering AC53008 (ETL Theory and Practice) as a substitute for AC53010.
For entry requirements and funding options, please see the online prospectus page.