Data Mining Practice and Analysis
Assignment – Data Mining Practice and Analysis
Due date: 5pm Friday 15 January 2021 (in week 9)
Aims
- Become familiar with some well-known data mining techniques, in order to understand their working principles;
- Apply data mining techniques to domain-specific datasets;
- Review cutting-edge data mining techniques to gain a good overview of current data mining technology.
Requirements (Tasks)
The whole task of this assignment consists of the following procedural steps.
Step 1
Find and download a data set that you think is about an interesting topic.
There will be a number of data sets on LearnJCU that the lecturer thinks are interesting, and you are welcome to pick one of those, but there are also thousands of other data sets available from web sites such as these:
- https://toolbox.google.com/datasetsearch
- http://kaggle.com/datasets
- http://www.kdnuggets.com/datasets/index.html
- http://archive.ics.uci.edu/ml/
- http://service.re3data.org/search/
- https://dataverse.harvard.edu/
- https://www.icpsr.umich.edu/icpsrweb/ICPSR/access/subject.jsp
- http://catalog.data.gov/dataset
- http://dataportals.org/
- http://mldata.org/
- http://oad.simmons.edu/oadwiki/Data_repositories
- https://www.quandl.com/
- http://www.google.com/publicdata
- http://lib.stat.cmu.edu/datasets/
- http://webscope.sandbox.yahoo.com/catalog.php
- http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
- https://github.com/caesar0301/awesome-public-datasets
- https://www.reddit.com/r/datasets/
Step 2
The original data set often comes with a short article describing it, or at least a name. Use Google Scholar at https://scholar.google.com (or a similar academic citation index) to find a few articles that use data mining on the same data set (or a similar data set).
If no article uses that same data set, then try looking for articles that use data mining on the same topic.
A few suggestions:
- If there are many articles that use your data set, then just pick a few that are recent, popular, or otherwise interesting.
- You don’t have to read the whole article! Just read the introduction, then skip to the back, and look for a results section or results table. Only then can you decide if the article is worth reading in more detail.
Google Scholar can also format the references for the articles you cite.
It’s good to do this background reading first, before you do any data mining.
Step 3
Choose appropriate data mining techniques, and run some algorithms.
You can select either of two options for this assignment.
- Option (1) – Programming-intensive Assignment
  - Once you have your own domain-specific dataset and a chosen data mining algorithm, you need to design and implement that algorithm in your preferred programming language.
  - A series of preprocessing steps will be required at this stage. Design the preprocessing procedure carefully (what kind of processing is required, how it will be done, and why) so that your data is ready to be fed to your program. Some parts of this procedure can be included in your program as part of a “pre-data-mining module”.
  - Your final program must be a stand-alone data mining tool designed for your own data analysis purpose. It is expected to include the following modules (and may include more sub-modules if needed):
    - pre-data-mining module – performs the necessary preprocessing and gets the data ready to be fed to the next module (the data-mining module). You don’t need to include all required preprocessing in this module; some initial preprocessing (e.g. cleaning noisy data) can be done externally using other software tools (e.g. Excel or Weka).
    - data-mining module – implements the chosen data mining algorithm. You can borrow the algorithm directly from a popular existing data mining method, or you can design your own algorithm (by amending an existing one).
    - post-mining module – presents/reports the output produced by the previous modules. The result can be a simple text report, optionally supplemented by a non-text visualization (e.g. graph, chart or diagram).
  - This programming-intensive assignment still requires an analysis. Try to find all the patterns you can detect with your implemented algorithm. Compare and contrast the results obtained with your chosen preprocessing scheme and algorithm against those obtained with other existing algorithms or other preprocessing methods.
  Note: to compare the results of your program with those of other existing algorithms, you can use existing data mining tools (e.g. Weka) to obtain the results for the other algorithms.
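As a rough illustration of the three-module structure described for Option (1), the sketch below chains a pre-data-mining step (min-max scaling), a data-mining step (a plain k-means seeded with the first k records) and a post-mining step (a text report). The function names, the choice of k-means and the six sample records are all assumptions made for this example; they are not requirements of the assignment.

```python
import math


def min_max_scale(rows):
    """Pre-data-mining module: rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def k_means(points, k, max_iter=100):
    """Data-mining module: plain k-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no change, so the clustering has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def report(labels, centroids):
    """Post-mining module: summarise the clusters as a short text report."""
    lines = []
    for c, centre in enumerate(centroids):
        coords = ", ".join(f"{v:.2f}" for v in centre)
        lines.append(f"cluster {c}: {labels.count(c)} records, centroid ({coords})")
    return "\n".join(lines)


# Hypothetical records: two numeric attributes on very different scales.
data = [
    [1.0, 200.0], [1.2, 220.0], [0.8, 210.0],
    [8.0, 900.0], [8.5, 950.0], [7.9, 880.0],
]

labels, centroids = k_means(min_max_scale(data), k=2)
print(report(labels, centroids))
```

On this toy data the six records separate into two clusters of three. A real submission would replace the hard-coded list with a file reader and would probably add a chart to the post-mining module.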
- Option (2) – Analysis-intensive Assignment
  - Once you have chosen your own domain-specific dataset, you need to design your own data mining analysis scheme. This scheme can consist of multiple steps:
    - Set up a strategy for preprocessing your data.
      A series of preprocessing steps will be required and needs to be designed carefully (what kind of processing is required, how it will be done, and why). You may include several different preprocessing schemes for comparison.
    - Set up a strategy for data mining.
      You need to select one data mining area (clustering, classification, or association rule mining) of your choice and AT LEAST TWO existing data mining algorithms in that area.
      For example, if you choose clustering as your data mining area, you can apply two algorithms, DBSCAN and k-means, and compare the two results.
      Alternatively, you can design a combined algorithm that applies multiple algorithms from the same or different data mining areas in series. Your strategy can also apply different parameters to one algorithm, or apply multiple preprocessing (attribute selection) schemes to one algorithm.
  - You can choose one data mining tool (e.g. Weka) to analyze your chosen dataset. Apply the data mining strategy you have set up to your preprocessed data using the tool, and try to find all the patterns you can detect.
  - Run various comparison experiments, either by applying different data mining algorithms (or strategies) to the same dataset or by applying the same algorithm to differently preprocessed datasets.
  - Critically analyze the experimental results and discuss/demonstrate why a chosen algorithm (strategy) is superior or inferior to the others.
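One of the comparison experiments suggested for Option (2), applying the same algorithm to differently preprocessed data, can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a 1-nearest-neighbour classifier scored by leave-one-out accuracy, and a tiny hypothetical data set deliberately constructed so that the second attribute is noisy and on a much larger scale than the first. With a tool such as Weka the same experiment would be run through its own filters and classifiers instead.

```python
import math


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, row in enumerate(rows):
        # Nearest other record by Euclidean distance.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(row, rows[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


# Hypothetical data: attribute 1 separates the classes, attribute 2 is
# large-scale noise that dominates the distance when left unscaled.
rows = [
    [0.10, 100.0], [0.20, 900.0], [0.15, 500.0],
    [0.90, 120.0], [0.80, 880.0], [0.85, 520.0],
]
classes = ["a", "a", "a", "b", "b", "b"]

raw_acc = loo_accuracy(rows, classes)
scaled_acc = loo_accuracy(min_max_scale(rows), classes)
print(f"raw attributes:    {raw_acc:.2f}")
print(f"min-max scaled:    {scaled_acc:.2f}")
```

On this constructed data the unscaled run misclassifies every record and the scaled run classifies every record correctly. Real data sets will show far smaller differences, which is exactly what the critical analysis in the report should explain.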
Step 4
- You need to write a research report of a minimum of 15–20 pages (for CP5634 students) on your project, summarising your algorithm and experimental results. The report should contain all topics listed above for presentation, but in more detail.
- As mentioned in Step 2 above, your report for CP5634 requires a literature review section, i.e. a summary of previous work on the same data set (or on similar data sets).
- Feel free to be critical of earlier work: in past semesters, students have found glaring errors in previous research, or have obtained better accuracy than previous research.
- Note that the review / summary of previous research is worth 10% of the whole course, so this is an important part of the project. You don’t have to re-invent the wheel; it’s good to see what others have already done.
- Please refer to the following link if you need to get further ideas about a “literature review”:
http://www-public.jcu.edu.au/libcomp/assist/training/JCUPRD_026326
- The research paper must follow the generally accepted format of a research article, consisting of: introduction; related work (a brief review of the methodologies, i.e. the algorithms/strategies used); a summarised description of your experimental settings and procedures (description of data, justification of chosen data mining area, justification of chosen algorithm, preprocessing details, etc.); comparison; discussion; issues; conclusion; possible future work; and a list of references. You may add more sections if needed.
- In addition to the general components listed above, the report for the “Programming-intensive option” should include a summary of your program (the program structure, implementation details, a summarised algorithm for the main modules, etc., including code if necessary).
- For the “Analysis-intensive option”, the report must include a more in-depth analysis of the investigation and the experimental comparisons made during the project.
Submission
- Report submission due: 5pm Monday 22 September 2020 (in week 10)
- You need to submit your final report as a single document file (MS Word or PDF format) to the electronic drop box on LearnJCU.
- For the “Programming-intensive option”, you need to submit the source code and executable file of your program together with your report. Please make a zip file including all necessary files (report document and program files).
Useful Links
- http://www.kdnuggets.com/
- http://www.cs.waikato.ac.nz/ml/weka/
- http://mlearn.ics.uci.edu/MLRepository.html
- http://kdd.ics.uci.edu/
- http://www.sigkdd.org/
Writing Skills: http://www-public.jcu.edu.au/learningskills/resources/wsonline/
Scientific Report Writing:
http://unilearning.uow.edu.au/report/2b.html
http://writing.wisc.edu/Handbook/ScienceReport.html
RUBRIC
Quality of response levels: NO RESPONSE, POOR / UNSATISFACTORY, SATISFACTORY, GOOD, EXCELLENT
Content (worth a maximum of 50% of the total points)
- No response (zero points): Student failed to submit the final paper.
- Poor / unsatisfactory (20 points out of 50): The essay illustrates poor understanding of the relevant material by failing to address or incorrectly addressing the relevant content; failing to identify or inaccurately explaining/defining key concepts/ideas; ignoring or incorrectly explaining key points/claims and the reasoning behind them; and/or incorrectly or inappropriately using terminology; and elements of the response are lacking.
- Satisfactory (30 points out of 50): The essay illustrates a rudimentary understanding of the relevant material by mentioning but not fully explaining the relevant content; identifying some of the key concepts/ideas though failing to fully or accurately explain many of them; using terminology, though sometimes inaccurately or inappropriately; and/or incorporating some key claims/points but failing to explain the reasoning behind them or doing so inaccurately. Elements of the required response may also be lacking.
- Good (40 points out of 50): The essay illustrates solid understanding of the relevant material by correctly addressing most of the relevant content; identifying and explaining most of the key concepts/ideas; using correct terminology; explaining the reasoning behind most of the key points/claims; and/or, where necessary or useful, substantiating some points with accurate examples. The answer is complete.
- Excellent (50 points): The essay illustrates exemplary understanding of the relevant material by thoroughly and correctly addressing the relevant content; identifying and explaining all of the key concepts/ideas; using correct terminology; explaining the reasoning behind key points/claims; and substantiating, as necessary/useful, points with several accurate and illuminating examples. No aspects of the required answer are missing.
Use of Sources (worth a maximum of 20% of the total points)
- No response (zero points): Student failed to include citations and/or references, or failed to submit a final paper.
- Poor / unsatisfactory (5 points out of 20): Sources are seldom cited to support statements and/or the format of citations is not recognizable as APA 6th Edition format. There are major errors in the formation of the references and citations. And/or there is a major reliance on highly questionable sources. The student fails to provide an adequate synthesis of research collected for the paper.
- Satisfactory (10 points out of 20): References to scholarly sources are occasionally given; many statements seem unsubstantiated. Frequent errors in APA 6th Edition format leave the reader confused about the source of the information. There are significant errors in the formation of the references and citations. And/or there is a significant use of highly questionable sources.
- Good (15 points out of 20): Credible scholarly sources are used effectively to support claims and are, for the most part, clear and fairly represented. APA 6th Edition is used with only a few minor errors. There are minor errors in references and/or citations. And/or there is some use of questionable sources.
- Excellent (20 points): Credible scholarly sources are used to give compelling evidence to support claims and are clearly and fairly represented. APA 6th Edition format is used accurately and consistently. The student uses more than the required number of references in the development of the assignment.
Grammar (worth a maximum of 20% of the total points)
- No response (zero points): Student failed to submit the final paper.
- Poor / unsatisfactory (5 points out of 20): The paper does not communicate ideas/points clearly due to inappropriate use of terminology and vague language; thoughts and sentences are disjointed or incomprehensible; organization is lacking; and/or there are numerous grammatical, spelling and punctuation errors.
- Satisfactory (10 points out of 20): The paper is often unclear and difficult to follow due to some inappropriate terminology and/or vague language; ideas may be fragmented, wandering and/or repetitive; organization is poor; and/or there are some grammatical, spelling and punctuation errors.
- Good (15 points out of 20): The paper is mostly clear as a result of appropriate use of terminology and minimal vagueness; no tangents and no repetition; fairly good organization; almost perfect grammar, spelling, punctuation, and word usage.
- Excellent (20 points): The paper is clear, concise, and a pleasure to read as a result of appropriate and precise use of terminology, total coherence of thoughts and presentation, and logical organization; the essay is error free.
Structure of the Paper (worth 10% of the total points)
- No response (zero points): Student failed to submit the final paper.
- Poor / unsatisfactory (3 points out of 10): Student needs to develop better formatting skills. The paper omits significant structural elements required for an APA 6th edition paper. Formatting of the paper has major flaws. The paper does not conform to APA 6th edition requirements whatsoever.
- Satisfactory (5 points out of 10): Appearance of the final paper demonstrates the student’s limited ability to format the paper. There are significant errors in formatting and/or the total omission of major components of an APA 6th edition paper. These can include the omission of the cover page, abstract, and page numbers. Additionally, the paper has major formatting issues with spacing or paragraph formation. Font size might not conform to size requirements. The paper is also significantly too long or too short.
- Good (7 points out of 10): Research paper presents an above-average use of formatting skills. The paper has slight errors. These can include small errors or omissions in the cover page, abstract, page numbers, and headers. There could also be slight formatting issues with the document spacing or the font. Additionally, the paper might slightly exceed or undershoot the required number of written pages for the assignment.
- Excellent (10 points): Student provides a high-caliber, formatted paper. This includes an APA 6th edition cover page, abstract, page numbers and headers, and the paper is double spaced in 12-point Times Roman font. Additionally, the paper conforms to the required number of written pages and goes neither over nor under the specified length.