3.5 Project Overview (Instructions)

3.5.1 Overview

In this project, you will analyze a dataset and then communicate your findings about it. You will use the Python libraries NumPy, pandas, and Matplotlib to make your analysis easier.

Preparation for this project: the Intro to Data Analysis course

3.5.2 What do I need to install?

You will need an installation of Python, plus the following libraries:

* pandas
* NumPy
* Matplotlib
* csv (part of the Python standard library, so it needs no separate installation)

We recommend installing Anaconda, which comes with all of the necessary packages, as well as Jupyter Notebook (formerly IPython Notebook). You can find installation instructions here.
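After installing, you may want to confirm that the libraries listed above can be found by your Python interpreter. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for illustration.

```python
import importlib.util

# Import names of the libraries listed above; csv ships with Python itself.
REQUIRED = ["pandas", "numpy", "matplotlib", "csv"]


def missing_modules(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

If anything is reported as missing, it can usually be installed with conda or pip before you continue.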

3.5.3 Why this Project?

In this project, you’ll go through the data analysis process and see how everything fits together. Later Nanodegree projects will focus on individual pieces of the data analysis process.

You’ll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier! These skills are also sought after by employers.

3.5.4 What will I learn?

After completing the project, you will:

  • Know all the steps involved in a typical data analysis process
  • Be comfortable posing questions that can be answered with a given dataset and then answering those questions
  • Know how to investigate problems in a dataset and wrangle the data into a format you can use
  • Have practice communicating the results of your analysis
  • Be able to use vectorized operations in NumPy and pandas to speed up your data analysis code
  • Be familiar with pandas’ Series and DataFrame objects, which let you access your data more conveniently
  • Know how to use Matplotlib to produce plots showing your findings
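To give a feel for the vectorized operations and the Series and DataFrame objects mentioned in the list above, here is a small sketch. All values and column names are invented for illustration and do not come from any project dataset.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Loop version: convert each element one at a time.
heights_m_loop = np.array([h / 100 for h in heights_cm])

# Vectorized version: one expression applied to the whole array.
heights_m = heights_cm / 100

# A Series adds an index, so values can be looked up by label.
s = pd.Series(heights_m, index=["a", "b", "c", "d"])

# A DataFrame holds several labelled columns; column arithmetic is vectorized too.
df = pd.DataFrame({"height_m": heights_m, "weight_kg": [50.0, 60.0, 70.0, 80.0]})
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

print(s["b"])
print(df)
```

The vectorized expression gives the same result as the loop, but it is shorter and typically much faster on large arrays.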

3.5.5 2. Project Details

3.5.5.1 How do I Complete this Project?

This project is connected with the Introduction to Data Analysis course, but depending on your background knowledge, you may not need to take the whole class to complete this project.

3.5.5.2 Introduction

For the final project, you will conduct your own data analysis and create a shareable file that documents your findings. You should start by taking a look at your dataset and brainstorming what questions you could answer using it. Then you should use pandas and NumPy to answer the questions you are most interested in, and create a report sharing the answers. You will not be required to use inferential statistics or machine learning to complete this project, but you should make it clear in your communications that your findings are tentative. This project is open-ended in that we are not looking for one right answer.
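A first look at a dataset might go something like the sketch below. The CSV content is an invented stand-in for a real file, and dropping rows with a missing age is only one possible wrangling choice, not a recommendation for your dataset.

```python
import io

import pandas as pd

# Stand-in for a dataset file; the columns and values are invented.
raw = io.StringIO(
    "id,age,city\n"
    "1,34,Lisbon\n"
    "2,,Porto\n"
    "3,51,\n"
)

# With a real dataset you would pass the path of your CSV file instead.
df = pd.read_csv(raw)

print(df.head())    # first few rows
print(df.dtypes)    # column types as pandas inferred them

# Count missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# One possible wrangling step: keep only the rows that have an age.
clean = df.dropna(subset=["age"])
print(len(clean))
```

Looking at the types and missing values early often suggests both the questions you can ask and the cleaning you will need to document.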

3.5.5.3 Step One - Choose Your Data Set

Click this link to open a document with links and information about data sets that you can investigate for this project. You must choose one of these datasets to complete the project.

3.5.5.4 Step Two - Get Organized

Eventually you’ll want to submit your project (and share it with friends, family, and employers). Get organized before you begin. We recommend creating a single folder that will eventually contain:

  • The report communicating your findings
  • Any Python code you wrote as part of your analysis
  • The data set you used (which you will not need to submit)

You may wish to use Jupyter notebook, in which case you can submit both the code you wrote and the report of your findings in the same document. Otherwise, you will need to submit your report and code separately. If you would like a notebook template to help organize your investigation, you can find a link in the resources at the bottom of the page or you can click here. You can also complete and submit the project in the classroom by going to the Project Notebook part of this lesson.

3.5.5.5 Step Three - Analyze Your Data

Brainstorm some questions you could answer using the data set you chose, then start answering those questions. You can find some questions in the data set options to help you get started.

Try to pose questions that encourage looking at relationships between multiple variables. You should aim to analyze at least one dependent variable and three independent variables in your investigation. Make sure you use NumPy and pandas where they are appropriate!
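As a rough illustration of comparing one dependent variable against three independent variables, consider the sketch below. The records, column names, and age bands are all hypothetical; your own dataset will call for different groupings.

```python
import pandas as pd

# Hypothetical records; column names and values are invented for illustration.
df = pd.DataFrame({
    "outcome": [1, 0, 1, 1, 0, 0],               # dependent variable
    "group": ["a", "a", "b", "b", "c", "c"],     # independent variable 1
    "sex": ["f", "m", "f", "m", "f", "m"],       # independent variable 2
    "age": [22, 35, 41, 29, 58, 63],             # independent variable 3
})

# Mean outcome for each level of one categorical variable.
rate_by_group = df.groupby("group")["outcome"].mean()

# Mean outcome across two variables at once, shown as a small table.
rate_by_group_sex = df.groupby(["group", "sex"])["outcome"].mean().unstack()

# Bin a numeric variable so it can be compared in the same way.
df["age_band"] = pd.cut(
    df["age"], bins=[0, 30, 50, 100], labels=["<=30", "31-50", ">50"]
)
rate_by_age = df.groupby("age_band", observed=True)["outcome"].mean()

print(rate_by_group)
print(rate_by_group_sex)
print(rate_by_age)
```

Differences between groups in a table like this are descriptive only, so report them as tentative rather than as proof of a cause.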

3.5.5.6 Step Four - Share Your Findings

Once you have finished analyzing the data, create a report that shares the findings you found most interesting. If you use a Jupyter notebook, share your findings alongside the code you used to perform the analysis. Make sure that your report text is contained in Markdown cells to clearly distinguish your comments and findings from your code work. You should also feel free to use other tools and software to craft your final report, but make sure that you can submit your report as an HTML or PDF file so that it can be opened easily.

3.5.5.7 Step Five - Review

Use the Project Rubric to review your project. If you are happy with your submission, then you’re ready to submit your project. If you see room for improvement, keep working to improve your project!

3.5.6 3. Video

3.5.7 4. Investigate a Dataset

3.5.8 Project Submission

Choose one of Udacity’s curated datasets and investigate it using NumPy and pandas. Go through the entire data analysis process, starting by posing a question and finishing by sharing your findings.

3.5.9 Evaluation

Use the Project Rubric to review your project. If you are happy with your submission, then you are ready to submit! If you see room for improvement in any category in which you do not meet specifications, keep working!

Your project will be evaluated by a Udacity reviewer according to the same Project Rubric. Your project must “meet specifications” or “exceed specifications” in each category in order for your submission to pass.

3.5.10 Submission

3.5.10.1 What to include in your submission

  1. A PDF or HTML file containing your analysis. This file should include:
     • A note specifying which dataset you analyzed
     • A statement of the question(s) you posed
     • A description of what you did to investigate those questions
     • Documentation of any data wrangling you did
     • Summary statistics and plots communicating your final results
  2. Code you used to perform your analysis. If you used a Jupyter notebook, you can submit your .ipynb. Otherwise, you should submit the code separately in .py file(s).
  3. A list of websites, books, forums, blog posts, GitHub repositories, etc. that you referred to or used in creating your submission (add N/A if you did not use any such resources).
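The summary statistics and plots asked for in the first item could be produced along the lines of the sketch below. The data are invented, and the output file name and location are arbitrary choices made for this example.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented values for illustration only.
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "score": [2.0, 4.0, 6.0, 8.0]})

# Summary statistics: count, mean, std, min, quartiles, max.
summary = df["score"].describe()
mean_by_group = df.groupby("group")["score"].mean()
print(summary)

# A labelled bar chart of the group means.
fig, ax = plt.subplots()
mean_by_group.plot(kind="bar", ax=ax)
ax.set_xlabel("Group")
ax.set_ylabel("Mean score")
ax.set_title("Mean score by group (made-up data)")
fig.tight_layout()

# Saved to the system temp directory here; use your project folder in practice.
out_path = Path(tempfile.gettempdir()) / "mean_score_by_group.png"
fig.savefig(out_path)
plt.close(fig)
```

Whatever plots you include, label the axes and give each a title so a reader can interpret it without reading the code.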

3.5.10.2 Jupyter notebook instructions

If you used a Jupyter notebook on your computer to create your project, you can include all your code and analysis in the notebook and do not need to create additional files for your analysis. You will still need to export your work as a PDF or HTML file (see point 1 above) and include it in your submission. To download your notebook as an HTML file, click File -> Download as -> HTML (.html) within the notebook. If you get an error such as “No module named”, open a terminal and try installing the missing module using pip install <module_name> (don’t include the “<” or “>” or any words following a period in the module name).
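If you prefer the command line to the menu, the same two tasks can be sketched in Python as below. This assumes the `jupyter nbconvert` command is available on your system; the function names and the file name `report.ipynb` are hypothetical and used only for illustration.

```python
import subprocess
import sys


def nbconvert_command(notebook_path):
    """Build the command that converts a notebook to HTML with nbconvert."""
    return ["jupyter", "nbconvert", "--to", "html", notebook_path]


def pip_install_command(module_name):
    """Build the command that installs a package with the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", module_name]


def run(command):
    """Run a command and raise an error if it fails."""
    subprocess.run(command, check=True)


# Example (not executed here): run(nbconvert_command("report.ipynb"))
print(" ".join(nbconvert_command("report.ipynb")))
```

Note that the name you pass to pip is the package name, which for some libraries differs from the name used in the import statement.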

3.5.10.3 Ready to submit your project?

Click on the “Submit Project” button and follow the instructions to submit!

It can take us up to a week to grade the project, but in most cases it is much faster. You will get an email when your submission has been reviewed.

If you are having any problems submitting your project or wish to check on the status of your submission, please email us at review-support@udacity.com.

A work by AH Uyekita

anderson.uyekita[at]gmail.com