This work is a snippet reconstructed from coursework for 95-885 Data Science and Big Data at Carnegie Mellon University.
Built with pandas; exported from a Jupyter Notebook.
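Goal: reshape the Billboard chart dataset (billboard.csv) so that it conforms to H. Wickham’s definition of ‘tidy’ data, in which each variable forms a column and each observation forms a row.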
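As a minimal illustration of the reshaping this notebook performs, the sketch below melts a hypothetical two-track, three-week table (toy values, not taken from billboard.csv) from the wide layout, with one column per chart week, into the tidy layout, with one row per track-week observation.

```python
import pandas as pd

# Hypothetical toy data in the same "wide" layout as billboard.csv:
# one row per track, one column per chart week
wide = pd.DataFrame({
    'artist': ['Artist A', 'Artist B'],
    'track': ['Song 1', 'Song 2'],
    'x1st.week': [10, 42],
    'x2nd.week': [7, 50],
    'x3rd.week': [5, None],  # Song 2 is off the chart in week 3
})

# Tidy ("long") layout: every non-ID column becomes a (week, rank) pair
tidy = pd.melt(wide, id_vars=['artist', 'track'],
               var_name='week', value_name='rank')
print(tidy)
```

Each of the 2 × 3 track-week pairs becomes its own row, and the missing week-3 value is carried through as NaN, the same pattern that appears in the melted Billboard output.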
In [1]:
import pandas as pd
import datetime
In [2]:
# Read the CSV into a DataFrame, then display the first and last five rows
# to confirm that the right file was loaded (',' is read_csv's default separator)
df = pd.read_csv('billboard.csv')
display(df.head())
display(df.tail())
| | year | artist | track | time | genre | date.entered | date.peaked | x1st.week | x2nd.week | x3rd.week | ... | x67th.week | x68th.week | x69th.week | x70th.week | x71st.week | x72nd.week | x73rd.week | x74th.week | x75th.week | x76th.week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | Destiny's Child | Independent Women Part I | 3:38 | Rock | 9/23/2000 | 11/18/2000 | 78 | 63.0 | 49.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2000 | Santana | Maria, Maria | 4:18 | Rock | 2/12/2000 | 4/8/2000 | 15 | 8.0 | 6.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2000 | Savage Garden | I Knew I Loved You | 4:07 | Rock | 10/23/1999 | 1/29/2000 | 71 | 48.0 | 43.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2000 | Madonna | Music | 3:45 | Rock | 8/12/2000 | 9/16/2000 | 41 | 23.0 | 18.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2000 | Aguilera, Christina | Come On Over Baby (All I Want Is You) | 3:38 | Rock | 8/5/2000 | 10/14/2000 | 57 | 47.0 | 45.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 83 columns
| | year | artist | track | time | genre | date.entered | date.peaked | x1st.week | x2nd.week | x3rd.week | ... | x67th.week | x68th.week | x69th.week | x70th.week | x71st.week | x72nd.week | x73rd.week | x74th.week | x75th.week | x76th.week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 312 | 2000 | Ghostface Killah | Cherchez LaGhost | 3:04 | R&B | 8/5/2000 | 8/5/2000 | 98 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 313 | 2000 | Smith, Will | Freakin' It | 3:58 | Rap | 2/12/2000 | 2/12/2000 | 99 | 99.0 | 99.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 314 | 2000 | Zombie Nation | Kernkraft 400 | 3:30 | Rock | 9/2/2000 | 9/2/2000 | 99 | 99.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 315 | 2000 | Eastsidaz, The | Got Beef | 3:58 | Rap | 7/1/2000 | 7/1/2000 | 99 | 99.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 316 | 2000 | Fragma | Toca's Miracle | 3:22 | R&B | 10/28/2000 | 10/28/2000 | 99 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 83 columns
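The head and tail above show 317 rows (index 0 to 316) and 83 columns: 7 descriptive columns plus 76 weekly-rank columns, x1st.week through x76th.week. A hypothetical helper such as `check_wide_layout` (not part of the original notebook) could make that visual check explicit before melting; it assumes every non-ID column should follow the `x<ordinal>.week` naming pattern.

```python
import pandas as pd


def check_wide_layout(df: pd.DataFrame, id_cols: list) -> list:
    """Hypothetical helper: verify the ID columns exist and that every other
    column looks like a weekly-rank column ('x1st.week', 'x2nd.week', ...).
    Returns the week columns in their original order."""
    missing = [c for c in id_cols if c not in df.columns]
    if missing:
        raise ValueError(f"missing ID columns: {missing}")
    week_cols = [c for c in df.columns if c not in id_cols]
    unexpected = [c for c in week_cols
                  if not (c.startswith('x') and c.endswith('.week'))]
    if unexpected:
        raise ValueError(f"unexpected non-week columns: {unexpected}")
    return week_cols


# Small synthetic frame using the same column naming scheme
demo = pd.DataFrame(columns=['year', 'artist', 'x1st.week', 'x2nd.week'])
print(check_wide_layout(demo, ['year', 'artist']))
```

Applied to the notebook's `df` with its seven ID columns, the returned list should contain 76 names, which can then be passed to `pd.melt` as `value_vars`.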
In [3]:
# Melt the 76 weekly-rank columns (x1st.week ... x76th.week) into two columns:
# 'week' holds the original column name, 'rank' holds the chart position that week.
# Building the column list from df.columns avoids typing all 76 names by hand.
id_cols = ['year', 'artist', 'track', 'time', 'genre', 'date.entered', 'date.peaked']
week_cols = [col for col in df.columns if col.endswith('.week')]
df2 = pd.melt(df, id_vars=id_cols, value_vars=week_cols, var_name='week', value_name='rank')
display(df2)
| | year | artist | track | time | genre | date.entered | date.peaked | week | rank |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | Destiny's Child | Independent Women Part I | 3:38 | Rock | 9/23/2000 | 11/18/2000 | x1st.week | 78.0 |
| 1 | 2000 | Santana | Maria, Maria | 4:18 | Rock | 2/12/2000 | 4/8/2000 | x1st.week | 15.0 |
| 2 | 2000 | Savage Garden | I Knew I Loved You | 4:07 | Rock | 10/23/1999 | 1/29/2000 | x1st.week | 71.0 |
| 3 | 2000 | Madonna | Music | 3:45 | Rock | 8/12/2000 | 9/16/2000 | x1st.week | 41.0 |
| 4 | 2000 | Aguilera, Christina | Come On Over Baby (All I Want Is You) | 3:38 | Rock | 8/5/2000 | 10/14/2000 | x1st.week | 57.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24087 | 2000 | Ghostface Killah | Cherchez LaGhost | 3:04 | R&B | 8/5/2000 | 8/5/2000 | x76th.week | NaN |
| 24088 | 2000 | Smith, Will | Freakin' It | 3:58 | Rap | 2/12/2000 | 2/12/2000 | x76th.week | NaN |
| 24089 | 2000 | Zombie Nation | Kernkraft 400 | 3:30 | Rock | 9/2/2000 | 9/2/2000 | x76th.week | NaN |
| 24090 | 2000 | Eastsidaz, The | Got Beef | 3:58 | Rap | 7/1/2000 | 7/1/2000 | x76th.week | NaN |
| 24091 | 2000 | Fragma | Toca's Miracle | 3:22 | R&B | 10/28/2000 | 10/28/2000 | x76th.week | NaN |
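Because `pd.melt` produces one row per track per week column, the result should have 317 × 76 = 24,092 rows, which matches the last index (24091) shown above. Many of those rows are NaN placeholders for weeks in which a track was not on the chart, and the `week` values are still column labels rather than numbers. The sketch below is one possible follow-up, not the original notebook's code: the function name `tidy_weeks` is hypothetical, and the chart-date rule (week n falls n - 1 weeks after `date.entered`) is an assumption about how the dataset is laid out.

```python
import pandas as pd


def tidy_weeks(long_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleanup of the melted frame: drop weeks with no chart
    position, convert labels like 'x1st.week' to integers, and derive a
    calendar date for each observation."""
    out = long_df.dropna(subset=['rank']).copy()
    # 'x1st.week' -> 1, 'x76th.week' -> 76
    out['week'] = out['week'].str.extract(r'x(\d+)', expand=False).astype(int)
    # Ranks were stored as floats only because of the NaNs
    out['rank'] = out['rank'].astype(int)
    # Assumption: week 1 is the week of 'date.entered'; week n is n - 1 weeks later
    entered = pd.to_datetime(out['date.entered'], format='%m/%d/%Y')
    out['date'] = entered + pd.to_timedelta((out['week'] - 1) * 7, unit='D')
    return out.sort_values(['artist', 'track', 'week']).reset_index(drop=True)


# Synthetic rows shaped like df2 (values are illustrative only)
demo = pd.DataFrame({
    'artist': ['Artist A'] * 3,
    'track': ['Song 1'] * 3,
    'date.entered': ['9/2/2000'] * 3,
    'week': ['x1st.week', 'x2nd.week', 'x3rd.week'],
    'rank': [99.0, 97.0, float('nan')],
})
print(tidy_weeks(demo))
```

On the notebook's data this would be called as `tidy_weeks(df2)`. Dropping the NaN rows is a design choice: a track's absence from the chart is then implied by the missing week rather than stored as an explicit empty observation.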