Some exercices following tutorial Discover Pandas to make you more comfortable with the concepts.

SendTo: school < at > lucbertin < dot > com
Subject: PYTHON - EXS - <SCHOOL> - <FIRSTNAME LASTNAME (of the team referee)>
Body of the email:First and lastnames of all the team members (included you) + their school email adress
Attached: A Jupyter Notebook converted in HTML file

Base de données accidents corporels de la circulation

data.gouv est une plateforme de diffusion de données publiques de l’État français lancée en 2011.
data.gouv.fr est développé par Etalab, une mission placée sous l’autorité du Premier ministre.

Pour cet exercice nous allons utiliser quelques données tabulaires (format CSV) sur les accidents corporels liés de la circulation.

Voici les 3 url vers les CSV correspondants que nous allons exploiter.

url_usagers2017 = "https://static.data.gouv.fr/resources/base-de-donnees-accidents-corporels-de-la-circulation/20180927-111153/usagers-2017.csv"
url_lieux2017 = "https://static.data.gouv.fr/resources/base-de-donnees-accidents-corporels-de-la-circulation/20180927-111131/lieux-2017.csv"
url_caracteristiques2017 = "https://static.data.gouv.fr/resources/base-de-donnees-accidents-corporels-de-la-circulation/20180927-111012/caracteristiques-2017.csv"

1. ••• Open the CSVs whose urls are given above, store the `DataFrames` in variables `usagers2017`, `lieux2017`, `caracteristiques2017`.

2. ••• Show the 10 first lines of `usagers2017`

3. ••• Show the 10 last lines of `caracteristiques2017`

4. ••• How many lines does `caracteristiques2017` contain ?

5. ••• How many column does `caracteristiques2017` contain ?

6. ••• Show the dtype of each column of `usagers2017`. Same for `caracteristiques2017`.

7. ••• Does `Num_Acc` in `usager2017` contain duplicated values ? What about `Num_Acc` in `caracteristiques2017` ? Hint: `duplicated()`…

8. ••• Conclude on the type of relationship if we were to join `catacteristiques2017` and `usagers2017` on `Num_Acc` ( one-to-one? one-to-many? many-to-many?)

9. ••• Replace all values “1” and “2” in column `sexe` by `Homme` and `Femme`

10. ••• Show women who had experienced accidents.

11. ••• Replace each integers in `grav` (gravité de l’accident) column by their corresponding mapping.

- Indemne
- Tué
- Blessé hospitalisé 
- Blessé léger

12. ••• Same for `catu` (catégorie d’usagers) column

- Conducteur
- Passager
- Piéton
- Piéton en roller ou en trottinette
- Autre véhicule

13. ••• Show the counts for each distinct values in `grav`.

14. ••• Plot it.

15. ••• Show the counts for each distinct values in `catu`.

16. ••• Find women who had mild accidents (“Indemne” or “Blessé léger”, but not more severe!).

Hint: you can use masking and save in an intermediate variable for clean code.

17. ••• Show the number of accidents by `sexe` AND `grav`ity.

18. ••• Using `GroupBy`

19. Using a `Pivot Table`

20. ••• Display it in the form of a stacked bar-chart.

21. ••• Do a merge between `usagers2017` and `caracteristiques2017` on `Num_Acc`.

22. Is there any new value in `Num_Acc` in the final merged table compared to either of table that has been used for merging ? (e.g. does `Num_acc` has a value that does not exist in `usagers2017` but does exist in `caracteristiques2017`, or the other way around)

Hint: DataFrame.equals(other)

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements

23. ••• Count missing values in each column of `caracteristiques2017`. Hint: `.isnull()`

24. Filter the results only by taking the columns having more than 0 missing values + sorted by decreasing number of them.

25. Show the same number of missing values as a pourcentage of the total number of values (lines).

26. ••• Select the accidents who took place in Paris county (département <=> “750” in the table)

27. ••• Plot the map/the maps of accidents by gravity in Paris county.

28. Plot a graph that “shows graphically” the importance of missing values in each column of the dataset.

Let’s play with the dates

29. Rename the columns:

‘jour’ as ‘day’
‘mois’ as ‘month’
‘an’ as ‘year’

Some Pandas exercices

Base de données accidents corporels de la circulation

1. ••• Open the CSVs whose urls are given above, store the `DataFrames` in variables `usagers2017`, `lieux2017`, `caracteristiques2017`.

2. ••• Show the 10 first lines of `usagers2017`

3. ••• Show the 10 last lines of `caracteristiques2017`

4. ••• How many lines does `caracteristiques2017` contain ?

5. ••• How many column does `caracteristiques2017` contain ?

6. ••• Show the dtype of each column of `usagers2017`. Same for `caracteristiques2017`.

7. ••• Does `Num_Acc` in `usager2017` contain duplicated values ? What about `Num_Acc` in `caracteristiques2017` ? Hint: `duplicated()`…

8. ••• Conclude on the type of relationship if we were to join `catacteristiques2017` and `usagers2017` on `Num_Acc` ( one-to-one? one-to-many? many-to-many?)

9. ••• Replace all values “1” and “2” in column `sexe` by `Homme` and `Femme`

10. ••• Show women who had experienced accidents.

11. ••• Replace each integers in `grav` (gravité de l’accident) column by their corresponding mapping.

12. ••• Same for `catu` (catégorie d’usagers) column

13. ••• Show the counts for each distinct values in `grav`.

14. ••• Plot it.

15. ••• Show the counts for each distinct values in `catu`.

16. ••• Find women who had mild accidents (“Indemne” or “Blessé léger”, but not more severe!).

17. ••• Show the number of accidents by `sexe` AND `grav`ity.

18. ••• Using `GroupBy`

19. Using a `Pivot Table`

20. ••• Display it in the form of a stacked bar-chart.

21. ••• Do a merge between `usagers2017` and `caracteristiques2017` on `Num_Acc`.

22. Is there any new value in `Num_Acc` in the final merged table compared to either of table that has been used for merging ? (e.g. does `Num_acc` has a value that does not exist in `usagers2017` but does exist in `caracteristiques2017`, or the other way around)

23. ••• Count missing values in each column of `caracteristiques2017`. Hint: `.isnull()`

24. Filter the results only by taking the columns having more than 0 missing values + sorted by decreasing number of them.

25. Show the same number of missing values as a pourcentage of the total number of values (lines).

26. ••• Select the accidents who took place in Paris county (département <=> “750” in the table)

27. ••• Plot the map/the maps of accidents by gravity in Paris county.

28. Plot a graph that “shows graphically” the importance of missing values in each column of the dataset.

Let’s play with the dates

29. Rename the columns:

30. Add ‘20’ to each element in the lately renamed `year` column, (e.g.: 2017) so to be an understandable year format for `pd.to_datetime`.

31. Create a `date` column taking into account the year, month, and day from question29 and using `pd.to_datetime` in question30.

32. Plot the daily death trends during the entire year.

Join Newsletter

Written by Luc

Some Pandas exercices

Base de données accidents corporels de la circulation

1. ••• Open the CSVs whose urls are given above, store the DataFrames in variables usagers2017, lieux2017, caracteristiques2017.

2. ••• Show the 10 first lines of usagers2017

3. ••• Show the 10 last lines of caracteristiques2017

4. ••• How many lines does caracteristiques2017 contain ?

5. ••• How many column does caracteristiques2017 contain ?

6. ••• Show the dtype of each column of usagers2017. Same for caracteristiques2017.

7. ••• Does Num_Acc in usager2017 contain duplicated values ? What about Num_Acc in caracteristiques2017 ? Hint: duplicated()…

8. ••• Conclude on the type of relationship if we were to join catacteristiques2017 and usagers2017 on Num_Acc ( one-to-one? one-to-many? many-to-many?)

9. ••• Replace all values “1” and “2” in column sexe by Homme and Femme

10. ••• Show women who had experienced accidents.

11. ••• Replace each integers in grav (gravité de l’accident) column by their corresponding mapping.

12. ••• Same for catu (catégorie d’usagers) column

13. ••• Show the counts for each distinct values in grav.

14. ••• Plot it.

15. ••• Show the counts for each distinct values in catu.

16. ••• Find women who had mild accidents (“Indemne” or “Blessé léger”, but not more severe!).

17. ••• Show the number of accidents by sexe AND gravity.

18. ••• Using GroupBy

19. Using a Pivot Table

20. ••• Display it in the form of a stacked bar-chart.

21. ••• Do a merge between usagers2017 and caracteristiques2017 on Num_Acc.

22. Is there any new value in Num_Acc in the final merged table compared to either of table that has been used for merging ? (e.g. does Num_acc has a value that does not exist in usagers2017 but does exist in caracteristiques2017, or the other way around)

23. ••• Count missing values in each column of caracteristiques2017. Hint: .isnull()

24. Filter the results only by taking the columns having more than 0 missing values + sorted by decreasing number of them.

25. Show the same number of missing values as a pourcentage of the total number of values (lines).

26. ••• Select the accidents who took place in Paris county (département <=> “750” in the table)

27. ••• Plot the map/the maps of accidents by gravity in Paris county.

28. Plot a graph that “shows graphically” the importance of missing values in each column of the dataset.

Let’s play with the dates

29. Rename the columns:

30. Add ‘20’ to each element in the lately renamed year column, (e.g.: 2017) so to be an understandable year format for pd.to_datetime.

31. Create a date column taking into account the year, month, and day from question29 and using pd.to_datetime in question30.

32. Plot the daily death trends during the entire year.

Join Newsletter

Written by Luc

1. ••• Open the CSVs whose urls are given above, store the `DataFrames` in variables `usagers2017`, `lieux2017`, `caracteristiques2017`.

2. ••• Show the 10 first lines of `usagers2017`

3. ••• Show the 10 last lines of `caracteristiques2017`

4. ••• How many lines does `caracteristiques2017` contain ?

5. ••• How many column does `caracteristiques2017` contain ?

6. ••• Show the dtype of each column of `usagers2017`. Same for `caracteristiques2017`.

7. ••• Does `Num_Acc` in `usager2017` contain duplicated values ? What about `Num_Acc` in `caracteristiques2017` ? Hint: `duplicated()`…

8. ••• Conclude on the type of relationship if we were to join `catacteristiques2017` and `usagers2017` on `Num_Acc` ( one-to-one? one-to-many? many-to-many?)

9. ••• Replace all values “1” and “2” in column `sexe` by `Homme` and `Femme`

11. ••• Replace each integers in `grav` (gravité de l’accident) column by their corresponding mapping.

12. ••• Same for `catu` (catégorie d’usagers) column

13. ••• Show the counts for each distinct values in `grav`.

15. ••• Show the counts for each distinct values in `catu`.

17. ••• Show the number of accidents by `sexe` AND `grav`ity.

18. ••• Using `GroupBy`

19. Using a `Pivot Table`

21. ••• Do a merge between `usagers2017` and `caracteristiques2017` on `Num_Acc`.

22. Is there any new value in `Num_Acc` in the final merged table compared to either of table that has been used for merging ? (e.g. does `Num_acc` has a value that does not exist in `usagers2017` but does exist in `caracteristiques2017`, or the other way around)

23. ••• Count missing values in each column of `caracteristiques2017`. Hint: `.isnull()`

30. Add ‘20’ to each element in the lately renamed `year` column, (e.g.: 2017) so to be an understandable year format for `pd.to_datetime`.

31. Create a `date` column taking into account the year, month, and day from question29 and using `pd.to_datetime` in question30.