Schedule

Calendar of resources

The material in this module is designed to be worked through in an intensive one-week format, followed by an assessment that showcases data science skills. For enrolled students, the work is supported by several live sessions during the main week of delivery.

Outline of material


SQL mini course (freeCodeCamp)

Mon

Reading: Pettit 2022 Ch 01 and Ch 02

Download and install: MySQL Community Server :: PopSQL

⌨️ (0:00) Introduction
⌨️ (2:36) What is a Database?
⌨️ (23:10) Tables & Keys
⌨️ (43:31) SQL Basics
⌨️ (52:26) MySQL Windows Installation
⌨️ (1:01:59) MySQL Mac Installation
⌨️ (1:15:49) Creating Tables
⌨️ (1:31:05) Inserting Data
⌨️ (1:38:17) Constraints
⌨️ (1:48:11) Update & Delete
⌨️ (1:56:11) Basic Queries
⌨️ (2:08:37) Company Database Intro
⌨️ (2:14:05) Creating Company Database
⌨️ (2:30:27) More Basic Queries
⌨️ (2:26:24) Functions
⌨️ (2:45:13) Wildcards
⌨️ (2:53:53) Union
⌨️ (3:01:36) Joins
⌨️ (3:11:49) Nested Queries
⌨️ (3:21:52) On Delete
⌨️ (3:30:05) Triggers
⌨️ (3:42:12) ER Diagrams Intro
⌨️ (3:55:53) Designing an ER Diagram
⌨️ (4:08:34) Converting ER Diagrams to Schemas


Databricks mini course

Tues

Reading: Gupta 2022 Ch 01 and Ch 02

Databricks account and fast start :: data files zip :: lab notebooks zip :: hacktivity notebooks zip

(Tues-Fri suggested ‘module’ schedule)

Mod 00 – Intro and Setup :: slides

Mod 01 – Spark Architecture :: slides

Mod 02 – SparkSQL (Read/Write DataFrames/Tables) :: slides

Hacktivity 00 (Dates) / Hacktivity 01 (Air)

Wed

Mod 03 – SparkSQL (Transform) :: slides

Mod 04 – Complex Data Types :: slides

Mod 05 – JSON (Optional) :: slides

Hacktivity 02 (Fly)

Thurs

Mod 06 – Streaming :: slides

Mod 07 – Architecture-Spark UI :: slides

Mod 08 – Catalog-Catalyst-Tungsten :: slides

Mod 09 – Adaptive Query Execution :: slides

Fri

Mod 10 – Performance Tuning :: slides

Hacktivity 03 (Stream) / Hacktivity 04 (Air)

Mod 11 – Machine Learning :: slides 11-1 :: slides 11-2 :: slides 11-3 :: slides 11-4

References

Readings books :: Readings articles

Books

Gupta, V., 2022. Business Intelligence with Databricks SQL: Concepts, tools, and techniques for scaling business intelligence on the data lakehouse. Packt Publishing.

Pettit, T., Cosentino, S., 2022. The MySQL Workshop: A practical guide to working with data and managing databases with MySQL. Packt Publishing.

Teate, R.M.P., 2021. SQL for Data Scientists: A Beginner’s Guide for Building Datasets for Analysis, 1st ed. Wiley, Indianapolis.

Articles

Amani, M., Ghorbanian, A., Ahmadi, S.A., Kakooei, M., Moghimi, A., Mirmazloumi, S.M., Moghaddam, S.H.A., Mahdavi, S., Ghahremanloo, M., Parsian, S., Wu, Q., Brisco, B., 2020. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13, 5326–5350. https://doi.org/10.1109/JSTARS.2020.3021052

Cravero, A., Sepúlveda, S., 2021. Use and Adaptations of Machine Learning in Big Data—Applications in Real Cases in Agriculture. Electronics 10, 552. https://doi.org/10.3390/electronics10050552

Guo, H., 2017. Big Earth data: A new frontier in Earth and information sciences. Big Earth Data 1, 4–20. https://doi.org/10.1080/20964471.2017.1403062

Kamilaris, A., Kartakoullis, A., Prenafeta-Boldú, F.X., 2017. A review on the practice of big data analysis in agriculture. Computers and Electronics in Agriculture 143, 23–37. https://doi.org/10.1016/j.compag.2017.09.037

Northcott, R., 2020. Big data and prediction: Four case studies. Studies in History and Philosophy of Science Part A 81, 96–104. https://doi.org/10.1016/j.shpsa.2019.09.002

Pham, X., Stack, M., 2018. How data analytics is transforming agriculture. Business Horizons 61, 125–133. https://doi.org/10.1016/j.bushor.2017.09.011

Runting, R.K., Phinn, S., Xie, Z., Venter, O., Watson, J.E.M., 2020. Opportunities for big data in conservation and sustainability. Nature Communications 11, 2003. https://doi.org/10.1038/s41467-020-15870-0

Tamiminia, H., Salehi, B., Mahdianpari, M., Quackenbush, L., Adeli, S., Brisco, B., 2020. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS Journal of Photogrammetry and Remote Sensing 164, 152–170. https://doi.org/10.1016/j.isprsjprs.2020.04.001



Harper Adams Data Science


This module is a part of the MSc in Data Science for Global Agriculture, Food, and Environment at Harper Adams University, led by Ed Harris.