Schedule
Calendar of resources
The material in this module is designed to be completed in an intensive one-week format, followed by an assessment designed to showcase data science skills. For enrolled students, the work is supported by several live sessions during the main week of delivery.
SQL mini course (freeCodeCamp)
Mon
Reading: Pettit 2022 Ch 01 and Ch 02
Download and install: MySQL Community Server :: PopSQL
⌨️ (0:00) Introduction
⌨️ (2:36) What is a Database?
⌨️ (23:10) Tables & Keys
⌨️ (43:31) SQL Basics
⌨️ (52:26) MySQL Windows Installation
⌨️ (1:01:59) MySQL Mac Installation
⌨️ (1:15:49) Creating Tables
⌨️ (1:31:05) Inserting Data
⌨️ (1:38:17) Constraints
⌨️ (1:48:11) Update & Delete
⌨️ (1:56:11) Basic Queries
⌨️ (2:08:37) Company Database Intro
⌨️ (2:14:05) Creating Company Database
⌨️ (2:30:27) More Basic Queries
⌨️ (2:26:24) Functions
⌨️ (2:45:13) Wildcards
⌨️ (2:53:53) Union
⌨️ (3:01:36) Joins
⌨️ (3:11:49) Nested Queries
⌨️ (3:21:52) On Delete
⌨️ (3:30:05) Triggers
⌨️ (3:42:12) ER Diagrams Intro
⌨️ (3:55:53) Designing an ER Diagram
⌨️ (4:08:34) Converting ER Diagrams to Schemas
Databricks mini course
Tues
Reading: Gupta 2022 Ch 01 and Ch 02
Databricks account and fast start :: data files zip :: lab notebooks zip :: Hacktivity notebooks zip
(Suggested module schedule for Tues–Fri)
Mod 00 – Intro and Setup :: slides
Mod 01 – Spark Architecture :: slides
Mod 02 – SparkSQL (Read/Write DataFrames/Tables) :: slides
Hacktivity 00 (Dates) / Hacktivity 01 (Air)
Wed
Mod 03 – SparkSQL (Transform) :: slides
Mod 04 – Complex Data Types :: slides
Mod 05 – JSON (Optional) :: slides
Hacktivity 02 (Fly)
Thurs
Mod 06 – Streaming :: slides
Mod 07 – Architecture: Spark UI :: slides
Mod 08 – Catalog, Catalyst, and Tungsten :: slides
Mod 09 – Adaptive Query Execution :: slides
Fri
Mod 10 – Performance Tuning :: slides
Hacktivity 03 (Stream) / Hacktivity 04 (Air)
Mod 11 – Machine Learning :: slides 11-1 :: slides 11-2 :: slides 11-3 :: slides 11-4
References
Reading lists: books :: articles
Books
Gupta, V., 2022. Business Intelligence with Databricks SQL: Concepts, tools, and techniques for scaling business intelligence on the data lakehouse. Packt Publishing.
Pettit, T., Cosentino, S., 2022. The MySQL Workshop: A practical guide to working with data and managing databases with MySQL. Packt Publishing.
Teate, R.M.P., 2021. SQL for Data Scientists: A Beginner’s Guide for Building Datasets for Analysis, 1st ed. Wiley, Indianapolis.
Articles
Amani, M., Ghorbanian, A., Ahmadi, S.A., Kakooei, M., Moghimi, A., Mirmazloumi, S.M., Moghaddam, S.H.A., Mahdavi, S., Ghahremanloo, M., Parsian, S., Wu, Q., Brisco, B., 2020. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13, 5326–5350. https://doi.org/10.1109/JSTARS.2020.3021052
Cravero, A., Sepúlveda, S., 2021. Use and Adaptations of Machine Learning in Big Data—Applications in Real Cases in Agriculture. Electronics 10, 552. https://doi.org/10.3390/electronics10050552
Guo, H., 2017. Big Earth data: A new frontier in Earth and information sciences. Big Earth Data 1, 4–20. https://doi.org/10.1080/20964471.2017.1403062
Kamilaris, A., Kartakoullis, A., Prenafeta-Boldú, F.X., 2017. A review on the practice of big data analysis in agriculture. Computers and Electronics in Agriculture 143, 23–37. https://doi.org/10.1016/j.compag.2017.09.037
Northcott, R., 2020. Big data and prediction: Four case studies. Studies in History and Philosophy of Science Part A 81, 96–104. https://doi.org/10.1016/j.shpsa.2019.09.002
Pham, X., Stack, M., 2018. How data analytics is transforming agriculture. Business Horizons 61, 125–133. https://doi.org/10.1016/j.bushor.2017.09.011
Runting, R.K., Phinn, S., Xie, Z., Venter, O., Watson, J.E.M., 2020. Opportunities for big data in conservation and sustainability. Nature Communications 11, 2003. https://doi.org/10.1038/s41467-020-15870-0
Tamiminia, H., Salehi, B., Mahdianpari, M., Quackenbush, L., Adeli, S., Brisco, B., 2020. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS Journal of Photogrammetry and Remote Sensing 164, 152–170. https://doi.org/10.1016/j.isprsjprs.2020.04.001
Harper Adams Data Science
This module is a part of the MSc in Data Science for Global Agriculture, Food, and Environment at Harper Adams University, led by Ed Harris.