# Unsorted
- "Machine Learning: The High-Interest Credit Card of Technical Debt" - https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43146.pdf
- https://www.linkedin.com/posts/alec-campanini_alecfromwalmart-business-leadership-activity-7063905885156225024-tz8E
# Approaches
- Clustering
- https://en.wikipedia.org/wiki/Latent_semantic_analysis
- https://aisel.aisnet.org/cgi/viewcontent.cgi?article=4025&context=cais
- "Topic detection"
- https://www.tidytextmining.com/topicmodeling.html
# Databases
## Structure
- https://servian.dev/why-transactional-databases-are-better-suited-for-data-warehouse-control-frameworks-cb9379048967
## ETL
- https://servian.dev/why-batch-time-and-job-based-orchestration-are-false-economies-556fb9a72bd
- https://servian.dev/using-talends-dynamic-run-job-to-run-jobs-in-parallel-and-sequential-order-c4cc061b487a
## Observability?
- https://www.montecarlodata.com/blog-what-is-data-observability/
- https://learn.microsoft.com/en-us/sql/data-quality-services/data-quality-services?view=sql-server-ver16
- https://medium.com/weareservian/visualize-data-lineage-using-only-sql-13f720870f1f
- https://docs.getre.io/ui-latest/#/graph?model=postgres.toy_shop_sources.toy_shop_customers&tab=test
# Governance?
- https://servian.dev/4-data-governance-strategies-to-support-efficient-machine-learning-e0ca544485ef
- https://techmagie.wordpress.com/2020/07/19/data-governance-what-when-why-who-and-how-of-data/
# Sources
- "Mosaic" data:
- https://www.experian.com.au/mosaic
- Pulls together Experian's credit data with some other sources to estimate incomes for quite small groups (apparently around 40 households)
- Tags: customer data, income data
# Testing & QA
- Data QA
- https://dataform.co/blog/data-assertions
- https://www.kdnuggets.com/2021/05/soda-io-managing-data-quality-sql-scale.html
- Query Testing
- https://dataform.co/blog/unit-tests?utm_medium=organic&utm_source=dataform_blog&utm_campaign=advanced_data_quality_testing
- Cf. "Data-driven testing"
- [Matt Kaye - Pull Requests, Code Review, and The Art of Requesting Changes](https://matthewrkaye.com/posts/series/doing-data-science/2023-04-14-code-review/code-review.html "https://matthewrkaye.com/posts/series/doing-data-science/2023-04-14-code-review/code-review.html")
- [Tidyteam code review principles (tidyverse.org)](https://code-review.tidyverse.org/ "https://code-review.tidyverse.org/")
- https://twitter.com/yabellini/status/1656450313895682052?s=09
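
A rough sketch of the "data assertion" idea from the Data QA links above, as I understand it: an assertion is just a query that selects the rows breaking a rule, and it passes when the query returns nothing. The table, column names and rules here are made up for illustration, and this uses plain `sqlite3` rather than Dataform's or Soda's actual syntax.

```python
import sqlite3


def failing_rows(conn, assertion_sql):
    """Run an assertion query; any rows it returns are rule violations."""
    return conn.execute(assertion_sql).fetchall()


# Hypothetical toy table with one NULL email and one duplicated id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, email TEXT);
    INSERT INTO customers VALUES
        (1, 'a@example.com'),
        (2, NULL),
        (2, 'b@example.com');
    """
)

# Hypothetical assertions: each query selects the rows that break a rule.
assertions = {
    "email_not_null": "SELECT id FROM customers WHERE email IS NULL",
    "id_unique": "SELECT id FROM customers GROUP BY id HAVING COUNT(*) > 1",
}

results = {name: failing_rows(conn, sql) for name, sql in assertions.items()}

for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} offending rows)"
    print(f"{name}: {status}")
```

Keeping each rule as a standalone query means the same check can run ad hoc, in CI, or after each load without changing how it is written.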
# Broader
- Some people apparently swear by "dbt" (data build tool), a SQL-based transformation tool/pattern/philosophy (the open-source core is distributed as a Python package):
- https://www.brittanybennett.com/post/there-s-a-better-way-the-case-for-dbt-for-progressive-data-professionals
- Should unpack this...
# Visualisation / apps
To pick through:
- Royal Stat Society on best practices for data viz - [https://royal-statistical-society.github.io/datavisguide/](https://royal-statistical-society.github.io/datavisguide/ "https://royal-statistical-society.github.io/datavisguide/")
- Shiny, but really fairly basic UX stuff: [Shiny User Adoption Fails: 9 Reasons Why Nobody Uses Your App - R programming, Shiny for Python (appsilon.com)](https://appsilon.com/reasons-why-shiny-user-adoption-fails/ "https://appsilon.com/reasons-why-shiny-user-adoption-fails/")
- "Paired bar charts suck": https://twitter.com/rappa753/status/1643267220464865280?t=qkgOQCSSAwdWPnYfkgPvoQ&s=09
![[screenshot_2022-12-16_at_07.42.22.png]]
- Tips for building good apps: https://www.linkedin.com/posts/alec-campanini_analytics-ui-data-activity-7066876924475682816-pTqj
# Qualitative
- The perennial question
- With the advent of much better LLMs there are new opportunities: embeddings seem to work much better than mucking about with older NLP approaches (but I don't know how true this is *empirically*...)
- I've also heard of people using NVivo for interrogating big sets of survey responses?
- And there are techniques, like:
- Grounded theory
- Atomic method of qualitative research (?)
- "Insights repositories" like:
- Dovetail
- gleanly
- Good ol' whiteboards (Excalidraw, Miro)