Tuesday, March 10, 2015

Data science digest #40 (2 - 8 March 2015)

(Russian version is here)


Theory, machine learning algorithms and code samples

  • Python Mapping Your Music Collection - in this article we'll explore a neat way of visualizing your MP3 music collection. The end result will be a hexagonal map of all your songs, with similar sounding tracks located next to each other. The color of different regions corresponds to different genres of music (e.g. classical, hip hop, hard rock). As an example, here's a map of three albums from my music collection: Paganini's Violin Caprices, Eminem's The Eminem Show, and Coldplay's X&Y.
  • Theory Python Factor Analysis - factor analysis (FA) is a latent variable model that describes the variability of a given dataset. It was developed by the psychologists Charles Spearman, Raymond Cattell, and Louis Leon Thurstone.
  • R Base R Plots
  • For beginners Python ML Pitfalls: Measuring Performance (Part 1) - unfortunately, analysis lives and dies by self-reported metrics. Is feature A better than feature B? Is this classifier better than another? How much confidence can I have in this financial report? From development to consumption, almost every decision regarding analytics inherently asks "How good is this model?"
  • Gradient Descent Training Using C# - anyone who starts investigating ML quickly encounters the somewhat mysterious phrase “gradient descent.” In this article, James McCaffrey explains what gradient descent is and demonstrates how to use it to train a logistic regression classification system.
  • Understanding Natural Language with Deep Neural Networks Using Torch
  • Python Interactive Data Visualization with D3.js, DC.js, Python, and MongoDB - data visualization plays an important role in data analysis workflows. It enables data analysts to effectively discover patterns in large datasets through graphical means, and to represent these findings in a meaningful and effective way. Data visualization is an interdisciplinary field, which requires design, web development, database and coding skills.
  • Calculate PageRanks with Apache Hadoop
  • For beginners Python Introduction to Machine Learning with Python and Scikit-Learn

Online courses, learning materials and literature

  • Online course Deep Learning for Natural Language Processing - natural language processing (NLP) is one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate most everything in language: web search, advertisement, emails, customer service, language translation, radiology reports, etc.
  • Online course Introduction to Computational Thinking and Data Science - 6.00.2x is aimed at students with some prior programming experience in Python and a rudimentary knowledge of computational complexity. We have chosen to focus on breadth rather than depth. The goal is to provide students with a brief introduction to many topics, so that they will have an idea of what’s possible when the time comes later in their career to think about how to use computation to accomplish some goal.
  • Online course The Analytics Edge - in the last decade, the amount of data available to organizations has reached unprecedented levels. Data is transforming business, social interactions, and the future of our society. In this course, you will learn how to use data and analytics to give an edge to your career and your life. We will examine real world examples of how analytics have been used to significantly improve a business or industry.
  • Online course Data Analysis and Statistical Inference - this course introduces you to the discipline of statistics as a science of understanding and analyzing data. You will learn how to effectively make use of data in the face of uncertainty: how to collect data, how to analyze data, and how to use data to make inferences and conclusions about real world phenomena.
  • Literature Book review: About Time Series Databases and a New look at Anomaly detection by Ted Dunning and Ellen Friedman - this blog post is a review of two books by Ted Dunning and Ellen Friedman. Both are published by O'Reilly and available for free from the MapR site.
  • Python Free online-book: Kalman and Bayesian Filters in Python
  • Literature Free Big Data Analytics Handbook - Brian Liou from Leada was kind enough to provide a guest post about their latest handbook, The Data Analytics Handbook: Big Data Edition.

Videos, podcasts

  • Video Deep Learning at Flickr, Pierre Garrigues - Pierre Garrigues is a researcher in machine perception and learning at Flickr. He spoke at the Deep Learning Summit at the end of January, giving an insight into how Flickr uses deep learning techniques to automate the labelling of its image libraries, including the 10 million uploads it receives each day.
  • Podcast Partially Derivative: Episode 16: Algorithm Aversion - this week the team talks about Jonathon's new ISIS analysis, IPython 3, Indian food, algorithm aversion, and more!

Data engineering


Previous digest: Data science digest #39 (23 February - 1 March 2015)

All data science digests: Data science digests
