Sunday, March 15, 2015

Data science digest #41 (9 - 15 March 2015)

(Russian version is here)

General info

Theory, machine learning algorithms and code examples

  • A Full Hardware Guide to Deep Learning
  • Deep Learning, The Curse of Dimensionality, and Autoencoders - autoencoders are an extremely exciting new approach to unsupervised learning and for many machine learning tasks they have already surpassed the decades of progress made by researchers handpicking features.
  • Deep Learning for Text Understanding from Scratch - forget about the meaning of words, forget about grammar, forget about syntax, forget even the very concept of a word. Now let the machine learn everything by itself.
  • [Python] Python: scikit-learn – Training a classifier with non numeric features
  • [For beginners] [Theory] Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 - this article offers a brief glimpse of the history and basic concepts of machine learning. We will take a look at the first algorithmically described neural network and the gradient descent algorithm in the context of adaptive linear neurons, which will not only introduce the principles of machine learning but also serve as the basis for modern multilayer neural networks in future articles.
  • Naive Bayes on Apache Flink - in this blog post we are going to implement a Naive Bayes classifier in Apache Flink. We are going to use it for text classification by applying it to the 20 Newsgroup dataset. To understand what is going on, you should be familiar with Java and know what MapReduce is.
  • [For beginners] Beginner's Guide to Machine Learning: Part 1 of 2 - data science, big data, data mining and machine learning are some of the most prominent buzzwords around right now. The difference between success and failure is more and more about the data you collect from your customers, their actions and devices, and how these data points impact your business. Companies want to collect data, lots of it, and then do something with it.
  • The genetic algorithms - a genetic algorithm (GA) is a variant of stochastic beam search that keeps several search points/states at once (similar to the shotgun approach noted in the former post) and combines their features according to their performance to generate better successor states. Thus, GA differs from earlier approaches like simulated annealing, which rely only on the modification and evolution of a single state.
  • [Python] Data-processing and machine learning with Python
  • [Python] Clustering With K-Means in Python - a very common task in data analysis is that of grouping a set of objects into subsets such that all elements within a group are more similar to one another than they are to elements of other groups. The practical applications of such a procedure are many: given a medical image of a group of cells, a clustering algorithm could aid in identifying the centers of the cells; looking at the GPS data of a user’s mobile device, their more frequently visited locations within a certain radius can be revealed; for any set of unlabeled observations, clustering helps establish the existence of some sort of structure that might indicate that the data is separable.
  • Introduction to Machine Learning Studio
  • How-to: Tune Your Apache Spark Jobs (Part 1)
  • Gravitational Clustering - a new unsupervised learning method that works by mimicking gravity.
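
As a rough illustration of the genetic algorithm item above, here is a minimal Python sketch of the idea: keep a population of candidate states, pick the better ones, and combine their features to produce successors. It is not taken from the linked article. The toy objective (counting 1-bits), the truncation selection, the single-point crossover, and all function names and parameter values are assumptions chosen only to keep the example short.

```python
import random


def fitness(bits):
    # Toy objective (assumed for illustration): number of 1-bits in the state.
    return sum(bits)


def crossover(a, b, rng):
    # Single-point crossover: prefix from one parent, suffix from the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]


def genetic_algorithm(n_bits=20, pop_size=30, generations=60,
                      mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Start from several random states instead of a single one.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank states by performance and keep the better half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Carry the best state over unchanged (elitism), then fill the
        # rest of the next generation with mutated offspring.
        children = [population[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
    return max(population, key=fitness)


best = genetic_algorithm()
print(fitness(best), best)
```

Unlike simulated annealing, which evolves one state, every successor here is built from two parents, so good partial solutions found in different states can end up in the same child.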

Machine learning competitions

Online courses, training materials and literature

Videos, podcasts

Data engineering

Reviews

Previous digest: Data science digest #40 (2 - 8 March 2015)

All data science digests: Data science digests
