Views: 1557 | Replies: 2

Julia for Data Science

Posted on 2017-3-10 14:36

Tags: tm, Python, Linux, Scala, Julia

Table of Contents

Preface 1

Chapter 1: The Groundwork – Julia's Environment 7
Julia is different 8
Setting up the environment 10
Installing Julia (Linux) 10
Installing Julia (Mac) 12
Installing Julia (Windows) 12
Exploring the source code 13
Using REPL 13
Using Jupyter Notebook 15
Package management 18
Pkg.status() – package status 18
Pkg.add() – adding packages 19
Working with unregistered packages 19
Pkg.update() – package update 20
METADATA repository 20
Developing packages 20
Creating a new package 21
Parallel computation using Julia 21
Julia's key feature – multiple dispatch 23
Methods in multiple dispatch 24
Ambiguities – method definitions 25
Facilitating language interoperability 26
Calling Python code in Julia 27
Summary 28
References 28
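
As a taste of the chapter's headline topic, here is a minimal sketch of multiple dispatch: Julia selects a method based on the runtime types of all arguments, not just the first. The `area` function and its two methods are invented for illustration, not taken from the book.

```julia
# Multiple dispatch: one function name, several methods,
# chosen by the types and number of the arguments.
area(r::Real) = pi * r^2          # one argument: treat it as a circle's radius
area(w::Real, h::Real) = w * h    # two arguments: treat them as a rectangle's sides

area(2.0)       # dispatches to the one-argument method
area(3, 4)      # dispatches to the two-argument method
methods(area)   # lists every method defined for `area`
```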

Chapter 2: Data Munging 29
What is data munging? 30
The data munging process 30
What is a DataFrame? 31
The NA data type and its importance 32
DataArray – a series-like data structure 33
DataFrames – tabular data structures 34
Installing and using DataFrames.jl 37
Writing the data to a file 41
Working with DataFrames 42
Understanding DataFrames joins 42
The Split-Apply-Combine strategy 48
Reshaping the data 49
Sorting a dataset 56
Formula – a special data type for mathematical expressions 57
Pooling data 59
Web scraping 61
Summary 64
References 65
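
The chapter's join and Split-Apply-Combine material can be sketched with DataFrames.jl. Note the function names below follow the current DataFrames.jl API (`innerjoin`, `groupby`/`combine`); the Julia 0.x-era releases the book targets spelled these `join(..., kind = :inner)` and `by`, so adjust to your version. The sample tables are invented for illustration.

```julia
using DataFrames  # assumes the DataFrames.jl package is installed

people = DataFrame(id = [1, 2, 3], name = ["Ann", "Bo", "Cy"])
scores = DataFrame(id = [1, 3], score = [88, 95])

# Inner join on the shared :id column; rows without a match are dropped.
joined = innerjoin(people, scores, on = :id)

# Split-apply-combine: split by :grp, apply `sum` to :x, combine the results.
df = DataFrame(grp = ["a", "a", "b"], x = [1, 2, 10])
totals = combine(groupby(df, :grp), :x => sum => :total)
```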

Chapter 3: Data Exploration 66
Sampling 67
Population 67
Weight vectors 68
Inferring column types 70
Basic statistical summaries 70
Calculating the mean of the array or dataframe 71
Scalar statistics 73
Standard deviations and variances 73
Measures of variation 77
Z-scores 78
Entropy 79
Quantiles 80
Modes 82
Summary of datasets 83
Scatter matrix and covariance 83
Computing deviations 84
Rankings 84
Counting functions 85
Histograms 87
Correlation analysis 91
Summary 93
References 94

Chapter 4: Deep Dive into Inferential Statistics 95
Installation 96
Understanding the sampling distribution 97
Understanding the normal distribution 97
Parameter estimation 99
Type hierarchy in Distributions.jl 100
Understanding Sampleable 100
Representing probabilistic distributions 101
Univariate distributions 102
Retrieving parameters 102
Statistical functions 102
Evaluation of probability 103
Sampling in Univariate distributions 103
Understanding Discrete Univariate distributions and types 103
Bernoulli distribution 103
Binomial distribution 104
Continuous distributions 105
Cauchy distribution 105
Chi distribution 105
Chi-square distribution 106
Truncated distributions 106
Truncated normal distributions 107
Understanding multivariate distributions 108
Multinomial distribution 108
Multivariate normal distribution 109
Dirichlet distribution 110
Understanding matrix-variate distributions 111
Wishart distribution 111
Inverse-Wishart distribution 111
Distribution fitting 112
Distribution selection 112
Symmetrical distributions 112
Skew distributions to the right 112
Skew distributions to the left 112
Maximum Likelihood Estimation 113
Sufficient statistics 114
Maximum-a-Posteriori estimation 115
Confidence interval 116
Interpreting the confidence intervals 116
Usage 117
Understanding z-score 118
Interpreting z-scores 118
Understanding the significance of the P-value 119
One-tailed and two-tailed test 119
Summary 120
References 120
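
The chapter is built around Distributions.jl; a minimal sketch of its distribution objects, the statistical functions, and maximum likelihood fitting (the sample data below is randomly generated, purely for illustration):

```julia
using Distributions  # the package the chapter's installation section covers

d = Normal(0.0, 1.0)    # standard normal distribution
pdf(d, 0.0)             # probability density at the mean
cdf(d, 1.96)            # ≈ 0.975, ties into the confidence-interval section
quantile(d, 0.975)      # ≈ 1.96, the familiar z-score

data = rand(d, 10_000)           # draw a sample from the distribution
fitted = fit_mle(Normal, data)   # maximum likelihood estimation, as in the chapter
```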

Chapter 5: Making Sense of Data Using Visualization 121
Difference between using and importall 121
Pyplot for Julia 122
Multimedia I/O 122
Installation 122
Basic plotting 123
Plot using sine and cosine 124
Unicode plots 125
Installation 126
Examples 126
Generating Unicode scatterplots 126
Generating Unicode line plots 127
Visualizing using Vega 127
Installation 127
Examples 128
Scatterplot 128
Heatmaps in Vega 130
Data visualization using Gadfly 131
Installing Gadfly 131
Interacting with Gadfly using plot function 132
Example 132
Using Gadfly to plot DataFrames 135
Using Gadfly to visualize functions and expressions 139
Generating an image with multiple layers 141
Generating plots with different aesthetics using statistics 142
The step function 142
The quantile-quantile function 143
Ticks in Gadfly 144
Generating plots with different aesthetics using Geometry 145
Boxplots 145
Using Geometry to create density plots 147
Using Geometry to create histograms 147
Bar plots 149
Histogram2d – the two-dimensional histogram 150
Smooth line plot 151
Subplot grid 152
Horizontal and vertical lines 154
Plotting a ribbon 154
Violin plots 156
Beeswarm plots 156
Elements – scale 157
x_continuous and y_continuous 158
x_discrete and y_discrete 159
Continuous color scale 160
Elements – guide 161
Understanding how Gadfly works 162
Summary 162
References 162

Chapter 6: Supervised Machine Learning 163
What is machine learning? 163
Uses of machine learning 165
Machine learning and ethics 166
Machine learning – the process 166
Different types of machine learning 168
What is bias-variance trade-off? 168
Effects of overfitting and underfitting on a model 169
Understanding decision trees 169
Building decision trees – divide and conquer 170
Where should we use decision tree learning? 171
Advantages of decision trees 171
Disadvantages of decision trees 172
Decision tree learning algorithms 173
How a decision tree algorithm works 174
Understanding and measuring purity of node 174
An example 175
Supervised learning using Naïve Bayes 178
Advantages of Naïve Bayes 179
Disadvantages of Naïve Bayes 179
Uses of Naïve Bayes classification 179
How Bayesian methods work 180
Posterior probabilities 180
Class-conditional probabilities 181
Prior probabilities 182
Evidence 182
The bag of words 182
Advantages of using Naïve Bayes as a spam filter 183
Disadvantages of Naïve Bayes filters 183
Examples of Naïve Bayes 183
Summary 185
References 186
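
Chapter 6's decision-tree learning can be sketched with the DecisionTree.jl package; the tiny dataset below is invented for illustration, with two well-separated groups so the tree's divide-and-conquer splits are obvious.

```julia
using DecisionTree  # assumes the DecisionTree.jl package is installed

features = [1.0 2.0;   # 4 samples × 2 features
            1.1 1.9;
            8.0 9.0;
            8.2 9.1]
labels = ["low", "low", "high", "high"]

model = build_tree(labels, features)   # divide-and-conquer tree construction
apply_tree(model, [8.1, 9.0])          # predict the label for a new sample
```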

Chapter 7: Unsupervised Machine Learning 187
Understanding clustering 188
How are clusters formed? 189
Types of clustering 190
Hierarchical clustering 191
Overlapping, exclusive, and fuzzy clustering 192
Differences between partial and complete clustering 192
K-means clustering 192
K-means algorithm 193
Algorithm of K-means 193
Associating the data points with the closest centroid 193
How to choose the initial centroids? 194
Time-space complexity of K-means algorithms 195
Issues with K-means 195
Empty clusters in K-means 196
Outliers in the dataset 196
Different types of cluster 196
K-means – strengths and weaknesses 198
Bisecting K-means algorithm 199
Getting deep into hierarchical clustering 201
Agglomerative hierarchical clustering 202
How proximity is computed 202
Strengths and weaknesses of hierarchical clustering 205
Understanding the DBSCAN technique 206
So, what is density? 206
How are points classified using center-based density? 206
DBSCAN algorithm 206
Strengths and weaknesses of the DBSCAN algorithm 207
Cluster validation 207
Example 207
Summary 211
References 212
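
The chapter's K-means discussion maps onto the Clustering.jl package in Julia; a minimal sketch, with toy 2-D points arranged in two obvious groups (the data is invented). Note that Clustering.jl expects observations as columns, not rows.

```julia
using Clustering  # assumes the Clustering.jl package is installed

# Each column is one observation: three points near the origin, three near (5, 5).
X = [0.0 0.1 0.2 5.0 5.1 5.2;
     0.0 0.1 0.0 5.0 5.0 5.1]

result = kmeans(X, 2)   # partition into k = 2 clusters
assignments(result)     # cluster index for each point
result.centers          # 2×2 matrix of the fitted centroids
```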

Chapter 8: Creating Ensemble Models 213
What is ensemble learning? 214
Understanding ensemble learning 215
How to construct an ensemble 215
Combination strategies 216
Subsampling training dataset 217
Bagging 217
When does bagging work? 219
Boosting 219
Boosting approach 219
Boosting algorithm 220
AdaBoost – boosting by sampling 220
What is boosting doing? 221
The bias and variance decomposition 221
Manipulating the input features 221
Injecting randomness 222
Random forests 223
Features of random forests 225
How do random forests work? 226
The out-of-bag (oob) error estimate 226
Gini importance 227
Proximities 227
Implementation in Julia 227
Learning and prediction 229
Why is ensemble learning superior? 234
Applications of ensemble learning 236
Summary 236
References 236

Chapter 9: Time Series 237
What is forecasting? 237
Decision-making process 238
The dynamics of a system 239
What is TimeSeries? 240
Trends, seasonality, cycles, and residuals 240
Difference from standard linear regression 240
Basic objectives of the analysis 241
Types of models 241
Important characteristics to consider first 241
Systematic pattern and random noise 243
Two general aspects of time series patterns 243
Trend analysis 243
Smoothing 244
Fitting a function 244
Analysis of seasonality 244
Autocorrelation correlogram 244
Examining correlograms 245
Partial autocorrelations 245
Removing serial dependency 246
ARIMA 246
Common processes 246
ARIMA methodology 247
Identification 247
Estimation and forecasting 248
The constant in ARIMA models 248
Identification phase 248
Seasonal models 249
Parameter estimation 250
Evaluation of the model 251
Interrupted time series ARIMA 251
Exponential smoothing 251
Simple exponential smoothing 252
Indices of lack of fit (error) 253
Implementation in Julia 254
The TimeArray time series type 254
Using time constraints 258
when 258
from 258
to 259
findwhen 259
find 260
Mathematical, comparison, and logical operators 260
Applying methods to TimeSeries 261
Lag 261
Lead 262
Percentage 263
Combining methods in TimeSeries 263
Merge 263
Collapse 264
Map 264
Summary 265
References 265
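
The chapter's implementation section uses the TimeSeries.jl package; a minimal sketch of the `TimeArray` type, a time constraint, and the `lag` method (the ten-day series is invented for illustration):

```julia
using Dates, TimeSeries  # assumes the TimeSeries.jl package is installed

dates = collect(Date(2017, 1, 1):Day(1):Date(2017, 1, 10))
ta = TimeArray(dates, collect(1.0:10.0))  # the TimeArray type from the chapter

lag(ta)                       # shift values back by one time step
from(ta, Date(2017, 1, 5))    # time constraint: keep rows from this date onward
```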

Chapter 10: Collaborative Filtering and Recommendation System 266
What is a recommendation system? 267
The utility matrix 268
Association rule mining 269
Measures of association rules 270
How to generate the item sets 270
How to generate the rules 271
Content-based filtering 271
Steps involved in content-based filtering 272
Advantages of content-based filtering 274
Limitations of content-based filtering 274
Collaborative filtering 275
Baseline prediction methods 276
User-based collaborative filtering 277
Item-item collaborative filtering 280
Algorithm of item-based collaborative filtering 280
Building a movie recommender system 281
Summary 286

Chapter 11: Introduction to Deep Learning 287
Revisiting linear algebra 290
A gist of scalars 290
A brief outline of vectors 290
The importance of matrices 290
What are tensors? 291
Probability and information theory 291
Why probability? 291
Differences between machine learning and deep learning 293
What is deep learning? 293
Deep feedforward networks 296
Understanding the hidden layers in a neural network 298
The motivation of neural networks 298
Understanding regularization 299
Optimizing deep learning models 300
The case of optimization 300
Implementation in Julia 301
Network architecture 302
Types of layers 302
Neurons (activation functions) 303
Understanding regularizers for ANN 305
Norm constraints 305
Using solvers in deep neural networks 305
Coffee breaks 306
Image classification with pre-trained Imagenet CNN 307
Summary 311
References 311

Index 312

Attachments:
Packt.Julia.for.Data.Science.1785289691.part1.rar (10 MB)
Packt.Julia.for.Data.Science.1785289691.part2.rar (1.25 MB)

Posted on 2017-3-11 23:13
Thanks to the admin for sharing such an excellent language.

Posted on 2018-4-17 14:48
Thanks for sharing, much appreciated.