
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, n_informative=2,
...                        random_state=0, shuffle=False)
>>> regr = RandomForestRegressor(max_depth=2, random_state=0)
>>> regr.fit(X, y)
RandomForestRegressor(...)
>>> print(regr.predict([[0, 0, 0, 0]]))
[-8.32987858]

7. Decision Tree Regressor

criterion : {“squared_error”, “friedman_mse”, “absolute_error”, “poisson”}, default=”squared_error”

The function to measure the quality of a split. Supported criteria are:

  • “squared_error” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node;

  • “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits;

  • “absolute_error” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node;

  • “poisson”, which uses reduction in Poisson deviance to find splits.

New in version 0.18: Mean Absolute Error (MAE) criterion.

New in version 0.24: Poisson deviance criterion.

Deprecated since version 1.0: Criterion “mse” was deprecated in v1.0 and will be removed in version 1.2. Use criterion="squared_error" which is equivalent.

Deprecated since version 1.0: Criterion “mae” was deprecated in v1.0 and will be removed in version 1.2. Use criterion="absolute_error" which is equivalent.
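A minimal sketch of selecting a non-default criterion (the estimator construction is illustrative; the data and remaining parameters are arbitrary):

>>> from sklearn.tree import DecisionTreeRegressor
>>> # L1-based splits: minimizes the mean absolute error using the median of each leaf
>>> regr_l1 = DecisionTreeRegressor(criterion="absolute_error", random_state=0)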

splitter : {“best”, “random”}, default=”best”

The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

max_depth : int, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leaf : int or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.
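A worked illustration of the fraction semantics, assuming a hypothetical dataset with 442 samples (the size of load_diabetes used in the example further below):

>>> import math
>>> # min_samples_split=0.05 -> ceil(0.05 * 442) samples needed to split an internal node
>>> math.ceil(0.05 * 442)
23
>>> # min_samples_leaf=0.01 -> ceil(0.01 * 442) samples required in each leaf
>>> math.ceil(0.01 * 442)
5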

min_weight_fraction_leaf : float, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features : int, float or {“auto”, “sqrt”, “log2”}, default=None

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=n_features.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
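A minimal sketch combining the parameters described above (the specific values are illustrative, not recommendations):

>>> from sklearn.tree import DecisionTreeRegressor
>>> regr = DecisionTreeRegressor(max_depth=5,           # stop growing after 5 levels
...                              min_samples_leaf=0.01,  # each leaf keeps >= 1% of samples
...                              max_features="sqrt",    # inspect sqrt(n_features) per split
...                              random_state=0)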

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> regressor = DecisionTreeRegressor(random_state=0)
>>> cross_val_score(regressor, X, y, cv=10)
array([-0.39..., -0.46..., 0.02..., 0.06..., -0.50...,
       0.16..., 0.11..., -0.73..., -0.30..., -0.00...])

8. Label Encoder

LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
One Hot Encoder
Encode categorical features as a one-hot numeric array.

categories : ‘auto’ or a list of array-like, default=’auto’

Categories (unique values) per feature:

  • ‘auto’ : Determine categories automatically from the training data.

  • list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values within a single feature, and should be sorted in case of numeric values.

The used categories can be found in the categories_ attribute.

New in version 0.20.

drop : {‘first’, ‘if_binary’} or an array-like of shape (n_features,), default=None

Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into a neural network or an unregularized regression.

However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance for penalized linear classification or regression models.

  • None : retain all features (the default).

  • ‘first’ : drop the first category in each feature. If only one category is present, the feature will be dropped entirely.

  • ‘if_binary’ : drop the first category in each feature with two categories. Features with 1 or more than 2 categories are left intact.

  • array : drop[i] is the category in feature X[:, i] that should be dropped.

New in version 0.21: The parameter drop was added in 0.21.

Changed in version 0.23: The option drop='if_binary' was added in 0.23.

sparse : bool, default=True

Will return a sparse matrix if set to True, else will return an array.

dtype : number type, default=float

Desired dtype of output.

handle_unknown : {‘error’, ‘ignore’}, default=’error’

Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to ‘ignore’ and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. In the inverse transform, an unknown category will be denoted as None.
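Since this section has no usage example, here is a short sketch in the same doctest style as the encoders above (the toy data is made up for illustration):

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OneHotEncoder(handle_unknown='ignore')
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> # the unknown category 4 is encoded as all zeros because handle_unknown='ignore'
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.]])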

9. Support Vector Machines

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection. The advantages of support vector machines:

  • Effective in high-dimensional spaces.

  • Still effective in cases where the number of dimensions exceeds the number of samples.

  • Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.

  • Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

>>> from sklearn import svm
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = svm.SVC()
>>> clf.fit(X, y)
SVC()
>>> clf.predict([[2., 2.]])
array([1])
>>> # get support vectors
>>> clf.support_vectors_
array([[0., 0.],
       [1., 1.]])
>>> # get indices of support vectors
>>> clf.support_
array([0, 1]...)
>>> # get number of support vectors for each class
>>> clf.n_support_
array([1, 1]...)
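As a hedged sketch of the versatility point above, different kernels can be selected through the kernel parameter (the values shown are standard options; gamma and degree are left at their defaults):

>>> from sklearn import svm
>>> clf_linear = svm.SVC(kernel="linear")
>>> clf_rbf = svm.SVC(kernel="rbf")             # the default kernel
>>> clf_poly = svm.SVC(kernel="poly", degree=3)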

10. Logistic Regression (aka logit, MaxEnt) classifier.
In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the multi_class option is set to ‘ovr’, and uses the cross-entropy loss if the multi_class option is set to ‘multinomial’. (Currently the ‘multinomial’ option is supported only by the ‘lbfgs’, ‘sag’, ‘saga’ and ‘newton-cg’ solvers.)
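A minimal sketch of the two schemes just described (the multi_class parameter name is taken from the text above and is not documented in the parameter list below):

>>> from sklearn.linear_model import LogisticRegression
>>> clf_ovr = LogisticRegression(multi_class='ovr')                         # one-vs-rest
>>> clf_mn = LogisticRegression(multi_class='multinomial', solver='lbfgs')  # cross-entropy loss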
penalty : {‘l1’, ‘l2’, ‘elasticnet’, ‘none’}, default=’l2’

Specify the norm of the penalty:

  • 'none': no penalty is added;

  • 'l2': add a L2 penalty term and it is the default choice;

  • 'l1': add a L1 penalty term;

  • 'elasticnet': both L1 and L2 penalty terms are added.

Warning

Some penalties may not work with some solvers. See the parameter solver below to know the compatibility between the penalty and the solver.

New in version 0.19: l1 penalty with SAGA solver (allowing ‘multinomial’ + L1)
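An illustrative sketch of pairing a penalty with a compatible solver (see the solver list further below); note that ‘elasticnet’ additionally needs the mixing parameter l1_ratio, which is not covered in this section:

>>> from sklearn.linear_model import LogisticRegression
>>> clf_l1 = LogisticRegression(penalty='l1', solver='saga')
>>> clf_en = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5)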

dual : bool, default=False

Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

tol : float, default=1e-4

Tolerance for stopping criteria.

C : float, default=1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

fit_intercept : bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling : float, default=1

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note that the synthetic feature weight is subject to L1/L2 regularization, like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.

class_weight : dict or ‘balanced’, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

New in version 0.17: class_weight=’balanced’
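A worked illustration of the “balanced” formula with a made-up label vector:

>>> import numpy as np
>>> y = np.array([0, 0, 0, 1])
>>> # n_samples / (n_classes * np.bincount(y)) -> the minority class gets the larger weight
>>> len(y) / (len(np.unique(y)) * np.bincount(y))
array([0.66666667, 2.        ])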

random_state : int, RandomState instance, default=None

Used when solver == ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data. See Glossary for details.

solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’

Algorithm to use in the optimization problem. Default is ‘lbfgs’. To choose a solver, you might want to consider the following aspects:

  • For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones;

  • For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss;

  • ‘liblinear’ is limited to one-versus-rest schemes.

Warning

The choice of the algorithm depends on the penalty chosen. Supported penalties by solver:

  • ‘newton-cg’ - [‘l2’, ‘none’]

  • ‘lbfgs’ - [‘l2’, ‘none’]

  • ‘liblinear’ - [‘l1’, ‘l2’]

  • ‘sag’ - [‘l2’, ‘none’]

  • ‘saga’ - [‘elasticnet’, ‘l1’, ‘l2’, ‘none’]

Note

Fast convergence of ‘sag’ and ‘saga’ is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
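A minimal sketch of that preprocessing advice (the pipeline layout is illustrative; max_iter is raised only to give ‘saga’ room to converge):

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.linear_model import LogisticRegression
>>> pipe = make_pipeline(StandardScaler(),
...                      LogisticRegression(solver='saga', max_iter=1000))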

See also

Refer to the User Guide for more information regarding LogisticRegression, and more specifically the table summarizing solver/penalty supports.

New in version 0.17: Stochastic Average Gradient descent solver.

New in version 0.19: SAGA solver.

Changed in version 0.22: The default solver changed from ‘liblinear’ to ‘lbfgs’ in 0.22.

max_iter : int, default=100

Maximum number of iterations taken for the solvers to converge.

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(random_state=0).fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :])
array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
       [9.7...e-01, 2.8...e-02, ...e-08]])
>>> clf.score(X, y)
0.97...