Nonlinear models in R: polynomial regression, local splines, smoothing splines, and generalized additive model (GAM) analysis
Here we relax the linearity assumption of the popular linear approach. Sometimes the linear assumption is simply a poor approximation. There are many ways to address this problem, some of which use regularization to reduce model complexity. However, these techniques still rely on a linear model and can only improve the fit so far. Polynomial regression: this is a simple way to provide a nonlinear fit to the data. Step...
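As a rough illustration of the polynomial regression idea mentioned in this teaser, the sketch below fits a polynomial by ordinary least squares. It is written in Python rather than R, and the helper names `polyfit` and `predict`, the toy data, and the normal-equations solver are all assumptions made for illustration, not the article's own code.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) b = X^T y.

    Returns coefficients [b0, b1, ..., b_degree] for b0 + b1*x + ... .
    Fine for small degrees; high degrees become ill-conditioned.
    """
    m = degree + 1
    # Normal equations for the Vandermonde design matrix.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coef))


# Hypothetical toy data generated from y = 1 + 2x - 0.5x^2 (no noise).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x - 0.5 * x * x for x in xs]

linear = polyfit(xs, ys, 1)     # a straight line cannot follow the curvature
quadratic = polyfit(xs, ys, 2)  # a degree-2 fit recovers the generating curve
```

Comparing the residuals of `linear` and `quadratic` on the same data shows why a linear fit can be a poor approximation when the relationship is curved.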
Application of a multinomial logistic regression model in R to the assessment of individual risk loss values in a mixed-distribution simulation
A common piece of advice in regression modeling is "please look at the data." In the last article, we did not look at the data. It looks like there are fixed-cost claims in our database. For the probabilities, we should use a multinomial model. Here, the variable (which takes three levels) is split into three indicators (just like in the standard regression model...
Mining association rules with the Apriori algorithm in Python
Association rule mining is a technique for identifying potential relationships between different items. Take a supermarket, for example, where customers can buy all kinds of goods. Often, there is a pattern to what customers buy. For example, mothers with babies buy baby products such as milk and diapers. Teenage girls may buy cosmetics, while bachelors may buy beer and chips. In short, transactions follow a pattern. If you can identify items purchased in different transactions...
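The supermarket example can be made concrete with a small sketch of the two quantities association rules are usually scored by, support and confidence, plus a level-wise Apriori-style search for frequent itemsets. The transactions, the function names, and the 0.4 threshold are hypothetical and chosen only for illustration; the article itself may use a library instead.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the article).
transactions = [
    {"milk", "diapers", "bread"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"beer", "chips", "diapers"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, keeping only
    those whose support reaches min_support (the Apriori pruning idea)."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s, transactions)
        k += 1
        prev = set(current)
        # Candidates: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Keep a candidate only if all its (k-1)-subsets were frequent.
        current = [c for c in candidates
                   if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                   and support(c, transactions) >= min_support]
    return result


frequent = frequent_itemsets(transactions, min_support=0.4)
beer_to_chips = confidence({"beer"}, {"chips"}, transactions)
milk_to_diapers = confidence({"milk"}, {"diapers"}, transactions)
```

In this toy data every basket with beer also contains chips, so the rule beer -> chips has the highest possible confidence, while milk -> diapers holds in only two of the three milk baskets.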
Detecting outliers in MATLAB using quantile random forest (QRF) regression trees
This example shows how to use a quantile random forest to detect outliers. A quantile random forest can detect outliers with respect to the conditional distribution of Y given X. Outliers are observations located far enough from most other observations in the data set to be considered anomalous. Causes of outlying observations include inherent variability and measurement error. Outliers significantly influence estimates and inferences, so it is important to detect them and decide whether to delete or...
Python Data Mining and Machine Learning -- Communication Credit Risk Assessment (3) -- Feature Engineering
There is a common saying in the industry that data and features determine the upper limit of machine learning, and that models and algorithms only approximate this limit. The data is preprocessed and merged into the wide table train_USER_COMM_BASIC. Feature processing is then carried out based on the analysis of each individual feature. Missing value handling: missing values are filled based on an understanding of the communications business and the data. AGE column missing data...
Analyzing temperature time series in R with ARIMA, vector autoregression (VAR), and periodic autoregression (PAR) models
There are at least two kinds of nonstationary time series: time series with trends and time series with unit roots (called integrated time series). Unit root tests cannot be used to assess whether a time series is stationary; they can only detect integrated time series. The same is true of seasonal unit roots. Here, consider the monthly average temperature data. > mon=read.table("temp.as.numeric(pp.tes...
Text mining with WEKA: analyzing a spam classification model
Email has become very widely used and has brought great convenience to people's lives. However, spam, a by-product of its growth, has caused considerable trouble for users, network administrators, and ISPs (Internet service providers). The spam problem is becoming increasingly serious and has attracted wide attention from researchers. Spam usually refers to unsolicited mail that is nevertheless forced into a user's mailbox...
Simulating and visualizing the Lorenz system in MATLAB
I wrote a function that takes a system of three differential equations as input and solves it using the Runge-Kutta method with a given step size. I used MATLAB to generate a GIF of the solution. L = LorenzRK('-10*y1+10*y2', '-y1*y3+28*y1-y2', 'y1*y2-(8/3)*y3', [0, 50...
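The approach described in this teaser can be sketched as follows, in Python rather than MATLAB. The teaser does not say which Runge-Kutta variant is used, so the classical fourth-order scheme (RK4) is an assumption here, as are the step size `h = 0.01`, the initial state `(1, 1, 1)`, and the function names; the coefficients 10, 28, and 8/3 and the time span from 0 to 50 follow the teaser.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for state (y1, y2, y3)."""
    y1, y2, y3 = state
    return (
        sigma * (y2 - y1),        # dy1/dt = -10*y1 + 10*y2
        rho * y1 - y2 - y1 * y3,  # dy2/dt = 28*y1 - y2 - y1*y3
        y1 * y2 - beta * y3,      # dy3/dt = y1*y2 - (8/3)*y3
    )


def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h
    for an autonomous system dy/dt = f(y)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def solve(f, y0, t0, t1, h):
    """Integrate from t0 to t1 with fixed step h; returns the list of states."""
    n_steps = int(round((t1 - t0) / h))
    trajectory = [tuple(y0)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(f, trajectory[-1], h))
    return trajectory


# Assumed initial state and step size (not given in the teaser).
trajectory = solve(lorenz, (1.0, 1.0, 1.0), 0.0, 50.0, 0.01)
```

The resulting `trajectory` is a list of `(y1, y2, y3)` tuples that could be passed to any plotting library to draw the attractor or assemble animation frames.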
Using vector autoregression (VAR) in R to analyze the impulse responses of economic data
Since Sims (1980) published his seminal paper, vector autoregressive models have become a key tool in macroeconomic research. This article introduces the basic concepts of VAR analysis and walks through the estimation process for a simple model. VAR stands for vector autoregression. To understand what this means, let's first look at a simple univariate (i.e. only one dependent or endogenous variable) autoregressive (AR) model...
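To make the univariate AR model concrete, here is a minimal Python sketch (the article itself works in R) that simulates an AR(1) process y_t = c + phi * y_{t-1} + e_t and recovers c and phi by ordinary least squares. The function names, the parameter values, and the seed are hypothetical choices for illustration only.

```python
import random


def simulate_ar1(c, phi, sigma, n, y0=0.0, seed=0):
    """Simulate n observations of y_t = c + phi * y_{t-1} + e_t,
    with e_t drawn from a normal distribution with sd sigma."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """Estimate (c, phi) by regressing y_t on y_{t-1} with OLS."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi_hat = sxz / sxx
    c_hat = mean_z - phi_hat * mean_x
    return c_hat, phi_hat


# Assumed illustrative parameters: c = 1.0, phi = 0.5, unit-variance shocks.
series = simulate_ar1(c=1.0, phi=0.5, sigma=1.0, n=2000)
c_hat, phi_hat = fit_ar1(series)
```

A VAR applies the same idea to a vector of variables, so that each variable is regressed on lagged values of all variables in the system rather than only its own past.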
Time series forecasting with a multivariate copula-GARCH model in R
Unlike macroeconomic data, financial market data tend to be high-frequency, such as stock return series. Intuitively, the latter fluctuate more, and more randomly, than the former. In both the univariate and multivariate cases, building a copula model together with a GARCH model is a good choice. The multivariate GARCH family has many variants, so we need to study them ourselves and choose the most suitable model. This article uses R to analyze...