How to mine the features of paying game users with machine learning


youxiputao· 2017-11-23 04:01:52


Online game production is a risky business. A survey of current domestic game development and revision processes often finds that decisions lack objective, neutral data to support them; planners rely on experience and guesswork, and the overall probability of product success is very low. Beijing Deep Intelligent Technology recently published an article on game data mining, attempting to provide a basis for decision-making through patterns revealed by large volumes of player behavior data.

Payment rate and payment depth are among the key indicators of an online game. If users' paying tendency can be mined from in-game behavior data, developers can use the behavioral and attribute differences between paying and non-paying users to modify the game, converting more non-paying players into paying players and improving the payment rate; or they can offer users appropriately targeted incentives based on their behavior, increasing consumption revenue.

This article applies tree boosting, a supervised machine learning technique, to the event-tracking (buried-point) data of a successful ARPG mobile game K (hereafter "game K"). It achieves initial results on the problem of automatically learning the features of paying and non-paying users and predicting which class a user belongs to, and shares them with the game industry to promote scientific, quantitative decision-making for online games.

1. Principle of the XGBoost algorithm

For each user in game K we collected the first 4 days of in-game behavior data: click actions and their timestamps, diamonds, gold, combat power, equipment type, IP address, and other data. Game K's data for the first half of the year totaled 8 TB.

We compared the classical supervised learning algorithms and, weighing efficiency against accuracy, finally selected the XGBoost algorithm [3].

XGBoost is an efficient, flexible, and portable distributed gradient tree boosting library. It iteratively combines a set of weak classifiers (decision trees) to produce accurate classification results, implementing machine learning algorithms under the gradient boosting framework. Practice has shown it to be effective for both classification and regression prediction tasks, and the algorithm is well suited to mining game K's massive player behavior logs.

1.1 Decision trees

The basis of this method is the decision tree, a fundamental method for both classification and regression. A decision tree can be viewed as a set of if-then rules. It is composed of nodes and directed edges: internal nodes represent feature attributes, and the external nodes (leaf nodes) represent categories. Decision tree learning uses existing data to learn these rules and then predicts future, unknown data.

A single decision tree, however, is too simple. On data with more complex logic it is often powerless and prone to overfitting, especially since a single player can perform more than 4,000 kinds of actions in an online game, making the attribute space very complex.
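To make the if-then view concrete, here is a minimal sketch (not from the original article) that fits a small decision tree with scikit-learn and prints its learned rules. The features "sessions" and "gold_earned" and the toy data are invented for illustration; game K's real feature set is far larger.

```python
# Minimal decision-tree sketch; the features "sessions" and
# "gold_earned" are invented, not from game K's actual data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy behavior data: [sessions in first 4 days, gold earned]
X = [[1, 100], [2, 150], [8, 900], [9, 1200], [3, 200], [10, 1500]]
y = [0, 0, 1, 1, 0, 1]  # 1 = paying player, 0 = non-paying

# Limit depth to keep the rule set small and reduce overfitting
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The fitted tree is literally a set of if-then rules:
print(export_text(tree, feature_names=["sessions", "gold_earned"]))
```

On this toy data the printed rules amount to a single threshold on `sessions`, illustrating how internal nodes test feature attributes while leaves hold the class.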

1.2 Ensemble learning

A single base model usually cannot guarantee prediction accuracy. The common solution is a tree ensemble model that combines the predictions of multiple trees. Suppose there are K trees; the tree ensemble model is

y_hat_i = sum_{k=1..K} f_k(x_i), with f_k in F,

where f_k is a function in the function space F, and F is the space containing all classification trees. The parameters of the tree ensemble model include the structure of every tree and its leaf scores; we can simply take the functions themselves as the parameters, Θ = {f_1, f_2, ..., f_K}. The objective function consists of a loss function and a regularization term:

Obj(Θ) = sum_i l(y_i, y_hat_i) + sum_k Ω(f_k)

The regularization term of XGBoost includes both L1 and L2 regularization. Optimizing the objective function amounts to learning the structure of each classification tree and its leaf scores. The trees cannot all be trained at once, which makes this much harder than a traditional optimization problem.

XGBoost is trained incrementally (additive training): at every step we add one more tree on top of the model from the previous step, and this new tree corrects the errors of the existing ones. Writing the prediction for sample i at step t as y_hat_i^(t) and the tree added at step t as f_t, we have

y_hat_i^(t) = y_hat_i^(t-1) + f_t(x_i)

Using this incremental learning method, XGBoost trains each tree with a Newton-style step, i.e. a second-order approximation of the loss.
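The additive update above can be sketched in a few lines: each round fits a small regression tree to the current residuals and adds it to the running prediction. This is a plain gradient-boosting sketch with squared loss, not XGBoost's full Newton step with regularization; the data and all names here are illustrative.

```python
# Additive training sketch: y_hat^(t) = y_hat^(t-1) + eta * f_t(x).
# Plain gradient boosting with squared loss; illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.3                 # learning rate (shrinkage)
y_hat = np.zeros_like(y)  # y_hat^(0) = 0
for t in range(50):
    residual = y - y_hat  # negative gradient of squared loss
    f_t = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    y_hat = y_hat + eta * f_t.predict(X)  # the additive update

mse = np.mean((y - y_hat) ** 2)
print(f"training MSE after 50 rounds: {mse:.4f}")
```

Each weak tree alone fits the target poorly, but the sum of 50 shallow trees tracks the sine curve closely, which is exactly the point of the ensemble.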

XGBoost adds a regularization term to the cost function to control model complexity. The term contains the number of leaf nodes of each tree and the squared L2 norm of the scores output at the leaf nodes. From the bias-variance tradeoff point of view, the regularization term reduces the variance of the model, making the learned model simpler and preventing overfitting; this is an advantage of XGBoost over traditional GBDT. XGBoost also borrows the column (feature) sampling of random forests, which both reduces overfitting and cuts computation. In addition, XGBoost parallelizes at the granularity of features; since our game data has many features, computing each feature's split gain in parallel accelerates training considerably.
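The regularization term described above can be written out explicitly. As defined in the XGBoost formulation, for a tree f with T leaves and leaf scores w_j it penalizes both the leaf count and the squared leaf scores (the library additionally supports an L1 penalty on the w_j, which is the L1 regularization mentioned in the text):

```latex
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
```

Here γ controls the cost of adding a leaf and λ the L2 shrinkage of leaf scores; larger values yield simpler, lower-variance trees.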

2. Event-tracking data mining and classification pipeline

The data mining pipeline for game K is shown in the flowchart (figure omitted). The concrete steps and relevant data are as follows:


The experiment uses game K data from August 4 to August 10, 2017, covering 97,203 users in total, of whom 4,448 are VIP users. VIP users are paying players and non-VIP users are non-paying players; these two classes of players provide the classification labels for supervised learning and training.
