河北国呈科技|河北国呈电子|空气质量监测|无人机应用|

AG亚游集团_亚洲直营游戏平台_AG亚游集团官网_12306

示例图片三
网站首页 > 新闻资讯 > 行业资讯

使用PYTHON数据挖掘指南:Data Mining in Python: A Guide

Michael Rundell SOTON数据分析

Data mining and algorithms

Data mining is the process of discovering predictive information from the analysis of large databases. For a data scientist, data mining can be a vague and daunting task – it requires a diverse set of skills and knowledge of many data mining techniques to take raw data and successfully get insights from it.  You’ll want to understand the foundations of statistics, anddifferent programming languages that can help you with data mining at scale.  

This guide will provide an example-filled introduction to data mining using Python, one of the most widely used data mining tools – from cleaning and data organization to applying machine learning algorithms. First, let’s get a better understanding of data mining and how it is accomplished.

A data mining definition 

The desired outcome from data mining is to create a model from a given dataset that can have its insights generalized to similar datasets. A real-world example of a successful data mining application can be seen in automatic fraud detection from banks and credit institutions.

Your bank likely has a policy to alert you if they detect any suspicious activity on your account – such as repeated ATM withdrawals or large purchases in a state outside of your registered residence. How does this relate to data mining? Data scientists created this system by applying algorithms to classify and predict whether a transaction is fraudulent by comparing it against a historical pattern of fraudulent and non-fraudulent charges. The model “knows” that if you live in San Diego, California, it’s highly likely that the thousand dollar purchases charged to a scarcely populated Russian province were not legitimate.

That is just one of a number of the powerful applications of data mining. Other applications of data mining include genomic sequencing, social network analysis, or crime imaging – but the most common use case is for analyzing aspects of the consumer life cycle. Companies use data mining to discover consumer preferences, classify different consumers based on their purchasing activity, and determine what makes for a well-paying customer – information that can have profound effects on improving revenue streams and cutting costs.

If you’re struggling to get good datasets to to begin your analysis, we’ve compiled 19 free datasets for your first data science project.

What are some data mining techniques?

There are multiple ways to build predictive models from datasets, and a data scientist should understand the concepts behind these techniques, as well as how to use code to produce similar models and visualizations. These techniques include:

  • Regression – Estimating the relationships between variables by optimizing the reduction of error.

  • 河北国呈科技|河北国呈电子|

  •  

An example of a scatterplot with a fitted linear regression model.

  • Classification – Identifying what category an object belongs to. An example is classifying email as spam or legitimate, or looking at a person’s credit score and approving or denying a loan request.

  • Cluster Analysis – Finding natural groupings of data objects based upon the known characteristics of that data. An example could be seen in marketing, where analysis can reveal customer groupings with unique behavior – which could be applied in business strategy decisions.

河北国呈科技|河北国呈电子|

 

 

Powered by MetInfo 5.3.14 ©2008-2019 www.metinfo.cn