AI-Powered Credit Card Fraud Detection Assignment Sample

Chapter 1: Introduction

1.1 Introduction

Over the last few decades, the use of credit cards for shopping in malls has increased drastically. As technology has advanced, the risk of credit card fraud has grown with it, and fraudulent credit card transactions have increased because of their low-risk nature. Credit card fraud in shopping malls accounts for less than 0.2% of all transactions worldwide, but its impact on the finance sector is large because individual transactions can involve large amounts. Worldwide losses from such fraud exceed a billion dollars. Shopping malls bear a heavy loss every year, so it is necessary to prevent this fraud with appropriate security measures. Existing measures include sending a one-time password (OTP) to the user's mobile phone during the transaction, securing the payment gateway, and creating security questions for online banking. Machine learning, and ensemble learning in particular, can also be used to detect credit card fraud in shopping malls.

1.2 Background Study

Credit card fraud in shopping malls has recently emerged as a global problem, and big companies suffer huge financial losses from it every year. Such fraud accounts for less than 0.2% of all transactions worldwide, but its impact on the finance sector is large because individual transactions can involve large amounts (Krishna Rao et al., 2021). Worldwide losses from fraud in shopping malls exceed a billion dollars. Detecting such fraud has therefore become essential to minimizing the overall financial loss. Various methods can be used to prevent credit card fraud in shopping malls (Prusti and Rath, 2019). Sending a one-time password to the user's mobile phone helps ensure that no one else can use the card details during a transaction. Other measures include securing the payment gateway and creating security questions during transactions. These methods are not foolproof, however, and some users find them inconvenient, so a balance between convenience and security is desirable. Machine learning techniques are useful for detecting credit card fraud in shopping malls. Detection is difficult, and it is sometimes unclear whether a fraudulent transaction attempt has passed the prevention mechanisms. The main task of a fraud detection system is to examine every credit card transaction in the shopping mall (Safa and Ganga, 2019), filter it, and identify fraudulent ones as soon as possible.
The increase in credit card fraud has led financial institutions to look for better ways of detecting it. Despite prevention mechanisms, companies still suffer huge losses because fraudsters constantly change their strategies to avoid detection. As a result, traditional rule-based fraud detection systems have become obsolete, and the importance of machine learning techniques in credit card fraud detection has increased drastically worldwide. This research discusses the implementation of ensemble learning for detecting credit card fraud in shopping malls. Credit card fraud detection faces several problems: unavailability of datasets, dynamic fraudulent behaviour, skewed datasets, and the choice of accurate evaluation parameters (Yousefi, Alaghband, and Garibay, 2019). Transaction datasets contain vital customer information, so companies cannot release them in the public domain, and this lack of suitable data makes it hard to study fraud. Scammers also change their behaviour and patterns to beat the detection system. Finally, transaction datasets are highly skewed. The effectiveness of a classifier is usually judged by its accuracy, but accuracy is not the right measure here: on a skewed dataset, a model can report high accuracy while still misclassifying most of the fraudulent transactions.
Therefore, careful evaluation of such models is essential. Suitable measures include recall (the share of fraudulent transactions that are correctly identified) and precision (the share of flagged transactions that are actually fraudulent), together with the correct classification of normal transactions. The research work is carried out in Python using Jupyter Notebook. Several methods are applied to obtain the required output: linear regression, lasso regression, elastic net regression, XGB regression, and gradient boosting. After the errors of these models are obtained, stacking is used to minimize the error of the final output. The result is a method that can be used to prevent credit card fraud in shopping malls, with ensemble learning playing the major role.
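The point about accuracy on skewed data can be illustrated with a small sketch. The transaction counts below are hypothetical and chosen only to mirror a fraud rate of about 0.2%; the helper names `confusion_counts` and `evaluate` are illustrative and not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, with 1 = fraudulent and 0 = legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical skewed sample: 2 frauds among 1000 transactions.
y_true = [1] * 2 + [0] * 998

# Model A never flags anything, so it misses every fraud.
always_legitimate = [0] * 1000
naive_scores = evaluate(y_true, always_legitimate)

# Model B catches both frauds but raises 3 false alarms.
flagging = [1] * 2 + [1] * 3 + [0] * 995
detector_scores = evaluate(y_true, flagging)

print(naive_scores)
print(detector_scores)
```

In this toy sample the model that never flags fraud has the higher accuracy, yet its recall is zero, which is why recall and precision are the more informative measures here.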

1.3 Problem statement

Several problems are associated with credit card fraud detection using ensemble learning. Machine learning algorithms can be used to detect credit card fraud, but the unavailability of datasets hampers the detection process. Ensemble models are hard to interpret, and their output is hard to predict. The system is also sensitive: wrong input can lower predictive accuracy, which creates severe difficulties in detecting credit card fraud (Rashid et al., 2020). Ensemble learning is costly as well; the resulting predictive model is expensive to create, train, and deploy, and difficult to understand. These problems need to be overcome so that better outcomes can be obtained.

1.4 Research aim and objective

Aims:

This research aims to detect credit card fraud in shopping malls using ensemble learning.

Objectives: The objectives of this research are

To identify the ensemble learning concept with the combination of various predictive machine learning techniques
To obtain accuracy in detecting credit card fraud using ensemble learning
To implement the system in the Python programming language
To develop a secure method by which fraud detection can be done precisely

1.5 Research questions

The research questions associated with the credit card fraud detection system are as follows:

How can the Python programming language be used to detect credit card fraud?
How can a secure method be developed by using ensemble learning to detect credit card fraud?
Why is the Python programming language used to detect such fraud?
What are the major challenges of a credit card fraud detection system?

1.6 Scope of the research

The scope of this research covers the following. The dataset comprises important transaction data, on which training and testing are carried out to obtain the output needed for detecting credit card fraud. The dataset can be reused in future work aimed at minimizing credit card fraud, and so can the predictive model prepared with ensemble learning. The study of fraud detection therefore offers several directions for future work.

1.7 Dissertation structure

1.8 Summary

Ensemble learning plays a major role in detecting credit card fraud in shopping malls, and the importance of such machine learning techniques has increased drastically in recent times because of the global increase in fraudulent activity. The introduction presented the basic idea of credit card fraud in shopping malls. The background study gave a brief account of ensemble learning, its relation to credit card fraud, and the research approach. The problem statement discussed the problems associated with the credit card fraud detection system. The aim, objectives, and research questions were then set out, followed by the scope of the research. Together, these sections give a basic idea of the research work.

Chapter 2: Critical Review of Literature

2.1 Introduction

In modern times, credit card fraud in shopping malls has increased with the advancement of technology. Fraudsters use various techniques to scam users and extract critical information from customers in the shopping mall. Fraudulent credit card transactions have increased because of their low-risk nature. Although the incidence of credit card fraud in shopping malls is limited, its impact on the finance sector is large because individual transactions can involve large amounts. It is very important to prevent such fraud; otherwise it can cause a huge loss to the e-commerce industry. Ensemble learning, together with machine learning algorithms, can be applied in this domain. Several security measures can also help prevent credit card fraud. Sending a one-time password to the user's mobile phone helps ensure that no one else can use the card details during a transaction. Other measures include securing the payment gateway and creating security questions during online transactions. These methods are not foolproof, which is why evaluation of the model becomes essential. Credit card fraud detection faces several problems: unavailability of datasets, skewed datasets, the choice of accurate evaluation parameters, and dynamic fraudulent behaviour. Transaction datasets contain vital customer information that cannot be made public, and the resulting lack of data hampers the fraud detection mechanism. The changing behaviour of fraudsters also defeats the system's ability to detect fraudulent transactions.
Highly skewed transaction datasets create further challenges. The effectiveness of a classifier is usually judged by its accuracy, but in credit card fraud detection accuracy is not the right measure because the transaction dataset is skewed. Such weaknesses make the whole process of credit card fraud detection difficult. Ensemble learning plays a major role here: implementing its techniques yields a predictive model that is useful for detecting credit card fraud in shopping malls.

2.2 Previous Study of Literature

2.2.1 Overview of Ensemble Learning

Ensemble learning plays a major role in detecting credit card fraud in malls. It is a machine learning process that creates a predictive model by combining several prediction models (Goyal and Manjhvar, 2020). Ensemble learning methods fall into three main classes: bagging, stacking, and boosting. Bagging fits many decision trees on different samples of the same dataset and averages their predictions (Shekhar, Kedia, and Guha, 2020). Stacking fits different models on the same data and uses another model to learn how best to combine their predictions. Boosting adds ensemble members sequentially, each correcting the predictions of the previous ones.

Bagging: This ensemble learning algorithm obtains a diverse group of ensemble members by varying the training data. Typically a decision tree is trained on each of several different samples drawn from the same dataset. The predictions of the ensemble members are then combined with simple statistics such as voting or averaging to produce the final prediction (Alharbi et al., 2022). The key elements of a bagging ensemble are bootstrap samples of the training dataset, unpruned decision trees fitted on each sample, and simple voting or averaging of the predictions.
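The bagging procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method used in the study: the transaction amounts and labels are hypothetical, and a one-split decision stump stands in for the unpruned decision tree to keep the example short.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Pick the threshold on one feature that misclassifies the fewest points.

    The stump predicts 1 (fraud) when the feature value exceeds the threshold.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum(1 for x, y in sample if (1 if x > threshold else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_bagging(data, n_members=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_members):
        bootstrap = [rng.choice(data) for _ in data]
        thresholds.append(fit_stump(bootstrap))
    return thresholds


def predict_bagging(thresholds, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical (amount, label) pairs: 1 = fraudulent, 0 = legitimate.
data = [(12.0, 0), (25.5, 0), (40.0, 0), (58.0, 0), (75.0, 0), (90.0, 0),
        (310.0, 1), (450.0, 1), (520.0, 1), (700.0, 1)]

members = fit_bagging(data)
small_amount_vote = predict_bagging(members, 20.0)
large_amount_vote = predict_bagging(members, 600.0)
print(small_amount_vote, large_amount_vote)
```

Each member sees a slightly different bootstrap sample, so the learned thresholds differ; the vote smooths out those differences.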

2.2.2 Challenges in detecting credit card fraud

A developer faces a long list of obstacles while building a fraud detection model. Because of confidentiality issues, very few research studies have analysed real-world credit card datasets. Randhawa et al. (2018), however, obtained real-world credit card datasets from a financial institution and analysed them. The main challenges involved in credit card fraud detection are:

The model must be efficient enough to respond to fraud quickly, even though an enormous volume of data is processed every day.
Data may not be labelled properly, since not all fraudulent transactions are caught and reported.
Scammers use adaptive techniques against the fraud detection model once it has been developed.
Imbalanced data is another major challenge. The number of fraudulent transactions is very small, almost negligible, compared with the number of legitimate ones, so it becomes very difficult to detect fraud among such a large number of honest transactions.

According to Sadineni (2020), the model they developed does not guarantee the same outcome in every scenario. It was tested on a small dataset, and the authors are not sure whether it would run successfully on a very large one. The model gave accurate results, but the authors faced difficulties during training: the cost of training the system was high, and challenges arose both in the deep learning process and in applying the machine learning algorithms. The challenges and issues discussed by Singh and Jain (2020) are as follows:

Lack of standard credit card datasets: in earlier research, most researchers used their own data to evaluate their proposed fraud detection methods.
No standard evaluation criteria exist for assessing and comparing the results of fraud detection models, and accuracy metrics are unsuitable because the data is imbalanced.
Existing algorithms are insufficient for detecting new types of fraudulent patterns.
Fraudsters change their behaviour and methods over time to obtain card details and bypass the fraud detection model, so keeping up with them is a great challenge for developers.
Cardholders' behaviour also changes from time to time, and the system may not treat such changes as normal. This can lead to false fraud alerts, where the system fails to distinguish between fraudulent and honest transactions.
Developing pattern recognition algorithms for the behaviour of both fraudsters and customers is itself a challenging task.

All of these issues and challenges must be kept in mind while developing a fraud detection model. Overcoming them requires considerable effort, and it may not be possible to overcome all of them. However, at least some must be addressed to obtain accurate results and a successful fraud detection model.

2.2.3 Importance of secured credit cards

A secured credit card is a card backed by a deposit of a certain amount from the cardholder. The deposit acts as collateral on the account and gives the card issuer security in case the cardholder is unable to make a payment. The deposit also becomes the credit limit of the card. A secured credit card helps build a person's credit score: the cardholder makes periodic payments to clear the balance, and these are counted as credit repayments. The issuer reports the cardholder's account data to the credit bureaus, which is essential in building the person's credibility. Secured credit cards are easier to get approved for and can offer the potential to earn rewards. Another advantage is that the amount deposited as collateral is refunded to the cardholder's account. Securing credit card transactions is also very important, because fraudulent transactions result in huge losses to the business as well as to the cardholder. If simple security risks are overlooked, customers' information may be stolen and the merchant's credit card acceptance privileges may be revoked. Lack of security in credit card transactions thus leads to data breaches, lost revenue, and the loss of customers for the merchant.

People who do not have a credit history can use this card to build one in order to get loan approval in the future.
Using this type of card helps in increasing the credit limit and earning good interest on fixed deposits.
It also helps people obtain low-interest loans: a secured credit card holder can get loans at a lower interest rate than other customers.

Certain researchers have studied the importance of the relationship between banks and firms in obtaining higher credit limits. They developed a model and confirmed from their findings that relationships are the most important factor in securing higher limits. According to Gencoglu (2019), because of advances in electronic data exchange and digital communication, most people share their private information in cyberspace, knowingly or unknowingly, and leave digital footprints behind. This information is often unprotected and easy for cybercriminals to access and manipulate. For this reason, the security of credit cards is very important. Credit card security aims to reduce unauthorized access from malicious activity by providing a gateway authentication system for the user account. It is also essential to protect private information with suitable encryption and decryption techniques in order to avoid data breaches.

2.2.4 Advantages and disadvantages of ensemble learning

Ensemble learning refers to the process in which multiple models, such as experts or classifiers, are strategically generated and combined to solve a particular computational intelligence problem. Ganaie (2021) reviews the different models of deep ensemble learning. The advantages and disadvantages of this learning process are discussed below.

Advantages:

This type of learning is used to improve overall model performance in tasks such as prediction, classification, and function approximation.
It assigns confidence to the decision made by the model and helps to select optimal features and correct errors.
It also helps in incremental learning and data fusion.
Ensemble methods, unlike individual models, provide users with higher predictive accuracy.
These methods are especially useful when the dataset contains both linear and non-linear relationships.
Gao et al. (2019) used ensemble learning methods to improve detection. Their analysis and findings showed that the ensemble method achieved better detection accuracy than the other models they compared.
With ensemble learning, project managers can deal with variance and bias and reduce them. Variance refers to scattered results that are difficult to converge. Bias refers to the systematic error or mis-calibration in obtaining the desired result.
This type of learning brings a consensus-based, decentralized approach to ML, which helps in refining results and ensuring precision.
A model that uses this method is less likely to be underfitted or overfitted.
An ensemble of models is often less noisy and more stable.
Ensemble learning is often used to obtain better predictions, such as higher classification accuracy or smaller regression error.
According to Chen, Dong, and Wu (2022), ensemble learning methods combine the strengths of multiple learners and provide robustness, higher model accuracy, and better generalization ability. They conclude that the method can be an effective technique for crown profile modelling and prediction.
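The bias and variance mentioned in the list above can be made precise with the standard decomposition of expected squared error. This is a textbook result rather than something taken from the cited papers; it assumes a target of the form y = f(x) + ε with noise variance σ², and the second line further assumes M ensemble members with equal variance v and pairwise correlation ρ.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2

\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M}\hat{f}_m(x)\right)
  = \rho\, v + \frac{1-\rho}{M}\, v
```

The second expression shows why averaging helps most when the members are diverse: as M grows, the variance falls towards ρv, so weakly correlated members leave little variance behind, while the bias of the average stays the same as that of a single member.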

When it is difficult to rely on a single model, an ensemble can come to the rescue. This is one reason why winners of machine learning (ML) competitions often choose ensemble models. ML algorithms can be applied to a system through classification and regression techniques. Classification assigns data points to categories, whereas regression identifies the relationships between variables in the data. Classification algorithms include random forest, XGBoost, naive Bayes, and SVM. Regression algorithms include random forest regression, lasso regression, elastic net, linear regression, and gradient boosting regression, and these can be deployed in the security system to identify patterns in the data.

Disadvantages:

Besides its advantages, ensembling also has certain disadvantages; as the saying goes, it is "a necessary evil". The disadvantages are as follows:

Ensembling is somewhat difficult to learn, although it can be learned with experience.
If the wrong method is selected, the ensemble may achieve lower predictive accuracy than an individual model.
Ensemble models are quite expensive in terms of both space and time.
Ensembles can sometimes be difficult to interpret.
Against these costs, ensemble learning does help improve the robustness of the model.

Other authors (Huang et al., 2019) have developed prediction models based on ensemble learning, combining extreme learning machines, multiple linear regression, extreme gradient boosting, and support vector regression.

2.2.5 Advantages of implementing machine learning in securing the credit card

With machine learning, the entire customer population can be segmented more effectively. Credit line management models search the existing data for people with similar behaviour and use that information to assess the creditworthiness of card transactions. People with good credit scores receive higher credit limits.

Nguyen et al. (2020) provide a thorough study of deep learning methods for detecting fraud in credit card transactions and compare them with other ML algorithms. Their experimental results show that the proposed approach is more effective than traditional machine learning models, and they argue that it can be used in real-world credit card fraud detection systems. The advantage of ML here lies in supervised learning, which can separate malicious activity from normal transactions. A classification algorithm can be deployed to categorize each transaction, and a high classification score reduces the risk carried by the transaction data. Although the model can provide a valid understanding of the data, the output of the algorithms needs to be interpreted in the context of the overall system. If the datasets need to be altered, the machine learning process should be carried out in code.

The advantages of applying a classification algorithm to transaction data are:

A reduction in the amount of malicious activity
Users can securely use their credit cards for online transactions
It adds multiple layers of security
A huge set of financial data can be analysed through this classification method
An additional layer of security reduces multiple transactions from a single account

A naive Bayes or support vector machine (SVM) classification algorithm can be deployed on the categorized data to classify it accordingly. Customer financial and transactional data can be secured by implementing an ML algorithm, and fraudulent user accounts can be identified by classifying the data into normal and fraudulent records. Proper implementation of ML techniques reduces unauthorized access to user accounts. Vaithyasubramania (2020) illustrates a new scheme for authenticating credit card transactions that uses a primary PIN along with multifactor authentication, with ML used to develop the model. In this system, the model alerts the customer or cardholder to possible fraud whenever a mismatch occurs. The proposed methodology aims to maintain the integrity, security, and privacy of the customer information already entered into the system. The authors also note that biometric authentication can be used to overcome credit card threats, as it is a more secure way of establishing authenticity.

Various machine learning techniques can be used to detect fraudulent credit card transactions, such as decision trees, support vector machines, artificial neural networks (ANN), random forest, and logistic regression. Regression algorithms can also be applied to the dataset, including linear regression, lasso regression, elastic net regression, XGB regression, and gradient boosting regression. Combining these gives a concrete idea of the concept of ensemble learning. Overall, this process is useful for extracting information from the database.

Sadineni (2020) covers these machine learning techniques and analyses them using precision, accuracy, and false rate metrics. The dataset used in the experiment was taken from a Kaggle repository. From these reviews of previous studies, it can be concluded that ML is advantageous for securing credit card transactions and for protecting the private data of cardholders.

A bagging ensemble is a general approach that can easily be extended: the training dataset can be varied in other ways, the algorithm fitted on each sample can be replaced, and the mechanism used to combine the predictions can be modified.

Stacking: The stacking ensemble learning method fits different models on the same data and uses another model to learn how best to combine their predictions. Stacking has its own nomenclature: the ensemble members are referred to as level-0 models, and the model that combines their predictions is referred to as the level-1 model. The two-level hierarchy is the most common approach, but more layers can be used. For example, instead of a single level-1 model, three or more level-1 models can be used together with a single level-2 model that combines their predictions (Zhang, Gardner, and Vukotic, 2019). The three main elements of a stacking ensemble are an unchanged training dataset, a different machine learning algorithm for each ensemble member, and a machine learning model that learns how to combine the predictions (Nur-E-Arefin, 2020).
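The level-0 and level-1 roles can be sketched as follows. This is a minimal illustration under stated assumptions: the data is synthetic, the two level-0 models (a least-squares line and a nearest-neighbour average) are arbitrary stand-ins, and the level-1 model is a simple least-squares weighting. The sketch also holds out part of the data for fitting the level-1 model, a common precaution against leakage that is an assumption here rather than something stated in the text.

```python
import random


def fit_line(xs, ys):
    """Level-0 model A: ordinary least-squares line through (x, y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=3):
    """Level-0 model B: average target of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def fit_meta(pred_a, pred_b, ys):
    """Level-1 model: least-squares weights for y ~ w_a * pred_a + w_b * pred_b."""
    saa = sum(a * a for a in pred_a)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    sbb = sum(b * b for b in pred_b)
    say = sum(a * y for a, y in zip(pred_a, ys))
    sby = sum(b * y for b, y in zip(pred_b, ys))
    det = saa * sbb - sab * sab  # 2x2 normal equations solved by Cramer's rule
    return (say * sbb - sab * sby) / det, (saa * sby - sab * say) / det


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Synthetic data: a curved relationship plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [0.5 * x * x + rng.gauss(0, 1) for x in xs]
base_x, base_y = xs[:40], ys[:40]   # used to fit the level-0 models
meta_x, meta_y = xs[40:], ys[40:]   # used to fit the level-1 model

line, knn = fit_line(base_x, base_y), fit_knn(base_x, base_y)
pred_a = [line(x) for x in meta_x]
pred_b = [knn(x) for x in meta_x]
w_a, w_b = fit_meta(pred_a, pred_b, meta_y)
stacked = [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]

# Errors are measured on the points used to fit the level-1 weights,
# so they are optimistic; a separate test set would be needed in practice.
print(mse(pred_a, meta_y), mse(pred_b, meta_y), mse(stacked, meta_y))
```

On the points used to fit the weights, the stacked error cannot exceed that of either level-0 model, because each single model corresponds to one particular choice of weights.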

Boosting: The boosting ensemble learning method adds ensemble members sequentially so that each new member corrects the prediction errors of the models before it. The second model attempts to correct the predictions of the first model, the third model corrects the predictions of the second, and so on (Hu, Zhang, and Lovrich, 2021). Boosting ensembles typically use very simple decision trees that make only a single decision or a few decisions; such models are referred to as weak learners. The predictions of the weak learners are combined using simple statistics such as voting or averaging, with each contribution weighted. The key elements of boosting are training data biased toward the examples that are hard to predict, the sequential addition of ensemble members that correct the predictions of prior models, and the combination of predictions through a weighted average of the models.

Several ensemble learning algorithms are based on this approach, such as gradient boosting machines, stochastic gradient boosting, and AdaBoost, which is often considered the canonical boosting algorithm.
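The sequential correction described above can be sketched in plain Python. This is a gradient-boosting-style illustration with squared error, not the implementation used in this research; the toy data, the function names, and the learning rate of 0.5 are assumptions. Each weak learner is a one-split decision stump fitted to the residuals left by the models before it.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides of the split non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict(model, x):
    """Sum the base value and the scaled output of every stump."""
    base, rate, stumps = model
    return base + sum(rate * (lm if x <= t else rm) for t, lm, rm in stumps)

def boost(xs, ys, rounds=20, rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    model = (sum(ys) / len(ys), rate, [])
    for _ in range(rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        model[2].append(fit_stump(xs, residuals))
    return model

# Hypothetical toy data: a curved relationship that one stump cannot capture.
xs = list(range(10))
ys = [x * x for x in xs]
model = boost(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Comparing boosted_mse with baseline_mse shows how the sequence of weak learners reduces the training error that a single constant prediction leaves behind.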

There are several advantages and disadvantages associated with ensemble learning that are mentioned below.

Advantages: Ensemble learning offers several advantages that are beneficial for credit card fraud detection. Ensemble methods can achieve higher predictive accuracy than individual predictive models (Jhangiani, Bein, and Verma, 2019). When a dataset contains both linear and non-linear relationships, it becomes difficult to prepare a single predictive model; in this scenario ensemble learning is beneficial because it can combine different models to handle such cases. Ensemble learning can reduce bias or variance, and this helps to overcome the problem of underfitting or overfitting the model (Panigrahi, Saitejaswi, and Devarapalli, 2019). It also improves the stability of the model, its predictions are less noisy, and the approach is comparatively easy to use. These advantages make the system preferable and more reliable. In this research, ensemble learning is used together with the Python programming language to detect credit card fraud in shopping malls, and these advantages help in obtaining the required output of the system.

Disadvantages: Several problems associated with ensemble learning can make it difficult to obtain the required output of the research. Ensemble models are hard to interpret, and their output is hard to explain (Ryman-Tubb, Krause, and Garn, 2018). The operation of the system is sensitive, and any wrong input can lower the predictive accuracy, which is a severe problem for detecting credit card fraud. Ensemble learning is also costly to train and run, which creates further difficulties in detecting credit card fraud. These difficulties need to be minimised so that a better outcome can be obtained.

2.3 Literature gap:

The use of ensemble learning has expanded across many sectors in recent years, and the technology can be applied in critical areas such as business and corporate operations. As the use of credit cards for payment in shopping malls has increased, so has the fraud associated with it. As discussed throughout this report, this problem can be addressed with ensemble learning. Ensemble learning, and machine learning more broadly, has come a long way from the early remedial methods for dealing with fraud and for improving credit card and debit card payment methods. However, some gaps in the literature on this technology remain.

A literature gap is an area that previous researchers have not discussed or that earlier reports have left unexplored. Scientists and developers generally try to build a framework to resolve an issue using existing methods; they either develop new technology or update the remedies preferred in earlier research. Literature gaps are the main inspiration behind inventing or updating technologies and behind “research expansion” (Abdelrahman and Keikhosrokiani, 2020). The gaps can be explored further to rework the existing structure and for a “better understanding of the discussion”.

It has been observed that new ensemble learning technologies have failed to perform well in predicting the number of users because of the “complex association features” in how users access payment methods. Many customers pay for their purchases through systems that rely on these machine learning technologies, and various machines and mobile applications serve as the payment medium (Yousefi et al. 2019). Even for credit cards alone there are two or three methods: machines through which the card is swiped, contactless machines that read the card and complete the payment without the card touching the machine, and mobile applications through which the payment is made by entering a secret PIN. Because of this variety of payment methods, ensemble learning technologies face difficulties arising from the complexity of the machines involved.

The variety of data supplied to machine learning technologies exposes the lack of a properly maintained model or structure, and the gaps can be seen clearly at any practical venue where such payments are common, such as a shopping mall, where customers often face obstacles while making payments, most often by credit card. Because of this complexity, many frauds exploit these drawbacks, and the suffering of customers continues to increase.

Improving the literature gaps:

The analysis of the advantages and disadvantages of using ensemble technologies to detect and guard against credit card fraud during payment in shopping malls has also brought the literature gaps, and the difficulties associated with them, to the fore (Goyal and Manjhvar, 2020). Alongside the gaps and drawbacks of ensemble learning, some remedies have been identified that can close these gaps.

To close these gaps, the credit card should first be kept where it cannot be easily accessed; if fewer people have access to the card, fraud will decrease and one type of gap can be closed (Yontar et al. 2020). The main gap in using ensemble learning, which concerns the data supplied to it, can be addressed in a few steps. The shopping mall's management should adopt a more up-to-date database system to categorise all the customer data added to its server each day. If the database is well structured and the data are categorised, the complexity for the data server is reduced (Keswani et al. 2020). There should be separate groups of tables, for example for customers who use the swipe facility, for those who use the contactless facility, and for those who use online credit card services. Such separated and well-categorised data will reduce complexity and help during data analysis.

2.4 Summary:

This summary of the critical review of literature gives a brief account of the material elaborated in this chapter. First, an overview was given of the fundamentals of using ensemble learning technologies to detect credit card fraud, together with the data analysis tools and types of algorithm used to apply these methods. The difficulties the technology faces in detecting credit card fraud, and the steps for resolving them, were then discussed step by step. The structured models that can be used against future obstacles and potential threats were also covered.

Credit cards serve many purposes in the modern era, from obtaining loans to paying for products (Shivanna et al. 2020). Given the daily use of credit cards and the busy schedules of users, minor security threats are often overlooked, and flaws in the technology allow this to happen. All the gaps discussed here have to be identified at an early stage of the project, and the actions taken should address those issues. Credit cards offer many advantages as a payment method but also carry risks; this critical literature review has set out both the benefits and the risks, along with possible corrective methods that could be used as security measures in the future.

Chapter 3: Research Methodology

3.1 Introduction

The implementation of ensemble learning involves applying a number of selected machine learning models to a dataset. Starting from these initial models, the Python program looks for a combination that generalises across the models and gives a better output than any one of them. This also indicates what alterations can be made to the code to find the most relevant machine learning algorithm. In this case, the bagging, boosting, and stacking processes were carried out using the following algorithms:

Linear Regression
Lasso Regression
Elastic Net Regression
Gradient boosting regressor
XGB regressor

Implementing these algorithms ensures that the dataset can be used to identify solutions that meet the requirements of the program. Any alteration to the input data would produce a markedly different output, because the output depends entirely on the initial table. It is vital to note that the objective here is to identify the most relevant algorithm used in the program, not to maximise its performance. Thus, even if the accuracy (or error) scores are not particularly good, what matters is identifying which algorithm is the most useful.
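Identifying the most useful algorithm by its error score can be sketched as follows. The model names and prediction values are placeholders invented for illustration, not results from this research, and mean squared error is assumed as the error score.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rank_models(y_true, predictions):
    """Return (name, error) pairs sorted from lowest to highest error."""
    scores = {name: mean_squared_error(y_true, pred)
              for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Placeholder targets and predictions (hypothetical values, not real results).
y_true = [0.0, 1.0, 0.0, 0.0]
predictions = {
    "model_a": [0.1, 0.7, 0.2, 0.0],
    "model_b": [0.4, 0.5, 0.4, 0.3],
}

ranking = rank_models(y_true, predictions)
best_name, best_score = ranking[0]
```

The ranking only says which model has the lowest error relative to the others, which matches the stated objective of selecting the most relevant algorithm rather than reaching a particular score.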

Before the program is implemented, the dataset first has to be prepared. A vital aspect kept in mind while developing the program was the size of the dataset. With a small dataset the program would take far less time to run, but the algorithms would not be fitted reliably, so the output would not be entirely relevant. Because the output depends on the initial data, a large dataset representing fraud prediction in major institutions was selected.

After the dataset was selected and imported, it had to be checked for redundant values. Removing such values is vital so that the algorithms do not run into problems with null and other redundant values. First, the dataset was checked for null values. Duplicate values were not checked, since duplicates occur quite naturally in a continuous dataset.

3.2 Research strategy

The process followed in this program depended entirely on the dataset. A large dataset was used to identify the issues associated with developing the program. The dataset was first downloaded from an online data library, which gave the algorithms data from which to produce outputs. The output was developed so that the program represents the overall dataset, and any alterations needed to find solutions had to be derived from that initial dataset.

The concept implemented in the program is as follows. The program begins with the linear regression algorithm, followed by four more algorithms: Lasso Regression, Elastic Net Regression, Gradient boosting regressor, and finally XGB regressor. The outputs of these algorithms are then used to stack the initially implemented machine learning algorithms into a completely new model. After these algorithms were implemented, the ones most useful for producing outputs from the dataset were identified.

However, before the algorithms could be implemented, the dataset had to be prepared. The dataset was split into a target column and the remainder of the dataset. The target column was the "is fraud" column, kept under the variable "y", while all the remaining columns were kept under the variable "X". This allowed the target column to be used to test the output of the program. Following this phase, the dataset was split into train and test segments: the train segment represented 75% of the dataset and the test segment the remaining 25%. Had a smaller dataset been used, the output could have been flawed, so a large dataset was used in the program.
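The preparation steps above can be sketched in plain Python. The column key "is_fraud", the "amount" feature, the toy rows, and the helper names are assumptions for illustration; the actual program may have used a library routine for the split.

```python
import random

def split_features_target(rows, target):
    """Separate the target column (y) from the remaining columns (X)."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices, then hold out test_fraction of the rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    pick = lambda data, idx: [data[i] for i in idx]
    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)

# Hypothetical rows: one feature and a binary fraud flag.
rows = [{"amount": float(i), "is_fraud": int(i % 10 == 0)} for i in range(100)]

X, y = split_features_target(rows, "is_fraud")
X_train, X_test, y_train, y_test = train_test_split(X, y)
```

With 100 rows this yields 75 training rows and 25 test rows, matching the 75%/25% split described in the text. Fixing the seed keeps the split reproducible between runs.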

3.3 Research approach

Various types of fraudulent activity have been detected in financial transactions. The main aim of this research is to predict these types of fraud in payment services. The prediction follows several analysis steps. The data analysis is based on a collected dataset that contains information on financial fraud. The approach is detailed in this section. All work was carried out in Python in the Jupyter Notebook environment, in which the code was written and run. Data mining is a prominent approach at present, when so much activity takes place online.

Collecting suitable data is the first step of this research approach. The dataset was collected from the online platform “kaggle” (Kaggle.com, 2022), which hosts many kinds of datasets. A fraud-related dataset was collected from this website to develop a prediction model based on historical information; its details reflect current and past financial transactions. The data analysis begins by importing all the required libraries into the Python IDE. These libraries help to import data into the platform and provide built-in functions for the Python code. For example, seaborn is a Python library that has been used here to develop the visualization plots. The collected fraud dataset was then imported into the IDE for further analysis. After that, the dataset was split so that the learning algorithms could be applied to it.

3.4 Research design

The method used to develop outputs in the program relied on a dataset downloaded from an online data library, so the dataset is available for any observer to download and review. The dataset was then imported into the program and an initial set of data cleaning code was run to identify any alterations that had to be made. A check for null values in the dataset found none, which gave further confidence in the quality of the dataset. No changes were needed in this case; had there been missing values in a column, they could have been replaced by the mean value of that column.

Following this phase, the dataset was divided into X and y sections. The “y” variable contained the target column, while the remainder of the table was kept under the variable “X”. The dataset was then split into train and test segments: the train section contained three quarters of the dataset and the remaining quarter was kept as the test section. With the data in this form, the next step was to fit the machine learning models to the training data. Once the models were fitted, they could be used to produce the outputs of the program.

Once the algorithms had been implemented, the output of the program could be extracted and used in the overall processes of bagging, boosting, and stacking. The outputs of the initially implemented machine learning algorithms were then combined into a single machine learning model, so that a new model was built by stacking.

It is vital to note that the data visualisation was carried out only after the main process of ensemble learning. This ordering matters because the outputs of the program were not guaranteed in advance: the output had to be extracted first, and only after the algorithms had been implemented were the visualisations produced. The graphical representation allows the dataset and the results to be presented in a meaningful way.
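The null check and the mean-replacement fallback mentioned above can be sketched as follows. The column name "amount", the sample rows, and the helper names are hypothetical, and missing values are assumed to be represented by None.

```python
def count_missing(rows, columns):
    """Count the None entries in each listed column."""
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}

def impute_mean(rows, column):
    """Return new rows with None in `column` replaced by the column mean."""
    present = [row[column] for row in rows if row[column] is not None]
    mean = sum(present) / len(present)
    return [{**row, column: mean if row[column] is None else row[column]}
            for row in rows]

# Hypothetical rows with one missing amount.
rows = [{"amount": 10.0}, {"amount": None}, {"amount": 30.0}]

missing = count_missing(rows, ["amount"])   # one missing value in "amount"
cleaned = impute_mean(rows, "amount")       # the gap is filled with the mean, 20.0
```

The mean is computed only from the values that are present, so the imputed rows do not shift the column average.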

3.5 Data collection techniques

Data collection is an important part of the data analysis process, since the entire prediction is based on the dataset. A suitable dataset was needed for this research. A quantitative dataset is appropriate, but this dataset contains some non-numerical values; these non-numerical values and any null values were removed by cleaning the dataset with Python commands. The dataset was chosen from the online platform kaggle and then stored under the variable name “credfraud” (Kaggle.com, 2022). It contains fraud-related details based on the current scenario. The dataset has multiple columns, such as type of payment, name, old balance, present balance, amount, and other details, on which the entire analysis is based. The analysis was executed in Python to obtain error scores after applying the ML algorithms to the selected dataset. The credit card fraud details in the dataset help to develop a prediction model, and several activities were carried out on it to understand its hidden patterns and to obtain the best error score from the regression methods. The entire analysis was executed after importing the dataset into the software platform.

3.6 Data analysis plan

Several machine learning techniques are used in this analysis. Linear regression is used to find the line that best fits the available data points. It supports plotting the fitted relationship and predicts output values from the inputs to the system.

Lasso Regression is also used in this model. It is a penalised form of regression in which a penalty on the coefficients performs a selection among the features. It is fitted by minimising the prediction error of the regression model together with this penalty, and it improves the interpretability of the model.

Elastic Net Regression is also used, with the aim of obtaining better error scores. It is another penalised regression model, and it has hyperparameters that control the penalties applied to the model, in particular the value of alpha. These parameters affect the fitted model and can improve its performance. The processes of bagging, boosting, and stacking are also applied to it.

Gradient boosting regressor is used to predict the target value by fitting successive models to the residual values. XGB regressor is used to develop the predictive model, which is validated with cross-validation and used to predict new data. It also contributes to the final model used for predictions on new data.
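The relationship between the three regression models can be made concrete by writing out their objectives. The sketch below assumes the parameterisation used by scikit-learn's Lasso and ElasticNet (alpha for the penalty strength, l1_ratio for the mix of L1 and L2); the source does not state which library or settings were used, and the weights and data shown are hypothetical.

```python
def squared_error(weights, bias, X, y):
    """Linear regression loss: sum of squared residuals divided by 2n."""
    residuals = [sum(w * x for w, x in zip(weights, row)) + bias - t
                 for row, t in zip(X, y)]
    return sum(r * r for r in residuals) / (2 * len(y))

def lasso_penalty(weights, alpha):
    """L1 penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(w) for w in weights)

def elastic_net_penalty(weights, alpha, l1_ratio):
    """Mix of L1 and L2 penalties, weighted by l1_ratio."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1 - l1_ratio) * l2)

# Hypothetical coefficients and a tiny dataset that the line y = 2x fits exactly.
weights = [1.0, -2.0]
X, y = [[1.0], [2.0]], [2.0, 4.0]

fit_loss = squared_error([2.0], 0.0, X, y)
lasso_value = lasso_penalty(weights, alpha=0.1)
enet_value = elastic_net_penalty(weights, alpha=0.1, l1_ratio=0.5)
enet_as_lasso = elastic_net_penalty(weights, alpha=0.1, l1_ratio=1.0)
```

Linear regression minimises the squared error alone, Lasso adds the L1 penalty to it, and Elastic Net adds the mixed penalty; with l1_ratio set to 1 the Elastic Net penalty reduces to the Lasso penalty.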

3.7 Ethical consideration

Creating a program such as this requires attention to several aspects so that its development does not breach any laws. The program has therefore been created in compliance with the data privacy laws of the UK. To comply with the law, it is vital that no names or other personal information are used in the creation of the program. Since such information could expose users' private data, confidentiality must be maintained (Legislation.gov.uk, 2018). This ensures that the development team satisfies the overall requirement while the program still carries out the process of ensemble learning. Besides this, all the literature which has been studied for the creation of this program has been mentioned at the end of the research in the