{"id":21075,"date":"2024-04-13T19:13:07","date_gmt":"2024-04-13T19:13:07","guid":{"rendered":"https:\/\/www.writemyessays.app\/blog\/questions\/data-collected-experiment-with-computers-for-enhancing-software-vulnerability-detection-with-large-language-models\/"},"modified":"2024-04-13T19:13:07","modified_gmt":"2024-04-13T19:13:07","slug":"data-collected-experiment-with-computers-for-enhancing-software-vulnerability-detection-with-large-language-models","status":"publish","type":"questions","link":"https:\/\/www.writemyessays.app\/blog\/questions\/data-collected-experiment-with-computers-for-enhancing-software-vulnerability-detection-with-large-language-models\/","title":{"rendered":"Data collected!!  experiment with computers for Enhancing Software Vulnerability Detection with Large Language Models"},"content":{"rendered":"<h4 style=\"margin: 6px 0px; line-height: 1.5; cursor: auto; color: rgb(66, 68, 90);\">Task: The paper is attached and 75 percent done. &nbsp; &nbsp;I just need the research part done and then added to the final paper provided. &nbsp;Below is the data I wanted to be<\/h4>\n<div>collected, which involves knowing about vulnerabilities, Python, OpenAI, GPT-3.5 Turbo, and GPT-4 Turbo. &nbsp;<\/div>\n<div><\/div>\n<ol style=\"margin-right: 0px; margin-bottom: 6px; margin-left: 25px; font-size: 16px; cursor: auto;\">\n<li style=\"cursor: auto;\">\n<p style=\"margin: 12px 0px; cursor: auto;\"><strong style=\"font-weight: bold; cursor: auto;\">Data Collection<\/strong>: Based on the methodology and tools identified in Assignment 3, collect or prepare a set of data. Ensure that the data is sufficient in quantity and quality to support the analysis.<\/p>\n<\/li>\n<li style=\"cursor: auto;\">\n<p style=\"margin: 12px 0px; cursor: auto;\"><strong style=\"font-weight: bold; cursor: auto;\">Data Analysis<\/strong>: Analyze the collected data to answer your research questions. 
This analysis should be methodical and may involve various techniques, such as statistical analysis, pattern recognition, or comparative studies.<\/p>\n<\/li>\n<li style=\"cursor: auto;\">\n<p style=\"margin: 12px 0px; cursor: auto;\"><strong style=\"font-weight: bold; cursor: auto;\">Findings and Interpretation<\/strong>: Interpret the results of your analysis. Discuss what the data reveals about your research questions and the broader topic. Identify any significant findings, unexpected results, or areas for further research.<\/p>\n<\/li>\n<li style=\"cursor: auto;\">\n<p style=\"margin: 12px 0px; cursor: auto;\"><strong style=\"font-weight: bold; cursor: auto;\">Submission<\/strong>: Prepare a presentation with (please upload the PPTs here):<\/p>\n<\/li>\n<ul style=\"margin-right: 0px; margin-left: 25px; cursor: auto;\">\n<li style=\"cursor: auto;\">A detailed account of the data collection process<\/li>\n<li style=\"cursor: auto;\">The methods and techniques used in your analysis<\/li>\n<li style=\"cursor: auto;\">Interpretation of the findings, discussing how they relate to your research questions<\/li>\n<li style=\"cursor: auto;\">Any conclusions or recommendations based on your analysis<\/li>\n<\/ul>\n<\/ol>\n<div><\/div>\n<div><\/div>\n<div><span style=\"cursor: auto;\"><\/p>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">I. Introduction<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">We aim to gain a larger understanding of the potential, effectiveness, and challenges involved in utilizing Large Language Models (LLMs) as a means of identifying and correcting software vulnerabilities.&nbsp;<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">A. 
Research Questions<\/span><\/p>\n<ol style=\"cursor: auto;\">\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">How do Large Language Models (LLMs), such as OpenAI\u2019s GPT-4, enhance the detection and correction of software vulnerabilities compared to traditional static analysis tools?<\/span><\/p>\n<\/li>\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">How effective are LLMs in identifying vulnerabilities in software compared to traditional vulnerability detection methods?<\/span><\/p>\n<\/li>\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">What challenges and limitations are associated with using Large Language Models (LLMs) for vulnerability detection in software?<\/span><\/p>\n<\/li>\n<\/ol>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">II. Preliminary Data<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">Existing research already provides some insight into the potential of integrating LLMs into common security practices. Compared to traditional static analysis tools, LLMs show an estimated 20-25% higher false positive rate but an average of 35% higher true positive rate [1]. GPT-4 detected approximately four times more vulnerabilities than conventional tools, with only around 6.67% being false positives [2]. 
LLMs can detect vulnerabilities with an accuracy of up to 92.65%, outperforming their counterparts as a result of their ability to efficiently analyze natural language data [3]. Another LLM, SecureFalcon, achieves an accuracy of 94% in vulnerability detection [4]. We hope that our research will yield more conclusive results on the effectiveness of LLMs.<\/span><\/p>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">III. Methodology<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">The purpose of this study is to directly compare the effectiveness of LLMs with that of traditional static analysis tools in identifying security vulnerabilities. Following a detailed analysis of the collected results, each tool\u2019s strengths and weaknesses will be assessed and compared to existing research.<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">A. Tools and Resources<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Each tool and resource used in the experiment serves a clearly defined purpose. For each category, we picked one tool that was used in at least one of the studies from our preliminary research and one tool that was not. This allows the results of our evaluation to be easily compared with those of existing research while also presenting new data for discussion. 
Furthermore, we selected widely used resources in order to improve reliability and accessibility.<\/span><\/p>\n<ol style=\"cursor: auto;\">\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Large Language Models (LLMs)<\/span><\/p>\n<\/li>\n<\/ol>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Because OpenAI models are among the most widely recognized and utilized LLMs, we decided to select GPT-3.5 Turbo and GPT-4 Turbo for analysis. We will initially employ GPT-3.5 Turbo; however, if the test cases produce inconclusive or unreliable results, or if GPT-3.5 Turbo significantly underperforms against traditional static analysis tools, we will then transition to GPT-4 Turbo. This approach allows us to conduct thorough testing while managing resources in line with our objective of cost-efficiency.<\/span><\/p>\n<ol style=\"cursor: auto;\">\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Traditional Static Analysis Tools<\/span><\/p>\n<\/li>\n<\/ol>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">The two traditional static analysis tools selected for our experiment are Flawfinder and SonarQube. Flawfinder was utilized in related research, which will give us a solid baseline for comparison. SonarQube was not, which allows original results to be presented for supplemental analysis and discussion. 
Because Flawfinder is only compatible with C\/C++, our dataset selection will focus on these programming languages. Additionally, both SonarQube and Flawfinder are widely used tools.<\/span><\/p>\n<ol style=\"cursor: auto;\">\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Datasets<\/span><\/p>\n<\/li>\n<\/ol>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">Test cases will be pulled from the Software Assurance Reference Dataset (SARD) and the Common Vulnerabilities and Exposures (CVE) Database. Both are trusted, reliable sources that receive continuous updates. Such extensive and diverse databases should make it straightforward to find test cases relevant to the experiment.&nbsp;<\/span><\/p>\n<ol style=\"cursor: auto;\">\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Vulnerabilities<\/span><\/p>\n<\/li>\n<\/ol>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">In order to narrow down the test case options to align with the deadline for the final report, we decided to choose five specific vulnerabilities to be used in the data collection process: SQL Injection, Buffer Overflow, Cross-Site Scripting (XSS), Out-of-Bounds Write, and Broken Access Control. 
These vulnerabilities were selected because of their practical relevance and real-world significance. Furthermore, a diverse selection of vulnerabilities increases the chance of observing significant differences in detection between LLMs and traditional static analysis tools.<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">B. Evaluation Metrics<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">In order to foster consistency with existing research, each tool will be measured based on the following attributes:&nbsp;<\/span><\/p>\n<ul style=\"cursor: auto;\">\n<li style=\"font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-weight: 700; font-size: 12pt; cursor: auto;\">True Positive Rate (TPR)<\/span><span style=\"font-size: 12pt; cursor: auto;\">: TPR is the proportion of vulnerable test cases in which the tool correctly detects the vulnerability.&nbsp;<\/span><\/p>\n<\/li>\n<li style=\"font-weight: 700; font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">False Positive Rate (FPR)<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">:<\/span><span style=\"font-size: 12pt; cursor: auto;\"> <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">FPR is the proportion of non-vulnerable test cases that the tool incorrectly flags as vulnerable.<\/span><\/p>\n<\/li>\n<li style=\"font-weight: 700; font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; 
cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">True Negative Rate (TNR)<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">:<\/span><span style=\"font-size: 12pt; cursor: auto;\"> <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">TNR is the proportion of actual non-vulnerabilities that are correctly identified as negative by the tool.<\/span><\/p>\n<\/li>\n<li style=\"font-weight: 700; font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">False Negative Rate (FNR)<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">:<\/span><span style=\"font-size: 12pt; cursor: auto;\"> <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">FNR is the proportion of existing vulnerabilities that the tool fails to detect.<\/span><\/p>\n<\/li>\n<\/ul>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">While these rates provide a general overview of how each tool performs, additional metrics will be calculated that are more relevant to assessing vulnerability detection effectiveness:<\/span><\/p>\n<ul style=\"cursor: auto;\">\n<li style=\"font-weight: 700; font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Accuracy<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">: <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\"><\/span><\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">Accuracy is the proportion of test cases, vulnerable or not, that the tool classifies correctly.&nbsp;<\/span><\/p>\n<\/li>\n<li style=\"font-weight: 700; 
font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Precision<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">: <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\"><\/span><\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"> Precision is the proportion of cases flagged as vulnerable by the tool that are actually vulnerable.<\/span><\/p>\n<\/li>\n<li style=\"font-weight: 700; font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">Recall<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">: <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\"><\/span><\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"> Recall is the proportion of actual vulnerabilities that the tool detects.<\/span><\/p>\n<\/li>\n<li style=\"font-weight: 700; font-size: 12pt; cursor: auto;\" aria-level=\"1\">\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">F1 Score<\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\">: <\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\"><\/span><\/span><span style=\"font-weight: 400; font-size: 12pt; cursor: auto;\"> The F1 score is the harmonic mean of precision and recall, providing a single measure of predictive performance.<\/span><\/p>\n<\/li>\n<\/ul>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">C. 
Data Collection<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><span style=\"cursor: auto;\">\t<\/span><\/span><span style=\"font-size: 12pt; cursor: auto;\">The identified LLMs (GPT-3.5 Turbo &amp; GPT-4 Turbo) will be directly compared to the traditional static analysis tools (SonarQube &amp; Flawfinder) in their ability to detect security vulnerabilities in C\/C++ test cases drawn from the selected datasets (SARD &amp; CVE). The comparison will be based on the selected evaluation metrics. We will begin by conducting preliminary testing on two instances of each vulnerability to gain insight into the subsequent procedures. This will allow any necessary modifications to be made before excessive resources are spent. Our objective is to examine a minimum of ten different cases for each vulnerability, with the option of expanding the testing scope should initial results prove inconclusive. A manual review will then be carried out to validate the accuracy of the collected data. The entire procedure will be documented in detail to maintain transparency and reliability.<\/span><\/p>\n<p style=\"margin-left: 36pt; margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">IV. Conclusion<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\">The main objectives of this project are to compare LLMs with traditional static analysis tools in terms of their ability to effectively detect security vulnerabilities and to validate the potential of LLMs to enhance software security practices. The limited existing research acknowledges this potential, suggesting that LLMs achieve higher true positive rates, although findings on their false positive rates are mixed. 
Future research on LLM optimizations for security should be conducted before mass integration of LLMs into the security workflow.<\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><br \/><\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><br \/><\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><br \/><\/span><\/p>\n<p style=\"margin-top: 0pt; margin-bottom: 0pt; line-height: 2.4; cursor: auto;\"><span style=\"font-size: 12pt; cursor: auto;\"><br \/><\/span><\/p>\n<p><\/span><br style=\"cursor: auto;\"><br style=\"cursor: auto;\"><\/div>\n<div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Task: The paper is attached and 75 percent done. &nbsp; &nbsp;I just need the research part done and then added to the final paper provided. &nbsp;Below is the data I wanted to be collected, which involves knowing about vulnerabilities, Python, OpenAI, GPT-3.5 Turbo, and GPT-4 Turbo. 
&nbsp; Data Collection: Based on the methodology and tools [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","template":"","meta":[],"disciplines":[],"paper_types":[],"tagged":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/questions\/21075"}],"collection":[{"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/questions"}],"about":[{"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/types\/questions"}],"author":[{"embeddable":true,"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/comments?post=21075"}],"version-history":[{"count":0,"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/questions\/21075\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/media?parent=21075"}],"wp:term":[{"taxonomy":"disciplines","embeddable":true,"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/disciplines?post=21075"},{"taxonomy":"paper_types","embeddable":true,"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/paper_types?post=21075"},{"taxonomy":"tagged","embeddable":true,"href":"https:\/\/www.writemyessays.app\/blog\/wp-json\/wp\/v2\/tagged?post=21075"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}