How Does AI Secure Smart Contracts? Practical Lessons from Generic Models to a Triple Audit Model
Building a complete security system for Web3 projects.

Original source: Beosin
In recent years, large language models such as GPT-4, Claude and Gemini have become much better at understanding code. They can read smart contract languages such as Solidity, Rust and Go, and can identify classic vulnerabilities with obvious code signatures, such as reentrancy and integer overflow. This has led the industry to ask whether large models can complement, or even replace, manual contract audits.
Generic models, however, lack sufficient understanding of a specific project's business logic. When faced with complex DeFi protocols they show a high false-positive rate and tend to miss vulnerabilities that can only be identified by reasoning about cross-contract interactions or economic models. The industry later proposed adding a "Skill" mechanism: on top of a generic large model, a dedicated knowledge base, detection rules and business context for smart contract security are supplied, so that the model has a clearer basis for judgement during an audit instead of relying solely on its general capability to decide whether code is problematic.
Even with Skill enhancement, AI auditing has a clear scope of application. It is good at scanning for known vulnerability patterns and checking code standards, but it still struggles with complex vulnerabilities that require an in-depth understanding of overall protocol design, cross-contract logic or economic models. Such issues still need experienced audit experts and, where complex computational logic is involved, formal verification to provide stronger assurance. Against this background, Beosin built a triple audit model of Skill-enhanced AI baseline check + manual in-depth audit + formal verification, in which each layer has its own focus and the three complement one another.

I. Capability Boundary of Generic AI Audits: Controlled Comparison Tests and Case Analysis
From the library of projects for which manual audits had already been completed, we selected two types of contracts that differ considerably in complexity as test cases. The first is a simple contract with largely independent logic and clear functional boundaries; such projects are usually where AI has the most training material and, in theory, the greatest advantage. The second is a complex contract involving multi-contract interaction, complex state machines or cross-protocol dependencies, which is the high-risk scenario most often raised when the industry discusses "AI as an alternative to manual auditing".
For the comparison, we had AI audit exactly the same codebase independently and generate a report, which we then aligned with the manual audit report. The two reports were produced without any interference between them: the manual auditors did not know AI's results when writing their report, so neither influenced the other. Finally, we analysed the results along four dimensions:

Case A: Standard Token Contract (BSC-USDT / BEP20USDT.sol)
For the first test we took a standard BEP-20 token contract written in Solidity 0.5.16. Its logic is relatively independent, its functional boundaries are clear, it involves no cross-contract interaction, and its main security risks are concentrated in a few common, well-known vulnerability patterns. In theory this type of contract is the most favourable scenario for AI auditing: training data contains many such standard token contracts, and the vulnerability patterns follow fairly obvious rules.

AI produced 6 alerts in total (2 high-risk, 1 medium-risk, 3 low-risk/recommendation), a respectable number. The low-risk and recommendation items were generally accurate and covered common code-quality points such as the outdated Solidity version and the visibility of state variables, so they have some reference value. However, both of AI's "high-risk" findings were misjudgements. AI labelled the owner's minting privilege as a high-risk vulnerability. In practice, for a centralized stablecoin (USDT), the owner holding minting rights is an expected design, and the risk should be assessed together with multi-signature controls, the permission governance mechanism and the contract upgrade strategy. Whether such a permission structure is reasonable depends fundamentally on the project's business model rather than on the code itself. AI lacks this level of context and can only judge by pattern matching.

The test case showed that AI can identify a permission structure but cannot judge whether that permission is reasonable in its business context. Directly marking the owner minting rights of a USDT-type contract as a "high-risk vulnerability" is a typical false positive detached from the actual business logic, and such false positives can interfere with the project team's judgement of real risk.
Case B: Complex Business Contract (IPC Protocol / 2025-02-recall)
The second test used the IPC Protocol project from a public report on the Code4rena platform (report link: code4rena.com/reports/2025-02-recall). The project consists of several interdependent core components, such as Gateway, SubnetActor and the Diamond proxy pattern, and its security depends heavily on a deep understanding of the overall protocol architecture and the logic of cross-component interaction. This is the typical setting in which high-value attacks occur in the DeFi ecosystem. The following are AI's findings:

On the complex contract, the AI audit produced 3 high-risk and 6 medium-risk findings plus further risk alerts, which looks like a reasonable volume of output. However, the auditors found a significant proportion of these to be false positives: lacking context, AI made erroneous risk judgements. At the same time, of the 9 High-severity vulnerabilities identified by the auditors, AI fully covered only 1; it found 2 others but rated them clearly too low (actually High, reported by AI as Medium), and missed the remaining 6 entirely. Of the 4 Medium-severity vulnerabilities, AI covered 1 and missed 3 completely.
What these missed vulnerabilities have in common is that they all depend on complete reasoning about the protocol's state-transition paths across components, not on pattern matching against a single function. Take H-01 (signature replay) in the manual audit report: exploiting it requires understanding the design intent of the multi-signature verification, how the attacker constructs a set of duplicate signatures, and how this bypasses the weight threshold. The same is true of H-06 (reentrancy in the leave() function): the vulnerability exists only in the critical state of subnet bootstrap, and finding it requires understanding the interdependence between the staking flow, the bootstrap trigger and the timing of external calls. None of these deep logic vulnerabilities appears in AI's alert list.
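To make the H-01 class of issue concrete, here is a minimal Python sketch of a weighted multi-signature threshold check. It is a toy model, not the IPC Protocol code: the validator names, weights and threshold are hypothetical, and signatures are assumed to have already been cryptographically verified and reduced to signer identities.

```python
def total_weight_naive(signers, weights):
    """Vulnerable variant: adds weight once per submitted signature."""
    return sum(weights.get(signer, 0) for signer in signers)


def total_weight_dedup(signers, weights):
    """Safer variant: each distinct signer contributes at most once."""
    return sum(weights.get(signer, 0) for signer in set(signers))


# Hypothetical validator weights and quorum threshold (illustration only).
WEIGHTS = {"validator_a": 10, "validator_b": 30, "validator_c": 60}
THRESHOLD = 67

# The attacker controls only validator_a and submits its signature 7 times.
replayed = ["validator_a"] * 7

naive_passes = total_weight_naive(replayed, WEIGHTS) >= THRESHOLD   # 70 >= 67
dedup_passes = total_weight_dedup(replayed, WEIGHTS) >= THRESHOLD   # 10 >= 67
```

Each function looks harmless in isolation. The flaw only shows up once the reader knows that the threshold is meant to represent distinct signers, which is the kind of design intent a single-function pattern match does not capture.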

The results show that in complex contract audits, AI's audit capability rests on recognising local code patterns, whereas protocol-level vulnerabilities require an understanding of the overall business logic, where AI is prone to error. When the trigger conditions of a vulnerability span multiple contracts, multiple states and multiple call levels, AI's current reasoning capability cannot cover it effectively.
Taking the two cases together, AI auditing is not without value: it contributes substantially to coverage of known vulnerability patterns, to code-standard checks and to surfacing some independent perspectives. But the boundary of its value is very clear: it can serve as a baseline scan, not as a direct security conclusion. For complex protocols, relying only on an AI report will not only miss high-risk vulnerabilities but also cost the team significant triage time, because of the large number of low-quality false positives. This is the core reason Beosin built the Skill knowledge base and introduced the triple audit model into its audit process.
II. Dedicated Skill Knowledge Base: An Engineering Path to Upgrading the AI Baseline Check
For AI auditing to be included in the audit process as a baseline check, its high false-positive and false-negative rates on real DeFi protocols must first be addressed. Whether the subject is permission management, AMM liquidity mechanisms, cross-chain bridge message validation or the liquidation logic of a lending protocol, AI can currently only match surface features of the code, which makes it hard to decide whether a piece of code is actually problematic given the specific business scenario and defensive logic. The key to solving this is to incorporate the experience that audit experts have accumulated over the years into AI's judgement in a structured way, giving it a degree of business understanding.
It must be made clear, however, that even with Skill enhancement, AI's position in the audit does not change. Manual audits remain irreplaceable for complex issues involving multi-contract interaction, economic modelling and novel attack techniques. The role of Skill is to raise the quality of the initial scan to a genuinely useful level within AI's scope of competence (for example, identifying common vulnerability patterns and understanding business logic to a limited extent) and to provide more valuable preliminary results for the manual audit, rather than generating a stream of ineffective alerts that need repeated triage.
2.1 Distilled from audit practice: how Skill rules are built
Beosin's Skill knowledge base is derived from more than 4000 smart contract projects that have completed manual audits, and has been summarised, distilled and curated rule by rule by audit experts. Each rule goes through the full process from vulnerability discovery to rule codification: once auditors identify a security issue in a real project, they reconstruct the complete attack path, analyse the root cause in depth, verify the effectiveness of the fix, and finally organise this whole body of defensive knowledge into a rule entry with context-based criteria, which is added to the Skill library for use in subsequent audits.
The following is a sample rule from the Skill library. Its structure covers four dimensions: vulnerability pattern, attack path, root cause and remediation recommendation:
[Beosin-AMM Skill-1] Add-liquidity check bypassed by transfer order
Vulnerability pattern: The contract decides that an operation is an add-liquidity operation by checking whether the WBNB balance in the Pair exceeds the reserve (balanceOf >= reserve + required). This check relies on the assumption that WBNB is transferred to the Pair before the token. However, the Router's addLiquidityETH function always transfers the ERC-20 token first and WETH second, and in the addLiquidity function the transfer order is determined by the order of the parameters.
Attack path: The attacker only needs to use addLiquidityETH (token always first) or call addLiquidity(Token, WBNB, ...) so that the token is transferred to the Pair before WBNB. At the time of the check WBNB has not yet arrived, so balanceOf == reserve and the detection function returns false, completely bypassing the "no adding liquidity" restriction.
Root cause: A detection method based on a snapshot of the Pair balance cannot, at the design level, reliably distinguish a swap from an add-liquidity operation. This is a structural flaw rather than an implementation bug.
Remediation: Replace the check with a ban on direct transfers to the Pair from non-whitelisted addresses, so that all trades go through the contract's built-in functions. This removes the fundamental weakness of balance-snapshot detection at the structural level.
The rule is not a simple description of a single code pattern but a systematic breakdown of a class of attack: how the trigger conditions arise, how the attacker bypasses the detection, where the detection mechanism is structurally flawed, and at what level the fix must intervene.
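The transfer-order dependence in the [Beosin-AMM Skill-1] sample can be reproduced with a small Python simulation. This is a simplified model under stated assumptions, not the audited contract: the `Pair` class, the `simulate_add_liquidity` helper and all amounts are hypothetical, and the token's transfer hook is reduced to a single balance comparison.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    reserve_wbnb: int        # WBNB reserve recorded at the last sync
    balance_wbnb: int        # live WBNB balance held by the pair
    balance_token: int = 0   # live token balance held by the pair


def looks_like_add_liquidity(pair: Pair, required: int) -> bool:
    """Snapshot heuristic from the rule: balanceOf >= reserve + required."""
    return pair.balance_wbnb >= pair.reserve_wbnb + required


def simulate_add_liquidity(pair, order, token_amount, wbnb_amount, required=1):
    """Apply the two transfers in `order`; return True if the token's
    transfer hook detected (and blocked) the add-liquidity attempt."""
    for asset in order:
        if asset == "WBNB":
            pair.balance_wbnb += wbnb_amount
        else:
            # The token's transfer hook runs the heuristic before crediting.
            if looks_like_add_liquidity(pair, required):
                return True
            pair.balance_token += token_amount
    return False


# WBNB arrives first: the heuristic sees the surplus and blocks the operation.
wbnb_first = simulate_add_liquidity(Pair(100, 100), ("WBNB", "TOKEN"), 500, 50)
# Token arrives first: balance still equals reserve, so the check is bypassed.
token_first = simulate_add_liquidity(Pair(100, 100), ("TOKEN", "WBNB"), 500, 50)
```

The same deposit is blocked or allowed purely depending on which asset reaches the Pair first, which is why the rule classifies the issue as a structural flaw rather than an implementation bug.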
2.2 Coverage of the knowledge base
Beosin has now built dedicated Skill vulnerability libraries covering the main Web3 technology stacks, including major categories such as Solidity, Rust, Motoko, FunC, Go and ZK. The core content is an internal core asset and is not publicly available; its structure is as follows:

The Skills for each technology stack are managed separately by vulnerability type, and each rule contains a rule number, the trigger conditions, a reconstruction of the attack path, the context-judgement logic and a remediation recommendation. The entire Skill library will continue to evolve as new types of attack emerge and audit cases accumulate, so that it stays in step with the real on-chain threat environment.
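One way to picture the rule layout described above is a small Python data model. Beosin's actual schema is not public, so everything here is an assumption for illustration: the `SkillRule` and `SkillLibrary` names, the field names, the "Solidity" stack assignment and the wording of the sample entry's fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRule:
    rule_id: str               # rule number
    vuln_type: str             # vulnerability type used for grouping
    trigger_conditions: tuple  # conditions under which the pattern applies
    attack_path: tuple         # reconstructed attack steps
    context_logic: str         # when the pattern is (or is not) a real issue
    remediation: str           # recommended fix


class SkillLibrary:
    """Rules grouped per technology stack, then per vulnerability type."""

    def __init__(self):
        self._rules = {}

    def add(self, stack, rule):
        self._rules.setdefault(stack, {}).setdefault(rule.vuln_type, []).append(rule)

    def rules_for(self, stack, vuln_type):
        return list(self._rules.get(stack, {}).get(vuln_type, []))


library = SkillLibrary()
library.add("Solidity", SkillRule(
    rule_id="Beosin-AMM Skill-1",
    vuln_type="AMM",
    trigger_conditions=("add-liquidity detection relies on a Pair balance snapshot",),
    attack_path=("transfer the token to the Pair before WBNB",),
    # Illustrative wording only; the real context criteria are not published.
    context_logic="relevant when the token tries to restrict adding liquidity",
    remediation="ban direct transfers to the Pair from non-whitelisted addresses",
))

amm_rules = library.rules_for("Solidity", "AMM")
```

Grouping by stack and then by vulnerability type mirrors the management scheme in the text, and lets a scan load only the rules relevant to the code under review.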
2.3 Comparison of quality of baseline checks after Skill intervention
To quantify the actual effect of the Skill library on baseline scan quality, we used the two test cases from the previous chapter as a benchmark, ran the generic AI and the Skill-enhanced AI separately on the same codebase, and compared the results item by item.
Case A. Comparison for the standard token contract (BEP-20):

Case B. Comparison for the complex business contract (IPC Protocol):

The comparison shows that detection quality improved significantly for both types of contract after Skill was introduced. In the standard token contract scenario, high-risk false positives were eliminated thanks to the added ability to judge business context. In the complex business contract scenario, coverage of known vulnerability patterns rose from 11% to 44%, the false-positive rate fell from about 55% to about 30%, and the accuracy of severity ratings improved markedly. The resulting report can serve as a baseline check that helps project teams understand deficiencies in their code in advance. Although these issues do not cause direct financial loss for now, addressing them has an important positive effect on the subsequent maintenance and upgrading of the project.
However, the data also clearly reveal the inherent boundary of AI capability: even with Skill enhancement, coverage of High-severity vulnerabilities in the complex contract is only 44%. Deep vulnerabilities that require cross-contract path reasoning, analysis of economic incentive models, or specific timing conditions to trigger remain well beyond the reach of the AI baseline scan. This is the fundamental reason the full manual audit stage remains in the audit process after Skill enhancement was introduced.
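The coverage percentages can be reproduced with simple arithmetic. The short Python sketch below uses the counts given in Case B (9 High findings, 1 fully covered by the generic model). The figure of 4 covered findings after Skill enhancement is an inference from the quoted 44%, not a number stated in the text.

```python
def coverage_pct(covered: int, total: int) -> int:
    """Share of auditor-confirmed findings that the scan also reported, in %."""
    return round(100 * covered / total)


HIGH_TOTAL = 9               # High findings in the manual audit (from the text)
generic_fully_covered = 1    # fully covered by the generic model (from the text)
skill_fully_covered = 4      # ASSUMPTION: inferred from the quoted 44%

generic_pct = coverage_pct(generic_fully_covered, HIGH_TOTAL)   # 11
skill_pct = coverage_pct(skill_fully_covered, HIGH_TOTAL)       # 44
still_missed = HIGH_TOTAL - skill_fully_covered                 # 5
```

Under that assumption, 5 of the 9 High findings would still be uncovered after enhancement, which is consistent with the conclusion that the manual audit stage cannot be dropped.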
2.4 White paper as audit input: verifying that code matches design intent
Beyond the vulnerability rule library, we have added another important capability to the audit process: using the project white paper as an additional input to AI, so that it can verify the consistency between the code implementation and the white paper's design.
Specifically, before the code audit begins, AI systematically reads the project's white paper, technical specifications and requirements documents, extracts from them the role and permission model, the core business processes, the definition of trust boundaries and the expected behavioural constraints, and forms a structured summary of the project context. Throughout the code audit, AI then continually cross-references the code against this context. In practical use, this mechanism has produced two valuable outcomes:
First, for permission structures in the code that appear risky, AI adjusts its judgement accordingly if the white paper clearly states their design intent and constraints, which effectively reduces this kind of false positive.
Second, if the code implementation clearly deviates from the white paper's commitments, for example the slippage protection mechanism claimed in the document is not implemented in the code, or the time-window constraints of the governance process are not correctly enforced, AI issues a corresponding warning. Such inconsistencies between code and documentation are easily overlooked in conventional code scans but are often latent security hazards. Flagging them also helps the project team avoid, as far as possible, behaviour that does not match its expectations.
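The two outcomes above can be pictured as a single cross-check step. The Python sketch below is illustrative only: the `cross_check` function, the pattern and feature names, and the choice to downgrade documented designs to an "info" severity are all assumptions, since the text does not specify how AI adjusts its judgement.

```python
def cross_check(alerts, documented_designs, commitments, implemented):
    """Return (adjusted_alerts, deviations).

    alerts: list of (pattern, severity) pairs produced by the code scan
    documented_designs: patterns the white paper declares as intended
    commitments: features the white paper promises
    implemented: features actually found in the code
    """
    # Outcome 1: documented, intentional designs are downgraded, not reported
    # as vulnerabilities (the "info" level is a hypothetical choice).
    adjusted = [
        (pattern, "info" if pattern in documented_designs else severity)
        for pattern, severity in alerts
    ]
    # Outcome 2: promised features missing from the code become warnings.
    deviations = [c for c in commitments if c not in implemented]
    return adjusted, deviations


adjusted, deviations = cross_check(
    alerts=[("owner_mint", "high"), ("unchecked_call", "medium")],
    documented_designs={"owner_mint"},
    commitments=["slippage_protection", "governance_timelock"],
    implemented={"governance_timelock"},
)
```

In this example the owner minting alert is downgraded because the white paper documents it as intended, while the missing slippage protection is reported as a deviation from the white paper.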
III. Triple Audit Model: Coordinated, Complete Assurance for Smart Contract Security
Once a smart contract is deployed, the cost of any vulnerability is often irreversible. Beosin uses manual in-depth audit + formal verification as the foundation of contract auditing, focusing on issues that could lead to financial loss or logical anomalies. At the same time, we have introduced an enhanced AI baseline check based on the exclusive Skill knowledge base, which helps clients detect, earlier, code problems that are currently only defects and do not yet cause actual harm. On this basis, Beosin built the triple audit model of manual in-depth audit + formal verification + enhanced AI baseline check, forming a more comprehensive security system in which the three layers work together.
3.1 Manual in-depth audit and formal verification: the core pillars of security assurance
The core advantages of manual auditing are a deep understanding of the overall protocol design and proactive analysis of potential risks from the attacker's point of view. Experienced audit experts carry out a comprehensive protocol-level audit of the project, including validation of cross-contract interaction logic, attack-surface analysis of fund security, analysis of protocol logic under extreme market conditions, and identification and assessment of new types of attack. This protocol-level defensive understanding depends heavily on long-term accumulation and practical experience in the Web3 ecosystem, and cannot currently be achieved independently at the tool level.
On this basis, Beosin turns the findings of the manual audit into quantifiable mathematical assurance through an internal toolchain. For the core business logic identified by audit experts, that is, the highest-risk critical paths such as fund flows and price calculations, Beosin has integrated LLM-driven formal verification into its internal proof toolchain and built a closed-loop engine of "AI specification generation, formal verification and counterexample-driven refinement". The toolchain uses Beosin's accumulated audit corpus as its knowledge base. It first models the attack surface of the manually identified high-risk paths and generates an initial candidate set of formal invariants and safety property specifications; the automated formal proof engine then exhaustively verifies the contract against them.
When the verification engine finds a counterexample, the system automatically distinguishes two situations. If the counterexample arises from a mismatch between the specification and the business semantics, it is fed back to the AI module as context to refine the specification and drive the next iteration. If the counterexample corresponds to a genuinely exploitable path in the contract code, it is output directly as evidence of a vulnerability, together with the complete attack path, for confirmation and subsequent remediation by audit experts. Both paths drive the closed loop towards convergence, until the target properties are mathematically proven to hold for all possible inputs. The critical paths verified by this closed-loop mechanism form the most definitive line of defence in the overall contract security system, reducing the attack surface to a very narrow range.
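The control flow of the closed loop described above (verify, classify the failing case, then refine the specification or report a vulnerability) can be sketched in a few lines of Python. This is a toy under heavy assumptions, not Beosin's toolchain: brute-force enumeration over a tiny input domain stands in for the proof engine, a fixed list of candidate specifications stands in for the AI module's refinements, and the `is_exploitable` callback stands in for the automatic classification step. All function names are hypothetical.

```python
from itertools import product


def find_counterexample(contract, spec, domain):
    """Brute-force stand-in for a proof engine: try every input pair."""
    for args in product(domain, repeat=2):
        if not spec(args, contract(*args)):
            return args
    return None


def closed_loop(contract, candidate_specs, is_exploitable, domain):
    """Verify, then either refine the spec or report a vulnerability."""
    rounds = 0
    for spec in candidate_specs:   # each item plays the role of a refined spec
        rounds += 1
        counterexample = find_counterexample(contract, spec, domain)
        if counterexample is None:
            return ("proved", rounds, None)
        if is_exploitable(counterexample):
            return ("vulnerability", rounds, counterexample)
        # Otherwise the spec did not match the intended behaviour: refine it.
    return ("inconclusive", rounds, None)


def withdraw(balance, amount):
    """Toy contract: rejects over-withdrawal by leaving the balance unchanged."""
    return balance - amount if amount <= balance else balance


def buggy_withdraw(balance, amount):
    """Toy contract with a missing bounds check."""
    return balance - amount


def spec_too_strict(args, result):
    return result < args[0]          # wrongly demands the balance always shrinks


def spec_refined(args, result):
    return 0 <= result <= args[0]    # balance never negative, never grows


SPECS = [spec_too_strict, spec_refined]
DOMAIN = range(4)

ok = closed_loop(withdraw, SPECS, lambda a: withdraw(*a) < 0, DOMAIN)
bad = closed_loop(buggy_withdraw, SPECS, lambda a: buggy_withdraw(*a) < 0, DOMAIN)
```

For the correct contract, the first specification fails on a harmless input, is replaced by the refined one, and the property then holds over the whole domain. For the buggy contract, the refined specification yields an input that drives the balance negative, which is reported as a vulnerability. A real proof engine reasons symbolically over all inputs, whereas this sketch only enumerates a small finite domain.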
3.2 Enhanced AI baseline check: an ongoing risk alert service for developers
In addition, Beosin provides the enhanced AI baseline check based on the Skill knowledge base to clients as a standalone service. Unlike the manual in-depth audit, which focuses on identifying high-risk vulnerabilities, this service is positioned closer to a code health report for development teams. The AI baseline scan covers the contract code in full and systematically surfaces potential problems that do not currently cause direct economic loss but that developers need to watch during subsequent maintenance and iteration. Examples include outdated dependencies, missing critical event declarations, state variable visibility that does not follow best practice, and gas usage patterns that can be further optimised. Although these issues cannot normally be exploited directly by attackers under the current business logic, some of them may evolve into real security risks as protocol functionality expands, code is refactored or external dependencies are updated. The three layers each have their own focus, build on one another, and together form a complete security assurance system for Web3 projects.
