File similarity detection method based on blockchain
Background and purpose:
Background: As the Internet has become widespread and the cost of plagiarizing content has fallen, original online content faces serious infringement problems. The traditional method of directly comparing file content is limited: it cannot reliably detect plagiarized files whose wording has been altered through synonym replacement or rewriting.
Purpose: To provide a file similarity detection method based on blockchain to improve the accuracy and efficiency of file infringement detection.
Technical implementation:
Blockchain network: The decentralized, open, transparent and tamper-proof characteristics of blockchain are used to ensure the fairness and security of file similarity detection results.
Smart contract: Smart contracts are deployed on the blockchain network to detect the similarity between any file and the target original file. The smart contract contains file similarity detection logic and can be automatically executed when a transaction containing the file to be detected is received.
File vector and distance calculation: The file is divided into units of a preset length, and an algorithm (such as doc2vec) generates a corresponding vector for each unit. The distance between the vector of the file to be detected and the vector of the target file is then calculated to determine their similarity; a smaller distance indicates greater similarity.
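The unit splitting and distance calculation described above can be sketched as follows. This is a minimal illustration and not the specification's implementation. It assumes units are measured in words, and it replaces doc2vec with a simple word-count vector so the example runs with the standard library only. The names split_into_units, toy_vector and cosine_distance are hypothetical.

```python
import math
from collections import Counter


def split_into_units(text: str, unit_len: int) -> list[str]:
    """Split text into consecutive units of at most unit_len words (assumed unit measure)."""
    words = text.split()
    return [" ".join(words[i:i + unit_len]) for i in range(0, len(words), unit_len)]


def toy_vector(unit: str) -> Counter:
    """Stand-in for a learned embedding such as doc2vec: a plain word-count vector."""
    return Counter(unit.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Cosine distance between two sparse vectors: 0.0 for same direction, 1.0 for no overlap."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty unit as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


units = split_into_units("the quick brown fox jumps over the lazy dog", 4)
distance = cosine_distance(toy_vector(units[0]), toy_vector("the quick brown fox"))
```

A word-count vector only captures shared vocabulary. A learned embedding such as doc2vec is what would let the method place synonym-substituted text close to the original.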
Detection process:
Receiving transaction: The blockchain node receives the first transaction containing the file to be detected.
Calling smart contract: The node calls the smart contract to execute the file similarity detection logic.
Obtaining the detection result: According to the distance calculation result of the file vector, the similarity detection result of the file to be detected and the target original file is obtained.
Further detection: If similar file units are detected, the second file containing these units is obtained and further similarity detection is performed on it.
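The four steps above can be simulated off-chain as a rough sketch. Nothing here is a real blockchain or smart contract API: Transaction, detect_similar_units and handle_transaction are hypothetical names, and a word-set Jaccard distance stands in for the vector distance. The max_distance default of 0.5 is likewise an assumed value.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical 'first transaction' carrying the units of the file to be detected."""
    file_id: str
    units: list[str]


def jaccard_distance(a: str, b: str) -> float:
    """Toy distance on word sets; stands in for a distance between file vectors."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def detect_similar_units(tx_units: list[str], original_units: list[str],
                         max_distance: float) -> list[int]:
    """Contract logic: indices of submitted units within max_distance of any original unit."""
    return [
        i for i, unit in enumerate(tx_units)
        if any(jaccard_distance(unit, o) <= max_distance for o in original_units)
    ]


def handle_transaction(tx: Transaction, original_units: list[str],
                       max_distance: float = 0.5) -> dict:
    """Node-side flow: receive the transaction, call the contract logic, return the result."""
    similar = detect_similar_units(tx.units, original_units, max_distance)
    # A non-empty result is the trigger for fetching the second file and checking it further.
    return {"file_id": tx.file_id, "similar_units": similar,
            "needs_further_check": bool(similar)}


original = ["the quick brown fox", "jumps over the lazy dog"]
tx = Transaction("f1", ["the quick brown fox", "completely unrelated words here"])
result = handle_transaction(tx, original)
```

In this sketch the result flags only the first submitted unit, which would prompt the further detection step on the file that contains it.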
Similarity calculation:
The similarity between the second file and the target original file is calculated as the ratio of the total content of the similar file units either to the total content of the second file or to the total content of the target original file.
When the similarity is greater than the preset threshold, the second file is considered to possibly constitute infringement, and an evidence-deposit transaction is sent to the blockchain to record the evidence.
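The ratio and threshold check can be illustrated with a short sketch. It assumes that "content" is measured in characters, which the text does not specify. The function names and the fields of the evidence record are hypothetical, and actually submitting the record to a blockchain is left out.

```python
from typing import Optional


def similarity_ratio(similar_units: list[str], reference_text: str) -> float:
    """Share of reference_text's length (in characters, an assumption) covered by similar units."""
    total = len(reference_text)
    if total == 0:
        return 0.0
    return min(1.0, sum(len(u) for u in similar_units) / total)


def build_evidence_record(file_id: str, similar_units: list[str],
                          reference_text: str, threshold: float) -> Optional[dict]:
    """Return a hypothetical evidence payload if similarity exceeds the threshold, else None."""
    similarity = similarity_ratio(similar_units, reference_text)
    if similarity > threshold:  # strictly greater, as stated in the text
        return {"file_id": file_id, "similarity": similarity, "threshold": threshold}
    return None


second_file = "abcdefghijklmnopqrst"  # 20 characters
similar = ["abcde", "fghij"]          # 10 characters flagged as similar
record = build_evidence_record("f1", similar, second_file, threshold=0.4)
```

Passing the target original file as reference_text instead of the second file gives the second ratio described above. The two denominators answer different questions: how much of the suspect file is copied, and how much of the original was taken.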
System and application:
System composition: One or more computers, together with one or more computer memory devices that are interoperably coupled to them and store instructions for performing the file similarity detection operations.
Application scenario: Applicable to any scenario that requires file similarity detection, such as copyright protection, academic paper duplication detection, etc.
Advantages:
The accuracy and efficiency of file similarity detection are improved, especially for plagiarized files whose wording has been altered through synonym replacement or rewriting.
The immutable characteristics of blockchain ensure the fairness and security of the detection results, avoiding the possible risks of human intervention in traditional detection methods.
The path from detection to evidence storage is fast, reducing the opportunity for infringers to deny the infringement or destroy evidence.
This document is a specification for a blockchain-based file similarity detection method, system and non-transitory computer-readable medium. The following short questions and answers summarize it:
What is the main purpose?
Answer: The main purpose is to provide a file similarity detection method based on blockchain that overcomes the limitations of traditional detection methods when plagiarized files have been altered through synonym replacement or rewriting, and that improves the accuracy and efficiency of file infringement detection.
What role does blockchain play in file similarity detection?
Answer: Blockchain provides a decentralized, open, transparent and immutable platform on which file similarity detection runs. These characteristics ensure the fairness and security of the detection results and avoid the risk of human intervention that exists in centralized detection systems.
What is a smart contract and how is it used in this method?
Answer: A smart contract is a contract that is defined in code and can be triggered by transactions on the blockchain. In this method, the smart contract is deployed on the blockchain network to detect the similarity between any file and the target original file. The smart contract contains file similarity detection logic and can be automatically executed when a transaction containing the file to be detected is received.
How is the file vector generated and what is its role?
Answer: The file is divided into units of a preset length, and an algorithm (such as doc2vec) generates a corresponding vector for each unit. File vectors play a key role in file similarity detection: by calculating the distance between the vector of the file to be detected and the vector of the target file, the similarity between the files can be determined.
What is the specific process of file similarity detection?
Answer: The specific process of file similarity detection includes receiving a first transaction containing the file to be detected, calling a smart contract to execute the file similarity detection logic, and obtaining a similarity detection result based on the distance calculation result of the file vector. If similar file units are detected, a second file containing these units is further obtained, and a similarity test is performed, and finally the similarity between the second file and the target original file is calculated.
How is it determined whether the second file constitutes infringement?
Answer: The similarity between the second file and the target original file is calculated as the ratio of the total content of the similar file units either to the entire content of the second file or to the entire content of the target original file. When the similarity is greater than the preset threshold, the second file is considered to possibly constitute infringement, and an evidence-deposit transaction is sent to the blockchain to record the evidence.