Reflections on Embracing Shadows: The Dance of Regulation and Disruption in the Age of AI and Cryptocurrency
AI and blockchain are inevitably intertwined.
In fact, this has been evident since day one: when signing up for ChatGPT, many people used blockchain-based virtual credit cards to bind a payment method and top up their accounts. As an AI user, I want high-quality responses and private data processing. Whether the data is processed locally on my own machine or in the cloud, and whether I buy my own graphics card or pay someone else for compute, it all comes down to the right price.
If I buy my own graphics card, I naturally hope it can compute things for others and earn some extra money when I'm not using it, much like the grid-computing projects of years past. Therefore, the computational power AI needs should be decentralized rather than concentrated on one company's servers. As for how to meter usage and collect fees, existing technology has already provided the answer: smart contracts built on blockchain.
On the flip side, whether you look at OpenAI, the major GPU cloud providers, serverless platforms, or edge functions, computing GPT's next token is quite similar to mining: using a GPU to produce a sequence of a given length from an input. So, now or soon, there will be "AI mines," perhaps under a different name that emphasizes "continuous generation," such as "AI mineral springs."
Of course, there are still some technical difficulties here.
- Can I encrypt my data, hand it to a GPT/LLM for computation, get the result back, decrypt it, and still obtain high-quality output?
- Can I encrypt my data, split it across multiple GPT/LLM endpoints for computation, get the results back, decrypt them separately, merge them, and still obtain high-quality output?
I think these are questions of #homomorphic encryption. If the process of encryption -> AI computation -> decryption can be completed smoothly, then regulating AI becomes impossible. For such AI, blockchain might truly become an indispensable necessity.
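The property this pipeline relies on is that certain operations commute with encryption: the server computes on ciphertexts and never sees the plaintext. A minimal sketch of the idea using textbook RSA, which happens to be multiplicatively homomorphic; this is a toy for illustration only, not secure, and real FHE schemes (e.g., CKKS, TFHE) work very differently:

```python
# Toy (textbook) RSA to illustrate homomorphic computation: the server
# multiplies two ciphertexts without ever seeing the plaintexts.
# NOT secure for real use; tiny primes, deterministic encryption.

p, q = 61, 53            # demonstration primes
n = p * q                # modulus, 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent via modular inverse (Python 3.8+)

def encrypt(m): return pow(m, e, n)
def decrypt(c): return pow(c, d, n)

a, b = 7, 12
ca, cb = encrypt(a), encrypt(b)

# Server side: multiply the ciphertexts only.
# (a^e * b^e) mod n == (a*b)^e mod n, so this is E(a*b).
c_prod = (ca * cb) % n

# Client side: decrypt and recover a * b without the server learning a or b.
result = decrypt(c_prod)  # 84
```

Textbook RSA only supports multiplication; the point of *fully* homomorphic schemes is to support both addition and multiplication, which together can express arbitrary circuits.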
Next, I consulted GPT about whether it would be possible to apply homomorphic encryption on GPT, or Transformer models.
GPT responded as follows:
The Transformer model primarily uses the following operations during inference:
- Matrix multiplication: This is the most common operation in the Transformer model, used for calculating attention scores, as well as in feed-forward neural networks.
- Addition: Also used in calculating attention scores and in feed-forward neural networks.
- Softmax: Used when calculating attention weights.
- Layer Normalization: Used after each sublayer.
Among these operations, matrix multiplication and addition can be directly performed in a homomorphic encryption environment. However, Softmax and Layer Normalization involve non-linear operations and division, which are more complex in homomorphic encryption. Although there are some methods to approximate these operations in a homomorphic encryption environment, this would increase computational complexity and may affect the accuracy of the results.
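The distinction GPT draws can be made concrete: an FHE scheme natively evaluates only additions and multiplications, so a non-linear function like the exp inside Softmax must be replaced by a polynomial. A small sketch of that approximation (the degree and inputs are my own choices, not taken from any particular scheme); note that the final division in Softmax is also not HE-native and would itself need special handling:

```python
import math

def poly_exp(x, degree=4):
    # Taylor series of e^x around 0: sum of x^k / k! -- only + and *,
    # which is what a homomorphic scheme can evaluate.
    return sum(x**k / math.factorial(k) for k in range(degree + 1))

def softmax(xs, exp_fn=math.exp):
    es = [exp_fn(x) for x in xs]
    s = sum(es)
    return [v / s for v in es]   # this division is NOT HE-friendly either

logits = [0.5, -0.2, 0.1]
exact = softmax(logits)
approx = softmax(logits, exp_fn=poly_exp)

# Close for small inputs, but the polynomial degrades as |x| grows --
# this is the accuracy cost GPT's answer alludes to.
max_err = max(abs(a - b) for a, b in zip(exact, approx))
```

Raising the polynomial degree tightens the approximation but multiplies the homomorphic circuit depth, which is exactly the complexity/accuracy trade-off described above.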
I asked: Is it possible to perform operations that can be homomorphically encrypted on remote servers, and operations that are difficult to homomorphically encrypt locally?
GPT responded:
In the Transformer model, the amount of computation for matrix multiplication and addition typically far exceeds that of Softmax and Layer Normalization.
- Matrix Multiplication and Addition: These are the main operations in the Transformer model, including the computation of queries, keys, and values in the self-attention mechanism, and the linear transformations in the feed-forward neural networks. These operations involve a large amount of matrix multiplication and addition, and the computation is very intensive.
- Softmax: In the self-attention mechanism, it is used to calculate attention weights. Although the computational complexity of Softmax is also high, compared to matrix multiplication and addition, its computational load is relatively smaller.
- Layer Normalization: It is an operation to normalize the output of each layer, and its computation, compared to matrix multiplication and addition, is also smaller.
Therefore, in terms of computation, the computation of matrix multiplication and addition far exceeds the others.
In theory, this is feasible. You can perform operations that can be executed in a homomorphic encryption environment (such as matrix multiplication and addition) on a remote server, and perform operations that cannot be directly executed in a homomorphic encryption environment (such as Softmax and Layer Normalization) locally.
However, this method has some potential problems:
- Data Security: Although homomorphic encryption can protect data privacy, it may be necessary to decrypt a part of the data when executing other parts locally, which may expose the data.
- Computational Efficiency: The computational efficiency of homomorphic encryption is usually much lower than that of ordinary operations, which may affect the overall computational efficiency.
- Result Accuracy: Because it is necessary to perform different operations in different environments, this may affect the accuracy of the final result.
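The hybrid split GPT describes can be sketched even without real homomorphic encryption, using additive blinding, which exploits the same linearity: since W @ (x + r) = W @ x + W @ r, the client can hide its input behind a random mask, let the server do the heavy linear algebra, then unmask and apply Softmax locally. This is a toy protocol of my own, not a secure scheme; in particular, for simplicity the server here also evaluates W @ r, which a real protocol must avoid (e.g., via client-side precomputation or precomputed correlations):

```python
import math, random

def matvec(W, x):
    # Plain matrix-vector product over Python lists.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # server-side weights
x = [0.3, -0.7, 0.2, 0.9]                                          # client's private input

# Client: mask the input with a random blinding vector r.
r = [random.uniform(-10, 10) for _ in range(4)]
masked = [a + b for a, b in zip(x, r)]

# Server: sees only the masked vector and computes the linear part.
y_masked = matvec(W, masked)
y_r = matvec(W, r)   # toy shortcut -- leaking r like this breaks security

# Client: unmask (W@(x+r) - W@r == W@x), then run Softmax locally.
y = [a - b for a, b in zip(y_masked, y_r)]
probs = softmax(y)
```

The round-trip per layer is also why GPT's "computational efficiency" caveat matters: every non-linear step forces a client-server exchange.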
Next, Lyric found a project on GitHub: concrete-ml. Its documentation states that it can perform fully homomorphic inference at 16-bit integer precision. In other words, many LLM algorithms that have been simplified through compression may be able to run under fully homomorphic encryption.
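Running at "16-bit integer precision" means the model's floating-point weights and activations must first be quantized, since FHE circuits operate on bounded integers. A sketch of the basic symmetric quantization step (the function names here are mine, not concrete-ml's actual API):

```python
def quantize(xs, bits=16):
    # Symmetric quantization: scale the largest magnitude onto the
    # signed integer range, then round each value to the nearest step.
    qmax = 2 ** (bits - 1) - 1              # 32767 for 16 bits
    scale = max(abs(x) for x in xs) / qmax
    q = [round(x / scale) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.731, -0.442, 0.058, -0.913, 0.204]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# Rounding limits the per-value error to half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

At 16 bits the quantization step is tiny, which is why compressed models can plausibly keep their quality under this constraint.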
The future looks promising!