The first network to think like a human! Nature: AI simulates human perceptual decision-making
In terms of raw capability, today's AI already surpasses humans in many specialized domains.
However, we still retain a few almost "sacred" qualities.
For example, the human brain is remarkably efficient: a bowl of rice can power half a day's thinking, and a chicken leg can output plenty of tokens.
And our emotions sometimes produce behavior that goes beyond what rational cognition alone would predict.
Whether the ultimate superintelligence needs to learn these mysterious human traits is something we may only find out by trying.
- Little AI, do you want to improve? Start by imitating me.
Recently, researchers at the Georgia Institute of Technology developed RTNet, the first neural network that makes decisions the way humans do.
The decision-making behavior of traditional neural networks is significantly different from that of humans.
Take CNN image classification as an example: whether the input image is simple or complex, the network performs a fixed amount of computation, and the same input always yields the same output.
Humans, on the other hand, answer easy questions quickly but occasionally slip up on them through carelessness.
The new RTNet can simulate human perceptual behavior, producing stochastic decisions and human-like response time (RT) distributions.
RTNet's internal mechanism is closer to how humans actually generate RTs; its core assumption is that an RT arises from a process of sequential sampling and evidence accumulation.
The following diagram shows the network structure of RTNet, which is divided into two phases:
The first stage uses the AlexNet architecture, but the weights take the form of a Bayesian neural network (BNN): unlike an ordinary network with fixed point-estimate weights, a BNN learns a distribution over weights during training.
At each inference step, the BNN randomly samples the weights to use from the learned distribution, which introduces randomness.
The second stage is an evidence-accumulation process. Taking classification as an example, a threshold is set in advance, the output of each inference is added to the running total for its class, and inference stops as soon as some class reaches the threshold.
RTNet thus builds in at least two characteristics of human decision-making: randomness, introduced by the BNN, and difficulty-dependent completion times (RTs), since simpler images reach the threshold in fewer inference steps.
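The two stages can be sketched in a few lines of Python. This is an illustrative toy, not the authors' code: `sample_weights`, the `forward` callable, and the threshold value are all assumptions standing in for RTNet's BNN and accumulator.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weights(mu, sigma):
    # Stage 1 (BNN): draw one weight realization from the learned
    # Gaussian posterior, so every inference uses different weights.
    return mu + sigma * rng.standard_normal(mu.shape)

def rtnet_decide(image, mu, sigma, forward, n_classes=8,
                 threshold=5.0, max_steps=1000):
    # Stage 2 (accumulation): add each inference's class scores to a
    # running total; stop when any class crosses the threshold.
    # The number of steps taken plays the role of the response time (RT).
    evidence = np.zeros(n_classes)
    for rt in range(1, max_steps + 1):
        w = sample_weights(mu, sigma)
        evidence += forward(image, w)   # e.g. the CNN's softmax output
        if evidence.max() >= threshold:
            break
    return int(evidence.argmax()), rt
```

Easy images give the winning class a large per-step increment, so the threshold is reached in fewer steps, which is exactly the difficulty-dependent RT described above.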
Through comprehensive testing, the authors show that RTNet reproduces all the key signatures of human accuracy, RT, and confidence, and does so better than current alternatives.
Mimicking human perceptual decision-making
Human perceptual decision-making has six key signatures:
1) Decisions are stochastic: the same stimulus can elicit different responses across trials.
2) Increasing speed pressure shortens RT but reduces accuracy (the speed-accuracy tradeoff, SAT).
3) Harder decisions lower accuracy and lengthen RT.
4) The RT distribution is right-skewed, and the skew grows with task difficulty.
5) RTs are shorter on correct trials than on error trials.
6) Confidence is higher on correct trials than on error trials.
Relatively little work has examined how well existing image-computable models reproduce all of these human behavioral signatures.
In this paper, the authors selected the most advanced networks in this area, CNet, BLNet, and MSDNet, as comparison baselines for RTNet.
Design of Experiments
Human control group
Sixty participants performed a digit-discrimination task, reporting which digit they perceived and rating their confidence in each decision.
Each trial began with a small white fixation cross shown for 500-1,000 milliseconds, followed by the to-be-discriminated image for 300 milliseconds.
The digit images come from the MNIST dataset, use the digits 1 through 8, and are overlaid with varying amounts of noise.
Participants reported the perceived digit on a keyboard, resting four fingers of the left hand on keys 1-4 and four fingers of the right hand on keys 5-8. This let them respond without looking down at the keyboard, reducing distraction.
The experiment crossed SAT conditions with task difficulty.
In the SAT manipulation, participants were instructed to prioritize either speed or accuracy, with the two instructions alternating across blocks.
Task difficulty was manipulated by overlaying varying amounts of uniform noise on the image. Easy trials used noise with a mean of 0.25 (drawn from the range 0-0.5); hard trials used a mean of 0.4 (drawn from the range 0-0.8). (Image pixel values lie between 0 and 1.)
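Under one plausible reading of that description (additive uniform noise whose mean is half its range, applied to pixel values in [0, 1]), the manipulation can be sketched as follows; the function name and the clipping step are my assumptions, not the authors' stimulus code.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_uniform_noise(image, mean_noise):
    # Easy condition: mean_noise = 0.25 (noise drawn from [0, 0.5]).
    # Hard condition: mean_noise = 0.4  (noise drawn from [0, 0.8]).
    noise = rng.uniform(0.0, 2.0 * mean_noise, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)  # keep pixels in [0, 1]
```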
In addition, to acclimate to the task, participants completed a training phase of three parts, noise-free, accuracy-focused, and speed-focused, with 50 trials each.
The test phase consisted of 960 trials in four blocks, crossing the SAT conditions with the difficulty levels.
RTNet
RTNet uses the AlexNet architecture for two reasons. First, it matches the scale of the other networks in the comparison, so RTNet is not disadvantaged by being too small.
Second, RTNet's BNN is hard to train, which rules out very large models. On balance, AlexNet is the best fit.
In a BNN, weights are modeled as probability distributions rather than point estimates. By Bayes' rule, the posterior over the weights w given data D is p(w|D) = p(D|w) p(w) / p(D).
This computation is intractable for large networks, so the posterior is usually approximated with variational inference.
A surrogate distribution q(w) is specified to approximate the posterior, and its parameters are adjusted to make the two distributions as similar as possible, as measured by the KL divergence KL(q(w) || p(w|D)).
However, since the evidence p(D) is itself intractable, this divergence cannot be computed directly; instead one maximizes the evidence lower bound (ELBO), ELBO = E_q[log p(D|w)] − KL(q(w) || p(w)), as a proxy objective.
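For a mean-field Gaussian q(w) with a standard-normal prior, the negative ELBO used as a training loss can be written out directly. This is the textbook form, not the authors' implementation; the Monte-Carlo likelihood estimate is assumed to come from weights sampled from q.

```python
import numpy as np

def gaussian_kl(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    # KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over weights.
    return np.sum(np.log(sigma_p / sigma_q)
                  + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
                  - 0.5)

def negative_elbo(expected_log_lik, mu_q, sigma_q):
    # -ELBO = -E_q[log p(D|w)] + KL(q(w) || p(w)); minimizing it
    # maximizes the lower bound without ever touching p(D).
    return -expected_log_lik + gaussian_kl(mu_q, sigma_q)
```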
The researchers trained RTNet's BNN module for a total of 15 epochs with a batch size of 500, reaching over 97% classification accuracy on the MNIST test set.
To benchmark against the 60 human participants, the authors trained 60 RTNet instances from different random initializations of the weight means and variances; the other networks described below were likewise instantiated 60 times using different random seeds.
CNet
CNet is built on the architecture of a residual network (ResNet) that utilizes skip connections to introduce propagation delays during input processing.
At each processing step, all units in all layers are updated in parallel. However, because each residual block adds a propagation delay, simpler perceptual features travel between blocks faster.
In general, residual block t needs t−1 time steps to receive a complete, stable input. The network can emit a prediction at any point during processing.
However, if it responds at a time step earlier than the number of residual blocks, the response is based on not-yet-stable representations in the higher blocks.
BLNet
BLNet is a recurrent CNN (RCNN): a standard feedforward CNN plus recurrent connections that link each layer to itself, with a final readout layer computing the network output at each time step via a softmax.
At each time step, a given layer receives input from two sources: feedforward input from the previous convolutional layer and recurrent input from itself.
The network produces a response once the readout exceeds a predefined threshold.
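A single BLNet-style time step amounts to one line of linear algebra; the weight names and the ReLU nonlinearity here are illustrative assumptions, not BLNet's exact layer definition.

```python
import numpy as np

def recurrent_layer_step(x, h_prev, w_ff, w_rec):
    # A layer's new state combines feedforward input from the layer
    # below (w_ff @ x) with recurrent input from its own previous
    # state (w_rec @ h_prev), passed through a ReLU.
    return np.maximum(0.0, w_ff @ x + w_rec @ h_prev)
```

Iterating this step across time is what gives BLNet its time dimension: later sweeps refine the representation produced by earlier ones.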
MSDNet
MSDNet has the architecture of a standard feedforward neural network, but with an early-exit classifier after each layer.
Each exit computes a softmax over the choices; if any choice's probability exceeds a predefined threshold, the network stops processing and responds immediately.
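The early-exit rule used by MSDNet (and, in spirit, BLNet's thresholded readout) can be sketched like this; the threshold value and the list-of-logits interface are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def early_exit(exit_logits, threshold=0.9):
    # Walk the exit classifiers in depth order; stop at the first one
    # whose top softmax probability clears the threshold. The exit
    # index plays the role of the response time (RT).
    for rt, logits in enumerate(exit_logits, start=1):
        p = softmax(logits)
        if p.max() >= threshold:
            return int(p.argmax()), rt
    return int(p.argmax()), rt  # no exit fired: answer at the last layer
```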
Experimental results
Figures a-e below show the decision randomness of humans, RTNet, CNet, BLNet, and MSDNet, respectively. Warm colors indicate that the same response was given when an image was presented twice; cool colors indicate that the two presentations elicited different responses.
Humans and RTNet make stochastic decisions, with randomness increasing with task difficulty and speed pressure; the decisions of CNet, BLNet, and MSDNet are completely deterministic.
The following diagram illustrates the behavior of the human participants and the model:
Human RT is measured in seconds; network RT is measured as the number of samples consumed (RTNet), propagation steps (CNet), feedforward sweeps (BLNet), or layers evaluated (MSDNet).
All models reproduce the SAT observed in humans, but the effect is much stronger in humans, RTNet, and BLNet than in the other models, with individual RT distributions showing a clear separation between the speed-focused and accuracy-focused conditions.
Overall, the RT distribution generated by RTNet better reflects the patterns observed in human data than all other networks.
Note that CNet, BLNet, and MSDNet can only produce as many distinct RTs as they have layers or residual blocks, whereas RTNet can process any number of samples regardless of its depth.
The figure above shows the correlations among accuracy, RT, and confidence computed separately for each human participant, together with the image-by-image correlation between each model and the human data across all experimental conditions.
For each measurement, RTNet has a stronger correlation than CNet, BLNet, or MSDNet. In all cases, RTNet's predictions are fairly close to the noise ceiling.
Discussion
Relationship to cognitive models
Traditional cognitive models of decision-making are usually sequential sampling models.
RTNet is conceptually closest to a subfamily of these known as race models: each choice has its own accumulator, and evidence for all choices accumulates in parallel.
RTNet has two important advantages over traditional cognitive models. First, RTNet is image-computable and can be applied to real-world images, while traditional models cannot.
Second, traditional cognitive models cannot naturally capture the relationships between different choices, whereas RTNet learns these relationships while training its core BNN.
Biological feasibility
Physiological records reveal several features of human visual system processing:
First, conduction from one visual cortical area to the next takes about 10 milliseconds, and the signal from the photoreceptors reaches the top of the visual hierarchy in the inferior temporal cortex within 70-100 milliseconds; a single sweep from input to output in a purely feedforward network should therefore take no more than a few hundred milliseconds.
Second, neurons in every layer of the visual cortex keep firing action potentials for several hundred milliseconds after stimulus onset and receive strong recurrent input from later processing stages.
Finally, neuronal processing is noisy, i.e., the same image input will produce very different neuronal activations in different trials.
By these criteria, RTNet is broadly consistent with the biological characteristics of human vision.