Although I haven't fully understood it yet, I have a feeling that this thing is far more advanced than the mainstream artificial intelligence frameworks currently used on Earth.
The main problem is that this thing is too specialized, and highly specialized things are destined to be out of reach of the mainstream population.
Even professional algorithm engineers need a period of time to study and master the techniques.
Most of the online discussions revolved around concerns about whether they would lose their jobs.
"It feels like Merlin isn't a magician, but a programmer, trying every possible way to promote the large-scale implementation of artificial intelligence."
First, they used artificial intelligence to replace human labor in Singapore, and then they released an open-source AI programming language and framework.
I seriously suspect that a magician's use of magic is somewhat similar to programming.
"Merlin, please stop your magic! I finally found a job; you're not going to get me laid off again, are you? I don't want to be 'graduated' out of Weibo!"
This is the job I finally found after a long and difficult process.
The person who posted that Weibo was a pornography reviewer.
This profession does exist, and the pay is quite good.
However, for him, those words turned out to be prophetic.
After Merlin released his AI language, along with its supporting framework and toolkit, the major internet companies told their algorithm engineers to put down all their work:
"The only task for everyone during this period is to learn how to use the M language, the M framework, and the series of packages that come with it."
"Then, based on your current work, see what can be rewritten in the M language."
The first application of AI was in detecting pornography.
AI-based content moderation has always been a core requirement for content security, and basically everyone from NetEase and Tencent to Weibo and ByteDance is researching this.
It's a classic example of something that's easy to get started with but difficult to master.
Early pornography review was mostly done manually, which was a labor-intensive job.
As the number of internet users increased and the amount of content grew, the cost of manual review became increasingly high. Therefore, the combination of AI and human review became the mainstream approach for pornography detection.
The AI + human approach typically involves first filtering out most of the images that are definitely normal and those that are definitely problematic using the machine, and then handing the rest over to humans for review. This can significantly reduce labor costs, and the better the machine recognition results, the lower the cost of human review.
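The triage described above can be sketched as a simple three-way split. This is an illustrative toy, not any company's real pipeline; the scoring function and thresholds are invented for the example.

```python
# Hypothetical sketch of the "AI + human" triage: the model scores each
# image, and only the uncertain middle band is sent to human reviewers.

def triage(images, score_fn, pass_thresh=0.05, block_thresh=0.95):
    """Split images into auto-pass, auto-block, and human-review queues.

    score_fn returns the model's estimated probability that an image is
    problematic; the thresholds here are illustrative, not real values.
    """
    auto_pass, auto_block, human_queue = [], [], []
    for img in images:
        score = score_fn(img)
        if score < pass_thresh:
            auto_pass.append(img)      # confidently normal
        elif score > block_thresh:
            auto_block.append(img)     # confidently problematic
        else:
            human_queue.append(img)    # uncertain: needs a human
    return auto_pass, auto_block, human_queue

# Toy usage with a stand-in scoring function:
scores = {"cat.jpg": 0.01, "ad.jpg": 0.50, "bad.jpg": 0.99}
ok, blocked, review = triage(list(scores), scores.get)
```

The better the model separates the two confident bands, the smaller the middle queue, which is exactly why stronger recognition directly lowers human-review cost.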
AI-based content moderation is a broad concept. It can be implemented through a rule-based system, such as setting up a blacklist database based on MD5 or user IP information, and directly blocking content based on the rules.
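A minimal sketch of that rule-based layer, assuming a hash blacklist plus an IP blacklist; the sample hashes and addresses are made up for illustration.

```python
import hashlib

# Rule-based blocking: reject anything whose MD5 digest is in a known-bad
# database, or that comes from a banned IP. No model involved at all.

BANNED_MD5 = {hashlib.md5(b"known bad content").hexdigest()}
BANNED_IPS = {"203.0.113.7"}

def rule_block(content: bytes, client_ip: str) -> bool:
    """Return True if the content should be blocked by rules alone."""
    digest = hashlib.md5(content).hexdigest()
    return digest in BANNED_MD5 or client_ip in BANNED_IPS
```

Rules like these are cheap and exact, but they only catch content that has been seen before, which is why they are paired with models.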
Most systems, however, still rely on algorithmic models: using a model to determine whether an image contains pornographic content, which is essentially an image recognition problem.
Image recognition has now surpassed human performance in some tasks.
The most common type of image recognition algorithm is image classification, from AlexNet to VGG to ResNet.
Current image classification algorithms can distinguish 1,000 classes of image data relatively accurately. Since content moderation also involves classifying input images, it is only natural to adopt image classification algorithms.
Moreover, object detection algorithms can be used to locate exposed body parts in pornographic images, which is a relatively reliable method.
In addition, there are features and logic built at the business level, such as whether a person is present or how much skin area is visible, which assist in the judgment and are indeed effective in some cases.
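The skin-area feature mentioned above can be illustrated with a crude pixel-level heuristic. The skin-color bounds here are a rough textbook rule of thumb, not anything from a production system.

```python
# Illustrative business-level feature: the fraction of pixels in an image
# that fall inside a simple RGB "skin color" region. Bounds are a common
# textbook heuristic, invented here purely for demonstration.

def skin_ratio(pixels):
    """pixels: iterable of (r, g, b) tuples; returns the fraction flagged as skin."""
    def looks_like_skin(r, g, b):
        return (r > 95 and g > 40 and b > 20
                and r > g and r > b
                and (r - min(g, b)) > 15)
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(1 for (r, g, b) in pixels if looks_like_skin(r, g, b))
    return hits / len(pixels)
```

A high ratio alone proves nothing (a beach photo scores high too), which is why such features only assist the model's judgment rather than replace it.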
The main difficulty in AI-based pornography detection lies in soft pornography that doesn't show nudity, pornography with subtle features, non-standard pornography, and cartoon/anime pornography, etc.
This is the challenge of AI-powered image content moderation; the challenges of moderation for video and audio content are even greater.
Moreover, for these major internet companies, even if they can achieve a 99% interception rate, the remaining 1% of content is still a significant amount.
Taking Weibo as an example, the data generated every day is measured in terabytes (TB).
Even after intercepting tens of terabytes, a one percent miss rate is enough to give Lai Zong a headache.
More importantly, China's content moderation industry is not limited to pornography detection; major internet companies have long been doing OCR-based text review as well.
There, likewise, a 99% interception rate is unacceptable to them.
So even in 2031, the method of pornography detection will still be a combination of AI and human review.
It's just that in 2021, a platform like Weibo might need thousands of content moderators, but by 2031, only a few hundred will be needed.
Four-digit numbers become three-digit numbers.
Take WeChat: whatever you post is first submitted to the backend review interface.
The algorithm uses a weighting system to determine if you have violated any rules. Once a specific rule is triggered, your weight will be adjusted accordingly. If your weight exceeds a threshold, you will be subject to close monitoring.
Once that happens, a human reviewer takes over your account.
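The weighting scheme described above can be sketched roughly as follows. The rule names, weights, and threshold are all invented for illustration; the novel gives no specifics.

```python
# Guessed-at sketch of a violation-weight system: each triggered rule adds
# to a user's weight, and crossing a threshold escalates the account to
# close monitoring and human review. All values here are made up.

RULE_WEIGHTS = {"spam_link": 2, "reported_by_user": 3, "flagged_image": 5}
MONITOR_THRESHOLD = 8

def update_weight(current, triggered_rules):
    """Return the new weight and whether the account crosses the threshold."""
    new_weight = current + sum(RULE_WEIGHTS.get(r, 0) for r in triggered_rules)
    return new_weight, new_weight >= MONITOR_THRESHOLD

# One mild violation stays below the threshold; repeated ones accumulate.
w, monitored = update_weight(0, ["spam_link"])
w, monitored = update_weight(w, ["flagged_image", "reported_by_user"])
```

Accumulating a score instead of acting on single rules lets the system tolerate one-off false positives while still catching repeat offenders.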
Of course, this kind of review is not limited to large domestic companies; Facebook, Instagram, YouTube, Google, and Twitter all have large human review teams.
Their review teams are largely located in the Philippines.
In 2018, PBS released a documentary about this matter.
Internet giants at home and abroad can achieve near-100% accuracy in text recognition, but in image classification tests they only reach around 98%.
Moreover, the models that reach that number demand extremely high computing power, making them nearly unusable in real production environments.
And that figure comes from ImageNet's annual image classification competition; in actual operation, image and video recognition is much harder than the ImageNet benchmark.
After Zheng Li publicly released the AI algorithm, he built a model using the M language.
The deployment and performance of this model upended these internet companies' assumptions: it achieved a 99.9% success rate in content recognition while requiring roughly the same computing power as before.