Wednesday, February 5, 2020

Making it easier to discover datasets

https://datasetsearch.research.google.com/

Google's Dataset Search makes it easier to discover datasets across thousands of data repositories on the web, giving access to millions of datasets. A great help to AI, ML, and NLP developers!

Here is the release note: https://www.blog.google/products/search/making-it-easier-discover-datasets/

Saturday, February 1, 2020

How can WhatsApp control fake news?

India is WhatsApp's largest market by number of users, with 400 million monthly users according to company figures from July 2019. That is more than one-quarter (26.7%) of its total reported user base. To put the size of the Indian market in perspective, the company's second-largest market is Brazil, with 120 million users. India is also among WhatsApp's top three markets by penetration, with more than 90% of the country's smartphone users on WhatsApp.


Fake news has infested WhatsApp: conspiracy theories, anti-vaccination misinformation, and panicked rumours about child abductors have even led to fatal lynchings in some parts of India. WhatsApp's previous attempts to contain fake news included limiting the number of times a message can be forwarded to five and adding a visual label to forwarded messages. However, limits on forwarding slow the spread of fake news but do not stop it, and if media reports are to be believed, software tools costing as little as Rs 1,000 let you bypass WhatsApp's forwarding restrictions.

So what can WhatsApp do? The underlying solution is a mix of AI and NLP. NLP, or Natural Language Processing, can extract and interpret the text a user has shared in a message, and AI, or Artificial Intelligence, can match it against an offline database of fake news to infer whether the shared message is fake. For images, automatic text extraction would have to be used; for videos, frame-by-frame analysis and speech-to-text. As I understand it, through the 'CheckPoint Tipline' that WhatsApp initiated in April 2019 and discontinued later that year, the company has already crowdsourced (collected) enough data about fake news and probably has the offline database (technically, an AI model) ready.
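To make the matching step concrete, here is a minimal sketch in Python of comparing an incoming text message against a database of known fake claims. Everything in it is an assumption for illustration: the sample claims, the function names, the word-bigram Jaccard similarity, and the 0.5 threshold are hypothetical and are not WhatsApp's method. A production system would more likely use a trained model than plain text overlap.

```python
import re

# Hypothetical stand-in for a verified fake-news database; these entries are
# illustrative only and do not come from WhatsApp or any real fact-checker.
KNOWN_FAKE_CLAIMS = [
    "child kidnappers are roaming the city tonight, keep your kids indoors",
    "drinking hot water with lemon cures every viral infection",
]


def normalize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, size=2):
    """Return the set of overlapping word n-grams of the given size."""
    if len(tokens) < size:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(message, claims=KNOWN_FAKE_CLAIMS):
    """Return (score, claim) for the known claim most similar to the message."""
    msg = shingles(normalize(message))
    scored = [(jaccard(msg, shingles(normalize(c))), c) for c in claims]
    return max(scored, default=(0.0, ""))


def looks_fake(message, threshold=0.5):
    """Flag the message if it overlaps enough with any known fake claim.

    The threshold is an assumed value that would need tuning on real data.
    """
    score, _ = best_match(message)
    return score >= threshold


if __name__ == "__main__":
    print(looks_fake("FWD: Child kidnappers are roaming the city tonight keep your kids indoors"))
    print(looks_fake("See you at lunch tomorrow"))
```

Word overlap tolerates small edits such as an added "FWD:" prefix or changed punctuation, but it would miss paraphrases and translations, which is where a learned model would have to take over.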

With an AI model in place, WhatsApp could automatically check every message before it reaches you and notify you when a received message is fake. Users could also be given an option to report a suspicious message as fake; behind the scenes, this would send the message to WhatsApp for verification and let the company add it to its database once verified.
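The check, notify, and report loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name FakeNewsRegistry, its methods, and the label strings are hypothetical, and exact matching on a normalized fingerprint stands in for whatever model would really do the matching.

```python
from dataclasses import dataclass, field


def fingerprint(text):
    """Collapse case and whitespace so trivial edits map to the same key."""
    return " ".join(text.lower().split())


@dataclass
class FakeNewsRegistry:
    """Hypothetical store of verified fakes plus a queue of user reports."""

    verified_fakes: set = field(default_factory=set)
    pending_reports: dict = field(default_factory=dict)  # fingerprint -> report count

    def check_incoming(self, message):
        """Return a label for the recipient; nothing is removed or blocked."""
        if fingerprint(message) in self.verified_fakes:
            return "FLAGGED: this message matches known misinformation"
        return "OK"

    def report(self, message):
        """Record a user report so the message can be reviewed later."""
        key = fingerprint(message)
        if key not in self.verified_fakes:
            self.pending_reports[key] = self.pending_reports.get(key, 0) + 1

    def verify(self, message, is_fake):
        """Close a review; confirmed fakes are added to the database."""
        key = fingerprint(message)
        self.pending_reports.pop(key, None)
        if is_fake:
            self.verified_fakes.add(key)


if __name__ == "__main__":
    registry = FakeNewsRegistry()
    rumour = "Free laptops for ALL students, click here"
    print(registry.check_incoming(rumour))   # not yet known
    registry.report(rumour)                  # a user reports it
    registry.verify(rumour, is_fake=True)    # a reviewer confirms it
    print(registry.check_incoming(rumour))   # now labelled for recipients
```

Note that check_incoming only returns a label and never drops the message, which keeps the decision with the recipient.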


I suggest notifying the user rather than removing messages automatically, for two reasons:

  • Results from a study show that participants who were exposed to a correction of any kind were significantly less likely to believe the false information posted by the first user than those who did not receive a correction.
  • Users should be able to know and understand which messages were flagged as fake, and should have an opportunity to dispute WhatsApp's claim.


This approach faces quite a few challenges. The biggest is that messages between sender and receiver are end-to-end encrypted, so even WhatsApp cannot see their content. Checking messages would therefore require a change in encryption policy, and while that would allow fake messages to be filtered, it would also open WhatsApp up to government agencies that might want to snoop on your messages. Also, any trained AI model would have to be constantly adapted to new disinformation strategies and techniques. Further, the model would have to be trained on a wide range of languages, since WhatsApp messages are shared in many languages.