If you work in the IT industry, you have probably already heard of Kafka. There are plenty of articles on how Kafka works from a technical standpoint, but very few on exactly why it is needed and what use cases it serves. This article is my humble attempt at the latter.
Overview: Kafka as Message Queue
Let's first understand that Kafka is a message queue. A Message-Producer publishes a message onto a Kafka queue, from which a Message-Consumer consumes it. Now, why is a message queue needed in the first place? The answer is that it acts as an intermediate communication layer that helps various modules (a.k.a. services) decouple from each other. This is the basis of a microservices-based architecture.
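The decoupling idea can be sketched in a few lines of plain Python. This is a stand-in for a message queue, not the Kafka API: the hypothetical "order" producer and "billing" consumer know only about the queue, never about each other.

```python
import queue
import threading

# A pure-Python stand-in for a message queue (not Kafka itself):
# producer and consumer communicate only through the queue.
message_queue = queue.Queue()

def producer():
    # A hypothetical order service publishes events without knowing
    # who (if anyone) will consume them.
    for order_id in range(3):
        message_queue.put({"event": "order_placed", "order_id": order_id})
    message_queue.put(None)  # sentinel: no more messages

def consumer(processed):
    # A hypothetical billing service consumes events without knowing
    # who produced them.
    while True:
        message = message_queue.get()
        if message is None:
            break
        processed.append(message["order_id"])

processed = []
t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer, args=(processed,))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(processed)  # → [0, 1, 2]: the consumer saw every published event
```

Because neither side holds a reference to the other, either service can be replaced, restarted, or scaled independently, which is exactly the decoupling a message queue buys you.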
Kafka's Advantages
Next, let us understand how Kafka's founders viewed data, which is fundamental to why the platform was created in the first place. They believed that instead of focusing on piles of static data in relational databases, caches, and key-value stores, we should focus on data in real time, as and when it is captured. Let's understand this by figuring out what data is generated during one user session on Netflix. Events correspond to various user activities: the user logs in, browses different genres, watches a preview, selects and plays a movie, pauses it, and resumes it after a while. Across all these events, appropriate actions, like recommendations or resuming playback on another device, have to be computed and fed back to the user in real time. That 'real time' is where Kafka fits in. Kafka allows events to be published and consumed by various services with high throughput (millions of records per second) and low latency (under 20 milliseconds), and can thus keep up with the volumes of data generated in today's systems.
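One detail worth seeing is how all of a user's session events stay in order: Kafka routes each message to a partition by hashing its key, so events keyed by the same user ID always land on the same partition. The sketch below uses a simplified hash in place of Kafka's actual default partitioner (which uses murmur2); the topic name and partition count are made-up examples.

```python
import hashlib

NUM_PARTITIONS = 6  # hypothetical partition count for a "user-activity" topic

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Simplified stand-in for Kafka's default partitioner (which uses
    # murmur2): hash the message key, take it modulo the partition count.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event in one Netflix-style viewing session is keyed by user ID,
# so the whole session maps to a single partition and is consumed in order.
session_events = ["login", "browse", "preview", "play", "pause", "resume"]
partitions = {partition_for("user-42") for _ in session_events}
print(partitions)  # one partition for the entire session
```

Partitioning is also what enables the throughput figures above: different users hash to different partitions, which can be written and read in parallel across the cluster.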
Kafka wasn't the first one in the market with this idea. We already had JMS, RabbitMQ, and AMQP, but what worked in Kafka's favor were its higher throughput, reliability, and replication characteristics, well suited to today's real-time logging and analytics requirements. RabbitMQ can also process a million messages per second, but it requires a big cluster (30+ nodes) for in-memory operation and is thus less attractive from a hardware perspective.
Another advantage of Kafka is that it allows for on-the-fly horizontal scaling and is fault tolerant. Traditional systems are limited in scalability by hardware constraints and require downtime to add new hardware; in Kafka, adding a new machine requires no downtime, and there is no limit to the number of machines in a cluster. As for fault tolerance, many non-distributed systems have a single point of failure. In Kafka, by contrast, a 3-node cluster can continue to work even if two nodes go down.
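The arithmetic behind the 3-node claim is simple: each partition is copied to N brokers (the replication factor), and the cluster keeps serving as long as at least `min.insync.replicas` copies survive. A minimal sketch of that rule, assuming the common setting of one required in-sync replica:

```python
def tolerable_failures(replication_factor: int, min_insync_replicas: int = 1) -> int:
    # With N copies of each partition, up to N - min.insync.replicas
    # brokers can fail while the partition stays readable and writable.
    return max(replication_factor - min_insync_replicas, 0)

# The 3-node example from the text: replication factor 3 means the
# cluster survives the loss of 2 brokers.
print(tolerable_failures(3))  # → 2

# Stricter durability (min.insync.replicas=2, often paired with acks=all)
# trades some of that headroom away: only 1 broker may fail.
print(tolerable_failures(3, min_insync_replicas=2))  # → 1
```

In practice the trade-off is between availability and durability: raising `min.insync.replicas` guarantees more copies of every acknowledged write, at the cost of tolerating fewer broker failures.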
Usage
Coming to usage in the market, according to HG Insights [1], approximately 20,000 companies use Kafka, including LinkedIn, Spotify, Uber, JP Morgan Chase, New York Times, Shopify, Cisco, Cloudflare, and Netflix. Let's look at some of the use cases:
- Uber uses Apache Kafka as a message bus connecting different parts of its ecosystem. It collects system and application logs as well as event data from the rider and driver apps, such as location coordinates, and uses this for computing the nearest vehicle, the exact route taken, the fare, and so on. Uber handled over a trillion messages per day (as of 2017) across tens of thousands of topics.
- Netflix, which we covered above, generates ~500 billion events and ~1.3 PB of data per day from video viewing activity, UI activity, error logs, performance events, and troubleshooting & diagnostic events.
- New York Times uses Kafka to connect multiple Content Management Systems, third-party data, and wire stories on one side with a range of services and applications on the other: search engines, personalization services, feed generators, and all the different front-end applications, like the website and the native apps, that need access to the published content. Whenever an asset is published, it is made available to all of these systems with very low latency (this is news, after all) and without data loss.
- LinkedIn handles 7 trillion messages per day, divided into 100,000 topics and 7 million partitions, stored on over 4,000 brokers. Kafka is used extensively throughout its software stack, powering use cases like activity tracking, message exchanges, and metrics gathering.
You can view [2] and [3] for more use cases.
I hope this basic info was useful!
[1] https://discovery.hgdata.com/product/apache-kafka
[2] https://blog.softwaremill.com/who-and-why-uses-apache-kafka-10fd8c781f4d
[3] https://kafka.apache.org/powered-by