Showing posts with label Technology Trends. Show all posts

Sunday, April 26, 2020

Kafka for Business Professionals

If you are in the IT industry, I am sure you have already heard about Kafka. There are a lot of articles on how Kafka works from a technical standpoint, but very few on exactly what need it addresses and what use cases it serves. This article is my humble attempt at the latter.

Overview: Kafka as Message Queue

Let's first understand that Kafka is a message queue. What this means is that a Message-Producer publishes a message onto a Kafka queue, from which a Message-Consumer consumes that message. Now, why is a message queue needed in the first place? The answer is that it acts as an intermediate communication layer that helps various modules, aka services, decouple from each other. This is the basis of a microservices-based architecture.
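The decoupling pattern can be sketched with a toy in-memory queue standing in for a Kafka topic (this illustrates the pattern only, not the actual Kafka client API): the producer never learns who consumes its messages, and the consumer never learns who produced them.

```python
from queue import Queue

# The "topic": an intermediate buffer that neither side owns.
topic = Queue()

def producer(q, messages):
    """Publish messages without knowing who will read them."""
    for m in messages:
        q.put(m)

def consumer(q):
    """Drain whatever has been published so far."""
    out = []
    while not q.empty():
        out.append(q.get())
    return out

producer(topic, ["order-created", "order-paid"])
print(consumer(topic))  # ['order-created', 'order-paid']
```

Because both sides only know the queue, either service can be replaced, restarted, or scaled without touching the other.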

Kafka's Advantages

Next, let us understand how Kafka's founders viewed data, which is fundamental to why such a platform was created in the first place. They believed that instead of focusing on piles of data sitting statically in relational databases, caches, and key-value stores, we should focus on data in real time, as and when it is captured. Let's understand this by figuring out what data is generated during one user session on Netflix. Data is generated by one or more events corresponding to various user activities, viz. logging in, browsing different genre options, watching a preview, selecting and playing a movie, pausing it, and then resuming it after a while. Across all these generated events, appropriate actions, like recommendations or resuming playback on another device, have to be computed and fed back to the user in real time. The 'real time' in this example is where Kafka fits in. Kafka allows events to be published and consumed by various services with high throughput (millions of records per second) and low latency (less than 20 milliseconds), and can thus cope with the high volume of data generated in today's systems.
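The event-driven view above can be made concrete with a minimal sketch (the event names, fields, and handlers are invented for illustration and are not Netflix's actual schema): each user activity becomes an event the moment it happens, and services react to the stream as it arrives rather than querying a static database later.

```python
# Each user activity becomes an event the moment it happens.
events = [
    {"user": "u1", "type": "login"},
    {"user": "u1", "type": "browse", "genre": "thriller"},
    {"user": "u1", "type": "pause", "title": "Movie A", "position": 1250},
]

def handle(event, state):
    """React to each event as it arrives instead of batch-querying later."""
    if event["type"] == "browse":
        # Feed the browsed genre into recommendations immediately.
        state.setdefault("recommend", []).append(event["genre"])
    elif event["type"] == "pause":
        # Remember the position so another device can resume playback.
        state[("resume", event["user"])] = (event["title"], event["position"])
    return state

state = {}
for e in events:  # with Kafka, this loop would be a consumer polling a topic
    state = handle(e, state)

print(state[("resume", "u1")])  # ('Movie A', 1250)
```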

Kafka wasn't the first in the market with this idea. We already had JMS, RabbitMQ, and AMQP, but what worked in Kafka's favor were the higher throughput, reliability, and replication characteristics suited to today's real-time logging and analytics requirements. RabbitMQ can also process a million messages per second, but it requires a big cluster (30+ nodes) for in-memory operations and is thus not attractive from a hardware perspective.

Another advantage of Kafka is that it allows for on-the-fly horizontal scaling and is fault tolerant. Traditional systems are limited in scalability because of hardware limits and need downtime to add new hardware; with Kafka, adding a new machine requires no downtime, nor is there any limit to the number of machines you can have in a cluster. For fault tolerance: in a lot of non-distributed systems there is a single point of failure, whereas in Kafka, with a 3-node cluster (and replication factor 3), you can continue to work even if two nodes go down.
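A simplified model shows why this works: messages are hashed onto partitions, and each partition is copied onto several brokers. (The hash function and replica-placement rule below are deliberately naive stand-ins; Kafka actually uses murmur2 hashing and its own assignment logic.)

```python
NUM_PARTITIONS = 6
NUM_BROKERS = 3
REPLICATION_FACTOR = 3

def partition_for(key: str) -> int:
    # Messages with the same key always land on the same partition.
    return sum(key.encode()) % NUM_PARTITIONS  # naive stand-in for murmur2

def replicas_for(partition: int) -> list:
    # Place copies on consecutive brokers (round-robin placement sketch).
    leader = partition % NUM_BROKERS
    return [(leader + i) % NUM_BROKERS for i in range(REPLICATION_FACTOR)]

# With replication factor 3 on a 3-node cluster, every partition keeps at
# least one live copy even if two brokers fail:
failed = {0, 1}
for p in range(NUM_PARTITIONS):
    survivors = [b for b in replicas_for(p) if b not in failed]
    assert survivors  # at least one replica remains reachable
```

Adding brokers just gives partitions more machines to spread over, which is why scaling out needs no downtime.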

Usage

Coming to usage in the market, according to HG Insights [1], approx. 20,000 companies use Kafka, including LinkedIn, Spotify, Uber, JP Morgan Chase, The New York Times, Shopify, Cisco, Cloudflare, and Netflix. Let's look at some of the use cases -

  • Uber uses Apache Kafka as a message bus for connecting different parts of its ecosystem. It collects system and application logs as well as event data from the rider and driver apps, viz. location coordinates of the rider and driver, and uses this for computing the nearest vehicle, the exact route taken by a vehicle, the price, etc. Uber handles a trillion+ messages per day (as of 2017) over tens of thousands of topics.
  • Netflix, which we covered above, generates ~500 billion events and ~1.3 PB of data per day from video viewing activities, UI activities, error logs, performance events, and troubleshooting & diagnostic events.
  • New York Times uses Kafka to connect multiple Content Management Systems, third-party data and wire stories on one side and a range of services and applications like search engines, personalization services, feed generators, as well as all the different front-end applications, like the website and the native apps that need access to this published content on the other side. Whenever an asset is published, it is made available to all these systems with very low latency — this is news, after all — and without data loss.
  • LinkedIn handles 7 trillion messages per day, divided into 100,000 topics and 7M partitions, stored over 4,000 brokers. Kafka is used extensively throughout its software stack, powering use cases like activity tracking, message exchanges, and metrics gathering.
You can view [1], [2], and [3] for more use cases.

I hope this basic info was useful!

[1] https://discovery.hgdata.com/product/apache-kafka
[2] https://blog.softwaremill.com/who-and-why-uses-apache-kafka-10fd8c781f4d
[3] https://kafka.apache.org/powered-by

Thursday, March 26, 2020

Technology in Aid of Tracking Covid Spread

Came across some interesting uses of technology around Covid spread, surveillance, and lockdown violations -

1. Using location tracking via smartphones for Covid surveillance
This one talks about how Taiwan is ensuring that people who have been exposed to the virus stay in their homes. The system monitors phone signals to alert police and local officials if those in home quarantine move away from their address or turn off their phones. Officials also call twice a day to ensure people don't avoid tracking by leaving their phones at home.

April 1 news reports mentioned that the system was tracking more than 55,000 people. The system has been very accurate, with only about 1% of alerts being false alarms, mostly caused by inaccurate location readings.

2. Identifying lockdown violations (post-facto analysis)
The second one here is a report published to demonstrate how public data and visual AI can be used to identify lockdown violations. Taking actual images and videos from the public Instagram profiles of 552,000 Italians between March 11-20, 2020, and applying image recognition technology, the authors were able to predict what percentage of people were not following quarantine, which city/region they belonged to at an aggregate level, and exactly where they were spending time (viz. parks, markets, malls). The entire dataset was anonymized in the interest of privacy.



3. Using cough analysis to determine if one is Covid affected
This link talks about using AI and deep learning to determine whether a person has Covid by analyzing the sound of their cough, the way they breathe, or the way they speak. It's based on the fact that the cough of a Covid patient is distinct from that of a healthy person. I also stumbled upon this site https://www.coughagainstcovid.org/ which is crowdsourcing cough-sound data to build such a technology. The initiative is supported by the Bill & Melinda Gates Foundation and is in collaboration with Stanford University.

4. Using real-time mobile location data to detect violations of social distancing
Unacast is a company that collects and provides cellphone location data. It has aggregated this data to come up with a Social Distancing Scorecard. The scorecard is based on analysis of signals such as two devices being at the same place at the same time (thus violating social distancing), visits to non-essential places (other than grocery stores), and other parameters. Unacast collects the data from various location-tracking apps installed on phones.
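The core idea behind such co-location checks can be sketched as bucketing location pings by coarse place and time, then flagging buckets that hold more than one device (the cell size, time window, and sample data below are invented for illustration; Unacast's actual methodology is not public in this detail):

```python
from collections import defaultdict

# (device_id, latitude, longitude, unix_time) pings -- invented sample data
pings = [
    ("phone-A", 40.7128, -74.0060, 1000),
    ("phone-B", 40.7129, -74.0061, 1010),  # same block, same minute -> contact
    ("phone-C", 40.7300, -74.0060, 1000),  # elsewhere at the same time
]

def bucket(lat, lon, t, cell=0.001, window=60):
    """Coarsen location to ~100 m grid cells and time to 60 s windows."""
    return (round(lat / cell), round(lon / cell), t // window)

def contacts(pings):
    seen = defaultdict(set)
    for device, lat, lon, t in pings:
        seen[bucket(lat, lon, t)].add(device)
    # Any bucket with 2+ devices is a potential distancing violation.
    return [devs for devs in seen.values() if len(devs) >= 2]

print(sorted(contacts(pings)[0]))  # ['phone-A', 'phone-B']
```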

5. Mapping movement of coronavirus carriers
The South Korean government is publishing the movements of people before they were diagnosed with the virus, retracing their steps using tools such as GPS phone tracking, credit card records, surveillance video, and old-fashioned personal interviews with patients. The idea is to let the public know, via a central website and regional text messages, whether they may have crossed paths with carriers, whose names are not made public. Here is the link to the site: https://coronamap.site/


Will keep adding more as I discover them.

Wednesday, June 3, 2015

The Next Disruption of Music

Interesting article around how technology is shaping the music industry.

 
  • Automated algorithms are creating lyrics and tunes. 
  • People are flocking to stadiums to hear their favorite artists but are instead being served Electronic Dance Music (EDM).
  • Apps like Spotify know it ALL, viz. which songs you skip and exactly when you skip them, which artists you like, and the mood you're in when you listen to each; they will probably use this data to create a song that will sell!