Cloud/Ephemeral Everything eh?

I understand why people want to move back to the shared (cloud) model, but what actually needs to be ephemeral should be decided based on requirements. Elasticsearch in a Docker container scheduled by a compute scheduler? Isn’t that a bad idea for something as unstable as Elasticsearch? MemSQL in a Docker container? The point of MemSQL is fast in-memory data access, and I think there is a legitimate case for it, so why not give MemSQL dedicated servers? The more important question to answer is: should important storage nodes have to compete for compute power against CRUD nodes? Most likely your answer is no, so my suggestion is a physical dedicated server, but alas, we have the cloud craze blazing ahead of common sense.

Let’s get right into it.

Persistent Storage Nodes

Databases, file storage, and other persistent systems should not be ephemeral by any means. These are hard storage and require a fixed location unless you can figure out how to move data fast enough to avoid catastrophic failure. Replication only mitigates the issue (does it even?); it is not designed to be a magic bullet that somehow guarantees some crazy 99.9999999999% availability.

Should storage query layer clients be ephemeral? That depends on what the clients are used for.
– For reading? Most likely not a problem, as the initiating user can try again. It still costs bandwidth, throughput, and latency, but a read is easier to resume!
– For writing, on the other hand, imagine transferring 100 MB to be stored and the storage query layer client dies (for whatever reason). Even assuming the storage engine cleans up the mess left behind (with no fragmentation), what do you communicate to the initiating client (human or machine)? That’s right, nothing useful! The initiating client will likely try again and get connected (in the backend, via a proxy) to another storage query client. Why is this fine with people? Availability? Distributed? “The data store suffered a failure, please try your 100 MB upload again.” That is wasted bandwidth, throughput, and latency!

An ephemeral node can go down at any time for no reason whatsoever. Is it worth the latency and decreased throughput to put storage write clients on ephemeral nodes? I would argue against it, given how randomly nodes in the cluster can die. Write clients should be in a fixed location behind a load balancer, where the same loss/retry cycle can actually be tolerated.

I would like to know which distributed system architectures, if any, account for the death of a node while a client is in the middle of transferring data to it. Failure is usually handled with an error message and then a retry of idempotent operations. Idempotency is nice, but not for large files.
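To make the resume idea concrete, here is a minimal sketch of what I mean by a chunked, resumable write path. The `ChunkStore` interface is hypothetical (just the shape a storage API would need to expose), but the point stands: if the store can report which chunks it already holds, a retry after a client death resends only the missing pieces instead of the whole 100 MB.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.Set;

/** Hypothetical storage-side API, not a real library: just what a resumable write needs. */
interface ChunkStore {
    Set<Integer> committedChunks(String uploadId);                  // chunks already durably stored
    void putChunk(String uploadId, int index, byte[] data) throws IOException;
    void complete(String uploadId, int totalChunks) throws IOException;
}

public class ResumableUpload {
    static final int CHUNK_SIZE = 4 * 1024 * 1024;                  // 4 MB chunks

    /** Safe to re-run: on retry, only the chunks the store does not already have are resent. */
    static void upload(ChunkStore store, String uploadId, String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            int totalChunks = (int) ((file.length() + CHUNK_SIZE - 1) / CHUNK_SIZE);
            Set<Integer> done = store.committedChunks(uploadId);    // survives the death of any one write client
            byte[] buf = new byte[CHUNK_SIZE];
            for (int i = 0; i < totalChunks; i++) {
                if (done.contains(i)) continue;                     // skip what a previous (dead) client already sent
                file.seek((long) i * CHUNK_SIZE);
                int read = file.read(buf);
                store.putChunk(uploadId, i, Arrays.copyOf(buf, read));
            }
            store.complete(uploadId, totalChunks);
        }
    }
}
```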

Unless you have a metric shit ton of nodes, enough replication, and an in-memory store/file system big enough to tolerate large and random node failures, to the point where it would not matter whether the nodes were cloud or fixed, then sure, cloud works. But I haven’t seen a deployment that large anywhere. At that point, ephemeral simply means unstable nodes and fixed means stable, long-term nodes.

Math/Compute-heavy nodes

So how about math/compute-heavy nodes? Again, it depends on the data structures and algorithms used. Naive algorithms will always restart from scratch in the face of failure, and my guess is that most people use naive algorithms for compute-heavy operations. Another factor to consider is how loaded the nodes are going to be, not only by your application but also by the others scheduled onto the same physical node. Do these nodes need a fixed location? Fixed if you use naive algorithms, dynamic if you use algorithms that break the problem up to distribute it.

Outward Facing Data Nodes

CRUD (REST, SOAP, etc.), UI, cache (memcache), and anything that serves data independently of other nodes and does not store data locally can usually live on an ephemeral node. HTTP clients usually retry on their own, so users/machines can deal with a delay in the case of failure. Writes are often treated as idempotent, so if a write fails the user can retry, but this is not always the case (HTTP PUT and DELETE are idempotent by definition; POST is not).
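For the non-idempotent cases (a POST that creates something, say), a common pattern is a client-generated idempotency key so a blind retry cannot create duplicates. A rough sketch with Java’s built-in HTTP client, where the endpoint is hypothetical and the server is assumed to deduplicate on the `Idempotency-Key` header:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

public class IdempotentRetry {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String idempotencyKey = UUID.randomUUID().toString();       // generated once per logical write
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://api.example.com/orders")) // hypothetical endpoint
                .header("Idempotency-Key", idempotencyKey)           // server must deduplicate on this key
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"item\":\"book\"}"))
                .build();

        // Retrying with the same key means the server applies the write at most once.
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() < 500) {
                    System.out.println("Done after attempt " + attempt + ": " + response.statusCode());
                    return;
                }
            } catch (java.io.IOException e) {
                System.out.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
    }
}
```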


Shallow Critique of Mozilla Talk and Online Commenting

A topic I have cared about enough to think about pretty intensely is the commenting landscape on websites. Sadly, I have not executed on my own commenting system yet, even though I have ideas and partial UI code. Anyway, Mozilla’s Talk (a collaboration under The Coral Project) is interesting, but not in the way I was hoping for.

Talk is open source and installable by anyone on their own servers at their leisure. There are tech docs (very little, but they look to the point). Anyway, since I have not installed it to try it, I’ll go off what I can see.

Let’s take a look at the UI for the admin/moderators

Talk Moderation

Very simple looking and to the point. Keyboard shortcuts, I guess, were requested by moderators? Fat-fingering is a common problem, so I assume decisions are easily reversible (an undo shortcut?). In terms of sorting, newest first is questionable. Some comment streams are high velocity, so do I really want to look at newest first from the get-go? Looking at the comment stream, I see links are highlighted, and I assume it will expand rich URL (HTML) links as well. Not every link is a bad place, so that red info button looks very ominous and will probably be ignored in the long run, as a UX matter. UX is hard to get right since each community has its own way of interacting. Let me give my thoughts on moderation.

In the end, even for the newsroom industry, we want to remove heavy moderation since it is a bottleneck, so I fail to see how adding shortcuts helps with one of the main complaints/conversations around moderation (light versus heavy moderation). One of the “tenets” of Talk is getting moderators to focus on the positive comments, and again I fail to see why I would want that. My view is that any website that wants a comments section only wants humans to make decisions where a machine cannot, rather than focusing on the positive or negative end of the spectrum.

Since the software is self-hosted, I wonder how the project will expand to include machine learning techniques to harvest data and build models. Is there a way to collect that data, have someone do the analysis to fit a model, and hook that model right into a moderation decision engine? You may want traffic/comment shadowing, where both models run and you can compare moderation performance to decide whether to keep the new model or make adjustments. Are those models shareable? Do publishers really care that much about owning their data that they are unwilling to share moderation data or commenting data (comments are public, by the way)?
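Comment shadowing does not need much machinery. Here is a sketch of what I mean, with a hypothetical `ModerationModel` interface (nothing Talk actually exposes): the live model’s decision is the only one users see, while the candidate model runs silently and disagreements are recorded for later analysis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical moderation model interface; purely illustrative. */
interface ModerationModel {
    boolean approve(String comment);
}

/** Routes decisions to the live model while recording where a candidate (shadow) model disagrees. */
public class ShadowingEngine {
    private final ModerationModel live;
    private final ModerationModel shadow;
    private final List<String> disagreements = new ArrayList<>();

    public ShadowingEngine(ModerationModel live, ModerationModel shadow) {
        this.live = live;
        this.shadow = shadow;
    }

    public boolean moderate(String comment) {
        boolean liveDecision = live.approve(comment);      // only this decision affects users
        boolean shadowDecision = shadow.approve(comment);  // evaluated silently for comparison
        if (liveDecision != shadowDecision) {
            disagreements.add(comment);                    // feed these to whoever tunes the candidate model
        }
        return liveDecision;
    }

    public List<String> disagreements() {
        return disagreements;
    }
}
```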

Alright, let’s look at what a community member might see below an article with Mozilla Talk.

Talk Comments

So, of course, I look at this UI and the first question that pops up is: how is this different from Disqus or Discourse? I wonder if I have to create an account on each site that embeds its own instance of Talk. Anonymity on news sites? Please, not going to happen. That respect button looks like a +1 sort of thing, so does it give me reputation in that particular community? I’m far more interested in moderation automation and UX than in the UI itself, but I have to indulge myself.

I have to wonder if self-hosted commenting systems are scalable at all outside of small installations. The whole point of commenting systems, to me, is to increase engagement so that users come back and form a sense of community. Once you get to a certain size, you are going to have separate communities/categories of people. Who is reputable with respect to particular categories? Should users be marked as reputable, or should other users figure that out by reading their comments? Commenting systems are essentially forums, so why separate the concepts? I thought Discourse made a good step with forums as embedded comments… There are missing pieces here that Mozilla’s Talk needs to address in order to be taken seriously, because right now it looks like more of the same to me.

I'm pretty tired of writing about this topic today. I do want to bring my own commenting system back from the dead, though, as I care enough about it to have notes upon notes on it and on the target customers.


Online Dating

(First off, I am not proposing a solution, just what I think about the current solutions so far and what I think the ideal solution is. In another post, I can think about how to define a solution with a better user experience.)

I’ve been thinking about the whole idea of online dating and how it should work. Personally, it requires too much effort, and I think the point of technology is to make the things in my life effortless. No one uses technology to make their life harder, right?

Let’s set some context here. Take Instagram, Facebook, Google, and the mobile phone: they make it easier to accomplish a task you set out for yourself. Instagram, in particular, makes it effortless to upload photos and to make the average photo look “better”. If you made the connection with the title, this is not to say that existing online dating apps don’t make life easier; in fact, they do!

Popular online dating services such as OkCupid and Tinder make it easy to define your “profile”, put yourself out there, and see who else is looking to date. These are very low-hanging fruit, and I get it. If you are willing to take generally attractive pictures of yourself (or have friends or photographers who can), craft a good profile, and stick around long enough to find a way to connect with whoever you desire to date, then yeah, it works. Does this scale to the average person, though? Can we do this without superficial images and a “definition” of yourself designed to make you look good?

Is online dating just an extension of dating services? There is no reason for it to be more than that. I think many people arrive at the same ideal scenario: sign up, look for a person, talk to that person, go on a date, and then live happily ever after.

There are some apps that talk about location-based services, but is a bar really where you will find people using their phones for location apps? I guess push notifications about mutually interested people nearby could make sense. Sounds like a battery drain, in my opinion.

What problems does online dating have today?

  • The elephant in the room is that you craft a profile and you are one of many fish in the pond, so only by chance will you find someone to even speak with, but that’s part of the game, right?
    • 6–12 months on a service, and how do you feel about it so far?
  • So, you sign up for a service, define your profile, and send a couple of messages. Online dating services make me define myself and then expect me to put in more effort trying to find someone to even SPEAK with. Again, part of the game, right?
    • Current services emphasize filtering by profile similarity, which I do not think meets the bar for matchmaking, since everyone has their own internal scoring system.
  • Let’s say you get connected to another person; what is the first thing you want to talk or ask about? You practically know everything about the other person already, since you looked at their profile once or twice. Realistically, this is the hardest part to get right, since everyone wants to be different.
    • There are tutorials on how to craft your first message, huh? Why is that? High school is over, right? Shouldn’t I just be myself and ask questions? “How was your day?” from an attractive person lands loads differently than the same question from an average person.
    • In general, I think people who do not respond when the other person talks should be penalized in some kind of scoring system.
  • Paying for additional features, I mean, come on… paying to know when your messages are read. RLY? Paying for anonymity because you want to STALK? Let’s be more creative than that.
    • Storage space is a limited resource, so why not charge for that?
    • It takes two to tango, so why not add a matchmaker/curator for an extra charge who will compare profiles, talk to the paying members, and make recommendations? Why offer recommendations for free anyway?
    • Membership is a great way to charge people for things, but again, gating what I think are basic features behind the membership is sleazy.
  • How do you vet people? Just date them and see where it goes, right?
    • What happened to video? Why can’t two people just video chat as a light-hearted attempt at feeling each other out? Texting and talking are very different, and I think the ideal progression is from chatting through a keyboard, to video, to going out.
  • Interracial relationships appear to be on the rise, but online dating services are lagging. Why is that, though?
    • There is a prominent study by OkCupid on message response rates across racial boundaries, so clearly bias exists.
  • People write one-line messages as a first attempt at contact.
    • I would argue this is due to the number of people you can see on a service: the more you can see, the more spam-like behavior you are likely to see. There are many reasons, but I focus on what people on dating services perceive as unwanted (spam).

What is an ideal scenario?

The simple ideal scenario is that you sign up for the service, find someone you find attractive, go on a date, and then live happily ever after.

What qualities of the ideal scenario are realistic?

  • Signing up should be quick and easy.
  • Defining yourself needs to be quick. A few physical attributes, beliefs and lifestyle choices are essential.
    • Age, height, whether you smoke or drink, etc.
  • What are you looking for? Searching and filtering people is the biggest pain. The ideal might be to just not let people browse at all and send them notifications about new people in the area who match their filter.
    • Letting people browse the pool is not ideal; why waste time looking? Let technology take care of that.
    • Filtering in a way that corresponds to attractive qualities such as mindset and lifestyle choices.
      • No age range choices. It is not worth it.
      • Do you want a smoker, drinker, druggie or etc.?
      • Let us recommend through your Facebook friends of friends
      • Let us reach out to past good/neutral connections?
    • Proximity is important, but people cannot be allowed to specify it directly. Should it even be asked the way it is? (Do you prefer people near or far?) Maybe reframe it: do you move around a lot, or is your typical day just going to work and back? That way you can figure out whether proximity matters to the person.
    • Targeted recommendations, I argue, should only be for paying members via matchmakers. Having a third party interact with two people is the best way forward, as machines cannot play matchmaker.
      • Would be interesting if people talked to bots and if the bots could recommend some people right off the bat to talk to.
    • Looks are not the emphasis here because we are online. Why do we treat online as if you are standing next to that person? We seriously need to get past that and focus on the things you can display online, such as personality.
      • For instance, Disqus changed its system to allow anonymous comments in a Yik Yak/Secret (the apps) way, because identity (including pictures and social links) doesn’t scale for comments, as anyone who has read comments online can tell you. Who wants to sign in to leave a comment under their name anyway?
  • I want to talk to potential mates. Connecting people is just another filter. Once we filter out the noise, how do we get these people to just talk, without bias involved?
    • I would argue that if you invited more than two people to a blind dinner at a restaurant, there would be a conversation covering topics of many sorts. Some will like each other and others will not, but mutual respect comes from just talking it out. I cannot say this can be replicated online, but there has to be a good way to get people to see each other outside of society’s lens.
    • Chat room with multiple people (mutual friends who can introduce to people?) in it may be a way to get a conversation started.
    • Video chat is a great starter since it is not the first date. It has quite a few benefits:
      • Identity verification. It is easy to fake images and harder to fake a live stream.
      • You get to see if this person is actually someone you want to hang around and have a conversation.
      • It is noncommittal. No one is pressured to stay in the chat if it is boring. Just press a button, get out and never speak again.
      • You get to see the face.
  • Happily ever after
    • Give feedback on the experience.

So uhh… Google

Looks like an interesting company from the outside, and I’m sure it is on the inside too. I like Google Maps, Gmail, Search, Android, and others. This software collects data and acts on it, which is fine with me as long as it makes the software useful. Is it a problem that I like so many products created by one company? Is it concerning to me at all?

Not much, as long as decisions are made with good intent and for good reasons. From time to time, mistakes will be made, and those usually result in public outcry and/or fines (regulations are good tools). Google isn’t a perfect company, and I get that part.

Google has had some external and internal issues aired lately, such as an engineer who put out a doc that could have used an editor (https://assets.documentcloud.org/documents/3914586/Googles-Ideological-Echo-Chamber.pdf) and got fired in the end, and Google pushing out a person from a think tank who was critical of Google’s policies. Those are just a few I can remember right now. Do these form a pattern of removing critical people, and toward what goal?

Antitrust issues too because of someone being critical/being cautious? (https://vivaldi.com/blog/google-return-to-not-being-evil/) We gotta do what we gotta do to stay competitive, but what is the reason for this? Why bother with critics?

I guess we will see in the upcoming months what direction Google is taking if the stories pick up or not.


Interesting Things in Software (9/17/2016)

Left-Right Algorithm for Concurrent Reading and Writing

Here’s an interesting, practical, and more memory-efficient alternative to the Read-Copy-Update (RCU) algorithm. These algorithms assume far more reads than writes. Instead of copying the data structure on every write (essentially creating a new version), you maintain two copies of the data structure and keep both sides eventually consistent. Writers modify the opposite side (say, left) of whatever the current readers are reading (right), then point new readers at the modified side (left) and wait for the old readers to finish before applying the same modification to the other side (right). When another write comes in, the same process repeats, starting with the other side.

It is a simple concurrent algorithm, too, and there is no need for heavy synchronization mechanisms: a counter for each side (an array of counters is better, because contention is bad) and an index telling readers which side to use. It is pretty novel to me.
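Here is a rough Java sketch of the protocol as described in the Concurrency Freaks post linked below. A plain pair of atomic counters stands in for the read indicators (real implementations use less contended indicators, e.g. per-thread counters), so treat it as an illustration rather than a production implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;
import java.util.function.Function;

/** Two replicas of the protected structure: reads never block, writes go to both replicas in turn. */
public class LeftRight<T> {
    private final T leftInstance;
    private final T rightInstance;
    private volatile int leftRight = 0;                  // which replica readers should use (0 = left, 1 = right)
    private volatile int versionIndex = 0;               // which reader counter new arrivals use
    private final AtomicInteger[] readIndicator =
            { new AtomicInteger(0), new AtomicInteger(0) };
    private final ReentrantLock writersLock = new ReentrantLock();

    public LeftRight(T left, T right) {
        this.leftInstance = left;
        this.rightInstance = right;
    }

    private T instance(int which) {
        return which == 0 ? leftInstance : rightInstance;
    }

    /** Readers announce themselves on the current version, read one replica, then depart. */
    public <R> R read(Function<T, R> readOperation) {
        int vi = versionIndex;
        readIndicator[vi].incrementAndGet();             // arrive
        try {
            return readOperation.apply(instance(leftRight));
        } finally {
            readIndicator[vi].decrementAndGet();         // depart from the same version we arrived on
        }
    }

    /** Writers apply the mutation to both replicas, draining old readers in between. */
    public void write(Consumer<T> writeOperation) {
        writersLock.lock();
        try {
            int lr = leftRight;
            writeOperation.accept(instance(1 - lr));     // mutate the replica no new reader is using
            leftRight = 1 - lr;                          // steer new readers to the freshly written replica

            int prevVi = versionIndex;
            int nextVi = 1 - prevVi;
            while (readIndicator[nextVi].get() != 0) {   // wait for stragglers from an older toggle
                Thread.onSpinWait();
            }
            versionIndex = nextVi;                       // new readers now arrive on the other counter
            while (readIndicator[prevVi].get() != 0) {   // wait for readers that arrived before the toggle
                Thread.onSpinWait();
            }

            writeOperation.accept(instance(lr));         // replay the write so both replicas converge
        } finally {
            writersLock.unlock();
        }
    }
}
```

Readers pay one atomic increment and decrement and never wait for writers; writers pay the cost of applying each mutation twice, which is the trade-off the algorithm makes.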

Links

  1. Video: https://www.youtube.com/watch?v=FtaD0maxwec&index=73&list=PLHTh1InhhwT75gykhs7pqcR_uSiG601oh
  2. CF: http://concurrencyfreaks.blogspot.com/2013/12/left-right-concurrency-control.html

 


Nvidia Tesla for Time Series Data?

I wonder if it is worth the effort to use an Nvidia Tesla for automatic anomaly detection in time series data. Obviously, you do not want detection for all time series data, because it has to be meaningful, but if you had a way to combine multiple anomalies with a rule-based system, it should be effective for large amounts of data.

Look at this beast: http://www.anandtech.com/show/10675/nvidia-announces-tesla-p40-tesla-p4

My idea is to throw this beast into 1-10 machines and have data fed to them from a TSDB in real time (as points are committed to the TSDB), so rules can be evaluated ASAP. TSDBs nowadays focus on you querying them for data instead of offering pub/sub. That is not to say querying isn’t useful, since it gives you history, but threshold-based checks can be super quick. Pub/sub is for streaming, and I think that is the way to go for building real-time services on top of a TSDB.
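A sketch of what the streaming side could look like, assuming a hypothetical pub/sub callback that hands you each point as it is committed. The rules here are plain thresholds, so each point costs one pass over the rule list and no query round trip back to the TSDB:

```java
import java.util.ArrayList;
import java.util.List;

/** Threshold rules evaluated as points arrive from a (hypothetical) TSDB pub/sub feed. */
public class StreamingRuleEngine {

    /** A single threshold rule on one series. */
    static class ThresholdRule {
        final String series;
        final double limit;

        ThresholdRule(String series, double limit) {
            this.series = series;
            this.limit = limit;
        }

        boolean breached(String series, double value) {
            return this.series.equals(series) && value > limit;
        }
    }

    private final List<ThresholdRule> rules = new ArrayList<>();

    public void addRule(String series, double limit) {
        rules.add(new ThresholdRule(series, limit));
    }

    /** Hook this up as the subscription callback: one pass over the rules per committed point. */
    public void onPoint(String series, long timestampMillis, double value) {
        for (ThresholdRule rule : rules) {
            if (rule.breached(series, value)) {
                System.out.println("ALERT " + series + " = " + value + " at " + timestampMillis);
            }
        }
    }

    public static void main(String[] args) {
        StreamingRuleEngine engine = new StreamingRuleEngine();
        engine.addRule("cpu.load", 0.9);
        // Simulated delivery; in practice the TSDB's pub/sub layer would invoke onPoint.
        engine.onPoint("cpu.load", System.currentTimeMillis(), 0.95);
    }
}
```

The GPU would come in for heavier models layered on top of these cheap checks; the thresholds themselves do not need a Tesla.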


Thinking about moving off of WordPress

I'm just thinking about moving away from PHP in general and over to NodeJS. I can definitely make my own blog in NodeJS nowadays without any worries about security updates and the like. I could blog from anywhere, too, since my HTML/CSS chops have not withered away. I will definitely make use of Aurelia, because it is a really good framework.


Got Internet at My New Place

It has been quite a while since I last posted here. In my last post, I complained about Citibank’s initial setup; that was in Nov 2016, and it is almost Sept 2017!

My biggest project has been released at work, but since it is a user-driven product rather than a “one person with all the knowledge” thing, it is hard to get loyal users. My brain is pretty tired, though, since this project was written in Java by only me (with thanks to open source library authors too!). The project exposes a UI application and a backend server that does the heavy lifting. It is powerful, but the UI may be too nascent (does it expose too much? is the layout wrong?). I’ve been fixing bugs and trying to find better ways to combine primitives into user-readable options, all while trying to balance other things… Ya, ya, my work-life balance needs work.

Gonna travel a little soon enough.

I should get back into C++ and restart my “Dabbling in C++” series to learn more modern C++ and things I wanted to understand, but never did. Maybe I should do a Java series.

Anyway, I never understand why people jump to other languages that still have nascent support. I stick with Java because I trade away the memory concern (bloat) for all the support it has gotten over the years. I accept that the JVM’s types are bloated, and depending on what you do, it doesn’t matter. I sincerely hope the JVM maintainers get around to allowing more compact types that can be fully stack allocated.

JavaScript has been progressing quite a bit, and I like the direction the language has taken since ES6. Async/await is the most godly feature there: an async function returns a promise, and await waits on a promise, which means no more promise chaining, callback hell, and so on. Callback hell is what turned me away from JavaScript in the first place, but I’m glad I rediscovered JS.

Time comes and goes, and how much time I have to do brain dumps depends on my willingness and mood.


Opening an Account at Citibank is so slow and dumb.

Two weeks ago from today, November 15, I opened an Access account at Citibank through their online application. I had never seen a bank account that is online-only, so it was fitting that I signed up online to get it.

Sure, the approval process was quick, but 1. why did I receive an email saying that I needed to place an initial deposit (AS IF IT WERE BLOCKING THE APPROVAL), and 2. where the hell is the account? Furthermore, why the hell am I getting emails to place an initial deposit after I already authorized one during the application process? Are the people there just not aware that I did? The deposit is not associated with my account yet, and at the time of approval I did not get an account number. They say I can GO TO A CITIBANK BRANCH to freakin’ deposit into it… so much for the “do your banking online-only” crap.

I just got the account number through the mail last week, but no routing number. Don’t fret, routing numbers are public and easy to look up. Since I got both mail and email about placing an initial deposit, I decided to just push one manually, which turned out to be successful. Then Nov. 14 comes along, I look at my previous bank, and I see the initial deposit that I originally authorized going through, so now I have TWO “initial” deposits… wtf.

Throughout this process, I have live-chatted and called, and no one said anything specific about the process. Citibank has failed to communicate their entire process, and their (new account) customer service needs to get better at asking the right questions, such as whether I opened the account online and whether I had already authorized an initial deposit that would be made automatically.

This is my feedback, Citibank. Fix your shit, stop confusing me, and AUTO-ASSOCIATE MY BANK ACCOUNT WITH THE GODDAMNED ONLINE ACCOUNT. Why in the world do I have to wait for my debit card in order to link my account?

What the hell is this, 2005? Archaic, af.


Interesting Things in Software (9/10/2016)

Decided to start an “Interesting Things in Software” series to log things that I found interesting to read about.

Sonification of Algorithms

It is all about assigning a sound effect (essentially) to your algorithms. For a sorting algorithm, you could assign a sound effect to every data access or write. The sound effect’s tone/pitch can be modified depending on the data values (see the YouTube link below).

This is an interesting topic and idea because it is something that is weird to think about. The first question is: what the heck is this, and why would I want to do it? Patterns are why. Every algorithm has a sound signature. Read the Sonification of Monitoring link below about the idea. My takeaway from the post is that you can figure out what a system is doing by its signature, and if the system makes sounds other than what is expected, there may be a problem. Applying this idea at scale (w.r.t. complexity and size) makes more sense than applying it to smaller algorithms.

Look at Example S18.3: Sonnet in the Sonification of Data link below. Off-by-one error detection by sound.

Imagine being able to detect a bug in your algorithm because the sounds it was making were not harmonized. A very odd way to think about finding issues, in my opinion.
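As a toy example of the idea, here is a bubble sort wired to Java’s built-in `javax.sound.midi` synthesizer, where every swap is played as a note whose pitch tracks the value being written. The note mapping and timing are arbitrary choices; the point is only that the algorithm’s access pattern becomes audible.

```java
import javax.sound.midi.MidiChannel;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Synthesizer;

/** Toy sonification of bubble sort: each swap becomes a MIDI note whose pitch tracks the value. */
public class SortSonification {
    public static void main(String[] args) throws Exception {
        Synthesizer synth = MidiSystem.getSynthesizer();
        synth.open();
        MidiChannel channel = synth.getChannels()[0];

        int[] data = { 9, 3, 7, 1, 8, 2, 6, 4, 5 };
        for (int i = 0; i < data.length - 1; i++) {
            for (int j = 0; j < data.length - 1 - i; j++) {
                if (data[j] > data[j + 1]) {
                    int tmp = data[j];
                    data[j] = data[j + 1];
                    data[j + 1] = tmp;
                    play(channel, data[j]);        // each write is sonified
                    play(channel, data[j + 1]);
                }
            }
        }
        synth.close();
    }

    /** Map a small value onto a note around middle C and play it briefly. */
    private static void play(MidiChannel channel, int value) throws InterruptedException {
        int note = 60 + value * 3;                 // 60 = middle C; spread values over ~2 octaves
        channel.noteOn(note, 80);
        Thread.sleep(60);
        channel.noteOff(note);
    }
}
```

A nearly sorted input produces a short ascending run of notes, while a reversed input produces a long, busy stream, which is exactly the kind of signature the monitoring post is getting at.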

Links

Sorting Sounds: https://www.youtube.com/watch?v=kPRA0W1kECg

Sonification of Monitoring: http://muratbuffalo.blogspot.com/2016/09/sonification-for-monitoring-and.html

Sonification of Data: http://sonification.de/handbook/index.php/chapters/chapter18/

The Sound of Voldemort: http://charap.co/sound-of-voldemort/

(Extra) Sorting Algorithm Sounds Playlist: https://www.youtube.com/playlist?list=PLZh3kxyHrVp_AcOanN_jpuQbcMVdXbqei&src_vid=kPRA0W1kECg

Dynamic Time Warping (DTW)

I’ve been reading about how to assign a similarity score between two time series that is transformation invariant, and it has led me to DTW as well as other challengers to DTW. The idea of DTW is to compare every point in a time series (TS1) against a reference time series (TS2) with a distance function (Euclidean distance is one) and then find the path that optimizes a cost function (minimum distance/maximum similarity).

I like this idea, but I think you can see the inherent inefficiency… do we really have to compare every value in TS1 to every value in TS2? Couldn’t we restrict the comparison to a window of values instead? Sure, this has been explored; windowing constraints and related ideas like subsequence DTW come from that line of thinking. Other researchers took it further and explored and implemented multilevel (resolution-based) DTW to bring the runtime down to linear time. They’ve even released an implementation so anyone can reproduce the results.
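For reference, the quadratic baseline that the multilevel approach approximates is a plain O(n·m) dynamic program; here is a minimal version with absolute difference as the point distance (any distance function could be swapped in):

```java
/** Plain O(n*m) dynamic-programming DTW with absolute difference as the point distance. */
public class DynamicTimeWarping {

    public static double distance(double[] ts1, double[] ts2) {
        int n = ts1.length;
        int m = ts2.length;
        double[][] cost = new double[n + 1][m + 1];

        // Infinite borders force the warping path to start at (1, 1).
        for (int i = 0; i <= n; i++) {
            java.util.Arrays.fill(cost[i], Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(ts1[i - 1] - ts2[j - 1]);        // local distance between points
                double best = Math.min(cost[i - 1][j - 1],           // match both points
                              Math.min(cost[i - 1][j],               // advance in ts1 only
                                       cost[i][j - 1]));             // advance in ts2 only
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];   // total cost of the minimum-cost warping path
    }

    public static void main(String[] args) {
        double[] a = { 0, 1, 2, 3, 2, 1, 0 };
        double[] b = { 0, 0, 1, 2, 3, 2, 1, 0 };    // same shape, shifted/stretched in time
        System.out.println("DTW distance = " + distance(a, b));      // small despite the time shift
    }
}
```

A Sakoe-Chiba style window would simply skip the inner-loop cells where |i - j| exceeds the window size, which is the cheap version of the constraint idea mentioned above.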

Links

DTW: https://en.wikipedia.org/wiki/Dynamic_time_warping

FastDTW: http://cs.fit.edu/~pkc/papers/tdm04.pdf