Both languages are equipped with a wide range of packages: check the CRAM

6 min readDec 5, 2020

How to deal with possible duplicate entries in your data set? How to work around the problem of duplicates using SQL? Let me guide you through some of the techniques I have been using as a Data Analyst to identify, avoid and remove duplicates from any type of data set.

Wait! What is data science? Don’t worry! Wikipedia is your friend! Briefly, we can define data science as an interdisciplinary field that extracts insights from data by using algorithms and tools. You can imagine data science as a blend of mathematics, statistics, and computer science. Furthermore, data science is applied also in several areas: wherever the data is present!
Altering how you perceive time is a powerful concept. When time looks and feels different, you look and feel different. People spend so much of their lives chasing experiences and trying to change state.

Flow states are like teleporting from consciousness to a dreamland in your mind where effort looks easy. Without flow, you’re inefficient. Work without flow feels like a hike up the steepest hill you’ve ever seen in comparison.

You’ve been burnt-out and stressed at work because of too many projects. Your manager stops by your office and asks if you can take on a new account. Because you’re afraid to lose your status as “The guy who gets stuff done,” you say yes and your stress only gets worse.

Regardless of how duplicates were created, the first step is to identify them within your data table. The most difficult part here is not technical: it is about acknowledging their existence. Too often one may be tempted to consider that a data table (especially within a company) does not contain any duplicate entry by default. While it is, fortunately, the case for the majority of them, sometimes duplicates were created unknowingly over time.

Now imagine swallowing that magic mushroom in Nintendo’s Super Mario game and leveling up. Imagine getting an upgrade to every part of your character in a computer game. That’s what flow is: an upgrade.

Based on the example introduced previously, the following query allows to identify possible customers that are present more than once in the data table customers. In this example, I want to ensure that all the customers registered are unique. To do so, I use customers’ email address and date of birth, as they are supposed to be unique to each customer.
See, people who are insecure consistently feel bad about themselves. And often, they don’t know how to feel better in a healthy or productive way. So they often resort to criticizing others.

Yet, with nary a whisper, a friend and co-host… gone. The podcast never made it to air. Right around Memorial Day. Probably for the best. And, if that’s a bizarre entr’acte for an existential essay… well, welcome. Hi. More to follow.

Each time you say yes to someone else at the expense of yourself, you’re telling your mind that what you want isn’t that important. If this becomes a habit, it shouldn’t be surprising when your mind doesn’t value itself!
We tinkered. Tested microphones. I bought domain names and played with digital audio workstations. We recorded episodes for the “can” — and I edited them. We laughed, pondered, wondered. We had fired up the turntable. By May, we’d prepped to drop the needle on the vinyl as the motherfucker spun.

The most used languages are R and Python. It is important to point out that they are not mutually exclusive. R is more suitable for statistical analysis while Python is more appropriate for machine learning. This doesn’t mean that you cannot do machine learning with R or statistical analysis with Python. Personally, I mostly work with Python but it sometimes happens that we deploy packages written in R. It depends on the industry and your (future) company production environment’s standard. My recommendation is to learn both of them!
Benjamin Hardy, wrote this headline in one of his recent articles. He, too, has witnessed the power of flow states. Many people understand the notion of flow, but they don’t utilize flow because it’s not simplistically explained for them to implement it.
Well, that’s the thing: in the long-run, it doesn’t. Being overly critical of other people will end up making you feel guilty and worse about yourself in the long run, only adding to your insecurity.What if the change of state you need so desperately is not a holiday to an island in the Caribbean, but a flow state delivered to you at home? You don’t have to wait for flow states to find you, magically. You can create flow.Let’s set the scene. You just conducted a brilliant analysis to evaluate how many clients you or your company acquired over the past year. Your conclusion seems to confirm your intuition: the number of clients has tripled over the past 12 months! You are so proud of this result — although somewhat surprised — that you want to double-check it before presenting it to your boss. Digging into the data, you suddenly realize that some customers were counted twice… which clearly questions your previous finding.

How to become a data scientist? In this article, I provide a hypothetical learning path to become a data scientist. This guide mainly consists of online free resources and is made for people who want to learn data science. If you are already a data professional, then you can ignore the first sections and jump to the sections containing resources to remain updated. As with every list, this one is far from being complete. If you have good suggestions, please leave a comment!

After all, to navigate life successfully we have to be able to discriminate and analyze the people, problems, and situations in our lives so that we can make good decisions. For example: A good way to end up in an unhappy marriage is to not think critically about the person you’re about to marry.

When you habitually ask for reassurance, you’re really telling yourself you can’t handle things on your own. Tell yourself that often enough and you’re going to feel like you can’t handle anything.

Your mother-in-law asks you if she can drop by and hang out with the kids. You’re having a rough day and really don’t need the added stress of hosting her. But because you’re afraid she’ll think badly of you, you say yes anyway.

A simple SQL query allows you to retrieve all duplicates present in a data table. Looking at some particular examples of duplicate rows is a good way to get started. Five to ten concrete examples will give you a first idea of what is going on in your data, in order to better understand what could have led to the creation of duplicates.

Then, a Monday went by without a meeting. A lonely text exchange, “Hey! I’m on the Zoom!” Nothing. It just slipped, as things do. The following Monday, it slipped again. That was it. We chit-chatted. I didn’t push too hard, nor did she. The embers left a trace of blanks and cloud of smoke behind. I thought it was me. Then, I assumed she had something going on. If she wanted me to know, she’d let me know.