Thursday, 30 June 2022

New top story on Hacker News: Show HN: Ploomber Cloud (YC W22) – run notebooks at scale without infrastructure

Show HN: Ploomber Cloud (YC W22) – run notebooks at scale without infrastructure
23 by idomi | 3 comments on Hacker News.
Hi, we’re Ido & Eduardo, the founders of Ploomber. We’re launching Ploomber Cloud today, a service that allows data scientists to scale their work from their laptops to the cloud. Our open-source users ( https://ift.tt/edDW6J2 ) usually start their work on their laptops; however, their local environment often falls short, and they need more resources. Typical cases include running out of memory or optimizing models to squeeze out the best performance. Ploomber Cloud eases this transition by allowing users to quickly move their existing projects into the cloud without extra configuration. Furthermore, users can request custom resources for specific tasks (vCPUs, GPUs, RAM).

Both of us experienced this challenge firsthand. Analysis usually starts in a local notebook or script, and whenever we wanted to run our code on larger infrastructure, we had to refactor the code (i.e., rewrite our notebooks using Kubeflow’s SDK) and add a bunch of cloud configuration. Ploomber Cloud is a lot simpler: if your notebook or script runs locally, you can run it in the cloud with no code changes and no extra configuration. Furthermore, you can go back and forth between your local/interactive environment and the cloud.

We built Ploomber Cloud on top of AWS. Users only need to declare their dependencies via a requirements.txt file, and Ploomber Cloud will take care of building the Docker image and storing it on ECR. Part of this implementation is open-source and available at: https://ift.tt/gEonxRe Once the Docker image is ready, we spin up EC2 instances to run the user’s pipeline in a distributed fashion (for example, to run hundreds of ML experiments in parallel) and store the results in S3. Users can monitor execution through the logs and download artifacts. If the source code hasn’t changed for a given pipeline task, we use cached artifacts and skip redundant computations, severely cutting each run's cost, especially for pipelines that require GPUs.
Users can sign up for Ploomber Cloud for free and get started quickly. We made a significant effort to simplify the experience ( https://ift.tt/qYB2cNP ). There are three plans ( https://ift.tt/ETwJVmr ): the Community plan is free with limited computing; the Teams plan has a flat $50 monthly fee plus usage-based billing; and the Enterprise plan includes SLAs and custom pricing.

We’re thrilled to share Ploomber Cloud with you! So if you’re a data scientist who has experienced these endless cycles of getting a machine and going through an ops team, an ML engineer who helps data scientists scale their work, or you have any feedback, please share your thoughts! We love discussing these problems since exchanging ideas sparks exciting discussions and brings our attention to issues we haven’t considered before. You may also reach out to me at ido@ploomber.io.
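The skip-if-unchanged caching they describe can be sketched roughly like this. This is a toy illustration, not Ploomber's actual implementation: the helper names (`source_hash`, `run_task`) and the idea of keying the cache on a content hash of each task's source file are assumptions for the example.

```python
import hashlib
from pathlib import Path


def source_hash(path):
    """Content hash of a task's source file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_task(path, cache, execute):
    """Run a pipeline task only if its source changed since the last run.

    `cache` maps a source path to the hash of the version that produced
    the stored artifact; `execute` actually runs the task.
    """
    digest = source_hash(path)
    if cache.get(path) == digest:
        return "cached"  # reuse the stored artifact, skip the computation
    execute(path)
    cache[path] = digest
    return "executed"
```

Running the same task twice without editing its source would hit the cache on the second call, which is what makes re-running a large pipeline cheap when only one task changed.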

Wednesday, 29 June 2022

Tuesday, 28 June 2022

Monday, 27 June 2022

New top story on Hacker News: Ask HN: How on earth are you using your Apple computer with external displays?

Ask HN: How on earth are you using your Apple computer with external displays?
4 by n42 | 6 comments on Hacker News.
I own four different Apple computers -- a 2017 MacBook Pro, an M1 MacBook Air, an M1 MacBook Pro, and most recently a maxed-out Mac Studio. I have also had, in that timespan, three different Windows desktops that I built and a ThinkPad running Windows or Linux depending on the mood.

I have spent countless dollars on cables and adapters in an attempt to find the magic combination. I have read DisplayPort specs; I know every brand of certified cable. I now know way more than I would ever care to know about the DisplayPort and HDMI protocols. I have tried 4 different brands and models of monitor. For one of those models, I had three of the exact same model. All combinations work flawlessly with anything that is not one of the Apple devices. I have all but eliminated any of these components being the problem.

Depending on the device and the day I will get:

- Visual artifacts like snow, lines, flickering
- Failure to support native resolution on any high-resolution monitors
- Failure to support high refresh rates
- Forced scaling: detecting the monitor as a TV and using interlacing
- Most reliably of all, failure to wake from sleep without plugging/unplugging; doing a dance with power cycling my monitor or device until it finally works, or just giving up and logging into my Windows PC because today I can't use my Apple computer

It's never all at once, but it's always at least one thing. In the time I have owned any of these devices, I have, without exaggeration, not once had the expected experience of sitting down at my desk and starting my day without fighting my computer to work properly with my monitor.

Searching the internet, I can't be alone. All of the problems I have, as far as I can tell, other people experience. And as far as I can tell, no one has an answer. I'm at a breaking point after ordering this $4k desktop Mac Studio and waiting 3 months for it to arrive. I hoped that, this being a device that requires an external display, they had at least worked it out with this one.

They did not. So how does the entire professional industry working with Apple computers manage to start their day, every day, like this? Am I insane? Is no one else dealing with this? Are you all just using the built-in display? This has been going on for YEARS for me, across multiple generations of devices.

Sunday, 26 June 2022

Saturday, 25 June 2022

Friday, 24 June 2022

Thursday, 23 June 2022

New top story on Hacker News: Show HN: Data Diff – compare tables of any size across databases

Show HN: Data Diff – compare tables of any size across databases
27 by hichkaker | 0 comments on Hacker News.
Gleb, Alex, Erez, and Simon here – we are building an open-source tool for comparing data within and across databases at any scale. The repo is at https://ift.tt/85N6LVT , and our home page is https://datafold.com/ .

As a company, Datafold builds tools for data engineers to automate the most tedious and error-prone tasks falling through the cracks of the modern data stack, such as data testing and lineage. We launched two years ago with a tool for regression-testing changes to ETL code: https://ift.tt/gpWka9v . It compares the produced data before and after the code change and shows the impact on values, aggregate metrics, and downstream data applications.

While working with many customers on improving their data engineering experience, we kept hearing that they needed to diff their data across databases to validate data replication between systems. There were three main use cases for such replication:

(1) To perform analytics on transactional data in an OLAP engine (e.g. PostgreSQL > Snowflake)
(2) To migrate between transactional stores (e.g. MySQL > PostgreSQL)
(3) To leverage data in a specialized engine (e.g. PostgreSQL > ElasticSearch)

Despite multiple vendors (e.g., Fivetran, Stitch) and open-source products (Airbyte, Debezium) solving data replication, there was no tooling for validating the correctness of such replication. When we researched how teams were going about this, we found that most have been doing one of the following:

- Running manual checks: e.g., starting with COUNT(*) and then digging into the discrepancies, which often took hours to pinpoint the inconsistencies.
- Using distributed MPP engines such as Spark or Trino to download the complete datasets from both databases and then compare them in memory – an expensive process requiring complex infrastructure.

Our users wanted a tool that could:

(1) Compare datasets quickly (seconds/minutes) at a large (millions/billions of rows) scale across different databases
(2) Have minimal network IO and database workload overhead
(3) Provide straightforward output: basic stats and which rows are different
(4) Be embedded into a data orchestrator such as Airflow to run right after the replication process

So we built Data Diff as an open-source package available through pip. Data Diff can be run in a CLI or wrapped into any data orchestrator such as Airflow, Dagster, etc.

To solve for speed at scale with minimal overhead, Data Diff relies on checksumming the data in both databases and uses binary search to identify diverging records. That way, it can compare arbitrarily large datasets in logarithmic time and IO – only transferring a tiny fraction of the data over the network. For example, it can diff tables with 25M rows in ~10s and 1B+ rows in ~5m across two physically separate PostgreSQL databases while running on a typical laptop.

We've launched this tool under the MIT license so that any developer can use it, and to encourage contributions of other database connectors. We didn't want to charge engineers for such a fundamental use case. We make money by charging a license fee for advanced solutions such as column-level data lineage, CI workflow automation, and ML-powered alerts.
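The checksum-and-bisect idea can be illustrated with a small in-memory sketch. This is not Data Diff's actual implementation: it assumes each table is a list of (key, value) pairs standing in for a key range, whereas the real tool would compute segment checksums with SQL queries on each database and only transfer the checksums.

```python
import hashlib


def checksum(rows):
    """Order-sensitive checksum of a segment of (key, value) rows."""
    h = hashlib.sha256()
    for key, value in rows:
        h.update(repr((key, value)).encode())
    return h.hexdigest()


def diff_segments(a, b, min_size=2):
    """Return keys whose rows differ, bisecting only mismatched segments.

    `a` and `b` are equally sized, identically keyed lists of (key, value)
    pairs. Matching checksums prune an entire segment; only diverging
    segments are split and examined further.
    """
    if checksum(a) == checksum(b):
        return []  # identical segment: nothing to fetch
    if len(a) <= min_size:
        # segment is small enough to compare row by row
        return [ka for (ka, va), (kb, vb) in zip(a, b) if va != vb]
    mid = len(a) // 2
    return (diff_segments(a[:mid], b[:mid], min_size)
            + diff_segments(a[mid:], b[mid:], min_size))
```

A single divergent row only forces the tool down one branch of the bisection, which is where the logarithmic time and IO claim comes from.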

Wednesday, 22 June 2022

Tuesday, 21 June 2022

New top story on Hacker News: Instagram demands I send a picture of myself to prove I own my account

Instagram demands I send a picture of myself to prove I own my account
53 by jdthedisciple | 32 comments on Hacker News.
So I tried to create an Instagram account yesterday. After registering, I was immediately told my account was disabled for suspicious activity, but that if I wished they would review it within 24 hours. Weird, I thought, but maybe it's just some rare false positive and I'm just unlucky. So I waited, patiently.

After 24 hours I tried to log in again and, to my surprise, my account wasn't just temporarily disabled anymore but permanently deactivated, and I was met with this message:

> Your account has been disabled for violating our terms. Learn how you may be able to restore your account. https://ift.tt/mTc4BX1

How can I allegedly have broken Instagram's terms when I just created the account and even verified it by phone? So I visited that link and asked them to restore it. What I got was an email from Facebook demanding I send them a picture of myself holding a paper with a specific code written on it. Verbatim, the email is this:

> Hello, thank you for contacting us. Before we can help you, you must confirm that you are the owner of the account. Please respond to this email and attach a photograph of yourself, where you hold a piece of paper with the following, handwritten code on it: ***
> Please make sure that the photo fulfills the following criteria:
> - shows the above mentioned, handwritten code on a clean piece of paper, followed by your full name and username
> - shows both of your hands holding the paper as well as your complete face
> - it is well-lit and not too small, dark or blurred
> - is attached as a JPEG-file to your response E-Mail
> Note: Even if this account does not contain any pictures of yourself or it represents somebody or something else, we can only help you when we receive a picture of you which fulfills these criteria.

Am I the only one who finds this incredibly intrusive? I know I might be partially beating a dead horse here, as everyone knows Meta is pure evil. But this email really "gave me the rest".

I wouldn't use IG for posting pictures of myself anyway, but now I won't ever be using anything from Meta, even for business reasons. Are there really no less intrusive ways than the above to prove one's ownership of an account?? Why are email and phone verification not enough anymore these days? Is this the type of "progress" happening at FAANG? LOL

New top story on Hacker News: Ask HN: Having trouble getting senior applicants, wondering what to do about it

Ask HN: Having trouble getting senior applicants, wondering what to do about it
20 by throw1138 | 71 comments on Hacker News.
We're a fairly typical, run-of-the-mill, mid-size enterprise software vendor trying to hire fully-remote SWEs in the "DevOps" software space (Linux, containers, k8s, yadda yadda). We post in the usual places, including Who's Hiring, but we haven't even managed to backfill a retirement from six months ago, and we're junior-heavy already. Benefits and salary are good (though salary isn't posted in the ad), and the people are great, though the work requires a reasonably deep understanding of the underlying platforms, which a lot of people seem to dislike.

I'm wondering if the work being a higher percentage non-code is what's causing us trouble, if we're just rubbish at hiring in general, or if it's something else. What's everyone else's experience attracting applications from senior talent in this market, and what is everyone doing to increase their attractiveness?

Current hiring process:

- Resume screened by in-house recruiter
- 30m call with them
- Resume passed up to engineering
- Hour-long call with hiring manager (typically the engineering manager of the team the candidate would join)
- Take-home technical assignment (~4h) or similar, at the candidate's choosing
- Presentation of the technical assignment to the team
- Offer

Monday, 20 June 2022

Sunday, 19 June 2022

Saturday, 18 June 2022

Friday, 17 June 2022