Monday, 28 August 2023

Saturday, 26 August 2023

Friday, 25 August 2023

Thursday, 24 August 2023

New top story on Hacker News: Show HN: Gentrace – evaluation and observability for generative AI

Show HN: Gentrace – evaluation and observability for generative AI
9 by dsaffy | 0 comments on Hacker News.
Hi HN, Gentrace is our new evaluation and observability tool for generative AI (open beta).

Generative pipelines are hard to evaluate because outputs are subjective. Lots of developers end up just doing “gut checks” on a few inputs before shipping changes, or they build up a spreadsheet of test cases that they manually run through the pipeline. Some companies outsource filling out the spreadsheet. In any of these cases, you end up with a very slow and expensive evaluation process. At one point, we did this too.

Gentrace is the result of a pivot: it started as an internal tool we used to automatically grade new PRs as developers shipped changes to generative pipelines, and other people thought it might be useful. Gentrace makes pre-production testing of generative pipelines continuous and nearly instantaneous. In Gentrace, you:

- Import and/or construct suites of test data
- Use a combination of AI and heuristic evaluators to grade for quality, hallucination, safety, etc.
- Use our interface to correct automated grades or add your own (yourself or a member of your team)

Gentrace integrates at a code level for evaluation, meaning we test your generative AI pipeline the way you would test normal code. This lets you test more than just prompt changes; for example, you can compare models (e.g. Claude 2 vs GPT-4 vs GPT-3.5 vs Llama 2) or see the effects of additional chained steps (“Rewrite the previous answer in the following tone:”). Here’s a video overview that goes into a bit more detail: https://youtu.be/XxgDPSrTWIw

In production, Gentrace monitors speed, cost, and data flow. It also shows real user feedback. We do this by integrating via our SDK at a code level; Gentrace does not proxy requests. Soon, we’ll let you convert production data into test cases, so customer support can turn bad production generations into “failing tests” for AI teams to make pass.
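The evaluation loop described above (a suite of test cases, automated evaluators, and a pipeline treated like ordinary code) can be sketched in a few lines. This is a minimal illustration, not Gentrace's SDK: `EvalCase`, `run_suite`, and both evaluators are hypothetical names, the evaluators are simple heuristics (the AI-based graders the post mentions are not shown), and the two "pipelines" are stubs standing in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """Hypothetical test case: an input plus phrases a good answer must mention."""
    prompt: str
    must_mention: list[str] = field(default_factory=list)


Pipeline = Callable[[str], str]               # prompt -> generated text
Evaluator = Callable[[EvalCase, str], float]  # (case, output) -> score in [0, 1]


def mentions_required(case: EvalCase, output: str) -> float:
    """Heuristic: fraction of required phrases found in the output (case-insensitive)."""
    if not case.must_mention:
        return 1.0
    hits = sum(phrase.lower() in output.lower() for phrase in case.must_mention)
    return hits / len(case.must_mention)


def max_words(limit: int) -> Evaluator:
    """Heuristic factory: score 1.0 if the output has at most `limit` words, else 0.0."""
    def evaluate(case: EvalCase, output: str) -> float:
        return 1.0 if len(output.split()) <= limit else 0.0
    return evaluate


def run_suite(pipeline: Pipeline, suite: list[EvalCase],
              evaluators: dict[str, Evaluator]) -> dict[str, float]:
    """Run every case through the pipeline and average each evaluator's score."""
    totals = dict.fromkeys(evaluators, 0.0)
    for case in suite:
        output = pipeline(case.prompt)
        for name, evaluate in evaluators.items():
            totals[name] += evaluate(case, output)
    return {name: total / len(suite) for name, total in totals.items()}


# Usage: compare a baseline pipeline with a candidate on the same suite.
def baseline(prompt: str) -> str:
    return "Paris is the capital." if "France" in prompt else "I am not sure."


def candidate(prompt: str) -> str:
    return "Paris." if "France" in prompt else "Jupiter is the largest planet."


suite = [
    EvalCase("What is the capital of France?", ["Paris"]),
    EvalCase("Name the largest planet.", ["Jupiter"]),
]
evaluators = {"facts": mentions_required, "short": max_words(12)}

print(run_suite(baseline, suite, evaluators))   # {'facts': 0.5, 'short': 1.0}
print(run_suite(candidate, suite, evaluators))  # {'facts': 1.0, 'short': 1.0}
```

Because both pipelines are plain functions, swapping a model or adding a chained step only changes the function passed to `run_suite`, which is the practical benefit of evaluating at the code level rather than per prompt.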
We also process interim steps and multiple outputs, which helps evaluate agent flows and chains where the “last output” isn’t the only thing that matters.

There have been a lot of observability tools published recently. We differ from those by blending observability more tightly with strong evaluation, and by capturing data through an SDK rather than a “man-in-the-middle” approach (i.e., Gentrace can be down and your request to OpenAI will still succeed). Within the evaluation landscape, we differentiate by integrating with code (see above for the benefits) to capture generative outputs, and by providing a customizable UI workflow for building evaluators. In Gentrace, you start with off-the-shelf automated evaluators and then customize them to your specific task. You can also build and run new evaluators on old generative outputs. Finally, you can easily override automated evaluators and/or blend automated evaluation with evaluation by humans on your team.

We also focus on being suitable for business use. We are SOC 2 Type 1 compliant (Type 2 coming shortly), have robust legal documentation around data processing, security, and privacy, and have already passed several vendor legal and security reviews at large technology companies. Our standard usage-based pricing is available on the website: https://ift.tt/Sf0eauU

If you are building features with generative AI, we would love to get your feedback. You can sign up self-serve (no credit card required) for a 14-day trial here: https://gentrace.ai/

We’re available right here for feedback and questions. We’re also available at support@gentrace.ai.

Best,
Doug, Vivek, and Daniel
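The difference between an SDK and a proxy comes down to where telemetry sits relative to the model request. A minimal sketch of the fail-open SDK pattern, assuming hypothetical `call_model` and `send_telemetry` functions (this is not Gentrace's API): the model is called directly, and telemetry is sent afterwards in a way that cannot fail the request.

```python
import logging
import time
from typing import Any, Callable

_log = logging.getLogger(__name__)

Event = dict[str, Any]


def traced(call_model: Callable[[str], str],
           send_telemetry: Callable[[Event], None]) -> Callable[[str], str]:
    """Wrap a model call with SDK-style tracing that fails open.

    The model is called directly (no proxy in the request path). Telemetry is
    sent only after the model has answered, and any error while sending it is
    logged at debug level and swallowed, so an observability outage never
    breaks the request.
    """
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        output = call_model(prompt)  # model errors still propagate to the caller
        event: Event = {
            "prompt": prompt,
            "output": output,
            "latency_s": time.perf_counter() - start,
        }
        try:
            send_telemetry(event)
        except Exception:
            _log.debug("telemetry dropped", exc_info=True)
        return output
    return wrapper


# Usage with stubs standing in for a model client and a telemetry backend.
def fake_model(prompt: str) -> str:
    return prompt.upper()


def unreachable_backend(event: Event) -> None:
    raise ConnectionError("telemetry service is down")


events: list[Event] = []
healthy = traced(fake_model, events.append)
outage = traced(fake_model, unreachable_backend)

print(healthy("hello"), len(events))  # HELLO 1
print(outage("hello"))                # HELLO
```

A production SDK would more likely buffer events and ship them from a background thread so telemetry adds no latency either; this sketch sends synchronously to keep the fail-open behavior easy to see.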

Monday, 21 August 2023

New top story on Hacker News: Show HN: Talk to AI Models in Terminal

Show HN: Talk to AI Models in Terminal
7 by today072 | 0 comments on Hacker News.
Hi everyone, nice to meet you; I'm new to HN. I have made a binary tool, Aih, that can communicate with Bard, ChatGPT, Claude, and Llama (HuggingChat) from the terminal: https://ift.tt/EbUNcid

Since CAPTCHA challenges and bot detection have become increasingly difficult to get past, I've changed my strategy from hacking the APIs to simulating a real browser's actions. The tool first takes the logged-in cookies for your Google, ChatGPT, Claude, and HuggingChat accounts from your real Chrome browser, then opens an invisible Chromium instance to communicate with each service, and finally displays the answers in the terminal.

I think it's especially useful when I'm researching a topic and want to compare the answers of these AI models at the same time. Feel free to try it out; feedback is welcome!
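Comparing several models on one prompt is essentially a fan-out. The sketch below assumes each service is wrapped in a hypothetical function from prompt to answer (in Aih that function would drive a hidden Chromium session; here the backends are stubs) and uses Python's `concurrent.futures` to query them in parallel, so a single slow or failing service does not block or hide the others. It is not Aih's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Backend = Callable[[str], str]  # hypothetical: prompt -> answer from one service


def ask_all(prompt: str, backends: dict[str, Backend]) -> dict[str, str]:
    """Send one prompt to every backend in parallel and collect answers by name.

    Each call runs in its own worker thread; an exception from one backend is
    recorded as that backend's answer instead of aborting the whole comparison.
    """
    def ask(name: str) -> tuple[str, str]:
        try:
            return name, backends[name](prompt)
        except Exception as exc:
            return name, f"<error: {exc}>"

    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        # pool.map yields results in input order, so the output order is stable.
        return dict(pool.map(ask, backends))


# Usage with stub backends; a real tool would drive a browser session per service.
def flaky(prompt: str) -> str:
    raise TimeoutError("no response")


answers = ask_all("Define entropy.", {
    "bard": lambda p: f"Bard says: {p}",
    "claude": lambda p: f"Claude says: {p}",
    "llama": flaky,
})
for model, answer in answers.items():
    print(f"{model:>6} | {answer}")
```

Threads suit this case because each backend spends most of its time waiting on the network or a browser, not on the CPU.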

Saturday, 19 August 2023

Friday, 11 August 2023

Wednesday, 9 August 2023

Monday, 7 August 2023

Thursday, 3 August 2023

Wednesday, 2 August 2023

Tuesday, 1 August 2023

New top story on Hacker News: Show HN: Openexus – Building blocks for the internet

Show HN: Openexus – Building blocks for the internet
26 by lominming | 7 comments on Hacker News.
Hi HN! We are thrilled to share a sneak peek of https://openexus.com after months of work. You can try out the cool demos on the front page!

The idea is to build a platform and community for composable building blocks, where anyone can easily create, find, and connect different modules to build dynamic, interactive apps, sites, dashboards, and docs, all without writing a single line of code.

Key principles of what we are building:

1) True composability that enables infinite possibilities. Modules today are either too complex for anyone but developers to use (e.g. NPM packages) or so simplistic that they are usually used as an embed in isolation. We are establishing a modularization foundation that is powerful enough for developers to express functionality, simple enough for anyone (even kids) to use, and flexible enough to build sophisticated creations.

2) Smart connect without code. All logic can be expressed clearly by simply drawing lines. Depending on the data type and other characteristics of a connector, we can figure out how the connection should behave. For example, triggers can only be connected to actions, and data connectors that fetch from APIs can be defined as directional, read-only connectors.

3) Open connections for instant forking. Forking today is a time-consuming, complex endeavor; even a simple change requires understanding many layers of code. Instead of open-source code, we see a future of open-source connections, where remixing simply means adding new blocks and re-wiring them.

What we are building can be described as NPM for non-developers, connectable Lego blocks for the internet, or Minecraft for non-game creations. Our focus is on creating a community where we can share ideas and innovations. We are super excited about the possibilities of this platform, especially when we incorporate AI. We will be releasing tutorials, opening up the playground, and sending out invites in the coming weeks.
If you are eager to try out our tooling and create your own building blocks, drop us an email! We would love to hear your feedback!

Website: https://openexus.com
Email: m@openexus.com
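The "smart connect" rule in principle 2 (triggers only connect to actions; API-backed data connectors are directional and read-only) amounts to type-checking each drawn line before accepting it. A minimal sketch, assuming a hypothetical model of blocks, ports, and port kinds that is not Openexus's actual data model:

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TRIGGER = "trigger"  # emits an event, e.g. a button click
    ACTION = "action"    # does something when an event arrives
    DATA = "data"        # carries a value between blocks


@dataclass(frozen=True)
class Port:
    """Hypothetical connection point on a block."""
    block: str
    name: str
    kind: Kind
    read_only: bool = False  # e.g. data fetched from an API: it can only be read from


def can_connect(src: Port, dst: Port) -> tuple[bool, str]:
    """Decide whether a line may be drawn from src to dst, with a reason."""
    if src.kind is Kind.TRIGGER:
        if dst.kind is Kind.ACTION:
            return True, "ok"
        return False, "triggers can only connect to actions"
    if src.kind is Kind.DATA and dst.kind is Kind.DATA:
        if dst.read_only:
            return False, "read-only connectors can only be a source"
        return True, "ok"
    return False, f"{src.kind.value} cannot connect to {dst.kind.value}"


# Usage: a button refreshes a weather block whose API data feeds a chart.
button = Port("button", "clicked", Kind.TRIGGER)
refresh = Port("weather", "refresh", Kind.ACTION)
forecast = Port("weather", "forecast", Kind.DATA, read_only=True)
series = Port("chart", "series", Kind.DATA)

print(can_connect(button, refresh))   # (True, 'ok')
print(can_connect(button, series))    # (False, 'triggers can only connect to actions')
print(can_connect(forecast, series))  # (True, 'ok')
print(can_connect(series, forecast))  # (False, 'read-only connectors can only be a source')
```

Returning a reason alongside the verdict lets an editor explain why a line was rejected instead of silently refusing the connection.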