PyData
  • Videos: 3,531
  • Views: 13,511,775
Jakub Hettler - Jupyter(Hub/Lab): Journey from On-prem to AWS [PyData Prague #18]
Let’s have a look at how we, the @AlmaCareer Czechia Business Intelligence team, moved JupyterHub and JupyterLab from on-premise infrastructure to AWS, why we used Amazon SageMaker Studio for just 3 weeks, and why we ended up happy with Jupyter running on top of Coder (coder.com) in AWS. An infrastructure point of view with a deeper dive into the pros and cons of on-prem JupyterHub/Lab on HashiCorp Nomad, Amazon SageMaker Studio, and Coder, all considering the requirements of 20 working users in JupyterLab.
Presented at PyData Prague #18 - A Vector from Lab to Hub (29.2.2024 at Pure Storage)
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United St...
60 views

Videos

James Powell - Are generator-coroutines really the answer? | PyData London 2024
907 views · 17 hours ago
www.pydata.org As we all know (or, at least, as I've been trying to tell everyone), generators in Python are an extremely powerful API design technique. A generator represents the linear decomposition of a single computation into multiple parts, and such decomposition proves very useful in practice. For example, we can model an infinite computation and only execute the portions we desire. Very ...
Dan Gibson - An Introduction to Retrieval Augmented Generation - PyData London 2024
439 views · 17 hours ago
How do you build chatbots that answer questions using your organisation's data? The answer is Retrieval Augmented Generation (RAG). In this session you'll be introduced to RAG and build a simple RAG powered chatbot in Python. Until very recently, if an organisation wanted a bespoke chatbot application, they had to spend millions of pounds and fund highly specialised teams, often training and ho...
Dr. Adam Hill - Empower Your Projects with Prefect's Pipeline Magic | PyData London 2024
498 views · 17 hours ago
Dr. Adam Hill: Mastering Data Flow: Empower Your Projects with Prefect's Pipeline Magic | PyData London 2024 Embark on a transformative journey into the realm of data engineering with our 90-minute workshop dedicated to Prefect 2. In this hands-on session, participants will learn the ins and outs of building robust data pipelines using the latest features and enhancements of Prefect 2. From da...
Issac Godfried - Multimodal Deep Learning in the Real World | PyData London 2024
285 views · 17 hours ago
Many real world business problems are multi-modal in nature and would benefit from using a combination of text, imagery, audio, and numerical data. Recently, there has been a surge in powerful deep learning models that fuse multiple modalities of data, however, fine-tuning, deploying, and versioning these models remains challenging for most companies. This tutorial will discuss some of the late...
Lex Avstreikh & Raymond Cunningham - Real-time AI Lakehouse | PyData London 2024
231 views · 17 hours ago
PyData Website: www.pydata.org LinkedIn: www.linkedin.com/company/pydata-global Twitter: PyData In this tutorial we will build an AI system to assist you in finding the best bar for you to go to in London - maybe even this evening after the PyData conference. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum ...
Nick Radcliffe - Test-Driven Data Analysis in Python | PyData London 2024
909 views · 17 hours ago
Test-driven data analysis is a methodology and open-source Python library for improving quality in data processes. It covers three main areas: • Testing data (generating constraints and using them to validate new data) ...
Sultan Al Awar - Generating Customers Insights with Topic Modelling and HuggingFace SetFit Method
235 views · 17 hours ago
Stop data skimming and dive deep into your customer voices! Are you working with a load of unstructured reviews, and would you like to gain an understanding of what customers are commenting on? This hands-on tutorial equips you with powerful text analysis techniques to unlock hidden ins...
Marco Gorelli - How you (yes, you!) can write a Polars Plugin | PyData London 2024
621 views · 17 hours ago
Polars is a dataframe library taking the world by storm. It is very runtime and memory efficient and comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practicall...
Andy Terrel & Jacob Tomlinson - GPU development in Python 101 | PyData London 2024
197 views · 17 hours ago
Since joining NVIDIA I’ve gotten to grips with the fundamentals of writing accelerated code in Python. I was amazed to discover that I didn’t need to learn C and I didn’t need new development tools. Writing GPU code in Python is easier today than ever, and in this tutorial, I will share w...
Kehinde Richard Ogunyale - Graph Database and Retrieval Augmented Generation | PyData London 2024
292 views · 17 hours ago
In the era of large language models (LLMs), the integration of external, structured knowledge bases has emerged as a frontier for enhancing AI's textual comprehension and generation capabilities. The Retrieval-Augmented Generation (RAG) architecture represents a pivotal advancement in thi...
Datta & Rodríguez - Building the composable Python data stack with Kedro & Ibis | PyData London 2024
179 views · 17 hours ago
For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. However, now Ibis can provide the sam...
Fonnesbeck & Wiecki - Probabilistic Programming and Bayesian Computing with PyMC | PyData London 2024
295 views · 17 hours ago
Bayesian statistical methods provide powerful tools for solving various data science problems. The Bayesian approach yields easy-to-interpret results and automatically accounts for uncertainty in our estimates or predictions. Although computational challenges have historically been an obs...
Keynote: Dr. Rebecca Bilbro - Mistakes were made: Data science ten years in | PyData London 2024
249 views · 17 hours ago
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science of the 2010's, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through th...
Ines Montani - A practical guide to human-in-the-loop distillation | PyData London 2024
331 views · 17 hours ago
As the field of natural language processing advances and new ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. Large Language Models (LLMs) have enormous potential, but also challenge existing wor...
Colombo et al. - Building Multi-Agent Generative-AI Applications with AutoGen | PyData London 2024
142 views · 17 hours ago
Patrick Hoefler - Dask DataFrame 2.0: Comparison to Spark, DuckDB and Polars | PyData London 2024
162 views · 17 hours ago
Noe Achache - RAG for a medical company: the technical and product challenges | PyData London 2024
145 views · 17 hours ago
Hendrik Makait - Observability for Dask in Production | PyData London 2024
74 views · 17 hours ago
Emeli Dral - How continuous testing keeps your LLM on track | PyData London 2024
162 views · 17 hours ago
Keynote: Dr. Matthew Crooks - Data: Faithful or Traitor? | PyData London 2024
75 views · 17 hours ago
Cas Wognum - Using Zarr for drug discovery datasets in Polaris | PyData London 2024
99 views · 17 hours ago
Carlos Samey - Linear Programming for Resource Allocation | PyData London 2024
186 views · 17 hours ago
Hajime Takeda - How to Enhance Customer Targeting in Marketing - PyData London 2024
247 views · 17 hours ago
Jim Dowling - Function Calling for LLMs | PyData London 2024
206 views · 17 hours ago
Adam Glustein - Enabling real-time insights through stream processing in Python | PyData London 2024
307 views · 17 hours ago
Luca Baggi - Uncertainty estimation at scale with functime | PyData London 2024
97 views · 17 hours ago
Chris Wilkin - Growing user engagement with RL-driven personalisation | PyData London 2024
130 views · 17 hours ago
Alex Owens - What a serverless database means for users | PyData London 2024
146 views · 17 hours ago
Keynote: Tania Allard - The art of building and sustaining successful OSS tools and infrastructure
79 views · 17 hours ago

Comments

  • @masonholcombe3327
    @masonholcombe3327 9 hours ago

    smoothing having a similar closed form solution as ridge regression is so satisfying

  • @Sidsel-zo5iz
    @Sidsel-zo5iz 11 hours ago

    The spirit of generosity and collaboration here is encouraging. It reminds us that we are stronger together.😻

  • @nicpetit318
    @nicpetit318 1 day ago

    12:14 😬

  • @kirill-markin
    @kirill-markin 1 day ago

    🔥

  • @LucelDaSilva
    @LucelDaSilva 1 day ago

    It's time to dueel🎉🎉🎉

  • @KhalilMuhammad
    @KhalilMuhammad 1 day ago

    Lovely presentation, Jimmy

  • @azizxojaxusanov5455
    @azizxojaxusanov5455 1 day ago

    Or where can I get the pptx

  • @azizxojaxusanov5455
    @azizxojaxusanov5455 1 day ago

    Hello. Can you share the pptx? Your topic is interesting and I'd like to get to know it.

  • @_ue_la_
    @_ue_la_ 2 days ago

    Wow😭❤️❤️

  • @PapiJack
    @PapiJack 2 days ago

    Thanks for the video. It will help next time to have the pointer on the screen as well. It's hard to follow sometimes because we can't see where you are pointing.

  • @moose304
    @moose304 2 days ago

    90% of my dev work is on Windows and I switched to uv ~6 months ago. Favorite package manager so far, but I also don't create packages for PyPI. I've also always avoided conda after trying it and it just feeling very "heavy." But to each their own, use what works for ya. Nice entertaining talk! 👍

  • @graziellakelisamey4950
    @graziellakelisamey4950 2 days ago

    Proud of you Mr SAMEY very good presentation and well explained!

  • @ivannz01
    @ivannz01 3 days ago

    the title of the video (“achieving concurrency in streamlit”) doesn’t match the title of the talk and the description underneath (“open source leadership”)

  • @labeeb_ibrahim
    @labeeb_ibrahim 3 days ago

    The code repo please.

  • @SerapioSergiovich
    @SerapioSergiovich 3 days ago

    Nice video shows methods to create a business..

  • @SerapioSergiovich
    @SerapioSergiovich 3 days ago

    Great method to viralize contents.

  • @chanebenchantre5055
    @chanebenchantre5055 3 days ago

    Good presentation ❤

  • @fburton8
    @fburton8 3 days ago

    Well, that was meaty!

  • @fmind-dev
    @fmind-dev 3 days ago

    Great talk, Emeli. Looking forward to testing these new features.

  • @SerapioSergiovich
    @SerapioSergiovich 3 days ago

    Nice video shows methods to develop a commercial startup.

  • @SerapioSergiovich
    @SerapioSergiovich 3 days ago

    Nice video shows methods to create a business.

  • @FeverBonus
    @FeverBonus 3 days ago

    What about LLMs

  • @fmind-dev
    @fmind-dev 3 days ago

    Great talk. I will recommend it to newcomers in AI/ML forecasting.

  • @altvaro
    @altvaro 4 days ago

    Great one, Feregrino 🙌

  • @herewegoagain2
    @herewegoagain2 4 days ago

    Aren't the constraints restrictive in the sense they're univariate? They're definitely helpful but not exhaustive

    • @herewegoagain2
      @herewegoagain2 4 days ago

      Most practical model 'failures' are due to relationship breakdowns even if they stay within individual constraints. I understand this library isn't meant to be a drift detection library but I think the current setup would work great with that use case

  • @sofdff
    @sofdff 4 days ago

    Always good to see new players in the Python streaming ecosystem!

  • @polyagent
    @polyagent 4 days ago

    Amazing overview of the multi-modal landscape from a practitioner's point of view.

  • @dimadem
    @dimadem 4 days ago

    very useful speech!

  • @vitalizzare
    @vitalizzare 4 days ago

    0:00 Introduction
    2:10 Workflow >> Tools
    2:47 Make's design decisions
    5:37 Use Git
    6:06 Use Virtual Environments
    10:52 Virtual environment name == git repo name
    11:34 Check-in your virtual environments
    12:00 environment.yml
    13:31 Script your environment management with "make"
    13:57 Makefile
    15:30 Never install packages manually
    16:58 Use Auto Documentation
    19:55 Separate "what you want" from "what you need"
    22:28 Don't be afraid to "nuke it from orbit"
    23:49 Summary
    25:47 Q&A

  • @hadianasliwa9161
    @hadianasliwa9161 5 days ago

    Learned a lot of paradoxes from Allen!

  • @GuorongLi-re7kt
    @GuorongLi-re7kt 6 days ago

    One question: if the propensity scores overfit, then the overlap we want will be very small. These look like conflicting arguments.

  • @RoulDukeGonzo
    @RoulDukeGonzo 8 days ago

    Any idea why the GPU version of this method can't take a pre-computed distance matrix?

  • @prathameshyeole4566
    @prathameshyeole4566 9 days ago

    Can you please tell me how to get the output as a percentage change? That is, if I vary an input parameter by 1%, by how much will my output change, using either of the analysis methods (Sobol or Morris)?

    • @HumbertoDaSilvaSantos-pz2nu
      @HumbertoDaSilvaSantos-pz2nu 8 days ago

      Hello, I hope you are doing well. From your question, this sounds more like a local sensitivity analysis (LSA) problem, which is much simpler to run. The Sobol and Morris methods are usually used for global sensitivity analysis (GSA), where the investigation focuses mainly on the contributions of model input uncertainty to model output uncertainty. In your case you can use matrix perturbation theory (The Computational Structure of Life Cycle Assessment, by Heijungs and Suh) or, even simpler, run a one-factor-at-a-time (OAT) analysis: choose one input, vary it by 1% while the remaining inputs are held constant, run the model, and see the effect of this 1% change on the model output of interest. Then repeat the procedure for the remaining parameters. I hope it helps. If you want to increase the reliability of your model, then move on to Morris and Sobol; for this you need a statistical analysis of the inputs, because the ranges (standard deviations) are needed to capture the variability of the inputs.

    • @itssharavariyeole2712
      @itssharavariyeole2712 8 days ago

      Thank you, I will apply this

  • @NaveenSiddareddy
    @NaveenSiddareddy 10 days ago

    really neat .. embedded graph.. pretty much creating a graph data struct for programming lang

  • @SerapioSergiovich
    @SerapioSergiovich 10 days ago

    Nice video shows methods to create a business..

  • @SerapioSergiovich
    @SerapioSergiovich 10 days ago

    Nice video shows methods to create a business..

  • @oniricosoy
    @oniricosoy 12 days ago

    very inspiring 😄

  • @0MVR_0
    @0MVR_0 13 days ago

    clustering is highly driven by the formatting of how the data relates to itself and is near impossible to accomplish using a single method of approach.

    • @RoulDukeGonzo
      @RoulDukeGonzo 8 days ago

      Agree, but in practical terms, where do you start?

    • @0MVR_0
      @0MVR_0 8 days ago

      @RoulDukeGonzo An intimate descriptive knowledge of the data is recommended.

  • @xcimpe
    @xcimpe 13 days ago

    is the notebook available online?

  • @johannes_81
    @johannes_81 14 days ago

    Wow - awesome!!!

  • @mohamedibrahim1836
    @mohamedibrahim1836 14 days ago

    Bayesian methods do overfit: a maximum-entropy prior will tend to overfit and encode all the sample noise :)

  • @UtkarshKoppikar
    @UtkarshKoppikar 14 days ago

    Great talk!

  • @monamichoudhary7160
    @monamichoudhary7160 17 days ago

    Thanks for sharing such an insightful video. Could you please link the Jupyter notebook and the additional notes?

  • @danielthompson2561
    @danielthompson2561 18 days ago

    I wish it could lazy scan from a database - the lazy frame function is excellent, but for data security reasons, I’m working from a secure database and not parquet files.

  • @vccuong1
    @vccuong1 19 days ago

    Need to slow down.

  • @salarshahryari4843
    @salarshahryari4843 20 days ago

    Can we train a deep learning model in an incremental way?

  • @hannahnelson4569
    @hannahnelson4569 21 days ago

    A very impressive presentation and algorithm! Thank you for teaching all this!

  • @jp_morr
    @jp_morr 21 days ago

    Could you possibly make the presenter's display a bit smaller? The text is a bit too big and is preventing me from admiring the Escher-style background.

  • @javierbenito2150
    @javierbenito2150 22 days ago

    Nice seeing "the industry" apply vertical scaling techniques that have been used in games (for world data) for at least the last 20-30 years, since a cache miss is orders of magnitude more expensive than a normal cache read and SIMD/MMX instructions are available on x86. I thought compilers had taken care of this processor "bowels" stuff for the last two decades, and they probably did until now. It seems today's data sets are so huge that we must micromanage memory access and vertical parallelism explicitly again. For a little while at least. So in hindsight, explicit optimisation of code data access patterns was outrun by processor might, and then was further rendered obsolete by multi-core CPUs: in theory, of course, because good usage of parallelism is still in its infancy. But now it seems data set size growth has outrun CPU growth 🤣 Low-level 3D engine game programmers, you have a whole new market opening!

  • @dmytrooliinyk3083
    @dmytrooliinyk3083 24 days ago

    That's a great talk!