SquishyQuokka

How do you write performant code for large data sources?

Lately, I have been trying to get into some AI workflows. It appears that plain Python code is usually less performant than Cython (.pyx) implementations.

Some learnings from my explorations:

When working with pandas DataFrames in Python, there are several ways to operate on data: usually you would use iterrows, apply, or vectorized operations. Each method has different performance characteristics, especially as the size of the DataFrame grows.

To investigate this, I asked GPT for some synthetic benchmark code to test this thesis.

Generate DataFrames with 10, 100, 1000, 10000, and 100000 rows.

Then profile the following functions:

  1. iterrows: Iterate through each row, adding a constant to a column's value.
  2. apply: Use the apply function with a lambda to add a constant to each item in a column.
  3. Vectorized: Directly add a constant to the column using vectorized operations.
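The benchmark described above can be sketched roughly like this. It is a minimal reconstruction, not the original GPT-generated code (which isn't shown in the post): the column name "value", the constant 5, the function names, and timing with timeit (best of a few repeats) are all my own assumptions.

```python
import timeit

import numpy as np
import pandas as pd

CONST = 5  # arbitrary constant added to the column (assumption)


def add_iterrows(df: pd.DataFrame) -> pd.DataFrame:
    # iterrows builds a new Series object for every row, which is the main cost.
    # Rows may be copies, so collect the new values and assign the column once.
    out = df.copy()
    out["value"] = [row["value"] + CONST for _, row in out.iterrows()]
    return out


def add_apply(df: pd.DataFrame) -> pd.DataFrame:
    # Series.apply calls the Python lambda once per element.
    out = df.copy()
    out["value"] = out["value"].apply(lambda x: x + CONST)
    return out


def add_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    # One column-wide addition executed in NumPy, no per-row Python calls.
    out = df.copy()
    out["value"] = out["value"] + CONST
    return out


METHODS = [("iterrows", add_iterrows), ("apply", add_apply), ("vectorized", add_vectorized)]


def benchmark(sizes=(10, 100, 1_000, 10_000, 100_000), repeats=3):
    """Return {rows: {method: best time in seconds}}."""
    results = {}
    for n in sizes:
        df = pd.DataFrame({"value": np.arange(n)})
        results[n] = {
            # each lambda is called immediately by timeit, so capturing fn is safe
            name: min(timeit.repeat(lambda: fn(df), number=1, repeat=repeats))
            for name, fn in METHODS
        }
    return results
```

Calling benchmark() returns a nested dict of timings per size; expect the iterrows runs at the larger sizes to dominate the total run time.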

So for large DataFrames, avoid iterrows due to its poor performance. Use vectorized operations for the best efficiency, and resort to apply only when vectorized operations are not feasible. Most of your use cases should ideally be solved via vectorized ops tbh.
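One common case where apply looks necessary is row-wise conditional logic. A sketch of how that can often still be vectorized with np.where; the columns "price" and "qty" and the "10% off when qty >= 3" rule are purely hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical order data for illustration only.
orders = pd.DataFrame({"price": [5.0, 20.0, 50.0, 120.0], "qty": [1, 3, 2, 10]})


def discount_apply(df: pd.DataFrame) -> pd.Series:
    # Row-wise apply: the lambda runs in Python once per row.
    return df.apply(lambda r: r["price"] * 0.9 if r["qty"] >= 3 else r["price"], axis=1)


def discount_vectorized(df: pd.DataFrame) -> pd.Series:
    # np.where evaluates the condition and both branches on whole columns at once.
    return pd.Series(np.where(df["qty"] >= 3, df["price"] * 0.9, df["price"]), index=df.index)
```

Both return the same values; the vectorized version avoids the per-row Python call, so it should scale much better as the row count grows.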

21mo ago
SillyMuffin
Cred · 21mo

You can get further gains from NVIDIA cuDF, Dask, or Polars, depending on where the real bottleneck is for your use case.

ZippyCupcake

Polars.rs

SquishyQuokka
Gojek · 21mo

Polars is insane.

SnoozyPickle
TCS · 21mo

When will the promotion cycle be initiated this quarter?

JazzyWaffle

tomorrow :)

SnoozyPickle
TCS · 21mo

Where did you get the information that it opens tomorrow?

FuzzyPickle
Amazon · 21mo

Why are you posting this here, bro? Post this on Medium / Stack Overflow.

SquishyQuokka
Gojek · 21mo

Ah, I’ve been posting here for so long, man. That’s why I post here.

On Medium it’s just vanity. At least here people will actually read it.

MagicalPickle
EY · 21mo

+1 good
