@Sorcaeden

Sorcaeden@lemmy.world · 8 months ago

I seem to recall a VMware complaint similar to this too, and the fix there was some ring buffer tuning… but that error message doesn’t quite match it.

TX queue timeouts can be caused by several things, but I wonder if you’re not seeing an end result of a spammy Ethernet flow control implementation where the device can’t transmit because the link is continuously paused.

If so, there may be rx_xoff counters viewable from within Proxmox, or on a regular Linux host “ethtool -S enp1s0f0” (capital S, for statistics) would tell you whether the device is seeing pause frames from the switch.
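
If it helps, here is a rough Python sketch of picking the flow-control counters out of that output. Counter names vary by driver, so the name pattern and the sample text below are assumptions for illustration, not output from any particular NIC.

```python
import re

def pause_counters(ethtool_stats: str) -> dict[str, int]:
    """Pick out counters whose names mention pause/xoff/xon from `ethtool -S` text."""
    counters = {}
    for line in ethtool_stats.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Skip the "NIC statistics:" header and anything that is not "name: integer".
        if not sep or not value.isdigit():
            continue
        if re.search(r"xoff|xon|pause", name, re.IGNORECASE):
            counters[name] = int(value)
    return counters

# Hypothetical sample; real text would come from something like
# subprocess.run(["ethtool", "-S", "enp1s0f0"], capture_output=True, text=True).stdout
SAMPLE = """NIC statistics:
     rx_packets: 1200
     tx_packets: 900
     rx_xoff: 4821
     tx_xon: 0
"""

print(pause_counters(SAMPLE))
```

Taking two readings a few seconds apart and comparing them is more telling than a single snapshot: a pause counter that keeps climbing while the TX timeouts happen would point toward flow control.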

The link down tends to be a reaction by the driver to recover from a hung queue, so if it’s not flow control, there could be a driver/firmware upgrade possible, or a series of tunables if there’s a bug somewhere in packet handling land resulting in the NIC itself hanging.

Sorcaeden@lemmy.world · 10 months ago

It could be a case of disproportionate impact - consider that forecasting within Haier for their cloud API would probably be based upon X number of units in the field and Y number of average API calls per unit/user/premises. At 40,000 units in the field at 1000 calls per day (which they know because they designed the software, or at least had a hand in resourcing discussions), you have 40,000,000 calls per day.

If you have some third-party app which is generating 4,000,000 calls by itself, and you see only 400 users behind it, then it’s an easy high-usage target to go after.
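
To put those made-up numbers side by side, here is a quick Python sketch of the arithmetic. The figures are the hypothetical ones above, and it assumes one user per unit, which is my simplification rather than anything known about Haier’s fleet.

```python
def load_summary(units, calls_per_unit, heavy_users, heavy_calls):
    """Compare a small group of heavy users against the forecast daily load."""
    forecast_total = units * calls_per_unit
    per_heavy_user = heavy_calls // heavy_users
    return {
        "forecast_total": forecast_total,
        "per_heavy_user": per_heavy_user,
        # How many times the forecast per-unit average each heavy user generates.
        "multiple_of_average": per_heavy_user / calls_per_unit,
        "share_of_users_pct": 100 * heavy_users / units,
        "share_of_forecast_pct": 100 * heavy_calls / forecast_total,
    }

summary = load_summary(units=40_000, calls_per_unit=1_000,
                       heavy_users=400, heavy_calls=4_000_000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these inputs, each of the 400 users makes 10,000 calls a day, ten times the forecast average, so 1% of the user base accounts for traffic equal to 10% of the forecast.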

Ad revenue, maybe. Tracking is still possible because it’s the same device, and if there’s any security at all, they’ll still have all the native API stuff they’d normally get, temperatures, weather, occupancy, etc.

I will say, from a brief glance at the repo for the project, that there are some calls which imply it gets the local IP for the device, and may from there be able to issue calls directly to the device. That would make me think there are only a few calls to their cloud to establish a relationship and product info, so the disproportionate load theory, barring bugs, doesn’t hold up. While it’s been a good brain exercise, we’ll be left guessing, and hoping Haier decides to be better.

Sorcaeden@lemmy.world · 10 months ago

To think, from a business perspective, that any notable portion of their userbase bought the devices with the explicit expectation that it would work with HA would be naive. We’re hobbyists, a niche market, the less-than-1% of their market evaluations. Losing those customers while reducing whatever burden or cost they’re incurring is probably worth it.
HA doesn’t - but while I don’t have any Haier equipment to check, the other smart devices in my house which aren’t either ESPHome or Tasmota aren’t reached locally; their integrations go through the vendor cloud API. Ecobee, Wyze, Traeger all do that instead.
Totally agreed. I think AWS API costs are a few cents to the thousand, so a discussion with the developer about the use would be the nice way instead of just kowtowing to the bean counters.

Sorcaeden@lemmy.world · 10 months ago

I am in no way defending their behavior, but API calls will always incur some cost - either in backend resource consumption with “paying” customers, or legitimate costs if they’re relying on AWS infrastructure.

However, like the whole reddit debacle, API usage isn’t always well optimized at the client end, and it can become a negotiation rather than a C&D…unless you’re looking to make a competitor as well.

Sorcaeden@lemmy.world · edit-2 10 months ago

I don’t pretend to be an expert in this, and I also have no idea what the state machine looks like for unauthenticated WiFi. My reading of the call stack is one of two things: either you were authenticated and the association with the AP dropped while a frame was being sent, and it puked, or it died while attempting to authenticate to an AP. I have no idea why a mutex would be taken, or on what, but apparently it timed out.

So why would this happen after a rebuild?

freak accident/timing thing.
I see multiple mt## modules loaded. Without looking it up, I suspect they are all meant for a MediaTek chip in that dongle and are potentially conflicting.
Lots of WiFi devices I’ve seen recently load their firmware separately from the driver, from /usr/lib(or lib64)/firmware. The version may have changed from before, so maybe it needs updating now, or you did that before, or whatever.
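
For the module angle, a small Python sketch of listing whatever mt## modules are loaded, going by the /proc/modules format where the first field on each line is the module name. The sample text is invented for illustration and is not what your system reported.

```python
import re

def mediatek_like_modules(proc_modules_text: str) -> list[str]:
    """Return loaded module names that start with 'mt' followed by a digit."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each /proc/modules line is the name.
        if fields and re.match(r"mt\d", fields[0]):
            names.append(fields[0])
    return names

# Hypothetical sample; on a real host use open("/proc/modules").read()
SAMPLE = """mt7601u 102400 0 - Live 0x0000000000000000
mt76x0u 20480 0 - Live 0x0000000000000000
cfg80211 970752 2 mt7601u,mt76x0u, Live 0x0000000000000000
"""

print(mediatek_like_modules(SAMPLE))
```

If more than one shows up for a single dongle, unloading or blacklisting all but one and retesting would be a cheap way to check the conflict theory.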

I agree with others - I’d give you a fiver if it happens again without the adapter connected.